I have two variations of CSV files. One of them uses double quotes, the other one doesn't.
A: "shipment_id","status","to_name","to_address_1" etc
B: shipment_id,status,to_name,to_address_1 etc
How can I read the CSV and print out the value of shipment_id regardless of which type of CSV is submitted?
My code doesn't seem to work when the CSV doesn't use double quotes.
import csv

with open(file_location) as f_obj:
    reader = csv.DictReader(f_obj, delimiter=',')
    for line in reader:
        print(line['shipment_id'])
Try this:
with open(file_location) as f_obj:
    f_obj = f_obj.read().replace('"', '').splitlines()
    reader = csv.DictReader(f_obj, delimiter=',')
    for line in reader:
        print(line['shipment_id'])
.replace('"', '') will work if it has double quotes, and it will do nothing if it doesn't.
Let me know if it works :)
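One caveat worth flagging: stripping every double quote this way will also mangle fields that legitimately contain commas inside quotes. The csv module already handles a default quotechar of '"', so a plain DictReader should read both variants as-is; a minimal sketch using io.StringIO in place of real files (sample values assumed):

```python
import csv
import io

# Two variants of the same data: quoted and unquoted (sample values assumed)
quoted = '"shipment_id","status"\n"S1","delivered"\n'
unquoted = 'shipment_id,status\nS2,delivered\n'

for text in (quoted, unquoted):
    reader = csv.DictReader(io.StringIO(text))
    for line in reader:
        print(line['shipment_id'])  # quotes are stripped automatically
```

Both loops print the shipment id with the quotes already removed, no preprocessing needed.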
Based on what I think the .csv files should look like, and on experience with pandas read_csv, I decided to give my input as follows.
example of the test.csv file
"1233",No,N,C
9999,OK,C,N
example of the test1.csv file
"321",ok,P,A
980,No,A,G
"1980","No",A,"G"
Code with specified fieldnames for test.csv with print(line['shipment_id']):
import csv

with open('test.csv') as f_obj:
    reader = csv.DictReader(f_obj, delimiter=',', fieldnames=['shipment_id', 'status', 'to_name', 'to_address_1'])
    for line in reader:
        print(line['shipment_id'])
output:
1233
9999
Code with specified fieldnames for test1.csv with print(line['shipment_id']):
with open('test1.csv') as f_obj:
    reader_ddQ = csv.DictReader(f_obj, delimiter=',', fieldnames=['shipment_id', 'status', 'to_name', 'to_address_1'])
    for line in reader_ddQ:
        print(line['shipment_id'])
output:
321
980
1980
Code with specified fieldnames for test1.csv with print(line):
with open('test1.csv') as f_obj:
    reader = csv.DictReader(f_obj, delimiter=',', fieldnames=['shipment_id', 'status', 'to_name', 'to_address_1'])
    for line in reader:
        print(line)
output:
OrderedDict([('shipment_id', '321'), ('status', 'ok'), ('to_name', 'P'), ('to_address_1', 'A')])
OrderedDict([('shipment_id', '980'), ('status', 'No'), ('to_name', 'A'), ('to_address_1', 'G')])
OrderedDict([('shipment_id', '1980'), ('status', 'No'), ('to_name', 'A'), ('to_address_1', 'G')])
Source: the csv.DictReader documentation.
You should be able to pass quotechar as a parameter:
reader = csv.DictReader(f_obj, delimiter=',', quotechar='"')
(Note that '\"' is just an escaped spelling of the same one-character string as '"', so either works.)
This should work on both versions of your data.
If DictReader doesn't support the quotechar parameter, try using it with csv.reader directly.
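For the record, DictReader does forward quotechar (and the other formatting parameters) to the underlying reader, so the parameter works directly; a quick sketch with in-memory data (sample values assumed):

```python
import csv
import io

# Mixed quoting in the same file: one quoted row, one unquoted row
data = '"shipment_id","status"\n"S1","ok"\nS2,ok\n'
reader = csv.DictReader(io.StringIO(data), delimiter=',', quotechar='"')
for row in reader:
    print(row['shipment_id'])  # prints S1, then S2
```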
I have a csv file as follow:
lat,lon,date,data1,data2
1,2,3,4,5
6,7,8,9,10
From this csv file I want to retrieve and extract the column date and data1 to another csv file. I have the following code:
import csv
import os

os.chdir(mydir)
column_names = ["date", "data1"]
index = []
with open("my.csv", "r") as f:
    mycsv = csv.DictReader(f)
    for row in mycsv:
        for col in column_names:
            try:
                data = print(row[col])
                with open("test2.txt", "w") as f:
                    print(data, file=f)
            except KeyError:
                pass
Unfortunately, the output is a file with a "none" on it... Does anyone knows how to retrieve and write to another file the data I wish to use?
There are a few issues with your code:
Every time you open("test2.txt", "w"), the "w" mode truncates the file, deleting all of its contents.
You are storing the return value of print, which is None, and then trying to print that into your file.
Read your CSV into a list of dict's, as below:
import csv

with open('your_csv.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    read_l = [{key: value for key, value in row.items() if key in ('date', 'data1')}
              for row in reader]
and then use DictWriter to write to a new CSV.
with open('new.csv', 'w', newline='') as csvfile:
    fieldnames = read_l[0].keys()
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in read_l:  # DictReader already consumed the header, so don't skip the first row
        writer.writerow(row)
The steps below may help you, but they require the pandas library, so install pandas before trying them. input.csv contains the data you mentioned.
import pandas as pd

df = pd.read_csv('input.csv')
df_new = df.iloc[0:, 2:4]  # all rows, columns 2-3: date and data1
df_new.to_csv("output.csv", index=False)
The reason why you see None in your file is because you're assigning the result of print(row[col]) to your data variable:
data=print(row[col])
print() doesn't return anything, therefore the content of data is None. If you remove the print() and just have data = row[col], you will get something valuable.
There is one more issue that I see in your code, which you probably want to get fixed:
You're opening the output file again on every iteration of the loop, so each row overwrites the entire file with that row's value. If you want the entire column, open the file once, before the loop.
I recommend you use pandas. I haven't run this script, but something like this should work.
import csv

import pandas as pd

frame = pd.read_csv('my.csv')
df = frame[['date', 'data1']]
with open('test2.csv', 'a', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(df.values)  # one output row per DataFrame row
import pandas as pd

df = pd.read_csv("my.csv")  # optional: "header"=True
new_df = df[["date", "data1"]]
new_df.to_csv("new_csv_name.csv")

# if you don't need the index
new_df.to_csv('new_csv_name.csv', index=False)
I have a CSV file containing the following.
0.000264,0.000352,0.000087,0.000549
0.00016,0.000223,0.000011,0.000142
0.008853,0.006519,0.002043,0.009819
0.002076,0.001686,0.000959,0.003107
0.000599,0.000133,0.000113,0.000466
0.002264,0.001927,0.00079,0.003815
0.002761,0.00288,0.001261,0.006851
0.000723,0.000617,0.000794,0.002189
I want to convert the values into an array in Python, keeping the same order of rows and columns. How can I achieve this?
I have tried different functions but ended up with errors.
You should use the csv module:
import csv

results = []
with open("input.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # converts contents to floats
    for row in reader:  # each row is a list
        results.append(row)
This gives:
[[0.000264, 0.000352, 8.7e-05, 0.000549],
[0.00016, 0.000223, 1.1e-05, 0.000142],
[0.008853, 0.006519, 0.002043, 0.009819],
[0.002076, 0.001686, 0.000959, 0.003107],
[0.000599, 0.000133, 0.000113, 0.000466],
[0.002264, 0.001927, 0.00079, 0.003815],
[0.002761, 0.00288, 0.001261, 0.006851],
[0.000723, 0.000617, 0.000794, 0.002189]]
If your file doesn't contain quoted fields, you can parse it by hand, building one list per line so the row/column order is preserved:
with open('input.csv') as f:
    output = [[float(s) for s in line.strip().split(',')] for line in f]
    print(output)
The csv module was created to do just this. The following is adapted from the Python docs (Python 3 wants text mode with newline='' rather than 'rb'):
import csv

with open('file.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in reader:
        pass  # add data to a list or other data structure
The delimiter is the character that separates data entries, and the quotechar is the character used to wrap fields that contain the delimiter.
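To see what quotechar actually does: a field wrapped in the quote character is treated as a single entry even if it contains the delimiter. A small in-memory demo (sample data assumed):

```python
import csv
import io

# '|' is the quote character here, so the comma inside |a,b| does not split the field
data = '|a,b|,c\n'
reader = csv.reader(io.StringIO(data), delimiter=',', quotechar='|')
print(next(reader))  # ['a,b', 'c']
```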
I would like to create a subset of a large CSV file using the rows that have the 4th column as "DOT" and output it to a new file.
This is the code I currently have:
import csv

outfile = open('DOT.csv', 'w')
with open('Service_Requests_2015_-_Present.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        if row[3] == "DOT":
            outfile.write(row)
outfile.close()
The error is:
outfile.write(row)
TypeError: must be str, not list
How can I manipulate row so that I will be able to just straight up do write(row), if not, what is the easiest way?
You can combine your two open statements, as the with statement accepts multiple arguments, like this:
import csv

infile = 'Service_Requests_2015_-_Present.csv'
outfile = 'DOT.csv'

with open(infile, newline='', encoding='utf-8') as f, open(outfile, 'w', newline='') as o:
    reader = csv.reader(f)
    writer = csv.writer(o, delimiter=',')  # adjust as necessary
    for row in reader:
        if row[3] == "DOT":
            writer.writerow(row)
    # no need for close statements

print('Done')
Make your outfile a csv.writer and use writerow instead of write.
outcsv = csv.writer(outfile, ...other_options...)
...
outcsv.writerow(row)
That is how I would do it... OR
outfile.write(",".join(row)) # comma delimited here...
In the above code you are trying to write a list with a file object; you cannot write a list directly, which gives the error "TypeError: must be str, not list". You can convert the list to a string first, and then you will be able to write the row to the file: outfile.write(str(row))
or
import csv

def csv_writer(input_path, out_path):
    with open(out_path, 'a', newline='') as outfile:  # 'a' text mode: Python 3's csv module needs text, not 'ab'
        writer = csv.writer(outfile)
        with open(input_path, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            for row in reader:
                if row[3] == "DOT":
                    writer.writerow(row)

csv_writer(input_path, out_path)
[This code is for Python 3. In Python 2.7, the open function does not take a newline argument, hence the TypeError.]
I have an input csv that look like
email,trait1,trait2,trait3
foo#gmail,biz,baz,buzz
bar#gmail,bizzy,bazzy,buzzy
foobars#gmail,bizziest,bazziest,buzziest
and I need the output format to look like
Indv,AttrName,AttrValue,Start,End
foo#gmail,"trait1",biz,,,
foo#gmail,"trait2",baz,baz,,
foo#gmail,"trait3",buzz,,,
For each row in my input file I need to write a row for the N-1 columns in the input csv. The Start and End fields in the output file can be empty in some cases.
I'm trying to read in the data using a DictReader. So far I've been able to read in the data with
import unicodecsv
import os
import codecs

with open('test.csv') as csvfile:
    reader = unicodecsv.csv.DictReader(csvfile)
    outfile = codecs.open("test-write", "w", "utf-8")
    outfile.write("Indv", "ATTR", "Value", "Start", "End\n")
    for row in reader:
        outfile.write([row['email'], "trait1", row['trait1'], '', ''])
        outfile.write([row['email'], "trait2", row['trait2'], row['trait2'], ''])
        outfile.write([row['email'], "trait3", row['trait3'], '', ''])
Which doesn't work. (I think I need to cast the list to a string), and is also very brittle as I'm hardcoding the column names for each row. The bigger issue is that the data within the for loop isn't written to "test-write". Only the line
outfile.write("Indv", "ATTR", "Value", "Start","End\n") actually write out to the file. Is DictReader the appropriate class to use in my case?
This uses a unicodecsv.DictWriter and the zip() function to do what you want, and the code is fairly readable in my opinion.
import unicodecsv
import codecs

with open('test.csv') as infile, \
     codecs.open('test-write.csv', 'w', 'utf-8') as outfile:
    reader = unicodecsv.DictReader(infile)
    fieldnames = 'Indv,AttrName,AttrValue,Start,End'.split(',')
    writer = unicodecsv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    for row in reader:
        email = row['email']
        trait1, trait2, trait3 = row['trait1'], row['trait2'], row['trait3']
        writer.writerows([  # writes three rows of output for each row of input
            dict(zip(fieldnames, [email, 'trait1', trait1])),
            dict(zip(fieldnames, [email, 'trait2', trait2, trait2])),
            dict(zip(fieldnames, [email, 'trait3', trait3]))])
Here's the contents of the test-write.csv file it produced from your example input csv file:
Indv,AttrName,AttrValue,Start,End
foo#gmail,trait1,biz,,
foo#gmail,trait2,baz,baz,
foo#gmail,trait3,buzz,,
bar#gmail,trait1,bizzy,,
bar#gmail,trait2,bazzy,bazzy,
bar#gmail,trait3,buzzy,,
foobars#gmail,trait1,bizziest,,
foobars#gmail,trait2,bazziest,bazziest,
foobars#gmail,trait3,buzziest,,
I may be completely off since I don't do a lot of work with unicode, but it seems to me that the following should work:
import csv

with open('test.csv', newline='') as csvin, open('test-write', 'w', newline='') as csvout:
    reader = csv.DictReader(csvin)
    writer = csv.DictWriter(csvout, fieldnames=['Indv', 'AttrName',
                                                'AttrValue', 'Start', 'End'])
    writer.writeheader()
    for row in reader:
        for traitnum in range(1, 4):
            key = "trait{}".format(traitnum)
            writer.writerow({'Indv': row['email'], 'AttrName': key,
                             'AttrValue': row[key]})
import pandas as pd

pd1 = pd.read_csv('input_csv.csv')
pd2 = (pd.melt(pd1, id_vars=['email'], value_vars=['trait1', 'trait2', 'trait3'],
               var_name='AttrName', value_name='AttrValue')
         .rename(columns={'email': 'Indv'})
         .sort_values(by=['Indv', 'AttrName'])
         .reset_index(drop=True))
pd2.to_csv('output_csv.csv', index=False)
Unclear on what the Start and End fields represent, but this gets you everything else.
I'm trying to parse a pipe-delimited file and pass the values into a list, so that later I can print selective values from the list.
The file looks like:
name|age|address|phone|||||||||||..etc
It has more than 100 columns.
Use the csv library.
First, register your dialect:
import csv

csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)
Then, use your dialect on the file:
with open(myfile, newline='') as csvfile:
    for row in csv.DictReader(csvfile, dialect='piper'):
        print(row['name'])
Use Pandas:
import pandas as pd
df = pd.read_csv(filename, sep="|")
This will store the file in a dataframe. For each column, you can apply conditions to select the required values to print. It takes a very short time to execute. I tried with 111,047 rows.
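As an illustration of applying a condition per column, here is a small in-memory sketch (the column names and values are assumed, since the original file only shows a header pattern):

```python
import io

import pandas as pd

# In-memory stand-in for the pipe-delimited file (sample data assumed)
text = "name|age|address|phone\nAlice|34|12 Elm St|555-0100\nBob|15|9 Oak Ave|555-0101\n"
df = pd.read_csv(io.StringIO(text), sep="|")

# Select the names of rows where a condition on another column holds
print(df.loc[df["age"] > 18, "name"].tolist())  # ['Alice']
```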
If you're parsing a very simple file that won't contain any | characters in the actual field values, you can use split:
fileHandle = open('file', 'r')
for line in fileHandle:
    fields = line.split('|')
    print(fields[0])  # prints the first field's value
    print(fields[1])  # prints the second field's value
fileHandle.close()
A more robust way to parse tabular data would be to use the csv library as mentioned in Spencer Rathbun's answer.
In 2022, with Python 3.8 or above, you can simply do:
import csv

with open(file_path, "r") as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    for row in reader:
        print(row[0], row[1])