I'm trying to write lists like this to a CSV file:
['ABC','One,Two','12']
['DSE','Five,Two','52']
To a file like this:
ABC One,Two 12
DSE Five,Two 52
Basically, each quoted string should go into its own cell.
However, the output is splitting One and Two into different cells and merging ABC with One in the first cell.
Part of my script:
out_file_handle = open(output_path, "ab")
writer = csv.writer(out_file_handle, delimiter = "\t", dialect='excel', lineterminator='\n', quoting=csv.QUOTE_NONE)
output_final = (tsv_name_list.split(".")[0]+"\t"+key + "\t" + str(listOfThings))
output_final = str([output_final]).replace("[","").replace("]","").replace('"',"").replace("'","")
output_final = output_final.split("\\t")
print output_final #gives the first lists of strings I mentioned above.
writer.writerow(output_final)
The first output_final lines give:
ABC One,Two 12
DSE Five,Two 52
Using the csv module simply works, so you're going to need to be more specific about what's convincing you that the elements are bleeding across cells. For example, using the (now quite outdated) Python 2.7:
import csv
data_lists = [['ABC', 'One,Two', '12'],
              ['DSE', 'Five,Two', '52']]

with open("out.tsv", "wb") as fp:
    writer = csv.writer(fp, delimiter="\t", dialect="excel", lineterminator="\n")
    writer.writerows(data_lists)
I get an out.tsv file of:
dsm@winter:~/coding$ more out.tsv
ABC One,Two 12
DSE Five,Two 52
or
>>> out = open("out.tsv").readlines()
>>> for row in out: print repr(row)
...
'ABC\tOne,Two\t12\n'
'DSE\tFive,Two\t52\n'
which is exactly as it should be. Now if you take these rows, which are tab-delimited, and for some reason split them using commas as the delimiter, sure, you'll think that there are two columns, one with ABC\tOne and one with Two\t12. But that would be silly.
You've set up the CSV writer, but then for some reason you completely ignore it and try to output the lines manually. That's pointless. Use the functionality available.
writer = csv.writer(...)
for row in tsv_list:
    writer.writerow(row)
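If you happen to be on Python 3, the same approach works; here is a minimal sketch, assuming the same data_lists as above (text mode with newline='' replaces the "wb" mode):

import csv

data_lists = [['ABC', 'One,Two', '12'],
              ['DSE', 'Five,Two', '52']]

# On Python 3, open in text mode and pass newline='' so the csv module controls line endings.
with open("out.tsv", "w", newline="") as fp:
    writer = csv.writer(fp, delimiter="\t", dialect="excel", lineterminator="\n")
    writer.writerows(data_lists)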
I am trying to add a header to my CSV file.
I am importing data from a .csv file which has two columns of data, each containing float numbers. Example:
11 22
33 44
55 66
Now I want to add a header for both columns like:
ColA ColB
11 22
33 44
55 66
I have tried this:
with open('mycsvfile.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow(('ColA', 'ColB'))
I used 'a' to append the data, but this added the values in the bottom row of the file instead of the first row. Is there any way I can fix it?
One way is to read all the data in, then overwrite the file with the header and write the data out again. This might not be practical with a large CSV file:
#!python3
import csv

with open('file.csv', newline='') as f:
    r = csv.reader(f)
    data = [line for line in r]

with open('file.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['ColA', 'ColB'])
    w.writerows(data)
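If the file is too large to read into memory comfortably, one hedged alternative is to stream the rows through a temporary file and then swap it into place; a minimal sketch, assuming the same file name and column labels as above:

import csv
import shutil
import tempfile

# Write the header, stream the existing rows into a temp file, then replace the original.
with open('file.csv', newline='') as src, \
        tempfile.NamedTemporaryFile('w', newline='', delete=False, dir='.') as tmp:
    writer = csv.writer(tmp)
    writer.writerow(['ColA', 'ColB'])
    writer.writerows(csv.reader(src))
    tmp_name = tmp.name

shutil.move(tmp_name, 'file.csv')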
I think you should use pandas to read the CSV file, insert the column headers/labels, and write out a new CSV file. Assuming your CSV file is comma-delimited, something like this should work:
from pandas import read_csv
df = read_csv('test.csv')
df.columns = ['a', 'b']
df.to_csv('test_2.csv')
I know the question was asked a long time back. But for others stumbling across this question, here's an alternative to Python.
If you have access to sed (you do if you are working on Linux or Mac; you can also download Ubuntu Bash on Windows 10 and sed will come with it), you can use this one-liner:
sed -i 1i"ColA,ColB" mycsvfile.csv
The -i will ensure that sed will edit in-place, which means sed will overwrite the file with the header at the top. This is risky.
If you want to create a new file instead, do this
sed 1i"ColA,ColB" mycsvfile.csv > newcsvfile.csv
In this case, you don't need the csv module. You need the fileinput module, as it allows in-place editing:
import fileinput

for line in fileinput.input(files=['mycsvfile.csv'], inplace=True):
    if fileinput.isfirstline():
        print 'ColA,ColB'
    print line,
In the above code, the print statement will print to the file because of the inplace=True parameter.
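On Python 3 the same idea works with the print() function; a minimal sketch:

import fileinput

for line in fileinput.input(files=['mycsvfile.csv'], inplace=True):
    if fileinput.isfirstline():
        print('ColA,ColB')
    print(line, end='')  # end='' replaces the trailing comma and avoids doubled newlines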
For the issue where the first row of the CSV file gets replaced by the header, pass header=None when reading:

import pandas as pd

df = pd.read_csv('file.csv', header=None)
df.to_csv('file.csv', header=['col1', 'col2'], index=False)  # index=False avoids writing the row index as an extra column
You can set reader.fieldnames explicitly to a list, like this in your case:

import csv

with open('mycsvfile.csv', 'r', newline='') as fd:  # 'r', not 'a': you can't read from an append-mode handle
    reader = csv.DictReader(fd)
    reader.fieldnames = ["ColA", "ColB"]  # the first file row is now treated as data, not as a header
    for row in reader:
        print(row)
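If the goal is actually to write those names out as a real header row, a hedged follow-up using DictWriter (the output filename is just a placeholder):

import csv

with open('mycsvfile.csv', 'r', newline='') as fd, \
        open('mycsvfile_with_header.csv', 'w', newline='') as out:
    reader = csv.DictReader(fd)
    reader.fieldnames = ["ColA", "ColB"]
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()      # write ColA,ColB as the first line
    writer.writerows(reader)  # copy the existing rows unchanged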
I'm looking to read a CSV file and make some calculations on the data in there (that part I have done).
After that I need to write all the data into a new file exactly the way it is in the original, with the exception of one column, which will be changed to the result of the calculations.
I can't show the actual code (confidentiality issues) but here is an example of the code.
headings = ["losts", "of", "headings"]
with open("read_file.csv", mode="r") as read,\
open("write.csv", mode='w') as write:
reader = csv.DictReader(read, delimiter = ',')
writer = csv.DictWriter(write, fieldnames=headings)
writer.writeheader()
for row in reader:
writer.writerows()
At this stage I am just trying to return the same CSV in "write" as I have in "read"
I haven't used CSV much so not sure if I'm going about it the wrong way, also understand that this example is super simple but I can't seem to get my head around the logic of it.
You're really close!
headings = ["losts", "of", "headings"]
with open("read_file.csv", mode="r") as read, open("write.csv", mode='w') as write:
reader = csv.DictReader(read, delimiter = ',')
writer = csv.DictWriter(write, fieldnames=headings)
writer.writeheader()
for row in reader:
# do processing here to change the values in that one column
processed_row = some_function(row)
writer.writerow(processed_row)
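As a hedged illustration of what some_function could look like (the column name "price" and the doubling are purely hypothetical placeholders for your real calculation):

def some_function(row):
    # row is a dict keyed by the CSV headings; change one value and return the row.
    row["price"] = float(row["price"]) * 2  # hypothetical calculation
    return row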
Why don't you use pandas?

import pandas as pd

csv_file = pd.read_csv('read_file.csv')
# do_something_and_add_a_column_here()
csv_file.to_csv('write_file.csv', index=False)  # index=False keeps pandas from adding an extra index column
The problem I have can be illustrated by showing a couple of sample rows in my csv (semicolon-separated) file, which look like this:
4;1;"COFFEE; COMPANY";4
3;2;SALVATION ARMY;4
Notice that in one row, a string is in quotation marks and has a semi-colon inside of it (none of the columns have quotations marks around them in my input file except for the ones containing semicolons).
These rows with the quotation marks and semicolons are causing a problem: my code treats the semicolon inside the quoted field as a delimiter, so when it reads this row it looks as if the row has an extra field/column.
The desired output would look like this, with no quotation marks around "coffee company" and no semicolon between 'coffee' and 'company':
4;1;COFFEE COMPANY;4
3;2;SALVATION ARMY;4
Actually, this column with "coffee company" is totally useless to me, so the final file could look like this too:
4;1;xxxxxxxxxxx;4
3;2;xxxxxxxxxxx;4
How can I get rid of just the semi-colons inside of this one particular column, but without getting rid of all of the other semi-colons?
The csv module makes it relatively easy to handle a situation like this:
# Contents of input_file.csv
# 4;1;"COFFEE; COMPANY";4
# 3;2;SALVATION ARMY;4
import csv
input_file = 'input_file.csv' # Contents as shown in your question.
with open(input_file, 'r', newline='') as inp:
    for row in csv.reader(inp, delimiter=';'):
        row[2] = row[2].replace(';', '')  # Remove embedded ';' chars.
        # If you don't care about what's in the column, use the following instead:
        # row[2] = 'xyz'  # Value not needed.
        print(';'.join(row))
Printed output:
4;1;COFFEE COMPANY;4
3;2;SALVATION ARMY;4
Follow-on question: How to write this data to a new csv file?
import csv
input_file = 'input_file.csv' # Contents as shown in your question.
output_file = 'output_file.csv'
with open(input_file, 'r', newline='') as inp, \
        open(output_file, 'w', newline='') as outp:
    writer = csv.writer(outp, delimiter=';')
    for row in csv.reader(inp, delimiter=';'):
        row[2] = row[2].replace(';', '')  # Remove embedded ';' chars.
        writer.writerow(row)
Here's an alternative approach using the Pandas library, which spares you from writing the loops yourself:
import pandas as pd
#Read csv into dataframe df
df = pd.read_csv('data.csv', sep=';', header=None)
#Remove semicolon in column 2
df[2] = df[2].apply(lambda x: x.replace(';', ''))
This gives the following dataframe df:
0 1 2 3
0 4 1 COFFEE COMPANY 4
1 3 2 SALVATION ARMY 4
Pandas provides several inbuilt functions to help you manipulate data or make statistical conclusions. Having the data in a tabular format can also make working with it more intuitive.
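If you also want the cleaned result written to a new semicolon-separated file, as in the csv-module answer above, a minimal sketch building on the same dataframe (the output filename 'data_clean.csv' is just a placeholder):

import pandas as pd

df = pd.read_csv('data.csv', sep=';', header=None)
df[2] = df[2].apply(lambda x: x.replace(';', ''))

# Write it back out without pandas' index or an added header row.
df.to_csv('data_clean.csv', sep=';', header=False, index=False)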
I am creating a league table for a 6 a side football league and I am attempting to sort it by the points column and then display it in easygui. The code I have so far is this:
data = csv.reader(open('table.csv'), delimiter=',')
sortedlist = sorted(data, key=operator.itemgetter(7))

with open("Newtable.csv", "wb") as f:
    fileWriter = csv.writer(f, delimiter=',')
    for row in sortedlist:
        fileWriter.writerow(row)
        os.remove("table.csv")
        os.rename("Newtable.csv", "table.csv")
        os.close
The number 7 refers to the points column in my CSV file. The problem is that Newtable only contains the information for the team with the highest points, and table.csv is apparently being used by another process and so cannot be removed.
If anyone has any suggestions on how to fix this it would be appreciated.
If the indentation in your post is actually the indentation in your script (and not a copy-paste error), then the problem is obvious:
os.rename() is executed during the for loop (which means that it's called once per line in the CSV file!), at a point in time where Newtable.csv is still open (not by a different process but by your script itself), so the operation fails.
You don't need to close f, by the way - the with statement takes care of that for you. What you do need to close is data - that file is also still open when the call occurs.
Finally, since a csv object contains strings, and strings are sorted alphabetically, not numerically (so "10" comes before "2"), you need to sort according to the numerical value of the string, not the string itself.
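A quick illustration of why the cast matters:

>>> sorted(["10", "2", "1"])
['1', '10', '2']
>>> sorted(["10", "2", "1"], key=int)
['1', '2', '10']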
You probably want to do something like
with open('table.csv', 'rb') as infile:
    data = csv.reader(infile, delimiter=',')
    sortedlist = [next(data)] + sorted(data, key=lambda x: int(x[7]))  # or float?
    # next(data) reads the header before sorting the rest

with open("Newtable.csv", "wb") as f:
    fileWriter = csv.writer(f, delimiter=',')
    fileWriter.writerows(sortedlist)  # No for loop needed :)

os.remove("table.csv")
os.rename("Newtable.csv", "table.csv")
I'd suggest using pandas:
Assuming an input file like this:
team,points
team1, 5
team2, 6
team3, 2
You could do:
import pandas as pd

a = pd.read_csv('table.csv')
b = a.sort_values('points', ascending=False)  # DataFrame.sort was removed in newer pandas; use sort_values
b.to_csv('table.csv', index=False)