Newbie using Python (2.7.9): when I export a gzipped file to a CSV file using:
import csv
import gzip
myData = gzip.open('file.gz.DONE', 'rb')
myFile = open('output.csv', 'wb')
with myFile:
    writer = csv.writer(myFile)
    writer.writerows(myData)
print("Writing complete")
it writes the CSV with every single character delimited by a comma, e.g.:
S,V,R,","2,1,4,0,",",2,0,1,6,1,1,3,8,0,4,",",5,0,5,0,1,3,4,2,0,6,4,7,3,6,4,",",",",2,0,0,0,5,6,5,9,2,9,6,7,4,",",2,0,0,7,2,4,5,2,3,5,",",0,0,0,2,","
I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,2,1,4,4,9,3,7,0,",":,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",",","
"
S,V,R,",",4,7,3,3,5,5,",",2,0,5,7,",",5,0,5,0,1,4,5,0,1,6,4,8,6,3,7,",",",",2,0,0,0,5,5,3,9,2,9,2,8,0,",",2,0,4,4,1,0,8,3,7,8,",",0,0,0,2,","
I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,4,7,3,3,5,4,5,5,",",,:,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",",","
How do I get rid of these extra commas so that the fields are exported correctly? E.g.:
SVR,2144370,20161804,50501342364,,565929674,2007245235,0002,1,PPDAP,PPLUS,DEACTIVE,,,EN,N/A,214370,:IR_,N/A,,,,,
SVR,473455,208082557,14501648637,,2000553929280,2044108378,0002,1,3G,CODAP,INACTIVE,,,EN,N/A,35455,:IR_,N/A,,,,,
You are only opening the gzip file. I think you expect the opened file to act like an iterator, which it does. However, each item it yields is one line of text, i.e. a string. writerows() expects an iterable whose items are sequences of values, and it writes the values of each item separated by commas. Since a string is itself a sequence of characters, every character is treated as a separate field, which gives the result you found.
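A small Python 3 sketch of this behaviour, using io.StringIO instead of real files; the two sample lines are made up from the question's output. Passing the raw strings to writerows() splits them into characters, while splitting each line into a list first produces the intended rows (this assumes the lines contain plain comma-separated fields without quoting):

```python
import csv
import io

# Two lines as a file object would yield them: each item is one string.
lines = ["SVR,2144370,20161804\n", "IVE,EN,N/A\n"]

# Passing the strings straight to writerows(): each string is treated as a
# row whose fields are its individual characters.
wrong = io.StringIO()
csv.writer(wrong).writerows(lines)

# Splitting each line into a list of fields first gives the intended rows.
right = io.StringIO()
csv.writer(right).writerows(line.rstrip("\n").split(",") for line in lines)

print(wrong.getvalue())
print(right.getvalue())
```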
Since you didn't mention what the lines in the gzip data actually contain, I can't guess how to parse them into lists of sensible fields. But assuming a function called 'split_line' that is appropriate to that data, you could do:
import csv
import gzip
with gzip.open('file.gz.DONE', 'rb') as gzip_f:
    data = [split_line(l) for l in gzip_f]  # split_line: your own parser for one line
with open('output.csv', 'wb') as myFile:
    writer = csv.writer(myFile)
    writer.writerows(data)
print("Writing complete")
Of course, at this point it makes sense to process the data row by row and to combine the two with statements into one.
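A minimal Python 3 sketch of that row-by-row version. Both split_line and gzip_lines_to_csv are hypothetical names, and split_line here simply assumes plain comma-separated fields with no quoting; replace it with whatever parsing your data really needs:

```python
import csv
import gzip


def split_line(line):
    # Hypothetical parser: assumes plain comma-separated fields without quoting.
    return line.rstrip("\r\n").split(",")


def gzip_lines_to_csv(src, dst):
    # Both files are opened in a single with statement, and rows are written
    # one at a time, so the whole file never has to be held in memory.
    with gzip.open(src, "rt", newline="") as gzip_f, open(dst, "w", newline="") as out_f:
        writer = csv.writer(out_f)
        for line in gzip_f:
            writer.writerow(split_line(line))
```

Usage would be, for example, gzip_lines_to_csv('file.gz.DONE', 'output.csv').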
See https://docs.python.org/2/library/csv.html
I think it's simply because gzip.open() gives you a file-like object that yields strings, whereas csvwriter.writerows() needs an iterable of rows, for example a list of lists of strings, to do its work.
But I don't understand why you want to use the csv module. It looks like you only want to extract the content of the gzip file and save it uncompressed to an output file. You could do that like this (Python 3):
import gzip
input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'
# Decompress line by line into a plain text file
with gzip.open(input_file_name, 'rt') as input_file:
    with open(output_file_name, 'wt') as output_file:
        for line in input_file:
            output_file.write(line)
print("Writing complete")
If you want to use the csv module because you're not sure your input data is properly formatted (and you want an error message right away), you could do this instead:
import gzip
import csv
input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'
with gzip.open(input_file_name, 'rt', newline='') as input_file:
    reader_csv = csv.reader(input_file)
    with open(output_file_name, 'wt', newline='') as output_file:
        writer_csv = csv.writer(output_file)
        writer_csv.writerows(reader_csv)
print("Writing complete")
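One caveat on getting an error right away: csv.reader is lenient by default and only raises csv.Error for malformed quoting when it is created with strict=True. A small sketch, where parse is a hypothetical helper and the bad input (a stray character after a closing quote) is made up for illustration:

```python
import csv
import io


def parse(text, strict):
    # With strict=True the reader raises csv.Error on bad quoting;
    # with the default strict=False it quietly keeps going.
    return list(csv.reader(io.StringIO(text, newline=""), strict=strict))


bad = 'id,name\n1,"Ann"x\n'  # stray character after a closing quote

print(parse(bad, strict=False))  # lenient: both rows are still returned
try:
    parse(bad, strict=True)
except csv.Error as exc:
    print("csv.Error:", exc)
```

To apply this to the gzip example, pass strict=True to csv.reader(input_file).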
Is that what you were trying to do? It's difficult to guess because we don't have the input file.
If it's not what you want, could you clarify what you want?
Since I now know that the gzipped file itself contains comma-separated values, this simplifies to:
with gzip.open('file.gz.DONE', 'rb') as gzip_f, open('output.csv', 'wb') as myFile:
    myFile.write(gzip_f.read())
In other words, it is just a roundabout gunzip to another file.
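Note that gzip_f.read() loads the whole decompressed file into memory. A sketch of a streaming alternative that swaps in shutil.copyfileobj from the standard library, which copies in chunks; gunzip_file is a hypothetical name:

```python
import gzip
import shutil


def gunzip_file(src, dst):
    # Copy the decompressed bytes in fixed-size chunks instead of calling
    # read() on the whole file, so memory use stays flat for large inputs.
    with gzip.open(src, "rb") as gzip_f, open(dst, "wb") as out_f:
        shutil.copyfileobj(gzip_f, out_f)
```

For example, gunzip_file('file.gz.DONE', 'output.csv').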
I am reading in data of the following type from a file, and I need a method to store it for further calculations.
ID1 , ID2 , value
A , 1 , 520
A , 2 , 180
A , 3 , 80
B , 1 , 49
C , 1 , 96
C , 2 , 287
etc.
What is the best way to save it?
In Perl, I would have used a hash with separators, as follows, and then looked entries up by hash key and split them on the comma:
$data{$ID1} .= $ID2.':'.$value.',';
I have to solve this problem in Python, as it will be integrated with other code, but I am new to the language. Please suggest the best way to do it.
P.S. The input data file is huge (~500 MB) and could be larger.
Thanks for the help.
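For comparison, a rough Python counterpart of the Perl hash in the question is a dict mapping ID1 to a list of (ID2, value) pairs. A sketch, assuming the layout shown above (a header row, then three comma-separated columns padded with spaces, with integer ID2 and value); load_by_id1 is a hypothetical name:

```python
import csv
from collections import defaultdict


def load_by_id1(lines):
    # Python counterpart of the Perl hash: ID1 -> list of (ID2, value) pairs.
    # Assumes a header row, then three comma-separated columns padded with
    # spaces, where ID2 and value are integers.
    data = defaultdict(list)
    reader = csv.reader(lines)
    next(reader, None)  # skip the "ID1 , ID2 , value" header
    for row in reader:
        if not row:
            continue  # skip blank lines
        id1, id2, value = (field.strip() for field in row)
        data[id1].append((int(id2), int(value)))
    return data


# Usage with a real file (the name is hypothetical); the file is read line by
# line, so only the grouped result is held in memory:
# with open("input.txt", newline="") as f:
#     data = load_by_id1(f)
```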
If you've loaded your data with Python, and your next program down the line is also written in Python, you can simply use the pickle module, like this:
import pickle
big_list_list = [["A", 1, 520], ["A", 2, 180], ["B", 1, 49]]
# Storing the data
with open("data.pickle", "wb") as outfile:
    pickle.dump(big_list_list, outfile)
# Retrieving the data
with open("data.pickle", "rb") as infile:
    reconstructed_big_list_list = pickle.load(infile)
This has two caveats: if part of your workflow includes non-Python programs, they won't be able to read pickles; and you should never load pickle files from untrusted sources, since they can contain malicious code.
Instead of pickles, you can also use JSON files. Simply replace the word pickle with json in the recipe above, and open the files in text mode ("w" and "r") rather than binary mode, since the json module works with text. JSON has the advantage that many non-Python programs can read it.
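A sketch of the JSON version of the pickle recipe, with the files opened in text mode; data.json is an illustrative file name:

```python
import json

big_list_list = [["A", 1, 520], ["A", 2, 180], ["B", 1, 49]]

# Storing the data: json.dump writes text, so the file is opened with "w".
with open("data.json", "w") as outfile:
    json.dump(big_list_list, outfile)

# Retrieving the data
with open("data.json") as infile:
    reconstructed_big_list_list = json.load(infile)
```

One difference from pickle: JSON has no tuple type, so any tuples in your data come back as lists.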
Even more universal would be the use of CSV files, like this:
import csv
with open('data.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(big_list_list)
with open('data.csv', newline='') as infile:
    reader = csv.reader(infile)
    reconstructed_big_list_list = [row for row in reader]
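Unlike pickle and JSON, CSV does not preserve types: csv.reader returns every field as a string, so ["A", 1, 520] comes back as ["A", "1", "520"]. A sketch that converts the numeric columns back by hand, assuming the ID1, ID2, value layout from the question:

```python
import csv

big_list_list = [["A", 1, 520], ["A", 2, 180], ["B", 1, 49]]

with open("data.csv", "w", newline="") as outfile:
    csv.writer(outfile).writerows(big_list_list)

# csv.reader yields strings only, so the integer columns are converted back
# explicitly (this assumes the ID1, ID2, value layout above).
with open("data.csv", newline="") as infile:
    reconstructed_big_list_list = [
        [id1, int(id2), int(value)] for id1, id2, value in csv.reader(infile)
    ]
```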
Python's standard library also includes the sqlite3 module, which lets you write your data to a database. That might be useful if your data becomes more complicated than a simple list of lists, or if you need concurrent access.
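A minimal sqlite3 sketch with the same sample rows. The table and column names are made up for illustration, and ":memory:" keeps the database in RAM; a file name would store it on disk instead:

```python
import sqlite3

big_list_list = [["A", 1, 520], ["A", 2, 180], ["B", 1, 49]]

# ":memory:" keeps the database in RAM; pass a file name such as "data.db"
# (hypothetical) to keep it on disk instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id1 TEXT, id2 INTEGER, value INTEGER)")
conn.execute("CREATE INDEX idx_data_id1 ON data (id1)")  # fast lookups by ID1
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", big_list_list)
conn.commit()

# Fetch all (ID2, value) pairs for one ID1, much like a hash lookup in Perl.
rows = conn.execute(
    "SELECT id2, value FROM data WHERE id1 = ? ORDER BY id2", ("A",)
).fetchall()
print(rows)
conn.close()
```

Unlike CSV, the INTEGER columns come back as Python ints.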
P.S.: I just saw your note that your files could be huge. In that case, you could modify the CSV solution to store and load your data incrementally:
import csv
with open('data.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in big_list_list:
        writer.writerow(row)
with open('data.csv', newline='') as infile:
    reader = csv.reader(infile)
    for row in reader:
        print(row)
I am currently trying to write a CSV file in Python. The format is as follows:
1; 2.51; 12
123; 2.414; 142
EDIT: I already get the above format in my CSV, so the Python code seems OK. It appears to be an Excel issue, which is solved by changing the settings as @chucksmash mentioned.
However, when I try to open the generated CSV file with Excel, it doesn't recognize the decimal separator: 2.414 is treated as 2414 in Excel.
import csv
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";")
writer.writerow(some_array_with_floats)
Did you check that the CSV file is generated the way you want? Also, try specifying the delimiter character you are using for the CSV file when you import/open the file. In this case, it is a semicolon.
In Python 3, your code above will also raise a TypeError, because the file is opened in binary mode ('wb') while csv.writer writes str; that may be part of the problem.
I changed the mode in your open() call from 'wb' to 'w', since in Python 3 the csv module writes text, not binary data. This seemed to generate the result you were looking for.
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'w')
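For completeness, a Python 3 sketch of the same write in text mode that also passes newline='', as the csv documentation recommends, so the writer controls line endings itself. write_rows is a hypothetical helper, and the relative path data.csv stands in for the original Windows path:

```python
import csv


def write_rows(path, rows):
    # Python 3: csv.writer produces str, so the file is opened in text mode;
    # newline="" stops Python from translating the writer's "\r\n" endings.
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=";")
        writer.writerows(rows)


write_rows("data.csv", [[1, 2.51, 12], [123, 2.414, 142]])
```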
An ugly solution, if you really want to use ; as the separator:
import csv
import os
with open('a.csv', 'wb') as csvfile:
    csvfile.write('sep=;' + os.linesep)  # Excel-specific hint line naming the separator
    writer = csv.writer(csvfile, delimiter=";")
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
This will produce:
sep=;
1;2.51;12
123;2.414;142
which is recognized fine by Excel.
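The snippet above is Python 2 (it writes str to a file opened with 'wb'). A Python 3 sketch of the same trick, with a hypothetical helper name; the hint line is written with "\r\n" so it matches the csv module's default line terminator:

```python
import csv


def write_excel_semicolon_csv(path, rows):
    # Python 3 version of the "sep=;" trick. With newline="" the hint line
    # and the rows all end in "\r\n", the csv module's default terminator.
    with open(path, "w", newline="") as csvfile:
        csvfile.write("sep=;\r\n")
        writer = csv.writer(csvfile, delimiter=";")
        writer.writerows(rows)


write_excel_semicolon_csv("a.csv", [[1, 2.51, 12], [123, 2.414, 142]])
```

Keep in mind that the sep= line is only understood by Excel; other CSV readers, including Python's csv.reader, treat it as an ordinary first row.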
Personally, I would go with , as the separator, in which case you do not need the first line, so you can basically do:
import csv
with open('a.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)  # default delimiter is `,`
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
And Excel will recognize the format.
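One way to do this is to specify dialect=csv.excel in the writer (csv.excel is also the default dialect, and the explicit delimiter=";" overrides its comma delimiter). For example: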
import csv
a = [[1, 2.51, 12],[123, 2.414, 142]]
csvfile = open('data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";", dialect=csv.excel)
writer.writerows(a)
csvfile.close()
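If you need the semicolon variant in several places, one option is to subclass csv.excel and register it under a name. A Python 3 sketch; the class name and the dialect name "excel-semicolon" are made up for illustration:

```python
import csv
import io


class ExcelSemicolon(csv.excel):
    # Hypothetical dialect: Excel's defaults (minimal quoting, "\r\n" line
    # endings) with ";" as the field delimiter.
    delimiter = ";"


csv.register_dialect("excel-semicolon", ExcelSemicolon)

buf = io.StringIO()
writer = csv.writer(buf, dialect="excel-semicolon")
writer.writerows([[1, 2.51, 12], [123, 2.414, 142]])
print(buf.getvalue())
```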
Unless Excel is already configured to use a semicolon as its default delimiter, you will need to import data.csv using Data > From Text and choose semicolon as the delimiter on step 2 of the Text Import Wizard.
Very little documentation is provided for the Dialect class (csv.Dialect). More information about it is in the "Dialects" section of PyMOTW's "csv – Comma-separated value files" article on the Python csv module. More information about csv.writer() is available at https://docs.python.org/2/library/csv.html#csv.writer.