Python csv: Optimizing header output

I am writing a huge (160k+ rows) set of data from SQL to a CSV. My script works exactly as intended, but I am sure there must be a more efficient way of including a header in the output. I cobbled together the following from reading "writing header in csv python with DictWriter", but feel it lacks elegance.
Here's my code:
f = open(outfile,'w')
wf = csv.DictWriter(f, fieldnames, restval='OOPS')
wf.writer.writerow(wf.fieldnames)
f.close()
f = open(outfile,'a')
wf = csv.writer(f)
wf.writerows(rows)
f.close()
fieldnames is defined explicitly (10 custom column names), rows contains the fetchall() from my query.

Untested, but I don't see why this shouldn't do the job:
import csv

with open(outfile, "wb") as outf:
    outcsv = csv.writer(outf)
    outcsv.writerow(fieldnames)
    outcsv.writerows(rows)
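On Python 3 the same approach needs text mode with newline='' instead of "wb"; a minimal, self-contained sketch (the field names and rows below are placeholders standing in for the question's fieldnames and fetchall() result):

```python
import csv

# Placeholders standing in for the question's explicit fieldnames and
# the rows returned by cursor.fetchall().
fieldnames = ["col%d" % i for i in range(10)]
rows = [tuple(range(10)), tuple(range(10, 20))]

# Python 3: open in text mode with newline='' so the csv module
# controls the line endings itself (avoids blank lines on Windows).
with open("out.csv", "w", newline="") as outf:
    outcsv = csv.writer(outf)
    outcsv.writerow(fieldnames)  # header row first
    outcsv.writerows(rows)       # then all data rows in one call
```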

Related

csv writing alternate whitespace in lines within a loop python

I have the following code for generating a CSV file:
import csv
csvData = [["HAI", "Hello"]]
while True:
    with open('test.csv', 'a') as csvFile:
        writer = csv.writer(csvFile, delimiter=',', quoting=csv.QUOTE_ALL)
        writer.writerows(csvData)
When I run this code it generates a CSV file like this:
1. "HAI","Hello"
3. "HAI","Hello"
That is, it writes on alternate lines: the first write lands on line 1, the second on line 3. I want the second write to land on line 2. Can anybody please help?
If you look at the documentation for the CSV formatting parameters, you will see that there is a lineterminator argument, which defaults to \r\n.
You can get your desired output by simply passing a plain \n like so:
writer = csv.writer(csvFile, delimiter=",", quoting=csv.QUOTE_ALL, lineterminator="\n")
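Put together with the loop from the question, a sketch might look like this (the infinite while(1) is replaced by a fixed number of iterations so the example terminates):

```python
import csv

csvData = [["HAI", "Hello"]]

# Three appends stand in for the question's endless while(1) loop.
for _ in range(3):
    with open('test.csv', 'a') as csvFile:
        writer = csv.writer(csvFile, delimiter=',', quoting=csv.QUOTE_ALL,
                            lineterminator='\n')  # plain \n: no skipped lines
        writer.writerows(csvData)
```

Each append now lands on the next line instead of every other line.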

Python changing Comma Delimitation CSV

Newbie using Python (2.7.9) - when I export a gzipped file to a CSV using:
myData = gzip.open('file.gz.DONE', 'rb')
myFile = open('output.csv', 'wb')
with myFile:
    writer = csv.writer(myFile)
    writer.writerows(myData)
    print("Writing complete")
It prints into the CSV with a comma delimiter between every character, e.g.
S,V,R,","2,1,4,0,",",2,0,1,6,1,1,3,8,0,4,",",5,0,5,0,1,3,4,2,0,6,4,7,3,6,4,",",",",2,0,0,0,5,6,5,9,2,9,6,7,4,",",2,0,0,7,2,4,5,2,3,5,",",0,0,0,2,","
I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,2,1,4,4,9,3,7,0,",":,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",",","
"
S,V,R,",",4,7,3,3,5,5,",",2,0,5,7,",",5,0,5,0,1,4,5,0,1,6,4,8,6,3,7,",",",",2,0,0,0,5,5,3,9,2,9,2,8,0,",",2,0,4,4,1,0,8,3,7,8,",",0,0,0,2,","
I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,4,7,3,3,5,4,5,5,",",,:,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",",","
How do I get rid of the comma so that it is exported with the correct fields? eg.
SVR,2144370,20161804,50501342364,,565929674,2007245235,0002,1,PPDAP,PPLUS,DEACTIVE,,,EN,N/A,214370,:IR_,N/A,,,,,
SVR,473455,208082557,14501648637,,2000553929280,2044108378,0002,1,3G,CODAP,INACTIVE,,,EN,N/A,35455,:IR_,N/A,,,,,
You are only opening the gzip file. I think you are expecting the opened file to act automatically as an iterator, which it does; however, each item it yields is a text string. writerows expects an iterable whose items are each a sequence of values to write with comma separation. So, given an iterator whose items are strings, and given that a string is a sequence of characters, you get the result you found.
Since you didn't mention what the gzip data lines really contain, I can't guess how to parse the lines into an array of reasonable chunks. But assuming a function called split_line appropriate to that data, you could do:
with gzip.open('file.gz.DONE', 'rb') as gzip_f:
    data = [split_line(l) for l in gzip_f]

with open('output.csv', 'wb') as myFile:
    writer = csv.writer(myFile)
    writer.writerows(data)
    print("Writing complete")
Of course, at this point it makes sense to process the file row by row and combine the two with statements into one.
See https://docs.python.org/2/library/csv.html
I think it's simply because gzip.open() gives you a file-like object, while csvwriter.writerows() needs an iterable of lists of strings to do its work.
But I don't understand why you want to use the csv module at all. It looks like you only want to extract the content of the gzip file and save it uncompressed in an output file. You could do that like this:
import gzip
input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'
with gzip.open(input_file_name, 'rt') as input_file:
    with open(output_file_name, 'wt') as output_file:
        for line in input_file:
            output_file.write(line)
print("Writing complete")
If you want to use the csv module because you're not sure your input data is properly formatted (and you want an error message right away) you could then do:
import gzip
import csv
input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'
with gzip.open(input_file_name, 'rt', newline='') as input_file:
    reader_csv = csv.reader(input_file)
    with open(output_file_name, 'wt', newline='') as output_file:
        writer_csv = csv.writer(output_file)
        writer_csv.writerows(reader_csv)
print("Writing complete")
Is that what you were trying to do? It's difficult to guess without the input file. If it's not what you want, could you clarify what you are after?
Since I now have the information that the gzipped file itself contains comma-separated values, it simplifies to this:
with gzip.open('file.gz.DONE', 'rb') as gzip_f, open('output.csv', 'wb') as myFile:
    myFile.write(gzip_f.read())
In other words, it is just a roundabout gunzip to another file.

I have a list of IDs, each of which is again associated with several IDs and some values. How do I store this data in Python?

I am reading in data of the following type from a file, and I need a method to store it for further calculations.
ID1 , ID2 , value
A , 1 , 520
A , 2 , 180
A , 3 , 80
B , 1 , 49
C , 1 , 96
C , 2 , 287
etc.
What is the best way to save it?
In Perl, I would have used a hash and separators, as follows, and then looked up by hash key and split on the comma:
$data{$ID1} .= $ID2.':'.$value.',';
I have to address this problem in Python, as it will be integrated with other code, but I am new to the language. Please suggest what might be the best way to do it.
P.S. The input data file is huge (~500 MB) and could grow.
Thanks for the help.
If you've loaded your data with Python, and your next program down the line is also written in Python, you can simply use the pickle module, like this:
import pickle

big_list_list = [["A", 1, 520], ["A", 2, 180], ["B", 1, 49]]

# Storing the data
with open("data.pickle", "wb") as outfile:
    pickle.dump(big_list_list, outfile)

# Retrieving the data
with open("data.pickle", "rb") as infile:
    reconstructed_big_list_list = pickle.load(infile)
This has two caveats: if part of your workflow includes non-Python programs, they won't be able to read pickles, and you shouldn't trust pickle files from arbitrary sources, since they can execute malicious code.
Instead of pickle, you can also use JSON files: simply replace the word pickle with json in the recipe above. JSON has the advantage that many non-Python programs can read it.
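For illustration, the JSON variant of the same store/retrieve round trip might look like this (the file name is arbitrary):

```python
import json

big_list_list = [["A", 1, 520], ["A", 2, 180], ["B", 1, 49]]

# Storing the data: JSON files are text, not binary
with open("data.json", "w") as outfile:
    json.dump(big_list_list, outfile)

# Retrieving the data
with open("data.json") as infile:
    reconstructed_big_list_list = json.load(infile)
```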
Even more universal would be the use of CSV files, like this:
import csv

with open('data.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(big_list_list)

with open('data.csv', newline='') as infile:
    reader = csv.reader(infile)
    reconstructed_big_list_list = [row for row in reader]
Python's standard library also includes the module sqlite3, which allows you to write your data to a database, which might be useful if your data becomes more complicated than a simple list of lists, or you need concurrent access.
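As a rough sketch of the sqlite3 route (the table and column names here are made up for illustration), the ID1-keyed lookup from the question maps naturally onto a query:

```python
import sqlite3

big_list_list = [["A", 1, 520], ["A", 2, 180], ["B", 1, 49]]

conn = sqlite3.connect(":memory:")  # pass a filename instead for persistence
conn.execute("CREATE TABLE data (id1 TEXT, id2 INTEGER, value INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", big_list_list)

# All (ID2, value) pairs for one ID1, analogous to the Perl hash lookup
rows = conn.execute(
    "SELECT id2, value FROM data WHERE id1 = ? ORDER BY id2", ("A",)
).fetchall()
conn.close()
```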
PS: I just saw that you noted your files could be huge. In this case, you could modify the CSV solution to store and load your data incrementally:
import csv

with open('data.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in big_list_list:
        writer.writerow(row)

with open('data.csv', newline='') as infile:
    reader = csv.reader(infile)
    for row in reader:
        print(row)

Excel disregards decimal separators when working with Python generated CSV file

I am currently trying to write a CSV file in Python. The format is as follows:
1; 2.51; 12
123; 2.414; 142
EDIT: I already get the above format in my CSV, so the Python code seems OK. It appears to be an Excel issue, which is solved by changing the settings as @chucksmash mentioned.
However, when I try to open the generated CSV file with Excel, it doesn't recognize the decimal separators: 2.414 is treated as 2414.
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";")
writer.writerow(some_array_with_floats)
Did you check that the CSV file is generated correctly, the way you want? Also, try specifying the delimiter character you are using for the CSV file when you import/open it. In this case, it is a semicolon.
For Python 3, I think your above code will also run into a TypeError, which may be part of the problem.
I just made a modification with your open method to be 'w' instead of 'wb' since the array has float and not binary data. This seemed to generate the result that you were looking for.
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'w')
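On Python 3 it is also worth passing newline='' when opening the file, so the csv module's own \r\n line endings are not translated a second time; a small self-contained sketch (file name and data are placeholders):

```python
import csv

some_array_with_floats = [1, 2.51, 12]  # placeholder for the question's data

# Python 3: text mode plus newline='' avoids both the TypeError from 'wb'
# and doubled line endings on Windows.
with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=';')
    writer.writerow(some_array_with_floats)
```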
An ugly solution, if you really want to use ; as the separator:
import csv
import os
import csv
import os

with open('a.csv', 'wb') as csvfile:
    csvfile.write('sep=;' + os.linesep)  # first line tells Excel the separator
    writer = csv.writer(csvfile, delimiter=";")
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
This will produce:
sep=;
1;2.51;12
123;2.414;142
which is recognized fine by Excel.
I personally would go with , as the separator, in which case you do not need the first line, so you can basically do:
import csv
with open('a.csv', 'wb') as csvfile:
writer = csv.writer(csvfile) # default delimiter is `,`
writer.writerow([1, 2.51, 12])
writer.writerow([123, 2.414, 142])
And excel will recognize what is going on.
A way to do this is to specify dialect=csv.excel in the writer. For example:
a = [[1, 2.51, 12],[123, 2.414, 142]]
csvfile = open('data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";", dialect=csv.excel)
writer.writerows(a)
csvfile.close()
Unless Excel is already configured to use semicolon as its default delimiter, it will be necessary to import data.csv using Data/FromText and specify semicolon as the delimiter in the Text Import Wizard step 2 screen.
Very little documentation is provided for the Dialect class at csv.Dialect. More information about it is in the Dialects section of the PyMOTW article "csv – Comma-separated value files" on the Python csv module, and more about csv.writer() is available at https://docs.python.org/2/library/csv.html#csv.writer.
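As a sketch, a reusable semicolon dialect can also be defined by subclassing csv.Dialect; the attributes shown are the ones the csv module validates when the dialect is first used (the dialect name 'semicolon' is made up):

```python
import csv

class SemicolonDialect(csv.Dialect):
    # csv validates these attributes when the dialect is used
    delimiter = ';'
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\r\n'
    quoting = csv.QUOTE_MINIMAL

csv.register_dialect('semicolon', SemicolonDialect)

with open('data.csv', 'w', newline='') as f:  # Python 3 style file opening
    writer = csv.writer(f, dialect='semicolon')
    writer.writerows([[1, 2.51, 12], [123, 2.414, 142]])
```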

Downloading Google spreadsheet to csv - csv.writer adding a delimiter after every character

Going off the code from here: https://gist.github.com/cspickert/1650271. Instead of printing, I want to write to a CSV file.
Added this at the bottom:
# Request a file-like object containing the spreadsheet's contents
csv_file = gs.download(ss)

# Write CSV object to a file
with open('test.csv', 'wb') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerows(csv_file)
Maybe I need to transform csv_file before I can write it?
The documentation says:
csvwriter.writerows(rows) - Write all the rows parameters (a list of row objects as described above) to the writer's file object, formatted according to the current dialect.
Since csv_file is a file-like object, you need to convert it to a list of rows:
rows = csv.reader(csv_file)
a.writerows(rows)
Or, better yet, you can simply write to file:
csv_file = gs.download(ss)
with open('test.csv', 'wb') as fp:
    fp.write(csv_file.read())
