Python changing Comma Delimitation CSV - python
Newbie using Python (2.7.9). When I export a gzipped file to a CSV using:
myData = gzip.open('file.gz.DONE', 'rb')
myFile = open('output.csv', 'wb')
with myFile:
    writer = csv.writer(myFile)
    writer.writerows(myData)
print("Writing complete")
The CSV comes out with a comma between every character, e.g.:
S,V,R,","2,1,4,0,",",2,0,1,6,1,1,3,8,0,4,",",5,0,5,0,1,3,4,2,0,6,4,7,3,6,4,",",",",2,0,0,0,5,6,5,9,2,9,6,7,4,",",2,0,0,7,2,4,5,2,3,5,",",0,0,0,2,","
I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,2,1,4,4,9,3,7,0,",":,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",",","
"
S,V,R,",",4,7,3,3,5,5,",",2,0,5,7,",",5,0,5,0,1,4,5,0,1,6,4,8,6,3,7,",",",",2,0,0,0,5,5,3,9,2,9,2,8,0,",",2,0,4,4,1,0,8,3,7,8,",",0,0,0,2,","
I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,4,7,3,3,5,4,5,5,",",,:,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",",","
How do I get rid of the commas so that it is exported with the correct fields? e.g.:
SVR,2144370,20161804,50501342364,,565929674,2007245235,0002,1,PPDAP,PPLUS,DEACTIVE,,,EN,N/A,214370,:IR_,N/A,,,,,
SVR,473455,208082557,14501648637,,2000553929280,2044108378,0002,1,3G,CODAP,INACTIVE,,,EN,N/A,35455,:IR_,N/A,,,,,
You are only opening the gzip file. I think you are expecting the opened file to act automatically like an iterator, which it does. However, each item it yields is a text string, while writerows expects an iterable whose items are each a sequence of field values to write comma-separated. So given an iterator whose items are strings, and given that a string is itself a sequence of characters, you get the result you found.
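The effect is easy to reproduce in isolation. A minimal sketch (using an in-memory io.StringIO instead of a real file, purely for illustration) shows csv.writer splitting plain strings into one-character fields:

```python
import csv
import io

# Passing bare strings to writerows: each string is itself a sequence,
# so every character becomes its own field.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(["SVR", "IVE"])
print(buf.getvalue())  # S,V,R and I,V,E on separate lines
```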
Since you didn't mention what the gzip data lines really contain, I can't guess how to parse each line into an array of reasonable chunks. But assuming a function called 'split_line' appropriate to that data, you could do:
with gzip.open('file.gz.DONE', 'rb') as gzip_f:
    data = [split_line(l) for l in gzip_f]
with open('output.csv', 'wb') as myFile:
    writer = csv.writer(myFile)
    writer.writerows(data)
print("Writing complete")
Of course, at this point processing row by row and combining the two with statements makes sense.
See https://docs.python.org/2/library/csv.html
I think it's simply because gzip.open() gives you a file-like object, but csv.writer's writerows() needs a list of lists of strings to do its work.
But I don't understand why you want to use the csv module. It looks like you only want to extract the content of the gzip file and save it uncompressed in an output file. You could do that like this:
import gzip

input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'
with gzip.open(input_file_name, 'rt') as input_file:
    with open(output_file_name, 'wt') as output_file:
        for line in input_file:
            output_file.write(line)
print("Writing complete")
If you want to use the csv module because you're not sure your input data is properly formatted (and you want an error message right away), you could then do:
import gzip
import csv

input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'
with gzip.open(input_file_name, 'rt', newline='') as input_file:
    reader_csv = csv.reader(input_file)
    with open(output_file_name, 'wt', newline='') as output_file:
        writer_csv = csv.writer(output_file)
        writer_csv.writerows(reader_csv)
print("Writing complete")
Is that what you were trying to do? It's difficult to guess because we don't have the input file.
If it's not what you want, could you clarify what you need?
Since I now have the information that the gzipped file itself contains comma-separated values, it simplifies thus:
with gzip.open('file.gz.DONE', 'rb') as gzip_f, open('output.csv', 'wb') as myFile:
    myFile.write(gzip_f.read())
In other words, it is just a roundabout gunzip to another file.
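If the file might be large, a chunked copy with shutil.copyfileobj avoids holding the whole decompressed content in memory. A sketch (the sample file is created first only to keep the snippet runnable):

```python
import gzip
import shutil

# Create a small sample gzip file so the sketch is self-contained.
with gzip.open('file.gz.DONE', 'wb') as f:
    f.write(b'SVR,473455,0002\n')

# Stream the decompressed bytes to the output in chunks.
with gzip.open('file.gz.DONE', 'rb') as gzip_f, open('output.csv', 'wb') as myFile:
    shutil.copyfileobj(gzip_f, myFile)
```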
Related
os.walk-ing through directory to read and write all the CSVs
I have a bunch of folders and sub-folders with CSVs that have quotation marks that I need to get rid of, so I'm trying to build a script that iterates through and performs the operation on all CSVs. Below is the code I have. It correctly identifies what is and is not a CSV, and it re-writes them all, but it's writing blank data in, not the row data without the quotation marks. I know this is happening around the lines where I open the input and output files, but I don't know what to do.
import csv
import os

rootDir = '.'
for dirName, subDirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        # Check if it's a .csv first
        if fname.endswith('.csv'):
            input = csv.reader(open(fname, 'r'))
            output = open(fname, 'w')
            with output:
                writer = csv.writer(output)
                for row in input:
                    writer.writerow(row)
        # Skip if not a .csv
        else:
            print 'Not a .csv!!'
The problem is here:
input = csv.reader(open(fname, 'r'))
output = open(fname, 'w')
As soon as you do that second open in 'w' mode, it erases the file. So your input is looping over an empty file. One way to fix this is to read the whole file into memory, and only then erase and rewrite it:
input = csv.reader(open(fname, 'r'))
contents = list(input)
output = open(fname, 'w')
with output:
    writer = csv.writer(output)
    for row in contents:
        writer.writerow(row)
You can simplify this quite a bit:
with open(fname, 'r') as infile:
    contents = list(csv.reader(infile))
with open(fname, 'w') as outfile:
    csv.writer(outfile).writerows(contents)
Alternatively, you can write to a temporary file as you go, and then move the temporary file on top of the original file. This is a bit more complicated, but it has a major advantage: if you hit an error (or someone turns off the computer) in the middle of writing, you still have the old file and can start over, instead of having 43% of the new file while all your old data is lost:
dname = os.path.dirname(fname)
with open(fname, 'r') as infile, tempfile.NamedTemporaryFile('w', dir=dname, delete=False) as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        writer.writerow(row)
os.replace(outfile.name, fname)
If you're not using Python 3.3+, you don't have os.replace. On Unix you can just use os.rename instead, but on Windows it's a pain to get this right, and you probably want to look for a third-party library on PyPI. (I haven't used any of them, but if you're using Windows XP/2003 or later and Python 2.6/3.2 or later, pyosreplace looks promising.)
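Worth noting for the original goal of removing quotation marks: simply round-tripping rows through csv.reader and csv.writer already drops quotes that aren't needed, because the writer defaults to QUOTE_MINIMAL. A small in-memory sketch:

```python
import csv
import io

# The reader parses "quoted" fields into plain strings; the writer
# (QUOTE_MINIMAL by default) only re-quotes fields that need it,
# such as ones containing the delimiter.
src = '"a","b,c","d"\n'
rows = list(csv.reader(io.StringIO(src)))
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())  # a,"b,c",d
```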
How to write output file in CSV format in python?
I tried to write the output file as a CSV but get either an error or not the expected result. I am using Python 3.5.2 and also 2.7. In Python 3.5 I get this error:
wr.writerow(var)
TypeError: a bytes-like object is required, not 'str'
and in Python 2.7 I get all the column results in one column. Expected result: an output file in the same format as the input file. Code:
import csv

f1 = open("input_1.csv", "r")
resultFile = open("out.csv", "wb")
wr = csv.writer(resultFile, quotechar=',')

def sort_duplicates(f1):
    for i in range(0, len(f1)):
        f1.insert(f1.index(f1[i])+1, f1[i])
        f1.pop(i+1)

for var in f1:
    #print (var)
    wr.writerow([var])
If I use resultFile = open("out.csv", "w"), I get one extra row in the output file. With the above code I get one extra row and column.
On Python 3, csv requires that you open the file in text mode, not binary mode. Drop the b from your file mode. You should really use newline='' too:
resultFile = open("out.csv", "w", newline='')
Better still, use the file object as a context manager to ensure it is closed automatically:
with open("input_1.csv", "r") as f1, \
        open("out.csv", "w", newline='') as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    for var in f1:
        wr.writerow([var.rstrip('\n')])
I've also stripped the newline from each line of f1 and put the line in a list; csv.writer.writerow wants a sequence of columns, not a single string. Quoting the csv.writer() documentation:
If csvfile is a file object, it should be opened with newline='' [1]. [...] All other non-string data are stringified with str() before being written.
[1] If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
Others have answered that you should open the output file in text mode when using Python 3, i.e.
with open('out.csv', 'w', newline='') as resultFile:
    ...
But you also need to parse the incoming CSV data. As it is, your code reads each line of the input CSV file as a single string. Then, without splitting that line into its constituent fields, it passes the string to the CSV writer. As a result, the csv.writer treats the string as a sequence and outputs each character, including any terminating newline character, as a separate field. For example, if your input CSV file contains:
1,2,3,4
your output file would be written like this:
1,",",2,",",3,",",4," "
You should change the for loop to this:
for row in csv.reader(f1):
    # process the row
    wr.writerow(row)
Now the input CSV file will be parsed into fields and row will contain a list of strings, one for each field. For the previous example:
for row in csv.reader(f1):
    print(row)
['1', '2', '3', '4']
And when that list is passed to the csv.writer, the output to the file will be:
1,2,3,4
Putting all of that together you get this code:
import csv

with open('input_1.csv') as f1, open('out.csv', 'w', newline='') as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    for row in csv.reader(f1):
        wr.writerow(row)
Open the file without b mode; b mode opens your file as binary. You can open the file with w:
open_file = open("filename.csv", "w")
You are opening the input file in normal read mode, but the output file is opened in binary mode. The correct way is:
resultFile = open("out.csv", "w")
As shown above, if you replace "wb" with "w" it will work.
Loop that will iterate a certain number of times through a CSV in Python
I have a large CSV file (~250000 rows) and before I work on fully parsing and sorting it, I was trying to display only a part of it by writing it to a text file.
csvfile = open(file_path, "rb")
rows = csvfile.readlines()
text_file = open("output.txt", "w")
row_num = 0
while row_num < 20:
    text_file.write(", ".join(row[row_num]))
    row_num += 1
text_file.close()
I want to iterate through the CSV file and write only a small section of it to a text file so I can look at it and see if it would be of any use to me. Currently the text file ends up empty. One way I thought might do this is to iterate through the file with a for loop that exits after a certain number of iterations, but I could be wrong and I'm not sure how to do it. Any ideas?
There's nothing specifically wrong with what you're doing, but it's not particularly Pythonic. In particular, reading the whole file into memory with readlines() at the start seems pointless if you're only using 20 lines. Instead you could use a for loop with enumerate and break when necessary:
csvfile = open(file_path, "rb")
text_file = open("output.txt", "w")
for i, row in enumerate(csvfile):
    text_file.write(row)
    if i >= 20:
        break
text_file.close()
You could further improve this by using with blocks to open the files, rather than closing them explicitly. For example:
with open(file_path, "rb") as csvfile:
    # your code here involving csvfile
# now the csvfile is closed!
Also note that Python might not be the best tool for this; you could do it directly from Bash, for example, with just head -n20 csvfile.csv > output.txt.
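As a middle ground between a manual counter and shelling out to head, itertools.islice can take the first 20 lines directly. A sketch (the sample file is generated here only to make the snippet self-contained):

```python
from itertools import islice

# Generate a small sample CSV so the sketch runs on its own.
file_path = "test.csv"
with open(file_path, "w") as f:
    f.writelines("row%d,a,b\n" % n for n in range(100))

# islice stops after the first 20 lines, no manual counter needed.
with open(file_path) as csvfile, open("output.txt", "w") as text_file:
    text_file.writelines(islice(csvfile, 20))
```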
A simple solution would be to just do:
#!/usr/bin/python
# -*- encoding: utf-8 -*-

file_path = './test.csv'

with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
        for i, row in enumerate(csvfile):
            textfile.write(row)
            if i >= 20:
                break
Explanation:
with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
Instead of using open and close, it is recommended to use these lines. Just write the code you want to execute while your files are open at a new level of indentation. 'rb' and 'wb' are the keywords you need to open a file in 'reading' and 'writing' in 'binary mode' respectively.
for i, row in enumerate(csvfile):
This line lets you read the CSV file line by line, and the tuple (i, row) gives you both the content of each row and its index. That's one of the awesome built-in functions of Python: check the documentation for more about it. Hope this helps!
EDIT: Note that Python has a csv package that can do this without enumerate:
# -*- encoding: utf-8 -*-
import csv

file_path = './test.csv'

with open(file_path, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    with open('output.txt', 'wb') as textfile:
        writer = csv.writer(textfile)
        i = 0
        while i < 20:
            row = next(reader)
            writer.writerow(row)
            i += 1
All we need are its reader and writer. They have the functions next (which reads one line) and writerow (which writes one). Note that here the variable row is not a string but a list of strings, because the reader does the splitting by itself. It might be faster than the previous solution. Also, this has the major advantage of letting you look anywhere you want in the file, not necessarily from the beginning (just change the bounds for i).
Excel disregards decimal separators when working with Python generated CSV file
I am currently trying to write a CSV file in Python. The format is as follows:
1; 2.51; 12
123; 2.414; 142
EDIT: I already get the above format in my CSV, so the Python code seems OK. It appears to be an Excel issue which is solved by changing the settings as @chucksmash mentioned.
However, when I try to open the generated CSV file with Excel, it doesn't recognize decimal separators. 2.414 is treated as 2414 in Excel.
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";")
writer.writerow(some_array_with_floats)
Did you check that the CSV file is generated correctly, as you want? Also, try to specify the delimiter character you're using for the CSV file when you import/open it. In this case, it is a semicolon.
For Python 3, I think your above code will also run into a TypeError, which may be part of the problem. I just modified your open call to use 'w' instead of 'wb', since the array has floats and not binary data. This seemed to generate the result you were looking for:
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'w')
An ugly solution, if you really want to use ; as the separator:
import csv
import os

with open('a.csv', 'wb') as csvfile:
    csvfile.write('sep=;' + os.linesep)  # new line
    writer = csv.writer(csvfile, delimiter=";")
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
This will produce:
sep=;
1;2.51;12
123;2.414;142
which is recognized fine by Excel. I personally would go with , as the separator, in which case you do not need the first line, so you can basically do:
import csv

with open('a.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)  # default delimiter is `,`
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
And Excel will recognize what is going on.
A way to do this is to specify dialect=csv.excel in the writer. For example:
a = [[1, 2.51, 12], [123, 2.414, 142]]

csvfile = open('data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";", dialect=csv.excel)
writer.writerows(a)
csvfile.close()
Unless Excel is already configured to use semicolon as its default delimiter, it will be necessary to import data.csv using Data/From Text and specify semicolon as the delimiter in the Text Import Wizard step 2 screen. Very little documentation is provided for the Dialect class at csv.Dialect. More information about it is in the Dialects section of the PyMOTW article "csv – Comma-separated value files" on the Python csv module. More information about csv.writer() is available at https://docs.python.org/2/library/csv.html#csv.writer.
Downloading Google spreadsheet to csv - csv.writer adding a delimiter after every character
Going off the code from here: https://gist.github.com/cspickert/1650271
Instead of printing, I want to write to a CSV file. Added this at the bottom:
# Request a file-like object containing the spreadsheet's contents
csv_file = gs.download(ss)

# Write CSV object to a file
with open('test.csv', 'wb') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerows(csv_file)
Maybe I need to transform csv_file before I can write it?
The documentation says:
csvwriter.writerows(rows)
Write all the rows parameters (a list of row objects as described above) to the writer's file object, formatted according to the current dialect.
Since csv_file is a file-like object, you need to convert it to a list of rows:
rows = csv.reader(csv_file)
a.writerows(rows)
Or, better yet, you can simply write its content to the file directly:
csv_file = gs.download(ss)
with open('test.csv', 'wb') as fp:
    fp.write(csv_file.read())