CSV Problems and Help Needed with Code


So I am trying to create a Python script that will sort a specified column of an Excel sheet. So far my code is...
import csv
import operator
with open('case_name.csv') as infile:
    data = list(csv.reader(infile, dialect=csv.excel_tab))
data.sort(key=operator.itemgetter(2))
with open('case_name_sorted.csv', 'w') as outfile:
    writer = csv.writer(outfile, dialect='excel')
    writer.writerows(data)
However, when I run this code I continue to get an error that says...
data = list(csv.reader(infile, dialect=csv.excel_tab))
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
I did some research and found out that .csv files do not work well on a Mac. So what should I change the file to that will keep it working like an Excel sheet? Also, if anyone has any pointers on how else I could sort a column, I would very much appreciate some tips. Thanks!

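Try opening the file in universal-newline mode:
open('case_name.csv', mode='rU')
See https://docs.python.org/2/library/functions.html#open for the available modes. Note that 'rU' applies to Python 2; in Python 3 the csv documentation recommends opening the file with newline='' instead.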
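For readers on Python 3, a minimal sketch of the same sort might look like the following. It assumes the file is tab-separated (as the excel_tab dialect in the question suggests), has no header row, and that every row has enough fields; the function name sort_csv_by_column and its parameters are made up for illustration.

```python
import csv
import operator


def sort_csv_by_column(src_path, dst_path, column, delimiter="\t"):
    """Sort the rows of src_path by one column and write them to dst_path.

    Hypothetical helper: assumes no header row and that every row has
    at least column + 1 fields.
    """
    # newline="" lets the csv module deal with \r, \n and \r\n endings itself.
    with open(src_path, newline="") as infile:
        rows = list(csv.reader(infile, delimiter=delimiter))

    # Values are compared as strings, so "10" sorts before "9".
    rows.sort(key=operator.itemgetter(column))

    with open(dst_path, "w", newline="") as outfile:
        csv.writer(outfile, dialect="excel").writerows(rows)
    return rows
```

If the column holds numbers, a key such as lambda row: float(row[column]) would sort numerically instead.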

Related

Multiline CSV read using Python3

Every day we get CSV files from a vendor, and we need to parse them and insert the data into a database. We use a single Python 3 program for all of these tasks.
The problem occurs with multiline CSV files, where the content on the continuation lines is skipped.
48.11363;11.53402;81369;München;"";1.0;1962;I would need
help from
Stackoverflow;"";"";"";289500.0;true;""
Here the field "I would need help from Stackoverflow" is spread over 3 lines.
Python 3 treats only "I would need" as part of the record and skips the rest.
At present I am using the options below to read the file:
with open(file_path, newline='', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=',' , quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        {MY LOGIC}
Is there any way to read a multiline CSV field as part of a single record?
I understand that PySpark has option("multiline", True), but we don't want to use PySpark in the first place.
I am looking for options.
Thanks in advance.
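The csv module only keeps a line break inside a field when that field is quoted, and the sample above is not quoted. One possible workaround, sketched here as a heuristic rather than a definitive fix, is to glue physical lines together until a record has the expected number of fields. The function name read_merged_rows, the expected_fields parameter, and the choice to join the pieces with a space are all assumptions for illustration. The sketch cannot detect a break inside the last field of a record.

```python
import csv


def read_merged_rows(lines, expected_fields, delimiter=";"):
    """Yield rows, merging physical lines until a row has enough fields.

    Hypothetical helper: assumes every complete record has exactly
    expected_fields fields and that a broken field is never the last one.
    """
    pending = ""
    for line in lines:
        line = line.rstrip("\r\n")
        # Join continuation lines with a single space.
        pending = f"{pending} {line}" if pending else line
        row = next(csv.reader([pending], delimiter=delimiter))
        if len(row) >= expected_fields:
            yield row
            pending = ""
    if pending:
        # Emit whatever is left, even if it is short, so nothing is lost.
        yield next(csv.reader([pending], delimiter=delimiter))
```

With the sample record, which has 14 fields, this could be called as read_merged_rows(f, expected_fields=14) on the open file f.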

read csv file as a string in python

I accidentally corrupted a CSV file (the delimiters no longer work, thanks Microsoft Excel!). I want to salvage some data by reading the file as a string and searching it. I can see the text when I open the file in Notepad, but I can't figure out how to load it as a string from the file path in Python.
I imagine it would be a variation of
csv_string = open(filepath, 'something').read()
but I can't get it to work, or find a solution on SO / google.

It should work with the following code, but it is not the best way to deal with csv.
csv_string = ''.join(open(filepath, 'r').readlines())
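A slightly more defensive variant, sketched under the assumption that the corrupted file is mostly UTF-8, reads the whole file in one call and replaces any bytes that cannot be decoded. The helper names load_text and lines_containing are hypothetical.

```python
from pathlib import Path


def load_text(filepath, encoding="utf-8"):
    """Return the whole file as one string."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return Path(filepath).read_text(encoding=encoding, errors="replace")


def lines_containing(text, needle):
    """Return the lines of text that contain needle."""
    return [line for line in text.splitlines() if needle in line]
```

For example, lines_containing(load_text(filepath), "some value") would list the lines worth salvaging.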

Something like:
with open(filepath, 'r') as corrupted_file:
    for line in corrupted_file:
        print(line) # Or whatever

You can read the CSV with the csv module:
import csv
reader = csv.reader(open("samples/sample.csv"))
for title, year, director in reader:
    print(year, title)

Generating CSV and blank line

I am generating and parsing CSV files and I'm noticing something odd.
When the CSV gets generated, there is always an empty line at the end, which is causing issues when subsequently parsing them.
My code to generate is as follows:
with open(file, 'wb') as fp:
    a = csv.writer(fp, delimiter=",", quoting=csv.QUOTE_NONE, quotechar='')
    a.writerow(["Id", "Builing", "Age", "Gender"])
    results = get_results()
    for val in results:
        if any(val):
            a.writerow(val)
The empty line doesn't show up on the command line, but I do see it in my IDE/text editor.
Does anyone know why it is doing this?
Could it be stray whitespace?

Is the problem the line terminator? It could be as simple as changing one line:
a = csv.writer(fp, delimiter=",", quoting=csv.QUOTE_NONE, quotechar='', lineterminator='\n')
I suspect this is it, since csv.writer defaults to carriage return + line feed ("\r\n") as the line terminator. The program you are using to read the file might expect just a line feed ("\n"). This is a common issue when moving files back and forth between *nix and Windows.
If this doesn't work, then the program reading the file seems to expect no line terminator after the last row, and I'm not sure the csv module supports that. In that case, you could write the CSV to a StringIO, strip() the result, and then write that to your file.
Also, since you are not quoting anything, is there a reason to use csv at all? Why not:
with open(file, 'wb') as fp:
    fp.write("\n".join(",".join(record) for record in get_results()))
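The StringIO suggestion above could be sketched roughly as follows for Python 3. The function name rows_to_csv_text is made up for illustration, and the sketch removes only the single trailing line terminator rather than calling strip(), so that whitespace inside the last field is left alone.

```python
import csv
import io


def rows_to_csv_text(rows, lineterminator="\n"):
    """Render rows as CSV text without a terminator after the last row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator=lineterminator).writerows(rows)
    text = buffer.getvalue()
    # Drop exactly one trailing terminator, if present.
    if text.endswith(lineterminator):
        text = text[: -len(lineterminator)]
    return text
```

The result can then be written with open(path, "w", newline="") so that Python does not translate the line endings again.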

Excel disregards decimal separators when working with Python generated CSV file

I am currently trying to write a csv file in python. The format is as following:
1; 2.51; 12
123; 2.414; 142
EDIT: I already get the above format in my CSV, so the Python code seems OK. It appears to be an Excel issue, which is solved by changing the settings as @chucksmash mentioned.
However, when I try to open the generated CSV file with Excel, it doesn't recognize the decimal separators: 2.414 is treated as 2414 in Excel.
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";")
writer.writerow(some_array_with_floats)

Did you check that the CSV file is generated the way you want? Also, try specifying the delimiter character you are using when you import/open the file in Excel. In this case, it is a semicolon.

In Python 3, I think the code above will also raise a TypeError, which may be part of the problem.
I changed the mode in your open call from 'wb' to 'w', since the csv writer produces text rather than binary data in Python 3. This seemed to generate the result you were looking for.
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'w')

An ugly solution, if you really want to use ; as the separator:
import csv
import os
with open('a.csv', 'wb') as csvfile:
    csvfile.write('sep=;'+ os.linesep) # new line
    writer = csv.writer(csvfile, delimiter=";")
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
This will produce:
sep=;
1;2.51;12
123;2.414;142
which is recognized fine by Excel.
Personally, I would go with , as the separator, in which case you do not need the first line, so you can simply write:
import csv
with open('a.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile) # default delimiter is `,`
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
And Excel will recognize what is going on.
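The snippets above open the file in 'wb' mode, which is the Python 2 convention. A rough Python 3 equivalent of the sep=; approach is sketched below; the function name write_excel_semicolon_csv is hypothetical, and it takes an already opened text file so that the caller controls the path and encoding.

```python
import csv
import io


def write_excel_semicolon_csv(fileobj, rows):
    """Write rows as semicolon-separated CSV, preceded by a sep=; hint line."""
    # The hint line uses the same \r\n ending that csv.writer uses by default.
    fileobj.write("sep=;\r\n")
    writer = csv.writer(fileobj, delimiter=";")
    writer.writerows(rows)


# In-memory demonstration; a real file would be opened with
# open("a.csv", "w", newline="") to keep the \r\n endings intact.
demo = io.StringIO()
write_excel_semicolon_csv(demo, [[1, 2.51, 12], [123, 2.414, 142]])
```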

A way to do this is to specify dialect=csv.excel in the writer. For example:
a = [[1, 2.51, 12],[123, 2.414, 142]]
csvfile = open('data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";", dialect=csv.excel)
writer.writerows(a)
csvfile.close()
Unless Excel is already configured to use a semicolon as its default delimiter, you will need to import data.csv via Data > From Text and specify the semicolon as the delimiter in step 2 of the Text Import Wizard.
The documentation for csv.Dialect is sparse. More information is in the "Dialects" section of PyMOTW's "csv – Comma-separated value files" article on the Python csv module. More information about csv.writer() is available at https://docs.python.org/2/library/csv.html#csv.writer.

Python CSV: Remove quotes from value

I have a process where a CSV file can be downloaded, edited then uploaded again. On the download, the CSV file is in the correct format, with no wrapping double quotes
1, someval, someval2
When I open the CSV in a spreadsheet, edit and save, it adds double quotes around the strings
1, "someEditVal", "someval2"
I figured this was just the action of the spreadsheet (in this case, OpenOffice). I want my upload script to remove the wrapping double quotes. I cannot remove all quotes, in case the body contains them, and I also don't want to just check the first and last characters for double quotes.
I'm almost sure that the csv library in Python knows how to handle this, but I'm not sure how to use it...
EDIT
When I use the values within a dictionary, they turn out as follows
{'header':'"value"'}
Thanks

For your example, the following works:
import csv
writer = csv.writer(open("out.csv", "wb"), quoting=csv.QUOTE_NONE)
reader = csv.reader(open("in.csv", "rb"), skipinitialspace=True)
writer.writerows(reader)
You might need to play with the dialect options of the CSV reader and writer -- see the documentation of the csv module.

Thanks to everyone who tried to help me, but I figured it out. When creating the reader, you can define the quotechar:
csv.reader(upload_file, delimiter=',', quotechar='"')
This handles the wrapping quotes of strings.
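A small Python 3 sketch of this, run against the sample line from the question, is shown below. It also passes skipinitialspace=True, on the assumption that the space after each comma in the sample is really in the file; without it the reader would not see the quote as the first character of the field. The helper name read_rows is made up for illustration.

```python
import csv
import io


def read_rows(fileobj):
    """Parse CSV rows, dropping wrapping quotes but keeping embedded ones."""
    reader = csv.reader(
        fileobj,
        delimiter=",",
        quotechar='"',
        skipinitialspace=True,  # ignore the space that follows each comma
    )
    return list(reader)


# In-memory sample: the second row has a doubled quote inside a quoted field.
sample = io.StringIO('1, "someEditVal", "someval2"\n2, "say ""hi""", plain\n')
rows = read_rows(sample)
```

The same keyword arguments can be passed to csv.DictReader if the values are needed in a dictionary.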

For Python 3:
import csv
writer = csv.writer(open("query_result.csv", "wt"), quoting=csv.QUOTE_NONE, escapechar='\\')
reader = csv.reader(open("out.txt", "rt"), skipinitialspace=True)
writer.writerows(reader)
The original answer raises the error below under Python 3. For details, see the SO question "csv.Error: iterator should return strings, not bytes".
Traceback (most recent call last):
  File "remove_quotes.py", line 11, in <module>
    writer.writerows(reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
