Cleaning unicode characters while writing to csv


I am using a REST API to get data and then attempting to write it to a CSV file using Python 2.7.
In the CSV, every item inside a tuple has u' ' around it. For example, for the 'tags' field I am retrieving, I get [u'01d/02d/major/--', u'45m/04h/12h/24h', u'internal', u'net', u'premium_custom', u'priority_fields_swapped', u'priority_saved', u'problem', u'urgent', u'urgent_priority_issue']. However, if I print the data in the program before it is written to the CSV, the data looks fine, i.e. ('01d/02d/major--', '45m/04h/12h/24h', etc.). So I assume I have to modify something in the CSV write command or in the csv writer object itself. My question is how to write the data to the CSV properly so that there are no unicode characters.

In Python 3:
Just specify the encoding when opening the CSV file for writing.
If the row contains non-ASCII characters, you will get a UnicodeEncodeError.
    import csv
    row = [u'01d/02d/major/--', u'45m/04h/12h/24h', u'internal', u'net', u'premium_custom', u'priority_fields_swapped', u'priority_saved', u'problem', u'urgent', u'urgent_priority_issue']
    with open('output.csv', 'w', newline='', encoding='ascii') as f:
        writer = csv.writer(f)
        writer.writerow(row)
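If the u'...' prefixes come from a whole list or tuple being written into a single cell (an assumption, since the question does not show the writing code), the csv module writes the container's repr into that cell. A minimal Python 3 sketch that joins the tags into one plain string first; the 'ticket-1' value and the ';' separator are made up for illustration:

```python
import csv
import io

tags = ['internal', 'net', 'urgent']  # shortened sample of the 'tags' field

buf = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buf)

# Passing the list as one field writes its repr, brackets and quotes included.
writer.writerow(['ticket-1', tags])

# Joining first gives a clean cell; ';' is an arbitrary separator choice.
writer.writerow(['ticket-1', ';'.join(tags)])

raw_line, clean_line = buf.getvalue().splitlines()
```

Here raw_line still carries the list syntax, while clean_line holds only the tag text.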

Related

Making a CSV via Excel gives me 'ï»¿' in front of the first column name

I generated a CSV via Excel, and when printing the key names I get some weird characters prepended to the first key, like so:
    keys(['ï»¿row1', 'row2'])
Here is the code:
    import csv
    path = 'C:\\Users\\asdf\\Desktop\\file.csv'
    with open(path, 'r') as file:
        reader = csv.DictReader(file)
        for row in reader:
            print(row.keys())
However, if I just create the CSV in the IDE, everything works fine and no strange characters are printed. How can I read the Excel CSV so that the strange characters are chopped off?

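    with open(path, 'r', encoding='utf-8-sig') as file:
This worked. The utf-8-sig codec skips the UTF-8 byte order mark (BOM) that Excel writes at the start of the file.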
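A small self-contained sketch of the difference between the two encodings. The temporary file and the read_header helper are hypothetical, standing in for the Excel export and the reading code:

```python
import csv
import os
import tempfile

# Write a small file that starts with a UTF-8 BOM, like the Excel export.
path = os.path.join(tempfile.mkdtemp(), 'file.csv')
with open(path, 'wb') as f:
    f.write(b'\xef\xbb\xbfrow1,row2\r\n1,2\r\n')

def read_header(encoding):
    # newline='' is what the csv docs recommend for files passed to csv readers.
    with open(path, 'r', newline='', encoding=encoding) as f:
        return csv.DictReader(f).fieldnames

with_bom = read_header('utf-8')         # BOM stays attached to the first key
without_bom = read_header('utf-8-sig')  # BOM is dropped while decoding
```

With plain utf-8 the first key starts with the character U+FEFF; with utf-8-sig it is just 'row1'.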

creating a csv file from a function result, python

I am using this PDF-to-CSV function from {Python module for converting PDF to text}, and I was wondering how I can now export the result to a CSV file on my drive. I tried adding this to the function:
    with open('C:\location', 'wb') as f:
        writer = csv.writer(f)
        for row in data:
            writer.writerow(row)
but the resulting CSV file has one character per row instead of the rows I see when printing data in Python.

If you are printing a single character per row, then what you have is a string. Your loop
    for row in data:
translates to
    for character in string:
so you need to break your string up into the chunks you want written on a single row. You might be able to use something like data.split(), but it's hard to say without seeing more of your code and data.
In response to your comment:
Yes, you can just dump the data to a CSV... if it adheres to the rules of CSV. If your data is separated by commas, with each row terminated by a newline, then you can just write your data to a file.
    with open("file.csv", 'w') as f:
        f.write(data)
This will ONLY work if your data adheres to the rules of CSV.
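To illustrate breaking a string into rows before handing it to the writer, here is a minimal Python 3 sketch. The sample string is a hypothetical stand-in for the function's result, and it assumes fields are separated by plain commas with no quoting:

```python
import csv
import io

# Hypothetical stand-in for the string returned by the PDF conversion function.
data = "name,qty\nwidget,3\ngadget,5"

buf = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buf)
for line in data.splitlines():
    # Each line becomes one row; each comma-separated piece becomes one cell.
    writer.writerow(line.split(','))

result = buf.getvalue()
```

If the fields themselves can contain commas or quotes, a plain split(',') is not enough and csv.reader is the safer way to parse the lines.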

Using Python's CSV library to print an array as a csv file

I have a python list as such:
[['a','b','c'],['d','e','f'],['g','h','i']]
I am trying to get it into a csv format so I can load it into excel:
a,b,c
d,e,f
g,h,i
Using this, I am trying to write the array to a csv file:
    with open('tables.csv','w') as f:
        f.write(each_table)
However, it prints out this:
[
[
'
a
'
,
...
...
So then I tried putting it into an array (again) and then printing it.
    each_table_array=[each_table]
    with open('tables.csv','w') as f:
        f.write(each_table_array)
Now when I open the csv file, it's a bunch of unknown characters, and when I load it into Excel, I get a character in every cell.
I'm not sure whether I'm using the csv library wrong or whether the problem is the array portion.
I just figured out that the table I am pulling data from has another table within one of its cells; this expands out and messes up the whole formatting.

You need to use the csv library for your job:
    import csv
    each_table = [['a','b','c'],['d','e','f'],['g','h','i']]
    with open('tables.csv', 'w') as csvfile:
        writer = csv.writer(csvfile)
        for row in each_table:
            writer.writerow(row)

A more flexible and Pythonic way is to use the csv module for dealing with CSV files. Note that in Python 3 you need to pass newline='' to the open function. Then you can use csv.writer to write to your CSV file:
    import csv
    with open('file_name.csv', 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',')
        spamwriter.writerows(main_list)
From the Python csv documentation: If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
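A quick round-trip check of that note, as a Python 3 sketch. The temporary path and the sample rows (one of which contains an embedded newline) are made up for illustration:

```python
import csv
import os
import tempfile

# One field deliberately contains a newline to exercise the quoting rules.
main_list = [['a', 'b', 'c'], ['d', 'line one\nline two', 'f']]
path = os.path.join(tempfile.mkdtemp(), 'file_name.csv')

with open(path, 'w', newline='') as csvfile:
    csv.writer(csvfile).writerows(main_list)

# Reading back with newline='' keeps the embedded newline inside its field.
with open(path, 'r', newline='') as csvfile:
    round_trip = list(csv.reader(csvfile))
```

With newline='' on both sides, the rows read back are identical to the rows written.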

Python csv writer : "Unknown Dialect" Error

I have a very large string in the CSV format that will be written to a CSV file.
I am trying to write it to a CSV file using the simplest of Python scripts:
results=""" "2013-12-03 23:59:52","/core/log","79.223.39.000","logging-4.0",iPad,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,3,"1385593191.865",true,ERROR,"app_error","iPad/Unknown/webkit/537.51.1",NA,"Does+not",false
"2013-12-03 23:58:41","/core/log","217.7.59.000","logging-4.0",Win32,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,4,"1385593120.68",true,ERROR,"app_error","Win32/Unknown/msie/9.0",NA,"Does+not,false
"2013-12-03 23:58:19","/core/client_log","79.240.195.000","logging-4.0",Win32,"5.1","1.0.1.59-266060",NA,NA,NA,NA,6,"1385593099.001",true,ERROR,"app_error","Win32/5.1/mozilla/25.0",NA,"Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}",false"""
    resultArray = results.split('\n')
    with open(csvfile, 'wb') as f:
        writer = csv.writer(f)
        for row in resultArray:
            writer.writerows(row)
The code returns an "Unknown Dialect" error.
Is the error because of the script, or is it due to the string that is being written?
EDIT
If the problem is bad input, how do I sanitize it so that it can be used by the csv.writer() method?

You need to specify the format of your string:
    with open(csvfile, 'wb') as f:
        writer = csv.writer(f, delimiter=',', quotechar="'", quoting=csv.QUOTE_ALL)
You might also want to revisit your writing loop; the way you have it written, you will get one column in your file, and each row will be one character from the results string.
To really exploit the module, try this:
    import csv
    lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
    reader = csv.reader(lines, quotechar="'")
    with open('out.csv', 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(list(reader))
out.csv will have:
    A,bunch+of,multiline,"CSV,LIKE,STRING"
If you want to quote all the column values, then add quoting=csv.QUOTE_ALL to the writer object; then your file will have:
    "A","bunch+of","multiline","CSV,LIKE,STRING"
To change the quotes to ', add quotechar="'" to the writer object.
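The three writer variants described above can be compared side by side. This is a Python 3 sketch that writes to an in-memory buffer instead of out.csv; the dump helper is hypothetical and exists only to avoid repeating the setup:

```python
import csv
import io

lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
# Parse the single-quoted input into a list of rows (each row is a list of fields).
rows = list(csv.reader(lines, quotechar="'"))

def dump(**writer_options):
    # Write the parsed rows with the given csv.writer options and return the text.
    buf = io.StringIO()
    csv.writer(buf, **writer_options).writerows(rows)
    return buf.getvalue()

minimal = dump()                                     # quote only where needed
quoted = dump(quoting=csv.QUOTE_ALL)                 # quote every field
single = dump(quoting=csv.QUOTE_ALL, quotechar="'")  # quote every field with '
```

Only the field containing commas is quoted in the default output; the other two variants quote every field.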

The above code does not give csv.writer.writerows the input that it expects. Specifically:
    resultArray = results.split('\n')
This creates a list of strings. Then, you pass each string to your writer and tell it to writerows with it:
    for row in resultArray:
        writer.writerows(row)
But writerows does not expect a single string. From the docs:
    csvwriter.writerows(rows)
    Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.
So you're passing a string to a method that expects its argument to be a list of row objects, where a row object is itself expected to be a sequence of strings or numbers:
    A row must be a sequence of strings or numbers for Writer objects
Are you sure your listed example code accurately reflects your attempt? While it certainly won't work, I would expect the exception produced to be different.
For a possible fix - if all you are trying to do is to write a big string to a file, you don't need the csv library at all. You can just write the string directly. Even splitting on newlines is unnecessary unless you need to do something like replacing Unix-style linefeeds with DOS-style linefeeds.
If you need to use the csv module after all, you need to give your writer something it understands - in this example, that would be something like writer.writerow(['A','bunch+of','multiline','CSV,LIKE,STRING']). Note that that's a true Python list of strings. If you need to turn your raw string "'A','bunch+of','multiline','CSV,LIKE,STRING'" into such a list, I think you'll find the csv library useful as a reader - no need to reinvent the wheel to handle the quoted commas in the substring 'CSV,LIKE,STRING'. And in that case you would need to care about your dialect.
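The difference between the two writer methods can be sketched as follows, in Python 3 with an in-memory buffer. The second and third rows are invented sample data:

```python
import csv
import io

buf = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buf)

# writerow takes ONE row: a sequence of strings or numbers.
writer.writerow(['A', 'bunch+of', 'multiline', 'CSV,LIKE,STRING'])

# writerows takes a LIST of rows and writes each one in turn.
writer.writerows([['d', 'e'], ['f', 'g']])

output = buf.getvalue()
```

The field that contains commas is quoted automatically, so no manual escaping is needed.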

You can use csv.register_dialect, for example for escaped formatting:
    csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)
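Once registered, the dialect is selected by name when creating a writer. A minimal Python 3 sketch, with a made-up sample row and an in-memory buffer in place of a file:

```python
import csv
import io

csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)

buf = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buf, dialect='escaped')  # refer to the dialect by its registered name

# QUOTE_ALL quotes every field; doublequote=True doubles quotes inside a field.
writer.writerow(['say "hi"', 'plain'])
line = buf.getvalue()
```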

Non Ascii character export on csv with python

I have a list of lists that I export to a CSV file. Some of the list entries are strings, and some of them contain non-ASCII characters.
For example: Name = "Ömer Berin"
I tried Name.encode('utf-8') before exporting, but in the CSV the name shows up like this: "Γ–mer Berin"
I use this code for exporting:
    with open("output.csv", "wb") as f:
        writer = csv.writer(f)
        writer.writerows(mylist)

The UnicodeWriter class from the csv examples in the Python 2 documentation should satisfy your needs: http://docs.python.org/2/library/csv.html#csv-examples
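In Python 3 the UnicodeWriter recipe is not needed, because open accepts an encoding directly. A minimal sketch, assuming Python 3; the temporary path is made up, and choosing utf-8-sig (UTF-8 with a leading BOM) is an assumption aimed at viewers such as Excel that may otherwise guess a legacy code page:

```python
import csv
import os
import tempfile

mylist = [['Name'], ['Ömer Berin']]
path = os.path.join(tempfile.mkdtemp(), 'output.csv')

# Text mode with an explicit encoding; the csv module receives str, not bytes.
with open(path, 'w', newline='', encoding='utf-8-sig') as f:
    csv.writer(f).writerows(mylist)

# Read the raw bytes back to see what actually landed on disk.
with open(path, 'rb') as f:
    raw = f.read()
```

The file starts with the UTF-8 BOM, and the name is stored as ordinary UTF-8 bytes after it.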
