I generated a CSV via Excel, and when printing the key names I get some weird characters attached to the first key, like so:
dict_keys(['ï»¿row1', 'row2'])
import csv

path = 'C:\\Users\\asdf\\Desktop\\file.csv'
with open(path, 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row.keys())
However, if I just create the CSV in the IDE, everything works fine and no strange characters are printed. How can I read the Excel CSV so that the strange characters are stripped?
Open the file with the utf-8-sig encoding, which strips the byte-order mark (BOM) that Excel writes at the start of UTF-8 files:

with open(path, 'r', encoding='utf-8-sig') as file:

This worked.
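A minimal sketch of the fix, assuming the file name is illustrative; the '\ufeff' written below is the BOM character that Excel prepends to UTF-8 CSVs:

```python
import csv

# Simulate an Excel-style CSV that starts with a UTF-8 BOM.
with open('file.csv', 'w', encoding='utf-8') as f:
    f.write('\ufeffrow1,row2\n1,2\n')

# 'utf-8-sig' consumes the BOM, so the first field name comes back clean.
with open('file.csv', 'r', encoding='utf-8-sig', newline='') as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)  # ['row1', 'row2']
```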
I am using a certain REST API to get data, and then attempting to write it to a CSV using Python 2.7.
In the CSV, every item within a tuple has u' ' around it. For example, with the 'tags' field I am retrieving, I am getting [u'01d/02d/major/--', u'45m/04h/12h/24h', u'internal', u'net', u'premium_custom', u'priority_fields_swapped', u'priority_saved', u'problem', u'urgent', u'urgent_priority_issue']. However, if I print the data in the program prior to it being written to the CSV, the data looks fine, i.e. ('01d/02d/major/--', '45m/04h/12h/24h', etc.). So I am assuming I have to modify something in the CSV write command or within the csv writer object itself. My question is how to write the data into the CSV properly so that there are no Unicode prefixes.
In Python 3, just define the encoding when opening the CSV file for writing; the u'' prefixes disappear because all strings are Unicode. Note that if a row contains non-ASCII characters, encoding='ascii' will raise a UnicodeEncodeError:

import csv

row = [u'01d/02d/major/--', u'45m/04h/12h/24h', u'internal', u'net', u'premium_custom', u'priority_fields_swapped', u'priority_saved', u'problem', u'urgent', u'urgent_priority_issue']
with open('output.csv', 'w', newline='', encoding='ascii') as f:
    writer = csv.writer(f)
    writer.writerow(row)
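If the data may contain non-ASCII text, the same write with encoding='utf-8' avoids the UnicodeEncodeError; a minimal sketch with illustrative values and file name:

```python
import csv

row = ['café', 'naïve']  # values that would fail with encoding='ascii'
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerow(row)
```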
I have a dictionary scraped from a Chinese website. All processed with unicode. Now I want to write data into a csv file. The first line contains all the dict.keys() and the second line contains all the dict.values()
How to write this dictionary into the csv? Specifically, I need all the Chinese characters displayed in csv. I am having trouble with converting them.
Thanks in advance,
data = {u'\u6ce8\u518c\u8d44\u672c': u'6500\u4e07\u5143\u4eba\u6c11\u5e01[8]', u'\u7ecf\u8425\u8303\u56f4': u'\u4e92\u8054\u7f51', u'\u5b98\u7f51': u'http://www.tencent.com/', u'\u6210\u7acb\u65f6\u95f4': u'1998\u5e7411\u670811\u65e5[8]', u'\u6ce8\u518c\u53f7': u'440301103448669[8]', u'\u5e74\u8425\u4e1a\u989d': u'1028.63\u4ebf\u5143\u4eba\u6c11\u5e01\uff082015\u5e74\uff09[9]', u'\u521b\u59cb\u4eba': u'\u9a6c\u5316\u817e\u5f20\u5fd7\u4e1c\u8bb8\u6668\u6654\u9648\u4e00\u4e39\u66fe\u674e\u9752[10]', u'\u603b\u90e8\u5730\u70b9': u'\u4e2d\u56fd\u6df1\u5733', u'\u603b\u88c1': u'\u5218\u70bd\u5e73', u'\u6ce8\u518c\u5730': u'\u6df1\u5733', u'\u5916\u6587\u540d\u79f0': u'Tencent', u'\u8463\u4e8b\u5c40\u4e3b\u5e2d': u'\u9a6c\u5316\u817e', u'\u5458\u5de5\u6570': u'2.5\u4e07\u4f59\u4eba\uff082014\u5e74\uff09', u'\u516c\u53f8\u6027\u8d28': u'\u6709\u9650\u8d23\u4efb\u516c\u53f8[8]', u'\u516c\u53f8\u53e3\u53f7': u'\u4e00\u5207\u4ee5\u7528\u6237\u4ef7\u503c\u4e3a\u4f9d\u5f52', u'\u4f01\u4e1a\u613f\u666f': u'\u6700\u53d7\u5c0a\u656c\u7684\u4e92\u8054\u7f51\u4f01\u4e1a', u'\u516c\u53f8\u4f7f\u547d': u'\u901a\u8fc7\u4e92\u8054\u7f51\u670d\u52a1\u63d0\u5347\u4eba\u7c7b\u751f\u6d3b\u54c1\u8d28', u'\u6cd5\u5b9a\u4ee3\u8868\u4eba': u'\u9a6c\u5316\u817e', u'\u767b\u8bb0\u673a\u5173': u'\u6df1\u5733\u5e02\u5e02\u573a\u76d1\u7763\u7ba1\u7406\u5c40\u5357\u5c71\u5c40[8]', u'\u516c\u53f8\u540d\u79f0': u'\u6df1\u5733\u5e02\u817e\u8baf\u8ba1\u7b97\u673a\u7cfb\u7edf\u6709\u9650\u516c\u53f8[8]'}
It would be trivial if you were using Python 3, which natively uses Unicode:

import csv

with open("file.csv", "w", newline='', encoding='utf8') as fd:
    dw = csv.DictWriter(fd, fieldnames=data.keys())
    dw.writeheader()
    dw.writerow(data)
As you prefixed your Unicode strings with u, I assume that you use Python 2. The csv module is great at processing CSV files, but the Python 2 version does not natively process Unicode strings. To process a Unicode dict, you can just encode its keys and values in UTF-8:

import csv

utf8data = { k.encode('utf8'): v.encode('utf8') for (k, v) in data.iteritems() }
with open("file.csv", "wb") as fd:
    dw = csv.DictWriter(fd, fieldnames=utf8data.keys())
    dw.writeheader()
    dw.writerow(utf8data)
Try the codecs module:

import codecs

with codecs.open(filename, "w", "utf-8") as f:
    for key, value in data.iteritems():
        f.write(key + ',' + value + '\n')

This should have the desired behaviour. Note that this writes one key,value pair per line and does no quoting, so it will break if a key or value contains a comma.
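If fields may contain commas, or you want the keys-on-one-line / values-on-the-next layout from the question, the csv module handles the quoting for you. A sketch in Python 3, using two of the keys from the question's data for illustration:

```python
import csv

data = {'外文名称': 'Tencent', '官网': 'http://www.tencent.com/'}
with open('file.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(data.keys())    # first line: all the keys
    writer.writerow(data.values())  # second line: all the values
```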
The 'utf-8' encoding solves the problem. One approach for converting the dictionary into a CSV file is the pandas library, which handles it easily:
import pandas as pd
df = pd.DataFrame.from_dict(data, orient='index')
df.to_csv('output.csv', encoding='utf-8', header=None)
I have a python list as such:
[['a','b','c'],['d','e','f'],['g','h','i']]
I am trying to get it into a csv format so I can load it into excel:
a,b,c
d,e,f
g,h,i
Using this, I am trying to write the array to a csv file:

with open('tables.csv', 'w') as f:
    f.write(each_table)
However, it prints out this:
[
[
'
a
'
,
...
...
So then I tried putting it into an array (again) and then printing it.
each_table_array=[each_table]
with open('tables.csv', 'w') as f:
    f.write(each_table_array)

Now when I open up the csv file, it's a bunch of unknown characters, and when I load it into Excel, I get a character in every cell.
Not too sure if it's me using the csv library wrong, or the array portion.
I just figured out that the table I am pulling data from has another table within one of its cells; this expands out and messes up the whole formatting.
You need to use the csv library for this job (newline='' prevents extra blank rows on Windows):

import csv

each_table = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
with open('tables.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in each_table:
        writer.writerow(row)
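The loop can also be collapsed into a single writerows call, which produces the same output for the same list:

```python
import csv

each_table = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
with open('tables.csv', 'w', newline='') as csvfile:
    # writerows takes the whole list of rows at once
    csv.writer(csvfile).writerows(each_table)
```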
As a more flexible and Pythonic way, use the csv module for dealing with CSV files. Note that in Python 3 you need to pass newline='' to the open() function (in Python 2 you would open the file in binary mode, 'wb', instead). Then you can use csv.writer to open your CSV file for writing:

import csv

with open('file_name.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerows(main_list)
From the Python docs: If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
I have a very large string in the CSV format that will be written to a CSV file.
I try to write it to CSV using the simplest of Python scripts:
results=""" "2013-12-03 23:59:52","/core/log","79.223.39.000","logging-4.0",iPad,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,3,"1385593191.865",true,ERROR,"app_error","iPad/Unknown/webkit/537.51.1",NA,"Does+not",false
"2013-12-03 23:58:41","/core/log","217.7.59.000","logging-4.0",Win32,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,4,"1385593120.68",true,ERROR,"app_error","Win32/Unknown/msie/9.0",NA,"Does+not,false
"2013-12-03 23:58:19","/core/client_log","79.240.195.000","logging-4.0",Win32,"5.1","1.0.1.59-266060",NA,NA,NA,NA,6,"1385593099.001",true,ERROR,"app_error","Win32/5.1/mozilla/25.0",NA,"Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}",false"""
resultArray = results.split('\n')

with open(csvfile, 'wb') as f:
    writer = csv.writer(f)
    for row in resultArray:
        writer.writerows(row)
The code returns an "Unknown Dialect" error.
Is the error because of the script or is it due to the string that is being written?
EDIT
If the problem is bad input how do I sanitize it so that it can be used by the csv.writer() method?
You need to specify the format of your string:
with open(csvfile, 'wb') as f:
    writer = csv.writer(f, delimiter=',', quotechar="'", quoting=csv.QUOTE_ALL)
You might also want to re-visit your writing loop; the way you have it written you will get one column in your file, and each row will be one character from the results string.
To really exploit the module, try this:
import csv

lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
reader = csv.reader(lines, quotechar="'")
with open('out.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(list(reader))
out.csv will have:
A,bunch+of,multiline,"CSV,LIKE,STRING"
If you want to quote all the column values, then add quoting=csv.QUOTE_ALL to the writer object; then you file will have:
"A","bunch+of","multiline","CSV,LIKE,STRING"
To change the quotes to ', add quotechar="'" to the writer object.
The above code does not give csv.writer.writerows input that it expects. Specifically:
resultArray = results.split('\n')
This creates a list of strings. Then, you pass each string to your writer and tell it to writerows with it:
for row in resultArray:
    writer.writerows(row)
But writerows does not expect a single string. From the docs:
csvwriter.writerows(rows)
Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.
So you're passing a string to a method that expects its argument to be a list of row objects, where a row object is itself expected to be a sequence of strings or numbers:
A row must be a sequence of strings or numbers for Writer objects
Are you sure your listed example code accurately reflects your attempt? While it certainly won't work, I would expect the exception produced to be different.
For a possible fix - if all you are trying to do is to write a big string to a file, you don't need the csv library at all. You can just write the string directly. Even splitting on newlines is unnecessary unless you need to do something like replacing Unix-style linefeeds with DOS-style linefeeds.
If you need to use the csv module after all, you need to give your writer something it understands - in this example, that would be something like writer.writerow(['A','bunch+of','multiline','CSV,LIKE,STRING']). Note that that's a true Python list of strings. If you need to turn your raw string "'A','bunch+of','multiline','CSV,LIKE,STRING'" into such a list, I think you'll find the csv library useful as a reader - no need to reinvent the wheel to handle the quoted commas in the substring 'CSV,LIKE,STRING'. And in that case you would need to care about your dialect.
You can use csv.register_dialect, for example to register an escaped dialect:
csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)
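A short sketch of writing with the registered dialect (the file name and sample row are illustrative); with doublequote=True, quotes inside a field are doubled rather than escaped:

```python
import csv

csv.register_dialect('escaped', escapechar='\\', doublequote=True,
                     quoting=csv.QUOTE_ALL)

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f, dialect='escaped')
    # QUOTE_ALL quotes every field; the embedded comma and quotes survive
    writer.writerow(['plain', 'has "quotes"', 'has,comma'])
```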