How to export Python list as csv file [duplicate] - python

I am trying to create a .csv file with the values from a Python list. When I print the list, the values are all unicode (?), i.e. they look something like this:
[u'value 1', u'value 2', ...]
If I iterate through the values in the list, i.e. for v in mylist: print v, they appear to be plain text.
And I can put a , between each with print ','.join(mylist)
And I can output to a file, i.e.
myfile = open(...)
print >>myfile, ','.join(mylist)
But I want to output to a CSV and have delimiters (quotes) around the values in the list, e.g.
"value 1", "value 2", ...
I can't find an easy way to include the delimiters in the formatting; e.g. I have tried the join statement. How can I do this?

import csv
with open(..., 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)
Edit: this only works with Python 2.x.
To make it work with Python 3.x, replace 'wb' with 'w' and pass newline='' (see this SO answer):
with open(..., 'w', newline='') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)
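A minimal, self-contained Python 3 sketch of the same approach. The helper name write_row, the temporary file and the sample values are assumptions for illustration; reading the file back with csv.reader shows that a value containing a comma stays a single field.

```python
import csv
import os
import tempfile

def write_row(path, values):
    """Write one list as a single fully quoted CSV row (Python 3)."""
    # newline='' lets the csv module control the line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerow(values)

mylist = ["value 1", "value 2", "value, with comma"]
path = os.path.join(tempfile.mkdtemp(), "values.csv")
write_row(path, mylist)

# Raw text of the file (newline='' keeps the writer's \r\n terminator visible)
with open(path, newline="", encoding="utf-8") as f:
    raw = f.read()
print(raw)  # "value 1","value 2","value, with comma"

# Parsing it back gives the original list as one row
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
```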

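Here is a safer version of Alex Martelli's answer, using a with block so the file is always closed (Python 2.x; in Python 3.x open the file with 'w' and newline=''):
import csv
with open('filename', 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)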

For another approach, you can use a pandas DataFrame, which can easily dump the data to CSV (list_1 and list_2 are your lists):
import pandas
df = pandas.DataFrame(data={"col1": list_1, "col2": list_2})
df.to_csv("./file.csv", sep=',', index=False)
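If pandas isn't available, a roughly equivalent two-column file can be written with the standard csv module and zip. This is a sketch: list_1 and list_2 hold made-up sample data and io.StringIO stands in for the output file. Note that zip silently stops at the shorter list, whereas the DataFrame constructor rejects lists of unequal length.

```python
import csv
import io

list_1 = ["a", "b", "c"]   # sample data (assumed)
list_2 = [1, 2, 3]

buf = io.StringIO()  # stands in for open("file.csv", "w", newline="")
writer = csv.writer(buf)
writer.writerow(["col1", "col2"])       # header, like the DataFrame column names
writer.writerows(zip(list_1, list_2))   # one (col1, col2) pair per row
print(buf.getvalue())
```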

The best option I've found is savetxt from the numpy module (header is a string you define; note that savetxt prefixes it with '# ' unless you also pass comments=''):
import numpy as np
np.savetxt("file_name.csv", data1, delimiter=",", fmt='%s', header=header)
If you have multiple lists that need to be stacked as columns:
np.savetxt("file_name.csv", np.column_stack((data1, data2)), delimiter=",", fmt='%s', header=header)

Use Python's csv module for reading and writing comma- or tab-delimited files. The csv module is preferred because it gives you good control over quoting.
Here is a worked example:
import csv
data = ["value %d" % i for i in range(1, 4)]
with open("myfile.csv", "w", newline="") as f:
    out = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL)
    out.writerow(data)
Produces:
"value 1","value 2","value 3"

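Jupyter notebook
Let's say that your list is named A (and contains strings).
Then you can run the following and you will have it as a CSV file with a single column, one value per line. The values are not quoted, so a value containing a comma would be split into two columns and one containing a newline into two rows:
R = "\n".join(A)
with open('Columns.csv', 'w') as f:
    f.write(R)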

You could use the string join method in this case.
Split over a few lines for clarity, here's an interactive session:
>>> a = ['a','b','c']
>>> first = '", "'.join(a)
>>> second = '"%s"' % first
>>> print(second)
"a", "b", "c"
Or as a single line:
>>> print('"%s"' % '", "'.join(a))
"a", "b", "c"
However, you may have a problem if your strings contain embedded quotes. If that is the case, you'll need to decide how to escape them.
The csv module can take care of all of this for you, allowing you to choose between various quoting options (all fields, only fields with quotes and separators, only non-numeric fields, etc.) and how to escape control characters (doubled quotes, or escaped strings). If your values are simple, string join will probably be OK, but if you have to manage lots of edge cases, use the module.
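A quick way to see the embedded-quote problem, and how the csv module handles it (sample values assumed, io.StringIO standing in for a file): the hand-built join leaves the inner quotes unescaped, while csv.writer doubles them, and reading its output back recovers the original values.

```python
import csv
import io

values = ['plain', 'say "hi"', 'a,b']

# Naive join: the embedded quote is not escaped
naive = '"' + '","'.join(values) + '"'

# csv module: embedded quotes are doubled, commas stay inside the quoted field
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(values)
proper = buf.getvalue()

# Reading the csv output back recovers the original values
parsed = next(csv.reader(io.StringIO(proper)))
print(naive)   # "plain","say "hi"","a,b"
print(proper)  # "plain","say ""hi""","a,b"
```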

This solution sounds crazy, but it works smoothly (Python 2 code, note the 'wb' mode):
import csv
with open('filename', 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL, delimiter='\n')
    wr.writerow(mylist)
The file is written by the csv writer, so CSV quoting is still applied to each value.
The '\n' delimiter does the main work: it moves each list item to the next line, so the list ends up as a single column.
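For comparison, the more conventional way to get one list item per line is to wrap each item in its own one-element row and use writerows. This is a sketch (sample list and io.StringIO buffer assumed); unlike the delimiter='\n' trick, every line then ends with the writer's normal '\r\n' terminator instead of mixing '\n' between items with '\r\n' at the end.

```python
import csv
import io

mylist = ["value 1", "value 2", "value 3"]

buf = io.StringIO()  # use open(path, "w", newline="") for a real file
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
# Wrap each item in a list so every item becomes its own row
writer.writerows([item] for item in mylist)
print(buf.getvalue())
```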

Here is a working copy-paste example for Python 3.x with options to define your own delimiter and quote char.
import csv
mylist = ['value 1', 'value 2', 'value 3']
with open('employee_file.csv', mode='w', newline='') as employee_file:
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
    employee_writer.writerow(mylist)
This will generate employee_file.csv that looks like this:
"value 1","value 2","value 3"
NOTE:
If quoting is set to csv.QUOTE_MINIMAL, then .writerow() will quote fields only if they contain the delimiter, the quotechar or a line break. This is the default case.
If quoting is set to csv.QUOTE_ALL, then .writerow() will quote all fields.
If quoting is set to csv.QUOTE_NONNUMERIC, then .writerow() will quote all non-numeric fields. When reading with this setting, csv.reader converts every unquoted field to the float data type.
If quoting is set to csv.QUOTE_NONE, then .writerow() will escape delimiters instead of quoting them. In this case you must also provide a value for the escapechar optional parameter if any field needs escaping; otherwise csv.Error is raised.
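To see the four quoting modes side by side, here is a small sketch that renders one mixed row with each; the row contents and the render helper are made up for illustration, and io.StringIO stands in for a file. The last line reads the QUOTE_NONNUMERIC output back to show the reader-side float conversion.

```python
import csv
import io

row = ["a", 1, 2.5, "b,c"]

def render(**options):
    """Return the text csv.writer produces for `row` with the given options."""
    buf = io.StringIO()
    csv.writer(buf, **options).writerow(row)
    return buf.getvalue()

minimal = render(quoting=csv.QUOTE_MINIMAL)        # a,1,2.5,"b,c"
quote_all = render(quoting=csv.QUOTE_ALL)          # "a","1","2.5","b,c"
nonnumeric = render(quoting=csv.QUOTE_NONNUMERIC)  # "a",1,2.5,"b,c"
no_quotes = render(quoting=csv.QUOTE_NONE, escapechar="\\")  # a,1,2.5,b\,c

# Reading with QUOTE_NONNUMERIC turns unquoted fields into floats
read_back = next(csv.reader(io.StringIO(nonnumeric), quoting=csv.QUOTE_NONNUMERIC))
```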

To create and write into a CSV file
The example below demonstrates creating and writing a CSV file.
To write the file, import the csv module and open the file with a file reference, e.g.:
with open("D:\\sample.csv", "w", newline="") as file_writer:
If the file does not exist in the given directory, Python will create it. "w" means write; use "r" to read a file, or "a" to append to an existing one. newline="" prevents an extra empty row after every row you write (seen on Windows). Next, create some field names (column names) as a list, e.g. fields=["Names","Age","Class"], and pass them to a writer instance:
writer = csv.DictWriter(file_writer, fieldnames=fields)
DictWriter assigns the column names. To write the column names to the CSV, use writer.writeheader(); to write values, use writer.writerow({"Names":"John","Age":20,"Class":"12A"}). Values must be passed as a dictionary whose keys are the column names and whose values are the corresponding values.
import csv
with open("D:\\sample.csv", "w", newline="") as file_writer:
    fields = ["Names", "Age", "Class"]
    writer = csv.DictWriter(file_writer, fieldnames=fields)
    writer.writeheader()
    writer.writerow({"Names": "John", "Age": 21, "Class": "12A"})
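The same DictWriter can write several records at once with writerows. In this sketch the second and third records and the io.StringIO buffer are assumptions; it also shows that a key missing from a dict is filled with the writer's restval (an empty string by default).

```python
import csv
import io

fields = ["Names", "Age", "Class"]
records = [  # sample data (assumed)
    {"Names": "John", "Age": 21, "Class": "12A"},
    {"Names": "Jane", "Age": 20, "Class": "12B"},
]

buf = io.StringIO()  # open(path, "w", newline="") in real code
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(records)  # one dict per row, keys matched to fieldnames

# A key missing from a dict is written as restval (default: empty string)
writer.writerow({"Names": "Ann"})
print(buf.getvalue())
```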

For those looking for a less complicated solution, I find this simpler approach does a similar job:
import pandas as pd
a = ['a','b','c']
df = pd.DataFrame({'a': a})
df = df.set_index('a').T
df.to_csv('list_a.csv', index=False)
The list values become the column names of an empty, transposed DataFrame, so the file should contain them as a single header row (a,b,c).
Hope this helps as well.

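You should definitely use the csv module, but chances are you need to write unicode. For those who need to write unicode, this is the class from the (Python 2) csv documentation's example page, which you can use as a utility module. It relies on Python 2 features (cStringIO, unicode, .next()); in Python 3 the csv module handles str natively, so you only need to pass encoding= to open().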
import csv, codecs, cStringIO
class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)
    def __iter__(self):
        return self
    def next(self):
        return self.reader.next().encode("utf-8")
class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)
    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]
    def __iter__(self):
        return self
class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()
    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)
    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
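In Python 3 none of these recoder classes are needed: str is already Unicode, and the file encoding is chosen in open(). A minimal sketch, with the temporary file path and the sample words as assumptions:

```python
import csv
import os
import tempfile

mylist = ["Beyoncé", "Kühlschrank", "naïve, café"]
path = os.path.join(tempfile.mkdtemp(), "unicode.csv")

# Python 3: str is already Unicode; pick the file encoding in open()
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerow(mylist)

# Read it back with the same encoding
with open(path, newline="", encoding="utf-8") as f:
    row_back = next(csv.reader(f))

# The bytes on disk are plain UTF-8
with open(path, "rb") as f:
    raw_bytes = f.read()
```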

Here is another solution that does not require the csv module.
print(', '.join(['"'+i+'"' for i in myList]))
Example:
>>> myList = [u'value 1', u'value 2', u'value 3']
>>> print(', '.join(['"'+i+'"' for i in myList]))
"value 1", "value 2", "value 3"
However, if the initial list contains some ", they will not be escaped. If that is required, you can call an escaping function of your own (myFunction below) like this:
print(', '.join(['"'+myFunction(i)+'"' for i in myList]))
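One possible escaping function is a sketch following the usual CSV convention of doubling embedded quotes; quote_field is a hypothetical name, and it adds the surrounding quotes itself, so it replaces the whole '"'+myFunction(i)+'"' expression. Because this join uses ', ' with a space, csv.reader needs skipinitialspace=True to read the line back:

```python
import csv

def quote_field(value):
    """Hypothetical helper: wrap a value in quotes, doubling any embedded quotes."""
    return '"' + value.replace('"', '""') + '"'

myList = [u'value 1', u'say "hi"']
line = ', '.join(quote_field(v) for v in myList)
print(line)  # "value 1", "say ""hi"""

# csv.reader can read it back; skipinitialspace skips the space after each comma
parsed = next(csv.reader([line], skipinitialspace=True))
```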

Related

List to csv without commas in Python

I have the following problem.
I would like to save a list into a CSV file (in the first column).
See the example here:
import csv
mylist = ["Hallo", "der Pixer", "Glas", "Telefon", "Der Kühlschrank brach kaputt."]
def list_na_csv(file, mylist):
    with open(file, "w", newline="") as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerows(mylist)
list_na_csv("example.csv", mylist)
My output in Excel looks like this:
Desired output is:
You can see that I have two issues: firstly, each character is separated by a comma. Secondly, I don't know how to use an encoding, for example UTF-8 or cp1250. How can I fix it, please?
I tried to search for similar questions, but nothing worked for me. Thank you.
You have two problems here.
writerows expects a list of rows, in other words a list of iterables. As a string is iterable, you write each word in a different row, one character per field. If you want one row with one word per field, you should use writerow:
csv_writer.writerow(mylist)
By default, the csv module uses the comma as the delimiter (this is the most common one). But Excel is a pain with it: it expects the delimiter to be the one of the locale, which is the semicolon (;) in many Western European countries, including Germany. If you want to open your file easily in Excel, you should change the delimiter:
csv_writer = csv.writer(csv_file, delimiter=';')
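The first problem is easy to see with an in-memory buffer (io.StringIO stands in for the file, and the two-word list is a shortened version of the question's): writerows treats each string as a row of characters, while writerow writes one row with one word per field.

```python
import csv
import io

mylist = ["Glas", "Telefon"]

wrong = io.StringIO()
csv.writer(wrong).writerows(mylist)  # each string is treated as a row of characters

right = io.StringIO()
csv.writer(right, delimiter=";").writerow(mylist)  # one row, one word per field

print(wrong.getvalue())  # G,l,a,s / T,e,l,e,f,o,n on two lines
print(right.getvalue())  # Glas;Telefon
```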
After your edit, you want all the data in the first column, one element per row. This is kind of a degenerate CSV file, because it only has one value per record and no separator. If the fields can never contain a semicolon or a newline, you could just write a plain text file:
...
with open(file, "w", newline="") as csv_file:
    for row in mylist:
        print(row, file=csv_file)
...
If you want to be safe and prevent future problems if you later need to process values with more corner cases, you can still use the csv module and write one element per row by wrapping it in another iterable:
...
with open(file, "w", newline="") as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=';')
    csv_writer.writerows([elt] for elt in mylist)
...
l = ["Hallo", "der Pixer", "Glas", "Telefon", "Der Kühlschrank brach kaputt."]
with open("file.csv", "w") as msg:
    msg.write(",".join(l))
For less trivial examples:
l = ["Hallo", "der, Pixer", "Glas", "Telefon", "Der Kühlschrank, brach kaputt."]
with open("file.csv", "w") as msg:
    msg.write(",".join(['"' + x + '"' for x in l]))
Here you put every list element between quotes, to avoid problems with commas inside fields (quotes inside the values would still need escaping).
Try this (writerow instead of writerows):
import csv
mylist = ["Hallo", "der Pixer", "Glas", "Telefon", "Der Kühlschrank brach kaputt."]
def list_na_csv(file, mylist):
    with open(file, "w", newline="") as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(mylist)
list_na_csv("example.csv", mylist)
If you want to write the entire list of strings to a single row, use csv_writer.writerow(mylist) as mentioned in the comments.
If you want to write each string to a new row, as I believe your reference to writing them in the first column implies, you'll have to format your data as the class expects: "A row must be an iterable of strings or numbers for Writer objects". On this data that would look something like:
csv_writer.writerows((entry,) for entry in mylist)
There, I'm using a generator expression to wrap each word in a tuple, thus making it an iterable of strings. Without something like that, your strings are themselves iterables and lead to it delimiting between each character as you've seen.
Using csv to write a single entry per line is almost pointless, but it does have the advantage that it will escape your delimiter if it appears in the data.
To specify an encoding, the docs say:
Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see locale.getpreferredencoding()). To decode a file using a different encoding, use the encoding argument of open:
import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
The same applies to writing in something other than the system default encoding: specify the encoding argument when opening the output file.
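Putting the pieces of this thread together, here is a sketch that writes the question's list one item per row, with a semicolon delimiter and an explicit encoding, then reads it back. The file name and the choice of "utf-8-sig" are assumptions: that codec writes a byte-order mark, which Excel commonly uses to recognise UTF-8. If the consumer expects cp1250, use encoding="cp1250" instead.

```python
import csv
import os
import tempfile

mylist = ["Hallo", "der Pixer", "Glas", "Telefon", "Der Kühlschrank brach kaputt."]
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# "utf-8-sig" prepends a UTF-8 byte-order mark to the file
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerows([item] for item in mylist)  # one item per row, first column

# Reading with "utf-8-sig" strips the byte-order mark again
with open(path, newline="", encoding="utf-8-sig") as f:
    rows = list(csv.reader(f, delimiter=";"))

with open(path, "rb") as f:
    raw = f.read()
```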
Try split("\n"), for example (writer here is a csv.writer you have already created):
amazing_list = ["hello", "hi"]
for item in amazing_list:
    ok = item.split("\n")
    writer.writerow(ok)

Program that takes a list of numbers 1- 1000, reverses them , puts them into a file (called my_file) and prints them back into my_file [duplicate]


Read CSV with comma as linebreak

I have a file saved as .csv
"400":0.1,"401":0.2,"402":0.3
Ultimately I want to save the data in a proper format in a csv file for further processing. The problem is that there are no line breaks in the file.
pathname = r"C:\pathtofile\file.csv"
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    print(reader)
with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(reader)
The printed output of reader looks exactly how I want it (or at least it's a format I can process further).
"400":0.1
"401":0.2
"402":0.3
And now I want to save that to a new CSV file. However, the output looks like this:
"""",4,0,0,"""",:,0,.,1,"
","""",4,0,1,"""",:,0,.,2,"
","""",4,0,2,"""",:,0,.,3
I'm sure it would be intelligent to convert the format to
400,0.1
401,0.2
402,0.3
at this stage instead of doing later with another script.
The main problem is that my current code
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    reader = csv.reader(reader, delimiter=':')
    x = []
    y = []
    print(reader)
    for row in reader:
        x.append(float(row[0]))
        y.append(float(row[1]))
    print(x)
    print(y)
works fine for the type of csv files I currently have, but doesn't work for these mentioned above:
y.append( float(row[1]) )
IndexError: list index out of range
So I'm trying to find a way to work with them too. I think I'm missing something obvious as I imagine that it can't be too hard to properly define the linebreak character and delimiter of a file.
with open(pathname, newline=',') as file:
yields
ValueError: illegal newline value: ,
The right way with the csv module, without replacing and casting to float:
import csv
with open('file.csv', 'r', newline='') as f, open('filenew.csv', 'w', newline='') as out:
    reader = csv.reader(f)
    writer = csv.writer(out, quotechar=None)
    for r in reader:
        for i in r:
            writer.writerow(i.split(':'))
The resulting filenew.csv contents (according to your "intelligent" condition):
400,0.1
401,0.2
402,0.3
Nuances:
csv.reader and csv.writer objects treat the comma , as the default delimiter (no need for file.read().replace(',', '\n')); csv.reader also drops the quotes around 400, so each field arrives as 400:0.1
quotechar=None is specified for the csv.writer object to eliminate double quotes around the values being saved (without a quoting argument, quotechar=None makes the writer use csv.QUOTE_NONE)
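A self-contained variant of this answer that can run anywhere: io.StringIO objects stand in for file.csv and filenew.csv, and the variable names are assumptions. It sets quoting=csv.QUOTE_NONE explicitly instead of quotechar=None, which should have the same effect, and also collects the x/y floats the question was after.

```python
import csv
import io

source = io.StringIO('"400":0.1,"401":0.2,"402":0.3')  # stands in for file.csv
target = io.StringIO()                                  # stands in for filenew.csv

x, y = [], []
writer = csv.writer(target, quoting=csv.QUOTE_NONE)  # no quotes around output values
for row in csv.reader(source):   # the whole file is one row of "key":value fields
    for field in row:            # the reader has already dropped the quotes: '400:0.1'
        key, value = field.split(":")
        writer.writerow([key, value])
        x.append(float(key))
        y.append(float(value))

print(target.getvalue())  # 400,0.1 / 401,0.2 / 402,0.3 on separate lines
```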
You need to split the values to form a list to represent a row. Presently, the code is splitting the string into individual characters to represent the row.
import csv
pathname = r"C:\pathtofile\file.csv"
with open(pathname) as old_file:
    with open(r"C:\pathtofile\filenew.csv", 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter=',')
        text_rows = old_file.read().strip().split(",")
        for row in text_rows:
            items = row.split(":")
            # items[0] still has its quotes, e.g. '"400"', so strip them before int()
            csv_writer.writerow([int(items[0].strip('"')), items[1]])
If you look at the documentation for writerow, it says:
Write the row parameter to the writer’s file object, formatted according to the current dialect.
But you are writing an entire string in your code
csv_writer.writerow(reader)
because reader is a string at this point.
Now, the format you want to use in your CSV file is not clearly mentioned in the question. But as you said, if you can do some preprocessing to create a list of lists and pass each sublist to writerow(), you should be able to produce the required file format.

Python issues on character encoding

I'm working on a program that needs to take two files, merge them, and write the union as a new file. The problem is that the output file contains characters like \xf0 or, if I change some of the encodings, something like \u0028. The input files are encoded in UTF-8. How can I write characters like "è", "ò" and "-" to the output file?
This is my code:
import codecs
import pandas as pd
import numpy as np
goldstandard = "..\\files\\file1.csv"
tweets = "..\\files\\file2.csv"
with codecs.open(tweets, "r", encoding="utf8") as t:
    tFile = pd.read_csv(t, delimiter="\t",
                        names=['ID', 'Tweet'],
                        quoting=3)
IDs = tFile['ID']
tweets = tFile['Tweet']
dict = {}
for i in range(len(IDs)):
    dict[np.int64(IDs[i])] = [str(tweets[i])]
with codecs.open(goldstandard, "r", encoding="utf8") as gs:
    for line in gs:
        columns = line.split("\t")
        index = np.int64(columns[0])
        rowValue = dict[index]
        rowValue.append([columns[1], columns[2], columns[3], columns[5]])
        dict[index] = rowValue
import pprint
pprint.pprint(dict)
ndic = pprint.pformat(dict, indent=4)
f = codecs.open("out.csv", "w", "utf8")
f.write(ndic)
f.close()
and this is an example of the output:
desired: Beyoncé
obtained: Beyonc\xe9
You are producing Python string literals, here:
import pprint
pprint.pprint(dict)
ndic = pprint.pformat(dict, indent=4)
Pretty-printing is useful for producing debugging output; objects are passed through repr() to make non-printable and non-ASCII characters easily distinguishable and reproducible:
>>> import pprint
>>> value = u'Beyonc\xe9'
>>> value
u'Beyonc\xe9'
>>> print value
Beyoncé
>>> pprint.pprint(value)
u'Beyonc\xe9'
The é character is in the Latin-1 range, outside of the ASCII range, so it is represented with syntax that produces the same value again when used in Python code.
Don't use pprint if you want to write out actual string values to the output file. You'll have to do your own formatting in that case.
Moreover, the pandas dataframe will hold bytestrings, not unicode objects, so you still have undecoded UTF-8 data at that point.
Personally, I'd not even bother using pandas here; you appear to want to write CSV data, so I've simplified your code to use the csv module instead, and I'm not actually bothering to decode the UTF-8 here (this is safe for this case as both input and output is entirely in UTF-8):
import csv
tweet_texts = {}
with open(tweets, "rb") as t:
    reader = csv.reader(t, delimiter='\t')
    for id_, tweet in reader:
        tweet_texts[id_] = tweet
with open(goldstandard, "rb") as gs, open("out.csv", 'wb') as outf:
    reader = csv.reader(gs, delimiter='\t')
    writer = csv.writer(outf, delimiter='\t')
    for columns in reader:
        index = columns[0]
        writer.writerow([tweet_texts[index]] + columns[1:4] + [columns[5]])
Note that you really want to avoid using dict as a variable name; it masks the built-in type. I used tweet_texts instead (tweets still holds the input filename here).
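A Python 3 sketch of the same merge, for readers not on Python 2. The function name merge_tweets and the sample rows are made up, and io.StringIO stands in for files you would open with open(path, encoding="utf-8", newline=""). Because the rows are written as real strings with csv.writer rather than through pprint, "Beyoncé" comes out as é, not \xe9. If the tweets contain stray quote characters, you may want quoting=csv.QUOTE_NONE on the readers, mirroring quoting=3 in the question.

```python
import csv
import io

def merge_tweets(tweets_file, gold_file, out_file):
    """Join gold-standard rows with tweet text by ID (tab-separated)."""
    # Map tweet ID -> tweet text from the first file
    tweet_texts = {}
    for id_, tweet in csv.reader(tweets_file, delimiter="\t"):
        tweet_texts[id_] = tweet
    writer = csv.writer(out_file, delimiter="\t")
    for columns in csv.reader(gold_file, delimiter="\t"):
        # Same column selection as the answer: columns 1-3 and 5
        writer.writerow([tweet_texts[columns[0]]] + columns[1:4] + [columns[5]])

tweets_in = io.StringIO("1\tBeyoncé\n2\tOlá\n")                 # sample data (assumed)
gold_in = io.StringIO("1\ta\tb\tc\td\te\n2\tf\tg\th\ti\tj\n")
out = io.StringIO()
merge_tweets(tweets_in, gold_in, out)
print(out.getvalue())
```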

Create a .csv file with values from a Python list

I am trying to create a .csv file with the values from a Python list. When I print the values in the list they are all unicode (?), i.e. they look something like this
[u'value 1', u'value 2', ...]
If I iterate through the values in the list i.e. for v in mylist: print v they appear to be plain text.
And I can put a , between each with print ','.join(mylist)
And I can output to a file, i.e.
myfile = open(...)
print >>myfile, ','.join(mylist)
But I want to output to a CSV and have delimiters around the values in the list e.g.
"value 1", "value 2", ...
I can't find an easy way to include the delimiters in the formatting, e.g. I have tried through the join statement. How can I do this?
import csv
with open(..., 'wb') as myfile:
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerow(mylist)
Edit: this only works with python 2.x.
To make it work with python 3.x replace wb with w (see this SO answer)
with open(..., 'w', newline='') as myfile:
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerow(mylist)
Here is a secure version of Alex Martelli's:
import csv
with open('filename', 'wb') as myfile:
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerow(mylist)
For another approach, you can use DataFrame in pandas:
And it can easily dump the data to csv just like the code below:
import pandas
df = pandas.DataFrame(data={"col1": list_1, "col2": list_2})
df.to_csv("./file.csv", sep=',',index=False)
The best option I've found was using the savetxt from the numpy module:
import numpy as np
np.savetxt("file_name.csv", data1, delimiter=",", fmt='%s', header=header)
In case you have multiple lists that need to be stacked
np.savetxt("file_name.csv", np.column_stack((data1, data2)), delimiter=",", fmt='%s', header=header)
Use python's csv module for reading and writing comma or tab-delimited files. The csv module is preferred because it gives you good control over quoting.
For example, here is the worked example for you:
import csv
data = ["value %d" % i for i in range(1,4)]
out = csv.writer(open("myfile.csv","w"), delimiter=',',quoting=csv.QUOTE_ALL)
out.writerow(data)
Produces:
"value 1","value 2","value 3"
Jupyter notebook
Let's say that your list name is A
Then you can code the following and you will have it as a csv file (columns only!)
R="\n".join(A)
f = open('Columns.csv','w')
f.write(R)
f.close()
You could use the string.join method in this case.
Split over a few of lines for clarity - here's an interactive session
>>> a = ['a','b','c']
>>> first = '", "'.join(a)
>>> second = '"%s"' % first
>>> print second
"a", "b", "c"
Or as a single line
>>> print ('"%s"') % '", "'.join(a)
"a", "b", "c"
However, you may have a problem is your strings have got embedded quotes. If this is the case you'll need to decide how to escape them.
The CSV module can take care of all of this for you, allowing you to choose between various quoting options (all fields, only fields with quotes and seperators, only non numeric fields, etc) and how to esacpe control charecters (double quotes, or escaped strings). If your values are simple, string.join will probably be OK but if you're having to manage lots of edge cases, use the module available.
This solutions sounds crazy, but works smooth as honey
import csv
with open('filename', 'wb') as myfile:
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL,delimiter='\n')
wr.writerow(mylist)
The file is being written by csvwriter hence csv properties are maintained i.e. comma separated.
The delimiter helps in the main part by moving list items to next line, each time.
Here is working copy-paste example for Python 3.x with options to define your own delimiter and quote char.
import csv
mylist = ['value 1', 'value 2', 'value 3']
with open('employee_file.csv', mode='w') as employee_file:
employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
employee_writer.writerow(mylist)
This will generate employee_file.csv that looks like this:
"value 1","value 2","value 3"
NOTE:
If quoting is set to csv.QUOTE_MINIMAL, then .writerow() will quote fields only if they contain the delimiter, the quotechar, or any of the characters in lineterminator. This is the default.
If quoting is set to csv.QUOTE_ALL, then .writerow() will quote all fields.
If quoting is set to csv.QUOTE_NONNUMERIC, then .writerow() will quote all non-numeric fields and leave numbers unquoted. (On the reading side, a csv.reader using QUOTE_NONNUMERIC converts every unquoted field to float.)
If quoting is set to csv.QUOTE_NONE, then .writerow() will never quote fields and will escape delimiters with escapechar instead. In this case you must also provide a value for the optional escapechar parameter; otherwise writing a field that needs escaping raises csv.Error.
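A small sketch comparing the four quoting modes on one made-up row (Python 3; the render helper is hypothetical, written just for this comparison). Note that QUOTE_NONNUMERIC leaves the integer 42 unquoted on writing; it is not turned into a float:

```python
import csv
import io

# Made-up row mixing text, an int, and a value containing the delimiter
row = ['text', 42, 'a,b']

def render(**options):
    """Write `row` with the given csv.writer options and return the text."""
    buf = io.StringIO()
    csv.writer(buf, **options).writerow(row)
    return buf.getvalue()

minimal = render(quoting=csv.QUOTE_MINIMAL)        # text,42,"a,b"
quote_all = render(quoting=csv.QUOTE_ALL)          # "text","42","a,b"
nonnumeric = render(quoting=csv.QUOTE_NONNUMERIC)  # "text",42,"a,b"
no_quoting = render(quoting=csv.QUOTE_NONE, escapechar='\\')  # text,42,a\,b

for name, text in [('MINIMAL', minimal), ('ALL', quote_all),
                   ('NONNUMERIC', nonnumeric), ('NONE', no_quoting)]:
    print(name, text, end='')
```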
To create and write to a CSV file
The example below demonstrates creating and writing a CSV file with csv.DictWriter.
First import the csv module, then open the file and keep a reference to the file object:
Ex: with open("D:\\sample.csv", "w", newline="") as file_writer:
If the file does not exist in the given directory, Python creates it. "w" means write (any existing contents are replaced); use "r" to read the file instead, or "a" to append to an existing file. newline="" prevents an extra empty row after every row you write: the csv writer already ends each row with "\r\n", and without newline="" text mode on Windows would translate the "\n" a second time. Next, create the field names (column names) as a list, e.g. fields=["Names","Age","Class"], and pass them to the writer instance:
writer=csv.DictWriter(file_writer,fieldnames=fields)
Here DictWriter is given the column names. writer.writeheader() writes the column names as the first row, and writer.writerow({"Names":"John","Age":20,"Class":"12A"}) writes one row of values. Each row must be passed as a dictionary whose keys are the column names and whose values are the corresponding cell values.
import csv
with open("D:\\sample.csv", "w", newline="") as file_writer:
    fields = ["Names", "Age", "Class"]
    writer = csv.DictWriter(file_writer, fieldnames=fields)
    writer.writeheader()
    writer.writerow({"Names": "John", "Age": 21, "Class": "12A"})
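The example writes a single row. As a sketch of writing several at once (the second row is made-up sample data, and io.StringIO stands in for the opened file), DictWriter.writerows accepts a list of dictionaries:

```python
import csv
import io

fields = ["Names", "Age", "Class"]
# Sample rows; each dict maps column name -> cell value
rows = [
    {"Names": "John", "Age": 21, "Class": "12A"},
    {"Names": "Jane", "Age": 20, "Class": "12B"},
]

buf = io.StringIO()  # stands in for the opened file
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)  # writes every dict in one call

print(buf.getvalue(), end='')
```

By default DictWriter raises ValueError if a dictionary has a key that is not in fieldnames, and fills missing keys with restval (an empty string unless you set it).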
For those looking for a less complicated solution, I find this one simpler, and it does a similar job:
import pandas as pd
a = ['a','b','c']
df = pd.DataFrame({'a': a})
df= df.set_index('a').T
df.to_csv('list_a.csv', index=False)
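A more direct pandas sketch, assuming pandas is installed: build a one-row DataFrame from the list and turn off the header and index. The transpose trick relies on the list values ending up as column labels, whereas here they are ordinary cell values, and passing quoting=csv.QUOTE_ALL (pandas accepts the csv module's constants) gives the quoted output the question asks for:

```python
import csv
import pandas as pd

a = ['a', 'b', 'c']

# One-row DataFrame: the list becomes a single row of values
df = pd.DataFrame([a])

# header=False and index=False leave only the values;
# quoting=csv.QUOTE_ALL wraps each one in double quotes.
# With no path argument, to_csv returns the CSV text as a string.
text = df.to_csv(index=False, header=False, quoting=csv.QUOTE_ALL)
print(text)

# To write a file instead:
# df.to_csv('list_a.csv', index=False, header=False, quoting=csv.QUOTE_ALL)
```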
Hope this helps as well.
You should definitely use the csv module, but chances are you need to write Unicode. For Python 2 users who need to write Unicode, here are the classes from the examples section of the Python 2 csv module documentation, which you can use as a utility module:
import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)
    def __iter__(self):
        return self
    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)
    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]
    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()
    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)
    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
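Those classes are for Python 2 only (cStringIO and unicode do not exist in Python 3, and iterators there define __next__ rather than next). In Python 3 the csv module works with str directly, so a sketch of the same job only needs the encoding set when the file is opened. The sample values and the temporary path are illustrative:

```python
import csv
import os
import tempfile

# Illustrative non-ASCII values
mylist = ['café', 'naïve', '東京']

# The encoding is chosen when the file is opened; newline='' is what the
# csv module expects for files it reads or writes
path = os.path.join(tempfile.mkdtemp(), 'unicode.csv')
with open(path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerow(mylist)

# Reading it back with the same encoding returns the original strings
with open(path, encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))

print(rows == [mylist])  # True
```

If the file is mainly going to be opened in Excel, encoding='utf-8-sig', which writes a byte order mark at the start, is a common choice to help Excel detect UTF-8.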
Here is another solution that does not require the csv module.
print ', '.join(['"'+i+'"' for i in myList])
Example :
>>> myList = [u'value 1', u'value 2', u'value 3']
>>> print ', '.join(['"'+i+'"' for i in myList])
"value 1", "value 2", "value 3"
However, if the initial list contains any " characters, they will not be escaped. If that matters, you can call an escaping function on each item, like this:
print ', '.join(['"'+myFunction(i)+'"' for i in myList])
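myFunction is left undefined above. One possible stand-in, sketched here under the name escape_quotes (a hypothetical helper, not part of any library), doubles each embedded quote, which is how CSV (RFC 4180) escapes quotes inside a quoted field. Python 3 syntax:

```python
import csv

def escape_quotes(s):
    """Hypothetical stand-in for myFunction: double every embedded quote,
    as CSV does for quotes inside a quoted field."""
    return s.replace('"', '""')

myList = ['value 1', 'say "hi"', 'value 3']
line = ', '.join('"' + escape_quotes(i) + '"' for i in myList)
print(line)  # "value 1", "say ""hi""", "value 3"

# Because of the space after each comma, a csv reader needs
# skipinitialspace=True to recognise the quoted fields
parsed = next(csv.reader([line], skipinitialspace=True))
print(parsed == myList)  # True
```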
