I'm trying to write a list of strings like below to a file separated by the given delimiter.
res = [u'123', u'hello world']
When I try splitting by TAB like below it gives me the correctly formatted string.
writer = csv.writer(sys.stdout, delimiter="\t")
writer.writerow(res)
gives --> 123 hello world
But when I try to split by space using delimiter=" ", it gives me the space but with quotation marks like below.
123 "hello world"
How do I remove quotation marks. So that when I use space as the delimiter I should get
123 hello world.
EDIT: When I try using escapechar it doesn't add any double quotes, but everywhere a space appears in my test data, the space gets doubled.
You can set the csv.writer to quote nothing with quoting=csv.QUOTE_NONE for example:
import csv
with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            escapechar=' ', quoting=csv.QUOTE_NONE)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
Produces:
Spam Spam Spam Spam Spam Baked Beans
Spam Lovely Spam Wonderful Spam
If you use QUOTE_NONE you also need an escape character.
Quoting behavior is controlled by the various quoting arguments provided to the writer (or set on the Dialect object if you prefer to do things that way). The default setting is QUOTE_MINIMAL, which will not produce the behavior you're describing unless a value contains your delimiter character, quote character, or line terminator character. Double-check your test data - [u'123', u'hello'] won't produce what you describe, but [u'123', u' hello'] would.
You can specify QUOTE_NONE if you're sure that's the behavior you want, in which case it'll either try to escape instances of your delimiter character if you set an escape character, or raise an exception if you don't.
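A minimal sketch of the two behaviours described above (Python 3 syntax, writing to an in-memory buffer): QUOTE_MINIMAL quotes a field only when it contains the delimiter, while QUOTE_NONE with an escapechar escapes the delimiter instead of quoting.

```python
import csv
import io

row = ['123', 'hello world']

# Default QUOTE_MINIMAL: the second field contains the delimiter (a space),
# so it gets quoted.
buf = io.StringIO()
csv.writer(buf, delimiter=' ').writerow(row)
print(buf.getvalue())  # 123 "hello world"

# QUOTE_NONE with an escapechar: the delimiter inside the field is escaped
# instead of the field being quoted.
buf = io.StringIO()
csv.writer(buf, delimiter=' ', quoting=csv.QUOTE_NONE,
           escapechar='\\').writerow(row)
print(buf.getvalue())  # 123 hello\ world
```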
Do you need the csv lib? Just join the strings...
>>> res = [u'123', u'hello']
>>> print res
[u'123', u'hello']
>>> print " ".join(res)
123 hello
What worked for me was using a regular file write, not csv.writer, and simply putting your delimiter between the columns ('\t' in my case):
with open(target_path, 'w', encoding='utf-8') as fd:
    # some code iterating over a pandas dataframe called mydf
    # create a string out of column 0, '\t' (tab delimiter) and column 1:
    output = mydf.loc[i][0] + '\t' + mydf.loc[i][1] + '\n'
    # write that output string (line) to the file in every iteration
    fd.write(output)
It might not be the "correct" way but it definitely kept the original lines in my project, which included many strings and quotations.
I am trying to read a CSV file that sometimes uses double quotes (") for strings and sometimes uses single quotes (') for strings.
I would like to read the file to properly handle these strings.
It is not necessary, but it would be helpful if "don't" was parsed correctly. This is why I want to avoid just replacing every ' with ".
A crude way to handle this would be to use regex to detect any single quotation which is either preceded by a space, or followed by a space.
We can then replace just these quotations with " and ignore the ones which have letters directly next to them.
CSV
"""Let's do a test""","""We will replace all 'single' quotation's not within""","""A word to """""
Python
import re
pattern = r'((?<=\s)\')|(\'(?=\s))'
data = []
with open('hello.csv', 'r') as file:
    for row in file.readlines():
        data.append(re.sub(pattern, '"', row))
Output
['"""Let\'s do a test""","""We will replace all "single" quotation\'s not within""","""A word to """""\n']
You can use quoting=csv.QUOTE_NONE to prevent quote processing by csv.reader, then use ast.literal_eval to interpret values as Python literals (or, if that fails, keep them as strings).
import io
import csv
from ast import literal_eval
def unquote(item):
    item = item.strip()
    try:
        return literal_eval(item)
    except ValueError:
        return item
f = io.StringIO(r'''
bare, "John \"O'Brien\" Smith", 'John "O\'Brien" Smith', 42
'''.strip())
reader = csv.reader(f, quoting=csv.QUOTE_NONE)
for row in reader:
    parsed_row = [unquote(item) for item in row]
    print(parsed_row)
# => ['bare', 'John "O\'Brien" Smith', 'John "O\'Brien" Smith', 42]
Note though that since they are evaluated as Python literals, any unquoted field values that represent valid Python literals (e.g. True or 42) will not remain as strings.
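A quick sketch of that caveat (Python 3 syntax). The extra SyntaxError catch is an assumption not present in the original unquote; literal_eval can also raise it for malformed input such as a stray apostrophe.

```python
from ast import literal_eval

def unquote(item):
    # Try to interpret the field as a Python literal; fall back to the
    # stripped string when it isn't one.
    try:
        return literal_eval(item.strip())
    except (ValueError, SyntaxError):
        return item.strip()

print(unquote('42'))    # 42   -- an int, not the string '42'
print(unquote('True'))  # True -- a bool
print(unquote('bare'))  # bare -- left as a string
```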
Problem: I cannot seem to parse the information in a text file because Python reads each line as one full string, not individual separate strings. The spaces between the variables are not \t, which is why it does not separate. Is there a way for Python to flexibly remove the spaces and put a comma or \t instead?
Example DATA:
MOR125-1 MOR129-1 0.587
MOR125-1 MOR129-3 0.598
MOR129-1 MOR129-3 0.115
The code I am using:
with open("Distance_Data_No_Bootstrap_RAW.txt", "rb") as f:
    reader = csv.reader(f, delimiter="\t")
    d = list(reader)
for i in range(3):
    print d[i]
Output:
['MOR125-1 MOR129-1 0.587']
['MOR125-1 MOR129-3 0.598']
['MOR129-1 MOR129-3 0.115']
Desired Output:
['MOR125-1', 'MOR129-1', '0.587']
['MOR125-1', 'MOR129-3', '0.598']
['MOR129-1', 'MOR129-3', '0.115']
You can simply declare the delimiter to be a space and ask csv to skip initial spaces after a delimiter. That way, your separator effectively behaves like the regular expression ' +', that is, one or more spaces.
rd = csv.reader(fd, delimiter=' ', skipinitialspace=True)
for row in rd:
    print row
['MOR125-1', 'MOR129-1', '0.587']
['MOR125-1', 'MOR129-3', '0.598']
['MOR129-1', 'MOR129-3', '0.115']
You can instruct csv.reader to use space as delimiter and skip all the extra space:
reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
For detailed information about available parameters check Python docs:
Dialect.delimiter
A one-character string used to separate fields. It defaults to ','.
Dialect.skipinitialspace
When True, whitespace immediately following the delimiter is ignored. The default is False.
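A small sketch (Python 3 syntax) of what skipinitialspace changes, assuming a sample line with runs of three spaces between columns: without it, every extra space produces an empty field.

```python
import csv
import io

line = 'MOR125-1   MOR129-1   0.587\n'

# Plain space delimiter: each of the extra spaces yields an empty field.
plain = next(csv.reader(io.StringIO(line), delimiter=' '))
print(plain)    # ['MOR125-1', '', '', 'MOR129-1', '', '', '0.587']

# With skipinitialspace, whitespace after a delimiter is swallowed.
skipped = next(csv.reader(io.StringIO(line), delimiter=' ',
                          skipinitialspace=True))
print(skipped)  # ['MOR125-1', 'MOR129-1', '0.587']
```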
I am a second year EE student.
I just started learning python for my project.
I intend to parse a csv file with a format like
3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1
2466023,"Montréal (Que.)",V ,F,1620693,1583590,T,F,2.3,787060,743204,365.1303,4438.7,2
5915022,"Vancouver (B.C.)",CY ,F,578041,545671,F,F,5.9,273804,253212,114.7133,5039.0,8
3519038,"Richmond Hill (Ont.)",T ,F,162704,132030,F,F,23.2,53028,51000,100.8917,1612.7,28
into a text file like the following
Toronto 2503281
Montreal 1620693
Vancouver 578041
I am extracting the 1st and 5th columns and saving them into a text file.
This is what i have so far.
import csv
file = open('raw.csv')
reader = csv.reader(file)
f = open('NicelyDone.text','w')
for line in reader:
    f.write("%s %s"%line[1],%line[5])
This is not working for me. I was able to extract the data from the csv file as line[1], line[5] (I am able to print it out),
but I don't know how to write it to a .text file in the format I wanted.
Also, I have to process the first column, e.g. "Toronto (Ont.)" into "Toronto".
I am familiar with the function find(); I assume I could extract Toronto out of "Toronto (Ont.)" using "(" as the stopping character,
but based on my research, I have no idea how to use it so that it returns the string "Toronto".
Here is my question:
What is the data type of line[1]?
If it is a string, why does f.write() not work?
If it is not a string, how do I convert it to a string?
How do I extract the word Toronto out of "Toronto (Ont.)" into string form using find() or other methods?
My thinking is that I could add those 2 strings together like c = a + ' ' + b, which would give me the format I wanted.
Then I can use f.write() to write it into a file :)
Sorry if my questions sound too easy or stupid.
Thanks ahead
Zhen
All data you get from csv.reader are strings.
There is a variety of solutions to this, but the simplest would be to split on ( and strip away any whitespace:
>>> a = 'Toronto (Ont.)'
>>> b = a.split('(')
>>> b
['Toronto ', 'Ont.)']
>>> c = b[0]
>>> c
'Toronto '
>>> c.strip()
'Toronto'
or in one line:
>>> print 'Toronto (Ont.)'.split('(')[0].strip()
Toronto
Another option would have been to use a regular expression (the re module).
The specific problem in your code lies here:
f.write("%s %s"%line[1],%line[5])
Using the % syntax to format your string, you have to provide either a single value, or an iterable. In your case this should be:
f.write("%s %s" % (line[1], line[5]))
Another way to do the exact same thing, is to use the format method.
f.write('{} {}'.format(line[1], line[5]))
This is a flexible way of formatting strings, and I recommend that you read about it in the docs.
Regarding your code, there are a couple of things you should consider.
Always remember to close your file handlers. If you use with open(...) as fp, this is taken care of for you.
with open('myfile.txt') as ifile:
    # Do stuff
# The file is closed here
Don't use reserved words as your variable name. file is such a thing, and by using it as something else (shadowing it), you may cause problems later on in your code.
To write your data, you can use csv.writer:
with open('myfile.txt', 'wb') as ofile:
    writer = csv.writer(ofile)
    writer.writerow(['my', 'data'])
From Python 2.6 and above, you can combine multiple with statements in one statement:
with open('raw.csv') as ifile, open('NicelyDone.text', 'w') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile)
Combining this knowledge, your script can be rewritten to something like:
import csv
with open('raw.csv') as ifile, open('NicelyDone.text', 'wb') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile, delimiter=' ')
    for row in reader:
        city, num = row[1].split('(')[0].strip(), row[5]
        writer.writerow([city, num])
I don't recall csv that well, so I don't know if it's a string or not. What error are you getting? In any case, assuming it is a string, your line should be:
f.write("%s %s " % (line[1], line[5]))
In other words, you need a set of parentheses. Also, you should have a trailing space in your string.
A somewhat hackish but concise way to do this is: line[1].split("(")[0]
This will create a list that splits on the ( symbol, and then you extract the first element.
I am writing the csv file like this
for a in products:
    mylist = []
    for h in headers['product']:
        mylist.append(a.get(h))
    writer.writerow(mylist)
A few of my fields are text fields that can contain any characters, like , " ' \n or anything else. What is the safest way to write them to the csv file? The file will also have integers and floats.
You should use QUOTE_ALL quoting option:
import StringIO
import csv
row = ["AAA \n BBB ,222 \n CCC;DDD \" EEE ' FFF 111"]
output = StringIO.StringIO()
wr = csv.writer(output, quoting=csv.QUOTE_ALL)
wr.writerow( row )
# Test:
contents = output.getvalue()
parsedRow = list(csv.reader([contents]))[0]
if parsedRow == row: print "BINGO!"
using csv.QUOTE_ALL will ensure that all of your entries are quoted like so:
"value1","value2","value3" while using csv.QUOTE_NONE will give you: value1,value2,value3
Additionally, QUOTE_ALL will double every quote character inside your entries, so somedata"user"somemoredata will become
"somedata""user""somemoredata" in your written .csv.
However, if you set your escapechar to the backslash character (for example), every quote in your entry will be written as \":
create = csv.writer(open("test.csv", "wb"), quoting=csv.QUOTE_NONE,
                    escapechar='\\', quotechar='"')
for element in file:
    create.writerow(element)
and the previous example will become somedata\"user\"somemoredata which is clean. It will also escape any commas that you have in your elements the same way.
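A self-contained round-trip sketch of that writer setup (Python 3 syntax, writing to an in-memory buffer instead of test.csv): with QUOTE_NONE and a backslash escapechar, both quotes and delimiters inside a field are escaped rather than quoted.

```python
import csv
import io

row = ['somedata"user"somemoredata', 'a,b']

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONE,
                    escapechar='\\', quotechar='"')
writer.writerow(row)

# Quotes become \" and the embedded comma becomes \,
print(buf.getvalue())  # somedata\"user\"somemoredata,a\,b
```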
I am trying to read a bunch of data in .csv file into an array in format:
[ [a,b,c,d], [e,f,g,h], ...]
Running the code below, when I print an entry that contains a space (' '), the entry is split at the first space, so I'm not accessing the element correctly.
For example if Business, Fast Company, Youtube, fastcompany is the 10th entry...when I print the below I get on separate lines:
Business,Fast
Company,YouTube,FastCompany
Any advice on how to get as the result: [ [a,b,c,d], [Business, Fast Company, Youtube, fastcompany], [e,f,g,h], ...]?
import csv
partners = []
partner_dict = {}
i=9
with open('partners.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        partners.append(row)
print len(partners)
for entry in partners[i]:
    print entry
The delimiter argument specifies which character to use to split each row of the file into separate values. Since you're passing ' ' (a space), the reader is splitting on spaces.
If this is really a comma-separated file, use ',' as the delimiter (or just leave the delimiter argument out and it will default to ',').
Also, the pipe character is an unusual value for the quote character. Is it really true that your input file contains pipes in place of quotes? The sample data you supplied contains neither pipes nor quotes.
There are a few issues with your code:
The "correct" syntax for iterating over a list is for entry in partners:, not for entry in partners[i]:
The partners_dict variable in your code seems to be unused, I assume you'll use it later, so I'll ignore it for now
You're opening a text file as binary (use open(file_name, "r") instead of open(file_name, "rb"))
Your handling of the processed data is still done inside of the context manager (with ... [as ...]:-block)
Your input text seems to delimit by ", ", but you delimit by " " when parsing
If I understood your question right, your problem seems to be caused by the last one. The "obvious solution" would probably be to change the delimiter argument to ", ", but only single-char strings are allowed as delimiters by the module. So what do we do?
Well, since "," is really the "true" delimiter (it's never supposed to be inside actual unquoted data, contrary to spaces), that seems like a good choice. However, now all your values start with " ", which is probably not what you want.
So what do you do? All strings have a pretty neat strip() method which by default removes all whitespace at the beginning and end of the string. So, to strip() all the values, let's use a "list comprehension" (which evaluates an expression on every item in a list and returns a new list with the new values); it should look somewhat like [i.strip() for i in row] before appending it to partners.
In the end your code should hopefully look somewhat like this:
import csv
partners = []
with open('partners.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in spamreader:
        partners.append([i.strip() for i in row])
print len(partners)
for entry in partners:
    print entry