Problem: I cannot seem to parse the information in a text file because Python reads each line as one full string instead of separate strings. The whitespace between the fields is not a \t, which is why it does not split. Is there a way for Python to flexibly remove the spaces and treat them like a comma or \t instead?
Example DATA:
MOR125-1 MOR129-1 0.587
MOR125-1 MOR129-3 0.598
MOR129-1 MOR129-3 0.115
The code I am using:
with open("Distance_Data_No_Bootstrap_RAW.txt","rb") as f:
reader = csv.reader(f,delimiter="\t")
d=list(reader)
for i in range(3):
print d[i]
Output:
['MOR125-1 MOR129-1 0.587']
['MOR125-1 MOR129-3 0.598']
['MOR129-1 MOR129-3 0.115']
Desired Output:
['MOR125-1', 'MOR129-1', '0.587']
['MOR125-1', 'MOR129-3', '0.598']
['MOR129-1', 'MOR129-3', '0.115']
You can simply declare the delimiter to be a space and ask csv to skip initial spaces after a delimiter. That way, your separator effectively behaves like the regular expression ' +', that is, one or more spaces.
rd = csv.reader(fd, delimiter=' ', skipinitialspace=True)
for row in rd:
    print row
['MOR125-1', 'MOR129-1', '0.587']
['MOR125-1', 'MOR129-3', '0.598']
['MOR129-1', 'MOR129-3', '0.115']
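To see why skipinitialspace matters with a space delimiter, here is a minimal sketch (the in-memory line is made up, since the exact spacing in the real file isn't shown); without it, every extra space turns into an empty field:
import csv

lines = ["MOR125-1   MOR129-1   0.587"]  # made-up spacing, standing in for the file

# Without skipinitialspace, each extra space produces an empty field.
print(next(csv.reader(lines, delimiter=' ')))
# ['MOR125-1', '', '', 'MOR129-1', '', '', '0.587']

# With skipinitialspace=True, a run of spaces acts as a single separator.
print(next(csv.reader(lines, delimiter=' ', skipinitialspace=True)))
# ['MOR125-1', 'MOR129-1', '0.587']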
You can instruct csv.reader to use a space as the delimiter and to skip all the extra spaces:
reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
For detailed information about the available parameters, check the Python docs:
Dialect.delimiter
A one-character string used to separate fields. It defaults to ','.
Dialect.skipinitialspace
When True, whitespace immediately following the delimiter is ignored. The default is False.
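Putting that together with the question's file, a minimal sketch (the filename comes from the question; the file is opened in text mode, which csv expects in Python 3):
import csv

with open("Distance_Data_No_Bootstrap_RAW.txt", "r") as f:
    reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
    for row in reader:
        print(row)
        # ['MOR125-1', 'MOR129-1', '0.587'], ...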
I want to export some data from a DB to a CSV file. I need to add a '|' delimiter to specific fields. At the moment, when I export the file, I use something like this:
- To specific fields (at the beginning and end) I add '|':
....
if response.value_display.startswith('|'):
    sheets[response.sheet.session][response.input.id] = response.value_display
else:
    sheets[response.sheet.session][response.input.id] = '|' + response.value_display + '|'
....
And I have the CSV writer set up like this:
self.writer = csv.writer(self.queue, dialect=dialect,
                         lineterminator='\n',
                         quotechar='',
                         quoting=csv.QUOTE_NONE,
                         escapechar=' ',
                         **kwargs)
Now it works, but when I have DateTime fields (which contain a space), the writer adds some extra space.
When I use the default settings, the CSV writer sometimes adds double quotes at the beginning and end, and I don't know why or what it depends on.
To remove your extra spaces I would just do something like this:
with open("the_file.csv") as f:            # open your csv file
    cleaned = f.read().replace("  ", " ")  # finds any two spaces and replaces them with one
with open("the_file.csv", "w") as f:       # write the cleaned text back out
    f.write(cleaned)
Adding the delimiter is specific to the situation. If you want to add it at the beginning or the end:
delimiter = "|"
my_str = my_str + delimiter
or
delimiter = "|"
my_str = delimiter + my_str
If you want to add the delimiter somewhere else you may have to get creative as it would be based on the context.
I'm not sure about the double quotes; I'd replace them the same way as the spaces.
with open("the_file.csv") as f:            # open your csv file
    cleaned = f.read().replace("\"", "'")  # swap every double quote for a single quote
with open("the_file.csv", "w") as f:
    f.write(cleaned)
Assuming you wanted to replace the double quote with a single quote.
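As a likely explanation for the extra space around DateTime values mentioned in the question: with quoting=csv.QUOTE_NONE, the writer escapes every occurrence of the delimiter, the quotechar and the escapechar itself, so escapechar=' ' means each space inside a value gets another space written in front of it. A small Python 3 sketch (with a made-up DateTime string) showing the difference:
import csv
import io

row = ["abc", "2017-01-01 12:30:00"]   # made-up values; the second contains a space

# escapechar=' ' (as in the question): the space is the escape character,
# so it is escaped with itself and comes out doubled.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar=' ').writerow(row)
print(buf.getvalue().rstrip())   # abc,2017-01-01  12:30:00

# escapechar='\\': the space is not special, so it is written as-is.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar='\\').writerow(row)
print(buf.getvalue().rstrip())   # abc,2017-01-01 12:30:00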
I'm trying to make the csv module parse lines containing quoted strings and quoted separators. Unfortunately I'm not able to achieve the desired result with any dialect/format parameters. Is there any way to parse this:
'"AAA", BBB, "CCC, CCC"'
and get this:
['"AAA"', 'BBB', '"CCC, CCC"'] # 3 elements, one quoted separator
?
Two fundamental requirements:
Quotations have to be preserved
Quoted, and not escaped separators have to be copied as regular characters
Is it possible?
There are 2 issues to overcome:
spaces around the comma separator: skipinitialspace=True does the job (see also Python parse CSV ignoring comma with double-quotes)
preserving the quoting when reading: replacing each quote with a tripled quote preserves the quotes
That second part is described in the documentation as:
Dialect.doublequote
Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.
standalone example, without file:
import csv
data = ['"AAA", BBB, "CCC, CCC"'.replace('"','"""')]
cr = csv.reader(data,skipinitialspace=True)
row = next(cr)
print(row)
result:
['"AAA"', 'BBB', '"CCC, CCC"']
with a file as input:
import csv
with open("input.csv") as f:
cr = csv.reader((l.replace('"','"""' for l in f),skipinitialspace=True)
for row in cr:
print(row)
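For contrast, without the replace the default dialect consumes the quotes, which is exactly what the question is trying to avoid (a quick sketch on the question's sample line):
import csv

data = ['"AAA", BBB, "CCC, CCC"']
print(next(csv.reader(data, skipinitialspace=True)))
# ['AAA', 'BBB', 'CCC, CCC']  -- the quotes are stripped, not preserved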
Have you tried this?
import csv

with open('file.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        print row
I'm trying to write a list of strings like below to a file separated by the given delimiter.
res = [u'123', u'hello world']
When I try splitting by TAB like below it gives me the correctly formatted string.
writer = csv.writer(sys.stdout, delimiter="\t")
writer.writerow(res)
gives --> 123 hello world
But when I try to split by space using delimiter=" ", it gives me the space-separated output but with quotation marks, like below.
123 "hello world"
How do I remove the quotation marks, so that when I use space as the delimiter I get
123 hello world
EDIT: when I try using the escapechar it doesn't add any double quotes, but everywhere a space appears in my test data, it doubles the space.
You can set the csv.writer to quote nothing with quoting=csv.QUOTE_NONE for example:
import csv

with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            escapechar=' ', quoting=csv.QUOTE_NONE)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
Produces:
Spam Spam Spam Spam Spam Baked Beans
Spam Lovely Spam Wonderful Spam
If you use QUOTE_NONE you also need an escapechar.
Quoting behavior is controlled by the various quoting arguments provided to the writer (or set on the Dialect object if you prefer to do things that way). The default setting is QUOTE_MINIMAL, which will not produce the behavior you're describing unless a value contains your delimiter character, quote character, or line terminator character. Double-check your test data - [u'123', u'hello'] won't produce what you describe, but [u'123', u' hello'] would.
You can specify QUOTE_NONE if you're sure that's the behavior you want, in which case it'll either try to escape instances of your delimiter character if you set an escape character, or raise an exception if you don't.
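A small Python 3 sketch of both behaviours described above, reusing the question's values (the second value contains the space delimiter):
import csv
import io

row = ['123', 'hello world']

# QUOTE_MINIMAL (the default): only the value containing the delimiter is quoted.
buf = io.StringIO()
csv.writer(buf, delimiter=' ').writerow(row)
print(buf.getvalue().rstrip())   # 123 "hello world"

# QUOTE_NONE plus an escape character: no quotes, the delimiter is escaped instead.
buf = io.StringIO()
csv.writer(buf, delimiter=' ', quoting=csv.QUOTE_NONE, escapechar='\\').writerow(row)
print(buf.getvalue().rstrip())   # 123 hello\ world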
Do you need the csv lib? Just join the strings...
>>> res = [u'123', u'hello']
>>> print res
[u'123', u'hello']
>>> print " ".join(res)
123 hello
What worked for me was using a regular file write instead of csv.writer, simply putting your delimiter between the columns ('\t' in my case):
with open(target_path, 'w', encoding='utf-8') as fd:
    # some code iterating over a pandas dataframe called mydf
    # create a string out of column 0, '\t' (tab delimiter) and column 1:
    output = mydf.loc[i][0] + '\t' + mydf.loc[i][1] + '\n'
    # write that output string (line) to the file in every iteration
    fd.write(output)
It might not be the "correct" way but it definitely kept the original lines in my project, which included many strings and quotations.
I am writing the csv file like this
for a in products:
    mylist = []
    for h in headers['product']:
        mylist.append(a.get(h))
    writer.writerow(mylist)
A few of my fields are text fields that can contain any characters, like , " ' \n or anything else. What is the safest way to write these to a CSV file? The file will also contain integers and floats.
You should use the QUOTE_ALL quoting option:
import StringIO
import csv
row = ["AAA \n BBB ,222 \n CCC;DDD \" EEE ' FFF 111"]
output = StringIO.StringIO()
wr = csv.writer(output, quoting=csv.QUOTE_ALL)
wr.writerow( row )
# Test:
contents = output.getvalue()
parsedRow = list(csv.reader([contents]))[0]
if parsedRow == row: print "BINGO!"
using csv.QUOTE_ALL will ensure that all of your entries are quoted like so:
"value1","value2","value3" while using csv.QUOTE_NONE will give you: value1,value2,value3
Additionally, this will double every quote character inside an entry, so somedata"user"somemoredata will become "somedata""user""somemoredata" in your written .csv.
However, if you set your escapechar to the backslash character (for example) and use QUOTE_NONE, every quote in your entry will be written as \" instead.
create = csv.writer(open("test.csv", "wb"), quoting=csv.QUOTE_NONE, escapechar='\\', quotechar='"')
for element in file:
    create.writerow(element)
and the previous example will become somedata\"user\"somemoredata which is clean. It will also escape any commas that you have in your elements the same way.
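A quick Python 3 sketch of the two styles described above (the first value comes from the example in this answer; the 'a,b' value is made up to show the comma escaping):
import csv
import io

row = ['somedata"user"somemoredata', 'a,b']

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
print(buf.getvalue().rstrip())   # "somedata""user""somemoredata","a,b"

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar='\\').writerow(row)
print(buf.getvalue().rstrip())   # somedata\"user\"somemoredata,a\,b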
I am trying to read a bunch of data in .csv file into an array in format:
[ [a,b,c,d], [e,f,g,h], ...]
Running the code below, when I print an entry that contains a space (' '), the result isn't correct because the value gets cut at the first space (' ').
For example, if Business, Fast Company, Youtube, fastcompany is the 10th entry, then printing it as below gives me, on separate lines:
Business,Fast
Company,YouTube,FastCompany
Any advice on how to get the result [ [a,b,c,d], [Business, Fast Company, Youtube, fastcompany], [e,f,g,h], ...]?
import csv

partners = []
partner_dict = {}
i = 9

with open('partners.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        partners.append(row)
    print len(partners)
    for entry in partners[i]:
        print entry
The delimiter argument specifies which character to use to split each row of the file into separate values. Since you're passing ' ' (a space), the reader is splitting on spaces.
If this is really a comma-separated file, use ',' as the delimiter (or just leave the delimiter argument out and it will default to ',').
Also, the pipe character is an unusual value for the quote character. Is it really true that your input file contains pipes in place of quotes? The sample data you supplied contains neither pipes nor quotes.
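A short sketch of that fix, assuming partners.csv really is comma-separated (the quotechar is left at its default):
import csv

partners = []
with open('partners.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):   # delimiter defaults to ','
        partners.append(row)

print(partners[9])
# e.g. ['Business', ' Fast Company', ' Youtube', ' fastcompany']
# (note the leading spaces after each comma; the next answer strips them)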
There are a few issues with your code:
The "correct" syntax for iterating over a list is for entry in partners:, not for entry in partners[i]:
The partners_dict variable in your code seems to be unused, I assume you'll use it later, so I'll ignore it for now
You're opening a text file as binary (use open(file_name, "r") instead of open(file_name, "rb"))
Your handling of the processed data is still done inside of the context manager (with ... [as ...]: block)
Your input text seems to be delimited by ", ", but you split on " " when parsing
If I understood your question right, your problem is caused by the last one. The "obvious" solution would be to change the delimiter argument to ", ", but the module only accepts single-character delimiters. So what do we do? Well, "," is really the "true" delimiter here (unlike a space, it is never supposed to appear inside actual unquoted data), so that is the right choice. However, now all your values start with " ", which is probably not what you want.
Luckily, all strings have a handy strip() method which by default removes whitespace from the beginning and end of a string. So, to strip() all the values, use a list comprehension (it evaluates an expression on every item in a list and returns a new list with the results), which should look somewhat like [i.strip() for i in row], before appending the row to partners.
In the end your code should hopefully look somewhat like this:
import csv

partners = []

with open('partners.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in spamreader:
        partners.append([i.strip() for i in row])

print len(partners)
for entry in partners:
    print entry
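With a hypothetical partners.csv row such as Business, Fast Company, Youtube, fastcompany, that entry would then print as ['Business', 'Fast Company', 'Youtube', 'fastcompany'], which matches the desired result.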