I am reading in a csv file and dealing with each line as a list. At the end, I'd like to reprint to a .csv file, but the lines aren't necessarily even. I obviously cannot just go "print row", since this will print it as a list. How can I print it in .csv format?
Read manual, there's a CSV writer method (with example too). Don't print the data, store them and then write them into CSV file
http://docs.python.org/library/csv.html#csv.writer
Assuming that "row" contains a list of strings, you could try using
print ",".join(row)
Though it looks like your question was more towards writing to a csv file rather than printing it to standard output, here an example to do the latter using StringIO:
import StringIO
import csv
a_list_from_csv_file = [['for', 'bar'], [1, 2]]
out_fd = StringIO.StringIO()
writer = csv.writer(out_fd, delimiter='|')
for row in a_list_from_csv_file:
writer.writerow(row)
print out_fd.getvalue()
Like this you can use the different delimiters or escape characters as used by the csv you are reading.
What do you mean by "the lines aren't necessarily even"? Are you using a homebrew CSV parser or are you using the csv module?
If the former and there's nothing special you need to escape, you could try something like
print ",".join([ '"' + x.replace('"', '""') + '"' for x in row])
If you don't want ""s around each field, maybe make a function like escape_field() that checks if it needs to be wrapped in double quotes and/or escaped.
Related
I'm doing some measurements in the lab and want to transform them into some nice Python plots. The problem is the way the software exports CSV files, as I can't find a way to properly read the numbers. It looks like this:
-10;-0,0000026
-8;-0,00000139
-6;-0,000000546
-4;-0,000000112
-2;-5,11E-09
0,0000048;6,21E-09
2;0,000000318
4;0,00000304
6;0,0000129
8;0,0000724
10;0,000268
Separation by ; is fine, but I need every , to be ..
Ideally I would like Python to be able to read numbers such as 6.21E-09 as well, but I should be able to fix that in excel...
My main issue: Change every , to . so Python can read them as a float.
The simplest way would be for you to convert them to string and then use the .replace() method to pretty much do anything. For i.e.
txt = "0,0000048;6,21E-09"
txt = txt.replace(';', '.')
You could also read the CSV file (I don't know how you are reading the file) but depending on the library, you could change the 'delimiter' (to : for example). CSV is Comma-separated values and as the name implies, it separates columns by means of '.
You can do whatever you want in Python, for example:
import csv
with open('path_to_csv_file', 'r') as csv_file:
data = list(csv.reader(csv_file, delimiter=';'))
data = [(int(raw_row[0]), float(raw_row[1].replace(',', '.'))) for row in data]
with open('path_to_csv_file', 'w') as csv_file:
writer = csv.writer(csv_file, delimiter=';')
writer.writerows(data)
Can you consider a regex to match the ',' all in the text, then loop the match results in a process that takes ',' to '.'.
I downloaded data from internet and saved as a csv (comma delimited) file. The image shows what the file looks like in excel.
Using csv.reader in python, I printed each row. I have shown my code below along with the output in Spyder.
import csv
with open('p_dat.csv', 'r') as file:
reader = csv.reader(file)
for row in reader:
print(row)
I am very confused as to why my values are not comma separated. Any help will be greatly appreciated.
As pointed out in the comments, technically this is a TSV (tab-separated values) file, which is actually perfectly valid.
In practice, of course, not all libraries will make a "hard" distinction between a TSV and CSV file. The way you parse a TSV file is basically the same as the way you parse a CSV file, except that the delimiter is different.
There are actually multiple valid delimiters for this kind of file, such as tabs, commas, and semicolons. Which one you choose is honestly a matter of preference, not a "hard" technical limit.
See the specification for csvs. There are many options for the delimiter in the file. In this case you have a tab, \t.
The option is important. Suppose your data had commas in it, then a , as a delimiter would not be a good choice.
Even though they're named comma-separated values, they're sometimes separated by different symbols (like the tab character that you have currently).
If you want to use Python to view this as a comma-separated file, you can try something like:
import csv
...
with open('p_dat.csv', 'r') as file:
reader = csv.reader(file)
for row in reader:
commarow = row.replace("\t",",")
print(commarow)
I'm trying to convert csv format to tsv.
I just used this to change comma to tab.
tsv = re.sub(',', '\t', csv)
But I can't deal with the string with comma inside, for example:
dept_no,dt,hello
1,20180411,hello its me
2,20180412,here has tab
3,20180412,"here, is, commas"
Is there any way to convert to tsv without affecting comma inside of the string?
Try following, csv module should take care of inline commas.
import csv
csv.writer(file('output.tsv', 'w+'), delimiter='\t').writerows(csv.reader(open("input.csv")))
EDIT-1
[Added open method as per discussion]
Insted of using file where we are calling constructor directly, you can use open which is preferred way as per documentation here.
Result is same.
csv.writer(open('output.tsv', 'w+'), delimiter='\t').writerows(csv.reader(open("input.csv")))
Result
I have an excel sheet that has a lot of data in it in one column in the form of a python dictionary from a sql database. I don't have access to the original database and I can't import the CSV back into sql with the local infile command due to the fact that the keys/values on each row of the CSV are not in the same order. When I export the excel sheet to CSV I get:
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"
What is the best way to remove the " before and after the curly brackets as well as the extra " around the keys/values?
I also need to leave the integers alone that don't have quotes around them.
I am trying to then import this into python with the json module so that I can print specific keys but I can't import them with the doubled double quotes. I ultimately need the data saved in a file that looks like:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Any help is most appreciated!
Easy:
text = re.sub(r'"(?!")', '', text)
Given the input file: TEST.TXT:
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"
The script:
import re
f = open("TEST.TXT","r")
text_in = f.read()
text_out = re.sub(r'"(?!")', '', text_in)
print(text_out)
produces the following output:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
This should do it:
with open('old.csv') as old, open('new.csv', 'w') as new:
new.writelines(re.sub(r'"(?!")', '', line) for line in old)
If the input file is just as shown, and of the small size you mention, you can load the whole file in memory, make the substitutions, and then save it. IMHO, you don't need a RegEx to do this. The easiest to read code that does this is:
with open(filename) as f:
input= f.read()
input= str.replace('""','"')
input= str.replace('"{','{')
input= str.replace('}"','}')
with open(filename, "w") as f:
f.write(input)
I tested it with the sample input and it produces:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Which is exactly what you want.
If you want, you can also pack the code and write
with open(inputFilename) as if:
with open(outputFilename, "w") as of:
of.write(if.read().replace('""','"').replace('"{','{').replace('}"','}'))
but I think the first one is much clearer and both do exactly the same.
I think you are overthinking the problem, why don't replace data?
l = list()
with open('foo.txt') as f:
for line in f:
l.append(line.replace('""','"').replace('"{','{').replace('}"','}'))
s = ''.join(l)
print s # or save it to file
It generates:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Use a list to store intermediate lines and then invoke .join for improving performance as explained in Good way to append to a string
You can actual use the csv module and regex to do this:
st='''\
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\
'''
import csv, re
data=[]
reader=csv.reader(st, dialect='excel')
for line in reader:
data.extend(line)
s=re.sub(r'(\w+)',r'"\1"',''.join(data))
s=re.sub(r'({[^}]+})',r'\1\n',s).strip()
print s
Prints
{"first_name":"John","last_name":"Smith","age":"30"}
{"first_name":"Tim","last_name":"Johnson","age":"34"}
I have a text file (.txt) which could be in tab separated format or pipe separated format, and I need to convert it into CSV file format. I am using python 2.6. Can any one suggest me how to identify the delimiter in a text file, read the data and then convert that into comma separated file.
Thanks in advance
I fear that you can't identify the delimiter without knowing what it is. The problem with CSV is, that, quoting ESR:
the Microsoft version of CSV is a textbook example of how not to design a textual file format.
The delimiter needs to be escaped in some way if it can appear in fields. Without knowing, how the escaping is done, automatically identifying it is difficult. Escaping could be done the UNIX way, using a backslash '\', or the Microsoft way, using quotes which then must be escaped, too. This is not a trivial task.
So my suggestion is to get full documentation from whoever generates the file you want to convert. Then you can use one of the approaches suggested in the other answers or some variant.
Edit:
Python provides csv.Sniffer that can help you deduce the format of your DSV. If your input looks like this (note the quoted delimiter in the first field of the second row):
a|b|c
"a|b"|c|d
foo|"bar|baz"|qux
You can do this:
import csv
csvfile = open("csvfile.csv")
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.DictReader(csvfile, dialect=dialect)
for row in reader:
print row,
# => {'a': 'a|b', 'c': 'd', 'b': 'c'} {'a': 'foo', 'c': 'qux', 'b': 'bar|baz'}
# write records using other dialect
Your strategy could be the following:
parse the file with BOTH a tab-separated csv reader and a pipe-separated csv reader
calculate some statistics on resulting rows to decide which resultset is the one you want to write. An idea could be counting the total number of fields in the two recordset (expecting that tab and pipe are not so common). Another one (if your data is strongly structured and you expect the same number of fields in each line) could be measuring the standard deviation of number of fields per line and take the record set with the smallest standard deviation.
In the following example you find the simpler statistic (total number of fields)
import csv
piperows= []
tabrows = []
#parsing | delimiter
f = open("file", "rb")
readerpipe = csv.reader(f, delimiter = "|")
for row in readerpipe:
piperows.append(row)
f.close()
#parsing TAB delimiter
f = open("file", "rb")
readertab = csv.reader(f, delimiter = "\t")
for row in readerpipe:
tabrows.append(row)
f.close()
#in this example, we use the total number of fields as indicator (but it's not guaranteed to work! it depends by the nature of your data)
#count total fields
totfieldspipe = reduce (lambda x,y: x+ y, [len(f) for f in piperows])
totfieldstab = reduce (lambda x,y: x+ y, [len(f) for f in tabrows])
if totfieldspipe > totfieldstab:
yourrows = piperows
else:
yourrows = tabrows
#the var yourrows contains the rows, now just write them in any format you like
Like this
from __future__ import with_statement
import csv
import re
with open( input, "r" ) as source:
with open( output, "wb" ) as destination:
writer= csv.writer( destination )
for line in input:
writer.writerow( re.split( '[\t|]', line ) )
I would suggest taking some of the example code from the existing answers, or perhaps better use the csv module from python and change it to first assume tab separated, then pipe separated, and produce two output files which are comma separated. Then you visually examine both files to determine which one you want and pick that.
If you actually have lots of files, then you need to try to find a way to detect which file is which.
One of the examples has this:
if "|" in line:
This may be enough: if the first line of a file contains a pipe, then maybe the whole file is pipe separated, else assume a tab separated file.
Alternatively fix the file to contain a key field in the first line which is easily identified - or maybe the first line contains column headers which can be detected.
for line in open("file"):
line=line.strip()
if "|" in line:
print ','.join(line.split("|"))
else:
print ','.join(line.split("\t"))