I want to adapt a csv file from comma-separated to tab-separated. There are also commas inside quoted fields, so I need an exception for those. Some googling and Stack Overflow got me this:
import re
f1 = open('query_result.csv', 'r')
f2 = open('query_result_tab_separated.csv', 'w')
for line in f1:
    line = re.sub(',(?=(([^\"]*\"){2})*[^\"]*$)(?![^\[]*\])', '\t', line)
    f2.write(line)
f1.close()
f2.close()
However, between the quotes I also find escaped quotes \". An example of a line:
"01-003412467812","Drontmann B.V.",1,6420,"Expert in \"Social, Life and Tech Sciences\""
My current code changes the comma after Social into a tab as well, but I don't want that. How can I make an exception for quoted fields, and within that exception another exception for escaped quotes?
You can't do this with regexp.
Python has a csv module which is intended to do this:
import csv
with open('test.csv', 'rb') as csvfile:
    data = csv.reader(csvfile, delimiter=',', quotechar='"', escapechar='\\')
    for row in data:
        print ' | '.join(row)
The csv module can handle this. You can set the escape character and specify how quotes within a field are escaped using escapechar and doublequote:
import csv
with open('file.csv') as infile, open('file_tabs.csv', 'w') as outfile:
    r = csv.reader(infile, doublequote=False, escapechar='\\')
    w = csv.writer(outfile, delimiter='\t', doublequote=False, escapechar='\\')
    w.writerows(r)
This will create a new tab-delimited file that preserves the commas and escaped quotes within a field from the original file. Alternatively, the default settings will use "" (a doubled quote) to escape the quotes:
w = csv.writer(outfile, delimiter='\t')
which would write data like this:
01-003412467812 Drontmann B.V. 1 6420 "Expert in ""Social, Life and Tech Sciences"""
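If you later need to read that tab-separated output back in, here is a minimal sketch (assuming the doublequote-escaped file_tabs.csv written by the snippet above):
import csv

# Read the tab-delimited file produced above; the doubled quotes ("")
# inside fields are undone automatically by the default dialect settings.
with open('file_tabs.csv', newline='') as infile:
    for row in csv.reader(infile, delimiter='\t'):
        print(row)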
Hello, I want to create a csv file in Python with the standard library csv module.
My code:
import csv

csv_columns = ['receiptNr', 'category', 'date', 'allvalue', 'quantity', 'value']
dict_data = {'receiptNr': 293293, 'category': 'Sbudget', 'date': '29.11.2020', 'allvalue': '2.70', 'quantity': '2 STK', 'value': '1.35'}
csv_file = r"C:\Maturaprojekt\table.csv"

try:
    with open(csv_file, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=csv_columns, lineterminator='\n')
        writer.writeheader()
        writer.writerow(dict_data)
except IOError:
    print("I/O error")
The output I get and the output I want were attached as screenshots from my spreadsheet program, along with the raw text data; they are not reproduced here.
That's a delimiter problem.
The delimiter used in your csv file is probably not the same delimiter used in your spreadsheet program.
You can add a delimiter parameter to csv.DictWriter to change the delimiter of your csv file if you want, like this:
writer = csv.DictWriter(csvfile, delimiter=';', fieldnames=csv_columns, lineterminator='\n')
Or you could change the csv file delimiter chosen in your spreadsheet program.
As @Chris commented, you're showing us screenshots from a spreadsheet program; you will have to select the right delimiter in your spreadsheet program (usually when you open the csv file).
In short: the default delimiter used by csv.DictWriter is not the same that is used in your spreadsheet program.
How you can change the delimiter (separator) used by your spreadsheet program depends on the spreadsheet program you use.
You could go to your favorite search engine and search for something like this:
csv file change separator libreoffice, or csv file change separator excel, or csv file change delimiter excel, etc.
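For completeness, here is a minimal sketch of the script above with delimiter=';' passed to DictWriter (same columns and data as above; whether ';' is the right choice depends on your spreadsheet's locale settings, and newline='' is used on open as the csv docs recommend instead of lineterminator):
import csv

csv_columns = ['receiptNr', 'category', 'date', 'allvalue', 'quantity', 'value']
dict_data = {'receiptNr': 293293, 'category': 'Sbudget', 'date': '29.11.2020', 'allvalue': '2.70', 'quantity': '2 STK', 'value': '1.35'}

with open(r"C:\Maturaprojekt\table.csv", 'w', newline='') as csvfile:
    # delimiter=';' makes spreadsheet programs that expect semicolons split the columns correctly
    writer = csv.DictWriter(csvfile, fieldnames=csv_columns, delimiter=';')
    writer.writeheader()
    writer.writerow(dict_data)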
I need to process some .csv files. Some of them have field entries containing a single double quote (") or possibly several, mixed in with other text. I need to escape them all. So far I'm doing this:
def process_file():
    input_path = 'input.txt'
    output_path = 'output.txt'
    with open(input_path) as input_file, open(output_path, 'w+') as output_file:
        for line in input_file:
            newline = line.replace('"', '""""')
            output_file.write(newline)
How can I make sure that the replacement only happens for single quote characters and does not touch "" or """", for example?
I'd like to use python instead of any command line solution. Also, these files are very large, which is why I'm looping through lines instead of loading the whole thing into memory.
Thanks to @mkrieger1 and this question, I was able to put together this solution:
import re

def process_file():
    input_path = 'input.txt'
    output_path = 'output.txt'
    with open(input_path) as input_file, open(output_path, 'w+') as output_file:
        for line in input_file:
            newline = re.sub(r'(?<!")"(?!")', '""""', line)
            output_file.write(newline)
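To illustrate how the lookarounds behave (a quick sanity check, not part of the original post): the pattern (?<!")"(?!") only matches a quote that has no quote immediately before or after it, so already-doubled quotes are left untouched.
import re

# Only the lone quotes around b are matched; the doubled quotes around c stay as they are.
sample = 'a,"b",""c""'
print(re.sub(r'(?<!")"(?!")', '""""', sample))
# -> a,""""b"""",""c""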
You can use a regular expression:
import re
newline = re.sub(r'^"$', '"""', line)
I'm trying to import a csv file which has # as delimiter and \r\n as line break. Inside one column there is data which also has newline in it but \n.
I'm able to read one line after another without problems but using the csv lib (Python 3) I've got stuck.
The example below throws an
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Is it possible to use the csv lib with multiple newline characters?
Thanks!
import csv
with open('../database.csv', newline='\r\n') as csvfile:
    file = csv.reader(csvfile, delimiter='#', quotechar='"')
    for row in file:
        print(row[3])
database.csv:
2202187#"645cc14115dbfcc4defb916280e8b3a1"#"cd2d3e434fb587db2e5c2134740b8192"#"{
Age = 22;
Salary = 242;
}
Please try this code. According to the Python 3.5.4 documentation, with newline=None, common line endings like '\r\n' are replaced by '\n'.
import csv
with open('../database.csv', newline=None) as csvfile:
    file = csv.reader(csvfile, delimiter='#', quotechar='"')
    for row in file:
        print(row[3])
I've replaced newline='\r\n' by newline=None.
You could also use the 'rU' mode modifier, but it is deprecated.
...
with open('../database.csv', 'rU') as csvfile:
...
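Another option worth noting (not from the original answer): the csv documentation recommends opening files with newline='', which lets the reader handle the '\r\n' row endings itself and also preserves the '\n' characters embedded inside the quoted fourth column. A sketch under that assumption:
import csv

# newline='' leaves newline handling to the csv module, which also keeps
# the \n characters embedded inside quoted fields intact.
with open('../database.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='#', quotechar='"'):
        print(row[3])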
import csv
in_txt = csv.reader(open(post.text, "rb"), delimiter = '\t')
out_csv = csv.writer("C:\Users\sptechsoft\Documents\source3.csv", 'wb')
out_csv.writerows(in_txt)
When executing the above code I am getting an IO error, and I need to save the CSV in a separate folder.
csv.reader() will not open a file for you: it needs an iterable of lines, such as an open file object, and csv.writer() likewise needs a writable file object rather than a path string. Open both files first and pass the file objects in:
import csv

with open(post.text, "rb") as infile, open(r"C:\Users\sptechsoft\Documents\source3.csv", "wb") as outfile:
    in_txt = csv.reader(infile, delimiter='\t')
    out_csv = csv.writer(outfile)
    out_csv.writerows(in_txt)
Try the following:
import csv
with open(post.text, "rb") as f_input, open(r"C:\Users\sptechsoft\Documents\source3.csv", "wb") as f_output:
in_csv = csv.reader(f_input, delimiter='\t')
out_csv = csv.writer(f_output)
out_csv.writerows(in_csv)
The csv.reader() and csv.writer() need either a list or a file object; they cannot open the file for you. Using with ensures the files are correctly closed automatically afterwards.
Also do not forget to prefix your path string with r to disable any string escaping due to the backslashes.
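To see why the r prefix matters here (an illustrative aside, not from the original answer): without it, the \U in "C:\Users" is parsed as the start of a unicode escape sequence.
# In Python 3 the unprefixed string "C:\Users\sptechsoft\Documents\source3.csv"
# is a SyntaxError, because \U introduces an 8-digit unicode escape.
# The raw string keeps every backslash literal:
path = r"C:\Users\sptechsoft\Documents\source3.csv"
print(path)  # C:\Users\sptechsoft\Documents\source3.csv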
I am trying to add extra columns to a csv file after processing an input csv file, but I am getting an extra newline added after each line in the output.
What's missing or wrong in my code below?
import csv
with open('test.csv', 'r') as infile:
    with open('test_out.csv', 'w') as outfile:
        reader = csv.reader(infile, delimiter=',')
        writer = csv.writer(outfile, delimiter=',')
        for row in reader:
            colad = row[5].rstrip('0123456789./ ')
            if colad == row[5]:
                col2ad = row[11]
            else:
                col2ad = row[5].split(' ')[-1]
            writer.writerow([row[0], colad, col2ad] + row[1:])
I am processing a huge csv file, so I would like to get rid of those extra lines.
I had the same problem on Windows (your OS as well, I presume?). CSV and Windows in combination produce \r\r\n at the end of each line (so: a double newline).
You need to open the output file in binary mode:
with open('test_out.csv', 'wb') as outfile:
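Note that binary mode is the Python 2 fix. In Python 3, csv.writer expects a text-mode file, and the documented way to suppress the extra blank lines is to open the file with newline='' instead; a minimal sketch of that variant:
import csv

# Python 3: newline='' stops open() from translating the '\r\n' that the csv
# writer already emits, which is what produces the blank lines on Windows.
with open('test_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter=',')
    writer.writerow(['a', 'b', 'c'])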
See also these related answers:
Python's CSV writer produces wrong line terminator
CSV in Python adding an extra carriage return