Hi, I am trying to use the csv library to convert my CSV file into a new one.
The code that I wrote is the following:
import csv
import re

file_read = r'C:\Users\Comarch\Desktop\Test.csv'
file_write = r'C:\Users\Comarch\Desktop\Test_new.csv'

def find_txt_in_parentheses(cell_txt):
    pattern = r'\(.+\)'
    return set(re.findall(pattern, cell_txt))

with open(file_write, 'w', encoding='utf-8-sig') as file_w:
    csv_writer = csv.writer(file_w, lineterminator="\n")
    with open(file_read, 'r', encoding='utf-8-sig') as file_r:
        csv_reader = csv.reader(file_r)
        for row in csv_reader:
            cell_txt = row[0]
            txt_in_parentheses = find_txt_in_parentheses(cell_txt)
            if len(txt_in_parentheses) == 1:
                txt_in_parentheses = txt_in_parentheses.pop()
                cell_txt_new = cell_txt.replace(' ' + txt_in_parentheses, '')
                cell_txt_new = txt_in_parentheses + '\n' + cell_txt_new
                row[0] = cell_txt_new
            csv_writer.writerow(row)
The only problem is that in the resulting file (Test_new.csv), I have CRLF instead of LF.
Here is a sample image: read file on the left, write file on the right.
As a result, when I copy the CSV column into a Google Docs spreadsheet, I get a blank line after each row with CRLF.
Is it possible to write my code with the csv library so that LF is left inside a cell instead of CRLF?
From the documentation of csv.reader:

    If csvfile is a file object, it should be opened with newline=''.
    [...]
    If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
This is precisely the issue you're seeing. So...
with open(file_read, 'r', encoding='utf-8-sig', newline='') as file_r, \
        open(file_write, 'w', encoding='utf-8-sig', newline='') as file_w:
    csv_reader = csv.reader(file_r, dialect='excel')
    csv_writer = csv.writer(file_w, dialect='excel')
    # ...
You are on Windows, and you open the file with mode 'w', which gives you Windows-style line endings. In Python 2, using mode 'wb' gives the preferred behaviour; in Python 3, use newline='' instead.
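Putting the accepted fix together, a minimal end-to-end sketch (the sample row and file names here are made up for illustration):

```python
import csv

src = 'Test.csv'       # hypothetical input file
dst = 'Test_new.csv'   # hypothetical output file

# Create a one-row sample input to transform.
with open(src, 'w', encoding='utf-8-sig', newline='') as f:
    csv.writer(f, lineterminator='\n').writerow(['a (x)', 'b'])

# Open BOTH files with newline='' so Python performs no newline
# translation, and keep lineterminator='\n' so rows end in LF.
with open(src, 'r', encoding='utf-8-sig', newline='') as f_r, \
        open(dst, 'w', encoding='utf-8-sig', newline='') as f_w:
    writer = csv.writer(f_w, lineterminator='\n')
    for row in csv.reader(f_r):
        # Move the parenthesised text to its own line inside the cell.
        row[0] = '(x)\n' + row[0].replace(' (x)', '')
        writer.writerow(row)

with open(dst, 'rb') as f:
    data = f.read()
# The output contains no \r at all, and the embedded newline
# inside the (quoted) first cell survives.
```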
I have a large (1.6 million+ rows) .csv file that has some data with leading spaces, tabs, and trailing spaces, and maybe even trailing tabs. I need to read the data in, strip all of that whitespace, and then write the rows back out into a new .csv file, preferably with the most efficient code possible and using only built-in modules in Python 3.7.
Here is what I have that is currently working, except it only spits out the header over and over and doesn't seem to take care of trailing tabs (not a huge deal, though, on trailing tabs):
def new_stripper(self, input_filename: str, output_filename: str):
    """
    new_stripper(self, filename: str):
    :param self: no idea what this does
    :param filename: name of file to be stripped, must have .csv at end of file
    :return: for now, it doesn't return anything...
        - still doesn't remove trailing tabs?? But it can remove trailing spaces
        - removes leading tabs and spaces
        - still needs to write to new .csv file
    """
    import csv
    csv.register_dialect('strip', skipinitialspace=True)
    reader = csv.DictReader(open(input_filename), dialect='strip')
    reader = (dict((k, v.strip()) for k, v in row.items() if v) for row in reader)
    for row in reader:
        with open(output_filename, 'w', newline='') as out_file:
            writer = csv.writer(out_file, delimiter=',')
            writer.writerow(row)

input_filename = 'testFile.csv'
output_filename = 'output_testFile.csv'
new_stripper(self='', input_filename=input_filename, output_filename=output_filename)
As written above, the code just prints the headers over and over on a single line. I've played around with the arrangement and indentation of the last four lines of the def with some different results, but the closest I've gotten is having it print the header row again and again on a new line each time:
...
# headers and headers for days
with open(output_filename, 'w', newline='') as out_file:
    writer = csv.writer(out_file, delimiter=',')
    for row in reader:
        writer.writerow(row)
EDIT 1: Here's the result from the code that isn't stripping correctly. Some rows have leading spaces that weren't stripped, some have trailing spaces that weren't stripped. It seems like the left-most column was properly stripped of leading spaces, but not trailing spaces; same with the header row.
Update: Here's the solution I was looking for:
def get_data(self, input_filename: str, output_filename: str):
    import csv
    with open(input_filename, 'r', newline='') as in_file, open(output_filename, 'w', newline='') as out_file:
        r = csv.reader(in_file, delimiter=',')
        w = csv.writer(out_file, delimiter=',')
        for line in r:
            trim = (field.strip() for field in line)
            w.writerow(trim)

input_filename = 'testFile.csv'
output_filename = 'output_testFile.csv'
get_data(self='', input_filename=input_filename, output_filename=output_filename)
Don't make life complicated for yourself: "CSV" files are simple plain-text files and can be handled in a generic way:
with open('input.csv', 'r') as inf, open('output.csv', 'w') as of:
    for line in inf:
        trim = (field.strip() for field in line.split(','))
        of.write(','.join(trim) + '\n')
Alternatively, using the csv module:
import csv
with open('input.csv', 'r') as inf, open('output.csv', 'w') as of:
    r = csv.reader(inf, delimiter=',')
    w = csv.writer(of, delimiter=',')
    for line in r:
        trim = (field.strip() for field in line)
        w.writerow(trim)
Unfortunately I cannot comment, but I believe you might want to strip every entry in the csv of whitespace (not just the line). If that is the case, then, based on Jan's answer, this might do the trick:
with open('file.csv', 'r') as inf, open('output.csv', 'w') as of:
    for line in inf:
        of.write(','.join(list(map(str.strip, line.split(',')))) + '\n')
It splits each line by comma, resulting in a list of values, strips whitespace from every element, then joins them back up and saves the result to the output file.
Your final reader variable contains a generator of dicts, but your writer expects a list.
You can either use csv.DictWriter, or store the processed data in a list first and then write it to csv, including the headers using writer.writeheader().
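A sketch of the csv.DictWriter route suggested above, with made-up rows standing in for the DictReader output and a hypothetical output file name:

```python
import csv

# Made-up rows standing in for what DictReader would produce.
rows = [{' name ': ' jamie ', ' city ': ' london '},
        {' name ': ' matt ', ' city ': ' paris '}]

# Strip whitespace from both keys and values before writing.
clean = [{k.strip(): v.strip() for k, v in row.items()} for row in rows]

with open('output_testFile.csv', 'w', newline='') as out_file:
    writer = csv.DictWriter(out_file, fieldnames=list(clean[0]))
    writer.writeheader()     # emits the header row for you
    writer.writerows(clean)  # accepts dicts directly

with open('output_testFile.csv', newline='') as f:
    result = f.read()
```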
I want to end each iteration of a for loop by writing a new line of content (including the newline) to a csv file. I have this:
# Set up an output csv file with column headers
with open('outfile.csv', 'w') as f:
    f.write("title; post")
    f.write("\n")
This does not appear to write an actual \n (newline) to the file. Further:
# Concatenate into a row to write to the output csv file
csv_line = topic_title + ";" + thread_post
with open('outfile.csv', 'w') as outfile:
    outfile.write(csv_line + "\n")
This also does not move the cursor in the outfile to the next line. Each new line, with every iteration of the loop, just overwrites the most recent one.
I also tried outfile.write(os.linesep), but that did not work.
Change 'w' to 'a':

with open('outfile.csv', 'a')
with open('outfile.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(...)

Alternatively:

writer = csv.writer(f, lineterminator='\n')
I ran into the same problem; all I needed was:

writer = csv.writer(f, lineterminator='\n')
If you're using Python 2, then use:

with open('xxx.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow(fields)

If you are using Python 3, make use of newline=''.
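A minimal Python 3 sketch combining both suggestions, append mode plus newline='' (the file name is hypothetical):

```python
import csv

# Write the header once, then append one row per loop iteration.
with open('outfile.csv', 'w', newline='') as f:
    csv.writer(f).writerow(['title', 'post'])

for row in [['t1', 'p1'], ['t2', 'p2']]:
    # Mode 'a' adds to the end instead of overwriting; newline=''
    # prevents the extra \r that produces blank lines in Excel.
    with open('outfile.csv', 'a', newline='') as f:
        csv.writer(f).writerow(row)

with open('outfile.csv', newline='') as f:
    lines = f.read().splitlines()
```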
Please try: with open('output_file_name.csv', 'a+', newline='') as f:
Problem
I need to re-format a text from comma (,) separated values to pipe (|) separated values. Pipe characters within the values of the original (comma separated) text shall be replaced by a space for representation in the (pipe separated) result text.
The pipe separated result text shall be written back to the same file from which the original comma separated text has been read.
I am using Python 2.6.
Possible Solution
I could read the file first, replace all pipes with spaces, and then replace (,) with (|).
Is there a better way to achieve this?
Don't reinvent the value-separated file parsing wheel. Use the csv module to do the parsing and the writing for you.
The csv module will add "..." quotes around values that contain the separator, so in principle you don't need to replace the | pipe symbols in the values. To replace the original file, write to a new (temporary) outputfile then move that back into place.
import csv
import os

outputfile = inputfile + '.tmp'

with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='|')
    writer.writerows(reader)

os.remove(inputfile)
os.rename(outputfile, inputfile)
For an input file containing:
foo,bar|baz,spam
this produces
foo|"bar|baz"|spam
Note that the middle column is wrapped in quotes.
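The quoting behaviour is easy to check in isolation; a small sketch using an in-memory buffer:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter='|')
# The middle value contains the delimiter, so the writer quotes it.
writer.writerow(['foo', 'bar|baz', 'spam'])
line = buf.getvalue().strip()
```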
If you do need to replace the | characters in the values, you can do so as you copy the rows:
outputfile = inputfile + '.tmp'

with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='|')
    for row in reader:
        writer.writerow([col.replace('|', ' ') for col in row])

os.remove(inputfile)
os.rename(outputfile, inputfile)
Now the output for my example becomes:
foo|bar baz|spam
Sounds like you're trying to work with a variation of CSV; in that case, Python's csv library might well be what you need. You can use it with custom delimiters and it will auto-handle escaping for you (this example was yanked from the manual and modified):
import csv

with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='|')
    spamwriter.writerow(['One', 'Two', 'Three'])
There are also ways to modify quoting and escaping and other options. Reading works similarly.
You can create a temporary file from the original that has the pipe characters replaced, and then replace the original file with it when the processing is done:
import csv
import tempfile
import os

filepath = 'C:/Path/InputFile.csv'

with open(filepath, 'rb') as fin:
    reader = csv.DictReader(fin)
    fout = tempfile.NamedTemporaryFile(dir=os.path.dirname(filepath),
                                       delete=False)
    temp_filepath = fout.name
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    # writer.writeheader()  # requires Python 2.7
    header = dict(zip(reader.fieldnames, reader.fieldnames))
    writer.writerow(header)
    for row in reader:
        for k, v in row.items():
            row[k] = v.replace('|', ' ')
        writer.writerow(row)
    fout.close()

os.remove(filepath)
os.rename(temp_filepath, filepath)
In Python, I'm opening a csv file that looks like:
jamie,london,uk,600087
matt,paris,fr,80092
john,newyork,ny,80071
How do I enclose the words with quotes in the csv file so it appears like:
"jamie","london","uk","600087"
etc...
What I have right now is just the basic stuff:

filename = "data.csv"
file = open(filename, "r")

Not sure what I would do next.
If you are just trying to convert the file, use the QUOTE_ALL constant from the csv module, like this:
import csv

with open('data.csv') as input, open('out.csv', 'w') as output:
    reader = csv.reader(input)
    writer = csv.writer(output, delimiter=',', quoting=csv.QUOTE_ALL)
    for line in reader:
        writer.writerow(line)
I have to convert some txt files to csv (and perform some operations during the conversion).
I use the csv.Sniffer() class to detect which delimiter is used in the txt.
This code:
with open(filename_input, 'r') as f1, open(filename_output, 'wb') as f2:
    dialect = csv.Sniffer().sniff(f1.read(1024))  #### detect delimiters
    f1.seek(0)
    r = csv.reader(f1, delimiter=dialect)
    writer = csv.writer(f2, delimiter=';')
returns: Error: Could not determine delimiter
This works:
with open(filename_input, 'r') as f1, open(filename_output, 'wb') as f2:
    #dialect = csv.Sniffer().sniff(f1.read(1024))  #### detect delimiters
    #f1.seek(0)
    r = csv.reader(f1, delimiter='\t')
    writer = csv.writer(f2, delimiter=';')
or
with open(filename_input, 'r') as f1, open(filename_output, 'wb') as f2:
    #dialect = csv.Sniffer().sniff(f1.read(1024))  #### detect delimiters
    #f1.seek(0)
    r = csv.reader(f1, dialect="excel-tab")
    writer = csv.writer(f2, delimiter=';')
This is a txt row example (10 records delimited by Tab):
166 14908941 sa_s NOVA i 7.05 DEa 7.17 Ncava - Deo mo 7161 4,97
Why doesn't the csv.Sniffer() class work?

The bug was reading only 1024 bytes to parse the txt (maybe that is not enough to detect the delimiter). Now this code works without any other edit:
with open(filename_input, 'r') as f1, open(filename_output, 'wb') as f2:
    dialect = csv.Sniffer().sniff(f1.read())  #### error with dialect = csv.Sniffer().sniff(f1.read(1024))
    f1.seek(0)
    r = csv.reader(f1, delimiter=dialect)
    writer = csv.writer(f2, delimiter=';')
You have to use dialect.delimiter instead of just dialect, because what is returned is of type Dialect and you need its Dialect.delimiter attribute:

rows = csv.reader(f1, delimiter=dialect.delimiter)
The modified code will be as below:

import csv

filename_input = 'filein.txt'
filename_output = 'fileout.csv'

with open(filename_input, 'r') as f1, open(filename_output, 'wb') as f2:
    dialect = csv.Sniffer().sniff(f1.read(1024), "\t")  #### detect delimiters
    f1.seek(0)
    print(dialect.delimiter)
    rows = csv.reader(f1, delimiter=dialect.delimiter)
    writer = csv.writer(f2, delimiter=';')
    writer.writerows(rows)
Output:
C:\pyp>python.exe txttocsv.py
,
C:\pyp>
Also note, from the docs:

sniff(sample, delimiters=None)

    Analyze the given sample and return a Dialect subclass reflecting the parameters found. If the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiter characters.
Hence, if the delimiter you want to find in your text file is something like # instead of , or ;, then you should pass it to the sniff function as the second parameter, like this:
dialect = csv.Sniffer().sniff(f1.read(1024), '#')
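A quick check that passing candidate delimiters steers the sniffer (the sample text is made up):

```python
import csv

sample = 'one#two#three\nfour#five#six\n'
# With the hint, the sniffer only considers the characters you list
# as possible delimiters when analyzing the sample.
dialect = csv.Sniffer().sniff(sample, '#')
```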
Update: for reading the whole file you will need:
dialect = csv.Sniffer().sniff(f1.read())
The code works, but in the CSV that is generated, each record skips one line.
The code I used:
import csv

filename_input = r'filepath.txt'
filename_output = r'filepath.csv'

with open(filename_input, 'r') as tmp, open(filename_output, 'w') as tmp2:
    dialect = csv.Sniffer().sniff(tmp.read(1024), ";")  #### detect delimiters
    tmp.seek(0)
    print(dialect.delimiter)
    rows = csv.reader(tmp, delimiter=dialect.delimiter)
    writer = csv.writer(tmp2, delimiter=',')
    writer.writerows(rows)
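The blank line after each record is most likely the newline-translation issue: on Windows, opening the output with mode 'w' and no newline='' turns each '\r\n' the csv writer emits into '\r\r\n'. A sketch of the fix with newline='' on both files (the sample input is made up):

```python
import csv

# Create a small tab-delimited input file to convert.
with open('filein.txt', 'w', newline='') as f:
    f.write('a\tb\tc\nd\te\tf\ng\th\ti\n')

with open('filein.txt', 'r', newline='') as tmp, \
        open('fileout.csv', 'w', newline='') as tmp2:
    dialect = csv.Sniffer().sniff(tmp.read(1024), '\t')  # detect delimiter
    tmp.seek(0)
    rows = csv.reader(tmp, delimiter=dialect.delimiter)
    csv.writer(tmp2, delimiter=',').writerows(rows)

with open('fileout.csv', newline='') as f:
    out = f.read()
# Each record ends in a single \r\n; no doubled blank lines.
```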