Adding custom delimiters back to a csv? - python

Currently, I take in a csv file that uses a custom delimiter, "|". I then read it in and modify it using the code below:
import csv
ChangedDate = '2018-10-31'
firstfile = open('example.csv',"r")
firstReader = csv.reader(firstfile, delimiter='|')
firstData = list(firstReader)
outputFile = open("output.csv","w")
iteration = 0
for row in firstData:
    firstData[iteration][25] = ChangedDate
    iteration += 1
outputwriter = csv.writer(open("output.csv","w"))
outputwriter.writerows(firstData)
outputFile.close()
However, when I write the rows to my output file, they are comma separated. This is a problem because I am dealing with large financial data, so commas appear naturally in values such as $8,000.00, hence the "|" delimiters of the original file. Is there a way to "re-delimit" my list before I write it to an output file?

You can provide the delimiter to the csv.writer:
with open("output.csv", "w") as f:
    outputwriter = csv.writer(f, delimiter='|')
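Putting that together with the rest of your script, a minimal sketch might look like this (the small example.csv written at the top is just a stand-in so the snippet is self-contained; index 25 and ChangedDate come from your code):

```python
import csv

# Stand-in input: one pipe-delimited row with 26 columns so index 25 exists.
with open('example.csv', 'w', newline='') as f:
    f.write('|'.join(str(n) for n in range(26)) + '\n')

ChangedDate = '2018-10-31'

# Read the pipe-delimited input.
with open('example.csv', newline='') as firstfile:
    firstData = list(csv.reader(firstfile, delimiter='|'))

# Update column 25 in every row.
for row in firstData:
    row[25] = ChangedDate

# Write back out with the same pipe delimiter.
with open('output.csv', 'w', newline='') as outputFile:
    csv.writer(outputFile, delimiter='|').writerows(firstData)
```

Passing newline='' to open() is the csv module's recommended way to avoid translated line endings on Windows.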

Related

Remove 1000's separator from column in CSV?

I have a Python script where I'm importing a csv that has commas in values over 1000. These values are strings in the csv. I need to remove the commas from the values, and convert the strings to rounded floats inside the csv before it's imported into Python.
I've tried appending all the new values to a list to use with csv.writer, but I haven't been able to figure out how to have the writer replace only the values in the column that have commas. Here's what I have so far:
import csv
RoomReport = r'path_to_csv'
new_values_list = []
f = open(RoomReport, "r")
reader = csv.reader(f)
writer = csv.writer(f)
for row in reader:
    useable_area = row[7]
    if "," in useable_area:
        useable_area_no_comma = useable_area.replace(",", "")
        useable_area_rounded = int(round(float(useable_area_no_comma)))
        new_values_list.append(useable_area_rounded)
f.close()
As I mentioned in a comment, this can only be done if the input csv file is formatted in a way that allows the commas in the numbers to be differentiated from the commas that separate the fields.
Here's an example of one way it could be done (by quoting all the values):
"0","1","2","3","4","5","6","7,123.6","8","9"
"0","1","2","3","4","5","6","1,000","8","9"
"0","1","2","3","4","5","6","20,000","8","9"
Here's code that will do what you want. It uses the locale.atof function to simplify cleaning up the number:
import csv
import locale
# Set locale to someplace that uses a comma for the thousands separator.
locale.setlocale(locale.LC_ALL, 'English_US.1252')
RoomReport = r'RoomReport.csv'
cleaned_report = r'RoomReport_cleaned.csv'
new_values_list = []
with open(RoomReport, "r", newline='') as inp:
    for row in csv.reader(inp):
        if "," in row[7]:
            row[7] = int(round(locale.atof(row[7])))
        new_values_list.append(row)
# Create cleaned-up output file.
with open(cleaned_report, "w", newline='') as outp:
    csv.writer(outp, quoting=csv.QUOTE_ALL).writerows(new_values_list)
The RoomReport_cleaned.csv it creates from the example input will contain this:
"0","1","2","3","4","5","6","7124","8","9"
"0","1","2","3","4","5","6","1000","8","9"
"0","1","2","3","4","5","6","20000","8","9"
Note that since the values in the output no longer have commas embedded in them, quoting all the fields is no longer necessary, so csv.QUOTE_ALL could be left out.
Maybe something like this?
import csv
import re
from io import StringIO
from sys import stdout

isnum = re.compile('^[0-9, ]+$')
non = re.compile('[, ]')
fd = StringIO()
out = csv.writer(fd)
out.writerow(['foo', '1,000,000', 19])
out.writerow(['bar', '1,234,567', 20])
fd.seek(0)
inp = csv.reader(fd)
out = csv.writer(stdout)
for row in inp:
    for i, x in enumerate(row):
        if isnum.match(x):
            row[i] = float(non.sub('', x))
    out.writerow(row)

Modify field in csv file with Python

I am attempting to remove special characters from a specific column within my csv file, but I can't figure out a way to specify the column I would like to change. Here is what I have:
import csv
input_file = open('src/list.csv', 'r')
output_file = open('src/list_new.csv', 'w')
data = csv.reader(input_file)
writer = csv.writer(output_file, quoting=csv.QUOTE_ALL) # dialect='excel')
specials = '#'
for line in data:
    line = str(line)
    new_line = str.replace(line, specials, '')
    writer.writerow(new_line.split(','))
input_file.close()
output_file.close()
Instead of searching through the whole file how can I specify the column ("Names") I would like to remove the special characters from?
Maybe use csv.DictReader? Then you can refer to the column by name.
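A sketch of that idea, assuming the file has a header row with a "Names" column (the column name and the '#' character come from the question; the file names and sample data here are made up so the snippet is self-contained):

```python
import csv

# Hypothetical input with a header row containing a "Names" column.
with open('list.csv', 'w', newline='') as f:
    f.write('Names,Age\n#Al#ice,30\nBob#,25\n')

with open('list.csv', newline='') as input_file, \
        open('list_new.csv', 'w', newline='') as output_file:
    reader = csv.DictReader(input_file)
    writer = csv.DictWriter(output_file, fieldnames=reader.fieldnames,
                            quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for row in reader:
        # Strip the special character from this column only.
        row['Names'] = row['Names'].replace('#', '')
        writer.writerow(row)
```

Every other column passes through untouched, since only row['Names'] is modified before the row is written back out.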

Read CSV with comma as linebreak

I have a file saved as .csv
"400":0.1,"401":0.2,"402":0.3
Ultimately I want to save the data in a proper format in a csv file for further processing. The problem is that there are no line breaks in the file.
pathname = r"C:\pathtofile\file.csv"
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    print(reader)
with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(reader)
The print reader output looks exactly how I want (or at least it's a format I can further process).
"400":0.1
"401":0.2
"402":0.3
And now I want to save that to a new csv file. However the output looks like
"""",4,0,0,"""",:,0,.,1,"
","""",4,0,1,"""",:,0,.,2,"
","""",4,0,2,"""",:,0,.,3
I'm sure it would be intelligent to convert the format to
400,0.1
401,0.2
402,0.3
at this stage instead of doing later with another script.
The main problem is that my current code
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    reader = csv.reader(reader, delimiter=':')
    x = []
    y = []
    print(reader)
    for row in reader:
        x.append(float(row[0]))
        y.append(float(row[1]))
    print(x)
    print(y)
works fine for the type of csv files I currently have, but doesn't work for these mentioned above:
y.append( float(row[1]) )
IndexError: list index out of range
So I'm trying to find a way to work with them too. I think I'm missing something obvious as I imagine that it can't be too hard to properly define the linebreak character and delimiter of a file.
with open(pathname, newline=',') as file:
yields
ValueError: illegal newline value: ,
The right way with csv module, without replacing and casting to float:
import csv
with open('file.csv', 'r') as f, open('filenew.csv', 'w', newline='') as out:
    reader = csv.reader(f)
    writer = csv.writer(out, quotechar=None)
    for r in reader:
        for i in r:
            writer.writerow(i.split(':'))
The resulting filenew.csv contents (according to your "intelligent" condition):
400,0.1
401,0.2
402,0.3
Nuances:
csv.reader and csv.writer objects use the comma , as the default delimiter (so there is no need for file.read().replace(',', '\n'))
quotechar=None is specified for the csv.writer object to eliminate the double quotes around the values being saved
You need to split the values to form a list to represent a row. Presently the code is splitting the string into individual characters to represent the row.
pathname = r"C:\pathtofile\file.csv"
with open(pathname) as old_file:
    with open(r"C:\pathtofile\filenew.csv", 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter=',')
        text_rows = old_file.read().split(",")
        for row in text_rows:
            items = row.split(":")
            csv_writer.writerow([int(items[0].strip('"')), items[1]])
If you look at the documentation for writerow, it says:
Write the row parameter to the writer's file object, formatted according to the current dialect.
But, you are writing an entire string in your code
csv_writer.writerow(reader)
because reader is a string at this point.
Now, the format you want to use in your CSV file is not clearly mentioned in the question. But as you said, if you can do some preprocessing to create a list of lists and pass each sublist to writerow(), you should be able to produce the required file format.
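For instance, a minimal sketch of that preprocessing (using the single-line sample from the question; the plain comma split assumes no commas appear inside the quoted values):

```python
import csv

raw = '"400":0.1,"401":0.2,"402":0.3'

# Split the single line on commas into key:value pairs,
# then split each pair on the colon.
rows = []
for pair in raw.split(','):
    key, value = pair.split(':')
    rows.append([int(key.strip('"')), float(value)])

# Each sublist becomes one row in the output file.
with open('filenew.csv', 'w', newline='') as new_file:
    csv.writer(new_file).writerows(rows)
```

This produces the 400,0.1 / 401,0.2 / 402,0.3 layout mentioned in the question, with the keys already converted to integers.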

Pipe delimiter file, but no pipe inside data

Problem
I need to re-format a text from comma (,) separated values to pipe (|) separated values. Pipe characters within the values of the original (comma separated) text shall be replaced by a space for representation in the (pipe separated) result text.
The pipe separated result text shall be written back to the same file from which the original comma separated text has been read.
I am using python 2.6
Possible Solution
I should read the file first and remove all pipes with spaces in that and later replace (,) with (|).
Is there a better way to achieve this?
Don't reinvent the value-separated file parsing wheel. Use the csv module to do the parsing and the writing for you.
The csv module will add "..." quotes around values that contain the separator, so in principle you don't need to replace the | pipe symbols in the values. To replace the original file, write to a new (temporary) outputfile then move that back into place.
import csv
import os
outputfile = inputfile + '.tmp'
with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='|')
    writer.writerows(reader)
os.remove(inputfile)
os.rename(outputfile, inputfile)
For an input file containing:
foo,bar|baz,spam
this produces
foo|"bar|baz"|spam
Note that the middle column is wrapped in quotes.
If you do need to replace the | characters in the values, you can do so as you copy the rows:
outputfile = inputfile + '.tmp'
with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='|')
    for row in reader:
        writer.writerow([col.replace('|', ' ') for col in row])
os.remove(inputfile)
os.rename(outputfile, inputfile)
Now the output for my example becomes:
foo|bar baz|spam
Sounds like you're trying to work with a variation of CSV; in that case, Python's csv library may well be what you need. You can use it with custom delimiters and it will auto-handle escaping for you (this example was yanked from the manual and modified):
import csv
with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='|')
    spamwriter.writerow(['One', 'Two', 'Three'])
There are also ways to modify quoting and escaping and other options. Reading works similarly.
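Reading takes the same delimiter argument; a small self-contained sketch (shown in Python 3 style, where you open in text mode with newline='' rather than 'wb' as in the Python 2.6 example above):

```python
import csv

# Write a pipe-delimited file, then read it back with the same delimiter.
with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='|')
    spamwriter.writerow(['One', 'Two', 'Three'])

with open('eggs.csv', newline='') as csvfile:
    rows = list(csv.reader(csvfile, delimiter='|'))

print(rows)
```

Any value that happens to contain a pipe would come back intact, because the writer quotes it on the way out and the reader unquotes it on the way in.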
You can create a temporary file from the original that has the pipe characters replaced, and then replace the original file with it when the processing is done:
import csv
import tempfile
import os
filepath = 'C:/Path/InputFile.csv'
with open(filepath, 'rb') as fin:
    reader = csv.DictReader(fin)
    fout = tempfile.NamedTemporaryFile(dir=os.path.dirname(filepath),
                                       delete=False)
    temp_filepath = fout.name
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    # writer.writeheader() # requires Python 2.7
    header = dict(zip(reader.fieldnames, reader.fieldnames))
    writer.writerow(header)
    for row in reader:
        for k, v in row.items():
            row[k] = v.replace('|', ' ')
        writer.writerow(row)
    fout.close()
os.remove(filepath)
os.rename(temp_filepath, filepath)

re.sub for a csv file

I am receiving an error on this code: "TypeError: expected string or buffer". I looked around and found out that the error occurs because I am passing re.sub a list, and it does not take lists. However, I wasn't able to figure out how to change my line from the csv file into something that it would read.
I am trying to change all the periods in a csv file into commas. Here is my code:
import csv
import re
in_file = open("/test.csv", "rb")
reader = csv.reader(in_file)
out_file = open("/out.csv", "wb")
writer = csv.writer(out_file)
for row in reader:
    newrow = re.sub(r"(\.)+", ",", row)
    writer.writerow(newrow)
in_file.close()
out_file.close()
I'm sorry if this has already been answered somewhere. There were certainly a lot of answers regarding this error, but I couldn't make any of them work with my csv file. Also, as a side note, this was originally an .xlsb Excel file that I converted into csv in order to be able to work with it. Was that necessary?
You could use a list comprehension to apply your substitution to each item in row:
for row in reader:
    newrow = [re.sub(r"(\.)+", ",", item) for item in row]
    writer.writerow(newrow)
for row in reader does not return a single element to parse; rather, it returns a list of the elements in that row, so you have to unpack that list and parse each item individually, just as @Trii showed you:
[re.sub(r'(\.)+','.',s) for s in row]
In this case, we are using glob to access all the csv files in the directory.
The code below overwrites the source csv file, so there is no need to create an output file.
NOTE:
If you want to get a second file with the parameters provided to re.sub, replace write = open(i, 'w') with write = open('secondFile.csv', 'w')
import re
import glob
for i in glob.glob("*.csv"):
    read = open(i, 'r')
    reader = read.read()
    csvRe = re.sub(r"(\.)+", ",", reader)
    write = open(i, 'w')
    write.write(csvRe)
    read.close()
    write.close()
