Remove 1000's separator from column in CSV?

I have a Python script where I'm importing a csv that has commas in values over 1000. These values are strings in the csv. I need to remove the commas from the values, and convert the strings to rounded floats inside the csv before it's imported into Python.
I've tried appending all the new values to a list to use with csv.writer, but I haven't been able to figure out how to have the writer replace only the values in the column that contain commas. Here's what I have so far:
import csv
RoomReport = r'path_to_csv'
new_values_list = []
f = open(RoomReport, "r")
reader = csv.reader(f)
writer = csv.writer(f)
for row in reader:
    useable_area = row[7]
    if "," in useable_area:
        useable_area_no_comma = useable_area.replace(",","")
        useable_area_rounded = int(round(float(useable_area_no_comma)))
        new_values_list.append(useable_area_rounded)
f.close()

As I mentioned in a comment, this can only be done if the input csv file is formatted in a way that will allow the commas in the numbers to be differentiated from the commas between each one of them.
Here's an example of one way it could be done (by quoting all the values):
"0","1","2","3","4","5","6","7,123.6","8","9"
"0","1","2","3","4","5","6","1,000","8","9"
"0","1","2","3","4","5","6","20,000","8","9"
Here's code that will do what you want. It uses the locale.atof function to simplify cleaning up the number:
import csv
import locale
# Set the locale to one that uses a comma as the thousands separator.
# 'English_US.1252' is a Windows locale name; other platforms typically use names like 'en_US.UTF-8'.
locale.setlocale(locale.LC_ALL, 'English_US.1252')
RoomReport = r'RoomReport.csv'
cleaned_report = r'RoomReport_cleaned.csv'
new_values_list = []
with open(RoomReport, "r", newline='') as inp:
    for row in csv.reader(inp):
        if "," in row[7]:
            row[7] = int(round(locale.atof(row[7])))
        new_values_list.append(row)
# Create cleaned-up output file.
with open(cleaned_report, "w", newline='') as outp:
    csv.writer(outp, quoting=csv.QUOTE_ALL).writerows(new_values_list)
The RoomReport_cleaned.csv it creates from the example input will contain this:
"0","1","2","3","4","5","6","7124","8","9"
"0","1","2","3","4","5","6","1000","8","9"
"0","1","2","3","4","5","6","20000","8","9"
Note that since the values in the output no longer have commas embedded in them, quoting all fields is no longer necessary, so csv.QUOTE_ALL could be left out.
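If setting a locale is inconvenient (locale names differ between platforms), the same cleanup can be sketched with plain string replacement. This is only a sketch: it assumes the comma is always a thousands separator, it reuses column index 7 from the question, the helper name clean_rows is made up for illustration, and the input is an in-memory sample rather than the real file.

```python
import csv
import io

def clean_rows(rows, column=7):
    """Yield rows with the thousands separators stripped from one column."""
    for row in rows:
        value = row[column]
        if "," in value:
            # "7,123.6" -> "7123.6" -> 7123.6 -> 7124
            row[column] = str(int(round(float(value.replace(",", "")))))
        yield row

# In-memory stand-in for the quoted input file shown above.
sample = io.StringIO(
    '"0","1","2","3","4","5","6","7,123.6","8","9"\n'
    '"0","1","2","3","4","5","6","1,000","8","9"\n'
)
cleaned = list(clean_rows(csv.reader(sample)))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(cleaned)
result = out.getvalue()
print(result)
```

Rows without a comma in that column pass through unchanged, which matches the behaviour of the locale-based version.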

Maybe something like this?
import csv
import re
from io import StringIO
from sys import stdout
isnum = re.compile('^[0-9, ]+$')
non = re.compile('[, ]')
fd = StringIO()
out = csv.writer(fd)
out.writerow(['foo','1,000,000',19])
out.writerow(['bar','1,234,567',20])
fd.seek(0)
inp = csv.reader(fd)
out = csv.writer(stdout)
for row in inp:
    for i, x in enumerate(row):
        if isnum.match(x):
            row[i] = float(non.sub('', x))
    out.writerow(row)

Related

Adding custom delimiters back to a csv?

Currently, I take in a csv file using custom delimiters, "|". I then read it in and modify it using the code below:
import csv
ChangedDate = '2018-10-31'
firstfile = open('example.csv',"r")
firstReader = csv.reader(firstfile, delimiter='|')
firstData = list(firstReader)
outputFile = open("output.csv","w")
iteration = 0
for row in firstData:
    firstData[iteration][25] = ChangedDate
    iteration+=1
outputwriter = csv.writer(open("output.csv","w"))
outputwriter.writerows(firstData)
outputFile.close()
However, when I write the rows to my output file, they are comma separated. This is a problem because I am dealing with large financial data, and therefore commas appear naturally, such as $8,000.00, hence the "|" delimiters of the original file. Is there a way to "re-delimit" my list before I write it to an output file?

You can provide the delimiter to the csv.writer:
with open("output.csv", "w", newline='') as f:
    outputwriter = csv.writer(f, delimiter='|')
    outputwriter.writerows(firstData)
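As a rough end-to-end sketch, the round trip can be tried on in-memory data. The sample rows and the three-column layout below are made up for illustration (the question updates column 25 of a real file); only the delimiter argument on both the reader and the writer is the point.

```python
import csv
import io

changed_date = "2018-10-31"

# In-memory stand-in for example.csv; note the commas inside the amounts.
source = io.StringIO(
    "ACME|$8,000.00|2018-01-01\n"
    "Globex|$12,500.00|2018-02-01\n"
)

rows = list(csv.reader(source, delimiter="|"))
for row in rows:
    row[2] = changed_date  # column 25 in the question; 2 in this small sample

output = io.StringIO()
csv.writer(output, delimiter="|", lineterminator="\n").writerows(rows)
text = output.getvalue()
print(text)
```

Because the comma is no longer the delimiter, values such as $8,000.00 are written as-is, without quoting.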

Read CSV with comma as linebreak

I have a file saved as .csv
"400":0.1,"401":0.2,"402":0.3
Ultimately I want to save the data in a proper format in a csv file for further processing. The problem is that there are no line breaks in the file.
pathname = r"C:\pathtofile\file.csv"
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    print(reader)
with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(reader)
The print reader output looks exactly how I want (or at least it's a format I can further process).
"400":0.1
"401":0.2
"402":0.3
And now I want to save that to a new csv file. However the output looks like
"""",4,0,0,"""",:,0,.,1,"
","""",4,0,1,"""",:,0,.,2,"
","""",4,0,2,"""",:,0,.,3
I'm sure it would be intelligent to convert the format to
400,0.1
401,0.2
402,0.3
at this stage instead of doing later with another script.
The main problem is that my current code
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    reader = csv.reader(reader,delimiter=':')
    x = []
    y = []
    print(reader)
    for row in reader:
        x.append( float(row[0]) )
        y.append( float(row[1]) )
    print(x)
    print(y)
works fine for the type of csv files I currently have, but doesn't work for these mentioned above:
y.append( float(row[1]) )
IndexError: list index out of range
So I'm trying to find a way to work with them too. I think I'm missing something obvious as I imagine that it can't be too hard to properly define the linebreak character and delimiter of a file.
with open(pathname, newline=',') as file:
yields
ValueError: illegal newline value: ,

The right way with csv module, without replacing and casting to float:
import csv
with open('file.csv', 'r') as f, open('filenew.csv', 'w', newline='') as out:
    reader = csv.reader(f)
    writer = csv.writer(out, quotechar=None)
    for r in reader:
        for i in r:
            writer.writerow(i.split(':'))
The resulting filenew.csv contents (according to your "intelligent" condition):
400,0.1
401,0.2
402,0.3
Nuances:
csv.reader and csv.writer objects treat comma , as default delimiter (no need to file.read().replace(',', '\n'))
quotechar=None is specified for csv.writer object to eliminate double quotes around the values being saved

You need to split the values to form a list to represent a row. Presently the code is splitting the string into individual characters to represent the row.
import csv
pathname = r"C:\pathtofile\file.csv"
with open(pathname) as old_file:
    with open(r"C:\pathtofile\filenew.csv", 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter=',')
        text_rows = old_file.read().strip().split(",")
        for row in text_rows:
            items = row.split(":")
            # Strip the double quotes around the key before converting it.
            csv_writer.writerow([int(items[0].strip('"')), items[1]])

If you look at the documentation for writerow, it says:
Write the row parameter to the writer’s file object, formatted according to the current dialect.
But, you are writing an entire string in your code
csv_writer.writerow(reader)
because reader is a string at this point.
Now, the format you want to use in your CSV file is not clearly mentioned in the question. But as you said, if you can do some preprocessing to create a list of lists and pass each sublist to writerow(), you should be able to produce the required file format.
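A small sketch of that preprocessing, using the sample line from the question as an in-memory string. The target layout (key, value per row) is the one the question suggests; the variable names are illustrative only.

```python
import csv
import io

raw = '"400":0.1,"401":0.2,"402":0.3'

# Passing a string: writerow iterates over it, so each character is a field.
as_string = io.StringIO()
csv.writer(as_string, lineterminator="\n").writerow("0.1")

# Preprocess into a list of lists, one sublist per output row.
rows = []
for pair in raw.split(","):
    key, value = pair.split(":")
    rows.append([key.strip('"'), value])

as_rows = io.StringIO()
csv.writer(as_rows, lineterminator="\n").writerows(rows)
print(as_rows.getvalue())
```

The first writer shows the character-per-field effect from the question on a short string; the second writes one key/value pair per line.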

Convert from space to comma and reorder the values Python

I have a .csv file with many lines and with the structure:
YYY-MM-DD HH first_name quantity_number second_name first_number second_number third_number
I have a script in Python to convert the separator from space to comma, and that is working fine.
import csv
with open('file.csv') as infile, open('newfile.dat', 'w') as outfile:
    for line in infile:
        outfile.write(" ".join(line.split()).replace(' ', ','))
I need to change the position of each value in newfile.dat: for example, put the HH value in position 6, the second_name value in position 2, etc.
Thanks in advance for your help.

If you're importing csv, you might as well use it:
import csv
with open('file.csv', newline='') as infile, open('newfile.dat', 'w+', newline='') as outfile:
    read = csv.reader(infile, delimiter=' ')
    write = csv.writer(outfile) #defaults to excel format, ie commas
    for line in read:
        write.writerow(line)
Use newline='' when opening csv files, otherwise you get double spaced files.
This just writes the line as it is in the input. If you want to change it before writing, do it in the for line in read: loop. line is a list of strings, which you can change the order of in any number of ways.
One way to reorder the values is to use operator.itemgetter:
from operator import itemgetter
getter = itemgetter(5,4,3,2,1,0) #This will reverse a six_element list
for line in read:
    write.writerow(getter(line))
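Applied to the eight-column layout from the question, the idea might look like the sketch below. The sample line and the chosen target order are invented for illustration, and it assumes fields are separated by single spaces (csv.reader would produce empty fields for repeated spaces).

```python
import csv
import io
from operator import itemgetter

# One made-up line following the column layout described in the question:
# date, HH, first_name, quantity_number, second_name, three numbers.
sample = io.StringIO("2018-10-31 13 alice 4 bob 1.5 2.5 3.5\n")

# Illustrative new order: second_name moves to second place, HH to the end.
reorder = itemgetter(0, 4, 2, 3, 5, 6, 7, 1)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for fields in csv.reader(sample, delimiter=" "):
    writer.writerow(reorder(fields))
line = out.getvalue()
print(line)
```

Changing the order only requires editing the index tuple passed to itemgetter.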

To reorder the items, a basic way could be as follows:
split_line = line.split()
column_mapping = [9,6,3,7,3,2,1]  # example indices; each must exist in your rows
reordered = [split_line[c] for c in column_mapping]
joined = ",".join(reordered)
outfile.write(joined + "\n")
This splits the string, reorders it according to column_mapping, and then joins it back into one comma-separated string.
(In your code, define column_mapping outside the loop to avoid reinitialising it on every line.)

Use python to parse values from ping output into csv

I wrote some code that uses a regular expression to look for "time=" and save the value that follows in a string. Then I use the csv.writer method writerow, but each number is interpreted as a column, and this gives me trouble later. Unfortunately there is no 'writecolumn' method. Should I save the values as an array instead of a string and write every row separately?
import re
import csv
inputfile = open("ping.txt")
teststring = inputfile.read()
values = re.findall(r'time=(\d+.\d+)', teststring)
with open('parsed_ping.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(values)
EDIT: I understood that "values" is already a list. I tried to iterate over it and write a row for each item with
for item in values:
    writer.writerow(item)
Now I get a space after each character, like
4 6 . 6
4 7 . 7
EDIT2: The spaces are the delimiters. If I change the delimiter to a comma, I get commas between digits. I just don't get why it's interpreting each digit as a separate column.

If your csv file only contains one column, it's not really a "comma-separated file" anymore, is it?
Just write the list to the file directly:
import re
inputfile = open("ping.txt")
teststring = inputfile.read()
values = re.findall(r'time=(\d+\.\d+)', teststring)
with open('parsed_ping.csv', 'w') as csvfile:
    csvfile.write("\n".join(values))

I solved this. I just needed to use square brackets in the writer.
for item in values:
    writer.writerow([item])
This gives me the correct output.
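The reason the brackets matter is that writerow treats its argument as a sequence of fields, and a string is a sequence of characters. A self-contained sketch follows; the ping lines are made-up sample text, not real output from the question's ping.txt.

```python
import csv
import io
import re

# Made-up sample resembling ping output.
ping_output = (
    "64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=46.6 ms\n"
    "64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=47.7 ms\n"
)
values = re.findall(r"time=(\d+\.\d+)", ping_output)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for item in values:
    writer.writerow([item])  # one-element list, so one field per row
text = buffer.getvalue()
print(text)
```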

Parse delimited csv file using Python, output to terminal or file

I have been working on a Python script to parse a single delimited column in a csv file. However, the column has multiple different delimiters and I can't figure out how to do this.
I have another script that works on similar data, but can't get this one to work. The data below is in a single column on the row. I want to have the script parse these out and add tabs in between each. Then I want to append this data into a list with only the unique items. Typically I am dealing with several hundred rows of this data and would like to parse the entire file and then return only the unique items in two columns (one for IP and other for URL).
Data to parse: 123.123.123.123::url.com,url2.com,234.234.234.234::url3.com (note ":" and "," are used as delimiters on the same line)
Script I am working with:
import sys
import csv
csv_file = csv.DictReader(open(sys.argv[1], 'rb'), delimiter=':')
uniq_rows = []
for column in csv_file:
    X = column[' IP'].split(':')[-1]
    row = X + '\t'
    if row not in uniq_rows:
        uniq_rows.append(row)
for row in uniq_rows:
    print row
Does anyone know how to accomplish what I am trying to do?

Change the list (uniq_rows = []) to a set (uniq_rows = set()):
csv_file = csv.DictReader(open(sys.argv[1], 'rU'), delimiter=':')
uniq_rows = set()
for column in csv_file:
    X = column[' IP'].split(':')[-1]
    row = X + '\t'
    uniq_rows.add(row)
for row in list(uniq_rows):
    print row
If you need further help, leave a comment

You can also just use replace to change your imported lines (not overly Pythonic, I guess, but it's a standard builtin):
>>> a = "123.123.123.123::url.com,url2.com,234.234.234.234::url3.com"
>>> a = a.replace(',','\t')
>>> a = a.replace(':','\t')
>>> print (a)
123.123.123.123 url.com url2.com 234.234.234.234 url3.com
>>>
As mentioned in the comments, here is a simple text manipulation to get you (hopefully) the right output before removing duplicates:
import sys
read_raw_file = open('D:filename.csv') # open current file
read_raw_text = read_raw_file.read()
new_text = read_raw_text.strip()
new_text = new_text.replace(',','\t')
# new_text = new_text.replace('::','\t') optional if you want double : to only include one column
new_text = new_text.replace(':','\t')
text_list = new_text.split('\n')
unique_items = []
for row in text_list:
    if row not in unique_items:
        unique_items.append(row)
new_file ='D:newfile.csv'
with open(new_file,'w') as write_output_file: #generate new file
    for i in range(0,len(unique_items)):
        write_output_file.write(unique_items[i]+'\n')
    write_output_file.close()
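If the goal is two columns of unique IP/URL pairs, one possible sketch is to split on commas first and then on "::". This rests on an assumption that the question does not confirm: a bare URL after a comma belongs to the most recent IP. The function name parse_pairs is made up for illustration.

```python
def parse_pairs(line):
    """Return unique (ip, url) pairs from one mixed-delimiter line, in order."""
    pairs = []
    current_ip = None
    for token in line.strip().split(","):
        if "::" in token:
            # "ip::url" starts a new group
            current_ip, url = token.split("::", 1)
        else:
            # assumption: a bare URL belongs to the most recent IP
            url = token
        pair = (current_ip, url)
        if pair not in pairs:
            pairs.append(pair)
    return pairs

data = "123.123.123.123::url.com,url2.com,234.234.234.234::url3.com"
pairs = parse_pairs(data)
for ip, url in pairs:
    print(ip + "\t" + url)
```

For several hundred rows, the same function could be called once per line and the results merged, with a set used alongside the list if the membership check becomes slow.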
