I have a .csv file with many lines and with the structure:
YYYY-MM-DD HH first_name quantity_number second_name first_number second_number third_number
I have a Python script to convert the separator from space to comma, and that works fine.
import csv
with open('file.csv') as infile, open('newfile.dat', 'w') as outfile:
    for line in infile:
        outfile.write(" ".join(line.split()).replace(' ', ','))
I need to change, in newfile.dat, the position of each value; for example, put the HH value in position 6, the second_name value in position 2, etc.
Thanks in advance for your help.
If you're importing csv, you might as well use it:
import csv
with open('file.csv', newline='') as infile, open('newfile.dat', 'w+', newline='') as outfile:
    read = csv.reader(infile, delimiter=' ')
    write = csv.writer(outfile)  # defaults to excel format, i.e. commas
    for line in read:
        write.writerow(line)
Use newline='' when opening files for the csv module; otherwise the output can end up double-spaced (notably on Windows).
This just writes the line as it is in the input. If you want to change it before writing, do it in the for line in read: loop. line is a list of strings, which you can change the order of in any number of ways.
One way to reorder the values is to use operator.itemgetter:
from operator import itemgetter
getter = itemgetter(5,4,3,2,1,0)  # This will reverse a six-element list
for line in read:
    write.writerow(getter(line))
To reorder the items, a basic way could be as follows:
split_line = line.split()
column_mapping = [0, 4, 2, 3, 5, 1, 6, 7]  # example order for an eight-field line
reordered = [split_line[c] for c in column_mapping]
joined = ",".join(reordered)
outfile.write(joined + "\n")
This splits the line on whitespace, reorders the fields according to column_mapping (a list of zero-based source indexes, one per output position) and then joins them back into one comma-separated string.
(In your code, define column_mapping outside the loop to avoid reinitialising it on every line.)
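Putting the two answers together, here is a minimal sketch of the reordering step. The column order, the function name reorder_columns and the sample line are assumptions made up for illustration (eight whitespace-separated fields, as in the question); adjust the indexes to the order you actually need.

```python
import csv
import io
from operator import itemgetter

# Hypothetical zero-based order: second_name moves to position 2, HH to position 6.
COLUMN_ORDER = (0, 4, 2, 3, 5, 1, 6, 7)

def reorder_columns(infile, outfile, order=COLUMN_ORDER):
    """Read whitespace-separated lines and write them comma-separated in a new order."""
    pick = itemgetter(*order)
    writer = csv.writer(outfile, lineterminator="\n")
    for line in infile:
        fields = line.split()
        if fields:  # skip blank lines
            writer.writerow(pick(fields))

# In-memory demonstration; real code would pass open file objects instead.
sample = io.StringIO("2020-01-31 13 anna 5 smith 1.0 2.0 3.0\n")
result = io.StringIO()
reorder_columns(sample, result)
print(result.getvalue(), end="")
```

Using line.split() rather than a single-space delimiter means runs of several spaces do not produce empty fields.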
I have a string within a text file that reads as one row, but I need to split the string into multiple rows based on a separator. If possible, I would like to separate the elements in the string based on the period (.) separating the different line elements listed here:
"Line 1: Element '{URL1}Decimal': 'x' is not a valid value of the atomic type 'xs:decimal'.Line 2: Element '{URL2}pos': 'y' is not a valid value of the atomic type 'xs:double'.Line 3: Element '{URL3}pos': 'y z' is not a valid value of the list type '{list1}doubleList'"
Here is my current script that is able to read the .txt file and convert it to a csv, but does not separate each entry into its own row.
import glob
import csv
import os
path = "C:\\Users\\mdl518\\Desktop\\txt_strip\\"
with open(os.path.join(path,"test.txt"), 'r') as infile, open(os.path.join(path,"test.csv"), 'w') as outfile:
    stripped = (line.strip() for line in infile)
    lines = (line.split(",") for line in stripped if line)
    writer = csv.writer(outfile)
    writer.writerows(lines)
If possible, I would like to be able to just write to a .txt with multiple rows but a .csv would also work - Any help is most appreciated!
One way to make it work:
import glob
import csv
import os
path = "C:\\Users\\mdl518\\Desktop\\txt_strip\\"
with open(os.path.join(path,"test.txt"), 'r') as infile, open(os.path.join(path,"test.csv"), 'w') as outfile:
    stripped = (line.strip() for line in infile)
    lines = ([sent] for para in (line.split(".") for line in stripped if line) for sent in para)
    writer = csv.writer(outfile)
    writer.writerows(lines)
Explanation below:
The output is one line because writer.writerows() expects an iterable of rows, and your "lines" contains only one row, which holds the entire paragraph. To visualise it, "lines" is stored as [[s1,s2,s3]], whereas writer.writerows() needs [[s1],[s2],[s3]] to write one sentence per row.
There are two changes to make.
(1) Use the period '.' as the separator: line.split(".")
(2) Iterate over the split list in the generator expression, so that each piece becomes its own row.
lines = ([sent] for para in (line.split(".") for line in stripped if line) for sent in para)
str.split() splits a string by a separator and returns the pieces in a list. In your case, that list was stored in the generator expression as a single row, which made it a 2d array: your paragraph was saved as [[s1,s2,s3]].
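One caveat with splitting on every period is that a period inside a message (for example in a real URL) would also start a new row. A possible variation, sketched below, splits only on a period that is directly followed by "Line <n>:". The function name split_messages and the sample text are made up for illustration, and the pattern assumes every entry starts with "Line", as in the question.

```python
import re

def split_messages(text):
    """Split validator output into one message per list item.

    Only a period immediately followed by "Line <n>:" counts as a separator,
    so periods inside a message are left alone.
    """
    parts = re.split(r"\.(?=Line \d+:)", text.strip())
    return [part for part in parts if part]

sample = "Line 1: a.b is bad.Line 2: c is bad.Line 3: d"
messages = split_messages(sample)

# One message per row; writing "\n".join(messages) to a .txt file works the same way.
for message in messages:
    print(message)
```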
I have a Python script where I'm importing a csv that has commas in values over 1000. These values are strings in the csv. I need to remove the commas from the values, and convert the strings to rounded floats inside the csv before it's imported into Python.
I've tried appending all the new values to a list to use the csv.writer, but I haven't been able to figure out how to have the writer only replace the values in the column that have commas. Here's what I have so far:
import csv
RoomReport = r'path_to_csv'
new_values_list = []
f = open(RoomReport, "r")
reader = csv.reader(f)
writer = csv.writer(f)
for row in reader:
    useable_area = row[7]
    if "," in useable_area:
        useable_area_no_comma = useable_area.replace(",","")
        useable_area_rounded = int(round(float(useable_area_no_comma)))
        new_values_list.append(useable_area_rounded)
f.close()
As I mentioned in a comment, this can only be done if the input csv file is formatted in a way that will allow the commas in the numbers to be differentiated from the commas between each one of them.
Here's an example of one way it could be done (by quoting all the values):
"0","1","2","3","4","5","6","7,123.6","8","9"
"0","1","2","3","4","5","6","1,000","8","9"
"0","1","2","3","4","5","6","20,000","8","9"
Here's code that will do what you want. It uses the locale.atof function to simplify cleaning up the number:
import csv
import locale
# Set the locale to someplace that uses a comma for the thousands separator.
# 'English_US.1252' is a Windows locale name; on Linux/macOS 'en_US.UTF-8' is typical.
locale.setlocale(locale.LC_ALL, 'English_US.1252')
RoomReport = r'RoomReport.csv'
cleaned_report = r'RoomReport_cleaned.csv'
new_values_list = []
with open(RoomReport, "r", newline='') as inp:
    for row in csv.reader(inp):
        if "," in row[7]:
            row[7] = int(round(locale.atof(row[7])))
        new_values_list.append(row)
# Create cleaned-up output file.
with open(cleaned_report, "w", newline='') as outp:
    csv.writer(outp, quoting=csv.QUOTE_ALL).writerows(new_values_list)
The RoomReport_cleaned.csv it creates from the example input will contain this:
"0","1","2","3","4","5","6","7124","8","9"
"0","1","2","3","4","5","6","1000","8","9"
"0","1","2","3","4","5","6","20000","8","9"
Note that since the values in the output no longer have commas embedded in them, quoting all the fields is no longer necessary, so csv.QUOTE_ALL could be left out.
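If changing the process-wide locale is not desirable, the same clean-up can be sketched with plain string replacement. This is only an illustration under the same assumption as above (the input fields are quoted, so csv.reader keeps "7,123.6" as one value); clean_row and AREA_COLUMN are names made up for this example.

```python
import csv
import io

AREA_COLUMN = 7  # zero-based index of the column used in the question

def clean_row(row, column=AREA_COLUMN):
    """Return a copy of row with the thousands separators removed and the value rounded."""
    cleaned = list(row)
    if "," in cleaned[column]:
        cleaned[column] = int(round(float(cleaned[column].replace(",", ""))))
    return cleaned

# In-memory demonstration with one line of the quoted sample input.
sample = io.StringIO('"0","1","2","3","4","5","6","7,123.6","8","9"\n')
rows = [clean_row(row) for row in csv.reader(sample)]
print(rows)
```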
Maybe something like this?
import csv
import re
from io import StringIO
from sys import stdout
isnum = re.compile('^[0-9, ]+$')
non = re.compile('[, ]')
# Build a small in-memory CSV to demonstrate with.
fd = StringIO()
out = csv.writer(fd)
out.writerow(['foo','1,000,000',19])
out.writerow(['bar','1,234,567',20])
fd.seek(0)
inp = csv.reader(fd)
out = csv.writer(stdout)
for row in inp:
    for i, x in enumerate(row):
        if isnum.match(x):
            row[i] = float(non.sub('', x))
    out.writerow(row)
I have a file saved as .csv
"400":0.1,"401":0.2,"402":0.3
Ultimately I want to save the data in a proper format in a csv file for further processing. The problem is that there are no line breaks in the file.
pathname = r"C:\pathtofile\file.csv"
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    print(reader)
with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(reader)
The print reader output looks exactly how I want (or at least it's a format I can further process).
"400":0.1
"401":0.2
"402":0.3
And now I want to save that to a new csv file. However the output looks like
"""",4,0,0,"""",:,0,.,1,"
","""",4,0,1,"""",:,0,.,2,"
","""",4,0,2,"""",:,0,.,3
I'm sure it would be intelligent to convert the format to
400,0.1
401,0.2
402,0.3
at this stage instead of doing it later with another script.
The main problem is that my current code
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    reader = csv.reader(reader,delimiter=':')
    x = []
    y = []
    print(reader)
    for row in reader:
        x.append( float(row[0]) )
        y.append( float(row[1]) )
    print(x)
    print(y)
works fine for the type of csv files I currently have, but doesn't work for these mentioned above:
y.append( float(row[1]) )
IndexError: list index out of range
So I'm trying to find a way to work with them too. I think I'm missing something obvious as I imagine that it can't be too hard to properly define the linebreak character and delimiter of a file.
with open(pathname, newline=',') as file:
yields
ValueError: illegal newline value: ,
The right way with csv module, without replacing and casting to float:
import csv
with open('file.csv', 'r') as f, open('filenew.csv', 'w', newline='') as out:
    reader = csv.reader(f)
    writer = csv.writer(out, quotechar=None)
    for r in reader:
        for i in r:
            writer.writerow(i.split(':'))
The resulting filenew.csv contents (according to your "intelligent" condition):
400,0.1
401,0.2
402,0.3
Nuances:
csv.reader and csv.writer objects treat comma , as default delimiter (no need to file.read().replace(',', '\n'))
quotechar=None is specified for csv.writer object to eliminate double quotes around the values being saved
You need to split the values to form a list to represent a row. Presently the code is splitting the string into individual characters to represent the row.
import csv
pathname = r"C:\pathtofile\file.csv"
with open(pathname) as old_file:
    with open(r"C:\pathtofile\filenew.csv", 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter=',')
        text_rows = old_file.read().strip().split(",")
        for row in text_rows:
            items = row.split(":")
            # Strip the surrounding double quotes before converting the key to int.
            csv_writer.writerow([int(items[0].strip('"')), items[1]])
If you look at the documentation for writerow, it says:
Write the row parameter to the writer’s file
object, formatted according to the current dialect.
But, you are writing an entire string in your code
csv_writer.writerow(reader)
because reader is a string at this point.
Now, the format you want to use in your CSV file is not clearly mentioned in the question. But as you said, if you can do some preprocessing to create a list of lists and pass each sublist to writerow(), you should be able to produce the required file format.
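As a rough sketch of that preprocessing step, the function below turns the single-line input into a list of (key, value) rows that csv.writer can write one per line. The name parse_pairs is made up for this example, and it assumes every entry has the "key":value shape shown in the question.

```python
import csv
import io

def parse_pairs(text):
    """Turn '"400":0.1,"401":0.2' into [(400, 0.1), (401, 0.2)]."""
    pairs = []
    for chunk in text.strip().split(","):
        key, value = chunk.split(":")
        pairs.append((int(key.strip('"')), float(value)))
    return pairs

pairs = parse_pairs('"400":0.1,"401":0.2,"402":0.3\n')

# Written to an in-memory buffer here; a real script would open the output
# file with newline='' and pass it to csv.writer instead.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(pairs)
print(buffer.getvalue(), end="")
```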
A CSV returns the following values
"1,323104,564382"
"2,322889,564483"
"3,322888,564479"
"4,322920,564425"
"5,322942,564349"
"6,322983,564253"
"7,322954,564154"
"8,322978,564121"
How would I take the " marks off each end of the rows? It seems to make individual columns when I do this.
reader=[[i[0].replace('\'','')] for i in reader]
does not change the file at all
It seems strictly easier to peel the quotes off first, and then feed it to the csv reader, which simply takes any iterable over lines as input.
import csv
import sys
with open(sys.argv[1]) as f:
    contents = f.read().replace('"', '')
reader = csv.reader(contents.splitlines())
for x, y, z in reader:
    print(x, y, z)
Assuming every line is wrapped by two double quotes, we can do this:
with open("filename.csv", "r") as f:
    newlines = []
    for line in f:  # we could use a list comprehension, but for simplicity, we won't.
        newlines.append(line.rstrip("\n")[1:-1])
with open("filename.csv", "w") as f2:
    for line in newlines:
        f2.write(line + "\n")
rstrip("\n") removes the trailing newline first, so that the closing quote is the last character of the string.
[1:-1] is a slice that runs from index 1 (the second character) up to, but not including, index -1 (the last character), which drops the quote at each end.
Iterating over a file gets you its lines. The output loop iterates over newlines rather than over f2, because a file opened for writing cannot be read.
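Another option, sketched here under the assumption that each line is exactly one quoted field, is to let csv.reader remove the quotes and then split the single field on commas. The function name unwrap_rows and the in-memory sample are made up for illustration.

```python
import csv
import io

def unwrap_rows(lines):
    """Yield [col1, col2, col3] for rows that were each written as one quoted field."""
    for record in csv.reader(lines):
        if record:  # skip blank lines
            # record is a one-element list such as ['1,323104,564382']
            yield record[0].split(",")

sample = io.StringIO('"1,323104,564382"\n"2,322889,564483"\n')
rows = list(unwrap_rows(sample))
print(rows)
```

The resulting rows can be passed straight to csv.writer(...).writerows() to produce a file with three real columns.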
I'm having some problems with the following file.
Each line has the following content:
foobar 1234.569 7890.125 12356.789 -236.4569 236.9874 -569.9844
What I want to edit in this file is to flip the sign of the last three numbers, whether they are positive or negative.
The output should be:
foobar 1234.569 7890.125 12356.789 236.4569 -236.9874 569.9844
Or even better:
foobar,1234.569,7890.125,12356.789,236.4569,-236.9874,569.9844
What is the easiest pythonic way to accomplish this?
At first I used csv.reader, but I found out the file is not tab-separated; the fields are separated by a varying number (3-5) of spaces.
I've read the CSV module and some examples / similar questions here, but my knowledge of python ain't that good and the CSV module seems pretty tough when you want to edit a value of a row.
I can import and edit this in excel with no problem, but I want to use it in a python script, since I have hundreds of these files. VBA in excel is not an option.
Would it be better to just regex each line?
If so, can someone point me in a direction with an example?
You can use str.split() to split your white-space-separated lines into a row:
row = line.split()
then use csv.writer() to create your new file.
str.split() with no arguments, or None as the first argument, splits on arbitrary-width whitespace and ignores leading and trailing whitespace on the line:
>>> 'foobar 1234.569 7890.125 12356.789 -236.4569 236.9874 -569.9844\n'.split()
['foobar', '1234.569', '7890.125', '12356.789', '-236.4569', '236.9874', '-569.9844']
As a complete script:
import csv
# inputfilename and outputcsv are placeholders for your own file paths.
with open(inputfilename, 'r') as infile, open(outputcsv, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in infile:
        row = line.split()
        inverted_nums = [-float(val) for val in row[-3:]]
        writer.writerow(row[:-3] + inverted_nums)
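The per-line transformation can also be pulled out into a small function, which makes it easy to check on one sample line before running it over hundreds of files. This is only a sketch; negate_last_three is a name made up for this example, and the sample line uses uneven spacing on purpose.

```python
def negate_last_three(line):
    """Split a whitespace-separated line and flip the sign of its last three numbers."""
    fields = line.split()
    flipped = [-float(value) for value in fields[-3:]]
    return fields[:-3] + flipped

row = negate_last_three(
    "foobar   1234.569    7890.125 12356.789   -236.4569  236.9874     -569.9844\n"
)
print(",".join(str(field) for field in row))
```

The leading fields stay as strings, so their original formatting is preserved; only the last three are converted to floats.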
from operator import neg
with open('file.txt') as f:
    for line in f:
        line = line.rstrip().split()
        # list() is needed on Python 3, where map() returns an iterator
        last3 = list(map(str, map(neg, map(float, line[-3:]))))
        print("{0},{1}".format(line[0], ','.join(line[1:-3] + last3)))
Produces:
>>>
foobar,1234.569,7890.125,12356.789,236.4569,-236.9874,569.9844
CSV outputting version:
import csv
with open('file.txt') as f, open('ofile.txt', 'w+', newline='') as o:
    writer = csv.writer(o)
    for line in f:
        line = line.rstrip().split()
        last3 = list(map(neg, map(float, line[-3:])))
        writer.writerow(line[:-3] + last3)
You could use genfromtxt:
import numpy as np
a=np.genfromtxt('foo.csv', dtype=None)
with open('foo.csv','w') as f:
    for el in a[()]:
        f.write(str(el)+',')