Clarification:
So if my file has 10 lines:
The first line is a heading, so I want to append some text to the end of the first line.
Then I have a list which contains 9 elements.
I want to read that list and append the corresponding element to the end of each line.
So basically list[0] goes to the second line, list[1] to the third line, and so on.
I have a file which is delimited by commas,
something like this:
A,B,C
0.123,222,942
......
Now I want to do something like this:
A,B,C,D #append "D" just once
0.123,222,942,99293
............
This "D" is actually saved in a list, so yeah, I have this "D".
How do I do this? I mean, I know the naive way:
go through each line and do something like
string += str(list[i])
Basically, how do I append something to the end of each line of a file in a Pythonic way? :)
Just create a new file:
data = ['header', 1, 2, 3, 4]
with open("infile", 'r') as inf, open("infile.2", 'w') as outf:
    outf.writelines('%s,%s\n' % (s.strip(), n) for s, n in zip(inf, data))
If you want to "update" the input file, just rename the new one afterwards:
import os
os.unlink("infile")
os.rename("infile.2", "infile")
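The write-new-then-rename idea can be wrapped in one function. This is a sketch assuming Python 3.3+ (for os.replace); the function name append_values_in_place is hypothetical. It writes to a temporary file in the same directory and then swaps it over the original, so a crash halfway through never leaves a half-written input file.

```python
import os
import tempfile

def append_values_in_place(path, values):
    """Append values[i] as a new comma-separated field at the end of line i."""
    # Create the temporary file next to the original so that os.replace()
    # renames within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as outf, open(path, "r") as inf:
            # rstrip("\n") removes only the line break; like the answer's zip(),
            # this stops at whichever of the file or the list runs out first
            outf.writelines("%s,%s\n" % (line.rstrip("\n"), value)
                            for line, value in zip(inf, values))
        # os.replace() overwrites an existing destination, on Windows too
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

For the question's example, append_values_in_place("infile", ["D", 99293]) turns "A,B,C" into "A,B,C,D" and "0.123,222,942" into "0.123,222,942,99293". Note that any lines beyond the length of values are dropped, and the rewritten file takes on the temporary file's permissions (mkstemp creates it readable and writable only by the owner).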
Short answer: Use the csv module.
Long answer:
import csv

newvalues = [...]

# newline="" is what the csv module expects for files it reads or writes
with open("path/to/input.csv", newline="") as file:
    data = list(csv.reader(file))

with open("path/to/input.csv", "w", newline="") as file:
    writer = csv.writer(file)
    for row, newvalue in zip(data, newvalues):
        row.append(newvalue)
        writer.writerow(row)
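Naturally, this depends on the lines in the file and newvalues being the same length. If they aren't, zip() stops at the shorter of the two, and since this version rewrites the input file, any surplus lines are lost. In that case you could use something like itertools.zip_longest, whose fillvalue argument fills in the excess lines with a given value.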
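A minimal sketch of the zip_longest variant, assuming Python 3; append_column and append_column_to_file are illustrative names. Lines without a matching value get fillvalue, and surplus values with no line to attach to are ignored. A private sentinel is used as zip_longest's fillvalue so that a genuine None in newvalues is not mistaken for a missing value.

```python
import csv
from itertools import zip_longest

_MISSING = object()  # sentinel, so a real None in newvalues is kept as-is

def append_column(rows, newvalues, fillvalue=""):
    """Yield each row with one extra value; pad missing values with fillvalue."""
    for row, value in zip_longest(rows, newvalues, fillvalue=_MISSING):
        if row is _MISSING:
            return  # more values than rows: surplus values are dropped
        yield row + [fillvalue if value is _MISSING else value]

def append_column_to_file(src_path, dst_path, newvalues, fillvalue=""):
    # Streams rows from src to dst, so the whole file is never held in memory
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        rows = append_column(csv.reader(src), newvalues, fillvalue)
        csv.writer(dst).writerows(rows)
```

Because append_column is a generator, the same padding logic works for the in-memory version (pass data) and the two-file version (pass the reader).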
If you are writing to a different file, you can do it even more easily:
import csv

newvalues = [...]

# `from` is a reserved word in Python, so the file objects need other names
with open("path/to/input.csv", newline="") as src, open("path/to/output.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row, newvalue in zip(reader, newvalues):
        row.append(newvalue)
        writer.writerow(row)
This also has the advantage of not reading the entire file into memory, so for very large files, this is a better solution.
Related
I'm trying to delete some number of data rows from a file, essentially just because there are too many data points. I can easily print them to IDLE, but when I try to write the lines to a file, all of the data from one row goes into one column. I'm definitely a noob, but it seems like this should be "trivial".
I've tried it with writerow and writerows, zip(), with and without [], I've changed the delimiter and line terminator.
import csv
filename = "velocity_result.csv"
with open(filename, "r") as source:
    for i, line in enumerate(source):
        if i % 2 == 0:
            with open ("result.csv", "ab") as result:
                result_writer = csv.writer(result, quoting=csv.QUOTE_ALL, delimiter=',', lineterminator='\n')
                result_writer.writerow([line])
This is what happens:
input  = |a|b|c|d|   <- row
         |e|f|g|h|
output = |abcd|      <- every other row deleted (just one column)
My expectation is:
input  = |a|b|c|d|   <- row
         |e|f|g|h|
output = |a|b|c|d|   <- every other row deleted
Once you've read the line, it becomes a single item as far as Python is concerned. Sure, maybe it is a string which has comma-separated values in it, but it is still a single item. So [line] is a list of 1 item, no matter how it is formatted.
If you want to make sure the line is recognized as a list of separate values, you need to make it one, perhaps with split (stripping the trailing newline first, so it doesn't end up inside the last value):
result_writer.writerow(line.rstrip('\n').split('<input file delimiter here>'))
Now the line becomes a list of 4 items, so it makes sense for the csv writer to write them as 4 separate values in the file.
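Alternatively, you can let csv.reader do the splitting. This is a sketch assuming Python 3; keep_every_other_row is a hypothetical name. It opens the output once, in text mode with newline="" (the Python 3 csv writer cannot write to a file opened in "ab" binary mode), and uses itertools.islice to keep rows 0, 2, 4, ..., which matches the question's i % 2 == 0 test.

```python
import csv
from itertools import islice

def keep_every_other_row(src_path, dst_path):
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        # csv.reader turns each line into a list of fields,
        # so writerow() receives separate values instead of one string
        reader = csv.reader(src)
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        # islice(reader, 0, None, 2) yields rows 0, 2, 4, ...
        writer.writerows(islice(reader, 0, None, 2))
```

Called as keep_every_other_row("velocity_result.csv", "result.csv"), it writes each kept row as four quoted columns. Unlike line.split(), csv.reader also copes with quoted fields that contain the delimiter.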
What would be a Pythonic way to create a list of (to illustrate with an example) the fifth string of every line of a text file, assuming it resembles something like this:
12, 27.i, 3, 6.7, Hello, 438
In this case, the script would add "Hello" (without quotes) to the list.
In other words (to generalize), with an input "input.txt", how could I get a list in python that takes the nth string (n being a defined number) of every line?
Many thanks in advance!
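You could use the csv module to read the file, and store all items in the fifth column in a list:
import csv

with open(my_file) as f:
    # skipinitialspace=True drops the space after each comma (" Hello" -> "Hello")
    lst = [row[4] for row in csv.reader(f, skipinitialspace=True)]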
If it's a plain text file, it can be as simple as:
with open(my_file, 'r') as f:
    # add the 5th comma-separated field of each line to mylist;
    # strip() removes the space that follows each comma
    mylist = [line.split(',')[4].strip() for line in f]
Since you mentioned that you are using a .txt file, you can try this:
with open('filename.txt') as fh:
    f = fh.readlines()
f = [i.strip('\n').split(",") for i in f]
new_f = [i[4].strip() for i in f]  # strip() removes the space after the comma
This may not be the most efficient solution, but you could also just hard-code it: e.g. create a counter variable set to zero, add one to it for each word in the line, and append the word to a list when the counter reaches 5. Then reset the counter to zero for the next line.
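To answer the general form of the question ("the nth string of every line"), here is a small sketch with n as a parameter; nth_fields is a hypothetical name. It counts n from 1, so n=5 means the fifth field, and it relies on csv.reader's skipinitialspace option to drop the blank after each comma.

```python
import csv

def nth_fields(lines, n, delimiter=","):
    """Return the nth field (1-based: n=5 means the fifth) of every line."""
    if n < 1:
        raise ValueError("n counts from 1")
    # skipinitialspace=True drops the blank after each delimiter,
    # so " Hello" comes back as "Hello"
    reader = csv.reader(lines, delimiter=delimiter, skipinitialspace=True)
    # Lines with fewer than n fields are skipped rather than raising IndexError
    return [row[n - 1] for row in reader if len(row) >= n]
```

Usage: with open("input.txt") as f: words = nth_fields(f, 5). For the sample line "12, 27.i, 3, 6.7, Hello, 438" this returns ["Hello"]. Silently skipping short lines is a design choice; drop the len(row) >= n filter if you would rather get an error.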
Reading my CSV file returns the following values:
"1,323104,564382"
"2,322889,564483"
"3,322888,564479"
"4,322920,564425"
"5,322942,564349"
"6,322983,564253"
"7,322954,564154"
"8,322978,564121"
How would I take the " marks off each end of the rows? It seems to make individual columns when I do this.
This line:
reader=[[i[0].replace('\'','')] for i in reader]
does not change the file at all.
It seems strictly easier to peel the quotes off first, and then feed it to the csv reader, which simply takes any iterable over lines as input.
import csv
import sys

with open(sys.argv[1]) as f:
    # remove every double quote before the csv module sees the text
    contents = f.read().replace('"', '')
reader = csv.reader(contents.splitlines())
for x, y, z in reader:
    print(x, y, z)
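The same peel-then-parse idea also works line by line, without reading the whole file into one string first. A sketch assuming Python 3; read_unquoted is a hypothetical name, and like the answer above it assumes quotes only ever wrap whole lines.

```python
import csv

def read_unquoted(lines):
    """Remove every double quote from each line, then parse the lines as CSV."""
    # Assumes quotes only wrap whole lines, as in the question's data;
    # a field that legitimately contains a '"' character would be altered too.
    return list(csv.reader(line.replace('"', '') for line in lines))
```

Usage: with open("filename.csv", newline="") as f: rows = read_unquoted(f). Each row then comes back as three separate columns, e.g. ['1', '323104', '564382'].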
Assuming every line is wrapped in two double quotes, we can do this:
f = open("filename.csv", "r")
newlines = []
for line in f:  # we could use a list comprehension, but for simplicity, we won't.
    # remove the trailing newline first, then the quote at each end
    newlines.append(line.rstrip("\n")[1:-1])
f.close()

f2 = open("filename.csv", "w")
for line in newlines:
    f2.write(line + "\n")
f2.close()
[1:-1] is a slice that takes the string from its second character up to, but not including, its last character (indexes 1 and -1), which drops the quote at each end.
rstrip("\n") removes the line break first; without it, [1:-1] would cut off the newline instead of the closing quote.
The whole file is read and closed before it is reopened with "w", because opening it for writing truncates it, and a file opened with "w" cannot be read from.
Iterating over a file gets you its lines.
I have a CSV file that looks like this:
a,b,c
d1,g4,4m
t,35,6y
mm,5,m
I'm trying to replace all the m's and y's preceded by a number with 'month' and 'year' respectively. I'm using the following script.
import re,csv
out = open ("out.csv", "wb")
file = "in.csv"
with open(file, 'r') as f:
    reader = csv.reader(f)
    for ss in reader:
        s = str(ss)

month_pair = (re.compile('(\d\s*)m'), 'months')
year_pair = (re.compile('(\d\s*)y'), 'years')
def substitute(s, pairs):
    for (pattern, substitution) in pairs:
        match = pattern.search(s)
        if match:
            s = pattern.sub(match.group(1)+substitution, s)
    return s
pairs = [month_pair, year_pair]
print (substitute(s, pairs))
It does replace but it does that only on the last row, ignoring the ones before it. How can I have it iterate over all the rows and write to another csv file?
You can use a positive lookbehind (in these examples, s holds the whole file as one string):
>>> re.sub(r'(?<=\d)m','months',s)
'a,b,c\nd1,g4,4months\nt,35,6y\nmm,5,m'
>>> re.sub(r'(?<=\d)y','years',s)
'a,b,c\nd1,g4,4m\nt,35,6years\nmm,5,m'
In this line
print (substitute(s, pairs))
your variable s is only the last line in your file. Note how your file-reading loop overwrites s with the current line on every iteration, so only the last value survives the loop.
Solutions (choose one):
You could try another for-loop to iterate over all lines.
Or move the substitution into the for-loop where you read the lines of the file. This is definitely the better solution!
You can easily look up how to write a new file or change the file you are working on.
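As a sketch of the second option, assuming Python 3: the substitution runs inside the reading loop and each converted row is written to a new CSV file. It combines this with the lookbehind patterns from the other answer and uses the replacement strings from the question's code ('months', 'years'). Note that a lookbehind must be fixed-width, so unlike the question's (\d\s*) patterns it does not allow spaces between the digit and the letter. The names PAIRS, substitute_cell and convert are illustrative.

```python
import csv
import re

# A letter is replaced only when the character right before it is a digit
PAIRS = [(re.compile(r"(?<=\d)m"), "months"),
         (re.compile(r"(?<=\d)y"), "years")]

def substitute_cell(cell):
    for pattern, replacement in PAIRS:
        cell = pattern.sub(replacement, cell)
    return cell

def convert(src, dst):
    """Read CSV rows from src, substitute in every cell, write rows to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # The substitution now happens inside the loop, once per row,
        # so no row is overwritten before it is processed
        writer.writerow([substitute_cell(cell) for cell in row])
```

Usage: with open("in.csv", newline="") as src, open("out.csv", "w", newline="") as dst: convert(src, dst). Working cell by cell (rather than on str(ss), the string form of the whole row) keeps the output a proper CSV file instead of a list representation.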
I am having problems getting my Python script to do what I want. It does not appear to be modifying my file.
I want to:
Read in a *.csv file that has the following format
PropertyName::PropertyValue,…,PropertyName::PropertyValue,{ExtPropertyName::ExtPropertyValue},…,{ExtPropertyName:: ExtPropertyValue}
I want to remove PropertyName:: and leave behind just a column of the PropertyValue
I want to add a header line
I was trying to step through replacing the :: values with a comma, but can't seem to get this to work:
fin = csv.reader(open('infile', 'rb'), delimiter=',')
fout = open('outfile', 'w')
for row in fin:
    fout.write(','.join(','.join(item.split()) for item in row) + '::')
fout.close()
Any advice, whether on my first step problem, or to a bigger picture resolution is always appreciated. Thanks.
UPDATE/EDIT asked for by a person nice enough to review for me!
Here is the first line of the *.csv file (INPUT)
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::44.6743867864386,Length3dCenterToCenter::44.6768028159989,Length2dToInsideEdge::44.2678260053526,Length3dToInsideEdge::44.2717800813466,Length2dToOutsideEdge::44.6743867864386,Length3dToOutsideEdge::44.6768028159989,MinimumCover::0,MaximumCover::0,StartConnection::ImmxGisUtilityNetworkCommon.Connection,
In a perfect world here is what I would like my text file to look like (OUTPUT)
InnerDiameterOrWidth, InnerHeight, Length2dCenterToCenter,,,,,,,,,,,
0.1,0.1,44.6743867864386
So: one header line, with the values in columns below it.
UPDATED JSON Info
The end of each line has JSON formatted text:
{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
Which I need to split into X Y Z and X Y Z, with headers.
Maybe something like this (assuming that each line has the same keys, and in the same order):
import csv

# Python 3: open CSV files in text mode with newline="" (not "rb"/"wb")
with open("diam.csv", newline="") as fin, open("diam_out.csv", "w", newline="") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        split = [item.split("::") for item in line if item.strip()]
        if not split:  # blank line
            continue
        keys, vals = zip(*split)
        if i == 0:
            # first line: write header
            writer.writerow(keys)
        writer.writerow(vals)
which produces
localhost-2:coding $ cat diam_out.csv
InnerDiameterOrWidth,InnerHeight,Length2dCenterToCenter,Length3dCenterToCenter,Length2dToInsideEdge,Length3dToInsideEdge,Length2dToOutsideEdge,Length3dToOutsideEdge,MinimumCover,MaximumCover,StartConnection
0.1,0.1,44.6743867864386,44.6768028159989,44.2678260053526,44.2717800813466,44.6743867864386,44.6768028159989,0,0,ImmxGisUtilityNetworkCommon.Connection
I think most of that code should make sense, except maybe the zip(*split) trick: that basically transposes a sequence, i.e.
>>> s = [['a','1'],['b','2']]
>>> list(zip(*s))
[('a', 'b'), ('1', '2')]
so that the elements are now grouped together by their index (the first ones are all together, the second, etc.)
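The answer above leaves the point fields from the update as single values. One way to expand them is sketched below, assuming every point field looks like "{Name::x[%2C]y[%2C]z}" with exactly three coordinates; the function name split_fields and the NameX/NameY/NameZ header scheme are hypothetical choices, not something the data prescribes.

```python
def split_fields(fields):
    """Turn one parsed CSV row into (key, value) pairs.

    Plain "Name::Value" fields give one pair; point fields such as
    "{StartPoint::x[%2C]y[%2C]z}" are expanded into three pairs.
    """
    pairs = []
    for item in fields:
        item = item.strip()
        if not item:
            continue  # e.g. the empty field left by a trailing comma
        key, value = item.strip("{}").split("::", 1)
        coords = value.split("[%2C]")
        if len(coords) == 3:
            # hypothetical header naming: StartPointX, StartPointY, StartPointZ
            pairs.extend((key + axis, c.strip()) for axis, c in zip("XYZ", coords))
        else:
            pairs.append((key, value.strip()))
    return pairs
```

In the answer's loop, replacing the split = [...] line with split = split_fields(line) keeps keys, vals = zip(*split) working, and the header then gets separate X, Y and Z columns for each point.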