I have a dataset of about 10 CSV files. I want to combine those files row-wise into a single CSV file.
What I tried:
import csv
fout = open("claaassA.csv","a")
# first file:
writer = csv.writer(fout)
for line in open("a01.ihr.60.ann.csv"):
    print line
    writer.writerow(line)
# now the rest:
for num in range(2, 10):
    print num
    f = open("a0"+str(num)+".ihr.60.ann.csv")
    #f.next() # skip the header
    for line in f:
        print line
        writer.writerow(line)
    #f.close() # not really needed
fout.close()
The question definitely needs more details (ideally examples of the inputs and the expected output).
Given the little information provided, I will assume that all files are valid CSV and that they all have the same number of lines (rows). I'll also assume that memory is not a concern (i.e. they are "small" files that fit together in memory). Furthermore, I assume that line endings are newlines (\n).
If all these assumptions are valid, then you can do something like this:
input_files = ['file1.csv', 'file2.csv', 'file3.csv']
output_file = 'output.csv'
output = None
for infile in input_files:
    with open(infile, 'r') as fh:
        if output:
            for i, l in enumerate(fh.readlines()):
                output[i] = "{},{}".format(output[i].rstrip('\n'), l)
        else:
            output = fh.readlines()
with open(output_file, 'w') as fh:
    for line in output:
        fh.write(line)
There are probably more efficient ways, but this is a quick and dirty way to achieve what I think you are asking for.
The previous answer implicitly assumes we need to do this in python. If bash is an option then you could use the paste command. For example:
paste -d, file1.csv file2.csv file3.csv > output.csv
I don't fully understand why you use the csv library. It's enough to fill the output file with the lines from the given files (if they have the same column names and order).
input_path_list = [
"a01.ihr.60.ann.csv",
"a02.ihr.60.ann.csv",
"a03.ihr.60.ann.csv",
"a04.ihr.60.ann.csv",
"a05.ihr.60.ann.csv",
"a06.ihr.60.ann.csv",
"a07.ihr.60.ann.csv",
"a08.ihr.60.ann.csv",
"a09.ihr.60.ann.csv",
]
output_path = "claaassA.csv"
with open(output_path, "w") as fout:
    header_written = False
    for input_path in input_path_list:
        with open(input_path) as fin:
            header = next(fin)
            # it adds the header at the beginning and skips other headers
            if not header_written:
                fout.write(header)
                header_written = True
            # it adds all rows
            for line in fin:
                fout.write(line)
I'm having some problems with the following file.
Each line has the following content:
foobar 1234.569 7890.125 12356.789 -236.4569 236.9874 -569.9844
What I want to do in this file is flip the sign of the last three numbers, whether positive or negative.
The output should be:
foobar 1234.569 7890.125 12356.789 236.4569 -236.9874 569.9844
Or even better:
foobar,1234.569,7890.125,12356.789,236.4569,-236.9874,569.9844
What is the easiest pythonic way to accomplish this?
At first I used the csv.reader, but I found out it's not tab separated, but random (3-5) spaces.
I've read the CSV module and some examples / similar questions here, but my knowledge of python ain't that good and the CSV module seems pretty tough when you want to edit a value of a row.
I can import and edit this in excel with no problem, but I want to use it in a python script, since I have hundreds of these files. VBA in excel is not an option.
Would it be better to just regex each line?
If so, can someone point me in a direction with an example?
You can use str.split() to split your white-space-separated lines into a row:
row = line.split()
then use csv.writer() to create your new file.
str.split() with no arguments, or None as the first argument, splits on arbitrary-width whitespace and ignores leading and trailing whitespace on the line:
>>> 'foobar 1234.569 7890.125 12356.789 -236.4569 236.9874 -569.9844\n'.split()
['foobar', '1234.569', '7890.125', '12356.789', '-236.4569', '236.9874', '-569.9844']
As a complete script:
import csv
# inputfilename and outputcsv are placeholders for your own paths
with open(inputfilename, 'r') as infile, open(outputcsv, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in infile:
        row = line.split()
        inverted_nums = [-float(val) for val in row[-3:]]
        writer.writerow(row[:-3] + inverted_nums)
from operator import neg
with open('file.txt') as f:
    for line in f:
        line = line.rstrip().split()
        # list() is needed on Python 3, where map() returns an iterator
        last3 = list(map(str, map(neg, map(float, line[-3:]))))
        print("{0},{1}".format(line[0], ','.join(line[1:-3] + last3)))
Produces:
>>>
foobar,1234.569,7890.125,12356.789,236.4569,-236.9874,569.9844
CSV outputting version:
import csv
with open('file.txt') as f, open('ofile.txt', 'w', newline='') as o:
    writer = csv.writer(o)
    for line in f:
        line = line.rstrip().split()
        last3 = list(map(neg, map(float, line[-3:])))
        writer.writerow(line[:-3] + last3)
You could use genfromtxt:
import numpy as np
a=np.genfromtxt('foo.csv', dtype=None)
with open('foo.csv','w') as f:
    for el in a[()]:
        f.write(str(el)+',')
I have written a python script to process a set of ASCII files within a given dir. I wonder if there is a more concise and/or "pythonesque" way to do it, without losing readability?
Python Code
import os
import fileinput
import glob
import string
indir='./'
outdir='./processed/'
for filename in glob.glob(indir+'*.asc'): # get a list of input ASCII files to be processed
    fin=open(indir+filename,'r') # input file
    fout=open(outdir+filename,'w') # out: processed file
    lines = iter(fileinput.input([indir+filename])) # iterator over all lines in the input file
    fout.write(next(lines)) # just copy the first line (the header) to output
    for line in lines:
        val=iter(string.split(line,' '))
        fout.write('{0:6.2f}'.format(float(val.next()))), # first value in the line has its own format
        for x in val: # iterate over the rest of the numbers in the line
            fout.write('{0:10.6f}'.format(float(val.next()))), # the rest of the values in the line has a different format
        fout.write('\n')
    fin.close()
    fout.close()
An example:
Input:
;;; This line is the header line
-5.0 1.090074154029272 1.0034662411357929 0.87336062116561186 0.78649408279093869 0.65599958665017222 0.4379879132749317 0.26310799350679176 0.087808018565486673
-4.9900000000000002 1.0890770415316042 1.0025480136545413 0.87256100700428996 0.78577373527626004 0.65539842673645277 0.43758616966566649 0.26286647978335914 0.087727357602906453
-4.9800000000000004 1.0880820021223023 1.0016316956763136 0.87176305623792771 0.78505488659611744 0.65479851808106115 0.43718526271594083 0.26262546925502467 0.087646864773454014
-4.9700000000000006 1.0870890372077564 1.0007172884938402 0.87096676998908273 0.78433753775986659 0.65419986152386733 0.4367851929843618 0.26238496225635727 0.087566540188423345
-4.9600000000000009 1.086098148170821 0.99980479337809591 0.87017214936140763 0.78362168975984026 0.65360245789061966 0.4363859610200459 0.26214495911617541 0.087486383957276398
Processed:
;;; This line is the header line
-5.00 1.003466 0.786494 0.437988 0.087808
-4.99 1.002548 0.785774 0.437586 0.087727
-4.98 1.001632 0.785055 0.437185 0.087647
-4.97 1.000717 0.784338 0.436785 0.087567
-4.96 0.999805 0.783622 0.436386 0.087486
Other than a few minor changes, due to how Python has changed through time, this looks fine.
You're mixing two different styles of next(): the old way was it.next() and the new is next(it). You should use the string method split() instead of going through the string module (that module is there mostly for backwards compatibility with Python 1.x). There's no need to go through the almost useless "fileinput" module, since open file handles are also iterators (that module comes from a time before Python's file handles were iterators).
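As a small illustration of the three points above, here is a minimal sketch. The sample text is made up for the example, and io.StringIO only stands in for an open file so the snippet can run without touching the disk:

```python
import io

# io.StringIO stands in for an open text file (an assumption for this example)
fin = io.StringIO(";;; header line\n-5.0 1.09 1.00\n-4.99 1.08 0.99\n")

header = next(fin)            # built-in next() instead of fin.next()
rows = []
for line in fin:              # a file object is its own line iterator
    fields = line.split()     # str method instead of string.split(line, ' ')
    rows.append([float(v) for v in fields])

print(header.rstrip())
print(rows)
```

Neither the string module nor fileinput is imported here, which is the point of the advice.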
Edit: As @codeape pointed out, glob() returns the full path. Your code would not have worked if indir was something other than "./". I've changed the following to use the correct listdir/os.path.join solution. I'm also more familiar with the "%" string interpolation than string formatting.
Here's how I would write this in more idiomatic modern Python:
import os
def reformat(fin, fout):
    fout.write(next(fin)) # just copy the first line (the header) to output
    for line in fin:
        fields = line.split(' ')
        # Make a format header specific to the number of fields
        fmt = '%6.2f' + ('%10.6f' * (len(fields)-1)) + '\n'
        fout.write(fmt % tuple(map(float, fields)))
basenames = os.listdir(indir) # get a list of input ASCII files to be processed
for basename in basenames:
    input_filename = os.path.join(indir, basename)
    output_filename = os.path.join(outdir, basename)
    with open(input_filename, 'r') as fin, open(output_filename, 'w') as fout:
        reformat(fin, fout)
The Zen of Python is "There should be one-- and preferably only one --obvious way to do it". It's interesting how you used functions which, at some point during the last 10+ years, were "obviously" the right solution, but no longer are. :)
fin=open(indir+filename,'r') # input file
fout=open(outdir+filename,'w') # out: processed file
#code
fin.close()
fout.close()
can be written as:
with open(indir+filename,'r') as fin, open(outdir+filename,'w') as fout:
    #code
In python 2.6, you can use:
with open(indir+filename,'r') as fin:
    with open(outdir+filename,'w') as fout:
        #code
And the line
lines = iter(fileinput.input([indir+filename]))
is useless. You can just iterate over an open file(fin in your case)
You can also do line.split(' ') instead of string.split(line, ' ')
If you change those things, there is no need to import string and fileinput.
Edit: I didn't know you can use inline code. That's cool
In my build script, I have this code:
inFile = open(sourceFile,'r')
outFile = open(targetFile,'w')
for line in inFile:
    line = doKeywordSubstitution(line)
    outFile.write(line)
inFile.close()
outFile.close()
I don't know of a way to make this any more concise. Putting the line-changing logic in a different function looks neater to me though.
I may be missing the point of your code, but I don't understand why you have lines = iter(fileinput.input([indir+filename])).
I don't understand why do you use: string.split(line, ' ') instead of just line.split(' ').
Well maybe I would write the string-processing part like this:
values = line.split(' ')
values[0] = '{0:6.2f}'.format(float(values[0]))
values[1:] = ['{0:10.6f}'.format(float(v)) for v in values[1:]]
fout.write(' '.join(values))
At least for me this looks better but this might be subjective :)
Instead of indir I would use os.curdir. Instead of "./processed" I would do: os.path.join(os.curdir, 'processed').
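A minimal sketch of that suggestion. The file name example.asc is hypothetical and only there to show how the pieces are joined:

```python
import os

indir = os.curdir                               # the current directory, '.'
outdir = os.path.join(os.curdir, 'processed')   # './processed' on POSIX systems

# 'example.asc' is a hypothetical file name, used only for illustration
output_filename = os.path.join(outdir, 'example.asc')
print(output_filename)
```

Building paths with os.path.join means the separator is chosen for the platform instead of being hard-coded as '/'.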
How can I insert a string at the beginning of each line in a text file, I have the following code:
f = open('./ampo.txt', 'r+')
with open('./ampo.txt') as infile:
    for line in infile:
        f.insert(0, 'EDF ')
f.close
I get the following error:
'file' object has no attribute 'insert'
Python comes with batteries included:
import fileinput
import sys
for line in fileinput.input(['./ampo.txt'], inplace=True):
    sys.stdout.write('EDF {l}'.format(l=line))
Unlike the solutions already posted, this also preserves file permissions.
You can't modify a file in place like that. Files do not support insertion. You have to read it all in and then write it all out again.
You can do this line by line if you wish. But in that case you need to write to a temporary file and then replace the original. So, for small enough files, it is just simpler to do it in one go like this:
with open('./ampo.txt', 'r') as f:
    lines = f.readlines()
lines = ['EDF '+line for line in lines]
with open('./ampo.txt', 'w') as f:
    f.writelines(lines)
Here's a solution where you write to a temporary file and move it into place. You might prefer this version if the file you are rewriting is very large, since it avoids keeping the contents of the file in memory, as versions that involve .read() or .readlines() will. In addition, if there is any error in reading or writing, your original file will be safe:
from shutil import move
from tempfile import NamedTemporaryFile
filename = './ampo.txt'
tmp = NamedTemporaryFile(delete=False)
with open(filename) as finput:
    with open(tmp.name, 'w') as ftmp:
        for line in finput:
            ftmp.write('EDF '+line)
move(tmp.name, filename)
For a file not too big:
with open('./ampo.txt', 'rb+') as f:
    x = f.read()
    f.seek(0, 0)
    # the file is opened in binary mode, so bytes literals are required on Python 3
    f.writelines((b'EDF ', x.replace(b'\n', b'\nEDF ')))
    f.truncate()
Note that in this case (the content grows), the f.truncate() call is not strictly necessary: the new content is longer than the old one, so it overwrites every byte of the original. That's what I observed on examples.
I keep the call anyway, out of prudence. Closing a file (which is all the with statement does on exit) never writes an end-of-file marker and never cuts the file at the current position. So when the new content is shorter than the old one, the trailing characters of the original content remain in the file unless truncate() is called.
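A small sketch of what happens when the rewritten content is shorter than the original, with and without truncate(). The temporary file and its contents are made up for the demonstration:

```python
import os
import tempfile

# Throwaway file; its name and contents are made up for this demonstration
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, 'w') as f:
    f.write('0123456789')

# Overwrite with shorter content, without truncating
with open(path, 'r+') as f:
    f.write('abc')

with open(path) as f:
    without_truncate = f.read()   # the old tail is still there

# Same again, this time cutting the file at the current position
with open(path, 'r+') as f:
    f.write('abc')
    f.truncate()

with open(path) as f:
    with_truncate = f.read()

os.remove(path)
print(without_truncate)  # abc3456789
print(with_truncate)     # abc
```

Leaving the with block only closes the file; it is truncate() that discards whatever lies beyond the current position.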
For a big file, to avoid loading the whole content of the file into RAM at once:
import os
def addsomething(filepath, ss):
    # build a temporary file name next to the original file
    if filepath.rfind('.') > filepath.rfind(os.sep):
        a, _, c = filepath.rpartition('.')
        tempi = a + 'temp.' + c
    else:
        tempi = filepath + 'temp'
    # text mode, so that the str prefix can be concatenated with each line on Python 3
    with open(filepath, 'r') as f, open(tempi, 'w') as g:
        g.writelines(ss + line for line in f)
    os.remove(filepath)
    os.rename(tempi, filepath)
addsomething('./ampo.txt', 'WZE')
f = open('./ampo.txt', 'r')
lines = ['EDF ' + l for l in f.readlines()]
f.close()
f = open('./ampo.txt', 'w')
# on Python 3, map() is lazy, so a map() of f.write calls would write nothing
f.writelines(lines)
f.close()
How do I write a list to a file? writelines() doesn't insert newline characters, so I need to do:
f.writelines([f"{line}\n" for line in lines])
Use a loop:
with open('your_file.txt', 'w') as f:
    for line in lines:
        f.write(f"{line}\n")
For Python <3.6:
with open('your_file.txt', 'w') as f:
    for line in lines:
        f.write("%s\n" % line)
For Python 2, one may also use:
with open('your_file.txt', 'w') as f:
    for line in lines:
        print >> f, line
If you're keen on a single function call, at least remove the square brackets [], so that the strings to be printed get made one at a time (a genexp rather than a listcomp) -- no reason to take up all the memory required to materialize the whole list of strings.
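A minimal sketch of the single-call version with a generator expression. The sample data is made up, and io.StringIO stands in for the open file so the snippet has no side effects:

```python
import io

lines = ['first', 'second', 'third']   # sample data, made up for the example

# io.StringIO stands in for an open file (an assumption for this example)
f = io.StringIO()

# No square brackets: the strings are produced one at a time,
# so no temporary list of all lines is built in memory
f.writelines(f"{line}\n" for line in lines)

print(repr(f.getvalue()))
```

With a real file, the same writelines() call goes inside a with open(...) block.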
What are you going to do with the file? Does this file exist for humans, or other programs with clear interoperability requirements?
If you are just trying to serialize a list to disk for later use by the same python app, you should be pickling the list.
import pickle
with open('outfile', 'wb') as fp:
    pickle.dump(itemlist, fp)
To read it back:
with open('outfile', 'rb') as fp:
    itemlist = pickle.load(fp)
Simpler is:
with open("outfile", "w") as outfile:
    outfile.write("\n".join(itemlist))
To ensure that all items in the item list are strings, use a generator expression:
with open("outfile", "w") as outfile:
    outfile.write("\n".join(str(item) for item in itemlist))
Remember that itemlist takes up memory, so take care about the memory consumption.
Using Python 3 and Python 2.6+ syntax:
with open(filepath, 'w') as file_handler:
    for item in the_list:
        file_handler.write("{}\n".format(item))
This is platform-independent. It also terminates the final line with a newline character, which is a UNIX best practice.
Starting with Python 3.6, "{}\n".format(item) can be replaced with an f-string: f"{item}\n".
Yet another way. Serialize to json using simplejson (included as json in python 2.6):
>>> import simplejson
>>> f = open('output.txt', 'w')
>>> simplejson.dump([1,2,3,4], f)
>>> f.close()
If you examine output.txt:
[1, 2, 3, 4]
This is useful because the syntax is pythonic, it's human readable, and it can be read by other programs in other languages.
I thought it would be interesting to explore the benefits of using a genexp, so here's my take.
The example in the question uses square brackets to create a temporary list, and so is equivalent to:
file.writelines( list( "%s\n" % item for item in list ) )
This needlessly constructs a temporary list of all the lines that will be written out, which may consume significant amounts of memory depending on the size of your list and how verbose the output of str(item) is.
Dropping the square brackets (equivalent to removing the wrapping list() call above) will instead pass a temporary generator to file.writelines():
file.writelines( "%s\n" % item for item in list )
This generator will create newline-terminated representation of your item objects on-demand (i.e. as they are written out). This is nice for a couple of reasons:
Memory overheads are small, even for very large lists
If str(item) is slow there's visible progress in the file as each item is processed
This avoids memory issues, such as:
In [1]: import os
In [2]: f = file(os.devnull, "w")
In [3]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 385 ms per loop
In [4]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
ERROR: Internal Python error in the inspect module.
Below is the traceback from this internal error.
Traceback (most recent call last):
...
MemoryError
(I triggered this error by limiting Python's max. virtual memory to ~100MB with ulimit -v 102400).
Putting memory usage to one side, this method isn't actually any faster than the original:
In [4]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 370 ms per loop
In [5]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
1 loops, best of 3: 360 ms per loop
(Python 2.6.2 on Linux)
Because i'm lazy....
import json
a = [1,2,3]
with open('test.txt', 'w') as f:
    f.write(json.dumps(a))
#Now read the file back into a Python list object
with open('test.txt', 'r') as f:
    a = json.loads(f.read())
Serialize the list into a text file as comma-separated values:
mylist = dir()
with open('filename.txt','w') as f:
    f.write( ','.join( mylist ) )
In Python 3 you can use print and * for argument unpacking:
with open("fout.txt", "w") as fout:
    print(*my_list, sep="\n", file=fout)
Simply:
with open("text.txt", 'w') as file:
    file.write('\n'.join(yourList))
In General
Following is the syntax for writelines() method
fileObject.writelines( sequence )
Example
#!/usr/bin/python
# Open a file for appending ("rw+" is not a valid mode)
fo = open("foo.txt", "a")
seq = ["This is 6th line\n", "This is 7th line"]
# Write sequence of lines at the end of the file.
fo.writelines( seq )
# Close opened file
fo.close()
Reference
http://www.tutorialspoint.com/python/file_writelines.htm
file.write('\n'.join(list))
Using numpy.savetxt is also an option:
import numpy as np
np.savetxt('list.txt', list, delimiter="\n", fmt="%s")
You can also use the print function if you're on python3 as follows.
with open("myfile.txt", "w") as f:  # text mode: print() writes str, not bytes
    print(mylist, file=f)
with open("test.txt", "w") as fp:
    for line in list12:
        fp.write(line+"\n")
Why don't you try
file.write(str(list))
I recently found Path to be useful. It saves me from having to write with open('file') as f before writing to the file. Hope this is useful to someone :).
from pathlib import Path
import json
a = [[1,2,3],[4,5,6]]
# write
Path("file.json").write_text(json.dumps(a))
# read
json.loads(Path("file.json").read_text())
You can also go through following:
Example:
my_list=[1,2,3,4,5,"abc","def"]
with open('your_file.txt', 'w') as file:
    for item in my_list:
        file.write("%s\n" % item)
Output:
In your_file.txt items are saved like:
1
2
3
4
5
abc
def
Your script also saves as above.
Otherwise, you can use pickle
import pickle
my_list=[1,2,3,4,5,"abc","def"]
#to write
with open('your_file.txt', 'wb') as file:
    pickle.dump(my_list, file)
#to read
with open('your_file.txt', 'rb') as file:
    Outlist = pickle.load(file)
print(Outlist)
Output:
[1, 2, 3, 4, 5, 'abc', 'def']
It dumps the list as a list object, so when we load it back we get the same list.
The same output is also possible with simplejson:
import simplejson as sj
my_list=[1,2,3,4,5,"abc","def"]
#To write
with open('your_file.txt', 'w') as file:
    sj.dump(my_list, file)
#To read
with open('your_file.txt', 'r') as file:
    mlist=sj.load(file)
print(mlist)
This logic will first convert the items in list to string(str). Sometimes the list contains a tuple like
alist = [(112, 'tiger'),
         (113, 'lion')]
This logic will write to file each tuple in a new line. We can later use eval while loading each tuple when reading the file:
outfile = open('outfile.txt', 'w') # open a file in write mode
for item in list_to_persistence: # iterate over the list items
    outfile.write(str(item) + '\n') # write to the file
outfile.close() # close the file
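A sketch of the read-back step, under two assumptions: the sample tuples are made up for the example, and ast.literal_eval is swapped in for plain eval because it only accepts Python literals and is therefore safer on file contents:

```python
import ast
import os
import tempfile

# Sample data, made up for the example
list_to_persist = [(112, 'tiger'), (113, 'lion')]

# Throwaway file so the sketch leaves nothing behind
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# Write one str(item) per line
with open(path, 'w') as outfile:
    for item in list_to_persist:
        outfile.write(str(item) + '\n')

# Read each line back and turn it into a tuple again
with open(path) as infile:
    loaded = [ast.literal_eval(line.strip()) for line in infile if line.strip()]

os.remove(path)
print(loaded)
```

This only round-trips items whose str() form is a valid Python literal (numbers, strings, tuples, lists, dicts and so on).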
Another way of iterating and adding newline:
for item in items:
    filewriter.write(f"{item}" + "\n")
In Python 3 you can use this loop:
with open('your_file.txt', 'w') as f:
    for item in list:
        print(item, file=f)
Redirecting stdout to a file might also be useful for this purpose:
from contextlib import redirect_stdout
with open('test.txt', 'w') as f:
    with redirect_stdout(f):
        for item in mylst:
            print(item)
I suggest this solution:
with open('your_file.txt', 'w') as f:
    list(map(lambda item : f.write("%s\n" % item),my_list))
Let avg be the list and n be numpy (import numpy as n), then:
In [29]: a = n.array((avg))
In [31]: a.tofile('avgpoints.dat',sep='\n',format = '%f')
You can use %e or %s depending on your requirement.
I think you are looking for an answer like this:
f = open('output.txt','w')
values = [3, 15.2123, 118.3432, 98.2276, 118.0043]
f.write('a= {:>3d}, b= {:>8.4f}, c= {:>8.4f}, d= {:>8.4f}, e= {:>8.4f}\n'.format(*values))
f.close()
poem = '''\
Programming is fun
When the work is done
if you wanna make your work also fun:
use Python!
'''
f = open('poem.txt', 'w') # open for 'w'riting
f.write(poem) # write text to file
f.close() # close the file
How It Works:
First, open a file by using the built-in open function and specifying the name of the file and the mode in which we want to open the file. The mode can be a read mode ('r'), write mode ('w') or append mode ('a'). We can also specify whether we are reading, writing, or appending in text mode ('t') or binary mode ('b'). There are actually many more modes available and help(open) will give you more details about them. By default, open() considers the file to be a 't'ext file and opens it in 'r'ead mode.
In our example, we first open the file in write text mode and use the write method of the file object to write to the file and then we finally close the file.
The above example is from the book "A Byte of Python" by Swaroop C H.
swaroopch.com
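The modes described in the quoted explanation can be tried out with a short sketch like the one below. It is not from the book; the temporary file and its contents are made up for the example:

```python
import os
import tempfile

# Throwaway file so the sketch leaves nothing behind
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, 'w') as f:      # 'w': write text, replacing any existing content
    f.write('first line\n')

with open(path, 'a') as f:      # 'a': append text at the end
    f.write('second line\n')

with open(path) as f:           # no mode given: defaults to 'rt' (read, text)
    text = f.read()

with open(path, 'rb') as f:     # 'rb': read in binary mode, returns bytes
    raw = f.read()

os.remove(path)
print(text)
print(type(raw))
```

Using with open(...) closes each file automatically, so the explicit close() call from the poem example is not needed here.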