How do I concatenate multiple CSV files row-wise using Python?

I have a dataset of about 10 CSV files. I want to combine those files row-wise into a single CSV file.
What I tried:
import csv

fout = open("claaassA.csv", "a")
# first file:
writer = csv.writer(fout)
for line in open("a01.ihr.60.ann.csv"):
    print line
    writer.writerow(line)
# now the rest:
for num in range(2, 10):
    print num
    f = open("a0" + str(num) + ".ihr.60.ann.csv")
    #f.next() # skip the header
    for line in f:
        print line
        writer.writerow(line)
    #f.close() # not really needed
fout.close()

We definitely need more details in the question (ideally examples of the inputs and expected output).
Given the little information provided, I will assume that you know that all files are valid CSV and that they all have the same number of lines (rows). I'll also assume that memory is not a concern (i.e. they are "small" files that fit together in memory). Furthermore, I assume that the line endings are newlines (\n).
If all these assumptions are valid, then you can do something like this:
input_files = ['file1.csv', 'file2.csv', 'file3.csv']
output_file = 'output.csv'

output = None
for infile in input_files:
    with open(infile, 'r') as fh:
        if output:
            for i, l in enumerate(fh.readlines()):
                output[i] = "{},{}".format(output[i].rstrip('\n'), l)
        else:
            output = fh.readlines()

with open(output_file, 'w') as fh:
    for line in output:
        fh.write(line)
There are probably more efficient ways, but this is a quick and dirty way to achieve what I think you are asking for.
The previous answer implicitly assumes we need to do this in Python. If bash is an option, then you could use the paste command. For example:
paste -d, file1.csv file2.csv file3.csv > output.csv
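For completeness, here is a rough Python equivalent of paste (a sketch, assuming all files have the same number of lines; the filenames are just placeholders):
input_files = ['file1.csv', 'file2.csv', 'file3.csv']
handles = [open(f) for f in input_files]
with open('output.csv', 'w') as out:
    # zip iterates the files in parallel, one line from each at a time
    for rows in zip(*handles):
        out.write(','.join(r.rstrip('\n') for r in rows) + '\n')
for h in handles:
    h.close()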

I don't fully understand why you use the csv library. It's actually enough to fill the output file with the lines from the given files (if they have the same column names and order).
input_path_list = [
    "a01.ihr.60.ann.csv",
    "a02.ihr.60.ann.csv",
    "a03.ihr.60.ann.csv",
    "a04.ihr.60.ann.csv",
    "a05.ihr.60.ann.csv",
    "a06.ihr.60.ann.csv",
    "a07.ihr.60.ann.csv",
    "a08.ihr.60.ann.csv",
    "a09.ihr.60.ann.csv",
]

output_path = "claaassA.csv"

with open(output_path, "w") as fout:
    header_written = False
    for input_path in input_path_list:
        with open(input_path) as fin:
            header = next(fin)
            # write the header once at the beginning and skip the other headers
            if not header_written:
                fout.write(header)
                header_written = True
            # copy all the remaining rows
            for line in fin:
                fout.write(line)
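If the filenames follow a fixed pattern like the ones above, the list could also be built with glob instead of being hard-coded (a sketch; the pattern is an assumption based on the names in the question):
import glob

# assumed pattern matching a01...a09; sorted() keeps the files in order
input_path_list = sorted(glob.glob("a0*.ihr.60.ann.csv"))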

Related

Adding a comma to the end of the first row of CSV files within a directory using Python

I've got some code that lets me open all CSV files in a directory and run through them, removing the top two lines of each file. Ideally, during this process I would also like it to add a single comma at the end of the new first line (what would have originally been line 3).
Another possible approach could be to remove the trailing commas on all other rows that appear in each of the CSVs.
Any thoughts or approaches would be gratefully received.
import glob

path = r'P:\pytest'
for filename in glob.iglob(path + '/*.csv'):
    with open(filename, 'r') as f:
        lines = f.read().split("\n")
    if len(lines) >= 1:
        lines = lines[2:]
    o = open(filename, 'w')
    for line in lines:
        o.write(line + '\n')
    o.close()
Adding a counter in there can solve this:
import glob

path = r'C:/Users/dsqallihoussaini/Desktop/dev_projects/stack_over_flow'
for filename in glob.iglob(path + '/*.csv'):
    with open(filename, 'r') as f:
        lines = f.read().split("\n")
        print(lines)
    if len(lines) >= 1:
        lines = lines[2:]
    o = open(filename, 'w')
    counter = 0
    for line in lines:
        counter = counter + 1
        if counter == 1:
            o.write(line + ',\n')
        else:
            o.write(line + '\n')
    o.close()
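A slightly tidier variant of the same write loop (a sketch with identical behavior) uses enumerate instead of a manual counter:
for i, line in enumerate(lines):
    # only the first kept line (index 0) gets the extra comma
    o.write(line + (',\n' if i == 0 else '\n'))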
One possible problem with your code is that you are reading the whole file into memory, which might be fine. If you are reading larger files, then you want to process the file line by line.
The easiest way to do that is to use the fileinput module: https://docs.python.org/3/library/fileinput.html
Something like the following should work:
#!/usr/bin/env python3
import glob
import fileinput

# inplace makes a backup of the file, then any output to stdout is written
# to the current file.
# change the glob... below is just an example.
#
# Iterate through each file in the glob.iglob() results
with fileinput.input(files=glob.iglob('*.csv'), inplace=True) as f:
    for line in f:  # Iterate over each line of the current file.
        if f.filelineno() > 2:  # Skip the first two lines
            # Note: 'line' has the newline in it.
            # Insert the comma on line 3 of the file, otherwise output the original line
            print(line[:-1] + ',') if f.filelineno() == 3 else print(line, end="")
I've added some encoding as well, as mine was throwing an error, but encoding fixed that up nicely:
import glob

path = r'C:/whateveryourfolderis'
for filename in glob.iglob(path + '/*.csv'):
    with open(filename, 'r', encoding='utf-8') as f:
        lines = f.read().split("\n")
        #print(lines)
    if len(lines) >= 1:
        lines = lines[2:]
    o = open(filename, 'w', encoding='utf-8')
    counter = 0
    for line in lines:
        counter = counter + 1
        if counter == 1:
            o.write(line + ',\n')
        else:
            o.write(line + '\n')
    o.close()

Python prints two lines on the same line when merging files

I am new to Python; I'm getting this result and I am not sure how to fix it efficiently.
I have n files, let's say for simplicity just two, with some info with this format:
1.250484649 4.00E-02
2.173737246 4.06E-02
... ...
This continues up to m lines. I'm trying to append all the m lines from the n files into a single file. I prepared this code:
import glob

outfile = open('temp.txt', 'w')
for inputs in glob.glob('*.dat'):
    infile = open(inputs, 'r')
    for row in infile:
        outfile.write(row)
It reads all the .dat files (the ones I am interested in) and it does what I want but it merges the last line of the first file and the first line of the second file into a single line:
1.250484649 4.00E-02
2.173737246 4.06E-02
3.270379524 2.94E-02
3.319202217 6.56E-02
4.228424345 8.91E-03
4.335169497 1.81E-02
4.557886098 6.51E-02
5.111075901 1.50E-02
5.547288248 3.34E-02
5.685118615 3.22E-03
5.923718239 2.86E-02
6.30299944 8.05E-03
6.528018125 1.25E-020.704223685 4.98E-03
1.961058114 3.07E-03
... ...
I'd like to fix this in a smart way. I can fix this if I introduce a blank line between each data line and then, at the end, remove all the blank lines, but this seems suboptimal.
Thank you!
There's no newline on the last line of each .dat file, so you'll need to add it:
import glob

with open('temp.txt', 'w') as outfile:
    for inputs in glob.glob('*.dat'):
        with open(inputs, 'r') as infile:
            for row in infile:
                if not row.endswith("\n"):
                    row = f"{row}\n"
                outfile.write(row)
Also using with (context managers) to automatically close the files afterwards.
To avoid a trailing newline - there are a few ways to do this, but the simplest one that comes to mind is to load all the input data into memory as individual lines, then write it out in one go using "\n".join(lines). This puts "\n" between each line but not at the end of the last line in the file.
import glob

lines = []
for inputs in glob.glob('*.dat'):
    with open(inputs, 'r') as infile:
        lines += [line.rstrip('\n') for line in infile.readlines()]

with open('temp.txt', 'w') as outfile:
    outfile.write('\n'.join(lines))
[line.rstrip('\n') for line in infile.readlines()] is a list comprehension. It makes a list of each line in an individual input file, with the '\n' removed from the end of the line. That list can then be appended (with +=) to the overall list of lines.
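Written out as a plain loop, the comprehension is equivalent to:
# same effect as the list comprehension above
for line in infile.readlines():
    lines.append(line.rstrip('\n'))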
While we're here - let's use logging to give status updates:
import glob
import logging

# without this, logging.info() messages are hidden by the default WARNING level
logging.basicConfig(level=logging.INFO)

OUT_FILENAME = 'test.txt'

lines = []
for inputs in glob.glob('*.dat'):
    logging.info(f'Opening {inputs} to read...')
    with open(inputs, 'r') as infile:
        lines += [line.rstrip('\n') for line in infile.readlines()]
    logging.info(f'Finished reading {inputs}')

logging.info(f'Opening {OUT_FILENAME} to write...')
with open(OUT_FILENAME, 'w') as outfile:
    outfile.write('\n'.join(lines))
logging.info(f'Finished writing {OUT_FILENAME}')

How to merge txt files using Python?

I have two txt files. I want to merge those files using Python.
I just began to study Python and need some help. I tried searching Google to resolve this, but I can't find a solution.
So please help me.
Below are my two txt file.
a.txt has this data.
Zone,alias1,alias2
PA1_H1,PA1,H1
PA2_H2,PA2,H2
b.txt has this data.
WWN,Port,Aliases
da,0,PA1
dq,1,PA2
d2,3,H1
d4,1,H2
Expected Output
Zone,alias1,WWN,Port,alias2,WWN,Port
PA1_H1,PA1,da,0,H1,d2,3
PA2_H2,PA2,dq,1,H2,d4,1
I tried the script below, but I can't merge the files.
row = []
for line in open("mod_alias.txt"):
    line = line.split(',')[2]
    row.append(line)

strings = row

for line in open("mod_all_cfgshow.txt"):
    if any(s in line for s in strings):
        field1, field2, field3 = line.split(',')
        print field1, field2, field3
How can I merge the files?
Could you show me an example?
Here's some code to get you started. This code will show you how to open both files and combine them. Then, all you'll need to do is modify the code to merge the files using whichever specific rules you'd like.
# Open the files, read their data into lists, and strip the data
with open("b.txt") as bFile:
    bContent = bFile.readlines()
bContent = [line.strip() for line in bContent]

with open('a.txt') as aFile:
    aContent = aFile.readlines()
aContent = [line.strip() for line in aContent]

# Create a file to store the merged text
m = open('merged.txt', 'w')

# Cycle through the text read from the files, merge, and then print to the file
for aLine, bLine in zip(aContent, bContent):
    mergedString = aLine + ',' + bLine
    print(mergedString, file=m)

m.close()
This should get you started:
import csv

# read all the data in b.txt into a dictionary, keyed by the alias; we'll look this up later
data = {}
with open("b.txt") as infile:
    for row in csv.DictReader(infile):
        alias = row["Aliases"]
        data[alias] = row

with open("a.txt") as fin, open("output.txt", "w", newline="") as fout:
    reader = csv.DictReader(fin)
    # a.txt has no "Aliases" column, so each row is looked up by its alias1 value;
    # extrasaction="ignore" drops the extra "Aliases" key from the merged rows
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames + ["WWN", "Port"],
                            extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        row.update(data[row["alias1"]])  # merge in the matching columns from b.txt
        writer.writerow(row)
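Note that this sketch only merges the columns looked up for alias1; to match the expected output exactly, you would repeat the lookup with data[row["alias2"]] and write its WWN and Port values under separate column names.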

"Move" some parts of the file to another file

Let's say I have a file with 48,222 lines, and I then give an index value, let's say 21,000.
Is there any way in Python to "move" the contents of the file starting from index 21,000, so that I end up with two files: the original one and a new one, where the original one now has 21,000 lines and the new one 27,222 lines?
I read this post, which uses partition and pretty much describes what I want:
with open("inputfile") as f:
contents1, sentinel, contents2 = f.read().partition("Sentinel text\n")
with open("outputfile1", "w") as f:
f.write(contents1)
with open("outputfile2", "w") as f:
f.write(contents2)
Except that (1) it uses "Sentinel text" as the separator, and (2) it creates two new files and requires me to delete the old file. As of now, the way I do it is like this:
for r in result.keys():  # the filenames are in my dictionary, don't bother with that
    f = open(r)
    lines = f.readlines()
    f.close()
    with open("outputfile1.txt", "w") as fn:
        for line in lines[0:21000]:
            fn.write(line)  # write each line
    with open("outputfile2.txt", "w") as fn:
        for line in lines[21000:]:
            fn.write(line)  # write each line
This is quite a lot of manual work. Is there a built-in or more efficient way?
You can also use writelines() and dump the sliced list of lines from 0 to 20999 into one file and another sliced list from 21000 to the end into another file.
with open("inputfile") as f:
content = f.readlines()
content1 = content[:21000]
content2 = content[21000:]
with open("outputfile1.txt", "w") as fn1:
fn1.writelines(content1)
with open('outputfile2.txt','w') as fn2:
fn2.writelines(content2)
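If the file is too large to hold in memory, a line-by-line sketch with itertools.islice (my variant, not from the answer above) avoids reading it all at once:
from itertools import islice

with open("inputfile") as f:
    with open("outputfile1.txt", "w") as fn1:
        fn1.writelines(islice(f, 21000))  # consumes the first 21,000 lines
    with open("outputfile2.txt", "w") as fn2:
        fn2.writelines(f)  # writes whatever remains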

How to read one particular line from a .txt file in Python?

I know I can read the file line by line with
dataFile = open('myfile.txt', 'r')
firstLine = dataFile.readline()
secondLine = dataFile.readline()
...
I also know how to read all the lines in one go
dataFile = open('myfile.txt', 'r')
allLines = dataFile.read()
But my question is: how do I read one particular line from a .txt file?
I wish to read that line by its index.
E.g. I want the 4th line, so I expect something like:
dataFile = open('myfile.txt', 'r')
allLines = dataFile.readLineByIndex(3)
Skip 3 lines:
with open('myfile.txt', 'r') as dataFile:
    for i in range(3):
        next(dataFile)
    the_4th_line = next(dataFile)
Or use linecache.getline:
import linecache

the_4th_line = linecache.getline('myfile.txt', 4)
From another answer:
Use Python Standard Library's linecache module:
line = linecache.getline(thefilename, 33)
should do exactly what you want. You don't even need to open the file -- linecache does it all for you!
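Another option (a sketch of mine, assuming the file has at least four lines) is itertools.islice, which skips the first three lines and yields the fourth:
from itertools import islice

with open('myfile.txt', 'r') as dataFile:
    the_4th_line = next(islice(dataFile, 3, 4))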
You can do exactly as you wanted with this:
DataFile = open('mytext.txt', 'r')
content = DataFile.readlines()
oneline = content[5]
DataFile.close()
You could take this down to three lines by removing oneline = content[5] and using content[5] directly without creating another variable (print(content[5]), for example). I did this just to make it clear that content must be indexed as a list to read the one line.
