I have two txt files and I want to merge them using Python.
I just began studying Python and need some help. I tried searching Google for a way to do this, but I can't find a solution.
So please help me.
Below are my two txt files.
a.txt has this data.
Zone,alias1,alias2
PA1_H1,PA1,H1
PA2_H2,PA2,H2
b.txt has this data.
WWN,Port,Aliases
da,0,PA1
dq,1,PA2
d2,3,H1
d4,1,H2
Expected Output
Zone,alias1,WWN,Port,alias2,WWN,Port
PA1_H1,PA1,da,0,H1,d2,3
PA2_H2,PA2,dq,1,H2,d4,1
I tried the script below, but I can't get the files to merge.
row = []
for line in open("mod_alias.txt"):
    line = line.split(',')[2]
    row.append(line)

strings = row
for line in open("mod_all_cfgshow.txt"):
    if any(s in line for s in strings):
        field1, field2, field3 = line.split(',')
        print(field1, field2, field3)
How can I merge the files?
Could you show me an example?
Here's some code to get you started. This code will show you how to open both files and combine them. Then, all you'll need to do is modify the code to merge the files using whichever specific rules you'd like.
# Open files, read data into lists, and strip whitespace
with open("b.txt") as bFile:
    bContent = [line.strip() for line in bFile.readlines()]
with open("a.txt") as aFile:
    aContent = [line.strip() for line in aFile.readlines()]

# Create a file to store the merged text, then pair up the lines and write them out
with open('merged.txt', 'w') as m:
    for aLine, bLine in zip(aContent, bContent):
        print(aLine + ',' + bLine, file=m)
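For the sample files above, merged.txt would come out as follows (note that zip stops at the shorter file, so the last two rows of b.txt are dropped):
Zone,alias1,alias2,WWN,Port,Aliases
PA1_H1,PA1,H1,da,0,PA1
PA2_H2,PA2,H2,dq,1,PA2
This is not yet your expected output, but it shows the mechanics of combining the two files.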
This should get you started:
import csv

# Read all the data in b.txt into a dictionary, keyed by the alias. We'll look this up later.
data = {}
with open("b.txt") as infile:
    for row in csv.DictReader(infile):
        data[row["Aliases"]] = row

# The output has two WWN/Port column pairs, so use a plain csv.writer
# rather than DictWriter (which can't express duplicate column names)
with open("a.txt") as fin, open("output.txt", 'w', newline='') as fout:
    reader = csv.DictReader(fin)
    writer = csv.writer(fout)
    writer.writerow(["Zone", "alias1", "WWN", "Port", "alias2", "WWN", "Port"])
    for row in reader:
        # Look up each alias in the data from b.txt and splice its WWN/Port in
        a1 = data[row["alias1"]]
        a2 = data[row["alias2"]]
        writer.writerow([row["Zone"], row["alias1"], a1["WWN"], a1["Port"],
                         row["alias2"], a2["WWN"], a2["Port"]])
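Run on the sample files, output.txt should then match your expected output:
Zone,alias1,WWN,Port,alias2,WWN,Port
PA1_H1,PA1,da,0,H1,d2,3
PA2_H2,PA2,dq,1,H2,d4,1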
What I'm trying to do is to open two CSV files and print only the lines in which the content of a column in file 1 and file 2 match. I already know that I should end up with 14 results, but instead the first line of the CSV file I'm working with gets printed 14 times. Where did I go wrong?
file1 = open("../dir/file1.csv", "r")
for line in file1:
    file1splitted = line.strip().split(",")
file2 = open("../dir/file2.csv", "r")
for line in file2:
    file2splitted = line.strip().split(",")
for line in file1:
    if file1splitted[0] == file2splitted[2]:
        print(file1splitted[0], file1splitted[1], file2splitted[6], file2splitted[10], file2splitted[12])
file1.close()
file2.close()
You should be using the csv module for reading these files, because splitting on commas is not reliable: it's perfectly valid for a single CSV field to contain a value that itself includes commas. For example, a,"b,c",d is a three-field row, but splitting on commas would give four pieces.
I've added a couple of things to try to make this cleaner and to help you move forward in your learning:
1. I've used the with context manager, which automatically closes a file once you're done reading it. No need for .close().
2. I've packaged the csv-reading code into a function. Now we only need to write that part once, and we can call the function with any file.
3. I've used the csv module to read the file. This returns a nested list of rows, each inner list representing a single row.
4. I've used a list comprehension, which is a neater way of writing a for loop that creates a list. In this case, it's a list of all the items in the first column of file_1.
5. I've converted the list from Point 4 into a set. When we iterate through file_2, we can very quickly check whether a row value has been seen in file_1 (set lookup is O(1), rather than having to iterate through file_1 every single time).
The indices I print are from my own test files; you will need to adapt them to your own use-case.
import csv

def read_csv(file_name):
    with open(file_name) as infile:  # Context manager auto-closes the file at the end
        reader = csv.reader(infile)
        # next(reader)  # Uncomment this line if you want to drop the headers
        return list(reader)

file_1 = read_csv('file_1.csv')
file_2 = read_csv('file_2.csv')

# Make a set of file_1 column 0 with a list comprehension
file_1_vals = set([item[0] for item in file_1])

# Now iterate through file_2 and print the matches
for row in file_2:
    if row[2] in file_1_vals:
        print(row[1])
file1 = open("../dir/file1.csv", "r")
file2 = open("../dir/file2.csv", "r")
# Read file2 into a list once; a file object would be exhausted after the first pass through it
file2lines = file2.readlines()
for line in file1:
    file1splitted = line.strip().split(",")
    for line2 in file2lines:
        file2splitted = line2.strip().split(",")
        if file1splitted[0] == file2splitted[2]:
            print(file1splitted[0], file1splitted[1], file2splitted[6], file2splitted[10], file2splitted[12])
file1.close()
file2.close()
If you provide your csv files, then I can help you more.
Suppose I have a big file, file.txt, with around 300,000 lines of data. I want to split it based on a certain key: the location. See file.txt below:
Line 1: U0001;POUNDS;**CAN**;1234
Line 2: U0001;POUNDS;**USA**;1234
Line 3: U0001;POUNDS;**CAN**;1234
Line 100000: U0001;POUNDS;**CAN**;1234
The locations are limited to 10-15 different nations, and I need to separate the records of each particular country into its own file. How can I do this task in Python?
Thanks for the help.
This will run with very low memory overhead as it writes each line as it reads it.
Algorithm:
open input file
read a line from input file
get country from line
if new country then open file for country
write the line to country's file
loop if more lines
close files
Code:
with open('file.txt', 'r') as infile:
    try:
        outfiles = {}
        for line in infile:
            # The third field holds the country, wrapped in asterisks (e.g. **CAN**)
            country = line.split(';')[2].strip('*')
            if country not in outfiles:
                outfiles[country] = open(country + '.txt', 'w')
            outfiles[country].write(line)
    finally:
        for outfile in outfiles.values():
            outfile.close()
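If you'd rather not keep a file handle open per country, a simpler (though slower, since it reopens a file for every line) sketch is to append line by line:
with open('file.txt') as infile:
    for line in infile:
        country = line.split(';')[2].strip('*')
        # Append mode creates the file on first write and adds to it afterwards
        with open(country + '.txt', 'a') as outfile:
            outfile.write(line)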
with open("file.txt") as f:
content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
text = [x.strip() for x in content]
x = [i.split(";") for i in text]
x.sort(key=lambda x: x[2])
from itertools import groupby
from operator get itemgetter
y = groupby(x, itemgetter(2))
res = [(i[0],[j for j in i[1]]) for i in y]
for country in res:
with open(country[0]+".txt","w") as writeFile:
writeFile.writelines("%s\n" % ';'.join(l) for l in country[1])
will group by your item!
Hope it helps!
Looks like what you have is a csv file. csv stands for comma-separated values, but any file that uses a different delimiter (in this case a semicolon ;) can be treated like a csv file.
We'll use Python's csv module to read the file in, and then write a file for each country.
import csv
from collections import defaultdict

d = defaultdict(list)
with open('file.txt', newline='') as f:
    r = csv.reader(f, delimiter=';')
    for line in r:
        # Group each row under its country code (the third field, asterisks stripped)
        d[line[2].strip('*')].append(line)

for country in d:
    with open('{}.txt'.format(country), 'w', newline='') as outfile:
        w = csv.writer(outfile, delimiter=';')
        for line in d[country]:
            w.writerow(line)
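With the sample data above, this should produce CAN.txt and USA.txt, each holding the original rows for that country.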
# The formatting function for the filename used for saving
outputFileName = "{}.txt".format
# Alternative, with a timestamp in the name:
##import time
##outputFileName = lambda loc: "{}_{}.txt".format(loc, time.asctime())

# Make a dictionary indexed by location; the contained item is the new content of that location's file
sortedByLocation = {}

f = open("file.txt", "r")
# Iterate over each line and look at the column for the location
for l in f.readlines():
    line = l.split(';')
    # The third field (indices begin with 0) is the location abbreviation.
    # Lowercase it: on case-insensitive filesystems "CAN.txt" and "can.txt" are the
    # same file and would overwrite each other, while Python treats the keys as distinct.
    # The asterisks around the code are stripped so they don't appear in the filename.
    location = line[2].lower().strip('*')
    # Get the previous lines for the location and store the new line after them
    tmp = sortedByLocation.get(location, "")
    sortedByLocation[location] = tmp + l.strip() + '\n'
f.close()

# Save a file for each location
for location, text in sortedByLocation.items():
    with open(outputFileName(location), "w") as f:
        f.write(text)
I have a dataset of about 10 CSV files. I want to combine those files row-wise into a single CSV file.
What I tried:
import csv

fout = open("claaassA.csv", "a")
writer = csv.writer(fout)

# first file:
for line in open("a01.ihr.60.ann.csv"):
    print(line)
    writer.writerow(line)

# now the rest:
for num in range(2, 10):
    print(num)
    f = open("a0" + str(num) + ".ihr.60.ann.csv")
    # f.next()  # skip the header
    for line in f:
        print(line)
        writer.writerow(line)
    # f.close()  # not really needed

fout.close()
You definitely need more details in the question (ideally examples of the inputs and expected output).
Given the little information provided, I will assume that you know that all files are valid CSV and that they all have the same number of lines (rows). I'll also assume that memory is not a concern (i.e. they are "small" files that fit together in memory). Furthermore, I assume that line endings are newlines (\n).
If all these assumptions are valid, then you can do something like this:
input_files = ['file1.csv', 'file2.csv', 'file3.csv']
output_file = 'output.csv'

output = None
for infile in input_files:
    with open(infile, 'r') as fh:
        if output:
            # Append this file's columns onto the lines collected so far
            for i, l in enumerate(fh.readlines()):
                output[i] = "{},{}".format(output[i].rstrip('\n'), l)
        else:
            output = fh.readlines()

with open(output_file, 'w') as fh:
    for line in output:
        fh.write(line)
There are probably more efficient ways, but this is a quick and dirty way to achieve what I think you are asking for.
The previous answer implicitly assumes we need to do this in Python. If bash is an option, then you could use the paste command. For example:
paste -d, file1.csv file2.csv file3.csv > output.csv
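For completeness, a rough Python equivalent of that paste invocation (a sketch assuming the files all have the same number of lines) could look like this:
filenames = ['file1.csv', 'file2.csv', 'file3.csv']
files = [open(name) for name in filenames]
with open('output.csv', 'w') as out:
    # zip yields one line from each file per iteration, much like paste does
    for lines in zip(*files):
        out.write(','.join(line.rstrip('\n') for line in lines) + '\n')
for f in files:
    f.close()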
I don't fully understand why you use the csv library. Actually, it's enough to fill the output file with the lines from the given files (if they have the same column names and order).
input_path_list = [
"a01.ihr.60.ann.csv",
"a02.ihr.60.ann.csv",
"a03.ihr.60.ann.csv",
"a04.ihr.60.ann.csv",
"a05.ihr.60.ann.csv",
"a06.ihr.60.ann.csv",
"a07.ihr.60.ann.csv",
"a08.ihr.60.ann.csv",
"a09.ihr.60.ann.csv",
]
output_path = "claaassA.csv"
with open(output_path, "w") as fout:
    header_written = False
    for input_path in input_path_list:
        with open(input_path) as fin:
            header = next(fin)
            # Write the header at the beginning and skip every other file's header
            if not header_written:
                fout.write(header)
                header_written = True
            # Copy all remaining rows
            for line in fin:
                fout.write(line)
I have an original text file with 100 rows and 40 columns of data.
I would like to write an individual text file for each data row of the original text file.
I can only work out how to do it the long way:
import numpy as np

Data = np.loadtxt('Data.txt')
Row1 = Data[0, :]
np.savetxt('Row1.txt', [Row1])
Row2 = Data[1, :]
np.savetxt('Row2.txt', [Row2])
Row3 = Data[2, :]
# etc....
Is there a way of using a loop to make this process quicker/do it all at once so I can avoid doing this 100 times?
I was thinking something along the lines of
with open('Data.txt') as f:
    for line in f:
        line_out = f.readlines()
        with open(line + '.txt', 'w') as fout:
            fout.write(line_out)
This doesn't work but I can't work out what the code should be.
You're on the right track. This should give you files with names corresponding to each line number:
counter = 0
with open("sampleInput.txt", 'r') as f:
    for i in f:
        newFileName = 'newFile_' + str(counter)
        outFile = open(newFileName, 'w')
        outFile.write(i)
        outFile.close()
        counter += 1
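The same idea can be written a little more compactly with enumerate, which handles the counter for you:
with open("sampleInput.txt") as f:
    for counter, line in enumerate(f):
        # enumerate numbers the lines from 0, so the files are newFile_0, newFile_1, ...
        with open('newFile_' + str(counter), 'w') as outFile:
            outFile.write(line)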
Suppose fileNames.txt contains all the words to use as names for multiple .txt files.
f = open('fileNames.txt', 'r')
for line in f:
    if '\n' in line:
        line = line[:-1]  # strip the \n at the end of the line
    new = open("%s.txt" % line, "w+")
    new.write("File with name %s" % line)  # content for each file
    new.close()
f.close()
A file will not be created properly if \n is still present in the name string, which is why the trailing newline is removed first.
If fileNames.txt contains the lines frog, four, and legs, then three files named frog.txt, four.txt, and legs.txt will be created.
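A slightly more robust version of the same loop uses strip(), which removes the trailing newline and any surrounding whitespace in one step:
with open('fileNames.txt') as f:
    for line in f:
        name = line.strip()  # handles the trailing \n and stray spaces
        if not name:
            continue  # skip blank lines
        with open("%s.txt" % name, "w") as new:
            new.write("File with name %s" % name)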
I'm a new Python user.
I have a txt file that will be something like:
3,1,3,2,3
3,2,2,3,2
2,1,3,3,2,2
1,2,2,3,3,1
3,2,1,2,2,3
but it may have more or fewer lines.
I want to import each line as a list.
I know you can do it as such:
filename = 'MyFile.txt'
fin = open(filename, 'r')
L1list = fin.readline()
L2list = fin.readline()
L3list = fin.readline()
but since I don't know how many lines I will have, is there another way to create individual lists?
Do not create separate lists; create a list of lists:
results = []
with open('inputfile.txt') as inputfile:
    for line in inputfile:
        results.append(line.strip().split(','))
or better still, use the csv module:
import csv

results = []
with open('inputfile.txt', newline='') as inputfile:
    for row in csv.reader(inputfile):
        results.append(row)
Lists or dictionaries are far superior structures for keeping track of an arbitrary number of things read from a file.
Note that either loop also lets you process the rows of data one at a time without having to read the whole file into memory; instead of calling results.append(), just process that row right there.
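For instance, a sketch of that streaming style (printing the row stands in for whatever processing you actually need):
import csv

with open('inputfile.txt', newline='') as inputfile:
    for row in csv.reader(inputfile):
        # Process each row as it is read instead of storing it
        print(row)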
Just for completeness' sake, here's the compact version that reads a CSV file into a list in one go:
import csv

with open('inputfile.txt', newline='') as inputfile:
    results = list(csv.reader(inputfile))
Create a list of lists:
with open("/path/to/file") as file:
lines = []
for line in file:
# The rstrip method gets rid of the "\n" at the end of each line
lines.append(line.rstrip().split(","))
with open('path/to/file') as infile:
    answer = [line.strip().split(',') for line in infile]
If you want the numbers as ints:
with open('path/to/file') as infile:
    answer = [[int(i) for i in line.strip().split(',')] for line in infile]
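With the sample file at the top of this question, the int version would give answer[0] == [3, 1, 3, 2, 3].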
lines = []
with open('file') as file:
    for line in file:
        lines.append(line)