Create multiple text files from data in an original text file - python

I have an original text file with 100 rows and 40 columns of data.
I would like to write an individual text file for each data row of the original text file.
I can only work out how to do it the long way:
Data = np.loadtxt('Data.txt')
Row1 = Data[0,:]
np.savetxt('Row1.txt', [Row1])
Row2 = Data[1,:]
np.savetxt('Row2.txt', [Row2])
Row3 = Data[2,:]  # etc.
Is there a way of using a loop to make this process quicker/do it all at once so I can avoid doing this 100 times?
I was thinking something along the lines of
with open('Data.txt') as f:
    for line in f:
        line_out = f.readlines()
        with open(line + '.txt', 'w') as fout:
            fout.write(line_out)
This doesn't work but I can't work out what the code should be.

You're on the right track. This should give you files with names corresponding to each line number:
counter = 0
with open("sampleInput.txt", 'r') as f:
    for i in f:
        newFileName = 'newFile_' + str(counter)
        with open(newFileName, 'w') as outFile:
            outFile.write(i)
        counter += 1
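Since the data in the question is already loaded with numpy, the same idea can stay entirely in numpy. This is a minimal sketch, assuming numpy is imported as np and that the output names should follow the Row1.txt, Row2.txt, ... pattern from the question:

import numpy as np

data = np.loadtxt('Data.txt')
# write one file per row, numbered from 1 so the names match Row1.txt, Row2.txt, ...
for i, row in enumerate(data, start=1):
    np.savetxt('Row{}.txt'.format(i), [row])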

Suppose fileNames.txt contains all the names to use for creating the multiple .txt files, one name per line.
f = open('fileNames.txt', 'r+')
for line in f:
    if '\n' in line:
        line = line[:-1]  # strip the trailing \n from the line
    new = open("%s.txt" % line, "w+")
    new.write("File with name %s" % line)  # content for each file
    new.close()
Files will not be created correctly if a \n is still present in the name string, hence the check above.
If fileNames.txt contains the lines frog, four and legs (one name per line), then three files named frog.txt, four.txt and legs.txt will be created.


Creating files with names based on entries in txt file

i = 1
with open("randomStuff\\test\\brief.txt") as textFile:
    lines = [line.split('\n') for line in textFile]
    for row in lines:
        for elem in row:
            with open(elem + ".txt", "w") as newLetter:
                newLetter.writelines(elem)
                i += 1
I have a txt file with names. I want to create files with those names like:
firstnameLastname.txt
The names appear in the files too.
At the moment it is working fine, but it also creates one empty file called ".txt".
Can someone tell me why? If I'm right, the problem should be in the loops.
Add an if statement to prevent creating files on empty lines
Edit
i = 1
with open("randomStuff\\test\\brief.txt") as textFile:
    lines = [line.split('\n') for line in textFile]
    for row in lines:
        for elem in row:
            if elem == "":
                continue
            with open(elem + ".txt", "w") as newLetter:
                newLetter.writelines(elem)
                i += 1
continue jumps to the next loop cycle without executing the code below it.
I don't know why you have so many loops:
from pathlib import Path

text_file_content = Path("randomStuff/test/brief.txt").read_text().splitlines()
for line in text_file_content:
    if line:  # in case you have a newline at the end of your file, which you probably should
        with open(f"{line}.txt", "w") as new_letter:
            new_letter.writelines(line)

How to merge txt files using python?

I have two txt files. I want to merge those files using python.
I just began to study python and need some help. I tried searching google to resolve this but I can't find a solution.
So please help me.
Below are my two txt files.
a.txt has this data.
Zone,alias1,alias2
PA1_H1,PA1,H1
PA2_H2,PA2,H2
b.txt has this data.
WWN,Port,Aliases
da,0,PA1
dq,1,PA2
d2,3,H1
d4,1,H2
Expected Output
Zone,alias1,WWN,Port,alias2,WWN,Port
PA1_H1,PA1,da,0,H1,d2,3
PA2_H2,PA2,dq,1,H2,d4,1
I tried the script below but I can't get them to merge.
row = []
for line in open("mod_alias.txt"):
    line = line.split(',')[2]
    row.append(line)
strings = row

for line in open("mod_all_cfgshow.txt"):
    if any(s in line for s in strings):
        field1, field2, field3 = line.split(',')
        print(field1, field2, field3)
How can I merge these files?
Could you show me an example?
Here's some code to get you started. This code will show you how to open both files and combine them. Then, all you'll need to do is modify the code to merge the files using whichever specific rules you'd like.
# Open the files, read the data into lists, and strip whitespace
with open("b.txt") as bFile:
    bContent = bFile.readlines()
bContent = [line.strip() for line in bContent]
with open('a.txt') as aFile:
    aContent = aFile.readlines()
aContent = [line.strip() for line in aContent]

# Create a file to store the merged text
m = open('merged.txt', 'w')

# Cycle through the text read from the files, merge it, and write it to the output file
for aLine, bLine in zip(aContent, bContent):
    mergedString = aLine + ',' + bLine
    print(mergedString, file=m)
m.close()
This should get you started
import csv

# read all the data in b.txt into a dictionary, keyed by the alias; we'll look this up later
data = {}
with open("b.txt") as infile:
    for row in csv.DictReader(infile):
        alias = row["Aliases"]
        data[alias] = row

with open("a.txt") as fin, open("output.txt", 'w', newline='') as fout:
    reader = csv.DictReader(fin)
    writer = csv.writer(fout)
    writer.writerow(["Zone", "alias1", "WWN", "Port", "alias2", "WWN", "Port"])
    for row in reader:
        a1 = data[row["alias1"]]  # look up each alias in the data from b.txt
        a2 = data[row["alias2"]]
        writer.writerow([row["Zone"], row["alias1"], a1["WWN"], a1["Port"],
                         row["alias2"], a2["WWN"], a2["Port"]])

Split a large text file to small ones based on location

Suppose I have a big file, file.txt, with around 300,000 lines of data. I want to split it based on a certain key: the location. See file.txt below:
Line 1: U0001;POUNDS;**CAN**;1234
Line 2: U0001;POUNDS;**USA**;1234
Line 3: U0001;POUNDS;**CAN**;1234
Line 100000: U0001;POUNDS;**CAN**;1234
The locations are limited to 10-15 different nations, and I need to collect all the records for a particular country in one particular file. How can I do this task in Python?
Thanks for the help.
This will run with very low memory overhead as it writes each line as it reads it.
Algorithm:
open input file
read a line from input file
get country from line
if new country then open file for country
write the line to country's file
loop if more lines
close files
Code:
with open('file.txt', 'r') as infile:
    try:
        outfiles = {}
        for line in infile:
            country = line.split(';')[2].strip('*')
            if country not in outfiles:
                outfiles[country] = open(country + '.txt', 'w')
            outfiles[country].write(line)
    finally:
        for outfile in outfiles.values():
            outfile.close()
with open("file.txt") as f:
content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
text = [x.strip() for x in content]
x = [i.split(";") for i in text]
x.sort(key=lambda x: x[2])
from itertools import groupby
from operator get itemgetter
y = groupby(x, itemgetter(2))
res = [(i[0],[j for j in i[1]]) for i in y]
for country in res:
with open(country[0]+".txt","w") as writeFile:
writeFile.writelines("%s\n" % ';'.join(l) for l in country[1])
This will group the lines by country!
Hope it helps!
Looks like what you have is a csv file. csv stands for comma-separated values, but any file that uses a different delimiter (in this case a semicolon ;) can be treated like a csv file.
We'll use the python module csv to read the file in, and then write a file for each country
import csv
from collections import defaultdict

d = defaultdict(list)
with open('file.txt', newline='') as f:
    r = csv.reader(f, delimiter=';')
    for line in r:
        d[line[2]].append(line)

for country in d:
    with open('{}.txt'.format(country), 'w', newline='') as outfile:
        w = csv.writer(outfile, delimiter=';')
        for line in d[country]:
            w.writerow(line)
# the formatting function for the filename used for saving
outputFileName = "{}.txt".format
# alternative:
##import time
##outputFileName = lambda loc: "{}_{}.txt".format(loc, time.asctime())

# make a dictionary indexed by location; the contained item is the new content of the file for that location
sortedByLocation = {}

f = open("file.txt", "r")
# iterate over each line and look at the column for the location
for l in f.readlines():
    line = l.split(';')
    # the third field (indices begin with 0) is the location abbreviation;
    # lowercase it, because on some filesystems a file with uppercase characters gets
    # overwritten by the one with only lowercase characters, while Python treats them as different
    location = line[2].lower().strip()
    # get the previous lines for this location and store the extended text back
    tmp = sortedByLocation.get(location, "")
    sortedByLocation[location] = tmp + l.strip() + '\n'
f.close()

# save a file for each location
for location, text in sortedByLocation.items():
    with open(outputFileName(location), "w") as f:
        f.write(text)

"Move" some parts of the file to another file

Let's say I have a file with 48,222 lines. I then give an index value, let's say 21,000.
Is there any way in Python to "move" the contents of the file starting from index 21,000 so that I end up with two files: the original one and a new one, where the original one now has 21,000 lines and the new one 27,222 lines?
I read this post, which uses partition and pretty much describes what I want:
with open("inputfile") as f:
contents1, sentinel, contents2 = f.read().partition("Sentinel text\n")
with open("outputfile1", "w") as f:
f.write(contents1)
with open("outputfile2", "w") as f:
f.write(contents2)
Except that (1) it uses "Sentinel text" as the separator, and (2) it creates two new files and requires me to delete the old file. As of now, the way I do it is like this:
for r in result.keys():  # the filenames are in my dictionary, don't bother with that
    f = open(r)
    lines = f.readlines()
    f.close()
    with open("outputfile1.txt", "w") as fn:
        for line in lines[0:21000]:
            # write each line
    with open("outputfile2.txt", "w") as fn:
        for line in lines[21000:]:
            # write each line
Which is quite manual work. Is there a built-in or more efficient way?
You can also use writelines() and dump the sliced list of lines from 0 to 20999 into one file and another sliced list from 21000 to the end into another file.
with open("inputfile") as f:
content = f.readlines()
content1 = content[:21000]
content2 = content[21000:]
with open("outputfile1.txt", "w") as fn1:
fn1.writelines(content1)
with open('outputfile2.txt','w') as fn2:
fn2.writelines(content2)
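If the input file is very large, a streaming variant avoids holding every line in memory at once. This is a minimal sketch, assuming the same file names and the 21,000-line split point from the example above:

from itertools import islice

split_at = 21000
with open("inputfile") as f, \
     open("outputfile1.txt", "w") as out1, \
     open("outputfile2.txt", "w") as out2:
    out1.writelines(islice(f, split_at))  # copy the first 21,000 lines
    out2.writelines(f)                    # copy whatever remains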

Copy columns from multiple text files in Python

I have a large number of text files containing data arranged into a fixed number of rows and columns, the columns being separated by spaces (like a .csv but using spaces as the delimiter). I want to extract a given column from each of these files and write it into a new text file.
So far I have tried:
results_combined = open('ResultsCombined.txt', 'wb')

def combine_results():
    for num in range(2, 10):
        f = open("result_0." + str(num) + "_.txt", 'rb')  # all the text files have similar filename styles
        lines = f.readlines()   # read in the data
        no_lines = len(lines)   # get the number of lines
        for i in range(0, no_lines):
            column = lines[i].strip().split(" ")
            results_combined.write(column[5] + " " + '\r\n')
        f.close()

if __name__ == "__main__":
    combine_results()
This produces a text file containing the data I want from the separate files, but as a single column (i.e. I've managed to 'stack' the columns on top of each other, rather than have them all side by side as separate columns). I feel I've missed something obvious.
In another attempt, I managed to write all the separate files to a single file, but without picking out the columns that I want.
import glob

files = [open(f) for f in glob.glob("result_*.txt")]
fout = open("ResultsCombined.txt", 'wb')

for row in range(0, 488):
    for f in files:
        fout.write(f.readline().strip())
        fout.write(' ')
    fout.write('\n')

fout.close()
What I basically want is to copy column 5 from each file (it is always the same column) and write them all to a single file.
If you don't know the maximum number of rows in the files and if the files can fit into memory, then the following solution would work:
import glob

files = [open(f) for f in glob.glob("*.txt")]

# Given a file, read the 6th column from each line
def readcol5(f):
    return [line.split(' ')[5] for line in f]

filecols = [readcol5(f) for f in files]
maxrows = len(max(filecols, key=len))

# Given a list, make sure it has maxrows number of elements
def extendmin(arr):
    diff = maxrows - len(arr)
    arr.extend([''] * diff)
    return arr

filecols = [extendmin(col) for col in filecols]
lines = zip(*filecols)
lines = [','.join(x) for x in lines]
lines = '\n'.join(lines)

fout = open('output.csv', 'w')
fout.write(lines)
fout.close()
Or this option (following your second approach):
import glob

files = [open(f) for f in glob.glob("result_*.txt")]
fout = open("ResultsCombined.txt", 'w')

for row in range(0, 488):
    for f in files:
        fout.write(f.readline().strip().split(' ')[5])
        fout.write(' ')
    fout.write('\n')

fout.close()
... which uses a fixed number of rows per file but will work for very large numbers of rows, because it does not store the intermediate values in memory. For moderate numbers of rows, I'd expect the first solution to run more quickly.
Why not read all the entries from each 5th column into a list and after reading in all the files, write them all to the output file?
data = [
    [],  # entries from first file
    [],  # entries from second file
    ...
]

for i in range(number_of_rows):
    outputline = []
    for vals in data:
        outputline.append(vals[i])
    outfile.write(" ".join(outputline))
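A runnable sketch of that idea, assuming the files match the result_*.txt pattern used above, are space-delimited, and all have the same number of rows:

import glob

columns = []
for name in sorted(glob.glob("result_*.txt")):
    with open(name) as f:
        # collect the 6th field (index 5) from every line of this file
        columns.append([line.split()[5] for line in f])

with open("ResultsCombined.txt", "w") as out:
    for row in zip(*columns):  # one output row per input line, columns side by side
        out.write(" ".join(row) + "\n")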
