I am trying to figure out a way to split a big txt file with columns of data into smaller files for uploading purposes. The big file has 4000 lines and I'm wondering if there is a way to divide it into four parts, such as
file 1 (lines 1-1000)
file 2 (lines 1001-2000)
file 3 (lines 2001-3000)
file 4 (lines 3001-4000)
I appreciate the help.
This works (you could use a for loop rather than a while loop, but it makes little difference, and the while form does not assume in advance how many files will be necessary):
with open('longFile.txt', 'r') as f:
    lines = f.readlines()

threshold = 1000
fileID = 0
while fileID < len(lines) / float(threshold):
    with open('fileNo' + str(fileID) + '.txt', 'w') as currentFile:
        for currentLine in lines[threshold * fileID:threshold * (fileID + 1)]:
            currentFile.write(currentLine)
    fileID += 1
Hope this helps. Try to use open in a with block, as suggested in the Python docs.
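For completeness, a minimal sketch of the for-loop variant mentioned above (same file names as before; math.ceil computes the number of output files):

import math

with open('longFile.txt', 'r') as f:
    lines = f.readlines()

threshold = 1000
for fileID in range(math.ceil(len(lines) / threshold)):
    with open('fileNo' + str(fileID) + '.txt', 'w') as currentFile:
        currentFile.writelines(lines[threshold * fileID:threshold * (fileID + 1)])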
Give this a try:
fhand = open(filename, 'r')
all_lines = fhand.readlines()
for x in range(4):
    new_file = open(new_file_names[x], 'w')
    new_file.writelines(all_lines[x * 1000:(x + 1) * 1000])  # slice with ':', and writelines for a list of lines
    new_file.close()
fhand.close()
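Note that new_file_names is not defined in the snippet; it is assumed to be a list of four output paths you supply yourself, e.g.:

new_file_names = ['part1.txt', 'part2.txt', 'part3.txt', 'part4.txt']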
I like Aleksander Lidtke's answer, but with a for loop and a pop() twist for fun. I also like to keep some of the file's original name when I do this, since I usually split multiple files, so I reuse the input name (via split(".")) in the output names.
with open('Data.txt', 'r') as f:
    lines = f.readlines()

limit = 1000
for o in range(len(lines)):
    if lines != []:
        with open(f.name.split(".")[0] + "_" + str(o) + '.txt', 'w') as NewFile:
            for i in range(limit):
                if lines != []:
                    NewFile.write(lines.pop(0))
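One caveat: lines.pop(0) shifts every remaining element, so this becomes quadratic on large inputs. A hedged alternative that streams the file in fixed-size chunks with itertools.islice (same file name as above; memory use stays bounded):

from itertools import islice

limit = 1000
with open('Data.txt') as f:
    part = 0
    while True:
        chunk = list(islice(f, limit))  # read at most `limit` lines
        if not chunk:
            break
        with open('Data_' + str(part) + '.txt', 'w') as new_file:
            new_file.writelines(chunk)
        part += 1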
Fastest way to read and delete N lines in Python.
First I read the file something like this (I think this is the best way to read large files: Source):
N = 50
with open("ahref.txt", "r+") as f:
    link_list = [next(f).removesuffix("\n") for x in range(N)]
After that I run my code:
# My code here
After that I want to delete the first N lines (source in the comment below).
# Source: https://stackoverflow.com/questions/4710067/how-to-delete-a-specific-line-in-a-file/28057753#28057753
with open("target.txt", "r+") as f:
    d = f.readlines()
    f.seek(0)
    for i in d:
        if i != "line you want to remove...":
            f.write(i)
    f.truncate()
This code doesn't work for me, because at that point I have only read the first N lines, not the whole file.
I can remove lines:
with open("xml\\ahref.txt", "r+") as f:
N = 5
all_lines = f.readlines()
f.seek(0)
f.truncate()
f.writelines(all_lines[N:])
But there is a problem with that: I have to read all the lines and then write all of them back, which is not fast. (There are many ways to do this, but they all seem to need to read every line.)
What is the fastest way in terms of performance? The file is huge.
The fastest way is not to read the entire file into memory, and to use a temporary output file that you can then move over the original file if required.
Try:
import os

N = 50
mode = "r+"
if not os.path.isfile('output'):
    mode = "w+"
with open('input', 'r') as fin, open('output', mode) as fout:
    for index, line in enumerate(fout):  # count lines already written to the output file
        N += 1
    for index, line in enumerate(fin):
        if index > N:
            fout.write(line)
    # I haven't tested this; you may need index > N or index >= N
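A minimal sketch of the temporary-file idea described above (file names are placeholders; os.replace swaps the temp file over the original atomically on the same filesystem):

import os

N = 50
with open('input', 'r') as fin, open('input.tmp', 'w') as fout:
    for index, line in enumerate(fin):
        if index >= N:  # drop the first N lines, keep the rest
            fout.write(line)
os.replace('input.tmp', 'input')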
I searched around a bit, but I couldn't find a solution that fits my needs.
I'm new to Python, so I'm sorry if what I'm asking is pretty obvious.
I have a .txt file (for simplicity I will call it inputfile.txt) with a list of folder\file names like this:
camisos\CROWDER_IMAG_1.mov
camisos\KS_HIGHENERGY.mov
camisos\KS_LOWENERGY.mov
What I need is to split the first word (the one before the \) and write it to a txt file (for simplicity I will call it outputfile.txt).
Then take the second (the one after the \) and write it in another txt file.
This is what I did so far:
with open("inputfile.txt", "r") as f:
lines = f.readlines()
with open("outputfile.txt", "w") as new_f:
for line in lines:
text = input()
print(text.split()[0])
In my mind this should print only the first word into the new txt, but I just got an empty txt file, without any error.
Any advice is much appreciated, thanks in advance for any help you could give me.
You can read the file into a list of strings and split each string to create 2 separate lists.
with open("inputfile.txt", "r") as f:
lines = f.readlines()
X = []
Y = []
for line in lines:
X.append(line.split('\\')[0] + '\n')
Y.append(line.split('\\')[1])
with open("outputfile1.txt", "w") as f1:
f1.writelines(X)
with open("outputfile2.txt", "w") as f2:
f2.writelines(Y)
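A one-pass variant of the same idea, in case it's useful; it assumes every line contains a backslash (str.partition keeps the trailing newline on the right-hand piece):

with open("inputfile.txt") as f, \
        open("outputfile1.txt", "w") as f1, \
        open("outputfile2.txt", "w") as f2:
    for line in f:
        folder, _, name = line.partition('\\')
        f1.write(folder + '\n')
        f2.write(name)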
I have a dataset of about 10 CSV files. I want to combine those files row-wise into a single CSV file.
What I tried:
import csv

fout = open("claaassA.csv", "a")
# first file:
writer = csv.writer(fout)
for line in open("a01.ihr.60.ann.csv"):
    print line
    writer.writerow(line)
# now the rest:
for num in range(2, 10):
    print num
    f = open("a0" + str(num) + ".ihr.60.ann.csv")
    # f.next()  # skip the header
    for line in f:
        print line
        writer.writerow(line)
    # f.close()  # not really needed
fout.close()
Definitely need more details in the question (ideally examples of the inputs and the expected output).
Given the little information provided, I will assume that you know that all files are valid CSV and that they all have the same number of lines (rows). I'll also assume that memory is not a concern (i.e. they are "small" files that fit together in memory). Furthermore, I assume that the line endings are newlines (\n).
If all these assumptions are valid, then you can do something like this:
input_files = ['file1.csv', 'file2.csv', 'file3.csv']
output_file = 'output.csv'

output = None
for infile in input_files:
    with open(infile, 'r') as fh:
        if output:
            for i, l in enumerate(fh.readlines()):
                output[i] = "{},{}".format(output[i].rstrip('\n'), l)
        else:
            output = fh.readlines()

with open(output_file, 'w') as fh:
    for line in output:
        fh.write(line)
There are probably more efficient ways, but this is a quick and dirty way to achieve what I think you are asking for.
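For instance, a slightly tidier sketch using the csv module and zip, under the same assumptions (the file names are placeholders):

import csv

with open('file1.csv', newline='') as f1, \
        open('file2.csv', newline='') as f2, \
        open('output.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for row1, row2 in zip(csv.reader(f1), csv.reader(f2)):
        writer.writerow(row1 + row2)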
The previous answer implicitly assumes we need to do this in Python. If Bash is an option, then you could use the paste command. For example:
paste -d, file1.csv file2.csv file3.csv > output.csv
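(The -d, flag tells paste to join corresponding lines with a comma instead of the default tab.)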
I don't fully understand why you use the csv library. It's actually enough to fill the output file with the lines from the given files (if they have the same column names and order).
input_path_list = [
    "a01.ihr.60.ann.csv",
    "a02.ihr.60.ann.csv",
    "a03.ihr.60.ann.csv",
    "a04.ihr.60.ann.csv",
    "a05.ihr.60.ann.csv",
    "a06.ihr.60.ann.csv",
    "a07.ihr.60.ann.csv",
    "a08.ihr.60.ann.csv",
    "a09.ihr.60.ann.csv",
]
output_path = "claaassA.csv"

with open(output_path, "w") as fout:
    header_written = False
    for input_path in input_path_list:
        with open(input_path) as fin:
            header = next(fin)
            # write the header once, at the beginning, and skip the other headers
            if not header_written:
                fout.write(header)
                header_written = True
            # copy all remaining rows
            for line in fin:
                fout.write(line)
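If the inputs follow a naming pattern, the list itself could be built with glob instead of being typed out; the pattern below is an assumption based on the names above:

import glob

input_path_list = sorted(glob.glob("a0*.ihr.60.ann.csv"))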
I have a plain text file with a sequence of numbers, one on each line. I need to import those values into a list. I'm currently learning Python and I'm not sure which is a fast, or even "standard", way of doing this (also, I come from R, where the scan and readLines functions make this task a breeze).
The file looks like this (note: this isn't a CSV file; the commas are decimal points):
204,00
10,00
10,00
10,00
10,00
11,00
70,00
276,00
58,00
...
Since it uses commas instead of '.' for decimal points, I guess the task's a little harder, but it should be more or less the same, right?
This is my current solution, which I find quite cumbersome:
f = open("some_file", "r")
data = f.read().replace('\n', '|')
data = data[0:(len(data) - 2)].replace(',', '.')
data = data.split('|')
x = range(len(data))
for i in range(len(data)):
x[i] = float(data[i])
Thanks in advance.
UPDATE
I didn't realize the comma was the decimal separator. If the locale is set right, something like this should work:
lines = [locale.atof(line.strip()) for line in open(filename)]
if not, you could do
lines = [float(line.strip().replace(',','.')) for line in open(filename)]
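Either way, for the locale route, locale has to be imported and a locale that uses comma decimals must be set first. A minimal sketch; the locale name is an assumption, use one installed on your system:

import locale

locale.setlocale(locale.LC_NUMERIC, 'de_DE.UTF-8')  # assumed locale name; any comma-decimal locale works
lines = [locale.atof(line.strip()) for line in open(filename)]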
My original answer, before the update, simply read the lines in:

lines = [line.strip() for line in open(filename)]

and, if you want the data as numbers:

lines = [map(float, line.strip().split(',')) for line in open(filename)]

(edited as per the first two comments below)
bsoist's answer is good if locale is set correctly. If not, you can simply read the entire file in, split on the line breaks (\n), then use a list comprehension for the replacements.
with open('some_file.txt', 'r') as datafile:
    data = datafile.read()

# the `if value` guard skips the empty string left by a trailing newline
x = [float(value.replace(",", ".")) for value in data.split('\n') if value]
For a simpler way, you could just do:

with open('File.txt', 'r') as File:
    Read = File.readlines()  # note: readlines(), not readLines()

for A in Read:
    print(A)
The "with open()" will open the file and quit when it's finished reading. This is good practice IIRC.
Then the For loop will just loop over Read and print out the lines.
I am writing a Python script in order to write a tex file, but I need to use some information from another file. That file has a menu name on each line; I use split to get a list for each line of my "menu".
For example, I need to write a section with the second element of each of my lists, but after running I got nothing. What could I do?
This is roughly what I am doing:
texfile = open(outputtex.tex', 'w')
infile = open(txtfile.txt, 'r')
for line in infile.readlines():
    linesplit = line.split('^')
    for i in range(1, len(infile.readlines())):
        texfile.write('\section{}\n'.format(linesplit[1]))
        texfile.write('\\begin{figure*}[h!]\n')
        texfile.write('\centering\n')
        texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % i)
        texfile.write('\end{figure*}\n')
        texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
By the way, in the includegraphics line, I need to increase the number after pg_ from "0001" to "25050". Any clues?
I really appreciate your help.
I don't quite follow your question, but I see several errors in your code. Most importantly:
for line in infile.readlines():
    ...
    ...
    for i in range(1, len(infile.readlines())):
Once you read a file, it's gone. (You can get it back, but in this case there's no point.) That means the second call to readlines yields nothing, so len(infile.readlines()) == 0. Assuming what you've written here really is what you want to do (i.e. write file_len * (file_len - 1) + 1 lines?), then perhaps you should save the file's contents to a list. Also, you didn't put quotes around your filenames, and your indentation is strange. Try this:
with open('txtfile.txt', 'r') as infile:  # `with` automatically closes infile
    in_lines = infile.readlines()
in_len = len(in_lines)

texfile = open('outputtex.tex', 'w')
for line in in_lines:
    linesplit = line.split('^')
    for i in range(1, in_len):
        texfile.write('\section{}\n'.format(linesplit[1]))
        texfile.write('\\begin{figure*}[h!]\n')
        texfile.write('\centering\n')
        texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % i)
        texfile.write('\end{figure*}\n')
        texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
Perhaps you don't actually want nested loops?
infile = open('txtfile.txt', 'r')
texfile = open('outputtex.tex', 'w')
for line_number, line in enumerate(infile):
    linesplit = line.split('^')
    texfile.write('\section{{{0}}}\n'.format(linesplit[1]))
    texfile.write('\\begin{figure*}[h!]\n')
    texfile.write('\centering\n')
    texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % line_number)
    texfile.write('\end{figure*}\n')
    texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
infile.close()
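On the numbering question (going from pg_0001 to pg_25050): a hard-coded 'pg_000%i' prefix breaks once the counter passes 9. A zero-padded format spec keeps four digits automatically; a sketch, assuming the loop above is otherwise unchanged:

for line_number, line in enumerate(infile, start=1):  # count from 1
    ...
    texfile.write('\\includegraphics[scale=0.95]{{pg_{:04d}.pdf}}\n'.format(line_number))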