File data binding with column names

File data binding with column names - python

I have files with hundreds and thousands rows of data but they are without any column.
I am trying to go to every file and make them row by row and store them in list after that I want to assign values by columns. But here I am confused what to do because values are around 60 in every row and some extra columns with value assigned and they should be added in every row.
Code so for:
import re
import glob
filenames = glob.glob("/home/ashfaque/Desktop/filetocsvsample/inputfiles/*.txt")
columns = []
with open("/home/ashfaque/Downloads/coulmn names.txt",encoding = "ISO-8859-1") as f:
file_data = f.read()
lines = file_data.splitlines()
for l in lines:
columns.append(l.rstrip())
total = {}
for name in filenames:
modified_data = []
with open(name,encoding = "ISO-8859-1") as f:
file_data = f.read()
lines = file_data.splitlines()
for l in lines:
if len(l) >= 1:
modified_data.append(re.split(': |,',l))
rows = []
i = len(modified_data)
x = 0
while i > 60:
r = lines[x:x+59]
x = x + 60
i = i - 60
rows.append(r)
z = len(modified_data)
while z >= 60:
z = z - 60
if z > 1:
last_columns = modified_data[-z:]
x = []
for l in last_columns:
if len(l) > 1:
del l[0]
x.append(l)
elif len(l) == 1:
x.append(l)
for row in rows:
for vl in x:
row.append(vl)
for r in rows:
for i in range(0,len(r)):
if len(r) >= 60:
total.setdefault(columns[i],[]).append(r[i])
In other script I have separated both row with 60 values and last 5 to 15 columns which should be added with row are separate but again I am confused how to bind all the data.
Data Should look like this after binding.
outputdata.xlsx
Data Input file:
inputdata.txt
What Am I missing here? any tool ?

I believe that your issue can be resolved by taking the input file and turning it into a CSV file which you can then import into whatever program you like.
I wrote a small generator that would read a file a line at a time and return a row after a certain number of lines, in this case 60. In that generator, you can make whatever modifications to the data as you need.
Then with each generated row, I write it directly to the csv. This should keep the memory requirements for this process pretty low.
I didn't understand what you were doing with the regex split, but it would be simple enough to add it to the generator.
import csv
OUTPUT_FILE = "/home/ashfaque/Desktop/File handling/outputfile.csv"
INPUT_FILE = "/home/ashfaque/Desktop/File handling/inputfile.txt"
# This is a generator that will pull only num number of items into
# memory at a time, before it yields the row.
def get_rows(path, num):
row = []
with open(path, "r", encoding="ISO-8859-1") as f:
for n, l in enumerate(f):
# apply whatever transformations that you need to here.
row.append(l.rstrip())
if (n + 1) % num == 0:
# if rows need padding then do it here.
yield row
row = []
with open(OUTPUT_FILE, "w") as output:
csv_writer = csv.writer(output)
for r in get_rows(INPUT_FILE, 60):
csv_writer.writerow(r)

Related

how to create a list and then print it in ascending order

def list():
list_name = []
list_name_second = []
with open('CoinCount.txt', 'r', encoding='utf-8') as csvfile:
num_lines = 0
for line in csvfile:
num_lines = num_lines + 1
i = 0
while i < num_lines:
for x in volunteers[i].name:
if x not in list_name: # l
f = 0
while f < num_lines:
addition = []
if volunteers[f].true_count == "Y":
addition.append(1)
else:
addition.append(0)
f = f + 1
if f == num_lines:
decimal = sum(addition) / len(addition)
d = decimal * 100
percentage = float("{0:.2f}".format(d))
list_name_second.append({'Name': x , 'percentage': str(percentage)})
list_name.append(x)
i = i + 1
if i == num_lines:
def sort_percentages(list_name_second):
return list_name_second.get('percentage')
print(list_name_second, end='\n\n')
above is a segment of my code, it essentially means:
If the string in nth line of names hasn't been listed already, find the percentage of accurate coins counted and then add that all to a list, then print that list.
the issue is that when I output this, the program is stuck on a while loop continuously on addition.append(1), I'm not sure why so please can you (using the code displayed) let me know how to update the code to make it run as intended, also if it helps, the first two lines of code within the txt file read:
Abena,5p,325.00,Y
Malcolm,1p,3356.00,N
this doesn't matter much but just incase you need it, I suspect that the reason it is stuck looping addition.append(1) is because the first line has a "Y" as its true_count

Writing data into CSV file

I have a code that is basically doing this:
row1 = []
count = 0
writer = csv.writer(myFile)
row = []
for j in range(0, 2):
for i in range(0, 4):
row1.append(i+count)
count = count + 1
print(row1)
writer.writerows(row1)
row1[:] = []
I'm creating some lists and I want to map each value to a column, like this
This error showed it up iterable expected. How can I do that?

#roganjosh is right, what you need to write one row at a time is writerow:
import csv
myFile = open("aaa.csv", "w", newline="")
row1 = []
count = 0
writer = csv.writer(myFile)
row = []
for j in range(0, 2):
for i in range(0, 4):
row1.append(i+count)
count = count + 1
print(row1)
writer.writerow(row1)
row1[:] = []
myFile.close() # Don't forget to close your file

You probably need to call the method .writerow() instead of the plural .writerows(), because you write a single line to the file on each call.
The other method is to write multiple lines at once to the file.
Or you could also restructure your code like this to write all the lines at the end:
import csv
row_list = []
for j in range(2):
row = [j+i for i in range(4)]
row_list.append(row)
# row_list = [
# [j+i for i in range(4)]
# for j in range(2)]
with open('filename.csv', 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(row_list)

It's much simpler and easier to manipulate tabular data in pandas -- is there a reason you don't want to use pandas?
import pandas as pd
df = pd.DataFrame()
for i in range(4):
df[i] = range(i, i+4)
# Any other data wrangling
df.to_csv("file.csv")

Splitting a large json file into multiple json files using python [duplicate]

I have a text file say really_big_file.txt that contains:
line 1
line 2
line 3
line 4
...
line 99999
line 100000
I would like to write a Python script that divides really_big_file.txt into smaller files with 300 lines each. For example, small_file_300.txt to have lines 1-300, small_file_600 to have lines 301-600, and so on until there are enough small files made to contain all the lines from the big file.
I would appreciate any suggestions on the easiest way to accomplish this using Python

lines_per_file = 300
smallfile = None
with open('really_big_file.txt') as bigfile:
for lineno, line in enumerate(bigfile):
if lineno % lines_per_file == 0:
if smallfile:
smallfile.close()
small_filename = 'small_file_{}.txt'.format(lineno + lines_per_file)
smallfile = open(small_filename, "w")
smallfile.write(line)
if smallfile:
smallfile.close()

Using itertools grouper recipe:
from itertools import zip_longest
def grouper(n, iterable, fillvalue=None):
"Collect data into fixed-length chunks or blocks"
# grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
args = [iter(iterable)] * n
return zip_longest(fillvalue=fillvalue, *args)
n = 300
with open('really_big_file.txt') as f:
for i, g in enumerate(grouper(n, f, fillvalue=''), 1):
with open('small_file_{0}'.format(i * n), 'w') as fout:
fout.writelines(g)
The advantage of this method as opposed to storing each line in a list, is that it works with iterables, line by line, so it doesn't have to store each small_file into memory at once.
Note that the last file in this case will be small_file_100200 but will only go until line 100000. This happens because fillvalue='', meaning I write out nothing to the file when I don't have any more lines left to write because a group size doesn't divide equally. You can fix this by writing to a temp file and then renaming it after instead of naming it first like I have. Here's how that can be done.
import os, tempfile
with open('really_big_file.txt') as f:
for i, g in enumerate(grouper(n, f, fillvalue=None)):
with tempfile.NamedTemporaryFile('w', delete=False) as fout:
for j, line in enumerate(g, 1): # count number of lines in group
if line is None:
j -= 1 # don't count this line
break
fout.write(line)
os.rename(fout.name, 'small_file_{0}.txt'.format(i * n + j))
This time the fillvalue=None and I go through each line checking for None, when it occurs, I know the process has finished so I subtract 1 from j to not count the filler and then write the file.

I do this a more understandable way and using less short cuts in order to give you a further understanding of how and why this works. Previous answers work, but if you are not familiar with certain built-in-functions, you will not understand what the function is doing.
Because you posted no code I decided to do it this way since you could be unfamiliar with things other than basic python syntax given that the way you phrased the question made it seem as though you did not try nor had any clue as how to approach the question
Here are the steps to do this in basic python:
First you should read your file into a list for safekeeping:
my_file = 'really_big_file.txt'
hold_lines = []
with open(my_file,'r') as text_file:
for row in text_file:
hold_lines.append(row)
Second, you need to set up a way of creating the new files by name! I would suggest a loop along with a couple counters:
outer_count = 1
line_count = 0
sorting = True
while sorting:
count = 0
increment = (outer_count-1) * 300
left = len(hold_lines) - increment
file_name = "small_file_" + str(outer_count * 300) + ".txt"
Third, inside that loop you need some nested loops that will save the correct rows into an array:
hold_new_lines = []
if left < 300:
while count < left:
hold_new_lines.append(hold_lines[line_count])
count += 1
line_count += 1
sorting = False
else:
while count < 300:
hold_new_lines.append(hold_lines[line_count])
count += 1
line_count += 1
Last thing, again in your first loop you need to write the new file and add your last counter increment so your loop will go through again and write a new file
outer_count += 1
with open(file_name,'w') as next_file:
for row in hold_new_lines:
next_file.write(row)
note: if the number of lines is not divisible by 300, the last file will have a name that does not correspond to the last file line.
It is important to understand why these loops work. You have it set so that on the next loop, the name of the file that you write changes because you have the name dependent on a changing variable. This is a very useful scripting tool for file accessing, opening, writing, organizing etc.
In case you could not follow what was in what loop, here is the entirety of the function:
my_file = 'really_big_file.txt'
sorting = True
hold_lines = []
with open(my_file,'r') as text_file:
for row in text_file:
hold_lines.append(row)
outer_count = 1
line_count = 0
while sorting:
count = 0
increment = (outer_count-1) * 300
left = len(hold_lines) - increment
file_name = "small_file_" + str(outer_count * 300) + ".txt"
hold_new_lines = []
if left < 300:
while count < left:
hold_new_lines.append(hold_lines[line_count])
count += 1
line_count += 1
sorting = False
else:
while count < 300:
hold_new_lines.append(hold_lines[line_count])
count += 1
line_count += 1
outer_count += 1
with open(file_name,'w') as next_file:
for row in hold_new_lines:
next_file.write(row)

lines_per_file = 300 # Lines on each small file
lines = [] # Stores lines not yet written on a small file
lines_counter = 0 # Same as len(lines)
created_files = 0 # Counting how many small files have been created
with open('really_big_file.txt') as big_file:
for line in big_file: # Go throught the whole big file
lines.append(line)
lines_counter += 1
if lines_counter == lines_per_file:
idx = lines_per_file * (created_files + 1)
with open('small_file_%s.txt' % idx, 'w') as small_file:
# Write all lines on small file
small_file.write('\n'.join(stored_lines))
lines = [] # Reset variables
lines_counter = 0
created_files += 1 # One more small file has been created
# After for-loop has finished
if lines_counter: # There are still some lines not written on a file?
idx = lines_per_file * (created_files + 1)
with open('small_file_%s.txt' % idx, 'w') as small_file:
# Write them on a last small file
small_file.write('n'.join(stored_lines))
created_files += 1
print '%s small files (with %s lines each) were created.' % (created_files,
lines_per_file)

import csv
import os
import re
MAX_CHUNKS = 300
def writeRow(idr, row):
with open("file_%d.csv" % idr, 'ab') as file:
writer = csv.writer(file, delimiter=',', quotechar='\"', quoting=csv.QUOTE_ALL)
writer.writerow(row)
def cleanup():
for f in os.listdir("."):
if re.search("file_.*", f):
os.remove(os.path.join(".", f))
def main():
cleanup()
with open("large_file.csv", 'rb') as results:
r = csv.reader(results, delimiter=',', quotechar='\"')
idr = 1
for i, x in enumerate(r):
temp = i + 1
if not (temp % (MAX_CHUNKS + 1)):
idr += 1
writeRow(idr, x)
if __name__ == "__main__": main()

with open('/really_big_file.txt') as infile:
file_line_limit = 300
counter = -1
file_index = 0
outfile = None
for line in infile.readlines():
counter += 1
if counter % file_line_limit == 0:
# close old file
if outfile is not None:
outfile.close()
# create new file
file_index += 1
outfile = open('small_file_%03d.txt' % file_index, 'w')
# write to file
outfile.write(line)

I had to do the same with 650000 line files.
Use the enumerate index and integer div it (//) with the chunk size
When that number changes close the current file and open a new one
This is a python3 solution using format strings.
chunk = 50000 # number of lines from the big file to put in small file
this_small_file = open('./a_folder/0', 'a')
with open('massive_web_log_file') as file_to_read:
for i, line in enumerate(file_to_read.readlines()):
file_name = f'./a_folder/{i // chunk}'
print(i, file_name) # a bit of feedback that slows the process down a
if file_name == this_small_file.name:
this_small_file.write(line)
else:
this_small_file.write(line)
this_small_file.close()
this_small_file = open(f'{file_name}', 'a')

Set files to the number of file you want to split the master file to
in my exemple i want to get 10 files from my master file
files = 10
with open("data.txt","r") as data :
emails = data.readlines()
batchs = int(len(emails)/10)
for id,log in enumerate(emails):
fileid = id/batchs
file=open("minifile{file}.txt".format(file=int(fileid)+1),'a+')
file.write(log)

A very easy way would if you want to split it in 2 files for example:
with open("myInputFile.txt",'r') as file:
lines = file.readlines()
with open("OutputFile1.txt",'w') as file:
for line in lines[:int(len(lines)/2)]:
file.write(line)
with open("OutputFile2.txt",'w') as file:
for line in lines[int(len(lines)/2):]:
file.write(line)
making that dynamic would be:
with open("inputFile.txt",'r') as file:
lines = file.readlines()
Batch = 10
end = 0
for i in range(1,Batch + 1):
if i == 1:
start = 0
increase = int(len(lines)/Batch)
end = end + increase
with open("splitText_" + str(i) + ".txt",'w') as file:
for line in lines[start:end]:
file.write(line)
start = end

In Python files are simple iterators. That gives the option to iterate over them multiple times and always continue from the last place the previous iterator got. Keeping this in mind, we can use islice to get the next 300 lines of the file each time in a continuous loop. The tricky part is knowing when to stop. For this we will "sample" the file for the next line and once it is exhausted we can break the loop:
from itertools import islice
lines_per_file = 300
with open("really_big_file.txt") as file:
i = 1
while True:
try:
checker = next(file)
except StopIteration:
break
with open(f"small_file_{i*lines_per_file}.txt", 'w') as out_file:
out_file.write(checker)
for line in islice(file, lines_per_file-1):
out_file.write(line)
i += 1

Reorganizing data

I have to input a text file that contains comma seperated and line seperated data in the following format:
A002,R051,02-00-00,05-21-11,00:00:00,REGULAR,003169391,001097585,05-21-11,04:00:00,REGULAR,003169415,001097588,05-21-11,08:00:00,REGULAR,003169431,001097607
Multiple sets of such data is present in the text file
I need to print all this in new lines with the condition:
1st 3 elements of every set followed by 5 parameters in a new line. So solution of the above set would be:
A002,R051,02-00-00,05-21-11,00:00:00,REGULAR,003169391,001097585
A002,R051,02-00-00,05-21-11,04:00:00,REGULAR,003169415,001097588
A002,R051,02-00-00,05-21-11,08:00:00,REGULAR,003169431,001097607
My function to achieve it is given below:
def fix_turnstile_data(filenames):
for name in filenames:
f_in = open(name, 'r')
reader_in = csv.reader(f_in, delimiter = ',')
f_out = open('updated_' + name, 'w')
writer_out = csv.writer(f_out, delimiter = ',')
array=[]
for line in reader_in:
i = 0
j = -1
while i < len(line):
if i % 8 == 0:
i+=2
j+=1
del array[:]
array.append(line[0])
array.append(line[1])
array.append(line[2])
elif (i+1) % 8 == 0:
array.append(line[i-3*j])
writer_out.writerow(array)
else:
array.append(line[i-3*j])
i+=1
f_in.close()
f_out.close()
The output is wrong and there is a space of 3 lines at the end of those lines whose length is 8. I suspect it might be the writer_out.writerow(array) which is to blame.
Can anyone please help me out?

Hmm, the logic you use ends up being fairly confusing. I'd do it more along these lines (this replaces your for loop), and this is more Pythonic:
for line in reader_in:
header = line[:3]
for i in xrange(3, len(line), 5):
writer_out.writerow(header + line[i:i+5])

Splitting large text file into smaller text files by line numbers using Python

I have a text file say really_big_file.txt that contains:
line 1
line 2
line 3
line 4
...
line 99999
line 100000
I would like to write a Python script that divides really_big_file.txt into smaller files with 300 lines each. For example, small_file_300.txt to have lines 1-300, small_file_600 to have lines 301-600, and so on until there are enough small files made to contain all the lines from the big file.
I would appreciate any suggestions on the easiest way to accomplish this using Python

lines_per_file = 300
smallfile = None
with open('really_big_file.txt') as bigfile:
for lineno, line in enumerate(bigfile):
if lineno % lines_per_file == 0:
if smallfile:
smallfile.close()
small_filename = 'small_file_{}.txt'.format(lineno + lines_per_file)
smallfile = open(small_filename, "w")
smallfile.write(line)
if smallfile:
smallfile.close()

Using itertools grouper recipe:
from itertools import zip_longest
def grouper(n, iterable, fillvalue=None):
"Collect data into fixed-length chunks or blocks"
# grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
args = [iter(iterable)] * n
return zip_longest(fillvalue=fillvalue, *args)
n = 300
with open('really_big_file.txt') as f:
for i, g in enumerate(grouper(n, f, fillvalue=''), 1):
with open('small_file_{0}'.format(i * n), 'w') as fout:
fout.writelines(g)
The advantage of this method as opposed to storing each line in a list, is that it works with iterables, line by line, so it doesn't have to store each small_file into memory at once.
Note that the last file in this case will be small_file_100200 but will only go until line 100000. This happens because fillvalue='', meaning I write out nothing to the file when I don't have any more lines left to write because a group size doesn't divide equally. You can fix this by writing to a temp file and then renaming it after instead of naming it first like I have. Here's how that can be done.
import os, tempfile
with open('really_big_file.txt') as f:
for i, g in enumerate(grouper(n, f, fillvalue=None)):
with tempfile.NamedTemporaryFile('w', delete=False) as fout:
for j, line in enumerate(g, 1): # count number of lines in group
if line is None:
j -= 1 # don't count this line
break
fout.write(line)
os.rename(fout.name, 'small_file_{0}.txt'.format(i * n + j))
This time the fillvalue=None and I go through each line checking for None, when it occurs, I know the process has finished so I subtract 1 from j to not count the filler and then write the file.

I do this a more understandable way and using less short cuts in order to give you a further understanding of how and why this works. Previous answers work, but if you are not familiar with certain built-in-functions, you will not understand what the function is doing.
Because you posted no code I decided to do it this way since you could be unfamiliar with things other than basic python syntax given that the way you phrased the question made it seem as though you did not try nor had any clue as how to approach the question
Here are the steps to do this in basic python:
First you should read your file into a list for safekeeping:
my_file = 'really_big_file.txt'
hold_lines = []
with open(my_file,'r') as text_file:
for row in text_file:
hold_lines.append(row)
Second, you need to set up a way of creating the new files by name! I would suggest a loop along with a couple counters:
outer_count = 1
line_count = 0
sorting = True
while sorting:
count = 0
increment = (outer_count-1) * 300
left = len(hold_lines) - increment
file_name = "small_file_" + str(outer_count * 300) + ".txt"
Third, inside that loop you need some nested loops that will save the correct rows into an array:
hold_new_lines = []
if left < 300:
while count < left:
hold_new_lines.append(hold_lines[line_count])
count += 1
line_count += 1
sorting = False
else:
while count < 300:
hold_new_lines.append(hold_lines[line_count])
count += 1
line_count += 1
Last thing, again in your first loop you need to write the new file and add your last counter increment so your loop will go through again and write a new file
outer_count += 1
with open(file_name,'w') as next_file:
for row in hold_new_lines:
next_file.write(row)
note: if the number of lines is not divisible by 300, the last file will have a name that does not correspond to the last file line.
It is important to understand why these loops work. You have it set so that on the next loop, the name of the file that you write changes because you have the name dependent on a changing variable. This is a very useful scripting tool for file accessing, opening, writing, organizing etc.
In case you could not follow what was in what loop, here is the entirety of the function:
my_file = 'really_big_file.txt'
sorting = True
hold_lines = []
with open(my_file,'r') as text_file:
for row in text_file:
hold_lines.append(row)
outer_count = 1
line_count = 0
while sorting:
count = 0
increment = (outer_count-1) * 300
left = len(hold_lines) - increment
file_name = "small_file_" + str(outer_count * 300) + ".txt"
hold_new_lines = []
if left < 300:
while count < left:
hold_new_lines.append(hold_lines[line_count])
count += 1
line_count += 1
sorting = False
else:
while count < 300:
hold_new_lines.append(hold_lines[line_count])
count += 1
line_count += 1
outer_count += 1
with open(file_name,'w') as next_file:
for row in hold_new_lines:
next_file.write(row)

lines_per_file = 300 # Lines on each small file
lines = [] # Stores lines not yet written on a small file
lines_counter = 0 # Same as len(lines)
created_files = 0 # Counting how many small files have been created
with open('really_big_file.txt') as big_file:
for line in big_file: # Go throught the whole big file
lines.append(line)
lines_counter += 1
if lines_counter == lines_per_file:
idx = lines_per_file * (created_files + 1)
with open('small_file_%s.txt' % idx, 'w') as small_file:
# Write all lines on small file
small_file.write('\n'.join(stored_lines))
lines = [] # Reset variables
lines_counter = 0
created_files += 1 # One more small file has been created
# After for-loop has finished
if lines_counter: # There are still some lines not written on a file?
idx = lines_per_file * (created_files + 1)
with open('small_file_%s.txt' % idx, 'w') as small_file:
# Write them on a last small file
small_file.write('n'.join(stored_lines))
created_files += 1
print '%s small files (with %s lines each) were created.' % (created_files,
lines_per_file)

import csv
import os
import re
MAX_CHUNKS = 300
def writeRow(idr, row):
with open("file_%d.csv" % idr, 'ab') as file:
writer = csv.writer(file, delimiter=',', quotechar='\"', quoting=csv.QUOTE_ALL)
writer.writerow(row)
def cleanup():
for f in os.listdir("."):
if re.search("file_.*", f):
os.remove(os.path.join(".", f))
def main():
cleanup()
with open("large_file.csv", 'rb') as results:
r = csv.reader(results, delimiter=',', quotechar='\"')
idr = 1
for i, x in enumerate(r):
temp = i + 1
if not (temp % (MAX_CHUNKS + 1)):
idr += 1
writeRow(idr, x)
if __name__ == "__main__": main()

with open('/really_big_file.txt') as infile:
file_line_limit = 300
counter = -1
file_index = 0
outfile = None
for line in infile.readlines():
counter += 1
if counter % file_line_limit == 0:
# close old file
if outfile is not None:
outfile.close()
# create new file
file_index += 1
outfile = open('small_file_%03d.txt' % file_index, 'w')
# write to file
outfile.write(line)

I had to do the same with 650000 line files.
Use the enumerate index and integer div it (//) with the chunk size
When that number changes close the current file and open a new one
This is a python3 solution using format strings.
chunk = 50000 # number of lines from the big file to put in small file
this_small_file = open('./a_folder/0', 'a')
with open('massive_web_log_file') as file_to_read:
for i, line in enumerate(file_to_read.readlines()):
file_name = f'./a_folder/{i // chunk}'
print(i, file_name) # a bit of feedback that slows the process down a
if file_name == this_small_file.name:
this_small_file.write(line)
else:
this_small_file.write(line)
this_small_file.close()
this_small_file = open(f'{file_name}', 'a')

Set files to the number of file you want to split the master file to
in my exemple i want to get 10 files from my master file
files = 10
with open("data.txt","r") as data :
emails = data.readlines()
batchs = int(len(emails)/10)
for id,log in enumerate(emails):
fileid = id/batchs
file=open("minifile{file}.txt".format(file=int(fileid)+1),'a+')
file.write(log)

A very easy way would if you want to split it in 2 files for example:
with open("myInputFile.txt",'r') as file:
lines = file.readlines()
with open("OutputFile1.txt",'w') as file:
for line in lines[:int(len(lines)/2)]:
file.write(line)
with open("OutputFile2.txt",'w') as file:
for line in lines[int(len(lines)/2):]:
file.write(line)
making that dynamic would be:
with open("inputFile.txt",'r') as file:
lines = file.readlines()
Batch = 10
end = 0
for i in range(1,Batch + 1):
if i == 1:
start = 0
increase = int(len(lines)/Batch)
end = end + increase
with open("splitText_" + str(i) + ".txt",'w') as file:
for line in lines[start:end]:
file.write(line)
start = end

In Python files are simple iterators. That gives the option to iterate over them multiple times and always continue from the last place the previous iterator got. Keeping this in mind, we can use islice to get the next 300 lines of the file each time in a continuous loop. The tricky part is knowing when to stop. For this we will "sample" the file for the next line and once it is exhausted we can break the loop:
from itertools import islice
lines_per_file = 300
with open("really_big_file.txt") as file:
i = 1
while True:
try:
checker = next(file)
except StopIteration:
break
with open(f"small_file_{i*lines_per_file}.txt", 'w') as out_file:
out_file.write(checker)
for line in islice(file, lines_per_file-1):
out_file.write(line)
i += 1

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

File data binding with column names - python

Related

how to create a list and then print it in ascending order

Writing data into CSV file

Splitting a large json file into multiple json files using python [duplicate]

Reorganizing data

Splitting large text file into smaller text files by line numbers using Python

Categories

Resources