I have a text file which I wrote to three times using the following code.
def Final_Output():
    with open('A Class, A.T.I..txt', "a") as file_out:
        file_out.write("Full_Name :> " + str(Sur_Name) + ', ' + str(Name) + '\n')
        file_out.write("Date: >: " + str(Date) + "\n")
        file_out.write("Class: >: " + str(Class) + '\n')
        file_out.write("Range >: " + str(Range_Limit) + "\n")
        file_out.write("Score :> " + str(N) + '\n')
        file_out.write("Score_Percent :> " + str(Score_Percent) + '\n')

Name = "MohKale"
Sur_Name = "MohKale Surname"
Date = "16/9/2015 11:7:52"
Class = "A"
Range_Limit = "10765"
N = "10"
Score_Percent = "0.0"
Final_Output()
Then I created a program that reads the text file continuously (until the end) and saves the lines to variables.
a = 0
with open('A Class, A.T.I..txt', "r") as file_out:
    for line in file_out:
        a = a + 1
        Name2 = file_out.readline()
        print(Name2)
        Sur_name2 = file_out.readline()
        print(Sur_name2)
        Date2 = file_out.readline()
        print(Date2)
        Class2 = file_out.readline()
        print(Class2)
        Range_Limit2 = file_out.readline()
        print(Range_Limit2)
        N2 = file_out.readline()
        print(N2)
        Score_Percent2 = file_out.readline()
        print(Score_Percent2)
Now technically the program does read the text file and save to variables, but it keeps skipping lines: the first time through it skipped the first line, the second time it skipped the next unread line as well, and so on; every pass through the loop loses one more line.
Can anyone understand and explain the problem?
Furthermore, by putting
print(line)
in the loop, it does print the missing lines, but this is far too unpredictable. Is there any way to prevent this?
The problem seems to be that you are using a for loop AND calling readline() on the same file object; use just one :)
I suggest that you remove the for loop and keep just the file_out.readline() calls:
with open('A Class, A.T.I..txt', "r") as file_out:
    a = a + 1
    Name2 = file_out.readline()
    print(Name2)
    Sur_name2 = file_out.readline()
    print(Sur_name2)
    Date2 = file_out.readline()
    print(Date2)
    Class2 = file_out.readline()
    print(Class2)
    Range_Limit2 = file_out.readline()
    print(Range_Limit2)
    N2 = file_out.readline()
    print(N2)
    Score_Percent2 = file_out.readline()
    print(Score_Percent2)
The issue is that your for loop is consuming lines from the file independently of your .readline() calls. Rather than doing
for line in file_out:
you will want to do something like:
try:
    while True:
        a += 1
        Name2 = next(file_out)
        print(Name2)
        # ...
        Score_Percent2 = next(file_out)
        print(Score_Percent2)
except StopIteration:
    print("Done!")
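For completeness, here is a sketch of the same idea with the record structure made explicit: itertools.islice pulls a fixed number of lines per record from the file iterator, so no line can be lost between reads. The six-line record shape and the demo file name are assumptions based on the question's output format.

```python
from itertools import islice

# Write a small demo file using the question's six-line record format
with open('demo_records.txt', 'w') as f:
    for _ in range(2):
        f.write("Full_Name :> Surname, Name\n")
        f.write("Date: >: 16/9/2015 11:7:52\n")
        f.write("Class: >: A\n")
        f.write("Range >: 10765\n")
        f.write("Score :> 10\n")
        f.write("Score_Percent :> 0.0\n")

def read_records(path, record_size=6):
    """Yield lists of `record_size` consecutive lines, newlines stripped."""
    with open(path) as f:
        while True:
            record = [line.rstrip('\n') for line in islice(f, record_size)]
            if len(record) < record_size:
                break  # no complete record left
            yield record

records = list(read_records('demo_records.txt'))
```

Because islice advances the one and only file iterator, each record starts exactly where the previous one ended.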
I'm having trouble with a really annoying homework assignment. I have a CSV file with lots of comma-delimited fields per row. I need to take the last two fields from every row and write them into a new txt file. The problem is that some of the latter fields contain sentences; those with commas are in double quotes, those without them aren't. For example:
180,easy
240min,"Quite easy, but number 3, wtf?"
300,much easier than the last assignment
I did this and it worked just fine, but the double quotes disappear. The assignment is to copy the fields to the txt file, use a semicolon as the delimiter and remove possible line breaks. The text must remain exactly the same. We have an automatic checking system, so it's no use arguing whether this makes sense.
import csv

file = open('myfile.csv', 'r')
output = open('mytxt.txt', 'w')
csvr = csv.reader(file)
headline = next(csvr)
for line in csvr:
    lgt = len(line)
    time = line[lgt - 2].replace('\n', '')
    feedb = line[lgt - 1].replace('\n', '')
    if time != '' and feedb != '':
        output.write(time + ';' + feedb + '\n')
output.close()
file.close()
Is there some easy solution for this? Can I use csv module at all? No one seems to have exactly the same problem.
Thank you all beforehand.
Try this,
import csv

file = open('myfile.csv', 'r')
output = open('mytxt.txt', 'w')
csvr = csv.reader(file)
headline = next(csvr)
for line in csvr:
    lgt = len(line)
    time = line[lgt - 2].replace('\n', '')
    feedb = line[lgt - 1].replace('\n', '')
    if time != '' and feedb != '':
        if ',' in feedb:
            output.write(time + ';"' + feedb + '"\n')
        else:
            output.write(time + ';' + feedb + '\n')
output.close()
file.close()
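A variant of the same idea using with blocks (so both files are closed even if an error occurs) and negative indexing for the last two fields; the quoting rule, wrapping a field in double quotes when it contains a comma, mirrors the answer above. The function name and default file names are illustrative.

```python
import csv

def convert(src='myfile.csv', dst='mytxt.txt'):
    """Copy the last two fields of each row, semicolon-delimited,
    re-adding double quotes around fields that contain a comma."""
    with open(src, newline='') as fin, open(dst, 'w') as fout:
        reader = csv.reader(fin)
        next(reader)  # skip the header line
        for row in reader:
            time = row[-2].replace('\n', '')
            feedb = row[-1].replace('\n', '')
            if not time or not feedb:
                continue  # skip rows with an empty field, as above
            if ',' in feedb:
                feedb = '"' + feedb + '"'
            fout.write(time + ';' + feedb + '\n')
```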
Had to do it the ugly way; the file was too irregular. Talked with some colleagues on the same course, and apparently the idea was NOT to use the csv module here, but to practise basic file handling in Python.
file = open('myfile.csv', 'r')
output = open('mytxt.txt', 'w')
headline = file.readline()
feedb_lst = []
count = 0
for line in file:
    if line.startswith('1'):  # found out all lines should start with an ID number,
        data_lst = line.split(',', 16)  # which always starts with '1'
        lgt = len(data_lst)
        time = data_lst[lgt - 2]
        feedb = data_lst[lgt - 1].rstrip()
        feedback = [time, feedb]
        feedb_lst.append(feedback)
        count += 1
    else:
        feedb_lst[count - 1][1] = feedb_lst[count - 1][1] + line.rstrip()

i = 1
for item in feedb_lst:
    if item[0] != '' and item[1] != '':
        if i == len(feedb_lst):
            output.write(item[0] + ';' + item[1])
        else:
            output.write(item[0] + ';' + item[1] + '\n')
    i += 1
output.close()
file.close()
Thank you for your help!
for x in file.readlines():
    something()
I think this code caches all the lines when the loop starts. I delete some of the lines from the file, but it still repeats the deleted lines. How can I change the loop while inside it?
def wanted(s, d):
    print("deneme = " + str(s))
    count = 0
    total = 0
    TG_count = TC_count = TA_count = GC_count = CC_count = CG_count = GG_count = AA_count = AT_count = TT_count = CT_count = AG_count = AC_count = GT_count = 0
    for x in range(d, fileCount):
        print(str(x + 1) + 'st file processing...')
        searchFile = open(str(x) + '.txt', encoding='utf-8', mode="r+")
        l = searchFile.readlines()
        searchFile.seek(0)
        for line in l:
            if s in line[:12]:
                blabla()
            else:
                searchFile.write(line)
        searchFile.truncate()
        searchFile.close()

for p in range(fileCount):
    searchFile = open(str(p) + '.txt', encoding='utf-8', mode="r+")
    for z in searchFile.readlines():
        wanted(z[:12], p)
    print("Progressing file " + str(p) + " complete")
I assume this is Python. Yes, readlines() reads the whole file at once. To avoid this, iterate over the file object directly:
for x in file:
    something()
You can find the relevant information in the Python tutorial. It says:
If you want to read all the lines of a file in a list you can also use list(f) or f.readlines().
So yes, all lines are read and stored in memory.
The manual also says:
f.readline() reads a single line from the file;
More details can be found in the manual.
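A quick way to see the difference the tutorial describes: readlines() materializes every line into a list up front, while iterating the file object (or calling next() on it) produces lines on demand. The file name below is just for the demonstration.

```python
# readlines() snapshots every line into a list immediately;
# iterating the file object itself pulls lines lazily instead.
with open('demo.txt', 'w') as f:
    f.write('one\ntwo\nthree\n')

with open('demo.txt') as f:
    snapshot = f.readlines()      # whole file in memory as a list

with open('demo.txt') as f:
    first = next(f)               # only the first line has been read
    rest = [line for line in f]   # iteration resumes where next() stopped
```

This is why deleting lines from the file mid-loop has no effect on a readlines() snapshot: the list was already built before the loop began.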
I have a csv file with 5 million rows.
I want to split the file into files with a number of rows specified by the user.
I have developed the following code, but it takes too much time to execute. Can anyone help me optimize it?
import csv

print "Please delete the previous created files. If any."
filepath = raw_input("Enter the File path: ")
line_count = 0
filenum = 1
try:
    in_file = raw_input("Enter Input File name: ")
    if in_file[-4:] == ".csv":
        split_size = int(raw_input("Enter size: "))
        print "Split Size ---", split_size
        print in_file, " will split into", split_size, "rows per file named as OutPut-file_*.csv (* = 1,2,3 and so on)"
        with open(in_file, 'r') as file1:
            row_count = 0
            reader = csv.reader(file1)
            for line in file1:
                # print line
                with open(filepath + "\\OutPut-file_" + str(filenum) + ".csv", "a") as out_file:
                    if row_count < split_size:
                        out_file.write(line)
                        row_count = row_count + 1
                    else:
                        filenum = filenum + 1
                        row_count = 0
                line_count = line_count + 1
        print "Total Files Written --", filenum
    else:
        print "Please enter the Name of the file correctly."
except IOError as e:
    print "Oops..! Please Enter correct file path values", e
except ValueError:
    print "Oops..! Please Enter correct values"
I have also tried without "with open"
Oops! You are re-opening the output file on every row, which is an expensive operation. Your code could become:
...
with open(in_file, 'r') as file1:
    row_count = 0
    # reader = csv.reader(file1)  # unused here
    out_file = open(filepath + "\\OutPut-file_" + str(filenum) + ".csv", "a")
    for line in file1:
        # print line
        if row_count >= split_size:
            out_file.close()
            filenum = filenum + 1
            out_file = open(filepath + "\\OutPut-file_" + str(filenum) + ".csv", "a")
            row_count = 0
        out_file.write(line)
        row_count = row_count + 1
        line_count = line_count + 1
...
Ideally, you should also initialize out_file = None before the try block and ensure a clean close in the except blocks with if out_file is not None: out_file.close().
Remark: this code only splits on line count (as yours did). That means it will give wrong output if the csv file contains newlines in quoted fields.
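As a sketch, the close-and-reopen pattern can also be written with itertools.islice, which reads whole chunks of lines at a time; like the code above, it splits on raw line count, so quoted CSV fields containing newlines would still be broken across files. The function and file names are illustrative.

```python
from itertools import islice

def split_file(src, split_size, prefix='OutPut-file_'):
    """Split `src` into numbered files of at most `split_size` lines each.

    Returns the number of output files written.
    """
    filenum = 0
    with open(src) as fin:
        while True:
            chunk = list(islice(fin, split_size))  # next split_size lines
            if not chunk:
                break  # end of input reached
            filenum += 1
            with open(prefix + str(filenum) + '.csv', 'w') as out:
                out.writelines(chunk)
    return filenum
```

Each output file is opened exactly once, so the per-row open/close cost disappears entirely.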
You can definitely use Python's multiprocessing module.
This is the result I achieved with a csv file that had 1,000,000 lines in it.
import time
from multiprocessing import Pool

def saving_csv_normally(start):
    out_file = open('out_normally/' + str(start / batch_size) + '.csv', 'w')
    for i in range(start, start + batch_size):
        out_file.write(arr[i])
    out_file.close()

def saving_csv_multi(start):
    out_file = open('out_multi/' + str(start / batch_size) + '.csv', 'w')
    for i in range(start, start + batch_size):
        out_file.write(arr[i])
    out_file.close()

def saving_csv_multi_async(start):
    out_file = open('out_multi_async/' + str(start / batch_size) + '.csv', 'w')
    for i in range(start, start + batch_size):
        out_file.write(arr[i])
    out_file.close()

with open('files/test.csv') as file:
    arr = file.readlines()
print "length of file : ", len(arr)

batch_size = 100  # split in number of rows
start = time.time()
for i in range(0, len(arr), batch_size):
    saving_csv_normally(i)
print "time taken normally : ", time.time() - start

# multiprocessing
p = Pool()
start = time.time()
p.map(saving_csv_multi, range(0, len(arr), batch_size), chunksize=len(arr) / 4)  # chunksize you can define as much as you want
print "time taken for multiprocessing : ", time.time() - start

# it does the same thing asynchronously
start = time.time()
for i in p.imap_unordered(saving_csv_multi_async, range(0, len(arr), batch_size), chunksize=len(arr) / 4):
    continue
print "time taken for multiprocessing async : ", time.time() - start
output shows time taken by each :
length of file : 1000000
time taken normally : 0.733881950378
time taken for multiprocessing : 0.508712053299
time taken for multiprocessing async : 0.471592903137
I have defined three separate functions because functions passed to p.map can only take one parameter, and since I am storing the csv files in three different folders, I wrote one function per folder.
I wrote a script that opens my text file, searches for a certain word, selects the line that contains this word and splits it into three parts, then takes the part which is a number and adds 1 to it, so every time I run the script one is added to this number. Here is the script:
#!/usr/bin/env python
inputFile = open('CMakeLists.txt', 'r')
version = None
saved = ""
for line in inputFile:
    if "_PATCH " in line:
        print "inside: ", line
        version = line
    else:
        saved += line
inputFile.close()

inputFile = open('CMakeLists.txt', 'w')
x = version.split('"')
print "x: ", x
a = x[0]
b = int(x[1]) + 1
c = x[2]
new_version = str(a) + '"' + str(b) + '"' + str(c)
print "new_version: ", new_version
inputFile.write(str(saved))
inputFile.write(str(new_version))
inputFile.close()
But my problem is that the new number is written at the end of the file; I want it to stay in its original place. Any ideas?
Thanks.
The problem is that you write the new version number after the original file (without the version line):
inputFile.write(str(saved))
inputFile.write(str(new_version))
You could fix it by saving the lines before and after the line that contains the version separately and then save them in the right order:
#!/usr/bin/env python
inputFile = open('CMakeLists.txt', 'r')
version = None
savedBefore = ""
savedAfter = ""
for line in inputFile:
    if "_PATCH " in line:
        print "inside: ", line
        version = line
    elif version is None:
        savedBefore += line
    else:
        savedAfter += line
inputFile.close()

inputFile = open('CMakeLists.txt', 'w')
x = version.split('"')
print "x: ", x
a = x[0]
b = int(x[1]) + 1
c = x[2]
new_version = str(a) + '"' + str(b) + '"' + str(c)
print "new_version: ", new_version
inputFile.write(savedBefore)
inputFile.write(str(new_version))
inputFile.write(savedAfter)
inputFile.close()
Note: you might need to add some extra text with the version line to make it have the same format as the original (such as adding "_PATCH").
There is a lot to say about your code.
Your mistake is that you write your "saved" lines first and only afterwards write the modified version; hence, the modified line ends up at the end of the file.
Moreover, I advise you to use with statements.
lines = []
with open('CmakeLists.txt', 'r') as _fd:
    while True:
        line = _fd.readline()
        if not line:
            break
        if '_PATCH ' in line:
            a, b, c = line.split('"')
            b = int(b) + 1
            line = '{}"{}"{}'.format(a, b, c)
        lines.append(line)

with open('CmakeLists.txt', 'w') as _fd:
    for line in lines:
        _fd.write(line)
This code is untested and may contain errors; also, if your input file is huge, putting every line in a list can be a bad idea.
#!/usr/bin/env python
inputFile = open('CMakeLists.txt', 'r')
version = None
saved = ""
for line in inputFile:
    if "_PATCH " in line:
        print "inside: ", line
        version = line
        x = version.split('"')
        print "x: ", x
        a = x[0]
        b = int(x[1]) + 1
        c = x[2]
        new_version = str(a) + '"' + str(b) + '"' + str(c)
        saved += new_version
    else:
        saved += line
inputFile.close()

inputFile = open('CMakeLists.txt', 'w')
inputFile.write(str(saved))
inputFile.close()
If the target line is found, update its content and add it to saved; once the for loop ends, just write saved to the file.
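The same edit can also be expressed as a small helper that works on the file's text in one pass with re.sub; the assumption (taken from the question) is that the _PATCH line holds exactly one number in double quotes. The function name is illustrative.

```python
import re

def bump_patch(text):
    """Increment the quoted number on any line containing '_PATCH '.

    Assumes, as in the question, that such a line holds exactly one
    number in double quotes, e.g.: set(PROJECT_PATCH "41")
    """
    def inc(match):
        # Rebuild the quoted number with its value incremented by one
        return '"' + str(int(match.group(1)) + 1) + '"'

    out = []
    for line in text.splitlines(keepends=True):
        if '_PATCH ' in line:
            line = re.sub(r'"(\d+)"', inc, line, count=1)
        out.append(line)
    return ''.join(out)
```

Because only the matched line changes, the surrounding text keeps its original position and formatting.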
I successfully simplified a Python module that imports data from a spectrometer.
(I'm a total beginner; somebody else wrote the model of the code for me.)
I only have one problem: half of the output data (in a .csv file) is surrounded by brackets: []
I would like the file to contain a structure like this:
name, wavelength, measurement
i.e
a,400,0.34
a,410,0.65
...
but what I get is:
a,400,[0.34]
a,410,[0.65]
...
Is there any simple fix for this?
Is it because measurement is a string?
Thank you
import serial  # requires pyserial library

ser = serial.Serial(0)
ofile = file('spectral_data.csv', 'ab')

while True:
    name = raw_input("Pigment name [Q to finish]: ")
    if name == "Q":
        print "bye bye!"
        ofile.close()
        break

    first = True
    while True:
        line = ser.readline()
        if first:
            print "  Data incoming..."
            first = False

        split = line.split()
        if 10 <= len(split):
            try:
                wavelength = int(split[0])
                measurement = [float(split[i]) for i in [6]]
                ofile.write(str(name) + "," + str(wavelength) + "," + str(measurement) + '\n')
            except ValueError:
                pass  # handles the table heading

        if line[:3] == "110":
            break
    print "  Data gathered."
    ofile.write('\n')
Instead of calling str() on the list, join its elements (converted to strings, since join only accepts strings):
measurement = [float(split[i]) for i in [6]]
ofile.write(str(name) + "," + str(wavelength) + "," + ",".join(str(m) for m in measurement) + '\n')
OR simply write the raw field:
ofile.write(str(name) + "," + str(wavelength) + "," + split[6] + '\n')
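To see where the brackets come from: the comprehension builds a one-element list, and str() on a list keeps its brackets; join() also needs strings, not floats. A minimal illustration with a made-up data line:

```python
# A fabricated spectrometer line with the measurement in column 6
split = "400 a b c d e 0.34 x y z".split()

as_list = [float(split[i]) for i in [6]]      # a one-element list
bracketed = str(as_list)                      # str() of a list keeps the brackets

single = float(split[6])                      # take the value itself instead
clean = str(single)                           # plain number, no brackets

# If several columns were wanted, join their string forms:
joined = ",".join(str(float(split[i])) for i in [6])
```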