Manually read lines in Python

I have a file that contains a filename, a newline, then the hash value of that file, then another newline. This pattern repeats. Example:
blah.txt
23847EABF8742
file2.txt
1982834E387FA
I have a class called 'information' that has two member variables.
class information:
    filename = ''
    hashvalue = ''
Now I want to read in the file, store a filename and hashvalue in a new instance of an 'information' object, and then append that instance to a list.
The problem I am having is iterating over the file. I want to read it line by line until the end of the file. The problem with Python's 'for line in file' loop is that it grabs one line per iteration, so I would be forced into some kind of every-other-line tactic to put the data in the correct member variable.
Instead, this is what I am trying to do...
list = []
while (not end of file):
    x = information()
    x.filename = file.readline()
    x.hashvalue = file.readline()
    list.append(x)

You could write a generator function:
def twolines(file):
    cur = None
    for i in file:
        if cur is None:
            cur = i
        else:
            yield (cur, i)
            cur = None
Then pass your file object to twolines(), and do something like
for i, j in twolines(file):
    x = information()
    x.filename, x.hashvalue = i, j
    list.append(x)

while True:
    x = information()
    x.filename = file.readline()
    if not x.filename:
        break
    x.hashvalue = file.readline()
    my_list.append(x)
maybe?
or
while True:
    x = information()
    try:
        x.filename = next(file)
        x.hashvalue = next(file)
    except StopIteration:
        break
    my_list.append(x)
or my favorite
my_list = [(filename, hashvalue) for filename, hashvalue in zip(file, file)]

Another simple fix is to count the lines. Introduce a variable like line = 0 for this; then you could try the following:
line = 0
for lines in file:
    line = line + 1
    if line % 2 == 1:
        filename = lines   # this will be the filename
    else:
        hashvalue = lines  # this will be the hashcode

How about this:
it = iter(file)
list = [information(filename=x.rstrip(), hashvalue=next(it).rstrip()) for x in it]
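Put together, the pairing idea looks like this as a self-contained sketch (io.StringIO stands in for the real file; the information class is the one from the question):

```python
import io

class information:
    """Holds one filename/hash pair, as in the question."""
    filename = ''
    hashvalue = ''

# Stand-in for the real file object.
file = io.StringIO("blah.txt\n23847EABF8742\nfile2.txt\n1982834E387FA\n")

records = []
it = iter(file)
for filename, hashvalue in zip(it, it):  # consumes two lines per iteration
    x = information()
    x.filename = filename.rstrip('\n')
    x.hashvalue = hashvalue.rstrip('\n')
    records.append(x)

print([(r.filename, r.hashvalue) for r in records])
# → [('blah.txt', '23847EABF8742'), ('file2.txt', '1982834E387FA')]
```

zip(it, it) works because both arguments are the same iterator, so each tuple pulls two consecutive lines.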

Related

Why does points not change to the user's score, instead remaining at the temporary value of 0?

points = "temp"
a = "temp"
f = "temp"

def pointincrementer():
    global points
    points = 0
    for line in f:
        for word in a:
            if word in line:
                scorelen = int(len(user+","))
                scoreval = line[0:scorelen]
                isolatedscore = line.replace(scoreval,'')
                if "," in line:
                    scorestr = isolatedscore.replace(",","")
                    score = int(scorestr)
                    points = score + 1
                    print(points)

def score2():
    f = open('test.txt','r')
    a = [user]
    lst = []
    for line in f:
        for word in a:
            if word in line:
                pointincrementer()
                print(points)
                point = str(points)
                winning = (user+","+point+","+"\n")
                line = line.replace(line,winning)
        lst.append(line)
    f.close()
    f = open('test.txt','w')
    for line in lst:
        f.write(line)
    f.close()
    print("Points updated")

user = input("Enter username: ") #change so user = winners userid
with open('test.txt') as myfile:
    if user in myfile.read():
        score2()
    else:
        f = open('test.txt','r')
        f2 = f.read()
        f3 = (f2+"\n"+user)
        f.close()
        f = open('test.txt','w')
        f.write(f3)
        f.close()
        score2()
This is paired with test.txt, which looks like this:
one,1,
two,5,
three,4,
four,94,
When this code is run, it will ask the user their name (as expected), then print 0 (when it should instead print the user's score), and then Points updated. Does anybody know how to sort this out?
There are many problems with your code. You should not be using global variables like that. Each function should be passed what it needs, do its computing, and return values for the caller to handle. You should not be reading the file multiple times. And you can't write the file while you still have it open with the with statement.
Here, I read the file at the beginning into a Python dictionary. The code just updates the dictionary, then writes it back out at the end. This makes for a simpler and more maintainable structure.
def readdata(fn):
    data = {}
    for row in open(fn):
        info = row.strip().split(',')
        data[info[0]] = int(info[1])
    return data

def writedata(fn, data):
    with open(fn, 'w') as f:
        for k, v in data.items():
            print(f"{k},{v}", file=f)

def pointincrementer(data, user):
    return data[user] + 1

def score2(data, user):
    points = pointincrementer(data, user)
    print(points)
    data[user] = points
    print("Points updated")

user = input("Enter username: ")
data = readdata('test.txt')
if user not in data:
    data[user] = 0
score2(data, user)
writedata('test.txt', data)
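A quick round-trip check of the read/update/write cycle above (the helpers are repeated here so the snippet is self-contained, and the file is a throwaway temp file rather than the real test.txt):

```python
import os
import tempfile

def readdata(fn):
    # Parse "name,score" lines into a dict.
    data = {}
    for row in open(fn):
        info = row.strip().split(',')
        data[info[0]] = int(info[1])
    return data

def writedata(fn, data):
    # Write the dict back out, one "name,score" line per entry.
    with open(fn, 'w') as f:
        for k, v in data.items():
            print(f"{k},{v}", file=f)

# Write a small score table, bump one entry, and read it back.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
writedata(path, {'one': 1, 'two': 5})
data = readdata(path)
data['two'] += 1
writedata(path, data)
assert readdata(path) == {'one': 1, 'two': 6}
os.remove(path)
```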
The f in pointincrementer() refers to the "temp" string declared on the third line. The f in score2() refers to the file handle declared immediately below the function header. To get around this, you can pass the file handle into pointincrementer():
def pointincrementer(file_handle):
    global points
    points = 0
    for line in file_handle:
        for word in a:
            if word in line:
                scorelen = int(len(user+","))
                scoreval = line[0:scorelen]
                isolatedscore = line.replace(scoreval,'')
                if "," in line:
                    scorestr = isolatedscore.replace(",","")
                    score = int(scorestr)
                    points = score + 1
                    print(points)

def score2():
    file_handle = open('test.txt','r')
    a = [user]
    lst = []
    for line in f:
        print(line)
        for word in a:
            if word in line:
                pointincrementer(file_handle)
                print(points)
                point = str(points)
                winning = (user+","+point+","+"\n")
                line = line.replace(line,winning)
        lst.append(line)
    f.close()
    f = open('test.txt','w')
    for line in lst:
        f.write(line)
    f.close()
    print("Points updated")
This leads to a parsing error. However, as you haven't described what each function is supposed to do, this is the limit to which I can help. (The code is also extremely difficult to read -- the lack of readability in this code snippet is likely what caused this issue.)

Open two files. Read from the first file and do lookup on second before writing a line

I have two text files. I can open both with Python successfully.
I open the first file and read a data element into a variable using the for l in file construct.
I open the second file and read a data element into a variable using the for l in file construct.
If both variables match, I write data to a text file. This works perfectly for the first line read, but not for subsequent lines. The FIN variable never changes, even though a new line starting with D appears further along. Is there a way to loop through two files like this? Am I missing something obvious?
File2Split = 'c:\\temp\\datafile\\comparionIP.txt'
GetResident = 'c:\\temp\\datafile\\NPINumbers.txt'
writefile = open('c:\\temp\\datafile\\comparionIPmod.txt','w')
openfile = open(File2Split,'r')
openfileNPI = open(GetResident,'r')
FIN = ''
FirstChar = ''
FIN2 = ''
for l in openfile:
    FirstChar = (l[0:1])
    if FirstChar == 'D':
        FIN = (l[21:31])
        #print (FIN)
        if FIN.startswith('1'):
            writefile.write(l)
    elif FirstChar in ['F','G','C','R']:
        writefile.write(l)
    elif FirstChar == 'N':
        for l2 in openfileNPI:
            FIN2 = (l2[0:10])
            NPI = ('N' + (l2[11:21]))
            if FIN2 == FIN:
                writefile.write(NPI + '\n')
openfileNPI.close()
openfile.close()
writefile.close()
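The likely culprit is that the inner for l2 in openfileNPI loop exhausts the second file on the first 'N' line, so every later lookup finds nothing. One common fix (a sketch with made-up sample data, not code from the thread; the column slices are taken from the question) is to load the lookup file into a dict once, then do a plain dict lookup per line:

```python
import io

# Stand-ins for the two files (formats inferred from the slicing in the
# question: FIN in columns 21-30 of 'D' lines; FIN in columns 0-9 and
# NPI in columns 11-20 of the NPI file).
ip_file = io.StringIO("D" + "." * 20 + "1234567890\n" + "Nsomething\n")
npi_file = io.StringIO("1234567890 9876543210\n")

# Build the lookup table once, instead of re-reading the file per line.
npi_by_fin = {}
for l2 in npi_file:
    npi_by_fin[l2[0:10]] = 'N' + l2[11:21]

out = []
FIN = ''
for l in ip_file:
    if l[0:1] == 'D':
        FIN = l[21:31]
    elif l[0:1] == 'N' and FIN in npi_by_fin:
        out.append(npi_by_fin[FIN].rstrip('\n'))

print(out)
# → ['N9876543210']
```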

How can I change for loop while in it?

for x in file.readlines():
    something()
I think this code caches all the lines when the loop starts. I delete some of the lines from the file, but it still iterates over the deleted lines. How can I change the loop while inside it?
def wanted(s, d):
    print("deneme = " + str(s))
    count = 0
    total = 0
    TG_count = TC_count = TA_count = GC_count = CC_count = CG_count = GG_count = AA_count = AT_count = TT_count = CT_count = AG_count = AC_count = GT_count = 0
    for x in range(d, fileCount):
        print(str(x+1) + 'st file processing...')
        searchFile = open(str(x) + '.txt', encoding='utf-8', mode="r+")
        l = searchFile.readlines()
        searchFile.seek(0)
        for line in l:
            if s in line[:12]:
                blabla()
            else:
                searchFile.write(line)
        searchFile.truncate()
        searchFile.close()

for p in range(fileCount):
    searchFile = open(str(p) + '.txt', encoding='utf-8', mode="r+")
    for z in searchFile.readlines():
        wanted(z[:12], p)
    print("Progressing file " + str(p) + " complete")
Yes, readlines() reads the whole file at once. To avoid this, iterate over the file object instead:
for x in file:
    something()
Maybe you can find the appropriate information in the Python tutorial. It says
If you want to read all the lines of a file in a list you can also use list(f) or f.readlines().
So yes, all lines are read and stored in memory.
Also the manual says:
f.readline() reads a single line from the file;
More details can be found in the manual.

How to join split lines, replacing file references with data from a file, in the same string

So, as most of you are thinking, this looks like a duplicate, but it is not. What I'm trying to achieve: say there is a master string like the one below, with a couple of files mentioned in it. We need to open those files and check whether any other files are included in them; if so, we need to copy that content into the line where we found that particular reference.
Master String:
Welcome
How are you
file.txt
everything alright
signature.txt
Thanks
file.txt
ABCDEFGHtele.txt
tele.txt
IJKL
signature.txt
SAK
Output:
Welcome
How are you
ABCD
EFGH
IJKL
everything alright
SAK
Thanks
for msplit in [stext.split('\n')]:
    for num, items in enumerate(stext, 1):
        if items.strip().startswith("here is") and items.strip().endswith(".txt"):
            gmsf = open(os.path.join(os.getcwd() + "\\txt", items[8:]), "r")
            gmsfstr = gmsf.read()
            newline = items.replace(items, gmsfstr)
How do I join these replaced items back into the same string format?
Also, any idea on how to re-run the same function until there are no more ".txt" references? Once the join is done, there might be other ".txt" references inside a ".txt".
Thanks for your help in advance.
A recursive approach that works with any level of file name nesting:
from os import linesep

def get_text_from_file(file_path):
    with open(file_path) as f:
        text = f.read()
    return SAK_replace(text)

def SAK_replace(s):
    lines = s.splitlines()
    for index, l in enumerate(lines):
        if l.endswith('.txt'):
            lines[index] = get_text_from_file(l)
    return linesep.join(lines)
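A quick way to exercise the recursive version above is with throwaway files in a temp directory (the file names and contents here are invented for the demo; the two functions are repeated so the snippet is self-contained):

```python
import os
import tempfile
from os import linesep

def get_text_from_file(file_path):
    with open(file_path) as f:
        text = f.read()
    return SAK_replace(text)

def SAK_replace(s):
    # Replace any line that names a .txt file with that file's
    # (recursively expanded) contents.
    lines = s.splitlines()
    for index, l in enumerate(lines):
        if l.endswith('.txt'):
            lines[index] = get_text_from_file(l)
    return linesep.join(lines)

# Build nested include files in a temp directory and expand them.
tmpdir = tempfile.mkdtemp()
os.chdir(tmpdir)
with open('tele.txt', 'w') as f:
    f.write('IJKL')
with open('file.txt', 'w') as f:
    f.write('ABCD\ntele.txt')

result = SAK_replace('Welcome\nfile.txt\nThanks')
print(result.splitlines())
# → ['Welcome', 'ABCD', 'IJKL', 'Thanks']
```

Note that file.txt itself references tele.txt, and the nested reference is expanded too; that is the "any level of nesting" claim in action.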
You can try:
s = """Welcome
How are you
here is file.txt
everything alright
here is signature.txt
Thanks"""
data = s.split("\n")
match = ['.txt']
all_matches = [s for s in data if any(xs in s for xs in match)]
for index, item in enumerate(data):
    if item in all_matches:
        data[index] = "XYZ"
data = "\n".join(data)
print(data)
Output:
Welcome
How are you
XYZ
everything alright
XYZ
Thanks
Added new requirement:
def file_obj(filename):
    fo = open(filename, "r")
    s = fo.read()
    data = s.split("\n")
    match = ['.txt']
    all_matches = [s for s in data if any(xs in s for xs in match)]
    for index, item in enumerate(data):
        if item in all_matches:
            file_obj(item)
            data[index] = "XYZ"
    data = "\n".join(data)
    print(data)

file_obj("first_filename")
We can create a temporary file object, keep the replaced lines in it, and once every line has been processed, replace the original file's content with the new content. The temporary file is deleted automatically when execution leaves the 'with' statement.
import tempfile
import re

file_pattern = re.compile(r'(\w+)\.txt')
original_content_file_name = 'sample.txt'
"""
sample.txt should have this content.
Welcome
How are you
here is file.txt
everything alright
here is signature.txt
Thanks
"""
replaced_file_str = None

def replace_file_content():
    """
    Replace the file content using a temporary file object.
    """
    def read_content(file_name):
        # The matched file name is read and returned for replacing.
        content = ""
        with open(file_name) as fileObj:
            content = fileObj.read()
        return content

    # Read the file and keep the replaced text in a temporary file object
    # (the tempfile object is deleted automatically when the 'with' exits).
    with open(original_content_file_name, 'r') as file_obj, tempfile.NamedTemporaryFile(mode='w+') as tmp_file:
        for line in file_obj.readlines():
            if line.strip().startswith("here is") and line.strip().endswith(".txt"):
                file_path = re.search(file_pattern, line).group()
                line = read_content(file_path) + '\n'
            tmp_file.write(line)
        tmp_file.seek(0)
        # Assign the replaced value to this variable.
        replaced_file_str = tmp_file.read()
    # Replace the original file with the new content.
    with open(original_content_file_name, 'w+') as file_obj:
        file_obj.write(replaced_file_str)

replace_file_content()

Optimizing python file search?

I'm having some trouble optimizing this part of my code. It works, but seems unnecessarily slow.
The function searches for searchString in a file, starting on line line_nr, and returns the line number of the first hit.
import linecache

def searchStr(fileName, searchString, line_nr=1, linesInFile=0):
    # searchString is the input to this function.
    # line_nr is needed to start the search after a certain line.
    # linesInFile is the total number of lines in the file.
    while line_nr < linesInFile + 1:
        line = linecache.getline(fileName, line_nr)
        has_match = line.find(searchString)
        if has_match >= 0:
            return line_nr
        line_nr += 1
I've tried something along these lines, but never managed to implement the "start on a certain line number"-input.
Edit: the use case. I'm post-processing analysis files containing text and numbers that are split into different sections with headers. The headers at line_nr are used to break out chunks of the data for further processing.
Example of call:
startOnLine = searchStr(fileName, 'Header 1', 1, 10000000)
endOnLine = searchStr(fileName, 'Header 2', startOnLine, 10000000)
Why don't you start with the simplest possible implementation?
def search_file(filename, target, start_at=0):
    with open(filename) as infile:
        for line_no, line in enumerate(infile):
            if line_no < start_at:
                continue
            if line.find(target) >= 0:
                return line_no
    return None
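Exercising that simple version on a throwaway file (the header names and temp file are invented for the demo; note it returns 0-based line numbers, unlike the 1-based searchStr above):

```python
import os
import tempfile

def search_file(filename, target, start_at=0):
    """Return the 0-based number of the first line at or after
    start_at that contains target, or None if there is no match."""
    with open(filename) as infile:
        for line_no, line in enumerate(infile):
            if line_no < start_at:
                continue
            if target in line:
                return line_no
    return None

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'w') as f:
    f.write("Header 1\n1 2 3\nHeader 2\n4 5 6\n")

start = search_file(path, 'Header 1')           # → 0
end = search_file(path, 'Header 2', start + 1)  # → 2
print(start, end)
os.remove(path)
```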
I guess your file is like:
Header1 data11 data12 data13..
name1 value1 value2 value3...
...
...
Header2 data21 data22 data23..
nameN valueN1 valueN2 valueN3..
...
Does the 'Header' string contain any constant format (i.e. do they all start with '#' or something similar)? If so, you can read each line directly, check whether it matches that format (e.g. if line[0] == '#'), and write different code for the different kinds of lines (definition lines and data lines in your example).
Record class:
class Record:
    def __init__(self):
        self.data = {}
        self.header = {}

    def set_header(self, line):
        ...

    def add_data(self, line):
        ...
iterate part:
def parse(p_file):
    record = None
    for line in p_file:
        if line[0] == "#":
            if record:
                yield record  # emit the finished record...
            record = Record()  # ...and start a fresh one
            record.set_header(line)
        else:
            record.add_data(line)
    yield record
main func:
data_file = open(...)
for rec in parse(data_file):
    ...
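Filling in the two stubbed Record methods with one possible implementation (whitespace-separated fields, which is an assumption about the file format) and running the parser over in-memory sample data:

```python
import io

class Record:
    def __init__(self):
        self.data = {}
        self.header = {}

    def set_header(self, line):
        # Assumed format: "#HeaderName field1 field2 ..."
        parts = line.lstrip('#').split()
        self.header['name'] = parts[0]
        self.header['fields'] = parts[1:]

    def add_data(self, line):
        # Assumed format: "name value1 value2 ..."
        parts = line.split()
        self.data[parts[0]] = parts[1:]

def parse(p_file):
    record = None
    for line in p_file:
        line = line.rstrip('\n')
        if not line:
            continue
        if line[0] == '#':
            if record:
                yield record   # emit the finished record...
            record = Record()  # ...and start a fresh one
            record.set_header(line)
        else:
            record.add_data(line)
    if record:
        yield record           # don't drop the final record

data_file = io.StringIO("#Header1 a b\nrow1 1 2\n#Header2 c d\nrow2 3 4\n")
records = list(parse(data_file))
print([r.header['name'] for r in records])
# → ['Header1', 'Header2']
```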
