I am trying to do a calculation based on the content of a line, but only if another line in the same document satisfies specific criteria. The order of the lines is not consistent.
A file might look like this:
Line A: 200
Line B: 200
Line C: 5
So an example condition would be: if Line C is 6 or greater, add the value from Line A (200) to a counter.
I have tried a variety of if statements, and also tried setting a boolean flag. I haven't been able to get either to work. An excerpt of my latest attempt follows:
counter = 0
good = True
for line in text:
    line = line.strip()
    if line.startswith('Line C') :
        rtime = re.findall('[0-9]+:[0-9]+', line)
        for t in rtime:
            if t < 6 :
                good = False
                print("-----To Small. Ignore Line A")
                break
            else :
                good = True
    while good == True :
        if line.startswith('Line A') :
            numstring = re.findall('[0-9]+', line)
            for num in numstring:
                temp = float(num)
                counter = counter + temp
        else : continue
    print("----- good must be False. Should be ignoring Line A")
First, read all the rows from the file into a dictionary so that you have:
{'Line A':200, 'Line B':200, 'Line C':5}
After this it is easy to apply the criteria with conditionals such as "if values['Line C'] >= 6:".
I am leaving the implementation to you because it sounds a bit homework-y. Let me know if you need more help!
Maybe you can use a dictionary if the lines aren't too long. A simple way would be to add the lines to a dictionary and then check your condition.
import re
allDataLines = []
allQualifierLines = []
dataFileName = 'testdata.txt'
def chckForQualifierLine(line):
    # lines containing qualifier
    if not line.startswith('Line C'):
        return False
    # do more checks here, if not good just return False
    allQualifierLines.append(line)
    return True
def chckForDataLine(line):
    # lines containing data
    if not line.startswith('Line A'):
        return False
    # do more checks here, if not good just return False
    allDataLines.append(line)
    return True
with open(dataFileName, 'r') as text:
    # Further file processing goes here
    line = text.readline()
    while line != '':
        #print(line, end='')
        if not chckForQualifierLine(line):
            chckForDataLine(line)
        line = text.readline()
for qualifierLine in allQualifierLines:
    # this line is valid qualifier
    print(f"Q: {qualifierLine}")
for dataLine in allDataLines:
    # do with data line(s) what ever is to do here
    print(f"D: {dataLine}")
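The dictionary approach suggested in the answers can be sketched roughly as follows. This is only a sketch under assumptions: the helper names parse_values and add_if_qualified are made up for illustration, and it assumes every line has the form "Label: number" as in the sample file.

```python
import re

def parse_values(text):
    """Map each 'Label: number' line to a float, e.g. {'Line A': 200.0}."""
    values = {}
    for line in text.splitlines():
        # Hypothetical line format: a label, a colon, then a number.
        match = re.match(r'\s*(.+?):\s*([0-9]+(?:\.[0-9]+)?)\s*$', line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values

def add_if_qualified(text, counter=0.0, threshold=6):
    """Add Line A to counter only when Line C is at least threshold."""
    values = parse_values(text)
    if values.get('Line C', 0) >= threshold and 'Line A' in values:
        counter += values['Line A']
    return counter

print(add_if_qualified("Line A: 200\nLine B: 200\nLine C: 5"))  # 0.0
print(add_if_qualified("Line C: 7\nLine A: 200"))               # 200.0
```

Because the whole file is read into the dictionary before the condition is checked, the order of the lines no longer matters.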
I am writing a small piece of code which parses latex files and gives me the newcommands defined. A test case would be this simple latex file:
% +--------------------------------------------------------------------+
% | |
% | New particle stuff |
% | |
% +--------------------------------------------------------------------+
\newcommand*{\Hmp}{\ensuremath{H^{\mp}}\xspace}
\newcommand*{\susy}[1]{\ensuremath{\tilde{#1}}\xspace}
\newcommand*{\susy2}[1,2]{\ensuremath{\tilde{#1}\tilde{#2}}\xspace}
There might be a more complicated case where the command spans several lines, so I need to keep track of different states, such as whether the command needs to be extended with more lines or whether it has finished and is ready to be stored.
The thing is that, within a couple of nested if/else statements, the scope of the variables seems lost and the variables are not updated anymore. Here is what I am doing:
macros = []
warg_keep = re.compile("newcommand\*\{(.*)\}\[(.*)\]\{(.*)")
woarg_keep = re.compile("newcommand\*\{(.*)\}\{(.*)")
warg_one = re.compile("newcommand\*\{(.*)\}\[(.*)\]\{(.*)\}")
woarg_one = woarg = re.compile("newcommand\*\{(.*)\}\{(.*)\}")
keep = False
for line in open(file).readlines():
    line = line.strip()
    if len(line) == 0 or line[0] == "%":
        continue
    if not keep:
        newcommand = {"key":"","command":"","args":[]}
        added = False
    if "newcommand" in line:
        if line[-1] == "%":
            clean_line = line[0:-1]
            keep = True
            newcommand = get_cmd_from_line(warg_keep,woarg_keep,clean_line)
        else:
            newcommand = get_cmd_from_line(warg_one, woarg_one, line)
            added = True
    elif keep:
        # Now it does not matter how it ends, the command will always be added the line without the
        # last character, it can be either % or } but it shouldn't be added
        newcommand["command"] += line[0:-1]
        # End the keep
        if line[-1] != "%":
            keep = False
            added = True
    elif added:
        macros.append(newcommand)
The issue is that when I assign to the newcommand variable the value I get from the get_cmd_from_line function (which I have tested and works perfectly), it doesn't update the newcommand variable, but if I move it above the previous if then it recognizes it and updates it. The same thing happens with the keep and added variables.
I have searched for this and found a lot of things about scopes/if/functions etc. which I already knew, and since ifs shouldn't define scope I don't know why this is happening... am I missing something stupid? How should I update the value of the newcommand variable, since it might get updated as new lines come in? The only solution I see is to flatten the code, but I would like to keep it like this.
EDIT: I changed the original code a little to accommodate extra features of the text, but without flattening the code it doesn't work either. So the code at the top is not working for the reason I mention. The code below works perfectly and passes all tests:
macros = []
warg_keep = re.compile("newcommand\*\{(.*)\}\[(.*)\]\{(.*)")
woarg_keep = re.compile("newcommand\*\{(.*)\}\{(.*)")
warg_one = re.compile("newcommand\*\{(.*)\}\[(.*)\]\{(.*)\}")
woarg_one = woarg = re.compile("newcommand\*\{(.*)\}\{(.*)\}")
keep = False
for line in open(file).readlines():
    line = line.strip()
    if len(line) == 0 or line[0] == "%":
        continue
    if not keep:
        newcommand = {"key":"","command":"","args":[]}
        added = False
    if "newcommand" in line and line [-1] == "%":
        clean_line = line[0:-1]
        keep = True
        newcommand = get_cmd_from_line(warg_keep,woarg_keep,clean_line)
    if "newcommand" in line and line[-1] != "%":
        newcommand = get_cmd_from_line(warg_one, woarg_one, line)
        added = True
    if not "newcommand" in line and keep:
        # Now it does not matter how it ends, the command will always be added the line without the
        # last character, it can be either % or } but it shouldn't be added
        newcommand["command"] += line[0:-1]
    if not "newcommand" in line and keep and line[-1] != "%":
        # End the keep
        keep = False
        added = True
    if added:
        macros.append(newcommand)
On a cursory inspection, my first guess is that elif keep and elif added should be if keep and if added, respectively.
Another possibility is that you are expecting newcommand to accumulate from one line to the next, but you are resetting it on each pass. Should newcommand = { … } be moved in front of the for line in …:?
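To illustrate the first guess, here is a stripped-down sketch (the function names with_elif and with_if are made up, and the real regex parsing is left out). An elif branch is only evaluated when every earlier branch of the same chain was false, so a flag set inside the first branch cannot trigger the elif in the same loop iteration:

```python
def with_elif(lines):
    macros = []
    for line in lines:
        added = False
        if "newcommand" in line:
            added = True
        elif added:
            # Never runs for a newcommand line: the first branch already
            # matched, so the rest of the chain is skipped.
            macros.append(line)
    return macros

def with_if(lines):
    macros = []
    for line in lines:
        added = False
        if "newcommand" in line:
            added = True
        if added:
            # A separate if statement is always evaluated.
            macros.append(line)
    return macros

sample = [r"\newcommand*{\Hmp}{x}", r"\newcommand*{\susy}[1]{y}"]
print(len(with_elif(sample)))  # 0
print(len(with_if(sample)))    # 2
```

So the variables are being updated; the append is simply never reached, which can look like a scoping problem.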
The scope of local variables in Python (like many other scripting languages) is at function level, and not at block level.
Example:
def function():
    x = 5
    if True:
        y = 8
    print(x)
    print(y)
function()
# -> 5
# -> 8
I'm trying to write a function that reads through a text file until it finds a word (say "hello"), then print the next x lines of string starting with string 1 (say "start_description") until string 2 (say "end_description").
hello
start_description 123456 end_description
The function should look like description("hello") and the following output should look like
123456
It's a bit hard to explain. I know how to find the certain word in the text file but I don't know how to print, as said, the next few lines between the two strings (start_description and end_description).
EDIT1:
I found some code that prints the next 8, 9, ... lines. But because the text between the two strings is of variable length, that does not work.
EDIT2:
Basically it's the same question as in this post: Python: Print next x lines from text file when hitting string, but the range(8) does not work for me (see EDIT1).
The input file could look like:
HELLO
salut
A: 123456.
BYE
au revoir
A: 789123.
The code should then look like:
import re
def description(word):
    doc = open("filename.txt",'r')
    word = word.upper()
    for line in doc:
        if re.match(word,line):
            #here it should start printing all the text between start_description and end_description, for example 123456
    return output
print description("hello")
123456
print description("bye")
789123
Here's a way using split:
start_desc = 'hello'
end_desc = 'bye'
text = 'hello 12345\nabcd asdf\nqwer qwer erty\n bye'
print(text.split(start_desc)[1].split(end_desc)[0])
The first split will result in:
['', ' 12345\nabcd asdf\nqwer qwer erty\n bye']
So feed the second element to the second split and it will result in:
[' 12345\nabcd asdf\nqwer qwer erty\n ', '']
Use the first element.
You can then use strip() to remove the surrounding spaces if you wish.
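If a marker might be missing, indexing the split result raises an IndexError. A slightly more defensive sketch uses str.partition instead; the helper name between is made up for illustration:

```python
def between(text, start, end):
    """Return the text between the first start marker and the next end marker."""
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None  # start marker not present
    body, found_end, _ = rest.partition(end)
    if not found_end:
        return None  # end marker not present after the start marker
    return body.strip()

sample = "hello\nstart_description 123456 end_description\n"
print(between(sample, "start_description", "end_description"))  # 123456
```

partition always returns three parts, so the function can return None instead of raising when either marker is absent.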
def description(infilepath, startblock, endblock, word, startdesc, enddesc):
    with open(infilepath) as infile:
        inblock = False
        name = None
        found = False
        answer = []
        for line in infile:
            if found and not inblock: return answer
            if line.strip() != startblock and not inblock: continue
            if line.strip() == startblock: inblock = True
            elif line.strip() == endblock: inblock = False
            if not line.startswith(startdesc):
                name = line.strip()
                continue
            if name is not None and name != word: continue
            # slice the markers off; lstrip/rstrip would remove any of their characters, not the exact marker
            text = line.strip()[len(startdesc):]
            if text.endswith(enddesc): text = text[:-len(enddesc)]
            answer.append(text.strip())
            found = True
    return answer
I have a small script which goes like this:
(Notice the comparison)
def readVariations(path_to_source_config):
    varsTuple = []
    source_file = open(path_to_source_config, "r")
    for line in source_file:
        line_no_spaces = line.replace(" ","")
        if line_no_spaces[0] == '[':
            current_line_ = line_no_spaces.replace("[", "")
            current_line = current_line_.replace("]", "")
            section = "ExecutionOptimizer"
            if current_line == section:
                print current_line
                #_tuple = section_name, option_name, range_start, range_end, step
                #varsTuple.append(_tuple)
    return varsTuple
It reads a config file (.cfg) and needs to check whether it contains a particular string.
The following line comes up in the config file:
[ExecutionOptimizer]
For some reason the comparison fails even when the same string is encountered in the file.
Can you please tell me why?
I suspect line ends with a newline character, and it remains there throughout all of your replace operations. Then your comparison fails because "ExecutionOptimizer\n" doesn't equal "ExecutionOptimizer". You can discard the newline using strip:
line_no_spaces = line.strip().replace(" ","")
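A small sketch of that fix, working on a list of lines so it can be tried without a file. The helper name has_section is made up for illustration:

```python
def has_section(lines, section):
    """Return True if any line is a [section] header, ignoring spaces and the newline."""
    for line in lines:
        cleaned = line.strip().replace(" ", "")
        if cleaned.startswith("[") and cleaned.endswith("]"):
            if cleaned[1:-1] == section:
                return True
    return False

config_lines = ["# settings\n", "[ExecutionOptimizer]\n", "step = 1\n"]
print("ExecutionOptimizer\n" == "ExecutionOptimizer")    # False: the newline is part of the string
print(has_section(config_lines, "ExecutionOptimizer"))   # True
```

The first print shows why the original comparison fails: lines read from a file keep their trailing newline until it is stripped.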
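Don't use the "is" keyword for this: "is" tests whether two names refer to the same object.
"==" is for equality testing, which is what you want when comparing strings.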
From a Python interpreter:
> a = 'tea'
> b = ''.join(['t', 'e', 'a'])
> a == b
True
> a is b
False
My function doesn't work as it is supposed to. I keep getting 'True' when all line[0] are less than line[2]. I know this is pretty trivial, but it's an exercise I've taken to better understand files and for
def contains_greater_than(filename):
    """
    (str) --> bool
    The text file of which <filename> is the name contains multiple lines.
    Each line consists of two integer numbers, separated by a space.
    This returns True iff in at least one of those lines, the first number
    is larger than the second one.
    """
    lines = open(filename).readlines()
    for line in lines:
        if line[0] > line[2]:
            return True
    return False
my data:
3 6
3 7
3 8
2 9
3 20
Having been thoroughly schooled in my over-thought previous answer, may I offer this far simpler solution which still short-circuits as intended:
for line in lines:
    x, y = line.split()
    if int(x) > int(y): return True
return False
line[0] is "3" and line[1] is " " for every line in your data, and line[2] is only the first character of the second number.
For "3 20", line[2] is "2", so '3' > '2' is True.
You need to do
split_line = line.split()
then
numbers = [int(x) for x in split_line]
then look at numbers[0] and numbers[1]
1) You are comparing strings that you need to convert to integers
2) You will only grab the first and third character (so, you won't get the 0 in 20)
Instead use
first, second = map(int, line.split())
if first > second:
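The difference between the two comparisons is easy to see on the last line of the sample data. This is just an illustrative sketch; the helper name first_is_greater is made up:

```python
def first_is_greater(line):
    """Compare the two whitespace-separated integers on a line."""
    first, second = map(int, line.split())
    return first > second

line = "3 20"
print(line[0] > line[2])       # True: compares the characters '3' and '2'
print(first_is_greater(line))  # False: compares the integers 3 and 20
```

Indexing the string compares single characters by their character codes, which is why the original function returns True for this data.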
Here's a whole-hog functional rewrite. Hope this is enlightening ;-)
import functools
def line_iter(fname):
    with open(fname) as inf:
        for line in inf:
            line = line.strip()
            if line:
                yield line
def any_line(fn, fname):
    return any(fn(line) for line in line_iter(fname))
def is_greater_than(line):
    # split on whitespace first; iterating over the raw string would yield single characters
    a,b = [int(i) for i in line.split()]
    return a > b
contains_greater_than = functools.partial(any_line, is_greater_than)
"3 20" is a string; just do map(int, line.split()) first.
But how do you want to compare 2 numbers with 2 numbers?
The main problem is that you are comparing characters of the line, not the values of the two numbers on it. This can be avoided by first splitting the line into whitespace-separated words, and then turning those into integer values for the comparison by applying the int() function to each one:
def contains_greater_than(filename):
    with open(filename) as inf:
        for line in inf:
            a, b = map(int, line.split())
            if a > b:
                return True
    return False
print(contains_greater_than('comparison_data.txt'))
This can all be done very succinctly in Python using the built-in any() function with a couple of generator expressions:
def contains_greater_than(filename):
    with open(filename) as inf:
        return any(a > b for a, b in (map(int, line.split()) for line in inf))
I wasted plenty of hours trying to figure out the problem, but no luck. I tried asking the TA at my school, but he was no help. I am a beginner and I know there are a lot of mistakes in it, so it would be great if I could get a detailed explanation as well. Basically, what I am trying to do with the following function is:
Use a while loop to check whether random_string is in TEXT; if not, return None.
If yes, use a for loop to read lines from TEXT and put them in a list, l1.
Then write an if statement to see if random_string is in l1.
If it is, do some calculations.
Otherwise, read the next line.
Finally, return the calculations as a whole.
TEXT = open('randomfile.txt')
def random (TEXT, random_string):
    while random_string in TEXT:
        for lines in TEXT:
            l1=TEXT.readline().rsplit()
            if random_string in l1:
                '''
                do some calculations
                '''
            else:
                TEXT.readline() #read next line???
        return #calculations
    return None
Assuming the calculation is a function of the line, then:
def my_func(fileobj,random_string,calculation_func):
    return [calculation_func(line) for line in fileobj if random_string in line] or None
otherwise, you could do this:
def my_func(fileobj,random_string):
    calculated = []
    for line in fileobj:
        if random_string in line:
            #do calculations, append to calculated (placeholder: stores the raw line)
            calculated.append(line)
    return calculated or None
I omitted the while loop because it would needlessly increase the complexity of the function. fileobj is assumed to be a file-like object, such as a buffer or the object returned by open.
Edit With while loop:
def my_func(fileobj,random_string):
    calculated = []
    while True:
        line = fileobj.readline()
        if not line: #readline returns '' at end of file; it does not raise EOFError
            break
        if random_string in line:
            #do calculations, append to calculated (placeholder: stores the raw line)
            calculated.append(line)
    return calculated or None
Edit 2
Taking into account the OP's new information
def my_func(fileobj,random_string):
    total = 0
    number = 0
    while True:
        line = fileobj.readline()
        if not line: #end of file
            break
        if random_string in line:
            total += float(line.split()[1])
            number += 1
    if number == 0:
        return 0 #or whatever default value if random_string isn't in the file
    return total/number
Shorter version:
def my_func(fileobj,random_string):
    results = [float(line.split()[1]) for line in fileobj if random_string in line]
    return sum(results)/len(results) if results else 0 #avoid dividing by zero when nothing matched
Maybe?:
def my_func(ccy):
    with open('randomfile.txt', 'r') as f:
        l1 = [float(line.split()[-1]) for line in f.readlines() if ccy in line]
    if l1:
        return sum(l1) / len(l1)
    else:
        return None
If I can clarify your requirements:
Use while loop to check and see if random_string is in a file, if not return None.
Collect lines that have random_string in a list.
Do some calculations on the lines collected and return the result of the calculations.
Then the following should get you started:
def calculate(filename='somefile.txt', random_string='needle'):
    calculation_lines = []
    with open(filename) as the_file:
        for line in the_file:
            if random_string in line:
                calculation_lines.append(line)
    if not calculation_lines:
        return None # no lines matched
    for i in calculation_lines:
        # do some calculations
        result_of_calculations = 42
    return result_of_calculations