matching parentheses using a file - python

I am trying to check whether the brackets on each line of a file are balanced. When I type the input in directly instead of reading it from a file, the code below works and matches the brackets, but I don't know how to make it work when there are letters and numbers in between.
I have tried many different approaches, and this one has worked best so far. I think something is wrong with what I am printing, but I have tried everything I know to fix it. I am also new to Python, so this may not be the best code.
class Stack:
    def __init__(self):
        self._items = []
    def isEmpty(self):
        return self._items == []
    def push(self,item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()
stack = Stack()
open_list = ["[","{","("]
close_list = ["]","}",")"]
def open_file():
    file = open("testing.txt","r")
    testline = file.readline()
    count = 1
    while testline != "":
        testline = testline[:-1]
        check(testline,count)
        testline = file.readline()
        count = count + 1
def check(testline,count):
    stack = []
    for i in testline:
        if i in open_list:
            stack.append(i)
        elif i in close_list:
            pos = close_list.index(i)
            if ((len(stack) > 0) and
                (open_list[pos] == stack[len(stack)-1])):
                stack.pop()
            else:
                print ("Unbalanced")
                print (count)
    if len(stack) == 0:
        print ("Balanced")
        print (count)
def main():
    open_file()
if __name__=="__main__":
    main()
output:
if the file contains
dsf(hkhk[khh])
ea{jhkjh[}}
hksh[{(]
sd{hkh{hkhk[hkh]}}]
the output is
Balanced
1
Unbalanced
2
Unbalanced
2
Unbalanced
3
Unbalanced
4
Balanced
4
The first four lines are handled correctly, but two extra results are printed and I have no idea where they come from. I need the count later, when I print messages such as "line 1 is balanced".

Time to learn the basics of debugging...
@emilanov has given hints for the open_file function, so I will focus on the check function.
for i in range (0,len(testline),1):
Is probably not what you want: i will take integer values from 0 to len(testline) -1. The rule is: when things go wrong, use a debugger or add trace prints. Here
for i in range (0,len(testline),1):
    print(i) # trace - comment out for production code
would have made the problem evident.
What you want is probably:
for i in testline:

There are some problems with your open_file() function.
The while loop finishes only when testline == "" is true. So when you later call check(testline), you actually pass the function an empty string, and it can't do its job.
I assume the purpose of the while loop is to remove the newline character \n for each line in the file? The problem is you're not saving the intermediate lines anywhere. Then when file.readline() returns a "" because the file doesn't have any more lines, you give that empty string to the function.
Some suggestions
# A way to remove newlines
testline = testline.replace("\n", "")
# Check all the lines
lines = file.readlines()
count = len(lines)
for testline in lines:
    testline = testline.replace("\n", "")
    check(testline)
# And if you're sure that the file will have only one line
testline = file.readline()[:-1] # read line and remove '\n'
check(testline)
Remember, a string is a sequence of characters, so len(string) gives its length, and len(file.readlines()) gives the number of lines in a file. Either way, you can get rid of the count variable.
Printing
When you call print(check()) it first calls the check() function with no parameters, so it can't actually check anything. That's why you can't see the right print statement.
A suggested edit would be to move the print statement at the end of your open_file() function, so that you have print(check(testline))
Another possible solution would be to put a return statement in your open_file() function.
def open_file():
    # Some code...
    return check(testline)
def check():
    # Some code...
print(open_file())
The easiest will probably be to replace the return statements in check() with print statements though.
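The duplicated results in the question come from printing inside the loop: every mismatched closing bracket prints "Unbalanced", and the final len(stack) == 0 test still runs after a mismatch. Here is a minimal sketch of one way around that, assuming it is acceptable for the checker to return a boolean and stop at the first mismatch. The names PAIRS, is_balanced and report are illustrative and not from the original code.

```python
# Map each closing bracket to the opener it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        # Any other character (letters, digits, spaces) is ignored.
    return not stack  # leftover openers mean the line is unbalanced

def report(lines):
    """Build one message per line, numbering lines from 1."""
    results = []
    for number, line in enumerate(lines, start=1):
        state = "balanced" if is_balanced(line.rstrip("\n")) else "unbalanced"
        results.append("line {} is {}".format(number, state))
    return results

if __name__ == "__main__":
    sample = ["dsf(hkhk[khh])\n", "ea{jhkjh[}}\n", "hksh[{(]\n", "sd{hkh{hkhk[hkh]}}]\n"]
    for message in report(sample):
        print(message)
```

Because report() accepts any iterable of lines, an open file object can be passed in place of the sample list, which also removes the need for a manual count variable.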

Related

My code is missing some of the lines im trying to get out of a file

The basic task is to write a function, get_words_from_file(filename), that returns a list of lower case words that are within the region of interest. We are given a regular expression, "[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", that finds all words that meet this definition. My code works well on some of the tests but fails when the line that indicates the region of interest is repeated.
Here is my code:
import re
def get_words_from_file(filename):
    """Returns a list of lower case words that are with the region of
    interest, every word in the text file, but, not any of the punctuation."""
    with open(filename,'r', encoding='utf-8') as file:
        flag = False
        words = []
        count = 0
        for line in file:
            if line.startswith("*** START OF"):
                while count < 1:
                    flag=True
                    count += 1
            elif line.startswith("*** END"):
                flag=False
                break
            elif(flag):
                new_line = line.lower()
                words_on_line = re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+",
                                           new_line)
                words.extend(words_on_line)
    return words
#test code:
filename = "bee.txt"
words = get_words_from_file(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(words)))
print("Valid word list:")
for word in words:
    print(word)
The issue is that a line starting with "*** START OF" appears again inside the region of interest, and my code skips it instead of including its words.
The test code should result in:
bee.txt loaded ok.
16 valid words found.
Valid word list:
yes
really
this
time
start
of
synthetic
test
case
end
synthetic
test
case
i'm
in
too
But I'm getting:
bee.txt loaded ok.
11 valid words found.
Valid word list:
yes
really
this
time
end
synthetic
test
case
i'm
in
too
Any help would be great!
Attached is a screenshot of the file
The specific problem in your code is the if .. elif .. elif statement: you ignore every line that looks like a start or end marker, even when it appears inside the test block.
You wanted something like this for your function:
def get_words_from_file(filename):
    """Returns a list of lower case words that are with the region of
    interest, every word in the text file, but, not any of the punctuation."""
    with open(filename, 'r', encoding='utf-8') as file:
        in_block = False
        words = []
        for line in file:
            if not in_block and line == "*** START OF A SYNTHETIC TEST CASE ***\n":
                in_block = True
            elif in_block and line == "*** END TEST CASE ***\n":
                break
            elif in_block:
                words_on_line = re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", line.lower())
                words.extend(words_on_line)
    return words
This is assuming you are actually looking for the whole line as a marker, but of course you can still use .startswith() if you actually accept that as the start or end of the block, as long as it's sufficiently unambiguous.
Your idea of using a flag is fine, although naming a flag after whatever it represents is always a good idea.

extracting the data of a specified function in a python file | adding comments to highlight what is removed

I want to extract the code written under a specified function. I am trying to do it like this:
With an example file TestFile.py containing the following function sub():
def sub(self,num1,num2):
    # Subtract two numbers
    answer = num1 - num2
    # Print the answer
    print('Difference = ',answer)
If I run get_func_data.py:
def giveFunctionData(data, function):
    dataRequired = []
    for i in range(0, len(data)):
        if data[i].__contains__(str(function)):
            startIndex = i
            for p in range(startIndex + 1, len(data)):
                dataRequired.append(data[p])
                if data[p].startswith('\n' + 'def'):
                    dataRequired.remove(dataRequired[len(dataRequired) - 1])
                    break
    print(dataRequired)
    return dataRequired
data = []
f = open("TestFile.py", "r")
for everyLine in f:
    if not(everyLine.startswith('#') or everyLine.startswith('\n' + '#')):
        data.append(everyLine)
giveFunctionData(data,'sub') # Extract content in sub() function
I expect to obtain the following result:
answer = num1 - num2
print('Difference = ',answer)
But here I get the comments written inside the function as well. Also, instead of a list, is there a way to get the code as it is written in the file?
Returning a string from your function giveFunctionData()
In your function giveFunctionData() you create dataRequired as a list, append lines to it, and return it, so of course you get a list back.
You'd have to join the list back into a string. One way could be this:
# Join the list into a string (each line already ends with '\n')
function_content = ''
for line in dataRequired:
    function_content += line
# function_content now contains your desired string
The reason you're still getting comment lines
Iterating over a file object returned by open() yields the lines one at a time, each ending with \n. No line ever begins with \n, so .startswith('\n' + '#') can never match. The comments inside sub() are also indented, so .startswith('#') does not match them either.
General comments
There is no need to build the string from separate newline and # characters as you did in .startswith('\n' + '#'). '\n#' would have been fine.
If you intend for the file to be run as a script, you really should put the code to be run inside an if __name__ == "__main__": conditional. See "What does if __name__ == "__main__": do?"
It might be cleaner to move the reading of the file object into your giveFunctionData() function. It also eliminates having to iterate over it multiple times.
Putting it all together
Note that this script cannot strip a comment placed on the same line as code: because it skips any line containing #, a line such as some = statement # With comments is dropped entirely.
def giveFunctionData(data, function):
    function_content = ''
    # Tells us whether to append lines to the `function_content` string
    record_content = False
    for line in data:
        if not record_content:
            # Once we find a match, we start recording lines
            if function in line:
                record_content = True
        else:
            # We keep recording until we encounter another function
            if line.startswith('def'):
                break
            elif line.isspace():
                continue
            elif '#' not in line:
                # Add line to `function_content` string
                function_content += line
    return function_content
if __name__ == "__main__":
    # The with block closes the file once we are done reading it
    with open("TestFile.py") as script:
        output = giveFunctionData(script, 'sub')
    print(output)
I have written code that can do your task. I don't think you need two separate processing steps, one to read the file and one to fetch the function body.
Instead, create a single function that accepts two arguments, the file name and the function name, and returns the code you want.
I have created the function getFunctionCode(filename,funcname). The code is working well.
def getFunctionCode(filename, funcname):
    data = []
    with open(filename) as fp:
        line = fp.readlines()
    startIndex = 0 #From where to start reading body part
    endIndex = 0 #till what line because file may have mult
    for i in range(len(line)): #Finding Starting index
        if(line[i].__contains__(funcname)):
            startIndex = i+1
            break
    for i in range(startIndex,len(line)):
        if(line[i].__contains__('def')): #Find end in case - multiple function
            endIndex = i
            break
    else:
        endIndex = len(line)
    for i in range(startIndex,endIndex):
        temp = line[i].strip()
        if(temp != '' and not temp.startswith('#')): #Skip blank lines and comments
            data.append(line[i].rstrip('\n'))
    return(data)
The function reads the file named by the first argument.
It then finds the index of the line that contains the function name given as the second argument.
Starting from that index, it strips each line and checks how it begins to detect comments (#) and blank lines.
Finally, the lines that are neither are appended to the result.
Here is the file TestFile.py:
def sub(self,num1,num2):
    # Subtract two numbers
    answer = num1 - num2
    # Print the answer
    print('Difference = ',answer)
def add(self,num1,num2):
    # addition of two numbers
    answer = num1 + num2
    # Print the answer
    print('Summation = ',answer)
def mul(self,num1,num2):
    # Product of two numbers
    answer = num1 * num2
    # Print the answer
    print('Product = ',answer)
Execution of function :
getFunctionCode('TestFile.py','sub')
[' answer = num1 - num2', " print('Difference = ',answer)"]
getFunctionCode('TestFile.py','add')
[' answer = num1 + num2', " print('Summation = ',answer)"]
getFunctionCode('TestFile.py','mul')
[' answer = num1 * num2', " print('Product = ',answer)"]
Solution by MoltenMuffins is easier as well.
Your implementation of this function would fail badly if TestFile.py contains multiple functions and you only want the source code of specific ones. It would also fail if some variables are defined between two function definitions in TestFile.py.
A more robust and simpler way to extract the source code of a function from TestFile.py is to use the inspect.getsource() function as follows:
#Import necessary packages
import os
import sys
import importlib
import inspect
#This function takes as input your python .py file and the function name in the .py file for which code is needed
def giveFunctionData(file_path,function_name):
    folder_path = os.path.dirname(os.path.abspath(file_path))
    #Make the folder containing the .py file importable
    if folder_path not in sys.path:
        sys.path.insert(0, folder_path)
    head, tail = os.path.split(file_path)
    module_name = tail.split('.')[0]
    #Import the module and look up the function by name (avoids exec/eval)
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    #Extract the function code with comments
    function_code_with_comments = inspect.getsource(function)
    #Now, filter out the comments from the function code
    function_code_without_comments = ''
    for line in function_code_with_comments.splitlines():
        currentstr = line.lstrip()
        if not currentstr.startswith("#"):
            function_code_without_comments = function_code_without_comments + line + '\n'
    return function_code_without_comments
#Specify absolute path of your python file from which function code needs to be extracted
file_path = "Path_To_Testfile.py"
#Specify the name of the function for which code is needed
function_name = "sub"
#Print the output function code without comments by calling the function "giveFunctionData(file_path,function_name)"
print(giveFunctionData(file_path,function_name))
This method works for any function you need to extract, regardless of how the .py file is formatted, because it lets Python locate the function instead of parsing the file as a string. Note that importing the module runs any top-level code in it.
Cheers!
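If importing the target file is undesirable (importing runs its top-level code), the standard library's ast module can locate the function without executing anything. This is a minimal sketch, assuming Python 3.8 or later for ast.get_source_segment(). The helper name get_function_source and its keep_comments parameter are illustrative, not from any of the answers above.

```python
import ast

def get_function_source(source, function_name, keep_comments=False):
    """Return the source text of the named function, or None if it is absent."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == function_name:
            # Slice the original text from the def line to the last statement.
            segment = ast.get_source_segment(source, node)
            if keep_comments:
                return segment
            # Drop lines that consist only of a comment.
            kept = [ln for ln in segment.splitlines()
                    if not ln.lstrip().startswith("#")]
            return "\n".join(kept)
    return None

if __name__ == "__main__":
    example = (
        "def sub(self,num1,num2):\n"
        "    # Subtract two numbers\n"
        "    answer = num1 - num2\n"
        "    # Print the answer\n"
        "    print('Difference = ',answer)\n"
        "\n"
        "def add(self,num1,num2):\n"
        "    answer = num1 + num2\n"
    )
    print(get_function_source(example, "sub"))
```

For a file on disk, read its text first (for example with open("TestFile.py") as f: source = f.read()) and pass that string in. The comment filter is line-based, so a line inside a multi-line string that happens to start with # would also be dropped.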

python how to decode words using stack?

I have a text file that looks like this:
6
abcd<<<<<<
n<J<g<1<A<
ABCD<<<1>>>2<<3>->>>>>>
I want to decode this file in Python using a stack.
In this file, '<' means the cursor moves left (←),
'>' means the cursor moves right (→),
and '-' means delete the character immediately to the left of the cursor.
So the result I want is
abcd
A1gJn
A1BC32
I tried to write a function to solve this,
but I don't know what is wrong with it.
Here is what I wrote:
def decodeString_stack(string):
    """Recover a string from a keylog string
    input: string(string), output:(decoded string)
    deCoded[ ] : list of decoded string(cursor left)
    temp[ ] : list of decoded string(cursor right)
    """
    deCoded=[]; temp=[]
    for ch in string:
        if ch=='<':
            x=deCoded.pop()
            temp.append(x)
        elif ch=='>':
            x=temp.pop()
            deCoded.append(x)
        elif ch=='-':
            del deCoded[len(deCoded)-1]
    return ''.join(deCoded)
It always stops with an error because the list is empty.
import time
fr=open("input.txt",'r')
fw=open("output_txt",'w')
print('start decoding')
startTime=time.time()
for aLine in fr:
    deCoded=decodeString_stack(aLine)
    print(deCoded)
exeTime=time.time()-startTime
print("decode complete(laspe time= %.2f sec)" %exeTime)
fr.close(); fw.close()
How can I fix it?
Notice that you are defining an empty list, 'deCoded', then immediately trying to 'pop()' from it. For pop to work, something has to be in the list. I suspect this is homework so I won't delve too far into a solution - just note that pop is taking something off the stack, but you can't take something off if nothing exists : )
deCoded=[]; temp=[]
for ch in string:
    if ch=='<':
        x=deCoded.pop()
        temp.append(x)
Without giving away too much, hopefully this will get you started. Also, if you don't use it already, I highly recommend PyCharm. You can get it for free, and it will let you step through your code in a debugger. It will be a life saver.
def decode_line(line : str):
    temp = []
    decoded = []
    for char in line:
        if char == "<":
            if temp:
                c = temp.pop()
                decoded.append(c)
        elif char == ">":
            pass
        elif char == "-":
            pass
        else:
            temp.append(char)
    reversed = decoded[::-1]
    return reversed
with open('source.txt') as source:
    for line in source:
        out = decode_line(line)
        print(out)
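For readers who want to see the whole idea, here is a minimal sketch of the usual two-stack approach: one stack holds the characters to the left of the cursor and the other holds the characters to its right. It assumes that a cursor move or delete with nothing available is simply ignored, which is what the first sample line (abcd<<<<<<) appears to require. The name decode_keylog is illustrative.

```python
def decode_keylog(line):
    """Replay cursor keys: '<' moves left, '>' moves right, '-' deletes left."""
    left = []   # characters before the cursor, in order
    right = []  # characters after the cursor, nearest one on top
    for ch in line.rstrip("\n"):
        if ch == "<":
            if left:                      # ignore a move past the start
                right.append(left.pop())
        elif ch == ">":
            if right:                     # ignore a move past the end
                left.append(right.pop())
        elif ch == "-":
            if left:                      # nothing to delete at the start
                left.pop()
        else:
            left.append(ch)               # ordinary character: type it
    # The right stack is stored top-first, so reverse it when joining.
    return "".join(left) + "".join(reversed(right))

if __name__ == "__main__":
    for sample in ["abcd<<<<<<", "n<J<g<1<A<", "ABCD<<<1>>>2<<3>->>>>>>"]:
        print(decode_keylog(sample))
```

The empty-stack checks are what prevent the "pop from empty list" error in the question, and the final else branch is what actually stores typed characters.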

Counting words in a dictionary (Python)

I have this code, which should open a specified file, count every while loop in it, and finally output the total number of while loops in that file. I decided to convert the input file to a dictionary and then loop over it, adding 1 to WHILE_ every time the word while followed by a space was seen, before finally printing WHILE_ at the end.
However this did not seem to work, and I am at a loss as to why. Any help fixing this would be much appreciated.
This is the code I have at the moment:
WHILE_ = 0
INPUT_ = input("Enter file or directory: ")
OPEN_ = open(INPUT_)
READLINES_ = OPEN_.readlines()
STRING_ = (str(READLINES_))
STRIP_ = STRING_.strip()
input_str1 = STRIP_.lower()
dic = dict()
for w in input_str1.split():
    if w in dic.keys():
        dic[w] = dic[w]+1
    else:
        dic[w] = 1
DICT_ = (dic)
for LINE_ in DICT_:
    if ("while\\n',") in LINE_:
        WHILE_ += 1
    elif ('while\\n",') in LINE_:
        WHILE_ += 1
    elif ('while ') in LINE_:
        WHILE_ += 1
print ("while_loops {0:>12}".format((WHILE_)))
This is the input file I was working from:
'''A trivial test of metrics
Author: Angus McGurkinshaw
Date: May 7 2013
'''
def silly_function(blah):
    '''A silly docstring for a silly function'''
    def nested():
        pass
    print('Hello world', blah + 36 * 14)
    tot = 0 # This isn't a for statement
    for i in range(10):
        tot = tot + i
    if_im_done = false # Nor is this an if
    print(tot)
blah = 3
while blah > 0:
    silly_function(blah)
    blah -= 1
    while True:
        if blah < 1000:
            break
The output should be 2, but my code at the moment prints 0
This is an incredibly bizarre design. You're calling readlines to get a list of strings, then calling str on that list, which will join the whole thing up into one big string with the quoted repr of each line joined by commas and surrounded by square brackets, then splitting the result on spaces. I have no idea why you'd ever do such a thing.
Your bizarre variable names, extra useless lines of code like DICT_ = (dic), etc. only serve to obfuscate things further.
But I can explain why it doesn't work. Try printing out DICT_ after you do all that silliness, and you'll see that the only keys that include while are while and 'while. Since neither of these match any of the patterns you're looking for, your count ends up as 0.
It's also worth noting that you only add 1 to WHILE_ even if there are multiple instances of the pattern, so your whole dict of counts is useless.
This will be a lot easier if you don't obfuscate your strings, try to recover them, and then try to match the incorrectly-recovered versions. Just do it directly.
While I'm at it, I'm also going to fix some other problems so that your code is readable, and simpler, and doesn't leak files, and so on. Here's a complete implementation of the logic you were trying to hack up by hand:
import collections
filename = input("Enter file: ")
counts = collections.Counter()
with open(filename) as f:
    for line in f:
        counts.update(line.strip().lower().split())
print('while_loops {0:>12}'.format(counts['while']))
When you run this on your sample input, you correctly get 2. And extending it to handle if and for is trivial and obvious.
However, note that there's a serious problem in your logic: Anything that looks like a keyword but is in the middle of a comment or string will still get picked up. Without writing some kind of code to strip out comments and strings, there's no way around that. Which means you're going to overcount if and for by 1. The obvious way of stripping—line.partition('#')[0] and similarly for quotes—won't work. First, it's perfectly valid to have a string before an if keyword, as in "foo" if x else "bar". Second, you can't handle multiline strings this way.
These problems, and others like them, are why you almost certainly want a real parser. If you're just trying to parse Python code, the ast module in the standard library is the obvious way to do this. If you want to write quick-and-dirty parsers for a variety of different languages, try pyparsing, which is very nice and comes with some great examples.
Here's a simple example:
import ast
filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())
while_loops = sum(1 for node in ast.walk(tree) if isinstance(node, ast.While))
print('while_loops {0:>12}'.format(while_loops))
Or, more flexibly:
import ast
import collections
filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())
counts = collections.Counter(type(node).__name__ for node in ast.walk(tree))
print('while_loops {0:>12}'.format(counts['While']))
print('for_loops {0:>14}'.format(counts['For']))
print('if_statements {0:>10}'.format(counts['If']))
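As a middle ground between splitting on whitespace and walking a full syntax tree, the standard library's tokenize module already separates comments and string literals from code, which addresses the overcounting problem described above. This is a rough sketch rather than a replacement for the ast version. The function name count_keywords is illustrative, and the source must be tokenizable Python.

```python
import io
import keyword
import tokenize
from collections import Counter

def count_keywords(source):
    """Count Python keywords in source, ignoring comments and string literals."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Keywords arrive as NAME tokens; comments and strings have other types.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return counts

if __name__ == "__main__":
    sample = (
        "tot = 0  # This isn't a for statement\n"
        "for i in range(10):\n"
        "    tot = tot + i\n"
        "while tot > 0:\n"
        "    tot -= 1\n"
    )
    counts = count_keywords(sample)
    print('while_loops {0:>12}'.format(counts['while']))
    print('for_loops {0:>14}'.format(counts['for']))
```

Unlike the ast approach, this counts keyword tokens rather than statements, so an if inside a conditional expression ("foo" if x else "bar") is counted as an if too.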

Open file and read to find first instance of number. Python

I'm stuck on a problem for an assignment. I need to write a program that opens a file on my computer and scans it for the first instance of a number. Once it is found, it should print
The first number in filenm is x
Otherwise it should say there is no number in filenm.
My code so far is below.
When I run it, no matter what, it always says there's no number :(
filenm = raw_input("Enter a file name: ")
datain=open(filenm,"r")
try:
    c=datain.read(1)
    result = []
    for line in datain:
        c=datain.read(1)
        while int(c) >= 0:
            c = datain.read(1)
            result.append(c)
except:
    pass
if len(result) > 0:
    print "The first number is",(" ".join(result))+" . "
else:
    print "There is no number in" , filenm + "."
That's all you need:
import re
with open("filename") as f:
    for line in f:
        s=re.search(r'\d+',line)
        if s:
            print(s.group())
            break
open the file;
read it in a loop, char by char;
check whether the char is a digit, and print whatever you want;
if end-of-file is reached, there are no numbers in the file, so print "no numbers".
Use the <string>.isdigit() method to check whether the given string (a single character in your case) is a digit.
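The steps above can be sketched as follows. This is a Python 3 illustration (the question's code is Python 2), and the function name first_number is made up for the example. It collects the first run of consecutive digits, which is an assumption about what "first number" means here.

```python
def first_number(path):
    """Return the first run of digits in the file as a string, or None."""
    digits = []
    with open(path) as f:
        while True:
            ch = f.read(1)          # "" signals end of file
            if ch.isdigit():
                digits.append(ch)
            elif digits or ch == "":
                # A non-digit after some digits ends the number;
                # end of file ends the search either way.
                break
    return "".join(digits) or None

if __name__ == "__main__":
    filenm = input("Enter a file name: ")
    number = first_number(filenm)
    if number is None:
        print("There is no number in " + filenm + ".")
    else:
        print("The first number in " + filenm + " is " + number + ".")
```

Note that "".isdigit() is False, so the end-of-file case falls through to the elif branch instead of looping forever. str.isdigit() also accepts some non-ASCII digit characters, so a stricter check such as ch in "0123456789" may be preferable for plain ASCII input.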
I don't recommend mixing iterating through a file
for line in datain:
with using the read method (or any similar one)
c=datain.read(1)
Just stick with one or the other. Personally, I would go with iterating here.
The readlines() method returns a list of all the lines in the file. You can then iterate through the characters in each line:
filenm = raw_input("Enter a file name: ")
datain=open(filenm,"r")
try:
    result = []
    for line in datain.readlines():
        print 'line: ' + line
        for each in line:
            try:
                # attempt casting a number to int
                int(each)
                # if it worked it add it to the result list
                result.append(each)
            except:
                pass
except:
    pass
print result
if len(result) > 0:
    print "The first number is",(" ".join(result[0]))+". "
else:
    print "There is no number in" , filenm + "."
This will only work with the first number character it finds, not sure if you actually need to extract multi digit numbers.
My thoughts:
1) As others noted, don't mask the exception. It would be better to let it be thrown - at least that way you find out what went wrong, if something went wrong.
2) You do want to read the file a line at a time, using for line in file:. The reason for this is that the numbers you want to read are basically "words" (things separated by spaces), and there isn't a built-in way to read the file a word at a time. file.read(1) reads a single byte (i.e. character, for an ASCII file), and then you have to put them together into words yourself, which is tedious. It's much easier to tell Python to split a line into words. At least, I'm assuming here that you're not supposed to pick out the "20" from "spam ham20eggs 10 lobster thermidor".
.readlines() is somewhat redundant; it's a convenience for making a list of the lines in the file - but we don't need that list; we just need the lines one at a time. There is a function defined called .xreadlines() which does that, but it's deprecated - because we can just use for line in file:. Seriously - just keep it simple.
3) int in Python will not return a negative value if the input is non-numeric. It will throw an exception. Your code does not handle that exception properly, because it would break out of the loop. There is no way to tell Python "keep going from where you threw the exception" - and there shouldn't be, because how is the rest of the code supposed to account for what happened?
Actually your code isn't too far off. There are a number of problems. One big one is that the try/except hides errors from you which might have helped you figure things out yourself. Another is that you're reading the file with a mixture of a line at a time (and ignoring its contents entirely) and a character at a time.
There also seems to be a misunderstanding on your part about what the int() function does when given a non-numeric character string: it raises an exception rather than returning something less than 0. While you could enclose the call in a try/except with the except being specifically for ValueError, in this case it would be easier to check ahead of time whether the character is a digit, since all you want to do is keep reading until one that isn't is seen.
So here's one way your code could be revised that would address the above issues:
import string
filenm = raw_input("Enter a file name: ")
datain = open(filenm,"r")
# find first sequence of one or more digits in file
result = []
while True:
    c = datain.read(1)
    # the "c and" test is needed because "" (end-of-file) is "in" every string
    while c and c in string.digits: # digit?
        result.append(c)
        c = datain.read(1)
    if c == "" or len(result) > 0: # end-of-file or a number has been found
        break # get out of loop
if len(result) > 0:
    print "The first number is '" + "".join(result) + "'."
else:
    print "There is no number in '" + filenm + "'."
datain.close()
