How to extract a specific line out of a text file - python

I have the code from the attached picture in a .can file, which is in this case a text file. The task is to open the file and extract the content of each void function. In this case that would be "$LIN::Kl_15 = 1;"
This is what I already got:
Masterfunktionsliste = open("C:/.../Masterfunktionsliste_Beispiel.can", "r")
Funktionen = []
Funktionen = Masterfunktionsliste.read()
Funktionen = Funktionen.split('\n')
print(Funktionen)
I receive the following list:
['', '', 'void KL15 ein', '{', '\t$LIN::Kl_15 = 1;', '}', '', 'void Motor ein', '{', '\t$LIN::Motor = 1;', '}', '', '']
And now I want to extract the $LIN::Kl_15 = 1; and the $LIN::Motor = 1; lines into variables.

Use the { and } lines to decide which lines to extract:

scope_depth = 0
line_stack = list(reversed(Funktionen))
body_lines = []
while len(line_stack) > 0:
    line = line_stack.pop()
    if line == '{':
        scope_depth += 1
    elif line == '}':
        scope_depth -= 1
    else:
        # test that we're inside at least one level of {...} nesting
        if scope_depth > 0:
            body_lines.append(line.strip())

body_lines should now have the values ['$LIN::Kl_15 = 1;', '$LIN::Motor = 1;']

You can loop through the list, search for your variables and save them in a dict:

can_file_content = ['', '', 'void KL15 ein', '{', '\t$LIN::Kl_15 = 1;', '}', '', 'void Motor ein', '{', '\t$LIN::Motor = 1;', '}', '', '']

extracted = {}
for line in can_file_content:
    if "$LIN" in line:  # pick out the relevant lines
        parsed_line = line.replace(";", "").replace("\t", "")  # remove ";" and "\t"
        variable, value = parsed_line.split("=")  # split on "="
        extracted[variable.strip()] = value.strip()  # remove surrounding whitespace

The output is {'$LIN::Kl_15': '1', '$LIN::Motor': '1'}. Now you can access your new variables with extracted['$LIN::Motor'], which is the string '1'.
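If you also want to know which function each statement came from, a small sketch (reusing the same can_file_content list as above; the pairing logic is my own addition, not part of either answer) could attach each void header to the body line that follows it:

```python
can_file_content = ['', '', 'void KL15 ein', '{', '\t$LIN::Kl_15 = 1;', '}',
                    '', 'void Motor ein', '{', '\t$LIN::Motor = 1;', '}', '', '']

functions = {}
current_name = None
for line in can_file_content:
    stripped = line.strip()
    if stripped.startswith("void"):
        # remember the function name, e.g. "KL15 ein"
        current_name = stripped[len("void"):].strip()
    elif stripped.startswith("$LIN") and current_name is not None:
        # associate the body line with the most recent void header
        functions[current_name] = stripped

print(functions)
# {'KL15 ein': '$LIN::Kl_15 = 1;', 'Motor ein': '$LIN::Motor = 1;'}
```

This keeps the relationship between the function names and their bodies, which a flat list of body lines loses.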


How to save list, dict, int into file, and assign back from file

I have a PyQt5 UI that allows the user to store values in a list, an int, and a QTextEdit (which lets the user write as in a code editor).
I then save the existing values into a txt file, so that after changing values the user can always come back and reload from the local file. The problem I face is that when I assign the values back, they come back letter by letter, and the code assigned back into the UI still contains literal \n. I also tried turning them into a list, but the result was still bad.
So I think saving into a txt file is a bad idea. I looked at pickle but am not sure if it can be used here, or how.
score = 22
ocount = 88
scenario = {something here already}
...
usercode = 'var game = 0;\n if (game > 3){gid = 22}'

# list, int, dict, str stored to txt file
with open(configName + ".txt", 'w') as f:
    f.write(repr(score) + "\n" +
            repr(ocount) + "\n" +
            repr(option) + "\n" +
            repr(option_ans) + "\n" +
            repr(matchname) + "\n" +
            repr(scenario) + "\n" +
            repr(_Options) + "\n" +
            repr(_Optionselected) + "\n" +
            repr(eInfo) + "\n" +
            repr(sVar) + "\n" +
            repr(cusotm) + "\n" +
            repr(survey1) + "\n" +
            repr(survey_ans) + "\n" +
            repr(eInfoName) + "\n" +
            repr(sVarName) + "\n" +
            repr(tempcode) + "\n" +
            repr(usercode))
The values are written out as below:
2
1
['q1_a1', 'q1_a2', 'q2_a1']
['q1_a1_ans', 'q1_a2_ans', 'q2_a1_ans']
['log1', 'log2']
{'log1': 'if (q1_a1) scores.team1 - 1q1_a1_ans== "23234"){\nproduct_ids = [11,22];\n}', 'log2': 'if (scores.team2 != 11\nproduct_ids = [33,2];\n}'}
['var q1_a1 = 11;', 'var q1_a2 = 11122;', 'var q2_a1 = 2;']
['var q1_a1_ans = 22;', 'var q1_a2_ans = 222;', 'var q2_a1_ans = 2;']
['123']
['321']
['cam1']
['q1_a1', 'q1_a2', 'q2_a1']
['q1_a1_ans', 'q1_a2_ans', 'q2_a1_ans']
['123']
['321']
'+ + \nelse if (\nelse if (\nelse if (\nelse if ('
'var has_answer = function (answer) {\n return indexOf(answer) >= 0;\n };'
# load back
tmp = []
f = open("lo.txt", "r")
Lines = f.readlines()
for x in Lines:
    tmp.append(x)
score = tmp[0]
ocount = tmp[1]
# etc...
Result:
# trying to change it to a list gives:
['[', "'", 'q', '1', '_', 'a', '1', "'", ',', ' ', "'", 'q', '1', '_', 'a', '2', "'", ',', ' ', "'", 'q', '2', '_', 'a', '1', "'", ']', '\n']
Use exec() to execute the strings as code:

var = ['score', 'option', 'tempcode', 'usercode']
f = open("lo.txt", "r")
Lines = f.readlines()
for i in range(0, len(Lines)):
    exec(f'{var[i]} = {Lines[i]}')
    # check type
    exec(f'print({var[i]}, type({var[i]}))')
Output:
2 <class 'int'>
['q1_a1', 'q1_a2', 'q2_a1'] <class 'list'>
+ +
else if (
else if (
else if (
else if ( <class 'str'>
var has_answer = function (answer) {
return indexOf(answer) >= 0;
}; <class 'str'>
lo.txt
2
['q1_a1', 'q1_a2', 'q2_a1']
'+ + \nelse if (\nelse if (\nelse if (\nelse if ('
'var has_answer = function (answer) {\n return indexOf(answer) >= 0;\n };'
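A safer alternative to exec() for this file format is ast.literal_eval, which turns the repr() of basic Python types (ints, strings, lists, dicts) back into real objects. A minimal sketch with a few lines shaped like the ones in lo.txt (the sample lines here are made up to match that file):

```python
import ast

# sample lines as they would come from f.readlines()
lines = [
    "2\n",
    "['q1_a1', 'q1_a2', 'q2_a1']\n",
    "'+ + \\nelse if (\\nelse if ('\n",
]

# literal_eval parses each repr() back into the original object
values = [ast.literal_eval(line) for line in lines]
print(values[0], type(values[0]))  # 2 <class 'int'>
print(values[1], type(values[1]))  # ['q1_a1', 'q1_a2', 'q2_a1'] <class 'list'>
```

Unlike exec(), literal_eval refuses to run arbitrary code, so a malicious or corrupted config file cannot execute anything.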
Or write them as JSON instead, which makes sure a list stays a list.
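A minimal sketch of the JSON approach (the state dict and the file name config.json are made up for illustration; substitute your real score, option, usercode, and so on): json.dump writes everything in one go, and json.load brings the values back with their original types.

```python
import json

# hypothetical state; replace with your real variables
state = {
    "score": 22,
    "ocount": 88,
    "option": ["q1_a1", "q1_a2", "q2_a1"],
    "usercode": 'var game = 0;\n if (game > 3){gid = 22}',
}

with open("config.json", "w") as f:
    json.dump(state, f)

with open("config.json") as f:
    loaded = json.load(f)

print(type(loaded["option"]))  # lists come back as real lists, ints as ints
```

Embedded newlines in strings such as usercode are escaped and restored automatically, which avoids the letter-by-letter and literal-\n problems described above.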

Transform a code tokens list into valid string code

I have written code to transform Python code into a list to compute BLEU score:
import re

def tokenize_for_bleu_eval(code):
    code = re.sub(r'([^A-Za-z0-9_])', r' \1 ', code)
    code = re.sub(r'([a-z])([A-Z])', r'\1 \2', code)
    code = re.sub(r'\s+', ' ', code)
    code = code.replace('"', '`')
    code = code.replace('\'', '`')
    tokens = [t for t in code.split(' ') if t]
    return tokens
Thanks to this snippet my code struct.unpack('h', pS[0:2]) is parsed properly into the list ['struct', '.', 'unpack', '(', '`', 'h', '`', ',', 'p', 'S', '[', '0', ':', '2', ']', ')'].
Initially, I thought I simply needed ' '.join(list_of_tokens), but that mangles my variable names, like struct . unpack ( ` h ` , p S [ 0 : 2 ] ), and the code is not executable.
I tried to use regex to glue some variable names back together, but I can't manage to reverse my tokenize_for_bleu_eval function to get executable code at the end. Does someone have an idea, perhaps without regex, which seems too complicated here?
EDIT: We can't just remove all spaces between the elements of the list, because for examples like items = [item for item in container if item.attribute == value] the back-translation without spaces would be items=[itemforitemincontainerifitem.attribute==value], which is not valid.
I am trying to merge the tokens using this script:

import re

def tokenize_for_bleu_eval(code):
    code = re.sub(r'([^A-Za-z0-9_])', r' \1 ', code)
    code = re.sub(r'([a-z])([A-Z])', r'\1 \2', code)
    code = re.sub(r'\s+', ' ', code)
    code = code.replace('"', '`')
    code = code.replace('\'', '`')
    tokens = [t for t in code.split(' ') if t]
    return tokens

def merge_tokens(tokens):
    code = ''.join(tokens)
    code = code.replace('`', "'")
    code = code.replace(',', ", ")
    return code

tokenize = tokenize_for_bleu_eval("struct.unpack('h', pS[0:2])")
print(tokenize)  # ['struct', '.', 'unpack', '(', '`', 'h', '`', ',', 'p', 'S', '[', '0', ':', '2', ']', ')']
merge_result = merge_tokens(tokenize)
print(merge_result)  # struct.unpack('h', pS[0:2])
Edit:
I found this interesting idea to tokenize and merge.
import re

def tokenize_for_bleu_eval(code):
    tokens_list = []
    codes = code.split(' ')
    for i in range(len(codes)):
        code = codes[i]
        code = re.sub(r'([^A-Za-z0-9_])', r' \1 ', code)
        code = re.sub(r'([a-z])([A-Z])', r'\1 \2', code)
        code = re.sub(r'\s+', ' ', code)
        code = code.replace('"', '`')
        code = code.replace('\'', '`')
        tokens = [t for t in code.split(' ') if t]
        tokens_list.append(tokens)
        if i != len(codes) - 1:
            tokens_list.append([' '])
    flatten_list = []
    for tokens in tokens_list:
        for token in tokens:
            flatten_list.append(token)
    return flatten_list

def merge_tokens(flatten_list):
    code = ''.join(flatten_list)
    code = code.replace('`', "'")
    return code

test1 = "struct.unpack('h', pS[0:2])"
test2 = "items = [item for item in container if item.attribute == value]"

tokenize = tokenize_for_bleu_eval(test1)
print(tokenize)  # ['struct', '.', 'unpack', '(', '`', 'h', '`', ',', ' ', 'p', 'S', '[', '0', ':', '2', ']', ')']
merge_result = merge_tokens(tokenize)
print(merge_result)  # struct.unpack('h', pS[0:2])

tokenize = tokenize_for_bleu_eval(test2)
print(tokenize)  # ['items', ' ', '=', ' ', '[', 'item', ' ', 'for', ' ', 'item', ' ', 'in', ' ', 'container', ' ', 'if', ' ', 'item', '.', 'attribute', ' ', '=', '=', ' ', 'value', ']']
merge_result = merge_tokens(tokenize)
print(merge_result)  # items = [item for item in container if item.attribute == value]
This script also remembers each space from the input.

How do I save specific text in an array from .txt file

I have a .txt file from which I want to save only the following characters "N", "1.1", "XY", "N", "2.3", "XZ" in an array.
The .txt file looks like this:
[ TITLE
N 1.1 XY
N 2.3 XZ
]
Here is my code:
src = open("In.txt", "r")

def findOp(row):
    trig = False
    temp = ["", "", ""]
    i = 1
    n = 0
    for char in row:
        i += 1
        if (char != '\t') & (char != ' ') & (char != '\n'):
            trig = True
            temp[n] += char
        else:
            if trig:
                n += 1
                trig = False
    return temp

for line in src.readlines():
    print(findOp(line))
The Output from my code is:
['[', 'TITLE', '']
['', '', '']
['N', '1.1', 'XY']
['N', '2.3', 'XZ']
['', '', '']
[']', '', '']
The problem is that the program also saves whitespace-only entries in the array, which I don't want.
I would recommend Python's strip() family of functions, with which you can remove whitespace from a string.
Whitespace on both sides:
s = s.strip()
Whitespace on the right side:
s = s.rstrip()
Whitespace on the left side:
s = s.lstrip()
You could check the returned array before exiting:

def findOp(row):
    trig = False
    temp = ["", "", ""]
    i = 1
    n = 0
    for char in row:
        i += 1
        if (char != '\t') & (char != ' ') & (char != '\n'):
            trig = True
            temp[n] += char
        else:
            if trig:
                n += 1
                trig = False
    # will return `temp` if all elements evaluate to True,
    # otherwise it will return None
    return temp if all(temp) else None
The value None can then be used as a check condition in subsequent constructs:
for line in src.readlines():
    out = findOp(line)
    if out:
        print(out)

>> ['N', '1.1', 'XY']
>> ['N', '2.3', 'XZ']
Try numpy.genfromtxt:

import numpy as np

text_arr = np.genfromtxt('In.txt', skip_header=1, skip_footer=1, dtype=str)
print(text_arr)

Output:
[['N' '1.1' 'XY']
 ['N' '2.3' 'XZ']]

Or if you want a list, use text_arr.tolist().
Try this:

with open('In.txt', 'r') as f:
    lines = [i.strip() for i in f.readlines() if i.strip()][1:-1]
    output = [[word for word in line.split() if word] for line in lines]
Output :
[['N', '1.1', 'XY'], ['N', '2.3', 'XZ']]

Tokenize Function Not Working As Expected - Python

These are the instructions:
Write a function tokenize(input_string) that takes a string containing an expression and returns a list of tokens. Tokens in this small language are delimited by whitespace, so any time there is a space (or several spaces in a row) in the input string, we want to split around it.
You should not use the built-in string operation split, but rather should structure your code using the tools we have developed so far.
For example, running the tokenizer on this string:
tokenize("2 2 + 3 4 / .5 0.2 3.2 + - COS")
should return:
['2', '2', '+', '3', '4', '/', '.5', '0.2', '3.2', '+', '-', 'COS']
This is my code:
def tokenize(input_string):
    tokens = []
    token = ""
    for char in input_string:
        if char == " " and input_string[-1] != char and token != "":
            tokens.append(token)
            token = ""
        elif input_string[-1] == char:
            tokens.append(token + char)
        elif char != " ":
            token += char
    return tokens
My code works properly with the given example and similar arguments, but when I run something like:
tokenize("pi load store load")
i get:
['pi', 'load', 'loa', 'store', 'load']
What's the bug? I tried finding it with print statements in various parts of the function, to no avail. Also, any advice on how to better organize the if statements would be greatly appreciated. Thanks in advance for the help.
I think your flaw is in the line elif input_string[-1] == char:.
If I'm understanding you correctly, you are trying to use this elif case to check if you are at the end of the string, and if you are, to add the last token in the string to your list of tokens.
However, if you have the last character in your string appear more than once, it will go into this case every time; that's why you have both 'loa' and 'load' in your list.
My suggestion is to remove all of your checks for the current character being the same as the last character in the string, and add

if token != "":
    tokens.append(token)

after your for loop.
To add to Izaak Weiss' answer, you can also simplify the logic of the checks; this could be a solution:

def tokenize(input_string):
    tokens = []
    token = ''
    for char in input_string:
        if char == ' ':  # possible token termination
            if token != '':
                tokens.append(token)
                token = ''
        else:
            token += char
    # last token
    if token != '':
        tokens.append(token)
    return tokens
Here are 2 approaches:
The "plain" one that you were attempting to implement (tokenizing the string "manually")
A little bit more advanced one that uses str.find(sub[, start[, end]]) (and also str.rfind)
Of course there are others as well (e.g. ones that use recursion, or even regular expressions), but they are probably too advanced.
def tokenize_plain(input_string):
    tokens = list()
    current_token = ""
    for char in input_string:
        if char == " ":
            if current_token:
                tokens.append(current_token)
                current_token = ""
        else:
            current_token += char
    if current_token:
        tokens.append(current_token)
    return tokens

def tokenize_find(input_string):
    tokens = list()
    start = 0
    end = input_string.find(" ", start)
    while end != -1:
        if end == start:
            start += 1
        else:
            tokens.append(input_string[start: end])
            start = end
        end = input_string.find(" ", start)
    end = input_string.rfind(" ", start)
    if end == -1:
        tokens.append(input_string[start:])
    else:
        tokens.append(input_string[start: end])
    return tokens

if __name__ == "__main__":
    for tokenize in [tokenize_plain, tokenize_find]:
        for text in ["pi load store load", "2 2 + 3 4 / .5 0.2 3.2 + - COS"]:
            print("{}('{}') = {}".format(tokenize.__name__, text, tokenize(text)))
Output:
c:\Work\Dev\StackOverflow\q46372240>c:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe a.py
tokenize_plain('pi load store load') = ['pi', 'load', 'store', 'load']
tokenize_plain('2 2 + 3 4 / .5 0.2 3.2 + - COS') = ['2', '2', '+', '3', '4', '/', '.5', '0.2', '3.2', '+', '-', 'COS']
tokenize_find('pi load store load') = ['pi', 'load', 'store', 'load']
tokenize_find('2 2 + 3 4 / .5 0.2 3.2 + - COS') = ['2', '2', '+', '3', '4', '/', '.5', '0.2', '3.2', '+', '-', 'COS']

PYTHON How to count letters in words without special characters

I have code that counts letters in words, excluding special characters at the end. I just can't figure out a way to get it to exclude special characters at the beginning too.
My code so far:
inFile = open('p.txt', "r").readlines()
myResults = []
for i in range(20):
    myResults.append(0)
mySpecialList = ['-', '+', '#', '#', '!', '(', ')', '?', '.', ',', ':', ';', '"', "'", '`']
for line in inFile:
    words = str.split(line)
    for word in words:
        if word not in mySpecialList:
            if word[-1] not in mySpecialList:
                myResults[len(word)] += 1
            else:
                myResults[len(word) - 1] += 1
print(myResults)
Here is some simple code to count all the alphanumeric characters of a single word:

word = "Hello World!"
count = 0
for c in word:
    if c.isalnum():
        count += 1
print(count)
If you wanted to use your special characters, you could adapt the code to look like:

mySpecialList = ['*', '!']
word = "Hello World!"
count = 0
for c in word:
    if c not in mySpecialList:
        count += 1
print(count)
You can use regular expressions, try it!
For example you can split the string, and with findall you get a list of all the words:

import re

string = "Hello World, Hi + Say"
print(re.findall(r"[\w']+", string))
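Coming back to the original problem: str.strip() also accepts a set of characters, so you can remove the special characters from both ends of each word before counting. A sketch using the question's mySpecialList (the sample words here are made up; this is not tested against the original p.txt):

```python
mySpecialList = ['-', '+', '#', '!', '(', ')', '?', '.', ',', ':', ';', '"', "'", '`']
special = ''.join(mySpecialList)

myResults = [0] * 20
for word in ["(Hello!", "world", "--#--"]:
    cleaned = word.strip(special)  # strips special chars from both ends only
    if cleaned:  # skip words that were nothing but special characters
        myResults[len(cleaned)] += 1

print(myResults[:6])  # [0, 0, 0, 0, 0, 2]
```

This replaces both branches of the original if/else: characters inside a word are untouched, and any run of special characters at either end is removed in one call.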
def reverseword(user_input):
    words = str(user_input).split(" ")
    newWords = [word[::-1] for word in words]
    newSentence = " ".join(newWords)
    return newSentence

if __name__ == "__main__":
    while True:
        ispresent = 0
        splcharlist = ['-', '+', '#', '#', '!', '(', ')', '?', '.', ',', ':', ';', '"', "'", '`', " "]
        user_input = input("Enter the input:")
        print(len(user_input))
        ccount = 0
        new_input = ""
        ch_count = 0
        if len(user_input) > 100:
            for eletter in user_input:
                if eletter not in splcharlist:
                    ccount = ccount + 1
                ch_count = ch_count + 1
                if ccount > 100:
                    break
            new_input = user_input[:100]
        else:
            new_input = user_input
        print("This is for your input:", user_input)
        print("input with limit :" + str(new_input))
        print(len(new_input))
        print("The Reverse lists is: ", reverseword(new_input))
        if "stop" in user_input:
            break
