Multiple occurrences of the same thing in a Python list - python

I'm trying to implement a postscript interpreter in python. For this part of the program, I'm trying to access multiple occurrences of the same element in a list, but the function call does not do that. I can explain it better with the code.
This loop steps through a list of tokens:
for token in tokens:
    process_token(token)
tokens is defined as:
line = "/three 3 def /four 4 def"
tokens = line.strip().split(" ")
So after this is done tokens looks like ['/three', '3', 'def', '/four', '4', 'def'].
process_token will keep pushing things onto a stack until it reaches an operation to perform, in this case def. Once it gets to def it will execute:
if (t == "def"):
    handle_def (tokens.index(t)-2, tokens.index(t)-1)
    stack.pop()
and here is handle_def():
def handle_def (t, t1):
    name = tokens[t]
    defin = tokens [t1]
    x=name[1:]
    dict1 [x]= float(defin)
The problem is that once it has added {'three':3} to the dictionary, it should keep reading and add {'four':4} as well. But when handle_def (tokens.index(t)-2, tokens.index(t)-1) is called, it passes in the index numbers for the first occurrence of def, so it just puts {'three':3} into the dictionary again. I want it to skip past the first one and go on to the later occurrences of the word def. How do I make it do that?
Sorry for the long post, but I felt it needed the explanation.

list.index will give only the first occurrence in the list. You can use the enumerate function to get the index of the current item being processed, like this
for index, token in enumerate(tokens):
    process_token(index, token)
...
...
def process_token(index, t):
    ...
    if t == "def":
        handle_def (index - 2, index - 1)
    ...
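Here is a minimal, self-contained sketch of the enumerate approach, so you can run it end to end. The names dict1, stack, handle_def and process_token come from the question; the function bodies, the extra tokens parameter and the two stack.pop() calls are my assumptions about how the rest of the interpreter might look.

```python
# Sketch only: bodies and the 'tokens' parameter are assumptions,
# not the asker's actual interpreter.
dict1 = {}
stack = []

def handle_def(tokens, name_index, value_index):
    name = tokens[name_index]        # e.g. "/three"
    value = tokens[value_index]      # e.g. "3"
    dict1[name[1:]] = float(value)   # drop the leading "/"

def process_token(tokens, index, token):
    if token == "def":
        # index is the position of *this* def, not the first one
        handle_def(tokens, index - 2, index - 1)
        stack.pop()  # discard the value pushed earlier
        stack.pop()  # discard the name pushed earlier
    else:
        stack.append(token)

line = "/three 3 def /four 4 def"
tokens = line.strip().split(" ")
for index, token in enumerate(tokens):
    process_token(tokens, index, token)

print(dict1)  # {'three': 3.0, 'four': 4.0}
```

For comparison, tokens.index("def") returns 2 on every call here, which is why the original version kept re-adding three.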

Related

Iterate over Python list with clear code - rewriting functions

I've followed a tutorial to write a Flask REST API and have a specific question about some Python code.
The code from the tutorial is the following:
# data list is where my objects are stored
def put_one(name):
    list_by_id = [list for list in data_list if list['name'] == name]
    list_by_id[0]['name'] = [new_name]
    print({'list_by_id' : list_by_id[0]})
It works, which is nice, and even though I understand what line 2 is doing, I would like to rewrite it so that it's clear how the function iterates over the different lists. I already have an approach, but it raises KeyError: 0:
def put(name):
    list_by_id = []
    list = []
    for list in data_list:
        if(list['name'] == name):
            list_by_id = list
    list_by_id[0]['name'] = request.json['name']
    return jsonify({'list_by_id' : list_by_id[0]})
My goal with this is also to be able to put other elements that don't necessarily have the key 'name'. If I can rewrite the function in another way, I'll be more likely to adapt it to my needs.
I've looked for tools to convert one way of coding into the other, and for answers in forums, before coming here and couldn't find any.
It may not be beautiful code, but it gets the job done:
def put(value):
    for i in range(len(data_list)):
        key_list = list(data_list[i].keys())
        if data_list[i][key_list[0]] == value:
            print(f"old value: {key_list[0], data_list[i][key_list[0]]}")
            data_list[i][key_list[0]] = request.json[test_key]
            print(f"new value: {key_list[0], data_list[i][key_list[0]]}")
            break
Now it doesn't matter what the key is: with this iteration the method only changes the value when it finds it in data_list. Before, the code broke at every iteration because the keys were different and they played a role.
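For what it's worth, the KeyError: 0 in the question's attempt comes from list_by_id being bound to a single dict inside the loop, so list_by_id[0] looks up the key 0 in that dict. A possible explicit-loop version is sketched below. The sample data_list and the helper name update_field are made up for illustration, and the Flask parts (request, jsonify) are left out so the snippet runs on its own.

```python
# Hypothetical sample data; the question does not show data_list.
data_list = [
    {"name": "groceries", "owner": "ann"},
    {"name": "chores", "owner": "bob"},
]

def update_field(records, key, old_value, new_value):
    """Update the first record whose records[key] equals old_value.

    Returns the updated record, or None if nothing matched.
    """
    for record in records:
        # .get() avoids a KeyError for records that lack this key
        if record.get(key) == old_value:
            record[key] = new_value
            return record
    return None

updated = update_field(data_list, "name", "chores", "errands")
print(updated)  # {'name': 'errands', 'owner': 'bob'}
```

Passing the key as a parameter is one way to handle fields other than 'name' without relying on the order of the dict's keys.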

parse nested function to extract each inner function in python

I have a nested expression as below
expression = 'position(\'a\' IN Concat("function_test"."PRODUCT_CATEGORIES"."CATEGORY_NAME" , "function_test"."PRODUCT_CATEGORIES"."CATEGORY_NAME" ))'
I want the output to list the nested (inner) function first and then the outer functions:
['Concat("function_test"."PRODUCT_CATEGORIES"."CATEGORY_NAME" , "function_test"."PRODUCT_CATEGORIES"."CATEGORY_NAME" )','position(\'a\' IN Concat("function_test"."PRODUCT_CATEGORIES"."CATEGORY_NAME" , "function_test"."PRODUCT_CATEGORIES"."CATEGORY_NAME" ))']
Below is the code I have tried
result = []
for i in range(len(expression)):
    if expression[i]=="(":
        a.append(i)
    elif expression[i]==")":
        fromIdx=a.pop()
        fromIdx2=max(a[-1],expression.rfind(",", 0, fromIdx))
        flag=False
        for (fromIndex, toIndex) in first_Index:
            if fromIdx2 + 1 >= fromIndex and i <= toIndex:
                flag=True
                break
        if flag==False:
            result.append(expression[fromIdx2+1:i+1])
But this works only if the arguments in the expression are separated by ','.
For example:
expression = 'position(\'a\' , Concat("function_test"."PRODUCT_CATEGORIES"."CATEGORY_NAME" , "function_test"."PRODUCT_CATEGORIES"."CATEGORY_NAME" ))'
and the result for this expression from my code is correct, as expected.
In the first expression I mentioned, there is an IN operator instead of ',', hence my code doesn't work.
Please help.
If you want it to be reliable, you need a full-fledged SQL parser. Fortunately, there is an out-of-box solution for that: https://pypi.org/project/sqlparse/. As soon as you have a parsed token tree, you can walk through it and do what you need:
import sqlparse
def extract_functions(tree):
    res = []
    def visit(token):
        if token.is_group:
            for child in token.tokens:
                visit(child)
        if isinstance(token, sqlparse.sql.Function):
            res.append(token.value)
    visit(tree)
    return res
extract_functions(sqlparse.parse(expression)[0])
Explanation.
sqlparse.parse(expression) parses the string and returns a tuple of statements. As there is only one statement in the example, we can just take the first element. If there are many statements, you should rather iterate over all tuple elements.
extract_functions recursively walks over the parsed token tree depth first (since you want inner calls to appear before outer ones). It uses token.is_group to determine whether the current token has children to descend into, tests whether the current token is a function, and if it is, appends its string representation (token.value) to the result list.
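If the ordering is the unclear part, here is the same children-first walk on a toy tree, without any SQL parsing. Node is a hypothetical stand-in class made up for this sketch; it is not part of sqlparse, and the shortened expression strings are only illustrative.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # Hypothetical stand-in for a parsed token; not a sqlparse class.
    text: str
    is_function: bool = False
    children: List["Node"] = field(default_factory=list)

def collect_functions(root):
    """Collect function nodes, visiting children before their parent."""
    found = []

    def visit(node):
        for child in node.children:
            visit(child)              # descend first
        if node.is_function:
            found.append(node.text)   # record the parent afterwards

    visit(root)
    return found

inner = Node("Concat(x, y)", is_function=True,
             children=[Node("x"), Node("y")])
outer = Node("position('a' IN Concat(x, y))", is_function=True,
             children=[Node("'a'"), inner])

print(collect_functions(outer))
# ['Concat(x, y)', "position('a' IN Concat(x, y))"]
```

Because each node is appended only after its children have been visited, inner calls always end up before the calls that contain them.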

How to decode a list and remove items from two lists when there is a match in both of them based on an index?

I have two lists which contain the following type of information.
List #1:
Request_List = ["1/1/1.34", "1/2/1.3.5", "1/3/1.2.3", ...same format elements]
List #2:
Reply_List = ["1/1/0", "1/3/1", "1/2/0", ...same format elements]
From the "Reply" list, I want to take the second item of each "#/#/#" string (in this case 1, 3, 2, and so on) and check whether it matches the second item of a string in the "Request" list. If there is a match, I want to return a new list containing the third item of the request string with the third item of the matching reply string appended.
The result would be like the following.
Result = ["1.34.0", "1.3.5.0", "1.2.3.1"]
Note that the 0 was appended to the 1.34, the 0 was appended to the 1.3.5 and the 1 was appended to the 1.2.3, each taken from the reply whose second item matches. The matching item could be placed anywhere in the "Reply" list.
The code which does the problem stated above is shown below.
def get_list_of_error_codes(self, Reply_List , Request_List ):
    # I am not sure if this is the right way to decode all the elements in the list?
    decoded_Reply_List = Reply_List .decode("utf-8")
    Result = [
        f"{i.split('/')[-1]}.{j.split('/')[-1]}"
        for i in Request_List
        for j in decoded_Reply_List
        if (i.split("/")[1] == j.split("/")[1])
    ]
    return Result
res = get_list_of_error_codes(Reply_List , Request_List)
print (res) # ["1.34.0", "1.3.5.0", "1.2.3.1"]
Issues I am facing right now:
I am NOT sure if I decode the Reply_List correctly and in the proper manner. Can someone help me also verify this?
I am not sure on how to also remove the corresponding items for the Reply_List and Request_List when I find a match based on the condition if (i.split("/")[1] == j.split("/")[1]).
You can use list comprehension to decode the list:
decoded_Reply_List = [li.decode(encoding='utf-8') for li in Reply_List]
In this case, if you wanted to also remove items from the list while you create the new list, I would say list comprehension isn't the right move. Just go with the nested for loops:
def get_list_of_error_codes(self, Reply_List, Request_List):
    decoded_Reply_List = [li.decode(encoding='utf-8') for li in Reply_List]
    Result = []
    for i in list(Request_List):
        for j in decoded_Reply_List:
            if (i.split("/")[1] == j.split("/")[1]):
                Result.append(f"{i.split('/')[-1]}.{j.split('/')[-1]}")
                Reply_List.remove(j)
                break
        else:
            continue
        Request_List.remove(i)
    return Result
Request_List = ["1/1/1.34", "1/2/1.3.5", "1/3/1.2.3"]
Reply_List = [b"1/1/0", b"1/3/1", b"1/2/0"]
print(get_list_of_error_codes("Foo", Reply_List, Request_List))
# Output: ['1.34.0', '1.3.5.0', '1.2.3.1']
Some things to note:
I added a break so that we don't keep looking for matches if we find one. It will only match the first pair, then move on.
In for i in list(Request_List), I added the list() cast to effectively make a copy of the list. This allows us to remove entries from Request_List without disrupting the loop. I didn't do this for for j in decoded_Reply_List because it's already a copy of Reply_List. (I assumed you wanted to remove the entries from Reply_List)
The last is the else: continue. We don't want to reach Request_List.remove(i) if we didn't find a match. If break is called, else will not be called, which means we will reach Request_List.remove(i). But if the loop completes without finding a match, the loop will then enter else and we will skip the removal step by calling continue
EDIT:
Actually, Reply_List.remove(j) fails: j is the decoded str, while Reply_List still holds the original bytes objects, so remove() finds no equal element and raises ValueError. Here's some revised code which solves this issue:
def get_list_of_error_codes(Reply_List, Request_List):
    # decoded_Reply_List = [li.decode(encoding='utf-8') for li in Reply_List]
    Result = []
    for i in list(Request_List):
        for j in list(Reply_List):
            dj = j.decode(encoding='utf-8')
            if (i.split("/")[1] == dj.split("/")[1]):
                Result.append(f"{i.split('/')[-1]}.{dj.split('/')[-1]}")
                Reply_List.remove(j)
                break
        else:
            continue
        Request_List.remove(i)
    return Result
Request_List = ["1/1/1.34", "1/2/1.3.5", "1/3/1.2.3"]
Reply_List = [b"1/1/0", b"1/3/1", b"1/2/0"]
print("Result: ", get_list_of_error_codes(Reply_List, Request_List))
print("Reply_List: ", Reply_List)
print("Request_List: ", Request_List)
# Output:
# Result: ['1.34.0', '1.3.5.0', '1.2.3.1']
# Reply_List: []
# Request_List: []
What I've done is that instead of creating a separate decoded list, I just decode the entries as they're looped through, and then remove the un-decoded entry from Reply_List. This should be a little more efficient too, since we're not looping through Reply_List twice now.
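If the lists get long, a different technique from the nested loops above is to index the replies by their middle field in a dict first, so each request needs only one lookup. This is a sketch under the assumption that the middle field is unique within each list; match_codes and the extra "1/9/7.7" request are names and data I made up for illustration.

```python
def match_codes(request_list, reply_list):
    """Pair requests (str) and replies (bytes) on their middle field.

    Matched items are removed from both lists in place.
    Assumes the middle field is unique within each list.
    """
    # Map middle field -> original bytes reply (first occurrence wins).
    reply_by_key = {}
    for raw in reply_list:
        key = raw.decode("utf-8").split("/")[1]
        reply_by_key.setdefault(key, raw)

    result = []
    for req in list(request_list):        # iterate over a copy
        raw = reply_by_key.pop(req.split("/")[1], None)
        if raw is None:
            continue                      # no reply for this request
        reply_code = raw.decode("utf-8").split("/")[-1]
        result.append(f"{req.split('/')[-1]}.{reply_code}")
        request_list.remove(req)
        reply_list.remove(raw)            # remove the bytes object itself
    return result

requests = ["1/1/1.34", "1/2/1.3.5", "1/3/1.2.3", "1/9/7.7"]
replies = [b"1/1/0", b"1/3/1", b"1/2/0"]
print(match_codes(requests, replies))  # ['1.34.0', '1.3.5.0', '1.2.3.1']
print(requests)                        # ['1/9/7.7']
print(replies)                         # []
```

The unmatched request stays in its list, which mirrors what the for/else version does when no reply is found.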

Find elements between two tags in a list

Language: Python 3.4
OS: Windows 8.1
I have some lists like the following:
a = ['text1', 'text2', 'text3','text4','text5']
b = ['text1', 'text2', 'text3','text4','New_element', 'text5']
What is the simplest way to find the elements between two tags in a list?
I want to be able to get it even if the lists and tags have variable number of elements or variable length.
Ex: get the elements between text1 and text4, or between text1 and text5, etc. Or get the elements between text1 and text5 for the list that has the longer length.
I tried using regular expressions like:
re.findall(r'text1(.*?)text5', a)
This will give me an error I guess because you can only use this in a string but not lists.
To get the location of an element in a list use index(). Then use the discovered index to create a slice of the list like:
Code:
print(b[b.index('text3')+1:b.index('text5')])
Results:
['text4', 'New_element']
You can use the list.index method to find the first occurrence of each of your tags, then slice the list to get the value between the indexes.
def find_between_tags(lst, start_tag, end_tag):
    start_index = lst.index(start_tag)
    end_index = lst.index(end_tag, start_index)
    return lst[start_index + 1: end_index]
If either of the tags is not in the list (or if the end tag only occurs before the start tag), one of the index calls will raise a ValueError. You could suppress the exception if you want to do something else, but just letting the caller deal with it seems like a reasonable option to me, so I've left the exception uncaught.
If the tags might occur in this list multiple times, you could extend the logic of the function above to find all of them. For this you'll want to use the start argument to list.index, which will tell it not to look at values before the previous end tag.
def find_all_between_tags(lst, start_tag, end_tag):
    search_from = 0
    try:
        while True:
            start_index = lst.index(start_tag, search_from)
            end_index = lst.index(end_tag, start_index + 1)
            yield lst[start_index + 1:end_index]
            search_from = end_index + 1
    except ValueError:
        pass
This generator does suppress the ValueError, since it keeps on searching until it can't find another pair of tags. If the tags don't exist anywhere in the list, the generator will be empty, but it won't raise any exception (other than StopIteration).
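A quick usage example may help here. The generator is repeated so the snippet runs on its own; the sample list with "start" and "end" tags is made up for illustration.

```python
def find_all_between_tags(lst, start_tag, end_tag):
    search_from = 0
    try:
        while True:
            start_index = lst.index(start_tag, search_from)
            end_index = lst.index(end_tag, start_index + 1)
            yield lst[start_index + 1:end_index]
            search_from = end_index + 1   # resume after the end tag
    except ValueError:
        pass  # no further complete pair of tags

# Hypothetical sample: two complete pairs, then an unclosed start tag.
data = ["start", "a", "b", "end", "x", "start", "c", "end", "start", "d"]

print(list(find_all_between_tags(data, "start", "end")))
# [['a', 'b'], ['c']]
print(list(find_all_between_tags(data, "missing", "end")))
# []
```

The trailing "start" with no matching "end" is silently dropped, which is the effect of swallowing the ValueError.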
You can get the items between the values by utilizing the index function to search for the index of both objects in the list. Be sure to add one to the index of the first object so it won't be included in the result. See my code below:
def get_sublist_between(e1, e2, li):
    return li[li.index(e1) + 1:li.index(e2)]

Python - Autocomplete extension for numbers and suggestion using random queries

I have a very minimalistic code that performs autocompletion for input queries set by the user by storing a historical data of names(close to 1000) in a list. Right now, it gives suggestion in lexicographical smallest order.
The names stored in a list are (fictitious):
names = ["show me 7 wonders of the world","most beautiful places","top 10 places to visit","Population > 1000","Cost greater than 100"]
The queries given by the user can be:
queries = ["10", "greater", ">", "7 w"]
Current Implementation:
class Index(object):
    def __init__(self, words):
        index = {}
        for w in sorted(words, key=str.lower, reverse=True):
            lw = w.lower()
            for i in range(1, len(lw) + 1):
                index[lw[:i]] = w
        self.index = index
    def by_prefix(self, prefix):
        """Return lexicographically smallest word that starts with a given
        prefix.
        """
        return self.index.get(prefix.lower(), 'no matches found')
def typeahead(usernames, queries):
    users = Index(usernames)
    # print() with a single argument works on both Python 2 and Python 3
    print("\n".join(users.by_prefix(q) for q in queries))
This works fine if the queries start with the pre-stored names. But fails to provide suggestions if a random entry is made(querying somewhere from the middle of string). It also does not recognize numbers and fails for that too.
I was wondering if there could be a way to include the above functionalities to improve my existing implementation.
Any help is greatly appreciated.
It's O(n) but it works. Your function is checking if it starts with a prefix, but the behavior you describe you want is checking if the string contains the query
def __init__(self, words):
    self.index = sorted(words, key=str.lower, reverse=True)
def by_prefix(self, prefix):
    for item in self.index:
        if prefix in item:
            return item
This gives:
top 10 places to visit
Cost greater than 100
Population > 1000
show me 7 wonders of the world
Just for the record this takes 0.175 seconds on my pc for 5 queries of 1,000,005 records, with the last 5 records being the matching ones. (Worst case scenario)
If you are not concerned by performance, you can use if prefix in item: for every item in your list names. This statement matches if prefix is part of the string item, e.g.:
prefix    item        match
'foo'     'foobar'    True
'bar'     'foobar'    True
'ob'      'foobar'    True
...
I think that this is the simplest way to achieve this, but clearly not the fastest.
Another option is to add more entries to your index, e.g. for the item "most beautiful places":
"most beautiful places"
"beautiful places"
"places"
If you do this, you also get matches if you start typing a word that's not the first word in the sentence. You can modify your code like this to do that:
class Index(object):
    def __init__(self, words):
        index = {}
        for w in sorted(words, key=str.lower, reverse=True):
            lw = w.lower()
            tokens = lw.split(' ')
            for j in range(len(tokens)):
                w_part = ' '.join(tokens[j:])
                for i in range(1, len(w_part) + 1):
                    index[w_part[:i]] = w
        self.index = index
The downside of this approach is that the index gets very large. You could also combine this approach with the one pointed out by Keatinge: store 2-character prefixes for every word as the keys of your index dictionary, and store a list of the queries that contain each prefix as the values.
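Here is a rough sketch of that combined idea, under my reading of it: keys are the first one or two characters of every word, values are the stored names containing such a word, and a substring test confirms the match. WordPrefixIndex and suggest are hypothetical names, and this version only finds queries that start at the beginning of a word.

```python
from collections import defaultdict

class WordPrefixIndex(object):
    """Hypothetical sketch: short word prefixes -> candidate names."""

    def __init__(self, names):
        self.index = defaultdict(list)
        # Sorted input keeps every candidate list in sorted order.
        for name in sorted(names, key=str.lower):
            keys = set()
            for word in name.lower().split():
                keys.add(word[:1])
                keys.add(word[:2])
            for key in keys:
                self.index[key].append(name)

    def suggest(self, query):
        q = query.lower().strip()
        if not q:
            return 'no matches found'
        key = q.split()[0][:2]
        # Only the short candidate list is scanned, not every name.
        for name in self.index.get(key, []):
            if q in name.lower():
                return name
        return 'no matches found'

names = ["show me 7 wonders of the world", "most beautiful places",
         "top 10 places to visit", "Population > 1000",
         "Cost greater than 100"]
idx = WordPrefixIndex(names)
for q in ["10", "greater", ">", "7 w"]:
    print(idx.suggest(q))
# Cost greater than 100
# Cost greater than 100
# Population > 1000
# show me 7 wonders of the world
```

Note that "10" now returns "Cost greater than 100", because "100" also starts with "10" and that name sorts first. The index stays small since each name contributes at most two keys per word.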
