I have a dictionary that I would like to use to build a tree. The idea is to look up the value for a given key and append it to a list, then use that value as the key for the next lookup, repeating until the value is None.
My dictionary
dict = {
'A' : 'AF',
'BF': 'B',
'AF': 'Z',
'Z' : None,
'B' : 'B'
}
I can loop through the dict and get the first value, but I can't find a good way to walk through the dict recursively.
Note that x is the starting key I would like to specify, i.e. 'A', 'BF', 'AF', 'Z' or 'B'.
def tree(x,dict):
    result = []
    for value in dict:
        result.append(value)
        #stuck somewhere here.
        #I would like to use this value as an index again and pick next value.
        #Do this until I have no further relation
    #print final results in a list
    print(result)
When tree(x,dict) is called with x = 'A', the expected result is:
['A','AF','Z']
Thank you for your assistance and contribution.
The non-recursive version is much faster, but here is a recursive one that looks nice:
>>> def tree(D, x):
...     if x is None:
...         return []
...     else:
...         return [x] + tree(D, D[x])
...
>>> tree(D, 'A')
['A', 'AF', 'Z']
Or as a one-liner:
def tree(D, x):
    return [] if x is None else [x] + tree(D, D[x])
This has quadratic runtime because it concatenates two lists on every call. If you want performance you would use .append instead, and at that point a plain loop is more practical anyway.
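def tree(x,dict):
    # Start the result with the starting key itself
    result = [x]
    old_value = x
    while True:
        value = dict.get(old_value)
        if not value:
            break
        result.append(value)
        # Move on to the next key, otherwise the loop never advances
        old_value = value
    print(result)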
You could also try a recursive generator:
# Note: starting from "B" or "BF" will loop "forever", because 'B' maps to itself
data = {
    'A' : 'AF',
    'BF': 'B',
    'AF': 'Z',
    'Z' : None,
    'B' : 'B'
}
def tree(key):
    value = data.get(key)
    yield key
    if value is not None:
        for value in tree(value):
            yield value
for value in tree("A"):
    # Do something with the value
    print(value)
EDIT: the suggested approach above fails to detect cycles and will loop until maximum recursion depth is reached.
The recursive approach below keeps track of visited nodes and raises an exception as soon as it finds a cycle. The most comprehensible description of how to find cycles is from this answer:
data = {
'A' : 'AF',
'BF': 'B',
'AF': 'Z',
'Z' : None,
'B' : 'B'
}
def visit_graph(graph, node, visited_nodes):
    print("\tcurrent node: ", node, "\tvisited nodes: ", visited_nodes)
    # None means we have reached a node that doesn't have any children
    if node is None:
        return visited_nodes
    # The current node has already been seen, the graph has a cycle we must exit
    if node in visited_nodes:
        raise Exception("graph contains a cycle")
    # Add the current node to the list of visited node to avoid cycles
    visited_nodes.append(node)
    # Recursively call the method with the child node of the current node
    return visit_graph(graph, graph.get(node), visited_nodes)
# "A" does not generate any cycle
print(visit_graph(data, "A", []))
# Starting at "B" or "BF" will generate cycles
try:
    print(visit_graph(data, "B", []))
except Exception as e:
    print(e)
try:
    print(visit_graph(data, "BF", []))
except Exception as e:
    print(e)
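The same visited-node idea can also be written as a loop, which avoids hitting the recursion limit on long chains. This is only a sketch: the name follow_chain and the choice of ValueError are assumptions of mine, not something from the answers above.

```python
def follow_chain(graph, start):
    """Follow key -> value links from start until None, refusing to revisit a node."""
    path = []
    seen = set()  # set membership is O(1), unlike `in` on a list
    node = start
    while node is not None:
        if node in seen:
            raise ValueError("graph contains a cycle at %r" % (node,))
        seen.add(node)
        path.append(node)
        node = graph.get(node)  # a missing key ends the walk, like None does
    return path


data = {'A': 'AF', 'BF': 'B', 'AF': 'Z', 'Z': None, 'B': 'B'}
print(follow_chain(data, 'A'))
```

Starting from 'B' or 'BF' raises ValueError instead of looping, because 'B' maps to itself.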
I'm trying to put string variables into a list/dictionary in Python 3.7 and retrieve them later for use.
I know that I can create a dictionary like:
string_dict1 = {"A":"A", "B":"B", "C":"C", "D":"D", "E":"E", "F":"F"}
and then retrieve the values, but that is not what I want in this specific case.
Here is the code:
A = ""
B = "ABD"
C = ""
D = "sddd"
E = ""
F = "dsas"
string_dict = {A:"A", B:"B", C:"C", D:"D", E:"E", F:"F"}
string_list = [A,B,C,D,E,F]
for key,val in string_dict.items():
    if key == "":
        print(val)
for item in string_list:
    if item == "":
        print(string_list.index(item))
The result I got is:
E
0
0
0
And the result I want is:
A
C
E
0
2
4
If you print string_dict you notice the problem:
string_dict = {A:"A", B:"B", C:"C", D:"D", E:"E", F:"F"}
print(string_dict)
# output: {'': 'E', 'ABD': 'B', 'sddd': 'D', 'dsas': 'F'}
It contains a single entry with the key "".
This is because you are assigning multiple values to the same key, which is not possible in Python, so only the last assignment survives (in this case E:"E").
If you want to associate multiple values with the same key, you could associate a list:
string_dict = {A:["A","C","E"], B:"B", D:"D", F:"F"}
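One way to build such a mapping without writing it out by hand is to group the variable names by their string value. A minimal sketch, assuming you can keep the names as dictionary keys and the strings as values (the names values and names_by_value are hypothetical):

```python
from collections import defaultdict

# Names as keys, strings as values, so duplicate strings cannot collide
values = {"A": "", "B": "ABD", "C": "", "D": "sddd", "E": "", "F": "dsas"}

# Invert the mapping: each string value collects every name that holds it
names_by_value = defaultdict(list)
for name, value in values.items():
    names_by_value[value].append(name)

empty_names = names_by_value[""]
for name in empty_names:
    print(name)
```

On Python 3.7+ dictionaries keep insertion order, so the names come out in the order they were defined.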
Regarding the list of strings string_list, you get 0 because the method .index(item) returns the index of the first occurrence of item in the list, which in your case is 0. For example, if you change the list [A,B,C,D,E,F] to [B,B,C,D,E,F], your code will print 2.
If you want to print the index of the empty string in your list:
for index, value in enumerate(string_list):
    if value == '':
        print(index)
Or in a more elegant way you can use a list comprehension:
[i for i,x in enumerate(string_list) if x=='']
Well, I don't think there's a way to get what you want from a dictionary because of how they work. You can print your dictionary and see that it looks like this:
{'': 'E', 'ABD': 'B', 'sddd': 'D', 'dsas': 'F'}
What happened here is A was overwritten by C and then E.
But I played around with the list and here's how I got the last three digits right:
for item in string_list:
    if item != '':
        print(string_list.index(item) - 1)
This prints:
0
2
4
I am trying to extract values in one column based on another column in pandas.
For example, suppose I have two columns in a dataframe as below:
>>> check
  child parent
0     b      a
1     c      a
2     d      b
3     e      d
Now I want to extract all values in column "child" that can be reached from a given value in column "parent".
My initial value can differ; for now suppose it is "a" in column "parent".
The length of the dataframe might also differ.
I tried the code below, but it does not work when there are more levels of matching values or the dataframe is longer:
check = pd.read_csv("Book2.csv",encoding='cp1252')
new = (check.loc[check['parent'] == 'a', 'child']).tolist()
len(new)
a=[]
a.append(new)
for i in range(len(new)):
    new[i]
    new1 = (check.loc[check['parent'] == new[i], 'child']).tolist()
    len(new1)
    if(len(new1)>0):
        a.append(new1)
        for i in range(len(new1)):
            new2 = (check.loc[check['parent'] == new1[i], 'child']).tolist()
            if(len(new1)>0):
                a.append(new2)
flat_list = [item for sublist in a for item in sublist]
>>> flat_list
['b', 'c', 'd', 'e']
Is there a more efficient way to get the desired result? Any advice would be a great help.
Recursion is a way to do it. Suppose that check is your dataframe, define a recursive function:
final = [] #empty list which is used to store all results
def getchilds(df, res, value):
    where = df['parent'].isin([value]) #check rows where parent is equal to value
    newvals = list(df['child'].loc[where]) #get the corresponding child values
    if len(newvals) > 0:
        res.extend(newvals)
        for i in newvals: #recursive calls using child values
            getchilds(df, res, i)
getchilds(check, final, 'a')
print(final)
print(final) prints ['b', 'c', 'd', 'e'] if check is your example.
This works if you do not have cyclic calls, like 'b' is child of 'a' and 'a' is child of 'b'. If this is the case, you need to add further checks to prevent infinite recursion.
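One possible form of that extra check is to remember which values have already been visited. The sketch below is an assumption-laden illustration, not the answer's method: it works on plain (child, parent) pairs, which for the dataframe above could be obtained with zip(check['child'], check['parent']), and the function name descendants is hypothetical.

```python
from collections import defaultdict, deque


def descendants(pairs, root):
    """Return every value reachable from root, visiting each value only once."""
    # Build a parent -> [children] lookup once, instead of filtering per call
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)

    seen = {root}  # values already queued; this is what stops cycles
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


pairs = [('b', 'a'), ('c', 'a'), ('d', 'b'), ('e', 'd')]
print(descendants(pairs, 'a'))
```

Adding a cyclic pair such as ('a', 'b') leaves the result unchanged, because 'a' is already in seen when it is reached again.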
out_dict = {}
for v in pd.unique(check['parent']):
    out_dict[v] = list(pd.unique(check['child'][check['parent']==v]))
Then calling out_dict prints:
{'a': ['b', 'c'], 'b': ['d'], 'd': ['e']}
Let me just make a guess and say you want to get all the values of the column child where the parent value is x:
import pandas as pd
def get_x_values_of_y(comparison_val, df, val_type="get_parent"):
    # Column to return and column to filter on, depending on val_type
    val_to_be_found = ["child","parent"][val_type=="get_parent"]
    val_existing = ["child","parent"][val_type != "get_parent"]
    # Compare against the value passed in, not a hard-coded "a"
    mask_value = df[val_existing] == comparison_val
    to_be_found_column = df[mask_value][val_to_be_found]
    unique_results = to_be_found_column.unique().tolist()
    return unique_results
check = pd.read_csv("Book2.csv",encoding='cp1252')
# to get results of all parents of child "a"
print(get_x_values_of_y("a", check))
# to get results of all children of parent "b"
print(get_x_values_of_y("b", check, val_type="get_child"))
# to get results of all parents of every child
list_of_all_children = check["child"].unique().tolist()
for each_child in list_of_all_children:
    print(get_x_values_of_y(each_child, check))
# to get results of all children of every parent
list_of_all_parents = check["parent"].unique().tolist()
for each_parent in list_of_all_parents:
    print(get_x_values_of_y(each_parent, check, val_type= "get_child"))
Hope this solves your problem.
Say I have a string of nested parentheses as follows: ((AB)CD(E(FG)HI((J)K))) (assume the parentheses are balanced and properly nested).
How do I recursively remove the first set of fully enclosed parentheses in every set of fully enclosed parentheses?
In this case the result would be (ABCD(E(FG)HI(JK))). (AB) would become AB because (AB) is the first set of closed parentheses in a set of closed parentheses (from (AB) to K))), E is also the first element of a set of parentheses but since it doesn't have parentheses nothing is changed, and (J) is the first element in the set ((J)K) and therefore its parentheses would be removed.
This is similar to building an expression tree. So far I have parsed the string into a nested list and thought I could recursively check whether the first element of every nested list is itself a list (isinstance(elem, list)), but I don't know how.
The nested list is as follows:
arr = [['A', 'B'], 'C', 'D', ['E', ['F', 'G'], 'H', 'I', [['J'], 'K']]]
Perhaps convert it into:
arr = [A, B, C, D, [E, [F, G], H, I, [J, K]]]
Is there a better way?
If I understood the question correctly, this ugly function should do the trick:
def rm_parens(s):
    s2 = []
    consec_parens = 0
    inside_nested = False
    for c in s:
        if c == ')' and inside_nested:
            inside_nested = False
            consec_parens = 0
            continue
        if c == '(':
            consec_parens += 1
        else:
            consec_parens = 0
        if consec_parens == 2:
            inside_nested = True
        else:
            s2.append(c)
    s2 = ''.join(s2)
    if s2 == s:
        return s2
    return rm_parens(s2)
s = '((AB)CD(E(FG)HI((J)K)))'
s = rm_parens(s)
print(s)
Note that this function will call itself recursively until no consecutive parentheses exist. However, in your example, ((AB)CD(E(FG)HI((J)K))), a single call is enough to produce (ABCD(E(FG)HI(JK))).
You need to reduce your logic to something clear enough to program. What I'm getting from your explanations would look like the code below. Note that I haven't dealt with edge cases: you'll need to check for None elements, hitting the end of the list, etc.
def simplfy(parse_list):
    # Find the first included list;
    # build a new list with that set of brackets removed.
    reduce_list = parse_list
    for pos in range(len(parse_list)):
        elem = parse_list[pos]
        if isinstance(elem, list):
            # remove brackets; construct new list
            reduce_list = parse_list[:pos]
            reduce_list.extend(elem)
            reduce_list.extend(parse_list[pos+1:])
            # only the first included list is unwrapped
            break
    # Recur on each list element
    return_list = []
    for elem in reduce_list:
        if isinstance(elem, list):
            return_list.append(simplfy(elem))
        else:
            return_list.append(elem)
    return return_list
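If the rule is read strictly as in the question (unwrap a nested list only when it is the first element of its parent), a shorter recursive sketch is possible. The function name unwrap_first is hypothetical, and this reflects my reading of the question rather than the answer's interpretation above:

```python
def unwrap_first(node):
    """Recursively splice a list's first element into it when that element is a list."""
    if not isinstance(node, list):
        return node  # plain items are returned unchanged
    # Process the children first, so inner levels are already simplified
    children = [unwrap_first(child) for child in node]
    if children and isinstance(children[0], list):
        # Drop one level of brackets around the first element only
        return children[0] + children[1:]
    return children


arr = [['A', 'B'], 'C', 'D', ['E', ['F', 'G'], 'H', 'I', [['J'], 'K']]]
print(unwrap_first(arr))
```

Here ['F', 'G'] keeps its brackets because it is not the first element of its parent, while ['A', 'B'] and ['J'] are spliced in.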
Suppose I have a list where each element is either a name, or a list of rooms reserved by the preceding name.
[["Bob"], ["125A", "154B", "643A"], ["142C", "192B"], ["653G"],
["Carol"], ["95H", "123C"], ["David"], ["120G"]]
So in this case, Bob has the rooms 125A, 154B, 643A, 142C, 192B, and 653G reserved, etc.
How do I construct a function which would turn the above into the following format:
[["Bob", "125A", "154B", "643A", "142C", "192B", "653G"], ["Carol"...
Essentially concatenating [name] with all the [list of room reservations], until the next instance of [name]. I have a function which takes a list, and returns True if a list is a name, and False if it is a list of room reservations, so effectively I have:
[True, False, False, False, True, False, True, False] for the above list, but I'm not sure how that would help me, if at all. Assume that if a list contains a name, it contains only one name.
Given the following method
def is_name(x):
    return # if x is a name or not
a simple and short solution is to use a defaultdict.
Example:
from collections import defaultdict
def do_it(source):
    dd = defaultdict(list)
    for item in sum(source, []): # just use your favourite flattening method here
        if is_name(item):
            name = item
        else:
            dd[name].append(item)
    return [[k]+v for k,v in dd.items()]
for s in do_it(l):
    print(s)
Output:
['Bob', '125A', '154B', '643A', '142C', '192B', '653G']
['Carol', '95H', '123C']
['David', '120G']
Bonus:
This one uses a generator for laziness
import itertools
def do_it(source):
    name, items = None, []
    for item in itertools.chain.from_iterable(source):
        if is_name(item):
            if name:
                yield [name] + items
                name, items = None, []
            name = item
        else:
            items.append(item)
    yield [name] + items
I'll preface this by saying that I strongly agree with @uʍopǝpısdn's suggestion. However, if your setup precludes changing it for some reason, this seems to work (although it isn't pretty):
# Original list
l = [["Bob"],["125A", "154B", "643A"],["142C", "192B"], ["653G"], ["Carol"], ["95H", "123C"], ["David"], ["120G"]]
# This is the result of your checking function
mapper = [True, False, False, False, True, False, True, False]
# Final list
combined = []
# Generic counters
# Position in arrays
i = 0
# Position in combined list
k = 0
# Loop through the main list until the end.
# We don't use a for loop here because we want to be able to control the
# position of i.
while i < len(l):
    # If the corresponding value is True, start building the list
    if mapper[i]:
        # This is an example of how the code gets messy quickly
        combined.append([l[i][0]])
        i += 1
        # Now that we've hit a name, loop until we hit another, adding the
        # non-name information to the original list
        while i < len(mapper) and not mapper[i]:
            # extend, so that every room in the sub-list is added, not just the first
            combined[k].extend(l[i])
            i += 1
        # increment the position in our combined list
        k += 1
print(combined)
Assume that the function which takes a list and returns True or False based on whether list contains name or rooms is called containsName() ...
def process(items):
    results = []
    name_and_rooms = []
    for item in items:
        if containsName(item):
            if name_and_rooms:
                results.append(name_and_rooms[:])
                name_and_rooms = []
            name_and_rooms.append(item[0])
        else:
            name_and_rooms.extend(item)
    if name_and_rooms:
        results.append(name_and_rooms[:])
    return results
This will output a name even if no list of rooms follows it, e.g. [['bob'],['susan']].
Also, this will not merge repeated names, e.g. [['bob'],['123'],['bob'],['456']]. If that is desired, then you'll need to shove names into a temporary dict instead, with each room list as values to it. And then spit out the key-values of the dict at the end. But that on its own will not preserve the order of the names. If you care to preserve the order of the names, you can have another list that contains the order of the names and use that when spitting out the values in the dict.
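The dict-plus-order-list idea described above could look roughly like this. It is a sketch under assumptions: merge_by_name is a hypothetical name, and the checking function is passed in as contains_name so the example can run on its own with a stand-in check.

```python
def merge_by_name(items, contains_name):
    """Merge room lists under their name, combining repeated names in first-seen order."""
    rooms_by_name = {}  # name -> all rooms collected for that name
    order = []          # names in the order they first appeared
    current = None
    for item in items:
        if contains_name(item):
            current = item[0]
            if current not in rooms_by_name:
                rooms_by_name[current] = []
                order.append(current)
        elif current is not None:
            # Room lists that appear before any name are skipped
            rooms_by_name[current].extend(item)
    return [[name] + rooms_by_name[name] for name in order]


# Stand-in check for this example only: a name is purely alphabetic
looks_like_name = lambda item: item[0].isalpha()
merged = merge_by_name([['bob'], ['123'], ['susan'], ['bob'], ['456']], looks_like_name)
print(merged)
```

On Python 3.7+ a plain dict already preserves insertion order, so the separate order list is only needed on older versions; it is kept here to mirror the description.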
Really, you should be using a dict for this. This assumes that the order of lists doesn't change (the name is always first).
As others suggested you should re-evaluate your data structure.
>>> from itertools import chain
>>> li_combo = list(chain.from_iterable(lst))
>>> d = {}
>>> for i in li_combo:
...     if is_name(i):
...         k = i
...         if k not in d:
...             d[k] = []
...     else:
...         d[k].append(i)
...
>>> final_list = [[k]+d[k] for k in d]
>>> final_list
[['Bob', '125A', '154B', '643A', '142C', '192B', '653G'], ['Carol', '95H', '123C'], ['David', '120G']]
reduce is your answer. Your data is this:
l=[['Bob'], ['125A', '154B', '643A'], ['142C', '192B'], ['653G'], ['Carol'], ['95H', '123C'], ['David'], ['120G']]
You say you've already got a function that determines if an element is a name. Here is my one:
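import re
from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
def is_name(s):
    # [A-Za-z] rather than [A-z], which would also match characters such as '[' and '_'
    return re.match("[A-Za-z]+$",s) and True or False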
Then, using reduce, it is a one liner:
reduce(lambda c, n: is_name(n[0]) and c+[n] or c[:-1]+[c[-1]+n], l, [])
Result is:
[['Bob', '125A', '154B', '643A', '142C', '192B', '653G'], ['Carol', '95H', '123C'], ['David', '120G']]
I have a list of data that includes both command strings and the alphabet, upper and lowercase, totaling 512+ strings (including sub-lists). I want to parse the input data, but I can't think of any way to do it properly other than starting from the largest possible command size and cutting it down until I find a command that matches the string, then outputting the location of the command, but that takes forever. Any other way I can think of will cause overlapping. I'm doing this in Python.
Say:
L = ['a', 'b',['aa','bb','cc'], 'c']
For 'bb' the output would be '0201', and for 'c' it would be '03'.
So how should I do this?
It sounds like you're searching through the list for every substring. How about building a dict to look up the keys? Of course, you still have to start searching at the longest subkey.
L = ['a', 'b',['aa','bb','cc'], 'c']
def lookups( L ):
    """ returns `item`, `code` tuples """
    for i, item in enumerate(L):
        if isinstance(item, list):
            for j, sub in enumerate(item):
                yield sub, "%02d%02d" % (i,j)
        else:
            yield item, "%02d" % i
You could then lookup substrings with:
lookupdict = dict(lookups(L))
print(lookupdict['bb']) # but you have to do 'bb' before trying 'b' ...
But if the key length is not just 1 or 2, it might also make sense to group the items into separate dicts where each key has the same length.
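To make the longest-key-first idea concrete, here is a rough sketch of a greedy encoder over an input string. The helper names build_lookup and encode are hypothetical, and raising ValueError on unknown input is my assumption about the desired behaviour:

```python
def build_lookup(commands):
    """Map each command string to its zero-padded position code."""
    lookup = {}
    for i, item in enumerate(commands):
        if isinstance(item, list):
            for j, sub in enumerate(item):
                lookup[sub] = "%02d%02d" % (i, j)
        else:
            lookup[item] = "%02d" % i
    return lookup


def encode(text, lookup):
    """Greedily match the longest known command at each position of text."""
    # Only try key lengths that actually occur, longest first
    lengths = sorted({len(key) for key in lookup}, reverse=True)
    codes = []
    pos = 0
    while pos < len(text):
        for size in lengths:
            chunk = text[pos:pos + size]
            if chunk in lookup:
                codes.append(lookup[chunk])
                pos += len(chunk)
                break
        else:
            # No key of any length matched at this position
            raise ValueError("no command matches at position %d" % pos)
    return codes


L = ['a', 'b', ['aa', 'bb', 'cc'], 'c']
lookup = build_lookup(L)
print(encode("bbc", lookup))
```

Each position costs at most one dict lookup per distinct key length, rather than a scan of the whole list.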
If you must use this data structure:
# MutableSequence lives in collections.abc on Python 3 (plain collections on Python 2)
from collections.abc import MutableSequence
def scanList( command, theList ):
    for i, elt in enumerate( theList ):
        if elt == command:
            return ( i, None )
        if isinstance( elt, MutableSequence ):
            for j, elt2 in enumerate( elt ):
                if elt2 == command:
                    return i, j
L = ['a', 'b',['aa','bb','cc'], 'c']
print( scanList( "bb", L ) )
# (2, 1)
print( scanList( "c", L ) )
# (3, None)
BUT
This is a bad data structure. Are you able to get this data in a nicer form?