Recursive tree search to get the node level - python

I have a dataset that contains a tree similar to the tree below.
son father
1 1 NA
2 2 1
3 3 1
4 4 2
5 5 NA
6 6 2
7 7 4
8 8 5
9 9 4
I built a function that lets me retrieve the entire hierarchy below a node (son):
getTree = function(sons){
  if( length(sons) > 0 ){
    sons = subset(df, father %in% sons)[['son']]
    sons = c(sons, getTree( sons ))
  }
  return(sons)
}
subset(df, son %in% getTree(2))
That returns:
son father
4 4 2
6 6 2
7 7 4
9 9 4
However, in addition to the hierarchy, I need to know the level of the tree at which each node (child) sits. How can I change this function, or write another one, to achieve this?
Thanks in advance!

I'm not sure exactly what your function is meant to find in the tree, but here's an example in Python that finds the deepest child nodes in the table along with their depth. It uses a counter that is incremented on each recursive call to keep track of the depth:
In [140]: def traverse(sons, depth=0):
     ...:     next_sons = sons[sons['father'].isin(sons['son'])]
     ...:     if len(next_sons) > 0:
     ...:         return traverse(next_sons, depth+1)
     ...:     return sons, depth
In [141]: traverse(df)
Out[141]:
(   son  father
 7    7     4.0
 9    9     4.0,
 3)
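If what is needed is the level of every node in the hierarchy rather than only the deepest ones, the same depth counter can be carried along while collecting the nodes. The following is a minimal sketch, not the asker's method: the name tree_levels is made up for illustration, the frame is rebuilt from the table in the question, and it assumes the data contains no cycles.

```python
import pandas as pd

# Example data rebuilt from the table in the question (NA becomes NaN).
df = pd.DataFrame({
    "son": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "father": [None, 1, 1, 2, None, 2, 4, 5, 4],
})

def tree_levels(df, sons, level=0):
    """Return (node, level) pairs for `sons` and all of their descendants.

    Hypothetical helper: `sons` is a list of starting nodes, which are
    assigned level 0. Assumes the father/son links contain no cycles.
    """
    if len(sons) == 0:
        return []
    # Direct children of the current generation.
    children = df.loc[df["father"].isin(sons), "son"].tolist()
    # Current generation at this level, then recurse one level deeper.
    return [(s, level) for s in sons] + tree_levels(df, children, level + 1)

levels = pd.DataFrame(tree_levels(df, [2]), columns=["son", "lvl"])
print(levels)
```

Calling tree_levels(df, [1]) instead starts the numbering from node 1.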

Here is one recursive option that keeps track of the node level using a data.frame:
f <- function(sons) {
  getTree <- function(s.df) {
    repeat {
      sons <- subset(
        df,
        father %in% s.df$sons[s.df$lvl == max(s.df$lvl)]
      )[["son"]]
      if (length(sons) == 0) {
        return(s.df)
      }
      p <- data.frame(sons = sons, lvl = max(s.df$lvl) + 1)
      s.df <- rbind(s.df, getTree(p))
    }
  }
  getTree(data.frame(sons = sons, lvl = 0))
}
where the levels always start from 0 for the input argument sons to function f, such that
> f(1)
sons lvl
1 1 0
2 2 1
3 3 1
4 4 2
5 6 2
6 7 3
7 9 3
> f(2)
sons lvl
1 2 0
2 4 1
3 6 1
4 7 2
5 9 2
> f(5)
sons lvl
1 5 0
2 8 1

Related

list index out of range in calculation of nodal distance

I am working on a small task in which I have to find the distance between two nodes. Each node has X and Y coordinates which can be seen below.
node_number X_coordinate Y_coordinate
0 0 1 0
1 1 1 1
2 2 1 2
3 3 1 3
4 4 0 3
5 5 0 4
6 6 1 4
7 7 2 4
8 8 3 4
9 9 4 4
10 10 4 3
11 11 3 3
12 12 2 3
13 13 2 2
14 14 2 1
15 15 2 0
For this purpose, I wrote the code below:
X1_coordinate = df['X_coordinate'].tolist()
Y1_coordinate = df['Y_coordinate'].tolist()
node_number1 = df['node_number'].tolist()
nodal_dist = []
i = 0
for i in range(len(node_number1)):
    dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
    nodal_dist.append(dist)
I got the error
list index out of range
Kindly let me know what I am doing wrong and what I should change to get the answer.
Indexing starts at zero, so the last element in a list has an index that is one less than the number of elements in that list. The len() function gives you the number of elements, so range(len(node_number1)) runs i up to the last valid index, and i+1 then points one past the end of the list. Make the range of your loop len(node_number1) - 1 to avoid this off-by-one error.
The problem is in this line:
dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
X1_coordinate[i+1] and Y1_coordinate[i+1] go out of range on the last iteration of the loop.
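A minimal sketch of the corrected loop, using the first five nodes from the table as plain lists in place of the DataFrame columns. The only change to the loop is the range bound, and math.hypot is swapped in for the explicit square root.

```python
import math

# First five nodes from the table in the question, as plain lists.
X1_coordinate = [1, 1, 1, 1, 0]
Y1_coordinate = [0, 1, 2, 3, 3]

nodal_dist = []
# Stop one short of the end so that i + 1 is always a valid index.
for i in range(len(X1_coordinate) - 1):
    dist = math.hypot(X1_coordinate[i + 1] - X1_coordinate[i],
                      Y1_coordinate[i + 1] - Y1_coordinate[i])
    nodal_dist.append(dist)

print(nodal_dist)
```

With N nodes this yields N - 1 distances, one per consecutive pair.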

Drop rows if value in column changes

Assume I have the following pandas data frame:
my_class value
0 1 1
1 1 2
2 1 3
3 2 4
4 2 5
5 2 6
6 2 7
7 2 8
8 2 9
9 3 10
10 3 11
11 3 12
I want to identify the indices of "my_class" where the class changes and remove n rows after and before this index. The output of this example (with n=2) should look like:
my_class value
0 1 1
5 2 6
6 2 7
11 3 12
My approach:
# where class changes happen
s = df['my_class'].ne(df['my_class'].shift(-1).fillna(df['my_class']))
# mask with `bfill` and `ffill`
df[~(s.where(s).bfill(limit=1).ffill(limit=2).eq(1))]
Output:
my_class value
0 1 1
5 2 6
6 2 7
11 3 12
One possible solution is to:
Make use of the fact that the index contains consecutive integers.
Find the index values where the class changes.
For each such index i, generate the sequence of indices from i-2 to i+1 (that is, from i-n to i+n-1 with n=2) and concatenate the sequences.
Retrieve the rows whose indices are not in this list.
The code to do it is:
ind = df[df['my_class'].diff().fillna(0, downcast='infer') == 1].index
df[~df.index.isin([item for sublist in
                   [range(i-2, i+2) for i in ind] for item in sublist])]
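Alternatively, a self-contained version with an explicit loop:
import numpy as np
import pandas as pd

# Rebuild the example frame
my_class = np.array([1] * 3 + [2] * 6 + [3] * 3)
cols = np.c_[my_class, np.arange(len(my_class)) + 1]
df = pd.DataFrame(cols, columns=['my_class', 'value'])
# A diff of 1 marks the first row of each new class
df['diff'] = df['my_class'].diff().fillna(0)
idx2drop = []
for i in df[df['diff'] == 1].index:
    # Two rows before the change and two rows from the change onward
    idx2drop += range(i - 2, i + 2)
print(df.drop(idx2drop)[['my_class', 'value']])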
Output:
my_class value
0 1 1
5 2 6
6 2 7
11 3 12
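The answers above hard-code n=2 and rely on class labels that increase by exactly 1. A possible generalisation is sketched below; the function name drop_around_changes is made up for illustration. It works on row positions, so it does not need a consecutive integer index, and it assumes that "n rows before and after" means positions i-n through i+n-1 around each change position i, which is what the expected output in the question implies.

```python
import numpy as np
import pandas as pd

def drop_around_changes(df, col, n):
    """Drop the n rows before each change in df[col] and the n rows from it onward."""
    values = df[col].to_numpy()
    # Positions of the first row of each new run (value differs from the previous row).
    change_pos = np.flatnonzero(values[1:] != values[:-1]) + 1
    keep = np.ones(len(df), dtype=bool)
    for i in change_pos:
        # The slice end is clamped automatically; the start is clamped by hand.
        keep[max(i - n, 0):i + n] = False
    return df[keep]

df = pd.DataFrame({"my_class": [1] * 3 + [2] * 6 + [3] * 3,
                   "value": list(range(1, 13))})
result = drop_around_changes(df, "my_class", n=2)
print(result)
```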

Do python recursive calls interfere with each other?

I am trying to set up a recursive game solver (for the Cracker Barrel peg game). The recursive function does not appear to be operating correctly, and some outputs are created with no trace of how they were created (despite logging all steps). Is it possible that the Python recursion steps are overwriting each other?
I have already tried adding print statements at every step. The game rules and algorithms work correctly, but the recursive play algorithm is not operating as expected:
def recursive_play(board, moves_list, move_history, id, first_trial, recurse_counter):
    # Check how many moves are left
    tacks_left = len(char_locations(board, character=tack, grid=True))
    log_and_print(f"tacks_left: {tacks_left}")
    log_and_print(f"moves_left: {len(moves_list)}")
    log_and_print(f"moves_list: {moves_list}")
    if (len(moves_list) == 0):
        if (tacks_left == 1):
            # TODO: Remove final move separator
            log_and_print(f"ONE TACK LEFT :)!!!!")
            log_and_print(f"move_history to retrun for win: {move_history}")
            return move_history
        pass
    elif (len(moves_list) > 0):
        # Scan through all moves and make them recursively
        for move in moves_list:
            if first_trial:
                id += 1
            else:
                # id += 1
                id = id
            next_board = make_move(board, move)
            next_moves = possible_moves(next_board)
            if first_trial:
                next_history = "START: " + move
            else:
                next_history = move_history + round_separator + move
            # log_and_print(f"og_board:")
            prettify_board(board)
            log_and_print(f"move: {move}")
            log_and_print(f"next_board:")
            prettify_board(next_board)
            # log_and_print(f"next_moves: {next_moves}")
            log_and_print(f"next_history: {next_history}")
            log_and_print(f"id: {id}")
            log_and_print(f"recurse_counter: {recurse_counter}")
            # NOTE: Would this be cleaner with queues?
            recursive_play(next_board, moves_list=next_moves, move_history=next_history, id=id, first_trial=False, recurse_counter=recurse_counter+1)
        log_and_print(f"finished scanning all moves for board: {board}")
I expect all steps to be logged, and "START" should only occur on the first trial. However, a mysterious "START" appears in a later step with no trace of how that board was created.
Good Output:
INFO:root:next_history: START: 4 to 2 to 1 , 6 to 5 to 4 , 1 to 3 to 6 , 7 to 4 to 2
INFO:root:id: 1
INFO:root:recurse_counter: 3
INFO:root:tacks_left: 5
INFO:root:moves_left: 2
INFO:root:moves_list: ['9 to 8 to 7', '10 to 6 to 3']
INFO:root:o---
INFO:root:xo--
INFO:root:oox-
INFO:root:xoox
INFO:root:move: 9 to 8 to 7
INFO:root:next_board:
INFO:root:o---
INFO:root:xo--
INFO:root:oox-
INFO:root:xoox
INFO:root:next_history: START: 4 to 2 to 1 , 6 to 5 to 4 , 1 to 3 to 6 , 7 to 4 to 2 , 9 to 8 to 7
INFO:root:id: 1
INFO:root:recurse_counter: 4
INFO:root:tacks_left: 4
INFO:root:moves_left: 1
INFO:root:moves_list: ['10 to 6 to 3']
INFO:root:o---
INFO:root:xx--
INFO:root:ooo-
INFO:root:xooo
INFO:root:move: 10 to 6 to 3
INFO:root:next_board:
INFO:root:o---
INFO:root:xx--
INFO:root:ooo-
INFO:root:xooo
INFO:root:next_history: START: 4 to 2 to 1 , 6 to 5 to 4 , 1 to 3 to 6 , 7 to 4 to 2 , 9 to 8 to 7 , 10 to 6 to 3
Bad Output:
INFO:root:move: 6 to 3 to 1
INFO:root:next_board:
INFO:root:x---
INFO:root:xo--
INFO:root:ooo-
INFO:root:oooo
INFO:root:next_history: START: 6 to 3 to 1
INFO:root:id: 2
INFO:root:recurse_counter: 0
INFO:root:tacks_left: 2
INFO:root:moves_left: 1
INFO:root:moves_list: ['1 to 2 to 4']
INFO:root:o---
INFO:root:oo--
INFO:root:xoo-
INFO:root:oooo
INFO:root:move: 1 to 2 to 4
INFO:root:next_board:
INFO:root:o---
INFO:root:oo--
INFO:root:xoo-
INFO:root:oooo
INFO:root:next_history: START: 6 to 3 to 1 , 1 to 2 to 4
INFO:root:id: 2
INFO:root:recurse_counter: 1
INFO:root:tacks_left: 1
INFO:root:moves_left: 0
INFO:root:moves_list: []
INFO:root:ONE TACK LEFT :)!!!!
INFO:root:move_history to retrun for win: START: 6 to 3 to 1 , 1 to 2 to 4
INFO:root:finished scanning all moves for board: ['o---', 'oo--', 'xoo-', 'oooo']
Any tips anyone can provide would be greatly appreciated.
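On the question in the title: recursive calls do not overwrite each other's local variables, because each call gets its own frame. What they can share is any mutable object that is passed from one call to the next. The logs above print identical grids for the board and for next_board, which would be consistent with make_move modifying the board it receives, but make_move is not shown, so that is only a guess. The sketch below illustrates the difference on a toy one-row board; make_move_in_place, make_move_copy and descend are hypothetical stand-ins, not the asker's functions.

```python
import copy

def make_move_in_place(board, move):
    # Hypothetical stand-in: mutates the list the caller passed in.
    row, col = move
    board[row][col] = "o"
    return board

def make_move_copy(board, move):
    # Hypothetical stand-in: leaves the caller's board untouched.
    new_board = copy.deepcopy(board)
    row, col = move
    new_board[row][col] = "o"
    return new_board

def descend(board, depth, make_move):
    """Recurse `depth` levels, marking one cell per level."""
    if depth == 0:
        return
    next_board = make_move(board, (0, depth - 1))
    descend(next_board, depth - 1, make_move)

shared = [["x", "x"]]
descend(shared, 2, make_move_in_place)
print(shared)    # the caller's board was changed by the deeper calls

isolated = [["x", "x"]]
descend(isolated, 2, make_move_copy)
print(isolated)  # the caller's board is unchanged
```

If the in-place pattern is what is happening, the second pass of the top-level loop would start from a board already altered by the first branch, which could produce a "START" entry that does not match any logged position.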

Python: how to find values in a dataframe without loop?

I have two dataframes net and M.
net =
i j d
0 5 3 3
1 2 0 2
2 3 2 1
3 4 5 2
4 0 1 3
5 0 3 4
M =
0 1 2 3 4 5
0 0 3 2 4 1 5
1 3 0 2 0 3 3
2 2 2 0 1 1 4
3 4 0 1 0 3 3
4 1 3 1 3 0 2
5 5 3 4 3 2 0
I want to find the cells of M whose values equal net['d'], choose one of those cells at random, and create a new dataframe containing the coordinates of that cell. For instance:
net['d'][0] = 3
so in M I find:
M[0][1]
M[1][0]
M[1][4]
M[1][5]
...
Finally, net1 would look something like this:
net1 =
i1 j1 d1
0 1 5 3
1 5 4 2
2 2 3 1
3 1 2 2
4 1 5 3
5 3 0 4
This is what I am doing:
I1 = []
J1 = []
for i in net.index:
    tmp = net['d'][i]
    ds = np.where( M == tmp)
    size = len(ds[0])
    ind = randint(size) ## find two random locations with distance ds
    h = ds[0][ind]
    w = ds[1][ind]
    I1.append(h)
    J1.append(w)
net1 = pd.DataFrame()
net1['i1'] = I1
net1['j1'] = J1
net1['d1'] = net['d']
I am wondering what the best way to avoid that loop is.
You can stack the columns of M and then just sample it with replacement:
import numpy as np
import pandas as pd

net = pd.DataFrame({'i':[5,2,3,4,0,0],
                    'j':[3,0,2,5,1,3],
                    'd':[3,2,1,2,3,4]})
M = pd.DataFrame({0:[0,3,2,4,1,5],
                  1:[3,0,2,0,3,3],
                  2:[2,2,0,1,1,4],
                  3:[4,0,1,0,3,3],
                  4:[1,3,1,3,0,2],
                  5:[5,3,4,3,2,0]})

def random_net(net, M):
    # make long table (one row per cell of M) and rename columns
    net1 = M.stack().reset_index()
    net1.columns =['i1', 'j1', 'd1']
    # get size of each group for random mapping
    net1_id_length = net1.groupby('d1').size()
    # add id column to uniquely identify row in net
    net_copy = net.copy()
    # first map gets size of each group and second gets random integer
    net_copy['id'] = net_copy['d'].map(net1_id_length).map(np.random.randint)
    net1['id'] = net1.groupby('d1').cumcount()
    # make for easy lookup
    net_copy = net_copy.set_index(['d', 'id'])
    net1 = net1.set_index(['d1', 'id'])
    # choose from net1 only those from original net
    return net1.reindex(net_copy.index).reset_index('d').reset_index(drop=True).rename(columns={'d':'d1'})

random_net(net, M)
output
d1 i1 j1
0 3 5 1
1 2 0 2
2 1 3 2
3 2 1 2
4 3 3 5
5 4 0 3
Timings on 6 million rows
n = 1000000
net = pd.DataFrame({'i':[5,2,3,4,0,0] * n,
'j':[3,0,2,5,1,3] * n,
'd':[3,2,1,2,3,4] * n})
M = pd.DataFrame({0:[0,3,2,4,1,5],
1:[3,0,2,0,3,3],
2:[2,2,0,1,1,4],
3:[4,0,1,0,3,3],
4:[1,3,1,3,0,2],
5:[5,3,4,3,2,0]})
%timeit random_net(net, M)
1 loop, best of 3: 13.7 s per loop
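As a different take on the same idea, the lookup can also be done directly in NumPy by sorting the flattened values of M once and then drawing a random offset inside each value's run. This is only a sketch and has not been timed against the version above. The name random_cells and the rng argument are made up for illustration, and it assumes every value in net['d'] occurs somewhere in M.

```python
import numpy as np
import pandas as pd

def random_cells(net, M, rng=None):
    """For each net['d'], pick a random cell of M holding that value."""
    rng = np.random.default_rng() if rng is None else rng
    values = M.to_numpy()
    flat = values.ravel()
    order = np.argsort(flat, kind="stable")   # flat positions, sorted by value
    sorted_vals = flat[order]
    d = net["d"].to_numpy()
    # [start, stop) is the run of each requested value in the sorted array.
    start = np.searchsorted(sorted_vals, d, side="left")
    stop = np.searchsorted(sorted_vals, d, side="right")
    # Assumes stop > start everywhere, i.e. every d occurs in M.
    pick = order[start + rng.integers(0, stop - start)]
    rows, cols = np.unravel_index(pick, values.shape)
    return pd.DataFrame({"i1": M.index.to_numpy()[rows],
                         "j1": M.columns.to_numpy()[cols],
                         "d1": d})

net = pd.DataFrame({"i": [5, 2, 3, 4, 0, 0],
                    "j": [3, 0, 2, 5, 1, 3],
                    "d": [3, 2, 1, 2, 3, 4]})
M = pd.DataFrame({0: [0, 3, 2, 4, 1, 5],
                  1: [3, 0, 2, 0, 3, 3],
                  2: [2, 2, 0, 1, 1, 4],
                  3: [4, 0, 1, 0, 3, 3],
                  4: [1, 3, 1, 3, 0, 2],
                  5: [5, 3, 4, 3, 2, 0]})
net1 = random_cells(net, M, rng=np.random.default_rng(0))
print(net1)
```

Each row of net1 is drawn independently, so the same cell can be chosen more than once.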

Using python df.replace with dict does not permanently change values

I generated a DataFrame that includes a column called "pred_categories" with numerical values of 0, 1, 2, and 3. See below:
fileids pred_categories
0 /Saf/DA192069.txt 3
1 /Med/DA000038.txt 2
2 /Med/DA000040.txt 2
3 /Saf/DA191905.txt 3
4 /Med/DA180730.txt 2
I wrote a dict:
di = {3: "SAF", 2: "MED", 1: "FAC", 0: "ENV"}
And it works at first:
df.replace({'pred_categories': di})
Out[16]:
fileids pred_categories
0 /Saf/DA192069.txt SAF
1 /Med/DA000038.txt MED
2 /Med/DA000040.txt MED
3 /Saf/DA191905.txt SAF
4 /Med/DA180730.txt MED
5 /Saf/DA192307.txt SAF
6 /Env/DA178021.txt ENV
7 /Fac/DA358334.txt FAC
8 /Env/DA178049.txt ENV
9 /Env/DA178020.txt ENV
10 /Env/DA178031.txt ENV
11 /Med/DA000050.txt MED
12 /Med/DA180720.txt MED
13 /Med/DA000010.txt MED
14 /Fac/DA358391.txt FAC
but then when checking
df.head()
it doesn't seem to permanently "save" it in the DataFrame. Any pointers on what I'm doing wrong?
print(df)
fileids pred_categories
0 /Saf/DA192069.txt 3
1 /Med/DA000038.txt 2
2 /Med/DA000040.txt 2
3 /Saf/DA191905.txt 3
4 /Med/DA180730.txt 2
5 /Saf/DA192307.txt 3
6 /Env/DA178021.txt 0
7 /Fac/DA358334.txt 1
8 /Env/DA178049.txt 0
9 /Env/DA178020.txt 0
10 /Env/DA178031.txt 0
11 /Med/DA000050.txt 2
12 /Med/DA180720.txt 2
13 /Med/DA000010.txt 2
14 /Fac/DA358391.txt 1
By default, .replace() returns a modified copy of the DataFrame rather than changing it in place, so you have to assign the result back:
df = df.replace({'pred_categories': di})
or
df.replace({'pred_categories': di}, inplace=True)
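A short check of that behaviour, using the first two rows of the question's data:

```python
import pandas as pd

df = pd.DataFrame({"fileids": ["/Saf/DA192069.txt", "/Med/DA000038.txt"],
                   "pred_categories": [3, 2]})
di = {3: "SAF", 2: "MED", 1: "FAC", 0: "ENV"}

# Without assignment the result is displayed (or discarded) and df stays as it was.
df.replace({"pred_categories": di})
before = df["pred_categories"].tolist()

# Assigning the result back makes the change stick.
df = df.replace({"pred_categories": di})
after = df["pred_categories"].tolist()

print(before, after)
```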
