I have one dataset with 7 variables and 5 indicators (df1):
   A   B   C   D  E   F  G R_1 R_2  R_3  R_4  R_5
0  4  16   5   7  1  12  9   B   C    D    F    A
1  8   4  10  14  4   5  9   B   E    A  NaN  NaN
A second key-value dataset (df2) gives the cut-off value for each variable:
  Variable  Value
0        A     11
1        B     15
2        C     22
3        D     25
4        E      3
5        F     14
6        G     15
I want to add another 5 columns, R_new_1 to R_new_5, built on this condition:
if R_1 = B and the value of B in df1 (16) > 15 (B's cut-off from df2), then B is dropped and the remaining indicators shift left:
df1['R_new_1'] = "C" (from R_2)
df1['R_new_2'] = "D" (from R_3)
df1['R_new_3'] = "F" (from R_4)
df1['R_new_4'] = "A" (from R_5)
df1['R_new_5'] = np.nan
The same check then repeats for each remaining value (now in R_new_1, R_new_2, ...), and so on.
  R_new_1 R_new_2 R_new_3 R_new_4 R_new_5
0       C       D       F       A     NaN
1       B       A     NaN     NaN     NaN
I have tried the following to automate this:
var_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
for col in var_list:
    df1[col + "_val"] = df2.loc[df2['Variable'] == col, 'Value'].iloc[0]
for col in var_list:
    # elementwise comparison; a plain `if` on a whole Series raises ValueError
    df1[col + "_ind"] = np.where(df1[col + "_val"] > df1[col], "OK", "NOK")
The first run checks for R_1. I am unable to recursively replace B with C in R_new_1, C with D in R_new_2, D with F in R_new_3, and F with A in R_new_4, and then continue checking for R_new_2 and so on.
import numpy as np  # needed for np.nan below
import pandas as pd
## dataframe constructors
# input
data1 = [{'A': 4, 'B': 16, 'C': 5, 'D': 7, 'E': 1, 'F': 12, 'G': 9, 'R_1':'B', 'R_2':'C', 'R_3':'D', 'R_4':'F', 'R_5':'A'},
         {'A': 8, 'B': 4, 'C': 10, 'D': 14, 'E': 4, 'F': 5, 'G': 9, 'R_1':'B', 'R_2':'E', 'R_3':'A', 'R_4':np.nan, 'R_5':np.nan}]
df1 = pd.DataFrame(data1)
# input
data2 = [['A', 11], ['B', 15], ['C', 22], ['D', 25], ['E', 3], ['F', 14], ['G', 15]]
df2 = pd.DataFrame(data2, columns=['Variable', 'Value'])
# desired output
data3 = [{'A': 4, 'B': 16, 'C': 5, 'D': 7, 'E': 1, 'F': 12, 'G': 9, 'R_1':'B', 'R_2':'C', 'R_3':'D', 'R_4':'F', 'R_5':'A', 'R_new_1':'C', 'R_new_2':'D', 'R_new_3':'F', 'R_new_4':'A', 'R_new_5':np.nan},
         {'A': 8, 'B': 4, 'C': 10, 'D': 14, 'E': 4, 'F': 5, 'G': 9, 'R_1':'B', 'R_2':'E', 'R_3':'A', 'R_4':np.nan, 'R_5':np.nan, 'R_new_1':'B', 'R_new_2':'A', 'R_new_3':np.nan, 'R_new_4':np.nan, 'R_new_5':np.nan}]
df3 = pd.DataFrame(data3)
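One way to produce df3 from df1 and df2 under the rule described above is a row-wise filter: map each variable to its cut-off, keep only the indicator labels whose measured value is within the cut-off, and pad the rest with NaN. A minimal sketch (shift_out_exceeding is my own helper name, not from the question):

cutoff = dict(zip(df2['Variable'], df2['Value']))  # variable -> cut-off value

def shift_out_exceeding(row, n=5):
    # keep a label only if this row's measured value does not exceed its cut-off
    kept = [lab for lab in row[[f'R_{i}' for i in range(1, n + 1)]]
            if pd.notna(lab) and row[lab] <= cutoff[lab]]
    kept += [np.nan] * (n - len(kept))  # pad back to n slots
    return pd.Series(kept, index=[f'R_new_{i}' for i in range(1, n + 1)])

result = df1.join(df1.apply(shift_out_exceeding, axis=1))

On the sample data, result matches df3 above.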
I have a large 1D NumPy array a of some comparable dtype; some of its elements may be repeated.
How do I find the sorting indexes ix that will stable-sort (stability in the sense described here) a by the frequencies of its values, in descending or ascending order?
I want the fastest and simplest way to do this. Maybe there is an existing standard NumPy function for it.
There is another related question here, but it asked specifically to remove duplicates, i.e. to output only unique sorted values; I need all values of the original array, including duplicates.
I've coded my first attempt at the task, but it is not the fastest (it uses a Python loop) and probably not the shortest/simplest form possible. The Python loop can be very expensive when equal elements repeat rarely and the array is huge. It would also be nice to have a short function that does all of this, if one is available in NumPy (e.g. an imaginary np.argsort_by_freq()).
Try it online!
import numpy as np
np.random.seed(1)
hi, n, desc = 7, 24, True
a = np.random.choice(np.arange(hi), (n,), p = (
    lambda p = np.random.random((hi,)): p / p.sum()
)())
us, cs = np.unique(a, return_counts = True)
af = np.zeros(n, dtype = np.int64)
for u, c in zip(us, cs):
    af[a == u] = c  # frequency of each element's value
if desc:
    ix = np.argsort(-af, kind = 'stable')  # Descending sort
else:
    ix = np.argsort(af, kind = 'stable')  # Ascending sort
print('rows: i_col(0) / original_a(1) / freqs(2) / sorted_a(3)')
print(' / sorted_freqs(4) / sorting_ix(5)')
print(np.stack((
    np.arange(n), a, af, a[ix], af[ix], ix,
), 0))
outputs:
rows: i_col(0) / original_a(1) / freqs(2) / sorted_a(3)
/ sorted_freqs(4) / sorting_ix(5)
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
[ 1 1 1 1 3 0 5 0 3 1 1 0 0 4 6 1 3 5 5 0 0 0 5 0]
[ 7 7 7 7 3 8 4 8 3 7 7 8 8 1 1 7 3 4 4 8 8 8 4 8]
[ 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 5 5 5 5 3 3 3 4 6]
[ 8 8 8 8 8 8 8 8 7 7 7 7 7 7 7 4 4 4 4 3 3 3 1 1]
[ 5 7 11 12 19 20 21 23 0 1 2 3 9 10 15 6 17 18 22 4 8 16 13 14]]
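As an aside, for this particular example, where a holds small non-negative integers, the frequency array can be built without the Python loop (a sketch assuming that dtype; the general case is handled in the answers below):

af = np.bincount(a)[a]  # count of each value, broadcast back to every element
ix = np.argsort(-af if desc else af, kind = 'stable')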
I might be missing something, but it seems that with a Counter you can sort the indexes of each element according to the count of that element's value, using the element value and then the index to break ties. For example:
from collections import Counter
a = [ 1, 1, 1, 1, 3, 0, 5, 0, 3, 1, 1, 0, 0, 4, 6, 1, 3, 5, 5, 0, 0, 0, 5, 0]
counts = Counter(a)
t = [(counts[v], v, i) for i, v in enumerate(a)]
t.sort()
print([v[2] for v in t])
t.sort(reverse=True)
print([v[2] for v in t])
Output:
[13, 14, 4, 8, 16, 6, 17, 18, 22, 0, 1, 2, 3, 9, 10, 15, 5, 7, 11, 12, 19, 20, 21, 23]
[23, 21, 20, 19, 12, 11, 7, 5, 15, 10, 9, 3, 2, 1, 0, 22, 18, 17, 6, 16, 8, 4, 14, 13]
If you want to maintain ascending order of indexes within groups with equal counts, you can just use a lambda function for the descending sort:
t.sort(key = lambda x:(-x[0],-x[1],x[2]))
print([v[2] for v in t])
Output:
[5, 7, 11, 12, 19, 20, 21, 23, 0, 1, 2, 3, 9, 10, 15, 6, 17, 18, 22, 4, 8, 16, 14, 13]
If you want elements with the same count to keep the order in which they first appeared in the array, then rather than sorting on the values, sort on the index of their first occurrence in the array:
a = [ 1, 1, 1, 1, 3, 0, 5, 0, 3, 1, 1, 0, 0, 4, 6, 1, 3, 5, 5, 0, 0, 0, 5, 0]
counts = Counter(a)
idxs = {}
t = []
for i, v in enumerate(a):
    if v not in idxs:
        idxs[v] = i  # index of the first occurrence of this value
    t.append((counts[v], idxs[v], i))
t.sort()
print([v[2] for v in t])
t.sort(key = lambda x: (-x[0], x[1], x[2]))
print([v[2] for v in t])
Output:
[13, 14, 4, 8, 16, 6, 17, 18, 22, 0, 1, 2, 3, 9, 10, 15, 5, 7, 11, 12, 19, 20, 21, 23]
[5, 7, 11, 12, 19, 20, 21, 23, 0, 1, 2, 3, 9, 10, 15, 6, 17, 18, 22, 4, 8, 16, 13, 14]
To sort according to count, and then position in the array, you don't need the value or the first index at all:
from collections import Counter
a = [ 1, 1, 1, 1, 3, 0, 5, 0, 3, 1, 1, 0, 0, 4, 6, 1, 3, 5, 5, 0, 0, 0, 5, 0]
counts = Counter(a)
t = [(counts[v], i) for i, v in enumerate(a)]
t.sort()
print([v[1] for v in t])
t.sort(key = lambda x:(-x[0],x[1]))
print([v[1] for v in t])
This produces the same output as the prior code for the sample data. For your string array:
a = ['g', 'g', 'c', 'f', 'd', 'd', 'g', 'a', 'a', 'a', 'f', 'f', 'f',
'g', 'f', 'c', 'f', 'a', 'e', 'b', 'g', 'd', 'c', 'b', 'f' ]
it produces:
[18, 19, 23, 2, 4, 5, 15, 21, 22, 7, 8, 9, 17, 0, 1, 6, 13, 20, 3, 10, 11, 12, 14, 16, 24]
[3, 10, 11, 12, 14, 16, 24, 0, 1, 6, 13, 20, 7, 8, 9, 17, 2, 4, 5, 15, 21, 22, 19, 23, 18]
I just figured out what is probably a very fast solution for any dtype, using just NumPy functions and no Python looping; it works in O(N log N) time. The NumPy functions used are np.unique, np.argsort, and array indexing.
Although it wasn't asked for in the original question, I implemented an extra flag equal_order_by_val. If it is False, array elements with the same frequency are sorted as one stable range, so the output can contain c d d c d c as in the dumps below, because that is the order in which the elements of equal frequency appear in the original array. When the flag is True, such elements are additionally sorted by the value in the original array, resulting in c c c d d d. In other words, with False we sort stably just by the key freq, and with True we sort by (freq, value) for ascending order and by (-freq, value) for descending order.
Try it online!
import string, math
import numpy as np
np.random.seed(0)
# Generating input data
hi, n, desc = 7, 25, True
letters = np.array(list(string.ascii_letters), dtype = np.object_)[:hi]
a = np.random.choice(letters, (n,), p = (
    lambda p = np.random.random((letters.size,)): p / p.sum()
)())
for equal_order_by_val in [False, True]:
    # Solving task
    us, ui, cs = np.unique(a, return_inverse = True, return_counts = True)
    af = cs[ui]  # frequency of each element's value
    sort_key = -af if desc else af
    if equal_order_by_val:
        # pack (freq, value rank) into a single integer key
        shift_bits = max(1, math.ceil(math.log(us.size) / math.log(2)))
        sort_key = ((sort_key.astype(np.int64) << shift_bits) +
            np.arange(us.size, dtype = np.int64)[ui])
    ix = np.argsort(sort_key, kind = 'stable')  # Do sorting itself
    # Printing results
    print('\nequal_order_by_val:', equal_order_by_val)
    for name, val in [
        ('i_col', np.arange(n)), ('original_a', a),
        ('freqs', af), ('sorted_a', a[ix]),
        ('sorted_freqs', af[ix]), ('sorting_ix', ix),
    ]:
        print(name.rjust(12), ' '.join([str(e).rjust(2) for e in val]))
outputs:
equal_order_by_val: False
i_col 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
original_a g g c f d d g a a a f f f g f c f a e b g d c b f
freqs 5 5 3 7 3 3 5 4 4 4 7 7 7 5 7 3 7 4 1 2 5 3 3 2 7
sorted_a f f f f f f f g g g g g a a a a c d d c d c b b e
sorted_freqs 7 7 7 7 7 7 7 5 5 5 5 5 4 4 4 4 3 3 3 3 3 3 2 2 1
sorting_ix 3 10 11 12 14 16 24 0 1 6 13 20 7 8 9 17 2 4 5 15 21 22 19 23 18
equal_order_by_val: True
i_col 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
original_a g g c f d d g a a a f f f g f c f a e b g d c b f
freqs 5 5 3 7 3 3 5 4 4 4 7 7 7 5 7 3 7 4 1 2 5 3 3 2 7
sorted_a f f f f f f f g g g g g a a a a c c c d d d b b e
sorted_freqs 7 7 7 7 7 7 7 5 5 5 5 5 4 4 4 4 3 3 3 3 3 3 2 2 1
sorting_ix 3 10 11 12 14 16 24 0 1 6 13 20 7 8 9 17 2 15 22 4 5 21 19 23 18
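As a side note, np.lexsort can express the same (freq, value) composite key without the bit-packing trick, because it applies its keys with stable sorts. A sketch (argsort_by_freq echoes the imaginary function named in the question):

import numpy as np

def argsort_by_freq(a, desc = True, equal_order_by_val = False):
    # per-element frequency, with no Python loop
    us, ui, cs = np.unique(a, return_inverse = True, return_counts = True)
    af = cs[ui]
    key = -af if desc else af
    if equal_order_by_val:
        # the last lexsort key is primary: frequency first, then value rank;
        # stability keeps the original order on full ties
        return np.lexsort((ui, key))
    return np.argsort(key, kind = 'stable')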
Which is the best (fastest) way to perform a union between sets from different rows in the same column of a DataFrame?
For example for the following dataframe:
df_input=pd.DataFrame([[1,{1,2,3}],[1,{11,12}],[2,{1111,2222}],[2,{0,99}]], columns=['name', 'set'])
   name           set
0     1     {1, 2, 3}
1     1      {11, 12}
2     2  {2222, 1111}
3     2       {0, 99}
I would like to get:
   name                  set
0     1    {1, 2, 3, 11, 12}
1     2  {0, 99, 2222, 1111}
And in case I have two columns with different sets, how can I join both columns?
For example, for this dataframe:
df_input=pd.DataFrame([[1,{1,2,3},{'a','b'}],[1,{11,12},{'j'}],[2,{1111,2222},{'m','n'}],[2,{0,99},{'p'}]], columns=['name', 'set1', 'set2'])
   name          set1    set2
0     1     {1, 2, 3}  {b, a}
1     1      {11, 12}     {j}
2     2  {2222, 1111}  {m, n}
3     2       {0, 99}     {p}
I am looking for a way to get this as output:
   name                 set1       set2
0     1    {1, 2, 3, 11, 12}  {b, j, a}
1     2  {0, 99, 2222, 1111}  {m, p, n}
Thank you.
I am really not very knowledgeable in Pandas, and I'm sure there's a better way (if you have time, you should probably wait for a better answer), but something like this seems to do the trick:
import pandas as pd
df_input = pd.DataFrame([[1,{1,2,3},{'a','b'}],[1,{11,12},{'j'}],[2,{1111,2222},{'m','n'}],[2,{0,99},{'p'}]], columns=['name', 'set1', 'set2'])
new = pd.DataFrame()
for name, agg_df in df_input.groupby('name'):
    data = {
        'name': name,
        'set1': set(),
        'set2': set(),
    }
    # apply is used here only for its side effect of updating the accumulator sets
    agg_df['set1'].apply(lambda c: data['set1'].update(c))
    agg_df['set2'].apply(lambda c: data['set2'].update(c))
    # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
    new = new.append(data, ignore_index=True)
print(new.head())
prints:
   name                 set1       set2
0   1.0    {1, 2, 3, 11, 12}  {b, j, a}
1   2.0  {0, 99, 2222, 1111}  {p, n, m}
There is more Python syntactic sugar you could certainly use, but that's not really pandas...
import pandas as pd
df_input = pd.DataFrame([[1,{1,2,3},{'a','b'}],[1,{11,12},{'j'}],[2,{1111,2222},{'m','n'}],[2,{0,99},{'p'}]], columns=['name', 'set1', 'set2'])
SET_COLUMNS = ('set1', 'set2')
new = pd.DataFrame()
for name, agg_df in df_input.groupby('name'):
    data = {**{'name': name}, **{set_col: set() for set_col in SET_COLUMNS}}
    for set_col in SET_COLUMNS:
        agg_df[set_col].apply(lambda c: data[set_col].update(c))
    new = new.append(data, ignore_index=True)
print(new.head())
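For what it's worth, a groupby/agg reduction can replace the explicit loop entirely; set().union(*s) folds a whole column of sets into one set. A sketch:

out = (df_input.groupby('name', as_index=False)
       .agg({col: (lambda s: set().union(*s)) for col in ('set1', 'set2')}))
print(out)

This also keeps name as an integer rather than the float that append produces.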