I have the code below to load the data:
from pymnet import *
import pandas as pd
nodes_id = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 1, 2, 3, 'aa', 'bb', 'cc']
layers = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
nodes = {'nodes': nodes_id, 'layers': layers}
df_nodes = pd.DataFrame(nodes)
to = ['b', 'c', 'd', 'f', 1, 2, 3, 'bb', 'cc', 2, 3, 'a', 'g']
from_edges = ['a', 'a', 'b', 'e', 'a', 'b', 'e', 'aa', 'aa', 'aa', 1, 2, 3]
edges = {'to': to, 'from': from_edges}
df_edges = pd.DataFrame(edges)
I am attempting to use pymnet as a package to create a multi-layered network. (http://www.mkivela.com/pymnet/)
Does anybody know how to create a three-layered network visualisation from these dataframes? The tutorials seem to add nodes one at a time, and it is unclear how to use nodes and edges dataframes for this purpose. The layer groups are provided in df_nodes.
Thanks
I've wondered the same, have a look at this post:
https://qiita.com/malimo1024/items/499a4ebddd14d29fd320
Use this format to add edges (both intra- and inter-layer): mnet[from_node, to_node, layer_1, layer_2] = 1.
For example:
from pymnet import *
import matplotlib.pyplot as plt
%matplotlib inline
mnet = MultilayerNetwork(aspects=1)
mnet['sato','tanaka','work','work'] = 1
mnet['sato','suzuki','friendship','friendship'] = 1
mnet['sato','yamada','friendship','friendship'] = 1
mnet['sato','yamada','work','work'] = 1
mnet['sato','sato','work','friendship'] = 1
mnet['tanaka','tanaka','work','friendship'] = 1
mnet['suzuki','suzuki','work','friendship'] = 1
mnet['yamada','yamada','work','friendship'] = 1
fig=draw(mnet)
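Applying the same pattern to the dataframes from the question — a minimal sketch, assuming each node belongs to exactly one layer (as encoded in df_nodes), so that each edge's two layers can be looked up from its endpoints:
from pymnet import MultilayerNetwork, draw
import pandas as pd

nodes_id = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 1, 2, 3, 'aa', 'bb', 'cc']
layers = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
df_nodes = pd.DataFrame({'nodes': nodes_id, 'layers': layers})
df_edges = pd.DataFrame({'to': ['b', 'c', 'd', 'f', 1, 2, 3, 'bb', 'cc', 2, 3, 'a', 'g'],
                         'from': ['a', 'a', 'b', 'e', 'a', 'b', 'e', 'aa', 'aa', 'aa', 1, 2, 3]})

# map each node to its layer, then add every edge with its endpoints' layers
layer_of = dict(zip(df_nodes['nodes'], df_nodes['layers']))
mnet = MultilayerNetwork(aspects=1)
for _, row in df_edges.iterrows():
    src, dst = row['from'], row['to']
    mnet[src, dst, layer_of[src], layer_of[dst]] = 1
fig = draw(mnet)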
Related
I have a dataset and I need to groupby my dataset based on column group:
import numpy as np
import pandas as pd
arr = np.array([1, 2, 4, 7, 11, 16, 22, 29, 37, 46])
df = pd.DataFrame({'group': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
"target": arr})
for g_name, g_df in df.groupby("group"):
    print("GROUP: {}".format(g_name))
    print(g_df)
However, sometimes group might not exist as a column, and in that case I am trying to treat the whole data as a single group:
for g_name, g_df in df.groupby(SOMEPARAMETERS):
    print(g_df)
Desired output:
target
1
2
4
7
11
16
22
29
37
46
Is it possible to change the parameter of groupby to get whole data as a single group?
Assuming you mean something like this where you have two columns on which you want to group:
import numpy as np
import pandas as pd
arr = np.array([1, 2, 4, 7, 11, 16, 22, 29, 37, 46])
df = pd.DataFrame({'group1': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
'group2': ['C', 'D', 'D', 'C', 'D', 'D', 'C', 'D', 'D', 'C'],
'target': arr})
Then you can easily extend your example with:
for g_name, g_df in df.groupby(["group1", "group2"]):
    print("GROUP: {}".format(g_name))
    print(g_df)
Is this what you meant?
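If instead the goal is literally to treat all rows as one group when the group column is absent, pandas also accepts a callable grouping key, which is applied to each index label; a constant key yields a single group. A sketch using the df defined above:
key = "group" if "group" in df.columns else (lambda _: 0)  # constant key -> every row in one group
for g_name, g_df in df.groupby(key):
    print(g_df)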
Given a list of strings,
['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
I would like to convert it to an integer-category form:
[0, 0, 2, 0, 0, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 1, 1, 1, 3, 1, 1, 1]
This can be achieved using numpy.unique as below:
import numpy as np
ipt = ['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
_, opt = np.unique(np.array(ipt), return_inverse=True)
But I am curious whether there is an alternative that does not need to import numpy.
If you are solely interested in finding an integer representation of the factors, then you can use a dict comprehension along with enumerate to store the mapping, after using set to find the unique values:
lst = ['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
d = {x: i for i, x in enumerate(set(lst))}
lst_new = [d[x] for x in lst]
print(lst_new)
# [3, 3, 0, 3, 3, 3, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 1, 1, 1, 2, 1, 1, 1]
This approach can be used for general factors, i.e., the factors do not have to be 'a', 'b' and so on, but can be 'dog', 'bus', etc. One drawback is that it does not care about the order of factors. If you want the representation to preserve order, you can use sorted:
d = {x: i for i, x in enumerate(sorted(set(lst)))}
lst_new = [d[x] for x in lst]
print(lst_new)
# [0, 0, 2, 0, 0, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 1, 1, 1, 3, 1, 1, 1]
You could take a page out of the functional programming book:
ipt=['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
opt = list(map(lambda x: ord(x)-97, ipt))
This code iterates through the input list and passes each element through the lambda function, which takes the ASCII value of the character and subtracts 97 (the code for 'a'), converting the characters 'a'-'z' to 0-25.
If each string isn't a single character, the lambda function will need to be adapted; see the sketch below.
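A hypothetical adaptation for arbitrary strings (the input list and the codes mapping here are illustrative; this reintroduces a dict, trading the ord() trick for generality):
ipt = ['dog', 'bus', 'dog', 'cat']  # hypothetical multi-character input
codes = {s: i for i, s in enumerate(sorted(set(ipt)))}  # bus=0, cat=1, dog=2
opt = list(map(lambda x: codes[x], ipt))
print(opt)  # [2, 0, 2, 1]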
You could write a custom function to do the same thing as you are using numpy.unique() for.
def unique(my_list):
    ''' Takes a list and returns two lists: a list of each unique entry and the index of
    each unique entry in the original list.
    '''
    unique_list = []
    int_cat = []
    for item in my_list:
        if item not in unique_list:
            unique_list.append(item)
        int_cat.append(unique_list.index(item))
    return unique_list, int_cat
Or, if you want the indexing to be ordered alphabetically:
def unique_ordered(my_list):
    ''' Takes a list and returns two lists: an ordered list of each unique entry and the
    index of each unique entry in the original list.
    '''
    # Unique list
    unique_list = []
    for item in my_list:
        if item not in unique_list:
            unique_list.append(item)
    # Sorting unique list alphabetically
    unique_list.sort()
    # Integer category list
    int_cat = []
    for item in my_list:
        int_cat.append(unique_list.index(item))
    return unique_list, int_cat
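For example, called on the list from the question, the two versions differ only in how the integers are assigned (order of first appearance vs alphabetical):
ipt = ['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
_, cat = unique(ipt)              # a=0, c=1, d=2, b=3 (order of first appearance)
_, cat_ord = unique_ordered(ipt)  # a=0, b=1, c=2, d=3 (alphabetical)
print(cat_ord)
# [0, 0, 2, 0, 0, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 1, 1, 1, 3, 1, 1, 1]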
Comparing the computation time for these two vs numpy.unique() for 100,000 iterations of your example list, we get:
numpy = 2.236004s
unique = 0.460719s
unique_ordered = 0.505591s
Showing that either option would be faster than numpy for simple lists. More complicated strings slow down unique() and unique_ordered() much more than numpy.unique(). Doing 10,000 iterations of a random, 100-element list of 20-character strings, we get times of:
numpy = 0.45465s
unique = 1.56963s
unique_ordered = 1.59445s
So if efficiency is important and your list has more complex strings, or a larger variety of them, it would likely be better to use numpy.unique().
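The timing harness wasn't shown above; a minimal sketch with timeit (assuming ipt, unique, and unique_ordered from earlier are defined; absolute numbers will vary by machine):
import timeit
import numpy as np

# 100,000 iterations of each approach on the example list ipt from above
print(timeit.timeit(lambda: np.unique(np.array(ipt), return_inverse=True), number=100000))
print(timeit.timeit(lambda: unique(ipt), number=100000))
print(timeit.timeit(lambda: unique_ordered(ipt), number=100000))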
Really stupid question as I am new to python:
If I have labels = ['a', 'b', 'c', 'd'],
and indices = [2, 3, 0, 1],
How should I get the corresponding label using each index so I can get: ['c', 'd', 'a', 'b']?
There are a few alternatives; one is to use a list comprehension:
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = [labels[i] for i in indices]
print(result)
Output
['c', 'd', 'a', 'b']
Basically iterate over each index and fetch the item at that position. The above is equivalent to the following for loop:
result = []
for i in indices:
    result.append(labels[i])
A third option is to use operator.itemgetter:
from operator import itemgetter
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = list(itemgetter(*indices)(labels))
print(result)
Output
['c', 'd', 'a', 'b']
Consider a list with elements drawn from a set of symbols, e.g. {A, B, C}:
List --> A, A, B, B, A, A, A, A, A, B, C, C, B, B
Indexing indices --> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
How can I re-order this list so that, for any symbol, approximately half of its occurrences fall in the first half of the list, i.e. positions [0, N/2), and half in the second half, i.e. positions [N/2, N)?
Note that there could be multiple solutions to this problem. We also want to compute the resulting list of indices of the permutation, so that we can apply the new ordering to any list associated with the original one.
Is there a name for this problem? Any efficient algorithms for it? Most of the solutions I can think of are very brute-force.
You can use a dictionary here; this will take O(N) time:
from collections import defaultdict
lst = ['A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'B', 'B']
d = defaultdict(list)
for i, x in enumerate(lst):
    d[x].append(i)
items = []
indices = []
for k, v in d.items():
    n = len(v)//2
    items.extend([k]*n)
    indices.extend(v[:n])
for k, v in d.items():
    n = len(v)//2
    items.extend([k]*(len(v)-n))
    indices.extend(v[n:])
print(items)
print(indices)
Output (with Python 3.7+, where dicts preserve insertion order):
['A', 'A', 'A', 'B', 'B', 'C', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C']
[0, 1, 4, 2, 3, 10, 5, 6, 7, 8, 9, 12, 13, 11]
You can do this by getting the rank order of the symbols, then picking alternate ranks for each half of the output array:
import numpy as np

x = np.array(['A', 'A', 'B', 'B', 'A', 'A', 'A',
              'A', 'A', 'B', 'C', 'C', 'B', 'B'])
order = np.argsort(x)
idx = np.r_[order[0::2], order[1::2]]
print(x[idx])
# ['A' 'A' 'A' 'A' 'B' 'B' 'C' 'A' 'A' 'A' 'B' 'B' 'B' 'C']
print(idx)
# [ 0 4 6 8 3 12 10 1 5 7 2 9 13 11]
By default np.argsort uses the quicksort algorithm, with average time complexity O(N log N). The indexing step is O(N).
You can use collections.Counter which is even better than just a defaultdict -- and you can place items into the first half and second half separately. That way, if you prefer, you can shuffle the first half and second half as much as you want (and just keep track of the shuffling permutation, with e.g. NumPy's argsort).
import collections

L = ['A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'B', 'B']
idx_L = list(enumerate(L))
ctr = collections.Counter(L)
fh = []       # first half of the output
fh_idx = []   # original indices of the first-half items
sh = []       # second half of the output
sh_idx = []   # original indices of the second-half items
for k, v in ctr.items():
    idxs = [i for i, e in idx_L if e == k]
    fh = fh + [k for i in range(v // 2)]
    fh_idx = fh_idx + idxs[:v // 2]
    sh = sh + [k for i in range(v // 2, v)]
    sh_idx = sh_idx + idxs[v // 2:]
shuffled = fh + sh
idx_to_shuffled = fh_idx + sh_idx
print(shuffled)
print(idx_to_shuffled)
which gives
['A', 'A', 'A', 'B', 'B', 'C', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C']
[0, 1, 4, 2, 3, 10, 5, 6, 7, 8, 9, 12, 13, 11]
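A quick sanity check (a sketch reusing fh and sh from above) confirms each symbol is split as evenly as possible between the two halves:
print(collections.Counter(fh))  # Counter({'A': 3, 'B': 2, 'C': 1})
print(collections.Counter(sh))  # Counter({'A': 4, 'B': 3, 'C': 1})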
Shuffle the list with the indices, then split it in half. This method won't perfectly split the symbols every time, but as the number of repeats of each symbol gets larger, it will approach a perfect split.
import random
symbols = ['A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'B', 'B']
indices = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
both = list(zip(symbols, indices))  # list() so the pairs can be shuffled in place
random.shuffle(both)
symbols2, indices2 = zip(*both)
print(symbols2)
print(indices2)
Some sample outputs:
Trial #1:
('A', 'C', 'B', 'A', 'A', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C')
( 7, 10, 2, 4, 1, 13, 8, 0, 5, 6, 9, 3, 12, 11)
Trial #2:
('A', 'A', 'B', 'B', 'C', 'A', 'A', 'A', 'B', 'C', 'A', 'A', 'B', 'B')
( 6, 0, 9, 3, 11, 1, 8, 4, 13, 10, 7, 5, 2, 12)
Trial #3:
('A', 'A', 'C', 'C', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'B')
( 4, 5, 11, 10, 2, 3, 0, 13, 12, 6, 7, 8, 1, 9)
Let's suppose I have the following DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'label': ['a', 'a', 'b', 'b', 'a', 'b', 'c', 'c', 'a', 'a'],
'numbers': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
'arbitrarydata': [False] * 10})
I want to assign a value to the arbitrarydata column according to the values in both of the other columns. A naive approach would be as follows:
for _, grp in df.groupby(['label', 'numbers']):
    grp.arbitrarydata = np.random.rand()
Naturally, this doesn't propagate changes back to df. Is there a way to modify a group such that the changes are reflected in the original DataFrame?
Try using transform, e.g.:
df['arbitrarydata'] = df.groupby(['label', 'numbers'])['arbitrarydata'].transform(lambda x: np.random.rand())
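Put together as a runnable sketch (selecting the column before transform so the result is a Series aligned with df; every row sharing a (label, numbers) pair receives the same random value):
import numpy as np
import pandas as pd

df = pd.DataFrame({'label': ['a', 'a', 'b', 'b', 'a', 'b', 'c', 'c', 'a', 'a'],
                   'numbers': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'arbitrarydata': [False] * 10})
# transform broadcasts the per-group scalar back to every row in that group
df['arbitrarydata'] = df.groupby(['label', 'numbers'])['arbitrarydata'].transform(lambda x: np.random.rand())
print(df)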