How to obtain a subset of nodes in NetworkX? - python

I've been working on a graph with a multipartite layout in NetworkX.
Each node in my graph has a 'trajectory' attribute and a 'layer' attribute. The layer indicates which column the node belongs to.
Here are the nodes in the first two columns:
[('0.0', {'layer': 0, 'trajectory': 0}), ('1.0', {'layer': 0, 'trajectory': 1}), ('2.0', {'layer': 0, 'trajectory': 2}), ('3.0', {'layer': 0, 'trajectory': 3}), ('4.0', {'layer': 0, 'trajectory': 4}), ('5.0', {'layer': 0, 'trajectory': 5}), ('6.0', {'layer': 0, 'trajectory': 6}), ('7.0', {'layer': 0, 'trajectory': 7}), ('8.0', {'layer': 0, 'trajectory': 8}), ('9.0', {'layer': 0, 'trajectory': 9}), ('10.0', {'layer': 0, 'trajectory': 10}), ('11.0', {'layer': 0, 'trajectory': 11}), ('12.0', {'layer': 0, 'trajectory': 12}), ('13.0', {'layer': 0, 'trajectory': 13}), ('14.0', {'layer': 0, 'trajectory': 14}), ('0.1', {'layer': 1, 'trajectory': 0}), ('1.1', {'layer': 1, 'trajectory': 1}), ('2.1', {'layer': 1, 'trajectory': 2}), ('3.1', {'layer': 1, 'trajectory': 3}), ('4.1', {'layer': 1, 'trajectory': 4}), ('5.1', {'layer': 1, 'trajectory': 5}), ('6.1', {'layer': 1, 'trajectory': 6}), ('7.1', {'layer': 1, 'trajectory': 7}), ('8.1', {'layer': 1, 'trajectory': 8}), ('9.1', {'layer': 1, 'trajectory': 9}), ('10.1', {'layer': 1, 'trajectory': 10}), ('11.1', {'layer': 1, 'trajectory': 11}), ('12.1', {'layer': 1, 'trajectory': 12}), ('13.1', {'layer': 1, 'trajectory': 13}), ('14.1', {'layer': 1, 'trajectory': 14}), ('15.1', {'layer': 1, 'trajectory': '15'})]
I need to retrieve all the nodes in a specific column. For example, to access the nodes whose layer is 2, I could do:
column = 2
for nodename, nodeattrs in G.nodes(data=True):
    if nodeattrs['layer'] == column:
        print('I am a node in the column: ' + str(column))
        # I do more stuff here
I don't think this is very efficient or elegant, because I have to check every node in the graph. My graph will have thousands of columns, so I believe there must be a way to get the subset of nodes I want without checking whether each node has the specified layer.
Is there a better way to implement this?
EDIT: I found an answer to my question here:
Select nodes and edges from networkx graph with attributes
In my case it would be something along the lines:
dict( (n,d['layer']) for n,d in G.nodes().items() if d['layer'] == 2)
Which returns a dictionary I can save.
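The comprehension above still scans every node each time a column is requested. If many columns will be queried, one option (a sketch, not something from the linked answer) is to scan the nodes once and build an index from layer to node names; after that, each lookup is a dictionary access. The helper name index_by_layer is hypothetical, and the sample list below only mimics the shape of G.nodes(data=True) so the example runs without NetworkX installed.

```python
from collections import defaultdict


def index_by_layer(node_data):
    """Group node names by their 'layer' attribute.

    node_data is any iterable of (node, attrs) pairs, for example the
    output of G.nodes(data=True) in NetworkX.
    """
    layers = defaultdict(list)
    for node, attrs in node_data:
        layers[attrs['layer']].append(node)
    # Return a plain dict so that looking up a missing layer does not
    # silently insert an empty list.
    return dict(layers)


# Sample data in the same shape as G.nodes(data=True) from the question
sample = [
    ('0.0', {'layer': 0, 'trajectory': 0}),
    ('1.0', {'layer': 0, 'trajectory': 1}),
    ('0.1', {'layer': 1, 'trajectory': 0}),
    ('1.1', {'layer': 1, 'trajectory': 1}),
    ('15.1', {'layer': 1, 'trajectory': '15'}),
]

layers = index_by_layer(sample)
print(layers[1])  # ['0.1', '1.1', '15.1']
```

With a real graph this would be layers = index_by_layer(G.nodes(data=True)), and G.subgraph(layers[2]) gives a subgraph view if one is needed. The index is a snapshot, so it has to be rebuilt or updated whenever nodes are added or removed.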

Related

delete dict in list from dict in list

I have two lists of dicts:
old_device_data_list = [{'_id': ObjectId('5f48c8e34545fac49fbff5'), 'device_id': 5, 'time': datetime.datetime(2020, 8, 26, 9, 5, 39, 827000), 'values': {'count': 100, 'late': 0, 'max': 0, 'min': 0, 'on_time': 100, 'sum': 100}}]
result = [{'_id': ObjectId('5f48c8e3997640fac49fbff5'), 'device_id': 5, 'time': datetime.datetime(2020, 8, 26, 9, 5, 39, 827000), 'values': {'count': 100, 'late': 0, 'max': 0, 'min': 0, 'on_time': 100, 'sum': 100}}, {'_id': ObjectId('5f48c8e3997640fac49fbff6'), 'device_id': 4, 'time': datetime.datetime(2020, 8, 26, 9, 5, 39, 827000), 'values': {'count': 180, 'late': 0, 'max': 0, 'min': 0, 'on_time': 180, 'sum': 180}}, {'_id': ObjectId('5f48c8e3997640fac49fbff8'), 'device_id': 3, 'time': datetime.datetime(2020, 8, 27, 9, 5, 39, 827000), 'values': {'count': 50, 'late': 0, 'max': 0, 'min': 0, 'on_time': 50, 'sum': 50}}, {'_id': ObjectId('5f48c8e3997640fac49fbff7'), 'device_id': 4, 'time': datetime.datetime(2020, 8, 27, 9, 5, 39, 827000), 'values': {'count': 120, 'late': 0, 'max': 0, 'min': 0, 'on_time': 120, 'sum': 120}}, {'_id': ObjectId('5f48c8e3997640fac49fbff9'), 'device_id': 3, 'time': datetime.datetime(2020, 8, 28, 9, 5, 39, 827000), 'values': {'count': 210, 'late': 0, 'max': 0, 'min': 0, 'on_time': 210, 'sum': 210}}]
I want to remove the dicts in old_device_data_list from the result list. I tried using NumPy:
numpy.setdiff1d(result, old_device_data_list)
but got this error:
TypeError: '<' not supported between instances of 'dict' and 'dict'
The description of numpy.setdiff1d says:
Return the sorted, unique values in ar1 that are not in ar2.
In order to sort the values, it needs to compare them using the < operator. But dictionaries cannot be compared like this. The relation "smaller than" is not defined for dictionaries.
NumPy is designed for working with numeric values, not for arbitrary Python data structures.
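A quick way to see the distinction the answer draws: dicts support equality (which is all a membership test needs) but not ordering (which a sort-based routine needs). The sample dicts are made up for illustration.

```python
a = {'count': 100, 'values': {'sum': 100}}
b = {'count': 100, 'values': {'sum': 100}}

# Equality is defined for dicts: keys and values are compared, recursively.
print(a == b)  # True

# Ordering is not defined, which is why sorting dicts fails.
try:
    a < b
    ordering_supported = True
except TypeError as exc:
    ordering_supported = False
    print(exc)  # '<' not supported between instances of 'dict' and 'dict'
```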
You could use a simple list comprehension to create a list of those dictionaries that are in result but not in old_device_data_list:
result = [d for d in result if d not in old_device_data_list]
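Below is a runnable sketch of the list comprehension next to a faster variant. The data is simplified: plain strings stand in for ObjectId so it runs without pymongo/bson. The second option assumes that '_id' uniquely identifies a document; if it does not, stick with the full-equality version.

```python
import datetime

# Simplified stand-ins for the MongoDB documents (strings instead of ObjectId)
old_device_data_list = [
    {'_id': 'a1', 'device_id': 5,
     'time': datetime.datetime(2020, 8, 26, 9, 5, 39, 827000),
     'values': {'count': 100, 'sum': 100}},
]
result = [
    {'_id': 'a1', 'device_id': 5,
     'time': datetime.datetime(2020, 8, 26, 9, 5, 39, 827000),
     'values': {'count': 100, 'sum': 100}},
    {'_id': 'a2', 'device_id': 4,
     'time': datetime.datetime(2020, 8, 26, 9, 5, 39, 827000),
     'values': {'count': 180, 'sum': 180}},
]

# Option 1: full equality. `in` compares dicts with ==, checking every key
# and nested value, so no ordering is required.
filtered = [d for d in result if d not in old_device_data_list]

# Option 2 (assumes '_id' is unique): collect the old ids in a set once, so
# each membership test is a hash lookup instead of a scan of the whole list.
old_ids = {d['_id'] for d in old_device_data_list}
filtered_by_id = [d for d in result if d['_id'] not in old_ids]

print([d['_id'] for d in filtered])        # ['a2']
print([d['_id'] for d in filtered_by_id])  # ['a2']
```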

Cleanest way to sum list of nested dicts

Is there a cleaner/more pythonic way of summing the contents of a list of nested dicts? Here's what I'm doing, but I suspect that there may be a better way:
list_of_nested_dicts = [{'class1': {'TP': 1, 'FP': 0, 'FN': 2}, 'class2': {'TP': 0, 'FP': 0, 'FN': 0}, 'class3': {'TP': 0, 'FP': 0, 'FN': 0}, 'class4': {'TP': 1, 'FP': 0, 'FN': 2}},
{'class1': {'TP': 1, 'FP': 0, 'FN': 2}, 'class2': {'TP': 0, 'FP': 0, 'FN': 0}, 'class3': {'TP': 0, 'FP': 0, 'FN': 0}, 'class4': {'TP': 1, 'FP': 0, 'FN': 2}},
{'class1': {'TP': 1, 'FP': 0, 'FN': 2}, 'class2': {'TP': 0, 'FP': 0, 'FN': 0}, 'class3': {'TP': 0, 'FP': 0, 'FN': 0}, 'class4': {'TP': 1, 'FP': 0, 'FN': 2}},
{'class1': {'TP': 1, 'FP': 0, 'FN': 2}, 'class2': {'TP': 0, 'FP': 0, 'FN': 0}, 'class3': {'TP': 0, 'FP': 0, 'FN': 0}, 'class4': {'TP': 1, 'FP': 0, 'FN': 2}}]
total_counts = {k: {'TP': 0, 'FP': 0, 'FN': 0} for k in list_of_nested_dicts[0].keys()}
for d in list_of_nested_dicts:
    for label, counts_dict in d.items():
        for k, v in counts_dict.items():
            total_counts[label][k] += v
print(total_counts)
(Assuming all keys are exactly the same, but values could be any integer)
You can write slightly tighter code using collections (similar to @blhsing's answer):
import collections

counts = collections.defaultdict(collections.Counter)
for d in list_of_nested_dicts:
    for k, v in d.items():
        counts[k].update(v)
This will give you a defaultdict of counters instead of only dicts, but they behave similarly. You can also explicitly cast them to dicts at the end if you want.
{'class1': {'FN': 8, 'FP': 0, 'TP': 4},
'class2': {'FN': 0, 'FP': 0, 'TP': 0},
'class3': {'FN': 0, 'FP': 0, 'TP': 0},
'class4': {'FN': 8, 'FP': 0, 'TP': 4}}
vs
defaultdict(<class 'collections.Counter'>,
{'class1': Counter({'FN': 8, 'TP': 4, 'FP': 0}),
'class2': Counter({'TP': 0, 'FP': 0, 'FN': 0}),
'class3': Counter({'TP': 0, 'FP': 0, 'FN': 0}),
'class4': Counter({'FN': 8, 'TP': 4, 'FP': 0})})
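The explicit conversion back to plain dicts mentioned above could look like this. It is a sketch on a shortened version of the sample data (two classes, two rows).

```python
import collections

list_of_nested_dicts = [
    {'class1': {'TP': 1, 'FP': 0, 'FN': 2}, 'class2': {'TP': 0, 'FP': 0, 'FN': 0}},
    {'class1': {'TP': 1, 'FP': 0, 'FN': 2}, 'class2': {'TP': 0, 'FP': 0, 'FN': 0}},
]

counts = collections.defaultdict(collections.Counter)
for d in list_of_nested_dicts:
    for label, sub in d.items():
        # Counter.update adds counts (it does not replace them) and keeps zero entries
        counts[label].update(sub)

# Convert the defaultdict of Counters into plain nested dicts
total_counts = {label: dict(counter) for label, counter in counts.items()}
print(total_counts)
```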
One thing in your code that stands out as "unclean" is the fact that you are hard-coding the keys of the sub-dicts in the initialization of total_counts. You can avoid such hard-coding by using the dict.setdefault and dict.get methods as you iterate over the items of the sub-dicts instead:
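total_counts = {}
for d in list_of_nested_dicts:
    for label, counts_dict in d.items():
        for k, v in counts_dict.items():
            # Python evaluates the right-hand side first, so setdefault() creates
            # total_counts[label] before the assignment target is looked up.
            total_counts[label][k] = total_counts.setdefault(label, {}).get(k, 0) + v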
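Because nothing is hard-coded, the setdefault/get approach also copes with sub-dicts that do not all share the same keys, which the question's version (and its stated assumption) does not. The sketch below is a slight restructuring that hoists setdefault out of the innermost loop; the input is made up to include a missing key and an extra label.

```python
list_of_nested_dicts = [
    {'class1': {'TP': 1, 'FP': 0, 'FN': 2}},
    {'class1': {'TP': 1, 'FN': 2}, 'class2': {'TN': 5}},  # missing/extra keys
]

total_counts = {}
for d in list_of_nested_dicts:
    for label, counts_dict in d.items():
        # setdefault creates the inner dict the first time a label is seen
        inner = total_counts.setdefault(label, {})
        for k, v in counts_dict.items():
            # get(k, 0) treats a key that has not been seen yet as zero
            inner[k] = inner.get(k, 0) + v

print(total_counts)  # {'class1': {'TP': 2, 'FP': 0, 'FN': 4}, 'class2': {'TN': 5}}
```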

pandas - pd.replace and TypeError

I have a DataFrame called all_data, and I want to replace some categorical values in certain columns with numerical values. I'm trying to use this nested-dictionary notation (I've checked that the brackets and curly braces are balanced, so I don't think that's the issue):
all_data = all_data.replace({'Street': {'Pave': 1, 'Grvl': 0}},
{'LotShape': {'IR3': 1, 'IR2': 2, 'IR1': 3, 'Reg': 4}},
{'Utilities': {'ELO': 0, 'NoSeWa': 0, 'NoSewr': 0, 'AllPub': 1}},
{'LandSlope': {'Sev': 1, 'Mod': 2, 'Gtl': 3}},
{'ExterQual': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}},
{'ExterCond': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}},
{'BsmtQual': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4,'Ex': 5}},
{'BsmtCond': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4,'Ex': 5}},
{'BsmtExposure': {'NA': 0, 'No': 1, 'Mn': 2, 'Av': 3, 'Gd': 4}},
{'BsmtFinType1': {'NA': 0, 'Unf': 1, 'LwQ': 2, 'Rec': 3, 'BLQ': 4, 'ALQ': 5, 'GLQ': 6}},
{'BsmtFinType2': {'NA': 0, 'Unf': 1,'LwQ': 2,'Rec': 3, 'BLQ': 4,'ALQ': 5, 'GLQ': 6}},
{'HeatingQC': {'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'CentralAir': {'No': 0,'Yes': 1}},
{'KitchenQual': {'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'Functional': {'Sal': -7,'Sev': -6,'Maj1': -5,'Maj2': -4,'Mod': -3,'Min2': -2,'Min1': -1,
'Typ': 0}},
{'FireplaceQu': {'NA': 0,'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'GarageFinish': {'NA': 0,'Unf': 1,'RFn': 2, 'Fin': 3}},
{'GarageQual': {'NA': 0, 'Po': 1,'Fa': 2, 'TA': 3,'Gd': 4, 'Ex': 5}},
{'GarageCond': {'NA': 0,'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'PavedDrive': {'N': 0,'P': 0, 'Y': 1}},
{'Fence': {'NA': 0, 'MnWw': 1,'GdWo': 2,'MnPrv': 3,'GdPrv': 4}},
{'SaleCondition': {'Abnorml': 1, 'Alloca': 1, 'AdjLand': 1, 'Family': 1, 'Normal': 0,
'Partial': 0}}
)
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-f9c9c28b7237> in <module>()
22 {'Fence': {'NA': 0, 'MnWw': 1,'GdWo': 2,'MnPrv': 3,'GdPrv': 4}},
23 {'SaleCondition': {'Abnorml': 1, 'Alloca': 1, 'AdjLand': 1, 'Family': 1, 'Normal': 0,
---> 24 'Partial': 0}}
25 )
TypeError: replace() takes from 1 to 8 positional arguments but 23 were given
If I remove the 'SaleCondition' row from the above code, the error is again there but this time referring to 'Fence', and so on, for each line of code from bottom up. I've googled but have no idea what this means. Help MUCH appreciated.
You should do something like this:
df.replace({'Fence':{'NA': 0, 'MnWw': 1,'GdWo': 2,'MnPrv': 3,'GdPrv': 4},'SaleCondition':{'Abnorml': 1, 'Alloca': 1, 'AdjLand': 1, 'Family': 1, 'Normal': 0,
'Partial': 0}})
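The format should be .replace({'col1':{},'col2':{}}), not .replace({'col1':{}},{'col2':{}}). Each separate dict is passed as another positional argument, which is why Python reports 23 positional arguments: the 22 dicts plus the DataFrame itself.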
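If the per-column dicts already exist as separate objects, one way to avoid retyping them is to merge them into a single nested dict before calling replace. This sketch uses only a few of the question's columns and plain Python, so it runs without pandas; the variable names column_maps and replace_map are hypothetical.

```python
# Per-column mappings written as separate dicts, as in the question
column_maps = [
    {'Street': {'Pave': 1, 'Grvl': 0}},
    {'CentralAir': {'No': 0, 'Yes': 1}},
    {'PavedDrive': {'N': 0, 'P': 0, 'Y': 1}},
]

# Merge them into the single {column: {old_value: new_value}} dict that
# DataFrame.replace accepts as its first argument.
replace_map = {}
for m in column_maps:
    replace_map.update(m)

print(replace_map)
# all_data = all_data.replace(replace_map)  # now a single positional argument
```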

Python sort multi dimensional dict

input={11: {'perc': 0, 'name': u'B test', 'cid': 11, 'total': 0, 'pending': 0, 'complete': 0}, 10: {'perc': 0, 'name': u'C test', 'cid': 10, 'total': 0, 'pending': 0,'complete': 0}, 3: {'perc': 9, 'name': u'Atest Pre-requisites', 'cid': 3, 'total': 11, 'pending': 10, 'complete': 1}}
I want to sort this dict by the name field. I'm new to Python; can anyone help?
First, you should avoid using the names of built-ins (such as input) as variable names: the assignment shadows the built-in, so input no longer refers to the input() function.
Also, a dictionary cannot be sorted in place. If you don't need the keys, you can sort the dictionary's values into a list. The code would look like this:
input_dict = {11: {'perc': 0, 'name': u'B test', 'cid': 11, 'total': 0, 'pending': 0, 'complete': 0}, 10: {'perc': 0, 'name': u'C test', 'cid': 10, 'total': 0, 'pending': 0,'complete': 0}, 3: {'perc': 9, 'name': u'Atest Pre-requisites', 'cid': 3, 'total': 11, 'pending': 10, 'complete': 1}}
input_list = sorted(input_dict.values(), key=lambda x: x['name'])
print(input_list)
# prints [{'perc': 9, 'complete': 1, 'cid': 3, 'total': 11, 'pending': 10, 'name': u'Atest Pre-requisites'}, {'perc': 0, 'complete': 0, 'cid': 11, 'total': 0, 'pending': 0, 'name': u'B test'}, {'perc': 0, 'complete': 0, 'cid': 10, 'total': 0, 'pending': 0, 'name': u'C test'}]
EDIT
If you wish to keep the keys and use iteritems(), as you said in the comments, use this code instead (iteritems() exists only in Python 2; in Python 3 use items()):
input_dict = {11: {'perc': 0, 'name': u'B test', 'cid': 11, 'total': 0, 'pending': 0, 'complete': 0}, 10: {'perc': 0, 'name': u'C test', 'cid': 10, 'total': 0, 'pending': 0,'complete': 0}, 3: {'perc': 9, 'name': u'Atest Pre-requisites', 'cid': 3, 'total': 11, 'pending': 10, 'complete': 1}}
input_list = sorted(input_dict.iteritems(), key=lambda x: x[1]['name'])
print(input_list)
# prints [(3, {'perc': 9, 'complete': 1, 'cid': 3, 'total': 11, 'pending': 10, 'name': u'Atest Pre-requisites'}), (11, {'perc': 0, 'complete': 0, 'cid': 11, 'total': 0, 'pending': 0, 'name': u'B test'}), (10, {'perc': 0, 'complete': 0, 'cid': 10, 'total': 0, 'pending': 0, 'name': u'C test'})]
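For Python 3, a sketch of the same idea: items() replaces iteritems(), and the u'' prefixes are unnecessary. It assumes Python 3.7 or later, where dicts preserve insertion order, so a new dict built from the sorted pairs iterates in name order.

```python
input_dict = {
    11: {'perc': 0, 'name': 'B test', 'cid': 11, 'total': 0, 'pending': 0, 'complete': 0},
    10: {'perc': 0, 'name': 'C test', 'cid': 10, 'total': 0, 'pending': 0, 'complete': 0},
    3: {'perc': 9, 'name': 'Atest Pre-requisites', 'cid': 3, 'total': 11, 'pending': 10, 'complete': 1},
}

# Each item is a (key, value) pair, so x[1]['name'] is the inner dict's name
sorted_items = sorted(input_dict.items(), key=lambda x: x[1]['name'])

# Python 3.7+: a dict built from the sorted pairs keeps that order
sorted_dict = dict(sorted_items)
print(list(sorted_dict))  # [3, 11, 10]
```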

How to change one dictionary value in a dictionary of dictionaries

I am running a variation of the following script:
text1 = {'file1': 0, 'file2': 0}
text2 = ['100-200', '200-300', '300-400']
text3 = ['1', '2', '3', '4']
level1 = {}
level2 = {}
for i in text2:
    level1[i] = text1
for n in text3:
    level2[n] = level1
level2['3']['100-200']['file1'] = level2['3']['100-200']['file1'] + 1
Unfortunately this changes the dictionary from:
{'1': {'200-300': {'file2': 0, 'file1': 0}, '300-400': {'file2': 0, 'file1': 0}, '100-200': {'file2': 0, 'file1': 0}}, '2': {'200-300': {'file2': 0, 'file1': 0}, '300-400': {'file2': 0, 'file1': 0}, '100-200': {'file2': 0, 'file1': 0}}, '3': {'200-300': {'file2': 0, 'file1': 0}, '300-400': {'file2': 0, 'file1': 0}, '100-200': {'file2': 0, 'file1': 0}}, '4': {'200-300': {'file2': 0, 'file1': 0}, '300-400': {'file2': 0, 'file1': 0}, '100-200': {'file2': 0, 'file1': 0}}}
to:
{'1': {'200-300': {'file2': 0, 'file1': 1}, '300-400': {'file2': 0, 'file1': 1}, '100-200': {'file2': 0, 'file1': 1}}, '2': {'200-300': {'file2': 0, 'file1': 1}, '300-400': {'file2': 0, 'file1': 1}, '100-200': {'file2': 0, 'file1': 1}}, '3': {'200-300': {'file2': 0, 'file1': 1}, '300-400': {'file2': 0, 'file1': 1}, '100-200': {'file2': 0, 'file1': 1}}, '4': {'200-300': {'file2': 0, 'file1': 1}, '300-400': {'file2': 0, 'file1': 1}, '100-200': {'file2': 0, 'file1': 1}}}
How do I change only one of the file values and not all of them?
Use a dict comprehension to build the structure; the inner dict literal is evaluated anew on each iteration, so every entry gets its own dict:
level2 = {n: {i: {'file1': 0, 'file2': 0} for i in text2} for n in text3}
Your original code does not create copies of the dictionaries; it merely stores references to one dictionary object.
Each time you stored text1 you stored another reference to the same object, not a copy, and the same goes for each time you stored level1.
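A runnable check of the explanation above: it first shows that storing text1 repeatedly aliases one object, then applies the comprehension fix and, as an alternative not taken from the answer, a loop that stores copy.deepcopy() of the template. In both fixes only the targeted value changes.

```python
import copy

text1 = {'file1': 0, 'file2': 0}
text2 = ['100-200', '200-300', '300-400']
text3 = ['1', '2', '3', '4']

# Storing text1 repeatedly puts the same object in every slot
aliased = {n: {i: text1 for i in text2} for n in text3}
print(aliased['1']['100-200'] is aliased['4']['300-400'])  # True

# Fix 1: the comprehension evaluates the inner literal on every iteration
level2 = {n: {i: {'file1': 0, 'file2': 0} for i in text2} for n in text3}
level2['3']['100-200']['file1'] += 1

# Fix 2: keep a template dict and store an independent copy in each slot
level2_loop = {}
for n in text3:
    level2_loop[n] = {i: copy.deepcopy(text1) for i in text2}
level2_loop['3']['100-200']['file1'] += 1

print(level2['3']['100-200'])  # {'file1': 1, 'file2': 0}
print(level2['1']['100-200'])  # {'file1': 0, 'file2': 0}
```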
