I have a json object that I made using networkx:
json_data = json_graph.node_link_data(network_object)
It is structured like this (mini version of my output):
>>> json_data
{'directed': False,
'graph': {'name': 'compose( , )'},
'links': [{'source': 0, 'target': 7, 'weight': 1},
{'source': 0, 'target': 2, 'weight': 1},
{'source': 0, 'target': 12, 'weight': 1},
{'source': 0, 'target': 9, 'weight': 1},
{'source': 2, 'target': 18, 'weight': 25},
{'source': 17, 'target': 25, 'weight': 1},
{'source': 29, 'target': 18, 'weight': 1},
{'source': 30, 'target': 18, 'weight': 1}],
'multigraph': False,
'nodes': [{'bipartite': 1, 'id': 'Icarus', 'node_type': 'Journal'},
{'bipartite': 1,
'id': 'A Giant Step: from Milli- to Micro-arcsecond Astrometry',
'node_type': 'Journal'},
{'bipartite': 1,
'id': 'The Astrophysical Journal Supplement Series',
'node_type': 'Journal'},
{'bipartite': 1,
'id': 'Astronomy and Astrophysics Supplement Series',
'node_type': 'Journal'},
{'bipartite': 1, 'id': 'Astronomy and Astrophysics', 'node_type': 'Journal'},
{'bipartite': 1,
'id': 'Astronomy and Astrophysics Review',
'node_type': 'Journal'}]}
What I want to do is add the following elements to each node so that I can use this data as input for sigma.js:
"x": 0,
"y": 0,
"size": 3,
"centrality": 0
I can't seem to find an efficient way to do this though using add_node(). Is there some obvious way to add this that I'm missing?
While your data is still a networkx graph, you can use the set_node_attributes function to add attributes (e.g. stored in a Python dictionary) to every node in the graph.
In my example the new attributes are stored in the dictionary attr:
import networkx as nx
from networkx.readwrite import json_graph
# example graph
G = nx.Graph()
G.add_nodes_from(["a", "b", "c", "d"])
# your data
#G = json_graph.node_link_graph(json_data)
# dictionary of new attributes
attr = {"x": 0,
        "y": 0,
        "size": 3,
        "centrality": 0}
for name, value in attr.items():
    # networkx 2.0+ expects (G, values, name); 1.x expected (G, name, values)
    nx.set_node_attributes(G, value, name)
# check new node attributes
print(G.nodes(data=True))
You can then export the new graph in JSON with node_link_data.
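If the graph has already been exported with node_link_data, the same attributes can also be added by updating each node dictionary in place. This is a minimal sketch in plain Python, assuming the 'nodes' layout shown in the question; the json_data below is a shortened stand-in rather than the full output, and defaults is a name chosen here for illustration.

```python
import json

# Shortened stand-in for the node-link data in the question (assumption).
json_data = {
    "directed": False,
    "multigraph": False,
    "graph": {},
    "nodes": [
        {"bipartite": 1, "id": "Icarus", "node_type": "Journal"},
        {"bipartite": 1, "id": "Astronomy and Astrophysics", "node_type": "Journal"},
    ],
    "links": [],
}

# Attributes the question wants on every node.
defaults = {"x": 0, "y": 0, "size": 3, "centrality": 0}

for node in json_data["nodes"]:
    for key, value in defaults.items():
        # setdefault keeps any value a node already carries.
        node.setdefault(key, value)

print(json.dumps(json_data["nodes"][0], sort_keys=True))
```

Using setdefault rather than update means a node that already has, say, a computed centrality is left alone.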
I'm having some trouble accessing a value that is inside an array that contains a dictionary and another array.
It looks like this:
[{'name': 'Alex',
'number_of_toys': [{'classification': 3, 'count': 383},
{'classification': 1, 'count': 29},
{'classification': 0, 'count': 61}],
'total_toys': 473},
{'name': 'John',
'number_of_toys': [{'classification': 3, 'count': 8461},
{'classification': 0, 'count': 3825},
{'classification': 1, 'count': 1319}],
'total_toys': 13605}]
I want to access the 'count' number for each 'classification'. For example, for 'name' Alex, if 'classification' is 3, then the code returns the 'count' of 383, and so on for the other classifications and names.
Thanks for your help!
Not sure what your question asks, but if it's just a mapping exercise this will get you on the right track.
def get_toys(personDict):
    person_toys = personDict.get('number_of_toys')
    return [(toys.get('classification'), toys.get('count')) for toys in person_toys]

def get_person_toys(database):
    return [(personDict.get('name'), get_toys(personDict)) for personDict in database]
Calling get_person_toys on the list from the question gives:
[('Alex', [(3, 383), (1, 29), (0, 61)]), ('John', [(3, 8461), (0, 3825), (1, 1319)])]
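To return a single count for a given name and classification, as the question asks, a small lookup helper is one option. This is a minimal sketch; count_for is a hypothetical name that does not appear in the answers, and it returns None when nothing matches.

```python
data = [
    {'name': 'Alex',
     'number_of_toys': [{'classification': 3, 'count': 383},
                        {'classification': 1, 'count': 29},
                        {'classification': 0, 'count': 61}],
     'total_toys': 473},
    {'name': 'John',
     'number_of_toys': [{'classification': 3, 'count': 8461},
                        {'classification': 0, 'count': 3825},
                        {'classification': 1, 'count': 1319}],
     'total_toys': 13605},
]

def count_for(database, name, classification):
    """Return the count for one person and classification, or None if absent."""
    for person in database:
        if person.get('name') != name:
            continue
        for toys in person.get('number_of_toys', []):
            if toys.get('classification') == classification:
                return toys.get('count')
    return None

print(count_for(data, 'Alex', 3))  # 383
```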
This isn't as elegant as the previous answer because it doesn't iterate over the values, but if you want to select specific elements, this is one way to do that:
data = [{'name': 'Alex',
'number_of_toys': [{'classification': 3, 'count': 383},
{'classification': 1, 'count': 29},
{'classification': 0, 'count': 61}],
'total_toys': 473},
{'name': 'John',
'number_of_toys': [{'classification': 3, 'count': 8461},
{'classification': 0, 'count': 3825},
{'classification': 1, 'count': 1319}],
'total_toys': 13605}]
import pandas as pd
df = pd.DataFrame(data)
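# select columns by label; positional [1] on a labelled row is deprecated in recent pandas
print(df.loc[0, 'name'])
print(df.loc[0, 'number_of_toys'][0]['classification'])
print(df.loc[0, 'number_of_toys'][0]['count'])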
which gives:
Alex
3
383
I have a dataframe whose columns contain JSON values nested several levels deep. I would like to extract the innermost keys and values into a new dataframe. A sample column value is below:
{'shipping_assignments': [{'shipping': {'address': {'address_type':
'shipping', 'city': 'Calder', 'country_id': 'US',
'customer_address_id': 1, 'email': 'roni_cost#example.com',
'entity_id': 1, 'firstname': 'Veronica', 'lastname': 'Costello',
'parent_id': 1, 'postcode': '49628-7978', 'region': 'Michigan',
'region_code': 'MI', 'region_id': 33, 'street': ['6146 Honey Bluff
Parkway'], 'telephone': '(555) 229-3326'}, 'method':
'flatrate_flatrate', 'total': {'base_shipping_amount': 5,
'base_shipping_discount_amount': 0,
'base_shipping_discount_tax_compensation_amnt': 0,
'base_shipping_incl_tax': 5, 'base_shipping_invoiced': 5,
'base_shipping_tax_amount': 0, 'shipping_amount': 5,
'shipping_discount_amount': 0,
'shipping_discount_tax_compensation_amount': 0, 'shipping_incl_tax':
5, 'shipping_invoiced': 5, 'shipping_tax_amount': 0}}, 'items':
[{'amount_refunded': 0, 'applied_rule_ids': '1',
'base_amount_refunded': 0, 'base_discount_amount': 0,
'base_discount_invoiced': 0, 'base_discount_tax_compensation_amount':
0, 'base_discount_tax_compensation_invoiced': 0,
'base_original_price': 29, 'base_price': 29, 'base_price_incl_tax':
31.39, 'base_row_invoiced': 29, 'base_row_total': 29, 'base_row_total_incl_tax': 31.39, 'base_tax_amount': 2.39,
'base_tax_invoiced': 2.39, 'created_at': '2019-09-27 10:03:45',
'discount_amount': 0, 'discount_invoiced': 0, 'discount_percent': 0,
'free_shipping': 0, 'discount_tax_compensation_amount': 0,
'discount_tax_compensation_invoiced': 0, 'is_qty_decimal': 0,
'item_id': 1, 'name': 'Iris Workout Top', 'no_discount': 0,
'order_id': 1, 'original_price': 29, 'price': 29, 'price_incl_tax':
31.39, 'product_id': 1434, 'product_type': 'configurable', 'qty_canceled': 0, 'qty_invoiced': 1, 'qty_ordered': 1,
'qty_refunded': 0, 'qty_shipped': 1, 'row_invoiced': 29, 'row_total':
29, 'row_total_incl_tax': 31.39, 'row_weight': 1, 'sku':
'WS03-XS-Red', 'store_id': 1, 'tax_amount': 2.39, 'tax_invoiced':
2.39, 'tax_percent': 8.25, 'updated_at': '2019-09-27 10:03:46', 'weight': 1, 'product_option': {'extension_attributes':
{'configurable_item_options': [{'option_id': '141', 'option_value':
167}, {'option_id': '93', 'option_value': 58}]}}}]}],
'payment_additional_info': [{'key': 'method_title', 'value': 'Check /
Money order'}], 'applied_taxes': [{'code': 'US-MI--Rate 1', 'title':
'US-MI--Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount':
2.39}], 'item_applied_taxes': [{'type': 'product', 'applied_taxes': [{'code': 'US-MI--Rate 1', 'title': 'US-MI--Rate 1', 'percent':
8.25, 'amount': 2.39, 'base_amount': 2.39}]}], 'converting_from_quote': True}
Above is a single row value of the dataframe column df['x'].
My code to convert it is below:
sample = data['x'].tolist()
data = json.dumps(sample)
df = pd.read_json(data)
It gives a new dataframe with these columns:
Index(['applied_taxes', 'converting_from_quote', 'item_applied_taxes',
'payment_additional_info', 'shipping_assignments'],
dtype='object')
When I tried the same approach on a column whose row values are themselves nested:
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
m_sample = m_df.tolist()
m_data = json.dumps(m_sample)
c_df = pd.read_json(m_data)
it doesn't work.
Check this link to get the beautified_json
I came across a handy ETL package for Python called petl. Its fromdicts() function turns a list of dicts (the parsed JSON rows) into a table:
from petl import fromdicts, unpackdict
order_table = fromdicts(data_list)
If any column holds a nested dict, unpackdict(order_table, 'nested_col') will unpack it.
In my case I needed to unpack the applied_taxes column. The code below unpacks it and appends each key as a new column, with its values as rows, in the same table:
order_table = unpackdict(order_table, 'applied_taxes')
See the petl documentation if you want to know more.
It seems that your mistake was in tolist(). Try the following:
import pandas as pd
import json
import re
data = {"shipping_assignments":[{"shipping":{"address":{"address_type":"shipping","city":"Calder","country_id":"US","customer_address_id":1,"email":"roni_cost#example.com","entity_id":1,"firstname":"Veronica","lastname":"Costello","parent_id":1,"postcode":"49628-7978","region":"Michigan","region_code":"MI","region_id":33,"street":["6146 Honey Bluff Parkway"],"telephone":"(555) 229-3326"},"method":"flatrate_flatrate","total":{"base_shipping_amount":5,"base_shipping_discount_amount":0,"base_shipping_discount_tax_compensation_amnt":0,"base_shipping_incl_tax":5,"base_shipping_invoiced":5,"base_shipping_tax_amount":0,"shipping_amount":5,"shipping_discount_amount":0,"shipping_discount_tax_compensation_amount":0,"shipping_incl_tax":5,"shipping_invoiced":5,"shipping_tax_amount":0}},"items":[{"amount_refunded":0,"applied_rule_ids":"1","base_amount_refunded":0,"base_discount_amount":0,"base_discount_invoiced":0,"base_discount_tax_compensation_amount":0,"base_discount_tax_compensation_invoiced":0,"base_original_price":29,"base_price":29,"base_price_incl_tax":31.39,"base_row_invoiced":29,"base_row_total":29,"base_row_total_incl_tax":31.39,"base_tax_amount":2.39,"base_tax_invoiced":2.39,"created_at":"2019-09-27 10:03:45","discount_amount":0,"discount_invoiced":0,"discount_percent":0,"free_shipping":0,"discount_tax_compensation_amount":0,"discount_tax_compensation_invoiced":0,"is_qty_decimal":0,"item_id":1,"name":"Iris Workout Top","no_discount":0,"order_id":1,"original_price":29,"price":29,"price_incl_tax":31.39,"product_id":1434,"product_type":"configurable","qty_canceled":0,"qty_invoiced":1,"qty_ordered":1,"qty_refunded":0,"qty_shipped":1,"row_invoiced":29,"row_total":29,"row_total_incl_tax":31.39,"row_weight":1,"sku":"WS03-XS-Red","store_id":1,"tax_amount":2.39,"tax_invoiced":2.39,"tax_percent":8.25,"updated_at":"2019-09-27 
10:03:46","weight":1,"product_option":{"extension_attributes":{"configurable_item_options":[{"option_id":"141","option_value":167},{"option_id":"93","option_value":58}]}}}]}],"payment_additional_info":[{"key":"method_title","value":"Check / Money order"}],"applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}],"item_applied_taxes":[{"type":"product","applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}]}],"converting_from_quote":"True"}
df = pd.read_json(json.dumps(data))
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
c_df = pd.read_json(json.dumps(list(m_df)))
print(c_df)
prints the following:
0
0 {'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-R...
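For the original goal of extracting the innermost keys and values from one row, a recursive flatten in plain Python is another option. This is a sketch under stated assumptions: flatten is a hypothetical helper, the dotted path naming is an arbitrary choice made here, and row is a trimmed copy of the sample value rather than the full record.

```python
def flatten(value, prefix=""):
    """Yield (path, leaf_value) pairs for every leaf in nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from flatten(child, path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}.{index}" if prefix else str(index)
            yield from flatten(child, path)
    else:
        # Anything that is not a dict or list counts as a leaf.
        yield prefix, value

# Trimmed copy of the sample row (assumption: same shape as the real data).
row = {
    "applied_taxes": [
        {"code": "US-MI-*-Rate 1", "title": "US-MI-*-Rate 1",
         "percent": 8.25, "amount": 2.39, "base_amount": 2.39}
    ],
    "converting_from_quote": True,
}

flat = dict(flatten(row))
for path, leaf in flat.items():
    print(path, "=", leaf)
```

Each flat dict can then serve as one row of a new dataframe, with the dotted paths as column names. Keys that themselves contain dots would make the paths ambiguous, so a different separator may be needed for such data.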
I need to write a function that generates a rank number (an integer) from the N parameters it receives. Each number needs to be of the same order of magnitude as its parameters. Numbers are compared field by field, and when two calls tie on a field, the next field is used to break the tie, e.g.:
func({'field1': 100, 'field2': 3500225})
func({'field1': 50, 'field2': 5465481362135})
The number generated by the first call must be higher than the second because 100 is greater than 50.
func({'field1': 100, 'field2': 3500225})
func({'field1': 100, 'field2': 5465481362135})
The number generated by the second call must be higher than the first: field1 is tied, so the second field breaks the tie, and 5465481362135 is greater than 3500225.
func({'field1': 100, 'field2': 3500225})
func({'field1': 100, 'field2': 3500225, 'field3': 5465481362135, ...N})
In the sample above the second call needs to rank higher than the first: the first call doesn't have field3, so its value can be treated as zero. Note that there can be N fields.
I have tried the code below, but the value after the dot isn't right because a simple sum doesn't take the priority of the fields into account:
Config = [
    {'id': 1, 'parent_id': None, 'Description': 'root', 'level': 1},
    {'id': 2, 'parent_id': 1, 'Description': 'PF', 'level': 1},
    {'id': 3, 'parent_id': 1, 'Description': 'FP', 'level': 2},
    {'id': 4, 'parent_id': 2, 'Description': 'Bank', 'level': 1},
    {'id': 5, 'parent_id': 2, 'Description': 'Input', 'level': 2},
    {'id': 6, 'parent_id': 4, 'Description': 'ST', 'level': 1},
    {'id': 7, 'parent_id': 4, 'Description': 'CF', 'level': 2},
    {'id': 8, 'parent_id': 4, 'Description': 'BB', 'level': 3},
    {'id': 9, 'parent_id': 5, 'Description': 'DDS', 'level': 1},
    {'id': 10, 'parent_id': 5, 'Description': 'Col', 'level': 2},
    {'id': 11, 'parent_id': 3, 'Description': 'Qtd.Event', 'level': 1},
    {'id': 12, 'parent_id': 3, 'Description': 'Unix_Date', 'level': 2},
]
def hierarchy_field(field):
    if field[0]['parent_id'] is None:
        return field
    else:
        [field.append(x) for x in hierarchy_field([dic for dic in Config if dic['id'] == field[0]['parent_id']])]
        return field
def calc_priority(fields):
    for field in [dic for dic in Config if dic['Description'] in [x for x in fields]]:
        score_level = sum(x['level'] for x in hierarchy_field([field]))*-1
        score_level += score_level
    score_value = sum(fields.values())
    score = str(score_level)+'.'+str(score_value)
    return score
    # dic[field['Description']] = score_level
    # return sorted(dic, reverse=True)
value = calc_priority({'Qtd.Event': 871, 'Unix_Date': 564645})
print(value)
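One way to get the ordering described above is to compare the fields as a tuple in priority order, and only then pack that tuple into a single integer. This is a sketch under stated assumptions, not a fix for the Config-based code: rank_key, rank_number, order, and width are hypothetical names, every value is assumed to be a non-negative integer with at most width decimal digits, and missing fields count as zero.

```python
def rank_key(fields, order):
    """Tuple that compares by the fields in priority order; missing fields are 0."""
    return tuple(fields.get(name, 0) for name in order)

def rank_number(fields, order, width=20):
    """Pack the key into one integer, giving each field a fixed block of digits."""
    number = 0
    for value in rank_key(fields, order):
        if not 0 <= value < 10 ** width:
            raise ValueError(f"value {value} does not fit in {width} digits")
        # Shift earlier fields left so they always outweigh later ones.
        number = number * 10 ** width + value
    return number

order = ['field1', 'field2', 'field3']  # priority order (assumption)

a = {'field1': 100, 'field2': 3500225}
b = {'field1': 50, 'field2': 5465481362135}
c = {'field1': 100, 'field2': 5465481362135}
d = {'field1': 100, 'field2': 3500225, 'field3': 5465481362135}

print(rank_number(a, order) > rank_number(b, order))
print(rank_number(c, order) > rank_number(a, order))
print(rank_number(d, order) > rank_number(a, order))
```

If the result only needs to be sortable, rank_key alone is enough, since Python compares tuples element by element. The packed integer grows with the number of fields, which Python's arbitrary-precision integers handle without overflow.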
I have a list of ids sorted in the proper order:
ids = [1, 2, 4, 6, 5, 0, 3]
I also have a list of dictionaries, sorted in some random way:
rez = [{'val': 7, 'id': 1}, {'val': 8, 'id': 2}, {'val': 2, 'id': 3}, {'val': 0, 'id': 4}, {'val': -1, 'id': 5}, {'val': -4, 'id': 6}, {'val': 9, 'id': 0}]
My intention is to sort the rez list so that its order corresponds to ids:
rez = [{'val': 7, 'id': 1}, {'val': 8, 'id': 2}, {'val': 0, 'id': 4}, {'val': -4, 'id': 6}, {'val': -1, 'id': 5}, {'val': 9, 'id': 0}, {'val': 2, 'id': 3}]
I tried:
rez.sort(key = lambda x: ids.index(x['id']))
However, that is too slow for me, as len(ids) > 150K and each dict actually has a lot of keys (some of the values are strings). Any suggestions on how to do this in the most Pythonic, but still fastest, way?
You don't need to sort because ids specifies the entire ordering of the result. You just need to pick the correct elements by their ids:
rez_dict = {d['id']:d for d in rez}
rez_ordered = [rez_dict[id] for id in ids]
Which gives:
>>> rez_ordered
[{'id': 1, 'val': 7}, {'id': 2, 'val': 8}, {'id': 4, 'val': 0}, {'id': 6, 'val': -4}, {'id': 5, 'val': -1}, {'id': 0, 'val': 9}, {'id': 3, 'val': 2}]
This should be faster than sorting because it runs in linear time on average, while sorting is O(n log n).
Note that this assumes that there will be one entry per id, as in your example.
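If several entries can share an id, the same idea still works when the entries are grouped into lists first. This is a minimal sketch; the duplicate entry for id 2 and the shortened ids list are made up here to show the behaviour, and ids with no matching entry are simply skipped.

```python
from collections import defaultdict

ids = [1, 2, 4]
rez = [
    {'val': 7, 'id': 1},
    {'val': 8, 'id': 2},
    {'val': 0, 'id': 4},
    {'val': 5, 'id': 2},  # hypothetical duplicate id
]

# Group entries by id, keeping their original relative order.
groups = defaultdict(list)
for d in rez:
    groups[d['id']].append(d)

# Walk ids once and emit every entry stored for each id.
rez_ordered = [d for id_ in ids for d in groups.get(id_, [])]
print(rez_ordered)
```

Both passes are linear, so the overall cost stays linear in the number of ids plus the number of entries.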
I think you are on the right track. The slow part is ids.index(), which scans the list for every element and makes the sort key alone quadratic. To speed it up, you can first turn the list into a dictionary that maps each id to its index.
indices = {id_: pos for pos, id_ in enumerate(ids)}
rez.sort(key = lambda x: indices[x['id']])
This way, indices is {0: 5, 1: 0, 2: 1, 3: 6, 4: 2, 5: 4, 6: 3}, and rez is
[{'id': 1, 'val': 7},
{'id': 2, 'val': 8},
{'id': 4, 'val': 0},
{'id': 6, 'val': -4},
{'id': 5, 'val': -1},
{'id': 0, 'val': 9},
{'id': 3, 'val': 2}]
I have a problem using $elemMatch on a doubly nested array.
Suppose I have this document:
a = {'cart': [[{'id': 1, 'count': 1}, {'id': 2, 'count': 3}], [{'id': 1, 'count': 5}]]}
I want to select a document only when a single item has id 1 and a count greater than 2:
db.cart.find_one({'cart.0.id': 1, 'cart.0.count': {'$gt': 2}})
But this query still selects a, even though no single item in the first inner array satisfies both conditions.
Then I have tried these queries:
db.cart.find_one({'cart': {'$elemMatch': {'id': 1, 'count': {'$gt': 2}}}})
db.cart.find_one({'cart': {'$elemMatch': {'id': 2, 'count': {'$gt': 2}}}})
db.cart.find_one({'cart.0': {'$elemMatch': {'id': 1, 'count': {'$gt': 2}}}})
db.cart.find_one({'cart.0': {'$elemMatch': {'id': 2, 'count': {'$gt': 2}}}})
But they all return None.
So does $elemMatch support matching in nested arrays? If so, how should I adjust my query?
Since you have an array within an array, I think you could try nesting $elemMatch, something like:
db.cart.find_one({'cart': {'$elemMatch': { '$elemMatch' : {'id': 1, 'count': {'$gt': 2}}}}})
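The condition that the nested $elemMatch expresses can be illustrated in plain Python: some inner array must contain one item that satisfies both criteria at once. This is only a sketch of the intended matching logic, not MongoDB's implementation, and it does not query a database; matches_nested and the second document b are hypothetical.

```python
def matches_nested(doc, item_id, min_count):
    """True if some inner array holds ONE item with this id and count > min_count."""
    return any(
        item.get('id') == item_id and item.get('count', 0) > min_count
        for inner in doc.get('cart', [])
        for item in inner
    )

a = {'cart': [[{'id': 1, 'count': 1}, {'id': 2, 'count': 3}], [{'id': 1, 'count': 5}]]}
# Hypothetical document where id 1 and count > 2 only occur on different items.
b = {'cart': [[{'id': 1, 'count': 1}, {'id': 2, 'count': 3}]]}

print(matches_nested(a, 1, 2))  # True: the second inner array has id 1 with count 5
print(matches_nested(b, 1, 2))  # False: no single item meets both conditions
```

The dotted query in the question behaves differently because each condition may be satisfied by a different item, which is why it matched even when no single item qualified.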