python key value two dict matches - python

I am trying to match the values of the same key in two dicts by looping over them. Hopefully, once if i in line_aum['id_class'] == line_investor['id_class'] evaluates to True, the sum function that follows will work.
However, it produces a different result.
So far I have:
for line_aum in aum_obj:
    for line_investor in investor_obj:
        if i in line_aum['id_class'] == line_investor['id_class']:
            total = (sum,line_investor['amount'], line_aum['value'])
            amount = line['id_class']
            print(amount,total)
Example data:
{'fund_name': '', 'fund_code': 'LFE', 'aumc': '406.37', 'value': '500', 'ddate': '2013-01-01', 'id_fund': '165', 'currency': 'EUR', 'nav': '24.02', 'shares': '16.918', 'estimate': '0', 'id_class': '4526', 'class_name': 'LTD - CLASS B (EUR)'}

Use itertools.product instead of nested loops if both aum_obj and investor_obj are lists:
from itertools import product

for line_aum, line_investor in product(aum_obj, investor_obj):
    if line_aum['id_class'] == line_investor['id_class']:
        # `line_aum` and `line_investor` have matching values for the `id_class` keys.
        pass
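As a possible follow-up for the loop body: the question's total = (sum, ...) builds a tuple instead of adding the numbers, and the values in the example data are strings. A minimal sketch, assuming the investor rows carry an 'amount' key as in the question (the sample rows below are made up for illustration):

```python
from itertools import product

# Hypothetical sample rows; only the keys used below are shown.
aum_obj = [
    {"id_class": "4526", "value": "500"},
    {"id_class": "4527", "value": "120"},
]
investor_obj = [
    {"id_class": "4526", "amount": "250.5"},
    {"id_class": "9999", "amount": "10"},
]

totals = {}
for line_aum, line_investor in product(aum_obj, investor_obj):
    if line_aum["id_class"] == line_investor["id_class"]:
        # The values are strings in the example data, so convert before adding.
        total = float(line_investor["amount"]) + float(line_aum["value"])
        totals[line_aum["id_class"]] = total

print(totals)
```

Only the pair that shares id_class '4526' ends up in totals; the other rows have no partner and are skipped.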

Related

How can I explode a nested dictionary into a dataframe?

I have a nested dictionary as below. I'm trying to convert the below to a dataframe with the columns iid, Invnum, #type, execId, CId, AId, df, type. What’s the best way to go about it?
data = {'A': {'B1': {'iid': 'B1', 'Invnum': {'B11': {'#type': '/test_data', 'execId': 42, 'CId': 42, 'AId': 'BAZ'}, 'B12': {'#type': '/test_data', 'CId': 8, 'AId': '123'}}}}, 'B2': {'iid': 'B2', 'Invnum': {'B21': {'#type': '/test_data', 'execId': 215, 'CId': 253,'df': [], 'type': 'F'}, 'B22': {'#type': '/test_data', 'execId': 10,'df': [], 'type': 'F'}}}}
for key1 in data['A'].keys():
    for key2 in data['A'][key1]['Invnum']:
        print(key1,key2)
Expected output:
As indicated in the comments, your input data is quite unclear. This makes it hard for us, because we don't know what we can and cannot assume. For my solution I will assume at least the following, based on the example you provide:
In the dictionary there is an entry containing iid and Invnum as keys at the same level.
The Invnum key is the only key that has multiple values, or in other words is iterable (besides df), and iterating over it must yield the innermost dictionaries. In other words, below an Invnum value (e.g. B11) there is only the last dict, with the other fields as keys (#type, execId, CId, AId, df, type), if they exist.
If there is a df value, it will hold a list.
import pandas as pd

# This is a placeholder dictionary, so we can create entries that have the same pattern.
entry = {'#type': '', 'execId': '', 'CId': '', 'AId': '', 'df': '', 'type': ''}

# This will hold all the (properly) formatted entries for the df.
items = []

def flatten(data):
    if isinstance(data, dict):
        match data:
            # We are searching for a level that contains both an `iid` and `Invnum` key.
            case {'iid': id, 'Invnum': remainder}:
                for each in remainder:
                    entry_row = dict(**entry, iid=id, Invnum=each)
                    entry_row.update(remainder[each])
                    items.append(entry_row)
            case _:
                for key, value in data.items():
                    flatten(value)

# We flatten the data, such that the `items` variable will hold consistent entries.
flatten(data)

# Transfer to a pandas dataframe, and reorder the columns for easy comparison.
df = pd.DataFrame(items)
df = df[['iid', 'Invnum', '#type', 'execId', 'CId', 'AId', 'df', 'type']]
print(df.to_string(index=False))
Output:
iid  Invnum  #type       execId  CId  AId  df  type
B1   B11     /test_data  42      42   BAZ
B1   B12     /test_data          8    123
B2   B21     /test_data  215     253       []  F
B2   B22     /test_data  10                []  F
Note:
Missing values are filled with '', so columns such as execId and CId hold a mix of numbers and empty strings instead of being numeric.
I rely heavily on the assumptions above; if they are incorrect, the answer will not match your expectation.
I am using structural pattern matching, which was introduced in Python 3.10.
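If you are on a Python version older than 3.10, the same idea can be sketched without match, using plain key checks. This is only an illustration under the same assumptions as above; flatten_rows and the small sample dict are hypothetical names and data, not part of the original answer:

```python
def flatten_rows(node, rows=None):
    """Collect one row per Invnum entry from an arbitrarily nested dict."""
    if rows is None:
        rows = []
    if isinstance(node, dict):
        if "iid" in node and "Invnum" in node:
            # Found the level that holds both keys: emit one row per Invnum key.
            for inv_key, fields in node["Invnum"].items():
                rows.append({"iid": node["iid"], "Invnum": inv_key, **fields})
        else:
            # Keep descending until such a level is found.
            for value in node.values():
                flatten_rows(value, rows)
    return rows

# Small hypothetical input with the same shape as the question's data.
sample = {
    "A": {"B1": {"iid": "B1", "Invnum": {"B11": {"execId": 42, "CId": 42},
                                         "B12": {"CId": 8}}}},
    "B2": {"iid": "B2", "Invnum": {"B21": {"execId": 215, "type": "F"}}},
}

rows = flatten_rows(sample)
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows) should then give one line per Invnum entry, with NaN (rather than '') wherever a field is missing.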

Replacement for Spark's CASE WHEN THEN

I am new to Spark and am trying to optimize code written by another developer. The scenario is as follows:
There is a list of dictionaries with three key-value pairs. One is source:value, second is target:value and third is column:value.
A CASE WHEN THEN statement is generated from these three key-value pairs. For instance, the list of dictionaries is as follows:
values = [{'target': 'Unknown', 'source': '', 'column': 'gender'},
{'target': 'F', 'source': '0', 'column': 'gender'},
{'target': 'M', 'source': '1', 'column': 'gender'},
{'target': 'F', 'source': 'F', 'column': 'gender'},
{'target': 'F', 'source': 'Fe', 'column': 'gender'}]
The following code generates the CASE WHEN THEN statement that follows.
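for value in values:
    column = value.get("column")
    source_value = value.get("source")
    # `op` is assumed to be initialised before the loop (not shown in the question).
    op = op.when(df[column] == source_value, value.get("target"))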
Column<'CASE WHEN (gender = ) THEN Unknown
WHEN (gender = 0) THEN F
WHEN (gender = 1) THEN M
WHEN (gender = F) THEN F
WHEN (gender = Fe) THEN F END'>
This CASE WHEN THEN is then used to select data from a dataframe.
Question: Is the usage of CASE WHEN THEN valid here (is it optimized)? Some of the CASE WHEN statements are very very lengthy (around 1000+). Is there a better way to redo the code (regex perhaps)?
I looked at the questions below, but they were not relevant to my case.
CASE WHEN ... THEN
SPARK SQL - case when then
Thanks.
Two alternatives:
Use a UDF, in which you can access a dictionary of values
Build a lookup table and perform a broadcast join
The way to know which is better is to examine the execution plan, job duration and total shuffle.
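To illustrate the dictionary idea behind the first alternative, here is a minimal plain-Python sketch of the lookup itself. The names lookup and map_value are hypothetical; wrapping map_value in a Spark UDF (or turning lookup into a small dataframe for a broadcast join) is left out here and would still need to be checked against your Spark version:

```python
values = [
    {"target": "Unknown", "source": "", "column": "gender"},
    {"target": "F", "source": "0", "column": "gender"},
    {"target": "M", "source": "1", "column": "gender"},
    {"target": "F", "source": "F", "column": "gender"},
    {"target": "F", "source": "Fe", "column": "gender"},
]

# Build {column: {source: target}} once, instead of one WHEN branch per entry.
lookup = {}
for value in values:
    lookup.setdefault(value["column"], {})[value["source"]] = value["target"]

def map_value(column, raw):
    """Return the mapped target, or None when no mapping exists
    (a CASE WHEN without ELSE also yields NULL in that situation)."""
    return lookup.get(column, {}).get(raw)

print(map_value("gender", "Fe"))
print(map_value("gender", "xyz"))
```

With 1000+ mappings this keeps each lookup a single hash access rather than a long chain of comparisons, but whether it beats the generated CASE WHEN inside Spark is something only the execution plan and timings can tell you.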

How to Loop over dictionary and return output of specific values to Numpy Array

I am trying to loop over a dictionary, in this case the Citi Bike data, and pull out the values of the specific keys 'lat' and 'lon', and then put those values in a NumPy array. I was able to pull the data from the URL, but I am getting stuck when trying a for loop. The dictionary is 'datadict', which I pulled from the URL.
import requests
response = requests.get("https://gbfs.citibikenyc.com/gbfs/en/station_information.json")
datadict = response.json()
I tried this comprehension first and then tried forming a regular for statement.
import numpy as np
dict_reduce = {key: datadict[key] for key in datadict.values() & {'lon', 'lat'}}
coordinates = np.array(dict_reduce)
result = list()
for key in datadict:
    for x in datadict[key]:
        result.append(x['lon'])
Dictionary Preview:
"""
[{'capacity': 55,
'eightd_has_key_dispenser': False,
'eightd_station_services': [],
'electric_bike_surcharge_waiver': False,
'external_id': '66db237e-0aca-11e7-82f6-3863bb44ef7c',
'has_kiosk': True,
'lat': 40.76727216,
'legacy_id': '72',
'lon': -73.99392888,
'name': 'W 52 St & 11 Ave',
'region_id': '71',
'rental_methods': ['KEY', 'CREDITCARD'],
'rental_uris': {'android': 'https://bkn.lft.to/lastmile_qr_scan',
'ios': 'https://bkn.lft.to/lastmile_qr_scan'},
'short_name': '6926.01',
'station_id': '72',
'station_type': 'classic'},
{'capacity': 33,
'eightd_has_key_dispenser': False,
'eightd_station_services': [],
'electric_bike_surcharge_waiver': False,
'external_id': '66db269c-0aca-11e7-82f6-3863bb44ef7c',
'has_kiosk': True,
'lat': 40.71911552,
'legacy_id': '79',
'lon': -74.00666661,
'name': 'Franklin St & W Broadway',
'region_id': '71',
'rental_methods': ['KEY', 'CREDITCARD'],
'rental_uris': {'android': 'https://bkn.lft.to/lastmile_qr_scan',
'ios': 'https://bkn.lft.to/lastmile_qr_scan'},
'short_name': '5430.08',
'station_id': '79',
'station_type': 'classic'},
"""
It looks like you need:
dict_reduce = [{key: d[key] for key in {'lon', 'lat'} } for d in datadict]
You should be able to directly pull out the lat and long values from each element in the list.
From your dictionary preview it looks more like a list of dictionaries rather than a single dictionary and you need to account for that.
lat_list = list()
long_list = list()
for elem in datadict:
    lat_list.append(elem['lat'])
    long_list.append(elem['lon'])
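Since the question asks for a NumPy array, here is a minimal sketch of that last step. It assumes you already have the list of station dicts, as in the preview; the two-station stations list below is a shortened, hypothetical stand-in. If datadict is the full JSON response, the list is probably nested further down (in the GBFS format typically under 'data' and then 'stations'), so check that first.

```python
import numpy as np

# Hypothetical two-station sample shaped like the preview above.
stations = [
    {"name": "W 52 St & 11 Ave", "lat": 40.76727216, "lon": -73.99392888},
    {"name": "Franklin St & W Broadway", "lat": 40.71911552, "lon": -74.00666661},
]

# One row per station; the columns are (lat, lon).
coordinates = np.array([[s["lat"], s["lon"]] for s in stations])

print(coordinates.shape)
print(coordinates[:, 0])  # all latitudes
```

Keeping lat and lon as two columns of one array makes it easy to slice either coordinate afterwards.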

Filtering through a list with embedded dictionaries

I've got a json format list with some dictionaries within each list, it looks like the following:
[{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
The number of entries in the list can be up to 100. I plan to present the 'name' for each entry, one result at a time, for those that have London as a town. The rest are of no use to me. I'm a beginner at Python so I would appreciate a suggestion on how to go about this efficiently. I initially thought it would be best to remove all entries that don't have London and then go through the rest one by one.
I also wondered if it might be quicker to not filter but to cycle through the entire json and select the names of entries that have the town as London.
You can use filter:
data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
london_dicts = filter(lambda d: d['venue']['town'] == 'London', data)
for d in london_dicts:
    print(d)
This is about as efficient as it gets because:
The iteration itself is implemented in C (in the case of CPython), although the lambda is still called once per item
filter returns an iterator (in Python 3), which means that the results are produced one by one as required
One way is to use list comprehension:
>>> data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
>>> [d for d in data if d['venue']['town'] == 'London']
[{'id': 17,
'name': 'Alfred',
'venue': {'id': 456, 'town': 'London'},
'month': 'February'},
{'id': 17,
'name': 'Mary',
'venue': {'id': 56, 'town': 'London'},
'month': 'December'}]
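Since the question asks for the names one result at a time, a generator expression may be a closer fit than building the whole filtered list. A small sketch using the same data (london_names is just an illustrative name):

```python
data = [
    {"id": 13, "name": "Albert", "venue": {"id": 123, "town": "Birmingham"}, "month": "February"},
    {"id": 17, "name": "Alfred", "venue": {"id": 456, "town": "London"}, "month": "February"},
    {"id": 20, "name": "David", "venue": {"id": 14, "town": "Southampton"}, "month": "June"},
    {"id": 17, "name": "Mary", "venue": {"id": 56, "town": "London"}, "month": "December"},
]

# Lazily yields only the names of entries whose venue town is London.
london_names = (d["name"] for d in data if d["venue"]["town"] == "London")

print(next(london_names))  # Alfred
print(next(london_names))  # Mary
```

Calling next(london_names, None) afterwards returns None instead of raising StopIteration, which is a convenient way to detect that there are no more results.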

Use list of indices to manipulate a nested dictionary

I'm trying to perform operations on a nested dictionary (data retrieved from a yaml file):
data = {'services': {'web': {'name': 'x'}}, 'networks': {'prod': 'value'}}
I'm trying to modify the above using the inputs like:
{'services.web.name': 'new'}
I converted the above to a list of indices ['services', 'web', 'name']. But I'm not able to/not sure how to perform the below operation in a loop:
data['services']['web']['name'] = new
That way I can modify the data dict. There are other values I plan to change in the above dictionary (it is an extensive one), so I need a solution that also works when I have to change, e.g.:
data['services2']['web2']['networks']['local'].
Is there an easy way to do this? Any help is appreciated.
You may iterate over the keys while moving a reference:
data = {'networks': {'prod': 'value'}, 'services': {'web': {'name': 'x'}}}
modification = {'services.web.name': 'new'}
for key, value in modification.items():
    keyparts = key.split('.')
    to_modify = data
    for keypart in keyparts[:-1]:
        to_modify = to_modify[keypart]
    to_modify[keyparts[-1]] = value
print(data)
Giving:
{'networks': {'prod': 'value'}, 'services': {'web': {'name': 'new'}}}
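The loop above raises a KeyError if an intermediate key such as services2 does not exist yet. If you would rather have missing levels created on the fly, a variant with setdefault could look like the sketch below (set_by_path is a hypothetical helper name, and the '10.0.0.1' value is made up):

```python
def set_by_path(target, dotted_key, value):
    """Set a value in a nested dict, creating intermediate dicts as needed."""
    *parents, last = dotted_key.split(".")
    for part in parents:
        # Reuse the existing sub-dict, or create an empty one if it is missing.
        target = target.setdefault(part, {})
    target[last] = value

data = {"services": {"web": {"name": "x"}}, "networks": {"prod": "value"}}
set_by_path(data, "services.web.name", "new")
set_by_path(data, "services2.web2.networks.local", "10.0.0.1")
print(data)
```

Whether silently creating missing levels is desirable depends on your YAML; for config files a KeyError on a mistyped path can be the safer behaviour.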
