I am pulling JSON data from an API and have a number of columns in my dataframe that contain dictionaries. These dictionaries are written so that the id and the value are two separate entries in the dictionary, like so:
{'id': 'AnnualUsage', 'value': '13071'}
Some of the rows for these columns contain only one dictionary entry, as shown above, but others can contain several:
[{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
When I attempt to break this dictionary down into separate column attributes
CTG_df2 = pd.concat([CTG_df['id'], CTG_df['applicationUDFs'].apply(pd.Series)], axis=1)
I end up with dataframe columns that each still contain a dictionary of the above form, i.e.
{'id': 'AnnualUsageDE', 'value': '13071'}
Is there a way for me to convert my dictionary values into new key-value pairs? For instance I would like to convert from:
{'id': 'AnnualUsageDE', 'value': '13071'}
to
{'AnnualUsageDE': '13071'}
If this is possible I will then be able to create new columns from these attributes.
You can do a dict comprehension. From your list of dicts, compose a new dict where the key is the id of each element and the value is the value of each element.
original = [{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
newdict = {subdict['id']: subdict['value'] for subdict in original}
print(newdict)
# {'AnnualUsage': '13071',
# 'Number': '06860702',
# 'NumberOfMe': '3',
# 'Prem': '960002',
# 'ProjectID': '0039',
# 'Region': 'CHR',
# 'Tariff': 'Multiple',
# 'TestId': 'Z13753'}
Alternatively, you can iterate through the list and add each entry's id as a key with its corresponding value:
newdict = dict()
for x in original:
    newdict[x["id"]] = x["value"]
Related
I have the following dictionary
{'data':[{'action_values': [
{'action_type': 'offsite_conversion', 'value': '5479.8'},
{'action_type': 'omni_add_to_cart', 'value': '9217.55'},
{'action_type': 'omni_purchase', 'value': '5479.8'},
{'action_type': 'add_to_cart', 'value': '9217.55'}]}]}
I am trying to convert it so that each action_type becomes a pandas DataFrame column, with its value as the row. Something like:
offsite_conversion omni_add_to_cart omni_purchase add_to_cart
0 5479.8 9217.55 5479.8 9217.55
Using .json_normalize():
df = pd.json_normalize(data=data["data"], record_path="action_values").transpose().reset_index(drop=True)
df = df.rename(columns=df.iloc[0]).drop(df.index[0]).reset_index(drop=True)
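An equivalent route that avoids the rename step is to make action_type the index before transposing; a sketch using the data dict from the question:
import pandas as pd

df = (
    pd.json_normalize(data["data"], record_path="action_values")
      .set_index("action_type")   # action types become the index...
      .T                          # ...and then the columns after transposing
      .reset_index(drop=True)
)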
I have this snapshot of my dataset
test={'data': [{'name': 'john',
'insights': {'data': [{'account_id': '123',
'test_id': '456',
'date_start': '2022-12-31',
'date_stop': '2023-01-29',
'impressions': '4070',
'spend': '36.14'}],
'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
'status': 'ACTIVE',
'id': '789'},
{'name': 'jack', 'status': 'PAUSED', 'id': '420'}]
}
I want to create a pandas dataframe where the columns are the name, date_start, date_stop, impressions, and spend.
When I tried json_normalize(), it raised an error because some of the keys are missing when 'status' is 'PAUSED'. Is there a way to drop the entries with missing keys from the list, or another way of using json_normalize()? I tried errors='ignore' but it doesn't work either.
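One way around this, as a sketch that sidesteps json_normalize(): pre-flatten the records in plain Python, skipping entries that have no insights, and build the DataFrame from the flat dicts.
import pandas as pd

# Keep only entries that actually carry insights data; each row combines the
# parent's name with one insights record.
rows = [
    {"name": entry["name"], **record}
    for entry in test["data"]
    for record in entry.get("insights", {}).get("data", [])
]

df = pd.DataFrame(rows)[["name", "date_start", "date_stop", "impressions", "spend"]]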
I am getting output in the format shown below, but I want each sub_data key (day0, day1, ..., day30) as a DataFrame column, with its value in the row. Any help will be appreciated. Thank you in advance.
I've tried to convert my data into an array but it doesn't work as I want.
This is my output:
{'date': '2021-12-30 17:31:05.865139', 'sub_data': [{'key': 'day0', 'value': 255}, {'key': 'day1', 'value': 1}, {'key': 'day3', 'value': 8}, {'key': 'day7', 'value': 2}, {'key': 'day15', 'value': 3}, {'key': 'day30', 'value': 5}]}
{'date': '2021-12-31 17:31:05.907697', 'sub_data': [{'key': 'day0', 'value': 222}, {'key': 'day1', 'value': 1}, {'key': 'day3', 'value': 0}, {'key': 'day7', 'value': 0}, {'key': 'day15', 'value': 1}, {'key': 'day30', 'value': 0}]}
There are a few ways you can generate the pandas DataFrame you want. The output data you provide is quite nested, so you have to pull the data out. One complication is that in the sub_data entries the dictionary keys are literally 'key' and 'value' rather than the actual column names. With a custom function you can prepare the data as needed:
Option I:
from datetime import datetime
import pandas as pd

def generate_dataframe(dataset):
    rows = []
    for data in dataset:
        dataframe_row = {}
        # Convert the date string and reformat it, e.g. '30Dec21'
        date_time_obj = datetime.strptime(data['date'], '%Y-%m-%d %H:%M:%S.%f')
        dataframe_row['date'] = date_time_obj.strftime("%d%b%y")
        # Each sub_data entry becomes a column named after its 'key'
        for vals in data['sub_data']:
            dataframe_row[vals['key']] = vals['value']
        rows.append(dataframe_row)
    # Build the DataFrame once from the list of row dicts; appending to an
    # empty DataFrame inside the loop is bad practice (and DataFrame.append
    # was removed in pandas 2.0)
    return pd.DataFrame(rows)

dataset = [output_I, output_II]
df = generate_dataframe(dataset)
Option II:
Extract the sub_data and transpose it so that each 'key' becomes a column:
def process_sub_data(data):
    # Convert sub_data to a DataFrame first (columns: 'key' and 'value')
    df_data = pd.DataFrame(data['sub_data'])
    # Transpose the DataFrame
    df_data = df_data.T
    # Promote the first row (the keys) to the column headers
    df_data.columns = df_data.iloc[0]
    df_data = df_data.iloc[1:].reset_index(drop=True)
    return df_data
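To stitch the per-record frames together with their dates, one possible follow-up (assuming the two printed dicts are bound to output_I and output_II, as in Option I):
frames = [process_sub_data(d) for d in [output_I, output_II]]
df = pd.concat(frames, ignore_index=True)
# Add the date column back at the front
df.insert(0, 'date', [output_I['date'], output_II['date']])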
Option III
You can try to flatten the nested data with
df_res = pd.json_normalize(data, max_level=2)
This will not work properly on its own, because your intended column names (day0, ..., day30) are stored as values inside the sub_data dicts, not as keys.
Hope I could help :)
I am trying to figure out how to do an assertion to see if a number exists in a list.
So my list looks like:
data = [{'value': Decimal('4.21'), 'Type': 'sale'},
{'value': Decimal('84.73'), 'Type': 'sale'},
{'value': Decimal('70.62'), 'Type': 'sale'},
{'value': Decimal('15.00'), 'Type': 'credit'},
{'value': Decimal('2.21'), 'Type': 'credit'},
{'value': Decimal('4.21'), 'Type': 'sale'},
{'value': Decimal('84.73'), 'Type': 'sale'},
{'value': Decimal('70.62'), 'Type': 'sale'},
{'value': Decimal('15.00'), 'Type': 'credit'},
{'value': Decimal('2.21'), 'Type': 'credit'}]
Now I am trying to iterate through the list like:
for i in data:
    s = i['value']
    print(s)
    assert 2.21 in i['value'], "Value should be there"
I am somehow only getting the first number returned for "value" i.e. 4.21
You have two problems, as other commenters pointed out. You compare the wrong data types (str against Decimal or, after your edit, float against Decimal), and you also terminate on the first failure. You probably wanted to write:
assert Decimal('2.21') in (d["value"] for d in data)
This will extract the value of the "value" key from each sub-dictionary inside the list and search for Decimal('2.21') in them.
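Put together, a self-contained check could look like this (reusing the data list from the question):
from decimal import Decimal

# Collect every 'value' once, then compare Decimal against Decimal so the
# check uses matching types and is not cut short by a failing loop iteration.
values = [d["value"] for d in data]
assert Decimal("2.21") in values, "Value should be there"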
I have a pandas data frame that contains a list of dictionaries in one column. I need to update that list of dictionaries with a name value, based on the id column of the same data frame. Currently I use the data frame as a lookup while calculating the value:
id name ancestors
55324862 CTICC [{'id': '6197560', 'type': 'neighbor'}, {'id': '6155003', 'type': 'city'}]
6197560 Cape Town City [{'id': '910', 'type': 'city'}, {'id': '6046820', 'type': 'vicinity'},{'id': '55324862', 'type': 'continent'}]
6046820 Cape Town [{'id': '165', 'type': 'country'}, {'id': '55324862', 'type': 'continent'}]
What I do currently
I made a lookup JSON file using the id and name columns from the dataframe, and I iterate through each row of the dataframe and use the lookup file to generate the ancestor name values.
What I want to achieve
id name ancestors
55324862 CTICC [{'id': '6197560', 'type': 'neighbor','name':'Cape Town City'}]
6197560 Cape Town City [{'id': '6046820', 'type': 'vicinity', 'name':'Cape Town'},{'id': '55324862', 'type': 'continent','name':'CTICC'}]
6046820 Cape Town [{'id': '165', 'type': 'country','name':'YXZ'}, {'id': '55324862', 'type': 'continent','name': 'XYZ'}]
What I want to do:
I don't want to use a lookup file, as I have around 700K records to look up and set names for. Is there any other way I can do this without using a lookup file?
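One possibility without an external lookup file is to build an in-memory id-to-name mapping from the same data frame and map it over each ancestors list; a sketch, assuming the frame is called df and casting ids to strings so they match the ids inside ancestors:
# Build the lookup once from the data frame itself
id_to_name = dict(zip(df['id'].astype(str), df['name']))

def add_names(ancestors):
    # Enrich each ancestor dict with the looked-up name ('unknown' is just a placeholder fallback)
    return [{**a, 'name': id_to_name.get(a['id'], 'unknown')} for a in ancestors]

df['ancestors'] = df['ancestors'].apply(add_names)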