Using json_normalize() for missing keys in a Python pandas DataFrame

I have this snapshot of my dataset:
test = {'data': [{'name': 'john',
                  'insights': {'data': [{'account_id': '123',
                                         'test_id': '456',
                                         'date_start': '2022-12-31',
                                         'date_stop': '2023-01-29',
                                         'impressions': '4070',
                                         'spend': '36.14'}],
                               'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
                  'status': 'ACTIVE',
                  'id': '789'},
                 {'name': 'jack', 'status': 'PAUSED', 'id': '420'}]}
I want to create a pandas DataFrame where the columns are name, date_start, date_stop, impressions, and spend.
When I tried json_normalize(), it raised an error because some of the keys are missing in the entries where 'status' is 'PAUSED'. Is there a way to drop the entries with missing keys, or another way of using json_normalize()? I tried errors='ignore', but it doesn't work either.
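The thread shows no accepted fix here, but one workaround is to flatten the records by hand with dict.get(), which tolerates the missing 'insights' key. A minimal sketch using a trimmed copy of the sample above ('paging' omitted for brevity):

```python
import pandas as pd

test = {'data': [{'name': 'john',
                  'insights': {'data': [{'account_id': '123',
                                         'test_id': '456',
                                         'date_start': '2022-12-31',
                                         'date_stop': '2023-01-29',
                                         'impressions': '4070',
                                         'spend': '36.14'}]},
                  'status': 'ACTIVE',
                  'id': '789'},
                 {'name': 'jack', 'status': 'PAUSED', 'id': '420'}]}

# Build one flat record per entry; .get() with defaults tolerates
# entries that have no 'insights' key at all.
rows = []
for entry in test['data']:
    for ins in entry.get('insights', {}).get('data', [{}]):
        rows.append({'name': entry['name'],
                     'date_start': ins.get('date_start'),
                     'date_stop': ins.get('date_stop'),
                     'impressions': ins.get('impressions'),
                     'spend': ins.get('spend')})

df = pd.DataFrame(rows)
```

The paused entry still produces a row; its missing fields simply come out as None/NaN.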

Python dataframe to list/dict

I have a sample dataframe as below.
I want this dataframe converted to the following format in Python so I can pass it into dtype:
{'FirstName': 'string',
 'LastName': 'string',
 'Department': 'integer',
 'EmployeeID': 'string'}
Could anyone please let me know how this can be done?
Note: I need the exact string {'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'} from the exact dataframe.
The dataframe has the list of primary key names and their datatypes. I want to pass this primary_key/datatype combination into concat_df.to_csv(csv_buffer, sep=",", index=False, dtype={'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'})
dict/zip the two series:
import pandas as pd
data = pd.DataFrame({
    'Column_Name': ['FirstName', 'LastName', 'Department', 'EmployeeID'],
    'Datatype': ['string', 'string', 'integer', 'string'],
})
mapping = dict(zip(data['Column_Name'], data['Datatype']))
print(mapping)
prints out
{'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'}
Use to_records(), which is much handier:
print(dict(data.to_records(index=False)))
should give:
{'FirstName': 'string', 'LastName': 'string', 'Department': 'integer', 'EmployeeID': 'string'}
Edit:
If you want the keys alone, then
d = dict(data.to_records(index=False))
print(list(d.keys()))
should give:
['FirstName', 'LastName', 'Department', 'EmployeeID']
You can do an easy dict comprehension with your data:
Input data:
data = pd.DataFrame({'Column_Name' : ['FirstName', 'LastName', 'Department'], 'Datatype' : ['Jane', 'Doe', 666]})
Dict comprehension:
{n[0]:n[1] for n in data.to_numpy()}
This will give you:
{'FirstName': 'Jane', 'LastName': 'Doe', 'Department': 666}
There are certainly other ways, e.g. the pandas to_dict function, but I am less familiar with those.
Edit:
But keep in mind that a dictionary needs unique keys.
Your categories (FirstName, LastName) look like general column headers; this will only work for a single person, since otherwise you would have duplicate keys.
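The to_dict route mentioned above can be sketched like this, using the Column_Name/Datatype frame from the first answer:

```python
import pandas as pd

data = pd.DataFrame({
    'Column_Name': ['FirstName', 'LastName', 'Department', 'EmployeeID'],
    'Datatype': ['string', 'string', 'integer', 'string'],
})

# Make Column_Name the index, select the Datatype column as a Series,
# then let to_dict() build the mapping.
mapping = data.set_index('Column_Name')['Datatype'].to_dict()
```

This yields the same dict as the zip approach, without building intermediate lists.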

How to perform an assertion to verify an item is in a list of dicts in Python

I am trying to figure out how to do an assertion to see if a number exists in a list.
So my list looks like:
from decimal import Decimal

data = [{'value': Decimal('4.21'), 'Type': 'sale'},
        {'value': Decimal('84.73'), 'Type': 'sale'},
        {'value': Decimal('70.62'), 'Type': 'sale'},
        {'value': Decimal('15.00'), 'Type': 'credit'},
        {'value': Decimal('2.21'), 'Type': 'credit'},
        {'value': Decimal('4.21'), 'Type': 'sale'},
        {'value': Decimal('84.73'), 'Type': 'sale'},
        {'value': Decimal('70.62'), 'Type': 'sale'},
        {'value': Decimal('15.00'), 'Type': 'credit'},
        {'value': Decimal('2.21'), 'Type': 'credit'}]
Now I am trying to iterate through the list like:
for i in data:
    s = i['value']
    print(s)
    assert 2.21 in i['value'], "Value should be there"
I am somehow only getting the first number returned for "value" i.e. 4.21
You have two problems, as other commenters pointed out: you compare the wrong data types (str against Decimal or, after your edit, float against Decimal), and you also terminate on the first failure. You probably wanted to write:
assert Decimal('2.21') in (d["value"] for d in data)
This will extract the value of the "value" key from each sub-dictionary inside the list and search for Decimal('2.21') in them.
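Put together as a runnable sketch, using a shortened copy of the data above:

```python
from decimal import Decimal

data = [{'value': Decimal('4.21'), 'Type': 'sale'},
        {'value': Decimal('15.00'), 'Type': 'credit'},
        {'value': Decimal('2.21'), 'Type': 'credit'}]

# The generator expression yields each 'value'; `in` scans them lazily
# and stops as soon as a match is found.
assert Decimal('2.21') in (d['value'] for d in data), "Value should be there"

# An equivalent formulation with any():
found = any(d['value'] == Decimal('2.21') for d in data)
```

Note that the comparison is done Decimal-to-Decimal, so no type mismatch occurs.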

Converting Dictionary Values to new Dictionary

I am pulling json data from an API and have a number of columns in my dataframe that contain dictionaries. These dictionaries are written so that the id and the value are two separate entries in the dictionary like so:
{'id': 'AnnualUsage', 'value': '13071'}
Some of the rows for these columns contain only one dictionary entry like shown above, but others can contain up to 7:
[{'id': 'AnnualUsage', 'value': '13071'},
 {'id': 'TestId', 'value': 'Z13753'},
 {'id': 'NumberOfMe', 'value': '3'},
 {'id': 'Prem', 'value': '960002'},
 {'id': 'ProjectID', 'value': '0039'},
 {'id': 'Region', 'value': 'CHR'},
 {'id': 'Tariff', 'value': 'Multiple'},
 {'id': 'Number', 'value': '06860702'}]
When I attempt to break this dictionary down into separate column attributes
CTG_df2 = pd.concat([CTG_df['id'], CTG_df['applicationUDFs'].apply(pd.Series)], axis=1)
I end up with dataframe columns that each still contain a dictionary of the above form, i.e.
{'id': 'AnnualUsageDE', 'value': '13071'}
Is there a way for me to convert my dictionary values into new key-value pairs? For instance I would like to convert from:
{'id': 'AnnualUsageDE', 'value': '13071'}
to
{'AnnualUsageDE': '13071'}
If this is possible I will then be able to create new columns from these attributes.
You can do a dict comprehension. From your list of dicts, compose a new dict where the key is the id of each element and the value is the value of each element.
original = [{'id': 'AnnualUsage', 'value': '13071'},
            {'id': 'TestId', 'value': 'Z13753'},
            {'id': 'NumberOfMe', 'value': '3'},
            {'id': 'Prem', 'value': '960002'},
            {'id': 'ProjectID', 'value': '0039'},
            {'id': 'Region', 'value': 'CHR'},
            {'id': 'Tariff', 'value': 'Multiple'},
            {'id': 'Number', 'value': '06860702'}]
newdict = {subdict['id']: subdict['value'] for subdict in original}
print(newdict)
# {'AnnualUsage': '13071',
#  'TestId': 'Z13753',
#  'NumberOfMe': '3',
#  'Prem': '960002',
#  'ProjectID': '0039',
#  'Region': 'CHR',
#  'Tariff': 'Multiple',
#  'Number': '06860702'}
Alternatively, you can iterate through the list and set each id as a key in a new dictionary:
newdict = dict()
for x in original:
    newdict[x["id"]] = x["value"]
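To apply this back to the DataFrame from the question, the re-keying can be done per row and then expanded into columns. A sketch: the column name applicationUDFs comes from the question, but the sample frame here is made up for illustration.

```python
import pandas as pd

# Hypothetical frame shaped like the question's CTG_df.
CTG_df = pd.DataFrame({
    'id': [1, 2],
    'applicationUDFs': [
        [{'id': 'AnnualUsage', 'value': '13071'},
         {'id': 'Region', 'value': 'CHR'}],
        [{'id': 'AnnualUsage', 'value': '9000'}],
    ],
})

# Re-key each row's list of dicts into {id: value}, then let
# apply(pd.Series) expand the resulting dicts into columns.
udf_cols = CTG_df['applicationUDFs'].apply(
    lambda lst: {d['id']: d['value'] for d in lst}).apply(pd.Series)
CTG_df2 = pd.concat([CTG_df['id'], udf_cols], axis=1)
```

Rows that lack a given id simply get NaN in that column.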

Convert nested dict in pandas dataframe

I have a nested dictionary with the following structure. I am trying to convert it to a pandas dataframe, but I have problems splitting the nested 'map'/'idMap' entries into separate columns.
{'16': {'label': 't1',
        'prefLab': 'name',
        'altLabel': ['test1', 'test3'],
        'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]},
 '17': {'label': 't2',
        'prefLab': 'name2',
        'broader': ['18'],
        'altLabel': ['test2'],
        'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]}}
The ideal outcome would be a dataframe with the following structure:
    label  prefLab  broader  altLabel  ciID  map1  map2  map3  ...
16
17
Try this, assuming your dict is named data:
train = pd.DataFrame.from_dict(data, orient='index')
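from_dict alone still leaves 'map' as a raw column of lists. One way to also split the idMap fields into their own columns, sketched with the data above (it assumes each 'map' list has exactly one element, as in the sample):

```python
import pandas as pd

data = {'16': {'label': 't1', 'prefLab': 'name',
               'altLabel': ['test1', 'test3'],
               'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]},
        '17': {'label': 't2', 'prefLab': 'name2', 'broader': ['18'],
               'altLabel': ['test2'],
               'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]}}

train = pd.DataFrame.from_dict(data, orient='index')

# Each 'map' cell is a one-element list; pull out its 'idMap' dict
# and expand that dict into separate columns, aligned on the index.
id_maps = train['map'].apply(lambda m: m[0]['idMap']).apply(pd.Series)
result = train.drop(columns='map').join(id_maps)
```

Keys that appear in only one row (map1, map2, map3) come out as NaN in the other rows.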

json_normalize None handling change (pandas 0.23)

I have JSONs containing nested values that are sometimes None, and the behavior has changed between pandas 0.22.0 and pandas 0.23.0.
In 0.22.0:
from pandas.io.json import json_normalize
my_json = {'event': {'name': 'Bob', 'id': '12345','id2': None},
'id': '12345', 'labels': []}
json_normalize(my_json)
gives:
event.id event.id2 event.name id labels
12345 None Bob 12345 []
which I want.
In 0.23.0:
from pandas.io.json import json_normalize
my_json = {'event': {'name': 'Bob', 'id': '12345','id2': None},
'id': '12345', 'labels': []}
json_normalize(my_json)
raises KeyError: 'id2'
Toggling errors='ignore' does nothing, and it's not really feasible to change the nested Nones to placeholder values. Does anyone know how to achieve the prior behavior with the update?
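For what it's worth, recent pandas versions normalize this input cleanly again via the top-level pd.json_normalize (a sketch; the exact behavior may depend on your pandas version):

```python
import pandas as pd

my_json = {'event': {'name': 'Bob', 'id': '12345', 'id2': None},
           'id': '12345', 'labels': []}

# Nested None values flatten to missing values in the resulting
# row instead of raising KeyError.
df = pd.json_normalize(my_json)
```

So if upgrading is an option, the 0.22-style output (event.id2 present, holding a missing value) is recoverable without touching the data.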
