How can I change the dictionary output format in Python?

I am getting output in this format:
But I want output in this format:
Any help will be appreciated.
Thank you in advance.
I've tried to convert my data into an array, but it doesn't work the way I want.
This is my output:
[{'date': '2021-12-30 17:31:05.865139', 'sub_data': [{'key': 'day0', 'value': 255}, {'key': 'day1', 'value': 1}, {'key': 'day3', 'value': 8}, {'key': 'day7', 'value': 2}, {'key': 'day15', 'value': 3}, {'key': 'day30', 'value': 5}]},
{'date': '2021-12-31 17:31:05.907697', 'sub_data': [{'key': 'day0', 'value': 222}, {'key': 'day1', 'value': 1}, {'key': 'day3', 'value': 0}, {'key': 'day7', 'value': 0}, {'key': 'day15', 'value': 1}, {'key': 'day30', 'value': 0}]}]

There are a few ways to generate a pandas DataFrame the way you want. The output data you provide is deeply nested, so you have to pull the values out yourself. One problem is that in the sub_data dictionaries the keys are literally called 'key' and 'value' rather than the actual names (day0, day1, ...). With a custom function you can prepare the data as needed:
Option I:
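import pandas as pd
from datetime import datetime
def generate_dataframe(dataset):
    # Collect one flat dict per record; growing a DataFrame row by row is slow,
    # and DataFrame.append was removed in pandas 2.0
    rows = []
    for data in dataset:
        dataframe_row = {}
        # Convert the date string, e.g. '2021-12-30 17:31:05.865139' -> '30Dec21'
        date_time_obj = datetime.strptime(data['date'], '%Y-%m-%d %H:%M:%S.%f')
        dataframe_row['date'] = date_time_obj.strftime("%d%b%y")
        # Each {'key': 'day0', 'value': 255} becomes the column/value pair day0: 255
        for vals in data['sub_data']:
            dataframe_row[vals['key']] = vals['value']
        rows.append(dataframe_row)
    # Build the DataFrame once from the list of rows
    return pd.DataFrame(rows)
# output_I and output_II are the two dicts shown in the question
dataset = [output_I, output_II]
df = generate_dataframe(dataset)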
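A more compact variant of Option I, sketched under the assumption that the records arrive as a plain list of dicts (the dataset variable below simply holds the two records from the question): build each flat row with a dict comprehension, then convert the whole date column at once with pd.to_datetime and .dt.strftime instead of calling datetime.strptime row by row.

```python
import pandas as pd

# The two records from the question, assumed to arrive as a list of dicts
dataset = [
    {'date': '2021-12-30 17:31:05.865139',
     'sub_data': [{'key': 'day0', 'value': 255}, {'key': 'day1', 'value': 1},
                  {'key': 'day3', 'value': 8}, {'key': 'day7', 'value': 2},
                  {'key': 'day15', 'value': 3}, {'key': 'day30', 'value': 5}]},
    {'date': '2021-12-31 17:31:05.907697',
     'sub_data': [{'key': 'day0', 'value': 222}, {'key': 'day1', 'value': 1},
                  {'key': 'day3', 'value': 0}, {'key': 'day7', 'value': 0},
                  {'key': 'day15', 'value': 1}, {'key': 'day30', 'value': 0}]},
]

# One flat dict per record: the date plus one key: value pair per sub_data entry
rows = [
    {'date': record['date'],
     **{item['key']: item['value'] for item in record['sub_data']}}
    for record in dataset
]
df = pd.DataFrame(rows)

# Reformat the whole date column at once, using the same '%d%b%y' pattern as Option I
df['date'] = pd.to_datetime(df['date']).dt.strftime('%d%b%y')
print(df)
```

%b produces the month abbreviation of the current locale, so '30Dec21' assumes an English or C locale.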
Option II:
Extract the data and transpose the sub data:
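import pandas as pd
def process_sub_data(data):
    # Convert the sub_data list to a DataFrame first (columns 'key' and 'value')
    df_data = pd.DataFrame(data['sub_data'])
    # Transpose it: row 0 now holds the keys, row 1 the values
    df_data = df_data.T
    # Use the first row as the column names and keep only the value row
    df_data.columns = df_data.iloc[0]
    df_data = df_data.iloc[1:].reset_index(drop=True)
    return df_data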
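Option II only reshapes a single record's sub_data. A minimal sketch of how it might be applied to every record, assuming the same list-of-dicts input (trimmed here to three keys per record for brevity): add each record's date as a column, concatenate the one-row frames, and let infer_objects turn the object-typed day columns, a side effect of the transpose, back into integers.

```python
import pandas as pd

def process_sub_data(data):
    # Same steps as Option II: transpose, promote the 'key' row to column names
    df_data = pd.DataFrame(data['sub_data']).T
    df_data.columns = df_data.iloc[0]
    df_data = df_data.iloc[1:].reset_index(drop=True)
    return df_data

# Records from the question, trimmed to three keys each for brevity
dataset = [
    {'date': '2021-12-30 17:31:05.865139',
     'sub_data': [{'key': 'day0', 'value': 255}, {'key': 'day1', 'value': 1},
                  {'key': 'day3', 'value': 8}]},
    {'date': '2021-12-31 17:31:05.907697',
     'sub_data': [{'key': 'day0', 'value': 222}, {'key': 'day1', 'value': 1},
                  {'key': 'day3', 'value': 0}]},
]

frames = []
for record in dataset:
    frame = process_sub_data(record)
    # Put the record's date in front of the day columns
    frame.insert(0, 'date', record['date'])
    frames.append(frame)

df = pd.concat(frames, ignore_index=True)
# The transpose leaves every column as object dtype; recover the integer columns
df = df.infer_objects()
# Drop the leftover 'key' label on the column index
df.columns.name = None
print(df)
```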
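Option III:
You can try to flatten the nested data with
df_res = pd.json_normalize(data, max_level=2)
This will not work properly: your column names (day1, ... day30) are stored as the values of the 'key' entries, not as keys of the dicts, and without record_path json_normalize leaves the sub_data list unexpanded in a single column.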
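If you do want json_normalize, one variant that does work (a sketch, again assuming the records come as a list of dicts) is to pass record_path='sub_data' so each {'key', 'value'} dict becomes its own row, carry the parent date along with meta, and then pivot the long table so each key becomes a column. pivot sorts the new columns as strings (day15 before day3), so the sketch restores the original key order with unique(), which keeps first-appearance order.

```python
import pandas as pd

dataset = [
    {'date': '2021-12-30 17:31:05.865139',
     'sub_data': [{'key': 'day0', 'value': 255}, {'key': 'day1', 'value': 1},
                  {'key': 'day3', 'value': 8}, {'key': 'day7', 'value': 2},
                  {'key': 'day15', 'value': 3}, {'key': 'day30', 'value': 5}]},
    {'date': '2021-12-31 17:31:05.907697',
     'sub_data': [{'key': 'day0', 'value': 222}, {'key': 'day1', 'value': 1},
                  {'key': 'day3', 'value': 0}, {'key': 'day7', 'value': 0},
                  {'key': 'day15', 'value': 1}, {'key': 'day30', 'value': 0}]},
]

# Long format: one row per sub_data entry, with the parent 'date' copied onto each row
long_df = pd.json_normalize(dataset, record_path='sub_data', meta=['date'])

# Wide format: one row per date, one column per distinct key
wide_df = long_df.pivot(index='date', columns='key', values='value').reset_index()
wide_df.columns.name = None

# pivot sorts the keys as strings; restore the order in which they first appeared
key_order = list(long_df['key'].unique())
wide_df = wide_df[['date'] + key_order]
print(wide_df)
```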
Hope I could help :)

Related

Converting nested dictionary into a Pandas DataFrame

I have the following dictionary
{'data':[{'action_values': [
{'action_type': 'offsite_conversion', 'value': '5479.8'},
{'action_type': 'omni_add_to_cart', 'value': '9217.55'},
{'action_type': 'omni_purchase', 'value': '5479.8'},
{'action_type': 'add_to_cart', 'value': '9217.55'}]}]}
I am trying to convert it so that each action_type becomes a pandas DataFrame column, with its value as the row. Something like:
offsite_conversion omni_add_to_cart omni_purchase add_to_cart
0 5479.8 9217.55 5479.8 9217.55
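Using pd.json_normalize():
import pandas as pd
# data is the dictionary shown above
df = pd.json_normalize(data=data["data"], record_path="action_values").transpose().reset_index(drop=True)
# Use the first row (the action_type names) as the column labels, then drop that row
df = df.rename(columns=df.iloc[0]).drop(df.index[0]).reset_index(drop=True)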
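Two alternatives that avoid the transpose-then-rename step, sketched on the dictionary from the question. The first builds one flat {action_type: value} dict per entry of data['data'] (so it would also give one row per entry if that list ever held more than one element); the second keeps json_normalize but uses set_index so the action_type values become the column labels directly. In both, the values stay strings until you convert them.

```python
import pandas as pd

data = {'data': [{'action_values': [
    {'action_type': 'offsite_conversion', 'value': '5479.8'},
    {'action_type': 'omni_add_to_cart', 'value': '9217.55'},
    {'action_type': 'omni_purchase', 'value': '5479.8'},
    {'action_type': 'add_to_cart', 'value': '9217.55'}]}]}

# 1) Plain Python: one {action_type: value} dict per entry in data['data']
rows = [{a['action_type']: a['value'] for a in entry['action_values']}
        for entry in data['data']]
df_rows = pd.DataFrame(rows)

# 2) json_normalize, then make action_type the index so transposing turns it into columns
df_norm = (pd.json_normalize(data['data'], record_path='action_values')
           .set_index('action_type')
           .T
           .reset_index(drop=True))
df_norm.columns.name = None

# The values are strings in the source; convert them if you need numbers
df_rows = df_rows.astype(float)
print(df_rows)
```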

Converting Dictionary Values to new Dictionary

I am pulling JSON data from an API and have a number of columns in my dataframe that contain dictionaries. These dictionaries are written so that the id and the value are two separate entries in the dictionary, like so:
{'id': 'AnnualUsage', 'value': '13071'}
Some of the rows for these columns contain only one dictionary entry, as shown above, but others can contain up to 7:
[{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
When I attempt to break this dictionary down into separate column attributes:
CTG_df2 = pd.concat([CTG_df['id'], CTG_df['applicationUDFs'].apply(pd.Series)], axis=1)
I end up with dataframe columns that each contain one of the dictionaries above, i.e.
{'id': 'AnnualUsageDE', 'value': '13071'}
Is there a way for me to convert my dictionary values into new key-value pairs? For instance I would like to convert from:
{'id': 'AnnualUsageDE', 'value': '13071'}
to
{'AnnualUsageDE': '13071'}
If this is possible I will then be able to create new columns from these attributes.
You can use a dict comprehension. From your list of dicts, build a new dict whose keys are the id of each element and whose values are the value of each element.
original = [{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
newdict = {subdict['id']: subdict['value'] for subdict in original}
print(newdict)
# {'AnnualUsage': '13071',
#  'TestId': 'Z13753',
#  'NumberOfMe': '3',
#  'Prem': '960002',
#  'ProjectID': '0039',
#  'Region': 'CHR',
#  'Tariff': 'Multiple',
#  'Number': '06860702'}
Alternatively, loop over the list and add each id/value pair to a new dict:
newdict = dict()
for x in original:
    newdict[x["id"]] = x["value"]

Convert Nested JSON into Dataframe

I have a nested JSON like below. I want to convert it into a pandas dataframe. As part of that, I also need to parse the weight value only. I don't need the unit.
I also want the number values converted from string to numeric.
Any help would be appreciated. I'm relatively new to Python. Thank you.
JSON Example:
{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'},
'gender': 'male'}
Sample output below:
id name weight gender
123 joe 100 male
use " from pandas.io.json import json_normalize ".
id name weight.number weight.unit gender
123 joe 100 lbs male
If you want to discard the weight unit, flatten the JSON yourself first:
temp = {'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}
temp['weight'] = temp['weight']['number']
then turn it into a one-row DataFrame (wrap the dict in a list; a dict of scalars on its own raises a ValueError):
pd.DataFrame([temp])
Something like this should do the trick:
import pandas as pd
json_data = [{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}]
# convert the data to a DataFrame
df = pd.DataFrame.from_records(json_data)
# convert id to an int
df['id'] = df['id'].apply(int)
# get the 'number' field of weight and convert it to an int
df['weight'] = df['weight'].apply(lambda x: int(x['number']))
df
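Putting both requirements together, one possible sketch: flatten with pd.json_normalize, drop weight.unit, rename weight.number to weight, and convert the numeric strings with pd.to_numeric. The final column selection only fixes the order to match the sample output.

```python
import pandas as pd

json_data = [{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'},
              'gender': 'male'}]

# Flatten the nested dict into dotted columns: weight.number and weight.unit
df = pd.json_normalize(json_data)

# Keep only the number, under the plain 'weight' name
df = df.drop(columns='weight.unit').rename(columns={'weight.number': 'weight'})

# Convert the numeric strings to numbers
df['id'] = pd.to_numeric(df['id'])
df['weight'] = pd.to_numeric(df['weight'])

# Match the column order of the sample output
df = df[['id', 'name', 'weight', 'gender']]
print(df)
```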

How to create a dict of dicts from pandas dataframe?

I have a dataframe df
id price date zipcode
u734 8923944 2017-01-05 AERIU87
uh72 9084582 2017-07-28 BJDHEU3
u029 299433 2017-09-31 038ZJKE
I want to create a dictionary with the following structure
{'id': xxx, 'data': {'price': xxx, 'date': xxx, 'zipcode': xxx}}
What I have done so far
ids = df['id']
prices = df['price']
dates = df['date']
zips = df['zipcode']
d = {'id':idx, 'data':{'price':p, 'date':d, 'zipcode':z} for idx,p,d,z in zip(ids,prices,dates,zips)}
>>> SyntaxError: invalid syntax
but I get the error above.
What would be the correct way to do this, using either
list comprehension
OR
pandas .to_dict()
Bonus points: what is the complexity of the algorithm, and is there a more efficient way to do this?
I'd suggest the list comprehension.
v = df.pop('id')
data = [
{'id' : i, 'data' : j}
for i, j in zip(v, df.to_dict(orient='records'))
]
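Or, more compactly:
data = [dict(id=i, data=j) for i, j in zip(df.pop('id'), df.to_dict(orient='records'))]
Note that if you pop id inside the expression, df.pop('id') has to be the first argument to zip: arguments are evaluated left to right, so the id column must be removed before df.to_dict() runs.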
print(data)
[{'id': 'u734',
  'data': {'price': 8923944,
           'date': '2017-01-05',
           'zipcode': 'AERIU87'}},
 {'id': 'uh72',
  'data': {'price': 9084582,
           'date': '2017-07-28',
           'zipcode': 'BJDHEU3'}},
 {'id': 'u029',
  'data': {'price': 299433,
           'date': '2017-09-31',
           'zipcode': '038ZJKE'}}]
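On the bonus question: both versions touch each cell roughly once, so the work grows linearly with rows × columns, and the remaining differences are mostly Python-level overhead. A variant that does not mutate df (pop removes the id column in place) sets id as the index and calls to_dict(orient='index'), which returns a dict keyed by id. This sketch assumes the ids are unique, since they become dict keys.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['u734', 'uh72', 'u029'],
    'price': [8923944, 9084582, 299433],
    # kept as strings: 2017-09-31 is not a valid calendar date
    'date': ['2017-01-05', '2017-07-28', '2017-09-31'],
    'zipcode': ['AERIU87', 'BJDHEU3', '038ZJKE'],
})

# {id: {'price': ..., 'date': ..., 'zipcode': ...}}; assumes the ids are unique
by_id = df.set_index('id').to_dict(orient='index')

# Reshape into the requested [{'id': ..., 'data': {...}}, ...]; df is left unchanged
data = [{'id': i, 'data': row} for i, row in by_id.items()]
print(data)
```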

python pandas convert a dictionary to a dataframe

I have two dictionaries, as follows. I can convert the first to a DataFrame, but the second gives an error. Why?
d = {'id': ['CS2_056'], 'cost': [2], 'name': ['Tap']}
df = pd.DataFrame(d)
print(df)
raw_data1 = {
'subject_id': 3,
'first_name': 4,
'last_name': 7}
raw_data1
dfz = pd.DataFrame(raw_data1)
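This happens because every value in raw_data1 is a scalar, so pandas cannot tell how many rows to create unless you pass an index (it raises "ValueError: If using all scalar values, you must pass an index"). In the first dictionary every value is a list, which determines the number of rows. So to solve your issue you would do: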
pd.DataFrame(raw_data1, index=[0])
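Two other ways to handle a dict of scalars, sketched on raw_data1 from the question: wrapping the dict in a list makes pandas read it as a single row (the same result as passing index=[0]), while DataFrame.from_dict with orient='index' turns the keys into the index and the values into one column ('value' is an arbitrary column name chosen here).

```python
import pandas as pd

raw_data1 = {'subject_id': 3, 'first_name': 4, 'last_name': 7}

# A list containing one dict is read as one row, so no explicit index is needed
df_row = pd.DataFrame([raw_data1])

# Keys become the index, values a single column ('value' is an arbitrary name)
df_col = pd.DataFrame.from_dict(raw_data1, orient='index', columns=['value'])
print(df_row)
print(df_col)
```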
