Convert nested dict in pandas dataframe

Convert nested dict in pandas dataframe - python

I have a nested dictionary with the following structure. I am trying to convert it to pandas dataframe, however I have problems to split the 'mappings' dictionary to have it in separate columns.
{'16':
{'label': 't1',
'prefLab': 'name',
'altLabel': ['test1', 'test3'],
'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]
},
'17':
{'label': 't2',
'prefLab': 'name2',
'broader': ['18'],
'altLabel': ['test2'],
'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]
}
}
ideal outcome would be a dataframe with the following structure.
label prefLab broader altLab ciID, map1, map2, map3 ...
16
17

Try with this: assuming your json format name is "data" then
train = pd.DataFrame.from_dict(data, orient='index')

Related

Using json_normalize() for missing keys Python Pandas DataFrame

I have this snapshot of my dataset
test={'data': [{'name': 'john',
'insights': {'data': [{'account_id': '123',
'test_id': '456',
'date_start': '2022-12-31',
'date_stop': '2023-01-29',
'impressions': '4070',
'spend': '36.14'}],
'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
'status': 'ACTIVE',
'id': '789'},
{'name': 'jack', 'status': 'PAUSED', 'id': '420'}]
}
I want to create a pandas dataframe where the columns are the name, date_start, date_stop, impressions, and spend.
When I tried json_normalize(), it raises an error because some of the keys are missing, when 'status':'PAUSED'. Is there a way to remove the values when the keys are missing from the list or another way of using json_normalize()? I tried errors='ignore' but it doesnt work as well.

Pandas - Extracting data from Series

I am trying to extract a seat of data from a column that is of type pandas.core.series.Series.
I tried
df['col1'] = df['details'].astype(str).str.findall(r'name\=(.*?),')
but the above returns null
Given below is how the data looks like in column df['details']
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
Trying to extract value corresponding to name field
Expected output : Name1

try this: simple, change according to your need.
import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
print(df['name'][0])
#or if DataFrame inside a column itself
df['details'][0]['name']
NOTE: as you mentioned details is one of the dataset that you have in the existing dataset

import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
#Name column
print(df.name)
#Find specific values in Series
indeces = df.name.str.find("Name") #Returns indeces of such values
df.iloc[index] # Returns all columns that fields name contain "Name"
df.name.iloc[index] # Returns all values from column name, which contain "Name"
Hope, this example will help you.
EDIT:
Your data frame has column 'details', which contain a dict {'id':101, ...}
>>> df['details']
0 {'id': 101, 'name': 'Name1', 'state': 'active'...
And you want to get value from field 'name', so just try:
>>> df['details'][0]['name']
'Name1'

The structure in your series is a dictionary.
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
You can just point to the element 'name' from that dict with the following command
df['details'][0]['name']
If the name could be different you can get the list of the keys in the dictionary and apply your regex on that list to get your field's name.
Hope that it can help you.

Convert Nested JSON into Dataframe

I have a nested JSON like below. I want to convert it into a pandas dataframe. As part of that, I also need to parse the weight value only. I don't need the unit.
I also want the number values converted from string to numeric.
Any help would be appreciated. I'm relatively new to python. Thank you.
JSON Example:
{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'},
'gender': 'male'}
Sample output below:
id name weight gender
123 joe 100 male

use " from pandas.io.json import json_normalize ".
id name weight.number weight.unit gender
123 joe 100 lbs male

if you want to discard the weight unit, just flatten the json:
temp = {'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}
temp['weight'] = temp['weight']['number']
then turn it into a dataframe:
pd.DataFrame(temp)

Something like this should do the trick:
json_data = [{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}]
# convert the data to a DataFrame
df = pd.DataFrame.from_records(json_data)
# conver id to an int
df['id'] = df['id'].apply(int)
# get the 'number' field of weight and convert it to an int
df['weight'] = df['weight'].apply(lambda x: int(x['number']))
df

Python jsonpath Filter Expression

Background:
I have the following example data structure in JSON:
{'sensor' : [
{'assertions_enabled': 'ucr+',
'deassertions_enabled': 'ucr+',
'entity_id': '7.0',
'lower_critical': 'na',
'lower_non_critical': 'na',
'lower_non_recoverable': 'na',
'reading_type': 'analog',
'sensor_id': 'SR5680 TEMP (0x5d)',
'sensor_reading': {'confidence_interval': '0.500',
'units': 'degrees C',
'value': '42'},
'sensor_type': 'Temperature',
'status': 'ok',
'upper_critical': '59.000',
'upper_non_critical': 'na',
'upper_non_recoverable': 'na'}
]}
The sensor list will actually contain many of these dicts containing sensor info.
Problem:
I'm trying to query the list using jsonpath to return me a subset of sensor dicts that have sensor_type=='Temperature' but I'm getting 'False' returned (no match). Here's my jsonpath expression:
results = jsonpath.jsonpath(ipmi_node, "$.sensor[?(#.['sensor_type']=='Temperature')]")
When I remove the filter expression and just use "$.sensor.*" I get a list of all sensors, so I'm sure the problem is in the filter expression.
I've scanned multiple sites/posts for examples and I can't seem to find anything specific to Python (Javascript and PHP seem to be more prominent). Could anyone offer some guidance please?

The following expression does what you need (notice how the attribute is specified):
jsonpath.jsonpath(impi_node, "$.sensor[?(#.sensor_type=='Temperature')]")

I am using jsonpath-ng which seems to be active (as of 23.11.20) and I provide solution based on to Pedro's jsonpath expression:
data = {
'sensor' : [
{'sensor_type': 'Temperature', 'id': '1'},
{'sensor_type': 'Humidity' , 'id': '2'},
{'sensor_type': 'Temperature', 'id': '3'},
{'sensor_type': 'Density' , 'id': '4'}
]}
from jsonpath_ng.ext import parser
for match in parser.parse("$.sensor[?(#.sensor_type=='Temperature')]").find(data):
print(match.value)
Output:
{'sensor_type': 'Temperature', 'id': '1'}
{'sensor_type': 'Temperature', 'id': '3'}
NOTE: besides basic documentation provided on project's homepage I found additional information in tests.

Numeric sort of list of dictionary objects

I am very new to python programming and have yet to buy a textbook on the matter (I am buying one from the store or Amazon today). In the meantime, can you help me with the following problem I have encountered?
I have an list of dictionary objects like this:
stock = [
{ 'date': '2012', 'amount': '1.45', 'type': 'one'},
{ 'date': '2012', 'amount': '1.4', 'type': 'two'},
{ 'date': '2011', 'amount': '1.35', 'type': 'three'},
{ 'date': '2012', 'amount': '1.35', 'type': 'four'}
]
I would like to sort the list by the amount date column and then by the amount column so that the sorted list looks like this:
stock = [
{ 'date': '2011', 'amount': '1.35', 'type': 'three'},
{ 'date': '2012', 'amount': '1.35', 'type': 'four'},
{ 'date': '2012', 'amount': '1.4', 'type': 'two'},
{ 'date': '2012', 'amount': '1.45', 'type': 'one'}
]
I now think I need to use sorted() but as a beginner I am having difficulties understanding to concepts I see.
I tried this:
from operator import itemgetter
all_amounts = itemgetter("amount")
stock.sort(key = all_amounts)
but this resulted in an list that was sorted alphanumerically rather than numerically.
Can someone please tell me how to achieve this seemingly simple sort? Thank-you!

Your sorting condition is too complicated for an operator.itemgetter. You will have to use a lambda function:
stock.sort(key=lambda x: (int(x['date']), float(x['amount'])))
or
all_amounts = lambda x: (int(x['date']), float(x['amount']))
stock.sort(key=all_amounts)

Start by converting your data into a proper format:
stock = [
{ 'date': int(x['date']), 'amount': float(x['amount']), 'type': x['type']}
for x in stock
]
Now stock.sort(key=all_amounts) will return correct results.
As you appear to be new in programming, here's a word of general advice if I may:
Proper data structure is 90 percent of success. Do not try to work around broken data by writing more code. Create a structure adequate to your task and write as less code as possible.

You can also use the fact that python's sort is stable:
stock.sort(key=lambda x: int(x["amount"]))
stock.sort(key=lambda x: int(x["date"]))
Since the items with the same key keep their relative positions when sorting (they're never swapped), you can build up a complicated sort by sorting multiple times.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Convert nested dict in pandas dataframe - python

Try with this: assuming your json format name is "data" then train = pd.DataFrame.from_dict(data, orient='index')

Related

Using json_normalize() for missing keys Python Pandas DataFrame

Pandas - Extracting data from Series

Convert Nested JSON into Dataframe

Python jsonpath Filter Expression

Numeric sort of list of dictionary objects

Categories

Resources