Convert Nested JSON into Dataframe - python

I have a nested JSON like below. I want to convert it into a pandas dataframe. As part of that, I also need to parse the weight value only. I don't need the unit.
I also want the number values converted from string to numeric.
Any help would be appreciated. I'm relatively new to python. Thank you.
JSON Example:
{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'},
'gender': 'male'}
Sample output below:
id name weight gender
123 joe 100 male

use " from pandas.io.json import json_normalize ".
id name weight.number weight.unit gender
123 joe 100 lbs male
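A minimal sketch along those lines, assuming pandas 1.0+ (pd.json_normalize) and the sample dict from the question; the drop/rename steps are just one way to get the requested layout:
import pandas as pd

data = {'id': '123', 'name': 'joe',
        'weight': {'number': '100', 'unit': 'lbs'},
        'gender': 'male'}

df = pd.json_normalize(data)                          # nested keys become weight.number / weight.unit
df = df.drop(columns='weight.unit')                   # the unit is not needed
df = df.rename(columns={'weight.number': 'weight'})
df[['id', 'weight']] = df[['id', 'weight']].apply(pd.to_numeric)  # strings -> numbers
print(df)
This reproduces the id / name / weight / gender sample output from the question.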

If you want to discard the weight unit, just flatten the JSON:
temp = {'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}
temp['weight'] = temp['weight']['number']
then turn it into a dataframe (wrap the dict in a list, since all the values are scalars):
pd.DataFrame([temp])

Something like this should do the trick:
import pandas as pd

json_data = [{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}]
# convert the data to a DataFrame
df = pd.DataFrame.from_records(json_data)
# convert id to an int
df['id'] = df['id'].apply(int)
# get the 'number' field of weight and convert it to an int
df['weight'] = df['weight'].apply(lambda x: int(x['number']))
df

Related

validation check - dictionary value types [duplicate]

After converting my csv to a list of dictionaries with pandas, a sample of it looks like this:
[{'Name': '1234', 'Age': 20},
{'Name': 'Alice', 'Age': 30.1},
{'Name': '5678', 'Age': 41.0},
{'Name': 'Bob 1', 'Age': 14},
{'Name': '!##$%', 'Age': 65}]
My goal is to validate whether the column values are strings. I'm trying to use the pandera or schema libraries for this, as the csv may contain a million rows. Therefore, I am trying to convert the dicts as follows.
[{'Name': 1234, 'Age': 20},
{'Name': 'Alice', 'Age': 30.1},
{'Name': 5678, 'Age': 41.0},
{'Name': 'Bob 1', 'Age': 14},
{'Name': '!##$%', 'Age': 65}]
After converting the csv data to a list of dicts (call it data), I use the following code to check whether Name is a string.
import pandas as pd
from schema import Schema, And, Use, Optional, SchemaError

schema = Schema([{'Name': str,
                  'Age': float}])
validated = schema.validate(data)
Is it possible?
For sure. You can use the int constructor to convert those strings to integers where possible.
for element in list_:
    try:
        element["Name"] = int(element["Name"])
    except ValueError:
        pass
A faster way would be to use str.isdigit, so you only convert values that are purely digits and skip the try/except block entirely:
for element in list_:
    if element["Name"].isdigit():  # otherwise there is nothing to convert
        element["Name"] = int(element["Name"])
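If you are going through the schema library anyway, here is a hedged sketch of an alternative (assuming the sample rows above): schema's Use can attempt the int conversion during validation itself, so no separate conversion pass is needed.
from schema import Schema, Or, Use, SchemaError

data = [{'Name': '1234', 'Age': 20},
        {'Name': 'Alice', 'Age': 30.1},
        {'Name': '5678', 'Age': 41.0},
        {'Name': 'Bob 1', 'Age': 14},
        {'Name': '!##$%', 'Age': 65}]

# Or() tries each alternative in turn: Use(int) attempts int(value),
# and if that raises, the plain str check is applied instead.
schema = Schema([{'Name': Or(Use(int), str),
                  'Age': Or(int, float)}])

try:
    validated = schema.validate(data)  # returns the rows, with Name converted where possible
    print(validated)
except SchemaError as e:
    print(e)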

Pandas - Extracting data from Series

I am trying to extract a set of data from a column that is of type pandas.core.series.Series.
I tried
df['col1'] = df['details'].astype(str).str.findall(r'name\=(.*?),')
but the above returns null
Given below is how the data looks like in column df['details']
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
Trying to extract value corresponding to name field
Expected output : Name1
Try this; it is simple, and you can change it according to your needs.
import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
print(df['name'][0])
# or, if the dict is stored inside a column of the DataFrame itself:
df['details'][0]['name']
NOTE: this assumes, as you mentioned, that details is one of the columns in your existing dataset.
import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
#Name column
print(df.name)
# Find specific values in the Series
positions = df.name.str.find("Name")  # position of "Name" in each value, -1 if not found
df[positions >= 0]       # all rows whose name contains "Name"
df.name[positions >= 0]  # just the name values that contain "Name"
Hope this example will help you.
EDIT:
Your data frame has a column 'details', which contains a dict {'id': 101, ...}
>>> df['details']
0 {'id': 101, 'name': 'Name1', 'state': 'active'...
And you want to get value from field 'name', so just try:
>>> df['details'][0]['name']
'Name1'
The structure in your series is a dictionary.
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
You can just point to the element 'name' from that dict with the following command
df['details'][0]['name']
If the key could be named differently, you can get the list of keys in the dictionary and apply your regex to that list to find the field's name.
Hope that helps.
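As a hedged extension of the answers above: if every cell in df['details'] really holds a list with a single dict, as in the sample row in the question, you can pull the name for every row with apply (the second row below is made up purely for illustration):
import pandas as pd

df = pd.DataFrame({'details': [
    [{'id': 101, 'name': 'Name1', 'state': 'active'}],
    [{'id': 102, 'name': 'Name2', 'state': 'closed'}],  # illustrative extra row
]})

# take the first dict out of each list and read its 'name' key
df['col1'] = df['details'].apply(lambda cell: cell[0]['name'])
print(df['col1'])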

Convert nested dict into pandas dataframe

I have a nested dictionary with the following structure. I am trying to convert it to a pandas dataframe; however, I have problems splitting the mappings ('map' / 'idMap') into separate columns.
{'16':
{'label': 't1',
'prefLab': 'name',
'altLabel': ['test1', 'test3'],
'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]
},
'17':
{'label': 't2',
'prefLab': 'name2',
'broader': ['18'],
'altLabel': ['test2'],
'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]
}
}
The ideal outcome would be a dataframe with the following structure.
label prefLab broader altLab ciID, map1, map2, map3 ...
16
17
Try this: assuming your nested dict is named data, then
train = pd.DataFrame.from_dict(data, orient='index')
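That gives one row per outer key ('16', '17') but leaves 'map' as a raw list. A hedged sketch for also expanding the inner idMap into ciID / map1 / map2 / map3 columns, assuming each 'map' list holds exactly one entry as in the example:
import pandas as pd

data = {'16': {'label': 't1', 'prefLab': 'name', 'altLabel': ['test1', 'test3'],
               'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]},
        '17': {'label': 't2', 'prefLab': 'name2', 'broader': ['18'], 'altLabel': ['test2'],
               'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]}}

df = pd.DataFrame.from_dict(data, orient='index')

# pull the idMap dict out of the single-element 'map' list and spread it into columns
id_map = df['map'].apply(lambda lst: lst[0]['idMap']).apply(pd.Series)
df = df.drop(columns='map').join(id_map)
print(df)
Columns that exist for only one key (broader, map1, map2, map3) come out as NaN for the other row.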

How to create a dict of dicts from pandas dataframe?

I have a dataframe df
id price date zipcode
u734 8923944 2017-01-05 AERIU87
uh72 9084582 2017-07-28 BJDHEU3
u029 299433 2017-09-31 038ZJKE
I want to create a dictionary with the following structure
{'id': xxx, 'data': {'price': xxx, 'date': xxx, 'zipcode': xxx}}
What I have done so far
ids = df['id']
prices = df['price']
dates = df['date']
zips = df['zipcode']
d = {'id':idx, 'data':{'price':p, 'date':d, 'zipcode':z} for idx,p,d,z in zip(ids,prices,dates,zips)}
>>> SyntaxError: invalid syntax
but I get the error above.
What would be the correct way to do this, using either
list comprehension
OR
pandas .to_dict()
bonus points: what is the complexity of the algorithm, and is there a more efficient way to do this?
I'd suggest the list comprehension.
v = df.pop('id')
data = [
    {'id': i, 'data': j}
    for i, j in zip(v, df.to_dict(orient='records'))
]
Or a compact version (spelling out orient='records'; the short 'r' abbreviation is deprecated in newer pandas),
data = [dict(id=i, data=j) for i, j in zip(df.pop('id'), df.to_dict(orient='records'))]
Note that, if you're popping id inside the expression, it has to be the first argument to zip.
print(data)
[{'data': {'date': '2017-09-31',
'price': 299433,
'zipcode': '038ZJKE'},
'id': 'u029'},
{'data': {'date': '2017-01-05',
'price': 8923944,
'zipcode': 'AERIU87'},
'id': 'u734'},
{'data': {'date': '2017-07-28',
'price': 9084582,
'zipcode': 'BJDHEU3'},
'id': 'uh72'}]
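On the bonus question: both versions are O(n) in the number of rows, since each row is visited exactly once. If a mapping keyed by id (rather than the exact list-of-dicts layout above) would also work for you, pandas can build it directly; a hedged sketch using the sample frame from the question:
import pandas as pd

df = pd.DataFrame({'id': ['u734', 'uh72', 'u029'],
                   'price': [8923944, 9084582, 299433],
                   'date': ['2017-01-05', '2017-07-28', '2017-09-31'],
                   'zipcode': ['AERIU87', 'BJDHEU3', '038ZJKE']})

# dict of dicts keyed by id: {'u734': {'price': ..., 'date': ..., 'zipcode': ...}, ...}
nested = df.set_index('id').to_dict(orient='index')
print(nested)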

Excel parsing in python and forming dictionary

I have an Excel file with a lot of data in it. I would like to get the info as a dictionary: for example, the first column of the Excel sheet would be the key and the rest of the columns would be the values.
Excel:
No name lastname hobby
1 jhon g fishing
2 mike a boxing
3 tom v sking
Is it possible to have it like this:
dict = {No:1, name:jhon, lastname:g, hobby:fishing},
dict = {No:2, name:mike, lastname:a, hobby:boxing},
I tried converting the Excel file to CSV and using csv.DictReader, but it did not work for me. Is there any other way?
Given the following CSV file:
No,name,lastname,hobby
1,jhon,g,fishing
2,mike,a,boxing
3,tom,v,sking
The following code appears to do what you're asking for:
In [1]: import csv
In [2]: for d in csv.DictReader(open('file.txt')): print(d)
...:
{'No': '1', 'name': 'jhon', 'lastname': 'g', 'hobby': 'fishing'}
{'No': '2', 'name': 'mike', 'lastname': 'a', 'hobby': 'boxing'}
{'No': '3', 'name': 'tom', 'lastname': 'v', 'hobby': 'sking'}
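If you would rather skip the CSV conversion entirely, pandas can read the Excel file straight into the same structure. A hedged sketch, assuming the workbook is named file.xlsx and an Excel engine such as openpyxl is installed:
import pandas as pd

df = pd.read_excel('file.xlsx')          # one column per header: No, name, lastname, hobby
records = df.to_dict(orient='records')   # one dict per row
for row in records:
    print(row)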
