I have jsons containing nested values that are sometimes None and the behavior has changed between pandas 0.22.0 and pandas 0.23.0.
In 0.22.0:
from pandas.io.json import json_normalize
my_json = {'event': {'name': 'Bob', 'id': '12345','id2': None},
'id': '12345', 'labels': []}
json_normalize(my_json)
gives:
event.id event.id2 event.name id labels
12345 None Bob 12345 []
which I want.
In 0.23.0:
from pandas.io.json import json_normalize
my_json = {'event': {'name': 'Bob', 'id': '12345','id2': None},
'id': '12345', 'labels': []}
json_normalize(my_json)
returns KeyError: 'id2'
Toggling ignore errors does nothing, and it's not really feasible to change the nested Nones to placeholder values. Anyone know how to achieve the prior behavior with the update?
Related
I have this snapshot of my dataset
test={'data': [{'name': 'john',
'insights': {'data': [{'account_id': '123',
'test_id': '456',
'date_start': '2022-12-31',
'date_stop': '2023-01-29',
'impressions': '4070',
'spend': '36.14'}],
'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
'status': 'ACTIVE',
'id': '789'},
{'name': 'jack', 'status': 'PAUSED', 'id': '420'}]
}
I want to create a pandas dataframe where the columns are the name, date_start, date_stop, impressions, and spend.
When I tried json_normalize(), it raises an error because some of the keys are missing, when 'status':'PAUSED'. Is there a way to remove the values when the keys are missing from the list or another way of using json_normalize()? I tried errors='ignore' but it doesnt work as well.
This question already has answers here:
How do I parse a string to a float or int?
(32 answers)
Closed 7 months ago.
After converting my csv to dictionary with pandas, a sample of the dictionary will look like this:
[{'Name': '1234', 'Age': 20},
{'Name': 'Alice', 'Age': 30.1},
{'Name': '5678', 'Age': 41.0},
{'Name': 'Bob 1', 'Age': 14},
{'Name': '!##$%', 'Age': 65}]
My goal is to do a validation check if the columns are in string. I'm trying to use pandera or schema libs to achieve it as the csv may contain a million rows. Therefore, I am trying to convert the dict to as follows.
[{'Name': 1234, 'Age': 20},
{'Name': 'Alice', 'Age': 30.1},
{'Name': 5678, 'Age': 41.0},
{'Name': 'Bob 1', 'Age': 14},
{'Name': '!##$%', 'Age': 65}]
After converting the csv data to dict, I use the following code to check if Name is string.
import pandas as pd
from schema import Schema, And, Use, Optional, SchemaError
schema = Schema([{'Name': str,
'Age': float}])
validated = schema.validate(dict)
Is it possible?
Is it possible?
For sure. You can use the int constructor to convert that strings to integers if possible.
for element in list_:
try:
element["Name"] = int(element["Name"])
except ValueError:
pass
A faster way for doing it would be using isdigit method of class str.
for element in list_:
if element["Name"].isdigit(): # Otherwise no need to convert
element["Name"] = int(element["Name"])
So that you don't have to enter that try/except block.
I am trying to extract a seat of data from a column that is of type pandas.core.series.Series.
I tried
df['col1'] = df['details'].astype(str).str.findall(r'name\=(.*?),')
but the above returns null
Given below is how the data looks like in column df['details']
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
Trying to extract value corresponding to name field
Expected output : Name1
try this: simple, change according to your need.
import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
print(df['name'][0])
#or if DataFrame inside a column itself
df['details'][0]['name']
NOTE: as you mentioned details is one of the dataset that you have in the existing dataset
import pandas as pd
df = pd.DataFrame([{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}])
#Name column
print(df.name)
#Find specific values in Series
indeces = df.name.str.find("Name") #Returns indeces of such values
df.iloc[index] # Returns all columns that fields name contain "Name"
df.name.iloc[index] # Returns all values from column name, which contain "Name"
Hope, this example will help you.
EDIT:
Your data frame has column 'details', which contain a dict {'id':101, ...}
>>> df['details']
0 {'id': 101, 'name': 'Name1', 'state': 'active'...
And you want to get value from field 'name', so just try:
>>> df['details'][0]['name']
'Name1'
The structure in your series is a dictionary.
[{'id': 101, 'name': 'Name1', 'state': 'active', 'boardId': 101, 'goal': '', 'startDate': '2019-01-01T12:16:20.296Z', 'endDate': '2019-02-01T11:16:00.000Z'}]
You can just point to the element 'name' from that dict with the following command
df['details'][0]['name']
If the name could be different you can get the list of the keys in the dictionary and apply your regex on that list to get your field's name.
Hope that it can help you.
I have a nested JSON like below. I want to convert it into a pandas dataframe. As part of that, I also need to parse the weight value only. I don't need the unit.
I also want the number values converted from string to numeric.
Any help would be appreciated. I'm relatively new to python. Thank you.
JSON Example:
{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'},
'gender': 'male'}
Sample output below:
id name weight gender
123 joe 100 male
use " from pandas.io.json import json_normalize ".
id name weight.number weight.unit gender
123 joe 100 lbs male
if you want to discard the weight unit, just flatten the json:
temp = {'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}
temp['weight'] = temp['weight']['number']
then turn it into a dataframe:
pd.DataFrame(temp)
Something like this should do the trick:
json_data = [{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}]
# convert the data to a DataFrame
df = pd.DataFrame.from_records(json_data)
# conver id to an int
df['id'] = df['id'].apply(int)
# get the 'number' field of weight and convert it to an int
df['weight'] = df['weight'].apply(lambda x: int(x['number']))
df
I have a dataframe df
id price date zipcode
u734 8923944 2017-01-05 AERIU87
uh72 9084582 2017-07-28 BJDHEU3
u029 299433 2017-09-31 038ZJKE
I want to create a dictionary with the following structure
{'id': xxx, 'data': {'price': xxx, 'date': xxx, 'zipcode': xxx}}
What I have done so far
ids = df['id']
prices = df['price']
dates = df['date']
zips = df['zipcode']
d = {'id':idx, 'data':{'price':p, 'date':d, 'zipcode':z} for idx,p,d,z in zip(ids,prices,dates,zips)}
>>> SyntaxError: invalid syntax
but I get the error above.
What would be the correct way to do this, using either
list comprehension
OR
pandas .to_dict()
bonus points: what is the complexity of the algorithm, and is there a more efficient way to do this?
I'd suggest the list comprehension.
v = df.pop('id')
data = [
{'id' : i, 'data' : j}
for i, j in zip(v, df.to_dict(orient='records'))
]
Or a compact version,
data = [dict(id=i, data=j) for i, j in zip(df.pop('id'), df.to_dict(orient='r'))]
Note that, if you're popping id inside the expression, it has to be the first argument to zip.
print(data)
[{'data': {'date': '2017-09-31',
'price': 299433,
'zipcode': '038ZJKE'},
'id': 'u029'},
{'data': {'date': '2017-01-05',
'price': 8923944,
'zipcode': 'AERIU87'},
'id': 'u734'},
{'data': {'date': '2017-07-28',
'price': 9084582,
'zipcode': 'BJDHEU3'},
'id': 'uh72'}]