I'm trying to create a nested JSON file from a pandas DataFrame. I found a similar question here, but when I applied the answer, the output wasn't what I wanted. I tried to adjust the code to get the desired result, but I haven't been able to.
Let me explain the problem first, then I will show you what I have done so far.
I have the following dataframe:
Region  staff_id  rate  dep
1       300047    77    4
1       300048    45    3
1       300049    32    7
2       299933    63    8
2       299938    86    7
Now I want the json object to look like this:
{'region': 1 :
{ 'Info': [
{'ID': 300047, 'Rate': 77, 'Dept': 4},
{'ID': 300048, 'Rate': 45, 'Dept': 3},
{'ID': 300049, 'Rate': 32, 'Dept': 7}
]
},
'region': 2 :
{ 'Info': [
{'ID': 299933, 'Rate': 63, 'Dept': 8},
{'ID': 299938, 'Rate': 86, 'Dept': 7}
]
}
}
So for every region there is a key called 'Info', and inside 'Info' are all the rows for that region.
I tried this code from the previous answer:
json_output = list(df.apply(lambda row: {"region": row["Region"],
                                         "Info": [{"ID": row["staff_id"],
                                                   "Rate": row["rate"],
                                                   "Dept": row["dep"]}]},
                            axis=1).values)
This gives me one object for every row in the DataFrame, rather than rows grouped by region.
Sorry because this seems repetitive, but I have been trying to change that answer to fit mine and I would really appreciate your help.
As mentioned by Nick ODell, you can loop over the groupby elements:
import json
import pandas as pd

df = pd.DataFrame({"REGION": [1, 1, 1, 2, 2],
                   "staff_id": [1, 2, 3, 4, 5],
                   "rate": [77, 45, 32, 63, 86],
                   "dep": [4, 3, 7, 8, 7]})

desired_op = []
for _, group in df.groupby("REGION"):
    empty_dict = {}  # this dict will store the data for one region
    # convert the grouped rows to a list of records (json.loads is safer than eval)
    lst_info = json.loads(group[["staff_id", "rate", "dep"]].to_json(orient='records'))
    empty_dict["REGION"] = group["REGION"].values[0]  # the region number
    empty_dict['info'] = lst_info
    desired_op.append(empty_dict)

print(desired_op)
[{'REGION': 1,
'info': [{'staff_id': 1, 'rate': 77, 'dep': 4},
{'staff_id': 2, 'rate': 45, 'dep': 3},
{'staff_id': 3, 'rate': 32, 'dep': 7}]},
{'REGION': 2,
'info': [{'staff_id': 4, 'rate': 63, 'dep': 8},
{'staff_id': 5, 'rate': 86, 'dep': 7}]}]
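The loop above can also be written as a single comprehension over the groupby, using `to_dict(orient='records')` instead of round-tripping through a JSON string. A sketch, assuming the same column names as above:

```python
import pandas as pd

df = pd.DataFrame({"REGION": [1, 1, 1, 2, 2],
                   "staff_id": [1, 2, 3, 4, 5],
                   "rate": [77, 45, 32, 63, 86],
                   "dep": [4, 3, 7, 8, 7]})

# one dict per region, with the grouped rows as a list of records
desired_op = [
    {"REGION": region,
     "info": group[["staff_id", "rate", "dep"]].to_dict(orient="records")}
    for region, group in df.groupby("REGION")
]
```

This produces the same shape of output, and `to_dict` avoids both the `eval` and the intermediate string entirely.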
I am new to Python and learning how to use a dictionary comprehension. I have a movie cast dictionary that I would like to filter on a specific value using the dictionary comprehension technique. I was able to get it to work, but for some reason I get empty dictionaries added as well when the condition is not met. Why does it do this? And how can I ensure these are not included?
movie_cast = [{'id': 90633,'name': 'Gal Gadot','cast_id': 0, 'order': 0},
{'id': 62064, 'name': 'Chris Pine','cast_id': 15, 'order': 1},
{'id': 41091, 'name': 'Kristen Wiig', 'cast_id': 12,'order': 2},
{'id': 41092, 'name': 'Pedro Pascal', 'cast_id': 13, 'order': 3},
{'id': 32, 'name': 'Robin Wright', 'cast_id': 78, 'order': 4}]
limit = 1
cast_limit = []
for dict in movie_cast:
    d = {key: value for (key, value) in dict.items() if dict['order'] < limit}
    cast_limit.append(d)
print(cast_limit)
current_result = [{'id': 90633,'name': 'Gal Gadot','cast_id': 0, 'order': 0},
{'id': 62064, 'name': 'Chris Pine','cast_id': 15, 'order': 1},{},{},{}]
desired_result = [{'id': 90633,'name': 'Gal Gadot','cast_id': 0, 'order': 0},
{'id': 62064, 'name': 'Chris Pine','cast_id': 15, 'order': 1}]
Try with this (you need a list comprehension, not a dict comprehension):
cast_limit = [dct for dct in movie_cast if dct['order'] < limit]
I.e., you need to filter out elements of the list, not elements of a dict.
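If you also wanted to trim each kept dict down to certain keys, the two comprehensions combine naturally. A sketch (the `('id', 'name')` key selection here is just an example):

```python
movie_cast = [{'id': 90633, 'name': 'Gal Gadot', 'cast_id': 0, 'order': 0},
              {'id': 62064, 'name': 'Chris Pine', 'cast_id': 15, 'order': 1},
              {'id': 41091, 'name': 'Kristen Wiig', 'cast_id': 12, 'order': 2}]

limit = 2
# the outer list comprehension filters rows; the inner dict comprehension trims keys
cast_limit = [{k: v for k, v in dct.items() if k in ('id', 'name')}
              for dct in movie_cast if dct['order'] < limit]
```

The filtering condition belongs on the list comprehension; the dict comprehension only decides which keys of a kept element survive.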
I'm having some trouble accessing a value that is inside an array that contains a dictionary and another array.
It looks like this:
[{'name': 'Alex',
'number_of_toys': [{'classification': 3, 'count': 383},
{'classification': 1, 'count': 29},
{'classification': 0, 'count': 61}],
'total_toys': 473},
{'name': 'John',
'number_of_toys': [{'classification': 3, 'count': 8461},
{'classification': 0, 'count': 3825},
{'classification': 1, 'count': 1319}],
'total_toys': 13605}]
I want to access the 'count' number for each 'classification'. For example, for 'name' Alex, if 'classification' is 3, then the code returns the 'count' of 383, and so on for the other classifications and names.
Thanks for your help!
I'm not sure exactly what your question asks, but if it's just a mapping exercise, this will get you on the right track.
def get_toys(personDict):
    person_toys = personDict.get('number_of_toys')
    return [(toys.get('classification'), toys.get('count')) for toys in person_toys]

def get_person_toys(database):
    return [(personDict.get('name'), get_toys(personDict)) for personDict in database]
The result is:
[('Alex', [(3, 383), (1, 29), (0, 61)]), ('John', [(3, 8461), (0, 3825), (1, 1319)])]
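If you specifically want a single count for one name/classification pair (the 'Alex' / 3 example from the question), a small lookup helper is another option. A sketch; `get_count` is a hypothetical name, shown with a trimmed copy of the data:

```python
def get_count(database, name, classification):
    """Return the toy count for the given person and classification, or None."""
    for person in database:
        if person['name'] == name:
            for toys in person['number_of_toys']:
                if toys['classification'] == classification:
                    return toys['count']
    return None

data = [{'name': 'Alex',
         'number_of_toys': [{'classification': 3, 'count': 383},
                            {'classification': 1, 'count': 29},
                            {'classification': 0, 'count': 61}],
         'total_toys': 473}]
```

Returning None for a missing pair keeps the caller's handling explicit.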
This isn't as elegant as the previous answer because it doesn't iterate over the values, but if you want to select specific elements, this is one way to do that:
data = [{'name': 'Alex',
'number_of_toys': [{'classification': 3, 'count': 383},
{'classification': 1, 'count': 29},
{'classification': 0, 'count': 61}],
'total_toys': 473},
{'name': 'John',
'number_of_toys': [{'classification': 3, 'count': 8461},
{'classification': 0, 'count': 3825},
{'classification': 1, 'count': 1319}],
'total_toys': 13605}]
import pandas as pd

df = pd.DataFrame(data)
print(df.loc[0]['name'])
print(df.loc[0]['number_of_toys'][0]['classification'])
print(df.loc[0]['number_of_toys'][0]['count'])
which gives:
Alex
3
383
I have json format like this
{
"2015": [
{
"DayofWeek": 4,
"Date": "2015-02-06 00:00:00",
"Year": 2015,
"y": 43.2,
"x": 10.397
}
],
"2016": [
{
"DayofWeek": 4,
"Date": "2016-02-06 00:00:00",
"Year": 2016,
"y": 43.2,
"x": 10.397,
"Minute": 0
}
],
"2017": [
{
"DayofWeek": 4,
"Date": "2017-02-06 00:00:00",
"Year": 2017,
"y": 43.2,
"x": 10.397,
"Minute": 0
}
]
}
I am reading the JSON file like this, and after reading it I convert it to a DataFrame:
import json
import pandas as pd

with open('sample.json') as json_data:
    data = json.load(json_data)
df = pd.DataFrame([data])
Now I want to filter the data based on certain input key/values, like DayofWeek and Year.
Example:
Case1:
if the input is DayofWeek=4, then I want to select all objects having DayofWeek=4.
Case2:
if the input is both DayofWeek=4 and Year=2017, then I want to select the 2017 data from the JSON having DayofWeek=4.
I have tried this code, but it is not working:
filteredVal=df['2017']
filter_v={'2015':{'DayofYear':4}}
pd.Series(filter_v)
The problem is that your JSON values contain lists of dicts:
data
>>
{'2015': [{'DayofWeek': 4,
'Date': '2015-02-06 00:00:00',
'Year': 2015,
'y': 43.2,
'x': 10.397}],
'2016': [{'DayofWeek': 4,
'Date': '2016-02-06 00:00:00',
'Year': 2016,
'y': 43.2,
'x': 10.397,
'Minute': 0}],
'2017': [{'DayofWeek': 4,
'Date': '2017-02-06 00:00:00',
'Year': 2017,
'y': 43.2,
'x': 10.397,
'Minute': 0}]}
...pandas cannot process this (as far as I know).
But if every list contains just 1 element, you can convert it:
data_dict = {d: data[d][0] for d in data}
data_dict
>>
{'2015': {'DayofWeek': 4,
'Date': '2015-02-06 00:00:00',
'Year': 2015,
'y': 43.2,
'x': 10.397},
'2016': {'DayofWeek': 4,
'Date': '2016-02-06 00:00:00',
'Year': 2016,
'y': 43.2,
'x': 10.397,
'Minute': 0},
'2017': {'DayofWeek': 4,
'Date': '2017-02-06 00:00:00',
'Year': 2017,
'y': 43.2,
'x': 10.397,
'Minute': 0}}
Now you can make a DataFrame of it, with the index orientation:
df=pd.DataFrame.from_dict(data_dict, orient='index')
df
And access your elements:
Case1:
df[df['DayofWeek']==4]
Case2:
df[(df['DayofWeek']==4) & (df['Year']==2017)]
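Put together as one runnable sketch (with the data trimmed to the same shape as above):

```python
import pandas as pd

data = {'2015': [{'DayofWeek': 4, 'Year': 2015, 'y': 43.2, 'x': 10.397}],
        '2016': [{'DayofWeek': 4, 'Year': 2016, 'y': 43.2, 'x': 10.397, 'Minute': 0}],
        '2017': [{'DayofWeek': 4, 'Year': 2017, 'y': 43.2, 'x': 10.397, 'Minute': 0}]}

# unwrap the one-element lists, then build the frame one row per key
data_dict = {d: data[d][0] for d in data}
df = pd.DataFrame.from_dict(data_dict, orient='index')

case1 = df[df['DayofWeek'] == 4]                           # all three rows
case2 = df[(df['DayofWeek'] == 4) & (df['Year'] == 2017)]  # just the 2017 row
```

Note that the '2015' row simply gets NaN in the Minute column, since that key is missing there.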
EDIT
If you have multiple elements inside the list, you can just create a list of all entries:
data_list = [v for d in data for v in data[d]]
df = pd.DataFrame(data_list)
Since you have a Year column, you probably don't even need the json-/dict-key, so I just skipped it. :-)
You can use list comprehension like this:
[data[x] for x in data if data[x][0]['DayofWeek'] == 4 and data[x][0]['Year'] == 2017]
This will give you a list of dictionary entries. If you want a filtered dictionary (to convert to a DataFrame), you can instead do something like this:
filtered_data = {}
filtered_data.update([(x, data[x]) for x in data if data[x][0]['DayofWeek'] == 4 and data[x][0]['Year'] == 2017])
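For instance, with a trimmed copy of the sample data, the same filtered dict can be built directly with a dict comprehension (a sketch; equivalent to the `update` call above):

```python
data = {'2015': [{'DayofWeek': 4, 'Year': 2015}],
        '2016': [{'DayofWeek': 4, 'Year': 2016}],
        '2017': [{'DayofWeek': 4, 'Year': 2017}]}

# keep only the keys whose (single) record matches both conditions
filtered_data = {x: data[x] for x in data
                 if data[x][0]['DayofWeek'] == 4 and data[x][0]['Year'] == 2017}
```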
One of the columns of my pandas dataframe looks like this
>> df
Item
0 [{"id":A,"value":20},{"id":B,"value":30}]
1 [{"id":A,"value":20},{"id":C,"value":50}]
2 [{"id":A,"value":20},{"id":B,"value":30},{"id":C,"value":40}]
I want to expand it as
A B C
0 20 30 NaN
1 20 NaN 50
2 20 30 40
I tried
dfx = pd.DataFrame()
for i in range(df.shape[0]):
    df1 = pd.DataFrame(df.Item[i]).T
    header = df1.iloc[0]
    df1 = df1[1:]
    df1 = df1.rename(columns=header)
    dfx = dfx.append(df1)
But this takes a lot of time as my data is huge. What is the best way to do this?
My original json data looks like this:
{
{
'_id': '5b1284e0b840a768f5545ef6',
'device': '0035sdf121',
'customerId': '38',
'variantId': '31',
'timeStamp': datetime.datetime(2018, 6, 2, 11, 50, 11),
'item': [{'id': A, 'value': 20},
         {'id': B, 'value': 30},
         {'id': C, 'value': 50}]
},
{
'_id': '5b1284e0b840a768f5545ef6',
'device': '0035sdf121',
'customerId': '38',
'variantId': '31',
'timeStamp': datetime.datetime(2018, 6, 2, 11, 50, 11),
'item': [{'id': A, 'value': 20},
         {'id': B, 'value': 30},
         {'id': C, 'value': 50}]
},
.............
}
I agree with @JeffH; you should really look at how you are constructing the DataFrame.
Assuming you are getting this from somewhere out of your control, you can convert it to your desired DataFrame with:
In []:
pd.DataFrame(df['Item'].apply(lambda r: {d['id']: d['value'] for d in r}).values.tolist())
Out[]:
A B C
0 20 30.0 NaN
1 20 NaN 50.0
2 20 30.0 40.0
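As a self-contained sketch of that one-liner (assuming the ids are strings like 'A', since the question shows them unquoted):

```python
import pandas as pd

df = pd.DataFrame({'Item': [
    [{"id": "A", "value": 20}, {"id": "B", "value": 30}],
    [{"id": "A", "value": 20}, {"id": "C", "value": 50}],
    [{"id": "A", "value": 20}, {"id": "B", "value": 30}, {"id": "C", "value": 40}],
]})

# turn each row's list of {'id', 'value'} dicts into one {id: value} mapping,
# then let the DataFrame constructor align the keys into columns
wide = pd.DataFrame(df['Item'].apply(lambda r: {d['id']: d['value'] for d in r}).tolist())
```

Missing ids become NaN automatically, which is why columns with any gap come out as floats.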
I've got dictionaries in a list:
fit_statstest = [{'activities-heart': [{'dateTime': '2018-02-01',
'value': {'customHeartRateZones': [],
'heartRateZones': [{'caloriesOut': 2119.9464,
'max': 96,
'min': 30,
'minutes': 1232,
'name': 'Out of Range'},
{'caloriesOut': 770.2719,
'max': 134,
'min': 96,
'minutes': 120,
'name': 'Fat Burn'},
{'caloriesOut': 0,
'max': 163,
'min': 134,
'minutes': 0,
'name': 'Cardio'},
{'caloriesOut': 0,
'max': 220,
'min': 163,
'minutes': 0,
'name': 'Peak'}],
'restingHeartRate': 64}}],
'activities-heart-intraday': {'dataset': [{'time': '00:00:00', 'value': 57},
{'time': '00:00:10', 'value': 56},
{'time': '00:00:20', 'value': 59},
{'time': '00:00:35', 'value': 59},
{'time': '02:54:10', 'value': 85},
{'time': '02:54:20', 'value': 71},
{'time': '02:54:30', 'value': 66},
...],'datasetInterval': 1,
'datasetType': 'second'}},
{'activities-heart': [{'dateTime': '2018-02-02',
'value': {'customHeartRateZones': [],
'heartRateZones': [{'caloriesOut': 2200.61802,
'max': 96,
'min': 30,
'minutes': 1273,
'name': 'Out of Range'},
{'caloriesOut': 891.9588,
'max': 134,
'min': 96,
'minutes': 133,
'name': 'Fat Burn'},
{'caloriesOut': 35.8266,
'max': 163,
'min': 134,
'minutes': 3,
'name': 'Cardio'},
{'caloriesOut': 0,
'max': 220,
'min': 163,
'minutes': 0,
'name': 'Peak'}],
'restingHeartRate': 67}}],
'activities-heart-intraday': {'dataset': [{'time': '00:00:10', 'value': 80},
{'time': '00:00:15', 'value': 79},
{'time': '00:00:20', 'value': 74},
{'time': '00:00:25', 'value': 72},
{'time': '03:04:10', 'value': 61},
{'time': '03:04:25', 'value': 61},
{'time': '03:04:40', 'value': 61},
...],
'datasetInterval': 1,
'datasetType': 'second'}}]
I'm trying to append the 'time': 'hh:mm:ss' and 'value': Int to a DataFrame.
This is how I did it for a single dictionary (which worked like a charm):
time_list = []
val_list = []
for i in fit_statsHR['activities-heart-intraday']['dataset']:
    val_list.append(i['value'])
    time_list.append(i['time'])
And this is how I tried doing it for the multi-level dictionary list:
time_test = []
val_test = []
for i in fit_statstest:
    val_test.append(i['activities-heart-intraday']['dataset']['value'])
    time_test.append(i['activities-heart-intraday']['dataset']['time'])

heartdftest = pd.DataFrame({'Heart Rate': val_test, 'Time': time_test})
I get this error: list indices must be integers or slices, not str, and I'm not quite sure how to go about solving it.
I tried using the .copy() method but had no joy with that either.
UPDATE:
@Phydeaux: Cheers for this! I tried this:
time_test = []
val_test = []
j = np.arange(0, len(fit_statstest))
for i in fit_statstest[j]['activities-heart-intraday']['dataset']:
    val_test.append(i['value'])
    time_test.append(i['time'])
I get this error now:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-184-f3e7484e1cfc> in <module>()
3 j = np.arange(0,len(fit_statstest))
4
----> 5 for i in fit_statstest[j]['activities-heart-intraday']['dataset']:
6 val_test.append(i['value'])
7 time_test.append(i['time'])
TypeError: only integer scalar arrays can be converted to a scalar index
only integer scalar arrays can be converted to a scalar index. Not sure if I'm on the right track though!
i['activities-heart-intraday']['dataset'] is a list containing multiple dictionaries, each of which has a 'value' attribute. You are trying to treat this list as if it were a dictionary, which is what causes the exception you are getting.
You had the right idea with your code for the single dictionary. You need to loop through the list and do something with each item.
Edit: you can't directly use np.arange to index a list like that, as the exception says. What were you expecting that to do?
Try this:
time_test = []
val_test = []

# use descriptive names for your loop variables that hint at what they represent
for day in fit_statstest:
    for entry in day['activities-heart-intraday']['dataset']:
        time_test.append(entry['time'])
        val_test.append(entry['value'])
Here is one solution via a single list comprehension:
import pandas as pd
time_values = [(d['time'], d['value'])
               for day in fit_statstest
               for d in day['activities-heart-intraday']['dataset']]
df = pd.DataFrame(time_values, columns=['time', 'value'])
Result
time value
0 00:00:00 57
1 00:00:10 56
2 00:00:20 59
3 00:00:35 59
4 02:54:10 85
5 02:54:20 71
6 02:54:30 66
7 00:00:10 80
8 00:00:15 79
9 00:00:20 74
10 00:00:25 72
11 03:04:10 61
12 03:04:25 61
13 03:04:40 61