Converting nested list into data frame using pandas - python

I have below list which stored in data
{'id': 255719,
'display_name': 'Off Broadway Apartments',
'access_right': {'status': 'OWNED', 'api_enabled': True},
'creation_time': '2021-04-26T15:53:29+00:00',
'status': {'value': 'OFFLINE', 'last_change': '2021-07-10T17:26:50+00:00'},
'licence': {'id': 74213,
'code': '23AF-15A8-0514-2E4B-04DE-5C19-A574-B20B',
'bound_mac_address': '00:11:32:C2:FE:6A',
'activation_time': '2021-04-26T15:53:29+00:00',
'type': 'SUBSCRIPTION'},
'timezone': 'America/Chicago',
'version': {'agent': '3.7.0-b001', 'package': '2.5.1-0022'},
'location': {'latitude': '41.4126', 'longitude': '-99.6345'}}
I would like to convert into data frame.can anyone advise?
I tried below code
df = pd.DataFrame(data)
but it's not coming properly as many nested lists. can anyone advise?

from pandas.io.json import json_normalize
# load json
json = {'id': 255719,
'display_name': 'Off Broadway Apartments',
'access_right': {'status': 'OWNED', 'api_enabled': True},
'creation_time': '2021-04-26T15:53:29+00:00',
'status': {'value': 'OFFLINE', 'last_change': '2021-07-10T17:26:50+00:00'},
'licence': {'id': 74213,
'code': '23AF-15A8-0514-2E4B-04DE-5C19-A574-B20B',
'bound_mac_address': '00:11:32:C2:FE:6A',
'activation_time': '2021-04-26T15:53:29+00:00',
'type': 'SUBSCRIPTION'},
'timezone': 'America/Chicago',
'version': {'agent': '3.7.0-b001', 'package': '2.5.1-0022'},
'location': {'latitude': '41.4126', 'longitude': '-99.6345'}}
Create a fuction to flat nested jsons:
def flatten_json(y):
out = {}
def flatten(x, name=''):
if type(x) is dict:
for a in x:
flatten(x[a], name + a + '_')
elif type(x) is list:
i = 0
for a in x:
flatten(a, name + str(i) + '_')
i += 1
else:
out[name[:-1]] = x
flatten(y)
return out
You can now use that function on your original json file:
flat = flatten_json(json)
df = json_normalize(flat)
Results:
id display_name ... location_latitude location_longitude
0 255719 Off Broadway Apartments ... 41.4126 -99.6345

Related

python according to the same value combining dictionary

i have a list of dict like this
[
{'id': 'A123',
'feature': {'name': 'jack', 'age' : '18' },
'create_time': '2022-5-17 10:29:47',
'is_fast': False},
{'id': 'A123',
'feature': {'gender': 'male'},
'create_time': '2022-5-17 10:29:47',
'is_fast': False},
{'id': 'A123',
'habit': {'name': 'read'},
'create_time': '2022-5-15 10:29:45',
'is_fast': False},
{'id': 'A456',
'feature': {'name': 'rose'},
'create_time': '2022-4-15 10:29:45',
'is_fast': False},
{'id': 'A456',
'habit': {'name': 'sport'},
'create_time': '2022-3-15 10:29:45',
'is_fast': False}
]
But I want to merge the same "id" values ​​together using something function
The desired output is as follows
[
{'id': 'A123',
'feature': {'name': 'jack', 'age' : '18' ,'gender': 'male'},
'habit': {'name': 'read'},
'create_time': '2022-5-19 10:29:47', #Get the latest time based on the same id
'is_fast': False},
{'id': 'A456',
'feature': {'name': 'rose'},
'habit': {'name': 'sport'},
'create_time': '2022-4-15 10:29:45',
'is_fast': False},
]
How can I merge the same "id" values ​​according to these dictionaries..
This should get you started... I put some inline notes to explain what the code is doing. You still need to implement a date time comparison.
def merge_dicts(lst):
final = {} # results
for row in lst: # iterate through list
if row['id'] not in final: # if current item id hasnt been seen
final[row['id']] = row # assign it to results with id as the key
else:
record = final[row['id']] # otherwise compare to data already stored
for k,v in row.items(): #iterate through dictionary items
if k not in record: # if key not in results
record[k] = v # add the key and value
continue
if record[k] == v: continue # if they are already equal move on
if isinstance(v, dict): # if its a dictionary
record[k].update(v) # update the dictionary
else: # must be date time sequence so do some datetime comparison
"""Do some date comparison and assign correct date"""
return [v for k,v in final.items()] # convert to list
print(merge_dicts(lst))
output:
[
{
'id': 'A123',
'feature': {'name': 'jack', 'age': '18', 'gender': 'male'},
'create_time': '2022-5-17 10:29:47',
'is_fast': False,
'habit': {'name': 'read'}
},
{
'id': 'A456',
'feature': {'name': 'rose'},
'create_time': '2022-4-15 10:29:45',
'is_fast': False,
'habit': {'name': 'sport'}
}
]
You can use the dict.setdefault method to initialize sub-dicts under keys that don't already exist to avoid cluttering up your code with conditional statements that test the existence of keys:
merged = {}
for d in lst:
s = merged.setdefault(d['id'], d)
for k, v in d.items():
if isinstance(v, dict):
s.setdefault(k, v).update(v)
elif v > s[k]: # the dates/times in the input follow alphabetical order
s[k] = v # later dates/times takes precedence
print(list(merged.values()))
Demo: https://replit.com/#blhsing/BlandCarelessPolygons#main.py

How to rename keys in a dictionary and make a dataframe of it?

I have a complex situation which I hope to solve and which might profit us all. I collected data from my API, added a pagination and inserted the complete data package in a tuple named q1 and finally I have made a dictionary named dict_1of that tuple which looks like this:
dict_1 = {100: {'ID': 100, 'DKSTGFase': None, 'DK': False, 'KM': None,
'Country: {'Name': GE', 'City': {'Name': 'Berlin'}},
'Type': {'Name': '219'}, 'DKObject': {'Name': '8555', 'Object': {'Name': 'Car'}},
'Order': {'OrderId': 101, 'CreatedOn': '2018-07-06T16:54:36.783+02:00',
'ModifiedOn': '2018-07-06T16:54:36.783+02:00',
'Name': Audi, 'Client': {‘1’ }}, 'DKComponent': {'Name': ‘John’}},
{200: {'ID': 200, 'DKSTGFase': None, 'DK': False, ' KM ': None,
'Country: {'Name': ES', 'City': {'Name': 'Madrid'}}, 'Type': {'Name': '220'},
'DKObject': {'Name': '8556', 'Object': {'Name': 'Car'}},
'Order': {'OrderId': 102, 'CreatedOn': '2018-07-06T16:54:36.783+02:00',
'ModifiedOn': '2018-07-06T16:54:36.783+02:00',
'Name': Mercedes, 'Client': {‘2’ }}, 'DKComponent': {'Name': ‘Sergio’}},
Please note that in the above dictionary I have just stated 2 records. The actual dictionary has 1400 records till it reaches ID 1500.
Now I want to 2 things:
I want to change some keys for all the records. key DK has to become DK1. Key Name in Country has to become Name1 and Name in Object has to become 'Name2'
The second thing I want is to make a dataFrame of the whole bunch of data. My expected outcome is:
This is my code:
q1 = response_2.json()
next_link = q1['#odata.nextLink']
q1 = [tuple(q1.values())]
while next_link:
new_response = requests.get(next_link, headers=headers, proxies=proxies)
new_data = new_response.json()
q1.append(tuple(new_data.values()))
next_link = new_data.get('#odata.nextLink', None)
dict_1 = {
record['ID']: record
for tup in q1
for record in tup[2]
}
#print(dict_1)
for x in dict_1.values():
x['DK1'] = x['DK']
x['Country']['Name1'] = x['Country']['Name']
x['Object']['Name2'] = x['Object']['Name']
df = pd.DataFrame(dict_1)
When i run this I receive the following Error:
Traceback (most recent call last):
File "c:\data\FF\Desktop\Python\PythongMySQL\Talky.py", line 57, in <module>
x['Country']['Name1'] = x['Country']['Name']
TypeError: 'NoneType' object is not subscriptable
working code
lists=[]
alldict=[{100: {'ID': 100, 'DKSTGFase': None, 'DK': False, 'KM': None,
'Country': {'Name': 'GE', 'City': {'Name': 'Berlin'}},
'Type': {'Name': '219'}, 'DKObject': {'Name': '8555', 'Object': {'Name': 'Car'}},
'Order': {'OrderId': 101, 'CreatedOn': '2018-07-06T16:54:36.783+02:00',
'ModifiedOn': '2018-07-06T16:54:36.783+02:00',
'Name': 'Audi', 'Client': {'1' }}, 'DKComponent': {'Name': 'John'}}}]
for eachdict in alldict:
key=list(eachdict.keys())[0]
eachdict[key]['DK1']=eachdict[key]['DK']
del eachdict[key]['DK']
eachdict[key]['Country']['Name1']=eachdict[key]['Country']['Name']
del eachdict[key]['Country']['Name']
eachdict[key]['DKObject']['Object']['Name2']=eachdict[key]['DKObject']['Object']['Name']
del eachdict[key]['DKObject']['Object']['Name']
lists.append([key, eachdict[key]['DK1'], eachdict[key]['KM'], eachdict[key]['Country']['Name1'],
eachdict[key]['Country']['City']['Name'], eachdict[key]['DKObject']['Object']['Name2'], eachdict[key]['Order']['Client']])
pd.DataFrame(lists, columns=[<columnNamesHere>])
Output:
{100: {'ID': 100,
'DKSTGFase': None,
'KM': None,
'Country': {'City': {'Name': 'Berlin'}, 'Name1': 'GE'},
'Type': {'Name': '219'},
'DKObject': {'Name': '8555', 'Object': {'Name2': 'Car'}},
'Order': {'OrderId': 101,
'CreatedOn': '2018-07-06T16:54:36.783+02:00',
'ModifiedOn': '2018-07-06T16:54:36.783+02:00',
'Name': 'Audi',
'Client': {'1'}},
'DKComponent': {'Name': 'John'},
'DK1': False}}

How to convert json into a pandas dataframe?

I'm trying to covert an api response from json to a dataframe in pandas. the problem I am having is that de data is nested in the json format and I am not getting the right columns in my dataframe.
The data is collect from a api with the following format:
{'tickets': [{'url': 'https...',
'id': 1,
'external_id': None,
'via': {'channel': 'web',
'source': {'from': {}, 'to': {}, 'rel': None}},
'created_at': '2020-05-01T04:16:33Z',
'updated_at': '2020-05-23T03:02:49Z',
'type': 'incident',
'subject': 'Subject',
'raw_subject': 'Raw subject',
'description': 'Hi, this is the description',
'priority': 'normal',
'status': 'closed',
'recipient': None,
'requester_id': 409467360874,
'submitter_id': 409126461453,
'assignee_id': 409126461453,
'organization_id': None,
'group_id': 360009916453,
'collaborator_ids': [],
'follower_ids': [],
'email_cc_ids': [],
'forum_topic_id': None,
'problem_id': None,
'has_incidents': False,
'is_public': True,
'due_at': None,
'tags': ['tag_1',
'tag_2',
'tag_3',
'tag_4'],
'custom_fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
{'id': 360041487874, 'value': 'value of the second custom field'},
{'id': 360041489414, 'value': 'value of the third custom field'},
{'id': 360040980053, 'value': 'correo_electrónico'},
{'id': 360040980373, 'value': 'suscribe_newsletter'},
{'id': 360042046173, 'value': None},
{'id': 360041028574, 'value': 'product'},
{'id': 360042103034, 'value': None}],
'satisfaction_rating': {'score': 'unoffered'},
'sharing_agreement_ids': [],
'comment_count': 2,
'fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
{'id': 360041487874, 'value': 'value of the second custom field'},
{'id': 360041489414, 'value': 'value of the third custom field'},
{'id': 360040980053, 'value': 'correo_electrónico'},
{'id': 360040980373, 'value': 'suscribe_newsletter'},
{'id': 360042046173, 'value': None},
{'id': 360041028574, 'value': 'product'},
{'id': 360042103034, 'value': None}],
'followup_ids': [],
'ticket_form_id': 360003608013,
'deleted_ticket_form_id': 360003608013,
'brand_id': 360004571673,
'satisfaction_probability': None,
'allow_channelback': False,
'allow_attachments': True},
What I already tried is the following: I have converted the JSON format into a dict as following:
x = response.json()
df = pd.DataFrame(x['tickets'])
But I'm struggling with the output. I don't know how to get a correct, ordered, normalized dataframe.
(I'm new in this :) )
Let's supose you get your request data by this code r = requests.get(url, auth)
Your data ins't clear yet, so let's get a dataframe of it data = pd.read_json(json.dumps(r.json, ensure_ascii = False))
But, probably you will get a dataframe with one single row.
When I faced a problem like this, I wrote this function to get the full data:
listParam = []
def listDict(entry):
if type(entry) is dict:
listParam.append(entry)
elif type(entry) is list:
for ent in entry:
listDict(ent)
Because your data looks like a dict because of {'tickets': ...} you will need to get the information like that:
listDict(data.iloc[0][0])
And then,
pd.DataFrame(listParam)
I can't show the results because you didn't post the complete data nor told where I can find the data to test, but this will probably work.
You have to convert the json to dictionary first and then convert the dictionary value for key 'tickets' into dataframe.
file = open('file.json').read()
ticketDictionary = json.loads(file)
df = pd.DataFrame(ticketDictionary['tickets'])
'file.json' contains your data here.
df now contains your dataFrame in this format.
For the lists within the response you can have separate dataframes if required:
for field in df['fields']:
df = pd.DataFrame(field)
It will give you this for lengths:
id value
0 360042034433 value of the first custom field
1 360041487874 value of the second custom field
2 360041489414 value of the third custom field
3 360040980053 correo_electrónico
4 360040980373 suscribe_newsletter
5 360042046173 None
6 360041028574 product
7 360042103034 None
This can be one way to structure as you haven't mentioned the exact expected format.

Flattening of JSON for analysis

Require a small help. I need to flatten this json so that i can use it for analysis.
Sample for the json is :
{'data': [{'tag': 'U128_CRC_2', 'timestamp': 1575234889002, 'value': 0.0}],
'metadata': {'event': 'alarm.reset',
'idx': '1372',
'timestamp': 1575234889.002701},
'productID': '41ae4b41-56be-4bf2-a6a8-7aee4d15bf54',
'timestamp': 1575234889008,
'topicIdx': '1'}
I ran the following code :
from pandas import json_normalize
with open('NewJson.json') as f:
d1 = json.load(f)
works_data = json_normalize(data=d1, record_path='data',
meta=['tag','value','timestamp'])
I get the below error for the same:
KeyError: "Try running with errors='ignore' as key 'tag' is not always present"
Can anybody please help
Your problem is that the 'data' key is a list and a dictionary. You have to remove the list manually:
Example:
from pandas.io.json import json_normalize
d1 = {'data': [{'tag': 'U128_CRC_2', 'timestamp': 1575234889002, 'value': 0.0}],
'metadata': {'event': 'alarm.reset',
'idx': '1372',
'timestamp': 1575234889.002701},
'productID': '41ae4b41-56be-4bf2-a6a8-7aee4d15bf54',
'timestamp': 1575234889008,
'topicIdx': '1'}
d1['data'] = d1.get('data')[0]
works_data = json_normalize(data=d1)
works_data
Output:
productID timestamp topicIdx data.tag data.timestamp data.value metadata.event metadata.idx metadata.timestamp
0 41ae4b41-56be-4bf2-a6a8-7aee4d15bf54 1575234889008 1 U128_CRC_2 1575234889002 0.0 alarm.reset 1372 1.575235e+09

Python: Iterating over list of dictionaries and output to a list

I am looping over the json object and output of the object is in the form of lists of dictionaries.
[{'Status': 'active', 'id': '0f1fb86da9c7ee380'}]
[{'Status': 'active', 'id': '0d6b330e4960c3382'}, {'Status': 'active', 'id': '033cfb634e595ccfa'}]
[{'Status': 'active', 'id': '0457f623cbb9f7c95'}]
[{'Status': 'active', 'id': '01b69eb6a3048f749'}, {'Status': 'active', 'id': '0f7ce44a9a5fc82f5'}, {'Status': 'active', 'id': '05417e161acf3ec5d'}]
[{'Status': 'active', 'id': '033cfb634e595ccfa'}, {'Status': 'active', 'id': '01eab32f9808acf19'}]
I have tried something like this so far but it prints in the form of string. I tried using list and appending it but it gives me a weird output. If I don't use list then it gives me the list of all in the form of strings.
Current output:
0f1fb86da9c7ee380
0d6b330e4960c3382
033cfb634e595ccfa
0457f623cbb9f7c95
01b69eb6a3048f749
0f7ce44a9a5fc82f5
05417e161acf3ec5d
0f373f123dc8221de
05417e161acf3ec5d
My code:
for i in data['DBI']:
t = 0
while t < len(i['Groups']):
print i['Groups'][t]['id']
t += 1`
Expected Output: I am looking for the output in the form of the list like this ['0f1fb86da9c7ee380'] ['0d6b330e4960c3382','033cfb634e595ccfa'] ['0457f623cbb9f7c95'] ['01b69eb6a3048f749','0f7ce44a9a5fc82f5',''05417e161acf3ec5d'] ['033cfb634e595ccfa','01eab32f9808acf19']
What you want is something like this:
data = [
[{'Status': 'active', 'id': '0f1fb86da9c7ee380'}],
[{'Status': 'active', 'id': '0d6b330e4960c3382'}, {'Status': 'active', 'id': '033cfb634e595ccfa'}],
[{'Status': 'active', 'id': '0457f623cbb9f7c95'}],
[{'Status': 'active', 'id': '01b69eb6a3048f749'}, {'Status': 'active', 'id': '0f7ce44a9a5fc82f5'}, {'Status': 'active', 'id': '05417e161acf3ec5d'}],
[{'Status': 'active', 'id': '033cfb634e595ccfa'}, {'Status': 'active', 'id': '01eab32f9808acf19'}],
]
new_data = []
for l in data:
current_ids = []
for d in l:
current_ids.append(d["id"])
new_data.append(current_ids)
new_data
output:
[['0f1fb86da9c7ee380'],
['0d6b330e4960c3382', '033cfb634e595ccfa'],
['0457f623cbb9f7c95'],
['01b69eb6a3048f749', '0f7ce44a9a5fc82f5', '05417e161acf3ec5d'],
['033cfb634e595ccfa', '01eab32f9808acf19']]
You can also use list comprehension to achieve this.
output = [[item["id"] for item in items] for items in data]
data =json.loads(v_string)
for id in data['DBI']:
datalist = id['Groups']
i=0
data_list=[] #<-- This was outside above mann for loop.
while i < len(datalist):
for dic in datalist[i]:
if 'active' not in datalist[i][dic]:
data_list.append(datalist[i][dic])
print(data_list)
i+=1

Categories

Resources