I have a deeply nested JSON file that I need to convert into a pandas DataFrame.
The JSON looks something like this:
{
"data": {
"page1": {
"last_name": "suraj",
"first_name": "singh",
"dob": "2020-06-02",
"gender": "Male",
"address1": "asdf",
"city": "asdf",
"state": "ID",
"Zip": "34324",
"phone": "2343243242",
"emailaddress": "suraj.singh#fugetroncorp.com",
"ethnicity": "adsf",
"url": " iVBORw0KGgoAAAANSUhEUgAAAVIAAABkCAYAAADUgbjrAAANS0lEQVR4Xu2dXeh1RRXGH++EICMwyIy3bt4LjSwkUAnMCwP7oA9SKqKSQkMxk64Ky4Lopu9AhQgqIqKEPsiMSLAgLEFCQcGoyEIJEso+yLoqfq8zMez3/I/7/Pee2fPxDBzOef9n75am1njXvc9aaNbP2GXIzAkbACBiBRQicsehu32wEjIARMAIykXoSGAEjYAQWImAiXQigbzcCRsAImEg9B4yAETACCxEwkS4E0LcbASNgBEykngNGwAgYgYUImEgXAujbjYARMAImUs+BkRB4jiReJ/Yo/TdJT4bvHx0JHOt6fARMpMfHznfWg8CrJJ0VSPJF4f1lQTzeIc+lDXJNX7G/+Lf4799J+qWkV0r6mCTk4QUpm5iXWqHS+02klRrGYp0iHzxHSDCSY/Qo4/tJSec0hNVjkn4bCPVrkn7akOwWdQ8CJlJPjy0RgCAvkITXGD1HvMt9LYbeeHd/l/TscPHUM0z7SD3SqXcaiRqPtmS7W9LlJQf0WPkQMJHmw9Y9n44AZHmpJMiSzxBp2iDJB4LHBjHymfcYEvPvEi31gON4UdboDce/T0kaj/PXks4Pof3Z4fN5kl4r6TXhxq9KurqEMh4jPwIm0vwYjzwCJIPH+cbwSonzDyG0hRzjKyZ5esEMj/MSSa8O7+j1K0k/kPQZSf/oRdHR9TCRjj4D1tcfTzMlzzhCJE7WBXn1mni5UNKng9JxmeIXku6VdKckPv9nfdjd45YImEi3RL+PsfEyY8iO50ko+6yg2s8CaRLG9kqcqHqlpOvCUsVfgv683yoJDEgyuXWMgIm0Y+NmUg3iZJ0zJojS5BBe55clPRQItLdQfQrpNZLekKx7/lDSLZJ+ExJhmUzgbmtDwERam0Xqkoc1zjQ5NN2TmYbrcZ2zLg3ySPNWSdeGpNm3JD3lxFEeoFvp1UTaiqXKyQlZvivJrKcjPxgSQ6xxjkScEQOWLm4M2IDBxyU9LOmJcubxSDUiYCKt0SrbyPTuQBLxRBBSsL4HYcYEUe+h+lHIs3zxubAGyhatz4fXNpbyqNUhYCKtziTFBYIkvpLs6YQ8SQ59LzlzXlyoSgbkR4U1TzxRljHABRId9QelErPUJ4aJtD6blJQILxQSpRG2f8DHFk9hQUINAgUfGiF87zsPSs677sYykXZn0tkK4Vmx3kfjhA1EMXojuQYm/KDwGe+cz6VOVI2Of7P6m0ibNd0iwalKhMfFeh9eF2H86I3wHe8cAiWMBxcXFRl9VszU30Q6E6iOLmNN9J6gz2Umi1NhPAQa98PeFLxzr4N2NOlzq2IizY1wXf1zfPH+INL1km6rS7zi0kTPnIG/H4qMOIwvbob2BzSRtm/DQzTgGCPHFglZ8UZHbelOBcJ4CNVrxKPOhhX0NpGuAGJDXRDSQyJkoSGP0Rrrn6wNk0CiUVyZzw7jR5sJK+trIl0Z0Mq7G5lISSbFTfVOJlU+UVsTz0TamsWWyRvXBEfySNM9oexSIIQHB3uhy+aS704QMJGONR3eE6ozkVBhjbR3MiFsJ5SPe0LZ0tRzOb+xZnNF2ppIKzJGAVEgFMJ7jj5SvehLBcbcYgjWgSFQ3r1XdgsLDDamiXQwg0v6YFLB/SpJd3QGAaE7NUKZ2w7jOzNureqYSGu1TF65YtKJyu03dHKyiRqhtydhPOugPpmUdx6594CAiXTcqUAVe550SWs5+ZTWCH1c0s3eEzrupN5KcxPpVshvP+4Vku5KxHhE0tsaKtDBOi8FRiBS1n7ZExqrNW2PriUYCgET6VDmPk1Zwl9OO/HAOhpZfP72hYphiZvqIU1XaKrYUCOJZiIdydq7dT0ZapGemXzN9qg3VbZVCNLkESgQPZ/JxrO9yUc7PYc3R8BEurkJqhCAMBlC4nn0aYOoavBOkYMwns31NGTypvoqpo6FAAETqedBikBaDSn+nQ3sFH
7eIgOeHutEHhda9nytEgETaZVm2VQoNrFTPX/qnUKk1OosUWYu3VAPGDwGBZJ3AepNp4YHPwoBE6nnxi4EWIMknOZ11uQCCBWipX7n2o1xKbKMJ0rzOujaCLu/LAiYSLPA2k2nrEniCZLkmTZCftZV2Xa0xvl1SJvqTLF5HbSbadS/IibS/m28hob7CJX+CfchVbzUQ0mVRBdeKO80V6pfw2LuoygCJtKicDc/GITK/k1eJ47Q5hBShXyjt+saoc1Pj3EVMJGOa/ulmuNBQqisZ+4j1biempbs497vejvTUhP4/loQMJHWYom25cBThVDJtlN5aVeL66nxbDzX2Att2+6WPiBgIvVUyIEAhBqJdbqNivH+JelTgz43Kgfe7nNjBEykGxtggOFfGoosv3miK0kpwv4aTk4NYAarmBMBE2lOdN339GTShyU9FdZWWQ5gj+ofJX00nFo6NONvhI1AFQiYSKswQ3dCQJIkk+KWJtZCIdX0VBTfvU7Se0OyimQUm/2pjVri9FR3oFuh7RAwkW6HfY8jT58bj46cj4dE9z1oj8347ACI66kQKQcBuLf3B/T1OA+G08lEOpzJsykMEXIyCTKNjfVPSHJuw0uFQGPmn1A/eqkO++ei6OuKI2AiLQ55dwOSoYdAYxiPgkvPyO8665/zjH93RrFCZREwkZbFu6fRIDsIdPp4Dyo1Ecqv5UHSP15tDPtjtp8z/g77e5pRDetiIm3YeBuKviuMR5ychUbwfBk3HimFRCmrR3JqLdLeEFIP3TICJtKWrVde9mmBkVQCHk1Sol4oOwLwUNOjqYT98eRUeVQ84vAImEiHnwKzACCM51EfJIKmbe1QfpZA4aJdYT+EimfssP8QJH3tIgRMpIvgG+JmvFDWQgmtp62WRyAjI16qw/4hpmR9SppI67NJTRLh8VErdNrIyvNdiVD+EDximb+0sj8y4qFu8cypQ2T3tQ0jYCJt2HiZRYeAdlVyIpSHRGs/fTQN+719KvOEGbl7E+nI1t+tO+uh90z2hcYrc2blc1mCJQk81PijwA8Amf7avOlc+rvfAgiYSAuA3NAQ04LLUXRCeSo17Uo2taJezPaTNKPFSv6uPtWKBSuW00RasXEKi8Z2ItZD0yOeiEDBETy6Xjy4eGoKfdnkzx7UeK7f+1ELT7pehjOR9mLJZXpAlLfsINEttzYt0+iZ74ZQ3yLp5lB9Kp7rZyeCE1PPjJ+vSBAwkXo64I1BotM2p2pTL+jFB/pdGhSKp6bwwl2BqhcrZ9TDRJoR3Aa6Tp/imYp7aNWmBlSdJeJRT0kFJ0iVR0W7GYHTEDCRjjspjtredHU4bjkuMk9rTuKNddT00dOE/Dc1sPVrdNsV199EWhzyzQdkbZDq9btOKplEd5sHrFgCiaE/a8rO9m8+lesRwERajy1KSHLUHlG2N+F9Ocmy3woQKITKs6bYi9rydrAS822YMUykw5hazw9rfK+YqLzreUrjoHK4pninHFigmUwPx6/LO0ykXZp1p1K3Srpu8k3P25tyWjYl07dL+mbOwdx3/QiYSOu30RoSXiHprklHI21vWgPDaR9XSvqIpH9K4jHTXhbJgXIjfZpIGzHUAjFfLuk7ktjaE9s7JX19QZ++9WkESDi9X9J9ki4yKOMiYCLt3/Zstk+TIldJ+okLH69i+JOSfhx+pF4v6c5VenUnzSFgIm3OZAcJ/AJJjyV3ODlyEHyzLv6ipBvC3lu2j7kNiICJtG+jp8c/75Z0ed/qbqJduv7s/0+bmGD7QW347W2QS4LzJD2cdH6ZEyJZoE4z+MY4C8T1d2oird9Gx5WQo4yfDTffJun643bk+/YiQBLv9+GKF/vR0GPOFhNpn3Y/V9K3JV0c1LOnlNfO/zWR5gW49t5NpLVb6HjyfUjSJ8Otd0giU++WB4ELJd1vIs0Dbiu9mkhbsdR8OfFC700uP0fSn+bf7isPRCAuoVAYmtDebUAETKT9GZ2QnlM3tHdI+k
Z/KlalEevQkOlfJT23KsksTDEETKTFoC420EOSzg8Z+5cUG3XcgShgEksS+v/ToPPAhu/T8OwfdYm3Mrb9s6SzJT0h6XllhvQotSFgIq3NIpanNQRINJFwYs+uI4DWrLeSvCbSlYB0N8Mi8LgkEnom0mGngGQiHdj4Vn0xAumpJsrosV/XbUAETKQDGt0qr4bA+yTdHnozka4Ga3sdmUjbs5kl3hYBjoTyEDye38STRmPjCC5Hcd0GRMBEOqDRrfJsBCBNXhcE0iSUTwtkx44oVfjC2b36wu4QMJF2Z9JmFeIJp0+uIH0kuhOS6JMXLf28a5h4H+9cm3qb+8T6eTiO+6MVZHcXjSJgIm3UcJ2JDXFxMmhfS0mWz5Eg4z3Tf+eEiCevPhDO2H8i50Duuw0ETKRt2Kl3KecQ6VwM/i3pzLkXz7yOBwVCnLxIKnGu3s0I/B8BE6knQy0IxNA+9SxjmL1PxkhqeKmp15qG9NPQnu+O6jv2Q78Q5xrLDbVgbDkyIWAizQSsuzUCRmAcBEyk49jamhoBI5AJARNpJmDdrREwAuMgYCIdx9bW1AgYgUwImEgzAetujYARGAeB/wEMT+10S9jf7wAAAABJRU5ErkJggg==",
"meds": [
[
"asdf"
]
],
"guardian": false,
"guardianName": "N/A",
"optout": false,
"currentDate": "06-30-2020",
"values": [
{
"value": "asdf"
}
]
}
How can I create a properly structured DataFrame from this, so that I can export it to a CSV file that is easier to read?
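One possible approach, sketched here with the standard library only: flatten every nested dict and list into a single row whose column names are dotted paths, then write that row as CSV. The `flatten` helper and the trimmed `sample` dict are made up for illustration and are not part of the original file. The same `row` could also be passed to `pandas.DataFrame([row])`. `pandas.json_normalize` flattens nested dicts in a similar way but leaves lists as they are.

```python
import csv
import io


def flatten(value, prefix=""):
    """Recursively flatten nested dicts/lists into one flat dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        # List positions become part of the column name (0, 1, 2, ...).
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = value  # drop the trailing dot
    return flat


# Trimmed, hypothetical stand-in for the JSON in the question.
sample = {
    "data": {
        "page1": {
            "last_name": "suraj",
            "meds": [["asdf"]],
            "guardian": False,
            "values": [{"value": "asdf"}],
        }
    }
}

row = flatten(sample)

# Write the single flattened row as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Empty dicts and lists produce no columns in this sketch, so they would need separate handling if they matter.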
I want to convert a CSV to JSON using pandas. I am a tester and want to send some events to Event Hub. For that, I want to maintain a CSV file and update my records/data through it. For reference, I created the CSV file by reading a JSON file with pandas. Now, when I convert the CSV back into JSON using pandas, the data does not come out in the original format. Can you please help?
Step 1: Converted JSON to CSV using pandas:
df = pd.read_json('C://Users//DAMALI//Desktop/test.json')
df.to_csv('C://Users//DAMALI//Desktop/test.csv')
Step 2: Now if I try to convert the CSV back to JSON, it's not converted into the same format as before:
df = pd.read_csv('C://Users//DAMALI//Desktop/test.csv')
df.to_json('C://Users//DAMALI//Desktop/test1.json')
Providing JSON below:
{
"body": {
"deviceId": "UDM",
"registrationDate": "12/11/2019",
"testRegistration": false,
"serialNumber": "25",
"articleNumber": "R91",
"deviceName": "UDM-test",
"locationId": "lc0",
"sapSoldToId": "1138474",
"crmDomainAccountId": "1234566",
"crmAccountDetails": {
"accountName": "ProjectX",
"accountId": "Instal",
"region": "AP"
},
"productLine": "UD",
"state": "registered",
"installerName": "ABC Rooms",
"installationAddress": {
"street": "Benelu",
"zipCode": "850",
"city": "Kortr",
"state": "OVL",
"country": "Belgi"
},
"customerDetails": {
"name": "John D",
"contactName": "John Doe",
"phone": "+32 999999999",
"email": "john.doe#test.com"
},
"wallConnect": {
"wallSize": "Width 5 x Height 4",
"wallOrientation": "LANDSCAPE",
"displayType": "BVD-D55M21H321A1C300",
"softwareVersion": "1.13.1.1.3"
},
"projector": {
"name": "UDX 40K-123456789",
"subType": "UDX 40K"
},
"featureLicense": ["UDX-aa00213a-5719-440e-a3b5", "UDX-aa00a-571"],
"cloudServiceLicense": ["EN04d5-4d2a-9131-875ad37c5883", "E15-4d2a-9131-875ad37c5154"],
"metadata": {
"cusQuesAns": [{
"ques": "End ucal industry",
"ans": "Hosity",
"key": "CUST_ANSWER"
},
{
"ques": "End user video wall application",
"ans": "Simulation & Virtual Reality",
"key": "CUSSECOND_ANSWER"
}
]
},
"frequency": "realtime",
"subDevices": [{
"deviceType": "DISPLAY",
"serialNumber": "68960",
"articleNumber": "R792",
"wallConnect": {
"displayFMWVersion": "3.0.0",
"displayVariant": "KVD21H331A1C300"
}
}]
},
"properties": {
"drs": {
"type": "salesforce-lm"
}
},
"systemProperties": {
"user-id": "data-cvice",
"message-id": "1b1012cc-9b18c192"
}
}
Try this to convert the CSV to JSON:
import pandas as pd
df = pd.read_csv(r'Fayzan-Bhatti\test.csv')
df.to_json(r'Fayzan-Bhatti\new_test.json')
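A likely reason the round trip loses the structure is that a CSV cell can only hold text, so nested objects get written out as strings and come back as strings. One way around this, as a rough sketch: keep the CSV flat, with dotted column names, and rebuild the nesting yourself when turning a row back into JSON. The `unflatten` helper and the column names below are assumptions for illustration. Lists such as `subDevices` are not handled and would need their own convention.

```python
import json


def unflatten(flat):
    """Rebuild a nested dict from a flat dict whose keys use dots as separators."""
    nested = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        node = nested
        # Walk (and create) the intermediate dicts, then set the leaf value.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# One CSV row as a dict, e.g. from csv.DictReader or df.to_dict("records").
# These column names are a hypothetical flattened form of the JSON in the question.
row = {
    "body.deviceId": "UDM",
    "body.crmAccountDetails.region": "AP",
    "properties.drs.type": "salesforce-lm",
}

event = unflatten(row)
print(json.dumps(event, indent=4))
```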
I want to copy values from one JSON document into another, but the keys of the two documents are different, and I don't know how to map the values between them.
Sample JSON 1 (A):
{
"contact_person":"Mahmut Kapur",
"contact_people": [
{
"email": "m#gmail.com",
"last_name": "Kapur"
}
],
"addresses": [
{
"city": "istanbul",
"country": "CA",
"first_name": "Mahmut",
"street1": "adres 1",
"zipcode": "34678",
"id": "5f61f72b8348230004f149fd"
}
],
"created_at": "2020-09-16T07:29:47.244-04:00",
"updated_at": "2020-09-16T07:32:50.567-04:00"
}
Sample JSON 2 (B):
The values in this example represent the keys in JSON A.
{
"Customer":{
"DisplayName":"contact_person",
"PrimaryEmailAddr":{
"Address":"contact_people/email"
},
"FamilyName":"contact_people/last_name",
"BillAddr":{
"City":"addresses/city",
"CountrySubDivisionCode":"addresses/country",
"Line1":"addresses/street1",
"PostalCode":"addresses/zipcode",
"Id":"addresses/id"
},
"GivenName":"addresses/first_name",
"MetaData":{
"CreateTime":"created_at",
"LastUpdatedTime":"updated_at"
}
}
}
The outcome needs to be:
{
"Customer":{
"DisplayName":"Mahmut Kapur",
"PrimaryEmailAddr":{
"Address":"m#gmail.com"
},
"FamilyName":"Kapur",
"BillAddr":{
"City":"istanbul",
"CountrySubDivisionCode":"CA",
"Line1":"adres 1",
"PostalCode":"34678",
"Id":"5f61f72b8348230004f149fd"
},
"GivenName":"Mahmut",
"MetaData":{
"CreateTime":"2020-09-16T07:29:47.244-04:00",
"LastUpdatedTime":"2020-09-16T07:32:50.567-04:00"
}
}
}
So the important thing here is to match the keys. I hope I was able to explain my problem.
This code can do the work for you. Someone may be able to make it shorter. It walks through the dicts in b down to the leaf level and replaces each leaf with the matching value from a, taking the first list element when the path contains a "/".
a={
"contact_person":"Mahmut Kapur",
"contact_people": [
{
"email": "m#gmail.com",
"last_name": "Kapur"
}
],
"addresses": [
{
"city": "istanbul",
"country": "CA",
"first_name": "Mahmut",
"street1": "adres 1",
"zipcode": "34678",
"id": "5f61f72b8348230004f149fd"
}
],
"created_at": "2020-09-16T07:29:47.244-04:00",
"updated_at": "2020-09-16T07:32:50.567-04:00",
}
b={
"Customer":{
"DisplayName":"contact_person",
"PrimaryEmailAddr":{
"Address":"contact_people/email"
},
"FamilyName":"contact_people/last_name",
"BillAddr":{
"City":"addresses/city",
"CountrySubDivisionCode":"addresses/country",
"Line1":"addresses/street1",
"PostalCode":"addresses/zipcode",
"Id":"addresses/id"
},
"GivenName":"addresses/first_name",
"MetaData":{
"CreateTime":"created_at",
"LastUpdatedTime":"updated_at"
}
}
}
import json  # needed for json.dumps below

for keys in b:
    if isinstance(b[keys], dict):
        for items in b[keys]:
            if isinstance(b[keys][items], dict):
                # Third level: e.g. b["Customer"]["BillAddr"]["City"]
                for leaf in b[keys][items]:
                    if "/" in b[keys][items][leaf]:
                        getter = b[keys][items][leaf].split("/")
                        b[keys][items][leaf] = a[getter[0]][0][getter[1]]
                    else:
                        b[keys][items][leaf] = a[b[keys][items][leaf]]
            else:
                # Second level: e.g. b["Customer"]["DisplayName"]
                if "/" in b[keys][items]:
                    getter = b[keys][items].split("/")
                    b[keys][items] = a[getter[0]][0][getter[1]]
                else:
                    b[keys][items] = a[b[keys][items]]
    else:
        # Top level holds a path string directly
        if "/" in b[keys]:
            getter = b[keys].split("/")
            b[keys] = a[getter[0]][0][getter[1]]
        else:
            b[keys] = a[b[keys]]
print(json.dumps(b, indent=4))
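If a shorter version is wanted, the same idea can probably be written recursively, so that it works for any nesting depth in b. This is only a sketch: `resolve` is a name made up here, the sample dicts are trimmed copies of the ones above, and it assumes (like the loops above) that every list in a holds a single record whose first element is the one to use.

```python
import json


def resolve(template, source):
    """Return a copy of template with each path string replaced by its value in source."""
    if isinstance(template, dict):
        return {key: resolve(child, source) for key, child in template.items()}
    value = source
    for part in template.split("/"):
        # A list in the source (e.g. "contact_people") is assumed to hold one
        # record, so take its first element before looking up the key.
        if isinstance(value, list):
            value = value[0]
        value = value[part]
    return value


# Trimmed copies of the a and b dicts from the answer above.
a = {
    "contact_person": "Mahmut Kapur",
    "contact_people": [{"email": "m#gmail.com", "last_name": "Kapur"}],
    "created_at": "2020-09-16T07:29:47.244-04:00",
}
b = {
    "Customer": {
        "DisplayName": "contact_person",
        "PrimaryEmailAddr": {"Address": "contact_people/email"},
        "MetaData": {"CreateTime": "created_at"},
    }
}

result = resolve(b, a)
print(json.dumps(result, indent=4))
```

Unlike the loop version, this builds a new dict and leaves b untouched, which makes it easier to reuse b as a template.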
I'm new to Python and I'm running it on Azure Databricks. I have a .json file; the important fields are shown here:
{
"school": [
{
"schoolid": "mr1",
"board": "cbse",
"principal": "akseal",
"schoolName": "dps",
"schoolCategory": "UNKNOWN",
"schoolType": "UNKNOWN",
"city": "mumbai",
"sixhour": true,
"weighting": 3,
"paymentMethods": [
"cash",
"cheque"
],
"contactDetails": [
{
"name": "picsa",
"type": "studentactivities",
"information": [
{
"type": "PHONE",
"detail": "+917597980"
}
]
}
],
"addressLocations": [
{
"locationType": "School",
"address": {
"countryCode": "IN",
"city": "Mumbai",
"zipCode": "400061",
"street": "Madh",
"buildingNumber": "80"
},
"Location": {
"latitude": 49.313885,
"longitude": 72.877426
},
I need to create a data frame with schoolName as one column and latitude and longitude as the other two columns. Can you please suggest how to do that?
You can use the json.load() function. Here's an example:
import json
with open('path_to_file/file.json') as f:
    data = json.load(f)
print(data)
Use this:
import json  # built-in
with open("filename.json", "r") as jsonFile:
    Data = json.load(jsonFile)
Data is now a dictionary holding the file's contents. For example:
for i in Data:
    # loops through the keys
    print(Data[i])  # prints the value for each key
For more on JSON:
https://docs.python.org/3/library/json.html
and python dictionaries:
https://www.programiz.com/python-programming/dictionary#:~:text=Python%20dictionary%20is%20an%20unordered,when%20the%20key%20is%20known.
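Once the file is loaded, the three fields asked for can be collected into one row per address location. This is a minimal sketch that assumes the structure shown in the question. The `text` string is a trimmed stand-in for the real file, and `rows` is just a name chosen here. Passing `rows` to `pandas.DataFrame(rows)` should then give the three columns.

```python
import json

# Trimmed stand-in for the file in the question (normally read with json.load).
text = """
{
  "school": [
    {
      "schoolName": "dps",
      "addressLocations": [
        {
          "locationType": "School",
          "Location": {"latitude": 49.313885, "longitude": 72.877426}
        }
      ]
    }
  ]
}
"""
data = json.loads(text)

rows = []
for school in data["school"]:
    # A school may have several address locations; emit one row for each.
    for entry in school.get("addressLocations", []):
        location = entry.get("Location", {})
        rows.append({
            "schoolName": school["schoolName"],
            "latitude": location.get("latitude"),
            "longitude": location.get("longitude"),
        })

print(rows)
```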
I've got a JSON structure that looks like the following
{
"PersonInformation": {
"PhysicalStatus": "",
"OpenDetainers": [],
"StartDate": "",
"FacilityLog": [],
"CustStatus": "",
"EndDate": ""
},
"IdentityList": [
{
"CreationDate": "01/01/1999",
"PersonNames": [
{
"Suffix": "",
"FirstName": "Johnny",
"LastName": "Appleseed",
"MiddleName": ""
},
{
"Suffix": "",
"FirstName": "Foo",
"LastName": "Bar",
"MiddleName": ""
}
],
"PlaceOfBirthList": [
{
"City": "Boston",
"State": "MA",
"CountryCode": ""
}
]
}
]
}
I can parse the outer array like so, but I'm having trouble figuring out how to loop through one of the child arrays, like "PersonNames"
So I can do this
myjson = json.loads(json_data)
print(myjson['PersonInformation']['PhysicalStatus'])
for identity_list in myjson['IdentityList']:
    print(identity_list['CreationDate'])
Which returns
OK
01/01/1999
as expected, but I don't know how to take it to the next level to traverse into and loop through "PersonNames"
Thanks for the assistance
You can iterate through the sub-list under the PersonNames key like this:
for identity in myjson['IdentityList']:
    for person in identity['PersonNames']:
        print(person['FirstName'], person['LastName'])
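If the names are needed as data rather than printed, the same two loops can be written as a list comprehension. The snippet below is a self-contained sketch: `json_data` is a trimmed copy of the structure from the question, and `names` is a name picked for illustration.

```python
import json

# Trimmed copy of the JSON structure from the question.
json_data = """
{
  "IdentityList": [
    {
      "CreationDate": "01/01/1999",
      "PersonNames": [
        {"FirstName": "Johnny", "LastName": "Appleseed"},
        {"FirstName": "Foo", "LastName": "Bar"}
      ]
    }
  ]
}
"""
myjson = json.loads(json_data)

# Outer loop over identities, inner loop over each identity's PersonNames.
names = [
    (person["FirstName"], person["LastName"])
    for identity in myjson["IdentityList"]
    for person in identity.get("PersonNames", [])
]
print(names)
```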
I have a JSON file and I want to read all the values.
data=""" {"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
{
"maps":[
{"id":"apple","iscategorical":"0"},
{"id":"ball","iscategorical":"0"}
],
"mask":{"id1":"aaaaa"},
"mask":{"id1":"bbb"},
"mask":{"id1":"cccccc"},
"om_points":"value",
"parameters":
{"id":"valore"}
}"""
out = json.loads(data)
How can I get all the values of:
firstname
lastname
mask.id1
map.id
Expected output:
[(firstname_values, lastname_values, mask.id1, map.id),
(firstname_values, lastname_values, mask.id1, map.id), ...]
Please help me.
First, there are two JSON objects in your data string, so you cannot call json.loads(data) on it directly. You can separate them with a character like ";", then split the string and call json.loads on each part. Use the following code:
import json
data=""" {
"employees": [{
"firstName": "John",
"lastName": "Doe"
}, {
"firstName": "Anna",
"lastName": "Smith"
}, {
"firstName": "Peter",
"lastName": "Jones"
}]
};{
"maps": [{
"id": "apple",
"iscategorical": "0"
}, {
"id": "ball",
"iscategorical": "0"
}],
"mask": {
"id1": "aaaaa"
},
"mask": {
"id1": "bbb"
},
"mask": {
"id1": "cccccc"
},
"om_points": "value",
"parameters": {
"id": "valore"
}
}"""
splitdata = data.split(';')
datatop = json.loads(splitdata[0])
databottom = json.loads(splitdata[1])
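Then you can access the required fields as follows. Note that databottom['mask'] holds only the last "mask" object, because json.loads keeps the last value when a key is repeated: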
print(datatop['employees'][0]['firstName'])
print(datatop['employees'][0]['lastName'])
print(databottom['mask']['id1'])
print(databottom['maps'][0]['id'])
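If editing the string to add ";" is not an option, the standard library can also parse back-to-back objects directly with `json.JSONDecoder.raw_decode`, and an `object_pairs_hook` can keep every repeated "mask" instead of only the last one. This is a sketch rather than a drop-in solution: `collect_duplicates` and `parse_concatenated` are names made up here, and the data string is a shortened version of the one in the question.

```python
import json


def collect_duplicates(pairs):
    """object_pairs_hook that gathers the values of a repeated key into a list."""
    result = {}
    repeated = set()
    for key, value in pairs:
        if key in repeated:
            result[key].append(value)
        elif key in result:
            result[key] = [result[key], value]
            repeated.add(key)
        else:
            result[key] = value
    return result


def parse_concatenated(text):
    """Parse several JSON documents that follow each other in one string."""
    decoder = json.JSONDecoder(object_pairs_hook=collect_duplicates)
    objects = []
    position = 0
    while position < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while position < len(text) and text[position].isspace():
            position += 1
        if position == len(text):
            break
        obj, position = decoder.raw_decode(text, position)
        objects.append(obj)
    return objects


# Shortened version of the data string from the question.
data = """ {"employees": [{"firstName": "John", "lastName": "Doe"}]}
{"maps": [{"id": "apple"}], "mask": {"id1": "aaaaa"}, "mask": {"id1": "bbb"}}"""

top, bottom = parse_concatenated(data)
mask_ids = [mask["id1"] for mask in bottom["mask"]]
print(top["employees"][0]["firstName"], mask_ids)
```

With this hook a key that appears once keeps its plain value, so only repeated keys such as "mask" turn into lists.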