I have a CSV file that I convert into JSON. However, in the JSON I need to format a specific column with curly brackets.
The field time has the value "DAY=20220524"; this has to be converted into {"DAY":20220524}.
JSON data:
{"ID":200,"Type":"ABC","time":"DAY=20220524"}
{"ID":400,"Type":"ABC","time":"NOON=20220524"}
Expected output:
{"ID":200,"Type":"ABC","time": {"DAY":20220524}}
{"ID":400,"Type":"ABC","time": {"NOON":20220524}}
I am not sure how to do this. Can anyone please help me with this?
With the following file.json:
[
{
"ID": 200,
"Type": "ABC",
"time": "DAY=20220524"
},
{
"ID": 400,
"Type": "ABC",
"time": "NOON=20220524"
}
]
Here is one way to do it:
import pandas as pd
# Read the array, wrap each "time" string in braces, and write the records back out.
pd.read_json("file.json").assign(
    time=lambda df_: df_["time"].apply(lambda x: f"{{{x}}}")
).to_json("new_file.json", orient="records")
In new_file.json:
[
{
"ID": 200,
"Type": "ABC",
"time": "{DAY=20220524}"
},
{
"ID": 400,
"Type": "ABC",
"time": "{NOON=20220524}"
}
]
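The approach above wraps each time value in braces but keeps it a string ("{DAY=20220524}"), not the nested object the question asks for. A minimal sketch that builds a real nested object with the standard json module, assuming every time value has the form KEY=INTEGER and the input is one JSON object per line as in the question (nest_time is a hypothetical helper name):

```python
import json

# The two sample lines from the question (JSON Lines: one object per line).
lines = [
    '{"ID":200,"Type":"ABC","time":"DAY=20220524"}',
    '{"ID":400,"Type":"ABC","time":"NOON=20220524"}',
]

def nest_time(record):
    """Hypothetical helper: turn "KEY=VALUE" in record["time"] into {"KEY": VALUE}."""
    key, _, value = record["time"].partition("=")
    # Assumes the part after "=" is always an integer such as 20220524.
    record["time"] = {key: int(value)}
    return record

records = [nest_time(json.loads(line)) for line in lines]
for rec in records:
    print(json.dumps(rec))
```

The first printed line is {"ID": 200, "Type": "ABC", "time": {"DAY": 20220524}}; the key before "=" (DAY, NOON, ...) is kept as-is. For a real file, replace the hard-coded list with the file's non-empty lines.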
I tried the solution shared in the link "Nested json to csv - generic approach".
It worked for Sample 1, but it gives only a single row for Sample 2.
Is there a way to write generic Python code that handles both Sample 1 and Sample 2?
Sample 1:
{
"Response": "Success",
"Message": "",
"HasWarning": false,
"Type": 100,
"RateLimit": {},
"Data": {
"Aggregated": false,
"TimeFrom": 1234567800,
"TimeTo": 1234567900,
"Data": [
{
"id": 11,
"symbol": "AAA",
"time": 1234567800,
"block_time": 123.282828282828,
"block_size": 1212121,
"current_supply": 10101010
},
{
"id": 12,
"symbol": "BBB",
"time": 1234567900,
"block_time": 234.696969696969,
"block_size": 1313131,
"current_supply": 20202020
},
]
}
}
Sample 2:
{
"Response": "Success",
"Message": "Summary succesfully returned!",
"Data": {
"11": {
"Id": "3333",
"Url": "test/11.png",
"value": "11",
"Name": "11 entries (11)"
},
"122": {
"Id": "5555555",
"Url": "test/122.png",
"Symbol": "122",
"Name": "122 cases (122)"
}
},
"Limit": {},
"HasWarning": False,
"Type": 50
}
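Try this; you need to install the flatten_json package first (pip install flatten_json):
import sys
import csv
import json
from flatten_json import flatten
# Load the JSON file given on the command line and flatten it into one dict.
with open(sys.argv[1]) as f:
    data = flatten(json.load(f))
# newline='' stops the csv module from writing blank lines on Windows.
with open('foo.csv', 'w', newline='') as f:
    out = csv.DictWriter(f, data.keys())
    out.writeheader()
    out.writerow(data)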
Output
> cat foo.csv
Response,Message,Data_11_Id,Data_11_Url,Data_11_value,Data_11_Name,Data_122_Id,Data_122_Url,Data_122_Symbol,Data_122_Name,Limit,HasWarning,Type
Success,Summary succesfully returned!,3333,test/11.png,11,11 entries (11),5555555,test/122.png,122,122 cases (122),{},False,50
Note: False is not valid JSON; you need to change it to false.
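flatten() turns the whole document into one dict, which is why you get a single CSV row. To get one row per record, you first have to locate the record collection: a list of dicts in Sample 1 and a dict of dicts in Sample 2. The sketch below does that with a hypothetical heuristic, find_records, which returns the first such collection met in a depth-first walk; it is not part of flatten_json, and it assumes that first collection is the one you want. The samples are trimmed copies written as Python dicts.

```python
import csv
import io

# Trimmed copies of Sample 1 and Sample 2 as Python dicts.
sample1 = {
    "Response": "Success", "HasWarning": False, "RateLimit": {},
    "Data": {
        "Aggregated": False, "TimeFrom": 1234567800,
        "Data": [
            {"id": 11, "symbol": "AAA", "time": 1234567800},
            {"id": 12, "symbol": "BBB", "time": 1234567900},
        ],
    },
}
sample2 = {
    "Response": "Success",
    "Data": {
        "11": {"Id": "3333", "Url": "test/11.png", "value": "11", "Name": "11 entries (11)"},
        "122": {"Id": "5555555", "Url": "test/122.png", "Symbol": "122", "Name": "122 cases (122)"},
    },
    "Limit": {}, "HasWarning": False, "Type": 50,
}

def find_records(obj):
    """Hypothetical heuristic: return the first list-of-dicts or dict-of-dicts
    found in a depth-first walk, as a list of row dicts."""
    if isinstance(obj, list) and obj and all(isinstance(x, dict) for x in obj):
        return obj
    if isinstance(obj, dict):
        # A non-empty dict whose values are all dicts (Sample 2's "Data") is
        # treated as rows; the outer key goes into a "key" column.
        if obj and all(isinstance(v, dict) for v in obj.values()):
            return [{"key": k, **v} for k, v in obj.items()]
        for value in obj.values():
            found = find_records(value)
            if found:
                return found
    return []

def records_to_csv(records):
    """Write rows to CSV text, using the union of keys in first-seen order."""
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

print(records_to_csv(find_records(sample1)))
print(records_to_csv(find_records(sample2)))
```

Taking the union of keys matters for Sample 2, where one record has value and the other has Symbol; missing cells are left empty. For a real file, json.load it first and pass the result to find_records; if individual rows still contain nested dicts, you could flatten() each row before writing.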
I have a very weird JSON file with a lot of nesting in it, and I need to convert it into a pandas DataFrame.
The JSON looks something like this:
{
"data": {
"page1": {
"last_name": "suraj",
"first_name": "singh",
"dob": "2020-06-02",
"gender": "Male",
"address1": "asdf",
"city": "asdf",
"state": "ID",
"Zip": "34324",
"phone": "2343243242",
"emailaddress": "suraj.singh#fugetroncorp.com",
"ethnicity": "adsf",
"url": " iVBORw0KGgoAAAANSUhEUgAAAVIAAABkCAYAAADUgbjrAAANS0lEQVR4Xu2dXeh1RRXGH++EICMwyIy3bt4LjSwkUAnMCwP7oA9SKqKSQkMxk64Ky4Lopu9AhQgqIqKEPsiMSLAgLEFCQcGoyEIJEso+yLoqfq8zMez3/I/7/Pee2fPxDBzOef9n75am1njXvc9aaNbP2GXIzAkbACBiBRQicsehu32wEjIARMAIykXoSGAEjYAQWImAiXQigbzcCRsAImEg9B4yAETACCxEwkS4E0LcbASNgBEykngNGwAgYgYUImEgXAujbjYARMAImUs+BkRB4jiReJ/Yo/TdJT4bvHx0JHOt6fARMpMfHznfWg8CrJJ0VSPJF4f1lQTzeIc+lDXJNX7G/+Lf4799J+qWkV0r6mCTk4QUpm5iXWqHS+02klRrGYp0iHzxHSDCSY/Qo4/tJSec0hNVjkn4bCPVrkn7akOwWdQ8CJlJPjy0RgCAvkITXGD1HvMt9LYbeeHd/l/TscPHUM0z7SD3SqXcaiRqPtmS7W9LlJQf0WPkQMJHmw9Y9n44AZHmpJMiSzxBp2iDJB4LHBjHymfcYEvPvEi31gON4UdboDce/T0kaj/PXks4Pof3Z4fN5kl4r6TXhxq9KurqEMh4jPwIm0vwYjzwCJIPH+cbwSonzDyG0hRzjKyZ5esEMj/MSSa8O7+j1K0k/kPQZSf/oRdHR9TCRjj4D1tcfTzMlzzhCJE7WBXn1mni5UNKng9JxmeIXku6VdKckPv9nfdjd45YImEi3RL+PsfEyY8iO50ko+6yg2s8CaRLG9kqcqHqlpOvCUsVfgv683yoJDEgyuXWMgIm0Y+NmUg3iZJ0zJojS5BBe55clPRQItLdQfQrpNZLekKx7/lDSLZJ+ExJhmUzgbmtDwERam0Xqkoc1zjQ5NN2TmYbrcZ2zLg3ySPNWSdeGpNm3JD3lxFEeoFvp1UTaiqXKyQlZvivJrKcjPxgSQ6xxjkScEQOWLm4M2IDBxyU9LOmJcubxSDUiYCKt0SrbyPTuQBLxRBBSsL4HYcYEUe+h+lHIs3zxubAGyhatz4fXNpbyqNUhYCKtziTFBYIkvpLs6YQ8SQ59LzlzXlyoSgbkR4U1TzxRljHABRId9QelErPUJ4aJtD6blJQILxQSpRG2f8DHFk9hQUINAgUfGiF87zsPSs677sYykXZn0tkK4Vmx3kfjhA1EMXojuQYm/KDwGe+cz6VOVI2Of7P6m0ibNd0iwalKhMfFeh9eF2H86I3wHe8cAiWMBxcXFRl9VszU30Q6E6iOLmNN9J6gz2Umi1NhPAQa98PeFLxzr4N2NOlzq2IizY1wXf1zfPH+INL1km6rS7zi0kTPnIG/H4qMOIwvbob2BzSRtm/DQzTgGCPHFglZ8UZHbelOBcJ4CNVrxKPOhhX0NpGuAGJDXRDSQyJkoSGP0Rrrn6wNk0CiUVyZzw7jR5sJK+trIl0Z0Mq7G5lISSbFTfVOJlU+UVsTz0TamsWWyRvXBEfySNM9oexSIIQHB3uhy+aS704QMJGONR3eE6ozkVBhjbR3MiFsJ5SPe0LZ0tRzOb+xZnNF2ppIKzJGAVEgFMJ7jj5SvehLBcbcYgjWgSFQ3r1XdgsLDDamiXQwg0v6YFLB/SpJd3QGAaE7NUKZ2w7jOzNureqYSGu1TF65YtKJyu03dHKyiRqhtydhPOugPpmUdx6594CAiXTcqUAVe550SWs5+ZTWCH1c0s3eEzrupN5KcxPpVshvP+4Vku5KxHhE0tsaKtDBOi8FRiBS1n7ZExqrNW2PriUYCgET6VDmPk1Zwl9OO/HAOhpZfP72hYphiZvqIU1XaKrYUCOJZiIdydq7dT0ZapGemXzN9qg3VbZVCNLkESgQPZ/JxrO9yUc7PYc3R8BEurkJqhCAMBlC4nn0aYOoavBOkYMwns31NGTypvoqpo6FAAETqedBikBaDSn+nQ3sFH
7eIgOeHutEHhda9nytEgETaZVm2VQoNrFTPX/qnUKk1OosUWYu3VAPGDwGBZJ3AepNp4YHPwoBE6nnxi4EWIMknOZ11uQCCBWipX7n2o1xKbKMJ0rzOujaCLu/LAiYSLPA2k2nrEniCZLkmTZCftZV2Xa0xvl1SJvqTLF5HbSbadS/IibS/m28hob7CJX+CfchVbzUQ0mVRBdeKO80V6pfw2LuoygCJtKicDc/GITK/k1eJ47Q5hBShXyjt+saoc1Pj3EVMJGOa/ulmuNBQqisZ+4j1biempbs497vejvTUhP4/loQMJHWYom25cBThVDJtlN5aVeL66nxbDzX2Att2+6WPiBgIvVUyIEAhBqJdbqNivH+JelTgz43Kgfe7nNjBEykGxtggOFfGoosv3miK0kpwv4aTk4NYAarmBMBE2lOdN339GTShyU9FdZWWQ5gj+ofJX00nFo6NONvhI1AFQiYSKswQ3dCQJIkk+KWJtZCIdX0VBTfvU7Se0OyimQUm/2pjVri9FR3oFuh7RAwkW6HfY8jT58bj46cj4dE9z1oj8347ACI66kQKQcBuLf3B/T1OA+G08lEOpzJsykMEXIyCTKNjfVPSHJuw0uFQGPmn1A/eqkO++ei6OuKI2AiLQ55dwOSoYdAYxiPgkvPyO8665/zjH93RrFCZREwkZbFu6fRIDsIdPp4Dyo1Ecqv5UHSP15tDPtjtp8z/g77e5pRDetiIm3YeBuKviuMR5ychUbwfBk3HimFRCmrR3JqLdLeEFIP3TICJtKWrVde9mmBkVQCHk1Sol4oOwLwUNOjqYT98eRUeVQ84vAImEiHnwKzACCM51EfJIKmbe1QfpZA4aJdYT+EimfssP8QJH3tIgRMpIvgG+JmvFDWQgmtp62WRyAjI16qw/4hpmR9SppI67NJTRLh8VErdNrIyvNdiVD+EDximb+0sj8y4qFu8cypQ2T3tQ0jYCJt2HiZRYeAdlVyIpSHRGs/fTQN+719KvOEGbl7E+nI1t+tO+uh90z2hcYrc2blc1mCJQk81PijwA8Amf7avOlc+rvfAgiYSAuA3NAQ04LLUXRCeSo17Uo2taJezPaTNKPFSv6uPtWKBSuW00RasXEKi8Z2ItZD0yOeiEDBETy6Xjy4eGoKfdnkzx7UeK7f+1ELT7pehjOR9mLJZXpAlLfsINEttzYt0+iZ74ZQ3yLp5lB9Kp7rZyeCE1PPjJ+vSBAwkXo64I1BotM2p2pTL+jFB/pdGhSKp6bwwl2BqhcrZ9TDRJoR3Aa6Tp/imYp7aNWmBlSdJeJRT0kFJ0iVR0W7GYHTEDCRjjspjtredHU4bjkuMk9rTuKNddT00dOE/Dc1sPVrdNsV199EWhzyzQdkbZDq9btOKplEd5sHrFgCiaE/a8rO9m8+lesRwERajy1KSHLUHlG2N+F9Ocmy3woQKITKs6bYi9rydrAS822YMUykw5hazw9rfK+YqLzreUrjoHK4pninHFigmUwPx6/LO0ykXZp1p1K3Srpu8k3P25tyWjYl07dL+mbOwdx3/QiYSOu30RoSXiHprklHI21vWgPDaR9XSvqIpH9K4jHTXhbJgXIjfZpIGzHUAjFfLuk7ktjaE9s7JX19QZ++9WkESDi9X9J9ki4yKOMiYCLt3/Zstk+TIldJ+okLH69i+JOSfhx+pF4v6c5VenUnzSFgIm3OZAcJ/AJJjyV3ODlyEHyzLv6ipBvC3lu2j7kNiICJtG+jp8c/75Z0ed/qbqJduv7s/0+bmGD7QW347W2QS4LzJD2cdH6ZEyJZoE4z+MY4C8T1d2oird9Gx5WQo4yfDTffJun643bk+/YiQBLv9+GKF/vR0GPOFhNpn3Y/V9K3JV0c1LOnlNfO/zWR5gW49t5NpLVb6HjyfUjSJ8Otd0giU++WB4ELJd1vIs0Dbiu9mkhbsdR8OfFC700uP0fSn+bf7isPRCAuoVAYmtDebUAETKT9GZ2QnlM3tHdI+k
Z/KlalEevQkOlfJT23KsksTDEETKTFoC420EOSzg8Z+5cUG3XcgShgEksS+v/ToPPAhu/T8OwfdYm3Mrb9s6SzJT0h6XllhvQotSFgIq3NIpanNQRINJFwYs+uI4DWrLeSvCbSlYB0N8Mi8LgkEnom0mGngGQiHdj4Vn0xAumpJsrosV/XbUAETKQDGt0qr4bA+yTdHnozka4Ga3sdmUjbs5kl3hYBjoTyEDye38STRmPjCC5Hcd0GRMBEOqDRrfJsBCBNXhcE0iSUTwtkx44oVfjC2b36wu4QMJF2Z9JmFeIJp0+uIH0kuhOS6JMXLf28a5h4H+9cm3qb+8T6eTiO+6MVZHcXjSJgIm3UcJ2JDXFxMmhfS0mWz5Eg4z3Tf+eEiCevPhDO2H8i50Duuw0ETKRt2Kl3KecQ6VwM/i3pzLkXz7yOBwVCnLxIKnGu3s0I/B8BE6knQy0IxNA+9SxjmL1PxkhqeKmp15qG9NPQnu+O6jv2Q78Q5xrLDbVgbDkyIWAizQSsuzUCRmAcBEyk49jamhoBI5AJARNpJmDdrREwAuMgYCIdx9bW1AgYgUwImEgzAetujYARGAeB/wEMT+10S9jf7wAAAABJRU5ErkJggg==",
"meds": [
[
"asdf"
]
],
"guardian": false,
"guardianName": "N/A",
"optout": false,
"currentDate": "06-30-2020",
"values": [
{
"value": "asdf"
}
]
}
}
}
How can I create a properly structured DataFrame from this so that I can export it to a CSV that is easier to read?
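Since the fields sit one level down, under data and then page1 (and presumably further pages such as page2, which is an assumption), one approach is to treat each page as a row and let pandas.json_normalize flatten it. A minimal sketch using a trimmed copy of the JSON (the long base64 url value is left out); joining the list fields meds and values with "; " is an arbitrary choice for readability in the CSV:

```python
import pandas as pd

# Trimmed copy of the question's JSON.
raw = {
    "data": {
        "page1": {
            "last_name": "suraj",
            "first_name": "singh",
            "dob": "2020-06-02",
            "meds": [["asdf"]],
            "guardian": False,
            "currentDate": "06-30-2020",
            "values": [{"value": "asdf"}],
        }
    }
}

# One row per page; the page key ("page1", ...) becomes its own column.
rows = [{"page": name, **fields} for name, fields in raw["data"].items()]
df = pd.json_normalize(rows)

# json_normalize flattens nested dicts but leaves lists untouched,
# so turn the list columns into readable strings before exporting.
df["meds"] = df["meds"].apply(lambda m: "; ".join(x for sub in m for x in sub))
df["values"] = df["values"].apply(lambda v: "; ".join(d["value"] for d in v))

csv_text = df.to_csv(index=False)  # or df.to_csv("your_file.csv", index=False)
print(csv_text)
```

If the real file has more pages, each one becomes another row with the same columns.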
I'm new to Python, and I'm running it on Azure Databricks. I have a .json file; here are its important fields:
{
"school": [
{
"schoolid": "mr1",
"board": "cbse",
"principal": "akseal",
"schoolName": "dps",
"schoolCategory": "UNKNOWN",
"schoolType": "UNKNOWN",
"city": "mumbai",
"sixhour": true,
"weighting": 3,
"paymentMethods": [
"cash",
"cheque"
],
"contactDetails": [
{
"name": "picsa",
"type": "studentactivities",
"information": [
{
"type": "PHONE",
"detail": "+917597980"
}
]
}
],
"addressLocations": [
{
"locationType": "School",
"address": {
"countryCode": "IN",
"city": "Mumbai",
"zipCode": "400061",
"street": "Madh",
"buildingNumber": "80"
},
"Location": {
"latitude": 49.313885,
"longitude": 72.877426
}
}
]
}
]
}
I need to create a DataFrame with schoolName as one column and latitude and longitude as two other columns. Can you please suggest how to do that?
You can use json.load(); here's an example:
import json
with open('path_to_file/file.json') as f:
    data = json.load(f)  # data is now a dict mirroring the JSON
print(data)
Use this:
import json  # standard library
with open("filename.json", 'r') as jsonFile:
    Data = json.load(jsonFile)  # file objects have no .load(); use json.load()
Data is now a dictionary of the file's contents, e.g.:
for i in Data:
    # loop through the top-level keys
    print(Data[i])  # print each value
For more on JSON:
https://docs.python.org/3/library/json.html
and on Python dictionaries:
https://www.programiz.com/python-programming/dictionary#:~:text=Python%20dictionary%20is%20an%20unordered,when%20the%20key%20is%20known.
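The answers above load the file; to get from there to the requested columns, pandas.json_normalize can produce one row per entry in addressLocations while carrying the parent schoolName along through meta. A minimal sketch, assuming the structure shown in the question (trimmed here) and that you want one row per address location:

```python
import pandas as pd

# Trimmed copy of the question's structure; in practice this is the
# dict returned by json.load() on your file.
data = {
    "school": [
        {
            "schoolid": "mr1",
            "schoolName": "dps",
            "city": "mumbai",
            "addressLocations": [
                {
                    "locationType": "School",
                    "address": {"countryCode": "IN", "city": "Mumbai"},
                    "Location": {"latitude": 49.313885, "longitude": 72.877426},
                }
            ],
        }
    ]
}

# One row per addressLocations entry; nested dicts become dotted columns
# such as "Location.latitude", and meta copies schoolName onto each row.
df = pd.json_normalize(data["school"], record_path="addressLocations", meta=["schoolName"])

df = df.rename(columns={"Location.latitude": "latitude", "Location.longitude": "longitude"})
df = df[["schoolName", "latitude", "longitude"]]
print(df)
```

A school with several address locations yields several rows with the same schoolName.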
I've been having some trouble getting JSON data into a pandas DataFrame in Python. This is what my JSON looks like:
{
"results": [
{
"events": [
{
"id": 132,
"name": "rob",
"city": "nyc",
"age": 55
},
{
"id": 324,
"name": "sam",
"city": "boston",
"age": 35,
"favColor": "green"
},
{
"id": 556,
"name": "paul",
"age": 23,
"favColor": "blue"
},
{
"id": 635,
"name": "kyle",
"city": "nyc"
}
]
}
],
"responseinfo": {
"inspectedCount": 295822,
"omittedCount": 0,
"matchCount": 119506,
"wallClockTime": 34
}
}
I'm only trying to create a DataFrame from the data inside the events node, with one column per key. However, some records are missing keys, so they would all have to be merged to make sure every key/column exists.
I tried looping through each node, populating a dictionary and then merging them, but I can't figure it out. Any ideas on how to tackle this?
Thanks!
Rob
You can use the json module from the standard library to parse the JSON data, then convert the list of dicts to a DataFrame; keys missing from a record simply become NaN in that row:
import json
import pandas as pd
json_data = """ {
"results": [
{ ..."""
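data = json.loads(json_data)
# The records live in the "events" list of the first "results" entry.
events = data["results"][0]["events"]
# Columns are the union of all keys; absent keys are filled with NaN.
df = pd.DataFrame(events)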
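A runnable version with the full sample from the question shows the missing-key behaviour: pandas takes the union of all keys as columns and fills the gaps with NaN, so no manual merging is needed. pd.json_normalize with record_path is an alternative that walks every entry of results, in case there can be more than one (an assumption; the sample has one).

```python
import json

import pandas as pd

json_data = """
{"results": [{"events": [
    {"id": 132, "name": "rob", "city": "nyc", "age": 55},
    {"id": 324, "name": "sam", "city": "boston", "age": 35, "favColor": "green"},
    {"id": 556, "name": "paul", "age": 23, "favColor": "blue"},
    {"id": 635, "name": "kyle", "city": "nyc"}
]}],
 "responseinfo": {"inspectedCount": 295822, "omittedCount": 0,
                  "matchCount": 119506, "wallClockTime": 34}}
"""
data = json.loads(json_data)

# Only the first "results" entry: union of keys, NaN where a key is missing.
df = pd.DataFrame(data["results"][0]["events"])

# Every "results" entry: follow results -> events and stack all records.
df_all = pd.json_normalize(data, record_path=["results", "events"])
print(df)
```

Note that a numeric column with a missing value (age for kyle) becomes float, because NaN is a float.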
Hi, I am trying to flatten a JSON file but am unable to. My JSON has three levels of repeated nesting; a sample is below:
"floors": [
{
"uuid": "8474",
"name": "some value",
"areas": [
{
"uuid": "xyz",
"**name**": "qwe",
"roomType": "Name1",
"templateUuid": "sdklfj",
"templateName": "asdf",
"templateVersion": "2.7.1",
"Required1": [
{
"**uuid**": "asdf",
"description": "asdf3",
"categoryName": "asdf",
"familyName": "asdf",
"productName": "asdf3",
"Required2": [
{
"**deviceId**": "asdf",
"**deviceUuid**": "asdf-asdf"
}
]
}
For each area, I want the corresponding values in the nested Required1, and for each Required1, the corresponding Required2 values (highlighted with **).
I have tried json_normalize as below, as well as other free libraries, but failed.
Attempts:
import json
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0
with open('Filename.json') as data_file:
    data_item = json.load(data_file)
Raw_Areas = json_normalize(data_item['floors'], 'areas', errors='ignore', record_prefix='Area_')
Result: no area values displayed; Required1 and Required2 are still nested.
K=json_normalize(data_item['floors'][0],record_path=['Required1','Required2'],errors='ignore',record_prefix='Try_')
from flatten_json import flatten_json
Flat_J1= pd.DataFrame([flatten_json(data_item)])
I'm looking to get values as below.
Expected columns (side by side):
floors.areas.Required1.Required2.deviceUuid
floors.areas.name
Please help: am I missing anything in my attempts? I am fairly new to loading JSON.
Assuming the following JSON: as several people pointed out, yours is incomplete, so I completed it based on the opening brackets you had.
dct = {"floors": [
{
"uuid": "8474",
"name": "some value",
"areas": [
{
"uuid": "xyz",
"name": "qwe",
"roomType": "Name1",
"templateUuid": "sdklfj",
"templateName": "asdf",
"templateVersion": "2.7.1",
"Required1": [
{
"uuid": "asdf",
"description": "asdf3",
"categoryName": "asdf",
"familyName": "asdf",
"productName": "asdf3",
"Required2": [
{
"deviceId": "asdf",
"deviceUuid": "asdf-asdf"
}
]
}
]
}
]
}
]}
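You can do the following (written for pandas 1.0 or later, where the function is pd.json_normalize; on pandas 0.25, the first version with DataFrame.explode, use pd.io.json.json_normalize instead):
import pandas as pd
# One row per Required1 entry, carrying the parent area's name as a meta column.
df = pd.json_normalize(
    dct, record_path=['floors', 'areas', 'Required1'], meta=[['floors', 'areas', 'name']])
# One row per Required2 entry; reset the index so the concat below aligns cleanly.
df = df.explode('Required2').reset_index(drop=True)
# Expand each Required2 dict into deviceId/deviceUuid columns.
df = pd.concat([df, df["Required2"].apply(pd.Series)], axis=1)
df = df[['floors.areas.name', 'uuid', 'deviceId', 'deviceUuid']]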
Which gives:
  floors.areas.name  uuid deviceId deviceUuid
0               qwe  asdf     asdf  asdf-asdf
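If you'd rather not depend on pandas-specific behaviour (json_normalize meta paths, explode), plain nested loops give the same pairing of each area name with each Required2 device. A sketch assuming the completed structure above, trimmed to the fields used; the dotted column names simply mirror the ones asked for in the question:

```python
# Same (completed) structure as dct above, trimmed to the fields used.
dct = {"floors": [{"uuid": "8474", "name": "some value", "areas": [
    {"uuid": "xyz", "name": "qwe", "Required1": [
        {"uuid": "asdf", "Required2": [
            {"deviceId": "asdf", "deviceUuid": "asdf-asdf"}
        ]}
    ]}
]}]}

rows = []
for floor in dct["floors"]:
    for area in floor.get("areas", []):
        for req1 in area.get("Required1", []):
            for req2 in req1.get("Required2", []):
                # One output row per Required2 entry, with its parents' values.
                rows.append({
                    "floors.areas.name": area.get("name"),
                    "floors.areas.Required1.uuid": req1.get("uuid"),
                    "floors.areas.Required1.Required2.deviceId": req2.get("deviceId"),
                    "floors.areas.Required1.Required2.deviceUuid": req2.get("deviceUuid"),
                })

for row in rows:
    print(row)
```

pd.DataFrame(rows) turns the result into a DataFrame. Because of the .get(..., []) defaults, an area without Required1 (or a Required1 without Required2) produces no row instead of raising KeyError; append a row with None values at that level if you need to keep such areas.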