Parsing JSON into pandas dataframe in Python 3 - python

I've been having some trouble getting JSON data into a pandas DataFrame in Python. This is what my JSON looks like:
{
"results": [
{
"events": [
{
"id": 132,
"name": "rob",
"city": "nyc",
"age": 55
},
{
"id": 324,
"name": "sam",
"city": "boston",
"age": 35,
"favColor": "green"
},
{
"id": 556,
"name": "paul",
"age": 23,
"favColor": "blue"
},
{
"id": 635,
"name": "kyle",
"city": "nyc"
}
]
}
],
"responseinfo": {
"inspectedCount": 295822,
"omittedCount": 0,
"matchCount": 119506,
"wallClockTime": 34
}
}
I'm only trying to create a DataFrame out of the data inside the events node, with the keys as columns. Some keys are missing from some events, however, so these would all have to be merged together to make sure every key/column exists.
I tried cycling through each node, populating a dictionary, and then merging these, but I can't figure it out. Any ideas how I can tackle this?
Thanks!
Rob

You can use the json module from the standard library to parse the JSON data, then convert the list of dicts to a DataFrame, like this:
import json
import pandas as pd
json_data = """ {
"results": [
{ ..."""
data = json.loads(json_data)
events = data["results"][0]["events"]
df = pd.DataFrame(events)
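pandas aligns the dicts by key, so events missing city or favColor simply get NaN in those columns. If the JSON sits in a file, an alternative sketch (data.json is a hypothetical file name) flattens the nested structure with pd.json_normalize:
import json
import pandas as pd

# data.json is a hypothetical file holding the JSON shown in the question
with open("data.json") as f:
    data = json.load(f)

# one row per event across every "events" list under "results";
# events missing a key (e.g. city, favColor) get NaN in that column
df = pd.json_normalize(data["results"], record_path="events")
print(df)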

Related

How to create nested Json data in pandas?

I have a CSV file which I convert into JSON. However, in the JSON, I need to format a specific column with curly brackets.
The field time has the value "DAY=20220524"; this has to be converted into {"DAY":20220524}.
json data:
{"ID":200,"Type":"ABC","time":"DAY=20220524"}
{"ID":400,"Type":"ABC","time":"NOON=20220524"}
expected output:
{"ID":200,"Type":"ABC","time": {"DAY":20170801}}
{"ID":400,"Type":"ABC","time": {"DAY":20170801}}
I am not sure how to do this. Can anyone please help me with this?
With the following file.json:
[
{
"ID": 200,
"Type": "ABC",
"time": "DAY=20220524"
},
{
"ID": 400,
"Type": "ABC",
"time": "NOON=20220524"
}
]
Here is one way to do it:
import pandas as pd
pd.read_json("file.json").assign(
time=lambda df_: df_["time"].apply(lambda x: f"{{{x}}}")
).to_json("new_file.json", orient="records")
In new_file.json:
[
{
"ID": 200,
"Type": "ABC",
"time": "{DAY=20220524}"
},
{
"ID": 400,
"Type": "ABC",
"time": "{NOON=20220524}"
}
]
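The output above keeps time as a literal string such as "{DAY=20220524}". If an actual nested object like {"DAY": 20220524} is needed, one possible sketch (using the same file.json and assuming every value has the form KEY=NUMBER) splits the string first:
import pandas as pd

# turn "DAY=20220524" into {"DAY": 20220524}; assumes every value is KEY=NUMBER
def to_obj(value):
    key, _, number = value.partition("=")
    return {key: int(number)}

pd.read_json("file.json").assign(
    time=lambda df_: df_["time"].apply(to_obj)
).to_json("new_file.json", orient="records")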

Group By and Count occurrences of values in list of nested dicts

I have a JSON file that looks structurally like this:
{
"content": [
{
"name": "New York",
"id": "1234",
"Tags": {
"hierarchy": "CITY"
}
},
{
"name": "Los Angeles",
"id": "1234",
"Tags": {
"hierarchy": "CITY"
}
},
{
"name": "California",
"id": "1234",
"Tags": {
"hierarchy": "STATE"
}
}
]
}
And as an outcome I would like a table view in CSV like so:
tag.key,tag.value,occurrance
hierarchy,CITY,2
hierarchy,STATE,1
Meaning I want to count the occurrence of each unique "tag" in my JSON file and create an output CSV that shows this. My original JSON is a pretty large file.
First construct a dictionary object with the ast.literal_eval function, then walk it to collect key/value tuples and build a DataFrame with zip. Apply groupby to the newly formed DataFrame, and finally write a .csv file with df_agg.to_csv, like this:
import json
import ast
import pandas as pd
Js= """{
"content": [
{
"name": "New York",
"id": "1234",
"Tags": {
"hierarchy": "CITY"
}
},
....
....
{
"name": "California",
"id": "1234",
"Tags": {
"hierarchy": "STATE"
}
}
]
}"""
data = ast.literal_eval(Js)
key = []
value = []
for i in range(len(data['content'])):
    value.append(data['content'][i]['Tags']['hierarchy'])
    for j in data['content'][i]['Tags']:
        key.append(j)
df = pd.DataFrame(list(zip(key, value)), columns=['tag.key', 'tag.value'])
df_agg = df.groupby(['tag.key', 'tag.value']).size().reset_index(name='occurrance')
df_agg.to_csv(r'ThePath\to\your\file\result.csv', index=False)
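Since the text is valid JSON, json.loads works just as well as ast.literal_eval here, and the key/value collection can be done in a single pass that also copes with Tags holding more than one key. A minimal sketch, assuming Js holds the same JSON string as above:
import json
import pandas as pd

data = json.loads(Js)  # Js: the JSON string shown above
# one (key, value) pair per entry in each Tags dict
pairs = [(k, v)
         for item in data['content']
         for k, v in item['Tags'].items()]
df = pd.DataFrame(pairs, columns=['tag.key', 'tag.value'])
df_agg = df.groupby(['tag.key', 'tag.value']).size().reset_index(name='occurrance')
df_agg.to_csv('result.csv', index=False)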

Retrieve data from json file using python

I'm new to Python. I'm running Python on Azure Databricks. I have a .json file. I'm putting the important fields of the JSON file here:
{
"school": [
{
"schoolid": "mr1",
"board": "cbse",
"principal": "akseal",
"schoolName": "dps",
"schoolCategory": "UNKNOWN",
"schoolType": "UNKNOWN",
"city": "mumbai",
"sixhour": true,
"weighting": 3,
"paymentMethods": [
"cash",
"cheque"
],
"contactDetails": [
{
"name": "picsa",
"type": "studentactivities",
"information": [
{
"type": "PHONE",
"detail": "+917597980"
}
]
}
],
"addressLocations": [
{
"locationType": "School",
"address": {
"countryCode": "IN",
"city": "Mumbai",
"zipCode": "400061",
"street": "Madh",
"buildingNumber": "80"
},
"Location": {
"latitude": 49.313885,
"longitude": 72.877426
},
I need to create a DataFrame with schoolName as one column and latitude and longitude as the other two columns. Can you please suggest how to do that?
You can use the json.load() method; here's an example:
import json
with open('path_to_file/file.json') as f:
    data = json.load(f)
print(data)
use this
import json  # built-in
with open("filename.json", 'r') as jsonFile:
    Data = json.load(jsonFile)
Data is now a dictionary of the file's contents, for example:
for i in Data:
    # loops through the top-level keys
    print(Data[i])  # prints the value for each key
For more on JSON:
https://docs.python.org/3/library/json.html
and Python dictionaries:
https://www.programiz.com/python-programming/dictionary#:~:text=Python%20dictionary%20is%20an%20unordered,when%20the%20key%20is%20known.
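To go from the loaded dictionary to the requested columns, a possible sketch (assuming the JSON shown above is complete and saved as a hypothetical school.json) uses pd.json_normalize to walk the addressLocations list:
import json
import pandas as pd

# school.json is a hypothetical file holding the complete JSON from the question
with open("school.json") as f:
    data = json.load(f)

# one row per entry in addressLocations, carrying schoolName along as metadata;
# nested dicts become dotted column names such as "Location.latitude"
df = pd.json_normalize(data["school"], record_path="addressLocations", meta=["schoolName"])
df = df[["schoolName", "Location.latitude", "Location.longitude"]]
print(df)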

Flattening an array in a JSON object

I have a JSON object which I want to flatten before exporting it to CSV. I'd like to use the flatten_json module for this.
My JSON input looks like this:
{
"responseStatus": "SUCCESS",
"responseDetails": {
"total": 5754
},
"data": [
{
"id": 1324651
},
{
"id": 5686131
},
{
"id": 2165735
},
{
"id": 2133256
}
]
}
Easy so far, even for a beginner like me, but what I'm interested in exporting is only the data array. So I would think of this:
data_json = json["data"]
flat_json = flatten_json.flatten(data_json)
Which doesn't work, since data is an array, stored as a list in Python, not as a dictionary:
[
{
"id": 1324651
},
{
"id": 5686131
},
{
"id": 2165735
},
{
"id": 2133256
}
]
How should I proceed to feed the content of the data array into the flatten_json function?
Thanks!
R.
This function expects a dictionary, so let's pass one:
flat_json = flatten_json.flatten({'data': data_json})
Output:
{'data_0_id': 1324651, 'data_1_id': 5686131, 'data_2_id': 2165735, 'data_3_id': 2133256}
You can choose the keys you want to ignore when you call the flatten method. For example, in your case, you can do the following.
flatten_json.flatten(dic, root_keys_to_ignore={'responseStatus', 'responseDetails'})
where dic is the original JSON input.
This will give as output:
{'data_0_id': 1324651, 'data_1_id': 5686131, 'data_2_id': 2165735, 'data_3_id': 2133256}
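If the end goal is a CSV with one row per record, another option is to flatten each element of data separately and hand the list to pandas; a sketch, assuming the parsed JSON is stored in a dict called json_data:
import pandas as pd
from flatten_json import flatten

# flatten each record in the data array individually, one CSV row per record
rows = [flatten(record) for record in json_data["data"]]
pd.DataFrame(rows).to_csv("data.csv", index=False)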

I am searching for a way to convert complex JSON to CSV using Python or R

It would be helpful if the values are converted into rows of the CSV and the keys become the columns of the CSV.
{
"_id": {
"$uId”: “12345678”
},
“comopany_productId”: “J00354”,
“`company_product name`”: “BIKE 12345”,
"search_results": [
{
“product_id”: "44zIVQ",
"constituents”: [
{
“tyre”: “2”,
"name": “dunlop”
},
{
"strength": “able to move 100 km”,
"name": “MRF”
}
],
"name": “Yhakohuka”,
"form": “tyre”,
"schedule": {
"category": “a”,
"label": "It needs a good car to fit in”
},
"standardUnits": 20,
"price": 2000,
"search_score”:0.947474,
“Form”: “tyre”,
"manufacturer": “hum”,
"id": “12345678”,
"size": “4”
},
I want uId, company_productId, company_product name, and the various keys in search_results ("tyre", "name", "strength", "form", "schedule", "category", "label", "standardUnits", "price", "search_score", "Form", "manufacturer", "id", "size") as different columns in Excel, and the values as rows.
In Python you can use the pandas and json libraries to convert it to a CSV like this:
import json
import pandas as pd
pd.json_normalize(json.loads('your_json_string')).to_csv('file_name.csv')
If you have your JSON saved in a file, use json.load instead, passing the file object to it.
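Called this way, json_normalize keeps search_results as a single list-valued column rather than expanding its keys. To get the keys inside search_results as their own columns, here is a possible sketch, assuming the snippet above is completed to valid JSON and saved as a hypothetical data.json:
import json
import pandas as pd

# data.json is a hypothetical file holding the full document from the question
with open("data.json") as f:
    doc = json.load(f)

# one row per element of search_results; _id.$uId and comopany_productId are
# carried along as metadata columns; the nested "constituents" list stays in
# a single column as-is
df = pd.json_normalize(
    doc,
    record_path="search_results",
    meta=[["_id", "$uId"], "comopany_productId"],
    errors="ignore",  # tolerate records that are missing a meta field
)
df.to_csv("file_name.csv", index=False)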
