I receive this JSON from an API call:
data = {'List': [{'id': 12403,
'name': 'myname',
'code': 'mycode',
'description': '',
'createdBy': '',
'createdDate': '24-Jun-2008 15:03:59 CDT',
'lastModifiedBy': '',
'lastModifiedDate': '24-Jun-2008 15:03:59 CDT'}]}
I want to load this data into a DataFrame. When I try json_normalize, it puts the whole list value into a single cell of the DataFrame.
My attempt:
import pandas as pd
df = pd.json_normalize(data)
Current output:
List
0 [{'id': 12403, 'name': 'myname', 'code': 'mycode...
Desired output:
Question
What's the best way to turn a list value from JSON into a pandas DataFrame?
Update
{
"Count": 38,
"Items": [
{
"Actions": [
"edit_",
"remove_",
"attachments_",
"cancel",
"continue",
"auditTrail",
"offline_",
"changeUser",
"linkRecord",
"resendNotification"
],
"Columns": [
{
"Label": "Workflow Name",
"Name": "__WorkflowName__",
"Value": "VOAPTSQA00000735"
},
{
"Label": "Workflow Description",
"Name": "__WorkflowDescription__",
"Value": "Vendor Outsourcing Contract Request (APTSQA | SAP Integration)"
},
{
"Label": "Current Assignee",
"Name": "__CurrentAssignee__",
"Value": "Vendor Outsourcing Integration User"
},
{
"Label": "Last Updated",
"Name": "__DateLastUpdated__",
"Value": "9/7/2022 12:22:14 PM"
},
{
"Label": "Created",
"Name": "__DateCreated__",
"Value": "9/7/2022 12:20:55 PM"
},
{
"Label": "Date Signed",
"Name": "__DateSigned__",
"Value": ""
},
{
"Label": "Completed",
"Name": "__DateCompleted__",
"Value": ""
},
{
"Label": "Status",
"Name": "__Status__",
"Value": "In RFP"
},
{
"Label": "Document ID",
"Name": "__DocumentIdentifier__",
"Value": ""
},
{
"Label": "End Date",
"Name": "__EndDate__",
"Value": "12/31/2033 12:00:00 AM"
},
{
"Label": "Stage Progress",
"Name": "__FormProgress__",
"Value": "0"
},
{
"Label": "Next Signer",
"Name": "__NextSigner__",
"Value": ""
}
],
"ResultSetId": "784a1b83-4d83-4b80-87a3-9c1293baa7d8",
"TaskId": "784a1b83-4d83-4b80-87a3-9c1293baa7d8",
"TokenId": "cdd53c33-803d-4a63-9abd-47b733b55e89"
}
Adding context for my comment about a nested list of key-value pairs: when I normalize this JSON, the whole Columns list ends up as a single value in one cell.
The values of interest are under the List key, so pass that list to json_normalize instead of the whole dict:
df = pd.json_normalize(data['List'])
output:
id name code description createdBy createdDate lastModifiedBy lastModifiedDate
0 12403 myname mycode 24-Jun-2008 15:03:59 CDT 24-Jun-2008 15:03:59 CDT
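For the nested Columns list in the update, json_normalize can be pointed at the inner list with record_path, while meta carries fields from the parent item along. The following is a minimal sketch on a trimmed copy of that payload; the names payload, long_df and wide_df are only illustrative, and the pivot step assumes each Label occurs once per TaskId.

```python
import pandas as pd

# Trimmed copy of the "Update" payload from the question (two Columns entries only).
payload = {
    "Count": 38,
    "Items": [
        {
            "Columns": [
                {"Label": "Workflow Name", "Name": "__WorkflowName__",
                 "Value": "VOAPTSQA00000735"},
                {"Label": "Status", "Name": "__Status__", "Value": "In RFP"},
            ],
            "TaskId": "784a1b83-4d83-4b80-87a3-9c1293baa7d8",
        }
    ],
}

# One row per entry in Columns; the parent's TaskId is repeated on each row.
long_df = pd.json_normalize(payload["Items"], record_path="Columns", meta=["TaskId"])

# Optional reshape: one row per task, one column per Label.
wide_df = long_df.pivot(index="TaskId", columns="Label", values="Value")
print(wide_df)
```

Whether the long or the wide layout is more useful depends on what is done with the data afterwards.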
I am trying to link several Altair charts that share aspects of the same data. I can do this by merging all the data into one data frame, but because of the nature of the data, the merged data frame is much larger than the two separate data frames (one per chart) would be. This is because the columns unique to each chart are repeated across many rows for each entry in the shared column.
Would using transform_lookup save space over just using the merged data frame, or does transform_lookup end up doing the whole merge internally?
No, the entire dataset is still included in the Vega-Lite spec when you use transform_lookup. You can see this by printing the JSON spec of the charts you create. With the example from the docs:
import altair as alt
import pandas as pd
from vega_datasets import data
people = data.lookup_people().head(3)
people
name age height
0 Alan 25 180
1 George 32 174
2 Fred 39 182
groups = data.lookup_groups().head(3)
groups
group person
0 1 Alan
1 1 George
2 1 Fred
With pandas merge:
merged = pd.merge(groups, people, how='left',
                  left_on='person', right_on='name')
print(alt.Chart(merged).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).to_json())
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
"config": {
"view": {
"continuousHeight": 300,
"continuousWidth": 400
}
},
"data": {
"name": "data-b41b97ffc89b39c92e168871d447e720"
},
"datasets": {
"data-b41b97ffc89b39c92e168871d447e720": [
{
"age": 25,
"group": 1,
"height": 180,
"name": "Alan",
"person": "Alan"
},
{
"age": 32,
"group": 1,
"height": 174,
"name": "George",
"person": "George"
},
{
"age": 39,
"group": 1,
"height": 182,
"name": "Fred",
"person": "Fred"
}
]
},
"encoding": {
"x": {
"aggregate": "mean",
"field": "age",
"type": "quantitative"
},
"y": {
"field": "group",
"type": "ordinal"
}
},
"mark": "bar"
}
With transform_lookup, all the data is still there, but as two separate datasets (so technically it takes slightly more space here because of the additional braces and the transform):
print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age', 'height'])
).to_json())
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
"config": {
"view": {
"continuousHeight": 300,
"continuousWidth": 400
}
},
"data": {
"name": "data-5fe242a79352d1fe243b588af570c9c6"
},
"datasets": {
"data-2b374d1509415e1d327c3a7521f8117c": [
{
"age": 25,
"height": 180,
"name": "Alan"
},
{
"age": 32,
"height": 174,
"name": "George"
},
{
"age": 39,
"height": 182,
"name": "Fred"
}
],
"data-5fe242a79352d1fe243b588af570c9c6": [
{
"group": 1,
"person": "Alan"
},
{
"group": 1,
"person": "George"
},
{
"group": 1,
"person": "Fred"
}
]
},
"encoding": {
"x": {
"aggregate": "mean",
"field": "age",
"type": "quantitative"
},
"y": {
"field": "group",
"type": "ordinal"
}
},
"mark": "bar",
"transform": [
{
"from": {
"data": {
"name": "data-2b374d1509415e1d327c3a7521f8117c"
},
"fields": [
"age",
"height"
],
"key": "name"
},
"lookup": "person"
}
]
}
Where transform_lookup can save space is when you use it with the URLs of the two datasets, because the spec then only references the data instead of embedding it:
people = data.lookup_people.url
groups = data.lookup_groups.url
print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age', 'height'])
).to_json())
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
"config": {
"view": {
"continuousHeight": 300,
"continuousWidth": 400
}
},
"data": {
"url": "https://vega.github.io/vega-datasets/data/lookup_groups.csv"
},
"encoding": {
"x": {
"aggregate": "mean",
"field": "age",
"type": "quantitative"
},
"y": {
"field": "group",
"type": "ordinal"
}
},
"mark": "bar",
"transform": [
{
"from": {
"data": {
"url": "https://vega.github.io/vega-datasets/data/lookup_people.csv"
},
"fields": [
"age",
"height"
],
"key": "name"
},
"lookup": "person"
}
]
}
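The inline comparison above uses a one-to-one join, so the two embedded specs are nearly the same size. To get a rough feel for data where each key repeats many times, the embedded records can be modelled with plain json. This is only a sketch of the serialized size, not of what Vega-Lite does at render time, and the 50 groups per person are a made-up assumption.

```python
import json

people = [
    {"name": "Alan", "age": 25, "height": 180},
    {"name": "George", "age": 32, "height": 174},
    {"name": "Fred", "age": 39, "height": 182},
]

# Hypothetical many-to-one data: every person appears in 50 group rows.
groups = [{"group": g, "person": p["name"]} for g in range(1, 51) for p in people]

# Equivalent of the left merge: each group row carries a full copy of its person.
by_name = {p["name"]: p for p in people}
merged = [{**row, **by_name[row["person"]]} for row in groups]

merged_size = len(json.dumps(merged))
separate_size = len(json.dumps(groups)) + len(json.dumps(people))
print(merged_size, separate_size)
```

With real charts, comparing len(chart.to_json()) for the two variants gives the actual figures.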
I started learning Python recently, and I'm trying to write a recursive function for my nested list. This is the function:
def get_employee(j, field, value):
    res = j
    for x in res:
        if x['Name'] == field and x['Value'] == value:
            return res
        elif "Properties" not in x:
            if x is not None:
                continue
        elif "Properties" in x:
            return get_employee(x['Properties'], field, value)
And this is my JSON:
[{
"Value": "Sales",
"Name": "Department",
"Properties": [{
"Value": "US",
"Name": "Country",
"Properties": [{
"Value": "Employee",
"Name": "EmployeeType",
"Properties": [{
"Value": "Manya Bishter",
"Name": "EmployeeFullName",
"Properties": [{
"Value": 1111,
"Name": "EmployeeID"
},
{
"Value": "Manya",
"Name": "EmployeeFirstName"
},
{
"Value": "Bishter",
"Name": "EmployeeLastName"
}
]
},
{
"Value": "Michael Ort",
"Name": "EmployeeFullName",
"Properties": [{
"Value": 1112,
"Name": "EmployeeID"
},
{
"Value": "Michael",
"Name": "EmployeeFirstName"
},
{
"Value": "Ort",
"Name": "EmployeeLastName"
}
]
}
]
},
{
"Value": "Manager",
"Name": "EmployeeType",
"Properties": [{
"Value": "Nick Fair",
"Name": "EmployeeFullName",
"Properties": [{
"Value": 1113,
"Name": "EmployeeID"
},
{
"Value": "Nick",
"Name": "EmployeeFirstName"
},
{
"Value": "Fair",
"Name": "EmployeeLastName"
}
]
}]
}
]
}]
},
{
"Value": "Marketing",
"Name": "Department",
"Properties": [{
"Value": "US",
"Name": "Country",
"Properties": [{
"Value": "Employee",
"Name": "EmployeeType",
"Properties": [{
"Value": "Tamta Hiresh",
"Name": "EmployeeFullName",
"Properties": [{
"Value": 1121,
"Name": "EmployeeID"
},
{
"Value": "Tamta",
"Name": "EmployeeFirstName"
},
{
"Value": "Hiresh",
"Name": "EmployeeLastName"
}
]
}]
}]
}]
}
]
The function works only for Manya, but for nobody else.
For example, if I do this:
print(get_employee(myjson, "EmployeeFirstName", "Manya"))
It will print:
[{'Value': 1111, 'Name': 'EmployeeID'}, {'Value': 'Manya', 'Name': 'EmployeeFirstName'}, {'Value': 'Bishter', 'Name': 'EmployeeLastName'}]
But for others (like Nick), it will return None.
Can you please help?
Thanks!
Here is the code:
def get_employee(data, field, value):
    result = []
    def recursor(j, field, value):
        res = j
        for x in res:
            if x['Name'] == field and x['Value'] == value:
                result.append(res)
            elif "Properties" not in x:
                if x is not None:
                    continue
            elif "Properties" in x:
                recursor(x['Properties'], field, value)
    recursor(data, field, value)
    return result
The problem with your recursive function is that, once it reaches return get_employee(x['Properties'], field, value), it exits the enclosing for x in res: loop. It therefore never moves on to the next item in the list, so only the first branch of the tree is ever searched.
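The same idea can also be written as a generator, which avoids the shared result list. This is a sketch rather than a drop-in replacement: find_matches is a hypothetical name, and tree is a trimmed stand-in for the JSON in the question.

```python
def find_matches(nodes, field, value):
    """Yield every sibling list that contains a node matching field/value."""
    for node in nodes:
        if node.get("Name") == field and node.get("Value") == value:
            yield nodes
        # Recurse without returning, so the remaining siblings are still visited.
        yield from find_matches(node.get("Properties", []), field, value)


# Trimmed stand-in for the structure in the question.
tree = [{
    "Value": "Sales", "Name": "Department",
    "Properties": [
        {"Value": "Manya Bishter", "Name": "EmployeeFullName",
         "Properties": [{"Value": 1111, "Name": "EmployeeID"},
                        {"Value": "Manya", "Name": "EmployeeFirstName"}]},
        {"Value": "Nick Fair", "Name": "EmployeeFullName",
         "Properties": [{"Value": 1113, "Name": "EmployeeID"},
                        {"Value": "Nick", "Name": "EmployeeFirstName"}]},
    ],
}]

# next(..., None) returns the first match, or None when nothing matches.
print(next(find_matches(tree, "EmployeeFirstName", "Nick"), None))
```

Wrapping the call in list(...) instead of next(...) collects every match, like the result list in the answer above.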
I am trying to use Python to extract a value from JSON, but I got a strange result.
The following is part of my JSON variable:
"weatherElement": [
{
"elementName": "ELEV",
"elementValue": {
"value": "20.0"
}
},
{
"elementName": "TEMP",
"elementValue": {
"value": "25.0"
}
},
{
"elementName": "D_TNT",
"elementValue": {
"value": "2019-11-22T02:10:00+08:00"
}
}
],
The following code correctly returns the value 25.0, which is the temperature:
for unit in data['records']['location']:
    # print(type(unit)) <---- output <dict>
    if unit['stationId'] == 'C0V490':
        for wea_unit in unit['weatherElement']:  # unit['weatherElement'] is list
            if wea_unit['elementName'] == 'TEMP':
                print(type(wea_unit['elementValue']))  # is str
                return wea_unit['elementValue']
My question is: why is type(wea_unit['elementValue']) str?
I think it should be dict, and that I should use wea_unit['elementValue']['value'] to get '25.0', but that is wrong. Does anyone know what mistake I made? Thanks!
Edit:
The following is example code that can be run directly:
import json
def parse_json_to_dataframe(data):
    for unit in data['records']['location']:
        # print(type(unit))
        if unit['stationId'] == 'C0A560':
            for wea_unit in unit['weatherElement']:  # unit['weatherElement'] is list
                if wea_unit['elementName'] == 'TEMP':
                    print(type(wea_unit['elementValue']))
                    return wea_unit['elementValue']
v = {"success":"true",
"result": {"resource_id": "O-A0001-001",
"fields": [{"id": "lat", "type": "Double"},
{"id": "lon", "type": "Double"},
{"id": "locationName", "type": "String"},
{"id": "stationId", "type": "String"},
{"id": "description", "type": "String"},
{"id": "elementName", "type": "String"},
{"id": "elementValue", "type": "Double"},
{"id": "parameterName", "type": "String"},
{"id": "parameterValue", "type": "String"}]}, # result end
"records": {"location": [{"lat": "24.778333",
"lon": "121.494583",
"locationName": "福山",
"stationId": "C0A560",
"time": {"obsTime": "2019-11-22 22:00:00"},
"weatherElement": [
{"elementName": "ELEV", "elementValue": "405.0"},
{"elementName": "WDIR", "elementValue": "0"},
{"elementName": "WDSD", "elementValue": "0.0"},
{"elementName": "TEMP", "elementValue": "19.6"}],
"parameter": [
{"parameterName": "CITY_SN", "parameterValue": "06"},
{"parameterName": "TOWN_SN", "parameterValue": "061"}]}]}}
temp = parse_json_to_dataframe(v)
print(temp)
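Comparing the two snippets in the question: the first excerpt shows elementValue as a nested object with a value key, while the runnable example stores elementValue as a plain string, which would explain the str type there. A minimal sketch, assuming both shapes can occur in the feed; element_value is a hypothetical helper, not part of any API.

```python
def element_value(element):
    """Return the reading whether elementValue is a dict or a plain string."""
    raw = element["elementValue"]
    # First excerpt: {"value": "25.0"}; runnable example: "19.6".
    return raw["value"] if isinstance(raw, dict) else raw


nested = {"elementName": "TEMP", "elementValue": {"value": "25.0"}}
flat = {"elementName": "TEMP", "elementValue": "19.6"}

print(element_value(nested), element_value(flat))
```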
I am trying to load JSON from a URL and convert it to a pandas DataFrame that looks like the sample below.
I've tried json_normalize, but it duplicates the columns, one for each data type (value and stringValue). Is there a simpler way than using that method and then dropping and renaming columns after creating the DataFrame? I want to keep the stringValue.
Person ID Position ID Job ID Manager
0 192 936 93 Tom
my_json = {
"columns": [
{
"alias": "c3",
"label": "Person ID",
"dataType": "integer"
},
{
"alias": "c36",
"label": "Position ID",
"dataType": "string"
},
{
"alias": "c40",
"label": "Job ID",
"dataType": "integer",
"entityType": "job"
},
{
"alias": "c19",
"label": "Manager",
"dataType": "integer"
},
],
"data": [
{
"c3": {
"value": 192,
"stringValue": "192"
},
"c36": {
"value": "936",
"stringValue": "936"
},
"c40": {
"value": 93,
"stringValue": "93"
},
"c19": {
"value": 12412453,
"stringValue": "Tom"
}
}
]
}
If c19 were declared with dataType "string" (in the sample it is "integer", so Manager would come out as 12412453 rather than Tom), this should work:
import pandas as pd
alias_to_label = {x['alias']: x['label'] for x in my_json["columns"]}
is_str = {x['alias']: ('string' == x['dataType']) for x in my_json["columns"]}
data = []
for x in my_json["data"]:
    data.append({
        k: v["stringValue" if is_str[k] else 'value']
        for k, v in x.items()
    })
df = pd.DataFrame(data).rename(columns=alias_to_label)
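Since the question asks to keep stringValue throughout, a simpler variant is to ignore dataType and always read stringValue. A sketch on a trimmed copy of the sample; rows is an illustrative name, and every cell is assumed to carry a stringValue key.

```python
# Trimmed copy of the sample from the question.
my_json = {
    "columns": [
        {"alias": "c3", "label": "Person ID", "dataType": "integer"},
        {"alias": "c19", "label": "Manager", "dataType": "integer"},
    ],
    "data": [
        {"c3": {"value": 192, "stringValue": "192"},
         "c19": {"value": 12412453, "stringValue": "Tom"}},
    ],
}

alias_to_label = {col["alias"]: col["label"] for col in my_json["columns"]}

# One dict per record, keyed by the human-readable label.
rows = [
    {alias_to_label[alias]: cell["stringValue"] for alias, cell in record.items()}
    for record in my_json["data"]
]
print(rows)
```

Passing rows to pd.DataFrame(rows) then gives the frame directly, with no renaming step; all values stay strings.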
I have a project in which I have to convert a JSON file into a CSV file.
The JSON sample:
{
"P_Portfolio Group": {
"depth": 1,
"dataType": "PortfolioOverview",
"levelId": "P_Portfolio Group",
"path": [
{
"label": "Portfolio Group",
"levelId": "P_Portfolio Group"
}
],
"label": "Portfolio Group",
"header": [
{
"id": "Label",
"label": "Security name",
"type": "text",
"contentType": "text"
},
{
"id": "SecurityValue",
"label": "MioCHF",
"type": "text",
"contentType": "number"
},
{
"id": "SecurityValuePct",
"label": "%",
"type": "text",
"contentType": "pct"
}
],
"data": [
{
"dataValues": [
{
"value": "Client1",
"type": "text"
},
{
"value": 2068.73,
"type": "number"
},
{
"value": 14.0584,
"type": "pct"
}
]
},
{
"dataValues": [
{
"value": "Client2",
"type": "text"
},
{
"value": 1511.9,
"type": "number"
},
{
"value": 10.2744,
"type": "pct"
}
]
},
{
"dataValues": [
{
"value": "Client3",
"type": "text"
},
{
"value": 1354.74,
"type": "number"
},
{
"value": 9.2064,
"type": "pct"
}
]
},
{
"dataValues": [
{
"value": "Client4",
"type": "text"
},
{
"value": 1225.78,
"type": "number"
},
{
"value": 8.33,
"type": "pct"
}
]
}
],
"summary": [
{
"value": "Total",
"type": "text"
},
{
"value": 11954.07,
"type": "number"
},
{
"value": 81.236,
"type": "pct"
}
]
}
}
And I want to obtain something like:
Client1,2068.73,14.0584
Client2,1511.9,10.2744
Client3,871.15,5.92
Client4,11954.07,81.236
Can you please give me a hint.
import csv
import json
with open("C:\Users\SVC\Desktop\test.json") as file:
    x = json.load(file)
f = csv.writer(open("C:\Users\SVC\Desktop\test.csv", "wb+"))
for x in x:
    f.writerow(x["P_Portfolio Group"]["data"]["dataValues"]["value"])
but it doesn't work.
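import csv
import json
# Raw strings keep the backslashes in the Windows paths from being read as escapes.
with open(r'C:\Users\SVC\Desktop\test.json') as json_file:
    portfolio_group = json.load(json_file)
# newline='' stops the csv module from writing blank lines between rows on Windows.
with open(r'C:\Users\SVC\Desktop\test.csv', 'w', newline='') as csv_file:
    csv_obj = csv.writer(csv_file)
    for data in portfolio_group['P_Portfolio Group']['data']:
        # Each entry holds one row; keep only the "value" of every cell.
        csv_obj.writerow([d['value'] for d in data['dataValues']])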
This results in the following C:\Users\SVC\Desktop\test.csv content:
Client1,2068.73,14.0584
Client2,1511.9,10.2744
Client3,1354.74,9.2064
Client4,1225.78,8.33
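If a header row is wanted as well, the labels in the header list of the sample can be written first. The sketch below uses a trimmed copy of the sample and an in-memory buffer in place of the file so it runs anywhere; the header row goes beyond what the question asked for, and swapping the buffer for an open file is left as an assumption.

```python
import csv
import io

# Trimmed copy of the sample from the question.
portfolio_group = {
    "P_Portfolio Group": {
        "header": [{"label": "Security name"}, {"label": "MioCHF"}, {"label": "%"}],
        "data": [
            {"dataValues": [{"value": "Client1"}, {"value": 2068.73}, {"value": 14.0584}]},
            {"dataValues": [{"value": "Client2"}, {"value": 1511.9}, {"value": 10.2744}]},
        ],
    }
}

group = portfolio_group["P_Portfolio Group"]
buffer = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.writer(buffer, lineterminator="\n")

# Header row from the column labels, then one row per "data" entry.
writer.writerow([column["label"] for column in group["header"]])
for item in group["data"]:
    writer.writerow([cell["value"] for cell in item["dataValues"]])

csv_text = buffer.getvalue()
print(csv_text)
```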
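Alternatively, use the pandas library. pd.read_csv cannot parse this nested JSON, so load it with json first and hand pandas the extracted rows:
import json
import pandas as pd
with open(r"C:\Users\SVC\Desktop\test.json") as json_file:
    records = json.load(json_file)["P_Portfolio Group"]["data"]
rows = [[d["value"] for d in r["dataValues"]] for r in records]
pd.DataFrame(rows).to_csv("test.csv", index=False, header=False)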