I am looking to transform my DataFrame to JSON:
Age Eye Gender
30 blue male
With my current code, I convert the DataFrame to JSON and get the result below:
json_file = df.to_json(orient='records')
json_file
[{'age':'30'},{'eye':'blue'},{'gender':'male'}]
However, I want to add an outer layer containing the id and name, with the existing JSON records nested under an 'info' key:
{'id':'5231',
 'name':'Bob',
 'info': [
     {'age':'30'},{'eye':'blue'},{'gender':'male'}
 ]
}
How would I add the additional fields? I tried reading the docs, but I do not see a clear answer on how to add extra fields during the DataFrame-to-JSON conversion.
Based on the data you provided, this is your answer:
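import json
import pandas as pd

a = {'id': '5231',
     'name': 'Bob',
     }
df = pd.DataFrame({'Age': [30], 'Eye': ['blue'], 'Gender': ['male']})
# Parse the records back into Python objects so 'info' holds a list of dicts,
# not a JSON string (which would be double-encoded when 'a' is serialized).
# Avoid naming the result `json`, which would shadow the json module.
a['info'] = json.loads(df.to_json(orient='records'))
json_output = json.dumps(a)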
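If the DataFrame holds several people, with the id and name stored as columns rather than in a separate dict, the same wrapping can be done per group. The sketch below assumes hypothetical columns named "id" and "name" and a hypothetical helper `nest_by_person`; note that orient='records' yields one dict per row (e.g. {"Age": 30, "Eye": "blue", "Gender": "male"}), not one dict per field as in the desired output above.

```python
import json

import pandas as pd

# Hypothetical input: id and name are ordinary columns next to the data fields
df = pd.DataFrame({
    "id": ["5231", "7788"],
    "name": ["Bob", "Ann"],
    "Age": [30, 25],
    "Eye": ["blue", "green"],
    "Gender": ["male", "female"],
})


def nest_by_person(frame):
    """Return [{"id": ..., "name": ..., "info": [row dicts]}] per (id, name) pair."""
    nested = []
    # sort=False keeps the groups in order of first appearance
    for (person_id, name), group in frame.groupby(["id", "name"], sort=False):
        # to_json + json.loads yields plain Python types (no numpy scalars)
        records = group.drop(columns=["id", "name"]).to_json(orient="records")
        nested.append({"id": person_id, "name": name, "info": json.loads(records)})
    return nested


result = nest_by_person(df)
print(json.dumps(result, indent=2))
```

If a person appears on several rows, each of those rows becomes one dict in that person's "info" list.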
Related
How to convert CSV to nested JSON in Python
This is related to something like this.
I want to convert a flat DataFrame to a nested JSON format.
I have a CSV file (sales_2020) in the following format:
and I want JSON like this:
I tried the approach from the link above and was able to add one level using this:
import pandas as pd
df = pd.read_csv('your_file.csv')
# Collapse the two product columns into a single column of dicts
df['sales_2020'] = df[['computer','mobile']].to_dict('records')
out = df[['a','sales_2020']].to_json(orient='records', indent=4)
But I was unable to add one more level to it, i.e. the sales for a specific month. I tried the solution below, but it doesn't work:
df['jan']['sales_2020'] =df[['computer','mobile']].to_dict('records')
Please help me out.
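I guess what you want is orient='index'. With the month set as the index, each month becomes an outer key and the sales_2020 dict is nested under it: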
df['sales_2020'] = df[['computer','mobile']].to_dict('records')
out = df.set_index('Month')[['sales_2020']].to_json(orient='index', indent=4)
{
    "jan":{
        "sales_2020":{
            "computer":10,
            "mobile":5
        }
    },
    "feb":{
        "sales_2020":{
            "computer":8,
            "mobile":2
        }
    },
    "march":{
        "sales_2020":{
            "computer":6,
            "mobile":12
        }
    }
}
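A self-contained way to check the answer is sketched below. The sample rows are reconstructed from the output above, and the "Month" column name is taken from the answer's set_index call, since the question's CSV itself is not shown. The plain-Python dict comprehension at the end is an alternative, not part of the answer, that builds the same structure and is easy to extend with further levels.

```python
import json

import pandas as pd

# Sample data reconstructed from the JSON output above (an assumption)
df = pd.DataFrame({
    "Month": ["jan", "feb", "march"],
    "computer": [10, 8, 6],
    "mobile": [5, 2, 12],
})

# Step 1: collapse the product columns into one column of dicts
df["sales_2020"] = df[["computer", "mobile"]].to_dict("records")

# Step 2: with Month as the index, orient="index" makes each month an outer key
out = df.set_index("Month")[["sales_2020"]].to_json(orient="index", indent=4)
print(out)

# Equivalent plain-Python construction of the same nested structure
nested = {
    month: {"sales_2020": {"computer": computer, "mobile": mobile}}
    for month, computer, mobile in zip(
        df["Month"], df["computer"].tolist(), df["mobile"].tolist()
    )
}
```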
I have tried a few different ways of using pandas to convert my JSON file to a CSV file.
import pandas as pd
df = pd.read_json("CDMP_E2.json")
df.to_csv("CDMP_Output.csv")
The problem is that when I run that code, the output ends up all in one "column".
The column header shows up as Credit-NoSQL,
and each cell in that column contains the entire "object":
'date':'2021-08-01','type':'CARD','amount':'100'
So it looks like this:
Credit-NoSQL
'date':'2021-08-01','type':'CARD','amount':'100'
I would instead expect to see account, date, type, amount and returneddate as the headers:
account date type amount returneddate
ABCD 2021-08-01 CARD 100
EFGHI 2021-08-01 CARD 150 2021-08-04
My JSON file looks like this:
[
    {
        "Credit-NoSQL":{
            "account":"ABCD",
            "date":"2021-08-01",
            "type":"CARD",
            "amount":"100"
        }
    },
    {
        "Credit-NoSQL":{
            "account":"EFGHI",
            "date":"2021-08-02",
            "type":"CARD",
            "amount":"150",
            "returneddate":"2021-08-04"
        }
    }
]
So I am not sure whether the problem is the way my JSON file is set up, with its list and such, or whether I am missing something in my Python command. I am new to Python and still learning, so I am at a loss as to what to do next.
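There's no need to use pandas for this; the standard json and csv modules are enough:
import csv
import json

with open("CDMP_E2.json") as json_file:
    # Unwrap the inner object from each {"Credit-NoSQL": {...}} entry
    data = [item['Credit-NoSQL'] for item in json.load(json_file)]

# Union of all dictionary keys, in first-seen order, so optional
# fields such as "returneddate" still get their own column
fieldnames = list(dict.fromkeys(key for row in data for key in row))

# newline="" is what the csv module documentation recommends for writers
with open("CDMP_Output.csv", "w", newline="") as csv_file:
    cwrite = csv.DictWriter(csv_file, fieldnames=fieldnames)
    cwrite.writeheader()
    # Keys missing from a row are written as empty cells
    cwrite.writerows(data)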
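If you would rather stay with pandas, the single-column result comes from the file's shape: every top-level entry is an object with the one key "Credit-NoSQL", so pd.read_json sees a single column whose cells are dicts. Unwrapping that key before building the DataFrame gives one column per field. This is a sketch with the records inlined so it runs on its own; with the real file you would load `raw` via json.load and call df.to_csv("CDMP_Output.csv", index=False).

```python
import pandas as pd

# Sample records in the same shape as CDMP_E2.json (inlined for the sketch)
raw = [
    {"Credit-NoSQL": {"account": "ABCD", "date": "2021-08-01",
                      "type": "CARD", "amount": "100"}},
    {"Credit-NoSQL": {"account": "EFGHI", "date": "2021-08-02",
                      "type": "CARD", "amount": "150",
                      "returneddate": "2021-08-04"}},
]

# Unwrap each object; pandas takes the union of keys as columns and fills
# missing values (e.g. returneddate for ABCD) with NaN
df = pd.DataFrame([item["Credit-NoSQL"] for item in raw])

# NaN is written as an empty cell; index=False drops the row-number column
csv_text = df.to_csv(index=False)
print(csv_text)
```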
I have a CSV file which contains labels and their translations in different languages:
name en_GB de_DE
-----------------------------------------------
ElementsButtonAbort Abort Abbrechen
ElementsButtonConfirm Confirm Bestätigen
ElementsButtonDelete Delete Löschen
ElementsButtonEdit Edit Ändern
I want to convert this CSV into JSON with the following structure using Python:
{
    "de_DE": {
        "translations":{
            "ElementsButtonAbort": "Abbrechen"
        }
    },
    "en_GB":{
        "translations":{
            "ElementsButtonAbort": "Abort"
        }
    }
}
How can I do this using Python?
Say your data looks like this:
import pandas as pd
df = pd.DataFrame([["ElementsButtonAbort", "Abort", "Abbrechen"],
                   ["ElementsButtonConfirm", "Confirm", "Bestätigen"],
                   ["ElementsButtonDelete", "Delete", "Löschen"],
                   ["ElementsButtonEdit", "Edit", "Ändern"]],
                  columns=["name", "en_GB", "de_DE"])
Then, this might not be the best way to do it, but at least it works:
df.set_index("name", drop=True, inplace=True)
translations = df.to_dict()
Now, if you want to get exactly the dictionary that you show as the desired output, you can do:
for language in translations.keys():
    # Wrap each language's {label: text} dict under a "translations" key
    labels = translations[language]
    translations[language] = {"translations": labels}
Finally, if you wish to save your dictionary as JSON:
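import json
# encoding='utf-8' plus ensure_ascii=False keeps umlauts such as "ö" readable
# instead of writing them as \u escape sequences
with open('PATH/TO/YOUR/DIRECTORY/translations.json', 'w', encoding='utf-8') as fp:
    json.dump(translations, fp, ensure_ascii=False, indent=4)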
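The same structure can also be built with only the standard library, which avoids the intermediate DataFrame. This sketch assumes the real file is comma-separated with the header row "name,en_GB,de_DE" (the question shows it as an aligned table, so the delimiter is a guess); `csv_to_translations` is a hypothetical helper name, and an in-memory sample stands in for the file.

```python
import csv
import io
import json

# Assumption: comma-separated file with a "name" column plus one column per language
sample = io.StringIO(
    "name,en_GB,de_DE\n"
    "ElementsButtonAbort,Abort,Abbrechen\n"
    "ElementsButtonConfirm,Confirm,Bestätigen\n"
)


def csv_to_translations(fp):
    """Build {language: {"translations": {label: text}}} from a CSV file object."""
    reader = csv.DictReader(fp)
    # Every column except "name" is treated as a language code
    languages = [col for col in reader.fieldnames if col != "name"]
    result = {lang: {"translations": {}} for lang in languages}
    for row in reader:
        for lang in languages:
            result[lang]["translations"][row["name"]] = row[lang]
    return result


translations = csv_to_translations(sample)
print(json.dumps(translations, ensure_ascii=False, indent=4))
```

With a real file, open it with `open("translations.csv", newline="", encoding="utf-8")` (the filename is hypothetical) and pass the file object to the function.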
I have a CSV file (which I loaded into a pandas DataFrame) in which each row contains a JSON document. Every JSON document has exactly the same format and objects as the others, and each one represents a unique transaction (purchase). I would like to convert this DataFrame into a DataFrame or Excel file in which each column represents an object from the JSON and each row represents a transaction.
The JSON also contains arrays, in which case I would like to retrieve each element of the array. Ideally, I would like to retrieve all possible objects from the JSON documents and turn them into columns.
A simplified version of a row would be:
{
    "source":{
        "analyze":true,
        "billing":{
            "gender":null,
            "name":"xxxxx",
            "phones":[
                {
                    "area_code":"xxxxx",
                    "country_code":"xxxxx",
                    "number":"xxxxx",
                    "phone_type":"xxxxx"
                }
            ]
        },
        "created_at":"xxxxx",
        "customer":{
            "address":{
                "city":"xxxxx",
                "complement":"xxxxx",
                "country":"xxxxx",
                "neighborhood":"xxxxx",
                "number":"xxxxx",
                "state":"xxxxx",
                "street":"xxxxx",
                "zip_code":"xxxxx"
            },
            "date_of_birth":"xxxxx",
            "documents":[
                {
                    "document_type":"xxxxx",
                    "number":"xxxxx"
                }
            ],
            "email":"xxxxx",
            "gender":xxxxx,
            "name":"xxxxx",
            "number_of_previous_orders":xxxxx,
            "phones":[
                {
                    "area_code":"xxxxx",
                    "country_code":"xxxxx",
                    "number":"xxxxx",
                    "phone_type":"xxxxx"
                }
            ],
            "register_date":xxxxx,
            "register_id":"xxxxx"
        },
        "device":{
            "ip":"xxxxx",
            "lat":"xxxxx",
            "lng":"xxxxx",
            "platform":xxxxx,
            "session_id":xxxxx
        }
    }
}
And my Python code:
import csv
import json
import pandas as pd
df = pd.read_csv(r"<name of csv file in which each row is a JSON file>")
A simplified version of my expected output would be something like:
Expected Output
Do you mean something like this as the output? For example, to get area_code:
A_col area_code
0 {"source":{"analyze":true,"billing":{"gender":... xxxxx
First, the xxxxx values of "gender", "number_of_previous_orders", "register_date", "platform" and "session_id" should be double-quoted; otherwise the document is not valid JSON.
Then, read the JSON document:
newjson = []
with open('./example.json', 'r') as f:
    for line in f:
        line = line.strip()
        newjson.append(line)
Join the lines into a single string:
jsonString = ''.join(newjson)
Parse it into a Python object:
jsonData = json.loads(jsonString)
Extract the fields using dictionary lookups and build a pandas DataFrame:
newDF = pd.DataFrame({"A_col": jsonString, "area_code": jsonData['source']['billing']['phones'][0]['area_code']}, index=[0])
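To turn every field into a column rather than picking them one at a time, each JSON document can be flattened into a single-level dict. The sketch below swaps in a different technique from the answer: a small recursive helper, `flatten` (a hypothetical name), that also expands arrays by position, which pd.json_normalize does not do on its own (it leaves lists as cell values). The "A_col" column name and the short sample documents are assumptions, and the real documents must be valid JSON (quoted placeholders) for json.loads to accept them.

```python
import json

import pandas as pd

# Hypothetical stand-in for the question's DataFrame: one JSON document per row
raw = pd.DataFrame({"A_col": [
    '{"source": {"analyze": true, "billing": {"name": "n1", '
    '"phones": [{"area_code": "11", "number": "111"}]}, "device": {"ip": "1.1.1.1"}}}',
    '{"source": {"analyze": false, "billing": {"name": "n2", '
    '"phones": [{"area_code": "22", "number": "222"}]}, "device": {"ip": "2.2.2.2"}}}',
]})


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys.

    List elements are keyed by position, e.g. "source.billing.phones.0.area_code".
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Scalar leaf: the accumulated prefix becomes the column name
        return {prefix: obj}
    flat = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


# Parse each row, flatten it, and let pandas align the union of keys as columns
wide = pd.DataFrame([flatten(json.loads(doc)) for doc in raw["A_col"]])
print(wide.columns.tolist())
# wide.to_excel("transactions.xlsx", index=False)  # writing .xlsx needs openpyxl
```

Because missing keys simply become NaN, documents whose arrays have different lengths still line up, with extra columns such as "...phones.1.number" for the longer ones.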
I have a problem converting a JSON API response into a pandas DataFrame. The JSON has the following structure:
{"place":{"AMS":[{"UTC":"14-11-2017 10:00","ValidUTC":"14-11-2017 00:00","Cardinality":"4",...},{"UTC":"14-11-2017 11:00",...}]}}
In my pandas DataFrame I want the columns UTC, ValidUTC, Cardinality, etc., so I tried to use the json_normalize function:
import requests
from pandas import json_normalize
main_api = 'https://api.xxx'
url = main_api
json_data = requests.get(url).json()
df = json_normalize(json_data, 'place', ['AMS'])
and
main_api = 'https://api.xxx'
url = main_api
json_data = requests.get(url).json()
df = json_normalize(json_data, 'place')
df = json_normalize(json_data, 'AMS')
but neither seems to work. Does anyone have an idea how to convert the JSON correctly into a pandas DataFrame?
Refer to "JSON to pandas DataFrame"; it describes normalizing JSON very well. Besides that, you can pass a parsing function for the JSON columns.
I'm not sure I recreated your input correctly (you should post the full JSON without the ...).
import pandas as pd
data = {"place":{"AMS":[{"UTC":"14-11-2017 10:00","ValidUTC":"14-11-2017 00:00","Cardinality":"4"}, {"UTC":"15-11-2017 10:00","ValidUTC":"15-11-2017 00:00","Cardinality":"5"} ]}}
pd.json_normalize(data['place']['AMS'])
Output
UTC ValidUTC Cardinality
0 14-11-2017 10:00 14-11-2017 00:00 4
1 15-11-2017 10:00 15-11-2017 00:00 5
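The question's attempts pass 'place' or 'AMS' alone, but the records sit two levels down. A sketch of two ways to reach them, using the sample data from the answer: record_path accepts the full list of keys leading to the records, and if "place" can hold several location codes (an assumption; only AMS appears in the question), each one can be normalized separately and tagged with its code before concatenating.

```python
import pandas as pd

data = {"place": {"AMS": [
    {"UTC": "14-11-2017 10:00", "ValidUTC": "14-11-2017 00:00", "Cardinality": "4"},
    {"UTC": "15-11-2017 10:00", "ValidUTC": "15-11-2017 00:00", "Cardinality": "5"},
]}}

# record_path walks data["place"]["AMS"] and turns each record into a row
df_ams = pd.json_normalize(data, record_path=["place", "AMS"])

# For several locations: normalize each list and add a "place" column with its key
frames = [
    pd.json_normalize(records).assign(place=code)
    for code, records in data["place"].items()
]
df_all = pd.concat(frames, ignore_index=True)
print(df_all)
```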