I have to export all data from a JSON file to an Excel workbook. The JSON is very complex. The following is its structure:
"section": [
{
"name": "Sch1",
"subsection": [
{
"Sch1": {
"pg1_non_calendar_end_date": [
{
"column1": "AA"
}
],
"pg1_document_status_final": [
{
"column2": "XX"
}
],....
}
This continues for thousands of values.
My system is Windows 10 with Python 3. I am using pandas to try to get the data into the shape I want, but the result does not come out as expected. I am using the following code:
df2 = pd.DataFrame(books[3]['subsection'][0])
df2 = (
df2["Sch1"]
.apply(pd.Series)
.merge(df2, left_index = True, right_index = True)
)
o = 'output.xlsx'
df2[0].to_excel(o, sheet_name='Sheet_name_1')
Pandas has many ways to read JSON into DataFrames. For your use case, iterate over each subsection and write each one to its own sheet, for example:
import json
import pandas as pd

with open("input.json") as f:
    input_data = json.load(f)

# use a single ExcelWriter so every subsection ends up on its own sheet
with pd.ExcelWriter("output.xlsx") as writer:
    for section in input_data["section"]:
        for subsection in section["subsection"]:
            sheet_df = pd.read_json(json.dumps(subsection), orient="records")
            # pick a unique sheet name per subsection, e.g. from section["name"]
            sheet_df.to_excel(writer, sheet_name=section["name"])
Alternatively, you can look into the pd.DataFrame.from_records method for loading your data properly.
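For illustration, a minimal sketch (using only the column names from the sample JSON above, so just an assumption about your real data) of how DataFrame.from_records handles one of those lists of single-key dicts:

import pandas as pd

records = [{"column1": "AA"}]  # e.g. subsection["Sch1"]["pg1_non_calendar_end_date"]
df = pd.DataFrame.from_records(records)
print(df)
#   column1
# 0      AA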
I'm trying to see if there is an out-of-the-box way to get the JSON file created per the requirements, without having to re-open the JSON file and massage it further. The JSON array output I get from df.to_json("file", orient='index', indent=2, date_format='iso') has "0", "1", "3", etc. as what look like root object element names. The requirement is not to have those, and secondly I need to name the root object.
CSV FILE (INPUT)
vendor issuer
honda.com DigiCert
toyota.com GoDaddy
import pandas as pd
df = pd.read_csv('test_input.csv', na_filter=False, skiprows=0)
df.to_json("test_out.json", orient='index', indent=2, date_format='iso')
OUTPUT
{
"0":{
"vendor":"honda-us.com",
"issuer":"Amazon",
"licensed":"10\/11\/2021 16:14",
"expiring":"2\/9\/2023 16:14",
"remaining":57
},
EXPECTING OUTPUT TO BE
{
"vendorslist": [
{
"vendor":"honda-us.com",
"issuer":"Amazon",
"licensed":"10\/11\/2021 16:14",
"expiring":"2\/9\/2023 16:14",
"remaining":57
}
]
},
My recommendation would be to build this yourself.
import json
import pandas as pd
df = pd.read_csv('test_input.csv', na_filter=False, skiprows=0)
data = {"vendorslist": df.to_dict(orient='records')}
with open("test_out.json", "w") as f:
json.dump(data, f, indent=2, default=str)
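For reference, a quick sketch of the intermediate dict using only the two sample rows shown above (your real file clearly has more columns such as licensed and expiring, which would appear the same way):

records = df.to_dict(orient='records')
# with the two-row sample this gives:
# [{'vendor': 'honda.com', 'issuer': 'DigiCert'},
#  {'vendor': 'toyota.com', 'issuer': 'GoDaddy'}]

json.dump then nests that list under the "vendorslist" key, which is the shape you are after.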
This may not give you the exact answer you're after, but it should be a good starting point :)
I am not able to generate a proper CSV file using the code below, but when I query it individually, I get the desired result. Below are my JSON file and code:
{
"quiz": {
"maths": {
"q2": {
"question": "12 - 8 = ?",
"options": [
"1",
"2",
"3",
"4"
],
"answer": "4"
},
"q1": {
"question": "5 + 7 = ?",
"options": [
"10",
"11",
"12",
"13"
],
"answer": "12"
}
},
"sport": {
"q1": {
"question": "Which one is correct team name in NBA?",
"options": [
"New York Bulls",
"Los Angeles Kings",
"Golden State Warriros",
"Huston Rocket"
],
"answer": "Huston Rocket"
}
}
}
}
import json
import csv
from flatten_json import flatten
# Opening JSON file and loading the data
# into the variable data
with open('tempjson.json', 'r') as jsonFile:
data = json.load(jsonFile)
flattenData=flatten(data)
employee_data=flattenData
# now we will open a file for writing
data_file = open('data_files.csv', 'w')
# create the csv writer object
csv_writer = csv.writer(data_file)
# Counter variable used for writing
# headers to the CSV file
count = 0
for emp in employee_data:
if count == 0:
# Writing headers of CSV file
header = emp
csv_writer.writerow(header)
count += 1
# Writing data of CSV file
#csv_writer.writerow(employee_data.get(emp))
data_file.close()
Once the above code executes, the CSV does not contain the information I expect.
I do not get what I am doing wrong. I am flattening my JSON file and then trying to convert it to CSV.
You can manipulate the JSON easily with pandas DataFrames and save it to a CSV.
I'm not sure what your desired CSV should look like, but the following code generates a CSV with columns question, options, and answer. It generates an index column with the name of the quiz and the question number in an alphabetically ordered list (your JSON was unordered). The code below will also work when more quizzes and questions are added.
Converting it natively in Python may perform better, but manipulating it with pandas makes it easier.
import pandas as pd
# create Pandas dataframe from JSON for easy manipulation
df = pd.read_json("tempjson.json")
# create result dataframe
df_result = pd.DataFrame()
# Get nested dict from each dataframe row
for index, row in df.iterrows():
    # Convert it into a new dataframe
    df_temp = pd.DataFrame.from_dict(df.loc[index]['quiz'], orient='index')
    # Add name of quiz to index
    df_temp.index = index + ' ' + df_temp.index
    # Append row result to final dataframe
    # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    df_result = pd.concat([df_result, df_temp])
# Optionally sort alphabetically so questions are in order
df_result.sort_index(inplace=True)
# convert dataframe to CSV
df_result.to_csv('quiz.csv')
Update on request: Export to CSV using flattened JSON:
import json
import csv
from flatten_json import flatten
import pandas as pd
# Opening JSON file and loading the data
# into the variable data
with open("tempjson.json", 'r') as jsonFile:
data = json.load(jsonFile)
flattenData=flatten(data)
df = pd.DataFrame.from_dict(flattenData, orient='index')
# convert dataframe to CSV
df.to_csv('quiz.csv', header=False)
Results in the following CSV (Not sure what your desired outcome is since you did not provide the desired result in your question).
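For orientation, here is a rough sketch of what flatten produces for this quiz JSON, assuming flatten_json's default underscore separator; these flattened keys are what end up in the first CSV column:

from flatten_json import flatten

flat = flatten(data)
# e.g. flat["quiz_maths_q2_question"] == "12 - 8 = ?"
#      flat["quiz_maths_q2_options_0"] == "1"
#      flat["quiz_sport_q1_answer"] == "Huston Rocket"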
I currently have 1,000 JSONs in the following format, each for a single employee:
"A": "A_text",
"B": "Datetime stamp of record",
"ID": "123",
"FeatureList": {
"Salary": 100000,
"Age": 45,
"Work Ex": 15,
}
}
My goal is to recursively concatenate these files into one DataFrame, with one row per employee.
In my current solution:
I can recursively add all files after formatting with:
from pathlib import Path

rootdir = '/folderpath/filename'
all_files = Path(rootdir).rglob('*.json')
I am able to read the file and transpose it with below:
df = pd.read_json('data.json')
df = df.transpose()
But the nested "FeatureList" object creates a distorted orientation if I drop or create new columns.
Any advice on my approach would really help. Thanks.
If that's your JSON, then you can use json_normalize:
import json
import pandas as pd

with open('1.json', 'r') as f:
    data = json.load(f)

df = pd.json_normalize(data).drop(columns=['A']).rename(columns={'B': 'Date'})
print(df)
Date ID FeatureList.Salary FeatureList.Age FeatureList.Work Ex
0 Datetime stamp of record 123 100000 45 15
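To extend this to all 1,000 files, a minimal sketch (assuming the rootdir layout from your question and that every file holds one record like the sample) could loop over the files and concatenate the normalized rows:

import json
from pathlib import Path
import pandas as pd

rootdir = '/folderpath/filename'  # directory from the question
frames = []
for path in Path(rootdir).rglob('*.json'):
    with open(path) as f:
        frames.append(pd.json_normalize(json.load(f)))

# one row per employee, with the FeatureList.* fields expanded into flat columns
df_all = pd.concat(frames, ignore_index=True)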
I have a CSV file which contains labels and their translations in different languages:
name                   en_GB    de_DE
-------------------------------------
ElementsButtonAbort    Abort    Abbrechen
ElementsButtonConfirm  Confirm  Bestätigen
ElementsButtonDelete   Delete   Löschen
ElementsButtonEdit     Edit     Ändern
I want to convert this CSV into JSON into following pattern using Python:
{
"de_De": {
"translations":{
"ElementsButtonAbort": "Abbrechen"
}
},
"en_GB":{
"translations":{
"ElementsButtonAbort": "Abort"
}
}
}
How can I do this using Python?
Say your data is as such:
import pandas as pd
df = pd.DataFrame([["ElementsButtonAbort", "Abort", "Abbrechen"],
                   ["ElementsButtonConfirm", "Confirm", "Bestätigen"],
                   ["ElementsButtonDelete", "Delete", "Löschen"],
                   ["ElementsButtonEdit", "Edit", "Ändern"]],
                  columns=["name", "en_GB", "de_DE"])
Then, this might not be the best way to do it but at least it works:
df.set_index("name", drop=True, inplace=True)
translations = df.to_dict()
Now, if you want to get exactly the dictionary that you show as the desired output, you can do:
for language in translations.keys():
    labels = translations[language]
    translations[language] = {"translations": labels}
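Equivalently, just as a sketch of the same reshaping, the loop can be collapsed into a dict comprehension:

translations = {
    language: {"translations": labels}
    for language, labels in df.to_dict().items()
}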
Finally, if you wish to save your dictionary into JSON:
import json
with open('PATH/TO/YOUR/DIRECTORY/translations.json', 'w') as fp:
json.dump(translations, fp)
I have a CSV (which I turned into a pandas DataFrame) in which each row is a different JSON file. Each JSON file has exactly the same format and objects as the others, and each one represents a unique transaction (purchase). I would like to take this DataFrame and convert it into a DataFrame or Excel file in which each column represents an object from the JSON file and each row represents a transaction.
The JSON also contains arrays, in which case I would like to be able to retrieve each element of the array. Ideally I would like to be able to retrieve all possible objects from the JSON files and turn them into columns.
A simplified version of a row would be:
{
"source":{
"analyze":true,
"billing":{
"gender":null,
"name":"xxxxx",
"phones":[
{
"area_code":"xxxxx",
"country_code":"xxxxx",
"number":"xxxxx",
"phone_type":"xxxxx"
}
]
},
"created_at":"xxxxx",
"customer":{
"address":{
"city":"xxxxx",
"complement":"xxxxx",
"country":"xxxxx",
"neighborhood":"xxxxx",
"number":"xxxxx",
"state":"xxxxx",
"street":"xxxxx",
"zip_code":"xxxxx"
},
"date_of_birth":"xxxxx",
"documents":[
{
"document_type":"xxxxx",
"number":"xxxxx"
}
],
"email":"xxxxx",
"gender":xxxxx,
"name":"xxxxx",
"number_of_previous_orders":xxxxx,
"phones":[
{
"area_code":"xxxxx",
"country_code":"xxxxx",
"number":"xxxxx",
"phone_type":"xxxxx"
}
],
"register_date":xxxxx,
"register_id":"xxxxx"
},
"device":{
"ip":"xxxxx",
"lat":"xxxxx",
"lng":"xxxxx",
"platform":xxxxx,
"session_id":xxxxx
}
}
}
And my Python code:
import csv
import json
import pandas as pd
df = pd.read_csv(r"<name of csv file in which each row is a JSON file>")
A simplified version of my expected output would be a table with one column per JSON field and one row per transaction.
Do you mean something like this as the output? For example, to get area_code:
A_col area_code
0 {"source":{"analyze":true,"billing":{"gender":... xxxxx
first:
"gender":xxxxx, "number_of_previous_orders":xxxxx, "register_date":xxxxx, "platform":xxxxx, "session_id":xxxxx, should be double quoted
get the json document:
import json
import pandas as pd

newjson = []
with open('./example.json', 'r') as f:
    for line in f:
        line = line.strip()
        newjson.append(line)
format it to string:
jsonString = ''.join(newjson)
turn into python object:
jsonData = json.loads(jsonString)
extract the fields using dictionary operations and turn into pandas dataframe:
newDF = pd.DataFrame({"A_col": jsonString, "area_code": jsonData['source']['billing']['phones'][0]['area_code']}, index=[0])
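If you need this for every row of the CSV rather than a single document, a rough sketch (assuming a hypothetical column name json_col holding one JSON string per row, and that the unquoted xxxxx placeholders have already been fixed as noted above) would be:

import json
import pandas as pd

df = pd.read_csv("transactions.csv")  # hypothetical file name
parsed = df["json_col"].apply(json.loads)  # hypothetical column holding the JSON text

# expand every nested object into dotted columns, e.g. source.customer.address.city
flat = pd.json_normalize(parsed.tolist())

# list fields such as source.billing.phones remain Python lists; explode or
# normalize them separately if each element needs its own column
flat.to_excel("transactions.xlsx", index=False)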