Python JSON dataframe.to_json() output needs fixing

I'm trying to find out whether there is an out-of-the-box way to create the JSON file as required, without having to re-open the file and massage it further. The JSON output I get from df.to_json("file", orient='index', indent=2, date_format='iso') uses "0", "1", "2", etc. as the names of the top-level objects. The requirement is not to have those keys. Secondly, I need to name the root object.
CSV FILE (INPUT)
vendor issuer
honda.com DigiCert
toyota.com GoDaddy
import pandas as pd
df = pd.read_csv('test_input.csv', na_filter=False, skiprows=0)
df.to_json("test_out.json", orient='index', indent=2, date_format='iso')
OUTPUT
{
"0":{
"vendor":"honda-us.com",
"issuer":"Amazon",
"licensed":"10\/11\/2021 16:14",
"expiring":"2\/9\/2023 16:14",
"remaining":57
},
EXPECTING OUTPUT TO BE
{
"vendorslist": [
{
"vendor":"honda-us.com",
"issuer":"Amazon",
"licensed":"10\/11\/2021 16:14",
"expiring":"2\/9\/2023 16:14",
"remaining":57
}
]
},

My recommendation would be to build this structure yourself.
import json
import pandas as pd
df = pd.read_csv('test_input.csv', na_filter=False, skiprows=0)
data = {"vendorslist": df.to_dict(orient='records')}
with open("test_out.json", "w") as f:
    json.dump(data, f, indent=2, default=str)
This may not give you the exact answer you're after, but it should be a good starting point :)
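To check the shape of the result without touching any files, the same idea can be tried on a small in-memory frame. This is only a sketch: the two rows below are a hypothetical stand-in for test_input.csv, built from the sample table in the question.

```python
import json

import pandas as pd

# Hypothetical stand-in for the contents of test_input.csv
df = pd.DataFrame(
    {"vendor": ["honda.com", "toyota.com"], "issuer": ["DigiCert", "GoDaddy"]}
)

# orient='records' yields a list with one dict per row, with no "0", "1", ... keys
data = {"vendorslist": df.to_dict(orient="records")}

# default=str is a fallback for values json cannot serialize directly (e.g. timestamps)
text = json.dumps(data, indent=2, default=str)
print(text)
```

Unlike to_json(), json.dump() and json.dumps() do not escape forward slashes, so a date such as 10/11/2021 is written without the backslashes seen in the question's output.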

Related

How to convert CSV to multi level nested JSON in Python

How to convert CSV to nested JSON in Python
This is related to something like this.
I want to convert a flat DataFrame to nested JSON.
I have a CSV file (sales_2020) in the following format:
and I want JSON like this:
I tried the link above and was able to add one level using this:
import pandas as pd
df = pd.read_csv('your_file.csv')
df['sales_2020'] = df[['computer','mobile']].to_dict('records')
out = df[['a','sales_2020']].to_json(orient='records', indent=4)
But I was unable to add one more level to it, i.e. the sales for a specific month. I tried the solution below, but it doesn't work:
df['jan']['sales_2020'] =df[['computer','mobile']].to_dict('records')
Please help me out.

I guess what you want is orient='index'
df['sales_2020'] = df[['computer','mobile']].to_dict('records')
out = df.set_index('Month')[['sales_2020']].to_json(orient='index', indent=4)
{
"jan":{
"sales_2020":{
"computer":10,
"mobile":5
}
},
"feb":{
"sales_2020":{
"computer":8,
"mobile":2
}
},
"march":{
"sales_2020":{
"computer":6,
"mobile":12
}
}
}
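For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The input frame is an assumption: it is reconstructed from the numbers in the output, since the original CSV is not shown.

```python
import json

import pandas as pd

# Assumed input, reconstructed from the JSON output shown above
df = pd.DataFrame(
    {
        "Month": ["jan", "feb", "march"],
        "computer": [10, 8, 6],
        "mobile": [5, 2, 12],
    }
)

# One dict per row becomes the nested "sales_2020" object
df["sales_2020"] = df[["computer", "mobile"]].to_dict("records")

# With orient='index', each index label (the month) becomes a top-level key
out = df.set_index("Month")[["sales_2020"]].to_json(orient="index", indent=4)
parsed = json.loads(out)
print(out)
```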

Extract only ids from json files and read them into a csv

I have a folder including multiple JSON files. Here is a sample JSON file (all JSON files have the same structure):
{
"url": "http://www.lulu.com/shop/alfred-d-byrd/in-the-fire-of-dawn/paperback/product-1108729.html",
"label": "true",
"body": "SOME TEXT HERE",
"ids": [
"360175950098468864",
"394147879201148929"
]
}
I'd like to extract only ids and write them into a CSV file. Here is my code:
import pandas as pd
import os
from os import path
import glob
import csv
import json
input_path = "TEST/True_JSON"
for file in glob.glob(os.path.join(input_path,'*.json')):
    with open(file,'rt') as json_file:
        json_data = pd.read_json(json_file) #reading json into a pandas dataframe
        ids = json_data[['ids']] #select only "response_tweet_ids"
        ids.to_csv('TEST/ids.csv',encoding='utf-8', header=False, index=False)
        print(ids)
PROBLEM: The above code writes some ids into a CSV file. However, it doesn't write all of them. Also, the output CSV file (ids.csv) contains some ids that don't exist in any of my JSON files!
I would really appreciate it if someone could help me understand where the problem is.
Thank you.

Another way is to collect the ids from every file in the folder into one list and write it to the output file only once. Here is an example:
import glob
import os
import pandas as pd
input_path = "TEST/True_JSON"
ids = []
for file in glob.glob(os.path.join(input_path, '*.json')):
    with open(file, 'rt') as json_file:
        json_data = pd.read_json(json_file)  # read the JSON into a pandas DataFrame
    ids.extend(json_data['ids'].to_list())  # collect the values of the "ids" column
# write all collected ids once, after the loop
pd.DataFrame(
    ids, columns=['ids']
).to_csv('TEST/ids.csv', encoding='utf-8', header=False, index=False)
print(ids)
Please read the answer by @lemonhead for more details.

I think you have two main issues here:
First, pandas seems to read in ids off by one in some cases, probably because it internally reads the value as a float, then converts it to an int64 and floors it. See here for a similar issue.
To see this:
> import io
> import pandas as pd
> x = '''
{
"url": "http://www.lulu.com/shop/alfred-d-byrd/in-the-fire-of-dawn/paperback/product-1108729.html",
"label": "true",
"body": "SOME TEXT HERE",
"ids": [
"360175950098468864",
"394147879201148929"
]
}
'''
> print(pd.read_json(io.StringIO(x)))
# outputs:
url label body ids
0 http://www.lulu.com/shop/alfred-d-byrd/in-the-... true SOME TEXT HERE 360175950098468864
1 http://www.lulu.com/shop/alfred-d-byrd/in-the-... true SOME TEXT HERE 394147879201148928
Note the off by one error with 394147879201148929! AFAIK, one quick way to obviate this in your case is just to tell pandas to read everything in as a string, e.g.
pd.read_json(json_file, dtype='string')
Second, you are looping through your JSON files and writing each one to the same CSV file. However, by default, pandas opens the file in 'w' mode, which overwrites any previous data in the file. If you open it in append mode ('a') instead, that should do what you intended:
ids.to_csv('TEST/ids.csv',encoding='utf-8', header=False, index=False, mode='a')
In context:
for file in glob.glob(os.path.join(input_path,'*.json')):
    with open(file,'rt') as json_file:
        json_data = pd.read_json(json_file, dtype='string') #reading json into a pandas dataframe
        ids = json_data[['ids']] #select only "response_tweet_ids"
        ids.to_csv('TEST/ids.csv',encoding='utf-8', header=False, index=False, mode='a')
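The off-by-one value shown earlier is consistent with a round trip through a 64-bit float, and that part can be checked in plain Python without pandas. This is only a sketch of the rounding effect. Whether pandas takes exactly this path internally is the answer's guess, not something the snippet proves.

```python
import json

ids_json = '{"ids": ["360175950098468864", "394147879201148929"]}'

# The json module keeps the ids as the strings they were written as
ids = json.loads(ids_json)["ids"]

# Forcing each id through a float and back mimics the suspected lossy step.
# Near 4e17, adjacent 64-bit floats are 64 apart, so most integers
# in that range cannot be represented exactly.
round_tripped = [int(float(i)) for i in ids]

for original, converted in zip(ids, round_tripped):
    print(original, converted, int(original) == converted)
```

The first id happens to be a multiple of 64 and survives unchanged, while the second is rounded down by one. That is why only some ids in the CSV looked wrong.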
Overall though, unless you are getting something else from pandas here, why not just use the raw json and csv libraries? The following would do the same without the pandas dependency:
import os
from os import path
import glob
import csv
import json
input_path = "TEST/True_JSON"
all_ids = []
for file in glob.glob(os.path.join(input_path,'*.json')):
    with open(file,'rt') as json_file:
        json_data = json.load(json_file)
        ids = json_data['ids']
        all_ids.extend(ids)
print(all_ids)
# write all ids to a csv file
# you could also remove duplicates or other post-processing at this point
with open('TEST/ids.csv', mode='wt', newline='') as fobj:
    writer = csv.writer(fobj)
    for row in all_ids:
        writer.writerow([row])

By default, dataframe.to_csv() overwrites the file. So each time through the loop you replace the file with the IDs from that input file, and the final result is the IDs from the last file.
Use the mode='a' argument to append to the CSV file instead of overwriting.
ids.to_csv(
'TEST/ids.csv', encoding='utf-8', header=False, index=False,
mode='a'
)
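A small sketch of the append behaviour, using a temporary directory so nothing on disk is touched. The two frames are hypothetical stand-ins for the ids column read from two JSON files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the "ids" column of two input files
batches = [
    pd.DataFrame({"ids": ["101", "102"]}),
    pd.DataFrame({"ids": ["103"]}),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "ids.csv")
    for batch in batches:
        # mode='a' adds rows to the end of the file instead of replacing it
        batch.to_csv(out_path, encoding="utf-8", header=False, index=False, mode="a")
    with open(out_path, encoding="utf-8") as f:
        written = f.read().split()

print(written)
```

One thing to keep in mind: because the file is appended to, running the script a second time adds the same ids again unless the output file is deleted first.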

Not able to generate a proper csv file from JSON using python

I am not able to generate a proper CSV file using the code below, but when I query the values individually, I get the desired result. Below are my JSON file and my code:
{
"quiz": {
"maths": {
"q2": {
"question": "12 - 8 = ?",
"options": [
"1",
"2",
"3",
"4"
],
"answer": "4"
},
"q1": {
"question": "5 + 7 = ?",
"options": [
"10",
"11",
"12",
"13"
],
"answer": "12"
}
},
"sport": {
"q1": {
"question": "Which one is correct team name in NBA?",
"options": [
"New York Bulls",
"Los Angeles Kings",
"Golden State Warriros",
"Huston Rocket"
],
"answer": "Huston Rocket"
}
}
}
}
import json
import csv
from flatten_json import flatten
# Opening JSON file and loading the data
# into the variable data
with open('tempjson.json', 'r') as jsonFile:
    data = json.load(jsonFile)
flattenData=flatten(data)
employee_data=flattenData
# now we will open a file for writing
data_file = open('data_files.csv', 'w')
# create the csv writer object
csv_writer = csv.writer(data_file)
# Counter variable used for writing
# headers to the CSV file
count = 0
for emp in employee_data:
    if count == 0:
        # Writing headers of CSV file
        header = emp
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    #csv_writer.writerow(employee_data.get(emp))
data_file.close()
Once the above code executes, I get the information below:
I don't understand what I am doing wrong. I am flattening my JSON file and then trying to convert it to CSV.

You can manipulate the JSON easily with Pandas Dataframes and save it to a CSV.
I'm not sure what your desired CSV should look like, but the following code generates a CSV with the columns question, options, and answer. It also generates an index column containing the name of the quiz and the question number, sorted alphabetically (your JSON was unordered). The code below will also work when more quizzes and questions are added.
Converting it natively in Python may perform better, but doing the manipulation with pandas makes it easier.
import pandas as pd
# create Pandas dataframe from JSON for easy manipulation
df = pd.read_json("tempjson.json")
# collect one dataframe per quiz
frames = []
# Get nested dict from each dataframe row
for index, row in df.iterrows():
    # Convert it into a new dataframe, one row per question
    df_temp = pd.DataFrame.from_dict(row['quiz'], orient='index')
    # Add name of quiz to index
    df_temp.index = index + ' ' + df_temp.index
    # Keep the per-quiz result for the final dataframe
    frames.append(df_temp)
# Combine the results (DataFrame.append was removed in pandas 2.0)
df_result = pd.concat(frames)
# Optionally sort alphabetically so questions are in order
df_result.sort_index(inplace=True)
# convert dataframe to CSV
df_result.to_csv('quiz.csv')
Update on request: Export to CSV using flattened JSON:
import json
import csv
from flatten_json import flatten
import pandas as pd
# Opening JSON file and loading the data
# into the variable data
with open("tempjson.json", 'r') as jsonFile:
    data = json.load(jsonFile)
flattenData=flatten(data)
df = pd.DataFrame.from_dict(flattenData, orient='index')
# convert dataframe to CSV
df.to_csv('quiz.csv', header=False)
Results in the following CSV (Not sure what your desired outcome is since you did not provide the desired result in your question).
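If one row per question is what is wanted, the same table can also be built with the standard library alone. This is a sketch under assumptions: the column names quiz and id, and joining the options with a "|" separator, are choices made here for illustration and are not part of the question. The data is an abbreviated copy of the question's JSON.

```python
import csv
import io

# Abbreviated copy of the quiz JSON from the question, already parsed
quiz_data = {
    "quiz": {
        "maths": {
            "q2": {"question": "12 - 8 = ?", "options": ["1", "2", "3", "4"], "answer": "4"},
            "q1": {"question": "5 + 7 = ?", "options": ["10", "11", "12", "13"], "answer": "12"},
        },
        "sport": {
            "q1": {
                "question": "Which one is correct team name in NBA?",
                "options": ["New York Bulls", "Los Angeles Kings"],
                "answer": "Huston Rocket",
            }
        },
    }
}


def quiz_rows(data):
    """Yield one flat dict per question, sorted by quiz name and question id."""
    for quiz_name, questions in sorted(data["quiz"].items()):
        for question_id, item in sorted(questions.items()):
            yield {
                "quiz": quiz_name,
                "id": question_id,
                "question": item["question"],
                "options": "|".join(item["options"]),  # assumed separator
                "answer": item["answer"],
            }


rows = list(quiz_rows(quiz_data))

# Write to an in-memory buffer; swap in open('quiz.csv', 'w', newline='') for a real file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["quiz", "id", "question", "options", "answer"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```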

How to convert columns from CSV file to json such that key and value pair are from different columns of the CSV using python?

I have a CSV file which contains labels and their translations in different languages:
name                    en_GB     de_DE
-----------------------------------------------
ElementsButtonAbort     Abort     Abbrechen
ElementsButtonConfirm   Confirm   Bestätigen
ElementsButtonDelete    Delete    Löschen
ElementsButtonEdit      Edit      Ändern
I want to convert this CSV into JSON with the following pattern using Python:
{
"de_DE": {
"translations":{
"ElementsButtonAbort": "Abbrechen"
}
},
"en_GB":{
"translations":{
"ElementsButtonAbort": "Abort"
}
}
}
How can I do this using Python?

Say your data is as such:
import pandas as pd
df = pd.DataFrame([["ElementsButtonAbort", "Abort", "Abbrechen"],
                   ["ElementsButtonConfirm", "Confirm", "Bestätigen"],
                   ["ElementsButtonDelete", "Delete", "Löschen"],
                   ["ElementsButtonEdit", "Edit", "Ändern"]],
columns=["name", "en_GB", "de_DE"])
Then, this might not be the best way to do it but at least it works:
df.set_index("name", drop=True, inplace=True)
translations = df.to_dict()
Now, if you want to get exactly the dictionary that you show as the desired output, you can do:
for language in translations.keys():
    _ = translations[language]
    translations[language] = {}
    translations[language]["translations"] = _
Finally, if you wish to save your dictionary into JSON:
import json
with open('PATH/TO/YOUR/DIRECTORY/translations.json', 'w') as fp:
    json.dump(translations, fp)
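The wrapping step can also be written as a single dict comprehension. The sketch below uses two of the sample rows and serializes to a string instead of a file. Passing ensure_ascii=False is an addition made here so the umlauts are written as-is rather than as \u escapes; it is not part of the answer above.

```python
import json

import pandas as pd

df = pd.DataFrame(
    [
        ["ElementsButtonAbort", "Abort", "Abbrechen"],
        ["ElementsButtonConfirm", "Confirm", "Bestätigen"],
    ],
    columns=["name", "en_GB", "de_DE"],
)

# to_dict() on the name-indexed frame gives {language: {label: translation}}
labels_by_language = df.set_index("name").to_dict()

# Wrap each language's labels in a "translations" object
nested = {
    language: {"translations": labels}
    for language, labels in labels_by_language.items()
}

json_text = json.dumps(nested, indent=2, ensure_ascii=False)
print(json_text)
```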

How to extract objects from a csv / pandas dataframe of JSON files?

I have a CSV (which I turned into a pandas DataFrame) in which each row consists of a different JSON document. Every JSON document has exactly the same format and objects as the others, and each one represents a unique transaction (purchase). I would like to convert this DataFrame into a DataFrame or Excel file in which each column represents an object from the JSON and each row represents one transaction.
The JSON also contains arrays, in which case I would like to be able to retrieve each element of the array. Ideally, I would like to retrieve all possible objects from the JSON documents and turn them into columns.
A simplified version of a row would be:
{
"source":{
"analyze":true,
"billing":{
"gender":null,
"name":"xxxxx",
"phones":[
{
"area_code":"xxxxx",
"country_code":"xxxxx",
"number":"xxxxx",
"phone_type":"xxxxx"
}
]
},
"created_at":"xxxxx",
"customer":{
"address":{
"city":"xxxxx",
"complement":"xxxxx",
"country":"xxxxx",
"neighborhood":"xxxxx",
"number":"xxxxx",
"state":"xxxxx",
"street":"xxxxx",
"zip_code":"xxxxx"
},
"date_of_birth":"xxxxx",
"documents":[
{
"document_type":"xxxxx",
"number":"xxxxx"
}
],
"email":"xxxxx",
"gender":xxxxx,
"name":"xxxxx",
"number_of_previous_orders":xxxxx,
"phones":[
{
"area_code":"xxxxx",
"country_code":"xxxxx",
"number":"xxxxx",
"phone_type":"xxxxx"
}
],
"register_date":xxxxx,
"register_id":"xxxxx"
},
"device":{
"ip":"xxxxx",
"lat":"xxxxx",
"lng":"xxxxx",
"platform":xxxxx,
"session_id":xxxxx
}
}
}
And my Python code:
import csv
import json
import pandas as pd
df = pd.read_csv(r"<name of csv file in which each row is a JSON file>")
A simplified version of my expected output would be something like:

You mean something like this as the output, for example to get area_code:
A_col area_code
0 {"source":{"analyze":true,"billing":{"gender":... xxxxx
First, the values in "gender":xxxxx, "number_of_previous_orders":xxxxx, "register_date":xxxxx, "platform":xxxxx, "session_id":xxxxx should be double-quoted, otherwise the JSON is not valid.
Get the JSON document:
newjson = []
with open('./example.json', 'r') as f:
    for line in f:
        line = line.strip()
        newjson.append(line)
format it to string:
jsonString = ''.join(newjson)
turn into python object:
jsonData = json.loads(jsonString)
extract the fields using dictionary operations and turn into pandas dataframe:
newDF = pd.DataFrame({"A_col": jsonString, "area_code": jsonData['source']['billing']['phones'][0]['area_code']}, index=[0])
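To turn every nested object into its own column rather than picking fields one by one, pandas' json_normalize may be a better fit. The sketch below assumes each cell of the CSV column has already been read as a JSON string. The two rows and their values are made up and heavily shortened from the structure in the question.

```python
import json

import pandas as pd

# Hypothetical rows: each string stands for one cell of the CSV column holding JSON
raw_rows = [
    '{"source": {"analyze": true, "billing": {"name": "Ann", "phones": '
    '[{"area_code": "11", "number": "5550001"}]}, "device": {"ip": "10.0.0.1"}}}',
    '{"source": {"analyze": false, "billing": {"name": "Bob", "phones": '
    '[{"area_code": "21", "number": "5550002"}, {"area_code": "31", "number": "5550003"}]}, '
    '"device": {"ip": "10.0.0.2"}}}',
]
records = [json.loads(row) for row in raw_rows]

# One row per transaction; nested objects become dotted column names
flat = pd.json_normalize(records)
print(flat.columns.tolist())

# One row per element of the nested phones array
phones = pd.json_normalize(records, record_path=["source", "billing", "phones"])
print(phones)
```

Arrays such as phones stay as Python lists in the flat frame, so a second call with record_path is one way to get each array element on its own row.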
