I am not able to generate a proper CSV file using the code below, but when I query it individually I get the desired result. Below are my JSON file and code:
{
    "quiz": {
        "maths": {
            "q2": {
                "question": "12 - 8 = ?",
                "options": [
                    "1",
                    "2",
                    "3",
                    "4"
                ],
                "answer": "4"
            },
            "q1": {
                "question": "5 + 7 = ?",
                "options": [
                    "10",
                    "11",
                    "12",
                    "13"
                ],
                "answer": "12"
            }
        },
        "sport": {
            "q1": {
                "question": "Which one is correct team name in NBA?",
                "options": [
                    "New York Bulls",
                    "Los Angeles Kings",
                    "Golden State Warriros",
                    "Huston Rocket"
                ],
                "answer": "Huston Rocket"
            }
        }
    }
}
import json
import csv
from flatten_json import flatten

# Opening JSON file and loading the data
# into the variable data
with open('tempjson.json', 'r') as jsonFile:
    data = json.load(jsonFile)
    flattenData = flatten(data)

employee_data = flattenData

# now we will open a file for writing
data_file = open('data_files.csv', 'w')

# create the csv writer object
csv_writer = csv.writer(data_file)

# Counter variable used for writing
# headers to the CSV file
count = 0
for emp in employee_data:
    if count == 0:
        # Writing headers of CSV file
        header = emp
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    #csv_writer.writerow(employee_data.get(emp))

data_file.close()
Once the above code executes, I get the following:
I am not able to see what I am doing wrong. I am flattening my JSON file and then trying to convert it to CSV.
You can manipulate the JSON easily with pandas DataFrames and save it to a CSV.
I'm not sure what your desired CSV should look like, but the following code generates a CSV with columns question, options, and answer. It creates an index column with the name of the quiz and the question number in an alphabetically ordered list (your JSON was unordered). The code below will also keep working when more quizzes and questions are added.
Maybe converting it natively in Python is better performance-wise, but manipulating it with pandas makes it easier.
import pandas as pd

# create Pandas dataframe from JSON for easy manipulation
df = pd.read_json("tempjson.json")

# create result dataframe
df_result = pd.DataFrame()

# Get nested dict from each dataframe row
for index, row in df.iterrows():
    # Convert it into a new dataframe
    df_temp = pd.DataFrame.from_dict(df.loc[index]['quiz'], orient='index')
    # Add name of quiz to index
    df_temp.index = index + ' ' + df_temp.index
    # Append row result to final dataframe
    # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    df_result = pd.concat([df_result, df_temp])

# Optionally sort alphabetically so questions are in order
df_result.sort_index(inplace=True)

# convert dataframe to CSV
df_result.to_csv('quiz.csv')
Update on request: Export to CSV using flattened JSON:
import json
import csv
from flatten_json import flatten
import pandas as pd

# Opening JSON file and loading the data
# into the variable data
with open("tempjson.json", 'r') as jsonFile:
    data = json.load(jsonFile)
    flattenData = flatten(data)

df = pd.DataFrame.from_dict(flattenData, orient='index')

# convert dataframe to CSV
df.to_csv('quiz.csv', header=False)
This results in the following CSV (I'm not sure whether it matches your desired outcome, since you did not include the desired result in your question).
I am trying to convert a very long JSON file to CSV. I'm currently trying to use the code below to accomplish this.
import json
import csv

with open(r'G:\user\jsondata.json') as json_file:
    jsondata = json.load(json_file)

data_file = open(r'G:\user\jsonoutput.csv', 'w', newline='')
csv_writer = csv.writer(data_file)

count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())

data_file.close()
This code writes all the data to a CSV, but it only takes the keys from the first JSON object to use as the headers. That would be fine, except that further into the JSON there are additional keys in use, which causes the values to end up under the wrong columns. I was wondering if anyone could help me find a way to collect all the possible headers, and possibly insert NA when a line doesn't contain a value for that key.
The JSON file is similar to this:
[
{"time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "action": "TAKEN", "Country": "Germany", "Purchased": "YES", ...},
{"time": "1984-10-04:4:00", "dateOfevent": "1984-10-04", "action": "NOTTAKEN", "Country": "Germany", "Purchased": "NO", ...},
{"type": "A4", "time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "Country": "Germany", "typeOfevent": "H7", ...},
{...},
{...},
]
I've searched for possible solutions all over, but was unable to find anyone having a similar issue.
If you want to use the csv and json modules to do this, you can do it in two passes: the first pass collects the keys for the CSV file, and the second pass writes the rows. You also have to use a DictWriter, since the keys differ between records.
import json
import csv

with open('jsondata.json') as json_file:
    jsondata = json.load(json_file)

# stage 1 - populate column names from JSON
keys = []
for data in jsondata:
    for k in data.keys():
        if k not in keys:
            keys.append(k)

# stage 2 - write rows to CSV file
with open('jsonoutput.csv', 'w', newline='') as fout:
    csv_writer = csv.DictWriter(fout, fieldnames=keys)
    csv_writer.writeheader()
    for data in jsondata:
        csv_writer.writerow(data)
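If you want a literal NA in the cells for keys a record is missing (DictWriter writes an empty string by default), the writer's restval parameter covers that; only the stage 2 block changes:
# stage 2 - write rows, filling missing keys with "NA" via restval
with open('jsonoutput.csv', 'w', newline='') as fout:
    csv_writer = csv.DictWriter(fout, fieldnames=keys, restval='NA')
    csv_writer.writeheader()
    for data in jsondata:
        csv_writer.writerow(data)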
Could you try reading it normally and then converting it to CSV using .to_csv, like this:
import pandas as pd

df = pd.read_json(r'G:\user\jsondata')
#df = pd.json_normalize(df['Column Name']) #if you want to normalize it
df.to_csv('example.csv')
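Since the records have differing keys, another option is pandas.json_normalize, which lines every key up as a column and leaves NaN where a record has no value; a minimal sketch, assuming the file holds a top-level JSON array:
import json
import pandas as pd

with open(r'G:\user\jsondata.json') as f:
    records = json.load(f)

# every key across all records becomes a column; missing keys show up as NaN
df = pd.json_normalize(records)
df.to_csv('example.csv', index=False)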
I have to export all data from a JSON file to an Excel workbook. The JSON is very complex. The following is the structure:
"section": [
{
"name": "Sch1",
"subsection": [
{
"Sch1": {
"pg1_non_calendar_end_date": [
{
"column1": "AA"
}
],
"pg1_document_status_final": [
{
"column2": "XX"
}
],....
}
This continues for thousands of values.
Expected Output:
My system is Windows 10 with Python 3. I am using pandas to try to get the data the way I want, but the result comes out like this:
I am using the following code:
df2 = pd.DataFrame(books[3]['subsection'][0])
df2 = (
    df2["Sch1"]
    .apply(pd.Series)
    .merge(df2, left_index=True, right_index=True)
)
o = 'output.xlsx'
df2[0].to_excel(o, sheet_name='Sheet_name_1')
pandas has many ways to read JSON into DataFrames. For your use case, iterate over each sheet's data and load it using the following approach:
import json
import pandas as pd

# json.load expects an open file object, not a path string
with open("input.json") as f:
    input_data = json.load(f)

# use a single ExcelWriter so every sheet ends up in the same workbook
with pd.ExcelWriter("output.xlsx") as writer:
    for section in input_data["section"]:
        for subsection in section["subsection"]:
            sheet = json.dumps(subsection)
            sheet_df = pd.read_json(sheet, orient="records")
            sheet_df.to_excel(writer, sheet_name=section["name"])
Alternatively, you can look into the pd.DataFrame.from_records method for loading your data.
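For reference, a minimal from_records call on records shaped like the question's nested lists (a hypothetical illustration) would be:
import pandas as pd

# hypothetical records modeled on the question's nested lists
records = [{"column1": "AA"}, {"column2": "XX"}]
df = pd.DataFrame.from_records(records)
print(df)
#   column1 column2
# 0      AA     NaN
# 1     NaN      XX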
I have a CSV file which contains labels and their translations in different languages:
name                     en_GB      de_DE
-----------------------------------------------
ElementsButtonAbort      Abort      Abbrechen
ElementsButtonConfirm    Confirm    Bestätigen
ElementsButtonDelete     Delete     Löschen
ElementsButtonEdit       Edit       Ändern
I want to convert this CSV into JSON with the following pattern using Python:
{
    "de_DE": {
        "translations": {
            "ElementsButtonAbort": "Abbrechen"
        }
    },
    "en_GB": {
        "translations": {
            "ElementsButtonAbort": "Abort"
        }
    }
}
How can I do this using Python?
Say your data is as such:
import pandas as pd

df = pd.DataFrame([["ElementsButtonAbort", "Abort", "Abbrechen"],
                   ["ElementsButtonConfirm", "Confirm", "Bestätigen"],
                   ["ElementsButtonDelete", "Delete", "Löschen"],
                   ["ElementsButtonEdit", "Edit", "Ändern"]],
                  columns=["name", "en_GB", "de_DE"])
Then, this might not be the best way to do it but at least it works:
df.set_index("name", drop=True, inplace=True)
translations = df.to_dict()
Now, if you want to get exactly the dictionary that you show as the desired output, you can do:
for language in translations.keys():
    _ = translations[language]
    translations[language] = {}
    translations[language]["translations"] = _
Finally, if you wish to save your dictionary into JSON:
import json

with open('PATH/TO/YOUR/DIRECTORY/translations.json', 'w') as fp:
    json.dump(translations, fp)
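If the labels live in an actual CSV file rather than a hand-built DataFrame, a sketch using csv.DictReader builds the same structure directly; it assumes a comma-delimited file named labels.csv with a header row (name, en_GB, de_DE, ...):
import csv
import json

translations = {}
with open('labels.csv', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        name = row.pop('name')  # the remaining columns are the languages
        for language, text in row.items():
            translations.setdefault(language, {"translations": {}})["translations"][name] = text

with open('translations.json', 'w', encoding='utf-8') as fp:
    json.dump(translations, fp, ensure_ascii=False, indent=2)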
I have a CSV (which I turned into a pandas DataFrame) in which each row consists of a different JSON file. Each JSON file has the exact same format and objects as the others, and each one represents a unique transaction (purchase). I would like to take this DataFrame and convert it into a DataFrame or Excel file in which each column represents an object from the JSON file and each row represents a transaction.
The JSON also contains arrays, in which case I would like to be able to retrieve each element of the array. Ideally I would like to be able to retrieve all possible objects from the JSON files and turn them into columns.
A simplified version of a row would be:
{
    "source":{
        "analyze":true,
        "billing":{
            "gender":null,
            "name":"xxxxx",
            "phones":[
                {
                    "area_code":"xxxxx",
                    "country_code":"xxxxx",
                    "number":"xxxxx",
                    "phone_type":"xxxxx"
                }
            ]
        },
        "created_at":"xxxxx",
        "customer":{
            "address":{
                "city":"xxxxx",
                "complement":"xxxxx",
                "country":"xxxxx",
                "neighborhood":"xxxxx",
                "number":"xxxxx",
                "state":"xxxxx",
                "street":"xxxxx",
                "zip_code":"xxxxx"
            },
            "date_of_birth":"xxxxx",
            "documents":[
                {
                    "document_type":"xxxxx",
                    "number":"xxxxx"
                }
            ],
            "email":"xxxxx",
            "gender":xxxxx,
            "name":"xxxxx",
            "number_of_previous_orders":xxxxx,
            "phones":[
                {
                    "area_code":"xxxxx",
                    "country_code":"xxxxx",
                    "number":"xxxxx",
                    "phone_type":"xxxxx"
                }
            ],
            "register_date":xxxxx,
            "register_id":"xxxxx"
        },
        "device":{
            "ip":"xxxxx",
            "lat":"xxxxx",
            "lng":"xxxxx",
            "platform":xxxxx,
            "session_id":xxxxx
        }
    }
}
And my Python code:
import csv
import json
import pandas as pd
df = pd.read_csv(r"<name of csv file in which each row is a JSON file>")
A simplified version of my expected output would be something like:
Expected Output
Do you mean something like this as the output, for example to get area_code:
A_col area_code
0 {"source":{"analyze":true,"billing":{"gender":... xxxxx
First, the values in "gender":xxxxx, "number_of_previous_orders":xxxxx, "register_date":xxxxx, "platform":xxxxx, and "session_id":xxxxx should be double quoted so the document is valid JSON.
Get the JSON document:
newjson = []
with open('./example.json', 'r') as f:
    for line in f:
        line = line.strip()
        newjson.append(line)
Format it into a string:
jsonString = ''.join(newjson)
Turn it into a Python object:
jsonData = json.loads(jsonString)
Extract the fields using dictionary operations and turn them into a pandas DataFrame:
import pandas as pd

newDF = pd.DataFrame(
    {"A_col": jsonString,
     "area_code": jsonData['source']['billing']['phones'][0]['area_code']},
    index=[0],
)
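And if the goal is to turn every nested field into its own column (as the question asks), here is a sketch using pandas.json_normalize on the jsonData object from the previous step; the sep argument is an assumption:
import pandas as pd

# one row per transaction; nested objects become columns such as
# 'source_billing_name' or 'source_customer_address_city', while list
# fields like "phones" stay as Python lists in their cells
flat_df = pd.json_normalize(jsonData, sep='_')
print(flat_df.columns.tolist())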
I have a file 'test.json' which contains an array "rows" and a sub-array "allowed" that holds letters like "A", "B", etc. I want to modify the contents of that sub-array. How can I do that?
The test.json file is as follows:
{"rows": [
{
"Company": "google",
"allowed": ["A","B","C"]},#array containg 3 variables
{
"Company": "Yahoo",
"allowed": ["D","E","F"]#array contanig 3 variables
}
]}
But I want to modify the "allowed" array and update its third element to "LOOK" instead of "C", so that the resulting array looks like:
{"rows": [
{
"Company": "google",
"allowed": ["A","B","LOOK"]#array containg 3 variables
},
{
"Company": "Yahoo", #array containing 3 variables
"allowed": ["D","E","F"] #array containing 3 variables
}
]}
My program:
import json

with open('test.json') as f:
    data = json.load(f)

for row in data['rows']:
    a_dict = {row['allowed'][1]:"L"}

with open('test.json') as f:
    data = json.load(f)

data.update(a_dict)

with open('test.json', 'w') as f:
    json.dump(data, f, indent=2)
There are a couple of problems with your program as it is.
The first issue is you're not looking up the last element of your 'allowed' arrays:
a_dict = {row['allowed'][1]:"L"}
Remember, array indices start at zero, e.g.:
['Index 0', 'Index 1', 'Index 2']
But the main problem is when you walk over each row, you fetch the contents of
that row, but then don't do anything with it.
import json

with open('test.json') as f:
    data = json.load(f)

for row in data['rows']:
    a_dict = {row['allowed'][1]:"L"}
    # a_dict is twiddling its thumbs here...
    # until it's replaced by the next row's contents
...
It just gets replaced by the next row of the for loop, until you're left with the
final row all by itself in "a_dict", since the last one of course isn't overwritten by
anything. Which in your sample would be:
{'E': 'L'}
Next you load the original json data again (though, you don't need to -- it's
still in your data variable, unmodified), and add a_dict to it:
with open('test.json') as f:
    data = json.load(f)

data.update(a_dict)
This leaves you with this:
{
    "rows": [
        {
            "Company": "google",
            "allowed": ["A", "B", "C"]
        },
        {
            "Company": "Yahoo",
            "allowed": ["D", "E", "F"]
        }
    ],
    "E": "L"
}
So, to fix this, you need to:
1. Point at the correct 'allowed' index (in your case, that'll be [2]), and
2. Modify the rows, instead of copying them out and merging them back into data.
In your for loop, each row in data['rows'] is pointing at the value in data, so you can update the contents of row, and your work is done.
One thing I wasn't clear on was whether you meant to update all rows (implied by your looping over all rows), or just update the first row (as shown in your example desired output).
So here's a sample fix which works in either case:
import json

modify_first_row_only = True

with open('test.json', 'r') as f:
    data = json.load(f)

rows = data['rows']

if modify_first_row_only:
    rows[0]['allowed'][2] = 'LOOK'
else:
    for row in rows:
        row['allowed'][2] = 'LOOK'

with open('test.json', 'w') as f:
    json.dump(data, f, indent=2)
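As a quick optional check after running the script, reloading the file should show the updated first row:
import json

# reload the file and confirm the edit took effect
with open('test.json') as f:
    print(json.load(f)['rows'][0]['allowed'])
# expected: ['A', 'B', 'LOOK']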