CSV to multi-level JSON structure - Python

I have the following csv file (1.csv):
"STUB_1","current_week","previous_week","weekly_diff"
"Crude Oil",1184.951,1191.649,-6.698
I need to convert it to the following JSON:
json_body = [
    {
        "measurement": "Crude Oil",
        "fields": {
            "weekly_diff": -6.698,
            "current_week": 1184.951,
            "previous_week": 1191.649
        }
    }
]
import pandas as pd

df = pd.read_csv("1.csv")
df = df.rename(columns={'STUB_1': 'measurement'})
j = (df.groupby(['measurement'], as_index=True)
       .apply(lambda x: x[['current_week', 'previous_week', 'weekly_diff']].to_dict('r'))
       .reset_index()
       .rename(columns={0: 'fields'})
       .to_json(orient='records'))
print(j)
output:
[
    {
        "measurement": "Crude Oil",
        "fields": [    # extra bracket
            {
                "weekly_diff": -6.698,
                "current_week": 1184.951,
                "previous_week": 1191.649
            }
        ]              # extra bracket
    }
]
which is almost what I need, but with the extra [ ].
Can anyone tell me what I did wrong? Thank you!

Don't use pandas for this. You would have to do a lot of manual unraveling to turn your table data into a hierarchical structure, so skip the middleman and let the built-in csv and json modules do the task for you, e.g.:
import csv
import json

with open("1.csv", newline="") as f:  # open your CSV file for reading
    reader = csv.DictReader(f, quoting=csv.QUOTE_NONNUMERIC)  # DictReader for convenience
    data = [{"measurement": r.pop("STUB_1", None), "fields": r} for r in reader]  # convert!

data_json = json.dumps(data, indent=4)  # finally, serialize the data to JSON
print(data_json)
and you get:
[
    {
        "measurement": "Crude Oil",
        "fields": {
            "current_week": 1184.951,
            "previous_week": 1191.649,
            "weekly_diff": -6.698
        }
    }
]
However, keep in mind that if you have multiple entries with the same STUB_1 value, only the latest will be kept. Otherwise you would have to store your fields as a list, which brings you back to your original problem with the data.
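If you do need to keep repeated STUB_1 values, a minimal sketch (using an inline sample in place of 1.csv, with a made-up second row) that groups every row under its measurement with collections.defaultdict:

```python
import csv
import io
import json
from collections import defaultdict

# Inline stand-in for 1.csv, with a duplicated STUB_1 value (second row is made up)
sample = ('"STUB_1","current_week","previous_week","weekly_diff"\n'
          '"Crude Oil",1184.951,1191.649,-6.698\n'
          '"Crude Oil",1190.0,1184.951,5.049\n')

# Collect every row under its measurement instead of keeping only the last one
grouped = defaultdict(list)
reader = csv.DictReader(io.StringIO(sample), quoting=csv.QUOTE_NONNUMERIC)
for r in reader:
    grouped[r.pop("STUB_1")].append(r)

data = [{"measurement": m, "fields": fields} for m, fields in grouped.items()]
print(json.dumps(data, indent=4))
```

Note that this reintroduces the list under fields, which is exactly the shape the original pandas attempt produced, so it only makes sense when duplicates really occur.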
A quick note on how it does what it does. First we create a csv.DictReader, a convenience reader that maps each row's entries to the header fields. It also uses quoting=csv.QUOTE_NONNUMERIC to ensure automatic conversion to floats for all non-quoted fields in your CSV. Then the list comprehension reads row by row from the reader and creates a new dict for each row: the measurement key contains the STUB_1 entry (which is immediately removed from the row with dict.pop()) and fields contains the remaining entries of the row. Finally, the json module serializes this list into the JSON that you want.
Also, keep in mind that JSON objects (and Python dicts before 3.7) don't guarantee the order of their elements, so your measurement entry might appear after the fields entry, and the same goes for the sub-entries of fields. Order shouldn't matter anyway (except for a few very specific cases), but if you want to control it you can use collections.OrderedDict to build your dictionaries in the order you prefer to see once serialized to JSON.
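For example, a sketch of building one entry with OrderedDict so that measurement always serializes before fields:

```python
import json
from collections import OrderedDict

fields = {"weekly_diff": -6.698, "current_week": 1184.951, "previous_week": 1191.649}

# OrderedDict preserves insertion order on any Python version,
# so "measurement" is always serialized before "fields"
entry = OrderedDict([("measurement", "Crude Oil"), ("fields", fields)])
json_body = json.dumps([entry], indent=4)
print(json_body)
```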

Related

JSON to CSV: keys to header issue

I am trying to convert a very long JSON file to CSV. I'm currently trying to use the code below to accomplish this.
import json
import csv

with open(r'G:\user\jsondata.json') as json_file:
    jsondata = json.load(json_file)

data_file = open(r'G:\user\jsonoutput.csv', 'w', newline='')
csv_writer = csv.writer(data_file)

count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())

data_file.close()
This code manages to write all the data to a CSV, but it only takes the keys from the first JSON object to use as the headers in the CSV. This would be fine, but further into the JSON there are more keys in use, which causes the values to become disorganized. I was wondering if anyone could help me find a way to collect all the possible headers, and possibly insert NA when a line doesn't contain a given key or any values for that key.
The JSON file is similar to this:
[
    {"time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "action": "TAKEN", "Country": "Germany", "Purchased": "YES", ...},
    {"time": "1984-10-04:4:00", "dateOfevent": "1984-10-04", "action": "NOTTAKEN", "Country": "Germany", "Purchased": "NO", ...},
    {"type": "A4", "time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "Country": "Germany", "typeOfevent": "H7", ...},
    {...},
    {...},
]
I've searched for possible solutions all over, but was unable to find anyone having a similar issue.
If you want to use the csv and json modules for this, you can do it in two passes. The first pass collects the keys for the CSV file and the second pass writes the rows to the CSV file. Also, you must use a DictWriter, since the keys differ between records.
import json
import csv

with open('jsondata.json') as json_file:
    jsondata = json.load(json_file)

# stage 1 - populate column names from JSON
keys = []
for data in jsondata:
    for k in data.keys():
        if k not in keys:
            keys.append(k)

# stage 2 - write rows to CSV file
with open('jsonoutput.csv', 'w', newline='') as fout:
    csv_writer = csv.DictWriter(fout, fieldnames=keys)
    csv_writer.writeheader()
    for data in jsondata:
        csv_writer.writerow(data)
Could you try to read it normally, and then convert it to CSV using .to_csv, like this:
import pandas as pd

df = pd.read_json(r'G:\user\jsondata')
#df = pd.json_normalize(df['Column Name'])  # if you want to normalize it
df.to_csv('example.csv')

Using Python to convert JSON to CSV

I have tried a few different ways of using pandas to import my JSON to a CSV file.
import pandas as pd

df = pd.read_json("CDMP_E2.json")
df.to_csv("CDMP_Output.csv")
The problem is when I run that code it makes the output all in one "column".
The column header shows up as Credit-NoSQL.
Then the data in the column is everything from each "object"
'date':'2021-08-01','type':'CARD','amount':'100'
So it looks like this:
Credit-NoSQL
'date':'2021-08-01','type':'CARD','amount':'100'
I would instead expect to see date, type and amount as the headers instead.
account  date        type  amount  returneddate
ABCD     2021-08-01  CARD  100
EFGHI    2021-08-01  CARD  150     2021-08-04
My JSON file looks like this:
[
    {
        "Credit-NoSQL": {
            "account": "ABCD",
            "date": "2021-08-01",
            "type": "CARD",
            "amount": "100"
        }
    },
    {
        "Credit-NoSQL": {
            "account": "EFGHI",
            "date": "2021-08-02",
            "type": "CARD",
            "amount": "150",
            "returneddate": "2021-08-04"
        }
    }
]
so I am not sure if it is the way my JSON file is set up, with its list and such, or if I am missing something in my Python command. I am new to Python and still learning, so I am at a loss as to what I can do next.
No need to use pandas for this.
import json, csv

with open("CDMP_E2.json") as json_file:
    data = [item['Credit-NoSQL'] for item in json.load(json_file)]

# Get the union of all dictionary keys
fieldnames = set()
for row in data:
    fieldnames |= row.keys()

with open("CDMP_Output.csv", "w", newline="") as csv_file:
    cwrite = csv.DictWriter(csv_file, fieldnames=fieldnames)
    cwrite.writeheader()
    cwrite.writerows(data)

How to convert columns from CSV file to json such that key and value pair are from different columns of the CSV using python?

I have a CSV file which contains labels and their translations in different languages:
name                   en_GB     de_DE
-----------------------------------------------
ElementsButtonAbort    Abort     Abbrechen
ElementsButtonConfirm  Confirm   Bestätigen
ElementsButtonDelete   Delete    Löschen
ElementsButtonEdit     Edit      Ändern
I want to convert this CSV into JSON with the following pattern using Python:
{
    "de_DE": {
        "translations": {
            "ElementsButtonAbort": "Abbrechen"
        }
    },
    "en_GB": {
        "translations": {
            "ElementsButtonAbort": "Abort"
        }
    }
}
How can I do this using Python?
Say your data is as such:
import pandas as pd

df = pd.DataFrame([["ElementsButtonAbort", "Abort", "Abbrechen"],
                   ["ElementsButtonConfirm", "Confirm", "Bestätigen"],
                   ["ElementsButtonDelete", "Delete", "Löschen"],
                   ["ElementsButtonEdit", "Edit", "Ändern"]],
                  columns=["name", "en_GB", "de_DE"])
Then, this might not be the best way to do it but at least it works:
df.set_index("name", drop=True, inplace=True)
translations = df.to_dict()
Now, if you want to get exactly the dictionary that you show as the desired output, you can do:
for language in translations:
    translations[language] = {"translations": translations[language]}
Finally, if you wish to save your dictionary into JSON:
import json

with open('PATH/TO/YOUR/DIRECTORY/translations.json', 'w') as fp:
    json.dump(translations, fp)

How to extract objects from a csv / pandas dataframe of JSON files?

I have a CSV (which I turned into a pandas dataframe) in which each row consists of a different JSON file. Each JSON file has the exact same format and objects as the others, and each one represents a unique transaction (purchase). I would like to take this dataframe and convert it into a dataframe or Excel file in which each column represents an object from the JSON file and each row represents a transaction.
The JSON also contains arrays, in which case I would like to be able to retrieve each element of the array. Ideally I would like to be able to retrieve all possible objects from the JSON files and turn them into columns.
A simplified version of a row would be:
{
    "source": {
        "analyze": true,
        "billing": {
            "gender": null,
            "name": "xxxxx",
            "phones": [
                {
                    "area_code": "xxxxx",
                    "country_code": "xxxxx",
                    "number": "xxxxx",
                    "phone_type": "xxxxx"
                }
            ]
        },
        "created_at": "xxxxx",
        "customer": {
            "address": {
                "city": "xxxxx",
                "complement": "xxxxx",
                "country": "xxxxx",
                "neighborhood": "xxxxx",
                "number": "xxxxx",
                "state": "xxxxx",
                "street": "xxxxx",
                "zip_code": "xxxxx"
            },
            "date_of_birth": "xxxxx",
            "documents": [
                {
                    "document_type": "xxxxx",
                    "number": "xxxxx"
                }
            ],
            "email": "xxxxx",
            "gender": xxxxx,
            "name": "xxxxx",
            "number_of_previous_orders": xxxxx,
            "phones": [
                {
                    "area_code": "xxxxx",
                    "country_code": "xxxxx",
                    "number": "xxxxx",
                    "phone_type": "xxxxx"
                }
            ],
            "register_date": xxxxx,
            "register_id": "xxxxx"
        },
        "device": {
            "ip": "xxxxx",
            "lat": "xxxxx",
            "lng": "xxxxx",
            "platform": xxxxx,
            "session_id": xxxxx
        }
    }
}
And my Python code:
import csv
import json
import pandas as pd
df = pd.read_csv(r"<name of csv file in which each row is a JSON file>")
A simplified version of my expected output would be something like:
Expected Output
You mean something like this as the output, for example to get area_code:
   A_col                                              area_code
0  {"source":{"analyze":true,"billing":{"gender":...  xxxxx
First, "gender":xxxxx, "number_of_previous_orders":xxxxx, "register_date":xxxxx, "platform":xxxxx and "session_id":xxxxx should be double-quoted.
Get the JSON document:
newjson = []
with open('./example.json', 'r') as f:
    for line in f:
        line = line.strip()
        newjson.append(line)
Join it into a single string:
jsonString = ''.join(newjson)
Turn it into a Python object:
jsonData = json.loads(jsonString)
Extract the fields using dictionary operations and turn them into a pandas dataframe:
newDF = pd.DataFrame({"A_col": jsonString,
                      "area_code": jsonData['source']['billing']['phones'][0]['area_code']},
                     index=[0])
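Once the JSON is valid, an alternative for getting every nested object into its own column is pd.json_normalize, which flattens nested dicts into dotted column names in one step. A sketch on a made-up fragment of the transaction above (values are placeholders):

```python
import pandas as pd

# Made-up fragment of one transaction (values are placeholders)
doc = {"source": {"analyze": True,
                  "billing": {"name": "xxxxx",
                              "phones": [{"area_code": "11", "number": "99999"}]}}}

# Nested dicts become dotted columns; lists (like phones) stay list-valued
flat = pd.json_normalize(doc, sep=".")
print(flat.columns.tolist())
```

List-valued columns such as the phones array would still need a further step (e.g. explode plus another normalize) to become one column per element.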

Loading data from a JSON file

I am trying to get some data from a JSON file. Here is the code for it -
import csv
import json

ifile = open('facebook.csv', "rb")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    try:
        csvfile = open('facebook.csv', 'r')
        jsonfile = open('file.json', 'r+')
        fieldnames = ("USState", "NOFU2008", "NOFU2009", "NOFU2010", "12MI%", "24MI%")
        reader = csv.DictReader(csvfile, fieldnames)
        for row in reader:
            json.dump(row, jsonfile)
            jsonfile.write('\n')
        data = json.load(jsonfile)
        print data["USState"]
    except ValueError:
        continue
I am not getting any output on the console for the print statement. The JSON is in the following format
{"USState": "US State", "12MI%": "12 month increase %", "24MI%": "24 month increase %", "NOFU2010": "Number of Facebook UsersJuly 2010", "NOFU2008": "Number of Facebook usersJuly 2008", "NOFU2009": "Number of Facebook UsersJuly 2009"}
{"USState": "Alabama", "12MI%": "109.3%", "24MI%": "400.7%", "NOFU2010": "1,452,300", "NOFU2008": "290,060", "NOFU2009": "694,020"}
I want to access this like NOFU2008 for all the rows.
The problem is in the way you're creating the JSON file. You don't want to use json.dump() for each row and then append those to the JSON file.
To create a JSON file, you should first create a data structure in Python that represents the entire file the way you want it, and then call json.dump() one time only to dump out the entire structure to JSON format.
Making a single json.dump() call for your entire file will ensure that it is valid JSON.
I'd also recommend wrapping your list/array of rows inside a dict/object so you have a place to put other properties that pertain to the entire JSON file as opposed to a single row.
It looks like the first couple of rows of your facebook.csv are something like this (with or without the quotes):
"US State","12 month increase %","24 month increase %","Number of Facebook UsersJuly 2010","Number of Facebook usersJuly 2008","Number of Facebook UsersJuly 2009"
"Alabama","109.3%","400.7%","1,452,300","290,060","694,020"
Let's say we want to generate this JSON file from that (indented here for clarity):
{
    "rows": [
        {
            "USState": "US State",
            "12MI%": "Number of Facebook usersJuly 2008",
            "24MI%": "Number of Facebook UsersJuly 2009",
            "NOFU2010": "Number of Facebook UsersJuly 2010",
            "NOFU2008": "12 month increase %",
            "NOFU2009": "24 month increase %"
        },
        {
            "USState": "Alabama",
            "12MI%": "290,060",
            "24MI%": "694,020",
            "NOFU2010": "1,452,300",
            "NOFU2008": "109.3%",
            "NOFU2009": "400.7%"
        }
    ]
}
Note that the top level of the JSON file is an object (not an array), and this object has a rows property which is the array of rows.
We can create this JSON file and test it with this Python code:
import csv
import json

# Read the CSV file and convert it to a list of dicts
with open( 'facebook.csv', 'rb' ) as csvfile:
    fieldnames = (
        "USState", "NOFU2008", "NOFU2009", "NOFU2010",
        "12MI%", "24MI%"
    )
    reader = csv.DictReader( csvfile, fieldnames )
    rows = list( reader )

# Wrap the list inside an outer dict
wrap = {
    'rows': rows
}

# Format and write the entire JSON in one fell swoop
with open( 'file.json', 'wb' ) as jsonfile:
    json.dump( wrap, jsonfile )

# Now test the file by reading it and parsing it
with open( 'file.json', 'rb' ) as jsonfile:
    data = json.load( jsonfile )

# For fun, convert the data back to JSON again and pretty-print it
print json.dumps( data, indent=4 )
A few notes... This code does not have the nested reader loops from the original. I have no idea what those were for. One reader should be enough.
In fact, this version doesn't use a loop at all. This line generates a list of rows from the reader object:
rows = list( reader )
Also pay close attention to the use of with where the CSV and JSON files are opened. This is a great way to open a file, because the file will be automatically closed at the end of the with block.
Now having said all this, I have to wonder if this exact JSON structure is what you really want? It looks like the first row of the CSV is a header row, so you may want to skip that row? You can do that easily by adding a reader.next() call before converting the rest of the CSV data to a list:
reader.next()
rows = list( reader )
Also I'm not sure I understand how you want to access the resulting data. You wouldn't be able to use data["USState"], because USState is a property of each individual row object. So say a little more about how you want to access the data and we can sort it out.
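For instance, with the wrapped structure above, you would index into the rows list first and then pick a key from each row dict. A sketch on hypothetical parsed data (the Alaska values are made up):

```python
# Hypothetical parsed result in the wrapped shape described above
data = {"rows": [{"USState": "Alabama", "12MI%": "109.3%"},
                 {"USState": "Alaska", "12MI%": "99.1%"}]}

# data["USState"] would fail; each state lives inside one element of data["rows"]
first_state = data["rows"][0]["USState"]
all_states = [row["USState"] for row in data["rows"]]
print(first_state, all_states)
```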
If you want to create a list of JSON objects in the file, then you should first look at what a list in JSON looks like.
Since the list elements are separated by commas, you would put something like this into the code:
jsonfile.write(',\n')
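That said, hand-writing separators is error-prone (for example, a trailing comma before the closing bracket makes the file invalid JSON). Collecting the rows into a Python list and serializing it with one json.dump call produces the commas and brackets for you. A sketch with made-up rows:

```python
import json

# Made-up rows; in practice these would come from the csv.DictReader
rows = [{"USState": "Alabama", "NOFU2008": "290,060"},
        {"USState": "Alaska", "NOFU2008": "80,120"}]

with open("file.json", "w") as jsonfile:
    json.dump(rows, jsonfile, indent=4)   # one call emits a valid JSON list

with open("file.json") as jsonfile:
    data = json.load(jsonfile)            # round-trips cleanly
```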
