Mapping keys and values in JSON - Python

I am trying to map keys to values and write them out as JSON, but I am unable to convert the data into the template below:
{"Pregnancies": [], "Glucose": [], "BloodPressure": [], "SkinThickness": [579], "Insulin": [8, 13, 111, 153, ...so on]}
Below is the code I am currently working on (names is a list with values BloodPressure, SkinThickness, ... and Outlier_records has values [], [], [579], [8, 13, 111, 153, ...]).
Outlier_records
names
joinedlist = names + Outlier_records
joinedlist
json.dumps(joinedlist)
os.chdir(Output)
with open('Outlier_Records.txt', 'w') as json_file:
    json.dump(joinedlist, json_file)
The output that I am getting now is attached in the image below, whereas I actually expect the output to be mapped as above:
{"Pregnancies": [], "BloodPressure": [], "SkinThickness": [579], "Insulin": [8, 13, 111, 153, ...so on]}

The template you provided is in JSON format, which corresponds to a dict in Python, so in this case you need to create a dictionary and add the corresponding data to it as key-value pairs, as in the code below.
import json

names = ["blood", "test", "test1", "ntek"]
outliner_records = [
    [],
    [],
    [579],
    [8, 13, 111, 153]
]

joinedDict = {}
for i in range(len(names)):
    joinedDict[names[i]] = outliner_records[i]

with open("tt.json", "w") as json_file:
    json.dump(joinedDict, json_file)

Instead of joining your lists you can make a dict from them to pass to json.dumps:
import json
keys = ['apples','bananas','fish']
values = [1,2,[1,2,3]]
out = dict(zip(keys,values))
print(json.dumps(out))
outputs:
{"apples": 1, "bananas": 2, "fish": [1, 2, 3]}

parse JSON file to CSV with key values null in python

Example
{"data":"value1","version":"value2","version1":"value3"}
{"data":"value1","version1":"value3"}
{"data":"value1","version1":"value3","hi":{"a":"true,"b":"false"}}
I have a JSON file and need to convert it to CSV; however, the rows do not all have the same columns, and some rows have nested attributes. How can I convert them in a Python script?
I tried converting JSON to CSV using Python code, but it gives me an error.
In order to convert a JSON file to a CSV file in Python, you can use the pandas library.
import pandas as pd

data = [
    {
        "data": "value1",
        "version": "value2",
        "version1": "value3"
    },
    {
        "data": "value1",
        "version1": "value3"
    },
    {
        "data": "value1",
        "version1": "value3",
        "hi": {
            "a": "true",
            "b": "false"
        }
    }
]

df = pd.DataFrame(data)
df.to_csv('data.csv', index=False)
I have correctly formatted your JSON since it was giving errors.
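If the nested "hi" object should become its own columns rather than being written out as a raw dict, pandas.json_normalize can flatten it. A small sketch reusing the data list from the code above (assuming pandas 1.0+, where json_normalize is a top-level function):
import pandas as pd

# json_normalize flattens nested dicts into dotted column names such as "hi.a" and "hi.b"
df = pd.json_normalize(data)
df.to_csv('data.csv', index=False)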
You could convert the JSON data to a flat list of lists with column names on the first line. Then process that to make the CSV output.
def flatDict(D, p=""):
    if not isinstance(D, dict):
        return {"": D}
    return {p + k + s: v for k, d in D.items() for s, v in flatDict(d, ".").items()}

def flatData(data):
    lines = [*map(flatDict, data)]
    names = dict.fromkeys(k for d in lines for k in d)
    return [[*names]] + [[*map(line.get, names)] for line in lines]
The flatDict function converts a nested dictionary structure into a single-level dictionary, with nested keys combined and brought up to the top level. This is done recursively, so it works for any depth of nesting.
The flatData function flattens each line to build a list of flattened dictionaries (lines). The union of all keys in that list forms the list of column names (using a dictionary constructor to keep them in order of appearance). The function returns the list of names followed by one list per line, mapping each column name to that line's data where present (using the .get() method of dictionaries).
Example:
E = [{"data":"value1","version":"value2","version1":"value3"},
{"data":"value1","version1":"value3"},
{"data":"value1","version1":"value3","hi":{"a":"true","b":"false"}} ]
for line in flatData(E):
print(line)
['data', 'version', 'version1', 'hi.a', 'hi.b'] # col names
['value1', 'value2', 'value3', None, None] # data ...
['value1', None, 'value3', None, None]
['value1', None, 'value3', 'true', 'false']
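To finish the CSV step the answer refers to, a minimal sketch using the csv module on the flatData output above (the output file name is just an example):
import csv

# csv.writer writes None values as empty fields, which is usually what you want here
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(flatData(E))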

Deeply nested json - a list within a dictionary to Pandas DataFrame

I'm trying to parse nested json results.
data = {
    "results": [
        {
            "components": [
                {
                    "times": {
                        "periods": [
                            {
                                "fromDayOfWeek": 0,
                                "fromHour": 12,
                                "fromMinute": 0,
                                "toDayOfWeek": 4,
                                "toHour": 21,
                                "toMinute": 0,
                                "id": 156589,
                                "periodId": 20855
                            }
                        ]
                    }
                }
            ]
        }
    ]
}
I can get to and create dataframes for "results" and "components" lists, but cannot get to "periods" due to the "times" dict. So far I have this:
df = pd.json_normalize(data, record_path = ['results','components'])
Need a separate "periods" dataframe with the included column names and values. Would appreciate your help on this. Thank you!
1. results
2. components
3. times
4. periods
json_normalize should be the correct way:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
There are 4 levels of nesting. There can be x components in results and y times in components; however, isn't that type of nesting over-engineering?
The simplest way of getting the data is:
print(data['a']['b']['c']['d'])  # (...)
In your case (indexing into the lists along the way):
print(data['results'][0]['components'][0]['times']['periods'])
You can access a specific property inside the periods with this piece of code:
def GetPropertyFromPeriods(property):
    propertyList = []
    for period in data['results'][0]['components'][0]['times']['periods']:
        propertyList.append(period[property])
    return propertyList
This gives you access to one property inside periods (fromDayOfWeek, fromHour, fromMinute, ...).
After converting the JSON value, transform it into a pandas DataFrame:
print(pd.DataFrame(data, columns=["columnA", "columnB"]))
If stuck:
How to Create a table with data from JSON output in Python
Python - How to convert JSON File to Dataframe
pandas documentation:
pandas.DataFrame.from_dict
pandas.json_normalize
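For completeness, a hedged sketch of how json_normalize itself can reach the periods by passing the full path as record_path, reusing the data dict from the question (assuming a reasonably recent pandas; it steps through the intermediate times dict):
import pandas as pd

# one row per period, with columns fromDayOfWeek, fromHour, ..., id, periodId
periods_df = pd.json_normalize(data, record_path=['results', 'components', 'times', 'periods'])
print(periods_df)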

Python: how to assign multiple values to one key

I extract data using an API and retrieve a list of servers and backups. Some servers have more than one backup. This is how I get the list of all servers with backup IDs.
bkplist = requests.get('https://heee.com/1.2/storage/backup')
bkplist_json = bkplist.json()
backup_list = bkplist.json()
backupl = backup_list['storages']['storage']
The JSON looks like this:
{
    "storages": {
        "storage": [
            {
                "access": "",
                "created": "",
                "license": ,
                "origin": "01165",
                "size": ,
                "state": "",
                "title": "",
                "type": "backup",
                "uuid": "01019",
                "zone": ""
            },
Firstly I create a dictionary to store this data:
backup = {}
for u in backup_list['storages']['storage']:
    srvuuidorg = u['origin']
    backup_uuid = u['uuid']
    backup[srvuuidorg] = backup_uuid
But then I found out there can be more than one value per server. Since a dictionary can have only one value assigned to each key, I wanted to use some hybrid of a list and a dictionary, but I can't figure out how to do this for my example.
Servers are nested in storages -> storage, and I need to assign several uuids (backup IDs) to one origin (server ID).
I know about the collections module, and with a simple example it is quite understandable, but I have a problem applying it to my case of extracting data through the API.
How do I extract origin and assign to that key the uuid values stored in the JSON?
What's more, it is a massive amount of data, so I cannot add every value manually.
You can do something like this.
from collections import defaultdict

backup = defaultdict(list)
for u in backup_list['storages']['storage']:
    srvuuidorg = u['origin']
    backup_uuid = u['uuid']
    backup[srvuuidorg].append(backup_uuid)
Note that you can simplify your loop like this.
from collections import defaultdict

backup = defaultdict(list)
for u in backup_list['storages']['storage']:
    backup[u['origin']].append(u['uuid'])
But this may be considered less readable.
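For illustration, a small self-contained sketch with a made-up response shaped like the one in the question (the extra origin/uuid values are hypothetical):
from collections import defaultdict

# hypothetical sample shaped like the API response in the question
backup_list = {
    "storages": {
        "storage": [
            {"origin": "01165", "uuid": "01019"},
            {"origin": "01165", "uuid": "01020"},
            {"origin": "01166", "uuid": "01021"},
        ]
    }
}

backup = defaultdict(list)
for u in backup_list['storages']['storage']:
    backup[u['origin']].append(u['uuid'])

print(dict(backup))
# {'01165': ['01019', '01020'], '01166': ['01021']}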
You could store a list of uuids for each origin key.
I suggest the following two ways:
Creating an empty list the first time an origin is accessed, and then appending to it:
backup = {}
for u in backup_list['storages']['storage']:
    srvuuidorg = u['origin']
    backup_uuid = u['uuid']
    if not backup.get(srvuuidorg):
        backup[srvuuidorg] = []
    backup[srvuuidorg].append(backup_uuid)
Using defaultdict collection, which basically does the same for you under the hood:
from collections import defaultdict

backup = defaultdict(list)
for u in backup_list['storages']['storage']:
    srvuuidorg = u['origin']
    backup_uuid = u['uuid']
    backup[srvuuidorg].append(backup_uuid)
It seems to me that the last way is more elegant.
If you need the uuid collection to be unique, use the same approach with a set instead of a list, as sketched below.
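A minimal sketch of that set variant, again assuming the backup_list structure from the question:
from collections import defaultdict

backup = defaultdict(set)
for u in backup_list['storages']['storage']:
    backup[u['origin']].add(u['uuid'])  # add() silently ignores duplicates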
JSON allows an array as the value of a key:
var = {
    "array": [
        {"id": 1, "value": "one"},
        {"id": 2, "value": "two"},
        {"id": 3, "value": "three"}
    ]
}

print(var)
{'array': [{'id': 1, 'value': 'one'}, {'id': 2, 'value': 'two'}, {'id': 3, 'value': 'three'}]}

var["array"].append({"id": 4, "value": "new"})

print(var)
{'array': [{'id': 1, 'value': 'one'}, {'id': 2, 'value': 'two'}, {'id': 3, 'value': 'three'}, {'id': 4, 'value': 'new'}]}
You can use a list for multiple values.
greetings = {"Greetings": ["hello", "hi"]}

Extract information from a string of data in a CSV file

I'm trying to run some analysis on my data and ran into some questions while parsing the data in a CSV file.
This is the raw data in one cell:
{"completed": true, "attempts": 1, "item_state": {"1": {"correct": true, "zone": "zone-7"}, "0": {"correct": true, "zone": "zone-2"}, "2": {"correct": true, "zone": "zone-12"}}, "raw_earned": 1.0}
Formatted for clarity:
{
    "completed": true,
    "attempts": 1,
    "item_state": {
        "1": {
            "correct": true,
            "zone": "zone-7"
        },
        "0": {
            "correct": true,
            "zone": "zone-2"
        },
        "2": {
            "correct": true,
            "zone": "zone-12"
        }
    },
    "raw_earned": 1.0
}
I want to extract only the zone information after each number (1, 0, 2) and put the results (zone-7, zone-2, zone-12) in separate columns. How can I do that using R or Python?
It looks like a dictionary, and when it is stored as a cell in a CSV it is stored as a string. In Python you can use ast.literal_eval(), which parses strings into Python data types such as lists and dictionaries.
If the cell you mentioned is indexed [i,j],
import pandas as pd
import ast

df = pd.read_csv(filename)

a = ast.literal_eval(df.iloc[i][j])
b = pd.json_normalize(a)

output = []
for i in range(df.shape[0]):
    c = ast.literal_eval(df.iloc[i][j])
    temp = pd.DataFrame({'key': list(c['item_state'].keys()),
                         'zone': [x['zone'] for x in c['item_state'].values()]})
    temp['row_n'] = i
    output.append(temp)

output2 = pd.concat(output)
If [i,j] is your cell:
a in the above code is the dictionary given in your example.
b is a flattened dictionary that contains all key-value pairs as columns.
The rest of the code extracts only the zone values.
If you want to apply this to more than one cell, use the loop; otherwise use only the content inside the loop.
output is a list of data frames, each of which has the item_state key and zone value as columns, plus a row_n for identification.
output2 is the concatenated data frame.
ast - Abstract Syntax Trees
In Python, you can use the json library to do something like this:
import json

d = json.loads(raw_cell_data)  # Load the data into a Python dict

results = {}
for key, value in d['item_state'].items():
    results[key] = value['zone']
And then you can use results to print to a CSV.
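A minimal sketch of that CSV step, building on the results dict above (the file and header names are just placeholders):
import csv

with open('zones.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['item', 'zone'])  # hypothetical header
    for key, zone in results.items():
        writer.writerow([key, zone])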
The initial situation is a bit unclear: what you show looks like JSON, but you mention it is in a CSV.
Assuming you have a CSV where the individual fields are strings containing JSON data, you can extract the zone information using the csv and json packages.
Set up a for loop to iterate over the rows of the CSV (see the csv docs for more detail) and then use the json module to extract the zone from the string.
import csv
import json

# to get ss from a csv:
# my_csv = csv.reader( ... )
# for row in my_csv:
#     ss = row[N]

ss = '{"completed": true, "attempts": 1, "item_state": {"1": {"correct": true, "zone": "zone-7"}, "0": {"correct": true, "zone": "zone-2"}, "2": {"correct": true, "zone": "zone-12"}}, "raw_earned": 1.0}'
jj = json.loads(ss)

for vv in jj['item_state'].values():
    print(vv['zone'])
Parse the cell value as JSON and then you can access any element you like:
import csv
import json

column_index = 0
state_keys = ['1', '0', '2']

with open('data.csv') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        obj = json.loads(row[column_index])
        state = obj['item_state']

        # Show all values under item_state in the order they appear:
        for key, value in state.items():
            print(value['zone'])

        # Show only the state_keys defined above, in the order they appear in the list:
        for key in state_keys:
            print(state[key]['zone'])
Something like this. Not tested, as you have not provided a sufficient sample.
import csv
import json

with open('data.csv') as fr:
    rows = list(csv.reader(fr))

for row in rows:
    data = json.loads(row[0])
    new_col_data = [v['zone'] for v in data['item_state'].values()]
    row.append(", ".join(new_col_data))

with open('new_data.csv', 'w', newline='') as fw:
    writer = csv.writer(fw)
    writer.writerows(rows)
In the R package rjson, the function fromJSON is simple to use.
Any of the following ways of reading the JSON string will produce the same result.
library("rjson")
x <- '{"completed": true, "attempts": 1, "item_state": {"1": {"correct": true, "zone": "zone-7"}, "0": {"correct": true, "zone": "zone-2"}, "2": {"correct": true, "zone": "zone-12"}}, "raw_earned": 1.0}'
json <- fromJSON(json_str = x)
# if the string is in a file, say, "so.json"
#json <- fromJSON(file = "so.json")
json is an object of class "list", make a dataframe out of it.
result <- data.frame(zone_num = names(json$item_state))
result <- cbind(result, do.call(rbind.data.frame, json$item_state)[2])
result
#  zone_num    zone
#1        1  zone-7
#0        0  zone-2
#2        2 zone-12
Get item_state and take the zone as the value; append the key and value to empty lists, and finally create the new columns from those lists:
zone_val = []
zone_key = []
for k, v in d['item_state'].items():
    zone_val.append(v['zone'])
    zone_key.append(k)

# create one new column per item_state key
for k, v in zip(zone_key, zone_val):
    DF[k] = v
In Python, it looks like each cell's data is a dictionary that also contains dictionaries, i.e. nested dictionaries.
If this cell's data were referenced as a variable cell_data, then you can get into the inner "item_state" dictionary with:
cell_data["item_state"]
this will return
{"1": {"correct": true, "zone": "zone-7"}, "0": {"correct": true, "zone": "zone-2"}, "2": {"correct": true, "zone": "zone-12"}}
Then you can do the same operation one level deeper by asking for the "1" dictionary:
cell_data["item_state"]["1"]
returns:
{'correct': True, 'zone': 'zone-7'}
Then once more:
cell_data["item_state"]["1"]["zone"]
returns
'zone-7'
So to bring it all together, you could get what you want with the following:
your_list = list( cell_data["item_state"][i]['zone'] for i in ["1","0","2"] )
returns:
['zone-7', 'zone-2', 'zone-12']
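Since the question asks for separate columns, one hedged way to finish with pandas (the column names here are illustrative, not prescribed by the question):
import pandas as pd

# one column per item_state key, each holding that key's zone
row = {'zone_' + k: v['zone'] for k, v in cell_data['item_state'].items()}
df = pd.DataFrame([row])
# columns: zone_1, zone_0, zone_2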

pandas change the order of columns

In my project I'm using Flask. I get JSON (via a REST API) with data that I should convert to a pandas DataFrame.
The JSON looks like:
{
    "entity_data": [
        {"id": 1, "store": "a", "marker": "a"}
    ]
}
I get the JSON and extract the data:
params = request.json
entity_data = params.pop('entity_data')
and then I convert the data into a pandas dataframe:
entity_ids = pd.DataFrame(entity_data)
the result looks like this:
   id marker store
0   1      a     a
This is not the original order of the columns. I'd like the column order to match the dictionary.
Any help?
Use OrderedDict for an ordered dictionary
You should not assume dictionaries are ordered. While dictionaries are insertion-ordered as of Python 3.7, you should not assume that libraries maintain this order when reading JSON into a dictionary or when converting the dictionary to a pandas DataFrame.
The most reliable solution is to use collections.OrderedDict from the standard library:
import json
import pandas as pd
from collections import OrderedDict

params = """{
    "entity_data": [
        {"id": 1, "store": "a", "marker": "a"}
    ]
}"""

# replace the params string here with your raw request body
data = json.loads(params, object_pairs_hook=OrderedDict)
entity_data = data.pop('entity_data')
df = pd.DataFrame(entity_data)

print(df)
#    id store marker
# 0   1     a      a
Just add the column names parameter.
entity_ids = pd.DataFrame(entity_data, columns=["id","store","marker"])
Assuming you have access to the JSON sender, you can send the order in the JSON itself, like:
{
    "order": ["id", "store", "marker"],
    "entity_data": {"id": [1, 2], "store": ["a", "b"], "marker": ["a", "b"]}
}
Then create the DataFrame with the columns specified, as said by Chiheb.K.
import pandas as pd

params = request.json
entity_data = params.pop('entity_data')
order = params.pop('order')
entity_df = pd.DataFrame(entity_data, columns=order)
If you cannot explicitly specify the order in the JSON, see this answer: specify object_pairs_hook in JSONDecoder to get an OrderedDict and then create the DataFrame.
