After querying my DB, I have the following list, which I'd like to turn into a dictionary:
[{date: date_value1.1, rate: rate_value1.1, source: source_name1},
{date: date_value1.2, rate: rate_value1.2, source: source_name1},
{date: date_value2.1, rate: rate_value2.1, source: source_name2},
{date: date_value2.2, rate: rate_value2.2, source: source_name2},
{date: date_valuenx, rate: rate_valuex, source: source_namex}, ...]
The dictionary should have this format:
{
source_name1:
[
{date: date_value1.1, rate: rate_value1.1},
{date: date_value1.2, rate: rate_value1.2}
],
source_name2:
[
{date: date_value2.1, rate: rate_value2.1},
{date: date_value2.2, rate: rate_value2.2}
],
}
I have tried a lot of different code variations, but could not get it to work. What would be the most efficient way to transform the data into the required format?
(This format is the response the client will receive after calling my API. If you have ideas for a better response format, I am open to suggestions!)
We can use a defaultdict(list) keyed by source and append each row, minus its 'source' key, to that source's list.
from collections import defaultdict
output = defaultdict(list)
data = [{'date': 'date_value1.1', 'rate': 'rate_value1.1', 'source': 'source_name1'}, {'date': 'date_value1.2', 'rate': 'rate_value1.2', 'source': 'source_name1'}, {'date': 'date_value2.1', 'rate': 'rate_value2.1', 'source': 'source_name2'}, {'date': 'date_value2.2', 'rate': 'rate_value2.2', 'source': 'source_name2'}, {'date': 'date_valuenx', 'rate': 'rate_valuex', 'source': 'source_namex'}]
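for row in data:
    # Copy every key except 'source' into a new dict; the original row is left unchanged
    output[row['source']].append({k: v for k, v in row.items() if k != 'source'})
result = dict(output)  # plain dict, e.g. for returning as JSON from the API
print(result)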
#{'source_name1': [{'date': 'date_value1.1', 'rate': 'rate_value1.1'}, {'date': 'date_value1.2', 'rate': 'rate_value1.2'}], 'source_name2': [{'date': 'date_value2.1', 'rate': 'rate_value2.1'}, {'date': 'date_value2.2', 'rate': 'rate_value2.2'}], 'source_namex': [{'date': 'date_valuenx', 'rate': 'rate_valuex'}]}
import pprint
pprint.pprint(dict(output))
{'source_name1': [{'date': 'date_value1.1', 'rate': 'rate_value1.1'},
                  {'date': 'date_value1.2', 'rate': 'rate_value1.2'}],
 'source_name2': [{'date': 'date_value2.1', 'rate': 'rate_value2.1'},
                  {'date': 'date_value2.2', 'rate': 'rate_value2.2'}],
 'source_namex': [{'date': 'date_valuenx', 'rate': 'rate_valuex'}]}
Try this:
d = [{"date": "date_value1.1", "rate": "rate_value1.1", "source": "source_name1"},
     {"date": "date_value1.2", "rate": "rate_value1.2", "source": "source_name1"},
     {"date": "date_value2.1", "rate": "rate_value2.1", "source": "source_name2"}]
d1 = {}
for ele in d:
    # pop() removes 'source' from the row itself, so the dicts in d are modified
    key = ele.pop('source')
    # Create an empty list the first time a source is seen, then append the row
    d1.setdefault(key, []).append(ele)
print(d1)
Output:
{'source_name1': [{'date': 'date_value1.1', 'rate': 'rate_value1.1'}, {'date': 'date_value1.2', 'rate': 'rate_value1.2'}], 'source_name2': [{'date': 'date_value2.1', 'rate': 'rate_value2.1'}]}
I have a relatively simple nested dictionary as below:
emp_details = {
'Employee': {
'jim': {'ID':'001', 'Sales':'75000', 'Title': 'Lead'},
'eva': {'ID':'002', 'Sales': '50000', 'Title': 'Associate'},
'tony': {'ID':'003', 'Sales': '150000', 'Title': 'Manager'}
}
}
I can get the sales info of 'eva' easily by:
print(emp_details['Employee']['eva']['Sales'])
but I'm having difficulty writing a statement to extract information on all employees whose sales are over 50000.
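You can do this in one statement with a comprehension (a filtering if at the end of a comprehension does not need an else; only a conditional expression such as x if c else y does), but a plain for loop lets you build a dict and a list in the same pass.
Use a for loop: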
result = {}       # matching employees, keyed by name
result_list = []  # the same matches as (name, details) tuples
for key, value in emp_details['Employee'].items():  # iterate over all employees
    if int(value['Sales']) > 50000:  # Sales is stored as a string, so convert it first
        result[key] = value               # add to dict
        result_list.append((key, value))  # add to list
print(result)
print(result_list)
# should print:
'''
{'jim': {'ID': '001', 'Sales': '75000', 'Title': 'Lead'}, 'tony': {'ID': '003', 'Sales': '150000', 'Title': 'Manager'}}
[('jim', {'ID': '001', 'Sales': '75000', 'Title': 'Lead'}), ('tony', {'ID': '003', 'Sales': '150000', 'Title': 'Manager'})]
'''
Your Sales values are strings, so convert them with int() before comparing.
Here are two ways to get the information of employees whose sales are over 50000:
Method 1:
If you just want to print the matching records:
emp_details = {'Employee': {'jim': {'ID': '001', 'Sales': '75000', 'Title': 'Lead'},
                            'eva': {'ID': '002', 'Sales': '50000', 'Title': 'Associate'},
                            'tony': {'ID': '003', 'Sales': '150000', 'Title': 'Manager'}}}
for emp in emp_details['Employee']:
    if int(emp_details['Employee'][emp]['Sales']) > 50000:
        print(emp_details['Employee'][emp])
It prints:
{'ID': '001', 'Sales': '75000', 'Title': 'Lead'}
{'ID': '003', 'Sales': '150000', 'Title': 'Manager'}
Method 2: Use a dict or list comprehension to keep each employee's name together with their details:
emp_details = {'Employee': {'jim': {'ID': '001', 'Sales': '75000', 'Title': 'Lead'},
                            'eva': {'ID': '002', 'Sales': '50000', 'Title': 'Associate'},
                            'tony': {'ID': '003', 'Sales': '150000', 'Title': 'Manager'}}}
# The trailing if filters the items; no else is needed
emp_details_dictComp = {k: v for k, v in emp_details['Employee'].items() if int(v['Sales']) > 50000}
print(emp_details_dictComp)
emp_details_listComp = [(k, v) for k, v in emp_details['Employee'].items() if int(v['Sales']) > 50000]
print(emp_details_listComp)
Result:
{'jim': {'ID': '001', 'Sales': '75000', 'Title': 'Lead'}, 'tony': {'ID': '003', 'Sales': '150000', 'Title': 'Manager'}}
[('jim', {'ID': '001', 'Sales': '75000', 'Title': 'Lead'}), ('tony', {'ID': '003', 'Sales': '150000', 'Title': 'Manager'})]
I want to save my dict my_dict as a netCDF file, but I get the following error: TypeError: expected bytes, list found. How can I save my dict? I looked at https://docs.xarray.dev/en/stable/user-guide/io.html, "Saving Python dictionary to netCDF4 file", and some other links and blogs.
from netCDF4 import Dataset
my_dict = {
    '_key': '1',
    'group': 'test',
    'data': {},
    'type': '',
    'code': '007',
    'conType': '1',
    'flag': None,
    'createdAt': '2021',
    'currency': 'EUR',
    'detail': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
        }
    },
    'identCode': [],
}
ds = Dataset(my_dict)
[OUT] TypeError: expected bytes, list found
ds.to_netcdf("saved_on_disk.nc")
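netCDF stores arrays and attributes rather than arbitrary nested Python objects, and netCDF4.Dataset takes a file path (plus a mode such as "w") as its first argument, not the data to save. to_netcdf is an xarray method, not part of netCDF4. A minimal sketch, assuming it is acceptable to store the whole dict as one JSON-encoded global attribute; the attribute name "my_dict" and all helper names below are hypothetical:

```python
import json


def dict_to_nc_attr(d):
    """Serialize a nested dict into one JSON string (a type netCDF attributes accept)."""
    # json turns True/None into true/null and keeps nested dicts and lists
    return json.dumps(d)


def nc_attr_to_dict(s):
    """Inverse of dict_to_nc_attr."""
    return json.loads(s)


def save_dict_as_netcdf(path, d, attr_name="my_dict"):
    # Imported lazily so the JSON helpers above work without netCDF4 installed
    from netCDF4 import Dataset

    ds = Dataset(path, "w")  # the first argument is the file path, not the data
    try:
        ds.setncattr(attr_name, dict_to_nc_attr(d))  # store as a global attribute
    finally:
        ds.close()


def load_dict_from_netcdf(path, attr_name="my_dict"):
    from netCDF4 import Dataset

    ds = Dataset(path, "r")
    try:
        return nc_attr_to_dict(ds.getncattr(attr_name))
    finally:
        ds.close()
```

With this, save_dict_as_netcdf("saved_on_disk.nc", my_dict) writes the file and load_dict_from_netcdf("saved_on_disk.nc") returns an equal dict. If the values should be usable as real netCDF variables (for example, numeric arrays), they would need to be mapped to dimensions and variables explicitly instead.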
I am working with JSON files that store thousands of entries or more.
First, I want to understand the data I am working with.
import json
with open("/home/xu/stock_data/stock_market_data/nasdaq/json/AAL.json", "r") as f:
    data = json.load(f)
print(json.dumps(data, indent=4))
This gives me an easy-to-read format, but some of the "keys" (I am not familiar with the JSON terminology, so I use the word "key" as for dict objects) have thousands of values, which makes the whole thing hard to read.
I also tried:
import json
import pandas as pd
with open("/home/xu/stock_data/stock_market_data/nasdaq/json/AAL.json", "r") as f:
    data = json.load(f)
df = pd.DataFrame.from_dict(data, orient="index")
print(df.info)  # without (), this prints the bound method itself instead of calling it
but got:
<bound method DataFrame.info of result error
chart [{'meta': {'currency': 'USD', 'symbol': 'AAL',... None>
This result sort of shows the structure, but it is cut off with "...", so it doesn't show the whole picture.
My Question:
Is there something that works like NumPy's ndarray.shape for JSON/dict/pandas objects, which can show the shape of the structure?
Is there a better library for interpreting the structure of a JSON file?
Edit:
Sorry, perhaps my wording was misleading. I tried pprint, and it gave me:
{ 'chart': { 'error': None,
'result': [ { 'events': { 'dividends': { '1406813400': { 'amount': 0.1,
'date': 1406813400},
'1414675800': { 'amount': 0.1,
'date': 1414675800},
'1423146600': { 'amount': 0.1,
'date': 1423146600},
'1430400600': { 'amount': 0.1,
'date': 1430400600},
'1438867800': { 'amount': 0.1,
'date': 1438867800},
'1446561000': { 'amount': 0.1,
'date': 1446561000},
'1454941800': { 'amount': 0.1,
'date': 1454941800},
'1462195800': { 'amount': 0.1,
'date': 1462195800},
'1470231000': { 'amount': 0.1,
'date': 1470231000},
'1478179800': { 'amount': 0.1,
'date': 1478179800},
'1486650600': { 'amount': 0.1,
'date': 1486650600},
'1494595800': { 'amount': 0.1,
'date': 1494595800},
'1502371800': { 'amount': 0.1,
'date': 1502371800},
'1510324200': { 'amount': 0.1,
'date': 1510324200},
'1517841000': { 'amount': 0.1,
'date': 1517841000},
'1525699800': { 'amount': 0.1,
'date': 1525699800},
'1533562200': { 'amount': 0.1,
'date': 1533562200},
'1541428200': { 'amount': 0.1,
'date': 1541428200},
'1549377000': { 'amount': 0.1,
'date': 1549377000},
'1557235800': { 'amount': 0.1,
'date': 1557235800},
'1565098200': { 'amount': 0.1,
'date': 1565098200},
'1572964200': { 'amount': 0.1,
'date': 1572964200},
'1580826600': { 'amount': 0.1,
'date': 1580826600}}},
'indicators': { 'adjclose': [ { 'adjclose': [ 18.19490623474121,
19.326200485229492,
19.05280113220215,
19.80699920654297,
20.268939971923828,
20.891149520874023,
20.928863525390625,
21.28710174560547,
20.88172149658203,
20.93828773498535,
20.721458435058594,
20.514055252075195,
20.466917037963867,
20.994853973388672,
20.81572914123535,
20.2595157623291,
20.155811309814453,
19.816425323486328,
20.702600479125977,
21.032560348510742,
20.740314483642578,
21.0419864654541,
21.26824951171875,
22.531522750854492,
23.266857147216797,
23.587390899658203,
25.9725284576416,
26.27420997619629,
27.150955200195312,
27.273509979248047,
27.7448787689209,
29.507808685302734,
30.92192840576172,
31.4404239654541,
31.817523956298828,
31.940074920654297,
31.676118850708008,
32.354888916015625,
31.157604217529297,
30.158300399780273,
30.63909339904785,
31.148174285888672,
30.969064712524414,
31.496990203857422,
31.01619529724121,
31.666685104370117,
32.31717300415039,
32.31717300415039,
30.497684478759766,
31.69496726989746,
32.006072998046875,
31.7326717376709,
31.940074920654297,
31.826950073242188,
31.346155166625977,
31.61954689025879,
...
...
...
# This goes on and on for each "key" of the JSON file, which means I have to scroll through thousands of lines to find out what type of data I have.
What I am hoping to find is a solution that outputs something like the following: instead of showing all of the data, it only shows the "keys" and maybe some additional information, because some files may contain many GBs of data, which makes scrolling through them impractical.
# This is what I am hoping to achieve:
{
    "Name": {
        "title": <data_type=str, len=20>,
        "time_stamp": <data_type=list, len=3000>,
        "closing_price": <data_type=list, len=3000>,
        "high_price_of_the_day": <data_type=list, len=3000>,
        ...
        ...
        ...
    }
}
You have a few options for navigating this. If you want to render your data so you can make informed decisions quickly, the standard library has pprint for rendering dictionaries, but personally I recommend something that works out of the box without much configuration. I have found pprintpp to be a good choice for any Python data structure: https://pypi.org/project/pprintpp/
Simply run in your terminal:
pip3 install pprintpp
On a default Windows install, the package ends up under C:\Users\User\AppData\Local\Programs\Python\PythonXX\Lib\site-packages\pprintpp (the location differs on Linux and macOS).
After that, simply do this in your code:
import json
from pprintpp import pprint
with open("/home/xu/stock_data/stock_market_data/nasdaq/json/AAL.json", "r") as f:
    data = json.load(f)
pprint(data)
You can also call pprint(data, width=1) to force every dictionary key onto its own line, even if the key is short. For example:
some_dict = {'a': 'b', 'c': {'aa': 'bb'}}
pprint(some_dict, width=1)
Outputs:
{
'a': 'b',
'c': {
'aa': 'bb',
},
}
Hope this helped! Cheers :)
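pprintpp still prints every value, so for the key-only summary sketched in the question, a small recursive helper can walk the loaded data and replace each leaf with its type and length. This is a minimal sketch, not a library API: the names summarize and print_structure and the max_keys cutoff are hypothetical, and it assumes that all items of a list share the same shape, so only the first item is described.

```python
import json


def summarize(obj, max_keys=5):
    """Mirror obj's nesting, replacing leaf values with type/length descriptions."""
    if isinstance(obj, dict):
        keys = list(obj)
        summary = {key: summarize(obj[key], max_keys) for key in keys[:max_keys]}
        if len(keys) > max_keys:
            # Collapse dicts with many keys (e.g. one key per timestamp)
            summary["..."] = f"<{len(keys) - max_keys} more keys>"
        return summary
    if isinstance(obj, list):
        label = f"<data_type=list, len={len(obj)}>"
        if obj and isinstance(obj[0], (dict, list)):
            # Assumes every item has the same shape, so only the first is described
            return [label, summarize(obj[0], max_keys)]
        return label
    if isinstance(obj, str):
        return f"<data_type=str, len={len(obj)}>"
    return f"<data_type={type(obj).__name__}>"


def print_structure(path, max_keys=5):
    # Loads the whole file, then prints only the summarized structure
    with open(path, "r") as f:
        data = json.load(f)
    print(json.dumps(summarize(data, max_keys), indent=4))
```

Calling print_structure("/home/xu/stock_data/stock_market_data/nasdaq/json/AAL.json") would show, for example, the "dividends" dict as its first five timestamp keys plus a "<N more keys>" entry, and each "adjclose" list as a single "<data_type=list, len=...>" line. Note that json.load still reads the whole file into memory; for multi-GB files a streaming parser would be needed. For a quick look without extra code, the standard library's pprint.pprint(data, depth=3) also replaces anything nested deeper than three levels with "...".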
I have the following JSON object, in which I need to post-process some labels:
{
'id': '123',
'type': 'A',
'fields':
{
'device_safety':
{
'cost': 0.237,
'total': 22
},
'device_unit_replacement':
{
'cost': 0.262,
'total': 7
},
'software_generalinfo':
{
'cost': 3.6,
'total': 10
}
}
}
I need to split the names of labels by _ to get the following hierarchy:
{
'id': '123',
'type': 'A',
'fields':
{
'device':
{
'safety':
{
'cost': 0.237,
'total': 22
},
'unit':
{
'replacement':
{
'cost': 0.262,
'total': 7
}
}
},
'software':
{
'generalinfo':
{
'cost': 3.6,
'total': 10
}
}
}
}
This is my current version, but I got stuck and am not sure how to deal with the hierarchy of fields:
import json
json_object = json.load(raw_json)
newjson = {}
for x, y in json_object['fields'].items():
    hierarchy = x.split("_")  # split the label name (the key x), not its value y
    if len(hierarchy) > 1:
        for k in hierarchy:
            newjson[k] = ????
newjson = json.dumps(newjson, indent = 4)
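Here is a recursive function that processes a dict and splits its keys:
def splitkeys(dct):
    # Leaves (numbers, strings, ...) are returned unchanged
    if not isinstance(dct, dict):
        return dct
    new_dct = {}
    for k, v in dct.items():
        bits = k.split('_')
        # Walk down one nested dict per prefix part, creating it if needed
        d = new_dct
        for bit in bits[:-1]:
            d = d.setdefault(bit, {})
        # The last part holds the value, which is itself processed recursively
        d[bits[-1]] = splitkeys(v)
    return new_dct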
>>> splitkeys(json_object)
{'fields': {'device': {'safety': {'cost': 0.237, 'total': 22},
'unit': {'replacement': {'cost': 0.262, 'total': 7}}},
'software': {'generalinfo': {'cost': 3.6, 'total': 10}}},
'id': '123',
'type': 'A'}
I am calling an API to get the exchange rates of the Taiwan dollar (TWD) against other currencies.
I fetch the data with this code:
import requests
r=requests.get('http://api.cambio.today/v1/full/TWD/json?key=X')
dico = r.json()
And it gives me:
{'result': {'from': 'TWD',
'conversion': [{'to': 'AED',
'date': '2020-06-23T07:23:49',
'rate': 0.124169},
{'to': 'AFN', 'date': '2020-06-23T07:19:53', 'rate': 2.606579},
{'to': 'ALL', 'date': '2020-06-19T20:48:10', 'rate': 3.74252},
{'to': 'AMD', 'date': '2020-06-22T12:00:19', 'rate': 16.176679},
{'to': 'AOA', 'date': '2020-06-22T12:32:59', 'rate': 20.160418},
{'to': 'ARS', 'date': '2020-06-23T08:00:01', 'rate': 2.363501}
]}
}
To turn it into a dataframe I tried two things:
df = pd.DataFrame(dico.get('result', {}))
and
from pandas.io.json import json_normalize
dictr = r.json()
df = json_normalize(dictr)
In both cases, I end up with a "conversion" column with one line per currency. For example the first line is: "{'to': 'AFN', 'date': '2020-06-23T07:19:53', 'rate': 2.606579}".
I would like to have one column for the currency and one for the exchange rate instead.
Could someone please help me?
The JSON you pasted is not valid JSON, but I assume its structure is this:
{'result': {'from': 'TWD',
'conversion': [{'to': 'AED',
'date': '2020-06-23T07:23:49',
'rate': 0.124169},
{'to': 'AFN', 'date': '2020-06-23T07:19:53', 'rate': 2.606579},
{'to': 'ALL', 'date': '2020-06-19T20:48:10', 'rate': 3.74252},
{'to': 'AMD', 'date': '2020-06-22T12:00:19', 'rate': 16.176679},
{'to': 'AOA', 'date': '2020-06-22T12:32:59', 'rate': 20.160418},
{'to': 'ARS', 'date': '2020-06-23T08:00:01', 'rate': 2.363501}]}}
In that case, you can create the DataFrame you want with:
df = pd.DataFrame(dico.get('result', {}).get('conversion', []))
You need to get the value of the "conversion" key, which holds the list of conversion rates:
df = pd.DataFrame(dico["result"]["conversion"])
It will format your conversion data like this:
    to                 date       rate
0  AED  2020-06-23T07:23:49   0.124169
1  AFN  2020-06-23T07:19:53   2.606579
2  ALL  2020-06-19T20:48:10   3.742520
3  AMD  2020-06-22T12:00:19  16.176679
4  AOA  2020-06-22T12:32:59  20.160418
5  ARS  2020-06-23T08:00:01   2.363501
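Since the question also tried json_normalize: in current pandas it is available as pd.json_normalize, and its record_path and meta parameters can expand the "conversion" list into rows while copying the base currency onto each row. A short sketch, assuming the response has the structure shown above (only a subset of the rates is reproduced here):

```python
import pandas as pd

dico = {'result': {'from': 'TWD',
                   'conversion': [{'to': 'AED', 'date': '2020-06-23T07:23:49', 'rate': 0.124169},
                                  {'to': 'AFN', 'date': '2020-06-23T07:19:53', 'rate': 2.606579},
                                  {'to': 'ALL', 'date': '2020-06-19T20:48:10', 'rate': 3.74252},
                                  {'to': 'AMD', 'date': '2020-06-22T12:00:19', 'rate': 16.176679},
                                  {'to': 'AOA', 'date': '2020-06-22T12:32:59', 'rate': 20.160418},
                                  {'to': 'ARS', 'date': '2020-06-23T08:00:01', 'rate': 2.363501}]}}

# record_path picks the list to turn into rows; meta copies 'from' onto every row
df = pd.json_normalize(dico["result"], record_path=["conversion"], meta=["from"])

# Keep just the currency code and its exchange rate
rates = df[["to", "rate"]]
print(rates)
```

The meta column is mainly useful when you combine responses for several base currencies into one DataFrame; for a single response, pd.DataFrame(dico["result"]["conversion"]) gives the same rows.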