Iterating through a JSON file in Python 3

Currently I'm trying to get 'stringency' data from a json file which contains dates and countries. Here's an excerpt of what the json output looks like:
import pandas as pd
import json
from bs4 import BeautifulSoup
# load file
with open("Stringency April 8.txt") as file:
    stringency_data = json.load(file)
stringency_data["data"]
# this gives the output:
{'2020-01-02': {'ABW': {'confirmed': None,
'country_code': 'ABW',
'date_value': '2020-01-02',
'deaths': None,
'stringency': 0,
'stringency_actual': 0},
'AFG': {'confirmed': 0,
'country_code': 'AFG',
'date_value': '2020-01-02',
'deaths': 0,
'stringency': 0,
'stringency_actual': 0},
'AGO': {'confirmed': None,
'country_code': 'AGO',
'date_value': '2020-01-02',
'deaths': None,
'stringency': 0,
'stringency_actual': 0},
'AUS': {'confirmed': 0,
'country_code': 'AUS',
'date_value': '2020-01-02',
'deaths': 0,
'stringency': 7.14,
'stringency_actual': 7.14},
'AUT': {'confirmed': 0,
'country_code': 'AUT',
'date_value': '2020-01-02',
'deaths': 0,
'stringency': 0,
'stringency_actual': 0},.........
Here's my code so far (I've shortened it a bit for the sake of this post):
# create empty list for dates
date_index = []
[date_index.append(date) for date in stringency_data["data"]]
#creates empty lists for countries
Australia = []
Austria = []
...
US = []
# put these lists into a list
countries_lists = [Australia, Austria,...US]
# put country codes into a list
country_codes = ["AUS", "AUT",..."USA"]
# loop through countries
i = 0
for country, code in zip(countries_lists, country_codes):
    while i<=len(date_index):
        country.append(stringency_data["data"][date_index[i]][code]["stringency_actual"])
        i+=1
When I print the list "Australia" I get all the values I want, but every country from Austria onwards is still an empty list.
I also get the error KeyError: "AUS". This suggests that the code retrieved the whole time series, but only for the first country (Australia). How can I loop this for each country code?

Here's what I see about the data you've described/shown:
file data is a dictionary; single known/desired key is "data", value is a dictionary.
--> keys are all date_strings. Each value is a dictionary.
-----> keys are all country_codes. Each value is a dictionary.
--------> a key "stringency_actual" is present, and its value is desired.
So a straightforward plan for getting this data out could look like this:
1. grab file['data']
2. iterate all keys and values in this dictionary. Keys are date strings. (Actually, you may only care about the values.)
3. for each of those values, iterate all keys and values in that inner dictionary. Keys are country_codes, which tell you to which list you want to append the stringency_actual value you're about to get.
4. grab this dictionary['stringency_actual'] and append it to the list corresponding to the correct country.
4b. translate the country_code to the country_name, since that's apparently how you would like to store this data for now.
I changed the data retrieval because the data is all dictionaries so it's self-describing by its keys. Doing it this way can help prevent the KeyError I see mentioned in the original question and a comment. (Without the complete input file or the line number of the KeyError, I think none of us is 100% certain which value in the input is causing that KeyError.)
Potential answer:
import json
# Input sample data; would actually be retrieved from file.
stringency_data = json.loads("""
{"data": {"2020-01-02": {"ABW": {"confirmed": null,
"country_code": "ABW",
"date_value": "2020-01-02",
"deaths": null,
"stringency": 0,
"stringency_actual": 0},
"AFG": {"confirmed": 0,
"country_code": "AFG",
"date_value": "2020-01-02",
"deaths": 0,
"stringency": 0,
"stringency_actual": 0},
"AGO": {"confirmed": null,
"country_code": "AGO",
"date_value": "2020-01-02",
"deaths": null,
"stringency": 0,
"stringency_actual": 0},
"AUS": {"confirmed": 0,
"country_code": "AUS",
"date_value": "2020-01-02",
"deaths": 0,
"stringency": 7.14,
"stringency_actual": 7.14},
"AUT": {"confirmed": 0,
"country_code": "AUT",
"date_value": "2020-01-02",
"deaths": 0,
"stringency": 0,
"stringency_actual": 0}}}
}""")
country_name_by_code = {
    'ABW': 'Aruba',
    'AFG': 'Afghanistan',
    'AUS': 'Australia',
    'AUT': 'Austria',
    # ...
    'USA': 'United States'
}
# Output data we want to create
actual_stringencies_by_country_name = {}
# Helper function to store data we're interested in
def append_country_stringency(country_code, actual_stringency_value):
    if country_code not in country_name_by_code:
        print(f'Unknown country_code value "{country_code}"; ignoring.')
        return
    country_name = country_name_by_code[country_code]
    if country_name not in actual_stringencies_by_country_name:
        actual_stringencies_by_country_name[country_name] = []
    actual_stringencies_by_country_name[country_name].append(actual_stringency_value)
# Walk our input data and store the parts we're looking for
for date_string, data_this_date in stringency_data['data'].items():
    for country_code, country_data in data_this_date.items():
        append_country_stringency(country_code, country_data['stringency_actual'])
print(actual_stringencies_by_country_name)
My output:
C:\some_dir>python test.py
Unknown country_code value "AGO"; ignoring.
{'Aruba': [0], 'Afghanistan': [0], 'Australia': [7.14], 'Austria': [0]}
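A more compact variant of the same walk, as a minimal sketch. It assumes the same nested date -> country_code -> record layout shown above, keys the result by country code instead of country name, and uses collections.defaultdict so no list has to be created up front. The sample values and the name collect_stringency are illustrative and not taken from the original data file. It reads the value with .get(), so a record without 'stringency_actual' contributes None instead of raising KeyError.

```python
from collections import defaultdict

# Hypothetical sample with the same shape as stringency_data["data"].
sample = {
    "2020-01-02": {
        "AUS": {"stringency_actual": 7.14},
        "AUT": {"stringency_actual": 0},
    },
    "2020-01-03": {
        "AUS": {"stringency_actual": 7.14},
        # AUT has a record for this date but no stringency value.
        "AUT": {},
    },
}

def collect_stringency(data_by_date):
    """Return {country_code: [stringency_actual per date, in date order]}."""
    series = defaultdict(list)
    # ISO date strings sort chronologically, so sorted() gives date order.
    for date_string in sorted(data_by_date):
        for country_code, record in data_by_date[date_string].items():
            series[country_code].append(record.get("stringency_actual"))
    return dict(series)

result = collect_stringency(sample)
print(result)
```

Because every country code found in the data gets its own list automatically, there is no need to declare one variable per country in advance.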

Related

Type Error for Path Data. Must be list or null

I am getting the following error when attempting to convert json data to a dataframe. I have successfully used this same method to convert json to a dataframe with similar data in the same script.
The full error:
TypeError: {'success': True, 'data': {'data1': 1, 'data2': 1, 'data3': 1, 'data4': True, 'data5': 0, 'data6': 0, 'data7': False, 'data8': 'ABC', 'start_date': '2000-04-14', 'end_date': '2000-09-23', 'data9': None, 'add_time': '2000-07-12 23:00:11', 'update_time': '2000-06-1420:18:55', 'data10': 1, 'data11': 'custom', 'data12': None}}
has non list value
{'data1': 1, 'data2': 1, 'data3': 1, 'data4': True, 'data5': 0, 'data6': 0, 'data7': False, 'data8': 'ABC', 'start_date': '2000-04-14', 'end_date': '2000-09-23', 'data9': None, 'add_time': '2000-07-12 23:00:11', 'update_time': '2000-06-1420:18:55', 'data10': 1, 'data11': 'custom', 'data12': None}
for path data. Must be list or null.
The function:
def get_subscriptions(id, df):
    subscriptions_params = {'api_token': 'abc'}
    subscriptions_headers = {'Content-Type': 'application/json'}
    subscriptions_response = requests.get('https://url/{}'.format(id), params=subscriptions_params,
                                          headers=subscriptions_headers)
    subscriptions_data = subscriptions_response.json()
    subscriptions_temp_df = pd.json_normalize(subscriptions_data, record_path=['data'])
I do the exact same thing with a similar (but actually more complex) piece of data with no problems. An example of the response that works:
{'success': True, 'data': [{'data1': 1, 'data2': {'data3': 1, 'name': 'name', 'email': 'email#email.com', 'data4': 0, 'data5': None, 'data6': False, 'data7': 1}, 'data8': {'data9': 1, 'name': 'name', 'email': 'email#email.com', 'data10': 0, 'data11': None, 'data12': True, 'data13': 1}, 'data14': {'data15': True, 'name': 'name' .... etc.
This one is actually massive, whereas for the one with issues the error includes the full length of the data.
I removed the actual data, but did not change the types. Strings inside single quotes are just other strings, 1s are just other numbers, etc.
Any ideas why one succeeds and the other fails?
I do not know what the issue/difference is, but this small change works:
def get_subscriptions(id, df):
    subscriptions_params = {'api_token': 'abc'}
    subscriptions_headers = {'Content-Type': 'application/json'}
    subscriptions_response = requests.get('https://url/{}'.format(id), params=subscriptions_params, headers=subscriptions_headers)
    subscriptions_data = subscriptions_response.json()
    subscriptions_data = subscriptions_data['data']
    subscriptions_temp_df = pd.json_normalize(subscriptions_data)
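The two responses in the question differ in one respect that matches the error text: in the working response 'data' is a list of dictionaries, while in the failing one it is a single dictionary, and record_path has to point at a list. A minimal sketch of a guard for that, assuming the API can return either shape; as_records is a hypothetical helper name, not part of pandas or of the API:

```python
def as_records(payload, key="data"):
    """Return payload[key] as a list of records, whatever shape it came in."""
    value = payload.get(key)
    if value is None:
        return []            # missing or null -> no records
    if isinstance(value, dict):
        return [value]       # single record -> one-element list
    return list(value)       # already a sequence of records

single = {"success": True, "data": {"data1": 1, "data8": "ABC"}}
many = {"success": True, "data": [{"data1": 1}, {"data1": 2}]}

print(as_records(single))
print(as_records(many))
```

The returned list can then be handed to pd.json_normalize without a record_path argument, so both response shapes go through the same code path.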

Problems matching values from nested dictionary

In TestRail, I have created several testruns. When I execute:
test_runs = client.send_get('get_runs/1')
pprint(test_runs)
The following results are returned:
{'_links': {'next': None, 'prev': None},
'limit': 250,
'offset': 0,
'runs': [{'assignedto_id': None,
'blocked_count': 0,
'completed_on': None,
'config': None,
'config_ids': [],
'created_by': 1,
'created_on': 1651790693,
'custom_status1_count': 0,
'custom_status2_count': 0,
'custom_status3_count': 0,
'custom_status4_count': 0,
'custom_status5_count': 0,
'custom_status6_count': 0,
'custom_status7_count': 0,
'description': None,
'failed_count': 1,
'id': 13,
'include_all': False,
'is_completed': False,
'milestone_id': None,
'name': '2022-05-05-testrun',
'passed_count': 2,
'plan_id': None,
'project_id': 1,
'refs': None,
'retest_count': 0,
'suite_id': 1,
'untested_count': 0,
'updated_on': 1651790693,
'url': 'https://xxxxxxxxxx.testrail.io/index.php?/runs/view/13'},
{'assignedto_id': None,
'blocked_count': 0,
'completed_on': 1650989972,
'config': None,
'config_ids': [],
'created_by': 5,
'created_on': 1650966329,
'custom_status1_count': 0,
'custom_status2_count': 0,
'custom_status3_count': 0,
'custom_status4_count': 0,
'custom_status5_count': 0,
'custom_status6_count': 0,
'custom_status7_count': 0,
'description': None,
'failed_count': 0,
'id': 9,
'include_all': False,
'is_completed': True,
'milestone_id': None,
'name': 'This is a new test run',
'passed_count': 0,
'plan_id': None,
'project_id': 1,
'refs': None,
'retest_count': 0,
'suite_id': 1,
'untested_count': 3,
'updated_on': 1650966329,
'url': 'https://xxxxxxxxxx.testrail.io/index.php?/runs/view/9'}],
'size': 2}
In my code, I am trying to scan through all of the resulting testruns, locate the testrun I'm interested in by matching the testrun name, and then have the ID for the testrun returned.
from pprint import pprint
from testrail import *
class connecting():
    def connectPostRun(self):
        client = APIClient('https://xxxxxxxxxx.testrail.io')
        client.user = 'abc#abc.com'
        client.password = 'abc123'
        test_run_name = '2022-05-05-testrun'
        test_runs = client.send_get('get_runs/1')
        pprint(test_runs)
        for test_run in test_runs:
            if test_run['name'] == test_run_name:
                run_id = test_run['id']
                break
        return run_id
        pprint(run_id)
c=connecting()
c.connectPostRun()
Executing the code as is now results in the following error:
if test_run['name'] == test_run_name:
TypeError: string indices must be integers
You're looping over the wrong part of the data structure that the function returned. The loop for test_run in test_runs: only iterates over the keys of the top-level dictionary ("_links", "limit", etc.). Those keys are strings, which is why indexing them with 'name' raises the TypeError.
You want to be looping over test_runs['runs'], which will give you dictionaries with the "name" keys you're matching against. Try making your loop look like this:
for test_run in test_runs['runs']:
    if test_run['name'] == test_run_name:
        run_id = test_run['id']
        break
I'd note that there's a potential problem in this code, that if you never find a matching run, the run_id variable will never be assigned to, so the return statement at the end of the function will raise an exception. If you think that could ever happen, you should probably either set a default value, or perhaps raise your own exception (with a more clear message) if you get into that situation.
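One way to handle the "no matching run" case is to search with next() and a default, then raise a descriptive error. This is a minimal sketch, assuming the response has the 'runs' list shown in the question; find_run_id is a hypothetical helper name and the sample response is a trimmed-down stand-in, not real TestRail output:

```python
def find_run_id(test_runs, test_run_name):
    """Return the id of the run called test_run_name, or raise LookupError."""
    # next() with a default means there is no variable left unassigned
    # when nothing matches.
    match = next(
        (run for run in test_runs.get("runs", []) if run.get("name") == test_run_name),
        None,
    )
    if match is None:
        raise LookupError(f"No test run named {test_run_name!r}")
    return match["id"]

# Trimmed-down stand-in for the get_runs response shown in the question.
sample_response = {
    "runs": [
        {"id": 13, "name": "2022-05-05-testrun"},
        {"id": 9, "name": "This is a new test run"},
    ],
    "size": 2,
}

print(find_run_id(sample_response, "2022-05-05-testrun"))
```

Returning None instead of raising would also work; raising simply makes a misspelled run name fail loudly at the point of the lookup.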

Successfully insert multiple document into MongoDB [Python]

I have the following piece of code in python:
def pushHashtagPosts(hashtagPosts):
    from bson.json_util import loads
    myclient = pymongo.MongoClient(mongoUri)
    mydb = myclient["myDB"]
    mycol = mydb["hashtags"]
    data = loads(hashtagPosts)
    posts = mycol.insert_many(data)
Whereas, the content of hashtagPosts looks something like this:
hashtagPosts = [{'hashtag': '###!', 'PostHashHex': '13fc9904028fb62490a3b5dc2111689376e52a06dc636c3322cfa16e33a41398', 'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a43'}, 'PostHashHex': '13fc9904028fb62490a3b5dc2111689376e52a06dc636c3322cfa16e33a41398', 'PosterPublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'ParentStakeID': '', 'Body': 'Need hashtags ####! Or else it’s a bit difficult to create personal brand and niche on this platform. \n\nDevs are u listening?', 'ImageURLs': [], 'RecloutedPostEntryResponse': None, 'CreatorBasisPoints': 0, 'StakeMultipleBasisPoints': 12500, 'TimestampNanos': 1.6177643730879583e+18, 'IsHidden': False, 'ConfirmationBlockHeight': 13248, 'InMempool': False, 'StakeEntry': {'TotalPostStake': 0, 'StakeList': []}, 'StakeEntryStats': {'TotalStakeNanos': 0, 'TotalStakeOwedNanos': 0, 'TotalCreatorEarningsNanos': 0, 'TotalFeesBurnedNanos': 0, 'TotalPostStakeNanos': 0}, 'ProfileEntryResponse': None, 'Comments': None, 'LikeCount': 5, 'PostEntryReaderState': None, 'InGlobalFeed': False, 'IsPinned': False, 'PostExtraData': {}, 'CommentCount': 2, 'RecloutCount': 0, 'ParentPosts': None, 'PublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'Username': ''}},
{'hashtag': 'investementstrategy', 'PostHashHex': '92f2d08ac8f2b47fe5868b748c7f472e13ad12c284bb0e327cf317b4c2514f83', 'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a3f'}, 'PostHashHex': '92f2d08ac8f2b47fe5868b748c7f472e13ad12c284bb0e327cf317b4c2514f83', 'PosterPublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'ParentStakeID': '', 'Body': 'Don’t say that you are going to buy ur own coin to have a steady growth of ur coin \U0001fa99. That doesn’t show the strength of ur investment nor the coin.📉📈 Strength lies in others believing in ur talent, creativity and passion enough to invest in U. 🚀🚀🚀\n#InvestementStrategy', 'ImageURLs': [], 'RecloutedPostEntryResponse': None, 'CreatorBasisPoints': 0, 'StakeMultipleBasisPoints': 12500, 'TimestampNanos': 1.6178065064906166e+18, 'IsHidden': False, 'ConfirmationBlockHeight': 13397, 'InMempool': False, 'StakeEntry': {'TotalPostStake': 0, 'StakeList': []}, 'StakeEntryStats': {'TotalStakeNanos': 0, 'TotalStakeOwedNanos': 0, 'TotalCreatorEarningsNanos': 0, 'TotalFeesBurnedNanos': 0, 'TotalPostStakeNanos': 0}, 'ProfileEntryResponse': None, 'Comments': None, 'LikeCount': 2, 'PostEntryReaderState': None, 'InGlobalFeed': False, 'IsPinned': False, 'PostExtraData': {}, 'CommentCount': 1, 'RecloutCount': 0, 'ParentPosts': None, 'PublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'Username': ''}},
{'hashtag': 'productivity', 'PostHashHex': 'c8fabd96f5d624d06ec8d23e90de19cf07ad4b6696dac321fda815c3000fbf1b', 'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a3d'}, 'PostHashHex': 'c8fabd96f5d624d06ec8d23e90de19cf07ad4b6696dac321fda815c3000fbf1b', 'PosterPublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'ParentStakeID': '', 'Body': 'What is the most productive thing u have done in last 24 hours apart from Bitclout???\n\n\U0001f9d0πŸ˜πŸ™ŒπŸΌ #productivity', 'ImageURLs': [], 'RecloutedPostEntryResponse': None, 'CreatorBasisPoints': 0, 'StakeMultipleBasisPoints': 12500, 'TimestampNanos': 1.6178362054980055e+18, 'IsHidden': False, 'ConfirmationBlockHeight': 13487, 'InMempool': False, 'StakeEntry': {'TotalPostStake': 0, 'StakeList': []}, 'StakeEntryStats': {'TotalStakeNanos': 0, 'TotalStakeOwedNanos': 0, 'TotalCreatorEarningsNanos': 0, 'TotalFeesBurnedNanos': 0, 'TotalPostStakeNanos': 0}, 'ProfileEntryResponse': None, 'Comments': None, 'LikeCount': 30, 'PostEntryReaderState': None, 'InGlobalFeed': True, 'IsPinned': False, 'PostExtraData': {}, 'CommentCount': 59, 'RecloutCount': 0, 'ParentPosts': None, 'PublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'Username': ''}}]
When I try to insert this data as insert_many() into mongodb I get the following error:
File "test.py", line X, in pushHashtagPosts
data = loads(hashtagPosts)
TypeError: the JSON object must be str, bytes or bytearray, not 'list'
However, I inserted the line 'data = loads(hashtagPosts)' based on the solution at "bson.errors.InvalidDocument: key '$oid' must not start with '$' trying to insert document with pymongo", because without the 'data = loads(hashtagPosts)' I was getting the following error:
bson.errors.InvalidDocument: key '$oid' must not start with '$'
How to resolve this and successfully insert many documents in the collection?
Your issue is that hashtagPosts is a list, but loads expects to work on a string.
So working backwards, the question becomes: how did you construct hashtagPosts in the first place? As it contains $oid values, it looks like an output from dumps; but an output from dumps is a string, not a list. So how did it become a list?
If you are creating it manually, then just set it using ObjectId, e.g.
from bson import ObjectId
item = {'_id': ObjectId('608f8eb73718c7977f9c0a43')}
and then you won't need to use loads.
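If the list comes from somewhere else and already contains {'$oid': ...} wrappers, one option is to walk the structure and replace each wrapper before calling insert_many. The following is a minimal sketch under that assumption; replace_oid_wrappers and its make_object_id parameter are hypothetical names. In real use you would pass bson.ObjectId as make_object_id; the demo passes a plain stand-in so that it runs without pymongo installed.

```python
def replace_oid_wrappers(value, make_object_id):
    """Recursively replace {'$oid': hex_string} dicts with make_object_id(hex_string)."""
    if isinstance(value, dict):
        if set(value) == {"$oid"}:
            return make_object_id(value["$oid"])
        return {k: replace_oid_wrappers(v, make_object_id) for k, v in value.items()}
    if isinstance(value, list):
        return [replace_oid_wrappers(item, make_object_id) for item in value]
    return value  # scalars (str, int, None, ...) pass through unchanged

# Stand-in for bson.ObjectId so the sketch has no third-party dependency.
def fake_object_id(hex_string):
    return ("ObjectId", hex_string)

posts = [
    {"hashtag": "productivity",
     "post": {"_id": {"$oid": "608f8eb73718c7977f9c0a3d"}, "LikeCount": 30}},
]

converted = replace_oid_wrappers(posts, fake_object_id)
print(converted)
```

The function builds new dictionaries and lists instead of modifying its input, so the original hashtagPosts list stays untouched.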

Using JSON to format and indent website request responses in Python

I am trying to indent and sort the contents of balance so that it is easier to read. I want to print the response in the style of the expected output below. The balance variable is of type tuple. How could I do such a thing?
import bybit
import json
balance = client.Wallet.Wallet_getBalance(coin="BTC").result()
print(balance)
Output:
({'ret_code': 0, 'ret_msg': 'OK', 'ext_code': '', 'ext_info': '', 'result': {'BTC': {'equity': 0.00208347, 'available_balance': 0.00208347, 'used_margin': 0, 'order_margin': 0, 'position_margin': 0, 'occ_closing_fee': 0, 'occ_funding_fee': 0, 'wallet_balance': 0.00208347, 'realised_pnl': 0, 'unrealised_pnl': 0, 'cum_realised_pnl': 8.347e-05, 'given_cash': 0, 'service_cash': 0}}, 'time_now': '1616685310.655072', 'rate_limit_status': 118, 'rate_limit_reset_ms': 1616685310652, 'rate_limit': 120}, <bravado.requests_client.RequestsResponseAdapter object at 0x000001F5E92EB048>)
Expected Output:
{
"cross_seq": 11518,
"data": [
{
"price": "2999.00",
"side": "Buy",
"size": 9,
"symbol": "BTCUSD"
},
{
"price": "3001.00",
"side": "Sell",
"size": 10,
"symbol": "BTCUSD"
}
],
"timestamp_e6": 1555647164875373,
"topic": "orderBookL2_25.BTCUSD",
"type": "snapshot"
}
I think you provided the wrong expected output, since the fields in your output and your expected output don't match. In general, though, if you want a better display of a dictionary you can use the json package:
response = {'ret_code': 0, 'ret_msg': 'OK', 'ext_code': '', 'ext_info': '', 'result': {'BTC': {'equity': 0.00208347, 'available_balance': 0.00208347, 'used_margin': 0, 'order_margin': 0, 'position_margin': 0, 'occ_closing_fee': 0, 'occ_funding_fee': 0, 'wallet_balance': 0.00208347, 'realised_pnl': 0, 'unrealised_pnl': 0, 'cum_realised_pnl': 8.347e-05, 'given_cash': 0, 'service_cash': 0}}, 'time_now': '1616685310.655072', 'rate_limit_status': 118, 'rate_limit_reset_ms': 1616685310652, 'rate_limit': 120}
import json
print(json.dumps(response, indent=4, sort_keys=True))
This will give you the following output:
{
    "ext_code": "",
    "ext_info": "",
    "rate_limit": 120,
    "rate_limit_reset_ms": 1616685310652,
    "rate_limit_status": 118,
    "result": {
        "BTC": {
            "available_balance": 0.00208347,
            "cum_realised_pnl": 8.347e-05,
            "equity": 0.00208347,
            "given_cash": 0,
            "occ_closing_fee": 0,
            "occ_funding_fee": 0,
            "order_margin": 0,
            "position_margin": 0,
            "realised_pnl": 0,
            "service_cash": 0,
            "unrealised_pnl": 0,
            "used_margin": 0,
            "wallet_balance": 0.00208347
        }
    },
    "ret_code": 0,
    "ret_msg": "OK",
    "time_now": "1616685310.655072"
}
Another solution is to use pprint:
import pprint
pprint.pprint(response)
This will give you the following output:
{'ext_code': '',
 'ext_info': '',
 'rate_limit': 120,
 'rate_limit_reset_ms': 1616685310652,
 'rate_limit_status': 118,
 'result': {'BTC': {'available_balance': 0.00208347,
                    'cum_realised_pnl': 8.347e-05,
                    'equity': 0.00208347,
                    'given_cash': 0,
                    'occ_closing_fee': 0,
                    'occ_funding_fee': 0,
                    'order_margin': 0,
                    'position_margin': 0,
                    'realised_pnl': 0,
                    'service_cash': 0,
                    'unrealised_pnl': 0,
                    'used_margin': 0,
                    'wallet_balance': 0.00208347}},
 'ret_code': 0,
 'ret_msg': 'OK',
 'time_now': '1616685310.655072'}
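Import json; then print(json.dumps(balance[0], indent=4)) would get you that format. Note the [0]: balance is a tuple whose second element is a response adapter object, which json cannot serialize.
You could add the keyword argument sort_keys=True if you want the keys sorted.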
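Putting the pieces together for the tuple from the question, as a minimal sketch. It assumes the first element of the tuple is the plain dictionary and the second is the HTTP response object, as the printed output suggests. The stand-in tuple below is made up so that the example runs without the bybit client.

```python
import json

# Stand-in for client.Wallet.Wallet_getBalance(coin="BTC").result():
# a (payload dict, response adapter) pair. object() plays the adapter here.
balance = (
    {"ret_code": 0, "ret_msg": "OK", "result": {"BTC": {"equity": 0.00208347}}},
    object(),
)

payload, _response = balance  # unpack; only the dict is JSON-serializable
formatted = json.dumps(payload, indent=4, sort_keys=True)
print(formatted)
```

Unpacking into two names makes it explicit which half of the tuple is being printed.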

Decoding Json online to string in Python

I want to decode this JSON:
https://deathsnacks.com/wf/data/voidtraders.json
[{u'Node': u'Kronia Relay (Saturn)', u'NodeIndex': 0, u'ManifestIndex': 0, u'Manifest': None, u'Activation': {u'usec': 0, u'sec': 1520604000}, u'Character': u"Baro'Ki Teel", u'Expiry': {u'usec': 0, u'sec': 1520773200}, u'_id': {u'id': u'5967933ca351963d1cd7faa5'}, u'Config': None, u'NextRotation': None}]
with Python and get the reply like this
Node: Kronia Relay (Saturn)
Activation: X min
Character: Baro'Ki Teel
Expiry: X min
etc etc
import requests, json, pprint
r = requests.get('https://deathsnacks.com/wf/data/voidtraders.json').text
data = json.loads(r)
pprint.pprint(data)
[{'Activation': {'sec': 1520604000, 'usec': 0},
  'Character': "Baro'Ki Teel",
  'Config': None,
  'Expiry': {'sec': 1520773200, 'usec': 0},
  'Manifest': None,
  'ManifestIndex': 0,
  'NextRotation': None,
  'Node': 'Kronia Relay (Saturn)',
  'NodeIndex': 0,
  '_id': {'id': '5967933ca351963d1cd7faa5'}}]
The decoded data is a list of dictionaries, so you can also iterate over the items of each dictionary in the list:
for entry in data:
    for key, value in entry.items():
        print('{}: {}'.format(key, value))
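To get the "X min" style from the question, the 'sec' fields can be compared with the current time. A minimal sketch, assuming 'sec' is a Unix timestamp in seconds (the values look like one, but the source does not say so); minutes_until and format_trader are hypothetical helper names, and the sample entry is copied from the output above:

```python
import time

def minutes_until(epoch_seconds, now=None):
    """Whole minutes from now until epoch_seconds (negative if already past)."""
    if now is None:
        now = time.time()
    return int((epoch_seconds - now) // 60)

def format_trader(entry, now=None):
    """Build the 'Key: value' lines asked for in the question."""
    return "\n".join([
        "Node: {}".format(entry["Node"]),
        "Activation: {} min".format(minutes_until(entry["Activation"]["sec"], now)),
        "Character: {}".format(entry["Character"]),
        "Expiry: {} min".format(minutes_until(entry["Expiry"]["sec"], now)),
    ])

entry = {
    "Node": "Kronia Relay (Saturn)",
    "Activation": {"sec": 1520604000, "usec": 0},
    "Character": "Baro'Ki Teel",
    "Expiry": {"sec": 1520773200, "usec": 0},
}

# A fixed "now" one hour before activation keeps the demo deterministic.
report = format_trader(entry, now=1520604000 - 3600)
print(report)
```

Leaving out the now argument uses the current clock, which is what a live script would normally want.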
