Python for loop to read a JSON file - python

I am trying to understand a Python for loop that is implemented as below:
samples = [(objectinstance.get('sample', record['token'])['timestamp'], record)
           for record in objectinstance.scene]
'scene' is a JSON file containing a list of dictionaries. Through the value of its token, each dictionary entry refers to another JSON file called 'sample', which contains a 'timestamp' key among other keys.
Although I can roughly understand it at a high level, I am not able to decipher how 'record' is being used here together with the output of the object's get method. I think this is some sort of list comprehension, but I am not sure. Can you help me understand this and also point me to any reference to understand this better? Thank you.

In non-comprehension form it is as below:
samples = []
for record in objectinstance.scene:
    data = (
        objectinstance.get('sample', record['token'])['timestamp'],
        record
    )
    samples.append(data)
objectinstance.get('sample', record['token']) looks like a method that takes two arguments and returns a JSON object/dictionary such as
{<key1>: <value1>, ..., 'timestamp': <somedata>, ..., <keyn>: <valuen>}
and you are saving record together with the timestamp value from this call.
objectinstance.get can be pictured like this:
class Tmp:
    def __init__(self):
        self.scene = [{'token': 'a'}, {'token': 'b'}, {'token': 'c'}]

    def get(self, arg1, arg2):
        # calculation: look up the entry of table arg1 whose token is arg2
        result = {'token': arg2, 'timestamp': 0}  # placeholder value
        return result

objectinstance = Tmp()
samples = []
for record in objectinstance.scene:
    object_instance_data = objectinstance.get('sample', record['token'])
    data = object_instance_data['timestamp']
    samples.append(data)
So, as you can see, the object's class has a method named get, which takes two arguments and uses them in a calculation to give you a result as a dict/JSON object that has 'timestamp' as one of its keys.

Yes, you are right, it is a list comprehension. Schematically, it is something like this:
samples = [(timestamp, item) for item in list_of_dicts]
The result will be a list of tuples, where objectinstance.get('sample', record['token'])['timestamp'] is the first entry and record is the second.
Moreover, if objectinstance is a dict, objectinstance.get(key, default) gets key from the dict and returns the default value if the key is not present, cf. the documentation at python.org. Since objectinstance also has a scene attribute, it may instead be an instance of a custom class that defines its own get method. Either way, the result of the get method seems to be a dict as well, from which the value of the key 'timestamp' is retrieved.
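To make the comprehension concrete, here is a minimal runnable sketch. The class Dataset, its sample table and the timestamp values are made up for illustration; only the shape of the get(table, token) call is taken from the question.

```python
class Dataset:
    """Hypothetical stand-in for objectinstance."""

    def __init__(self):
        # Each scene record points at a sample through its token.
        self.scene = [{'token': 'a'}, {'token': 'b'}]
        # Made-up sample table, keyed by token.
        self._tables = {
            'sample': {
                'a': {'token': 'a', 'timestamp': 20},
                'b': {'token': 'b', 'timestamp': 10},
            }
        }

    def get(self, table_name, token):
        # Return the record of the given table that has this token.
        return self._tables[table_name][token]


objectinstance = Dataset()
samples = [(objectinstance.get('sample', record['token'])['timestamp'], record)
           for record in objectinstance.scene]
print(samples)

# Having the timestamp first makes it easy to pick the earliest record.
earliest = min(samples, key=lambda pair: pair[0])
print(earliest)
```

Each element of samples is a (timestamp, record) tuple, so record is not the output of get; it is simply the loop variable stored next to the looked-up timestamp.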

Related

Reading from nested json and getting None type Error -> try/except

I am reading data from nested json with this code:
data = json.loads(json_file.json)
for nodesUni in data["data"]["queryUnits"]['nodes']:
    try:
        tm = (nodesUni['sql']['busData'][0]['engine']['engType'])
    except:
        tm = ''
    try:
        to = (nodesUni['sql']['carData'][0]['engineData']['producer']['engName'])
    except:
        to = ''
    json_output_for_one_GU_owner = {
        "EngineType": tm,
        "EngineName": to,
    }
I am having an issue with a NoneType error (e.g. nodesUni['sql']['busData'][0]['engine']['engType'] doesn't exist at all because there is no data), so I am using try/except. But my code is more complex, and having a try/except for every value is crazy. Is there any other way to deal with this?
Error: "TypeError: 'NoneType' object is not subscriptable"
This is non-trivial, as your requirement is to traverse the dictionaries without errors and get an empty string value in the end, all in a very simple expression like cascading the [] operators.
First method
My approach is to add a hook when loading the JSON file, so that it creates default dictionaries recursively, to any depth:
import collections, json

def superdefaultdict():
    return collections.defaultdict(superdefaultdict)

def hook(s):
    c = superdefaultdict()
    c.update(s)
    return c

data = json.loads('{"foo":"bar"}', object_hook=hook)
print(data["x"][0]["zzz"])  # doesn't exist
print(data["foo"])  # exists
prints:
defaultdict(<function superdefaultdict at 0x000001ECEFA47160>, {})
bar
When accessing some combination of keys that don't exist (at any level), superdefaultdict recursively creates a defaultdict of itself (this is a nice pattern; you can read more about it in "Is there a standard class for an infinitely nested defaultdict?"), allowing any number of non-existing key levels.
Now the only drawback is that it returns a defaultdict(<function superdefaultdict at 0x000001ECEFA47160>, {}), which is ugly. So
print(data["x"][0]["zzz"] or "")
prints an empty string if the dictionary is empty. That should suffice for your purpose.
Use it like this in your context:
def superdefaultdict():
    return collections.defaultdict(superdefaultdict)

def hook(s):
    c = superdefaultdict()
    c.update(s)
    return c

data = json.loads(json_file.json, object_hook=hook)
for nodesUni in data["data"]["queryUnits"]['nodes']:
    tm = nodesUni['sql']['busData'][0]['engine']['engType'] or ""
    to = nodesUni['sql']['carData'][0]['engineData']['producer']['engName'] or ""
Drawbacks:
It creates a lot of empty dictionaries in your data object. This shouldn't be a problem (unless you're very low on memory), as the object isn't dumped to a file afterwards (where the non-existent values would appear).
If a value already exists and is not a dictionary (including a JSON null, which loads as None), trying to access it as a dictionary crashes the program.
Also, if some value is 0 or an empty list, the or operator will pick "". This can be worked around with another wrapper that tests whether the object is an empty superdefaultdict instead. Less elegant but doable.
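The wrapper mentioned in the last drawback could look like the sketch below. The helper name value_or_empty is made up for illustration; it returns "" only when the object is an empty defaultdict (that is, a placeholder created for a missing key) and passes every other value through, including 0 and empty lists.

```python
import collections
import json


def superdefaultdict():
    return collections.defaultdict(superdefaultdict)


def hook(s):
    c = superdefaultdict()
    c.update(s)
    return c


def value_or_empty(obj):
    # Hypothetical helper: treat only an empty defaultdict as "missing".
    if isinstance(obj, collections.defaultdict) and not obj:
        return ""
    return obj


data = json.loads('{"foo": "bar", "count": 0, "items": []}', object_hook=hook)
print(repr(value_or_empty(data["x"][0]["zzz"])))  # missing path
print(repr(value_or_empty(data["count"])))        # real value 0 is kept
print(repr(value_or_empty(data["items"])))        # real empty list is kept
```

One limitation of this sketch: a genuinely empty JSON object ({}) is also loaded as an empty defaultdict, so it cannot be told apart from a missing key.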
Second method
Convert the chain of successive dictionary accesses to a string (for instance, just double-quote your expression, like "['sql']['busData'][0]['engine']['engType']"), parse it, and loop over the keys to get the data. If there's an exception, stop and return an empty string.
import json, re

def get(key, data):
    # quoted parts are dict keys, unquoted parts are list indices
    key_parts = [x.strip("'") if x.startswith("'") else int(x) for x in re.findall(r"\[([^\]]*)\]", key)]
    try:
        for k in key_parts:
            data = data[k]
        return data
    except (KeyError, IndexError, TypeError):
        return ""
Testing with some simple data:
data = json.loads('{"foo":"bar","hello":{"a":12}}')
print(get("['sql']['busData'][0]['engine']['engType']",data))
print(get("['hello']['a']",data))
print(get("['hello']['a']['e']",data))
We get an empty string (some keys are missing), 12 (the path is valid), and an empty string (we tried to traverse an existing value that is not a dict).
The syntax could be simplified (e.g. "sql"."busData".0."engine"."engType") but it would still have to retain a way to differentiate keys (strings) from indices (integers).
The second approach is probably the most flexible one.
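One way to avoid parsing a string at all is to pass the path as separate arguments, so that Python itself distinguishes string keys from integer indices. The following is a sketch of that variation, not the code from the answer above; the function name get_path and the default parameter are made up for illustration.

```python
import json


def get_path(data, *path, default=""):
    # Follow the keys/indices in order; fall back to the default on any failure.
    try:
        for key in path:
            data = data[key]
        return data
    except (KeyError, IndexError, TypeError):
        return default


data = json.loads('{"foo": "bar", "hello": {"a": 12}, "nothing": null}')
print(get_path(data, 'sql', 'busData', 0, 'engine', 'engType'))  # missing keys
print(get_path(data, 'hello', 'a'))                              # valid path
print(get_path(data, 'nothing', 'x'))                            # None is not subscriptable
```

Catching TypeError is what covers the "'NoneType' object is not subscriptable" case from the question, since a JSON null is loaded as None.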

Pythonic way to populate a dictionary from list of records

Background
I have a module called db.py that basically consists of wrapper functions that make calls to the db. I have a table called nba that has columns like player_name, age, player_id, etc.
I have a simple function called db_cache() where I make a call to the db table and request all the player ids. The output of the response looks something like this:
[Record(player_id='31200952409069'), Record(player_id='31201050710077'), Record(player_id='31201050500545'), Record(player_id='31001811412442'), Record(player_id='31201050607711')]
Then I simply iterate through the list and dump each item inside a dictionary.
I am wondering if there is a more pythonic way to populate the dictionary?
My code
def db_cache():
    my_dict: Dict[str, None] = {}
    response = db.run_query(sql="SELECT player_id FROM nba")
    for item in response:
        my_dict[item.player_id] = None
    return my_dict

my_dict = db_cache()
This is built-in to the dict type:
>>> help(dict.fromkeys)
Help on built-in function fromkeys:

fromkeys(iterable, value=None, /) method of builtins.type instance
    Create a new dictionary with keys from iterable and values set to value.
The value we want is the default of None, so all we need is:
my_dict = dict.fromkeys(item.player_id for item in db.run_query(sql="SELECT player_id FROM nba"))
Note that the value will be reused, and not copied, which can cause problems if you want to use a mutable value. In these cases, you should instead simply use a dict comprehension, as given in @AvihayTsayeg's answer.
my_arr = [1,2,3,4]
my_dict = {item: None for item in my_arr}
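The pitfall with mutable values can be seen in a small sketch. The player ids below are taken from the sample output in the question; using lists as values is an assumption made only to show the difference.

```python
player_ids = ['31200952409069', '31201050710077']

# dict.fromkeys reuses the very same list object for every key.
shared = dict.fromkeys(player_ids, [])
shared['31200952409069'].append('x')
print(shared['31201050710077'])  # the other key sees the change too

# A dict comprehension evaluates the value expression once per key.
separate = {player_id: [] for player_id in player_ids}
separate['31200952409069'].append('x')
print(separate['31201050710077'])  # still empty
```

With None as the value, as in the question, the sharing is harmless because None is immutable.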

Thingspeak: Parse json response with Python

I would like to create an Alexa skill using Python to use data uploaded by sensors to Thingspeak. The cases where I only use one specific value are quite easy: the response from Thingspeak is the value only. When I want to use several values, in my case to sum up the atmospheric pressure to determine tendencies, the response is a JSON object like this:
{"channel":{"id":293367,"name":"Weather Station","description":"My first attempt to build a weather station based on an ESP8266 and some common sensors.","latitude":"51.473509","longitude":"7.355569","field1":"humidity","field2":"pressure","field3":"lux","field4":"rssi","field5":"temp","field6":"uv","field7":"voltage","field8":"radiation","created_at":"2017-06-25T07:35:37Z","updated_at":"2018-08-04T12:11:22Z","elevation":"121","last_entry_id":1812},"feeds":
[{"created_at":"2018-10-21T18:11:45Z","entry_id":1713,"field2":"1025.62"},
{"created_at":"2018-10-21T18:12:05Z","entry_id":1714,"field2":"1025.58"},
{"created_at":"2018-10-21T18:12:25Z","entry_id":1715,"field2":"1025.56"},
{"created_at":"2018-10-21T18:12:45Z","entry_id":1716,"field2":"1025.65"},
{"created_at":"2018-10-21T18:13:05Z","entry_id":1717,"field2":"1025.58"},
{"created_at":"2018-10-21T18:13:25Z","entry_id":1718,"field2":"1025.63"}]}
I have now started with
f = urllib.urlopen(link) # Get your data
json_object = json.load(f)
for entry in json_object[0]:
    print(entry["field2"])
The json object is a bit recursive, it is a list containing a list with an element with an array as the value.
Now I am not quite sure how to iterate over the values of the key "field2" in the array. I am quite new to Python and also to JSON. Perhaps someone can help me out?
Thanks in advance!
This has nothing to do with JSON: once the JSON string is parsed by json.load(), what you get is a plain Python object (usually a dict, sometimes a list, rarely, though this would be legal, a string, int, float, boolean or None).
"it is a list containing a list with an element with an array as the value."
Actually it's a dict with two keys, "channel" and "feeds". The first one has another dict as its value, and the second a list of dicts. How to use dicts and lists is extensively documented, FWIW:
https://docs.python.org/3/tutorial/datastructures.html#dictionaries
https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
https://docs.python.org/3/tutorial/introduction.html#lists
https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
Here the values you're looking for are stored under the "field2" keys of the dicts in the list under the "feeds" key, so what you want is:
# get the list stored under the "feeds" key
feeds = json_object["feeds"]
# iterate over the list:
for feed in feeds:
    # get the value for the "field2" key
    print(feed["field2"])
You have a dictionary. Use the key to access the value.
Ex:
json_object = {"channel":{"id":293367,"name":"Weather Station","description":"My first attempt to build a weather station based on an ESP8266 and some common sensors.","latitude":"51.473509","longitude":"7.355569","field1":"humidity","field2":"pressure","field3":"lux","field4":"rssi","field5":"temp","field6":"uv","field7":"voltage","field8":"radiation","created_at":"2017-06-25T07:35:37Z","updated_at":"2018-08-04T12:11:22Z","elevation":"121","last_entry_id":1812},"feeds":
[{"created_at":"2018-10-21T18:11:45Z","entry_id":1713,"field2":"1025.62"},
{"created_at":"2018-10-21T18:12:05Z","entry_id":1714,"field2":"1025.58"},
{"created_at":"2018-10-21T18:12:25Z","entry_id":1715,"field2":"1025.56"},
{"created_at":"2018-10-21T18:12:45Z","entry_id":1716,"field2":"1025.65"},
{"created_at":"2018-10-21T18:13:05Z","entry_id":1717,"field2":"1025.58"},
{"created_at":"2018-10-21T18:13:25Z","entry_id":1718,"field2":"1025.63"}]}
for entry in json_object["feeds"]:
    print(entry["field2"])
Output:
1025.62
1025.58
1025.56
1025.65
1025.58
1025.63
I just figured it out; it worked just as expected.
You have to get the entries array from the dict and then iterate over the list of items and print the value for the key field2.
# Get entries from the response
entries = json_object["feeds"]
# Iterate through each measurement and print value
for entry in entries:
    print(entry['field2'])
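Since the stated goal is to work with the pressure values rather than just print them, note that Thingspeak returns them as strings, so they need converting to float first. Below is a Python 3 sketch using a shortened copy of the response from the question; measuring the tendency as the difference between the last and first reading is an assumption made for illustration.

```python
import json

# Shortened copy of the response shown in the question.
response_text = '''
{"channel": {"id": 293367, "field2": "pressure"},
 "feeds": [
  {"created_at": "2018-10-21T18:11:45Z", "entry_id": 1713, "field2": "1025.62"},
  {"created_at": "2018-10-21T18:12:05Z", "entry_id": 1714, "field2": "1025.58"},
  {"created_at": "2018-10-21T18:12:25Z", "entry_id": 1715, "field2": "1025.56"}
 ]}
'''

json_object = json.loads(response_text)

# Convert each reading to a number, skipping entries without a value.
pressures = [float(feed["field2"])
             for feed in json_object["feeds"]
             if feed.get("field2") is not None]

average = sum(pressures) / len(pressures)
tendency = pressures[-1] - pressures[0]  # negative means falling pressure
print(round(average, 2), round(tendency, 2))
```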

Concise way to convert multiple Date Time to seconds?

I am looking for a way to write the code below in a more concise manner. I thought about trying df[timemonths] = pd.to_timedelta(df[timemonths]), but it did not work (arg must be a string, timedelta, list, tuple, 1-d array, or Series).
Appreciate any help. Thanks
timemonths = ['TimeFromPriorRTtoSRS', 'TimetoAcuteG3','TimetoLateG3',
              'TimeSRStoLastFUDeath','TimeDiagnosistoLastFUDeath',
              'TimetoRecurrence']
monthsec = 2.628e6 # to convert to months
df.TimetoLocalRecurrence = pd.to_timedelta(df.TimetoLocalRecurrence).dt.total_seconds()/monthsec
df.TimeFromPriorRTtoSRS = pd.to_timedelta(df.TimeFromPriorRTtoSRS).dt.total_seconds()/monthsec
df.TimetoAcuteG3 = pd.to_timedelta(df.TimetoAcuteG3).dt.total_seconds()/monthsec
df.TimetoLateG3 = pd.to_timedelta(df.TimetoLateG3).dt.total_seconds()/monthsec
df.TimeSRStoLastFUDeath = pd.to_timedelta(df.TimeSRStoLastFUDeath).dt.total_seconds()/monthsec
df.TimeDiagnosistoLastFUDeath = pd.to_timedelta(df.TimeDiagnosistoLastFUDeath).dt.total_seconds()/monthsec
df.TimetoRecurrence = pd.to_timedelta(df.TimetoRecurrence).dt.total_seconds()/monthsec
You could write your operation as a lambda function and then apply it to the relevant columns:
timemonths = ['TimeFromPriorRTtoSRS', 'TimetoAcuteG3','TimetoLateG3',
              'TimeSRStoLastFUDeath','TimeDiagnosistoLastFUDeath',
              'TimetoRecurrence']
monthsec = 2.628e6
convert_to_months = lambda x: pd.to_timedelta(x).dt.total_seconds()/monthsec
df[timemonths] = df[timemonths].apply(convert_to_months)
Granted, I am kind of guessing here, since you haven't provided any example data to work with.
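The original attempt fails because pd.to_timedelta expects one-dimensional input, while df[timemonths] is a two-dimensional DataFrame; DataFrame.apply sidesteps this by passing one column (a Series) at a time. Here is a small sketch with made-up sample data (the column values are assumptions, since the question shows none):

```python
import pandas as pd

# Hypothetical sample data: durations stored as timedelta-like strings.
df = pd.DataFrame({
    "TimetoAcuteG3": ["30 days", "60 days"],
    "TimetoLateG3": ["90 days", "0 days"],
    "Other": [1, 2],  # a column that should stay untouched
})

timemonths = ["TimetoAcuteG3", "TimetoLateG3"]
monthsec = 2.628e6  # seconds per month, as in the question


def convert_to_months(column):
    # column is a single Series, which pd.to_timedelta accepts.
    return pd.to_timedelta(column).dt.total_seconds() / monthsec


df[timemonths] = df[timemonths].apply(convert_to_months)
print(df)
```

A named function is used here instead of a lambda only for readability; the behaviour is the same.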
Iterate over vars() of df
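Disclaimer: this solution will most likely only work if df is a plain class instance whose only attributes are the time columns. It will not work as-is on a pandas DataFrame, where vars(df) returns internal attributes rather than the columns.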
The way this works is by simply moving the repetitive code after the = to a function.
def convert(times):
    monthsec = 2.628e6
    return {
        key: pd.to_timedelta(value).dt.total_seconds()/monthsec
        for key, value in times.items()
    }
Now we have to apply this function to each variable.
The problem here is that it can be tedious to apply it to each variable individually. We could use your list timemonths to apply it based on the keys; however, this requires us to create an array of keys manually, like so:
timemonths = ['TimeFromPriorRTtoSRS', 'TimetoAcuteG3','TimetoLateG3', 'TimeSRStoLastFUDeath','TimeDiagnosistoLastFUDeath', 'TimetoRecurrence']
And this can be annoying, especially if you add or remove some, because you have to keep updating this array.
So instead, let's dynamically iterate over every variable in df:
for key, value in convert(vars(df)).items():
    setattr(df, key, value)
Full Code:
def convert(times):
    monthsec = 2.628e6
    return {
        key: pd.to_timedelta(value).dt.total_seconds()/monthsec
        for key, value in times.items()
    }

for key, value in convert(vars(df)).items():
    setattr(df, key, value)
Sidenote
The reason I am using setattr is that, when examining your code, I came to the conclusion that df was most likely a class instance, and as such, properties (by this I mean variables like self.variable = ...) of a class instance must be modified via setattr and not df['variable'] = ....

Append all fields returned as attributes of a pyodbc cursor to a user-defined data type

I have code along these lines:
classinstance.col1 = queryresult.col1
classinstance.col2 = queryresult.col2
classinstance.col3 = queryresult.col3
classinstance.col4 = queryresult.col4
Which adds variables to the classinstance and assigns the values of the queryresult column with the same name as the variable.
I am hoping to make my code a little more flexible, and not need to identify the columns by name. To this end, I was wondering if there was some way to loop over all the columns, rather than handle each one individually. Something like this (this is pseudocode rather than actual code, since I'm not sure what it should actually look like):
for each var in vars(queryresult):
    classinstance.(var.name) = var.value
Is this possible? What does it require? Is there some fundamental misunderstanding on my part?
I'm assuming there's only one row in the result for the following example (built with help from comments here). The key component here is zip(row.cursor_description, row), used to get the column names from a pyodbc.Row object.
# convert row to a dict, assuming row variable contains query result
rowdict = { key[0]:value for (key, value) in zip(row.cursor_description, row) }
# loop through keys (equivalent to column names) and set class instance attributes
# assumes existing instance of class is variable classinstance
for column in rowdict.keys():
    setattr(classinstance, column, rowdict[column])
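For a runnable illustration without a database server, the sketch below uses the standard-library sqlite3 module as a stand-in for pyodbc. Both follow the Python DB-API, where each entry of cursor.description is a tuple whose first item is the column name. The table, its values and the Result class are made up for illustration.

```python
import sqlite3


class Result:
    """Hypothetical user-defined type that receives the columns."""


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 REAL)")
conn.execute("INSERT INTO t VALUES (1, 'two', 3.0)")

cursor = conn.execute("SELECT col1, col2, col3 FROM t")
row = cursor.fetchone()

classinstance = Result()
# Pair each column name with its value and set it as an attribute.
for column_info, value in zip(cursor.description, row):
    setattr(classinstance, column_info[0], value)

print(vars(classinstance))
conn.close()
```

setattr is the working equivalent of the classinstance.(var.name) = var.value line in the pseudocode: it assigns an attribute whose name is only known at run time.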
