When I wish to both retrieve and replace a value in a dict, I naively write:
old_value = my_dict['key']
my_dict['key'] = new_value
But... that's two lookups for 'key' in my_dict's hash table, and I'm sure that only one is necessary.
How do I get the same behaviour with only one lookup?
Does Python automatically JIT-optimize this away?
[EDIT]: I am aware that Python dict lookups are cheap and that the performance gain would be negligible unless my_dict is huge or the operation is done billions of times a millisecond.
I am just curious whether this apparently-basic feature is implemented in Python, something like old_value = my_dict.retrieve_and_replace('key', new_value).
Storing a reference rather than a value in the dict will do what you want. This isn't intended to be an elegant demonstration, just a simple one:
>>> class MyMutableObject(object):
...     pass
>>> m = MyMutableObject()
>>> m.value = "old_value"
>>> my_dict = {}
>>> my_dict["k"] = m
Now when you want to change my_dict["k"] to a new value but remember the old one, with a single lookup on "k":
>>> m2 = my_dict["k"]
>>> m2.value
'old_value'
>>> m2.value = 'new_value'
It's up to you to decide whether the price this pays in complexity is worth saving one dictionary lookup. Dereferencing m2.value and assigning it afresh will cost two more dictionary lookups under the hood.
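If you want this behind a tidier interface, the same idea can be wrapped in a small helper class. A minimal sketch (Cell and its swap method are my names, not a standard API):

class Cell(object):
    # Mutable container so the dict lookup happens only once.
    __slots__ = ('value',)

    def __init__(self, value):
        self.value = value

    def swap(self, new_value):
        # Exchange the stored value and return the old one.
        old, self.value = self.value, new_value
        return old

my_dict = {'key': Cell('old_value')}
old_value = my_dict['key'].swap('new_value')  # a single lookup on 'key'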
Related
I was just playing around with Python's dictionaries and lists, and I found something weird:
at the time of initializing a Python dict, list, or some other data type, you can't chain an operation of that type onto the literal and keep the result.
# Case 1
d1 = {}
d1.update({'a': 'A'})
print(d1)
# Case 2
d2 = {}.update({'p': 'P'})
print(d2)
Output:
{'a': 'A'}
None
The weird thing is, d2 was neither initialized nor was any error thrown.
What do I think about this?
Well, "Python's interpreter reads code line by line". So, when it reads the line d1 = {}, it saves d1 and its type (dict) in memory.
But this is not happening with d2 = {}.update({'p': 'P'}).
Any dict operation can be performed on a dict object, which in the second case was never initiated, i.e. the dictionary object was never created.
What do you think about this?
Please drop your answers and correct me if I was wrong. Which, I guess, I may be.
dict.update() is an operation which has no return value.
Well, .update() does the dictionary update in place and returns nothing, so when you initialise an empty dictionary and update it in one expression, the update returns None and None gets assigned to d2. Instead, write it as:
d = {}
d.update({'p': 'P'})
print(d)
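If you really do want the creation and the update in a single expression, dict unpacking (available since Python 3.5, PEP 448) evaluates to the merged dict rather than to None:

d2 = {**{}, **{'p': 'P'}}  # the empty {} is redundant; kept only to mirror the original
print(d2)  # {'p': 'P'}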
I am looking for a way to write the code below in a more concise manner. I thought about trying df[timemonths] = pd.to_timedelta(df[timemonths])...
but it did not work (arg must be a string, timedelta, list, tuple, 1-d array, or Series).
Appreciate any help. Thanks
timemonths = ['TimeFromPriorRTtoSRS', 'TimetoAcuteG3', 'TimetoLateG3',
              'TimeSRStoLastFUDeath', 'TimeDiagnosistoLastFUDeath',
              'TimetoRecurrence']
monthsec = 2.628e6 # to convert to months
df.TimetoLocalRecurrence = pd.to_timedelta(df.TimetoLocalRecurrence).dt.total_seconds()/monthsec
df.TimeFromPriorRTtoSRS = pd.to_timedelta(df.TimeFromPriorRTtoSRS).dt.total_seconds()/monthsec
df.TimetoAcuteG3 = pd.to_timedelta(df.TimetoAcuteG3).dt.total_seconds()/monthsec
df.TimetoLateG3 = pd.to_timedelta(df.TimetoLateG3).dt.total_seconds()/monthsec
df.TimeSRStoLastFUDeath = pd.to_timedelta(df.TimeSRStoLastFUDeath).dt.total_seconds()/monthsec
df.TimeDiagnosistoLastFUDeath = pd.to_timedelta(df.TimeDiagnosistoLastFUDeath).dt.total_seconds()/monthsec
df.TimetoRecurrence = pd.to_timedelta(df.TimetoRecurrence).dt.total_seconds()/monthsec
You could write your operation as a lambda function and then apply it to the relevant columns:
timemonths = ['TimeFromPriorRTtoSRS', 'TimetoAcuteG3', 'TimetoLateG3',
              'TimeSRStoLastFUDeath', 'TimeDiagnosistoLastFUDeath',
              'TimetoRecurrence']
monthsec = 2.628e6
convert_to_months = lambda x: pd.to_timedelta(x).dt.total_seconds()/monthsec
df[timemonths] = df[timemonths].apply(convert_to_months)
Granted I am kind of guessing here since you haven't provided any example data to work with.
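For instance, with a small frame of duration strings (data invented here purely for illustration, since none was posted):

import pandas as pd

df = pd.DataFrame({'TimetoRecurrence': ['30 days', '365 days'],
                   'TimetoAcuteG3': ['60 days', '90 days']})

monthsec = 2.628e6  # seconds per (average) month
convert_to_months = lambda x: pd.to_timedelta(x).dt.total_seconds() / monthsec

cols = ['TimetoRecurrence', 'TimetoAcuteG3']
df[cols] = df[cols].apply(convert_to_months)
print(df)  # durations now expressed in approximate months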
Iterate over vars() of df
Disclaimer: this solution will most likely only work if the df class doesn't have any other variables.
The way this works is by simply moving the repetitive code after the = to a function.
def convert(times):
    monthsec = 2.628e6
    return {
        key: pd.to_timedelta(value).dt.total_seconds() / monthsec
        for key, value in times.items()
    }
Now we have to apply this function to each variable.
The problem here is that applying it to each variable individually can be tedious. We could use your timemonths list to apply it based on the keys; however, this requires us to create the array of keys manually, like so:
timemonths = ['TimeFromPriorRTtoSRS', 'TimetoAcuteG3','TimetoLateG3', 'TimeSRStoLastFUDeath','TimeDiagnosistoLastFUDeath', 'TimetoRecurrence']
And this can be annoying, especially if you add more columns or take some away, because you have to keep updating this array.
So instead, let's dynamically iterate over every variable in df:
for key, value in convert(vars(df)).items():
    setattr(df, key, value)
Full Code:
def convert(times):
    monthsec = 2.628e6
    return {
        key: pd.to_timedelta(value).dt.total_seconds() / monthsec
        for key, value in times.items()
    }

for key, value in convert(vars(df)).items():
    setattr(df, key, value)
Sidenote
The reason I am using setattr is that, when examining your code, I came to the conclusion that df is most likely a class instance, and as such its properties (by this I mean variables like self.variable = ...) must be modified via setattr and not df['variable'] = ....
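A tiny self-contained demonstration of the vars()/setattr round-trip, using a made-up Holder class since the real df wasn't shown:

import pandas as pd

class Holder:
    def __init__(self):
        self.TimetoRecurrence = pd.Series(['30 days', '365 days'])

def convert(times):
    monthsec = 2.628e6
    return {
        key: pd.to_timedelta(value).dt.total_seconds() / monthsec
        for key, value in times.items()
    }

df = Holder()
for key, value in convert(vars(df)).items():
    setattr(df, key, value)

print(df.TimetoRecurrence)  # now floats, in approximate months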
I'm looking for the most efficient/Pythonic way of solving the following problem:
I have a list of local objects (list_a) and a list of objects on a server (list_b). list_b is a list of dictionaries, not objects.
I want to update some information in the local objects with the information given by the server. Objects and dictionaries are matched up by name: the attribute _name on the object and the key 'name' in the dictionary. Either list could be a subset of the other.
Here is my current solution with some example data:
class Dummy():
    def __init__(self, name):
        self._name = name
        self._attr = ''

    def __str__(self):
        return "Test-Object[" + self._name + ", " + self._attr + "]"

    def update(self, obj):
        self._attr = obj['attr']

    __repr__ = __str__
list_a = [Dummy(str(x)) for x in xrange(10)]
list_b = [{'name': str(x), 'attr': str(x*2)} for x in xrange(8, -1, -1)]
extracted_names_a = [x._name for x in list_a]
extracted_names_b = [x['name'] for x in list_b]
filtered_list_a = (x for x in list_a if x._name in extracted_names_b)
filtered_list_b = (x for x in list_b if x['name'] in extracted_names_a)
sorted_list_a = sorted(filtered_list_a, key=lambda k: k._name)
sorted_list_b = sorted(filtered_list_b, key=lambda k: k['name'])
for obj, d in zip(sorted_list_a, sorted_list_b):
    obj.update(d)
print(list_a)
This is just a simple example; in the real world there are 2000+ entries and a little more data.
Your biggest problem is the filtering. For each element of each list, you’re searching the entire other list to see if it exists. This takes quadratic time. If you convert these objects to sets of names, or dicts keyed by name, you can eliminate that quadratic work and make it log-linear.
After that, the sorted is also no longer necessary, and it’s the only reason the code is log-linear, so now it’ll be linear.
While we’re at it, you’re wasting memory, and possibly time, building up a list just to iterate over it in a generator expression in the next line. This becomes even more important if we get rid of the sorted, because then we don’t ever need a list.
So:
dict_a = {x._name: x for x in list_a}
for d in list_b:
    try:
        dict_a[d['name']].update(d)
    except KeyError:
        pass
The dict lookup with try/except takes care of filtering out dicts without matching objects, and you don't need to filter out objects without matching dicts because they simply never get touched.
If there are a lot more dicts than objects, reverse things to make a dict of the dicts and iterate over the objects.
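A minimal sketch of that reversal, reusing the names from above:

dict_b = {d['name']: d for d in list_b}  # index the server dicts by name
for obj in list_a:
    d = dict_b.get(obj._name)
    if d is not None:
        obj.update(d)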
Or, if you can keep the objects in a dict in the first place, instead of keeping them in a list and making a temporary dict just for this code, even better. And if you can iterate the dicts one by one as you parse them off the server response instead of first building a list of them, you'll have eliminated all unnecessary large allocations and probably sped things up further.
Instead of storing your objects in a list, you should convert that list into a dict:
objects_by_name = {obj._name: obj for obj in list_a}
This lets you look up the object associated with a name in O(1) time.
Updating all objects is now as easy as iterating over list_b, grabbing the corresponding object from the dict, and calling its update method:
for dic in list_b:
    obj = objects_by_name[dic['name']]  # raises KeyError if a name has no local object
    obj.update(dic)
Overall this has a time complexity of O(n), whereas your code is O(n log n) because of the sort.
What is the most pythonic way to set a value in a dict if the value is not already set?
At the moment my code uses if statements:
if "timeout" not in connection_settings:
connection_settings["timeout"] = compute_default_timeout(connection_settings)
dict.get(key, default) is appropriate for code consuming a dict, not for code that is preparing a dict to be passed to another function. You can use it to set something, but it's no prettier, IMO:
connection_settings["timeout"] = connection_settings.get("timeout",
                                                          compute_default_timeout(connection_settings))
This would evaluate the compute function even if the dict already contained the key; that's a bug.
defaultdict is for when the default value is the same for every key.
Of course, there are many times you set primitive values that don't need computing as defaults, and those can of course use dict.setdefault. But what about the more complex cases?
dict.setdefault will precisely "set a value in a dict only if the value is not already set".
Note that you still compute the value in order to pass it in as the argument, whether or not it ends up being used:
connection_settings.setdefault("timeout", compute_default_timeout(connection_settings))
This is a bit of a non-answer, but I would say the most pythonic is the if statement as you have it. You resisted the urge to one-liner it with __setitem__ or other methods. You've avoided possible bugs in the logic due to existing-but-falsey values which might happen when trying to be clever with short-circuiting and/or hacks. It's immediately obvious that the compute function isn't used when it wasn't necessary.
It's clear, concise, and readable - pythonic.
One way to do this is:
if key not in d:
    d[key] = value
Since Python 3.9 you can use the merge operator | to merge two dictionaries. The dict on the right takes precedence:
d = { key: value } | d
Note: this creates a new dictionary with the updated values.
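For example:
>>> d = {'timeout': 30}
>>> d = {'timeout': 120, 'retries': 3} | d
>>> d
{'timeout': 30, 'retries': 3}
The existing 'timeout' survives because the dict on the right wins, while 'retries' is filled in as a new default.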
You probably need dict.setdefault:
Create a new dictionary and set a value:
>>> d = {}
>>> d.setdefault('timeout', 120)
120
>>> d
{'timeout': 120}
If a value is already set, dict.setdefault won't override it:
>>> d['port'] = 8080
>>> d.setdefault('port', 8888)
8080
>>> d
{'port': 8080, 'timeout': 120}
I'm using the following to modify kwargs to non-default values and pass to another function:
def f(**non_default_kwargs):
    kwargs = {
        'a': 1,
        'b': 2,
    }
    kwargs.update(non_default_kwargs)
    f2(**kwargs)  # f2 is whatever function the kwargs are being forwarded to
This has the merits that
you don't have to type the keys twice
all is done in a single function
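A quick usage sketch (this f2 is just an invented stand-in):

def f2(a, b):
    print(a, b)

f(b=20)  # prints "1 20": b was overridden, a kept its default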
The answer by @Rotareti makes me wonder whether, for versions of Python older than 3.9, we can do:
>>> dict_a = {'a': 1 }
>>> dict_a = {'a': 3, 'b': 2, **dict_a}
>>> dict_a
{'a': 1, 'b': 2}
(Well, it works for sure on Python 3.7, and dict unpacking in literals has been available since Python 3.5 (PEP 448), but is this Pythonesque enough?)
I found it convenient and obvious to exploit dict.get() returning None (falsy) for a missing key, along with or, to put off evaluating an expensive network request when the key was not present. (Note that this treats any falsy stored value, such as 0 or '', as missing, so it only suits dicts whose valid values are truthy.)
d = dict()
def fetch_and_set(d, key):
    d[key] = ("expensive operation to fetch key")  # placeholder for the real fetch
    if not d[key]:
        raise Exception("could not get value")
    return d[key]
...
value = d.get(key) or fetch_and_set(d, key)
In my case specifically, I was building a new dictionary from a cache and then updating the cache after the fn() call.
Here's a simplified view of my use:
j = load(database)  # dict
d = dict()

# see if the desired keys are in the cache, else fetch
for key in keys:
    d[key] = j.get(key) or fetch(key, network_token)

fn(d)  # use d for something useful
j.update(d)  # update database with new values (if any)
I have an n-layered dict of dicts and want to get the leaf values by a certain series of keys.
So:
example_dict = {
    'level_one': {
        'level_two_a': {
            'level_three_a': [1, 2, 3],
            'level_three_b': [4, 5, 6],
        },
        'level_two_b': {
            'level_three_c': [7, 8, 9],
            'level_three_d': [10, 11, 12],
        },
    },
}
Sometimes I will want to query:
example_dict['level_one']['level_two_a']['level_three_a']
other times I need:
example_dict['level_one']['level_two_b']
The real nested dict is very large, so I want to avoid something like:
import copy

result_dict = copy.deepcopy(example_dict)
search_key = ['level_one', 'level_two_a']
for term in search_key:
    result_dict = copy.deepcopy(result_dict[term])
Is there a more memory efficient method?
Yes, don't create so many copies. Just reference the subdict:
result = example_dict
search_key = ['level_one', 'level_two_a']
for term in search_key:
    result = result[term]
As long as you are not altering the result dict, making a copy is pointless. Since you discard the previous copy and make a new one on every iteration, you are wasting CPU time as well as memory.
Even if you did have to modify result and don't want those changes to affect example_dict, you only need to copy the final result value after looping.
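If you do this in many places, the loop collapses into a small helper (a minimal sketch; get_leaf is my name, not a standard function):

from functools import reduce
from operator import getitem

def get_leaf(nested, keys):
    # Follow the key sequence into the nested dict, without copying anything.
    return reduce(getitem, keys, nested)

get_leaf(example_dict, ['level_one', 'level_two_a'])
# -> {'level_three_a': [1, 2, 3], 'level_three_b': [4, 5, 6]}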