Performance implications of unpacking dictionaries in Python

Performance implications of unpacking dictionaries in Python - python

I have a chunk of code that takes a series of variables and passes them to N number of modules. To simplify readability of code rather than passing the variables over and over again I created a dictionary and unpack that to modules as follows:
message_package = {
'v1' : v1,
'v2' : v2,
'v3' : v3
}
for mod in mods:
mod.f1(**message_package)
[...]
if condition:
mod.f2(**message_package)
Each module then grabs variables they need and ignore the rest:
def mod1.f1(v1=None,**kwargs):
do_something()
From a readability/usability standpoint I find this quite nice -- variables are immediately available without having to pull them out of **kwargs, and if I add a variable to the message package it's only one line and I don't have to update all modules.
As I'm somewhat new to Python I'm wondering... is this very unpythonic? Is there a big performance impact from constantly unpacking these dictionaries and/or is there a better way to do this?

Thanks to all the comments above. I followed martijn's suggestion and ran a simple test using timeit.
The results using my data are as follows:
>>> timeit.timeit('passdict()',setup=setup,number=1000000)
0.1841774140484631
>>> timeit.timeit('unpack()',setup=setup,number=1000000)
0.43643336702371016
>>>
Looks like Cyphase was correct that there would only be a performance issue if I were doing this a "whoooole lot" -- unpacking is twice as slow as passing a dictionary, but only costs 250ms over 1M iterations. For me this is negligible as I'm only dealing with 5-10 calls in one function.

Related

Best practice: how to pass many arguments to a function?

I am running some numerical simulations, in which my main function must receive lots and lots of arguments - I'm talking 10 to 30 arguments depending on the simulation to run.
What are some best practices to handle cases like this? Dividing the code into, say, 10 functions with 3 arguments each doesn't sound very feasible in my case.
What I do is create an instance of a class (with no methods), store the inputs as attributes of that instance, then pass the instance - so the function receives only one input.
I like this because the code looks clean, easy to read, and because I find it easy to define and run alternative scenarios.
I dislike it because accessing class attributes within a function is slower than accessing a local variable (see: How / why to optimise code by copying class attributes to local variables?) and because it is not an efficient use of memory - too much data stored multiple times unnecessarily.
Any thoughts or recommendations?
myinput=MyInput()
myinput.input_sql_table = that_sql_table
myinput.input_file = that_input_file
myinput.param1 = param1
myinput.param2 = param2
myoutput = calc(myinput)
Alternative scenarios:
inputs=collections.OrderedDict()
scenarios=collections.OrderedDict()
inputs['base scenario']=copy.deepcopy(myinput)
inputs['param2 = 100']=copy.deepcopy(myinput)
inputs['param2 = 100'].param2 = 100
# loop through all the inputs and stores the outputs in the ordered dictionary scenarios

I don't think this is really a StackOverflow question, more of a Software Engineering question. For example check out this question.
As far as whether or not this is a good design pattern, this is an excellent way to handle a large number of arguments. You mentioned that this isn't very efficient in terms of memory or speed, but I think you're making an improper micro-optimization.
As far as memory is concerned, the overhead of running the Python interpreter is going to dwarf the couple of extra bytes used by instantiating your class.
Unless you have run a profiler and determined that accessing members of that options class is slowing you down, I wouldn't worry about it. This is especially the case because you're using Python. If speed is a real concern, you should be using something else.
You may not be aware of this, but most of the large scale number crunching libraries for Python aren't actually written in Python, they're just wrappers around C/C++ libraries that are much faster.
I recommend reading this article, it is well established that "Premature optimization is the root of all evil".

You could pass in a dictionary like so:
all_the_kwargs = {kwarg1: 0, kwarg2: 1, kwargN: xyz}
some_func_or_class(**all_the_kwargs)
def some_func_or_class(kwarg1: int = -1, kwarg2: int = 0, kwargN: str = ''):
print(kwarg1, kwarg2, kwargN)
Or you could use several named tuples like referenced here: Type hints in namedtuple
also note that depending on which version of python you are using there may be a limit to the number of arguments you can pass into a function call.
Or you could use just a dictionary:
def some_func(a_dictionary):
a_dictionary.get('argXYZ', None) # defaults to None if argXYZ doesn't exist

Can `del` make Python faster?

As a programmer, I generally try to avoid the del statement because it is often an extra complication that Python program don't often need. However, when browsing the standard library (threading, os, etc...) and the pseudo-standard library (numpy, scipy, etc...) I see it used a non-zero amount of times, and I'd like to better understand when it is/isn't appropriate the del statement.
Specifically, I'm curious about the relationship between the Python del statement and the efficiency of a Python program. It seems to me that del might help a program run faster by reducing the amount of clutter lookup instructions need to sift through. However, I can also see a world where the extra instruction takes up more time than it saves.
My question is: does anyone have any interesting code snippets that demonstrate cases where del significantly changes the speed of the program? I'm most interested in cases where del improves the execution speed of a program, although non-trivial cases where del can really hurt are also interesting.

The main reason that standard Python libraries use del is not for speed but for namespace decluttering ("avoiding namespace pollution" is another term I believe I have seen for this). As user2357112 noted in a comment, it can also be used to break a traceback cycle.
Let's take a concrete example: line 58 of types.py in the cpython implementation reads:
del sys, _f, _g, _C, _c, # Not for export
If we look above, we find:
def _f(): pass
FunctionType = type(_f)
LambdaType = type(lambda: None) # Same as FunctionType
CodeType = type(_f.__code__)
MappingProxyType = type(type.__dict__)
SimpleNamespace = type(sys.implementation)
def _g():
yield 1
GeneratorType = type(_g())
_f and _g are two of the names being deled; as the comment says, they are "not for export".1
You might think this is covered via:
__all__ = [n for n in globals() if n[:1] != '_']
(which is near the end of that same file), but as What's the python __all__ module level variable for? (and the linked Can someone explain __all__ in Python?) note, these affect the names exported via from types import *, rather than what's visible via import types; dir(types).
It's not necessary to clean up your module namespace, but doing so prevents people from sneaking into it and using undefined items. So it's good for a couple of purposes.
1Looks like someone forgot to update this to include _ag. _GeneratorWrapper is harder to hide, unfortunately.

Specifically, I'm curious about the relationship between the Python del statement and the efficiency of a Python program.
As far as performance is concerned, del (excluding index deletion like del x[i]) is primarily useful for GC purposes. If you have a variable pointing to some large object that is no longer needed, deling that variable will (assuming there are no other references to it) deallocate that object (with CPython this happens immediately, as it uses reference counting). This could make the program faster if you'd otherwise be filling your RAM/caches; only way to know is to actually benchmark it.
It seems to me that del might help a program run faster by reducing the amount of clutter lookup instructions need to sift through.
Unless you're using thousands of variables (which you shouldn't be), it's exceedingly unlikely that removing variables using del will make any noticeable difference in performance.

Python - get object - Is dictionary or if statements faster?

I am making a POST to a python script, the POST has 2 parameters. Name and Location, and then it returns one string. My question is I am going to have 100's of these options, is it faster to do it in a dictionary like this:
myDictionary = {"Name":{"Location":"result", "LocationB":"resultB"},
"Name2":{"Location2":"result2A", "Location2B":"result2B"}}
And then I would use.get("Name").get("Location") to get the results
OR do something like this:
if Name = "Name":
if Location = "Location":
result = "result"
elif Location = "LocationB":
result = "resultB"
elif Name = "Name2":
if Location = "Location2B":
result = "result2A"
elif Location = "LocationB":
result = "result2B"
Now if there are hundreds or thousands of these what is faster? Or is there a better way all together?

First of all:
Generally, it's much more pythonic to match keys to values using dictionaries. You should do that from a point of style.
Secondly:
If you really care about performance, python might not always be the optimal tool. However, the dict approach should be much much faster, unless your selections happen about as often as the creation of these dicts. The creation of thousands and thousands of PyObjects to check your case is a really bad idea.
Thirdly:
If you care about your application so much, you might really want to benchmark both solutions -- as usual when it comes to performance questions, there's a million factors including your computing platform that only experiments will help to sort out
Fourth(ly?):
It looks like you're building something like a protocol parser. That's really not python's forte, performance-wise. Maybe you'd want to look into one of the dozens of tools that can write C code parsers for you and wrap that in a native module, it's pretty sure to be faster than either of your implementations, if done right.
Here's the python documentation on Extending Python with C or C++

I decided to test the two scenarios of 1000 Names and 2 locations
The Test Samples
Team Dictionary:
di = {}
for i in range(1000):
di["Name{}".format(i)] = {'Location': 'result{}'.format(i), 'LocationB':'result{}B'.format(i)}
def get_dictionary_value():
di.get("Name999").get("LocationB")
Team If Statement:
I used a python script to generate a 5000 line function if_statements(name, location): following this pattern
elif name == 'Name994':
if location == 'Location':
return 'result994'
elif location == 'LocationB':
return 'result994B'
# Some time later ...
def get_if_value():
if_statements("Name999", "LocationB")
Timing Results
You can time with the timeit function to test the time it takes a function to complete.
import timeit
print(timeit.timeit(get_dictionary_value))
# 0.06353...
print(timeit.timeit(get_if_value))
# 6.3684...
So there you have it, dictionary was 100 times faster on my machine than the hefty 165 KB if-statement function.

I will root for dict().
In most cases [key] selection is much faster than conditional checks. Rule of thumb conditionals are generally used for boolean statements.
The reason for this is; when you create a dictionary you essentially create a registry of that data which is stored in as hashes in a bucket. When you say for instance my dictonary_name['key'] if that value exist python knows the exact location of that value and returns it in almost in an instant.
However conditionals are different. Conditionals are sequential checks meaning worse case it has to check every condition provided to first establish the value's existence then it has return the respective data.
As you can see with 100's of statements this can be problematic. Though in this case dictionaries are faster. You also need to be cognizant of how often and how quickly these checks are. Because if they are faster than the the building of your dictionary you might get an error of value not found.

What is the purpose of collections.ChainMap?

In Python 3.3 a ChainMap class was added to the collections module:
A ChainMap class is provided for quickly linking a number of mappings
so they can be treated as a single unit. It is often much faster than
creating a new dictionary and running multiple update() calls.
Example:
>>> from collections import ChainMap
>>> x = {'a': 1, 'b': 2}
>>> y = {'b': 10, 'c': 11}
>>> z = ChainMap(y, x)
>>> for k, v in z.items():
print(k, v)
a 1
c 11
b 10
It was motivated by this issue and made public by this one (no PEP was created).
As far as I understand, it is an alternative to having an extra dictionary and maintaining it with update()s.
The questions are:
What use cases does ChainMap cover?
Are there any real world examples of ChainMap?
Is it used in third-party libraries that switched to python3?
Bonus question: is there a way to use it on Python2.x?
I've heard about it in Transforming Code into Beautiful, Idiomatic Python PyCon talk by Raymond Hettinger and I'd like to add it to my toolkit, but I lack in understanding when should I use it.

I like #b4hand's examples, and indeed I have used in the past ChainMap-like structures (but not ChainMap itself) for the two purposes he mentions: multi-layered configuration overrides, and variable stack/scope emulation.
I'd like to point out two other motivations/advantages/differences of ChainMap, compared to using a dict-update loop, thus only storing the "final" version":
More information: since a ChainMap structure is "layered", it supports answering question like: Am I getting the "default" value, or an overridden one? What is the original ("default") value? At what level did the value get overridden (borrowing #b4hand's config example: user-config or command-line-overrides)? Using a simple dict, the information needed for answering these questions is already lost.
Speed tradeoff: suppose you have N layers and at most M keys in each, constructing a ChainMap takes O(N) and each lookup O(N) worst-case[*], while construction of a dict using an update-loop takes O(NM) and each lookup O(1). This means that if you construct often and only perform a few lookups each time, or if M is big, ChainMap's lazy-construction approach works in your favor.
[*] The analysis in (2) assumes dict-access is O(1), when in fact it is O(1) on average, and O(M) worst case. See more details here.

I could see using ChainMap for a configuration object where you have multiple scopes of configuration like command line options, a user configuration file, and a system configuration file. Since lookups are ordered by the order in the constructor argument, you can override settings at lower scopes. I've not personally used or seen ChainMap used, but that's not surprising since it is a fairly recent addition to the standard library.
It might also be useful for emulating stack frames where you push and pop variable bindings if you were trying to implement a lexical scope yourself.
The standard library docs for ChainMap give several examples and links to similar implementations in third-party libraries. Specifically, it names Django’s Context class and Enthought's MultiContext class.

I'll take a crack at this:
Chainmap looks like a very just-so kind of abstraction. It's a good solution for a very specialized kind of problem. I propose this use case.
If you have:
multiple mappings (e.g, dicts)
some duplication of keys in those mappings (same key can appear in multiple mappings, but not the case that all keys appear in all mappings)
a consuming application which wishes to access the value of a key in the "highest priority" mapping where there is a total ordering over all the mappings for any given key (that is, mappings may have equal priority, but only if it is known that there are no duplications of key within those mappings) (In the Python application, packages can live in the same directory (same priority) but must have different names, so, by definition, the symbol names in that directory cannot be duplicates.)
the consuming application does not need to change the value of a key
while at the same time the mappings must maintain their independent identity and can be changed asynchronously by an external force
and the mappings are big enough, expensive enough to access, or change often enough between application accesses, that the cost of computing the projection (3) each time your app needs it is a significant performance concern for your application...
Then,
you might consider using a chainmap to create a view over the collection of mappings.
But this is all after-the-fact justification. The Python guys had a problem, came up with a good solution in the context of their code, then did some extra work to abstract their solution so we could use it if we choose. More power to them. But whether it's appropriate for your problem is up to you to decide.

To imperfectly answer your:
Bonus question: is there a way to use it on Python2.x?
from ConfigParser import _Chainmap as ChainMap
However keep in mind that this isn't a real ChainMap, it inherits from DictMixin and only defines:
__init__(self, *maps)
__getitem__(self, key)
keys(self)
# And from DictMixin:
__iter__(self)
has_key(self, key)
__contains__(self, key)
iteritems(self)
iterkeys(self)
itervalues(self)
values(self)
items(self)
clear(self)
setdefault(self, key, default=None)
pop(self, key, *args)
popitem(self)
update(self, other=None, **kwargs)
get(self, key, default=None)
__repr__(self)
__cmp__(self, other)
__len__(self)
Its implementation also doesn't seem particularly efficient.

Learning Python from Ruby; Differences and Similarities

I know Ruby very well. I believe that I may need to learn Python presently. For those who know both, what concepts are similar between the two, and what are different?
I'm looking for a list similar to a primer I wrote for Learning Lua for JavaScripters: simple things like whitespace significance and looping constructs; the name of nil in Python, and what values are considered "truthy"; is it idiomatic to use the equivalent of map and each, or are mumble somethingaboutlistcomprehensions mumble the norm?
If I get a good variety of answers I'm happy to aggregate them into a community wiki. Or else you all can fight and crib from each other to try to create the one true comprehensive list.
Edit: To be clear, my goal is "proper" and idiomatic Python. If there is a Python equivalent of inject, but nobody uses it because there is a better/different way to achieve the common functionality of iterating a list and accumulating a result along the way, I want to know how you do things. Perhaps I'll update this question with a list of common goals, how you achieve them in Ruby, and ask what the equivalent is in Python.

Here are some key differences to me:
Ruby has blocks; Python does not.
Python has functions; Ruby does not. In Python, you can take any function or method and pass it to another function. In Ruby, everything is a method, and methods can't be directly passed. Instead, you have to wrap them in Proc's to pass them.
Ruby and Python both support closures, but in different ways. In Python, you can define a function inside another function. The inner function has read access to variables from the outer function, but not write access. In Ruby, you define closures using blocks. The closures have full read and write access to variables from the outer scope.
Python has list comprehensions, which are pretty expressive. For example, if you have a list of numbers, you can write
[x*x for x in values if x > 15]
to get a new list of the squares of all values greater than 15. In Ruby, you'd have to write the following:
values.select {|v| v > 15}.map {|v| v * v}
The Ruby code doesn't feel as compact. It's also not as efficient since it first converts the values array into a shorter intermediate array containing the values greater than 15. Then, it takes the intermediate array and generates a final array containing the squares of the intermediates. The intermediate array is then thrown out. So, Ruby ends up with 3 arrays in memory during the computation; Python only needs the input list and the resulting list.
Python also supplies similar map comprehensions.
Python supports tuples; Ruby doesn't. In Ruby, you have to use arrays to simulate tuples.
Ruby supports switch/case statements; Python does not.
Ruby supports the standard expr ? val1 : val2 ternary operator; Python does not.
Ruby supports only single inheritance. If you need to mimic multiple inheritance, you can define modules and use mix-ins to pull the module methods into classes. Python supports multiple inheritance rather than module mix-ins.
Python supports only single-line lambda functions. Ruby blocks, which are kind of/sort of lambda functions, can be arbitrarily big. Because of this, Ruby code is typically written in a more functional style than Python code. For example, to loop over a list in Ruby, you typically do
collection.each do |value|
...
end
The block works very much like a function being passed to collection.each. If you were to do the same thing in Python, you'd have to define a named inner function and then pass that to the collection each method (if list supported this method):
def some_operation(value):
...
collection.each(some_operation)
That doesn't flow very nicely. So, typically the following non-functional approach would be used in Python:
for value in collection:
...
Using resources in a safe way is quite different between the two languages. Here, the problem is that you want to allocate some resource (open a file, obtain a database cursor, etc), perform some arbitrary operation on it, and then close it in a safe manner even if an exception occurs.
In Ruby, because blocks are so easy to use (see #9), you would typically code this pattern as a method that takes a block for the arbitrary operation to perform on the resource.
In Python, passing in a function for the arbitrary action is a little clunkier since you have to write a named, inner function (see #9). Instead, Python uses a with statement for safe resource handling. See How do I correctly clean up a Python object? for more details.

I, like you, looked for inject and other functional methods when learning Python. I was disappointed to find that they weren't all there, or that Python favored an imperative approach. That said, most of the constructs are there if you look. In some cases, a library will make things nicer.
A couple of highlights for me:
The functional programming patterns you know from Ruby are available in Python. They just look a little different. For example, there's a map function:
def f(x):
return x + 1
map(f, [1, 2, 3]) # => [2, 3, 4]
Similarly, there is a reduce function to fold over lists, etc.
That said, Python lacks blocks and doesn't have a streamlined syntax for chaining or composing functions. (For a nice way of doing this without blocks, check out Haskell's rich syntax.)
For one reason or another, the Python community seems to prefer imperative iteration for things that would, in Ruby, be done without mutation. For example, folds (i.e., inject), are often done with an imperative for loop instead of reduce:
running_total = 0
for n in [1, 2, 3]:
running_total = running_total + n
This isn't just a convention, it's also reinforced by the Python maintainers. For example, the Python 3 release notes explicitly favor for loops over reduce:
Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable.
List comprehensions are a terse way to express complex functional operations (similar to Haskell's list monad). These aren't available in Ruby and may help in some scenarios. For example, a brute-force one-liner to find all the palindromes in a string (assuming you have a function p() that returns true for palindromes) looks like this:
s = 'string-with-palindromes-like-abbalabba'
l = len(s)
[s[x:y] for x in range(l) for y in range(x,l+1) if p(s[x:y])]
Methods in Python can be treated as context-free functions in many cases, which is something you'll have to get used to from Ruby but can be quite powerful.
In case this helps, I wrote up more thoughts here in 2011: The 'ugliness' of Python. They may need updating in light of today's focus on ML.

My suggestion: Don't try to learn the differences. Learn how to approach the problem in Python. Just like there's a Ruby approach to each problem (that works very well givin the limitations and strengths of the language), there's a Python approach to the problem. they are both different. To get the best out of each language, you really should learn the language itself, and not just the "translation" from one to the other.
Now, with that said, the difference will help you adapt faster and make 1 off modifications to a Python program. And that's fine for a start to get writing. But try to learn from other projects the why behind the architecture and design decisions rather than the how behind the semantics of the language...

I know little Ruby, but here are a few bullet points about the things you mentioned:
nil, the value indicating lack of a value, would be None (note that you check for it like x is None or x is not None, not with == - or by coercion to boolean, see next point).
None, zero-esque numbers (0, 0.0, 0j (complex number)) and empty collections ([], {}, set(), the empty string "", etc.) are considered falsy, everything else is considered truthy.
For side effects, (for-)loop explicitly. For generating a new bunch of stuff without side-effects, use list comprehensions (or their relatives - generator expressions for lazy one-time iterators, dict/set comprehensions for the said collections).
Concerning looping: You have for, which operates on an iterable(! no counting), and while, which does what you would expect. The fromer is far more powerful, thanks to the extensive support for iterators. Not only nearly everything that can be an iterator instead of a list is an iterator (at least in Python 3 - in Python 2, you have both and the default is a list, sadly). The are numerous tools for working with iterators - zip iterates any number of iterables in parallel, enumerate gives you (index, item) (on any iterable, not just on lists), even slicing abritary (possibly large or infinite) iterables! I found that these make many many looping tasks much simpler. Needless to say, they integrate just fine with list comprehensions, generator expressions, etc.

In Ruby, instance variables and methods are completely unrelated, except when you explicitly relate them with attr_accessor or something like that.
In Python, methods are just a special class of attribute: one that is executable.
So for example:
>>> class foo:
... x = 5
... def y(): pass
...
>>> f = foo()
>>> type(f.x)
<type 'int'>
>>> type(f.y)
<type 'instancemethod'>
That difference has a lot of implications, like for example that referring to f.x refers to the method object, rather than calling it. Also, as you can see, f.x is public by default, whereas in Ruby, instance variables are private by default.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.