Python "import" performance query - python

This question occurred to me when someone pointed out that importing a package using import package gives better code readability. Is this actually true? I mean, compared to from package import x, y, z, isn't there some overhead to importing the entire package?

I wouldn't expect any performance difference; the whole module is loaded either way.
For example:
# load the dirname() function from the os.path module
>>> from os.path import dirname
# the os.path.basename() function was not imported:
>>> basename('/foo/bar.txt')
NameError: name 'basename' is not defined
# however, basename() is already available anyway:
>>> dirname.__globals__['basename']('/foo/bar.txt')
'bar.txt'
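You can also confirm that the whole module landed in sys.modules, even though only dirname was bound locally; a minimal check:
>>> from os.path import dirname
>>> import sys
>>> 'os.path' in sys.modules  # the full module object is cached here
True
>>> sys.modules['os.path'].basename('/foo/bar.txt')
'bar.txt'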

Using dot notation is always slower than importing a function directly and calling it, because the function first has to be looked up in the module's dictionary. The same applies to every getattr operation.
For example when appending items to a list:
lst = []
for i in xrange(5000):
    lst.append(i ** .5 * 2)
This is faster:
lst = []
append = lst.append
for i in xrange(5000):
    append(i ** .5 * 2)
This can make a measurable difference:
>>> def bad():
...     lst = []
...     for i in xrange(500):
...         lst.append(i ** .5 * 2)
...
>>> def good():
...     lst = []
...     append = lst.append
...     for i in xrange(500):
...         append(i ** .5 * 2)
...
>>> from timeit import timeit
>>> timeit("bad()", "from __main__ import bad", number=1000)
0.175249130875
>>> timeit("good()", "from __main__ import good", number=1000)
0.146750989286

The performance will be the same either way. The first time you import a module, the entire module is compiled (if needed) and its top-level code executed, no matter how you import it.
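A quick way to convince yourself (a sketch; demo_mod.py is a hypothetical module name): put a print at the top level of a module and import it both ways. The top-level code runs exactly once, because after the first import the module object is cached in sys.modules.
# demo_mod.py (hypothetical example module)
print("demo_mod: top-level code running")
x = 1
y = 2

# main script
from demo_mod import x   # prints "demo_mod: top-level code running"
import demo_mod          # prints nothing; the module is already cached
print(demo_mod.y)        # 2: the whole module was executed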

Which is more readable:
from os.path import split, join
followed by a bunch of split and join calls that could accidentally be read as the string methods of the same name, or
import os.path
followed by os.path.split and os.path.join calls? With the qualified names, it's clear they're functions dealing with paths.
Either way, the whole module has to be loaded. Otherwise, the things you imported that depend on other things in the module (which you didn't import) wouldn't work.

Related

Insert newline after equals sign in self documenting f-string in python3.8

With Python 3.8, a new feature is self-documenting format strings. Where one would normally do this:
>>> x = 10.583005244
>>> print(f"x={x}")
x=10.583005244
>>>
One can now do this, with less repetition:
>>> x = 10.583005244
>>> print(f"{x=}")
x=10.583005244
>>>
This works very well for one line string representations. But consider the following scenario:
>>> import numpy as np
>>> some_fairly_long_named_arr = np.random.rand(4,2)
>>> print(f"{some_fairly_long_named_arr=}")
some_fairly_long_named_arr=array([[0.05281443, 0.06559171],
       [0.13017109, 0.69505908],
       [0.60807431, 0.58159127],
       [0.92113252, 0.4950851 ]])
>>>
Here, the first line does not get aligned, which is (arguably) not desirable. I would rather prefer the output of the following:
>>> import numpy as np
>>> some_fairly_long_named_arr = np.random.rand(4,2)
>>> print(f"some_fairly_long_named_arr=\n{some_fairly_long_named_arr!r}")
some_fairly_long_named_arr=
array([[0.06278696, 0.04521056],
       [0.33805303, 0.17155518],
       [0.9228059 , 0.58935207],
       [0.80180669, 0.54939958]])
>>>
Here, the first line of the output is aligned as well, but it defeats the purpose of not repeating the variable name twice in the print statement.
The example is a numpy array, but it could have been a pandas dataframe etc. as well.
Hence, my question is: Can a newline character be inserted after the = sign in self documenting strings?
I tried to add it like this, but it does not work:
>>> print(f"{some_fairly_long_named_arr=\n}")
SyntaxError: f-string expression part cannot include a backslash
I read the docs on the format-specification mini-language, but most of the formatting there only works for simple data types like integers, and I could not achieve what I wanted with the options that do apply.
Sorry for the long write-up.
Wouldn't recommend this at all, but for possibility's sake:
import numpy as np

_old_array2string = np.core.arrayprint._array2string

def _array2_nice_string(*args, **kwargs):
    non_nice_string = _old_array2string(*args, **kwargs)
    dimension_strings = non_nice_string.split("\n")
    if len(dimension_strings) > 1:
        dimension_string = dimension_strings[1]
        dimension_indent = len(dimension_string) - len(dimension_string.lstrip())
        return "\n" + " " * dimension_indent + non_nice_string
    return non_nice_string

np.core.arrayprint._array2string = _array2_nice_string
Outputs for:
some_fairly_long_named_arr = np.random.rand(2, 2)
print(f"{some_fairly_long_named_arr=}")
some_fairly_long_named_arr=array(
       [[0.95900608, 0.79367873],
       [0.58616975, 0.17757661]])
and
some_fairly_long_named_arr = np.random.rand(1, 2)
print(f"{some_fairly_long_named_arr=}")
some_fairly_long_named_arr=array([[0.62492772, 0.80453153]])
I made it so that if the first dimension is 1, the array is kept on the same line.
There is a non-internal function np.array2string that I tried to re-assign instead, but I never got that working. If someone could find a way to re-assign that public function rather than this internal one, I'd imagine it would make this solution a lot cleaner.
I figured out a way to accomplish what I wanted, after reading through the CPython source:
import numpy as np
some_fairly_long_named_arr = np.random.rand(4, 2)
print(f"""{some_fairly_long_named_arr =
}""")
Which produces:
some_fairly_long_named_arr =
array([[0.23560777, 0.96297907],
       [0.18882751, 0.40712246],
       [0.61351814, 0.1981144 ],
       [0.27115495, 0.72303859]])
I would prefer a solution that works on a single line, but this seems to be the only way for now. Perhaps another way will be implemented in a later Python version.
However, note that the continuation line must be left unindented for the method above, even inside indented code:
# ...some code with indentation...
    print(f"""{some_fairly_long_named_arr =
}""")
# ...more code with indentation...
Otherwise, the alignment of the first line is broken again.
I tried using inspect.cleandoc and textwrap.dedent to alleviate this, but could not manage to fix the indentation issue. But perhaps this is the subject of another question.
Edit: After reading this article, I found a single-line solution:
f_str_nl = lambda object: f"{chr(10) + str(object)}" # add \n directly
# f_str_nl = lambda object: f"{os.linesep + str(object)}" # add \r\n on windows
print(f"{f_str_nl(some_fairly_long_named_arr) = !s}")
which outputs:
f_str_nl(some_fairly_long_named_arr) =
[[0.26616956 0.59973262]
 [0.86601261 0.10119292]
 [0.94125617 0.9318651 ]
 [0.10401072 0.66893025]]
The only caveat is that the name of the object gets prepended by the name of the custom lambda function, f_str_nl.
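A variation that keeps the printed name clean is a small helper that accepts keyword arguments (a sketch; show is my own hypothetical helper, not part of the f-string feature):
def show(**kwargs):
    # print each name, '=', a newline, then the value's repr
    for name, value in kwargs.items():
        print(f"{name}=\n{value!r}")

show(some_fairly_long_named_arr=some_fairly_long_named_arr)
Note the name still has to be typed twice at the call site, so this mainly trades one caveat for another.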
I also found that a similar question was already asked here.

Problem importing modules and functions in Python

I have two files: in one of them (named myrandom) I have defined a function called spinner that chooses a random number from 1 to 6 and returns its value. In the second file, named main, I have imported the first one (as a module) and called the spinner function.
This is the code of the file myrandom:
def spinner():
    import random
    val = random.choice([1, 2, 3, 4, 5, 6])
    return val
And this is the code of main:
import myrandom
x = spinner()
print(x)
My problem is that when I run main, I get the following error message: NameError: name 'spinner' is not defined. I don't know why I'm getting this error, since I have other files and modules with similar characteristics that run without problems.
Any idea?
You need to use it like:
import myrandom
x = myrandom.spinner()
Or import directly:
from myrandom import spinner
x = spinner()
Or use star import:
from myrandom import *
x = spinner()
You should import it either like this:
import myrandom
x = myrandom.spinner()
or like this:
from myrandom import spinner
x = spinner()
or like this:
from myrandom import *
x = spinner()
An explanation of the different ways of importing can be found here: Importing modules in Python - best practice

How to get "name" from module when using "import module as name"

I can't seem to find where the actual name that a module has been bound to is stored. For example:
import re as my_re
print my_re.__name__  # Output is "re", not "my_re"
I would like to be able to get the name that I imported the module as rather than the actual name of the module.
My use case is that I have a function that takes a function object as an argument and needs to be able to determine what name it is bound to. Here is a more thorough example:
import module as my_module

def my_func(in_func):
    print in_func.__bound-name__  # or something to this effect

my_func(my_module.function1)  # should print "my_module.function1"
I would pass the module name as a string and then use globals() to fetch the module inside the function. Suppose you pass 'np' to the function; then globals()['np'] will return the module (keep in mind that globals() is the namespace of the module where the function is defined, not the caller's).
In [22]: import numpy as np

In [23]: def demo(A):
    ...:     a = globals()[A]
    ...:     print(a.array([i for i in range(10)]))
    ...:

In [24]: demo('np')
[0 1 2 3 4 5 6 7 8 9]
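Note that modules are first-class objects, so an alternative (a sketch) is to pass the module itself rather than its name, which avoids the globals() lookup entirely:
import numpy as np

def demo(mod):
    # use the module object that was passed in directly
    print(mod.array([i for i in range(10)]))

demo(np)  # [0 1 2 3 4 5 6 7 8 9]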
There is no way to do exactly what you want, because the string my_re is not stored anywhere; it is only the name of a variable. PEP 221, which proposed the syntax for the import ... as statement, explains that the following lines are equivalent:
import re as my_re
and
import re
my_re = re
del re
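If you really need the alias, the closest you can get is to search a namespace for names bound to the module object; a minimal sketch (names_bound_to is a hypothetical helper, not a standard function):
import re as my_re

def names_bound_to(obj, namespace):
    # collect every name in `namespace` that refers to exactly this object
    return [name for name, value in namespace.items() if value is obj]

print(names_bound_to(my_re, globals()))  # ['my_re']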

Convert python objects to python AST-nodes

I need to dump a modified Python object back into source code, so I am trying to find something that converts a real Python object into an ast node (to use later with the astor lib to dump source).
Example of usage I want, Python 2:
import ast
import importlib
import astor
m = importlib.import_module('something')
# modify an object
m.VAR.append(123)
ast_nodes = some_magic(m)
source = astor.dump(ast_nodes)
Please help me find that some_magic.
There's no way to do what you want, because that's not how ASTs work.
When the interpreter runs your code, it generates an AST from the source files and interprets that AST to produce Python objects.
What happens to those objects once they've been generated has nothing to do with the AST.
It is, however, possible to get the AST of the code that generated the object in the first place.
The module inspect lets you get the source code of some python objects:
import ast
import importlib
import inspect
m = importlib.import_module('pprint')
s = inspect.getsource(m)
a = ast.parse(s)
print(ast.dump(a))
# Prints the AST of the pprint module
But getsource() is aptly named: if I were to change the value of some variable (or any other object) in m, its source code wouldn't change.
Even if it were possible to regenerate an AST from an object, there wouldn't be a single solution for some_magic() to return.
Imagine I have a variable x in some module, that I reassign in another module:
# In some_module.py
x = 0
# In __main__.py
m = importlib.import_module('some_module')
m.x = 1 + 227
Now the value of m.x is 228, but there's no way to know what kind of expression led to that value (well, short of reading the AST of __main__.py, but that would quickly get out of hand). Was it a mere literal? The result of a function call?
If you really have to get a new AST after modifying some value of a module, the best solution would be to transform the original AST by yourself.
You can find where your identifier got its value, and replace the value of the assignment with whatever you want.
For instance, in my small example x = 0 is represented by the following AST:
Assign(targets=[Name(id='x', ctx=Store())], value=Num(n=0))
And to get the AST matching the reassignment I did in __main__.py, I would have to change the value of the above Assign node as the following:
value=BinOp(left=Num(n=1), op=Add(), right=Num(n=227))
If you'd like to go that way, I recommend you check Python's documentation of the AST node transformer (ast.NodeTransformer), as well as this excellent manual documenting all the nodes you can meet in Python ASTs: Green Tree Snakes - the missing Python AST docs.
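For instance, a minimal sketch of such a transformer (using ast.unparse, available since Python 3.9, in place of astor) that rewrites x = 0 into x = 1 + 227:
import ast

tree = ast.parse("x = 0")

class ReplaceX(ast.NodeTransformer):
    def visit_Assign(self, node):
        # rewrite assignments to `x` with the expression 1 + 227
        if any(isinstance(t, ast.Name) and t.id == "x" for t in node.targets):
            node.value = ast.BinOp(left=ast.Constant(1), op=ast.Add(),
                                   right=ast.Constant(227))
        return node

tree = ReplaceX().visit(tree)
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # x = 1 + 227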
What Vladimir is asking about is certainly useful for compiler optimizations. Indeed, there are ways to accomplish it using the ast library. Here is a simple example demonstrating evaluation of calls to pure functions with constant arguments:
from ast import *
import numpy as np

PURE_FUNS = {'arange': np.arange}

PROG = '''
A = arange(5)
B = [0, 1, 2, 3, 4]
A[2:3] = 1
C = [A[1], 2, m]
'''

def py_to_ast(o):
    if type(o) == np.ndarray:
        return List(elts=[py_to_ast(e) for e in o], ctx=Load())
    elif type(o) == np.int64:
        return Constant(value=o)
    # add elifs for more types here
    else:
        assert False

class EvalPureFuns(NodeTransformer):
    def visit_Call(self, node):
        is_const_args = all(type(a) == Constant for a in node.args)
        if node.func.id in PURE_FUNS and is_const_args:
            res = eval(unparse(node), PURE_FUNS)
            return py_to_ast(res)
        return node

node = parse(PROG)
node = EvalPureFuns().visit(node)
print(unparse(node))

Modules as function arguments

I have a python script that starts by importing a python module that contains data. A very simplified example is given below:
my_data1.py
bar = 10
baz = [1, 5, 7]
...
my_func.py
from my_data1 import *

def foo():
    '''
    function that uses the things defined in my_data
    (scalar, list, dicts, etc.)
    '''
    return [bar] + baz
This works great for one set of data; however, I have my_data1.py, ..., my_data36.py.
my_data36.py
bar = 31
baz = [-1, 58, 8]
...
that I want to import and then run foo() with that data. I wanted to do something like this:
def foo(my_data):
    from my_data import *

results = []
for i in range(1, 37):
    results.append(foo('my_data{}'.format(i)))
This doesn't work. Ideas?
Use __import__. It takes a string identifying the module to import and returns the module object, which you can then pass as an argument to your functions.
def processDataSet(module):
    print(module.baz)

# note: __import__ takes module names (no .py extension), not file names
for name in ['my_data1', 'my_data2', 'my_data36']:
    processDataSet(__import__(name))
"from module import * is invalid inside function definitions." from http://docs.python.org/2/howto/doanddont.html#inside-function-definitions
