Does anybody know if there is a performance difference between putting all imports from one module on a single line and writing one import per line?
For example, having:
from a import A, B, C, D, E, F, G
instead of:
from a import A
from a import B
from a import C
from a import D
from a import E
from a import F
from a import G
I'm trying to convince my team to use reorder-python-imports in our pre-commit hooks and this doubt is the only obstacle that prevents me from adding it.
Combining imports is technically faster but there should not be a noticeable performance difference in real world usage. Python modules execute only once, on their first time being imported. After being initialized, you only incur the negligible cost of the additional import statements themselves.
As long as you only do top-level imports, your imports will only ever execute during startup anyway. By combining them you might at best shave a few milliseconds off startup. See the Python docs on the import mechanism.
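To illustrate the "modules execute only once" point, here is a minimal sketch. It writes a throwaway module to a temporary directory (the name import_once_demo is made up for this example) and counts how often its body runs when it is imported twice:

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

MODULE_NAME = "import_once_demo"  # hypothetical module name, used only here

with tempfile.TemporaryDirectory() as tmp:
    # The module body prints a marker so that we can count how often it runs.
    with open(os.path.join(tmp, MODULE_NAME + ".py"), "w") as fh:
        fh.write("print('module body executed')\nVALUE = 42\n")

    sys.modules.pop(MODULE_NAME, None)  # start from a clean cache
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            first = importlib.import_module(MODULE_NAME)   # runs the module body
            second = importlib.import_module(MODULE_NAME)  # served from sys.modules
    finally:
        sys.path.remove(tmp)

executions = buffer.getvalue().count("module body executed")
print(executions)       # 1
print(first is second)  # True
```

The second import only looks the module up in sys.modules, which is why repeated import statements are cheap compared with the first one.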
Here is my machine's performance after a million repetitions each:
test1.py
from timeit import timeit
print(timeit("""
from socket import socket
from socket import create_connection
from socket import has_dualstack_ipv6
from socket import getaddrinfo
from socket import gethostbyaddr
"""))
# Prints 2.8450163
test2.py
from timeit import timeit
print(timeit("""
from socket import socket, create_connection, has_dualstack_ipv6, getaddrinfo, gethostbyaddr
"""))
# Prints 0.6992155
Related
I was reviewing code and updating the import statements based on general guidelines, changing “from xxx import *” to “from xxx import m, n, p”. The difference in script execution time, however, was noticeable:
from collections import OrderedDict
from definitions import *
Average script time 22ms
from collections import OrderedDict
from definitions import a, b, c, d, e, f
Average script time 48ms
The topic of import performance has been taken up several times on SE, and these results seem to run counter to some answers. Why would the import statement in this case cause such a significant difference in the script performance?
A follow-up question: This package uses a "definitions.py" to store the package general purpose (mostly static) classes and functions. What is the best way to import all classes from a module, without needing to prefix with "definitions." every time they are used?
EDIT More information... Curiouser and Curiouser
Script timing is done using time.clock() over >50 iterations
It turns out that OrderedDict is already imported in definitions, so when I import it from there then the script time is back down:
from definitions import a, b, c, d, e, f, OrderedDict
Average script time 22ms
Just to thicken the plot, there was also a "from System import Array" statement though this has no effect on the script time. The way that the script imports OrderedDict appears to be the issue.
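For anyone who wants to repeat this kind of measurement on a current interpreter: time.clock() was removed in Python 3.8, and time.perf_counter() is the usual replacement. Below is a rough sketch of an averaging helper. The names average_runtime and import_ordered_dict are made up for illustration, and the original script and its definitions module are not reproduced here:

```python
import time


def average_runtime(func, iterations=50):
    """Return the mean wall-clock time of func() in seconds over `iterations` runs."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    return (time.perf_counter() - start) / iterations


def import_ordered_dict():
    # After the first call this import is served from the module cache.
    from collections import OrderedDict
    return OrderedDict


mean_seconds = average_runtime(import_ordered_dict)
print(f"{mean_seconds * 1e6:.2f} microseconds per call")
```

Averaging over many runs in one process mostly measures cached imports, so a cold-start comparison would need a fresh interpreter per run.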
I am trying to write GUI tests using the Linux Desktop Testing Project (ldtp). It seems to work, except that I get long delays at unpredictable times.
For example, when I try:
import os
from ldtp import *
from ldtputils import *
from time import sleep
launchapp('gedit')
waittillguiexist('*-gedit')
ldtp.selectmenuitem ('*-gedit', 'mnuFile;mnuQuit')
It takes more than 30 sec to execute the line "ldtp.selectmenuitem ('*-gedit', 'mnuFile;mnuQuit')"
I suspect most of the time is spent in the following lines:
from ldtp import *
from ldtputils import *
There are two ways to improve the performance:
1st: Don't use from ldtp import *; use import ldtp instead. You then need to prefix every call with ldtp.
2nd: If you are using only one function (say selectmenuitem), use from ldtp import selectmenuitem at the top.
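The difference between the two import styles above is which names end up in your namespace. Here is a small sketch, using the standard math module as a stand-in because ldtp may not be installed. It only shows name binding and says nothing about where the 30 second delay comes from:

```python
# Run each import style in its own namespace and inspect what was bound.
star_namespace = {}
exec("from math import *", star_namespace)

module_namespace = {}
exec("import math", module_namespace)

print("sqrt" in star_namespace)    # True: the function is bound directly
print("math" in star_namespace)    # False: the module name itself is not bound
print("math" in module_namespace)  # True: only the module name is bound
print("sqrt" in module_namespace)  # False: use math.sqrt instead
```

So after a star import alone, a qualified call such as module.function only works if the module name was also imported (or happens to be exported by the star import).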
What is the best way to import a module if it is not needed all the time?
Should I import the module at the top of the file unconditionally, or should I import it inside a condition?
Will an import at the top of the file slow down the application?
For example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from app.settings import CONDITION
from foo.bar import myClass
if CONDITION:
    # ... do some action with myClass
or:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from app.settings import CONDITION
if CONDITION:
    from foo.bar import myClass
    # ... do some action with myClass
Per PEP 8, you should place all your import statements at the top of the file, and I agree with this, even if you are only going to use the module once in a function.
Your code can be a bit unreadable if your import statements are scattered across your code.
As for whether the imports will slow your script down: probably, but not by enough that you should really be worrying about it.
With the import at the top, whatever milliseconds the import takes are spent during the startup of your program. That is better than making your program pause to import when a certain condition becomes active.
Also, importing at the top makes for clean code.
Your second way of importing is probably better if you only occasionally need the module, especially if the module does some heavy initialization work.
An import statement essentially calls the built-in function __import__(name).
if True:
    import os
is equivalent to:
if True:
    os = __import__('os')
And the best part is that the result of __import__ is cached, so you don't need to worry that by calling it multiple times you would end up parsing the module multiple times.
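A quick way to see the caching for yourself, as a minimal sketch using the standard json module (any importable module would do):

```python
import sys

first = __import__("json")
second = __import__("json")

# Both calls return the very same module object, which is the one stored in sys.modules.
same_object = first is second and second is sys.modules["json"]
print(same_object)  # True
```

The second call does not read or execute the module file again; it just returns the cached entry.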
EDIT: The other answers do have good points, that it is cleaner to have it on top and if the condition is ever evaluated to True, you end up paying the price sooner or later.
I guess it depends on your specific use case too. For example, often times we want to choose one of the implementations of a particular module, we do:
try:
    import simplejson as json
except ImportError:
    import json
The question of how to speed up importing of Python modules has been asked previously (Speeding up the python "import" loader and Python -- Speed Up Imports?) but without specific examples and has not yielded accepted solutions. I will therefore take up the issue again here, but this time with a specific example.
I have a Python script that loads a 3-D image stack from disk, smooths it, and displays it as a movie. I call this script from the system command prompt when I want to quickly view my data. I'm OK with the 700 ms it takes to smooth the data as this is comparable to MATLAB. However, it takes an additional 650 ms to import the modules. So from the user's perspective the Python code runs at half the speed.
This is the series of modules I'm importing:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
Of course, not all modules are equally slow to import. The chief culprits are:
matplotlib.pyplot [300ms]
numpy [110ms]
scipy.signal [200ms]
I have experimented with using from, but this isn't any faster. Since Matplotlib is the main culprit and it's got a reputation for slow screen updates, I looked for alternatives. One is PyQtGraph, but that takes 550 ms to import.
I am aware of one obvious solution, which is to call my function from an interactive Python session rather than the system command prompt. This is fine but it's too MATLAB-like, I'd prefer the elegance of having my function available from the system prompt.
I'm new to Python and I'm not sure how to proceed at this point. Since I'm new, I'd appreciate links on how to implement proposed solutions. Ideally, I'm looking for a simple solution (aren't we all!) because the code needs to be portable between multiple Mac and Linux machines.
Not an actual answer to the question, but a hint on how to profile the import speed with Python 3.7 and tuna (a small project of mine):
python3 -X importtime -c "import scipy" 2> scipy.log
tuna scipy.log
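If you want a rough number from inside Python rather than from the command line, a sketch along these lines can help. The helper name time_import is made up for this example, and the standard-library modules are only placeholders for the heavy ones in the question:

```python
import importlib
import sys
import time


def time_import(module_name):
    """Import module_name once and return (module, elapsed seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


for name in ("json", "decimal", "fractions"):
    already_cached = name in sys.modules
    _, seconds = time_import(name)
    print(f"{name:10s} {seconds * 1000:8.3f} ms (cached before: {already_cached})")
```

Only the first import in a fresh interpreter is meaningful; anything already in sys.modules will report close to zero. The -X importtime output is more detailed because it breaks the time down per nested import.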
You could build a simple server/client: the server runs continuously, making and updating the plot, and the client just tells it the next file to process.
I wrote a simple server/client example based on the basic example from the socket module docs: http://docs.python.org/2/library/socket.html#example
Here is server.py:
# expensive imports
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
# Echo server program
import socket
HOST = '' # Symbolic name meaning all available interfaces
PORT = 50007 # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)
while True:
    conn, addr = s.accept()
    print('Connected by', addr)
    data = conn.recv(1024)
    if not data:
        break
    conn.sendall(b"PLOTTING:" + data)
    # update plot
    conn.close()
and client.py:
# Echo client program
import socket
import sys
HOST = '' # The remote host
PORT = 50007 # The same port as used by the server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.sendall(sys.argv[1].encode())
data = s.recv(1024)
s.close()
print('Received', repr(data))
You just run the server:
python server.py
This does the imports. The client then just sends the filename of the new file to plot over the socket:
python client.py mytextfile.txt
The server then updates the plot.
On my machine, running your imports takes 0.6 seconds, while running client.py takes 0.03 seconds.
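You can import your modules manually instead, using imp. Note that imp was deprecated in Python 3.4 and removed in Python 3.12 in favor of importlib, so this only applies to older interpreters.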
For example, import numpy as np could probably be written as
import imp
np = imp.load_module("numpy",None,"/usr/lib/python2.7/dist-packages/numpy",('','',5))
This will spare python from browsing your entire sys.path to find the desired packages.
See also:
Manually importing gtk fails: module not found
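On current Python versions, the same idea of loading a module from a known path can be sketched with importlib.util. The helper name load_module_from_path and the throwaway module manual_demo are made up for this example, and whether this actually saves time for a package like numpy is not measured here:

```python
import importlib.util
import os
import sys
import tempfile


def load_module_from_path(module_name, file_path):
    """Load a module directly from file_path, bypassing the sys.path search."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register it so later imports find it
    spec.loader.exec_module(module)    # run the module body
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "manual_demo.py")
    with open(path, "w") as fh:
        fh.write("ANSWER = 42\n")
    manual_demo = load_module_from_path("manual_demo", path)

print(manual_demo.ANSWER)  # 42
```

This follows the recipe for importing a source file directly from the importlib documentation.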
1.35 seconds isn't long, but I suppose if you're used to half that for a "quick check" then perhaps it seems so.
Andrea suggests a simple client/server setup, but it seems to me that you could just as easily run a slightly modified version of your script and keep its console window open while you work:
Call the script, which does the imports then waits for input
Minimize the console window, switch to your work, whatever: *Do work*
Select the console again
Provide the script with some sort of input
Receive the results with no import overhead
Switch away from the script again while it happily awaits input
I assume your script is identical every time, i.e. you don't need to give it an image stack location or any particular commands each time (but these are easy to add as well!).
Example RAAC's_Script.py:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
print('********* RAAC\'s Script Now Running *********')
while True:  # Loops forever
    # Display a message and wait for user to enter text followed by enter key.
    # In this case, we're not expecting any text at all and if there is any it's ignored
    input('Press Enter to test image stack...')
    '''
    *
    *
    **RAAC's Code Goes Here** (Make sure it's indented/inside the while loop!)
    *
    *
    '''
To end the script, close the console window or press ctrl+c.
I've made this as simple as possible, but it would require very little extra to handle things like quitting nicely, doing slightly different things based on input, etc.
You can use lazy imports, but it depends on your use case.
If it's an application, you can import only the modules needed for the GUI, and then import everything else after the window has loaded.
If it's a module and users do not need all of its dependencies, you can import inside the function that needs them.
[warning]
I think this goes against PEP 8 and it's not recommended in some places, but the reasons are mostly readability (I may be wrong though...) and bundling with some builders (e.g. PyInstaller), which can be solved by adding the missing dependencies to the spec.
If you use lazy imports, add comments so the user knows that there are extra dependencies.
Example:
import numpy as np
# Lazy imports
# import matplotlib.pyplot as plt
def plot():
    import matplotlib.pyplot as plt
    # Your function here
    # This will be imported during runtime
For some specific libraries I think it's a necessity.
You can also expose a small public API in __init__.py.
Take scikit-learn, for example. If you import sklearn and then try to use some model, it is not found and an error is raised. You need to be more specific and import the submodule directly. Though it can be inconvenient for users, it's IMHO good practice and can reduce import times significantly.
Usually 10% of imported libraries cost 90% of import time. A very simple tool for analysis is line_profiler:
import line_profiler
import atexit
profile = line_profiler.LineProfiler()
atexit.register(profile.print_stats)
@profile
def profiled_function():
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
profiled_function()
This gives the following results:
Line #      Hits         Time      Per Hit   % Time  Line Contents
==============================================================
    20                                               @profile
    21                                               def profiled_function():
    22
    23         1    2351852.0    2351852.0      6.5      import numpy as np
    24         1    6545679.0    6545679.0     18.0      import pandas as pd
    25         1   27485437.0   27485437.0     75.5      import matplotlib.pyplot as plt
Matplotlib accounts for 75% of the import time of the three libraries (this does not mean that it's badly written, it just needs a lot of machinery for graphical output).
Note:
If you import a library in one module, importing it again elsewhere costs almost nothing; loaded modules are shared globally.
Another note:
For standard-library modules (e.g. pathlib, subprocess), do not bother with lazy loading; in my experience their import times are close to zero and don't need to be optimized.
I have done just a basic test below, but it shows that runpy can be used to solve this issue when you need a whole Python script to run faster (and you don't want to put any logic in test_server.py).
test_server.py
import socket
import time
import runpy
import matplotlib.pyplot
HOST = 'localhost'
PORT = 50007
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    serversocket.bind((HOST, PORT))
except OSError:
    print("Server is already running")
    exit(1)
# Start server with maximum 100 connections
serversocket.listen(100)
while True:
    connection, address = serversocket.accept()
    buf = connection.recv(64)
    if len(buf) > 0:
        buf_str = str(buf.decode("utf-8"))
        now = time.time()
        runpy.run_path(path_name=buf_str)
        after = time.time()
        duration = after - now
        print("I received " + buf_str + " script and it took " + str(duration) + " seconds to execute it")
test_client.py
import socket
import sys
HOST = 'localhost'
PORT = 50007
clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsocket.connect((HOST, PORT))
message = sys.argv[1].encode()
clientsocket.send(message)
test_lag.py
import matplotlib.pyplot
Testing:
$ python3 test_client.py test_lag.py
I received test_lag.py script and it took 0.0002799034118652344 seconds to execute it
$ time python3 test_lag.py
real 0m0.624s
user 0m1.307s
sys 0m0.180s
Based on this, the module is preloaded and ready for fast use.
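The runpy part can be tried without any sockets. Here is a minimal sketch: json stands in for an expensive module that the long-running process has already imported, and demo_script.py is a throwaway script written to a temporary directory just for this example:

```python
import os
import runpy
import tempfile

import json  # stand-in for an expensive module that is already loaded in this process

with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "demo_script.py")
    with open(script_path, "w") as fh:
        # The script's own import is served from the already populated module cache.
        fh.write("import json\nRESULT = json.dumps({'ok': True})\n")
    script_globals = runpy.run_path(script_path)

print(script_globals["RESULT"])  # {"ok": true}
```

runpy.run_path executes the script inside the current process and returns its globals, so the script benefits from whatever the host process has already imported.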
When importing modules in Python, what is the difference between this:
from module import a, b, c, d
and this
from module import a
from module import b
from module import c
from module import d
To me it makes sense to always condense code and use the first example, but I've been seeing some code samples out there doing the second. Is there any difference at all, or is it all down to the preference of the programmer?
There is no difference at all. They both function exactly the same.
However, from a stylistic perspective, one might be preferable to the other. On that note, the PEP-8 section on imports says that you should put from module import name1, name2 on a single line and keep import module1 statements on separate lines:
Yes: import os
     import sys
No:  import sys, os
Ok:  from subprocess import Popen, PIPE
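As a small check of the "no difference" claim above, this sketch runs both forms in separate namespaces and compares what they bind. os.path is used only as a convenient standard-library example:

```python
combined = {}
exec("from os.path import join, split, basename", combined)

separate = {}
exec(
    "from os.path import join\n"
    "from os.path import split\n"
    "from os.path import basename\n",
    separate,
)

# Both namespaces end up with the very same function objects.
names = ("join", "split", "basename")
identical = all(combined[name] is separate[name] for name in names)
print(identical)  # True
```

The resulting bindings are identical, so the choice between the two forms is about style and tooling rather than behavior.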
In response to @teewuane's comment (repeated here in case the comment gets deleted):
@inspectorG4dget What if you have to import several functions from one
module and it ends up making that line longer than 80 char? I know
that the 80 char thing is "when it makes the code more readable" but I
am still wondering if there is a more tidy way to do this. And I don't
want to do from foo import * even though I am basically importing
everything.
The issue here is that doing something like the following could exceed the 80 char limit:
from module import func1, func2, func3, func4, func5
To this, I have two responses (I don't see PEP8 being overly clear about this):
Break it up into two imports:
from module import func1, func2, func3
from module import func4, func5
Doing this has the disadvantage that if module is removed from the codebase or otherwise refactored, then both import lines will need to be deleted. This could prove to be painful.
Split the line:
To mitigate the above concern, it may be wiser to do
from module import func1, func2, func3, \
    func4, func5
This would result in an error if the second line is not deleted along with the first, while still maintaining the singular import statement
To add to some of the questions raised by inspectorG4dget's answer, you can also wrap the imported names in parentheses (it looks like a tuple, but it is simply part of the import syntax) to do multi-line imports when folder structures start getting deeply nested or you have modules with obtuse names.
from some.module.submodule.that_has_long_names import (
    first_item,
    second_item,
    more_imported_items_with_really_enormously_long_names_that_might_be_too_descriptive,
    that_would_certainly_not_fit,
    on_one_line,
)
This also works, though I'm not a fan of this style:
from module import (a_ton, of, modules, that_seem, to_keep, needing,
                    to_be, added, to_the_list, of_required_items)
I would suggest not following PEP-8 blindly. When you have about half a screen's worth of imports, things start becoming uncomfortable, and PEP-8 is then in conflict with the readability guidelines of PEP-20.
My preference is:
Put all built-in imports, such as sys, os, time etc., on one line.
For other imports, use one line per package (not module).
This gives you a good balance, because the reader can still quickly glance over the dependencies while achieving reasonable compactness.
For example,
My Preference
# one line per package
import os, json, time, sys, math
import numpy as np
import torch, torch.nn as nn, torch.autograd, torch.nn.functional as F
from torchvision import models, transforms
PEP-8 Recommendation
# one line per module or from ... import statement
import os
import json
import time
import sys
import math
import numpy as np
import torch
from torch import nn, autograd
from torch.nn import functional as F
from torchvision import models, transforms
A concern not mentioned by other answers is git merge conflicts.
Let's say you start with this import statement:
import os
If you change this line to import os, sys in one branch and import json, os in another branch, you will get this conflict when you attempt to merge them:
<<<<<<< HEAD
import os, sys
=======
import json, os
>>>>>>> branch
But if you add import sys and import json on separate lines, you get a nice merge commit with no conflicts:
--- a/foo.py
+++ b/foo.py
@@@ -1,2 -1,2 +1,3 @@@
+ import json
  import os
 +import sys
You will still get a conflict if the two imports were added at the same location, as git doesn't know which order they should appear in. So if you had imported time instead of json, for example:
import os
<<<<<<< HEAD
import sys
=======
import time
>>>>>>> branch
Still, it can be worth sticking with this style for the occasions where it does avoid merge conflicts.
Imports should usually be on separate lines as per PEP 8 guidelines.
# Wrong Use
import os, sys
# Correct Use
import os
import sys
For more import-related PEP 8 violations and fixes, see https://ayush-raj-blogs.hashnode.dev/making-clean-pr-for-open-source-contributors-pep-8-style.
Both are the same.
Use from module import a, b, c, d.
If you want to import only one name from a module, use:
from module import a
If you want to import multiple names from the same module, use:
from module import a, b, c, d
There is no need to write them on separate lines when both forms are equivalent.