improving speed of Python module import

The question of how to speed up importing of Python modules has been asked previously (Speeding up the python "import" loader and Python -- Speed Up Imports?) but without specific examples and has not yielded accepted solutions. I will therefore take up the issue again here, but this time with a specific example.
I have a Python script that loads a 3-D image stack from disk, smooths it, and displays it as a movie. I call this script from the system command prompt when I want to quickly view my data. I'm OK with the 700 ms it takes to smooth the data as this is comparable to MATLAB. However, it takes an additional 650 ms to import the modules. So from the user's perspective the Python code runs at half the speed.
This is the series of modules I'm importing:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
Of course, not all modules are equally slow to import. The chief culprits are:
matplotlib.pyplot [300ms]
numpy [110ms]
scipy.signal [200ms]
I have experimented with using from, but this isn't any faster. Since Matplotlib is the main culprit and it's got a reputation for slow screen updates, I looked for alternatives. One is PyQtGraph, but that takes 550 ms to import.
I am aware of one obvious solution, which is to call my function from an interactive Python session rather than the system command prompt. This is fine, but it's too MATLAB-like; I'd prefer the elegance of having my function available from the system prompt.
I'm new to Python and I'm not sure how to proceed at this point. Since I'm new, I'd appreciate links on how to implement proposed solutions. Ideally, I'm looking for a simple solution (aren't we all!) because the code needs to be portable between multiple Mac and Linux machines.

Not an actual answer to the question, but a hint on how to profile the import speed with Python 3.7 and tuna (a small project of mine):
python3 -X importtime -c "import scipy" 2> scipy.log
tuna scipy.log
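If you'd rather not install anything, the same -X importtime log can be summarised with a few lines of Python. This is a minimal sketch that assumes Python 3.7+ (for -X importtime and capture_output) and the "import time: self | cumulative | imported package" line layout that CPython writes to stderr; slowest_imports is a hypothetical helper, not part of tuna.
```python
import subprocess
import sys


def slowest_imports(statement, top=5):
    """Run `statement` in a fresh interpreter with -X importtime and return
    the `top` (cumulative_us, module) pairs with the largest cumulative time."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    # Each stderr line looks like "import time:   123 |   456 |   package.module"
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = [f.strip() for f in line[len("import time:"):].split("|")]
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # skips the "self [us] | cumulative | imported package" header
        rows.append((int(fields[1]), fields[2]))
    # A package's cumulative time includes its submodules, so parents sort first
    rows.sort(reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for cumulative_us, name in slowest_imports("import json"):
        print(f"{cumulative_us / 1000:8.2f} ms  {name}")
```
The times are in microseconds, as in the raw log; for the question's case you would pass "import matplotlib.pyplot" or "import scipy.signal" as the statement.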

You could build a simple server/client setup: the server runs continuously, making and updating the plot, and the client just communicates the next file to process.
I wrote a simple server/client example based on the basic example from the socket module docs: http://docs.python.org/2/library/socket.html#example
Here is server.py:
# expensive imports
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
# Echo server program
import socket
HOST = '' # Symbolic name meaning all available interfaces
PORT = 50007 # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)
while True:
    conn, addr = s.accept()
    print('Connected by', addr)
    data = conn.recv(1024)
    if not data:
        conn.close()
        break
    conn.sendall(b"PLOTTING:" + data)
    # update plot
    conn.close()
and client.py:
# Echo client program
import socket
import sys
HOST = 'localhost'        # The remote host (here the server runs on this machine)
PORT = 50007              # The same port as used by the server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.sendall(sys.argv[1].encode())
data = s.recv(1024)
s.close()
print('Received', repr(data))
You just run the server:
python server.py
which does the imports; then the client just sends the filename of the new file to plot over the socket:
python client.py mytextfile.txt
Then the server updates the plot.
On my machine, running your imports takes 0.6 seconds, while running client.py takes 0.03 seconds.
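The standard library's multiprocessing.connection module can replace the raw socket code and takes care of message framing and authentication for you. The sketch below swaps in that module (it is not the code from this answer); AUTHKEY, handle_request, serve and request are hypothetical names, and handle_request only echoes the filename where the real server would load, smooth and display the stack.
```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"change-me"  # hypothetical shared secret; both sides must use the same value


def handle_request(filename):
    # Placeholder for the expensive part: load, smooth and display the stack.
    return "PLOTTING:" + filename


def serve(listener, max_requests=None):
    """Answer requests on `listener`; runs forever unless max_requests is given."""
    handled = 0
    while max_requests is None or handled < max_requests:
        with listener.accept() as conn:
            filename = conn.recv()  # messages are pickled, hence the authkey
            conn.send(handle_request(filename))
        handled += 1


def request(address, filename):
    """Client side: send one filename and wait for the server's reply."""
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(filename)
        return conn.recv()


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port; a real setup would use a fixed one.
    with Listener(("localhost", 0), authkey=AUTHKEY) as listener:
        worker = threading.Thread(target=serve, args=(listener, 1))
        worker.start()
        print(request(listener.address, "mytextfile.txt"))
        worker.join()
```
In real use, the long-lived script would import the heavy modules and call serve(Listener(("localhost", 50007), authkey=AUTHKEY)), and the client script would call request(("localhost", 50007), sys.argv[1]).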

You can import your modules manually instead, using the imp module (see its documentation).
For example, import numpy as np could probably be written as
import imp
np = imp.load_module("numpy",None,"/usr/lib/python2.7/dist-packages/numpy",('','',5))
This will spare python from browsing your entire sys.path to find the desired packages.
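Note that imp has been deprecated since Python 3.4 and was removed in Python 3.12. A rough importlib equivalent of the imp.load_module call is sketched below; load_module_from_path is a hypothetical helper, and whether skipping the sys.path search saves measurable time depends on your setup, since most of an import's cost is usually spent executing the module itself.
```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(name, path):
    """Import the file at `path` as module `name`, bypassing the sys.path search."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register first, as a normal import would
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)  # don't leave a half-initialised module behind
        raise
    return module


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = Path(tmp, "demo_module.py")
        demo_path.write_text("GREETING = 'loaded from an explicit path'\n")
        demo = load_module_from_path("demo_module", demo_path)
        print(demo.GREETING)
```
For a package such as numpy you would pass the path to its __init__.py (for example load_module_from_path("numpy", "/path/to/site-packages/numpy/__init__.py"), where the path is a placeholder); spec_from_file_location should then mark it as a package so its submodules can still be found.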
See also:
Manually importing gtk fails: module not found

1.35 seconds isn't long, but I suppose if you're used to half that for a "quick check" then perhaps it seems so.
Andrea suggests a simple client/server setup, but it seems to me that you could just as easily call a slightly modified version of your script and keep its console window open while you work:
Call the script, which does the imports then waits for input
Minimize the console window, switch to your work, whatever: *Do work*
Select the console again
Provide the script with some sort of input
Receive the results with no import overhead
Switch away from the script again while it happily awaits input
I assume your script is identical every time, i.e. you don't need to give it an image stack location or any particular commands each time (but these are easy to add as well!).
Example RAAC's_Script.py:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
print('********* RAAC\'s Script Now Running *********')
while True: # Loops forever
    # Display a message and wait for user to enter text followed by enter key.
    # In this case, we're not expecting any text at all and if there is any it's ignored
    input('Press Enter to test image stack...')
    '''
    *
    *
    **RAAC's Code Goes Here** (Make sure it's indented/inside the while loop!)
    *
    *
    '''
To end the script, close the console window or press ctrl+c.
I've made this as simple as possible, but it would require very little extra to handle things like quitting nicely, doing slightly different things based on input, etc.
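Here is one way to add the extras mentioned above: typing a path shows that stack, pressing Enter alone repeats the previous one, and q (or Ctrl+D/Ctrl+C) quits without a traceback. run_viewer and show_stack are hypothetical names; the expensive imports would stay at the top of the script, above the loop, so they are paid only once.
```python
def run_viewer(show_stack, read_line=input):
    """Prompt for image-stack paths until the user quits.

    show_stack(path) stands in for the load/smooth/animate code.
    Enter alone repeats the previous path; q, quit or exit stops the loop.
    """
    print("********* RAAC's Script Now Running *********")
    last_path = None
    while True:
        try:
            line = read_line("Image stack (Enter = repeat, q = quit): ").strip()
        except (EOFError, KeyboardInterrupt):
            print()  # Ctrl+D / Ctrl+C: finish the prompt line and exit quietly
            break
        if line.lower() in ("q", "quit", "exit"):
            break
        path = line or last_path
        if path is None:
            print("No previous stack yet; please type a path.")
            continue
        show_stack(path)
        last_path = path


if __name__ == "__main__":
    # The expensive imports (numpy, matplotlib, scipy) would go at the top of the file.
    run_viewer(lambda path: print(f"(would load, smooth and animate {path} here)"))
```
Passing read_line as a parameter is only there so the loop can be driven without a keyboard, e.g. from a test.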

You can use lazy imports, but it depends on your use case.
If it's an application, you can import only the modules the GUI needs first, and import all the others after the window has loaded.
If it's a library and users don't need all of its dependencies, you can import inside the function that needs them.
Warning:
PEP 8 says imports belong at the top of the file, and some projects discourage imports inside functions. As far as I can tell (I may be wrong), the reasons are mostly readability and bundlers such as PyInstaller missing the dependency, which can be solved by adding the missing dependency to the spec file (e.g. via hiddenimports).
If you use lazy imports, add comments so users know there are extra dependencies.
Example:
import numpy as np
# Lazy imports
# import matplotlib.pyplot as plt
def plot():
    import matplotlib.pyplot as plt
    # Your function here
    # matplotlib is only imported when plot() is called for the first time
For some libraries I think it's a necessity.
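The standard library also offers importlib.util.LazyLoader, which lets you keep the import at the top of the file while deferring the actual loading until the module is first used. The sketch follows the lazy-import recipe in the importlib documentation; lazy_import is a hypothetical helper name. Caveats: for a dotted name such as matplotlib.pyplot the parent package (matplotlib) is still imported eagerly, and errors inside the deferred module only surface on first use.
```python
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, deferring its execution until first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already loaded: nothing to defer
    spec = importlib.util.find_spec(name)  # note: imports parent packages eagerly
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module; the real code runs later
    return module


# Usage: reads like a top-level import, but matplotlib.pyplot's code would
# only run when plt.plot(...) or another attribute is first accessed.
# plt = lazy_import("matplotlib.pyplot")
```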
You can also design the API in your package's __init__.py around this.
Take scikit-learn as an example: if you import sklearn and then try to use some model through it, the model isn't found and you get an error; you need to be more specific and import the submodule directly. Though this can be inconvenient for users, it's in my opinion good practice and can reduce import times significantly.
In my experience, about 10% of the imported libraries account for 90% of the import time. A very simple tool for analysis is line_profiler:
import atexit
import line_profiler
profile = line_profiler.LineProfiler()
atexit.register(profile.print_stats)
@profile
def profiled_function():
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
profiled_function()
This gives:
Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
    20                                             @profile
    21                                             def profiled_function():
    22
    23         1    2351852.0  2351852.0      6.5      import numpy as np
    24         1    6545679.0  6545679.0     18.0      import pandas as pd
    25         1   27485437.0 27485437.0     75.5      import matplotlib.pyplot as plt
matplotlib accounts for 75% of the import time of these three libraries (this doesn't mean it's badly written; it just needs a lot of machinery for graphical output).
Note:
Once a library has been imported in one module, importing it in other modules costs almost nothing, because loaded modules are shared process-wide (cached in sys.modules).
Another note:
Don't bother lazy-loading modules from the standard library (e.g. pathlib, subprocess): in my experience their import times are close to zero and don't need to be optimized.

I have only done a basic test below, but it shows that runpy can solve this issue when you need a whole Python script to start faster (without putting any of its logic into test_server.py).
test_server.py
import socket
import sys
import time
import runpy
import matplotlib.pyplot  # the slow import, paid once when the server starts
HOST = 'localhost'
PORT = 50007
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    serversocket.bind((HOST, PORT))
except OSError:
    print("Server is already running")
    sys.exit(1)
# Queue up to 100 pending connections
serversocket.listen(100)
while True:
    connection, address = serversocket.accept()
    buf = connection.recv(64)
    connection.close()
    if len(buf) > 0:
        buf_str = buf.decode("utf-8")
        now = time.perf_counter()
        runpy.run_path(path_name=buf_str)
        after = time.perf_counter()
        duration = after - now
        print("I received " + buf_str + " script and it took " + str(duration) + " seconds to execute it")
test_client.py
import socket
import sys
HOST = 'localhost'
PORT = 50007
clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsocket.connect((HOST, PORT))
message = sys.argv[1].encode()
clientsocket.sendall(message)
clientsocket.close()
test_lag.py
import matplotlib.pyplot
Testing:
$ python3 test_client.py test_lag.py
I received test_lag.py script and it took 0.0002799034118652344 seconds to execute it
$ time python3 test_lag.py
real 0m0.624s
user 0m1.307s
sys 0m0.180s
This shows that the module is already loaded in the server process, so scripts run through it no longer pay the import cost.
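One detail to watch with this approach: runpy.run_path uses run_name="<run_path>" by default, so a script's if __name__ == "__main__": block is skipped, and only sys.argv[0] is set for you. The sketch below (run_script is a hypothetical helper) passes run_name="__main__" and temporarily sets the rest of sys.argv, so the script behaves roughly as if it had been launched directly.
```python
import runpy
import sys
import tempfile
from pathlib import Path


def run_script(path, args=()):
    """Run the script at `path` like `python path args...`; return its globals."""
    saved_argv = sys.argv[:]
    sys.argv = [str(path), *args]  # run_path itself only sets sys.argv[0]
    try:
        # run_name="__main__" makes the script's main guard fire
        return runpy.run_path(str(path), run_name="__main__")
    finally:
        sys.argv = saved_argv


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp, "show_stack.py")
        script.write_text(
            "import sys\n"
            "if __name__ == '__main__':\n"
            "    print('showing', sys.argv[1:])\n"
        )
        run_script(script, ["stack.tif"])
```
In test_server.py this would mean calling run_script(buf_str) instead of runpy.run_path(path_name=buf_str), and sending any extra arguments along with the script name.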

Related

Performance impact in having a single import per line

Does anybody know if there is a performance difference between having all imports from one module on a single line vs. one per line?
For example, having:
from a import A, B, C, D, E, F, G
instead of:
from a import A
from a import B
from a import C
from a import D
from a import E
from a import F
from a import G
I'm trying to convince my team to use reorder-python-imports in our pre-commit hooks and this doubt is the only obstacle that prevents me from adding it.

Combining imports is technically faster but there should not be a noticeable performance difference in real world usage. Python modules execute only once, on their first time being imported. After being initialized, you only incur the negligible cost of the additional import statements themselves.
As long as you only do top-level imports, your imports will only ever execute during startup anyway. By combining them you might at best shave a negligible amount of time off startup (on the order of microseconds per statement, judging by the timings below). Read the Python docs on the import mechanism.
Here is my machine's performance after a million repetitions each:
test1.py
from timeit import timeit
print(timeit("""
from socket import socket
from socket import create_connection
from socket import has_dualstack_ipv6
from socket import getaddrinfo
from socket import gethostbyaddr
"""))
# Prints 2.8450163
test2.py
from timeit import timeit
print(timeit("""
from socket import socket, create_connection, has_dualstack_ipv6, getaddrinfo, gethostbyaddr
"""))
# Prints 0.6992155
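The "modules execute only once" point is easy to check: the sketch below writes a throwaway module whose body prints a line, imports it several times, and counts the prints. The module name run_once_demo and the helper count_body_executions are made up for the demonstration.
```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path


def count_body_executions(n_imports=3):
    """Import a throwaway module n_imports times; return how often its body ran."""
    captured = io.StringIO()
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "run_once_demo.py").write_text('print("module body executed")\n')
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the new file is noticed
        try:
            with contextlib.redirect_stdout(captured):
                for _ in range(n_imports):
                    import run_once_demo  # noqa: F401  (only the first one runs the file)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("run_once_demo", None)  # allow a fresh demo next time
    return captured.getvalue().count("module body executed")


if __name__ == "__main__":
    print(count_body_executions(5))  # prints 1
```
Every import after the first is just a lookup in sys.modules, which is why the cost difference between the two styles above is so small.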

splitting python code into separate files

I am trying to split common python code into separate files.
For example, I have svr.py with the following code:
import socket
PORT = 6060
SERVER = socket.gethostbyname(socket.gethostname())
ADDRESS = (SERVER, PORT)
__server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
__server.bind(ADDRESS)
def startServer():
    pass
startServer()
So I am thinking of splitting it into 2 Python files, since the common section (bcode.py) will be used in svr.py and client.py.
file: bcode.py has the following code
import socket
PORT = 6060
SERVER = socket.gethostbyname(socket.gethostname())
ADDRESS = (SERVER, PORT)
file: svr.py has the following code
import socket
import bcode
__server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
__server.bind(ADDRESS)
def startServer():
    pass
startServer()
From my understanding, when I do import bcode, the Python interpreter executes bcode.py, so it should have the constants PORT, SERVER and ADDRESS in memory, but when I run svr.py, I get the following error message:
Traceback (most recent call last):
File "C:\temp\PythonProject\svr.py", line 9, in <module>
__server.bind(ADDRESS)
NameError: name 'ADDRESS' is not defined
Process finished with exit code 1
It seems the ADDRESS constant is not available in svr.py even after bcode is imported. I also need to add import socket to svr.py; I thought that since I already import socket in bcode.py, importing bcode in svr.py would carry import socket into svr.py as well.
I'd appreciate it if you could help me with the best way to split common code in Python.

First off, I think it is a very good idea to split your code into modules. It will help you keeping your code clean and tidy!
Then, when you import a module in Python, you decide which namespace its content is included under. In your example, you've used the standard way of importing, namely import bcode. With this approach, all content of bcode lives in the bcode namespace and must be referenced as such:
import bcode
print(bcode.ADDRESS)
This is also the approach that I recommend, as it keeps your namespaces clean and tidy when your files grow in number and in terms of code lines. This way, there is never any doubt of which ADDRESS is being used.
However, there are other ways to import modules, e.g. by explicitly importing the variable of choice by from bcode import ADDRESS. But then, imagine doing this,
ADDRESS = "127.0.0.1"
from bcode import ADDRESS
print(ADDRESS) # whatever was in bcode ..
This may be fine for now, but someone else that reads your code may overlook the fact that you rewrote the variable or lose track of which is what and where whatever originally came from.
Yet another approach lets you import all content of a module into the local namespace by using *. This may be acceptable for small scripts; however, you'll probably make things really cumbersome for your future self (or colleagues), as you'll definitely lose control over your names (that is, variables, functions, classes, and so on) in the long run:
ADDRESS = "127.0.0.1"
from bcode import *
print(ADDRESS) # whatever was in bcode ..
print(PORT) # whatever was in bcode ..
I strongly recommend that you stick to the first approach (as you already have), and remember to reference the variables appropriately.
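To see the namespacing (and the question about import socket) in action, the sketch below writes a simplified copy of bcode.py to a temporary directory and imports it. SERVER is hard-coded to "127.0.0.1" here as an assumption, so the demo doesn't depend on hostname resolution; demo_namespaces is a hypothetical helper.
```python
import importlib
import sys
import tempfile
from pathlib import Path

# Simplified copy of the question's bcode.py. SERVER is hard-coded here (an
# assumption for the demo) instead of socket.gethostbyname(socket.gethostname()).
BCODE_SOURCE = (
    "import socket\n"
    "PORT = 6060\n"
    "SERVER = '127.0.0.1'\n"
    "ADDRESS = (SERVER, PORT)\n"
)


def demo_namespaces():
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "bcode.py").write_text(BCODE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            bcode = importlib.import_module("bcode")  # same effect as `import bcode`
            return {
                # bcode's names are reached through the bcode namespace ...
                "bcode.ADDRESS": bcode.ADDRESS,
                # ... including the socket module that bcode imported ...
                "bcode.socket is socket": bcode.socket is sys.modules["socket"],
                # ... but nothing is copied into the importing module's namespace.
                "ADDRESS defined here": "ADDRESS" in globals(),
                "socket defined here": "socket" in globals(),
            }
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("bcode", None)


if __name__ == "__main__":
    for label, value in demo_namespaces().items():
        print(f"{label}: {value}")
```
So svr.py should use bcode.ADDRESS, and it still needs its own import socket if it refers to socket directly (bcode.socket would work, but it hides where the name comes from).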
As a final note, you should also be aware that you can rename namespaces/modules on import. I don't really recommend this either, but it may come in handy, e.g. for shortening long module names. Some heavily used third-party modules have commonly used abbreviations, e.g. the numpy module, often referenced as just np:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,5,100)
y = x**2
plt.plot(x,y)

Pymatbridge connection working in terminal but failing in python script

I have a question regarding the pymatbridge. I have been trying to use it as an alternative to the Matlab Engine, which for some reason broke on me recently and I haven't been able to get it to work again. I followed the instructions from Github and when testing my script in the terminal, the zmq connection works great, and the connection gets established every single time. But when I copy paste what's working in the terminal into a python script, the connection fails every single time. I'm not familiar with zmq, but the problem seems to be systematic, so I was wondering if there was something obvious I'm missing. Here is my code.
import os
import glob
import csv
import numpy as np
import matplotlib.pylab as plt
#Alternative to matlab Engine: pymatbridge
import pymatbridge as pymat
matlab = pymat.Matlab(executable='/Applications/MATLAB_R2015a.app/bin/matlab')
#Directory of Matlab functions
Matlab_dir = '/Users/cynthiagerlein/Dropbox (Personal)/Scatterometer/Matlab/'
#Directory with SIR data
SIR_dir = '/Volumes/blahblahblah/OriginalData/'
#Directory with matrix data
Data_dir = '/Volumes/blahblahblah/Data/'
#Create list of names of SIR files to open and save as matrices
os.chdir(SIR_dir)
#Save list of SIR file names
SIR_File_List = glob.glob("*.sir")
#Launch Pymatbridge
matlab.start()
for the_file in SIR_File_List:
    print 'We are on file ', the_file
    Running_name = SIR_dir + the_file
    image = matlab.run_func('/Users/cynthiagerlein/Dropbox\ \(Personal\)/Scatterometer/Matlab/loadsir.m', Running_name)
    np.savetxt(Data_dir+the_file[:22] + '.txt.gz',np.array(image['result']) )
I ended up using matlab_wrapper instead, and it's working great and was A LOT easier to install and set up, but I am just curious to understand why the pymatbridge is failing in my script but working in terminal. By the way, I learned about both pymatbridge and matlab_wrapper in the amazing answer to this post (scroll down, 3rd answer).

How can I overload a built-in module in python?

I am trying to bind hosts to specified IPs in my Python program. I want this to affect only the Python program, so I am not going to modify the /etc/hosts file.
I tried to add a bit code to the create_connection function in socket.py for host-ip translation, like this:
host, port = address # the original code in socket.py
# My change here:
if host == "www.google.com":
host = target_ip
for res in getaddrinfo(host, port, 0, SOCK_STREAM): # the original code in socket.py
I found it works fine.
And now I want the host-ip translation only works in this python program.
So my question is: how can I make my python program import this socket.py not the build-in one when using import socket?
To make it clear, here is an example. Suppose 'test' is my work directory:
test
|--- main.py
|--- socket.py
In this case:
How can I make main.py use test/socket.py by import socket?
How can I make other modules use test/socket.py when they use import socket?
I think changing the module search path order may help. But I found that even though the current path ('') is already first in sys.path, import socket still imports the built-in socket module.

You can monkey-patch sys.modules, placing your own module instead of the standard socket, before importing any other module which might be using it.
# myscript.py
from myproject import mysocket
import sys
sys.modules['socket'] = mysocket
# ... the rest of your code
import requests
...
For that, mysocket should expose everything which the standard socket does.
# mysocket.py
import socket as _std_socket
from socket import * # expose everything
def create_connection(address, *args, **kwargs):
    if address == ...:
        address = ...
    return _std_socket.create_connection(address, *args, **kwargs)
This might be an over-simplification of what mysocket.py should look like. You'd likely need to add some definitions before this can be used in production, but you get the idea.
Another approach would be to monkey-patch the socket module itself, i.e. overwrite names inside the original module.
# myscript.py
import socket
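# keep a reference to the original, or the patched version would call itself
_original_create_connection = socket.create_connection
def create_connection2(address, *args, **kwargs):
    ...  # rewrite address here
    return _original_create_connection(address, *args, **kwargs)
socket.create_connection = create_connection2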
# ... the rest of your code
import requests
...
I prefer the former approach, because it is cleaner in the sense that you don't need to go inside the module; you only hide it and override some things in it from the outside.
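A variation on the second approach patches socket.getaddrinfo instead of create_connection and restores it automatically. This is a sketch under assumptions: it relies on CPython's socket.create_connection looking up getaddrinfo in the socket module at call time, so it will not affect code that resolves names in C, e.g. a plain sock.connect(("www.google.com", 80)). host_overrides is a hypothetical name.
```python
import contextlib
import socket


@contextlib.contextmanager
def host_overrides(mapping):
    """Temporarily resolve the hostnames in `mapping` to the given addresses."""
    original_getaddrinfo = socket.getaddrinfo

    def patched_getaddrinfo(host, *args, **kwargs):
        # Swap in the override (if any), then let the real resolver do the rest.
        return original_getaddrinfo(mapping.get(host, host), *args, **kwargs)

    socket.getaddrinfo = patched_getaddrinfo
    try:
        yield
    finally:
        socket.getaddrinfo = original_getaddrinfo  # undo the patch even on errors
```
Usage would look like with host_overrides({"www.google.com": target_ip}): followed by the code that opens connections, with target_ip as in the question. Because the patch is undone on exit, other parts of the program keep normal DNS resolution.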

You can use relative imports to locally use a socket.py module. However, to do this your project must be structured as a package.
from . import socket

Matplotlib and Pylab doesn't work in Python CGI

Matplotlib and Pylab don't work in Python CGI. But the same combination is working in the Python shell.
Following is the code:
#!C:/Python26/python
import cgi
import cgitb
import sys
import os
cgitb.enable()
# set HOME environment variable to a directory the httpd server can write to
os.environ[ 'HOME' ] = '/tmp/'
import matplotlib
# chose a non-GUI backend
matplotlib.use( 'Agg' )
import pylab
#Deals with inputing data into python from the html form
form = cgi.FieldStorage()
# construct your plot
pylab.plot([1,2,3])
print "Content-Type: image/png\n"
# save the plot as a png and output directly to webserver
pylab.savefig( "test.png")

Put
import cgitb ; cgitb.enable()
at the top of your script, run it and show us the traceback. Without that the only help we can provide is to pray for you.
The traceback should be clear enough without extra help really.
As an aside, Python CGI is extremely slow and not really something you can use for anything non-trivial.

Your code is a little incomplete. As it stands, you are writing the plot to a file on the server's hard drive rather than returning it to the browser. One way to do this is to save the plot to a StringIO object and then stream it back.
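import sys
import cStringIO
imgData = cStringIO.StringIO()
pylab.savefig(imgData, format='png')
# rewind the data
imgData.seek(0)
# on Windows, switch stdout to binary mode so the PNG bytes are not altered
if sys.platform == "win32":
    import msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
# header line plus the blank line that ends the headers
print "Content-Type: image/png\n"
# raw image bytes, without print's trailing newline
sys.stdout.write(imgData.read())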
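On Python 3, cStringIO no longer exists and sys.stdout is a text stream, so the image bytes have to go through io.BytesIO and the binary sys.stdout.buffer. A minimal sketch; write_png_response is a hypothetical helper, and the matplotlib part is only shown as comments.
```python
import sys


def write_png_response(png_bytes, out=None):
    """Write a CGI response: the header, a blank line, then the raw PNG bytes."""
    if out is None:
        out = sys.stdout.buffer  # the binary layer underneath sys.stdout
    out.write(b"Content-Type: image/png\n\n")
    out.write(png_bytes)
    out.flush()


# Typical use (assumes matplotlib is installed):
#   import io
#   import matplotlib
#   matplotlib.use("Agg")
#   import matplotlib.pyplot as plt
#   plt.plot([1, 2, 3])
#   buf = io.BytesIO()
#   plt.savefig(buf, format="png")
#   write_png_response(buf.getvalue())
```
Taking the output stream as a parameter keeps the function testable without a web server.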

It seems this is a bug in Python's ctypes module. You have to comment out the line
#CFUNCTYPE(c_int)(lambda: None).
in $HOME/lib/python2.7/ctypes/__init__.py.
Nobody seems to know what that line means; it's a workaround for Windows that causes trouble under Linux CGI. See Python ctypes MemoryError in fcgi process from PIL library.
