I am trying to split common Python code into separate files.
For example, I have svr.py with the following code:
import socket
PORT = 6060
SERVER = socket.gethostbyname(socket.gethostname())
ADDRESS = (SERVER, PORT)
__server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
__server.bind(ADDRESS)
def startServer():
    pass
startServer()
So I am thinking of splitting it into two Python files, since the common section (bcode.py) will be used in both svr.py and client.py.
file: bcode.py has the following code
import socket
PORT = 6060
SERVER = socket.gethostbyname(socket.gethostname())
ADDRESS = (SERVER, PORT)
file: svr.py has the following code
import socket
import bcode
__server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
__server.bind(ADDRESS)
def startServer():
    pass
startServer()
From my understanding, when I do import bcode, the Python interpreter executes bcode.py, so it should have the constants PORT, SERVER and ADDRESS in memory. But when I run svr.py, I get the following error message:
Traceback (most recent call last):
  File "C:\temp\PythonProject\svr.py", line 9, in <module>
    __server.bind(ADDRESS)
NameError: name 'ADDRESS' is not defined
Process finished with exit code 1
It seems the ADDRESS constant is not available in svr.py even after bcode is imported. I also need to add import socket to svr.py; I thought that since bcode.py already imports socket, the import would be carried into svr.py when svr.py imports bcode.
I would appreciate it if you could help me with the best way to split common code in Python.
First off, I think it is a very good idea to split your code into modules. It will help you keep your code clean and tidy!
Then, when you import a module in Python, you decide under which namespace its content is included. In your example, you've used the standard way of importing, namely import bcode. With this approach, all content of bcode lives in the bcode namespace and must be referenced as such:
import bcode
print(bcode.ADDRESS)
This is also the approach that I recommend, as it keeps your namespaces clean and tidy when your files grow in number and in terms of code lines. This way, there is never any doubt of which ADDRESS is being used.
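To make the namespace point concrete, here is a minimal sketch. It builds a stand-in for bcode.py in memory so that it runs on its own; the hard-coded SERVER value and the helper names (bcode_source, _module) are assumptions for illustration, not part of the original files.

```python
import sys
import types

# Stand-in for bcode.py, built in memory so the sketch is self-contained.
# SERVER is hard-coded here (an assumption) instead of being looked up via socket.
bcode_source = """
PORT = 6060
SERVER = "127.0.0.1"
ADDRESS = (SERVER, PORT)
"""
_module = types.ModuleType("bcode")
exec(bcode_source, _module.__dict__)
sys.modules["bcode"] = _module

import bcode  # binds only the name "bcode" in this namespace

address = bcode.ADDRESS  # qualified access works

try:
    ADDRESS  # the bare name was never bound in this namespace
    bare_name_found = True
except NameError:
    bare_name_found = False
```

The same rule explains the import socket observation in the question: names bound inside bcode (including socket) stay in bcode's namespace, so svr.py needs its own import socket if it uses the bare name.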
However, there are other ways to import modules, e.g. by explicitly importing the variable of choice by from bcode import ADDRESS. But then, imagine doing this,
ADDRESS = "127.0.0.1"
from bcode import ADDRESS
print(ADDRESS) # whatever was in bcode ..
This may be fine for now, but someone else who reads your code may overlook the fact that you overwrote the variable, or lose track of which name is which and where each one originally came from.
Yet another approach lets you import all content of a module into the local namespace by using *. This solution may be acceptable for small scripts. However, you'll probably make things really cumbersome for your future self (or colleagues), as you'll definitely lose control over your names (that is, variables, functions, classes, and so on) in the long run:
ADDRESS = "127.0.0.1"
from bcode import *
print(ADDRESS) # whatever was in bcode ..
print(PORT) # whatever was in bcode ..
I strongly recommend that you stick to the first approach (as you already have), and remember to reference the variables appropriately.
As a final note, you should also be aware of the possibility of renaming namespaces/modules on import. I don't really recommend this either, but it may come in handy for e.g. shortening long module names. Some heavily used third-party modules have commonly used abbreviations, e.g. the numpy module, often referenced as just np:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,5,100)
y = x**2
plt.plot(x,y)
Is there a simpler/better or more pythonic way of getting the base hostname?
base_hostname = socket.gethostname().split(".")[0]
As an example, how would I get localhost only as below:
>>> socket.gethostname()
'localhost.localdomain'
>>> socket.getfqdn()
'localhost.localdomain'
>>> socket.gethostname().split('.')[0]
'localhost'
I am asking because I suspect there is something similar to the os.path's abspath, basename, join, split, splitext, etc functions to manipulate hostnames, but I haven't found it yet.
You can make it a bit more pythonic by splitting the string only once, at most:
import socket
socket.gethostname().split('.', 1)[0]
Also, if for some reason you don't want or can't use the socket package, an alternative is to use the platform package:
import platform
platform.node().split('.', 1)[0]
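As a small variation on the snippets above, the split can be wrapped in a helper. This is only a sketch: short_hostname is a hypothetical name, and it uses str.partition, which splits at most once and never raises when there is no dot.

```python
import socket


def short_hostname(name=None):
    """Return the part of a host name before the first dot.

    If no name is given, the local host name is used.
    """
    if name is None:
        name = socket.gethostname()
    # partition() always returns a 3-tuple: (head, separator, tail)
    return name.partition(".")[0]
```

For example, short_hostname("localhost.localdomain") gives "localhost", and a name without any dot is returned unchanged.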
I am trying to bind hosts to specified IPs in my Python program. It should only take effect inside the Python program, so I am not going to modify the /etc/hosts file.
I tried adding a bit of code to the create_connection function in socket.py for host-IP translation, like this:
host, port = address # the original code in socket.py
# My change here:
if host == "www.google.com":
    host = target_ip
for res in getaddrinfo(host, port, 0, SOCK_STREAM): # the original code in socket.py
I found that it works fine.
Now I want the host-IP translation to work only in this Python program.
So my question is: how can I make my Python program import this socket.py, and not the built-in one, when using import socket?
To make it clear, here is an example. Suppose 'test' is my work directory:
test
|--- main.py
|--- socket.py
In this case:
How can I make main.py use test/socket.py via import socket?
How can I make other modules use test/socket.py when they use import socket?
I think changing the module search path order may help. But I found that even when the current path ('') is already first in sys.path, import socket still imports the built-in socket module.
You can monkey-patch sys.modules, placing your own module instead of the standard socket, before importing any other module which might be using it.
# myscript.py
from myproject import mysocket
import sys
sys.modules['socket'] = mysocket
# ... the rest of your code
import requests
...
For that, mysocket should expose everything which the standard socket does.
# mysocket.py
import socket as _std_socket
from socket import * # expose everything
def create_connection(address, *args, **kwargs):
    if address == ...:
        address = ...
    return _std_socket.create_connection(address, *args, **kwargs)
This might be an over-simplification of what mysocket.py should look like. You'd likely need to add some definitions before this can be used in production, but you get the idea.
Another approach would be to monkey-patch the socket module itself, i.e. overwrite names inside the original module.
# myscript.py
import socket
def create_connection2(...):
    ...
socket.create_connection = create_connection2
# ... the rest of your code
import requests
...
I prefer the former approach, because it is cleaner in the sense that you don't need to go inside the module; you only hide it and override some things in it from the outside.
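For comparison, the second approach (overwriting the name inside the original module) could be fleshed out roughly as follows. This is a sketch under assumptions: HOST_OVERRIDES, rewrite_address and patched_create_connection are hypothetical names, and the mapping is illustrative only.

```python
import socket

# Hypothetical host-to-IP overrides; the entries are illustrative only.
HOST_OVERRIDES = {"www.example.com": "127.0.0.1"}

# Keep a reference to the real function so the wrapper can delegate to it.
_original_create_connection = socket.create_connection


def rewrite_address(address):
    """Swap the host for its override, if one is configured."""
    host, port = address
    return (HOST_OVERRIDES.get(host, host), port)


def patched_create_connection(address, *args, **kwargs):
    """Translate the host, then call the real create_connection."""
    return _original_create_connection(rewrite_address(address), *args, **kwargs)


# Overwrite the name inside the socket module itself.
socket.create_connection = patched_create_connection
```

Note the limits of this kind of patch: code that did from socket import create_connection before the patch keeps the original function, and code that opens connections without going through socket.create_connection is not affected at all.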
You can use relative imports to locally use a socket.py module. However, to do this your project must be structured as a package.
from . import socket
I am wondering if there is a way to make a Python variable behave like a Python module.
The problem I currently have is that we have Python bindings for our API. The bindings are automatically generated through SWIG, and to use them someone only needs to:
import module_name as short_name
short_name.functions()
Right now we are studying having the API to use Apache Thrift. To use it someone needs to:
client, transport = thrift_connect()
client.functions()
...
transport.close()
The problem is that we have loads of scripts, and we were wondering if there is a way to have the Thrift client object behave like a module so that we don't need to modify all the scripts. One idea we had was to do something like this:
client, transport = thrift_connect()
global short_name
short_name = client
__builtins__.short_name = client
This 'sort of' works. It creates a global variable 'short_name' that acts like a module, but it also creates other problems. If other files import the same module, those imports have to be commented out. Also, having a global variable is not a bright idea for maintenance purposes.
So, would there be a way to make the Thrift client behave like a module? That way people could continue to use the 'old' syntax, but under the hood the module import would trigger a connection and return the object as the module.
EDIT 1:
It is fine for every import to open a connection. Maybe we could use some kind of singleton so that a specific interpreter can only open one connection even if it calls multiple imports on different files.
I thought about binding transport.close() to the termination of an object. That object could be the module itself, if that is possible.
EDIT 2:
This seems to do what I want:
client, transport = thrift_connect()
attributes = dict((name, getattr(client, name)) for name in dir(client) if not (name.startswith('__') or name.startswith('_')))
globals().update(attributes)
Importing a module shouldn't cause a network connection.
If you have mandatory setup/teardown steps then you could define a context manager:
from contextlib import contextmanager
@contextmanager
def thrift_client():
    client, transport = thrift_connect()
    client.functions()
    try:
        yield client
    finally:
        transport.close()
Usage:
with thrift_client() as client:
    # use client here
In general, the auto-generated module with its C-like API should be private (e.g., name it _thrift_client), and the proper Pythonic API that is used outside should be written on top of it by hand in another module.
To answer the question from the title: you can make an object behave like a module; see, e.g., sh.SelfWrapper and quickdraw.Module.
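One way an object can stand in for a module is to register a types.ModuleType subclass in sys.modules and forward attribute lookups to the wrapped object. The sketch below is not how sh or quickdraw do it; FakeClient and fake_connect are hypothetical stand-ins for the Thrift client and thrift_connect(), and ClientModule is an illustrative name.

```python
import sys
import types


class FakeClient:
    """Hypothetical stand-in for the Thrift client object."""

    def ping(self):
        return "pong"


def fake_connect():
    """Hypothetical stand-in for thrift_connect()."""
    return FakeClient()


class ClientModule(types.ModuleType):
    """Module object that forwards unknown attributes to a lazily created client."""

    def __init__(self, name, connect):
        super().__init__(name)
        self._connect = connect
        self._client = None

    def __getattr__(self, attr):
        # Called only when the normal module attribute lookup fails.
        if attr.startswith("_"):
            raise AttributeError(attr)
        if self._client is None:
            # Connect on first use rather than at import time.
            self._client = self._connect()
        return getattr(self._client, attr)


# Register the wrapper under the name that existing scripts import.
sys.modules["short_name"] = ClientModule("short_name", fake_connect)

import short_name

result = short_name.ping()
```

Because the connection is opened on first attribute access, importing the name stays cheap, which is in line with the advice that an import should not itself cause a network connection. Closing the transport would still need an explicit hook, which this sketch leaves out.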
The question of how to speed up importing of Python modules has been asked previously (Speeding up the python "import" loader and Python -- Speed Up Imports?) but without specific examples and has not yielded accepted solutions. I will therefore take up the issue again here, but this time with a specific example.
I have a Python script that loads a 3-D image stack from disk, smooths it, and displays it as a movie. I call this script from the system command prompt when I want to quickly view my data. I'm OK with the 700 ms it takes to smooth the data as this is comparable to MATLAB. However, it takes an additional 650 ms to import the modules. So from the user's perspective the Python code runs at half the speed.
This is the series of modules I'm importing:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
Of course, not all modules are equally slow to import. The chief culprits are:
matplotlib.pyplot [300ms]
numpy [110ms]
scipy.signal [200ms]
I have experimented with using from, but this isn't any faster. Since Matplotlib is the main culprit and it's got a reputation for slow screen updates, I looked for alternatives. One is PyQtGraph, but that takes 550 ms to import.
I am aware of one obvious solution, which is to call my function from an interactive Python session rather than the system command prompt. This is fine but it's too MATLAB-like, I'd prefer the elegance of having my function available from the system prompt.
I'm new to Python and I'm not sure how to proceed at this point. Since I'm new, I'd appreciate links on how to implement proposed solutions. Ideally, I'm looking for a simple solution (aren't we all!) because the code needs to be portable between multiple Mac and Linux machines.
Not an actual answer to the question, but a hint on how to profile the import speed with Python 3.7 and tuna (a small project of mine):
python3 -X importtime -c "import scipy" 2> scipy.log
tuna scipy.log
You could build a simple server/client: the server runs continuously, making and updating the plot, and the client just communicates the next file to process.
I wrote a simple server/client example based on the basic example from the socket module docs: http://docs.python.org/2/library/socket.html#example
Here is server.py:
# expensive imports
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
# Echo server program
import socket
HOST = '' # Symbolic name meaning all available interfaces
PORT = 50007 # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)
while 1:
    conn, addr = s.accept()
    print 'Connected by', addr
    data = conn.recv(1024)
    if not data: break
    conn.sendall("PLOTTING:" + data)
    # update plot
    conn.close()
and client.py:
# Echo client program
import socket
import sys
HOST = '' # The remote host
PORT = 50007 # The same port as used by the server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.sendall(sys.argv[1])
data = s.recv(1024)
s.close()
print 'Received', repr(data)
you just run the server:
python server.py
which does the imports, then the client just sends via the socket the filename of the new file to plot:
python client.py mytextfile.txt
then the server updates the plot.
On my machine, running your imports takes 0.6 seconds, while running client.py takes 0.03 seconds.
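The listings above are written for Python 2 (print statements, str sent over the socket). A rough Python 3 sketch of the same request/reply exchange is shown below; it runs the server side in a thread on a loopback port so that it is self-contained. serve_once and send_request are hypothetical helper names, and the handler simply echoes the file name instead of updating a plot.

```python
import socket
import threading


def serve_once(server_sock, handler):
    """Accept one connection, pass the request to handler, send a reply."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        if data:
            # Python 3 sockets carry bytes, not str.
            conn.sendall(b"PLOTTING:" + handler(data))


def send_request(port, payload):
    """Connect to the local server, send payload, return the reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(payload)
        return sock.recv(1024)


# Bind to port 0 so the OS picks a free port (an assumption for the demo).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server, lambda data: data))
worker.start()
reply = send_request(port, b"mytextfile.txt")
worker.join()
server.close()
```

In a real setup the server would be a separate long-running process that has already paid for the expensive imports, and the handler would do the plotting.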
You can import your modules manually instead, using imp. See documentation here.
For example, import numpy as np could probably be written as
import imp
np = imp.load_module("numpy",None,"/usr/lib/python2.7/dist-packages/numpy",('','',5))
This will spare python from browsing your entire sys.path to find the desired packages.
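The imp module was deprecated in Python 3.4 and removed in Python 3.12, so on current versions the same idea would go through importlib. The sketch below loads a module directly from a known file path; load_from_path and the throwaway demo_mod.py file are hypothetical and exist only to keep the example self-contained. Whether this actually saves measurable time for a large package such as numpy is not something this sketch shows.

```python
import importlib.util
import pathlib
import tempfile


def load_from_path(name, path):
    """Load a module from an explicit file path, skipping the sys.path search."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the module's code
    return module


# Create a throwaway module file so the example runs anywhere.
tmp_dir = tempfile.mkdtemp()
module_path = pathlib.Path(tmp_dir, "demo_mod.py")
module_path.write_text("ANSWER = 42\n")

demo = load_from_path("demo_mod", str(module_path))
```

Note that load_from_path does not register the module in sys.modules, so a later plain import of the same name would not find it unless you add it yourself.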
See also:
Manually importing gtk fails: module not found
1.35 seconds isn't long, but I suppose if you're used to half that for a "quick check" then perhaps it seems so.
Andrea suggests a simple client/server setup, but it seems to me that you could just as easily call a very slight modification of your script and keep its console window open while you work:
Call the script, which does the imports then waits for input
Minimize the console window, switch to your work, whatever: *Do work*
Select the console again
Provide the script with some sort of input
Receive the results with no import overhead
Switch away from the script again while it happily awaits input
I assume your script is identical every time, i.e. you don't need to give it an image stack location or any particular commands each time (but these are easy to add as well!).
Example RAAC's_Script.py:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os
print('********* RAAC\'s Script Now Running *********')
while True: # Loops forever
    # Display a message and wait for user to enter text followed by enter key.
    # In this case, we're not expecting any text at all and if there is any it's ignored
    input('Press Enter to test image stack...')
    '''
    *
    *
    **RAAC's Code Goes Here** (Make sure it's indented/inside the while loop!)
    *
    *
    '''
To end the script, close the console window or press ctrl+c.
I've made this as simple as possible, but it would require very little extra to handle things like quitting nicely, doing slightly different things based on input, etc.
You can use lazy imports, but it depends on your use case.
If it's an application, you can first import only the modules needed for the GUI, then import all your other modules after the window has loaded.
If it's a module and the user does not use all the dependencies, you can import inside a function.
[warning]
I think it's against PEP 8 and it's not recommended in some places, but the reasons behind this are mostly readability (I may be wrong though...) and bundling by some builders (e.g. PyInstaller), which can be solved by adding the missing dependencies as a parameter in the spec.
If you use lazy imports, use comments so the user knows that there are extra dependencies.
Example:
import numpy as np
# Lazy imports
# import matplotlib.pyplot as plt
def plot():
    import matplotlib.pyplot as plt
    # Your function here
    # This will be imported during runtime
For some specific libraries I think it's a necessity.
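To see that a function-level import really is deferred, the following sketch checks sys.modules before and after the first call. The heavy_dep_demo module is a hypothetical stand-in for an expensive dependency, written to a temporary directory so the example is self-contained.

```python
import pathlib
import sys
import tempfile

# Create a throwaway module that stands in for an expensive dependency.
tmp_dir = tempfile.mkdtemp()
pathlib.Path(tmp_dir, "heavy_dep_demo.py").write_text("VALUE = 42\n")
sys.path.insert(0, tmp_dir)


def compute():
    # Lazy import: the module is loaded only when compute() first runs.
    import heavy_dep_demo
    return heavy_dep_demo.VALUE


loaded_before = "heavy_dep_demo" in sys.modules
value = compute()
loaded_after = "heavy_dep_demo" in sys.modules
```

Here loaded_before is False and loaded_after is True: the import cost is paid at the first call of compute(), not at program start.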
You can also create something like an API (let's call it that) in __init__.py.
Take scikit-learn, for example. If you import sklearn and then call some model, it's not found and an error is raised. You need to be more specific and import the submodule directly. Though it can be inconvenient for users, it's IMHO good practice and can reduce import times significantly.
As a rule of thumb, a small fraction of the imported libraries accounts for most of the import time (roughly 10% of libraries costing 90% of the time). A very simple tool for analysis is line_profiler:
import line_profiler
import atexit
profile = line_profiler.LineProfiler()
atexit.register(profile.print_stats)
@profile
def profiled_function():
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
profiled_function()
This gives the following results:
Line #      Hits         Time     Per Hit   % Time  Line Contents
==============================================================
    20                                              @profile
    21                                              def profiled_function():
    22
    23         1    2351852.0   2351852.0      6.5      import numpy as np
    24         1    6545679.0   6545679.0     18.0      import pandas as pd
    25         1   27485437.0  27485437.0     75.5      import matplotlib.pyplot as plt
75% of the import time of the three libraries is spent on matplotlib (this does not mean that it's badly written; it just needs a lot of stuff for graphical output).
Note:
If you import a library in one module, importing it again elsewhere costs almost nothing, since loaded modules are shared globally (they are cached in sys.modules).
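The caching can be observed directly. The sketch below imports the same standard-library module twice and records the timings; timed_import is a hypothetical helper, and decimal is just an arbitrary module chosen for the demonstration. No particular timings are claimed, since they vary by machine and by what is already loaded.

```python
import importlib
import sys
import time


def timed_import(name):
    """Import a module by name and return it with the elapsed time in seconds."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


first, first_seconds = timed_import("decimal")
second, second_seconds = timed_import("decimal")

# Both calls return the very same module object from the sys.modules cache.
same_object = first is second
cached = sys.modules["decimal"] is first
```

The second call only performs a dictionary lookup in sys.modules, which is why repeated imports across modules are cheap.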
Another note:
For imports from the standard library (e.g. pathlib, subprocess, etc.), do not use lazy loading; in my experience, their import times are close to zero and don't need to be optimized.
I have done just a basic test below, but it shows that runpy can be used to solve this issue when you need to have a whole Python script to be faster (you don't want to put any logic in test_server.py).
test_server.py
import socket
import time
import runpy
import matplotlib.pyplot
HOST = 'localhost'
PORT = 50007
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    serversocket.bind((HOST, PORT))
except:
    print("Server is already running")
    exit(1)
# Start server with maximum 100 connections
serversocket.listen(100)
while True:
    connection, address = serversocket.accept()
    buf = connection.recv(64)
    if len(buf) > 0:
        buf_str = str(buf.decode("utf-8"))
        now = time.time()
        runpy.run_path(path_name=buf_str)
        after = time.time()
        duration = after - now
        print("I received " + buf_str + " script and it took " + str(duration) + " seconds to execute it")
test_client.py
import socket
import sys
HOST = 'localhost'
PORT = 50007
clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsocket.connect((HOST, PORT))
message = sys.argv[1].encode()
clientsocket.send(message)
test_lag.py
import matplotlib.pyplot
Testing:
$ python3 test_client.py test_lag.py
I received test_lag.py script and it took 0.0002799034118652344 seconds to execute it
$ time python3 test_lag.py
real 0m0.624s
user 0m1.307s
sys 0m0.180s
Based on this, the module is pre-loaded in the server process, so scripts that use it start quickly.
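The key call in the server above is runpy.run_path, which executes a script file inside the current process and returns the resulting globals. A minimal stand-alone sketch follows; the job.py script is hypothetical and written to a temporary directory for the demonstration.

```python
import pathlib
import runpy
import tempfile

# Write a small throwaway script to execute.
script = pathlib.Path(tempfile.mkdtemp(), "job.py")
script.write_text("import json\nresult = json.dumps({'ok': True})\n")

# Run the script in this process; its top-level names come back as a dict.
namespace = runpy.run_path(str(script))
```

Because the script runs in the already-running interpreter, any module it imports that is already in sys.modules is reused instead of being loaded again, which is where the speed-up measured above comes from.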
Summary: when a certain python module is imported, I want to be able to intercept this action, and instead of loading the required class, I want to load another class of my choice.
Reason: I am working on some legacy code. I need to write some unit test code before I start some enhancement/refactoring. The code imports a certain module which will fail in a unit test setting, however. (Because of database server dependency)
Pseudocode:
from LegacyDataLoader import load_me_data
...
def do_something():
    data = load_me_data()
So, ideally, when Python executes the import line above in a unit test, an alternative class, say MockDataLoader, is loaded instead.
I am still using 2.4.3. I suppose there is an import hook I can manipulate.
Edit
Thanks a lot for the answers so far. They are all very helpful.
One particular type of suggestion is about manipulation of PYTHONPATH. It does not work in my case. So I will elaborate my particular situation here.
The original codebase is organised in this way
./dir1/myapp/database/LegacyDataLoader.py
./dir1/myapp/database/Other.py
./dir1/myapp/database/__init__.py
./dir1/myapp/__init__.py
My goal is to enhance the Other class in the Other module. But since it is legacy code, I do not feel comfortable working on it without strapping a test suite around it first.
Now I introduce this unit test code
./unit_test/test.py
The content is simply:
from myapp.database.Other import Other
def test1():
    o = Other()
    o.do_something()
if __name__ == "__main__":
    test1()
When the CI server runs the above test, the test fails. This is because class Other uses LegacyDataLoader, and LegacyDataLoader cannot establish a database connection to the DB server from the CI box.
Now let's add a fake class as suggested:
./unit_test_fake/myapp/database/LegacyDataLoader.py
./unit_test_fake/myapp/database/__init__.py
./unit_test_fake/myapp/__init__.py
Modify the PYTHONPATH to
export PYTHONPATH=unit_test_fake:dir1:unit_test
Now the test fails for another reason
  File "unit_test/test.py", line 1, in <module>
    from myapp.database.Other import Other
ImportError: No module named Other
It has something to do with the way python resolves classes/attributes in a module
You can intercept import and from ... import statements by defining your own __import__ function and assigning it to __builtin__.__import__ (make sure to save the previous value, since your override will no doubt want to delegate to it; and you'll need to import __builtin__ to get the builtin-objects module).
For example (Py2.4 specific, since that's what you're asking about), save in aim.py the following:
import __builtin__
realimp = __builtin__.__import__
def my_import(name, globals={}, locals={}, fromlist=[]):
    print 'importing', name, fromlist
    return realimp(name, globals, locals, fromlist)
__builtin__.__import__ = my_import
from os import path
and now:
$ python2.4 aim.py
importing os ('path',)
So this lets you intercept any specific import request you want, and alter the imported module[s] as you wish before you return them -- see the specs here. This is the kind of "hook" you're looking for, right?
There are cleaner ways to do this, but I'll assume that you can't modify the file containing from LegacyDataLoader import load_me_data.
The simplest thing to do is probably to create a new directory called testing_shims, and create LegacyDataLoader.py file in it. In that file, define whatever fake load_me_data you like. When running the unit tests, put testing_shims into your PYTHONPATH environment variable as the first directory. Alternately, you can modify your test runner to insert testing_shims as the first value in sys.path.
This way, your file will be found when importing LegacyDataLoader, and your code will be loaded instead of the real code.
The import statement just grabs stuff from sys.modules if a matching name is found there, so the simplest thing is to make sure you insert your own module into sys.modules under the target name before anything else tries to import the real thing.
# in test code
import sys
import MockDataLoader
sys.modules['LegacyDataLoader'] = MockDataLoader
import module_under_test
There are a handful of variations on the theme, but that basic approach should work fine to do what you describe in the question. A slightly simpler approach would be this, using just a mock function to replace the one in question:
# in test code
import module_under_test
def mock_load_me_data():
    # do mock stuff here
    pass
module_under_test.load_me_data = mock_load_me_data
That simply replaces the appropriate name right in the module itself, so when you invoke the code under test, presumably do_something() in your question, it calls your mock routine.
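On Python 3 (not the 2.4 interpreter from the question), the sys.modules substitution can be scoped to a block with unittest.mock.patch.dict, which undoes the change on exit. This is a sketch under assumptions: the fake module is built in memory, and its load_me_data returning a fixed list is purely illustrative.

```python
import sys
import types
from unittest import mock

# Build an in-memory fake for the legacy module (illustrative contents).
fake_loader = types.ModuleType("LegacyDataLoader")
fake_loader.load_me_data = lambda: ["fake", "rows"]

with mock.patch.dict(sys.modules, {"LegacyDataLoader": fake_loader}):
    # Inside the block, the import resolves to the fake module.
    from LegacyDataLoader import load_me_data
    data = load_me_data()

# After the block, the temporary entry is gone again.
still_registered = "LegacyDataLoader" in sys.modules
```

For a module that lives inside a package, the key in sys.modules would be the full dotted name under which the code under test imports it.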
Well, if the import fails by raising an exception, you could put it in a try...except block:
try:
    from LegacyDataLoader import load_me_data
except: # put error that occurs here, so as not to mask actual problems
    from MockDataLoader import load_me_data
Is that what you're looking for? If it fails, but doesn't raise an exception, you could have it run the unit test with a special command line tag, like --unittest, like this:
import sys
if "--unittest" in sys.argv:
    from MockDataLoader import load_me_data
else:
    from LegacyDataLoader import load_me_data