teradatasql: runtime/cgo: could not obtain pthread_keys - python

When I try to read data through SQLAlchemy with df = pd.read_sql_table(table, con, schema), I get this runtime error:
runtime/cgo: could not obtain pthread_keys
tried 0x115 0x116 0x117 0x118 0x119 0x11a 0x11b 0x11c 0x11d 0x11e 0x11f 0x120 0x121 0x122 0x123 0x124 0x125 0x126 0x127 0x128 0x129 0x12a 0x12b 0x12c 0x12d 0x12e 0x12f 0x130 0x131 0x132 0x133 0x134 0x135 0x136 0x137 0x138 0x139 0x13a 0x13b 0x13c 0x13d 0x13e 0x13f 0x140 0x141 0x142 0x143 0x144 0x145 0x146 0x147 0x148 0x149 0x14a 0x14b 0x14c 0x14d 0x14e 0x14f 0x150 0x151 0x152 0x153 0x154 0x155 0x156 0x157 0x158 0x159 0x15a 0x15b 0x15c 0x15d 0x15e 0x15f 0x160 0x161 0x162 0x163 0x164 0x165 0x166 0x167 0x168 0x169 0x16a 0x16b 0x16c 0x16d 0x16e 0x16f 0x170 0x171 0x172 0x173 0x174 0x175 0x176 0x177 0x178 0x179 0x17a 0x17b 0x17c 0x17d 0x17e 0x17f 0x180 0x181 0x182 0x183 0x184 0x185 0x186 0x187 0x188 0x189 0x18a 0x18b 0x18c 0x18d 0x18e 0x18f 0x190 0x191 0x192 0x193 0x194
Below is the code:
class TeradataWriter:
    def __init__(self):
        print("in init")

    def read_data_from_teradata(self):
        try:
            print('Create main')
            import pdb; pdb.set_trace()
            eng = self.create_connection_engine()
            df = pd.read_sql_table("table_name", eng, schema="schema")
            print(df)
        except Exception as ex:
            print('Exception: %s' % ex)

    def create_connection_engine(self):
        try:
            return create_engine('teradatasql://' + constants.TERADATA_HOST + '/?user=' + constants.TERADATA_USER_NAME + '&password=' + constants.TERADATA_PWD, echo=False)
        except Exception as ex:
            LOGGER.error('Exception: %s', ex)
            raise Exception(message_constants.ERROR_WHILE_CREATING_CONNECTION_WITH_TERADATA)

if __name__ == "__main__":
    p = TeradataWriter()
    p.read_data_from_teradata()

Edit: This is fixed. I was finally able to get their support and engineering team to reproduce the issue. They now build the driver with a newer version of Go. Upgrade to version 17.0.3 or later and you shouldn't see any more segfaults.
I think I finally figured out why this happens. According to this Go issue, it happens when the host process spawns threads prior to loading the shared library, because by then the offset will have changed.
In my case, I was importing matplotlib.pyplot in IPython before calling code that loads the shared library. This starts an event loop and causes the conditions that lead to the segfault.
I changed my code to import matplotlib.pyplot after configuring the teradata driver, and it went away.
According to the Go issue, they just need to recompile the library with a newer version of Go, which I've asked them to do. We'll see what they say.
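For illustration, here is a minimal sketch of the import-order workaround described above; the host name and credentials are placeholders, not values from the original post:

import teradatasql  # load the Go-based shared library before anything spawns threads

con = teradatasql.connect(host="example-host", user="guest", password="secret")

import matplotlib.pyplot as plt  # thread-spawning imports come only afterwards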

I have run into the same issue.
To fix the problem, I moved the connect statement into main, and that mostly fixed it. It's worth trying in your case.


Control Flow issue: Python function called but not executed

I have the strangest problem I have ever met in my life.
I have a part of my code that looks like this:
class AzureDevOpsServiceError(Exception):
    pass

skip = ["auto"]

def retrieve_results():
    print(variable_not_defined)
    ...  # some useful implementation

if not "results" in skip:
    try:
        print("before")
        retrieve_results()
        print("after")
    except AzureDevOpsServiceError as e:
        print(f"Error raised: {e}")
Obviously, this should raise an error because variable_not_defined is, well, not defined.
However, for some strange reason, the code executes correctly and prints
before
after
I have tried calling the function with an argument (retrieve_results(1234)) and adding a parameter to the function (def retrieve_results(arg1) while still calling retrieve_results()): both modifications trigger an exception, so the function is clearly being called.
Has anyone had a similar issue and knows what is happening?
FYI: this is actually what my implementation looks like:
from azure.devops.exceptions import AzureDevOpsServiceError
import logging

def _retrieve_manual_results(connect: Connectivity, data: DataForPickle) -> None:
    """Retrieve the list of Test Results"""
    print("G" + ggggggggggggggggggggggggggggggggggggg)
    logger = connect.logger
    data.run_in_progress = [165644]

if __name__ == "__main__":
    p = ...
    connect = ...
    data = ...
    if not "results" in p.options.skip:
        try:
            print("........B.........")
            _retrieve_manual_results(connect, data)
            print("........A.........")
        except AzureDevOpsServiceError as e:
            logging.error(f"E004: Error while retrieving Test Results: {e}")
            logging.debug("More details below...", exc_info=True)
As highlighted by @gmds, it was a caching problem.
Deleting the .pyc file didn't do much.
However, I have found a solution:
1. Rename the function (e.g. add a _).
2. Run the program.
3. Rename it back (i.e. remove the _ from the previous example).
Now, the issue is solved.
If anyone knows what is going on behind the scenes, I am very interested.
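In case it helps anyone hitting the same thing, here is a small sketch (the project root "." is an assumption) that deletes every __pycache__ directory so Python has to recompile each module from source:

import pathlib
import shutil

# Remove all cached bytecode under the current directory tree,
# forcing Python to recompile every module on the next run.
for cache_dir in pathlib.Path(".").rglob("__pycache__"):
    shutil.rmtree(cache_dir)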

request.urlretrieve in multiprocessing Python gets stuck

I am trying to download images from a list of URLs using Python. To make the process faster, I used the multiprocessing library.
The problem I am facing is that the script often hangs/freezes on its own, and I don't know why.
Here is the code that I am using
...
import multiprocessing as mp

def getImages(val):
    # Download images
    try:
        url = ...    # preprocess the url from the input val
        local = ...  # filename generated from global variables and random values
        urllib.request.urlretrieve(url, local)
        print("DONE - " + url)
        return 1
    except Exception as e:
        print("CAN'T DOWNLOAD - " + url)
        return 0

if __name__ == '__main__':
    files = "urls.txt"
    lst = list(open(files))
    lst = [l.replace("\n", "") for l in lst]
    pool = mp.Pool(processes=4)
    res = pool.map(getImages, lst)
    print("tempw")
It often gets stuck halfway through the list (it prints DONE or CAN'T DOWNLOAD for half of the list it has processed, but I don't know what is happening with the rest). Has anyone faced this problem? I have searched for similar problems (e.g. this link) but found no answer.
Thanks in advance
OK, I have found an answer.
A possible culprit was that the script got stuck connecting to or downloading from a URL. So what I added was a socket timeout to limit the time to connect and download each image.
And now, the issue no longer bothers me.
Here is my complete code
...
import multiprocessing as mp
import socket

# Set the default timeout in seconds
timeout = 20
socket.setdefaulttimeout(timeout)

def getImages(val):
    # Download images
    try:
        url = ...    # preprocess the url from the input val
        local = ...  # filename generated from global variables and random values
        urllib.request.urlretrieve(url, local)
        print("DONE - " + url)
        return 1
    except Exception as e:
        print("CAN'T DOWNLOAD - " + url)
        return 0

if __name__ == '__main__':
    files = "urls.txt"
    lst = list(open(files))
    lst = [l.replace("\n", "") for l in lst]
    pool = mp.Pool(processes=4)
    res = pool.map(getImages, lst)
    print("tempw")
Hope this solution helps others who are facing the same issue
It looks like you're facing a GIL issue: the Python Global Interpreter Lock basically forbids Python from doing more than one task at the same time.
The multiprocessing module really does launch separate instances of Python to get the work done in parallel.
But in your case, urllib is called in all these instances: each of them tries to lock the IO process; the one that succeeds (i.e. comes first) gets you the result, while the others (trying to lock an already locked process) fail.
This is a very simplified explanation, but here are some additional resources:
You can find another way to parallelize requests here: Multiprocessing useless with urllib2?
And more info about the GIL here: What is a global interpreter lock (GIL)?
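For comparison, here is a rough sketch of the thread-based alternative the first link suggests, reusing the question's urls.txt and the 20-second default timeout; the local-filename logic is a placeholder assumption:

import socket
import urllib.request
from concurrent.futures import ThreadPoolExecutor

socket.setdefaulttimeout(20)  # bound each connect/download attempt

def fetch(url):
    try:
        # derive a local filename from the last path segment (placeholder logic)
        urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
        return 1
    except Exception:
        return 0

if __name__ == "__main__":
    with open("urls.txt") as fh:
        urls = [line.strip() for line in fh]
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(fetch, urls))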

How do I get the current IPython / Jupyter Notebook name

I am trying to obtain the current notebook name when running the IPython notebook. I know I can see it at the top of the notebook. What I am after is something like
currentNotebook = IPython.foo.bar.notebookname()
I need to get the name in a variable.
Adding to previous answers, to get the notebook name, run the following in a cell:
%%javascript
IPython.notebook.kernel.execute('nb_name = "' + IPython.notebook.notebook_name + '"')
this gets you the file name in nb_name
then to get the full path you may use the following in a separate cell:
import os
nb_full_path = os.path.join(os.getcwd(), nb_name)
I have the following which works with IPython 2.0. I observed that the name of the notebook is stored as the value of the attribute 'data-notebook-name' in the <body> tag of the page. Thus the idea is first to ask JavaScript to retrieve the attribute (JavaScript can be invoked from a code cell thanks to the %%javascript magic). Then it is possible to access the JavaScript variable through a call to the Python kernel, with a command which sets a Python variable. Since this last variable is known to the kernel, it can be accessed in other cells as well.
%%javascript
var kernel = IPython.notebook.kernel;
var body = document.body,
    attribs = body.attributes;
var command = "theNotebook = " + "'"+attribs['data-notebook-name'].value+"'";
kernel.execute(command);
From a Python code cell
print(theNotebook)
Out[ ]: HowToGetTheNameOfTheNoteBook.ipynb
A defect in this solution is that when one changes the title (name) of a notebook, the name does not seem to be updated immediately (there is probably some kind of cache) and it is necessary to reload the notebook to get access to the new name.
[Edit] On reflection, a more efficient solution is to look for the input field for the notebook's name instead of the <body> tag. Looking into the source, it appears that this field has the id "notebook_name". It is then possible to catch this value with document.getElementById() and then follow the same approach as above. The code becomes, still using the javascript magic:
%%javascript
var kernel = IPython.notebook.kernel;
var thename = window.document.getElementById("notebook_name").innerHTML;
var command = "theNotebook = " + "'"+thename+"'";
kernel.execute(command);
Then, from an IPython cell,
In [11]: print(theNotebook)
Out [11]: HowToGetTheNameOfTheNoteBookSolBis
Contrary to the first solution, modifications of the notebook's name are picked up immediately and there is no need to refresh the notebook.
As already mentioned, you probably aren't really supposed to be able to do this, but I did find a way. It's a flaming hack though, so don't rely on this at all:
import json
import os
import urllib2

import IPython
from IPython.lib import kernel

connection_file_path = kernel.get_connection_file()
connection_file = os.path.basename(connection_file_path)
kernel_id = connection_file.split('-', 1)[1].split('.')[0]

# Updated answer with semi-solutions for both IPython 2.x and IPython < 2.x
if IPython.version_info[0] < 2:
    ## Not sure if it's even possible to get the port for the
    ## notebook app; so just using the default...
    notebooks = json.load(urllib2.urlopen('http://127.0.0.1:8888/notebooks'))
    for nb in notebooks:
        if nb['kernel_id'] == kernel_id:
            print nb['name']
            break
else:
    sessions = json.load(urllib2.urlopen('http://127.0.0.1:8888/api/sessions'))
    for sess in sessions:
        if sess['kernel']['id'] == kernel_id:
            print sess['notebook']['name']
            break
I updated my answer to include a solution that "works" in IPython 2.0 at least with a simple test. It probably isn't guaranteed to give the correct answer if there are multiple notebooks connected to the same kernel, etc.
It seems I cannot comment, so I have to post this as an answer.
The accepted solution by @iguananaut and the update by @mbdevpl appear not to be working with recent versions of the Notebook.
I fixed it as shown below. I checked it on Python v3.6.1 + Notebook v5.0.0 and on Python v3.6.5 + Notebook v5.5.0.
import jupyterlab
if jupyterlab.__version__.split(".")[0] == "3":
    from jupyter_server import serverapp as app
    key_srv_directory = 'root_dir'
else:
    from notebook import notebookapp as app
    key_srv_directory = 'notebook_dir'
import urllib
import json
import os
import ipykernel

def notebook_path(key_srv_directory):
    """Returns the absolute path of the Notebook or None if it cannot be determined
    NOTE: works only when the security is token-based or there is also no password
    """
    connection_file = os.path.basename(ipykernel.get_connection_file())
    kernel_id = connection_file.split('-', 1)[1].split('.')[0]
    for srv in app.list_running_servers():
        try:
            if srv['token'] == '' and not srv['password']:  # No token and no password, ahem...
                req = urllib.request.urlopen(srv['url'] + 'api/sessions')
            else:
                req = urllib.request.urlopen(srv['url'] + 'api/sessions?token=' + srv['token'])
            sessions = json.load(req)
            for sess in sessions:
                if sess['kernel']['id'] == kernel_id:
                    return os.path.join(srv[key_srv_directory], sess['notebook']['path'])
        except:
            pass  # There may be stale entries in the runtime directory
    return None
As stated in the docstring, this works only when either there is no authentication or the authentication is token-based.
Note that, as also reported by others, the Javascript-based method does not seem to work when executing a "Run all cells" (but works when executing cells "manually"), which was a deal-breaker for me.
The ipyparams package can do this pretty easily.
import ipyparams
currentNotebook = ipyparams.notebook_name
On Jupyter 3.0 the following works. Here I'm showing the entire path on the Jupyter server, not just the notebook name:
To store the NOTEBOOK_FULL_PATH on the current notebook front end:
%%javascript
var nb = IPython.notebook;
var kernel = IPython.notebook.kernel;
var command = "NOTEBOOK_FULL_PATH = '" + nb.base_url + nb.notebook_path + "'";
kernel.execute(command);
To then display it:
print("NOTEBOOK_FULL_PATH:\n", NOTEBOOK_FULL_PATH)
Running the first Javascript cell produces no output.
Running the second Python cell produces something like:
NOTEBOOK_FULL_PATH:
/user/zeph/GetNotebookName.ipynb
Yet another hacky solution, since my notebook server can change. Basically you print a random string, save the notebook, and then search for a file containing that string in the working directory. The while loop is needed because save_checkpoint is asynchronous.
from time import sleep
from IPython.display import display, Javascript
import subprocess
import os
import uuid

def get_notebook_path_and_save():
    magic = str(uuid.uuid1()).replace('-', '')
    print(magic)
    # saves it (ctrl+S)
    display(Javascript('IPython.notebook.save_checkpoint();'))
    nb_name = None
    while nb_name is None:
        try:
            sleep(0.1)
            nb_name = subprocess.check_output(f'grep -l {magic} *.ipynb', shell=True).decode().strip()
        except:
            pass
    return os.path.join(os.getcwd(), nb_name)
There is no real way yet to do this in Jupyterlab. But there is an official way that's now under active discussion/development as of August 2021:
https://github.com/jupyter/jupyter_client/pull/656
In the meantime, hitting the api/sessions REST endpoint of jupyter_server seems like the best bet. Here's a cleaned-up version of that approach:
from jupyter_server import serverapp
from jupyter_server.utils import url_path_join
from pathlib import Path
import re
import requests

kernelIdRegex = re.compile(r"(?<=kernel-)[\w\d\-]+(?=\.json)")

def getNotebookPath():
    kernelId = kernelIdRegex.search(get_ipython().config["IPKernelApp"]["connection_file"])[0]
    for jupServ in serverapp.list_running_servers():
        for session in requests.get(url_path_join(jupServ["url"], "api/sessions"), params={"token": jupServ["token"]}).json():
            if kernelId == session["kernel"]["id"]:
                return Path(jupServ["root_dir"]) / session["notebook"]["path"]
Tested working with
python==3.9
jupyter_server==1.8.0
jupyterlab==4.0.0a7
Modifying @jfb's method gives the function below, which worked fine on ipykernel-5.3.4.
def getNotebookName():
    display(Javascript('IPython.notebook.kernel.execute("NotebookName = " + "\'"+window.document.getElementById("notebook_name").innerHTML+"\'");'))
    try:
        _ = type(NotebookName)
        return NotebookName
    except:
        return None
Note that the display JavaScript call will take some time to reach the browser, and it will take some time to execute the JS and get back to the kernel. I know it may sound stupid, but it's better to run the function in two cells, like this:
nb_name = getNotebookName()
and in the following cell:
for i in range(10):
    nb_name = getNotebookName()
    if nb_name is not None:
        break
However, if you don't need to define a function, the wise method is to run display(Javascript(..)) in one cell, and check the notebook name in another cell. In this way, the browser has enough time to execute the code and return the notebook name.
If you don't mind using a library, the most robust way is:
import ipynbname
nb_name = ipynbname.name()
If you are using Visual Studio Code:
import IPython ; IPython.extract_module_locals()[1]['__vsc_ipynb_file__']
Assuming you have the Jupyter Notebook server's host, port, and authentication token, this should work for you. It's based off of this answer.
import os
import json
import posixpath
import subprocess
import urllib.request
import psutil

def get_notebook_path(host, port, token):
    process_id = os.getpid()
    notebooks = get_running_notebooks(host, port, token)
    for notebook in notebooks:
        if process_id in notebook['process_ids']:
            return notebook['path']

def get_running_notebooks(host, port, token):
    sessions_url = posixpath.join('http://%s:%d' % (host, port), 'api', 'sessions')
    sessions_url += f'?token={token}'
    response = urllib.request.urlopen(sessions_url).read()
    res = json.loads(response)
    notebooks = [{'kernel_id': notebook['kernel']['id'],
                  'path': notebook['notebook']['path'],
                  'process_ids': get_process_ids(notebook['kernel']['id'])} for notebook in res]
    return notebooks

def get_process_ids(name):
    child = subprocess.Popen(['pgrep', '-f', name], stdout=subprocess.PIPE, shell=False)
    response = child.communicate()[0]
    return [int(pid) for pid in response.split()]
Example usage:
get_notebook_path('127.0.0.1', 17004, '344eb91bee5742a8501cc8ee84043d0af07d42e7135bed90')
To realize why you can't get the notebook name using these JS-based solutions, run this code and notice the delay it takes for the message box to appear after Python has finished executing the cell / entire notebook:
%%javascript
function sayHello() {
    alert('Hello world!');
}
setTimeout(sayHello, 1000);
More info
JavaScript calls are async and hence not guaranteed to complete before Python starts running another cell containing the code expecting this notebook name variable to already have been created... resulting in a NameError when trying to access a non-existent variable that should contain the notebook name.
I suspect some upvotes on this page were locked in before voters could discover that all %%javascript-based solutions ultimately don't work when the producer and consumer notebook cells are executed together (or in quick succession).
All JS-based solutions fail if we execute more than one cell at a time, because the result will not be ready until after the end of the execution (it's not a matter of using sleep or waiting any amount of time; check it yourself, but remember to restart the kernel and run all cells for every test).
Based on previous solutions, this avoids using the %% magic in case you need to put it in the middle of some other code:
from IPython.display import display, Javascript
# can have comments here :)
js_cmd = 'IPython.notebook.kernel.execute(\'nb_name = "\' + IPython.notebook.notebook_name + \'"\')'
display(Javascript(js_cmd))
For Python 3, the following, based on the answer by @Iguananaut and updated for the latest Python and possibly multiple servers, will work:
import os
import json
try:
    from urllib2 import urlopen
except:
    from urllib.request import urlopen
import ipykernel

connection_file_path = ipykernel.get_connection_file()
connection_file = os.path.basename(connection_file_path)
kernel_id = connection_file.split('-', 1)[1].split('.')[0]

running_servers = !jupyter notebook list
running_servers = [s.split('::')[0].strip() for s in running_servers[1:]]

nb_name = '???'
for serv in running_servers:
    uri_parts = serv.split('?')
    uri_parts[0] += 'api/sessions'
    sessions = json.load(urlopen('?'.join(uri_parts)))
    for sess in sessions:
        if sess['kernel']['id'] == kernel_id:
            nb_name = os.path.basename(sess['notebook']['path'])
            break
    if nb_name != '???':
        break

print(f'[{nb_name}]')
Just use ipynbname, which is practical:
import ipynbname
nb_fname = ipynbname.name()
nb_path = ipynbname.path()
print(f"{nb_fname=}")
print(f"{nb_path=}")
I found this in https://stackoverflow.com/a/65907473/15497427

How to use startTLS with ldaptor?

I'm trying to use ldaptor to connect via startTLS to an LDAP server. Searching on the internet and experimenting myself, I arrived at this snippet of code:
from ldaptor.protocols.ldap import ldapclient, ldapsyntax, ldapconnector, distinguishedname

[...]

def main(base, serviceLocationOverrides):
    c = ldapconnector.LDAPClientCreator(reactor, ldapclient.LDAPClient)
    d = c.connect(base, serviceLocationOverrides)
    d.addCallbacks(lambda proto: proto.startTLS(), error)
    [...]
    d.addErrback(error)
    d.addBoth(lambda dummy: reactor.stop())
    reactor.run()
but the code exits with an AssertionError:
[Failure instance: Traceback: <type 'exceptions.AssertionError'>:
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:1167:mainLoop
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:789:runUntilCurrent
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:361:callback
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:455:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:542:_runCallbacks
/usr/lib/pymodules/python2.7/ldaptor/protocols/ldap/ldapclient.py:239:_startTLS
/usr/lib/pymodules/python2.7/ldaptor/protocols/pureldap.py:1278:__init__
/usr/lib/pymodules/python2.7/ldaptor/protocols/pureldap.py:1144:__init__
]
I have looked in the ldaptor code for the offending assertion, but everything there seems fine to me.
Has anyone succeeded in using ldaptor's startTLS? A code snippet?
Thank you very much
I'm pretty certain that your problem is one I ran into a while back. In ldaptor/protocols/pureldap.py, line 1144 asserts that the LDAPExtendedRequest requestValue must be a string. But according to RFC 2251, that value is optional, and specifically should NOT be present in startTLS requests.
So your approach is correct; this is just a major bug in ldaptor. As far as I can tell, the author only tested using simple bind without TLS. You need to comment out that line in pureldap.py. If you're deploying this with the expectation that users will download or easy-install ldaptor, then you'll need to create a fixed copy of the LDAPExtendedRequest class in your own code, and sub it in at run-time.
Having had to maintain a project using ldaptor for several years, I would strongly urge you to switch to python-ldap if at all possible. Since it wraps the OpenLDAP libs, it can be much more difficult to build, especially with full support for SSL/SASL. But it's well worth it, because ldaptor has a lot more problems than just the one you ran across.
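For reference, here is a minimal python-ldap sketch of the same StartTLS flow; the server URI, bind DN, password, and filter below are placeholders borrowed from the other answer, not tested against your server:

import ldap

# Connect, upgrade the connection to TLS before binding, then bind and search.
conn = ldap.initialize("ldap://ldap.example.com")
conn.protocol_version = ldap.VERSION3
conn.start_tls_s()
conn.simple_bind_s("cn=admin,o=Organization", "Sekret")
results = conn.search_s("o=Organization", ldap.SCOPE_SUBTREE, "(uid=jetsong)")
for dn, attrs in results:
    print(dn)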
Using ldaptor 0.0.54 from https://github.com/twisted/ldaptor, I had no problems using StartTLS.
Here is the code:
#! /usr/bin/env python

from twisted.internet import reactor, defer
from ldaptor.protocols.ldap import ldapclient, ldapsyntax, ldapconnector

@defer.inlineCallbacks
def example():
    serverip = 'your.server.name.or.ip'
    basedn = 'o=Organization'
    binddn = 'cn=admin,o=Organization'
    bindpw = 'Sekret'
    query = '(uid=jetsong)'
    c = ldapconnector.LDAPClientCreator(reactor, ldapclient.LDAPClient)
    overrides = {basedn: (serverip, 389)}
    client = yield c.connect(basedn, overrides=overrides)
    client = yield client.startTLS()
    yield client.bind(binddn, bindpw)
    o = ldapsyntax.LDAPEntry(client, basedn)
    results = yield o.search(filterText=query)
    for entry in results:
        print entry

if __name__ == '__main__':
    df = example()
    df.addErrback(lambda err: err.printTraceback())
    df.addCallback(lambda _: reactor.stop())
    reactor.run()

Make sure only a single instance of a program is running

Is there a Pythonic way to have only one instance of a program running?
The only reasonable solution I've come up with is trying to run it as a server on some port, so that a second program trying to bind to the same port fails. But it's not really a great idea; maybe there's something more lightweight than this?
(Take into consideration that the program is expected to fail sometimes, i.e. segfault, so things like a "lock file" won't work.)
The following code should do the job; it is cross-platform and runs on Python 2.4-3.2. I tested it on Windows, OS X, and Linux.
from tendo import singleton
me = singleton.SingleInstance() # will sys.exit(-1) if other instance is running
The latest version of the code is available in singleton.py. Please file bugs here.
You can install tendo using one of the following methods:
easy_install tendo
pip install tendo
manually by getting it from http://pypi.python.org/pypi/tendo
Simple, cross-platform solution, found in another question by zgoda:
import fcntl
import os
import sys

def instance_already_running(label="default"):
    """
    Detect if an instance with the label is already running, globally
    at the operating system level.

    Using `os.open` ensures that the file pointer won't be closed
    by Python's garbage collector after the function's scope is exited.

    The lock will be released when the program exits, or could be
    released if the file pointer were closed.
    """
    lock_file_pointer = os.open(f"/tmp/instance_{label}.lock", os.O_WRONLY | os.O_CREAT)

    try:
        fcntl.lockf(lock_file_pointer, fcntl.LOCK_EX | fcntl.LOCK_NB)
        already_running = False
    except IOError:
        already_running = True

    return already_running
A lot like S.Lott's suggestion, but with the code.
This code is Linux specific. It uses 'abstract' UNIX domain sockets, but it is simple and won't leave stale lock files around. I prefer it to the solution above because it doesn't require a specially reserved TCP port.
import socket
import sys

try:
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    ## Create an abstract socket, by prefixing it with null.
    s.bind('\0postconnect_gateway_notify_lock')
except socket.error as e:
    error_code = e.args[0]
    error_string = e.args[1]
    print "Process already running (%d:%s). Exiting" % (error_code, error_string)
    sys.exit(0)
The unique string postconnect_gateway_notify_lock can be changed to allow multiple programs that need a single instance enforced.
I don't know if it's Pythonic enough, but in the Java world listening on a defined port is a pretty widely used solution, as it works on all major platforms and doesn't have any problems with crashing programs.
Another advantage of listening on a port is that you can send a command to the running instance. For example, when the user starts the program a second time, you could send the running instance a command to tell it to open another window (that's what Firefox does, for example; I don't know if they use TCP ports or named pipes or something like that, though).
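Here is a minimal sketch of that idea in Python; the port number is an arbitrary assumption, so pick one that nothing else on the machine uses:

import socket
import sys

guard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Binding fails if another instance already holds the port.
    guard.bind(("127.0.0.1", 47200))
    guard.listen(1)
except OSError:
    print("Another instance is already running; exiting.")
    sys.exit(1)
# Keep `guard` referenced for the lifetime of the program, otherwise
# it may be garbage collected and the port released.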
Never written Python before, but this is what I've just implemented in mycheckpoint, to prevent it being started twice or more by crond:
import os
import sys
import fcntl

fh = 0

def run_once():
    global fh
    fh = open(os.path.realpath(__file__), 'r')
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except:
        os._exit(0)

run_once()
Found Slava-N's suggestion after posting this in another issue (http://stackoverflow.com/questions/2959474). This one is called as a function, locks the executing scripts file (not a pid file) and maintains the lock until the script ends (normal or error).
Use a pid file. You have some known location, "/path/to/pidfile" and at startup you do something like this (partially pseudocode because I'm pre-coffee and don't want to work all that hard):
import os, os.path

pidfilePath = """/path/to/pidfile"""

if os.path.exists(pidfilePath):
    pidfile = open(pidfilePath, "r")
    pidString = pidfile.read()
    if <pidString is equal to os.getpid()>:
        # something is real weird
        sys.exit(BADCODE)
    else:
        <use ps or pidof to see if the process with pid pidString is still running>
        if <process with pid == 'pidString' is still running>:
            sys.exit(ALREADYRUNNING)
        else:
            # the previous server must have crashed
            <log server had crashed>
            <reopen pidfilePath for writing>
            pidfile.write(os.getpid())
else:
    <open pidfilePath for writing>
    pidfile.write(os.getpid())
So, in other words, you're checking if a pidfile exists; if not, write your pid to that file. If the pidfile does exist, then check to see if the pid is the pid of a running process; if so, then you've got another live process running, so just shut down. If not, then the previous process crashed, so log it, and then write your own pid to the file in place of the old one. Then continue.
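For the impatient, here is a runnable sketch of the same idea, using the POSIX signal-0 trick as the liveness probe (the pidfile path is still a placeholder, and the probe is Unix-only):

import os
import sys

PIDFILE = "/path/to/pidfile"

def pid_is_running(pid):
    # Signal 0 does not touch the process; it only checks existence (POSIX).
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    return True

if os.path.exists(PIDFILE):
    with open(PIDFILE) as fh:
        old_pid = int(fh.read().strip())
    if pid_is_running(old_pid):
        sys.exit("already running, exiting")
    # otherwise the previous instance must have crashed;
    # fall through and overwrite the stale pidfile below

with open(PIDFILE, "w") as fh:
    fh.write(str(os.getpid()))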
The best solution for this on Windows is to use mutexes, as suggested by @zgoda.
import win32event
import win32api
from winerror import ERROR_ALREADY_EXISTS

mutex = win32event.CreateMutex(None, False, 'name')
last_error = win32api.GetLastError()

if last_error == ERROR_ALREADY_EXISTS:
    print("App instance already running")
Some answers use fcntl (also included in @sorin's tendo package), which is not available on Windows, and should you try to freeze your Python app using a package like PyInstaller, which does static imports, it throws an error.
Also, using the lock file method creates a read-only problem with database files (I experienced this with sqlite3).
Here is my eventual Windows-only solution. Put the following into a module, perhaps called 'onlyone.py', or whatever. Include that module directly into your __main__ Python script file.
import win32event, win32api, winerror, time, sys, os

main_path = os.path.abspath(sys.modules['__main__'].__file__).replace("\\", "/")

first = True
while True:
    mutex = win32event.CreateMutex(None, False, main_path + "_{<paste YOUR GUID HERE>}")
    if win32api.GetLastError() == 0:
        break
    win32api.CloseHandle(mutex)
    if first:
        print "Another instance of %s running, please wait for completion" % main_path
        first = False
    time.sleep(1)
Explanation
The code attempts to create a mutex with a name derived from the full path to the script. We use forward slashes to avoid potential confusion with the real file system.
Advantages
No configuration or 'magic' identifiers needed, use it in as many different scripts as needed.
No stale files left around, the mutex dies with you.
Prints a helpful message when waiting
This may work.
Attempt to create a PID file in a known location. If you fail, someone has the file locked and you're done.
When you finish normally, close and remove the PID file, so someone else can overwrite it.
You can wrap your program in a shell script that removes the PID file even if your program crashes.
You can also use the PID file to kill the program if it hangs. A small sketch of the create-and-clean-up part follows.
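A rough sketch of that pattern, using exclusive-create mode so the "attempt" fails atomically when the file already exists, and atexit for the normal-exit cleanup (the path is an assumed placeholder):

import atexit
import os
import sys

PIDFILE = "/tmp/myprog.pid"  # assumed location

try:
    # Mode "x" creates the file and fails if it already exists.
    with open(PIDFILE, "x") as fh:
        fh.write(str(os.getpid()))
except FileExistsError:
    sys.exit("another instance appears to be running")

atexit.register(os.remove, PIDFILE)  # removed on normal exit only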
For anybody using wxPython for their application, you can use the function wx.SingleInstanceChecker documented here.
I personally use a subclass of wx.App which makes use of wx.SingleInstanceChecker and returns False from OnInit() if there is an existing instance of the app already executing like so:
import wx

class SingleApp(wx.App):
    """
    class that extends wx.App and only permits a single running instance.
    """

    def OnInit(self):
        """
        wx.App init function that returns False if the app is already running.
        """
        self.name = "SingleApp-{}".format(wx.GetUserId())
        self.instance = wx.SingleInstanceChecker(self.name)
        if self.instance.IsAnotherRunning():
            wx.MessageBox(
                "An instance of the application is already running",
                "Error",
                wx.OK | wx.ICON_WARNING
            )
            return False
        return True
This is a simple drop-in replacement for wx.App that prohibits multiple instances. To use it simply replace wx.App with SingleApp in your code like so:
app = SingleApp(redirect=False)
frame = wx.Frame(None, wx.ID_ANY, "Hello World")
frame.Show(True)
app.MainLoop()
Using a lock file is a quite common approach on Unix. If the program crashes, you have to clean up manually. You could store the PID in the file, and on startup check whether there is a process with this PID, overriding the lock file if not. (However, you also need a lock around the read-file-check-pid-rewrite-file sequence.) You will find what you need for getting and checking the PID in the os package. The common way of checking whether a process with a given PID exists is to send it a non-fatal signal.
Other alternatives could be combining this with flock or POSIX semaphores.
Opening a network socket, as saua proposed, would probably be the easiest and most portable.
I'm posting this as an answer because I'm a new user and Stack Overflow won't let me vote yet.
Sorin Sbarnea's solution works for me under OS X, Linux and Windows, and I am grateful for it.
However, tempfile.gettempdir() behaves one way under OS X and Windows and another way under some/many/all(?) other *nixes (ignoring the fact that OS X is also Unix!). The difference is important to this code.
OS X and Windows have user-specific temp directories, so a tempfile created by one user isn't visible to another user. By contrast, under many versions of *nix (I tested Ubuntu 9, RHEL 5, OpenSolaris 2008 and FreeBSD 8), the temp dir is /tmp for all users.
That means that when the lockfile is created on a multi-user machine, it's created in /tmp and only the user who creates the lockfile the first time will be able to run the application.
A possible solution is to embed the current username in the name of the lock file, as sketched below.
It's worth noting that the OP's solution of grabbing a port will also misbehave on a multi-user machine.
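A short sketch of that fix, assuming the lock lives in the system temp directory (the "myapp" prefix is a placeholder):

import getpass
import os
import tempfile

# Embed the username so each user gets their own lock file on a shared /tmp.
lock_path = os.path.join(tempfile.gettempdir(),
                         "myapp_{}.lock".format(getpass.getuser()))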
Building upon Roberto Rosario's answer, I came up with the following function:
SOCKET = None

def run_single_instance(uniq_name):
    try:
        import socket
        global SOCKET
        SOCKET = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        ## Create an abstract socket, by prefixing it with null.
        # This relies on a Linux-only feature: when the current process
        # quits, the socket will be deleted.
        SOCKET.bind('\0' + uniq_name)
        return True
    except socket.error as e:
        return False
We need to define the global SOCKET variable since it will only be garbage collected when the whole process quits. If we declared a local variable in the function, it would go out of scope after the function exits, and thus the socket would be deleted.
All the credit should go to Roberto Rosario, since I only clarify and elaborate upon his code. And this code will work only on Linux, as the following quoted text from https://troydhanson.github.io/network/Unix_domain_sockets.html explains:
Linux has a special feature: if the pathname for a UNIX domain socket begins with a null byte \0, its name is not mapped into the filesystem. Thus it won't collide with other names in the filesystem. Also, when a server closes its UNIX domain listening socket in the abstract namespace, its file is deleted; with regular UNIX domain sockets, the file persists after the server closes it.
Late answer, but for Windows you can use:
from win32event import CreateMutex
from win32api import CloseHandle, GetLastError
from winerror import ERROR_ALREADY_EXISTS
import sys

class singleinstance:
    """ Limits application to single instance """

    def __init__(self):
        self.mutexname = "testmutex_{D0E858DF-985E-4907-B7FB-8D732C3FC3B9}"
        self.mutex = CreateMutex(None, False, self.mutexname)
        self.lasterror = GetLastError()

    def alreadyrunning(self):
        return (self.lasterror == ERROR_ALREADY_EXISTS)

    def __del__(self):
        if self.mutex:
            CloseHandle(self.mutex)
Usage
# do this at the beginning of your application
myapp = singleinstance()

# check if another instance of the same program is running
if myapp.alreadyrunning():
    print("Another instance of this program is already running")
    sys.exit(1)
Here is a cross-platform example that I've tested on Windows Server 2016 and Ubuntu 20.04 using Python 3.7.9:
import os

class SingleInstanceChecker:
    def __init__(self, id):
        if isWin():
            ensure_win32api()
            self.mutexname = id
            self.lock = win32event.CreateMutex(None, False, self.mutexname)
            self.running = (win32api.GetLastError() == winerror.ERROR_ALREADY_EXISTS)
        else:
            ensure_fcntl()
            self.lock = open(f"/tmp/instance_{id}.lock", 'wb')
            try:
                fcntl.lockf(self.lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
                self.running = False
            except IOError:
                self.running = True

    def already_running(self):
        return self.running

    def __del__(self):
        if self.lock:
            try:
                if isWin():
                    win32api.CloseHandle(self.lock)
                else:
                    self.lock.close()
            except Exception as ex:
                pass

# ---------------------------------------
# Utility Functions

# Dynamically load win32api on demand
# Install with: pip install pywin32
win32api = winerror = win32event = None
def ensure_win32api():
    global win32api, winerror, win32event
    if win32api is None:
        import win32api
        import winerror
        import win32event

# Dynamically load fcntl on demand
# (fcntl is part of the standard library on Unix-like systems)
fcntl = None
def ensure_fcntl():
    global fcntl
    if fcntl is None:
        import fcntl

def isWin():
    return (os.name == 'nt')
# ---------------------------------------
Here it is in use:

import time, sys

def main(argv):
    _timeout = 10
    print("main() called. sleeping for %s seconds" % _timeout)
    time.sleep(_timeout)
    print("DONE")

if __name__ == '__main__':
    SCR_NAME = "my_script"
    sic = SingleInstanceChecker(SCR_NAME)
    if sic.already_running():
        print("An instance of {} is already running.".format(SCR_NAME))
        sys.exit(1)
    else:
        main(sys.argv[1:])
I use single_process on my Gentoo box:
pip install single_process
example:

from single_process import single_process

@single_process
def main():
    print 1

if __name__ == "__main__":
    main()

refer: https://pypi.python.org/pypi/single_process/
I keep suspecting there ought to be a good POSIXy solution using process groups, without having to hit the file system, but I can't quite nail it down. Something like:
On startup, your process sends a 'kill -0' to all processes in a particular group. If any such processes exist, it exits. Then it joins the group. No other processes use that group.
However, this has a race condition - multiple processes could all do this at precisely the same time and all end up joining the group and running simultaneously. By the time you've added some sort of mutex to make it watertight, you no longer need the process groups.
This might be acceptable if your process only gets started by cron, once every minute or every hour, but it makes me a bit nervous that it would go wrong precisely on the day when you don't want it to.
I guess this isn't a very good solution after all, unless someone can improve on it?
I ran into this exact problem last week, and although I did find some good solutions, I decided to make a very simple and clean python package and uploaded it to PyPI. It differs from tendo in that it can lock any string resource name. Although you could certainly lock __file__ to achieve the same effect.
Install with: pip install quicklock
Using it is extremely simple:
[nate@Nates-MacBook-Pro-3 ~/live] python
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from quicklock import singleton
>>> # Let's create a lock so that only one instance of a script will run
...
>>> singleton('hello world')
>>>
>>> # Let's try to do that again, this should fail
...
>>> singleton('hello world')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/nate/live/gallery/env/lib/python2.7/site-packages/quicklock/quicklock.py", line 47, in singleton
raise RuntimeError('Resource <{}> is currently locked by <Process {}: "{}">'.format(resource, other_process.pid, other_process.name()))
RuntimeError: Resource <hello world> is currently locked by <Process 24801: "python">
>>>
>>> # But if we quit this process, we release the lock automatically
...
>>> ^D
[nate@Nates-MacBook-Pro-3 ~/live] python
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from quicklock import singleton
>>> singleton('hello world')
>>>
>>> # No exception was thrown, we own 'hello world'!
Take a look: https://pypi.python.org/pypi/quicklock
Linux example
This method is based on the creation of a temporary file that is automatically deleted after you close the application.
At program launch, we verify the existence of the file;
if the file exists (there is a pending execution), the program is closed; otherwise it creates the file and continues the execution of the program.

from tempfile import *
import time
import os
import sys

f = NamedTemporaryFile(prefix='lock01_', delete=True) if not [f for f in os.listdir('/tmp') if f.find('lock01_') != -1] else sys.exit()

# YOUR CODE COMES HERE
On a Linux system one could also ask pgrep -a how many times the script appears in the process list (option -a reveals the full command line string). E.g.
import os
import sys
import subprocess

procOut = subprocess.check_output("/bin/pgrep -u $UID -a python", shell=True,
                                  executable="/bin/bash", universal_newlines=True)

if procOut.count(os.path.basename(__file__)) > 1:
    sys.exit("found another instance of >{}<, quitting.".format(os.path.basename(__file__)))
Remove -u $UID if the restriction should apply to all users.
Disclaimer: a) it is assumed that the script's (base)name is unique, b) there might be race conditions.
Here's a good example for Django with a context manager and memcached:
https://docs.celeryproject.org/en/latest/tutorials/task-cookbook.html
It can be used to protect simultaneous operation on different hosts.
It can be used to manage multiple tasks.
It can also be adapted for simple Python scripts.
My modification of the above code is here:
import time
from contextlib import contextmanager
from django.core.cache import cache

@contextmanager
def memcache_lock(lock_key, lock_value, lock_expire):
    timeout_at = time.monotonic() + lock_expire - 3
    # cache.add fails if the key already exists
    status = cache.add(lock_key, lock_value, lock_expire)
    try:
        yield status
    finally:
        # memcache delete is very slow, but we have to use it to take
        # advantage of using add() for atomic locking
        if time.monotonic() < timeout_at and status:
            # don't release the lock if we exceeded the timeout
            # to lessen the chance of releasing an expired lock owned by someone else
            # also don't release the lock if we didn't acquire it
            cache.delete(lock_key)

LOCK_EXPIRE = 60 * 10  # Lock expires in 10 minutes

def main():
    lock_name, lock_value = "lock_1", "locked"
    with memcache_lock(lock_name, lock_value, LOCK_EXPIRE) as acquired:
        if acquired:
            # single instance code here:
            pass

if __name__ == "__main__":
    main()
Here is a cross-platform implementation, creating a temporary lock file using a context manager.
Can be used to manage multiple tasks.
import os
from contextlib import contextmanager
from time import sleep

class ExceptionTaskInProgress(Exception):
    pass

# Context manager for suppressing exceptions
class SuppressException:
    def __init__(self):
        pass

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return True

# Context manager for task
class TaskSingleInstance:
    def __init__(self, task_name, lock_path):
        self.task_name = task_name
        self.lock_path = lock_path
        self.lock_filename = os.path.join(self.lock_path, self.task_name + ".lock")
        if os.path.exists(self.lock_filename):
            raise ExceptionTaskInProgress("Resource already in use")

    def __enter__(self):
        self.fl = open(self.lock_filename, "w")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.fl.close()
        os.unlink(self.lock_filename)

# Here the task is silently interrupted
# if it is already running in another instance.
def main1():
    task_name = "task1"
    tmp_filename_path = "."
    with SuppressException():
        with TaskSingleInstance(task_name, tmp_filename_path):
            print("The task `{}` has started.".format(task_name))
            # The single task instance code is here.
            sleep(5)
            print("The task `{}` has completed.".format(task_name))

# Here the task is interrupted with a message
# if it is already running in another instance.
def main2():
    task_name = "task1"
    tmp_filename_path = "."
    try:
        with TaskSingleInstance(task_name, tmp_filename_path):
            print("The task `{}` has started.".format(task_name))
            # The single task instance code is here.
            sleep(5)
            print("Task `{}` completed.".format(task_name))
    except ExceptionTaskInProgress as ex:
        print("The task `{}` is already running.".format(task_name))

if __name__ == "__main__":
    main1()
    main2()
import sys, os

# start program
try:                                                  # (1)
    os.unlink('lock')                                 # (2)
    fd = os.open("lock", os.O_CREAT | os.O_EXCL)      # (3)
except:
    try:
        fd = os.open("lock", os.O_CREAT | os.O_EXCL)  # (4)
    except:
        print "Another Program running !.."           # (5)
        sys.exit()

# your program ...
# ...

# exit program
try: os.close(fd)                                     # (6)
except: pass
try: os.unlink('lock')
except: pass
sys.exit()
