SQLite connect() in a separate process blocks due to an unrelated connect() in the parent - python

I want to launch a separate process that connects to a SQLite database (the ultimate goal is to run a service on that process). This usually works fine. However, when I connect to another database file before launching the process, the connect() call in the child blocks indefinitely: it neither finishes nor raises an error.
import sqlite3, multiprocessing, time

def connect(filename):
    print 'Creating a file on current process!'
    sqlite3.connect(filename).close()

def connect_process(filename):
    def process_f():
        print 'Gets here...'
        conn = sqlite3.connect(filename)
        print '...but not here when local process has previously connected to any unrelated sqlite-file!!'
        conn.close()
    process = multiprocessing.Process(target=process_f)
    process.start()
    process.join()

if __name__ == '__main__':
    connect_process('my_db_1')  # Just to show that it generally works
    time.sleep(0.5)
    connect('any_file')         # Connect to unrelated file
    connect_process('my_db_2')  # Does not get to the end!!
    time.sleep(2)
This returns:
Gets here...
...but not here when local process has previously connected to any unrelated sqlite-file!!
Creating a file on current process!
Gets here...
So we would expect one more line at the end ("...but not here when..."), but it never appears.
Remarks:
I know that SQLite cannot handle concurrent access. It should however work here for two reasons: 1) the file I connect to on my local process is different from the one created in the separate process; 2) the connection to the former file has long been closed by the time the process gets created.
The only operation I use here is to connect to the DB and then close the connection immediately (which creates the file if it does not exist). I have of course verified that we get the same behavior if we actually do anything meaningful.
The code is just a minimal working example of what I really want to do. The goal is to test a service that uses SQLite. Hence, in the test setup, I need to create some mock SQLite files… The service is then launched in a separate process so it can be tested via the respective client.
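The cause of the hang is not established in the question, but one thing worth ruling out is state inherited by the child through fork(): on Python 3 the child can be started with the 'spawn' start method, which runs a fresh interpreter instead of forking the parent. A minimal diagnostic sketch of that variation (an assumption to test, not a confirmed fix):
# Python 3 only: use the 'spawn' start method so the child does not inherit
# any state from the parent's earlier sqlite3.connect() via fork()
import multiprocessing
import sqlite3

def child(filename):
    conn = sqlite3.connect(filename)  # runs in a fresh interpreter
    conn.close()

if __name__ == '__main__':
    sqlite3.connect('any_file').close()          # unrelated connection in the parent
    ctx = multiprocessing.get_context('spawn')   # available since Python 3.4
    p = ctx.Process(target=child, args=('my_db_2',))
    p.start()
    p.join()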

Related

Python - FTP Connection Pool

I have a generic multiprocess script that can run any task in a multiprocess setup. I inject the task as a command-line argument and use getattr to call the functions in the injected code.
taskModule = importlib.import_module(taskFile.replace(".py", ""))
taskContext = getattr(taskModule, 'init')()
response = pool.map_async(getattr(taskModule, 'run'), inputList)
The init() function creates all relevant variables for the task to execute and returns them as a dict object - the taskContext. inputList is a list of dict objects, each dict containing both the taskContext object as well as the specific item to be processed, so that each process gets a unique item to process along with a copy of the context required by the task.
One of those tasks is meant for FTP and the taskContext in that case contains information on the FTP server along with other details. The run function in the FTP task pretty much opens a connection using the context variables, uploads the required files and closes it, and this works perfectly.
However, I think it'd be good if I can set up a connection pool with multiple FTP connections at the start, as part of the init() function when the context is created, and then use them in an as-available fashion within the run method, very similar to a DB connection pool that prevents the need to open and close connections to the database every time.
Is this even feasible? If so, what's the best way to go about doing it?
I put together a connection_pool module as part of a proof of concept. I'm not sure how robust it is.
I later added connection closing to it as a bugfix.
I was able to set up connection pooling of FTP and SFTP connections transferring a few thousand files over 10-20 threads.
You can install my version from conda:
conda install -c jmeppley connectionpool
Creating an FTP pool looks something like this:
# imports needed by the snippet below; the ConnectionPool import path is
# assumed to match the connectionpool package installed above
import ftplib
import ftputil
import ftputil.session
from functools import partial
from connectionpool import ConnectionPool

# this is from snakemake
def connect(*args_to_use, **kwargs_to_use):
    ftp_base_class = (
        ftplib.FTP_TLS if kwargs_to_use["encrypt_data_channel"] else ftplib.FTP
    )
    ftp_session_factory = ftputil.session.session_factory(
        base_class=ftp_base_class,
        port=kwargs_to_use["port"],
        encrypt_data_channel=kwargs_to_use["encrypt_data_channel"],
        debug_level=None,
    )
    return ftputil.FTPHost(
        kwargs_to_use["host"],
        kwargs_to_use["username"],
        kwargs_to_use["password"],
        session_factory=ftp_session_factory,
    )

# a function to create a pool using the connect() method
def create_ftp_pool(pool_size, *args_to_use, **kwargs_to_use):
    create_callback = partial(connect, *args_to_use, **kwargs_to_use)
    connection_pool = ConnectionPool(create_callback,
                                     close=lambda c: c.close(),
                                     max_size=pool_size)
    return connection_pool

# create a pool with the arguments you'd use to create a connection
pool_size = 10
ftp_pool = create_ftp_pool(pool_size, host=...)

# use item() as a context manager
with ftp_pool.item() as connection:
    ...
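For context, the ConnectionPool class used above is not shown anywhere in the answer. A rough sketch of what such a pool could look like, built on queue.Queue — the constructor arguments mirror the call above, but this is an assumption about the interface, not the connectionpool package's actual source:
import queue
import threading
from contextlib import contextmanager

class ConnectionPool(object):
    """Minimal lazy pool: connections are created on demand up to max_size
    and reused afterwards (a sketch, not the real package)."""

    def __init__(self, create, close=None, max_size=10):
        self._create = create
        self._close = close
        self._max_size = max_size
        self._created = 0
        self._lock = threading.Lock()
        self._queue = queue.Queue()

    @contextmanager
    def item(self):
        try:
            conn = self._queue.get_nowait()      # reuse an idle connection
        except queue.Empty:
            with self._lock:
                can_create = self._created < self._max_size
                if can_create:
                    self._created += 1
            # either open a new connection or wait for one to be returned
            conn = self._create() if can_create else self._queue.get()
        try:
            yield conn
        finally:
            self._queue.put(conn)                # hand the connection back

    def shutdown(self):
        while True:
            try:
                conn = self._queue.get_nowait()
            except queue.Empty:
                break
            if self._close is not None:
                self._close(conn)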

Concurrency on PostgreSQL Database with python subprocesses

I use Python multiprocessing processes to establish multiple connections to a PostgreSQL database via psycopg2.
Every process establishes a connection, creates a cursor, fetches an object from an mp.Queue and does some work on the database. If everything works fine, the changes are committed and the connection is closed.
However, if one of the processes raises an error (e.g. an ADD COLUMN request fails because the column is already present), all the processes seem to stop working.
import psycopg2
import multiprocessing as mp
import Queue

def connect():
    C = psycopg2.connect(host="myhost", user="myuser", password="supersafe", port=62013, database="db")
    cur = C.cursor()
    return C, cur

def commit_and_close(C, cur):
    C.commit()
    cur.close()
    C.close()

def commit(C):
    C.commit()

def sub(queue):
    C, cur = connect()
    while not queue.empty():
        work_element = queue.get(timeout=1)
        # do something with the work element, that might produce an SQL error
    commit_and_close(C, cur)
    return 0

if __name__ == '__main__':
    job_queue = mp.Queue()
    # Fill job_queue
    print 'Run'
    for i in range(20):
        p = mp.Process(target=sub, args=(job_queue,))
        p.start()
I can see that the processes are still alive (because the job_queue is still full), but no network traffic / SQL actions are happening. Is it possible that an SQL error blocks communication from the other subprocesses? How can I prevent that from happening?
As chance would have it, I was doing something similar today.
It shouldn't be that the state of one connection can affect a different one, so I don't think we should start there.
There is clearly a race condition in your queue handling. You check whether the queue is empty and then try to get a statement to execute. With multiple readers, one of the others could empty the queue in between, leaving the rest blocking on their queue.get(). If the queue is empty when they all lock up, then I would suspect this.
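A minimal sketch of a consumer loop without that race, reusing connect() and commit_and_close() from the question and relying on the get() timeout instead of a separate empty() check:
import Queue  # Python 2 module name; on Python 3 this is the queue module

def sub(job_queue):
    C, cur = connect()
    while True:
        try:
            # rely on the timeout alone; no empty() check, so there is no
            # window in which another process can drain the queue first
            work_element = job_queue.get(timeout=1)
        except Queue.Empty:
            break  # nothing left to do
        # do something with work_element here
    commit_and_close(C, cur)
    return 0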
You also never join your processes back when they complete. I'm not sure what effect that would have in the larger picture, but it's probably good practice to clean up.
The other thing that might be happening is that the process hitting the error is not rolling back properly. That might leave other transactions waiting to see whether it commits or rolls back. They can wait for quite a long time by default, but you can configure it.
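A small sketch of what explicit rollback handling could look like for one work item; the table and column names are placeholders, and psycopg2.Error is the driver's base exception class:
import psycopg2

def process_item(C, cur, work_element):
    try:
        # hypothetical per-item work that may fail, as in the question
        cur.execute("ALTER TABLE my_table ADD COLUMN extra integer")
        C.commit()
    except psycopg2.Error:
        # roll back the failed transaction so the connection stays usable
        # and other sessions are not left waiting on locks it still holds
        C.rollback()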
To see what is happening, fire up psql and check out two useful system views pg_stat_activity and pg_locks. That should show where the cause lies.

python: write file once across concurrent independent invocations

I have a situation where multiple concurrent invocations of a Python script take place, involving initializing and loading some system info from a file.
This initialization should happen only once, and while it is happening the other invocations must wait somehow. Once it has happened, the other invocations must proceed with reading the file. However, since an unknown number of concurrent invocations of the program take place, the section is entered multiple times, causing problems.
Here is my code:
# initialization has already happened, load info from file
if os.path.isfile("/tmp/corners.txt"):
    logging.info("corners exist, load'em up!")
    # load corners from cornersfile
    cornersfile = open("/tmp/corners.txt", "r")
    for line in cornersfile:
        corners.append((line.split()[0], line.split()[1]))
    cornersfile.close()
    logging.info("corners is %s", corners)
else:
    # initialize and do not let other concurrent invocations proceed!
    logging.info("initiation not done, do it!")
    # init blocks and return the list of corners
    # write corners to file
    cornersfile = open("/tmp/corners.txt", "w")
    cornersfile.write("\n".join('%s %s' % x for x in corners))
    cornersfile.close()
I did some testing, running the code 8 times concurrently. In the logs I see that the first part of the code is entered three times and the else part is entered five times.
How do I make sure that the following happens:
If any concurrent invocation finds that the initialization (the else part) is happening, it will wait; all other concurrent invocations will go into a wait state.
If any concurrent invocation finds that the initialization has already happened (that is the file /tmp/corners.txt is present) it will be loaded up.
As I understand it, there are several Python interpreters running; you are not using threads.
I would solve this with file locking. There is a library: https://pypi.python.org/pypi/lockfile
Example:
from lockfile import LockFile

lock = LockFile("/some/file/or/other")
with lock:
    print lock.path, 'is locked.'
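Applied to the corners file above, the same lock can guard a re-check of the file, so that only the first invocation initializes while the others wait and then read. A sketch, where init_corners() stands in for the real initialization:
import os
from lockfile import LockFile

CORNERS_PATH = "/tmp/corners.txt"

corners = []
lock = LockFile(CORNERS_PATH)  # lockfile creates a separate .lock file next to it
with lock:
    # every invocation waits here until it holds the lock
    if not os.path.isfile(CORNERS_PATH):
        # first invocation: initialize and write the file
        corners = init_corners()  # placeholder for the real initialization
        with open(CORNERS_PATH, "w") as f:
            f.write("\n".join('%s %s' % x for x in corners))
    else:
        # later invocations: the file already exists, just read it
        with open(CORNERS_PATH) as f:
            corners = [tuple(line.split()[:2]) for line in f]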

Dronekit API Python: How to connect to the same vehicle from 2 different processes?

I am looking for help working with the same vehicle from 2 different processes.
I have one SITL instance running. I am trying to connect to this same instance from both the main process of my DroneKit script and from a subprocess spawned in the same script.
Both connections work fine (an MPAPIConnection object is returned in both cases, with the same reference), but in the subprocess the connection object does not appear to be live, and the vehicle parameters are not updated.
In the example below, the location returned by the main process when the drone is moving is the actual location, but the location returned by the subprocess remains stuck at the initial location when subprocess was first started.
Example:
import time
from pymavlink import mavutil
import multiprocessing

class OtherProcess(multiprocessing.Process):
    def __init__(self):
        super(OtherProcess, self).__init__()

    def run(self):
        sp_api = local_connect()
        sp_v = sp_api.get_vehicles()[0]
        while True:
            print "SubProcess : " + str(sp_v.location)
            time.sleep(1)

api = local_connect()
v = api.get_vehicles()[0]

sp = OtherProcess()
sp.start()

while True:
    print "MainProcess : " + str(v.location)
    time.sleep(1)
So is there a way to access the same vehicle from different processes within the same MAVProxy instance?
You should try this again with DKPY2 (just released): it uses stand-alone scripts and is designed so that each Vehicle object returned by the connect() function is completely independent. It is certainly possible to connect to separate vehicles in the same script (same process), so very likely you can also connect to the same vehicle from separate processes.
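A minimal sketch of what that could look like with DroneKit-Python 2, where each process simply calls dronekit.connect() on its own endpoint; the two UDP endpoints are assumptions (e.g. MAVProxy forwarding the SITL stream via two --out options):
import time
import multiprocessing
from dronekit import connect

# assumed endpoints; adjust to however SITL/MAVProxy is exposed
MAIN_ENDPOINT = 'udp:127.0.0.1:14550'
SUB_ENDPOINT = 'udp:127.0.0.1:14551'

def watcher():
    # this process gets its own, fully independent Vehicle object
    vehicle = connect(SUB_ENDPOINT, wait_ready=True)
    while True:
        print("SubProcess : %s" % vehicle.location.global_frame)
        time.sleep(1)

if __name__ == '__main__':
    p = multiprocessing.Process(target=watcher)
    p.start()
    vehicle = connect(MAIN_ENDPOINT, wait_ready=True)
    while True:
        print("MainProcess : %s" % vehicle.location.global_frame)
        time.sleep(1)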

python Redis Connections

I am using a Redis server with Python.
My application is multithreaded (I use 20-32 threads per process) and I also run the app on several machines.
I have noticed that sometimes Redis CPU usage is 100% and the Redis server becomes unresponsive/slow.
I would like each application instance to use one connection pool with 4 connections in total.
So, for example, if I run my app on at most 20 machines, there should be 20 * 4 = 80 connections to the Redis server.
import redis
from threading import Thread

POOL = redis.ConnectionPool(max_connections=4, host='192.168.1.1', db=1, port=6379)
R_SERVER = redis.Redis(connection_pool=POOL)

class Worker(Thread):
    def __init__(self):
        super(Worker, self).__init__()
        self.start()

    def run(self):
        while True:
            key = R_SERVER.randomkey()
            if not key:
                break
            value = R_SERVER.get(key)
            self._do_something(value)

    def _do_something(self, value):
        # do something with value
        pass

if __name__ == '__main__':
    num_threads = 20
    workers = [Worker() for _ in range(num_threads)]
    for w in workers:
        w.join()
The above code should run 20 threads that get a connection from the connection pool of max size 4 whenever a command is executed.
When is the connection released?
According to this code (https://github.com/andymccurdy/redis-py/blob/master/redis/client.py):
#### COMMAND EXECUTION AND PROTOCOL PARSING ####
def execute_command(self, *args, **options):
    "Execute a command and return a parsed response"
    pool = self.connection_pool
    command_name = args[0]
    connection = pool.get_connection(command_name, **options)
    try:
        connection.send_command(*args)
        return self.parse_response(connection, command_name, **options)
    except ConnectionError:
        connection.disconnect()
        connection.send_command(*args)
        return self.parse_response(connection, command_name, **options)
    finally:
        pool.release(connection)
After the execution of each command, the connection is released and goes back to the pool.
Can someone verify that I have understood the idea correctly and that the above example code will work as described?
Because when I look at the Redis connections, there are always more than 4.
EDIT: I just noticed in the code that the function has a return statement before the finally. What is the purpose of finally then?
As Matthew Scragg mentioned, the finally clause runs when the try block exits, even when it exits via a return statement. In this particular case it serves to release the connection back to the pool when finished with it, instead of leaving it hanging open.
As to the unresponsiveness, look at what your server is doing. What is the memory limit of your Redis instance? How often are you saving to disk? Are you running on a Xen-based VM such as an AWS instance? Are you running replication, and if so, how many slaves are there, and are they in a good state or are they frequently calling for a full resync of data? Are any of your commands "save"?
You can answer some of these questions by using the command-line interface. For example, redis-cli info persistence will tell you about the process of saving to disk, and redis-cli info memory will tell you about your memory consumption.
When obtaining the persistence information you want to specifically look at rdb_last_bgsave_status and rdb_last_bgsave_time_sec. These will tell you if the last save was successful and how long it took. The longer it takes the higher the chances are you are running into resource issues and the higher the chance you will encounter slowdowns which can appear as unresponsiveness.
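Recent versions of redis-py can also report these fields directly from Python via the client's info() method, for example (a small sketch reusing the R_SERVER client from the question):
# R_SERVER is the redis.Redis client created in the question
persistence = R_SERVER.info('persistence')
memory = R_SERVER.info('memory')

print(persistence.get('rdb_last_bgsave_status'))    # 'ok' if the last save succeeded
print(persistence.get('rdb_last_bgsave_time_sec'))  # duration of the last background save
print(memory.get('used_memory_human'))              # current memory consumption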
The finally block will always run, even though there is a return statement before it. If you have a look at redis-py/connection.py, pool.release(connection) only puts the connection back into the pool of available connections, so the connection is still alive.
About the Redis server's CPU usage: your app sends requests continuously with no breaks or sleeps, so it just uses more and more CPU (but not memory), and CPU usage has no relation to the number of open connections.
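One related note: the plain ConnectionPool raises an error once max_connections is exceeded, whereas redis-py also provides BlockingConnectionPool, which makes a thread wait for a connection to be released instead. If the goal is a hard cap of 4 sockets per process, a sketch along these lines (host/port/db copied from the question) may be closer to the intent:
import redis

# at most 4 sockets to the server; a thread that needs a fifth waits up to
# `timeout` seconds for one to be released instead of raising immediately
POOL = redis.BlockingConnectionPool(max_connections=4, timeout=20,
                                    host='192.168.1.1', port=6379, db=1)
R_SERVER = redis.Redis(connection_pool=POOL)

print(R_SERVER.randomkey())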
