MySQL: will it deadlock on too many inserts? - Django - python

I want to migrate from sqlite3 to MySQL in Django. I have worked with Oracle and MS SQL Server before, so I know I can catch the exception and retry over and over until the statement goes through... However, these are inserts into the same table, where the data must be INSERTED right away, because users will not be happy waiting their turn to INSERT into the same table.
So I was wondering: will a deadlock happen on the table if too many users insert at the same time, and what can I do to avoid it, so that users don't notice anything?

I don't think you can get a deadlock just from rapid insertions. A deadlock occurs when you have two processes that are each waiting for the other to do something before they can make the change that the other one is waiting for. If two processes are just inserting, the database will simply process the statements in the order they're received; there's no dependency between them.
If you're using InnoDB, it uses row-level locking. So unless two inserts both try to insert the same unique key, they shouldn't even lock each other out; they can run concurrently.
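If a deadlock ever does show up (for example through gap locks under REPEATABLE READ), the standard remedy is simply to retry the transaction, since MySQL rolls back the victim automatically. A minimal sketch of that pattern in Django; the retry count, the backoff, and the make_row callable are my own placeholders, not anything from the question:

import time
from django.db import OperationalError, transaction

def insert_with_retry(make_row, retries=3):
    # MySQL reports a deadlock with error code 1213 and rolls the
    # victim transaction back, so it is safe to simply run it again.
    for attempt in range(retries):
        try:
            with transaction.atomic():
                return make_row()
        except OperationalError as exc:
            if exc.args and exc.args[0] == 1213 and attempt < retries - 1:
                time.sleep(0.1 * (attempt + 1))  # brief backoff before retrying
            else:
                raise

# Usage, with a hypothetical Order model:
# insert_with_retry(lambda: Order.objects.create(user_id=42, total=10))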

Related

Strategies for renaming table and indices while select queries are running

I have a job which copies a large file into a table temp_a and also creates an index idx_temp_a_j on a column j. Once the job finishes copying all the data, I have to rename this table to prod_a, which is production-facing; queries are always running against it with very little idle time. But once I run the rename queries, both the incoming queries and the queries already running get backed up, producing high API error rates. I want to know what strategies I can use so that renaming the table happens with less downtime.
So far, below are the strategies I came up with:
First, just rename the table and let queries back up. This approach seems unreliable: the RENAME TABLE query acquires an EXCLUSIVE lock, all other queries get backed up behind it, and I see high API error rates.
Second, write a polling function which checks whether any queries are currently running; if not, rename the table and index. The polling function checks periodically: if queries are running, wait; if not, run the ALTER TABLE ... RENAME query. This approach only queues up queries that arrive after the rename has taken its EXCLUSIVE lock on the table; once the renaming finishes, the queued queries get executed. I still need to find the database APIs that will help me write this function.
What other strategies can allow this "seamless" renaming of the table? I am using PostgreSQL 11.4, and the job which does all this is written in Python.
You cannot avoid blocking concurrent queries while a table is renamed.
The operation itself is blazingly fast, so any delay you experience must be because the ALTER TABLE itself is blocked by long-running transactions using the table. All later operations on the table then have to queue behind the ALTER TABLE.
The solution for painless renaming is to keep database transactions very short (which is always desirable anyway, since it also reduces the danger of deadlocks).
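One common mitigation, if brief blocking is tolerable but indefinite queueing is not, is to set a short lock_timeout so the ALTER TABLE fails fast instead of queueing behind a long transaction, and then retry. A sketch with psycopg2; the table names come from the question, while prod_a_old, the DSN, and the timeout values are assumptions:

import time
import psycopg2

def swap_tables(dsn, retries=20):
    for attempt in range(retries):
        conn = psycopg2.connect(dsn)
        try:
            with conn, conn.cursor() as cur:
                # Give up quickly if the exclusive lock is unavailable,
                # so other queries do not pile up behind us.
                cur.execute("SET lock_timeout = '250ms'")
                cur.execute("ALTER TABLE prod_a RENAME TO prod_a_old")
                cur.execute("ALTER TABLE temp_a RENAME TO prod_a")
                cur.execute("DROP TABLE prod_a_old")
                return True
        except psycopg2.errors.LockNotAvailable:
            time.sleep(0.5)  # a long transaction holds the table; back off
        finally:
            conn.close()
    return False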

Is it thread-safe to use SQLAlchemy with engine/connections instead of sessions?

I've got a simple webservice which uses SQLAlchemy to connect to a database using the pattern
engine = create_engine(database_uri)
connection = engine.connect()
In each endpoint of the service, I then use the same connection, in the following fashion:
for result in connection.execute(query):
    <do something fancy>
Since Sessions are not thread-safe, I'm afraid that connections aren't either.
Can I safely keep doing this? If not, what's the easiest way to fix it?
Minor note -- I don't know if the service will ever run multithreaded, but I'd rather be sure that I don't get into trouble when it does.
Short answer: you should be fine.
There is a difference between a connection and a Session. The short description is that connections represent just that: a connection to a database. Information you pass into it will come out pretty plain. It won't keep track of your transactions unless you tell it to, and it won't care about what order you send it data. So if it matters that you create your Widget object before you create your Sprocket object, then you had better call that in a thread-safe context. The same generally goes for keeping track of a database transaction.
Session, on the other hand, keeps track of data and transactions for you. If you check out the source code, you'll notice quite a bit of back and forth over database transactions, and without a way to know that you have everything you want in a transaction, you could very well end up committing in one thread while you expect to be able to add another object (or several) in another.
In case you don't know what a transaction is, Wikipedia has the long version, but the short version is that transactions help keep your data consistent: if you have 15 inserts and updates and insert 15 fails, you might not want to apply the other 14. A transaction lets you cancel the entire operation in bulk.
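That said, if you want to stop sharing one module-level connection altogether, the engine itself is safe to share between threads and can hand each request its own pooled connection. A minimal sketch under that assumption; the URI, the query, and handle_row are stand-ins, not anything from the question:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://localhost/mydb")  # hypothetical URI

def endpoint():
    # Each call checks a connection out of the engine's pool and
    # returns it when the block exits, so threads never share one.
    with engine.connect() as connection:
        for result in connection.execute(text("SELECT * FROM widgets")):
            handle_row(result)  # stand-in for <do something fancy>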

SQLAlchemy long running script: User was holding a relation lock for too long

I have an SQLAlchemy session in a script. The script runs for a long time, and it only fetches data from the database; it never updates or inserts.
I get quite a lot of errors like
sqlalchemy.exc.DBAPIError: (TransactionRollbackError) terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
The way I understand it, SQLAlchemy creates a transaction with the first SELECT issued, and then reuses it. As my script may run for about an hour, it is very likely that a conflict comes up during the lifetime of that transaction.
To get rid of the error, I could use the deprecated autocommit mode (without doing anything more), but this is explicitly discouraged by the documentation.
What is the right way to deal with the error? Can I use ORM queries without transactions at all?
I ended up closing the session after (almost) every select, like
session.query(Foo).all()
session.close()
Since I do not use autocommit, a new transaction is opened automatically with the next query.
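Wrapped up as a helper, so that each read runs in its own short-lived transaction (Foo and the session object are the placeholders from the snippet above):

from contextlib import contextmanager

@contextmanager
def short_read(session):
    # Yield the session for one read, then close it so the transaction
    # ends and the standby can apply WAL without conflicting with us.
    try:
        yield session
    finally:
        session.close()  # the next query opens a fresh transaction

# Usage:
# with short_read(session) as s:
#     foos = s.query(Foo).all()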

SQLite3 and Multiprocessing

I noticed that sqlite3 isn't really capable or reliable when I use it inside a multiprocessing environment. Each process tries to write some data into the same database, so a connection gets used by multiple threads. I tried it with the check_same_thread=False option, but the number of insertions is pretty random: sometimes it includes everything, sometimes not. Should I parallel-process only parts of the function (fetching data from the web), stack the outputs in a list and put them into the table all together, or is there a reliable way to handle multiple connections with sqlite?
First of all, there's a difference between multiprocessing (multiple processes) and multithreading (multiple threads within one process).
It seems that you're talking about multithreading here. There are a couple of caveats you should be aware of when using SQLite in a multithreaded environment. The SQLite documentation mentions the following:
Do not use the same database connection at the same time in more than one thread.
On some operating systems, a database connection should always be used in the same thread in which it was originally created.
See here for more detailed information: Is SQLite thread-safe?
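In practice that means opening a separate connection inside each thread, instead of passing one connection around with check_same_thread=False. A minimal sketch; the file name and table are made up:

import sqlite3
import threading

DB = "data.db"  # hypothetical database file

init = sqlite3.connect(DB)
init.execute("CREATE TABLE IF NOT EXISTS items (value INTEGER)")
init.commit()
init.close()

def worker(values):
    # Open the connection in the thread that uses it; never share it.
    conn = sqlite3.connect(DB, timeout=30)  # wait out other writers
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO items (value) VALUES (?)",
                         [(v,) for v in values])
    conn.close()

threads = [threading.Thread(target=worker, args=(range(100),)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()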
I've actually just been working on something very similar:
- multiple processes (for me, a processing pool of 4 to 32 workers)
- each process worker does some stuff that includes getting information from the web (a call to the Alchemy API, in my case)
- each process opens its own sqlite3 connection, all to a single file, and each process adds one entry before getting the next task off the stack
At first I thought I was seeing the same issue as you, but then I traced it to overlapping and conflicting issues with retrieving the information from the web. Since I was right there, I did some torture testing on sqlite and multiprocessing and found I could run MANY process workers, all connecting and adding to the same sqlite file without coordination, and it was rock solid when I was just putting in test data.
So now I'm looking at your phrase "(fetching data from the web)". Perhaps you could try replacing that data fetching with some dummy data to make sure it really is the sqlite3 connection causing your problems. At least in my tested case (running right now in another window) I found that multiple processes were all able to add through their own connections without issues. But your description exactly matches the problem I was having when two processes stepped on each other while going for the web API (a very odd error, actually) and sometimes didn't get the expected data, which of course leaves an empty slot in the database. My eventual solution was to detect this failure within each worker and retry the web API call when it happened (it could have been more elegant, but this was for a personal hack).
My apologies if this doesn't apply to your case, without code it's hard to know what you're facing, but the description makes me wonder if you might widen your considerations.
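For reference, a stripped-down version of that torture test: each worker process opens its own connection to the same file and inserts dummy data in place of the web fetch, so you can check whether sqlite3 itself drops anything. The file name and row shape are invented:

import sqlite3
from multiprocessing import Pool

DB = "torture.db"  # hypothetical file shared by all workers

def worker(task_id):
    # Dummy data instead of a web call, to isolate sqlite3 behaviour.
    conn = sqlite3.connect(DB, timeout=30)
    with conn:
        conn.execute("INSERT INTO results (task, payload) VALUES (?, ?)",
                     (task_id, "dummy"))
    conn.close()

if __name__ == "__main__":
    conn = sqlite3.connect(DB)
    conn.execute("CREATE TABLE IF NOT EXISTS results (task INTEGER, payload TEXT)")
    conn.commit()
    conn.close()
    with Pool(8) as pool:
        pool.map(worker, range(1000))
    conn = sqlite3.connect(DB)
    print(conn.execute("SELECT COUNT(*) FROM results").fetchone())  # expect (1000,)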
sqlitedict: A lightweight wrapper around Python's sqlite3 database, with a dict-like interface and multi-thread access support.
If I had to build a system like the one you describe using SQLite, then I would start by writing an async server (using the asynchat module) to handle all of the SQLite database access, and then I would write the other processes to use that server. When only one process accesses the db file directly, it can enforce a strict sequence of queries so that there is no danger of two processes stepping on each other's toes. It is also faster than continually opening and closing the db.
In fact, I would also try to avoid maintaining sessions; in other words, I would try to write all the other processes so that every database transaction is independent. At minimum this would mean allowing a transaction to contain a list of SQL statements, not just one, and it might even require some if/then capability so that you could SELECT a record, check that a field is equal to X, and only then UPDATE that field. If your existing app closes the database after every transaction, then you don't need to worry about sessions.
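The asynchat module has since been removed from the standard library, but the same single-writer idea can be sketched with a multiprocessing.Queue feeding one dedicated writer process instead; every name below is made up for illustration:

import sqlite3
from multiprocessing import Process, Queue

def writer(queue, db_path):
    # The only process that touches the database file: it drains the
    # queue and executes each batch of statements as one transaction.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (value TEXT)")
    conn.commit()
    while True:
        batch = queue.get()
        if batch is None:  # sentinel: shut down
            break
        with conn:  # one transaction per batch
            for sql, params in batch:
                conn.execute(sql, params)
    conn.close()

if __name__ == "__main__":
    q = Queue()
    w = Process(target=writer, args=(q, "single_writer.db"))
    w.start()
    # Other processes only enqueue work; they never open the db file.
    q.put([("INSERT INTO items (value) VALUES (?)", ("hello",))])
    q.put(None)
    w.join()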
You might be able to use something like nosqlite http://code.google.com/p/nosqlite/

How to prevent QTableModel from updating the table depending on some condition

I have a MySQL table that uses a write-lock mechanism. The lock might be held for too long (we're talking about 1-2 minutes here).
I had to add a check for whether the table is in use before the update is done (using the beforeUpdate signal),
but after the check returns that the table is in use, the system hangs until the other user unlocks the table. Is it possible to prevent the data from updating if the flag says the table is in use?
I'm searching for a better way to handle this. I don't want to re-implement the setData method, since doing that is a pain; but if you have a good re-implementation for it, that would be very helpful.
Thanks in advance.
Python threading: http://docs.python.org/library/thread.html. You can create a thread that waits until the table is free, which should cost negligible system resources; your end user also won't have to wait for the system to respond before continuing with a different task.
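A minimal sketch of that idea with the threading module; do_update stands in for the MySQL update that may block for 1-2 minutes, and on_done would, for example, emit a Qt signal so the view is refreshed from the GUI thread:

import threading

def update_in_background(do_update, on_done):
    # Run the potentially blocking update off the GUI thread, then
    # report back; only this worker thread waits for the table lock.
    def run():
        do_update()
        on_done()
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t

# Usage with stand-in callables:
# update_in_background(lambda: None, lambda: print("table updated"))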
