I have a Flask project that relies on Flask-SQLAlchemy and Celery to do a lot of things. Many of the Celery tasks reach out to external APIs, fetch some data, and read/update data on disk. As I expand the number of tasks being run, I see a lot of 40001 errors such as
(psycopg2.errors.SerializationFailure) could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.
And other similar errors, all relating to transactions failing. We are using serializable transactions in SQLAlchemy.
I've tried to rerun the tasks using the built-in Celery auto-retry features, but I think an architectural change is needed, since the tasks are still failing even after several retries. The logic for reading from remote APIs and storing to the database is mingled inside the tasks. My first idea was to separate this logic as much as possible: first read from the database and the API, then send another task that makes the changes to the API/database. But I'm not sure this would help. I don't want to wrap every single call to db.session.query(Object) (or similar) in try/except clauses if there's a better way.
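To illustrate the alternative I have in mind: retry the whole unit of work when a 40001 shows up, rather than guarding each individual query. Something roughly like this (the helper and its shape are made up by me, not existing project code):

```python
import sqlalchemy.exc
from psycopg2 import errors as pg_errors


def run_serializable(session, unit_of_work, attempts=5):
    """Run unit_of_work() and commit, retrying the whole unit on serialization failures."""
    for attempt in range(attempts):
        try:
            result = unit_of_work()
            session.commit()
            return result
        except sqlalchemy.exc.OperationalError as exc:
            # rollback starts a fresh transaction, so the reads re-execute on retry
            session.rollback()
            is_serialization_failure = isinstance(exc.orig, pg_errors.SerializationFailure)
            if not is_serialization_failure or attempt == attempts - 1:
                raise
```

Here unit_of_work would do the reads plus the update_state calls, so the whole read/write set re-runs on each retry instead of me wrapping every query.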
The code currently looks like this (heavily redacted):
@celery_app.task(
    bind=True,
    autoretry_for=(
        HTTPError,
        sqlalchemy.exc.OperationalError,
        sqlalchemy.exc.IntegrityError,
    ),
    retry_backoff=5,
    retry_jitter=True,
    retry_kwargs={"max_retries": 5},
)
def sync_db_objects_task(self, instance_id: int):
    instance = ObjectInstance.query.get(instance_id)
    if not instance:
        return

    # Run a query fetching objects of interest
    latest_local_tickets = db.session.query(..)
    if all([t.deleted for t in latest_local_tickets]):
        return

    remote_config = (
        instance.config.address,
        instance.config.port,
        instance.config.username,
        instance.config.password,
    )

    # determine which ticket objects are out of sync:
    out_of_sync = out_of_sync_tickets(latest_local_tickets, remote_config)

    successes, fails = [], []
    for result in out_of_sync:
        db_object = latest_local_tickets[result.name]
        try:
            # sync: connect and update the remote side
            status, reason = connect.sync(db_object)
            successes.append(db_object.name)
        except HTTPError as e:
            status, reason = e.status_code, e.msg
            fails.append(db_object.name)
        db_object.update_state(status, reason)
        db.session.add(db_object)
        db.session.commit()

    logger.info(f"successes: {successes}")
    logger.info(f"fails: {fails}")
It usually fails in the call to update_state which looks like this:
@declarative_mixin
class ReplicationMixin:
    status = Column(types.Text, default="unknown", nullable=False)
    status_reason = Column(types.Text, nullable=True)
    last_sync_date = Column(types.DateTime, default=None)

    def update_state(self, status: str, reason: str = None):
        self.status = status if status else "fail"
        self.status_reason = parse_reason(reason)
        self.last_sync_date = datetime.utcnow()
Like this:
[2022-07-29 09:22:11,305: ERROR/ForkPoolWorker-8] (psycopg2.errors.SerializationFailure) could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.
[SQL: UPDATE tickets SET updated=%(updated)s, last_sync_date=%(last_sync_date)s WHERE tickets.id = %(tickets_id)s]
[parameters: {'updated': datetime.datetime(2022, 7, 29, 9, 22, 11, 304293), 'last_sync_date': datetime.datetime(2022, 7, 29, 9, 22, 11, 303875), 'tickets_id': 124}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
But I think it's a red herring, as it's probably being subsequently read somewhere else.
Can I leverage db.session.refresh(object) or something else to make sure I commit and re-read data before changing it, or is there some other way to alleviate this issue?
I've been stumped on this issue for a week and I can't seem to figure out the best way forward. I'd be very appreciative of any help.
Related
I have a Flask application that uses SQLAlchemy (with some Marshmallow for serialization and deserialization).
I'm currently encountering some intermittent issues when trying to dump an object post-commit.
To give an example, let's say I have implemented a (multi-tenant) system for tracking system faults of some sort. This information is contained in a fault table:
class Fault(Base):
    __tablename__ = "fault"

    fault_id = Column(BIGINT, primary_key=True)
    workspace_id = Column(Integer, ForeignKey('workspace.workspace_id'))
    local_fault_id = Column(Integer)
    name = Column(String)
    description = Column(String)
I've removed a number of columns in the interest of simplicity, but this is the core of the model. The columns should be largely self explanatory, with workspace_id effectively representing tenant, and local_fault_id representing a tenant-specific fault sequence number, which is handled via a separate fault_sequence table.
This fault_sequence table holds a counter against workspace, and is updated by means of a simple on_fault_created() function that is executed by a trigger:
CREATE TRIGGER fault_created
AFTER INSERT
ON "fault"
FOR EACH ROW
EXECUTE PROCEDURE on_fault_created();
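For reference, the sequence table is shaped roughly like this (my sketch of it, not the actual model):

```python
# Rough sketch only: the real fault_sequence table has more to it than this.
class FaultSequence(Base):
    __tablename__ = "fault_sequence"

    workspace_id = Column(Integer, ForeignKey('workspace.workspace_id'), primary_key=True)
    next_local_fault_id = Column(Integer, nullable=False, default=1)
```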
So - the problem:
I have a Flask endpoint for fault creation, where we create an instance of a Fault entity, add this via a scoped session (session.add(fault)), then call session.commit().
It seems that this is always successful in creating the desired entities in the database, executing the sequence update trigger etc. However, when I then try to interrogate the fault object for updated fields (after commit()), around 10% of the time I find that each key/field just points to an Exception:
psycopg2.errors.InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block
Which seems to boil down to the following:
(psycopg2.errors.InvalidTextRepresentation) invalid input syntax for integer: ""
[SQL: SELECT fault.fault_id AS fault_fault_id, fault.workspace_id AS fault_workspace_id, fault.local_fault_id AS fault_local_fault_id, fault.name as fault_name, fault.description as fault_description
FROM fault
WHERE fault.fault_id = %(param_1)s]
[parameters: {'param_1': 166}]
(Background on this error at: http://sqlalche.me/e/13/2j8)
My question, then, is what do we think could be causing this?
I think it smells like a race condition, with the update trigger not being complete before SQLAlchemy has tried to get the updated data; perhaps local_fault_id is null, and this is resulting in the invalid input syntax error.
That said, I have very low confidence on this. Any guidance here would be amazing, as I could really do with retrieving that sequence number that's incremented/handled by the update trigger.
Thanks
Edit 1:
Some more info:
I have tried removing the update trigger, in the hope of eliminating that as a suspect. This behaviour is still intermittently evident, so I don't think it's related to that.
I have tried adopting usage of flush and refresh before the commit, and this allows me to get the values that I need - though commit still appears to 'break' the fault object.
Edit 2:
So it really seems to be more postgres than anything else. When I interrogate my database logs, this is the weirdest thing. I can copy and paste the command it says is failing, and I struggle to see how this integer value in the WHERE clause is possibly evaluating to an empty string.
This same error is reproducible with SELECT ... FROM fault WHERE fault.fault_id = '', which in no way seems to be the query making it to the DB.
I am stumped.
Your sentence "This same error is reproducible with SELECT ... FROM fault WHERE fault.fault_id = '', which in no way seems to be the query making it to the DB." seems to indicate that you are trying to access an object that does not have the database primary key "fault_id".
I guess, given that you did not provide the code, that you are adding the object to your session (session.add), committing (session.commit) and then using the object. As fault_id is autogenerated by the database, the fault object in the session (in memory) does not have fault_id.
I believe you can correct this with:
session.add(fault)
session.commit()
session.refresh(fault)
The refresh needs to be AFTER commit to refresh the fault object and retrieve fault_id.
If you are using async, you need
session.add(fault)
await session.commit()
await session.refresh(fault)
I want to implement a dynamic FTPSensor of sorts. Using the contributed FTP sensor I managed to make it work this way:
ftp_sensor = FTPSensor(
    task_id="detect-file-on-ftp",
    path="./data/test.txt",
    ftp_conn_id="ftp_default",
    poke_interval=5,
    dag=dag,
)
and it works just fine. But I need to pass dynamic path and ftp_conn_id params: I generate a bunch of new connections in a previous task, and in the ftp_sensor task I want to check, for each of the connections I previously generated, whether a file is present on the FTP.
So I thought first to grab the connections' ids from XCom.
I send them from the previous task in XCom but it seems I cannot access XCom outside of tasks.
E.g. I was aiming at something like:
active_ftp_connections = context['ti'].xcom_pull(key='active_ftps')

for conn in active_ftp_connections:
    ftp_sensor = FTPSensor(
        task_id="detect-file-on-ftp",
        path=conn['path'],
        ftp_conn_id=conn['connection'],
        poke_interval=5,
        dag=dag,
    )
but this doesn't seem to be a possible solution.
Then I wasted a good amount of time trying to create a custom FTPSensor that I could pass the data to dynamically, but I have now reached the conclusion that I need a hybrid between a sensor and an operator, because I need to keep the poke functionality but also have the execute functionality.
I guess one option is to write a custom operator that implements poke from the sensor base class, but I'm probably too tired to try to do it now.
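Roughly, what I had in mind is something like this (completely untested sketch; the XCom key and the dict fields are just placeholders):

```python
from airflow.contrib.sensors.ftp_sensor import FTPSensor


class DynamicFTPSensor(FTPSensor):
    """Sketch: resolve path and connection id from XCom at poke time."""

    def poke(self, context):
        # assumes the upstream task pushed a single dict under key 'active_ftps'
        ftp = context['ti'].xcom_pull(key='active_ftps')
        self.path = ftp['path']
        self.ftp_conn_id = ftp['connection']
        return super(DynamicFTPSensor, self).poke(context)
```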
Do you have an idea how to achieve what I am aiming at? I can't seem to find any materials on the topic on the internet - maybe it's just me.
Let me know if the question is not clear so I can provide more details.
Update
I have now reached this as a possibility:
def get_active_ftps(**context):
    active_ftp_connections = context['ti'].xcom_pull(key='active_ftps')
    return active_ftp_connections

for ftp in get_active_ftps():
    ftp_sensor = FTPSensor(
        task_id="detect-file-on-ftp",
        path="./" + ftp['folder'] + "/test.txt",
        ftp_conn_id=ftp['conn_id'],
        poke_interval=5,
        dag=dag,
    )
but it throws an error: Broken DAG: [/usr/local/airflow/dags/copy_file_from_ftp.py] 'ti'
I managed to do it like this:
active_ftp_folder = Variable.get('active_ftp_folder')
active_ftp_conn_id = Variable.get('active_ftp_conn_id')
ftp_sensor = FTPSensor(
task_id="detect-file-on-ftp",
path="./"+ active_ftp_folder +"/test.txt",
ftp_conn_id=active_ftp_conn_id,
poke_interval=5,
dag=dag,
)
I will just have the DAG handle one FTP account at a time, since I realized that there shouldn't be cycles in a directed acyclic graph ... apparently.
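If several accounts ever need to be checked in the same DAG, one sensor per connection can also be generated at parse time, as long as every sensor gets a unique task_id. A rough sketch, assuming the connection list is kept as JSON in a single Variable (the Variable name and format are assumptions):

```python
import json

from airflow.contrib.sensors.ftp_sensor import FTPSensor
from airflow.models import Variable

# Assumed format: [{"folder": "foo", "conn_id": "ftp_foo"}, {"folder": "bar", "conn_id": "ftp_bar"}]
active_ftps = json.loads(Variable.get('active_ftps', default_var='[]'))

for i, ftp in enumerate(active_ftps):
    FTPSensor(
        task_id="detect-file-on-ftp-{}".format(i),  # task_ids must be unique within the DAG
        path="./" + ftp['folder'] + "/test.txt",
        ftp_conn_id=ftp['conn_id'],
        poke_interval=5,
        dag=dag,
    )
```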
Below is my current code. It connects successfully to the organization. How can I fetch the results of a query in Azure like they have here? I know this was solved, but there isn't an explanation, and there's quite a big gap in what they're doing.
from azure.devops.connection import Connection
from msrest.authentication import BasicAuthentication
from azure.devops.v5_1.work_item_tracking.models import Wiql
personal_access_token = 'xxx'
organization_url = 'zzz'
# Create a connection to the org
credentials = BasicAuthentication('', personal_access_token)
connection = Connection(base_url=organization_url, creds=credentials)
wit_client = connection.clients.get_work_item_tracking_client()
results = wit_client.query_by_id("my query ID here")
P.S. Please don't link me to the GitHub repo or the documentation. I've looked at both extensively for days and it hasn't helped.
Edit: I've added the results line that successfully gets the query. However, it returns a WorkItemQueryResult object, which is not exactly what is needed. I need a way to view the columns and the query's results for those columns.
So I've figured this out in probably the most inefficient way possible, but hope it helps someone else and they find a way to improve it.
The issue with the WorkItemQueryResult class stored in variable "result" is that it doesn't allow the contents of the work item to be shown.
So the goal is to be able to use the get_work_item method that requires the id field, which you can get (in a rather roundabout way) through item.target.id from results' work_item_relations. The code below is added on.
for item in results.work_item_relations:
    id = item.target.id
    work_item = wit_client.get_work_item(id)
    fields = work_item.fields
This gets the id from every work item in your result class and then grants access to the fields of that work item, which you can access by fields.get("System.Title"), etc.
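If you also want the query's own column list rather than hard-coding field names, the result object exposes the selected columns; a sketch building on the loop above (I'm assuming results.columns carries the field reference names, which may differ between API versions):

```python
# Assumption: results.columns holds the query's field references (reference_name).
column_refs = [col.reference_name for col in results.columns]

for item in results.work_item_relations:
    work_item = wit_client.get_work_item(item.target.id)
    # Build one row per work item, keyed by the columns the query selected.
    row = {ref: work_item.fields.get(ref) for ref in column_refs}
    print(row)
```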
I have a simple library application. In order to force 3 actions to commit as one action, and roll back if any of the actions fails, I made the following code changes:
In settings.py:
AUTOCOMMIT=False
In forms.py:
from django import forms
from django.db import IntegrityError, transaction

class CreateLoan(forms.Form):
    # Fields...

    def save(self):
        id_book = self.cleaned_data.get('id_book', None)
        id_customer = self.cleaned_data.get('id_customer', None)
        start_date = self.cleaned_data.get('start_date', None)

        book = Book.objects.get(id=id_book)
        customer = Customer.objects.get(id=id_customer)

        new_return = Return(
            book=book,
            start_date=start_date)
        txn = Loan_Txn(
            customer=customer,
            book=book,
            start_date=start_date
        )

        try:
            with transaction.atomic():
                book.status = "ON_LOAN"
                book.save()
                new_return.save(force_insert=True)
                txn.save(force_insert=True)
        except IntegrityError:
            raise forms.ValidationError("Something occurred. Please try again")
Am I still missing anything with regards to this? I'm using Django 1.9 with Python 3.4.3 and the database is MySQL.
You're using transaction.atomic() correctly (including putting the try ... except outside the transaction) but you should definitely not be setting AUTOCOMMIT = False.
As the documentation states, you set that system-wide setting to False when you want to "disable Django’s transaction management"—but that's clearly not what you want to do, since you're using transaction.atomic()! More from the documentation:
If you do this, Django won’t enable autocommit, and won’t perform any commits. You’ll get the regular behavior of the underlying database library.
This requires you to commit explicitly every transaction, even those started by Django or by third-party libraries. Thus, this is best used in situations where you want to run your own transaction-controlling middleware or do something really strange.
So just don't do that. Django will of course disable autocommit for that atomic block and re-enable it when the block finishes.
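For completeness, the same scoping also works with atomic as a decorator rather than a context manager; a minimal sketch (the helper names here are mine), keeping the try/except outside the transaction as before:

```python
from django import forms
from django.db import IntegrityError, transaction


@transaction.atomic
def _create_loan(book, new_return, txn):
    # Everything in here commits together or rolls back together.
    book.status = "ON_LOAN"
    book.save()
    new_return.save(force_insert=True)
    txn.save(force_insert=True)


def create_loan(book, new_return, txn):
    try:
        _create_loan(book, new_return, txn)
    except IntegrityError:
        raise forms.ValidationError("Something occurred. Please try again")
```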
When updating a document in MongoDB using a search-style update, is it possible to get back the _id of the document(s) updated?
For example:
import pymongo
client = pymongo.MongoClient('localhost', 27017)
db = client.test_database
col = db.test_col
col.insert({'name':'kevin', 'status':'new'})
col.insert({'name':'brian', 'status':'new'})
col.insert({'name':'matt', 'status':'new'})
col.insert({'name':'stephen', 'status':'new'})
info = col.update({'status':'new'}, {'$set':{'status':'in_progress'}}, multi=False)
print info
# {u'updatedExisting': True, u'connectionId': 1380, u'ok': 1.0, u'err': None, u'n': 1}
# I want to know the _id of the document that was updated.
I have multiple threads accessing the database collection and want to be able to mark a document as being acted upon. Getting the document first and then updating by Id is not a good answer, because two threads may "get" the same document before it is updated. The application is a simple asynchronous task queue (yes, I know we'd be better off with something like Rabbit or ZeroMQ for this, but adding to our stack isn't possible right now).
You can use pymongo.collection.Collection.find_and_modify. It is a wrapper around the MongoDB findAndModify command and can return either the original (by default) or the modified document.
info = col.find_and_modify({'status': 'new'}, {'$set': {'status': 'in_progress'}})
if info:
    print info.get('_id')
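Two follow-up notes: find_and_modify returns the pre-update document by default, so pass new=True if you want the updated one; and newer PyMongo versions deprecate it in favour of find_one_and_update, roughly:

```python
from pymongo import MongoClient, ReturnDocument

col = MongoClient('localhost', 27017).test_database.test_col

# Returns the document *after* the update (None if nothing matched),
# so the _id of the claimed task is available atomically.
doc = col.find_one_and_update(
    {'status': 'new'},
    {'$set': {'status': 'in_progress'}},
    return_document=ReturnDocument.AFTER,
)
if doc:
    print(doc['_id'])
```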