How to ensure no repeating datas when use django?

How to ensure no repeating datas when use django? - python

background
I aim at overing a project which is made up of django and celery.
And I code two tasks which would sipder from two differnt web and save some data to database —— mysql.
As before, I do just one task, and I use update_or_create shows enough.
But when I want to do more task in different workers ,and all of them may save data to database, and what's more, they may save the same data at the same time.
question
How can I ensure different tasks running in different worker makeing no repeating data when all of them try to save the same data?
I know a django API is select_for_update which would set a lock in database. But when I reading the documentions, it similar means, would select,if exist,update. But I want to select,if exist,update,else create. It would more be like update_or_create,but this API may not use a lock?

about sorry
May the user anwered my question before had give me the right answer, but I did not get what they means.
what I choose
Finally I use the redis lock to ensure no repeating data.
The logic just below:
when I get the data,I try to use 'set(key,value.nx=True,ex=60)' to get a lock from redis.
If the answer is True,I would try to use the django queryqpi 'update_or_create()'.
If not,I did nothing and return True.
It make the burst problem like a single process.

Related

SQLAlchemy session with Celery (Multipart batch writes)

Supposing I have a mobile app which will send a filled form data (which also contains images) to a Commercial software using its API, and this data should be committed all at once.
Since the mobile does not have enough memory to send all the dataset at once, I need to send it as a Multipart batch.
I use transactions in cases where I want to perform a bunch of operations on the database, but I kind of need them to be performed all at once, meaning that I don't want the database to change out from under me while I'm in the middle of making my changes. And if I'm making a bunch of changes, I don't want users to be able to read my set of documents in that partially changed state. And I certainly don't want a set of operations failing halfway through, leaving me in a weird and inconsistent state forever. It's got to be all or nothing.
I know that Firebase provides the batch write operation which does exactly what I need. However, I need to do this into a local database (like redis or postgres).
The first approach I considered is using POST requests identified by a main session_ID.
- POST /session -> returns new SESSION_ID
- POST [image1] /session/<session_id> -> returns new IMG_ID
- POST [image2] /session/<session_id> -> returns new IMG_ID
- PUT /session/<session_id> -> validate/update metadata
However it does not seem very robust to handle errors.
The second approach I was considering is combining SQLAlchemy session with Celery task using Flask or FastAPI. I am not sure if it is common to do this to solve this issue. I just found this question. I would like to know what do you guys recommend for this second case approach (sending all data parts first, and commit all at once) ?

How can i send real time data to a Django application and show it on a webpage?

I'm trying to add real-time features to my Django webapp. Basically, i want to show real time data on a webpage.
I have an external Python script which generates some JSON data, not big data but around 10 records per second. On the other part, i have a Django app, i would like my Django app to receive that data and show it on a HTML page in real time. I've already considered updating the data on a db and then retrieving it from Django, but i would have too many queries, since Django would query the DB 1+ times per second for every user and my external script would be writing a lot of data every second.
What i'm missing is a "central" system, a way to make these two pieces communicate. I know the question is probably not specific enough, but is there some way to do this? I know something about Django Channels, but i don't know if i could do what i want with it; i've also considered updating the data on a RabbitMQ queue and then retrieve it from Django, but this is not the best use of RabbitMQ.
So is there a way to do this with Django-Channels? Any kind of advice is appreciated.

I would suggest using Django Channels. You can also use Redis instead of RabbitMQ. In your case, Redis might be a better choice.
Here is an approach: http://www.maxburstein.com/blog/realtime-django-using-nodejs-and-socketio/

update existing cache data with newer items in django

I want to use caching in Django and I am stuck up with how to go about it. I have data in some specific models which are write intensive. records will get added continuously to the model. Each user has some specific data in the model similar to orders table.
Since my model is write intensive I am not sure how effective caching frameworks in Django are going to be. I tried Django view specific caching and I am try to develop a view where first it will pick up data from the cache. Then I will have another call which will bring in data which was added to the model after the caching was done. What I want to do is add the updated data to the original cache data and store it again.
It is like I don't want to expire my cache, I just want to keep adding to my existing cache data. may be once in 3 hrs I can clear it.
Is what I am doing right. Are there better ways than this. Can I really add to items in existing cache.
I will be very glad for your help

You ask about "caching" which is a really broad topic, and the answer is always a mix of opinion, style and the specific app requirements. Here are a few points to consider.
If the data is per user, you can cache it per user:
from django.core.cache import cache
cache.set(request.user.id,"foo")
cache.get(request.user.id)
The common practice it to keep a database flag that tells you if the user's data changed since it was cached. So before you fetch the data from cache, check only this flag from the DB. If the flag says nothing changed, get the data from cache. If it did change, pull from DB, replace the cache, and set the flag again.
The flag check should be fast and simple: one table, indexed by user.id, and a boolean flag field. This will squeeze a lot of index rows into a single DB page, and enables a fast fetching of a single one field row. Yet you still get a persistent updated main storage, that prevents the use of not updated cache data. You can check this flag in a middleware.
You can run expiry in many ways: clear cache when user logs out, run a cron script that clears items, or let the cache backend expire items. If you use a flag check before you use the cache, there is no issue in keeping items in cache except space, and caching backends handle that. If you use the django simple file cache (which is easy, simple and zero config), you will have to clear the cache. A simple cron script will do.

Django Rest Framework Viewsets / Database Locks

Hi there so i am using Django Rest Framework 3.1 and i was wondering if its possible to "protect" my viewsets / database against writes in a per user basis?
In other words if 1 user is saving something the other one cannot save and it either waits till the first user finishes or returns some kind of error.
I tried looking for this answer but couldn't find it.
Is this behavior already implemented? if not how can i achieve this in practice?
UPDATE after some more thinking:
This is just a theory still, it needs more thinking, but if we use a Queue (Redis or Rabbitmq) we can put all synchronization writes requests in the queue instead of processing them right away and in conjunction with some user specific lock variable (maybe in the user sessions db table) we can ask if there are any users in front of us belonging to the same proponent and if those users have finished writing their updates or not (using the lock)
cheers

Database transactions will provide some of the safety you're looking for, I think. If a number of database operations are wrapped in a transaction, they are applied to the database together, so a sequence of operations cannot fail mid-way through and leave the database in an invalid state.
Other users will see the results of the operations as if they were applied all at once, or not at all (in the case of an error).

Dynamically Created Top Articles List in Django?

I'm creating a Django-powered site for my newspaper-ish site. The least obvious and common-sense task that I have come across in getting the site together is how best to generate a "top articles" list for the sidebar of the page.
The first thing that came to mind was some sort of database column that is updated (based on what?) with every view. That seems (to my instincts) ridiculously database intensive and impractical and thus I think I'd like to find another solution.
Thanks all.

I would give celery a try (with django-celery). While it's not so easy to configure and use as cache, it enables you to queue tasks like incrementing counters and do them in background. It could be even combined with cache technique - in views increment counters in cache and define PeriodicTask that will run every now and then, resetting counters and writing them to the database.
I just remembered - I once found this blog entry which provides nice way of incrementing 'viewed_count' (or similar) column in database with AJAX JS call. If you don't have heavy traffic maybe it's good idea?
Also mentioned in this post is django-tracking, but I don't know much about it, I never used it myself (yet).

Premature optimization, first try the db way and then see if it really is too database sensitive. Any decent database has so good caches it probably won't matter very much. And even if it is a problem, take a look at the other db/cache suggestions here.
It is most likely by the way is that you will have many more intensive db queries with each view than a simple view update.

If you do something like sort by top views, it would be fast if you index the view column in the DB. Another option is to only collect the top x articles every hour or so, and toss that value into Django's cache framework.
The nice thing about caching the list is that the algorithm you use to determine top articles can be as complex as you like without hitting the DB hard with every page view. Django's cache framework can use memory, db, or file system. I prefer DB, but many others prefer memory. I believe it uses pickle, so you can also store Python objects directly. It's easy to use, recommended.

An index wouldn't help as them main problem I believe is not so much getting the sorted list as having a DB write with every page view of an article. Another index actually makes that problem worse, albeit only a little.
So I'd go with the cache. I think django's cache shim is a problem here because it requires timeouts on all keys. I'm not sure if that's imposed by memcached, if not then go with redis. Actually just go with redis anyway, the python library is great, I've used it from django projects before, and it has atomic increments and powerful sorting - everything you need.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.