Microservices and multiple databases - Python

I have written microservices for auth, location, etc.
Each microservice has its own database, and some data is duplicated across them; for example, location appears in all of the databases for these services. When any of my projects needs a user's location, it first looks in the cache and, if not found, hits the database. So far so good. Now, when a location changes in any one of the databases, I need to update it in the other databases as well as in the cache.
Currently I have a model (called Subscription) with a URL field. Whenever a location changes in any database, an object of this Subscription model is created. A periodic task keeps checking the Subscription model; when it finds such objects, it hits the APIs of the other services to update their location data and refreshes the cache.
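For context, a minimal sketch of that pattern might look like this (assuming Django models and some periodic task runner; all names here are illustrative, not the actual code):

# Hypothetical sketch of the subscription approach described above.
import requests
from django.db import models

class Subscription(models.Model):
    # Endpoint of a peer service that must be told about the change
    url = models.URLField()
    payload = models.JSONField()  # the changed location data

def process_subscriptions():
    """Periodic task: push pending changes to the other services and caches."""
    for sub in Subscription.objects.all():
        response = requests.post(sub.url, json=sub.payload, timeout=5)
        if response.ok:
            sub.delete()  # delivered, so drop it from the queue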
I am wondering if there is any better way to do this?

I am wondering if there is any better way to do this?
"Better" is entirely subjective. If it meets your needs, it's fine.
Something to consider, though: don't store the same information in more than one place.
If you need an address, look it up from the service that provides addresses, every time.
This may be a performance hit, but it eliminates the problem of replicating the data everywhere.
Another option would be a more proactive approach, as suggested in the comments.
Instead of creating a task list of changes and processing it periodically, send a message across RabbitMQ immediately when the change happens. Let every service that needs to know get a copy of the message and update its own cache.
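A minimal sketch of that push model, using pika (a common RabbitMQ client for Python); the exchange name and message shape are assumptions, not part of the question:

import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
# A fanout exchange copies each message to every service's bound queue.
channel.exchange_declare(exchange="location_changed", exchange_type="fanout")

def publish_location_change(user_id, location):
    channel.basic_publish(
        exchange="location_changed",
        routing_key="",  # ignored by fanout exchanges
        body=json.dumps({"user_id": user_id, "location": location}),
    )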
Just remember, though: every time you have more than one copy of the information, you reduce the "correctness" of the system as a whole. It will always be possible for the information found in one of your apps to be out of date, because it has not yet received an update from the official source.

Related

How to handle multi-user database interaction with PyQt5

I am developing a GUI app that will supposedly be used by multiple users. In my app, I use QAbstractTableModel to display an MS Access database (stored on a local server, accessed by several PCs) in a QTableView. I have developed everything I needed for single-user interaction, but now I'm moving to the step where I need to think about multi-user interaction.
For example, if user A changes a specific line, the instance of the app on user B's PC needs to show the changed line. Another example: if user A is modifying a specific line and user B also wants to modify it, user B needs to be notified ("already being modified, please wait"), and once user A's modification is done, user B needs to see that modification before interacting with the line.
Today, because of the local nature of the MS Access database, I have to refresh the table view many times, based on user interaction, in order not to miss any database modification from other potential users. This is rather greedy in terms of performance and resources.
I was thinking about using Django to make the different app instances communicate with each other, but maybe I'm overthinking it and there are other solutions.
I'm not sure if this is clear; I'm happy to provide more information!
Perhaps you could simply store a "lastUpdated" timestamp on the row, refreshing it with each update.
Now, when you submit an update, you include that timestamp, and if the timestamps don't match, you let the user know and handle the conflict on the frontend (perhaps a simple choice between "keep my local copy" and "force the update, overwriting the server copy").
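A rough sketch of that optimistic check (shown with sqlite3 for brevity, since the exact Access/ODBC calls differ; the table and column names are made up):

import sqlite3

def try_update(conn, row_id, new_value, seen_timestamp):
    cur = conn.execute(
        "UPDATE items SET value = ?, lastUpdated = datetime('now') "
        "WHERE id = ? AND lastUpdated = ?",
        (new_value, row_id, seen_timestamp),
    )
    conn.commit()
    # rowcount == 0 means another user changed the row first: report a conflict
    return cur.rowcount == 1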
That's a simple and robust solution, but if you don't want users wasting time writing updates for stale rows, you could use WebSockets to push a message from the server to any clients with that row open for editing, letting them know that the row has changed.
If you want to "lock" rows while they are being edited, you could store an "inUse" boolean and have users check its value before continuing.
Usually, when using an MVC pattern (which is what QAbstractTableModel + QTableView is), the responsibility for updating the view lies with the model itself, i.e. it's the model that should notify the view that something changed.
It seems that QAbstractTableModel has a dataChanged signal that gets emitted on data changes.
I suggest connecting it to your view's refresh slot, as done here.
In this way you avoid the need for another moving part/infrastructure component (Django).
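A minimal sketch of that wiring in PyQt5; the apply_external_change method (however you detect other users' edits) is an assumption:

from PyQt5.QtCore import QAbstractTableModel, QModelIndex, Qt

class AccessTableModel(QAbstractTableModel):
    def __init__(self, rows, parent=None):
        super().__init__(parent)
        self._rows = rows

    def rowCount(self, parent=QModelIndex()):
        return len(self._rows)

    def columnCount(self, parent=QModelIndex()):
        return len(self._rows[0]) if self._rows else 0

    def data(self, index, role=Qt.DisplayRole):
        if role == Qt.DisplayRole:
            return self._rows[index.row()][index.column()]
        return None

    def apply_external_change(self, row, col, value):
        # Called when polling/notification reports another user's edit.
        self._rows[row][col] = value
        idx = self.index(row, col)
        # Any attached QTableView repaints the affected cell on this signal.
        self.dataChanged.emit(idx, idx, [Qt.DisplayRole])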

How do I structure a database cache (memcached/Redis) for a Python web app with many different variables for querying?

For my app, I am using Flask, however the question I am asking is more general and can be applied to any Python web framework.
I am building a comparison website where I can update details about products in the database. I want to structure my app so that 99% of users who visit my website will never need to query the database, where information is instead retrieved from the cache (memcached or Redis).
I require my app to be realtime, so any update I make to the database must be instantly available to any visitor to the site. Therefore I do not want to cache views/routes/html.
I want to cache the entire database. However, because there are so many different variables when it comes to querying, I am not sure how to structure this. For example, if I were to cache every query, then later updating a single product in the database would essentially require flushing the entire cache, which isn't ideal for a large web app.
What I would prefer is to cache individual rows of the database. The problem is, how do I structure this so I can invalidate the right cache entries when the database is updated? And how can I map all of this back together from the cache?
I hope this makes sense.
I had this exact question myself, though with a PHP project. My solution was to use Elasticsearch as an intermediate cache between the application and the database.
The trick to this is the ORM. I designed it so that when Entity.save() is called, the entity is first stored in the database, then the complete object (with all of its references) is pushed to Elasticsearch, and only then is the transaction committed and control returned to the caller.
This way I kept the full functionality of a relational database (atomic changes, transactions, constraints, triggers, etc.) while still having all entities cached with all their references (parent and child relations), together with the ability to invalidate individual cached objects.
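Translated to Python (the original was PHP), the write-through save might look roughly like this, using the official elasticsearch client; the schema and index name are illustrative:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def save_product(db_conn, product):
    with db_conn:  # transaction commits only if both steps succeed
        db_conn.execute(
            "UPDATE products SET name = ?, price = ? WHERE id = ?",
            (product["name"], product["price"], product["id"]),
        )
        # Push the complete, denormalised object into the index so reads
        # never touch the relational database; a failure here rolls back.
        es.index(index="products", id=product["id"], document=product)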
Hope this helps.
A free eBook called "Redis in Action" by Josiah Carlson answered all of my questions. It is quite long, but after reading through it I have a fairly solid understanding of how to structure a caching architecture. It gives real-world examples, such as a social network and a shopping site with tons of traffic. I will need to read through it once or twice more to fully understand it. A great book!
Link: Redis in Action

Update existing cache data with newer items in Django

I want to use caching in Django and I am stuck on how to go about it. I have data in some specific models which are write-intensive; records get added to the model continuously. Each user has some specific data in the model, similar to an orders table.
Since my model is write-intensive, I am not sure how effective Django's caching framework will be. I tried Django's per-view caching, and I am trying to develop a view that first picks up data from the cache, then makes another call to bring in the data that was added to the model after the cache was populated. What I want to do is append this newer data to the original cached data and store it again.
In other words, I don't want to expire my cache; I just want to keep adding to the existing cached data. Maybe once every 3 hours I can clear it.
Is what I am doing right? Are there better ways? Can I really add items to an existing cache entry?
I would be very glad for your help.
You ask about "caching", which is a really broad topic, and the answer is always a mix of opinion, style, and the specific app's requirements. Here are a few points to consider.
If the data is per user, you can cache it per user:
from django.core.cache import cache

# Cache keys should be strings; the 3-hour timeout matches the question
cache.set(str(request.user.id), "foo", timeout=3 * 60 * 60)
cache.get(str(request.user.id))
The common practice is to keep a database flag that tells you whether the user's data has changed since it was cached. Before you fetch the data from the cache, check only this flag in the DB. If the flag says nothing changed, get the data from the cache. If it did change, pull from the DB, replace the cache, and reset the flag.
The flag check should be fast and simple: one table, indexed by user.id, with a boolean flag field. This squeezes a lot of index rows into a single DB page and enables fast fetching of a single one-field row. Yet you still get a persistent, up-to-date main store that prevents the use of stale cache data. You can check this flag in a middleware.
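A sketch of that flag pattern in Django; the model name and helper are made up for illustration:

from django.core.cache import cache
from django.db import models

class UserCacheFlag(models.Model):
    user_id = models.IntegerField(unique=True, db_index=True)
    dirty = models.BooleanField(default=True)

def get_user_data(user_id, fetch_from_db):
    flag, _ = UserCacheFlag.objects.get_or_create(user_id=user_id)
    if not flag.dirty:
        cached = cache.get(str(user_id))
        if cached is not None:
            return cached
    data = fetch_from_db(user_id)  # flag says cache is stale: rebuild it
    cache.set(str(user_id), data)
    UserCacheFlag.objects.filter(user_id=user_id).update(dirty=False)
    return data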
You can run expiry in many ways: clear the cache when the user logs out, run a cron script that clears items, or let the cache backend expire items. If you check the flag before using the cache, there is no issue with keeping items in the cache except space, and caching backends handle that. If you use Django's simple file-based cache (which is easy, simple, and zero-config), you will have to clear the cache yourself; a simple cron script will do.

In need of a light, changing database/storage solution

I have a Python Flask app I'm writing, and I'm about to start on the backend. The main part of it involves users POSTing data to the backend, usually a small piece of data every second or so, to later be retrieved by other users. The data will always be retrieved within under an hour, and could be retrieved in as low as a minute. I need a database or storage solution that can constantly take in and store the data, purge all data that was retrieved, and also perform a purge on data that's been in storage for longer than an hour.
I do not need a relational system; JSON/key-value should be able to handle both incoming and outgoing data. There will also be very frequent reading, writing, and deleting.
Should I go with something like MongoDB? Should I use a database system at all, and instead write to a directory full of .json files constantly, or something? (Using only files is probably a bad idea, but it's kind of the extent of what I need.)
You might look at mongoengine; we use it in production with Flask (there's an extension) and it has suited our needs well. There's also mongoalchemy, which I haven't tried, but it seems to be decently popular.
The downside of using Mongo is that there is no automatic expiry. Having said that, you might take a look at Redis, which has the ability to auto-expire items. There are a few ORMs out there that might suit your needs.
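A sketch of the Redis route with redis-py; the one-hour TTL comes from the retention requirement in the question:

import json
import redis

r = redis.Redis()

def store_reading(key, payload):
    # setex stores the value and lets Redis delete it after 3600 seconds
    r.setex(key, 3600, json.dumps(payload))

def retrieve_and_purge(key):
    value = r.get(key)
    r.delete(key)  # purge on retrieval, per the requirements
    return json.loads(value) if value else None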

App Engine app design questions

I want to load info from another site (this part is done), but I am doing it every time the page loads, and that won't do. So I was thinking of having a variable in a settings table, like "last checked BBC site"; when the page loads, it would check whether enough time has passed since the last check before checking again. Is there anything silly about doing it that way?
Also, do I absolutely have to use tables to store one-off variables like this setting?
I think there are two options that would work for you, besides creating an entity in the datastore to keep track of the "last visited time".
One way is to just check the external page periodically, using the cron api as described by jldupont.
The second way is to store the last visited time in memcache. Although memcache is not permanent, it doesn't have to be if you are only storing last refresh times. If your entry in memcache were to disappear for some reason, the worst that would happen would be that you would fetch the page again, and update memcache with the current date/time.
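A sketch of that memcache approach with the (Python 2) App Engine APIs; the key name and refresh interval are assumptions:

import datetime
from google.appengine.api import memcache, urlfetch

REFRESH_INTERVAL = datetime.timedelta(minutes=30)

def fetch_external_page_if_stale(url):
    last = memcache.get('last_checked_bbc')
    now = datetime.datetime.utcnow()
    if last is None or now - last > REFRESH_INTERVAL:
        result = urlfetch.fetch(url)  # actually re-fetch the external page
        memcache.set('last_checked_bbc', now)
        return result.content
    return None  # checked recently; reuse whatever was stored last time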
The first way would be best if you want to check the external page at regular intervals. The second way might be better if you want to check the external page only when a user clicks on your page, and you haven't fetched that page yourself in the recent past. With this method, you aren't wasting resources fetching the external page unless someone is actually looking for data related to it.
You could also use Scheduled Tasks.
Also, you don't absolutely need to use the Datastore for configuration parameters: you could have this in a script / config file.
If you want some handler in your GAE app (including one for a scheduled task, message reception, web page visits, etc.) to store new information in such a way that some handler in the future can recover it, then GAE's storage is the only good general-purpose way (memcache could expire from under you, for example). Not sure what you mean by "tables" (?!), but guessing that you actually mean GAE's storage, the answer is "yes". (Under very specific circumstances you might want to put that data somewhere else on the network, such as your visitor's browser, e.g. via cookies, or an Amazon storage instance, etc., but those circumstances do not appear to apply to your use case.)
