Redis and RDBMS coexistence (hopefully cooperation) in Flask applications

I'm developing a multi-player game in Python with a Flask frontend, and I'm using it as an opportunity to learn more about the NoSQL way of doing things.
Redis seems to be a good fit for some of the things I need for this app, including storage of server-side sessions and other transient data, e.g. what games are in progress, who's online, etc. There are also several good Flask/Redis recipes that have made things very easy so far.
However, there are still some things in the data model that I would prefer lived inside a traditional RDBMS, including user accounts, logs of completed games, etc. It's not that Redis can't do these things, but I just think the RDBMS is more suited to them, and since Redis wants everything in memory, it seems to make sense to "warehouse" some of this data on disk.
The one thing I don't quite have a good strategy for is how to make these two data stores live happily together. Using ORMs like SQLAlchemy and/or redisco seems right out, because the ORMs are going to want to own all the data that's part of their data model, and there are inevitably times I'm going to need to have classes from one ORM know about classes from the other one (e.g. "users are in the RDBMS, but games are in Redis, and games have users participating in them").
Does anyone have any experience deploying python web apps using a NoSQL store like Redis for some things and an RDBMS for others? If so, do you have any strategies for making them work together?

You should have no problem using an ORM because, in the end, it just stores strings, numbers and other values. So you could have a game in progress, and keep its state in Redis, including the players' IDs from the SQL player table, because the ID is just a unique integer.
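For illustration, here is a minimal sketch of that pattern, assuming a SQLAlchemy User model and the redis-py client; the key names and fields are hypothetical:

    # A minimal sketch of bridging the two stores by primary key.
    # Assumes SQLAlchemy and redis-py; key names and fields are hypothetical.
    import json
    import redis
    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(80))

    engine = create_engine("sqlite:///app.db")
    Base.metadata.create_all(engine)
    r = redis.Redis()

    def start_game(game_id, player_ids):
        # Redis only ever sees plain integers and strings - no ORM coupling.
        r.hset(f"game:{game_id}", mapping={
            "state": "in_progress",
            "players": json.dumps(player_ids),  # SQL primary keys
        })

    def game_players(game_id, session):
        # Rehydrate the RDBMS-side objects from the IDs stored in Redis.
        ids = json.loads(r.hget(f"game:{game_id}", "players"))
        return session.query(User).filter(User.id.in_(ids)).all()

Neither store owns the other's objects; the only contract between them is the integer ID.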

Related

Handling repetitive content within django apps

I am currently building a tool in Django for managing the design information within an engineering department. The idea is to have a common catalogue of items accessible to all projects. However, the projects would be restricted based on user groups.
For each project, you can import items from the catalogue and change them within the project. There is a requirement that each project must be linked to a different database.
I am not entirely sure how to approach this problem. From what I have read, the solution I came up with is to have multiple Django apps: one representing the common catalogue of items (linked to its own database), and then an app for each project (which can read from and write to its own database, but can additionally read from the common catalogue database). This way I can restrict which users can access which database/project. However, the problem with this solution is that it is not DRY. All the projects look the same: same models, same forms, same templates. They are just linked to different databases, and I do not know how to do this in a smart way (without copy-pasting entire files, because I think maintaining that would be a pain).
I was thinking that this could be avoided by changing the database label when doing queries (employing the using() method) depending on the group of the authenticated user. The problem with this is that a user can have access to multiple projects. So I am again at a loss.
It looks to me like all you need is a single application that manages its access properly.
If the requirement is to have separate DBs then I will not argue with that, but... there is always a small chance that separate tables in one DB is what they will accept.
Django apps don't segregate objects; they are a way of structuring your code base, the idea being that an app can be re-used in other projects. Having a separate app for your catalogue of items and for your projects is a good idea, but having them together in one app is not a problem if you have a small codebase.
If I have understood your post correctly, what you want is for the databases of different departments to be separate. This is essentially a multi-tenancy question, which is a big topic in itself. There are a few options:
Code separation - all of your projects/departments exist in a single database and schema, but are separated by code that filters by department depending on who the end user is (literally by using Django's .filter()). This is easy to do, but there is a risk that data could be leaked to the wrong user if you get your code wrong. I would recommend this one for your use case; see the sketch after this list.
Schema separation - you are still using a single database, but each department has its own schema. You would need to use PostgreSQL for this, but once a schema has been set there is far less chance of data being visible to the wrong user. There are Django libraries such as django-tenants that can do a lot of the heavy lifting.
Database separation - each department has its own database. There is even less chance that data will be leaked, but you have to manage multiple databases and it is more difficult to scale. You can manage this through Django, which has multi-database support.
Application separation - each department not only has its own database but also its own application instance. The separation is absolute, but you need to manage multiple applications on a host like Heroku, which is even less scalable.
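A minimal sketch of the code-separation option (the model and field names below are hypothetical):

    # models.py - a hypothetical tenancy-by-filter setup
    from django.db import models
    from django.contrib.auth.models import User

    class Department(models.Model):
        name = models.CharField(max_length=100)
        members = models.ManyToManyField(User, related_name="departments")

    class Item(models.Model):
        department = models.ForeignKey(Department, on_delete=models.CASCADE)
        name = models.CharField(max_length=200)

    def items_for(user):
        # The filter is the only wall between tenants, so every
        # query path in the app must apply it.
        return Item.objects.filter(department__members=user)

The weakness is exactly what is noted above: forget the filter in one view and data leaks across departments.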

What did my teacher mean by 'db.sqlite3' will fall short in bigger real world problems?

I am very new to programming and this site too... An online course that I follow told me that it is not possible to manage bigger databases with db.sqlite3. What does that mean, anyway?
The choice of Relational Database Management System (RDBMS) depends on your use case. The different options available have different pros and cons, so for different applications some are more suitable than others.
I typically use SQLite (only for development purposes) and then switch to MySQL for my Django projects.
SQLite: Is file-based. You can actually see the file in your project directory, so all the CRUD (Create, Retrieve, Update, Delete) is done directly on that file. The underlying RDBMS code is also quite small. All of this makes it good for applications which don't require intensive use of databases or which need embedded/offline storage, e.g. IoT devices, small websites, etc. When you try to use it for big projects that require intensive database use, e.g. online stores, you run into problems, because the engine is not as fully featured as MySQL or PostgreSQL. The primary problem is a lack of concurrency: only one connection can be writing to the database at a time, because write operations are serialised.
MySQL: Is one of the most popular options and my personal favourite (very easy to configure and use with Django). It is based on the client/server model rather than a single file like SQLite, and it is far more scalable: you can use it for many different applications that make heavy use of the RDBMS. It has proper access control, allows concurrent writes, and has traditionally done well on read-heavy workloads.
PostgreSQL: Is also a very strong option, capable of most of what MySQL can do, but it handles clients differently and is often cited as having an edge over MySQL for complex SELECTs and heavy INSERT loads. MySQL is still much more widely used than PostgreSQL, though.
There are also many other options on the market, and plenty of articles comparing them. But to answer your question: SQLite is very simple compared with the other options and stores everything in a file in your project rather than on a server. As a result there is little access control, a lack of concurrency, etc. This is fine during development and for use cases that do not make major use of the database, but it will not cut it for big projects.
This is not a matter of how big the DB is; a SQLite database can be very big, hundreds of gigabytes.
It is a matter of how many users are using the application (you mention Django) concurrently. As SQLite supports only one writer at a time, the others are queued. Fortunately, you can have many concurrent readers.
So if you have a lot of concurrent accesses (that are not explicitly read-only), then SQLite is no longer a good choice; you'll prefer something like PostgreSQL.
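You can see the single-writer behaviour directly; this standalone sketch uses only the standard library:

    # Two connections to the same file: the second writer is locked out
    # until the first commits.
    import sqlite3

    conn1 = sqlite3.connect("demo.db")
    conn1.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    conn1.execute("BEGIN IMMEDIATE")           # conn1 takes the write lock
    conn1.execute("INSERT INTO t VALUES (1)")

    conn2 = sqlite3.connect("demo.db", timeout=0.1)
    try:
        conn2.execute("INSERT INTO t VALUES (2)")  # second concurrent writer
    except sqlite3.OperationalError as e:
        print(e)  # "database is locked" - writers are serialised

    conn1.commit()                             # lock released
    conn2.execute("INSERT INTO t VALUES (2)")  # now it goes through
    conn2.commit()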
BTW, everything is better explained in the documentation ;)
Selecting a database for your project is like selecting any other technology. It depends on your use case.
Size isn't the issue, complexity is. SQLite3 databases can grow as big as 281 terabytes, and the limits on the number of tables, columns and rows are also pretty generous.
If your application logic requires SQL operations like:
RIGHT OUTER JOIN or FULL OUTER JOIN (only added in SQLite 3.39)
ALTER TABLE ... ADD CONSTRAINT
DELETE, INSERT, or UPDATE on a VIEW
Per-user read/write permissions (GRANT/REVOKE)
then SQLite3 should not be your choice of database, as these SQL features are missing or restricted in SQLite3.
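You can check some of these limits directly; a small standalone sketch:

    # Demonstrating a few of SQLite's unsupported features,
    # using only the Python standard library.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("CREATE VIEW v AS SELECT x FROM t")

    for sql in (
        "ALTER TABLE t ADD CONSTRAINT positive CHECK (x > 0)",
        "INSERT INTO v VALUES (1)",    # views are read-only without triggers
        "GRANT SELECT ON t TO alice",  # no per-user permissions at all
    ):
        try:
            conn.execute(sql)
        except sqlite3.OperationalError as e:
            print(f"{sql!r} failed: {e}")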

Django + Scrapy multi-scraper architecture

Recently I took over a Django project, one component of which is a set of Scrapy scrapers (a lot of them - core functionality). It is worth adding that the scrapers simply feed the database several times a day, and the Django web app uses this data.
The scrapers have direct access to the Django models, but in my opinion that is not the best idea (mixed responsibilities - Django should act as a web app, not also run scrapers, shouldn't it?). For example, after such a split the scrapers could run serverless, saving money by being spawned only when needed.
I see them at least as a separate component in the architecture. But if I separated the scrapers from the Django website, then I would need to populate the DB there as well - and a change to the model in either the Django web app or the scraping app would require a matching change in the other app.
I haven't really seen articles about splitting apps like this.
What are the best practices here? Is it worth splitting? How would you organise deployment to a cloud solution (e.g. AWS)?
Thank you
Well, this is a big discussion and I have the same "good problem".
Short answer:
I suggest that if you want to separate them, you can separate the logic from the data using different schemas. I have done this before and it is a good approach.
Long answer:
The questions are:
Once you gather information from the scrapers, are you doing something with it (aggregation, treatment, or anything else)?
If the answer is yes, you can separate it into two DBs: one with the raw information and the other with the treated data (which will be the one shared with Django).
If the answer is no, I don't see any reason to separate them. In the end, Django is only the visualizer of the data.
Is the Django website using a lot of stored data that, for Single Responsibility reasons, you want to keep separate from the scraped data?
If the answer is yes, separate them by schema or even by DB.
If the answer is no, you can store it in the same DB as Django. In the end, the important data will be the extracted data. Django may have a configuration DB or other extra data to manage the web app, but the bulk of the DB will be the crawled/treated data. It depends how much it would cost you to separate and maintain them. If you are starting from the beginning, I would do it separately.
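As a sketch of that raw-vs-treated split in one PostgreSQL instance (schema, table and column names are hypothetical, and psycopg2 is assumed):

    # Scrapers write untouched payloads into a staging schema; a separate
    # job promotes cleaned rows into the schema Django reads.
    # Schema, table and column names here are hypothetical.
    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")

    def store_raw(item):
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO raw.scraped_items (payload) VALUES (%s)",
                [json.dumps(item)],
            )

    def promote():
        with conn, conn.cursor() as cur:
            cur.execute("""
                INSERT INTO public.products (name, price)
                SELECT payload->>'name', (payload->>'price')::numeric
                FROM raw.scraped_items
                WHERE processed = FALSE
            """)
            cur.execute("UPDATE raw.scraped_items SET processed = TRUE"
                        " WHERE processed = FALSE")

With this split, the scrapers never import Django models, and a model change on the Django side only affects the promote step.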

Sharing an ORM between languages

I am making a database with data in it. That database has two customers: 1) a .NET webserver that makes the data visible to users somehow someway. 2) a python dataminer that creates the data and populates the tables.
I have several options. I can use the .NET Entity Framework to create the database, then reverse-engineer it on the Python side; I can do it the other way around; or I can just write raw SQL statements in one or the other system, or both. What are the possible pitfalls of doing this one way or the other? I'm worried, for example, that if I use the Python ORM to create the tables, then I'm going to have a hard time on the .NET side...
I love questions like that.
Here is what you have to consider: your web site has to be fast, and the bottleneck of most web sites is the database. So the answer to your question would be: make it easy for .NET to work with the SQL schema. That will require a little more work on the Python side, like specifying the names of tables and maybe of columns explicitly. I think Django and SQLAlchemy are both good for that.
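For example, a sketch of pinning the names down on the SQLAlchemy side so the .NET mapping never has to guess at naming conventions (all names here are hypothetical):

    # Explicit table and column names in SQLAlchemy, so the .NET side
    # can map the exact same schema. All names are hypothetical.
    from sqlalchemy import Column, Integer, Numeric, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Measurement(Base):
        __tablename__ = "Measurements"  # the exact name EF will see
        id = Column("MeasurementId", Integer, primary_key=True)
        sensor = Column("SensorName", String(100), nullable=False)
        value = Column("Value", Numeric(18, 6), nullable=False)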
Another solution could be to have a bridge between the database with the gathered data and the database used to display data. In the background you can have a task/job that migrates collected data into your main database. That is also an option and will make your job easier; at least all the database-specific and strange code will live in that third component.
I worked with .NET for quite a long time before I switched to Python, and what you should know is that whatever strategy you choose, it will be possible to work with the data in both languages and ORMs. Do the hardest part of the job in the language you know better. If you are a Python developer, pick Python to deal with getting the names of tables and columns right.

Converting Django project from MySQL to Mongo, any major pitfalls?

I want to try Mongodb w/ mongoengine. I'm new to Django and databases and I'm having a fit with Foreign Keys, Joins, Circular Imports (you name it). I know I could eventually work through these issues but Mongo just seems like a simpler solution for what I am doing. My question is I'm using a lot of pluggable apps (Imagekit, Haystack, Registration, etc) and wanted to know if these apps will continue to work if I make the switch. Are there any known headaches that I will encounter, if so I might just keep banging my head with MySQL.
There's no reason why you can't use one of the standard RDBMSs for all the standard Django apps, and then Mongo for your app. You'll just have to replace all the standard ways of processing things from the Django ORM with doing it the Mongo way.
So you can keep urls.py and its neat pattern matching, views will still get parameters, and templates can still take objects.
You'll lose querysets because I suspect they are too closely tied to the RDBMS models - but they are just lazily evaluated lists really. Just ignore the Django docs on writing models.py and code up your database business logic in a Mongo paradigm.
Oh, and you won't have the Django Admin interface for easy access to your data.
You might want to check out django-nonrel, which is a young but promising attempt at a NoSQL backend for Django. Documentation is lacking at the moment, but it works great if you just work it out.
I've used mongoengine with Django, but you need to create a file like mongo_models.py, for example. In that file you define your Mongo documents. You then create forms to match each Mongo document. Each form has a save method which inserts or updates what's stored in Mongo. Django forms are designed to plug into any data back end (with a bit of craft).
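A minimal sketch of that pattern (document and field names are hypothetical):

    # mongo_models.py - a hypothetical document
    import mongoengine

    class Article(mongoengine.Document):
        title = mongoengine.StringField(required=True, max_length=200)
        body = mongoengine.StringField()

    # forms.py - a plain Django form that owns persistence itself
    from django import forms

    class ArticleForm(forms.Form):
        title = forms.CharField(max_length=200)
        body = forms.CharField(widget=forms.Textarea, required=False)

        def save(self):
            # The form, not the Django ORM, writes to Mongo.
            doc = Article(**self.cleaned_data)
            doc.save()
            return doc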
BEWARE: If you have very well-defined and structured data that can be described in relational models, then don't use Mongo. It's not designed for that, and something like PostgreSQL will work much better.
I use PostgreSQL for relational, well-structured data, because it's good for that: small memory footprint and good response times.
I use Redis to cache things or to operate on in-memory queues/lists, because it's very good for that: great performance, provided you have the memory to cope with it.
I use Mongo to store large JSON documents and to perform map/reduce on them (if needed), because it's very good for that. Be sure to index certain fields where you can, to speed up lookups.
Don't force a round peg into a square hole. It won't fit.
I've seen too many posts where someone wanted to swap a relational DB for Mongo because Mongo is a buzzword. Don't get me wrong, Mongo is really great... when you use it appropriately.
Up front, it won't work for any existing Django app that ships its own models. There's no backend for storing Django Model data in MongoDB or other NoSQL storage at the moment. And, database backends aside, the models themselves are a sticking point: once you start using someone's app (django.contrib apps included) that ships model-template-view triads, whenever you need a slightly different model for your purposes you either have to edit the application code (plain wrong), dynamically edit the contents of imported Python modules at runtime (magical), fork the application source altogether (cumbersome), or provide additional settings (good, but rarely available; django.contrib.auth is probably the only widely known example of an application that lets you dynamically specify which model it will use, as with user profile models through the AUTH_PROFILE_MODULE setting).
This might sound bad, but what it really means is that you'll have to deploy SQL and NoSQL databases in parallel and decide on an app-by-app basis - like Spacedman suggested - and if MongoDB is the best fit for a certain app, hell, just roll your own custom app.
There are a lot of fine Djangonauts with NoSQL storage on their minds. If you followed the streams from past DjangoCon presentations, every year there have been important discussions about how Django should leverage NoSQL storage. I'm pretty sure that, this year or the next, someone will refactor the app and model APIs to pave the way to a clean design that can finally unify all the different flavors of NoSQL storage as part of the Django core.
I have recently tried this (although without Mongoengine). There are a huge number of pitfalls, IMHO:
No admin interface.
No auth - django.contrib.auth relies on the DB interface.
Many things rely on django.contrib.auth.User. For example, the RequestContext class. This is a huge hindrance.
No registration (relies on the DB interface and django.contrib.auth).
Basically, search the Django source for references to django.contrib.auth and you'll see how many things will be broken.
That said, it's possible that MongoEngine provides some support to replace/augment django.contrib.auth with something better, but there are so many things that depend on it that it's hard to say how you'd monkey patch something that much.
Primary pitfall (for me): no JOINs!
