The best way to handle large databases in Python/Flask projects


I recently started using Flask and I like it a lot.
In the past I had a system in PHP with a lot of databases, such as marketing, HR, finance and so on.
Each of these databases had its own tables; HR, for example, had employers, companies and so on.
Each of these tables was a class in PHP. We used this system to simplify saving and deleting, since the tables were used all over the system: all we had to do was instantiate a new object from one of the table classes, where each column was an object property, and then call $obj->Save() to insert a new row in the table.
Programming has evolved a lot since then, so my question is whether there is a more efficient way to do this in Python/Flask, instead of creating a class for each table in the databases as I did in PHP. I know this is a broad question, so I would appreciate recommendations of books, wikis and so on about this topic.

A fairly modern approach to interface with a database in high-level programming languages is to use an ORM, or Object Relational Mapper. See this Stack Overflow thread for a good explanation.
If you are using Flask, SQLAlchemy is the most popular choice, so much so that Flask has a dedicated extension called Flask-SQLAlchemy. Keep in mind that you will still be mapping classes to database entities. However, the power of SQLAlchemy is that it provides a higher level of abstraction on top of the database, which can go beyond simply mapping a class to a table row. According to the documentation:
SQLAlchemy considers the database to be a relational algebra engine, not just a collection of tables. Rows can be selected from not only tables but also joins and other select statements; any of these units can be composed into a larger structure. SQLAlchemy's expression language builds on this concept from its core.
This Stack Overflow thread provides more Python ORM suggestions.
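As a rough illustration of the class-to-table mapping described above, here is a minimal sketch using plain SQLAlchemy (not Flask-SQLAlchemy, to keep it self-contained). The `Employee` class, its columns, and the in-memory SQLite URL are assumptions made up for this example; it assumes SQLAlchemy 1.4 or later is installed.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Employee(Base):
    """Hypothetical model: one class per table, one attribute per column."""
    __tablename__ = "employees"

    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    company = Column(String(100))


# In-memory SQLite is used only so the sketch runs without a database server.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)  # emits CREATE TABLE for every mapped class
Session = sessionmaker(bind=engine)

session = Session()
session.add(Employee(name="Ada", company="Acme"))  # roughly what $obj->Save() did
session.commit()

# Queries are built from the mapped class instead of SQL strings.
names = [e.name for e in session.query(Employee).filter_by(company="Acme")]
session.close()
```

The class definition replaces hand-written INSERT and SELECT statements, which is close to what the PHP system in the question did, with the SQL generated by the library.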

Related

Advantage of Django ORM V/S Performing raw SQL queries [duplicate]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
If you had to make the case for an ORM to management or a client, what reasons (the "pros") would you give for using one?
Try and keep one reason per answer so that we can see which one gets voted up as the best reason.

The most important reason to use an ORM is so that you can have a rich, object oriented business model and still be able to store it and write effective queries quickly against a relational database. From my viewpoint, I don't see any real advantages that a good ORM gives you when compared with other generated DAL's other than the advanced types of queries you can write.
One type of query I am thinking of is a polymorphic query. A simple ORM query might select all shapes in your database. You get a collection of shapes back. But each instance is a square, circle or rectangle according to its discriminator.
Another type of query would be one that eagerly fetches an object and one or more related objects or collections in a single database call. e.g. Each shape object is returned with its vertex and side collections populated.
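The polymorphic query described above can be sketched by hand to show what an ORM does behind the scenes. This is only an illustration using the standard-library sqlite3 module; the `shapes` table, the `kind` discriminator column, and the class names are assumptions invented for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Shape:
    id: int


@dataclass
class Circle(Shape):
    radius: float


@dataclass
class Square(Shape):
    side: float


# Hypothetical single-table layout: `kind` is the discriminator column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shapes (id INTEGER PRIMARY KEY, kind TEXT, size REAL)")
conn.executemany(
    "INSERT INTO shapes (kind, size) VALUES (?, ?)",
    [("circle", 1.5), ("square", 2.0)],
)

# Maps each discriminator value to the class that should be instantiated.
SHAPE_CLASSES = {"circle": Circle, "square": Square}


def load_shapes(connection):
    """Return every shape as an instance of the subclass named by `kind`."""
    rows = connection.execute("SELECT id, kind, size FROM shapes ORDER BY id")
    return [SHAPE_CLASSES[kind](id_, size) for id_, kind, size in rows]


shapes = load_shapes(conn)
```

One query returns a mixed collection, and each element already has the right type. A full ORM adds this dispatch, plus eager loading of related collections, without the hand-written lookup table.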
I'm sorry to disagree with so many others here, but I don't think that code generation is a good enough reason by itself to go with an ORM. You can write or find many good DAL templates for code generators that do not have the conceptual or performance overhead that ORM's do.
Or, if you think that you don't need to know how to write good SQL to use an ORM, again, I disagree. It might be true that from the perspective of writing single queries, relying on an ORM is easier. But, with ORM's it is far too easy to create poor performing routines when developers don't understand how their queries work with the ORM and the SQL they translate into.
Having a data layer that works against multiple databases can be a benefit. It's not one that I have had to rely on that often though.
In the end, I have to reiterate that in my experience, if you are not using the more advanced query features of your ORM, there are other options that solve the remaining problems with less learning and fewer CPU cycles.
Oh yeah, some developers do find working with ORM's to be fun so ORM's are also good from the keep-your-developers-happy perspective. =)

Speeding development. For example, eliminating repetitive code like mapping query result fields to object members and vice-versa.
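As a small illustration of the repetitive mapping code this refers to, the sketch below writes the row-to-object mapping once and reuses it for any model. It uses only the standard library; the `Employee` dataclass, the `employees` table, and the `fetch_all` helper are hypothetical names for this example.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Employee:
    id: int
    name: str
    company: str


def fetch_all(connection, model, table):
    """Build `model` instances from `table`, matching columns to field names.

    `table` is interpolated into the SQL, so it must be a trusted identifier.
    """
    columns = ", ".join(f.name for f in fields(model))
    cursor = connection.execute(f"SELECT {columns} FROM {table}")
    return [model(*row) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, company TEXT)")
conn.execute("INSERT INTO employees (name, company) VALUES ('Ada', 'Acme')")

employees = fetch_all(conn, Employee, "employees")
```

Without a helper like this (or an ORM), every query needs its own code to copy result fields into object members.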

Making data access more abstract and portable. ORM implementation classes know how to write vendor-specific SQL, so you don't have to.

Supporting OO encapsulation of business rules in your data access layer. You can write (and debug) business rules in your application language of preference, instead of clunky trigger and stored procedure languages.

Generating boilerplate code for basic CRUD operations. Some ORM frameworks can inspect database metadata directly, read metadata mapping files, or use declarative class properties.

You can move to different database software easily because you are developing to an abstraction.

Development happiness, IMO. ORM abstracts away a lot of the bare-metal stuff you have to do in SQL. It keeps your code base simple: fewer source files to manage and schema changes don't require hours of upkeep.
I'm currently using an ORM and it has sped up my development.

So that your object model and persistence model match.

To minimise duplication of simple SQL queries.

The reason I'm looking into it is to avoid the generated code from VS2005's DAL tools (schema mapping, TableAdapters).
The DAL/BLL I created over a year ago was working fine (for what I had built it for) until someone else started using it to take advantage of some of the generated functions (which I had no idea were there).
It looks like it will provide a much more intuitive and cleaner solution than the DAL/BLL solution from http://wwww.asp.net
I was thinking about creating my own SQL Command C# DAL code generator, but the ORM looks like a more elegant solution.

Abstract the sql away 95% of the time so not everyone on the team needs to know how to write super efficient database specific queries.

I think there are a lot of good points here (portability, ease of development/maintenance, focus on OO business modeling etc), but when trying to convince your client or management, it all boils down to how much money you will save by using an ORM.
Do some estimations for typical tasks (or even larger projects that might be coming up) and you'll (hopefully!) get a few arguments for switching that are hard to ignore.

Compilation and testing of queries.
As the tooling for ORM's improves, it is easier to determine the correctness of your queries faster through compile time errors and tests.
Compiling your queries helps developers find errors faster. Right? Right. This compilation is made possible because developers are now writing queries in code using their business objects or models instead of just strings of SQL or SQL-like statements.
If you use the correct data access patterns in .NET, it is easy to unit test your query logic against in-memory collections. This speeds up the execution of your tests because you don't need to access the database, set up data in the database or even spin up a full-blown data context. [EDIT] This isn't as true as I thought it was, as unit testing in memory can present difficult challenges to overcome. But I still find these integration tests easier to write than in previous years. [/EDIT]
This is definitely more relevant today than a few years ago when the question was asked, but that may only be the case for Visual Studio and Entity Framework, where my experience lies. Plug in your own environment if possible.

.netTiers using CodeSmith templates
http://nettiers.com/default.aspx?AspxAutoDetectCookieSupport=1
Why code something that can be generated just as well?

Convince them of how much time and money you will save when changes come in and you don't have to rewrite your SQL, since the ORM tool will do that for you.

I think one con is that an ORM requires some changes to your POJOs, mainly related to schema, relations and queries. Consider a scenario where you are not supposed to change the model objects, perhaps because they are shared among more than one project or between client and server. In such cases you will need to split the model into two levels, which requires additional effort.
I am an Android developer and, as you know, mobile apps are usually not huge, so this additional effort to separate the pure model from the ORM-affected model does not seem worthwhile.
I understand that the question is a generic one, but mobile apps also fall under that generic umbrella.

Sharing an ORM between languages

I am making a database with data in it. That database has two customers: 1) a .NET webserver that makes the data visible to users somehow someway. 2) a python dataminer that creates the data and populates the tables.
I have several options. I can use the .NET Entity Framework to create the database, then reverse engineer it on the python side. I can vice versa that. I can just write raw SQL statements in one or the other systems, or both. What are possible pitfalls of doing this one way or the other? I'm worried, for example, that if I use the python ORM to create the tables, then I'm going to have a hard time in the .NET space...

I love questions like that.
Here is what you have to consider: your web site has to be fast, and the bottleneck of most web sites is the database. The answer to your question would be to make it easy for .NET to work with SQL. That will require a little more work on the Python side, like specifying the names of the tables, maybe row names. I think Django and SQLAlchemy are both good for that.
Another solution could be to have a bridge between the database with the gathered data and the database used to display data. In the background you can have a task or job that migrates collected data to your main database. That is also an option and will make your job easier; at least all the database-specific and strange code will go into the third component.
I worked with .NET for quite a long time before I switched to Python, and what you should know is that whatever strategy you choose, it will be possible to work with the data in both languages and ORMs. Do the hardest part of the job in the language you know better. If you are a Python developer, pick Python to deal with getting the names of tables and rows right.

python database abstraction to store datastructures unpickled

I am looking for a generic way to store python objects in a database. Of course I could just pickle the objects, but that way I would have binary blobs in my database. That way I can not search my objects. Also it seems to be easier to put it together with other applications.
So in my fantasy, I have an object like
class myClass:
    data1 = 1
    data2 = 'foobar'
    data3 = some_html_object
    ...
and could do something like
mydata=myClass()
mydata.add_data(various_things)
mydata.save_to_database()
and would end up with a database which has columns called data1, data2, data3, where the values of the object's attributes are stored in the rows as text, which would be searchable. Of course some initial setup would have to be done.
And of course it would be nice if I could plug any database I want (well, at least not just one database) and would not be bothered with the details.
Now of course I could program my own framework to let me do this, but I was hoping that this has been done before by someone else :)
Any suggestions?

Your fantasy in fact exists!
You describe something called the Active Record pattern. It is usually implemented by using Object-Relational Mapping. One common solution for Python is SQLAlchemy, but Storm is somewhat popular too:
See What are some good Python ORM solutions?
If you are developing for the Web, Django has its own ORM.
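To make the Active Record idea concrete, here is a deliberately tiny sketch built on the standard-library sqlite3 module. It is not how SQLAlchemy, Storm or Django are implemented; the `Record` base class, its `table` and `columns` attributes, and the `my_class` table name are all assumptions for illustration. Every value is stored as text, as the question asks.

```python
import sqlite3


class Record:
    """Tiny Active Record style base class (illustrative, not a real library)."""

    table = None    # table name, set by subclasses
    columns = ()    # attribute names that are persisted
    connection = sqlite3.connect(":memory:")  # shared by all subclasses

    @classmethod
    def create_table(cls):
        # The one-off "initial setup": one TEXT column per persisted attribute.
        cols = ", ".join(f"{name} TEXT" for name in cls.columns)
        cls.connection.execute(
            f"CREATE TABLE IF NOT EXISTS {cls.table} (id INTEGER PRIMARY KEY, {cols})"
        )

    def save_to_database(self):
        # Insert the current attribute values as a new row, stored as text.
        names = ", ".join(self.columns)
        marks = ", ".join("?" for _ in self.columns)
        values = [str(getattr(self, name)) for name in self.columns]
        cursor = self.connection.execute(
            f"INSERT INTO {self.table} ({names}) VALUES ({marks})", values
        )
        self.connection.commit()
        self.id = cursor.lastrowid


class MyClass(Record):
    table = "my_class"
    columns = ("data1", "data2")

    def __init__(self, data1, data2):
        self.data1 = data1
        self.data2 = data2


MyClass.create_table()
mydata = MyClass(1, "foobar")
mydata.save_to_database()

# The stored attributes are plain text columns, so they can be searched with SQL.
rows = Record.connection.execute(
    "SELECT data1, data2 FROM my_class WHERE data2 = 'foobar'"
).fetchall()
```

A real ORM adds what this sketch leaves out: typed columns, updates and deletes, relationships, and support for more than one database backend.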

It sounds like what you want is an Object Relational Mapper (ORM) to map SQL tables to objects.
The most popular ORMs that support multiple SQL dialects, grouped by language community, are the following:
Python -- SQLAlchemy, Storm, Django (built into web framework)
Ruby -- ActiveRecord, Sequel
Node -- Sequelize
For a specific example of implementing what you described in Python using SQLAlchemy, check out this blog post that walks through a simple example

What is the difference between sqlite3 and sqlalchemy?

Beginner question- what is the difference between sqlite and sqlalchemy?

They're apples and oranges.
Sqlite is a database storage engine, which can be better compared with things such as MySQL, PostgreSQL, Oracle, MSSQL, etc. It is used to store and retrieve structured data from files.
SQLAlchemy is a Python library that provides an object relational mapper (ORM). It does what it suggests: it maps your databases (tables, etc.) to Python objects, so that you can more easily and natively interact with them. SQLAlchemy can be used with sqlite, MySQL, PostgreSQL, etc.
So, an ORM provides a set of tools that let you interact with your database models consistently across database engines.
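For comparison, this is roughly what working with SQLite directly looks like through Python's standard-library sqlite3 module, with no ORM involved. The `users` table and its contents are made up for the example.

```python
import sqlite3

# sqlite3 is the standard-library driver for the SQLite engine:
# you write the SQL yourself and get plain tuples back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
conn.commit()

row = conn.execute(
    "SELECT id, name FROM users WHERE name = ?", ("alice",)
).fetchone()
conn.close()
```

With SQLAlchemy on top, the same table would be described by a Python class and the SQL would be generated for you, against SQLite or any other supported engine.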

sqlite3 is an embedded RDBMS.
According to this article:
A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational database model.
A short definition of an RDBMS may be a DBMS in which data is stored in the form of tables and the relationship among the data is also stored in the form of tables.
SQLAlchemy is a Python ORM.
According to this article:
Object-relational mapping (ORM, O/RM, and O/R mapping) in computer software is a programming technique for converting data between incompatible type systems in object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language.

Map raw SQL to multiple related Django models

For performance reasons I can't use the ORM query methods of Django and I have to use raw SQL for some complex queries. I want to find a way to map the results of a SQL query to several models.
I know I can use the following statement to map the query results to one model, but I can't figure how to use it to be able to map to related models (like I can do by using the select_related statement in Django).
model_instance = MyModel(**dict(zip(field_names, row_data)))
Is there a relatively easy way to be able to map fields of related tables that are also in the query result set?

First, can you prove the ORM is stopping your performance? Sometimes performance problems are simply poor database design, or improper indexes. Usually this comes from trying to force-fit Django's ORM onto a legacy database design. Stored procedures and triggers can have adverse impact on performance -- especially when working with Django where the trigger code is expected to be in the Python model code.
Sometimes poor performance is an application issue. This includes needless order-by operations being done in the database.
The most common performance problem is an application that "over-fetches" data. Casually using the .all() method and creating large in-memory collections. This will crush performance. The Django query sets have to be touched as little as possible so that the query set iterator is given to the template for display.
Once you choose to bypass the ORM, you have to fight out the Object-Relational Impedance Mismatch problem. Again. Specifically, relational "navigation" has no concept of "related": it has to be a first-class fetch of a relational set using foreign keys. To assemble a complex in-memory object model via SQL is simply hard. Circular references make this very hard; resolving FK's into collections is hard.
If you're going to use raw SQL, you have two choices.
Eschew "select related" -- it doesn't exist -- and it's painful to implement.
Invent your own ORM-like "select related" features. A common approach is to add stateful getters that (a) check a private cache to see if they've fetched the related object and if the object doesn't exist, (b) fetch the related object from the database and update the cache.
In the process of inventing your own stateful getters, you'll be reinventing Django's, and you'll probably discover that it isn't the ORM layer, but a database design or an application design issue.
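The stateful getter described above could look roughly like this. The sketch uses plain Python classes and the standard-library sqlite3 module as stand-ins for Django models and a Django connection; the `author` and `book` tables and all names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Codd');
    INSERT INTO book (id, title, author_id) VALUES (10, 'Relational Model', 1);
    """
)


class Author:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class Book:
    def __init__(self, id, title, author_id):
        self.id = id
        self.title = title
        self.author_id = author_id
        self._author = None  # private cache for the related object

    @property
    def author(self):
        # (a) reuse the cached object; (b) otherwise fetch it once and cache it.
        if self._author is None:
            row = conn.execute(
                "SELECT id, name FROM author WHERE id = ?", (self.author_id,)
            ).fetchone()
            self._author = Author(*row)
        return self._author


book = Book(*conn.execute("SELECT id, title, author_id FROM book").fetchone())
first = book.author   # triggers one query
second = book.author  # served from the cache
```

This costs one extra query per related object the first time it is touched, which is the lazy-loading behaviour an ORM already provides.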
