Database change underneath SQLObject

I'm starting a web project that will likely be fine with SQLite. I have SQLObject on top of it, but thinking long term here: if this project should require a more robust database (e.g. one able to handle high traffic), I will need to have a transition plan ready. My questions:
How easy is it to transition from one DB (SQLite) to another (MySQL, Firebird or PostgreSQL) under SQLObject?
Does SQLObject provide any tools to make such a transition easier? Is it simply a matter of taking the objects I've defined and calling createTable?
What about having multiple SQLite databases instead, e.g. one per visitor group? Does SQLObject provide a mechanism for handling this scenario and, if so, what is the mechanism to use?
Thanks,
Sean

3) is quite an interesting question. In general, SQLite is pretty useless for web-based stuff. It scales fairly well for size, but terribly for concurrency, so if you are planning to hit it with several requests at the same time, you will be in trouble.
Now your idea in part 3) of the question is to use multiple SQLite databases (e.g. one per user group, or even one per user). Unfortunately, SQLite will give you no help in this department, but it is possible. The one project I know of that has done this before is Divmod's Axiom, so I would certainly check that out.
Of course, it would probably be much easier to just use a good concurrent DB like the ones you mention (Firebird, PG, etc).
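If you do try the one-database-per-group route, the application has to do the routing itself. Below is a minimal sketch using only the standard `sqlite3` module; the `GroupDatabases` class, the file naming scheme and the allowed group-name pattern are all hypothetical. With SQLObject you would then pass the chosen connection explicitly, via the `connection=` keyword that its `select()`, `get()` and constructor calls accept, instead of relying on one process-wide connection (check the SQLObject docs for your version).

```python
import os
import re
import sqlite3


class GroupDatabases:
    """Route each visitor group to its own SQLite file (hypothetical helper)."""

    def __init__(self, directory):
        self.directory = directory
        self._conns = {}  # group name -> open sqlite3.Connection

    def connection_for(self, group):
        # Only allow simple names so a group name cannot escape `directory`.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", group):
            raise ValueError(f"invalid group name: {group!r}")
        if group not in self._conns:
            path = os.path.join(self.directory, f"{group}.sqlite3")
            self._conns[group] = sqlite3.connect(path)
        return self._conns[group]

    def close_all(self):
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()
```

Note that each group's data now lives in a separate file, so any query that spans groups has to open (or `ATTACH`) several databases; that is the part SQLite gives you no help with.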
For completeness:
1 and 2) It should be straightforward without you actually writing much code. I find SQLObject a bit restrictive in this department, and would strongly recommend SQLAlchemy instead. It is far more flexible, and if I were starting a new project today, I would certainly use it over SQLObject. Note that it won't be moving "objects" anywhere; there is no magic involved: it will be transferring rows of tables from one database to another. As mentioned, you could do that by hand, but this might save you some time.

Your success with createTable() will depend on your existing underlying table schema and data types; in other words, on how well SQLite maps to the database you choose and how SQLObject decides to map your data types.
The safest option may be to create the new database by hand. Then you'll have to deal with data migration, which may be as easy as instantiating two SQLObject database connections over the same table definitions.
Why not just start with the more full-featured database?
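The data-migration step described above boils down to reading rows from one connection and writing them to another. Here is a minimal sketch with plain DB-API calls; two `sqlite3` connections stand in for the old and new databases, and the `copy_table` helper and the `person` table are hypothetical. For a real MySQL, PostgreSQL or Firebird target you would use that database's DB-API driver, whose placeholder style may differ (e.g. `%s` instead of `?`).

```python
import sqlite3


def copy_table(src, dst, table, columns, batch_size=500):
    """Copy every row of `table` from `src` to `dst`; returns the row count.

    Assumes the table already exists on `dst` with the same column names
    (created by hand or with createTable()). `table` and `columns` are
    interpolated into the SQL, so they must come from trusted code.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)  # sqlite3 "qmark" style
    reader = src.execute(f"SELECT {col_list} FROM {table}")
    copied = 0
    while True:
        rows = reader.fetchmany(batch_size)  # stream, don't load everything
        if not rows:
            break
        dst.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
        )
        copied += len(rows)
    dst.commit()
    return copied


if __name__ == "__main__":
    old = sqlite3.connect(":memory:")
    new = sqlite3.connect(":memory:")
    for conn in (old, new):
        conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    old.executemany("INSERT INTO person (name) VALUES (?)", [("Ann",), ("Bob",)])
    old.commit()
    print(copy_table(old, new, "person", ["id", "name"]))  # 2
```

Copying the primary key column explicitly keeps foreign-key references between tables valid in the new database.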

I'm not sure I understand the question.
The SQLObject documentation lists six kinds of connections. The database (or scheme) is specified in a connection string, so changing database connections from SQLite to MySQL is trivial: just change the connection string.
The documentation lists the different kinds of schemes that are supported.
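To make "just change the connection string" literally a one-place change, the URI can be read from configuration. A minimal sketch, assuming a hypothetical `MYAPP_DB_URI` environment variable; the example URIs follow SQLObject's `scheme://user:password@host/database` form, but the credentials, hosts and paths are placeholders.

```python
import os
from urllib.parse import urlsplit

# Example SQLObject-style URIs; users, passwords, hosts and paths are placeholders.
EXAMPLE_URIS = {
    "sqlite": "sqlite:///var/lib/myapp/data.db",
    "mysql": "mysql://myapp:secret@localhost/myapp",
    "postgres": "postgres://myapp:secret@localhost/myapp",
}

# Schemes this (hypothetical) project is prepared to use.
ALLOWED_SCHEMES = {"sqlite", "mysql", "postgres", "firebird"}


def database_uri(default="sqlite:/:memory:"):
    """Read the connection URI from MYAPP_DB_URI and sanity-check its scheme."""
    uri = os.environ.get("MYAPP_DB_URI", default)
    scheme = urlsplit(uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported database scheme: {scheme!r}")
    return uri

# With SQLObject installed, the switch then happens in exactly one place:
#     from sqlobject import connectionForURI, sqlhub
#     sqlhub.processConnection = connectionForURI(database_uri())
```

The schema and data still have to exist on the new server; the URI only decides where SQLObject connects.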

Related

What did my teacher mean by 'db.sqlite3 will fall short in bigger real-world problems'?

I am very new to programming, and to this site too. An online course that I follow told me that it is not possible to manage bigger databases with db.sqlite3. What does that mean?

Choice of Relational Database Management Systems (RDBMS) is dependent on your use case. The different options available have different pros and cons and hence, for different applications, some are more suitable than others.
I typically use SQLite (only for development purposes) and then switch to MySQL for my Django projects.
SQLite: Is file-based. You can actually see the file in your project directory, and all the CRUD (Create, Retrieve, Update, Delete) operations are done directly on that file. Also, all the underlying code for the RDBMS is quite small. All this makes it good for applications which don't require intensive use of databases or which need offline storage, e.g. IoT devices, small websites etc. When you try to use it for big projects that make intensive use of the database, e.g. online stores, you run into problems because SQLite lacks features that client/server systems such as MySQL or PostgreSQL have. The primary problem is limited write concurrency: only one connection can write to the database at a time, because write operations are serialised.
MySQL: Is one of the most popular options and my personal favourite (very easy to configure and use with Django). It's based on the client/server model rather than a single file like SQLite, and is very scalable, i.e. it is capable of far more than SQLite and you can use it for many different applications that make heavy use of the RDBMS. It has better security, allows concurrent operations, and tends to outperform PostgreSQL on read-heavy workloads.
PostgreSQL: Is also a very strong option, capable of most of what MySQL can do, but it handles clients in a different way (a separate server process per connection) and it has an edge over MySQL in SELECTs and INSERTs. MySQL is still much more widely used than PostgreSQL, though.
There are also many other options on the market. You can take a look at this article, which compares a bunch of them. But to answer your question: SQLite is very simple compared to the other options and stores everything in a file in your project rather than on a server. As a result, there is little security, limited concurrency etc. This is fine during development and for use cases that do not make heavy use of the database, but it will not cut it for big projects.

This is not a matter of how big the DB is. An SQLite database can be very big, hundreds of gigabytes.
It is a matter of how many users are using the application (you mention Django) concurrently. As SQLite only supports one writer at a time, the others are queued. Fortunately, you can have many concurrent readers.
So if you have a lot of concurrent access (that is not explicitly read-only), then SQLite is not a good choice anymore. You'll prefer something like PostgreSQL.
BTW, everything is better explained in the documentation ;)
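The one-writer rule is easy to observe with the standard `sqlite3` module. This sketch opens two connections to the same temporary file: while the first holds an open write transaction, the second can still read (and sees only committed data), but cannot start a write of its own. `timeout=0` makes the second connection fail at once instead of queueing, which is what it would normally do for up to the default 5 seconds.

```python
import os
import sqlite3
import tempfile


def single_writer_demo():
    """Return (rows seen by a concurrent reader, whether a 2nd writer got in)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        # isolation_level=None: no implicit BEGINs, we manage transactions.
        writer = sqlite3.connect(path, isolation_level=None, timeout=5)
        # timeout=0: report "database is locked" immediately instead of waiting.
        other = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            writer.execute("CREATE TABLE hits (n INTEGER)")
            writer.execute("INSERT INTO hits VALUES (1)")

            writer.execute("BEGIN IMMEDIATE")  # take the write lock now
            writer.execute("INSERT INTO hits VALUES (2)")  # not committed yet

            # Readers are not blocked; they see the last committed state.
            seen = other.execute("SELECT COUNT(*) FROM hits").fetchall()[0][0]

            # A second writer is refused while the first transaction is open.
            try:
                other.execute("BEGIN IMMEDIATE")
                other.execute("ROLLBACK")
                second_writer_ok = True
            except sqlite3.OperationalError:  # "database is locked"
                second_writer_ok = False

            writer.execute("COMMIT")
        finally:
            writer.close()
            other.close()
    return seen, second_writer_ok
```

In a web app each request would normally wait for the lock rather than fail, which is exactly the queueing that hurts under many concurrent writes.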

Selecting a database for your project is like selecting any other technology. It depends on your use case.
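Size isn't the issue, complexity is. SQLite3 databases can grow as big as 281 terabytes, and the limits on the number of tables, columns and rows are also pretty generous.
If your application logic requires SQL operations like:
RIGHT OUTER JOIN, FULL OUTER JOIN (only available since SQLite 3.39.0)
ALTER TABLE beyond renaming and adding or dropping columns, e.g. ADD CONSTRAINT
DELETE, INSERT, or UPDATE on a VIEW (only possible through INSTEAD OF triggers)
Custom user permissions to read/write (SQLite has no user accounts; access is controlled by file-system permissions)
Then SQLite3 should not be your choice of database, as these SQL features are either missing or only partially implemented in SQLite3.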
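To illustrate the VIEW item with the standard `sqlite3` module: a plain `INSERT` into a view is rejected, but an `INSTEAD OF` trigger can redirect it to the underlying table. The `users` table and `active_users` view are made up for the example.

```python
import sqlite3


def view_write_demo():
    """Return (did a plain INSERT into the view work?, names visible afterwards)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
        CREATE VIEW active_users AS SELECT id, name FROM users WHERE active = 1;
    """)
    try:
        # Rejected: "cannot modify active_users because it is a view".
        conn.execute("INSERT INTO active_users (name) VALUES ('alice')")
        plain_insert_ok = True
    except sqlite3.OperationalError:
        plain_insert_ok = False

    # Workaround: tell SQLite what an INSERT on the view should do instead.
    conn.execute("""
        CREATE TRIGGER active_users_insert INSTEAD OF INSERT ON active_users
        BEGIN
            INSERT INTO users (name, active) VALUES (NEW.name, 1);
        END
    """)
    conn.execute("INSERT INTO active_users (name) VALUES ('alice')")
    names = [row[0] for row in conn.execute("SELECT name FROM active_users")]
    conn.close()
    return plain_insert_ok, names
```

Whether such workarounds are acceptable depends on how much of this logic your application needs; a server database gives you updatable views directly.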

Sharing an ORM between languages

I am making a database with data in it. That database has two consumers: 1) a .NET web server that makes the data visible to users in some way, and 2) a Python data miner that creates the data and populates the tables.
I have several options. I can use the .NET Entity Framework to create the database, then reverse engineer it on the Python side, or do it the other way round. I can just write raw SQL statements in one or the other system, or both. What are the possible pitfalls of each approach? I'm worried, for example, that if I use the Python ORM to create the tables, I'm going to have a hard time on the .NET side...

I love questions like that.
Here is what you have to consider: your web site has to be fast, and the bottleneck of most web sites is the database. So the answer to your question would be: make it easy for .NET to work with SQL. That will require a little more work on the Python side, like specifying table names and maybe column names. I think Django and SQLAlchemy are both good for that.
Another solution could be to have a bridge between the database with the gathered data and the database used to display it. In the background you can have a task/job that migrates the collected data to your main database. That is also an option and will make your job easier; at least all the database-specific and strange code will go into this third component.
I worked with .NET for quite a long time before I switched to Python, and what you should know is that whatever strategy you choose, it will be possible to work with the data in both languages and ORMs. Do the hardest part of the job in the language you know better. If you are a Python developer, pick Python to deal with getting the table and column names right.
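The idea of letting the Python side adopt the .NET naming can be sketched without any ORM. Django (`Meta.db_table`, `db_column`) and SQLAlchemy (`__tablename__`, explicit column names) let you declare the same mapping declaratively; to stay dependency-free, this sketch does it by hand with `sqlite3`. The `Measurements` table, its PascalCase columns and the `Measurement` dataclass are hypothetical.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Hypothetical schema owned by the .NET side (Entity Framework-style PascalCase).
TABLE = "Measurements"
COLUMN_FOR_FIELD = {"sensor_id": "SensorId", "value": "Value", "taken_at": "TakenAt"}


@dataclass
class Measurement:
    sensor_id: int
    value: float
    taken_at: str  # ISO-8601 timestamp text


def insert_measurements(conn, items):
    """Insert Measurement objects using the column names .NET expects."""
    # Field order of the dataclass decides the column order.
    columns = [COLUMN_FOR_FIELD[f.name] for f in fields(Measurement)]
    column_sql = ", ".join(f'"{c}"' for c in columns)  # quoted identifiers
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{TABLE}" ({column_sql}) VALUES ({placeholders})',
        [astuple(m) for m in items],
    )
    conn.commit()
```

Python code keeps its snake_case attributes; only the single mapping table knows the names the web server depends on.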
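A background job for the bridge approach might look like the following sketch. It assumes both databases are SQLite files (a stand-in for whatever the real staging and main databases are) and uses SQLite's `ATTACH DATABASE` so the copy and the cleanup commit or roll back together. The `readings` and `raw_readings` tables and the `move_collected_rows` function are hypothetical.

```python
import sqlite3


def move_collected_rows(main_conn, staging_path):
    """Move rows from the staging DB (filled by the miner) into the main DB.

    Insert and delete run in one transaction, so a failure between the two
    steps neither loses nor duplicates rows. Returns the number moved.
    """
    # ATTACH is not allowed inside a transaction, so it runs first.
    main_conn.execute("ATTACH DATABASE ? AS staging", (staging_path,))
    try:
        with main_conn:  # commit on success, roll back on any exception
            moved = main_conn.execute(
                "INSERT INTO main.readings (sensor, value) "
                "SELECT sensor, value FROM staging.raw_readings"
            ).rowcount
            main_conn.execute("DELETE FROM staging.raw_readings")
    finally:
        main_conn.execute("DETACH DATABASE staging")
    return moved
```

Running this from a scheduler keeps the miner's write pattern away from the database the web server reads.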

Building a DSL query language

I'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application I have several static "reports", written directly in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter doesn't fit their needs, I'm thinking about creating a query language for my database like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large to hold in memory, I would prefer that the query be translated into SQL (or better, a Django query) before doing the actual work.
Are there any library or best practices on how to do this?

Writing such a DSL is actually surprisingly easy with PLY, and, what ho, there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object, which makes the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
In the examples given you can see something very similar to YQL and JQL:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
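For a feel of what such a compiler does, here is a dependency-free sketch: a regex tokenizer and a small recursive-descent parser (a swapped-in technique, not Amiguet's PLY code) that turn the example queries into a nested tuple tree, rewriting `.` to `__` as suggested above. The tree shape, the `LOOKUPS` table and all function names are hypothetical; values stay strings, so a date like `1/4/2011` would still need converting before it reaches the ORM.

```python
import re

# One alternative per token kind; leading whitespace is skipped.
TOKEN_RE = re.compile(r'''\s*(?:
      (?P<string>"[^"]*")
    | (?P<op>!=|>=|<=|=|>|<)
    | (?P<paren>[()])
    | (?P<word>[^\s()"=!<>]+)
)''', re.VERBOSE)

# Comparison operator -> Django lookup suffix ("=" means an exact match).
LOOKUPS = {"=": "", ">": "__gt", "<": "__lt", ">=": "__gte", "<=": "__lte"}
KEYWORDS = {"AND", "OR", "NOT"}


def tokenize(text):
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "word" and value.upper() in KEYWORDS:
            kind = value = value.upper()
        tokens.append((kind, value))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: OR binds loosest, then AND, then NOT."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, kind):
        tok_kind, value = self.peek()
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {value!r}")
        self.i += 1
        return value

    def expr(self):
        node = self.term()
        while self.peek()[0] == "OR":
            self.take("OR")
            node = ("or", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek()[0] == "AND":
            self.take("AND")
            node = ("and", node, self.factor())
        return node

    def factor(self):
        if self.peek()[0] == "NOT":
            self.take("NOT")
            return ("not", self.factor())
        if self.peek() == ("paren", "("):
            self.take("paren")
            node = self.expr()
            if self.take("paren") != ")":
                raise SyntaxError("expected ')'")
            return node
        field = self.take("word").replace(".", "__")  # groups.name -> groups__name
        op = self.take("op")
        if self.peek()[0] == "string":
            value = self.take("string")[1:-1]  # strip the quotes
        else:
            value = self.take("word")
        if op == "!=":
            return ("not", ("q", field, value))
        return ("q", field + LOOKUPS[op], value)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek()[0] is not None:
        raise SyntaxError(f"unexpected {parser.peek()[1]!r}")
    return tree

# With Django, each ("q", key, value) leaf would become Q(**{key: value}),
# "not" becomes ~, and "and"/"or" combine the children with & and |.
```

For example, `parse('groups.name="XXX" AND NOT groups.name="YYY"')` gives `("and", ("q", "groups__name", "XXX"), ("not", ("q", "groups__name", "YYY")))`.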

I've faced exactly this problem: a large database which needs searching. I made some static reports and several fancy filters using Django (very easy with Django), just like you have.
However, the power users were clamouring for more. I decided that there already was a DSL that they all knew: SQL. The question was how to make it secure enough.
So I used Django permissions to give the power users permission to write SQL queries and save them in a new table. I then made a view for the not-quite-so-power users to run these queries, with optional parameters. The queries were run using Python's lower-level DB-API, which Django uses under the hood for its ORM anyway.
The real trick was opening a read only database connection to run these queries just to make sure that no updates were ever run. I made a read only connection by creating a different user in the database with lower permissions and opening a specific connection for that in the view.
TL;DR - SQL is the way to go!
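The answer above gets its read-only guarantee from a database user with fewer privileges. SQLite has no users, but the same idea can be sketched there by opening the file with the `mode=ro` URI parameter, so that any write raises an error. `run_saved_query` is a hypothetical helper; the parameters are bound by the driver, never pasted into the SQL string.

```python
import pathlib
import sqlite3


def open_read_only(db_path):
    """Open an SQLite file so that any write raises sqlite3.OperationalError."""
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def run_saved_query(db_path, sql, params=()):
    """Run a stored power-user query with optional bound parameters."""
    conn = open_read_only(db_path)
    try:
        # execute() also refuses multiple statements in one string.
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

With PostgreSQL or MySQL you would instead connect as a role that only has SELECT privileges, as described above.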

Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.

You could actually write your own SQL-ish language using pyparsing. There is even a fairly detailed example you could extend.

Proper way to establish database connection in python

I have a script with several functions that all need to make database calls. I'm trying to get better at writing clean code rather than just throwing together scripts with horrible style. What is generally considered the best way to establish a global database connection that can be accessed anywhere in the script but is not susceptible to errors such as accidentally redefining the variable holding a connection. I'd imagine I should be putting everything in a module? Any links to actual code would be very useful as well. Thanks.

If you are working with Python and databases, you cannot afford not to look at SQLAlchemy:
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. It provides a full suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language.
I have built very complex databases with a surprisingly small amount of code (a few hundred lines). The schema definition is almost self-documenting, the objects used for the Object Relational Mapper are Plain Old Python Objects (i.e., what you already have), and the querying API is almost obvious. In addition, the documentation is excellent: many online examples, fully documented API, and an O'Reilly book which, while far from perfect, does take you from zero to dangerous in a few evenings.
If you don't want to use the Object Relational Mapper, you can always fall back to plain connections and literal SQL. The code is also largely portable and database-independent (the same code will work with MySQL, Oracle, SQLite, and other database managers).
The Engine maintains a connection pool and the Session draws connections from it, so you don't need to manage a global connection object yourself.
The best way to understand its power is probably to follow one of the tutorials that come up when you search for "sqlalchemy tutorial".
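If you would rather not pull in SQLAlchemy for a small script, a common standard-library pattern is to let one module own the connection and hand it out through a function, so there is no global variable for other code to reassign by accident. A minimal sketch; the module name `db.py`, the `DB_PATH` setting and the function names are hypothetical, and it assumes a single-threaded script (sqlite3 connections are tied to the creating thread by default).

```python
# db.py: hypothetical module that owns the script's single connection.
import sqlite3
from contextlib import contextmanager
from functools import lru_cache

DB_PATH = ":memory:"  # assumption: replace with your real database path


@lru_cache(maxsize=None)
def get_connection():
    """Create the connection on first use and return the same one afterwards."""
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row  # rows accessible by column name
    return conn


@contextmanager
def transaction():
    """Yield a cursor; commit if the block succeeds, roll back if it raises."""
    conn = get_connection()
    cur = conn.cursor()
    try:
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Other functions then do `from db import transaction` and write `with transaction() as cur: cur.execute(...)`, without ever touching the connection object directly.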

Use a model layer or an ORM.

Is there any python web app framework that provides database abstraction layer for SQL and NoSQL?

Is it even possible to create an abstraction layer that can accommodate both relational and non-relational databases? The purpose of this layer is to minimize repetition and allow a web application to use any kind of database by changing the code in only one place (i.e., the abstraction layer). The code that sits on top of the abstraction layer should not need to worry whether the underlying database is relational (SQL), non-relational (NoSQL), or some new kind of database that may come out in the future.

There's a Summer of Code project going on right now to add non-relational support to Django's ORM. It seems to be going well and chances are good that it will be merged into core in time for Django 1.3.

You could use stock Django and Django-nonrel ( http://www.allbuttonspressed.com/projects/django-nonrel ) together to get a fairly unified experience. Some limits apply, though, so read the docs carefully, remembering Spolsky's Law of Leaky Abstractions: "All non-trivial abstractions, to some degree, are leaky".

You may also check out web2py; it supports relational databases and GAE in the core.

Regarding App Engine, all existing attempts limit you in some way (web2py, for example, doesn't support transactions or namespaces, and probably much else). If you plan to work with GAE, use what GAE provides and forget about looking for a SQL-NoSQL holy grail. Existing solutions are inevitably limited and affect performance negatively.

Thank you for all the answers. To summarize them: currently only web2py and Django support this kind of abstraction.
It is not about a SQL-NoSQL holy grail; using an abstraction can make apps more flexible. Let's assume that you started a project using NoSQL, and later on you need to switch over to SQL. It is desirable to change the code in only a few spots instead of all over the place. In some cases it does not really matter whether you store the data in a relational or a non-relational DB, for example user profiles, text content for dynamic pages, or blog entries.
I know there must be a trade-off in using the abstraction, but my question is more about existing solutions or technical insight than about the consequences.
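For the simple cases listed above (user profiles and the like), the "change code in one place" goal can be sketched with a hand-rolled repository interface rather than a full framework. This is not web2py's DAL or Django-nonrel; it is a hypothetical illustration in which `ProfileStore` defines what the application needs, one backend stores JSON in an SQLite table and the other is an in-memory dict standing in for a document store.

```python
import json
import sqlite3
from typing import Optional, Protocol


class ProfileStore(Protocol):
    """What the application needs; it never sees SQL or documents."""

    def save(self, user_id: str, profile: dict) -> None: ...
    def load(self, user_id: str) -> Optional[dict]: ...


class SqlProfileStore:
    """Relational backend: one row per user, profile kept as JSON text."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS profiles (user_id TEXT PRIMARY KEY, data TEXT)"
        )

    def save(self, user_id: str, profile: dict) -> None:
        with self.conn:  # commit the upsert
            self.conn.execute(
                "INSERT OR REPLACE INTO profiles (user_id, data) VALUES (?, ?)",
                (user_id, json.dumps(profile)),
            )

    def load(self, user_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT data FROM profiles WHERE user_id = ?", (user_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


class DocumentProfileStore:
    """Stand-in for a key/document (NoSQL) store: a dict of documents."""

    def __init__(self):
        self.docs = {}

    def save(self, user_id: str, profile: dict) -> None:
        self.docs[user_id] = dict(profile)

    def load(self, user_id: str) -> Optional[dict]:
        doc = self.docs.get(user_id)
        return dict(doc) if doc is not None else None


def rename_user(store: ProfileStore, user_id: str, new_name: str) -> None:
    """Application code: written once, works with either backend."""
    profile = store.load(user_id) or {}
    profile["name"] = new_name
    store.save(user_id, profile)
```

The trade-off is visible even here: the interface can only offer what both backends do well, so queries across profiles (joins, aggregates) have no place in it.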
