How to persist data between executions in Python

I am working on a personal project in Python where I need some form of persistent data. The data would fit in 2-3 tables of 10-20 columns and 100-200 records each. I have a basic understanding of SQL, so a database seems to make some sense.
I am new to Python, so I am not familiar with the options for database interfaces in Python. I have also heard about pickling and am not sure whether that would be a better solution for a project of this size. Can anyone recommend a good solution?

If you just want to persist data between executions, then for such a small data set you could have a look at the pickle module and simply load the data into memory during execution.
It's a simple solution - but for a personal project it might be enough.
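For instance, a minimal sketch of the load-on-start, save-on-exit pattern (the filename and data shape here are made up):

import os
import pickle

DATA_FILE = 'state.pickle'

def load_state():
    # On the first run there is no file yet, so fall back to a default.
    if os.path.exists(DATA_FILE):
        with open(DATA_FILE, 'rb') as f:
            return pickle.load(f)
    return {'records': []}

def save_state(state):
    with open(DATA_FILE, 'wb') as f:
        pickle.dump(state, f)

state = load_state()
state['records'].append({'name': 'example', 'value': 1})
save_state(state)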

You should use the sqlite3 module for this; it is included in Python's standard library.
You may also want to look into an ORM solution.
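As a rough illustration of the sqlite3 route (table and column names invented), the whole round trip looks like this:

import sqlite3

conn = sqlite3.connect('project.db')  # creates the file if it doesn't exist
conn.execute('CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute('INSERT INTO items (name) VALUES (?)', ('example',))
conn.commit()

for row in conn.execute('SELECT id, name FROM items'):
    print(row)

conn.close()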

This sounds like very little data. An SQL DB might be overkill, especially with an ORM on top. I'd check whether JSON could do the job...
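Something along these lines, assuming the data is just lists and dicts of primitives:

import json

data = {'items': [{'name': 'example', 'value': 1}]}

# Save on exit...
with open('data.json', 'w') as f:
    json.dump(data, f, indent=2)

# ...and load on the next run.
with open('data.json') as f:
    data = json.load(f)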

I agree with using sqlite3. It is very easy to use, and you don't need to worry about setting up a database server. You should check out the SQLAlchemy library too.

The real question is what kind of operations you want to perform on your data.
As far as storage possibilities go, the simplest solutions are indeed sqlite3 and pickle.
The solution you choose basically depends on whether SQL or Python is the easier way for you to manage your data. SQL is probably better at complex operations than Python, but Python is definitely more lightweight and simpler, and therefore a good choice for simple operations. So, if using pickle+Python becomes too cumbersome, sqlite3 is a very good choice.

Peewee is another ORM that works with SQLite. It is an alternative to SQLAlchemy. If using SQLite, I would consider Peewee for pet projects and SQLAlchemy for professional work. I typically would not use SQLite directly.
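A hedged sketch of what a Peewee model on SQLite looks like (the model and fields are invented for illustration):

from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase('pets.db')

class Pet(Model):
    name = CharField()
    age = IntegerField()

    class Meta:
        database = db

db.connect()
db.create_tables([Pet])
Pet.create(name='Rex', age=3)
for pet in Pet.select().where(Pet.age > 1):
    print(pet.name)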

Related

Building a DSL query language

I'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application I have several static "reports", written directly in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter no longer fits their needs, I am thinking about creating a query language for my database, like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large to hold in memory, I would prefer that the query be translated into SQL (or better, a Django query) before doing the actual work.
Are there any libraries or best practices for how to do this?
Writing such a DSL is actually surprisingly easy with PLY, and, what ho, there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object which makes the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
In the examples given, you see something very similar to YQL and JQL:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
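To make the shape of it concrete, here is a rough sketch (my own, not Matthieu Amiguet's actual code, and with my own rule names) of a tiny PLY grammar that turns expressions like the ones above into Q objects. It assumes a configured Django project and handles only =, AND, OR, and NOT; the slides cover the fuller grammar:

import ply.lex as lex
import ply.yacc as yacc
from django.db.models import Q   # assumes Django settings are configured

tokens = ('FIELD', 'VALUE', 'AND', 'OR', 'NOT', 'EQUALS')

t_AND = r'AND'
t_OR = r'OR'
t_NOT = r'NOT'
t_EQUALS = r'='
t_FIELD = r'[a-z_][a-z0-9_.]*'   # dots allowed; mapped to __ below
t_ignore = ' \t'

def t_VALUE(t):
    r'"[^"]*"'
    t.value = t.value[1:-1]       # strip the quotes
    return t

def t_error(t):
    raise SyntaxError('Illegal character %r' % t.value[0])

precedence = (
    ('left', 'OR'),
    ('left', 'AND'),
    ('right', 'NOT'),
)

def p_expression_binop(p):
    '''expression : expression AND expression
                  | expression OR expression'''
    p[0] = (p[1] & p[3]) if p[2] == 'AND' else (p[1] | p[3])

def p_expression_not(p):
    'expression : NOT expression'
    p[0] = ~p[2]

def p_expression_compare(p):
    'expression : FIELD EQUALS VALUE'
    p[0] = Q(**{p[1].replace('.', '__'): p[3]})

def p_error(p):
    raise SyntaxError('Parse error at %r' % (p,))

lexer = lex.lex()
parser = yacc.yacc()

# q = parser.parse('groups.name="XXX" AND NOT groups.name="YYY"')
# queryset = SomeModel.objects.filter(q)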
I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters using Django (very easy with Django), just like you have.
However the power users were clamouring for more. I decided that there already was a DSL that they all knew - SQL. The question was how to make it secure enough.
So I used Django permissions to give the power users permission to make SQL queries in a new table. I then made a view for the not-quite-so-power users to use these queries. I made the queries take optional parameters. The queries were run using Python's lower-level DB-API, which Django uses under the hood for its ORM anyway.
The real trick was opening a read-only database connection to run these queries, just to make sure that no updates were ever run. I made a read-only connection by creating a different user in the database with lower permissions and opening a specific connection for that user in the view.
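In outline (the 'readonly' alias is an assumption: a second entry in settings.DATABASES pointing at the same database with a low-privilege, SELECT-only user):

from django.db import connections

def run_saved_query(sql, params=None):
    # Execute a stored, user-authored SELECT on the read-only connection.
    with connections['readonly'].cursor() as cursor:
        cursor.execute(sql, params or [])
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]

# rows = run_saved_query('SELECT name FROM app_group WHERE name = %s', ['XXX'])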
TL;DR - SQL is the way to go!
Depending on the form of your data, the types of queries your users need, and the frequency with which your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.
You could write your own SQL-ish language using pyparsing, actually. There is even a pretty thorough example you could extend.
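For a flavour of it, a minimal sketch (the field/value syntax is invented) of a WHERE-clause-like grammar in pyparsing:

from pyparsing import (Keyword, QuotedString, Word, alphas, alphanums,
                       infixNotation, opAssoc)

AND, OR, NOT = map(Keyword, ['AND', 'OR', 'NOT'])
field = Word(alphas + '_', alphanums + '_.')
value = QuotedString('"')
comparison = field + '=' + value

expr = infixNotation(comparison, [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
])

print(expr.parseString('groups.name="XXX" AND NOT groups.name="YYY"'))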

PyMongo vs MongoEngine for Django

For one of my projects I preferred using Django+Mongo.
Why should I use MongoEngine, and not just PyMongo? What are the advantages? Querying with PyMongo gives results that are already objects, doesn't it? So what is the purpose of MongoEngine?
This is an old question but, stumbling across it, I don't think the accepted answer answers the question. The question wasn't "What is MongoEngine?" - it was "Why should I use MongoEngine?", and what the advantages of such an approach are. This goes beyond Django to Python/Mongo in general. My two cents:
While PyMongo and MongoEngine do both return objects (which is not wrong), PyMongo returns dictionaries that need to have their keys referenced by string. MongoEngine allows you to define a schema for your document data via classes. It will then map the documents into those classes for you and allow you to manipulate them. Why define a schema for schema-less data? Because in my opinion it's clear, explicit, and much easier to program against. You don't end up with dictionaries scattered about your code where you can't tell what's in them without actually looking at the data or running the program. In the case of MongoEngine and a decent IDE like PyCharm, typing a simple "." after the object will tell you all you need to know via auto-complete. It's also much easier for other developers coming in to examine and learn the data model as they work, and it will make anybody who hasn't seen the code in a while productive again more quickly.
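For example, a hedged sketch of such a schema-via-classes definition (the field names are invented; assumes a local MongoDB):

from mongoengine import Document, StringField, DateTimeField, connect

connect('blog')                        # database name assumed

class BlogPost(Document):
    title = StringField(required=True)
    published = DateTimeField()

post = BlogPost(title='First Post').save()
print(post.title)                      # attribute access, auto-complete friendly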
Additionally, to me, the syntax used to manipulate documents with PyMongo (which is essentially the same as the mongo console) is ugly, error prone, and difficult to maintain.
Here is a basic example of updating a document in MongoEngine, which to me, is very elegant:
BlogPost.objects(id=post.id).update(title='Example Post')
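For contrast, a sketch of roughly the same update through PyMongo (database and collection names assumed):

from pymongo import MongoClient

client = MongoClient()                 # assumes a local mongod
posts = client.blog.posts

post = posts.find_one({'title': 'First Post'})
posts.update_one({'_id': post['_id']},
                 {'$set': {'title': 'Example Post'}})   # keys referenced by string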
Why use PyMongo? MongoEngine is a layer between you and the bare metal, so it's probably slower, although I don't have any benchmarks. PyMongo is lower level, so naturally you have more control. For simple projects, MongoEngine might be unnecessary. If you're already fluent in Mongo syntax, you may find PyMongo much more intuitive than I do and have no problem writing complex queries and updates. Perhaps you enjoy working directly with dictionaries on that lower level and aren't interested in an additional layer of abstraction. Maybe you're writing a script that isn't part of a big system, and you need it to be as lean and as fast as possible.
There's more to the argument, but I think that's pretty good for the basics.
I assume you have not read MongoEngine's own description:
MongoEngine is a Document-Object Mapper (think ORM, but for document databases) for working with MongoDB from Python.
That basically says it all.
In addition: your claim that PyMongo delivers objects is wrong... well, in Python everything is an object - even a dict is an object - so you are right, but not in the sense of having a custom class defined at the application level.
PyMongo is the low-level driver wrapping the MongoDB API into Python, delivering JSON-like documents in and out.
MongoEngine, and other layers like MongoKit, map your MongoDB-based data to objects, much as a native Python database driver plus SQLAlchemy as an ORM would.
Probably way too late, but for anyone else attempting Django+Mongo, Django-nonrel is worth considering.
MongoEngine uses the PyMongo driver to connect to MongoDB.
If you are familiar with Django, use MongoEngine.

What should I use for a python program to do stuff with tables of data?

I want to be able to add daily info to each object, and I want the ability to easily delete info more than x days old. With the tables I need to look at trends and do things like selecting objects which match some criteria.
Edit: I asked this because I can't think of a way to easily delete old data, because you cannot delete tables in sqlite
Using SQLite would be the best option: it is file based and easy to use, you can do lookups with SQL, and it's built into Python so you don't need to install anything.
→ http://docs.python.org/library/sqlite3.html
If your question means that you are just going to be using "table-like data" but not bound to a DB, look into using this Python module: Module for table-like syntax
If you are going to be binding to a back end, and not distributing your data among computers, then SQLite is the way to go.
A "proper" database would probably be the way to go. If your application only runs on one computer and the database doesn't get to big, sqlite is good and easy to use with python (standard module sqlite3, see the Library Reference for more information)
Take a look at the sqlite3 module. It lets you create a single-file database (no server to set up) against which you can perform SQL queries. It's part of the standard library in Python, so you don't need to install anything additional.
http://docs.python.org/library/sqlite3.html
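On the edit about deleting old data: you delete rows, not tables. A minimal sketch (table and column names invented) of the "drop anything older than x days" pattern:

import sqlite3

conn = sqlite3.connect('mydata.db')
conn.execute('''CREATE TABLE IF NOT EXISTS daily_info (
                    object_id INTEGER,
                    recorded_on DATE,
                    value REAL)''')
conn.execute("INSERT INTO daily_info VALUES (?, date('now'), ?)", (1, 42.0))

# Delete anything more than 30 days old.
conn.execute("DELETE FROM daily_info WHERE recorded_on < date('now', '-30 days')")
conn.commit()
conn.close()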

Lightweight Object->Database in Python

I am in need of a lightweight way to store dictionaries of data into a database. What I need is something that:
Creates a database table from a simple type description (int, float, datetime etc)
Takes a dictionary object and inserts it into the database (including handling datetime objects!)
If possible: Can handle basic references, so the dictionary can reference other tables
I would prefer something that doesn't do a lot of magic. I just need an easy way to set up and get data into an SQL database.
What would you suggest? There seems to be a lot of ORM software around, but I find it hard to evaluate them.
SQLAlchemy's SQL expression layer can easily cover the first two requirements. If you also want reference handling then you'll need to use the ORM, but this might fail your lightweight requirement depending on your definition of lightweight.
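A hedged sketch of those first two requirements in the expression layer (table and column names invented):

from datetime import datetime
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, Float, DateTime)

engine = create_engine('sqlite:///data.db')
metadata = MetaData()

measurements = Table('measurements', metadata,
    Column('id', Integer, primary_key=True),
    Column('value', Float),
    Column('taken_at', DateTime))      # datetime objects handled by the type

metadata.create_all(engine)            # creates the table from the description

row = {'value': 3.14, 'taken_at': datetime.now()}
with engine.begin() as conn:           # commits on success
    conn.execute(measurements.insert(), [row])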
SQLAlchemy offers an ORM much like Django's, but does not require that you work within a web framework.
From its description, Axiom may be a Pythonic tool for this.
Seeing as you have mentioned SQL, Python, and ORM in your tags, are you looking for Django? Of all the web frameworks I've tried, I like this one the best. You'd be looking at models, specifically. This could be too fancy for your needs, perhaps, but that shouldn't stop you from looking at the code of Django itself and learning from it.

Database change underneath SQLObject

I'm starting a web project that likely should be fine with SQLite. I have SQLObject on top of it, but thinking long term here -- if this project should require a more robust database (e.g. one able to handle high traffic), I will need to have a transition plan ready. My questions:
1) How easy is it to transition from one DB (SQLite) to another (MySQL, Firebird, or PostgreSQL) under SQLObject?
2) Does SQLObject provide any tools to make such a transition easier? Is it simply a matter of taking the objects I've defined and calling createTable?
3) What about having multiple SQLite databases instead? E.g. one per visitor group? Does SQLObject provide a mechanism for handling this scenario and, if so, what is the mechanism to use?
Thanks,
Sean
3) is quite an interesting question. In general, SQLite is pretty useless for web-based stuff. It scales fairly well for size, but terribly for concurrency, so if you are planning to hit it with a few requests at the same time, you will be in trouble.
Now, your idea in part 3) of the question is to use multiple SQLite databases (e.g. one per user group, or even one per user). Unfortunately, SQLite will give you no help in this department. But it is possible. The one project I know of that has done this before is Divmod's Axiom. So I would certainly check that out.
Of course, it would probably be much easier to just use a good concurrent DB like the ones you mention (Firebird, PG, etc).
For completeness:
1 and 2) It should be straightforward, without you actually writing much code. I find SQLObject a bit restrictive in this department and would strongly recommend SQLAlchemy instead. It is far more flexible, and if I were starting a new project today, I would certainly use it over SQLObject. It won't be moving "Objects" anywhere; there is no magic involved here, it will be transferring rows between tables in a database. Which, as mentioned, you could do by hand, but this might save you some time.
Your success with createTable() will depend on your existing underlying table schema / data types. In other words, how well SQLite maps to the database you choose and how SQLObject decides to use your data types.
The safest option may be to create the new database by hand. Then you'll have to deal with data migration, which may be as easy as instantiating two SQLObject database connections over the same table definitions.
Why not just start with the more full-featured database?
I'm not sure I understand the question.
The SQLObject documentation lists six kinds of connections available. Further, the database connection (or scheme) is specified in a connection string. Changing database connections from SQLite to MySQL is trivial. Just change the connection string.
The documentation lists the different kinds of schemes that are supported.
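As a short sketch of what "just change the connection string" looks like (the URIs and model here are examples):

from sqlobject import SQLObject, StringCol, connectionForURI, sqlhub

# SQLite today...
sqlhub.processConnection = connectionForURI('sqlite:///full/path/to/app.db')
# ...MySQL tomorrow: only the connection string changes.
# sqlhub.processConnection = connectionForURI('mysql://user:passwd@host/appdb')

class Visitor(SQLObject):
    name = StringCol()

Visitor.createTable(ifNotExists=True)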
