i want to create application in windows. i need to use databases which would be preferable best for pyqt application
like
sqlalchemy
mysql
etc.
I would use SQLite every time unless performance became an obvious big problem.
It comes with Python
You don't need to worry about installing it on a target machine or having an existing installation which might clash (including a potential port clash - SQLite doesn't use a port)
It's fairly small (doesn't increase the installed size too much)
Then, a much less obvious choice that I would very much consider making: adding Django to the mix. Django's model system could make for much simpler management, depending on the type of data you're working with. Also, in the case where I've considered it (I just haven't got to that stage of development yet) it means I can reuse the models I've got on the server and a good bit of code from there too.
Obviously in this case you could need to be careful about what you expose; business-critical processing stuff that you don't want to share, potential security holes in server code which you've helpfully provided the code for, etc.
SQlite is fine for a single user.
If you are going over a network to talk to a central database, then you need a database woith a decent Python lirary.
Take a serious look at MySQL if you need/want SQL.
Otherwise, there is CouchDB in the Not SQL camp, which is great if you are storing documents, and can express searches as Map/reduce functions. Poor for adhoc queries.
If you want a relational database I'd recommend you to use SQLAlchemy, as you then get a choice as well as an ORM. Bu default go with SQLite, as per other recommendations here.
If you don't need a relational database, take a look at ZODB. It's an awesome Python-only object-oriented database.
i guess its totally upto you ..but as far as i am concerned i personlly use sqlite, becoz it is easy to use and amazingly simple syntax whereas for MYSQL u can use it for complex apps and has options for performance tuning. but in end its totally upto u and wt your app requires
Related
I'm working on a framework for Digital Forensic Investigators to use to compare files with each other for my Master's capstone project. However, I ran into a bit of a snag...
I'm trying to implement multiprocessing on the comparisons since using a single core seems to be really slow. The trouble I'm having, however, is when the code goes to enter information into an SQLite database. It will occasionally get a "Database is locked" error when two cores complete at nearly the same time.
So, simple side of my question, is it unsafe to operate database functions within a multiprocessing environment due to the errors I'm encountering? If not, is there a method of going about this that is safe and won't result in random errors?
Thanks!
Your problem is that you are trying to have multiple writers access a toy database -- i.e. sqlite -- which is stored in a single file. Using Lock might help, but it's going to kill your multiprocess throughput because of all the waiting-for-the-lock time. In essence, the lock choke point will serialize your program.
Setting up either MySQL or Postgres on almost any platform is straightforward, and there are several excellent Python modules for accessing them. Using one of those will completely eliminate this problem.
Update for an extended response to comment:
I always ask clients / students, "What problem are you trying to solve?" I'm assuming that you are not trying to create a database system, simply to use one. SQLite3 is fine for a well-defined set of problems, but multiprocess access is not one of them. I could veer off into asking what aspect of your project requires multiprocess access, but I'll assume that you have already determined that this is needed. I don't know either your programming skills or your understanding of how a database works, so forgive me if the following is a bit basic.
Normally you need a database (my preference is Postgres), and a Python module that understands all of the fiddly details of how to talk to that database. Then you need to know what it is you want the DBMS to do for you. The Good News is that you are hardly the first to go down this path.
The Postgres Wiki is full of good stuff. See their page on Python Drivers. Psycopg2 is the category leader and runs on Win/Linux/Mac. Also check out PyPi, the Python Package Index, for many well-written extensions.
If you want to stay more object-oriented, as opposed to writing straight SQL, you might want to look at an ORM like SQLAlchemy. This is another category leader that is well-maintained and widely deployed.
The value of using an ORM is that you can (mostly) keep your head in ObjectLand, where most of your problem lives, and not get tangled up in the cognitive dissonance created by object-oriented programming vs. relational database management, which are two very different views of the world of data.
If you need more help, email me. My address is in my profile.
You can make use of Lock. Take a look at https://docs.python.org/2/library/multiprocessing.html#synchronization-between-processes
So, I'm looking at writing an app with python2 django(-rest-framework), postgres and angular.
I'm aware there are lots of things that can be done
multi-server setup behind load balancer
DB replication/sharding?
caching (in various ways)
swapping DRF serialiser for serpy
running on python3
running on pypy
my question is - Which of these (or other things) should really be done right at the start of the project?
Write with scalability in mind.
scaliability is not limited just to production servers/environments but also to development environments.
Always write with scalability in mind.
At development
Scalability at development let you develop the product seemlessly.
Structure your repository
Use git branching models like GitFlow so that developers can work on parallel, or a single developer can switch working on different features. Use a bug tracker.
Design your apps.
Before actually writing a single line of code, write down what apps you are going to write. Design apps as to minimize Relations (ManyToMany, ForeignKey, etc..), imports. Django offers pulggable app architecture feel free to use it wisely.
Write your tests first.
This ensures that you can migrate(production environments), upgrade and downgrade with less pain and hairloss. Trust me writing tests feels so boring, but it is worth to.
Abstract models, managers
Use Abstract Models and Managers, it can eliminate bolier plate model code and help you maintain the code.
Name variables, classes and methods descriptive.
Name the variables, classes and methods descriptive, as you would be able to know what it represents without looking documentation.
Document code.
Feel free to document classes and methods, so you or other peers who look into the code get an idea of what it is indented for than stacktracing to see what a method is doing.
Use debug toolbar
Use django debug toolbar on developement, as you test your API, use prefectch_related() and select_related() to minimize/eliminate duplicated queries.
Modularize code.
Modularize the code. Python and django in general encourage use of modules. Modules are easy to manage. Use classes, more inheritance and abstract base classes to reuse code.
Use Continuous Integration
use continuous integration test your repo and make sure new pushes don't break the system.
At production.
scalability at production let you serve the product to infinite users seemlessly.
For multi server setup
Stick with rest design principles.
Eliminate sessions.
Use distributed cache like Redis.
Swapping DRF serializer for serpy
Start with serpy if you need more speed and if you are comfortable with. It is better to stick with Serpy than rewriting DRF serializer as writing both looks smiliar, but make sure you are not wasting time by optimizing for the lost 1 or 2ms.
Running on python3
Depends on the libraries that you plan to use.
Running on pypy
pypy is faster than the standard implementations. It depends on the library compatibility to use pypy. A list of compatibile package and status of compatibility.
now the question,
Which of these (or other things) should really be done right at the start of the project?
Ans: Developemet (1,2,3,4,5,6,7,8) Production(1,2)
I don't think you need to start worrying about the setup right away. I would discourage premature optimizations. Rather, run the app in production, profile it. See what affects the performance when you hit scale - you would know what's the bottleneck.
The first and main things you have to get right are a clean and correct db schema and clear, readable and correctly factored (DRY... unless it's accidental duplication) and decoupled code. If you know to design a relational DB schema and learn to use Python and Django properly you shouldn't have much problems so far, and if you get both these things right it will (well it should) be easy to scale - by adding cache where needed (Redis, Memcache, or an intermediary NoSQL document database storing "pre-processed" versions of your often accessed data), adding servers, load-balancing etc, depending on your application's needs. Django is built to scale easily, and unless you do stupid things it does scale easily.
I noticed that a significant part of my (pure Python) code deals with tables. Of course, I have class Table which supports the basic functionality, but I end up adding more and more features to it, such as queries, validation, sorting, indexing, etc.
I to wonder if it's a good idea to remove my class Table, and refactor the code to use a regular relational database that I will instantiate in-memory.
Here's my thinking so far:
Performance of queries and indexing would improve but communication between Python code and the separate database process might be less efficient than between Python functions. I assume that is too much overhead, so I would have to go with sqlite which comes with Python and lives in the same process. I hope this means it's a pure performance gain (at the cost of non-standard SQL definition and limited features of sqlite).
With SQL, I will get a lot more powerful features than I would ever want to code myself. Seems like a clear advantage (even with sqlite).
I won't need to debug my own implementation of tables, but debugging mistakes in SQL are hard since I can't put breakpoints or easily print out interim state. I don't know how to judge the overall impact of my code reliability and debugging time.
The code will be easier to read, since instead of calling my own custom methods I would write SQL (everyone who needs to maintain this code knows SQL). However, the Python code to deal with database might be uglier and more complex than the code that uses pure Python class Table. Again, I don't know which is better on balance.
Any corrections to the above, or anything else I should think about?
SQLite does not run in a separate process. So you don't actually have any extra overhead from IPC. But IPC overhead isn't that big, anyway, especially over e.g., UNIX sockets. If you need multiple writers (more than one process/thread writing to the database simultaneously), the locking overhead is probably worse, and MySQL or PostgreSQL would perform better, especially if running on the same machine. The basic SQL supported by all three of these databases is the same, so benchmarking isn't that painful.
You generally don't have to do the same type of debugging on SQL statements as you do on your own implementation. SQLite works, and is fairly well debugged already. It is very unlikely that you'll ever have to debug "OK, that row exists, why doesn't the database find it?" and track down a bug in index updating. Debugging SQL is completely different than procedural code, and really only ever happens for pretty complicated queries.
As for debugging your code, you can fairly easily centralize your SQL calls and add tracing to log the queries you are running, the results you get back, etc. The Python SQLite interface may already have this (not sure, I normally use Perl). It'll probably be easiest to just make your existing Table class a wrapper around SQLite.
I would strongly recommend not reinventing the wheel. SQLite will have far fewer bugs, and save you a bunch of time. (You may also want to look into Firefox's fairly recent switch to using SQLite to store history, etc., I think they got some pretty significant speedups from doing so.)
Also, SQLite's well-optimized C implementation is probably quite a bit faster than any pure Python implementation.
You could try to make a sqlite wrapper with the same interface as your class Table, so that you keep your code clean and you get the sqlite performences.
If you're doing database work, use a database, if your not, then don't. Using tables, it sound's like you are. I'd recommend using an ORM to make it more pythonic. SQLAlchemy is the most flexible (though it's not strictly just an ORM).
I want to use sqlite memory database for all my testing and Postgresql for my development/production server.
But the SQL syntax is not same in both dbs. for ex: SQLite has autoincrement, and Postgresql has serial
Is it easy to port the SQL script from sqlite to postgresql... what are your solutions?
If you want me to use standard SQL, how should I go about generating primary key in both the databases?
My suggestion would be: don't. The capabilities of Postgresql are far beyond what SQLite can provide, particularly in the areas of date/numeric support, functions and stored procedures, ALTER support, constraints, sequences, other types like UUID, etc., and even using various SQLAlchemy tricks to try to smooth that over will only get you a slight bit further. In particular date and interval arithmetic are totally different beasts on the two platforms, and SQLite has no support for precision decimals (non floating-point) the way PG does. PG is very easy to install on every major OS and life is just easier if you go that route.
Don't do it. Don't test in one environment and release and develop in another. Your asking for buggy software using this process.
Although we started with sqllite for our testing environment we are seriously looking to having postgres running for each developer. We have scripts that build the test database that our unittests run against, and we have a 'development' version that the devs use.
We investigated running postgres 'in memory' on ramdisk, but this discussion: http://dbaspot.com/forums/postgresql/395602-memory-postgresql-database.html suggests that it isn't necessary.
We haven't run into any problems yet, but it is still early in the development process and we haven't had to do anything too fancy yet.
zzzeek points out some items that will probably trip us up soon :(
Best make the move now....
I'm starting on a new scientific project which has a lot of data (millions of entries) I'd like to store in an easily and quickly accessible format. I've come across a number of different potential options, but I'm not sure how to pick amongst them. My data can probably just be stored as a dictionary, or potentially a dictionary of dictionaries. Some potential considerations:
Speed. I can't load all the data off disk every time I start a new script, and I'd like as quick access to random entries as possible.
Ease-of-use. This is python. The storage should feel like python.
Stability/maturity. I'd like something that's currently supported, although something that works well but is still in development would be fine.
Ease of installation. My sysadmin should be able to get this running on our cluster.
I don't really care that much about the size of the storage, but it could be a consideration if an option is really terrible on this front. Also, if it matters, I'll most likely be creating the database once, and thereafter only reading from it.
Some potential options that I've started looking at (see this post):
pyTables
ZopeDB
shove
shelve
redis
durus
Any suggestions on which of these might be better for my purposes? Any better ideas? Some of these have a back-end; any suggestions on which file-system back-end would be best?
Might want to give mongodb a shot - the PyMongo library works with dictionaries and supports most Python types. Easy to install, very performant + scalable. MongoDB (and PyMongo) is also used in production at some big names.
A RDBMS.
Nothing is more realiable than using tables on a well known RDBMS. Postgresql comes to mind.
That automatically gives you some choices for the future like clustering. Also you automatically have a lot of tools to administer your database, and you can use it from other software written in virtually any language.
It is really fast.
In the "feel like python" point, I might add that you can use an ORM. A strong name is sqlalchemy. Maybe with the elixir "extension".
Using sqlalchemy you can leave your user/sysadmin choose which database backend he wants to use. Maybe they already have MySql installed - no problem.
RDBMSs are still the best choice for data storage.
I'm working on such a project and I'm using SQLite.
SQLite stores everything in one file and is part of Python's standard library. Hence, installation and configuration is virtually for free (ease of installation).
You can easily manage the database file with small Python scripts or via various tools. There is also a Firefox plugin (ease of installation / ease-of-use).
I find it very convenient to use SQL to filter/sort/manipulate/... the data. Although, I'm not an SQL expert. (ease-of-use)
I'm not sure if SQLite is the fastes DB system for this work and it lacks some features you might need e.g. stored procedures.
Anyway, SQLite works for me.
if you really just need dictionary-like storage, some of the new key/value or column stores like Cassandra or MongoDB might provide a lot more speed than you'd get with a relational database. Of course if you decide to go with RDBMS, SQLAlchemy is the way to go (disclaimer: I am its creator), but your desired featurelist seems to lean in the direction of "I just want a dictionary that feels like Python" - if you aren't interested in relational queries or strong ACIDity, those facets of RDBMS will probably feel cumbersome.
Sqlite -- it comes with python, fast, widely availible and easy to maintain
If you only need simple (dict like) access mechanisms and need efficiency for processing a lot of data, then HDF5 might be a good option. If you are going to be using numpy then it is really worth considering.
Go with a RDBMS is reliable scalable and fast.
If you need a more scalabre solution and don't need the features of RDBMS, you can go with a key-value store like couchdb that has a good python api.
The NEMO collaboration (building a cosmic neutrino detector underwater) had much of the same problems, and they used mysql and postgresql without major problems.
It really depends on what you're trying to do. An RDBMS is designed for relational data, so if your data is relational, then use one of the various SQL options. But it sounds like your data is more oriented towards a key-value store with very fast random GET operations. If that's the case, compare the benchmarks of the various key-stores, focusing on the GET speed. The ideal key-value store will keep or cache requests in memory, and be able to handle many GET requests concurrently. You may actually want to create your own benchmark suite so you can effectively compare random concurrent GET operations.
Why do you need a cluster? Is the size of each value very large? If not, you shouldn't need a cluster to handle storage of a million entries. But if you're storing large blobs of data, that matters, and you may need something easily supports read slaves and/or transparent partitioning. Some of the key-value stores are document oriented and/or optimized for storing larger values. Redis is technically more storage efficient for larger values due to the indexing overhead required for fast GETs, but that doesn't necessarily mean it's slower. In fact, the extra indexing makes lookups faster.
You're the only one that can truly answer this question, and I strongly recommend putting together a custom benchmark suite to test available options with actual usage scenarios. The data you get from that will give you more insight than anything else.