Which Python client library should I use for CouchdB? [closed]

Which Python client library should I use for CouchdB? [closed] - python

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I'm starting to experiment with CouchDB because it looks like the perfect solution for certain problems we have. Given that all work will be on a brand new project with no legacy dependencies, which client library would you suggest that I use, and why?
This would be easier if there was any overlap on the OSes we use. FreeBSD only has py-simplecouchdb already available in its ports collection, but that library's project website says to use CouchDBKit instead. Neither of those come with Ubuntu, which only ships with CouchDB. Since those two OSes don't have an libraries in common, I'll probably be installing something from source (and hopefully submitting packages to the Ubuntu and FreeBSD folks if I have time).
For those interested, I'd like to use CouchDB as a convenient intermediate storage place for data passed between various services - think of a message bus system but with less formality. For example, we have daemons that download and parse web pages, then send interesting bits to other daemons for further processing. A lot of those objects are ill-defined until runtime ("here's some HTML, plus a set of metadata, and some actions to run on it"). Rather than serialize it to an ad-hoc local network protocol or stick it in PostgreSQL, I'd much rather use something designed for the purpose. We're currently using NetWorkSpaces in this role, but it doesn't have nearly the breadth of support or the user community of CouchDB.

I have been using couchdb-python with quite a lot of success and as far as I know the guys of desktopcouch use it in ubuntu. The prerequisites are very basic and you should have not problems:
httplib2
simplejson or cjson
Python
CouchDB 0.9.x (earlier or later versions are unlikely to work as the interface is still changing)
For me some of the advantages are:
Pythonic interface. You can work with the database like if it was a dict.
Interface for design documents.
a CouchDB view server that allows writing view functions in Python
It also provides a couple of command-line tools:
couchdb-dump: Writes a snapshot of a CouchDB database
couchdb-load: Reads a MIME multipart file as generated by couchdb-dump and loads all the documents, attachments, and design documents into a CouchDB database.
couchdb-replicate: Can be used as an update-notification script to trigger replication between databases when data is changed.

If you're still considering CouchDB then I'll recommend Couchdbkit (http://www.couchdbkit.org). It's simple enough to quickly get a hang on and runs fine on my machine running Karmic Koala. Prior to that I've tried couchdb-python but some bugs (maybe ironed out by now) with httplib was giving me some errors (duplicate documents..etc) but Couchdbkit got me up and going so far without any problems.

spycouch
Simple Python API for CouchDB
Python library for easily manage CouchDB.
Compared to ordinarily available libraries on web, works with the latest version CouchDB - 1.2.1
Functionality
Create a new database on the server
Deleting a database from the server
Listing databases on the server
Database information
Database compression
Create map view
Map view
Listing documents in DB
Get document from DB
Save document to DB
Delete document from DB
Editing of a document
spycouch on >> https://github.com/cernyjan/repository

Considering the task you are trying to solve (distributed task processing) you should consider using one of the many tools designed for message passing rather than using a database. See for instance this SO question on running multiple tasks over many machines.
If you really want a simple casual message passing system, I recommend you shift your focus to MorbidQ. As you get more serious, use RabbitMQ or ActiveMQ. This way you reduce the latency in your system and avoid having many clients polling a database (and thus hammering that computer).
I've found that avoiding databases is a good idea (That's my blog) - and I have a end-to-end live data system running using MorbidQ here

I have written a couchdb client library built on python-requests (which is in most distributions). We use this library in production.
https://github.com/adamlofts/couchdb-requests
Robust CouchDB Python interface using python-requests.
Goals:
Only one way to do something
Fast and stable (connection pooled)
Explicit is better than implicit. Buffer sizes, connection pool size.
Specify query parameters, no **params in query functions

After skimming through the docs of many couchdb python libraries, my choice went to pycouchdb.
All I needed to know was very quick to grasp from the doc: https://py-couchdb.readthedocs.org/en/latest/ and it works like a charm.
Also, it works well with Python 3.

Related

Mongoose Schemas and inserting via python

I'm using a Nodejs server for a WebApp and Mongoose is acting as the ORM.
I've got some hooks that fire when data is inserted into a certain collection.
I want those hooks to fire when a python script inserts into the mongoDB instance. So if I have a pre save hook, it would modify the python scripts insert according to that hook.
Is this possible? If so, How do I do it?
If not, please feel free to explain to me why this is impossible and/or why I'm stupid.
EDIT: I came back to this question some months later and cringed just at how green I was when I asked it. All I really needed done was to create an API endpoint/flag on the NodeJS server that is specifically for automated tasks like the python script to send data to, and have mongoose in NodeJS land structure.

It is impossible because python and nodejs are 2 different runtimes - separate isolated processes which don't have access to each other memories.
Mongoose is a nodejs ORM - a library that maps Javascript objects to Mongodb documents and handles queries to the database.
All mongoose hooks belong to javascript space. They are executed on javascript objects before Mongoose sends any request to mongo. 2 outcomes from there: no other process can mess up with these hooks, not even another nodejs, and once the query reaches mongodb it's final, no more hooks, no more modifications.
One said a picture worth 100 words:
Neither python nor mongo are aware about mongoose hooks. All queries to mongo are initiated on the client side - a script sends a request to modify state of the database or to query state of the database.
The only way to trigger a javascript code execution from an update on mongodb side is to use change streams
Change streams are not mongoose hooks but can be used to hook into the updates on mongo side. It's a bit more advanced use of the database. It comes with additional requirements for mongo set up, size of the oplog, availability of the changestream clients, error handling etc.
You can learn more about change streams here https://docs.mongodb.com/manual/changeStreams/ I would strongly recommend to seek professional advice to architect such set up to avoid frustration and unexpected behaviour.

Mongo itself does not support hooks as a feature, mongoose gives you out of the box hooks you can use as you've mentioned. So what can you do to make it work in python?
Use an existing framework like python's eve, eve gives you database hooks, much like mongoose does. Now eve is a REST api framework which from your description doesn't sound like what you're looking for. Unfortunately I do not know of any package that's a perfect fit to your needs (if you do find one it would be great if you share a link in your question).
Build your own custom wrapper like this one. You can just built a custom wrapper class real quick and implement your own logic very easily.

OLAP Server for NodeJS

I have been looking for ways to provide analytics for an app which is powered by REST server written in NodeJs and MySQL. Discovered OLAP which can actually make this much easier.
And found a python library that provides an OLAP HTTP server called 'Slicer'
http://cubes.databrewery.org/
Can someone explain how this works? Does this mean I have to update my schema. And create what is called fact tables?
Can this be used in conjunction with my NodeJS App? Any examples? Since I have only created single server apps. Would python reside on the same nodejs server. How will it start? ('forever app.js' is my default script)
If I cant use python since I have no exp, what are basics to do it in Nodejs?
My model is basically list of words, so the olap queries I have are words made in days,weeks,months of length 2,5,10 letters in languages eng,french,german etc
Ideas, hints and guidance much appreciated!

As you found out, CUbes provides an HTTPS OLAP server (the slicer tool).
Can someone explain how this works?
As an OLAP server, you can issue OLAP queries to the server. The API is REST/JSON based, so you can easily query the server from Javascript, nodejs, Python or any other language of your choice via HTTP.
The server can answer OLAP queries. OLAP queries are based on a model of "facts" and "dimensions". You can for example query "the total sales amount for a given country and product, itemized by moonth".
Does this mean I have to update my schema. And create what is called fact tables?
OLAP queries are is built around the Facts and Dimension concepts.
OLAP-oriented datawarehousing strategies often involve the creation of these Fact and Dimension tables, building what is called a Star Schema or a Snowflake Schema. These schemas offer better performance for OLAP-type queries on relational databases. Data is often loaded by what is called an ETL process (it can be a simple script) that loads data in the appropriate form.
The Python Cubes framework, however, does not force you to alter your schema or create an alternate one. It has a SQL backend which allows you to define your model (in terms of Facts and Dimensions) without the need of changing the actual database model. This is the documentation for the model definition: https://pythonhosted.org/cubes/model.html .
However, in some cases you may still prefer to define a schema for Data Mining and use a transformation process to load data periodically. It depends on your needs, the amount of data you have, performance considerations, etc...
With Cubes you can also use other non RDBMS backends (ie MongoDB), some of which offer built-in aggregation capabilities that OLAP servers like Cubes can leverage.
Can this be used in conjunction with my NodeJS App?
You can issue queries to your Cubes Slicer server from NodeJS.
Any examples?
There is a Javascript client library to query Cubes. You probably want to use this one: https://github.com/Stiivi/cubes.js/
I don't know of any examples using NodeJS. You can try to get some inspiration from the included AngularJS application in Cubes (https://github.com/Stiivi/cubes/tree/master/incubator). Another client tool is CubesViewer which may be of use to you while building your model: http://jjmontesl.github.io/cubesviewer/ .
Since I have only created single server apps. Would python reside on the same nodejs server. How will it start? ('forever app.js' is my default script)
You would run Cubes Slicer server as a web application (directly from your web server, ie. Apache). For example, with Apache, you would use apache-wsgi mod which allows to serve python applications.
Slicer can also run as a small web server in a standalone process, which is very handy during development (but I wouldn't recommend for production environments). In this case, it will be listening on a different port (typically: http://localhost:5000 ).
If I cant use python since I have no exp, what are basics to do it in Nodejs?
You don't really need to use Python at all. You can configure and use Python Cubes as OLAP server, and run queries from Javascript code (ie. directly from the browser). From the client point of view, is like a database system which you can query via HTTP and get responses in JSON format.

Standalone Desktop App with Centralized Database (file) w/o Database Server?

I have been developing a fairly simple desktop application to be used by a group of 100-150 people within my department mainly for reporting. Unfortunately, I have to build it within some pretty strict confines similar to the specs called out in this post. The application will just be a self contained executable with no need to install.
The problem I'm running into is figuring out how to handle the database need. There will probably only be about 1GB of data for the app, but it needs to be available to everyone.
I would embed the database with the application (SQLite), but the data needs to be refreshed every week from a centralized process, so I figure it would be easier to maintain one database, rather than pushing updates down to the apps. Plus users will need to write to the database as well and those updates need to be seen by everyone.
I'm not allowed to set up a server for the database, so that rules out any good options for a true database. I'm restricted to File Shares or SharePoint.
It seems like I'm down to MS Access or SQLite. I'd prefer to stick with SQLite because I'm a fan of python and SQLAlchemy - but based on what I've read SQLite is not a good solution for multiple users accessing it over the network (or even possible).
Is there another option I haven't discovered for this setup or am I stuck working with MS Access? Perhaps I'll need to break down and work with SharePoint lists and apps?
I've been researching this for quite a while now, and I've run out of ideas. Any help is appreciated.
FYI, as I'm sure you can tell, I'm not a professional developer. I have enough experience in web / python / vb development that I can get by - so I was asked to do this as a side project.

SQLite can operate across a network and be shared among different processes. It is not a good solution when the application is write-heavy (because it locks the database file for the duration of a write), but if the application is mostly reporting it may be a perfectly reasonable solution.

As my options are limited, I decided to go with a built in database for each app using SQLite. The db will only need updated every week or two, so I figured a 30 second update by pulling from flat files will be OK. then the user will have all data locally to browse as needed.

Small "embeddable" database that can also be synced over the network?

I am looking for a small database that can be "embedded" into my Python application without running a separate server, as one can do with SQLite or Metakit. I don't need an SQL database, in fact storing free-form data like Python dictionaries or JSON is preferable.
The other requirement is that to be able to run an instance of the database on a server, and have instances of my application (clients) sync the database with the server (two-way), similar to what CouchDB replication can do.
Is there a database that will do this?

From what you describe, it sounds like you could get by using pickle and FTP.

If you don't need an SQL database, what's wrong with CouchDB? You can spawn a local process to serve the DB, and you could easily write a server wrapper to allow only access from your app. I'm not sure about the access story, but I believe the latest Ubuntu uses CouchDB for synchronizeable user-level data.

Seems like the perfect job for CouchDB: 2 way sync is incredibly easy, schema-less JSON documents are the native format. If you're using python, couchdb-python is a great way to work with CouchDB.

Do you need clients to work offline and then resync when they reconnect to the network? I don't know if MongoDB can handle the offline client scenario, but if the client is online all the time, MongoDB might be a good solution too. It has pretty goode python support. Still a separate process, but perhaps easier to get running on Windows than CouchDB.

BerkeleyDB might be another option to check out, and it's lightweight enough. easy_install bsddb3 if you need a Python interface.

HSQLDB does this, but unfortunately it's Java rather than Python.
Firebird SQL might be closer to what you want, since it does seem to have a Python interface.

backend for python

which is the best back end for python applications and what is the advantage of using sqlite ,how it can be connected to python applications

What do you mean with back end? Python apps connect to SQLite just like any other database, you just have to import the correct module and check how to use it.
The advantages of using SQLite are:
You don't need to setup a database server, it's just a file
No configurations needed
Cross platform
Mainly, desktops applications are the ones that take real advantage of this. For web apps, SQLite is not recommended, since the file containing the data, is easily readable (lacks any kind of encryption), and when the web server lacks special configuration, the file is downloadable by anyone.

Django, Twisted, and CherryPy are popular Python "Back-Ends" as far as web applications go, with Twisted likely being the most flexible as far as networking is concerned.
SQLite can, as has been previously posted, be directly interfaced with using SQL commands as it has native bindings for Python, or it can be accessed with an Object Relational Manager such as SQLObject (another Python library).
As far as performance is concered, SQLite is fairly scalable and should be able to handle most use cases that don't require a seperate database server (nothing enterprise level). An additional benefit of SQLite is that the database is self-contained in a single file allowing for easy backup while remained a common enough format that multiple applications can access the data. A word of advice on using SQLite with Python, however, is that you may run into issues with threading (in the past most of the bindings for SQLite were not thread-safe, although this may have changed over time).

The language you are using at the application layer has little to do with your database choice underneath. You need to examine the advantages of other DB packages to get an idea of what you want.
Here are some popular database packages for cheap or free:
ms sql server express, pg/sql, mysql

If you mean "what is the best database?" then there's simply no way to answer this question. If you just want a small database that won't be used by more than a handful of people at a time, SQLite is what you're looking for. If you're running a database for a giant corporation serving thousands, you're probably looking for Oracle. In between those, you have MySQL, PostgreSQL, SQL Server, db2, and probably more.
If you're familiar with one of those, that may be the best to go with from a practical standpoint. If you're doing a typical webapp, my advice would be to go with MySQL or PostgreSQL as they're free and well supported by just about any ORM you could think of (my personal preference is towards PostgreSQL, but I'm not experienced enough with either of these to make a good argument one way or another). If you do go with one of those two, my recommendation is to use storm as the ORM.
(And yes, there are free versions of SQL Server and Oracle. You won't have as many choices as far as ORMs go though)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.