I'm using a Node.js server for a web app, with Mongoose acting as the ORM.
I've got some hooks that fire when data is inserted into a certain collection.
I want those hooks to fire when a Python script inserts into the MongoDB instance. So if I have a pre save hook, it would modify the Python script's insert according to that hook.
Is this possible? If so, how do I do it?
If not, please feel free to explain to me why this is impossible and/or why I'm stupid.
EDIT: I came back to this question some months later and cringed at how green I was when I asked it. All I really needed was an API endpoint (or flag) on the Node.js server specifically for automated tasks like the Python script to send data to, letting Mongoose structure that data on the Node.js side.
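The endpoint approach from the EDIT can be sketched from the Python side with the standard library alone. The URL and the `/api/ingest` path are hypothetical, and the sketch assumes the Node.js route parses the JSON body and saves it through a Mongoose model (for example with `Model.create`, which runs `pre('save')` middleware), so the existing hooks apply there rather than in Python.

```python
import json
import urllib.request

# Hypothetical endpoint on the Node.js server; adjust host, port and path to your app.
INGEST_URL = "http://localhost:3000/api/ingest"


def build_ingest_request(doc: dict, url: str = INGEST_URL) -> urllib.request.Request:
    """Wrap a document as a JSON POST so the Node.js side can save it through Mongoose."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(doc: dict) -> int:
    """POST the document and return the HTTP status.

    Any pre('save') hooks run on the Node.js server, assuming the route saves via a model.
    """
    with urllib.request.urlopen(build_ingest_request(doc), timeout=10) as resp:
        return resp.status
```

The Python script then never talks to MongoDB directly, so there is only one place (the Mongoose schema) where the hook logic lives.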
It is impossible because Python and Node.js are two different runtimes: separate, isolated processes that don't have access to each other's memory.
Mongoose is a Node.js ORM, a library that maps JavaScript objects to MongoDB documents and handles queries to the database.
All Mongoose hooks live in JavaScript space. They are executed on JavaScript objects before Mongoose sends any request to MongoDB. Two things follow from that: no other process can interfere with these hooks, not even another Node.js process; and once the query reaches MongoDB it is final, with no more hooks and no more modifications.
Neither Python nor MongoDB is aware of Mongoose hooks. All queries to MongoDB are initiated on the client side: a script sends a request to modify the state of the database or to query it.
The only way to trigger JavaScript code execution from an update on the MongoDB side is to use change streams.
Change streams are not Mongoose hooks, but they can be used to hook into updates on the MongoDB side. It's a more advanced use of the database, and it comes with additional requirements: the MongoDB setup (change streams need a replica set or sharded cluster), the size of the oplog, the availability of the change stream clients, error handling, etc.
You can learn more about change streams here: https://docs.mongodb.com/manual/changeStreams/ I would strongly recommend seeking professional advice when architecting such a setup, to avoid frustration and unexpected behaviour.
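The same mechanism is available to the Python script through PyMongo's `Collection.watch()`, shown here to keep the examples in one language (on the Node.js side the equivalent is Mongoose's `Model.watch()`). A minimal sketch, assuming PyMongo is installed and MongoDB runs as a replica set; the function name and the insert-only filter are illustrative choices. Note that an event only arrives after the write is committed, so unlike a pre save hook it cannot change the insert; it can only react, for example by issuing a follow-up update.

```python
from typing import Callable, Optional

# Server-side filter: only forward insert events from the change stream.
INSERT_ONLY = [{"$match": {"operationType": "insert"}}]


def watch_inserts(collection, on_insert: Callable[[dict], None],
                  limit: Optional[int] = None) -> int:
    """Call on_insert(document) for every insert reported by the change stream.

    `collection` is assumed to be a pymongo Collection (anything with a compatible
    watch() method works). MongoDB must be running as a replica set or sharded cluster.
    Returns the number of events handled.
    """
    handled = 0
    with collection.watch(INSERT_ONLY) as stream:
        for change in stream:  # blocks, waiting for the next event
            # Insert events carry the newly stored document in "fullDocument".
            on_insert(change["fullDocument"])
            handled += 1
            if limit is not None and handled >= limit:
                break
    return handled
```

Usage would be along the lines of `watch_inserts(MongoClient().mydb.words, print)`, where the database and collection names are placeholders.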
MongoDB itself does not support hooks as a feature; Mongoose gives you hooks out of the box, as you've mentioned. So what can you do to make it work in Python?
Use an existing framework like Python's Eve, which gives you database hooks much like Mongoose does. However, Eve is a REST API framework, which from your description doesn't sound like what you're looking for. Unfortunately, I don't know of any package that's a perfect fit for your needs (if you do find one, it would be great if you shared a link in your question).
Build your own custom wrapper like this one. You can quickly build a custom wrapper class and implement your own logic very easily.
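A custom wrapper of that kind might look like the following. This is a hypothetical sketch, not an existing package: it assumes a PyMongo `Collection` underneath and only wraps `insert_one()`, running registered hooks in order before the write, similar in spirit to Mongoose's `pre('save')`.

```python
from typing import Callable, List

Hook = Callable[[dict], dict]


class HookedCollection:
    """Hypothetical wrapper that runs pre-save hooks, Mongoose-style, before inserting.

    `collection` is assumed to be a pymongo Collection; only insert_one() is used.
    """

    def __init__(self, collection) -> None:
        self._collection = collection
        self._pre_save: List[Hook] = []

    def pre_save(self, hook: Hook) -> Hook:
        """Register a hook; also usable as a decorator."""
        self._pre_save.append(hook)
        return hook

    def insert_one(self, doc: dict):
        # Work on a copy so the caller's dict is not mutated, then apply
        # hooks in registration order; each hook returns the updated document.
        doc = dict(doc)
        for hook in self._pre_save:
            doc = hook(doc)
        return self._collection.insert_one(doc)
```

Usage would be wrapping something like `MongoClient().mydb.users` (placeholder names) and decorating functions with `@users.pre_save`. The hooks only run for writes that go through this wrapper, so the logic ends up duplicated between Python and the Mongoose schema.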
Looking for a bit of advice.
I have a current architecture of Django and PostgreSQL, where a whole lot of activity is happening to the data via the ORM, through scheduled jobs. The data on the backend is processed and updated at roughly 30-second intervals.
The data is available to the front-end through a bunch of DRF serialisers (basic REST API). This is just being piped to standard HTML templates at the moment.
I'd like the React front-end to mirror this behaviour, and am looking for best-practice advice on how this is typically done. I know in practice how this works in other frameworks but am not certain how to do it well here (namely, connecting React's DOM automation to server-side updates).
(I don't want to get involved with websockets, at all.)
Theoretically, I understand there are a few ways to do this:
Front-end AJAX polling the API for new data
HTTP/2 Server Push
Something built into React that will load stuff in incrementally?
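Of these, AJAX polling is the simplest to retrofit onto an existing DRF API. A minimal sketch of the server side of incremental polling, assuming each record has an `updated_at` timestamp (a hypothetical field name): the client sends back the cursor from its previous response and only receives what changed since. In Django the filter would typically be a queryset such as `Model.objects.filter(updated_at__gt=since)` behind a DRF view; the plain-Python version below only shows the logic. On the React side, a timer (e.g. `setInterval` inside a `useEffect`) would call the endpoint roughly every 30 seconds and merge the returned rows into component state.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def updates_since(rows: Iterable[dict],
                  since: Optional[datetime]) -> Tuple[List[dict], Optional[datetime]]:
    """Return rows modified after `since`, oldest first, plus the cursor for the next poll.

    Each row is assumed to have an "updated_at" datetime (hypothetical field name).
    Passing since=None returns everything, which is what the first poll does.
    """
    changed = sorted(
        (r for r in rows if since is None or r["updated_at"] > since),
        key=lambda r: r["updated_at"],
    )
    # Advance the cursor to the newest change seen; keep the old one if nothing changed.
    cursor = changed[-1]["updated_at"] if changed else since
    return changed, cursor
```

Because the comparison is strict, a row committed later but carrying exactly the cursor's timestamp could be missed; a monotonically increasing version number avoids that.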
Appreciate the advice - (short examples would be really helpful if possible).
First, use Django Channels; the documentation is great, by the way.
Django Channels
Next, connect React to events from your models: when you save a model or create a new instance, call Channels after save() to broadcast that object to a group. Of course, you also need to write the URLs where clients will be able to get responses from Channels.
I have JS running that essentially gets user entries from my HTML session storage and pushes them to a DB. I also need to use an HTTP request to pass a JSON object containing the entries to a Python file hosted somewhere else.
Does anyone know of documentation I could look at, or how to get JSON objects from JS to Python?
My client does not want me to grab the variables directly from the DB.
You have to create some sort of communication channel between the JavaScript and Python code. This could be anything: SOAP, HTTP, RPC, or any number and flavor of message queues.
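HTTP is usually the shortest path. A minimal sketch of the receiving end using only Python's standard library (in production this would more likely be a Flask or Django view); the handler name, port, and response shape are illustrative assumptions. The browser would send the entries with `fetch(url, {method: "POST", headers: {"Content-Type": "application/json"}, body: JSON.stringify(entries)})`. If the Python service lives on a different origin, it must also answer the browser's CORS preflight (`OPTIONS`) request, which this sketch does not handle.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EntriesHandler(BaseHTTPRequestHandler):
    """Accepts a JSON array of entries via POST and replies with a small JSON receipt."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            entries = json.loads(self.rfile.read(length))
        except ValueError:  # empty, malformed, or non-UTF-8 body
            entries = None
        if not isinstance(entries, list):
            self.send_error(400, "Expected a JSON array of entries")
            return
        # Hypothetical processing step: here we only count what arrived.
        payload = json.dumps({"received": len(entries)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def make_server(port: int = 8000) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), EntriesHandler)
```

Running `make_server().serve_forever()` starts it on port 8000.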
If nothing like that is in place, it's quite the long way around. A complex application might warrant it; think microservices communicating across some sort of service bus. It's a sound strategy, and perhaps that's why your client is asking for it.
You already have Firebase, though! Firebase is a real-time database that already has many of the characteristics of a queue. The simplest and most idiomatic thing to do would be to let the Python code be notified of changes by Firebase: Firebase as a service bus is a nice strategy too!
I have been looking for ways to provide analytics for an app powered by a REST server written in Node.js and MySQL, and discovered OLAP, which could make this much easier.
I also found a Python library that provides an OLAP HTTP server called 'Slicer':
http://cubes.databrewery.org/
Can someone explain how this works? Does this mean I have to update my schema and create what are called fact tables?
Can this be used in conjunction with my Node.js app? Any examples? I have only created single-server apps: would Python reside on the same server as Node.js, and how would it be started? ('forever app.js' is my default script.)
If I can't use Python since I have no experience with it, what are the basics of doing this in Node.js?
My model is basically a list of words, so my OLAP queries are things like words created per day, week, or month, of length 2, 5, or 10 letters, in languages such as English, French, German, etc.
Ideas, hints and guidance much appreciated!
As you found out, Cubes provides an HTTP OLAP server (the slicer tool).
Can someone explain how this works?
As an OLAP server, you can issue OLAP queries to the server. The API is REST/JSON based, so you can easily query the server from Javascript, nodejs, Python or any other language of your choice via HTTP.
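Because the Slicer is just an HTTP/JSON API, a query boils down to a URL. A minimal sketch, assuming a Cubes 1.x Slicer (where aggregation is served at `/cube/<name>/aggregate`) and a hypothetical cube `words` with `date` and `language` dimensions modelled on the question; check the `cut` and `drilldown` syntax against the Slicer documentation for your version. It is written in Python to keep the examples in one language, but the same GET request works from Node.js or cubes.js.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

SLICER_URL = "http://localhost:5000"  # typical address of a standalone development Slicer


def aggregate_url(cube: str, drilldown: Optional[str] = None,
                  cut: Optional[str] = None, base: str = SLICER_URL) -> str:
    """Build an aggregate query URL, e.g. drilldown="date:month", cut="language:english"."""
    params = {}
    if drilldown:
        params["drilldown"] = drilldown
    if cut:
        params["cut"] = cut
    path = f"{base}/cube/{urllib.parse.quote(cube, safe='')}/aggregate"
    query = urllib.parse.urlencode(params)
    return f"{path}?{query}" if query else path


def fetch_cells(url: str) -> List[dict]:
    """Run the query; each cell holds a drilldown member plus its aggregate values."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("cells", [])
```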
The server can answer OLAP queries. OLAP queries are based on a model of "facts" and "dimensions". You can, for example, query "the total sales amount for a given country and product, itemized by month".
Does this mean I have to update my schema and create what are called fact tables?
OLAP queries are built around the concepts of facts and dimensions.
OLAP-oriented data warehousing strategies often involve the creation of these Fact and Dimension tables, building what is called a Star Schema or a Snowflake Schema. These schemas offer better performance for OLAP-type queries on relational databases. Data is often loaded by what is called an ETL process (it can be a simple script) that loads data in the appropriate form.
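To make "fact table" concrete with the word model from the question, here is a minimal star-schema sketch. It uses SQLite from Python's standard library only so that it runs anywhere; all table and column names are hypothetical, and in practice `load()` would be the periodic ETL job reading from the app's MySQL tables.

```python
import sqlite3
from datetime import date
from typing import Iterable, List, Tuple


def build_star_schema(conn: sqlite3.Connection) -> None:
    """One fact table (one row per word) plus two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_language (language_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT UNIQUE,
                               month TEXT, year INTEGER);
        CREATE TABLE fact_word (word TEXT, length INTEGER,
                                language_id INTEGER REFERENCES dim_language,
                                date_id INTEGER REFERENCES dim_date);
    """)


def load(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, date]]) -> None:
    """Tiny ETL step: (word, language, created) rows from the app's own tables."""
    for word, language, created in rows:
        conn.execute("INSERT OR IGNORE INTO dim_language (name) VALUES (?)", (language,))
        conn.execute("INSERT OR IGNORE INTO dim_date (day, month, year) VALUES (?, ?, ?)",
                     (created.isoformat(), created.strftime("%Y-%m"), created.year))
        conn.execute("""
            INSERT INTO fact_word (word, length, language_id, date_id)
            SELECT ?, ?, l.language_id, d.date_id
            FROM dim_language AS l, dim_date AS d
            WHERE l.name = ? AND d.day = ?""",
                     (word, len(word), language, created.isoformat()))
    conn.commit()


def words_per_month(conn: sqlite3.Connection, length: int) -> List[tuple]:
    """OLAP-style question: how many words of a given length, per month and language?"""
    return conn.execute("""
        SELECT d.month, l.name, COUNT(*)
        FROM fact_word AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        JOIN dim_language AS l ON l.language_id = f.language_id
        WHERE f.length = ?
        GROUP BY d.month, l.name
        ORDER BY d.month, l.name""", (length,)).fetchall()
```

The dimension tables are what an OLAP tool drills down through (day, month, year; language), while the fact table holds one row per event being counted.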
The Python Cubes framework, however, does not force you to alter your schema or create an alternate one. It has a SQL backend which allows you to define your model (in terms of Facts and Dimensions) without the need of changing the actual database model. This is the documentation for the model definition: https://pythonhosted.org/cubes/model.html .
However, in some cases you may still prefer to define a schema for Data Mining and use a transformation process to load data periodically. It depends on your needs, the amount of data you have, performance considerations, etc...
With Cubes you can also use non-RDBMS backends (e.g. MongoDB), some of which offer built-in aggregation capabilities that OLAP servers like Cubes can leverage.
Can this be used in conjunction with my NodeJS App?
You can issue queries to your Cubes Slicer server from NodeJS.
Any examples?
There is a Javascript client library to query Cubes. You probably want to use this one: https://github.com/Stiivi/cubes.js/
I don't know of any examples using NodeJS. You can try to get some inspiration from the included AngularJS application in Cubes (https://github.com/Stiivi/cubes/tree/master/incubator). Another client tool is CubesViewer which may be of use to you while building your model: http://jjmontesl.github.io/cubesviewer/ .
I have only created single-server apps: would Python reside on the same server as Node.js, and how would it be started? ('forever app.js' is my default script.)
You would run the Cubes Slicer server as a web application, served directly by your web server (e.g. Apache). With Apache, for example, you would use the mod_wsgi module, which lets Apache serve Python applications.
Slicer can also run as a small standalone web server in its own process, which is very handy during development (though I wouldn't recommend it for production environments). In this case it will listen on a different port (typically http://localhost:5000).
If I can't use Python since I have no experience with it, what are the basics of doing this in Node.js?
You don't really need to program in Python at all. You can configure and run Python Cubes as an OLAP server, and run queries from JavaScript code (e.g. directly from the browser). From the client's point of view, it is like a database system that you query via HTTP and that responds in JSON.
I am overly accustomed to the Django ORM and feel handicapped when trying to build a standalone Python Twisted application that needs database integration.
SQLAlchemy looks promising, true. But I am trying to tinker with Twisted as well and have been unable to find anything along the lines of a good async Python ORM.
What I have found (https://stackoverflow.com/a/1705987/338691) would force me to write raw SQL queries, which doesn't feel quite right after my lengthy stint with Django.
So how does one play with database schema in a twisted application?
There is also http://findingscience.com/twistar/, which unfortunately follows the Active Record pattern; last time I checked, the author considered migrations out of scope for the project. So you would end up writing migrations manually anyway (maybe there could be an adapter for Alembic; that would be cool).
Also, I remember seeing a GitHub repo where the author tries to make Twisted play nicely with SQLAlchemy (without deferToThread), but I haven't followed it to see whether it succeeded, and I can't find the URL. (See also: Twisted + SQLAlchemy and the best way to do it.)
And lastly, recent versions of psycopg support setting an async callback. Maybe that could be leveraged somehow (integration with SQLAlchemy, perhaps).
UPDATE: this interesting project has also appeared recently: alchimia.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I'm starting to experiment with CouchDB because it looks like the perfect solution for certain problems we have. Given that all work will be on a brand new project with no legacy dependencies, which client library would you suggest that I use, and why?
This would be easier if there was any overlap on the OSes we use. FreeBSD only has py-simplecouchdb already available in its ports collection, but that library's project website says to use CouchDBKit instead. Neither of those comes with Ubuntu, which only ships with CouchDB. Since those two OSes don't have any libraries in common, I'll probably be installing something from source (and hopefully submitting packages to the Ubuntu and FreeBSD folks if I have time).
For those interested, I'd like to use CouchDB as a convenient intermediate storage place for data passed between various services - think of a message bus system but with less formality. For example, we have daemons that download and parse web pages, then send interesting bits to other daemons for further processing. A lot of those objects are ill-defined until runtime ("here's some HTML, plus a set of metadata, and some actions to run on it"). Rather than serialize it to an ad-hoc local network protocol or stick it in PostgreSQL, I'd much rather use something designed for the purpose. We're currently using NetWorkSpaces in this role, but it doesn't have nearly the breadth of support or the user community of CouchDB.
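For this message-bus style of use, the relevant CouchDB feature is the `_changes` feed: a consumer daemon can long-poll it, receive each new document as it arrives, and resume from the last sequence it saw. The client libraries recommended in the answers wrap this same HTTP API. A minimal standard-library sketch, assuming CouchDB on its default port and a hypothetical `pages` database:

```python
import json
import urllib.parse
import urllib.request
from typing import Any, List, Tuple

COUCH_URL = "http://localhost:5984"  # CouchDB's default HTTP port


def changes_url(db: str, since: Any = 0, base: str = COUCH_URL) -> str:
    """Long-poll URL for a database's _changes feed, with full documents included."""
    query = urllib.parse.urlencode({"feed": "longpoll", "since": since, "include_docs": "true"})
    return f"{base}/{urllib.parse.quote(db, safe='')}/_changes?{query}"


def parse_changes(payload: dict) -> Tuple[List[dict], Any]:
    """Return the changed (non-deleted) documents and the sequence to resume from."""
    docs = [row["doc"] for row in payload.get("results", [])
            if "doc" in row and not row.get("deleted")]
    return docs, payload.get("last_seq")


def poll_once(db: str, since: Any = 0) -> Tuple[List[dict], Any]:
    """Block until something changes (or CouchDB's long-poll timeout passes)."""
    with urllib.request.urlopen(changes_url(db, since)) as resp:
        return parse_changes(json.load(resp))
```

A processing daemon would loop on `docs, since = poll_once("pages", since)`, handle `docs`, and repeat.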
I have been using couchdb-python with quite a lot of success, and as far as I know the desktopcouch developers use it in Ubuntu. The prerequisites are very basic and you should not have any problems:
httplib2
simplejson or cjson
Python
CouchDB 0.9.x (earlier or later versions are unlikely to work as the interface is still changing)
For me some of the advantages are:
Pythonic interface. You can work with the database as if it were a dict.
Interface for design documents.
A CouchDB view server that allows writing view functions in Python.
It also provides a couple of command-line tools:
couchdb-dump: Writes a snapshot of a CouchDB database
couchdb-load: Reads a MIME multipart file as generated by couchdb-dump and loads all the documents, attachments, and design documents into a CouchDB database.
couchdb-replicate: Can be used as an update-notification script to trigger replication between databases when data is changed.
If you're still considering CouchDB, then I'd recommend Couchdbkit (http://www.couchdbkit.org). It's simple enough to get the hang of quickly and runs fine on my machine running Karmic Koala. Before that I tried couchdb-python, but some bugs with httplib (maybe ironed out by now) were giving me errors (duplicate documents, etc.), whereas Couchdbkit has got me up and running without any problems so far.
spycouch
Simple Python API for CouchDB
A Python library for easily managing CouchDB.
Unlike other libraries commonly available on the web, it works with the latest version of CouchDB (1.2.1).
Functionality
Create a new database on the server
Deleting a database from the server
Listing databases on the server
Database information
Database compression
Create map view
Map view
Listing documents in DB
Get document from DB
Save document to DB
Delete document from DB
Edit a document
spycouch on GitHub: https://github.com/cernyjan/repository
Considering the task you are trying to solve (distributed task processing) you should consider using one of the many tools designed for message passing rather than using a database. See for instance this SO question on running multiple tasks over many machines.
If you really want a simple, casual message-passing system, I recommend shifting your focus to MorbidQ. As you get more serious, use RabbitMQ or ActiveMQ. This way you reduce the latency in your system and avoid having many clients polling a database (and thus hammering that computer).
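The difference between pushing messages and polling a database can be sketched in-process with Python's standard library: `queue.Queue` stands in for the broker, and the consumer blocks until work arrives instead of repeatedly querying. This is only an analogy; across processes or machines you would talk to a broker such as MorbidQ, RabbitMQ, or ActiveMQ through a client library, but the producer/consumer shape is the same. The processing step is hypothetical.

```python
import queue
import threading
from typing import List

STOP = object()  # sentinel telling the consumer to exit


def consumer(inbox: queue.Queue, results: List[dict]) -> None:
    """Processing daemon: blocks on the queue instead of polling a database."""
    while True:
        item = inbox.get()  # sleeps until a message arrives
        if item is STOP:
            return
        # Hypothetical processing step: keep the "interesting bits" of a parsed page.
        results.append({"url": item["url"], "length": len(item["html"])})


def run_pipeline(pages: List[dict]) -> List[dict]:
    """Producer side: the download/parse daemon pushes each page as a message."""
    inbox: queue.Queue = queue.Queue()
    results: List[dict] = []
    worker = threading.Thread(target=consumer, args=(inbox, results), daemon=True)
    worker.start()
    for page in pages:
        inbox.put(page)
    inbox.put(STOP)
    worker.join()
    return results
```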
I've found that avoiding databases is a good idea (that's my blog), and I have an end-to-end live data system running using MorbidQ here.
I have written a couchdb client library built on python-requests (which is in most distributions). We use this library in production.
https://github.com/adamlofts/couchdb-requests
Robust CouchDB Python interface using python-requests.
Goals:
Only one way to do something
Fast and stable (connection pooled)
Explicit is better than implicit. Buffer sizes, connection pool size.
Specify query parameters, no **params in query functions
After skimming through the docs of many couchdb python libraries, my choice went to pycouchdb.
All I needed to know was very quick to grasp from the doc: https://py-couchdb.readthedocs.org/en/latest/ and it works like a charm.
Also, it works well with Python 3.