I have a Django app built in an API style, and I need to benchmark it.
I want to use Django's unit testing framework to build benchmarks as tests for the API endpoints. They would use the Django test client to query the endpoints, collect data about the SQL queries and their timing, and save it somewhere.
Is this a sane idea at all?
I also want to see timings for the Python code with stack traces, and to see which code causes which SQL queries. Does anybody know of approaches for collecting such information without modifying the app's code?
Just an option that I've used before: nose and its --with-xunit plugin:
This plugin provides test results in the standard XUnit XML format.
In the test results you'll see running time for each test case, stack traces for failures etc.
Also, django-debug-toolbar and django database logging might help you with getting data about SQL queries.
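For example, here is a minimal sketch of the benchmark-as-test idea using Django's CaptureQueriesContext; the endpoint path and the output file name are just placeholders:

    import json

    from django.db import connection
    from django.test import TestCase
    from django.test.utils import CaptureQueriesContext

    class EndpointBenchmark(TestCase):
        def test_list_endpoint_queries(self):
            # Record every SQL statement issued while serving the request
            with CaptureQueriesContext(connection) as ctx:
                response = self.client.get("/api/items/")
            self.assertEqual(response.status_code, 200)
            # Each captured entry is a dict with "sql" and "time" keys
            with open("benchmark-items.json", "w") as fh:
                json.dump(ctx.captured_queries, fh, indent=2)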
Also there are other suggestions here:
benchmarking django apps
Is there a library to benchmark my Django App for SQL requests?
log all sql queries
Django performance testing suite that'll report on metrics (db queries etc.)
Hope that helps.
Related
I'm using a Nodejs server for a WebApp and Mongoose is acting as the ORM.
I've got some hooks that fire when data is inserted into a certain collection.
I want those hooks to fire when a Python script inserts into the MongoDB instance. So if I have a pre-save hook, it would modify the Python script's insert according to that hook.
Is this possible? If so, How do I do it?
If not, please feel free to explain to me why this is impossible and/or why I'm stupid.
EDIT: I came back to this question some months later and cringed at how green I was when I asked it. All I really needed was to create an API endpoint/flag on the NodeJS server specifically for automated tasks like the Python script to send data to, and have Mongoose structure the data on the NodeJS side.
It is impossible because Python and NodeJS are two different runtimes: separate, isolated processes which don't have access to each other's memory.
Mongoose is a nodejs ORM - a library that maps Javascript objects to Mongodb documents and handles queries to the database.
All Mongoose hooks belong to JavaScript space. They are executed on JavaScript objects before Mongoose sends any request to Mongo. Two things follow from this: no other process can interfere with these hooks, not even another NodeJS process; and once the query reaches MongoDB it's final, with no more hooks and no more modifications.
Neither Python nor Mongo is aware of Mongoose hooks. All queries to Mongo are initiated on the client side: a script sends a request to modify the state of the database or to query it.
The only way to trigger JavaScript code execution from an update on the MongoDB side is to use change streams.
Change streams are not Mongoose hooks, but they can be used to hook into updates on the Mongo side. This is a somewhat more advanced use of the database, and it comes with additional requirements for the Mongo set-up: the size of the oplog, availability of the change-stream clients, error handling, etc.
You can learn more about change streams here: https://docs.mongodb.com/manual/changeStreams/. I would strongly recommend seeking professional advice to architect such a set-up, to avoid frustration and unexpected behaviour.
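For a rough idea, a minimal sketch of a change-stream consumer in Python with PyMongo (note that change streams require MongoDB to run as a replica set; all names here are illustrative):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    collection = client["mydb"]["mycollection"]

    # Blocks and yields a notification for every insert on the collection,
    # no matter which process (node, python, mongo shell) performed the write
    with collection.watch([{"$match": {"operationType": "insert"}}]) as stream:
        for change in stream:
            print(change["fullDocument"])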
Mongo itself does not support hooks as a feature; Mongoose gives you hooks out of the box, as you've mentioned. So what can you do to make it work in Python?
Use an existing framework like Python's Eve. Eve gives you database hooks, much like Mongoose does. However, Eve is a REST API framework, which from your description doesn't sound like what you're looking for. Unfortunately, I don't know of any package that's a perfect fit for your needs (if you do find one, it would be great if you shared a link in your question).
Or build your own custom wrapper, like this one. You can put a custom wrapper class together quickly and implement your own logic very easily, as sketched below.
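A rough sketch of such a wrapper, assuming PyMongo (every name here is illustrative, not a real library API):

    from datetime import datetime, timezone

    from pymongo import MongoClient

    class HookedCollection:
        # Wraps a PyMongo collection and runs pre-save hooks before every
        # insert, mimicking a Mongoose pre('save') hook on the Python side
        def __init__(self, collection, pre_save=None):
            self._collection = collection
            self._pre_save = pre_save or []

        def insert_one(self, document):
            for hook in self._pre_save:
                document = hook(document)
            return self._collection.insert_one(document)

    def add_timestamps(doc):
        doc.setdefault("created_at", datetime.now(timezone.utc))
        return doc

    client = MongoClient()
    words = HookedCollection(client["mydb"]["words"], pre_save=[add_timestamps])
    words.insert_one({"text": "hello"})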
I'm currently looking at a codebase and trying to figure out which database tables are hit for specific API calls. I'd like a way to programmatically find out the specific tables being hit for an API call. The API seems to mainly hit PostgreSQL and MSSQL databases, with some calls also hitting DynamoDB. Unfortunately, the API seems complicated, so I really don't want to deal with enabling logging in all of the projects that impact the API. I checked out some Python libraries that create diagrams from Python code, but they did not seem too helpful. Would it be feasible to change the logging in the databases to log complete queries for a time? I figure if I set up a test version of the APIs and databases with increased logging, I can just hit the db with specific calls and then get the table info from the database logs.
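A minimal sketch of the increased-logging idea for the PostgreSQL side (superuser required; connection details are illustrative; remember to run ALTER SYSTEM RESET log_statement afterwards):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=postgres")
    conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction
    with conn.cursor() as cur:
        cur.execute("ALTER SYSTEM SET log_statement = 'all'")
        cur.execute("SELECT pg_reload_conf()")  # apply without restarting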
I use django-rest-framework for the backend and AngularJS for the frontend. I started writing e2e tests using Protractor and ran into a problem: after each test, all changes to the database persist.
In Django, every test is enclosed in a database transaction that is rolled back at the end of the test. Is there a way to enclose every Protractor test in a transaction? I know that I could use Django's live server and python-selenium and write the tests in Python, but then I'd lose the advantages of Protractor.
Unfortunately, there is no universal solution for this problem.
One option is to connect to your database directly from Protractor/Node.js with a database client of your choice and make the necessary database changes before, after or during the tests. You can even use ORMs like sequelize.js as an abstraction layer for your database tables. But, since your backend is not Node.js, having two database abstraction layers in two different languages would probably overcomplicate things.
Or, generally a better way: you can use your Django REST API in the "set up" and "tear down" phases of your Protractor tests to restore/prepare the necessary database state by making requests to the REST API with an HTTP client, please see more at:
Direct Server HTTP Calls in Protractor
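On the Django side, a hypothetical test-only reset endpoint for the "set up" phase could look like this (the fixture name is made up, and the view must never be enabled in production):

    from django.conf import settings
    from django.core.management import call_command
    from django.http import HttpResponse, HttpResponseNotFound
    from django.views.decorators.csrf import csrf_exempt

    @csrf_exempt
    def reset_test_database(request):
        if not settings.DEBUG:
            return HttpResponseNotFound()
        # Flush all data and load a known fixture so every e2e test starts clean
        call_command("flush", interactive=False)
        call_command("loaddata", "e2e_fixture.json")
        return HttpResponse("ok")

Protractor's beforeEach can then hit this endpoint with an HTTP client before each spec.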
I have been looking for ways to provide analytics for an app which is powered by a REST server written in NodeJS with MySQL. I discovered OLAP, which can actually make this much easier.
And I found a Python library that provides an OLAP HTTP server called 'Slicer':
http://cubes.databrewery.org/
Can someone explain how this works? Does this mean I have to update my schema and create what are called fact tables?
Can this be used in conjunction with my NodeJS app? Any examples? Since I have only created single-server apps: would Python reside on the same NodeJS server? How will it start? ('forever app.js' is my default script)
If I can't use Python, since I have no experience with it, what are the basics of doing this in NodeJS?
My model is basically a list of words, so the OLAP queries I have are: words made in days/weeks/months, of length 2/5/10 letters, in languages eng/french/german, etc.
Ideas, hints and guidance much appreciated!
As you found out, Cubes provides an HTTP OLAP server (the slicer tool).
Can someone explain how this works?
As an OLAP server, you can issue OLAP queries to the server. The API is REST/JSON based, so you can easily query the server from Javascript, nodejs, Python or any other language of your choice via HTTP.
The server can answer OLAP queries. OLAP queries are based on a model of "facts" and "dimensions". You can, for example, query "the total sales amount for a given country and product, itemized by month".
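For example, with the word model from the question, an aggregate query to a running Slicer server might look like this from Python (the cube name and dimensions are assumptions):

    import requests

    resp = requests.get(
        "http://localhost:5000/cube/words/aggregate",
        params={"drilldown": "date", "cut": "language:eng"},
    )
    print(resp.json())

The same request works from NodeJS or the browser, since it is plain HTTP/JSON.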
Does this mean I have to update my schema and create what are called fact tables?
OLAP queries are built around the concepts of Facts and Dimensions.
OLAP-oriented datawarehousing strategies often involve the creation of these Fact and Dimension tables, building what is called a Star Schema or a Snowflake Schema. These schemas offer better performance for OLAP-type queries on relational databases. Data is often loaded by what is called an ETL process (it can be a simple script) that loads data in the appropriate form.
The Python Cubes framework, however, does not force you to alter your schema or create an alternate one. It has a SQL backend which allows you to define your model (in terms of Facts and Dimensions) without the need of changing the actual database model. This is the documentation for the model definition: https://pythonhosted.org/cubes/model.html .
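As a quick illustration, a minimal sketch of defining and querying a cube from Python, assuming the logical model lives in a hypothetical model.json that maps onto your existing tables:

    from cubes import Workspace

    workspace = Workspace()
    workspace.register_default_store("sql", url="postgresql://localhost/mydb")
    workspace.import_model("model.json")

    browser = workspace.browser("words")  # cube name defined in model.json
    result = browser.aggregate(drilldown=["date"])
    print(result.summary)        # grand totals
    for record in result.cells:  # one row per drilldown member
        print(record)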
However, in some cases you may still prefer to define a schema for Data Mining and use a transformation process to load data periodically. It depends on your needs, the amount of data you have, performance considerations, etc...
With Cubes you can also use other, non-RDBMS backends (e.g. MongoDB), some of which offer built-in aggregation capabilities that OLAP servers like Cubes can leverage.
Can this be used in conjunction with my NodeJS App?
You can issue queries to your Cubes Slicer server from NodeJS.
Any examples?
There is a Javascript client library to query Cubes. You probably want to use this one: https://github.com/Stiivi/cubes.js/
I don't know of any examples using NodeJS. You can try to get some inspiration from the included AngularJS application in Cubes (https://github.com/Stiivi/cubes/tree/master/incubator). Another client tool is CubesViewer which may be of use to you while building your model: http://jjmontesl.github.io/cubesviewer/ .
Since I have only created single-server apps: would Python reside on the same NodeJS server? How will it start? ('forever app.js' is my default script)
You would run the Cubes Slicer server as a web application (directly from your web server, e.g. Apache). For example, with Apache you would use mod_wsgi, which allows you to serve Python applications.
Slicer can also run as a small web server in a standalone process, which is very handy during development (though I wouldn't recommend it for production environments). In this case, it will be listening on a different port (typically http://localhost:5000).
If I can't use Python, since I have no experience with it, what are the basics of doing this in NodeJS?
You don't really need to use Python at all. You can configure and use Python Cubes as the OLAP server, and run queries from JavaScript code (e.g. directly from the browser). From the client's point of view, it is like a database system which you can query via HTTP, getting responses in JSON format.
I want to be able to instrument Python applications so that I know:
Page generation time.
Percentage of time spent in external requests (mysql, api calls).
The number of MySQL queries, and what those queries were.
I want this data from production (not offline profiling) - because the time spent in various places will be different under load.
In PHP I can do this with XHProf or instrumentation-for-php. In Ruby on Rails/.NET/Java, I can do this with New Relic.
Is there such a package recommended for Python or Django?
Yes, it's perfectly possible. E.g. use some magic switch in the URL, like "?profile-me", which triggers profiling in Django middleware.
There are a number of snippets on the Internet, like this one: http://djangosnippets.org/snippets/70/ or modules like this one: http://code.google.com/p/django-profiling/ - but I haven't used any of them so I cannot recommend anything.
Anyway, the approach they take is similar to what I do, i.e. use Python's Hotshot profiler module in a middleware that wraps your view. For the MySQL part, you can just use connection.queries from Django.
The nice thing about Hotshot is that its output can be browsed using Kcachegrind like here: http://www.rkblog.rk.edu.pl/w/p/django-profiling-hotshot-and-kcachegrind/
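A minimal sketch of that middleware idea, using cProfile in place of the long-deprecated hotshot (the query-string switch and the stats count are arbitrary):

    import cProfile
    import io
    import pstats

    from django.db import connection
    from django.http import HttpResponse

    class ProfileMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            if "profile-me" not in request.GET:
                return self.get_response(request)
            profiler = cProfile.Profile()
            profiler.runcall(self.get_response, request)
            out = io.StringIO()
            pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(25)
            # connection.queries lists each SQL statement and its time (DEBUG=True only)
            out.write("\n%d SQL queries\n" % len(connection.queries))
            return HttpResponse(out.getvalue(), content_type="text/plain")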
New Relic now has a package for Python, including Django support through mod_wsgi.
https://support.newrelic.com/help/kb/python
django-prometheus is a good choice for handling production workloads, especially in a container environment like Kubernetes. Out of the box, it has middleware for tracking request latencies and counts (by view method), as well as database and cache access times. It wouldn't be a good solution for tracking which queries are actually executing, but that's where a logging solution like ELK would come into play. If it helps, I've written a post which walks through how to add custom metrics to a Django application.
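For instance, a custom metric with prometheus_client (which django-prometheus builds on) might look like this; the metric name, label and function are made up for illustration:

    from prometheus_client import Counter

    WORD_LOOKUPS = Counter(
        "myapp_word_lookups_total",
        "Number of word lookups served",
        ["language"],
    )

    def lookup_word(word, language="eng"):
        WORD_LOOKUPS.labels(language=language).inc()
        # ... actual lookup logic ...

django-prometheus should then expose anything registered with prometheus_client's default registry on its /metrics endpoint.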