I'm currently looking at a codebase and trying to figure out which database tables are hit by specific API calls. I'd like a way to find this out programmatically. The API mainly hits a PostgreSQL and an MSSQL database, with some calls also hitting DynamoDB. Unfortunately, the API seems complicated, so I really don't want to deal with enabling logging in all of the projects that make up the API. I checked out some Python libraries that create diagrams from Python code, but they did not seem very helpful. Would it be feasible to change the logging in the databases themselves to log complete queries for a while? I figure that if I set up a test version of the APIs and databases with increased logging, I can hit the databases with specific calls and then get the table info from the database logs.
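For PostgreSQL at least, this approach is quite workable: the server can be told to log every statement it receives. A minimal sketch, assuming psycopg2, a throwaway test instance, and a superuser connection (the DSN below is a placeholder):

```python
import psycopg2

# Enable full statement logging on a *test* PostgreSQL instance.
conn = psycopg2.connect("dbname=testdb user=postgres host=localhost")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block

with conn.cursor() as cur:
    cur.execute("ALTER SYSTEM SET log_statement = 'all'")
    cur.execute("SELECT pg_reload_conf()")  # pick up the change without a restart

conn.close()
# Now exercise the API and grep the PostgreSQL log for the tables each call touches.
```

MSSQL has its own tooling for the same idea (e.g. Extended Events), and for DynamoDB you would typically look at CloudTrail data events instead, since there is no statement log in the relational sense.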
Related
We have inherited a DynamoDB database and have been told that a custom script is required if we want to edit/delete multiple items, i.e. you can't just run an easy SQL query to do this like in a relational database.
What is the most common way this is done? E.g. a Python script, a Lambda, etc.
Thanks
You can connect to DynamoDB programmatically through a variety of SDKs (JavaScript, .NET, Java, Go, C++, Ruby, Python, etc.), in the AWS web console, and through the AWS command line interface (among others). I don't think it's terribly different from other databases in that regard.
If you are just getting started, I'd start by writing a script in your preferred programming language with one of the many SDKs available. The NoSQL Workbench (an app available from Amazon directly) has a useful Operation Builder that will not only help with query syntax, but will even generate a script to execute the operation in Java, JavaScript, or Python.
The DynamoDB API reference documents the operations you can perform on DynamoDB (CRUD operations, transactions, etc.). Each API method has a section at the bottom of the page with links to examples of that method being called in each of the supported SDKs. To mutate multiple items at once, you may want to check out the batch operations (BatchWriteItem, BatchGetItem), as sketched below.
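For instance, here is a minimal sketch of a bulk delete with boto3's batch_writer, which transparently groups deletes into BatchWriteItem calls of up to 25 items and retries unprocessed ones. The table name and the pk/sk key schema are hypothetical; substitute your own:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")  # hypothetical table name

# Scan for the keys to delete, paginating via LastEvaluatedKey,
# and delete the items in batches as we go.
scan_kwargs = {"ProjectionExpression": "pk, sk"}  # assuming a pk/sk key schema
with table.batch_writer() as batch:
    while True:
        response = table.scan(**scan_kwargs)
        for item in response["Items"]:
            batch.delete_item(Key={"pk": item["pk"], "sk": item["sk"]})
        if "LastEvaluatedKey" not in response:
            break
        scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
```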
I'm not entirely sure I understand the advice you were given. DynamoDB is a NoSQL database, so you cannot access it via SQL queries. NoSQL databases can come with a steep learning curve. I wouldn't say it's any harder than working with SQL databases, just different.
Familiarize yourself with DynamoDB data modeling; it is very different from data modeling in SQL databases. When learning DynamoDB, try hard to forget everything you know about working with SQL databases. Trying to find the SQL equivalent for something in DynamoDB consistently gives people a hard time.
I would highly recommend The DynamoDB Guide to start the learning process. Again, it's not hard, just different from SQL databases. The AWS docs can be a bit terse, so I found resources like The DynamoDB Guide to be a lifesaver!
I'm using a Node.js server for a web app, and Mongoose is acting as the ORM.
I've got some hooks that fire when data is inserted into a certain collection.
I want those hooks to fire when a Python script inserts into the MongoDB instance. So if I have a pre-save hook, it would modify the Python script's insert according to that hook.
Is this possible? If so, how do I do it?
If not, please feel free to explain to me why this is impossible and/or why I'm stupid.
EDIT: I came back to this question some months later and cringed at how green I was when I asked it. All I really needed was to create an API endpoint/flag on the Node.js server specifically for automated tasks like the Python script to send data to, and let Mongoose, in Node.js land, structure it.
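In other words, the Python side just talks HTTP to the Node.js server, so every write still passes through Mongoose and its hooks. A minimal sketch of the Python side, assuming a hypothetical /api/ingest endpoint and the requests library:

```python
import requests

# Hypothetical endpoint exposed by the Node.js server for automated jobs.
# Because the write goes through the server, Mongoose pre-save hooks still run.
API_URL = "http://localhost:3000/api/ingest"

payload = {"name": "sensor-42", "value": 17.3}
response = requests.post(API_URL, json=payload, timeout=10)
response.raise_for_status()
print(response.json())
```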
It is impossible because Python and Node.js are two different runtimes: separate, isolated processes that don't have access to each other's memory.
Mongoose is a Node.js ORM: a library that maps JavaScript objects to MongoDB documents and handles queries to the database.
All Mongoose hooks live in JavaScript space. They are executed on JavaScript objects before Mongoose sends any request to MongoDB. Two things follow from this: no other process can interfere with these hooks, not even another Node.js process; and once the query reaches MongoDB it's final, with no more hooks and no more modifications.
Neither Python nor MongoDB is aware of Mongoose hooks. All queries to MongoDB are initiated on the client side: a script sends a request to modify the state of the database or to query it.
The only way to trigger JavaScript code execution from an update on the MongoDB side is to use change streams.
Change streams are not Mongoose hooks, but they can be used to hook into updates on the MongoDB side. This is a somewhat more advanced use of the database, and it comes with additional requirements for the MongoDB setup: the size of the oplog, the availability of the change stream clients, error handling, etc.
You can learn more about change streams here: https://docs.mongodb.com/manual/changeStreams/. I would strongly recommend seeking professional advice to architect such a setup, to avoid frustration and unexpected behaviour.
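Change streams can be consumed from any driver, not just Node.js. Purely to illustrate the mechanism, here is a minimal PyMongo sketch (in your scenario the listener would live in the Node.js process, but the API shape is the same). Note that change streams require the server to run as a replica set, even a single-node one:

```python
from pymongo import MongoClient

# Assumes a replica set named rs0; adjust the URI to your deployment.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
collection = client["mydb"]["mycollection"]

# Block and react to every insert on the collection, whoever performed it.
with collection.watch([{"$match": {"operationType": "insert"}}]) as stream:
    for change in stream:
        print("New document:", change["fullDocument"])
```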
MongoDB itself does not support hooks as a feature; Mongoose gives you out-of-the-box hooks you can use, as you've mentioned. So what can you do to make this work in Python?
Use an existing framework like Python's Eve. Eve gives you database hooks, much like Mongoose does. Now, Eve is a REST API framework, which from your description doesn't sound like what you're looking for. Unfortunately, I do not know of any package that's a perfect fit for your needs (if you do find one, it would be great if you shared a link in your question).
Or build your own custom wrapper like this one. You can put together a custom wrapper class quickly and implement your own logic very easily.
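For instance, a minimal sketch of such a wrapper, assuming PyMongo. The hook logic is whatever you choose to mirror from your Mongoose schema, and of course it only applies to writes that go through the wrapper:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

class HookedCollection:
    """Thin wrapper around a PyMongo collection with Mongoose-style pre-save hooks."""

    def __init__(self, collection):
        self._collection = collection
        self._pre_save_hooks = []

    def pre_save(self, func):
        # Register a hook; each hook receives the document and may modify it.
        self._pre_save_hooks.append(func)
        return func

    def insert_one(self, document):
        for hook in self._pre_save_hooks:
            hook(document)  # run every registered hook before the write
        return self._collection.insert_one(document)

client = MongoClient("mongodb://localhost:27017")
items = HookedCollection(client["mydb"]["items"])

@items.pre_save
def add_timestamp(doc):
    # Example hook mirroring a Mongoose pre('save') that stamps documents.
    doc.setdefault("created_at", datetime.now(timezone.utc))

items.insert_one({"name": "widget"})
```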
I am developing a cloud-based data analysis tool, and I am using Django (1.10) for it.
I have to add columns to existing tables, create new tables, and change the data types of columns (as part of a data-cleaning activity) at run time, and I can't figure out a way to update/reflect those changes in the Django models at run time, because those changes will be needed in the further analysis process.
I have looked into 'inspectdb' and 'syncdb', but all of these options would require taking the portal offline and then making those changes, which I don't want.
Can you please suggest a solution or a workaround for how to achieve this?
Also, is there a way to select which database I want to work with, from the list of databases on my MySQL server, after Django is running?
Django's ORM might not be the right tool for you if you need to change your schema (or DB) online: the schema is defined in Python modules and loaded once when Django's web server starts.
You can still use Django's templates, forms, and other libraries, and write your own custom DB access layer that manipulates the DB dynamically from Python.
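A minimal sketch of what such a layer could look like, using Django's raw connection API. The function names are made up, and the connections mapping also answers your second question: any alias defined in settings.DATABASES can be selected at run time:

```python
from django.db import connections

def add_column(table, column, sql_type, using="default"):
    # "using" selects a database alias from settings.DATABASES at run time.
    conn = connections[using]
    qn = conn.ops.quote_name  # backend-aware identifier quoting
    with conn.cursor() as cursor:
        # Identifiers cannot be passed as query parameters, so quote them
        # explicitly and only ever feed this function trusted input.
        cursor.execute("ALTER TABLE {} ADD COLUMN {} {}".format(
            qn(table), qn(column), sql_type))

def fetch_all(table, using="default"):
    # Read back rows from a table the ORM knows nothing about.
    conn = connections[using]
    qn = conn.ops.quote_name
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM {}".format(qn(table)))
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
```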
I use django-rest-framework for the backend and AngularJS for the frontend. I started writing e2e tests using Protractor and ran into a problem: after each test, all changes to the database are persisted.
In Django, every test is enclosed in a database transaction that is rolled back at the end of the test. Is there a way to enclose every Protractor test in a transaction too? I know that I could use the Django live server and python-selenium and write the tests in Python, but then I would lose the advantages of Protractor.
Unfortunately, there is no universal solution for this problem.
One option is to connect to your database directly from Protractor/Node.js with a database client of your choice and make the necessary database changes before, after, or during the tests. You can even use an ORM like Sequelize.js as an abstraction layer for your database tables. But since your backend is not Node.js, having two database abstraction layers in two different languages would probably overcomplicate things.
Or, generally a better way: you can use your Django REST API in the "set up" and "tear down" phases of your Protractor tests to restore/prepare the necessary database state by making requests to the REST API with an HTTP client, please see more at:
Direct Server HTTP Calls in Protractor
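On the Django side, this usually means exposing a test-only endpoint the e2e suite can call between specs to reset state. A rough sketch, with hypothetical names and a guard so it never ships enabled in production:

```python
# views.py (sketch): a reset endpoint for the e2e suite; all names are hypothetical.
from django.conf import settings
from django.http import HttpResponse, HttpResponseForbidden
from django.views.decorators.csrf import csrf_exempt

from myapp.models import Widget  # hypothetical model


@csrf_exempt
def reset_test_data(request):
    if not settings.DEBUG:
        # Never expose this outside of a test/dev environment.
        return HttpResponseForbidden()
    Widget.objects.all().delete()
    Widget.objects.create(name="fixture-1")  # re-seed a known baseline
    return HttpResponse(status=204)
```

Protractor's beforeEach/afterEach hooks can then hit this endpoint with any HTTP client before each spec.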
I have a Django app built in an API style, and I need to benchmark it.
I want to use Django's unit testing library to build benchmarks as tests for the API endpoints. They will use the Django test client to query the endpoints, collect data about SQL queries and their timings, and save it somewhere.
Is this a sane idea at all?
Also, I want to see timings for the Python code with stack traces, and see which code causes which SQL queries. Does anybody know approaches to collect such information without modifying the app's code?
Just an option that I've used before: nose and its --with-xunit plugin:
This plugin provides test results in the standard XUnit XML format.
In the test results you'll see the running time for each test case, stack traces for failures, etc.
Also, django-debug-toolbar and Django's database logging might help you with getting data about SQL queries.
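For the test-based benchmarks you describe, Django's own test utilities can also capture per-request SQL without touching the app's code. A minimal sketch using django.test.utils.CaptureQueriesContext (the endpoint URL is a placeholder):

```python
from django.db import connection
from django.test import TestCase
from django.test.utils import CaptureQueriesContext


class EndpointBenchmark(TestCase):
    def test_list_endpoint_queries(self):
        # Record every SQL statement issued while the endpoint is hit.
        with CaptureQueriesContext(connection) as ctx:
            response = self.client.get("/api/widgets/")  # hypothetical endpoint
        self.assertEqual(response.status_code, 200)
        for query in ctx.captured_queries:
            print(query["time"], query["sql"])  # timing + full SQL text
        # Guard against query-count regressions in future runs.
        self.assertLess(len(ctx.captured_queries), 20)
```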
Also, there are other suggestions here:
benchmarking django apps
Is there a library to benchmark my Django App for SQL requests?
log all sql queries
Django performance testing suite that'll report on metrics (db queries etc.)
Hope that helps.