We have inherited a DynamoDB database and have been told that a custom script is required if we want to edit/delete multiple items, i.e. you can't just run an easy MySQL query to do this like in a relational database.
What is the most common way this is done? e.g. Python script, Lambda, etc.
thanks
You can connect to DynamoDB programmatically through a variety of SDKs (JavaScript, .NET, Java, Go, C++, Ruby, Python, etc.), in the AWS web UI console, and through the AWS command line interface (among others). I don't think it's terribly different from other databases in that regard.
If you are just getting started, I'd begin by writing a script in your preferred programming language with one of the many SDKs available. The NoSQL Workbench (an app available from Amazon directly) has a useful Operation Builder that will not only help with query syntax, but will even generate a script to execute the operation in Java, JavaScript, or Python.
The DynamoDB API reference documents the operations you can perform on DynamoDB (CRUD operations, transactions, etc.). Each API method has a section at the bottom of the page with links to examples of that method being called in each of the supported SDKs. To mutate multiple items at once, you may want to check out the batch operations (BatchWriteItem, BatchGetItem).
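For example, here is a minimal sketch of a bulk delete in Python with boto3, assuming a hypothetical table named "my-table" with a simple partition key "pk" (if your table has a sort key, include it in each Key):

```python
import boto3

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

# Collect the keys to delete; a Scan is shown for simplicity (and only reads
# the first page), but prefer Query with a key condition when you can.
keys = [{"pk": item["pk"]} for item in table.scan(ProjectionExpression="pk")["Items"]]

# batch_writer groups the deletes into BatchWriteItem calls (25 per request)
# and retries any unprocessed items for you.
with table.batch_writer() as batch:
    for key in keys:
        batch.delete_item(Key=key)
```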
I'm not entirely sure I understand the advice you were given. DynamoDB is a NoSQL database, so you cannot access it via SQL queries. NoSQL databases can come with a steep learning curve. I wouldn't say it's any harder than working with SQL databases, just different.
Familiarize yourself with DynamoDB data modeling; it is very different from data modeling in SQL databases. When learning DynamoDB, try hard to forget everything you know about working with SQL databases. Trying to find the SQL equivalent for something in DynamoDB consistently gives people a hard time.
I would highly recommend The DynamoDB Guide to start the learning process. Again, it's not hard, just different from SQL databases. The AWS docs can be a bit terse, so I found resources like The DynamoDB Guide to be a lifesaver!
I'm comfortable with SQLite, but when I use two different Python programs to access the same database, it throws an error like "table is locked".
What are the different portable databases to use with Python?
SQLite is great, but it is designed as a small, fast, single-user database. It isn't designed for the use you describe.
You can use pretty much any database with Python. The Python database API (DB-API) provides a straightforward way to interface with most relational databases, including SQLite.
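As a quick illustration, here is a minimal DB-API sketch using the built-in sqlite3 module (the file, table, and column names are made up; most other drivers, e.g. psycopg2 for Postgres, follow the same connect/cursor/execute pattern):

```python
import sqlite3

conn = sqlite3.connect("example.db")  # hypothetical database file
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS words (word TEXT, lang TEXT)")
cur.execute("INSERT INTO words VALUES (?, ?)", ("hello", "eng"))
conn.commit()
for row in cur.execute("SELECT word, lang FROM words"):
    print(row)
conn.close()
```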
However, to interact with a database in a more natural Python style, I've enjoyed using SQLAlchemy. It took a while to work through the tutorial, but it's great.
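A rough sketch of the same idea in SQLAlchemy's ORM style (assumes SQLAlchemy 1.4+; the Word model and column names are hypothetical):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Word(Base):  # hypothetical model
    __tablename__ = "words"
    id = Column(Integer, primary_key=True)
    word = Column(String)
    lang = Column(String)

engine = create_engine("sqlite:///example.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Word(word="hello", lang="eng"))
    session.commit()
    print(session.query(Word).filter_by(lang="eng").count())
```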
My personal preference is Postgres, but there are many other choices.
I have been looking for ways to provide analytics for an app powered by a REST server written in Node.js and MySQL. I discovered OLAP, which can actually make this much easier.
I also found a Python library (Cubes) that provides an OLAP HTTP server called 'Slicer':
http://cubes.databrewery.org/
Can someone explain how this works? Does this mean I have to update my schema and create what are called fact tables?
Can this be used in conjunction with my Node.js app? Any examples? Since I have only created single-server apps: would Python reside on the same Node.js server? How will it start? ('forever app.js' is my default script.)
If I can't use Python, since I have no experience with it, what are the basics of doing this in Node.js?
My model is basically a list of words, so the OLAP queries I have are along the lines of: words made in days/weeks/months, of length 2/5/10 letters, in languages eng, french, german, etc.
Ideas, hints and guidance much appreciated!
As you found out, Cubes provides an HTTP OLAP server (the slicer tool).
Can someone explain how this works?
As an OLAP server, you can issue OLAP queries to it. The API is REST/JSON based, so you can easily query the server from JavaScript, Node.js, Python, or any other language of your choice via HTTP.
The server can answer OLAP queries. OLAP queries are based on a model of "facts" and "dimensions". You can, for example, query "the total sales amount for a given country and product, itemized by month".
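To give a feel for what such a query looks like over HTTP, here is a sketch in Python with requests (the same GET works from Node.js or the browser). The cube name "words" and the dimension names are hypothetical, and the standalone server's default port is assumed:

```python
import requests

# Slicer exposes aggregation at /cube/<name>/aggregate; drilldown and cut
# take "dimension:level" and "dimension:value" style arguments.
resp = requests.get(
    "http://localhost:5000/cube/words/aggregate",  # hypothetical cube "words"
    params={"drilldown": "date:month", "cut": "language:eng"},
)
print(resp.json())
```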
Does this mean I have to update my schema and create what are called fact tables?
OLAP queries are built around the Fact and Dimension concepts.
OLAP-oriented data warehousing strategies often involve the creation of these Fact and Dimension tables, building what is called a Star Schema or a Snowflake Schema. These schemas offer better performance for OLAP-type queries on relational databases. Data is often loaded by what is called an ETL process (it can be a simple script) that loads the data in the appropriate form.
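To make that concrete, here is a toy ETL sketch (entirely hypothetical table and column names, SQLite used for brevity) that derives one fact row per word from a source table:

```python
import sqlite3

src = sqlite3.connect("app.db")        # hypothetical operational database
dw = sqlite3.connect("warehouse.db")   # hypothetical warehouse

dw.execute("""CREATE TABLE IF NOT EXISTS fact_words
              (day TEXT, language TEXT, length INTEGER)""")

# Transform each source row into a fact row; a real ETL job would also
# maintain proper dimension tables instead of inlining the attributes.
rows = src.execute("SELECT created_at, lang, word FROM words")
dw.executemany(
    "INSERT INTO fact_words VALUES (?, ?, ?)",
    ((created_at[:10], lang, len(word)) for created_at, lang, word in rows),
)
dw.commit()
```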
The Python Cubes framework, however, does not force you to alter your schema or create an alternate one. It has a SQL backend which allows you to define your model (in terms of Facts and Dimensions) without the need of changing the actual database model. This is the documentation for the model definition: https://pythonhosted.org/cubes/model.html .
However, in some cases you may still prefer to define a schema for data mining and use a transformation process to load data periodically. It depends on your needs, the amount of data you have, performance considerations, etc.
With Cubes you can also use other non-RDBMS backends (e.g. MongoDB), some of which offer built-in aggregation capabilities that OLAP servers like Cubes can leverage.
Can this be used in conjunction with my NodeJS App?
You can issue queries to your Cubes Slicer server from Node.js.
Any examples?
There is a JavaScript client library for querying Cubes. You probably want to use this one: https://github.com/Stiivi/cubes.js/
I don't know of any examples using Node.js. You can try to get some inspiration from the AngularJS application included with Cubes (https://github.com/Stiivi/cubes/tree/master/incubator). Another client tool is CubesViewer, which may be of use to you while building your model: http://jjmontesl.github.io/cubesviewer/ .
Since I have only created single-server apps: would Python reside on the same Node.js server? How will it start? ('forever app.js' is my default script.)
You would run the Cubes Slicer server as a web application (directly from your web server, e.g. Apache). For example, with Apache you would use mod_wsgi, which allows it to serve Python applications.
Slicer can also run as a small web server in a standalone process (started with something like "slicer serve slicer.ini"), which is very handy during development, although I wouldn't recommend it for production environments. In this case, it will be listening on a different port (typically: http://localhost:5000 ).
If I can't use Python, since I have no experience with it, what are the basics of doing this in Node.js?
You don't really need to use Python at all. You can configure and run Python Cubes as an OLAP server, and issue queries from JavaScript code (e.g. directly from the browser). From the client's point of view, it is like a database system which you can query via HTTP and which returns responses in JSON format.
I have previously worked with Java + Spring to create a web app.
I have to build a new web-app now.
It will have one centralized DB.
There will be two different types of web-app instances.
Web-App 1:
a) It would have no UI to render: no HTML, JS, etc.
b) All it needs to provide is a set of REST APIs which will
b.1) create some new entries in the DB
b.2) modify some entries in the DB
b.3) retrieve some DB records in JSON format.
Some frontend code (which doesn't belong to this app) will periodically fetch
these details.
c) It will be used by at most 100,000 people, but at any given point in time
we can expect about 1,000 users logged in and doing what is described in b).
Web-App 2:
a) It will have some dashboards
b) 90% of DB operations would be read operations
c) 10% of DB operations would be write/modify
d) There will be thousands of users of this system, and at any given point in time
only 50-1000 people will be accessing it.
I am thinking of the following:
Have Web-App 1 created in Python + Django and Web-App 2 created in RoR.
I am planning to use DynamoDB and memcached.
Why two different frameworks?
1) So that I get to learn both of them
2) There have been concerns about scalability in RoR (and I also know people claim those are unfounded); Web-App 1 may have scaling needs in the future.
My question is: do you see any problems with this combination?
For example, ActiveRecord wants you to use a specific naming format for your database tables. Are there any other concerns similar to this?
Has anyone else used a similar technology stack?
Both frameworks are full-stack frameworks and provide MVC, templating, unit testing, security, DB migrations, caching, and ORMs.
For my startup, we also needed to put out a full-fledged website along with an API. We are also using DynamoDB for storing most of the data and are only using MySQL for session info.
I opted to use Ruby on Rails for the web app and Sinatra for the API. If your main criterion is simply learning as many new things as possible, then it would make sense to opt for relatively different stacks (Django/Python and RoR). In our case, we went with Sinatra because it's essentially a very lightweight wrapper around Rack and perfect for an API which essentially receives requests, calls one or more services or does some processing, and hands out a formatted response. While I don't see any problem with using Python/Django instead of Sinatra, in our case the benefit was spending less time working with a different language.
Also, scalability in Rails is a bit of an iffy subject. In the end, it's about how you use it. We've had no issues scaling Rails with Unicorn and nginx. Our business logic is all in the API service, and the Rails server also uses the API for most of the work. This means we don't use ActiveRecord on Rails, and the website is just another consumer of our API, which does all the heavy lifting whether the request comes from an app or the website. Using MySQL for the session store ensures we can route requests to any of the application servers without having to worry about always routing requests from the same client to the same server every time. This allows us to ramp up and down easily, considering only the amount of traffic we're getting.
At the time we started working on this, there wasn't an ORM for DynamoDB which looked and felt just like ActiveRecord, so we ended up writing a few high-level classes of our own to handle storage and retrieval of models on DynamoDB. Considering DynamoDB is not tailored for scans or joins, this didn't take a lot of effort, since we were almost always doing lookups based on keys and ranges. This meant we didn't really need a replacement for ActiveRecord, since the real strength of ActiveRecord is being able to intuitively do joins, etc. by convention.
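For a feel of what such key-and-range lookups look like, here is a sketch in Python with boto3 (rather than our Ruby code; the table and attribute names are hypothetical):

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table "events" with hash key "user_id" and range key "created_at".
table = boto3.resource("dynamodb").Table("events")

resp = table.query(
    KeyConditionExpression=Key("user_id").eq("u123")
    & Key("created_at").between("2014-01-01", "2014-01-31")
)
for item in resp["Items"]:
    print(item)
```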
DynamoDB does have its limitations, though, and you might find yourself in situations where you will need to scan a large number of records. In our case, we also use CloudSearch to index some important info and use it as a fallback for cases where we need to do text-based searches that would otherwise have to scan all our data.
Which is the best back end for Python applications, what is the advantage of using SQLite, and how can it be connected to Python applications?
What do you mean by back end? Python apps connect to SQLite just like any other database; you just have to import the correct module and check how to use it.
The advantages of using SQLite are:
You don't need to set up a database server; it's just a file
No configuration needed
Cross-platform
Mainly, desktop applications are the ones that take real advantage of this. For web apps, SQLite is not recommended, since the file containing the data is easily readable (it lacks any kind of encryption), and if the web server lacks special configuration, the file may be downloadable by anyone.
Django, Twisted, and CherryPy are popular Python "Back-Ends" as far as web applications go, with Twisted likely being the most flexible as far as networking is concerned.
SQLite can, as has been previously posted, be directly interfaced with using SQL commands, as it has native bindings for Python, or it can be accessed with an Object-Relational Mapper such as SQLObject (another Python library).
As far as performance is concerned, SQLite is fairly scalable and should be able to handle most use cases that don't require a separate database server (nothing enterprise-level). An additional benefit of SQLite is that the database is self-contained in a single file, allowing for easy backup while remaining a common enough format that multiple applications can access the data. A word of advice on using SQLite with Python, however: you may run into issues with threading (in the past most of the bindings for SQLite were not thread-safe, although this may have changed over time).
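For what it's worth, the standard library's sqlite3 module guards against cross-thread use by default; here is a minimal sketch of the relevant knob (opting out means you must handle your own locking):

```python
import sqlite3

# By default, sqlite3 raises ProgrammingError if a connection created in one
# thread is used from another. check_same_thread=False disables that guard;
# you then need to serialize access yourself (e.g. with a threading.Lock).
conn = sqlite3.connect("example.db", check_same_thread=False)
```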
The language you are using at the application layer has little to do with your database choice underneath. You need to examine the advantages of other DB packages to get an idea of what you want.
Here are some popular database packages for cheap or free:
MS SQL Server Express, PostgreSQL, MySQL
If you mean "what is the best database?" then there's simply no way to answer this question. If you just want a small database that won't be used by more than a handful of people at a time, SQLite is what you're looking for. If you're running a database for a giant corporation serving thousands, you're probably looking for Oracle. In between those, you have MySQL, PostgreSQL, SQL Server, DB2, and probably more.
If you're familiar with one of those, it may be the best to go with from a practical standpoint. If you're doing a typical web app, my advice would be to go with MySQL or PostgreSQL, as they're free and well supported by just about any ORM you could think of (my personal preference is towards PostgreSQL, but I'm not experienced enough with either of these to make a good argument one way or another). If you do go with one of those two, my recommendation is to use Storm as the ORM.
(And yes, there are free versions of SQL Server and Oracle. You won't have as many choices as far as ORMs go though)
I'd like to use the Python version of App Engine but rather than write my code specifically for the Google Data Store, I'd like to create my models with a generic Python ORM that could be attached to Big Table, or, if I prefer, a regular database at some later time. Is there any Python ORM such as SQLAlchemy that would allow this?
Technically this wouldn't be called an ORM (Object-Relational Mapper) but a DAL (Database Abstraction Layer). The ORM part is not really interesting for App Engine, as the API already takes care of the object mapping and does some simple relational mapping (see RelationProperty).
Also realize that a DAL will never let you switch between App Engine's datastore and a "normal" SQL database like MySQL, because they work very differently. It might let you switch between different key-value stores, like Redis, Mongo, or Tokyo Cabinet. But as they all have such very different characteristics, I would really think twice before using one.
Lastly, the DAL traditionally sits on top of the DB interface, but with App Engine's API you can implement your own "stubs" that basically let you use other storage backends behind its API. The people at Mongo wrote one for MongoDB, which is very nice. And the dev_appserver comes with a filesystem-based one.
And now to the answer: yes, there is one! It's part of web.py. I haven't really tried it, for the reasons above, so I can't really say if it's good.
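For a rough idea, this is what web.py's database layer looks like on the relational side (a sketch; the file, table, and column names are made up, and swapping dbn="sqlite" for "postgres" or "mysql" is the portability it offers):

```python
import web

# web.py's DAL: the same calls work across the engines it supports.
db = web.database(dbn="sqlite", db="example.db")  # hypothetical database file
db.insert("words", word="hello", lang="eng")
for row in db.select("words", where="lang = $lang", vars={"lang": "eng"}):
    print(row.word)
```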
PS. I know Ruby has a nice DAL project for key-value stores in the works too, but I can't find it now... Maybe it would be nice to port it to Python at some point.
Nowadays they do, since Google has launched Cloud SQL.