I've got a Google App Engine application that loads time-series data in real time into a Google Datastore NoSQL-style table. I was hoping to get some feedback on the right type of architecture to pull this data into a web-application-style chart (and ideally something I could also plug into a content management system like WordPress).
Most of my server-side code is Python. What's a reasonable client-server setup to pull the data from the Datastore database and display it on my webpage? Ideally I'd have something that scales and doesn't cause an unnecessary number of reads on my database (potentially using Google App Engine's built-in caching, etc.).
I'm guessing this is a common use case, but I'd like to get an idea of what some best practices around this might be. I've seen some examples using client-side JavaScript/AJAX with server-side PHP to read the DB - is this really the best way?
Welcome to "it depends".
You have some choices. Imagine the classic four-quadrant chart. Along one axis is data size, along the other is staleness/freshness.
If your time-series data changes rapidly but is small enough to safely be retrieved within a request, you can query for it on demand, convert it to JSON, and squirt it to the browser to be rendered by the JavaScript charting package of your choice. If the data is large, your app will need to do some sort of server-side pre-processing so that when the data is needed, it can be retrieved in sufficiently few requests that the request won't time out. This might involve something data-dependent like pre-bucketing the time series.
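A minimal sketch of the on-demand case, assuming the Python App Engine runtime with ndb and webapp2; the DataPoint model and the fetch limit are illustrative, not your actual schema:

```python
# Minimal sketch: query the datastore on demand and return JSON.
# DataPoint and the fetch limit are illustrative.
import json
import webapp2
from google.appengine.ext import ndb


class DataPoint(ndb.Model):
    """One sample in the time series (hypothetical model)."""
    timestamp = ndb.DateTimeProperty(required=True)
    value = ndb.FloatProperty(required=True)


class SeriesHandler(webapp2.RequestHandler):
    def get(self):
        # Grab the most recent points; keep the limit small enough
        # that the query comfortably fits inside one request.
        points = (DataPoint.query()
                  .order(-DataPoint.timestamp)
                  .fetch(500))
        payload = [{'t': p.timestamp.isoformat(), 'v': p.value}
                   for p in points]
        self.response.headers['Content-Type'] = 'application/json'
        self.response.write(json.dumps(payload))


app = webapp2.WSGIApplication([('/api/series', SeriesHandler)])
```

The JavaScript charting library then just fetches /api/series and renders whatever comes back.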
If the data changes slowly, you have the option of generating your chart on the server side, perhaps using matplotlib. When new data is ingested, or perhaps at intervals, spawn off a task to generate and cache the chart (or JSON to hand to the front-end) as a blob in the datastore. If the data is sufficiently large that a task will time out, you might need to use a backend process. If the data is sufficiently large and you don't pre-process, you're in the quadrant of unhappiness.
In my experience, GAE memcache is best for caching data between requests where the time between requests is very short. Don't rely on generating artifacts, stuffing them in memcache, and hoping that they'll be there a few minutes later. I've rarely seen that work.
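To make the caching idea from the last two paragraphs concrete, here's a rough read-through sketch: memcache first, the durable datastore blob second, and a task-queue rebuild if neither has the artifact. The ChartBlob entity and the task URL are illustrative.

```python
# Rough read-through cache: memcache first, datastore blob second,
# and a task-queue rebuild if neither has the artifact.
# ChartBlob and the /tasks/rebuild_chart URL are illustrative.
import webapp2
from google.appengine.api import memcache, taskqueue
from google.appengine.ext import ndb

CACHE_KEY = 'chart-json'


class ChartBlob(ndb.Model):
    """Pre-processed chart JSON, written by the rebuild task."""
    payload = ndb.TextProperty()


class ChartHandler(webapp2.RequestHandler):
    def get(self):
        # 1) Best case: still in memcache from a recent request.
        payload = memcache.get(CACHE_KEY)
        if payload is None:
            # 2) Fall back to the durable copy in the datastore.
            blob = ChartBlob.get_by_id(CACHE_KEY)
            if blob is not None:
                payload = blob.payload
                memcache.set(CACHE_KEY, payload, time=300)
            else:
                # 3) Nothing cached yet: kick off a rebuild and tell
                #    the client to retry shortly.
                taskqueue.add(url='/tasks/rebuild_chart')
                self.response.set_status(503)
                return
        self.response.headers['Content-Type'] = 'application/json'
        self.response.write(payload)
```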
I'm building a simple CRUD app in Django which is basically enhanced reporting for commerce-related software. So I pull information on behalf of a user from another software's API, do a bunch of aggregation/calculations on the data, and then display a report to the user.
I'm having a hard time figuring out how to make this all happen quickly. While I'm certain there are optimizations that can be made in my Python code, I'd like to find some way to be able to make multiple calculations on reasonably large sets of objects without making the user wait like 10 seconds.
The problem is that the data can change at any given moment. If the user makes changes to their data in the other software, there is no way for me to know that without hitting the API again for that info. So I don't see a way that I can cache information or pre-fetch it without making a ridiculous number of requests to the API.
In a perfect world the API would have webhooks I could use to watch for data changes but that's not an option.
Is my best bet to just optimize the factors I control to the best of my ability and hope my users can live with it?
I have a RethinkDB database. New data arrives in it every five minutes.
I want to create a website to inspect this data flow from RethinkDB in real time.
That is, while someone is viewing the webpage, the data from the DB shown on the page should update automatically without refreshing the webpage.
I know there are several ways to make it real-time, such as Django Channels or WebSockets. However, Django's models do not support RethinkDB.
Sorry, I am a layman at building websites and may express things inaccurately.
Can someone give me a keyword or hint?
If you make your question more specific, the community here will be able to offer you better support.
However, here is a general solution to your problem.
You will need to do two things:
1) Create a backend API that allows you to:
   - check if new data has been added to the database
   - fetch new data via a REST API request
2) Make frontend AJAX requests to this API to:
   - fetch data
   - periodically (e.g. every 30 seconds) check if there is new data
   - fetch data again if new data is detected
To do this using Django as the backend, I would recommend using Django REST Framework to create your API.
This API should have two endpoints, sketched below:
ListView of your data
Endpoint returning the id and timestamp of the last datapoint
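To make that concrete, here is a rough sketch of those two endpoints with Django REST Framework. The Reading model and its fields are placeholders, and this assumes the data lives somewhere Django's ORM can reach (see the edit below regarding RethinkDB).

```python
# Rough sketch of the two endpoints with Django REST Framework.
# The Reading model and its fields are placeholders.
from rest_framework import generics, serializers
from rest_framework.response import Response
from rest_framework.views import APIView

from myapp.models import Reading  # hypothetical model


class ReadingSerializer(serializers.ModelSerializer):
    class Meta:
        model = Reading
        fields = ('id', 'timestamp', 'value')


class ReadingListView(generics.ListAPIView):
    """Endpoint 1: list view of your data."""
    queryset = Reading.objects.order_by('-timestamp')
    serializer_class = ReadingSerializer


class LatestReadingView(APIView):
    """Endpoint 2: id and timestamp of the last datapoint."""
    def get(self, request):
        latest = Reading.objects.order_by('-timestamp').first()
        if latest is None:
            return Response({'id': None, 'timestamp': None})
        return Response({'id': latest.id, 'timestamp': latest.timestamp})
```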
Next you will have to create a frontend that uses JavaScript to make requests to these endpoints. When you fetch data, store the id and timestamp of the most recent data point. Use this to check if there is new data.
I would recommend using a JavaScript framework such as Angular or React, but depending on your needs these may be overkill.
EDIT:
Now that you have updated your question to be more specific, here is my advice. It sounds like your number one priority is RethinkDB and real-time data. Django is not well suited to this because its ORM is not compatible with RethinkDB. Real-time support has come a long way in Django with Django Channels, however.
It sounds like you are early on in your project and have little to no codebase in Django. I would recommend using Horizon along with RethinkDB. Horizon is a JavaScript backend built for real-time data from RethinkDB.
I have developed a RESTful API using Django REST Framework in Python. I developed the required models, serialised them, set up token authentication and all the other due diligence that goes along with it.
I also built a front-end using Angular, hosted on a different domain. I set up CORS so I can access the API as required. Everything seems to be working fine.
Here is the problem. The web app I am building is a financial application that should allow the user to run some complex calculations on the server and send the results to the front-end app so they can be rendered into charts and other formats. I do not know how or where to put these calculations.
I chose Django for the back-end as I expected that Python would help me run such calculations wherever required. Basically, when I call a particular API endpoint on the server, I want to be able to retrieve data from my database, from multiple tables if required, and use the data to run some calculations using Python or a Python library (pandas or NumPy) and serve the results of the calculations as the response to the API call.
If this is a daunting task, I at least want to be able to use the API to retrieve data from the tables to the front-end, process the data a little using JS, and send it to a Python function on the server, which would run the necessary complex calculations and respond with results to be rendered into charts / other formats.
Can anyone point me in a direction to move from here? I looked for resources online but I think I am unable to find the correct keywords to search for them. I just want some skeleton code to integrate into my current backend, using which I can call some Python scripts that I write to run these calculations.
Thanks in advance.
I assume your question is about "how do I do these calculations in the RESTful framework for Django?", but I think in this case you need to move away from that idea.
You did everything correctly but RESTful APIs serve resources -- basically your model.
A computation however is nothing like that. As I see it, you have two ways of achieving what you want:
1) Write a model that represents the results of a computation and is served using the RESTful framework, thus making your computation a resource (this can work nicely if you store the results in your database as a way of caching)
2) Add a route/endpoint to your API that is meant to serve results of that computation.
Path 1: Computation as Resource
Create a model that handles the computation upon instantiation.
You could even set up an inheritance structure for computations and implement an interface for your computation models.
This way, when the resource is requested and the RESTful framework serves it, what gets returned is the computational result.
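As a rough illustration of this path (the models, fields, and the compute() method are made up), the computation can run once when the record is created, so later requests just serve the cached result:

```python
# Rough illustration of "computation as resource".
# ComputationResult / TaxComputation and their fields are made up;
# compute() stands in for your real calculation.
from django.db import models


class ComputationResult(models.Model):
    """Abstract base: stores inputs and caches the result."""
    created_at = models.DateTimeField(auto_now_add=True)
    result = models.FloatField(null=True)

    class Meta:
        abstract = True

    def compute(self):
        raise NotImplementedError

    def save(self, *args, **kwargs):
        # Run the computation once, when the resource is created,
        # so later GETs just read the cached value.
        if self.result is None:
            self.result = self.compute()
        super(ComputationResult, self).save(*args, **kwargs)


class TaxComputation(ComputationResult):
    income = models.FloatField()
    rate = models.FloatField()

    def compute(self):
        return self.income * self.rate
```

Exposing TaxComputation through a standard serializer and viewset then serves the cached result like any other resource.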
Path 2: Custom Endpoint
Add a route for your computation endpoints like /myapi/v1/taxes/compute.
In the underlying controller of this endpoint, you will load up the models you need for your computation, perform the computation, and serve the result however you like it (probably a JSON response).
You can still implement computations with the above-mentioned inheritance structure. That way, you can instantiate the Computation object based on a parameter (in the above case, taxes).
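A bare-bones sketch of such an endpoint using DRF's function-based views follows; the Record model, its fields, and the pandas aggregation are placeholders for your actual data and calculation.

```python
# Bare-bones sketch of a custom computation endpoint.
# Record, its fields, and the aggregation are placeholders.
import pandas as pd
from rest_framework.decorators import api_view
from rest_framework.response import Response

from myapp.models import Record  # hypothetical model


@api_view(['GET'])
def compute_taxes(request):
    # Load whatever rows the computation needs, from as many
    # tables as required.
    rows = list(Record.objects.values('category', 'amount'))
    if not rows:
        return Response({'totals': {}})
    df = pd.DataFrame(rows)

    # Run the heavy lifting in pandas/numpy.
    totals = df.groupby('category')['amount'].sum()

    # Serve the result as the JSON response to the API call.
    return Response({'totals': totals.to_dict()})
```

Wiring it up is then just a matter of pointing the /myapi/v1/taxes/compute route at this view.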
Does this give you an idea?
I have previously worked with Java + Spring to create a web app.
I have to build a new web app now.
It will have one centralized DB.
There will be two different types of instances of the web app.
Web-App 1:
a) It would have no UI to render: no HTML, JS, etc.
b) All it needs to provide is a set of REST APIs which will
b.1) create some new entries in the DB
b.2) modify some entries in the DB
b.3) retrieve some DB records in JSON format.
Some frontend code (which doesn't belong to this app) will periodically fetch these details.
c) It will be used by at most 100,000 people, but at any given point in time we can expect about 1,000 users to be logged in and doing what is described in b).
Web-App2 :
a) It will have some dashboards
b) 90% of DB operations will be read operations
c) 10% of DB operations will be write/modify operations
d) There will be a few thousand users of this system, and at any given point in time hardly 50-1,000 people will be accessing it.
I am thinking of the following.
Have Web-App 1 created in Python + Django and Web-App 2 created in RoR.
I am planning to use DynamoDB and memcached.
Why two different frameworks?
1) So that I get to learn both of them
2) There have been concerns about scalability in RoR (and I also know people claim those concerns are unfounded); Web-App 1 may have scaling needs in the future.
My question is: do you see any problem with this combination?
For example, Active Record expects specific naming conventions for your database tables. Are there any other concerns similar to this?
Has anyone else used a similar technology stack?
Both frameworks are full-stack frameworks and provide MVC, templating, unit testing, security, DB migrations, caching, and ORMs.
For my startup, we also needed to put out a fully fledged website along with an API. We are also using DynamoDB for storing most of the data and are only using MySQL for session info.
I opted to use Ruby on Rails for the web app and Sinatra for the API. If your criterion is simply learning as many new things as possible, then it would make sense to opt for relatively different stacks (Django/Python and RoR). In our case, we went with Sinatra because it's essentially a very lightweight wrapper around Rack and perfect for an API which essentially receives requests, calls one or more services or does some processing, and hands out a formatted response. While I don't see any problem with using Python/Django instead of Sinatra, in our case the benefit was spending less time working with a different language.
Also, scalability in Rails is a bit of an iffy subject. In the end, it's about how you use it. We've had no issues scaling Rails with Unicorn and nginx. Our business logic is all in the API service, and the Rails server itself uses the API for most of the work. This means we don't use Active Record in Rails, and the website is just another consumer of our API, which does all the heavy lifting whether the request comes from an app or the website. Using MySQL for the session store ensures we can route requests to any of the application servers without having to worry about always routing requests from the same client to the same server every time. This allows us to ramp up and down easily, based only on the amount of traffic we're getting.
At the time we started working on this, there wasn't an ORM for DynamoDB which looked and felt just like Active Record, so we ended up writing a few high-level classes of our own to handle storage and retrieval of models on DynamoDB. Considering DynamoDB is not tailored for scans or joins, this didn't take a lot of effort, since we were almost always doing lookups based on keys and ranges. This meant we didn't really need a replacement for Active Record, since the real strength of Active Record is being able to intuitively do joins, etc. by convention.
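Our wrappers were Ruby, but to illustrate the idea, here is a rough Python/boto3 analogue of that kind of thin key/range lookup layer; the table name, key schema, and attributes are made up.

```python
# Rough Python/boto3 analogue of a thin model wrapper over DynamoDB.
# Table name, key schema, and attributes are made up for illustration.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')


class OrderStore(object):
    """Key/range lookups only -- no scans, no joins."""

    def __init__(self, table_name='orders'):
        self.table = dynamodb.Table(table_name)

    def save(self, user_id, created_at, attrs):
        item = dict(attrs, user_id=user_id, created_at=created_at)
        self.table.put_item(Item=item)

    def get(self, user_id, created_at):
        resp = self.table.get_item(
            Key={'user_id': user_id, 'created_at': created_at})
        return resp.get('Item')

    def for_user_between(self, user_id, start, end):
        # Range query on the sort key -- the access pattern
        # DynamoDB is actually good at.
        resp = self.table.query(
            KeyConditionExpression=Key('user_id').eq(user_id) &
            Key('created_at').between(start, end))
        return resp['Items']
```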
DynamoDB does have its limitations though, and you might find yourself in situations where you will need to scan a large number of records. In our case, we also use CloudSearch to index some important info and use it as a fallback for cases when we need to do text-based searches which would need to scan all our data.
I was wondering what is the 'best' way of passing data between views. Is it better to create invisible fields and pass it using POST, or should I encode it in my URLs? Or is there a better/easier way of doing this? Sorry if this question is stupid, I'm pretty new to web programming :)
Thanks
There are different ways to pass data between views. Actually, this is not much different than the problem of passing data between two different scripts, and of course some concepts of inter-process communication come in as well. Some things that come to mind are:
GET request - the first request hits view1 -> view1 sends data to the browser -> the browser redirects to view2
POST request - (as you suggested) same flow as above, but suitable when more data is involved
Django session variables - this is the simplest to implement (see the sketch at the end of this answer)
Client-side cookies - can be used, but there are limits on how much data can be stored
Shared memory at the web server level - tricky, but can be done
REST APIs - if you can have a stand-alone server, then that server can use REST APIs to invoke views
Message queues - again, if a stand-alone server is possible, maybe even message queues would work, i.e. the first view (API) takes requests and pushes them onto a queue, and some other process can pop messages off and hit your second view (another API). This would decouple the first and second view APIs and possibly manage load better.
Cache - maybe a cache like memcached can act as a mediator. If one is going this route, it's better to use Django sessions, as they hide a whole lot of implementation details; but if scale is a concern, memcached or Redis are good options.
Persistent storage - store data in some persistent storage mechanism like MySQL. This decouples your request-taking part (probably a client-facing API) from the processing part by having a DB in the middle.
NoSQL storage - if write rates are on the order of hundreds of thousands per second, then MySQL performance would become a bottleneck (there are ways to get around this by tweaking the MySQL config, but it's not easy). Then NoSQL DBs could be an alternative, e.g. DynamoDB, Redis, HBase, etc.
Stream processing - something like Storm or AWS Kinesis could be an option if your use case is real-time computation. In fact, you could use AWS Lambda in the middle as a serverless compute module which would read off the stream and call your second view API.
Write data into a file - then the next view can read from that file (really ugly). This should probably never be done; it's listed here only as something to avoid.
I can't think of any more; I'll update this if I do. Hope this helps in some way.
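Here is the sessions sketch referenced above: a minimal example of stashing data in one view and reading it in the next. View names, the URL name, and the session key are illustrative.

```python
# Minimal sketch of passing data between views via Django sessions.
# View names, the URL name, and the session key are illustrative.
from django.shortcuts import redirect, render


def first_view(request):
    # Stash whatever the next view needs in the session (backed by
    # the DB, cache, or signed cookies, depending on settings).
    request.session['selected_ids'] = [1, 2, 3]
    return redirect('second-view')


def second_view(request):
    # Read it back (and remove it) in the next view.
    selected_ids = request.session.pop('selected_ids', [])
    return render(request, 'report.html', {'selected_ids': selected_ids})
```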