How to make a cost-effective but scalable site?


This is a portal technology assessment: we will be creating a placement portal for campuses and industry to help place students. The portal will handle large volumes of data and many logins, approximately 1000 concurrent users per day.
What technology should I use: PHP with CakePHP as the framework, Ruby on Rails, ASP.NET, or Python? Or should I opt for cloud computing? Which of these is the most cost-effective?

Any of those will do; it really depends on what you know. If you're comfortable with Python, use Django. If you like Ruby, go with RoR. These modern frameworks are built to scale, and unless you're developing something on the scale of Facebook, they should suffice.
I personally recommend nginx as your main server, both to host static content and possibly to reverse-proxy requests to Django running under Apache2 with mod_wsgi.
Another important aspect is caching. Use something like memcached, and make sure the framework either has some sort of plugin for it or makes it easy to attach.
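As a concrete example, Django ships with memcached backends, so attaching memcached is mostly a matter of configuration. The following is a minimal `settings.py` excerpt. It assumes Django 3.2 or later (which added the `pymemcache`-based backend), the `pymemcache` package installed, and memcached listening on its default port 11211 on the same machine:

```python
# settings.py (excerpt): route Django's cache framework to memcached.
# Assumptions: Django >= 3.2, pymemcache installed, memcached on localhost:11211.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```

Views can then use `django.core.cache.cache.get()` and `.set()`, or wrap whole views with the `cache_page` decorator.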

Language choice is important: since you will be developing a mid-to-large application, choose the language that you and your team are most comfortable with. Then use a framework. With Python it should be Django. With ASP.NET, use Web Forms or ASP.NET MVC, whichever you prefer. With Ruby, use RoR. With PHP, there are too many frameworks to list.
1000 concurrent users is not that much, although it depends on what the users will do. Places where users retrieve large amounts of data are better served from a cache, with any caching engine you want. Design the application so you can easily swap between real DB calls and calls to the cache. For that, use data objects: for logins, for example, create an object array if you need one. When a user logs in, save some non-sensitive information in cookies (for example, their last login time or email) so you make fewer read calls (SELECT queries) to the DB. Never store the password in a cookie.
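One way to get that easy swap between real DB calls and cache calls is the cache-aside pattern. A small repository object checks the cache first and only calls the loader (the real DB query) on a miss. This is a minimal sketch in which a hypothetical `TTLCache` class stands in for memcached. In production you would call a memcached client with the same get/set shape:

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for memcached: values expire after a TTL."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: the next read goes to the database
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (time.monotonic() + ttl, value)


class CachedRepository:
    """Cache-aside: try the cache first, call the loader (the real DB query) on a miss."""

    def __init__(self, loader: Callable[[str], Any], cache: TTLCache, ttl: float = 60.0) -> None:
        self._loader = loader
        self._cache = cache
        self._ttl = ttl

    def get(self, key: str) -> Any:
        value = self._cache.get(key)
        if value is None:  # miss (a cached None is indistinguishable from a miss)
            value = self._loader(key)
            self._cache.set(key, value, self._ttl)
        return value
```

Because the loader is injected, you can swap the real DB for a stub, or the stand-in cache for memcached, without touching the calling code.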
Use a cookieless domain for static content such as images, JS and CSS files. On this domain, set up the fastest and simplest server you can, probably something Linux-based.
For servers, the best advice is either to get one large machine and run virtual machines on it with VMware or another Linux-based solution, or to get a few servers. The second option is better: if the one big server goes down you lose everything, but if one of several servers goes down you can still keep some things running. This is especially true if you set up "railroad mode". Railroad mode is simple. On server one, set up the application server (IIS or Apache) as master and SQL as slave. On server two, set up SQL as master and the application server as slave. Normally, server one serves IIS/Apache and server two serves SQL. If one goes down, you just need to change a line in the hosts file to point that name at the other server (on Windows the file is C:\Windows\System32\drivers\etc\hosts; on Linux it is /etc/hosts).
Use one last server for static content.
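Editing the hosts file by hand works, but the application can also make the switch itself by trying the preferred server first and falling back to the other one. This is a minimal sketch. It assumes hypothetical host names `server-one` and `server-two` and MySQL's default port 3306. It only chooses where to connect and does not promote a slave to master, which is a separate step in your database's own tooling:

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Address = Tuple[str, int]


def tcp_is_up(address: Address, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable host
        return False


def pick_server(candidates: Sequence[Address],
                is_up: Callable[[Address], bool] = tcp_is_up) -> Optional[Address]:
    """Return the first reachable server in priority order (master first), or None."""
    for address in candidates:
        if is_up(address):
            return address
    return None
```

For example, `pick_server([("server-one", 3306), ("server-two", 3306)])` returns the first server that accepts a connection.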
You will use cloud computing whether you want to or not, because you will share resources with other applications (Google's hosted APIs for jQuery and jQuery UI, for instance). But you are creating a unique application, and I don't believe building its core on cloud computing will do any good. Do make good use of large sites' CDNs, though.

Related

How do I deploy this app for my job: EC2, Elastic Beanstalk, something else entirely?

I'm tasked with creating a web app (I think?) for my job that will track something in our system. It'll be an internal tool that staff use to keep track of the status of one of the things we do. It should look like Trello, with cards that can be dragged from step to step. That frontend exists, but my job is to make the system update when the cards are dragged. This requires using a Python API, which isn't that complicated to read from and update. I have no idea how to put all of this together. My job is almost completely nontechnical, and no one else internally knows what I'm doing. I'm way over my head here and have no idea where to begin. Is this something I should deploy on Elastic Beanstalk? EC2? How do I tie this together and put it somewhere?

Are you trying to pull in live data from Trello or from your company's own internal project management tool?
An EC2 instance might be useful, but honestly, it may be completely unnecessary if your company has its own servers. EC2 is basically a service for renting virtual machines, which helps with scaling. I have never used Elastic Beanstalk, so my input would be useless there.
From what I can gather from the question, you could have a Python script running that pulls from the API and makes the changes, without EC2.

The first thing you should do is gather as much information as you can about what the end product should look like. From your question, I have the feeling that you have only a vague idea of what the stakeholders want. Don't be afraid to ask for clarification about an unclear task. It's better to spend 30 minutes discussing and taking notes than to show the end product after a month and realize that's not what your boss or team wanted.
Questions I would ask
Who is going to be using this app? (technical or non-technical person)
For what purpose is this being developed?
Does it need to be on the web or can it be used locally?
How many users need to have access to this application?
Are we handling sensitive information with this application?
Will this need to be augmented with other functionality at some point?
This is just a sample of what I would ask, during the conversation with the stakeholder a lot more will pop up for sure.
What I think you have to do
You need to build a monitoring system for the tasks your development team has to do (like a Kanban board).
What I think you already have
A frontend with cards that can be dragged to each bin. I also assume that you can create and delete cards in the frontend. The frontend is most likely written in React, Angular or Vue.js. You might also have no frontend framework (a mix of jQuery and vanilla JS), but frontend developers usually end up picking a framework of some sort to help with development.
A backend API in Python (most likely Flask or Django REST Framework) that communicates with a SQL database such as PostgreSQL or a document database such as MongoDB.
I'm making a lot of assumptions here, but your aim should be to understand the technology you will be working with, so you can determine which hosting would work best. For instance, if the database that is set up is MySQL, you might have some trouble with some hosting providers.
What I think you are missing
Currently the frontend and the backend don't communicate with each other: when you drag a card, the change is lost if you refresh the page. Also, all of this is sitting on your computer and can't be used by anyone on your staff. You first need to connect the frontend to the backend so that the application has persistence. Then you need to deploy the application somewhere your staff can reach it.
What I would do first is work locally to make sure the persistence layer works. This means running the API server, the frontend server and the database server simultaneously on your computer while you develop. Then fetch data from the API to find out which cards are in the database, and create them visually in your frontend at the right spot.
When you drop a card in a new spot after dragging it, the drop should trigger a POST request to your API server to update the status of that card (check your API's documentation for what you need to send).
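The frontend will most likely send this request from JavaScript, but it helps to be able to exercise the API from a script before you wire up drag-and-drop. This is a minimal Python sketch using only the standard library. It assumes a hypothetical endpoint `POST {api_base}/cards/{id}/status` that takes `{"status": ...}`; your API's documentation defines the real path and body:

```python
import json
import urllib.request


def build_status_update(api_base: str, card_id: int, new_status: str) -> urllib.request.Request:
    """Build (but do not send) a POST that moves a card to a new column."""
    # Hypothetical endpoint layout; replace with what your API documents.
    url = f"{api_base.rstrip('/')}/cards/{card_id}/status"
    body = json.dumps({"status": new_status}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON the API returns."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

`build_status_update` only builds the request, so you can inspect it without a running server. `send(...)` performs the actual call.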
If the POST request succeeds, the server should send back the updated status of the card, and your application can simply redraw the card at the right spot. This won't make a visible difference, since the card is already in the right spot, and your frontend framework most likely won't act on the change at all because the state hasn't changed. That's all I would do for that part.
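On the server side, the handler behind that POST comes down to four steps: look up the card, validate the new status, save it, and return the updated card as JSON. This is a framework-agnostic sketch with hypothetical column names and an in-memory dict standing in for the database table. In Flask or Django REST Framework, this function would sit inside a view:

```python
from typing import Dict, Tuple

# Hypothetical board columns; use whatever statuses your system defines.
ALLOWED_STATUSES = ("todo", "in_progress", "done")


def update_card_status(cards: Dict[int, dict], card_id: int, payload: dict) -> Tuple[int, dict]:
    """Apply a status change and return (HTTP status code, JSON-serializable body)."""
    card = cards.get(card_id)
    if card is None:
        return 404, {"error": "card not found"}
    new_status = payload.get("status")
    if new_status not in ALLOWED_STATUSES:
        return 400, {"error": "status must be one of " + ", ".join(ALLOWED_STATUSES)}
    card["status"] = new_status  # a real backend would run an UPDATE and commit here
    return 200, card  # the updated card the frontend uses to redraw
```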
I would then move to the deployment phase to make sure that whatever you built locally still works online. I would start with Heroku instead of jumping directly to AWS. Heroku is a service built on top of AWS that manages a lot of AWS's complexity for you. This is great for prototyping, and it means that when your app is ready you can migrate to AWS easily, knowing that a working setup exists. You might also be tied to your company's servers, which is another thing I would ask the stakeholders about (i.e., where I can and can't put this application).
The flow for a frontend + API + database application on Heroku is usually as follows. You create a GitHub repo for your frontend (make it private) and create an app on Heroku that watches this repository for changes. Whenever it sees a change, it redeploys the application for you at a specific Heroku subdomain. You will need to configure a Procfile that tells Heroku what to run for a given application type. This is where you need to double-check which frontend you are using, since that can change the Procfile. It's most likely a Node.js-based frontend (React, Angular or Vue), so head over here for the documentation on how to put that online.
You will also need to make a separate repo for the backend. These two entities are distinct, and they only communicate through HTTP requests (frontend->backend) and JSON responses (backend->frontend). Follow the same approach as with the frontend to deploy it; head over here.
Once these two are online, you need to create a database on Heroku. You do this by adding a datastore to your API app; head over here. Some framework-specific configuration is needed to make the API talk to an online database, and you will find it in your framework's documentation. The database could also already be up and running on your company's server. If so, you just need to configure your online backend to talk to that database at its address.
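On Heroku, the Heroku Postgres add-on exposes the connection string to your app as the `DATABASE_URL` config var, so configuring the backend usually comes down to reading that variable. This is a minimal standard-library sketch that splits the URL into the pieces most frameworks ask for. The local fallback URL is a hypothetical development database:

```python
import os
from urllib.parse import urlparse


def db_settings_from_url(url: str) -> dict:
    """Split a postgres://user:password@host:port/dbname URL into connection settings."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL's default port if none is given
        "name": parts.path.lstrip("/"),
    }


def db_settings() -> dict:
    # Heroku sets DATABASE_URL; the fallback is a hypothetical local database.
    return db_settings_from_url(
        os.environ.get("DATABASE_URL", "postgres://dev:dev@localhost:5432/cards_dev")
    )
```

For Django, the third-party `dj-database-url` package does the same job and returns a dict ready for the `DATABASES` setting.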
Once all of the above is done, retest your application to check that you get the same behavior as before. This is a usable MVP, but it has no security layer: anyone with the right URL could fetch your frontend and start messing around with your data.
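As a bare-minimum first layer, the API can refuse requests that don't carry a shared secret. This is a minimal sketch. It assumes the frontend sends an `Authorization: Bearer <key>` header and that the key is configured on the server (for example through a hypothetical `API_KEY` config var). It is not a substitute for real user login, and for a staff-only tool your company's single sign-on may be the better option:

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str]) -> bool:
    """Check an 'Authorization: Bearer <key>' header against a shared secret."""
    if not expected_key:
        return False  # fail closed if the server has no key configured
    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    if not supplied.startswith(prefix):
        return False
    # compare_digest avoids leaking, through timing, how much of the key matched.
    return hmac.compare_digest(supplied[len(prefix):].encode(), expected_key.encode())
```

Keep in mind that any key shipped inside frontend JavaScript is visible to whoever loads the page, so this only keeps out casual visitors.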
More engineering is needed to make this a viable end product. This leads to my final remark: why aren't you using a product like Trello, Jira, or even GitHub Projects? If it is to save money on a subscription, I think you should factor in the cost of developing, securing and maintaining this application.
Hope it helps!

One simple option is to deploy your API and your frontend application on Heroku.

OLAP Server for NodeJS

I have been looking for ways to provide analytics for an app powered by a REST server written in Node.js with MySQL. I discovered OLAP, which could make this much easier.
I also found a Python library that provides an OLAP HTTP server called 'Slicer':
http://cubes.databrewery.org/
Can someone explain how this works? Does this mean I have to update my schema and create what are called fact tables?
Can this be used in conjunction with my Node.js app? Are there any examples? I have only ever created single-server apps. Would Python reside on the same server as Node.js? How would it start? ('forever app.js' is my default script.)
If I can't use Python because I have no experience with it, what are the basics of doing this in Node.js?
My model is basically a list of words, so my OLAP queries are along these lines: words created per day, week, or month; of length 2, 5, or 10 letters; in English, French, German, etc.
Ideas, hints and guidance much appreciated!

As you found out, Cubes provides an HTTP OLAP server (the Slicer tool).
Can someone explain how this works?
Being an OLAP server, Slicer accepts OLAP queries. Its API is REST/JSON-based, so you can easily query it via HTTP from JavaScript, Node.js, Python or any other language of your choice.
The server answers OLAP queries, which are based on a model of "facts" and "dimensions". For example, you can query "the total sales amount for a given country and product, itemized by month".
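To make "query via HTTP" concrete for the words model in the question: an aggregate request is just a GET with the cube name in the path and the cuts and drilldowns as query parameters. This is a minimal sketch that builds such a URL. It assumes Cubes 1.x's `/cube/<name>/aggregate` endpoint and its `dimension:value` and `dimension:level` syntax (check the server documentation for your version). It also assumes a hypothetical cube `words` with `date` and `language` dimensions:

```python
from urllib.parse import urlencode


def aggregate_url(server: str, cube: str, drilldown: str = "", cut: str = "") -> str:
    """Build a Slicer aggregate URL such as /cube/words/aggregate?drilldown=...&cut=..."""
    params = {}
    if drilldown:
        params["drilldown"] = drilldown  # e.g. "date:month": group results by month
    if cut:
        params["cut"] = cut  # e.g. "language:eng": restrict to one dimension value
    query = f"?{urlencode(params)}" if params else ""
    return f"{server.rstrip('/')}/cube/{cube}/aggregate{query}"
```

Fetching the result is a plain `urllib.request.urlopen(url)` followed by `json.load`. From Node.js, it is the same HTTP GET.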
Does this mean I have to update my schema and create what are called fact tables?
OLAP queries are built around the concepts of facts and dimensions.
OLAP-oriented datawarehousing strategies often involve the creation of these Fact and Dimension tables, building what is called a Star Schema or a Snowflake Schema. These schemas offer better performance for OLAP-type queries on relational databases. Data is often loaded by what is called an ETL process (it can be a simple script) that loads data in the appropriate form.
The Python Cubes framework, however, does not force you to alter your schema or create an alternate one. Its SQL backend lets you define your model (in terms of facts and dimensions) without having to change the actual database model. This is the documentation for the model definition: https://pythonhosted.org/cubes/model.html .
However, in some cases you may still prefer to define a schema for Data Mining and use a transformation process to load data periodically. It depends on your needs, the amount of data you have, performance considerations, etc...
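To give a sense of what such a transformation process produces, here is a tiny star schema for the words model in the question. It has one fact table with a row per word, plus date and language dimension tables. A small ETL function loads it, and an OLAP-style query reads it. This is a minimal sketch using an in-memory SQLite database. The table names and sample rows are hypothetical, and in practice the ETL would read from your MySQL tables on a schedule:

```python
import sqlite3
from datetime import date

# Hypothetical rows from the app's existing table: (word, language, created_on)
SOURCE_ROWS = [
    ("hi", "eng", date(2014, 1, 5)),
    ("house", "eng", date(2014, 1, 20)),
    ("maison", "french", date(2014, 2, 3)),
    ("haus", "german", date(2014, 2, 3)),
]

SCHEMA = """
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
CREATE TABLE dim_language (language_key INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE fact_word (
    date_key INTEGER REFERENCES dim_date (date_key),
    language_key INTEGER REFERENCES dim_language (language_key),
    word_length INTEGER,  -- kept on the fact row as a simple (degenerate) dimension
    word_count INTEGER    -- the measure: 1 per word, summed by queries
);
"""


def etl(conn: sqlite3.Connection, rows) -> None:
    """Transform source rows into dimension rows plus one fact row per word."""
    for word, language, created_on in rows:
        date_key = int(created_on.strftime("%Y%m%d"))
        conn.execute("INSERT OR IGNORE INTO dim_date VALUES (?, ?, ?, ?)",
                     (date_key, created_on.year, created_on.month, created_on.day))
        conn.execute("INSERT OR IGNORE INTO dim_language (name) VALUES (?)", (language,))
        (language_key,) = conn.execute(
            "SELECT language_key FROM dim_language WHERE name = ?", (language,)).fetchone()
        conn.execute("INSERT INTO fact_word VALUES (?, ?, ?, 1)",
                     (date_key, language_key, len(word)))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
etl(conn, SOURCE_ROWS)

# OLAP-style question: how many words per language per month?
monthly = conn.execute("""
    SELECT l.name, d.year, d.month, SUM(f.word_count)
    FROM fact_word AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_language AS l ON l.language_key = f.language_key
    GROUP BY l.name, d.year, d.month
    ORDER BY l.name, d.year, d.month
""").fetchall()
```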
With Cubes you can also use non-RDBMS backends (e.g., MongoDB), some of which offer built-in aggregation capabilities that OLAP servers like Cubes can leverage.
Can this be used in conjunction with my Node.js app?
You can issue queries to your Cubes Slicer server from NodeJS.
Any examples?
There is a Javascript client library to query Cubes. You probably want to use this one: https://github.com/Stiivi/cubes.js/
I don't know of any examples using NodeJS. You can try to get some inspiration from the included AngularJS application in Cubes (https://github.com/Stiivi/cubes/tree/master/incubator). Another client tool is CubesViewer which may be of use to you while building your model: http://jjmontesl.github.io/cubesviewer/ .
I have only ever created single-server apps. Would Python reside on the same server as Node.js? How would it start? ('forever app.js' is my default script.)
You would run the Cubes Slicer server as a web application served directly by your web server (e.g., Apache). With Apache, for example, you would use mod_wsgi, which lets Apache serve Python applications.
Slicer can also run as a small web server in a standalone process, which is very handy during development (though I wouldn't recommend it for production environments). In that case, it listens on a different port (typically http://localhost:5000).
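Serving a Python application through mod_wsgi means Apache imports a Python file and calls the WSGI callable in it, which mod_wsgi looks for under the name `application` by default. This is a toy sketch of that interface. It is not Cubes' own code; it only illustrates what Apache ends up calling when it serves a Python web app such as Slicer:

```python
import json


def application(environ, start_response):
    """Toy WSGI app: echoes the requested path as JSON."""
    body = json.dumps({"path": environ.get("PATH_INFO", "/")}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # WSGI apps return an iterable of byte strings
```

In Apache, the `WSGIScriptAlias` directive maps a URL path to such a file. For a quick local run, `wsgiref.simple_server.make_server("", 5000, application).serve_forever()` serves it on port 5000.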
If I can't use Python because I have no experience with it, what are the basics of doing this in Node.js?
You don't really need to write any Python. You can configure and run Python Cubes as your OLAP server, and run queries from JavaScript code (e.g., directly from the browser). From the client's point of view, it is like a database system that you query via HTTP and that returns responses in JSON format.

Django and Rails with one common DB

I have previously used Java + Spring to build a web app.
I now have to build a new web app.
It will have one centralized DB.
There will be two different types of web app instance:
Web App 1:
a) It will render no UI at all: no HTML, JS, etc.
b) All it needs to provide is a set of REST APIs that will:
b.1) create new entries in the DB
b.2) modify entries in the DB
b.3) retrieve DB records in JSON format.
Some frontend code (which doesn't belong to this app) will periodically fetch these details.
c) It will be used by at most 100,000 people, but at any given time
we can expect about 1000 users logged in and doing what is described in b).
Web App 2:
a) It will have some dashboards.
b) 90% of DB operations will be reads.
c) 10% of DB operations will be writes/modifications.
d) The system will have a few thousand users, and at any given time
only about 50-1000 people will be accessing it.
I am thinking of the following:
Build Web App 1 in Python + Django and Web App 2 in RoR.
I am planning to use DynamoDB and memcached.
Why two different frameworks?
1) So that I get to learn both of them.
2) There have been concerns about scalability in RoR (though I know some people claim that isn't an issue), and Web App 1 may have scaling needs in the future.
My question is: do you see any problem with this combination?
For example, ActiveRecord expects specific naming conventions for your database tables. Are there any other concerns like this?
Has anyone else used a similar technology stack?
Both are full-stack frameworks and provide MVC, templating, unit testing, security, DB migrations, caching and ORMs.

For my startup, we also needed to put out a full-fledged website along with an API. We also use DynamoDB to store most of the data, and use MySQL only for session info.
I opted for Ruby on Rails for the web app and Sinatra for the API. If your criterion is simply learning as many new things as possible, then it makes sense to pick quite different stacks (Django/Python and RoR). In our case, we went with Sinatra because it's essentially a very lightweight wrapper around Rack, which makes it perfect for an API that receives requests, calls one or more services or does some processing, and hands back a formatted response. While I don't see any problem with using Python/Django instead of Sinatra, in our case the benefit was spending less time working in a different language.
Also, scalability in Rails is a bit of an iffy subject. In the end, it's about how you use it. We've had no issues scaling Rails with Unicorn and nginx. Our business logic is all in the API service, and the Rails server also uses the API for most of its work. This means we don't use ActiveRecord in Rails. The website is just another consumer of our API, which does all the heavy lifting whether the request comes from an app or the website. Using MySQL as the session store means we can route requests to any of the application servers, without having to send requests from the same client to the same server every time. This lets us scale up and down easily, considering only the amount of traffic we're getting.
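A shared session store removes the need for "sticky" routing because no app server keeps per-user state in its own memory. This is a toy sketch of the idea, with a hypothetical in-memory `SharedSessionStore` standing in for the MySQL session table:

```python
import secrets
from typing import Dict, Optional


class SharedSessionStore:
    """Stand-in for a session table (e.g. in MySQL) that every app server can reach."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}  # session_id -> session data

    def create(self, data: dict) -> str:
        session_id = secrets.token_urlsafe(16)  # unguessable id sent to the client as a cookie
        self._rows[session_id] = dict(data)
        return session_id

    def load(self, session_id: str) -> Optional[dict]:
        return self._rows.get(session_id)


class AppServer:
    """An application server that keeps no per-user state of its own."""

    def __init__(self, name: str, sessions: SharedSessionStore) -> None:
        self.name = name
        self.sessions = sessions

    def handle(self, session_id: str) -> str:
        session = self.sessions.load(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello {session['user']}"
```

Because both servers read from the same store, a load balancer can send any request to any of them.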
When we started working on this, there wasn't an ORM for DynamoDB that looked and felt just like ActiveRecord, so we ended up writing a few high-level classes of our own to store and retrieve models in DynamoDB. Since DynamoDB is not tailored for scans or joins, this didn't take much effort: we were almost always doing lookups based on keys and ranges. So we didn't really need a replacement for ActiveRecord, whose real strength is letting you do joins, etc. intuitively, by convention.
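The "lookups based on keys and ranges" access pattern is small enough to sketch. Items are grouped by a hash (partition) key and kept sorted by a range (sort) key, so you can fetch one item or a contiguous range cheaply, but nothing else. This is a minimal in-memory stand-in that illustrates the model. It is not the DynamoDB API, and the key names are hypothetical:

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class KeyRangeTable:
    """In-memory stand-in for a table keyed by (hash key, range key); not the DynamoDB API."""

    def __init__(self) -> None:
        # hash key -> list of (range key, item), kept sorted by range key
        self._partitions: Dict[str, List[Tuple[Any, dict]]] = defaultdict(list)

    def _range_keys(self, hash_key: str) -> List[Any]:
        return [range_key for range_key, _ in self._partitions.get(hash_key, [])]

    def put(self, hash_key: str, range_key: Any, item: dict) -> None:
        partition = self._partitions[hash_key]
        index = bisect.bisect_left(self._range_keys(hash_key), range_key)
        if index < len(partition) and partition[index][0] == range_key:
            partition[index] = (range_key, item)  # same key pair: overwrite the item
        else:
            partition.insert(index, (range_key, item))

    def get(self, hash_key: str, range_key: Any) -> Optional[dict]:
        partition = self._partitions.get(hash_key, [])
        index = bisect.bisect_left(self._range_keys(hash_key), range_key)
        if index < len(partition) and partition[index][0] == range_key:
            return partition[index][1]
        return None

    def query_between(self, hash_key: str, low: Any, high: Any) -> List[dict]:
        """Items under one hash key whose range key lies in [low, high], in order."""
        partition = self._partitions.get(hash_key, [])
        keys = self._range_keys(hash_key)
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [item for _, item in partition[start:end]]
```

With boto3, the real equivalents are `Table.get_item`, and `Table.query` with a `KeyConditionExpression` built from `boto3.dynamodb.conditions.Key` (for example `Key("user").eq(...) & Key("ts").between(...)`). Anything outside that pattern turns into a scan.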
DynamoDB does have its limitations, though, and you might find yourself in situations where you need to scan a large number of records. We also use CloudSearch to index some important info, and fall back to it when we need text-based searches that would otherwise scan all our data.

How to re-architect a portal to support a mobile app

Currently I am working on a portal which is exposed to end users. This portal is developed using Python 2.7, Django 1.6 and MySQL.
Now we want to expose this portal as a mobile app, but the current design does not support that because the templates, views and database are tightly coupled. So we decided to re-architect the whole portal. After some research, I came up with the following:
Client side: AngularJS for all client side operations like show data and get data using ajax.
Server side: a REST API exposed to AngularJS. This REST API can be developed with either Tastypie or Django REST Framework (still not decided), and will be served by Django.
I have a few questions:
What do you think about this architecture? Is it a good or bad design? How can it be improved?
Will the performance of the portal go down after adding the above layers to the architecture?
In the above architecture, should two servers be used (one for the client and the other for serving the APIs), or will one server be enough? Currently Heroku is used for deployment.
Currently portal is getting 10K hits in a day and it is expected to go to 100K a day in 6 months. Will be happy to provide more information if needed.

If I had the opportunity to architect the portal you mentioned, I would design it the way I have already explained here.

What do you think about this architecture?
This is a common service-oriented architecture with decoupled clients. You just have REST endpoints on your backend, and any client can consume them.
You should also think about:
Do you need a RESTful service? (RESTful == stateless: will you store any state on the server?)
How will you scale the service in the future? (This is a legitimate concern, since you already expect a large traffic increase and are considering two servers.)
How it can be improved?
Use Scala instead of Python :)
Will the performance of the portal go down after adding the above layers to the architecture?
It depends.
There will be some performance penalty (every additional abstraction layer has its cost), but most probably you won't even notice it. Still, you should measure it with some stress tests.
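A stress test doesn't need to be elaborate to be informative. Fire many concurrent requests at an endpoint, then compare the latency distribution before and after the re-architecture. This is a minimal thread-based sketch in which `call` is whatever issues one request. For serious load testing, a dedicated tool such as Locust or ApacheBench (`ab`) will be more accurate:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def measure(call: Callable[[], object], requests: int = 200, concurrency: int = 10) -> Dict[str, float]:
    """Run `call` `requests` times across `concurrency` threads; summarize latency in seconds."""

    def timed(_: int) -> float:
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies: List[float] = sorted(pool.map(timed, range(requests)))

    # Nearest-rank approximation of the 95th percentile.
    p95_index = max(0, int(round(0.95 * len(latencies))) - 1)
    return {
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "p95": latencies[p95_index],
        "max": latencies[-1],
    }
```

For example, `measure(lambda: urllib.request.urlopen("http://localhost:8000/api/items/").read())` would load-test a hypothetical local endpoint.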
In the above architecture, should two servers be used (one for the client and the other for serving the APIs), or will one server be enough? Currently Heroku is used for deployment.
Well, as usual, it depends.
It depends on the usage profile you have right now and on the resources available.
As for whether the new design will perform better than the old one, that depends on a number of parameters.
Summary
This is a good overall approach for the system with different clients.
It will allow you:
Totally decouple mobile app and frontend development from backend development. (It could be different independent teams, outsourceable)
Standardize your API layer (as all clients will consume the same endpoints)
Make your service easier to scale (this includes a separate web server for static assets and much more).

Multi tier architecture implementation on Python

I need to create a web application that users can reach both as a regular web site and as an XML-RPC web service. The web site should also have a mobile version. I'm planning to use the following technologies:
Django (for the web frontends, regular and mobile).
Pyramid (for the web service).
SQLAlchemy, Memcached (for the persistence layer).
Later, other projects may need to access this data and logic, so I think it is better to build two tiers. I see it this way:
Tier 1. Main logic service level. This level will provide an API for frontend applications (the Django-powered web site, for example).
Tier 2. Various, mostly end-client applications (the web site, an API for remote client devices).
For communication between these tiers I'm planning to use the XML-RPC protocol.
In this case it will be easy to scale, to add new frontend applications, or to connect other projects to it (or so I believe).
My main question is: what can I use to make building the first tier easy? Is there a framework well suited to that?
And what do you think about this whole architecture? I feel that I'm thinking in Java terms while developing in Python. Maybe there are other idioms in the Python world for such situations.
Thanks for your time and help.
P.S.
Links for further reading are welcome.
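For reference, the XML-RPC link between tiers described in the question needs nothing beyond Python's standard library. This is a minimal sketch that runs both sides in one process. A hypothetical tier-1 function is registered on `SimpleXMLRPCServer`, and "tier 2" calls it through `ServerProxy`:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def get_user_count() -> int:
    """Hypothetical tier-1 business function exposed to tier-2 clients."""
    return 42


# Tier 1: the logic service. Port 0 lets the OS pick a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_user_count, "get_user_count")
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# Tier 2: a frontend (e.g. the Django site) calling tier 1 over XML-RPC.
with ServerProxy(f"http://{host}:{port}/") as tier1:
    user_count = tier1.get_user_count()

# Stop the service loop and release the socket.
server.shutdown()
server.server_close()
```

In a real deployment, the two sides would be separate processes or machines, and the server would listen on a fixed address rather than port 0.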

This architecture really makes no sense. You're using Django, a full-stack web framework, for the front end, but not using it for the database. And you're using Pyramid, another full-stack web framework, for the web service side, thus ensuring that you duplicate all the business logic.
Much as I am an advocate of Django, I would say it has no place in your architecture. It looks like the only thing you're really using it for is URL routing and templates, both of which Pyramid does itself fine - you can even use Jinja2, which is based on Django's template language, as the template language in Pyramid if you like.
Doing it this way means that you can share the business logic between the front-end and web service code, since you'll almost certainly find that a lot of it will be the same.
I must say also that I don't understand the division into tiers, which you have described as separate from the front-end/web service division. To me, the web service is the second tier. It makes no sense to have a further division.

You should check out the TurboGears framework, as it is composed of several popular components: an ORM with SQLAlchemy, Pylons for logic and WSGI support, support for several templating engines for the frontend, and much more.
I use it for several web services behind AJAX-enabled front ends (like Flex-based apps, among others). You can also put Apache or your favorite WSGI-enabled web server in front of the TG2-based web app.
Check out their website; they have a tutorial for setting up a wiki in 20 minutes.
Cheers!
