I have previously worked with Java+Spring to create a web app.
I have to build a new web app now.
It will have one centralized DB.
There will be two different types of web-app instances.
Web-App 1:
a) It won't render any UI: no HTML, JS, etc.
b) All it needs to expose is a set of REST APIs which will
b.1) create some new entries in the DB
b.2) modify some entries in the DB
b.3) retrieve some DB records in JSON format.
Some frontend code (which doesn't belong to this app) will periodically fetch
these details.
c) It will be used by at most 100,000 people, but at any given point in time
we can expect about 1,000 users logged in and doing what is described in b).
Web-App 2:
a) It will have some dashboards
b) 90% of DB operations would be read operations
c) 10% of DB operations would be write/modify
d) There will be thousands of users of this system, and at any given point in time
only 50-1,000 people will be accessing it.
I am thinking of the following:
build Web-App 1 in Python+Django and Web-App 2 in RoR.
I am planning to use DynamoDB and memcached.
Why two different frameworks?
1) So that I get to learn both of them
2) There have been concerns about scalability in RoR (and I also know people claim that isn't an issue), and Web-App 1 may have scaling needs in the future.
My question is: do you see any problem with this combination?
For example, ActiveRecord wants you to use a specific naming format for your database tables. Are there any other concerns similar to this?
Has anyone else used a similar technology stack?
Both frameworks are full-stack frameworks and provide MVC, templating, unit testing, security, DB migrations, caching, and ORMs.
For my startup, we also needed to put out a full-fledged website along with an API. We are also using DynamoDB for storing most of the data and are only using MySQL for session info.
I opted to use Ruby on Rails for the web app and Sinatra for the API. If your criterion is simply learning as many new things as possible, then it would make sense to opt for relatively different stacks (Django/Python and RoR). In our case, we went with Sinatra because it's essentially a very lightweight wrapper around Rack and perfect for an API which essentially receives requests, calls one or more services or does some processing, and hands out a formatted response. While I don't see any problem with using Python/Django instead of Sinatra, in our case the benefit was spending less time working in a different language.
Also, scalability in Rails is a bit of an iffy subject. In the end, it's about how you use it. We've had no issues scaling Rails with Unicorn and nginx. Our business logic is all in the API service, and the Rails server also uses the API for most of the work. This means we don't use ActiveRecord on Rails, and the website is just another consumer of our API, which does all the heavy lifting whether the request comes from an app or the website. Using MySQL for the session store ensures we can route requests to any of the application servers without having to worry about always routing requests from the same client to the same server. This allows us to ramp up and down easily, considering only the amount of traffic we're getting.
At the time we started working on this, there wasn't an ORM for DynamoDB which looked and felt just like ActiveRecord, so we ended up writing a few high-level classes of our own to handle storage and retrieval of models on DynamoDB. Considering DynamoDB is not tailored for scans or joins, this didn't take a lot of effort, since we were almost always doing lookups based on keys and ranges. This meant we didn't really need a replacement for ActiveRecord, since the real strength of ActiveRecord is being able to intuitively do joins, etc. by convention.
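In Python terms (since you're considering Django), such a wrapper can stay very small. This is only a rough sketch using boto3, with a hypothetical "users" table keyed on user_id (hash) and created_at (range):

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

class UserStore:
    """Thin model-like layer: plain key/range lookups, no scans or joins."""

    def __init__(self):
        self.table = dynamodb.Table("users")  # hypothetical table name

    def save(self, item):
        # item is a plain dict, e.g. {"user_id": "42", "created_at": 1700000000, ...}
        self.table.put_item(Item=item)

    def find(self, user_id, created_at):
        response = self.table.get_item(Key={"user_id": user_id, "created_at": created_at})
        return response.get("Item")

    def history(self, user_id, start, end):
        # Range query on the sort key -- the kind of lookup DynamoDB is good at.
        response = self.table.query(
            KeyConditionExpression=Key("user_id").eq(user_id)
            & Key("created_at").between(start, end)
        )
        return response["Items"]
```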
DynamoDB does have its limitations, though, and you might find yourself in situations where you will need to scan a large number of records. In our case, we also use CloudSearch to index some important info and use it as a fallback for cases when we need to do text-based searches that would otherwise scan all our data.
Is there software that provides multi-DB multi-tenant support for Django and works with MongoDB?
I think I only need multi-tenancy at the database level and maybe at the schema level but not at the application level.
I have a pretty complicated user model. Some users can view certain data entered by other users. Users usually belong to organizations. Organizations can be nested hierarchically, and there can be similarities in how the application is configured for users within an organization (e.g., all users within an organization will fill out the same form, unless that's overridden for an individual user). Sometimes certain data that users submit can be viewed by users outside of their organization and even outside of the hierarchy their organization is within. Organizations using the app can be competitors, and the data we're dealing with is sensitive, so it needs to be very secure. It also needs to be developed very quickly.
I'm thinking of giving each user their own DB, and then either having shared DBs or one shared DB with multiple schemas in order to store configurations that are shared across users within organizations.
Multi-tenancy on MongoDB is perfectly viable; we are using it in production at onliquid.com.
I don't know of any lib, plugin, or specific software that does it for you, but it is doable without that much effort. If you want to dive into it, I would advise paying special attention to how the driver you are using behaves when selecting the database to read from and write to, and starting there. Also take a look at MongoDB configuration options like smallfiles and directoryperdb, which allow you to better manage the differences and avoid some problems.
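As a rough illustration of the driver side, here is a minimal pymongo sketch of per-tenant database selection (the tenant slug, database naming scheme, and collection name are all made up; in a real app the tenant would come from the subdomain, the session, or an auth token):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

def db_for_tenant(tenant_slug):
    # One database per tenant, e.g. "tenant_acme", "tenant_globex".
    return client["tenant_" + tenant_slug]

def save_form_submission(tenant_slug, document):
    db = db_for_tenant(tenant_slug)
    return db.form_submissions.insert_one(document).inserted_id

def list_form_submissions(tenant_slug):
    db = db_for_tenant(tenant_slug)
    return list(db.form_submissions.find())
```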
I wrote a blog post about this some time ago for Ruby on Rails using Mongoid; most of the details are applicable to any web framework, since they are specific to the inner workings of MongoDB.
I have a server with a database; the server listens for HTTP requests and uses JSON for
data transfer.
Currently, what my server code (Python) mainly does is read the JSON, convert it to SQL, and make some modifications to the database. The function of the server, as I see it, is basically that of a converter between JSON and SQL. Is this the procedure that people usually follow?
Then I came up with another idea: I can define a class for user information, and every user's information in the database is an instance of that class. When I get the JSON, I first load it into an instance, do some operations, and then write it to the database. In my understanding, this adds a language layer between the HTTP request and the database.
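Roughly what I have in mind is something like this (a minimal sketch; the table, fields, and use of sqlite3 are just for illustration):

```python
import json
import sqlite3

class User:
    """Language-level layer between the incoming JSON and the database."""

    def __init__(self, user_id, name, email):
        self.user_id = user_id
        self.name = name
        self.email = email

    @classmethod
    def from_json(cls, payload):
        data = json.loads(payload)
        return cls(data["id"], data["name"], data["email"])

    def save(self, conn):
        conn.execute(
            "INSERT OR REPLACE INTO users (id, name, email) VALUES (?, ?, ?)",
            (self.user_id, self.name, self.email),
        )
        conn.commit()

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
user = User.from_json('{"id": 1, "name": "Alice", "email": "alice@example.com"}')
user.save(conn)
```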
The question is, what do people usually do?
The answer is: people usually do whatever they need to do. The layer between the database and the client normally provides a higher-level API, to make the requests independent of the actual database. But how this higher level looks depends on the application you have.
People usually make use of a Web framework, instead of implementing the basic machinery themselves as you are doing.
That is: Python is a great language that easily allows one to translate "JSON into SQL" with a small amount of code, and it is great for learning. If you are doing this for educational purposes, it is a nice project to continue fiddling with, and maybe you can pick up some nice ideas from this answer and from others.
But for "real world" usage, the apparently simple plan runs into real-world issues. Tens or even hundreds of them: how to properly separate HTML/CSS templates from content from core logic, how to deal with many, many aspects of security, etc.
That is where the frameworks come into play: a web-framework project is a software project that has had, over the years, sometimes hundreds of hours of work from several contributors, who have thought about and addressed all of the issues a real web application can and will face.
So, it is OK to do everything from scratch if you believe you can come up with a framework that has distinguished capabilities not available in any existing project. And it is OK to make small projects for learning purposes. It is not OK to try to come up with something from scratch for real production servers without having deep knowledge of all the issues involved, and without knowing at least 3 or 4 frameworks well.
So, now that you've got a basic understanding of a way to get to a framework, it is time to learn some of the frameworks themselves. Try, for example, Bottle and Flask (microframeworks), Django (a fully featured framework for web application development), and maybe Tornado (an HTTP server, but with enough of a web framework in it to be usable, and very instructive). Just reading the "how to get started" documentation for these projects, to get to a "hello world" page, will lead you to lots of concepts you probably had not thought about yet.
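For a sense of how small that starting point is, a Flask "hello world" that returns JSON (the route and payload here are made up) is about this much code:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/users/<int:user_id>")
def get_user(user_id):
    # In a real app this would come from the database.
    return jsonify({"id": user_id, "name": "example"})

if __name__ == "__main__":
    app.run(debug=True)
```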
I have a desktop python application whose data backend is a MySQL database, but whose previous database was a network-accessed xml file(s). When it was xml-powered, I had a thread spawned at the launch of the application that would simply check the xml file for changes and whenever the date modified changed (due to any user updating it), the app would refresh itself so multiple users could use and see the changes of the app as they went about their business.
Now the program has matured and is venturing toward an online presence so it can be used anywhere. XML is out the window and I'm using MySQL with SQLAlchemy as the database access method. The plot thickens, however, because the information is no longer stored in one XML file but rather is split into multiple tables in the SQL database. This complicates the idea of some sort of 'last modified' table value or structure. Thus the question: how do you inform the users that the data has changed and the app needs to refresh? Here are some of my thoughts:
Each table needs a last-modified column (this seems like the worst option ever)
A separate table that holds some last modified column?
Some sort of push notification through a server?
It should be mentioned that I have the capability of running perhaps a very small python script on the same server hosting the SQL db that perhaps the app could connect to and (through sockets?) it could pass information to and from all connected clients?
Some extra information:
The information passed back and forth would be pretty low-bandwidth. Mostly text with the potential of some images (rarely over 50k).
Number of clients at present is very small, in the tens. But the project could be picked up by some bigger companies with client numbers possibly getting into the hundreds. Even still the bandwidth shouldn't be a problem for the foreseeable future.
Anyway, somewhat new territory for me, so what would you do? Thanks in advance!
As I understand it, this is not a client-server application, but rather an application that has common remote storage.
One idea would be to change to web services (this would solve most of your problems in the long run).
Another idea (if you don't want to switch to the web) is to periodically refresh the data in your interface using a timer.
Another (more complicated) way would be to have a server that receives all the updates, stores them in the database, and then pushes the changes to the other connected clients.
The first 2 ideas you mentioned will have maintenance, scalability, and design ugliness issues.
The last 2 are a lot better in my opinion, but I still stick to web services as being the best.
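If you do go with the timer approach, your second thought (a separate table holding a last-modified value) can stay very small. Here is a rough sketch using SQLAlchemy Core; the app_meta table, its columns, and the polling interval are made up for illustration:

```python
import threading
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@dbhost/appdb")
last_seen = {"version": 0}

def bump_version(conn):
    # Call this inside the same transaction as any write that should
    # trigger a client refresh.
    conn.execute(text("UPDATE app_meta SET version = version + 1 WHERE id = 1"))

def poll_for_changes(refresh_ui, interval=10):
    # Ask the DB whether anything changed since we last looked.
    with engine.connect() as conn:
        version = conn.execute(text("SELECT version FROM app_meta WHERE id = 1")).scalar()
    if version != last_seen["version"]:
        last_seen["version"] = version
        refresh_ui()
    # Re-arm the timer so the check repeats every `interval` seconds.
    threading.Timer(interval, poll_for_changes, args=(refresh_ui, interval)).start()
```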
I was wondering what the 'best' way of passing data between views is. Is it better to create invisible fields and pass it using POST, or should I encode it in my URLs? Or is there a better/easier way of doing this? Sorry if this question is stupid, I'm pretty new to web programming :)
Thanks
There are different ways to pass data between views. Actually, this is not much different from the problem of passing data between two different scripts, and of course some concepts of inter-process communication come in as well. Some things that come to mind are:
GET request - first request hits view1 -> sends data to the browser -> browser redirects to view2
POST request - (as you suggested) same flow as above, but suitable when more data is involved
Django session variables - this is the simplest to implement (see the sketch after this list)
Client-side cookies - can be used, but there are limitations on how much data can be stored
Shared memory at the web server level - tricky, but can be done
REST APIs - if you can have a stand-alone server, then that server can expose REST APIs to invoke views
Message queues - again, if a stand-alone server is possible, maybe even message queues would work, i.e. the first view (API) takes requests and pushes them onto a queue, and some other process can pop messages off and hit your second view (another API). This would decouple the first and second view APIs and possibly manage load better.
Cache - maybe a cache like memcached can act as a mediator. But if one is going this route, it's better to use Django sessions, as they hide a whole lot of implementation details; if scale is a concern, memcached or Redis are good options.
Persistent storage - store data in some persistent storage mechanism like MySQL. This decouples your request-taking part (probably a client-facing API) from the processing part by having a DB in the middle.
NoSQL storage - if write speeds are on the order of hundreds of thousands per second, then MySQL performance would become a bottleneck (there are ways around this by tweaking the MySQL config, but it's not easy). Then NoSQL DBs could be an alternative, e.g. DynamoDB, Redis, HBase, etc.
Stream processing - something like Storm or AWS Kinesis could be an option if your use case is real-time computation. In fact, you could use AWS Lambda in the middle as a serverless compute module which would read off the stream and call your second view API.
Write data into a file - then the next view can read from that file (really ugly). This should probably never be done; it's listed here only as something to avoid.
Can't think of any more; will update if I do. Hope this helps in some way.
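To illustrate the session-variable option, a minimal Django sketch (the view names, URL name, and session key are hypothetical):

```python
# views.py -- passing data from one view to the next via the session
from django.shortcuts import redirect, render

def first_view(request):
    # Stash whatever the second view needs in the session.
    request.session["selected_ids"] = [3, 7, 11]
    return redirect("second-view")

def second_view(request):
    selected_ids = request.session.get("selected_ids", [])
    return render(request, "results.html", {"selected_ids": selected_ids})
```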
We are doing a portal technology assessment in which we will be creating a placement portal for campuses and industry to help place students. The portal will handle large volumes of data and people logging in, approximately 1,000 concurrent users per day.
What technology should I use: PHP with CakePHP as a framework, Ruby on Rails, ASP.NET, or Python? Or should I opt for cloud computing? Which of those is the most cost-effective?
Any of those will do; it really depends on what you know. If you're comfortable with Python, use Django. If you like Ruby, go with RoR. These modern frameworks are built to scale; assuming you're not going to be developing something on the scale of Facebook, they should suffice.
I personally recommend nginx as your main server to host static content and possibly reverse-proxy to Django/mod_wsgi/Apache2.
Another important aspect is caching: make sure to use something like memcached, and make sure the framework has some sort of plugin for it or that it's easily attachable.
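In Django, for example, hooking up memcached is a few lines of settings plus a per-view decorator. A minimal sketch (the backend shown is the pymemcache one available in recent Django versions, and the view name is made up):

```python
# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# views.py
from django.views.decorators.cache import cache_page

@cache_page(60 * 5)  # cache the rendered dashboard for five minutes
def dashboard(request):
    ...
```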
Language choice is important: choose the language that you and your team feel most comfortable with, since you will be developing a mid-to-large-size application. Of course, use a framework: with Python it should be Django; with ASP.NET, plain .NET or MVC.NET, whichever you feel better with; with Ruby, RoR; and with PHP there is a very large number of frameworks.
1,000 concurrent users is not that much, and it especially depends on what the users will do. Places where users fetch large amounts of data are better cached, with whatever caching engine you want. You need to design the application so you can easily swap between real DB calls and calls to the cache (see the sketch below). For that, use data objects; for logins, for example, create an object array if you need it. Save some information in cookies when the user logs in, for example their last login, password (in case they want to change it), email, and such, so you will make fewer calls to the DB in read mode (SELECT queries).
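A rough read-through helper to show the swap between cache and DB, expressed with Django's cache API (the key format and the Student model are made up):

```python
from django.core.cache import cache

from placement.models import Student  # hypothetical model


def get_student_profile(student_id, ttl=300):
    key = "student_profile:%s" % student_id
    profile = cache.get(key)
    if profile is None:
        profile = Student.objects.get(pk=student_id)  # the real DB call
        cache.set(key, profile, ttl)
    return profile
```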
Use a cookieless domain for static content like images, JS, and CSS files. Set up on this domain the fastest system you can with the simplest server you can, probably something Linux-based.
For servers, the best advice is either to get a large machine and set up virtual machines on it with VMware or another solution, or to get a few servers, which is better because if the one big server goes down you lose everything, whereas if one of several goes down you can still do some things. Especially if you set up a 'railroad' mode. Railroad mode is simple: you set up the application server (IIS or Apache) on one machine and make it the master, while you set up SQL on the same machine as a slave. On the other machine you set up SQL as the master and the application server as the slave. So one server primarily serves IIS/Apache and the other SQL; if one goes down you just need to change a line in the hosts file to point things somewhere else (I don't know how to do that in Linux).
The last server is for static content.
Cloud computing you will use whether you want it or not. You will share resources with some applications, such as Google's hosted jQuery and jQuery UI, for instance, but you are creating a unique application, and I don't believe basing the core of the application on cloud computing will do any good. Use large sites' CDNs where they help.