DB structure for Cloud based billing software, planning issues [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
We are building cloud-based billing software. It is web-based and should function at least as well as desktop software. We will have 5000+ users billing at the same time. For now, we have only 250 users, and we need to start scaling. We use Angular for the frontend, Python for the backend, and React Native for the mobile app. PostgreSQL is our database. I have a few doubts to clarify before we scale.
Will using PostgreSQL as the DB cause any issues in the future?
Instead of integer primary keys, we are using UUIDs (for easy data migrations, though they use more space). Will that introduce any problems in the future?
Do we have to consider any DB methods for this kind of scaling? (We currently use a single DB for all users.)
We are planning to use one server with a huge spec (for all users). Will that be good enough, or do we have to plan for anything else?
Do we need a separate application server and DB server for this scenario?

I'll try to answer the questions. Feel free to judge it.
So, you are building cloud-based billing software. You now have 250+ users and expect to have at least 5000 users in the future.
Now answering the questions you asked:
Will using PostgreSQL as the DB cause any issues in the future?
ans: PostgreSQL is great for production. It is the safe way to go. It shouldn't cause any issues in the future, but that depends heavily on the DB design.
Instead of integer primary keys, we are using UUIDs (for easy data migrations, though they use more space). Will that introduce any problems in the future?
ans: Using UUIDs has its own advantages and disadvantages: they make it easy to merge or migrate data between databases, but each key takes 16 bytes instead of the 4 or 8 bytes of an integer. If you think scaling is a problem, then you should consider updating your revenue model.
Do we have to consider any DB methods for this kind of scaling? (We currently use a single DB for all users.)
ans: A single DB for a production app is good at the initial stage. When scaling, especially in the case of 5000 concurrent users, it is good to think about moving to microservices.
We are planning to use one server with a huge spec (for all users). Will that be good enough, or do we have to plan for anything else?
ans: Like I said, 5k concurrent users will require a mighty server (this depends heavily on the operations, though; I'm assuming moderate to heavy calculations), so it's recommended to plan for a microservices architecture. That way you can scale up heavily used services and scale down the others. But keep in mind that microservices may sound great, but in practice they are a pain to set up. If you have a strong backend team, you can proceed with this idea; otherwise just don't.
Do we need a separate application server and DB server for this scenario?
ans: The short answer is yes. The long answer: why would you want to stress a single server machine when you have that many users?
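
To put a rough number on the "UUIDs use more space" concern, here is a minimal back-of-the-envelope sketch. It relies on PostgreSQL storing a uuid value in 16 bytes and a bigint in 8. The row count and the number of key columns are made-up illustration values, and index and row overhead are ignored, so treat the result as a lower bound rather than a measurement.

```python
import uuid

# Raw size of one key value in bytes.
UUID_BYTES = len(uuid.uuid4().bytes)  # a UUID is 128 bits
BIGINT_BYTES = 8                      # PostgreSQL bigint


def extra_key_bytes(row_count, key_columns=1):
    """Extra raw key storage from using UUID keys instead of bigint keys.

    Ignores index pages, alignment padding, and row headers.
    """
    return row_count * key_columns * (UUID_BYTES - BIGINT_BYTES)


if __name__ == "__main__":
    # Hypothetical table: 10 million invoice lines, each with a primary key
    # and two foreign keys stored as UUIDs.
    extra = extra_key_bytes(10_000_000, key_columns=3)
    print(f"extra key storage: {extra / 1_000_000:.0f} MB")
```

At that scale the raw overhead is modest; whether it matters in practice depends mostly on index size and insert patterns, which this sketch does not model.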

Best practice for deploying machine learning web app with Flask [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Hoping you can help. I’m creating a web app with Python and Flask. One of the things that my web app will do is provide a smart document search. You can enter text and it will fetch documents similar to the portion of text you entered.
I’ve used Flask for the front end to serve the HTML, manage any DB interactions required and display results. It will pass the query through to a Gensim similarity model and query it.
My question here is what is the best way to host these? I’ve explored loading the model as part of loading flask but it slows things down quite a lot (it’s c. 6gb in memory) but it works. I can then query the model quite easily as it’s all within the same program scope.
My concern is that this would then not be scalable and possibly not best practice and that I may be better to host the model separately and make API calls to it from my Flask web app.
Thoughts and views would be much appreciated.
Thanks,
Pete

Your thoughts are definitely on the right track.
Yes, you should separate the hosting of the model from your web app. Your suggestion of an API is a good one. Even if in the beginning it is all hosted on one machine, it is still worth doing this separation.
Once you are hosting this separately via an API, then as your web app gains more users, it becomes easy to scale the model API, whether by launching more instances and balancing requests or, depending on requirements, by adding scalability and robustness via messaging, like RabbitMQ, or a mix of the two.
For example, some systems that access extremely large datasets return a response via email to let you know your answer is ready to download or view. In this case, you may host one instance of the model and put requests in a queue to answer one by one.
If you need very fast responses from your model, then you are likely to scale via more instances and balancing.
Both options above can be rolled out yourself, using open source solutions, or you can go straight to managed services in the cloud that will auto scale via either of these methods.
If you are just producing this project yourself with no funding, then you most likely do not want to start by using managed services in the cloud, as these will auto scale your bank account in the wrong direction.
The above solutions allow you to make changes, update the model, even use a different one, and release it on its own as long as it still conforms to the API.
Separation of boundaries in data, and responsibilities in behaviour are important in having a scalable and maintainable architecture.
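
As a rough illustration of the "one model instance answering queued requests one by one" option, here is a minimal sketch using only the Python standard library. The `fake_similarity_model` function is a hypothetical stand-in for the real Gensim model, and the in-process `queue.Queue` stands in for a real broker such as RabbitMQ, so this shows the shape of the idea rather than a deployable setup.

```python
import queue
import threading


def fake_similarity_model(text):
    """Hypothetical stand-in for the real model: returns the query's unique words, sorted."""
    return sorted(set(text.lower().split()))


def model_worker(requests, results):
    """Answer queued requests one at a time using the single model instance."""
    while True:
        item = requests.get()
        if item is None:  # sentinel value: no more work, shut down
            break
        request_id, text = item
        results[request_id] = fake_similarity_model(text)


def run_queries(texts):
    """Queue every query, let the single worker drain the queue, return all results."""
    requests = queue.Queue()
    results = {}
    worker = threading.Thread(target=model_worker, args=(requests, results))
    worker.start()
    for request_id, text in enumerate(texts):
        requests.put((request_id, text))
    requests.put(None)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_queries(["find similar documents", "another query"]))
```

The web app only ever touches the queue, so the worker could later be moved to another process or machine, or replaced by several workers, without changing the code that submits requests.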

Django and Rails with one common DB

I have previously worked with Java+Spring to create a web app.
I have to build a new web app now.
It will have one centralized DB.
There will be two different types of web-app instance.
Web-App 1:
a) It will have no UI to render: no HTML, JS, etc.
b) All it needs to provide is a set of REST APIs which will
b.1) create some new entries in the DB
b.2) modify some entries in the DB
b.3) retrieve some DB records in JSON format.
Some frontend code (which doesn't belong to this app) will periodically fetch
these details.
c) It will be used by at most 100,000 people, but at any given point in time
we can expect about 1000 users logged in and doing what is described in b).
Web-App 2:
a) It will have some dashboards
b) 90% of DB operations will be reads
c) 10% of DB operations will be writes/modifications
d) There will be thousands of users of this system, and at any given point in time
only about 50-1000 people will be accessing it.
I am thinking of the following.
Have Web-App 1 created in Python+Django and Web-App 2 created in RoR.
I am planning to use DynamoDB and memcache.
Why two different frameworks?
1) So that I get to learn both of them
2) There have been concerns about scalability in RoR (and I also know people claim the problem isn't there); Web-App 1 may have scaling needs in the future.
My question is: do you see any problem with this combination?
For example, would Active Record want you to use a specific naming format for your database tables? Are there any other concerns similar to this?
Has anyone else used a similar technology stack?
Both frameworks are full-stack frameworks and provide MVC, templating, unit testing, security, DB migration, caching, and ORMs.

For my startup, we also needed to put out a fully fledged website along with an API. We are also using DynamoDB for storing most of the data and are only using MySQL for session info.
I opted to use Ruby on Rails for the web app and Sinatra for the API. If your criterion is simply learning as many new things as possible, then it would make sense to opt for relatively different stacks (Django/Python and RoR). In our case, we went with Sinatra because it's essentially a very lightweight wrapper around Rack and perfect for an API which essentially receives requests, calls one or more services or does some processing, and hands out a formatted response. While I don't see any problem with using Python/Django instead of Sinatra, in our case the benefit was having to spend less time working with a different language.
Also, scalability in rails is a bit of an iffy subject. In the end, it's about how you use it. We've had no issues scaling rails with unicorn and nginx. Our business logic is all in the API service and the rails server as well uses the API for most of the work. This means we don't use active record on rails and the website is just another consumer for our API which does all the heavy lifting whether the request comes from an app or the website. Using MySQL for the session store ensures we can route requests to any of the application servers without having to worry about always routing requests from the same client to the same server every time. This allows us to ramp up and down easily only considering the amount of traffic we're getting.
At the time we started working on this, there wasn't an ORM for dynamo db which looked and felt just like active record, so we ended up writing a few high level classes of our own to handle storage and retrieval of models on DynamoDb. Considering DynamoDB is not tailored for scans or joins, this didn't take a lot of effort since we were almost always doing lookups based on keys and ranges. This meant we didn't really need a replacement for active record since the real strength of active record is being able to intuitively do joins, etc. by convention.
DynamoDB does have its limitations, though, and you might find yourself in situations where you need to scan a large number of records. In our case, we also use CloudSearch to index some important info and use it as a fallback for cases when we need to do text-based searches which would otherwise have to scan all our data.
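
To make the "lookups based on keys and ranges" access pattern concrete, here is a small in-memory sketch. It is not the DynamoDB API and not the classes described above; `KeyRangeStore` and its method names are hypothetical, and it only mimics the idea of addressing items by a hash key plus a sortable range key.

```python
import bisect
from collections import defaultdict


class KeyRangeStore:
    """In-memory stand-in for a table addressed by (hash key, range key)."""

    def __init__(self):
        # hash key -> list of (range key, item), kept sorted by range key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        """Insert an item, overwriting any item with the same two keys."""
        rows = self._partitions[hash_key]
        keys = [key for key, _ in rows]
        pos = bisect.bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)
        else:
            rows.insert(pos, (range_key, item))

    def get(self, hash_key, range_key):
        """Exact lookup by both keys; returns None when nothing matches."""
        rows = self._partitions.get(hash_key, [])
        keys = [key for key, _ in rows]
        pos = bisect.bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            return rows[pos][1]
        return None

    def query_between(self, hash_key, low, high):
        """All items in one partition whose range key lies in [low, high]."""
        rows = self._partitions.get(hash_key, [])
        keys = [key for key, _ in rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]
```

Every operation stays inside a single partition, which is why a thin wrapper like this can be enough when the application never needs joins or full scans.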

web2py or grok (zope) on a big portal,

I am planning a big project (1,000,000 users, approximately 500 requests per second at peak times).
For performance I'm going to use a non-relational DBMS (each request could cost a lot of instructions in a relational DBMS like MySQL), so I can't use the DAL.
My questions are:
How does web2py cope with heavy traffic? Does it handle requests concurrently? I'm considering web2py or Grok (Zope).
How does ZODB (Z Object Database) perform with a lot of data? Is there any comparison with object-relational PostgreSQL?
Could you advise me, please?

First, don't assume that a data abstraction layer will have unacceptable performance until you actually see it in practice. It is pretty easy to switch to raw SQL if and when you run into a problem.
Second, most users who worry about their server technology handling a million users never finish their applications. Pick whatever technology you think will enable you to build the best application in the shortest time. Any technology can be scaled, at the very least, through clustering.

I agree with mikerobi - pick what will let you develop fastest. For me that is web2py.
web2py runs on Google App Engine, so if you don't want to use a relational database then you can use Google's datastore.

Zope and the ZODB have been used with big applications, but I'd still consider linking Zope with MySQL or something like that for serious large-scale applications. Even though Zope has had a lot of development cycles, it is usually used with another database engine for good reason. As far as I know, the argument applies doubly for web2py.

Which Python client library should I use for CouchdB? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I'm starting to experiment with CouchDB because it looks like the perfect solution for certain problems we have. Given that all work will be on a brand new project with no legacy dependencies, which client library would you suggest that I use, and why?
This would be easier if there were any overlap on the OSes we use. FreeBSD only has py-simplecouchdb already available in its ports collection, but that library's project website says to use CouchDBKit instead. Neither of those comes with Ubuntu, which only ships with CouchDB. Since those two OSes don't have any libraries in common, I'll probably be installing something from source (and hopefully submitting packages to the Ubuntu and FreeBSD folks if I have time).
For those interested, I'd like to use CouchDB as a convenient intermediate storage place for data passed between various services - think of a message bus system but with less formality. For example, we have daemons that download and parse web pages, then send interesting bits to other daemons for further processing. A lot of those objects are ill-defined until runtime ("here's some HTML, plus a set of metadata, and some actions to run on it"). Rather than serialize it to an ad-hoc local network protocol or stick it in PostgreSQL, I'd much rather use something designed for the purpose. We're currently using NetWorkSpaces in this role, but it doesn't have nearly the breadth of support or the user community of CouchDB.

I have been using couchdb-python with quite a lot of success, and as far as I know the desktopcouch developers use it in Ubuntu. The prerequisites are very basic and you should have no problems:
httplib2
simplejson or cjson
Python
CouchDB 0.9.x (earlier or later versions are unlikely to work as the interface is still changing)
For me some of the advantages are:
Pythonic interface. You can work with the database as if it were a dict.
Interface for design documents.
a CouchDB view server that allows writing view functions in Python
It also provides a couple of command-line tools:
couchdb-dump: Writes a snapshot of a CouchDB database
couchdb-load: Reads a MIME multipart file as generated by couchdb-dump and loads all the documents, attachments, and design documents into a CouchDB database.
couchdb-replicate: Can be used as an update-notification script to trigger replication between databases when data is changed.

If you're still considering CouchDB then I'd recommend Couchdbkit (http://www.couchdbkit.org). It's simple enough to get the hang of quickly and runs fine on my machine running Karmic Koala. Before that I tried couchdb-python, but some bugs with httplib (maybe ironed out by now) were giving me errors (duplicate documents, etc.), whereas Couchdbkit got me up and running without any problems so far.

spycouch
Simple Python API for CouchDB
A Python library for easily managing CouchDB.
Compared to the libraries commonly available on the web, it works with the latest version of CouchDB, 1.2.1.
Functionality
Create a new database on the server
Deleting a database from the server
Listing databases on the server
Database information
Database compression
Create map view
Map view
Listing documents in DB
Get document from DB
Save document to DB
Delete document from DB
Editing of a document
spycouch on >> https://github.com/cernyjan/repository

Considering the task you are trying to solve (distributed task processing) you should consider using one of the many tools designed for message passing rather than using a database. See for instance this SO question on running multiple tasks over many machines.
If you really want a simple casual message passing system, I recommend you shift your focus to MorbidQ. As you get more serious, use RabbitMQ or ActiveMQ. This way you reduce the latency in your system and avoid having many clients polling a database (and thus hammering that computer).
I've found that avoiding databases is a good idea (that's my blog), and I have an end-to-end live data system running using MorbidQ here

I have written a couchdb client library built on python-requests (which is in most distributions). We use this library in production.
https://github.com/adamlofts/couchdb-requests
Robust CouchDB Python interface using python-requests.
Goals:
Only one way to do something
Fast and stable (connection pooled)
Explicit is better than implicit. Buffer sizes, connection pool size.
Specify query parameters, no **params in query functions

After skimming through the docs of many couchdb python libraries, my choice went to pycouchdb.
All I needed to know was very quick to grasp from the doc: https://py-couchdb.readthedocs.org/en/latest/ and it works like a charm.
Also, it works well with Python 3.

Feedback on using Google App Engine? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Looking to do a very small, quick 'n dirty side project. I like the fact that the Google App Engine is running on Python with Django built right in - gives me an excuse to try that platform... but my question is this:
Has anyone made use of the app engine for anything other than a toy problem? I see some good example apps out there, so I would assume this is good enough for the real deal, but wanted to get some feedback.
Any other success/failure notes would be great.

I have tried app engine for my small quake watch application
http://quakewatch.appspot.com/
My purpose was to see the capabilities of app engine, so here are the main points:
It doesn't come with Django by default. It has its own web framework, which is Pythonic, has a URL dispatcher like Django's, and uses Django templates.
So if you have Django experience, you will find it easy to use.
But you can use any pure-Python framework, and Django can be easily added; see
http://code.google.com/appengine/articles/django.html
The google-app-engine-django (http://code.google.com/p/google-app-engine-django/) project is excellent and works almost like working on a Django project.
You cannot execute any long-running process on the server. All you can do is reply to a request, and the reply should be quick, otherwise App Engine will kill it.
So if your app needs lots of backend processing, App Engine is not the best choice,
or else you will have to do the processing on a server of your own.
My quakewatch app has a subscription feature, which means I had to email the latest quakes as they happened, but I cannot run a background process in App Engine to monitor new quakes.
The solution here is to use a third-party service like pingablity.com, which can connect to one of your pages, which in turn executes the subscription emailer.
But here too you have to take care not to spend much time in the request,
or else break the task into several pieces.
It provides Django-like modeling capabilities, but the backend is totally different; for a new project that should not matter.
But overall I think it is excellent for creating apps which do not need a lot of background processing.
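
The "break the task into several pieces" advice can be sketched roughly as follows. This is plain Python, not App Engine code: `process_in_slices` is a hypothetical helper that works through a list until a time budget is used up and returns the index to resume from, so that each request (for example, each ping from the external service) handles only one slice.

```python
import time


def process_in_slices(items, handle, start=0, budget_seconds=5.0, clock=time.monotonic):
    """Process items from `start` until the time budget runs out.

    Returns the index to resume from on the next request, or None once
    every item has been handled.
    """
    deadline = clock() + budget_seconds
    index = start
    while index < len(items):
        if clock() >= deadline:
            return index  # out of time: the caller must store this position
        handle(items[index])
        index += 1
    return None


if __name__ == "__main__":
    subscribers = ["a@example.com", "b@example.com", "c@example.com"]
    resume_at = process_in_slices(subscribers, print, budget_seconds=5.0)
    print("resume at:", resume_at)
```

The resume position would have to be stored somewhere between requests, for example in the datastore; that part is left out here.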
Edit:
Now task queues can be used for running batch processing or scheduled tasks
Edit:
After creating and working on a real application on GAE for a year, my opinion now is that unless you are making an application which needs to scale to millions and millions of users, don't use GAE. Maintaining an app and doing trivial tasks in GAE is a headache due to its distributed nature: avoiding deadline-exceeded errors, counting entities, or doing complex queries all require complex code, so small but complex applications should stick to LAMP.
Edit:
Models should be designed with all the transactions you may want in the future in mind, because only entities in the same entity group can be used in a transaction, which makes updating two different groups a nightmare. For example, transferring money from user1 to user2 in a transaction is impossible unless they are in the same entity group, but putting them in the same entity group may not be best for frequent updates.
Read this: http://blog.notdot.net/2009/9/Distributed-Transactions-on-App-Engine

I am using GAE to host several high-traffic applications. Like on the order of 50-100 req/sec. It is great, I can't recommend it enough.
My previous experience with web development was with Ruby (Rails/Merb). Learning Python was easy. I didn't mess with Django or Pylons or any other framework, just started from the GAE examples and built what I needed out of the basic webapp libraries that are provided.
If you're used to the flexibility of SQL the datastore can take some getting used to. Nothing too traumatic! The biggest adjustment is moving away from JOINs. You have to shed the idea that normalizing is crucial.
Ben

One of the compelling reasons I have come across for using Google App Engine is its integration with Google Apps for your domain. Essentially it allows you to create custom, managed web applications that are restricted to the (controlled) logins of your domain.
Most of my experience with this code was building a simple time/task tracking application. The template engine was simple and yet made a multi-page application very approachable. The login/user awareness api is similarly useful. I was able to make a public page/private page paradigm without too much issue. (a user would log in to see the private pages. An anonymous user was only shown the public page.)
I was just getting into the datastore portion of the project when I got pulled away for "real work".
I was able to accomplish a lot (it still is not done yet) in a very little amount of time. Since I had never used Python before, this was particularly pleasant (both because it was a new language for me, and also because the development was still fast despite the new language). I ran into very little that led me to believe that I wouldn't be able to accomplish my task. Instead I have a fairly positive impression of the functionality and features.
That is my experience with it. Perhaps it doesn't represent more than an unfinished toy project, but it does represent an informed trial of the platform, and I hope that helps.

The "App Engine running Django" idea is a bit misleading. App Engine replaces the entire Django model layer so be prepared to spend some time getting acclimated with App Engine's datastore which requires a different way of modeling and thinking about data.

I used GAE to build http://www.muspy.com
It's a bit more than a toy project but not overly complex either. I still depend on a few issues to be addressed by Google, but overall developing the website was an enjoyable experience.
If you don't want to deal with hosting issues, server administration, etc, I can definitely recommend it. Especially if you already know Python and Django.

I think App Engine is pretty cool for small projects at this point. There's a lot to be said for never having to worry about hosting. The API also pushes you in the direction of building scalable apps, which is good practice.
app-engine-patch is a good layer between Django and App Engine, enabling the use of the auth app and more.
Google have promised an SLA and pricing model by the end of 2008.
Requests must complete in 10 seconds, sub-requests to web services required to complete in 5 seconds. This forces you to design a fast, lightweight application, off-loading serious processing to other platforms (e.g. a hosted service or an EC2 instance).
More languages are coming soon! Google won't say which though :-). My money's on Java next.

This question has been fully answered. Which is good.
But one thing perhaps is worth mentioning.
The google app engine has a plugin for the eclipse ide which is a joy to work with.
If you already do your development with eclipse you are going to be so happy about that.
To deploy on the google app engine's web site all I need to do is click one little button - with the airplane logo - super.

Take a look at the sql game; it is very stable and actually pushed traffic limits at one point, so that it was getting throttled by Google. I have seen nothing but good news about App Engine, other than that you are hosting your app on servers someone else controls completely.

I used GAE to build a simple application which accepts some parameters, formats them, and sends an email. It was extremely simple and fast. I also ran some performance benchmarks on the GAE datastore and memcache services (http://dbaspects.blogspot.com/2010/01/memcache-vs-datastore-on-google-app.html ). It is not that fast. My opinion is that GAE is a serious platform which enforces a certain methodology. I think it will evolve into a truly scalable platform, where bad practices are simply not allowed.

I used GAE for my flash gaming site, Bearded Games. GAE is a great platform. I used Django templates which are so much easier than the old days of PHP. It comes with a great admin panel, and gives you really good logs. The datastore is different than a database like MySQL, but it's much easier to work with. Building the site was easy and straightforward and they have lots of helpful advice on the site.

I used GAE and Django to build a Facebook application. I used http://code.google.com/p/app-engine-patch as my starting point as it has Django 1.1 support. I didn't try to use any of the manage.py commands because I assumed they wouldn't work, but I didn't even look into it. The application had three models and also used pyfacebook, but that was the extent of the complexity. I'm in the process of building a much more complicated application which I'm starting to blog about on http://brianyamabe.com.
