SQLAlchemy identity map question - Python

The identity map and unit of work patterns are part of the reason SQLAlchemy is much more attractive than django.db. However, I am not sure how the identity map would work, or whether it works at all, when an application is deployed as WSGI and the ORM is accessed directly through API calls instead of through a shared service. I would imagine that Apache would create a new thread with its own Python instance for each request. Each instance would therefore have its own instances of the SQLAlchemy classes and would not be able to make use of the identity map. Is this correct?

I think you misunderstood the identity map pattern.
From http://martinfowler.com/eaaCatalog/identityMap.html:
An Identity Map keeps a record of all objects that have been read from the database in a single business transaction.
Records are kept in the identity map for a single business transaction. This means that no matter how your web server is configured, you probably will not hold them for longer than a request (unless you store them in the session).
Normally, you will not have many users taking part in a single business transaction. Anyway, you probably don't want your users to share objects, as they might end up doing things that are contradictory.

So this all depends on how you set up your SQLAlchemy connection. Normally what you do is give each WSGI request its own thread-local session. That session knows about everything that happens within it (items added, changed, etc.), but each thread is not aware of the others. This way the loading and configuring of the models and mappings is shared at startup time, while each request operates independently of the others.
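A minimal sketch of that pattern using SQLAlchemy's scoped_session, which is the helper SQLAlchemy provides for exactly this thread-local setup (the database URL is a placeholder):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# Created once at application startup and shared by all threads.
engine = create_engine('sqlite:///example.db')  # placeholder URL
Session = scoped_session(sessionmaker(bind=engine))

def handle_request():
    session = Session()  # returns this thread's own session
    # ... query and modify objects; the identity map lives in this session
    Session.remove()  # tear the session down at the end of the request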

Related

How can I have an atomic block of code that doesn't access the database in Django?

I am working on a project that uses Django and Django REST Framework. In one of the views there's a method F() that does the following:
Fetches data from the database (read operation)
Sends a create (POST) request to a 3rd party API. (although not local, this is a write operation and this is where a race condition might take place)
Returns JSON data
I'd like F() to be atomic; in other words, if the server receives multiple requests at the same time asking for this view, it should handle one request at a time and not allow multiple threads to execute this block of code simultaneously. How can this be achieved? I have read that Django provides transaction.atomic(), but this only guarantees atomicity of database transactions; what I need is atomicity for a whole block of code, regardless of whether it accesses the database or not.
The concept you are looking for is a "mutex" or a "lock". This article may guide you in the right direction: https://lincolnloop.com/blog/distributed-locking-django/
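To make that concrete, here is a rough sketch of a cache-based lock in the spirit of that article, using Django's low-level cache API; the key name, timeout, and the two helper functions are illustrative stand-ins:

import time
from django.core.cache import cache

LOCK_KEY = 'create-remote-resource-lock'  # illustrative key name

def f_serialized(request_data):
    # cache.add is atomic on backends like memcached and Redis:
    # it succeeds only if the key does not already exist.
    while not cache.add(LOCK_KEY, 'locked', timeout=30):
        time.sleep(0.1)  # another request holds the lock; wait
    try:
        data = read_from_database()  # hypothetical read helper
        post_to_third_party_api(data)  # hypothetical write helper
        return data
    finally:
        cache.delete(LOCK_KEY)  # always release the lock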

Do I need to use SQLAlchemy sessions?

The official tutorial for SQLAlchemy provides examples that make use of the session system, such as the following:
>>> from sqlalchemy.orm import sessionmaker
>>> Session = sessionmaker(bind=engine)
Many unofficial tutorials also make use of sessions; however, some don't make use of them at all, instead opting for whatever one would call this approach:
from sqlalchemy import create_engine

e = create_engine('sqlite:///company.db')
conn = e.connect()
query = conn.execute("SELECT first_name FROM employee")
Why are sessions needed at all when this much simpler system seems to do the same thing? The official documentation doesn't make it clear why this would be necessary, as far as I've seen.
There is one particularly relevant section in the official SQLAlchemy documentation:
A web application is the easiest case because such an application is already constructed around a single, consistent scope - this is the request, which represents an incoming request from a browser, the processing of that request to formulate a response, and finally the delivery of that response back to the client. Integrating web applications with the Session is then the straightforward task of linking the scope of the Session to that of the request. The Session can be established as the request begins, or using a lazy initialization pattern which establishes one as soon as it is needed. The request then proceeds, with some system in place where application logic can access the current Session in a manner associated with how the actual request object is accessed. As the request ends, the Session is torn down as well, usually through the usage of event hooks provided by the web framework. The transaction used by the Session may also be committed at this point, or alternatively the application may opt for an explicit commit pattern, only committing for those requests where one is warranted, but still always tearing down the Session unconditionally at the end.
...and...
Some web frameworks include infrastructure to assist in the task of
aligning the lifespan of a Session with that of a web request. This
includes products such as Flask-SQLAlchemy, for usage in conjunction
with the Flask web framework, and Zope-SQLAlchemy, typically used with
the Pyramid framework. SQLAlchemy recommends that these products be
used as available.
Unfortunately I still can't tell whether I need to use sessions, or if the final paragraph is implying that certain implementations, such as Flask-SQLAlchemy, are already managing sessions automatically.
Do I need to use sessions? Is there a significant risk to not using sessions? Am I already using sessions because I'm using Flask-SQLAlchemy?
Like you pointed out, sessions are not strictly necessary if you only construct and execute queries using plain SQLAlchemy Core. However, they provide the higher layer of abstraction required to take advantage of the SQLAlchemy ORM. A Session maintains a graph of modified models and makes sure the changes are efficiently and consistently flushed to the database when necessary.
Since you already use Flask-SQLAlchemy, I don't see a reason to avoid sessions even if you don't need the ORM features. The extension handles all the plumbing necessary for isolating requests, so that you don't have to reinvent the wheel and can focus on your application code.
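For reference, here is a minimal sketch of what Flask-SQLAlchemy usage looks like; the model and database URI are placeholders, and db.session is the request-scoped session the extension manages for you:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///company.db'  # placeholder
db = SQLAlchemy(app)

class Employee(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    first_name = db.Column(db.String(50))

with app.app_context():
    db.create_all()  # create tables for this sketch

@app.route('/employees', methods=['POST'])
def add_employee():
    # db.session is scoped to the current request by the extension
    db.session.add(Employee(first_name='Alice'))
    db.session.commit()
    return 'ok'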
You need sessions if you want to use ORM features (see the sketch after this list), including:
Changing an attribute on an object returned by a query and having it easily written back to the database
Creating an object using a normal Python constructor and having it easily sent to the database
Having relationships on objects, so that, for example, if you had a blog with a Post object, you could write post.author to access the User object responsible for that post
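Here is a minimal sketch of those three features together; the Post and User models and the blog.db URL are illustrative:

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    title = Column(String)
    author_id = Column(Integer, ForeignKey('users.id'))
    author = relationship('User')  # post.author returns the User object

engine = create_engine('sqlite:///blog.db')  # illustrative URL
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Create objects with normal constructors and persist them.
    session.add(Post(title='Hello', author=User(name='alice')))
    session.commit()

    # Change an attribute; the session writes it back on commit.
    post = session.query(Post).first()
    post.title = 'Hello, world'
    session.commit()

    print(post.author.name)  # relationship traversal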
I'll note also that even if you use Session, you don't actually need sessionmaker. In a web application, you probably want it, but you can use sessions like this if you're looking for simplicity:
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
engine = create_engine(...)
session = Session(bind=engine)
Again, I think you probably want sessionmaker in your app, but you don't need it for sessions to work.

Django and Rails with one common DB

I have earlier worked with Java+Spring to create a web app.
I have to build a new web app now.
It will have one centralized DB.
There will be two different types of web app instances.
Web-App 1:
a) It would have no UI to render: no HTML, JS, etc.
b) All it needs to provide is a set of REST APIs which will:
b.1) create some new entries in the DB
b.2) modify some entries in the DB
b.3) retrieve some DB records in JSON format.
Some frontend code (which doesn't belong to this app) will periodically fetch these details.
c) It will be used by at most 100,000 people, but at any given point in time we can expect about 1,000 users logged in and doing what is described in b).
Web-App 2:
a) It will have some dashboards
b) 90% of DB operations would be read operations
c) 10% of DB operations would be write/modify
d) There will be thousands of users of this system, and at any given point in time hardly 50-1,000 people will be accessing it.
I am thinking of the following:
Build Web-App 1 in Python+Django and Web-App 2 in RoR.
I am planning to use DynamoDB and memcached.
Why two different frameworks?
1) So that I get to learn both of them
2) There have been concerns about scalability in RoR (and I also know people claim those concerns are unfounded), and Web-App 1 may have scaling needs in the future.
My question is: do you see any problem with this combination? For example, ActiveRecord expects specific naming conventions for your database tables. Are there any other concerns similar to this?
Has anyone else used a similar technology stack?
Both frameworks are full-stack frameworks and provide MVC, templating, unit testing, security, DB migrations, caching, and ORMs.
For my startup, we also needed to put out a full-fledged website along with an API. We are also using DynamoDB for storing most of the data and are only using MySQL for session info.
I opted to use Ruby on Rails for the webapp and Sinatra for the API. If your criterion is simply learning as many new things as possible, then it would make sense to opt for relatively different stacks (Django/Python and RoR). In our case, we went with Sinatra because it's essentially a very lightweight wrapper around Rack and perfect for an API which essentially receives requests, calls one or more services or does some processing, and hands out a formatted response. While I don't see any problem with using Python/Django instead of Sinatra, in our case the benefit was spending less time working in a different language.
Also, scalability in rails is a bit of an iffy subject. In the end, it's about how you use it. We've had no issues scaling rails with unicorn and nginx. Our business logic is all in the API service and the rails server as well uses the API for most of the work. This means we don't use active record on rails and the website is just another consumer for our API which does all the heavy lifting whether the request comes from an app or the website. Using MySQL for the session store ensures we can route requests to any of the application servers without having to worry about always routing requests from the same client to the same server every time. This allows us to ramp up and down easily only considering the amount of traffic we're getting.
At the time we started working on this, there wasn't an ORM for DynamoDB which looked and felt just like ActiveRecord, so we ended up writing a few high-level classes of our own to handle storage and retrieval of models on DynamoDB. Considering DynamoDB is not tailored for scans or joins, this didn't take a lot of effort, since we were almost always doing lookups based on keys and ranges. This meant we didn't really need a replacement for ActiveRecord, since the real strength of ActiveRecord is being able to intuitively do joins, etc. by convention.
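As a rough illustration of such a thin layer using today's boto3 API (boto3 didn't exist back then, and the table name and key schema here are assumptions):

import boto3

# Assumed table 'posts' with hash key 'id'; adjust to your schema.
table = boto3.resource('dynamodb').Table('posts')

class PostStore:
    # Thin storage layer: key-based lookups only, no scans or joins.

    @staticmethod
    def get(post_id):
        # Returns the item as a plain dict, or None if not found.
        return table.get_item(Key={'id': post_id}).get('Item')

    @staticmethod
    def save(post):
        table.put_item(Item=post)  # post is a dict of attributes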
DynamoDB does have its limitations though, and you might find yourself in situations where you will need to scan a large number of records. In our case, we also use CloudSearch to index some important info and use it as a fallback for cases when we need to do text-based searches that would otherwise scan all our data.

Standard way to handle user sessions in Tornado

So, in order to avoid the "no one best answer" problem, I'm going to ask, not for the best way, but the standard or most common way to handle sessions when using the Tornado framework. That is, if we're not using 3rd party authentication (OAuth, etc.), but rather we want to have our own Users table with secure cookies in the browser but most of the session info stored on the server, what is the most common way of doing this? I have seen some people using Redis, some people using their normal database (MySQL or Postgres or whatever), some people using memcached.
The application I'm working on won't have millions of users at a time, or probably even thousands. It will need to eventually get some moderately complex authorization scheme, though. What I'm looking for is to make sure we don't do something "weird" that goes down a different path than the general Tornado community, since authentication and authorization, while something we need, aren't at the core of our product and so aren't where we should be differentiating ourselves. So, we're looking for what most people (who use Tornado) are doing in this respect, hence I think it's a question with (in theory) an objectively true answer.
The ideal answer would point to example code, of course.
Here's how it seems other micro frameworks handle sessions (CherryPy, Flask for example):
Create a table holding session_id and whatever other fields you'll want to track on a per session basis. Some frameworks will allow you to just store this info in a file on a per user basis, or will just store things directly in memory. If your application is small enough, you may consider those options as well, but a database should be simpler to implement on your own.
When a request is received (in the RequestHandler initialize() function, I think?) and there is no session_id cookie, set a secure session_id using a random generator. I don't have much experience with Tornado, but it looks like setting a secure cookie should be useful for this. Store that session_id and associated info in your session table. Note that EVERY user will have a session, even those not logged in. When a user logs in, you'll want to attach their status as logged in (and their username/user_id, etc.) to their session.
In your RequestHandler initialize function, if there is a session_id cookie, read in what ever session info you need from the DB and perhaps create your own Session object to populate and store as a member variable of that request handler.
Keep in mind sessions should expire after a certain amount of inactivity, so you'll want to check for that as well. If you want a "remember me" type of login, you'll have to use a secure cookie to signal that (read up on this at OWASP to make sure it's as secure as possible, though again it looks like Tornado's secure cookies might help with that), and upon receiving a timed-out session you can re-authenticate the user by creating a new session and transferring whatever associated info into it from the old one.
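A rough sketch of that flow using Tornado's secure cookies; the in-memory SESSIONS dict is a stand-in for a real session table, and prepare() runs before each request (a good spot for the lookup described above):

import uuid
import tornado.web

SESSIONS = {}  # stand-in for a real session store (MySQL, Redis, ...)

class BaseHandler(tornado.web.RequestHandler):
    def prepare(self):
        # Find or create the session before each request is handled.
        session_id = self.get_secure_cookie('session_id')
        if session_id is None:
            session_id = uuid.uuid4().hex.encode()
            self.set_secure_cookie('session_id', session_id)
        self.session = SESSIONS.setdefault(session_id, {})

class MainHandler(BaseHandler):
    def get(self):
        self.session['hits'] = self.session.get('hits', 0) + 1
        self.write('hits this session: %d' % self.session['hits'])

app = tornado.web.Application(
    [(r'/', MainHandler)],
    cookie_secret='CHANGE-ME-TO-A-LONG-RANDOM-VALUE',
)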
Tornado is designed to be stateless and doesn't have session support out of the box.
Use secure cookies to store sensitive information like user_id.
Use standard cookies to store non-critical information.
For storing large objects, use the standard scheme: MySQL + memcached.
The key issue with sessions is not where to store them, but how to expire them intelligently. Regardless of where sessions are stored, as long as the number of stored sessions is reasonable (i.e. only active sessions plus some surplus are stored), all this data is going to fit in RAM and be served fast. If there is a lot of old junk, you may expect unpredictable delays (the need to hit the disk to load the session).
There isn't anything built directly into Tornado for this purpose. As others have commented already, Tornado is designed to be a very fast async framework. It is lean by design. However, it is possible to hook in your own session management capability. You need to add a preamble section to each handler that would create or grab a session container. You will need to store the session ID in a cookie. If you are not strictly HTTPS then you will want to use a secure cookie. The session persistence can be any technology of your choosing such as Redis, Postgres, MySQL, a file store, etc...
There is a GitHub project that provides session management for Tornado. Even if you decide not to use it, it can provide insight into how to structure your own session management. The GitHub project is called dustdevil. Full disclosure - we created this several years ago, but we find it very easy to use and have it in active use today.

How to make a process-wide editable variable (storage) in Django?

I want to create project-wide accessible storage for project/application settings.
What I want to achieve:
- Each app has its own app-specific settings stored in the DB
- When you spawn a Django WSGI process, all settings are loaded into an in-memory storage and are available project-wide
- Whenever you change a setting value in the DB, there is a call to regenerate the storage from the DB
So it works very much like a cache, but I can't use the cache mechanism because it serializes data. I could also use memcached for that purpose, but I want to develop a generic solution (you don't always have access to memcached).
If anyone has any ideas on how to solve my problem, I would be really grateful.
Before I give any specific advice, you need to be aware of the limitations of these systems.
ISSUES
Architectural differences between Django and PHP/other popular languages.
PHP re-reads and re-evaluates the entire code tree every time a page is accessed. This allows PHP to re-read settings from DB/cache/whatever on every request.
Initialisation
Many Django apps initialise their own internal cache of settings and parameters and perform validation on these parameters at module import time (server start) rather than on every request. This means you would need a server restart anyway when modifying any settings for apps that aren't DB-settings enabled, so why not just change settings.py and restart the server?
Multi-site/Multi-instance and in-memory settings
In-memory changes are strictly discouraged by the Django docs because they will only affect a single server instance. In the case of multiple sites (contrib.sites), only a single site will receive updates to shared settings. In the case of instanced servers (ep.io/gondor), any changes will only be made to a local instance, not every instance running for your site. Tread carefully.
PERFORMANCE
In Memory Changes
Changing settings values while the server is running is strictly discouraged by the Django docs for the reasons outlined above. However, there is no performance hit with this option. USE ONLY WITHIN THE CONFINES OF SPECIFIC APPS MADE BY YOU, AND ONLY ON A SINGLE SERVER/INSTANCE.
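As an illustration of that single-process option, a module-level registry that is rebuilt on demand (the AppSetting model is hypothetical):

# settings_registry.py -- module-level, so one copy per server process
_SETTINGS = {}

def reload_settings():
    # Rebuild the in-memory store from the database.
    from myapp.models import AppSetting  # hypothetical model
    _SETTINGS.clear()
    _SETTINGS.update(AppSetting.objects.values_list('key', 'value'))

def get_setting(key, default=None):
    if not _SETTINGS:
        reload_settings()
    return _SETTINGS.get(key, default)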
Cache (redis-cache/memcached)
This is the intermediate-speed option. Reasonably fast lookup of settings, which can be deserialized into complex Python structures easily - great for dict-configs. IMPORTANT: values are shared among sites/instances safely and are updated atomically.
DB (SLOOOOW)
Grabbing one setting at a time from the DB will be very, very slow unless you hack in connection pooling. Grabbing all settings at once is faster, but increases DB transfer on every single request. Settings are synced between sites/instances safely. This is only reasonable for 1 or 2 settings.
CONCLUSION
Storing configuration values in the database/cache/memory can be done, but you have to be aware of the caveats and the potential performance hit. Creating a generic replacement for settings.py will not work, but creating a support app that stores settings for other apps could be a viable solution, as long as those apps accept that settings must be reloaded on every request.
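The support-app idea might look roughly like this, with the cache absorbing most reads and being invalidated on writes; the model and key names are illustrative:

from django.core.cache import cache
from django.db import models

class AppSetting(models.Model):
    key = models.CharField(max_length=100, unique=True)
    value = models.TextField()

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        cache.delete('appsetting:%s' % self.key)  # invalidate on change

def get_setting(key, default=None):
    cache_key = 'appsetting:%s' % key
    value = cache.get(cache_key)
    if value is None:
        try:
            value = AppSetting.objects.get(key=key).value
        except AppSetting.DoesNotExist:
            return default
        cache.set(cache_key, value, timeout=300)
    return value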
