Database Management on Django and Github

I am trying to set up a website using the Django framework. Because of its convenience, I chose SQLite as my database at the start of my project. It's very easy to use and I was very happy with this solution.
Being a new developer, I am quite new to GitHub and database management. Since an SQLite database lives in a single file, I was able to push my updates to GitHub until that .db file grew past 100MB. Since then, the file has been too large to push to my repository (for others having the same problem, I found satisfying answers here: GIT: Unable to delete file from repo).
Because of this problem, I am now considering an alternative solution:
Since my website will require users to interact with my database (they are expected to post a certain amount of data), I am thinking about switching from SQLite to MySQL. I was told MySQL will handle user input better and will scale more easily (I dare to expect a large volume of users). This is the first part of my question: is switching to MySQL after having used SQLite for a while a good idea/good practice, or will it lead to migration problems?
If the answer to that first question is yes, then I have other questions about how to handle this change. Since SQLite is serverless, I will have to set up a new server for MySQL. Will I be able to access my data remotely with that server? Since I used to push my database to my GitHub repository, that is where I used to get my data from when I wanted to work remotely. Will there be a way for me to host my data on a server (hopefully for free) and fetch it the same way I fetch my code from GitHub?
Thank you very much for your help and I hope you have a nice day.

First of all, you shouldn't be uploading any sensitive data to your repository. That includes database passwords, Django's secret key or the database itself in the case of SQLite.
To answer your first question: there shouldn't be any problem switching from SQLite to MySQL. Django handles migrations exceptionally well, and SQLite has fewer features than MySQL. To migrate your data to a MySQL database you can use Django's dumpdata and loaddata management commands.
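As a rough sketch of the dumpdata/loaddata step, the two helpers below only build the command lines; they do not run anything. The file name data.json and the helper names are placeholders of mine. Excluding contenttypes and auth.permission is a common precaution (migrate recreates those rows on the new database), not something your project necessarily needs.

```python
import sys


def dumpdata_command(output_path="data.json"):
    """Build the argv that exports the current (SQLite) data to a JSON fixture."""
    return [
        sys.executable, "manage.py", "dumpdata",
        "--natural-foreign", "--natural-primary",
        # These tables are recreated by `migrate` on the new database.
        "--exclude", "contenttypes",
        "--exclude", "auth.permission",
        "--indent", "2",
        "--output", output_path,
    ]


def loaddata_command(fixture_path="data.json"):
    """Build the argv that imports the fixture into the newly configured database."""
    return [sys.executable, "manage.py", "loaddata", fixture_path]
```

You would run the first command while settings.py still points at SQLite (for example with subprocess.run(dumpdata_command(), check=True) from the project directory), then switch DATABASES to MySQL, run migrate, and run the second command.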
Now, your second question is a bit more complicated. You can always expose your database to the Internet, but that is usually not a good idea unless you know exactly what you're doing and know how to secure it properly. If you go this way though, you can just change the database parameters in your settings file to point to your MySQL database's public IP and add the db name, user and password.
My recommendation, though, is to have one database for development on your dev PC and another on your production server that sits behind a firewall and can only be accessed through localhost. I don't think the db on your dev PC needs to be always up to date; some sample data should be enough.
So, instead of writing sensitive data into the settings file, you can have a secrets.json file in the root of your project that looks like this (set "debug" to true on your dev PC and to false on your production server; JSON itself does not allow comments):
{
    "secret_key": "YOURSUPERSECRETKEY",
    "debug": true,
    "allowed_hosts": ["127.0.0.1", "localhost", "YOUR"],
    "db_name": "YOURDBNAME",
    "db_user": "YOURDBUSER",
    "db_password": "YOURDBPASSWORD",
    "db_host": "localhost",
    "db_port": 3306
}
This file should be included in your .gitignore so it doesn't get pushed to your repository. You would have one on your local PC and another, with different settings, on your production server (you can use vi or nano to create the file).
Then in your settings.py file you can do the following:
import json
import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# Load machine-specific secrets; this file is deliberately kept out of the repository.
try:
    with open(os.path.join(BASE_DIR, 'secrets.json')) as handle:
        SECRETS = json.load(handle)
except IOError:
    # A missing file leaves SECRETS empty, so the lookups below raise KeyError.
    SECRETS = {}

SECRET_KEY = SECRETS['secret_key']
ALLOWED_HOSTS = SECRETS['allowed_hosts']
DEBUG = SECRETS['debug']
...
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': SECRETS['db_name'],
        'USER': SECRETS['db_user'],
        'PASSWORD': SECRETS['db_password'],
        'HOST': SECRETS['db_host'],
        'PORT': SECRETS['db_port'],
    }
}
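If secrets.json is missing or incomplete, the settings above fail with a bare KeyError. A small variation, sketched here with helper names I made up (load_secrets, get_secret, MissingSecretError), reports which key is missing. Inside a real Django project you might prefer to raise django.core.exceptions.ImproperlyConfigured instead; the plain exception below just keeps the sketch free of dependencies.

```python
import json


class MissingSecretError(Exception):
    """Hypothetical error raised when a required key is absent from secrets.json."""


def load_secrets(path):
    """Return the parsed secrets file, or an empty dict if the file does not exist."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def get_secret(secrets, key):
    """Look up one setting and fail with a message that names the missing key."""
    try:
        return secrets[key]
    except KeyError:
        raise MissingSecretError("Set '%s' in secrets.json" % key) from None
```

In settings.py this would read SECRETS = load_secrets(os.path.join(BASE_DIR, 'secrets.json')) followed by SECRET_KEY = get_secret(SECRETS, 'secret_key'), and so on for the other keys.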

Related

How do I upgrade an old Django Google App Engine application to a Second Generation Cloud SQL Instance?

I have a Django application that I wrote 5 years ago, which has been running successfully on Google App Engine - until last month when Google upgraded to Second Generation Cloud SQL.
Currently, I have a settings.py file, which includes a database definition which looks like this:
DATABASES = {
    'default': {
        'ENGINE': 'google.appengine.ext.django.backends.rdbms',
        'INSTANCE': 'my-project-name:my-db-instance',
        'NAME': 'my-db-name',
    },
}
Google's upgrade guide tells me that the connection name needs to change from 'my-project-name:my-db-instance' to 'my-project-name:my-region:my-db-instance'. That's simple enough, but making this change produces the error:
InternalError at /login/
(0, u'Not authorized to access instance:
my-project-name:my-region:my-db-instance')
According to this question, I need to add the prefix '/cloudsql/' to my instance name. So, I changed this (and the ENGINE specification) to give:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'INSTANCE': '/cloudsql/my-project-name:my-region:my-db-instance',
        'NAME': 'my-db-name',
        'USER': 'root',
        'PASSWORD': '******************',
    },
}
I uploaded the modified file to Google (using gcloud app deploy). This time I get a completely different error screen, showing:
Error: Server Error
The server encountered an error and could not complete your request.
Please try again in 30 seconds.
When I look in the Logs, I see:
ImproperlyConfigured: Error loading MySQLdb module: No module named
MySQLdb
It was pointed out by Daniel Ocando in this question, "The rdbms library will not work with an upgraded Second Generation Cloud SQL instance". Connections to the database now need to be made by means of a Unix domain socket.
Google provide documentation and examples of how to do this. However, I don't find Google's instructions very helpful. For connecting a Python application, they give this example code:
main.py:
db = sqlalchemy.create_engine(
    sqlalchemy.engine.url.URL(
        drivername="mysql+pymysql",
        username=db_user,
        password=db_pass,
        database=db_name,
        query={"unix_socket": "/cloudsql/{}".format(cloud_sql_connection_name)},
    ),
)
My application doesn't have a main.py file, so I'm really not sure where to put this code. I have looked at the full example code for this on GitHub, but I am none the wiser about what changes I need to make in my settings.py file, or elsewhere in my application.
My question is: do I really have to go down this route (shifting to the SQLAlchemy library), or can I upgrade my Django app to work with a Second Generation Google Cloud SQL instance simply by making some changes in my settings.py file? And, if so, what changes?
My application uses Django 1.4, Python 2.7, and I'm not using the Flask framework which Google suggest.
I learned to use Django purely for the purposes of writing this application 5 years ago, but I have not used it since - so I have forgotten pretty much everything I knew about Django and Python.

A few notes here before I start: Django 1.4 and Python 2.7 are no longer supported, so this configuration may or may not work.
You were on the right track up until the part where you introduced the SQLAlchemy instructions. These aren't Django configurations. You should be able to keep your current DATABASES syntax, provided you complete the following steps:
Ensure you add the settings to your app.yaml to allow your new database.
Ensure you have set the Cloud SQL Client (preferred) permissions to the App Engine service account.
Make sure you have mysqlclient as a Python dependency.
The Django on App Engine Flex guide uses PostgreSQL, but does include some MySQL-specific suggestions that should be useful. There are also sample DATABASES configurations there.
Hope this helps!

Many thanks to glasnt for the helpful suggestions. These put me on the right track, and together with some information I found elsewhere, I got my site back up and running again - much to the delight of my client!
Here are the details of what I had to change:
I updated my app.yaml file and added:
beta_settings:
  cloud_sql_instance: "my-project-name:my-region:my-db-instance"
and also:
libraries:
- name: MySQLdb
  version: "latest"
I updated settings.py, and added:
import MySQLdb
and I updated the DATABASES definition to:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'HOST': '/cloudsql/my-project-name:my-region:my-db-instance',
        'NAME': 'my-db-name',
        'USER': 'root',
        'PASSWORD': 'my-db-password',
    },
}
I found that I did not need to set up any permissions for the App Engine service account. Google's upgrade guide states:
When you upgrade an instance with an authorized App Engine project
from First Generation to Second Generation, Cloud SQL creates a
special service account that provides the same access as the
authorized App Engine project did before the upgrade. Because this
service account authorizes access only to a specific instance, rather
than the entire project, this service account is not visible in the
IAM service account page, and you cannot update or delete it.
So, for a migrated first-gen application, no permissions need to be configured.
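For anyone repeating this upgrade, the settings change boils down to pointing HOST at the Unix socket path for the three-part connection name. A minimal sketch of that, using a helper name of my own (cloud_sql_database) and placeholder credentials:

```python
def cloud_sql_database(connection_name, db_name, user, password):
    """Return a Django-style database entry that connects over the Cloud SQL Unix socket."""
    # First Generation names were 'project:instance'; the upgraded form adds the region.
    if len(connection_name.split(":")) < 3:
        raise ValueError(
            "expected 'project:region:instance', got %r" % connection_name
        )
    return {
        "ENGINE": "django.db.backends.mysql",
        "HOST": "/cloudsql/" + connection_name,
        "NAME": db_name,
        "USER": user,
        "PASSWORD": password,
    }
```

In settings.py this would be used as DATABASES = {'default': cloud_sql_database('my-project-name:my-region:my-db-instance', 'my-db-name', 'root', 'my-db-password')}. The length check only guards against leaving out the region; it does not verify that the instance exists.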

Test database accessibility in Django

My question is very similar to this question
I'm just getting started with Django, and I find myself attempting to learn how it works any time I have a spare moment and my laptop available. I've found that Heroku is a pretty great place to test things, but I can't always reach the internet if I'm waiting to pick up kids, or something similar. In development, I would like to create a test that will check if a DB is accessible. If not, fail over to an SQLite DB.
I started with code heavily borrowed from here:
def pingable(hostname):
    try:
        return os.system("ping -c 1 " + hostname + " > /dev/null 2>&1") == 0
    except:
        return False

if (not pingable(DATABASES['default']['HOST'])):
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
        }
    }
I simply plop that code immediately after the DATABASES variable is set. But this has a few weaknesses. The most glaring is that AWS (which Heroku uses) doesn't respond to pings unless you specifically enable them . . . and honestly, why make things less secure if you don't have to?
So in the interest of not reinventing the wheel, this has led me to ask this question: has someone created a way to check if a Django DB is accessible?
I really only need to check Postgres . . . but I'd really love to find a generic solution, so half credit if you can point me to a solution that only works for Postgres.
Edit: To clarify, the internet itself may be available, but the necessary port(s) may be blocked by a firewall . . . it's hard to know what will be available
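For reference, one way around the ping limitation is to test the database port itself over TCP rather than relying on ICMP. The sketch below uses only the standard library; port_is_open is a name chosen here, and a successful connection only shows that something is listening on that port, not that the credentials or the database are valid.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

It could replace pingable in the snippet above, for example port_is_open(DATABASES['default']['HOST'], 5432) for a Postgres server on its default port.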

This isn't really the way to manage database settings between Heroku and your local dev machine.
Heroku manages all these sorts of settings via environment variables, which is one of the principles of the 12-factor app. They've also made a Django library, dj-database-url, which reads those env vars and automatically configures the settings appropriately.
You should use this for your database settings, and then you can set a local env var DATABASE_URL with the address of your local sqlite3 database. Then your app will automatically run in both dev and production and configure itself to point to the relevant database automatically.
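To make the idea concrete, here is a simplified, standard-library-only sketch of what reading DATABASE_URL amounts to. It is not dj-database-url's implementation, and the function names and the scheme-to-backend table are my own assumptions; in a real project you would normally just use the library.

```python
import os
from urllib.parse import unquote, urlparse

# Assumed mapping from URL scheme to Django backend; extend as needed.
ENGINES = {
    "postgres": "django.db.backends.postgresql",
    "postgresql": "django.db.backends.postgresql",
    "mysql": "django.db.backends.mysql",
    "sqlite": "django.db.backends.sqlite3",
}


def database_from_url(url):
    """Turn a database URL into a Django-style settings dict."""
    parts = urlparse(url)
    engine = ENGINES[parts.scheme]
    if parts.scheme == "sqlite":
        # sqlite:///relative.db or sqlite:////absolute/path.db
        return {"ENGINE": engine, "NAME": parts.path[1:]}
    return {
        "ENGINE": engine,
        "NAME": parts.path[1:],
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


def database_from_env(default_url="sqlite:///db.sqlite3"):
    """Read DATABASE_URL, falling back to a local SQLite file when it is unset."""
    return database_from_url(os.environ.get("DATABASE_URL", default_url))
```

With this shape, settings.py contains DATABASES = {'default': database_from_env()} and nothing else changes between Heroku and a laptop without network access.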

Django - Can we make a connection to different remote database

I am writing a Django application where I already have 1 mysql backend db configured in my settings.py.
I know we can add as many db configurations as we want, but that means hard-coding them, which I don't want to do. In fact I can't, as I have to connect ad hoc to about 70-80 different remote machines, run queries, and fetch the results.
I am planning to connect to those machines via their IP address.
I am comparatively new to Django, so I was wondering if we can somehow, make a function which queries the machine by putting in configuration something like :
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'dbName',
        'USER': 'root',
        'PASSWORD': 'root',
        'HOST': '',
        'PORT': '3306'
    }
}
So instead of DATABASES and default, I could configure my function to change the configuration, through an Ajax call or something!
Fortunately, every machine I have to connect to uses mysql so no problem with that.
I looked into this mysql-python connector, but I'm not sure if I should use it as I already have MySQLdb installed. I also have to run some raw queries :|
Could anyone guide me on what would be the best approach for this situation?
P.S.: I have also looked at this post, which discusses connecting to a remote MySQL machine from a local one, but that's of no help to me :( :(

I believe there are quite a few paths you can take, three of which are:
Add all your connections to DATABASES and select one with using(), which you said you don't want to do because you have so many connections.
Connect using Python's MySQL library directly. If you do this I don't think you'll get to use Django's nice ORM.
Look at how Django wraps connections to allow you to use its ORM. I did some quick searches about manually establishing a connection using the Django ORM but didn't find anything. All the answers are in the source code. I believe you can just instantiate your own connections and interact with your remote database using the ORM. I don't have time to look through it now, but everything is in the source.
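On the first path: the entries don't have to be typed out by hand, since DATABASES is an ordinary dict and can be generated from a list of IP addresses. A minimal sketch, where BASE_CONFIG, build_remote_databases, and the remote_N alias scheme are all names I invented and the credentials are the placeholders from the question:

```python
# Shared settings for every remote machine (placeholders from the question).
BASE_CONFIG = {
    "ENGINE": "django.db.backends.mysql",
    "NAME": "dbName",
    "USER": "root",
    "PASSWORD": "root",
    "PORT": "3306",
}


def build_remote_databases(hosts, base=BASE_CONFIG):
    """Return {alias: config} with one entry per remote host."""
    databases = {}
    for index, host in enumerate(hosts):
        config = dict(base)  # copy so the entries do not share state
        config["HOST"] = host
        databases["remote_%d" % index] = config
    return databases
```

In settings.py you would call DATABASES.update(build_remote_databases(hosts)) after defining 'default'. Queries can then target one machine with Model.objects.using('remote_0'), or with django.db.connections['remote_0'].cursor() for raw SQL. Where the hosts list comes from (a file, an environment variable) is up to you.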

Python database WITHOUT using Django (for Heroku)

To my surprise, I haven't found this question asked elsewhere. Short version, I'm writing an app that I plan to deploy to the cloud (probably using Heroku), which will do various web scraping and data collection. The reason it'll be in the cloud is so that I can have it be set to run on its own every day and pull the data to its database without my computer being on, as well as so the rest of the team can access the data.
I used to use AWS's SimpleDB and DynamoDB, but I found SDB's storage limitations to be too small and DDB's poor querying ability to be a problem, so I'm looking for a database system (SQL or NoSQL) that can store arbitrary-length values (and ideally arbitrary data structures) and that can be queried on any field.
I've found many database solutions for Heroku, such as ClearDB, but all of the information I've seen has shown how to set up Django to access the database. Since this is intended to be a script and not a site, I'd really prefer not to dive into Django if I don't have to.
Is there any kind of database that I can hook up to in Heroku with Python without using Django?

You can get a database provided from Heroku without requiring your app to use Django. To do so:
heroku addons:add heroku-postgresql:dev
If you need a larger more dedicated database, you can examine the plans at Heroku Postgres
Within your requirements.txt you'll want to add:
psycopg2
Then you can connect/interact with it similar to the following:
import os
from urllib.parse import urlparse  # Python 3; on Python 2 use "import urlparse"

import psycopg2

# Heroku exposes the connection string in the DATABASE_URL environment variable.
url = urlparse(os.environ['DATABASE_URL'])
conn = psycopg2.connect(
    dbname=url.path[1:],
    user=url.username,
    password=url.password,
    host=url.hostname,
)
cur = conn.cursor()
query = "SELECT ...."
cur.execute(query)

I'd use MongoDB. Heroku has support for it, so I think it will be really easy to start and scale out: https://addons.heroku.com/mongohq
About Python: MongoDB is a really easy database. The schema is flexible and fits really well with Python dictionaries. That's something really good.
You can use PyMongo
import datetime

from pymongo import MongoClient  # replaces Connection, which was removed in PyMongo 3

client = MongoClient()
# Get your DB
db = client.my_database
# Get your collection
cars = db.cars
# Create some objects
car = {
    "brand": "Ford",
    "model": "Mustang",
    "date": datetime.datetime.utcnow(),
}
# Insert it (insert_one replaces insert, which was removed in PyMongo 4)
cars.insert_one(car)
Pretty simple, uh?
Hope it helps.
EDIT:
As Endophage mentioned, another good option for interfacing with Mongo is mongoengine. If you have lots of data to store, you should take a look at that.

I did this recently with Flask. (https://github.com/HexIce/flask-heroku-sqlalchemy).
There are a couple of gotchas:
1. If you don't use Django you may have to set up your database yourself by doing:
heroku addons:add shared-database
(Or whichever database you want to use, the others cost money.)
2. The database URL is stored in Heroku in the "DATABASE_URL" environment variable.
In Python you can get it by doing:
import os
dburl = os.environ['DATABASE_URL']
What you do to connect to the database from there is up to you; one option is SQLAlchemy.

Create a standalone Heroku Postgres database. http://postgres.heroku.com

How to run Django's test database only in memory?

My Django unit tests take a long time to run, so I'm looking for ways to speed that up. I'm considering installing an SSD, but I know that has its downsides too. Of course, there are things I could do with my code, but I'm looking for a structural fix. Even running a single test is slow since the database needs to be rebuilt / south migrated every time. So here's my idea...
Since I know the test database will always be quite small, why can't I just configure the system to always keep the entire test database in RAM? Never touch the disk at all. How do I configure this in Django? I'd prefer to keep using MySQL since that's what I use in production, but if SQLite 3 or something else makes this easy, I'd go that way.
Does SQLite or MySQL have an option to run entirely in memory? It should be possible to configure a RAM disk and then configure the test database to store its data there, but I'm not sure how to tell Django / MySQL to use a different data directory for a certain database, especially since it keeps getting erased and recreated each run. (I'm on a Mac FWIW.)

If you set your database engine to sqlite3 when you run your tests, Django will use an in-memory database.
I'm using code like this in my settings.py to set the engine to sqlite when running my tests:
import sys
if 'test' in sys.argv:
    DATABASE_ENGINE = 'sqlite3'
Or in Django 1.2:
if 'test' in sys.argv:
    DATABASES['default'] = {'ENGINE': 'sqlite3'}
And finally in Django 1.3 and 1.4:
if 'test' in sys.argv:
    DATABASES['default'] = {'ENGINE': 'django.db.backends.sqlite3'}
(The full path to the backend isn't strictly necessary with Django 1.3, but makes the setting forward compatible.)
You can also add the following line, in case you are having problems with South migrations:
SOUTH_TESTS_MIGRATE = False
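The same switch can be written as a small function, which makes it easy to check outside of Django. This is only a sketch: databases_for and IN_MEMORY_SQLITE are names made up here, and the explicit ":memory:" NAME is there for clarity, since Django's SQLite backend already uses an in-memory test database by default.

```python
IN_MEMORY_SQLITE = {
    "ENGINE": "django.db.backends.sqlite3",
    "NAME": ":memory:",  # explicit in-memory database
}


def databases_for(argv, production_default):
    """Return the DATABASES dict, swapping in SQLite when the test command is run."""
    if "test" in argv[1:]:
        return {"default": dict(IN_MEMORY_SQLITE)}
    return {"default": production_default}
```

In settings.py this would be called as DATABASES = databases_for(sys.argv, {...your MySQL settings...}).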

I usually create a separate settings file for tests and use it in test command e.g.
python manage.py test --settings=mysite.test_settings myapp
It has two benefits:
You don't have to check for test or any such magic word in sys.argv, test_settings.py can simply be
from settings import *
# make tests faster
SOUTH_TESTS_MIGRATE = False
DATABASES['default'] = {'ENGINE': 'django.db.backends.sqlite3'}
Or you can further tweak it for your needs, cleanly separating test settings from production settings.
Another benefit is that you can still run the tests with the production database engine instead of sqlite3, which catches subtle backend-specific bugs. So while developing use
python manage.py test --settings=mysite.test_settings myapp
and before committing code run once
python manage.py test myapp
just to be sure that all tests are really passing.

MySQL supports a storage engine called "MEMORY", which you can configure in your database config (settings.py) as such:
'USER': 'root',  # Not used with sqlite3.
'PASSWORD': '',  # Not used with sqlite3.
'OPTIONS': {
    "init_command": "SET storage_engine=MEMORY",
}
Note that the MEMORY storage engine doesn't support blob / text columns, so if you're using django.db.models.TextField this won't work for you.

I can't answer your main question, but there are a couple of things that you can do to speed things up.
Firstly, make sure that your MySQL database is set up to use InnoDB. Then it can use transactions to rollback the state of the db before each test, which in my experience has led to a massive speed-up. You can pass a database init command in your settings.py (Django 1.2 syntax):
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'HOST': 'localhost',
        'NAME': 'mydb',
        'USER': 'whoever',
        'PASSWORD': 'whatever',
        'OPTIONS': {"init_command": "SET storage_engine=INNODB"}
    }
}
Secondly, you don't need to run the South migrations each time. Set SOUTH_TESTS_MIGRATE = False in your settings.py and the database will be created with plain syncdb, which will be much quicker than running through all the historic migrations.
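One caveat that depends on your MySQL version: the storage_engine system variable was removed in MySQL 5.7.5, and default_storage_engine is the name to use on newer servers. A small sketch that builds the OPTIONS dict either way (the helper name and the list of allowed engines are assumptions of mine):

```python
ALLOWED_ENGINES = {"INNODB", "MEMORY", "MYISAM"}


def storage_engine_options(engine, legacy_variable=False):
    """Build the OPTIONS dict that sets the default MySQL storage engine per connection."""
    name = engine.upper()
    if name not in ALLOWED_ENGINES:
        raise ValueError("unexpected storage engine: %r" % engine)
    # Older servers use storage_engine; it no longer exists from MySQL 5.7.5 on.
    variable = "storage_engine" if legacy_variable else "default_storage_engine"
    return {"init_command": "SET %s=%s" % (variable, name)}
```

In the DATABASES entry this would appear as 'OPTIONS': storage_engine_options('INNODB').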

You can do double tweaking:
use transactional tables: initial fixtures state will be set using database rollback after every TestCase.
put your database data dir on ramdisk: you will gain much as far as database creation is concerned and also running test will be faster.
I'm using both tricks and I'm quite happy.
How to set it up for MySQL on Ubuntu:
$ sudo service mysql stop
$ sudo cp -pRL /var/lib/mysql /dev/shm/mysql
$ vim /etc/mysql/my.cnf
# datadir = /dev/shm/mysql
$ sudo service mysql start
Beware, it's just for testing, after reboot your database from memory is lost!

Another approach: have another instance of MySQL running in a tmpfs that uses a RAM disk. Instructions in this blog post: Speeding up MySQL for testing in Django.
Advantages:
You use the exactly same database that your production server uses
no need to change your default mysql configuration

Extending on Anurag's answer I simplified the process by creating the same test_settings and adding the following to manage.py
if len(sys.argv) > 1 and sys.argv[1] == "test":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.test_settings")
else:
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
This seems cleaner, since sys is already imported and manage.py is only used from the command line, so there's no need to clutter up settings.

Use the following in your settings.py:
DATABASES['default']['ENGINE'] = 'django.db.backends.sqlite3'
