I have a PostgreSQL schema that resides in a schema.sql file that gets run each time a database connection is initiated in Python. It looks something like:
CREATE TABLE IF NOT EXISTS users (
id SERIAL PRIMARY KEY,
facebook_id TEXT NOT NULL,
name TEXT NOT NULL,
access_token TEXT,
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
The app is deployed on Heroku, using their PostgreSQL and everything works as expected.
Now, what if I want to change a bit the structure of my users table? How can I do this the easiest and the best way? I thought of writing an ALTER... line in schema.sql for each change I want to produce in the database, but I don't think this is the best approach, since after some time the schema file will be full of ALTERs and it will slow down my app.
What's the indicated way to deploy changes made to a database?
Running a hard-coded script on each connection is not a great way to handle schema management.
You need to either manage the schema manually, or use a full-fledged tool that keeps a schema version identifier in the database, checks that, and applies a script to upgrade to the next schema version if it's different to the latest one. Rails calls this "migrations" and it kind-of works. If you're using Django it has schema management too.
If you're not using a framework like that, I suggest just writing your own schema upgrade scripts. Add a "schema_version" table with a single row. SELECT it when the app first starts after a redeploy and if it's lower than the current version the app knows about, apply the update script(s) in order, eg schema_1_to_2, schema_2_to_3, etc.
I don't recommend doing this on connect, do it on app start, or better, as a special maintenance command. If you do it on every connection you'll have multiple connections trying to make the same changes and you'll land up with duplicated columns and all sorts of other mess.
I support several django apps on heroku with Postgres. I just connect via PgAdmin and run my scripts when changes are required. I don't see any need for running a script every time a connection is made.
Related
I am using Postgres DB and I have two databases that I am using.
Consider this example as I have to make two version of the software. In one version there is the main DB used for real data and second DB for test version of the software for trainee person.
Now how do I change the default DB according to the logged in User.
P.S: I know I can use 'using(db_name)'. But I don't want that. I want to change the default DB in settings.py dynamically.
The puspose is to provide a test version to the client with a seperate DB.
I am writing a REST API using flask_restful and managing the mysql db using flask-sqlalchemy. I would like to know what the best practice for loading existing data into a table when the app starts is.
I am currently calling the db.create_all() method withing an endpoint with the #app.before_first_request decorator. I would like to then fill in one of the tables created with existing data from a csv file. Should the code to push the data in a separate script or within the function?
Thanks!
I would separate loading initial database data from application initialization, because probably initial data from my experience would not be changed often and can take some time if file is bigger, and usually you don't need to reload it in the database each time application loads.
I think you will most certainly need database migrations at some point in your application development, so I would suggest setting up Flask-Migrate to handle that, and running its upgrade method on application creation (create_app method if you are using Flask application factories pattern) which will handle database migrations. I am saying this since it will save you some headache when you are introducing it later on database already populated with actual data which is initialized with db.create_all().
And for populating database with seed data I would go with Flask CLI or Flask-Script. In one of my recent projects I used Flask-Script for this, and created separate manage.py file which amongst other application management methods contained initial data seeding method which looked something like this:
#manager.command
def seed():
"Load initial data into database."
db.session.add(...)
db.session.commit()
And it was run on demand by following command:
python manage.py seed
My database on Amazon currently has only a little data in it (I am making a web app but it is still in development) and I am looking to delete it, make changes to the schema, and put it back up again. The past few times I have done this, I have completely recreated my elasticbeanstalk app, but there seems like there is a better way. On my local machine, I will take the following steps:
"dropdb databasename" and then "createdb databasename"
python manage.py makemigrations
python manage.py migrate
Is there something like this that I can do on amazon to delete my database and put it back online again without deleting the entire application? When I tried just deleting the RDS instance a while ago and making a new one, I was having problems with elasticbeanstalk.
The easiest way to accomplish this is to SSH to one of your EC2 instances, that has acccess to the RDS DB, and then connect to the DB from there. Make sure that your python scripts can read your app configuration to access the configured DB, or add arguments for DB hostname. To drop and create your DB, you must just add the necessary arguments to connect to the DB. For example:
$ createdb -h <RDS endpoint> -U <user> -W ebdb
You can also create a RDS snapshot when the DB is empty, and use the RDS instance actions Restore to Point in Time or Migrate Latest Snapshot.
I had the same problem and came up with a workaround. In your python code just add and run the following method when deploying your app the next time:
FOR SQLALCHEMY AFTER VERSION 2.0
from sqlalchemy import create_engine, text
tables = ["table1_name", "table2_name"] # the names of the tables you want to delte
engine = create_engine("sqlite:///example.db") # here you create your engine
def delete_tables(tables):
for table in tables:
sql = text(f"DROP TABLE IF EXISTS {table} CASCADE;") # CASCADE deltes the tables even if they had some connections to other tables
with engine.connect() as connection:
with connection.begin():
connection.execute(sql)
delete_tables(tables) # Comment this line out after running it once.
FOR SQLALCHEMY BEFORE VERSION 2 (I guess)
def delete_tables(tables):
for table in tables:
engine.execute(f"DROP TABLE IF EXISTS {table} CASCADE;")
delete_tables(tables) # Comment this line out after running it once.
After you deployed and ran this code 1 time, all your tables will be deleted.
IMPORTANT: Delete or comment out this code after that, otherwise you will delete all your tables every time when you deploy your code to AWS
I'm a Django beginner and have developed 1 app using mysql as primary DB, but in my next project I have to use Cassandra db using https://github.com/cqlengine/cqlengine but do not use https://github.com/r4fek/django-cassandra-engine (which is a wrapper over cqlengine?).
I dont have any clue How do I start? I mean how and where should I create db connection and then create models in models.py file?
Should I create connection in init.py file?in views.py? what would be the most efficient way?
would be great(for future readers too) if someone provide a simple configuration and a model.
The twissandra demo should be a good example of how to build an app using Cassandra and Django.
In this implementation there is no models.py and the connection is maintained in the file cass.py.
You'll see cass.py also hosts all the functions required to return data from the C* database and make objects which are used by the system. This is where you would swap out the api requests with your CqlEngine code.
I hope these resources get you pointed in the right direction
Rustyrazorblade shows an easy way to accomplish this via his CQLEngine tutorial branch HERE.
You can easily setup the connection by doing something like this in your_app_project/models/connection.py:
from cqlengine import management
from cqlengine.connection import setup
def connect():
setup(["127.0.0.1", "127.0.1.1", "127.0.1.2"], "tutorial", retry_connect=True)
management.create_keyspace("tutorial", replication_factor=1, strategy_class="SimpleStrategy")
In this example: "tutorial" is the keyspace we are using, strategy_class is the replication strategy your C* instance is using, replication_factor is the amount of replications that will be stored throughout the ring, 127.0.0.1 is a Cassandra cluster node IP address (you can pass this a list or a string) and retry_connect specifies whether or not you would like it to attempt to reconnect if there is a connection failure.
From here, it is very easy for new C* users to get confused. You can call this anytime Before syncing the C* tables or using a C* query.
So, you'll want to do something like:
from cqlengine.management import sync_table
from models.connection import connect
from models.somemodels import MyCassandraModel
# This will fire off our previously setup 'connect' method
connect()
# This will setup the Model as a table in your C* DB
sync_table(MyCassandraModel)
You can even drop this into manage.py, just as long as that CQLEngine setup() is properly executed.
To my surprise, I haven't found this question asked elsewhere. Short version, I'm writing an app that I plan to deploy to the cloud (probably using Heroku), which will do various web scraping and data collection. The reason it'll be in the cloud is so that I can have it be set to run on its own every day and pull the data to its database without my computer being on, as well as so the rest of the team can access the data.
I used to use AWS's SimpleDB and DynamoDB, but I found SDB's storage limitations to be to small and DDB's poor querying ability to be a problem, so I'm looking for a database system (SQL or NoSQL) that can store arbitrary-length values (and ideally arbitrary data structures) and that can be queried on any field.
I've found many database solutions for Heroku, such as ClearDB, but all of the information I've seen has shown how to set up Django to access the database. Since this is intended to be script and not a site, I'd really prefer not to dive into Django if I don't have to.
Is there any kind of database that I can hook up to in Heroku with Python without using Django?
You can get a database provided from Heroku without requiring your app to use Django. To do so:
heroku addons:add heroku-postgresql:dev
If you need a larger more dedicated database, you can examine the plans at Heroku Postgres
Within your requirements.txt you'll want to add:
psycopg2
Then you can connect/interact with it similar to the following:
import psycopg2
import os
import urlparse
urlparse.uses_netloc.append('postgres')
url = urlparse.urlparse(os.environ['DATABASE_URL'])
conn = psycopg2.connect("dbname=%s user=%s password=%s host=%s " % (url.path[1:], url.username, url.password, url.hostname))
cur = conn.cursor()
query = "SELECT ...."
cur.execute(query)
I'd use MongoDB. Heroku has support for it, so I think it will be really easy to start and scale out: https://addons.heroku.com/mongohq
About Python: MongoDB is a really easy database. The schema is flexible and fits really well with Python dictionaries. That's something really good.
You can use PyMongo
from pymongo import Connection
connection = Connection()
# Get your DB
db = connection.my_database
# Get your collection
cars = db.cars
# Create some objects
import datetime
car = {"brand": "Ford",
"model": "Mustang",
"date": datetime.datetime.utcnow()}
# Insert it
cars.insert(car)
Pretty simple, uh?
Hope it helps.
EDIT:
As Endophage mentioned, another good option for interfacing with Mongo is mongoengine. If you have lots of data to store, you should take a look at that.
I did this recently with Flask. (https://github.com/HexIce/flask-heroku-sqlalchemy).
There are a couple of gotchas:
1. If you don't use Django you may have to set up your database yourself by doing:
heroku addons:add shared-database
(Or whichever database you want to use, the others cost money.)
2. The database URL is stored in Heroku in the "DATABASE_URL" environment variable.
In python you can get it by doing.
dburl = os.environ['DATABASE_URL']
What you do to connect to the database from there is up to you, one option is SQLAlchemy.
Create a standalone Heroku Postgres database. http://postgres.heroku.com