Inserting Unicode values on alembic migration

I'm working on a small pet project that involves some accounting in multiple currencies. During its development I decided to move from a straightforward DB setup to DB migrations using alembic. In some migrations I need to populate the DB with initial currencies, which are displayed in Ukrainian.
My problem is that data populated from alembic migration scripts is saved in some unknown encoding, so I cannot use it within the application (which expects it to be human readable). My settings and the script are as follows:
alembic.ini
...
sqlalchemy.url = mysql+pymysql://defaultuser:defaultpwd@localhost/petdb
...
alembic/versions/f433ab2a814_adding_currency.py
# -*- coding: utf-8 -*-
"""Adding currency
Revision ID: f433ab2a814
Revises: 49538bba2220
Create Date: 2016-03-08 13:50:35.369021
"""
from alembic import op
from sqlalchemy import Column, Integer, String, Unicode
# revision identifiers, used by Alembic.
revision = 'f433ab2a814'
down_revision = '1c0b47263c82'
branch_labels = None
depends_on = None
def upgrade():
    op.create_table(
        'currency',
        Column('id', Integer, primary_key=True),
        Column('name', Unicode(120), nullable=False),
        Column('abbr', String(3), nullable=False)
    )
    op.execute(u'INSERT INTO currency SET name="{}", abbr="{}";'.format(u"Гривня", "UAH"))
When I check the currency table from the mysql client or MySQL Workbench, it is displayed as:
mysql> SELECT * FROM currency;
+----+----------------------------+------+
| id | name | abbr |
+----+----------------------------+------+
| 1 | Ð“Ñ€Ð¸Ð²Ð½Ñ | UAH |
+----+----------------------------+------+
Expected result is:
mysql> SELECT * FROM currency;
+----+----------------------------+------+
| id | name | abbr |
+----+----------------------------+------+
| 1 | Гривня | UAH |
+----+----------------------------+------+
From my application I've been setting this value as follows:
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from petproject import app
# config and Currency come from the project itself (their imports are not shown here)
app.config.from_object(config.DevelopmentConfig)
engine = create_engine(app.config["DATABASE"] + "?charset=utf8",
                       convert_unicode=True, encoding="utf8", echo=False)
db_session = scoped_session(sessionmaker(autocommit=False,
                                         autoflush=False,
                                         bind=engine))
if len(db_session.query(Currency).all()) == 0:
    default_currency = Currency()
    default_currency.name = u"Гривня"
    default_currency.abbr = u"UAH"
    db_session.add(default_currency)
    db_session.commit()
So I'm wondering how to insert initial Unicode values on migration that will be stored in correct encoding. Did I miss anything?

After a more extended analysis, I discovered that MySQL keeps all data in the 'windows-1252' encoding. The MySQL manual (section "West European Character Sets") says this about it:
latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set.
It looked like either MySQL ignored character_set_client, which I assumed to be 'utf-8', or SQLAlchemy / alembic didn't tell the server to accept the data as UTF-8 encoded. Unfortunately, I did not manage to set the recommended '?charset=utf8' option in alembic.ini.
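The garbling itself can be reproduced without a database. The sketch below is only an illustration (it does not touch MySQL): it encodes the currency name as UTF-8 and then misreads those bytes as cp1252, which is what the manual quote above suggests the server did. One assumption to note: Python's cp1252 codec has no mapping for the final byte 0x8F, so errors="replace" substitutes U+FFFD there, whereas MySQL's latin1 stores it as an invisible control character. The result starts with the same 'Ð“Ñ€Ð¸Ð²Ð½Ñ' sequence shown in the table.

```python
# -*- coding: utf-8 -*-
# Illustration only: mimic a client sending UTF-8 bytes that are read as cp1252.
original = u"Гривня"

# Each Cyrillic letter takes two bytes in UTF-8, so 6 letters become 12 bytes.
utf8_bytes = original.encode("utf-8")

# Misinterpret every byte as a separate cp1252 character.
# errors="replace" is needed because byte 0x8F is undefined in Python's cp1252.
garbled = utf8_bytes.decode("cp1252", errors="replace")

# Decoding the same bytes with the right codec gives the original text back.
restored = utf8_bytes.decode("utf-8")
```

This is why one six-letter word turns into twelve unrelated characters: nothing was lost on the way in, the bytes were only labelled with the wrong character set.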
To have the data accepted and saved in the correct encoding, I set the character set manually by calling op.execute('SET NAMES utf8'). The complete code looks like this:
def upgrade():
    op.create_table(
        'currency',
        Column('id', Integer, primary_key=True),
        Column('name', Unicode(120), nullable=False),
        Column('abbr', String(3), nullable=False)
    )
    op.execute('SET NAMES utf8')
    op.execute(u'INSERT INTO currency SET name="{}", abbr="{}";'.format(u"Гривня", "UAH"))
And the result is as expected:
mysql> SELECT * FROM currency;
+----+----------------------------+------+
| id | name | abbr |
+----+----------------------------+------+
| 1 | Гривня | UAH |
+----+----------------------------+------+

Related

How to create index on SQLAlchemy column_property?

Using SQLAlchemy with an SQLite engine, I've got a self-referential hierarchical table that describes a directory structure.
from sqlalchemy import Column, Integer, String, ForeignKey, Index, create_engine, select
from sqlalchemy.orm import column_property, aliased, join, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Dr(Base):
    __tablename__ = 'directories'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(Integer, ForeignKey('directories.id'))
Each Dr row only knows its own "name" and its "parent_id". I've added a recursive column_property called "path" that returns a string containing all of a Dr's ancestors from the root Dr.
root_anchor = (
    select([Dr.id, Dr.name, Dr.parent_id, Dr.name.label('path')])
    .where(Dr.parent_id == None).cte(recursive=True)
)
dir_alias = aliased(Dr)
cte_alias = aliased(root_anchor)
path_table = root_anchor.union_all(
    select([
        dir_alias.id, dir_alias.name,
        dir_alias.parent_id, cte_alias.c.path + "/" + dir_alias.name
    ]).select_from(join(
        dir_alias, cte_alias, onclause=cte_alias.c.id == dir_alias.parent_id
    ))
)
Dr.path = column_property(
    select([path_table.c.path]).where(path_table.c.id == Dr.id)
)
Here's an example of the output:
"""
-----------------------------
| id | name | parent_id |
-----------------------------
| 1 | root | NULL |
-----------------------------
| 2 | kid | 1 |
-----------------------------
| 3 | grandkid | 2 |
-----------------------------
"""
sqllite_engine = create_engine('sqlite:///:memory:')
Session = sessionmaker(bind=sqllite_engine)
session = Session()
instance = session.query(Dr).filter(Dr.name=='grandkid').one()
print(instance.path)
# Outputs: "root/kid/grandkid"
I'd like to be able to add an index, or at least a unique constraint, on the "path" property so that the same path cannot exist more than once in the table. I've tried:
Index('pathindex', Dr.path, unique=True)
...with no luck. No error is raised, but SQLAlchemy doesn't seem to register the index; it just silently ignores it. It still allows adding a duplicate path, e.g.:
session.add(Dr(name='grandkid', parent_id=2))
session.commit()
As further evidence that the Index() was ignored, inspecting the "indexes" property of the table results in an empty set:
print(Dr.__table__.indexes)
#Outputs: set([])
It's essential to me that duplicate paths cannot exist in the database. I'm not sure whether what I'm trying to do with column_property is possible in SQLAlchemy, and if not I'd love to hear some suggestions on how else I can go about this.

I think a unique constraint should suffice. In class Dr:
__table_args__ = (UniqueConstraint('parent_id', 'name'), )
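To see what such a constraint enforces at the database level, here is a minimal sketch that uses Python's built-in sqlite3 module instead of SQLAlchemy, so it runs without extra packages. The table layout mirrors the question; the helper try_insert is a hypothetical name introduced only for this illustration. Because a path is fully determined by a parent and a name, rejecting duplicate (parent_id, name) pairs also rejects duplicate paths below the root. One caveat worth knowing: SQLite treats NULL values as distinct in a UNIQUE constraint, so two root rows with the same name are still accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE directories (
        id INTEGER PRIMARY KEY,
        name TEXT,
        parent_id INTEGER REFERENCES directories (id),
        UNIQUE (parent_id, name)
    )
    """
)
conn.executemany(
    "INSERT INTO directories (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "root", None), (2, "kid", 1), (3, "grandkid", 2)],
)


def try_insert(name, parent_id):
    """Return True if the row was accepted, False if the constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO directories (name, parent_id) VALUES (?, ?)",
            (name, parent_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert("grandkid", 2)   # same parent and name: rejected
sibling_accepted = try_insert("grandkid", 1)     # same name, other parent: accepted
second_root_accepted = try_insert("root", None)  # NULL parent: not caught by UNIQUE
```

If duplicate roots matter, they would need separate handling, for example a single well-known root row created once.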

reflecting every schema from postgres DB using SQLAlchemy

I have an existing database with two schemas, named schools and students, described by an instance of declarative_base through two classes that inherit from it:
class DirectorioEstablecimiento(Base):
    __table_args__ = {'schema': 'schools'}
    __tablename__ = 'addresses'
    # some Columns are defined here
and
class Matricula(Base):
    __table_args__ = {'schema': 'students'}
    __tablename__ = 'enrollments'
    # some Columns are defined here
I can use the Base instance, as in Base.metadata.create_all(bind=engine), to recreate it in a test DB I have in postgres. I can confirm this was done without problems if I query pg_namespace:
In [111]: engine.execute("SELECT * FROM pg_namespace").fetchall()
2017-12-13 18:04:01,006 INFO sqlalchemy.engine.base.Engine SELECT * FROM pg_namespace
2017-12-13 18:04:01,006 INFO sqlalchemy.engine.base.Engine {}
Out[111]:
[('pg_toast', 10, None),
('pg_temp_1', 10, None),
('pg_toast_temp_1', 10, None),
('pg_catalog', 10, '{postgres=UC/postgres,=U/postgres}'),
('public', 10, '{postgres=UC/postgres,=UC/postgres}'),
('information_schema', 10, '{postgres=UC/postgres,=U/postgres}'),
('schools', 16386, None),
('students', 16386, None)]
and from the psql CLI
user# select * from pg_tables;
schemaname | tablename | tableowner | tablespace | hasindexes | hasrules | hastriggers | rowsecurity
--------------------+------------------------------+------------+------------+------------+----------+-------------+-------------
schools | addresses | diego | | t | f | f | f
students | enrollments | diego | | t | f | f | f
pg_catalog | pg_statistic | postgres | | t | f | f | f
pg_catalog | pg_type | postgres | | t | f | f | f
pg_catalog | pg_authid | postgres | pg_global | t | f | f | f
pg_catalog | pg_user_mapping | postgres | | t | f | f | f
-- other tables were omitted
However, if I want to reflect that database in some other instance of declarative_base nothing is reflected.
Something like
In [87]: Base.metadata.tables.keys()
Out[87]: dict_keys(['schools.addresses', 'students.enrollments'])
In [88]: new_base = declarative_base()
In [89]: new_base.metadata.reflect(bind=engine)
In [90]: new_base.metadata.tables.keys()
Out[90]: dict_keys([])
I understand that reflect accepts a schema as a parameter, but I would like to obtain all of them at once during reflection. For some reason I can only achieve this one schema at a time.
Is there a way to do this?

When you call metadata.reflect() it will only reflect the default schema (the first in your search_path for which you have permissions). So if your search_path is public,students,schools it will only reflect the tables in schema public. If you do not have permissions on schema public, it will be skipped and reflection will default to students only.
The default schema is retrieved by SELECT current_schema();
In order to reflect other schemas, you need to call metadata.reflect() for each schema.
metadata.reflect(schema='public') # will reflect even if you do not have permissions on the tables in schema `public`, as long as you have access to pg_* system tables
metadata.reflect(schema='students')
metadata.reflect(schema='schools')
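Instead of listing the schemas by hand, the per-schema calls can be driven by SQLAlchemy's inspector, whose get_schema_names() method returns the schema names the connection can see. The following is a sketch under stated assumptions, not a tested recipe for the setup in the question: it uses an in-memory SQLite database as a stand-in so that it runs without a server (SQLite reports a single schema called main), and SKIPPED_SCHEMAS is a hypothetical filter for system schemas such as information_schema that PostgreSQL would also report.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, inspect

# Stand-in database with one table; with PostgreSQL this would be the real engine.
engine = create_engine("sqlite://")
source = MetaData()
Table("addresses", source, Column("id", Integer, primary_key=True))
source.create_all(engine)

# Hypothetical filter: system schemas that are usually not worth reflecting.
SKIPPED_SCHEMAS = {"information_schema"}

reflected = MetaData()
for schema_name in inspect(engine).get_schema_names():
    if schema_name in SKIPPED_SCHEMAS:
        continue
    # One reflect() call per schema, all collected into the same MetaData.
    reflected.reflect(bind=engine, schema=schema_name)

# Keys are schema-qualified because each table was reflected with an explicit schema.
table_keys = sorted(reflected.tables)
```

Against the database from the question, the loop would be expected to pick up schools and students as well as public, each under a schema-qualified key.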
Note: when you reflect with an explicit schema:
Reflected tables in metadata.tables will have keys with the fully qualified table name, as in schema1.mytable, schema2.mytable.
Any conflicting table names will be replaced with the later one. If you have any tables with the same name, you should implement your own classname_for_table function to prefix the names with the schema name.
An example of prefixing table names with the schema
def classname_for_table(base, tablename, table):
    schema_name = table.schema
    fqname = '{}.{}'.format(schema_name, tablename)
    return fqname
Base.prepare(classname_for_table=classname_for_table)
As a bonus, here is a small snippet which will expose all tables within a dynamic submodule per schema so you can access them.
Create a file, e.g. db.py, and place the following in it:
from types import ModuleType
def register_classes(base, module_dict):
    for name, table in base.classes.items():
        schema_name, table_name = name.split('.')
        class_name = table_name.title().replace('_', '')
        if schema_name not in module_dict:
            module = module_dict[schema_name] = ModuleType(schema_name)
        else:
            module = module_dict[schema_name]
        setattr(module, class_name, table)
Call this function with the automap base and the __dict__ of the module which you would like to register the schemas with.
register_classes(base, globals())
or
import db
db.register_classes(base, db.__dict__)
and then you will get
import db
db.students.MyTable
db.schools.MyTable

How to add DateTimeField in django without microsecond

I'm writing a Django application with Django 1.8 and MySQL 5.7.
Below is the model which I have written:
class People(models.Model):
    name = models.CharField(max_length=20)
    age = models.IntegerField()
    create_time = models.DateTimeField()
    class Meta:
        db_table = "people"
Above model creates the table below:
mysql> desc people;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| id          | int(11)     | NO   | PRI | NULL    | auto_increment |
| name        | varchar(20) | NO   |     | NULL    |                |
| age         | int(11)     | NO   |     | NULL    |                |
| create_time | datetime(6) | NO   |     | NULL    |                |
+-------------+-------------+------+-----+---------+----------------+
Here Django creates the datetime field with microsecond precision:
datetime(6)
But I want a datetime field without microseconds:
datetime
I have another application that uses the same database, and the datetime field with microseconds causes problems for it.

This is a really interesting question. I looked through the source code, and here is the reason the datetime is set with fractional seconds. The following snippet is from the file django/db/backends/mysql/base.py:
class DatabaseWrapper(BaseDatabaseWrapper):
    vendor = 'mysql'
    # This dictionary maps Field objects to their associated MySQL column
    # types, as strings. Column-type strings can contain format strings; they'll
    # be interpolated against the values of Field.__dict__ before being output.
    # If a column type is set to None, it won't be included in the output.
    _data_types = {
        'AutoField': 'integer AUTO_INCREMENT',
        'BinaryField': 'longblob',
        'BooleanField': 'bool',
        'CharField': 'varchar(%(max_length)s)',
        'CommaSeparatedIntegerField': 'varchar(%(max_length)s)',
        'DateField': 'date',
        'DateTimeField': 'datetime',
        'DecimalField': 'numeric(%(max_digits)s, %(decimal_places)s)',
        'DurationField': 'bigint',
        'FileField': 'varchar(%(max_length)s)',
        'FilePathField': 'varchar(%(max_length)s)',
        'FloatField': 'double precision',
        'IntegerField': 'integer',
        'BigIntegerField': 'bigint',
        'IPAddressField': 'char(15)',
        'GenericIPAddressField': 'char(39)',
        'NullBooleanField': 'bool',
        'OneToOneField': 'integer',
        'PositiveIntegerField': 'integer UNSIGNED',
        'PositiveSmallIntegerField': 'smallint UNSIGNED',
        'SlugField': 'varchar(%(max_length)s)',
        'SmallIntegerField': 'smallint',
        'TextField': 'longtext',
        'TimeField': 'time',
        'UUIDField': 'char(32)',
    }
    @cached_property
    def data_types(self):
        if self.features.supports_microsecond_precision:
            return dict(self._data_types, DateTimeField='datetime(6)', TimeField='time(6)')
        else:
            return self._data_types
    # ... further class methods
In the method data_types the if condition checks the MySQL version. The method supports_microsecond_precision comes from the file django/db/backends/mysql/features.py:
class DatabaseFeatures(BaseDatabaseFeatures):
    # ... properties and methods
    def supports_microsecond_precision(self):
        # See https://github.com/farcepest/MySQLdb1/issues/24 for the reason
        # about requiring MySQLdb 1.2.5
        return self.connection.mysql_version >= (5, 6, 4) and Database.version_info >= (1, 2, 5)
So when you use MySQL 5.6.4 or higher the field DateTimeField is mapped to datetime(6).
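The version check relies on Python comparing tuples element by element. As a standalone restatement of that logic (a simplified sketch, not Django's actual code; the function name datetime_column_type and its parameters are made up for this illustration):

```python
def datetime_column_type(mysql_version, driver_version):
    """Pick the column type the way the two snippets above combine to do.

    Both arguments are version tuples such as (5, 7, 20); tuples compare
    element by element, so (5, 7, 20) >= (5, 6, 4) is True.
    """
    supports_microseconds = (
        mysql_version >= (5, 6, 4) and driver_version >= (1, 2, 5)
    )
    return "datetime(6)" if supports_microseconds else "datetime"


new_server = datetime_column_type((5, 7, 20), (1, 3, 0))  # recent server and driver
old_server = datetime_column_type((5, 5, 0), (1, 3, 0))   # server older than 5.6.4
old_driver = datetime_column_type((5, 7, 20), (1, 2, 3))  # driver older than 1.2.5
```

Note that both conditions must hold: a recent server paired with an old driver still ends up with plain datetime.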
I couldn't find any option in Django to adjust this, so I ended up monkey patching:
from django.db.backends.mysql.base import DatabaseWrapper
DatabaseWrapper.data_types = DatabaseWrapper._data_types
Put the above code where it best suits your needs, be it models.py, __init__.py, or some other file.
When running migrations, Django will then create a datetime column rather than datetime(6) for DateTimeField, even if you're using MySQL 5.7.

This answer gave me an idea. What if you try to manually change the migrations.
First run python manage.py makemigrations and after that edit the file 0001_initial.py (or whatever the name is) in the subdirectory migrations of your app:
from django.db import migrations, models
class Migration(migrations.Migration):
    operations = [
        migrations.CreateModel(
            name='People',
            fields=[
                # the fields
                # ... in this part comment or delete create_time
            ],
        ),
        migrations.RunSQL(
            "ALTER TABLE people ADD COLUMN create_time datetime(0)",
            reverse_sql="ALTER TABLE people DROP COLUMN create_time",
            state_operations=[
                # keeps Django's model state in sync with the raw SQL above
                migrations.AddField(
                    model_name='people',
                    name='create_time',
                    field=models.DateTimeField(),
                )
            ]
        )
    ]
This is just an example. You can try with different options and check with:
python manage.py sqlmigrate yourapp 0001
what the SQL output is. Instead of yourapp and 0001 provide the name of your app and the number of the migration.
Here is a link to the official documentation about fractional seconds time values.
EDIT: I tested the code above with MySQL 5.7 and it works as expected. Maybe it can help someone else. If you get some errors, check that you have installed mysqlclient and sqlparse.

cassandra not set default value for new column added later in python model

I have code like the one below.
from uuid import uuid4
from uuid import uuid1
from cassandra.cqlengine import columns, connection
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.management import sync_table
class BaseModel(Model):
    __abstract__ = True
    id = columns.UUID(primary_key=True, default=uuid4)
    created_timestamp = columns.TimeUUID(primary_key=True,
                                         clustering_order='DESC',
                                         default=uuid1)
    deleted = columns.Boolean(required=True, default=False)
class OtherModel(BaseModel):
    __table_name__ = 'other_table'
if __name__ == '__main__':
    connection.setup(hosts=['localhost'],
                     default_keyspace='test')
    sync_table(OtherModel)
    OtherModel.create()
After the first execution, I can see the record in the db when I run this query:
cqlsh> select * from test.other_table;
id | created_timestamp | deleted
--------------------------------------+--------------------------------------+---------
febc7789-5806-44d8-bbd5-45321676def9 | 840e1b66-cc73-11e6-a66c-38c986054a88 | False
(1 rows)
After this, I added a new column, name, to OtherModel and ran the same program.
class OtherModel(BaseModel):
    __table_name__ = 'other_table'
    name = columns.Text(required=True, default='')
if __name__ == '__main__':
    connection.setup(hosts=['localhost'],
                     default_keyspace='test')
    sync_table(OtherModel)
    OtherModel.create(name='test')
When I check the db entries:
cqlsh> select * from test.other_table;
id | created_timestamp | deleted | name
--------------------------------------+--------------------------------------+---------+------
936cfd6c-44a4-43d3-a3c0-fdd493144f4b | 4d7fd78c-cc74-11e6-bb49-38c986054a88 | False | test
febc7789-5806-44d8-bbd5-45321676def9 | 840e1b66-cc73-11e6-a66c-38c986054a88 | False | null
(2 rows)
There is one row with name as null.
But I can't query on the null value.
cqlsh> select * from test.other_table where name=null;
InvalidRequest: code=2200 [Invalid query] message="Unsupported null value for indexed column name"
I found the reference "How Can I Search for Records That Have A Null/Empty Field Using CQL?".
When I set default='' in the model, why is it not applied to all the null values in the table?
Is there any way to set the null values in name to the default value '' with a query?

The null cell is actually just a cell that was never set. The absence of data isn't something you can query on, since it's a filtering operation. It isn't scalable or possible to do efficiently, so it's not something C* will encourage (or, in this case, even allow).
Going back and retroactively setting values on all the previously created rows would be very expensive (it has to read everything, then do as many writes). It's pretty easy on the application side, though, to just say that if name is null, it's ''.
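That application-side fallback could look roughly like the sketch below. It is plain Python and deliberately independent of cqlengine; the helper name name_or_default and the dictionaries standing in for fetched rows are assumptions made for the illustration.

```python
def name_or_default(name, default=""):
    """Treat a cell that was never written (read back as None) as the default."""
    return default if name is None else name


# Stand-ins for rows read from other_table: the older row has no name cell.
rows = [
    {"name": "test"},
    {"name": None},
]
names = [name_or_default(row["name"]) for row in rows]
```

The stored data stays untouched; the default is applied only at read time, so no table-wide rewrite is needed.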

SQLAlchemy Column to Row Transformation and vice versa -- is it possible?

I'm looking for a SQLAlchemy only solution for converting a dict received from a form submission into a series of rows in the database, one for each field submitted. This is to handle preferences and settings that vary widely across applications. But, it's very likely applicable to creating pivot table like functionality. I've seen this type of thing in ETL tools but I was looking for a way to do it directly in the ORM. I couldn't find any documentation on it but maybe I missed something.
Example:
Submitted from form: {"UniqueId":1, "a":23, "b":"Hello", "c":"World"}
I would like it to be transformed (in the ORM) so that it is recorded in the database like this:
_______________________________________
|UniqueId| ItemName | ItemValue |
---------------------------------------
| 1 | a | 23 |
---------------------------------------
| 1 | b | Hello |
---------------------------------------
| 1 | c | World |
---------------------------------------
Upon a select the result would be transformed (in the ORM) back into a row of data from each of the individual values.
---------------------------------------------------
| UniqueId | a | b | c |
---------------------------------------------------
| 1 | 23 | Hello | World |
---------------------------------------------------
I would assume on an update that the best course of action would be to wrap a delete/create in a transaction so the current records would be removed and the new ones inserted.
The definitive list of ItemNames will be maintained in a separate table.
Totally open to more elegant solutions but would like to keep out of the database side if at all possible.
I'm using the declarative_base approach with SQLAlchemy.
Thanks in advance...
Cheers,
Paul

Here is a slightly modified example from the documentation that maps such a table structure to a dictionary in the model:
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Text, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.collections import attribute_mapped_collection
from sqlalchemy.ext.associationproxy import association_proxy
# relationship() is the current name of the older relation() alias
from sqlalchemy.orm import relationship, sessionmaker
metadata = MetaData()
Base = declarative_base(metadata=metadata, name='Base')
class Item(Base):
    __tablename__ = 'Item'
    UniqueId = Column(Integer, ForeignKey('ItemSet.UniqueId'),
                      primary_key=True)
    ItemSet = relationship('ItemSet')
    ItemName = Column(String(10), primary_key=True)
    ItemValue = Column(Text)  # Use PickleType?
def _create_item(ItemName, ItemValue):
    return Item(ItemName=ItemName, ItemValue=ItemValue)
class ItemSet(Base):
    __tablename__ = 'ItemSet'
    UniqueId = Column(Integer, primary_key=True)
    # dictionary of Item objects keyed by ItemName
    _items = relationship(Item,
                          collection_class=attribute_mapped_collection('ItemName'))
    # dictionary view of ItemName -> ItemValue
    items = association_proxy('_items', 'ItemValue', creator=_create_item)
engine = create_engine('sqlite://', echo=True)
metadata.create_all(engine)
session = sessionmaker(bind=engine)()
data = {"UniqueId": 1, "a": 23, "b": "Hello", "c": "World"}
s = ItemSet(UniqueId=data.pop("UniqueId"))
s.items = data
session.add(s)
session.commit()
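For readers who want to see the transformation itself, separate from the ORM mapping, here is a plain-Python sketch of the two directions described in the question. It is an illustration only: the function names dict_to_rows and rows_to_dicts are hypothetical, and the tuples stand in for (UniqueId, ItemName, ItemValue) database rows.

```python
def dict_to_rows(data, key="UniqueId"):
    """Unpivot one submitted dict into (UniqueId, ItemName, ItemValue) tuples."""
    unique_id = data[key]
    return [(unique_id, name, value) for name, value in data.items() if name != key]


def rows_to_dicts(rows, key="UniqueId"):
    """Pivot the tuples back into one dict per UniqueId."""
    grouped = {}
    for unique_id, name, value in rows:
        # Start a new dict for an unseen id, then add this item to it.
        grouped.setdefault(unique_id, {key: unique_id})[name] = value
    return list(grouped.values())


submitted = {"UniqueId": 1, "a": 23, "b": "Hello", "c": "World"}
rows = dict_to_rows(submitted)
restored = rows_to_dicts(rows)
```

In the mapped version above, the association proxy plays both roles: assigning a dict to s.items creates one Item per key, and reading s.items returns the dictionary view again.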
