Importing JSON Data to SQL using SQLAlchemy (Python)

So I have a set of JSON files and would like to import them to my sqlite database using sqlalchemy.
The way that I am thinking is:
Declare a class in Python with all the variable names:
class Designs(Base):
    __tablename__ = 'designs'
    __table_args__ = {'sqlite_autoincrement': True}

    design_name = Column(String(80), nullable=False, primary_key=True)
    user_name = Column(String(80), nullable=False, primary_key=True)
    rev_tag = Column(String(80), nullable=False, primary_key=True)
    # ...... many more columns ......
Read the JSON (using the Python json package) and store the records one by one:
import json

data = json.load(open('xxx.json'))
for key, value in data.items():
    ...  # store it in the SQL database
But if my JSON file is very big, declaring all the variables in the class seems very troublesome and hard to maintain, as I plan to keep growing my JSON file.
Is there a better way to do this?

SQLAlchemy offers a classical mapping interface in addition to the declarative interface you have shown. Using it, you can add columns programmatically.
from sqlalchemy import Column, MetaData, String, Table
from sqlalchemy.orm import mapper

metadata = MetaData()

# This tuple of columns could be generated programmatically
columns = (
    Column('design_name', String(80), primary_key=True),
    Column('user_name', String(80), nullable=False),
    Column('rev_tag', String(80), nullable=False),
    ...
)

designs = Table('designs', metadata, *columns)

class Designs(object):
    def __init__(self, json_data):
        for key, value in json_data.items():
            setattr(self, key, value)

mapper(Designs, designs)
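A minimal usage sketch under a few assumptions of mine (the engine URL, the JSON file name, and the idea that the file holds a list of flat design records are placeholders, not from the question):
import json
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///designs.db')  # placeholder database URL
metadata.create_all(engine)                     # creates the 'designs' table

Session = sessionmaker(bind=engine)
session = Session()

with open('xxx.json') as f:
    data = json.load(f)

# assuming the file contains a list of flat JSON objects
for record in data:
    session.add(Designs(record))
session.commit()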

Related

Copying data from one sqlalchemy session to another

I have an SQLAlchemy schema containing three tables (A, B, and C) related via one-to-many foreign key relationships (A->B and B->C), with SQLite as the backend. I create separate database files to store data, each of which uses the exact same SQLAlchemy models and runs identical code to put data into them.
I want to be able to copy the data from all these individual databases into a single new database file, while preserving the foreign key relationships. I tried the following code to copy data from one file to a new file:
import sqlalchemy
from sqlalchemy.ext import declarative
from sqlalchemy import Column, String, Integer
from sqlalchemy import orm, engine

Base = declarative.declarative_base()
Session = orm.sessionmaker()

class A(Base):
    __tablename__ = 'A'

    a_id = Column(Integer, primary_key=True)
    adata = Column(String)
    b = orm.relationship('B', back_populates='a', cascade='all, delete-orphan',
                         passive_deletes=True)

class B(Base):
    __tablename__ = 'B'

    b_id = Column(Integer, primary_key=True)
    a_id = Column(Integer, sqlalchemy.ForeignKey('A.a_id', ondelete='SET NULL'))
    bdata = Column(String)
    a = orm.relationship('A', back_populates='b')
    c = orm.relationship('C', back_populates='b', cascade='all, delete-orphan',
                         passive_deletes=True)

class C(Base):
    __tablename__ = 'C'

    c_id = Column(Integer, primary_key=True)
    b_id = Column(Integer, sqlalchemy.ForeignKey('B.b_id', ondelete='SET NULL'))
    cdata = Column(String)
    b = orm.relationship('B', back_populates='c')

file_new = 'file_new.db'
resource_new = 'sqlite:////%s' % file_new.lstrip('/')
engine_new = sqlalchemy.create_engine(resource_new, echo=False)
session_new = Session(bind=engine_new)

file_old = 'file_old.db'
resource_old = 'sqlite:////%s' % file_old.lstrip('/')
engine_old = sqlalchemy.create_engine(resource_old, echo=False)
session_old = Session(bind=engine_old)

for arow in session_old.query(A):
    session_new.add(arow)  # I am assuming that this will somehow know to copy all the
                           # child rows from tables B and C due to the foreign keys.
When run, I get the error "Object '' is already attached to session '2' (this is '1')". Any pointers on how to do this using SQLAlchemy and sessions? I also want to preserve the foreign key relationships within each database.
The use case is that data is first generated locally on non-networked machines and aggregated into a central DB in the cloud. While the data will be generated in SQLite, the merge might happen in MySQL or Postgres, although here everything is happening in SQLite for simplicity.
First, the reason you get that error is because the instance arow is still tracked by session_old, so session_new will refuse to deal with it. You can detach it from session_old:
session_old.expunge(arow)
This will allow you to add arow to session_new without issue, but you'll notice that nothing gets inserted into file_new. This is because SQLAlchemy knows that arow is persistent (meaning there's a row in the db corresponding to this object), and when you detach it and add it to session_new, SQLAlchemy still thinks it's persistent, so it does not get inserted again.
This is where Session.merge comes in. One caveat is that it won't merge unloaded relationships, so you'll need to eager load all the relationships you want to merge:
query = session_old.query(A).options(orm.subqueryload(A.b),
                                     orm.subqueryload(A.b, B.c))
for arow in query:
    session_new.merge(arow)
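As a minimal sketch of the whole flow (my assumption: the target file starts out empty, so the schema has to be created on the new engine first, and the merge needs a commit at the end):
# create the tables in the new database before merging
Base.metadata.create_all(engine_new)

query = session_old.query(A).options(orm.subqueryload(A.b),
                                     orm.subqueryload(A.b, B.c))
for arow in query:
    session_new.merge(arow)

session_new.commit()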

How to do a proper upsert using sqlalchemy on postgresql?

I would like to do an upsert using the "new" functionality added by PostgreSQL 9.5, using SQLAlchemy Core. While it is implemented, I'm pretty confused by the syntax, which I can't adapt to my needs.
Here is sample code of what I would like to be able to do:
from sqlalchemy import Column, Integer, MetaData, Table, bindparam, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import insert as psql_insert

Base = declarative_base()

class User(Base):
    __tablename__ = 'test'

    a_id = Column('id', Integer, primary_key=True)
    a = Column('a', Integer)

engine = create_engine('postgres://name:password@localhost/test')
User().metadata.create_all(engine)

meta = MetaData(engine)
meta.reflect()
table = Table('test', meta, autoload=True)
conn = engine.connect()

stmt = psql_insert(table).values({
    table.c['id']: bindparam('id'),
    table.c['a']: bindparam('a'),
})
stmt = stmt.on_conflict_do_update(
    index_elements=[table.c['id']],
    set_={'a': bindparam('a')},
)

list_of_dictionary = [{'id': 1, 'a': 1}, {'id': 2, 'a': 2}]
conn.execute(stmt, list_of_dictionary)
I basically want to insert a bulk of rows, and if an id is already taken, I want to update it with the value I initially wanted to insert.
However, SQLAlchemy throws this error:
CompileError: bindparam() name 'a' is reserved for automatic usage in the VALUES or SET clause of this insert/update statement. Please use a name other than column name when using bindparam() with insert() or update() (for example, 'b_a').
While this is a known issue (see https://groups.google.com/forum/#!topic/sqlalchemy/VwiUlF1cz_o), I haven't found any proper answer that does not require modifying either the keys of list_of_dictionary or the names of your columns.
I want to know if there is a way of constructing stmt so that it behaves consistently, regardless of whether the keys of list_of_dictionary are the names of the columns of the target table (my code works without error in those cases).
This does the trick for me:
import warnings

from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.inspection import inspect

def upsert(engine, schema, table_name, records=[]):
    metadata = MetaData(schema=schema)
    metadata.bind = engine
    table = Table(table_name, metadata, schema=schema, autoload=True)

    # get list of fields making up the primary key
    primary_keys = [key.name for key in inspect(table).primary_key]

    # assemble base statement
    stmt = postgresql.insert(table).values(records)

    # define dict of non-primary keys for updating
    update_dict = {
        c.name: c
        for c in stmt.excluded
        if not c.primary_key
    }

    # cover the case when all columns in the table comprise the primary key,
    # in which case the upsert is identical to 'on conflict do nothing'
    if update_dict == {}:
        warnings.warn('no updateable columns found for table')
        # we still wanna insert without errors
        insert_ignore(table_name, records)  # helper assumed to be defined elsewhere by the author
        return None

    # assemble new statement with 'on conflict do update' clause
    update_stmt = stmt.on_conflict_do_update(
        index_elements=primary_keys,
        set_=update_dict,
    )

    # execute
    with engine.connect() as conn:
        result = conn.execute(update_stmt)
    return result
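A hypothetical call, assuming an existing engine and a test table whose primary key is id (the URL, schema, and table names here are placeholders of mine):
engine = create_engine('postgresql://user:password@localhost/test')  # placeholder URL
records = [{'id': 1, 'a': 1}, {'id': 2, 'a': 2}]
upsert(engine, schema='public', table_name='test', records=records)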
For anyone looking for an ORM solution, the following worked for me:
from typing import Any, Dict, Union

from sqlalchemy.orm import DeclarativeMeta  # SQLAlchemy 1.4+; older: from sqlalchemy.ext.declarative import DeclarativeMeta
from sqlalchemy.orm import scoped_session, sessionmaker

def upsert(
    sa_sessionmaker: Union[sessionmaker, scoped_session],
    model: DeclarativeMeta,
    get_values: Dict[str, Any],
    update_values: Dict[str, Any],
) -> Any:
    """Upserts (updates if exists, else inserts) a SQLAlchemy model object.

    Note that get_values must uniquely identify a single model object (row) for this
    function to work.

    Args:
        sa_sessionmaker: SQLAlchemy sessionmaker to connect to the database.
        model: Model declarative metadata.
        get_values: Arguments used to try to retrieve an existing object.
        update_values: Desired attributes for the object fetched via get_values,
            or the new object if nothing was fetched.

    Returns:
        Model object subject to upsert.
    """
    with sa_sessionmaker() as session:
        instance = session.query(model).filter_by(**get_values).one_or_none()
        if instance:
            for attr, new_val in update_values.items():
                setattr(instance, attr, new_val)
        else:
            create_kwargs = get_values | update_values
            session.add(model(**create_kwargs))
        session.commit()
        instance = session.query(model).filter_by(**get_values).one_or_none()
    return instance
A few remarks:
If the primary key of the object is known, using Session.merge() is likely a better alternative than the function above. In that sense, the function above assumes that the primary key is not known (and hence not part of get_values).
sa_sessionmaker is a factory for Session objects (see the docs).
model takes a SQLAlchemy declarative class (i.e., a "table"; see the docs).
Python >= 3.9 is required for the implementation above because of the dict union operator. If your environment requires a previous version of Python, replace create_kwargs = get_values | update_values with create_kwargs = {**get_values, **update_values}.
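A hypothetical usage sketch (User, email, name, and engine are illustrative names of mine, not from the answer):
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)  # engine assumed to already exist

user = upsert(
    sa_sessionmaker=Session,
    model=User,                                 # hypothetical mapped class
    get_values={'email': 'alice@example.com'},  # must uniquely identify one row
    update_values={'name': 'Alice'},
)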

Sqlalchemy if table does not exist

I wrote a module which is meant to create an empty database file:
def create_database():
    engine = create_engine("sqlite:///myexample.db", echo=True)
    metadata = MetaData(engine)
    metadata.create_all()
But in another function, I want to open the myexample.db database and create a table in it if it doesn't already have that table.
E.g. the first table I would subsequently create would be:
Table(Variable_TableName, metadata,
      Column('Id', Integer, primary_key=True, nullable=False),
      Column('Date', Date),
      Column('Volume', Float))
(Since it is initially an empty database, it will have no tables in it, but subsequently I can add more tables to it. That's what I'm trying to say.)
Any suggestions?
I've managed to figure out what I intended to do. I used engine.dialect.has_table(engine, Variable_tableName) to check whether the database already has the table. If it doesn't, the code proceeds to create the table in the database.
Sample code:
engine = create_engine("sqlite:///myexample.db")  # Access the DB Engine
if not engine.dialect.has_table(engine, Variable_tableName):  # If table doesn't exist, create it.
    metadata = MetaData(engine)
    # Create a table with the appropriate Columns
    Table(Variable_tableName, metadata,
          Column('Id', Integer, primary_key=True, nullable=False),
          Column('Date', Date), Column('Country', String),
          Column('Brand', String), Column('Price', Float))
    # Implement the creation
    metadata.create_all()
This seems to be giving me what i'm looking for.
Note that the 'Base.metadata' documentation states about create_all:
Conditional by default, will not attempt to recreate tables already
present in the target database.
And you can see that create_all takes these arguments: create_all(self, bind=None, tables=None, checkfirst=True), where checkfirst, according to the documentation:
Defaults to True, don't issue CREATEs for tables already present in
the target database.
So if I understand your question correctly, you can just skip the condition.
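In other words, a minimal sketch (assuming your declarative models are registered on Base): calling create_all repeatedly is safe, because checkfirst=True only issues CREATE TABLE for tables that are missing.
engine = create_engine("sqlite:///myexample.db")

# Creates only the tables that don't exist yet; existing tables are left alone.
Base.metadata.create_all(engine, checkfirst=True)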
The accepted answer prints a warning that engine.dialect.has_table() is only for internal use and not part of the public API. The message suggests this as an alternative, which works for me:
import os

import sqlalchemy
from sqlalchemy import create_engine

# Set up a connection to a SQLite3 DB
test_db = os.getcwd() + "/test.sqlite"
db_connection_string = "sqlite:///" + test_db
engine = create_engine(db_connection_string)

# The recommended way to check for existence
sqlalchemy.inspect(engine).has_table("BOOKS")
See also the SQLAlchemy docs.
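Continuing from the snippet above, a short sketch of combining the check with a conditional create (the BOOKS table definition here is a placeholder of mine):
from sqlalchemy import Column, Integer, MetaData, String, Table

metadata = MetaData()
books = Table(
    "BOOKS", metadata,
    Column("Id", Integer, primary_key=True),
    Column("Title", String),
)

# Only create the table if the inspector cannot find it.
if not sqlalchemy.inspect(engine).has_table("BOOKS"):
    metadata.create_all(engine)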
For those who define the tables first in some models.tables file, among other tables:
this is a code snippet for finding the class that represents the table we want to create (so that later we can use the same code to just query it).
Even together with the if check written above, I still run the create with checkfirst=True:
ORMTable.__table__.create(bind=engine, checkfirst=True)
models.tables
class TableA(Base):
    ...

class TableB(Base):
    ...

class NewTableC(Base):
    id = Column('id', Text)
    name = Column('name', Text)
form
Then in the form action file:
engine = create_engine("sqlite:///myexample.db")
if not engine.dialect.has_table(engine, table_name):
    # Added to models.tables the new table I needed (same Table format as written above)
    table_models = importlib.import_module('models.tables')

    # Grab the class that represents the new table
    # table_name = 'NewTableC'
    ORMTable = getattr(table_models, table_name)

    # checkfirst=True to make sure it doesn't already exist
    ORMTable.__table__.create(bind=engine, checkfirst=True)
engine.dialect.has_table does not work for me on cx_oracle.
I am getting AttributeError: 'OracleDialect_cx_oracle' object has no attribute 'default_schema_name'
I wrote a workaround function:
from sqlalchemy.engine.base import Engine

def orcl_tab_or_view_exists(in_engine: Engine, in_object: str, in_object_name: str) -> bool:
    """Checks if an Oracle table or view exists on the current in_engine connection.

    in_object: 'table' | 'view'
    in_object_name: table_name | view_name
    """
    obj_query = """SELECT {o}_name FROM all_{o}s
                   WHERE owner = SYS_CONTEXT('userenv', 'current_schema')
                   AND {o}_name = '{on}'""".format(o=in_object, on=in_object_name.upper())
    with in_engine.connect() as connection:
        result = connection.execute(obj_query)
        return len(list(result)) > 0
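A hypothetical call (the connection string and object name are placeholders of mine):
from sqlalchemy import create_engine

# placeholder Oracle connection string
oracle_engine = create_engine('oracle+cx_oracle://user:password@host:1521/?service_name=orclpdb1')

if not orcl_tab_or_view_exists(oracle_engine, 'table', 'my_table'):
    # the table is missing; create it here, e.g. via metadata.create_all(oracle_engine)
    pass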
This is the code that works for me to create all tables for all model classes defined on the Base class:
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class YourTable(Base):
    __tablename__ = 'your_table'

    id = Column(Integer, primary_key=True)

DB_URL = "mysql+mysqldb://<user>:<password>@<host>:<port>/<db_name>"
scoped_engine = create_engine(DB_URL)
Base.metadata.create_all(scoped_engine)

Skip metadata.drop_all metadata.create_all if schema didn't change

If drop_all; create_all would not leave me with a changed schema, I would want to skip those two lines.
How can I achieve that?
Background: I use SQLite for caching, and preserving the saved data across code changes isn't worth the effort; yet if data is cached and no code or DB changes have occurred in the meantime, I would like to keep the cached data and just use it.
Note: The question is now of academic interest, as the solution "the developer has to care about migration" was accepted by my team. I would still be interested in how to detect whether the actual DB schema matches the schema derived from the entities.
Create a version table in your database that stores a version number for the schema, then check this number against a number stored in your application.
This is the same approach used by schema migration tools such as alembic. Using a real migration tool and just invoking "upgrade" instead of "create_all()" would be the best practice here.
Comparing schemas is not a simple task. Alembic actually includes a feature which provides this, and which you can also access at the API level, but it has many caveats and is more of a time-saver for when one is generating new migration scripts.
Edit: Alembic's API: compare_metadata
example:
from alembic.migration import MigrationContext
from alembic.autogenerate import compare_metadata
from sqlalchemy.schema import SchemaItem
from sqlalchemy.types import TypeEngine
from sqlalchemy import (create_engine, MetaData, Column,
                        Integer, String, Table)
import pprint

engine = create_engine("sqlite://")

engine.execute('''
    create table foo (
        id integer not null primary key,
        old_data varchar,
        x integer
    )''')

engine.execute('''
    create table bar (
        data varchar
    )''')

metadata = MetaData()

Table('foo', metadata,
      Column('id', Integer, primary_key=True),
      Column('data', Integer),
      Column('x', Integer, nullable=False)
)

Table('bat', metadata,
      Column('info', String)
)

mc = MigrationContext.configure(engine.connect())

diff = compare_metadata(mc, metadata)
pprint.pprint(diff, indent=2, width=20)
output:
[ ( 'add_table',
Table('bat', MetaData(bind=None),
Column('info', String(), table=<bat>), schema=None)),
( 'remove_table',
Table(u'bar', MetaData(bind=None),
Column(u'data', VARCHAR(), table=<bar>), schema=None)),
( 'add_column',
None,
'foo',
Column('data', Integer(), table=<foo>)),
( 'remove_column',
None,
'foo',
Column(u'old_data', VARCHAR(), table=None)),
[ ( 'modify_nullable',
None,
'foo',
u'x',
{ 'existing_server_default': None,
'existing_type': INTEGER()},
True,
False)]]
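To answer the original question with this, here is a rough sketch (my assumption: the application's target schema is metadata, e.g. Base.metadata of your declarative models, and the caveats of compare_metadata mentioned above still apply): drop and recreate only when the diff is non-empty.
from alembic.migration import MigrationContext
from alembic.autogenerate import compare_metadata

def recreate_if_schema_changed(engine, metadata):
    """Drop and recreate all tables only if the live schema differs from metadata."""
    with engine.connect() as conn:
        diff = compare_metadata(MigrationContext.configure(conn), metadata)
    if diff:
        # schema drifted: rebuild (the cached data is lost in this case)
        metadata.drop_all(engine)
        metadata.create_all(engine)
    # empty diff: keep the existing tables and the cached rows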

Dumping SQLAlchemy output to JSON

I have written a small Python script that uses SQLAlchemy to read all records of the DB. Here is some of the code:
Base = declarative_base()
Session = sessionmaker(bind=engine)
cess = Session()

class Test(Base):
    __tablename__ = 'test'

    my_id = Column(Integer, primary_key=True)
    name = Column(String)

    def __init__(self, id, name):
        self.my_id = id
        self.name = name

    def __repr__(self):
        return "<User('%d','%s')>" % (self.my_id, self.name)

query = cess.query(Test.my_id, Test.name).order_by(Test.my_id).all()
Now I want to convert the query result to a JSON string. How can I do this? Using json.dumps(query) throws an exception.
Kind regards
json.dumps will convert objects according to its conversion table.
Since the rows returned by your query are not plain Python types, they cannot be serialized directly. Probably the quickest approach is to convert each returned row to a Python dict and then pass that through to json.dumps.
This answer describes how you might go about converting a table row to a dict.
Or perhaps the _asdict() method of the row object can be used directly:
query = cess.query(Test.my_id, Test.name).order_by(Test.my_id).all()
json.dumps([ row._asdict() for row in query ])
An alternative might be to access the __dict__ attribute directly on each row, although you should check the output to ensure that there are no internal state variables in row.__dict__.
query = cess.query(Test.my_id, Test.name).order_by(Test.my_id).all()
json.dumps([ row.__dict__ for row in query ])
How I did it:
fe = SomeClass.query.get(int(1))
fe_dict = fe.__dict__
del fe_dict['_sa_instance_state']
return flask.jsonify(fe_dict)
Basically, given the object you've retrieved, grab the dict for the class instance, remove the SQLAlchemy state object that can't be JSON serialized, and convert to JSON. I'm using Flask to do this, but I think json.dumps() would work the same way.
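As a more general sketch (an assumption of mine, not from the answers above): you can build the dict from the mapped columns instead of __dict__, which avoids the _sa_instance_state entry entirely, assuming the attribute names match the column names:
import json

def to_dict(obj):
    # serialize a mapped object via its table's columns
    return {col.name: getattr(obj, col.name) for col in obj.__table__.columns}

rows = cess.query(Test).order_by(Test.my_id).all()
print(json.dumps([to_dict(row) for row in rows]))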
