I'm pretty new to database and server related tasks. I currently have two tables stored in an MSSQL database on a server, and I'm trying to use the Python package SQLAlchemy to pull some of the data to my local machine. The first table uses the default schema dbo, and I was able to use the connect string
'mssql+pyodbc://<username>:<password>@<dsnname>'
to inspect the table. The other table, however, has a customized schema, and I don't see any information about it when I use the same commands. I assume this is because the second table has a different schema and the Python package can't find it anymore.
I was looking at automap, hoping the package offers a way to deal with a customized schema, but I don't quite understand many of the concepts described there. Since I'm not trying to alter the database, just pull data, I'm not sure it's the right approach. Any suggestions?
Thanks
In the case of automap, you should pass the schema argument when preparing reflectively:
AutomapBase.prepare(reflect=True, schema='myschema')
If you wish to reflect both the default schema and your "customized schema" using the same automapper, then first reflect both schemas using the MetaData instance and after that prepare the automapper:
AutomapBase.metadata.reflect()
AutomapBase.metadata.reflect(schema='myschema')
AutomapBase.prepare()
If you call AutomapBase.prepare(reflect=True, ...) consecutively for both schemas, the automapper will recreate and replace the classes from the first prepare, because those tables already exist in the metadata, and this will raise warnings.
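Here is a minimal end-to-end sketch of the second approach, assuming a DSN and a custom schema named 'myschema' (all placeholders), on a reasonably recent SQLAlchemy where MetaData.reflect takes a bind argument:

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base

# Placeholder connection details; adjust for your DSN
engine = create_engine('mssql+pyodbc://<username>:<password>@<dsnname>')

Base = automap_base()
# Reflect tables from the default schema (dbo) and from the custom schema
Base.metadata.reflect(bind=engine)
Base.metadata.reflect(bind=engine, schema='myschema')
# Map classes for everything now present in the metadata
Base.prepare()

# Mapped classes are available by table name, e.g. Base.classes.some_table

Since you only want to pull data, reflection like this never alters anything on the server; you can then query the mapped classes with a Session.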
I'm new to using Postgres, so I'm not sure if this is a basic question. I get a Postgres dump from my company that I load as a Postgres schema using pgAdmin. I then use the psycopg2 package in Python to load the various tables of this schema into pandas dataframes and do whatever processing I need.
Every time I get a new version of the data, I have to go through a three-step process in pgAdmin to update the schema before I work with it in Python:
1. "Drop cascade" the existing public schema
2. Create a new public schema
3. Restore the public schema by pointing it to the new pg_dump file
Rather than doing these three steps in pgAdmin, can this be done programmatically from within a Python script?
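This can be done from Python. Below is a rough sketch of those three steps, assuming psycopg2 is installed, the pg_restore binary is on the PATH, and the dump is in a format pg_restore understands; the connection details and file name are placeholders.

import subprocess
import psycopg2

# Placeholder connection details
conn = psycopg2.connect(host='localhost', dbname='mydb',
                        user='myuser', password='mypassword')
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute('DROP SCHEMA public CASCADE')   # step 1
    cur.execute('CREATE SCHEMA public')         # step 2
conn.close()

# Step 3: restore the public schema from the new dump file
# (pg_restore reads the password from PGPASSWORD or ~/.pgpass)
subprocess.run(['pg_restore', '--host', 'localhost', '--username', 'myuser',
                '--dbname', 'mydb', '--schema', 'public', 'new_dump.backup'],
               check=True)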
I'm designing a database that has an API layer over it to get data from the tables. The database is Postgres. Every night we run a batch ETL process to update the data in the database. Due to some complications that aren't worth mentioning, the ETL process involves wiping out all of the data and rebuilding things from scratch.
Obviously, this is problematic for the API because if the API queries the database during the rebuilding phase, data will be missing.
I've decided to solve this by using two schemas. The "finished" schema (let's call this schema A) and the "rebuilding" schema (let's call this schema B). My ETL process looks like this:
1. Create schema B as an exact replica of schema A
2. Completely rebuild the data in schema B
3. In a transaction, drop schema A and rename schema B to schema A
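(For reference, step 3 can be a single short transaction, for example via SQLAlchemy; this sketch assumes the schemas are literally named schema_a and schema_b, and the connection URL is a placeholder.)

from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg2://user:password@localhost/mydb')

# engine.begin() runs both DDL statements in one transaction
with engine.begin() as conn:
    conn.execute(text('DROP SCHEMA schema_a CASCADE'))
    conn.execute(text('ALTER SCHEMA schema_b RENAME TO schema_a'))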
The problem I'm currently running into is that I'm using SQLAlchemy Session and Table objects, and the tables are bound to schema A by virtue of their metadata.
I would like to be able to do session.add(obj) and have it add that data to schema B. After all, schema A and schema B are exactly the same so the table definitions should be valid for both.
I'm wondering if anyone has any recommendations on how I can use SQLAlchemy's Session and/or Table objects to dynamically select which schema I should be using.
I still want sessions/tables to point to schema A because the same code is reused in the API layer. I only want to use schema B during this one step.
I ended up solving this by wrapping my table definitions in functions that accept a SQLAlchemy MetaData object and return the table definition bound to that MetaData object.
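A minimal sketch of that pattern with Core Table objects; the table, column, and schema names are placeholders:

from sqlalchemy import create_engine, Table, Column, Integer, String, MetaData

engine = create_engine('postgresql+psycopg2://user:password@localhost/mydb')

def user_table(metadata):
    # The same table definition, bound to whichever MetaData is passed in
    return Table(
        'users', metadata,
        Column('id', Integer, primary_key=True),
        Column('name', String(100)),
    )

# Normal operation: definitions live in schema A (the default search_path here)
metadata_a = MetaData()
users_a = user_table(metadata_a)

# During the rebuild: the identical definition, but qualified with schema B
metadata_b = MetaData(schema='schema_b')
users_b = user_table(metadata_b)
metadata_b.create_all(engine)   # builds the tables in schema B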
I'm writing a SQLAlchemy app that needs to connect to a PostgreSQL database and a MySQL database. Basically I'm loading the data from an existing MySQL database, doing some transforms on it, and then saving it in PostgreSQL.
I am managing the PostgreSQL schema using SQLAlchemy's declarative base. The MySQL database already exists, and I am accessing the schema via SQLAlchemy's reflection. Both have very different schemas.
I know I need dedicated engines for each database, but I'm unclear on whether I need dedicated objects of any of the following:
Base - I think this corresponds to the database schema. Since both databases have very different schemas, I will need a dedicated Base for each schema.
Metadata - Is this intended to be a single global metadata object that holds all schemas from all engines?
Sessions - I'm not sure, but I think I need separate sessions for each database? Or can a single session share multiple engine/Base combos? I'm using scoped_session.
Part of my confusion comes from not understanding the difference between Base and Metadata. The SQLAlchemy docs say:
MetaData is a container object that keeps together many different features of a database (or multiple databases) being described.
This seems to imply that a single MetaData can hold multiple Bases, but I'm still a bit fuzzy on how that works. For example, I want to be able to call metadata.create_all() and create tables in PostgreSQL, but not in MySQL.
The short answer is that it's easiest to have separate instances of them all for both databases. It is possible to create a single routing session, but it has its caveats.
sessionmaker and Session also accept a binds argument mapping tables or mappers to engines, as well as two-phase commits, which can also allow using a single session with multiple databases. As luck would have it, PostgreSQL and MySQL are both databases that support two-phase commits.
About the relationship between Base and MetaData:
Base is a base class that has a metaclass used to declaratively create Table objects from information provided in the class itself and its subclasses. All Table objects implicitly declared by subclasses of Base will share the same MetaData.
You can provide metadata as an argument when creating a new declarative base and thus share it between multiple Bases, but in your case it is not useful.
MetaData is a collection of Table objects and their associated schema constructs. It can also hold a binding to an Engine or Session.
In short, you can have Tables and MetaData without a Base, but a Base requires MetaData to function.
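A minimal sketch of the separate-everything setup, with a toy declarative model for PostgreSQL and reflection for MySQL; the URLs and names are placeholders, and declarative_base is imported from sqlalchemy.orm as in SQLAlchemy 1.4+:

from sqlalchemy import create_engine, MetaData, Column, Integer, String
from sqlalchemy.orm import declarative_base, scoped_session, sessionmaker

pg_engine = create_engine('postgresql+psycopg2://user:password@localhost/target')
mysql_engine = create_engine('mysql+pymysql://user:password@localhost/source')

# Declarative Base (with its own MetaData) for the PostgreSQL schema you manage
PgBase = declarative_base()

class Item(PgBase):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

# A separate MetaData for the existing MySQL schema, filled in by reflection
mysql_metadata = MetaData()
mysql_metadata.reflect(bind=mysql_engine)

# Creates tables in PostgreSQL only, because PgBase.metadata knows nothing about MySQL
PgBase.metadata.create_all(pg_engine)

# One scoped session per database
PgSession = scoped_session(sessionmaker(bind=pg_engine))
MySQLSession = scoped_session(sessionmaker(bind=mysql_engine))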
I need to read data from MySQL, process it with a Python script, and write the result into SQLite.
I also need to convert the MySQL create definitions to SQLite create definitions.
Are there any existing Python libraries to convert MySQL data types (including SET, ENUM, TIMESTAMP, etc.) to SQLite data types, or should I write it myself?
Depending on your use case, you could use an ORM library like peewee for abstracting away the MySQL and SQLite databases.
One possible way of approaching your problem would be to use peewee's model generator for creating models from the MySQL database, which you can later reuse for the SQLite one, using this example as a reference.
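A rough sketch of that workflow, assuming models.py was generated with peewee's pwiz tool (something like python -m pwiz -e mysql -u myuser -P mydb > models.py) and defines a model named Record; every name here is a placeholder:

from peewee import SqliteDatabase
from models import Record   # generated by pwiz, bound to the MySQL database

# 1. Read the rows while the model is still bound to MySQL
rows = list(Record.select().dicts())

# 2. Process the rows with your Python script here
# rows = transform(rows)

# 3. Temporarily rebind the same model to SQLite and write the result;
#    peewee translates the field types into SQLite column types
sqlite_db = SqliteDatabase('output.db')
with sqlite_db.bind_ctx([Record]):
    sqlite_db.create_tables([Record])
    Record.insert_many(rows).execute()

Exotic MySQL types such as SET or ENUM may still need manual adjustment in the generated models.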
This might sound like a bit of an odd question - but is it possible to load data from a (in this case MySQL) table to be used in Django without the need for a model to be present?
I realise this isn't really the Django way, but given my current scenario, I don't really know how better to solve the problem.
I'm working on a site which, for one aspect, makes use of a table of data that has been bought from a third party. The columns of interest are likely to remain stable, but the structure of the table could change with subsequent updates to the data set. The table is also massive in terms of columns, so I'm not keen on typing out each field in the model one by one. I'd also like to leave the table intact, so coming up with a model that represents only the set of columns I am interested in is not really an ideal solution.
Ideally, I want to have this table in a database somewhere (possibly separate from the main site database) and access its contents directly using SQL.
You can always execute raw SQL directly against the database: see the docs.
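A minimal sketch, assuming the table is reachable through one of the databases configured in settings.DATABASES and is named third_party_data (both the alias and the table name are placeholders):

from django.db import connections

def interesting_rows(value):
    # Use 'default' or the alias of the separate database from settings.DATABASES
    with connections['default'].cursor() as cursor:
        cursor.execute(
            "SELECT col_a, col_b FROM third_party_data WHERE col_a = %s",
            [value],
        )
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]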
Django also has a feature called inspectdb for legacy databases like MySQL: it creates models automatically by inspecting your database tables (python manage.py inspectdb > models.py) and stores them in your app's models.py, so you don't need to type out every column manually. But read the documentation carefully before creating the models, because it may affect the data in the database. I hope this is useful for you.
I guess you can use any SQL library available for Python, for example http://www.sqlalchemy.org/.
You then just have to connect to your database, run your query, and use the data however you want. I don't think you can use Django's ORM without its model system, but nothing prevents you from using another library for this in parallel.
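A minimal sketch of that approach with SQLAlchemy running alongside Django; the connection URL and table name are placeholders:

from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://user:password@localhost/thirdparty')

with engine.connect() as conn:
    result = conn.execute(text("SELECT col_a, col_b FROM third_party_data"))
    for row in result:
        print(row.col_a, row.col_b)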