How do I extract table metadata from a database using Python

I want to come up with a minimal set of queries/lines of code that extracts the table metadata within a database, on as many database versions as possible. I'm using PostgreSQL. I'm trying to do this with Python, but I have no clue how, as I'm a Python newbie.
I appreciate your ideas/suggestions on this issue.

You can ask your database driver, in this case psycopg2, to return some metadata about a database connection you've established. You can also ask the database directly about some of its capabilities or schemas, but this is highly dependent on the version of the database you're connecting to, as well as the type of database.
Here's an example taken from http://bytes.com/topic/python/answers/438133-find-out-schema-psycopg for PostgreSQL:
>>> import psycopg2 as db
>>> conn = db.connect('dbname=billings user=steve password=xxxxx port=5432')
>>> curs = conn.cursor()
>>> curs.execute("""select table_name from information_schema.tables WHERE table_schema='public' AND table_type='BASE TABLE'""")
>>> curs.fetchall()
[('contacts',), ('invoicing',), ('lines',), ('task',), ('products',), ('project',)]
However, you would probably be better served by an ORM like SQLAlchemy. It gives you an engine that you can query for information about the database you're connected to, and it normalizes how you connect to different database types.
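For instance, a minimal sketch using SQLAlchemy's inspection API (the connection URL reuses the placeholder credentials and database name from the psycopg2 example above):

from sqlalchemy import create_engine, inspect

# Placeholder connection URL: database name, user, and password match the example above.
engine = create_engine("postgresql+psycopg2://steve:xxxxx@localhost:5432/billings")
inspector = inspect(engine)

# List the tables in the public schema, then the columns of each one.
for table_name in inspector.get_table_names(schema="public"):
    print(table_name)
    for column in inspector.get_columns(table_name, schema="public"):
        print("   ", column["name"], column["type"])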
If you need help with SQLAlchemy, post another question here! There's TONS of information already available by searching the site.

Related

pyodbc and MYSQL connection

I am aware a similar question exists. It has not been marked as answered yet, and I have tried all the suggestions so far. I am also not a native speaker, so please excuse spelling mistakes.
I have written a small class in Python to interact with a SQL Server database.
Now I want to be able to connect to either a SQL Server or a MySQL database with the same functionality.
It would be perfect to just change the connection type when the instance is initiated, to keep my class maintainable. Otherwise I would need to create a second class using, for example, mysql.connector, which would result in two classes with nearly the same structure and content.
This is how I tried to use pyodbc so far:
import pyodbc

conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=xyzhbv;'
                      'Database=Test;'
                      'ENCRYPT=yes;'
                      'UID=root;'
                      'PWD=12345;')
Please note that I changed all credentials.
What do I need to change to use pyodbc for MySQL?
Is that even possible?
Or
Can I use both libraries within one class without them conflicting? (They share function names.)
Many Thanks for any help.
Have a great day.
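One possible direction, sketched below: with pyodbc, the only thing that has to change between SQL Server and MySQL is the connection string, so it can be passed in when the class is instantiated. This is only an illustration; the MySQL ODBC driver name, server, and credentials are assumptions, and MySQL Connector/ODBC has to be installed separately for the second string to work.

import pyodbc

# Hypothetical connection strings; driver names, host, and credentials are assumptions.
SQLSERVER_CSTRING = ('Driver={SQL Server};Server=xyzhbv;Database=Test;'
                     'ENCRYPT=yes;UID=root;PWD=12345;')
MYSQL_CSTRING = ('Driver={MySQL ODBC 8.0 Unicode Driver};Server=xyzhbv;'
                 'Database=Test;UID=root;PWD=12345;')

class Database:
    """Thin wrapper that works with whichever ODBC driver the string names."""

    def __init__(self, connection_string):
        self.conn = pyodbc.connect(connection_string)

    def fetch_all(self, query, params=None):
        cur = self.conn.cursor()
        if params:
            cur.execute(query, params)
        else:
            cur.execute(query)
        rows = cur.fetchall()
        cur.close()
        return rows

db = Database(MYSQL_CSTRING)  # or Database(SQLSERVER_CSTRING)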

Can read but not write to as/400 database with python

I have a problem setting which involves data manipulation on an IBM i (AS/400) database. I'm trying to solve the problem with the help of Python and pandas.
For the last few days I have been trying to set up a proper connection to the AS/400 db with every combination of package, driver, and dialect that I could find on SO or Google.
None of the solutions is fully working for me; some work better than others, and some don't work at all.
Here's the current situation:
I'm able to read and write data through pyodbc. The connection string I'm using is the following:
cstring = urllib.parse.quote("DRIVER={IBM i Access ODBC Driver};SYSTEM=IP;UID=XXX;PWD=YYY;PORT=21;CommitMode=0;SIGNON=4;CCSID=1208;TRANSLATE=1;")
Then I establish the connection like so:
connection = pypyodbc.connect(cstring)
With connection I can read and write data from/to the as400 db through raw SQL statements:
connection.execute("""CREATE TABLE WWNMOD5.temp(
    store_id INT GENERATED BY DEFAULT AS IDENTITY NOT NULL,
    store_name VARCHAR(150),
    PRIMARY KEY (store_id)
)""")
This is, of course, a meaningless example. My goal would be to write a pandas DataFrame to the as400 by using
df.to_sql()
But when trying to do something like this:
df.to_sql('temp', connection, schema='WWNMOD5', chunksize=1000, if_exists='append', index=False)
I get this error:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42000', '[42000] [IBM][System i Access ODBC-Treiber][DB2 für i5/OS]SQL0104 - Token ; ungültig. Gültige Token: <ENDE DER ANWEISUNG>.')
Meaning that an invalid token was used, which in this case I believe is the ';' at the end of the SQL statement.
I believe that pandas isn't compatible with the pyodbc package. Therefore, I also tried to work with the db through sqlalchemy.
With sqlalchemy, I establish the connection like so:
engine = sa.create_engine("iaccess+pyodbc:///?odbc_connect=%s" % cstring)
I also tried to use ibm_db_sa instead of iaccess but the result is always the same.
If I do the same from above with sqlalchemy, that is:
df.to_sql('temp', engine, schema='WWNMOD5', chunksize=1000, if_exists='append', index=False)
I don't get any error message but the table is not created either and I don't know why.
Is there a way to get this working? All the SO threads only suggest solutions for establishing a connection and reading data from AS/400 databases, but they don't cover writing data back to the AS/400 db via Python.
It looks like you are using the wrong driver. Pandas claims support for any DB supported by SQLAlchemy, but in order for SQLAlchemy to talk to DB2, it needs a third-party extension.
This is the one recommended by SQLAlchemy: https://pypi.org/project/ibm-db-sa/
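For illustration, a minimal sketch of how the engine could be built with ibm_db_sa and handed to pandas. The host, port, credentials, and DataFrame contents are placeholders, and the URL form follows the ibm-db-sa documentation rather than anything verified against the asker's system:

import pandas as pd
import sqlalchemy as sa

# Placeholders: user XXX, password YYY, host, port, and database name are assumptions.
engine = sa.create_engine("ibm_db_sa://XXX:YYY@my-as400-host:446/MYDB")

df = pd.DataFrame({"store_name": ["store A", "store B"]})

# With a real SQLAlchemy engine, pandas emits DB2-dialect SQL instead of
# falling back to its SQLite path (the 'sqlite_master' query in the error
# above appears to be that fallback, triggered by passing a raw DBAPI connection).
df.to_sql("temp", engine, schema="WWNMOD5", chunksize=1000,
          if_exists="append", index=False)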

Get the schema of an Oracle database with python

I want to list and describe the tables present in an Oracle database.
To do this by connecting to the database with a client such as SQL Plus, a working approach is:
List the tables:
select tablespace_name, table_name from all_tables;
Get columns and data types for each table:
describe [table_name];
However, when using cx_Oracle through Python, cur.execute('describe [table_name]') results in an 'invalid sql' error.
How can we use describe with cx_Oracle in python?
It seems you can't.
From cx_Oracle, instead of describe, query the data dictionary (note that Oracle stores unquoted table names in uppercase):
cur.execute("select column_name, data_type from all_tab_columns where table_name = :tbl", tbl='YOUR_TABLE_NAME')
(From Richard Moore here http://cx-oracle-users.narkive.com/suaWH9nn/cx-oracle4-3-1-describe-table-query-is-not-working)
As noted by others there is no ability to describe directly. I created a set of libraries and tools that let you do this, however. You can see them here: https://github.com/anthony-tuininga/cx_OracleTools.
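Putting the two queries together, a minimal sketch with cx_Oracle; the credentials and DSN are placeholders:

import cx_Oracle

# Placeholder credentials and DSN.
conn = cx_Oracle.connect("user", "password", "host:1521/service_name")
cur = conn.cursor()

# List the tables visible to the current user.
cur.execute("select tablespace_name, table_name from all_tables")
tables = cur.fetchall()

# "Describe" each table: column names and data types from the data dictionary.
for tablespace_name, table_name in tables:
    cur.execute(
        "select column_name, data_type from all_tab_columns "
        "where table_name = :tbl order by column_id",
        tbl=table_name)
    print(table_name)
    for column_name, data_type in cur:
        print("   ", column_name, data_type)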

Incrementally get all data from source table (in db1) to destination table (in db2) in PostgreSQL

I have two PostgreSQL databases, db1 (the source database) and db2 (the destination database), on an AWS server endpoint. For db1 I only have read rights, and for db2 I have both read and write rights. db1, being the production database, has a table called 'public.purchases'. My task is to incrementally get all the data from the 'public.purchases' table in db1 into a table to be newly created in db2 (let me call that table 'public.purchases_copy'). And every time I run the script to perform this action, the destination table 'public.purchases_copy' in db2 needs to be updated without fully reloading the table.
My question is: what would be the best way to achieve this task efficiently? I did quite a bit of research online and found that it can be achieved by connecting Python to PostgreSQL using the 'psycopg2' module. As I am not very proficient in Python, it would be a great help if somebody could point me to StackOverflow links where a similar question has been answered, or guide me on what can be done, how this can be achieved, or any particular tutorial I can refer to. Thanks in advance.
PostgreSQL version: 9.5,
PostgreSQL GUI using: pgadmin 3,
Python version installed: 3.5
While it is possible to do this using Python, I would recommend first looking into Postgres's own module postgres_fdw, if it is possible for you to use it:
The postgres_fdw module provides the foreign-data
wrapper postgres_fdw, which can be used to access data stored in
external PostgreSQL servers.
Details are available in the Postgres docs, but specifically, after you set it up, you can:
Create a foreign table, using CREATE FOREIGN TABLE or IMPORT FOREIGN
SCHEMA, for each remote table you want to access. The columns of the
foreign table must match the referenced remote table. You can,
however, use table and/or column names different from the remote
table's, if you specify the correct remote names as options of the
foreign table object.
Now you need only SELECT from a foreign table to access the data
stored in its underlying remote table
For a simpler setup, it would probably be best to use the read-only database (db1) as the foreign one.
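As a rough sketch of how the whole flow could look when driven from Python with psycopg2, run against db2 (where we have write rights). The host names, credentials, schema name db1_remote, and the assumption that public.purchases has an ever-increasing primary key column named purchase_id are all placeholders, not taken from the question:

import psycopg2

# Connect to the destination database db2; connection details are placeholders.
conn = psycopg2.connect("host=db2-host dbname=db2 user=me password=secret")
conn.autocommit = True
cur = conn.cursor()

# --- one-time setup: point postgres_fdw at db1 (the read-only source) ---
cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
cur.execute("CREATE SCHEMA IF NOT EXISTS db1_remote")
cur.execute("""CREATE SERVER db1_server FOREIGN DATA WRAPPER postgres_fdw
               OPTIONS (host 'db1-host', dbname 'db1', port '5432')""")
cur.execute("""CREATE USER MAPPING FOR CURRENT_USER SERVER db1_server
               OPTIONS (user 'readonly_user', password 'secret')""")
cur.execute("""IMPORT FOREIGN SCHEMA public LIMIT TO (purchases)
               FROM SERVER db1_server INTO db1_remote""")
cur.execute("""CREATE TABLE public.purchases_copy AS
               SELECT * FROM db1_remote.purchases WITH NO DATA""")

# --- incremental load: run on every execution, copies only new rows ---
cur.execute("""INSERT INTO public.purchases_copy
               SELECT * FROM db1_remote.purchases
               WHERE purchase_id > (SELECT COALESCE(MAX(purchase_id), 0)
                                    FROM public.purchases_copy)""")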

Copy whole SQL Server database into JSON from Python

I'm facing an atypical conversion problem. About a decade ago I coded up a large site in ASP. Over the years this turned into ASP.NET but kept the same database.
I've just re-done the site in Django and I've copied all the core data but before I cancel my account with the host, I need to make sure I've got a long-term backup of the data so if it turns out I'm missing something, I can copy it from a local copy.
To complicate matters, I no longer have Windows. I moved to Ubuntu on all my machines some time back. I could ask the host to send me a backup but having no access to a machine with MSSQL, I wouldn't be able to use that if I needed to.
So I'm looking for something that does:
db = {}
for table in database:
    db[table.name] = [row for row in table]
And then I could serialize db off somewhere for later consumption... But how do I do the table iteration? Is there an easier way to do all of this? Can MSSQL do a cross-platform SQLDump (inc data)?
For previous MSSQL work I've used pymssql, but I don't know how to iterate the tables and copy rows (ideally with column headers so I can tell what the data is). I'm not looking for much code, but I need a poke in the right direction.
Have a look at the sysobjects and syscolumns tables. Also try:
SELECT * FROM sysobjects WHERE name LIKE 'sys%'
to find any other metatables of interest. See here for more info on these tables and the newer SQL2005 counterparts.
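As a rough sketch of how the pieces could fit together with pymssql, here the server, credentials, and the choice of INFORMATION_SCHEMA.TABLES (rather than sysobjects) are assumptions:

import json
import pymssql

# Placeholder server and credentials.
conn = pymssql.connect(server="myhost", user="me", password="secret",
                       database="mydb")
cur = conn.cursor(as_dict=True)

# Enumerate the user tables via the ANSI metadata views.
cur.execute("""SELECT TABLE_SCHEMA, TABLE_NAME
               FROM INFORMATION_SCHEMA.TABLES
               WHERE TABLE_TYPE = 'BASE TABLE'""")
tables = [(r["TABLE_SCHEMA"], r["TABLE_NAME"]) for r in cur.fetchall()]

db = {}
for schema, name in tables:
    cur.execute("SELECT * FROM [%s].[%s]" % (schema, name))
    db["%s.%s" % (schema, name)] = cur.fetchall()  # list of dicts keyed by column name

# default=str turns dates and decimals into strings so json can serialize them.
with open("dump.json", "w") as fh:
    json.dump(db, fh, default=str)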
I've liked the ADOdb python module when I've needed to connect to sql server from python. Here is a link to a simple tutorial/example: http://phplens.com/lens/adodb/adodb-py-docs.htm#tutorial
I know you said JSON, but it's very simple to generate a SQL script to do an entire dump in XML:
SELECT REPLACE(REPLACE('SELECT * FROM {TABLE_SCHEMA}.{TABLE_NAME} FOR XML RAW',
                       '{TABLE_SCHEMA}', QUOTENAME(TABLE_SCHEMA)),
               '{TABLE_NAME}', QUOTENAME(TABLE_NAME))
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_SCHEMA, TABLE_NAME
As an aside to your coding approach, I'd say:
set up a virtual machine with an eval copy of Windows
put a SQL Server eval on it
restore your data
check it manually or automatically using the excellent DB scripting tools from Red Gate to script the data and the schema
if that's fine, then you have (a) a good backup and (b) a scripted output.
