Does anybody know of a good way in Python to copy a large number of tables (around 100) from one SQL Server database to another?
I ask about Python because, due to restrictions at my place of employment, I cannot copy tables across databases inside SQL Server itself.
Here is a simple Python script that copies one table from one database to another. I am wondering if there is a better way to write it if I want to copy 100 tables.
print('Initializing...')
import pandas as pd
import sqlalchemy
import pyodbc
db1 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_one")
db2 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_two")
print('Writing...')
query = '''SELECT * FROM [dbo].[test_table]'''
df = pd.read_sql(query, db1)
df.to_sql('test_table', db2, schema='dbo', index=False, if_exists='replace')
print('(1) [test_table] copied.')
SQLAlchemy is actually a good tool to use to create identical tables in the second db:
from sqlalchemy import MetaData, Table
metadata = MetaData()
table = Table('test_table', metadata, autoload=True, autoload_with=db1)
table.create(bind=db2)
This method will also reproduce keys, indexes, and foreign keys correctly. Once the needed tables are created, you can move the data either with select/insert, if the tables are relatively small, or with the bcp utility to dump each table to disk and then load it into the second database (much faster, but more work to get it running correctly).
If you use select/insert, it is better to insert in batches of around 500 records, along the lines of the sketch below.
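A minimal sketch of the batched select/insert path with SQLAlchemy 1.x, reusing the db1/db2 engines from the question and assuming the target table has already been created with the reflection snippet above:
from sqlalchemy import MetaData, Table, select
metadata = MetaData()
source = Table('test_table', metadata, autoload=True, autoload_with=db1)
with db1.connect() as src, db2.begin() as dst:
    batch = []
    for row in src.execute(select([source])):
        batch.append(dict(row))
        if len(batch) >= 500:
            dst.execute(source.insert(), batch)  # executemany-style batch insert
            batch = []
    if batch:
        dst.execute(source.insert(), batch)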
You can do something like this:
tabs = pd.read_sql("SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES", db1)
for tab in tabs['TABLE_NAME']:
    pd.read_sql("SELECT * FROM [{}]".format(tab), db1).to_sql(tab, db2, index=False)
But it might be awfully slow. Use SQL Server tools to do this job if you can.
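If you do stay with pandas, reading and writing in chunks at least keeps memory use bounded. A sketch of that variant, reusing pd, db1, and db2 from the question (the chunk size is arbitrary):
tabs = pd.read_sql("SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES", db1)
for tab in tabs['TABLE_NAME']:
    first = True
    # stream the source table in chunks instead of loading it all at once
    for chunk in pd.read_sql("SELECT * FROM [{}]".format(tab), db1, chunksize=10000):
        chunk.to_sql(tab, db2, index=False, if_exists='replace' if first else 'append')
        first = False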
Consider using the sp_addlinkedserver procedure to link one SQL Server instance to another. After that you can execute:
SELECT * INTO server_name...table_name FROM table_name
for all tables from the db1 database.
P.S. This might be done in Python + SQLAlchemy as well; a rough sketch follows.
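For example (a sketch only, assuming a linked server named LINKED_DB1 pointing at the source server has already been set up; the engine URL and names are illustrative, and here each table is pulled from the linked server into the current database):
import sqlalchemy
db2 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_two")
with db2.begin() as conn:
    # list the base tables available on the linked server
    names = [row[0] for row in conn.execute(
        "SELECT TABLE_NAME FROM LINKED_DB1.db_one.INFORMATION_SCHEMA.TABLES "
        "WHERE TABLE_TYPE = 'BASE TABLE'")]
    for name in names:
        # SELECT ... INTO creates the local copy server-side; no data passes through Python
        conn.execute(
            "SELECT * INTO [dbo].[{0}] FROM LINKED_DB1.db_one.dbo.[{0}]".format(name))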
Related
I am trying to access tables from a database using Python. There was some code on this website: https://rnacentral.org/help/public-database
import psycopg2.extras

def main():
    conn_string = "host='hh-pgsql-public.ebi.ac.uk' dbname='pfmegrnargs' user='reader' password='NWDMCE5xdipIjRrp'"
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    # retrieve a list of RNAcentral databases
    query = "SELECT * FROM rnc_database"
    cursor.execute(query)
    for row in cursor:
        print(row)
When I run this code, I get back a list of databases.
I want to access tables from one of these databases, but I don't know what the schema for those tables is or what the values in each returned row represent. I have been looking at 'PostgreSQL to Python' resources, but all of them are about accessing tables when you already know the names of the tables and the columns within. Is there code for how I can access the table names from the database?
Thank you
Edit: sorry, I thought I had linked the website before
The dataset you want to use has a schema diagram here: https://rnacentral.org/help/public-database
For general exploration I would use a tool like https://dbeaver.io/ : it shows you all the schemas in the db, the tables inside each schema, and so forth, once you point its connection settings at your db.
If you want to keep using a Python script to explore the db, this SQL query should help you:
SELECT *
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog' AND
      schemaname != 'information_schema';
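For example, run from Python with the same public connection string as in the snippet above (a sketch; schemaname and tablename are standard pg_tables columns):
import psycopg2
conn_string = "host='hh-pgsql-public.ebi.ac.uk' dbname='pfmegrnargs' user='reader' password='NWDMCE5xdipIjRrp'"
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
# list all user tables together with the schema they live in
cursor.execute("""SELECT schemaname, tablename
                  FROM pg_catalog.pg_tables
                  WHERE schemaname != 'pg_catalog'
                    AND schemaname != 'information_schema'""")
for schema, table in cursor:
    print(schema, table)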
While programming in Python and working with SQL databases, I previously used the mysql.connector library, but I found that it takes a lot of lines of code to get your data into a SQL table, because you have to write the whole SQL query column by column.
On the other hand, pandas has easy methods that work with the SQLAlchemy library:
- pd.to_sql
- pd.read_sql_table
(Well, I got errors while using mysql db.cursor() and couldn't find any tutorial besides SQLAlchemy + pandas.)
These two methods let you easily get a dataframe from a SQL table and create a SQL table from a dataframe.
I wonder if there is such an analogue in mysql.connector to easily convert a dataframe to a SQL table and vice versa, since the syntax of this library is otherwise more convenient for me than raw SQL.
P.S. The MySQL initialization code is included just for info and is not used in the provided code, though I need to somehow find an analogue for it.
# ------------------- IMPORT -------------------------
import mysql.connector
import pandas as pd
import sqlalchemy
# ------------------- MYSQL + SQL Alchemy -------------------------
mydb = mysql.connector.connect(
host='host',
user='user',
passwd='pass',
database='db'
)
mycursor = mydb.cursor()
engine = sqlalchemy.create_engine('mysql+mysqldb://user:pass@host/db', pool_recycle=3600)
# ------------------- FUNCTIONS ----------------------
def get_function():
    df = pd.read_html("https://www.url.com")
    df[3].to_sql(name=table, con=engine, index=False, if_exists='replace')
# -------------------- MAIN --------------------------
table = 'table_name'
get_function()
print(pd.read_sql_table(table, engine, columns=[]))
Your question isn't very clear, so I'm not completely sure what you're trying to do, but I'll try my best to answer.
So SQLAlchemy is a so-called object-relational mapper (like Hibernate in the Java world), which maps between relations (columns, rows, tables) and objects.
Pandas is a data analysis library that can use SQLAlchemy. SQLAlchemy itself supports a wide range of databases, including MySQL.
Now I didn't understand whether you'd like to use Pandas + SQLAlchemy + MySQL, or whether you just want a simple way to work with MySQL directly.
In the first case you can simply use pandas; in the latter case you can use SQLAlchemy directly. Pandas provides documentation for this, and so does SQLAlchemy. A small sketch of the pandas route follows.
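A minimal sketch of the pandas + SQLAlchemy + MySQL round trip (the connection URL and table name are illustrative):
import pandas as pd
import sqlalchemy
engine = sqlalchemy.create_engine('mysql+mysqldb://user:pass@host/db')
# dataframe -> SQL table
df = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})
df.to_sql('my_table', con=engine, index=False, if_exists='replace')
# SQL table -> dataframe
df_back = pd.read_sql_table('my_table', con=engine)
print(df_back)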
I'm working with two separate databases (Oracle and PostgreSQL): a substantial amount of reference data lives in Oracle that I frequently need while doing analysis on data stored in Postgres (I have no control over this part). I'd like to be able to transfer the results of a query directly from Oracle into a table in Postgres (and vice versa). The closest I've gotten is something like the code below, using pandas as a go-between, but it's quite slow.
import pandas as pd
import psycopg2
import cx_Oracle
import sqlalchemy
# CONNECT TO POSTGRES
db__PG = psycopg2.connect(dbname="username", user="password")
eng_PG = sqlalchemy.create_engine('postgresql://username:password@localhost:5432/db_one')
# CONNECT TO ORACLE
db__OR = cx_Oracle.connect('username', 'password', 'SID_Name')
eng_OR = sqlalchemy.create_engine('oracle+cx_oracle://username:password@10.1.2.3:4422/db_two')
#DEFINE QUERIES
query_A = "SELECT * FROM tbl_A"
query_B = "SELECT * FROM tbl_B"
# CREATE PANDAS DATAFRAMES FROM QUERY RESULTS
df_A = pd.read_sql_query(query_A, db__OR)
df_B = pd.read_sql_query(query_B, db__PG)
# CREATE TABLES FROM PANDAS DATAFRAMES
df_A.to_sql(name='tbl_a', con=eng_PG, if_exists='replace', index=False)
df_B.to_sql(name='tbl_b', con=eng_OR, if_exists='replace', index=False)
I think there must be a more efficient, direct way to do this (like database links for moving data across different DBs in Oracle), but I'm fairly new to Python and have generally worked either directly in SQL or in SAS previously. I've been searching for ways to create a table directly from a Python cursor result or a SQLAlchemy ResultProxy, but haven't had much luck.
Suggestions?
Searched the web and this forum without satisfaction. Using Python 2.7 and pyODBC on Windows XP. I can get the code below to run and generate two cursors from two different databases without problems. Ideally, I'd then like to join these result cursors like so:
SELECT a.state, sum(b.Sales)
FROM cust_curs a
INNER JOIN fin_curs b
ON a.Cust_id = b.Cust_id
GROUP BY a.state
Is there a way to join cursors using SQL statements in Python or pyODBC? Would I need to store their results in a common DB (SQLite3?) to accomplish this? Is there a pure Python data-handling approach that would generate this summary from these two cursors?
Thanks for your consideration.
Working code:
import pyodbc
#
# DB2 Financial Data Cursor
#
cnxn = pyodbc.connect('DSN=DB2_Fin;UID=;PWD=')
fin_curs = cnxn.cursor()
fin_curs.execute("""SELECT Cust_id, sum(Sales) as Sales
FROM Finance.Sales_Tbl
GROUP BY Cust_id""")
#
# Oracle Customer Data Cursor
#
cnxn = pyodbc.connect('DSN=Ora_Cust;UID=;PWD=')
cust_curs = cnxn.cursor()
cust_curs.execute("""SELECT Distinct Cust_id, gender, address, state
FROM Customers.Cust_Data""")
Cursors are simply objects used for executing SQL commands and retrieving the results. The data aren't migrated into a new database, so joins across them aren't possible. If you would like to join the data, you'll need to have the two tables in the same database. Whether that means bringing both tables and their data into a SQLite database or doing it some other way depends on the specifics of your use case, but that would theoretically work; a sketch of the SQLite route is below.
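For example, one way would be to load both result sets into an in-memory SQLite database and run the join there. A sketch, assuming the fin_curs and cust_curs cursors from the question have already been executed (column types such as Decimal may need converting first):
import sqlite3
db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE fin (Cust_id, Sales)")
db.execute("CREATE TABLE cust (Cust_id, gender, address, state)")
# copy both pyodbc result sets into SQLite
db.executemany("INSERT INTO fin VALUES (?, ?)", [tuple(r) for r in fin_curs.fetchall()])
db.executemany("INSERT INTO cust VALUES (?, ?, ?, ?)", [tuple(r) for r in cust_curs.fetchall()])
# the join from the question now runs against SQLite
query = """SELECT a.state, SUM(b.Sales)
           FROM cust a
           INNER JOIN fin b ON a.Cust_id = b.Cust_id
           GROUP BY a.state"""
for state, sales in db.execute(query):
    print(state, sales)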
I've got a MySQL table with about ~10m rows. I created a parallel schema in SQLite3, and I'd like to copy the table somehow. Using Python seems like an acceptable solution, but this way --
# ...
mysqlcursor.execute('SELECT * FROM tbl')
rows = mysqlcursor.fetchall() # or mysqlcursor.fetchone()
for row in rows:
    # ... insert row via sqlite3 cursor
...is incredibly slow (hangs at .execute(), I wouldn't know for how long).
I'd only have to do this once, so I don't mind if it takes a couple of hours, but is there a different way to do this? Using a different tool rather than Python is also acceptable.
The simplest way might be to use mysqldump to get a SQL file of the whole db, then use the SQLite command-line tool to execute the file.
You don't show exactly how you insert rows, but you mention execute().
You might try executemany()* instead.
For example:
import sqlite3
conn = sqlite3.connect('mydb')
c = conn.cursor()
# one '?' placeholder for each column you're inserting
# "rows" needs to be a sequence of values, e.g. ((1,'a'), (2,'b'), (3,'c'))
c.executemany("INSERT INTO tbl VALUES (?,?);", rows)
conn.commit()
*executemany() as described in the Python DB-API:
.executemany(operation, seq_of_parameters)
Prepare a database operation (query or command) and then execute it against all parameter sequences or mappings found in the sequence seq_of_parameters.
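To avoid building the whole 10m-row list in Python at once, you could combine executemany() with batched fetching on the MySQL side (ideally with an unbuffered cursor, see the SSCursor answer below). A rough sketch, assuming a two-column table and an already-open mysqlcursor as in the question:
import sqlite3
conn = sqlite3.connect('mydb')
c = conn.cursor()
mysqlcursor.execute('SELECT * FROM tbl')
while True:
    rows = mysqlcursor.fetchmany(10000)  # next batch of source rows
    if not rows:
        break
    # two '?' placeholders because this sketch assumes a two-column table
    c.executemany("INSERT INTO tbl VALUES (?,?);", rows)
conn.commit()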
You can export a flat file from MySQL using SELECT ... INTO OUTFILE and import it with sqlite's .import:
mysql> SELECT * INTO OUTFILE '/tmp/export.txt' FROM sometable;
sqlite> .separator "\t"
sqlite> .import /tmp/export.txt sometable
This handles the data export/import but not copying the schema, of course.
If you really want to do this with Python (maybe to transform the data), I would use a MySQLdb.cursors.SSCursor to iterate over the data; otherwise the MySQL result set gets cached in memory, which is why your query is hanging on execute(). So that would look something like:
import MySQLdb
import MySQLdb.cursors
connection = MySQLdb.connect(...)
cursor = connection.cursor(MySQLdb.cursors.SSCursor)
cursor.execute('SELECT * FROM tbl')
for row in cursor:
    # do something with row and add to sqlite database
That will be much slower than the export/import approach.