Inspect many SQLite databases with the Python dataset module

Recently I used the Python module dataset to manipulate and store information. As a result, I have a collection of SQLite databases, let's say file1.db, file2.db, and so on. Moreover, each of the databases contains the same table.
With dataset I can easily connect and inspect the databases with the code:
>>> db1 = dataset.connect('sqlite:////path/file1.db')
>>> table1 = db1[u'tweet']
Assuming I want to keep the databases separated in many files, what is the preferred way to inspect all of them with dataset?
I am looking for something better than this:
>>> db1 = dataset.connect('sqlite:////path/file1.db')
>>> db2 = dataset.connect('sqlite:////path/file2.db')
>>> tables = [db1[u'tweet'], db2[u'tweet']]
>>> for table in tables:
...     for tweet in table:
...         print(tweet['text'])

I don't know a clean solution to this, but it might be interesting to use an in-memory SQLite database for this scenario:
mem_db = dataset.connect('sqlite:///')
databases = ['sqlite:////path/file1.db']
for uri in databases:
    db1 = dataset.connect(uri)
    for row in db1['tweet']:
        mem_db['tweet'].insert(row)
There's also an insert_many call, I believe, which may be faster for bulk transfer.
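For example, a sketch that folds several files into the in-memory database with insert_many; the file paths and the 'tweet' table name follow the question, and dropping each row's id is an assumption to avoid primary-key collisions across files:
import dataset

mem_db = dataset.connect('sqlite:///')  # scratch database held in memory

# Assumed file layout from the question; adjust the paths to your own.
databases = ['sqlite:////path/file1.db', 'sqlite:////path/file2.db']

for uri in databases:
    db = dataset.connect(uri)
    # Drop the per-file 'id' so primary keys don't collide across files.
    rows = [{k: v for k, v in row.items() if k != 'id'} for row in db['tweet']]
    # insert_many batches the rows instead of issuing one INSERT per row.
    mem_db['tweet'].insert_many(rows)

# One table now holds every tweet from every file.
for tweet in mem_db['tweet']:
    print(tweet['text'])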

Related

PostgreSQL queries using Python

I am trying to access tables from a database using Python. There was some code on this website: https://rnacentral.org/help/public-database
import psycopg2.extras

def main():
    conn_string = "host='hh-pgsql-public.ebi.ac.uk' dbname='pfmegrnargs' user='reader' password='NWDMCE5xdipIjRrp'"
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    # retrieve a list of RNAcentral databases
    query = "SELECT * FROM rnc_database"
    cursor.execute(query)
    for row in cursor:
        print(row)

main()
When I run this code, I get back a list of databases.
I want to access tables from one of these databases, but I don't know what the schema for those tables is or what the values in each returned row represent. I have been looking at 'postgresql to python' resources, but all of them are about accessing tables when you already know the names of the tables and their columns. Is there code for how I can access the table names from the database?
Thank You
Edit: sorry, I thought I had linked the website before.
The dataset you want to use has a schema diagram here: https://rnacentral.org/help/public-database
For general purposes I would use a tool like https://dbeaver.io/; it will show you all the schemas in the db, the tables inside each schema, and so forth, once you point it at the same connection details as above.
If you want to keep using a Python script to explore the db, this SQL query should help:
SELECT *
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog'
  AND schemaname != 'information_schema';
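For example, a small sketch that runs that query through the same psycopg2 connection as the question's snippet:
import psycopg2

conn_string = "host='hh-pgsql-public.ebi.ac.uk' dbname='pfmegrnargs' user='reader' password='NWDMCE5xdipIjRrp'"
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()

# List every user-defined table, skipping the system schemas.
cursor.execute("""SELECT schemaname, tablename
                  FROM pg_catalog.pg_tables
                  WHERE schemaname NOT IN ('pg_catalog', 'information_schema')""")
for schema, table in cursor:
    print(schema, table)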

How to print MongoDB databases in Python

I have a MongoDB database that is storing data from ROS topics that my robot is logging. I am trying to print the data in MongoDB using the following Python script:
from pymongo import MongoClient
client = MongoClient('cpr-j100-0101', 62345)
db1 = client.front_scan
db2 = client.cmd_vel
db3 = client.odometry_filtered
print db1
print db2
print db3
but I don't get the result I want when I run this script; I have attached the output as an image. Instead of this, I would like to actually be able to access the data within MongoDB.
You can't print a database without selecting it first: you need to choose which database you want to print. For example, suppose you have two collections in db1, coll1 and coll2. Printing the database means printing the documents of the collections in that database.
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client.myDatabase  # my dummy database is myDatabase
coll1 = db.coll1  # selecting the coll1 in myDatabase
for document in coll1.find():
    print(document)
The above code prints all the documents in the coll1 collection of myDatabase. You can print the other databases in the same manner.
With this script you actually don't do much. You just get references to three databases, and that's basically it. You never insert data into, or read data from, the database; you're just printing the database objects.
I believe the MongoDB Manual should be useful...
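To illustrate that point, a minimal sketch that actually inserts a document and reads it back, reusing the host and port from the question; the 'scans' collection and its fields are invented for the example:
from pymongo import MongoClient

client = MongoClient('cpr-j100-0101', 62345)
db = client.front_scan

# Databases only materialize once you write to them.
# 'scans' and its fields are hypothetical names for this sketch.
db.scans.insert_one({'topic': '/front/scan', 'range_m': 1.25})

# find() with no filter returns every document in the collection.
for document in db.scans.find():
    print(document)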

Copy tables from one database to another in SQL Server, using Python

Does anybody know of a good way in Python to copy a large number of tables (around 100) from one database to another in SQL Server?
I ask if there is a way to do it in Python, because due to restrictions at my place of employment, I cannot copy tables across databases inside SQL Server alone.
Here is a simple Python code that copies one table from one database to another. I am wondering if there is a better way to write it if I want to copy 100 tables.
print('Initializing...')
import pandas as pd
import sqlalchemy
import pyodbc
db1 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_one")
db2 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_two")
print('Writing...')
query = '''SELECT * FROM [dbo].[test_table]'''
df = pd.read_sql(query, db1)
df.to_sql('test_table', db2, schema='dbo', index=False, if_exists='replace')
print('(1) [test_table] copied.')
SQLAlchemy is actually a good tool to use to create identical tables in the second db:
from sqlalchemy import MetaData, Table
metadata = MetaData()
table = Table('test_table', metadata, autoload=True, autoload_with=db1)
table.create(bind=db2)
This method will also produce correct keys, indexes, and foreign keys. Once the needed tables are created, you can move the data either by select/insert, if the tables are relatively small, or with the bcp utility to dump the table to disk and then load it into the second database (much faster, but more work to get it to work correctly).
If using select/insert, it is better to insert in batches of around 500 records.
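A rough sketch of that batched select/insert, assuming SQLAlchemy 1.x (to match the autoload=True API used above) and reusing the db1 and db2 engines from the question:
from sqlalchemy import MetaData, Table, select

metadata = MetaData()
table = Table('test_table', metadata, autoload=True, autoload_with=db1)
table.create(bind=db2, checkfirst=True)  # skip creation if it already exists

batch = []
for row in db1.execute(select([table])):
    batch.append(dict(row))
    if len(batch) >= 500:  # flush every 500 rows, as suggested above
        db2.execute(table.insert(), batch)
        batch = []
if batch:  # flush the remainder
    db2.execute(table.insert(), batch)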
You can do something like this:
tabs = pd.read_sql("SELECT table_name FROM INFORMATION_SCHEMA.TABLES", db1)
for tab in tabs['table_name']:
    pd.read_sql("select * from {}".format(tab), db1).to_sql(tab, db2, index=False)
But it might be awfully slow. Use SQL Server tools to do this job.
Consider using the sp_addlinkedserver procedure to link one SQL Server to another. After that you can execute:
SELECT * INTO server_name...table_name FROM table_name
for all tables from the db1 database.
PS: this might be done in Python + SQLAlchemy as well...
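The Python + SQLAlchemy side of that could look like this sketch, where DB_ONE_LINK is a hypothetical linked-server name assumed to have been registered beforehand with sp_addlinkedserver:
import sqlalchemy

db2 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_two")

# Four-part name: linked_server.database.schema.table.
copy_sql = """
SELECT * INTO [dbo].[test_table]
FROM [DB_ONE_LINK].[db_one].[dbo].[test_table]
"""
with db2.begin() as conn:
    conn.execute(sqlalchemy.text(copy_sql))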

How to create a table in one DB from the result of a query from a different DB in Python

I'm working with two separate databases (Oracle and PostgreSQL), where a substantial amount of reference data lives in Oracle that I frequently need while doing analysis on data stored in Postgres (I have no control over this part). I'd like to be able to directly transfer the results of a query from Oracle to a table in Postgres (and vice versa). The closest I've gotten is something like the code below, using Pandas as a go-between, but it's quite slow.
import pandas as pd
import psycopg2
import cx_Oracle
import sqlalchemy
# CONNECT TO POSTGRES
db__PG = psycopg2.connect(dbname="username", user="password")
eng_PG = sqlalchemy.create_engine('postgresql://username:password@localhost:5432/db_one')
# CONNECT TO ORACLE
db__OR = cx_Oracle.connect('username', 'password', 'SID_Name')
eng_OR = sqlalchemy.create_engine('oracle+cx_oracle://username:password@10.1.2.3:4422/db_two')
#DEFINE QUERIES
query_A = "SELECT * FROM tbl_A"
query_B = "SELECT * FROM tbl_B"
# CREATE PANDAS DATAFRAMES FROM QUERY RESULTS
df_A = pd.read_sql_query(query_A, db__OR)
df_B = pd.read_sql_query(query_B, db__PG)
# CREATE TABLES FROM PANDAS DATAFRAMES
df_A.to_sql(name='tbl_a', con=eng_PG, if_exists='replace', index=False)
df_B.to_sql(name='tbl_b', con=eng_OR, if_exists='replace', index=False)
I think there has to be a more efficient, direct way to do this (like database links for moving data across different DBs in Oracle), but I'm fairly new to Python and have generally worked either directly in SQL or in SAS. I've been searching for ways to create a table directly from a Python cursor result or a SQLAlchemy ResultProxy, but haven't had much luck.
Suggestions?
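Not a direct database link, but one way to cut the memory overhead of the pandas route is to stream the query in chunks rather than materializing the whole table at once; a sketch reusing db__OR and eng_PG from the code above (the chunk size of 10,000 is an arbitrary choice):
import pandas as pd

# chunksize makes read_sql_query return an iterator of DataFrames
# instead of loading the entire result set into memory.
chunks = pd.read_sql_query("SELECT * FROM tbl_A", db__OR, chunksize=10000)
for i, chunk in enumerate(chunks):
    # Replace the target table on the first chunk, then append the rest.
    chunk.to_sql('tbl_a', eng_PG, index=False,
                 if_exists='replace' if i == 0 else 'append')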

Working with Cursors in Python

Searched the web and this forum without satisfaction. Using Python 2.7 and pyODBC on Windows XP. I can get the code below to run and generate two cursors from two different databases without problems. Ideally, I'd then like to join these result cursors thusly:
SELECT a.state, sum(b.Sales)
FROM cust_curs a
INNER JOIN fin_curs b
ON a.Cust_id = b.Cust_id
GROUP BY a.state
Is there a way to join cursors using SQL statements in python or pyODBC? Would I need to store these cursors in a common DB (SQLite3?) to accomplish this? Is there a pure python data handling approach that would generate this summary from these two cursors?
Thanks for your consideration.
Working code:
import pyodbc
#
# DB2 Financial Data Cursor
#
cnxn = pyodbc.connect('DSN=DB2_Fin;UID=;PWD=')
fin_curs = cnxn.cursor()
fin_curs.execute("""SELECT Cust_id, sum(Sales) as Sales
FROM Finance.Sales_Tbl
GROUP BY Cust_id""")
#
# Oracle Customer Data Cursor
#
cnxn = pyodbc.connect('DSN=Ora_Cust;UID=;PWD=')
cust_curs = cnxn.cursor()
cust_curs.execute("""SELECT Distinct Cust_id, gender, address, state
FROM Customers.Cust_Data""")
Cursors are simply objects used for executing SQL commands and retrieving the results. The data isn't migrated into a new database, so joins across two separate connections aren't possible. If you would like to join the data, you'll need to have the two tables in the same database. Whether that means bringing both tables and their data into a SQLite database or doing it some other way depends on the specifics of your use case, but that would theoretically work.
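For instance, a sketch of the SQLite route, reusing fin_curs and cust_curs from the working code above; the column lists follow the two queries, and pyodbc rows behave as sequences, so executemany accepts them directly:
import sqlite3

mem = sqlite3.connect(':memory:')
mem.execute("CREATE TABLE fin (Cust_id, Sales)")
mem.execute("CREATE TABLE cust (Cust_id, gender, address, state)")

# fetchall() drains each pyodbc cursor; executemany bulk-loads the rows.
mem.executemany("INSERT INTO fin VALUES (?, ?)", fin_curs.fetchall())
mem.executemany("INSERT INTO cust VALUES (?, ?, ?, ?)", cust_curs.fetchall())

query = """SELECT a.state, sum(b.Sales)
           FROM cust a INNER JOIN fin b ON a.Cust_id = b.Cust_id
           GROUP BY a.state"""
for state, total in mem.execute(query):
    print("%s %s" % (state, total))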
