I am trying to access tables from a database using Python. There was some example code on the website https://rnacentral.org/help/public-database:
import psycopg2
import psycopg2.extras

def main():
    conn_string = "host='hh-pgsql-public.ebi.ac.uk' dbname='pfmegrnargs' user='reader' password='NWDMCE5xdipIjRrp'"
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    # retrieve a list of RNAcentral databases
    query = "SELECT * FROM rnc_database"
    cursor.execute(query)
    for row in cursor:
        print(row)

main()
When I run this code, I get back a list of databases.
I want to access tables from one of these databases, but I don't know the schema of those tables or what the values in each returned row represent. I have been looking at 'postgresql to python' resources, but all of them cover accessing tables when you already know the table names and the columns within. Is there code for how I can access the table names from the database?
Thank you!
Edit: sorry, I thought I had linked the website before.
The dataset you want to use has a schema diagram here: https://rnacentral.org/help/public-database
For general-purpose exploration I would use a tool like https://dbeaver.io/; it will show you all the schemas in the db, the tables inside each schema, and so forth. In DBeaver you can connect with the same host, database, user, and password as in the connection string above.
If you want to keep using a Python script to explore the db, this SQL query should help you:
SELECT *
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog' AND
schemaname != 'information_schema';
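To run it from Python, here is a minimal sketch reusing the psycopg2 connection from the question (connection details copied from the original snippet):

import psycopg2
import psycopg2.extras

conn_string = "host='hh-pgsql-public.ebi.ac.uk' dbname='pfmegrnargs' user='reader' password='NWDMCE5xdipIjRrp'"
conn = psycopg2.connect(conn_string)
cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

# list all non-system tables, schema by schema
cursor.execute("""
    SELECT schemaname, tablename
    FROM pg_catalog.pg_tables
    WHERE schemaname != 'pg_catalog'
      AND schemaname != 'information_schema'
""")
for schema, table in cursor.fetchall():
    print(schema, table)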
I need to select some columns from a table with SQLAlchemy. Everything works fine except selecting the one column with '/' in the name. My query looks like:
query = select([func.sum(Table.c.ColumnName),
                func.sum(Table.c.Column/Name),
                ])
Obviously the issue comes from the second line with the column 'Column/Name'. Is there a way in SQL Alchemy to overcome special characters in a column name?
Edit:
I have it all inside a class, but a simplified version of the process looks like this. I create an engine (all the necessary db data is inside the create_new_engine() function) and map all tables in the db into metadata.
def map(self):
    from sqlalchemy.engine.base import Engine
    # check if an engine exists
    if not isinstance(self.engine, Engine):
        self.create_new_engine()
    self.metadata = MetaData(schema='dbo')
    self.metadata.reflect(bind=self.engine)
Then I map a single table with:
def map_table(self, table_name):
    table = "{schema}.{table_name}".format(schema=self.metadata.schema, table_name=table_name)
    table = self.metadata.tables[table]
    return table
In the end I use pandas read_sql_query to run the above query with the connection and engine established earlier.
I'm connecting to SQL Server.
Since Table.c is a plain Python object, attribute access won't work for column names that aren't valid Python identifiers. Try getattr in pure Python:
query = select([func.sum(Table.c.ColumnName),
                func.sum(getattr(Table.c, 'Column/Name')),
                ])
So in your case (from the comments above):
func.sum(getattr(Table.c, 'cur/fees'))
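For completeness, a small sketch of how the fixed query plugs into the pandas call mentioned in the question; Table and engine are assumed to be the reflected table object and engine from the question's setup:

import pandas as pd
from sqlalchemy import select, func

# Table here is the reflected table from map_table(), not the sqlalchemy.Table class
query = select([func.sum(Table.c.ColumnName),
                func.sum(getattr(Table.c, 'cur/fees'))])
df = pd.read_sql_query(query, engine)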
Does anybody know of a good Python code that can copy large number of tables (around 100 tables) from one database to another in SQL Server?
I ask whether there is a way to do it in Python because, due to restrictions at my place of employment, I cannot copy tables across databases inside SQL Server itself.
Here is a simple Python code that copies one table from one database to another. I am wondering if there is a better way to write it if I want to copy 100 tables.
print('Initializing...')
import pandas as pd
import sqlalchemy
import pyodbc
db1 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_one")
db2 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_two")
print('Writing...')
query = '''SELECT * FROM [dbo].[test_table]'''
df = pd.read_sql(query, db1)
df.to_sql('test_table', db2, schema='dbo', index=False, if_exists='replace')
print('(1) [test_table] copied.')
SQLAlchemy is actually a good tool to use to create identical tables in the second db:
from sqlalchemy import MetaData, Table
metadata = MetaData()
table = Table('test_table', metadata, autoload=True, autoload_with=db1)
table.create(bind=db2)
This method will also reproduce the correct keys, indexes, and foreign keys. Once the needed tables are created, you can move the data either with select/insert, if the tables are relatively small, or with the bcp utility to dump each table to disk and then load it into the second database (much faster, but more work to get it running correctly).
If using select/insert, it is better to insert in batches of 500 records or so.
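Here is a minimal sketch of that batched copy, assuming the reflected table object from above and SQLAlchemy 1.x-style connections:

# copy rows from db1 to db2 in batches, using the reflected 'table' from above
src = db1.connect()
dst = db2.connect()

result = src.execute(table.select())
while True:
    rows = result.fetchmany(500)   # batches of ~500 records
    if not rows:
        break
    dst.execute(table.insert(), [dict(row) for row in rows])

src.close()
dst.close()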
You can do something like this:
tabs = pd.read_sql("SELECT table_name FROM INFORMATION_SCHEMA.TABLES", db1)
for tab in tabs['table_name']:
    pd.read_sql("select * from {}".format(tab), db1).to_sql(tab, db2, index=False)
But it might be awfully slow. Use SQL Server tools to do this job.
Consider using the sp_addlinkedserver procedure to link one SQL Server to another. After that you can execute:
SELECT * INTO server_name...table_name FROM table_name
for all tables from the db1 database.
P.S. This might be done in Python + SQLAlchemy as well; see the sketch below.
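A rough sketch of that approach, pulling each table across the linked server into the local database; server_name, db_one, db_two, and dbo are placeholders for your linked server, source database, destination database, and schema, and the linked server must already be configured:

import pandas as pd
import sqlalchemy

db2 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_two")

# read the source table names across the linked server (base tables only, no views)
tabs = pd.read_sql(
    "SELECT table_name FROM [server_name].[db_one].INFORMATION_SCHEMA.TABLES "
    "WHERE table_type = 'BASE TABLE'", db2)

with db2.begin() as conn:
    for tab in tabs['table_name']:
        # four-part name: linked_server.database.schema.table
        conn.execute("SELECT * INTO [dbo].[{0}] "
                     "FROM [server_name].[db_one].[dbo].[{0}]".format(tab))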
I have 2 tables, "vector" and "vocab." I'm trying to do this:
c.execute('SELECT value FROM vector WHERE word IN (SELECT word FROM vocab)')
I'm getting the error sqlite3.OperationalError: no such table: vocab. Of course, this is because I haven't connected to the vocab table; I only connected to the vector table before:
import sqlite3

dbname = "/Users/quantumjuker/NLP/vector.db"
conn = sqlite3.connect(dbname)
c = conn.cursor()
How can I connect to the vocab table as well so I don't receive an error?
Thanks!
SQLite3 has the ability to read additional SQLite database files. This is done using the ATTACH command. The nice thing about it is that it is issued as an SQL query. So you do something like:
c.execute("ATTACH 'vocab.db' AS 'vocabulary'");
Note that AS aliases the database to a name, not the table. Once this is done you can run your query against the vocab table as well, as in the sketch below.
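A minimal end-to-end sketch, assuming the vocab table lives in a separate vocab.db file (the paths are illustrative):

import sqlite3

conn = sqlite3.connect("/Users/quantumjuker/NLP/vector.db")
c = conn.cursor()

# attach the second database file under the alias 'vocabulary'
c.execute("ATTACH '/Users/quantumjuker/NLP/vocab.db' AS vocabulary")

# tables from the attached file are reachable as alias.table
c.execute("SELECT value FROM vector "
          "WHERE word IN (SELECT word FROM vocabulary.vocab)")
rows = c.fetchall()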
I am writing a basic GUI for a program which uses Peewee. In the GUI, I would like to show all the tables which exist in my database.
Is there any way to get the names of all existing tables, let's say in a list?
Peewee has the ability to introspect Postgres, MySQL and SQLite for the following types of schema information:
Table names
Columns (name, data type, null?, primary key?, table)
Primary keys (column(s))
Foreign keys (column, dest table, dest column, table)
Indexes (name, sql*, columns, unique?, table)
You can get this metadata using the following methods on the Database class:
Database.get_tables()
Database.get_columns()
Database.get_indexes()
Database.get_primary_keys()
Database.get_foreign_keys()
So, instead of using a cursor and writing some SQL yourself, just do:
from peewee import PostgresqlDatabase

db = PostgresqlDatabase('my_db')
tables = db.get_tables()
For even more craziness, check out the reflection module, which can actually generate Peewee model classes from an existing database schema.
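For reference, a short sketch of that reflection route using playhouse.reflection.generate_models, part of Peewee's playhouse extensions:

from peewee import PostgresqlDatabase
from playhouse.reflection import generate_models

db = PostgresqlDatabase('my_db')

# build Peewee model classes from the existing schema;
# the result is a dict mapping table names to generated models
models = generate_models(db)
print(list(models))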
To get a list of the tables in your schema, make sure that you have established your connection and cursor and try the following:
cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='public'")
mytables = cursor.fetchall()
mytables = [x[0] for x in mytables]
I hope this helps.
When using mclient, it is possible to list all tables in a database by issuing the command '\d'. I'm using the python-monetdb package and I don't know how the same can be accomplished. I've seen examples like "SELECT * FROM TABLES;" but I get an error that the "tables" table does not exist.
In your query you need to specify that you are looking for the tables table that belongs to the default sys schema, or sys.tables. The SQL query that returns the names of all non-system tables in MonetDB is:
SELECT t.name FROM sys.tables t WHERE t.system=false
In Python this should look something like:
import monetdb.sql

connection = monetdb.sql.connect(username='<username>', password='<password>', hostname='<hostname>', port=50000, database='<database>')
cursor = connection.cursor()
cursor.execute('SELECT t.name FROM sys.tables t WHERE t.system=false')
tables = cursor.fetchall()
If you are looking for tables only in a specific schema, you will need to extend your query, specifying the schema:
SELECT t.name FROM sys.tables t WHERE t.system=false AND t.schema_id IN (SELECT s.id FROM sys.schemas s WHERE name = '<schema-name>')
where <schema-name> is your schema name, surrounded by single quotes.
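In Python, with the cursor from above, that could look like this (a sketch; 'my_schema' is a placeholder for your schema name):

schema_name = 'my_schema'  # placeholder
cursor.execute(
    "SELECT t.name FROM sys.tables t "
    "WHERE t.system=false "
    "AND t.schema_id IN (SELECT s.id FROM sys.schemas s WHERE name = '{}')".format(schema_name))
tables = cursor.fetchall()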