Download all postgresql tables to pandas - python

Is there a simple way to download all tables from a PostgreSQL database into pandas? For example, can pandas just load from a .sql dump file? All the solutions I found online suggest connecting to the database and using SELECT statements, which seems far more complicated.

You have to connect to the database anyway. You can get the table names from an ODBC cursor and then use pandas.read_sql to load those tables by name. For example, with pypyodbc, finding the names:
allnames = cursor.tables(schema='your_schema').fetchall()
# keep only tables, skipping views and indexes
tabnames = [el[2] for el in allnames if el[3] == 'TABLE']
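A minimal sketch of the full flow (the DSN name MyDSN and the schema your_schema are placeholders):

import pypyodbc
import pandas as pd

# connect through an ODBC DSN and list the tables in the schema
conn = pypyodbc.connect('DSN=MyDSN')
cursor = conn.cursor()
allnames = cursor.tables(schema='your_schema').fetchall()
tabnames = [el[2] for el in allnames if el[3] == 'TABLE']

# read every table into its own DataFrame, keyed by table name
frames = {name: pd.read_sql(f'SELECT * FROM your_schema.{name}', conn)
          for name in tabnames}

pandas.read_sql also accepts a SQLAlchemy engine, which avoids the warning newer pandas versions raise for raw DBAPI connections.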

Related

SQL database to pandas

Hello, can someone help me? I need to create a program in Python that connects to my database, reads the database records, and builds a pivot table from that data with pandas.
You might be looking for pandas.read_sql
You can use the con parameter to pass your database connection (or a SQLAlchemy connection string).
Here's a link on connecting SQL to Python. The library will vary depending on the database you use; search for your particular database-to-Python connection.
https://www.geeksforgeeks.org/how-to-connect-python-with-sql-database/
This will give you a 5-minute crash course on pandas and how to use pivot tables.
https://medium.com/bhavaniravi/python-pandas-tutorial-92018da85a33
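A short sketch putting the two steps together, assuming a MySQL database reachable through SQLAlchemy and hypothetical columns region, month and amount in a sales table:

import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string; adjust driver, credentials, host and database
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# load the records, then pivot them
df = pd.read_sql("SELECT region, month, amount FROM sales", con=engine)
pivot = pd.pivot_table(df, index="region", columns="month",
                       values="amount", aggfunc="sum")
print(pivot)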

Python or R -- create a SQL join using a dataframe

I am trying to find a way, either in R or Python, to use a dataframe as a table in an Oracle SQL statement.
It is impractical, for my objective, to:
Create a string out of a column and use it as the criteria (the column has more than 1,000 values, which is Oracle's limit for an IN list)
Create a new table in the database and use that (don't have access)
Download the entire contents of the table and merge in pandas (there are millions of records in the database, which would bog down the DB and my system)
I have found packages that let you "register" a dataframe so it acts as a table/view you can query, but they will not let it be used in a query against a different connection string. Can anyone point me in the right direction? Either a way to use two different connections in the same SQL statement (Oracle plus a package like DuckDB) so I can do an inner join, or a way to link the dataframe directly and use it as a table in a join.
SAS does this so effortlessly, and I don't want to go back to SAS because the rest of its functionality is not as good as Python / R, but this is a dealbreaker if I can't do database extractions.
Answering my own question here -- after much research.
In short, this cannot be done. Outside of passing a list of criteria or concatenating them into the query string, you cannot create a dataframe in Python or R and pass it through a query into a SQL Server or Oracle database. It's unfortunate, but if you don't have permission to write to temporary tables in the Oracle database, you're out of options.
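For completeness, a rough sketch of the list-based fallback mentioned above: split the key column into chunks of at most 1,000 values (Oracle's IN-list limit) and issue one query per chunk. The table and column names (orders, customer_id) and the local dataframe df_local are hypothetical, and this still hits the database once per chunk, so it can be slow for very large key sets:

import pandas as pd
import oracledb  # cx_Oracle works the same way

conn = oracledb.connect(user="user", password="pass", dsn="host/service")

keys = df_local["customer_id"].dropna().unique().tolist()
chunks = [keys[i:i + 1000] for i in range(0, len(keys), 1000)]  # stay under the 1,000-item limit

parts = []
for chunk in chunks:
    # positional bind variables :1, :2, ... for this chunk
    placeholders = ", ".join(f":{n + 1}" for n in range(len(chunk)))
    sql = f"SELECT * FROM orders WHERE customer_id IN ({placeholders})"
    parts.append(pd.read_sql(sql, conn, params=chunk))

result = pd.concat(parts, ignore_index=True)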

Upload data to Exasol from python Dataframe

I wonder if there's any way to upload a dataframe and create a new table in Exasol? import_from_pandas assumes the table already exists. Do we need to run a separate SQL statement to create the table? For other databases, to_sql can just create the table if it doesn't exist.
Yes. As you mentioned, import_from_pandas requires an existing table, so you need to create the table before writing to it. You can run a CREATE TABLE ... script with connection.execute before using import_from_pandas. Also, to_sql needs a table, since according to the documentation it is translated into SQL INSERT commands.
Pandas' to_sql can create a new table if it does not exist, but it needs a SQLAlchemy connection, which is not supported for Exasol out of the box. However, there seems to be a SQLAlchemy dialect for Exasol you could use (I haven't tried it yet): sqlalchemy-exasol.
Alternatively, I think you have to use a CREATE TABLE statement and then populate the table via pyexasol's import_from_pandas.
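A minimal sketch of that two-step approach with pyexasol, assuming placeholder connection details and a hypothetical MY_SCHEMA.MY_TABLE whose columns match the DataFrame:

import pandas as pd
import pyexasol

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# placeholder DSN and credentials
conn = pyexasol.connect(dsn="exasol-host:8563", user="sys", password="secret",
                        schema="MY_SCHEMA")

# create the target table first, then bulk-load the DataFrame into it
conn.execute("CREATE TABLE IF NOT EXISTS MY_TABLE (id INT, name VARCHAR(100))")
conn.import_from_pandas(df, "MY_TABLE")
conn.close()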

Insert Information to MySQL from multiple related excel files

So I have this huge DB schema for vehicle board cards. The data is actually stored in multiple Excel files. My job was to create a database schema to dump all this data into MySQL, but now I need to create the process to insert the data into the DB.
This is an example of how the Excel tables are organized:
The thing is that these Excel files are not well labeled.
My question is: what do I need to do to create a script that dumps all this data from Excel into the DB?
I'm also using IDs, foreign keys, primary keys, joins, etc.
This is what I've thought of so far:
1. Normalize the structure of the tables in Excel so that the data can be inserted with SQL.
2. Create a script in Python to insert the data of each table.
Can you help me figure out where to start and what topics I should google?
With pandas you can easily read Excel (xlsx) and csv files and dump the data into almost any database
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_excel('file.xlsx')
engine = create_engine('mysql+pymysql://user:password@localhost/mydb')  # adjust credentials
df.to_sql('table_name', con=engine, if_exists='append', index=False)
If you have performance issues dumping to MySQL, another way of doing the dump is described here:
python pandas to_sql with sqlalchemy : how to speed up exporting to MS SQL?
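A rough sketch of how the multi-file load could look, with hypothetical file and table names, writing parent tables before child tables so foreign keys resolve:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost/vehicles_db")

# map each Excel file to its target table, parents before children
# (file and table names are placeholders for your actual schema)
load_order = [
    ("vehicles.xlsx", "vehicles"),
    ("drivers.xlsx", "drivers"),
    ("board_cards.xlsx", "board_cards"),  # references vehicles and drivers
]

for excel_file, table in load_order:
    df = pd.read_excel(excel_file)
    # normalize column names so they match the MySQL schema
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df.to_sql(table, con=engine, if_exists="append", index=False)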

How to write a pandas DataFrame directly into a Netezza Database?

I have a pandas DataFrame in python and want this DataFrame directly to be written into a Netezza Database.
I would like to use the pandas.to_sql() method that is described here, but it seems that this method needs SQLAlchemy to connect to the database.
The Problem: SQLAlchemy does not support Netezza.
What I am using at the moment to connect to the database is pyodbc. But this, on the other hand, is not understood by pandas.to_sql(). Or am I wrong about this?
My workaround is to write the DataFrame to a csv file via pandas.to_csv() and send that to the Netezza database via pyodbc.
Since I have big data, writing the csv first is a performance issue. I actually do not care whether I use SQLAlchemy, pyodbc or something else, but I cannot change the fact that I have a Netezza database.
I am aware of the deontologician project, but as the author states himself, it "is far from complete, has a lot of bugs".
I got the package to work (see my solution below). But if someone knows a better solution, please let me know!
I figured it out. For my solution see accepted answer.
Solution
I found a solution that I want to share for everyone with the same problem.
I tried the netezza dialect from deontologician, but it does not work with Python 3, so I made a fork and corrected some encoding issues. I uploaded it to GitHub and it is available here. Be aware that I just made some small changes; it is mostly the work of deontologician and nobody is maintaining it.
With the netezza dialect I got pandas.to_sql() to work directly with the Netezza database:
import netezza_dialect
from sqlalchemy import create_engine

engine = create_engine("netezza://ODBCDataSourceName")

df.to_sql("YourTable",
          engine,
          if_exists='append',
          index=False,
          dtype=your_dtypes,
          chunksize=1600,
          method='multi')
A little explanation of the to_sql() parameters:
It is essential that you use the method='multi' parameter if you do not want pandas to take forever to write to the database, because without it pandas sends one INSERT query per row. You can use 'multi' or define your own insertion method. Be aware that you need at least pandas v0.24.0 to use it. See the docs for more info.
When using method='multi' it can happen (it happened to me, at least) that you exceed the parameter limit. In my case it was 1600, so I had to add chunksize=1600 to avoid this.
Note
If you get a warning or error like the following:
C:\Users\USER\anaconda3\envs\myenv\lib\site-packages\sqlalchemy\connectors\pyodbc.py:79: SAWarning: No driver name specified; this is expected by PyODBC when using DSN-less connections
"No driver name specified; "
pyodbc.InterfaceError: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Then you probably tried to connect to the database via
engine = create_engine("netezza://usr:pass@address:port/database_name")
You have to set up the database in the Windows ODBC Data Source Administrator tool and then use the name you defined there.
engine = create_engine("netezza://ODBCDataSourceName")
Then it should have no problem finding the driver.
I know you already answered the question yourself (thanks for sharing the solution).
One general comment about large data-writes to Netezza:
I'd always choose to write the data to a file and then use the external table/ODBC interface to insert it. Instead of inserting 1600 rows at a time, you can probably insert millions of rows in the same timeframe.
We use UTF-8 data in the flat file and CSV format, unless you want to load binary data, which will probably require fixed-width files.
I'm not Python-savvy, but I hope you can follow me ...
If you need a documentation link, you can start here: https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_syntax.html
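A rough sketch of that approach from Python, assuming df is the DataFrame from the question, a pyodbc DSN, and a hypothetical target table my_table; the external-table options (SAMEAS, DELIMITER, REMOTESOURCE 'ODBC') follow the linked documentation, but verify them against your Netezza version:

import pandas as pd
import pyodbc

# write the DataFrame to a flat file first (no header, columns in table order)
df.to_csv("/tmp/my_table.csv", index=False, header=False, encoding="utf-8")

conn = pyodbc.connect("DSN=ODBCDataSourceName")
cursor = conn.cursor()

# bulk-load the file through Netezza's external table interface
cursor.execute("""
    INSERT INTO my_table
    SELECT * FROM EXTERNAL '/tmp/my_table.csv'
    SAMEAS my_table
    USING (DELIMITER ',' REMOTESOURCE 'ODBC')
""")
conn.commit()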
