Python: Having Trouble Connecting to Postgresql DB - python

I have been using the following lines of code for the longest time, without any hitch, but today it seems to have produced the following error and I cannot figure out why. The strange thing is that I have other scripts that use the same code and they all seem to work...
import pandas as pd
import psycopg2
link_conn_string = "host='<host>' dbname='<db>' user='<user>' password='<pass>'"
conn = psycopg2.connect(link_conn_string)
df = pd.read_sql("SELECT * FROM link._link_bank_report_lms_loan_application", link_conn_string)
Error Message:
"Could not parse rfc1738 URL from string '%s'" % name)
sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'host='<host>' dbname='<db>' user='<user>' password='<pass>''

change link_conn_string to be like here:
postgresql://[user[:password]#][netloc][:port][/dbname][?param1=value1&...]
Eg:
>>> import psycopg2
>>> cs = 'postgresql://vao#localhost:5432/t'
>>> c = psycopg2.connect(cs)
>>> import pandas as pd
>>> df = pd.read_sql("SELECT now()",c)
>>> print df;
now
0 2017-02-27 21:58:27.520372+00:00

Your connection string is wrong.
You should remove the line where you assign a string to "link_conn_string" variable and replace the next line with something like this (remember to replace localhost, postgres and secret with the name of the machine where postgresql is running, the user and password required to make that connection):
conn = psycopg2.connect(dbname="localhost", user="postgres", password="secret")
Also, you can check if the database is working with "psql" command, from the terminal, type (again, remember to change user and database):
psql -U postgresql database

Related

How to read a SQLite database file using polars package in Python

I want to read a SQLite database file (database.sqlite) using polars package. I tried following unsuccessfully:
import sqlite3
import polars as pl
conn = sqlite3.connect('database.sqlite')
df = pl.read_sql("SELECT * from table_name", conn)
print(df)
Getting following error:
AttributeError: 'sqlite3.Connection' object has no attribute 'split'
Any suggestions?
From the docs, you can see pl.read_sql accepts connection string as a param, and you are sending the object sqlite3.Connection, and that's why you get that message.
You should first generate the connection string, which is url for your db
db_path = 'database.sqlite'
connection_string = 'sqlite://' + db_path
And after that, you can type the updated next line, which gave you problems:
df = pl.read_sql("SELECT * from table_name", connection_string)

How do I run cx_Oracle queries within my session limit?

I am querying an Oracle database through cx_Oracle Python and getting the following error after 12 queries. How can I run multiple queries within the same session? Or do I need to quit the session after each query? If so, how do I do that?
DatabaseError: (cx_Oracle.DatabaseError) ORA-02391: exceeded
simultaneous SESSIONS_PER_USER limit (Background on this error at:
https://sqlalche.me/e/14/4xp6)
My code looks something like:
import pandas as pd
import cx_Oracle
from sqlalchemy import create_engine
def get_query(item):
...
return valid_query
cx_Oracle.init_oracle_client(lib_dir=r"\instantclient_21_3")
user = ...
password = ...
tz = pytz.timezone('US/Eastern')
dsn_tns = cx_Oracle.makedsn(...,
...,
service_name=...)
cstr = f'oracle://{user}:{password}#{dsn_tns}'
engine = create_engine(cstr,
convert_unicode=False,
pool_recycle=10,
pool_size=50,
echo=False)
df_dicts = {}
for item in items:
df = pd.read_sql_query(get_query(item), con=cstr)
df_dicts[item.attribute] = df
Thank you!
You can use the cx_Oracle connection object directly in Pandas - for Oracle connection I always found that worked better than sqlalchemy for simple use as a Pandas connection.
Something like:
conn = cx_Oracle.connect(f'{user}/{password}#{dsn_tns}')
df_dicts = {}
for item in items:
df = pd.read_sql(sql=get_query(item), con=conn, parse_dates=True)
df_dicts[item.attribute] = df
(Not sure if you had dates, I just remember that being a necessary element for parsing).

pd.read_sql, how to convert data types?

I am running a sql query from my python code and attempting to create a dataframe from it. When I execute the code, pandas produces the following error message:
pandas.io.sql.DatabaseError: Execution failed on sql '*my connection info*' : expecting string or bytes object
The relevant code is:
import cx_Oracle
import cx_Oracle as cx
import pandas as pd
dsn_tns = cx.makedsn('x.x.x.x', 'y',
service_name='xxx')
conn = cx.connect(user='x', password='y', dsn=dsn_tns)
sql_query1 = conn.cursor()
sql_query1.execute("""select * from *table_name* partition(p20210712) t""")
df = pd.read_sql(sql_query1,conn)
I was thinking to convert all values in query result to strings with df.astype(str) function, but I cannot find the proper way to accomplish this within the pd.read_sql statement. Would data type conversion correct this issue?

Reading a MySQL query with Python where output is empty

I'm trying to connect MySQL with python in order to automate some reports. By now, I'm just testing the connection. Seems it's working but here comes the problem: the output from my Python code is different from the one that I get in MySQL.
Here I attach the query used and the output that I can find in MySQL:
The testing query for the Python connection:
SELECT accountID
FROM Account
WHERE accountID in ('340','339','343');
The output from MySQL (using Dbiever). For this test, the column chosen contains integers:
accountID
1 339
2 340
3 343
Here I attach the actual output from my Python code:
today:
20200811
Will return true if the connection works:
True
Empty DataFrame
Columns: [accountID]
Index: []
In order to help you understand the problem, please find attached my python code:
import pandas as pd
import json
import pymysql
import paramiko
from datetime import date, time
tiempo_inicial = time()
today = date.today()
today= today.strftime("%Y%m%d")
print('today:')
print(today)
#from paramiko import SSHClient
from sshtunnel import SSHTunnelForwarder
**(part that contains all the connection information, due to data protection this part can't be shared)**
print('will return true if connection works:')
print(conn.open)
query = '''SELECT accountId
FROM Account
WHERE accountID in ('340','339','343');'''
data = pd.read_sql_query(query, conn)
print(data)
conn.close()
Under my point of view doesn't have a sense this output as the connection is working and the query it's being tested previously in MySQL with a positive output. I tried with other columns that contain names or dates and the result doesn't change.
Any idea why I'm getting this "Empty DataFrame" output?
Thanks

Is it possible to use dask dataframe with teradata python module?

I have this code:
import teradata
import dask.dataframe as dd
login = login
pwd = password
udaExec = teradata.UdaExec (appName="CAF", version="1.0",
logConsole=False)
session = udaExec.connect(method="odbc", DSN="Teradata",
USEREGIONALSETTINGS='N', username=login,
password=pwd, authentication="LDAP");
And the connection is working.
I want to get a dask dataframe. I have tried this:
sqlStmt = "SOME SQL STATEMENT"
df = dd.read_sql_table(sqlStmt, session, index_col='id')
And I'm getting this error message:
AttributeError: 'UdaExecConnection' object has no attribute '_instantiate_plugins'
Does anyone have a suggestion?
Thanks in advance.
read_sql_table expects a SQLalchemy connection string, not a "session" as you are passing. I have not heard of teradata being used via sqlalchemy, but apparently there is at least one connector you could install, and possibly other solutions using the generic ODBC driver.
However, you may wish to use a more direct approach using delayed, something like
from dask import delayed
# make a set of statements for each partition
statements = [sqlStmt + " where id > {} and id <= {}".format(bounds)
for bounds in boundslist] # I don't know syntax for tera
def get_part(statement):
# however you make a concrete dataframe from a SQL statement
udaExec = ..
session = ..
df = ..
return dataframe
# ideally you should provide the meta and divisions info here
df = dd.from_delayed([delayed(get_part)(stm) for stm in statements],
meta= , divisions=)
We will be interested to hear of your success.

Categories

Resources