Im executing the following code, the purposes of the exeuction is to create a lookup-table in the Oracle data base to speed up my load of data. The table I want to load in is simply a vector with ID values, so only one column is loaded.
The code is written per below:
lookup = df.id_variable.drop_duplicates()
conn = my_oracle_connection()
obj = lookup.to_sql(name = 'lookup', con = conn, if_exists = 'replace')
I get the following error when exeucting this:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master
WHERE type='table' AND name=?;': ORA-01036: illegal variable
name/number
I can execute a psql.read_sql() query but above fails.
Now, I dont exactly know how to go about fixing it, im quite new to the technical aspects of getting this to work so any pointers in what direction to take it would be greately appriciated.
Thanks for any time and input!
I had the same issue when using cx_Oracle connection (I was able to use .read_sql function, but not the .to_sql one)
Use SQLalchemy connection instead:
import sqlalchemy as sa
oracle_db = sa.create_engine('oracle://username:password#database')
connection = oracle_db.connect()
dataframe.to_sql('table_name', connection, schema='schema_name', if_exists='append', index=False)
I think the problem happens writing to the Oracle DB using a connection object created by cx_Oracle. SqlAlchemy has a work around:
import cx_Oracle
from sqlalchemy import types, create_engine
conn = create_engine('oracle+cx_oracle://Jeremy:SuperSecret#databasehost:1521/?service_name=gdw')
df.to_sql('TEST', conn, if_exists='replace')
Related
I am trying to use python sqlalchemy to query our PostgreSQL database view using ODBC but I am getting the error
{ProgrammingError}(pyodbc.ProgrammingError) ('42883', '[42883] ERROR: function schema_name() does not exist;\nError while executing the query (1) (SQLExecDirectW)')
[SQL: SELECT schema_name()]
(Background on this error at: https://sqlalche.me/e/14/f405)
Using the code below, I successfully create the connection engine but executing the query seems to be the problem.
When using 'pyodbc' or 'psycopg2' establishing the connection and querying data does work perfectly, but with a warning
'UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy
warnings.warn('
as to why we are looking into establishing the connection the sqlalchemy-way
import config
import sqlalchemy
if __name__ == '__main__':
connection_string = (config.odbc('database_odbc.txt'))
connection_url = sqlalchemy.engine.url.URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
conn = sqlalchemy.create_engine(connection_url)
query_string = """SELECT [column name in view] FROM public.[name of view]"""
df1 = pd.read_sql(query_string, conn)
print(df1.to_string())
conn.close()
print('Database connection closed.')
As mentioned, the query runs perfectly using the other methods. I already tried different syntax of the database view including
SELECT [column name in view] FROM [database name].public.[name of view]
SELECT [column name in view] FROM [name of view]
and more without success.
Any help is appreciated, thank you!
Thank you #Gord Thompson,
I followed the default postgresql syntax at https://docs.sqlalchemy.org/en/14/core/engines.html
engine = create_engine('postgresql://scott:tiger#localhost/mydatabase')
now the code looks like
import sqlalchemy
if __name__ == '__main__':
engine = create_engine('postgresql://[user]:[password]#[host]/[db]')
conn = engine.connect()
query_string = """SELECT [column name in view] FROM public.[name of view]"""
df1 = pd.read_sql(query_string, conn)
print(df1.to_string())
conn.close()
print('Database connection closed.')
and now it works perfectly, thank you!
Using the psycopg2 module to connect to the PostgreSQL database using python. Able to execute all queries using the below connection method. Now I want to specify a different schema than public to execute my SQL statements. Is there any way to specify the schema name in the connection method?
conn = psycopg2.connect(host="localhost",
port="5432",
user="postgres",
password="password",
database="database",
)
I tried to specify schema directly inside the method.
schema="schema2"
But I am getting the following programming error.
ProgrammingError: invalid dsn: invalid connection option "schema"
When we were working on ThreadConnectionPool which is in psycopg2 and creating connection pool, this is how we did it.
from psycopg2.pool import ThreadedConnectionPool
db_conn = ThreadedConnectionPool(
minconn=1, maxconn=5,
user="postgres", password="password", database="dbname", host="localhost", port=5432,
options="-c search_path=dbo,public"
)
You see that options key there in params. That's how we did it.
When you execute a query using the cursor from that connection, it will search across those schemas mentioned in options i.e., dbo,public in sequence from left to right.
You may try something like this:
psycopg2.connect(host="localhost",
port="5432",
user="postgres",
password="password",
database="database",
options="-c search_path=dbo,public")
Hope this might help you.
If you are using the string form you need to URL escape the options argument:
postgresql://localhost/airflow?options=-csearch_path%3Ddbo,public
(%3D = URL encoding of =)
This helps if you are using SQLAlchemy for example.
This is a fairly common question but even using the answers on SO like here but I still can't connect.
When I setup my connection to pyodbc I can connect with the following:
cnxn = pyodbc.connect('DRIVER={SQL Server Native Client 11.0};SERVER=ip,port;DATABASE=db;UID=user;PWD=pass')
cursor = cnxn.cursor()
cursor.execute("some select query")
for row in cursor.fetchall():
print(row)
and it works.
However to do a .read_sql() in pandas I need to connect with sqlalchemy.
I have tried with both hosted connections and pass-through pyodbc connections like the below:
quoted = urllib.parse.quote_plus('DRIVER={SQL Server Native Client 11.0};Server=ip;Database=db;UID=user;PWD=pass;Port=port;')
engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
engine.connect()
I have tried with both SERVER=ip,port format and the separate Port=port parameter like above but still no luck.
The error I'm getting is Login failed for user 'user'. (18456)
Any help is much appreciated.
I assume that you want to create a DataFrame so when you have a cnxn you can pass it to Pandas read_sql_query function.
Example:
cnxn = pyodbc.connect('your connection string')
query = 'some query'
df = pandas.read_sql_query(query, conn)
I'm trying to follow the method for inserting a Panda data frame into SQL Server that is mentioned here as it appears to be the fastest way to import lots of rows.
However I am struggling with figuring out the connection parameter.
I am not using DSN , I have a server name, a database name, and using trusted connection (i.e. windows login).
import sqlalchemy
import urllib
server = 'MYServer'
db = 'MyDB'
cxn_str = "DRIVER={SQL Server Native Client 11.0};SERVER=" + server +",1433;DATABASE="+db+";Trusted_Connection='Yes'"
#cxn_str = "Trusted_Connection='Yes',Driver='{ODBC Driver 13 for SQL Server}',Server="+server+",Database="+db
params = urllib.parse.quote_plus(cxn_str)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
conn = engine.connect().connection
cursor = conn.cursor()
I'm just not sure what the correct way to specify my connection string is. Any suggestions?
I have been working with pandas and SQL server for a while and the fastest way I found to insert a lot of data in a table was in this way:
You can create a temporary CSV using:
df.to_csv('new_file_name.csv', sep=',', encoding='utf-8')
Then use pyobdc and BULK INSERT Transact-SQL:
import pyodbc
conn = pyodbc.connect(DRIVER='{SQL Server}', Server='server_name', Database='Database_name', trusted_connection='yes')
cur = conn.cursor()
cur.execute("""BULK INSERT table_name
FROM 'C:\\Users\\folders path\\new_file_name.csv'
WITH
(
CODEPAGE = 'ACP',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)""")
conn.commit()
cur.close()
conn.close()
Then you can delete the file:
import os
os.remove('new_file_name.csv')
It was a second to charge a lot of data at once into SQL Server. I hope this gives you an idea.
Note: don't forget to have a field for the index. It was my mistake when I started to use this lol.
Connection string parameter values should not be enclosed in quotes so you should use Trusted_Connection=Yes instead of Trusted_Connection='Yes'.
While trying to write a pandas' dataframe into sql-server, I get this error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW); [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Statement(s) could not be prepared. (8180)")
It seems pandas is looking into sqlite instead of the real database.
It's not a connection problem since I can read from the sql-server with the same connection using pandas.read_sql
The connection has been set using
sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
It's not a database permission problem either since I can write line by line using the same connection parameters as:
cursor = conn.cursor()
cursor.execute('insert into test values (1, 'test', 10)')
conn.commit()
I could just write a loop to instert line by line but I would like to know why to_sql isn't working for me, and I am affraid it won't be as efficient.
Environment:
Python: 2.7
Pandas: 0.20.1
sqlalchemy: 1.1.12
Thanks in advance.
runnable example:
import pandas as pd
from sqlalchemy import create_engine
import urllib
params = urllib.quote_plus("DRIVER={SQL Server Native Client 11.0};SERVER=
<servername>;DATABASE=<databasename>;UID=<username>;PWD=<password>")
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
test = pd.DataFrame({'col1':1, 'col2':'test', 'col3':10}, index=[0])
conn=engine.connect().connection
test.to_sql("dbo.test", con=conn, if_exists="append", index=False)
According to the to_sql doc, the con parameter is either an SQLAchemy engine or the legacy DBAPI2 connection (sqlite3). Because you are passing the connection object rather than the SQLAlchemy engine object as the parameter, pandas is inferring that you're passing a DBAPI2 connection, or a SQLite3 connection since its the only one supported. To remedy this, just do:
myeng = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# Code to create your df
...
# Now write to DB
df.to_sql('table', myeng, index=False)
try this.
good to connect MS SQL server(SQL Authentication) and update data
from sqlalchemy import create_engine
params = urllib.parse.quote_plus(
'DRIVER={ODBC Driver 13 for SQL Server};'+
'SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
#df: pandas.dataframe; mTableName:table name in MS SQL
#warning: discard old table if exists
df.to_sql(mTableName, con=engine, if_exists='replace', index=False)
So I ran into this same thing. I tried looking through the code, couldn't figure out why it wasn't working but it looks like it gets stuck on this call.
pd.io.sql._is_sqlalchemy_connectable(engine)
I found that if I run this first it returns True, but as soon as I run it after running df.to_sql() it returns False. Right now I'm running it before I do the df.to_sql() and it actually works.
Hope this helps.