I am setting up a new computer at work. After installing Anaconda and the other packages I have on my old computer, I am trying to run code that works fine there.
However, when I use SQLAlchemy to load data into Redshift, I get a new error that I can't find anything about on Google:
'SQLTable' object has no attribute 'insert_statement'
This appears to be an issue with pandas.io.sql, but I have no clue what it is.
Here is the code block:
import io
from pandas.io.sql import SQLTable
def _execute_insert(self, conn, keys, data_iter):
    print("Using monkey-patched _execute_insert")
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))
SQLTable._execute_insert = _execute_insert
import time
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import text
dbschema='xref'
engine = create_engine('not_showing_you_this_part',
                       connect_args={'options': '-csearch_path={}'.format(dbschema)})
# test
from sqlalchemy import event, create_engine
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()
# end test
api_start_time = time.time()
print('starting SQL query')
# change df to the dataframe you want to upload
# under name = : enter in the name of the table you want to create or append to
df.to_sql(name='computer_test', con = engine, if_exists = 'append',index=False)
print('sql insert took: ' + str((time.time() - api_start_time)) + ' seconds')
For reference, the monkey-patch part is from:
How to speed up insertion from pandas.DataFrame .to_sql
The full error is in the image.
I kept searching for the answer, only to find that someone had already answered your question in the comments.
I was using very similar code to connect and insert into Redshift.
My mistake was using the line below:
conn.execute(self.insert_statement().values(data))
Replace the above with the code below:
conn.execute(self.table.insert().values(data))
Shoutout to https://stackoverflow.com/users/6560549/supershoot for answering it in the comments.
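Put together, the corrected monkey-patch might look like the sketch below. It assumes a pandas/SQLAlchemy combination in which SQLTable exposes the SQLAlchemy Table as self.table. The in-memory SQLite engine and the small DataFrame are stand-ins I made up, since the Redshift connection and the real data are not shown in the question.

```python
import pandas as pd
from pandas.io.sql import SQLTable
from sqlalchemy import create_engine


def _execute_insert(self, conn, keys, data_iter):
    # Build one multi-row INSERT instead of one statement per row.
    data = [dict(zip(keys, row)) for row in data_iter]
    # self.table is the SQLAlchemy Table object held by SQLTable.
    result = conn.execute(self.table.insert().values(data))
    return result.rowcount


# Replace the pandas default with the patched version.
SQLTable._execute_insert = _execute_insert

# Assumption: in-memory SQLite stands in for the Redshift engine.
engine = create_engine("sqlite://")
df = pd.DataFrame({"a": [3, 5], "b": [4, 6]})
df.to_sql(name="computer_test", con=engine, if_exists="append", index=False)

loaded = pd.read_sql("SELECT * FROM computer_test", con=engine)
print(loaded)
```

pandas also accepts to_sql(..., method='multi'), which asks it to build multi-row INSERT statements itself, so the patch may not be needed at all on a recent install.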
Related
I am querying an Oracle database through cx_Oracle Python and getting the following error after 12 queries. How can I run multiple queries within the same session? Or do I need to quit the session after each query? If so, how do I do that?
DatabaseError: (cx_Oracle.DatabaseError) ORA-02391: exceeded
simultaneous SESSIONS_PER_USER limit (Background on this error at:
https://sqlalche.me/e/14/4xp6)
My code looks something like:
import pandas as pd
import cx_Oracle
import pytz
from sqlalchemy import create_engine
def get_query(item):
    ...
    return valid_query
cx_Oracle.init_oracle_client(lib_dir=r"\instantclient_21_3")
user = ...
password = ...
tz = pytz.timezone('US/Eastern')
dsn_tns = cx_Oracle.makedsn(...,
                            ...,
                            service_name=...)
cstr = f'oracle://{user}:{password}@{dsn_tns}'
engine = create_engine(cstr,
                       convert_unicode=False,
                       pool_recycle=10,
                       pool_size=50,
                       echo=False)
df_dicts = {}
for item in items:
    df = pd.read_sql_query(get_query(item), con=cstr)
    df_dicts[item.attribute] = df
Thank you!
You can use the cx_Oracle connection object directly in pandas. For Oracle connections, I have always found that this works better than SQLAlchemy for simple use as a pandas connection.
Something like:
conn = cx_Oracle.connect(f'{user}/{password}@{dsn_tns}')
df_dicts = {}
for item in items:
    df = pd.read_sql(sql=get_query(item), con=conn, parse_dates=True)
    df_dicts[item.attribute] = df
(Not sure if you had dates; I just remember that being necessary for parsing them.)
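The pattern above, one connection opened once and reused for every query, can be sketched with the standard library alone. Here sqlite3 stands in for cx_Oracle (both follow the Python DB-API), and the table, the items, and the body of get_query are made up for illustration.

```python
import sqlite3
from contextlib import closing


def get_query(item):
    # Hypothetical query builder; the original's body is elided.
    return "SELECT name, qty FROM stock WHERE name = ?", (item,)


items = ["apple", "pear"]
results = {}

# closing() guarantees that the single connection is closed at the end.
# With a real server, closing the connection is what frees the session.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE stock (name TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO stock VALUES (?, ?)",
                     [("apple", 3), ("pear", 5)])
    for item in items:
        sql, params = get_query(item)
        # Every query reuses the same connection object.
        results[item] = conn.execute(sql, params).fetchall()

print(results)
```

The same loop shape should carry over to pd.read_sql with a cx_Oracle connection: open the connection before the loop, pass it to every call, and close it afterwards.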
I am having problems loading data into an Access database. For testing purposes, I built a small convert function that takes all datasets from an HDF file and writes them into the accdb. Without the @event.listens_for(engine, "before_cursor_execute") functionality it works, but very slowly. With it, the behavior is odd: it creates only one empty table (from the first df) in the db and finishes execution. The for loop never completes and no error is raised.
Maybe it’s because the sqlalchemy-access package doesn’t support fast_executemany, but I couldn’t find any related information about it. Does anyone have input on how I can solve this, or how I can write data into the db faster?
Big thanks!
import urllib.parse
from pathlib import Path
import pandas as pd
from sqlalchemy import create_engine, event
# PATHS
HOME = Path(__file__).parent
DATA_DIR = HOME / 'output'
FILE_ACCESS = DATA_DIR / 'db.accdb'
FILE_HDF5 = DATA_DIR / 'Data.hdf'
# FUNCTIONS
def convert_from_hdf_to_accb():
    # https://github.com/gordthompson/sqlalchemy-access/wiki/Getting-Connected
    driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
    conn_str = 'DRIVER={};DBQ={};'.format(driver, FILE_ACCESS)
    conn_url = "access+pyodbc:///?odbc_connect={}".format(urllib.parse.quote_plus(conn_str))
    # https://medium.com/analytics-vidhya/speed-up-bulk-inserts-to-sql-db-using-pandas-and-python-61707ae41990
    # https://github.com/pandas-dev/pandas/issues/15276
    # https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc
    engine = create_engine(conn_url)
    @event.listens_for(engine, "before_cursor_execute")
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        if executemany:
            cursor.fast_executemany = True
    with pd.HDFStore(path=FILE_HDF5, mode="r") as store:
        for key in store.keys():
            df = store.get(key)
            df.to_sql(name=key, con=engine, index=False, if_exists='replace')
    print(' IT NEVER REACHES AND DOESNT RAISE AN ERROR :( ')
# EXECUTE
if __name__ == "__main__":
    convert_from_hdf_to_accb()
Maybe it’s because the sqlalchemy-access package doesn’t support fast_executemany
That is true. pyodbc's fast_executemany feature requires that the driver support an internal ODBC mechanism called "parameter arrays", and the Microsoft Access ODBC driver does not support them.
See also
https://github.com/mkleehammer/pyodbc/wiki/Driver-support-for-fast_executemany
I have a sqlite db in my home dir.
stephen@stephen-AO725:~$ pwd
/home/stephen
stephen@stephen-AO725:~$ sqlite db1
SQLite version 2.8.17
Enter ".help" for instructions
sqlite> select * from test
...> ;
3|4
5|6
sqlite> .quit
When I try to connect from a Jupyter notebook with sqlalchemy and pandas, something does not work.
db=sqla.create_engine('sqlite:////home/stephen/db1')
pd.read_sql('select * from db1.test',db)
~/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
578
579 def do_execute(self, cursor, statement, parameters, context=None):
--> 580 cursor.execute(statement, parameters)
581
582 def do_execute_no_params(self, cursor, statement, context=None):
DatabaseError: (sqlite3.DatabaseError) file is not a database
[SQL: select * from db1.test]
(Background on this error at: http://sqlalche.me/e/4xp6)
I also tried:
db=sqla.create_engine('sqlite:///~/db1')
same result
Personally, just to complete the code of @Stephen with the modules required:
# 1.-Load module
import sqlalchemy
import pandas as pd
#2.-Turn on database engine
dbEngine=sqlalchemy.create_engine('sqlite:////home/stephen/db1.db') # ensure this is the correct path for the sqlite file.
#3.- Read data with pandas
pd.read_sql('select * from test',dbEngine)
#4.- I also want to add a new table from a dataframe in sqlite (a small one)
df_todb.to_sql(name = 'newTable',con= dbEngine, index=False, if_exists='replace')
Another way to read is using the sqlite3 library, which may be more straightforward:
#1. - Load libraries
import sqlite3
import pandas as pd
# 2.- Create your connection.
cnx = sqlite3.connect('/home/stephen/db1.db')  # sqlite3 takes a plain file path, not a sqlite:/// URL
cursor = cnx.cursor()
# 3.- Query and print all the tables in the database engine
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cursor.fetchall())
# 4.- READ TABLE OF SQLITE CALLED test
dfN_check = pd.read_sql_query("SELECT * FROM test", cnx) # we need real name of table
# 5.- Now I want to delete all rows of this table
cnx.execute("DELETE FROM test;")
# 6. -COMMIT CHANGES! (mandatory if you want to save these changes in the database)
cnx.commit()
# 7.- Close the connection with the database
cnx.close()
Please let me know if this helps!
import sqlalchemy
engine=sqlalchemy.create_engine(f'sqlite:///db1.db')
Note that you need three slashes in sqlite:/// in order to use a relative path for the DB. If you want an absolute path, use four slashes: sqlite:////
Source: Link
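To see where the fourth slash comes from, here is a small sketch using only the standard library. The helper name sqlite_url is hypothetical; it simply prepends the three-slash scheme prefix to a POSIX-style path.

```python
from pathlib import PurePosixPath


def sqlite_url(path):
    """Build a SQLAlchemy SQLite URL from a POSIX-style path."""
    # The scheme prefix always contributes three slashes; an absolute
    # path brings its own leading slash, which makes four in total.
    return "sqlite:///" + str(PurePosixPath(path))


print(sqlite_url("db1.db"))
print(sqlite_url("/home/stephen/db1"))
```

As far as I know, a leading ~ is not expanded for you, so a path such as ~/db1 would need os.path.expanduser applied before it is turned into a URL.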
The issue is a lack of backward compatibility, as noted by Everila. Anaconda installs its own SQLite, which is version 3.x, and SQLite 3 cannot load databases created by SQLite 2.x.
After creating the db with SQLite 3, the code works fine:
db=sqla.create_engine('sqlite:////home/stephen/db1')
pd.read_sql('select * from test',db)
which confirms the 4 slashes are needed.
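One way to check a file before connecting is to look at its first bytes: a SQLite 3 database starts with the 16-byte string "SQLite format 3" followed by a NUL byte. The sketch below is an illustration rather than part of the original answers; the helper name and the throwaway demo database are made up.

```python
import os
import sqlite3
import tempfile

SQLITE3_MAGIC = b"SQLite format 3\x00"


def is_sqlite3_file(path):
    """Return True if the file starts with the SQLite 3 header string."""
    with open(path, "rb") as fh:
        return fh.read(len(SQLITE3_MAGIC)) == SQLITE3_MAGIC


# Demo on a throwaway database created by Python's bundled SQLite 3.
tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, "demo.db")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE test (a INTEGER, b INTEGER)")
conn.commit()
conn.close()

print(is_sqlite3_file(demo_path))
```

A file written by the old sqlite 2.8 command-line tool would fail this check, which matches the "file is not a database" error above.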
None of the sqlalchemy solutions worked for me with Python 3.10.6 and sqlalchemy 2.0.0b4; it could be a beta issue, or version 2.0.0 changed things. @corina-roca's solution was close, but not quite right, since you need to pass a connection object, not an engine object. That is what the documentation says, but it didn't actually work for me. After a bit of experimentation, I discovered that engine.raw_connection() works, although you get a warning on the CLI. Here are my working examples.
The sqlite one works out of the box - but it's not ideal if you are thinking of changing databases later
import sqlite3
import pandas as pd
conn = sqlite3.connect("/home/stephen/db1")  # sqlite3 takes a plain file path, not a sqlite:/// URL
df = pd.read_sql_query('SELECT * FROM test', conn)
df.head()
# works, no problem
sqlalchemy lets you abstract your db away
from sqlalchemy import create_engine, text
engine = create_engine("sqlite:////home/stephen/db1")
conn = engine.connect()  # <- this is also what you are supposed to
                         #    pass to pandas... it doesn't work
result = conn.execute(text("select * from test"))
for row in result:
    print(row)  # outside pandas, this works - proving that
                # the connection is established
conn = engine.raw_connection()  # with this workaround, it works; but you
                                # get a warning: UserWarning: pandas only
                                # supports SQLAlchemy connectable ...
df = pd.read_sql_query(sql='SELECT * FROM test', con=conn)
df.head()
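For comparison, here is a sketch of the documented route: passing a SQLAlchemy Connection together with a text() query. My understanding is that pandas only added SQLAlchemy 2.0 support in pandas 2.0, so an older pandas release may be what fails above; treat that as an assumption to check against your installed versions. The in-memory database and sample rows are made up so that the snippet is self-contained.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite with made-up rows instead of /home/stephen/db1.
engine = create_engine("sqlite://")
with engine.begin() as conn:  # begin() commits when the block exits cleanly
    conn.execute(text("CREATE TABLE test (a INTEGER, b INTEGER)"))
    conn.execute(text("INSERT INTO test VALUES (3, 4), (5, 6)"))

with engine.connect() as conn:
    # Pass the Connection itself and wrap the SQL string in text().
    df = pd.read_sql_query(sql=text("SELECT * FROM test"), con=conn)

print(df)
```

If this fails on your machine while the raw_connection() workaround succeeds, comparing pd.__version__ with sqlalchemy.__version__ is probably the first thing to look at.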
I have this code:
import teradata
import dask.dataframe as dd
login = login
pwd = password
udaExec = teradata.UdaExec (appName="CAF", version="1.0",
logConsole=False)
session = udaExec.connect(method="odbc", DSN="Teradata",
USEREGIONALSETTINGS='N', username=login,
password=pwd, authentication="LDAP");
And the connection is working.
I want to get a dask dataframe. I have tried this:
sqlStmt = "SOME SQL STATEMENT"
df = dd.read_sql_table(sqlStmt, session, index_col='id')
And I'm getting this error message:
AttributeError: 'UdaExecConnection' object has no attribute '_instantiate_plugins'
Does anyone have a suggestion?
Thanks in advance.
read_sql_table expects a SQLAlchemy connection string, not a "session" as you are passing. I have not heard of Teradata being used via SQLAlchemy, but apparently there is at least one connector you could install, and possibly other solutions using the generic ODBC driver.
However, you may wish to use a more direct approach using delayed, something like
from dask import delayed
import dask.dataframe as dd
# make a set of statements for each partition; boundslist holds (low, high) pairs
statements = [sqlStmt + " where id > {} and id <= {}".format(*bounds)
              for bounds in boundslist]  # I don't know syntax for tera
def get_part(statement):
    # however you make a concrete dataframe from a SQL statement
    udaExec = ...
    session = ...
    df = ...
    return df
# ideally you should provide the meta and divisions info here
df = dd.from_delayed([delayed(get_part)(stm) for stm in statements],
                     meta=..., divisions=...)
We will be interested to hear of your success.
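The per-partition statements can be generated from a list of boundary values. The sketch below uses only the standard library; the helper name, the table name t, and the edge values are made up, and the where clause follows the generic form used above rather than verified Teradata syntax.

```python
def partition_statements(base_sql, edges, column="id"):
    """Return one SQL statement per consecutive (low, high] pair of edges."""
    bounds = zip(edges[:-1], edges[1:])
    return [
        "{} where {} > {} and {} <= {}".format(base_sql, column, lo, column, hi)
        for lo, hi in bounds
    ]


statements = partition_statements("select * from t", [0, 100, 200, 300])
for stm in statements:
    print(stm)
```

Each statement can then be handed to delayed(get_part). Note that rows whose id is at or below the first edge are not covered, so the first edge should sit just below the smallest id.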
I have a Pyramid / SQLAlchemy, MySQL python app.
When I execute a raw SQL INSERT query, nothing gets written to the DB.
When using ORM, however, I can write to the DB. I read the docs, I read up about the ZopeTransactionExtension, read a good deal of SO questions, all to no avail.
What hasn't worked so far:
transaction.commit() - nothing is written to the DB. I do realize this statement is necessary with ZopeTransactionExtension but it just doesn't do the magic here.
dbsession().commit - doesn't work since I'm using ZopeTransactionExtension
dbsession().close() - nothing written
dbsession().flush() - nothing written
mark_changed(session) -
File "/home/dev/.virtualenvs/sc/local/lib/python2.7/site-packages/zope/sqlalchemy/datamanager.py", line 198, in join_transaction
if session.twophase:
AttributeError: 'scoped_session' object has no attribute 'twophase'"
What has worked but is not acceptable because it doesn't use scoped_session:
engine.execute(...)
I'm looking for how to execute raw SQL with a scoped_session (dbsession() in my code)
Here is my SQLAlchemy setup (models/__init__.py)
def dbsession():
    assert (_dbsession is not None)
    return _dbsession
def init_engines(settings, _testing_workarounds=False):
    import zope.sqlalchemy
    extension = zope.sqlalchemy.ZopeTransactionExtension()
    global _dbsession
    _dbsession = scoped_session(
        sessionmaker(
            autoflush=True,
            expire_on_commit=False,
            extension=extension,
        )
    )
    engine = engine_from_config(settings, 'sqlalchemy.')
    _dbsession.configure(bind=engine)
Here is a python script I wrote to isolate the problem. It resembles the real-world environment of where the problem occurs. All I want is to make the below script insert the data into the DB:
# -*- coding: utf-8 -*-
import sys
import transaction
from pyramid.paster import setup_logging, get_appsettings
from sc.models import init_engines, dbsession
from sqlalchemy.sql.expression import text
def __main__():
    if len(sys.argv) < 2:
        raise RuntimeError()
    config_uri = sys.argv[1]
    setup_logging(config_uri)
    aa = init_engines(get_appsettings(config_uri))
    session = dbsession()
    session.execute(text("""INSERT INTO
        operations (description, generated_description)
        VALUES ('hello2', 'world');"""))
    print list(session.execute("""SELECT * from operations""").fetchall()) # prints inserted data
    transaction.commit()
    print list(session.execute("""SELECT * from operations""").fetchall()) # doesn't print inserted data
if __name__ == '__main__':
    __main__()
What is interesting, if I do:
session = dbsession()
session.execute(text("""INSERT INTO
operations (description, generated_description)
VALUES ('hello2', 'world');"""))
op = Operation(generated_description='aa', description='oo')
session.add(op)
then the first print outputs the raw SQL inserted row ('hello2' 'world'), and the second print prints both rows, and in fact both rows are inserted into the DB.
I cannot comprehend why using an ORM insert alongside raw SQL "fixes" it.
I really need to be able to call execute() on a scoped_session to insert data into the DB using raw SQL. Any advice?
It has been a while since I mixed raw sql with sqlalchemy, but whenever you mix them, you need to be aware of what happens behind the scenes with the ORM. First, check the autocommit flag. If the zope transaction is not configured correctly, the ORM insert might be triggering a commit.
Actually, after looking at the zope docs, it seems manual execute statements need an extra step. From their readme:
By default, zope.sqlalchemy puts sessions in an 'active' state when they are
first used. ORM write operations automatically move the session into a
'changed' state. This avoids unnecessary database commits. Sometimes it
is necessary to interact with the database directly through SQL. It is not
possible to guess whether such an operation is a read or a write. Therefore we
must manually mark the session as changed when manual SQL statements write
to the DB.
>>> session = Session()
>>> conn = session.connection()
>>> users = Base.metadata.tables['test_users']
>>> conn.execute(users.update(users.c.name=='bob'), name='ben')
<sqlalchemy.engine...ResultProxy object at ...>
>>> from zope.sqlalchemy import mark_changed
>>> mark_changed(session)
>>> transaction.commit()
>>> session = Session()
>>> str(session.query(User).all()[0].name)
'ben'
>>> transaction.abort()
It seems you aren't doing that, and so the transaction.commit does nothing.
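Applied to the script in the question, that extra step might look like the sketch below. The helper name execute_raw_write is hypothetical. It also reflects my reading of the 'twophase' traceback: mark_changed seems to need the real Session, which you get by calling the scoped_session, rather than the scoped_session registry itself. Treat that as an assumption, not a confirmed diagnosis.

```python
def execute_raw_write(scoped, statement, params=None):
    """Run a raw SQL write and flag the session as changed.

    `scoped` is the scoped_session registry (what dbsession() returns
    in the question's setup).
    """
    # Imported lazily so the sketch can be read without the package installed.
    from zope.sqlalchemy import mark_changed

    # Calling the registry yields the real Session object.
    session = scoped()
    result = session.execute(statement, params or {})
    # Tell the transaction manager that there is something to commit.
    mark_changed(session)
    return result
```

Usage would be along the lines of execute_raw_write(dbsession(), text("INSERT INTO operations ...")) followed by transaction.commit().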