I'm trying to understand what this code is doing behind the scenes:
import psycopg2
c = psycopg2.connect('dbname=some_db user=me').cursor()
c.execute('select * from some_table')
for row in c:
    pass
Per PEP 249 my understanding was that this was repeatedly calling Cursor.next() which is the equivalent of calling Cursor.fetchone(). However, the psycopg2 docs say the following:
When a database query is executed, the Psycopg cursor usually fetches
all the records returned by the backend, transferring them to the
client process.
So I'm confused -- when I run the code above, is it storing the results on the server and fetching them one by one, or is it bringing over everything at once?
It depends on how you configure psycopg2. See itersize and server side cursors.
By default it fetches all rows into client memory, then just iterates over the fetched rows with the cursor. But per the above docs, you can configure batch fetches from a server-side cursor instead.
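For example, here is a minimal sketch (connection parameters and table name are placeholders) of the batch-fetching alternative: a named cursor in psycopg2 is a server-side cursor, so rows are streamed to the client in batches of itersize instead of all at once.
import psycopg2

conn = psycopg2.connect('dbname=some_db user=me')

# A *named* cursor is a server-side cursor: the result set stays on the server
# and rows are transferred in batches of `itersize` as you iterate.
cur = conn.cursor(name='my_server_side_cursor')
cur.itersize = 500  # rows fetched per network round trip (default is 2000)
cur.execute('select * from some_table')

for row in cur:
    pass  # still one row per iteration, but only `itersize` rows are held client-side at a time

cur.close()
conn.close()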
I need to extract a bunch of records from a SingleStore database and insert them into another table. For performance, the ideal way to do this is to create a query string with an INSERT INTO … SELECT statement and iterate through it on a daily basis.
I can't seem to get Python to actually execute the query in the database, even though it appears to run successfully.
fh = 'file_containing_insert_select_query.sql'
qry = open(fh).read()

for i in range(2):
    qry_new = some_custom_function_to_replace_dates(qry, i)
    engine = tls.custom_engine_function()
    engine.execute(qry_new)
I've verified that the sql statements created by my custom function can be copy/pasted to a sql editor and executed successfully, but it won't run in python... any thoughts?
After executing the above query, you need to commit the transaction (e.g. by calling connection.commit(), where connection is the database connection object), otherwise the rows inserted by the Python program are not saved.
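Here is a minimal sketch of the fix applied to the code from the question (tls.custom_engine_function and some_custom_function_to_replace_dates are the question's own helpers, assumed to return a SQLAlchemy engine and a rewritten query string). One way to guarantee the commit is to run each statement inside engine.begin(), which commits automatically when the block exits cleanly:
from sqlalchemy import text

qry = open('file_containing_insert_select_query.sql').read()
engine = tls.custom_engine_function()

for i in range(2):
    qry_new = some_custom_function_to_replace_dates(qry, i)
    # engine.begin() opens a transaction and commits it when the block
    # exits without an error, so the inserted rows are actually persisted.
    with engine.begin() as conn:
        conn.execute(text(qry_new))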
If you want it to run really fast, it's usually better to use set-oriented SQL, like INSERT INTO targetTbl … SELECT FROM …; that way the data doesn't have to round-trip through the client app.
I learned from a helpful post on StackOverflow about how to call stored procedures on SQL Server in python (pyodbc). After modifying my code to what is below, I am able to connect and run execute() from the db_engine that I created.
import pyodbc
import sqlalchemy as sal
from sqlalchemy import create_engine
import pandas as pd
import urllib
params = urllib.parse.quote_plus(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=myserver.com;'
    'DATABASE=mydb;'
    'UID=foo;'
    'PWD=bar')
connection_string = f'mssql+pyodbc:///?odbc_connect={params}'
db_engine = create_engine(connection_string)
db_engine.execute("EXEC [dbo].[appDoThis] 'MYDB';")
<sqlalchemy.engine.result.ResultProxy at 0x1121f55e0>
db_engine.execute("EXEC [dbo].[appDoThat];")
<sqlalchemy.engine.result.ResultProxy at 0x1121f5610>
However, even though no errors are returned after running the above code in Python, when I check the database, I confirm that nothing has been executed (what is more telling is that the above commands take one or two seconds to complete whereas running these stored procedures successfully on the database admin tool takes about 5 minutes).
How should I understand what is not working correctly in the above setup in order to properly debug? I literally run the exact same code through my database admin tool with no issues - the stored procedures execute as expected. What could be preventing this from happening via Python? Does the executed SQL need to be committed? Is there a way to debug using the ResultProxy that is returned? Any advice here would be appreciated.
Calling .execute() directly on an Engine object is an outdated usage pattern and will emit deprecation warnings starting with SQLAlchemy version 1.4. These days the preferred approach is to use a context manager (with block) that uses engine.begin():
import sqlalchemy as sa
# …
with engine.begin() as conn:  # transaction starts here
    conn.execute(sa.text("EXEC [dbo].[appDoThis] 'MYDB';"))
# On exiting the `with` block the transaction will automatically be committed
# if no errors have occurred. If an error has occurred the transaction will
# automatically be rolled back.
Notes:
When passing an SQL command string it should be wrapped in a SQLAlchemy text() object.
SQL Server stored procedures (and anonymous code blocks) should begin with SET NOCOUNT ON; in the overwhelming majority of cases. Failure to do so can result in legitimate results or errors getting "stuck behind" any row counts that may have been emitted by DML statements like INSERT, UPDATE, or DELETE.
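For example, a hedged sketch of the same call with SET NOCOUNT ON; prepended to the batch (the procedure and database names are taken from the question):
with engine.begin() as conn:
    # SET NOCOUNT ON suppresses the "n rows affected" messages from DML inside
    # the procedure, so results and errors are not stuck behind row counts.
    conn.execute(sa.text("SET NOCOUNT ON; EXEC [dbo].[appDoThis] 'MYDB';"))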
I searched the web and the Stack Overflow site in particular, and I couldn't find any simple explanation as to the role a cursor plays in PyMySQL. Why is it required? what function does it fulfill? Can I have multiple cursors? Can I pass it as an argument to a class or a function?
Looking at tutorials with examples I wrote code that uses cursors and does work. But so far the use of cursors is counter intuitive to me without really understanding their role and function.
Please help...
The cursor in MySQL is used in most cases to retrieve rows from your resultset and then perform operations on that data. The cursor enables you to iterate over returned rows from an SQL query.
Here is an example.
1) First we declare a cursor:
DECLARE cursor_name CURSOR FOR SELECT_statement;
2) Let's open the cursor.
OPEN cursor_name;
3) Now we can use the FETCH statement to retrieve the next row in the result set.
(Recall the syntax for the FETCH statement: FETCH [ NEXT [ FROM ] ] cursor_name INTO variable_list;. As you can see, cursor is within the syntax, so it is a vital part of the FETCH statement).
FETCH cursor_name INTO variable_list;
4) Summary: Okay, so we have used our cursor_name to FETCH the next row, and we store that in variable_list (a list of variables, comma-separated, where the cursor result should be stored).
This should illustrate the following:
FETCH uses the MySQL cursor to fetch the next row in a result set.
The cursor is a tool to iterate over your rows in a resultset, one row at a time.
The PyMySQL cursor
PyMySQL is used to "interact" with the database. However, take a look at PEP 249 which defines the Python Database API Specification.
PyMySQL is based on the PEP 249 specification, so the cursor is derived from the PEP 249 specification.
And in PEP 249 we see this:
https://www.python.org/dev/peps/pep-0249/#cursor-objects
"Cursor Objects
These objects represent a database cursor, which is used to manage the context of a fetch operation. Cursors created from the same connection are not isolated, i.e., any changes done to the database by a cursor are immediately visible by the other cursors. Cursors created from different connections can or can not be isolated, depending on how the transaction support is implemented (see also the connection's .rollback() and .commit() methods)."
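To tie that back to PyMySQL, here is a minimal sketch (host, credentials and table name are placeholders): the connection represents the database session, and each cursor manages the context of one fetch operation, exactly as PEP 249 describes.
import pymysql

conn = pymysql.connect(host='localhost', user='me', password='secret', database='mydb')
try:
    with conn.cursor() as cur:
        cur.execute('SELECT id, name FROM some_table WHERE id > %s', (10,))
        for row in cur:       # iterate over the result set one row at a time
            print(row)
    conn.commit()             # PEP 249 connections do not autocommit by default
finally:
    conn.close()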
I'm currently using the cx_Oracle module in Python to connect to my Oracle database. I would like to only allow the user of the program to run read-only statements, like SELECT, and NOT INSERT/DELETE queries.
Is there something I can do to the connection/cursor variables once I establish the connection to prevent writable queries?
I am using the Python Language.
Appreciate any help.
Thanks.
One possibility is to issue the statement "set transaction read only" as in the following code:
import cx_Oracle
conn = cx_Oracle.connect("cx_Oracle/welcome")
cursor = conn.cursor()
cursor.execute("set transaction read only")
cursor.execute("insert into c values (1, 'test')")
That will result in the following error:
ORA-01456: may not perform insert/delete/update operation inside a READ ONLY transaction
Of course you'll have to make sure that you create a Connection class that calls this statement when it is first created and after each and every commit() and rollback() call. And it can still be circumvented by calling a PL/SQL block that performs a commit or rollback.
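Here is a hedged sketch of that idea: a Connection subclass that re-issues the statement when it is created and after every commit() and rollback() call (the DSN is a placeholder, and as noted this still doesn't stop PL/SQL that commits internally).
import cx_Oracle

class ReadOnlyConnection(cx_Oracle.Connection):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._set_read_only()

    def _set_read_only(self):
        cur = self.cursor()
        cur.execute("set transaction read only")
        cur.close()

    def commit(self):
        super().commit()
        self._set_read_only()   # re-arm read-only mode for the next transaction

    def rollback(self):
        super().rollback()
        self._set_read_only()

conn = ReadOnlyConnection("user/password@localhost/orclpdb1")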
The only other possibility that I can think of right now is to create a restricted user or role which simply doesn't have the ability to insert, update, delete, etc., and make sure the application uses that user or role. This one at least is foolproof, but a lot more effort up front!
I'm calling an extremely simple query from a Python program using the pymssql library.
with self.conn.cursor() as cursor:
    cursor.execute('select extra_id from mytable where id = %d', id)
    extra_id = cursor.fetchone()[0]
Note that parameter binding is used as described in pymssql documentation.
One of the main goals of parameter binding is to allow the DBMS engine to cache the query plan. I connected to MS SQL with Profiler and checked which queries are actually executed. It turned out that a unique statement gets executed each time (with its own bound ID). I also checked plan usage with the following query:
select * from sys.dm_exec_cached_plans ec
cross apply
sys.dm_exec_sql_text(ec.plan_handle) txt
where txt.text like '%select extra_id from mytable where id%'
And it showed that the plan is not reused (which is to be expected, of course, given the unique text of each query). This differs greatly from parameter binding when querying from C#, where we can see that the queries are the same but the supplied parameters differ.
So I wonder whether I am using pymssql correctly and whether this library is appropriate for use with the MS SQL DBMS.
P.S. I know that MS SQL has a feature of auto-parameterization which works for basic queries, but it is not guaranteed, and may not work for complex queries.
You are using pymssql correctly. It is true that pymssql actually does substitute the parameter values into the SQL text before sending the query to the server. For example:
pymssql:
SELECT * FROM tablename WHERE id=1
pyodbc with Microsoft's ODBC Driver for SQL Server (not the FreeTDS ODBC driver):
exec sp_prepexec @p1 output,N'@P1 int',N'SELECT * FROM tablename WHERE id=@P1',1
However, bear in mind that pymssql is based on FreeTDS and the above behaviour appears to be a function of the way FreeTDS handles parameterized queries, rather than a specific feature of pymssql per se.
And yes, it can have implications for the re-use of execution plans (and hence performance) as illustrated in this answer.
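As a hedged side-by-side sketch (server, credentials and table names are placeholders), here is the same lookup issued through both drivers:
import pymssql
import pyodbc

# pymssql: the value is interpolated into the SQL text before it is sent, so
# the server sees a literal statement such as
#   select extra_id from mytable where id = 42
conn = pymssql.connect(server='myserver', user='me', password='secret', database='mydb')
with conn.cursor() as cur:
    cur.execute('select extra_id from mytable where id = %d', (42,))
    print(cur.fetchone())

# pyodbc with Microsoft's ODBC driver: the value travels as a bound parameter,
# so the server sees sp_prepexec with @P1 and can reuse the cached plan.
conn2 = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                       'SERVER=myserver;DATABASE=mydb;UID=me;PWD=secret')
cur2 = conn2.cursor()
cur2.execute('select extra_id from mytable where id = ?', 42)
print(cur2.fetchone())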