PostgreSQL: All queries running slow except in pgAdmin or DBeaver - Python

I tried to run a Python program that queries our database, but any query I run with psycopg2 is very, very slow.
As an example, the same query took 47 ms in DBeaver but more than 3 minutes in Python!
In the past I tried to move from DBeaver to the Oracle client, but all my queries in Oracle were so slow that I decided to stay on DBeaver. However, scripting and running queries against the database is a requirement for my project.
Here is an example of the table I am querying, "bex":
ID | Name   | code | code_acr
---|--------|------|---------
1  | Paris  | PAR  | PAR
2  | Dijon  | DIJ  | DIJ
3  | Brest  | BRS  | BRT
4  | Toulon | TLN  | TLN
Here is the code I am using in Python:

import psycopg2

try:
    conn = psycopg2.connect(
        host="xxxxx.sogate-pacy.xxxxxx.fr",
        dbname="xxxxxx",
        user="xxxxxxx",
        password="<xxxxx>",
        port="5432",
        options="-c search_path=xxx",
        sslmode="disable"
    )
    cursor = conn.cursor()
    postgreSQL_select_Query = "SELECT * FROM bex"
    cursor.execute(postgreSQL_select_Query)
    ouvrage = cursor.fetchone()
    print("Print a row and its column values")
    print(cursor.fetchone())
except (Exception, psycopg2.Error) as error:
    print("Error while fetching data from PostgreSQL", error)
finally:
    # closing database connection.
    if conn:
        cursor.close()
        conn.close()
        print("PostgreSQL connection is closed")
I tried to make a Python script to get data from the database. Note that this table has only 10 rows in total, and the slowness happens even if the SELECT returns only one row.

Your Python code fetches the entire bex table into your Python process's memory, then processes the first row and throws the rest away. pgAdmin 4 and DBeaver, on the other hand, both use cursors (or something equivalent to them) to fetch only a small number of rows until you do something that calls for more. You can use a psycopg2 "named cursor" to get the same behavior in your own Python code as you get with pgAdmin 4.
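For illustration, a minimal sketch of that named-cursor approach, reusing the connection parameters from the question (the cursor name bex_cursor and the itersize value are arbitrary choices):

import psycopg2

conn = psycopg2.connect(
    host="xxxxx.sogate-pacy.xxxxxx.fr",
    dbname="xxxxxx",
    user="xxxxxxx",
    password="<xxxxx>",
    port="5432",
    options="-c search_path=xxx",
    sslmode="disable"
)

# Passing a name to cursor() creates a server-side ("named") cursor, so rows
# are streamed from the server in chunks instead of the whole result set
# being transferred before the first row is available.
with conn.cursor(name="bex_cursor") as cursor:
    cursor.itersize = 100  # rows per network round trip while iterating
    cursor.execute("SELECT * FROM bex")
    for row in cursor:
        print(row)

conn.close()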

Related

Unable to truncate table in postgres

I am trying to truncate a table in my Postgres DB using a python script:
conn = get_psql_conn()
cursor = conn.cursor()
cursor.execute("""TRUNCATE table table_name;""")
cursor.close()
conn.close()
Nothing happens.
The script finishes quickly, no error is raised.
The table still has its rows.
I was able to execute other queries with no problem using the same setup.
I would appreciate it if anyone could point out my mistake here!
Thanks
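For reference, a minimal sketch of the same script with an explicit commit, assuming get_psql_conn() returns an ordinary psycopg2 connection with autocommit off (the default), in which case the TRUNCATE is rolled back when the connection closes:

conn = get_psql_conn()  # assumed to return a psycopg2 connection with autocommit off
cursor = conn.cursor()
cursor.execute("""TRUNCATE table table_name;""")
conn.commit()  # without this, the open transaction is rolled back on close
cursor.close()
conn.close()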

Turbodbc - Getting Output from Multi-Statement Query

I am struggling to access the results of a stored procedure giving me the identity of the row just inserted using Turbodbc 4.1.2, Python 3.7, and SQL Server 2017.
My procedure runs along the following lines:
CREATE OR ALTER PROCEDURE [dbo].[testSP] @var INT
AS
INSERT INTO testTable VALUES (@var)
SELECT 4 --intermediate step to prove concept
--SELECT SCOPE_IDENTITY() as [scope_id] --final goal
--SELECT @@IDENTITY AS '[scope_id]'
My Turbodbc code looks like this:
cnxn = connect(driver='{ODBC Driver 17 for SQL Server}', server=srv, database=db, uid=user, pwd=password, turbodbc_options=options)
crsr = cnxn.cursor()
cmd = "EXEC testSP 1"
crsr.execute(cmd)
df = pd.DataFrame(crsr.fetchallnumpy())
When running the stored procedure without any inserts (i.e., just "SELECT 4"), the result set comes back fine. However, when running with the insert, which operates correctly, I receive the error "turbodbc.exceptions.InterfaceError: No active result set". The query runs fine in SSMS.
I am guessing that this is because I am receiving two result sets back: one for the insert and one for the select. I saw from a couple of questions on SO that a nextset function is available in pymssql and pyodbc, but that the same functionality is not available in turbodbc.
How can I access the second part in my multi-statement query using turbodbc? This seems like a relatively simple issue, but I have been banging my head against the wall for a few hours.
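One common workaround for this class of error, not confirmed in this thread, is to suppress the "rows affected" message from the INSERT so that the SELECT becomes the only result set the driver sees. A sketch with turbodbc, keeping the connection variables from the question:

from turbodbc import connect
import pandas as pd

cnxn = connect(driver='{ODBC Driver 17 for SQL Server}', server=srv, database=db,
               uid=user, pwd=password, turbodbc_options=options)
crsr = cnxn.cursor()

# SET NOCOUNT ON stops the INSERT inside the procedure from emitting its own
# row-count result, so the single active result set is the SELECT.
crsr.execute("SET NOCOUNT ON; EXEC testSP 1")
df = pd.DataFrame(crsr.fetchallnumpy())

The same effect can be had by putting SET NOCOUNT ON at the top of the stored procedure itself.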

Bulk Insert into SQL Server with Python not working

I'm attempting to bulk insert a CSV into a table in SQL Server. The catch is, the data doesn't match the columns of the destination table. The destination table has several audit columns that are not found in the source file. The solution I found for this is to insert into a view instead. The code is pretty simple:
from sqlalchemy import create_engine
engine = create_engine('mssql+pyodbc://[DNS]')
conn = engine.connect()
sql = "BULK INSERT [table view] FROM '[source file path]' WITH (FIELDTERMINATOR = ',',ROWTERMINATOR = '\n')"
conn.execute(sql)
conn.close()
When I run the SQL statement inside of SSMS it works perfectly. When I try to execute it from inside a Python script, the script runs but no data winds up in the table. What am I missing?
Update: It turns out bulk inserting into a normal table doesn't work either.
Before closing the connection, you need to call commit() or the SQL actions will be rolled back on connection close.
conn.commit()
conn.close()
It turns out that instead of using SQL Alchemy, I had to use pypyodbc. Not sure why this worked and the other way didn't. Example code found here: How to Speed up with Bulk Insert to MS Server from Python with Pyodbc from CSV
This works for me after checking the SQLAlchemy transactions reference. I don't explicitly call conn.commit(), since:
"The block managed by each .begin() method has the behavior such that the transaction is committed when the block completes."
with engine.begin() as conn:
    conn.execute(sql_bulk_insert)
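Put together, a minimal sketch of that pattern, with the DSN and file paths left as placeholders exactly as in the question (wrapping the raw SQL in text() is my addition so the string works across SQLAlchemy versions):

from sqlalchemy import create_engine, text

engine = create_engine('mssql+pyodbc://[DNS]')

sql_bulk_insert = text(
    "BULK INSERT [table view] FROM '[source file path]' "
    "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')"
)

# engine.begin() opens a connection and a transaction, and commits
# automatically when the block exits without an exception.
with engine.begin() as conn:
    conn.execute(sql_bulk_insert)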

Can't create table lock in mssql from python

I'm using ceODBC to connect to SQL Server 2014 from a CentOS 6 box, using Python 2.7.9.
In a critical part of our code, after inserting rows into a table, I want to double-check that all rows have arrived safely. I want to do this because sometimes an error happens, but ceODBC does not throw an error, and the table is empty.
To make sure that no other part of the code does any inserts between inserting the data and running the 'count statement', I want to lock the table. This is where I have my problem. It seems that there is an sp_getapplock built into SQL Server, but when I do the following:
import ceodbc
conn = # Make connection here
cursor = conn.cursor()
cursor.execute("declare #result int; exec #result = sp_getapplock #Resource='Dim_Date', #LockMode='Exclusive'; select #result").fetchall()
The result is sometimes 0, sometimes -999, but the table is never locked for other connections.
Does anyone know what I'm doing wrong?
(I added the pyodbc tag because I think the two drivers are similar.)
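One detail that may matter here (not confirmed in the thread): sp_getapplock defaults to @LockOwner = 'Transaction', so when it is called outside an explicit transaction it typically fails with -999, and even a granted lock is released as soon as the implicit transaction ends. Also, application locks are cooperative: they only block other sessions that request the same resource name via sp_getapplock; they never lock the Dim_Date table itself. A sketch that requests a session-owned lock instead, mirroring the call style from the question:

import ceodbc

conn = ...  # make the connection here, exactly as in the question
cursor = conn.cursor()

# Session-owned lock: held until sp_releaseapplock is called or the
# connection closes, independent of transaction boundaries.
result = cursor.execute(
    "declare @result int; "
    "exec @result = sp_getapplock @Resource='Dim_Date', "
    "@LockMode='Exclusive', @LockOwner='Session'; "
    "select @result"
).fetchall()
print(result)  # 0 or 1 means the lock was granted

# ... inserts and the count check go here ...

cursor.execute("exec sp_releaseapplock @Resource='Dim_Date', @LockOwner='Session'")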

How to update records in SQL Alchemy in a Loop

I am trying to use SQLSoup (the SQLAlchemy extension) to update records in a SQL Server 2008 database. I am using pyodbc for the connections. There are a number of issues which make it hard to find a relevant example.
I am reprojecting a geometry field in a very large table (2 million+ records), so many of the standard ways of updating fields cannot be used. I need to extract coordinates from the geometry field to text, convert them, and pass them back in. All this is fine, and all the individual pieces are working.
However, I want to execute a SQL UPDATE statement on each row while looping through the records one by one. I assume this places locks on the recordset, or that the connection is in use, because if I use the code below it hangs after successfully updating the first record.
Any advice on how to create a new connection, reuse the existing one, or accomplish this another way is appreciated.
s = select([text("%s as fid" % id_field),
            text("%s.STAsText() as wkt" % geom_field)],
           from_obj=[feature_table])
rs = s.execute()
for row in rs:
    new_wkt = ReprojectFeature(row.wkt)
    update_value = "geometry :: STGeomFromText('%s',%s)" % (new_wkt, "3785")
    update_sql = ("update %s set GEOM3785 = %s where %s = %i" %
                  (full_name, update_value, id_field, row.fid))
    conn = db.connection()
    conn.execute(update_sql)
    conn.close()  # or not - no effect..
Updated working code now looks like this. It works fine on a few records, but hangs on the whole table, so I guess it is reading in too much data.
db = SqlSoup(conn_string)
# create outer query
Session = sessionmaker(autoflush=False, bind=db.engine)
session = Session()
rs = session.execute(s)
for row in rs:
    # create update sql...
    session.execute(update_sql)
session.commit()
I now get connection busy errors.
DBAPIError: (Error) ('HY000', '[HY000] [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt (0) (SQLExecDirectW)')
It looks like this could be a problem with the ODBC driver - http://sourceitsoftware.blogspot.com/2008/06/connection-is-busy-with-results-for.html
Further Update:
On the server using profiler, it shows the select statement then the first update statement are "starting" but neither complete.
If I set the Select statement to return the top 10 rows, then it does complete and the updates run.
SQL: Batch Starting Select...
SQL: Batch Starting Update...
I believe this is an issue with pyodbc and SQL Server drivers. If I remove SQL Alchemy and execute the same SQL with pyodbc it also hangs. Even if I create a new connection object for the updates.
I also tried the SQL Server Native Client 10.0 driver, which is meant to allow MARS (Multiple Active Result Sets), but it made no difference. In the end I have resorted to "paging the results" and updating these batches using pyodbc and SQL (see below); however, I thought SQLAlchemy would have been able to do this for me automatically.
Try using a Session.
rs = s.execute() then becomes rs = session.execute(s), and you can replace the last three lines with session.execute(update_sql). I'd also suggest configuring your Session with autocommit off and calling session.commit() at the end.
Can I suggest that when your process hangs you run sp_who2 on the SQL Server box and see what is happening. Check for blocked SPIDs and see if you can find anything in the SQL code that suggests what is happening. If you do find a SPID that is blocking others, you can run DBCC INPUTBUFFER(spid) and see if that tells you what query it executed. Otherwise you can also attach SQL Profiler and trace your calls.
In some cases it could also be parallelism on the SQL Server that causes blocking. Unless this is a data warehouse, I suggest turning parallelism off (set MAXDOP to 1). Let me know when I check this again in the morning; if you still need help, I'll be glad to assist.
Until I find another solution I am using a single connection and custom SQL to return sets of records, and updating these in batches. I don't think what I am doing is a particularly unusual case, so I am not sure why I cannot handle multiple result sets simultaneously.
The code below works, but is very, very slow:
import pyodbc

cnxn = pyodbc.connect(conn_string, autocommit=True)
cursor = cnxn.cursor()

# get total recs in the database
s = "select count(fid) as count from table"
count = cursor.execute(s).fetchone().count

# choose number of records to update in each iteration
batch_size = 100
counter = 0

for i in range(1, count, batch_size):
    # sql to bring back relevant records in each batch
    s = """SELECT fid, wkt from (select ROW_NUMBER() OVER(ORDER BY FID ASC) AS 'RowNumber'
                  ,FID
                  ,GEOM29902.STAsText() as wkt
                  FROM %s) features
           where RowNumber >= %i and RowNumber <= %i""" % (full_name, i, i + batch_size)
    rs = cursor.execute(s).fetchall()
    for row in rs:
        new_wkt = ReprojectFeature(row.wkt)
        # ...create update sql statement for the record
        cursor.execute(update_sql)
        counter += 1

cursor.close()
cnxn.close()
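Not part of the original thread, but if each per-row UPDATE differs only in its values, the round trips inside a batch can be collapsed into a single executemany() call with a parameterised statement. A sketch that reuses the variables from the snippet above (fast_executemany exists only in newer pyodbc releases):

# collect the parameters for the current batch...
params = []
for row in rs:
    new_wkt = ReprojectFeature(row.wkt)
    params.append((new_wkt, row.fid))

# ...then send the whole batch in one call instead of one execute per row
update_sql = ("update %s set GEOM3785 = geometry::STGeomFromText(?, 3785) "
              "where %s = ?" % (full_name, id_field))
cursor.fast_executemany = True  # newer pyodbc only
cursor.executemany(update_sql, params)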
