Python function is executed more than once

I want to fetch and print all the records from a table in a MySQL DB hosted on a VPS, but when I use a for loop to print the retrieved records, they get printed 2-3 times instead of just once.
#!/usr/bin/env python
# Modules imported
# VPS
# Parameters to connect to the DB in the VPS

def connDB():
    global conn
    global cur
    try:
        conn = MySQLdb.connect(DBhost, DBuser, DBpass, DBdb, charset='utf8', use_unicode=True)
        cur = conn.cursor()
        print("...DB VPS connect")
    except:
        print("...DB VPS ERROR")
        pass

def selectallDB(query):
    global conn
    global cur
    try:
        cur.execute(query)
        localrpis = cur.fetchall()
        conn.commit()
        print("... select All OK")
        print('Total Row(s):', cur.rowcount)
        for i in localrpis:
            print(i)
    except:
        print("... select ERROR")
        connDBLocal()
        pass

def getallDB():
    c_select = """
    SELECT * FROM %s
    """ % (trpistmsMCSIR)
    selectallDB(c_select)

def checktime(sec):
    # Function to trigger the data-read function every "sec" seconds
    while True:
        res = round(time() % sec)
        if res == 0.0:
            getallDB()
        sleep(0.2)  # Changed to 0.5

connDB()
while True:
    checktime(10)
I assume that the for loop inside the try is executed 2 times (sometimes even 3), but I don't understand why. This is the output:
...DB connect
...DB VPS connect
... select All OK
('Total Row(s):', 2L)
('SELECT result OK')
... select All OK
('Total Row(s):', 2L)
('SELECT result OK')
As a workaround, after many changes I got it "working" by changing sleep(0.2) to sleep(0.5), but I'm not sure whether this actually resolves the problem or whether it's just an illusion that the loop is working as expected.

It is not the for loop, as you can see from the duplicate ... select All OK lines. The problem is your while loop: round(time() % sec) is 0.0 for any remainder below 0.5, so with sleep(0.2) the condition res == 0.0 can hold on two or three consecutive iterations within the same second. That's why it is fixed when you make it 0.5. Theoretically it may also run 3 times (at 0.0, 0.2, and 0.4 seconds) if the database operation is fast enough.
If you want to run your code every 10 seconds, sleeping 0.5 seconds between checks is a good compromise.
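A more robust pattern, if the goal is to fire the job exactly once every sec seconds, is to track the next deadline explicitly instead of polling round(time() % sec). A minimal sketch (run_every is just an illustrative helper, not part of the question's code; getallDB comes from it):

from time import time, sleep

def run_every(sec, task):
    # Fire the task once per interval by tracking the next deadline,
    # so the timing does not depend on how long the task itself takes.
    next_run = time()
    while True:
        task()
        next_run += sec
        delay = next_run - time()
        if delay > 0:
            sleep(delay)

# run_every(10, getallDB)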

Related

How do I improve the performance of Python inserts into Postgres

Using execute: 40 inserts per minute
Using executemany: 41 inserts per minute
Using extras.execute_values: 42 inserts per minute
import datetime
import psycopg2
from psycopg2 import extras
from typing import Any

def save_return_to_postgres(record_to_insert) -> Any:
    insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
                      VALUES %s returning id"""
    records = (record_to_insert[2], record_to_insert[1], record_to_insert[3],
               record_to_insert[4], record_to_insert[0], datetime.datetime.now())
    # df = df[["description","code","unit","price","created_date","updated_date"]]
    try:
        conn = psycopg2.connect(database='xxxx',
                                user='xxxx',
                                password='xxxxx',
                                host='xxxx',
                                port='xxxx',
                                connect_timeout=10)
        print("Connection Opened with Postgres")
        cursor = conn.cursor()
        extras.execute_values(cursor, insert_query, [records])
        conn.commit()
        # print(record_to_insert)
    finally:
        if conn:
            cursor.close()
            conn.close()
            print("Connection to postgres was successfully closed")

valores = df.values
for valor in valores:
    save_return_to_postgres(valor)
    print(valor)
I don't know how many rows per INSERT Postgres can take, but many SQL-based databases can accept multiple inserts in the same statement.
So instead of running
for insert_query in queries:
    sql_execute(insert_query)
try making several inserts at once in a single command (test it in pure SQL first to see if it works):
insert_list = []
for insert_query in queries:
    insert_list.append(insert_query)
sql_execute(insert_list)
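Concretely, with psycopg2 one way to send many rows in a single INSERT is to build the VALUES list yourself and execute one statement. This is only a sketch: the connection parameters, column subset, and sample rows are placeholders, not taken from the question.

import psycopg2

conn = psycopg2.connect(database='xxxx', user='xxxx', password='xxxxx', host='xxxx')
cursor = conn.cursor()
# Placeholder rows; in practice these come from your DataFrame or list.
rows = [('desc A', 'A1', 'kg', 10.5), ('desc B', 'B2', 'kg', 12.0)]
# mogrify renders each row as an SQL literal, so one INSERT carries all the rows.
values = ",".join(cursor.mogrify("(%s,%s,%s,%s)", row).decode() for row in rows)
cursor.execute("INSERT INTO pricing.xxxx (description, code, unit, price) VALUES " + values)
conn.commit()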
I had a similar issue and this link helped me:
https://www.sqlservertutorial.net/sql-server-basics/sql-server-insert-multiple-rows/
(Of course mine was not Postgres, but the idea is the same: decrease network time by running multiple inserts in one command.)
We're in this together.
Use execute_batch or execute_values and use them over the entire record set. As of now you are not using the batch capabilities of execute_values because you are inserting a single record at a time. You are further slowing things down by opening and closing a connection for each record as that is a time/resource expensive operation. Below is untested as I don't have the actual data and am assuming what df.values is.
insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
VALUES %s returning id"""
#execute_batch query
#insert_query = """INSERT INTO pricing.xxxx #(description,code,unit,price,created_date,updated_date)
# VALUES (%s, %s, %s, %s, %s, %s) returning id"""
valores = df.values
#Create a list of lists to pass to query as a batch instead of singly.
records = [[record_to_insert[2],record_to_insert[1],record_to_insert[3],
record_to_insert[4],record_to_insert[0],datetime.datetime.now()]
for record_to_insert in valores]
try:
conn = psycopg2.connect(database = 'xxxx',
user = 'xxxx',
password = 'xxxxx',
host= 'xxxx',
port='xxxx',
connect_timeout = 10)
print("Connection Opened with Postgres")
cursor = conn.cursor()
extras.execute_values(cursor, insert_query, [records])
#execute_batch
#extras.execute_batch(cursor, insert_query, [records])
conn.commit()
# print(record_to_insert)
finally:
if conn:
cursor.close()
conn.close()
print("Connection to postgres was successfully closed")
For more information see Fast execution helpers. Note that both the execute_values and execute_batch functions have a page_size argument with a default value of 100; this is the batch size for the operation. For large data sets you can reduce the time further by increasing page_size to make bigger batches and reduce the number of server round trips.
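For example, assuming the records list built above, raising the batch size is a one-line change (1000 is an arbitrary value to tune against your own data):

# page_size defaults to 100; bigger batches mean fewer server round trips.
extras.execute_values(cursor, insert_query, records, page_size=1000)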

How to optimize fetch from cursor with 5 million rows

I have a table in MSSQL with 5M rows, and fetching all the rows of this table takes me 2~3 minutes. I want (if possible) to optimize that.
Here is my code:
cursor.execute("SELECT * FROM MyTable")
rows = cursor.fetchall() # that takes 2~3 minutes
# some code for setup the output that take only few seconds
I already tried to use:
while True:
    rows = cursor.fetchmany(500000)
    if not rows:
        break
    # Do some stuff
And also with fetchone.
But again I'm between 2-3 minutes :/ How can I optimize that? Maybe using threads, but I don't know how.
Thanks for your help.
I think you can limit the number of rows returned by your query, even if you have to make several calls to your database.
As for threads, you have several options:
A single connection but a different cursor for each thread
One connection for each thread, with one cursor from that connection
In either case you need a ThreadedConnectionPool. Here is a small example of one way to do it:
import psycopg2
from psycopg2 import pool
from threading import Thread
from time import sleep

threaded_connection_pool = None
thread_table = list()

def get_new_connection():
    global threaded_connection_pool
    connection = None
    while not isinstance(connection, psycopg2.extensions.connection):
        try:
            connection = threaded_connection_pool.getconn()
        except pool.PoolError:
            sleep(10)  # Wait for a free connection
    return connection, connection.cursor()

def thread_target():
    connection, cursor = get_new_connection()
    with connection, cursor:
        # Do some stuff
        pass

threaded_connection_pool = psycopg2.pool.ThreadedConnectionPool(
    # YOUR PARAM
)

for counter_thread in range(10):
    thread = Thread(
        target=thread_target,
        name=f"Thread n°{counter_thread}"
    )
    thread_table.append(thread)
    thread.start()

#
# Do many more stuff
#

for thread in thread_table:
    thread.join()
# End
I prefer to use the first solution: "A single connection but a different cursor for each thread".
For that, do I have to do something like this?
result = []
cursor = connection.cursor()

def fetch_cursor(cursor):
    global result
    rows = cursor.fetchall()
    if rows:
        result += beautify_output(rows)

######### THIS CODE BELOW IS INSIDE A FUNCTION ######
thread_table = []
limit = 1000000
offset = 0
sql = "SELECT * FROM myTABLE"

while True:
    try:
        cursor.execute(f"{sql} LIMIT {limit} OFFSET {offset}")
    except Exception as e:
        break
    offset += limit
    thread = Thread(target=fetch_cursor, args=(cursor,))
    thread_table.append(thread)
    thread.start()

for thread in thread_table:
    thread.join()

print(result)
So something like that should work? (I will try it tomorrow.)
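For reference, a minimal sketch of the "single connection, one cursor per thread" idea, where each thread runs its own LIMIT/OFFSET query on its own cursor. The table name, chunk size, and total row count are placeholder assumptions; also note that psycopg2 serializes queries sent over one connection, so taking a connection per thread from the pool may parallelize better:

from threading import Thread

def fetch_chunk(connection, offset, limit, out):
    # Each thread gets its own cursor from the shared connection.
    cur = connection.cursor()
    cur.execute(f"SELECT * FROM myTABLE LIMIT {limit} OFFSET {offset}")
    out[offset] = cur.fetchall()
    cur.close()

chunks = {}
threads = []
limit = 1000000
for offset in range(0, 5000000, limit):  # 5M rows assumed, as in the question
    t = Thread(target=fetch_chunk, args=(connection, offset, limit, chunks))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

rows = [row for offset in sorted(chunks) for row in chunks[offset]]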

Validate list elements if they take X time to run in a loop

I have a list that contains SQL code to be executed through an external Trino CLI. For instance, my nested list looks like:
sql = [['test 1', 'SELECT * FROM a.testtable1'], ['test 2', 'SELECT * FROM a.testtable1']]
This simple loop detects if there's a syntax error:
sql_results = []
for l in sql:
    sql_code = l[1]
    status = None
    rows = []
    try:
        cur.execute(sql_code)
        rows = cur.fetchall()
    except Exception as e:
        status = str(e)
    if len(rows) > 1:
        status = 'OK'
    sql_results.append([l[0], sql_code, status])
It works well, but sometimes the queries take too long and kill the process. Since a query that keeps running for more than 3 seconds must have valid syntax (and I'm only interested in checking the syntax, NOT in the result of the query), I'd like to add a time-based check. Something like:
if the SQL execution lasts more than 3 seconds, kill it and set status = 'OK'.
I tried this using time:
import time

sql_results = []
for l in sql:
    sql_code = l[1]
    status = None
    rows = []
    timeout = time.time() + 2
    try:
        cur.execute(sql_code)
        rows = cur.fetchall()
    except Exception as e:
        status = str(e)
    if len(rows) > 1 or time.time() > timeout:
        status = 'OK'
    sql_results.append([l[0], sql_code, status])
But it does not help much, and I keep getting the occasional timeout. Any ideas?
Instead of actually running the query, you can ask Trino whether the query is valid. Just prepend the following to each of your queries:
EXPLAIN (TYPE VALIDATE)
https://trino.io/docs/current/sql/explain.html#explain-type-validate
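Applied to the loop from the question, that might look like the sketch below (cur and sql as defined above; the EXPLAIN prefix is the only change):

sql_results = []
for name, sql_code in sql:
    try:
        # EXPLAIN (TYPE VALIDATE) makes Trino check the query without executing it.
        cur.execute("EXPLAIN (TYPE VALIDATE) " + sql_code)
        cur.fetchall()
        status = 'OK'
    except Exception as e:
        status = str(e)
    sql_results.append([name, sql_code, status])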

Why is the else statement executed and the if ignored?

I'm having a problem with a Python script: the if branch is not executed, no matter what parameters I give the script. Is something wrong with my code? I'm handling an HTML form, and the result was OK until I added some content to the else statement. I've tried everything but it still doesn't work...
#!/usr/bin/python3
import pymysql
import cgi
from http import cookies

# Open database connection
db = pymysql.connect("localhost", "superadmin", "123", "dinamic")
# Prepare a cursor object using the cursor() method
cursor = db.cursor()
data = cgi.FieldStorage()
a = data.getvalue('e1')
b = data.getvalue('p1')
# Prepare SQL query to fetch a record from the database.
sql = "SELECT id, email, password FROM register WHERE email = %s AND password = %s"
try:
    # Execute the SQL command
    cursor.execute(sql, (a, b))
    # Commit your changes in the database
    db.commit()
    c = cookies.SimpleCookie()
    # assign a value
    c['mou'] = a
    # set the expires time
    c['mou']['expires'] = 24*60*60
    # print the header, starting with the cookie
    print(c)
    print("Content-type: text/html", end="\r\n\r\n", flush=True)
    print('''<html><head><title>First python script for Security and Encryption class</title></head><body><center><h2>Successfully logged in!</h2><br><img src='image/2.gif'></body></html>''')
except:
    db.commit()
    print("Content-type: text/html", end="\r\n\r\n", flush=True)
    print("<html>")
    print("<body>")
    print("<center>")
    print("<h2>Failed to log in!</h2>")
    print("<img src='image/dinamic.gif'>")
    print("</body>")
    print("</html>")
    # Rollback in case there is any error
    db.rollback()
cursor.execute() doesn't return the number of rows that were selected. You can call cursor.fetchone() to see if it returns a row.
There's also no need to call db.commit() since you haven't made any changes.
try:
    # Execute the SQL command
    cursor.execute(sql, (a, b))
    row = cursor.fetchone()
    if row:
        c = cookies.SimpleCookie()
        # assign a value
        c['mou'] = a
        # set the expires time
        c['mou']['expires'] = 24*60*60
        # print the header, starting with the cookie
        print(c)
        print("Content-type: text/html", end="\r\n\r\n", flush=True)
        print("")
        print('''<html><head><title>Security & Encryption class - First script</title></head><body><h2>Successfully logged in!</h2>''')
        print("<center>")
        print("<img src='image/successfully.gif'>")
        print("</center>")
        print("</body></html>")
    else:
        print("Content-type: text/html", end="\r\n\r\n", flush=True)
        print("<html>")
        print("<body>")
        print("<center>")
        print("<h2>Login failed!</h2>")
        print("<img src='image/dinamic.gif'>")
        print("<br><br>")
        print('<button id="myButton" class="float-left submit-button" >Home</button>')
        print('''<script type="text/javascript">
            document.getElementById("myButton").onclick = function () {
                location.href = "index.html";
            };
        </script>''')
        print("</center>")
        print("</body>")
        print("</html>")
except:
    # Rollback in case there is any error
    db.rollback()

Running Python directly is much faster than when Django runs the same code

So I have a SQL query that takes really long to load through Django: 10,000 rows take about 30 seconds. If I run the exact same code directly with Python, it finishes in 2 seconds. For some reason, the loop I built takes really long to execute when Django runs the code. Does anyone know why that is? Can I do something to increase the performance and get rid of this inconvenience?
import datetime
import psycopg2

def doQuery(conn):
    cur = conn.cursor()
    cur.execute("SELECT * FROM table WHERE substring(addr from 0 for 5) = '\\x82332355'::bytea")
    return cur.fetchall()

myConnection = psycopg2.connect(host=hostname, user=username,
                                password=password, dbname=database)
results = doQuery(myConnection)

def lists(t):
    # Recursively convert nested tuples to lists so the rows can be modified.
    if type(t) == list or type(t) == tuple:
        return [lists(i) for i in t]
    return t

results = lists(results)
for result in results:
    result[1] = str(result[1]).encode("hex")
    result[3] = datetime.datetime.fromtimestamp(int(result[3])).strftime('%Y-%m-%d %H:%M:%S')
    result[6] = "Not Avaliable"
    print result
This for loop ^^^^^^^^ takes really long under Django but runs fast in plain Python.
myConnection.close()
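One way to narrow down where the time goes, under Django or in plain Python alike, is to time the fetch and the post-processing loop separately with the standard time module. A sketch around the question's own functions (nothing Django-specific is assumed):

import time

t0 = time.time()
results = lists(doQuery(myConnection))  # fetch plus tuple-to-list conversion
t1 = time.time()
for result in results:                  # the loop under suspicion
    result[1] = str(result[1]).encode("hex")
    result[3] = datetime.datetime.fromtimestamp(int(result[3])).strftime('%Y-%m-%d %H:%M:%S')
    result[6] = "Not Avaliable"
t2 = time.time()
print("fetch: %.2fs, loop: %.2fs" % (t1 - t0, t2 - t1))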
