I have a main Python script which connects to a MySQL database and pulls a few records out of it. Based on the result returned, it starts as many threads (class instances) as records were grabbed. Each thread should go back to the database and update another table by setting a status flag to a different state ("process started").
To achieve this I tried to:
1.) Pass the database connection to all threads
2.) Open a new database connection from each thread
but neither of them worked.
In both cases the update ran without any issue inside a try/except block, but the MySQL table was not updated and no error was raised. I used commit in both cases.
My question is: how should MySQL connection(s) be handled in such a case?
Update based on the first few comments:
MAIN SCRIPT
-----------
# Connecting to the DB
db = MySQLdb.connect(host=db_host,
                     db=db_db,
                     port=db_port,
                     user=db_user,
                     passwd=db_password,
                     charset='utf8')
# Initiating the database cursor
cur = db.cursor()
# Fetching the records for which I need to initiate a class instance
cur.execute('SELECT ...')
for row in cur.fetchall():
    # Initiating a new instance, appending it to a list and
    # starting all of them
CLASS WHICH IS INSTANTIATED
---------------------------
# Connecting to the DB again. I also tried to pass the connection
# which was opened in the main script, but that did not
# work either.
db = MySQLdb.connect(host=db_host,
                     db=db_db,
                     port=db_port,
                     user=db_user,
                     passwd=db_password,
                     charset='utf8')
# Initiating the database cursor
cur_class = db.cursor()
cur_class.execute('UPDATE ...')
db.commit()
Here is an example of using multithreading to work with MySQL in Python. I don't know your table and data, so you may need to adapt the code:
import threading
import MySQLdb

Num_Of_threads = 5

class myThread(threading.Thread):
    def __init__(self, conn, cur, data_to_deal):
        threading.Thread.__init__(self)
        self.conn = conn
        self.cur = cur
        self.data_to_deal = data_to_deal

    def run(self):
        # add your sql
        sql = 'insert into table id values ({0});'
        for i in self.data_to_deal:
            self.cur.execute(sql.format(i))
            self.conn.commit()

threads = []
data_list = [1, 2, 3, 4, 5]
# one connection per thread; MySQLdb connections must not be shared between threads
for i in range(Num_Of_threads):
    conn = MySQLdb.connect(host='localhost', user='root', passwd='', db='')
    cur = conn.cursor()
    new_thread = myThread(conn, cur, [data_list[i]])
    threads.append(new_thread)

for th in threads:
    th.start()
for t in threads:
    t.join()
It seems the problem is not with my code but with my MySQL version. I'm using the MySQL standard community edition, and based on the official documentation found here:
The thread pool plugin is a commercial feature. It is not included in MySQL community distributions.
I'm about to upgrade to MariaDB to solve this issue.
It looks like MySQL 5.7 does support multithreading.
As you tried previously, absolutely make sure to create the connection within def worker(); defining the connection globally was my mistake.
Here's sample code that prints 10 records via 5 threads, 5 times:
import MySQLdb
import threading

def write_good_proxies():
    local_db = MySQLdb.connect("localhost", "username", "PassW", "DB", port=3306)
    local_cursor = local_db.cursor(MySQLdb.cursors.DictCursor)
    sql_select = 'select http from zproxies where update_time is null order by rand() limit 10'
    local_cursor.execute(sql_select)
    records = local_cursor.fetchall()
    id_list = [f['http'] for f in records]
    print(id_list)

def worker():
    x = 0
    while x < 5:
        x = x + 1
        write_good_proxies()

threads = []
for i in range(5):
    print(i)
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()
Related
Let me start off by saying I am extremely new to Python and PostgreSQL, so I feel like I'm in way over my head. My end goal is to connect to the dvdrental database in PostgreSQL and be able to access/manipulate the data. So far I have:
created a .config folder with a database.ini inside that holds my login credentials.
in my src I have a config.py file and use ConfigParser, see below:
from configparser import ConfigParser

def config(filename='.config/database.ini', section='postgresql'):
    # create a parser
    parser = ConfigParser()
    # read config file
    parser.read(filename)
    # get section, default to postgresql
    db = {}
    if parser.has_section(section):
        params = parser.items(section)
        for param in params:
            db[param[0]] = param[1]
    else:
        raise Exception('Section {0} not found in the {1} file'.format(section, filename))
    return db
then also in my src I have a tasks.py file that has a basic connect function, see below:
import pandas as pd
from clients.config import config
import psycopg

def connect():
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # read connection parameters
        params = config()
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg.connect(**params)
        # create a cursor
        cur = conn.cursor()
        # execute a statement
        print('PostgreSQL database version:')
        cur.execute('SELECT version()')
        # display the PostgreSQL database server version
        db_version = cur.fetchone()
        print(db_version)
        # close the communication with the PostgreSQL server
        cur.close()
    except (Exception, psycopg.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print('Database connection closed.')

if __name__ == '__main__':
    connect()
Now this runs and prints out the PostgreSQL database version, which is all well and great, but I'm struggling to figure out how to change the code so that it's more generalized and maybe just creates a cursor.
I need the connect function to basically just connect to the dvdrental database and create a cursor, so that I can then use my connection to select from the database in other needed "tasks". For example, I'd like to be able to create another function like the one below:
def select_from_table(cursor, table_name, schema):
    cursor.execute(f"SET search_path TO {schema}, public;")
    results = cursor.execute(f"SELECT * FROM {table_name};").fetchall()
    return results
but I'm struggling with how to just create a connection to the dvdrental database and a cursor, so that I'm able to actually fetch data and create pandas tables with it and whatnot.
so it would be like
task 1 is connecting to the database
task 2 is interacting with the database (selecting tables and whatnot)
task 3 is converting the result from 2 into a pandas df
thanks so much for any help!! This is for a project in a class I am taking and I am extremely overwhelmed and have been googling-researching non-stop and seemingly end up nowhere fast.
The fact that you established the connection is honestly the hardest step. I know it can be overwhelming, but you're on the right track.
Just copy these three lines from connect into the select_from_table function:
params = config()
conn = psycopg.connect(**params)
cursor = conn.cursor()
It will look like this (I also added conn.close() at the end; and since the function now creates its own cursor, it no longer needs one passed in):
def select_from_table(table_name, schema):
    params = config()
    conn = psycopg.connect(**params)
    cursor = conn.cursor()
    cursor.execute(f"SET search_path TO {schema}, public;")
    results = cursor.execute(f"SELECT * FROM {table_name};").fetchall()
    conn.close()
    return results
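For task 3 (turning a query result into a pandas DataFrame), one option is a small helper like the sketch below. It assumes psycopg 3, where the entries of cursor.description expose a .name attribute, plus the same config() helper; the function name select_df and the film table are just illustrations:
import pandas as pd
import psycopg

from clients.config import config

def select_df(table_name, schema):
    """Fetch a whole table and return it as a pandas DataFrame."""
    params = config()
    # context managers close the cursor and connection even on errors
    with psycopg.connect(**params) as conn:
        with conn.cursor() as cur:
            cur.execute(f"SET search_path TO {schema}, public;")
            cur.execute(f"SELECT * FROM {table_name};")
            columns = [col.name for col in cur.description]
            return pd.DataFrame(cur.fetchall(), columns=columns)

df = select_df('film', 'public')
print(df.head())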
I have written a Python tool with a wxPython GUI whose main task is to collect a lot of user input regarding customer data, product data and so on, and save it to a SQL database. At the moment it uses a local SQLite3 database for testing, and I am now switching to MS Azure so that everybody can work in the same database.
As I now plan to use an MS Azure SQL DB, I have a few questions, and I am hoping this is the right place to ask:
What is the best library to connect to Azure via Python? I found pyodbc and pymssql, but I think both need an extra driver installed? Is this true, and is this a problem in real use cases?
I have many modules, like Manage_Customer.py and Manage_Factory.py and so on. In all of them I connect to my database. I have no module which acts like a SQL master and handles the overhead.
So my code looks like this most of the time:
import wx
import sqlite3

SQL_PATH = "Database_Test.db"

class ManageCustomerToDB(wx.Dialog):

    def __init__(self, *args, **kw):
        super(ManageCustomerToDB, self).__init__(*args, **kw)

    def InitUI(self):
        # [GUI and so on...]

        # I do this one time inside a module:
        conn = sqlite3.connect(SQL_PATH)
        self.c = conn.cursor()
        # Use functions like the ones below...

    def GetCustomerData(self):
        self.c.execute("SELECT * FROM Customer WHERE CustomerID = ?",
                       (self.tc_customer_id.GetValue(),))
        customer_data = self.c.fetchall()
        # Do something with the customer data

    def GetPersonData(self):
        self.c.execute("SELECT * FROM Person WHERE PersonID = ?",
                       (self.tc_person_id.GetValue(),))
        person_data = self.c.fetchall()
        # Do something with the person data
I hope this example shows what I do. Am I making any major mistakes?
After a read from SQL, don't I have to close the DB connection in some way?
Thanks for your help, and let me know if I can improve my question or give more details.
It is not a good idea to create a new connection to Azure SQL every time you perform a CRUD operation. It wastes resources, and once the number of accesses reaches a certain level, it will have a large impact on the performance of MSSQL.
I suggest you use a database connection pool. The pool manager will initialize several connections to the SQL Server instance and then reuse them when requested.
There is an existing package you can take advantage of: DBUtils. You can use its PooledDB class together with pyodbc.
A sample for showing how database connection pool works:
import pyodbc
from DBUtils.PooledDB import PooledDB  # in DBUtils 2.x: from dbutils.pooled_db import PooledDB

class Database:

    def __init__(self, server, driver, port, database, username, password):
        self.server = server
        self.driver = driver
        self.port = port
        self.database = database
        self.username = username
        self.password = password
        self._CreatePool()

    def _CreatePool(self):
        self.Pool = PooledDB(creator=pyodbc,
                             mincached=2,
                             maxcached=5,
                             maxshared=3,
                             maxconnections=6,
                             blocking=True,
                             DRIVER=self.driver,
                             SERVER=self.server,
                             PORT=self.port,
                             DATABASE=self.database,
                             UID=self.username,
                             PWD=self.password)

    def _Getconnect(self):
        self.conn = self.Pool.connection()
        cur = self.conn.cursor()
        if not cur:
            raise Exception("connection error")
        return cur

    # query sql
    def ExecQuery(self, sql):
        cur = self._Getconnect()
        cur.execute(sql)
        relist = cur.fetchall()
        cur.close()
        self.conn.close()  # returns the connection to the pool
        return relist

    # non-query sql
    def ExecNoQuery(self, sql):
        cur = self._Getconnect()
        cur.execute(sql)
        self.conn.commit()
        cur.close()
        self.conn.close()  # returns the connection to the pool

def main():
    server = 'jackdemo.database.windows.net'
    database = 'jackdemo'
    username = 'jack'
    port = 1433
    password = '*********'
    driver = '{ODBC Driver 17 for SQL Server}'
    ms = Database(server=server, driver=driver, port=port,
                  database=database, username=username, password=password)
    resList = ms.ExecQuery("select * from Users")
    print(resList)

if __name__ == '__main__':
    main()
Answers to your questions:
Q1: What is the best library to connect to Azure via Python? I found pyodbc and pymssql, but I think both need an extra driver installed? Is this true, and is this a problem in real use cases?
Answer: Both of them would be OK. ODBC stands for Open Database Connectivity, so it can be used to connect to many databases. I see the Microsoft tutorial uses pyodbc, so maybe it is the better choice.
Q2: I have many modules, like Manage_Customer.py and Manage_Factory.py and so on. In all of them I connect to my database. I have no module which acts like a SQL master and handles the overhead.
Answer: Use a database connection pool.
Q3: After a read from SQL, don't I have to close the DB connection in some way?
Answer: If you use a database connection pool, the connection will be put back into the pool after you call the close() method.
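To make that checkout/return cycle concrete, here is a minimal sketch against such a pool (connection parameters are the illustrative ones from the sample above):
import pyodbc
from DBUtils.PooledDB import PooledDB

pool = PooledDB(creator=pyodbc, mincached=2, maxconnections=6, blocking=True,
                DRIVER='{ODBC Driver 17 for SQL Server}',
                SERVER='jackdemo.database.windows.net', PORT=1433,
                DATABASE='jackdemo', UID='jack', PWD='*********')

conn = pool.connection()    # borrow a connection from the pool
cur = conn.cursor()
cur.execute("select count(*) from Users")
print(cur.fetchone())
cur.close()
conn.close()                # not really closed: handed back to the pool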
I have a script running that should process data as it's added to the database.
import mysql.connector
import time

wait_time = 2

mydb = mysql.connector.connect(
    host="localhost",
    user="xxx",
    passwd="yyy",
    database="my_database"
)
mycursor = mydb.cursor()

while True:
    sql = "SELECT * FROM data WHERE processed = 0"
    mycursor.execute(sql)
    records = mycursor.fetchall()
    for i, r in enumerate(records):
        print(r)
    time.sleep(wait_time)
However, if I insert a row via a different connection, this connection doesn't see it. I.e. if I connect to my database via a third-party app and insert a row into the data table, the running script never picks it up.
However, if I restart the above script, the row appears.
Any ideas?
Use a message queue (e.g. RabbitMQ) and get the third-party app to use it. Message queue implementations have better APIs for processing information asynchronously, even if you just use the queue to carry the primary key of the database content.
Alternatively, enable binary logging and use a replication protocol library to process events.
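For the message-queue route, a minimal consumer sketch with RabbitMQ via the pika library might look like the following (the queue name new_rows and the convention of publishing the row's primary key are assumptions; the producing app would publish the ID after each insert):
import pika

# connect to a local RabbitMQ broker
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='new_rows', durable=True)

def handle_row(ch, method, properties, body):
    row_id = int(body)            # producer publishes the primary key
    print('process row', row_id)  # fetch and process the row here
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='new_rows', on_message_callback=handle_row)
channel.start_consuming()         # blocks, invoking handle_row per message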
I just faced the same issue, and the easiest way to solve it is defining mydb and mycursor inside the loop:
import mysql.connector
import time

wait_time = 2

while True:
    mydb = mysql.connector.connect(
        host="localhost",
        user="xxx",
        passwd="yyy",
        database="my_database"
    )
    mycursor = mydb.cursor()
    sql = "SELECT * FROM data WHERE processed = 0"
    mycursor.execute(sql)
    records = mycursor.fetchall()
    for i, r in enumerate(records):
        print(r)
    mydb.close()  # avoid leaking one connection per iteration
    time.sleep(wait_time)
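Recreating the connection works because every new connection starts with a fresh transaction snapshot. Under InnoDB's default REPEATABLE READ isolation, the first SELECT on a connection pins the snapshot, so rows inserted by other connections stay invisible until the transaction ends. A minimal sketch of an alternative that keeps the single connection and simply ends the transaction on each iteration (the commit() call is the only change to the original script):
import mysql.connector
import time

wait_time = 2

mydb = mysql.connector.connect(
    host="localhost",
    user="xxx",
    passwd="yyy",
    database="my_database"
)
mycursor = mydb.cursor()

while True:
    mycursor.execute("SELECT * FROM data WHERE processed = 0")
    for r in mycursor.fetchall():
        print(r)
    mydb.commit()  # end the transaction so the next SELECT sees new rows
    time.sleep(wait_time)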
cherrypy.engine subscribe()s a function that connects to a database, and cherrypy.engine then start()s with that database subscribed.
If I want to fetch multiple sets of data from different databases, I would need to connect to different databases.
Is there any way to do it in CherryPy without too much change in code?
You will need two connections (with a cursor for each), or at least to re-initialize the same one twice. Try something like this...
import cherrypy
import MySQLdb

def connect(thread_index):
    # Create the connections and store them in the current thread
    cherrypy.thread_data.db = MySQLdb.connect('host', 'user', 'password', 'dbname')
    cherrypy.thread_data.db2 = MySQLdb.connect('host', 'user', 'password', 'dbname2')

# Tell CherryPy to call "connect" for each thread, when it starts up
cherrypy.engine.subscribe('start_thread', connect)

class Root:
    def index(self):
        # Sample page that displays the number of records in each table
        # Open a cursor, using the first DB connection for the current thread
        c = cherrypy.thread_data.db.cursor()
        c.execute('select count(*) from table')
        res = c.fetchone()
        c.close()
        # Same again, using the second connection
        c = cherrypy.thread_data.db2.cursor()
        c.execute('select count(*) from table2')
        res2 = c.fetchone()
        c.close()
        return "<html><body>Hello, you have %d and %d records in your tables</body></html>" % (res[0], res2[0])
    index.exposed = True

cherrypy.quickstart(Root())
Hope this helps!
I'm trying to run a Python script which leaves a connection open permanently, and responds to changes made outside of the script.
So for example:
Data script: Accepts form posts and commits form data to the database
Worker script: Monitors the database for new form posts and takes action accordingly
The relevant code in the worker script is:
import time
import pymysql
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=5)  # worker count illustrative
conn = pymysql.connect(host='127.0.0.1', port=3306, user='dbuser', passwd='dbpass', db='my_db')

def processForms(myform):
    c = conn.cursor(pymysql.cursors.DictCursor)
    myform.sendEmail(c)
    conn.commit()
    c.close()

def doForms():
    while True:
        ...  # get data and store in 'myforms' ...
        futures = [executor.submit(processForms, myform) for myform in myforms]
        time.sleep(30)

doForms()
Now I don't understand why this is not picking up new forms... If I create a new connection in each iteration of doForms(), the new forms are picked up, but I don't want to be creating and destroying connections all the time.
For example, this modification works:
conn = None

def doForms():
    global conn
    while True:
        conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='', db='mw_py')
        ...  # get data and store in 'myforms' ...
        futures = [executor.submit(processForms, myform) for myform in myforms]
        conn.close()
        time.sleep(30)
Is there a way for me to use the open connection and have it poll the latest data?
Open one connection at the beginning of your script; connecting is not a cheap operation.
Remember the ID of the last row fetched.
On every iteration, select the rows with an ID greater than the last one seen.
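A minimal sketch of that pattern, assuming the table has an auto-increment id column (the forms table and payload column are illustrative). As in the polling example earlier, the commit() ends the transaction so the long-lived connection sees rows inserted by other connections:
import time
import pymysql

conn = pymysql.connect(host='127.0.0.1', port=3306,
                       user='dbuser', passwd='dbpass', db='my_db')
last_id = 0

while True:
    with conn.cursor() as c:
        c.execute("SELECT id, payload FROM forms WHERE id > %s ORDER BY id", (last_id,))
        rows = c.fetchall()
    for row in rows:
        print(row)        # process the new row here
        last_id = row[0]  # remember the highest ID seen
    conn.commit()         # refresh the snapshot so new rows become visible
    time.sleep(30)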