Solved: every greenlet should have its own connection rather than sharing the same connection.
I want to insert a lot of data into a MySQL database. I use gevent to download data from the internet, then insert the data into MySQL. I found umysqldb for inserting into MySQL asynchronously. However, I am getting the following error: Mysql Error 0: Concurrent access in query method.
My code is as follows:
def insert_into_mysql(conn, cur, pid, user, time, content):
    try:
        value = [pid, user, time, content]
        #print 'value is', value
        print 'hi'
        cur.execute('insert into post(id,user,time,content) values(%s,%s,%s,%s)', value)
        print 'after execute'
        conn.commit()
    # except MySQLdb.Error,e:
    except umysqldb.Error, e:
        print "Mysql Error %d: %s" % (e.args[0], e.args[1])
insert_into_mysql is called from inside download_content:
while len(ids_set) != 0:
    id = ids_set.pop()
    print 'now id is', id
    pool.spawn(download_content, conn, cur, int(id))
    r.sadd('visited_ids', id)
pool.join()
ultramysql doesn't allow you to make concurrent queries on the same MySQL connection; it just makes the connection async-friendly. So you will either need a new MySQL connection for each greenlet, or use locking primitives to make sure only one greenlet is using the connection at a time.
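For illustration, here is a minimal sketch of the locking approach using gevent's own BoundedSemaphore; the conn/cur names follow the question, and the lock itself is the addition:

from gevent.lock import BoundedSemaphore

# One lock guarding the single shared connection; alternatively, open
# one umysqldb connection per greenlet and skip the lock entirely.
db_lock = BoundedSemaphore(1)

def insert_into_mysql(conn, cur, pid, user, time, content):
    value = [pid, user, time, content]
    with db_lock:  # only one greenlet touches the connection at a time
        cur.execute('insert into post(id,user,time,content) values(%s,%s,%s,%s)', value)
        conn.commit()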
I have started to learn how to use psycopg2 with Python. What I do is run quite a few scripts; as an example, they can open up to 150 connections, and as we know, we cannot have more than 100 connections connected at the same time. What I have figured out is that whenever I want to do a database query/execution, I connect to the database, do the execution, and then close the connection. However, I believe that opening and closing new connections is very expensive, and connections should be longer-lived.
I have done something like this:
DATABASE_CONNECTION = {
    "host": "TEST",
    "database": "TEST",
    "user": "TEST",
    "password": "TEST"
}
def get_all_links(store):
    """
    Get all links from given store
    :param store:
    :return:
    """
    conn = psycopg2.connect(**DATABASE_CONNECTION)
    sql_update_query = "SELECT id, link FROM public.store_items WHERE store = %s AND visible = %s;"
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        data_tuple = (store, "yes")
        cursor.execute(sql_update_query, data_tuple)
        test_data = [{"id": links["id"], "link": links["link"]} for links in cursor]
        cursor.close()
        conn.close()
        return test_data
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        cursor.close()
        conn.rollback()
        return 1
def get_all_stores():
    """
    Get all stores in database
    :return:
    """
    conn = psycopg2.connect(**DATABASE_CONNECTION)
    sql_update_query = "SELECT store FROM public.store_config;"
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        cursor.execute(sql_update_query)
        test_data = [stores["store"] for stores in cursor]
        cursor.close()
        conn.close()
        return test_data
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        cursor.close()
        conn.rollback()
        return 1
I wonder how I can make this as efficient as possible, so that I can have a lot of scripts connected to the database while still not hitting the max_connections limit?
I forgot to add that the way I'm connecting is that I have multiple scripts, e.g.:
test1.py
test2.py
test3.py
....
....
Every script runs on its own, and they all import database.py, which contains the code I showed above.
UPDATE:
from psycopg2 import pool

threaded_postgreSQL_pool = pool.ThreadedConnectionPool(1, 2,
                                                       user="test",
                                                       password="test",
                                                       host="test",
                                                       database="test")
if threaded_postgreSQL_pool:
    print("Connection pool created successfully using ThreadedConnectionPool")
def get_all_stores():
    """
    Get all stores in database
    :return:
    """
    # Use the getconn() method to get a connection from the connection pool
    ps_connection = threaded_postgreSQL_pool.getconn()
    sql_update_query = "SELECT store FROM public.store_config;"
    ps_cursor = ps_connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        ps_cursor.execute(sql_update_query)
        test_data = [stores["store"] for stores in ps_cursor]
        ps_cursor.close()
        threaded_postgreSQL_pool.putconn(ps_connection)
        print("Put away a PostgreSQL connection")
        return test_data
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        ps_cursor.close()
        ps_connection.rollback()
        return 1
While opening and closing database connections is not free, it is also not all that expensive compared to starting up and shutting down the Python interpreter. If all your scripts run independently and briefly, that is probably the first thing you should fix. You have to decide and describe how your scripts are scheduled and invoked before you can know how (and whether) to use a connection pooler.
and as we know, we cannot have more than 100 connections connected at the same time.
100 is the default setting for max_connections, but it is entirely configurable. You can increase it if you want to. If you refactor for performance, you should probably do so in a way that naturally means you don't need to raise max_connections. But refactoring just because you don't want to raise max_connections is letting the tail wag the dog.
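For reference, a quick way to check the server's current limit from Python; a minimal sketch that reuses the DATABASE_CONNECTION dict from the question:

import psycopg2

conn = psycopg2.connect(**DATABASE_CONNECTION)
with conn.cursor() as cur:
    cur.execute("SHOW max_connections;")
    print(cur.fetchone()[0])  # '100' unless it has been raised
conn.close()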
You are right, establishing a database connection is expensive; therefore, you should use connection pooling. But there is no need to re-invent the wheel, since psycopg2 has built-in connection pooling:
Use a psycopg2.pool.SimpleConnectionPool or psycopg2.pool.ThreadedConnectionPool (depending on whether you use threading or not) and use the getconn() and putconn() methods to grab or return a connection.
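A minimal sketch of what that can look like; the pool bounds and the run_query helper are illustrative assumptions, not psycopg2 API, and it reuses the DATABASE_CONNECTION dict from the question:

from psycopg2 import pool

# One pool per process; the min/max sizes are assumed numbers.
conn_pool = pool.ThreadedConnectionPool(1, 10, **DATABASE_CONNECTION)

def run_query(sql, params=None):
    conn = conn_pool.getconn()       # borrow a connection from the pool
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()    # read-only queries in this sketch
    finally:
        conn_pool.putconn(conn)      # always return it, even on error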
Relatively new to Python scripts, so bear with me.
I have used speedtest-cli before. I have edited the script so it inserts the values into a SQL table as below; however, I am having an issue with one of the inserts. It inserts ping and download fine, but the upload is always something like 2.74 or 2.75, and ONLY when run from a crontab. Very weird.
If I run the Python script from the CLI, it inserts the values fine.
This is my query; the values ping, download, and upload come from the speedtest-cli script.
Here is the full script:
import os
import re
import subprocess
import time
import mysql.connector
from mysql.connector import Error
from mysql.connector import errorcode

print "----------------------------------"
print 'Started: {} {}'.format(time.strftime('%d/%m/%y %H:%M:%S'), "")

response = subprocess.Popen('speedtest-cli --simple', shell=True, stdout=subprocess.PIPE).stdout.read()
ping = re.findall(r'Ping:\s(.*?)\s', response, re.MULTILINE)
download = re.findall(r'Download:\s(.*?)\s', response, re.MULTILINE)
upload = re.findall(r'Upload:\s(.*?)\s', response, re.MULTILINE)
ping[0] = ping[0].replace(',', '.')
download[0] = download[0].replace(',', '.')
upload[0] = upload[0].replace(',', '.')

try:
    if os.stat('/var/www/html/speed/log.txt').st_size == 0:
        print 'Date,Time,Ping (ms),Download (Mbit/s),Upload (Mbit/s)'
except:
    pass

print 'PING: {}, DOWN: {}, UP: {}'.format(ping[0], download[0], upload[0])

try:
    connection = mysql.connector.connect(host='localhost',
                                         database='dev',
                                         user='dev',
                                         password='dev1')
    sql_insert_query = ("""INSERT INTO speedtest(ping, download, upload) VALUES (%s,%s,%s)""", (ping[0], download[0], upload[0]))
    cursor = connection.cursor()
    result = cursor.execute(*sql_insert_query)
    connection.commit()
    print("Insert success into speedtest tbl")
except mysql.connector.Error as error:
    connection.rollback()  # rollback if any exception occurred
    print("Failed inserting record into speedtest table {}".format(error))
finally:
    # closing database connection
    if connection.is_connected():
        cursor.close()
        connection.close()
        print("MySQL conn closed")

print 'Finished: {} {}'.format(time.strftime('%d/%m/%y %H:%M:%S'), "")
A manual run of the script works fine; only from crontab do I get unexpected values. Not sure how to solve this.
def hbasePopulate(self, table="abc", MachineIP="xx.xx.xx.xx"):
    connection = happybase.Connection(MachineIP, autoconnect=True)
    tablename = Reptype.lower() + 'rep'
    print "Connecting to table "
    print tablename
    try:
        table = connection.table(tablename)
        for key, data in table.scan():
            print key, data
        print table
    #except IOError as e:
    except:
        print "Table does not exist, creating"
        self.createTable(table=table, MachineIP=MachineIP)
    with table.batch() as b:
        with open('xxx.csv', 'r') as queryFile:
            for lines in queryFile:
                lines = lines.strip("\n")
                splitRecord = lines.split(",")
                key = splitRecord[0]
                key = key.replace("'", "")
                val = ",".join(splitRecord[1:])
                val = ast.literal_eval(val)
                table.put(splitRecord[0], val)
    for key, data in table.scan():
        print key, data

def createTable(self, table="abc", MachineIP=""):
    connection = happybase.Connection(MachineIP, autoconnect=True)
    print "Connection Handle", connection
    tname = table.lower()
    tablename = str(tname)
    print "Creating table : " + table + ", On Hbase machine : " + MachineIP
    families = {"cf": {}}  # using default column family
    connection.create_table(table, families=families)
    print "Creating table done "
Every time I run this script it populates data into the HBase table, but it leaves a connection open. When I check using netstat -an, I see the connection count has increased, and the connections persist even after the script completes.
Am I missing something? Do we need to explicitly close the connection?
Thanks for helping.
Got the solution. It turns out to be this:
try:
    connection.close()
except Exception as e:
    print "Unable to close connection to hbase "
    print e
If the program has exited, any of its network connections are automatically closed. What you're likely seeing is the TIME_WAIT state of already-closed connections.
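For completeness, a minimal sketch of the same idea with try/finally, so the connection is closed even if populating fails; the populate function and its body are placeholders, not the question's full code:

import happybase

def populate(machine_ip, tablename):
    connection = happybase.Connection(machine_ip, autoconnect=True)
    try:
        table = connection.table(tablename)
        # ... scan/put rows here ...
    finally:
        connection.close()  # runs on success and on error alike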
This link shows the database I've created in MySQL Workbench and the connection I have established in the code, but the database is unknown for some reason. Is there a step I've missed?
http://gyazo.com/d995c4da99043da43bfbd057a0a839c7
__author__ = 'avi'
from TwitterSearch import *
import json
twtsearch = TwitterSearch(
    consumer_key='PXTUrlRfgC1zSTsAPU9z6EHtD',
    consumer_secret='qM9F4FVj1qLFc6f795r96DQPNAJO8hkbWy4PXWYLfQcYyNGY7D',
    access_token='2943116292-wVHEjbfjX7OFqaOURBqim5o7Vs6lZyjxsoto8nD',
    access_token_secret='CJAppSRY9TZ5cwYTABZhH2YTd0rm5IzBDqPder6v4qLBA'
)
twtsearchorder = TwitterSearchOrder()
twtsearchorder.set_keywords(['iphone6'])
twtsearchorder.set_language('en')
twtsearchorder.set_include_entities(True)
tweet_limit=50
parsed_tweets= {}
table="twtinfo"
import MySQLdb as mdb
con = mdb.connect('localhost', 'root','root','tweetinfo')
cur=con.cursor()
for tweet in twtsearch.search_tweets_iterable(twtsearchorder):
    if tweet_limit > 0:
        parsed_tweets['name'] = tweet['user']['screen_name']
        parsed_tweets['content'] = tweet['text']
        parsed_tweets['user_id'] = tweet['user']['id']
        parsed_tweets['fav_count'] = tweet['favorite_count']
        parsed_tweets['location'] = tweet['user']['location']
        parsed_tweets['retweet_count'] = tweet['retweet_count']
        placeholders = ', '.join(['%s'] * len(parsed_tweets))
        columns = ', '.join(parsed_tweets.keys())
        sql = "INSERT into %s ( %s ) VALUES ( %s )" % (table, columns, placeholders)
        cur.execute(sql, parsed_tweets.values())
        tweet_limit -= 1
The MySQL process is complaining that the database you are trying to access, namely tweetinfo, doesn't exist. MySQL error 1049 is usually an indication of having forgotten to select a database, but you did select one, as the fourth argument to mdb.connect().
Possible causes could be:
That you have several MySQL processes running, with the one with the proper database not being on the default MySQL port.
That somehow your database GUI application hasn't actually submitted your database and table to the MySQL process.
That MySQL isn't running at all. You would probably get a different error message for that, but it could be worth making sure, just in case.
Just to check that your tables exist and that your database is in place, open a terminal and write the following commands:
mysql -u root -proot
use tweetinfo;
show create table twtinfo;
Another thing to try could be to ask the MySQL process about which database it thinks you are using. Try adding something like the following to your code:
cur.execute("SELECT DATABASE() FROM DUAL;")
print("Database is: %s.", cur.fetchone()[0])
I'm not a python programmer, so I'm not entirely confident that will work without some adjustments.
If none of this gave you a good lead, I'm not quite sure what's wrong.
I'm writing a Python script to monitor a few 1-Wire sensors on a Raspberry Pi and store the results in a MySQL database.
Using the MySQL Connector/Python library, I can successfully connect to the database and run the query; however, the transaction doesn't seem to fully commit. I know the query runs successfully, since the out param is set to the new auto-incremented ID.
CREATE TABLE `lamp`.`sensors` (
  `SensorID` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `SensorSerial` char(15) NOT NULL,
  `SensorFamily` tinyint(4) NOT NULL,
  PRIMARY KEY (`SensorID`),
  UNIQUE KEY `SensorID_UNIQUE` (`SensorID`),
  UNIQUE KEY `SensorSerial_UNIQUE` (`SensorSerial`)
)

CREATE PROCEDURE `lamp`.`AddSensor` (sensorSerial char(15),
                                     sensorFamily tinyint, out returnValue int)
BEGIN
  INSERT INTO sensors (SensorSerial, SensorFamily) VALUES (sensorSerial, sensorFamily);
  SET returnValue = LAST_INSERT_ID();
END
However, when I attempt to query the table (SELECT * FROM sensors) I get 0 results. If I run the procedure from MySQL Workbench or from a .NET application, everything works as expected, which means I'm missing something when it comes to Connector/Python, but I have no clue what. I'm extremely baffled, since the auto-increment value does increase but no records are added. There are also no errors reported.
def Test(self):
    # this line works fine
    #self.RunProcedure("INSERT INTO sensors (SensorSerial,SensorFamily) VALUES ('{0}',{1})".format(self.ID,self.Family),False,())
    # this line does not?
    args = self.RunProcedure('AddSensor', True, (self.ID, self.Family, -1))
    if args[2] >= 1:
        logging.debug("Successfully added sensor data '{1}' for sensor '{0}' to the database".format(self.ID, value))
        return True
    else:
        logging.critical("Failed to add Data to Database for unknown reason. SensorID: {0} Type: {1} Data:{2}".format(self.ID, self.Family, value))

def RunProcedure(self, proc, isStored, args):
    try:
        logging.debug("Attempting to connect to database.")
        connection = mysql.connector.connect(user='root', password='1q2w3e4r', host='localhost', database='LAMP')
    except mysql.connector.Error as e:
        logging.exception("Failed to connect to mysql database.\r\n {0}".format(e))
    else:
        logging.debug("Successfully connected to database.")
        try:
            cursor = connection.cursor()
            if isStored:
                args = cursor.callproc(proc, args)
                return args
            else:
                cursor.execute(proc, args)
            # these do not seem to solve the issue.
            #cursor.execute("Commit;")
            #connection.commit()
        except mysql.connector.Error as e:
            logging.exception("Exception while running the command '{0}' with the args of '{1}' exception is {2}".format(proc, args, e))
        finally:
            logging.debug("Closing connection to database")
            cursor.close()
            connection.close()
Output to the log looks like this:
2013-06-09 13:21:25,662 Attempting to connect to database.
2013-06-09 13:21:25,704 Successfully connected to database.
2013-06-09 13:21:25,720 Closing connection to database
2013-06-09 13:21:25,723 Successfully added sensor data '22.25' for sensor '10.85FDA8020800' to the database
Edit:
Not sure why, but adding autocommit=True to the connection parameters seems to have resolved the issue. Since it was a problem with committing, why didn't connection.commit() or cursor.execute('commit;') correct the issue?
The problem is actually in your code:
..
try:
    cursor = connection.cursor()
    if isStored:
        args = cursor.callproc(proc, args)
        return args
    else:
        cursor.execute(proc, args)
        connection.commit()
except mysql.connector.Error as e:
..
If you are using the Stored Routine, you are immediately returning, so commit() will never be called.
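A minimal sketch of the fix, keeping the question's RunProcedure names: commit before returning from the stored-routine branch. (Incidentally, this also explains the advancing auto-increment value: MySQL does not roll the auto-increment counter back when the transaction is rolled back.)

try:
    cursor = connection.cursor()
    if isStored:
        args = cursor.callproc(proc, args)
        connection.commit()   # commit the procedure's INSERT before the early return
        return args
    else:
        cursor.execute(proc, args)
        connection.commit()
except mysql.connector.Error as e:
    logging.exception("Exception while running '{0}': {1}".format(proc, e))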