My program is eating up about a megabyte every few seconds. I read that Python doesn't see cursors in garbage collection, so I have a feeling I might be doing something wrong with my use of pyodbc and SQLAlchemy and maybe not closing something somewhere?
#Set up SQL Connection
import atexit
import pyodbc
from sqlalchemy import MetaData, Table, create_engine

def connect():
    conn_string = 'DRIVER={FreeTDS};Server=...;Database=...;UID=...;PWD=...'
    return pyodbc.connect(conn_string)

metadata = MetaData()
e = create_engine('mssql://', creator=connect)
c = e.connect()
metadata.bind = c
log_table = Table('Log', metadata, autoload=True)
...
atexit.register(cleanup)
#Core Loop
line_c = 0
inserts = []
insert_size = 2000

while True:
    #line = sys.stdin.readline()
    line = reader.readline()
    line_c += 1
    m = line_regex.match(line)
    if m:
        fields = m.groupdict()
        ...
        inserts.append(fields)
        if line_c >= insert_size:
            c.execute(log_table.insert(), inserts)
            line_c = 0
            inserts = []
Should I move the metadata block (or part of it) into the insert block and close the connection after each insert?
Edit:
Q: Does it ever stabilize?
A: Only if you count Linux blowing away the process :-) (The graph excludes buffers/cache from memory usage.)
I would not necessarily blame SQLAlchemy; it could also be a problem in the underlying driver. In general, memory leaks are hard to detect. In any case, you should ask on the SQLAlchemy mailing list, where the core developer Michael Bayer responds to almost every question... you probably have a better chance of getting real help there.
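As for closing the connection after each insert: a minimal sketch, not a confirmed fix, that reuses the engine e and log_table from the question but checks a connection out of the pool per batch instead of holding one open for the life of the process (the metadata could then be bound to the engine, metadata.bind = e, rather than to a single connection):

def flush_batch(engine, table, rows):
    if not rows:
        return
    conn = engine.connect()            # borrow a connection from the pool
    try:
        conn.execute(table.insert(), rows)   # executemany-style batch insert
    finally:
        conn.close()                   # return the connection to the pool

In the core loop, c.execute(log_table.insert(), inserts) would become flush_batch(e, log_table, inserts). Whether that actually stops the growth depends on where the leak is (SQLAlchemy, pyodbc, or FreeTDS), so it is worth measuring before and after.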
Related
Upon closing the connection, the deletion done by the stored procedure DeleteSproc gets rolled back. What's wrong with this code?
try:
    sql = '{CALL dbo.DeleteSproc (?,?,?,?,?,?,?,?)}'
    values = (c['brandId'], c['requestUuid'], c['registrationUuid'], i['tuid'], i['tpid'], c['status'], c['responseType'], i['BookingItemIds'])
    connBS = pyodbc.connect(l['connectionStrings'][0])
    cursorBS = connBS.cursor()
    rv = cursorBS.execute(sql, values)
    sql = '{CALL dbo.StatusProc (?,?,?)}'
    values = (c['requestUuid'], i['tuid'], i['tpid'])
    cursorBS.execute(sql, values)
    rows = cursorBS.fetchall()
finally:
    cursorBS.close()
    connBS.close()
I was able to solve this by putting COMMIT at the end of the stored procedure.
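For anyone hitting the same thing: pyodbc opens connections with autocommit off, so any work not committed before close() is rolled back. Committing on the Python side is an alternative to putting COMMIT inside the procedure. A sketch of the same block (l and values as in the question):

connBS = pyodbc.connect(l['connectionStrings'][0])
cursorBS = connBS.cursor()
try:
    cursorBS.execute('{CALL dbo.DeleteSproc (?,?,?,?,?,?,?,?)}', values)
    connBS.commit()   # persist the delete before the connection is closed
finally:
    cursorBS.close()
    connBS.close()

Passing autocommit=True to pyodbc.connect() would have the same effect.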
I am trying to build a learner which calls a function and stores the weights into the DB. The problem is that learning takes 30 to 60 seconds, so if I want to store the result I need to wait. I decided to call the function with a threading timer, which should call the function after a specified delay.
Example of code:
def learn(myConnection):
    '''
    Derive all the names, images where state = 1
    Learn and Store
    Delete all the columns where state is 1
    '''
    id = 0
    with myConnection:
        cur = myConnection.cursor()
        cur.execute("Select name, image FROM images WHERE state = 1")
        rows = cur.fetchall()
        for row in rows:
            print "%s, %s" % (row[0], row[1])
            name = 'images/Output%d.jpg' % (id,)
            names = row[0]
            with open(name, "wb") as output_file:
                output_file.write(row[1])
            unknown_image = face_recognition.load_image_file(name)
            unknown_encoding = face_recognition.face_encodings(unknown_image)[0]
            # here i give a timer and call the function
            threading = Timer(60, storeIntoSQL(names, unknown_encoding))
            threading.start()
            id += 1
The thing that did not work: it behaves as if I had not specified the timer at all. It doesn't wait 60 seconds; it just runs as if I had called the function directly. Any ideas on how I can make this work, or what alternatives I can use? P.S. I have already tried time.sleep; it just stops the main thread, and I need the project to keep running while this is training.
Example of the function that is being called:
def storeIntoSQL(name, unknown_face_encoding):
    print 'i am printing'
    # connect to the database
    con = lite.connect('users2.db')
    # store new person into the database
    with con:
        cur = con.cursor()
        # get the new id
        cur.execute("SELECT DISTINCT id FROM Users")
        rows = cur.fetchall()
        newId = len(rows) + 1
        # store into the Database
        query = "INSERT INTO Users VALUES (?,?,?)"
        cur.executemany(query, [(newId, name, r,) for r in unknown_face_encoding])
I was also told that mutex synchronization could help, where one thread only runs once the other thread has finished its job, but I am not sure how to implement it and am open to any suggestions.
I would suggest using Python's threading library and putting a time.sleep(60) inside your function or in a wrapper function. For example:
import time
import threading

def delayed_func(name, unknown_face_encoding):
    time.sleep(60)
    storeIntoSQL(name, unknown_face_encoding)

timer_thread = threading.Thread(target=delayed_func, args=(name, unknown_face_encoding))
timer_thread.start()
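For what it's worth, the reason the original Timer attempt fired immediately is that Timer(60, storeIntoSQL(names, unknown_encoding)) calls storeIntoSQL right there and hands its return value to Timer; the callable and its arguments have to be passed separately. A minimal sketch, using the names and unknown_encoding from the question's loop:

from threading import Timer

# Pass the function object and its arguments separately; the call only
# happens when the timer fires, roughly 60 seconds later, off the main thread.
t = Timer(60, storeIntoSQL, args=(names, unknown_encoding))
t.start()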
I am wondering what the proper way is to use a MySQL pool with Celery tasks.
At the moment, this is what the relevant portion of my tasks module looks like:
from start import celery
import PySQLPool as pool

dbcfg = config.get_config('inputdb')
input_db = pool.getNewConnection(username=dbcfg['user'], password=dbcfg['passwd'], host=dbcfg['host'], port=dbcfg['port'], db=dbcfg['db'], charset='utf8')

dbcfg = config.get_config('outputdb')
output_db = pool.getNewConnection(username=dbcfg['user'], password=dbcfg['passwd'], host=dbcfg['host'], port=dbcfg['port'], db=dbcfg['db'], charset='utf8')

@celery.task
def fetch():
    ic = pool.getNewQuery(input_db)
    oc = pool.getNewQuery(output_db)
    count = 1
    for e in get_new_stuff():
        # do stuff with new stuff
        # read the db with ic
        # write to db using oc
        # commit from time to time
        if count % 1000 == 0:
            pool.commitPool()
        count += 1
    # commit whatever's left
    pool.commitPool()
On one machine there can be at most 4 fetch() tasks running at the same time (1 per core).
I notice, however, that sometimes a task will hang, and I suspect it is due to MySQL.
Any tips on how to use mysql and celery?
Thank you!
I am also using Celery and PySQLPool.
maria = PySQLPool.getNewConnection(username=app.config["MYSQL_USER"],
                                   password=app.config["MYSQL_PASSWORD"],
                                   host=app.config["MYSQL_HOST"],
                                   db='configuration')

def myfunc(self, param1, param2):
    query = PySQLPool.getNewQuery(maria, True)
    try:
        sSql = """
            SELECT * FROM table
            WHERE col1 = %s AND col2
        """
        tDatas = (var1, var2)
        query.Query(sSql, tDatas)
        return query.record
    except Exception, e:
        logger.info(e)
        return False

@celery.task
def fetch():
    myfunc('hello', 'world')
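If tasks still hang, one thing worth trying (a sketch, not a confirmed fix) is building the connection inside the task instead of at import time, so each Celery worker process gets its own MySQL connection rather than one created in the parent before the workers fork. This uses only the PySQLPool calls already shown above; config.get_config and get_new_stuff come from the question:

@celery.task
def fetch():
    # Build the connection per task/worker instead of at module import time.
    dbcfg = config.get_config('inputdb')
    input_db = PySQLPool.getNewConnection(username=dbcfg['user'], password=dbcfg['passwd'],
                                          host=dbcfg['host'], port=dbcfg['port'],
                                          db=dbcfg['db'], charset='utf8')
    ic = PySQLPool.getNewQuery(input_db)
    for e in get_new_stuff():
        # read with ic, write with an output query built the same way,
        # and commit from time to time as before
        pass
    PySQLPool.commitPool()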
I'm not sure if anyone else has this problem, but I'm getting a "Too big query offset" exception when using a cursor for chaining tasks on the App Engine development server (not sure if it happens in production).
The error occurs when requesting a cursor after 4000+ records have been processed in a single query.
I wasn't aware that offsets had anything to do with cursors, and perhaps it's just a quirk in the App Engine SDK.
To fix it, either shorten the time allowed before the task is deferred (so fewer records get processed at a time), or, when checking the elapsed time, also check that the number of records processed is still within range, e.g. if time.time() > end_time or count == 2000. Reset the count and defer the task. 2000 is an arbitrary number; I'm not sure what the limit should be.
EDIT:
After making the above-mentioned changes, the task never finishes executing. The with_cursor(cursor) code is being called, but it seems to start at the beginning each time. Am I missing something obvious?
The code that causes the exception is as follows:
The table "Transact" has 4800 rows. The error occurs when transacts.cursor() is called when time.time() > end_time is true. 4510 records have been processed at the time when the cursor is requested, which seems to cause the error (on development server, haven't tested elsewhere).
def some_task(trans):
    tts = db.get(trans)
    for t in tts:
        #logging.info('in some_task')
        pass

def test_cursor(request):
    ret = test_cursor_task()
    return HttpResponse('')

def test_cursor_task(cursor=None):
    startDate = datetime.datetime(2010, 7, 30)
    endDate = datetime.datetime(2010, 8, 30)
    end_time = time.time() + 20.0
    transacts = Transact.all().filter('transactionDate >', startDate).filter('transactionDate <=', endDate)
    count = 0
    if cursor:
        transacts.with_cursor(cursor)
    trans = []
    logging.info('queue_trans')
    for tran in transacts:
        count += 1
        #trans.append(str(tran))
        trans.append(str(tran.key()))
        if len(trans) == 20:
            deferred.defer(some_task, trans, _countdown=500)
            trans = []
        if time.time() > end_time:
            logging.info(count)
            if len(trans) > 0:
                deferred.defer(some_task, trans, _countdown=500)
                trans = []
            logging.info('time limit exceeded setting next call to queue')
            cursor = transacts.cursor()
            deferred.defer(test_cursor_task, cursor)
            logging.info('returning false')
            return False
    return True
Hope this helps someone.
Thanks
Bert
Try this again without using the iter functionality:
#...
CHUNK = 500
objs = transacts.fetch(CHUNK)
for tran in objs:
    do_your_stuff
if len(objs) == CHUNK:
    deferred.defer(my_task_again, cursor=str(transacts.cursor()))
This works for me.
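Putting the two together, a sketch (assuming the question's old google.appengine.ext.db query API and google.appengine.ext.deferred; my_task_again is the answer's placeholder name for the task that re-queues itself) of what the fetch()-based task could look like:

def my_task_again(cursor=None):
    startDate = datetime.datetime(2010, 7, 30)
    endDate = datetime.datetime(2010, 8, 30)
    transacts = Transact.all().filter('transactionDate >', startDate) \
                              .filter('transactionDate <=', endDate)
    if cursor:
        transacts.with_cursor(cursor)   # resume where the previous run stopped

    CHUNK = 500
    objs = transacts.fetch(CHUNK)       # explicit fetch instead of iterating the query
    trans = []
    for tran in objs:
        trans.append(str(tran.key()))
        if len(trans) == 20:
            deferred.defer(some_task, trans, _countdown=500)
            trans = []
    if trans:
        deferred.defer(some_task, trans, _countdown=500)

    if len(objs) == CHUNK:              # a full batch means there may be more rows
        deferred.defer(my_task_again, cursor=str(transacts.cursor()))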
I'm developing a logger daemon for Squid to store the logs in a MongoDB database, but I'm seeing too much CPU utilization. How can I optimize this code?
from sys import stdin
from pymongo import Connection

connection = Connection()
db = connection.squid
logs = db.logs

buffer = []

a = 'timestamp'
b = 'resp_time'
c = 'src_ip'
d = 'cache_status'
e = 'reply_size'
f = 'req_method'
g = 'req_url'
h = 'username'
i = 'dst_ip'
j = 'mime_type'
L = 'L'

while True:
    l = stdin.readline()
    if l[0] == L:
        l = l[1:].split()
        buffer.append({
            a: float(l[0]),
            b: int(l[1]),
            c: l[2],
            d: l[3],
            e: int(l[4]),
            f: l[5],
            g: l[6],
            h: l[7],
            i: l[8],
            j: l[9]
        })
    if len(buffer) == 1000:
        logs.insert(buffer)
        buffer = []
    if not l:
        break

connection.disconnect()
This might be a job for a Python profiler. There are a few built-in Python profiling modules, such as cProfile; you can read more about it in the standard library docs.
I'd suspect it might actually be readline() causing the CPU utilization. Try running the same code with the readline replaced by a constant buffer that you provide, and try running with the database inserts commented out, to establish which one is the culprit.
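For example, the daemon could be run under cProfile and the dump inspected afterwards with pstats (the file names below are placeholders):

# Run the daemon under the profiler first:
#   python -m cProfile -o squid_logger.prof squid_logger.py
import pstats

stats = pstats.Stats('squid_logger.prof')
stats.sort_stats('cumulative').print_stats(20)   # top 20 entries by cumulative time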
The CPU usage comes from that busy while True loop.
How many lines per minute do you have? Move the

if len(buffer) == 1000:
    logs.insert(buffer)
    buffer = []

check to right after the buffer.append.
I can tell you more once you say how many insertions you are getting so far.
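For reference, a sketch of the loop with that suggestion applied: the buffer-size check only runs right after an append, and EOF is tested before indexing the line so the loop exits cleanly. Everything else is unchanged from the question:

while True:
    l = stdin.readline()
    if not l:                    # EOF: stop reading
        break
    if l[0] == L:
        parts = l[1:].split()
        buffer.append({
            a: float(parts[0]), b: int(parts[1]), c: parts[2], d: parts[3],
            e: int(parts[4]), f: parts[5], g: parts[6], h: parts[7],
            i: parts[8], j: parts[9],
        })
        if len(buffer) == 1000:  # only worth checking after an append
            logs.insert(buffer)
            buffer = []

if buffer:
    logs.insert(buffer)          # flush whatever is left after EOF
connection.disconnect()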