I'm having some trouble converting some working code to a function. The first code sample runs without issue, but the function below just hangs. A little debugging with print statements showed that the function moves all the way through the cursor to the final record and appends it to the list, but then the program hangs and won't exit.
Cursor is part of the cx_Oracle module. The intent is to query an Oracle DB and then build a list. I have tested the original code on several queries and had no problems (the largest result is about 15,000 rows). At this point I can make the code work using the original format, but I'd like to know what I might be doing wrong in the function.
Working code:
cursor = db.cursor()
cursor.execute(mysqlexp)
for row in cursor:
    myList.append(row)
cursor.close()
Function (not working):
def sqlToList(listname, sqlexp):
    cursor = db.cursor()
    cursor.execute(sqlexp)
    for row in cursor:
        listname.append(row)
        # print statement here indicates that the final record appends
        # but then the program stops responding
    # print statement here never appears (indicating the for loop hasn't exited?)
    cursor.close()
sqlToList(myList, mysqlexp)
I'm new to SQL and psycopg2. I'm playing around a bit and trying to find out how to display the results of a query. I have a small script where I make a connection to the database and create a cursor to run the query.
from psycopg2 import connect

conn = connect(host="localhost", user="postgres", dbname="portfolio",
               password="empty")
cur = conn.cursor()
cur.execute("SELECT * FROM portfolio")
for record in cur:
    print("ISIN: {}, Naam: {}".format(record[0], record[1]))
print(cur.fetchmany(3))
cur.close()
conn.close()
If I run this code, the first print is fine, but the second print statement returns [].
If I run only one of the two print statements, I get a result every time.
Can someone explain to me why?
The cursor loops over the results and returns one at a time. When it has returned all of them, it can't return any more. This is precisely like when you loop over the lines in a file (there are no more lines once you reach the end of the file) or even looping over a list (there are no more entries in the list after the last one).
If you want to manipulate the results in Python, you should probably read them into a list, which you can then traverse as many times as you like, or search, sort, etc, or access completely randomly.
cur.execute("SELECT * FROM portfolio")
result = cur.fetchall()
for record in result:
print("ISIN: {}, Naam: {}".format(record[0], record[1]))
print(result[0:3]))
My set-up:
1. A MySQL server.
2. A host running a Python script.
(1) and (2) are different machines on the network.
The Python script generates data which must be stored in a MySQL database.
I use this (example) code to achieve that:
def sqldata(date, result):
    con = mdb.connect('sql.lan', 'demouser', 'demo', 'demo')
    with con:
        cur = con.cursor()
        cur.execute('INSERT INTO tabel(titel, nummer) VALUES(%s, %s)', (date, result))
The script generates one data point approximately every minute, which means a new connection is opened and closed every minute. I'm wondering if it would be a better idea to open the connection at the start of the script and only close it when the script terminates, effectively leaving the connection open indefinitely.
This then obviously raises the question of how to handle/recover when the SQL server "leaves" the network (e.g. due to a reboot) for a while.
While typing my question, this question appeared in the "Similar Questions" section. It is, however, from 2008 and possibly outdated, and the 4 answers it received seem to contradict each other.
What are the current insights in this matter?
Well, the referenced answer is right in its point, but maybe it doesn't answer all your questions. I can't provide a full running Python script for you here, but let me explain how I would go about it:
Rule 1: Generally, most MySQL functions return values that you should always check, so that you can react to unwanted behavior.
Rule 2: Open a connection at the beginning of your script and use this one and only connection throughout your script.
Obviously you could check if there is an existing connection in your sqldata function, and if not then you could open a new one to the global con object.
if not con:
    con = mdb.connect('sql.lan', 'demouser', 'demo', 'demo')
And if there is a connection already, you could check its "up status" by performing a simple query with a fixed, expected result, to see whether the SQL server is running.
if con:
    cur = con.cursor()
    returned = cur.execute('SELECT COUNT(*) FROM tabel')
    if returned:    # in MySQLdb, execute() returns the number of rows in the result
        ....
Basically you could avoid this, because if you don't get a cursor back, and you check that first before using it, then you already know whether the server is alive or not.
So CHECK, CHECK and CHECK. You should check everything you get back from a function to have good error handling. Using a connection or a cursor without checking it first can leave you talking to a None object and crash your script.
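To make that concrete, here is a minimal sketch of Rule 2 combined with those checks; get_connection() is a hypothetical helper (not part of MySQLdb), and con.ping() is used as the liveness test:

import MySQLdb as mdb

con = None

def get_connection():
    # hypothetical helper: reuse one global connection, reopening it if the server went away
    global con
    if not con:
        con = mdb.connect('sql.lan', 'demouser', 'demo', 'demo')
    else:
        try:
            con.ping(True)   # True asks MySQLdb to reconnect if the connection dropped
        except mdb.OperationalError:
            con = mdb.connect('sql.lan', 'demouser', 'demo', 'demo')
    return con

def sqldata(date, result):
    con = get_connection()
    with con:    # commits on success, rolls back on error
        cur = con.cursor()
        cur.execute('INSERT INTO tabel(titel, nummer) VALUES(%s, %s)', (date, result))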
And the last BIG HINT I can give you is to use multi-row inserts. You can actually insert hundreds of rows in one call if you just add the values, comma-separated, to your insert string:
# consider result would be filled like this
result = '("First Song",1),("Second Song",2),("Third Song",3)'
# then this will insert 3 rows with one call
returned = cur.execute('INSERT INTO tabel (titel, nummer) VALUES ' + result)
# since literally it will execute
#   INSERT INTO tabel (titel, nummer) VALUES ("First Song",1),("Second Song",2),("Third Song",3)
# and now you can check returned for any error
if returned:
    ....
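As a side note, if you are on MySQLdb (as in the question), executemany() achieves the same multi-row insert with proper parameter escaping, so you don't have to build the value string yourself; a small sketch:

# result as a list of tuples instead of a pre-built string
rows = [("First Song", 1), ("Second Song", 2), ("Third Song", 3)]
# the driver expands this into a single multi-row INSERT and escapes the values
returned = cur.executemany('INSERT INTO tabel (titel, nummer) VALUES (%s, %s)', rows)
con.commit()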
I have located a piece of code that runs quite slowly (in my opinion) and would like to know what you guys think. The code in question is as follows and is supposed to:
Query a database and get 2 fields, a field and its value
Populate the object dictionary with their values
The code is:
query = "SELECT Field, Value FROM metrics " \
"WHERE Status NOT LIKE '%ERROR%' AND Symbol LIKE '{0}'".format(self.symbol)
query = self.db.run(query, True)
if query is not None:
for each in query:
self.metrics[each[0].lower()] = each[1]
The query is run using a db class I created that is very simple:
def run(self, query, onerrorkeeprunning=False):
    # Run query provided and return result
    try:
        con = lite.connect(self.db)
        cur = con.cursor()
        cur.execute(query)
        con.commit()
        runsql = cur.fetchall()
        data = []
        for rows in runsql:
            line = []
            for element in rows:
                line.append(element)
            data.append(line)
        return data
    except lite.Error, e:
        if onerrorkeeprunning is True:
            if con:
                con.close()
            return
        else:
            print 'Error %s:' % e.args[0]
            sys.exit(1)
    finally:
        if con:
            con.close()
I know there are tons of ways of writing this code and I was trying to keep things simple, but for 24 fields this takes 0.03s, so if I have 1,000 elements that is 30s, which I find a little too long!
EDIT: on further review, runsql = cur.fetchall() is the line that takes the most to run.
Any help will be much appreciated.
2nd EDIT: Looking online further, I have found the issue lies with the fetchall() command and not with my query or the initialization of the DB. Has anybody been able to improve the performance of the result fetching? (Some people mentioned changing the SQL code, but that is not to blame; it runs pretty fast, and the slowness comes when you try to grab those results.)
fetchall() reads all results, and returns them in a temporary list.
Your run() function then just puts all the results into another list.
Your top-level code then copies these values into yet another dictionary.
You should fetch only the row you need (which can be done directly on the cursor), and handle it directly:
cur.execute("SELECT Field, Value ...")
for row in cur:
self.metrics[row[0].lower()] = row[1]
Note: this spreads the cost of the SQL query over the whole for loop; the overall time spent in the database does not change.
This code saves only the time that would have been spent building and copying all the temporary lists.
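If you want to keep your db wrapper class, one option is a generator method that yields rows straight from the cursor instead of copying them into nested lists. A rough sketch (run_iter is an illustrative name, error handling omitted):

def run_iter(self, query):
    # yield rows one at a time; the connection stays open while the caller iterates
    con = lite.connect(self.db)
    try:
        cur = con.cursor()
        cur.execute(query)
        for row in cur:
            yield row
    finally:
        con.close()

The caller would then do something like: for field, value in self.db.run_iter(query): self.metrics[field.lower()] = value.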
The typical MySQLdb library query can use a lot of memory and perform poorly in Python, when a large result set is generated. For example:
cursor.execute("SELECT id, name FROM `table`")
for i in xrange(cursor.rowcount):
id, name = cursor.fetchone()
print id, name
There is an optional cursor that will fetch just one row at a time, really speeding up the script and cutting the memory footprint of the script a lot.
import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(user="user", passwd="password", db="dbname",
                       cursorclass=MySQLdb.cursors.SSCursor)
cur = conn.cursor()
cur.execute("SELECT id, name FROM users")
row = cur.fetchone()
while row is not None:
    doSomething()
    row = cur.fetchone()
cur.close()
conn.close()
But I can't find anything about using SSCursor with nested queries. If this is the definition of doSomething():
def doSomething():
    cur2 = conn.cursor()
    cur2.execute('select id,x,y from table2')
    rows = cur2.fetchall()
    for row in rows:
        doSomethingElse(row)
    cur2.close()
then the script throws the following error:
_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")
It sounds as if SSCursor is not compatible with nested queries. Is that true? If so, that's too bad, because the main loop seems to run too slowly with the standard cursor.
This problem is discussed a bit in the MySQLdb User's Guide, under the heading of the threadsafety attribute (emphasis mine):
The MySQL protocol can not handle multiple threads using the same connection at once. Some earlier versions of MySQLdb utilized locking to achieve a threadsafety of 2. While this is not terribly hard to accomplish using the standard Cursor class (which uses mysql_store_result()), it is complicated by SSCursor (which uses mysql_use_result()); with the latter you must ensure all the rows have been read before another query can be executed.
The documentation for the MySQL C API function mysql_use_result() gives more information about your error message:
When using mysql_use_result(), you must execute mysql_fetch_row() until a NULL value is returned, otherwise, the unfetched rows are returned as part of the result set for your next query. The C API gives the error Commands out of sync; you can't run this command now if you forget to do this!
In other words, you must completely fetch the result set from any unbuffered cursor (i.e., one that uses mysql_use_result() instead of mysql_store_result() - with MySQLdb, that means SSCursor and SSDictCursor) before you can execute another statement over the same connection.
In your situation, the most direct solution would be to open a second connection to use while iterating over the result set of the unbuffered query. (It wouldn't work to simply get a buffered cursor from the same connection; you'd still have to advance past the unbuffered result set before using the buffered cursor.)
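A minimal sketch of that two-connection layout, reusing the connection parameters and placeholder functions from the examples above:

import MySQLdb
import MySQLdb.cursors

# unbuffered connection for the big outer query, ordinary buffered one for the nested queries
outer_conn = MySQLdb.connect(user="user", passwd="password", db="dbname",
                             cursorclass=MySQLdb.cursors.SSCursor)
inner_conn = MySQLdb.connect(user="user", passwd="password", db="dbname")

outer = outer_conn.cursor()
outer.execute("SELECT id, name FROM users")
for id, name in outer:
    inner = inner_conn.cursor()
    inner.execute('select id,x,y from table2')
    for row in inner.fetchall():
        doSomethingElse(row)
    inner.close()
outer.close()

outer_conn.close()
inner_conn.close()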
If your workflow is something like "loop through a big result set, executing N little queries for each row," consider looking into MySQL's stored procedures as an alternative to nesting cursors from different connections. You can still use MySQLdb to call the procedure and get the results, though you'll definitely want to read the documentation of MySQLdb's callproc() method since it doesn't conform to Python's database API specs when retrieving procedure outputs.
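For completeness, calling a procedure from MySQLdb looks roughly like the sketch below; summarize_user is a hypothetical procedure name, and the details of retrieving output parameters are exactly what the callproc() docs cover:

cur = conn.cursor()
cur.callproc('summarize_user', (user_id,))   # hypothetical stored procedure and argument
for row in cur.fetchall():                   # result set(s) produced by the procedure
    doSomething()
while cur.nextset() is not None:             # drain any remaining result sets
    pass
cur.close()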
A second alternative is to stick to buffered cursors, but split up your query into batches. That's what I ended up doing for a project last year where I needed to loop through a set of millions of rows, parse some of the data with an in-house module, and perform some INSERT and UPDATE queries after processing each row. The general idea looks something like this:
QUERY = r"SELECT id, name FROM `table` WHERE id BETWEEN %s and %s;"
BATCH_SIZE = 5000
i = 0
while True:
cursor.execute(QUERY, (i + 1, i + BATCH_SIZE))
result = cursor.fetchall()
# If there's no possibility of a gap as large as BATCH_SIZE in your table ids,
# you can test to break out of the loop like this (otherwise, adjust accordingly):
if not result:
break
for row in result:
doSomething()
i += BATCH_SIZE
One other thing I would note about your example code is that you can iterate directly over a cursor in MySQLdb instead of calling fetchone() explicitly over xrange(cursor.rowcount). This is especially important when using an unbuffered cursor, because the rowcount attribute is undefined and will give a very unexpected result (see: Python MysqlDB using cursor.rowcount with SSDictCursor returning wrong count).
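In other words, the first example could be collapsed to something like this (a sketch, using the same placeholder function as before):

cursor.execute("SELECT id, name FROM `table`")
for id, name in cursor:   # works with buffered and unbuffered cursors alike; no rowcount needed
    doSomething()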
I have the following code:
def executeQuery(conn, query):
    cur = conn.cursor()
    cur.execute(query)
    return cur

def trackTagsGenerator(chunkSize, baseCondition):
    """ Returns a dict of trackId:tag limited to chunkSize. """
    sql = """
        SELECT track_id, tag
        FROM tags
        WHERE {baseCondition}
    """.format(baseCondition=baseCondition)
    limit = chunkSize
    offset = 0
    while True:
        trackTags = {}
        # fetch the track ids with the corresponding tag
        limitPhrase = " LIMIT %d OFFSET %d" % (limit, offset)
        query = sql + limitPhrase
        offset += limit
        cur = executeQuery(smacConn, query)
        rows = cur.fetchall()
        if not rows:
            break
        for row in rows:
            trackTags[row['track_id']] = row['tag']
        yield trackTags
I want to use it like this:
for trackTags in list(trackTagsGenerator(DATA_CHUNK_SIZE, baseCondition)):
print trackTags
break
This code produces the following error without even fetching one chunk of track tags:
Exception _mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now") in <bound method SSDictCursor.__del__ of <MySQLdb.cursors.SSDictCursor object at 0x10b067b90>> ignored
I suspect it's because I have the query-execution logic in the body of the loop in the generator function.
Is someone able to tell me how to fetch chunks of data using MySQLdb in such a way?
I'm pretty sure this is because it can run into situations where you've got two queries running simultaneously because of the yield. Depending on how you call the function (threads, async, etc.), I'm pretty sure your cursor might get clobbered too.
As well, you're opening yourself up to (sorry, but I can't sugar-coat this part) horrific SQL injection holes by inserting baseCondition using what is essentially a printf. Take a look at the DB-API's parameter substitution docs for help.
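For instance, if baseCondition boils down to matching a tag value, a parameterized version could look roughly like this (the column and the wanted_tag variable are only illustrative; the real condition depends on what baseCondition contains):

sql = """
    SELECT track_id, tag
    FROM tags
    WHERE tag = %s
    LIMIT %s OFFSET %s
"""
# the driver escapes the values, so nothing from user input is spliced into the SQL text
cur.execute(sql, (wanted_tag, limit, offset))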
Yield isn't going to save you time or energy here at all; the full SQL command will always need to run before you'll get a single result. (Hence you're using LIMIT and OFFSET to make it more friendly, kudos.)
For example: someone updates the table while you're yielding out some data. In this particular case that's not the end of the world; in many others, it gets ugly.
If you're just goofing around and you want this to work 'right-now-dammit', it'd probably work to modify executeQuery as such:
def executeQuery(conn, query):
    cur = conn.cursor()
    cur.execute(query)
    rows = cur.fetchall()
    cur.close()
    return rows
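With executeQuery() returning the rows themselves, the generator body would then drop its own fetchall() call and do roughly:

rows = executeQuery(smacConn, query)
if not rows:
    break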
One thing that also kinda jumps out at me: you define trackTags = {}, but then you update tagTrackIds and yield trackTags, which will always be an empty dict.
My suggestion would be to not bother yourself with the headache of hand writing SQL if you're just trying to get a hobby project working. Take a look at Elixir which is built on top of SQLAlchemy.
Using an ORM (object-relational-mapper) can be a much more friendly introduction to databases. Defining what your objects look like in Python, and having it automatically generate your schema for you - and being able to add/modify/delete things in a Pythonic manner is really nifty.
If you really need to be async, check out ultramysql python module.
You use an SSDictCursor, something that maps to mysql_use_result() on the MySQL API side. This requires that you read out the complete result before you can issue a new command.
As this happens before you receive the first chunk of data after all: are you sure that this doesn't happen in the context of the query run before this part of the code is executed? The results of that last query might still be in the pipeline, and executing the next one (i.e., the first one in this context) might break things...
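If that is the case, one way to rule it out is to drain or close that earlier cursor before this code runs; a sketch, where prev_cur stands for whatever SSDictCursor ran the previous query:

prev_cur.fetchall()   # read out anything left of the earlier unbuffered result
prev_cur.close()      # the connection is now free for the next execute()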