I'm running a relatively simple python script that is meant to read a text file that has a series of stored procedures - one per line. The script should run the stored procedure on the first line, move to the second line, run the stored procedure on the second line, etc etc. Running these stored procedures should populate a particular table.
So my problem is that these procedures aren't populating the table with all of the results that they should be. For example, if my file looks like
exec myproc 'data1';
exec myproc 'data2';
Where myproc 'data1' should populate this other table with about ~100 records, and myproc 'data2' should populate this other table with an additional ~50 records. Instead, I end up with about 9 results total - 5 from the first proc, 4 from the second.
I know the procedures work, because if I run the same sql file (with the procs) through OSQL and I get the correct ~150 records in the other table, so obviously it's something to do with my script.
Here's the code I'm running to do this:
import pypyodbc
conn = pypyodbc.connect(CONN_STR.format("{SQL Server}",server,database,user,password))
conn.autoCommit = True
procsFile = openFile('otherfile.txt','r+')
#loop through each proc (line) in the file
for proc in procsFile:
#run the procedure
curs = conn.cursor()
curs.execute(proc)
#commit results
conn.commit()
curs.close();
conn.close();
procsFile.close();
I'm thinking this has something to do with the procedures not committing...or something?? Frankly I don't really understand why only 5/100 records would be committed.
I dont know. any help or advice would be much appreciated.
There a couple things to check. One is that your data1 is actually a string 'data1', or if you want the value of data1? If you want the string 'data1', then you will have to add quotes around it. So your string to execute would look like this:
exec_string = 'exec my_proc \'data1\';'
In your case you turn ON auto-commit and also you manually commit for the entire connection.
I would comment out the auto-commit line:
#conn.autoCommit = True
And then change conn.commit() to the cursor instead
curs.commit()
As a one-liner:
conn.cursor().execute('exec myproc \'data1\';').commit()
Also, your Python semi-colons (;) at the end of the python line are unnecessary, and may be doing something weird in your for-loop. (Keep the SQL ones though.)
Related
In making some slow code run more efficiently, I ran into a snag with escape characters. The original code (which works) was something like this:
sql = f"""INSERT INTO {schema}.{table}
(seg_id, road_type, road_name)
VALUES ({s_id}, '{road_type}', %s);"""
# road_name may have an apostrophe. Let the execute() handle it.
cur.execute(sql, (road_name,))
This query ran many thousands of times, and the execute statements make the code sluggish due to communicating with a remote server. I would like to get a series of sql statements and execute them all at once, something like this:
sql += f"""INSERT INTO {schema}.{table}
(seg_id, road_type, road_name)
VALUES ({s_id}, '{road_type}', '{road_name}');\n"""
The sql statement is not executed immediately, but is "stacked" so that I can execute a collection of sql statements in a single cur.execute(sql) which happens later. However, since it does not have the road_name as an argument in the execute statement, it fails when there's a road name with an apostrophe (like Eugene O'Neill Drive).
What is a viable alternative?
NOTE: The query is in a function that is called from multiple processes. The {variables} are being passed as arguments.
So, I have the following code that inserts the data of an old database to a new one:
...
cur_old.execute("""SELECT DISTINCT module FROM all_students_users_log_course266""")
module_rows = cur_old.fetchall()
for row in module_rows:
cur_new.execute("""INSERT INTO modules(label) SELECT %s WHERE NOT EXISTS (SELECT 1 FROM modules WHERE label=%s)""", (row[0], row[0]))
...
The last line executes a query where labels are inserted into the new database table. I tested this query on pgAdmin and it works as I want.
However, when execute the script, nothing is inserted on the modules table. (Actually the sequences are updated, but none data is stored on the table).
Do I need to do anything else after I call the execute method from the cursor?
(Ps. The script is running till the end without any errors)
You forgot to do connection.commit(). Any alteration in the database has to be followed by a commit on the connection. For example, the sqlite3 documentation states it clearly in the first example:
# Save (commit) the changes.
conn.commit()
And the first example in the psycopg2 documentation does the same:
# Make the changes to the database persistent
>>> conn.commit()
As Evert said, the commit() was missing. An alternative to always specifying it in your code is using the autocommit feature.
http://initd.org/psycopg/docs/connection.html#connection.autocommit
For example like this:
with psycopg2.connect("...") as dbconn:
dbconn.autocommit=True
My set-up:
MySQL server.
host running a python script.
(1) and (2) are different machines on the network.
The python script generates data which must be stored in a MySQL-database.
I use this (example-)code to achieve that:
def function sqldata(date,result):
con = mdb.connect('sql.lan', 'demouser', 'demo', 'demo')
with con:
cur = con.cursor()
cur.execute('INSERT INTO tabel(titel, nummer) VALUES( %s, %s)',(date, result))
The scipt generates one data-point approx. every minute. So this means that a new connection is opened and closed every minute. I'm wondering if it would be a better idea to open the connection at the start of the script and only close it when the script terminates. Effectively leaving the connection open indefinately.
This then obviously begs the question how to handle/recover when the SQL-server "leaves" the network (e.g. due to a reboot) for a while.
While typing my question this question appeared in the "Similar Questions" section. It is, however, from 2008 and possibly outdated and the 4 answers it received seem to contradict with each other.
What are the current insights in this matter?
Well the referred answer is right in it's point, but maybe not answering all your questions. I can not provide a full running python script for you here, but let me explain how i would go along with it:
Rule 1: Generally most mysql functions return values, that you should always check so that you can react on unwanted behavior.
Rule 2: Open a connection at the beginning of your script and use this one and only connection throughout your script.
Obviously you could check if there is an existing connection in your sqldata function, and if not then you could open a new one to the global con object.
if not con:
con = mdb.connect('sql.lan', 'demouser', 'demo', 'demo')
And if there is a connection already, you could check it's "up status" by performing a simple query with fixed expected result that you can check to see if the sql server is running.
if con:
cur = con.cursor()
returned = cur.execute('SELECT COUNT(*) FROM tabel')
if returned.with_rows:
....
Basically you could avoid this, because if you don't get a cursor back, and you check that first before using it, then you already know if the server is alive or not.
So CHECK, CHECK and CHECK. You should check everything you get back from a function to have a good error handling. Just using a connection or using a cursor without checking it first, can lead you talking to a NIL object and crashing your script.
And the last BIG HINT i can give you is to use multiple row inserts. You can actually insert hundreds of rows, if you just add the values comma seperated to your insert string:
# consider result would be filled like this
result = '("First Song",1),("Second Song",2),("Third Song",3)'
# then this will insert 3 rows with one call
returned = cur.execute('INSERT INTO tabel (titel, nummer) VALUES %s',(date, result), multi=True)
# since literally it will execute
returned = cur.execute('INSERT INTO tabel (titel, nummer) VALUES ("First Song",1),("Second Song",2),("Third Song",3)', multi=True)
# and now you can check returned for any error
if returned:
....
The typical MySQLdb library query can use a lot of memory and perform poorly in Python, when a large result set is generated. For example:
cursor.execute("SELECT id, name FROM `table`")
for i in xrange(cursor.rowcount):
id, name = cursor.fetchone()
print id, name
There is an optional cursor that will fetch just one row at a time, really speeding up the script and cutting the memory footprint of the script a lot.
import MySQLdb
import MySQLdb.cursors
conn = MySQLdb.connect(user="user", passwd="password", db="dbname",
cursorclass = MySQLdb.cursors.SSCursor)
cur = conn.cursor()
cur.execute("SELECT id, name FROM users")
row = cur.fetchone()
while row is not None:
doSomething()
row = cur.fetchone()
cur.close()
conn.close()
But I can't find anything about using SSCursor with with nested queries. If this is the definition of doSomething():
def doSomething()
cur2 = conn.cursor()
cur2.execute('select id,x,y from table2')
rows = cur2.fetchall()
for row in rows:
doSomethingElse(row)
cur2.close()
then the script throws the following error:
_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")
It sounds as if SSCursor is not compatible with nested queries. Is that true? If so that's too bad because the main loop seems to run too slowly with the standard cursor.
This problem in discussed a bit in the MySQLdb User's Guide, under the heading of the threadsafety attribute (emphasis mine):
The MySQL protocol can not handle multiple threads using the same
connection at once. Some earlier versions of MySQLdb utilized locking
to achieve a threadsafety of 2. While this is not terribly hard to
accomplish using the standard Cursor class (which uses
mysql_store_result()), it is complicated by SSCursor (which uses
mysql_use_result(); with the latter you must ensure all the rows have
been read before another query can be executed.
The documentation for the MySQL C API function mysql_use_result() gives more information about your error message:
When using mysql_use_result(), you must execute mysql_fetch_row()
until a NULL value is returned, otherwise, the unfetched rows are
returned as part of the result set for your next query. The C API
gives the error Commands out of sync; you can't run this command now
if you forget to do this!
In other words, you must completely fetch the result set from any unbuffered cursor (i.e., one that uses mysql_use_result() instead of mysql_store_result() - with MySQLdb, that means SSCursor and SSDictCursor) before you can execute another statement over the same connection.
In your situation, the most direct solution would be to open a second connection to use while iterating over the result set of the unbuffered query. (It wouldn't work to simply get a buffered cursor from the same connection; you'd still have to advance past the unbuffered result set before using the buffered cursor.)
If your workflow is something like "loop through a big result set, executing N little queries for each row," consider looking into MySQL's stored procedures as an alternative to nesting cursors from different connections. You can still use MySQLdb to call the procedure and get the results, though you'll definitely want to read the documentation of MySQLdb's callproc() method since it doesn't conform to Python's database API specs when retrieving procedure outputs.
A second alternative is to stick to buffered cursors, but split up your query into batches. That's what I ended up doing for a project last year where I needed to loop through a set of millions of rows, parse some of the data with an in-house module, and perform some INSERT and UPDATE queries after processing each row. The general idea looks something like this:
QUERY = r"SELECT id, name FROM `table` WHERE id BETWEEN %s and %s;"
BATCH_SIZE = 5000
i = 0
while True:
cursor.execute(QUERY, (i + 1, i + BATCH_SIZE))
result = cursor.fetchall()
# If there's no possibility of a gap as large as BATCH_SIZE in your table ids,
# you can test to break out of the loop like this (otherwise, adjust accordingly):
if not result:
break
for row in result:
doSomething()
i += BATCH_SIZE
One other thing I would note about your example code is that you can iterate directly over a cursor in MySQLdb instead of calling fetchone() explicitly over xrange(cursor.rowcount). This is especially important when using an unbuffered cursor, because the rowcount attribute is undefined and will give a very unexpected result (see: Python MysqlDB using cursor.rowcount with SSDictCursor returning wrong count).
I've got 32 SQLite (3.7.9) databases with 3 tables each that I'm trying to merge together using the idiom that I've found elsewhere (each db has the same schema):
attach db1.sqlite3 as toMerge;
insert into tbl1 select * from toMerge.tbl1;
insert into tbl2 select * from toMerge.tbl2;
insert into tbl3 select * from toMerge.tbl3;
detach toMerge;
and rinse-repeating for the entire set of databases. I do this using python and the sqlite3 module:
for fn in filelist:
completedb = sqlite3.connect("complete.sqlite3")
c = completedb.cursor()
c.execute("pragma synchronous = off;")
c.execute("pragma journal_mode=off;")
print("Attempting to merge " + fn + ".")
query = "attach '" + fn + "' as toMerge;"
c.execute(query)
try:
c.execute("insert into tbl1 select * from toMerge.tbl1;")
c.execute("insert into tbl2 select * from toMerge.tbl2;")
c.execute("insert into tbl3 select * from toMerge.tbl3;")
c.execute("detach toMerge;")
completedb.commit()
except sqlite3.Error as err:
print "Error! ", type(err), " Error msg: ", err
raise
2 of the tables are fairly small, only 50K rows per db, while the third (tbl3) is larger, about 850 - 900K rows. Now, what happens is that the inserts progressively slow down until I get to about the fourth database when they grind to a near halt (on the order of a a megabyte or two in file size added every 1-3 minutes to the combined database). In case it was python, I've even tried dumping out the tables as INSERTs (.insert; .out foo; sqlite3 complete.db < foo is the skeleton, found here) and combining them in a bash script using the sqlite3 CLI to do the work directly, but I get exactly the same problem.
The table setup of tbl3 isn't too demanding - a text field containing a UUID, two integers, and four real values. My worry is that it's the number of rows, because I ran into exactly the same trouble at exactly the same spot (about four databases in) when the individual databases were an order of magnitude larger in terms of file size with the same number of rows (I trimmed the contents of tbl3 significantly by storing summary stats instead of raw data). Or maybe it's the way I'm performing the operation? Can anyone shed some light on this problem that I'm having before I throw something out the window?
Try adding or removing indexes/primary key for the larger table.
You didn't mention the OS you were using or the db file sizes. Windows can have issues with files that are bigger than 2Gb depending on what version.
In any case, since this is a glorified batch script why not get rid of the for loop, get the filename from sys.argv, and then just run it once for each merge db. That way you will never have to deal with memory issues from doing too much in one process.
Mind you, if you end the loop with the following that will likely also fix things.
c.close()
completedb.close()
You say that the same thing occurs when you follow this process using the CLI and quitting after every db. I assume that you mean the Python CLI, and quitting means that you exit and restart Python. If that is true, and it still develops a problem every 4th database, then something is wrong with your SQLITE shared library. It shouldn't be keeping state like that.
If I were in your shoes, I would stop using attach and just open multiple connections in Python, then move the data in batches of about 1000 records per commit. It would be slower than your technique because all the data moves in and out of Python objects, but I think it would also be more reliable. Open the complete db, then loop around opening a second db, copying, then closing the second db. For the copying, I would use OFFSET and LIMIT on the SELECT statements to process batches of 100 records, then commit, then repeat.
In fact, I would also count the completedb records, and the second db records before copying, then after copying count the completedb records to ensure that I had copied the expected amount. Also, you would be keeping track of the value of the next OFFSET and I would write that to a text file right after committing, so that I could interrupt and restart the process at any time and it would carry on where it left off.