Transaction roll back - python

I have a big list made up of 53,000,000 smaller lists. I want to submit each of these smaller lists as a row to a db in batches of 1,000,000: each time the script connects to the db, it submits 1,000,000 rows, disconnects, and then connects again to submit the next 1,000,000 rows.
My problem is that if an error happens in the middle, for example after submitting 50,000,000 rows, I need to delete all the rows in the db and try submitting everything from the beginning.
I was thinking I could use rollback() to remove all 50,000,000 rows that have been added so far, but since I am using a loop, I do not know how to roll back all 50,000,000 rows, which were submitted in batches.
Does anyone have a suggestion?
Here is my script:
"results" is the list with 53,000,000 smaller lists as elements.
batch = []
counter = 0
BATCH_SIZE = 1000000
cursor_count = 0

def prepare_names(names):
    return [w.replace("'", '') for w in names]

for i in range(len(results)):
    if counter < BATCH_SIZE:
        batch.append(prepare_names([results[i][0], results[i][1], results[i][2]]))  # batch => [[ACC1234.0, 'Some full taxa name'], ...]
        counter += 1
    else:
        batch.append(prepare_names([results[i][0], results[i][1], results[i][2]]))
        values = (", ").join([f"('{d[0]}', '{d[1]}', '{d[2]}')" for d in batch])
        sql = f"INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) VALUES {values}"
        try:
            cursor.execute(sql)
            db.commit()
        except Exception as exception:
            print(exception)
            print(f"Problem with query: {sql}")
        print(cursor.rowcount, "Records Inserted")
        cursor_count += cursor.rowcount
        counter = 0
        batch = []
else:
    if batch:
        values = (", ").join([f"('{d[0]}', '{d[1]}', '{d[2]}')" for d in batch])
        sql = f"INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) VALUES {values}"
        try:
            cursor.execute(sql)
            db.commit()
        except Exception as exception:
            print(exception)
            print(f"Problem with query: {sql}")
        print(cursor.rowcount, "Records Inserted")
        cursor_count += cursor.rowcount
print("Total Number Of %s Rows Has Been Added." %(cursor_count))
db.close()

I would use two flags to record whether something was inserted and whether anything went wrong, and then use those flags to decide whether to commit or roll back, for example:
nothing_wrong_happened = True
something_was_inserted = False
for i in range(len(results)):
    # Your code that generates the query
    try:
        cursor.execute(sql)
        something_was_inserted = True  # <-- you inserted something
    except Exception as exception:
        nothing_wrong_happened = False  # <-- Something bad happened
        print(exception)
        print(f"Problem with query: {sql}")
    # the rest of your code
else:
    # Your code that generates the query
    try:
        cursor.execute(sql)
        something_was_inserted = True  # <-- you inserted something
    except Exception as exception:
        nothing_wrong_happened = False  # <-- Something bad happened
        print(exception)
        print(f"Problem with query: {sql}")
    # the rest of your code
# The loop is now over
if something_was_inserted:
    if nothing_wrong_happened:
        db.commit()  # commit everything
    else:
        db.rollback()  # rollback everything
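One way to make the commit-once idea concrete is to keep every batch inside a single transaction and stop at the first error. The sketch below is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the asker's unspecified database, and `insert_all`, `make_db`, `count_rows` and the small `batch_size` are made-up names and values. It also uses placeholders instead of building the VALUES string by hand.

```python
import sqlite3


def insert_all(db, rows, batch_size=1000):
    """Insert rows in batches inside ONE transaction: all rows or none.

    `insert_all` and `batch_size` are illustrative, not from the question.
    """
    cursor = db.cursor()
    try:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            # Placeholders let the driver handle quoting of the values.
            cursor.executemany(
                "INSERT INTO accession_taxonomy"
                "(accession_number, taxon_id, taxonomy) VALUES (?, ?, ?)",
                batch,
            )
        db.commit()  # the only commit: reached only if every batch succeeded
        return True
    except sqlite3.Error as exc:
        db.rollback()  # undoes every uncommitted batch from this call
        print(f"Insert failed, rolled back: {exc}")
        return False


def make_db():
    # Assumption: an in-memory SQLite table stands in for the real one.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE accession_taxonomy("
        "accession_number TEXT NOT NULL, taxon_id TEXT, taxonomy TEXT)"
    )
    return db


def count_rows(db):
    return db.execute("SELECT COUNT(*) FROM accession_taxonomy").fetchone()[0]
```

Whether a single transaction holding tens of millions of rows is practical depends on the database server, so this is worth trying on a smaller load first.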

There is no rollback after a commit.
Consider this:
1st attempt, 1M rows: committed
2nd attempt, 1M rows: committed
3rd attempt, 1M rows: error
You can only roll back the 3rd attempt. The 1st and 2nd are done.
Workaround
Modify your accession_taxonomy table and add a field called something like insertHash. Each run of your batch process uses one unique value for this field (today's date, say), and if any of the insert steps fails you can then do
Delete T from accession_taxonomy T Where T.insertHash ='TheValueUSet'
So essentially it becomes like this:
1st attempt, 1M rows: committed
2nd attempt, 1M rows: committed
3rd attempt, 1M rows: error
Delete AllRows where insertHash = 'TheValueUSet'
Having said that, are you sure you want to send 1M rows in one statement? Have you checked whether your server can accept a packet that large?
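The insertHash workaround could look roughly like the following. This is a sketch under assumptions, not a tested recipe for the asker's server: sqlite3 stands in for the real database, a UUID is used as the per-run value instead of today's date, and `load_with_run_tag` and `make_tagged_db` are hypothetical helper names.

```python
import sqlite3
import uuid


def load_with_run_tag(db, rows, batch_size=1000):
    """Commit per batch, tagging rows so a failed run can be deleted.

    Returns the run tag on success, or None after cleaning up a failed run.
    """
    run_tag = uuid.uuid4().hex  # unique per run; plays the role of insertHash
    cursor = db.cursor()
    try:
        for start in range(0, len(rows), batch_size):
            batch = [(*row, run_tag) for row in rows[start:start + batch_size]]
            cursor.executemany(
                "INSERT INTO accession_taxonomy"
                "(accession_number, taxon_id, taxonomy, insertHash)"
                " VALUES (?, ?, ?, ?)",
                batch,
            )
            db.commit()  # this batch is now permanent
    except sqlite3.Error as exc:
        db.rollback()  # drops only the failed, uncommitted batch
        # Remove the batches this run had already committed.
        cursor.execute(
            "DELETE FROM accession_taxonomy WHERE insertHash = ?", (run_tag,)
        )
        db.commit()
        print(f"Run {run_tag} failed and was cleaned up: {exc}")
        return None
    return run_tag


def make_tagged_db():
    # Assumption: an in-memory SQLite table with the extra insertHash column.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE accession_taxonomy("
        "accession_number TEXT NOT NULL, taxon_id TEXT, taxonomy TEXT,"
        " insertHash TEXT)"
    )
    return db
```

Because the delete is keyed on the run's own tag, rows loaded by earlier successful runs are left alone.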

Related

How can I fix for loop to other things

Currently I have a for loop that retrieves data from the database for every row.
I need to run a while loop, but it does not run, because the for loop finishes as soon as it has retrieved the database data. As a result, the rest of my while True loop, which should wait for a user response, never gets going.
c.execute("SELECT * FROM maincores WHERE set_status = 1")
rows = c.fetchall()
for v in rows:
    ...  # skip
while True:
    ...  # skip
I have tried using a global variable to store the database data and then returning to the loop, but every attempt failed.
How can I get sqlite3 database information without using a for loop?
I'm not 100% sure what the problem is, but I think you might want to use a generator so that your loop controls how fast rows are taken in. You could write a function like:
def getDBdata():
    c.execute("SELECT * FROM maincores WHERE set_status = 1")
    rows = c.fetchall()
    for v in rows:
        yield v  # just returns one result at a time ...

data = getDBdata()
row = next(data, None)  # first row, or None if the query returned nothing
while row is not None:
    # do something with row
    if ready_for_next_row:  # placeholder for your own condition
        row = next(data, None)  # get the next row of data
    else:
        break
Now you are controlling the data flow from your DB, so the while loop is no longer driven by how quickly the rows arrive.
My apologies if I'm not answering the question you're asking, but I hope this helps.
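For a version that runs end to end, here is a minimal sketch of the generator approach. It assumes an in-memory SQLite table with made-up columns in place of the asker's maincores table, and `iter_rows` and `demo` are illustrative names.

```python
import sqlite3


def iter_rows(cursor, query, params=()):
    """Yield rows one at a time; `iter_rows` is an illustrative helper name."""
    cursor.execute(query, params)
    for row in cursor:  # a sqlite3 cursor is itself iterable
        yield row


def demo():
    # Assumption: a tiny in-memory table standing in for `maincores`.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE maincores(name TEXT, set_status INTEGER)")
    db.executemany(
        "INSERT INTO maincores VALUES (?, ?)",
        [("a", 1), ("b", 0), ("c", 1)],
    )
    rows = iter_rows(
        db.cursor(),
        "SELECT name FROM maincores WHERE set_status = 1 ORDER BY name",
    )
    handled = []
    while True:
        row = next(rows, None)  # pull one row only when the loop wants it
        if row is None:
            break  # no more rows
        handled.append(row[0])  # stand-in for waiting on a user response
    db.close()
    return handled
```

The while loop stays in charge: it asks for the next row only when it is ready, and `next(rows, None)` signals the end without raising StopIteration.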

SQLite3 How to Select first 100 rows from database, then the next 100

Currently I have a database filled with thousands of rows.
I want to SELECT the first 100 rows, and then select the next 100, then the next 100 and so on...
So far I have:
c.execute('SELECT words FROM testWords')
data = c.fetchmany(100)
This allows me to get the first 100 rows, however, I can't find the syntax for selecting the next 100 rows after that, using another SELECT statement.
I've seen it is possible with other coding languages, but haven't found a solution with Python's SQLite3.
When you are using cursor.fetchmany() you don't have to issue another SELECT statement. The cursor is keeping track of where you are in the series of results, and all you need to do is call c.fetchmany(100) again until that produces an empty result:
c.execute('SELECT words FROM testWords')
while True:
    batch = c.fetchmany(100)
    if not batch:
        break
    # each batch contains up to 100 rows
or using the iter() function (which can be used to repeatedly call a function until a sentinel result is reached):
c.execute('SELECT words FROM testWords')
for batch in iter(lambda: c.fetchmany(100), []):
    ...  # each batch contains up to 100 rows
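As a quick check of how fetchmany() walks through a result set, the following sketch fills an in-memory table with 250 made-up rows and records the size of each batch. The table contents and the `batch_sizes` helper are assumptions made for the demonstration.

```python
import sqlite3


def batch_sizes(batch=100, total=250):
    """Return the length of each fetchmany() batch for `total` rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE testWords(words TEXT)")
    db.executemany(
        "INSERT INTO testWords VALUES (?)",
        [(f"word{i}",) for i in range(total)],
    )
    c = db.cursor()
    c.execute("SELECT words FROM testWords")
    # iter(callable, sentinel) keeps calling fetchmany until it returns [].
    return [len(rows) for rows in iter(lambda: c.fetchmany(batch), [])]
```

With 250 rows and a batch size of 100, the last batch is simply shorter than the others.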
If you can't keep hold of the cursor (say, because you are serving web requests), then using cursor.fetchmany() is the wrong interface. You'll instead have to tell the SELECT statement to return only a selected window of rows, using the LIMIT syntax. LIMIT has an optional OFFSET keyword, together these two keywords specify at what row to start and how many rows to return.
Note that you want to make sure that your SELECT statement is ordered so you get a stable result set you can then slice into batches.
batchsize = 1000
offset = 0
while True:
    c.execute(
        'SELECT words FROM testWords ORDER BY somecriteria LIMIT ? OFFSET ?',
        (batchsize, offset))
    batch = list(c)
    offset += batchsize
    if not batch:
        break
Pass the offset value to a next call to your code if you need to send these batches elsewhere and then later on resume.
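To illustrate resuming from a stored offset, here is a small sketch in which each page is fetched by an independent call, as a web request handler might do. The `id` ordering column, the table contents and the helper names `fetch_page` and `demo_pages` are assumptions; the original only says to order by some stable criterion.

```python
import sqlite3


def fetch_page(db, batchsize, offset):
    """Return one window of rows; each call is independent of the last."""
    cur = db.execute(
        "SELECT words FROM testWords ORDER BY id LIMIT ? OFFSET ?",
        (batchsize, offset),
    )
    return [row[0] for row in cur]


def demo_pages():
    db = sqlite3.connect(":memory:")
    # Assumption: `id` is the stable ordering column.
    db.execute("CREATE TABLE testWords(id INTEGER PRIMARY KEY, words TEXT)")
    db.executemany(
        "INSERT INTO testWords(words) VALUES (?)",
        [(f"word{i}",) for i in range(250)],
    )
    sizes = []
    offset = 0
    while True:
        page = fetch_page(db, 100, offset)
        if not page:
            break
        sizes.append(len(page))
        offset += len(page)  # the value to hand to the next call
    return sizes
```

Only the offset has to survive between calls, so it can be passed along in a URL parameter or stored in a session.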
SQLite is not specific to Python. It is a standalone database; Python just supplies an interface to it (the sqlite3 module).
Like any normal database, SQLite supports standard SQL. In SQL, you can use LIMIT and OFFSET to determine the start and end of your query's window. Note that if you do this, you should really use an explicit ORDER BY clause, to ensure that your results are consistently ordered between queries.
c.execute('SELECT words FROM testWords ORDER BY ID LIMIT 100')
...
c.execute('SELECT words FROM testWords ORDER BY ID LIMIT 100 OFFSET 100')
You can create an iterator and call it multiple times:
def ResultIter(cursor, arraysize=100):
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result
Or simply do this to return the first 5 rows:
num_rows = 5
cursor = dbconn.execute("SELECT words FROM testWords")
for row in cursor.fetchmany(num_rows):
    print("Words= " + str(row[0]) + "\n")

cx_Oracle query returns zero rows

Why does the code below not work? It returns zero rows even though I have many rows matching the search criteria.
A simple query of the form select * from Table_1 works fine and returns a positive number of rows.
import cx_Oracle
def function_A (data):
    connection = cx_Oracle.connect('omitted details here')
    for index in range(len(data)):
        # connection is already open
        cursor = connection.cursor()
        query = "select * from Table_1 where column_1=:column1 and column_2=:column2 and column_3=:column3 and column_4=:column4"
        bindVars={'column1':data[index][3], 'column2':data[index][4], 'column4':data[index][5], 'column5':data[index][6]}
        cursor.execute(query, bindVars)
        cursor.arraysize = 256
        rowCount = 0
        resultSet = cursor.fetchall()
        if (resultSet != None):
            logger.debug("Obtained a resultSet with length = %s", len(resultSet))
            for index in range(len(resultSet)):
                logger.debug("Fetched one row from cursor, incrementing counter !!")
                rowCount = rowCount + 1
                logger.debug("Fetched one row from cursor, incremented counter !!")
        logger.debug("Successfully executed the select statement for table Table_1; that returned %s rows !!", rowCount)
        logger.debug("Successfully executed the select statement for table Table_1; that returned %s rows !!", cursor.rowcount)
Please ignore minor formatting issues; the code runs, it just does not give me a positive number of rows.
Code is being run on IBM AIX with python2.6 and a compatible version of cx_Oracle.
cx_Oracle's cursor object has a read-only rowcount attribute. It reports how many rows have been fetched so far with the fetch* methods.
Say the query yields 5 rows; then the interaction looks like this:
execute: rowcount = 0
fetchone: rowcount = 1
fetchone: rowcount = 2
fetchall: rowcount = 5
That way you do not need to track it manually. Your query issues will have to be resolved first, of course :)
Your query returns 0 rows because there are 0 rows that match your query. Either remove a predicate from your WHERE clause or change the value you pass into one.
It's worth noting that you're not binding anything to column3 in your bindVars variable. I'm also not entirely certain why you're iterating to count rows: cursor.rowcount, as you have it, already gives you the number of rows that have been fetched by the cursor.
Generally, if you think a SELECT statement is not returning the correct result, take it out of the code and run it directly against the database. Bind all variables first so you can see exactly what you're actually running.
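A mismatch between placeholders and bind names, like the missing column3 above, can be caught before the query ever reaches the database. The helper below is a hypothetical sketch and not part of cx_Oracle: it scans the query text for :name placeholders with a simple regular expression and compares them with the dictionary keys. The bind values are dummies.

```python
import re


def check_binds(query, bind_vars):
    """Compare :name placeholders in `query` with the keys of `bind_vars`.

    Returns (missing, unused): placeholders with no value, and values with
    no placeholder. `check_binds` is a hypothetical helper.
    """
    # Naive scan: it would also match ":name" inside a string literal.
    placeholders = set(re.findall(r":(\w+)", query))
    keys = set(bind_vars)
    return sorted(placeholders - keys), sorted(keys - placeholders)


query = (
    "select * from Table_1 where column_1=:column1 and column_2=:column2 "
    "and column_3=:column3 and column_4=:column4"
)
# Same keys as the question's bindVars, with dummy values.
bind_vars = {"column1": "a", "column2": "b", "column4": "c", "column5": "d"}
missing, unused = check_binds(query, bind_vars)
print(missing, unused)  # prints ['column3'] ['column5']
```

Run against the question's query, it reports that column3 has no value and that column5 is bound but never used.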
I was banging my head against the monitor on this one... You have to do something like the code below to check, because the cursor's value changes once you operate on it:
result_set = DB_connector.execute(sql)
result_list = result_set.fetchall()  # assign the returned rows to a list
if result_set.rowcount == 0:
    print('DB check failed, no row returned')
    sql_result_set = None
else:
    sql_result_set = []  # must exist before append() is called
    for row in result_list:  # use this instead of result_set
        print('row fetched: ' + row[0])
        sql_result_set.append(row[0])
    print('DB test Passed')

Python MySQLdb iterate through table

I have a MySQL db and I need to iterate through a table and perform an action whenever a WHERE clause is met. Then, once it reaches the end of the table, it should return to the top and start over.
Currently I have
cursor = database.cursor()
cursor.execute("SELECT user_id FROM round WHERE state == -1 AND state = 2")
round_id = cursor.fetchone()
if round != 5
    ...do stuff
in a loop, but this obviously only keeps looping over the first entry. I guess you need to use a for ... in loop to read through the table, but I'm not sure exactly how to do this using MySQLdb?
Once you have results in the cursor, you can iterate over it directly.
cursor = database.cursor()
cursor.execute("SELECT user_id FROM round WHERE state = -1 OR state = 2")
for round in cursor:
    if round[0] != 5:
        ...do stuff
This will set the cursor at the beginning of the result set and tell you how many rows it got back. (I went back and forth on this one, but this is the most authoritative documentation I have found: the older Python MySQLdb library returned the row count from execute, but the Python Database API Specification v2.0 does not, so this should be the most compatible.)
cursor.execute("SELECT user_id FROM round WHERE state = -1 OR state = 2")
numrows = cursor.rowcount
This tells you how many rows you got in return.
for x in xrange(0, numrows):
    row = cursor.fetchone()
    print row[0]  # the query selects a single column, user_id
This iterates over each row (x is only a counter and is not otherwise used).

Assigning a 'for loop' to a variable in a python program [closed]

I'm writing a program at the moment that interacts with a MySQL database and I'm having a problem. As you can see, I've written a query that looks in the products table for products that correspond to the barcode the user has entered.
If the barcode entered by the user is found in the products table, I want to increase the 'amount' field by 1 in the stocks table, where the product that corresponds to the barcode is the same as the product in the stocks table.
As you can see, I've tried to assign a variable to the for loop to get it to work that way, but it's not working. Does anyone have any idea how to do it?
import MySQLdb
def look_up_product():
    db = MySQLdb.connect(host='localhost', user = 'root', passwd='$$', db='fillmyfridge')
    cursor = db.cursor (MySQLdb.cursors.DictCursor)
    user_input=raw_input('please enter the product barcode that you wish to checkin to the fridge: \n')
    if cursor.execute("""select * from products where product = %s""", (user_input)):
        db.commit()
        result_set = cursor.fetchall ()
        #i want here to assign a variable to this for loop and the line below = for product in result_set:
        print "%s" % (row["product"])
        cursor.execute('update stocks set amount = amount + 1 where product = %s', (#here i want the result of the for loop))
        db.commit()
    else:
        print 'no not in products table'
thanks a million.
The answer depends on what you mean by "assign a variable to a for loop." This wording is confusing because a for loop is a tool for controlling the flow of execution -- it's not normally thought of as having a value. But I think I know what you mean. Every time the loop runs, it will execute print "%s" % (row["product"]). I'm guessing that you want to store all of the strings that this makes as the loop runs. I'm also going to guess you meant row[product] and not row["product"], because the latter will be the same for the whole loop. Then you can do this:
mylist = []
for product in result_set:
    mylist.append("%s" % (row[product],))
Notice that the % operation works even though you're not printing the string anymore; this is a surprise for people coming from C. You can also use a Python list comprehension to make this even more succinct:
mylist = ["%s" % (row[product],) for product in result_set]
Are you expecting a single row as a result? If so, try this:
row = cursor.fetchone()
print row["product"]
cursor.execute('update stocks set amount = amount + 1 where product = %s', (row["product"],))
I'm not sure how you get a row id from the value fetched from the products table. I'd recommend explicitly specifying the needed columns rather than using the select * from idiom.
I introduced a helper function for the id retrieval to make the code more readable:
def getAnIdFromValue(someValueTuple):
    '''This function retrieves some table row identifier from a row tuple'''
    return someValueTuple[0]
I'd try the following function body if multiple rows are expected:
db = MySQLdb.connect(...)
cursor = db.cursor()
ids = []
cursor.execute("""select * from products where product = %s""", (user_input,))
for value in cursor.fetchall():
    # value is a tuple. len(value) == number of columns in products table
    ids.append(getAnIdFromValue(value))
if len(ids):
    # executemany expects one parameter tuple per statement
    cursor.executemany("update stocks set amount = amount + 1 where product = %s", [(i,) for i in ids])
    db.commit()
else:
    print 'no not in products table'
I think you need to indent the "update stocks..." line so that it's inside the for loop.
There there. I also fixed a comma you were missing on the first cursor.execute line.
import MySQLdb
def look_up_product():
    db = MySQLdb.connect(host='localhost', user = 'root',
                         passwd='$$', db='fillmyfridge')
    cursor = db.cursor (MySQLdb.cursors.DictCursor)
    user_input=raw_input('please enter the product barcode '
                         'that you wish to checkin to the fridge: \n')
    cursor.execute("""select * from products where product = %s""",
                   (user_input,))
    # fetch all rows first: the update below reuses this cursor
    for row in cursor.fetchall():
        print row["product"]
        cursor.execute('update stocks set amount = amount + 1'
                       ' where product = %s', (row["product"],))
    db.commit()
Of course, you could always use sqlalchemy instead:
import sqlalchemy as sa
import sqlalchemy.orm
# Prepare high-level objects:
class Product(object): pass
engine = sa.create_engine('mysql://root:$$@localhost/fillmyfridge')
session = sa.orm.create_session(bind=engine)
product_table = sa.Table('products', sa.MetaData(), autoload=True)
sqlalchemy.orm.mapper(Product, product_table)
def look_up_product():
    user_input=raw_input('please enter the product barcode '
                         'that you wish to checkin to the fridge: \n')
    for prod in session.query(Product).filter(Product.product == user_input):
        print prod.product
        # Here's the nicety: to update just change the object directly:
        prod.amount = prod.amount + 1
    session.flush()
    session.commit()
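The look-up-then-update pattern discussed in these answers can be tried without a MySQL server. The following is a minimal sketch under assumptions: sqlite3 stands in for MySQLdb (so the placeholder is ? rather than %s), the table layouts are guessed from the question, and `check_in` and `make_fridge` are made-up names.

```python
import sqlite3


def check_in(db, barcode):
    """Add 1 to stocks.amount for every product matching `barcode`.

    Returns the number of stock rows updated (0 if the barcode is unknown).
    """
    cursor = db.cursor()
    cursor.execute("SELECT product FROM products WHERE product = ?", (barcode,))
    # Read all matches before the cursor is reused for the updates.
    products = [row[0] for row in cursor.fetchall()]
    updated = 0
    for product in products:
        cursor.execute(
            "UPDATE stocks SET amount = amount + 1 WHERE product = ?",
            (product,),
        )
        updated += cursor.rowcount
    db.commit()
    return updated


def make_fridge():
    # Assumption: minimal guesses at the products and stocks tables.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products(product TEXT)")
    db.execute("CREATE TABLE stocks(product TEXT, amount INTEGER)")
    db.execute("INSERT INTO products VALUES ('12345')")
    db.execute("INSERT INTO stocks VALUES ('12345', 2)")
    db.commit()
    return db
```

Returning the update count gives the caller a simple way to print the "not in products table" message when nothing matched.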
