How to clear the buffer of an SSCursor using pymysql? - python

I tried the following code (from the pymysql test suite):
https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/tests/test_cursor.py
def test_cleanup_rows_unbuffered(self):
    conn = self.test_connection
    cursor = conn.cursor(pymysql.cursors.SSCursor)
    cursor.execute("select * from test as t1, test as t2")
    for counter, row in enumerate(cursor):
        print(row)
        if counter > 10:
            break
    del cursor
    self.safe_gc_collect()
    print('The second cursor.')
    c2 = conn.cursor()
    c2.execute("select 1")
    self.assertEqual(c2.fetchone(), (1,))
    self.assertIsNone(c2.fetchone())
But the code keeps running and never stops after "print(row)". Or rather, it never moves on to the second cursor.
How can I solve this problem?

Hello. You have to run the fetchall() command. You can enumerate the result to perform the required operations later. So your program should look something like this:
for counter, row in enumerate(cursor.fetchall()):
    print(row)
    if counter > 10:
        break
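If you want to keep the unbuffered behaviour of SSCursor instead, another option (a sketch, not from the answer above) is to close the cursor before reusing the connection; pymysql drains the rows still pending on the wire when an unbuffered cursor is closed. The connection details below are placeholders:

import pymysql

conn = pymysql.connect(host='localhost', user='user', password='pass', db='test')  # placeholder credentials
cursor = conn.cursor(pymysql.cursors.SSCursor)
cursor.execute("select * from test as t1, test as t2")
for counter, row in enumerate(cursor):
    print(row)
    if counter > 10:
        break
cursor.close()  # discards the unread rows so the connection is usable again

c2 = conn.cursor()
c2.execute("select 1")
print(c2.fetchone())  # (1,)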

Related

How can I fix for loop to other things

Currently I have a for loop which retrieves data from the database for every row.
I also need to run a while loop, but it does not work because the for loop finishes once it has retrieved the database data. As a result, the rest of my while True loop, which awaits a user response, never runs.
c.execute("SELECT * FROM maincores WHERE set_status = 1")
rows = c.fetchall()
for v in rows:
# skip
while True:
#skip
I have tried using a global variable to store the database data and then returning to the loop, all resulting in failure.
How can I get sqlite3 database information without using a for loop?
I'm not 100% sure about the problem, but I think you might want to use a generator so that you throttle your intake of information with your loop. So, you could write a function like:
def getDBdata():
    c.execute("SELECT * FROM maincores WHERE set_status = 1")
    rows = c.fetchall()
    for v in rows:
        yield v  # just returns one result at a time ...

x = True
data = getDBdata()
while x is True:
    # do something with data
    if <condition>:       # placeholder for your own check
        next(data)        # get the next row of data
    else:
        x = False
So now you are controlling the data flow from your DB so that you don't exhaust your while loop as a condition of the data flow.
My apologies if I'm not answering the question you're asking, but I hope this helps.
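For completeness, a slightly fuller sketch of the same idea that actually uses each yielded row and stops cleanly when the generator is exhausted (the table and column names come from the question; the per-row work is a placeholder):

def getDBdata():
    c.execute("SELECT * FROM maincores WHERE set_status = 1")
    for v in c.fetchall():
        yield v  # hand back one row at a time

data = getDBdata()
while True:
    try:
        row = next(data)  # pull the next row only when you are ready for it
    except StopIteration:
        break             # no more rows; leave the loop
    print(row)            # placeholder for the real per-row work
    # ...the rest of the while-loop body (e.g. waiting for user input) goes here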

Transaction roll back

I have a big list which consists of 53,000,000 smaller lists as elements. I want to submit each of these smaller lists as a row to a db in batches with a batch size of 1,000,000, meaning that every time the script connects to the db, it submits 1,000,000 rows, then it disconnects from the db and connects again to submit another 1,000,000 rows.
Now my problem is that if an error happens in the middle, for example after submitting 50,000,000 rows, I need to delete all the rows in the db and try submitting everything from the beginning.
I was thinking maybe I can use rollback() to remove all 50,000,000 rows that have been added so far, but as long as I am using a loop I do not know how I can roll back all 50,000,000 rows that were submitted in batches.
Does anyone have a suggestion?
Here is my script ("results" is the list with 53,000,000 smaller lists as elements):
batch = []
counter = 0
BATCH_SIZE = 1000000
cursor_count = 0

def prepare_names(names):
    return [w.replace("'", '') for w in names]

for i in range(len(results)):
    if counter < BATCH_SIZE:
        batch.append(prepare_names([results[i][0], results[i][1], results[i][2]]))  # batch => [[ACC1234.0, 'Some full taxa name'], ...]
        counter += 1
    else:
        batch.append(prepare_names([results[i][0], results[i][1], results[i][2]]))
        values = (", ").join([f"('{d[0]}', '{d[1]}', '{d[2]}')" for d in batch])
        sql = f"INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) VALUES {values}"
        try:
            cursor.execute(sql)
            db.commit()
        except Exception as exception:
            print(exception)
            print(f"Problem with query: {sql}")
        print(cursor.rowcount, "Records Inserted")
        cursor_count += cursor.rowcount
        counter = 0
        batch = []
else:
    if batch:
        values = (", ").join([f"('{d[0]}', '{d[1]}', '{d[2]}')" for d in batch])
        sql = f"INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) VALUES {values}"
        try:
            cursor.execute(sql)
            db.commit()
        except Exception as exception:
            print(exception)
            print(f"Problem with query: {sql}")
        print(cursor.rowcount, "Records Inserted")
        cursor_count += cursor.rowcount

print("Total Number Of %s Rows Has Been Added." % (cursor_count))
db.close()
I would use some flags to make sure that
something was inserted
nothing wrong happened
And then use those flags to decide whether to commit or to roll back, such as:
nothing_wrong_happened = True
something_was_inserted = False

for i in range(len(results)):
    # Your code that generates the query
    try:
        cursor.execute(sql)
        something_was_inserted = True   # <-- you inserted something
    except Exception as exception:
        nothing_wrong_happened = False  # <-- Something bad happened
        print(exception)
        print(f"Problem with query: {sql}")
    # the rest of your code
else:
    # Your code that generates the query
    try:
        cursor.execute(sql)
        something_was_inserted = True   # <-- you inserted something
    except Exception as exception:
        nothing_wrong_happened = False  # <-- Something bad happened
        print(exception)
        print(f"Problem with query: {sql}")
    # the rest of your code

# The loop is now over
if something_was_inserted:
    if nothing_wrong_happened:
        db.commit()    # commit everything
    else:
        db.rollback()  # rollback everything
There is no rollback after commit.
Consider this:
1st Attempt 1M rows : committed
2nd Attempt 1M rows : committed
3rd Attempt 1m rows : error
You can only rollback the 3rd attempt. 1st and 2nd are done.
Workaround
Modify your accession_taxonomy table and add a field, something like insertHash. Your batch insert process will use a unique value for this field for this particular batch execution (let's say today's date), and if any of your insert steps fails you can then do
DELETE T FROM accession_taxonomy T WHERE T.insertHash = 'TheValueUSet'
so essentially it becomes like this:
1st Attempt 1M rows : committed
2nd Attempt 1M rows : committed
3rd Attempt 1m rows : error
Delete AllRows where insertHash = 'TheValueUSet'
Having said that, are you sure you want to send 1M rows in a single statement? Have you checked whether your server is capable of accepting a packet that large?
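A minimal sketch of that workaround, assuming a MySQL driver that uses %s placeholders (e.g. pymysql or MySQLdb), a connection db with cursor cursor, pre-chunked batches of (accession_number, taxon_id, taxonomy) tuples, and a timestamp used as the hypothetical insertHash value:

import datetime

# Assumes the column was added first, e.g.:
# ALTER TABLE accession_taxonomy ADD COLUMN insertHash VARCHAR(32)

# Hypothetical tag identifying every row written by this particular run.
insert_hash = datetime.datetime.now().strftime("%Y%m%d%H%M%S")

try:
    for batch in batches:  # `batches` = your pre-chunked lists of rows (assumed)
        cursor.executemany(
            "INSERT INTO accession_taxonomy "
            "(accession_number, taxon_id, taxonomy, insertHash) "
            "VALUES (%s, %s, %s, %s)",
            [(acc, taxon, taxonomy, insert_hash) for acc, taxon, taxonomy in batch],
        )
        db.commit()  # each batch is still committed on its own, as before
except Exception as exc:
    print(exc)
    # Undo every batch of this run, including the ones already committed.
    cursor.execute("DELETE FROM accession_taxonomy WHERE insertHash = %s", (insert_hash,))
    db.commit()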

Inserting billions of data to Sqlite via Python

I want to insert billions of values (exchange rates) into a sqlite db file. I want to use threading because it takes a lot of time, but the thread pool loop executes the same nth element multiple times. I have a print statement at the beginning of my method and it prints out multiple times instead of just once.
pool = ThreadPoolExecutor(max_workers=2500)

def gen_nums(i, cur):
    global x
    print('row number', x, ' has started')
    gen_numbers = list(mydata)
    sql_data = []
    for f in gen_numbers:
        sql_data.append((f, i, mydata[i]))
    cur.executemany('INSERT INTO numbers (rate, min, max) VALUES (?, ?, ?)', sql_data)
    print('row number', x, ' has finished')
    x += 1

with conn:
    cur = conn.cursor()
    for i in mydata:
        pool.submit(gen_nums, i, cur)
pool.shutdown(wait=True)
and the output is:
row number 1 has started
row number 1 has started
row number 1 has started
row number 1 has started
row number 1 has started
row number 1 has started
row number 1 has started
...
Divide your data into chunks on the fly using generator expressions, and make the inserts inside a single transaction.
Here is how your code might look.
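A minimal sketch of that idea, assuming single-threaded use of the connection, a chunk size of 100,000, and an iterable rates of (rate, min, max) tuples (the names rates and chunks are assumptions, not from the question):

import sqlite3
from itertools import islice

def chunks(iterable, size):
    # Yield successive lists of at most `size` items from `iterable`.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

conn = sqlite3.connect('rates.db')
cur = conn.cursor()
for chunk in chunks(rates, 100000):  # `rates` yields (rate, min, max) tuples
    cur.executemany('INSERT INTO numbers (rate, min, max) VALUES (?, ?, ?)', chunk)
conn.commit()  # a single commit keeps all the inserts in one transaction
conn.close()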
Also, SQLite has the ability to import CSV files.
SQLite can do tens of thousands of inserts per second; just make sure to do all of them in a single transaction by surrounding the inserts with BEGIN and COMMIT. (executemany() does this automatically.)
As always, don't optimize before you know speed will be a problem. Test the easiest solution first, and only optimize if the speed is unacceptable.

cx_Oracle query returns zero rows

Why does the code below not work? It returns zero rows even though I have many rows matching the search criteria.
A simple query of the form select * from Table_1 works fine and returns a positive number of rows.
import cx_Oracle

def function_A(data):
    connection = cx_Oracle.connect('omitted details here')
    for index in range(len(data)):
        # connection is already open
        cursor = connection.cursor()
        query = "select * from Table_1 where column_1=:column1 and column_2=:column2 and column_3=:column3 and column_4=:column4"
        bindVars = {'column1': data[index][3], 'column2': data[index][4], 'column4': data[index][5], 'column5': data[index][6]}
        cursor.execute(query, bindVars)
        cursor.arraysize = 256
        rowCount = 0
        resultSet = cursor.fetchall()
        if resultSet != None:
            logger.debug("Obtained a resultSet with length = %s", len(resultSet))
            for index in range(len(resultSet)):
                logger.debug("Fetched one row from cursor, incrementing counter !!")
                rowCount = rowCount + 1
                logger.debug("Fetched one row from cursor, incremented counter !!")
        logger.debug("Successfully executed the select statement for table Table_1; that returned %s rows !!", rowCount)
        logger.debug("Successfully executed the select statement for table Table_1; that returned %s rows !!", cursor.rowcount)
Please ignore minor formatting issues; the code runs, it just does not give me a positive number of rows.
Code is being run on IBM AIX with python2.6 and a compatible version of cx_Oracle.
cx_Oracle's cursor object has a read-only rowcount attribute. rowcount reports how many rows have been fetched so far by the fetch* methods.
Say the query yields 5 rows; then the interaction looks like this:
execute   -> rowcount = 0
fetchone  -> rowcount = 1
fetchone  -> rowcount = 2
fetchall  -> rowcount = 5
That way you do not need to track it manually. Your query issues will have to be resolved first, of course :)
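In code, that progression looks roughly like this (a sketch against an already-open cx_Oracle cursor; the table name is taken from the question):

cursor.execute("select * from Table_1")
print(cursor.rowcount)  # 0 - nothing has been fetched yet
cursor.fetchone()
print(cursor.rowcount)  # 1
cursor.fetchall()
print(cursor.rowcount)  # total number of rows in the result set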
Your query returns 0 rows because there are 0 rows that match your query. Either remove a predicate from your WHERE clause or change the value you pass into one.
It's worth noting that you're not binding anything to column3 in your bindVars variable. I'm also not entirely certain why you're iterating; cursor.rowcount, as you have it, gives you the number of rows that have been fetched by the cursor.
Generally, if you think a SELECT statement is not returning the correct result, take it out of the code and run it directly against the database. Bind all variables first so you can see exactly what you're actually running.
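For illustration, a bind dictionary whose keys match every placeholder in that query might look like this (the data indices 3-6 are assumptions; adjust them to your actual row layout):

query = ("select * from Table_1 "
         "where column_1 = :column1 and column_2 = :column2 "
         "and column_3 = :column3 and column_4 = :column4")
# Every :placeholder in the query needs a matching key here.
bindVars = {'column1': data[index][3],
            'column2': data[index][4],
            'column3': data[index][5],
            'column4': data[index][6]}
cursor.execute(query, bindVars)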
Banging my head against the monitor on this one... you have to do something like the below to check, because the cursor's value changes once you operate on it:
result_set = DB_connector.execute(sql)
result_list = result_set.fetchall()  # assign the returned rows to a list
sql_result_set = []
if result_set.rowcount == 0:
    print('DB check failed, no row returned')
    sql_result_set = None
else:
    for row in result_list:  # use this list instead of result_set
        print('row fetched: ' + row[0])
        sql_result_set.append(row[0])
    print('DB test Passed')

Python MySQLdb iterate through table

I have a MySQL db and I need to iterate through a table and perform an action once a WHERE clause is met. Then, once it reaches the end of the table, return to the top and start over.
Currently I have
cursor = database.cursor()
cursor.execute("SELECT user_id FROM round WHERE state == -1 AND state = 2")
round_id = cursor.fetchone()
if round_id != 5:
    # ...do stuff
in a loop, but this obviously only keeps looping over the first entry. I guess you need to use the for ... in construct to read through the table, but I'm not sure exactly how to do this using MySQLdb?
Once you have results in the cursor, you can iterate over it directly.
cursor = database.cursor()
cursor.execute("SELECT user_id FROM round WHERE state = -1 OR state = 2")
for round in cursor:
    if round[0] != 5:
        # ...do stuff
This will set the cursor at the beginning of the result set and tell you how many rows it got back. (I went back and forth on this one, but this is the most authoritative documentation I have found: older Python MySQLdb versions returned the row count from execute(), but the Python Database API Specification v2.0 does not, so this should be the most compatible.)
cursor.execute("SELECT user_id FROM round WHERE state = -1 OR state = 2")
numrows = cursor.rowcount
Will tell you how many rows you got in return.
for x in xrange(0, numrows):
    row = cursor.fetchone()
    print row[0]
Will iterate over each row (no need to enumerate x with range)
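If you also need to "return to the top and start over" once the table is exhausted, as the question asks, one simple approach (a sketch, not from either answer) is to wrap the query and the iteration in an outer loop and re-execute the query on every pass:

import time

while True:
    cursor = database.cursor()
    cursor.execute("SELECT user_id FROM round WHERE state = -1 OR state = 2")
    for (user_id,) in cursor:
        if user_id != 5:
            pass  # ...do stuff
    cursor.close()
    time.sleep(1)  # optional pause before scanning the table again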
