I queried two databases to get two relations. I iterate over those relations once to form maps, and then again to perform some calculations. However, when I attempt to iterate over the same relations a second time, I find that no iteration actually occurs. Here is the code:
dev_connect = dev_engine.connect()
prod_connect = prod_engine.connect()  # from a different database

Relation1 = dev_engine.execute(sqlquery1)
Relation2 = prod_engine.execute(sqlquery)

before_map = {}
after_map = {}
for row in Relation1:
    before_map[row['instrument_id']] = row
for row2 in Relation2:
    after_map[row2['instrument_id']] = row2

update_count = insert_count = delete_count = 0
change_list = []
count = 0
for prod_row in Relation2:
    count += 1
    result = list(prod_row)
    ...
    change_list.append(result)

count2 = 0
for before_row in Relation1:
    count2 += 1
    result = before_row
    ...

print count, count2  # prints 0
before_map and after_map are not empty, so Relation1 and Relation2 definitely have tuples in them. Yet count and count2 are 0, so the prod_row and before_row for loops never actually run. Why can't I iterate over Relation1 and Relation2 a second time?
When you call execute on a SQLAlchemy engine, you get back a ResultProxy, which is a facade over a DBAPI cursor for the rows your query returns.
Once you iterate over all the results of the ResultProxy, it automatically closes the underlying cursor, so you can't reuse the results by iterating over it again, as documented on the SQLAlchemy page:
The returned result is an instance of ResultProxy, which references a DBAPI cursor and provides a largely compatible interface with that of the DBAPI cursor. The DBAPI cursor will be closed by the ResultProxy when all of its result rows (if any) are exhausted.
You can solve your problem in a couple of ways:
Store the results in a list. Just do a list-comprehension against the rows returned:
Relation1 = dev_engine.execute(sqlquery1)
relation1_items = [r for r in Relation1]
# ...
# now you can iterate over relation1_items as much as you want
Do everything you need in one pass through each returned row set. I don't know whether this option is feasible for you, since I don't know whether the full extent of your calculations requires cross-referencing between your before_map and after_map objects. Since those maps already hold every row, you can also simply loop over them instead of the exhausted result sets, as sketched below.
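A minimal sketch of that idea, reusing the before_map and after_map from the question so the later loops read from the dicts rather than from the exhausted ResultProxy objects (the elided calculations are left as comments):

before_map = {row['instrument_id']: row for row in Relation1}
after_map = {row2['instrument_id']: row2 for row2 in Relation2}

change_list = []
for prod_row in after_map.values():     # instead of iterating Relation2 again
    result = list(prod_row)
    # ... calculations ...
    change_list.append(result)

for before_row in before_map.values():  # instead of iterating Relation1 again
    result = before_row
    # ... calculations ...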
Related
I'm trying to create a generator object for the list of records built from a MySQL database, so I'm passing the MySQL cursor object to the function as a parameter.
My issue is that if the if block containing yield records is commented out, the cust_records function works perfectly fine, but if I uncomment that line, the function stops working.
I'm not sure if this is simply not the way to yield a list object in Python 3.
My code so far:
def cust_records(result_set):
    block_id = None
    records = []
    i = 0
    for row in result_set:
        records.append((row['id'], row, smaller_ids))
    if records:
        yield records
The point of generators is lazy evaluation, so storing all the records in a list and yielding that list makes no sense at all. If you want to retain lazy evaluation (which is IMHO preferable, especially if you have to work on arbitrary datasets that might get huge), you want to yield each record, i.e.:
def cust_records(result_set):
    for row in result_set:
        yield (row['id'], row, smaller_ids)

# then
def example():
    cursor.execute(<your_sql_query_here>)
    for record in cust_records(cursor):
        print(record)
Otherwise (if you really want to consume as much memory as possible) just make cust_records a plain function:
def cust_records(result_set):
    records = []
    for row in result_set:
        records.append((row['id'], row, smaller_ids))
    return records
So I have data known as id_list coming into the function in this format: [(u'SGP-3630', 1202), (u'MTSCR-534', 1244)]. The format is two values paired together, and there could be one pair or a hundred pairs.
This is the function:
def ListParser(id_list):
    list_length = len(id_list)
    count = 0
    table = ""
    while count < list_length:
        jira = id_list[count][0]
        stash = id_list[count][1]
        count = count + 1
        table = table + RetrieveFromAPI(stash, jira)
    table = TableFormatter(table)
    table = TableColouriser(table)
    return table
This function goes through the list, extracts each pair, and puts it through a function called RetrieveFromAPI(), which fetches information from a URL.
Does anyone have an idea how to implement multithreading here? I've had a shot at splitting the pairs into their own lists and getting the pool to iterate through each list, but it hasn't quite worked.
def ListParser(id_list):
    pool = ThreadPool(4)
    list_length = len(id_list)
    count = 0
    table = ""
    jira_list = list()
    stash_list = list()
    while count < list_length:
        jira_list = jira_list.extend(id_list[count][0])
        print jira_list
        stash_list = stash_list.extend(id_list[count][1])
        print stash_list
        count = count + 1
    table = table + pool.map(RetrieveFromAPI, stash_list, jira_list)
    table = TableFormatter(table)
    table = TableColouriser(table)
    return table
The error I'm getting for this attempt is TypeError: 'int' object is not iterable
EDIT 2: Okay, so I've managed to get the first list of tuples split up into two different lists, but I'm unsure how to get multithreading working with it.
jira, stash = map(list, zip(*id_list))
You're working too hard! From help(multiprocessing.pool.ThreadPool):
map(self, func, iterable, chunksize=None)
    Apply `func` to each element in `iterable`, collecting the results
    in a list that is returned.
The second argument is an iterable of the arguments you want to pass to the worker threads. You have a list of lists and you want the first two items from the inner list for each call. id_list is already iterable, so we're close. A small function (in this case implemented as a lambda) bridges the gap.
I worked up a full mock solution just to make sure it works, so here it is. As an aside, you can benefit from a fairly large pool size, since the threads spend much of their time waiting on I/O.
from multiprocessing.pool import ThreadPool

def RetrieveFromAPI(stash, jira):
    # boring mock of the api
    return '{}-{}.'.format(stash, jira)

def TableFormatter(table):
    # mock
    return table

def TableColouriser(table):
    # mock
    return table

def ListParser(id_list):
    if id_list:
        pool = ThreadPool(min(12, len(id_list)))
        table = ''.join(pool.map(lambda item: RetrieveFromAPI(item[1], item[0]),
                                 id_list, chunksize=1))
        pool.close()
        pool.join()
    else:
        table = ''
    table = TableFormatter(table)
    table = TableColouriser(table)
    return table

id_list = [[0, 1, 'foo'], [2, 3, 'bar'], [4, 5, 'baz']]
print(ListParser(id_list))
I have a raw query, for example:
# Table posts = 420 rows
>>> cursor = connection.cursor()
>>> posts = Post.objects.raw('SELECT SQL_CALC_FOUND_ROWS posts.* FROM posts LIMIT 1,10')
>>> found_rows = cursor.execute("SELECT FOUND_ROWS()")
>>> print found_rows()
1
I want to know how to get the total number of rows for paging.
Post.objects.raw() does not execute the query — it only returns a RawQuerySet instance. The actual query will only be executed once you try to iterate over that RawQuerySet (e.g. by calling next(iter(posts)) in your code).
Since you're limiting your query to only 10 results, you might just pull all instances into a list:
posts = list(Post.objects.raw('SELECT SQL_CALC_FOUND_ROWS posts.* FROM posts LIMIT 1,10'))
This will make sure that your query has been executed for your next SELECT FOUND_ROWS() to return the actual count.
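A rough end-to-end sketch of that ordering (assuming both statements run on the same database connection, which MySQL's FOUND_ROWS() requires):

from django.db import connection

posts = list(Post.objects.raw(
    'SELECT SQL_CALC_FOUND_ROWS posts.* FROM posts LIMIT 1,10'))  # query executes here

cursor = connection.cursor()
cursor.execute("SELECT FOUND_ROWS()")
total_rows = cursor.fetchone()[0]  # total number of matching rows, for paging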
I am trying to get the number of rows returned from an sqlite3 database in Python, but it seems the feature isn't available.
Think of PHP's mysqli_num_rows() in MySQL.
I devised a workaround, but it is awkward. Assume a class executes SQL and gives me the results:
# Query execution returning a result
data = sql.sqlExec("select * from user")

# Run another query just to check the number of rows -- not a very good workaround
dataCopy = sql.sqlExec("select * from user")

# Cast dataCopy to a list and take its length. I did this because I noticed that
# as soon as I perform any action on the data, it becomes empty.
# This is not too good, as someone else could perform another transaction on the
# database in the meantime.
if len(list(dataCopy)):
    for m in data:
        print("Name = {}, Password = {}".format(m["username"], m["password"]))
else:
    print("Query returned nothing")
Is there a function or property that can do this without the hassle?
Normally, cursor.rowcount would give you the number of results of a query.
However, for SQLite, that property is often set to -1 due to the nature of how SQLite produces results. Short of running a COUNT() query first, you often won't know the number of results returned.
This is because SQLite produces rows as it finds them in the database, and won't itself know how many rows are produced until the end of the database is reached.
From the documentation of cursor.rowcount:
Although the Cursor class of the sqlite3 module implements this attribute, the database engine’s own support for the determination of “rows affected”/”rows selected” is quirky.
For executemany() statements, the number of modifications are summed up into rowcount.
As required by the Python DB API Spec, the rowcount attribute “is -1 in case no executeXX() has been performed on the cursor or the rowcount of the last operation is not determinable by the interface”. This includes SELECT statements because we cannot determine the number of rows a query produced until all rows were fetched.
Emphasis mine.
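A minimal sketch illustrating this behaviour with a throwaway in-memory database (purely illustrative):

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE user (username TEXT, password TEXT)")
cur.executemany("INSERT INTO user VALUES (?, ?)",
                [("alice", "a"), ("bob", "b")])
print(cur.rowcount)         # 2  -- modifications are summed for executemany()

cur.execute("SELECT * FROM user")
print(cur.rowcount)         # -1 -- not determinable for a SELECT
print(len(cur.fetchall()))  # 2  -- only known once all rows have been fetched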
For your specific query, you can add a sub-select to add a column:
data = sql.sqlExec("select (select count() from user) as count, * from user")
This is not all that efficient for large tables, however.
If all you need is one row, use cursor.fetchone() instead:
cursor.execute('SELECT * FROM user WHERE userid=?', (userid,))
row = cursor.fetchone()
if row is None:
    raise ValueError('No such user found')
result = "Name = {}, Password = {}".format(row["username"], row["password"])
import sqlite3
conn = sqlite3.connect(path/to/db)
cursor = conn.cursor()
cursor.execute("select * from user")
results = cursor.fetchall()
print len(results)
len(results) is just what you want
Use the following:
dataCopy = sql.sqlExec("select count(*) from user")
values = dataCopy.fetchone()
print values[0]
When you just want an estimate beforehand, then simply use COUNT():
n_estimate = cursor.execute("SELECT COUNT() FROM user").fetchone()[0]
To get the exact number before fetching, use a locked "Read transaction", during which the table won't be changed from outside, like this:
cursor.execute("BEGIN") # start transaction
n = cursor.execute("SELECT COUNT() FROM user").fetchone()[0]
# if n > big: be_prepared()
allrows=cursor.execute("SELECT * FROM user").fetchall()
cursor.connection.commit() # end transaction
assert n == len(allrows)
Note: A normal SELECT also locks, but only until it is itself completely fetched, the cursor closes, or commit() / END or other actions implicitly end the transaction.
I've found the SELECT statement with count() to be slow on a very large DB. Moreover, using fetchall() can be very memory-intensive.
Unless you explicitly design your database so that it does not have a rowid, you can always try a quick solution:
cur.execute("SELECT max(rowid) from Table")
n = cur.fetchone()[0]
This will tell you how many rows your table has, provided no rows have been deleted (after deletions, max(rowid) can be larger than the actual row count).
I did it like this:
cursor.execute("select count(*) from my_table")
results = cursor.fetchone()
print(results[0])
This code worked for me:
import sqlite3
con = sqlite3.connect(your_db_file)
cursor = con.cursor()
result = cursor.execute("select count(*) from your_table").fetchall()  # returns a list of tuples
num_of_rows = result[0][0]
A simple alternative approach here is to use fetchall to pull a column into a Python list, then count the length of the list. I don't know if this is Pythonic or especially efficient, but it seems to work:
rowlist = []
c.execute("SELECT {rowid} from {whichTable}".format(rowid="rowid",
                                                    whichTable=whichTable))
rowlist = c.fetchall()
rowlistcount = len(rowlist)
print(rowlistcount)
The following script works:
def say():
    global s                                    # make s global
    vt = sqlite3.connect('kur_kel.db')          # connect to the db file
    bilgi = vt.cursor()
    bilgi.execute('select count(*) from kuke')  # execute the sql command
    say_01 = bilgi.fetchone()                   # fetch one row from the executed query
    print(say_01[0])                            # print the tuple's first item
    s = say_01[0]                               # assign the query result to the variable
    bilgi.close()                               # close the cursor
    vt.close()                                  # close the db file
I have a simple, single-client setup for MongoDB and PyMongo 2.6.3. The goal is to iterate over each document in the collection collection and update (save) each document in the process. The approach I'm using looks roughly like:
cursor = collection.find({})
index = 0
count = cursor.count()
while index != count:
    doc = cursor[index]
    print 'updating doc ' + doc['name']
    # modify doc ..
    collection.save(doc)
    index += 1
cursor.close()
The problem is that save is apparently modifying the order of documents in the cursor. For example, if my collection is made of 3 documents (ids omitted for clarity):
{
"name": "one"
}
{
"name": "two"
}
{
"name": "three"
}
the above program outputs:
> updating doc one
> updating doc two
> updating doc two
If however, the line collection.save(doc) is removed, the output becomes:
> updating doc one
> updating doc two
> updating doc three
Why is this happening? What is the right way to safely iterate and update documents in a collection?
Found the answer in MongoDB documentation:
Because the cursor is not isolated during its lifetime, intervening write operations on a document may result in a cursor that returns a document more than once if that document has changed. To handle this situation, see the information on snapshot mode.
Snapshot mode is enabled on the cursor, and makes a nice guarantee:
snapshot() traverses the index on the _id field and guarantees that the query will return each document (with respect to the value of the _id field) no more than once.
To enable snapshot mode with PyMongo:
cursor = collection.find(spec={}, snapshot=True)
as per PyMongo find() documentation. Confirmed that this fixed my problem.
Snapshot does the job.
But on PyMongo 2.9 and onwards, the syntax is slightly different:
cursor = collection.find(modifiers={"$snapshot": True})
or for any version,
cursor = collection.find({"$snapshot": True})
as per the PyMongo documentation.
I couldn't recreate your situation, but off the top of my head: because fetching the results the way you do gets them from the db one by one, you may actually be creating more documents as you go (saving one and then fetching the next).
You can try holding the results in a list (that way you're fetching all the results at once, which might be heavy, depending on your query):
cursor = collection.find({})
# index = 0
results = [res for res in cursor]  # count = cursor.count()
cursor.close()

for res in results:  # while index != count -- iterates the list without keeping a counter
    # doc = cursor[index]  # not needed: 'res' holds the current record in this loop cycle
    print 'updating doc ' + res['name']  # print 'updating doc ' + doc['name']
    # modify doc ..
    collection.save(res)
    # index += 1  # again, no need for a counter
Hope it helps