Check if <sqlalchemy.engine.result.ResultProxy> object is empty - python

I'm downloading some data from a SQL Server database through a library that leverages pymssql in the back-end. The result of a cursor.execute("""<QUERY BODY>""") call is a sqlalchemy.engine.result.ResultProxy object. How can I check whether the result of the query was empty, i.e. contains no rows?
cur = ff.sql.create_engine(server=dw.address, db=dw.BI_DW,
                           login=":".join([os.environ["SQL_USER"],
                                           os.environ["SQL_PASSWD"]]))
for n in range(100):
    result = cur.execute("""QUERY BODY;""")
    if result:
        break
Unfortunately, result will never be None even when no rows were returned by the SQL query.
What's the best way to check for that?

The ResultProxy object does not contain any rows yet. Therefore it has no information about how many there are, or even whether there are any at all. A ResultProxy is just a "pointer" to the database. You get your rows only when you explicitly fetch them via the ResultProxy: by iterating over it, via the .first() method, or via the .fetchall() method.
Bottom line: you cannot know the number of fetched rows until you actually fetch all of them and the ResultProxy is exhausted.
Approach #1
You can fetch all the rows at once and count them and then do whatever you need with them:
rows = result.fetchall()
if len(rows):
    # do something with rows
    ...
The downside of this method is that we load all the rows into memory at once (rows is a Python list containing all the fetched rows). This may not be desirable if the number of fetched rows is very large and/or if you only need to process the rows one by one anyway (which is usually the case).
Approach #2
If loading all fetched rows into memory at once is not acceptable, then we can do this:
rows_amount = 0
for row in result:
    rows_amount += 1
    # do something with row

if not rows_amount:
    print('There were zero rows')
else:
    print('{} rows were fetched and processed'.format(rows_amount))

SQLAlchemy < 1.2: You can always turn the ResultProxy into an iterator:
res = engine.execute(...)
rp_iter = iter(res)
row_count = 0
try:
    row = next(rp_iter)
    row_count += 1
except StopIteration:
    # end of data
    pass

if not row_count:
    # no rows returned; StopIteration was raised on the first attempt
    ...
In SQLAlchemy >= 1.2, the ResultProxy implements both .next() and .__next__(), so you do not need to create the iterator:
res = engine.execute(...)
row_count = 0
try:
    row = next(res)
    row_count += 1
except StopIteration:
    # no rows returned
    ...
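For the original use case (just checking whether anything came back), the .first() method mentioned above is arguably the simplest route: it fetches the first row, closes the result set, and returns None when there are no rows. A minimal sketch, reusing the hypothetical engine and query from the question:
result = cur.execute("""QUERY BODY;""")
first_row = result.first()  # None when the query returned no rows
if first_row is None:
    print('There were zero rows')
Note that .first() closes the ResultProxy, so this suits a pure empty-or-not check; if you also need to process the rows, use one of the approaches above.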

Related

Want to run a query multiple times using a for loop and add each result into a dictionary. This code only executes once even as it loops through

I have a query that takes a random sample of records. I want to do this multiple times and add each result into a dictionary, which I will concat into a pandas DataFrame later. This code only executes once even as it loops through.
cursor.execute("select record1, record2 from table order by random() limit 1000")
d = {}
for x in range(10):
    d[x] = pd.DataFrame(cursor.fetchall())
cursor.fetchall() doesn't execute the query, it just fetches the remaining results from the query that was already executed by cursor.execute(). The first iteration of the loop fetches everything, so the other 9 iterations have nothing left to fetch and you get empty dataframes.
You need to move the cursor.execute() call into the loop.
d = {}
for x in range(10):
    cursor.execute("select record1, record2 from table order by random() limit 1000")
    d[x] = pd.DataFrame(cursor.fetchall())
Note that there will likely be overlap between the records in each dataframe. If you don't want that, you should do a single query for 10,000 records, and then slice them into dataframes for each block of 1,000.
cursor.execute("select record1, record2 from table order by random() limit 10000")
rows = cursor.fetchall()
d = {}
for x in range(0, 10000, 1000):
    d[x // 1000] = pd.DataFrame(rows[x:x + 1000])
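Since the question mentions concatenating into a pandas DataFrame later, a possible follow-up is sketched below; pd.concat accepts a dict and turns its keys into the outermost index level (the level name 'sample' is just an illustrative choice):
import pandas as pd

# combine the ten 1000-row frames into one DataFrame;
# the dict keys 0-9 become the outermost index level
combined = pd.concat(d, names=['sample'])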

If statement in Flask using string variable from SQL database

I'm creating a program where I need to check if a certain cell in a table equals a string value and, if it does not, to not change that value. Here is some snippet of the code for clarification:
if db.execute("SELECT :rowchosen FROM userboard WHERE column=:columnchosen",
              rowchosen=rowchosen, columnchosen=columnchosen) == '-':
    # change value of cell
else:
    # go to a new page that displays an error
Yet, whenever I run this code, I always get an error because the value (I believe) prints as a dictionary, something like {"row": 'row'}. Any help/advice as to why this happens?
Are you sure that userboard is the database and not the table?
I think this is what you want to do:
import sqlite3

conn = sqlite3.connect(db_file)
cur = conn.cursor()
cur.execute("SELECT * FROM userboard WHERE one=?", (columnchosen,))
rows = cur.fetchall()
for row in rows:
    print(row)
Now, in the for row in rows: loop, you need to perform your check: for each returned row, test whether the appropriate column equals '-', as in the sketch below.
Also check out http://www.sqlitetutorial.net/sqlite-python/sqlite-python-select/
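A rough sketch of that per-row check (the column index 1 is hypothetical; note also that bound parameters such as :rowchosen can only substitute values, never column names, which is one reason the original query misbehaves):
for row in rows:
    # hypothetical: assume the cell being tested is the second column
    if row[1] == '-':
        print('cell is "-", safe to change')  # change the value of the cell here
    else:
        print('cell already set')             # render the error page here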

Getting COUNT from sqlalchemy

I have:
res = db.engine.execute('select count(id) from sometable')
The returned object is sqlalchemy.engine.result.ResultProxy.
How do I get count value from res?
res cannot be accessed by index, but I have figured this out as:
count = None
for i in res:
    count = i[0]
    break
There must be an easier way, right? What is it? I didn't discover it yet.
Note: The db is a postgres db.
While the other answers work, SQLAlchemy provides a shortcut for scalar queries as ResultProxy.scalar():
count = db.engine.execute('select count(id) from sometable').scalar()
scalar() fetches the first column of the first row and closes the result set, or returns None if no row is present. There's also Query.scalar(), if using the Query API.
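A sketch of that Query API route (SomeTable and db.session are hypothetical names for a mapped model and the session in use):
from sqlalchemy import func

# Query.scalar() returns the first column of the first row, here the count
count = db.session.query(func.count(SomeTable.id)).scalar()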
What you are asking for is called unpacking. A ResultProxy is an iterable, so we can do:
# there will be a single record
record, = db.engine.execute('select count(id) from sometable')
# this record consists of a single value
count, = record
The ResultProxy in SQLAlchemy (as documented here http://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=execute#sqlalchemy.engine.ResultProxy) is an iterable of the rows returned from the database; it does not support direct indexing. For a count() query, fetch the first row and take its first (and only) column:
result = db.engine.execute('select count(id) from sometable')
count = result.fetchone()[0]
If you happened to be using the ORM of SQLAlchemy, I would suggest using the Query.count() method on the appropriate model as shown here: http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=count#sqlalchemy.orm.query.Query.count
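A sketch of that suggestion, again assuming a hypothetical mapped model SomeTable:
# Query.count() wraps the query in a SELECT count(*) subquery,
# so the rows themselves are never loaded
count = db.session.query(SomeTable).count()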

Processing each row of a large database table in Python

Context
I have a function in Python that scores a row in my table. I would like to combine the scores of all the rows arithmetically (e.g. computing the sum, average, etc. of the scores).
def compute_score(row):
    # some complicated python code that would be painful to convert into SQL-equivalent
    return score
The obvious first approach is to simply read in all the data:
import psycopg2

def sum_scores(dbname, tablename):
    conn = psycopg2.connect(dbname)
    cur = conn.cursor()
    # table names cannot be passed as query parameters, so format the string
    cur.execute('SELECT * FROM {}'.format(tablename))
    rows = cur.fetchall()
    total = 0
    for row in rows:
        total += compute_score(row)
    conn.close()
    return total
Problem
I would like to be able to handle as much data as my database can hold. This could be larger than what would fit into Python's memory, so fetchall() seems like it would not function correctly in that case.
Proposed Solutions
I was considering 3 approaches, all with the aim of processing a few records at a time:
One-by-one record processing using fetchone()
def sum_scores(dbname, tablename):
    ...
    total = 0
    for row_num in range(cur.rowcount):
        row = cur.fetchone()
        total += compute_score(row)
    ...
    return total
Batch-record processing using fetchmany(n)
def sum_scores(dbname, tablename):
    ...
    batch_size = 1000  # tunable; fetchmany() expects an int
    total = 0
    batch = cur.fetchmany(batch_size)
    while batch:
        for row in batch:
            total += compute_score(row)
        batch = cur.fetchmany(batch_size)
    ...
    return total
Relying on the cursor's iterator
def sum_scores(dbname, tablename):
    ...
    total = 0
    for row in cur:
        total += compute_score(row)
    ...
    return total
Questions
Was my thinking correct in that my 3 proposed solutions would only pull in manageable sized chunks of data at a time? Or do they suffer from the same problem as fetchall?
Which of the 3 proposed solutions would work (ie. compute the correct score combination and not crash in the process) for LARGE datasets?
How does the cursor's iterator (Proposed Solution #3) actually pull in data into Python's memory? One-by-one, in batches, or all at once?
All 3 solutions will work, and they only bring a subset of the results into memory provided you use a named (server-side) cursor; with the default client-side cursor, psycopg2 transfers the entire result set to the client as soon as execute() runs.
Iterating via the cursor, Proposed Solution #3, will work the same as Proposed Solution #2 if you pass a name to the cursor: iterating then fetches itersize records at a time (the default is 2000).
Solutions #2 and #3 will be much quicker than #1, because there is far less network round-trip overhead.
http://initd.org/psycopg/docs/cursor.html#fetch
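To make Proposed Solution #3 stream rows that way, the cursor must be created with a name. A minimal sketch under the question's setup (the cursor name 'score_cursor' is arbitrary):
import psycopg2

def sum_scores(dbname, tablename):
    conn = psycopg2.connect(dbname)
    # a named cursor is server-side: rows are streamed in chunks
    # instead of being transferred all at once on execute()
    cur = conn.cursor(name='score_cursor')
    cur.itersize = 5000  # rows fetched per network round trip (default 2000)
    cur.execute('SELECT * FROM {}'.format(tablename))
    total = sum(compute_score(row) for row in cur)
    conn.close()
    return total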

Python retrieve individual item from fetchone()

I have a SQL query in Python like so, to retrieve the first, fourth, and fifth column elements if they exist:
cur2.execute('SELECT * FROM duplicates where TITLE=?', [post_title])
sql2.commit()
if cur2.fetchone():
    repost_permalink = cur.fetchone()[0]
    repost_title = cur.fetchone()[3]
    repost_submitter = cur.fetchone()[4]
For some reason I keep getting the error:
repost_permalink = cur.fetchone()[0]
TypeError: 'NoneType' object has no attribute '__getitem__'
Am I accessing the element incorrectly?
Every time you call fetchone(), it fetches another row. So you are fetching four different rows in your code, because you call fetchone four times. If there aren't that many rows in the result set, some of them will be None.
If you want to get parts of a single row, store the row and then access it:
row = cur2.fetchone()
if row:
    repost_permalink = row[0]
    repost_title = row[3]
    repost_submitter = row[4]
