Does anyone know how to get the row count from a SQLAlchemy query ResultProxy object without looping through the result set? The ResultProxy.rowcount attribute shows 0; I would expect it to have a value of 2. For updates it shows the number of rows affected, which is what I would expect.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
engine = create_engine(
    'oracle+cx_oracle://user:pass@host:port/database'
)
session = sessionmaker(
    bind=engine,
    autocommit=False,
    autoflush=False,
)()
sql_text = u"""
SELECT 1 AS Val FROM dual UNION ALL
SELECT 2 AS Val FROM dual
"""
results = session.execute(sql_text)
print '%s rows returned by query...\n' % results.rowcount
print results.keys()
for i in results:
    print repr(i)
Output:
0 rows returned by query...
[u'val']
(1,)
(2,)
ResultProxy.rowcount is ultimately a proxy for the DBAPI attribute cursor.rowcount. Most DBAPIs do not provide the "count of rows" for a SELECT query via this attribute; its primary purpose is to provide the number of rows matched by an UPDATE or DELETE statement. A relational database in fact does not know how many rows would be returned by a particular statement until it has finished locating all of those rows; many DBAPI implementations will begin returning rows as the database finds them, without buffering, so no such count is even available in those cases.
To get the count of rows a SELECT query would return, you either need to do a SELECT COUNT(*) up front, or you need to fetch all the rows into an array and perform len() on the array.
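For illustration, here is a minimal sketch of the first option, reusing the session and sql_text from the question; the wrapping COUNT query is my own and untested against Oracle:
count_sql = u"""
SELECT COUNT(*) FROM (
SELECT 1 AS Val FROM dual UNION ALL
SELECT 2 AS Val FROM dual
)
"""
row_count = session.execute(count_sql).scalar()   # 2
results = session.execute(sql_text)               # then fetch the actual rows as before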
The notes at ResultProxy.rowcount discuss this further (http://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=rowcount#sqlalchemy.engine.ResultProxy.rowcount):
Notes regarding ResultProxy.rowcount:
This attribute returns the number of rows matched, which is not necessarily the same as the number of rows that were actually modified: an UPDATE statement, for example, may have no net change on a given row if the SET values given are the same as those present in the row already. Such a row would be matched but not modified. On backends that feature both styles, such as MySQL, rowcount is configured by default to return the match count in all cases.
ResultProxy.rowcount is only useful in conjunction with an UPDATE or DELETE statement. Contrary to what the Python DBAPI says, it does not return the number of rows available from the results of a SELECT statement, as DBAPIs cannot support this functionality when rows are unbuffered.
ResultProxy.rowcount may not be fully implemented by all dialects. In particular, most DBAPIs do not support an aggregate rowcount result from an executemany call. The ResultProxy.supports_sane_rowcount() and ResultProxy.supports_sane_multi_rowcount() methods will report from the dialect if each usage is known to be supported.
Statements that use RETURNING may not return a correct rowcount.
You could use this:
rowcount = len(results._saved_cursor._result.rows)
Then your code will be
print '%s rows returned by query...\n' % rowcount
print results.keys()
Note that this reaches into private, undocumented attributes, so it may break between SQLAlchemy versions. I have only tested it with SELECT ('find') queries, but it works for me.
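If you would rather avoid private attributes altogether, here is a sketch of the buffering approach from the accepted answer (Python 2 prints, to match the question):
print results.keys()
rows = results.fetchall()    # buffer the whole result set once
rowcount = len(rows)
print '%s rows returned by query...\n' % rowcount
for row in rows:
    print repr(row)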
I'm having a problem executing this SQL statement with a Python list injected into it. I'm new to Teradata SQL, and I'm not sure if this is the appropriate syntax for injecting a list into the WHERE clause.
conn = teradatasql.connect(host='PROD', user='1234', password='1234', logmech='LDAP')
l = ["Comp-EN Routing", "Comp-COLLABORATION"]
l2 = ["PEO", "TEP"]
l3 = ["TCV"]
crsr = conn.cursor()
query = """SELECT SOURCE_ORDER_NUMBER
FROM DL_.BV_DETAIL
WHERE (LEVEL_1 IN ? AND LEVEL_2 IN ?) or LEVEL_3 IN ?"""
crsr.executemany(query, [l,l2,l3])
conn.autocommit = True
I keep getting this error
[Version 17.0.0.2] [Session 308831600] [Teradata Database] [Error 3939] There is a mismatch between the number of parameters specified and the number of parameters required.
Late to answer this, but if I found this question, someone else will in the future too.
executemany in teradatasql requires the second parameter to be a "sequence of sequences". The most common sequence type in Python is a list, so essentially you need a list whose elements are themselves lists: one inner list of bind values per execution.
In your case this may look like:
myListOfLists=[['level1valueA','level1valueA','level3valueA'],['level1valueB','level1valueB','level3valueB']]
Your SQL statement will be executed twice, once for each list in your list.
In your case, though, I suspect you actually want to find any combination of the values stored in your three lists, which is an entirely different ball of wax and is going to take some creativity: either generate a list of lists with all possible combinations and submit it to executemany (see the sketch below), or construct a SQL statement that can take in multiple comma-delimited lists of values, form a Cartesian product, and test for hits.
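For the first of those routes, here is a rough, untested sketch: it expands the three lists into every (LEVEL_1, LEVEL_2, LEVEL_3) combination and binds single values per execution, so IN becomes = in the query; crsr and the lists are from the question.
from itertools import product

l = ["Comp-EN Routing", "Comp-COLLABORATION"]
l2 = ["PEO", "TEP"]
l3 = ["TCV"]
combo_query = """SELECT SOURCE_ORDER_NUMBER
FROM DL_.BV_DETAIL
WHERE (LEVEL_1 = ? AND LEVEL_2 = ?) OR LEVEL_3 = ?"""
param_rows = [list(combo) for combo in product(l, l2, l3)]   # one bind row per combination
crsr.executemany(combo_query, param_rows)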
I want to add something regarding SELECT statements and the executemany method: to retrieve all the result sets returned by your query, you will need to call .nextset() followed by .fetchall() repeatedly until .nextset() returns False. The first .fetchall() will only give you the first result set (for the first list of parameters specified).
...
with teradatasql.connect(connectionstring) as conn:
    with conn.cursor() as cur:
        cur.executemany("SELECT COL1 FROM THEDATABASE.THETABLE WHERE COL1 = ?;", [['A'], ['B']])
        result = cur.fetchall()      # will bring you only rows matching 'A'
        if cur.nextset():
            result2 = cur.fetchall()     # results for 'B'
...
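If you want everything in one list no matter how many parameter rows were submitted, a small variation (a sketch; connectionstring is the same placeholder as above) is to loop until nextset() reports there are no more result sets:
import teradatasql

all_rows = []
with teradatasql.connect(connectionstring) as conn:
    with conn.cursor() as cur:
        cur.executemany("SELECT COL1 FROM THEDATABASE.THETABLE WHERE COL1 = ?;", [['A'], ['B']])
        while True:
            all_rows.extend(cur.fetchall())   # rows for the current parameter row
            if not cur.nextset():             # no more result sets left
                break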
I have this query:
SELECT COUNT(DISTINCT Serial, DatumOrig, Glucose) FROM values;
I've tried to recreate it with SQLAlchemy this way:
session.query(Value.Serial, Value.DatumOrig, Value.Glucose).distinct().count()
But this translates to this:
SELECT count(*) AS count_1
FROM (SELECT DISTINCT
values.`Serial` AS `values_Serial`,
values.`DatumOrig` AS `values_DatumOrig`,
values.`Glucose` AS `values_Glucose`
FROM values)
AS anon_1
This does not call the count function inline but wraps the SELECT DISTINCT in a subquery.
My question is: What are the different ways with SQLAlchemy to count a distinct select on multiple columns and what are they translating into?
Is there any solution which would translate into my original query? Is there any serious difference in performance or memory usage?
First off, I think that COUNT(DISTINCT) supporting more than one expression is a MySQL extension. You can achieve roughly the same in, for example, PostgreSQL with ROW values, but the behaviour is not the same regarding NULL: in MySQL, if any of the value expressions evaluates to NULL, the row does not qualify. That also leads to what is different between the two queries in the question:
If any of Serial, DatumOrig, or Glucose is NULL in the COUNT(DISTINCT) query, that row does not qualify or in other words does not count.
COUNT(*) is the cardinality of the subquery anon_1, or in other words the count of rows. SELECT DISTINCT Serial, DatumOrig, Glucose will include (distinct) rows with NULL.
Looking at EXPLAIN output for the 2 queries it looks like the subquery causes MySQL to use a temporary table. That will likely cause a performance difference, especially if it is materialized on disk.
Producing the multi valued COUNT(DISTINCT) query in SQLAlchemy is a bit tricky, because count() is a generic function and implemented closer to the SQL standard. It only accepts a single expression as its (optional) positional argument and the same goes for distinct(). If all else fails, you can always revert to text() fragments, like in this case:
# NOTE: text() fragments are included in the query as is, so if the text originates
# from an untrusted source, the query cannot be trusted.
session.query(func.count(distinct(text("`Serial`, `DatumOrig`, `Glucose`")))).\
    select_from(Value).\
    scalar()
which is far from readable and maintainable code, but gets the job done now. Another option is to write a custom construct that implements the MySQL extension, or rewrite the query as you have attempted. One way to form a custom construct that produces the required SQL would be:
from itertools import count
from sqlalchemy import func, distinct as _distinct

def _comma_list(exprs):
    # NOTE: Magic number alert, the precedence value must be large enough to avoid
    # producing parentheses around the "comma list" when passed to distinct()
    ps = count(10 + len(exprs), -1)
    exprs = iter(exprs)
    cl = next(exprs)
    for p, e in zip(ps, exprs):
        cl = cl.op(',', precedence=p)(e)
    return cl

def distinct(*exprs):
    return _distinct(_comma_list(exprs))

session.query(func.count(distinct(
    Value.Serial, Value.DatumOrig, Value.Glucose))).scalar()
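As a quick sanity check (my own addition, not part of the answer), you can compile the statement against the MySQL dialect to see roughly the SQL it will emit, which should come out along the lines of COUNT(DISTINCT `Serial`, `DatumOrig`, `Glucose`):
from sqlalchemy.dialects import mysql

q = session.query(func.count(distinct(
    Value.Serial, Value.DatumOrig, Value.Glucose)))
print(q.statement.compile(dialect=mysql.dialect()))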
I am querying a Postgres database for a large number of results and want to use server side cursors to stream the results to my client. It looks like when I do this, the rowcount attribute of the cursor is now set to -1 after I execute the query. I'm creating the cursor like so:
with db.cursor('cursor_name') as cursor:
Is there a way to find the number of results of my query while streaming results from the database? (I could do a SELECT COUNT(*), but I'd like to avoid that because I'm trying to abstract away the code around the query and that would complicate the API).
In the case of a server-side cursor, although cursor.execute() returns, the query has not necessarily been executed by the server at that point, and so the row count is not available to psycopg2. This is consistent with the DBAPI 2.0 spec which states that rowcount should be -1 if the row count of the last operation is indeterminate.
Attempting to coerce it with cursor.fetchone(), for example, updates cursor.rowcount, but only by the number of items retrieved so far, so that is not useful. cursor.fetchall() will result in rowcount being set correctly, but that performs the full query and data transfer you are trying to avoid.
A possible workaround that avoids a completely separate query to get the count, and which should give accurate results is:
select *, (select count(*) from test) from test;
This will result in each row having the table row count appended as the final column. You can then get the table row count using cursor.fetchone() and then taking the final column:
with db.cursor('cursor_name') as cursor:
    cursor.execute('select *, (select count(*) from test) from test')
    row = cursor.fetchone()
    data, count = row[:-1], row[-1]
Now count will contain the number of rows in the table. You can use row[:-1] to refer to the row data.
This might slow down the query because a possibly expensive SELECT COUNT(*) will be performed, but once that is done, retrieving the data should be fast.
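Continuing the example while keeping the streaming behaviour (a sketch; process() is a hypothetical per-row handler, db and the test table are from above), you can read the count from the first row and then keep iterating the server-side cursor, dropping the appended count column from each row:
with db.cursor('cursor_name') as cursor:
    cursor.execute('select *, (select count(*) from test) from test')
    first = cursor.fetchone()
    total = first[-1]             # total number of rows in the result
    process(first[:-1])           # hypothetical handler for the row data
    for row in cursor:            # keep streaming the rest from the server
        process(row[:-1])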
I have a DB with ID/Topic/Definition columns. When a select query is made, with possibly hundreds of parameters, I would like the fetchall call to also return the topic of any non-existent rows with a default text (i.e. "Not Found").
I realize this could be done in a loop, but that would query the DB every cycle and have a significant performance hit. With the parameters joined by "OR" in a single select statement the search is nearly instantaneous.
Is there a way to get a return of the query (topic) with default text for non-existent rows in SQLite?
Table Structure (named "dictionary")
ID|Topic|Definition
1|wd1|def1
2|wd3|def3
Sample Query
SELECT Topic, Definition FROM dictionary WHERE Topic = "wd1" OR Topic = "wd2" OR Topic = "wd3"
Desired Return
[(wd1, def1), (wd2, "Not Found"), (wd3, def3)]
To get data like wd2 out of a query, that data must be available to the query in the first place.
You could put it into a temporary table, or use a common table expression.
To include rows without a match, use an outer join:
WITH Topics(Topic) AS ( VALUES ('wd1'), ('wd2'), ('wd3') )
SELECT Topic,
       IFNULL(Definition, 'Not Found') AS Definition
FROM Topics
LEFT JOIN dictionary USING (Topic);
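Since the question mentions possibly hundreds of parameters, here is a sketch (mine, not part of the answer above) of building that VALUES list dynamically from Python with placeholders; the database file name and the topics list are made up:
import sqlite3

topics = ['wd1', 'wd2', 'wd3']                      # possibly hundreds of entries
placeholders = ', '.join('(?)' for _ in topics)     # one (?) per topic
sql = """
WITH Topics(Topic) AS (VALUES {})
SELECT Topic, IFNULL(Definition, 'Not Found') AS Definition
FROM Topics
LEFT JOIN dictionary USING (Topic)
""".format(placeholders)
conn = sqlite3.connect('dictionary.db')
print(conn.execute(sql, topics).fetchall())
# e.g. [('wd1', 'def1'), ('wd2', 'Not Found'), ('wd3', 'def3')]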
I have the following query:
self.cursor.execute("SELECT platform_id_episode, title FROM table WHERE asset_type='movie'")
Is there a way to get the number of results returned directly? Currently I am doing the inefficient:
r = self.cursor.fetchall()
num_results = len(r)
If you don't actually need the results,* don't ask MySQL for them; just use COUNT:**
self.cursor.execute("SELECT COUNT(*) FROM table WHERE asset_type='movie'")
Now, you'll get back one row, with one column, whose value is the number of rows your other query would have returned.
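To actually read that value back (a trivial sketch, using the cursor from the question):
self.cursor.execute("SELECT COUNT(*) FROM table WHERE asset_type='movie'")
(num_results,) = self.cursor.fetchone()   # one row, one column: the count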
Notice that I ignored your specific columns and just did COUNT(*). A COUNT(platform_id_episode) would also be legal, but it counts only the found rows with non-NULL platform_id_episode values; COUNT(*) is the number of found rows, full stop.***
* If you do need the results… well, you have to call fetchall() or equivalent to get them, so I don't see the problem.
** If you've never used aggregate functions in SQL before, make sure to look over some of the examples on that page; you've probably never realized you can do things like that so simply (and efficiently).
*** If someone taught you "never use * in a SELECT", well, that's good advice, but it's not relevant here. The problem with SELECT * is that it drags every column into your result set, whether you need them or not and in whatever order the table happens to define them, instead of just the columns you actually need in the order you need them. SELECT COUNT(*) doesn't do that.
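One more hedged aside, not part of the answer above: if you do need the rows anyway and you are on a client-buffering driver, the default cursors in MySQLdb and PyMySQL store the whole result set on the client after execute(), and in that case cursor.rowcount is, as far as I know, populated for SELECTs as well; with an unbuffered SSCursor it is not reliable, so don't depend on it there.
self.cursor.execute("SELECT platform_id_episode, title FROM table WHERE asset_type='movie'")
num_results = self.cursor.rowcount   # only meaningful here because the default cursor buffers client-side
r = self.cursor.fetchall()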