I have a raw query, for example:
# Table posts = 420 rows
>>> cursor = connection.cursor()
>>> posts = Post.objects.raw('SELECT SQL_CALC_FOUND_ROWS posts.* FROM posts LIMIT 1,10')
>>> found_rows = cursor.execute("SELECT FOUND_ROWS()")
>>> print found_rows()
1
I want to know how to get the total number of rows for paging.
Post.objects.raw() does not execute the query — it only returns a RawQuerySet instance. The actual query will only be executed once you try to iterate over that RawQuerySet (e.g. by calling next(iter(posts)) in your code).
Since you're limiting your query to only 10 results, you might as well pull all instances into a list:
posts = list(Post.objects.raw('SELECT SQL_CALC_FOUND_ROWS posts.* FROM posts LIMIT 1,10'))
This makes sure the query has actually been executed, so that your subsequent SELECT FOUND_ROWS() returns the real count.
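Putting the two pieces together, a minimal sketch (MySQL-specific; it assumes the raw query and the extra cursor share Django's default database connection, which is what makes FOUND_ROWS() see the previous statement):
from django.db import connection

# materializing the RawQuerySet into a list executes the query now
posts = list(Post.objects.raw('SELECT SQL_CALC_FOUND_ROWS posts.* FROM posts LIMIT 1,10'))

cursor = connection.cursor()
cursor.execute("SELECT FOUND_ROWS()")
total_rows = cursor.fetchone()[0]  # total matching rows, ignoring LIMIT, for paging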
I have this query that returns a list of student objects:
query = db.session.query(Student).filter(Student.is_deleted == false())
query = query.options(joinedload('project'))
query = query.options(joinedload('image'))
query = query.options(joinedload('student_locator_map'))
query = query.options(subqueryload('attached_addresses'))
query = query.options(subqueryload('student_meta'))
query = query.order_by(Student.student_last_name, Student.student_first_name,
Student.student_middle_name, Student.student_grade, Student.student_id)
query = query.filter(filter_column == field_value)
students = query.all()
The query itself does not take much time. The problem is converting all these objects (there can be 5000+) to Python dicts; it takes over a minute with this many objects. Currently, the code loops through the objects and converts each one using to_dict(). I have also tried __dict__, which was much faster, but it does not seem to convert all the related objects.
How can I convert all these Student objects and related objects quickly?
Maybe this will help you...
from collections import defaultdict
from sqlalchemy import inspect

def query_to_dict(student_results):
    result = defaultdict(list)
    for obj in student_results:
        instance = inspect(obj)
        # collect every mapped attribute value, keyed by attribute name
        for key, x in instance.attrs.items():
            result[key].append(x.value)
    return result
output = query_to_dict(students)
query = query.options(joinedload('attached_addresses').joinedload('address'))
By chaining address joinedload to attached_addresses I was able to significantly speed up the query.
My understanding of why this is the case:
Address objects were not being loaded by the initial query, so every iteration through the loop hit the database again to retrieve an Address. With the chained joinedload, the Address objects are loaded as part of the initial query.
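For illustration, this is roughly how the chained option fits into the query from the question (a sketch only; the loader names come from the question, the string-based option style matches what is used there, and the chained loader replaces the plain subqueryload('attached_addresses')):
from sqlalchemy import false
from sqlalchemy.orm import joinedload, subqueryload

query = db.session.query(Student).filter(Student.is_deleted == false())
query = query.options(joinedload('project'))
query = query.options(joinedload('image'))
query = query.options(joinedload('student_locator_map'))
# load each attached address together with its related Address row up front,
# instead of issuing one extra query per address later in the loop
query = query.options(joinedload('attached_addresses').joinedload('address'))
query = query.options(subqueryload('student_meta'))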
Thanks to Corley Brigman for the help.
I queried two databases to get two relations. I iterate over those relations once to form maps, and then again to perform some calculations. However, when I attempt to iterate over the same relations a second time, I find that no iteration actually occurs. Here is the code:
dev_connect = dev_engine.connect()
prod_connect = prod_engine.connect() # from a different database
Relation1 = dev_engine.execute(sqlquery1)
Relation2 = prod_engine.execute(sqlquery)
before_map = {}
after_map = {}
for row in Relation1:
before_map[row['instrument_id']] = row
for row2 in Relation2:
after_map[row2['instrument_id']] = row2
update_count = insert_count = delete_count = 0
change_list = []
count = 0
for prod_row in Relation2:
count += 1
result = list(prod_row)
...
change_list.append(result)
count2 = 0
for before_row in Relation1:
count2 += 1
result = before_row
...
print count, count2 # prints 0
before_map and after_map are not empty, so Relation1 and Relation2 definitely have tuples in them. Yet count and count2 are 0, so the prod_row and before_row 'for loops' aren't actually occurring. Why can't I iterate over Relation1 and Relation2 a second time?
When you call execute on a SQLAlchemy engine, you get back a ResultProxy, which is a facade over a DBAPI cursor for the rows your query returns.
Once you iterate over all the results of the ResultProxy, it automatically closes the underlying cursor, so you can't reuse the results by simply iterating over it again, as documented on the SQLAlchemy page:
The returned result is an instance of ResultProxy, which references a DBAPI cursor and provides a largely compatible interface with that of the DBAPI cursor. The DBAPI cursor will be closed by the ResultProxy when all of its result rows (if any) are exhausted.
You can solve your problem a couple ways:
Store the results in a list. Just do a list comprehension against the rows returned (a fuller sketch applying this to both result sets appears below):
Relation1 = dev_engine.execute(sqlquery1)
relation1_items = [r for r in Relation1]
# ...
# now you can iterate over relation1_items as much as you want
Do everything you need to in one pass through each row set returned. I don't know if this option is feasible for you, since I don't know whether the full extent of your calculations requires cross-referencing between your before_map and after_map objects.
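Applying the first approach to both result sets might look like this (a sketch, reusing the engine and query names from the question):
relation1_rows = list(dev_engine.execute(sqlquery1))  # materialize each ResultProxy once
relation2_rows = list(prod_engine.execute(sqlquery))

before_map = {row['instrument_id']: row for row in relation1_rows}
after_map = {row['instrument_id']: row for row in relation2_rows}

# both lists can now be iterated as many times as needed
for prod_row in relation2_rows:
    pass  # calculations as in the question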
I have a table with the following model:
CREATE TABLE IF NOT EXISTS {} (
user_id bigint ,
pseudo text,
importance float,
is_friend_following bigint,
is_friend boolean,
is_following boolean,
PRIMARY KEY ((user_id), is_friend_following)
);
I also have a table containing my seeds. Those (20) users are the starting point of my graph, so I select their IDs and search the table above to get their followers and friends, and from there I build my graph (NetworkX).
def build_seed_graph(cls, name):
obj = cls()
obj.name = name
query = "SELECT twitter_id FROM {0};"
seeds = obj.session.execute(query.format(obj.seed_data_table))
obj.graph.add_nodes_from(obj.seeds)
for seed in seeds:
query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
friend_ids = []
follower_ids = []
for row in obj.session.execute(statement):
if row.friend_follower_id in obj.seeds:
if row.is_friend:
friend_ids.append(row.friend_follower_id)
if row.is_follower:
follower_ids.append(row.friend_follower_id)
if friend_ids:
for friend_id in friend_ids:
obj.graph.add_edge(seed, friend_id)
if follower_ids:
for follower_id in follower_ids:
obj.graph.add_edge(follower_id, seed)
return obj
The problem is that the time it takes to build the graph is too long and I would like to optimize it.
I've got approximately 5 million rows in my table 'network_table'.
I'm wondering whether it would be faster to do a single query on the whole table instead of a query with a WHERE clause per seed. Would it fit in memory? Is that a good idea? Is there a better way?
I suspect the real issue may not be the queries but rather the processing time.
I'm wondering whether it would be faster to do a single query on the whole table instead of a query with a WHERE clause per seed. Would it fit in memory? Is that a good idea? Is there a better way?
There should not be any problem with doing a single query on the whole table if you enable paging (https://datastax.github.io/python-driver/query_paging.html - using fetch_size). Cassandra will return up to the fetch_size and will fetch additional results as you read them from the result_set.
Please note that if you have many rows in the table that are not seed-related, then a full scan may be slower, since you will receive rows that do not involve a seed.
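A rough sketch of such a paged full-table scan (the column names follow the query in the question, the seed filtering mirrors the per-seed logic there, and the driver fetches further pages transparently as you iterate):
from cassandra.query import SimpleStatement

query = ("SELECT user_id, friend_follower_id, is_friend, is_follower "
         "FROM {0}").format(obj.network_table)
statement = SimpleStatement(query, fetch_size=1000)  # page size per round trip

for row in obj.session.execute(statement):
    # keep only rows where both ends of the edge are seed users
    if row.user_id in obj.seeds and row.friend_follower_id in obj.seeds:
        if row.is_friend:
            obj.graph.add_edge(row.user_id, row.friend_follower_id)
        if row.is_follower:
            obj.graph.add_edge(row.friend_follower_id, row.user_id)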
Disclaimer: I am part of the team building ScyllaDB, a Cassandra-compatible database.
ScyllaDB recently published a blog post on how to do an efficient full table scan in parallel (http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/), which applies to Cassandra as well; if a full scan is relevant and you can build the graph in parallel, this may help you.
It seems like you can get rid of the last two if statements, since you're looping over data that you have already iterated through once:
def build_seed_graph(cls, name):
obj = cls()
obj.name = name
query = "SELECT twitter_id FROM {0};"
seeds = obj.session.execute(query.format(obj.seed_data_table))
obj.graph.add_nodes_from(obj.seeds)
for seed in seeds:
query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
for row in obj.session.execute(statement):
if row.friend_follower_id in obj.seeds:
if row.is_friend:
obj.graph.add_edge(seed, row.friend_follower_id)
elif row.is_follower:
obj.graph.add_edge(row.friend_follower_id, seed)
return obj
This also gets rid of many append operations on lists that you're not using, and should speed up this function.
Is there any way to fetch the entire dataset in an App Engine search index? The search below takes an integer limit through QueryOptions, and the limit always needs to be present.
I'm unable to determine whether there is some special flag that can bypass this limit and return the entire result set. If the query is made without QueryOptions, the result set is somehow limited to 20.
_INDEX = search.Index(name=constants.SEARCH_INDEX)
_INDEX.search(query=search.Query(
query,
options=search.QueryOptions(
limit=limit,
sort_options=search.SortOptions(...))))
Any ideas?
You could customise the delete-all example, if indeed you want every document in the index rather than every result in a query: https://cloud.google.com/appengine/docs/python/search/#Python_Deleting_documents_from_an_index
from google.appengine.api import search
def delete_all_in_index(index_name):
"""Delete all the docs in the given index."""
doc_index = search.Index(name=index_name)
# looping because get_range by default returns up to 100 documents at a time
while True:
# Get a list of documents populating only the doc_id field and extract the ids.
document_ids = [document.doc_id
for document in doc_index.get_range(ids_only=True)]
if not document_ids:
break
# Delete the documents for the given ids from the Index.
doc_index.delete(document_ids)
So you might end up with something like:
start_id = None
while True:
    # get_range returns up to 100 documents per call, so advance start_id each pass
    document_ids = [document.doc_id
                    for document in doc_index.get_range(
                        start_id=start_id,
                        include_start_object=(start_id is None),
                        ids_only=True)]
    if not document_ids:
        break
    # Then do something with each document
    for id in document_ids:
        document = doc_index.get(id)
    start_id = document_ids[-1]
You'd probably want to fetch the document itself in the list comprehension rather than getting the ID and then looking the document up by that ID, but you get the idea.
Firstly, if you peek into the constructor of QueryOptions, it answers your question of why you get 20 results:
def __init__(self, limit=20, number_found_accuracy=None, cursor=None,
offset=None, sort_options=None, returned_fields=None,
ids_only=False, snippeted_fields=None,
returned_expressions=None):
I think the reason the API does this is to avoid fetching results unnecessarily. You should use an offset if you need to fetch more results upon user action, instead of always fetching all results. See this.
from google.appengine.api import search
...
# get the first set of results
page_size = 10
results = index.search(search.Query(query_string='some stuff',
                                    options=search.QueryOptions(limit=page_size)))
# calculate pages (integer division; round up if the last page is partial)
pages = results.number_found / page_size
# user chooses page ith and hence an offset into results
next_page = ith * page_size
# get the search results for that page
results = index.search(search.Query(query_string='some stuff',
                                    options=search.QueryOptions(limit=page_size,
                                                                offset=next_page)))
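If you really do need every result of a query (rather than every document in the index), another approach is to page through with a cursor instead of an offset. A hedged sketch, assuming index and query_string as above:
from google.appengine.api import search

cursor = search.Cursor()
all_documents = []
while cursor is not None:
    results = index.search(search.Query(
        query_string='some stuff',
        options=search.QueryOptions(limit=200, cursor=cursor)))
    all_documents.extend(results.results)
    cursor = results.cursor  # becomes None once the last page has been returned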
I am trying to get the number of rows returned from an sqlite3 database in Python, but it seems the feature isn't available.
Think of PHP's mysqli_num_rows() in MySQL.
I devised a workaround, but it is awkward. Assume a class executes SQL and gives me the results:
# Query execution returning a result
data = sql.sqlExec("select * from user")

# Run another query just for row counting -- not a very good workaround
dataCopy = sql.sqlExec("select * from user")

# Cast dataCopy to a list and take its length. I did this because I noticed
# that as soon as I perform any action on the data, it becomes empty.
# This is not ideal either, since someone else could perform another
# transaction on the database in the meantime.
if len(list(dataCopy)):
    for m in data:
        print("Name = {}, Password = {}".format(m["username"], m["password"]))
else:
    print("Query returned nothing")
Is there a function or property that can do this without the stress?
Normally, cursor.rowcount would give you the number of results of a query.
However, for SQLite, that property is often set to -1 due to the nature of how SQLite produces results. Short of a COUNT() query first you often won't know the number of results returned.
This is because SQLite produces rows as it finds them in the database, and won't itself know how many rows are produced until the end of the database is reached.
From the documentation of cursor.rowcount:
Although the Cursor class of the sqlite3 module implements this attribute, the database engine’s own support for the determination of “rows affected”/”rows selected” is quirky.
For executemany() statements, the number of modifications are summed up into rowcount.
As required by the Python DB API Spec, the rowcount attribute “is -1 in case no executeXX() has been performed on the cursor or the rowcount of the last operation is not determinable by the interface”. This includes SELECT statements because we cannot determine the number of rows a query produced until all rows were fetched.
Emphasis mine.
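A quick way to see that behaviour for yourself (an in-memory database, purely for illustration):
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE user (username TEXT, password TEXT)")
cur.execute("INSERT INTO user VALUES ('bob', 'secret')")

cur.execute("SELECT * FROM user")
print(cur.rowcount)          # -1: not determinable for a SELECT
print(len(cur.fetchall()))   # 1: known only after fetching the rows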
For your specific query, you can add a sub-select to add a column:
data = sql.sqlExec("select (select count() from user) as count, * from user")
This is not all that efficient for large tables, however.
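For completeness, reading that extra column might look like this (a sketch; it assumes sqlExec returns sqlite3.Row-style rows, as the dict-style access in the question suggests):
rows = list(sql.sqlExec("select (select count() from user) as count, * from user"))
if rows:
    total = rows[0]["count"]   # the same total is repeated on every row
    for m in rows:
        print("Name = {}, Password = {}".format(m["username"], m["password"]))
else:
    print("Query returned nothing")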
If all you need is one row, use cursor.fetchone() instead:
cursor.execute('SELECT * FROM user WHERE userid=?', (userid,))
row = cursor.fetchone()
if row is None:
raise ValueError('No such user found')
result = "Name = {}, Password = {}".format(row["username"], row["password"])
import sqlite3
conn = sqlite3.connect(path/to/db)
cursor = conn.cursor()
cursor.execute("select * from user")
results = cursor.fetchall()
print len(results)
len(results) is just what you want
Use the following:
dataCopy = sql.sqlExec("select count(*) from user")
values = dataCopy.fetchone()
print values[0]
When you just want an estimate beforehand, simply use COUNT():
n_estimate = cursor.execute("SELECT COUNT() FROM user").fetchone()[0]
To get the exact number before fetching, use a locked "Read transaction", during which the table won't be changed from outside, like this:
cursor.execute("BEGIN") # start transaction
n = cursor.execute("SELECT COUNT() FROM user").fetchone()[0]
# if n > big: be_prepared()
allrows=cursor.execute("SELECT * FROM user").fetchall()
cursor.connection.commit() # end transaction
assert n == len(allrows)
Note: a normal SELECT also locks, but only until it has been completely fetched, the cursor closes, or commit() / END or some other action implicitly ends the transaction...
I've found the SELECT statement with COUNT() to be slow on a very large DB. Moreover, using fetchall() can be very memory-intensive.
Unless you explicitly design your database so that it does not have a rowid, you can always try a quick solution:
cur.execute("SELECT max(rowid) from Table")
n = cur.fetchone()[0]
This will tell you how many rows your table has, as long as no rows have ever been deleted (otherwise max(rowid) can be larger than the actual row count).
I did it like
cursor.execute("select count(*) from my_table")
results = cursor.fetchone()
print(results[0])
This code worked for me:
import sqlite3
con = sqlite3.connect(your_db_file)
cursor = con.cursor()
result = cursor.execute("select count(*) from your_table").fetchall()  # returns a list of tuples
num_of_rows = result[0][0]
A simple alternative approach here is to use fetchall to pull a column into a Python list, then count the length of the list. I don't know if this is pythonic or especially efficient, but it seems to work:
rowlist = []
c.execute("SELECT {rowid} from {whichTable}".format(rowid="rowid",
                                                    whichTable=whichTable))
rowlist = c.fetchall()
rowlistcount = len(rowlist)
print(rowlistcount)
The following script works:
import sqlite3

def say():
    global s                                     # make s global
    vt = sqlite3.connect('kur_kel.db')           # connect to the db file
    bilgi = vt.cursor()
    bilgi.execute('select count(*) from kuke')   # execute the SQL command
    say_01 = bilgi.fetchone()                    # fetch one row from the executed query
    print(say_01[0])                             # print the first item of the tuple
    s = say_01[0]                                # assign the query result to s
    bilgi.close()                                # close the cursor
    vt.close()                                   # close the db file