I want to know if SQLAlchemy has problems querying a view. If I query the view with normal SQL on the server like:
SELECT * FROM ViewMyTable WHERE index1 = '608_56_56';
I get a whole bunch of records. But with SQLAlchemy I get only the first one, even though the count is correct. I have no idea why.
This is my SQLAlchemy code.
myQuery = Session.query(ViewMyTable)
erg = myQuery.filter(ViewMyTable.index1 == index1.strip())
# Contains the correct number of all entries I found with that query.
totalCount = erg.count()
# Contains only the first entry I found with my query.
ergListe = erg.all()
If you've mapped ViewMyTable, the query will only return rows that have a fully non-NULL primary key. This behavior is specific to versions 0.5 and lower; in 0.6, the row is turned into an instance if any of its primary key columns is non-NULL. Specify the flag allow_null_pks=True on your mappers to ensure that partial primary keys still count:
mapper(ViewMyTable, myview, allow_null_pks=True)
If, on the other hand, the rows returned have all NULLs in the primary key, then SQLAlchemy cannot create an entity, since it can't place it into the identity map. You can instead get at the individual columns by querying for them specifically:
for id, index in session.query(ViewMyTable.id, ViewMyTable.index):
    print(id, index)
I was facing a similar problem - how to filter a view with SQLAlchemy. For the table:
t_v_full_proposals = Table(
    'v_full_proposals', metadata,
    Column('proposal_id', Integer),
    Column('version', String),
    Column('content', String),
    Column('creator_id', String)
)
I'm filtering:
proposals = session.query(t_v_full_proposals).filter(t_v_full_proposals.c.creator_id != 'greatest_admin')
Hopefully it will help:)
I have a data model where I store a list of values as a comma-separated string (1,2,3,4,5...).
In my code, in order to work with lists instead of strings, I have defined the model like this:
class MyModel(db.Model):
    pk = db.Column(db.Integer, primary_key=True)
    __fake_array = db.Column(db.String(500), name="fake_array")

    @property
    def fake_array(self):
        if not self.__fake_array:
            return
        return self.__fake_array.split(',')

    @fake_array.setter
    def fake_array(self, value):
        if value:
            self.__fake_array = ",".join(value)
        else:
            self.__fake_array = None
This works perfectly: from the point of view of my source code, "fake_array" is a list; it is only transformed into a string when it is stored in the database.
The problem appears when I try to filter by that field. Expressions like this don't work:
MyModel.query.filter_by(fake_array="1").all()
It seems that I can't filter using the SQLAlchemy query model.
What can I do here? Is there any way to filter on this kind of field? Is there a better pattern for the "fake_array" problem?
Thanks!
What you're trying to do should really be replaced with a pair of tables and a relationship between them.
The first table (which I'll call A) contains everything BUT the array column, and it should have a primary key of some sort. You should have another table (which I'll call B) that contains a primary key, a foreign key column to A (which I'll call a_id), and an integer value column.
Using this layout, each row in the A table has its associated array in table B where B's a_id == A.id via a join. You can add or remove values from the array by manipulating the rows in table B. You can filter by using a join.
If the order of the values is needed, then create an order column in table B.
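A minimal sketch of that layout in the Flask-SQLAlchemy style used in the question (the names A, B, value and position are illustrative, not taken from your code):
class A(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # one-to-many: the "array" lives in table B
    values = db.relationship('B', backref='a', order_by='B.position')

class B(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    a_id = db.Column(db.Integer, db.ForeignKey('a.id'), nullable=False)
    value = db.Column(db.Integer, nullable=False)
    position = db.Column(db.Integer)  # only needed if order matters

# Filtering becomes a join, e.g. all A rows whose "array" contains 1:
# A.query.join(B).filter(B.value == 1).all()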
I have:
res = db.engine.execute('select count(id) from sometable')
The returned object is sqlalchemy.engine.result.ResultProxy.
How do I get the count value from res?
res cannot be accessed by index, but I have figured this out as:
count = None
for i in res:
    count = i[0]
    break
There must be an easier way, right? What is it? I haven't discovered it yet.
Note: The db is a postgres db.
While the other answers work, SQLAlchemy provides a shortcut for scalar queries as ResultProxy.scalar():
count = db.engine.execute('select count(id) from sometable').scalar()
scalar() fetches the first column of the first row and closes the result set, or returns None if no row is present. There's also Query.scalar(), if using the Query API.
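For reference, the Query API counterpart might look like this (SomeTable is a hypothetical mapped model for sometable):
from sqlalchemy import func

# func.count(...) builds SELECT count(sometable.id); scalar() returns the value
count = session.query(func.count(SomeTable.id)).scalar()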
What you are asking for is called unpacking; ResultProxy is iterable, so we can do:
# there will be single record
record, = db.engine.execute('select count(id) from sometable')
# this record consist of single value
count, = record
The ResultProxy in SQLAlchemy (as documented here http://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=execute#sqlalchemy.engine.ResultProxy) is an iterable of the rows returned from the database. For a count() query, simply fetch the first row and then index into it to get the first (and only) column:
result = db.engine.execute('select count(id) from sometable')
count = result.first()[0]
If you happened to be using the ORM of SQLAlchemy, I would suggest using the Query.count() method on the appropriate model as shown here: http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=count#sqlalchemy.orm.query.Query.count
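A minimal sketch of that, again assuming a hypothetical SomeTable model mapped to sometable:
# Query.count() wraps the query in SELECT count(*) and returns an int
count = session.query(SomeTable).count()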
The current value of an Entity's status attribute can be queried as the latest entry in an EntityHistory table for that entity, i.e.
Entities (id) <- EntityHistory (timestamp, entity_id, value)
How do I write an efficient SQLAlchemy expression that eagerly loads the current value from the history table for all entities without resulting in N+1 queries?
I tried writing a property for my model, but this generates a query per entity (N+1) when I iterate over them. To my knowledge, there is no way to solve this without a subquery, which still seems inefficient to me on the database side.
Example EntityHistory data:
timestamp | entity_id | value
==========|===========|======
    15:00 |         1 | x
    15:01 |         1 | y
    15:02 |         2 | x
    15:03 |         2 | y
    15:04 |         1 | z
So the current value for entity 1 would be z and for entity 2 it would be y. The backing database is Postgres.
I think you could use a column_property to load the latest value as an attribute of an Entities instance alongside its other column-mapped attributes:
from sqlalchemy import select
from sqlalchemy.orm import column_property
class Entities(Base):
    ...

    value = column_property(
        select([EntityHistory.value]).
        where(EntityHistory.entity_id == id).  # the id column from before
        order_by(EntityHistory.timestamp.desc()).
        limit(1).
        correlate_except(EntityHistory)
    )
A subquery could of course also be used in a query instead of a column_property.
query = session.query(
    Entities,
    session.query(EntityHistory.value).
    filter(EntityHistory.entity_id == Entities.id).
    order_by(EntityHistory.timestamp.desc()).
    limit(1).
    label('value')
)
Performance would naturally depend on a proper index being in place:
Index('entityhistory_entity_id_timestamp_idx',
      EntityHistory.entity_id,
      EntityHistory.timestamp.desc())
In a way this is still your dreaded N+1, as the query uses a subquery per row, but it's hidden in a single round trip to the DB.
If, on the other hand, having value as a property of Entities is not necessary, in PostgreSQL you can join with a DISTINCT ON ... ORDER BY query to fetch the latest values:
# The same index from before speeds this up.
# Remember nullslast(), if timestamp can be NULL.
values = session.query(EntityHistory.entity_id,
                       EntityHistory.value).\
    distinct(EntityHistory.entity_id).\
    order_by(EntityHistory.entity_id, EntityHistory.timestamp.desc()).\
    subquery()

query = session.query(Entities, values.c.value).\
    join(values, values.c.entity_id == Entities.id)
Though in limited testing with dummy data, the subquery-as-output-column always beat the join by a notable margin if every entity had values. On the other hand, if there were millions of entities and a lot of missing history values, then a LEFT JOIN was faster. I'd recommend testing on your own data to see which query suits it better. For random access of a single entity, given that the index is in place, the correlated subquery is faster. For bulk fetches: test.
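For reference, a sketch of the LEFT JOIN variant mentioned above, reusing the values subquery from the previous snippet (entities with no history rows then come back with None as their value):
query = session.query(Entities, values.c.value).\
    outerjoin(values, values.c.entity_id == Entities.id)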
I'm trying to compare the unique numerical id of an element in my database with a list of longs.
My GQL query should return those elements whose id is contained in the list of longs I'm passing.
I've tried using a statement of the form:
"SELECT * FROM Table WHERE id IN :1", list_of_stored_ids
I've also tried using this question: GQL query with numeric id in datastore viewer, but I still can't find any way to compare to a list.
Is there such a way? If not, what must I do?
You will need to build up a list of ndb keys, not numeric ids, in order to get this to work.
e.g.:
ids = [5918782761467904, 5624113645223936, 5463928544952320]
keys = [ndb.Key('<Entity>', id) for id in ids]
entities = ndb.gql("SELECT * FROM <Entity> WHERE __key__ IN :1", keys).fetch()
or (non-GQL version)
entities = ndb.get_multi(keys)
We have been using Cassandra for a while now and we are trying to get a really optimized table going that will be able to quickly query and filter about 100k rows.
Our model looks something like this:
class FailedCDR(Model):
    uuid = columns.UUID(partition_key=True, primary_key=True)
    num_attempts = columns.Integer(index=True)
    datetime = columns.Integer()
If I describe the table, it clearly shows that num_attempts is indexed.
CREATE TABLE cdrs.failed_cdrs (
    uuid uuid PRIMARY KEY,
    datetime int,
    num_attempts int
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX index_failed_cdrs_num_attempts ON cdrs.failed_cdrs (num_attempts);
We want to be able to run a filter similar to this:
failed = FailedCDR.filter(num_attempts__lte=9)
But this happens:
QueryException: Where clauses require either a "=" or "IN" comparison with either a primary key or indexed field
How can we accomplish a similar task?
If you want to do a range query in CQL, you need the field to be a clustering column.
So you'll want the num_attempts field to be a clustering column.
Also if you want to do a single query, you need all the rows you want to query in the same partition (or a small number of partitions that you can access using an IN clause). Since you only have 100K rows, that is small enough to fit in one partition.
So you could define your table like this:
CREATE TABLE test.failed_cdrs (
    partition int,
    num_attempts int,
    uuid uuid,
    datetime int,
    PRIMARY KEY (partition, num_attempts, uuid));
You would insert your data with a constant for the partition key, such as 1.
INSERT INTO failed_cdrs (uuid, datetime, num_attempts, partition)
VALUES ( now(), 123, 5, 1);
Then you can do range queries like this:
SELECT * from failed_cdrs where partition=1 and num_attempts >=8;
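To tie this back to the cqlengine model from the question, a rough sketch of how it could be declared for this layout (field names follow the CREATE TABLE above; treat this as an assumption, not a tested schema):
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model

class FailedCDR(Model):
    __table_name__ = 'failed_cdrs'
    partition = columns.Integer(partition_key=True)
    num_attempts = columns.Integer(primary_key=True)  # clustering column
    uuid = columns.UUID(primary_key=True)             # clustering column
    datetime = columns.Integer()

# The original range filter then works, scoped to the partition:
failed = FailedCDR.filter(partition=1, num_attempts__lte=9)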
The drawback to this method is that to change the value of num_attempts, you need to delete the old row and insert a new row since you are not allowed to update key fields. You could do the delete and insert for that in a batch statement.
A better option that will become available in Cassandra 3.0 is to make a materialized view that has num_attempts as a clustering column, in which case Cassandra would take care of the delete and insert for you when you updated num_attempts in the base table. The 3.0 release is currently in beta testing.