GQL query checking numerical id within a list of ids - Python

I'm trying to compare the unique numerical id of an element in my database with a list of longs.
My GQL query should return the elements whose numeric id is contained in the list of longs I'm passing.
I've tried using a statement of the form:
"SELECT * FROM Table WHERE id IN :1", list_of_stored_ids
I've also tried the approach from this question: GQL query with numeric id in datastore viewer, but I still can't find any way to compare against a list.
Is there such a way? If not, what must I do?

You will need to build up a list of ndb keys, not numeric ids, in order to get this to work.
e.g.:
from google.appengine.ext import ndb

# Build full ndb keys from the numeric ids for this entity kind.
ids = [5918782761467904, 5624113645223936, 5463928544952320]
keys = [ndb.Key('<Entity>', id) for id in ids]
entities = ndb.gql("SELECT * FROM <Entity> WHERE __key__ IN :1", keys).fetch()
or (the non-GQL version):
entities = ndb.get_multi(keys)

Related

SQLAlchemy: how to obtain all distinct values from Array field?

I have the following model of a blog post:
title = db.Column(db.String())
content = db.Column(db.String())
tags = db.Column(ARRAY(db.String))
The tags field can be an empty list.
Now I want to select all distinct tags from the database entries with max performance - excluding empty arrays.
So, say I have 3 records with the following values of the tags field:
['database', 'server', 'connection']
[]
['connection', 'security']
The result would be ['database', 'server', 'connection', 'security']
The actual order is not important.
The distinct() method should still work fine with array columns once you unnest the array:
from sqlalchemy import func

# Unnest each tags array into individual rows, then deduplicate.
unique_vals = db.session.query(func.unnest(BlogPost.tags)).distinct().all()
https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=distinct#sqlalchemy.orm.query.Query.distinct
This would be identical to running a query in postgres:
SELECT DISTINCT unnest(tags) FROM blog_posts
If you can process the results afterwards (usually you can) and don't want to use a nested query for this sort of thing, I usually resort to something like:
func.array_agg(func.array_to_string(BlogPost.tags, "||")).label("tag_lists")
and then split on the join string (||) afterwards.
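For example, a minimal sketch of that post-processing approach (assuming the BlogPost model above, a Flask-SQLAlchemy db.session, and the func import from earlier; the final set comprehension is just one way to deduplicate):
# Aggregate every row's tags into a single list of '||'-joined strings.
row = db.session.query(
    func.array_agg(func.array_to_string(BlogPost.tags, "||")).label("tag_lists")
).one()
# Split each joined string back into tags; empty arrays come back as empty
# strings, which the `if t` filter drops.
unique_tags = {t for joined in row.tag_lists for t in joined.split("||") if t}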

Dealing with Arrays in Flask-SqlAlchemy and MySQL

I have a data model where I store a comma-separated list of values (1,2,3,4,5...).
In my code, in order to work with arrays instead of strings, I have defined the model like this:
class MyModel(db.Model):
    pk = db.Column(db.Integer, primary_key=True)
    __fake_array = db.Column(db.String(500), name="fake_array")

    @property
    def fake_array(self):
        if not self.__fake_array:
            return
        return self.__fake_array.split(',')

    @fake_array.setter
    def fake_array(self, value):
        if value:
            self.__fake_array = ",".join(value)
        else:
            self.__fake_array = None
This works perfectly: from the point of view of my source code, fake_array is an array, and it's only transformed into a string when it's stored in the database.
The problem appears when I try to filter by that field. Expressions like this don't work:
MyModel.query.filter_by(fake_array="1").all()
It seems that I can't filter using the SQLAlchemy query model.
What can I do here? Is there any way to filter on this kind of field? Is there a better pattern for the "fake_array" problem?
Thanks!
What you're trying to do should really be replaced with a pair of tables and a relationship between them.
The first table (which I'll call A) contains everything BUT the array column, and it should have a primary key of some sort. You should have another table (which I'll call B) that contains a primary key, a foreign key column to A (which I'll call a_id), and an integer value column.
Using this layout, each row in table A has its associated array in table B, where B.a_id == A.id via a join. You can add or remove values from the array by manipulating rows in table B, and you can filter by using a join; see the sketch below.
If the order of the values is needed, then create an order column in table B.
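A minimal Flask-SQLAlchemy sketch of that layout (the names A, B, value and position are illustrative, not from the original post):
class A(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    values = db.relationship('B', order_by='B.position')

class B(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    a_id = db.Column(db.Integer, db.ForeignKey('a.id'))
    value = db.Column(db.Integer)
    position = db.Column(db.Integer)  # only needed if ordering matters

# Filtering then becomes a join instead of string matching:
A.query.join(B).filter(B.value == 1).all()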

What is the best way to store and query a very large number of variable-length lists in a MySQL database?

Maybe this question will be made clearer through an example. Let's say the dataset I'm working with is a whole bunch (several gigabytes) of variable-length lists of tuples, each associated with a unique ID and a bit of metadata, and I want to be able to quickly retrieve any of these lists by its ID.
I currently have two tables set up more or less like this:
TABLE list(
    id VARCHAR PRIMARY KEY,
    flavor VARCHAR,
    type VARCHAR,
    list_element_start INT,
    list_element_end INT)

TABLE list_element(
    id INT PRIMARY KEY,
    value1 FLOAT,
    value2 FLOAT)
To pull a specific list out of the database I currently do something like this:
SELECT list_element_start, list_element_end FROM list WHERE id = 'my_list_id'
Then I use the retrieved list_element_start and list_element_end values to get the list elements:
SELECT *
FROM list_element
WHERE id BETWEEN(my_list_element_start, my_list_element_end)
Of course, this works very fast, but I feel as though there's a better way to do this. I'm aware that I could add another column to list_element called list_id, and then do something like SELECT * FROM list_element WHERE list_id = 'my_list_id' ORDER BY id. However, it seems to me that the extra column, as well as a foreign key index on that column, would take up a lot of unnecessary space.
Is there a simpler way to do this?
Apologies if this question has been asked before, but I was unable to locate the answer. I'd also like to use SQLAlchemy in Python to do all of this, if possible.
Thanks in advance!
BETWEEN is not a function, so I don't know what you think is going on there. Anyway... why not:
SELECT e.*
FROM list_element e
JOIN list l
  ON e.id BETWEEN l.list_element_start AND l.list_element_end
WHERE l.id = 'my_list_id'
Or am I missing something?
You can normalize each element of your array into a row. The following is the declarative style in SQLAlchemy, which will give you a MyList object with flavor etc., and elements will then be an actual Python list of MyElement objects. You could get more complicated to weed out the extra id and idx within the returned element list, but this should be plenty fast enough.
Also, above, you mixed VARCHAR and INT for your primary keys; not sure if it was just an oversight, but you ought not do that. Additionally, when handling large data sets, remember options like chunking: you can use offset and limit to work with smaller sizes and process iteratively (see the sketch after the class definitions below).
from sqlalchemy import Column, ForeignKey, Integer, String, PrimaryKeyConstraint
from sqlalchemy.orm import relationship

class MyList(Base):
    __tablename__ = 'my_list'
    id = Column(Integer, primary_key=True)
    flavor = Column(String)
    list_type = Column(String)
    # Elements come back ordered by their position within the list.
    elements = relationship('MyElement', order_by='MyElement.idx')

class MyElement(Base):
    __tablename__ = 'my_element'
    id = Column(Integer, ForeignKey('my_list.id'))
    idx = Column(Integer)
    val = Column(Integer)
    __table_args__ = (PrimaryKeyConstraint('id', 'idx'),)
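As a rough illustration of the chunking suggestion above (a sketch only; it assumes a configured session, and some_list_id, the page size, and process() are placeholders):
PAGE_SIZE = 10000
offset = 0
while True:
    # Pull one page of elements for the list, in positional order.
    chunk = (session.query(MyElement)
             .filter(MyElement.id == some_list_id)
             .order_by(MyElement.idx)
             .offset(offset)
             .limit(PAGE_SIZE)
             .all())
    if not chunk:
        break
    process(chunk)  # placeholder for your own per-chunk handling
    offset += PAGE_SIZE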

Obtaining data from PostgreSQL as Dictionary

I have a database table with multiple fields which I am querying and pulling out all data which meets certain parameters. I am using psycopg2 for python with the following syntax:
cur.execute("SELECT * FROM failed_inserts where insertid='%s' AND site_failure=True"%import_id)
failed_sites= cur.fetchall()
This returns the correct values as a list, with the data's integrity and order maintained. However, I want to use the returned list elsewhere in my application, and all I have is this list of values; it is not a dictionary with the field names as keys. Rather than having to do
desiredValue = failed_sites[13]  # where 13 is the arbitrary index of desiredValue
I want to be able to look values up by field name, like:
desiredValue = failed_sites[fieldName]  # where fieldName is the name of the field I'm looking for
Is there a simple and efficient way to do this?
Thank you!
cursor.description will give you the column information (http://www.python.org/dev/peps/pep-0249/#cursor-objects). You can get the column names from it and use them to build a dictionary per row:
cursor.execute('SELECT ...')

# Column names, lower-cased, in result order.
columns = [column[0].lower() for column in cursor.description]

failed_sites = []
for row in cursor:
    site = {}
    for name, value in zip(columns, row):
        if isinstance(value, basestring):
            value = value.strip()
        site[name] = value
    failed_sites.append(site)
The "Dictionary-like cursor", part of psycopg2.extras, seems what you're looking for.

Querying a view in SQLAlchemy

I want to know if SQLAlchemy has problems querying a view. If I query the view with normal SQL on the server, like:
SELECT * FROM ViewMyTable WHERE index1 = '608_56_56';
I get a whole bunch of records. But with SQLAlchemy I get only the first one, even though the count is the correct number. I have no idea why.
This is my SQLAlchemy code.
myQuery = Session.query(ViewMyTable)
erg = myQuery.filter(ViewMyTable.index1 == index1.strip())
# Contains the correct number of all entries I found with that query.
totalCount = erg.count()
# Contains only the first entry I found with my query.
ergListe = erg.all()
If you've mapped ViewMyTable, the query will only return rows that have a fully non-NULL primary key. This behavior is specific to versions 0.5 and lower; in 0.6, if any column of the primary key is non-NULL, the row is turned into an instance. Specify the flag allow_null_pks=True on your mappers to ensure that partial primary keys still count:
mapper(ViewMyTable, myview, allow_null_pks=True)
If, on the other hand, the rows returned have all NULLs for the primary key, then SQLAlchemy cannot create an entity, since it can't place it into the identity map. You can instead get at the individual columns by querying for them specifically:
for id, index in session.query(ViewMyTable.id, ViewMyTable.index):
print id, index
I was facing a similar problem: how to filter a view with SQLAlchemy. For the table:
t_v_full_proposals = Table(
    'v_full_proposals', metadata,
    Column('proposal_id', Integer),
    Column('version', String),
    Column('content', String),
    Column('creator_id', String)
)
I'm filtering with:
proposals = session.query(t_v_full_proposals).filter(t_v_full_proposals.c.creator_id != 'greatest_admin')
Hopefully it will help :)
