SQL query in MongoEngine - Python

I have a document and an embedded document, as shown below, and I would like to query the embedded document in MongoEngine. In SQL, this would be:

SELECT A.Nom_PC, B.Intitule FROM Comptes AS A, Vals AS B WHERE B.Num = "some value"
class Vals(EmbeddedDocument):
    Num = StringField()
    Intitule = StringField()

    meta = {'allow_inheritance': True}

class Comptes(Document):
    Nom_PC = StringField()
    PC = ListField(EmbeddedDocumentField(Vals))

    meta = {'allow_inheritance': True}
I've tried some things that didn't work, like:
Comptes.objects(Vals__match={"Num": Num}).aggregate(
    {'$project': {
        'PC': {
            '$filter': {
                'input': '$PC',
                'as': 'Vals',
                'cond': {'$eq': ['$$Vals.Num', Num]}
            }
        }
    }}
)

First off, you really should use
PC = EmbeddedDocumentListField(Vals)
instead of
PC = ListField(EmbeddedDocumentField(Vals))
This is because lists of embedded documents require special handling; EmbeddedDocumentListField also adds query helpers (such as filter and get) on the list that a plain ListField does not provide.
As to the query:
q = Comptes.objects(PC__Num="some value")
This creates a query for all matching Comptes documents. You can then cherry pick whatever data you wish from each document.
(If, in the future, you need to match on multiple items in an EmbeddedDocument, use the match keyword. See the docs for more info.)
For example:
my_list = []
for doc in q:
    for v in doc.PC:
        if v.Num == "some value":
            my_list.append([doc.Nom_PC, v.Intitule])
For further details about the EmbeddedDocumentListField, here is a YouTube video I made: https://www.youtube.com/watch?v=ajwPOyb6VEU&index=6
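Regarding the match keyword mentioned earlier, a minimal sketch (the second field value is an illustrative assumption, not from the question):

# `match` maps to MongoDB's $elemMatch: both conditions must hold on the
# *same* embedded Vals item, unlike two separate PC__... filters.
q = Comptes.objects(PC__match={"Num": "some value", "Intitule": "some label"})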

Related

Flask SQLAlchemy Marshmallow | How do I query all entries between two ID values

I want to be able to query a database and jsonify() the results to send from the server.
My function is supposed to send x posts at a time, incrementally, each time it is called, i.e. sending posts 1 - 10, ..., then posts 31 - 40, and so on.
I have the following query:
q = Post.query.filter(Post.column.between(x, x + 10))
result = posts_schema.dump(q)
return make_response(jsonify(result), 200)  # or would it be ...jsonify(result.data), 200)?
Ideally, it would return something like this:
[
    {
        "id": 1,
        "title": "Title",
        "description": "A descriptive description."
    },
    {
        "id": 2,
        ...
    },
    ...
]
The SQLAlchemy model I am using and the Marshmallow schema:
class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(30))
    content = db.Column(db.String(150))

    def __init__(self, title, description):
        self.title = title
        self.description = description

class PostSchema(ma.Schema):
    class Meta:
        fields = ('id', 'title', 'description')

posts_schema = PostSchema(many=True)
I am new to SQLAlchemy, so I don't know much about querying yet. Another user pointed me in the direction I am in now with the current query, but I don't think it is quite right.
In SQL, I am looking to reproduce the following:
SELECT * FROM Post WHERE id BETWEEN value1 AND value2
To paginate with Flask-SQLAlchemy you would do the following:

# In the view function, collect the page and per_page values
@app.route('/posts/<int:page>/<int:per_page>', methods=['GET'])
def posts(page=1, per_page=30):
    # ... insert other logic here
    posts = Post.query.order_by(Post.id.asc())  # don't forget to order these by ID
    posts = posts.paginate(page=page, per_page=per_page)
    return jsonify({
        'page': page,
        'per_page': per_page,
        'has_next': posts.has_next,
        'has_prev': posts.has_prev,
        'page_list': [iter_page if iter_page else '...' for iter_page in posts.iter_pages()],
        'posts': [{
            'id': p.id,
            'title': p.title,
            'content': p.content
        } for p in posts.items]
    })
On the front end, you would use the page_list, page, per_page, has_next, and has_prev values to help the user choose which page to go to next.
The values you pass in the URL dictate which page is fetched next. This is all handily built into Flask-SQLAlchemy for you, which is another reason it is such a great library.
I found a solution to my question:
Post.query.filter((Post.id >= x) & (Post.id <= (x + 10))).all()
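For the record, the between() construct from the original attempt also works once it is applied to the actual id column rather than Post.column (a small sketch; x is the lower bound as in the question):

# Same inclusive range query, using between() on Post.id.
posts = Post.query.filter(Post.id.between(x, x + 10)).all()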

Python SQLAlchemy query distinct returns list of lists instead of dict

I'm using SQLAlchemy to set up some data models and query them. I have the following table class:
class Transactions(Base):
    __tablename__ = 'simulation_data'

    sender_account = db.Column('sender_account', db.BigInteger)
    recipient_account = db.Column('recipient_account', db.String)
    sender_name = db.Column('sender_name', db.String)
    recipient_name = db.Column('recipient_name', db.String)
    date = db.Column('date', db.DateTime)
    text = db.Column('text', db.String)
    amount = db.Column('amount', db.Float)
    currency = db.Column('currency', db.String)
    transaction_type = db.Column('transaction_type', db.String)
    fraud = db.Column('fraud', db.BigInteger)
    swift_bic = db.Column('swift_bic', db.String)
    recipient_country = db.Column('recipient_country', db.String)
    internal_external = db.Column('internal_external', db.String)
    ID = db.Column('ID', db.BigInteger, primary_key=True)
I'm trying to get distinct row values for columns recipient_country and internal_external using the following script
data = db.query(
    Transactions.recipient_country,
    Transactions.internal_external).distinct()
However, this doesn't retrieve all distinct combinations of these two columns (it neglects values for Transactions.internal_external in this case). Example:
{
    "China": "External",
    "Croatia": "External",
    "Denmark": "Internal",
    "England": "External",
    "Germany": "External",
    "Norway": "External",
    "Portugal": "External",
    "Sweden": "External",
    "Turkey": "External"
}
When I try
data = db.query(
    Transactions.recipient_country,
    Transactions.internal_external).distinct().all()
The correct output is returned; however, it comes out as a list of lists, not a dict. Example:
[["China","External"],["Croatia","External"],["Denmark","External"],["Denmark","Internal"],["England","External"],["Germany","External"],["Norway","External"],["Portugal","External"],["Sweden","External"],["Turkey","External"]]
I'm trying to reproduce the following SQL query:
SELECT DISTINCT
    [recipient_country],
    [internal_external]
FROM [somedb].[dbo].[simulation_data];
I want it to return the data as a dict instead. What am I doing wrong?
The key in a dictionary is always unique, so if the country (China) occurs multiple times - once for internal and once for external - then setting the value the second time will overwrite the first value:
result = {}
result['China'] = 'internal'
result['China'] = 'external'
print(result) # { 'China': 'external' }
You should instead visualise the result of your query as a list of objects (or dictionaries), with each object representing one row. Then you can have something like:
[dict(country="China", internal="internal"), dict(country="China", internal="external"), ...]
Here, country and internal are the column names. You can also get the real column names from the Query object, via query.column_descriptions, before you execute .all().
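A minimal sketch of that approach (column_descriptions is standard SQLAlchemy; the variable names are illustrative):

# Build a list of row dicts keyed by the column names the query itself reports.
query = db.query(
    Transactions.recipient_country,
    Transactions.internal_external).distinct()
names = [col['name'] for col in query.column_descriptions]
rows = [dict(zip(names, row)) for row in query.all()]
# rows == [{'recipient_country': 'China', 'internal_external': 'External'}, ...]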
EDIT: You can also aggregate the values into an array (note that func.array_agg is a PostgreSQL aggregate, so this variant assumes a PostgreSQL backend; import it with from sqlalchemy import func):

query = db.query(
    Transactions.recipient_country,
    func.array_agg(Transactions.internal_external.distinct())
).group_by(Transactions.recipient_country)

data = {country: options for country, options in query}
print(data)  # { 'China': ['internal', 'external'] }
Or you can use "both" as an identifier to show that internal and external are both possible:
query = db.query(
    Transactions.recipient_country,
    Transactions.internal_external
).distinct()

data = {}
for country, option in query:
    if country in data:
        option = 'both'
    data[country] = option
print(data)  # { 'China': 'both' }

SQLAlchemy: update row based on optional fields from REST API

How do I elegantly update DB rows for selected fields that arrive as optional parameters from a REST endpoint, using SQLAlchemy?
Assume there is a user model:
class User(Base):
    __tablename__ = 'user'

    id = Column(u'id', Integer(), primary_key=True)
    name = Column(u'name', String(50), nullable=False)
    address = Column(u'adress', String(50))
    notes = Column(u'notes', String(50))
Example: I have an API that accepts optional parameters to update user data:
Case 1:

{
    "id": 1,
    "name": "nameone",
    "address": "one address"
}

Case 2: here address is omitted and notes is sent instead:

{
    "id": 1,
    "name": "name-1",
    "notes": "test notes"
}
I can update the row using SQLAlchemy if the fields are known.

For Case 1:

User.update().where(User.id == id).values(name="nameone", address="one address")

For Case 2:

User.update().where(User.id == id).values(name="name-1", notes="test notes")
Is there an elegant way to do this instead of writing separate code for each scenario, using the SQLAlchemy ORM?
Use Python to do your logic
data = { "id":1,
"name":"name-1",
"notes":"test notes"
}
user = User.query.filter(User.id == data['id']).first()
for attr, val in data.items():
if not attr == 'id':
setattr(user, attr, val)
Just to be clear, it sounds like you are asking how to write only one update statement for different combinations of fields.
An elegant way is to use a dictionary variable in the values parameter.
For example:
if case1:
    values_dict = {"id": 1, "name": "nameone", "address": "one address"}
else:
    values_dict = {"id": 1, "name": "name-1", "notes": "test notes"}

User.update().where(User.id == id).values(values_dict)
Depending on how your API returns the data, you may or may not need the case logic in this example.
See also https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.update:
"values – Optional dictionary which specifies the SET conditions of the UPDATE."
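Combining both answers, a minimal sketch of a single generic update (this assumes SQLAlchemy 1.4+; patch_user and payload are illustrative names, not from the question):

from sqlalchemy import update

def patch_user(session, payload):
    # Keep only keys that are real columns on the table, dropping 'id'.
    values = {k: v for k, v in payload.items()
              if k != 'id' and k in User.__table__.columns}
    # One UPDATE statement works for any combination of optional fields.
    session.execute(update(User).where(User.id == payload['id']).values(**values))
    session.commit()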

Cosmos DB - Delete Document with Python

In this SO question I learned that I cannot delete a Cosmos DB document using SQL.
Using Python, I believe I need the DeleteDocument() method. This is how I'm getting the document IDs that are (I believe) required to then call the DeleteDocument() method:
# set up the client
client = document_client.DocumentClient()

# use a SQL based query to get a bunch of documents
query = { 'query': 'SELECT * FROM server s' }
result_iterable = client.QueryDocuments('dbs/DB/colls/coll', query, options)
results = list(result_iterable)

for x in range(0, len(results)):
    docID = results[x]['id']
Now, at this stage I want to call DeleteDocument().
Its inputs are document_link and options.
I can define document_link as something like
document_link = 'dbs/DB/colls/coll/docs/'+docID
And successfully call ReadAttachments() for example, which has the same inputs as DeleteDocument().
When I do, however, I get an error:

The partition key supplied in x-ms-partitionkey header has fewer components than defined in the collection
...and now I'm totally lost
UPDATE
Following on from Jay's help, I believe I'm missing the partitionKey element in the options.
In this example, I've created a testing database (the screenshot is omitted here). So I think my partition key is /testPART.
When I include the partitionKey in the options, however, no results are returned (so print len(results) outputs 0).
Removing partitionKey means that results are returned, but the delete attempt fails as before.
# Query them in SQL
query = { 'query': 'SELECT * FROM c' }

options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
options['partitionKey'] = '/testPART'

result_iterable = client.QueryDocuments('dbs/testDB/colls/testCOLL', query, options)
results = list(result_iterable)

# should be > 0
print(len(results))

for x in range(0, len(results)):
    docID = results[x]['id']
    print(docID)
    client.DeleteDocument('dbs/testDB/colls/testCOLL/docs/' + docID, options=options)
    print('deleted', docID)
According to your description, I tried to use the pydocumentdb module to delete a document in my Azure DocumentDB and it works for me.
Here is my code:
import pydocumentdb
import pydocumentdb.document_client as document_client

config = {
    'ENDPOINT': 'Your url',
    'MASTERKEY': 'Your master key',
    'DOCUMENTDB_DATABASE': 'familydb',
    'DOCUMENTDB_COLLECTION': 'familycoll'
}

# Initialize the Python DocumentDB client
client = document_client.DocumentClient(config['ENDPOINT'], {'masterKey': config['MASTERKEY']})

# use a SQL based query to get a bunch of documents
query = { 'query': 'SELECT * FROM server s' }

options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2

result_iterable = client.QueryDocuments('dbs/familydb/colls/familycoll', query, options)
results = list(result_iterable)
print(results)

client.DeleteDocument('dbs/familydb/colls/familycoll/docs/id1', options)
print('delete success')
Console Result:
[{u'_self': u'dbs/hitPAA==/colls/hitPAL3OLgA=/docs/hitPAL3OLgABAAAAAAAAAA==/', u'myJsonArray': [{u'subId': u'sub1', u'val': u'value1'}, {u'subId': u'sub2', u'val': u'value2'}], u'_ts': 1507687788, u'_rid': u'hitPAL3OLgABAAAAAAAAAA==', u'_attachments': u'attachments/', u'_etag': u'"00002100-0000-0000-0000-59dd7d6c0000"', u'id': u'id1'}, {u'_self': u'dbs/hitPAA==/colls/hitPAL3OLgA=/docs/hitPAL3OLgACAAAAAAAAAA==/', u'myJsonArray': [{u'subId': u'sub3', u'val': u'value3'}, {u'subId': u'sub4', u'val': u'value4'}], u'_ts': 1507687809, u'_rid': u'hitPAL3OLgACAAAAAAAAAA==', u'_attachments': u'attachments/', u'_etag': u'"00002200-0000-0000-0000-59dd7d810000"', u'id': u'id2'}]
delete success
Please notice that you need to set the enableCrossPartitionQuery property to True in options if your documents are cross-partitioned.
Must be set to true for any query that requires to be executed across
more than one partition. This is an explicit flag to enable you to
make conscious performance tradeoffs during development time.
You can find the above description here.
Update Answer:
I think you misunderstand the meaning of the partitionKey property in the options.
For example, my container is created like this (screenshot omitted):
My documents are as below:
{
    "id": "1",
    "name": "jay"
}

{
    "id": "2",
    "name": "jay2"
}
My partition key is 'name', so here I have two partitions: 'jay' and 'jay2'.
So here you should set the partitionKey property to 'jay' or 'jay2', not 'name'.
Please modify your code as below:
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
options['partitionKey'] = 'jay'  # please change this in your code

result_iterable = client.QueryDocuments('dbs/db/colls/testcoll', query, options)
results = list(result_iterable)
print(results)
Hope it helps you.
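Tying this back to the delete loop in the question, a hedged sketch: once options['partitionKey'] holds the key's value for the target documents (not its path), the same options dict should also work for DeleteDocument. The partition value below is an assumption:

options['partitionKey'] = 'some-testPART-value'  # the value stored at /testPART
for doc in client.QueryDocuments('dbs/testDB/colls/testCOLL', query, options):
    client.DeleteDocument('dbs/testDB/colls/testCOLL/docs/' + doc['id'], options)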
Using the azure.cosmos library:
Install and import the azure-cosmos package:

from azure.cosmos import exceptions, CosmosClient, PartitionKey

Define a delete-items function - in this case using the partition key in the query:
def deleteItems(deviceid):
    client = CosmosClient(config.cosmos.endpoint, config.cosmos.primarykey)

    # Create the database if it does not exist
    database = client.create_database_if_not_exists(id='azure-cosmos-db-name')

    # Create the container
    # Using a good partition key improves the performance of database operations.
    container = database.create_container_if_not_exists(
        id='container-name',
        partition_key=PartitionKey(path='/your-partition-path'),
        offer_throughput=400)

    # fetch items
    query = f"SELECT * FROM c WHERE c.device.deviceid IN ('{deviceid}')"
    items = list(container.query_items(query=query, enable_cross_partition_query=False))

    for item in items:
        # the second argument is the partition key *value* for this item
        container.delete_item(item, 'partition-key')
Usage:

deviceid = 10
deleteItems(deviceid)
github full example here: https://github.com/eladtpro/python-iothub-cosmos

MongoEngine Query Optimization

I have two collections, ScenarioDrivers and ModelDrivers, which have a one-to-many relationship with each other.
from datetime import datetime

from bson import ObjectId
from mongoengine import (DateTimeField, Document, ListField,
                         ReferenceField, StringField)

class ScenarioDrivers(Document):
    meta = {
        'collection': 'ScenarioDrivers'
    }

    ScenarioId = ReferenceField('ModelScenarios')
    DriverId = ReferenceField('ModelDrivers')
    DriverCalibrationMethod = StringField()
    SegmentName = StringField()
    DriverValue = ListField()
    CalibrationStatus = StringField()
    AdjustedValues = ListField(default=[])
    CreateDate = DateTimeField(default=ObjectId().generation_time)
    LastUpdateDate = DateTimeField(default=datetime.utcnow)  # pass the callable, not its result

class ModelDrivers(Document):
    meta = {
        'collection': 'ModelDrivers'
    }

    PortfolioModelId = ReferenceField('PortfolioModels')
    DriverName = StringField()
    CreateDate = DateTimeField(default=ObjectId().generation_time)
    LastUpdateDate = DateTimeField(default=datetime.utcnow)  # pass the callable, not its result
    FieldFormat = StringField()
    DriverData = ListField()
My query is like this.
class GetCalibratedDrivers(Resource):
    def get(self, scenario_id):
        scenario_drivers_list = []
        scenario_drivers = ScenarioDrivers.objects(ScenarioId=scenario_id).exclude('ScenarioId').select_related(1)
        for scenario_driver in scenario_drivers:
            scenario_driver_dict = {
                'id': str(scenario_driver.id),
                'DriverId': str(scenario_driver.DriverId.id),
                'SegmentName': scenario_driver.SegmentName,
                'CalibrationMethod': scenario_driver.DriverCalibrationMethod,
                'CalibratedValues': exchange(scenario_driver.DriverValue),
                'AdjustedValues': scenario_driver.AdjustedValues,
                'LastUpdateDate': formatted_date(scenario_driver.LastUpdateDate),
                'FieldFormat': scenario_driver.DriverId.FieldFormat
            }
            scenario_drivers_list.append(scenario_driver_dict)
        return {
            'DriverCalibrations': scenario_drivers_list
        }
The query matches 1140 records, and then I construct a dictionary for each and collect them in a list.
But this API call takes 30s to process just 1140 records. Where am I going wrong? Please help. I am using the latest versions of PyMongo and MongoEngine.
I think the problem is not with your query; it is with looping over 1140 records. I do not see any use of referenced objects, so you should consider removing select_related(1). Once you do that, if you want to convert reference object ids to strings, you can use as_pymongo(), which will do that by default for you. And finally, if you must read some data in a specific format, like formatted_date or exchange, it is better to save it as part of your document, i.e. save a FormattedLastUpdateDate alongside LastUpdateDate. In MongoDB, you have to think about your read-specific logic when you save the document.
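A minimal sketch of the as_pymongo() suggestion, reusing the names from the question:

# Return plain dicts straight from PyMongo instead of full MongoEngine
# documents, skipping document construction and reference dereferencing.
raw_drivers = ScenarioDrivers.objects(ScenarioId=scenario_id).exclude('ScenarioId').as_pymongo()
for d in raw_drivers:
    print(d['_id'], d.get('SegmentName'))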

Categories

Resources