In my application I need to get all transactions per day for the last 30 days.
The Transaction model has a currency field, and I want to convert the amount to EUR when the chosen currency is GBP or USD.
models.py
class Transaction(TimeMixIn):
    COMPLETED = 1
    REJECTED = 2
    TRANSACTION_STATUS = (
        (COMPLETED, _('Completed')),
        (REJECTED, _('Rejected')),
    )

    user = models.ForeignKey(CustomUser)
    status = models.SmallIntegerField(choices=TRANSACTION_STATUS, default=COMPLETED)
    amount = models.DecimalField(default=0, decimal_places=2, max_digits=7)
    currency = models.CharField(max_length=3, choices=Core.CURRENCIES, default=Core.CURRENCY_EUR)
Until now this is what I've been using:
Transaction.objects.filter(
    created__gte=last_month, status=Transaction.COMPLETED
).extra(
    {"date": "date_trunc('day', created)"}
).values("date").annotate(amount=Sum("amount"))
which returns a queryset containing dictionaries with date and amount:
<QuerySet [{'date': datetime.datetime(2018, 6, 19, 0, 0, tzinfo=<UTC>), 'amount': Decimal('75.00')}]>
and this is what I tried now:
queryset = Transaction.objects.filter(
    created__gte=last_month, status=Transaction.COMPLETED
).extra(
    {"date": "date_trunc('day', created)"}
).values('date').annotate(
    amount=Sum(Case(
        When(currency=Core.CURRENCY_EUR, then='amount'),
        When(currency=Core.CURRENCY_USD, then=F('amount') * 0.8662),
        When(currency=Core.CURRENCY_GBP, then=F('amount') * 1.1413),
        default=0,
        output_field=FloatField()
    ))
)
which converts GBP and USD to EUR, but it creates three dictionaries for the same day instead of summing them.
This is what it returns: <QuerySet [{'date': datetime.datetime(2018, 6, 19, 0, 0, tzinfo=<UTC>), 'amount': 21.655}, {'date': datetime.datetime(2018, 6, 19, 0, 0, tzinfo=<UTC>), 'amount': 28.5325}, {'date': datetime.datetime(2018, 6, 19, 0, 0, tzinfo=<UTC>), 'amount': 25.0}]>
and this is what I want:
<QuerySet [{'date': datetime.datetime(2018, 6, 19, 0, 0, tzinfo=<UTC>), 'amount': 75.1875}]>
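For reference, the per-day arithmetic I'm after can be sketched in plain Python (conversion rates taken from the query above; the sample rows are illustrative):

```python
# EUR is kept as-is; USD and GBP use the rates from the query above.
RATES = {"EUR": 1.0, "USD": 0.8662, "GBP": 1.1413}

def day_total_eur(rows):
    """Sum (currency, amount) rows for a single day, converted to EUR."""
    return sum(amount * RATES[currency] for currency, amount in rows)

rows = [("USD", 25.00), ("GBP", 25.00), ("EUR", 25.00)]
print(day_total_eur(rows))  # one total for the day, not one per currency
```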
The only thing that remains is an order_by. This will (yeah, I know that sounds strange), force Django to perform a GROUP BY. So it should be rewritten to:
queryset = Transaction.objects.filter(
created__gte=last_month,
status=Transaction.COMPLETED
).extra(
{"date": "date_trunc('day', created)"}
).values(
'date'
).annotate(
amount=Sum(Case(
When(currency=Core.CURRENCY_EUR, then='amount'),
When(currency=Core.CURRENCY_USD, then=F('amount') * 0.8662),
When(currency=Core.CURRENCY_GBP, then=F('amount') * 1.1413),
default=0,
output_field=FloatField()
))
).order_by('date')
(I have fixed the formatting a bit here to make it more readable, especially for small screens, but if we ignore spacing it is the same as in the question, except for the .order_by(..) of course.)
We need to aggregate the queryset to accomplish what you are trying to do.
Try using aggregate():
queryset = Transaction.objects.filter(
    created__gte=last_month, status=Transaction.COMPLETED
).extra(
    {"date": "date_trunc('day', created)"}
).values('date').aggregate(
    amount=Sum(Case(
        When(currency=Core.CURRENCY_EUR, then='amount'),
        When(currency=Core.CURRENCY_USD, then=F('amount') * 0.8662),
        When(currency=Core.CURRENCY_GBP, then=F('amount') * 1.1413),
        default=0,
        output_field=FloatField()
    ))
)
For more info, see the aggregate() documentation.
I have created a Model Package Group in SageMaker to store different versions in the Model Registry.
import boto3
model_package = 'risk-model'
sagemaker_boto_client = boto3.client('sagemaker')
sagemaker_boto_client.list_model_packages(ModelPackageGroupName=model_package)["ModelPackageSummaryList"]
>>> [
{'ModelPackageGroupName': 'risk-model',
'ModelPackageVersion': 3,
'ModelPackageArn': 'some_arn_3',
'ModelPackageDescription': 'New Model Version 3',
'CreationTime': datetime.datetime(2022, 4, 5, 15, 9, 3, 800000, tzinfo=tzlocal()),
'ModelPackageStatus': 'Completed',
'ModelApprovalStatus': 'PendingManualApproval'},
{'ModelPackageGroupName': 'risk-model',
'ModelPackageVersion': 2,
'ModelPackageArn': 'some_arn_2',
'ModelPackageDescription': 'New Model Version 2',
'CreationTime': datetime.datetime(2022, 4, 5, 14, 48, 5, 150000, tzinfo=tzlocal()),
'ModelPackageStatus': 'Completed',
'ModelApprovalStatus': 'PendingManualApproval'},
{'ModelPackageGroupName': 'risk-model',
'ModelPackageVersion': 1,
'ModelPackageArn': 'some_arn_1',
'ModelPackageDescription': 'New Model Version 1',
'CreationTime': datetime.datetime(2022, 4, 4, 23, 10, 38, 516000, tzinfo=tzlocal()),
'ModelPackageStatus': 'Completed',
'ModelApprovalStatus': 'Approved'}]
When I want to delete the Model Package Group
sagemaker_boto_client.delete_model_package_group(
ModelPackageGroupName='risk-model'
)
I got the following error
An error occurred (ValidationException) when calling the DeleteModelPackageGroup operation: Model Package Group cannot be deleted because it still contains Model Packages.
You would need to first delete all the Model Packages in the Model Package Group.
Please see the delete_model_package() boto3 API.
Based on @Marc K's suggestion, the following code snippet empties and deletes a Model Package Group using boto3:
import boto3
import time

def empty_and_delete_model_package(sagemaker_client, mpg_name):
    mpg = sagemaker_client.list_model_packages(
        ModelPackageGroupName=mpg_name,
    )
    # Delete model packages if the group is not empty
    model_packages = mpg.get('ModelPackageSummaryList')
    if model_packages:
        for mp in model_packages:
            sagemaker_client.delete_model_package(
                ModelPackageName=mp['ModelPackageArn']
            )
            time.sleep(1)
    # Delete the model package group
    sagemaker_client.delete_model_package_group(
        ModelPackageGroupName=mpg_name
    )

sagemaker_client = boto3.client('sagemaker')
model_package = 'risk-fraud-model'
empty_and_delete_model_package(sagemaker_client, model_package)
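Note that list_model_packages returns results a page at a time, so a group with many versions may need pagination. A sketch using boto3's standard get_paginator interface; the function and its name here are illustrative, not part of any API:

```python
def empty_and_delete_group(sagemaker_client, group_name):
    """Delete every package in a model package group, then the group itself."""
    paginator = sagemaker_client.get_paginator('list_model_packages')
    for page in paginator.paginate(ModelPackageGroupName=group_name):
        for mp in page['ModelPackageSummaryList']:
            sagemaker_client.delete_model_package(
                ModelPackageName=mp['ModelPackageArn']
            )
    sagemaker_client.delete_model_package_group(
        ModelPackageGroupName=group_name
    )
```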
I have a model which is like so:
class CPUReading(models.Model):
    host = models.CharField(max_length=256)
    reading = models.IntegerField()
    created = models.DateTimeField(auto_now_add=True)
I am trying to get a result which looks like the following:
{
"host 1": [
{
"created": DateTimeField(...),
"value": 20
},
{
"created": DateTimeField(...),
"value": 40
},
...
],
"host 2": [
{
"created": DateTimeField(...),
"value": 19
},
{
"created": DateTimeField(...),
"value": 10
},
...
]
}
I need it grouped by host and ordered by created.
I have tried a bunch of stuff, including using values() and annotate() to create a GROUP BY statement, but I think I must be missing something: GROUP BY seems to require an aggregation function, which I don't really want. I need the actual values of the reading field, grouped by the host field and ordered by the created field.
This is more-or-less how any charting library needs the data.
I know I can make it happen with either python code or with raw sql queries, but I'd much prefer to use the django ORM, unless it explicitly disallows this sort of query.
As far as I'm aware, there's nothing in the ORM that makes this easy. If you want to do it in the ORM without raw queries, and if you're willing and able to change your data structure, you can solve this mostly in the ORM, with Python code kept to a minimum:
class Host(models.Model):
    pass


class CPUReading(models.Model):
    host = models.ForeignKey(Host, related_name="readings", on_delete=models.CASCADE)
    reading = models.IntegerField()
    created = models.DateTimeField(auto_now_add=True)
With this you can use two queries with fairly clean code:
from collections import defaultdict

results = defaultdict(list)
hosts = Host.objects.prefetch_related("readings")
for host in hosts:
    for reading in host.readings.all():
        results[host.id].append(
            {"created": reading.created, "value": reading.reading}
        )
Or you can do it a little more efficiently with one query and a single loop:
from collections import defaultdict

results = defaultdict(list)
readings = CPUReading.objects.select_related("host")
for reading in readings:
    results[reading.host.id].append(
        {"created": reading.created, "value": reading.reading}
    )
Assuming you are using PostgreSQL, you can use a combination of array_agg and json_object to achieve what you're after.
from django.contrib.postgres.aggregates import ArrayAgg
from django.contrib.postgres.fields import ArrayField, JSONField
from django.db.models import CharField
from django.db.models.expressions import Func, Value


class JSONObject(Func):
    function = 'json_object'
    output_field = JSONField()

    def __init__(self, **fields):
        fields, expressions = zip(*fields.items())
        super().__init__(
            Value(fields, output_field=ArrayField(CharField())),
            Func(*expressions, template='array[%(expressions)s]'),
        )
readings = dict(CPUReading.objects.values_list(
    'host',
    ArrayAgg(
        JSONObject(
            created='created',
            value='reading',
        ),
        ordering='created',
    ),
))
If you want to stay close to the Django ORM, just remember this doesn't return a queryset but a dictionary, and is evaluated eagerly, so don't use it in declarative scope. The interface is similar to QuerySet.values(), with the additional requirement that the queryset must be ordered by the key field first.
import itertools


class PlotQuerySet(models.QuerySet):
    def grouped_values(self, key_field, *fields, **expressions):
        if key_field not in fields:
            fields += (key_field,)
        values = self.values(*fields, **expressions)
        data = {}
        for key, gen in itertools.groupby(values, lambda x: x.pop(key_field)):
            data[key] = list(gen)
        return data
PlotManager = models.Manager.from_queryset(PlotQuerySet, class_name='PlotManager')
class CpuReading(models.Model):
    host = models.CharField(max_length=255)
    reading = models.IntegerField()
    created_at = models.DateTimeField(auto_now_add=True)

    objects = PlotManager()
Example:
CpuReading.objects.order_by(
'host', 'created_at'
).grouped_values(
'host', 'created_at', 'reading'
)
Out[10]:
{'a': [{'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 215005, tzinfo=<UTC>),
'reading': 0},
{'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 223080, tzinfo=<UTC>),
'reading': 1},
{'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 230218, tzinfo=<UTC>),
'reading': 2},
...],
'b': [{'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 241476, tzinfo=<UTC>),
'reading': 0},
{'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 242015, tzinfo=<UTC>),
'reading': 1},
{'created_at': datetime.datetime(2020, 7, 13, 16, 45, 23, 242537, tzinfo=<UTC>),
'reading': 2},
...]}
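The grouping in grouped_values relies on itertools.groupby, which only groups adjacent items; that is why the queryset must be ordered by the key field first. A minimal pure-Python illustration of the same idea:

```python
import itertools

def group_rows(rows, key_field):
    """Group already-sorted dict rows by key_field, popping the key out."""
    return {
        key: list(gen)
        for key, gen in itertools.groupby(rows, lambda r: r.pop(key_field))
    }

rows = [
    {"host": "a", "reading": 0},
    {"host": "a", "reading": 1},
    {"host": "b", "reading": 2},
]
grouped = group_rows(rows, "host")
# grouped == {"a": [{"reading": 0}, {"reading": 1}], "b": [{"reading": 2}]}
```

If the input were not sorted by host, groupby would emit a separate group each time the host changed, which is exactly the duplicate-key symptom seen with unordered querysets.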
I have a table containing a student-grade relationship:
Student  Grade  StartDate   EndDate
1        1      09/01/2009  NULL
2        2      09/01/2010  NULL
2        1      09/01/2009  06/15/2010
I am trying to write a stored procedure that takes Student, Grade, and StartDate, and I would like it to
check to make sure these values are not duplicates
insert the record if it's not a duplicate
if there is an existing student record with EndDate = NULL, set that record's EndDate to the StartDate of the new record.
For instance, if I call the procedure and pass in 1, 2, 09/01/2010, I'd like to end up with:
Student  Grade  StartDate   EndDate
1        2      09/01/2010  NULL
1        1      09/01/2009  09/01/2010
2        2      09/01/2010  NULL
2        1      09/01/2009  06/15/2010
This sounds like I could use MERGE, except that I am passing literal values, and I need to perform more than one action. I also have a wicked headache this morning and can't seem to think clearly, so I am fixating on this MERGE solution. If there is a more obvious way to do this, don't be afraid to point it out.
You can use a MERGE even if you are passing literal values. Here's an example for your issue:
CREATE PROCEDURE InsertStudentGrade(@Student INT, @Grade INT, @StartDate DATE)
AS
BEGIN;
    MERGE StudentGrade AS tbl
    USING (SELECT @Student AS Student, @Grade AS Grade, @StartDate AS StartDate) AS row
    ON tbl.Student = row.Student AND tbl.Grade = row.Grade
    WHEN NOT MATCHED THEN
        INSERT(Student, Grade, StartDate)
        VALUES(row.Student, row.Grade, row.StartDate)
    WHEN MATCHED AND tbl.EndDate IS NULL AND tbl.StartDate != row.StartDate THEN
        UPDATE SET
            tbl.StartDate = row.StartDate;
END;
I prefer the following; it is cleaner and easier to read and modify.
MERGE Definition.tdSection AS Target
USING
(SELECT *
FROM ( VALUES
( 1, 1, 'Administrator', 1, GETDATE(), NULL, Current_User, GETDATE())
,( 2, 1, 'Admissions', 1, GETDATE(), NULL, Current_User, GETDATE())
,( 3, 1, 'BOM', 1, GETDATE(), NULL, Current_User, GETDATE())
,( 4, 1, 'CRC', 1, GETDATE(), NULL, Current_User, GETDATE())
,( 5, 1, 'ICM', 1, GETDATE(), NULL, Current_User, GETDATE())
,( 6, 1, 'System', 1, GETDATE(), NULL,Current_User, GETDATE())
,( 7, 1, 'Therapy', 1, GETDATE(), NULL, Current_User, GETDATE())
)
AS s (SectionId
,BusinessProcessId
,Description, Sequence
,EffectiveStartDate
,EffectiveEndDate
,ModifiedBy
,ModifiedDateTime)
) AS Source
ON Target.SectionId = Source.SectionId
WHEN NOT MATCHED THEN
INSERT (SectionId
,BusinessProcessId
,Description
,Sequence
,EffectiveStartDate
,EffectiveEndDate
,ModifiedBy
,ModifiedDateTime
)
VALUES (Source.SectionId
,Source.BusinessProcessId
,Source.Description
,Source.Sequence
,Source.EffectiveStartDate
,Source.EffectiveEndDate
,Source.ModifiedBy
,Source.ModifiedDateTime
);
Simply:
--Arrange
CREATE TABLE dbo.Product
(
Id INT IDENTITY PRIMARY KEY,
Name VARCHAR(40),
)
GO
--Act
MERGE INTO dbo.Product AS Target
USING
(
--Here is the trick :)
VALUES
(1, N'Product A'),
(2, N'Product B'),
(3, N'Product C'),
(4, N'Product D')
)
AS
Source
(
Id,
Name
)
ON Target.Id= Source.Id
WHEN NOT MATCHED BY TARGET THEN
INSERT
(
Name
)
VALUES
(
    Source.Name
);
I inherited an old Mongo database. Let's focus on the following two collections (removed most of their content for better readability):
Collection user
db.user.find_one({"email": "user@host.com"})
{'lastUpdate': datetime.datetime(2016, 9, 2, 11, 40, 13, 160000),
'creationTime': datetime.datetime(2016, 6, 23, 7, 19, 10, 6000),
'_id': ObjectId('576b8d6ee4b0a37270b742c7'),
'email': 'user@host.com' }
Collection entry (one user to many entries):
db.entry.find_one({"userId": _id})
{'date_entered': datetime.datetime(2015, 2, 7, 0, 0),
'creationTime': datetime.datetime(2015, 2, 8, 14, 41, 50, 701000),
'lastUpdate': datetime.datetime(2015, 2, 9, 3, 28, 2, 115000),
'_id': ObjectId('54d775aee4b035e584287a42'),
'userId': '576b8d6ee4b0a37270b742c7',
'data': 'test'}
As you can see, there is no DBRef between the two.
What I would like to do is to count the total number of entries, and the number of entries updated after a given date.
To do this I used Python's pymongo library. The code below gets me what I need, but it is painfully slow.
import time
from datetime import datetime

from pymongo import MongoClient

client = MongoClient('mongodb://foobar/')
db = client.userdata

# First I need to fetch all user ids. Otherwise the db cursor will time out after some time.
user_ids = []  # build a list of tuples (email, id)
for user in db.user.find():
    user_ids.append((user['email'], str(user['_id'])))

date = datetime(2016, 1, 1)

for user_id in user_ids:
    email, _id = user_id
    t0 = time.time()
    query = {"userId": _id}
    no_of_all_entries = db.entry.find(query).count()
    query = {"userId": _id, "lastUpdate": {"$gte": date}}
    no_of_entries_this_year = db.entry.find(query).count()
    t1 = time.time()
    print("delay ", round(t1 - t0, 2))
    print(email, no_of_all_entries, no_of_entries_this_year)
It takes around 0.83 seconds to run both db.entry.find queries on my laptop, and 0.54 on an AWS server (not the MongoDB server).
With ~20000 users, it takes a painful 3 hours to get all the data.
Is that the kind of latency you'd expect to see in Mongo ? What can I do to improve this ? Bear in mind that MongoDB is fairly new to me.
Instead of running the two counts for each user separately, you can get both aggregates for all users at once with db.collection.aggregate().
And instead of (email, userId) tuples, we build a dictionary, as it is easier to use for looking up the corresponding email.
user_emails = {str(user['_id']): user['email'] for user in db.user.find()}
date = datetime(2016, 1, 1)
entry_counts = db.entry.aggregate([
{"$group": {
"_id": "$userId",
"count": {"$sum": 1},
"count_this_year": {
"$sum": {
"$cond": [{"$gte": ["$lastUpdate", date]}, 1, 0]
}
}
}}
])
for entry in entry_counts:
print(user_emails.get(entry['_id']),
entry['count'],
entry['count_this_year'])
I'm pretty sure getting the user's email address into the result could be done but I'm not a mongo expert either.
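For clarity, the $cond inside $sum is just a conditional count. The equivalent per-user tally in plain Python would be something like this (field names from the entry documents above; the sample data is illustrative):

```python
from datetime import datetime

def tally(entries, cutoff):
    """Count total entries and entries with lastUpdate >= cutoff, per userId."""
    counts = {}
    for e in entries:
        total, recent = counts.get(e["userId"], (0, 0))
        counts[e["userId"]] = (total + 1, recent + (e["lastUpdate"] >= cutoff))
    return counts

entries = [
    {"userId": "u1", "lastUpdate": datetime(2016, 3, 1)},
    {"userId": "u1", "lastUpdate": datetime(2015, 3, 1)},
    {"userId": "u2", "lastUpdate": datetime(2016, 6, 1)},
]
print(tally(entries, datetime(2016, 1, 1)))  # {'u1': (2, 1), 'u2': (1, 1)}
```

The aggregation pipeline does this same bookkeeping server-side in a single pass, which is why it is so much faster than one round-trip per user.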
I want to save an array of objects passed from JavaScript through AJAX to my database. This is my view code:
data2 = json.loads(request.raw_get_data)
for i in data2:
    print(key)
    obj = ShoppingCart(quantity = i.quantity , user_id = 3, datetime = datetime.now(), product_id = i.pk)
    obj.save()
return render_to_response("HTML.html", RequestContext(request))
After the first line, I get this in my dictionary:
[{'model': 'Phase_2.product', 'fields': {'name': 'Bata', 'category': 2, 'quantity': 1, 'subcategory': 1, 'count': 2, 'price': 50}, 'imageSource': None, 'pk': 1}]
(Only one object in the array right now)
I want to be able to access individual fields like quantity, id, etc. in order to save the data to my database. When I debug this code, it gives a NameError on 'i'. I also tried accessing the fields like this: data2[0].quantity, but it gives this error: {AttributeError} 'dict' object has no attribute 'quantity'.
Edited code:
for i in data2:
    name = i["fields"]["name"]
    obj = ShoppingCart(quantity = i["fields"]["quantity"] , user_id = 3, datetime = datetime.now(), product_id = i["fields"]["pk"])
    obj.save()
It might help you to visualise the returned dict with proper formatting:
[
{
'model': 'Phase_2.product',
'fields': {
'name': 'Bata',
'category': 2,
'quantity': 1,
'subcategory': 1,
'count': 2,
'price': 50
},
'imageSource': None,
'pk': 1
}
]
The most likely reason for your error is that you are trying to access values of the inner 'fields' dictionary as if they belong to the outer i dictionary.
i.e.
# Incorrect
i["quantity"]
# Gives KeyError
# Correct
i["fields"]["quantity"]
Edit
You have the same problem in your update:
# Incorrect
i["fields"]["pk"]
# Correct
i["pk"]
The "pk" field is in the outer dictionary, not the inner "fields" dictionary.
You may try:
i['fields']['quantity']
json.loads() here returns a list of dictionaries, and each dictionary should be accessed by key.
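Putting it together, a minimal sketch of parsing such a payload and reading both levels (trimmed sample data from the question):

```python
import json

raw = '[{"model": "Phase_2.product", "fields": {"name": "Bata", "quantity": 1}, "pk": 1}]'
data = json.loads(raw)  # a list of dicts, not a single dict

for item in data:
    quantity = item["fields"]["quantity"]  # nested under "fields"
    product_id = item["pk"]                # lives on the outer dict
    print(quantity, product_id)
```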