Fetch related IDs and aggregate in a new column - python

My problem comes from a poor understanding of the complexity of my query. Some background first.
This is a car rental and search website, which started as a personal project of mine. I am using Django 2.1 with Postgres.
The setup is as follows: I have a Car model, which has an ID, a category, a car type, an engine, etc. There is also an address table that I use for all sorts of things.
What I would like to do now is the following:
I want to create Google Ads specific .csv files. The file needs a column with aggregated integers in order to show 'related content' to the user. Meaning: you have seen Car A, here is a selection of related or similar cars in that area: Cars K, O and Q.
I don't have a problem generating a CSV from my data; my problem is constructing the query for it. I have the following as a first step:
cars = (
    Car.objects
    .select_related('address')
    .only(
        'id',
        'name',
        'address__city',
        'address__city_area',
        'images',
    )
)
1. select_related joins the address table, because that's where the car is located. It also makes my only() call work, since I want to pull out specific fields. (Select Related Doc Reference)
2. only() gives me just the fields I want, since I don't want to send the whole model anyway. This also works. (Only Doc Reference)
So selecting and getting the correct data is not the problem, but:
The real problem:
The following code should create a column in the table. This column should hold the aggregated IDs of the cars that are in a similar area (same city and city area). Sadly, this is a requirement of the Google Ads setup I use.
from django.contrib.postgres.aggregates import ArrayAgg
from django.contrib.postgres.fields import ArrayField
from django.db import models
from django.db.models import OuterRef, QuerySet, Subquery
def find_similiar_cars_in_same_city(cars: QuerySet):
    """Annotate the QuerySet with a field called similar_cars_ids containing
    a list of ad IDs or None if there are none."""
    similar_cars_queryset = Car.objects.filter(
        address__city=OuterRef('address__city'),
        address__city_area=OuterRef('address__city_area'),
    ).exclude(id=OuterRef('id')).values_list(ArrayAgg('id'), flat=True)
    # Hack: Remove GROUP BY that Django adds when adding ArrayAgg.
    similar_cars_queryset.query.group_by = []
    cars = cars.annotate(similar_cars_ids=Subquery(
        similar_cars_queryset,
        output_field=ArrayField(models.IntegerField())
    ))
    return cars
And this kind of works; it just takes forever. As the comment in the code notes, Django adds a GROUP BY here, which I don't really want. I run everything locally, and even with just 10 cars it takes about 12 seconds. I'm unsure if I'm missing anything. It works, but won't scale to a larger sample size: I ran it against a DB with roughly 14k cars and it never finished.
So to summarize: I want to make a function that creates a column with the aggregated IDs of similar cars.
Does anyone have a pointer on how to make this more efficient? Please ask if there are any questions or if I forgot to mention something!

Unless you're paginating over the results, it might be easier to handle it in Python.
from collections import defaultdict
cars = (
    Car.objects
    .select_related('address')
    .only(
        'id',
        'name',
        'address__city',
        'address__city_area',
        'images',
    )
)
# Map each (city, city_area) pair to the IDs of the cars located there.
cars_in_area_map = defaultdict(set)
for car in cars:
    cars_in_area_map[(car.address.city, car.address.city_area)].add(car.id)
# Generate csv rows, one per car:
data = [
    [
        car.id,
        car.name,
        car.address.city,
        car.address.city_area,
        car.image,
        {other_id for other_id in cars_in_area_map[(car.address.city, car.address.city_area)] if other_id != car.id},
    ]
    for car in cars
]
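To try the grouping idea without a database, here is a minimal sketch on plain tuples. The function name similar_ids_by_area and the (car_id, city, city_area) tuple layout are assumptions made for illustration; they stand in for the Car rows and are not part of the original code.

```python
from collections import defaultdict


def similar_ids_by_area(cars):
    """Map each car ID to the sorted IDs of the other cars in the same area.

    `cars` is an iterable of (car_id, city, city_area) tuples, a hypothetical
    stand-in for the Car rows fetched by the query.
    """
    cars = list(cars)  # the data is walked twice, so materialize it once

    # First pass: collect the car IDs found in each (city, city_area) pair.
    ids_by_area = defaultdict(set)
    for car_id, city, city_area in cars:
        ids_by_area[(city, city_area)].add(car_id)

    # Second pass: every car gets its area's IDs minus its own ID.
    return {
        car_id: sorted(ids_by_area[(city, city_area)] - {car_id})
        for car_id, city, city_area in cars
    }


cars = [
    (1, "Berlin", "Mitte"),
    (2, "Berlin", "Mitte"),
    (3, "Berlin", "Kreuzberg"),
    (4, "Berlin", "Mitte"),
]
similar = similar_ids_by_area(cars)
```

Both passes are linear in the number of cars, which is why this should stay fast where the correlated subquery does not.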
If you need to paginate over them, you could try doing it via address:
data = []
addresses = Address.objects.prefetch_related('car_set')
for address in addresses:
    cars = list(address.car_set.all())
    for car in cars:
        data.append([
            car.id,
            car.name,
            address.city,
            address.city_area,
            car.image,
            # IDs of the other cars at this address (compare IDs, not objects).
            {c.id for c in cars if c.id != car.id},
        ])

Related

Django: Update multiple fields at once

I am fairly new to Django, but it's fun. I built a table with billing information. Let's say the table structure is like this: id, date, price, money received, and so on.
Now, once in a while, I would like to update that table, because everything may have been filled in except for the receipt of the purchase price. So I thought it would be cool to generate an HTML table with all the entries from the db table in it, with input fields in the last column where I could fill in whether there was a payment or not. I do not want to make separate entries for each bill. Instead, I would like to fill in the last column all at once and then just click an "update" button, so that all the rows in the db get updated with the payment information. But generic views like UpdateView seem to apply only to single objects (or data rows, if that is a better name for them).
Could you give me some advice on how to build such an update table in Django?
Best regards
You can bulk create a lot of objects easily.
# many to many example
book1 = Book.objects.create(name='book1')
book2 = Book.objects.create(name='book2')
book3 = Book.objects.create(name='book3')
entry = Entry.objects.get(id=1)
entry.books.add(book1, book2, book3)  # objects must be saved before they can be added
# other example
books = []
for i in range(20):
    books.append(Book(name="blog" + str(i), headline='tagline' + str(i)))
Book.objects.bulk_create(books)

KDB+ query in QPython: Filter based on DataFrame list

I am using qpython to query a KDB+ database and then performing operations on the output. old_df is the output of an earlier qpython sync query and has '[source_id]' as a string column. Now I am querying another database, trades_database, which has the same values (as source_id) under a different column name, customer (also a string, so no data type issues).
params = np.array([])
for i in old_df['source_id']:
    params = np.append(params, np.string_(i))
new_df = q.sync('{[w]select from trades_database where customer in w}', *params, pandas=True)
Unfortunately, there is very little available online on solving such queries. I have learned a fair bit from the questions asked here, but I am really stuck. My list could be very long, so I need to write a query that takes it as an input.
I also tried:
new_df= q1.sync('{select from trades_database where customer in (`1234, `ABCD)}', pandas=True)
which runs, but I get
<qpython.qtype.QLambda object at 0x000000000413F710>
How does one "unpack" a QLambda object?
Please ignore the second question if I am not allowed to ask two questions in the same post. Apologies in that case.
Thanks!
Here is what I did, and it seems to work:
params = np.array(one_id)  # just the initial ID used to search for old_df, without square brackets, so it is not wrapped in a list
for i in old_df['source_id']:
    params = np.append(params, np.string_(i))
params = np.unique(params)
new_df = q1.sync('{[w]select from trades_database where customer in w}', params, pandas=True)

Django filter and get the whole record back when using a .values() column-based annotation

This may be a common query but I've struggled to find an answer. This answer to an earlier question gets me half-way using .annotate() and Count but I can't figure out how then to get the full record for the filtered results.
I'm working with undirected networks and would like to limit the query based on a subset of target nodes.
Sample model:
class Edges(models.Model):
    id = models.AutoField(primary_key=True)
    source = models.BigIntegerField()
    target = models.BigIntegerField()
I want to get a queryset of Edges where the .target exists within a list passed to filter. I then want to exclude any Edges whose source does not occur more than a given number of times (1 in the example below, but this may change).
Here's the query so far (parentheses added just for better legibility):
(Edges.objects.filter(target__in=[1234, 5678, 9012])
    .values('source')
    .annotate(source_count=Count("source"))
    .filter(source_count__gt=1)
)
This query just delivers the source and new source_count fields but I want the whole record (id, source and target) for the subset.
Should I be using this as a subquery or am I missing some obvious Django-foo?
I would suggest either
(Edges.objects.filter(target__in=[1234, 5678, 9012])
    .annotate(source_count=Count('source'))
    .filter(source_count__gt=1)
    .values('id', 'source', 'target', 'source_count')
)
to get only the values of id, source, target and source_count, or
(Edges.objects.filter(target__in=[1234, 5678, 9012])
    .annotate(source_count=Count('source'))
    .filter(source_count__gt=1)
)
to get a QuerySet of Edges instances, on which you not only get the above values but can also call any methods you have defined (this might be database-intensive, though). Note that the source_count filter has to come after annotate(), since the field does not exist before it.
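One thing worth verifying with the suggestion above: annotating Edges rows directly makes Django group by the primary key, so as far as I can tell Count('source') would be 1 for every row. The selection the question asks for can be sketched in plain Python. The (id, source, target) tuples and the function name below are assumptions for illustration only. In the ORM this would presumably become a source__in filter fed by the grouped .values('source') query from the question.

```python
from collections import Counter


def edges_with_repeated_sources(edges, targets, min_count=1):
    """Return full edge records whose source occurs more than `min_count`
    times among the edges that match `targets`.

    `edges` is an iterable of (id, source, target) tuples, a hypothetical
    stand-in for rows of the Edges model.
    """
    # Step 1: keep only edges whose target is in the requested set.
    matching = [edge for edge in edges if edge[2] in targets]
    # Step 2: count how often each source appears among those matches.
    counts = Counter(source for _id, source, _target in matching)
    # Step 3: return whole records, not just the grouped source values.
    return [edge for edge in matching if counts[edge[1]] > min_count]


edges = [
    (1, 10, 1234),
    (2, 10, 5678),
    (3, 20, 1234),
    (4, 30, 9999),
    (5, 10, 9999),
]
result = edges_with_repeated_sources(edges, {1234, 5678, 9012})
```

Here source 10 appears twice among the matching edges, so both of its matching records are returned in full, while source 20 appears only once and is dropped.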

Can I create Dynamic columns (and models) in django?

I want to create a database of dislike items, but depending on the category of the item, there are different columns I'd like to show, for example when all you're looking at is cars. In fact, I'd like the columns to be dynamic based on the category, so we can easily add an additional property to cars in the future and have that column show up too.
For example:
But when you filter on car or person, additional rows show up for filtering.
All the examples that I can find about using django models aren't giving me a very clear picture on how I might accomplish this behavior in a clean, simple web interface.
I would probably go for a model describing a "dislike criterion":
class DislikeElement(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE)  # Item is the model corresponding to your first table
    field_name = models.CharField(max_length=255)  # e.g. "Model", "Year born"...
    value = models.CharField(max_length=255)  # e.g. "Mustang", "1960"...
You would have quite a lot of flexibility in what data you can retrieve. For example, to get all the dislike elements for a given item, you would just have to do something like item.dislikeelement_set.all().
The only problem with this solution is that you would have to store numbers, strings, dates... in value under the same data type. But maybe that's not an issue for you.
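To show how such rows could turn into dynamic columns for display, here is a small sketch in plain Python. The (item_id, field_name, value) tuples stand in for DislikeElement rows, and the function name pivot_dislike_elements is hypothetical.

```python
def pivot_dislike_elements(elements):
    """Turn (item_id, field_name, value) rows into column names plus
    one dict of values per item.

    The tuples are a hypothetical stand-in for DislikeElement rows.
    """
    columns = []  # field names in first-seen order, used as table headers
    rows = {}     # item_id -> {field_name: value}
    for item_id, field_name, value in elements:
        if field_name not in columns:
            columns.append(field_name)
        rows.setdefault(item_id, {})[field_name] = value
    return columns, rows


elements = [
    (1, "Model", "Mustang"),
    (1, "Year", "1965"),
    (2, "Year born", "1960"),
]
columns, rows = pivot_dislike_elements(elements)

# Build a rectangular table; items without a given field get an empty cell.
table = [[rows[item_id].get(col, "") for col in columns] for item_id in sorted(rows)]
```

Adding a new property later only means storing rows with a new field_name; the column list picks it up without any schema change.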

Variable interpolation in python/django, django query filters [duplicate]

Given a class:
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=20)
Is it possible, and if so how, to have a QuerySet that filters based on dynamic arguments? For example:
# Instead of:
Person.objects.filter(name__startswith='B')
# ... and:
Person.objects.filter(name__endswith='B')
# ... is there some way, given:
filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'
# ... that you can run the equivalent of this?
Person.objects.filter(filter_by=filter_value)
# ... which will throw an exception, since `filter_by` is not
# an attribute of `Person`.
Python's argument expansion may be used to solve this problem:
kwargs = {
    '{0}__{1}'.format('name', 'startswith'): 'A',
    '{0}__{1}'.format('name', 'endswith'): 'Z'
}
Person.objects.filter(**kwargs)
This is a very common and useful Python idiom.
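The idiom itself can be checked without Django. In this sketch, build_lookup and fake_filter are hypothetical names: fake_filter only stands in for Person.objects.filter and returns the keyword arguments it receives, to show what the ** expansion actually passes.

```python
def build_lookup(field, operation, value):
    """Build a one-entry dict such as {'name__startswith': 'B'}."""
    return {'{0}__{1}'.format(field, operation): value}


def fake_filter(**kwargs):
    """Hypothetical stand-in for Person.objects.filter; returns what it got."""
    return kwargs


lookup = build_lookup('name', 'startswith', 'B')
# ** expands the dict so that each key becomes a keyword argument name.
received = fake_filter(**lookup)
```

Calling fake_filter(**lookup) is equivalent to writing fake_filter(name__startswith='B') by hand, which is why the same expansion works for the real filter() call.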
A simplified example:
In a Django survey app, I wanted an HTML select list showing registered users. But because we have 5000 registered users, I needed a way to filter that list based on query criteria (such as just people who completed a certain workshop). In order for the survey element to be re-usable, I needed for the person creating the survey question to be able to attach those criteria to that question (don't want to hard-code the query into the app).
The solution I came up with isn't 100% user friendly (requires help from a tech person to create the query) but it does solve the problem. When creating the question, the editor can enter a dictionary into a custom field, e.g.:
{'is_staff':True,'last_name__startswith':'A',}
That string is stored in the database. In the view code, it comes back in as self.question.custom_query. Its value is a string that looks like a dictionary. We turn it back into a real dictionary with eval() and then stuff it into the queryset with **kwargs:
kwargs = eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")
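Since eval() will execute whatever expression is stored in that field, a more cautious variant might use ast.literal_eval from the standard library, which only accepts Python literals. This is a sketch of that swap, not the original author's code; parse_custom_query is a hypothetical helper name.

```python
import ast


def parse_custom_query(text):
    """Parse a stored dictionary literal without executing arbitrary code.

    ast.literal_eval accepts only literals (dicts, strings, numbers,
    booleans, ...) and raises ValueError for anything else, such as calls.
    """
    parsed = ast.literal_eval(text)
    if not isinstance(parsed, dict):
        raise ValueError("custom query must be a dictionary literal")
    return parsed


kwargs = parse_custom_query("{'is_staff':True,'last_name__startswith':'A',}")
```

The resulting kwargs can be passed to filter(**kwargs) exactly as before, while a stored string containing a function call is rejected instead of being run.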
To extend the previous answer, which drew some requests for further code, here is some working code that I am using with Q. Let's say that a request may or may not filter on fields like:
publisher_id
date_from
date_until
These fields can appear in the query, but they may also be missing.
This is how I build filters based on those fields for an aggregated query that cannot be filtered further after the initial queryset execution:
# prepare filters to apply to queryset
filters = {}
if publisher_id:
    filters['publisher_id'] = publisher_id
if date_from:
    filters['metric_date__gte'] = date_from
if date_until:
    filters['metric_date__lte'] = date_until
filter_q = Q(**filters)
queryset = Something.objects.filter(filter_q)...
Hope this helps, since I spent quite some time digging this up.
Edit:
As an additional benefit, you can use lists too. In the previous example, if instead of publisher_id you have a list called publisher_ids, then you could use this piece of code:
if publisher_ids:
    filters['publisher_id__in'] = publisher_ids
django.db.models.Q is exactly what you want, in a Django way.
This looks much more understandable to me:
kwargs = {
    'name__startswith': 'A',
    'name__endswith': 'Z',
    # (Add more filters here)
}
Person.objects.filter(**kwargs)
A really complex search form usually indicates that a simpler model is trying to dig its way out.
How, exactly, do you expect to get the values for the column name and operation?
Where do you get the values of 'name' and 'startswith'?
filter_by = '%s__%s' % ('name', 'startswith')
A "search" form? You're going to -- what? -- pick the name from a list of names? Pick the operation from a list of operations? While open-ended, most people find this confusing and hard-to-use.
How many columns have such filters? 6? 12? 18?
A few? A complex pick-list doesn't make sense. A few fields and a few if-statements make sense.
A large number? Your model doesn't sound right. It sounds like the "field" is actually a key to a row in another table, not a column.
Specific filter buttons. Wait... That's the way the Django admin works. Specific filters are turned into buttons. And the same analysis as above applies. A few filters make sense. A large number of filters usually means a kind of first normal form violation.
A lot of similar fields often means there should have been more rows and fewer fields.
