I have a Python function that scrapes some data from a few different websites, and I want to save that data into my database only if a certain condition is met. Namely, the scraped data should only be saved if the combination of the location and date fields is unique.
So in my view I have a new location variable and a new date variable, and essentially I just need to test this combination of values against what's already in the database. If the combination is unique, save it. If it's not, do nothing.
class Speech(models.Model):
    location = models.ForeignKey(Location)
    speaker = models.CharField(max_length=100)
    date = models.DateField()
I'm pretty new to Django, so I'm just not sure how to go about executing this sort of database query.
You want a combination of two things. First, you want an inner Meta class to enforce the uniqueness in the database:
class Speech(models.Model):
    location = models.ForeignKey(Location)
    speaker = models.CharField(max_length=100)
    date = models.DateField()

    class Meta:
        unique_together = ('location', 'date')
Then, when you're doing your data manipulation in your view, you want the get_or_create method of the default model manager:
speech, new = Speech.objects.get_or_create(
    location=my_location_string,
    date=my_datetime_variable,
)
if new:
    speech.speaker = my_speaker_string
    speech.save()
I hope that gets you started. As always, you know your needs better than I do, so don't blindly copy this example, but adapt it to your needs.
Documentation:
unique_together
get_or_create
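To make the idea concrete without a Django project, here is a minimal sketch of what the two pieces amount to at the database level, using Python's built-in sqlite3 module instead of the ORM. The table name, column names and the save_if_new helper are hypothetical and only loosely mirror the Speech model; this illustrates the underlying mechanism (a unique constraint plus "insert only if absent"), not Django's actual implementation.

```python
import sqlite3

# Hypothetical table loosely mirroring the Speech model (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE speech (
        id INTEGER PRIMARY KEY,
        location_id INTEGER NOT NULL,
        speaker TEXT NOT NULL DEFAULT '',
        date TEXT NOT NULL,
        UNIQUE (location_id, date)  -- the kind of constraint unique_together asks for
    )
    """
)


def save_if_new(conn, location_id, date, speaker):
    """Insert a row only if (location_id, date) is not already present.

    Returns True when a row was created and False when it already existed.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO speech (location_id, speaker, date) VALUES (?, ?, ?)",
                (location_id, speaker, date),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a duplicate combination.
        return False


first = save_if_new(conn, 1, "2012-05-01", "Alice")
duplicate = save_if_new(conn, 1, "2012-05-01", "Bob")
other_date = save_if_new(conn, 1, "2012-05-02", "Bob")
row_count = conn.execute("SELECT COUNT(*) FROM speech").fetchone()[0]
```

Here the second call is rejected by the constraint, so only two rows end up in the table. The point of having the constraint in the database is that duplicates are refused even if two scrapers run at the same time.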
Django 1.11, Python 2.7, PostgreSQL
I have a set of models that look like this:
class Book(Model):
    released_at = DateTimeField()

class BookPrice(Model):
    price = DecimalField()
    created_at = DateTimeField()
Assuming multiple entries for Book and BookPrice (created at different points in time), I want to get a QuerySet of Book annotated with the BookPrice.price value that was current at the time the Book was released. Something like:
books = Book.objects.annotate(
    old_price=Subquery(
        BookPrice.objects.filter(
            created_at__lt=OuterRef('released_at')
        )
        .order_by('created_at')
        .last()
        .price
    )
)
When I try something like this, I get an error: This queryset contains a reference to an outer query and may only be used in a subquery.
I could get the data with a for loop easily enough, but I'm trying to prepare a large chunk of data for a CSV download and I don't want to iterate through every book if I can help it.
Your problem is that you are calling .last().price. Calling .last() executes the query immediately and returns a Python object, but a query that contains an OuterRef cannot be executed on its own, which is why you get that error.
You should transform your query into something like the following:
from django.db.models import OuterRef, Subquery

# Note the reversed ordering: the most recent price comes first
last_price_before_release_query = BookPrice.objects.filter(created_at__lt=OuterRef('released_at')).order_by('-created_at').values('price')
books = Book.objects.annotate(old_price=Subquery(last_price_before_release_query[:1]))
You can get more information here.
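For readers who think in SQL, the annotation above corresponds roughly to a correlated subquery with a LIMIT 1. The following is a small sketch using the standard sqlite3 module; the table names and sample rows are made up for illustration, and the SQL that Django actually generates for PostgreSQL will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (id INTEGER PRIMARY KEY, released_at TEXT);
    CREATE TABLE book_price (id INTEGER PRIMARY KEY, price REAL, created_at TEXT);
    INSERT INTO book (id, released_at) VALUES (1, '2017-03-01'), (2, '2017-06-01');
    INSERT INTO book_price (price, created_at) VALUES
        (10.0, '2017-01-01'), (12.0, '2017-04-01'), (15.0, '2017-07-01');
    """
)

# One statement for all books: the inner SELECT refers to the outer row
# (b.released_at), which is the role OuterRef plays in the ORM version.
rows = conn.execute(
    """
    SELECT b.id,
           (SELECT p.price
              FROM book_price p
             WHERE p.created_at < b.released_at
             ORDER BY p.created_at DESC
             LIMIT 1) AS old_price
      FROM book b
     ORDER BY b.id
    """
).fetchall()
```

Ordering newest first and taking one row is what replaces .last(): the database picks the price, so nothing has to be evaluated in Python per book.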
I know it's possible to query a model using a reverse related field using the Django ORM. But is it possible to also get all the fields of the reverse related model for which the query matched?
For example, if we have the following models:
class Location(models.Model):
    name = models.CharField(max_length=50)

class Availability(models.Model):
    location = models.ForeignKey(Location, on_delete=models.CASCADE)
    start_datetime = models.DateTimeField()
    end_datetime = models.DateTimeField()
    price = models.PositiveIntegerField()
Would it be possible to find all Locations that are available in a specific timeframe AND also get the price of the Location during that availability? We assume that Availability objects that have the same location cannot have overlapping start/end datetimes.
If start_datetime and end_datetime are supplied by the user, then we could do something like the following:
Location.objects.filter(
    availability__start_datetime__lte=start_datetime,
    availability__end_datetime__gte=end_datetime)
But I'm not sure how to also get the price field for the specific availability that did result in a match for the query.
In raw SQL, the behavior I'm talking about might be achievable via something like this:
SELECT l.id, l.name, a.price
FROM Location l
INNER JOIN Availability a
ON a.location_id = l.id
WHERE /* availability is within user-inputted timeframe */
I've considered using something like prefetch_related('availability_set'), but that would just give me all the availabilities for the Location objects that matched the query. I just want the one availability that was within the timeframe that was queried, and more specifically, the price of that availability.
When you are using an ORM, in general you fetch results from one model class at a time. Since Location and Availability are separate models, you can simply do the following:
availabilities = Availability.objects.filter(
    start_datetime__lte=start_datetime,
    end_datetime__gte=end_datetime)
for availability in availabilities:
    print(availability.location.id, availability.location.name, availability.price)
This is an easy-to-read implementation.
However, accessing the Location from an Availability object (via availability.location) triggers an additional SQL query for each object. You can optimise this using select_related, which the Django documentation describes as follows:
"This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won't require database queries."
Simply add it to your original query, i.e.:
availabilities = Availability.objects.select_related('location').filter(...
This will create an SQL join statement in the background and the Location objects will not require an extra query.
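As a rough illustration of the single joined query described above, here is a sketch using the standard sqlite3 module rather than Django. The table layout, the sample data and the available helper are assumptions made for the example; the SQL produced by select_related will select more columns but has the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE availability (
        id INTEGER PRIMARY KEY,
        location_id INTEGER REFERENCES location(id),
        start_datetime TEXT,
        end_datetime TEXT,
        price INTEGER
    );
    INSERT INTO location (id, name) VALUES (1, 'Hall'), (2, 'Studio');
    INSERT INTO availability (location_id, start_datetime, end_datetime, price) VALUES
        (1, '2020-01-01 00:00', '2020-01-31 23:59', 100),
        (1, '2020-02-01 00:00', '2020-02-28 23:59', 120),
        (2, '2020-01-10 00:00', '2020-01-12 23:59', 80);
    """
)


def available(conn, start, end):
    """Return (location id, location name, price) for availabilities covering [start, end]."""
    return conn.execute(
        """
        SELECT l.id, l.name, a.price
          FROM availability a
          INNER JOIN location l ON l.id = a.location_id
         WHERE a.start_datetime <= ? AND a.end_datetime >= ?
         ORDER BY l.id
        """,
        (start, end),
    ).fetchall()


# ISO-formatted timestamps compare correctly as plain strings.
matches = available(conn, "2020-01-05 00:00", "2020-01-08 00:00")
```

Querying from the Availability side is what makes the price of the matching row available: each result row is one availability together with its location.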
I have two tables, 'Contact' and 'Subscriber'. I want to compare the Contact_id values of both and show only those Contact_id values that are present in Contact but not in Subscriber. These two tables belong to two different models.
Something like this should work:
Contact.objects.exclude(
    id__in=Subscriber.objects.all()
).values_list('id', flat=True)
Note that when a queryset is passed to __in, Django evaluates it as a nested subquery inside a single SQL statement rather than as two separate queries. I'm sure there are ways to optimize it, but this will usually work fine.
Also, values_list has nothing to do with selecting the objects; it only changes the format of what is returned (a list of IDs instead of a queryset of objects, but the same database records in both cases).
If you are excluding by some field other than Subscriber.id (e.g. Subscriber.quasy_id):
Contact.objects.exclude(
    id__in=Subscriber.objects.all().values_list('quasy_id', flat=True)
).values_list('id', flat=True)
Edit:
This answer assumes you don't have a relationship between your Contact and Subscriber models. If you do, then see @navit's answer; it is a better choice.
Edit 2:
That flat=True inside exclude is actually not needed.
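The exclude(id__in=...) pattern boils down to a NOT IN subquery. Below is a small sketch with the standard sqlite3 module; the tables, the contact_id column and the sample rows are assumptions for illustration, since the question does not show the real models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subscriber (id INTEGER PRIMARY KEY, contact_id INTEGER);
    INSERT INTO contact (id, name) VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO subscriber (id, contact_id) VALUES (10, 1), (11, 3), (12, NULL);
    """
)

# Contacts whose id never appears as subscriber.contact_id.
# The IS NOT NULL filter matters: a NULL inside a NOT IN list makes the
# comparison unknown for every row, so the query would return nothing.
missing = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
          FROM contact
         WHERE id NOT IN (
               SELECT contact_id FROM subscriber WHERE contact_id IS NOT NULL
         )
         ORDER BY id
        """
    )
]
```

Only contact 2 has no matching subscriber row, so it is the only id returned.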
I assume you have your model like this:
class Subscriber(models.Model):
    contact = models.ForeignKey(Contact)
You can do what you want like this:
my_list = Subscriber.objects.filter(contact=None)
This retrieves Subscribers which don't have a Contact. Retrieving a list of Contacts is straightforward.
If you want to compare the values of fields in two different tables (connected by a ForeignKey), you can use something like this.
I assume the models look like this:
class Contact(models.Model):
    name = models.TextField()
    family = models.TextField()

class Subscriber(models.Model):
    subscriber_name = models.ForeignKey(Contact, on_delete=models.CASCADE)
    subscriber_family = models.TextField()
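This would be the query:
from django.db.models import F

# F() takes a field path as a string; subscriber_name__family follows the
# ForeignKey to the related Contact and compares its family with subscriber_family.
query = Subscriber.objects.filter(subscriber_family=F('subscriber_name__family'))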
I have a simple to-do list with activities that can be ordered by the user. I use the model List, with a many-to-many field to the model Activities.
Now I need a way to store the user defined ordering of the activities on the list. Should I go with an extra field in my List model to store the order of my activity primary keys like this:
class List(models.Model):
    activities = models.ManyToManyField(Activity)
    order = models.CommaSeparatedIntegerField(max_length=250)
Or should I go with a solution in the Activity model, like described here:
https://djangosnippets.org/snippets/998/
What method can be considered as best practice?
You can create your own many-to-many (through) model that defines the extra order field:
https://docs.djangoproject.com/en/dev/topics/db/models/#extra-fields-on-many-to-many-relationships
Something like:
class ActivityList(models.Model):
    activity = models.ForeignKey(Activity)
    list = models.ForeignKey('List')  # string reference, since List is defined below
    order = models.IntegerField()

class List(models.Model):
    activities = models.ManyToManyField(Activity, through='ActivityList')
Now I need a way to store the user defined ordering of the activities on the list.
Specifying an order field allows you to give each activity an order.
Storing a comma-separated string is definitely not the way to go. In fact, it is one of the biggest anti-patterns in relational databases; see "Is storing a delimited list in a database column really that bad?"
Using a through model lets you query by the order when presenting your to-do list to the user:
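# order lives on the through model, so it is reached via the activitylist relation
for activity in your_list.activities.all().order_by('activitylist__order'):
    print(activity)  # display activity to user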
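To show why the through table is the natural home for the ordering, here is a sketch of the same structure in plain SQL via the standard sqlite3 module. Table names, sample data and the ordered_titles helper are hypothetical; Django would generate its own table names for the models above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE activity (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE todo_list (id INTEGER PRIMARY KEY, name TEXT);
    -- The through table: one row per (list, activity) pair plus its position.
    CREATE TABLE activity_list (
        id INTEGER PRIMARY KEY,
        activity_id INTEGER REFERENCES activity(id),
        list_id INTEGER REFERENCES todo_list(id),
        "order" INTEGER
    );
    INSERT INTO activity (id, title) VALUES (1, 'Write'), (2, 'Test'), (3, 'Ship');
    INSERT INTO todo_list (id, name) VALUES (1, 'Release');
    -- Rows inserted out of order on purpose.
    INSERT INTO activity_list (activity_id, list_id, "order") VALUES
        (3, 1, 3), (1, 1, 1), (2, 1, 2);
    """
)


def ordered_titles(conn, list_id):
    """Return the activity titles of one list, sorted by the stored position."""
    rows = conn.execute(
        """
        SELECT a.title
          FROM activity a
          JOIN activity_list al ON al.activity_id = a.id
         WHERE al.list_id = ?
         ORDER BY al."order"
        """,
        (list_id,),
    )
    return [title for (title,) in rows]


before = ordered_titles(conn, 1)

# Reordering only touches the through table, never the activity itself.
with conn:
    conn.execute(
        'UPDATE activity_list SET "order" = 0 WHERE activity_id = 3 AND list_id = 1'
    )

after = ordered_titles(conn, 1)
```

Because the position is stored per (list, activity) pair, the same activity can sit at different positions in different lists, which a single order column on the activity could not express.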
I have tables called 'has_location' and 'locations'. 'has_location' has the columns user_has and location_id, plus its own id, which is assigned by Django itself.
'locations' has more columns.
Now I want to get all the locations of a certain user. This is what I did (user.id is known):
users_locations_id = has_location.objects.filter(user_has__exact=user.id)
locations = Location.objects.filter(id__in=users_locations_id)
print len(locations)
But this prints 0, even though I have data in the database. I have the feeling that __in does not accept the model's id. Does it?
thanks
Using __in for this kind of query is a common anti-pattern in Django: it's tempting because of its simplicity, but it scales poorly in most databases. See slides 66ff in this presentation by Christophe Pettus.
You have a many-to-many relationship between users and locations, represented by the has_location table. You would normally describe this to Django using a ManyToManyField with a through table, something like this:
class Location(models.Model):
    # ...

class User(models.Model):
    locations = models.ManyToManyField(Location, through='LocationUser')
    # ...

class LocationUser(models.Model):
    location = models.ForeignKey(Location)
    user = models.ForeignKey(User)

    class Meta:
        db_table = 'has_location'
Then you can fetch the locations for a user like this:
user.locations.all()
You can query the locations in your filter operations:
User.objects.filter(locations__name = 'Barcelona')
And you can request that users' related locations be fetched efficiently using the prefetch_related() method on a query set.
You are using has_location's own id to filter locations. You have to use location_ids to filter locations:
user_haslocations = has_location.objects.filter(user_has=user)
locations = Location.objects.filter(id__in=user_haslocations.values('location_id'))
You can also filter the locations directly through the reverse relation:
location = Location.objects.filter(has_location__user_has=user.id)
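The difference between filtering on has_location's own id and on its location_id can be seen in a few lines of SQL. This sketch uses the standard sqlite3 module with made-up rows; the column names follow the question, everything else is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE has_location (
        id INTEGER PRIMARY KEY,
        user_has INTEGER,
        location_id INTEGER
    );
    INSERT INTO locations (id, name) VALUES
        (10, 'Barcelona'), (20, 'Madrid'), (30, 'Sevilla');
    INSERT INTO has_location (id, user_has, location_id) VALUES
        (1, 7, 10), (2, 7, 30), (3, 8, 20);
    """
)

user_id = 7

# What the original code effectively did: compare locations.id against the
# join table's own primary keys (1 and 2), which are unrelated numbers.
wrong = conn.execute(
    "SELECT name FROM locations WHERE id IN "
    "(SELECT id FROM has_location WHERE user_has = ?) ORDER BY id",
    (user_id,),
).fetchall()

# The fix: compare against the location_id column instead.
right = conn.execute(
    "SELECT name FROM locations WHERE id IN "
    "(SELECT location_id FROM has_location WHERE user_has = ?) ORDER BY id",
    (user_id,),
).fetchall()
```

With this data the first query finds nothing, which matches the 0 the asker saw, while the second returns the two locations linked to the user.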
What do your models look like?
To answer your question: yes, __in does accept a queryset of filtered ids.
For your current code, the solution is:
locations = Location.objects.filter(id__in=has_location.objects.filter(user_has=user).values('location_id'))
# If you only need the number of locations, let the database count them:
locations.count()
# If you are going to iterate over the locations afterwards anyway, len()
# evaluates the queryset once and caches the results:
len(locations)