Get duplicates in django

I have a name field in my database, and quite a few of the names are duplicates. I want them to be unique. I know I can set unique=True, but that would only help with future entries. I want to find all the current entries with duplicate names. Is there an easy way to print out all the names that are duplicated in the Doctor model?
class Doctor(models.Model):
    name = models.CharField(max_length=1300)

To get rid of all duplicates in your database, you must first ask yourself a question: what should happen to them? Remove them? Merge them somehow? Change the name of each duplicate?
After answering that question, simply write a data migration (with a RunPython operation) that performs the desired operation on each duplicated entry.
To find all duplicated names, group the rows by name and count the rows in each group:
from django.db.models import Count
duplicate_names = Doctor.objects.values('name').annotate(count=Count('id')).order_by().filter(count__gt=1)
That query returns one dictionary per duplicated name (for example, if you have 3 doctors named "who", it returns {'name': 'who', 'count': 3}), and it returns only names that have duplicates. The empty order_by() clears any default ordering, which would otherwise change the grouping. Note that annotating Count('id') without values('name') groups by each row, so every count would be 1 and the filter would match nothing.
Having that, for each duplicated name you can keep the first record (by id) and get the list of its duplicates:
for entry in duplicate_names:
    doctors = Doctor.objects.filter(name=entry['name']).order_by('id')
    first = doctors.first()
    duplicates = doctors.exclude(id=first.id)
And do something with them.
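The grouping idea behind finding duplicates can also be sketched outside Django with the standard-library sqlite3 module. This is only an illustration under assumptions: the table name doctor and the sample rows are made up, and the SQL is a hand-written GROUP BY / HAVING query, not the SQL that Django generates.

```python
import sqlite3

# In-memory database with a hypothetical "doctor" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO doctor (name) VALUES (?)",
    [("who",), ("who",), ("house",), ("who",), ("strange",), ("house",)],
)

# Group rows by name and keep only the groups with more than one row.
duplicate_names = conn.execute(
    "SELECT name, COUNT(*) FROM doctor "
    "GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()

# For each duplicated name, collect the ids of every row except the first.
extra_ids = {}
for name, _count in duplicate_names:
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM doctor WHERE name = ? ORDER BY id", (name,)
        )
    ]
    extra_ids[name] = ids[1:]

print(duplicate_names)
print(extra_ids)
```

The rows listed in extra_ids are the candidates to remove, merge, or rename, depending on what you decided to do with duplicates.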

class Doctor(models.Model):
    name = models.CharField(max_length=1300, unique=True)

Related

Django Query where one field is duplicate and another is different

I want to know if I can create a query where one field is duplicated and another one is different.
Basically, I want to get all UserNames where first_name is the same and user_id is different.
I did this:
UserNames.objects.values("first_name", "user_id").annotate(ct=Count("first_name")).filter(ct__gt=0)
This retrieves a list with all users.
After this, I do some post-processing and create another query where I filter just the users with first_name__in=['aaa'] and user_id__in=[1, 2] to get the users with the same first_name but a different user_id.
Can I do this in just one query, or in a better way?

You can work with a subquery here, but it will not matter much in terms of performance I think:
from django.db.models import Exists, OuterRef, Q
UserNames.objects.filter(
    Exists(UserNames.objects.filter(
        ~Q(user_id=OuterRef('user_id')),
        first_name=OuterRef('first_name')
    ))
)
or prior to django-3.0:
from django.db.models import Exists, OuterRef, Q
UserNames.objects.annotate(
    has_other=Exists(UserNames.objects.filter(
        ~Q(user_id=OuterRef('user_id')),
        first_name=OuterRef('first_name')
    ))
).filter(has_other=True)
We thus retain UserNames objects for which there exists a UserNames object with the same first_name, and with a different user_id.
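To see what such an EXISTS condition does, here is a minimal sketch using the standard-library sqlite3 module. The table name usernames and the sample rows are assumptions made for illustration, and the SQL is written by hand rather than taken from Django's output.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usernames ("
    "id INTEGER PRIMARY KEY, first_name TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO usernames (first_name, user_id) VALUES (?, ?)",
    [("aaa", 1), ("aaa", 2), ("bbb", 3), ("ccc", 4), ("ccc", 4)],
)

# Keep a row only if another row has the same first_name
# and a different user_id.
rows = conn.execute(
    """
    SELECT u.first_name, u.user_id
    FROM usernames AS u
    WHERE EXISTS (
        SELECT 1
        FROM usernames AS other
        WHERE other.first_name = u.first_name
          AND other.user_id <> u.user_id
    )
    ORDER BY u.user_id
    """
).fetchall()

print(rows)
```

The two "ccc" rows are not returned because they share the same user_id, so no row with a different user_id exists for them.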

Django: remove duplicates (group by) from queryset by related model field

I have a QuerySet with a couple of records, and I want to remove duplicates using a related model field. For example:
class User(models.Model):
    group = models.ForeignKey('Group')
    ...
class Address(models.Model):
    ...
    user = models.ForeignKey('User')
addresses = Address.objects.filter(user__group__id=1).order_by('-id')
This returns a QuerySet of Address records, and I want to group by the User ID.
I can't use .annotate because I need all fields from Address, and the relationship between Address and User.
I can't use .distinct() because it doesn't work: all addresses are distinct, and I want distinct user addresses.
I could:
addresses = Address.objects.filter(user__group__id=1).order_by('-id')
unique_users_ids = []
unique_addresses = []
for address in addresses:
    if address.user.id not in unique_users_ids:
        unique_addresses.append(address)
        unique_users_ids.append(address.user.id)
print(unique_addresses)  # TA-DA!
But it seems too much for a simple thing like a group by (damn you Django).
Is there an easy way to achieve this?

By using .distinct() with a field name
Django also has a .distinct(..) function that takes as input the column names that should be unique. Alas, most database systems do not support this (only PostgreSQL, to the best of my knowledge). The order_by() must start with the fields passed to distinct(), so the addresses come back ordered by user, with the latest address for each user. In PostgreSQL we can thus perform:
# Limited number of database systems support this
addresses = (Address.objects
             .filter(user__group__id=1)
             .order_by('user_id', '-id')
             .distinct('user_id'))
By using two queries
Another way to handle this is to first run a query over the users that, for each user, obtains the largest address id. Note that in query lookups the reverse relation is named address, not address_set:
from django.db.models import Max
address_ids = (User.objects
               .annotate(address_id=Max('address__id'))
               .filter(address_id__isnull=False)
               .values_list('address_id'))
So now, for every user, we have calculated the largest corresponding address_id, and we eliminate users that have no address. We then obtain the list of ids.
In a second step, we then fetch the addresses:
addresses = Address.objects.filter(pk__in=address_ids)
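The two-step idea can be sketched with plain SQL through the standard-library sqlite3 module. The table names app_user and app_address, the street column, and the sample rows are all assumptions for illustration, not the schema from the question.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_address (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES app_user(id),
        street TEXT
    );
    INSERT INTO app_user (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO app_address (id, user_id, street) VALUES
        (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C'), (4, 2, 'D');
""")

# Step 1: largest address id per user. The inner join drops users
# that have no address at all (user 3 here).
address_ids = [
    row[0]
    for row in conn.execute(
        "SELECT MAX(a.id) FROM app_user AS u "
        "JOIN app_address AS a ON a.user_id = u.id "
        "GROUP BY u.id ORDER BY u.id"
    )
]

# Step 2: fetch the full address rows for those ids.
placeholders = ", ".join("?" for _ in address_ids)
addresses = conn.execute(
    f"SELECT id, user_id, street FROM app_address "
    f"WHERE id IN ({placeholders}) ORDER BY id DESC",
    address_ids,
).fetchall()

print(address_ids)
print(addresses)
```

Each user with at least one address contributes exactly one row, and that row still carries every address column.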

Have multiple elements in sqlite column

How can I add multiple elements to a single column in a row?
Say I have a column, topic, which can hold arbitrarily many elements:
topics = ['Particle Physics', 'Karaoke', 'jazz']
I have a statement in sqlite:
def UpdateElement(new_user, new_topic):
    new_topic = new_topic + "; "
    querycurs.execute('''UPDATE First_Data SET topic = (?) WHERE user = (?)''', (new_topic, new_user))
However, this allows only one element at a time to exist in the topic column. How can I edit the code so that it adds another given element to the current topic?
If in the table topic = ['Math'], then I could make it into topic = ['Math; Python']. This way I can use a simple Python .split() call to split it.

With a text field you can store anything. You could store the list as a semicolon-delimited string or as a JSON string. You could also pickle the list and store it as a base64 string. The problem with all of these solutions is that you lose a level of access to your data. To count how many users like the topic Jazz, you need to read and split the text field, or use a more complicated LIKE statement.
Since you are using SQL, you may want to consider normalizing your data to include a Topic table, a User table, and a cross-walk table with foreign keys to your users and topics to enforce the many-to-many relationship. While it takes a bit more to set up, it can be simpler to update when user topics change.
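A minimal sketch of that normalized layout, using the standard-library sqlite3 module, might look like this. The table names (users, topics, user_topics) and the helper functions add_topic and count_users_for are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    -- Cross-walk table: one row per (user, topic) pair.
    CREATE TABLE user_topics (
        user_id  INTEGER NOT NULL REFERENCES users(id),
        topic_id INTEGER NOT NULL REFERENCES topics(id),
        PRIMARY KEY (user_id, topic_id)
    );
""")


def add_topic(conn, user, topic):
    """Attach a topic to a user, creating either row if it is missing."""
    conn.execute("INSERT OR IGNORE INTO users (name) VALUES (?)", (user,))
    conn.execute("INSERT OR IGNORE INTO topics (name) VALUES (?)", (topic,))
    conn.execute(
        "INSERT OR IGNORE INTO user_topics (user_id, topic_id) "
        "SELECT u.id, t.id FROM users AS u, topics AS t "
        "WHERE u.name = ? AND t.name = ?",
        (user, topic),
    )


def count_users_for(conn, topic):
    """Count the users linked to the given topic."""
    return conn.execute(
        "SELECT COUNT(*) FROM user_topics AS ut "
        "JOIN topics AS t ON t.id = ut.topic_id WHERE t.name = ?",
        (topic,),
    ).fetchone()[0]


add_topic(conn, "alice", "Jazz")
add_topic(conn, "alice", "Karaoke")
add_topic(conn, "bob", "Jazz")
add_topic(conn, "bob", "Jazz")  # adding the same pair twice is a no-op

jazz_fans = count_users_for(conn, "Jazz")
print(jazz_fans)
```

Adding a topic is then a single insert into the cross-walk table, and counting users per topic needs no string splitting.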

Django - Following ForeignKey relationships "backward" for entire QuerySet

Is it possible to follow ForeignKey relationships backward for an entire QuerySet?
I mean something like this:
x = table1.objects.select_related().filter(name='foo')
x.table2.all()
when table1 has a ForeignKey to table2.
In
https://docs.djangoproject.com/en/1.2/topics/db/queries/#following-relationships-backward
I can see that it works only with get() and not with filter().
Thanks

You basically want to get a QuerySet of a different type from the data you start with.
class Kid(models.Model):
    mom = models.ForeignKey('Mom')
    name = models.CharField…
class Mom(models.Model):
    name = models.CharField…
Let's say you want to get all moms having any son named Johnny.
Mom.objects.filter(kid__name='Johnny')
Let's say you want to get all kids of any Lucy.
Kid.objects.filter(mom__name='Lucy')

You should be able to use something like:
for y in x:
    y.table2.all()
But you could also use get() on each unique value (which will be id, unless you have specified a different one), after finding them using a query.
So,
x = table1.objects.select_related().filter(name='foo')
for y in x:
    z = table1.objects.select_related().get(id=y.id)
    z.table2.all()
should also work.

You can also use values() to fetch specific values through a foreign key reference. With values(), the select query on the DB is reduced to fetch only those values, and the appropriate joins are done.
To reuse the example from Krzysztof Szularz:
johnny_moms = Kid.objects.filter(name='Johnny').values('mom__id', 'mom__name').distinct()
This returns a QuerySet of dictionaries holding the Mom attributes, obtained through the Kid manager.

Remove rows with same ID from two lists in Django

I have a list of objects called checkins that lists all the times a user has checked into something, and another set of objects called flagged_checkins that contains certain checkins the user has flagged. They both reference a third table through a location_id.
I'd like to take the two lists of objects and remove any of the checkins that have a location_id in flagged_checkins.
How do I compare these sets and remove the rows from checkins?

As per this SO question,
checkins.objects.filter(location_id__in=list(flagged_checkins.objects.values_list('location_id', flat=True))).delete()

If you are talking about a queryset, then you can try:
checkins.objects.exclude(location_id__in=list(flagged_checkins.objects.values_list('location_id', flat=True)))
This leaves the matching objects out of the resulting queryset, but it does not delete them from the database.
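If the two collections are already plain Python lists in memory rather than querysets, the same filtering can be sketched without touching the database. The Checkin type and the sample values below are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the model instances (illustration only).
Checkin = namedtuple("Checkin", ["id", "location_id"])

checkins = [Checkin(1, 10), Checkin(2, 20), Checkin(3, 10), Checkin(4, 30)]
flagged_checkins = [Checkin(3, 10), Checkin(9, 30)]

# Build a set of flagged location ids for fast membership tests.
flagged_location_ids = {c.location_id for c in flagged_checkins}

# Keep only the checkins whose location has not been flagged.
remaining = [c for c in checkins if c.location_id not in flagged_location_ids]

print(remaining)
```

Using a set keeps each lookup cheap, which matters when either list is large.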
