Remove duplicates in the query set - python

I have a queryset like the one below, which holds albums and the songs associated with those albums. The model name is UserSongs:
<QuerySet [{'id': 1, 'album': 'Sri', 'song': 'in the end', 'release': 2017},
{'id': 2, 'album': 'Adi', 'song': 'cafe mocha', 'release': 2016},
{'id': 3, 'album': 'Shr', 'song': 'smooth criminal', 'release': 2016},
{'id': 4, 'album': 'Mouse', 'song': 'trooper', 'release': 2017},
{'id': 5, 'album': 'Mouse', 'song': 'my mobile', 'release': 2015},
{'id': 6, 'album': 'Sri', 'song': 'my office', 'release': 2018},
{'id': 7, 'album': 'Sri', 'song': None, 'release': None},
{'id': 8, 'album': 'Mouse', 'song': None, 'release': None}]>
In the backend, I'm converting the queryset into a list. See the code below:
albums_songs = UserSongs.objects.filter(album__in=['Sri', 'Mouse']).values('album', 'song')
list_albums_songs = list(albums_songs)
I'm sending this list to the front end to display it in a table. Sri and Mouse have multiple entries since they have released multiple songs. On the front end, these songs are displayed in a table with album and song as columns; each item in the queryset is displayed as one row, like this:
Album Songs
Sri in the end
Adi cafe mocha
Adi null
Shr smooth criminal
Mouse trooper
Mouse my mobile
Sri my office
Sri null
Mouse null
But the table also displays the null entries for songs. I don't want to display the null entry for Sri and Mouse only; I do want to display song=null for Adi. I can remove it after converting to a list and iterating over the list, but that is costly. I believe we can do it in the Django query itself: something like, if the album is Sri or Mouse, check for song = null and skip that entry.
Or, after getting the queryset and before converting it into a list, can we remove those items from the queryset?

You can use the isnull lookup:
albums_songs = UserSongs.objects.filter(album__in=['Sri', 'Mouse'], song__isnull=False).values('album', 'song')
EDIT: With the new requirements in your updated question, you should use the exclude method instead:
albums_songs = UserSongs.objects.exclude(album__in=['Sri', 'Mouse'], song__isnull=True).values('album', 'song')
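For comparison, the post-hoc filtering the question calls costly is a single list-comprehension pass over the result. A minimal plain-Python sketch; the rows below are hypothetical stand-ins for list(albums_songs):

```python
# Hypothetical stand-in for list(albums_songs) after .values('album', 'song')
rows = [
    {'album': 'Sri', 'song': 'in the end'},
    {'album': 'Adi', 'song': None},
    {'album': 'Sri', 'song': None},
    {'album': 'Mouse', 'song': None},
]

# Keep a row unless its album is Sri/Mouse AND its song is null --
# the same condition exclude() pushes into SQL.
filtered = [
    r for r in rows
    if not (r['album'] in ('Sri', 'Mouse') and r['song'] is None)
]
print(filtered)
```

Doing it in the database with exclude() is still preferable, since the unwanted rows are never transferred at all.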

Related

Generator using two list of dictionaries

I have two list of dictionaries called Reviewers_dicts and Products_dicts
The keys are the following:
Products_dicts = ['ProductID', 'sku', 'name_title', 'description', 'list_price', 'sale_price', 'category', 'category_tree', 'average_product_rating', 'product_url', 'product_image_urls', 'brand', 'total_number_reviews', 'Reviews', 'Bought With']
Reviewers_dicts = ['Username', 'DOB', 'State', 'Reviewed_ProductID']
I want to write a generator function that takes a username and yields all the reviews the person with that username has written, one at a time.
So far I have tried:
def find_reviews(val):
    for dict_key in reviewers_dicts:
        if dict_key["Username"] == val1:
            if dict_key["Reviewed"] in reviewers_dicts == dict_key["ProductD"]:
                print(products_dicts["Reviews"])
Example entry for
prodcuts_dicts = {'uniq_id': 'b6c0b6bea69c722939585baeac73c13d','total_number_reviews': 8, 'Reviews': [{'User': 'fsdv4141', 'Review': 'You never have to worry about the fit...Alfred Dunner clothing sizes are true to size and fits perfectly. Great value for the money.', 'Score': 2}, {'User': 'krpz1113', 'Review': 'Good quality fabric. Perfect fit. Washed very well no iron.', 'Score': 4}, {'User': 'mbmg3241', 'Review': 'I do not normally wear pants or capris that have an elastic waist, but I decided to try these since they were on sale and I loved the color. I was very surprised at how comfortable they are and wear really well even wearing all day. I will buy this style again!', 'Score': 4}, {'User': 'zeqg1222', 'Review': 'I love these capris! They fit true to size and are so comfortable to wear. I am planning to order more of them.', 'Score': 1}, {'User': 'nvfn3212', 'Review': 'This product is very comfortable and the fabric launders very well', 'Score': 1}, {'User': 'aajh3423', 'Review': 'I did not like the fabric. It is 100% polyester I thought it was different.I bought one at the store apprx two monts ago, and I thought it was just like it', 'Score': 5}, {'User': 'usvp2142', 'Review': 'What a great deal. Beautiful Pants. Its more than I expected.', 'Score': 3}, {'User': 'yemw3321', 'Review': 'Alfred Dunner has great pants, good fit and very comfortable', 'Score': 1}], 'Bought With': ['898e42fe937a33e8ce5e900ca7a4d924', '8c02c262567a2267cd207e35637feb1c', 'b62dd54545cdc1a05d8aaa2d25aed996', '0da4c2dcc8cfa0e71200883b00d22b30', '90c46b841e2eeece992c57071387899c']}
Example entry for
Reviewers_Dicts = [{'Username': 'bkpn1412', 'DOB': '31.07.1983', 'State': 'Oregon', 'Reviewed': ['cea76118f6a9110a893de2b7654319c0']}, {'Username': 'gqjs4414', 'DOB': '27.07.1998', 'State': 'Massachusetts', 'Reviewed': ['fa04fe6c0dd5189f54fe600838da43d3']}]
Ok, assuming your products_dicts (typo fixed) actually is a list of several dicts, this code will do what you want:
def find_reviews(user):
    prod_list = []  # stays empty if the user is not found
    for entry in Reviewers_Dicts:
        if entry['Username'] == user:
            prod_list = entry['Reviewed']
            break
    for prod_id in prod_list:
        for entry in products_dicts:
            if entry['uniq_id'] == prod_id:
                for rev in entry['Reviews']:
                    if rev['User'] == user:
                        yield rev['Review']
                        break
                break
That said, as I commented earlier, your data structure (lists of dicts, applying to products_dicts, Reviewers_Dicts, and Reviews inside products_dicts) is far from optimal, forcing the code to loop through each list to find the relevant entry, and would be better replaced with actual dicts of dicts.
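A sketch of that dict-of-dicts layout; the keys and sample data below are hypothetical. Once each list is indexed by its natural key, the generator no longer needs to scan either list:

```python
# Hypothetical dict-of-dicts versions of products_dicts and Reviewers_Dicts,
# keyed by product id and username respectively.
products_by_id = {
    'p1': {'Reviews': [{'User': 'bkpn1412', 'Review': 'Great fit', 'Score': 4}]},
    'p2': {'Reviews': [{'User': 'bkpn1412', 'Review': 'Washed well', 'Score': 5},
                       {'User': 'gqjs4414', 'Review': 'Too small', 'Score': 2}]},
}
reviewers_by_name = {
    'bkpn1412': {'Reviewed': ['p1', 'p2']},
    'gqjs4414': {'Reviewed': ['p2']},
}

def find_reviews(user):
    # O(1) lookups replace the linear scans over both lists
    for prod_id in reviewers_by_name.get(user, {}).get('Reviewed', []):
        for rev in products_by_id[prod_id]['Reviews']:
            if rev['User'] == user:
                yield rev['Review']

print(list(find_reviews('bkpn1412')))  # -> ['Great fit', 'Washed well']
```

Building the two index dicts is a one-time pass over the original lists; every query after that is proportional only to the number of products the user reviewed.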

Add 'column' to list of dictionary

I have two lists of strings (one of iOS ids, the other of app names).
I'm trying to assign the app name to the iOS id, respectively, when I iterate over a function that scrapes reviews. get_reviews() pulls the data from the App Store using the app ID. I think I'm close, but not quite there yet.
iosid = ['123456', '1324567', etc.]
name = ['Target', 'Facebook', etc.]
data = []
for j in iosid:
    for i in name:
        reviews = get_reviews(j)
        result = [dict(item, app=i) for item in reviews]
        data.append(result)
Example of output:
[{'review_id': '83323473', 'updated': '2022-02-13T19:05:11-07:00', 'title': 'I wish all apps were like this', 'author': 'john_doe', 'author_url': 'https://itunes.apple.com/us/reviews/id3435341', 'version': '2022.5', 'rating': '5', 'review': 'I love the app, super easy to use', 'vote_count': '0', 'page': 1, 'app': 'Target'},
{'review_id': '83323473', 'updated': '2022-02-13T19:05:11-07:00', 'title': 'Facebook changed', 'author': 'jim_doe', 'author_url': 'https://itunes.apple.com/us/reviews/id3234341', 'version': '2021.5', 'rating': '2', 'review': "Super hard to use, don't recommend", 'vote_count': '0', 'page': 1, 'app': 'Facebook'}]
I think you could do it like this, iterating over both lists in lockstep with zip() instead of nesting the loops:
data = []
for app_id, app_name in zip(iosid, name):
    reviews = get_reviews(app_id)  # reviews is a list of dictionaries
    # Add the field `app` with value `app_name` to each dictionary in `reviews`
    result = [dict(item, app=app_name) for item in reviews]
    data.append(result)
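A runnable version of the zip() approach, with get_reviews() stubbed out; the stub and its return shape are hypothetical placeholders for the real App Store call:

```python
# Hypothetical stub standing in for the real App Store scraper
def get_reviews(app_id):
    return [{'review_id': app_id + '-1', 'rating': '5'}]

iosid = ['123456', '1324567']
name = ['Target', 'Facebook']

data = []
# zip() pairs each id with its app name positionally, one pass, no nesting
for app_id, app_name in zip(iosid, name):
    reviews = get_reviews(app_id)
    # attach the matching app name to every review dict
    data.append([dict(item, app=app_name) for item in reviews])

print(data)
```

Note that zip() relies on the two lists being the same length and in matching order; if they can drift apart, a dict mapping id to name is safer.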

Modeling a dictionary as a queryable data object in python

I have a simple book catalog dictionary like the following:
{
    'key': {
        'title': str,
        'authors': [{
            'firstname': str,
            'lastname': str
        }],
        'tags': [str],
        'blob': str
    }
}
Each book is a string key in the dictionary. A book contains a single title, and possibly has many authors (often just one). An author is made of two strings, firstname and lastname. We can also associate many tags with a book, such as novel, literature, art, 1900s, etc. Each book has a blob field that contains additional data (often the book itself). I want to be able to search for a given entry (or a group of them) based on its data, e.g. by author or by tag.
My main workflow would be:
Given a query, return all blob fields associated to each entry.
My question is how to model this, and which libraries or formats to use, keeping the given constraints:
Minimize the number of data objects (preference for a single data object to simplify queries).
Small number of columns (creating a new column for every possible tag is probably insane and leads to a very sparse dataset).
Do not duplicate the blob field (since it can be large).
My first idea was to create multiple rows for each author, for example:
{'123': {'title': 'A sample book',
         'authors': [{'firstname': 'John', 'lastname': 'Smith'},
                     {'firstname': 'Foos', 'lastname': 'M. Bar'}],
         'tags': ['tag1', 'tag2', 'tag3'],
         'blob': '.....'}}
Would initially turn into two rows:

idx  key  Title        authors_firstname  authors_lastname  tags                      blob
0    123  Sample Book  John               Smith             ['tag1', 'tag2', 'tag3']  ...
1    123  Sample Book  Foos               M. Bar            ['tag1', 'tag2', 'tag3']  ...
But this still duplicates the blob, and I still need to figure out what to do with the unknown number of tags (as the database grows).
You can use TinyDB to accomplish what you want.
First, convert your dict to a database:
from tinydb import TinyDB, Query
from tinydb.table import Document
data = [{'123': {'title': 'A sample book',
                 'authors': [{'firstname': 'John', 'lastname': 'Smith'},
                             {'firstname': 'Foos', 'lastname': 'M. Bar'}],
                 'tags': ['tag1', 'tag2', 'tag3'],
                 'blob': 'blob1'}},
        {'456': {'title': 'Another book',
                 'authors': [{'firstname': 'Paul', 'lastname': 'Roben'}],
                 'tags': ['tag1', 'tag3', 'tag4'],
                 'blob': 'blob2'}}]

db = TinyDB('catalog.json')
for record in data:
    # TinyDB doc_ids are integers, so convert the string key
    db.insert(Document(list(record.values())[0], doc_id=int(list(record.keys())[0])))
Now you can make queries:
Book = Query()
Author = Query()
rows = db.search(Book.authors.any(Author.lastname == 'Smith'))
rows = db.search(Book.tags.all(['tag1', 'tag4']))
rows = db.all()
Given a query, return all blob fields associated to each entry.
blobs = {row.doc_id: row['blob'] for row in db.all()}
>>> blobs
{123: 'blob1', 456: 'blob2'}
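If pulling in TinyDB is not an option, the same constraints (one copy of each blob, no per-tag columns) can also be met with plain dicts by building inverted indexes once. A minimal sketch with hypothetical data:

```python
# Single source of truth: the blob is stored exactly once, here.
catalog = {
    '123': {'title': 'A sample book',
            'authors': [{'firstname': 'John', 'lastname': 'Smith'}],
            'tags': ['tag1', 'tag2'],
            'blob': 'blob1'},
    '456': {'title': 'Another book',
            'authors': [{'firstname': 'Paul', 'lastname': 'Roben'}],
            'tags': ['tag1', 'tag3'],
            'blob': 'blob2'},
}

# Inverted indexes: tag -> set of keys, author last name -> set of keys.
by_tag, by_author = {}, {}
for key, book in catalog.items():
    for tag in book['tags']:
        by_tag.setdefault(tag, set()).add(key)
    for author in book['authors']:
        by_author.setdefault(author['lastname'], set()).add(key)

# Query: blob of every book carrying 'tag1' -- only keys are duplicated,
# never the blobs themselves.
blobs = {k: catalog[k]['blob'] for k in by_tag.get('tag1', ())}
print(blobs)
```

The indexes hold only keys, so they stay small even as tags multiply; adding a new tag is just another dict entry, not a new column.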

Merge a specific value from one array of dicts into another if a value matches

How do I merge a specific value from one array of dicts into another array of dicts if a single specific value matches between them?
I have an array of dicts that represent books
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow'}, {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'}, {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else'}]
and I have an array of dicts that represents authors
authors = [{'roles': ['author'], 'profile_picture': None, 'author_id': '123-456-789', 'name': 'Pat'}, {'roles': ['author'], 'profile_picture': None, 'author_id': '999-121-223', 'name': 'May'}]
I want to take the name from authors and add it to the dict in books where the books writer_id matches the authors author_id.
My end result would ideally change the book array of dicts to be (notice the first dict now has the value of 'name': 'Pat' and the second book has 'name': 'May'):
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow', 'name': 'Pat'}, {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'}, {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else', 'name': 'May'}]
My current solution is:
for book in books:
    for author in authors:
        if book['writer_id'] == author['author_id']:
            book['author_name'] = author['name']
And this works. However, the nested statements bother me and feel unwieldy. I also have a number of other such structures, so I end up with a function that has a bunch of code resembling this in it:
for book in books:
    for author in authors:
        if book['writer_id'] == author['author_id']:
            book['author_name'] = author['name']

books_with_foo = []
for book in books:
    for thing in things:
        if something:
            # do something

for blah in books_with_foo:
    for book_foo in otherthing:
        if blah['bar'] == stuff['baz']:
            # etc, etc.
Alternatively, how would you aggregate data from multiple database tables into one thing... some of the data comes back as dicts, some as arrays of dicts?
Pandas is almost definitely going to help you here. Convert your dicts to DataFrames for easier manipulation, then merge them:
import pandas as pd
authors = [{'roles': ['author'], 'profile_picture': None, 'author_id': '123-456-789', 'name': 'Pat'}, {'roles': ['author'], 'profile_picture': None, 'author_id': '999-121-223', 'name': 'May'}]
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow'}, {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'}, {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else'}]
df1 = pd.DataFrame.from_dict(books)
df2 = pd.DataFrame.from_dict(authors)
df1['author_id'] = df1.writer_id
df1 = df1.set_index('author_id')
df2 = df2.set_index('author_id')
result = pd.concat([df1, df2], axis=1)
You may find this page helpful for different ways of combining (merging, concatenating, etc.) separate DataFrames.
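A plain-Python alternative that also removes the nested loops from the question: build a lookup dict from authors once, then make a single pass over books. Sketch using the question's own data:

```python
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow'},
         {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'},
         {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else'}]
authors = [{'roles': ['author'], 'profile_picture': None, 'author_id': '123-456-789', 'name': 'Pat'},
           {'roles': ['author'], 'profile_picture': None, 'author_id': '999-121-223', 'name': 'May'}]

# One pass over authors builds the lookup; one pass over books applies it,
# turning the O(n*m) nested loops into O(n + m).
names = {a['author_id']: a['name'] for a in authors}
for book in books:
    if book['writer_id'] in names:
        book['name'] = names[book['writer_id']]
```

The same lookup-dict pattern covers the other "bunch of code resembling this" cases: one dict comprehension per join key, one loop per target list.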

Updating unique records in mongodb

In my mongo database, date is a unique key. The record to be inserted looks like below, where result contains multiple dictionaries.
{'date': 1435363200, 'result': [{'article_link': u'http://gadgets.ndtv.com/mobiles/reviews/lenovo-k3-note-first-impressions-the-complete-package-708138?trendingnow', 'img': u'http://cdn.ndtv.com/tech/images/gadgets/thumb/lenovo_k3_note_back_ndtv_small.jpg', 'title': u'Lenovo K3 Note First Impressions: The Complete Package?'}, {'article_link': u'http://www.ndtv.com/india-news/rajasthan-chief-minister-vasundhara-raje-to-attend-niti-aayog-meeting-in-delhi-today-775766?trendingnow', 'img': u'http://i.ndtvimg.com/i/2015-06/vasundhara-raje-240-pti-neeti-aayog_240x180_41435392178.jpg', 'title': u'Vasundhara Raje Attends NITI Aayog Meet in Delhi, Returns to Jaipur Without Meeting BJP Leaders'}]}
On the same date there may be multiple inserts with different values in result, sometimes the same as a previous insert.
#result = {'date': 1435363200, 'result': [{'article_link': u'http://gadgets.ndtv.com/mobiles/reviews/lenovo-k3-note-first-impressions-the-complete-package-708138?trendingnow', 'img': u'http://cdn.ndtv.com/tech/images/gadgets/thumb/lenovo_k3_note_back_ndtv_small.jpg', 'title': u'Lenovo K3 Note First Impressions: The Complete Package?'}, {'article_link': u'http://www.ndtv.com/india-news/rajasthan-chief-minister-vasundhara-raje-to-attend-niti-aayog-meeting-in-delhi-today-775766?trendingnow', 'img': u'http://i.ndtvimg.com/i/2015-06/vasundhara-raje-240-pti-neeti-aayog_240x180_41435392178.jpg', 'title': u'Vasundhara Raje Attends NITI Aayog Meet in Delhi, Returns to Jaipur Without Meeting BJP Leaders'}, {'article_link': u'http://www.ndtv.com/india-news/high-commissioner-to-new-zealand-posted-back-to-delhi-after-wife-accused-of-assault-775813?trendingnow', 'img': u'http://i.ndtvimg.com/i/2015-06/ravi-thapar_240x180_81435381614.jpg', 'title': u"High Commissioner to New Zealand 'Posted Back' to Delhi After Wife Accused of Assault"}]}
#result = {'date': 1435363200, 'result': [{'article_link': u'http://gadgets.ndtv.com/mobiles/reviews/lenovo-k3-note-first-impressions-the-complete-package-708138?trendingnow', 'img': u'http://cdn.ndtv.com/tech/images/gadgets/thumb/lenovo_k3_note_back_ndtv_small.jpg', 'title': u'Lenovo K3 Note First Impressions: The Complete Package?'}, {'article_link': u'http://www.ndtv.com/india-news/rajasthan-chief-minister-vasundhara-raje-to-attend-niti-aayog-meeting-in-delhi-today-775766?trendingnow', 'img': u'http://i.ndtvimg.com/i/2015-06/vasundhara-raje-240-pti-neeti-aayog_240x180_41435392178.jpg', 'title': u'Vasundhara Raje Attends NITI Aayog Meet in Delhi, Returns to Jaipur Without Meeting BJP Leaders'}, {'article_link': u'http://profit.ndtv.com/news/economy/article-world-economy-may-be-slipping-into-1930s-depression-raghuram-rajan-775521?pfrom=home-lateststories&trendingnow', 'img': u'http://i.ndtvimg.com/i/2015-06/raghuram-rajan_240x180_61435303075.jpg', 'title': u'Raghuram Rajan Says, World Economy May Be Slipping Into 1930s Depression'}, {'article_link': u'http://www.ndtv.com/diaspora/bobby-jindal-wants-to-get-rid-of-us-supreme-court-775793?trendingnow', 'img': u'http://i.ndtvimg.com/i/2015-06/bobby-jindal-announcement-afp_240x180_41435200334.jpg', 'title': u'Bobby Jindal Wants to Get Rid of US Supreme Court'}, {'article_link': u'http://auto.ndtv.com/news/will-india-go-crazy-over-the-hyundai-creta-775784?trendingnow', 'img': u'http://i.ndtvimg.com/i/2015-06/hyundai-creta_240x180_51433745765.jpg', 'title': u'Will India Go Crazy Over the Hyundai Creta?'}]}
The issue is that on every insert, only new, distinct dictionaries should be appended in the database.
Here is my code, which seems to work; any improvement or better way is much appreciated:
try:
    self.collection.insert(record)
    print "mongo done"
except Exception:
    print 'Failed to save value'
    date = self.getTodayDate()
    data = self.collection.find({'date': record['date']})
    new_key = []
    for val in record['result']:
        # convert input values to unicode, as data fetched from mongo comes in this format
        new_key.append({unicode(key): unicode(value) for key, value in val.items()})
    for key in data:
        # append only new unique values to new_key
        new_key.extend([k for k in key['result'] if k not in new_key])
    self.collection.update(
        {'date': record['date']},
        {'$set': {'result': new_key}},
        True
    )
