I'm trying to build a simple search engine for a small website. My initial thought is to avoid using larger packages such as Solr, Haystack, etc. because of the simplistic nature of my search needs.
My hope is that with some guidance I can make my code more pythonic, efficient, and most importantly function properly.
Intended functionality: return product results based on full or partial matches of item_number, product name, or category name (currently no implementation of category matching)
Some code:
import pymssql
import utils #My utilities
class Product(object):
    def __init__(self, item_number, name, description, category, msds):
        self.item_number = str(item_number).strip()
        self.name = name
        self.description = description
        self.category = category
        self.msds = str(msds).strip()

class Category(object):
    def __init__(self, name, categories):
        self.name = name
        self.categories = categories
        self.slug = utils.slugify(name)
        self.products = []

categories = (
    Category('Food', ('123', '12A')),
    Category('Tables', ('354', '35A', '310', '31G')),
    Category('Chemicals', ('845', '85A', '404', '325'))
)
products = []
conn = pymssql.connect(...)
curr = conn.cursor()
# Use a lowercase loop variable so the Category class is not shadowed.
for category in categories:
    for c in category.categories:
        curr.execute('SELECT item_number, name, CAST(description as text), category, msds from tblProducts WHERE category=%s', c)
        for row in curr:
            product = Product(row[0], row[1], row[2], row[3], row[4])
            products.append(product)
            category.products.append(product)
conn.close()
def product_search(*params):
    results = []
    for product in products:
        for param in params:
            name = str(product.name)
            if (name.find(param.capitalize())) != -1:
                results.append(product)
            item_number = str(product.item_number)
            if (item_number.find(param.upper())) != -1:
                results.append(product)
    print results

product_search('something')
MS SQL database with tables and fields I cannot change.
At most I will pull in about 200 products.
Some things that jump out at me. Nested for loops. Two different if statements in the product search which could result in duplicate products being added to the results.
My thought was that if I had the products in memory (the products will rarely change) I could cache them, reducing database dependence and possibly providing an efficient search.
...posting for now... will come back and add more thoughts
Edit:
The reason I have a Category object holding a list of Products is that I want to show HTML pages of Products organized by Category. Also, the actual category numbers may change in the future, and holding a tuple seemed like a simple, painless solution. That, and I have read-only access to the database.
The reason for a separate list of products was somewhat of a cheat. I have a page that shows all products with the ability to view MSDS (safety sheets). Also it provided one less level to traverse while searching.
Edit 2:
def product_search(*params):
    results = []
    lowerParams = [ param.lower() for param in params ]
    for product in products:
        item_number = (str(product.item_number)).lower()
        name = (str(product.name)).lower()
        for param in lowerParams:
            if param in item_number or param in name:
                results.append(product)
    print results
Prepare all variables outside of the loops, and use the in operator instead of .find() if you don't need the position of the substring:
def product_search(*params):
    results = []
    upperParams = [ param.upper() for param in params ]
    for product in products:
        name = str(product.name).upper()
        item_number = str(product.item_number).upper()
        for upperParam in upperParams:
            if upperParam in name or upperParam in item_number:
                results.append(product)
    print results
If both the name and the item number match the search parameters, the product will appear twice in the result list.
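One way to avoid the duplicate entries is to test each product once and append it if any parameter matches any field. The following is a minimal Python 3 sketch, not the poster's code: the namedtuple Product is a hypothetical stand-in for the Product class above, and the product list is passed in as an argument instead of being read from a global.

```python
from collections import namedtuple

# Hypothetical stand-in for the Product class in the question.
Product = namedtuple("Product", ["item_number", "name"])

def product_search(products, *params):
    """Return each product at most once if any parameter matches its name or item number."""
    needles = [p.lower() for p in params]
    results = []
    for product in products:
        fields = (str(product.item_number).lower(), str(product.name).lower())
        # any() stops at the first hit, so a product is appended only once.
        if any(needle in field for needle in needles for field in fields):
            results.append(product)
    return results

catalog = [
    Product("123A", "Table salt"),
    Product("354B", "Folding table"),
    Product("845C", "Bleach"),
]
matches = product_search(catalog, "table", "354")
```

Here "Folding table" matches on both its name and its item number but still appears only once in matches.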
Since the products count is a small number, I recommend constructing a SELECT query like:
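def search(*args):
    # Flatten every Category's tuple of category codes into one list.
    cats = [code for c in categories for code in c.categories]
    # pymssql uses %s placeholders; LIKE wildcards belong in the parameter values, not the SQL.
    query = ("SELECT * FROM tblProducts WHERE category IN (" + ','.join(['%s'] * len(cats)) + ") "
             "AND (name LIKE %s OR CAST(item_number AS TEXT) LIKE %s ...)")
    patterns = ['%' + arg + '%' for arg in args for _ in range(2)]  # one per LIKE clause
    curr.execute(query, tuple(cats + patterns))  # Not actual code
    return list(curr)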
I have the following model structure:
class Chapter:
    name = ...

class Atom:
    name = ...
    chapter = FK(Chapter)

class Question:
    title = ...
    atom = FK(Atom)
I want to get all Questions related to Chapter, grouped by Atoms in this chapter.
I know how to do it with many queries:
atoms = Question.objects.filter(atom__chapter__id=1).values_list("atom", flat=True).distinct()
result = {}
for atom in atoms:
    # Key by the atom's id, not by the literal string "atom".
    result[atom] = Question.objects.filter(atom__id=atom)
I suppose that it could be done with one query using annotate, Subquery and other Django stuff. Is this possible?
You can fetch this in one query, but you need to do the grouping yourself:
from itertools import groupby
from operator import attrgetter

questions = Question.objects.filter(
    atom__chapter_id=1
).order_by('atom')

result = {
    k: list(vs)
    for k, vs in groupby(questions, attrgetter('atom_id'))
}
Here result is a dictionary that maps each atom_id (the primary key of an Atom) to a list of the Question objects with that atom_id.
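The ordering matters because itertools.groupby only groups consecutive items that share a key. A small sketch in plain Python 3, using hypothetical (atom_id, title) tuples in place of Question rows, shows the difference:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (atom_id, question_title) pairs standing in for Question rows.
rows = [(2, "q3"), (1, "q1"), (2, "q4"), (1, "q2")]
key = itemgetter(0)

# Without sorting, the same key shows up in several separate groups.
unsorted_keys = [k for k, _ in groupby(rows, key)]

# Sorting by the grouping key first yields exactly one group per key.
grouped = {
    k: [title for _, title in vs]
    for k, vs in groupby(sorted(rows, key=key), key)
}
```

In the unsorted case a dictionary comprehension would silently keep only the last group for each key, which is why the query is ordered by the same field that is used for grouping.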
Using the models from https://docs.djangoproject.com/en/dev/topics/db/queries/#making-queries with minor modifications:
from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=100)

class Author(models.Model):
    name = models.CharField(max_length=200)
    joined = models.DateField()

    def __str__(self):
        return self.name

class Entry(models.Model):
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    headline = models.CharField(max_length=255)
    authors = models.ManyToManyField(Author)
    rating = models.IntegerField()
I would like to create a dictionary from Author to Entries, where the Author joined this year, and the Entry has a rating of 4 or better. The structure of the resulting dict should look like:
author_entries = {author1: [set of entries], author2: [set of entries], etc.}
while hitting the database less than 3'ish times (or at least not proportional to the number of Authors or Entries).
My first attempt (db hits == number of authors, 100 authors 100 db-hits):
from datetime import date

res = {}
authors = Author.objects.filter(joined__year=date.today().year)
for author in authors:
    res[author] = set(author.entry_set.filter(rating__gte=4))
Second attempt, trying to read entries in one go:
res = {}
authors = Author.objects.filter(joined__year=date.today().year)
entries = Entry.objects.select_related().filter(rating__gte=4, authors__in=authors)
for author in authors:
    res[author] = {e for e in entries if e.authors.filter(pk=author.pk)}
This one is even worse: 100 authors, 198 db-hits (the original second attempt used {e for e in entries if author in e.authors}, but Django wouldn't have it).
The only method I've found involves raw-sql (4 db-hits):
from django.db import connection

res = {}
_authors = Author.objects.filter(joined__year=date.today().year)
_entries = Entry.objects.select_related().filter(rating__gte=4, authors__in=_authors)
authors = {a.id: a for a in _authors}
entries = {e.id: e for e in _entries}
c = connection.cursor()
c.execute("""
    select entry_id, author_id
    from sampleapp_entry_authors
    where author_id in (%s)
""" % ','.join(str(v) for v in authors.keys()))
res = {a: set() for a in _authors}
for eid, aid in c.fetchall():
    if eid in entries:
        res[authors[aid]].add(entries[eid])
(apologies for using string substitutions in the c.execute(..) call -- I couldn't find the syntax sqlite wanted for a where in ? call).
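For reference, a parameterised IN clause can be built by generating one placeholder per value. This is a minimal sketch using Python 3's standard sqlite3 module directly; the table name entry_authors and its rows are made up for illustration and are not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry_authors (entry_id INTEGER, author_id INTEGER)")
conn.executemany(
    "INSERT INTO entry_authors VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 2), (13, 3)],
)

author_ids = [1, 3]
# One "?" per value; only the placeholders are interpolated, never the values.
placeholders = ",".join("?" * len(author_ids))
sql = (
    "SELECT entry_id, author_id FROM entry_authors "
    "WHERE author_id IN (%s) ORDER BY entry_id" % placeholders
)
pairs = conn.execute(sql, author_ids).fetchall()
conn.close()
```

The "?" style belongs to the sqlite3 module itself. Through Django's connection.cursor() the placeholder is written as %s regardless of the backend, so the same idea would use ",".join(["%s"] * len(author_ids)) there.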
Is there a more Djangoesque way to do this?
I've created a git repo with the code I'm using (https://github.com/thebjorn/revm2m), the tests are in https://github.com/thebjorn/revm2m/blob/master/revm2m/sampleapp/tests.py
You can use a Prefetch object for that:
from datetime import date
from django.db.models import Prefetch

good_ratings = Prefetch(
    'entry_set',
    queryset=Entry.objects.filter(rating__gte=4),
    to_attr='good_ratings'
)

authors = Author.objects.filter(
    joined__year=date.today().year
).prefetch_related(
    good_ratings
)
Now the Author objects in authors will have an extra attribute good_ratings (the value of the to_attr of the Prefetch object) that is a preloaded list containing the Entry objects with a rating greater than or equal to four.
So you can post-process these like:
res = {
    author: set(author.good_ratings)
    for author in authors
}
However, since the Author objects (from this QuerySet, not in general) already carry the attribute, building the dictionary is probably of little use.
We are trying to return a list of titles for the Django API, in which the title can have a few keywords.
So for instance, if we use the __icontains lookup to search for "money" and "world" (api.com/?keyword=money&keyword=world) this will return all records that contain money, world or both.
The related SQL statement is:
select * from news
where (news_source = 1 or news_source = 2)
and (news_title like '%money%' or news_title like '%world%')
We are trying to use this code to allow the user to have multiple keywords for the __icontains as well as multiple sources, so the end goal is:
api.com/?keyword=money&keyword=world&source=1&source=2
Our code:
def get_queryset(self):
    queryset = News.objects.all()
    title = self.request.query_params.getlist('title')
    source = self.request.query_params.getlist('source')
    if title:
        queryset = queryset.filter(news_title__icontains=title, news_source__in=source)
    return queryset
The issue is that this is only returning the second keyword if a second keyword is used, and not other keywords prior to what is typed in &keyword=.
You cannot perform an __icontains with a list, but you can, for example, write a function that constructs the logical or of the values in a list:
from django.db.models import Q
from functools import reduce
from operator import or_

def or_fold(list_of_qs):
    # OR all Q objects together; an empty list gives an empty Q(), which filters nothing.
    if list_of_qs:
        return reduce(or_, list_of_qs)
    else:
        return Q()

def unroll_lists_or(qs, **kwargs):
    # Pass a list (not a generator) so the emptiness check in or_fold works,
    # and unpack the Q objects as positional arguments so filter() ANDs them.
    return qs.filter(*[
        or_fold([Q(**{k: vi}) for vi in v])
        for k, v in kwargs.items()
    ])
You can then call unroll_lists_or with a queryset and keyword arguments, where each value should be an iterable (for example a list). It will then perform or-logic between the items of each list, and and-logic between the different keys. An empty iterable is ignored.
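One detail worth checking in a fold like this is how an empty iterable is handled: a generator is always truthy, so an emptiness test only works on a materialised list. The sketch below is plain Python 3 and uses sets in place of Q objects (an assumption made so it runs without Django); the empty parameter is likewise hypothetical and plays the role of Q().

```python
from functools import reduce
from operator import or_

def or_fold(items, empty):
    """OR together all items; return `empty` when there are none."""
    items = list(items)  # materialise so generators can be tested for emptiness
    if items:
        return reduce(or_, items)
    return empty

two = or_fold(({n} for n in (1, 2)), set())
none = or_fold((x for x in ()), set())

# A generator object is truthy even when it will yield nothing.
generator_is_truthy = bool(x for x in ())
```

Without the list() call, the empty generator would pass the truthiness test and reduce would raise a TypeError for an empty sequence with no initial value.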
So we can then write the check as:
unroll_lists_or(queryset, news_title__icontains=title, news_source=source)
In case title contains two items (so title == [title1, title2]), and source contains three items (so source == [source1, source2, source3]), then this will result in:
qs.filter(
    Q(news_title__icontains=title1) | Q(news_title__icontains=title2),
    Q(news_source=source1) | Q(news_source=source2) | Q(news_source=source3)
)
You can however combine it with a .filter(..) for the __in check. For example:
queryset = News.objects.all()
if source:
    queryset = queryset.filter(news_source__in=source)
queryset = unroll_lists_or(queryset, news_title__icontains=title)
I was able to solve this by creating 2 separate functions within the get_queryset() function, which is called when a GET request is made.
from django.db.models import Q

def get_queryset(self):
    queryset = News.objects.all()
    source_list = self.request.query_params.getlist('source')
    keyword_list = self.request.query_params.getlist('title')
    if source_list or keyword_list:
        def create_q_source(*args):
            # OR together one Q object per requested source.
            source = Q()
            for value in args:
                source.add(Q(news_source=value), Q.OR)
            return source

        def create_q_keyword(*args):
            # OR together one Q object per requested keyword.
            keyword = Q()
            for value in args:
                keyword.add(Q(news_title__icontains=value), Q.OR)
            return keyword

        queryset = queryset.filter(create_q_source(*source_list), create_q_keyword(*keyword_list))
    return queryset
Edit:
When you go to the api link and pass in the parameters, filtering will occur based on what is passed in:
http://127.0.0.1:8000/api/notes/?keyword=trump&keyword=beyond&keyword=money&source=1
SQL Equivalent:
select * from news where news_source = 1 AND (news_title like '%trump%' OR news_title like '%beyond%' OR news_title like '%money%')
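Whether the OR conditions are parenthesised matters in SQL, because AND binds more tightly than OR. The sketch below uses Python 3's standard sqlite3 module with a made-up news table (the rows are purely illustrative) to compare the two readings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (news_source INTEGER, news_title TEXT)")
conn.executemany(
    "INSERT INTO news VALUES (?, ?)",
    [
        (1, "Beyond the horizon"),  # right source, matching title
        (2, "Money matters"),       # wrong source, matching title
        (1, "Weather report"),      # right source, no keyword
    ],
)

# AND binds tighter than OR, so this reads as (source AND beyond) OR money.
ungrouped = conn.execute(
    "SELECT news_title FROM news "
    "WHERE news_source = 1 AND news_title LIKE '%beyond%' "
    "OR news_title LIKE '%money%' ORDER BY news_title"
).fetchall()

# Parentheses restrict both keywords to the requested source.
grouped = conn.execute(
    "SELECT news_title FROM news "
    "WHERE news_source = 1 AND (news_title LIKE '%beyond%' "
    "OR news_title LIKE '%money%') ORDER BY news_title"
).fetchall()
conn.close()
```

Passing two Q objects as separate positional arguments to filter() combines them with AND, which corresponds to the parenthesised form.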
I have the following code and I'm trying to update an embedded document in a ListField.
store = store_service.get_store_from_product_id(product_id)
got_product, idx = get_product_from_store(store, product_id)
product = Product()
product.pid = got_product.pid
product.display_name = display_name
product.description = description
product.rank = rank
product.price = price
product.categories = categories
product.properties = properties
store.catalog.products[idx] = product
print store.catalog.products[idx].__unicode__()
store.save()
When I print out my product, it has the correct values, but when I save it, it's not persisting. There are no errors being thrown. Any thoughts on what I could be doing wrong?
store.catalog.products[idx] = product can be applied to a DictField(). For a ListField(), you can try:
store.catalog.products = [product]
or
store.catalog.products.append(product)
And you need to call save on the object:
store.save()
There is the possibility of atomic updates which can help in other cases:
Store.objects(id='123400000').update_one(push__catalog__products=product)
I have a django model that looks something like this:
class Definition(models.Model):
    name = models.CharField(max_length=254)
    text = models.TextField()
If I do the following query:
animal = Definition.objects.get(name='Owl')
and if I have the following definitions with these names in my database:
Elephant, Owl, Zebra, Human
is there a way to do a django query(ies) that will show me the previous and the next Definitions based on the animal object based on alphabetical order of the name field in the model?
I know that there are ways of getting previous/next based on datetime fields, but I am not so sure for this case.
I don't know of any way of doing this in less than three queries.
target = 'Owl'
animal = Definition.objects.get(name=target)
# Descending order puts the closest preceding name first.
previous_animal = Definition.objects.order_by('-name').filter(name__lt=target)[0]
next_animal = Definition.objects.order_by('name').filter(name__gt=target)[0]
If anyone comes across this like I just did, here's my solution. It also loops: on the last item it shows the first item as next, and on the first item it shows the last item as previous.
def get_previous_by_title(self):
    curr_title = self.get_object().title
    queryset = self.my_queryset()
    try:
        prev = queryset.filter(title__lt=curr_title).order_by("-title")[0:1].get()
    except Video.DoesNotExist:
        # No earlier title: wrap around to the last one.
        prev = queryset.order_by("-title")[0:1].get()
    return prev

def get_next_by_title(self):
    curr_title = self.get_object().title
    queryset = self.my_queryset()
    try:
        nxt = queryset.filter(title__gt=curr_title).order_by("title")[0:1].get()
    except Video.DoesNotExist:
        # No later title: wrap around to the first one.
        nxt = queryset.order_by("title")[0:1].get()
    return nxt
I have custom querysets based on user level, so self.my_queryset() could simply be replaced with a normal queryset like Video.objects.all(); I just make a function anywhere I would otherwise repeat code.
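The previous/next logic, including the optional wrap-around, can be sketched without a database. This is an illustrative plain Python 3 version over an in-memory list; the function name neighbours and its wrap flag are hypothetical and not part of any of the code above.

```python
from bisect import bisect_left, bisect_right

def neighbours(names, target, wrap=False):
    """Return (previous, next) of target in alphabetical order, or None at the ends."""
    ordered = sorted(names)
    lo = bisect_left(ordered, target)   # index of the first name >= target
    hi = bisect_right(ordered, target)  # index of the first name > target
    previous = ordered[lo - 1] if lo > 0 else None
    following = ordered[hi] if hi < len(ordered) else None
    if wrap and ordered:
        # Loop around: the first item's previous is the last item, and vice versa.
        if previous is None:
            previous = ordered[-1]
        if following is None:
            following = ordered[0]
    return previous, following

names = ["Elephant", "Owl", "Zebra", "Human"]
owl = neighbours(names, "Owl")
zebra = neighbours(names, "Zebra")
zebra_wrapped = neighbours(names, "Zebra", wrap=True)
```

The previous item is the largest name below the target and the next item is the smallest name above it, which is the same thing the descending and ascending order_by calls select in the queryset versions.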