Django MySQL Exact Word Match With iRegex Search - python

I am trying to match a column that contains an exact word, say yax and not yaxx, but I keep getting both rows whichever one I search for. I want only yax when I search for yax, regardless of case.
I have tried:
key = 'yax'
query = Model.objects.filter(content__iregex=r"[[:<:]]*{0}*[[:>:]]".format(key))
Answers I have checked, but which didn't quite help me:
This...
And this...
And this too...

Remove the * from your regex.
query = Model.objects.filter(content__iregex=r"[[:<:]]{0}[[:>:]]".format(key))
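As a hedged note: the [[:<:]] and [[:>:]] word-boundary markers come from MySQL's older Spencer regex library; on MySQL 8.0.4+ (which uses ICU) they may be rejected, and \b is the documented replacement. A minimal sketch of the corrected lookup under that assumption, with re.escape added as a precaution:
import re

key = 'yax'
# \b word boundaries keep "yax" from matching inside "yaxx";
# re.escape guards against regex metacharacters in the search term
query = Model.objects.filter(content__iregex=r"\b{0}\b".format(re.escape(key)))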

Related

How to find all cells matching a regex with gspread?

So I am very new to programming and I am using the Python gspread module to use a Google Sheet as a database.
There's a function in said module called sheet.findall(query, row, column), and this is great, but there's one issue: the query parameter will only look for an exact match, meaning that if I write "DDG", it will not get me the info from a cell with the value "DDG-87".
After reading the documentation, I found out that you can use Python regular expressions to structure the query parameter, so I did that, but there's a problem: the second parameter of re.findall is WHERE to look, and the whole variable is the act of searching, as in the example shown below:
search = sheet.findall(re.findall("[DDG]", The where to search goes here))
As you can see, the whole variable (SEARCH) is the search function itself, and therefore I can not specify where to search.
I have tried to set the second parameter of the regex to (SEARCH), but obviously that won't work.
Any idea or clue on how I can set the second parameter of re.findall() to be self, or what I can do so that the function doesn't search for an exact match but instead matches cells that contain the text?
Thank you.
From the gspread docs:
Find all cells matching a regexp:
criteria_re = re.compile(r'(Small|Room-tiering) rug')
cell_list = worksheet.findall(criteria_re)
So the following should work in your case:
criteria_re = re.compile(r'DDG.*')
search = sheet.findall(criteria_re)
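A minimal, hedged sketch of the full flow; the credentials setup and spreadsheet name below are illustrative, not from the question:
import re
import gspread

gc = gspread.service_account()           # assumes service-account credentials are configured
sheet = gc.open("my-database").sheet1    # illustrative spreadsheet name

criteria_re = re.compile(r'DDG.*')       # matches "DDG", "DDG-87", and similar
search = sheet.findall(criteria_re)      # returns a list of Cell objects
for cell in search:
    print(cell.row, cell.col, cell.value)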

Using whoosh as matcher without an index

Is it possible to use whoosh as a matcher without building an index?
My situation is that I have subscriptions pre-defined with strings, and documents coming through in a stream. I check each document matches the subscriptions and send them if so. I don't need to store the documents, or recall them later. Once they've been sent to the subscriptions, they can be discarded.
Currently just using simple matching, but as consumers ask for searches based on fields, and/or logic, etc, I'm wondering if it's possible to use a whoosh matcher and allow whoosh query syntax for this.
I could build an index for each document, query it, and then throw it away, but that seems very wasteful. Is it possible to directly construct a Matcher? I couldn't find any docs or questions online indicating a way to do this, and my attempts haven't worked.
Alternatively, is this just the wrong library for this task, and is there something better suited?
The short answer is no.
Search indices and matchers work quite differently. For example, if searching for the phrase "hello world", a matcher would simply check that the document text contains the substring "hello world". A search index cannot do this: it would have to scan the text of every document, and that would be very slow.
Instead, as documents are added, every word in them is added to that word's entry in the index. So the entry for "hello" will say that document 1 matches at position 0, and the entry for "world" will say that document 1 matches at position 6. A search for "hello world" then finds all document IDs in the "hello" entry, then all in the "world" entry, and checks whether any document has a position for "world" that is 6 characters after its position for "hello".
So whoosh and a plain matcher go about the problem in completely different ways.
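As a toy illustration of that inverted-index idea (this is only a sketch of the concept, not whoosh internals), the index maps each word to the documents and positions where it occurs:
# word -> {document_id: [character positions]}
index = {
    "hello": {1: [0]},
    "world": {1: [6]},
}

def phrase_hits(first, second, gap):
    # document ids where `second` occurs exactly `gap` characters after `first`
    hits = []
    for doc_id, first_positions in index.get(first, {}).items():
        second_positions = index.get(second, {}).get(doc_id, [])
        if any(p + gap in second_positions for p in first_positions):
            hits.append(doc_id)
    return hits

print(phrase_hits("hello", "world", gap=6))   # [1]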
It is possible to do this with whoosh, using a new index for each document, like so:
# Assumes `schema` is a whoosh.fields.Schema with title/description/keywords
# fields, and `Document` is your own class holding those attributes.
from whoosh.filedb.filestore import RamStorage
from whoosh.query import Query

def matches_subscription(doc: Document, q: Query) -> bool:
    with RamStorage() as store:
        ix = store.create_index(schema)
        writer = ix.writer()
        writer.add_document(
            title=doc.title,
            description=doc.description,
            keywords=doc.keywords
        )
        writer.commit()
        with ix.searcher() as searcher:
            results = searcher.search(q)
            return bool(results)
This takes about 800 milliseconds per check, which is quite slow.
A better solution is to build a parser with pyparsing and then create your own nested query classes which can do the matching, better fitting your specific search queries. It's also quite extensible that way. That can bring it down to ~40 microseconds, so roughly 20,000 times faster.
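A minimal sketch of that pyparsing approach, assuming the subscriptions only need AND/OR over plain terms; the class and grammar names here are illustrative, not part of any library:
from pyparsing import CaselessKeyword, Word, alphanums, infixNotation, opAssoc

class Term:
    def __init__(self, tokens):
        self.word = tokens[0]
    def matches(self, text):
        return self.word.lower() in text.lower()

class And:
    def __init__(self, tokens):
        self.operands = tokens[0][0::2]   # skip the "AND" keywords
    def matches(self, text):
        return all(op.matches(text) for op in self.operands)

class Or:
    def __init__(self, tokens):
        self.operands = tokens[0][0::2]   # skip the "OR" keywords
    def matches(self, text):
        return any(op.matches(text) for op in self.operands)

AND = CaselessKeyword("AND")
OR = CaselessKeyword("OR")
term = (~(AND | OR) + Word(alphanums)).setParseAction(Term)
query = infixNotation(term, [
    (AND, 2, opAssoc.LEFT, And),
    (OR, 2, opAssoc.LEFT, Or),
])

parsed = query.parseString("hello AND (python OR java)")[0]
print(parsed.matches("hello there, python fans"))   # True
Each document check is then just parsed.matches(document_text), and each subscription is parsed once rather than once per document.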

Flask SQLAlchemy Contains/Ilike producing different results?

I am trying to query a column from a database with contains/ilike, and they are producing different results. Any idea why?
My current code:
search = 'nel'
find = Clients.query.filter(Clients.lastName.ilike(search)).all()
# THE ABOVE LINE PRODUCES 0 RESULTS
find = Clients.query.filter(Clients.lastName.contains(search)).all()
# THE ABOVE LINE PRODUCES THE DESIRED RESULTS
for row in find:
    print(row.lastName)
My concern is am I missing something? I have read that 'contains' does not always work either. Is there a better way to do what I am doing?
For ilike and like, you need to include wildcards in your search like this:
Clients.lastName.ilike(r"%{}%".format(search))
As the Postgres docs say:
LIKE pattern matching always covers the entire string. Therefore, to match a sequence anywhere within a string, the pattern must start and end with a percent sign.
The other difference is that contains is case-sensitive (it generates a LIKE '%...%' clause with the wildcards added for you), while ilike is case-insensitive.
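Putting the two together, a short hedged sketch of case-insensitive substring matching with the same Clients model and search value as above:
search = 'nel'
# ilike with explicit wildcards: case-insensitive match anywhere in lastName
find = Clients.query.filter(Clients.lastName.ilike("%{}%".format(search))).all()
for row in find:
    print(row.lastName)   # e.g. "Nelson" would now match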

How do you improve search?

I just got haystack with solr installed and created a custom view:
# render_to_response and RequestContext come from Django's shortcuts and
# template modules (Django 1.x-era API)
from django.shortcuts import render_to_response
from django.template import RequestContext
from haystack.query import SearchQuerySet

def post_search(request, template_name='search/search.html'):
    getdata = request.GET.copy()
    try:
        results = SearchQuerySet().filter(title=getdata['search'])[:10]
    except:
        results = None
    return render_to_response(template_name, locals(), context_instance=RequestContext(request))
This view only returns exact matches on the title field. How do I get at least something like SQL's LIKE '%string%' (or at least I think that's what I need), so that if I search 'i' or 'IN' or 'index' I will get the result 'index'?
Also, is most of this search behaviour customised through Haystack or through Solr?
What other good practices/search improvements do you suggest (please give implementation too)?
Thanks a bunch in advance!
When you use Haystack/Solr, the idea is that you have to tell Haystack/Solr what you want indexed for a particular object. So say you wanted to build a find-as-you-type index for a basic dictionary. If you wanted it to just match prefixes, then for the word Boston you'd need to tell it to index B, Bo, Bos, etc., and then you'd issue a query for whatever the current search expression was and return the results. If you wanted to match any part of the word, you'd need to build suffix trees and Solr would take care of indexing them.
Look at templates in Haystack for more info. http://docs.haystacksearch.org/dev/best_practices.html#well-constructed-templates
The question you're asking is fairly generic, it might help to give specifics about what people are searching for. Then it'll be easier to suggest how to index the data. Good luck.
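For the prefix-matching idea above, here is a hedged sketch of what the index definition might look like with a reasonably recent django-haystack; the app and model names are illustrative, and EdgeNgramField is what handles the B/Bo/Bos-style prefix indexing so 'i' or 'in' can match 'index':
# search_indexes.py (illustrative app and model names)
from haystack import indexes
from myapp.models import Post

class PostIndex(indexes.SearchIndex, indexes.Indexable):
    # the document field is built from a template, per the best-practices link above
    text = indexes.EdgeNgramField(document=True, use_template=True)
    title = indexes.EdgeNgramField(model_attr='title')

    def get_model(self):
        return Post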

How to match search strings to content in python

Usually when we search, we have a list of stories, we provide a search string, and we expect back a list of results where the given search string matches the story.
What I am looking to do is the opposite: given a list of search strings and one story, find out which of the search strings match that story.
Now this could be done with re, but in this case I want to use complex search queries as supported by Solr. Full details of the query syntax are here. Note: I won't use boost.
Basically I want some pointers for the doesitmatch function in the sample code below.
def doesitmatch(contents, searchstring):
    """
    returns result of searching contents for searchstring (True or False)
    """
    ???????
    ???????

story = "big chunk of story 200 to 1000 words long"
searchstrings = ['sajal', 'sajal AND "is a jerk"', 'sajal kayan', 'sajal AND (kayan OR bangkok OR Thailand OR ( webmaster AND python))', 'bangkok']
matches = [[searchstr] for searchstr in searchstrings if doesitmatch(story, searchstr)]
Edit: Additionally, I would also be interested to know if any module exists to convert a Lucene query like the one below into a regex:
sajal AND (kayan OR bangkok OR Thailand OR ( webmaster AND python) OR "is a jerk")
After extensive googling, I realized that what I am looking to do is a Boolean search.
I found code that makes regexes boolean-aware: http://code.activestate.com/recipes/252526/
The issue looks solved for now.
Probably slow, but easy solution:
Send the story plus each search string as a query to the search engine. If it returns anything, then it matches.
Otherwise you need to implement the search syntax yourself. If that includes things like "title:" and so on, this can get rather complex. If it's only the AND and OR from your example, then it's a recursive function that isn't too hairy.
Some time ago I looked for a Python implementation of Lucene and I came across Whoosh, which is a pure-Python text search engine. Maybe it will satisfy your needs.
You can also try PyLucene, but I didn't investigate that one.
Here's a suggestion in pseudocode. I'm assuming you store a story identifier with the search terms in the index, so that you can retrieve it with the search results.
def search_strings_matching(story_id_to_match, search_strings):
    result = set()
    for s in search_strings:
        result_story_ids = query_index(s)  # query_index returns an iterable of ids
        if story_id_to_match in result_story_ids:
            result.add(s)
    return result
This is probably less interesting to you now, since you've already solved your problem, but what you're describing sounds like Prospective Search, which is what you call it when you have the query first and you want to match it against documents as they come along.
Lucene's MemoryIndex is a class that was designed specifically for something like this, and in your case it might be efficient enough to run many queries against a single document.
This has nothing to do with Python, though. You'd probably be better off writing something like this in Java.
If you are writing Python on AppEngine, you can use the AppEngine Prospective Search Service to achieve exactly what you are trying to do here. See: http://code.google.com/appengine/docs/python/prospectivesearch/overview.html
