How to delete a document from the index by its path in Whoosh - python

First I add documents to the index like this:
writer.add_document(title=doc_path.split(os.sep)[-1], path=doc_path, content=text, textdata=text)
Then I need to delete one of them completely from the index by its path. The documentation says there are a few methods for this that are not low-level:
delete_by_term(fieldname, termtext)
Deletes any documents where the given (indexed) field contains the
given term. This is mostly useful for ID or KEYWORD fields.
delete_by_query(query)
Deletes any documents that match the given query.
but I can't find a convenient method where I can specify the path of the document and just remove it. There is also a low-level method that takes the internal doc_number, which I would have to obtain somehow.
Can anyone advise on the best way to accomplish this?

from whoosh.index import open_dir
ix = open_dir('/my_index_dir_path/..')
writer = ix.writer()
# First argument is the field name, second is the value to match
writer.delete_by_term('path', doc_path)
writer.commit()
The delete_by_term method does exactly what I need. Note that the first argument is the field name as a string, 'path', and the second is the actual path. My mistake was passing the actual path instead of the field name.

Related

How to get Document Name from DocumentReference in Firestore Python

I have a document reference that I am retrieving from a query on my Firestore database. I want to use the DocumentReference as a query parameter for another query. However, when I do that, it says
TypeError: sequence item 1: expected str instance, DocumentReference found
This makes sense, because I am trying to pass a DocumentReference in my update statement:
db.collection("Teams").document(team).update("Dictionary here") # team is a DocumentReference
Is there a way to get the document name from a DocumentReference? Now before you mark this as duplicate: I tried looking at the docs here, and the question here, although the docs were so confusing and the question had no answer.
Any help is appreciated, Thank You in advance!
Yes, split the .refPath. The document "name" is always the last element after the split; something like lodash _.last() can work, or any other technique that identifies the last element in the array.
Note, btw, the refPath is the full path to the document. This is extremely useful (as in: I use it a lot) when you find documents via collectionGroup() - it allows you to parse to find parent document(s)/collection(s) a particular document came from.
Also note: there is a pseudo-field __name__ available (really an alias of documentID()). In spite of its name(s), it returns the FULL PATH (i.e. refPath) to the document, NOT the documentID by itself.
I think I figured it out: by doing team.path.split("/")[1] I could get the document name. This might not work for all Firestore databases (for example, with subcollections), so if anyone has a better solution, please go ahead. Thanks!
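A minimal sketch of why taking the last path segment is safer than index [1]. The paths below are hypothetical strings standing in for team.path, and document_name is an illustrative helper, not part of any Firestore library. With a real DocumentReference from the google-cloud-firestore client, the id property is expected to give the same last segment directly.

```python
def document_name(path: str) -> str:
    """Return the last segment of a slash-separated document path."""
    return path.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical paths, standing in for team.path
top_level = "Teams/team_a"
nested = "Leagues/league_1/Teams/team_a"

# Index [1] only works for top-level collections
second_segment_nested = nested.split("/")[1]

print(document_name(top_level))
print(document_name(nested))
print(second_segment_nested)
```

For the nested path, index [1] picks up the parent document's name, while the last segment is still the document itself.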

How to find all cells matching a regex with gspread?

I am very new to programming and I am using the Python gspread module to use a Google sheet as a database.
The module has a function called sheet.findall(query, row, column), which is great, but there's one issue: the query parameter only looks for an exact match, so if I write "DDG", it will not return a cell with the value "DDG-87".
After reading the documentation, I found out that you can use Python regular expressions for the query parameter, so I did that, but there's a problem: the second parameter of re.findall is the text to search in, and here the whole expression is itself the search, as shown below:
search = sheet.findall(re.findall("[DDG]", The where to search goes here))
As you can see, the whole variable (SEARCH) is the search function, and therefore, I can not specify where to search.
I have tried to set the second parameter of the regex as (SEARCH), but obviously, it won't work.
Any idea or a clue on how I can set the second parameter of re.findall() to be self, or what I can do so that the function doesn't search for an exact match, but if it contains the text?
Thank you.
From the gspread docs:
Find all cells matching a regexp:
criteria_re = re.compile(r'(Small|Room-tiering) rug')
cell_list = worksheet.findall(criteria_re)
So the following should work in your case:
criteria_re = re.compile(r'DDG.*')
search = sheet.findall(criteria_re)
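To see why the pattern "[DDG]" from the question would not behave as intended even with the right call, here is a small standard-library sketch. The cell values are made up, and the list comprehension only imitates the idea of applying a compiled pattern to each cell value with search semantics; it does not call gspread.

```python
import re

# Hypothetical cell values, standing in for a worksheet's contents
cell_values = ["DDG-87", "DDG", "XDDG-1", "GOLD", "ABC"]

char_class = re.compile(r"[DDG]")  # character class: any single 'D' or 'G'
literal = re.compile(r"DDG")       # the literal text "DDG" anywhere in the cell
anchored = re.compile(r"^DDG")     # "DDG" only at the start of the cell

matches_char_class = [v for v in cell_values if char_class.search(v)]
matches_literal = [v for v in cell_values if literal.search(v)]
matches_anchored = [v for v in cell_values if anchored.search(v)]

print(matches_char_class)
print(matches_literal)
print(matches_anchored)
```

Square brackets define a set of characters, so "[DDG]" also matches "GOLD"; dropping the brackets, and optionally anchoring with ^, restricts matches to cells that actually contain or start with "DDG".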

LDAP extensible match filter LDAP_MATCHING_RULE_IN_CHAIN

When I run the following I end up with a good list of results:
base = 'OU=Security Groups,OU=Groups,DC=myserver,DC=com'
criteria = 'CN=My Example'
attributes = ['member', 'groupType', 'description', 'memberOf']
result = connection.search_ext_s(base, ldap.SCOPE_SUBTREE, criteria, attributes, sizelimit=0)
However I can't seem to find anything that helps me when using an LDAP_MATCHING_RULE_IN_CHAIN.
base = 'OU=Security Groups,OU=Groups,DC=myserver,DC=com'
criteria = '1.2.840.113556.1.4.1941:=CN=MatchedRuleChainExample'
attributes = ['member', 'groupType', 'description', 'memberOf']
result = connection.search_ext_s(base, ldap.SCOPE_SUBTREE, criteria, attributes, sizelimit=0)
The above always returns blank. Can anyone help me grasp this? I feel completely lost on how to get through the subgroups in Python.
This criteria syntax 1.2.840.113556.1.4.1941:=CN=MatchedRuleChainExample is wrong.
The string representation of an LDAP extensible match filter must consist of the following components, in order:
An opening parenthesis
The name of the attribute type, or an empty string if none was provided
The string ":dn" if the dnAttributes flag is set, or an empty string if not
If a matching rule ID is available, then a string comprised of a colon followed by that OID, or an empty string if there is no matching rule ID
The string ":="
The string representation of the assertion value
A closing parenthesis
To sum it up, it should look like:
([<attr>][:dn][:<OID>]:=<assertion>)
# In your case, fixing the attribute position:
(cn:1.2.840.113556.1.4.1941:=MatchedRuleChainExample)
But there is another issue here: LDAP_MATCHING_RULE_IN_CHAIN only works with attributes whose values are Distinguished Names (DNs), such as member or memberOf, which are commonly used with extensible match filters. cn is not a DN attribute, so it can't work.
To grab all security groups that are members of CN=My Example, including nested groups, use the memberOf attribute with extensible match and apply it to the group's DN.
# Fixing the attribute type and assertion value:
(memberOf:1.2.840.113556.1.4.1941:=<groupDN>)
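Also, you need to filter on objectClass to match only group entries (group members could also be users or machines). In Active Directory the group class is typically group rather than groupOfNames, so adjust the class name to your schema. In the end, the filter criteria should look like: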
(&(objectClass=groupOfNames)(memberOf:1.2.840.113556.1.4.1941:=CN=My Example,OU=Security Groups,OU=Groups,DC=myserver,DC=com))
cf. Active Directory Group Related Searches
Note that LDAP_MATCHING_RULE_IN_CHAIN is available only on Domain Controllers with Windows Server 2003 R2 (or above).
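As a rough sketch, the final filter can be assembled in Python before being passed as the criteria argument of search_ext_s. The helper names escape_filter_value and nested_member_of_filter are illustrative, not part of python-ldap; the escaping follows the special characters listed in RFC 4515, and the object class is left as a parameter because it depends on the directory schema.

```python
IN_CHAIN_OID = "1.2.840.113556.1.4.1941"  # LDAP_MATCHING_RULE_IN_CHAIN


def escape_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP filter value (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def nested_member_of_filter(group_dn: str, object_class: str = "groupOfNames") -> str:
    """Build a filter for entries that are direct or nested members of group_dn."""
    escaped_dn = escape_filter_value(group_dn)
    return f"(&(objectClass={object_class})(memberOf:{IN_CHAIN_OID}:={escaped_dn}))"


group_dn = "CN=My Example,OU=Security Groups,OU=Groups,DC=myserver,DC=com"
criteria = nested_member_of_filter(group_dn)
print(criteria)
```

Note that the assertion value is the full DN of the group, not just its CN, and that the search base must be wide enough to contain the member entries you expect back.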

Python search for value in string using wildcards

I'm trying to make an address book for a school project, but I can't get my head around searching for values when part of the value is queried.
Here is the block of code where I am stuck:
self.ui.tableWidget.setRowCount(0)
with open("data.csv") as file:
    for rowdata in csv.reader(file):
        row = self.ui.tableWidget.rowCount()
        if query in rowdata:
            self.ui.tableWidget.insertRow(row)
            for column, data in enumerate(rowdata):
                item = QtGui.QTableWidgetItem(data)
                self.ui.tableWidget.setItem(row, column, item)
As you can see, I'm using a PyQt TableWidget to display search results from a csv file. The code above does work, but it only displays the result when a full query is given. The line of code that checks for a match is:
if query in rowdata:
So for example, if I wanted to find someone called John, I would have to search for exactly "John". It wouldn't appear if I searched "Joh" or "john" or "hn" and so on...
If you require more information, just ask :P
What I think you want to do is search for query within the fields of your rows, not the whole row at once.
The problem you're having is that string in obj does different things depending on what type obj has. If it is a string, it does a substring search (like you want). If it is a non-string container however, it does a regular membership check (which is why it only finds exact matches now).
So to fix it, change your test from if query in rowdata to if any(query in field for field in rowdata). If you only want to search for matches within a specific field (like the contact's name), it could be even simpler: if query in rowdata[name_column] (where name_column is the index of the column that holds the name in your CSV file).
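The difference between the two checks can be tried without the GUI. This is a minimal sketch with made-up CSV data held in memory; matching_rows and its ignore_case flag are illustrative additions (the flag addresses the "john" case from the question), not part of the original code.

```python
import csv
import io

# Hypothetical address book contents, standing in for data.csv
CSV_TEXT = "John,Smith,555-0100\nJane,Doe,555-0101\n"


def matching_rows(csv_file, query, ignore_case=False):
    """Return rows in which any field contains query as a substring."""
    if ignore_case:
        query = query.casefold()
    rows = []
    for rowdata in csv.reader(csv_file):
        fields = [f.casefold() for f in rowdata] if ignore_case else rowdata
        # Substring test per field, instead of list membership on the whole row
        if any(query in field for field in fields):
            rows.append(rowdata)
    return rows


partial = matching_rows(io.StringIO(CSV_TEXT), "Joh")
lowercase = matching_rows(io.StringIO(CSV_TEXT), "john")
lowercase_ci = matching_rows(io.StringIO(CSV_TEXT), "john", ignore_case=True)

print(partial)
print(lowercase)
print(lowercase_ci)
```

The substring test is still case-sensitive by default, so "john" only matches once both the query and the fields are case-folded.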

How to search for ZCatalog object names

I want to search for an object name.
If I have this structure:
/de/myspace/media/justAnotherPdf.pdf
then I want to search for the name "justAnotherPdf", or something like "justAnot", to find it.
I have indexed the PDF files,
but I can't search for it with TextIndexNG2 or PathIndex.
Currently this is not supported out-of-the-box. Object identifiers (getId) are only indexed as field values and thus can only be looked up as whole strings.
You'd need to add a separate index to the catalog to support your use case. You could add a new TextIndexNG2 index with a new name, indexing just the getId method. In the ZMI, find the portal_catalog, then its 'Indexes' tab, then on the right-hand side you'll find a drop-down menu for adding a new index. Pick a memorable name ('fullTextId' for example) and use getId as the indexed attribute.
You'll need to do a reindex, but only for that index. Once added, select it on the Indexes tab (tick the check-box) and select 'Reindex' at the bottom of that page. Now you can use this index in your custom searches with a wildcard search.
import os.path
name = os.path.splitext(os.path.split(url)[1])[0]
Explaining the code:
from os.path import split, splitext
url = '/de/myspace/media/justAnotherPdf.pdf'
path, name_with_ext = split(url)
name_without_ext, ext = splitext(name_with_ext)
