Python search for value in string using wildcards - python

I'm trying to make an address book for a school project, but I can't get my head around searching for values when only part of the value is queried.
Here is the block of code I am stuck on:
self.ui.tableWidget.setRowCount(0)
with open("data.csv") as file:
    for rowdata in csv.reader(file):
        row = self.ui.tableWidget.rowCount()
        if query in rowdata:
            self.ui.tableWidget.insertRow(row)
            for column, data in enumerate(rowdata):
                item = QtGui.QTableWidgetItem(data)
                self.ui.tableWidget.setItem(row, column, item)
As you can see, I'm using a PyQt TableWidget to display search results from a csv file. The code above does work, but it only displays the result when a full query is given. The line of code that checks for a match is:
if query in rowdata:
So for example, if I wanted to find someone called John, I would have to search for exactly "John". It wouldn't appear if I searched "Joh" or "john" or "hn" and so on...
If you require more information, just ask :P

What I think you want to do is search for query within the individual fields of each row, not the whole row at once.
The problem you're having is that string in obj does different things depending on the type of obj. If obj is a string, it does a substring search (which is what you want). If it is a non-string container, however, it does a regular membership check. csv.reader yields each row as a list of strings, which is why only exact matches are found now.
So to fix it, change your test from if query in rowdata to if any(query in field for field in rowdata). If you only want to search within a specific field (like the contact's name), it can be even simpler: if query in rowdata[name_column] (where name_column is the index of the name column in your CSV file).
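As a rough sketch of the per-field test, here is a version that also ignores case, so that "john" would match "John" as the question asks. The function name matching_rows and the sample data are made up for illustration, and the PyQt table code is left out so the example runs on its own:

```python
import csv
import io

def matching_rows(lines, query):
    """Return CSV rows in which any field contains `query`, ignoring case."""
    needle = query.lower()
    return [row for row in csv.reader(lines)
            if any(needle in field.lower() for field in row)]

# Hypothetical sample data standing in for data.csv
sample = io.StringIO("John,Smith,555-0100\nJane,Doe,555-0199\n")
print(matching_rows(sample, "joh"))
```

In the original loop, the same condition would replace if query in rowdata.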

Related

Python and csv reader: Failing to print all the rows in a text file that do NOT have a field that contains a certain string

I have the following repl.it program and cannot get one part of it to work (the logic is wrong).
The context is that of a dating site. In the "matchmagic" subroutine, I want to be able to retrieve all rows in the database that do NOT contain the keystrength variable.
In other words, if a user types in "patience", then every row in the text file that DOES NOT contain that word is displayed (i.e. all the users that are not patient), as we are going for contrasting personalities for a match.
The whole program is here:
https://repl.it/#oiuwdeoiuas/Matchmakingskills-1
The relevant part of the program is:
def matchmagic():
    wordfound=False
    print("===Creating Match===")
    while wordfound==False:
        with open("dating.txt","r") as f:
            keystrength=input("Enter one of your key strengths:")
            reader=csv.reader(f)
            for row in reader:
                for field in row:
                    if field != keystrength:
                        print(row)
                        wordfound=True
    search()
    mainmenu()
What I have tried here is obvious, but I think there is an issue with the following:
if field != keystrength:
    print(row)
    wordfound=True
It prints all the rows instead of identifying the rows that do not contain that identified keystrength.
Sample CSV:
Joe,Bloggs,JoeBbird,open123,M,jblogs@gmail.com,10/10/20,Christian,patience,0
FName,LName,Username,password,Gender,email,dob,Religion,keystrength,contactcount
In the example above, if this user is logged in, their keystrength is "patience", and the program should return all the rows (usernames or first and last names) of users that do NOT have "patience" listed anywhere in their files.
If you add a header line to the CSV file when it is first created, you can use csv.DictReader to read the rows and use the column name to filter. This is nice because both your CSV and your code become self-documenting. So, testing just one column, your code could be:
import csv

def matchmagic():
    print("===Creating Match===")
    keystrength=input("Enter one of your key strengths:").upper()
    with open("dating.txt","r") as f:
        reader=csv.DictReader(f)
        return [row for row in reader if row["keystrength"].upper()!=keystrength]

result = matchmagic()
for r in result:
    print(r.values())
dating.txt
FName,LName,Username,password,Gender,email,dob,Religion,keystrength,contactcount
Joe,Bloggs,JoeBbird,open123,M,jblogs@gmail.com,10/10/20,Christian,patience,0
Darth,Vader,vader6599,open123,M,dvader@deathstar.com,10/10/20,Sith,impatience,0
This code is testing for exact matches. I added "impatience" to highlight that a test of whether "patience" is just somewhere in the string can be problematic.
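To make the difference concrete, here is a small sketch that checks every field of a row (the question asked for the word "anywhere" in the record) and can switch between exact and substring matching. The helper name rows_without and the trimmed-down column set are assumptions for illustration, not part of the original program:

```python
import csv
import io

# Rows based on the dating.txt sample above, with most columns omitted
DATA = """FName,LName,Username,keystrength
Joe,Bloggs,JoeBbird,patience
Darth,Vader,vader6599,impatience
"""

def rows_without(lines, keyword, exact=True):
    """Return rows in which no field matches `keyword` (case-insensitive)."""
    key = keyword.lower()

    def matches(field):
        text = field.lower()
        return text == key if exact else key in text

    return [row for row in csv.DictReader(lines)
            if not any(matches(value) for value in row.values())]

exact = rows_without(io.StringIO(DATA), "patience")
loose = rows_without(io.StringIO(DATA), "patience", exact=False)
print([r["FName"] for r in exact])  # Darth is kept: "impatience" != "patience"
print([r["FName"] for r in loose])  # Darth is dropped: "patience" is inside "impatience"
```

With exact matching Darth counts as a contrasting match; with substring matching he is wrongly treated as "patient".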

Searching sub string value in Numpy Array

First of all, I am using Python and the latest PyCharm Community edition.
I am currently working on a user interface with tkinter which requests several values from the user: two string values and one integer. Afterwards the program should search through an Excel or CSV file to find those values. Unfortunately, I am currently stuck on the first entry. I've created a NumPy array out of the DataFrame, since I've read that arrays are much faster when working with large data sets. The final Excel/CSV file I am working with will contain several thousand rows and up to 60 columns. In addition, entry_name could be a substring of a longer string, and the search algorithm should find it whether the user enters a fraction of the name or the full name (example: entry: "BMW", in array([["BMW Werk", "BMW-Automobile", "BMW_Client"], ["BMW Part1", "BMW Part2", "XS-12354"]])). Afterwards I would like to proceed with other calculations based on the values in the array.
Example: entry: "BMW", in array([["BMW Werk", "Car1", "XD-12345"], ["BMW Part1", "exauster", "XS-12354"]])
Program found "BMW Werk" and "BMW Part1" in array, returns ["BMW Werk", "Car1", "XD-12345"] and ["BMW Part1", "exauster", "XS-12354"]
entry_name = "BMW"
path_for_excel = "D:\Python PyCharm\Tool\Clientlist.xslx"
client_list_df= pd.read_excel(path_for_excel , engine="openpyxl")
client_list_array= client_list_df.to_numpy()
#first check if entry_name is populated ( entry field in ui )
if entry_name == True:
    #search for sub string in string
    part_string_value = np.char.startswith(client_list_array, entry_name)
    if part_string_value in client_list_array:
        index = np.where(client_list_array == part_string_value)
        #print found value, including the other values in the list
        print(client_list_array[])
I am able to retrieve the requested values if the client uses the correct, full name like "BMW Werk", but any typo will hinder the process, and for some names it is very exhausting to type the full name. As an example, one name looks like: "BMW Werk Bloemfontein, 123-45, Willows".
Hopefully, somebody finds time to help with my issue.
Thank you !
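One possible approach, offered only as a sketch: np.char.find returns -1 where a substring is absent, so comparing its result with 0 gives a boolean mask, and any(axis=1) turns that into a per-row mask. The helper name rows_containing and the third sample row are assumptions added for illustration; the array is converted to str because DataFrame.to_numpy() often yields an object array:

```python
import numpy as np

def rows_containing(table, query):
    """Return the rows of a 2-D array in which any cell contains `query`, ignoring case."""
    table = np.asarray(table)
    text = np.char.lower(table.astype(str))          # element-wise lowercase copy
    hits = np.char.find(text, query.lower()) >= 0    # True where the substring occurs
    return table[hits.any(axis=1)]                   # keep rows with at least one hit

# Sample data from the question plus one hypothetical non-matching row
clients = np.array([
    ["BMW Werk", "Car1", "XD-12345"],
    ["BMW Part1", "exauster", "XS-12354"],
    ["Audi AG", "Car2", "XA-00001"],
])
matches = rows_containing(clients, "bmw")
print(matches)
```

Note that an empty query matches every cell, so checking that entry_name is non-empty (for example with if entry_name:) would still be needed before calling it.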

How to delete document from index by its path in Whoosh

First I add documents to the index like this:
writer.add_document(title=doc_path.split(os.sep)[-1], path=doc_path, content=text, textdata=text)
And then I just need to delete one of them completely from the index by its path. The documentation says there are a few methods that are not low-level for doing this:
delete_by_term(fieldname, termtext)
    Deletes any documents where the given (indexed) field contains the
    given term. This is mostly useful for ID or KEYWORD fields.
delete_by_query(query)
    Deletes any documents that match the given query.
but I can't find a convenient method where I can specify the path of the document and just remove it. There is also a low-level method that takes an internal doc_number, which I am supposed to get somehow.
Can anyone advise me on the best way to accomplish this task?
ix = open_dir('/my_index_dir_path/..')
writer = ix.writer()
writer.delete_by_term('path', doc_path)
writer.commit()
The delete_by_term method does exactly what I need. Note that the first argument is the text string 'path' (the field name), and then comes the actual path. My mistake was to pass an actual path instead of the field name.

Get by name in google app engine

Instead of using the get_by_id() method to fetch a specific entry by its id and print its content from the Google datastore, I am trying to use the name from the URL and print the content. For example:
print all the content that has this specific name (there may be more than one row of content with this name)
print the content of the specific id
I am using get_by_id(long(id)) for the second part of my example, and it works. I am trying to use get_by_key_name(name) for the first part, but it does not work. Any ideas on that? Thank you.
Sorry, but since I couldn't leave a comment, I am editing my question. So far I can get all the animal names from my datastore, and I have made them clickable using HTML in a template file. In the datastore, the same animal name appears in more than one entry (e.g. name=duck, content=water and name=duck, content=lake). I have used DISTINCT in my GQL query so that repeated names (e.g. duck) are printed only once. Since name=duck has two contents, when I click on the name duck I want to see both of them. My problem is that get_by_id(long(id)) returns only the single entry with that unique id, so it will not print both contents for name=duck. I want all the content of the entries with the same name. I am trying the following, but it does not work.
msg = MODEL.Animals.get_by_key_name(name)
self.response.write("%s" % msg.content)
With get_by_id() you can get an entity only if you know its ID. These operations are counted as "Small operations" in the quota and are cheaper than datastore reads, but to get a list of entities filtered by an indexed property, you should use a query with filters.
query = MODEL.Animals.query()
query = query.filter(MODEL.Animals.name == 'duck')
ducks = query.fetch(limit=100) # limit number of returned animals
for duck in ducks:
    self.response.write('%s - %s' % (duck.name, duck.content))
By default, all string properties are indexed, so you will be able to do such requests.

Can a formfield be selected w/mechanize based on the type of the field (eg. TextControl, TextareaControl)?

I'm trying to parse an HTML form using mechanize. The form itself has an arbitrary number of hidden fields, and the field names and IDs are randomly generated, so I have no obvious way to select them directly. Clearly using a name or ID is out, and because the number of hidden fields is random I cannot select them by sequence number either, since this always changes too.
However, there are always two TextControl fields right after each other, and below them is a TextareaControl. These are the 3 fields I need access to; basically I need to parse their names and all is well. I've been looking through the mechanize documentation for the past couple of hours and haven't come up with anything that seems to be able to do this, however simple it seems it should be (to me anyway).
I have come up with an alternate solution that involves making a list of the form controls, iterating through it to find the controls that contain the string 'Text', returning a new list of those, and then finally stripping out the name using a regular expression. While this works, it seems unnecessary, and I'm wondering if there's a more elegant solution. Thanks guys.
edit: Here's what I'm currently doing to extract that info, if anyone's curious. I think I'm probably just going to stick with this. It seems unnecessary, but it gets the job done and it's nothing intensive, so I'm not worried about efficiency or anything.
def formtextFieldParse(browser):
    '''Expects a mechanize.Browser object with a form already selected. Parses
    through the fields returning a tuple of the name of those fields. There
    SHOULD only be 3 fields. 2 text followed by 1 textarea corresponding to
    Posting Title, Specific Location, and Posting Description'''
    import re
    pattern = '\(.*\)'
    fields = str(browser).split('\n')
    textfields = []
    for field in fields:
        if 'Text' in field: textfields.append(field)
    titleFieldName = re.findall(pattern, textfields[0])[0][1:-2]
    locationFieldName = re.findall(pattern, textfields[1])[0][1:-2]
    descriptionFieldName = re.findall(pattern, textfields[2])[0][1:-2]
I don't think mechanize has the exact functionality you require; could you use mechanize to get the HTML page, then parse the latter for example with BeautifulSoup?
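As a rough illustration of the "parse the HTML yourself" idea, here is a sketch that uses the standard library's html.parser rather than BeautifulSoup, purely to stay dependency-free. The class name TextFieldNames and the sample form are hypothetical; in practice the HTML string would be whatever mechanize fetched for the page:

```python
from html.parser import HTMLParser

class TextFieldNames(HTMLParser):
    """Collect the name attribute of text inputs and textareas, in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An <input> with no type attribute defaults to a text field in HTML
        is_text_input = tag == "input" and (attrs.get("type") or "text").lower() == "text"
        if (is_text_input or tag == "textarea") and attrs.get("name"):
            self.names.append(attrs["name"])

# Hypothetical form with randomly named fields and a varying number of hidden inputs
html = """
<form>
  <input type="hidden" name="x91">
  <input type="text" name="a7f3">
  <input type="hidden" name="q02">
  <input type="text" name="b19c">
  <textarea name="c44d"></textarea>
</form>
"""
parser = TextFieldNames()
parser.feed(html)
print(parser.names)
```

Because the hidden inputs are skipped by type rather than by position, the number of them no longer matters.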
