multiple words search in a sentence using pandas data frame [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a column containing sentences in a standard format. I am trying to retrieve the rows where the sentence contains particular keywords.
The data looks like this:
***Damage, Location, Near Location***
Corrosion, Bonnet, Left Head light
Corrosion, Bonnet, Right Head light
Corrosion, Left Door, Near Handle
Scratch, Right Door, Near Handle
Dent, Right Door, Near Handle
Dent, Bonnet, Left Head light
list1=[corrosion,Bonnet]
I am trying to pass the words as a list (list1), and I only need the rows that contain both words. I tried contains, but it only works for one word.

You may try using contains here:
list1 = ['Corrosion', 'Bonnet']
regex = r'\b(?:' + '|'.join(list1) + r')\b'
df[df['data'].str.contains(regex, regex=True)]
Note that the regex pattern used here is an alternation, which matches any one of the search terms, so it returns rows containing either word rather than only rows containing both:
\b(?:Corrosion|Bonnet)\b
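If every word must be present, as the question asks, one option is to test each word separately and combine the results with a logical AND. This is a minimal sketch, assuming the sentences live in a column named data as in the answer above. The case-insensitive matching and the use of re.escape are additions that are not part of the original answer.

```python
import re
import pandas as pd

# Sample rows taken from the question; the column name "data" follows the answer.
df = pd.DataFrame({"data": [
    "Corrosion, Bonnet, Left Head light",
    "Corrosion, Bonnet, Right Head light",
    "Corrosion, Left Door, Near Handle",
    "Scratch, Right Door, Near Handle",
    "Dent, Right Door, Near Handle",
    "Dent, Bonnet, Left Head light",
]})
list1 = ["corrosion", "Bonnet"]

# Start with every row selected, then AND in one whole-word test per search term.
mask = pd.Series(True, index=df.index)
for word in list1:
    pattern = r"\b" + re.escape(word) + r"\b"
    mask &= df["data"].str.contains(pattern, case=False, regex=True)

matches = df[mask]
print(matches)
```

Only the rows that mention both Corrosion and Bonnet remain in matches.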

Let's assume you have a DataFrame df in which Damage and Location are separate columns:
df = df[(df['Damage'] == list1[0]) & (df['Location'] == list1[1])]

Related

Finding a combination of characters and digits in a string python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I have a list of lists, and I'm attempting to loop through and check whether the strings at a specific index in each of the inner lists contain "XY" followed immediately by 4 digits. The "XY" could be in various locations of the string, so I'm struggling with the syntax beyond just using "XY" in row[5]. How do I add the digits after the "XY" to the check? Something that combines the "XY" and isdigit()? Am I stuck using the find function to return an index and then going from there?
You can use Python's regex module re with this pattern that matches XY and then four digits anywhere in the string.
import re
pattern = r'XY\d{4}'
my_list = [['XY0'],['XY1234','AB1234'],['XY1234','ABC123XY5678DEF6789']]
elem_to_check = 1
for row in my_list:
    if len(row) > elem_to_check:
        for found in re.findall(pattern, row[elem_to_check]):
            print(f'{found} found in {row[elem_to_check]}')
Output:
XY5678 found in ABC123XY5678DEF6789
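If you only need a yes/no check per row, re.search may be enough. The sketch below filters the rows instead of printing each match. It also assumes you want exactly four digits, so a negative lookahead rejects a fifth digit; drop `(?!\d)` if longer digit runs are acceptable. The last row of my_list is a hypothetical addition to show that case.

```python
import re

# XY followed by exactly four digits: (?!\d) rejects a fifth digit.
pattern = re.compile(r'XY\d{4}(?!\d)')

my_list = [
    ['XY0'],
    ['XY1234', 'AB1234'],
    ['XY1234', 'ABC123XY5678DEF6789'],
    ['XY1234', 'XY12345'],  # hypothetical row: five digits, so no match at index 1
]
elem_to_check = 1

# Keep only rows that are long enough and whose checked element matches.
matching_rows = [
    row for row in my_list
    if len(row) > elem_to_check and pattern.search(row[elem_to_check])
]
print(matching_rows)
```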

Can I slice a sentence while ignoring whitespaces? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
Like the title says, can I slice a sentence while ignoring whitespaces in Python?
For example, if the last letter of a word is kept, the second letter of the following word needs to be kept as well (I'm using [::2]). I also have to preserve punctuation, so split isn't really an option. Replacing whitespace isn't an option either, because I would have no way to put it back in the correct spot.
Sample input:
Myevmyozrtilets gwaaarkmv yuozub ubpi farfokm ctbhpe pientsfiydqe. zBmuvtk tahgelyu anlpsmo ttzevagrk yioquj awpyaoryts.
Expected output:
Memories warm you up from the inside. But they also tear you apart.
Sample implementation below.
It also takes punctuation into account (it looks like you want it treated like whitespace: passed through and not counted).
You'd enjoy trying to implement it on your own, I'm sure.
import string

f = "Myevmyozrtilets gwaaarkmv yuozub ubpi farfokm ctbhpe pientsfiydqe. zBmuvtk tahgelyu anlpsmo ttzevagrk yioquj awpyaoryts."

def g(f):
    c = 0  # counts letters only
    for l in f:
        if l not in string.ascii_letters:
            yield l  # whitespace and punctuation pass through unchanged
        else:
            if c % 2 == 0:
                yield l  # keep every second letter
            c += 1

''.join(g(f))
# 'Memories warm you up from the inside. But they also tear you apart.'

Python text document similarities (w/o libraries) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I need to create a vanilla Python program (without libraries) that can compute text similarities between different documents.
The program takes documents as an input and computes a dictionary (matrix) for words of the given input. Each document consists of a sentence and when a new document goes into the program, we need to compare it to the other documents in order to find similar documents. See example below:
Given text input:
input_text = ["Why I like music", "Beer and music is my favorite combination",
"The sun is shining", "How to dance in GTA5", ]
The sentences have to be transformed into vectors, see example:
Hope you can help.
Here are some ideas:
Use new_str = sentence.upper() so that beer and Beer are treated as the same word (if you need this).
Use words = new_str.split() to make a list of the words in your string.
Use unique_words = set(words) to get rid of duplicate words if needed.
Start with an empty word_list and copy the first set into it. In the following steps, loop over the entries in each new set and append the ones that are not yet in word_list:
for word in unique_words:
    if word not in word_list:
        word_list.append(word)
Now you can make a multi-hot vector from your sentence (1 if word_list[i] is in the sentence, else 0).
Don't forget to pad the existing multi-hot vectors with additional zeros whenever you add a word to word_list.
Last step: make a matrix from your vectors.
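The ideas above can be sketched in plain Python as follows. This version builds the whole word_list first, which avoids having to pad earlier vectors. The cosine function is one possible similarity measure and is an assumption, since the answer stops at the matrix; the helper names tokenize, to_vector and cosine are made up for illustration.

```python
input_text = ["Why I like music", "Beer and music is my favorite combination",
              "The sun is shining", "How to dance in GTA5"]

def tokenize(sentence):
    # Upper-case so that "beer" and "Beer" count as the same word.
    return sentence.upper().split()

# Build the vocabulary in first-seen order, skipping words already present.
word_list = []
for sentence in input_text:
    for word in tokenize(sentence):
        if word not in word_list:
            word_list.append(word)

def to_vector(sentence):
    # Multi-hot vector: 1 if the vocabulary word occurs in the sentence, else 0.
    words = set(tokenize(sentence))
    return [1 if w in words else 0 for w in word_list]

# One row per document.
matrix = [to_vector(s) for s in input_text]

def cosine(u, v):
    # Cosine similarity (an assumed choice of measure); 0.0 for an all-zero vector.
    dot = sum(a * b for a, b in zip(u, v))
    norm = sum(a * a for a in u) ** 0.5 * sum(b * b for b in v) ** 0.5
    return dot / norm if norm else 0.0

# Compare a new document against every stored document.
new_vector = to_vector("I like to dance")
scores = [cosine(new_vector, row) for row in matrix]
print(scores)
```

Words of a new document that are not in word_list are simply ignored here; to include them, append them to word_list and rebuild the matrix.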

A folder contains files of dog breeds like `Boston_terrier_02303.jpg`. I want to remove the numeric parts and also the `_`. How do I achieve this? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
My folder contains .jpg files. I need to fetch only the alphabetic characters from the file names.
I removed all the non-alphabetic characters, but that resulted in a single string without spaces.
Input: Boston_terrier_02303.jpg
Desired Output: Boston terrier
Assuming that you always have the same structure (n word fragments, then one number followed by the extension), you can get your desired result with:
new_string = " ".join(string.split("_")[:-1])
To elaborate:
You start by splitting the strings at the underscores, and then selecting everything but the last. Then simply join the remaining strings with a space between them.
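Applied to several file names, the same idea might look like the sketch below. The second file name and the helper name breed_name are hypothetical. For a real folder you could obtain the names with os.listdir instead of a hard-coded list.

```python
# Example names; only the first one comes from the question.
filenames = ["Boston_terrier_02303.jpg", "German_shepherd_dog_00011.jpg"]

def breed_name(filename):
    # Split at underscores, drop the last piece (number plus extension),
    # and join the remaining word fragments with spaces.
    return " ".join(filename.split("_")[:-1])

breeds = [breed_name(name) for name in filenames]
print(breeds)
```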

Elongated words and combination of words in a sentence python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have a few lines such as:
biggestfoolofall, sooo, hiiieee, footballfan
Notice the pattern above: some entries are several words joined into a single word, such as "biggestfoolofall" and "footballfan".
1) I want to know how I can tell that a single word is actually made up of multiple words.
2) sooo and hiiieee are elongated words. I want to detect elongated words in Python. How can I do that?
I am new to Python, so I got stuck at this part. Also, if you can share any helpful sites for learning for loops, string splitting, etc., that would be very helpful.
I guess you have a list of valid words. So iterate over your words and check if they are in your line:
for word in words:  # iterate over all valid words
    if word in line:  # a valid word is found in line
        print('I found a valid word: ' + word)
        line = line.replace(word, '')  # str.replace returns a new string, so reassign it
At the end, you will have found all valid words, and only junk characters are left in your line variable.
See string methods for further string operations.
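For the elongated words, one possible approach is a regular expression with a backreference. This is a rough sketch that assumes "elongated" means the same character repeated three or more times in a row; that threshold is an assumption, and legitimate words with a tripled letter would be flagged too.

```python
import re

# (\w)\1{2,} matches a word character followed by at least two more copies of itself.
ELONGATED = re.compile(r"(\w)\1{2,}")

line = "biggestfoolofall, sooo, hiiieee, footballfan"
tokens = [token.strip() for token in line.split(",")]

# Keep the tokens that contain a run of three or more identical characters.
elongated = [token for token in tokens if ELONGATED.search(token)]
print(elongated)
```

Double letters such as the "oo" in "fool" or the "ll" in "football" fall below the threshold, so those tokens are not reported.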
