Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
Is there a Python library that can detect/highlight all the diphthongs (in normal spelling, not IPA) in a given English text?
If you are defining 'diphthongs' as just letter pairs/combinations that typically indicate double vowel sounds ('ee', 'ou', etc.), then something like the following would work to go through the text and search for letter combinations from a predefined set:
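# Illustrative, incomplete set of diphthong-like letter pairs
DIPTHONGS = ['ai', 'au', 'ay', 'ea', 'ee', 'ei', 'ey', 'oa', 'oe', 'oi', 'oo', 'ou', 'oy']

text = "The mouse ate the cheese in the rain"
highlighted_text = ''

while len(text) > 0:
    # Find the diphthong that occurs earliest in the remaining text
    matches = [(text.find(d), d) for d in DIPTHONGS if d in text]
    if matches:
        _, d = min(matches)
        # Partition remaining text around the first occurrence of that diphthong
        before, dip, after = text.partition(d)
        # Append the part before and the highlighted diphthong to highlighted_text
        highlighted_text += before + f'*{dip}*'
        # Update text to the remaining text
        text = after
    else:
        # No diphthongs found, so append remainder of text to highlighted_text
        highlighted_text += text
        text = ''

print(highlighted_text)

Output:

The m*ou*se ate the ch*ee*se in the r*ai*n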
I have used asterisks for the highlight as it's quick and easy, but you could easily adapt this to use colours, or whatever you need for your use case.
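A shorter variant of the same idea, sketched under the assumption that a regular expression is acceptable: all the letter pairs are joined into one alternation and applied with re.sub, which scans left to right, so the earliest pair always wins. The highlight function and its start/end parameters are hypothetical names for illustration; passing ANSI escape codes as the markers is one way to get colours in a terminal.

```python
import re

# Illustrative, incomplete set of diphthong-like letter pairs (an assumption, not a linguistic rule)
DIPTHONGS = ['ai', 'au', 'ay', 'ea', 'ee', 'ei', 'ey', 'oa', 'oe', 'oi', 'oo', 'ou', 'oy']

# One alternation of all pairs; re.escape guards against special characters
PATTERN = re.compile('|'.join(map(re.escape, DIPTHONGS)), re.IGNORECASE)

def highlight(text, start='*', end='*'):
    """Wrap each diphthong-like letter pair in the start/end markers."""
    return PATTERN.sub(lambda m: f'{start}{m.group(0)}{end}', text)

print(highlight("The mouse ate the cheese in the rain"))
# ANSI escape codes: red text, then reset (works in most terminals)
print(highlight("The mouse ate the cheese in the rain", '\033[31m', '\033[0m'))
```

Because re.IGNORECASE is set, upper-case pairs such as "OU" in "OUT" are highlighted too.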
There are cases where the spelling is diphthong-like but the pronunciation is not a diphthong, or vice versa (because English is like that): the 'ea' in "bread" is a single short vowel, while the single letter 'i' in "time" is pronounced as a diphthong. To actually take the pronunciation into account, you could use something like the NLTK CMUdict corpus; see section 4.2 of https://www.nltk.org/book/ch02.html to get started.
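A pronunciation-based check could look roughly like this. It assumes the ARPAbet symbols used by CMUdict, in which the diphthongs are AW, AY, EY, OW and OY (vowels carry a stress digit such as AW1), and it uses a few hand-written entries in the same shape as nltk.corpus.cmudict.dict() so that it runs without downloading anything. The helper names diphthong_phones, has_diphthong and SAMPLE_PRON are hypothetical.

```python
# ARPAbet vowel phones that CMUdict uses for diphthongs (stress digits stripped)
ARPABET_DIPHTHONGS = {'AW', 'AY', 'EY', 'OW', 'OY'}

def diphthong_phones(phones):
    """Return the diphthong phones in one pronunciation (a list of ARPAbet symbols)."""
    # CMUdict appends a stress digit (0, 1 or 2) to vowels, e.g. 'AW1'
    return [p for p in phones if p.rstrip('012') in ARPABET_DIPHTHONGS]

def has_diphthong(word, pronouncing_dict):
    """True if any listed pronunciation of `word` contains a diphthong."""
    prons = pronouncing_dict.get(word.lower(), [])
    return any(diphthong_phones(p) for p in prons)

# Hand-written sample entries in the shape of cmudict.dict():
# word -> list of pronunciations, each a list of ARPAbet phones
SAMPLE_PRON = {
    'mouse': [['M', 'AW1', 'S']],
    'rain': [['R', 'EY1', 'N']],
    'bread': [['B', 'R', 'EH1', 'D']],
    'cheese': [['CH', 'IY1', 'Z']],
}

for w in ['mouse', 'rain', 'bread', 'cheese']:
    print(w, has_diphthong(w, SAMPLE_PRON))
```

To use the real dictionary, run nltk.download('cmudict') once and pass cmudict.dict() (from nltk.corpus import cmudict) instead of SAMPLE_PRON. Note that "cheese" is spelled with 'ee' but uses the single vowel IY, so spelling alone would flag it.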
Does anyone know how to do something similar to Bash's sed in Python?
For example, I would like to search for a line in a document that contains the string "measurement = abc" and replace "abc" with something of my own (using "{}".format() or something like it).
Do I need to check each string/substring in the document with a loop, or is there a built-in command or package in Python that already does that?
Thank you!
I would do this with the regex implementation in Python (built-in re module).
This can be done with a positive lookbehind assertion.
(?<=...) Matches if the current position in the string is preceded by a match for ... that ends at the current position.
import re
data = "measurement = abc some other text measurement = abc some other text"
data = re.sub(r'(?<=measurement\s=\s)abc', '123', data) # positive lookbehind assertion
print(data) # "measurement = 123 some other text measurement = 123 some other text"
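Since the question is about a document rather than a single string, here is a sketch that applies the same lookbehind to a whole file. It assumes the value after "measurement = " is a single token without whitespace (hence \S+); replace_measurement and settings.txt are hypothetical names used only for illustration.

```python
import re
import tempfile
from pathlib import Path

def replace_measurement(path, new_value):
    """Rewrite `path` in place, replacing the value that follows 'measurement = '."""
    text = Path(path).read_text()
    # The lookbehind keeps the 'measurement = ' prefix; \S+ matches the old value
    # A function replacement inserts new_value literally (no backslash processing)
    new_text = re.sub(r'(?<=measurement\s=\s)\S+', lambda m: str(new_value), text)
    Path(path).write_text(new_text)

# Demo on a temporary file with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / 'settings.txt'
    doc.write_text("name = test\nmeasurement = abc\nunits = mm\n")
    replace_measurement(doc, "{}".format(42))
    print(doc.read_text())
```

Here the file ends up containing "measurement = 42" while the other lines are unchanged.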
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a column containing sentences in a standard format. I am trying to retrieve the rows where the sentence contains particular keywords.
The data looks like this:
Damage, Location, Near Location
Corrosion, Bonnet, Left Head light
Corrosion, Bonnet, Right Head light
Corrosion, Left Door, Near Handle
Scratch, Right Door, Near Handle
Dent, Right Door, Near Handle
Dent, Bonnet, Left Head light
list1 = ['corrosion', 'Bonnet']
I am trying to pass the words as a list (list1), and I only need the rows that contain both words. I tried str.contains, but it only works for one word.
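You may try using contains here. Since you need the rows that contain both words, build one lookahead per search term; the pattern then only matches when every term appears somewhere in the row:
import re

list1 = ['Corrosion', 'Bonnet']
# One (?=.*\bterm\b) lookahead per term, anchored at the start, so all terms must be present
regex = '^' + ''.join(r'(?=.*\b' + re.escape(w) + r'\b)' for w in list1)
df[df['data'].str.contains(regex, regex=True, case=False)]
By the way, the regex pattern built here is a chain of positive lookaheads:
^(?=.*\bCorrosion\b)(?=.*\bBonnet\b)
An alternation such as \b(?:Corrosion|Bonnet)\b would instead match rows containing any one of the terms.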
Let's assume you have a DataFrame df with the columns Damage and Location:
df = df[(df['Damage'] == list1[0]) & (df['Location'] == list1[1])]
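A self-contained version of the "all keywords" filter, assuming the three fields are kept together in a single text column called data (an assumption; the question's sample has separate columns). It builds one str.contains mask per keyword and combines them with &, so a row is kept only if it contains every keyword. With regex=False the words are matched literally as substrings, so word boundaries are not enforced.

```python
from functools import reduce

import pandas as pd

# Sample rows from the question, kept as one text column (column name 'data' is assumed)
rows = [
    "Corrosion, Bonnet, Left Head light",
    "Corrosion, Bonnet, Right Head light",
    "Corrosion, Left Door, Near Handle",
    "Scratch, Right Door, Near Handle",
    "Dent, Right Door, Near Handle",
    "Dent, Bonnet, Left Head light",
]
df = pd.DataFrame({'data': rows})

list1 = ['corrosion', 'Bonnet']

# One boolean mask per keyword (case-insensitive, literal substring match)
masks = [df['data'].str.contains(word, case=False, regex=False) for word in list1]
# Combine the masks with & so that every keyword must be present
result = df[reduce(lambda a, b: a & b, masks)]
print(result)
```

Only the two "Corrosion, Bonnet" rows (index 0 and 1) are kept.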
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 3 years ago.
Below is the code and output.
from newspaper import Article

article = Article(url, keep_article_html=True)
article.download()
article.parse()
print(article.article_html)
<div><p class="tt_adsense_top">
</p>
<p> test test </p>
I want to delete this part from the string:
<div><p class="tt_adsense_top">
</p>
<p>
and leave only:
<p> test test </p>
When I use Python's re to match it, I can only match this line; I don't know how to match the blank line and the whitespace:
<div><p class="tt_adsense_top">
Can anyone give me an example of how to delete it?
The str.replace method is the easiest way. For example:
a = "some string removethis"
a = a.replace("removethis", "")
Note that Python strings have no remove method. str.replace replaces every occurrence of the first argument with the second; if you only want to replace the first occurrence, pass a count as the third argument, e.g. a.replace("removethis", "", 1). I would advise reading the Python documentation on string methods; it has a lot of interesting and important functions you can play around with.
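To cope with the blank line and spaces from the question, a regex with \s* is more forgiving than a fixed replace string, because \s matches spaces, tabs and newlines. This is a sketch assuming the block always has exactly the tags shown in the question; strip_adsense is a hypothetical helper name, and any matching </div> later in article_html is left as it is.

```python
import re

# \s* matches any run of whitespace (spaces, tabs, newlines, blank lines) between the tags
ADSENSE_BLOCK = re.compile(r'<div>\s*<p class="tt_adsense_top">\s*</p>\s*')

def strip_adsense(html):
    """Remove the opening <div> and the empty tt_adsense_top paragraph."""
    return ADSENSE_BLOCK.sub('', html)

html = '<div><p class="tt_adsense_top">\n</p>\n<p> test test </p>'
print(strip_adsense(html))  # <p> test test </p>
```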
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I need to create a vanilla Python program (without any libraries) that can compute the similarity between different text documents.
The program takes documents as input and computes a dictionary (matrix) of the words in the given input. Each document consists of one sentence, and when a new document goes into the program, we need to compare it to the other documents in order to find similar ones. See the example below:
Given text input:
input_text = ["Why I like music", "Beer and music is my favorite combination",
              "The sun is shining", "How to dance in GTA5"]
The sentences have to be transformed into vectors, see example:
Hope you can help.
Here are some ideas:
- Use new_str = sentence.upper() so "beer" and "Beer" will be the same (if you need this).
- Use words = new_str.split() to make a list of the words in your string.
- Use unique_words = set(words) to get rid of duplicate words, if needed. (Avoid naming variables str, list or set, since that shadows the built-ins.)
- Start with an empty word_list and copy the first set into it. In the following steps, loop over the entries of each new set and check whether they are already part of word_list:

    for word in unique_words:
        if word not in word_list:
            word_list.append(word)

- Now you can make a multi-hot vector from your sentence (1 if word_list[i] is in the sentence, else 0).
- Don't forget to make your existing multi-hot vectors longer (append zeros) whenever you add a word to word_list.
- Last step: make a matrix from your vectors.
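Putting those steps together in vanilla Python could look like the sketch below. The similarity measure is an assumption: the question does not name one, so this uses cosine similarity between the multi-hot vectors. Building the whole word_list first means every vector already has its final length, so no zero-padding is needed. All function names are hypothetical.

```python
import math

def tokenize(sentence):
    """Upper-case (so 'Beer' == 'beer'), split on whitespace, drop duplicates."""
    return set(sentence.upper().split())

def build_vocabulary(documents):
    """Collect every distinct word, in first-seen order per document."""
    word_list = []
    for doc in documents:
        for word in sorted(tokenize(doc)):
            if word not in word_list:
                word_list.append(word)
    return word_list

def multi_hot(sentence, word_list):
    """1 if word_list[i] occurs in the sentence, else 0."""
    words = tokenize(sentence)
    return [1 if w in words else 0 for w in word_list]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

input_text = ["Why I like music", "Beer and music is my favorite combination",
              "The sun is shining", "How to dance in GTA5"]

word_list = build_vocabulary(input_text)
matrix = [multi_hot(doc, word_list) for doc in input_text]

# Compare a new document against every row of the matrix
new_doc = "I like beer"
new_vec = multi_hot(new_doc, word_list)
scores = [cosine_similarity(new_vec, row) for row in matrix]
best = max(range(len(scores)), key=scores.__getitem__)
print(input_text[best], round(scores[best], 3))
```

This prints "Why I like music 0.577". Words in a new document that are not in word_list are ignored by multi_hot, so you may want to rebuild the vocabulary when new documents arrive.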
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
How can I write the following while loop in Python?
int v, t;
while (scanf("%d %d", &v, &t) == 2)
{
    // body
}
Python, being a higher-level language than C, normally uses different patterns for this kind of situation. There was a deleted answer that had exactly the same behavior as your C snippet, and thus would be "correct", but it was such an ugly piece of Python that it was downvoted pretty fast.
So, first things first: it is 2015, and people can't simply look at a C scanf prompt and divine that they should type two whitespace-separated integers, so you'd better give them a message. Another big difference is that in Python, variable assignments are statements and can't be done inside an expression (the while condition in this case); Python 3.8 later added the := assignment expression, which does allow this. So you need a while condition that is always true, and decide inside the loop whether to break.
Thus you can go with a pattern like this:
while True:
    values = input("Type your values separated by a space, anything else to exit: ").split()
    try:
        v = int(values[0])
        t = int(values[1])
    except (IndexError, ValueError):  # too few values, or not integers
        break
    # body
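This mirrors the behavior of your code more closely:
import re

while True:
    try:
        line = input()
    except EOFError:  # end of input, where scanf would return EOF
        break
    m = re.match(r"\s*(-?\d+)\s+(-?\d+)", line)  # two integers, like "%d %d"
    if m is None:  # The input did not match
        break  # Replace with an error message and `continue` to let the user try again
    v, t = [int(e) for e in m.groups()]
    pass  # body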
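Since Python 3.8, the := assignment expression makes it possible to assign inside the while condition, which is closer in shape to the C loop. This sketch reads line by line from any file-like object (pass sys.stdin for real input) and stops at end of input or at the first line that does not start with two integers. It is only an approximation of scanf, which is not line-based; read_pairs is a hypothetical name.

```python
import io
import re

# Two whitespace-separated, optionally negative integers, like "%d %d"
PAIR = re.compile(r"\s*(-?\d+)\s+(-?\d+)")

def read_pairs(stream):
    """Yield (v, t) pairs until end of input or a line that doesn't start with two ints."""
    # readline() returns '' at end of input, which ends the loop like scanf's EOF
    while (line := stream.readline()) and (m := PAIR.match(line)):
        yield int(m.group(1)), int(m.group(2))

# Demo on an in-memory stream instead of sys.stdin
demo = io.StringIO("3 4\n10 -2\nstop\n5 6\n")
for v, t in read_pairs(demo):
    print(v, t)  # body
```

Here it prints "3 4" and "10 -2", then stops at the line "stop".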