Title case with a list of exception words - python

I'm trying to come up with something that will "title" a string of words. It should capitalize every word in the string except the words passed in as exceptions, but it should still capitalize the first word no matter what. I know how to capitalize every word, but I don't know how to skip the exceptions. I'm kind of lost on where to start and couldn't find much on Google.
def titlemaker(title, exceptions):
    return ' '.join(x[0].upper() + x[1:] for x in title.split(' '))
or
    return title.title()
but I found that title() will capitalize a letter after an apostrophe, so I don't think I should use it.
Any help on how to take the exceptions into account would be nice.
Example: titlemaker('a man and his dog', 'a and') should return 'A Man and His Dog'

def titlemaker(title, exceptions):
    exceptions = exceptions.split(' ')
    return ' '.join(x.title() if nm == 0 or x not in exceptions else x for nm, x in enumerate(title.split(' ')))
titlemaker('a man and his dog', 'a and') # returns "A Man and His Dog"
The above assumes that the input string and the list of exceptions are in the same case (as they are in your example), but it would fail on something like titlemaker('a man And his dog', 'a and'). If they could be in mixed case, do:
def titlemaker(title, exceptions):
    # Compare against a lowercased list of words, not the raw exceptions string
    exceptions = [x.lower() for x in exceptions.split(' ')]
    return ' '.join(x.title() if nm == 0 or x.lower() not in exceptions else x.lower() for nm, x in enumerate(title.split(' ')))
titlemaker('a man and his dog','a and') # returns "A Man and His Dog"
titlemaker('a man AND his dog','a and') # returns "A Man and His Dog"
titlemaker('A Man And His DOG','a and') # returns "A Man and His Dog"
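As the question notes, str.title() also capitalizes the letter after an apostrophe ("dog's" becomes "Dog'S"). Below is a minimal sketch that avoids this by uppercasing only the first character of each word. The name titlemaker_apostrophe_safe is made up for this illustration, and it assumes words are separated by whitespace:

```python
def titlemaker_apostrophe_safe(title, exceptions):
    # Hypothetical helper, not part of the answer above.
    skip = {w.lower() for w in exceptions.split()}
    result = []
    for index, word in enumerate(title.split()):
        lowered = word.lower()
        if index > 0 and lowered in skip:
            # Exception words stay lowercase unless they come first
            result.append(lowered)
        else:
            # Uppercase only the first character, leave the rest alone
            result.append(lowered[:1].upper() + lowered[1:])
    return ' '.join(result)

print(titlemaker_apostrophe_safe("a man and his dog's bone", 'a and'))
# prints: A Man and His Dog's Bone
```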

Try this:
def titleize(text, exceptions):
    exceptions = exceptions.split()
    text = text.split()
    # Capitalize every word that is not in the "exceptions" list;
    # the first word (i == 0) is capitalized no matter what
    for i, word in enumerate(text):
        text[i] = word.title() if word not in exceptions or i == 0 else word
    return ' '.join(text)
print(titleize('a man and his dog', 'a and'))
Output:
A Man and His Dog

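def titleize(text, exceptions):
    exceptions = exceptions.split()
    result = ' '.join(word if word in exceptions else word.title()
                      for word in text.split())
    # Uppercase only the first character; str.capitalize() would lowercase the rest
    return result[:1].upper() + result[1:]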

import re
import nltk
from nltk.corpus import stopwords
from itertools import chain
def setTitleCase(title):
    exceptions = []
    exceptions.append([word for word in stopwords.words('english')])
    exceptions.append([word for word in stopwords.words('portuguese')])
    exceptions.append([word for word in stopwords.words('spanish')])
    exceptions.append([word for word in stopwords.words('french')])
    exceptions.append([word for word in stopwords.words('german')])
    # Flatten the per-language lists into one list of stop words
    exceptions = list(chain.from_iterable(exceptions))
    list_of_words = re.split(' ', title)
    # The first word is always capitalized
    final = [list_of_words[0].capitalize()]
    for word in list_of_words[1:]:
        word = word.lower()
        if word in exceptions:
            final.append(word)
        else:
            final.append(word.capitalize())
    return " ".join(final)
print(setTitleCase("a Fancy Title WITH many stop Words and other stuff"))
Which gives you as the answer: "A Fancy Title with Many Stop Words and other Stuff"

Related

How do I input more than one word for translation in Python?

I'm trying to make a silly translator game as practice. I'm replacing "Ben" with "Idiot" but it only works when the only word I input is "Ben". If I input "Hello, Ben" then the console prints out a blank statement. I'm trying to get "Hello, Idiot". Or if I enter "Hi there, Ben!" I would want to get "Hi there Idiot!". If I input "Ben" then it converts to "Idiot" but only when the name by itself is entered.
I'm using Python 3 and am using function def translate(word): so maybe I'm over-complicating the process.
def translate(word):
    translation = ""
    if word == "Ben":
        translation = translation + "Idiot"
    return translation
print(translate(input("Enter a phrase: ")))
I'm sorry if I explained all of this weird. Completely new to coding and using this website! Appreciate all of the help!
Use the str.replace() method for this:
sentence = "Hi there Ben!"
sentence = sentence.replace("Ben", "Idiot")
print(sentence)
Output: Hi there Idiot!
Note that str.replace() is case sensitive.
First, you must split the string into words:
s.split()
But that function splits the string on whitespace only, which is not good enough:
s = "Hello Ben!"
print(s.split())
Out: ["Hello", "Ben!"]
In this example, you can't find "Ben" easily.
We use re in this case:
re.split('[^a-zA-Z]', s)
Out: ["Hello", "Ben", ""]
But we lost the "!". Adding a capturing group keeps the separators:
re.split('([^a-zA-Z])', s)
Out: ['Hello', ' ', 'Ben', '!', '']
And finally:
import re
def translate(word):
    # Split on non-letters, keeping the separators so they can be re-joined
    words_list = re.split('([^a-zA-Z])', word)
    translation = ""
    for item in words_list:
        if item == "Ben":
            translation += "Idiot"
        else:
            translation += item
    return translation
print(translate("Hello Ben! Benchmark is ok!"))
P.S:
If we use replace, we have a wrong answer!
"Hello Ben! Benchmark is ok!".replace("Ben", "Idiot")
Out: Hello Idiot! Idiotchmark is ok!
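The same whole-word replacement can probably be done in one step with re.sub and the \b word-boundary anchor. This is only a sketch: the old and new parameters are added here for illustration and are not part of the answers above.

```python
import re

def translate(text, old="Ben", new="Idiot"):
    # \b matches only at word boundaries, so "Benchmark" is left alone.
    # re.escape guards against regex metacharacters in the word.
    pattern = r'\b' + re.escape(old) + r'\b'
    return re.sub(pattern, new, text)

print(translate("Hello Ben! Benchmark is ok!"))
# prints: Hello Idiot! Benchmark is ok!
```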

Looking for single exact word within string

I am trying to find a single exact word within a large string.
I have tried the below:
for word in words:
    if word in strings:
        best.append("The word " + word + " The Sentence " + strings)
    else:
        pass
This seemed to work at first, until I tried it with a larger set of words in a much larger string and started getting partial matches. For example, if the word is "me", it would report "message" as a match.
Is there a way of searching for exactly "me"?
Thanks in advance.
You need to set word boundaries in order to find a complete word. I'd go with a regex. Something like:
re.search(r'\b' + re.escape(word_to_find) + r'\b', strings)
You can split the string into words and then perform the in operation, making sure you strip any surrounding whitespace from the words in the list and from the string
import string
def find_words(words, s):
    best = []
    #Strip extra whitespace, if any, around each word and make them all lowercase
    modified_words = [word.strip().lower() for word in words]
    #Strip punctuation from the string
    modified_s = s.translate(str.maketrans('', '', string.punctuation))
    #Lowercase the string and split it into words
    words_list = [word.strip().lower() for word in modified_s.lower().split()]
    #Iterate through the list
    for idx, word in enumerate(modified_words):
        #If the word is found in the list of words, append to the result
        if word in words_list:
            best.append("The word " + words[idx] + " The Sentence " + s)
    return best
print(find_words(['me', 'message'], 'I me myself'))
print(find_words([' me ', 'message'], 'I me myself'))
print(find_words(['me', 'message'], 'I me myself'))
print(find_words(['me', 'message'], 'I am me.'))
print(find_words(['me', 'message'], 'I am ME.'))
print(find_words(['Me', 'message'], 'I am ME.'))
The output will be
['The word me The Sentence I me myself']
['The word  me  The Sentence I me myself']
['The word me The Sentence I me myself']
['The word me The Sentence I am me.']
['The word me The Sentence I am ME.']
['The word Me The Sentence I am ME.']
You can also use a regex to match the word exactly. \\b matches a word boundary, such as the position between a word and a space or punctuation mark.
for word in words:
    if len(re.findall("\\b" + word + "\\b", strings)) > 0:
        best.append("The word " + word + " The Sentence " + strings)
    else:
        pass
The double backslashes are needed because '\b' in a regular string literal is the backspace control character.
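A short sketch of that point: a raw string (r'\b') is an alternative to doubling the backslash. The use of re.escape here is an addition for safety with arbitrary search words, not something the answer above does.

```python
import re

# '\b' in a normal string literal is one character: backspace (0x08).
assert '\b' == '\x08'
# Both spellings below produce the two characters backslash + b.
assert '\\b' == r'\b'

word = 'me'
pattern = r'\b' + re.escape(word) + r'\b'
print(bool(re.search(pattern, 'send me a message')))  # True
print(bool(re.search(pattern, 'send a message')))     # False
```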
You could include the surrounding spaces in the if statement.
for word in words:
    if f' {word} ' in strings:
        best.append("The word " + word + " The Sentence " + strings)
    else:
        pass
To make sure you don't detect words inside other words that contain them (like "me" in "message" or "flame"), add spaces before and after the word in the check. The easiest way of doing this is to replace
if word in strings:
with
if " "+word+" " in strings:
Hope this helps! -Theo
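One caveat with the space-padding approach: it misses a word at the very start or end of the string, or one followed directly by punctuation. A small sketch comparing it with a word-boundary regex (both helper names are hypothetical, chosen for this comparison):

```python
import re

def found_with_spaces(word, text):
    # The space-padding check suggested above
    return f' {word} ' in text

def found_with_boundaries(word, text):
    # Word-boundary regex; re.escape protects special characters in word
    return re.search(r'\b' + re.escape(word) + r'\b', text) is not None

for text in ['tell me more', 'I am me.', 'me too', 'a message']:
    print(repr(text), found_with_spaces('me', text), found_with_boundaries('me', text))
```

Only 'tell me more' is found by both checks; 'I am me.' and 'me too' are found by the regex alone, and 'a message' by neither.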
You need to set boundaries for your search, \b is the boundary character.
import re
string = 'youyou message me me me me me'
print(re.findall(r'\bme\b', string))
The string contains both "message" and "me", but we only want "me" itself, so I added boundaries to the search expression. The result is below:
['me', 'me', 'me', 'me', 'me']
Got all the me(s), but not the message which also has a me in it.
Without knowing the rest of the code, the best I could suggest is using == to get a direct match, so for example
words = ["Me", "Hello", "Message"]
query = input("What do you want to find? ")
for word in words:
    # == only matches the whole word, never a part of it
    if word == query:
        print("Found a match")
        break

Python find n-sized window around phrase within string

I have a string, for example 'i cant sleep what should i do', as well as a phrase that is contained in the string, 'cant sleep'. What I am trying to accomplish is to get an n-sized window around the phrase, even if there aren't n words on either side. So in this case, with a window size of 2 (2 words on either side of the phrase), I would want 'i cant sleep what should'.
This is my current solution, attempting to find a window of size 2. However, it fails when the number of words to the left or right of the phrase is less than 2. I would also like to be able to use different window sizes.
import re
sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
sentence_words = re.findall(r'\w+', sentence)
phrase_words = re.findall(r'\w+', phrase)
left = sentence_words.index(phrase_words[0])
right = sentence_words.index(phrase_words[-1])
print(sentence_words[left-2:right+3])
You can use the partition method for a non-regex solution:
>>> s='i cant sleep what should i do'
>>> p='cant sleep'
>>> lh, _, rh = s.partition(p)
Then use a slice to get up to two words:
>>> n=2
>>> ' '.join(lh.split()[:n]), p, ' '.join(rh.split()[:n])
('i', 'cant sleep', 'what should')
Your exact output:
>>> ' '.join(lh.split()[:n]+[p]+rh.split()[:n])
'i cant sleep what should'
You would want to check whether p is in s or if the partition succeeds of course.
As pointed out in the comments, the slice on lh should use a negative index to take the last n words (thanks Mathias Ettinger):
>>> s='w1 w2 w3 w4 w5 w6 w7 w8 w9'
>>> p='w4 w5'
>>> n=2
>>> lh, _, rh = s.partition(p)
>>> ' '.join(lh.split()[-n:]+[p]+rh.split()[:n])
'w2 w3 w4 w5 w6 w7'
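Putting those pieces together, one possible wrapper looks like the sketch below. The function name and the None return value for a missing phrase are assumptions made for illustration. Note that str.partition matches substrings, so a phrase such as 'cant' would also match inside a longer word.

```python
def window_by_partition(s, p, n):
    # Hypothetical wrapper around the partition approach above.
    lh, found, rh = s.partition(p)
    if not found:
        return None  # the phrase does not occur in s
    # lst[-0:] would return the whole list, so handle n == 0 explicitly
    left = lh.split()[-n:] if n > 0 else []
    right = rh.split()[:n]
    return ' '.join(left + [p] + right)

print(window_by_partition('i cant sleep what should i do', 'cant sleep', 2))
# prints: i cant sleep what should
```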
If you define words being entities separated by spaces you can split your sentences and use regular python slicing:
def get_window(sentence, phrase, window_size):
    sentence = sentence.split()
    phrase = phrase.split()
    words = len(phrase)
    for i, word in enumerate(sentence):
        if word == phrase[0] and sentence[i:i+words] == phrase:
            # Clamp the left edge so a short left context does not wrap around
            start = max(0, i-window_size)
            return ' '.join(sentence[start:i+words+window_size])
sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
print(get_window(sentence, phrase, 2))
You can also turn it into a generator by changing return to yield (named gen_window here), which lets you generate all windows when the phrase matches several times in the sentence:
>>> list(gen_window('I dont need it, I need to get rid of it', 'need', 2))
['I dont need it, I', 'it, I need to get']
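The generator itself is not shown in the answer. A likely reconstruction, assuming it is get_window with return replaced by yield as described, would be:

```python
def gen_window(sentence, phrase, window_size):
    # Assumed reconstruction: same logic as get_window, but yields every match
    sentence = sentence.split()
    phrase = phrase.split()
    words = len(phrase)
    for i, word in enumerate(sentence):
        if word == phrase[0] and sentence[i:i+words] == phrase:
            start = max(0, i - window_size)
            yield ' '.join(sentence[start:i+words+window_size])

print(list(gen_window('I dont need it, I need to get rid of it', 'need', 2)))
```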
import re
def contains_sublist(lst, sublst):
    n = len(sublst)
    for i in range(len(lst) - n + 1):
        if sublst == lst[i:i+n]:
            a = max(0, i - 2)  # clamp at the start of the list
            b = min(i + n + 2, len(lst))
            return ' '.join(lst[a:b])
sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
sentence_words = re.findall(r'\w+', sentence)
phrase_words = re.findall(r'\w+', phrase)
print(contains_sublist(sentence_words, phrase_words))
You can split words using built-in string methods, so re shouldn't be necessary. If you want to vary the window sizes, wrap it in a function like so:
def get_word_window(sentence, phrase, w_left=0, w_right=0):
    w_lst = sentence.split()
    p_lst = phrase.split()
    for i, word in enumerate(w_lst):
        if word == p_lst[0] and w_lst[i:i+len(p_lst)] == p_lst:
            left = max(0, i - w_left)
            right = min(len(w_lst), i + w_right + len(p_lst))
            return w_lst[left:right]
Then you can get the new phrase like so:
>>> sentence='i cant sleep what should i do'
>>> phrase='cant sleep'
>>> ' '.join(get_word_window(sentence,phrase,2,2))
'i cant sleep what should'

Creating a censoring function from a list of bad words

I'm trying to create a function that censors words in a string. It's kinda working, with a few quirks.
This is my code:
def censor(sentence):
    badwords = 'apple orange banana'.split()
    sentence = sentence.split()
    for i in badwords:
        for words in sentence:
            if i in words:
                pos = sentence.index(words)
                sentence.remove(words)
                sentence.insert(pos, '*' * len(i))
    print(" ".join(sentence))
sentence = "you are an appletini and apple. new sentence: an orange is a banana. orange test."
censor(sentence)
And the output:
you are an ***** and ***** new sentence: an ****** is a ****** ****** test.
Some punctuation is gone and the word "appletini" is replaced wrongly.
How can this be fixed?
Also, is there any simpler way of doing this kind of thing?
The specific problems are that:
You don't consider punctuation at all; and
You use the length of the "bad word", not the word, when inserting '*'s.
I would switch the loop order around, so you only process the sentence once, and use enumerate rather than remove and insert:
def censor(sentence):
    badwords = ("test", "word") # consider making this an argument too
    sentence = sentence.split()
    for index, word in enumerate(sentence):
        if any(badword in word for badword in badwords):
            sentence[index] = "".join(['*' if c.isalpha() else c for c in word])
    return " ".join(sentence) # return rather than print
Testing str.isalpha will replace only upper- and lower-case letters with asterisks. Demo:
>>> censor("Censor these testing words, will you? Here's a test-case!")
"Censor these ******* *****, will you? Here's a ****-****!"
# ^ note length ^ note punctuation
Try:
for i in bad_word_list:
    sentence = sentence.replace(i, '*' * len(i))
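Plain str.replace also hits bad words inside longer words, so "appletini" would become "*****tini". If only whole words should be censored, one option is a word-boundary regex. The sketch below uses a made-up function name and takes the bad words as a parameter, which the question's code does not:

```python
import re

def censor_whole_words(sentence, badwords=('apple', 'orange', 'banana')):
    # Build one pattern like \b(?:apple|orange|banana)\b
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in badwords) + r')\b'
    # Replace each match with asterisks of the same length
    return re.sub(pattern, lambda m: '*' * len(m.group()), sentence)

text = "you are an appletini and apple. new sentence: an orange is a banana. orange test."
print(censor_whole_words(text))
# prints: you are an appletini and *****. new sentence: an ****** is a ******. ****** test.
```

Punctuation survives because only the matched word is replaced, and "appletini" is untouched because there is no word boundary after "apple" inside it.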

How do I print words with only 1 vowel?

My code so far is below, but since I'm so lost it doesn't do anything close to what I want:
vowels = 'a','e','i','o','u','y'
#Consider 'y' as a vowel
input = input("Enter a sentence: ")
words = input.split()
if vowels == words[0]:
    print(words)
So for an input like this:
"this is a really weird test"
I want it to print only:
this, is, a, test
because each of these words contains exactly 1 vowel.
Try this:
vowels = set(('a','e','i','o','u','y'))
def count_vowels(word):
    # Each True counts as 1, so the sum is the number of vowels
    return sum(letter in vowels for letter in word)
my_string = "this is a really weird test"
def get_words(my_string):
    for word in my_string.split():
        if count_vowels(word) == 1:
            print(word)
Result:
>>> get_words(my_string)
this
is
a
test
Here's another option:
import re
words = 'This sentence contains a bunch of cool words'
for word in words.split():
    if len(re.findall('[aeiouy]', word)) == 1:
        print(word)
Output:
This
a
bunch
of
words
You can translate all the vowels to a single vowel and count that vowel:
trans = str.maketrans('aeiouy', 'aaaaaa')
strs = 'this is a really weird test'
print([word for word in strs.split() if word.translate(trans).count('a') == 1])
>>> s = "this is a really weird test"
>>> [w for w in s.split() if len(w) - len(w.translate(str.maketrans('', '', 'aeiouy'))) == 1]
['this', 'is', 'a', 'test']
Not sure if words with no vowels are required. If so, just replace == 1 with < 2
You can use a for-loop to collect substrings into an array, ending each substring when the next character is a space.
Then, for each substring, check whether it contains exactly one vowel (a, e, i, o, u); if so, add it to another array.
After that, join all the strings in that second array with commas and spaces.
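The steps described above might look roughly like this. It is a sketch only: it uses str.split instead of a manual character loop, treats 'y' as a vowel as the question asks, and the function name is made up.

```python
VOWELS = set("aeiouy")  # 'y' counted as a vowel, per the question

def one_vowel_words(sentence):
    # Keep the words that contain exactly one vowel (case-insensitive)
    return [w for w in sentence.split()
            if sum(ch in VOWELS for ch in w.lower()) == 1]

print(", ".join(one_vowel_words("this is a really weird test")))
# prints: this, is, a, test
```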
Try this:
vowels = ('a','e','i','o','u','y')
words = [i for i in input('Enter a sentence ').split() if i != '']
interesting = [word for word in words if sum(1 for char in word if char in vowels) == 1]
I found so much nice code here, and I want to show my ugly one:
v = 'aoeuiy'
o = 'oooooo'
sentence = 'i found so much nice code here'
words = sentence.split()
# Map every vowel to 'o', then count the 'o's
trans = str.maketrans(v, o)
for word in words:
    if not word.translate(trans).count('o') > 1:
        print(word)
I find your lack of regex disturbing.
Here's a plain regex only solution (ideone):
import re
sentence = "this is a really weird test"
words = re.findall(r"\b[^aeiouy\W]*[aeiouy][^aeiouy\W]*\b", sentence)
print(words)
