I have a list like this,
['Therefore', 'allowance' ,'(#)', 't(o)o', 'perfectly', 'gentleman', '(##)' ,'su(p)posing', 'man', 'his', 'now']
Expected output:
['Therefore', 'allowance' ,'(#)', 'too', 'perfectly', 'gentleman', '(##)' ,'supposing', 'man', 'his', 'now']
Removing the brackets is easy by using .replace(), but I don't want to remove the brackets from strings (#) and (##).
my code:
ch = "()"
for w in li:
    if w in ["(#)", "(##)"]:
        print(w)
    else:
        for c in ch:
            w.replace(c, "")
        print(w)
but this doesn't remove the brackets from the words.
You can use re.sub. In particular, note that it can take a function as repl parameter. The function takes a match object, and returns the desired replacement based on the information the match object has (e.g., m.group(1)).
import re
lst = ['Therefore', 'allowance', '(#)', 't(o)o', 'perfectly', 'gentleman', '(##)', 'su(p)posing', 'man', 'his', 'now']
def remove_paren(m):
    return m.group(0) if m.group(1) in ('#', '##') else m.group(1)
output = [re.sub(r"\((.*?)\)", remove_paren, word) for word in lst]
print(output) # ['Therefore', 'allowance', '(#)', 'too', 'perfectly', 'gentleman', '(##)', 'supposing', 'man', 'his', 'now']
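As for why the loop in the question leaves the brackets in place: Python strings are immutable, so w.replace(c, "") returns a new string, and that result is discarded unless it is assigned back. A minimal sketch of the question's loop with that one change (the result list is a name introduced here for illustration, not part of the original code):

```python
li = ['Therefore', 'allowance', '(#)', 't(o)o', 'perfectly', 'gentleman',
      '(##)', 'su(p)posing', 'man', 'his', 'now']

result = []  # hypothetical name, not in the original code
for w in li:
    if w not in ("(#)", "(##)"):
        for c in "()":
            # replace() returns a new string, so rebind w to keep the change
            w = w.replace(c, "")
    result.append(w)

print(result)
```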
def removeparanthesis(s):
    a = ''
    for i in s:
        if i not in '()':
            a += i
    return a
a = ['Therefore', 'allowance', '(#)', 't(o)o', 'perfectly', 'gentleman', '(##)', 'su(p)posing', 'man', 'his', 'now']
b = []
for i in a:
    if i == '(#)' or i == '(##)':
        b.append(i)
    else:
        b.append(removeparanthesis(i))
print(b)
# removeparanthesis() strips the parentheses from every word; the literal '(#)' and '(##)' entries are appended unchanged
Give this a try!
Here, I define a second, empty list and loop over the original list, appending each word to the new list after cleaning it.
As you can see, there are two loops. The inner one goes through each character of a word; whenever it meets a ( or ) it skips it, and otherwise it appends the character to the new word.
To keep (#) and (##), we skip the inner loop for them, but we still append them unchanged to the new list.
li = ["Therefore", "allowance", "(#)", "t(o)o" , "perfectly", "gentleman", "(##)", "su(p)posing", "man", "his", "now"]
new_li = []
for index, w in enumerate(li):
    if w in ["(#)", "(##)"]:
        new_li.append(w)
        continue
    new_word = ""
    for c in w:
        if c == "(" or c == ")":
            continue
        new_word = new_word + c
    new_li.append(new_word)
print(new_li)
Related
I was trying to create a program that removes all sorts of punctuation from a given input sentence. The code looked somewhat like this
from string import punctuation
sent = str(input())
def rempunc(string):
    for i in string:
        word =''
        list = [0]
        if i in punctuation:
            x = string.index(i)
            word += string[list[-1]:x]+' '
            list.append(x)
    list_2 = word.split(' ')
    return list_2
print(rempunc(sent))
However the output is coming out as follows:
This state ment has # 1 ! punc.
['This', 'state', 'ment', 'has', '#', '1', '!', 'punc', '']
Why isn't the punctuation being removed entirely? Am I missing something in the code?
I tried changing x with x-1 in line 7 but it did not help. Now I'm stuck and don't know what else to try.
Repeated string slicing isn't necessary here.
I would suggest using filter() to filter out the undesired characters for each word, and then reading that result into a list comprehension. From there, you can use a second filter() operation to remove the empty strings:
from string import punctuation
def remove_punctuation(s):
    cleaned_words = [''.join(filter(lambda x: x not in punctuation, word))
                     for word in s.split()]
    return list(filter(lambda x: x != "", cleaned_words))
print(remove_punctuation(input()))
This outputs:
['This', 'state', 'ment', 'has', '1', 'punc']
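For comparison, here is a sketch of the same idea using str.translate instead of filter(). It assumes only the ASCII characters in string.punctuation need removing; the function name remove_punctuation_translate is made up for this example.

```python
from string import punctuation

def remove_punctuation_translate(s):
    # Passing punctuation as the third argument of str.maketrans
    # builds a table that deletes those characters.
    table = str.maketrans("", "", punctuation)
    stripped = (word.translate(table) for word in s.split())
    # Drop words that consisted only of punctuation
    return [w for w in stripped if w]

print(remove_punctuation_translate("This state ment has # 1 ! punc."))
# ['This', 'state', 'ment', 'has', '1', 'punc']
```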
I want to return a list of the words that contain a given letter, ignoring case.
Say I have sentence = "Anyone who has never made a mistake has never tried anything new", then f(sentence, 'a') would return
['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
This is what i have
import re
def f(string, match):
    string_list = string.split()
    match_list = []
    for word in string_list:
        if match in word:
            match_list.append(word)
    return match_list
You don't need re. Use str.casefold:
[w for w in sentence.split() if "a" in w.casefold()]
Output:
['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
You can use string splitting for it, if there is no punctuation.
match_list = [s for s in sentence.split(' ') if 'a' in s.lower()]
Here's another variation :
sentence = 'Anyone who has never made a mistake has never tried anything new'
def f(string, match):
    match_list = []
    for word in string.split():
        if match in word.lower():
            match_list.append(word)
    return match_list
print(f(sentence, 'a'))
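Note that the variants above only normalize the word, so they would miss everything if the letter were passed in upper case. A small sketch that casefolds both sides (the name words_containing is an assumption for this example, not from the question):

```python
def words_containing(sentence, letter):
    # Normalize the search letter once, then each word as it is tested
    needle = letter.casefold()
    return [w for w in sentence.split() if needle in w.casefold()]

sentence = "Anyone who has never made a mistake has never tried anything new"
print(words_containing(sentence, "A"))
# ['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
```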
I want to write Python codes to search the matching items from "word" using the "letter" list.
I created 2 lists as follow - word & letter:
word = ['hello', 'how', 'are', 'you', 'potato']
letter = ['how', 'ell', 'aaa', 'bbb', 'tat', 're']
What I want to get is the following output as a list, with the mapping result between "word" and "letter" list.
If a complete string is matched, the result will return as "True".
If a partial string is matched, the result will return as "True".
If no part of the string is matched, the result will return as "False".
word_result = ['True', 'True', 'True', 'False', 'True']
I tried on my own using for loop / if...else / import re, but could not get the result as what I want.
Can anyone give a hand to assist?
Thank you so much!
I tested the code below, but it does not work:
word = ['hello', 'how', 'are', 'you', 'potato']
letters = ['ell', 'how', 'aaa', 'bbb', 'tat', 're']
def check_match():
    for l in letters:
        if l in word:
            print(l)
print(check_match())
Expect result:
word_result = ['True', 'True', 'True', 'False', 'True']
Use list comprehension with any:
word = ['hello', 'how', 'are', 'you', 'potato']
letters = ['ell', 'how', 'aaa', 'bbb', 'tat']
def check_match():
    return [any(x in i for x in letters) for i in word]
print(check_match())
Output:
[True, True, False, False, True]
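The third entry is False here only because this letters list leaves out 're'. As a quick check, a sketch of the same comprehension run against the six-entry letter list from the question:

```python
word = ['hello', 'how', 'are', 'you', 'potato']
letter = ['how', 'ell', 'aaa', 'bbb', 'tat', 're']

# For each word, is any entry of letter a substring of it?
word_result = [any(part in w for part in letter) for w in word]
print(word_result)  # [True, True, True, False, True]

# The question shows the values as strings; str() converts them if that matters
word_result_as_text = [str(flag) for flag in word_result]
```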
I am not sure if your expected output is right.
Here is code that should work:
def check_match():
    res = []
    for l in letters:
        for w in word:
            if l in w:
                res.append(True)
                break
        else:
            res.append(False)
    return res
output:
[True, True, False, False, True]
EDIT:
Now I get your question... First of all, you showed one input:
word = ['hello', 'how', 'are', 'you', 'potato']
letter = ['how', 'ell', 'aaa', 'bbb', 'tat', 're']
Then you used a different one:
word = ['hello', 'how', 'are', 'you', 'potato']
letters = ['ell', 'how', 'aaa', 'bbb', 'tat']
Second, you want to check whether each entry in word contains some entry from letters. All you need to do is swap the loops:
def check_match():
    res = []
    for w in word:
        for l in letters:
            if l in w:
                res.append(True)
                break
        else:
            res.append(False)
    return res
output:
[True, True, True, False, True]
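In case the else looks misplaced: it belongs to the inner for loop, not to the if. A for ... else block runs its else part only when the loop finishes without hitting break. A small sketch of that behaviour (first_match is a hypothetical helper, and the six-entry letters list from the question is assumed):

```python
letters = ['ell', 'how', 'aaa', 'bbb', 'tat', 're']

def first_match(w):
    for l in letters:
        if l in w:
            found = l
            break          # leaving via break skips the else block
    else:
        found = None       # runs only if the loop was never broken out of
    return found

print(first_match('potato'))  # tat
print(first_match('you'))     # None
```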
You do not have a precise definition of partial match, which makes it difficult to answer your question in full.
However, we can confine this (missing) piece of wisdom into a function:
def partial_match(word, letter):
    ...
The rest of the logic can be easily written with either a nested loop:
words = ['hello', 'how', 'are', 'you', 'potato']
letters = ['ell', 'how', 'aaa', 'bbb', 'tat']
results = []
for word in words:
    result = False
    for letter in letters:
        if word == letter or partial_match(word, letter):
            result = True
            break
    results.append(result)
The inner part of the loop is a common design pattern in programming, and Python offers a shortcut for this using the any() primitive and rewriting the inner loop as a comprehension:
results = []
for word in words:
    result = any(
        word == letter or partial_match(word, letter)
        for letter in letters)
    results.append(result)
or, even more compactly, rewriting both loops as comprehensions:
results = [
    any(
        word == letter or partial_match(word, letter)
        for letter in letters)
    for word in words]
Now let us focus on partial_match(). If all you want is to make sure that letter is contained in word, e.g.:
partial_match('how', 'ow') == True
partial_match('how', 'ho') == True
partial_match('how', 'o') == True
partial_match('how', 'oww') == False
partial_match('how', 'wow') == False
partial_match('how', 'hoe') == False
partial_match('how', 'xxx') == False
Then you can simply use:
def partial_match(word, letter):
    return letter in word
and, noticing that the exact match (described by word == letter) also satisfies partial_match(), you end up with @U10-Forward's answer by omitting the == check, inlining partial_match() and applying a couple of renames.
If your partial_match() should be different, all of the above is still valid, and you just need to refine that function.
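To illustrate that last point, here is a sketch with a deliberately different, purely hypothetical definition of partial match: the letter string must appear at the start of the word. Only partial_match() changes; the comprehension stays the same.

```python
def partial_match(word, letter):
    # Hypothetical alternative rule: count only matches at the start of the word
    return word.startswith(letter)

words = ['hello', 'how', 'are', 'you', 'potato']
letters = ['ell', 'how', 'aaa', 'bbb', 'tat']

results = [
    any(
        word == letter or partial_match(word, letter)
        for letter in letters)
    for word in words]
print(results)  # [False, True, False, False, False]
```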
Write a function splitText(text), where text is a string, that returns the list of words obtained by splitting the string text.
See example below:
sampleText = "As Python's creator, I'd like to say a few words about its origins."
splitText(sampleText)
['As', 'Python', 's', 'creator', 'I', 'd', 'like', 'to', 'say', 'a', 'few', 'words', 'about', 'its', 'origins']
You must NOT use the method split() from the str type; however, other methods from the class are allowed. You must not use Python library modules such as string.py.
This is my code:
def split(text):
    final_lst = ""
    length = len(text)
    for x in range(length):
        if text[x].isalpha() == True:
            final_lst = final_lst + text[x]
        else:
            final_lst = final_lst + ", "
    final_len = len(final_lst)
    for a in range(final_len):
        if final_lst[:a] == " " or final_lst[:a] == "":
            final_lst = "'" + final_lst[a]
        if final_lst[a:] == " " or final_lst[a:] == ", ":
            final_lst = final_lst[a] + "'"
        elif final_lst[:a].isalpha() or final_lst[a:].isalpha():
            final_lst[a]
    print(final_lst)
split(sampleText)
When I run it I get this:
'A
I've tried lots of things to try and solve.
First of all, your function name is wrong. You have split(text) and the exercise specifically calls for splitText(text). If your class is graded automatically, for example by a program that just loads your code and tries to run splitText(), you'll fail.
Next, this would be a good time for you to learn that a string is an iterable object in Python. You don't have to use an index - just iterate through the characters directly.
for ch in text:
Next, as @Evert pointed out, you are trying to build a list, not a string. So use the correct Python syntax:
final_list = []
Next, let's think about how you can process one character at a time and get this done. When you see a character, you can determine whether it is, or is not, an alphabetic character. You need one more piece of information: what were you doing before?
If you are in a "word", and you get "more word", you can just append it.
If you are in a "word", and you get "not a word", you have reached the end of the word and should add it to your list.
If you are in "not a word", and you get "not a word", you can just ignore it.
If you are in "not a word", and you get "word", that's the start of a new word.
Now, how can you tell whether you are in a word or not? Simple. Keep a word variable.
def splitText(text):
    """Split text on any non-alphabetic character, return list of words."""
    final_list = []
    word = ''
    for ch in text:
        if word: # Empty string is false!
            if ch.isalpha():
                word += ch
            else:
                final_list.append(word)
                word = ''
        else:
            if ch.isalpha():
                word += ch
            else:
                # still not alpha.
                pass
    # Handle end-of-text with word still going
    if word:
        final_list.append(word)
    return final_list
sampleText = "As Python's creator, I'd like to say a few words about its origins."
print(splitText(sampleText))
Output is:
['As', 'Python', 's', 'creator', 'I', 'd', 'like', 'to', 'say', 'a', 'few', 'words', 'about', 'its', 'origins']
Next, if you sit and stare at it for a while you'll realize that you can combine some of the cases. It boils down nicely- try turning it inside out by moving the outer if to the inside, and see what you get.
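For reference, one way that "inside out" version might look. This is only a sketch of the simplification hinted at above, not necessarily the form the answer had in mind: test the character first, and look at word only when the character is not a letter.

```python
def splitText(text):
    """Split text on any non-alphabetic character, return list of words."""
    final_list = []
    word = ''
    for ch in text:
        if ch.isalpha():
            word += ch                 # start or continue a word
        elif word:
            final_list.append(word)    # a non-letter ends the current word
            word = ''
        # non-letter while not in a word: nothing to do
    if word:
        final_list.append(word)        # text ended while a word was still going
    return final_list

sampleText = "As Python's creator, I'd like to say a few words about its origins."
print(splitText(sampleText))
```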
To me, it looks like you are overcomplicating things. Basically, all you need to do is go through the text character by character, combining characters into words; once you find a space, you end the current word and add it to the result list. After you run out of text, you just return the list.
def splittext(text):
    result = []
    word = ""
    for i in text:
        if i != " ":
            word += i
        else:
            result.append(word)
            word = ""
    result.append(word)
    return result
This should work:
sampleText = 'As Python\'s creator, I\'d like to say a few words about its origins.'
def split(text):
    result = []
    temp = ""
    for ch in text:
        if ch.isalpha():
            temp = temp + ch
        elif temp:
            # a non-letter ends the current word; empty runs are skipped
            result.append(temp)
            temp = ""
    if temp:
        # keep the last word when the text does not end in punctuation
        result.append(temp)
    print(result)
split(sampleText)
Can you cheat with regular expressions?
import re
sampleText = "As Python's creator, I'd like to say a few words about its origins."
result = re.findall(r'\w+', sampleText)
>>> result
['As', 'Python', 's', 'creator', 'I', 'd', 'like', 'to', 'say', 'a', 'few', 'words', 'about', 'its', 'origins']
def stringSplitter(string):
    words = []
    current_word = ""
    for x in range(len(string)):
        if string[x] == " ":
            words.append(current_word)
            current_word = ""
        else:
            current_word += string[x]
    # the last word has no trailing space, so append it after the loop
    words.append(current_word)
    return words
I'm trying to convert a string to a list of words using python. I want to take something like the following:
string = 'This is a string, with words!'
Then convert to something like this :
list = ['This', 'is', 'a', 'string', 'with', 'words']
Notice the omission of punctuation and spaces. What would be the fastest way of going about this?
I think this is the simplest way for anyone else stumbling on this post given the late response:
>>> string = 'This is a string, with words!'
>>> string.split()
['This', 'is', 'a', 'string,', 'with', 'words!']
Try this:
import re
mystr = 'This is a string, with words!'
wordList = re.sub(r"[^\w]", " ", mystr).split()
How it works:
From the docs :
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function.
So in our case:
The pattern [^\w] matches any non-alphanumeric character.
\w means any alphanumeric character; for ASCII text it is equal to the character set
[a-zA-Z0-9_]
that is, a to z, A to Z, 0 to 9 and underscore. (In Python 3, \w also matches Unicode letters and digits by default.)
So we match every non-alphanumeric character and replace it with a space,
and then split() splits the string on whitespace and turns it into a list.
So 'hello-world' becomes 'hello world' with re.sub, and then ['hello', 'world'] after split().
Let me know if any doubts come up.
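The two steps can be run separately to see what each one does. A minimal sketch using the 'hello-world' example from above (the variable names text, spaced and words are made up here):

```python
import re

text = 'hello-world'

# Step 1: replace every character that is not a word character with a space
spaced = re.sub(r"[^\w]", " ", text)
print(spaced)          # hello world

# Step 2: split on whitespace to get the list of words
words = spaced.split()
print(words)           # ['hello', 'world']
```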
To do this properly is quite complex. For your research, it is known as word tokenization. You should look at NLTK if you want to see what others have done, rather than starting from scratch:
>>> import nltk
>>> paragraph = u"Hi, this is my first sentence. And this is my second."
>>> sentences = nltk.sent_tokenize(paragraph)
>>> for sentence in sentences:
... nltk.word_tokenize(sentence)
[u'Hi', u',', u'this', u'is', u'my', u'first', u'sentence', u'.']
[u'And', u'this', u'is', u'my', u'second', u'.']
The simplest way:
>>> import re
>>> string = 'This is a string, with words!'
>>> re.findall(r'\w+', string)
['This', 'is', 'a', 'string', 'with', 'words']
Using string.punctuation for completeness:
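import re
import string
s = 'This is a string, with words!'
# re.escape() keeps characters such as ], ^ and \ from being read as regex syntax
x = re.sub('[' + re.escape(string.punctuation) + ']', '', s).split()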
This handles newlines as well.
Well, you could use
import re
list = re.sub(r'[.!,;?]', ' ', string).split()
Note that both string and list are names of builtin types, so you probably don't want to use those as your variable names.
Inspired by @mtrw's answer, but improved to strip out punctuation at word boundaries only:
import re
import string
def extract_words(s):
    return [re.sub('^[{0}]+|[{0}]+$'.format(string.punctuation), '', w) for w in s.split()]
>>> str = 'This is a string, with words!'
>>> extract_words(str)
['This', 'is', 'a', 'string', 'with', 'words']
>>> str = '''I'm a custom-built sentence with "tricky" words like https://stackoverflow.com/.'''
>>> extract_words(str)
["I'm", 'a', 'custom-built', 'sentence', 'with', 'tricky', 'words', 'like', 'https://stackoverflow.com']
Personally, I think this is slightly cleaner than the answers provided:
import re
def split_to_words(sentence):
    return list(filter(lambda w: len(w) > 0, re.split(r'\W+', sentence))) # Use sentence.lower(), if needed
A regular expression for words would give you the most control. You would want to carefully consider how to deal with words with dashes or apostrophes, like "I'm".
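As a rough sketch of that idea, the pattern below treats a word as a run of ASCII letters that may contain single apostrophes or hyphens between letter runs. Both the pattern and the function name find_words are assumptions made for this example; the right rule depends on your text.

```python
import re

# Letters, optionally followed by groups of (' or -) plus more letters
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def find_words(text):
    return WORD_PATTERN.findall(text)

print(find_words("I'm a custom-built sentence, with words!"))
# ["I'm", 'a', 'custom-built', 'sentence', 'with', 'words']
```

Digits, accented letters and typographic apostrophes are not covered by this pattern and would need extra character classes.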
list=mystr.split(" ",mystr.count(" "))
This way you eliminate every special char outside of the alphabet:
def wordsToList(strn):
    L = strn.split()
    cleanL = []
    abc = 'abcdefghijklmnopqrstuvwxyz'
    ABC = abc.upper()
    letters = abc + ABC
    for e in L:
        word = ''
        for c in e:
            if c in letters:
                word += c
        if word != '':
            cleanL.append(word)
    return cleanL
s = 'She loves you, yea yea yea! '
L = wordsToList(s)
print(L) # ['She', 'loves', 'you', 'yea', 'yea', 'yea']
I'm not sure if this is fast or optimal or even the right way to program.
def split_string(string):
    return string.split()
This function will return the list of words of a given string.
In this case, if we call the function as follows,
string = 'This is a string, with words!'
split_string(string)
The return output of the function would be
['This', 'is', 'a', 'string,', 'with', 'words!']
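Note that the punctuation stays attached to 'string,' and 'words!'. If that is not wanted, one possible follow-up step is to strip punctuation from both ends of each word with str.strip. This is a sketch that assumes ASCII punctuation only; split_string_clean is a made-up name.

```python
import string

def split_string_clean(text):
    # Strip leading and trailing punctuation from each whitespace-separated word
    stripped = (word.strip(string.punctuation) for word in text.split())
    # Drop anything that was punctuation only
    return [w for w in stripped if w]

print(split_string_clean('This is a string, with words!'))
# ['This', 'is', 'a', 'string', 'with', 'words']
```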
This is from my attempt on a coding challenge that can't use regex,
outputList = "".join((c if c.isalnum() or c=="'" else ' ') for c in inputStr ).split(' ')
The role of apostrophe seems interesting.
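One detail worth knowing about that one-liner: split(' ') splits on every single space, so two adjacent spaces (for example, where a comma was replaced) produce empty strings in the result. A sketch comparing it with split() without arguments, using the example sentence from the question as an assumed input:

```python
inputStr = 'This is a string, with words!'

# Same replacement step as above: keep alphanumerics and apostrophes
cleaned = "".join(c if c.isalnum() or c == "'" else ' ' for c in inputStr)

with_empties = cleaned.split(' ')   # splits on each individual space
without_empties = cleaned.split()   # splits on runs of whitespace

print(with_empties)     # ['This', 'is', 'a', 'string', '', 'with', 'words', '']
print(without_empties)  # ['This', 'is', 'a', 'string', 'with', 'words']
```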
Probably not very elegant, but at least you know what's going on.
my_str = "Simple sample, test! is, olny".lower()
my_lst = []
temp = ""
for ch in my_str:
    if ch in [',', '.', '!', '(', ')', ':', ';', '-']:
        continue  # skip punctuation
    if ch == ' ':
        # keep only words longer than 3 characters
        if len(temp) > 3:
            my_lst.append(temp)
        temp = ""  # reset for the next word either way
    else:
        temp = temp + ch
# the last word is not followed by a space, so handle it after the loop
if len(temp) > 3:
    my_lst.append(temp)
print(my_lst)
You can try and do this:
# str.maketrans needs both strings to be the same length: each character maps to a space
tryTrans = str.maketrans(",!", "  ")
text = "This is a string, with words!"
text = text.translate(tryTrans)
listOfWords = text.split()