I need to remove every character in "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.," from my words. I am using the replace method, but it seems to be deprecated in Python 3.6. word_list = [] is a list that will hold all the words extracted from the web page, and the clean_up_list function should strip the symbols from each word.
I use a for loop over the length of symbols and replace each symbol with an empty string, using
word = word.replace(symbols[i],""). How can I use the replace method so that the symbols are removed and the words are printed without symbols between them?
Error:
AttributeError: 'list' object has no attribute 'replace'
My Code:
# Assumed imports (not shown in the original post)
import urllib.request
import bs4 as bs

url = urllib.request.urlopen("https://www.servicenow.com/solutions-by-category.html").read()
word_list = []
soup = bs.BeautifulSoup(url,'lxml')
word_list.append([element.get_text() for element in soup.select('a')])
print(word_list)

def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        symbols = "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.,"
        for i in range(0,len(symbols)):
            word = word.replace(symbols[i],"")
            #print(type(word))
        #print(type(word))
        #word.replace(symbols[i]," ")
        if(len(word) > 0):
            #print(word)
            clean_word_list.append(word)
There are two errors here: first you do not construct a list of strings, but a list of lists of strings. This line:
word_list.append([element.get_text() for element in soup.select('a')])
should be:
word_list.extend([element.get_text() for element in soup.select('a')])
Furthermore, you cannot call replace on the list directly (it is not a method of list objects). You need to do this for every entry.
You also (correctly) note that you then have to call replace(..) for every character in the symbols string, which is inefficient. You can use translate(..) for that instead.
So you can replace the entire for loop with a list comprehension (in Python 3, translate(..) takes a single table built with str.maketrans(..)):
symbols = "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.,"
table = str.maketrans('', '', symbols)
clean_word_list = [word.translate(table) for word in word_list]
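As a minimal sketch of the translate(..) approach on Python 3: the sample list below is hypothetical and stands in for the scraped link texts, and dropping empty results mirrors the len(word) > 0 check in the question.

```python
# Characters to strip; copied from the question.
symbols = "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.,"

# Build the deletion table once: the third argument lists characters to remove.
table = str.maketrans("", "", symbols)

def clean_up_list(word_list):
    """Return the words with all symbols removed, dropping empty results."""
    cleaned = (word.translate(table) for word in word_list)
    return [word for word in cleaned if word]

# Hypothetical sample data standing in for the scraped link texts.
sample = ["Hello, world!", "(IT) Service", "\n", "plain"]
print(clean_up_list(sample))
```

Building the table once outside the loop avoids recomputing it for every word.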
Try explicitly converting the word to a string: the error message you're receiving says the object is a 'list', not a string, and the replace method cannot be called on lists. For example (notice the second to last line):
def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        word = str(word)
        symbols = "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.,"
I'm trying to remove punctuation from a tokenized text in Python like so:
word_tokens = nltk.word_tokenize(text)
w = word_tokens
for e in word_tokens:
    if e in punctuation_marks:
        w.remove(e)
This works somewhat: it removes many of the punctuation marks, but for some reason a lot of them are still left in word_tokens.
If I run the code another time, it removes some more of them. After running the same code 3 times, all the marks are removed. Why does this happen?
It doesn't seem to matter whether punctuation_marks is a list, a string or a dictionary. I've also tried iterating over word_tokens.copy(), which does a bit better: it removes almost all marks the first time, and all of them the second time.
Is there a simple way to fix this problem so that running the code once is sufficient?
You are removing elements from the same list that you are iterating. It seems that you are aware of the potential problem, that's why you added the line:
w = word_tokens
However, that line doesn't actually create a copy of the object referenced by word_tokens, it only makes w reference the same object. In order to create a copy you can use the slicing operator, replacing the above line by:
w = word_tokens[:]
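A small sketch of the difference between an alias and a copy, using hypothetical tokens and a hypothetical punctuation set in place of the real tokenizer output:

```python
punctuation_marks = {".", ",", "!", "?"}
word_tokens = ["Hi", "!", "?", "there", ".", ","]

alias = word_tokens          # same list object, not a copy
print(alias is word_tokens)  # True

# Iterate over a shallow copy while removing from the original list.
for token in word_tokens[:]:
    if token in punctuation_marks:
        word_tokens.remove(token)

print(word_tokens)
```

Because the loop walks the copy, removals from the original no longer shift the elements that are still to be visited.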
Why don't you add the tokens that are not punctuation instead?
word_tokens = nltk.word_tokenize(text)
w = list()
for e in word_tokens:
    if e not in punctuation_marks:
        w.append(e)
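The same idea can be written as a list comprehension. This is only a sketch: the token list is a hypothetical stand-in for the tokenizer output, and using string.punctuation for punctuation_marks is an assumption.

```python
import string

# Hypothetical stand-ins for the tokenizer output and the punctuation set.
word_tokens = ["Hello", ",", "world", "!", "How", "are", "you", "?"]
punctuation_marks = set(string.punctuation)

# Keep only the tokens that are not punctuation marks.
w = [e for e in word_tokens if e not in punctuation_marks]
print(w)
```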
Suggestions:
I see you are creating word tokens. If that's the case, I would suggest removing punctuation before tokenizing the text. You can use the built-in str.translate method together with string.punctuation.
# Import the library
import string
# Initialize the translation table to remove punctuation
tr = str.maketrans("", "", string.punctuation)
# Remove punctuation
text = text.translate(tr)
# Get the word tokens
word_tokens = nltk.word_tokenize(text)
If you want to do sentence tokenization, then you may do something like the below:
from nltk.tokenize import sent_tokenize
texts = sent_tokenize(text)
for i in range(0, len(texts)):
    texts[i] = texts[i].translate(tr)
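I suggest you try a regex and append your results to a new list instead of directly manipulating word_tokens:
import re
word_tokens = nltk.word_tokenize(text)
w_ = list()
for e in word_tokens:
    cleaned = re.sub(r'[.!?\-]', '', e)  # strip the listed punctuation marks
    if cleaned:  # skip tokens that consisted only of punctuation
        w_.append(cleaned)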
You are modifying the actual word_tokens list while iterating over it, which is the problem.
For instance, say you have something like A?!B, indexed as A:0, ?:1, !:2, B:3. Your for loop has a counter (say i) that increases on each iteration. Say you remove the ? (at i=1): the remaining elements shift back (the new indexes are A:0, !:1, B:2) and your counter increments (i=2). So you missed the ! character here!
Best not to mess with the original list; simply copy the tokens you want to keep into a new one.
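The A?!B walk-through can be reproduced directly. In this sketch the visited list is a hypothetical helper added only to record which elements the loop actually sees:

```python
tokens = ["A", "?", "!", "B"]
punctuation_marks = {"?", "!"}

visited = []
for e in tokens:
    visited.append(e)  # record which elements the loop actually sees
    if e in punctuation_marks:
        tokens.remove(e)  # shifts later elements one position to the left

print(visited)  # the "!" never shows up here
print(tokens)   # so it is still in the list
```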
Given a string, I have to reverse every word while keeping the words in their places.
I tried:
def backward_string_by_word(text):
    for word in text.split():
        text = text.replace(word, word[::-1])
    return text
But if I have the string Ciao oaiC, when it tries to reverse the second word, that word is identical to the first one after it has already been reversed, so it gets replaced again. How can I avoid this?
You can do it in one line with join and a generator expression:
text = "test abc 123"
text_reversed_words = " ".join(word[::-1] for word in text.split())
s.replace(x, y) is not the correct method to use here.
It does two things:
1. find x in s
2. replace it with y
But you do not really need to find anything here, since you already have the word you want to replace. The problem is that it searches for x from the beginning of the string each time, not from the position you are currently at, so it also finds the word you have already replaced, not just the one you want to replace next.
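To see the effect concretely, here is the function from the question run on its own example input:

```python
def backward_string_by_word(text):
    # Original approach from the question: replace() rewrites every
    # occurrence of `word` anywhere in `text`, not just the current one.
    for word in text.split():
        text = text.replace(word, word[::-1])
    return text

# First pass turns "Ciao oaiC" into "oaiC oaiC"; the second pass then
# reverses both copies again.
print(backward_string_by_word("Ciao oaiC"))
```

The desired result would be "oaiC Ciao", but both words end up as "Ciao".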
The simplest solution is to collect the reversed words in a list, and then build a new string out of this list by concatenating all reversed words. You can concatenate a list of strings and separate them with spaces by using ' '.join().
def backward_string_by_word(text):
    reversed_words = []
    for word in text.split():
        reversed_words.append(word[::-1])
    return ' '.join(reversed_words)
If you have understood this, you can also write it more concisely by skipping the intermediate list with a generator expression:
def backward_string_by_word(text):
    return ' '.join(word[::-1] for word in text.split())
Splitting a string converts it to a list. You can just reassign each value of that list to the reverse of that item. See below:
text = "The cat tac in the hat"

def backwards(text):
    split_word = text.split()
    for i in range(len(split_word)):
        split_word[i] = split_word[i][::-1]
    return ' '.join(split_word)

print(backwards(text))
Here is my code -
sentence = input("Enter a sentence without punctuation")
sentence = sentence.lower()
words = sentence.split()
pos = [words.index(s)+1 for s in words]
hi = print("This sentence can be recreated from positions", pos)
print(hi)
saveFile = open("exampleFile.txt" , "w")
saveFile.write(hi)
saveFile.close()
However, I get the error TypeError: write() argument must be str, not None,
and I'm not sure how to fix it.
print() always returns None, so hi is None and write(hi) raises the TypeError; write() needs a string. write('+'.join([str(x) for x in pos])) should work for you.
Replace the + with whatever delimiter you want.
Similar to your original line [words.index(s)+1 for s in words], this list comprehension is a short form of a loop.
It takes every element in pos, names it x and applies the function str(x). The result of str(x) is then added to a new list.
So [1, 2, 3, 4] is converted to a new list ['1', '2', '3', '4'].
Finally, '+'.join(new_list) joins all elements using '+' as the delimiter.
So we end up with the string 1+2+3+4.
Note that this looks the same as above, but now it consists of characters, not numbers.
The string is then the final argument, so Python 'sees' write('1+2+3+4').
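Putting the pieces together, a hedged sketch of the whole script might look like this. The fixed sentence replaces input() and the temporary directory is an assumption, chosen so the example does not overwrite a real file:

```python
import os
import tempfile

sentence = "the cat sat on the mat"  # hypothetical input instead of input()
words = sentence.lower().split()
pos = [words.index(s) + 1 for s in words]

# Build the message as a string first; print() only displays it and returns None.
message = "This sentence can be recreated from positions " + "+".join(str(x) for x in pos)
print(message)

# Write to a temporary directory so the sketch does not clobber real files.
path = os.path.join(tempfile.mkdtemp(), "exampleFile.txt")
with open(path, "w") as save_file:
    save_file.write(message)
```

The with statement closes the file automatically, so no explicit close() call is needed.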
def title_case(title, minor_words = 0):
    title = title.lower().split(" ")
    title_change = []
    temp = []
    if minor_words != 0 :
        minor_words = minor_words.lower().split(" ")
        for i in range(len(title)):
            if (i != 0 and title[i] not in minor_words) or (i == 0 and title[i] in minor_words):
                temp = list(title[i].lower())
                temp[0] = temp[0].upper()
                title_change.append("".join(temp))
            else:
                title_change.append(title[i])
            temp = []
    else:
        for i in range(len(title)):
            temp = list(title[i])
            temp[0] = temp[0].upper()
            title_change.append("".join(temp))
            temp = []
    return " ".join(title_change)
Hello, this is my Python code.
This is the question:
A string is considered to be in title case if each word in the string is either (a) capitalised (that is, only the first letter of the word is in upper case) or (b) considered to be an exception and put entirely into lower case unless it is the first word, which is always capitalised.
Write a function that will convert a string into title case, given an optional list of exceptions (minor words). The list of minor words will be given as a string with each word separated by a space. Your function should ignore the case of the minor words string -- it should behave in the same way even if the case of the minor word string is changed.
I am trying not to use capitalize() for this. My code seems to work fine on my computer, but Codewars reports "IndexError: list index out of range".
Your code will break if title has leading or trailing spaces, or two consecutive spaces, such as "foo  bar". It will also break on an empty string. That's because title.lower().split(" ") on any of those kinds of titles will give you an empty string as one of your "words", and then temp[0] will cause an IndexError later on.
You can avoid the issue by using split() with no argument. It will split on any kind of whitespace in any combinations. Multiple spaces will be treated just like one space, and leading or trailing whitespace will be ignored. An empty string will become an empty list when split is called, rather than a list with one empty string in it.
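The difference between the two calls is easy to check on a hypothetical title with extra spaces:

```python
title = "  foo  bar "

print(title.split(" "))  # empty strings appear for each extra space
print(title.split())     # runs of whitespace collapse; edges are ignored
print("".split(" "))     # a list holding one empty string
print("".split())        # an empty list
```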
Just as a supplement to @Blckknght's explanation, here is an illuminating console session that steps through what's happening to your variable.
>>> title = ''
>>> title = title.lower().split(' ')
>>> title
['']
>>> temp = list(title[0])
>>> temp
[]
>>> temp[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
I tried your solution on other (non-whitespace) inputs, and it works fine.
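For comparison, here is one possible sketch of the function built on split() with no argument and without capitalize(). It is not the asker's code: the empty-string default for minor_words and rejoining with single spaces are assumptions.

```python
def title_case(title, minor_words=""):
    """Title-case `title`; minor words stay lower case unless they come first."""
    minor = set(minor_words.lower().split())
    result = []
    for i, word in enumerate(title.lower().split()):
        if i == 0 or word not in minor:
            word = word[0].upper() + word[1:]  # capitalise without capitalize()
        result.append(word)
    return " ".join(result)

print(title_case("a clash of KINGS", "a an the of"))
print(repr(title_case("")))
```

Because split() never yields empty strings, word[0] is always safe here.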
This is a homework question. I need to define a function that takes a word and letter and deletes all occurrences of that letter in the word. I can't use stuff like regex or the string library. I've tried...
def delete(word,letter):
    word = []
    char = ""
    if char != letter:
        word+=char
    return word
and
def delete(word,letter):
    word = []
    char = ""
    if char != letter: #I also tried "if char not letter" for both
        word = word.append(char)
    return word
Neither gives any output. What am I doing wrong?
Well, look at your functions closely:
def delete(word,letter):
    word = []
    char = ""
    if char != letter:
        word+=char # or `word = word.append(char)` in 2nd version
    return word
So, the function gets a word and a letter passed in. The first thing you do is throw away the word, because you are overwriting the local variable with a different value (a new empty list). Next, you are initializing an empty string char and compare its content (it’s empty) with the passed letter. If they are not equal, i.e. if letter is not an empty string, the empty string in char is added to the (empty list) word. And then word is returned.
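Also note that + between a list and a string is not defined: the + operation on lists is only implemented to combine two lists (word += char happens to run, because += extends the list with the individual characters of the string). append is the right method for adding one item to a list, but it returns None, so word = word.append(char) throws the list away. Given that you want a string as a result, it makes more sense to just store the result as one to begin with.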
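A quick check of how these list operations behave in Python 3, using throwaway values:

```python
word = []
try:
    word + "ab"            # list + str is not defined
except TypeError as exc:
    print("TypeError:", exc)

word += "ab"               # += extends the list with each character
print(word)

result = word.append("c")  # append mutates in place and returns None
print(result, word)
```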
Instead of adding an empty string to an empty string/list when something completely unrelated to the passed word happens, what you rather want to do is keep the original word intact and somehow look at each character. You basically want to loop through the word and keep all characters that are not the passed letter; something like this:
def delete(word, letter):
    newWord = '' # let's not overwrite the passed word
    for char in word:
        # `char` is now each character of the original word.
        # Here you now need to decide if you want to keep the
        # character for `newWord` or not.
        pass # placeholder so the skeleton runs; replace it with your decision
    return newWord
The for var in something will basically take the sequence something and execute the loop body for each value of that sequence, identified using the variable var. Strings are sequences of characters, so the loop variable will contain a single character and the loop body is executed for each character within the string.
You're not doing anything with word passed to your function. Ultimately, you need to iterate over the word passed into your function (for character in word: doSomething_with_character) and build your output from that.
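def delete(word, ch):
    # In Python 3, filter returns an iterator, so join the kept characters back into a string.
    return ''.join(filter(lambda c: c != ch, word))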
Basically, just a linear pass over the string, dropping out letters that match ch.
filter takes a higher order function and an iterable. A string is an iterable and iterating over it iterates over the characters it contains. filter removes the elements from the iterable for which the higher order function returns False.
In this case, we filter out all characters that are equal to the passed ch argument.
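A short usage sketch, assuming Python 3, where filter returns a lazy iterator rather than a string and the result therefore has to be joined:

```python
def delete(word, ch):
    # filter() yields the characters for which the lambda returns True.
    kept = filter(lambda c: c != ch, word)
    return "".join(kept)  # join the iterator back into a string (Python 3)

print(delete("banana", "a"))
```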
I like the functional style of @TC1 and @user2041448; it is worth understanding. Here's another implementation:
def delete( letter, string ):
    s2 = []
    for c in string:
        if c!=letter:
            s2.append( c )
    return ''.join(s2)
Your first function uses the + operator with a list, which probably isn't the most appropriate choice. The + operator should probably be reserved for strings (use the .append() method with lists).
If the intent is to return a string, assign "" instead of [], and use the + operator.
If the intent is to return a list of characters, assign [], and use the .append() method.
Change the name of the variable you are using to construct the returned value.
Assigning anything to word gets rid of the content that was given to the function as an argument.
So make it result=[] or result="", etc.
ALSO:
The way you seem to be attempting to solve this requires you to loop over the characters in the original string; the code you posted does not loop at all.
You could use a for loop with this type of structure:
for characterVar in stringVar:
    controlled-code-here
code-after-loop
You can and should change the names, of course, but I named them in a way that should help you understand. In your case stringVar would be replaced with word, and you would append or add characterVar to result if it isn't the deleted character. Any code that you wish to be contained in the loop must be indented. The first unindented line following the control line indicates to Python that the code comes AFTER the loop.
This is what I came up with:
def delete(word, letter):
    new_word = ""
    for i in word:
        if i != letter:
            new_word += i
    return new_word