Python: replace nth word in string - python

What is the easiest way in Python to replace the nth word in a string, assuming each word is separated by a space?
For example, I want to replace the tenth word of a string and get the resulting string.

I guess you may do something like this:
nreplace=1
my_string="hello my friend"
words=my_string.split(" ")
words[nreplace]="your"
" ".join(words)
Here is another way of doing the replacement:
nreplace=1
words=my_string.split(" ")
" ".join([words[word_index] if word_index != nreplace else "your" for word_index in range(len(words))])

Let's say your string is:
my_string = "This is my test string."
You can split the string into words using split():
my_list = my_string.split()
Which will set my_list to
['This', 'is', 'my', 'test', 'string.']
You can replace the 4th list item using
my_list[3] = "new"
And then put it back together with
my_new_string = " ".join(my_list)
Giving you
"This is my new string."
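The split, replace and join steps above can be wrapped in a small reusable function. This is only a sketch: the name replace_nth_word and the zero-based index n are assumptions made for illustration, not part of the answers above. Because split() without arguments collapses runs of whitespace, the original spacing is not preserved.

```python
def replace_nth_word(text, n, new_word):
    """Return text with the word at zero-based index n replaced (hypothetical helper)."""
    words = text.split()          # split on any run of whitespace
    if not 0 <= n < len(words):   # leave the text unchanged if n is out of range
        return text
    words[n] = new_word
    return " ".join(words)


print(replace_nth_word("This is my test string.", 3, "new"))
# This is my new string.
```

Returning the text unchanged for an out-of-range index is a design choice; raising IndexError would be an equally reasonable alternative.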

A solution involving list comprehension:
text = "To be or not to be, that is the question"
replace = 6
replacement = 'it'
print(' '.join([x if index != replace else replacement for index, x in enumerate(text.split())]))
The above produces:
To be or not to be, it is the question

You could use a generator expression and the string join() method:
my_string = "hello my friend"
nth = 0
new_word = 'goodbye'
print(' '.join(word if i != nth else new_word
               for i, word in enumerate(my_string.split(' '))))
Output:
goodbye my friend

Using re.sub:
>>> import re
>>> my_string = "hello my friend"
>>> new_word = 'goodbye'
>>> re.sub(r'^(\s*(?:\S+\s+){0})\S+', r'\1'+new_word, my_string)
'goodbye my friend'
>>> re.sub(r'^(\s*(?:\S+\s+){1})\S+', r'\1'+new_word, my_string)
'hello goodbye friend'
>>> re.sub(r'^(\s*(?:\S+\s+){2})\S+', r'\1'+new_word, my_string)
'hello my goodbye'
Just replace the number inside the curly braces with the position of the word you want to replace minus 1, i.e. use 0 to replace the first word, 1 for the second word, and so on.
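The pattern above can be parameterised so the index does not have to be typed into the regex by hand. The sketch below assumes a hypothetical helper named replace_nth_word_regex with a zero-based index n. It passes a function as the replacement so that backslashes or digits in new_word are not interpreted as group references.

```python
import re


def replace_nth_word_regex(text, n, new_word):
    """Replace the word at zero-based index n, keeping the original whitespace."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Skip n words (with their trailing whitespace), then match the target word.
    pattern = r'^(\s*(?:\S+\s+){%d})\S+' % n
    # A function replacement inserts new_word literally.
    return re.sub(pattern, lambda m: m.group(1) + new_word, text)


print(replace_nth_word_regex("hello my friend", 1, "goodbye"))
# hello goodbye friend
```

Unlike the split/join approaches, this keeps the spacing between words intact, and it returns the text unchanged when the string has fewer than n + 1 words.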

Related

Python - Trying to replace words in a list of strings but having problems with single letter words

I have a list of strings such as
words = ['Twinkle Twinkle', 'How I wonder']
I am trying to create a function that will find and replace words in the original list and I was able to do that except for when the user inputs single letter words such as 'I' or 'a' etc.
current function
def sub(old: str, new: str, words: list):
    words[:] = [w.replace(old, new) for w in words]
if input for old = 'I'
and new = 'ASD'
current output = ['TwASDnkle TwASDnkle', 'How ASD wonder']
intended output = ['Twinkle Twinkle', 'How ASD wonder']
This is my first post here and I have only been learning Python for a few months, so I would appreciate any help. Thank you.
Don't use str.replace in a loop. This often doesn't do what is expected as it doesn't work on words but on all matches.
Instead, split the words, replace on match and join:
l = ['Twinkle Twinkle', 'How I wonder']
def sub(old: str, new: str, words: list):
    words[:] = [' '.join(new if w == old else w for w in x.split()) for x in words]
sub('I', 'ASD', l)
Output: ['Twinkle Twinkle', 'How ASD wonder']
Or use a regex with word boundaries:
import re
def sub(old, new, words):
    words[:] = [re.sub(fr'\b{re.escape(old)}\b', new, w) for w in words]
l = ['Twinkle Twinkle', 'How I wonder']
sub('I', 'ASD', l)
# ['Twinkle Twinkle', 'How ASD wonder']
NB. As @re-za pointed out, it might be better practice to return a new list rather than mutating the input; just be aware of this.
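A non-mutating variant of the word-boundary approach might look like the sketch below. The name sub_words is an assumption for illustration. It compiles the pattern once and returns a new list, leaving the input untouched.

```python
import re


def sub_words(old, new, lines):
    """Return a new list with whole-word matches of old replaced by new."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    # A function replacement keeps any backslashes in `new` literal.
    return [pattern.sub(lambda m: new, line) for line in lines]


original = ['Twinkle Twinkle', 'How I wonder']
updated = sub_words('I', 'ASD', original)
print(updated)   # ['Twinkle Twinkle', 'How ASD wonder']
print(original)  # ['Twinkle Twinkle', 'How I wonder']
```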
It seems like you are replacing letters and not words. I recommend splitting the sentences (strings) into words on the ' ' (space) character.
Start with an empty list to collect the results:
output = []
I would first get each string from the list like this:
for string in words:
I would then split each string into a list of words like this:
    temp_string = ''  # a temp string we will use later to reconstruct the words
    for word in string.split(' '):
Then I would check whether the word is the one we are looking for by comparing it to old, and replace it with new if it matches:
        if word == old:
            temp_string += new + ' '
        else:
            temp_string += word + ' '
Now that each word has been copied or replaced into temp_string, we can append each temp_string to the output list like this:
    output.append(temp_string[:-1])  # [:-1] means we omit the space at the end
It should finally look like this:
def sub(old: str, new: str, words: list):
    output = []
    for string in words:
        temp_string = ''  # a temp string we will use later to reconstruct the words
        for word in string.split(' '):
            if word == old:
                temp_string += new + ' '
            else:
                temp_string += word + ' '
        output.append(temp_string[:-1])  # [:-1] means we omit the space at the end
    return output

Splitting string using different scenarios using regex

I have two scenarios for splitting a string:
"##$hello?? getting good.<li>hii"
I want it to be split as:
'hello', 'getting', 'good.<li>hii' (Scenario 1)
'hello', 'getting', 'good', 'li', 'hi' (Scenario 2)
Any ideas, please?
Something like this should work:
>>> re.split(r"[^\w<>.]+", s) # or re.split(r"[##$? ]+", s)
['', 'hello', 'getting', 'good.<li>hii']
>>> re.split(r"[^\w]+", s)
['', 'hello', 'getting', 'good', 'li', 'hii']
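Both re.split calls above return a leading empty string because the input starts with delimiter characters. A minimal sketch of two ways to avoid that, using the same sample string: filter out empty pieces, or match the words directly with re.findall.

```python
import re

s = "##$hello?? getting good.<li>hii"

# Scenario 1: split, then drop the empty pieces left by leading delimiters.
scenario_1 = [piece for piece in re.split(r"[^\w<>.]+", s) if piece]
print(scenario_1)  # ['hello', 'getting', 'good.<li>hii']

# Scenario 2: match runs of word characters instead of splitting.
scenario_2 = re.findall(r"\w+", s)
print(scenario_2)  # ['hello', 'getting', 'good', 'li', 'hii']
```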
This might be what you're looking for: \w+ matches any word character (letter, digit, or underscore) one or more times, as many times as possible. Here is a working JavaScript example:
var value = "##$hello?? getting good.<li>hii";
var matches = value.match(
new RegExp("\\w+", "gi")
);
console.log(matches)
It works by using \w+, which matches word characters as many times as possible. You could also use [A-Za-z] to match only letters and not digits, as shown here:
var value = "##$hello?? getting good.<li>hii777bloop";
var matches = value.match(
new RegExp("[A-Za-z]+", "gi")
);
console.log(matches)
It matches the characters in the brackets one or more times, as many as possible; in this case the range a-z of lowercase characters and the range A-Z of uppercase characters. Hope this is what you want.
For the first scenario, just use a regex to find all words that contain word characters and <>.:
In [60]: re.findall(r'[\w<>.]+', s)
Out[60]: ['hello', 'getting', 'good.<li>hii']
For the second one, you need to replace the repeated characters only if they are not valid English words. You can do this using the nltk corpus and re.sub:
In [61]: import nltk
In [62]: english_vocab = set(w.lower() for w in nltk.corpus.words.words())
In [63]: repeat_regexp = re.compile(r'(\w*)(\w)\2(\w*)')
In [64]: [repeat_regexp.sub(r'\1\2\3', word) if word not in english_vocab else word for word in re.findall(r'[^\W]+', s)]
Out[64]: ['hello', 'getting', 'good', 'li', 'hi']
In case you are looking for a solution without regex: string.punctuation gives you a string of all ASCII punctuation characters.
Use it with a list comprehension to achieve your desired result:
>>> import string
>>> my_string = '##$hello?? getting good.<li>hii'
>>> ''.join([(' ' if s in string.punctuation else s) for s in my_string]).split()
['hello', 'getting', 'good', 'li', 'hii'] # desired output
Explanation: below is a step-by-step description of how it works:
import string # Importing the 'string' module
special_char_string = string.punctuation
# Value of 'special_char_string': '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
my_string = '##$hello?? getting good.<li>hii'
# Generating list of character in sample string with
# special character replaced with whitespace
my_list = [(' ' if item in special_char_string else item) for item in my_string]
# Join the list to form string
my_string = ''.join(my_list)
# Split it based on space
my_desired_list = my_string.strip().split()
The value of my_desired_list will be:
['hello', 'getting', 'good', 'li', 'hii']
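The same punctuation-to-space replacement can also be sketched with str.maketrans and str.translate, which avoids building an intermediate list of characters. The variable names below are assumptions for illustration.

```python
import string

my_string = '##$hello?? getting good.<li>hii'

# Map every ASCII punctuation character to a space.
table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

# Translate, then split on whitespace to drop the resulting gaps.
words = my_string.translate(table).split()
print(words)  # ['hello', 'getting', 'good', 'li', 'hii']
```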

Python check if string contains all words in Python

I want to check if all words are found in another string without any loops or iterations:
a = ['god', 'this', 'a']
sentence = "this is a god damn sentence in python"
all(a in sentence)
should return True.
You could use a set depending on your exact needs as follows:
a = ['god', 'this', 'a']
sentence = "this is a god damn sentence in python"
print(set(a) <= set(sentence.split()))
This prints True; for sets, <= tests whether the left set is a subset of the right one (issubset).
It should be:
all(x in sentence for x in a)
Or:
>>> chk = list(filter(lambda x: x not in sentence, a))  # Python 3; in Python 2 filter() already returns a list
>>> chk  # empty if all words from a are in sentence
[]
>>> if not chk:
...     print('All words are in sentence')
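Note that `x in sentence` is a substring test, so it also succeeds for fragments of words, while the set-based answer compares whole words. The sketch below illustrates the difference; the helper name contains_all_words is an assumption for illustration.

```python
def contains_all_words(words, sentence):
    """True if every item of words appears as a whole word in sentence."""
    return set(words) <= set(sentence.split())


sentence = "this is a god damn sentence in python"

# Substring test: fragments such as 'pyth' count as found.
print(all(x in sentence for x in ['pyth', 'sent']))        # True

# Whole-word test: the same fragments are rejected.
print(contains_all_words(['pyth', 'sent'], sentence))      # False
print(contains_all_words(['god', 'this', 'a'], sentence))  # True
```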

Iterating through a string word by word

I wanted to know how to iterate through a string word by word.
string = "this is a string"
for word in string:
    print(word)
The above gives an output:
t
h
i
s
i
s
a
s
t
r
i
n
g
But I am looking for the following output:
this
is
a
string
When you write:
for word in string:
you are not iterating through the words in the string; you are iterating through its characters. To iterate through the words, first split the string into words using str.split(), and then iterate over the result. Example:
my_string = "this is a string"
for word in my_string.split():
    print(word)
Note that str.split() without arguments splits on any run of whitespace (spaces, tabs, newlines, etc.).
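A short sketch of how split() with no arguments differs from split(' ') on text that contains repeated spaces, a tab and a trailing newline. The sample text is made up for illustration.

```python
text = "this  is\ta string\n"

# No argument: split on any run of whitespace and drop empty strings.
print(text.split())     # ['this', 'is', 'a', 'string']

# Explicit single space: only ' ' is a separator, so empty strings,
# tabs and the trailing newline survive in the result.
print(text.split(' '))  # ['this', '', 'is\ta', 'string\n']
```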
This is one way to do it:
string = "this is a string"
ssplit = string.split()
for word in ssplit:
    print(word)
Output:
this
is
a
string
for word in string.split():
    print(word)
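Using nltk:
from nltk.tokenize import sent_tokenize, word_tokenize
sentences = sent_tokenize("This is a string.")
# word_tokenize expects a single string, so tokenize each sentence separately
words_in_each_sentence = [word_tokenize(sentence) for sentence in sentences]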
You may use TweetTokenizer for parsing casual text with emoticons and such.
One way to do this is with a dictionary. The problem with the code in the question is that it iterates over each letter of the string instead of each word. To solve this, first turn the string into a list of words with the split() method, then loop over that list and keep a count for each word. The code below returns how many times each word appears in the string, in the form of a dictionary.
s = input('Enter a string to see if strings are repeated: ')
d = dict()
p = s.split()
for word in p:
    if word not in d:
        d[word] = 1
    else:
        d[word] += 1
print(d)
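The standard library's collections.Counter does the same word counting in one step. The sketch below uses a fixed sample sentence instead of input() so that it runs without user interaction.

```python
from collections import Counter

sentence = "the cat and the hat"

# Counter maps each word to the number of times it occurs.
counts = Counter(sentence.split())

print(counts["the"])          # 2
print(counts.most_common(1))  # [('the', 2)]
```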
s = 'hi how are you'
l = s.split()  # split() already returns a list, so no map() is needed
print(l)
Output: ['hi', 'how', 'are', 'you']
You can try this method also:
sentence_1 = "This is a string"
words = sentence_1.split()  # avoid naming a variable "list", which shadows the built-in
for word in words:
    print(word)

Python regex for something *not* in a dictionary?

Say I have the following dictionary:
d = {"word1":0, "word2":0}
For this regex I need to verify that a word in the string isn't a key in that dictionary.
Is it possible to set a variable to anything not in a dictionary, for the purposes of a regex?
Forget about regex in this case:
test = "word1 word2 word3"  # your string
words = test.split(' ')  # words in your string
d = {"word1": 0, "word2": 0}  # your dict (not named "dict", to avoid shadowing the built-in)
for word in words:
    if word in d:
        print(word, "is a key in dict")
    else:
        print(word, "isn't a key in dict")
>>> d = {"foo":0, "spam":0}
>>> test = "This is a string with many words, including foo and bar"
>>> any(word in d for word in test.split())
True
If punctuation is a problem (for example, "This is foo." would not find foo with this approach), and since you said all your words are alphanumeric, you could also use
>>> import re
>>> test = "This is foo."
>>> any(word in d for word in re.findall("[A-Za-z0-9]+", test))
True
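To answer the original question directly (which words are not keys of the dictionary), the membership test can be inverted. This is a sketch that assumes alphanumeric words, as stated above; the helper name words_not_in is made up for illustration.

```python
import re


def words_not_in(d, text):
    """Return the alphanumeric words of text that are not keys of d."""
    return [w for w in re.findall(r"[A-Za-z0-9]+", text) if w not in d]


d = {"word1": 0, "word2": 0}
print(words_not_in(d, "word1 word2 word3."))  # ['word3']
```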
