How to switch text in a string? - python

I want to swap two substrings but I always fail.
Let's say I want to swap
I with We in x = 'I are We'.
I tried
x = x.replace('I', 'We').replace('We', 'I')
but it is obvious that it will print I are I, since the second replace undoes the first.
Can someone help?

You can use a regex to avoid going through your string several times (each replace() makes a separate pass) and to make it more readable! It also works when the words occur several times.
import re

string = 'I are We, I'
replacements = {'I': 'We', 'We': 'I'}
print(re.sub("I|We", lambda x: replacements[x.group()], string))  # match the words you want to replace, and look up their replacements in the dict
Output
"We are I, We"

You can use re.sub with function as a substitution:
In [9]: import re
In [10]: x = 'I are We'
In [11]: re.sub('I|We', lambda match: 'We' if match.group(0) == 'I' else 'I', x)
Out[11]: 'We are I'
If you need to replace more than 2 substrings you may want to create a dict like d = {'I': 'We', 'We': 'I', 'You': 'Not You'} and pick correct replacement like lambda match: d[match.group(0)]. You may also want to construct regular expression dynamically based on the replacement strings, but make sure to escape them:
In [14]: d = {'We': 'I', 'I': 'We', 'ar|e': 'am'}
In [15]: re.sub('|'.join(map(re.escape, d.keys())), lambda match: d[match.group(0)], 'We ar|e I')
Out[15]: 'I am We'

x='I are We'
x=x.replace('I','You').replace('We','I').replace('You','We')
>>> x
'We are I'

It is a bit clunky, but I tend to do something along the lines of
x = 'I are We'
x = x.replace('I', 'we')
x = x.replace('We', 'I')
x = x.replace('we', 'We')
which can be shortened to
x = x.replace('I', 'we').replace('We', 'I').replace('we', 'We')

This doesn't make use of replace, but I hope it helps:
>>> s = "I are We"
>>> d = {"I": "We", "We": "I"}
>>> " ".join([d.get(x, x) for x in s.split()])
'We are I'

x = 'I are We'
dic = {'I': 'We', 'We': 'I'}
sol = []
for i in x.split():
    if i in dic:
        sol.append(dic[i])
    else:
        sol.append(i)
result = ' '.join(sol)
print(result)

Related

How to remove a corresponding word in a dictionary from a string?

I have a dictionary and a text:
{"love":1, "expect":2, "annoy":-2}
test="i love you, that is annoying"
I need to remove the words from the string if they appear in the dictionary. I have tried this code:
for k in dict:
    if k in test:
        test = test.replace(k, "")
However the result is:
i you,that is ing
And this is not what I am looking for, as it should not remove "annoy" as a part of the word, the whole word should be evaluated. How can I achieve it?
First, you should not assign names to variables that are also names of built-in classes, such as dict.
Variable test is a string composed of characters. When you say, if k in test:, you will be testing k to see if it is a substring of test. What you want to do is break up test into a list of words and compare k against each complete word in that list. If words are separated by a single space, then they may be "split" with:
test.split(' ')
The only complication is that it will create the following list:
['i', '', 'you,', 'that', 'is', 'annoying']
Note that the third item still has a , in it. So we should first get rid of punctuation marks we might expect to find in our sentence:
test.replace('.', '').replace(',', ' ').split(' ')
Yielding:
['i', '', 'you', '', 'that', 'is', 'annoying']
The following will actually get rid of all punctuation:
import string
test.translate(str.maketrans('', '', string.punctuation))
So now our code becomes:
>>> import string
>>> d = {"love":1, "expect":2, "annoy":-2}
>>> test="i love you, that is annoying"
>>> for k in d:
...     if k in test.translate(str.maketrans('', '', string.punctuation)).split(' '):
...         test = test.replace(k, "")
...
>>> print(test)
i you, that is annoying
>>>
You may now find you have extra spaces in your sentence, but you can figure out how to get rid of those.
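For instance, the doubled spaces left behind by replace() can be collapsed afterwards; a minimal sketch:

```python
# str.split() with no argument splits on any run of whitespace,
# so rejoining with a single space normalizes the spacing.
test = "i  you, that is annoying"
cleaned = " ".join(test.split())
print(cleaned)  # i you, that is annoying
```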
You can use this:
query = "i love you, that is annoying"
query = query.replace('.', '').replace(',', '')
my_dict = {"love": 1, "expect": 2, "annoy": -2}
querywords = query.split()
resultwords = [word for word in querywords if word.lower() not in my_dict]
result = ' '.join(resultwords)
print(result)
i you that is annoying
(Note that the commas were already stripped by the replace calls before splitting.)
If you want to exclude all words without being key sensitive convert all keys in my_dict to lowercase:
my_dict = {k.lower(): v for k, v in my_dict.items()}

how to get a list with words that are next to a specific word in a string in python

Assuming I have a string
string = 'i am a person i believe i can fly i believe i can touch the sky'.
What I would like to do is to get all the words that are next to (from the right side) the word 'i', so in this case am, believe, can, believe, can.
How could I do that in Python? I found this but it only gives the first word, so in this case, 'am'.
Simple generator method:
def get_next_words(text, match, sep=' '):
    words = iter(text.split(sep))
    for word in words:
        if word == match:
            yield next(words)
Usage:
text = 'i am a person i believe i can fly i believe i can touch the sky'
words = get_next_words(text, 'i')
for w in words:
print(w)
# am
# believe
# can
# believe
# can
You can write a regular expression to find the words after the target word:
import re
word = "i"
string = 'i am a person i believe i can fly i believe i can touch the sky'
pat = re.compile(r'\b{}\b \b(\w+)\b'.format(word))
print(pat.findall(string))
# ['am', 'believe', 'can', 'believe', 'can']
One way is to use a regular expression with a look behind assertion:
>>> import re
>>> string = 'i am a person i believe i can fly i believe i can touch the sky'
>>> re.findall(r'(?<=\bi )\w+', string)
['am', 'believe', 'can', 'believe', 'can']
You can split the string and get the next index of the word "i" as you iterate with enumerate:
string = 'i am a person i believe i can fly i believe i can touch the sky'
sl = string.split()
all_is = [sl[i + 1] for i, word in enumerate(sl[:-1]) if word == 'i']
print(all_is)
# ['am', 'believe', 'can', 'believe', 'can']
Note that as #PatrickHaugh pointed out, we want to be careful if "i" is the last word so we can exclude iterating over the last word completely.
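To see why the sl[:-1] slice matters, here is a small sketch with the target word in final position (a made-up input):

```python
# Without slicing off the last word, sl[i + 1] would raise IndexError
# whenever 'i' is the final token; sl[:-1] simply skips that case.
string = 'i can touch the sky i'
sl = string.split()
next_words = [sl[i + 1] for i, word in enumerate(sl[:-1]) if word == 'i']
print(next_words)  # ['can']
```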
import re
string = 'i am a person i believe i can fly i believe i can touch the sky'
words = [w.split()[0] for w in re.split('i +', string) if w]
print(words)
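One caveat about the re.split('i +', ...) approach: it matches the letter i anywhere, not just as a whole word, so any word ending in i gets cut too. A word-boundary pattern avoids that (a sketch with a made-up input):

```python
import re

string = 'hi there i am'
# Bare 'i ' also splits inside 'hi':
print([w.split()[0] for w in re.split('i +', string) if w])  # ['h', 'there', 'am']
# \b restricts the match to the standalone word 'i':
print(re.findall(r'\bi\s+(\w+)', string))  # ['am']
```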

Translate both ways, print word if not in keys or values

I am having trouble with this one have looked up much possible solutions and can't seem to find the right one, my trouble here is I can't get the program to print the word typed in the input if the word isn't a key or value using Python 2.7
Tuc = {"i": ["o"], "love": ["wau"], "you": ["uo"], "me": ["ye"], "my": ["yem"], "mine": ["yeme"], "are": ["sia"]}
while True:
    # Translates English to Tuccin and vice versa
    translation = str(raw_input("Enter content for translation.\n").lower())
    # this is for translating full phrases, both ways.
    input_list = translation.split()
    for word in input_list:
        # English to Tuccin
        if word in Tuc and word not in v:
            print("".join(Tuc[word]))
        # Tuccin to English
        for k, v in Tuc.iteritems():
            if word in v and word not in Tuc:
                print k
You can create a set of your keys and values with a set comprehension then check for intersection :
>>> set_list = {k[0] if isinstance(k, list) else k for it in Tuc.items() for k in it}
>>> set_list
set(['me', 'love', 'i', 'ye', 'mine', 'o', 'sia', 'yeme', 'are', 'uo', 'yem', 'wau', 'my', 'you'])
if set_list.intersection(input_list):
    # do stuff
Let's do this in a simple way: create two dicts, one for English-to-Tuccin and one for Tuccin-to-English translation.
In [28]: Tuc_1 = {k:Tuc[k][0] for k in Tuc} # this dict will help in translation from English to Tuccin
In [29]: Tuc_1
Out[29]:
{'are': 'sia',
'i': 'o',
'love': 'wau',
'me': 'ye',
'mine': 'yeme',
'my': 'yem',
'you': 'uo'}
In [30]: Tuc_2 = {Tuc[k][0]:k for k in Tuc} # this dict will help in translation from Tuccin to English
In [31]: Tuc_2
Out[31]:
{'o': 'i',
'sia': 'are',
'uo': 'you',
'wau': 'love',
'ye': 'me',
'yem': 'my',
'yeme': 'mine'}
example usage:
In [53]: translation = "I love You"
In [54]: input_list = translation.split()
In [55]: print " ".join(Tuc_1.get(x.lower()) for x in input_list if x.lower() in Tuc_1)
o wau uo
In [56]: print " ".join(Tuc_2.get(x.lower()) for x in input_list if x.lower() in Tuc_2)
In [57]: translation = "O wau uo"
In [58]: input_list = translation.split()
In [59]: print " ".join(Tuc_1.get(x.lower()) for x in input_list if x.lower() in Tuc_1)
In [60]: print " ".join(Tuc_2.get(x.lower()) for x in input_list if x.lower() in Tuc_2)
i love you
You can use the following lambda for finding translations. If the word does not exist, it will return an empty list.
find_translation = lambda w: [(k, v) for k, v in Tuc.items() if w == k or w in v]
Usage:
>>> find_translation('i')
[('i', ['o'])]
Edit:
Modifying the result to convert whole string.
Since you mentioned that you want to convert a list of words, let's take the same lambda and use it for multiple words.
line = 'i me you rubbish' # Only first three words will return something
# Let's change the lambda to either return something from Tuc or the same word back
find_translation = lambda w: ([v[0] for k, v in Tuc.items() if w==k or w in v] or [w])[0]
# Split the words and keep using find_transaction to either get a conversion or to return the same word
results_splits = [find_translation(part) for part in line.split()]
You will get the following results:
['o', 'ye', 'uo', 'rubbish']
You can put the string back together by joining results_splits
' '.join(results_splits)
And you get the translation back
'o ye uo rubbish'

Converting a String to a List of Words?

I'm trying to convert a string to a list of words using python. I want to take something like the following:
string = 'This is a string, with words!'
Then convert to something like this :
list = ['This', 'is', 'a', 'string', 'with', 'words']
Notice the omission of punctuation and spaces. What would be the fastest way of going about this?
I think this is the simplest way for anyone else stumbling on this post given the late response:
>>> string = 'This is a string, with words!'
>>> string.split()
['This', 'is', 'a', 'string,', 'with', 'words!']
Try this:
import re
mystr = 'This is a string, with words!'
wordList = re.sub(r"[^\w]", " ", mystr).split()
How it works:
From the docs :
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function.
so in our case :
pattern is any non-alphanumeric character.
[\w] means any alphanumeric character and is equal to the character set [a-zA-Z0-9_]: a to z, A to Z, 0 to 9, and underscore.
so we match any non-alphanumeric character and replace it with a space,
and then we split() it, which splits the string by spaces and converts it to a list.
So 'hello-world' becomes 'hello world' with re.sub, and then ['hello', 'world'] after split().
Let me know if any doubts come up.
To do this properly is quite complex. For your research, it is known as word tokenization. You should look at NLTK if you want to see what others have done, rather than starting from scratch:
>>> import nltk
>>> paragraph = u"Hi, this is my first sentence. And this is my second."
>>> sentences = nltk.sent_tokenize(paragraph)
>>> for sentence in sentences:
... nltk.word_tokenize(sentence)
[u'Hi', u',', u'this', u'is', u'my', u'first', u'sentence', u'.']
[u'And', u'this', u'is', u'my', u'second', u'.']
The most simple way:
>>> import re
>>> string = 'This is a string, with words!'
>>> re.findall(r'\w+', string)
['This', 'is', 'a', 'string', 'with', 'words']
Using string.punctuation for completeness (escaped with re.escape, since string.punctuation contains characters that are special inside a character class):
import re
import string
x = re.sub('[' + re.escape(string.punctuation) + ']', '', s).split()
This handles newlines as well.
Well, you could use
import re
list = re.sub(r'[.!,;?]', ' ', string).split()
Note that both string and list are names of builtin types, so you probably don't want to use those as your variable names.
Inspired by #mtrw's answer, but improved to strip out punctuation at word boundaries only:
import re
import string
def extract_words(s):
    return [re.sub('^[{0}]+|[{0}]+$'.format(string.punctuation), '', w) for w in s.split()]
>>> str = 'This is a string, with words!'
>>> extract_words(str)
['This', 'is', 'a', 'string', 'with', 'words']
>>> str = '''I'm a custom-built sentence with "tricky" words like https://stackoverflow.com/.'''
>>> extract_words(str)
["I'm", 'a', 'custom-built', 'sentence', 'with', 'tricky', 'words', 'like', 'https://stackoverflow.com']
Personally, I think this is slightly cleaner than the answers provided:
import re

def split_to_words(sentence):
    return list(filter(lambda w: len(w) > 0, re.split(r'\W+', sentence)))  # use sentence.lower() if needed
A regular expression for words would give you the most control. You would want to carefully consider how to deal with words with dashes or apostrophes, like "I'm".
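For instance, one possible pattern (an illustration, not the only reasonable choice) treats a word as letters optionally continued by an apostrophe or hyphen plus more letters:

```python
import re

# Hypothetical word pattern: letters, optionally joined by ' or - to more letters.
pattern = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")
s = "I'm a custom-built sentence, with words!"
print(pattern.findall(s))  # ["I'm", 'a', 'custom-built', 'sentence', 'with', 'words']
```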
list=mystr.split(" ",mystr.count(" "))
This way you eliminate every special char outside of the alphabet:
def wordsToList(strn):
    L = strn.split()
    cleanL = []
    abc = 'abcdefghijklmnopqrstuvwxyz'
    ABC = abc.upper()
    letters = abc + ABC
    for e in L:
        word = ''
        for c in e:
            if c in letters:
                word += c
        if word != '':
            cleanL.append(word)
    return cleanL

s = 'She loves you, yea yea yea! '
L = wordsToList(s)
print(L)  # ['She', 'loves', 'you', 'yea', 'yea', 'yea']
I'm not sure if this is fast or optimal or even the right way to program.
def split_string(string):
    return string.split()
This function will return the list of words of a given string.
In this case, if we call the function as follows,
string = 'This is a string, with words!'
split_string(string)
The return output of the function would be
['This', 'is', 'a', 'string,', 'with', 'words!']
This is from my attempt on a coding challenge that can't use regex,
outputList = "".join((c if c.isalnum() or c=="'" else ' ') for c in inputStr ).split(' ')
The role of apostrophe seems interesting.
Probably not very elegant, but at least you know what's going on.
my_str = "Simple sample, test! is, olny".lower()
my_lst = []
temp = ""
len_my_str = len(my_str)
number_letter_in_data = 0
list_words_number = 0
for number_letter_in_data in range(0, len_my_str, 1):
    if my_str[number_letter_in_data] in [',', '.', '!', '(', ')', ':', ';', '-']:
        pass
    else:
        if my_str[number_letter_in_data] in [' ']:
            # if you want longer than 3 char words
            if len(temp) > 3:
                list_words_number += 1
                my_lst.append(temp)
                temp = ""
            else:
                pass
        else:
            temp = temp + my_str[number_letter_in_data]
my_lst.append(temp)
print(my_lst)
You can try and do this:
tryTrans = str.maketrans(',!', '  ')  # the from- and to-strings must be equal length
s = "This is a string, with words!"
s = s.translate(tryTrans)
listOfWords = s.split()

Split by comma and strip whitespace in Python

I have some python code that splits on comma, but doesn't strip the whitespace:
>>> string = "blah, lots , of , spaces, here "
>>> mylist = string.split(',')
>>> print mylist
['blah', ' lots ', ' of ', ' spaces', ' here ']
I would rather end up with whitespace removed like this:
['blah', 'lots', 'of', 'spaces', 'here']
I am aware that I could loop through the list and strip() each item but, as this is Python, I'm guessing there's a quicker, easier and more elegant way of doing it.
Use list comprehension -- simpler, and just as easy to read as a for loop.
my_string = "blah, lots , of , spaces, here "
result = [x.strip() for x in my_string.split(',')]
# result is ["blah", "lots", "of", "spaces", "here"]
See: Python docs on List Comprehension
A good 2 second explanation of list comprehension.
I came to add:
map(str.strip, string.split(','))
but saw it had already been mentioned by Jason Orendorff in a comment.
Reading Glenn Maynard's comment on the same answer suggesting list comprehensions over map I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).
So a quick (possibly flawed?) test on my box (Python 2.6.5 on Ubuntu 10.04) applying the three methods in a loop revealed:
$ time ./list_comprehension.py # [word.strip() for word in string.split(',')]
real 0m22.876s
$ time ./map_with_lambda.py # map(lambda s: s.strip(), string.split(','))
real 0m25.736s
$ time ./map_with_str.strip.py # map(str.strip, string.split(','))
real 0m19.428s
making map(str.strip, string.split(',')) the winner, although it seems they are all in the same ballpark.
Certainly though map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.
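For a rough modern re-run of that comparison (Python 3, where map() is lazy and needs list() for a fair test; timings will vary by machine):

```python
import timeit

s = "blah, lots , of , spaces, here "

# Time both approaches over many iterations; both produce the same stripped list.
t_comp = timeit.timeit(lambda: [w.strip() for w in s.split(',')], number=100_000)
t_map = timeit.timeit(lambda: list(map(str.strip, s.split(','))), number=100_000)
print(f"list comprehension: {t_comp:.3f}s  map: {t_map:.3f}s")
```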
Split using a regular expression. Note I made the case more general with leading spaces. The list comprehension is to remove the null strings at the front and back.
>>> import re
>>> string = " blah, lots , of , spaces, here "
>>> pattern = re.compile("^\s+|\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['blah', 'lots', 'of', 'spaces', 'here']
This works even if ^\s+ doesn't match:
>>> string = "foo, bar "
>>> print([x for x in pattern.split(string) if x])
['foo', 'bar']
>>>
Here's why you need ^\s+:
>>> pattern = re.compile("\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
[' blah', 'lots', 'of', 'spaces', 'here']
See the leading spaces in blah?
Clarification: above uses the Python 3 interpreter, but results are the same in Python 2.
Just remove the white space from the string before you split it.
mylist = my_string.replace(' ','').split(',')
I know this has already been answered, but if you end doing this a lot, regular expressions may be a better way to go:
>>> import re
>>> re.sub(r'\s', '', string).split(',')
['blah', 'lots', 'of', 'spaces', 'here']
The \s matches any whitespace character, and we just replace it with an empty string ''. You can find more info here: http://docs.python.org/library/re.html#re.sub
map(lambda s: s.strip(), mylist) would be a little better than explicitly looping. Or for the whole thing at once: map(lambda s:s.strip(), string.split(','))
import re
result = [x for x in re.split(',| ', your_string) if x != '']
This works fine for me.
re (as in regular expressions) allows splitting on multiple characters at once:
>>> string = "blah, lots , of , spaces, here "
>>> re.split(', ', string)
['blah', 'lots ', 'of ', 'spaces', 'here ']
This doesn't work well for your example string, but works nicely for a comma-space separated list. For your example string, you can combine the re.split power to split on regex patterns to get a "split-on-this-or-that" effect.
>>> re.split('[, ]', string)
['blah',
'',
'lots',
'',
'',
'',
'',
'of',
'',
'',
'',
'spaces',
'',
'here',
'']
Unfortunately, that's ugly, but a filter will do the trick:
>>> filter(None, re.split('[, ]', string))
['blah', 'lots', 'of', 'spaces', 'here']
Voila!
s = 'bla, buu, jii'
sp = s.split(',')
for st in sp:
    print st
import re
mylist = [x for x in re.split(r'\s*,\s*|\s+', string) if x]
Simply: a comma with or without surrounding whitespace, or at least one whitespace character.
Please try!
