Okay, so I have the following little function:
def swap(inp):
    inp = inp.split()
    out = ""
    for item in inp:
        ind = inp.index(item)
        item = item.replace("i am", "you are")
        item = item.replace("you are", "I am")
        item = item.replace("i'm", "you're")
        item = item.replace("you're", "I'm")
        item = item.replace("my", "your")
        item = item.replace("your", "my")
        item = item.replace("you", "I")
        item = item.replace("my", "your")
        item = item.replace("i", "you")
        inp[ind] = item
    for item in inp:
        ind = inp.index(item)
        item = item + " "
        inp[ind] = item
    return out.join(inp)
Which, while not particularly efficient, gets the job done for shorter sentences. Basically, all it does is swap the perspective of pronouns and similar words. This is fine when I throw a string like "I love you" at it (it returns "you love me"), but when I throw something like:
you love your version of my couch because I love you, and you're a couch-lover.
I get:
I love your versyouon of your couch because I love I, and I'm a couch-lover.
I'm confused as to why this is happening. I explicitly split the string into a list to avoid this. Why would it be able to detect it as being a part of a list item, rather than just an exact match?
Also, slightly deviating to avoid having to post another question so similar; if a solution to this breaks this function, what will happen to commas, full stops, other punctuation?
It made some very surprising mistakes. My expected output is:
I love my version of your couch because you love I, and I'm a couch-lover.
The reason I wrote it like this is that I eventually hope to replace the item.replace(x, y) arguments with words from a database.
For this specific problem you need regular expressions. Basically, along the lines of:
import re

table = [
    ("I am", "you are"),
    ("I'm", "you're"),
    ("my", "your"),
    ("I", "you"),
]

def swap(s):
    # Map each phrase to its counterpart in both directions
    dct = dict(table)
    dct.update((y, x) for x, y in table)
    return re.sub(
        '|'.join(r'(?:\b%s\b)' % x for x in dct),
        lambda m: dct[m.group(0)],
        s)

print(swap("you love your version of my couch because I love you, and you're a couch-lover."))
# I love my version of your couch because you love I, and I'm a couch-lover.
But in general, natural language processing by means of string/re functions is naive at best (note "you love I" above).
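One caveat with the alternation above: the regex engine tries the alternatives from left to right, so the result depends on the order in which the dictionary yields its keys. Here is a minimal sketch of a variant that sorts the alternatives longest-first and escapes them, so "you're" is always tried before "you". The names PAIRS and build_swapper are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical name; same pairs as the table in the answer above
PAIRS = [("I am", "you are"), ("I'm", "you're"), ("my", "your"), ("I", "you")]

def build_swapper(pairs):
    # Map every phrase to its counterpart in both directions
    mapping = dict(pairs)
    mapping.update((b, a) for a, b in pairs)
    # Longest alternatives first, so "you're" is tried before "you"
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, alternatives)))
    # Every match is replaced in a single pass, so replacements are never re-replaced
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)

swap = build_swapper(PAIRS)
print(swap("you love your version of my couch because I love you, and you're a couch-lover."))
# I love my version of your couch because you love I, and I'm a couch-lover.
```

The match is still case-sensitive, so a lowercase "i" inside "version" is left alone.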
Here's some simple code:
def swap(inp):
    inp = inp.split()
    out = []
    d1 = ['i am', 'you are', 'i\'m', 'you\'re', 'my', 'your', 'I', 'my', 'you']
    d2 = ['you are', 'I am', 'you\'re', 'I\'m', 'your', 'my', 'you', 'your', 'I']
    for item in inp:
        itm = item.replace(',','')
        if itm not in d1:
            out.append(item)
        else:
            out.append(d2[d1.index(itm)])
    return ' '.join(out)

print(swap('you love your version of my couch because I love you, and you\'re a couch-lover.'))
The problem is that replace() works on substrings (in your case, parts of words), and index() only ever returns the position of the first matching list element.
Take a look at my answer to another question: String replacement with dictionary, complications with punctuation
The code in that answer can be used to solve your problem.
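To see the substring behaviour in isolation, here is a small sketch using words from the question. The variable names are only for illustration.

```python
# str.replace() substitutes every occurrence of the substring, even inside a word
print("version".replace("i", "you"))  # versyouon

# Chained calls also act on the output of earlier calls
item = "your"
item = item.replace("your", "my")   # now "my"
item = item.replace("my", "your")   # and back to "your"
print(item)  # your

# list.index() returns the position of the first equal element only
words = "I love I".split()
print(words.index("I"))  # 0
```

This is why splitting the sentence into a list does not help on its own: each replace() call still looks inside the individual word.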
I'm looking to count whether eth or btc appear in listy
searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
cnt = round((sum(len(re.findall('eth', x.lower())) for x in listy)/(len(listy)))*100)
print(f'{cnt}%')
The solution only looks for eth. How do I look for multiple search terms?
Bottom line: I have a list of search terms in searchterms. I'd like to see if any of those appear in listy. From there, I can perform a percentage of how many of those terms appear in the list.
You need to use the pipe "|" between the values you want to search for. In your code, replace re.findall('eth', x.lower()) with re.findall(r"(eth)|(btc)", x.lower()):
import re
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
cnt = round((sum(len(re.findall(r"(eth)|(btc)", x.lower())) for x in listy)/(len(listy)))*100)
print(f'{cnt}%')
67%
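If the terms live in a list, as searchterms does in the question, the pattern can be built from that list instead of being typed by hand. This is a sketch under that assumption; the names pattern, pct_matches and pct_entries are made up for illustration.

```python
import re

searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth',
         '#eth is great', 'barbaric tsar', 'nothing']

# Escape each term and join them with "|"; IGNORECASE replaces the .lower() call
pattern = re.compile("|".join(map(re.escape, searchterms)), re.IGNORECASE)

# Total number of matches relative to the number of entries (what the question computes)
matches = sum(len(pattern.findall(x)) for x in listy)
pct_matches = round(matches / len(listy) * 100)

# Number of entries that contain at least one term
entries = sum(1 for x in listy if pattern.search(x))
pct_entries = round(entries / len(listy) * 100)

print(f'{pct_matches}% {pct_entries}%')  # 67% 67%
```

The two percentages happen to agree for this data; they differ as soon as one entry contains more than one match.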
I would say that instead of complicating the problem by using re, you can use a simple, classic list comprehension.
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
print(len([i for i in listy if 'btc' in i.lower() or 'eth' in i.lower()]) * 100 / len(listy))
It improves the readability and the simplicity of the code.
Let me know if it helps!
A bit more code gives better readability. To search for additional words, just add them to the list search_for. It uses a defaultdict for counting.
from collections import defaultdict

listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
my_dict = defaultdict(int)
search_for = ['btc', 'eth']
for word in listy:
    for sub in search_for:
        # Lower-case the entry so that 'ETH' is counted as well
        if sub in word.lower():
            my_dict[sub] += 1
print(my_dict.items())
Consider arbitrary strings in which the character # represents a placeholder char.
For example:
"# has bought 8 apples today"
"# and # are together for 10 years"
"If # wouldn't have told me that, I would have never known that you got in touch with #."
Now, in addition I have a list: names = ['Peter', 'James', 'Claire', 'Julia']
Finally, I want to pick a random string from my string list and replace every # with a random element from the names list. However, the same name must not be picked twice.
Ugly solution (Pseudo-Code):
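import random

names = some_names  # placeholder, e.g. the names list above
list_of_arbitrary_strings = some_strings  # placeholder for the strings above
while True:
    raw_str = random.choice(list_of_arbitrary_strings)
    processed_str = ""
    # random.shuffle() works in place and returns None, so take a shuffled copy
    temp_names = random.sample(names, len(names))
    for ch in raw_str:
        if ch == "#":
            ch = temp_names.pop()
        processed_str += ch
    print(processed_str)
    input('Press Enter to continue')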
Is there something better that I could do (library calls, changing the structure of my strings, ...)?
You can use re.sub with random.shuffle:
import re, random
def rep_char(s, l):
    random.shuffle(l)
    l = iter(l)
    return re.sub('#', lambda _:next(l), s)
names = ['Peter', 'James', 'Claire', 'Julia']
vals = ["# has bought 8 apples today", "# and # are together for 10 years", "If # wouldn't have told me that, I would have never known that you got in touch with #."]
result = [rep_char(i, names) for i in vals]
Sample output (the names are picked at random, so it varies between runs):
['Julia has bought 8 apples today', 'Claire and Julia are together for 10 years', "If Julia wouldn't have told me that, I would have never known that you got in touch with Claire."]
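Note that random.shuffle reorders the list that is passed in. If that side effect is unwanted, one alternative is random.sample, which returns a new list of distinct picks. The sketch below assumes that; fill_placeholders and its parameters are hypothetical names, not part of the answer above.

```python
import random
import re

def fill_placeholders(template, names, placeholder="#", rng=random):
    # Count how many distinct names this template needs
    needed = template.count(placeholder)
    if needed > len(names):
        raise ValueError("not enough names for the placeholders in this string")
    # random.sample returns a new list without repeats and leaves names untouched
    picks = iter(rng.sample(names, needed))
    # Each match consumes the next pick, so no name is used twice
    return re.sub(re.escape(placeholder), lambda _match: next(picks), template)

names = ['Peter', 'James', 'Claire', 'Julia']
print(fill_placeholders("# and # are together for 10 years", names))
```

Passing a seeded random.Random instance as rng makes the result reproducible, which can help when testing.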
You can replace the placeholder with {} and simply use format and sample:
import random
NAMES = ['Peter', 'James', 'Claire', 'Julia']
STRINGS = [
    "{} has bought 8 apples today",
    "{} and {} are together for 10 years",
    "If {} wouldn't have told me that, I would have never known that you got in touch with {}."
]

def replace(text, names):
    return text.format(*random.sample(names, text.count("{}")))

for text in STRINGS:
    print(replace(text, NAMES))
I want to write a Python program that tests whether any phrase from a list matches a string.
string ='I love my travel all over the world'
list =['I love','my travel','all over the world']
So I want to test whether any item of the list matches that string, and print the matching ones: 'I love', 'my travel' or 'all over the world'.
any(x in string for x in list)
Or do I need to use text mining to solve the problem?
Your current solution is probably the best to use in this given scenario. You could encapsulate it as a function if you wanted.
def list_in_string(slist, string):
    return any(x in string for x in slist)
You can't do this:
if any(x in string for x in word_list):
    print(x)
Because x only exists inside the generator expression: the any function consumes it, stops at the first match, and simply returns a Boolean (True or False).
You can, however, just break apart your any call into an explicit loop so that you get your desired output.
string = 'I love traveling all over the world'
word_list = ['I love', 'traveling', 'all over the world']
for x in word_list:
    if x in string:
        print(x)
This will output:
>>>
I love
traveling
all over the world
>>>
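If only the first matching phrase is needed rather than all of them, next() over a generator expression is one alternative to the explicit loop. This is a sketch using the strings from the question; the names text, phrases and first_match are made up here.

```python
text = 'I love my travel all over the world'
phrases = ['I love', 'my travel', 'all over the world']

# next() pulls the first phrase that occurs in text; None is the fallback if nothing matches
first_match = next((p for p in phrases if p in text), None)
print(first_match)  # I love

# With no matching phrase, the default is returned instead of raising StopIteration
print(next((p for p in ['spam', 'eggs'] if p in text), None))  # None
```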
Update using string.split():
string = ['I', 'love', 'traveling', 'all', 'over', 'the', 'world']
word_list = ['I love', 'traveling', 'all over the world']
count = 0
for x in word_list:
    for y in x.split():
        if y in string:
            count += 1
    if count == len(x.split()) and ' ' in x:
        print(x)
    count = 0
This will output:
>>>
I love
all over the world
>>>
If you want a True or False returned, you can definitely use any(), for example:
>>> string = 'I love my travel all over the world'
>>> list_string = ['I love',
...                'my travel',
...                'all over the world',
...                'Something something',
...                'blah']
>>> any(x for x in list_string if x in string)
True
>>>
Otherwise, you could do some simple list comprehension:
>>> string ='I love my travel all over the world'
>>> list_string = ['I love',
...                'my travel',
...                'all over the world',
...                'Something something',
...                'blah']
>>> [x for x in list_string if x in string]
['I love', 'my travel', 'all over the world']
>>>
Depending on what you want returned, both of these work perfectly.
You could probably also use a regular expression, but that is a little overkill for something so simple.
For completeness, one may mention the find method:
_string = 'I love my travel all over the world'
_list = ['I love', 'my travel', 'all over the world', 'spam', 'python']
for i in range(len(_list)):
    if _string.find(_list[i]) > -1:
        print(_list[i])
Which outputs:
I love
my travel
all over the world
Note: this solution is not as elegant as using the in operator, but it may be useful if the position of the found substring is needed.
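As a sketch of the case where the position matters, the matches can be collected together with the index that find returns. The names text, phrases and positions are chosen here for illustration.

```python
text = 'I love my travel all over the world'
phrases = ['I love', 'my travel', 'all over the world', 'spam', 'python']

# str.find returns the index of the first occurrence, or -1 if the phrase is absent
positions = {p: text.find(p) for p in phrases if text.find(p) != -1}
print(positions)
# {'I love': 0, 'my travel': 7, 'all over the world': 17}
```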
I have a list of lists of words called words. I have a function called random_sentence which can be called with any sentence. I want to search the sentence for any word that is at index [0] of one of the inner lists and then replace it with the corresponding word in that list. Hope that makes sense.
words = [["I", "you"], ["i", "you"], ["we", "you"], ["my", "your"], ["our", "your"]]
def random_sentence(sentence):
    list = sentence.split()
    string = sentence
    for y in list:
        for i in words:
            for u in i:
                if y == u:
                    mylist = i[1]
                    string = string.replace(y, mylist)
    return string
So random_sentence("I have a my pet dog")
should return "you have your pet dog".
My function works sometimes, but other times it does not.
For example, random_sentence("I went and we")
produces "you yount and you", which does not make sense.
How do I fix my function to produce the right outcome?
First, your code, as pasted, does not even run. You have a space instead of an underscore in your function definition, and you never return anything.
But, after fixing that, your code does exactly what you describe.
To figure out why, try adding prints to see what it's doing at each step, or running it through a visualizer, like this one.
When you get to the point where y is "we", you'll end up doing this:
string = string.replace("we", "you")
But that will replace every we in string, including the one in went.
If you want to do things this way, you probably want to modify each y in list, and then join them back together at the end, like this:
def random_sentence(sentence):
    list = sentence.split()
    for index, y in enumerate(list):
        for i in words:
            for u in i:
                if y == u:
                    mylist = i[1]
                    list[index] = mylist
    return ' '.join(list)
If you find this hard to understand… well, so do I. All of your variable names are either a single letter, or a misleading name (like mylist, which isn't even a list). Also, you're looping over i when you really only want to check the first element. See if this is more readable:
replacements = [["I", "you"], ["i", "you"], ["we", "you"], ["my", "your"], ["our", "your"]]
def random_sentence(sentence):
    words = sentence.split()
    for index, word in enumerate(words):
        for replacement in replacements:
            if word == replacement[0]:
                words[index] = replacement[1]
    return ' '.join(words)
However, there's a much better way to solve this problem.
First, instead of having a list of word-replacement pairs, just use a dictionary. Then you can get rid of a whole loop and make it much easier to read (and faster, too):
replacements = {"I": "you", "i": "you", "we": "you", "my": "your", "our": "your"}
def random_sentence(sentence):
    words = sentence.split()
    for index, word in enumerate(words):
        replacement = replacements.get(word, word)
        words[index] = replacement
    return ' '.join(words)
And then, instead of trying to modify the original list in place, just build up a new one:
def random_sentence(sentence):
    result = []
    for word in sentence.split():
        result.append(replacements.get(word, word))
    return ' '.join(result)
Then, this result = [], for …: result.append(…) is exactly what a list comprehension is for:
def random_sentence(sentence):
    result = [replacements.get(word, word) for word in sentence.split()]
    return ' '.join(result)
… or, since you don't actually need the list for any purpose but to serve it to join, you can use a generator expression instead:
def random_sentence(sentence):
    return ' '.join(replacements.get(word, word) for word in sentence.split())
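One limitation of the split-based versions is that punctuation stays attached to the words, so "we," would not be found in the dictionary. If that matters, a possible variant is to let re.sub pick out the words and look each one up. This is only a sketch: it assumes that runs of word characters are a good enough definition of a word.

```python
import re

replacements = {"I": "you", "i": "you", "we": "you", "my": "your", "our": "your"}

def random_sentence(sentence):
    # \b\w+\b matches whole words only, so "we" inside "went" is never touched
    # and punctuation around a word is left where it was
    return re.sub(r"\b\w+\b",
                  lambda m: replacements.get(m.group(0), m.group(0)),
                  sentence)

print(random_sentence("I went, and we left."))  # you went, and you left.
```

Unlike split and join, this also preserves the original spacing between words.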
A Dictionary/map makes more sense here, not an array of arrays. Define your dictionary words as:
words = {"I":"you", "i":"you", "we":"you","my":"your","our":"your"}
And then, use it as:
def randomsentence(text):
    result = []
    for word in text.split():
        if word in words:  # Check if the current word exists in our dictionary
            result.append(words[word])  # Append the value stored for the word
        else:
            result.append(word)
    return " ".join(result)
OUTPUT:
>>> randomsentence("I have a my pet dog")
'you have a your pet dog'
>>> words = {'I': 'you', 'i': 'you', 'we': 'you', 'my': 'your', 'our': 'your'}
>>> def random_sentence(sentence):
...     return ' '.join([words.get(word, word) for word in sentence.split()])
...
>>> random_sentence('I have a my pet dog')
'you have a your pet dog'
The problem is that string.replace replaces substrings that are parts of the words. You can manually build an answer like this:
def random_sentence(sentence):
    list = sentence.split()
    result = []
    for y in list:
        for i in words:
            if i[0] == y:
                result.append(i[1])
                break
        else:
            result.append(y)
    return " ".join(result)
Note that the else belongs to the for loop, not to the if.
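In case the for/else construct is unfamiliar, here is a stripped-down sketch of just that mechanism. The function find_replacement and its parameters are hypothetical and only serve the illustration.

```python
def find_replacement(word, pairs):
    for source, target in pairs:
        if source == word:
            result = target
            break  # leaving via break skips the else block
    else:
        # Runs only when the loop finished without hitting break
        result = word
    return result

pairs = [["I", "you"], ["we", "you"], ["my", "your"]]
print(find_replacement("my", pairs))    # your
print(find_replacement("went", pairs))  # went
```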
I am trying to fill an empty list with the "sentences" from a different list that contain a # (hashtag).
Currently my code gives me a new list that is much longer than expected, instead of one entry per matching sentence.
The code snippet is given below
import re
old_list = ["I love #stackoverflow because #people are very #helpful!", "But I dont #love hastags",
            "So #what can you do", "Some simple senetnece", "where there is no hastags", "however #one can be good"]
new_list = []
for tt in range(0, len(old_list)):
    for ui in old_list:
        if bool(re.search(r"#(\w+)", old_list[tt])) == True:
            new_list.append(old_list[tt])
Please let me know how to append only the single sentence.
I am not sure what you are wanting for output, but this will preserve the original sentence along with its matching set of hashtags:
>>> import re
>>> old_list = ["I love #stackoverflow because #people are very #helpful!","But I dont #love hastags",
... "So #what can you do","Some simple senetnece","where there is no hastags","however #one can be good"]
>>> hash_regex = re.compile(r'#(\w+)')
>>> [(hash_regex.findall(l), l) for l in old_list]
[(['stackoverflow', 'people', 'helpful'], 'I love #stackoverflow because #people are very #helpful!'), (['love'], 'But I dont #love hastags'), (['what'], 'So #what can you do'), ([], 'Some simple senetnece'), ([], 'where there is no hastags'), (['one'], 'however #one can be good')]
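If the goal is simply one entry per sentence that contains at least one hashtag, a single pass with a list comprehension may be enough. This sketch assumes that is the desired output; the name hashtag is made up here.

```python
import re

old_list = ["I love #stackoverflow because #people are very #helpful!", "But I dont #love hastags",
            "So #what can you do", "Some simple senetnece", "where there is no hastags",
            "however #one can be good"]

hashtag = re.compile(r"#\w+")

# Keep each sentence once if it contains at least one hashtag
new_list = [sentence for sentence in old_list if hashtag.search(sentence)]
print(len(new_list))  # 4
```

The nested loop in the question appends each matching sentence once per element of the list, which is why the result grows so large.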