I would like to pull out the locations for an inconsistently formatted data field in a Pandas dataframe. (I do not maintain the data so I cannot alter how this field is formatted.)
Running the following toy version
string2 = 'Denver.John'
if string2.find(' -'):
    string2 = string2.split(' -')[0]
elif string2.find('.'):
    string2 = string2.split('.')[0]
print(string2)
gives me Denver.John instead of Denver. However, if I use a second if instead of the elif:
string2 = 'Denver.John'
if string2.find(' -'):
    string2 = string2.split(' -')[0]
if string2.find('.'):
    string2 = string2.split('.')[0]
print(string2)
I get Denver, as desired. The problem is that I also have strings like 'Las.Vegas - Rudy', and I want to pull out Las.Vegas in those cases. So I only want to split on a period if the field does not contain the hyphen (' - ').
Why does the elif not work for Denver.John?
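Because find returns either the index of the match or -1 when there is no match, and -1 is truthy in Python (only 0 is falsy). So the first if branch runs even though ' -' is absent, the split changes nothing, and the elif is never reached. Adding 1 maps "not found" to 0, which is falsy:
string2 = 'Denver.John'
if string2.find(' -') + 1:
    string2 = string2.split(' -')[0]
elif string2.find('.') + 1:
    string2 = string2.split('.')[0]
print(string2)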
Or, better, use the in operator:
string2 = 'Denver.John'
if ' -' in string2:
    string2 = string2.split(' -')[0]
elif '.' in string2:
    string2 = string2.split('.')[0]
print(string2)
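To see why the truthiness test misleads, the sketch below prints how Python treats the values find can return and then wraps the in-based check in a small helper. The name extract_location is made up for this illustration, and returning the field unchanged when neither separator is present is an assumption, not something the question specifies.

```python
def extract_location(field):
    """Return the location part of a field (hypothetical helper)."""
    # Prefer the ' -' separator; only fall back to '.' when it is absent.
    if ' -' in field:
        return field.split(' -')[0]
    if '.' in field:
        return field.split('.')[0]
    # Assumption: leave fields without either separator untouched.
    return field


# find returns -1 when nothing is found, and -1 is truthy.
print(bool('Denver.John'.find(' -')))  # True
# find returns 0 for a match at the very start, and 0 is falsy.
print(bool('.John'.find('.')))         # False

print(extract_location('Denver.John'))       # Denver
print(extract_location('Las.Vegas - Rudy'))  # Las.Vegas
```

The same helper could then be applied to each value of the column, for example with Series.apply in pandas.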
Use
if ' -' in string2:
instead. The find method returns an int, not a bool: it returns the lowest index of the substring if it is found in the given string, and -1 if it is not found.
So in your case:
string2 = 'Denver.John'
print(string2.find(' -'))  # prints -1
print(string2.find('.'))   # prints 6
if string2.find(' -'):     # -1 is truthy, so this branch always runs here
    string2 = string2.split(' -')[0]
elif string2.find('.'):
    string2 = string2.split('.')[0]
print(string2)
So in your if statement, compare the result of find with -1 explicitly (for example, string2.find(' -') != -1).
str.find returns the position of the substring, or -1 if the substring is not found.
Thus, do the following instead:
string2 = 'Denver.John'
if string2.find(' -') >= 0:
    string2 = string2.split(' -')[0]
elif string2.find('.') >= 0:
    string2 = string2.split('.')[0]
print(string2)
string1 = "Billie Jean"
string2 = " "
teststring = string2.split(" ")
for word in teststring:
    if word in string1:
        return True
return False
Can I make it so that if string2 is for example: "Baby Jean" it's true, but if it's: "ea" it returns as false?
I believe the resolution is to split up string1 into a list of words before performing an in check, as follows:
string1 = "Billie Jean"
string2 = " "
string3 = 'Baby Jean'
words1 = string1.split()
def check(string):
    teststring = string.split()
    for word in teststring:
        # Notice that we check against a list of words instead of string1
        if word in words1:
            return True
    return False
print(check(string2))
print(check(string3))
Another option would be to use the set.intersection method, which returns the set of elements shared between a set and another collection. An empty set is falsy, so the result can be used directly as a truth test:
string1 = "Billie Jean"
string2 = " "
string3 = 'Baby Jean'
words1 = set(string1.split())
def check(string):
    return bool(words1.intersection(string.split()))
print(check(string2))
print(check(string3))
Outputs:
False
True
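As a quick check of what these calls actually return (a minimal sketch reusing the strings from the question): intersection produces a set, which is truthy only when it is non-empty, and split() with no argument behaves differently from split(" ") on a string that is just a space.

```python
words1 = set("Billie Jean".split())

# intersection returns the shared elements as a set, not a bool.
shared = words1.intersection("Baby Jean".split())
print(shared)                               # {'Jean'}
print(words1.intersection("ea".split()))    # set()

# An explicit " " separator keeps empty strings; the default drops them.
print(" ".split(" "))  # ['', '']
print(" ".split())     # []
```

This is also why the asker's original loop returned True for " ": the empty string produced by split(" ") is a substring of every string.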
A couple of pointers:
split() with no argument splits on any run of whitespace and drops empty strings, so you don't need to pass " " explicitly.
Since you are checking whether a word appears as a whole word in the other string, you must split string1 too.
Check the code below:
def func(string1, string2):
    for word in string2.split():
        if word in string1.split():
            return True
    return False
print(func("Billie Jean", "Baby Jean"))  # True
print(func("Billie Jean", "ea"))  # False
A more interesting solution would be the following:
def func(string1, string2):
    # Build a set of words for each string and test whether the intersection is non-empty
    return bool(set(string1.split()) & set(string2.split()))
print(func("Billie Jean", "Baby Jean"))  # True
print(func("Billie Jean", "ea"))  # False
Since set intersection relies on hashing, this method should be considerably faster at scale. (It might not really matter to you now, but it is good to know.)
PS. You might want to convert both strings to the same case (lower or upper) if you wish to match "Jean" with "jean".
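A minimal sketch of that case-insensitive variant, assuming whitespace-separated words. The function name shares_word is made up for this example, and str.casefold is used here as a more thorough alternative to lower().

```python
def shares_word(string1, string2):
    """Return True if the two strings share a whole word, ignoring case."""
    # Normalise the case of every word before comparing.
    words1 = {word.casefold() for word in string1.split()}
    words2 = {word.casefold() for word in string2.split()}
    # isdisjoint is True when the sets have nothing in common.
    return not words1.isdisjoint(words2)


print(shares_word("Billie Jean", "baby jean"))  # True
print(shares_word("Billie Jean", "ea"))         # False
print(shares_word("Billie Jean", " "))          # False
```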
Say I have two strings, string1="A B C " and string2="abc". How do I combine these two strings so that string1 becomes "AaBbCc"? Basically, I want each space in string1 to be replaced by the next character from string2. I tried using two for loops like this:
string1="A B C "
string2="abc"
for char1 in string1:
    if char1==" ":
        for char2 in string2:
            string1.replace(char1,char2)
    else:
        pass
print(string1)
But that doesn't work. I'm fairly new to Python so could somebody help me? I use version Python3. Thank you in advance.
You can create an iterator over string2 with iter and replace each ' ' in string1 with the next character from it, like below:
>>> string1 = "A B C "
>>> string2 = "abc"
>>> itrStr2 = iter(string2)
>>> ''.join(st if st!=' ' else next(itrStr2) for st in string1)
'AaBbCc'
If the number of spaces in string1 may be larger than the length of string2, you can use itertools.cycle to reuse the characters of string2, like below:
>>> from itertools import cycle
>>> string1 = "A B C A B C "
>>> string2 = "abc"
>>> itrStr2 = cycle(string2)
>>> ''.join(st if st!=' ' else next(itrStr2) for st in string1)
'AaBbCcAaBbCc'
string1 = "A B C "
string2 = "abc"
out, repl = '', list(string2)
for s in string1:
    out += s if s != " " else repl.pop(0)
print(out)  # AaBbCc
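For context on why the original nested loops print string1 unchanged: Python strings are immutable, so str.replace returns a new string and the result was being thrown away. A minimal sketch that stays close to the asker's idea, using the optional count argument of replace so that only the first remaining space is filled on each pass. The name fill_spaces is made up here, and the sketch assumes string2 itself contains no spaces.

```python
def fill_spaces(template, filler):
    """Replace successive spaces in template with successive chars of filler."""
    result = template
    for char in filler:
        # replace returns a new string, so the result must be reassigned.
        # count=1 limits the replacement to the first space still present.
        result = result.replace(" ", char, 1)
    return result


print(fill_spaces("A B C ", "abc"))  # AaBbCc
```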
I have a few strings that each hold a value (shown below).
How can I loop through multiple variables such as string1, string2, string3, etc.?
string1 = re.findall('qr="">(.*?)</span', str(raw[1]))
string2 = re.findall('qr="">(.*?)</span', str(raw[2]))
string3 = re.findall('qr="">(.*?)</span', str(raw[3]))
for i in x:
    print(i)
I would like it to print the values of string1, string2 and string3.
I have tried to store string1 through string3 in a list, but without success.
Try something like this, perhaps:
strings = []
string1 = re.findall('qr="">(.*?)</span', str(raw[1]))
string2 = re.findall('qr="">(.*?)</span', str(raw[2]))
string3 = re.findall('qr="">(.*?)</span', str(raw[3]))
strings.append(string1)
strings.append(string2)
strings.append(string3)
for i in strings:
    print(i)
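The numbered variables can also be avoided altogether by building the list in one step. This is only a sketch: the contents of raw below are invented stand-ins, since the question does not show what raw actually holds.

```python
import re

# Hypothetical sample data standing in for the asker's `raw` sequence.
raw = [
    "",
    '<span qr="">one</span>',
    '<span qr="">two</span>',
    '<span qr="">three</span>',
]

# Compile the pattern once and reuse it for every item.
pattern = re.compile(r'qr="">(.*?)</span')

# One findall result per item in raw[1] through raw[3].
strings = [pattern.findall(str(item)) for item in raw[1:4]]

for found in strings:
    print(found)
```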
Say I have strings,
string1 = 'Hello how are you'
string2 = 'are you doing now?'
The result should be something like
Hello how are you doing now?
I was thinking different ways using re and string search.
(Longest common substring problem)
But is there any simple way (or library) that does this in python?
To make things clear, I'll add one more set of test strings:
string1 = 'This is a nice ACADEMY'
string2 = 'DEMY you know!'
The result would be:
'This is a nice ACADEMY you know!'
This should do:
string1 = 'Hello how are you'
string2 = 'are you doing now?'
i = 0
while not string2.startswith(string1[i:]):
    i += 1
sFinal = string1[:i] + string2
Output:
>>> sFinal
'Hello how are you doing now?'
Or make it a function so that you can reuse it without rewriting it:
def merge(s1, s2):
    i = 0
    while not s2.startswith(s1[i:]):
        i += 1
    return s1[:i] + s2
Output:
>>> merge('Hello how are you', 'are you doing now?')
'Hello how are you doing now?'
>>> merge("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'
This should do what you want:
def overlap_concat(s1, s2):
    l = min(len(s1), len(s2))
    for i in range(l, 0, -1):
        if s1.endswith(s2[:i]):
            return s1 + s2[i:]
    return s1 + s2
Examples:
>>> overlap_concat("Hello how are you", "are you doing now?")
'Hello how are you doing now?'
>>>
>>> overlap_concat("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'
>>>
Using str.endswith and enumerate:
def overlap(string1, string2):
    for i, s in enumerate(string2, 1):
        if string1.endswith(string2[:i]):
            break
    return string1 + string2[i:]
>>> overlap("Hello how are you", "are you doing now?")
'Hello how are you doing now?'
>>> overlap("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'
If you need to account for trailing special characters, you could use an re-based substitution.
import re
string1 = re.sub(r'[^\w\s]', '', string1)
Note, however, that this removes all special characters from the first string, not just the trailing ones.
A modification of the above function that finds the longest matching overlap (instead of the shortest) involves trying the prefixes of string2 from longest to shortest.
def overlap(string1, string2):
    # i counts how many characters are trimmed from the end of the candidate prefix
    for i in range(len(string2)):
        if string1.endswith(string2[:len(string2) - i]):
            break
    return string1 + string2[len(string2) - i:]
>>> overlap('Where did', 'did you go?')
'Where did you go?'
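The difference between the shortest and the longest overlap only shows up when more than one prefix of the second string matches the end of the first. The sketch below uses two made-up function names and an early return instead of break, so the no-overlap case falls through to plain concatenation (that fallback is an assumption about the desired behaviour).

```python
def overlap_shortest(string1, string2):
    """Join on the shortest prefix of string2 that ends string1."""
    for i in range(1, len(string2) + 1):
        if string1.endswith(string2[:i]):
            return string1 + string2[i:]
    return string1 + string2  # no overlap found


def overlap_longest(string1, string2):
    """Join on the longest prefix of string2 that ends string1."""
    for i in range(min(len(string1), len(string2)), 0, -1):
        if string1.endswith(string2[:i]):
            return string1 + string2[i:]
    return string1 + string2  # no overlap found


print(overlap_shortest("xaba", "abay"))  # xababay
print(overlap_longest("xaba", "abay"))   # xabay
```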
The other answers were great, but they fail for this input:
string1 = 'THE ACADEMY has'
string2= '.CADEMY has taken'
output:
>>> merge(string1,string2)
'THE ACADEMY has.CADEMY has taken'
>>> overlap(string1,string2)
'THE ACADEMY has'
However, the standard library module difflib proved to be effective in my case:
from difflib import SequenceMatcher
match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
print(match)  # -> Match(a=5, b=1, size=10)
print(string1[:match.a + match.size] + string2[match.b + match.size:])
output:
Match(a=5, b=1, size=10)
THE ACADEMY has taken
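One caveat worth keeping in mind: find_longest_match returns the longest common block anywhere in the two strings, not necessarily one that sits at the end of the first string and the start of the second, so stitching on it is a heuristic. A hedged sketch of the same idea as a reusable function follows. The name merge_on_longest_match and the fallback to plain concatenation when nothing matches are assumptions made for this example.

```python
from difflib import SequenceMatcher


def merge_on_longest_match(s1, s2):
    """Stitch s1 and s2 together around their longest common block."""
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    match = matcher.find_longest_match(0, len(s1), 0, len(s2))
    if match.size == 0:
        # Assumption: with nothing in common, just concatenate.
        return s1 + s2
    # Keep s1 up to the end of the block, then the rest of s2 after it.
    return s1[:match.a + match.size] + s2[match.b + match.size:]


print(merge_on_longest_match('THE ACADEMY has', '.CADEMY has taken'))
print(merge_on_longest_match('Hello how are you', 'are you doing now?'))
```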
The words you want to drop are the ones in the second string that already appear in the first, so you can try something like:
new_string = [string2.split()]
new1 = [j for item in new_string for j in item if j not in string1]
new1.insert(0, string1)
print(" ".join(new1))
with the first test case:
string1 = 'Hello how are you'
string2 = 'are you doing now?'
output:
Hello how are you doing now?
second test case:
string1 = 'This is a nice ACADEMY'
string2 = 'DEMY you know!'
output:
This is a nice ACADEMY you know!
Explanation:
First, we split the second string so we can find which words have to be removed:
new_string=[string2.split()]
Second, we check each word of this split string against string1. If a word already occurs in string1 (as a substring), it is dropped from the second string; otherwise it is kept:
new1=[j for item in new_string for j in item if j not in string1]
This list comprehension is the same as:
new1=[]
for item in new_string:
    for j in item:
        if j not in string1:
            new1.append(j)
The last step puts string1 in front and joins the list:
new1.insert(0,string1)
print(" ".join(new1))
What is an elegant way to look for a string within another string in Python, but only if the substring is within whole words, not part of a word?
Perhaps an example will demonstrate what I mean:
string1 = "ADDLESHAW GODDARD"
string2 = "ADDLESHAW GODDARD LLP"
assert string_found(string1, string2) # this is True
string1 = "ADVANCE"
string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
assert not string_found(string1, string2) # this should be False
How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:
def string_found(string1, string2):
    if string2.find(string1 + " "):
        return True
    return False
But that doesn't feel very elegant, and also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)
You can use regular expressions and the word boundary special character \b:
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
import re
def string_found(string1, string2):
    # re.escape keeps any regex metacharacters in string1 literal
    if re.search(r"\b" + re.escape(string1) + r"\b", string2):
        return True
    return False
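As a quick check against the examples from the question, here is a self-contained sketch of the same \b approach that returns a plain bool. The third call is an extra case added for illustration: it shows that punctuation next to a word also counts as a boundary.

```python
import re


def string_found(string1, string2):
    """Return True if string1 occurs in string2 on word boundaries."""
    pattern = r"\b" + re.escape(string1) + r"\b"
    return re.search(pattern, string2) is not None


print(string_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))  # True
print(string_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
print(string_found("GODDARD", "ADDLESHAW GODDARD, LLP"))           # True
```

Passing flags=re.IGNORECASE to re.search would make the match case-insensitive if that is wanted.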
If word boundaries are only whitespace for you, you could also get away with prepending and appending a space to your strings:
def string_found(string1, string2):
    string1 = " " + string1.strip() + " "
    string2 = " " + string2.strip() + " "
    # find returns -1 (which is truthy) when there is no match, so compare explicitly
    return string2.find(string1) != -1
The simplest and most pythonic way, I believe, is to break the strings down into individual words and scan for a match:
string = "My Name Is Josh"
substring = "Name"
for word in string.split():
    if substring == word:
        print("Match Found")
For a bonus, here's a oneliner:
any(substring == word for word in string.split())
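Note that comparing single words only works when the substring is itself one word. For a multi-word phrase such as "ADDLESHAW GODDARD", one possible extension is to compare runs of consecutive words. This is a sketch under the assumption that words are separated by whitespace, and the name phrase_found is made up here.

```python
def phrase_found(phrase, text):
    """Return True if the words of phrase appear consecutively in text."""
    needle = phrase.split()
    words = text.split()
    n = len(needle)
    if n == 0:
        return False
    # Slide a window of n words across the text and compare whole words.
    return any(words[i:i + n] == needle for i in range(len(words) - n + 1))


print(phrase_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))  # True
print(phrase_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
```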
Here's a way to do it without a regex (as requested) assuming that you want any whitespace to serve as a word separator.
import string
def find_substring(needle, haystack):
    index = haystack.find(needle)
    if index == -1:
        return False
    if index != 0 and haystack[index-1] not in string.whitespace:
        return False
    L = index + len(needle)
    if L < len(haystack) and haystack[L] not in string.whitespace:
        return False
    return True
I'm building off aaronasterling's answer.
The problem with the above code is that it will return false when there are multiple occurrences of needle in haystack, with the second occurrence satisfying the search criteria but not the first.
Here's my version:
import string
def find_substring(needle, haystack):
    search_start = 0
    while search_start < len(haystack):
        index = haystack.find(needle, search_start)
        if index == -1:
            return False
        is_prefix_whitespace = (index == 0 or haystack[index-1] in string.whitespace)
        # Continue the next search just past this occurrence
        search_start = index + len(needle)
        is_suffix_whitespace = (search_start == len(haystack) or haystack[search_start] in string.whitespace)
        if is_prefix_whitespace and is_suffix_whitespace:
            return True
    return False
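The failure case described above can be reproduced with a short, self-contained sketch. Both function names are made up for the comparison. Unlike the version above, the second function advances by one character after a rejected occurrence, which is a design choice made here so that overlapping occurrences are not skipped.

```python
import string


def _on_boundaries(haystack, index, length):
    """True if haystack[index:index+length] is delimited by whitespace or the ends."""
    end = index + length
    before_ok = index == 0 or haystack[index - 1] in string.whitespace
    after_ok = end == len(haystack) or haystack[end] in string.whitespace
    return before_ok and after_ok


def first_only(needle, haystack):
    """Check only the first occurrence of needle."""
    index = haystack.find(needle)
    return index != -1 and _on_boundaries(haystack, index, len(needle))


def all_occurrences(needle, haystack):
    """Check every occurrence of needle until one sits on word boundaries."""
    start = 0
    while True:
        index = haystack.find(needle, start)
        if index == -1:
            return False
        if _on_boundaries(haystack, index, len(needle)):
            return True
        start = index + 1


# The first "is" is inside "this"; only the second one is a whole word.
print(first_only("is", "this is"))       # False
print(all_occurrences("is", "this is"))  # True
```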
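One approach using the re (regular expression) module that should accomplish this task is:
import re
string1 = "pizza pony"
string2 = "who knows what a pizza pony is?"
# re.escape keeps string1 literal; \b on both sides also matches at the very end of string2
search_result = re.search(r'\b' + re.escape(string1) + r'\b', string2)
if search_result:
    print(search_result.group())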
Excuse me REGEX fellows, but the simpler answer is:
text = "this is the esquisidiest piece never ever writen"
word = "is"
" {0} ".format(text).lower().count(" {0} ".format(word).lower())
The trick here is to add 2 spaces surrounding the 'text' and the 'word' to be searched, so you guarantee there will be returning only counts for the whole word and you don't get troubles with endings and beginnings of the 'text' searched.
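Wrapped as a function, the padding trick looks roughly like the sketch below. The name count_whole_word is made up for this example. Two limitations are worth noting: only the space character counts as a separator, and because str.count counts non-overlapping matches, directly repeated words share a padding space and are undercounted, so the result is safer as a presence test than as an exact count.

```python
def count_whole_word(word, text):
    """Count space-delimited occurrences of word in text, ignoring case."""
    padded_text = " {0} ".format(text).lower()
    padded_word = " {0} ".format(word).lower()
    return padded_text.count(padded_word)


text = "this is the esquisidiest piece never ever writen"
print(count_whole_word("is", text))     # 1
print(count_whole_word("is", "is is"))  # 1, not 2: the matches would overlap
```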
Thanks to Chris Larson's comment, I tested it and updated it as below:
import re
string1 = "massage"
string2 = "muscle massage gun"
try:
    # re.search returns None when there is no match, so .group() raises AttributeError
    re.search(r'\b' + string1 + r'\W', string2).group()
    print("Found word")
except AttributeError:
    print("Not found")
def string_found(string1, string2):
    if string1 not in string2:
        return False
    start = string2.index(string1)
    end = start + len(string1)
    # The match must begin at the start of string2 or follow a space...
    starts_on_boundary = start == 0 or string2[start - 1] == " "
    # ...and must finish at the end of string2 or precede a space.
    ends_on_boundary = end == len(string2) or string2[end] == " "
    return starts_on_boundary and ends_on_boundary