string.find indexing in Python

I'm trying to understand why the following python code incorrectly returns the string "dining":
def remove(somestring, sub):
    """Return somestring with sub removed."""
    location = somestring.find(sub)
    length = len(sub)
    part_before = somestring[:location]
    part_after = somestring[location + length:]
    return part_before + part_after

print(remove('ding', 'do'))
I realize the way to make the code run correctly is to add an if statement so that if the location variable returns a -1 it will simply return the original string (in this case "ding"). The code, for example, should be:
def remove(somestring, sub):
    """Return somestring with sub removed."""
    location = somestring.find(sub)
    if location == -1:
        return somestring
    length = len(sub)
    part_before = somestring[:location]
    part_after = somestring[location + length:]
    return part_before + part_after

print(remove('ding', 'do'))
Without using the if statement to fix the function, the part_before variable will return the string "din". I would love to know why this happens. Reading the python documentation on string.find (which is ultimately how part_before is formulated) I see that the location variable would become a -1 because "do" is NOT found. But if the part_before variable holds all letters before the -1 index, shouldn't it be blank and not "din"? What am I missing here?
For reference, Python documentation for string.find states:
string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.

>>> string = 'ding'
>>> string[:-1]
'din'
Using a negative number as an index in Python returns the nth element from the right-hand side. Accordingly, a slice with :-1 returns all but the last element of the string.

If you have a string 'ding' and you are searching for 'do', str.find() will return -1. 'ding'[:-1] is equal to 'din' and 'ding'[-1 + len(sub):] equals 'ding'[1:] which is equal to 'ing'. Putting the two together results in 'dining'. To get the right answer, try something like this:
def remove(string, sub):
    index = string.find(sub)
    if index == -1:
        return string
    else:
        return string[:index] + string[index + len(sub):]
The reason that string[:-1] is not equal to the whole string is that in slicing, the first number (in this case blank so equal to None, or for our purposes equivalent to 0) is inclusive, but the second number (-1) is exclusive. string[-1] is the last character, so that character is not included.
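To see where "dining" comes from, the two slices can be evaluated one step at a time. This is only an illustrative sketch of the unguarded version of the function; the variable names mirror the question's code.

```python
somestring = 'ding'
sub = 'do'

# 'do' does not occur in 'ding', so find() signals failure with -1.
location = somestring.find(sub)

# somestring[:-1]: everything except the last character.
part_before = somestring[:location]

# -1 + len('do') == 1, so this is somestring[1:]: everything after the first character.
part_after = somestring[location + len(sub):]

print(location)                  # -1
print(part_before)               # din
print(part_after)                # ing
print(part_before + part_after)  # dining
```

The -1 returned by find() is a "not found" flag, but the slice has no way of knowing that and treats it as an ordinary negative index.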

Related

What does "[1:][::-1]" mean, for example (str) a[1:][::-1]? [duplicate]

This is actually a problem on LeetCode. I found this code in the solution, but I couldn't understand how it works. The problem is basically to reverse a given integer; for example, if x = 123, the result should be 321.
This is the code:
class Solution:
    def reverse(self, x: int) -> int:
        s = str(x)
        def check(m):
            if ans > 4294967295/2 or ans < -4294967295/2:
                return 0
            else:
                return ans
        if x < 0:
            ans = int('-' + s[1:][::-1])
            return check(ans)
        else:
            ans = int(s[::-1])
            return check(ans)
I'm a beginner in programming so I have never seen anything like that in a string in python.

These are Python string slices.
There are three parts to a string slice - the starting position, the end position and the step. These take the form string_obj[start_position:end_position:step].
When omitted, the start position defaults to the start of the string, the end position defaults to the end of the string, and the step defaults to 1 (with a negative step, the omitted start and end swap roles, so the slice begins at the end of the string).
The expression s[1:][::-1] is doing two slice operations.
s[1:] uses index 1 as the start position and leaves the end position blank (which defaults to the end of the string). It returns a string that's the same as s only without the first character of the string.
This new string is then fed into the second slicing operation. [::-1] has no start or end position defined so the whole string will be used, and the -1 in the step place means that the returned string will be stepped through backwards one step at a time, reversing the string.
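A short sketch of the two slice operations applied one after the other, assuming the input is -123. The intermediate variable names are made up for illustration and do not appear in the original solution.

```python
s = str(-123)                          # '-123'

without_sign = s[1:]                   # start at index 1, run to the end
reversed_digits = without_sign[::-1]   # whole string, stepping backwards

print(without_sign)                    # 123
print(reversed_digits)                 # 321
print(int('-' + reversed_digits))      # -321

# Chaining the slices gives the same result as doing them separately.
print(s[1:][::-1] == reversed_digits)  # True
```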

I will explain with an example:
x = 123
s = str(x)  # indices are 0, 1, 2
print(s[1:])
This prints 23. Indexing starts at 0, so s[1:] starts at index 1 and runs to the end of the string.
s[::-1] reverses the string. The two empty positions mean "from the beginning" and "to the end", and the step of -1 means the string is walked from the end back to the start.
In this problem we need to reverse an integer. When the number is negative, for example -123, reversing the whole string with s[::-1] gives 321-. To get -321 instead, s[1:] first removes the "-", and the remaining 123 is reversed to 321:
ans = int('-' + s[1:][::-1])
When the integer value is less than 0:
The line above converts -123 to -321. The "-" is left out of the reversal and added back by '-' + s[1:][::-1]. Calling int() on the result converts the string "-321" to the integer -321.
When the integer value is 0 or above:
We only need to reverse the digits using s[::-1].
def check(m):
    if ans > 4294967295/2 or ans < -4294967295/2:
        return 0
    else:
        return ans
This enforces a constraint given in the problem: the reversed integer must fit in the signed 32-bit range, from -2^31 to 2^31 - 1; otherwise 0 should be returned.

String "c" is the last item, so why doesn't the loop break?

I think this loop should break right away. The position of "c" in the string "abc" is -1, so the condition should be true. Why isn't it?
string = "abc"
print(string[-1])
while True:
    position = string.find("c")
    if position == -1:
        break
    string = string[:position] + "f" + string[position+len("c"):]
print(string)

The indexing syntax mystr[-1] gives you the last element of mystr, but that is a convenience of the indexing syntax. When you use find() you get a number counting from zero, so 2 in this case. The return value -1 means not found.
You are overgeneralizing the -1 convention: it doesn't apply to find. If it did, then string.find("c") could equally well return -1 or 2 in this example because both would be correct. That would be inconvenient, to say the least.
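The two conventions can be checked side by side. A minimal sketch; the variable name text is only for illustration.

```python
text = "abc"

print(text[-1])        # c   (indexing: -1 counts from the end)
print(text.find("c"))  # 2   (find: zero-based position from the start)
print(text.find("z"))  # -1  (find: -1 only ever means "not found")

# Both expressions refer to the same character, via different conventions.
print(text[text.find("c")] == text[-1])  # True
```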

str.find returns a non-negative index when the substring is found, so position = 2 in your example.
The preferred solution is to simply test against the length of your string:
if position == len(string) - 1:
    pass  # do something
Alternatively, for the negative index, you can redefine position:
position = string.find('c') - len(string) # -1
However, be careful: if your character is not found, str.find returns -1. So there is good reason why positive integers are preferred in the first place.
See this answer for a diagram of how negative indexing works.

How do I detect any of 4 characters in a string and then return their index?

I would understand how to do this assuming that I was only looking for one specific character, but in this instance I am looking for any of the 4 operators, '+', '-', '*', '/'. The function returns -1 if there is no operator in the passed string, txt, otherwise it returns the position of the leftmost operator. So I'm thinking find() would be optimal here.
What I have so far:
def findNextOpr(txt):
    # txt must be a nonempty string.
    if len(txt) <= 0 or not isinstance(txt, str):
        print("type error: findNextOpr")
        return "type error: findNextOpr"
    if '+' in txt:
        return txt.find('+')
    elif '-' in txt:
        return txt.find('-')
    else:
        return -1
I think if I did what I did for the '+' and '-' operators for the other operators, it wouldn't work for multiple instances of that operator in one expression. Can a loop be incorporated here?

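Your current approach is not very efficient, as you iterate over txt twice for each operator (once for in and once for find()).
You could use index() instead of find() and just ignore the ValueError exception (note that this checks the operators in the order '+-*/', so it returns the position of the first of them that occurs anywhere in txt, which is not necessarily the leftmost operator), e.g.: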
def findNextOpr(txt):
    for o in '+-*/':
        try:
            return txt.index(o)
        except ValueError:
            pass
    return -1
You can do this in a single (perhaps more readable) pass by enumerate()ing the txt and return if you find the character, e.g.:
def findNextOpr(txt):
    for i, c in enumerate(txt):
        if c in '+-*/':
            return i
    return -1
Note: if you wanted all of the operators you could change the return to yield, and then just iterate over the generator, e.g.:
def findNextOpr(txt):
    for i, c in enumerate(txt):
        if c in '+-*/':
            yield i

In []:
for op in findNextOpr('1+2-3+4'):
    print(op)

Out[]:
1
3
5

You can improve your code a bit because you keep looking at the string a lot of times. '+' in txt actually searches through the string just like txt.find('+') does. So you can combine those easily to avoid having to search through it twice:
pos = txt.find('+')
if pos >= 0:
    return pos
But this still leaves you with the problem that this will return for the first operator you are looking for if that operator is contained anywhere within the string. So you don’t actually get the first position any of these operators is within the string.
So what you want to do is look for all operators separately, and then return the lowest non-negative number, since that’s the first occurrence of any of the operators within the string (the default makes the function return -1 when no operator is found, instead of min() raising ValueError on an empty sequence):
plusPos = txt.find('+')
minusPos = txt.find('-')
multPos = txt.find('*')
divPos = txt.find('/')
return min((pos for pos in (plusPos, minusPos, multPos, divPos) if pos >= 0), default=-1)

First, you shouldn't be printing or returning error messages; you should be raising exceptions. TypeError and ValueError would be appropriate here. (A string that isn't long enough is the latter, not the former.)
Second, you can simply find the positions of all the operators in the string using a list comprehension, exclude results of -1, and return the lowest of the positions using min().
def findNextOpr(text, start=0):
    ops = "+-/*"
    assert isinstance(text, str), "text must be a string"
    # "text must not be empty" isn't strictly true:
    # you'll get a perfectly sensible result for an empty string
    assert text, "text must not be empty"
    op_idxs = [pos for pos in (text.find(op, start) for op in ops) if pos > -1]
    return min(op_idxs) if op_idxs else -1
I've added a start argument that can be used to find the next operator: simply pass in the index of the last-found operator, plus 1.
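As a rough sketch of how the start argument might be used to walk through every operator in turn: the helper all_operator_positions and the snake_case name find_next_opr are hypothetical, introduced only for this example, and the assertions from the version above are left out to keep it short.

```python
def find_next_opr(text, start=0):
    """Return the index of the leftmost operator at or after start, or -1."""
    positions = [text.find(op, start) for op in "+-*/"]
    found = [pos for pos in positions if pos > -1]
    return min(found) if found else -1


def all_operator_positions(text):
    """Collect operator positions by restarting just past the last match."""
    positions = []
    pos = find_next_opr(text)
    while pos != -1:
        positions.append(pos)
        pos = find_next_opr(text, pos + 1)  # last-found index, plus 1
    return positions


print(all_operator_positions("1+2-3*4"))  # [1, 3, 5]
print(find_next_opr("1-2+3"))             # 1 (leftmost operator, not the first '+')
```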

counting the number of substrings in a string

I am working on a Python assignment and I am stuck here.
Apparently, I have to write code that counts the number of occurrences of a given substring within a string.
I thought I got it right, but I am stuck here.
def count(substr,theStr):
    # your code here
    num = 0
    i = 0
    while substr in theStr[i:]:
        i = i + theStr.find(substr)+1
        num = num + 1
    return num

substr = 'is'
theStr = 'mississipi'
print(count(substr,theStr))
If I run this, I expect to get 2 as the result, but I get 3 instead...
Other examples, such as ana in banana, work fine, but this specific example keeps giving the wrong result. I don't know what I did wrong here.
Would you PLEASE help me out.

In your code
while substr in theStr[i:]:
correctly advances over the target string theStr, however the
i = i + theStr.find(substr)+1
keeps looking from the start of theStr.
The str.find method accepts optional start and end arguments to limit the search:
str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is found
within the slice s[start:end]. Optional arguments start and end
are interpreted as in slice notation. Return -1 if sub is not found.
We don't really need to use in here: we can just check that find doesn't return -1. It's a bit wasteful performing an in search when we then need to repeat the search using find to get the index of the substring.
I assume that you want to find overlapping matches, since the str.count method can find non-overlapping matches, and since it's implemented in C it's more efficient than implementing it yourself in Python.
def count(substr, theStr):
    num = i = 0
    while True:
        j = theStr.find(substr, i)
        if j == -1:
            break
        num += 1
        i = j + 1
    return num

print(count('is', 'mississipi'))
print(count('ana', 'bananana'))

output
2
3
The core of this code is
j = theStr.find(substr, i)
i is initialised to 0, so we start searching from the beginning of theStr, and because of i = j + 1 subsequent searches start looking from the index following the last found match.
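To make the overlapping versus non-overlapping distinction concrete, here is a small comparison with the built-in str.count. The function name count_overlapping is made up for this sketch; it uses the same find-with-start technique as the code above.

```python
def count_overlapping(substr, text):
    """Count matches of substr in text, allowing matches to overlap."""
    count = start = 0
    while True:
        found = text.find(substr, start)
        if found == -1:
            return count
        count += 1
        start = found + 1  # move one character on, so overlaps are seen


# str.count skips past each whole match, so it misses the overlapping one.
print('bananana'.count('ana'))               # 2
print(count_overlapping('ana', 'bananana'))  # 3
```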

The code change you need is -
i = i + theStr[i:].find(substr)+ 1
instead of
i = i + theStr.find(substr)+ 1
In your code the substring is still found in theStr[i:] until i moves past position 4. But when looking up the index of the substring, you search the original (whole) string, which always returns position 1.
In your banana example, i becomes 2 after the first iteration. So in the next iteration theStr[i:] becomes nana, and the position of the substring ana in this sliced string happens to be the same as in the original string, namely 1. So the bug is masked and the code seems to work fine.
If your code is purely for learning purposes, then you can do it this way. Otherwise you may want to use the functions Python provides (like count()) to do the job.
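For comparison, here are the original loop and the one-line fix side by side. The names count_buggy and count_fixed are hypothetical, chosen only for this sketch.

```python
def count_buggy(substr, the_str):
    num = i = 0
    while substr in the_str[i:]:
        # Searches the whole string every time, so the offset is always the same.
        i = i + the_str.find(substr) + 1
        num += 1
    return num


def count_fixed(substr, the_str):
    num = i = 0
    while substr in the_str[i:]:
        # Searches only the part of the string that has not been scanned yet.
        i = i + the_str[i:].find(substr) + 1
        num += 1
    return num


print(count_buggy('is', 'mississipi'))  # 3
print(count_fixed('is', 'mississipi'))  # 2
print(count_buggy('ana', 'banana'))     # 2 (the bug happens to be masked here)
print(count_fixed('ana', 'banana'))     # 2
```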

Counting the number of substrings:
def count(substr,theStr):
    num = 0
    for i in range(len(theStr)):
        if theStr[i:i+len(substr)] == substr:
            num += 1
    return num

substr = 'is'
theStr = 'mississipi'
print(count(substr,theStr))
O/P : 2
where theStr[i:i+len(substr)] is a slice of the string, i is the starting index and i+len(substr) is the ending index.
E.g.
i = 0
substr length = 2
the first slice compared is => mi
String slice more details

Create Your Own Find String Function

For a school project I have to create a function called find_str that essentially does the same thing as the .find string method, but we cannot use any string methods in our definition.
The project description reads: "Function find_str has two parameters (both strings). It returns the lowest index where the second parameter is found within the first parameter (it returns -1 if the second parameter is not found within the first parameter)."
I have spent a lot of time working on this project and have yet to come to a solution. This is the current definition that I have come up with:
def find_str (string, substring):
    index = 0
    length = len (substring)
    for ch in string:
        if ch == substring [0]:
            subindex1 = 0
            subindex2 = index
            for i in range (length):
                if ch == substring [i]:
                    subindex1 +=1
                    if subindex1 == length:
                        return index
                    ch = string [(subindex2)+1]
                    subindex2 +=1
        index += 1
    return "-1"
This sample of code only works in some instances, but not all.
For example:
print (find_str ("hello", "llo"))
returns:
2
as it should.
But
print (find_str ("hello", "el"))
returns:
ch = string [(subindex2)+1]
IndexError: string index out of range
I feel like I am overthinking this and there must be an easier way to do it. Any input or help would be great! Thanks.

Using a sub-function to clarify your thoughts often helps.
def find_str(string, substring):
    # Only try start positions where the whole substring still fits,
    # so is_next_sub never reads past the end of string.
    for j in range(len(string) - len(substring) + 1):
        if is_next_sub(string, substring, j):
            return j
    return -1

def is_next_sub(string, substring, index):
    # True if substring occurs in string starting at index.
    for i in range(len(substring)):
        if substring[i] != string[index + i]:
            return False
    return True

I'm not sure we should be helping you with 'homework'
How about this:
def find_str(string, substring):
    for off in range(len(string)):
        if string[off:].startswith(substring):
            return off
    return -1

I haven't checked through your code in detail, but it looks like you're trying to compare characters that don't exist.
Suppose you're searching "aaaaa" for the substring "aaa", and you need to find all matches...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Even though the characters always match, and there are five characters in the string, there are only three positions that you might need to consider.
So before you look at the actual characters at all, you can restrict the number of start positions you might need to consider based on the lengths of the string and substring. You only loop for those start positions. That means you're not looping for start positions that cannot match. Also, if you don't do this...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Match at 3 : ...aa!
Match at 4 : ....a!!
Those exclamation points are places where you try to match a character in the substring with a character that doesn't exist, after the end of the string. You can check for that within the loop to avoid the error each time it occurs, but why not eliminate all those cases at once by not looping for the match positions that cannot occur?
The number of start positions you may need to check is len(fullstring) + 1 - len(substring), so you can derive a range of possible start positions using range(0, len(fullstring) + 1 - len(substring)).
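A minimal sketch of find_str built on that range, using only indexing and len() (no string methods). This is one possible way to apply the idea, not the only one; the parameter names follow the formula above.

```python
def find_str(fullstring, substring):
    """Return the lowest index where substring occurs in fullstring, or -1."""
    # Start positions beyond this range cannot hold the whole substring.
    for start in range(0, len(fullstring) + 1 - len(substring)):
        for offset in range(len(substring)):
            if fullstring[start + offset] != substring[offset]:
                break          # mismatch: give up on this start position
        else:
            return start       # inner loop finished without a mismatch
    return -1


print(find_str("hello", "llo"))  # 2
print(find_str("hello", "el"))   # 1
print(find_str("aaaaa", "aaa"))  # 0
print(find_str("hello", "lox"))  # -1
```

Because the outer range stops early, fullstring[start + offset] can never index past the end of the string, and a substring longer than the full string simply produces an empty range and a result of -1.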
