String slices/substrings/ranges in Python - python

I'm newbie in Python and I would like to know something that I found very curious.
Let's say I have this:
s = "hello"
Then:
s[1:4] prints "ell" which makes sense...
and then s[3:-1] prints only 'l', which makes sense too..
But!
s[-1:3] which is same range but backwards returns an empty string ''... and s[1:10] or s[1:-20] is not throwing an error at all.. which.. from my point of view, it should produce an error right? A typical out-of-bounds error.. :S
My conclusion is that ranges always run from left to right. I would like to confirm with the community whether this is right or not.
Thanks!

s[-1:3] returns the empty string because there is nothing in that range. It is requesting the range from the last character, to the third character, moving to the right, but the last character is already past the third character.
Ranges are by default left to right.
There are extended slices, which can reverse the step or change its size. So s[-1:3:-1] will give you just 'o'. The last -1 in that slice says that the slice should move from right to left.
Slices won't throw errors if you request a range that isn't in the string, they just return an empty string for those positions.
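To see how the bounds get adjusted, one option is the built-in slice.indices method, which reports the start, stop and step a slice resolves to for a given length. This is only a small sketch using the "hello" string from the question:

```python
s = "hello"

# slice.indices(length) returns the (start, stop, step) that the slice
# resolves to for a sequence of that length, after clamping.
print(slice(1, 10).indices(len(s)))      # (1, 5, 1)
print(slice(-1, 3).indices(len(s)))      # (4, 3, 1)
print(slice(1, -20).indices(len(s)))     # (1, 0, 1)
print(slice(-1, 3, -1).indices(len(s)))  # (4, 3, -1)

# Whenever the resolved start is not before the resolved stop
# (for a positive step), the result is simply empty.
print(repr(s[-1:3]))   # ''
print(repr(s[1:-20]))  # ''
print(repr(s[1:10]))   # 'ello'
```

Plain indexing is different: s[10] does raise IndexError, because a single index has nothing to clamp to.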

Ranges are "clamped" to the extent of the string... i.e.
s[:10]
will return the first 10 characters, or less if the string is not long enough.
A negative index means starting counting from the end, so s[-3:] takes the last three characters (or less if the string is shorter).
You can slice backward, but you need to use an explicit negative step, like
s[10:5:-1]
You can also simply get the reverse of a string with
s[::-1]
or the string composed by taking all chars in even position with
s[::2]
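A short run of these forms on the question's "hello" string (the specific indices below are just illustrative):

```python
s = "hello"

print(s[:10])     # hello  (stop is clamped to the length)
print(s[-3:])     # llo    (last three characters)
print(s[4:1:-1])  # oll    (backward, stop index 1 is excluded)
print(s[::-1])    # olleh  (whole string reversed)
print(s[::2])     # hlo    (indices 0, 2, 4)
```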

Related

Printing range of numbers without using loop?

Can someone explain to me how this code works ? It prints numbers from 0 to 100, but I cannot understand how.
print(*range(True,ord("e")))
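ord accepts a character and returns its Unicode code point (for a character like this one, that is the same as the ASCII code). In this case "e" returns 101. The * in *range unpacks the iterable that range creates, passing each number to print as a separate argument. That lets you print the values from True (1) to 101 - 1, i.e. 1 to 100.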
I found this out by googling each piece of code individually. Type in "ord python" then another search was "star range python". These searches lead to information you are seeking.
print(*range(True,ord("e")))
Firstly, print() means that we are displaying some information on the screen.
Secondly, the * unpacks the range, so each number is passed to print() as a separate object to be printed.
Now, True is treated as the integer 1 here, so the range starts at 1. The ord("e") asks where e is in Unicode. It returns that number, which is 101. Now, range(start, end) means every value from the start value (1) up to, but not including, the end value (101), so the output is 1 through 100.
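Each piece can be checked on its own. A quick sketch (the variable name numbers is only for illustration):

```python
# bool is a subclass of int, so True behaves as the integer 1.
print(True == 1)   # True
print(ord("e"))    # 101

numbers = list(range(True, ord("e")))
print(numbers[0], numbers[-1], len(numbers))  # 1 100 100

# The star unpacks the range into separate arguments for print().
print(*range(True, 4))  # 1 2 3
```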

Recursively search word in a matrix of characters

I'm trying to write a program for a homework using recursion to search for a word in a matrix (2x2 or more), it can be going from left to right or from up to down (no other directions), for example if I am searching for ab , in the matrix [['a','b'],['c','d']], the program should return in what direction the word is written (across), the starting index(0), ending index(2), and the index of the row or column(0).
My problem is that I have the idea of the recursion but I can't implement it. I tried to break the problem down into smaller problems, like searching for the word in a given row. I started by thinking of the smallest case, which is a 2x2 matrix: at the first row and column, I need to search one to the right and one to the bottom of the first char and check if they are equal to my given string, then give my recursion function a smaller problem with the index+1. However, I can't think of what to make my function return at the base case of the recursion. I've been trying to solve it and think of ways to do it for two days, and I can't code what I think about or draw.
Note that I can't use any loops. I would really appreciate it if someone could push me in the right direction. Any help would be much appreciated, thanks in advance.
Edit: more examples: for input of matrix : [['a','b','c'],['d','e','f'],['g','h','i']] the outputs are:
with the string ab : across,0,0,2
with the string be : down,1,0,2
with the string ghi: across,2,0,3
I assume that the word we are looking for could be found starting from any place but we can move up to down or left to right only.
In that case, you should have a function that takes the start index and a direction, then keeps moving in that direction from the start index until it finds a mismatch or runs out of characters to compare, and returns true or false depending on whether the whole string matched.
Now call this function for every index of the matrix with both directions, up to down and left to right; if it returns true at any index, you have found your answer.
This is only the basic idea; how you optimize it from here is up to you.
Update:
To avoid using loops:
The other way I can think of is for the function to take the row, the column, and the string to find. At each call, first check whether the character at the given row and column matches the first character of the string. If it does, call the function twice more, once one step to the right and once one step down, passing the string with its first character removed.
To cover every cell of the matrix, you also call the function in the down and right directions with the exact same, unshortened string.
The base case is reaching the end of the string: then you have found the answer and return True. If a character does not match or you run off the matrix, return False.
One more thing to notice is that if any of the recursive calls returns True, the current row/column also returns True.
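A minimal loop-free sketch of this idea follows. The function names matches and search and the returned tuple layout are assumptions made for illustration, chosen to mirror the question's examples. Unlike the description above, each attempt keeps one fixed direction, so a word cannot bend from across to down partway through.

```python
def matches(matrix, word, row, col, d_row, d_col):
    """Return True if word is spelled out from (row, col) in steps of (d_row, d_col)."""
    if not word:
        return True   # base case: every character matched
    if row >= len(matrix) or col >= len(matrix[0]):
        return False  # ran off the matrix before the word ended
    if matrix[row][col] != word[0]:
        return False
    return matches(matrix, word[1:], row + d_row, col + d_col, d_row, d_col)


def search(matrix, word, row=0, col=0):
    """Visit cells in row-major order by recursion instead of loops.

    Returns (direction, row_or_column_index, start, end) or None.
    """
    if row >= len(matrix):
        return None                             # every cell has been tried
    if col >= len(matrix[0]):
        return search(matrix, word, row + 1, 0)  # move to the next row
    if matches(matrix, word, row, col, 0, 1):
        return ("across", row, col, col + len(word))
    if matches(matrix, word, row, col, 1, 0):
        return ("down", col, row, row + len(word))
    return search(matrix, word, row, col + 1)


grid = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
print(search(grid, "ab"))   # ('across', 0, 0, 2)
print(search(grid, "be"))   # ('down', 1, 0, 2)
print(search(grid, "ghi"))  # ('across', 2, 0, 3)
```

Because every cell costs at least one recursive call, a large matrix could hit Python's recursion limit; for homework-sized inputs this should not matter.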
Cheers!

what is output of following code? if it is empty string then please explain why?

I declare a string variable and want to access some characters with the slicing operator, but it shows an empty string as output.
Please explain why it is empty.
I tried printing with different end indexes; it works for all the others but fails when the end index becomes 0 (written as -1 below, since the stop is exclusive).
s='0123456789'
print(s[2:-1:-1])
In Python slicing, -1 means "the last element". So for a 10-character string, it's equivalent to 9. And then since your step is -1, you are slicing in the wrong way, so the result becomes empty.
If you want '210', you can go with s[2::-1], although it's a bit inconvenient when your end is a variable. There are multiple workarounds, though, like s[0:3][::-1].
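When the end is a variable, one possible workaround is to pass None as the stop, which means "keep going past index 0". A small sketch with the string from the question (the name stop is just illustrative):

```python
s = '0123456789'

print(repr(s[2:-1:-1]))  # ''   (-1 resolves to index 9, so nothing lies between)
print(s[2::-1])          # 210
print(s[0:3][::-1])      # 210

# None as the stop behaves like an omitted stop, which is convenient
# when the stop is held in a variable.
stop = None
print(s[2:stop:-1])      # 210
stop = 0
print(s[2:stop:-1])      # 21  (index 0 itself is excluded)
```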
See:
>>> s='0123456789'
>>> print(s[2:-1:-1])
>>> print(s[-1:2:-1])
9876543
The reason that the first statement isn't printing anything is because the start is already less than the end - if you were to continue to step -1, you would go out of the bounds of the array. Keep in mind that s[-1] resolves to s[len(s) - 1].
Or, in other words, if I told you to start at the second index of the array and go frontwards until you hit the last index of the array, it wouldn't make sense. After all, if you go frontwards, you're going towards the front of the array, not the back.
Meanwhile, if I switch those commands around - "start at the last index, and go frontwards until you hit the second index" - that makes perfect sense.
First argument means start_index, second means end_index (empty means until end iteration), third is step.
Your expected output: 210
s='0123456789'
print(s[2::-1])
output:
210

Why Simple String's Character has So many array Dimensions?

I am currently working with Python strings and lists.
When I assign a string to a variable with str="string" and try to access its first character with str[0], it works perfectly and gives "s".
But when I try str[0][0][0][0][0][0], it again gives "s". And when I try str[0][1], it gives an error:
IndexError: string index out of range
That is correct. My question is: why does a simple string character have so many array dimensions? It did not give any error and printed the character at index 0 of the string for str[0][0][0][0][0][0]. What is the data structure of a string?
My Code is
str="string"
print((str[0][0][0][0][0][0][0][0])) # Working, but my Question is Why Working
print((str[1][0][0][0][0])) # Working
print((str[2][0][0][0][0])) # Working
print((str[3][0][0][0][0])) # Working
list=["0","p",0]
print(list[0][0][0]) # Working
My Output is:
s
t
r
i
0
Why shouldn't it work?
Indexing a string returns a one element string which is again indexable and returns the same value:
>>> 's'[0]
's'
since it consists of one element, you can continue indexing the zero-element [0] as much as you want.
This is explained in the standard type hierarchy section of the Python Reference manual:
Strings
A string is a sequence of values that represent Unicode code points. All the code points in the range U+0000 - U+10FFFF can be represented in a string. Python doesn’t have a char type; instead, every code point in the string is represented as a string object with length 1.
(Emphasis mine)
Side-note: Don't use names such as str, you mask the built-in str.
In Python a string is a sequence of characters, but characters are 1-char strings.
So if you access 'foobar'[0], you obtain 'f'. Since 'f' is itself a string, we can access the first character of that string, and 'f'[0] is 'f' again. As a result, if you access a string s with s[i][0][0][0], you keep accessing the first character of the one-character string s[i].
If you write s[i][1] however, this will error, since s[i] is a one-character string, and thus you can not obtain the second character, since there is no such character.
The string itself is not multidimensional, you simply obtain a new string and call the index of that new string. You can add as many [0]s as you like.
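This can be checked directly. A small sketch (the variable names are only illustrative):

```python
s = "string"
first = s[0]

print(type(first).__name__, len(first))  # str 1
print(first[0] == first)                 # True
print(s[0][0][0][0])                     # s

# A one-character string has no index 1, hence the error.
try:
    s[0][1]
except IndexError as exc:
    print(exc)  # string index out of range
```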
The problem is not in Python, it is due to the fact that you assume there is a char type in Python (based on the title of this question).
A string in Python is a sequence of what are essentially single-element strings. s[0] simply returns the string 's', not a character. s[0]...[0] can be thought of as a recursion that keeps returning the same single-element string, however many times you repeat it.
You can go as deep as you want (in this case, to do it more than 997 times you will need to raise Python's default recursion limit):
def string_dive(s, count=0):
    if count < 997:
        count += 1
        return string_dive(s[0], count)
    else:
        return s

print(string_dive('string'))
# 's'

Variable Length Needle in Haystack (Python)

I have a function designed to find errors in an application's search capabilities, which generates a variable-length search string from the non-control UTF-8 possibilities. Running pytest iterations on this function, the random UTF-8 strings, submitted for search, generate debug errors roughly once per 500 searches.
As I can grab each of the strings that caused an error, I want to determine what is the minimal sub-series of the characters in those strings which truly provoke the error. In other words, (inside of a pytest loop):
def fumble_towards_ecstasy(string_that_breaks):
    # iterate over both length and content of the string
    nugget = ...  # placeholder: minimum series of characters that break the search
    return nugget
Should I slice the string in half and whittle down each side and re-submit until it fails, choose random characters from its (len() - 1) and then back up if an error doesn't happen? Brute force combinatorial? What's the best way to step through this?
Thanks.
Splitting the string in half will fail if there is a two character sequence that causes the failure, and that sequence lies exactly in the middle. Each half succeeds, but the combined string fails.
Here's one algorithm that will find a local minimum:
Try removing each character in turn.
If removing the character still causes failure, keep the new shorter string and repeat the algorithm on this new string.
If removing the character no longer causes failure, put it back and try removing the next character. Keep going until there are no more characters left to try. When you reach the end of the string you know that removing any one character causes the search to succeed.
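The steps above can be sketched as follows. The fails argument is a hypothetical predicate standing in for "submitting this string to the search raises the error"; it is not part of the original question, and the stand-in used in the example is invented for illustration.

```python
def minimize_by_removal(text, fails):
    """Drop single characters while the shortened string still fails.

    fails is a hypothetical callable: fails(candidate) -> bool.
    The result is a local minimum: removing any one character from it
    makes the failure go away.
    """
    i = 0
    while i < len(text):
        candidate = text[:i] + text[i + 1:]
        if fails(candidate):
            text = candidate  # keep the shorter string
            i = 0             # and repeat the algorithm on it
        else:
            i += 1            # put the character back, try the next one
    return text


# Stand-in for the real search: pretend any string containing "ab" breaks it.
print(minimize_by_removal("xxabyy", lambda s: "ab" in s))  # ab
```

Each successful removal restarts the scan, so the number of calls to fails can grow roughly quadratically with the string length.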
I'd use a "whittle from both sides" approach. Splitting the string will always run the risk of breaking up the substring that was causing the error. My approach would be:
Pop as many characters off the left of the string as you can while still ensuring that the string causes an error.
Do the same to the right side.
You're left with - in theory - the minimal substring that causes the error.
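A minimal sketch of this trimming, again assuming a hypothetical fails(candidate) predicate that reports whether the search still errors on the candidate string:

```python
def whittle(text, fails):
    """Trim from the left, then from the right, while the string still fails.

    fails is a hypothetical callable: fails(candidate) -> bool.
    """
    while text and fails(text[1:]):
        text = text[1:]    # the first character was not needed
    while text and fails(text[:-1]):
        text = text[:-1]   # the last character was not needed
    return text


# Stand-in for the real search: pretend any string containing "ab" breaks it.
print(whittle("xxabyy", lambda s: "ab" in s))  # ab
```

This only finds a contiguous substring. If the string contains several failing sequences, trimming from the left first ends up at the rightmost one.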
Hope that helps!
First of all it's worth noting that the solution is possibly not unique, i.e. it may be the case that there are two or more broken substrings.
An alternate suggestion (to the good answers by both Xavier and Mark) is to run a recursive approach. Repeat the sampling with the limited subset of strings that caused the error. Once another error is found, repeat until a minimal substring is reached. This approach is robust enough to handle a more complex use case, where the error can exist in two non-adjacent entries. I don't think that is the case here, but it's nice to have a general-purpose method.
