A "string index out of range" Python error - python

I've searched for other "string index out of range" cases, but they were not useful for me, so I wanted to ask for help here.
The program has to do this: "Write a function kth_word(s, k) that given a string s and an integer k≥ 1 returns the kth word in string s. If s has less than k words it returns the empty string. We assume all characters of s are letters and spaces. Warning: do not use the split string method."
Here is my code:
def kth_word(s, k):
    new =""
    word_count = 0
    for i in range(0, len(s)):
        if s[i] == " " and s[i+1] != " ":
            word_count+=1
        #try to find how many characters to print until the space
        if word_count == k-1:
            while i!= " " and i<=len(s): #if it is changed to i<len(s), the output is strange and wrong
                new+=s[i]
                i=i+1
                print(new) #check how new is doing, normally works good
    return new
print(kth_word('Alea iacta est', 2))
(I tried my best to implement the code in the right way, but I do not know how.)
And depending on where you place return new, it gives either an error or just an empty answer.

You iterate from 0 to len(s)-1 in your first for loop, but you're addressing i+1 which, on the last iteration, is len(s).
s[len(s)] is an IndexError -- it is out of bounds.
Additionally your while loop is off-by-one.
while i!= " " and i<=len(s):
    # do something referencing s[i]
Your first condition makes no sense (i is a number, how could it be " "?) and your second introduces the same off-by-one error as above, where i is maximally len(s) and s[len(s)] is an error.
Your logic is a bit off here, too, since you're wrapping this inside the for loop which is already referencing i. This appears to be a takewhile loop, but isn't really doing that.

Warning: do not use the split string method.
So groupby / islice from itertools should work:
from itertools import groupby, islice
def kth_word(s, k):
    g = (j for i, j in groupby(s, key=lambda x: x==' ') if not i)
    return ''.join(next(islice(g, k-1, k), ''))
words = 'Alea iacta est'
res = kth_word(words, 2) # 'iacta'
We handle StopIteration errors by setting the optional parameter in next to ''.

You're not allowed to use str.split. If you could, the answer would just be:
def kth_word(s, k):
    return s.split()[k-1]  # k counts from 1; raises IndexError if s has fewer than k words
But if you could write a function that does the same thing str.split does, you could call that instead. And that would certainly show that you understand everything the assignment was testing for—how to loop over strings, and do character-by-character operations, and so on.
You can write a version with only the features of Python usually taught in the first week:
def split(s):
    words = []
    current = ''
    for ch in s:
        if ch.isspace():
            if current:
                words.append(current)
                current = ''
        else:
            current += ch
    if current:
        words.append(current)
    return words
If you know additional Python features, you can improve it in a few ways:
Build current as a list instead of a str and ''.join it.
Change those append calls to yield so it splits the string lazily (even better than str.split).
Use str.find or str.index or re.search to find the next space instead of searching character by character.
Abstract out the space-finding part into a general-purpose generator—or, once you realize what you want, find that function in itertools.
Add all of the features we're missing from str.split, like the ability to pass a custom delimiter instead of breaking on any whitespace.
But I think even the basic version—assuming you understand it and can explain how it works—ought to be enough to get an A on the assignment.
And, more importantly, you're practicing the best way to solve problems: reduce them to simpler problems. split is actually easier to write than kth_word, but once you write split, kth_word becomes trivial.
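To make that last point concrete, here is a minimal sketch of kth_word built on top of the split helper from earlier in this answer (repeated so the snippet runs on its own). The k - 1 indexing and the empty-string fallback are my reading of the assignment text, not something the assignment spells out as code:

```python
def split(s):
    """Split s on whitespace without using str.split."""
    words = []
    current = ''
    for ch in s:
        if ch.isspace():
            if current:
                words.append(current)
                current = ''
        else:
            current += ch
    if current:
        words.append(current)
    return words


def kth_word(s, k):
    """Return the k-th word (counting from 1), or '' if s has fewer than k words."""
    words = split(s)
    if k <= len(words):
        return words[k - 1]
    return ''


print(kth_word('Alea iacta est', 2))  # iacta
```

All the character-by-character work lives in split; kth_word itself is just a bounds check and an index.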

You actually have at least five problems here, and you need to fix all of them.
First, as pointed out by Adam Smith, this is wrong:
for i in range(0, len(s)):
    if s[i] == " " and s[i+1] != " ":
This loops with i over all the values up to but not including len(s), which is good, but then, if s[i] is a space, it tries to access s[i+1]. So, if your string ended with a space, you would get an IndexError here.
Second, as ggorlen pointed out in a comment, this is wrong:
while i!= " " and i<=len(s):
    new+=s[i]
When i == len(s), you're going to try to access s[i], which will be an IndexError. In fact, this is the IndexError you're seeing in your example.
You seem to realize that's a problem, but refuse to fix it, based on this comment:
#if it is changed to i<len(s), the output is strange and wrong
Yes, the output is strange and wrong, but that's because fixing this bug means that, instead of an IndexError, you hit the other bugs in your code. It's not causing those bugs.
Next, you need to return new right after doing the inner loop, rather than after the outer loop. Otherwise, you add all of the remaining words rather than just the first one, and you add them over and over, once per character, instead of just adding them once.
You may have been expecting that doing that i=i+1 would affect the loop variable and skip over the rest of the word, but (a) it won't; the next time through the for it just reassigns i to the next value, and (b) that wouldn't help anyway, because you're only advancing i to the next space, not to the end of the string.
Also, you're counting words at the space, but then you're iterating from that space until the next one. Which means (except for the first word) you're going to include that space as part of the word. So, you need to do an i += 1 before the while loop.
Although it would probably be a lot more readable to not try to reuse the same variable i, and also to use for instead of while.
Also, your inner loop should be checking s[i] != " ", not i!=" ". Obviously the index, being a number, will never equal a space character.
Without the previous fix, this would mean you output " iacta est", with an extra space before it; but with the previous fix, it means you output nothing instead of iacta.
Once you fix all of these problems, your code works:
def kth_word(s, k):
    word_count = 0
    for i in range(0, len(s) - 1):
        if s[i] == " " and s[i+1] != " ":
            word_count+=1
        #try to find how many characters to print until the space
        if word_count == k-1:
            new =""
            j = i+1
            while j < len(s) and s[j] != " ":
                new+=s[j]
                j = j+1
                print(new) #check how new is doing, normally works good
            return new
Well, you still have a problem with the first word, but I'll leave it to you to find and fix that one.
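If you get stuck on that last part, one possible shape for the finished function is sketched below. It is not the only fix. It assumes that a word starts at any non-space character that is either at index 0 or preceded by a space, which also copes with leading and repeated spaces, and it returns '' when there are fewer than k words, as the assignment asks:

```python
def kth_word(s, k):
    """Return the k-th word of s (counting from 1) without using str.split."""
    word_count = 0
    for i in range(len(s)):
        # Assumption: a word begins at a non-space that is first in the
        # string or directly follows a space.
        if s[i] != " " and (i == 0 or s[i - 1] == " "):
            word_count += 1
            if word_count == k:
                # Collect characters up to the next space or the end.
                j = i
                while j < len(s) and s[j] != " ":
                    j += 1
                return s[i:j]
    return ""


print(kth_word('Alea iacta est', 1))  # Alea
print(kth_word('Alea iacta est', 2))  # iacta
```

Because the check looks backward at s[i - 1] instead of forward at s[i+1], there is no index that can run past the end of the string.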

Your use of the variable 'i' in both the for loop and the while loop was causing problems. Using a new variable, 'n', for the while loop and changing the condition to n < len(s) fixes the problem. Also, some other parts of your code required changing because they were either pointless or not compatible with more than 2 words. Here is the fully changed code. It is explained further down:
def kth_word(s, k):
    new = ""
    word_count = 0
    n = 0
    for i in range(0, len(s) - 1):
        if s[i] == " " and s[i + 1] != " ":
            word_count += 1
        #try to find how many characters to print until the space
        if word_count < k:
            while n < len(s): #if it is changed to i<len(s), the output is strange and wrong
                new+=s[n]
                n += 1
                print(new) #check how new is doing, normally works good
    return new
print(kth_word('Alea iacta est', 2))
Explanation:
As said in Adam Smith's answer, 'i' is a number and will never be equal to ' '. That part of the code was removed because it is always true.
I have changed i = i + 1 to i += 1. It won't make much difference here, but this will help you later when you use longer variable names. It can also be used to append text to strings.
I have also declared 'n' for later use and changed for i in range(0, len(s)): to for i in range(0, len(s) - 1): so that the for loop can't go out of range either.
if word_count == k-1: was changed to if word_count < k: for compatibility for more words, because the former code only went to the while loop when it was up to the second-last word.
And finally, spaces were added for better readability (This will also help you later).

Related

Function that is made to remove round brackets does not work, Python 3

I have a function that takes in an argument, preferably a string, and turns each character of the string into an element of a list. After that, it iterates through the list and is supposed to delete/remove elements that are round brackets, so basically, these: ( ). Here is the code:
def func(s):
    n = 0
    s = [i for i in s]
    for i in s:
        if s[n] == "(" or s[n] == ")":
            del s[n]
        else:
            n += 1
            continue
    return s
print(func("ubib0_)IUBi(biub()()()9uibib()((U*H)9g)*(GB(uG(*UV(V79V*&^&87vyutgivugyrxerdtufcviO)()(()()()(0()90Y*(g780(&*^(UV(08U970u9yUV())))))))))"))
However, the function stops the iteration and ends/returns the list early (when some round brackets are still there).
I also went with another way, a way that works:
def func(s):
    n = 0
    s = [i for i in s]
    s2 = [i for i in s if i != "(" and i != ")"]
    return s2
print(func("ubib0_)IUBi(biub()()()9uibib()((U*H)9g)*(GB(uG(*UV(V79V*&^&87vyutgivugyrxerdtufcviO)()(()()()(0()90Y*(g780(&*^(UV(08U970u9yUV())))))))))"))
Why does this work while the other way doesn't? They look like they'd output the same result.
What am I doing wrong in the first example?
Your concept is correct, in that you either delete the current item or increment n.
Where you've gone wrong is that you're iterating over each letter which doesn't make sense given the above info. Changing for i in s to while n < len(s) will fix the problem.
A couple of things you may find useful:
list(s) looks cleaner than [i for i in s]
i not in "()" is another way to write i != "(" and i != ")"
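Putting those suggestions together, a minimal sketch of the fixed function might look like this. The name remove_round_brackets is just for illustration; the logic is the asker's own delete-or-advance idea with the for loop swapped for a while loop:

```python
def remove_round_brackets(s):
    """Return the characters of s as a list, with '(' and ')' removed."""
    chars = list(s)
    n = 0
    # Re-check len(chars) on every pass, because deletions shrink the list.
    while n < len(chars):
        if chars[n] in "()":
            del chars[n]   # the next character slides into position n
        else:
            n += 1         # only advance when nothing was deleted
    return chars


print(''.join(remove_round_brackets("a(b)c())")))  # abc
```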
At the beginning, while you're increasing n, n keeps pace with the position the for loop has reached. But when you meet a bracket, n keeps the same value on the next iteration while the for loop still moves on. This happens every time s[n] == "(" or s[n] == ")", so the gap between n and the loop's position keeps growing.
To work correctly, your program needs to check every symbol in the list for equality with either '(' or ')' using s[n], but that doesn't happen: the iteration stops when the for loop reaches the end of the list, and at that point n is still well short of the end and hasn't checked all the symbols.

For loop for finding index in string

I'm wondering why my function does not call the index of the character in the string. I used a for loop for this, and for some reason it's just listing all of the possible indices of the string. I made a specific if statement, but I don't get why it's not following the instructions.
def where_is(char,string):
    c=0
    for char in string:
        if char==(string[c]):
            print (c)
            c+=1
        else:
            print ("")
where_is('p','apple')
Your loop is overwriting your parameter char. As soon as you enter your loop, it is overwritten with a character from the string, which you then compare to itself. Rename either your parameter or your loop variable. Also, your counter increment c+=1 should also be outside of your if. You want to increase the index whether or not you find a match, otherwise your results are going to be off.
And just as a matter of style, you don't really need that else block, the print call will just give you extra newlines you probably don't want.
Firstly, the index you used is not being increased in the else part and secondly, I generally prefer a while loop to a for loop when it comes to iterating through strings. Making slight modifications to your code, have a look at this :
def where_is(char,string):
    i=0
    while i<len(string):
        if char==(string[i]):
            print (i)
        else:
            print ("")
        i+=1
where_is('p','apple')
Input : where_is('p','apple')
Output: 1 2
The problem is that the given code iterated over everything stored in the string, and since the comparison matched every time, the value of 'c' was increased and printed on each pass.
I think your code should be:
def where_is(char, string):
    for i in range(len(string)):
        if char == (string[i]):
            print(i)
where_is('p', 'apple')
This prints the index of all the 'p's in 'apple'.
As mentioned in the comments, you don't increment correctly in your for loop. You want to loop through the word, incrementing each time and outputting the index when the letter is found:
def where_is(char, word):
    current_index = 0
    while current_index < len(word):
        if char == word[current_index]:
            print (current_index)
        current_index += 1
where_is('p', 'apple')
Returning:
1
2
Alternatively, by using enumerate and a list comprehension, you could reduce the whole thing down to:
def where_is(char, word):
    print([index for index, letter in enumerate(word) if letter == char])
where_is('p', 'apple')
which will print:
[1, 2]
You then also have the option of return-ing the list you create, for further processing.
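As a small sketch of that last option, the same comprehension can be returned instead of printed, so the caller decides what to do with the indices:

```python
def where_is(char, word):
    """Return a list of every index at which char occurs in word."""
    return [index for index, letter in enumerate(word) if letter == char]


positions = where_is('p', 'apple')
print(positions)       # [1, 2]
print(len(positions))  # 2
```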

Using for loop to insert character between elements of string

def main():
    string = raw_input("string:")
    pattern = raw_input("pattern:")
    end = len(string)
    insertPattern(string,pattern)
def insertPattern(string,pattern):
    end= len(string)-1
    print "Iterative:",
    for x in range(end):
        if x == end:
            print string[x]
        if x < end:
            print string[x]+pattern,
main()
I'd like this to output
Instead it's outputting
How would I modify the code to fix this? Assignment requires that I do this without lists or join.
You've got three problems here.
First, the reason you're getting that Iterative: at the beginning is because you explicitly asked for it with this line:
print "Iterative:",
Just take it out.
The reason you're getting spaces after each * is a bit trickier. The print statement's "magic comma" always prints a space. There's no way around that. So, what you have to do is not use the print statement's magic comma.
There are a few options:
Use the more-powerful print function from Python 3.x, which you can borrow in 2.7 with a __future__ statement. You can pass any separator you want to replace the space, even the empty string.
Use sys.stdout.write instead of print; that way you get neither newlines nor spaces unless you write them explicitly.
Build up the string as you go along, and then print the whole thing at the end.
The last one is the most general solution (and also leads to lots of other useful possibilities, like returning or storing the built-up string), so I'll show that:
def insertPattern(string,pattern):
    result = ''
    end= len(string)-1
    for x in range(end):
        if x == end:
            result += string[x]
        if x < end:
            result += string[x]+pattern
    print result
Finally, the extra * at the end is because x == end can never be true. range(end) gives you all the numbers up to, but not including end.
What you probably wanted was end = len(string), and then if x == end-1.
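For reference, here is roughly what that corrected index-based loop could look like. This sketch is written for Python 3, uses the illustrative name insert_pattern, and returns the string instead of printing it; none of that comes from the original code:

```python
def insert_pattern(string, pattern):
    """Return string with pattern inserted between each pair of characters."""
    result = ''
    end = len(string)
    for x in range(end):
        if x == end - 1:
            # Last character: no pattern after it.
            result += string[x]
        else:
            result += string[x] + pattern
    return result


print(insert_pattern('hello', '*'))  # h*e*l*l*o
```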
But you can simplify this quite a bit. The only reason you need x is to get string[x], and to distinguish either the first or last value from the others (so you know not to add an extra * either before the first or after the last). You can solve the last one with a flag, or by just treating the first one special. And then, you can just iterate over string itself, instead of over its indices:
def insertPattern(string,pattern):
    result = string[0]
    for ch in string[1:]:
        result += pattern + ch
    print result
And once you've done that, you may realize that this is almost identical to what the str.join method does, so you can just use that:
def insertPattern(string,pattern):
    print pattern.join(string)

Python: How to check if two inputs A and B are anagrams without all punctuation, and all uppercase letters were lower case letters

The first part of the question is to check if input A and input B are anagrams, which I can do easily enough.
s = input ("Word 1?")
b = sorted(s)
c = ''.join(b)
t = input("Word 2?")
a = sorted(t)
d = ''.join(a)
if d == c:
    print("Anagram!")
else:
    print("Not Anagram!")
The problem is the second part of the question - I need to check if two words are anagrams if all of the punctuation is removed, the upper case letters turned to lower case, but the question assumes no spaces are used. So, for example, (ACdB;,.Eo,."kl) and (oadcbE,LK) are anagrams. The question also asks for loops to be used.
s = input ("Word 1?")
s = s.lower()
for i in range (0, len(s)):
    if ord(s[i]) < 97 or ord(s[i]) >122:
        s = s.replace(s[i], '')
b = sorted(s)
c = ''.join(b)
print(c)
Currently, the above code is saying the string index is out of range.
Here's the loop you need to add, in pseudocode:
s = input ("Word 1?")
s_letters = ''
for letter in s:
    if it's punctuation: skip it
    else if it's uppercase: add the lowercase version to s_letters
    else: add it to s_letters
b = sorted(s_letters)
Except of course that you need to add the same thing for t as well. If you've learned about functions, you will want to write this as a function, and call it twice, instead of copying and pasting it with minor changes.
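As a rough sketch of that function-based layout (the names letters_only and is_anagram are made up for illustration, and str.isalpha stands in for the "is it punctuation" test in the pseudocode):

```python
def letters_only(text):
    """Return the letters of text, lowercased, with everything else dropped."""
    kept = ''
    for ch in text:
        if ch.isalpha():          # skip punctuation, digits, and so on
            kept += ch.lower()    # lowercase letters are left unchanged
    return kept


def is_anagram(first, second):
    """Compare the two inputs after stripping non-letters and lowercasing."""
    return sorted(letters_only(first)) == sorted(letters_only(second))


print(is_anagram('ACdB;,.Eo,."kl', 'oadcbE,LK'))  # True
```

Calling the helper twice keeps the cleanup logic in one place, so a fix to it applies to both words.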
There are three big problems with your loop. You need to solve all three of these, not just one.
First, s = s.replace(s[i], '') doesn't remove just the ith character; it removes the ith character and every other copy of the same character. That's going to screw up the rest of your loop if there are any duplicates. It's also very slow, because you have to search the entire string over and over again.
The right way to replace the character at a specific index is to use slicing: s = s[:i] + s[i+1:].
Or, you could make this a lot simpler by turning the string into a list of characters (s = list(s)); then you can mutate it in place (del s[i]).
Next, we're going through the loop 6 times, checking s[0], s[1], s[2], s[3], s[4], and s[5]. But somewhere along the way, we're going to remove some of the characters (ideally three of them). So some of those indices will be past the end of the string, which will raise an IndexError. I won't explain how to fix this yet, because it ties directly into the next problem.
Modifying a sequence while you loop over it always breaks your loop.* Imagine starting with s = '123abc'. Let's step through the loop.
i = 0, so you check s[0], which is 1, so you remove it, leaving s = '23abc'.
i = 1, so you check s[1], which is 3, so you remove it, leaving s = '2abc'.
i = 2, so you check s[2], which is b, so you leave it, leaving s = '2abc'.
And so on.
The 2 got moved to s[0] by removing the 1. But you're never going to come back to i = 0 once you've passed it. So, you're never going to check the 2. You can solve this in a few different ways—iterating backward, doing a while instead of an if each time through the for, etc.—but most of those solutions will just exacerbate the previous problem.
The easy way to solve both problems is to just not modify the string while you loop over it. You could do this by, e.g., building up a list of indexes to remove as you go along, then applying that in reverse order.
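A minimal sketch of that index-collecting approach, assuming the input has already been lowercased as in the question (the function name is hypothetical):

```python
def strip_non_letters(s):
    """Remove every character outside 'a'..'z' by deleting collected indexes."""
    chars = list(s)
    to_remove = []
    # First pass: only record the indexes, never modify chars here.
    for i, ch in enumerate(chars):
        if not ('a' <= ch <= 'z'):
            to_remove.append(i)
    # Second pass: delete from the back so earlier indexes stay valid.
    for i in reversed(to_remove):
        del chars[i]
    return ''.join(chars)


print(strip_non_letters('123abc'))  # abc
```

Deleting in reverse order matters: removing index 5 never shifts the characters at indexes 0 through 4.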
But a much easier way to do it is to just build up the characters you want to keep as you go along. And that also solves the first problem for you automatically.
So:
new_s = []
for i in range (0, len(s)):
    if ord(s[i]) < 97 or ord(s[i]) >122:
        pass
    else:
        new_s.append(s[i])
b = sorted(new_s)
And with that relatively minor change, your code works.
While we're at it, there are a few ways you're overcomplicating things.
First, you don't need to do ord(s[i]) < 97; you can just do s[i] < 'a'. This makes things a lot more readable.
But, even more simply, you can just use the isalpha or islower method. (Since you've already converted to lower, and you're only dealing with one character at a time, it doesn't really matter which.) Besides being more readable, and harder to get wrong, this has the advantage of working with non-ASCII characters, like é.
Finally, you almost never want to write a loop like this:
for i in range(len(s)):
That forces you to write s[i] all over the place, when you could have just looped over s in the first place:
for ch in s:
So, putting it all together, here's your code, with the two simple fixes, and the cleanup:
s = input ("Word 1?")
s = s.lower()
new_s = []
for ch in s:
    if ch.isalpha():
        new_s.append(ch)
b = sorted(new_s)
c = ''.join(b)
print(c)
If you know about comprehensions or higher-order functions, you'll recognize this pattern as exactly what a list comprehension does. So, you can turn the whole 4 lines of code that build new_s into either of these one-liners, which are more readable as well as being shorter:
new_s = [ch for ch in s if ch.isalpha()]
new_s = filter(str.isalpha, s)
And in fact, the whole thing can become a one-liner:
b = sorted(ch for ch in s.lower() if ch.isalpha())
But your teacher asked you to use a for statement, so you'd better keep it as a for statement.
* This isn't quite true. If you only modify the part of the sequence after the current index, and you make sure the sequence always has the right length by the time you get to each index even though it may have had a different length before you did (using a while loop instead of a for loop, to reevaluate len(seq) each time, makes this part trivial instead of hard), then it works. But it's easier to just never do it than to learn the rules and carefully analyze your code to see if you're getting away with it this time.

Python - packing/unpacking by letters

I'm just starting to learn python and I have this exercise that's puzzling me:
Create a function that can pack or unpack a string of letters.
So aaabb would be packed a3b2 and vice versa.
For the packing part of the function, I wrote the following
def packer(s):
    if s.isalpha(): # Defines if unpacked
        stack = []
        for i in s:
            if s.count(i) > 1:
                if (i + str(s.count(i))) not in stack:
                    stack.append(i + str(s.count(i)))
            else:
                stack.append(i)
        print "".join(stack)
    else:
        print "Something's not quite right.."
        return False
packer("aaaaaaaaaaaabbbccccd")
This seems to work properly. But the assignment says that
if the input has (for example) the letter a after b or c, then
it should later be unpacked into its original form.
So "aaabbkka" should become a3b2k2a, not a4b2k2.
I hence figured that I cannot use the count() method, since
that counts all occurrences of the item in the whole string, correct?
What would be my options here then?
On to the unpacking -
I've thought of the basics of what my code needs to do:
1. Between the "if s.isalpha():" and else, I should add an elif that checks whether or not the string has digits in it. (I figured this would be enough to determine whether it's the packed version or unpacked.)
2. Create a for loop and inside of it an if statement, which then checks for every element:
2.1. If it has a number behind it > Return (or add to an empty stack) the element repeated that many times
2.2. If it has no number following it > Return just the element.
Big question number 2 - how do I check whether it's a number or just another
alphabetical element following an element in the list? I guess this must be done with
slicing, but those only take integers. Could this be achieved with the index command?
Also - if this is of any relevance - so far I've basically covered lists, strings, if and for
and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic)
All help appreciated for the newbie enthusiast!
SOLVED:
def packer(s):
    if s.isalpha(): # Defines if unpacked
        groups= []
        last_char = None
        for c in s:
            if c == last_char:
                groups[-1].append(c)
            else:
                groups.append([c])
            last_char = c
        return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)
    else: # Seems to be packed
        stack = ""
        for i in range(len(s)):
            if s[i].isalpha():
                if i+1 < len(s) and s[i+1].isdigit():
                    digit = s[i+1]
                    char = s[i]
                    i += 2
                    while i < len(s) and s[i].isdigit():
                        digit +=s[i]
                        i+=1
                    stack += char * int(digit)
                else:
                    stack+= s[i]
            else:
                ""
        return "".join(stack)
print (packer("aaaaaaaaaaaabbbccccd"))
print (packer("a4b19am4nmba22"))
So this is my final code. Almost managed to pull it all off with just for loops and if statements.
In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!
A straightforward solution:
If a char is different, make a new group. Otherwise append it to the last group. Finally count all groups and join them.
def packer(s):
    groups = []
    last_char = None
    for c in s:
        if c == last_char:
            groups[-1].append(c)
        else:
            groups.append([c])
        last_char = c
    return ''.join('%s%s'%(g[0], len(g)) for g in groups)
Another approach is using re.
Regex r'(.)\1+' can match consecutive characters longer than 1. And with re.sub you can easily encode it:
import re
regex = re.compile(r'(.)\1+')
def replacer(match):
    return match.group(1) + str(len(match.group(0)))
regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'
I think you can use the itertools.groupby function
for example
import itertools
data = 'aaassaaasssddee'
groupped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in groupped_data)
Of course, one can make this code more readable by using a generator function instead of a generator expression.
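For instance, a sketch of the generator-function variant might look like this. The names runs and pack are illustrative and not part of itertools:

```python
from itertools import groupby


def runs(data):
    """Yield (character, run_length) for each run of equal characters."""
    for char, group in groupby(data):
        # Count the items in the group without building a list.
        yield char, sum(1 for _ in group)


def pack(data):
    """Pack runs as <char><count>, leaving out the count for single characters."""
    return ''.join(char + (str(n) if n > 1 else '') for char, n in runs(data))


print(pack('aaassaaasssddee'))  # a3s2a3s3d2e2
print(pack('aaabbkka'))         # a3b2k2a
```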
This is an implementation of the algorithm I outlined in the comments:
from itertools import takewhile, count, islice
def consume(items):
    from collections import deque
    deque(items, maxlen=0)
def ilen(items):
    # count how many items an iterator yields without storing them
    result = count()
    consume(zip(items, result))
    return next(result)
def pack_or_unpack(data):
    start = 0
    result = []
    while start < len(data):
        if data[start].isdigit():
            # `data` is packed, bail
            return unpack(data)
        run = run_len(data, start)
        # append the character that might repeat
        result.append(data[start])
        if run > 1:
            # append the length of the run of characters
            result.append(str(run))
        start += run
    return ''.join(result)
def run_len(data, start):
    """Return the length of the run of identical characters starting at
    `start`"""
    return ilen(takewhile(lambda c: c == data[start],
                          islice(data, start, None)))
def unpack(data):
    result = []
    for i in range(len(data)):
        if data[i].isdigit():
            # skip digits, we'll look for them below
            continue
        # packed character
        c = data[i]
        # number of repetitions
        n = 1
        if (i+1) < len(data) and data[i+1].isdigit():
            # if the next character is a digit, grab all the digits in the
            # substring starting at i+1
            n = int(''.join(takewhile(str.isdigit, data[i+1:])))
        # append the repeated character
        result.append(c*n) # multiplying a string with a number repeats it
    return ''.join(result)
print(pack_or_unpack('aaabbc'))
print(pack_or_unpack('a3b2c'))
print(pack_or_unpack('a10'))
print(pack_or_unpack('b5c5'))
print(pack_or_unpack('abc'))
A regex-flavoured version of unpack() would be:
import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
    matches = UNPACK_RE.finditer(data)
    pairs = ((m.group('char'), m.group('count')) for m in matches)
    return ''.join(char * (int(count) if count else 1)
                   for char, count in pairs)
This code demonstrates the most straightforward (or "basic") approach of implementing that algorithm. It's not particularly elegant or idiomatic or necessarily efficient. (It would be if written in C, but Python has the caveats such as: indexing a string copies the character into a new string, and algorithms that seem to copy data excessively might be faster than trying to avoid this if the copying is done in C and the workaround was implemented with a Python loop.)
