Pythonic way to have a "size safe" slicing - python

Here is a quote from Greg Hewgill's (https://stackoverflow.com/users/893/greg-hewgill) answer to "Explain Python's slice notation":
Python is kind to the programmer if there are fewer items than you ask
for. For example, if you ask for a[:-2] and a only contains one
element, you get an empty list instead of an error. Sometimes you
would prefer the error, so you have to be aware that this may happen.
So when the error is preferred, what is the Pythonic way to proceed? Is there a more Pythonic way to rewrite this example?
class ParseError(Exception):
    pass

def safe_slice(data, start, end):
    """0 <= start <= end is assumed"""
    r = data[start:end]
    if len(r) != end - start:
        raise IndexError
    return r

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised.
    """
    try:
        name_length = ord(data[0])
        extracted_name = safe_slice(data, 1, 1 + name_length)
        phone_length = ord(data[1 + name_length])
        extracted_phone = safe_slice(data, 2 + name_length, 2 + name_length + phone_length)
    except IndexError:
        raise ParseError()
    return extracted_name, extracted_phone

if __name__ == '__main__':
    print(lazy_parse("\x04Jack\x0A0123456789"))  # OK
    print(lazy_parse("\x04Jack\x0A012345678"))  # should raise ParseError
Edit: the example was simpler to write using byte strings, but my real code uses lists.
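Since the real data is a list, one alternative (a sketch, not from the original post) is to validate the bounds before slicing instead of comparing the result length afterwards. The helper name `strict_slice` is hypothetical; `None` and negative bounds are resolved the way ordinary slicing resolves them, and only step 1 is supported.

```python
def strict_slice(seq, start=None, stop=None):
    """Return seq[start:stop], raising IndexError instead of silently truncating.

    Hypothetical helper: None and negative bounds are interpreted as in
    ordinary slicing, but any bound outside the sequence is an error.
    """
    n = len(seq)

    def resolve(index, default):
        if index is None:
            return default
        resolved = index + n if index < 0 else index
        if not 0 <= resolved <= n:
            raise IndexError("slice bound %r out of range for length %d" % (index, n))
        return resolved

    lo = resolve(start, 0)
    hi = resolve(stop, n)
    if lo > hi:
        raise IndexError("slice start %r is after stop %r" % (start, stop))
    return seq[lo:hi]


data = [4, 'J', 'a', 'c', 'k']
print(strict_slice(data, 1, 5))   # ['J', 'a', 'c', 'k']
```

With this helper, `strict_slice([1], None, -2)` raises IndexError where `[1][:-2]` quietly returns `[]`.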

Here's one way that is arguably more Pythonic. If you want to parse a byte string, you can use the struct module, which is provided for that exact purpose:
import struct
from collections import namedtuple

Details = namedtuple('Details', 'name phone')

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised.
    """
    try:
        name = struct.unpack_from("%dp" % len(data), data)[0]
        phone = struct.unpack_from("%dp" % (len(data)-len(name)-1), data, len(name)+1)[0]
    except struct.error:
        raise ParseError()
    return Details(name, phone)
What I still find unpythonic about that is that it throws away the useful struct.error traceback and replaces it with a ParseError, whatever that is: the original tells you what is wrong with the string, while the replacement only tells you that something is wrong.
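In Python 3, that traceback concern can be addressed with exception chaining: `raise ParseError(...) from exc` keeps the original error available as `__cause__` and prints it in the traceback. A minimal sketch, assuming the data is a `bytes` object with one length byte before each field; `parse_record` and `take` are hypothetical names, and the field lengths are checked explicitly because slicing alone would silently truncate.

```python
class ParseError(Exception):
    pass


def take(data, start, length):
    """Return data[start:start + length], or raise IndexError if it is too short."""
    chunk = data[start:start + length]
    if len(chunk) != length:
        raise IndexError("wanted %d bytes at offset %d, got %d"
                         % (length, start, len(chunk)))
    return chunk


def parse_record(data):
    """Extract (name, phone) from length-prefixed bytes."""
    try:
        name_length = data[0]                # indexing bytes gives an int in Python 3
        name = take(data, 1, name_length)
        phone_length = data[1 + name_length]
        phone = take(data, 2 + name_length, phone_length)
    except IndexError as exc:
        # "from exc" chains the original error, so its message and traceback survive
        raise ParseError("truncated record") from exc
    return name, phone


print(parse_record(b"\x04Jack\x0A0123456789"))   # (b'Jack', b'0123456789')
```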

Using a function like safe_slice would be faster than creating an object just to perform the slice, but if speed is not a bottleneck and you are looking for a nicer interface, you could define a class with a __getitem__ to perform checks before returning the slice.
This allows you to use nice slice notation instead of having to pass both the start and stop arguments to safe_slice.
class SafeSlice(object):
    # slice rules: http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
    def __init__(self, seq):
        self.seq = seq
    def __getitem__(self, key):
        seq = self.seq
        if isinstance(key, slice):
            start, stop, step = key.start, key.stop, key.step
            if start:
                seq[start]  # probe the start index; IndexError if it does not exist
            if stop:
                if stop < 0: stop = len(seq) + stop
                seq[stop - 1]  # probe the item just before stop
        return seq[key]

seq = [1]
print(seq[:-2])
# []
print(SafeSlice(seq)[:-1])
# []
print(SafeSlice(seq)[:-2])
# IndexError: list index out of range
If speed is an issue, then I suggest just testing the end points instead of doing arithmetic; item access for Python lists is O(1). The version of safe_slice below also allows you to pass 2, 3 or 4 arguments. With just 2 arguments, the second is interpreted as the stop value (similar to range).
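def safe_slice(seq, start, stop=None, step=1):
    if stop is None:
        stop = start
        start = 0
    else:
        seq[start]  # probe the start index; IndexError if it does not exist
    if stop < 0: stop = len(seq) + stop
    seq[stop - 1]  # probe the item just before stop
    return seq[start:stop:step]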

Here is a more pythonic, more general rewrite of your code:
class ParseError(Exception):
    pass

def safe_slice(data, start, end, exc=IndexError):
    """0 <= start <= end is assumed"""
    r = data[start:end]
    if len(r) != end - start:
        raise exc()
    return r

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised."""
    results = []
    ptr = 0
    while ptr < len(data):
        length = ord(data[ptr])
        ptr += 1
        results.append(safe_slice(data, ptr, ptr + length, exc=ParseError))
        ptr += length
    return tuple(results)

if __name__ == '__main__':
    print(lazy_parse("\x04Jack\x0A0123456789"))  # OK
    print(lazy_parse("\x04Jack\x0A012345678"))  # should raise ParseError
Most of the changes are in the body of lazy_parse: it now works with any number of values instead of just two, and the correctness of the whole thing still depends on the last element being parsed out exactly.
Also, rather than have safe_slice raise an IndexError which lazy_parse then changes into a ParseError, I have lazy_parse pass the desired exception to safe_slice to be raised in case of error (safe_slice defaults to IndexError if nothing is passed to it).
Finally, lazy_parse isn't lazy: it processes the entire string at once and returns all the results. 'Lazy' in Python means doing only what is needed to return the next piece. For lazy_parse, that would mean returning the name, then on a later call returning the phone. With only a slight modification we can make lazy_parse lazy:
def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised."""
    ptr = 0
    while ptr < len(data):
        length = ord(data[ptr])
        ptr += 1
        result = safe_slice(data, ptr, ptr + length, ParseError)
        ptr += length
        yield result

if __name__ == '__main__':
    print(list(lazy_parse("\x04Jack\x0A0123456789")))  # OK
    print(list(lazy_parse("\x04Jack\x0A012345678")))  # should raise ParseError
lazy_parse is now a generator that returns one piece at a time. Notice that we had to put list() around the lazy_parse call in the main section to make lazy_parse give us all the results so that we could print them.
Depending on what you're doing this might not be the desired way, however, as it can be more difficult to recover from errors:
for item in lazy_parse(some_data):
    result = do_stuff_with(item)
    make_changes_with(result)
    ...
By the time the ParseError is raised you may have made changes that are difficult or impossible to back out. The solution in a case like this would be to do the same as we did in the print part of main:
for item in list(lazy_parse(some_data)):
    ...
The list call completely consumes lazy_parse and gives us a list of the results, and if an error was raised we'll know about it before we process the first item in the loop.
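A small self-contained demonstration of that difference, using a hypothetical generator `fields` that yields two items and then fails: iterating directly processes items before the error appears, while wrapping the generator in `list()` surfaces the error before any item is processed.

```python
def fields():
    """Hypothetical generator: yields two items, then hits a parse error."""
    yield "Jack"
    yield "0123456789"
    raise ValueError("truncated record")


def process_lazily():
    processed = []
    try:
        for item in fields():
            processed.append(item)      # side effects happen before the error
    except ValueError:
        pass
    return processed


def process_eagerly():
    processed = []
    try:
        for item in list(fields()):     # list() runs the whole generator first
            processed.append(item)
    except ValueError:
        pass
    return processed


print(process_lazily())    # ['Jack', '0123456789']
print(process_eagerly())   # []
```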

Here is a complete SafeSlice class reusing the answers of Duncan (https://stackoverflow.com/users/107660/duncan) and unutbu (https://stackoverflow.com/users/190597/unutbu).
The class is quite big because it has full slice support (start, stop and step). This may be overkill for the simple job done in the example, but for a more complete real-life problem it might prove useful.
from __future__ import division
try:
    from collections.abc import MutableSequence  # Python 3.3+
except ImportError:
    from collections import MutableSequence  # Python 2
from collections import namedtuple
from math import ceil

class ParseError(Exception):
    pass

Details = namedtuple('Details', 'name phone')

def parse_details(data):
    safe_data = SafeSlice(bytearray(data))  # because SafeSlice expects a mutable object
    try:
        name_length = safe_data.pop(0)
        name = safe_data.popslice(slice(name_length))
        phone_length = safe_data.pop(0)
        phone = safe_data.popslice(slice(phone_length))
    except IndexError:
        raise ParseError()
    if safe_data:
        # safe_data should be empty at this point
        raise ParseError()
    return Details(name, phone)

def main():
    print(parse_details(b"\x04Jack\x0A0123456789"))  # OK
    print(parse_details(b"\x04Jack\x0A012345678"))  # should raise ParseError

SliceDetails = namedtuple('SliceDetails', 'first last length')

class SafeSlice(MutableSequence):
    """This implementation of a MutableSequence gives IndexError with invalid slices"""
    def __init__(self, mutable_sequence):
        self._data = mutable_sequence
    def __str__(self):
        return str(self._data)
    def __repr__(self):
        return repr(self._data)
    def __len__(self):
        return len(self._data)
    def computeindexes(self, ii):
        """Given a slice or an index, this method computes what would ideally be
        the first index, the last index and the length if the SafeSlice was
        accessed using this parameter.
        None indexes will be returned if the computed length is 0.
        First and last indexes may be negative. This means that they are invalid
        indexes. (e.g. range(2)[-4:-3] will return first=-2, last=-2 and length=1)
        """
        if isinstance(ii, slice):
            start, stop, step = ii.start, ii.stop, ii.step
            if start is None:
                start = 0
            elif start < 0:
                start = len(self._data) + start
            if stop is None:
                stop = len(self._data)
            elif stop < 0:
                stop = len(self._data) + stop
            if step is None:
                step = 1
            elif step == 0:
                raise ValueError("slice step cannot be zero")
            length = ceil((stop - start) / step)
            length = int(max(0, length))
            if length:
                first_index = start
                last_index = start + (length - 1) * step
            else:
                first_index, last_index = None, None
        else:
            length = 1
            if ii < 0:
                first_index = last_index = len(self._data) + ii
            else:
                first_index = last_index = ii
        return SliceDetails(first_index, last_index, length)
    def slicecheck(self, ii):
        """Check if the first and the last item of parameter could be accessed"""
        slice_details = self.computeindexes(ii)
        if slice_details.first is not None:
            if slice_details.first < 0:
                # first is *really* negative
                self._data[slice_details.first - len(self._data)]
            else:
                self._data[slice_details.first]
        if slice_details.last is not None:
            if slice_details.last < 0:
                # last is *really* negative
                self._data[slice_details.last - len(self._data)]
            else:
                self._data[slice_details.last]
    def __delitem__(self, ii):
        self.slicecheck(ii)
        del self._data[ii]
    def __setitem__(self, ii, value):
        self.slicecheck(ii)
        self._data[ii] = value
    def __getitem__(self, ii):
        self.slicecheck(ii)
        r = self._data[ii]
        if isinstance(ii, slice):
            r = SafeSlice(r)
        return r
    def popslice(self, ii):
        """Same as pop but a slice may be used as index."""
        self.slicecheck(ii)
        r = self._data[ii]
        if isinstance(ii, slice):
            r = SafeSlice(r)
        del self._data[ii]
        return r
    def insert(self, i, value):
        length = len(self._data)
        if -length <= i <= length:
            self._data.insert(i, value)
        else:
            self._data[i]

if __name__ == '__main__':
    main()

Related

I am trying to write an algorithm that uses a stack to check if an expression has balanced parentheses, but I keep encountering this error:

def is_matched(expression):
    left_bracket = "[({"
    right_bracket = "])}"
    my_stack = Stack(len(expression))
    # our solution methodology is to go through the expression and push all of the open brackets onto the stack, and then,
    # for the closing brackets, each time we encounter a closing bracket we will pop the stack and compare
    for character in expression:
        if character in left_bracket:
            my_stack.push(character)
        elif character in right_bracket:
            # first check to see that the stack is not empty, i.e. we actually have some opening brackets in the expression
            if my_stack.is_empty():
                return False
            # now we need to check that the type of bracket we pop is the equivalent of its closing bracket in the expression
            if right_bracket.index(character) != left_bracket.index(my_stack.pop):
                return False
    return my_stack.is_empty()

print(is_matched("()"))
if right_bracket.index(character) != left_bracket.index(my_stack.pop):
TypeError: expected a string or other character buffer object
python-BaseException
Here is my stack implementation:
class Stack:
    def __init__(self, capacity):
        """Builds a stack with given capacity > 0."""
        if capacity <= 0:
            raise Exception("The capacity must be positive")
        self.the_array = [None] * capacity
        self.top = -1  # the index of the top element
    def size(self):
        """Returns the size, i.e. the number
        of elements in the container."""
        return self.top + 1
    def is_empty(self):
        """Returns True if and only if the container is empty."""
        return self.size() == 0
    def is_full(self):
        """Returns True if and only if the container is full."""
        return self.size() >= len(self.the_array)
    def push(self, item):
        """Places the given item at the top of the stack
        if there is capacity, or raises an Exception."""
        if self.is_full():
            raise Exception("The stack is full")
        self.top += 1
        self.the_array[self.top] = item
    def pop(self):
        """Removes and returns the top element of the stack,
        or raises an Exception if there is none."""
        if self.is_empty():
            raise Exception("The stack is empty")
        item = self.the_array[self.top]
        # removes a reference to this item,
        # helps with memory management and debugging
        self.the_array[self.top] = None
        self.top -= 1
        return item
    def reset(self):
        """Removes all elements from the container."""
        while not self.is_empty():
            self.pop()
        assert self.is_empty()
On the second iteration it should pop the stack and notice that the indexes of the right and left brackets are the same, then after the loop realise the stack is empty and return True. Instead, it throws a TypeError.
Any help is appreciated.
Thank you
At this line:
if right_bracket.index(character) != left_bracket.index(my_stack.pop):
you need to actually call the pop method. Without parentheses, my_stack.pop is the bound method object itself rather than the popped item, so index() receives a method instead of a string.
Therefore it should look like this:
if right_bracket.index(character) != left_bracket.index(my_stack.pop()):
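For comparison, a minimal sketch of the same algorithm using a plain Python list as the stack (an alternative, not the poster's Stack class), with a dict mapping each closing bracket to its opening bracket:

```python
def is_matched(expression):
    """Return True if every bracket in expression is properly closed and nested."""
    pairs = {")": "(", "]": "[", "}": "{"}   # closing bracket -> opening bracket
    stack = []
    for character in expression:
        if character in "([{":
            stack.append(character)
        elif character in pairs:
            # a closer with no opener, or with the wrong opener, is a mismatch
            if not stack or stack.pop() != pairs[character]:
                return False
    return not stack   # leftover openers mean something was never closed


print(is_matched("()"))       # True
print(is_matched("([)]"))     # False
```

Using a list avoids having to choose a capacity up front, and list.pop() raises IndexError on an empty list, which the `not stack` check guards against.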

How to return only numbers that divide by 3 in iterable form within class using iterator

I've started to learn about iterators and am trying to implement them myself.
I have created a class that should provide numbers within a range from user defined start to user defined end, in an iterable form.
Now my code looks like this:
class Can_be_divided_by_three:
    def __init__(self, start, end):
        self.start = start
        self.end = end
    def __iter__(self):
        return self
    def __next__(self):
        if self.start > self.end:
            raise StopIteration
        item = self.start
        self.start += 1
        if item % 3 == 0:
            return item

iterator = Can_be_divided_by_three(3, 8)
print(next(iterator))
print(next(iterator))
print(next(iterator))
print(next(iterator))
And this is the output:
3
None
None
6
So there is output even when the number is not divisible by 3, and that output is None.
Am I getting this wrong, and if so, how do I get it right? I need the output to consist only of numbers divisible by 3, with iteration capabilities.
Thank you in advance.
As per your logic, the __next__ method returns the number if it is divisible by 3, but you have not specified what it should do if the number is not divisible by 3 (so it implicitly returns None). Try the code below:
class Can_be_divided_by_three:
    def __init__(self, start, end):
        self.start = start
        self.end = end
    def __iter__(self):
        return self
    def __next__(self):
        if self.start > self.end:
            raise StopIteration
        item = self.start
        self.start += 1
        if item % 3 == 0:
            return item
        else:
            return self.__next__()

iterator = Can_be_divided_by_three(3, 8)
for number in iterator:
    print(number)  # prints 3, then 6; a third next() call would raise StopIteration
Based on John Coleman's comment that you only need to find the smallest multiple of 3, you can achieve the same with this:
def Can_be_divided_by_three(start, end):
    while start % 3:
        start += 1
    for i in range(start, end + 1, 3):  # end + 1 so that end is included, as in the class
        yield i
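You only return item if item % 3 == 0; otherwise __next__ falls off the end and implicitly returns None.
You should do something else in the other cases (such as moving on to the next number) if you want to avoid the Nones.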
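A sketch of that idea which keeps the class but loops inside `__next__` instead of recursing, so a long run of non-multiples cannot hit the recursion limit. The class name `DivisibleByThree` is a renaming for illustration; the range stays inclusive of `end`, as in the question.

```python
class DivisibleByThree:
    """Iterate over the multiples of 3 in [start, end], inclusive."""

    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        return self

    def __next__(self):
        # skip non-multiples instead of returning None for them
        while self.current <= self.end:
            item = self.current
            self.current += 1
            if item % 3 == 0:
                return item
        raise StopIteration


print(list(DivisibleByThree(3, 8)))   # [3, 6]
```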
The code you have written is right and would return the numbers divisible by 3 within the user-defined range. The reason you are finding None in the output is the print() function.
Try running only next(iterator): in Jupyter, the iterator then shows only 3 and 6.

How to define an indexing method within a class (__getitem__ attempted)

I am new to OOP in Python.
Below is a class that behaves like range(), except that it includes the stop boundary.
I am trying to create an indexing method inside the class so that when a specific index is called, the element with that index is returned. I read that __getitem__ can perform indexing, yet I have not been successful in implementing it correctly. If there is a more efficient way not necessarily using __getitem__, please advise.
Please take a look at the code below; this is a simple piece of code aimed at learning how to create classes.
The method starting at def __getitem__(self, index) is the one that does not work; it corresponds to the indexing call at the end of main(), which is what I would like to achieve.
class inclusive_range:
    def __init__(self, *args):
        numargs = len(args)
        if numargs < 1: raise TypeError('requires at least one argument')
        elif numargs == 1:
            self.stop = args[0]
            self.start = 0
            self.step = 1
        elif numargs == 2:
            (self.start, self.stop) = args
            self.step = 1
        elif numargs == 3:
            (self.start, self.stop, self.step) = args
        else:
            raise TypeError('three arguments required at most, got {}'.format(numargs))
    def __iter__(self):  # this makes instances of the class iterable
        i = self.start
        while i <= self.stop:
            yield i
            i += self.step
    def __getitem__(self, index):
        return self[index]
        print(self[index])

def main():
    o = inclusive_range(5, 10, 1)
    for i in o: print(i, end=' ')
    o[2]

if __name__ == "__main__": main()
Thank you
You can just calculate the number based on self.start, the index and the step size. For your object to be a proper sequence you also need a length, which comes in handy when testing for the boundaries:
def __len__(self):
    start, stop, step = self.start, self.stop, self.step
    if step < 0:
        lo, hi = stop, start
    else:
        lo, hi = start, stop
    return max(0, (hi - lo) // abs(step) + 1)  # max() guards against empty ranges

def __getitem__(self, i):
    length = len(self)
    if i < 0:
        i += length
    if 0 <= i < length:
        return self.start + i * self.step
    raise IndexError('Index out of range: {}'.format(i))
The above is based on my own translation of the range() source code to Python, with a small adjustment to account for the end being inclusive.
I'd cache the __len__ result in __init__, to avoid having to re-calculate it each time you want to know the length.
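Putting the pieces together, a minimal sketch of the whole class with the length computed once in `__init__`, as suggested. It assumes integer arguments and a non-zero step, keeps the poster's constructor signature, and deliberately leaves out slice support.

```python
class inclusive_range:
    """Like range(), but stop is included. Integer arguments assumed."""

    def __init__(self, *args):
        if not 1 <= len(args) <= 3:
            raise TypeError('expected 1 to 3 arguments, got {}'.format(len(args)))
        if len(args) == 1:
            self.start, self.stop, self.step = 0, args[0], 1
        elif len(args) == 2:
            self.start, self.stop = args
            self.step = 1
        else:
            self.start, self.stop, self.step = args
        if self.step == 0:
            raise ValueError('step must not be zero')
        # cache the number of terms start, start + step, ... that do not pass stop
        if self.step > 0:
            span = self.stop - self.start
        else:
            span = self.start - self.stop
        self._length = max(0, span // abs(self.step) + 1)

    def __len__(self):
        return self._length

    def __iter__(self):
        for i in range(self._length):
            yield self.start + i * self.step

    def __getitem__(self, i):
        if i < 0:
            i += self._length
        if 0 <= i < self._length:
            return self.start + i * self.step
        raise IndexError('index out of range: {}'.format(i))


o = inclusive_range(5, 10, 1)
print(list(o))        # [5, 6, 7, 8, 9, 10]
print(o[2], o[-1])    # 7 10
```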

Enforcing type restriction in list abstraction using python

Below is the list abstraction in the functional paradigm, which encapsulates any type of data in its representation.
empty_rlist = None
#Representation - start
#Constructor
def rlist(first, rest):
    return (first, rest)
#Selector
def first(s):
    return s[0]
def rest(s):
    return s[1]
#Representation - end
#Constructor and Selector constitute the ADT (above) that supports the invariant below:
#If a recursive list s is constructed from a first element f and a recursive list r, then
# • first(s) returns f, and
# • rest(s) returns r, which is a recursive list.
#Usage(interface) - start
def create_list(first, rest):
    return rlist(first, rest)
def len_rlist(s):
    """Compute the length of the recursive list s"""
    def compute_length(s, length):
        if s is empty_rlist:
            return length
        else:
            return compute_length(rest(s), length + 1)
    return compute_length(s, 0)
def getitem_rlist(s, i):
    """Return the element at (1-based) index i of recursive list s"""
    if i == 1:
        return first(s)
    else:
        return getitem_rlist(rest(s), i-1)
def count(s, value):
    """Count the occurrences of value in the list s"""
    def count_occurence(s, value, count):
        if s == empty_rlist:
            return count
        else:
            if first(s) == value:
                return count_occurence(rest(s), value, count + 1)
            else:
                return count_occurence(rest(s), value, count)
    return count_occurence(s, value, 0)
#Usage - end
Lst = empty_rlist
Lst = create_list(4, Lst)
Lst = create_list(3, Lst)
Lst = create_list(1, Lst)
Lst = create_list(1, Lst)
print(count(Lst, 1))
In the above code, interfaces that are provided to users of this abstraction are create_list / len_rlist / getitem_rlist / count.
Questions:
How to enforce that the object passed to parameter(s) of interfaces len_rlist / getitem_rlist / count is nothing but the object provided by create_list interface?
How to enforce above list abstraction store same type data?
Note: in practice, these rules need to be enforced from a syntax perspective.
Because Python is a dynamically typed language, you can't check types before executing. But in practice you sometimes need to check input parameters and return values. I use the following solutions for these tasks:
import functools

def accepts(*types):
    """Check input types"""
    def check_accepts(f):
        assert len(types) == f.__code__.co_argcount
        @functools.wraps(f)  # keep the wrapped function's name and docstring
        def new_f(*args, **kwds):
            for (a, t) in zip(args, types):
                assert isinstance(a, t), \
                    "arg %r does not match %s" % (a, t)
            return f(*args, **kwds)
        return new_f
    return check_accepts

def returns(rtype):
    """Check returns type"""
    def check_returns(f):
        @functools.wraps(f)
        def new_f(*args, **kwds):
            result = f(*args, **kwds)
            assert isinstance(result, rtype), \
                "return value %r does not match %s" % (result, rtype)
            return result
        return new_f
    return check_returns

if __name__ == '__main__':
    @returns(type(None))  # Check None as result
    @accepts(int, (int, float))  # First param int; second int or float
    def func(arg1, arg2):
        #return str(arg1 * arg2)
        pass

    func(1, 2)
In order to enforce the type, you will have to provide the type as a parameter somewhere in your constructor. Consider building a parameterized type constructor. Here is an example.
>>> def list_spec_for(type_):
...     empty_rlist = type_()
...     def rlist(first, rest):
...         return (type_(first), rest)
...     return empty_rlist, rlist
>>> empty_rlist, rlist = list_spec_for(int)
>>> empty_rlist
0
>>> rlist(1, empty_rlist)
(1, 0)
>>> rlist("1", empty_rlist)
(1, 0)
>>> rlist("one", empty_rlist)
Traceback (most recent call last):
  ...
ValueError: invalid literal for int() with base 10: 'one'
If accepting "1" is not OK for your purpose, you can of course add an isinstance check to the definition of rlist.
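A sketch of that stricter variant of the same `list_spec_for` factory, which rejects values that are not already instances of `type_` instead of converting them. Note that with `int`, `bool` values would still pass, since `bool` is a subclass of `int`.

```python
def list_spec_for(type_):
    """Return (empty_rlist, rlist) for recursive lists whose elements must be type_."""
    empty_rlist = type_()

    def rlist(first, rest):
        # reject instead of converting: "1" is no longer accepted for int
        if not isinstance(first, type_):
            raise TypeError('%r is not an instance of %s' % (first, type_.__name__))
        return (first, rest)

    return empty_rlist, rlist


empty_rlist, rlist = list_spec_for(int)
print(rlist(1, empty_rlist))    # (1, 0)
```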
Python is not a statically typed language; more exactly, it is dynamically typed. That means that values do have a type, but the language itself will never forbid you to put a value of a different type in a variable.
a = 1 # a contains an integer value
a = "abc" # a now contains a string value
But you have the isinstance and type functions, which can help meet this requirement: you could assign a type to your recursive list and only allow an element and a recursive list of compatible types to be bound together.
The full spec could be:
an rlist stores the type of element it accepts
an rlist can be constructed by adding a first element elt for which isinstance(elt, typ) is true, where typ is the accepted type of the rest part
an initial list can be constructed by giving it a type explicitly, or by using the type of its first element
Implementation:
class rlist:
    def __init__(self, first, rest=None, typ=None):
        self.first = first
        self.rest = rest
        if rest is None:  # initial creation
            self.typ = type(first) if typ is None else typ
        else:
            if not isinstance(rest, rlist):
                raise TypeError(str(rest) + " not a rlist")
            self.typ = rest.typ
        if not isinstance(first, self.typ):
            raise TypeError(str(first) + " not a " + str(self.typ))
    # other methods ...
But when you need static typing, you should ask yourself whether Python is really the appropriate language: Java is statically typed and natively supports all that. The Python way is more "accept this and just hope it'll fit; the programmer should know what he is doing".

Pretty printing a linked list of nodes in python using __str__()

I have a Node class in Python which is something like this:
class Node:
    def __init__(self, value, next, prev, rand):
        self.value = value
        self.next = next
        self.prev = prev
        self.rand = rand
I create a list out of this by creating nodes and setting the appropriate pointers. Now I want to pretty print this list, like [1] --> [2] --> [3] and so on. If there is a random pointer along with next, it will be [1] --> [2] --> [3,4] --> [4], for example: here 2's next is 3 and 2's rand is 4, while 3's next is 4. I am trying to do this with the __str__() method in the Node class itself. I have it as below for now:
#pretty print a node
def __str__(self):
    if self.next is None:
        return '[%d]' % (self.value)
    else:
        return '[%d]-->%s' % (self.value, str(self.next))
This pretty prints it without the random pointer, but I am not able to incorporate the random pointer into it. I tried a couple of approaches, but they mess up the brackets. How would I go about this?
Thanks
Try factoring the printing of the pointers out, that way it may be clearer:
#pretty print a node
def __str__(self):
    if self.next is None:
        return self.stringify_pointers()
    else:
        return '%s-->%s' % (self.stringify_pointers(), str(self.next))
def stringify_pointers(self):
    s = '[%s' % self.value
    if self.rand:
        # <> to distinguish rand from next; rand is assumed to be another Node
        s += ',<%s>' % self.rand.value
    return s + ']'
One way to do it is to use string slicing and inject the random value into the next node's string, like this:
def __str__(self):
    if self.next is None:
        return '[%d]' % (self.value)
    else:
        nn = str(self.next)
        if self.rand is not None:
            # nn[:2] is '[' plus the next node's value, so this assumes one-digit values
            nn = '%s,%d%s' % (nn[:2], self.rand, nn[2:])
        return '[%d]-->%s' % (self.value, nn)
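Both answers recurse once per node, so a very long list could exceed Python's default recursion limit. A sketch of an iterative alternative, assuming `rand` holds another Node (or None) and giving the constructor default arguments for convenience (a change from the question's signature). It follows the question's format, where a node's random target is shown inside the brackets of the following node.

```python
class Node:
    def __init__(self, value, next=None, prev=None, rand=None):
        self.value = value
        self.next = next
        self.prev = prev
        self.rand = rand

    def __str__(self):
        parts = []
        previous = None
        node = self
        while node is not None:
            label = str(node.value)
            # question's format: the previous node's random target is shown here
            if previous is not None and previous.rand is not None:
                label += ',%s' % previous.rand.value
            parts.append('[%s]' % label)
            previous, node = node, node.next
        return '-->'.join(parts)


n4 = Node(4)
n3 = Node(3, next=n4)
n2 = Node(2, next=n3, rand=n4)
n1 = Node(1, next=n2)
print(n1)   # [1]-->[2]-->[3,4]-->[4]
```

Multi-digit values work too, since no fixed-width string slicing is involved.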
