Tuple Indexing, Combine Slice and Index - python

I have a question about tuple indexing and slicing in Python; I want to write better and clearer code. This is a simplified version of my problem:
I have a tuple a = (1, 2, 3, 4, 5) and I want to index into it so that I get b = (1, 2, 4).
Is it possible to do this in one operation, or do I have to do b = a[0:2] + (a[3],)? I have thought about indexing with another tuple, which is not possible, and I have also searched for a way to combine a slice and an index. It just seems to me that there must be a better way to do it.
Thank you very much :)

Just for fun, you can implement an indexer and use simple syntax to get the results:
from collections.abc import Sequence

class SequenceIndexer:
    def __init__(self, seq: Sequence):
        self.seq = seq

    def __getitem__(self, item):
        # A plain index or slice: defer to the wrapped sequence.
        if not isinstance(item, tuple):
            return self.seq[item]
        # A tuple mixing indices and slices: collect the pieces in order.
        result = []
        for i in item:
            if isinstance(i, slice):
                result.extend(self.seq[i])
            else:
                result.append(self.seq[i])
        return result

a = 1, 2, 3, 4, 5
indexer = SequenceIndexer(a)
print(indexer[:2, 3])          # [1, 2, 4]
print(indexer[:3, 2, 4, -3:])  # [1, 2, 3, 3, 5, 3, 4, 5]

As stated in the comments, you can use operator.itemgetter:
import operator

a = (1, 2, 3, 4)
result = operator.itemgetter(0, 1, 3)(a)  # (1, 2, 4)
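One caveat worth noting: itemgetter called with a single index returns the bare element rather than a 1-tuple, so handle that case separately if you always need a tuple:
import operator

a = (1, 2, 3, 4)
print(operator.itemgetter(0, 1, 3)(a))  # (1, 2, 4)
print(operator.itemgetter(3)(a))        # 4, not (4,)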
If you do this kind of indexing often and your arrays are big, you might also have a look at NumPy, as it supports more complex indexing schemes:
import numpy as np

a = (1, 2, 3, 4)
b = np.array(a)
result = b[[0, 1, 3]]  # array([1, 2, 4])
If the array is very large, and/or you have a long list of single points you want to include, it becomes convenient to construct the index list in advance. For the example above it looks like this:
singlePoints = [3]
idx = [*range(0, 2), *singlePoints]  # [0, 1, 3]
result = b[idx]

You can also use unpacking; simply name the values you don't need as underscore (_).
one, two, _, four, _ = (1, 2, 3, 4, 5)
Now you have your numbers, and this is one simple way to build the result:
b = (one, two, four)
Or you can use NumPy as well.


Nested array computations in Python using numpy

I am trying to use numpy in Python to solve my project.
I have a random binary array rndm = [1, 0, 1, 1] and a resource_arr = [[2, 3], 4, 2, [1, 2]]. What I am trying to do is multiply the arrays element-wise and then sum each nested entry. The expected output for the sample above is output = [5, 0, 2, 3]. I find it hard to solve such a problem because of the nested array/list.
So far my code looks like this:
def fitness_score():
    output = numpy.add(rndm * resource_arr)
    return output

fitness_score()
I keep getting
ValueError: invalid number of arguments.
I think this is because of the addition I am trying to do. Any help would be appreciated. Thank you!
Numpy expects its arrays to be rectangular, and resource_arr (with its nested lists) is not a valid rectangular array. In your case a plain Python list is more suitable:
def sum_nested(l):
    tmp = []
    for element in l:
        if isinstance(element, list):
            tmp.append(numpy.sum(element))
        else:
            tmp.append(element)
    return tmp
In this function we check, for each element inside l, whether it is a list. If so, we sum its elements; if the encountered element is just a number, we leave it untouched. Please note that this only works for one level of nesting.
Now, if we run sum_nested([[2, 3], 4, 2, [1, 2]]) we get [5, 4, 2, 3]. All that's left is multiplying this result by the elements of rndm, which can be achieved easily using numpy:
def fitness_score(a, b):
    return numpy.multiply(a, sum_nested(b))
Numpy is all about non-jagged arrays. You can do things with jagged arrays, but doing so efficiently and elegantly isn't trivial.
Almost always, trying to find a way to map your data structure to a non-nested one, for instance by encoding the information as below, will be more flexible and more performant.
resource_arr = (
    [0, 0, 1, 2, 3, 3],
    [2, 3, 4, 2, 1, 2],
)
That is, an integer denoting the 'row' each value belongs to, paired with an array of equal size of the values themselves.
This may 'feel' wasteful when coming from a C-style way of doing arrays (omg, more memory consumption), but staying away from nested data structures is almost certainly your best bet, both for performance and for how much of the numpy/scipy ecosystem will actually be compatible with your data representation. Whether it really uses more memory is rather questionable anyway: every Python object costs a fair number of bytes, so if you have only a few elements per nesting level, the flat encoding is the more memory-efficient solution too.
In this case, that would give you the following efficient solution to your problem:
output = np.bincount(*resource_arr) * rndm
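To make the flat encoding concrete, here is a minimal end-to-end sketch (the names rows and values are mine, not from the original post); note that bincount with weights returns floats:
import numpy as np

rndm = np.array([1, 0, 1, 1])
rows = np.array([0, 0, 1, 2, 3, 3])    # the 'row' each value belongs to
values = np.array([2, 3, 4, 2, 1, 2])  # the values themselves

# bincount sums the weights that share a row label
output = np.bincount(rows, weights=values) * rndm
print(output)  # [5. 0. 2. 3.]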
I have not worked much with pandas/numpy, so I'm not sure if this is the most efficient way, but it works (at least for the example you have shown):
import numpy as np

rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]

multiplied_output = np.multiply(rndm, resource_arr)
print(multiplied_output)

output = []
for elem in multiplied_output:
    output.append(sum(elem) if isinstance(elem, list) else elem)

final_output = np.array(output)
print(final_output)

using python itertools to manage nested for loops

I am trying to use itertools.product to manage the bookkeeping of some nested for loops, where the number of nested loops is not known in advance. Below is a specific example where I have chosen two nested for loops; the choice of two is only for clarity. What I need is a solution that works for an arbitrary number of loops.
This question provides an extension/generalization of the question appearing here:
Efficient algorithm for evaluating a 1-d array of functions on a same-length 1d numpy array
Now I am extending the above technique using an itertools trick I learned here:
Iterating over an unknown number of nested loops in python
Preamble:
from itertools import product

def trivial_functional(i, j): return lambda x : (i+j)*x

idx1 = [1, 2, 3, 4]
idx2 = [5, 6, 7]
joint = [idx1, idx2]

func_table = []
for items in product(*joint):
    f = trivial_functional(*items)
    func_table.append(f)
At the end of the above itertools loop, I have a 12-element, 1-d array of functions, func_table, each element having been built from the trivial_functional.
Question:
Suppose I am given a pair of integers, (i_1, i_2), where these integers are to be interpreted as the indices of idx1 and idx2, respectively. How can I use itertools.product to determine the correct corresponding element of the func_table array?
I know how to hack the answer by writing my own function that mimics the itertools.product bookkeeping, but surely there is a built-in feature of itertools.product that is intended for exactly this purpose?
I don't know of a way of calculating the flat index other than doing it yourself. Fortunately this isn't that difficult:
def product_flat_index(factors, indices):
    # Horner's scheme: the index into each factor is scaled by the
    # combined length of all the factors after it, matching the order
    # in which itertools.product emits its tuples (last factor fastest).
    if len(factors) == 1:
        return indices[0]
    return product_flat_index(factors[:-1], indices[:-1]) * len(factors[-1]) + indices[-1]

>>> product_flat_index(joint, (2, 1))
7
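As a quick sanity check against the func_table built in the preamble (assuming it has been run as shown), the looked-up function should be the one built from idx1[2] = 3 and idx2[1] = 6:
f = func_table[product_flat_index(joint, (2, 1))]
print(f(10))  # (3 + 6) * 10 == 90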
An alternative approach is to store the results in a nested array in the first place, making translation unnecessary, though this is more complex:
from functools import reduce
from operator import getitem, setitem, itemgetter

def get_items(container, indices):
    return reduce(getitem, indices, container)

def set_items(container, indices, value):
    c = reduce(getitem, indices[:-1], container)
    setitem(c, indices[-1], value)

def initialize_table(lengths):
    if len(lengths) == 1: return [0] * lengths[0]
    subtable = initialize_table(lengths[1:])
    return [subtable[:] for _ in range(lengths[0])]

func_table = initialize_table(list(map(len, joint)))
for items in product(*map(enumerate, joint)):
    f = trivial_functional(*map(itemgetter(1), items))
    set_items(func_table, list(map(itemgetter(0), items)), f)

>>> get_items(func_table, (2, 1)) # same as func_table[2][1]
<function>
Numerous answers here were quite useful; thanks to everyone for the solutions.
It turns out that if I recast the problem slightly with Numpy, I can accomplish the same bookkeeping and solve my problem with vastly improved speed relative to pure Python solutions. The trick is just to use Numpy's reshape method together with the normal multi-dimensional array indexing syntax.
Here's how this works. We just convert func_table into a Numpy array and reshape it:
import numpy as np

component_dimensions = [len(idx1), len(idx2)]
func_table = np.array(func_table).reshape(component_dimensions)
Now func_table can be used to return the correct function not just for a single 2d point, but for a full array of 2d points:
dim1_pts = [3,1,2,1,3,3,1,3,0]
dim2_pts = [0,1,2,1,2,0,1,2,1]
func_array = func_table[dim1_pts, dim2_pts]
As usual, Numpy to the rescue!
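To actually use the looked-up functions they still need to be called; a minimal sketch, evaluating every function in func_array at one example input x:
x = 2.0
results = np.array([f(x) for f in func_array])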
This is a little messy, but here you go:
from itertools import product

def trivial_functional(i, j): return lambda x : (i+j)*x

idx1 = [1, 2, 3, 4]
idx2 = [5, 6, 7]
joint = [enumerate(idx1), enumerate(idx2)]

func_map = {}
for indexes, items in map(lambda x: zip(*x), product(*joint)):
    f = trivial_functional(*items)
    func_map[indexes] = f

print(func_map[(2, 0)](5)) # 40 = (3+5)*5
I'd suggest using enumerate() in the right place:
from itertools import product

def trivial_functional(i, j): return lambda x : (i+j)*x

idx1 = [1, 2, 3, 4]
idx2 = [5, 6, 7]
joint = [idx1, idx2]

func_table = []
for items in product(*joint):
    f = trivial_functional(*items)
    func_table.append(f)
From what I understood from your comments and your code, func_table is simply indexed by the position at which a certain input tuple occurs in the sequence. You can access it again using:
for index, items in enumerate(product(*joint)):
    # because of the append(), index is now the
    # position of the function created from the
    # respective tuple in joint
    func_table[index](some_value)

Slices with several intervals?

I have a file where I want to extract columns 2, 3, 4, 5 and column -4. These columns are not adjacent.
For reasons of code neatness I'd like to do something like
values = line.split()[columns_to_extract]
instead of
values_part_one = line.split()[columns_to_extract_one]
values_part_two = line.split()[columns_to_extract_two]
Therefore I'd like to make a slice that contains the positions 2, 3, 4, 5 and -4 to be able to extract the values in one line. Is this possible?
If not, are there any other neat oneliners that could do this?
Is it possible to make a slice to do that? No.
However, all is not lost! You can use operator.itemgetter:
getter = operator.itemgetter(2, 3, 4, 5, -4)
example:
>>> import operator
>>> getter = operator.itemgetter(2, 3, 4, 5, -4)
>>> getter(range(50)) # Note, returns a `tuple`
(2, 3, 4, 5, 46)
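Applied to the original line-splitting problem, this becomes a one-liner; line here is hypothetical sample data:
import operator

line = "c0 c1 c2 c3 c4 c5 c6 c7 c8 c9"
getter = operator.itemgetter(2, 3, 4, 5, -4)
values = getter(line.split())
print(values)  # ('c2', 'c3', 'c4', 'c5', 'c6')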
Alternatively, you can use a list comprehension over the split parts:
parts = line.split()
values_part_one = [parts[i] for i in columns_to_extract_one]
values_part_two = [parts[i] for i in columns_to_extract_two]
Or, as @mgilson points out, you could use operator.itemgetter to get tuples:
import operator
extract_one = operator.itemgetter(*columns_to_extract_one) # or list explicitly
extract_two = operator.itemgetter(*columns_to_extract_two) # if using fixed cols
parts = line.split()
values_part_one = extract_one(parts)
values_part_two = extract_two(parts)
Note that both of these will fail with IndexError if the thing you are trying to grab from isn't large enough to contain all of the specified indices.

Writing a generalized function for both strings and lists in python

So I'm green as grass, learning programming from How to Think Like a Computer Scientist: Learn Python 3. I'm able to answer the question (see below) but fear I'm missing the lesson.
Write a function (called insert_at_end) that will pass all three of the tests below, where the last argument to test is the expected result:
test(insert_at_end(5, [1, 3, 4, 6]), [1, 3, 4, 6, 5])
test(insert_at_end('x', 'abc'), 'abcx')
test(insert_at_end(5, (1, 3, 4, 6)), (1, 3, 4, 6, 5))
The book gives this hint:"These exercises illustrate nicely that the sequence abstraction is general, (because slicing, indexing, and concatenation are so general), so it is possible to write general functions that work over all sequence types.".
This version doesn't have solutions online (that I could find), but I found someone's answers to a previous version of the text (for Python 2.7), and they did it this way:
def encapsulate(val, seq):
    if type(seq) == type(""):
        return str(val)
    if type(seq) == type([]):
        return [val]
    return (val,)

def insert_at_end(val, seq):
    return seq + encapsulate(val, seq)
This seems to solve the question by distinguishing between lists and strings, going against the hint. So, how about it: is there a way to answer the question (and about 10 more similar ones) without distinguishing, i.e. without using type()?
My best effort:
def insert_at_end(val, seq):
    t = type(seq)
    try:
        return seq + t(val)
    except TypeError:
        return seq + t([val])
This attempts to build a sequence of type(seq) from val directly; if val is not iterable, that raises TypeError, so we wrap val in a list, convert, and concatenate.
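One caveat worth flagging (my observation, not part of the original answer): if val is itself iterable, the first branch succeeds and splats its elements instead of appending val as a single item:
>>> insert_at_end('ab', [1, 2])  # list('ab') == ['a', 'b']
[1, 2, 'a', 'b']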
I'd say that the example isn't symmetric, meaning that it asks the reader to handle two different cases:
int, list
str, str
In my opinion, the exercise should ask to implement this instead:
list, list: insert_at_end([5], [1, 3, 4, 6])
str, str: insert_at_end('x', 'abc')
In this case, the reader has to work only with two parameters that use the same sequence type, and the hint would make much more sense.
This is not a solution but rather an explanation of why a truly elegant solution does not look possible.
+ concatenates sequences, but only sequences of the same type.
Values passed as the first argument to insert_at_end are 'scalar', so you have to convert them to the sequence type that the second argument has.
To do that, you cannot simply call a sequence constructor with a scalar argument and get a one-item sequence of that type: tuple(1) does not work.
str behaves differently from other sequence types: tuple(["a"]) is ("a",) and list(["a"]) is ["a"], but str(["a"]) is "['a']" and not "a".
This renders + useless in this situation, even though you can easily construct a sequence of a given type cleanly, without isinstance, just by using type().
You can't use slice assignment either, since only lists are mutable.
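A quick interpreter session illustrates these points:
>>> tuple(1)
Traceback (most recent call last):
  ...
TypeError: 'int' object is not iterable
>>> tuple(["a"])
('a',)
>>> str(["a"])
"['a']"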
In this situation, the solution by #Hamish looks cleanest.
That problem is one of a long list and the hint applies to all of them. I think it is reasonable that, having written the encapsulate function which can be re-used for things like insert_at_front, the rest of the implementation is type agnostic.
However, I think a better implementation of encapsulate might be:
def encapsulate(val, seq):
    if isinstance(seq, str):
        return val
    return type(seq)([val])
which handles a wider range of types with less code.
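Paired with the insert_at_end from the quoted solution, this version passes all three of the book's tests:
>>> insert_at_end(5, [1, 3, 4, 6])
[1, 3, 4, 6, 5]
>>> insert_at_end('x', 'abc')
'abcx'
>>> insert_at_end(5, (1, 3, 4, 6))
(1, 3, 4, 6, 5)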
The challenge with this question (in Python 2.7; I'm testing 3.2 right now to verify) is that two of the possible input types for seq are immutable, and you're expected to return the same type as was passed in. For strings this is less of an issue, because you could do this:
return seq + char
as that returns a new string that's the concatenation of the input sequence and the appended character. But that doesn't work for lists or tuples: you can only concatenate a list to a list or a tuple to a tuple. If you wanted to avoid "type" checking, you could get there with something like this:
if hasattr(seq, 'append'):   # list input
    seq.append(char)
elif hasattr(seq, 'strip'):  # string input
    seq = seq + char
else:                        # tuple input
    seq = seq + (char,)
return seq
That's really not much different from actually checking types, but it does avoid using the type function directly.
This solution still requires some separate code for strings as opposed to lists/tuples, but it is more concise and doesn't do any checking for specific types.
def insert_at_end(val, seq):
    try:
        return seq + val
    except TypeError:  # unsupported operand type(s) for +
        return seq + type(seq)([val])
Maybe this is nearer the answer:
def genappend(x, s):
    if isinstance(s, str):
        t = s[0:0].join
    else:
        t = type(s)
    lst = list(s)
    lst.append(x)
    return t(lst)

print(genappend(5, [1, 2, 3, 4]))
print(genappend(5, (1, 2, 3, 4)))
print(genappend('5', '1234'))
There could also be completely user-defined sequence types. They will also work, as long as they are convertible to and from a list. This also works:
print(genappend('5', set('1234')))
I agree that the point is whether item is iterable or not. So my solution would be this:
def iterate(seq, item):
    for i in seq:
        yield i
    yield item

def insert_at_end(seq, item):
    if hasattr(item, '__iter__'):
        return seq + item
    else:
        return type(seq)(iterate(seq, item))
Example:
>>> insert_at_end('abc', 'x')
'abcx'
>>> insert_at_end([1, 2, 4, 6], 5)
[1, 2, 4, 6, 5]
>>> insert_at_end((1, 2, 4, 6), 5)
(1, 2, 4, 6, 5)
Since insert_at_end can handle both iterables and non-iterables, it works fine even with:
>>> insert_at_end('abc', 'xyz')
'abcxyz'
>>> insert_at_end([1, 2, 4, 6], [5, 7])
[1, 2, 4, 6, 5, 7]
>>> insert_at_end((1, 2, 4, 6), (5, 7))
(1, 2, 4, 6, 5, 7)
While encapsulate relies on the type, the code directly in insert_at_end does not; it relies on + meaning related things for all three types, and in that sense it fits in with the hint.

What is more efficient in python: new array creation or in-place array manipulation?

Say I have an array with a couple hundred elements. I need to iterate over the array and replace one or more items with some other item. Which strategy is more efficient in Python in terms of speed (I'm not worried about memory)?
For example: I have an array
my_array = [1,2,3,4,5,6]
I want to replace the first 3 elements with a single element of value 123.
Option 1 (in place):
my_array = [1,2,3,4,5,6]
my_array.remove(0,3)
my_array.insert(0,123)
Option 2 (new array creation):
my_array = [1,2,3,4,5,6]
my_array = my_array[3:]
my_array.insert(0,123)
Both of the above options will give a result of:
>>> [123,4,5,6]
Any comments would be appreciated, especially if there are options I have missed.
If you want to replace an item or a set of items in a list, you should never use your first option. Removing from and adding to the middle of a list is slow (reference). Your second option is also fairly inefficient, since you're doing two operations for a single replacement.
Instead, just do slice assignment, as eiben's answer instructs. This will be significantly faster and more efficient than either of your methods:
>>> my_array = [1,2,3,4,5,6]
>>> my_array[:3] = [123]
>>> my_array
[123, 4, 5, 6]
arr[0] = x
replaces the 0th element with x. You can also replace whole slices.
>>> arr = [1, 2, 3, 4, 5, 6]
>>> arr[0:3] = [8, 9, 99]
>>> arr
[8, 9, 99, 4, 5, 6]
>>>
And generally it's unclear what you're trying to achieve. Please provide more information or an example.
OK, as for your update: the remove method doesn't work as written (remove takes a single value, not a range of indices). But the slicing I presented works for your case too:
>>> arr
[8, 9, 99, 4, 5, 6]
>>> arr[0:3] = [4]
>>> arr
[4, 4, 5, 6]
I would guess it's the fastest method, but do try it with timeit. According to my tests it's twice as fast as your "new array" approach.
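If you want to measure it yourself, a minimal timeit sketch along these lines should do (array size and iteration count are arbitrary choices):
import timeit

setup = "my_array = list(range(1000))"
slice_version = "a = my_array[:]; a[:3] = [123]"
insert_version = "a = my_array[3:]; a.insert(0, 123)"

# copy/slice first so each run starts from a fresh list
print(timeit.timeit(slice_version, setup=setup, number=100000))
print(timeit.timeit(insert_version, setup=setup, number=100000))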
If you're looking for speed efficiency and are manipulating series of integers, you should use the standard array module instead:
>>> import array
>>> my_array = array.array('i', [1,2,3,4,5,6])
>>> my_array = my_array[3:]
>>> my_array.insert(0,123)
>>> my_array
array('i', [123, 4, 5, 6])
The key thing is to avoid moving large numbers of list items more than you absolutely have to. Slice assignment, as far as I'm aware, still involves shifting the items after the slice, which is bad news.
How do you recognise when you have a sequence of items which needs to be replaced? I'll assume you have a function like:
def replacement(objects, startIndex):
    "returns a pair (numberOfObjectsToReplace, replacementObject), or None if there should be no replacement"
I'd then do:
def replaceAll(objects):
    src = 0
    dst = 0
    while src < len(objects):
        replacementInfo = replacement(objects, src)
        if replacementInfo is not None:
            numberOfObjectsToReplace, replacementObject = replacementInfo
        else:
            numberOfObjectsToReplace = 1
            replacementObject = objects[src]
        objects[dst] = replacementObject
        src = src + numberOfObjectsToReplace
        dst = dst + 1
    del objects[dst:]
This code still does a few more loads and stores than it absolutely has to, but not many.
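As a concrete (hypothetical) example of the callback, here is a replacement that implements the question's collapse of the leading [1, 2, 3] into a single 123:
def replacement(objects, startIndex):
    # collapse the run [1, 2, 3] starting here into a single 123
    if objects[startIndex:startIndex + 3] == [1, 2, 3]:
        return (3, 123)
    return None

my_array = [1, 2, 3, 4, 5, 6]
replaceAll(my_array)
print(my_array)  # [123, 4, 5, 6]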
