I have a ragged 2D array in numpy, i.e. some rows are longer than others, such as: [[1, 2], [1, 2, 3], ...]
But numpy doesn't seem to like this:
import numpy as np
foo = np.array([[1], [1, 2]])
foo.mean(axis=1)
Traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/tom/.virtualenvs/nlp/lib/python3.5/site-packages/numpy/core/_methods.py", line 56, in _mean
rcount = _count_reduce_items(arr, axis)
File "/home/tom/.virtualenvs/nlp/lib/python3.5/site-packages/numpy/core/_methods.py", line 50, in _count_reduce_items
items *= arr.shape[ax]
IndexError: tuple index out of range
Is there a nice way to do this or should I just do the maths myself?
We could use an almost vectorized approach based on np.add.reduceat, which handles the irregular-length subarrays whose averages we want. After flattening the input into a 1D array with np.concatenate, np.add.reduceat sums the elements over each of those irregular-length intervals. Finally, we divide the sums by the lengths of the subarrays to get the averages.
Thus, the implementation would look something like this:
lens = np.array([len(row) for row in foo])  # thanks to @Kasramvd for this! (a list comprehension rather than bare map, which returns an iterator in Python 3)
vals = np.concatenate(foo)
shift_idx = np.append(0, lens[:-1].cumsum())
out = np.add.reduceat(vals, shift_idx) / lens.astype(float)
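As a quick check, here is the recipe end to end on the foo from the question (built with dtype=object, which newer numpy versions require for ragged rows):
import numpy as np
foo = np.array([[1], [1, 2]], dtype=object)   # object dtype keeps the ragged rows
lens = np.array([len(row) for row in foo])    # [1, 2]
vals = np.concatenate(foo)                    # [1, 1, 2]
shift_idx = np.append(0, lens[:-1].cumsum())  # [0, 1] -- start of each subarray
out = np.add.reduceat(vals, shift_idx) / lens.astype(float)
print(out)                                    # [1.  1.5]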
You could perform the mean for each sub-array of foo using a list comprehension:
mean_foo = np.array( [np.mean(subfoo) for subfoo in foo] )
As suggested by @Kasramvd in a comment on another answer, you can also use the map function (wrapped in list, since map returns an iterator in Python 3):
mean_foo = np.array(list(map(np.mean, foo)))
Related
Maybe I'm just being lazy here, but let's say that I have two arrays, of length n and m, and I'd like the pairwise minimum of every element of one array against every element of the other. For example:
a = [1,5,3]
b = [2,4]
cross_min(a,b)
= [[1,1],[2,4],[2,3]]
This is similar to the behavior of np.outer(), except that instead of multiplying the two arrays, it computes the minimum of the two elements.
Is there an operation in numpy that does a similar thing?
I know that I can just run np.minimum() along b and stack the results together. I'm wondering if this is a well-known operation that I just don't know the name of.
You can use np.minimum.outer(a, b)
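ufunc.outer applies the ufunc to every pair of elements drawn from the two inputs, so for np.minimum it produces exactly the cross_min from the question:
import numpy as np
a = np.array([1, 5, 3])
b = np.array([2, 4])
np.minimum.outer(a, b)
#array([[1, 1],
#       [2, 4],
#       [2, 3]])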
You might turn one of the arrays into a 2D array, and then make use of the broadcasting rules and np.minimum:
import numpy as np
a = np.array([1,5,3])
b = np.array([2,4])
np.minimum(a[:,None], b)
#array([[1, 1],
# [2, 4],
# [2, 3]])
Assuming I have two iterables of numbers of the same length
weights = range(0, 10)
values = range(0, 100, 10)
I need to compute the weighted sum. I know that it can be done with a list comprehension:
weighted_sum = sum(weight * value for weight, value in zip(weights, values))
I wonder if it can be done using map and operator.mul like
import operator
weighted_sum = sum(map(operator.mul, zip(weights, values)))
but this gives an error
Traceback (most recent call last):
File "<input>", line 3, in <module>
TypeError: op_mul expected 2 arguments, got 1
So my question: is there any way of passing unpacked tuples to a function using map?
map doesn't need the zip, just use
weighted_sum = sum(map(operator.mul, weights, values))
From map's documentation:
If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel.
map's documentation also mentions that you can use itertools.starmap instead of map for already-zipped input; a sketch follows below.
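For completeness, a minimal sketch of that starmap variant, which unpacks each (weight, value) pair into operator.mul:
import operator
from itertools import starmap
weights = range(0, 10)
values = range(0, 100, 10)
weighted_sum = sum(starmap(operator.mul, zip(weights, values)))
print(weighted_sum)  # 2850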
As Rahul hinted, numpy is a good fit when dealing with numerics; something like
import numpy as np
np.sum(np.asarray(weights) * values)
should do the trick (though, in contrast to map, this requires the two sequences to have the same length, while map stops at the shortest one).
Try this:
>>> import operator
>>>
>>> weights = range(0, 10)
>>> values = range(0, 100, 10)
>>> sum(map(lambda i:operator.mul(*i), zip(weights, values)))
2850
Or
>>> sum(map(operator.mul, weights, values))
2850
You can also try this with numpy:
In [45]: import numpy as np
In [46]: sum(map(np.multiply,weights,values))
Out[46]: 2850
As per Tobias Kienzler's suggestion:
In [52]: np.sum(np.array(weights) * values)
Out[52]: 2850
Why does fromiter fail if I want to apply a function over the entire matrix?
>>> aaa = np.matrix([[2],[23]])
>>> np.fromiter( [x/2 for x in aaa], np.float)
array([ 1., 11.])
This works fine, but if the matrix is 2D, I get the following error:
>>> aaa = np.matrix([[2,2],[1,23]])
>>> aaa
matrix([[ 2, 2],
[ 1, 23]])
>>> np.fromiter( [x/2 for x in aaa], np.float)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
What alternate can I use?
I know I can write two loops over the rows and columns, but that seems slow and not pythonic.
Thanks in advance.
Iterating over a multidimensional matrix iterates over the rows, not the cells. To iterate over each value, iterate over aaa.flat.
Note that fromiter (as documented) only creates one-dimensional arrays, which is why you have to iterate over the cells and not the rows. If you want to create a new matrix of some other shape, you'll have to reshape the resulting 1D array.
Also, of course, in many cases you don't need to iterate at all. For your example, you can just do aaa/2 to divide every element of the matrix by 2.
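Putting that together, a minimal sketch of both options:
import numpy as np
aaa = np.matrix([[2, 2], [1, 23]])
# fromiter builds a 1D array, so iterate over the cells and restore the shape afterwards
halved = np.fromiter((x / 2 for x in aaa.flat), dtype=float).reshape(aaa.shape)
# for simple element-wise operations, no iteration is needed at all
also_halved = aaa / 2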
I've been reading over the documentation on numpy arrays and some of it is not making sense.
For instance, the answer given here suggests using np.vstack or np.concatenate to combine arrays, as do many other places on the internet.
However, when I try to do this with lists converted to np.arrays, it doesn't work:
>>> some_list = [1,2,3,4,5]
>>> np.array(some_list)
array([1, 2, 3, 4, 5])
>>> some_Y_list = [2,1,5,6,3]
>>> np.array(some_Y_list)
array([2, 1, 5, 6, 3])
>>> dydx = np.diff(some_Y_list)/np.diff(some_list)
>>> np.vstack([dydx, dydx[-1]])
Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
np.vstack([dydx, dydx[-1]])
File "C:\Python27\lib\site-packages\numpy\core\shape_base.py", line 226, in vstack
return _nx.concatenate(map(atleast_2d,tup),0)
ValueError: array dimensions must agree except for d_0
Any way that I can do this?
All I need this for is to make the derivatives of any order the same shape as the X array given by the user, so I can do further processing.
Thanks for any help.
The following won't work except in some very limited circumstances:
np.vstack([dydx, dydx[-1]])
Here, dydx is an array and dydx[-1] is a scalar.
It's unclear what you're trying to achieve, but did you perhaps mean to stack them horizontally:
np.hstack([dydx, dydx[-1]])
?
In [38]: np.hstack([dydx, dydx[-1]])
Out[38]: array([-1, 4, 1, -3, -3])
What is the easiest and cleanest way to get the first AND the last elements of a sequence? E.g., I have a sequence [1, 2, 3, 4, 5], and I'd like to get [1, 5] via some kind of slicing magic. What I have come up with so far is:
l = len(s)
result = s[0:l:l-1]
I actually need this for a bit more complex task. I have a 3D numpy array, which is cubic (i.e. is of size NxNxN, where N may vary). I'd like an easy and fast way to get a 2x2x2 array containing the values from the vertices of the source array. The example above is an oversimplified, 1D version of my task.
Use this:
result = [s[0], s[-1]]
Since you're using a numpy array, you may want to use fancy indexing:
a = np.arange(27)
indices = [0, -1]
b = a[indices] # array([0, 26])
For the 3d case:
vertices = [(0,0,0),(0,0,-1),(0,-1,0),(0,-1,-1),(-1,-1,-1),(-1,-1,0),(-1,0,0),(-1,0,-1)]
indices = tuple(zip(*vertices))  # a tuple of per-axis indices; can be stored for later use
a = np.arange(27).reshape((3,3,3))  # dummy array for testing; can be any (N,N,N) size :)
vertex_values = a[indices].reshape((2,2,2))
I first write down all the vertices (though I suspect there is a clever way to do it with itertools that would scale this up to N dimensions; a sketch follows below). The order in which you specify the vertices is the order they will appear in the output array. Then I "transpose" the list of vertices (using zip) so that all the x indices are together, all the y indices are together, and so on (that's how numpy likes it). At this point, you can save that index tuple and use it to index your array whenever you want the corners of your box. You can easily reshape the result into a 2x2x2 array (although the order I have it in is probably not the order you want).
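Here is one way that itertools guess could look: itertools.product([0, -1], repeat=ndim) enumerates every first/last combination along each axis. A sketch (note its vertex order differs from the hand-written list above):
import itertools
import numpy as np
def corner_indices(ndim):
    # every combination of first (0) and last (-1) index along each axis
    vertices = itertools.product([0, -1], repeat=ndim)
    return tuple(zip(*vertices))  # transpose into per-axis index tuples
a = np.arange(27).reshape((3, 3, 3))
corners = a[corner_indices(3)].reshape((2,) * 3)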
This would give you a list of the first and last element in your sequence:
result = [s[0], s[-1]]
Alternatively, this would give you a tuple
result = s[0], s[-1]
For the particular case of an (N,N,N) ndarray X that you mention, would the following work for you?
s = slice(0,N,N-1)
X[s,s,s]
Example
>>> N = 3
>>> X = np.arange(N*N*N).reshape(N,N,N)
>>> s = slice(0,N,N-1)
>>> print(X[s,s,s])
[[[ 0 2]
[ 6 8]]
[[18 20]
[24 26]]]
>>> from operator import itemgetter
>>> first_and_last = itemgetter(0, -1)
>>> first_and_last([1, 2, 3, 4, 5])
(1, 5)
Why do you want to use a slice? Getting each element with
result = [s[0], s[-1]]
is better and more readable.
If you really need to use the slice, then your solution is the simplest working one that I can think of.
This also works for the 3D case you've mentioned.