Average arrays to same fixed length - python

I have multiple arrays of different lengths, and I would like to average them down to arrays of the same fixed length so that they are comparable, e.g.
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([1, 2, 3, 4])
target_length = 3
def cast(array, target_length):
    ...
This should give cast(array1, target_length) as:
np.array([(1+2*0.66)/1.66, (2*0.33+3*1+4*0.33)/1.66, (4*0.66+5)/1.66 ])
because 5/3 = 1.66. Likewise, cast(array2, target_length) should give:
np.array([(1+2*0.33)/1.33, (2*0.66+3*0.66)/1.33, (3*0.33+4)/1.33])
because 4/3 = 1.33.
The arrays never need to grow, since a good numpy solution already exists for that case.
Is there a solution using the numpy library?

The question could be read in a few different ways, but if I got it right, what you are trying to achieve is
def cast(array, target_length):
    target = np.zeros(target_length)
    # walk over target_length * len(array) equal-sized pieces
    for i in range(target_length*len(array)):
        target[i//len(array)] += array[i//target_length]/len(array)
    return target
If that's what you are aiming for, this may be obtained through numpy operations as
def cast(array, target_length):
    return np.mean(np.repeat(array, target_length).reshape(-1, len(array)), 1)
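As a quick sanity check, the sketch below runs both versions on the two arrays from the question and compares them with the hand-computed weighted averages. The names cast_loop and cast_vectorized are only used here to tell the two variants apart.

```python
import numpy as np

def cast_loop(array, target_length):
    # Explicit version: spread each source value over target_length pieces
    n = len(array)
    target = np.zeros(target_length)
    for i in range(target_length * n):
        target[i // n] += array[i // target_length] / n
    return target

def cast_vectorized(array, target_length):
    # Repeat every element target_length times, then average blocks of len(array)
    return np.repeat(array, target_length).reshape(-1, len(array)).mean(axis=1)

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([1, 2, 3, 4])

result1 = cast_vectorized(array1, 3)  # approximately 1.4, 3.0, 4.6
result2 = cast_vectorized(array2, 3)  # approximately 1.25, 2.5, 3.75
```

The first entry of result1 is (1 + 2*0.66)/1.66 = 1.4, which matches the expected value in the question. Note that this only shrinks arrays; it is not meant for target lengths larger than the input.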

Related

Adding 2 different NumPy arrays with different values inside (Boolean, int)

I am taking the Data Science course on DataCamp. One of the examples did not really explain the numpy addition rules. The example and my question are below. What I did not understand is how two arrays containing different kinds of values can be added together and give a result like this.
In [1]:
np.array([True, 1, 2]) + np.array([3, 4, False])
Out[1]:
array([4, 5, 2])
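You can think of a 1D numpy array as being similar to a Python list. All elements of an array share one type, so when the array is created True and False are converted to the integers 1 and 0.
You can see this if you cast the arrays to lists like this: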
# cast to a list
a = np.array([True, 1, 2]).tolist()
b = np.array([3, 4, False]).tolist()
# print them out
print(a) # [1,1,2]
print(b) # [3,4,0]
This prints:
[1, 1, 2]
[3, 4, 0]
The addition then simply adds the elements pairwise:
a[0]+b[0] , a[1]+b[1], a[2]+b[2]
So the (numpy) result is this:
[4,5,2]
Because you are using numpy (which is a Python module), the plus (+) operator adds the two arrays element by element and returns the result as a numpy array.
Note: numpy arrays are similar, but not identical, to Python lists. With plain lists, + concatenates instead of adding.
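A small sketch of what happens to the types, using the same two arrays as in the example. The exact integer dtype (for instance int64 or int32) depends on the platform, so only its kind is checked here.

```python
import numpy as np

a = np.array([True, 1, 2])
b = np.array([3, 4, False])

# Mixing bool and int in one array promotes everything to an integer dtype
kind_a = a.dtype.kind        # 'i' means signed integer
result = a + b               # element-wise sum: 4, 5, 2

# Plain Python lists behave differently: + concatenates them
as_lists = [True, 1, 2] + [3, 4, False]   # six elements, nothing is added
```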

Is there any way of getting multiple ranges of values in numpy array at once?

Let's say we have a simple 1D ndarray. That is:
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
I want to get the first 3 and the last 2 values, so that the output would be [ 1 2 3 9 10].
I have already solved this by taking two slices and concatenating them as follows:
b= a[:2]
c= a[-2:]
a=np.concatenate([b,c])
However, I would like to know if there is a more direct way to achieve this using slices, such as a[:2 and -2:] for instance. As an alternative I already tried this:
a = a[np.r_[:2, -2:]]
but it does not seem to work. It returns only the first 2 values, that is [1 2].
Thanks in advance!
As far as I know, a numpy slice has to be contiguous, i.e. a single start:stop:step range. The np.r_[-2:] does not work because np.r_ does not know how big the array a is. You could do np.r_[:2, len(a)-2:len(a)], but this will still copy the data since you are indexing with another array.
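The np.r_ approach can be sketched as follows for the "first 3 and last 2" case from the question. Besides the explicit len(a) bounds, a negative start with an explicit stop of 0 also works, because np.r_ expands -2:0 into the indices -2 and -1. Either way the result is a copy, not a view.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Explicit positive bounds for the tail
idx_explicit = np.r_[:3, len(a) - 2:len(a)]   # indices 0, 1, 2, 8, 9

# Negative start with an explicit stop of 0 expands to -2, -1
idx_negative = np.r_[:3, -2:0]                # indices 0, 1, 2, -2, -1

first_and_last = a[idx_negative]              # values 1, 2, 3, 9, 10
```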
If you want to avoid copying data or doing any concatenation operation you could use np.lib.stride_tricks.as_strided:
ds = a.dtype.itemsize
np.lib.stride_tricks.as_strided(a, shape=(2,2), strides=(ds * 8, ds)).ravel()
Output:
array([ 1, 2, 9, 10])
But since you want the first 3 and last 2 values the stride for accessing the elements will not be equal. This is a bit trickier, but I suppose you could do:
np.lib.stride_tricks.as_strided(a, shape=(2,3), strides=(ds * 8, ds)).ravel()[:-1]
Output:
array([ 1, 2, 3, 9, 10])
However, this is a potentially dangerous operation, because the last element of the strided view is read from outside the memory allocated for a.
On reflection, I cannot find a way to do this operation without copying the data somehow. The ravel call in the code snippets above is forced to make a copy of the data. If you can live with the shapes (2,2) or (2,3) it might work in some cases, but you should only read from such a strided view, and this should be enforced by setting the keyword writeable=False.
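A minimal sketch of the safe (2,2) variant as a read-only view, assuming a is the 10-element array from the question. The view shares memory with a, while ravel on it has to copy because the view is not contiguous.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
ds = a.dtype.itemsize

# Rows start 8 elements apart, columns 1 element apart; the largest
# offset touched is 8 + 1 = 9, which is still inside a
view = as_strided(a, shape=(2, 2), strides=(ds * 8, ds), writeable=False)

shares = np.shares_memory(view, a)      # the view itself does not copy
flat = view.ravel()                     # this step copies the four values
```

With writeable=False, an accidental assignment such as view[0, 0] = 99 raises an error instead of silently changing a.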
You could try to access the elements with a list of indices.
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a[[0,1,2,8,9]] # b should now be array([ 1, 2, 3, 9, 10])
Obviously, if your array is long, you would not want to type out all the indices.
Instead, you could build the index list from ranges.
Something like this:
index_list = list(range(3)) + list(range(8, 10))
b = a[index_list] # b should now be array([ 1, 2, 3, 9, 10])
Therefore, as long as you know where your desired elements are, you can access them individually.

Nested array computations in Python using numpy

I am trying to use numpy in Python for my project.
I have a random binary array rndm = [1, 0, 1, 1] and a resource_arr = [[2, 3], 4, 2, [1, 2]]. What I am trying to do is multiply the arrays element-wise and then sum each nested entry. The expected output for the sample above is
output = 5 0 2 3. I find this hard to solve because of the nested array/list.
So far my code looks like this:
def fitness_score():
    output = numpy.add(rndm * resource_arr)
    return output
fitness_score()
I keep getting
ValueError: invalid number of arguments.
I think this is caused by the addition that I am trying to do. Any help would be appreciated. Thank you!
Numpy treats its arrays as matrices, and resource_arr is not a valid matrix because it mixes lists and plain numbers. In your case a python list is more suitable:
def sum_nested(l):
    tmp = []
    for element in l:
        if isinstance(element, list):
            tmp.append(numpy.sum(element))
        else:
            tmp.append(element)
    return tmp
In this function we check for each element inside l if it is a list. If so, we sum its elements. On the other hand, if the encountered element is just a number, we leave it untouched. Please note that this only works for one level of nesting.
Now, if we run sum_nested([[2, 3], 4, 2, [1, 2]]) we will get [5, 4, 2, 3]. All that's left is multiplying this result by the elements of rndm, which can be achieved easily using numpy:
def fitness_score(a, b):
    return numpy.multiply(a, sum_nested(b))
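Putting the two functions together with the data from the question gives a complete, runnable sketch. The list comprehension is just a compact rewrite of sum_nested, and like the original it only handles one level of nesting.

```python
import numpy as np

def sum_nested(values):
    # Sum each inner list, leave plain numbers untouched (one nesting level only)
    return [sum(v) if isinstance(v, list) else v for v in values]

def fitness_score(a, b):
    # Element-wise product of the binary mask and the per-entry sums
    return np.multiply(a, sum_nested(b))

rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]

output = fitness_score(rndm, resource_arr)   # values 5, 0, 2, 3
```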
Numpy is all about non-jagged arrays. You can do things with jagged arrays, but doing so efficiently and elegantly isn't trivial.
Almost always, trying to find a way to map your data structure to a non-nested one, for instance by encoding the information as below, will be more flexible and more performant.
resource_arr = (
    [0, 0, 1, 2, 3, 3],
    [2, 3, 4, 2, 1, 2],
)
That is, an integer denoting the 'row' each value belongs to, paired with an array of equal size of the values themselves.
This may 'feel' wasteful when coming from a C-style way of doing arrays (omg more memory consumption), but staying away from nested data structures is almost certainly your best bet in terms of performance and in terms of how much of the numpy/scipy ecosystem will actually be compatible with your data representation. Whether it really uses more memory is actually rather questionable; every new Python object uses a ton of bytes, so if you have only a few elements per nesting, the flat encoding is the more memory-efficient solution too.
In this case, that would give you the following efficient solution to your problem:
output = np.bincount(*resource_arr) * rndm
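A sketch of how the flat encoding could be built from the nested list in the question and then reduced with np.bincount. The helper name flatten_rows is hypothetical. Note that bincount with weights returns floats.

```python
import numpy as np

def flatten_rows(nested):
    # Hypothetical helper: returns (row index per value, values) for one nesting level
    rows, values = [], []
    for i, item in enumerate(nested):
        items = item if isinstance(item, list) else [item]
        rows.extend([i] * len(items))
        values.extend(items)
    return np.array(rows), np.array(values)

rndm = np.array([1, 0, 1, 1])
rows, values = flatten_rows([[2, 3], 4, 2, [1, 2]])

# Sum the values that share a row index, then apply the binary mask
row_sums = np.bincount(rows, weights=values)   # 5.0, 4.0, 2.0, 3.0
output = row_sums * rndm                       # 5.0, 0.0, 2.0, 3.0
```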
I have not worked much with pandas/numpy, so I'm not sure if this is the most efficient way, but it works (at least for the example you have shown):
import numpy as np
rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]
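# dtype=object keeps the ragged list as a 1D array of Python objects;
# recent NumPy versions raise an error for ragged input otherwise
multiplied_output = np.multiply(rndm, np.array(resource_arr, dtype=object))
print(multiplied_output)
output = []
for elem in multiplied_output:
    # sum the nested lists, keep plain numbers as they are
    output.append(sum(elem) if isinstance(elem, list) else elem)
final_output = np.array(output)
print(final_output)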

Numpy minimum like np.outer()

Maybe I'm just being lazy here, but let's say that I have two arrays, of length n and m, and I'd like a pairwise minimum of all of the elements of the two arrays compared against each other. For example:
a = [1,5,3]
b = [2,4]
cross_min(a,b)
= [[1,1],[2,4],[2,3]]
This is similar to the behavior of np.outer(), except that instead of multiplying the two arrays, it computes the minimum of the two elements.
Is there an operation in numpy that does a similar thing?
I know that I can just run np.minimum() along b and stack the results together. I'm wondering if this is a well-known operation that I just don't know the name of.
You can use np.minimum.outer(a, b). Every binary numpy ufunc has an outer method that applies it to all pairs of elements.
You could also turn one of the arrays into a 2D column, and then make use of the broadcasting rules and np.minimum:
import numpy as np
a = np.array([1,5,3])
b = np.array([2,4])
np.minimum(a[:,None], b)
# array([[1, 1],
#        [2, 4],
#        [2, 3]])
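Both suggestions give the same result, which the following sketch checks on the arrays from the question. The outer method is available on other binary ufuncs as well, for example np.maximum or np.add.

```python
import numpy as np

a = np.array([1, 5, 3])
b = np.array([2, 4])

# ufunc.outer applies the ufunc to every pair (a[i], b[j])
via_outer = np.minimum.outer(a, b)

# Same thing with broadcasting: a column of shape (3, 1) against a row of shape (2,)
via_broadcast = np.minimum(a[:, None], b)

# The same pattern works for other ufuncs, here the pairwise maximum
pairwise_max = np.maximum.outer(a, b)
```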

Pythonic way to get the first AND the last element of the sequence

What is the easiest and cleanest way to get the first AND the last elements of a sequence? E.g., I have a sequence [1, 2, 3, 4, 5], and I'd like to get [1, 5] via some kind of slicing magic. What I have come up with so far is:
l = len(s)
result = s[0:l:l-1]
I actually need this for a bit more complex task. I have a 3D numpy array, which is cubic (i.e. is of size NxNxN, where N may vary). I'd like an easy and fast way to get a 2x2x2 array containing the values from the vertices of the source array. The example above is an oversimplified, 1D version of my task.
Use this:
result = [s[0], s[-1]]
Since you're using a numpy array, you may want to use fancy indexing:
a = np.arange(27)
indices = [0, -1]
b = a[indices] # array([0, 26])
For the 3d case:
vertices = [(0,0,0),(0,0,-1),(0,-1,0),(0,-1,-1),(-1,-1,-1),(-1,-1,0),(-1,0,0),(-1,0,-1)]
indices = tuple(zip(*vertices))  # must be a tuple (one entry per axis); can be stored for later use
a = np.arange(27).reshape((3,3,3))  # dummy array for testing; can be any shape
vertex_values = a[indices].reshape((2,2,2))
I first write down all the vertices (although I am willing to bet there is a clever way to do it using itertools which would let you scale this up to N dimensions ...). The order in which you specify the vertices is the order they will have in the output array. Then I "transpose" the list of vertices (using zip) so that all the x indices are together, all the y indices are together, etc. (that's how numpy likes it). At this point, you can save that index tuple and use it to index your array whenever you want the corners of your box. You can easily reshape the result into a 2x2x2 array (although the order I have it in is probably not the order you want).
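One possible itertools version of the idea above, as a sketch: itertools.product enumerates all combinations of first (0) and last (-1) index per axis, so the same code covers any number of dimensions. The function name corner_values is made up for this example, and the corner order follows product rather than the hand-written list.

```python
import itertools
import numpy as np

def corner_values(arr):
    # All combinations of first/last index along every axis, e.g. 8 corners in 3D
    corners = itertools.product((0, -1), repeat=arr.ndim)
    # Transpose to one tuple of indices per axis, as numpy indexing expects
    index = tuple(zip(*corners))
    return arr[index].reshape((2,) * arr.ndim)

a = np.arange(27).reshape((3, 3, 3))
vertex_values = corner_values(a)

b = np.arange(12).reshape((3, 4))
corners_2d = corner_values(b)   # works for non-square shapes too
```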
This would give you a list of the first and last element in your sequence:
result = [s[0], s[-1]]
Alternatively, this would give you a tuple:
result = s[0], s[-1]
With the particular case of a (N,N,N) ndarray X that you mention, would the following work for you?
s = slice(0,N,N-1)
X[s,s,s]
Example
>>> N = 3
>>> X = np.arange(N*N*N).reshape(N,N,N)
>>> s = slice(0,N,N-1)
>>> print(X[s,s,s])
[[[ 0  2]
  [ 6  8]]
 [[18 20]
  [24 26]]]
>>> from operator import itemgetter
>>> first_and_last = itemgetter(0, -1)
>>> first_and_last([1, 2, 3, 4, 5])
(1, 5)
Why do you want to use a slice? Getting each element with
result = [s[0], s[-1]]
is better and more readable.
If you really need to use the slice, then your solution is the simplest working one that I can think of.
This also works for the 3D case you've mentioned.
