I have a multi-dimensional array of scores, and I need to get the sum of each column at the third level in Python. I am using NumPy to achieve this.
import numpy as np
Data is something like:
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
This should return:
[[3 8 8] [5 3 8]]
This works correctly using:
np_array = np.array(score_list)
sum_array = np_array.sum(axis=0)
print(sum_array)
However, if I have an irregular shape like this:
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
I expect it to return:
[[3 8] [5 3 8]]
However, it comes up with a warning and the return value is:
[list([1, 1, 2, 7]) list([1, 2, 5, 4, 1, 3])]
How can I get the expected result?
NumPy will try to cast it into an ndarray, which will fail for the ragged shape. Instead, consider passing each group of sublists individually using zip:
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
import numpy as np
res = [np.sum(x,axis=0) for x in zip(*score_list)]
print(res)
[array([3, 8]), array([5, 3, 8])]
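If plain Python lists are preferred over per-group arrays, a small optional follow-up (not part of the original answer) converts each array back:
res_lists = [r.tolist() for r in res]
print(res_lists)  # [[3, 8], [5, 3, 8]]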
Here is one solution for doing this. Keep in mind that it doesn't use NumPy and will be very inefficient for larger matrices (but for smaller matrices it runs just fine).
# Create matrix
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
# Get each row
for i in range(1, len(score_list)):
    # Get each list within the row
    for j in range(len(score_list[i])):
        # Get each value in each list
        for k in range(len(score_list[i][j])):
            # Add the current value to the same index
            # on the first row
            score_list[0][j][k] += score_list[i][j][k]
print(score_list[0])
There is bound to be a better solution, but this is a temporary fix for you :)
Edit: made it more efficient.
A possible solution:
a = np.vstack([np.array(score_list[x], dtype='object')
               for x in range(len(score_list))])
# a is a 2x2 object array holding the inner lists; np.add takes two
# operands, so this assumes score_list has exactly two rows
[np.add(*[x for x in a[:, i]]) for i in range(a.shape[1])]
Another possible solution:
a = sum(score_list, [])                   # flatten one level: all inner lists in order
b = [a[x] for x in range(0, len(a), 2)]   # first sublist of each row (assumes two sublists per row)
c = [a[x] for x in range(1, len(a), 2)]   # second sublist of each row
[np.add(x[0], x[1]) for x in [b, c]]      # assumes exactly two rows
Output:
[array([3, 8]), array([5, 3, 8])]
I have an array like the following, but much larger:
array = np.random.randint(6, size=(5, 4))
array([[4, 3, 0, 2],
[1, 4, 3, 1],
[0, 3, 5, 2],
[1, 0, 5, 3],
[0, 5, 4, 4]])
I also have a dictionary which gives me the vector representation of each value in this array:
dict_ = {2: np.array([3.4, 2.6, -1.2]), 0: np.array([0, 0, 0]), 1: np.array([3.9, 2.6, -1.2]), 3: np.array([3.8, 6.6, -1.9]), 4: np.array([5.4, 2.6, -1.2]), 5: np.array([6.4, 2.6, -1.2])}
I want to calculate the average of the vector representations for each row in the array, but when the value is 0, ignore it when calculating average (dictionary shows it as a 0 vector).
For example, for the first row, it should average [5.4, 2.6, -1.2], [3.8, 6.6, -1.9], and [3.4, 2.6, -1.2], and give [4.2, 3.93, -1.43] as the first row of the output.
I want an output which keeps the same row structure, and has 3 columns (each vector in the dictionary has 3 values).
How can this be done in an efficient way? My actual dictionary has over 100,000 entries and the array is 100,000 by 5,000.
For efficiency I would transform the dict to an array and then use advanced indexing for lookup:
>>> import numpy as np
>>>
# create problem
>>> v = np.random.random((100_000, 3))
>>> dict_ = dict(enumerate(v))
>>> arr = np.random.randint(0, 100_000, (100_000, 100))
>>>
# solve
>>> from operator import itemgetter
>>> lookup = np.array(itemgetter(*range(100_000))(dict_))
>>> lookup[0] = np.nan
>>> result = np.nanmean(lookup[arr], axis=1)
Or applied to OP's example:
>>> arr = np.array([[4, 3, 0, 2],
... [1, 4, 3, 1],
... [0, 3, 5, 2],
... [1, 0, 5, 3],
... [0, 5, 4, 4]])
>>> dict_ = {2:np.array([3.4, 2.6, -1.2]), 0:np.array([0, 0, 0]), 1:np.array([3.9, 2.6, -1.2]), 3:np.array([3.8, 6.6, -1.9]), 4:np.array([5.4, 2.6, -1.2]),5:np.array([6.4, 2.6, -1.2])}
>>>
>>> lookup = np.array(itemgetter(*range(6))(dict_))
>>> lookup[0] = np.nan
>>> result = np.nanmean(lookup[arr], axis=1)
>>> result
array([[ 4.2 , 3.93333333, -1.43333333],
[ 4.25 , 3.6 , -1.375 ],
[ 4.53333333, 3.93333333, -1.43333333],
[ 4.7 , 3.93333333, -1.43333333],
[ 5.73333333, 2.6 , -1.2 ]])
Timings against @jpp's method:
pp: 0.8046 seconds
jpp: 10.3449 seconds
results equal: True
Code to produce timings:
import numpy as np
# create problem
v = np.random.random((100_000, 3))
dict_ = dict(enumerate(v))
arr = np.random.randint(0, 100_000, (100_000, 100))
# solve
from operator import itemgetter
def f_pp(arr, dict_):
    lookup = np.array(itemgetter(*range(100_000))(dict_))
    lookup[0] = np.nan
    return np.nanmean(lookup[arr], axis=1)

def f_jpp(arr, dict_):
    def averager(x):
        lst = [dict_[i] for i in x if i]
        return np.mean(lst, axis=0) if lst else np.array([0, 0, 0])
    return np.apply_along_axis(averager, -1, arr)
from time import perf_counter
t = perf_counter()
r_pp = f_pp(arr, dict_)
s = perf_counter()
print(f'pp: {s-t:8.4f} seconds')
t = perf_counter()
r_jpp = f_jpp(arr, dict_)
s = perf_counter()
print(f'jpp: {s-t:8.4f} seconds')
print('results equal:', np.allclose(r_pp, r_jpp))
This is one solution using numpy.apply_along_axis.
You should test and benchmark to see if performance is adequate for your use case.
A = np.random.randint(6, size=(5, 4))
print(A)
[[3 5 2 4]
[2 4 5 2]
[0 3 1 1]
[3 4 4 5]
[2 5 0 2]]
zeros = {k for k, v in dict_.items() if (v==0).all()}
def averager(x):
    lst = [dict_[i] for i in x if i not in zeros]
    return np.mean(lst, axis=0) if lst else np.array([0, 0, 0])
res = np.apply_along_axis(averager, -1, A)
array([[ 4.75 , 3.6 , -1.375 ],
[ 4.65 , 2.6 , -1.2 ],
[ 3.86666667, 3.93333333, -1.43333333],
[ 5.25 , 3.6 , -1.375 ],
[ 4.4 , 2.6 , -1.2 ]])
I am trying to normalize each row vector of the NumPy array x, but I'm facing two problems.
I'm unable to update the row vectors of x (source code below).
Is it possible to avoid the for loop (line 6) with any NumPy functions?
import numpy as np
x = np.array([[0, 3, 4] , [1, 6, 4]])
c = x ** 2
for i in range(0, len(x)):
    print(x[i]/np.sqrt(c[i].sum()))  # prints [0. 0.6 0.8]
    x[i] = x[i]/np.sqrt(c[i].sum())
    print(x[i])  # prints [0 0 0]
print(x)  # prints [[0 0 0] [0 0 0]] and wasn't updated
I've just recently started out with numpy, so any assistance would be greatly appreciated!
I'm unable to update the row vectors of x (source code below)
Your np.array call has no dtype argument, so it defaults to an integer dtype such as numpy.int32. If you wish to store floats in the array, add a float dtype:
x = np.array([
    [0, 3, 4],
    [1, 6, 4]
], dtype=float)
To see this, compare
x = np.array([
    [0, 3, 4],
    [1, 6, 4]
], dtype=float)
print(type(x[0][0]))  # output: <class 'numpy.float64'>
to
x = np.array([
    [0, 3, 4],
    [1, 6, 4]
])
print(type(x[0][0]))  # output: <class 'numpy.int32'> (or numpy.int64, depending on platform)
Is it possible to avoid the for loop (line 6) with any NumPy functions?
This is how I would do it:
norm1, norm2 = np.linalg.norm(x[0]), np.linalg.norm(x[1])
print(x[0] / norm1)
print(x[1] / norm2)
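The same idea can also be applied to all rows at once; here is a small sketch (not part of the original answer) using the axis and keepdims arguments of np.linalg.norm:
norms = np.linalg.norm(x, axis=1, keepdims=True)  # one L2 norm per row, shape (2, 1)
print(x / norms)  # broadcasting divides each row by its own norm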
You can use:
x/np.sqrt((x*x).sum(axis=1))[:, None]
Example:
In [9]: x = np.array([[0, 3, 4] , [1, 6, 4]])
In [10]: x/np.sqrt((x*x).sum(axis=1))[:, None]
Out[10]:
array([[0. , 0.6 , 0.8 ],
[0.13736056, 0.82416338, 0.54944226]])
For the first question:
x = np.array([[0,3,4],[1,6,4]],dtype=np.float32)
For the second question:
x/np.sqrt(np.sum(x**2,axis=1).reshape((len(x),1)))
Given 2-dimensional array
x = np.array([[0, 3, 4] , [1, 6, 4]])
Row-wise L2 norm of that array can be calculated with:
norm = np.linalg.norm(x, axis = 1)
print(norm)
[5. 7.28010989]
You cannot divide the array x of shape (2, 3) by norm of shape (2,); the following trick enables that by adding an extra dimension to norm:
# Divide by adding extra dimension
x = x / norm[:, None]
print(x)
[[0. 0.6 0.8 ]
[0.13736056 0.82416338 0.54944226]]
This solves both of your questions.
Say you have two NumPy arrays: one, call it A = [x1,x2,x3,x4,x5], which has all the x coordinates, and another, call it B = [y1,y2,y3,y4,y5], which has the y coordinates. How would one "extract" a set of coordinates, e.g. (x1, y1), so that I could actually do something with it? Could I use a for loop or something similar? I can't seem to find any good examples, so if you could direct me to some or show me some I would be grateful.
Not sure if that's what you're looking for, but you can use numpy.concatenate. You just have to add a fake dimension beforehand with [:, None]:
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6,7,8,9,10])
arr_2d = np.concatenate([a[:,None],b[:,None]], axis=1)
print(arr_2d)
# [[ 1 6] [ 2 7] [ 3 8] [ 4 9] [ 5 10]]
Once you have generated a 2D array you can just use arr_2d[i] to get the i-th set of coordinates.
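For instance, continuing with the arr_2d built above:
print(arr_2d[0])    # [1 6]
x0, y0 = arr_2d[0]  # unpack the first coordinate pair into separate values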
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])
print(np.hstack([a[:, np.newaxis], b[:, np.newaxis]]))
[[ 1 6]
[ 2 7]
[ 3 8]
[ 4 9]
[ 5 10]]
As @user2314737 said in a comment, you could manually do it by simply grabbing the same element from each array, like so:
a = np.array([1,2,3])
b = np.array([4,5,6])
index = 2 #completely arbitrary index choice
#as individual values
pointA = a[index]
pointB = b[index]
#or in tuple form
point = (a[index], b[index])
If you need all of them converted to coordinate form, then @Nuageux's answer is probably better.
Let's say you have x = np.array([ 0.48, 0.51, -0.43, 2.46, -0.91]) and y = np.array([ 0.97, -1.07, 0.62, -0.92, -1.25])
Then you can use the zip function
zip(x,y)
This will create an iterator (in Python 3). Turn it into a list and convert the result into a NumPy array:
np.array(list(zip(x,y)))
the result will look like this
array([[ 0.48, 0.97],
[ 0.51, -1.07],
[-0.43, 0.62],
[ 2.46, -0.92],
[-0.91, -1.25]])
I have:
import numpy as np
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7, ..., 4])
x = (B/position**2)*dt
A = np.cumsum(x)
assert A[0] == 0 # I want this to be true.
Where B and dt are scalar constants. This is for a numerical integration problem with the initial condition A[0] = 0. Is there a way to set A[0] = 0 and then do a cumsum for everything else?
I don't understand what exactly your problem is, but here are some things you can do to have A[0] = 0.
You can create A to be longer by one entry so that the zero is the first element:
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.zeros(len(position) + 1)
A[1:] = np.cumsum((B/position**2)*dt)
Result:
A = [ 0. 0.0625 0.11559096 0.16105356 0.20073547 0.23633533 0.26711403]
len(A) == len(position) + 1
Alternatively, you can manipulate the calculation to subtract the first entry of the result:
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.cumsum((B/position**2)*dt)
A = A - A[0]
Result:
[ 0. 0.05309096 0.09855356 0.13823547 0.17383533 0.20461403]
len(A) == len(position)
As you see, the results have different lengths. Is one of them what you expect?
1D cumsum
A wrapper around np.cumsum that sets the first element to 0:
def cumsum(pmf):
    cdf = np.empty(len(pmf) + 1, dtype=pmf.dtype)
    cdf[0] = 0
    np.cumsum(pmf, out=cdf[1:])
    return cdf
Example usage:
>>> np.arange(1, 11)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> cumsum(np.arange(1, 11))
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55])
N-D cumsum
A wrapper around np.cumsum that sets the first element to 0 and works with N-D arrays:
def cumsum(pmf, axis=None, dtype=None):
    if axis is None:
        pmf = pmf.reshape(-1)
        axis = 0
    if dtype is None:
        dtype = pmf.dtype
    idx = [slice(None)] * pmf.ndim
    # Create array with extra element along cumsummed axis.
    shape = list(pmf.shape)
    shape[axis] += 1
    cdf = np.empty(shape, dtype)
    # Set first element to 0.
    idx[axis] = 0
    cdf[tuple(idx)] = 0
    # Perform cumsum on remaining elements.
    idx[axis] = slice(1, None)
    np.cumsum(pmf, axis=axis, dtype=dtype, out=cdf[tuple(idx)])
    return cdf
Example usage:
>>> np.arange(1, 11).reshape(2, 5)
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
>>> cumsum(np.arange(1, 11).reshape(2, 5), axis=-1)
array([[ 0, 1, 3, 6, 10, 15],
[ 0, 6, 13, 21, 30, 40]])
I totally understand your pain; I wonder why NumPy doesn't allow this with np.cumsum. Anyway, though I'm really late and there's already another good answer, I prefer this one a bit more:
np.cumsum(np.pad(array, (1, 0), "constant"))
where array in your case is (B/position**2)*dt. You can change the order of np.pad and np.cumsum as well. I'm just adding a zero to the start of the array and calling np.cumsum.
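As a quick check of both orderings on a small array (an illustrative example, not from the original answer):
arr = np.arange(1, 5)
print(np.cumsum(np.pad(arr, (1, 0), "constant")))  # [ 0  1  3  6 10]
print(np.pad(np.cumsum(arr), (1, 0), "constant"))  # [ 0  1  3  6 10]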
You can use np.roll (shift right by one) and then set the first entry to zero.
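A minimal sketch of that idea, using B, position, and dt from the question (note that this keeps the original length, so the final cumulative total is dropped):
x = (B / position**2) * dt
A = np.roll(np.cumsum(x), 1)  # shift the cumulative sums right by one; the last total wraps to the front
A[0] = 0                      # overwrite the wrapped value with the initial condition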