I am looking to create an array via numpy that contains equally spaced values, interval by interval, based on the values in a given array.
I understand there is:
np.linspace(min, max, num_elements)
but what I am looking for is different. Imagine you have a set of values:
arr = np.array([1, 2, 4, 6, 7, 8, 12, 10])
When I do:
#some numpy function
arr = np.somefunction(arr, 16)
>>>arr
>>> array([1, 1.12, 2, 2.5, 4, 4.5, etc...])
# new array with 16 elements including all the numbers from previous
# array with generated numbers to 'evenly space them out'
So I am looking for the same functionality as linspace(), but one that takes all the elements in an array and creates another array with the desired number of elements, evenly spaced between the set values in the array. I hope I am making myself clear on this.
What I am trying to actually do with this setup is take existing x,y data and expand it to have more 'control points', in a sense, so I can do calculations with it later.
Thank you in advance.
import numpy as np

xp = np.arange(len(arr))                     # X coordinates of arr
targets = np.arange(0, len(arr) - 0.5, 0.5)  # X coordinates desired
np.interp(targets, xp, arr)
The above does simple linear interpolation of 8 data points at 0.5 spacing for a total of 15 points (because of fenceposting):
array([ 1. , 1.5, 2. , 3. , 4. , 5. , 6. , 6.5, 7. ,
7.5, 8. , 10. , 12. , 11. , 10. ])
There are some additional options you can use in numpy.interp to tweak the behavior. You can also generate targets in different ways if you want.
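For example, a sketch of my own (not from the original answer) of generating the targets so the result has exactly 16 elements, as in the question's hypothetical np.somefunction(arr, 16); note that with 16 points over 8 original values the original samples are not all hit exactly, again because of fenceposting:
import numpy as np

arr = np.array([1, 2, 4, 6, 7, 8, 12, 10])
xp = np.arange(len(arr))                    # original sample positions: 0..7
targets = np.linspace(0, len(arr) - 1, 16)  # 16 evenly spaced positions over the same span
expanded = np.interp(targets, xp, arr)      # linear interpolation at those positions
print(expanded.shape)                       # (16,)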
I am trying to create an array of equally spaced points using numpy, as below:
array([ 0. , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
0.78947368, 0.84210526, 0.89473684, 0.94736842, 1. ])
This array is 20 points between 0 and 1, all with the same amount of space between them.
Looking at the documentation for numpy.array, however, I don't see a way to do this by passing in an extra parameter. Is this possible to do as a quick-and-easy one-liner?
Use linspace.
np.linspace(0, 1, 20)
http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
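As a quick check (my own note, not from the answer), linspace includes both endpoints, so the step between the 20 points is 1/19:
import numpy as np

pts = np.linspace(0, 1, 20)   # 20 evenly spaced points, both endpoints included
print(pts[1] - pts[0])        # 0.05263157894736842, i.e. 1/19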
I am trying to create a square NumPy (or PyTorch, since PyTorch code can be turned into NumPy with minimal effort) matrix which has the following property: given a set of values, the diagonal element of each row is the largest value, and the remaining values are randomly shuffled across the other positions.
For example, if I have [1, 2, 3, 4], a possible desired output is:
[[4, 3, 1, 2],
[1, 4, 3, 2],
[2, 1, 4, 3],
[2, 3, 1, 4]]
There can be (several) other possible outputs, as long as the diagonal elements are the largest value (4 in this case) and the off-diagonal elements in each row contain the other values but shuffled.
A hacky/inefficient way of doing this could be to first create a square matrix (4x4) of zeros, put the largest value (4) in all the diagonal positions, and then traverse the matrix row by row, filling every position of row i except index i with a shuffled version of the remaining values ([1, 2, 3]). This would be very slow as the matrix size increases. Is there a cleaner/faster/more Pythonic way of doing it? Thank you.
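For concreteness, a minimal sketch of that row-by-row baseline might look like this (my own illustration of the approach described above, not optimized):
import numpy as np

vals = np.array([1, 2, 3, 4])
n = len(vals)
rng = np.random.default_rng()
others = np.delete(vals, np.argmax(vals))  # everything except the largest value
out = np.zeros((n, n), dtype=vals.dtype)
for i in range(n):                         # the per-row loop that gets slow for large n
    shuffled = rng.permutation(others)
    out[i, :i] = shuffled[:i]
    out[i, i] = vals.max()
    out[i, i + 1:] = shuffled[i:]
print(out)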
First you can generate a randomized array along the first axis with np.random.shuffle(); then I've used a (not so easy to understand) mathematical trick to shift each row:
import numpy as np
from numpy.fft import fft, ifft
# First create your randomized array with np.random.shuffle()
x = np.array([[1, 2, 3, 4],
              [2, 4, 3, 1],
              [4, 1, 2, 3],
              [2, 3, 1, 4]])
# We use np.where to determine which column each 4 is in.
_, s = np.where(x == 4)
# We compute the left shift that needs to be applied to each row in order to get each 4 on the diagonal
s = s - np.r_[0:x.shape[0]]
# And here is the trick: we can use the fast Fourier transform in order to left shift each row by a given value:
L = np.real(ifft(fft(x, axis=1)*np.exp(2*1j*np.pi/x.shape[1]*s[:, None]*np.r_[0:x.shape[1]][None, :]), axis=1).round())
# Notice that we could also use a right shift, we simply have to negate our exponential exponent:
# np.exp(-2*1j*np.pi...
And we obtain the following matrix:
[[4. 1. 2. 3.]
[2. 4. 1. 3.]
[2. 3. 4. 1.]
[3. 2. 1. 4.]]
No hidden for loop, only pure linear algebra stuff.
To give you an idea, it takes only a few milliseconds for a 1000x1000 matrix on my computer and ~20s for a 10000x10000 matrix.
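For comparison, here is a sketch of my own (not part of the answer above) that does the same per-row circular shift with index arithmetic and np.take_along_axis instead of the FFT:
import numpy as np

x = np.array([[1, 2, 3, 4],
              [2, 4, 3, 1],
              [4, 1, 2, 3],
              [2, 3, 1, 4]])
n = x.shape[1]
_, s = np.where(x == x.max())                        # column of the largest value in each row
shift = s - np.arange(x.shape[0])                    # left shift that puts it on the diagonal
cols = (np.arange(n)[None, :] + shift[:, None]) % n  # per-row gather indices
print(np.take_along_axis(x, cols, axis=1))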
I'm trying to normalize an array within a range, e.g. [10,100]
But I also want to manually specify additional points in my result array, for example:
num = [1,2,3,4,5,6,7,8]
num_expected = [min(num), 5, max(num)]
expected_range = [10, 20, 100]
result_array = normalize(num, num_expected, expected_range)
Intended results:
Values from 1-5 are normalized to range (10,20].
5 in num array is mapped to 20 in expected range.
Values from 6-8 are normalized to range (20,100].
I know I can do it by normalizing the array twice, but I might have many additional points to add. I was wondering if there's any built-in function in numpy or scipy to do this?
I've checked MinMaxScaler in sklearn, but did not find the functionality I want.
Thanks!
Linear interpolation will do exactly what you want:
import scipy.interpolate
interp = scipy.interpolate.interp1d(num_expected, expected_range)
Then just pass numbers or arrays of numbers that you want to interpolate:
In [20]: interp(range(1, 9))
Out[20]:
array([ 10. , 12.5 , 15. , 17.5 ,
20. , 46.66666667, 73.33333333, 100. ])
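If you'd rather avoid scipy, the same piecewise-linear mapping can be done with np.interp (my own note, using the breakpoints from the question):
import numpy as np

num_expected = [1, 5, 8]        # min(num), 5, max(num) from the question
expected_range = [10, 20, 100]
print(np.interp(np.arange(1, 9), num_expected, expected_range))
# [ 10.   12.5  15.   17.5  20.   46.66666667  73.33333333 100. ]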
I would like to stretch an increasing numpy array linearly onto a bigger array,
[2, 4, 6, 8, 10] to something like: [1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10]
but I need a solution that can take a general increasing input and still stretch it linearly,
say for example:
[2, 4, 6, 7, 8, 10, 20, 20, 20] stretched linearly onto an array of size 20.
Is there any existing numpy function or other simple way of doing that?
Edited:
I will try to make my question more understandable
I am trying to equalize an image.
As part of the process, I am using the histogram and the cumulative histogram.
When checking the cumulative histogram, my first gray level might not be 0, and my last gray level might not be MAX_VAL (255 in my case).
I would like to take the received values (some monotonically increasing array) and stretch them so my first gray level would be 0 and my last would be MAX_VAL.
I was thinking about cropping the array to [first gray level : last gray level] and then stretching it back to the original size (256 in my case), yet I don't seem to understand how to do that.
Thanks in advance.
The question seems to use 'stretch' in two senses.
1) To translate the values in an array to a given range.
2) To expand the array to a bigger array.
Version 1. a -> 0 to 255 with no change in size.
def translate(x, mx):
    lo = x.min()
    rng = x.max() - lo
    return (x - lo) * mx / rng
a = np.array([10, 15, 16, 20, 25, 125, 126, 130, 150, 200, 201., 202])
at = translate(a, 255.)
print(at)
# array([ 0. , 6.640625, 7.96875 , 13.28125 , 19.921875,
# 152.734375, 154.0625 , 159.375 , 185.9375 , 252.34375 ,
# 253.671875, 255. ])
The array a stays the same size but the values are stretched to fill the range 0 to 255.
Version 2
x = np.arange(len(a)) # An independent x for each a (or at)
new_x = np.linspace(0., 11., 24) # 24 evenly spaced positions from 0 to 11
print(new_x)
# [ 0. 0.47826087 0.95652174 1.43478261 1.91304348 2.39130435
# 2.86956522 3.34782609 3.82608696 4.30434783 4.7826087 5.26086957
# 5.73913043 6.2173913 6.69565217 7.17391304 7.65217391 8.13043478
# 8.60869565 9.08695652 9.56521739 10.04347826 10.52173913 11. ]
Use new_x to interpolate the a (or at) values based on x
np.interp(new_x, x, at) # The array at is made longer (24 elements)
# array([ 0. , 3.17595109, 6.35190217, 7.21807065,
# 7.85326087, 10.04755435, 12.58831522, 15.59103261,
# 18.7669837 , 60.34307065, 123.86209239, 153.08084239,
# 153.71603261, 155.2173913 , 157.75815217, 163.99456522,
# 176.69836957, 194.59918478, 226.35869565, 252.45923913,
# 253.09442935, 253.72961957, 254.36480978, 255. ])
np.interp(new_x, x, a) # The original array a is made longer (24 elements)
# array([ 10. , 12.39130435, 14.7826087 , 15.43478261,
# 15.91304348, 17.56521739, 19.47826087, 21.73913043,
# 24.13043478, 55.43478261, 103.26086957, 125.26086957,
# 125.73913043, 126.86956522, 128.7826087 , 133.47826087,
# 143.04347826, 156.52173913, 180.43478261, 200.08695652,
# 200.56521739, 201.04347826, 201.52173913, 202. ])
I'm not certain either answer meets the question but these are the two ways I can interpret it.
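If the goal is the equalization use case from the question, a rough sketch of my own combines the two versions: first stretch the values to 0..255, then resample onto 256 positions:
import numpy as np

a = np.array([10, 15, 16, 20, 25, 125, 126, 130, 150, 200, 201., 202])
at = (a - a.min()) * 255. / (a.max() - a.min())       # version 1: rescale values to 0..255
new_x = np.linspace(0, len(a) - 1, 256)               # version 2: 256 sample positions
stretched = np.interp(new_x, np.arange(len(a)), at)   # resample to 256 elements
print(stretched.shape, stretched[0], stretched[-1])   # (256,) 0.0 255.0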
Like this?
your_array = np.array([2, 4, 6, 8, 10])
stretched_array = np.linspace(np.min(your_array), np.max(your_array), 20)
I have two lists a and b as follows:
a = [4,4,4,1.1]
b = [4,4,4,1.2]
It is clear that the last value in the two lists is different, so why do I get the correlation coefficient (from numpy) equal to 1 in the code below:
from numpy import corrcoef
print(corrcoef(a, b))
output:
[[1. 1.]
[1. 1.]]
You assume that just because the last value is different, the correlation coefficient should not be 1. This assumption, however, is flawed.
The important thing to realize is that correlation is calculated only after adjusting for the scale of each list/feature. With that in mind, you only have two unique pairs of datapoints, and with only two datapoints the correlation almost* always comes out as exactly 1 or -1. This is because the actual values don't matter, since they are scaled accordingly before comparison.
For example:
import numpy as np
a = [60, 30]
b = [1050, 490]
print(np.corrcoef(a,b)) #still gives 1.
Compare this to what you essentially passed:
import numpy as np
a = [4, 1.1]
b = [4, 1.2]
print(np.corrcoef(a,b)) #still gives 1.
Two datapoints don't contain enough information to show that the correlation can be a specific value that is not equal to 1 or -1.
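For the two-point case, a hand-computed Pearson coefficient (my own sketch, not from the answer) shows why the result is exactly 1: after centering, each list's two deviations are mirror images, so the normalization cancels everything except the sign:
import numpy as np

a = np.array([4, 1.1])
b = np.array([4, 1.2])
da = a - a.mean()    # [ 1.45, -1.45]
db = b - b.mean()    # [ 1.4 , -1.4 ]
r = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())
print(r)             # 1.0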
To see why the correlation of 1 can make sense here, consider a third point that I can add.
a = [6.9, 4, 1.1] #gaps of 2.9
b = [6.8, 4, 1.2] #gaps of 2.8
print(np.corrcoef(a,b)) #still gives 1.
Perhaps this makes it slightly clearer why the correlation can be 1, because the data points in the two lists are still moving together perfectly.
To get a different correlation value with 3 points, we can compare with this:
a = [7, 4, 1.1]
b = [7, 4, 1.2]
print(np.corrcoef(a,b)) #gives 0.99994879
Now we have enough datapoints to show that the correlation is not perfectly 1.
*Regarding the 'almost': the exceptions are cases where one feature does not change at all, such as a = [0, 0] with b = [0, 1].
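In that case the constant feature has zero variance, so the normalization divides by zero and numpy returns nan (typically with a runtime warning) instead of a correlation, as this quick check of my own suggests:
import numpy as np

a = [0, 0]                 # constant feature: zero variance
b = [0, 1]
print(np.corrcoef(a, b))   # the entries involving a come out as nan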