I have a number of sets of x/y data points, of varying sizes, and I need to rescale each one to the same fixed size.
For example, take two sets of x/y data where the first has 12 data points and the second has 6; the maximum y value of the first is 80 and of the second 55:
X1 = np.array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ])
Y1 = np.array([ 10, 20, 50, 55, 70, 77, 78, 80, 55, 50, 21, 12 ])
X2 = [ 1, 2, 3, 4, 5, 6 ]
Y2 = [ 10, 20, 50, 55, 50, 10 ]
How can I rescale this data so that both sets have 8 data points and the maximum y value is 60? I'm developing in Python with numpy/matplotlib.
If you want to add/remove points from a data set, my first idea would be to do a regression on the data set with np.polyfit or scipy.optimize.curve_fit (depending on what kind of function you expect your points to follow), and then generate new points from that regression.
new_x_points = [1, 2, 3, 4, 5, 6, 7, 8]
coeff = np.polyfit(X1, Y1, deg = 2)
new_y_points = np.polyval(coeff, new_x_points)
Moving points from an interval (a,b) to the interval (c,d) is purely a mathematical problem. If x is on the interval (a,b) then
f(x) = (x - a) * h + c
where
h = (d - c)/(b - a)
is a linear map to the interval (c, d).
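Putting the two steps together, here is a minimal sketch; the degree-2 polynomial fit and the choice of mapping the y values onto the interval (0, 60) are assumptions (a simple new_y * 60 / new_y.max() would instead keep the proportions and only pin the maximum to 60):

import numpy as np

X1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
Y1 = np.array([10, 20, 50, 55, 70, 77, 78, 80, 55, 50, 21, 12])

# Resample to 8 points via a polynomial fit (pick a degree that matches your data)
new_x = np.linspace(X1.min(), X1.max(), 8)
coeff = np.polyfit(X1, Y1, deg=2)
new_y = np.polyval(coeff, new_x)

# Map the y values from their current interval (a, b) onto (c, d) = (0, 60)
a, b = new_y.min(), new_y.max()
c, d = 0.0, 60.0
h = (d - c) / (b - a)
rescaled_y = (new_y - a) * h + c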
I would like to know: if I have generated the 3 arrays in the manner below, how can I sum up all the numbers from all 3 arrays without double-counting the ones that appear in more than one array?
(I would like to sum up 10 only once, but I can't simply add arrays X_1 and X_2 because they both contain 10 and 20; I only want to count those numbers once.)
Maybe this can be done by creating a new array out of X_1, X_2 and X_3 that leaves out the duplicates?
import numpy as np

def get_divisible_by_n(arr, n):
    return arr[arr % n == 0]

x = np.arange(1, 21)
X_1 = get_divisible_by_n(x, 2)
# we get array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20])
X_2 = get_divisible_by_n(x, 5)
# we get array([ 5, 10, 15, 20])
X_3 = get_divisible_by_n(x, 3)
# we get array([ 3,  6,  9, 12, 15, 18])
It's me again!
Here is my solution using numpy, since I had more time now:
import numpy as np

arr = np.arange(1, 21)
divisible_by = lambda x: arr[np.where(arr % x == 0)]
n_2 = divisible_by(2)
n_3 = divisible_by(3)
n_5 = divisible_by(5)
what_u_want = np.unique(np.concatenate((n_2, n_3, n_5)))
# [ 2, 3, 4, 5, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20]
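If you then want the total of the de-duplicated values (as in the question), what_u_want.sum() gives 142, the same result as the plain-Python answer below.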
Not really efficient and not using numpy, but here is one solution:
def get_divisible_by_n(arr, n):
    return [i for i in arr if i % n == 0]

x = list(range(21))
X_1 = get_divisible_by_n(x, 2)
X_2 = get_divisible_by_n(x, 5)
X_3 = get_divisible_by_n(x, 3)
X_all = X_1 + X_2 + X_3
y = set(X_all)  # de-duplicate (0 is in range(21) and divisible by everything, but adds nothing to the sum)
print(sum(y))  # 142
I'm using the doatools.py library (https://github.com/morriswmz/doatools.py).
My code currently looks like this:
import numpy as np
from scipy import constants as const
import math
import doatools.model as model
import doatools.estimation as estimation
def calculate_wavelength(frequency):
    return const.speed_of_light / frequency
# Uniform circular array
# X
# |
# X---------X
# |
# X
NUMBER_OF_ELEMENTS = 4 # elements are shown as "X"
RADIUS = 0.47 / 2
FREQ_MHZ = 315
freq = FREQ_MHZ * const.mega
wavelength = calculate_wavelength(freq)
antenna_array = model.UniformCircularArray(NUMBER_OF_ELEMENTS, RADIUS)
# Create a MUSIC-based estimator.
grid = estimation.FarField1DSearchGrid()
estimator = estimation.MUSIC(antenna_array, wavelength, grid)
R = np.array([[1.5, 2, 3, 4], [4, 5, 6, 5], [45, 5, 5, 6], [5, 1, 0, 5]])
_, estimates = estimator.estimate(R, 1, return_spectrum=False, refine_estimates=True)
print('Estimates: {0}'.format(estimates.locations))
I can generate a signal with this library, but how do I use my own? For example, a signal from an ADC, like this:
-> Switching to antenna 0 : [0, 4, 7, 10]
-> Switching to antenna 1 : [5, 6, 11, 83]
-> Switching to antenna 2 : [0, 23, 2, 34]
-> Switching to antenna 3 : [23, 105, 98, 200]
I think your question is how you should feed the real data from the antennas into the estimator, right?
Presumably your data are ordered in time: in the case of "antenna 0 : [0, 4, 7, 10]", 0 is the first sample, then 4 and 7 in order, and 10 is the last one in time.
If yes, you could leave them as a simple matrix like what you typed above:
r = matrix 4x4 of
0, 4, 7, 10
5, 6, 11, 83
0, 23, 2, 34
23, 105, 98, 200
//===============
r(0,0) = 0, r(0,1) = 4, r(0,2) = 7, r(0,3) = 10
r(1,0) = 5, r(1,1) = 6, ... etc.
r(2,0) = 0, ...etc.
//==============
R is the product of r and its Hermitian (conjugate) transpose (r.conj().T in numpy):
R = r @ r.conj().T
And this is the covariance matrix that you need to pass as the first argument of the estimate() function.
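As a minimal sketch, assuming the ADC samples above with one row per antenna and one column per snapshot, and reusing the estimator from your snippet (dividing by the number of snapshots is the usual normalization; note also that MUSIC relies on complex I/Q samples, so real-valued magnitudes alone will not carry the phase information it needs, and the values below are only placeholders):

import numpy as np

# Hypothetical ADC samples: one row per antenna, one column per snapshot
r = np.array([[ 0,   4,  7,  10],
              [ 5,   6, 11,  83],
              [ 0,  23,  2,  34],
              [23, 105, 98, 200]], dtype=complex)

n_snapshots = r.shape[1]
R = r @ r.conj().T / n_snapshots  # sample covariance matrix

_, estimates = estimator.estimate(R, 1, return_spectrum=False, refine_estimates=True)
print('Estimates: {0}'.format(estimates.locations))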
I am generating a heat map for my data.
Everything works fine, but I have a little problem. My data (numbers) range from 0 to 10,000.
0 means nothing (no data), and at the moment the fields with 0 just take the lowest color of my color scale. My problem is how to give the fields with 0 a completely different color (e.g. black or white).
My code (snippet) looks like this:
matplotlib.pyplot.imshow(results, interpolation='none')
matplotlib.pyplot.colorbar();
matplotlib.pyplot.xticks([0, 1, 2, 3, 4, 5, 6, 7, 8], [10, 15, 20, 25, 30, 35, 40, 45, 50]);
matplotlib.pyplot.xlabel('Population')
matplotlib.pyplot.yticks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 'serial']);
matplotlib.pyplot.ylabel('Communication Step');
axis.xaxis.tick_top();
matplotlib.pyplot.savefig('./results_' + optimisationProblem + '_dim' + str(numberOfDimensions) + '_' + statisticType + '.png');
matplotlib.pyplot.close();
If you are not interested in a smooth transition between the values 0 and 0.0001, you can just set every value that equals 0 to NaN. This will result in a white color, whereas 0.0001 will still be deep blue-ish.
The following code includes an example. I generate the data randomly, then select a single element of the array and set it to NaN, which results in the color white. I also included a (commented-out) line that sets every data point equal to 0 to NaN.
import numpy
import matplotlib.pyplot as plt
#Random data
data = numpy.random.random((10, 10))
#Set all data points equal to zero to NaN
#data[data == 0.] = float("NaN")
#Set single data value to nan
data[2][2] = float("NaN")
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.imshow(data, interpolation = "nearest")
plt.show()
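If you specifically want black (or any other color) for the empty cells rather than the figure background, another common approach is to mask the zeros and set the colormap's "bad" color. A minimal sketch (the colormap choice is arbitrary):

import copy
import numpy as np
import matplotlib.pyplot as plt

data = np.random.random((10, 10))
data[2][2] = 0.0  # pretend this cell means "no data"

masked = np.ma.masked_where(data == 0, data)  # hide the zeros
cmap = copy.copy(plt.cm.viridis)              # don't modify the shared colormap
cmap.set_bad(color='black')                   # color used for masked cells

plt.imshow(masked, cmap=cmap, interpolation='nearest')
plt.colorbar()
plt.show()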
I am trying to write a function that maps a 2d-ndarray to a 2d-ndarray. The rows of the input array can be processed independently and there shall be a 1-to-1 correspondence between rows of the input and rows of the output. For each row of the input, the polynomial expansion of a given order shall be computed (see the docstring for an example). The current implementation works; however, it requires an explicit loop over the rows and duplication of rows in the "powerMatrix". Is it possible to get the same result with a single call to numpy.power? Btw.: the order of the entries in the result's rows doesn't matter to me.
import numpy

def polynomialFeatures(x, order):
    """ Generate polynomial features of given order for data x.

    For each row of ndarray x, the polynomial expansions are computed, i.e.
    for row [x1, x2] and order 2, the following row of the result matrix is
    computed: [1, x1, x1**2, x2, x1*x2, x1**2*x2, x2**2, x1*x2**2, x1**2*x2**2]

    Parameters
    ----------
    x : array-like
        2-D array; for each of its rows, the polynomial features are created
    order : int
        The order of the polynomial features

    Returns
    -------
    out : ndarray
        2-D array of shape (x.shape[0], (order+1)**x.shape[1]) containing the
        polynomial features computed for the rows of the array x

    Examples
    --------
    >>> polynomialFeatures([[1, 2, 3], [-1, -2, -3]], 2)
    array([[  1   3   9   2   6  18   4  12  36   1   3   9   2   6  18   4  12
             36   1   3   9   2   6  18   4  12  36]
           [  1  -3   9  -2   6 -18   4 -12  36  -1   3  -9   2  -6  18  -4  12
            -36   1  -3   9  -2   6 -18   4 -12  36]])
    """
    x = numpy.asarray(x)
    # TODO: Avoid duplication of rows
    powerMatrix = numpy.array([range(order+1)] * x.shape[1]).T
    # TODO: Avoid explicit loop, and use numpy's broadcasting
    F = []
    for i in range(x.shape[0]):
        X = numpy.power(x[i], powerMatrix).T
        F.append(numpy.multiply.reduce(cartesian(X), axis=1))
    return numpy.array(F)
print(numpy.all(polynomialFeatures([[1, 2, 3], [-1, -2, -3]], 2) ==
                numpy.array([[1, 3, 9, 2, 6, 18, 4, 12, 36, 1,
                              3, 9, 2, 6, 18, 4, 12, 36, 1, 3,
                              9, 2, 6, 18, 4, 12, 36],
                             [1, -3, 9, -2, 6, -18, 4, -12, 36, -1,
                              3, -9, 2, -6, 18, -4, 12, -36, 1, -3,
                              9, -2, 6, -18, 4, -12, 36]])))
Thanks,
Jan
EDIT: The missing function cartesian is defined here: Using numpy to build an array of all combinations of two arrays
The basic idea is to move the dimension (in your case, dimension 0, the number of rows) that's irrelevant to the calculation "out of the way" into a higher dimension and then automatically broadcast over it.
I'm not sure what your cartesian method is doing, but here's a solution that uses np.indices to generate indexing tuples over the X matrix:
import numpy as np

def polynomial_features(x, order):
    x = np.asarray(x).T[np.newaxis]
    n = x.shape[1]
    power_matrix = np.tile(np.arange(order + 1), (n, 1)).T[..., np.newaxis]
    X = np.power(x, power_matrix)
    I = np.indices((order + 1, ) * n).reshape((n, (order + 1) ** n)).T
    F = np.prod(np.diagonal(X[I], 0, 1, 2), axis=2)
    return F.T
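A quick usage check against the example from the question (only the shape is verified here; the column order may differ from the original implementation, which the asker said is fine):

result = polynomial_features([[1, 2, 3], [-1, -2, -3]], 2)
print(result.shape)  # (2, 27), i.e. (x.shape[0], (order+1)**x.shape[1])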
I am trying to shrink a numpy array by averaging groups of its elements, e.g. taking the average of each 5x5 sub-array of a 100x100 array to create a 20x20 array. As I have a huge amount of data to manipulate, is there an efficient way to do that?
I have tried this only on a smaller array, so test it with yours:
import numpy as np
nbig = 100
nsmall = 20
big = np.arange(nbig * nbig).reshape([nbig, nbig]) # 100x100
small = big.reshape([nsmall, nbig//nsmall, nsmall, nbig//nsmall]).mean(3).mean(1)
An example with 6x6 -> 3x3:
nbig = 6
nsmall = 3
big = np.arange(36).reshape([6,6])
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
small = big.reshape([nsmall, nbig//nsmall, nsmall, nbig//nsmall]).mean(3).mean(1)
array([[ 3.5, 5.5, 7.5],
[ 15.5, 17.5, 19.5],
[ 27.5, 29.5, 31.5]])
This is pretty straightforward, although I feel like it could be faster:
from __future__ import division
import numpy as np
Norig = 100
Ndown = 20
step = Norig//Ndown
assert step == Norig/Ndown # ensure Ndown is an integer factor of Norig
x = np.arange(Norig*Norig).reshape((Norig,Norig)) #for testing
y = np.empty((Ndown,Ndown)) # for testing
for yr, xr in enumerate(np.arange(0, Norig, step)):
    for yc, xc in enumerate(np.arange(0, Norig, step)):
        y[yr, yc] = np.mean(x[xr:xr+step, xc:xc+step])
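As a quick sanity check (a sketch, not part of the original answer), the loop result matches the reshape-based block mean from the answer above:

assert np.allclose(y, x.reshape(Ndown, step, Ndown, step).mean(axis=(1, 3)))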
You might also find scipy.signal.decimate interesting. It applies a more sophisticated low-pass filter than simple averaging before downsampling the data, although you'd have to decimate one axis, then the other.
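For example, a minimal sketch for the 100x100 -> 20x20 case (the decimation factor of 5 per axis follows from that case; scipy.signal.decimate's default IIR low-pass filter is used here):

import numpy as np
from scipy import signal

x = np.arange(100 * 100, dtype=float).reshape(100, 100)
# Low-pass filter and downsample by 5 along each axis in turn: 100x100 -> 20x20
y = signal.decimate(signal.decimate(x, 5, axis=0), 5, axis=1)
print(y.shape)  # (20, 20)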
Average a 2D array over subarrays of size NxN (average and split here are numpy.average and numpy.split):
height, width = data.shape
data = average(split(average(split(data, width // N, axis=1), axis=-1),
                     height // N, axis=1), axis=-1)
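Applied to the 6x6 example above (a sketch, assuming N = 2 and the numpy functions named earlier), this reproduces eumiro's 3x3 result:

import numpy as np
from numpy import average, split

data = np.arange(36).reshape(6, 6)
N = 2
height, width = data.shape
small = average(split(average(split(data, width // N, axis=1), axis=-1),
                      height // N, axis=1), axis=-1)
# array([[ 3.5,  5.5,  7.5],
#        [15.5, 17.5, 19.5],
#        [27.5, 29.5, 31.5]])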
Note that eumiro's approach does not work for masked arrays as .mean(3).mean(1) assumes that each mean along axis 3 was computed from the same number of values. If there are masked elements in your array, this assumption does not hold any more. In that case, you have to keep track of the number of values used to compute .mean(3) and replace .mean(1) by a weighted mean. The weights are the normalized number of values used to compute .mean(3).
Here is an example:
import numpy as np

def gridbox_mean_masked(data, Nbig, Nsmall):
    # Reshape data
    rshp = data.reshape([Nsmall, Nbig//Nsmall, Nsmall, Nbig//Nsmall])
    # Compute mean along axis 3 and remember the number of values each mean
    # was computed from
    mean3 = rshp.mean(3)
    count3 = rshp.count(3)
    # Compute weighted mean along axis 1
    mean1 = (count3*mean3).sum(1)/count3.sum(1)
    return mean1

# Define test data
big = np.ma.array([[1, 1, 2],
                   [1, 1, 1],
                   [1, 1, 1]])
big.mask = [[0, 0, 0],
            [0, 0, 1],
            [0, 0, 0]]

Nbig = 3
Nsmall = 1

# Compute gridbox mean
print(gridbox_mean_masked(big, Nbig, Nsmall))  # [[1.125]]