I have a numpy array of x and y coordinates and want to make it regular. The array is sorted based on its x values (first column):
import numpy as np
Irregular_points = np.array([[1.1,5.], [0.85,7.1], [0.9,9], [1.1,11], [1.,13.1],
[1.9,5.2], [2.,6.9], [1.95,9], [2.1,11.1], [2.,13.1],
[3.0,5.1], [3.1,7.0], [3.,9], [3.0,11.], [3.1,12.8]])
I want to first find out which points have almost the same x values: here that is the first five rows, the middle five rows and the last five rows. One signal for finding these points is that the y value decreases when I go to the next group. Then, I want to replace the x values of each group with their average. For example, in the first five rows the x values are 1.1, 0.85, 0.9, 1.1 and 1., and the average is 0.99. I want to do the same for the next two parts.
For the y values, I again want to find similar ones, which fall into five groups, and then replace them with the average of each group. The y values of the first group are 5., 5.2 and 5.1, and their average is 5.1. Finally, my points should look like the following array:
Regular_points = np.array([[0.99,5.1], [0.99,7.0], [0.99,9.0], [0.99,11.03], [0.99,13.0],
[1.99,5.1], [1.99,7.0], [1.99,9.0], [1.99,11.03], [1.99,13.0],
[3.04,5.1], [3.04,7.0], [3.04,9.0], [3.04,11.03], [3.04,13.0]])
I tried rounding the numbers, but that did not work for my real data, so I need these averages. I very much appreciate any help. The figure shows what I want: red dots are the irregular points, and blue dots are what results from replacing the coordinates with the averages.
Since you're averaging within groups for x but across groups for y, reshape the points to (groups, points per group, 2). Then separate the x and y coordinates, average them along different axes, and use np.transpose + np.meshgrid to rebuild the grid:
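import numpy as np
irregular_points = np.array([[1.1,5.], [0.85,7.1], [0.9,9], [1.1,11], [1.,13.1],
                             [1.9,5.2], [2.,6.9], [1.95,9], [2.1,11.1], [2.,13.1],
                             [3.0,5.1], [3.1,7.0], [3.,9], [3.0,11.], [3.1,12.8]])
# 3 groups of 5 points, each point an (x, y) pair
points_reshape = irregular_points.reshape(3, 5, 2)
# transposing gives shape (2, 5, 3): x and y are each indexed (point-in-group, group)
x, y = np.transpose(points_reshape)
x_mean = x.mean(axis=0)  # one mean x per group -> shape (3,)
y_mean = y.mean(axis=1)  # one mean y per position within a group -> shape (5,)
# pair every x_mean with every y_mean -> shape (3, 5, 2)
regular_points = np.transpose(np.meshgrid(x_mean, y_mean))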
regular_points
>>>
array([[[ 0.99 , 5.1 ],
[ 0.99 , 7. ],
[ 0.99 , 9. ],
[ 0.99 , 11.03333333],
[ 0.99 , 13. ]],
[[ 1.99 , 5.1 ],
[ 1.99 , 7. ],
[ 1.99 , 9. ],
[ 1.99 , 11.03333333],
[ 1.99 , 13. ]],
[[ 3.04 , 5.1 ],
[ 3.04 , 7. ],
[ 3.04 , 9. ],
[ 3.04 , 11.03333333],
[ 3.04 , 13. ]]])
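The reshape above hard-codes 3 groups of 5 points. As a sketch that does not need the group size up front, the signal mentioned in the question (y drops whenever a new x-group starts) can label the x-groups, and each point's position inside its group can serve as its y-group label; np.bincount then gives the per-group means. This assumes every group lists its points in increasing y and has one point per y level, as in the example data.

```python
import numpy as np

irregular_points = np.array([[1.1,5.], [0.85,7.1], [0.9,9], [1.1,11], [1.,13.1],
                             [1.9,5.2], [2.,6.9], [1.95,9], [2.1,11.1], [2.,13.1],
                             [3.0,5.1], [3.1,7.0], [3.,9], [3.0,11.], [3.1,12.8]])
x, y = irregular_points[:, 0], irregular_points[:, 1]

# A new x-group starts wherever y drops compared with the previous point.
new_group = np.concatenate(([True], np.diff(y) < 0))
x_group = np.cumsum(new_group) - 1          # 0, 0, ..., 1, 1, ..., 2, 2, ...

# Position of each point inside its x-group is used as its y-group label.
starts = np.flatnonzero(new_group)          # first row of every x-group
y_group = np.arange(len(y)) - starts[x_group]

# Per-group means: sum of the members (weights) divided by the member count.
x_mean = np.bincount(x_group, weights=x) / np.bincount(x_group)
y_mean = np.bincount(y_group, weights=y) / np.bincount(y_group)

# Same (15, 2) layout as Regular_points in the question.
regular_points = np.column_stack((x_mean[x_group], y_mean[y_group]))
print(regular_points)
```

Unlike the meshgrid version, this keeps the flat (15, 2) layout of the input, so each row still corresponds to the original point at the same position.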
You could use a clustering algorithm like KMeans:
import numpy as np
from sklearn.cluster import KMeans
irregular_points = np.array([[1.1,5.], [0.85,7.1], [0.9,9], [1.1,11], [1.,13.1],
[1.9,5.2], [2.,6.9], [1.95,9], [2.1,11.1], [2.,13.1],
[3.0,5.1], [3.1,7.0], [3.,9], [3.0,11.], [3.1,12.8]])
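# n_init and random_state are fixed so the result is reproducible
kmeans_x = KMeans(n_clusters=3, n_init=10, random_state=0).fit(irregular_points[:, 0, np.newaxis])
kmeans_y = KMeans(n_clusters=5, n_init=10, random_state=0).fit(irregular_points[:, 1, np.newaxis])
clusters_x = kmeans_x.predict(irregular_points[:, 0, np.newaxis])
clusters_y = kmeans_y.predict(irregular_points[:, 1, np.newaxis])
# cluster_centers_ has shape (n_clusters, 1): take column 0 to get flat arrays
regular_points_x = kmeans_x.cluster_centers_[clusters_x, 0]
regular_points_y = kmeans_y.cluster_centers_[clusters_y, 0]
# pair the snapped coordinates and group them as 3 groups of 5 points
regular_points = np.column_stack((regular_points_x, regular_points_y)).reshape(3, 5, 2)
regular_points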
>>>
array([[[ 0.99 , 5.1 ],
[ 0.99 , 7. ],
[ 0.99 , 9. ],
[ 0.99 , 11.03333333],
[ 0.99 , 13. ]],
[[ 1.99 , 5.1 ],
[ 1.99 , 7. ],
[ 1.99 , 9. ],
[ 1.99 , 11.03333333],
[ 1.99 , 13. ]],
[[ 3.04 , 5.1 ],
[ 3.04 , 7. ],
[ 3.04 , 9. ],
[ 3.04 , 11.03333333],
[ 3.04 , 13. ]]])
How can I use numpy to apply a level of diminishing returns across 2 axes? I'm working with temperature model data for a fixed (x,y) location, so the axes I'm working with are t_axis (time) and z_axis (vertical atmosphere).
The values below don't really make sense for the normal atmosphere, but let's pretend.
a1=np.arange(16).reshape(4,4)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
Assume the information above is the current forecast model data for my location, and it is predicting a temperature of 12°C at the surface right now. But when I walk outside it's actually 10°C, so I want to adjust the model data and make that temperature 10°C.
z_axis=3
t_axis=0
a1[z_axis,t_axis]=10
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[10 13 14 15]]
But what I really want to do is apply a level of correction based on 2 variables: t_mod (diminishing returns over time) and z_mod (diminishing returns through the vertical atmosphere).
correction = -2
t_mod=.05#50%
z_mod=0.25#25%
# how can i generate this array from modifiers
a2=np.array([
[0,0,0,0],#6k feet above ground level (agl)
[0,0,0,0],#4k feet agl
[.25,.13,0,0],#2k feet agl
[1,.5,.25,0]#surface
# ^ ^ ^ ^__ +3 hour
# | | L__ +2 hour
# | L__ +1 hour
# L__ zero hour
])
a1+(a2*correction )
[[ 0. 1. 2. 3. ]
[ 4. 5. 6. 7. ]
[ 7.5 8.74 10. 11. ]
[10. 12. 13.5 15. ]]
Is this the approach I should be using? If so, how can I generate a2 from the z and t axis modifiers?
How about this: use linear stepping in the t and z directions, then multiply the t and z factors together (an outer product) to get the weights for the points inside the matrix:
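import numpy as np
def shock_2d(t_mod, z_mod, n=4):
    # linear decay per step along each axis, clipped at 0
    ts = np.maximum(1 - np.arange(n)*t_mod, 0)
    zs = np.maximum(1 - np.arange(n)*z_mod, 0)
    # outer product: the weight at (z, t) is zs[z] * ts[t]
    shock = zs.reshape(-1,1) * ts.reshape(1,-1)
    # flip so the surface (weight 1 at zero hour) is the bottom row
    return np.flipud(shock)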
For example:
shock_2d(t_mod = 0.5, z_mod = 0.25)
Out:
array([[0.25 , 0.125, 0. , 0. ],
[0.5 , 0.25 , 0. , 0. ],
[0.75 , 0.375, 0. , 0. ],
[1. , 0.5 , 0. , 0. ]])
and
shock_2d(t_mod = 0.05, z_mod = 0.25)
Out:
array([[0.25 , 0.2375, 0.225 , 0.2125],
[0.5 , 0.475 , 0.45 , 0.425 ],
[0.75 , 0.7125, 0.675 , 0.6375],
[1. , 0.95 , 0.9 , 0.85 ]])
The last argument, n, is the size of the matrix.
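A sketch of how the weights could then be applied to the question's data, assuming the correction is simply scaled by the weights and added to the forecast, as in the question's a1+(a2*correction). The function is repeated so the block runs on its own.

```python
import numpy as np

def shock_2d(t_mod, z_mod, n=4):
    # Linear decay per step along each axis, clipped at 0.
    ts = np.maximum(1 - np.arange(n) * t_mod, 0)
    zs = np.maximum(1 - np.arange(n) * z_mod, 0)
    # Outer product: the weight at (z, t) is zs[z] * ts[t].
    shock = zs.reshape(-1, 1) * ts.reshape(1, -1)
    # Flip so the surface row (full weight at zero hour) is the last row.
    return np.flipud(shock)

a1 = np.arange(16).reshape(4, 4)  # forecast from the question (surface, zero hour = 12)
correction = -2                   # observed 10 minus forecast 12 at the surface, zero hour

adjusted = a1 + correction * shock_2d(t_mod=0.5, z_mod=0.25)
print(adjusted)
```

The surface value at zero hour becomes exactly 10, and the correction fades linearly with both height and time. If the correction should stop entirely above some level, a larger z_mod makes zs reach 0 sooner.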
Recently I was using this method to select 6 equally spaced values along the Accent colormap.
import matplotlib.pyplot as plt
import numpy as np
ints = np.linspace(0,255,6)
ints = [int(x) for x in ints]
newcm = plt.cm.Accent(ints)
Normally this would return the colormap values no problem. Now when I run this, the output I get for newcm is:
Out[25]:
array([[ 0.49803922, 0.78823529, 0.49803922, 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ]])
So now things are not plotting right. I have also tried bytes=True but the behaviour is the same. Do others get the same result or is it some funny setting on my matplotlib that has gone awry?
Moreover, this seems to happen in particular with the Accent colormap, but not necessarily with others.
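A colormap interprets its input in two ways. Floats are positions along the colormap and should lie between 0 and 1. Integers, which is what int(x) produces here, are indices into the colormap's lookup table, and Accent's table has only 8 entries (plt.cm.Accent.N is 8). So every value in np.linspace(0,255,6) except the first is out of range, and each of them gets the out-of-range color, which for Accent is its last color (0.4, 0.4, 0.4). Colormaps with 256 entries, such as viridis, accept 0 to 255 as indices, which is why the problem shows up with Accent but not with every colormap.
If instead you use numbers = np.linspace(0,1,6), you will get 6 different values from that colormap.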
import matplotlib.pyplot as plt
import numpy as np
numbers = np.linspace(0,1,6)
newcm = plt.cm.Accent(numbers)
print(newcm)
produces
[[ 0.49803922 0.78823529 0.49803922 1. ]
[ 0.74509804 0.68235294 0.83137255 1. ]
[ 1. 1. 0.6 1. ]
[ 0.21960784 0.42352941 0.69019608 1. ]
[ 0.74901961 0.35686275 0.09019608 1. ]
[ 0.4 0.4 0.4 1. ]]
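A small sketch contrasting integer and float input to the Accent colormap. It relies on the documented behavior of calling a Colormap (integers index the lookup table, floats are positions in [0, 1]); the np.arange(6) line at the end is just one possible way to take Accent's first six colors directly.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # no display needed
import matplotlib.pyplot as plt

cmap = plt.cm.Accent
print(cmap.N)  # Accent is a listed colormap with only 8 colors

# Integers are treated as indices into the 8-entry lookup table.
ints = np.linspace(0, 255, 6).astype(int)  # [0, 51, 102, 153, 204, 255]
by_index = cmap(ints)  # every index >= cmap.N gets the out-of-range ("over") color

# Floats in [0, 1] are treated as positions along the colormap.
by_position = cmap(np.linspace(0, 1, 6))

# To pick Accent's own colors directly, index with integers below cmap.N.
first_six = cmap(np.arange(6))
```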
I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:
s1 s2 s3 s4
Action State
1 s1 0.0 0 0.8 0.2
s2 0.1 0 0.9 0.0
2 s1 0.0 0 0.9 0.1
s2 0.0 0 1.0 0.0
I would like the resulting numpy.ndarray to be the following with np.shape() = (2,2,4):
[[[ 0.0 0.0 0.8 0.2 ]
[ 0.1 0.0 0.9 0.0 ]]
[[ 0.0 0.0 0.9 0.1 ]
[ 0.0 0.0 1.0 0.0]]]
I have tried df.as_matrix() but this returns:
[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]
[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]
How do I return a list of lists for the first level, with each list representing one Action's records?
You could use the following:
dim = len(df.index.get_level_values(0).unique())
result = df.values.reshape((dim, -1, df.shape[1]))
print(result)
[[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]]
[[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]]
The first line just finds the number of groups that you want to groupby.
Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.
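A self-contained version that rebuilds the question's DataFrame and reads the sizes from df.index.levshape instead of counting unique labels. Like the reshape above, it assumes the index is sorted and every (Action, State) combination is present.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from the question.
index = pd.MultiIndex.from_product([[1, 2], ["s1", "s2"]], names=["Action", "State"])
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# levshape is the number of labels in each index level, here (2, 2).
result = df.to_numpy().reshape(*df.index.levshape, df.shape[1])
print(result.shape)
```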
One way
In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2],
[0.1, 0.0, 0.9, 0.0]],
[[0.0, 0.0, 0.9, 0.1],
[0.0, 0.0, 1.0, 0.0]]], dtype=object)
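The groupby result above has dtype=object. A sketch that keeps the groupby idea but produces a regular float array stacks one block per Action with np.stack; it assumes every Action has the same number of State rows.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[1, 2], ["s1", "s2"]], names=["Action", "State"])
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# One (n_states, n_columns) block per Action, stacked along a new first axis.
result = np.stack([group.to_numpy() for _, group in df.groupby(level=0)])
print(result.dtype, result.shape)
```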
Using Divakar's suggestion, np.reshape() worked:
>>> print(P)
s1 s2 s3 s4
Action State
1 s1 0.0 0 0.8 0.2
s2 0.1 0 0.9 0.0
2 s1 0.0 0 0.9 0.1
s2 0.0 0 1.0 0.0
>>> arr = np.reshape(P.to_numpy(), (2, 2, -1))
>>> print(arr)
[[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]]
[[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]]
>>> np.shape(arr)
(2, 2, 4)
Elaborating on Brad Solomon's answer, to get a slightly more generic solution (index levels of different sizes and an arbitrary number of levels), one could do something like this:
def df_to_numpy(df):
    try:
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        shape.append(ncol)
    return df.to_numpy().reshape(shape)
If df has missing sub-indexes, reshape will not work. One way to add them would be (there may be better solutions):
import pandas as pd

def enforce_df_shape(df):
    try:
        # every combination of the existing level values, keeping the level names
        ind = pd.MultiIndex.from_product([level.values for level in df.index.levels], names=df.index.names)
    except AttributeError:
        return df
    fulldf = pd.DataFrame(-1, columns=df.columns, index=ind) # remove -1 to fill fulldf with nan
    fulldf.update(df)
    return fulldf
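A usage sketch for the two helpers on a hypothetical copy of the question's DataFrame with the (2, "s2") row dropped. It fills the gap with np.nan rather than -1, as the comment in the code suggests.

```python
import numpy as np
import pandas as pd

def df_to_numpy(df):
    try:
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        shape.append(ncol)
    return df.to_numpy().reshape(shape)

def enforce_df_shape(df):
    try:
        # every combination of the existing level values, keeping the level names
        ind = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
    except AttributeError:
        return df
    fulldf = pd.DataFrame(np.nan, columns=df.columns, index=ind)  # gaps stay NaN
    fulldf.update(df)
    return fulldf

index = pd.MultiIndex.from_product([[1, 2], ["s1", "s2"]], names=["Action", "State"])
full = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)
df = full.drop((2, "s2"))  # hypothetical missing sub-index

arr = df_to_numpy(enforce_df_shape(df))
print(arr)
```

Calling df_to_numpy(df) directly would fail here, because 3 rows of 4 values cannot be reshaped to (2, 2, 4).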
If you are just trying to pull out one column, say s1, and get an array with shape (2,2), you can use .index.levshape like this:
x = df.s1.to_numpy().reshape(df.index.levshape)
This will give you a (2,2) array containing the values of s1.
Here's a simple piece of Python code:
import numpy as np
end = np.zeros((11,2))
alpha=0
while(alpha<=1):
    end[int(10*alpha)] = alpha
    print(end[int(10*alpha)])
    alpha+=0.1
print('')
print(end)
and output:
[ 0. 0.]
[ 0.1 0.1]
[ 0.2 0.2]
[ 0.3 0.3]
[ 0.4 0.4]
[ 0.5 0.5]
[ 0.6 0.6]
[ 0.7 0.7]
[ 0.8 0.8]
[ 0.9 0.9]
[ 1. 1.]
[[ 0. 0. ]
[ 0.1 0.1]
[ 0.2 0.2]
[ 0.3 0.3]
[ 0.4 0.4]
[ 0.5 0.5]
[ 0.6 0.6]
[ 0.8 0.8]
[ 0. 0. ]
[ 1. 1. ]
[ 0. 0. ]]
It is easy to notice that 0.7 is missing and that after 0.8 comes 0 instead of 0.9, etc. Why do these outputs differ?
It's because of floating point errors. Run this:
import numpy as np
end = np.zeros((11, 2))
alpha=0
while(alpha<=1):
    print("alpha is ", alpha)
    end[int(10*alpha)] = alpha
    print(end[int(10*alpha)])
    alpha+=0.1
print('')
print(end)
and you will see that alpha is, successively:
alpha is 0
alpha is 0.1
alpha is 0.2
alpha is 0.30000000000000004
alpha is 0.4
alpha is 0.5
alpha is 0.6
alpha is 0.7
alpha is 0.7999999999999999
alpha is 0.8999999999999999
alpha is 0.9999999999999999
Basically, floating point numbers like 0.1 are stored inexactly on your computer. If you add 0.1 together, say, 8 times, you won't necessarily get 0.8: the small errors can accumulate and give you a different number, in this case 0.7999999999999999. NumPy arrays must be indexed with integers, however, so your code uses the int function, which truncates 10*alpha = 7.999999999999999 down to 7, and that row is overwritten.
To solve this, you must rewrite your code so that you only ever use integers to index into an array. One slightly crude way would be to round the float to the nearest integer using the round function. But really you should rewrite your code so that it iterates over integers and converts them into floats, rather than iterating over floats and converting them into integers.
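A minimal sketch of the integer-driven rewrite suggested above (the loop bound 11 matches the 11 rows of end), plus a vectorized equivalent using np.linspace, which also computes each value from its position rather than by repeated addition:

```python
import numpy as np

end = np.zeros((11, 2))
# Loop over integer indices and derive the float from each one,
# so the row index never depends on accumulated rounding error.
for i in range(11):
    alpha = i / 10
    end[i] = alpha
print(end)

# Vectorized equivalent: 11 evenly spaced values, copied into both columns.
end_vectorized = np.repeat(np.linspace(0, 1, 11)[:, np.newaxis], 2, axis=1)
```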
You can read more about floating point numbers here:
https://docs.python.org/3/tutorial/floatingpoint.html
As @Denziloe pointed out, this is due to floating point errors.
If you look at the documentation of int():
"If x is floating point, the conversion truncates towards zero."
To solve your problem, use round() instead of int().
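A minimal sketch of the round() fix applied to the question's loop. Rounding to the nearest integer absorbs the tiny accumulated error here, although it only keeps working while that error stays well below half a step.

```python
import numpy as np

end = np.zeros((11, 2))
alpha = 0
while alpha <= 1:
    # round() goes to the nearest integer, so 7.999999999999999 becomes 8, not 7
    end[round(10 * alpha)] = alpha
    alpha += 0.1
print(end)
```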