Interpolation of a pandas DataFrame - python

I have a pandas DataFrame (shape = (34, 19)) which I want to use as a lookup table. But the values I want to look up fall "between" the values in the DataFrame.
For example:
0.1 0.2 0.3 0.4 0.5
0.1 4.01 31.86 68.01 103.93 139.2
0.2 24.07 57.49 91.37 125.21 158.57
0.3 44.35 76.4 108.97 141.57 173.78
0.4 59.66 91.02 122.8 154.62 186.13
0.5 87.15 117.9 148.86 179.83 210.48
0.6 106.92 137.41 168.26 198.99 229.06
0.7 121.73 152.48 183.4 213.88 243.33
I now want to look up the value for x = 5.5, y = 1.004, and the answer should be around 114.
I tried different methods from scipy, but the values I get are always way off.
The last method I used was:
inter = interpolate.interpn([np.arange(34), np.arange(19)], np_matrix, [x_value, y_value])
I even get wrong values for points in the grid which do exist.
Can someone tell me what I'm doing wrong or recommend an easy solution for the task?
EDIT:
An additional problem is:
My raw data, from an .xlsx file, look like:
0.1 0.2 0.3 0.4 0.5
0.1 4.01 31.86 68.01 103.93 139.2
0.2 24.07 57.49 91.37 125.21 158.57
0.3 44.35 76.4 108.97 141.57 173.78
0.4 59.66 91.02 122.8 154.62 186.13
0.5 87.15 117.9 148.86 179.83 210.48
0.6 106.92 137.41 168.26 198.99 229.06
0.7 121.73 152.48 183.4 213.88 243.33
But pandas adds an Index column:
0.1 0.2 0.3 0.4 0.5
0 0.1 4.01 31.86 68.01 103.93 139.2
1 0.2 24.07 57.49 91.37 125.21 158.57
2 0.3 44.35 76.4 108.97 141.57 173.78
3 0.4 59.66 91.02 122.8 154.62 186.13
4 0.8 87.15 117.9 148.86 179.83 210.48
5 1.0 106.92 137.41 168.26 198.99 229.06
6 1.7 121.73 152.48 183.4 213.88 243.33
So if I want to access x = 0.4, y = 0.15, I have to input x = 3, y = 0.15.
Data are read with:
model_references = pd.ExcelFile(model_references_path)
Matrix = model_references.parse('Model_References')
n = Matrix.stack().reset_index().values
out = interpolate.griddata(n[:,0:2], n[:,2], (Stroke, Current), method='cubic')

You can reshape the data to three columns with stack - the first column for the index, the second for the columns and the last for the values - and then interpolate with scipy.interpolate.griddata:
from scipy.interpolate import griddata
a = 5.5
b = 1.004
n = df.stack().reset_index().values
#https://stackoverflow.com/a/8662243
out = griddata(n[:,0:2], n[:,2], [(a, b)], method='linear')
print (out)
[104.563]
Detail:
n = df.stack().reset_index().values
print (n)
[[  1.     1.     4.01]
 [  1.     2.    31.86]
 [  1.     3.    68.01]
 [  1.     4.   103.93]
 [  1.     5.   139.2 ]
 [  2.     1.    24.07]
 [  2.     2.    57.49]
 [  2.     3.    91.37]
 [  2.     4.   125.21]
 [  2.     5.   158.57]
 [  3.     1.    44.35]
 [  3.     2.    76.4 ]
 [  3.     3.   108.97]
 [  3.     4.   141.57]
 [  3.     5.   173.78]
 [  4.     1.    59.66]
 [  4.     2.    91.02]
 [  4.     3.   122.8 ]
 [  4.     4.   154.62]
 [  4.     5.   186.13]
 [  5.     1.    87.15]
 [  5.     2.   117.9 ]
 [  5.     3.   148.86]
 [  5.     4.   179.83]
 [  5.     5.   210.48]
 [  6.     1.   106.92]
 [  6.     2.   137.41]
 [  6.     3.   168.26]
 [  6.     4.   198.99]
 [  6.     5.   229.06]
 [  7.     1.   121.73]
 [  7.     2.   152.48]
 [  7.     3.   183.4 ]
 [  7.     4.   213.88]
 [  7.     5.   243.33]]
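Note that the stacked coordinates above are the DataFrame's index and column labels, so whether you interpolate in positions (1-7) or in the real units from the question (0.1-0.7) depends on how the frame is read. A minimal sketch, assuming the sheet layout from the question, where index_col=0 turns the first spreadsheet column into the index instead of the default 0..n-1 counter:
import pandas as pd
from scipy.interpolate import griddata

# index_col=0: the first column (0.1, 0.2, ...) becomes the index, not data
df = pd.read_excel(model_references_path, sheet_name='Model_References', index_col=0)
n = df.stack().reset_index().values  # columns: index label, column label, value
out = griddata(n[:, 0:2], n[:, 2], [(0.45, 0.15)], method='linear')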

Try interp2d from scipy:
import numpy as np
from scipy.interpolate import interp2d
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 3, 4, 5]
z = [[4.01, 31.86, 68.01, 103.93, 139.2],
[24.07, 57.49, 91.37, 125.21, 158.57],
[44.35, 76.4, 108.97, 141.57, 173.78],
[59.66, 91.02, 122.8, 154.62, 186.13],
[87.15, 117.9, 148.86, 179.83, 210.48],
[106.92, 137.41, 168.26, 198.99, 229.06],
[121.73, 152.48, 183.4, 213.88, 243.33]]
z = np.array(z).T
f = interp2d(x, y, z)
f(x = 5.5, y = 1.004) # returns 97.15748
Try changing interp2d's kind argument ('linear', 'cubic' or 'quintic') to experiment with the interpolation order and the returned value.
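Note that interp2d has since been deprecated and removed in recent SciPy releases. A roughly equivalent sketch using RegularGridInterpolator, which expects the values array shaped (len(x), len(y)) - hence transposing z back:
from scipy.interpolate import RegularGridInterpolator

rgi = RegularGridInterpolator((x, y), z.T, method='linear')
rgi([(5.5, 1.004)])  # array([97.15748]), matching the interp2d result above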

Related

How to make a regular grid based on some irregular points in Python

I have a numpy array of x and y coordinates and want to make it regular. The array is sorted based on its x values (first column):
import numpy as np
Irregular_points = np.array([[1.1,5.], [0.85,7.1], [0.9,9], [1.1,11], [1.,13.1],
[1.9,5.2], [2.,6.9], [1.95,9], [2.1,11.1], [2.,13.1],
[3.0,5.1], [3.1,7.0], [3.,9], [3.0,11.], [3.1,12.8]])
I first want to find out which points have almost the same x values: these will be the first five rows, the middle five rows and the last five rows. One signal for finding these groups is that the y value drops when moving to the next group. Then I want to replace the x values of each group with the group average. For example, in the first five rows the x values are 1.1, 0.85, 0.9, 1.1 and 1., and the average is 0.99. I want to do the same for the next two parts.
For the y values I again want to find similar ones, which fall into five groups, and then replace them with the average of each group. The y values of the first group are 5., 5.2 and 5.1, and the average is 5.1. Finally, my points should look like the following array:
Regular_points = np.array([[0.99,5.1], [0.99,7.0], [0.99,9.0], [0.99,11.03], [0.99,13.0],
                           [1.99,5.1], [1.99,7.0], [1.99,9.0], [1.99,11.03], [1.99,13.0],
                           [3.04,5.1], [3.04,7.0], [3.04,9.0], [3.04,11.03], [3.04,13.0]])
I tried rounding the numbers, but it did not work for real cases, so I need to use these averages. I very much appreciate any help. The figure clearly shows what I want: the red dots are the irregular points, and replacing them with the averages yields the blue dots.
Since you're averaging rows and columns, you'll need to reshape to a (3, 5, 2) array first. Then separate the x and y coordinates, average each along a different axis, and use np.transpose + np.meshgrid for a nice display:
irregular_points = np.array([[1.1,5.], [0.85,7.1], [0.9,9], [1.1,11], [1.,13.1],
[1.9,5.2], [2.,6.9], [1.95,9], [2.1,11.1], [2.,13.1],
[3.0,5.1], [3.1,7.0], [3.,9], [3.0,11.], [3.1,12.8]])
points_reshape = irregular_points.reshape(3, 5, 2)
x, y = np.transpose(points_reshape)
x_mean = x.mean(axis=0)
y_mean = y.mean(axis=1)
regular_points = np.transpose(np.meshgrid(x_mean, y_mean))
regular_points
>>>
array([[[ 0.99      ,  5.1       ],
        [ 0.99      ,  7.        ],
        [ 0.99      ,  9.        ],
        [ 0.99      , 11.03333333],
        [ 0.99      , 13.        ]],

       [[ 1.99      ,  5.1       ],
        [ 1.99      ,  7.        ],
        [ 1.99      ,  9.        ],
        [ 1.99      , 11.03333333],
        [ 1.99      , 13.        ]],

       [[ 3.04      ,  5.1       ],
        [ 3.04      ,  7.        ],
        [ 3.04      ,  9.        ],
        [ 3.04      , 11.03333333],
        [ 3.04      , 13.        ]]])
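Note that this assumes the points arrive sorted in complete groups (here three groups of five), since reshape(3, 5, 2) pairs them purely by position. If the grouping is not known in advance, clustering (below) is more robust.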
You could use a clustering algorithm like KMeans:
import numpy as np
from sklearn.cluster import KMeans
irregular_points = np.array([[1.1,5.], [0.85,7.1], [0.9,9], [1.1,11], [1.,13.1],
[1.9,5.2], [2.,6.9], [1.95,9], [2.1,11.1], [2.,13.1],
[3.0,5.1], [3.1,7.0], [3.,9], [3.0,11.], [3.1,12.8]])
kmeans_x = KMeans(n_clusters=3).fit(irregular_points[:, 0, np.newaxis])
kmeans_y = KMeans(n_clusters=5).fit(irregular_points[:, 1, np.newaxis])
clusters_x = kmeans_x.predict(irregular_points[:, 0, np.newaxis])
clusters_y = kmeans_y.predict(irregular_points[:, 1, np.newaxis])
regular_points_x = kmeans_x.cluster_centers_[clusters_x]
regular_points_y = kmeans_y.cluster_centers_[clusters_y]
regular_points = np.asarray([[regular_points_x[i], regular_points_y[i]] for i in range(irregular_points.shape[0])])
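Note: as written, regular_points has shape (15, 2, 1), because each cluster center is a length-1 array; a reshape recovers the (3, 5, 2) grid layout of the previous answer:
regular_points.reshape(3, 5, 2)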

apply diminishing returns across 2 axis with numpy

How can I use numpy to apply a level of diminishing returns across 2 axes? I'm working with temperature model data for a fixed (x, y) location, so the axes I'm working with are t_axis (time) and z_axis (the vertical atmosphere).
The values below don't really reflect what would make sense for a real atmosphere, but let's pretend.
a1=np.arange(16).reshape(4,4)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
Assume the information above is current forecast model data for my location, and it is predicting a temperature of 12°C at the surface right now. But when I walk outside it's actually 10°C, so I want to adjust the model data and make that temperature 10°C.
z_axis=3
t_axis=0
a1[z_axis,t_axis]=10
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[10 13 14 15]]
But really what I want to do is apply a level of correction based on 2 variables: t_mod (diminishing returns over time) and z_mod (diminishing returns through the vertical atmosphere).
correction = -2
t_mod = .05   # 50%
z_mod = 0.25  # 25%
# how can I generate this array from the modifiers?
a2 = np.array([
    [0,    0,    0,    0],   # 6k feet above ground level (agl)
    [0,    0,    0,    0],   # 4k feet agl
    [.25,  .13,  0,    0],   # 2k feet agl
    [1,    .5,   .25,  0]    # surface
    # ^     ^     ^     ^__ +3 hour
    # |     |     L__ +2 hour
    # |     L__ +1 hour
    # L__ zero hour
])
a1 + (a2 * correction)
[[ 0.    1.    2.    3.  ]
 [ 4.    5.    6.    7.  ]
 [ 7.5   8.74 10.   11.  ]
 [10.   12.   13.5  15.  ]]
Is this the approach I should be using? If so, how can I generate a2 from the z and t axis modifiers?
How about this: we use linear stepping in the t and z directions and take the outer product of the two profiles for points inside the matrix:
def shock_2d(t_mod, z_mod, n=4):
    ts = np.maximum(1 - np.arange(n) * t_mod, 0)
    zs = np.maximum(1 - np.arange(n) * z_mod, 0)
    shock = zs.reshape(-1, 1) * ts.reshape(1, -1)  # outer product of the two profiles
    return np.flipud(shock)  # flip so the surface row is at the bottom
e.g.
shock_2d(t_mod = 0.5, z_mod = 0.25)
Out:
array([[0.25 , 0.125, 0. , 0. ],
[0.5 , 0.25 , 0. , 0. ],
[0.75 , 0.375, 0. , 0. ],
[1. , 0.5 , 0. , 0. ]])
and
shock_2d(t_mod = 0.05, z_mod = 0.25)
Out:
array([[0.25 , 0.2375, 0.225 , 0.2125],
[0.5 , 0.475 , 0.45 , 0.425 ],
[0.75 , 0.7125, 0.675 , 0.6375],
[1. , 0.95 , 0.9 , 0.85 ]])
The last argument, n, is the size of the matrix.
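To apply the correction from the question, scale the shock matrix and add it to the model data - a minimal sketch reusing the question's names:
import numpy as np

a1 = np.arange(16).reshape(4, 4)   # model data: rows = altitude, columns = time
correction = -2                    # observed minus modeled surface temperature
adjusted = a1 + shock_2d(t_mod=0.05, z_mod=0.25) * correction
# adjusted[3, 0] is now 10, and the correction fades with time and altitude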

matplotlib colormap giving duplicated values

Recently I was using this method to select 6 equally spaced values along a colormap.
import matplotlib.pyplot as plt
import numpy as np
ints = np.linspace(0,255,6)
ints = [int(x) for x in ints]
newcm = plt.cm.Accent(ints)
Normally this would return the colormap values no problem. Now when I run this, the output I get for newcm is:
Out[25]:
array([[ 0.49803922, 0.78823529, 0.49803922, 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ],
[ 0.4 , 0.4 , 0.4 , 1. ]])
So now things are not plotting right. I have also tried bytes=True, but the behaviour is the same. Do others get the same result, or is it some funny setting in my matplotlib that has gone awry?
Moreover, it seems this happens in particular with the Accent colormap, but not necessarily with others.
In general, a colormap ranges between 0 and 1. In np.linspace(0,255,6) all values except the first are larger than 1, hence you get the output corresponding to the maximum value for all but the first item of that list. (Strictly, your list comprehension produces integers, which a colormap interprets as direct indices into its lookup table; Accent has only 8 entries, so every index past 7 is likewise clamped to the last color, the gray (0.4, 0.4, 0.4).)
If instead you use numbers = np.linspace(0,1,6), you will get 6 different values from that colormap.
import matplotlib.pyplot as plt
import numpy as np
numbers = np.linspace(0,1,6)
newcm = plt.cm.Accent(numbers)
print(newcm)
produces
[[ 0.49803922 0.78823529 0.49803922 1. ]
[ 0.74509804 0.68235294 0.83137255 1. ]
[ 1. 1. 0.6 1. ]
[ 0.21960784 0.42352941 0.69019608 1. ]
[ 0.74901961 0.35686275 0.09019608 1. ]
[ 0.4 0.4 0.4 1. ]]
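Alternatively, since Accent is a listed colormap with 8 entries and integers index its lookup table directly, small integer indices also give distinct colors - a minimal sketch:
newcm = plt.cm.Accent(np.arange(6))  # indices 0-5 pick the first six of Accent's 8 colors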

Pandas Multi-Index DataFrame to Numpy Ndarray

I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:
               s1  s2   s3   s4
Action State
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0
I would like the resulting numpy.ndarray to be the following with np.shape() = (2,2,4):
[[[ 0.0 0.0 0.8 0.2 ]
[ 0.1 0.0 0.9 0.0 ]]
[[ 0.0 0.0 0.9 0.1 ]
[ 0.0 0.0 1.0 0.0]]]
I have tried df.as_matrix() but this returns:
[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]
[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]
How do I return a list of lists for the first level, with each list representing one Action's records?
You could use the following:
dim = len(df.index.get_level_values(0).unique())
result = df.values.reshape((dim, -1, df.shape[1]))  # -1 infers the rows per group
print(result)
[[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]]
[[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]]
The first line just finds the number of groups that you want to group by.
Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.
One way:
In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2],
[0.1, 0.0, 0.9, 0.0]],
[[0.0, 0.0, 0.9, 0.1],
[0.0, 0.0, 1.0, 0.0]]], dtype=object)
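Note the dtype=object: each element is a nested Python list. Stacking the grouped arrays instead yields a proper float array of shape (2, 2, 4) - a sketch:
np.stack(df.groupby(level=0).apply(lambda x: x.values).values)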
Using Divakar's suggestion, np.reshape() worked:
>>> print(P)
               s1  s2   s3   s4
Action State
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0
>>> np.reshape(P, (2,2,-1))
[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]
>>> np.reshape(P, (2,2,-1)).shape
(2, 2, 4)
Elaborating on Brad Solomon's answer, to get a slightly more generic solution - index levels of different sizes and an unfixed number of levels - one could do something like this:
def df_to_numpy(df):
    try:
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        shape.append(ncol)
    return df.to_numpy().reshape(shape)
If df has missing sub-indexes, reshape will not work. One way to add them would be (maybe there are better solutions):
def enforce_df_shape(df):
    try:
        ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
    except AttributeError:
        return df
    fulldf = pd.DataFrame(-1, columns=df.columns, index=ind)  # remove -1 to fill fulldf with nan
    fulldf.update(df)
    return fulldf
If you are just trying to pull out one column, say s1, and get an array with shape (2, 2), you can use .index.levshape like this:
x = df.s1.to_numpy().reshape(df.index.levshape)
This will give you a (2, 2) array containing the values of s1.
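For reference, a minimal sketch that reconstructs the example frame, so any of the approaches above can be tested:
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([[1, 2], ['s1', 's2']], names=['Action', 'State'])
df = pd.DataFrame([[0.0, 0, 0.8, 0.2],
                   [0.1, 0, 0.9, 0.0],
                   [0.0, 0, 0.9, 0.1],
                   [0.0, 0, 1.0, 0.0]],
                  index=idx, columns=['s1', 's2', 's3', 's4'])
df.to_numpy().reshape(df.index.levshape + (df.shape[1],))  # shape (2, 2, 4)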

Printing numpy array in python

Here's some simple code in Python.
end = np.zeros((11,2))
alpha = 0
while(alpha <= 1):
    end[int(10*alpha)] = alpha
    print(end[int(10*alpha)])
    alpha += 0.1
print('')
print(end)
and output:
[ 0. 0.]
[ 0.1 0.1]
[ 0.2 0.2]
[ 0.3 0.3]
[ 0.4 0.4]
[ 0.5 0.5]
[ 0.6 0.6]
[ 0.7 0.7]
[ 0.8 0.8]
[ 0.9 0.9]
[ 1. 1.]
[[ 0. 0. ]
[ 0.1 0.1]
[ 0.2 0.2]
[ 0.3 0.3]
[ 0.4 0.4]
[ 0.5 0.5]
[ 0.6 0.6]
[ 0.8 0.8]
[ 0. 0. ]
[ 1. 1. ]
[ 0. 0. ]]
It is easy to notice that 0.7 is missing, and after 0.8 comes 0 instead of 0.9, etc. Why do these outputs differ?
It's because of floating point errors. Run this:
import numpy as np

end = np.zeros((11, 2))
alpha = 0
while(alpha <= 1):
    print("alpha is ", alpha)
    end[int(10*alpha)] = alpha
    print(end[int(10*alpha)])
    alpha += 0.1
print('')
print(end)
and you will see that alpha is, successively:
alpha is 0
alpha is 0.1
alpha is 0.2
alpha is 0.30000000000000004
alpha is 0.4
alpha is 0.5
alpha is 0.6
alpha is 0.7
alpha is 0.7999999999999999
alpha is 0.8999999999999999
alpha is 0.9999999999999999
Basically, floating point numbers like 0.1 are stored inexactly on your computer. If you add 0.1 to itself say 8 times, you won't necessarily get 0.8 -- the small errors can accumulate and give you a slightly different number, in this case 0.7999999999999999. NumPy arrays must take integers as indexes, and your code applies int(), which truncates this toward zero -- giving 7 -- so that row is overwritten.
To solve this, you must rewrite your code so that you only ever use integers to index into an array. One slightly crude way would be to round the float to the nearest integer using the round function. But really you should rewrite your code so that it iterates over integers and converts them into floats, rather than iterating over floats and converting them into integers.
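For example, a minimal rewrite that iterates over integers:
import numpy as np

end = np.zeros((11, 2))
for i in range(11):   # integer loop variable, so no rounding errors in the index
    alpha = i / 10    # derive the float from the integer
    end[i] = alpha
print(end)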
You can read more about floating point numbers here:
https://docs.python.org/3/tutorial/floatingpoint.html
As @Denziloe pointed out, this is due to floating point errors.
If you look at the definition of int():
"If x is floating point, the conversion truncates towards zero."
To solve your problem, use round() instead of int().
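A minimal sketch of that fix applied to the original loop:
while(alpha <= 1):
    end[round(10*alpha)] = alpha  # round() maps 7.999... to 8 instead of truncating to 7
    alpha += 0.1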
