Struggling with Python extrapolation - python

I have recorded data from lab equipment. In several cases I would like to interpolate and extrapolate from the recorded data.
I will be using I_id and I_iq as my main control variables.
I have tried many different variations to get this working, but without success.
My data looks like this for the first piece of measurement equipment:
I_id = [0,-25,-50,-75,-100,-125,-150,-175,0,-25,-50,-75,-100,-125,-150,-175,0,-25,-50,-75,-100,-125,-150,-175,0,-25,-50,-75,-100,-125,-150,-175,0,-25,-50,-75,-100,-125,-150,0,-25,-50,-75,-100,-125,-150,0,-25,-50,-75,-100,-125,0,-25,-50,-75,0,0,-25,-50,-75,-100,-125,-150,-175]
I_iq = [0,0,0,0,0,0,0,0,25,25,25,25,25,25,25,25,50,50,50,50,50,50,50,50,75,75,75,75,75,75,75,75,100,100,100,100,100,100,100,125,125,125,125,125,125,125,150,150,150,150,150,150,175,175,175,175,200,0,0,0,0,0,0,0,0]
var = [-0.040032,0.011188,0.030851,0.183906,0.258842,0.355956,0.560895,0.753436,3.325974,11.611581,12.113206,12.804795,13.11953,13.423358,13.689702,13.899162,17.267299,23.553225,24.495611,25.086743,25.559352,25.953261,26.248565,26.534781,34.935503,35.761774,36.52968,37.227405,37.834295,38.310515,38.715564,38.944562,46.322635,47.382142,48.31467,49.163737,49.897316,50.510074,50.936424,57.367325,58.686137,59.86712,60.871714,61.727998,62.407764,62.902043,68.254704,69.745637,71.075856,72.232987,73.282945,74.110145,78.724496,80.425047,81.965227,83.270788,88.79109,69.950271,1.538601,0.005484,0.160758,0.336944,0.44188,0.568149,0.825262]
    I_id  I_iq        var
0      0     0  -0.040032
1    -25     0   0.011188
2    -50     0   0.030851
3    -75     0   0.183906
4   -100     0   0.258842
..   ...   ...        ...
60   -75     0   0.160758
61  -100     0   0.336944
62  -125     0   0.44188
63  -150     0   0.568149
64  -175     0   0.825262
[65 rows x 3 columns]
I have tried creating a meshgrid:
for var in target_variables:
    x = idq_data["Id"]
    y = idq_data["Iq"]
    z = idq_data[var]
    # Create a grid of data for the target variables
    xi = np.arange(np.min(x), np.max(x), 1)
    yi = np.arange(np.min(y), np.max(y), 1)
    xx, yy = np.meshgrid(xi, yi)
    # Interpolation function using griddata
    zi = griddata((x, y), z, (xx, yy), method='cubic')
However, when I plot the griddata result, the extrapolated points are missing.
# Plot the interpolated data on a contour plot
fig = go.Figure()
fig.add_trace(go.Contour(
    x=xi,
    y=yi,
    z=zi,
    colorscale='Jet',
))
fig.show()
I understand that they are not plotted because they are NaNs, but why have the values not been extrapolated?

You can tinker with the parameters of the radial basis function interpolator (with I_id, I_iq and var converted to NumPy arrays):
xi = np.arange(I_id.min(), 1+I_id.max())
yi = np.arange(I_iq.min(), 1+I_iq.max())
xgrid, ygrid = np.meshgrid(xi, yi)
interpolator = scipy.interpolate.RBFInterpolator(
    y=np.stack((I_id, I_iq), axis=1),
    d=var,
    smoothing=0.1,
)
# Stack on the last axis so that each row of the flattened array is one (Id, Iq) pair
zi = interpolator(np.stack((xgrid, ygrid), axis=-1).reshape((-1, 2))).reshape(xgrid.shape)
but it doesn't work very well here. Instead, I recommend a linear regression over (most of) your data:
'''
[ Id Iq 1 ] [ a ] = [ var ]
[ Id Iq 1 ] [ b ] [ var ]
[ ... 1 ] [ c ] [ ... ]
'''
linear_region = I_id <= -25
affine = np.stack((I_id, I_iq, np.ones_like(I_id)), axis=1)[linear_region, :]
params, *_ = np.linalg.lstsq(affine, var[linear_region], rcond=None)
print(params)
di = np.arange(I_id.min(), 1+I_id.max())[np.newaxis, :]
qi = np.arange(I_iq.min(), 1+I_iq.max())[:, np.newaxis]
vi = params[0]*di + params[1]*qi + params[2]
fig, ax = plt.subplots()
graph: matplotlib.contour.QuadContourSet = ax.contourf(di.ravel(), qi.ravel(), vi)
bar: matplotlib.colorbar.Colorbar = fig.colorbar(graph)
bar.set_label('var')
ax.set_xlabel('I_id')
ax.set_ylabel('I_iq')
plt.show()
producing planar parameters
[-0.01984498 0.47951953 -1.0313259 ]
Don't use the jet colormap; it is not perceptually uniform.
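As for why the griddata call in the question produces NaNs: with method='linear' or method='cubic', scipy.interpolate.griddata only interpolates inside the convex hull of the input points and fills everything outside it with fill_value, which defaults to NaN. It does not extrapolate. The sketch below illustrates this on a toy data set (the four corner points and the plane z = x + y are made up for the illustration) and shows one possible workaround: falling back to method='nearest' outside the hull. Nearest-neighbour filling is a crude form of extrapolation, so treat it as an assumption to check against your data, not as a recommendation.

```python
import numpy as np
from scipy.interpolate import griddata

# Toy samples (hypothetical): corners of the unit square, with z = x + y.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = points[:, 0] + points[:, 1]

# One query inside the convex hull of the samples, one outside it.
queries = np.array([[0.25, 0.5], [2.0, 2.0]])

linear = griddata(points, values, queries, method="linear")
nearest = griddata(points, values, queries, method="nearest")

# Keep the linear result where it exists; use the nearest sample elsewhere.
filled = np.where(np.isnan(linear), nearest, linear)

print(linear)  # second entry is NaN: the point lies outside the hull
print(filled)  # second entry is taken from the nearest sample, (1, 1)
```

The same np.where pattern can be applied to a full meshgrid result such as zi in the question, provided both griddata calls use the same query grid.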

Related

Find overlapping or inserted rows in python

I have a simple DataFrame:
start end
0 30 40
1 45 55
2 50 60
3 53 64
4 65 70
5 75 80
6 77 85
7 80 83
8 90 120
9 95 100
10 105 110
Some rows lie inside another row's range or overlap with it. I want to merge these rows to get this:
start end
0 30 40
1 45 64
2 65 70
3 75 85
4 90 120
Use a custom generator function with the DataFrame constructor:
# Adapted from https://stackoverflow.com/a/5679899/2901002
def merge(times):
    # Sort first so that merging starts from the earliest interval
    times = sorted(sorted(t) for t in times)
    saved = list(times[0])
    for st, en in times:
        if st <= saved[1]:
            # Overlapping or nested interval: extend the current one
            saved[1] = max(saved[1], en)
        else:
            yield tuple(saved)
            saved[0] = st
            saved[1] = en
    yield tuple(saved)

df1 = pd.DataFrame(merge(df[['start','end']].to_numpy()), columns=['start','end'])
print(df1)
start end
0 30 40
1 45 64
2 65 70
3 75 85
4 90 120
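A minimal plain-Python sketch of the same interval-merging idea, independent of pandas. The function name merge_intervals is only illustrative, and the rows list is copied by hand from the question's DataFrame. It sorts the intervals by start, then either extends the last merged interval or opens a new one.

```python
def merge_intervals(intervals):
    """Merge overlapping or nested (start, end) pairs into a sorted list of tuples."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the last merged interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Gap before this interval: start a new merged interval.
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


rows = [(30, 40), (45, 55), (50, 60), (53, 64), (65, 70), (75, 80),
        (77, 85), (80, 83), (90, 120), (95, 100), (105, 110)]
result = merge_intervals(rows)
print(result)  # [(30, 40), (45, 64), (65, 70), (75, 85), (90, 120)]
```

The list of tuples can be passed to pd.DataFrame(result, columns=['start', 'end']) to get the frame shown above.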

Return the closest matched value to [reference] from [ABCD] columns

What is the cleanest way to return the value from the [ABCD] columns that is closest to [reference]?
The output is the closest value, e.g. for the first row the absolute deltas are [19 40 45 95], so the closest value to return is -21.
df1 = pd.DataFrame(np.random.randint(-100,300,size=(100, 4)), columns=list('ABCD')) # Generate Random Dataframe
df2 = pd.DataFrame(np.random.randint(-100,100,size=(100, 1)), columns=['reference'])
df = pd.concat([df1,df2], axis=1)
df['closest_value'] = "?"
df
You can apply a lambda function to the rows and get the closest value from the desired columns, based on the absolute difference from the reference column:
df['closest_value'] = (df
    .apply(
        lambda x: x.values[(np.abs(x[[i for i in x.index if i != 'reference']].values
                                   - x['reference'])).argmin()],
        axis=1)
)
OUTPUT:
A B C D reference closest_value
0 -2 227 -88 268 -68 -88
1 185 182 18 279 -59 18
2 140 40 264 98 61 40
3 0 98 -32 81 47 81
4 -6 70 -6 -9 -53 -9
.. ... ... ... ... ... ...
95 -29 -34 141 166 -76 -34
96 14 22 175 205 69 22
97 265 11 -25 284 -88 -25
98 283 31 -91 252 11 31
99 6 -59 84 95 -15 6
[100 rows x 6 columns]
Try this :
idx = df.drop(['reference'], axis=1).sub(df.reference, axis=0).abs().idxmin(1)
df['closest_value'] = df.lookup(df.index, idx)
>>> display(df)
Edit:
pandas.DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so you can replace this line:
df.lookup(df.index, idx)
with these:
out = df.set_index(idx, append=True)
out['closest_value'] = df.stack()
The cleanest way:
Using a conversion to numpy.
data = df[list('ABCD')].to_numpy()
reference = df[['reference']].to_numpy()
indices = np.abs(data - reference).argmin(axis=1)
df['closest_value'] = data[np.arange(len(data)), indices]
Result:
A B C D reference closest_value
0 -60 254 80 -46 89 80
1 5 10 72 259 41 10
2 219 14 269 -70 0 14
3 171 36 132 45 -55 36
4 7 233 -65 231 -76 -65
.. ... ... ... ... ... ...
95 229 213 -54 129 62 129
96 16 -26 -30 79 94 79
97 105 157 -3 148 -48 -3
98 -27 60 218 273 62 60
99 140 131 -49 28 -46 -49
[100 rows x 6 columns]
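For checking any of the vectorized answers above, a plain-Python reference of the same logic can be handy. This is only a sketch: the helper name closest_to is made up, and the three rows are copied from the first OUTPUT table above. As with argmin, ties go to the first column.

```python
def closest_to(reference, candidates):
    """Return the candidate with the smallest absolute difference to reference."""
    return min(candidates, key=lambda value: abs(value - reference))


rows = [
    {"A": -2, "B": 227, "C": -88, "D": 268, "reference": -68},
    {"A": 185, "B": 182, "C": 18, "D": 279, "reference": -59},
    {"A": 140, "B": 40, "C": 264, "D": 98, "reference": 61},
]
closest = [closest_to(row["reference"], [row[c] for c in "ABCD"]) for row in rows]
print(closest)  # [-88, 18, 40]
```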

replace specific columns values

I want to loop through a dataset and replace the values of specific columns with the same value.
The whole dataset has 91164 rows.
I need to replace vec_red, vec_green and vec_blue with new_data.
new_data has shape (91164,), so it has one entry per row of my dataframe.
For e.g. last item is
This 1 needs to be the value in vec_red, vec_blue and vec_green.
So I want to loop through the whole dataframe and replace the values in columns 3 to 5.
What I have is:
label_idx = 0
for i in range(321):
    for j in range(284):
        (sth here) = new_data[label_idx]
        label_idx += 1
I am updating my pixel values after filtering. Thank you.
The length 91164 is the product 321 * 284; these are the pixel values of an RGB image.
Looping over rows of a dataframe is a code smell. If the 3 columns must receive the same values, you can do it in one single operation:
df[['vec_red', 'vec_green', 'vec_blue']] = np.transpose(
    np.array([new_data, new_data, new_data]))
Demo:
import numpy as np
import pandas as pd

np.random.seed(0)
nx = 284
ny = 321
df = pd.DataFrame({'x_indices': [i for j in range(ny) for i in range(nx)],
                   'y_indices': [j for j in range(ny) for i in range(nx)],
                   'vec_red': np.random.randint(0, 256, nx * ny),
                   'vec_green': np.random.randint(0, 256, nx * ny),
                   'vec_blue': np.random.randint(0, 256, nx * ny)
                   })
new_data = np.random.randint(0, 256, nx * ny)
print(df)
print(new_data)
# Assign the same values to all three colour columns in one operation
df[['vec_red', 'vec_green', 'vec_blue']] = np.transpose(
    np.array([new_data, new_data, new_data]))
print(df)
It gives as expected:
x_indices y_indices vec_red vec_green vec_blue
0 0 0 172 167 100
1 1 0 47 92 124
2 2 0 117 65 174
3 3 0 192 249 72
4 4 0 67 108 144
... ... ... ... ... ...
91159 279 320 16 162 42
91160 280 320 142 169 145
91161 281 320 225 81 143
91162 282 320 106 93 68
91163 283 320 85 65 130
[91164 rows x 5 columns]
[ 32 48 245 ... 26 66 58]
x_indices y_indices vec_red vec_green vec_blue
0 0 0 32 32 32
1 1 0 48 48 48
2 2 0 245 245 245
3 3 0 6 6 6
4 4 0 178 178 178
... ... ... ... ... ...
91159 279 320 27 27 27
91160 280 320 118 118 118
91161 281 320 26 26 26
91162 282 320 66 66 66
91163 283 320 58 58 58
[91164 rows x 5 columns]
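To connect this with the nested loops in the question: the counter label_idx there runs in row-major order, so pixel (i, j) corresponds to flat index i * 284 + j. A NumPy-only sketch of that relationship follows; the stand-in values for new_data are made up, and the variable names rgb_columns and image are only illustrative.

```python
import numpy as np

ny, nx = 321, 284                     # image height and width from the question
new_data = np.arange(ny * nx) % 256   # stand-in for the filtered pixel values

# Repeat the single channel three times: one column each for red, green, blue.
rgb_columns = np.repeat(new_data[:, np.newaxis], 3, axis=1)  # shape (91164, 3)

# Row-major reshape: pixel (i, j) holds the value at flat index i * nx + j,
# which is the order the question's nested loops would visit.
image = rgb_columns.reshape(ny, nx, 3)

i, j = 2, 5
print(image[i, j], new_data[i * nx + j])
```

A (91164, 3) array like rgb_columns can be assigned to the three colour columns in the same way as the transposed array in the answer above.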

Creating circle from data

I have some data, and I am trying to draw a full circle and a half circle from it. Below is the code I have so far, but the curve should start at zero and end at zero, and what it draws is only roughly a half circle. Is there a way to create a half circle and a full circle that start at zero and end at zero? Or to do this using the data without manipulating it?
np.random.seed(15)
data = np.random.randint(0, 100, 100)
print(data)
arr = data - np.mean(data)
arr = np.cumsum(np.sort(arr))
plt.plot(arr)
plt.axhline(0, color="#000000", ls="-.", linewidth=0.5)
plt.show()
[72 12 5 0 28 27 71 75 85 47 93 17 31 23 32 62 10 15 68 39 37 19 44 77
60 29 79 15 56 49 1 31 96 85 26 34 75 50 65 53 70 41 34 40 22 63 79 56
28 99 4 7 66 42 96 7 24 60 45 83 49 53 29 76 88 76 33 2 88 42 81 51
62 23 93 98 87 18 90 90 16 77 90 32 70 4 28 84 35 28 69 54 64 73 84 56
46 38 35 14]
You can use Circle (http://matplotlib.org/api/patches_api.html):
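fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
c = plt.Circle((0, 0), radius=1, edgecolor='b', facecolor='None')
ax.add_patch(c)
# Set limits and an equal aspect so the whole circle is visible and round
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.set_aspect('equal')
plt.show()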
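If the goal is a curve that starts and ends at zero rather than a patch, one option is to generate the points parametrically with x = r cos(t), y = r sin(t). The sketch below is only an illustration under that assumption: the function name circle_points and its parameters are made up, and it does not use the random data from the question. With half=True the angle runs from 0 to pi, so y is zero at both ends.

```python
import math


def circle_points(radius=1.0, n=101, half=False):
    """Return (xs, ys) for a circle, or only its upper half when half=True."""
    end = math.pi if half else 2.0 * math.pi
    angles = [end * k / (n - 1) for k in range(n)]
    xs = [radius * math.cos(t) for t in angles]
    ys = [radius * math.sin(t) for t in angles]
    return xs, ys


xs, ys = circle_points(radius=1.0, n=101, half=True)
print(ys[0], max(ys), ys[-1])  # y is (numerically) zero at both ends
```

The two lists can be passed straight to plt.plot(xs, ys); calling ax.set_aspect('equal') on the axes keeps the circle from looking like an ellipse.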

How to read file and calculate graph properties in igraph python

I am new to python-igraph. I have read the tutorial, but I do not understand it very well.
I need some help calculating the graph properties of my data. My data is in bpseq format, and I do not know how to read the file into igraph.
The graph properties that I need are:
Articulation point
Average path length
Average node betweenness
Variance of node betweenness
Average edge betweenness
Variance of edge betweenness
Average co-citation coupling
Average bibliographic coupling
Average closeness centrality
Diameter
Graph Density
This is an example of my dataset. The # line gives the name of the RNA class, the first column is the position, the letter is the base, and the third column is the pointer. The bases should be the nodes, and the bonds between the nucleotide bases should be the edges, but I do not know how to do this. There are about 2,00 datasets that look like the one below; this one is a single RNA class.
# RF00001_AF095839_1_346-228 5S_rRNA
1 G 118
2 C 117
3 G 116
4 U 115
5 A 114
6 C 113
7 G 112
8 G 111
9 C 110
10 C 0
11 A 0
12 U 0
13 A 0
14 C 0
15 U 0
16 A 0
17 U 0
18 G 0
19 G 36
20 G 35
21 G 34
22 A 33
23 A 0
24 U 0
25 A 0
26 C 0
27 A 0
28 C 0
29 C 0
30 U 0
31 G 0
32 A 0
33 U 22
34 C 21
35 C 20
36 C 19
37 G 0
38 U 106
39 C 105
40 C 104
41 G 103
42 A 0
43 U 0
44 U 0
45 U 0
46 C 0
47 A 0
48 G 0
49 A 0
50 A 0
51 G 0
52 U 0
53 U 0
54 A 0
55 A 0
56 G 67
57 C 66
58 C 65
59 U 64
60 C 0
61 A 0
62 U 0
63 C 0
64 A 59
65 G 58
66 G 57
67 C 56
68 A 0
69 U 0
70 C 0
71 C 0
72 U 0
73 A 0
74 A 0
75 G 0
76 U 0
77 A 0
78 C 0
79 U 0
80 A 0
81 G 96
82 G 95
83 G 94
84 U 93
85 G 92
86 G 91
87 G 0
88 C 0
89 G 0
90 A 0
91 C 86
92 C 85
93 A 84
94 C 83
95 C 82
96 U 81
97 G 0
98 G 0
99 G 0
100 A 0
101 A 0
102 C 0
103 C 41
104 G 40
105 G 39
106 A 38
107 U 0
108 G 0
109 U 0
110 G 9
111 C 8
112 U 7
113 G 6
114 U 5
115 A 4
116 C 3
117 G 2
118 C 1
119 U 0
I am using Ubuntu 18.04. I really hope someone can help me and guide me on using igraph python.
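One possible starting point is to parse the bpseq text into a node list and an edge list in plain Python before involving igraph at all. The sketch below makes several assumptions that the question does not confirm: positions are consecutive and start at 1, a pointer of 0 means "unpaired", and the edges consist of the base pairs plus (optionally) the backbone bonds between neighbouring positions. The function name parse_bpseq and the toy hairpin are hypothetical.

```python
def parse_bpseq(text, include_backbone=True):
    """Parse bpseq text into (name, bases, edges) using 0-based node ids."""
    name, bases, pairs = None, [], set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: keep everything after the '#' as the name.
            name = line.lstrip("#").strip()
            continue
        pos, base, partner = line.split()
        pos, partner = int(pos), int(partner)
        bases.append(base)
        if partner > 0:
            # Store each base pair once, smaller index first.
            pairs.add((min(pos, partner) - 1, max(pos, partner) - 1))
    edges = sorted(pairs)
    if include_backbone:
        # Assumed: consecutive positions are linked along the backbone.
        edges += [(i, i + 1) for i in range(len(bases) - 1)]
    return name, bases, edges


# Hypothetical toy hairpin, not taken from the dataset in the question.
sample = """# toy_hairpin
1 G 6
2 C 5
3 A 0
4 A 0
5 G 2
6 C 1
"""
name, bases, edges = parse_bpseq(sample)
print(name, bases, edges)
```

If python-igraph is installed, the result could then be passed to igraph.Graph(n=len(bases), edges=edges). Its methods articulation_points(), average_path_length(), betweenness(), edge_betweenness(), cocitation(), bibcoupling(), closeness(), diameter() and density() cover the properties listed in the question; the averages and variances would be computed from the lists those methods return.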
