assign value of arbitrary line in 2-d array to nans - python

I have a 2D numpy array, z, in which I would like to set values to NaN based on the equation of a line, +/- a width of 20. I am trying to implement the second-order Raman scattering correction as it is done by the eem_remove_scattering method in the eemR package, described here:
https://cran.r-project.org/web/packages/eemR/vignettes/introduction.html
but the implementation of that method isn't shown there.
import numpy as np
ex = np.array([240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300,
305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365,
370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430,
435, 440, 445, 450])
em = np.array([300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324,
326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350,
352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376,
378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402,
404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 424, 426, 428,
430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454,
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480,
482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502, 504, 506,
508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532,
534, 536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558,
560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582, 584,
586, 588, 590, 592, 594, 596, 598, 600])
X, Y = np.meshgrid(ex, em)
z = np.sin(X) + np.cos(Y)
The equation that I would like to apply is em = -2*ex / (0.00036*ex - 1) + 500.
I want to set every value in the array that intersects this line (+/- 20) to NaN. It's simple enough to set a single element to NaN, but I haven't been able to find a Python function that applies this equation to the array and sets only the values that intersect the line to NaN.
The desired output would be a new array with the same dimensions as z, but with the values that intersect the line set to NaN. Any suggestions on how to proceed are greatly appreciated.

Use np.where in the form np.where(condition_for_intersection, np.nan, z):
zi = np.where(np.abs(-2*X/(0.00036*X-1) + 500 - Y) <= 20, np.nan, z)
As a matter of fact, there are no intersections here: (0.00036*ex-1) is close to -1 for all your values, which makes -2*ex/(0.00036*ex-1) close to 2*ex, and adding 500 pushes the result above every value you have in em. But in principle this works.
Also, I suspect that whatever you plan to achieve by setting those values to NaN would be better achieved with a masked array.
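The masked-array suggestion could look roughly like the sketch below. It is only an illustration: the helper name mask_band and its offset and width parameters are made up here, and offset=0.0 is an arbitrary value used purely so that some cells fall inside the band (with the offset of 500 from the question, nothing is masked).

```python
import numpy as np

# Same grids as in the question, written with arange for brevity.
ex = np.arange(240, 451, 5)
em = np.arange(300, 601, 2)
X, Y = np.meshgrid(ex, em)
z = np.sin(X) + np.cos(Y)

def mask_band(z, X, Y, offset=500.0, width=20.0):
    """Mask cells of z whose emission value lies within +/- width of the line.

    mask_band, offset and width are illustrative names, not part of eemR.
    """
    line = -2 * X / (0.00036 * X - 1) + offset
    return np.ma.masked_where(np.abs(line - Y) <= width, z)

zm = mask_band(z, X, Y)                    # offset from the question
zm_demo = mask_band(z, X, Y, offset=0.0)   # arbitrary offset, for illustration only

print(np.ma.count_masked(zm), np.ma.count_masked(zm_demo))

# If a plain array with NaNs is still needed, the mask can be filled in:
z_nan = zm_demo.filled(np.nan)
```

A masked array keeps the original values underneath the mask, so the correction can be undone or adjusted later, and reductions such as zm_demo.mean() skip the masked cells.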


How to implement different sequences in shell sort in python?

Hi, I have the following code implementing Shell sort in Python. How can I use the following gap sequences in Shell sort with the code below? (Note: these are not the lists I want to sort.)
1, 4, 13, 40, 121, 364, 1093, 3280, 9841, 29524 (Knuth’s sequence)
1, 5, 17, 53, 149, 373, 1123, 3371, 10111, 30341
1, 10, 30, 60, 120, 360, 1080, 3240, 9720, 29160
interval = n // 2
while interval > 0:
    for i in range(interval, n):
        temp = array[i]
        j = i
        while j >= interval and array[j - interval] > temp:
            array[j] = array[j - interval]
            j -= interval
        array[j] = temp
    interval //= 2
You could modify the pseudo-code provided in the Wikipedia article for Shellsort to take in the gap sequence as a parameter:
from random import choices
from timeit import timeit
RAND_SEQUENCE_SIZE = 500
GAP_SEQUENCES = {
    'CIURA_A102549': [701, 301, 132, 57, 23, 10, 4, 1],
    'KNUTH_A003462': [29524, 9841, 3280, 1093, 364, 121, 40, 13, 4, 1],
    'SPACED_OUT_PRIME_GAPS': [30341, 10111, 3371, 1123, 373, 149, 53, 17, 5, 1],
    'SPACED_OUT_EVEN_GAPS': [29160, 9720, 3240, 1080, 360, 120, 60, 30, 10, 1],
}
def shell_sort(seq: list[int], gap_sequence: list[int]) -> None:
    n = len(seq)
    # Start with the largest gap and work down to a gap of 1. Similar to
    # insertion sort but instead of 1, gap is being used in each step.
    for gap in gap_sequence:
        # Do a gapped insertion sort for every element in gaps.
        # Each gap sort includes (0..gap-1) offset interleaved sorting.
        for offset in range(gap):
            for i in range(offset, n, gap):
                # Save seq[i] in temp and make a hole at position i.
                temp = seq[i]
                # Shift earlier gap-sorted elements up until the correct
                # location for seq[i] is found.
                j = i
                while j >= gap and seq[j - gap] > temp:
                    seq[j] = seq[j - gap]
                    j -= gap
                # Put temp (the original seq[i]) in its correct location.
                seq[j] = temp
def main() -> None:
    seq = choices(population=range(1000), k=RAND_SEQUENCE_SIZE)
    print(f'{seq = }')
    print(f'{len(seq) = }')
    for name, gap_sequence in GAP_SEQUENCES.items():
        print(f'Shell sort using {name} gap sequence: {gap_sequence}')
        print(f'Time taken to sort 100 times: {timeit(lambda: shell_sort(seq.copy(), gap_sequence), number=100)} seconds')
if __name__ == '__main__':
    main()
Example Output:
seq = [331, 799, 153, 700, 373, 38, 203, 535, 894, 500, 922, 939, 507, 506, 89, 40, 442, 108, 112, 359, 280, 946, 395, 708, 140, 435, 588, 306, 202, 23, 6, 189, 570, 600, 857, 949, 606, 617, 556, 863, 521, 776, 436, 801, 501, 588, 927, 279, 210, 72, 460, 52, 340, 632, 385, 965, 730, 360, 88, 216, 991, 520, 74, 112, 770, 853, 483, 787, 229, 812, 259, 349, 967, 227, 957, 728, 780, 51, 604, 748, 3, 679, 33, 488, 130, 203, 493, 471, 397, 53, 49, 172, 7, 306, 613, 519, 575, 64, 168, 161, 376, 903, 338, 800, 58, 729, 421, 238, 967, 294, 967, 218, 456, 823, 649, 569, 144, 103, 970, 780, 859, 719, 15, 536, 263, 917, 0, 54, 370, 703, 911, 518, 78, 41, 106, 452, 355, 571, 249, 58, 274, 327, 500, 341, 743, 536, 432, 799, 597, 681, 301, 856, 219, 63, 653, 680, 891, 725, 537, 673, 815, 504, 720, 573, 60, 91, 909, 892, 964, 119, 793, 540, 303, 538, 130, 717, 755, 968, 46, 229, 837, 398, 182, 303, 99, 808, 56, 780, 415, 33, 511, 771, 875, 593, 120, 727, 505, 905, 619, 295, 958, 566, 8, 291, 811, 529, 789, 523, 545, 5, 631, 28, 107, 292, 831, 657, 952, 239, 814, 862, 912, 2, 147, 750, 132, 528, 408, 916, 718, 261, 488, 621, 261, 963, 880, 625, 151, 982, 819, 749, 224, 572, 690, 766, 278, 417, 248, 987, 664, 515, 691, 940, 860, 172, 898, 321, 381, 662, 293, 354, 642, 219, 133, 133, 854, 162, 254, 816, 630, 21, 577, 486, 792, 731, 714, 581, 633, 794, 120, 386, 874, 177, 652, 159, 264, 414, 417, 730, 728, 716, 973, 688, 106, 345, 153, 909, 382, 505, 721, 363, 230, 588, 765, 340, 142, 549, 558, 189, 547, 728, 974, 468, 182, 255, 637, 317, 40, 775, 696, 135, 985, 884, 131, 797, 84, 89, 962, 810, 520, 843, 24, 400, 717, 834, 170, 681, 333, 68, 159, 688, 422, 198, 621, 386, 391, 839, 283, 167, 655, 314, 820, 432, 412, 181, 440, 864, 828, 217, 491, 593, 298, 885, 831, 535, 92, 305, 510, 90, 949, 461, 627, 851, 606, 280, 413, 624, 916, 16, 517, 700, 776, 323, 161, 329, 25, 868, 258, 97, 219, 620, 69, 24, 794, 981, 361, 691, 20, 90, 825, 442, 531, 562, 240, 0, 440, 418, 338, 526, 34, 230, 
381, 598, 734, 925, 209, 231, 980, 122, 374, 752, 144, 105, 920, 780, 828, 948, 515, 443, 810, 81, 303, 751, 779, 516, 394, 455, 116, 448, 652, 293, 327, 367, 793, 47, 946, 653, 927, 910, 583, 845, 442, 989, 393, 490, 564, 54, 656, 689, 626, 531, 941, 575, 628, 865, 705, 219, 42, 19, 10, 155, 436, 319, 510, 520, 869, 101, 918, 170, 826, 146, 389, 200, 992, 404, 982, 889, 818, 684, 524, 642, 991, 973, 561, 104, 418, 207, 963, 192, 410, 33]
len(seq) = 500
Shell sort using CIURA_A102549 gap sequence: [701, 301, 132, 57, 23, 10, 4, 1]
Time taken to sort 100 times: 0.06717020808719099 seconds
Shell sort using KNUTH_A003462 gap sequence: [29524, 9841, 3280, 1093, 364, 121, 40, 13, 4, 1]
Time taken to sort 100 times: 0.34870366705581546 seconds
Shell sort using SPACED_OUT_PRIME_GAPS gap sequence: [30341, 10111, 3371, 1123, 373, 149, 53, 17, 5, 1]
Time taken to sort 100 times: 0.3563524999190122 seconds
Shell sort using SPACED_OUT_EVEN_GAPS gap sequence: [29160, 9720, 3240, 1080, 360, 120, 60, 30, 10, 1]
Time taken to sort 100 times: 0.38147866702638566 seconds
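One thing worth noting about these timings: a gap that is at least as large as len(seq) cannot move any element, yet the loop over offsets still runs once per offset, so the long sequences pay for passes that do nothing on a 500-element list. A minimal sketch of one way around that, for Knuth's sequence, is to generate only the gaps smaller than the list length. The helper name knuth_gaps is an illustrative assumption, and the compact shell_sort here is a simplified variant of the function above, not a replacement for it.

```python
import random

def knuth_gaps(n):
    """Knuth gaps 1, 4, 13, 40, ... that are smaller than n, largest first.

    knuth_gaps is a hypothetical helper name used for this sketch.
    """
    gaps = []
    gap = 1
    while gap < n:
        gaps.append(gap)
        gap = 3 * gap + 1  # next term of Knuth's sequence
    return gaps[::-1]

def shell_sort(seq, gaps):
    """Sort seq in place with a gapped insertion sort for each gap in gaps."""
    for gap in gaps:
        for i in range(gap, len(seq)):
            temp = seq[i]
            j = i
            # Shift larger gap-sorted elements up to make room for temp.
            while j >= gap and seq[j - gap] > temp:
                seq[j] = seq[j - gap]
                j -= gap
            seq[j] = temp

data = random.choices(range(1000), k=500)
expected = sorted(data)
shell_sort(data, knuth_gaps(len(data)))
print(knuth_gaps(500))   # [364, 121, 40, 13, 4, 1]
print(data == expected)  # True
```

The same trimming idea applies to the other fixed sequences: filter out every gap that is not smaller than len(seq) before sorting.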

Selecting Elements in a one dimensional array in Python Numpy

I have created a one-dimensional array in Python NumPy as follows:
import numpy as np
list1=[573, 554, 536, 535, 531, 523, 521, 519, 518, 518, 515, 514, 511, 506, 504, 501, 501, 500, 500, 499, 495, 494, 493, 491, 490, 489, 487, 485, 484, 482, 482, 481, 479, 478, 477, 471, 466, 453, 449, 448, 445, 439, 434, 432, 427, 423, 421, 413, 410, 409, 407, 394, 391, 388, 388, 386, 376, 376, 375, 368]
array_example = np.array(list1)
Now I would like to select the 3rd, 7th and 9th value so I get the values 536, 521, 518.
I try:
array_example[2,6,8]
but get the following error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-4-6335bacdceb1> in <module>
----> 1 array_example[2,6,8]
IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
What would be the right solution? Any suggestions? Ty in advance!
Try this:
array_example[[2,6,8]]
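This is called 'fancy indexing': passing a list of positions selects those elements. Without the inner brackets, array_example[2,6,8] is read as one index per dimension, which is why NumPy complains about 3 indices on a 1-dimensional array.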
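A short sketch of the difference, using only the first ten values of the array from the question:

```python
import numpy as np

# First ten values of the list from the question.
array_example = np.array([573, 554, 536, 535, 531, 523, 521, 519, 518, 518])

# A list (or array) of positions selects several elements at once.
picked = array_example[[2, 6, 8]]
print(picked)  # [536 521 518]

# np.take does the same thing with an explicit function call.
same = np.take(array_example, [2, 6, 8])

# A bare comma-separated index is a tuple, i.e. one index per dimension.
try:
    array_example[2, 6, 8]
except IndexError as err:
    print("IndexError:", err)
```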

Fill area of overlap between two normal distributions in seaborn / matplotlib

I want to fill the area overlapping between two normal distributions. I've got the x min and max, but I can't figure out how to set the y boundaries.
I've looked at the plt documentation and some examples. I think this related question and this one come close, but no luck. Here's what I have so far.
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
pepe_calories = np.array([361, 291, 263, 284, 311, 284, 282, 228, 328, 263, 354, 302, 293,
254, 297, 281, 307, 281, 262, 302, 244, 259, 273, 299, 278, 257,
296, 237, 276, 280, 291, 278, 251, 313, 314, 323, 333, 270, 317,
321, 307, 256, 301, 264, 221, 251, 307, 283, 300, 292, 344, 239,
288, 356, 224, 246, 196, 202, 314, 301, 336, 294, 237, 284, 311,
257, 255, 287, 243, 267, 253, 257, 320, 295, 295, 271, 322, 343,
313, 293, 298, 272, 267, 257, 334, 276, 337, 325, 261, 344, 298,
253, 302, 318, 289, 302, 291, 343, 310, 241])
modern_calories = np.array([310, 315, 303, 360, 339, 416, 278, 326, 316, 314, 333, 317, 357,
304, 363, 387, 279, 350, 367, 321, 366, 311, 308, 303, 299, 363,
335, 357, 392, 321, 361, 285, 321, 290, 392, 341, 331, 338, 326,
314, 327, 320, 293, 333, 297, 315, 365, 408, 352, 359, 312, 300,
263, 358, 345, 360, 336, 378, 315, 354, 318, 300, 372, 305, 336,
286, 296, 413, 383, 328, 418, 388, 416, 371, 313, 321, 321, 317,
402, 290, 328, 344, 330, 319, 309, 327, 351, 324, 278, 369, 416,
359, 381, 324, 306, 350, 385, 335, 395, 308])
ax = sns.distplot(pepe_calories, fit_kws={"color":"blue"}, kde=False,
fit=stats.norm, hist=None, label="Pepe's");
ax = sns.distplot(modern_calories, fit_kws={"color":"orange"}, kde=False,
fit=stats.norm, hist=None, label="Modern");
# Get the two lines from the axes to generate shading
l1 = ax.lines[0]
l2 = ax.lines[1]
# Get the xy data from the lines so that we can shade
x1 = l1.get_xydata()[:,0]
y1 = l1.get_xydata()[:,1]
x2 = l2.get_xydata()[:,0]
y2 = l2.get_xydata()[:,1]
x2min = np.min(x2)
x1max = np.max(x1)
ax.fill_between(x1,y1, where = ((x1 > x2min) & (x1 < x1max)), color="red", alpha=0.3)
#> <matplotlib.collections.PolyCollection at 0x1a200510b8>
plt.legend()
#> <matplotlib.legend.Legend at 0x1a1ff2e390>
plt.show()
Any ideas?
Created on 2018-12-01 by the reprexpy package
import reprexpy
print(reprexpy.SessionInfo())
#> Session info --------------------------------------------------------------------
#> Platform: Darwin-18.2.0-x86_64-i386-64bit (64-bit)
#> Python: 3.6
#> Date: 2018-12-01
#> Packages ------------------------------------------------------------------------
#> matplotlib==2.1.2
#> numpy==1.15.4
#> reprexpy==0.1.1
#> scipy==1.1.0
#> seaborn==0.9.0
While gathering the pdf data from get_xydata is clever, you are now at the mercy of matplotlib's rendering / segmentation algorithm. Having x1 and x2 span different ranges also makes comparing y1 and y2 difficult.
You can avoid these problems by fitting the normals yourself instead of letting sns.distplot do it. Then you have more control over the values you are looking for.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
norm = stats.norm
pepe_calories = np.array([361, 291, 263, 284, 311, 284, 282, 228, 328, 263, 354, 302, 293,
254, 297, 281, 307, 281, 262, 302, 244, 259, 273, 299, 278, 257,
296, 237, 276, 280, 291, 278, 251, 313, 314, 323, 333, 270, 317,
321, 307, 256, 301, 264, 221, 251, 307, 283, 300, 292, 344, 239,
288, 356, 224, 246, 196, 202, 314, 301, 336, 294, 237, 284, 311,
257, 255, 287, 243, 267, 253, 257, 320, 295, 295, 271, 322, 343,
313, 293, 298, 272, 267, 257, 334, 276, 337, 325, 261, 344, 298,
253, 302, 318, 289, 302, 291, 343, 310, 241])
modern_calories = np.array([310, 315, 303, 360, 339, 416, 278, 326, 316, 314, 333, 317, 357,
304, 363, 387, 279, 350, 367, 321, 366, 311, 308, 303, 299, 363,
335, 357, 392, 321, 361, 285, 321, 290, 392, 341, 331, 338, 326,
314, 327, 320, 293, 333, 297, 315, 365, 408, 352, 359, 312, 300,
263, 358, 345, 360, 336, 378, 315, 354, 318, 300, 372, 305, 336,
286, 296, 413, 383, 328, 418, 388, 416, 371, 313, 321, 321, 317,
402, 290, 328, 344, 330, 319, 309, 327, 351, 324, 278, 369, 416,
359, 381, 324, 306, 350, 385, 335, 395, 308])
pepe_params = norm.fit(pepe_calories)
modern_params = norm.fit(modern_calories)
xmin = min(pepe_calories.min(), modern_calories.min())
xmax = max(pepe_calories.max(), modern_calories.max())
x = np.linspace(xmin, xmax, 100)
pepe_pdf = norm(*pepe_params).pdf(x)
modern_pdf = norm(*modern_params).pdf(x)
y = np.minimum(modern_pdf, pepe_pdf)
fig, ax = plt.subplots()
ax.plot(x, pepe_pdf, label="Pepe's", color='blue')
ax.plot(x, modern_pdf, label="Modern", color='orange')
ax.fill_between(x, y, color='red', alpha=0.3)
plt.legend()
plt.show()
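The key step above is np.minimum: the overlap region lies under whichever curve is lower at each x. The sketch below isolates that step without plotting. The means and standard deviations are made-up stand-ins rather than the fitted values, and normal_pdf is a hand-written substitute for stats.norm.pdf so that the snippet needs only NumPy.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a normal distribution (stand-in for scipy.stats.norm.pdf)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Hypothetical parameters, chosen only for illustration.
x = np.linspace(100, 520, 2001)
pdf_a = normal_pdf(x, 285.0, 35.0)
pdf_b = normal_pdf(x, 335.0, 36.0)

# Pointwise minimum: the upper boundary of the overlapping region.
overlap = np.minimum(pdf_a, pdf_b)

# Trapezoidal estimate of the overlapping area (a value between 0 and 1).
area = np.sum((overlap[1:] + overlap[:-1]) / 2 * np.diff(x))
print(round(float(area), 3))
```

Passing overlap as the y argument of ax.fill_between(x, overlap) shades exactly this region, which is what the answer's code does with the fitted curves.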
If, let's say, sns.distplot (or some other plotting function) made a plot that you did not want to have to reproduce, then you could use the data from get_xydata this way:
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
pepe_calories = np.array([361, 291, 263, 284, 311, 284, 282, 228, 328, 263, 354, 302, 293,
254, 297, 281, 307, 281, 262, 302, 244, 259, 273, 299, 278, 257,
296, 237, 276, 280, 291, 278, 251, 313, 314, 323, 333, 270, 317,
321, 307, 256, 301, 264, 221, 251, 307, 283, 300, 292, 344, 239,
288, 356, 224, 246, 196, 202, 314, 301, 336, 294, 237, 284, 311,
257, 255, 287, 243, 267, 253, 257, 320, 295, 295, 271, 322, 343,
313, 293, 298, 272, 267, 257, 334, 276, 337, 325, 261, 344, 298,
253, 302, 318, 289, 302, 291, 343, 310, 241])
modern_calories = np.array([310, 315, 303, 360, 339, 416, 278, 326, 316, 314, 333, 317, 357,
304, 363, 387, 279, 350, 367, 321, 366, 311, 308, 303, 299, 363,
335, 357, 392, 321, 361, 285, 321, 290, 392, 341, 331, 338, 326,
314, 327, 320, 293, 333, 297, 315, 365, 408, 352, 359, 312, 300,
263, 358, 345, 360, 336, 378, 315, 354, 318, 300, 372, 305, 336,
286, 296, 413, 383, 328, 418, 388, 416, 371, 313, 321, 321, 317,
402, 290, 328, 344, 330, 319, 309, 327, 351, 324, 278, 369, 416,
359, 381, 324, 306, 350, 385, 335, 395, 308])
ax = sns.distplot(pepe_calories, fit_kws={"color":"blue"}, kde=False,
fit=stats.norm, hist=None, label="Pepe's");
ax = sns.distplot(modern_calories, fit_kws={"color":"orange"}, kde=False,
fit=stats.norm, hist=None, label="Modern");
# Get the two lines from the axes to generate shading
l1 = ax.lines[0]
l2 = ax.lines[1]
# Get the xy data from the lines so that we can shade
x1, y1 = l1.get_xydata().T
x2, y2 = l2.get_xydata().T
xmin = max(x1.min(), x2.min())
xmax = min(x1.max(), x2.max())
x = np.linspace(xmin, xmax, 100)
y1 = np.interp(x, x1, y1)
y2 = np.interp(x, x2, y2)
y = np.minimum(y1, y2)
ax.fill_between(x, y, color="red", alpha=0.3)
plt.legend()
plt.show()
When you want full control over the resulting plot, it is often simpler not to use seaborn at all. Just calculate the fits, plot them, and use fill_between on the two curves up to the point where they cross each other.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
pepe_calories = np.array(...)
modern_calories = np.array(...)
x = np.linspace(150,470,1000)
y1 = stats.norm.pdf(x, *stats.norm.fit(pepe_calories))
y2 = stats.norm.pdf(x, *stats.norm.fit(modern_calories))
cross = x[y1-y2 <= 0][0]
fig, ax = plt.subplots()
ax.fill_between(x,y1,y2, where=(x<=cross), color="red", alpha=0.3)
ax.plot(x,y1, label="Pepe's")
ax.plot(x,y2, label="Modern")
ax.legend()
plt.show()

Removing outliers from lists/XY scatter

I have two lists containing heart beat intervals (Y-axis, in ms; IBIs below) and their absolute timepoints (X-axis, in ms; RR_times below). There are some misreadings, such that the first list contains outliers that need to be removed, and the second one their corresponding timepoints. It would be optimal if the outliers in the first list are NaN-ed so that the total time for the recording remains the same.
RR_times = [411, 827, 1241, 1653, 2066, 2481, 2894, 3308,
3714, 4126, 4532, 4938, 5343, 5751, 6156, 6552,
6951, 7346, 7749, 8149, 8546, 8944, 9338, 9735,
10123, 10511, 10905, 11290, 11675, 12060, 12441, 12825,
13205, 13581, 13960, 14342, 14717, 15087, 15462, 15829,
16204, 16531, 16902, 17304, 17670, 18040, 18398, 18762,
19127, 19465, 19823, 20196, 20554, 20906, 21256, 21609,
21959, 22264, 22637, 22995, 23308, 23649, 24012, 24352,
24687, 25026, 25390, 25681, 26014, 26347, 26680, 27330,
27985, 28628, 28951, 29596, 29915, 30238, 30562, 31191,
31826, 32141, 32461, 32775, 33095, 33382, 33695, 34029,
34341, 34654, 34967, 35281, 35595, 36220, 36530, 36844,
37150, 37462, 37775, 38084, 38395, 38703, 39014, 39324,
39632, 39937, 40246, 40554, 40862, 41169, 41479, 41787,
42095, 42406, 42714, 43019, 43330, 43642, 43945, 44254,
44563, 44871, 45183, 45491, 45796, 46101, 46410, 46713,
47327, 47632, 47937, 48244, 48555, 48867, 49177, 49488,
49792, 50094, 50398, 50707, 50993, 51324, 51626, 51931,
52239, 52550, 52857, 53161, 53773, 54080, 54387, 54693,
54998, 55311, 55617, 55924, 56235, 56547, 56852, 57159,
57470, 57781, 58091, 58400, 58709, 59020, 59331, 59644,
59955, 60265, 60579, 60890, 61206, 61521, 61833, 62149,
62463, 62772, 63088, 63403, 63716, 64034, 64352, 64665,
64984, 65624, 65940, 66262, 66578, 66900, 67221, 67543,
67861, 68179, 68504, 68819, 69145, 69459, 69782, 70111,
70428, 70747, 71070, 71389, 71710, 72036, 72358, 72680,
73003, 73326, 73648, 73973, 74296, 74620, 74944, 75269,
75592, 75916, 76241, 76566, 76889, 77216, 77543, 77869,
78191, 78518, 78843, 79165, 79496, 79823, 80148, 80479,
80803, 81128, 81459, 81783, 82110, 82439, 82771, 83095,
83426, 83757, 84086, 84416, 84741, 85074, 85400, 85729,
86060, 86390, 86719, 87051, 87380, 87711, 88041, 88373,
88705, 89029, 89365, 89698, 90023, 90356, 90690, 91019,
91352, 91684, 92014, 92347, 92681, 93014, 93349, 93678,
94011, 94344, 94675, 95009, 95339, 95673, 96007, 96341,
96668, 97002, 97337, 97665, 98003, 98335, 98668, 99003,
99339, 99673, 100007, 100346, 100684, 101017, 101357, 101693,
102028, 102368, 102705, 103043, 103380, 103718, 104061, 104403,
104736, 105077, 105421, 105756, 106096, 106437, 106777, 107118,
107461, 107800, 108141, 108485, 108822, 109167, 109507, 109848,
110196, 110538, 110884, 111230, 111571, 111918, 112263, 112606,
112952, 113639, 113987, 114336, 114680, 115025, 115372, 115722,
116068, 116418, 116766, 117114, 117464, 117811, 118158, 118511,
118858, 119208, 119557, 119904, 120257, 120606, 120952, 121303,
121655, 122003, 122354, 122707, 123057, 123408, 123760, 124114,
124466, 124815, 125172, 125523, 125879, 126231, 126586, 126946,
127298, 127653, 128014, 128369, 128724, 129084, 129441, 129794,
130150, 130504, 130863, 131219, 131576, 131937, 132297, 132653,
133012, 133375, 133731, 134091, 134455, 134813, 135174, 135534,
135897, 136258, 136621, 136986, 137349, 137711, 138073, 138439,
138799, 139164, 139526, 139887, 140253, 140617, 140977, 141344,
141706, 142071, 142438, 142803, 143170, 143537, 143904, 144274,
144641, 145011, 145382, 145749, 146124, 146493, 146864, 147235,
147605, 147977, 148346, 148718, 149085, 149455, 149826, 150195,
150566, 150936, 151310, 151676, 152048, 152423, 152795, 153167,
153539, 153916, 154290, 154661, 155036, 155408, 155782, 156159,
156530, 156905, 157280, 157655, 158029, 158404, 158783, 159157,
159532, 159910, 160290, 160660, 161037, 161415, 161786, 162161,
162538, 162913, 163289, 163665, 164040, 164415, 164789, 165164,
165539, 165911, 166286, 166661, 167040, 167418, 167791, 168169,
168545, 168922, 169300, 169676, 170053, 170429, 170811, 171195,
171571, 171952, 172335, 172717, 173098, 173484, 173869, 174254,
174637, 175020, 175403, 175785, 176167, 176552, 176933, 177316,
177698, 178080, 178463, 178840, 179224, 179603, 179979, 180360,
180739, 181114, 181492, 181870, 182248, 182626, 183001, 183378,
183752, 184128, 184503, 184876, 185252, 185629, 186003, 186384,
186760, 187134, 187515, 187900, 188281, 188656, 189031, 189415,
189798, 190176, 190555, 190936, 191313, 191692, 192069, 192448,
192824, 193203, 193578, 193953, 194330, 194707]
IBIs = [411,416,414,412,413,415, 413, 414, 406, 412, 406, 406, 405,
408, 405, 396, 399, 395, 403, 400, 397, 398, 394, 397, 388, 388,
394, 385, 385, 385, 381, 384, 380, 376, 379, 382, 375, 370, 375,
367, 375, 327, 371, 402, 366, 370, 358, 364, 365, 338, 358, 373,
358, 352, 350, 353, 350, 305, 373, 358, 313, 341, 363, 340, 335,
339, 364, 291, 333, 333, 333, 650, 655, 643, 323, 645, 319, 323,
324, 629, 635, 315, 320, 314, 320, 287, 313, 334, 312, 313, 313,
314, 314, 625, 310, 314, 306, 312, 313, 309, 311, 308, 311, 310,
308, 305, 309, 308, 308, 307, 310, 308, 308, 311, 308, 305, 311,
312, 303, 309, 309, 308, 312, 308, 305, 305, 309, 303, 614, 305,
305, 307, 311, 312, 310, 311, 304, 302, 304, 309, 286, 331, 302,
305, 308, 311, 307, 304, 612, 307, 307, 306, 305, 313, 306, 307,
311, 312, 305, 307, 311, 311, 310, 309, 309, 311, 311, 313, 311,
310, 314, 311, 316, 315, 312, 316, 314, 309, 316, 315, 313, 318,
318, 313, 319, 640, 316, 322, 316, 322, 321, 322, 318, 318, 325,
315, 326, 314, 323, 329, 317, 319, 323, 319, 321, 326, 322, 322,
323, 323, 322, 325, 323, 324, 324, 325, 323, 324, 325, 325, 323,
327, 327, 326, 322, 327, 325, 322, 331, 327, 325, 331, 324, 325,
331, 324, 327, 329, 332, 324, 331, 331, 329, 330, 325, 333, 326,
329, 331, 330, 329, 332, 329, 331, 330, 332, 332, 324, 336, 333,
325, 333, 334, 329, 333, 332, 330, 333, 334, 333, 335, 329, 333,
333, 331, 334, 330, 334, 334, 334, 327, 334, 335, 328, 338, 332,
333, 335, 336, 334, 334, 339, 338, 333, 340, 336, 335, 340, 337,
338, 337, 338, 343, 342, 333, 341, 344, 335, 340, 341, 340, 341,
343, 339, 341, 344, 337, 345, 340, 341, 348, 342, 346, 346, 341,
347, 345, 343, 346, 687, 348, 349, 344, 345, 347, 350, 346, 350,
348, 348, 350, 347, 347, 353, 347, 350, 349, 347, 353, 349, 346,
351, 352, 348, 351, 353, 350, 351, 352, 354, 352, 349, 357, 351,
356, 352, 355, 360, 352, 355, 361, 355, 355, 360, 357, 353, 356,
354, 359, 356, 357, 361, 360, 356, 359, 363, 356, 360, 364, 358,
361, 360, 363, 361, 363, 365, 363, 362, 362, 366, 360, 365, 362,
361, 366, 364, 360, 367, 362, 365, 367, 365, 367, 367, 367, 370,
367, 370, 371, 367, 375, 369, 371, 371, 370, 372, 369, 372, 367,
370, 371, 369, 371, 370, 374, 366, 372, 375, 372, 372, 372, 377,
374, 371, 375, 372, 374, 377, 371, 375, 375, 375, 374, 375, 379,
374, 375, 378, 380, 370, 377, 378, 371, 375, 377, 375, 376, 376,
375, 375, 374, 375, 375, 372, 375, 375, 379, 378, 373, 378, 376,
377, 378, 376, 377, 376, 382, 384, 376, 381, 383, 382, 381, 386,
385, 385, 383, 383, 383, 382, 382, 385, 381, 383, 382, 382, 383,
377, 384, 379, 376, 381, 379, 375, 378, 378, 378, 378, 375, 377,
374, 376, 375, 373, 376, 377, 374, 381, 376, 374, 381, 385, 381,
375, 375, 384, 383, 378, 379, 381, 377, 379, 377, 379, 376, 379,
375, 375, 377, 377]
Plotting the whole dataset gives:
I previously used an above/below filter, but that does not work for longer recordings in which the trace spans a wider range of values (in some recordings the intervals range from 300 during training to 1500 after a period of rest).
What is the best way to remove the outliers in this case, and how would one go about implementing it? Moving average, exclusion based on stdev, median filter...?
Here's an ugly approach that seems to work:
import numpy as np
RR_times = np.array([411, 827, 1241, ...])
IBIs = np.array([411, 416, 414, ...])
# Absolute beat-to-beat differences and the two means, computed once
diffs = np.abs(np.diff(IBIs))
mean_diff = np.mean(diffs)
mean_ibi = np.mean(IBIs)
IBIs_cleaned = np.full(IBIs.shape, np.nan)  # create an array full of NaNs
IBIs_cleaned[0] = IBIs[0]  # assume the first value isn't an outlier
for i in range(1, len(IBIs)):
    # Keep a value only if it is close to its predecessor and not far above the overall mean
    if np.abs(IBIs[i] - IBIs[i-1]) < mean_diff and IBIs[i] < 1.6 * mean_ibi:
        IBIs_cleaned[i] = IBIs[i]
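Since the question also mentions a median filter, here is a minimal sketch of that alternative: compare each interval with the median of its neighbourhood and set it to NaN when it deviates too much. The function name nan_outliers, the window of 5 samples and the 20 % tolerance are all assumptions chosen for illustration and would need tuning on real recordings. It relies on np.lib.stride_tricks.sliding_window_view, which requires NumPy 1.20 or newer.

```python
import numpy as np

def nan_outliers(ibis, window=5, tolerance=0.2):
    """Return a float copy of ibis with local outliers replaced by NaN.

    A value counts as an outlier when it differs from the median of the
    surrounding `window` samples by more than `tolerance` times that median.
    `window` must be odd. All names and defaults here are illustrative.
    """
    ibis = np.asarray(ibis, dtype=float)
    half = window // 2
    # Repeat the edge values so every sample has a full window.
    padded = np.pad(ibis, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    local_median = np.median(windows, axis=1)
    cleaned = ibis.copy()
    cleaned[np.abs(ibis - local_median) > tolerance * local_median] = np.nan
    return cleaned

# A short made-up trace with two doubled intervals (missed beats).
sample = [310, 314, 306, 312, 625, 309, 311, 308, 614, 305, 305, 307]
cleaned = nan_outliers(sample)
print(np.flatnonzero(np.isnan(cleaned)))  # [4 8]
```

Because the threshold follows the local median, it adapts when the intervals drift from around 300 to much larger values over a long recording, which a fixed above/below filter cannot do. Stretches where outliers outnumber the good values inside a window would need a wider window.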

How can I use the numpy.ndarray output generated in one function as the input into another function?

Firstly, here is the script:
import numpy as np
import osgeo.gdal
import os
ArbitraryXCoords = np.arange(435531.30622598,440020.30622598,400)
ArbitraryYCoords = np.arange(5634955.28972479,5638945.28972479,400)
os.chdir('/home/foo/GIS_Summer2013')
dataset = osgeo.gdal.Open("Raster_DEM")
gt = dataset.GetGeoTransform()
def XAndYArrays(spacing):
    XPoints = np.arange(gt[0], gt[0] + dataset.RasterXSize * gt[1], spacing)
    YPoints = np.arange(gt[3] + dataset.RasterYSize * gt[5], gt[3], spacing)
    return (XPoints, YPoints)
def RasterPoints(XCoords,YCoords):
    a=[]
    for row in YCoords:
        for col in XCoords:
            rasterx = int((col - gt[0]) / gt[1])
            rastery = int((row - gt[3]) / gt[5])
            band = int(dataset.GetRasterBand(1).ReadAsArray(rasterx,rastery, 1, 1)[0][0])
            a[len(a):] = [band]
    foo = np.asarray(a)
    bar = foo.reshape(YCoords.size,XCoords.size)
    return bar
When I load the script that is presented above, I am unable to use the output from the function XAndYArrays as input in the function RasterPoints. But I am able to use the numpy.ndarray that I have defined manually as input in the function RasterPoints. But this is not good enough. I need to be able to use the output from XAndYArrays as input in RasterPoints.
Here are the commands that I used at the PyDev interactive console:
>>> Eastings,Northings = XAndYArrays(400)
>>> Eastings
Out[1]:
array([ 435530.30622598, 435930.30622598, 436330.30622598,
436730.30622598, 437130.30622598, 437530.30622598,
437930.30622598, 438330.30622598, 438730.30622598,
439130.30622598, 439530.30622598, 439930.30622598])
>>> Northings
Out[1]:
array([ 5634954.28972479, 5635354.28972479, 5635754.28972479,
5636154.28972479, 5636554.28972479, 5636954.28972479,
5637354.28972479, 5637754.28972479, 5638154.28972479,
5638554.28972479, 5638954.28972479])
>>> RasterPoints(Eastings, Northings)
ERROR 5: MergedDEM_EPSG3159_Reduced, band 1: Access window out of range in RasterIO(). Requested (0,246) of size 1x1 on raster of 269x246.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2538, in run_code
exec code_obj in self.user_global_ns, self.user_ns
File "<ipython-input-1-326be9918188>", line 1, in <module>
RasterPoints(Eastings, Northings)
File "/home/foo/GIS_Summer2013/src/22July_StackOverflowQuestion.py", line 23, in RasterPoints
band = int(dataset.GetRasterBand(1).ReadAsArray(rasterx,rastery, 1, 1)[0][0])
TypeError: 'NoneType' object has no attribute '__getitem__'
>>> RasterPoints(ArbitraryXCoords, ArbitraryYCoords)
Out[1]:
array([[422, 422, 431, 439, 428, 399, 410, 395, 398, 413, 409, 386],
[414, 428, 421, 430, 426, 403, 409, 410, 406, 408, 412, 406],
[420, 428, 427, 424, 408, 406, 428, 420, 408, 410, 409, 420],
[392, 418, 426, 430, 414, 428, 430, 418, 433, 414, 402, 399],
[400, 411, 420, 406, 401, 405, 398, 420, 419, 400, 401, 414],
[408, 421, 418, 428, 399, 398, 405, 412, 421, 406, 395, 397],
[399, 404, 398, 401, 400, 399, 399, 398, 398, 419, 399, 395],
[401, 410, 407, 407, 404, 400, 398, 397, 397, 399, 400, 398],
[400, 410, 418, 405, 401, 400, 397, 398, 400, 398, 397, 396],
[389, 387, 399, 408, 423, 400, 407, 398, 411, 408, 410, 420]])
>>> print "partial success"
partial success
You are trying to read a pixel location that doesn't exist because it is outside the raster dimensions: the error reports a request for row 246 on a raster that has 246 rows, whose valid row indices are 0 to 245.
Try calculating your YPoints with:
YPoints = np.arange(gt[3] + (dataset.RasterYSize-1) * gt[5], gt[3], abs(gt[5]))
Your RasterPoints function is also inefficient: it reads the array values one at a time, stores them in a list, and then builds an array from that list. It is simpler to read the band into an array once and index that.
Furthermore, I think it's good practice on SO to close/resolve previous questions before opening another one.
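A rough sketch of the "read once, then index" idea follows. To keep it runnable without GDAL it works on a plain NumPy array; with a real dataset the array would presumably come from dataset.GetRasterBand(1).ReadAsArray() called without a window. The function name sample_raster, the toy band and the geotransform values are all invented for illustration. Coordinates that fall outside the raster come back as NaN instead of raising an error.

```python
import numpy as np

def sample_raster(band, gt, x_coords, y_coords):
    """Look up band values at every (y, x) combination of the given coordinates.

    band is a 2D array (rows, cols); gt is a GDAL-style geotransform tuple.
    sample_raster is a hypothetical helper, not part of GDAL.
    """
    # Convert map coordinates to pixel indices. floor (rather than int())
    # keeps points just outside the top/left edge from landing on pixel 0.
    cols = np.floor((np.asarray(x_coords) - gt[0]) / gt[1]).astype(int)
    rows = np.floor((np.asarray(y_coords) - gt[3]) / gt[5]).astype(int)
    inside_cols = (cols >= 0) & (cols < band.shape[1])
    inside_rows = (rows >= 0) & (rows < band.shape[0])
    out = np.full((rows.size, cols.size), np.nan)
    # np.ix_ builds the row/column cross product, so no Python loops are needed.
    out[np.ix_(inside_rows, inside_cols)] = band[
        np.ix_(rows[inside_rows], cols[inside_cols])
    ]
    return out

# Toy 3x4 raster: origin at x=100, y=230, 10-unit pixels, north-up.
band = np.arange(12).reshape(3, 4)
gt = (100.0, 10.0, 0.0, 230.0, 0.0, -10.0)
result = sample_raster(band, gt, [105, 125, 145], [225, 205, 195])
print(result)
```

In this toy case the last x coordinate and the last y coordinate lie outside the raster, so the final row and column of the result are NaN while the other four cells hold the looked-up pixel values.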
