Hi, I have the following code for implementing Shell sort in Python. How can I use the following gap sequences in Shell sort with the code below? (Note: these are gap sequences, not the list I want to sort.)
1, 4, 13, 40, 121, 364, 1093, 3280, 9841, 29524 (Knuth's sequence)
1, 5, 17, 53, 149, 373, 1123, 3371, 10111, 30341
1, 10, 30, 60, 120, 360, 1080, 3240, 9720, 29160
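def shell_sort(array):
    n = len(array)
    # Halving gap sequence: n // 2, n // 4, ..., 1
    interval = n // 2
    while interval > 0:
        for i in range(interval, n):
            temp = array[i]
            j = i
            # Gapped insertion: shift larger elements up by one interval
            while j >= interval and array[j - interval] > temp:
                array[j] = array[j - interval]
                j -= interval
            array[j] = temp
        interval //= 2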
You could modify the pseudo-code provided in the Wikipedia article for Shellsort to take in the gap sequence as a parameter:
from random import choices
from timeit import timeit
RAND_SEQUENCE_SIZE = 500
GAP_SEQUENCES = {
    'CIURA_A102549': [701, 301, 132, 57, 23, 10, 4, 1],
    'KNUTH_A003462': [29524, 9841, 3280, 1093, 364, 121, 40, 13, 4, 1],
    'SPACED_OUT_PRIME_GAPS': [30341, 10111, 3371, 1123, 373, 149, 53, 17, 5, 1],
    'SPACED_OUT_EVEN_GAPS': [29160, 9720, 3240, 1080, 360, 120, 60, 30, 10, 1],
}
def shell_sort(seq: list[int], gap_sequence: list[int]) -> None:
    n = len(seq)
    # Start with the largest gap and work down to a gap of 1. Similar to
    # insertion sort but instead of 1, gap is being used in each step.
    for gap in gap_sequence:
        # Do a gapped insertion sort for every element in gaps.
        # Each gap sort includes (0..gap-1) offset interleaved sorting.
        for offset in range(gap):
            for i in range(offset, n, gap):
                # Save seq[i] in temp and make a hole at position i.
                temp = seq[i]
                # Shift earlier gap-sorted elements up until the correct
                # location for seq[i] is found.
                j = i
                while j >= gap and seq[j - gap] > temp:
                    seq[j] = seq[j - gap]
                    j -= gap
                # Put temp (the original seq[i]) in its correct location.
                seq[j] = temp
def main() -> None:
    seq = choices(population=range(1000), k=RAND_SEQUENCE_SIZE)
    print(f'{seq = }')
    print(f'{len(seq) = }')
    for name, gap_sequence in GAP_SEQUENCES.items():
        print(f'Shell sort using {name} gap sequence: {gap_sequence}')
        print(f'Time taken to sort 100 times: {timeit(lambda: shell_sort(seq.copy(), gap_sequence), number=100)} seconds')
if __name__ == '__main__':
    main()
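Rather than hard-coding Knuth's sequence, it can be generated from the recurrence h = 3h + 1 (equivalently (3^k - 1)/2), keeping only gaps smaller than the list length. This is a minimal sketch; knuth_gaps is a hypothetical helper name, and its result could be passed as gap_sequence to shell_sort above. The other two sequences in the question come with no formula, so they stay as literal lists.

```python
def knuth_gaps(n: int) -> list[int]:
    """Knuth's gaps 1, 4, 13, 40, ... that are smaller than n, largest first."""
    gaps = []
    h = 1
    while h < n:
        gaps.append(h)
        h = 3 * h + 1  # next term of the recurrence
    # Always return at least [1] so the final pass is a plain insertion sort.
    return gaps[::-1] or [1]


print(knuth_gaps(500))  # [364, 121, 40, 13, 4, 1]
```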
Example Output:
seq = [331, 799, 153, 700, 373, 38, 203, 535, 894, 500, 922, 939, 507, 506, 89, 40, 442, 108, 112, 359, 280, 946, 395, 708, 140, 435, 588, 306, 202, 23, 6, 189, 570, 600, 857, 949, 606, 617, 556, 863, 521, 776, 436, 801, 501, 588, 927, 279, 210, 72, 460, 52, 340, 632, 385, 965, 730, 360, 88, 216, 991, 520, 74, 112, 770, 853, 483, 787, 229, 812, 259, 349, 967, 227, 957, 728, 780, 51, 604, 748, 3, 679, 33, 488, 130, 203, 493, 471, 397, 53, 49, 172, 7, 306, 613, 519, 575, 64, 168, 161, 376, 903, 338, 800, 58, 729, 421, 238, 967, 294, 967, 218, 456, 823, 649, 569, 144, 103, 970, 780, 859, 719, 15, 536, 263, 917, 0, 54, 370, 703, 911, 518, 78, 41, 106, 452, 355, 571, 249, 58, 274, 327, 500, 341, 743, 536, 432, 799, 597, 681, 301, 856, 219, 63, 653, 680, 891, 725, 537, 673, 815, 504, 720, 573, 60, 91, 909, 892, 964, 119, 793, 540, 303, 538, 130, 717, 755, 968, 46, 229, 837, 398, 182, 303, 99, 808, 56, 780, 415, 33, 511, 771, 875, 593, 120, 727, 505, 905, 619, 295, 958, 566, 8, 291, 811, 529, 789, 523, 545, 5, 631, 28, 107, 292, 831, 657, 952, 239, 814, 862, 912, 2, 147, 750, 132, 528, 408, 916, 718, 261, 488, 621, 261, 963, 880, 625, 151, 982, 819, 749, 224, 572, 690, 766, 278, 417, 248, 987, 664, 515, 691, 940, 860, 172, 898, 321, 381, 662, 293, 354, 642, 219, 133, 133, 854, 162, 254, 816, 630, 21, 577, 486, 792, 731, 714, 581, 633, 794, 120, 386, 874, 177, 652, 159, 264, 414, 417, 730, 728, 716, 973, 688, 106, 345, 153, 909, 382, 505, 721, 363, 230, 588, 765, 340, 142, 549, 558, 189, 547, 728, 974, 468, 182, 255, 637, 317, 40, 775, 696, 135, 985, 884, 131, 797, 84, 89, 962, 810, 520, 843, 24, 400, 717, 834, 170, 681, 333, 68, 159, 688, 422, 198, 621, 386, 391, 839, 283, 167, 655, 314, 820, 432, 412, 181, 440, 864, 828, 217, 491, 593, 298, 885, 831, 535, 92, 305, 510, 90, 949, 461, 627, 851, 606, 280, 413, 624, 916, 16, 517, 700, 776, 323, 161, 329, 25, 868, 258, 97, 219, 620, 69, 24, 794, 981, 361, 691, 20, 90, 825, 442, 531, 562, 240, 0, 440, 418, 338, 526, 34, 230, 
381, 598, 734, 925, 209, 231, 980, 122, 374, 752, 144, 105, 920, 780, 828, 948, 515, 443, 810, 81, 303, 751, 779, 516, 394, 455, 116, 448, 652, 293, 327, 367, 793, 47, 946, 653, 927, 910, 583, 845, 442, 989, 393, 490, 564, 54, 656, 689, 626, 531, 941, 575, 628, 865, 705, 219, 42, 19, 10, 155, 436, 319, 510, 520, 869, 101, 918, 170, 826, 146, 389, 200, 992, 404, 982, 889, 818, 684, 524, 642, 991, 973, 561, 104, 418, 207, 963, 192, 410, 33]
len(seq) = 500
Shell sort using CIURA_A102549 gap sequence: [701, 301, 132, 57, 23, 10, 4, 1]
Time taken to sort 100 times: 0.06717020808719099 seconds
Shell sort using KNUTH_A003462 gap sequence: [29524, 9841, 3280, 1093, 364, 121, 40, 13, 4, 1]
Time taken to sort 100 times: 0.34870366705581546 seconds
Shell sort using SPACED_OUT_PRIME_GAPS gap sequence: [30341, 10111, 3371, 1123, 373, 149, 53, 17, 5, 1]
Time taken to sort 100 times: 0.3563524999190122 seconds
Shell sort using SPACED_OUT_EVEN_GAPS gap sequence: [29160, 9720, 3240, 1080, 360, 120, 60, 30, 10, 1]
Time taken to sort 100 times: 0.38147866702638566 seconds
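One plausible contributor to the slower timings of the three long sequences (an observation about the loop structure, not something measured here): for offset in range(gap) still iterates over every offset when gap is larger than the list, and range(offset, n, gap) is empty for every offset >= n, so the 29524 pass alone spins through 29524 mostly empty iterations. A pass with gap >= n cannot move any element, so it can be skipped. Below is a sketch using the single-loop gapped insertion from the question's code; shell_sort_skip_large_gaps is a hypothetical name, and the gap sequence must still end in 1 for the result to be sorted.

```python
from random import choices


def shell_sort_skip_large_gaps(seq: list[int], gap_sequence: list[int]) -> None:
    """In-place Shell sort that ignores gaps that are >= len(seq)."""
    n = len(seq)
    for gap in gap_sequence:
        if gap >= n:
            continue  # no two elements are `gap` apart, so this pass is a no-op
        # Gapped insertion sort over all interleaved sublists at once.
        for i in range(gap, n):
            temp = seq[i]
            j = i
            while j >= gap and seq[j - gap] > temp:
                seq[j] = seq[j - gap]
                j -= gap
            seq[j] = temp


# Usage: sort a copy of a random list with the Knuth gaps from the answer.
seq = choices(population=range(1000), k=500)
result = seq.copy()
shell_sort_skip_large_gaps(result, [29524, 9841, 3280, 1093, 364, 121, 40, 13, 4, 1])
print(result == sorted(seq))  # True
```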
I have created a one-dimensional NumPy array in Python as follows:
import numpy as np
list1=[573, 554, 536, 535, 531, 523, 521, 519, 518, 518, 515, 514, 511, 506, 504, 501, 501, 500, 500, 499, 495, 494, 493, 491, 490, 489, 487, 485, 484, 482, 482, 481, 479, 478, 477, 471, 466, 453, 449, 448, 445, 439, 434, 432, 427, 423, 421, 413, 410, 409, 407, 394, 391, 388, 388, 386, 376, 376, 375, 368]
array_example = np.array(list1)
Now I would like to select the 3rd, 7th and 9th value so I get the values 536, 521, 518.
I try:
array_example[2,6,8]
but get the following error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-4-6335bacdceb1> in <module>
----> 1 array_example[2,6,8]
IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
What would be the right solution? Any suggestions? Thanks in advance!
Try this:
array_example[[2,6,8]]
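It is called 'fancy indexing'. Comma-separated indices such as array_example[2,6,8] are read as one index per axis, which is why a 1-D array complains about three indices; wrapping the positions in a list passes a single integer index array instead.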
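A self-contained check, using only the first ten values from the question's list1 (enough to reach index 8). np.take is the function form of the same selection.

```python
import numpy as np

# First ten values from the question's list1 (the rest are not needed here).
list1 = [573, 554, 536, 535, 531, 523, 521, 519, 518, 518]
array_example = np.array(list1)

positions = [2, 6, 8]  # zero-based indices of the 3rd, 7th and 9th elements
selected = array_example[positions]            # fancy indexing with an integer list
also_selected = np.take(array_example, positions)  # equivalent function form
print(selected)  # [536 521 518]
```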
I want to fill the area overlapping between two normal distributions. I've got the x min and max, but I can't figure out how to set the y boundaries.
I've looked at the plt documentation and some examples. I think this related question and this one come close, but no luck. Here's what I have so far.
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
pepe_calories = np.array([361, 291, 263, 284, 311, 284, 282, 228, 328, 263, 354, 302, 293,
254, 297, 281, 307, 281, 262, 302, 244, 259, 273, 299, 278, 257,
296, 237, 276, 280, 291, 278, 251, 313, 314, 323, 333, 270, 317,
321, 307, 256, 301, 264, 221, 251, 307, 283, 300, 292, 344, 239,
288, 356, 224, 246, 196, 202, 314, 301, 336, 294, 237, 284, 311,
257, 255, 287, 243, 267, 253, 257, 320, 295, 295, 271, 322, 343,
313, 293, 298, 272, 267, 257, 334, 276, 337, 325, 261, 344, 298,
253, 302, 318, 289, 302, 291, 343, 310, 241])
modern_calories = np.array([310, 315, 303, 360, 339, 416, 278, 326, 316, 314, 333, 317, 357,
304, 363, 387, 279, 350, 367, 321, 366, 311, 308, 303, 299, 363,
335, 357, 392, 321, 361, 285, 321, 290, 392, 341, 331, 338, 326,
314, 327, 320, 293, 333, 297, 315, 365, 408, 352, 359, 312, 300,
263, 358, 345, 360, 336, 378, 315, 354, 318, 300, 372, 305, 336,
286, 296, 413, 383, 328, 418, 388, 416, 371, 313, 321, 321, 317,
402, 290, 328, 344, 330, 319, 309, 327, 351, 324, 278, 369, 416,
359, 381, 324, 306, 350, 385, 335, 395, 308])
ax = sns.distplot(pepe_calories, fit_kws={"color":"blue"}, kde=False,
fit=stats.norm, hist=None, label="Pepe's");
ax = sns.distplot(modern_calories, fit_kws={"color":"orange"}, kde=False,
fit=stats.norm, hist=None, label="Modern");
# Get the two lines from the axes to generate shading
l1 = ax.lines[0]
l2 = ax.lines[1]
# Get the xy data from the lines so that we can shade
x1 = l1.get_xydata()[:,0]
y1 = l1.get_xydata()[:,1]
x2 = l2.get_xydata()[:,0]
y2 = l2.get_xydata()[:,1]
x2min = np.min(x2)
x1max = np.max(x1)
ax.fill_between(x1,y1, where = ((x1 > x2min) & (x1 < x1max)), color="red", alpha=0.3)
#> <matplotlib.collections.PolyCollection at 0x1a200510b8>
plt.legend()
#> <matplotlib.legend.Legend at 0x1a1ff2e390>
plt.show()
Any ideas?
Created on 2018-12-01 by the reprexpy package
import reprexpy
print(reprexpy.SessionInfo())
#> Session info --------------------------------------------------------------------
#> Platform: Darwin-18.2.0-x86_64-i386-64bit (64-bit)
#> Python: 3.6
#> Date: 2018-12-01
#> Packages ------------------------------------------------------------------------
#> matplotlib==2.1.2
#> numpy==1.15.4
#> reprexpy==0.1.1
#> scipy==1.1.0
#> seaborn==0.9.0
While gathering the pdf data from get_xydata is clever, you are now at the mercy of matplotlib's rendering / segmentation algorithm. Having x1 and x2 span different ranges also makes comparing y1 and y2 difficult.
You can avoid these problems by fitting the normals yourself instead of
letting sns.distplot do it. Then you have more control over the values you are
looking for.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
norm = stats.norm
pepe_calories = np.array([361, 291, 263, 284, 311, 284, 282, 228, 328, 263, 354, 302, 293,
254, 297, 281, 307, 281, 262, 302, 244, 259, 273, 299, 278, 257,
296, 237, 276, 280, 291, 278, 251, 313, 314, 323, 333, 270, 317,
321, 307, 256, 301, 264, 221, 251, 307, 283, 300, 292, 344, 239,
288, 356, 224, 246, 196, 202, 314, 301, 336, 294, 237, 284, 311,
257, 255, 287, 243, 267, 253, 257, 320, 295, 295, 271, 322, 343,
313, 293, 298, 272, 267, 257, 334, 276, 337, 325, 261, 344, 298,
253, 302, 318, 289, 302, 291, 343, 310, 241])
modern_calories = np.array([310, 315, 303, 360, 339, 416, 278, 326, 316, 314, 333, 317, 357,
304, 363, 387, 279, 350, 367, 321, 366, 311, 308, 303, 299, 363,
335, 357, 392, 321, 361, 285, 321, 290, 392, 341, 331, 338, 326,
314, 327, 320, 293, 333, 297, 315, 365, 408, 352, 359, 312, 300,
263, 358, 345, 360, 336, 378, 315, 354, 318, 300, 372, 305, 336,
286, 296, 413, 383, 328, 418, 388, 416, 371, 313, 321, 321, 317,
402, 290, 328, 344, 330, 319, 309, 327, 351, 324, 278, 369, 416,
359, 381, 324, 306, 350, 385, 335, 395, 308])
pepe_params = norm.fit(pepe_calories)
modern_params = norm.fit(modern_calories)
xmin = min(pepe_calories.min(), modern_calories.min())
xmax = max(pepe_calories.max(), modern_calories.max())
x = np.linspace(xmin, xmax, 100)
pepe_pdf = norm(*pepe_params).pdf(x)
modern_pdf = norm(*modern_params).pdf(x)
y = np.minimum(modern_pdf, pepe_pdf)
fig, ax = plt.subplots()
ax.plot(x, pepe_pdf, label="Pepe's", color='blue')
ax.plot(x, modern_pdf, label="Modern", color='orange')
ax.fill_between(x, y, color='red', alpha=0.3)
plt.legend()
plt.show()
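The shaded region is the area under np.minimum of the two fitted PDFs, so its size (the overlap coefficient) can be estimated with the trapezoidal rule on the same kind of grid. This sketch uses hand-written normal PDFs so it runs without SciPy; normal_pdf and overlap_area are hypothetical helpers, and with the answer's variables you would pass x, pepe_pdf and modern_pdf. Because the answer's x only spans the observed data range, an estimate from that grid may slightly undercount the tails.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def overlap_area(x, pdf_a, pdf_b):
    """Trapezoidal estimate of the area under min(pdf_a, pdf_b) over grid x."""
    y = np.minimum(pdf_a, pdf_b)
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))


x = np.linspace(-10, 10, 20001)
same = overlap_area(x, normal_pdf(x, 0, 1), normal_pdf(x, 0, 1))     # approximately 1.0
shifted = overlap_area(x, normal_pdf(x, 0, 1), normal_pdf(x, 2, 1))  # approximately 0.3173
```

For two unit-variance normals whose means are 2 apart, the exact overlap is 2 * Phi(-1), about 0.3173, which makes a convenient sanity check.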
If, let's say, sns.distplot (or some other plotting function) made a plot that you did not want to have to reproduce, then you could use the data from get_xydata this way:
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
pepe_calories = np.array([361, 291, 263, 284, 311, 284, 282, 228, 328, 263, 354, 302, 293,
254, 297, 281, 307, 281, 262, 302, 244, 259, 273, 299, 278, 257,
296, 237, 276, 280, 291, 278, 251, 313, 314, 323, 333, 270, 317,
321, 307, 256, 301, 264, 221, 251, 307, 283, 300, 292, 344, 239,
288, 356, 224, 246, 196, 202, 314, 301, 336, 294, 237, 284, 311,
257, 255, 287, 243, 267, 253, 257, 320, 295, 295, 271, 322, 343,
313, 293, 298, 272, 267, 257, 334, 276, 337, 325, 261, 344, 298,
253, 302, 318, 289, 302, 291, 343, 310, 241])
modern_calories = np.array([310, 315, 303, 360, 339, 416, 278, 326, 316, 314, 333, 317, 357,
304, 363, 387, 279, 350, 367, 321, 366, 311, 308, 303, 299, 363,
335, 357, 392, 321, 361, 285, 321, 290, 392, 341, 331, 338, 326,
314, 327, 320, 293, 333, 297, 315, 365, 408, 352, 359, 312, 300,
263, 358, 345, 360, 336, 378, 315, 354, 318, 300, 372, 305, 336,
286, 296, 413, 383, 328, 418, 388, 416, 371, 313, 321, 321, 317,
402, 290, 328, 344, 330, 319, 309, 327, 351, 324, 278, 369, 416,
359, 381, 324, 306, 350, 385, 335, 395, 308])
ax = sns.distplot(pepe_calories, fit_kws={"color":"blue"}, kde=False,
fit=stats.norm, hist=None, label="Pepe's");
ax = sns.distplot(modern_calories, fit_kws={"color":"orange"}, kde=False,
fit=stats.norm, hist=None, label="Modern");
# Get the two lines from the axes to generate shading
l1 = ax.lines[0]
l2 = ax.lines[1]
# Get the xy data from the lines so that we can shade
x1, y1 = l1.get_xydata().T
x2, y2 = l2.get_xydata().T
xmin = max(x1.min(), x2.min())
xmax = min(x1.max(), x2.max())
x = np.linspace(xmin, xmax, 100)
y1 = np.interp(x, x1, y1)
y2 = np.interp(x, x2, y2)
y = np.minimum(y1, y2)
ax.fill_between(x, y, color="red", alpha=0.3)
plt.legend()
plt.show()
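The key step above is np.interp, which resamples both curves onto one shared grid so that np.minimum compares y-values at the same x. Here is a toy sketch with made-up straight-line and constant curves, chosen so the result is easy to check by hand. Note that np.interp expects the sample x-coordinates (x1, x2) to be increasing; that likely holds for the fitted lines drawn here, but it is worth checking for other plots.

```python
import numpy as np

# Two hypothetical curves sampled on different, partially overlapping grids.
x1 = np.linspace(0.0, 6.0, 7)   # 0, 1, ..., 6
y1 = x1.copy()                  # y = x, so linear interpolation is exact
x2 = np.linspace(3.0, 9.0, 4)   # 3, 5, 7, 9
y2 = np.full_like(x2, 5.0)      # constant y = 5

# Keep only the x-range that both curves cover.
xmin = max(x1.min(), x2.min())  # 3.0
xmax = min(x1.max(), x2.max())  # 6.0
x = np.linspace(xmin, xmax, 7)  # 3.0, 3.5, ..., 6.0

# Resample both curves onto the common grid, then take the lower envelope.
y1_common = np.interp(x, x1, y1)
y2_common = np.interp(x, x2, y2)
lower = np.minimum(y1_common, y2_common)
print(lower)  # [3.  3.5 4.  4.5 5.  5.  5. ]
```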
I suppose not using seaborn is often a useful strategy when you want full control over the resulting plot. Just calculate the fits, plot them, and use fill_between on the two curves up to the point where they cross each other.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
pepe_calories = np.array(...)
modern_calories = np.array(...)
x = np.linspace(150,470,1000)
y1 = stats.norm.pdf(x, *stats.norm.fit(pepe_calories))
y2 = stats.norm.pdf(x, *stats.norm.fit(modern_calories))
cross = x[y1-y2 <= 0][0]
fig, ax = plt.subplots()
ax.fill_between(x,y1,y2, where=(x<=cross), color="red", alpha=0.3)
ax.plot(x,y1, label="Pepe's")
ax.plot(x,y2, label="Modern")
ax.legend()
plt.show()
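x[y1-y2 <= 0][0] returns the first grid point at or after the first crossing, so it is only as precise as the grid and finds a single crossing. Two normal PDFs with different standard deviations cross twice. Below is a sketch that finds every sign change of y1 - y2 and refines each one by linear interpolation; crossing_points and normal_pdf are hypothetical helpers. A crossing that lands exactly on a grid point (where y1 - y2 == 0) is not reported by this strict sign test.

```python
import numpy as np


def crossing_points(x, y1, y2):
    """x positions where y1 - y2 changes sign, refined by linear interpolation."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(y1, dtype=float) - np.asarray(y2, dtype=float)
    # Indices i where d[i] and d[i + 1] have strictly opposite signs.
    idx = np.nonzero(d[:-1] * d[1:] < 0)[0]
    x0, x1 = x[idx], x[idx + 1]
    d0, d1 = d[idx], d[idx + 1]
    # Zero of the straight line through (x0, d0) and (x1, d1).
    return x0 - d0 * (x1 - x0) / (d1 - d0)


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


x = np.linspace(-5, 7, 1000)
print(crossing_points(x, normal_pdf(x, 0, 1), normal_pdf(x, 2, 1)))  # one crossing near 1.0
print(crossing_points(x, normal_pdf(x, 0, 1), normal_pdf(x, 0, 2)))  # two crossings near -1.36 and 1.36
```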
I have two lists containing heartbeat intervals (Y-axis, in ms; IBIs below) and their absolute timepoints (X-axis, in ms; RR_times below). There are some misreadings: the first list contains outliers that need to be removed, and the second list contains their corresponding timepoints. Ideally, the outliers in the first list would be replaced with NaN so that the total recording time stays the same.
RR_times = [411, 827, 1241, 1653, 2066, 2481, 2894, 3308,
3714, 4126, 4532, 4938, 5343, 5751, 6156, 6552,
6951, 7346, 7749, 8149, 8546, 8944, 9338, 9735,
10123, 10511, 10905, 11290, 11675, 12060, 12441, 12825,
13205, 13581, 13960, 14342, 14717, 15087, 15462, 15829,
16204, 16531, 16902, 17304, 17670, 18040, 18398, 18762,
19127, 19465, 19823, 20196, 20554, 20906, 21256, 21609,
21959, 22264, 22637, 22995, 23308, 23649, 24012, 24352,
24687, 25026, 25390, 25681, 26014, 26347, 26680, 27330,
27985, 28628, 28951, 29596, 29915, 30238, 30562, 31191,
31826, 32141, 32461, 32775, 33095, 33382, 33695, 34029,
34341, 34654, 34967, 35281, 35595, 36220, 36530, 36844,
37150, 37462, 37775, 38084, 38395, 38703, 39014, 39324,
39632, 39937, 40246, 40554, 40862, 41169, 41479, 41787,
42095, 42406, 42714, 43019, 43330, 43642, 43945, 44254,
44563, 44871, 45183, 45491, 45796, 46101, 46410, 46713,
47327, 47632, 47937, 48244, 48555, 48867, 49177, 49488,
49792, 50094, 50398, 50707, 50993, 51324, 51626, 51931,
52239, 52550, 52857, 53161, 53773, 54080, 54387, 54693,
54998, 55311, 55617, 55924, 56235, 56547, 56852, 57159,
57470, 57781, 58091, 58400, 58709, 59020, 59331, 59644,
59955, 60265, 60579, 60890, 61206, 61521, 61833, 62149,
62463, 62772, 63088, 63403, 63716, 64034, 64352, 64665,
64984, 65624, 65940, 66262, 66578, 66900, 67221, 67543,
67861, 68179, 68504, 68819, 69145, 69459, 69782, 70111,
70428, 70747, 71070, 71389, 71710, 72036, 72358, 72680,
73003, 73326, 73648, 73973, 74296, 74620, 74944, 75269,
75592, 75916, 76241, 76566, 76889, 77216, 77543, 77869,
78191, 78518, 78843, 79165, 79496, 79823, 80148, 80479,
80803, 81128, 81459, 81783, 82110, 82439, 82771, 83095,
83426, 83757, 84086, 84416, 84741, 85074, 85400, 85729,
86060, 86390, 86719, 87051, 87380, 87711, 88041, 88373,
88705, 89029, 89365, 89698, 90023, 90356, 90690, 91019,
91352, 91684, 92014, 92347, 92681, 93014, 93349, 93678,
94011, 94344, 94675, 95009, 95339, 95673, 96007, 96341,
96668, 97002, 97337, 97665, 98003, 98335, 98668, 99003,
99339, 99673, 100007, 100346, 100684, 101017, 101357, 101693,
102028, 102368, 102705, 103043, 103380, 103718, 104061, 104403,
104736, 105077, 105421, 105756, 106096, 106437, 106777, 107118,
107461, 107800, 108141, 108485, 108822, 109167, 109507, 109848,
110196, 110538, 110884, 111230, 111571, 111918, 112263, 112606,
112952, 113639, 113987, 114336, 114680, 115025, 115372, 115722,
116068, 116418, 116766, 117114, 117464, 117811, 118158, 118511,
118858, 119208, 119557, 119904, 120257, 120606, 120952, 121303,
121655, 122003, 122354, 122707, 123057, 123408, 123760, 124114,
124466, 124815, 125172, 125523, 125879, 126231, 126586, 126946,
127298, 127653, 128014, 128369, 128724, 129084, 129441, 129794,
130150, 130504, 130863, 131219, 131576, 131937, 132297, 132653,
133012, 133375, 133731, 134091, 134455, 134813, 135174, 135534,
135897, 136258, 136621, 136986, 137349, 137711, 138073, 138439,
138799, 139164, 139526, 139887, 140253, 140617, 140977, 141344,
141706, 142071, 142438, 142803, 143170, 143537, 143904, 144274,
144641, 145011, 145382, 145749, 146124, 146493, 146864, 147235,
147605, 147977, 148346, 148718, 149085, 149455, 149826, 150195,
150566, 150936, 151310, 151676, 152048, 152423, 152795, 153167,
153539, 153916, 154290, 154661, 155036, 155408, 155782, 156159,
156530, 156905, 157280, 157655, 158029, 158404, 158783, 159157,
159532, 159910, 160290, 160660, 161037, 161415, 161786, 162161,
162538, 162913, 163289, 163665, 164040, 164415, 164789, 165164,
165539, 165911, 166286, 166661, 167040, 167418, 167791, 168169,
168545, 168922, 169300, 169676, 170053, 170429, 170811, 171195,
171571, 171952, 172335, 172717, 173098, 173484, 173869, 174254,
174637, 175020, 175403, 175785, 176167, 176552, 176933, 177316,
177698, 178080, 178463, 178840, 179224, 179603, 179979, 180360,
180739, 181114, 181492, 181870, 182248, 182626, 183001, 183378,
183752, 184128, 184503, 184876, 185252, 185629, 186003, 186384,
186760, 187134, 187515, 187900, 188281, 188656, 189031, 189415,
189798, 190176, 190555, 190936, 191313, 191692, 192069, 192448,
192824, 193203, 193578, 193953, 194330, 194707]
IBIs = [411,416,414,412,413,415, 413, 414, 406, 412, 406, 406, 405,
408, 405, 396, 399, 395, 403, 400, 397, 398, 394, 397, 388, 388,
394, 385, 385, 385, 381, 384, 380, 376, 379, 382, 375, 370, 375,
367, 375, 327, 371, 402, 366, 370, 358, 364, 365, 338, 358, 373,
358, 352, 350, 353, 350, 305, 373, 358, 313, 341, 363, 340, 335,
339, 364, 291, 333, 333, 333, 650, 655, 643, 323, 645, 319, 323,
324, 629, 635, 315, 320, 314, 320, 287, 313, 334, 312, 313, 313,
314, 314, 625, 310, 314, 306, 312, 313, 309, 311, 308, 311, 310,
308, 305, 309, 308, 308, 307, 310, 308, 308, 311, 308, 305, 311,
312, 303, 309, 309, 308, 312, 308, 305, 305, 309, 303, 614, 305,
305, 307, 311, 312, 310, 311, 304, 302, 304, 309, 286, 331, 302,
305, 308, 311, 307, 304, 612, 307, 307, 306, 305, 313, 306, 307,
311, 312, 305, 307, 311, 311, 310, 309, 309, 311, 311, 313, 311,
310, 314, 311, 316, 315, 312, 316, 314, 309, 316, 315, 313, 318,
318, 313, 319, 640, 316, 322, 316, 322, 321, 322, 318, 318, 325,
315, 326, 314, 323, 329, 317, 319, 323, 319, 321, 326, 322, 322,
323, 323, 322, 325, 323, 324, 324, 325, 323, 324, 325, 325, 323,
327, 327, 326, 322, 327, 325, 322, 331, 327, 325, 331, 324, 325,
331, 324, 327, 329, 332, 324, 331, 331, 329, 330, 325, 333, 326,
329, 331, 330, 329, 332, 329, 331, 330, 332, 332, 324, 336, 333,
325, 333, 334, 329, 333, 332, 330, 333, 334, 333, 335, 329, 333,
333, 331, 334, 330, 334, 334, 334, 327, 334, 335, 328, 338, 332,
333, 335, 336, 334, 334, 339, 338, 333, 340, 336, 335, 340, 337,
338, 337, 338, 343, 342, 333, 341, 344, 335, 340, 341, 340, 341,
343, 339, 341, 344, 337, 345, 340, 341, 348, 342, 346, 346, 341,
347, 345, 343, 346, 687, 348, 349, 344, 345, 347, 350, 346, 350,
348, 348, 350, 347, 347, 353, 347, 350, 349, 347, 353, 349, 346,
351, 352, 348, 351, 353, 350, 351, 352, 354, 352, 349, 357, 351,
356, 352, 355, 360, 352, 355, 361, 355, 355, 360, 357, 353, 356,
354, 359, 356, 357, 361, 360, 356, 359, 363, 356, 360, 364, 358,
361, 360, 363, 361, 363, 365, 363, 362, 362, 366, 360, 365, 362,
361, 366, 364, 360, 367, 362, 365, 367, 365, 367, 367, 367, 370,
367, 370, 371, 367, 375, 369, 371, 371, 370, 372, 369, 372, 367,
370, 371, 369, 371, 370, 374, 366, 372, 375, 372, 372, 372, 377,
374, 371, 375, 372, 374, 377, 371, 375, 375, 375, 374, 375, 379,
374, 375, 378, 380, 370, 377, 378, 371, 375, 377, 375, 376, 376,
375, 375, 374, 375, 375, 372, 375, 375, 379, 378, 373, 378, 376,
377, 378, 376, 377, 376, 382, 384, 376, 381, 383, 382, 381, 386,
385, 385, 383, 383, 383, 382, 382, 385, 381, 383, 382, 382, 383,
377, 384, 379, 376, 381, 379, 375, 378, 378, 378, 378, 375, 377,
374, 376, 375, 373, 376, 377, 374, 381, 376, 374, 381, 385, 381,
375, 375, 384, 383, 378, 379, 381, 377, 379, 377, 379, 376, 379,
375, 375, 377, 377]
Plotting the whole dataset gives:
I previously used a fixed above/below threshold filter, but that does not work for longer recordings in which the trace spans a larger range of values (in some recordings the intervals span from 300 ms during training to 1500 ms after a period of rest).
What is the best way to remove the outliers in this case, and how would one go about implementing it? Moving average, exclusion based on stdev, median filter...?
Here's an ugly approach that seems to work:
import numpy as np
RR_times = np.array([411, 827, 1241, ...])
IBIs = np.array([411, 416, 414, ...])
diffs = np.abs(np.diff(IBIs))  # absolute differences between consecutive intervals
IBIs_cleaned = np.full(IBIs.shape, np.nan)  # create an array full of NaNs
IBIs_cleaned[0] = IBIs[0]  # the first value is assumed not to be an outlier
for i in range(1, len(IBIs)):
    # Keep a value only if it is close to its predecessor and not far above the mean
    if np.abs(IBIs[i] - IBIs[i-1]) < np.mean(diffs) and IBIs[i] < 1.6 * np.mean(IBIs):
        IBIs_cleaned[i] = IBIs[i]
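Since the question mentions a median filter, here is a sketch of that alternative. The tolerance is relative to a local median, so it follows slow drifts such as 300 ms to 1500 ms, and, unlike the consecutive-difference test above, a normal beat that follows an outlier is compared with the local median rather than with the outlier. The function name and the window=11 and rel_tol=0.2 defaults are hypothetical, untuned choices. Outliers become NaN, so the result keeps the same length as RR_times. One caveat: where several misreadings cluster (as in the 650, 655, 643 run in the data), they can dominate a short window, so the window should be comfortably wider than the longest outlier run.

```python
import numpy as np


def nan_outliers_rolling_median(ibis, window=11, rel_tol=0.2):
    """Return a float copy of ibis with values far from their local median set to NaN.

    window and rel_tol are illustrative defaults, not tuned values.
    """
    ibis = np.asarray(ibis, dtype=float)
    half = window // 2
    cleaned = ibis.copy()
    for i in range(len(ibis)):
        # Centered window, truncated at the edges of the recording.
        lo = max(0, i - half)
        hi = min(len(ibis), i + half + 1)
        local_median = np.median(ibis[lo:hi])
        if abs(ibis[i] - local_median) > rel_tol * local_median:
            cleaned[i] = np.nan
    return cleaned


# A stretch of the question's IBIs containing one doubled interval (614).
segment = [312, 303, 309, 309, 308, 312, 308, 305, 305,
           309, 303, 614, 305, 305, 307, 311, 312, 310]
cleaned = nan_outliers_rolling_median(segment)
print(np.flatnonzero(np.isnan(cleaned)))  # [11]
```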
Firstly, here is the script:
import numpy as np
import osgeo.gdal
import os
ArbitraryXCoords = np.arange(435531.30622598,440020.30622598,400)
ArbitraryYCoords = np.arange(5634955.28972479,5638945.28972479,400)
os.chdir('/home/foo/GIS_Summer2013')
dataset = osgeo.gdal.Open("Raster_DEM")
gt = dataset.GetGeoTransform()
def XAndYArrays(spacing):
    XPoints = np.arange(gt[0], gt[0] + dataset.RasterXSize * gt[1], spacing)
    YPoints = np.arange(gt[3] + dataset.RasterYSize * gt[5], gt[3], spacing)
    return (XPoints, YPoints)
def RasterPoints(XCoords,YCoords):
    a=[]
    for row in YCoords:
        for col in XCoords:
            rasterx = int((col - gt[0]) / gt[1])
            rastery = int((row - gt[3]) / gt[5])
            band = int(dataset.GetRasterBand(1).ReadAsArray(rasterx,rastery, 1, 1)[0][0])
            a[len(a):] = [band]
    foo = np.asarray(a)
    bar = foo.reshape(YCoords.size,XCoords.size)
    return bar
When I run the script above, I am unable to use the output of XAndYArrays as input to RasterPoints. I can, however, pass the numpy.ndarray objects that I defined manually (ArbitraryXCoords and ArbitraryYCoords) to RasterPoints. That is not good enough: I need to be able to use the output of XAndYArrays as input to RasterPoints.
Here are the commands that I used at the PyDev interactive console:
>>> Eastings,Northings = XAndYArrays(400)
>>> Eastings
Out[1]:
array([ 435530.30622598, 435930.30622598, 436330.30622598,
436730.30622598, 437130.30622598, 437530.30622598,
437930.30622598, 438330.30622598, 438730.30622598,
439130.30622598, 439530.30622598, 439930.30622598])
>>> Northings
Out[1]:
array([ 5634954.28972479, 5635354.28972479, 5635754.28972479,
5636154.28972479, 5636554.28972479, 5636954.28972479,
5637354.28972479, 5637754.28972479, 5638154.28972479,
5638554.28972479, 5638954.28972479])
>>> RasterPoints(Eastings, Northings)
ERROR 5: MergedDEM_EPSG3159_Reduced, band 1: Access window out of range in RasterIO(). Requested (0,246) of size 1x1 on raster of 269x246.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2538, in run_code
exec code_obj in self.user_global_ns, self.user_ns
File "<ipython-input-1-326be9918188>", line 1, in <module>
RasterPoints(Eastings, Northings)
File "/home/foo/GIS_Summer2013/src/22July_StackOverflowQuestion.py", line 23, in RasterPoints
band = int(dataset.GetRasterBand(1).ReadAsArray(rasterx,rastery, 1, 1)[0][0])
TypeError: 'NoneType' object has no attribute '__getitem__'
>>> RasterPoints(ArbitraryXCoords, ArbitraryYCoords)
Out[1]:
array([[422, 422, 431, 439, 428, 399, 410, 395, 398, 413, 409, 386],
[414, 428, 421, 430, 426, 403, 409, 410, 406, 408, 412, 406],
[420, 428, 427, 424, 408, 406, 428, 420, 408, 410, 409, 420],
[392, 418, 426, 430, 414, 428, 430, 418, 433, 414, 402, 399],
[400, 411, 420, 406, 401, 405, 398, 420, 419, 400, 401, 414],
[408, 421, 418, 428, 399, 398, 405, 412, 421, 406, 395, 397],
[399, 404, 398, 401, 400, 399, 399, 398, 398, 419, 399, 395],
[401, 410, 407, 407, 404, 400, 398, 397, 397, 399, 400, 398],
[400, 410, 418, 405, 401, 400, 397, 398, 400, 398, 397, 396],
[389, 387, 399, 408, 423, 400, 407, 398, 411, 408, 410, 420]])
>>> print "partial success"
partial success
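You are trying to read a pixel location that doesn't exist because it's outside the raster dimensions. The first value in your Northings, gt[3] + dataset.RasterYSize * gt[5], lies on the bottom edge of the raster, so int((row - gt[3]) / gt[5]) evaluates to RasterYSize (246 here), one past the last valid row index (245). That is the (0,246) window in the ERROR 5 message; ReadAsArray returns None for it, which then causes the TypeError.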
Try calculating your YPoints with:
YPoints = np.arange(gt[3] + (dataset.RasterYSize-1) * gt[5], gt[3], abs(gt[5]))
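To make the off-by-one concrete, here is a sketch of the row arithmetic RasterPoints performs. The geotransform is hypothetical (north-up, origin at 0, 1-unit pixels); only the 246-row height is taken from the error message.

```python
# Hypothetical north-up geotransform: (originX, pixelWidth, 0, originY, 0, pixelHeight).
gt = (0.0, 1.0, 0.0, 0.0, 0.0, -1.0)
raster_y_size = 246  # row count from the error message; valid rows are 0..245


def row_index(y):
    # Same arithmetic as RasterPoints: int((row - gt[3]) / gt[5])
    return int((y - gt[3]) / gt[5])


original_start = gt[3] + raster_y_size * gt[5]         # bottom edge of the raster
corrected_start = gt[3] + (raster_y_size - 1) * gt[5]  # top edge of the last row

print(row_index(original_start))   # 246: one past the last valid row
print(row_index(corrected_start))  # 245: the last valid row
```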
Your RasterPoints function is really bad practice. You're reading the raster values one at a time, storing them in a list and then turning that list into an array. It makes much more sense to read the band once with ReadAsArray() and index the resulting NumPy array directly.
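A sketch of the read-once approach, assuming the band has already been loaded into a 2-D NumPy array (for example with dataset.GetRasterBand(1).ReadAsArray()) and a north-up geotransform without rotation terms. sample_raster is a hypothetical helper name; it reproduces RasterPoints' index arithmetic and output shape, but raises IndexError instead of asking GDAL for a window outside the raster.

```python
import numpy as np


def sample_raster(raster, gt, x_coords, y_coords):
    """Sample a 2-D array at every (y, x) combination, like RasterPoints.

    raster: 2-D array, e.g. from dataset.GetRasterBand(1).ReadAsArray()
    gt: GDAL-style geotransform, assumed north-up with no rotation.
    """
    # astype(int) truncates toward zero, matching int() in RasterPoints.
    cols = ((np.asarray(x_coords, dtype=float) - gt[0]) / gt[1]).astype(int)
    rows = ((np.asarray(y_coords, dtype=float) - gt[3]) / gt[5]).astype(int)
    n_rows, n_cols = raster.shape
    if cols.min() < 0 or cols.max() >= n_cols or rows.min() < 0 or rows.max() >= n_rows:
        raise IndexError("coordinate outside raster extent")
    # np.ix_ builds an open mesh: result[i, j] == raster[rows[i], cols[j]],
    # giving shape (len(y_coords), len(x_coords)) as in RasterPoints.
    return raster[np.ix_(rows, cols)]


# Toy example with a hypothetical 4 x 5 raster and 10-unit pixels.
gt = (100.0, 10.0, 0.0, 200.0, 0.0, -10.0)
raster = np.arange(20).reshape(4, 5)
print(sample_raster(raster, gt, [105.0, 125.0, 145.0], [195.0, 175.0]))
# [[ 0  2  4]
#  [10 12 14]]
```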
Furthermore, I think it's good practice on SO to close/resolve previous questions before opening another one.