Is there a way I could index through a numpy list just like how I would be able to within a normal list function. I want to go through 3 elements in the list moving up by one point every single time and summing all the slices. So it would go through 1,2,3 for the first sum and then it would go through 2,3,4 for the second sum etc. The code down below gives me a scalar error, is there a way I could perform the function below without using a for loop.
import numpy as np
n = 3
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8,9, 10, 11,12, 13, 14, 15, 16, 17, 18, 19, 20, 21 ,22, 23, 24, 25])
start = np.arange(0, len(arr)-n, 1)
stop = np.arange(n-1, len(arr), 1)
sum_arr = np.sum(arr[start:stop])
I think this should work:
sum_arr = arr[1:-1] + arr[2:] + arr[:-2]
This creates an array that's two values shorter than arr because the last element in arr doesn't have two extra elements to create a sum with.
If you wanted the array to be of the same length as the original arr, you could append two extra zeros to the arr array like so:
arr = np.append(arr, [0, 0])
sum_arr = arr[1:-1] + arr[2:] + arr[:-2]
To sum a sliding range of n elements you can use convolve1d with all weights set to 1. Use 'constant' boundary mode with the default fill value of 0. As the filter window is centered by default you need to adjust the length of the result at both ends.
import numpy as np
from scipy.ndimage import convolve1d
arr = np.arange(1,26)
for n in range(2,6):
k,r = divmod(n, 2)
print(n, convolve1d(arr, np.ones(n), mode='constant')[k+r-1:-k])
Result:
2 [ 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49]
3 [ 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72]
4 [ 10 14 18 22 26 30 34 38 42 46 50 54 58 62 66 70 74 78 82 86 90 94]
5 [ 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115]
I have an (4, 2000) numpy array and want to resample each column (N=4) for every 5 elements with such as max, min, left, right, which makes its shape as (4, 400).
I can do with Pandas.DataFrame using .resample('5Min').agg(~) or with numpy array and for loop like result = [max(input[i:i+5]) for i in range(0, len(input), 5)]. However, it takes amount of time with large input array since it's not vectorized. Is there any way that I can do with vectorized computation with np array?
Here is another way that uses numpy strides under the hood (a is your array):
from skimage.util import view_as_blocks
a = view_as_blocks(a, (4,5))
Now, you can use methods/slicing for parameters you want:
#max
a.max(-1)[0].T
#min
a.min(-1)[0].T
#left
a[...,0][0].T
#right
a[...,-1][0].T
example:
a
#[[ 0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 23 24 25 26 27 28 29]
# [30 31 32 33 34 35 36 37 38 39]]
output for max
#[[ 4 9]
# [14 19]
# [24 29]
# [34 39]]
I have the following problem:
I have a matrix. Now, I want to delete one entry in each row of the matrix: In rows that contain a certain number (say 4) I want to delete the entry with that number, and in other rows I simply want to delete the last element.
E.g. if I have the matrix
matrix=np.zeros((2,2))
matrix[0,0]=2
matrix[1,0]=4
matrix
which gives
2 0
4 0
after the deletion it should simply be
2
0
thanks for your help!
so, assuming there's maximum only one 4 in a row, what you want to do is:
iterate all rows, and if there's a four use roll so it becomes the last element
delete the last column
in rows that have 4, it will delete this 4 and shift the remaining values that come after it,
in rows that don't have 4, it will delete the last element.
(I took the liberty of trying with a little bigger matrix just to make sure output is as expected)
try this:
import numpy as np
# Actual solution
def remove_in_rows(mat, num):
for i, row in enumerate(mat):
if num in row.tolist():
index = row.tolist().index(num)
mat[i][index:] = np.roll(row[index:], -1)
return np.delete(mat, -1, 1)
# Just some example to demonstrate it works
matrix = np.array([[10 * y + x for x in range(6)] for y in range(6)])
matrix[1, 2] = 4
matrix[3, 3] = 4
matrix[4, 0] = 4
print("BEFORE:")
print(matrix)
matrix = remove_in_rows(matrix, 4)
print("AFTER:")
print(matrix)
Output:
BEFORE:
[[ 0 1 2 3 4 5]
[10 11 4 13 14 15]
[20 21 22 23 24 25]
[30 31 32 4 34 35]
[ 4 41 42 43 44 45]
[50 51 52 53 54 55]]
AFTER:
[[ 0 1 2 3 5]
[10 11 13 14 15]
[20 21 22 23 24]
[30 31 32 34 35]
[41 42 43 44 45]
[50 51 52 53 54]]
I have a 2D array of shape (50,50). I need to subtract a value from each column of this array skipping the first), which is calculated based on the index of the column. For example, using a for loop it would look something like this:
for idx in range(1, A[0, :].shape[0]):
A[0, idx] -= idx * (...) # simple calculations with idx
Now, of course this works fine, but it's very slow and performance is critical for my application. I've tried computing the values to be subtracted using np.fromfunction() and then subtracting it from the original array, but results are different than those obtained by the for loop iteractive subtraction:
func = lambda i, j: j * (...) #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (1,50))
A[0, 1:] -= subtraction_matrix
What am I doing wrong? Or is there some other method that would be better? Any help is appreciated!
All your code snippets indicate that you require the subtraction to happen only in the first row of A (though you've not explicitly mentioned that). So, I'm proceeding with that understanding.
Referring to your use of from_function(), you can use the subtraction_matrix as below:
A[0,1:] -= subtraction_matrix[1:]
Testing it out (assuming shape (5,5) instead of (50,50)):
import numpy as np
A = np.arange(25).reshape(5,5)
print (A)
func = lambda j: j * 10 #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (5,), dtype=A.dtype)
A[0,1:] -= subtraction_matrix[1:]
print (A)
Output:
[[ 0 1 2 3 4] # print(A), before subtraction
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[ 0 -9 -18 -27 -36] # print(A), after subtraction
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[ 15 16 17 18 19]
[ 20 21 22 23 24]]
If you want the subtraction to happen in all the rows of A, you just need to use the line A[:,1:] -= subtraction_matrix[1:], instead of the line A[0,1:] -= subtraction_matrix[1:]
In testing a conjecture about the following recursive relation
,
which claims a periodicity of some kind for the sequence of numbers, I wrote a python program which computes the sequences and prints them in a table.
1 # Consider the recursive relation x_{i+1} = p-1 - (p*i-1 mod x_i)
2 # with p prime and x_0 = 1. What is the shortest period of the
3 # sequence?
4
5 from __future__ import print_function
6 import numpy as np
7 from matplotlib import pyplot as plt
8
9 # The length of the sequences.
10 seq_length = 100
11
12 upperbound_primes = 30
13
14 # Computing a list of prime numbers up to n
15 def primes(n):
16 sieve = [True] * n
17 for i in xrange(3,int(n**0.5)+1,2):
18 if sieve[i]:
19 sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
20 return [2] + [i for i in xrange(3,n,2) if sieve[i]]
21
22 # The list of prime numbers up to upperbound_primes
23 p = primes(upperbound_primes)
24
25 # The amount of primes numbers
26 no_primes = len(p)
27
28 # Generate the sequence for the prime number p
29 def sequence(p):
30 x = np.empty(seq_length)
31 x[0] = 1
32 for i in range(1,seq_length):
33 x[i] = p - 1 - (p * (i-1) - 1) % x[i-1]
34 return x
35
36 # List with the sequences.
37 seq = [sequence(i) for i in p]
38 """
39 # Print the sequences in a table where the upper row
40 # indicates the prime numbers.
41 for i in range(seq_length):
42 if not i:
43 for n in p:
44 print('\t',n,end='')
45 print('')
46 print(i+1,'\t',end='')
47 for j in range(no_primes):
48 print(seq[j][i],end='\t')
49 print('\n',end='')
50 """
51 def autocor(x):
52 result = np.correlate(x,x,mode='full')
53 return result[result.size/2:]
54
55
56 fig = plt.figure('Finding period in the sequences')
57 k = 0
58 for s in seq:
59 k = k + 1
60 fig.add_subplot(no_primes,1,k)
61 plt.title("Prime number %d" % p[k-1])
62 plt.plot(autocor(s))
63 plt.show()
64
Now I want to investigate periodicities in these sequences that I computed. After looking around on the net I found myself two options it seems:
Preform autocorrelation on the data and look for the first peak. This should give an approximation of the period.
Preform a FFT on the data. This shows the frequency of the numbers. I do not see how this can give any useful information about the periodicity of a sequence of numbers.
The last lines show my attempt of using autocorrelation, inspired by the accepted answer of How can I use numpy.correlate to do autocorrelation?.
It gives the following plot
Clearly we see a descending sequence of numbers for all the primes.
When testing the same method on a sin function with the following simplyfied python-code snippet
1 # Testing the autocorrelation of numpy
2
3 import numpy as np
4 from matplotlib import pyplot as plt
5
6 num_samples = 1000
7 t = np.arange(num_samples)
8 dt = 0.1
9
10 def autocor(x):
11 result = np.correlate(x,x,mode='full')
12 return result[result.size/2:]
13
14 def f(x):
15 return [np.sin(i * 2 * np.pi * dt) for i in range(num_samples)]
16
17 plt.plot(autocor(f(t)))
18 plt.show()
I get a similar result, it giving the following plot for the sine function
How could I read off the periodicity in the sine-function case, for example?
Anyhow, I do not understand the mechanism of the autocorrelation leading to peaks that give information of the periodicity of a signal. Can someone elaborate on that? How do you properly use autocorrelation in this context?
Also what am I doing wrong in my implementation of the autocorrelation?
Suggestions on alternative methods of determining periodicity in a sequence of numbers are welcome.
There are quite a few questions here, so I'll start be describing how an autocorrelation produces the period from the case of "3", ie, your second sub-plot of the first image.
For prime number 3, the sequence is (after a less consistent start) 1,2,1,2,1,2,1,2,.... To calculate the autocorrelation, the array is basically translated relative to itself, all the elements that align are multiplied, and all of these results are added. So it looks something like this, for a few test cases, where A is the autocorrelation:
0 1 2 3 4 5 6 7 ... 43 44 45 46 47 48 49 # indices 0
1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 # values 0
1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 # values 1
0 1 2 3 4 5 6 7 ... 43 44 45 46 47 48 49 # indices 1
1 4 1 4 1 4 1 4 4 1 4 1 4 1 4 # products
# above result A[0] = 5*25 5=1+4 25=# of pairs # A[0] = 125
0 1 2 3 4 5 6 7 ... 43 44 45 46 47 48 49 # indices 0
1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 # values 0
1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 # values 1
0 1 2 3 4 5 6 7 ... 43 44 45 46 47 48 49 # indices 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 # products
# above result A[1] = 4*24 4=2+2 24=# of pairs # A[1] = 96
0 1 2 3 4 5 6 7 ... 43 44 45 46 47 48 49 # indices 0
1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 # values 0
1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 # values 1
0 1 2 3 4 5 6 7 ... 43 44 45 46 47 48 49 # indices 1
1 4 1 4 1 4 1 4 4 1 4 1 4 # products
# above result A[2] = 5*23 5=4+1 23=# of pairs # A[2] = 115
There are three take-home messages from the above: 1. the autocorrelation, A, has larger value when like elements are lined up and multiplied, here at every other step. 2. The index of the autocorrelation corresponds to the relative shift. 3. When doing the autocorrelation over the full arrays, as shown here, there's always a downward ramp since the number of points added together to produce the value are reduced at each successive shift.
So here you can see why there's a periodic 20% bump in your graph from "Prime number 3": because the terms that are summed are 1+4 when they are aligned, vs 2+2 when they aren't, ie, 5 vs 4. It's this bump that you're looking for in reading off the period. That is, here is shows that the period is 2, since that is the index of your first bump. (Also, note btw, in the above I only do the calculation as number of pairs to see how this known periodicity leads to the result you see in the autocorrelation, that is, one doesn't in general want to think of number of pairs.)
In these calculations, the values of the bump relative to the base will be increased if you first subtract the mean before doing the autocorrelation. The ramp can be removed if you do the calculation using arrays with trimmed ends, so there's always the same overlap; this usually makes sense since usually one is looking for a periodicity of much shorter wavelength than the full sample (because it takes many oscillations to define a good period of oscillation).
For the autocorrelation of the sine wave, the basic answer is that the period is shown as the first bump. I redid the plot except with the time axis applied. It's always clearest in these things to use a real time axis, so I changed your code a bit to include that. (Also, I replaced the list comprehension with a proper vectorized numpy expression for calculating the sin wave, but that's not important here. And I also explicitly defined the frequency in f(x), just to make it more clear what's going on -- as an implicitly frequency of 1 in confusing.)
The main point is that since the autocorrelation is calculated by shifting along the axis one point at a time, the axis of the autocorrelation is just the time axis. So I plot that as the axis, and then can read the period off of that. Here I zoomed in to see it clearly (and the code is below):
# Testing the autocorrelation of numpy
import numpy as np
from matplotlib import pyplot as plt
num_samples = 1000
dt = 0.1
t = dt*np.arange(num_samples)
def autocor(x):
result = np.correlate(x,x,mode='full')
return result[result.size/2:]
def f(freq):
return np.sin(2*np.pi*freq*t)
plt.plot(t, autocor(f(.3)))
plt.xlabel("time (sec)")
plt.show()
That is, in the above, I set the frequency to 0.3, and the graph shows the period as about 3.3, which is what's expected.
All of this said, in my experience, the autocorrelation generally works well for physical signals but not so reliably for algorithmic signals. It's fairly easy to throw off, for example, if a periodic signal skips a step, which can happen with an algorithm, but is less likely with a vibrating object. You'd think that it should be trivial to calculate that period of an algorithmic signal, but a bit of searching around will show that it's not, and it's even difficult to define what's meant by period. For example, the series:
1 2 1 2 1 2 0 1 2 1 2 1 2
won't work well with the autocorrelation test.
Update.
#tom10 gave a thorough survey of autocorrelation and explained why the first bump in the autocorrelation could give the period of the periodic signal.
I tried both approaches, FFT as well as autocorrelation. Their results agree, although I would prefer FFT over autocorrelation since it gives you the period more directly.
When using autocorrelation, we simply determine the coordinate of the first peak. A manual inspection of the autocorrelation graph will reveal if you have the 'right' peak, since you can notice the period (although for primes above 7 this becomes less clear). I'm sure you could also work out a simple algorithm which calculates the 'right' peak. Perhaps someone could elaborate on some simple algortihm which does the job?
See, for instance, the following plot of the sequences next to their autocorrelation.
Code:
1 # Plotting sequences satisfying, x_{i+1} = p-1 - (p*i-1 mod x_i)
2 # with p prime and x_0 = 1, next to their autocorrelation.
3
4 from __future__ import print_function
5 import numpy as np
6 from matplotlib import pyplot as plt
7
8 # The length of the sequences.
9 seq_length = 10000
10
11 upperbound_primes = 12
12
13 # Computing a list of prime numbers up to n
14 def primes(n):
15 sieve = [True] * n
16 for i in xrange(3,int(n**0.5)+1,2):
17 if sieve[i]:
18 sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
19 return [2] + [i for i in xrange(3,n,2) if sieve[i]]
20
21 # The list of prime numbers up to upperbound_primes
22 p = primes(upperbound_primes)
23
24 # The amount of primes numbers
25 no_primes = len(p)
26
27 # Generate the sequence for the prime number p
28 def sequence(p):
29 x = np.empty(seq_length)
30 x[0] = 1
31 for i in range(1,seq_length):
32 x[i] = p - 1 - (p * (i-1) - 1) % x[i-1]
33 return x
34
35 # List with the sequences.
36 seq = [sequence(i) for i in p]
37
38 # Autocorrelation function.
39 def autocor(x):
40 result = np.correlate(x,x,mode='full')
41 return result[result.size/2:]
42
43 fig = plt.figure("The sequences next to their autocorrelation")
44 plt.suptitle("The sequences next to their autocorrelation")
45
46 # Proper spacing between subplots.
47 fig.subplots_adjust(hspace=1.2)
48
49 # Set up pyplot to use TeX.
50 plt.rc('text',usetex=True)
51 plt.rc('font',family='serif')
52
53 # Maximize plot window by command.
54 mng = plt.get_current_fig_manager()
55 mng.resize(*mng.window.maxsize())
56
57 k = 0
58 for s in seq:
59 k = k + 1
60 fig.add_subplot(no_primes,2,2*(k-1)+1)
61 plt.title("Sequence of the prime %d" % p[k-1])
62 plt.plot(s)
63 plt.xlabel(r"Index $i$")
64 plt.ylabel(r"Sequence number $x_i$")
65 plt.xlim(0,100)
66
67 # Constrain the number of ticks on the y-axis, for clarity.
68 plt.locator_params(axis='y',nbins=4)
69
70 fig.add_subplot(no_primes,2,2*k)
71 plt.title(r"Autocorrelation of the sequence $^{%d}x$" % p[k-1])
72 plt.plot(autocor(s))
73 plt.xlabel(r"Index $i$")
74 plt.xticks
75 plt.ylabel("Autocorrelation")
76
77 # Proper scaling of the y-axis.
78 ymin = autocor(s)[1]-int(autocor(s)[1]/10)
79 ymax = autocor(s)[1]+int(autocor(s)[1]/10)
80 plt.ylim(ymin,ymax)
81 plt.xlim(0,500)
82
83 plt.locator_params(axis='y',nbins=4)
84
85 # Use scientific notation when 0< t < 1 or t > 10
86 plt.ticklabel_format(style='sci',axis='y',scilimits=(0,1))
87
88 plt.show()
When using FFT, we Fourier transform our sequence and look for the first peak. The coordinate of this first peak, gives the frequency which represents our signal the coarsest. This will give our period since the coarsest frequency is the frequency by which our sequence (ideally) oscillates.
See the following plot of the sequences next to their Fourier transforms.
Code:
1 # Plotting sequences satisfying, x_{i+1} = p-1 - (p*i-1 mod x_i)
2 # with p prime and x_0 = 1, next to their Fourier transforms.
3
4 from __future__ import print_function
5 import numpy as np
6 from matplotlib import pyplot as plt
7
8 # The length of the sequences.
9 seq_length = 10000
10
11 upperbound_primes = 12
12
13 # Computing a list of prime numbers up to n
14 def primes(n):
15 sieve = [True] * n
16 for i in xrange(3,int(n**0.5)+1,2):
17 if sieve[i]:
18 sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
19 return [2] + [i for i in xrange(3,n,2) if sieve[i]]
20
21 # The list of prime numbers up to upperbound_primes
22 p = primes(upperbound_primes)
23
24 # The amount of primes numbers
25 no_primes = len(p)
26
27 # Generate the sequence for the prime number p
28 def sequence(p):
29 x = np.empty(seq_length)
30 x[0] = 1
31 for i in range(1,seq_length):
32 x[i] = p - 1 - (p * (i-1) - 1) % x[i-1]
33 return x
34
35 # List with the sequences.
36 seq = [sequence(i) for i in p]
37
38 fig = plt.figure("The sequences next to their FFT")
39 plt.suptitle("The sequences next to their FFT")
40
41 # Proper spacing between subplots.
42 fig.subplots_adjust(hspace=1.2)
43
44 # Set up pyplot to use TeX.
45 plt.rc('text',usetex=True)
46 plt.rc('font',family='serif')
47
48
49 # Maximize plot window by command.
50 mng = plt.get_current_fig_manager()
51 mng.resize(*mng.window.maxsize())
52
53 k = 0
54 for s in seq:
55 f = np.fft.rfft(s)
56 f[0] = 0
57 freq = np.fft.rfftfreq(seq_length)
58 k = k + 1
59 fig.add_subplot(no_primes,2,2*(k-1)+1)
60 plt.title("Sequence of the prime %d" % p[k-1])
61 plt.plot(s)
62 plt.xlabel(r"Index $i$")
63 plt.ylabel(r"Sequence number $x_i$")
64 plt.xlim(0,100)
65
66 # Constrain the number of ticks on the y-axis, for clarity.
67 plt.locator_params(nbins=4)
68
69 fig.add_subplot(no_primes,2,2*k)
70 plt.title(r"FFT of the sequence $^{%d}x$" % p[k-1])
71 plt.plot(freq,abs(f))
72 plt.xlabel("Frequency")
73 plt.ylabel("Amplitude")
74 plt.locator_params(nbins=4)
75
76 # Use scientific notation when 0 < t < 0 or t > 10
77 plt.ticklabel_format(style='sci',axis='y',scilimits=(0,1))
78
79 plt.show()
To see why the FFT method is more convenient then autocorrelation notice that we have a clear algorithm of determining the period: find the first peak of the Fourier transform. For a sufficient number of samples this always works.
See the following table, attained by the FFT method, which agrees with the autocorrelation method.
prime frequency period
2 0.00 1000.00
3 0.50 2.00
5 0.08 12.00
7 0.02 59.88
11 0.00 1000.00
The following code implements the algorithm, printing a table specifying the frequency and period of the sequences per prime number.
1 # Print a table of periods, determined by the FFT method,
2 # of sequences satisfying,
3 # x_{i+1} = p-1 - (p*i-1 mod x_i) with p prime and x_0 = 1.
4
5 from __future__ import print_function
6 import numpy as np
7 from matplotlib import pyplot as plt
8
9 # The length of the sequences.
10 seq_length = 10000
11
12 upperbound_primes = 12
13
14 # Computing a list of prime numbers up to n
15 def primes(n):
16 sieve = [True] * n
17 for i in xrange(3,int(n**0.5)+1,2):
18 if sieve[i]:
19 sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
20 return [2] + [i for i in xrange(3,n,2) if sieve[i]]
21
22 # The list of prime numbers up to upperbound_primes
23 p = primes(upperbound_primes)
24
25 # The amount of primes numbers
26 no_primes = len(p)
27
28 # Generate the sequence for the prime number p
29 def sequence(p):
30 x = np.empty(seq_length)
31 x[0] = 1
32 for i in range(1,seq_length):
33 x[i] = p - 1 - (p * (i-1) - 1) % x[i-1]
34 return x
35
36 # List with the sequences.
37 seq = [sequence(i) for i in p]
38
39 # Function that finds the first peak.
40 # Assumption: seq_length >> 10 so the Fourier transformed
41 # signal is sufficiently smooth.
42 def firstpeak(x):
43 for i in range(10,len(x)-1):
44 if x[i+1] < x[i]:
45 return i
46 return len(x)-1
47
48 k = 0
49 for s in seq:
50 f = np.fft.rfft(s)
51 freq = np.fft.rfftfreq(seq_length)
52 k = k + 1
53 if k == 1:
54 print("prime \t frequency \t period")
55 print(p[k-1],'\t %.2f' % float(freq[firstpeak(abs(f))]), \
56 '\t\t %.2f' % float(1/freq[firstpeak(abs(f))]))
I used 10000 samples (seq_length) in all the above code. As we increase the number of samples the periods are seen to converge to a certain integral value (Using the FFT method).
The FFT method seems to me like an ideal tool to determine periods in algorithmic signals, one only being limited by how high a sample number your equipment can handle.