Converting Matrix Definition to Zero-Indexed Notation - NumPy - Python

I am trying to construct a numpy array (a 2-dimensional numpy array, i.e. a matrix) from a paper that uses non-standard indexing to construct the matrix, i.e. the top-left element is q_{1,2} instead of q_{0,0}.
Define the n x (n-2) matrix Q by its elements q_{i,j}, for i = 1, ..., n and j = 2, ..., n-1, given by
q_{j-1,j} = h_{j-1}^{-1},   q_{j,j} = -h_{j-1}^{-1} - h_{j}^{-1},   q_{j+1,j} = h_{j}^{-1},
with all other entries zero. (I have posted this in LaTeX form here: http://www.texpaste.com/n/8vwds4fx)
I have tried to implement in python like this:
# n = u_s.size
# n = 299 for this example
n = 299
Q = np.zeros((n, n-2))
for i in range(0, n+1):
    for j in range(2, n):
        Q[j-1, j] = 1.0/h[j-1]
        Q[j, j] = -1.0/h[j-1] - 1.0/h[j]
        Q[j+1, j] = 1.0/h[j]
But I always get the error:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-54-c07a3b1c81bb> in <module>()
      1 for i in range(1,n+1):
      2     for j in range(2,n-1):
----> 3         Q[j-1,j] = 1.0/h[j-1]
      4         Q[j,j] = -1.0/h[j-1] - 1.0/h[j]
      5         Q[j+1,j] = 1.0/h[j]
IndexError: index 297 is out of bounds for axis 1 with size 297
I initially thought I could decrement both i and j in my for loop to keep edge cases safe, as a quick way to move to zero-indexed notation, but this hasn't worked. I also tried incrementing and modifying the range().
Is there a way to convert this definition to one that python can handle? Is this a common issue?

Simplifying the problem to make the assignment pattern obvious:
In [228]: h = np.arange(10, 15)
In [229]: Q = np.zeros((5, 5), int)
In [230]: for j in range(1, 5):
     ...:     Q[j-1:j+2, j] = h[j-1:j+2]
In [231]: Q
Out[231]:
array([[ 0, 10,  0,  0,  0],
       [ 0, 11, 11,  0,  0],
       [ 0, 12, 12, 12,  0],
       [ 0,  0, 13, 13, 13],
       [ 0,  0,  0, 14, 14]])
Assignment to the partial first and last columns may need tweaking. Here's the equivalent built from diagonals:
In [232]: np.diag(h,0)+np.diag(h[:-1],1)+np.diag(h[1:],-1)
Out[232]:
array([[10, 10,  0,  0,  0],
       [11, 11, 11,  0,  0],
       [ 0, 12, 12, 12,  0],
       [ 0,  0, 13, 13, 13],
       [ 0,  0,  0, 14, 14]])
With the h[j-1], h[j] indexing this diagonal assignment probably needs tweaking, but it should be a useful starting point.
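For instance, here is a sketch of one such tweak that builds the paper's n x (n-2) shape directly from three shifted (n-2) x (n-2) diagonal blocks (assuming h holds the n-1 spacings h_1, ..., h_{n-1} in h[0..n-2]; the names and the toy n are illustrative, not from the question):

import numpy as np

n = 6
h = np.arange(10.0, 10.0 + n - 1)                   # placeholder spacings
Q = np.zeros((n, n - 2))
Q[:-2, :] += np.diag(1.0 / h[:-1])                  # q_{j-1,j} band
Q[1:-1, :] += np.diag(-1.0 / h[:-1] - 1.0 / h[1:])  # q_{j,j} band
Q[2:, :] += np.diag(1.0 / h[1:])                    # q_{j+1,j} band

Each np.diag call produces an (n-2) x (n-2) block, and the row slices shift it down by zero, one and two rows, which places the three bands without any special-casing of the end columns.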
Selecting h values more like what you use (skipping the 1/h for now):
In [238]: Q = np.zeros((5, 5), int)
In [239]: for j in range(1, 4):
     ...:     Q[j-1:j+2, j] = [h[j-1], h[j-1]+h[j], h[j]]
     ...:
In [240]: Q
Out[240]:
array([[ 0, 10,  0,  0,  0],
       [ 0, 21, 11,  0,  0],
       [ 0, 11, 23, 12,  0],
       [ 0,  0, 12, 25,  0],
       [ 0,  0,  0, 13,  0]])
I'm skipping the two partial end columns for now. The first slicing approach allowed me to be a bit sloppy, since it's ok to slice 'off the end'. The end columns, if set, will require their own expressions.
In [241]: j = 0; Q[j:j+2, j] = [h[j], h[j]]
In [242]: j = 4; Q[j-1:j+1, j] = [h[j-1], h[j-1]+h[j]]
In [243]: Q
Out[243]:
array([[10, 10,  0,  0,  0],
       [10, 21, 11,  0,  0],
       [ 0, 11, 23, 12,  0],
       [ 0,  0, 12, 25, 13],
       [ 0,  0,  0, 13, 27]])
The relevant diagonal pieces are still evident:
In [244]: h[1:]+h[:-1]
Out[244]: array([21, 23, 25, 27])

The equation doesn't contain any value for i; it refers only to j. Q should be a matrix of dimension (n+2) x (n+2). For j = 1 it refers to Q[0,1], Q[1,1] and Q[2,1]; for j = n it refers to Q[n-1,n], Q[n,n] and Q[n+1,n]. So Q should have indices from 0 to n+1, which is n+2 values.
I don't think you require the i loop. You can achieve your result with only the j loop running from 1 to n, but Q should be indexed from 0 to n+1.
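To make the index shift concrete, here is a minimal sketch that keeps the paper's n x (n-2) shape and simply maps q_{i,j} to the zero-based Q[i-1, j-2] (assuming h stores the spacings h_1, ..., h_{n-1} in h[0..n-2]; build_Q is a hypothetical helper name):

import numpy as np

def build_Q(h):
    n = len(h) + 1
    Q = np.zeros((n, n - 2))
    for j in range(2, n):                  # the paper's j = 2, ..., n-1
        c = j - 2                          # zero-based column index
        Q[j - 2, c] = 1.0 / h[j - 2]                     # q_{j-1,j} = 1/h_{j-1}
        Q[j - 1, c] = -1.0 / h[j - 2] - 1.0 / h[j - 1]   # q_{j,j}
        Q[j, c] = 1.0 / h[j - 1]                         # q_{j+1,j} = 1/h_j
    return Q

Only the j loop is needed, and every index stays in bounds because the one-based row i and column j are both shifted before they touch the array.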

Related

Clustering a list with nearest values without sorting

I have a list like this
tst = [1,3,4,6,8,22,24,25,26,67,68,70,72,0,0,0,0,0,0,0,4,5,6,36,38,36,31]
I want to group the elements from above list into separate groups/lists based on the difference between the consecutive elements in the list (differing by 1 or 2 or 3).
I have tried following code
def slice_when(predicate, iterable):
    i, x, size = 0, 0, len(iterable)
    while i < size-1:
        if predicate(iterable[i], iterable[i+1]):
            yield iterable[x:i+1]
            x = i + 1
        i += 1
    yield iterable[x:size]
tst = [1,3,4,6,8,22,24,25,26,67,68,70,72,0,0,0,0,0,0,0,4,5,6,36,38,36,31]
slices = slice_when(lambda x,y: (y - x > 2), tst)
whola=(list(slices))
I got this results
[[1, 3, 4, 6, 8], [22, 24, 25, 26], [67, 68, 70, 72, 0, 0, 0, 0, 0, 0, 0], [4, 5, 6], [36, 38, 36, 31]]
In the 3rd list it doesn't separate the sequence of zeros into another list. Any kind of help is highly appreciated. Thank you
I guess this is what you want?
tst = [1,3,4,6,8,22,24,25,26,67,68,70,72,0,0,0,0,0,0,0,4,5,6,36,38,36,31]
slices = slice_when(lambda x,y: (abs(y - x) > 2), tst) # Use abs!
whola=(list(slices))
print(whola)
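The original predicate y - x > 2 only fires on upward jumps; the drop from 72 to 0 gives y - x = -72, which never exceeds 2, so the zeros stayed glued to the previous group. With abs, the grouping should come out as follows (a hand-traced sketch; note that 31 now also splits off, since |31 - 36| = 5):

# [[1, 3, 4, 6, 8], [22, 24, 25, 26], [67, 68, 70, 72],
#  [0, 0, 0, 0, 0, 0, 0], [4, 5, 6], [36, 38, 36], [31]]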

Alternatives for numpy.random generation with choice values and specific frequency of values

I am working on generating a (1109, 8) array with random values drawn from a fixed set of numbers [18, 24, 36, 0]. I need to ensure each row contains 5 zeros at all times, but that wasn't happening even after adjusting the probability weightings.
My workaround code is below but wanted to know if there is an easier way with another function? or perhaps by adjusting some of the parameters of the generator?
https://numpy.org/doc/stable/reference/random/generator.html
#Random output using new method
from numpy.random import default_rng
rng = default_rng(1)
#generate an array with random values of test duration,
test_duration = rng.choice([18, 24, 36, 0], size = arr.shape, p=[0.075, 0.1, 0.2, 0.625])
# ensure number of tests equals n_tests
n_tests = 3
non_tested = arr.shape[1] - n_tests
for row in range(len(test_duration)):
    while np.count_nonzero(test_duration[row, :]) != n_tests:
        new_test = rng.choice([18, 24, 36, 0], size=arr.shape[1], p=[0.075, 0.1, 0.2, 0.625])
        test_duration[row, :] = np.array(new_test)
    else:
        pass
print('There are no days exceeding n_tests')
#print(test_durations)
print(test_duration[:10, :])
If you need 5 zeros in every row, you can just randomly select 3 values from [18, 24, 36], pad the rest with zeros and then do a per-row random shuffle. The numpy shuffle happens in-place, so you don't need to reassign.
import numpy as np
c = [18, 24, 36]
p = np.array([0.075, 0.1, 0.2])
p = p / p.sum()  # normalize the probs
a = np.random.choice(c, size=(1109, 3), replace=True, p=p)
a = np.hstack([a, np.zeros((1109, 5), dtype=np.int32)])
list(map(np.random.shuffle, a))  # shuffles each row in place
a
# returns:
array([[ 0,  0,  0,  0, 36,  0, 36, 36],
       [ 0, 36,  0, 24, 24,  0,  0,  0],
       [ 0,  0,  0,  0, 36, 36, 36,  0],
       ...,
       [ 0,  0,  0, 24, 24, 36,  0,  0],
       [ 0, 24,  0,  0,  0, 36,  0, 18],
       [ 0,  0,  0, 36, 36, 24,  0,  0]])
You could simply make a random choice for the 5 positions of the zeros in each row; this way you enforce that there are indeed 5 zeros, and afterwards you sample the [18, 24, 36] values with their normalized probabilities.
But by doing this you are not respecting the probability density you specified in the first place. I don't know which application you're using this for, but that is a point to consider.
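A minimal sketch of that position-based idea (assuming the (1109, 8) shape with 5 zeros per row from the question, and NumPy >= 1.20 for rng.permuted; the variable names are illustrative):

import numpy as np

rng = np.random.default_rng(1)
n_rows, n_cols, n_zeros = 1109, 8, 5
p = np.array([0.075, 0.1, 0.2])
vals = rng.choice([18, 24, 36], size=(n_rows, n_cols - n_zeros), p=p / p.sum())
out = np.zeros((n_rows, n_cols), dtype=int)
# pick 3 distinct random column positions per row for the nonzero entries
cols = rng.permuted(np.tile(np.arange(n_cols), (n_rows, 1)), axis=1)[:, :n_cols - n_zeros]
np.put_along_axis(out, cols, vals, axis=1)

Every row then has exactly 5 zeros by construction, with no rejection loop.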

Detecting edge on square wave

I have two lists, one for time and other for amplitude.
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20] # (seconds for example) the step isn't fixed
ampli = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] # ugly spacing to facilitate reading
I want to know when there's a change from '0' to '1' or vice-versa, but I only care if the event happens after verify_time = X. So, if verify_time = 12.5 it would return time[8] = 13 and time[10] = 16.
What I have so far is:
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20] # (seconds for example) the step isn't fixed
ampli = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] # ugly spacing to facilitate the reading
verify_time = 12.5
start_end = []
for i, (t, a) in enumerate(zip(time, ampli)):
    if t >= verify_time:  # should check the values from here
        if ampli[i-1] and (a != ampli[i-1]):  # there's a change from 0 to 1 or vice-versa
            start_end.append(i)
print(f"Start: {time[start_end[0]]}s")
print(f"End: {time[start_end[1]]}s")
This will print:
Start: 13s
End: 17s
Question 1) Shouldn't it print End: 16s? I'm kind of lost with this logic, because the number of '1's is three (3).
Question 2) Is there another way to get the same results without this for-if-if construction? I find it awkward; in Matlab I would use the diff() function.
If you don't mind using numpy, it is easiest (and, on larger lists, faster) to find edges by calculating differences, unless your waves take gigabytes and go out of memory:
import numpy as np
verify_time = 12.5
time = np.array([0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20])
ampli = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
ind = time>verify_time
time = time[ind]
ampli = ampli[ind]
d_ampli = np.diff(ampli)
starts = np.where(d_ampli>0)[0]
ends = np.where(d_ampli<0)[0]-1
UPDATE
I forgot to change the diff properly; it should be d_ampli = np.diff(ampli, prepend=ampli[0]).
UPDATE
As you noted, the original answer returns an empty starts. The reason is that after filtering, ampli starts with [1, 1, ...], so there is no edge. A philosophical question arises here: does the edge really start before 12.5 or after it? We don't know, and I'm fairly sure you won't care. What you want here is a backward differencing scheme, which numpy does not provide, so we trick it by shifting everything forward one index:
import numpy as np
verify_time = 12.5
time = np.array([0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20])
ampli = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
d_ampli = np.r_[[0], np.diff(ampli)]
starts = np.where(d_ampli > 0)[0]
ends = np.where(d_ampli < 0)[0] - 1
starts = starts[time[starts] > verify_time]
ends = ends[time[ends] > verify_time]
starts, ends
(array([8], dtype=int64), array([10], dtype=int64))
It prints 17s because you take note of the first value after the change, which is 17 for the first 0 after the end of the square wave.
I've simplified the logic into a list comprehension, so it should make more sense:
assert len(time) == len(ampli)
start_end = [i for i in range(len(time)) if time[i] >= verify_time and ampli[i-1] is not None and (ampli[i] != ampli[i-1])]
print(f"Start: {time[start_end[0]]}s")
print(f"End: {time[start_end[1]]}s")
Also, you had an issue where if ampli[i-1] was False whenever the value was 0; fixed that too. It would be most accurate if you took the average of time[start_end[0]] and time[start_end[0]-1], since all you know, given your resolution, is that the transition occurred somewhere between those two samples.
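For instance, a one-line sketch of that midpoint estimate, using the names from the snippet above:

# the edge happened somewhere between samples start_end[0]-1 and start_end[0]
start_estimate = (time[start_end[0]] + time[start_end[0] - 1]) / 2  # (13 + 11) / 2 = 12.0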
I've made the below solution to have a straightforward algorithm. In summary, it goes as follows:
Convert lists to NumPy arrays
Find closest value in time array to verify_time, cut off all indexes that occur beforehand.
NumPys' "diff" method is great for finding rising and falling edges. Once those edges are found, we can use NumPys' "where" method to look up the indexes and then return the time found at the same indexes in the time array.
Coding Environment
Python 3.6 (Minimum Requirement for the print statements)
NumPy 1.15.2 (Older versions are probably fine)
import numpy as np
# inputs
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20] # (seconds for example) the step isn't fixed
ampli = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] # ugly spacing to facilitate the reading
verify_time = 12.5
# ------------------------------------------
# Solution
# Step 1) Convert lists to Numpy Arrays
npTime = np.array(time)
npAmplitude = np.array(ampli) # Amplitude
# Step 2) Find closest Value in time array to 'verify_time'.
# Strategy:
# i) Subtract 'verify_time' from each value in array. (Produces an array of Diffs)
# ii) The Diff that is nearest to zero, or better yet is zero is the best match for 'verify_time'
# iii) Get the array index of the Diff selected in step ii
# Step i
npDiffs = np.abs(npTime - float(verify_time))
# Step ii
smallest_value = np.amin(npDiffs)
# Step iii (Use numpy.where to lookup array index)
first_index_we_care_about = (np.where(npDiffs == smallest_value)[0])[0]
first_index_we_care_about = first_index_we_care_about - 1 # Below edge detection requires previous index
# Remove the beginning parts of the arrays that the question doesn't care about
npTime = npTime[first_index_we_care_about:len(npTime)]
npAmplitude = npAmplitude[first_index_we_care_about:len(npAmplitude)]
# Step 3) Edge Detection: Find the rising and falling edges
# Generates a 1 when rising edge is found, -1 for falling edges, 0s for no change
npEdges = np.diff(npAmplitude)
# For Reference
# Here you can see that numpy diff placed a 1 before all rising edges, and a -1 before falling
# ampli [ 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
# npEdges [ 0, 1, 0, -1, 0, 0, 0, 1, 0, 0, -1, 0, 0]
# Get array indexes where a 1 is found (I.e. A Rising Edge)
npRising_edge_indexes = np.where(npEdges == 1)[0]
# Get array indexes where a -1 is found (I.e. A Falling Edge)
npFalling_edge_indexes = np.where(npEdges == -1)[0]
# Print times that edges are found after 'verify_time'
# Note: Adjust edge detection index by '+1' to answer question correctly (yes this is consistent)
print(f'Start: {npTime[npRising_edge_indexes[0]+1]}s')
print(f'End: {npTime[npFalling_edge_indexes[0]+1]}s')
Output
Start: 13s
End: 17s

Map an element in a multi-dimension array to its index

I am using the function get_tuples(length, total) from here
to generate an array of all tuples of given length and sum, an example and the function are shown below. After I have created the array I need to find a way to return the indices of a given number of elements in the array. I was able to do that using .index() by changing the array to a list, as shown below. However, this solution or another solution that is also based on searching (for example using np.where) takes a lot of time to find the indices. Since all elements in the array (array s in the example) are different, I was wondering if we can construct a one-to-one mapping, i.e., a function such that given the element in the array it returns the index of the element by doing some addition and multiplication on the values of this element. Any ideas if that is possible? Thanks!
import numpy as np
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# array s
In [1]: s
Out[1]:
array([[ 0, 0, 0, 20],
[ 0, 0, 1, 19],
[ 0, 0, 2, 18],
...,
[19, 0, 1, 0],
[19, 1, 0, 0],
[20, 0, 0, 0]])
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
#change array to list
s_list = s.tolist()
#find the indices
indx=[s_list.index(i) for i in elements_to_find.tolist()]
#output
In [2]: indx
Out[2]: [0, 7, 100, 5, 45]
Here is a formula that calculates the index based on the tuple alone, i.e. it needn't see the full array. To compute the index of an N-tuple it needs to evaluate N-1 binomial coefficients. The following implementation is (part-) vectorized, it accepts ND-arrays but the tuples must be in the last dimension.
import numpy as np
from scipy.special import comb
# unfortunately, comb with option exact=True is not vectorized
def bc(N, k):
    return np.round(comb(N, k)).astype(int)

def get_idx(s):
    N = s.shape[-1] - 1
    R = np.arange(1, N)
    ps = s[..., ::-1].cumsum(-1)
    B = bc(ps[..., 1:-1] + R, 1 + R)
    return bc(ps[..., -1] + N, N) - ps[..., 0] - 1 - B.sum(-1)
# OP's generator
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# compute each index
r = get_idx(s)
# expected: 0,1,2,3,...
assert (r == np.arange(len(r))).all()
print("all ok")
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
print(get_idx(elements_to_find))
Sample run:
all ok
[ 0 7 100 5 45]
How to derive the formula:
1. Use stars and bars to express the full partition count #part(N,k) (N is the total, k the length) as a single binomial coefficient: (N + k - 1) choose (k - 1).
2. Count back-to-front: it is not hard to verify that after the i-th full iteration of the outer loop of OP's generator, exactly #part(N-i,k) tuples have not yet been enumerated. Indeed, what's left are all partitions p1+p2+... = N with p1 >= i; we can write p1 = q1+i such that q1+p2+... = N-i, and this latter partition is constraint-free, so we can use 1. to count.
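As a quick sanity check of step 1 (a sketch using the get_tuples generator from above and Python 3.8+'s math.comb):

from math import comb

# stars and bars: #part(N, k) = C(N + k - 1, k - 1); for N=20, k=4 that is C(23, 3) = 1771
assert comb(20 + 4 - 1, 4 - 1) == len(list(get_tuples(4, 20)))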
You can use binary search to make the search a lot faster.
Binary search makes the search O(log(n)) rather than the O(n) of .index()
We do not need to sort the tuples since they are already sorted by the generator
import bisect
def get_tuples(length, total):
    " Generates tuples "
    if length == 1:
        yield (total,)
        return
    yield from ((i,) + t for i in range(total + 1) for t in get_tuples(length - 1, total - i))
def find_indexes(x, indexes):
    if len(indexes) > 100:
        # Faster to generate all indexes when we have a large
        # number to check
        d = dict(zip(x, range(len(x))))
        return [d[tuple(i)] for i in indexes]
    else:
        return [bisect.bisect_left(x, tuple(i)) for i in indexes]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
# Tuples are generated in sorted order [(0,0,0,20), ...(20,0,0,0)]
# which allows binary search to be used
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
print('Found indexes:', *y)
print('Indexes & Tuples:')
for i in y:
    print(i, x[i])
Output
Found indexes: 0 7 100 5 45
Indexes & Tuples:
0 (0, 0, 0, 20)
7 (0, 0, 7, 13)
100 (0, 5, 5, 10)
5 (0, 0, 5, 15)
45 (0, 2, 4, 14)
Performance
Scenario 1--Tuples already computed, and we just want to find the index of certain tuples.
For instance, x = list(get_tuples(4, 20)) has already been performed.
Search for
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
Binary Search
%timeit find_indexes(x, indexes)
100000 loops, best of 3: 11.2 µs per loop
Calculates the index based on the tuple alone (courtesy #PaulPanzer approach)
%timeit get_idx(indexes)
10000 loops, best of 3: 92.7 µs per loop
In this scenario, binary search is ~8x faster when tuples have already been pre-computed.
Scenario 2--the tuples have not been pre-computed.
%%timeit
import bisect
def find_indexes(x, t):
    " finds the index of each tuple in list t (assumes x is sorted) "
    return [bisect.bisect_left(x, tuple(i)) for i in t]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
100 loops, best of 3: 2.69 ms per loop
The #PaulPanzer approach has the same timing in this scenario (92.7 µs).
=> the #PaulPanzer approach is ~29 times faster when the tuples have not been pre-computed
Scenario 3--Large number of indexes (#PJORR)
A large number of random indexes is generated
x = list(get_tuples(4, 20))
xnp = np.array(x)
indices = xnp[np.random.randint(0,len(xnp), 2000)]
indexes = indices.tolist()
%timeit find_indexes(x, indexes)
#Result: 1000 loops, best of 3: 1.1 ms per loop
%timeit get_idx(indices)
#Result: 1000 loops, best of 3: 716 µs per loop
In this case, the #PaulPanzer approach is ~53% faster.

Issue with true division with Numpy arrays

Suppose you have this array:
In [29]: a = array([[10, 20, 30, 40, 50], [14, 28, 42, 56, 70], [18, 36, 54, 72, 90]])
In [30]: a
Out[30]:
array([[10, 20, 30, 40, 50],
       [14, 28, 42, 56, 70],
       [18, 36, 54, 72, 90]])
Now divide the first row by the third one (using from __future__ import division):
In [32]: a[0]/a[2]
Out[32]: array([ 0.55555556, 0.55555556, 0.55555556, 0.55555556, 0.55555556])
Now do the same with each row in a loop:
In [33]: for i in range(3):
   ....:     print a[i]/a[2]
[ 0.55555556  0.55555556  0.55555556  0.55555556  0.55555556]
[ 0.77777778  0.77777778  0.77777778  0.77777778  0.77777778]
[ 1.  1.  1.  1.  1.]
Everything looks right. But now, assign the first array a[i]/a[2] to a[i]:
In [35]: for i in range(3):
   ....:     a[i] /= a[2]
   ....:
In [36]: a
Out[36]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1]])
Alright, no problem. Turns out this is by design. Instead, we should do:
In [38]: for i in range(3):
   ....:     a[i] = a[i]/a[2]
   ....:
In [39]: a
Out[39]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1]])
But that doesn't work. Why and how can I fix it?
Thanks in advance.
You can cast the whole array to a float array first:
a = a.astype('float')
a /= a[2]
"Why doesn't this work" -- The reason it doesn't work is because numpy arrays have a datatype when they're created. Any attempt to put a different type into that array will be cast to the appropriate type. In other words, when you try to put a float into your integer array, numpy casts the float to an int. The reasoning behind this is because numpy arrays are designed to be a homogonous type in order for them to have optimal performance. Put another way, they're implemented as arrays in C. And in C, you can't have an array where 1 element is a float and the next is an int. (You can have structs which behave like that, but they're not arrays).
Another solution (in addition to the one proposed by #nneonneo) is to specify the array as a float array from the beginning:
a = array([[10, 20, 30, 40, 50], [14, 28, 42, 56, 70], [18, 36, 54, 72, 90]], dtype=float)
It's not the division that's the issue, it's the assignment, i.e. a[i] = ... (which is also used behind the scenes when you do a /= ...). Try this:
>>> a = np.zeros(3, dtype='uint8')
>>> a[:] = [2, -3, 5.9]
>>> print a
[ 2 253 5]
When you do intarray[i] = floatarray[i] numpy has to truncate the floating point values to get them to fit into intarray.
