I have a matrix A = Matrix([[1, 0, 0, 20], [-1, 1, 0, 0], [-2, 1, 0, 0], [0, -1, 1, 0]]), a sympy object.
I want to know if there is a conflicting row, meaning a row in which, after I reduce the matrix, all the terms are zero apart from the rightmost one.
This seems easy to do on paper, but I think I misunderstand sympy.
Basically the output from rref method is not what I expected.
Notice that if we row reduce A with pen and paper, we should get Matrix([[1, 0, 0, 20], [0, 1, 0, 20], [0, 0, 0, 20], [0, 0, 1, 20]]) at a certain point.
So row number 2 is a conflicting row.
However, when I use A.rref() I get something else entirely: Matrix([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]) and the list [0, 1, 2, 3].
I don't understand how they reached this result or how to interpret the list. How can I find the conflicting rows using sympy?
Sympy's answer is correct. The matrix you reached by reducing manually is not the end of the row-reduction process, which explains the difference between your answer and sympy's.
To continue the row-reduction from your matrix, swap rows 2 and 3 (the third and fourth rows), and you get
matrix([
[ 1, 0, 0, 20],
[ 0, 1, 0, 20],
[ 0, 0, 1, 20],
[ 0, 0, 0, 20]])
Now subtract row 3 (the last row) from each of the other rows, then divide that last row by 20, and we get
matrix([
[ 1, 0, 0, 0],
[ 0, 1, 0, 0],
[ 0, 0, 1, 0],
[ 0, 0, 0, 1]])
which is sympy's answer.
There are multiple ways to interpret this result. One way is to think of a system of 4 linear equations in 3 variables: the last column of the matrix holds the constants on the right side of the equations, while the other columns are the variable coefficients. Your original matrix represents the equations
x = 20
- x + y = 0
- 2x + y = 0
- y + z = 0
and sympy's row reduction shows this system has the same solutions as
x = 0
y = 0
z = 0
0 = 1
which, of course, has no solutions at all, thanks to the last equation.
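You can confirm this with sympy's own solver; a minimal sketch using linsolve, which accepts the augmented matrix directly (the symbols are just for labeling):
from sympy import Matrix, linsolve, symbols

x, y, z = symbols('x y z')
A = Matrix([[1, 0, 0, 20], [-1, 1, 0, 0], [-2, 1, 0, 0], [0, -1, 1, 0]])
linsolve(A, x, y, z)  # EmptySet: the system has no solutions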
Also, you seem to have a misunderstanding of what row reduction can do. You ask, "How can I find the conflicting rows using sympy?" and "if there is a conflicting row." Row reduction does not find which row conflicts; it finds whether the rows together conflict. The rref process cannot point to a conflicting row, since it swaps rows as needed to get a non-zero pivot value into the proper place, so the rows of the starting and ending matrices do not correspond. Moreover, it is not true that one row conflicts with the others, just that all the rows together conflict. In your matrix, you could remove any one of the first 3 rows and the result would be non-conflicting. (Removing the last row still leaves a conflicting matrix.) So which row can you say conflicts? There usually is no single conflicting row, so rref() or any other method cannot possibly find one.
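As for detecting the inconsistency itself with sympy: rref() returns the reduced matrix together with a tuple of pivot-column indices (that is the [0, 1, 2, 3] you saw). If the rightmost column, the column of constants, is among the pivots, the system is inconsistent. A minimal sketch:
from sympy import Matrix

A = Matrix([[1, 0, 0, 20], [-1, 1, 0, 0], [-2, 1, 0, 0], [0, -1, 1, 0]])
reduced, pivots = A.rref()
inconsistent = (A.cols - 1) in pivots  # True: some row reduces to 0 = 1
print(inconsistent)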
I have a two-dimensional list like:
data = [[0,0,0,0,0,1,0,0,0,0], [0,1,0,0,0,0,0,0,0,0]]
How can I access the index of the neighbours, where the value equals 1?
Expected output:
[[4, 5, 6], [0, 1, 2]]
For example, in the first row of data the value 1 is at index 5, so I need to access its left and right neighbour indices, 4 and 6. The same applies to row 2.
If I understand the description correctly (please clarify if not), maybe you can try this one. Additionally, you should handle the edge cases where there is no 1 in a row, or where the 1 has no left or right neighbour (a clamped sketch follows the code below).
import numpy as np

a = np.array([
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]])

if __name__ == "__main__":
    indices = np.where(a == 1)[1]  # column index of each 1
    indices = indices.reshape(-1, 1)
    # stack each index with its left and right neighbours
    indices = np.concatenate([indices - 1, indices, indices + 1], -1)
    print(indices)
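For the edge cases where the 1 sits at the start or end of a row, one option (a sketch, using np.clip to clamp the indices) is:
import numpy as np

a = np.array([
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]])  # 1 at the left edge

cols = np.where(a == 1)[1].reshape(-1, 1)
neighbours = np.concatenate([cols - 1, cols, cols + 1], -1)
neighbours = np.clip(neighbours, 0, a.shape[1] - 1)  # clamp out-of-range indices
print(neighbours)  # [[4 5 6], [0 0 1]]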
A straightforward (though not vectorized) solution uses for loops:
for i in range(len(a)):
    for j in range(len(a[i])):
        if a[i][j] == 1:
            print(str(i) + ' ' + str(j))
If using lists, here is one approach which identifies the indexes of the neighbours of 1. As a caveat, this will produce out-of-range indices if the 1 value is the first or last element in the list (a clamped variant follows the example).
Input:
data = [[0,0,0,0,0,1,0,0,0,0], [0,1,0,0,0,0,0,0,0,0]]
Example:
[[i-1, i, i+1] for sub in data for i, j in enumerate(sub) if j == 1]
Output:
[[4, 5, 6], [0, 1, 2]]
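To guard against the out-of-range caveat above, a clamped variant (a sketch) would be:
[[max(i - 1, 0), i, min(i + 1, len(sub) - 1)] for sub in data for i, j in enumerate(sub) if j == 1]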
I have a problem where I have a large binary numpy array of shape (1000, 2000). The general idea is that the array's columns represent time from 0 to 2000 and each row represents a task. Each 0 in the array represents a failure and each 1 represents a success.
What I need to do is select 150 tasks (rows) out of the 1000 available and maximize the total successes (1s) over unique columns. The successes do not have to be consecutive; I am just looking to maximize success per time period (I need just 1 success per period; any additional ones are extraneous). The 150 rows can be taken from anywhere among the 1000 initial rows. I want the optimal "basket" of 150 tasks that leads to the most success across time (columns).
A basic example of what the array looks like:
array([[0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
[0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0],
[1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
[1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0]])
I have successfully created a Monte Carlo simulation in NumPy using randomly generated baskets of tasks, then going through the array and summing. As you can imagine, this takes a while, and given the large number of potential combinations it is inefficient. Can someone point me to an algorithm or a way to set this problem up in a solver like PuLP?
Try this:
n = 150
row_sums = np.sum(x, axis=1)                # total successes per task (x is the binary array)
top_n_row_sums = np.argsort(row_sums)[-n:]  # indices of the n rows with the highest sums
max_successes = x[top_n_row_sums]
This takes the sum of every row, grabs the indices of the highest n sums, and indexes into x with those row indices.
Note that the rows will end up sorted in ascending order of their sums across the columns. If you want the rows in normal order (ascending order by index), use this instead:
max_successes = x[sorted(top_n_row_sums)]
Why not just calculate the sum of successes for each row? Then you can easily pick the top 150 values.
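Since the question asks about setting this up in PuLP: the row-sum approach maximizes total successes, whereas the unique-column objective is a maximum-coverage problem, which can be written as an integer program. Here is a minimal sketch, using a small random stand-in array and illustrative variable names (pick, covered):
import numpy as np
import pulp

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(20, 12))  # small stand-in for the (1000, 2000) array
k = 5                                  # stand-in for the basket size of 150

prob = pulp.LpProblem("max_coverage", pulp.LpMaximize)
pick = pulp.LpVariable.dicts("pick", range(a.shape[0]), cat="Binary")
covered = pulp.LpVariable.dicts("covered", range(a.shape[1]), cat="Binary")

# Objective: the number of time periods with at least one success
prob += pulp.lpSum(covered[t] for t in range(a.shape[1]))
# Select exactly k tasks
prob += pulp.lpSum(pick[r] for r in range(a.shape[0])) == k
# A period counts as covered only if some selected task succeeds in it
for t in range(a.shape[1]):
    prob += covered[t] <= pulp.lpSum(pick[r] for r in range(a.shape[0]) if a[r, t] == 1)

prob.solve()
basket = [r for r in range(a.shape[0]) if pick[r].value() == 1]
print(basket)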
Given a 3 dimensional boolean data:
np.random.seed(13)
bool_data = np.random.randint(2, size=(2,3,6))
>> bool_data
array([[[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]],
[[1, 0, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]]])
I wish to count the number of consecutive 1's bounded by two 0's in each row (along the last axis) and return a single array with the tally. For bool_data, this would give array([1, 1, 2, 4]).
Due to the 3D structure of bool_data and the variable tallies for each row, I had to clumsily convert the tallies into nested lists, flatten them using itertools.chain, then convert the list back into an array:
# count consecutive 1's bounded by two 0's
def count_consect_ones(input):
    return np.diff(np.where(input == 0)[0]) - 1

# run tallies across all rows in bool_data
consect_ones = []
for i in range(len(bool_data)):
    for j in range(len(bool_data[i])):
        res = count_consect_ones(bool_data[i, j])
        consect_ones.append(list(res[res != 0]))
>> consect_ones
[[], [1, 1], [], [2], [4], []]
# combines nested lists
from itertools import chain
consect_ones_output = np.array(list(chain.from_iterable(consect_ones)))
>> consect_ones_output
array([1, 1, 2, 4])
Is there a more efficient or clever way of doing this?
consect_ones.append(list(res[res!=0]))
If you use .extend instead, the contents of the sequence are appended directly, which saves the step of combining the nested lists afterwards:
consect_ones.extend(res[res!=0])
Furthermore, you could skip the indexing, and iterate over the dimensions directly:
consect_ones = []
for i in bool_data:
    for j in i:
        res = count_consect_ones(j)
        consect_ones.extend(res[res != 0])
We could use a trick: pad the columns with zeros, look for ramp-up and ramp-down indices on a flattened version, and finally filter out the indices corresponding to the borders, giving a vectorized solution, like so -
# Input 3D array : a
b = np.pad(a, ((0,0),(0,0),(1,1)), 'constant', constant_values=(0,0))
# Get ramp-up and ramp-down indices/ start-end indices of 1s islands
s0 = np.flatnonzero(b[...,1:]>b[...,:-1])
s1 = np.flatnonzero(b[...,1:]<b[...,:-1])
# Filter only valid ones that are not at borders
n = b.shape[2]
valid_mask = (s0%(n-1)!=0) & (s1%(n-1)!=a.shape[2])
out = (s1-s0)[valid_mask]
Explanation -
The idea with padding zeros at either end of each row as "sentinels" is that when we compare one-off sliced versions of the padded array, we can detect the ramp-up and ramp-down places with b[...,1:]>b[...,:-1] and b[...,1:]<b[...,:-1] respectively. Thus, we get s0 and s1 as the start and end indices for each of the islands of 1s. Now, we don't want the border islands, so we need their column indices traced back to the original un-padded input array, hence the bits s0%(n-1) and s1%(n-1). We need to remove all cases where an island of 1s starts at the left border or ends at the right border. So we use those remainders to check whether s0 falls at column 0 or s1 at column a.shape[2]. These give us the valid islands. The island lengths are obtained with s1-s0, so mask that with valid_mask to get the desired output.
Sample input, output -
In [151]: a
Out[151]:
array([[[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]],
[[1, 0, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]]])
In [152]: out
Out[152]: array([1, 1, 2, 4])
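For reuse, the same recipe can be wrapped as a function (a sketch; bounded_island_lengths is an illustrative name, not from the original answer):
import numpy as np

def bounded_island_lengths(a):
    # pad each row with sentinel zeros so island borders are detectable
    b = np.pad(a, ((0, 0), (0, 0), (1, 1)), 'constant', constant_values=(0, 0))
    s0 = np.flatnonzero(b[..., 1:] > b[..., :-1])  # island starts
    s1 = np.flatnonzero(b[..., 1:] < b[..., :-1])  # island ends
    n = b.shape[2]
    valid = (s0 % (n - 1) != 0) & (s1 % (n - 1) != a.shape[2])
    return (s1 - s0)[valid]

bounded_island_lengths(bool_data)  # array([1, 1, 2, 4])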
Say I have the following 3D array:
L=np.arange(18).reshape((2,3,3))
L[:,:,1] = 0; L[:,[0,1],:] = 0
In []: L
Out[]:
array([[[ 0, 0, 0],
[ 0, 0, 0],
[ 6, 0, 8]],
[[ 0, 0, 0],
[ 0, 0, 0],
[15, 0, 17]]])
where zero columns in L[0,:] are always matched by corresponding zero columns in L[1,:].
I want to now remove the middle columns where the sum along the axis equals 0 (ignoring the rows of zeros). My current clumsy approach is
l = np.nonzero(L.sum(axis=1))[1]
In []: L[:, :, l[:len(l)//2]]
Out[]:
array([[[ 0, 0],
[ 0, 0],
[ 6, 8]],
[[ 0, 0],
[ 0, 0],
[15, 17]]])
What is a less roundabout way of doing this?
We can look for all zeros along the first two axes and use that for masking out those from the third axis -
L[:,:,~(L==0).all(axis=(0,1))]
Alternatively, using any() to replace ~all() -
L[:,:,(L!=0).any(axis=(0,1))]
We can use the ellipsis notation ... to replace :,: and also skip the arg axis to give us a compact version -
L[...,~(L==0).all((0,1))]
L[...,(L!=0).any((0,1))]
More on how ellipsis works for NumPy arrays, here.
For the sum part of the question, it would be similar -
L[...,L.sum((0,1))!=0]
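As a quick check on the example above, all of these forms agree, including the sum-based one (which matches here because the entries are non-negative):
import numpy as np

L = np.arange(18).reshape((2, 3, 3))
L[:, :, 1] = 0
L[:, [0, 1], :] = 0

out1 = L[:, :, ~(L == 0).all(axis=(0, 1))]
out2 = L[..., (L != 0).any((0, 1))]
out3 = L[..., L.sum((0, 1)) != 0]
print(np.array_equal(out1, out2) and np.array_equal(out2, out3))  # True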
I know numpy.where gives a tuple of the array coordinates where the condition applies. But what if I want an array?
Assume the following 2D array:
a = np.array([[1, 1, 1, 1, 0],
              [1, 1, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0]])
Now what I want is only the first occurrence of zero, for every row, even if there is none. Something like indexOf() in Java. So the output would look like:
array([-1,2,2,1,0])
I need to cut pieces out of an ndarray, and it would be much easier to reduce a dimension rather than have a tuple and try to regenerate the missing rows.
Is this what you are looking for?
import numpy as np
a=np.array([[1, 1, 1, 1, 0],
[1, 1, 1, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 1, 1, 1],
[1, 0, 0, 1, 0]])
np.argmax(a==0, axis=0) - ~np.any(a==0, axis=0)
Output:
array([-1, 2, 2, 1, 0], dtype=int64)
The idea here is that np.argmax finds the index of the first matching element in each column (axis=0 for columns, which appears to be what you want in the output, but if you actually want rows, use axis=1). Because np.argmax returns 0 for columns that do not match at all, I subtract 1 from the result for each column that doesn't contain any 0.
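For the row-wise variant mentioned above (the first zero in each row of the example), the same trick with axis=1 gives:
np.argmax(a == 0, axis=1) - ~np.any(a == 0, axis=1)
# array([4, 3, 1, 1, 1])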
Here is a less crafty solution, but one that is arguably easier to understand.
It first finds all matches, then creates an array with the first element of each match, or -1 if there are no matches.
a=np.array([[1,1,1,1,0],
[1,1,1,0,0],
[1,0,0,0,0],
[1,0,1,1,1],
[1,0,0,1,0]])
matches = [np.where(np.array(i)==0)[0] for i in a.T]
np.array([i[0] if len(i) else -1 for i in matches])  # first occurrence, else -1
array([-1, 2, 2, 1, 0])