I have an input matrix that is of unknown n x m dimensions that is populated by 1s and 0s
For example, a 5x4 matrix:
A = array(
[[1, 0, 0, 0],
[1, 0, 0, 0],
[0, 1, 1, 0],
[0, 1, 1, 0],
[1, 0, 1, 1]])
Goal
I need to create a 1 : 1 map between as many columns and rows as possible, where the element at that location is 1.
What I mean by a 1 : 1 map is that each column and row can be used once at most.
the ideal solution has the most mappings possible ie. the most rows and columns used. It should also avoid exhaustive combinations or operations that do not scale well with larger matrices (practically, maximum dimensions should be 100x100, but there is no declared limit so they could go higher)
Here's a possible outcome of the above
array([[ 1., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 0., 1.]])
Some more Examples:
input:
0 1 1
0 1 0
0 1 1
output (one of several possible ones):
0 0 1
0 1 0
0 0 0
another (this shows one problem that can arise)
input:
0 1 1 1
0 1 0 0
1 1 0 0
a good output (again, one of several):
0 0 1 0
0 1 0 0
1 0 0 0
a bad output (still valid, but has fewer mappings)
0 1 0 0
0 0 0 0
1 0 0 0
to better show how their can be multiple outputs
input:
0 1 1
1 1 0
one possible output:
0 1 0
1 0 0
a second possible output:
0 0 1
0 1 0
a third possible output
0 0 1
1 0 0
What have I done?
I have a really dumb way of handling it right now which is not at all guaranteed to work. Basically I just build a filter matrix out of an identity matrix (because its the perfect map, every row and every column are used once) and then I randomly swap its columns (n times) and filter the original matrix with it, recording the filter matrix with the best results.
My [non] solution:
import random
import numpy as np
# this is a starting matrix with random values
A = np.array(
[[1, 0, 0, 0],
[1, 0, 0, 0],
[0, 1, 1, 0],
[0, 1, 1, 0],
[1, 0, 1, 1]])
# add dummy row to make it square
new_col = np.zeros([5,1]) + 1
A = np.append(A, new_col, axis=1)
# make an identity matrix (the perfect map)
imatrix = np.diag([1]*5)
# randomly swap columns on the identity matrix until they match.
n = 1000
# this will hold the map that works the best
best_map_so_far = np.zeros([1,1])
for i in range(n):
a, b = random.sample(range(5), 2)
t = imatrix[:,a].copy()
imatrix[:,a] = imatrix[:,b]
imatrix[:,b] = t
# is this map better than the previous best?
if sum(sum(imatrix * A)) > sum(sum(best_map_so_far)):
best_map_so_far = imatrix
# could it be? a perfect map??
if sum(sum(imatrix * A)) == A.shape[0]:
break
# jk.
# did we still fail
if sum(sum(imatrix * A)) != 5:
print('haha')
# drop the dummy row
output = imatrix * A
output[:,:-1]
#... wow. it actually kind of works.
How about this?
let S be the solution vector, as wide as A, containing row numbers.
let Z be a vector containing the number of zeros in each column.
for each row:
select the cells which contain 1 in A and no value in S.
select from those cells those with the highest score in Z.
select from those cells the first (or a random) cell.
store the row number in the column of S corresponding to the cell.
Does that give you a sufficient solution? If so it should be much more efficient than what you have.
Let me give it a go. The algorithm I suggest will not always give the optimal solution, but maybe somebody can improve it.
You can always interchange two columns or two rows without changing the problem. Further, by keeping track of the changes you can always go back to the original problem.
We are going to fill the main diagonal with 1s as far as it will go. Get the first 1 in the upper left corner by interchanging columns, or rows, or both. Now the first row and column are fixed and we don't touch them anymore. We now try to fill in the second element on the diagonal with 1, and then fix the second row and column. And so on.
If the bottom right submatrix is zero, we should try to bring a 1 there by interchanging two columns or two rows using the whole matrix but preserving the existing 1s in the diagonal. (Here lies the problem. It is easy to check efficiently if one interchange can help. But it could be that at least two interchanges are required, or maybe more.)
We stop when no more 1s can be obtained on the diagonal.
So, while the algorithm is not always optimal, maybe it is possible to come up with extra rules how to interchange columns and rows so as to populate the diagonal with 1s as far as possible.
Related
Imagine I am given a n by n matrix of just 1s and 0s. My goal is to run a 2x2 scan and count the frequency of 1 bits in the 2x2 window and store the frequency in a resulting array. Take it that the given parameters are only the row length, col length and an array of [r,c] coordinates for the 1 bits in the matrix.
I am thinking of how I can optimize this. I had an idea to store the number of 1 bits per column in an array which will help in the counting as I shift the window. But is there a better way? Thanks!
Test case:
[[ 1, 0, 0, 1], [ 0, 1, 0, 1], [ 1, 0, 0, 1]]
should result in [2, 4, 0, 0], where there are 2 sub matrices with 1 1bits in a window and 4 sub matrices with 2 1 bits in a window.
As bit manipulation is much faster than array manipulation you can consider:
Make your array a one dimensional byte array.
1,0,0,1 = 9
0,1,0,1 = 5
1,0,0,1 = 9
AND 1st row with 3 (2 bits) = 1
AND 2nd row with 3 (2 bits) = 1
0101
0011 AND
---- =
0001
shift 2 to left
0001 -> 0100 = 4
add them together = 5, 2 bits are set in your 2x2 scan
loop this, top to bottom (2nd & 3rd row, etx) get all rows.
Next shift all rows 1 to right and repeat actions above
I have a data frame, df, representing a correlation matrix, with this heatmap with example extrema. Every point has, obviously, (x,y,value):
I am looking into getting the local extrema. I looked into argrelextrema, I tried it on individual rows and the results were as expected, but that didn't work for 2D. I have also looked into scipy.signal.find_peaks, but this is for a 1D array.
Is there anything in Python that will return the local extrema over/under certain values(threshold)?
Something like an array of (x, y, value)? If not then can you point me in the right direction?
This is a tricky question, because you need to carefully define the notion of how "big" a maximum or minimum needs to be before it is relevant. For example, imagine that you have a patch containing the following 5x5 grid of pixels:
im = np.array([[ 0 0 0 0 0
0 5 5 5 0
0 5 4 5 0
0 5 5 5 0
0 0 0 0 0. ]])
This might be looked at as a local minimum, because 4 is less than the surrounding 5s. OTOH, it might be looked at as a local maximum, where the single lone 4 pixel is just "noise", and the 3x3 patch of average 4.89-intensity pixels is actually a single local maximum. This is commonly known as the "scale" at which you are viewing the image.
In any case, you can estimate the local derivative in one direction by using a finite difference in that direction. The x direction might be something like:
k = np.array([[ -1 0 1
-1 0 1
-1 0 1. ]])
Applying this filter to the image patch defined above gives:
>>> cv2.filter2D(im, cv2.CV_64F, k)[1:-1,1:-1]
array([[ 9., 0., -9.],
[ 14., 0., -14.],
[ 9., 0., -9.]])
Applying a similar filter in the y direction will transpose this. The only point in here with a 0 in both the x and the y directions is the very middle, which is the 4 that we decided was a local minimum. This is tantamount to checking that the gradient is 0 in both x and y.
This whole process can be extended to find the larger single local maximum that we have identified. You'll use a larger filter, e.g.
k = np.array([[ -2, -1, 0, 1, 2],
[ -2, -1, 0, 1, 2], ...
Since the 4 makes the local maximum an approximate thing, you'll need to use some "approximate" logic. i.e. you'll look for values that are "close" to 0. Just how close depends on just how fudgy you are willing to allow the local extrema to be. To sum up, the two fudge factors here are 1. filter size and 2. ~=0 fudge factor.
First of all thank you for any support. This is my first question published as usually my doubts are solved reading through other user's questions.
Here is my question: I have a number (n) of sets with common elements. These elements are usually added sequentially creating new sets although I do not have the sequence and this is what I am trying to find. The sequence is not always perfect and at some points I have to find the closest one with some uncertainty when the sequence is not 'perfect'.
I coded it using theory of Sets searching sequentially the set that contains all the other sets and when I do not reach the last set then I start from the smallest to the bigger.
I gave some thoughts to the topic and I found, in theory, a more robust and generic approach. The idea is to build a square matrix with the n sets as row index (i) and the n sets as column index (j). The element i,j will be equal to 1 when set j is contained in i.
Here I have an example with sets A to G:
A={a, b, c, d1, d2, e, f};
B={b, c, d1, d2, e, f};
C={c, d1, d2, e, f};
D={d1, f, g};
E={d2, f, g};
F={f, g};
G={g};
If I create the matrix assuming sequence B, E, C, F, D, A, G, I would have:
B E C F D A G
B 1 1 1 1 1 0 1
E 0 1 0 1 0 0 1
C 0 1 1 1 1 0 1
F 0 0 0 1 0 0 1
D 0 0 0 1 1 0 1
A 1 1 1 1 1 1 1
G 0 0 0 0 0 0 1
I should get this matrix transformed into following matrix:
A B C D E F G
A 1 1 1 1 1 1 1
B 0 1 1 1 1 1 1
C 0 0 1 1 1 1 1
D 0 0 0 1 0 1 1
E 0 0 0 0 1 1 1
F 0 0 0 0 0 1 1
G 0 0 0 0 0 0 1
Which shows one of the two possible sequence: A, B, C, D, E, F, G
Here I add a picture as I am not sure matrix are shown clearly.
My first question is how you recommend to handle this matrix (which kind of data type should I use with typical functions to swap rows and columns).
And my second question is if there is already a matrix transformation function for this topic.
From my (small) experience, most used types for matrices are lists and numpy.ndarrays.
For columns swaps in particular, I would recommend numpy. There are many array creation routines in numpy. You either give the list with data explicitly or you create an array based on the shape you want. Example
>>> import numpy as np
>>> np.array([1, 2, 3])
array([1, 2, 3])
>>> np.array([[1, 2, 3], [1, 2, 3]])
array([[1, 2, 3],
[1, 2, 3]])
>>> np.zeros((2, 2))
array([[0., 0.],
[0., 0.]])
np.zeros accepts a shape as an argument (number of rows and columns for matrices). Of course, you can create arrays with how many dimensions you want.
numpy is quite complex regarding indexing its arrays. For a matrix you have:
>>> a = np.arange(6).reshape(2, 3)
>>> a
array([[0, 1, 2],
[3, 4, 5]])
>>> a[0] # row indexing
array([0, 1, 2])
>>> a[1, 1] # element indexing
4
>>> a[:, 2] # column indexing
array([2, 5])
Hopefully the examples are self-explanatory. Regarding the column index, : means "over all the values". So you specify a column index and the fact that you want all the values on that column.
For swapping rows and columns it's pretty short:
>>> a = np.arange(6).reshape(2, 3)
>>> a
array([[0, 1, 2],
[3, 4, 5]])
>>> a[[0, 1]] = a[[1, 0]] # row swapping
>>> a
array([[3, 4, 5],
[0, 1, 2]])
>>> a[:, [0, 2]] = a[:, [2, 0]] # column swapping
>>> a
array([[5, 4, 3],
[2, 1, 0]])
Here advance indexing is used. Each dimension (called axis by numpy) can accept a list of indices. So you can get 2 or more rows/columns at the same time from a matrix.
You don't have to ask for them in a certain order. numpy gives you the values in the order you ask for them.
Swapping rows is done by asking numpy for the two rows in reversed order and saving them in their original positions. It actually respects the pythonic way of swapping values between 2 variables (although surrounded by a complex frame):
a, b = b, a
Regarding matrix transformation, it depends on what you are looking for.
Using the swapping ideas from I made my own functions to find all the swapping to do to get the triangular matrix.
Here I write the code:
`def simple_sort_matrix(matrix):
orden=np.array([i for i in range(len(matrix[0]))])
change=True
while change:
rows_index=row_index_ones(matrix)
change=False
#for i in range(len(rows_index)-1):
i=0
while i
def swap_row_and_column(matrix,i,j):
matrix[[i, j]] = matrix[[j, i]] # row swapping
matrix[:, [i, j]] = matrix[:, [j, i]] # column swapping
return matrix
def row_index_ones(matrix):
return(np.sum(matrix,axis=1))`
Best regards,
Pablo
I'm trying to generate a column that would have zeros everywhere except when a specific condition is met.
Right now, I have an existing series of 0s and 1s saved as a Series object. Let's call this Series A. I've created another series of the same size filled with zeros, let's call this Series B. What I'd like to do is, whenever I hit the last 1 in a sequence of 1s in Series A, then the next six rows of Series B should replace the 0s with 1s.
For example:
Series A
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
1
1
1
0
0
0
0...
Should produce Series B
0
0
0
0
0
0
1
1
1
1
1
1
0
0
0
0
0
0
0
1
1
1
1...
Here's what I've tried so far:
for row in SeriesA:
if row == 1:
continue
if SeriesA[row] == 1 and SeriesA[row + 1] == 0:
SeriesB[row]=1
SeriesB[row+1]=1
SeriesB[row+2]=1
SeriesB[row+3]=1
SeriesB[row+4]=1
SeriesB[row+5]=1
However, this just generates Series B full of zeros except for the first five rows with become 1s. (Series A is all zeros until at least row 50)
I think I'm not understanding how iterating works with Pandas, so any help is appreciated!
EDIT: Full(ish) code
import os
import numpy as np
import pandas as pd
df = pd.read_csv("Python_Datafile.csv", names = fields) #fields is a list with names for each column, the first column is called "Date".
df["Date"] = pd.to_datetime(df["Date"], format = "%m/%Y")
df.set_index("Date", inplace = True)
Recession = df["NBER"] # This is series A
Rin6 = Recession*0 # This is series B
gps = Recession.ne(Recession.shift(1)).where(Recession.astype(bool)).cumsum()
idx = Recession[::-1].groupby(gps).idxmax()
to_one = np.hstack(pd.date_range(start=x+pd.offsets.DateOffset(months=1), freq='M', periods=6) for x in idx)
Rin6[Rin6.index.isin(to_one)]= 1
Rin6.unique() # Returns -> array([0], dtype=int64)
You can create an ID for consecutive groups of 1s using .shift + .cumsum:
gps = s.ne(s.shift(1)).where(s.astype(bool)).cumsum()
Then you can get the last index for each group by:
idx = s[::-1].groupby(gps).idxmax()
#0
#1.0 5
#2.0 18
#Name: 0, dtype: int64
Frorm the list of all indices with np.hstack
import numpy as np
np.hstack(np.arange(x+1, x+7, 1) for x in idx)
#array([ 6, 7, 8, 9, 10, 11, 19, 20, 21, 22, 23, 24])
And set those indices to 1 in the second Series:
s2[np.hstack(np.arange(x+1, x+7, 1) for x in idx)] = 1
s2.ravel()
# array([0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.,..
Update from your comment: Assuming you have a Series s whose indices are datetimes, and another Series s2 which has the same indices but all values are 0 and they have the MonthStart frequency, you can proceed in a similar fasion:
s = pd.Series([0,0,0,0,0,0,0,0,0,1,1]*5, index=pd.date_range('2010-01-01', freq='MS', periods=55))
s2 = s*0
gps = s.ne(s.shift(1)).where(s.astype(bool)).cumsum()
idx = s[::-1].groupby(gps).idxmax()
#1.0 2010-11-01
#2.0 2011-10-01
#3.0 2012-09-01
#4.0 2013-08-01
#5.0 2014-07-01
#dtype: datetime64[ns]
to_one = np.hstack(pd.date_range(start=x+pd.offsets.DateOffset(months=1), freq='MS', periods=6) for x in idx)
s2[s2.index.isin(to_one)]= 1
# I check .isin in case the indices extend beyond the indices in s2
Let's say I have a square matrix as input:
array([[0, 1, 1, 0],
[1, 1, 1, 1],
[1, 1, 1, 1],
[0, 1, 1, 0]])
I want to count the nonzeros in the array after removal of rows 2 and 3 and cols 2 and 3. Afterwards I want to do the same for rows 3 and 4 and cols 3 and 4. Hence the output should be:
0 # when removing rows/cols 2 and 3
3 # when removing rows/cols 3 and 4
Here is the naive solution using np.delete:
import numpy as np
a = np.array([[0,1,1,0],[1,1,1,1],[1,1,1,1],[0,1,1,0]])
np.count_nonzero(np.delete(np.delete(a, (1,2), axis=0), (1,2), axis=1))
np.count_nonzero(np.delete(np.delete(a, (2,3), axis=0), (2,3), axis=1))
But np.delete returns a new array. Is there a faster method, which involves deleting rows and columns simultaneously? Can masking be used? The documentation on np.delete reads:
Often it is preferable to use a boolean mask.
How do I go about doing that? Thanks.
Instead of deleting the columns and rows you don't want, it is easier to select the ones you do want. Also note that it is standard to start counting rows and columns from zeros. To get your first example, you thus want to select all elements in rows 0 and 3 and in rows 0 and 3. This requires advanced indexing, for which you can use the ix_ utility function:
In [25]: np.count_nonzero(a[np.ix_([0,3], [0,3])])
Out[25]: 0
For your second example, you want to select rows 0 and 1 and columns 0 and 1, which can be done using basic slicing:
In [26]: np.count_nonzero(a[:2,:2])
Out[26]: 3
There is no need to modify your original array by deleting rows/columns, in order to count the number of non zero elements. Simply use indexing,
a = np.array([[0,1,1,0],[1,1,1,1],[1,1,1,1],[0,1,1,0]])
irows, icols = np.indices(a.shape)
mask = (irows!=2)&(irows!=3)&(icols!=2)&(icols!=3)
np.count_nonzero(a[mask])