I thought that ~ should turn 1s into 0s and vice versa. I used it in my code, yet I'm getting -2s and -1s.
def inverse_graph(graph):
    # for each vertex in graph
    revg = list(graph)
    for i, line in enumerate(revg):
        for j, vertex in enumerate(line):
            if i != j:
                # flip value
                graph[i][j] = ~ graph[i][j]
                #if vertex == 0:
                #    graph[i][j] = 1;
                #else:
                #    graph[i][j] = 0;
    return revg
def test():
    g1 = [[0, 1, 1, 0],
          [1, 0, 0, 1],
          [1, 0, 0, 1],
          [0, 1, 1, 0]]
    assert inverse_graph(g1) == [[0, 0, 0, 1],
                                 [0, 0, 1, 0],
                                 [0, 1, 0, 0],
                                 [1, 0, 0, 0]]
    g2 = [[0, 1, 1, 1],
          [1, 0, 1, 1],
          [1, 1, 0, 1],
          [1, 1, 1, 0]]
    assert inverse_graph(g2) == [[0, 0, 0, 0],
                                 [0, 0, 0, 0],
                                 [0, 0, 0, 0],
                                 [0, 0, 0, 0]]
Actually, -2 is ~1. Two's complement, remember?
>>> bin(1)
'0b1'
>>> bin(~1)
'-0b10'
The thing is: you're not working with single bits, but with whole integers. So either revert to using e.g. Booleans (which read less nicely), or use an expression like 0 if x else 1 to flip your elements.
Tip: you can use comprehensions to write this more elegantly:
>>> flipped = lambda graph: [ [0 if x else 1 for x in row] for row in graph]
>>> flipped( [ [1, 0], [0, 1] ] )
[[0, 1], [1, 0]]
As xtofl has pointed out, Python's integers use Two's complement representation. This means that the bitwise inverse of 0 is not 1, but an infinitely long sequence of binary 1s, which is interpreted as -1. The inverse of 1 is not 0, but an infinite number of ones, followed by one zero (which is -2).
Of course, the number of bits actually stored for each integer is not infinite. Python 2 normally uses the C long type your system defines (usually 32 or 64 bits), and operations that would overflow automatically switch to Python's own arbitrary-precision long type; in Python 3 this conversion is handled transparently within the single int type.
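For instance, even though the values print as ordinary numbers, bitwise inversion stays exact at any size:
>>> ~0, ~1
(-1, -2)
>>> ~(1 << 100)  # far wider than any machine word, still exact
-1267650600228229401496703205377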
Anyway, an alternative solution is to use:
graph[i][j] = 1 - graph[i][j]
Or, if you don't mind the values becoming instances of the int subtype bool:
graph[i][j] = not graph[i][j]
Python's bool values are still usable as numbers (False works just like 0 and True is just like 1). The only real difference is that they'll print out with text instead of digits.
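Putting that together, here's a minimal corrected sketch of the original function (it leaves the diagonal alone, as the question's code intends, and builds a new list instead of mutating the input):
def inverse_graph(graph):
    # 1 - x maps 1 -> 0 and 0 -> 1; diagonal entries are kept as-is
    n = len(graph)
    return [[graph[i][j] if i == j else 1 - graph[i][j] for j in range(n)]
            for i in range(n)]
This passes both asserts in the question's test().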
With numpy it is much easier.
>>> import numpy as np
>>> g1 = np.array([[0, 1, 1, 0],
...                [1, 0, 0, 1],
...                [1, 0, 0, 1],
...                [0, 1, 1, 0]])
>>> g2 = 1 - g1
>>> g2
array([[1, 0, 0, 1],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [1, 0, 0, 1]])
~ will work with the Boolean datatype:
>>> g1 = np.array([[0, 1, 1, 0],  # 1 represents True and 0 represents False
...                [1, 0, 0, 1],
...                [1, 0, 0, 1],
...                [0, 1, 1, 0]], dtype=bool)
>>> ~g1
array([[ True, False, False,  True],
       [False,  True,  True, False],
       [False,  True,  True, False],
       [ True, False, False,  True]], dtype=bool)
If you want the complement (~) as 0s and 1s rather than as True/False, this will do the trick:
>>> ~g1 + 0
array([[1, 0, 0, 1],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [1, 0, 0, 1]])
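An equivalent, slightly more explicit spelling of the same conversion is a cast:
>>> (~g1).astype(int)
array([[1, 0, 0, 1],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [1, 0, 0, 1]])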
While the others are right, an easier way of saying 0 if x else 1 is not x or maybe int(not x).
not x returns False if x != 0 and True otherwise. False and True are bools, which is a subclass of int, so they can readily be used in calculations; but if you prefer proper 0s and 1s, or need them for indexing a dict, int(not x) might be better.
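For instance:
>>> not 1, not 0
(False, True)
>>> int(not 1), int(not 0)
(0, 1)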
Is there a way to do linear algebra and matrix manipulation over a finite field in Python? I need to find the null space of a non-square matrix over the finite field F2, and I currently can't find a way to do this. I have tried the galois package, but it does not support the scipy null space function. It is easy to compute the null space in sympy, but I do not know how to work in a finite field in sympy.
I'm the author of the galois library you mentioned. As noted by other comments, this capability is easy to add, so I added it in galois#259. It is now available in v0.0.24 (released today 02/12/2022).
Here is the documentation for computing the null space FieldArray.null_space() that you desire.
Here's an example computing the row space and left null space.
In [1]: import galois
In [2]: GF = galois.GF(2)
In [3]: m, n = 7, 3
In [4]: A = GF.Random((m, n)); A
Out[4]:
GF([[1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0]], order=2)
In [5]: R = A.row_space(); R
Out[5]:
GF([[1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]], order=2)
In [6]: LN = A.left_null_space(); LN
Out[6]:
GF([[1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1, 0]], order=2)
# The left null space annihilates the rows of A
In [7]: LN @ A
Out[7]:
GF([[0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0]], order=2)
# The dimension of the row space and left null space sum to m
In [8]: R.shape[0] + LN.shape[0] == m
Out[8]: True
Here's the column space and null space.
In [9]: C = A.column_space(); C
Out[9]:
GF([[1, 0, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 0]], order=2)
In [10]: N = A.null_space(); N
Out[10]: GF([], shape=(0, 3), order=2)
# If N has dimension > 0, then A @ N.T == 0
In [11]: C.shape[0] + N.shape[0] == n
Out[11]: True
That's how I would approach it as well.
Null space computation for floating-point numbers is usually implemented using SVD or some other numerically robust algorithm; for your GF(2) field you can simply use Gaussian elimination, since there is no rounding to worry about.
Here is an example:
import numpy as np
import galois

# Initialize GF(2) and a random matrix to serve as an example
M, N = 7, 4
GF2 = galois.GF(2)
A = GF2.Random((M, N))

# B is an augmented matrix [A | I]
B = GF2.Zeros((M, M + N))
B[:, :N] = A
for i in range(M):
    B[i, N + i] = 1

# Run Gaussian elimination; k tracks the next pivot row
k = 0
for j in range(N):
    # find a row with a nonzero entry in column j and swap it up to row k
    for i in range(k, M):
        if B[i, j] != 0:
            if i != k:
                B[[i, k], :] = B[[k, i], :]
            break
    if B[k, j] == 0:
        continue
    # eliminate column j from all rows below the pivot
    for i in range(k + 1, M):
        if B[i, j]:
            B[i, j:] += B[k, j:]
    k += 1

C = B[k:, N:]
# C should be the left null space of A
C @ A  # should be zero
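As a quick sanity check, this can be compared against the built-in method shown in the previous answer (a sketch, assuming galois >= v0.0.24):
LN = A.left_null_space()
assert not (C @ A).any()          # every row of C annihilates A
assert C.shape[0] == LN.shape[0]  # same nullity; the bases themselves may differ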
Consider the following toy array a:
a = np.array([[1074279,  937077, 1445858, 1679465],
              [1074280, 1023600, 1679465,  937077],
              [1074281,  908450, 1932761, 1100360],
              [1074282, 1445858,  893656,  908183],
              [1074283, 1958030, 1932761, 1445858]])
The first column is an identifier.
How can I transform the array in a way that shows when an identifier is related
to another? A relation exists if two identifiers have in common at least one
value in columns 2-4 of a.
The end result should be the array b below:
b = np.array([[1, 1, 0, 1, 1],
              [1, 1, 0, 0, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 1, 1]])
This can perhaps better be understood as follows:
             1074279  1074280  1074281  1074282  1074283
1074279            1        1        0        1        1
1074280            1        1        0        0        0
1074281            0        0        1        0        1
1074282            1        0        0        1        1
1074283            1        0        1        1        1
I have tried (double) looping over the elements to find all the combinations and then reducing that to the desired array, but I cannot get it right.
Outer-equality does the job for a vectorized solution -
In [90]: np.equal.outer(a[:,1:],a[:,1:]).any(axis=(1,3)).view('i1')
Out[90]:
array([[1, 1, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [0, 0, 1, 0, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 1, 1, 1]], dtype=int8)
Explanation
Basically, we are performing pairwise equality comparison across all rows, and within each row pairwise equality comparison, with np.equal.outer(..). The equality comparison is a 4D array: for the slice a[:,1:] being (m,n) shaped, it gives us an equality-comparison array of shape (m,n,m,n). We then ANY-reduce it along axes 1 and 3 to get a 2D boolean array of shape (m,m), and that's our final output after conversion to an int array.
An alternative with explicit dimension-expansion would be -
In [92]: (a[:,1:,None,None]==a[:,1:]).any(axis=(1,3)).view('i1')
Out[92]:
array([[1, 1, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [0, 0, 1, 0, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 1, 1, 1]], dtype=int8)
So, the only change is that we are adding new axes for the first version of the slice with None/np.newaxis to create a 4D version. This is then compared against the original 2D version to result in the 4D equality compared boolean array.
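To make the shapes concrete (a small check, not part of the original answer):
>>> a[:, 1:].shape
(5, 3)
>>> (a[:, 1:, None, None] == a[:, 1:]).shape
(5, 3, 5, 3)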
A simpler classic solution that is easily understandable:
def has_in_common(a1, a2):
    """
    @param a1, a2: two input arrays
    @returns True if a1 and a2 have at least one value in common, otherwise False
    """
    for v1 in a1[1:]:
        for v2 in a2[1:]:
            if v1 == v2:
                return True
    return False

def relation_matrix(a):
    """
    @param a: an input array
    @returns m, a matrix specifying the relationship between the rows of a
    ex: a = [[1074279,  937077, 1445858, 1679465],
             [1074280, 1023600, 1679465,  937077],
             [1074281,  908450, 1932761, 1100360],
             [1074282, 1445858,  893656,  908183],
             [1074283, 1958030, 1932761, 1445858]]
        m = [[1, 1, 0, 1, 1],
             [1, 1, 0, 0, 0],
             [0, 0, 1, 0, 1],
             [1, 0, 0, 1, 1],
             [1, 0, 1, 1, 1]]
    more precisely
        m =          1074279  1074280  1074281  1074282  1074283
            1074279        1        1        0        1        1
            1074280        1        1        0        0        0
            1074281        0        0        1        0        1
            1074282        1        0        0        1        1
            1074283        1        0        1        1        1
    """
    m = np.zeros((a.shape[0], a.shape[0]))
    for i in range(len(a)):
        for j in range(len(a)):
            if has_in_common(a[i], a[j]):
                m[i, j] = 1
    return m.astype('int')
Demo:
In [1]: relation_matrix(a)
Out[1]:
array([[1, 1, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [0, 0, 1, 0, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 1, 1, 1]])
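As an aside (my variation, not part of the original answer), the inner pairwise check can also be written with set intersection, which avoids the nested loops:
def has_in_common(a1, a2):
    # nonempty intersection <=> at least one shared value in columns 2-4
    return bool(set(a1[1:]) & set(a2[1:]))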
In the code that I am writing, I have three 2D numpy arrays with the same dimensions (m x n). Each 2D array contains info about a specific trait, and each corresponding cell (the same row/col position) across all three 2D arrays corresponds to a specific person. The three 2D arrays are trait1, trait2, and trait3. For example, person (0, 0) has traits 1 and 2, but not 3, if trait1 and trait2 have a value of 1 at location (0, 0) but trait3 does not.
What would be an efficient method of updating a 2D array at a specific location based on the values of other corresponding 2D arrays of the same dimension at the same location? That is, how can I efficiently update a 2D array at a specific location such that the other 2D arrays at this same location fulfill specific conditions?
I am currently trying to update the values of the 2D arrays trait1 and trait2 according to their current values (where the corresponding trait1 value == 1 and the corresponding trait2 value == 0); I am also trying to update the values of trait3 according to the current values of trait1 and trait2 (under the same conditions as before). However, I am having trouble doing this without nested for loops, which greatly slow down my program.
Below is my current approach, which works, but is much too slow for my purposes:
for i in range(0, m):
    for j in range(0, n):
        if trait1[i][j] == 1:
            if trait2[i][j] == 0:
                trait1[i][j] = 0
                trait2[i][j] = 1
                new_color(i, j, 1)  # updates the color of the specific person on a grid
                trait3[i][j] = 0
        elif trait1[i][j] == 0:
            if trait2[i][j] <= 0:
                trait1[i][j] = 1
                trait2[i][j] = 0
                new_color(i, j, 0)
NumPy arrays are indeed really slow if you loop over them. If you can use matrix operations / NumPy functions for everything, it will go much faster.
In your case, you can first extract the indices you're interested in, and then update your matrices like this:
import numpy as np
np.random.seed(1)
# Generate some sample data
trait1, trait2, trait3 = ( np.random.randint(0,2, [4,4]) for _ in range(3) )
In [4]: trait1
Out[4]:
array([[1, 1, 0, 0],
       [1, 1, 1, 1],
       [1, 0, 0, 1],
       [0, 1, 1, 0]])
In [5]: trait2
Out[5]:
array([[0, 1, 0, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 0],
       [1, 0, 0, 0]])
In [6]: trait3
Out[6]:
array([[1, 1, 1, 1],
       [1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 1, 0, 1]])
And then:
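# Both index sets are computed up front, on the original values; the two
# conditions are mutually exclusive, so this matches the loop's if/elif logic.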
cond1_idx = np.where((trait1 == 1) & (trait2==0))
cond2_idx = np.where((trait1 == 0) & (trait2<=0))
trait1[cond1_idx] = 0
trait2[cond1_idx] = 1
trait3[cond1_idx] = 0
[ new_color(i, j, 1) for i,j in zip(*cond1_idx) ]
trait1[cond2_idx] = 1
trait2[cond2_idx] = 0
[ new_color(i, j, 0) for i,j in zip(*cond2_idx) ]
Result:
In [2]: trait1
Out[2]:
array([[0, 1, 1, 1],
       [0, 1, 0, 0],
       [1, 1, 1, 0],
       [0, 0, 0, 1]])
In [3]: trait2
Out[3]:
array([[1, 1, 0, 0],
       [1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 0]])
In [4]: trait3
Out[4]:
array([[0, 1, 1, 1],
       [0, 0, 0, 0],
       [1, 1, 1, 0],
       [1, 0, 0, 1]])
I cannot really test the new_color calls, though, since I don't have that function.
I'm working on a Monte Carlo radiative transfer code, which simulates firing photons through a medium and statistically modelling their random walk. It runs slowly firing one photon at a time, so I'd like to vectorize it and run perhaps 1000 photons at once.
I have divided my slab through which the photons are passing into nlayers slices between optical depth 0 and depth. Effectively, that means that I have nlayers + 2 regions (nlayers plus the region above the slab and the region below the slab). At each step, I have to keep track of which layers each photon passes through.
Let's suppose that I already know that two photons start in layer 0. One takes a step and ends up in layer 2, and the other takes a step and ends up in layer 6. This is represented by an array pastpresent that looks like this:
[[ 0 2]
[ 0 6]]
I want to generate an array traveled_through with (nlayers + 2) columns and 2 rows, describing whether photon i passed through layer j (endpoint-inclusive). It would look something like this (with nlayers = 10):
[[ 1 1 1 0 0 0 0 0 0 0 0 0]
[ 1 1 1 1 1 1 1 0 0 0 0 0]]
I could do this by iterating over the photons and generating each row of traveled_through individually, but that's rather slow, and sort of defeats the point of running many photons at once, so I'd rather not do that.
I tried to define the array as follows:
traveled_through = np.zeros((2, nlayers)).astype(int)
traveled_through[ : , np.min(pastpresent, axis = 1) : np.max(pastpresent, axis = 1) + 1 ] = 1
The idea was that in a given photon's row, the indices from the starting layer through and including the ending layer would be set to 1, with all others remaining 0. However, I get the following error:
traveled_through[ : , np.min(pastpresent, axis = 1) : np.max(pastpresent, axis = 1) + 1 ] = 1
IndexError: invalid slice
My best guess is that numpy does not allow different rows of an array to be indexed differently using this method. Does anyone have suggestions for how to generate traveled_through for an arbitrary number of photons and an arbitrary number of layers?
If the two photons always start at 0, you could perhaps construct your array as follows.
First setting the variables...
>>> pastpresent = np.array([[0, 2], [0, 6]])
>>> nlayers = 10
...and then constructing the array:
>>> (pastpresent[:,1][:,np.newaxis] + 1 > np.arange(nlayers+2)).astype(int)
array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])
Or if the photons have an arbitrary starting layer:
>>> pastpresent2 = np.array([[1, 7], [3, 9]])
>>> ((pastpresent2[:,0][:,np.newaxis] < np.arange(nlayers+2)) &
...  (pastpresent2[:,1][:,np.newaxis] + 1 > np.arange(nlayers+2))).astype(int)
array([[0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]])
A little trick I kind of like for this kind of thing involves the accumulate method of the logical_xor ufunc:
>>> a = np.zeros(10, dtype=int)
>>> b = [3, 7]
>>> a[b] = 1
>>> a
array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
>>> np.logical_xor.accumulate(a, out=a)
array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
Note that this sets to 1 the entries between the positions in b, first index inclusive, last index exclusive, so you have to handle off by 1 errors depending on what exactly you are after.
With several rows, you could make it work as:
>>> a = np.zeros((3, 10), dtype=int)
>>> b = np.array([[1, 7], [0, 4], [3, 8]])
>>> b[:, 1] += 1 # handle the off by 1 error
>>> a[np.arange(len(b))[:, None], b] = 1
>>> a
array([[0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
       [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]])
>>> np.logical_xor.accumulate(a, axis=1, out=a)
array([[0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]])
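Applied to the photon example from the question (a sketch reusing pastpresent and nlayers as defined there):
>>> pastpresent = np.array([[0, 2], [0, 6]])
>>> nlayers = 10
>>> a = np.zeros((2, nlayers + 2), dtype=int)
>>> ends = pastpresent.copy()
>>> ends[:, 1] += 1  # make the final layer inclusive
>>> a[np.arange(len(ends))[:, None], ends] = 1
>>> np.logical_xor.accumulate(a, axis=1, out=a)
array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])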
I have a m x n matrix where each row is a sample and each column is a class. Each row contains the soft-max probabilities of each class. I want to replace the maximum value in each row with 1 and others with 0. How can I do it efficiently in Python?
Some made up data:
>>> a = np.random.rand(5, 5)
>>> a
array([[ 0.06922196,  0.66444783,  0.2582146 ,  0.03886282,  0.75403153],
       [ 0.74530361,  0.36357237,  0.3689877 ,  0.71927017,  0.55944165],
       [ 0.84674582,  0.2834574 ,  0.11472191,  0.29572721,  0.03846353],
       [ 0.10322931,  0.90932896,  0.03913152,  0.50660894,  0.45083403],
       [ 0.55196367,  0.92418942,  0.38171512,  0.01016748,  0.04845774]])
In one line:
>>> (a == a.max(axis=1)[:, None]).astype(int)
array([[0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0]])
A more efficient (and verbose) approach, which also places only a single 1 per row even if a row's maximum value is tied:
>>> b = np.zeros_like(a, dtype=int)
>>> b[np.arange(a.shape[0]), np.argmax(a, axis=1)] = 1
>>> b
array([[0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0]])
I think the best answer to your particular question is to use a matrix-type object.
A sparse matrix should be the most performant way of storing large numbers of these matrices of large sizes in a memory-friendly way, given that most of each matrix consists of zeroes. This should be superior to using numpy arrays directly, especially for matrices that are very large in both dimensions: if not in terms of computation speed, then in terms of memory.
import numpy as np
import scipy.sparse  # `import scipy` alone may not expose the sparse submodule

matrix = np.matrix(np.random.randn(10, 5))
maxes = matrix.argmax(axis=1).A1
# was .A[:,0], slightly faster, but .A1 seems more readable
n_rows = len(matrix)  # could do matrix.shape[0], but that's slower
data = np.ones(n_rows)
row = np.arange(n_rows)
sparse_matrix = scipy.sparse.coo_matrix((data, (row, maxes)),
                                        shape=matrix.shape,
                                        dtype=np.int8)
This sparse_matrix object should be very lightweight relative to a regular matrix object, which would needlessly track each and every zero in it. To materialize it as a normal matrix:
sparse_matrix.todense()
returns:
matrix([[0, 0, 0, 0, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0]], dtype=int8)
Which we can compare to matrix:
matrix([[ 1.41049496,  0.24737968, -0.70849012,  0.24794031,  1.9231408 ],
        [-0.08323096, -0.32134873,  2.14154425, -1.30430663,  0.64934781],
        [ 0.56249379,  0.07851507,  0.63024234, -0.38683508, -1.75887624],
        [-0.41063182,  0.15657594,  0.11175805,  0.37646245,  1.58261556],
        [ 1.10421356, -0.26151637,  0.64442885, -1.23544526, -0.91119517],
        [ 0.51384883,  1.5901419 ,  1.92496778, -1.23541699,  1.00231508],
        [-2.42759787, -0.23592018, -0.33534536,  0.17577329, -1.14793293],
        [-0.06051458,  1.24004714,  1.23588228, -0.11727146, -0.02627196],
        [ 1.66071534, -0.07734444,  1.40305686, -1.02098911, -1.10752638],
        [ 0.12466003, -1.60874191,  1.81127175,  2.26257234, -1.26008476]])
This approach using basic numpy and list comprehensions works, but is the least performant. I'm leaving this answer here as it may be somewhat instructive. First we create a numpy matrix:
matrix = np.matrix(np.random.randn(2,2))
matrix is, e.g.:
matrix([[-0.84558168,  0.08836042],
        [-0.01963479,  0.35331933]])
Now map 1 to a new matrix if the element is max, else 0:
newmatrix = np.matrix([[1 if i == row.max() else 0 for i in row]
                       for row in np.array(matrix)])
newmatrix is now:
matrix([[0, 1],
        [0, 1]])
Y = np.random.rand(10, 10)
X = np.zeros((5, 5))
y_insert = 2
x_insert = 3
offset = (1, 2)

for index_x, row in enumerate(X):
    for index_y, e in enumerate(row):
        Y[index_x + offset[0]][index_y + offset[1]] = e
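The same insertion can be done without the Python-level loops by assigning into a slice (a sketch assuming the same offset convention as above):
Y[offset[0]:offset[0] + X.shape[0], offset[1]:offset[1] + X.shape[1]] = X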