Generate numpy array using multiple columns of pandas dataframe - python

Sorry for the long post.
I'm using Python 3.6 on Windows 10. I have a pandas data frame that contains around 100,000 rows. From this data frame I need to generate four NumPy arrays. The first 5 relevant rows of my data frame look like this:
A B x UB1 LB1 UB2 LB2
0.2134 0.7866 0.2237 0.1567 0.0133 1.0499 0.127
0.24735 0.75265 0.0881 0.5905 0.422 1.4715 0.5185
0.0125 0.9875 0.1501 1.3721 0.5007 2.0866 2.0617
0.8365 0.1635 0.0948 1.9463 1.0854 2.4655 1.9644
0.1234 0.8766 0.0415 2.7903 2.2602 3.5192 3.2828
Column B is (1 - column A). Column B is not actually in my data frame; I have added it only to explain my problem.
From this data frame, I need to generate four arrays.
My first array, c, looks like array([-0.2134, -0.7866,-0.24735, -0.75265,-0.0125, -0.9875,-0.8365, -0.1635,-0.1234, -0.8766],dtype=float32)
Here the first element is the first row of column A with a negative sign added, the second element is the negated first row of column B, the third element comes from the second row of column A, the fourth from the second row of column B, and so on.
My second array UB looks like
array([ 0.2237, 0.0881, 0.1501, 0.0948, 0.0415, 0.2237],dtype=float32)
where the elements are the rows of column x.
My third array,bounds, looks like
array([[0.0133 , 0.1567],
[0.127 , 1.0499],
[0.422 , 0.5905],
[0.5185 , 1.4715],
[0.5007 , 1.3721],
[2.0617 , 2.0866],
[1.0854 , 1.9463],
[1.9644 , 2.4655],
[2.2602 , 2.7903],
[3.2828 , 3.5192]])
Here bounds[0][0] is the first row of LB1 and bounds[0][1] is the first row of UB1; bounds[1][0] is the first row of LB2 and bounds[1][1] is the first row of UB2. Then bounds[2][0] is the second row of LB1, and so on.
My fourth array looks like
array([[-1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, -1, 1, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, -1, 1, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, -1, 1, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, -1, 1]])
It has the same number of rows as the data frame and twice as many columns as the data frame has rows.
Can you please tell me an efficient way to generate these arrays for 100,000 rows?

This should be rather straightforward:
from io import StringIO
import pandas as pd
import numpy as np
data = """A B x UB1 LB1 UB2 LB2
0.2134 0.7866 0.2237 0.1567 0.0133 1.0499 0.127
0.24735 0.75265 0.0881 0.5905 0.422 1.4715 0.5185
0.0125 0.9875 0.1501 1.3721 0.5007 2.0866 2.0617
0.8365 0.1635 0.0948 1.9463 1.0854 2.4655 1.9644
0.1234 0.8766 0.0415 2.7903 2.2602 3.5192 3.2828"""
df = pd.read_csv(StringIO(data), sep='\\s+', header=0)
c = -np.stack([df['A'], 1 - df['A']], axis=1).ravel()
print(c)
# [-0.2134 -0.7866 -0.24735 -0.75265 -0.0125 -0.9875 -0.8365 -0.1635
# -0.1234 -0.8766 ]
ub = df['x'].values
print(ub)
# [0.2237 0.0881 0.1501 0.0948 0.0415]
bounds = np.stack([df['LB1'], df['UB1'], df['LB2'], df['UB2']], axis=1).reshape((-1, 2))
print(bounds)
# [[0.0133 0.1567]
# [0.127 1.0499]
# [0.422 0.5905]
# [0.5185 1.4715]
# [0.5007 1.3721]
# [2.0617 2.0866]
# [1.0854 1.9463]
# [1.9644 2.4655]
# [2.2602 2.7903]
# [3.2828 3.5192]]
n = len(df)
fourth = np.zeros((n, 2 * n))
idx = np.arange(n)
fourth[idx, 2 * idx] = -1
fourth[idx, 2 * idx + 1] = 1
print(fourth)
# [[-1. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
# [ 0. 0. -1. 1. 0. 0. 0. 0. 0. 0.]
# [ 0. 0. 0. 0. -1. 1. 0. 0. 0. 0.]
# [ 0. 0. 0. 0. 0. 0. -1. 1. 0. 0.]
# [ 0. 0. 0. 0. 0. 0. 0. 0. -1. 1.]]
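One caveat for the 100,000-row case: a dense 100,000 x 200,000 float64 array needs roughly 160 GB, so the dense construction above is only practical for small n. Below is a minimal sketch of a sparse alternative. It assumes SciPy is installed and that whatever consumes the matrix accepts scipy.sparse input; neither is stated in the question. The helper name build_fourth_sparse is hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def build_fourth_sparse(n):
    """Return the n x 2n matrix with -1 at (i, 2i) and 1 at (i, 2i + 1) in CSR format."""
    idx = np.arange(n)
    rows = np.concatenate([idx, idx])              # each row holds two entries
    cols = np.concatenate([2 * idx, 2 * idx + 1])  # columns of the -1s, then of the 1s
    vals = np.concatenate([-np.ones(n), np.ones(n)])
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, 2 * n))

fourth_sparse = build_fourth_sparse(5)
print(fourth_sparse.toarray())  # dense view, only sensible for small n
```

Only 2n values are stored, so the same call with n = 100,000 stays small in memory.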

Related

Python numpy replacing values that are in certain pattern

I am trying to 'avoid walls' using an A* (A-star) algorithm.
My array looks like this:
[1, 1, 1, 0, 0, 0, 1, 1, 1],
[1, 0, 0, 0, 0, 0, 1, 1, 1],
[1, 0, 0, 0, 1, 1, 1, 1, 1],
[1, 0, 0, 0, 1, 1, 1, 1, 1],
[1, 1, 0, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1, 1]
I can only walk on 0s (zeroes); the 1s (ones) are walls.
I want my AI to walk along the center of the path, assuming that there is enough room to walk. The AI can walk diagonally.
For example, instead of [1, 1, 1, 0, 0, 0, 1, 1, 1] (the first row), since there is enough room not to block the path, how can I replace it with [1, 1, 1, 1, 0, 1, 1, 1, 1]?
Afterthought:
The optimal path here, if we walk along the center, is [4 3 2 2 3 4].
Also, the shortest possible path for this case would be [3 3 3 3 4 4]
if we are going from (3, 0) to (4, 5). If we just don't want walls
right next to our path (i.e. we want at least a single element between
the path and the wall), how can we arrive at [3 3 2 2 3 4] if we allow
the start and finish to touch walls?
Edit:
Ali_Sh's answer is what I was initially looking for and is the accepted answer.
If a is the main array, the indices of the middle 0 in each row can be obtained by:
cond = np.where(a == 0)
unique = np.unique(cond[0], return_index=True, return_counts=True)
ind = unique[1] + unique[2] // 2
cols = cond[1][ind] # --> [4 3 2 2 3 4]
and they can be used to set the corresponding entries to 0 in an array of ones with the same shape as the main array:
one = np.ones(shape=a.shape)
one[np.arange(len(one)), cols] = 0
which gives:
[[1. 1. 1. 1. 0. 1. 1. 1. 1.]
[1. 1. 1. 0. 1. 1. 1. 1. 1.]
[1. 1. 0. 1. 1. 1. 1. 1. 1.]
[1. 1. 0. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 0. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 0. 1. 1. 1. 1.]]
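For reference, the snippets above can be combined into one self-contained sketch. It assumes that every row contains at least one 0 and that the zeros in each row form a single contiguous run; if a row had no zero, the final assignment would fail with a shape mismatch. The function name center_path_mask is hypothetical.

```python
import numpy as np

def center_path_mask(a):
    """Return (center column per row, array of ones with a 0 at each center)."""
    rows, cols = np.where(a == 0)  # row-major order, so rows is sorted
    _, first, counts = np.unique(rows, return_index=True, return_counts=True)
    center_cols = cols[first + counts // 2]  # middle zero of each row
    mask = np.ones(a.shape)
    mask[np.arange(len(a)), center_cols] = 0
    return center_cols, mask

a = np.array([[1, 1, 1, 0, 0, 0, 1, 1, 1],
              [1, 0, 0, 0, 0, 0, 1, 1, 1],
              [1, 0, 0, 0, 1, 1, 1, 1, 1],
              [1, 0, 0, 0, 1, 1, 1, 1, 1],
              [1, 1, 0, 0, 0, 1, 1, 1, 1],
              [1, 1, 1, 0, 0, 0, 1, 1, 1]])
center_cols, mask = center_path_mask(a)
print(center_cols)
print(mask)
```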
Here's an example that simply finds all the zero values in each row and sets the path to the middle one. If a row had two separate patches of zeros, this could run into trouble. In that case, you would need to make sure that the cells above and below a zero patch also belong to zero patches.
I have used matplotlib here to visualize the path:
import matplotlib.pyplot as plt
p = []
A = [[1, 1, 1, 0, 0, 0, 1, 1, 1],
[1, 0, 0, 0, 0, 0, 1, 1, 1],
[1, 0, 0, 0, 1, 1, 1, 1, 1],
[1, 0, 0, 0, 1, 1, 1, 1, 1],
[1, 1, 0, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 0, 0, 0, 1, 1, 1]]
for i in range(len(A)):
    ptemp = []
    for j in range(len(A[0])):
        if A[i][j] == 0:
            ptemp.append(j) # find all the zero values
    p.append(ptemp[int(len(ptemp)/2)]) # set the path as the center zero value
print(p)
plt.imshow(A[::-1])
plt.plot(p[::-1],range(len(A)))
plt.show()
For the update section of the question: if we have a different path for the columns instead of the optimal path from my previous answer (e.g. [3 1 1 1 2 3] instead of [4 3 2 2 3 4]), it can be applied simply by using:
cols = np.array([3, 1, 1, 1, 2, 3])
one = np.ones(shape=a.shape)
one[np.arange(len(one)), cols] = 0
# [[1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 0. 1. 1. 1. 1. 1. 1. 1.]
# [1. 0. 1. 1. 1. 1. 1. 1. 1.]
# [1. 0. 1. 1. 1. 1. 1. 1. 1.]
# [1. 1. 0. 1. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]]
If we do not want to walk strictly along the center but only want to avoid the cells next to the walls (keeping at least a 1-cell offset from them), we could add the following code to the code from the previous answer:
cols_min = cols - (unique[2] - 2) // 2
cols_max = cols + (unique[2] - 2) // 2
one[np.arange(len(one)), cols_min] = 0
one[np.arange(len(one)), cols_max] = 0
# [[1. 1. 1. 1. 0. 1. 1. 1. 1.]
# [1. 1. 0. 0. 0. 1. 1. 1. 1.]
# [1. 1. 0. 1. 1. 1. 1. 1. 1.]
# [1. 1. 0. 1. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 0. 1. 1. 1. 1.]]
If we are allowed to touch the walls (here, one of them) on the first and last rows, we could add the following code to the code from the previous answer:
col_min_first = cols[0] - unique[2][0] // 2
col_min_last = cols[-1] - unique[2][-1] // 2
one[0, col_min_first:cols[0]] = 0
one[-1, col_min_last:cols[-1]] = 0
# [[1. 1. 1. 0. 0. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 0. 1. 1. 1. 1. 1. 1.]
# [1. 1. 0. 1. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 0. 1. 1. 1. 1.]]
And, finally, if we want to find the shortest path, we can first find the column with the maximum number of 0s and then, for the rows where that column does not contain a 0, move to the adjacent column that does contain a 0:
ind_max = np.argmax(np.sum(a == 0, axis=0))
mask_rows = a[:, ind_max] != 0
mask_col_min = a[:, ind_max - 1] == 0
mask_col_max = a[:, ind_max + 1] == 0
ind_max = np.where(mask_rows & mask_col_min, ind_max - 1, ind_max)
ind_max = np.where(mask_rows & mask_col_max, ind_max + 1, ind_max)
one = np.ones(shape=a.shape)
one[np.arange(len(one)), ind_max] = 0
# Result for the original a:
# [[1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]]
# For a modified array in which row 2 has no 0 in column 3:
# a = np.array([[1, 1, 1, 0, 0, 0, 1, 1, 1],
# [1, 0, 0, 0, 0, 0, 1, 1, 1],
# [0, 0, 0, 1, 1, 1, 1, 1, 1],
# [1, 0, 0, 0, 1, 1, 1, 1, 1],
# [1, 1, 0, 0, 0, 1, 1, 1, 1],
# [1, 1, 1, 0, 0, 0, 1, 1, 1]])
# the result is:
# [[1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 0. 1. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]
# [1. 1. 1. 0. 1. 1. 1. 1. 1.]]

Matrix in Python give wrong results using Numpy Python

I am a complete beginner with NumPy and I am trying to generate the following matrix pattern. Below is my code. I cannot figure out what I am doing wrong to get this result. Thanks in advance for any help.
import numpy as np
def matrix(n):
    final = []
    for i in range(n):
        final.append(list(np.tile([0,1],int(n/2))) if i%2==0 else list(np.tile([1,0],int(n/2))))
    print(np.array(final))
size = 8
matrix(size)
When using NumPy, you should avoid building and editing matrices with Python lists and for loops, because for large matrices that becomes very slow.
Try to examine this code:
import math
import numpy as np
def zero_borders(mat: np.ndarray) -> None:
    """Makes the borders of the array zero."""
    mat[:, 0] = 0 # left border
    mat[:, -1] = 0 # right border
    mat[0, :] = 0 # upper border
    mat[-1, :] = 0 # bottom border
def zero_center_square(mat: np.ndarray) -> None:
    """Makes small square of zeros in the center of the array."""
    size = mat.shape[0]
    i_low = size//2 - 1
    i_high = math.ceil(size/2)
    mat[i_low, i_low:i_high + 1] = 0 # upper edge of the square
    mat[i_high, i_low:i_high + 1] = 0 # lower edge of the square
    mat[i_low:i_high + 1, i_low] = 0 # left edge of the square
    mat[i_low:i_high + 1, i_high] = 0 # right edge of the square
def matrix(n: int) -> np.ndarray:
    """Creates a square matrix with special pattern."""
    mat = np.ones((n, n))
    zero_borders(mat)
    zero_center_square(mat)
    return mat
def main():
    print("Even size:")
    print(matrix(8))
    print("")
    print("Odd size:")
    print(matrix(9))
if __name__ == "__main__":
    main()
The output:
Even size:
[[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 0. 0. 1. 1. 0.]
[0. 1. 1. 0. 0. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]]
Odd size:
[[0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 0. 0. 0. 1. 1. 0.]
[0. 1. 1. 0. 1. 0. 1. 1. 0.]
[0. 1. 1. 0. 0. 0. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0.]]
You can use numpy ix_() like this:
>>> x = np.zeros((9,9), dtype=int)
>>> p1 = np.ix_([1,2,6,7],[1,2,3,4,5,6,7])
>>> x[p1]=1
>>> p2 = np.ix_([3,4,5],[1,2,6,7])
>>> x[p2]=1
>>> x[4,4]=1
>>> x
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 0, 0, 0, 1, 1, 0],
[0, 1, 1, 0, 1, 0, 1, 1, 0],
[0, 1, 1, 0, 0, 0, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]])
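Because the regions in this pattern are contiguous blocks, the same 9x9 array can also be produced with plain slices. This is a minimal sketch of that alternative, not part of the original answer, and it hard-codes the 9x9 size:

```python
import numpy as np

x = np.zeros((9, 9), dtype=int)
x[1:8, 1:8] = 1  # inner 7x7 block of ones, leaving a zero border
x[3:6, 3:6] = 0  # 3x3 square of zeros in the middle
x[4, 4] = 1      # single one at the very center
print(x)
```

np.ix_ is the more general tool when the selected rows or columns are not contiguous.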
You have not mentioned a general pattern for an l x l matrix, so I will only show how to generate the matrix in the given image.
You can use NumPy (particularly numpy.pad()) to create that matrix easily as:
import numpy as np
# Create required matrix
matrix = np.pad(np.pad(np.pad(np.array([[1]]), (1, 1)), (2, 2), constant_values = 1), (1, 1))
# If you want that as list instead of NumPy array
list_matrix = list(list(i) for i in matrix)

Numpy: Change values in numpy array by indexes and condition

I am new to NumPy, and I am having trouble with simple management of NumPy arrays.
I am doing a task which says that loops have to be avoided as much as possible, and I need to edit the values of an array through another array of indexes.
indexes # [3, 16]
y # [0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 0. 0. 1. 1.]
y[indexes] = 2 # [0. 1. 1. 2. 0. 1. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 2. 0. 1. 1.]
But I don't need to simply set the value to 2. I need to make a conditional change. This is what I have so far, but I would need something like
y[indexes] = 1 if y[indexes] == 0 else 0
>>> [0. 1. 1. 0. 0. 1. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 1. 0. 1. 1.]
The line above should be the result.
This is the loop-based answer, but I need a NumPy way if one exists:
for index in indexes:
    y[index] = 1 if y[index] == 0 else 0
Thanks in advance.
I don't know if I understood your question. But I hope this helps you.
tip 01
import numpy as np
indexes = [1, 5, 7] # index list
y = np.array([9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]) #array example
y[indexes][2] # third item (position 2) of y[[1, 5, 7]], i.e. y[7]
In this case it is y[7], which equals 16.
tip 02
This can also be useful.
y = np.array([0,1,1,0,3,0,1,0,1,0])
y
array([0, 1, 1, 0, 3, 0, 1, 0, 1, 0])
y = np.where(y != 1, y, 0) # np.where returns a new array, so assign it back
y
array([0, 0, 0, 0, 3, 0, 0, 0, 0, 0])
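To address the toggle asked for in the question directly, here is a minimal sketch. It assumes that y contains only 0s and 1s and that indexes has no repeated entries; the sample values are the ones from the question.

```python
import numpy as np

y = np.array([0., 1., 1., 1., 0., 1., 0., 0., 0., 0.,
              1., 1., 1., 0., 1., 0., 0., 0., 1., 1.])
indexes = np.array([3, 16])

# Flip 0 <-> 1 only at the selected positions, without a Python loop.
y[indexes] = 1 - y[indexes]
# Equivalent, and closer to the wording of the question:
# y[indexes] = np.where(y[indexes] == 0, 1, 0)
print(y)
```

The right-hand side is evaluated on the selected elements first and then written back through the same index array, so the other entries of y are left untouched.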

Create a Sparse Zero Mean Random Matrix

Does anyone have experience creating a sparse matrix whose non-zero values follow a uniform distribution on [-0.5, 0.5] and have zero mean (zero centered) in Python (e.g. using scipy.sparse)?
I am aware that the scipy.sparse package provides a few methods for creating random sparse matrices, like 'rand' and 'random'. However, I could not achieve what I want with those methods. For example, I tried:
import numpy as np
import scipy.sparse as sp
s = np.random.uniform(-0.5,0.5)
W=sp.random(1024, 1024, density=0.01, format='csc', data_rvs=s)
To clarify my idea:
If I wanted the matrix described above as a non-sparse (dense) matrix, I would create it by:
dense=np.random.rand(1024,1024)-0.5
'np.random.rand(1024,1024)' creates a dense matrix with values uniform in [0, 1). To make it zero mean, I centre the matrix by subtracting 0.5.
However, if I create a sparse matrix, say:
sparse=sp.rand(1024,1024,density=0.01, format='csc')
The matrix will have non-zero values that are uniform in [0, 1). However, if I want to centre the matrix, I cannot simply do 'sparse-=0.5', which would make all the originally zero entries non-zero after the subtraction.
So, how can I achieve for a sparse matrix the same thing as in the dense example above?
Thank you for all of your help!
The data_rvs parameter is expecting a "callable" that takes a size. This isn't exactly obvious from the documentation. This can be done with a lambda as follows:
import numpy as np
import scipy.sparse as sp
W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=lambda s: np.random.uniform(-0.5, 0.5, size=s))
Then print(W) gives:
(243, 0) -0.171300809713
(315, 0) 0.0739590145626
(400, 0) 0.188151369316
(440, 0) -0.187384896218
: :
(1016, 0) 0.29262088084
(156, 1) -0.149881296136
(166, 1) -0.490405135834
(191, 1) 0.188167190147
(212, 1) 0.0334533020488
: :
(411, 1) 0.122330200832
(431, 1) -0.0494334160833
(813, 1) -0.0076379249885
(828, 1) 0.462807265425
: :
(840, 1021) 0.456423017883
(12, 1022) -0.47313075329
: :
(563, 1022) -0.477190349161
(655, 1022) -0.460942546313
(673, 1022) 0.0930207181126
(676, 1022) 0.253643616387
: :
(843, 1023) 0.463793903168
(860, 1023) 0.454427252782
For newcomers, the lambda may look odd; it is just an unnamed function. The sp.random function takes an optional argument data_rvs that defaults to None. When specified, it is expected to be a function that takes a size argument and returns that number of random numbers. A simple function to do this would be:
def generate_n_uniform_randoms(n):
    return np.random.uniform(-0.5, 0.5, n)
I don't know the origin of the API, but the shape is not needed as sp.random presumably first figures out which indices will be non-zero, and then it just needs to compute random values for those indices, which is a set of a known size.
The lambda is just syntactic sugar that allows us to define that function inline in terms of some other function call. We could instead write
W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=generate_n_uniform_randoms)
Actually, this can be a "callable" - some object f for which f(n) returns n random variables. This can be a function, but it can also be an object of a class that implements the __call__(self, n) function. For example:
class ufoo(object):
    def __call__(self, n):
        import numpy
        return numpy.random.uniform(-0.5, 0.5, n)
W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=ufoo())
If you need the mean to be exactly zero (within roundoff of course), this can be done by subtracting the mean from the non-zero values:
W.data -= np.mean(W.data)
Then:
W.data.mean()
-2.3718641632430623e-18
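Putting the pieces of this answer together, a minimal end-to-end sketch might look as follows. The seeded generator rng and the 1e-12 tolerance are illustrative assumptions. Only the stored values are shifted, so the sparsity pattern is unchanged, but the shift can move a few values marginally outside [-0.5, 0.5].

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)  # seeded only to make the values reproducible

# Non-zero values are drawn uniformly from [-0.5, 0.5).
W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=lambda n: rng.uniform(-0.5, 0.5, size=n))

nnz_before = W.nnz
W.data -= W.data.mean()  # center only the stored (non-zero) entries

print(W.nnz == nnz_before)          # the number of stored entries is unchanged
print(abs(W.data.mean()) < 1e-12)   # the mean is zero up to roundoff
print(W.data.min(), W.data.max())   # may fall slightly outside [-0.5, 0.5]
```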
sparse.random does 2 things - distributes nonzeros randomly, and generates random uniform values.
In [62]: M = sparse.random(10,10,density=.2, format='csr')
In [63]: M
Out[63]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 20 stored elements in Compressed Sparse Row format>
In [64]: M.data
Out[64]:
array([ 0.42825407, 0.51858978, 0.8084335 , 0.08691635, 0.13210409,
0.61288928, 0.39675205, 0.58242891, 0.5174367 , 0.57859824,
0.48812484, 0.13472883, 0.82992478, 0.70568697, 0.45001632,
0.52147305, 0.72943809, 0.55801913, 0.97018861, 0.83236235])
You can modify the data values cheaply without changing the sparsity distribution:
In [65]: M.data -= 0.5
In [66]: M.A
Out[66]:
array([[ 0. , 0. , 0. , -0.07174593, 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0.01858978, 0. , 0. , 0.3084335 , -0.41308365,
0. , 0. , 0. , 0. , -0.36789591],
[ 0. , 0. , 0. , 0. , 0.11288928,
-0.10324795, 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.08242891, 0.0174367 , 0. ],
[ 0. , 0. , 0.07859824, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , -0.01187516, 0. , 0. , -0.36527117],
[ 0. , 0. , 0.32992478, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.20568697,
0. , 0. , -0.04998368, 0. , 0. ],
[ 0.02147305, 0. , 0.22943809, 0.05801913, 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0.47018861, 0.33236235, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]])
In [67]: np.mean(M.data)
Out[67]: 0.044118297661574338
Or replacing the nonzero values with a new set of values:
In [69]: M.data = np.random.randint(-5,5,20)
In [70]: M
Out[70]:
<10x10 sparse matrix of type '<class 'numpy.int32'>'
with 20 stored elements in Compressed Sparse Row format>
In [71]: M.A
Out[71]:
array([[ 0, 0, 0, 4, 0, 0, 0, 0, 0, 0],
[-1, 0, 0, 1, 2, 0, 0, 0, 0, -4],
[ 0, 0, 0, 0, 0, 4, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, -5, -5, 0],
[ 0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, -3, 0, 0, 3],
[ 0, 0, -1, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, -4, 0, 0, -1, 0, 0],
[-1, 0, -5, -2, 0, 0, 0, 0, 0, 0],
[ 0, 3, 1, 0, 0, 0, 0, 0, 0, 0]])
In [72]: M.data
Out[72]:
array([ 4, -1, 1, 2, -4, 0, 4, -5, -5, 2, -3, 3, -1, -4, -1, -1, -5,
-2, 3, 1])
In my opinion, your requirements are still incomplete (see disadvantage mentioned below).
Here is some implementation for my simple construction outlined above in my comment:
import numpy as np
import scipy.sparse as sp
M, N, NNZ = 5, 5, 10
assert NNZ % 2 == 0
flat_dim = M*N
valuesA = np.random.uniform(-0.5, 0.5, size=NNZ // 2)
valuesB = valuesA * -1
values = np.hstack((valuesA, valuesB))
positions_flat = np.random.choice(flat_dim, size=NNZ, replace=False)
positions_2d = np.unravel_index(positions_flat, shape=(M, N))  # 'dims' was renamed to 'shape' in newer NumPy
mat = sp.coo_matrix((values, (positions_2d[0], positions_2d[1])), shape=(M, N))
print(mat.todense())
print(mat.data.mean())
Output:
[[ 0. 0. 0. 0.0273862 0. ]
[-0.3943963 0. 0. -0.04134932 0. ]
[-0.10121743 0. -0.0273862 0. 0.04134932]
[ 0.3943963 0. 0. 0. 0. ]
[-0.24680983 0. 0.24680983 0.10121743 0. ]]
0.0
Advantages
sparse
zero mean
entries from uniform distribution
Potential disadvantage:
for each value x in the matrix, -x also appears somewhere!
meaning: it is not uniform in a broader joint-distribution sense
whether that hurts, only you can tell
if it does: the construction above could easily be modified to use any centered values from some distribution, so your problem collapses into this somewhat smaller (but not necessarily much easier) problem
Regarding that linked problem: I am guessing here, but I would not be surprised if sampling x values uniformly under the constraint mean(x) = 0 turned out to be NP-hard.
Keep in mind that a-posteriori centering of the nonzeros, as recommended in the other answer, changes the underlying distribution (even for simple distributions). In some cases it even invalidates the bounds (values leave the interval [-0.5, 0.5]).
This means the question is really about formalizing how important each objective is and balancing them in some way.

Sparse arrays from tuples

I searched the net for a guide to SciPy sparse matrices and failed to find one. I would be happy if anybody shared a source for it, but now to the question:
I have an array of tuples. I want to turn the array of tuples into a sparse matrix where the tuples appear on the main diagonal and the diagonal just beside it, as the following example shows. What is the fancy (efficient) way of doing it?
import numpy as np
A=np.asarray([[1,2],[3,4],[5,6],[7,8]])
B=np.zeros((A.shape[0],A.shape[0]+1))
for i in range(A.shape[0]):
    B[i,i]=A[i,0]
    B[i,i+1]=A[i,1]
print(B)
Output being:
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]
You can build those really fast as a CSR matrix:
>>> A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
>>> rows = len(A)
>>> cols = rows + 1
>>> data = A.flatten() # we want a copy
>>> indptr = np.arange(0, len(data)+1, 2) # 2 non-zero entries per row
>>> indices = np.repeat(np.arange(cols), [1] + [2] * (cols-2) + [1])
>>> import scipy.sparse as sps
>>> a_sps = sps.csr_matrix((data, indices, indptr), shape=(rows, cols))
>>> a_sps.A
array([[1, 2, 0, 0, 0],
[0, 3, 4, 0, 0],
[0, 0, 5, 6, 0],
[0, 0, 0, 7, 8]])
Try diags from scipy
import numpy as np
import scipy.sparse
A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
B = scipy.sparse.diags([A[:,0], A[:,1]], [0, 1], [4, 5])
When I print B.todense(), it gives me
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]
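As a quick sanity check, the sketch below builds the matrix all three ways (the loop from the question, the CSR construction, and diags) and compares the dense results. The variable names B_loop, B_csr and B_diags are introduced here only for illustration.

```python
import numpy as np
import scipy.sparse as sps

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
rows, cols = len(A), len(A) + 1

# Reference: the explicit loop from the question.
B_loop = np.zeros((rows, cols))
for i in range(rows):
    B_loop[i, i] = A[i, 0]
    B_loop[i, i + 1] = A[i, 1]

# CSR construction: two stored entries per row, at columns i and i + 1.
indptr = np.arange(0, 2 * rows + 1, 2)
indices = np.repeat(np.arange(cols), [1] + [2] * (cols - 2) + [1])
B_csr = sps.csr_matrix((A.flatten(), indices, indptr), shape=(rows, cols))

# diags construction: column 0 on the main diagonal, column 1 just above it.
B_diags = sps.diags([A[:, 0], A[:, 1]], [0, 1], shape=(rows, cols))

print(np.array_equal(B_csr.toarray(), B_loop))
print(np.array_equal(B_diags.toarray(), B_loop))
```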
