Creating a random matrix in Python

I am trying to create a random square matrix of n x n random numbers with numpy. I can generate enough random numbers, but I am having trouble using numpy to build a matrix of variable size. This is as far as I have gotten:
from random import randint

def testMatrix(size):
    a = []
    for i in range(0, size*size):
        a.append(randint(0, 5))
How can I put this list into a size x size array?

Try
np.random.randint(0, 5, size=(s, s))
where s is the desired side length. Note that numpy.random.randint excludes the upper bound, so this draws integers from 0 to 4; pass 6 as the high value to match random.randint(0, 5).
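A minimal sketch wiring that into the asker's function (my rewrite, not code from the original answer):
import numpy as np

def testMatrix(size):
    # high is exclusive, so 6 matches the inclusive random.randint(0, 5)
    return np.random.randint(0, 6, size=(size, size))

print(testMatrix(3))  # prints a 3 x 3 array of random integers in [0, 5]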

Your test matrix is currently one dimensional. If you want to create a random matrix with numpy, you can just do:
import numpy
num_rows = 3
num_columns = 3
random_matrix = numpy.random.random((num_rows, num_columns))
The result would be:
array([[ 0.15194989,  0.21977027,  0.85063633],
       [ 0.1879659 ,  0.09024749,  0.3566058 ],
       [ 0.18044427,  0.59143149,  0.25449112]])
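On newer NumPy (1.17+), the Generator API is the recommended interface for random numbers; a minimal sketch of the same idea:
import numpy as np

rng = np.random.default_rng()
random_matrix = rng.random((3, 3))             # floats in [0, 1)
random_ints = rng.integers(0, 5, size=(3, 3))  # integers in [0, 5); pass endpoint=True to include 5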
You can also create a random matrix without numpy:
import random
num_rows = 3
num_columns = 3
random_matrix = [[random.random() for j in range(num_columns)] for i in range(num_rows)]
The result would be:
[[0.9982841729782105, 0.9659048749818827, 0.22838327707784145],
 [0.3524666409224604, 0.1918744765283834, 0.7779130503458696],
 [0.5239230720346117, 0.0224389713805887, 0.6547162177880549]]
Per the comments, I've added a function to convert a one dimensional array to a matrix:
def convert_to_matrix(arr, chunk_size):
    return [arr[i:i+chunk_size] for i in range(0, len(arr), chunk_size)]
arr = [1,2,3,4,5,6,7,8,9]
matrix = convert_to_matrix(arr, 3)
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]] is the output
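Since the question already uses numpy, the same conversion can be done with reshape; a sketch (my addition, assuming the chunk size divides the list length evenly):
import numpy as np

def convert_to_matrix_np(arr, chunk_size):
    # -1 lets numpy infer the number of rows from the chunk size
    return np.array(arr).reshape(-1, chunk_size)

print(convert_to_matrix_np([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]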

Related

Numpy fast for loop

I have a list named "y" containing 8 numpy arrays, each of shape (180000,).
Now I want to create a new numpy array named "collisions" with the same shape that counts, at each position, how many of the values of y are not 0. See the following example:
import numpy as np
collisions = np.zeros(len(y[0]), dtype=np.uint8)
for yi in y:
    collisions[np.where(yi > 0)] += 1
The calculation of this function takes a relatively long time. Is there a faster implementation to do this?
I am not sure why your calculation takes so long; hoping this helps to clarify, suppose your list of arrays looks like this:
import numpy as np
y = [np.random.normal(0,1,180000) for i in range(8)]
Running your code, it works ok:
collisions = np.zeros(len(y[0]), dtype=np.uint8)
for yi in y:
    collisions[np.where(yi > 0)] += 1
collisions
array([4, 2, 4, ..., 4, 4, 5], dtype=uint8)
You can do it a bit faster like this, basically stacking your list of arrays into a matrix and summing the > 0 mask over the 8 arrays (axis=0); that said, I don't see a problem with the loop above:
(np.array(y)>0).sum(axis=0)
array([4, 2, 4, ..., 4, 4, 5])
I'm assuming you're looking for something like this:
import numpy as np
# simulating your data by randomly generating numbers in [-0.5, 0.5)
y = np.random.rand(8, 180_000) - 0.5
print(y.shape) # (8, 180000)
collisions = np.sum(y > 0, axis=0, dtype=np.uint8)
print(collisions.shape) # (180000,)
print(collisions) # [4 4 4 ... 1 6 7]
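An equivalent one-liner, if you prefer it (np.count_nonzero has supported an axis argument since NumPy 1.12; note it returns int64 rather than uint8):
collisions = np.count_nonzero(np.asarray(y) > 0, axis=0)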

Python function for creating a square matrix of any size

I need to create a function which can take an unspecified number of parameters (elements for the matrix) and return the corresponding square matrix. I have implemented this using the following approach.
def square_matrix(size, *elements):
    numbers = list(elements)
    if size ** 2 != len(numbers):
        return "Number of elements does not match the size of the matrix"
    else:
        matrix = []
        factor = 0
        for i in range(0, size):
            row = []
            for j in range(factor * size, (factor + 1) * size):
                row.append(numbers[j])
            factor += 1
            matrix.append(row)
            i += 1
        return matrix
print(square_matrix(3, 1, 2, 3, 4, 5, 6, 7, 8, 9))
# Output: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Although this method works fine for smaller matrices, it seems somewhat inefficient as it uses nested loops and appears unnecessarily long to me. Is there any better/concise way of implementing the same?
Is it OK to just use NumPy?
import numpy as np

def square_matrix(size, *elements):
    return np.array(elements).reshape(size, size)
For the matrix creation, you could use a list comprehension, which is more efficient than a for loop.
You could replace the for loops in the else branch of your if statement with the code below.
matrix = [[i for i in elements[j:j+size]] for j in range(0,len(elements),size)]
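Put together, the whole function could look like this (a sketch of the same logic, just with the comprehension swapped in):
def square_matrix(size, *elements):
    if size ** 2 != len(elements):
        return "Number of elements does not match the size of the matrix"
    return [list(elements[j:j+size]) for j in range(0, len(elements), size)]

print(square_matrix(3, 1, 2, 3, 4, 5, 6, 7, 8, 9))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]]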
There is no need to give the size as an argument. If the square root of the length of the arguments is a whole number, then you can make a square matrix of the elements.
import numpy as np
from math import sqrt
def square_matrix(*elements):
    size = sqrt(len(elements))
    if size.is_integer():
        return np.array(elements).reshape(int(size), int(size))
    else:
        raise RuntimeError("Number of elements is not sufficient to make a square matrix")
print(square_matrix(1, 2, 3, 4, 5, 6, 7, 8, 9))
# Output:
# array([[1, 2, 3],
#        [4, 5, 6],
#        [7, 8, 9]])

Aggregate elements based on position vector

I'm trying to vectorize a very simple operation but can't seem to figure out how.
Given a very large numerical vector (over 1M positions) and another array of size n containing a given set of positions, I would like to get back a vector of size n whose elements are the averages of the values of the first vector at the positions specified by the second:
a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([[0, 1], [2], [3, 5], [4, 6]], dtype=object)  # ragged rows need dtype=object on newer NumPy
c = [1.5, 3, 5, 6]
I need to repeat this operation many times so performance is an issue.
Vanilla python solution:
import numpy as np
import time
a = np.array([1,2,3,4,5,6,7])
b = np.array([[0, 1], [2], [3, 5], [4, 6]], dtype=object)
begin = time.time()
for i in range(100000):
    c = []
    for d in b:
        c.append(np.mean(a[d]))
print(time.time() - begin, c)
# 3.7529971599578857 [1.5, 3.0, 5.0, 6.0]
I'm not sure if this is necessarily faster but you may as well try:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([[0, 1], [2], [3, 5], [4, 6]], dtype=object)
# Get the length of each subset of indices
lens = np.fromiter((len(bi) for bi in b), count=len(b), dtype=np.int32)
# Compute reduction indices
reduce_idx = np.roll(np.cumsum(lens), 1)
reduce_idx[0] = 0
# Make flattened array of index lists
idx = np.fromiter((i for bi in b for i in bi), count=lens.sum(), dtype=np.int32)
# Reorder according to indices
a2 = a[idx]
# Sum reordered array at reduction indices and divide by number of indices
c = np.add.reduceat(a2, reduce_idx) / lens
print(c)
# [1.5 3. 5. 6. ]
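Since the question says the operation is repeated many times with the same b, here is a sketch of one way to amortize the setup cost (my addition): build the flattened indices once and reuse them for every new a.
import numpy as np

def make_group_mean(b):
    # precompute everything that depends only on b
    lens = np.fromiter((len(bi) for bi in b), count=len(b), dtype=np.int32)
    reduce_idx = np.roll(np.cumsum(lens), 1)
    reduce_idx[0] = 0
    idx = np.fromiter((i for bi in b for i in bi), count=lens.sum(), dtype=np.int32)
    def group_mean(a):
        return np.add.reduceat(a[idx], reduce_idx) / lens
    return group_mean

group_mean = make_group_mean([[0, 1], [2], [3, 5], [4, 6]])
print(group_mean(np.array([1, 2, 3, 4, 5, 6, 7])))  # [1.5 3.  5.  6. ]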

Decrease array size by averaging adjacent values with numpy

I have a large array of thousands of values in numpy. I want to decrease its size by averaging adjacent values.
For example:
a = [2,3,4,8,9,10]
#average down to 2 values here
a = [3,9]
#it averaged 2,3,4 and 8,9,10 together
So, basically, I have n elements in an array, and I want to tell it to average down to X values, and it averages like above.
Is there some way to do that with numpy (already using it for other things, so I'd like to stick with it).
Using reshape and mean, you can average every m adjacent values of a 1D array of size N*m, with N being any positive integer. For example:
import numpy as np
m = 3
a = np.array([2, 3, 4, 8, 9, 10])
b = a.reshape(-1, m).mean(axis=1)
#array([3., 9.])
1) a.reshape(-1, m) creates a 2D view of the array without copying data:
array([[ 2,  3,  4],
       [ 8,  9, 10]])
2) taking the mean along the second axis (axis=1) then calculates the mean value of each row, resulting in:
array([3., 9.])
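If len(a) is not an exact multiple of m, one option is to pad with NaN and use nanmean so the last partial window still gets averaged; a sketch (my addition, not from the answer above):
import numpy as np

def average_adjacent(a, m):
    a = np.asarray(a, dtype=float)
    pad = (-len(a)) % m                        # values needed to complete the last window
    padded = np.append(a, np.full(pad, np.nan))
    return np.nanmean(padded.reshape(-1, m), axis=1)

print(average_adjacent([2, 3, 4, 8, 9, 10], 3))      # [3. 9.]
print(average_adjacent([2, 3, 4, 8, 9, 10, 12], 3))  # [ 3.  9. 12.] -- last window is just [12]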
Try this:
import numpy as np

n_averaged_elements = 3
averaged_array = []
a = np.array([2, 3, 4, 8, 9, 10])
for i in range(0, len(a), n_averaged_elements):
    slice_from_index = i
    slice_to_index = slice_from_index + n_averaged_elements
    averaged_array.append(np.mean(a[slice_from_index:slice_to_index]))
>>> averaged_array
[3.0, 9.0]
Looks like a simple non-overlapping moving window average to me, how about:
In [3]:
import numpy as np
a = np.array([2,3,4,8,9,10])
window_sz = 3
a[:len(a)//window_sz*window_sz].reshape(-1,window_sz).mean(1)
# you want to be sure your array can be reshaped properly, hence the [:len(a)//window_sz*window_sz] part (note the integer division //, which is also needed on Python 3)
Out[3]:
array([ 3., 9.])
In this example, I presume that a is the 1D numpy array that needs to be averaged. In the method that I give below, we first find the factors of the length of this array a, and then we choose an appropriate factor as the step size to average the array with.
Here is the code.
import numpy as np
from functools import reduce
'''Function to find factors of a given number n.'''
def factors(n):
    return list(set(reduce(list.__add__,
           ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0))))

a = [2, 3, 4, 8, 9, 10]  # given array
'''fac: list of factors of len(a).
In this example, len(a) = 6, so fac = [1, 2, 3, 6].'''
fac = factors(len(a))
'''step: choose an appropriate step size from the list fac.
In this example, we choose one of the middle numbers in fac (3).'''
step = fac[int(len(fac)/3) + 1]
'''avg: initialize an empty array.'''
avg = np.array([])
for i in range(0, len(a), step):
    avg = np.append(avg, np.mean(a[i:i+step]))  # append averaged values to avg
print(avg)  # prints the final result
# [3. 9.]

list of numpy vectors to sparse array

I have a list of numpy vectors of the format:
[array([[-0.36314615,  0.80562619, -0.82777381, ...,  2.00876354,  2.08571887, -1.24526026]]),
 array([[ 0.9766923 , -0.05725135, -0.38505339, ...,  0.12187988, -0.83129255,  0.32003683]]),
 array([[-0.59539878,  2.27166874,  0.39192573, ..., -0.73741573,  1.49082653,  1.42466276]])]
Here, only 3 of the vectors in the list are shown; I have hundreds.
The maximum number of elements in one vector is around 10 million.
The arrays in the list have unequal numbers of elements, but the maximum length is fixed.
Is it possible to create a sparse matrix using these vectors in python such that I have zeros in place of elements for the vectors which are smaller than the maximum size?
Try this:
from scipy import sparse
M = sparse.lil_matrix((num_of_vectors, max_vector_size))
for i, v in enumerate(vectors):
    M[i, :v.size] = v
Then take a look at this page: http://docs.scipy.org/doc/scipy/reference/sparse.html
The lil_matrix format is good for constructing the matrix, but you'll want to convert it to a different format like csr_matrix before operating on them.
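For example (a minimal, self-contained sketch with made-up sizes):
from scipy import sparse
import numpy as np

vectors = [np.array([[1.0, 2.0, 3.0]]), np.array([[4.0, 5.0]])]  # hypothetical ragged row vectors
M = sparse.lil_matrix((len(vectors), 5))
for i, v in enumerate(vectors):
    M[i, :v.size] = v
M_csr = M.tocsr()  # convert before doing arithmetic or matrix products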
In this approach you replace the elements whose absolute value is below your threshold with 0 and then create a sparse matrix out of them. I am suggesting coo_matrix since it is the fastest to convert to the other types, depending on your purposes. Then you can scipy.sparse.vstack() them to build a matrix accounting for all the elements in the list:
import scipy.sparse as ss
import numpy as np
old_list = [np.random.random(100000) for i in range(5)]
threshold = 0.01
for a in old_list:
    a[np.absolute(a) < threshold] = 0
old_list = [ss.coo_matrix(a) for a in old_list]
m = ss.vstack( old_list )
A little convoluted, but I would probably do it like this:
>>> import numpy as np
>>> import scipy.sparse as sps
>>> a = [np.arange(5), np.arange(7), np.arange(3)]
>>> lens = [len(j) for j in a]
>>> cols = np.concatenate([np.arange(j) for j in lens])
>>> rows = np.concatenate([np.repeat(j, len_) for j, len_ in enumerate(lens)])
>>> data = np.concatenate(a)
>>> b = sps.coo_matrix((data,(rows, cols)))
>>> b.toarray()
array([[0, 1, 2, 3, 4, 0, 0],
       [0, 1, 2, 3, 4, 5, 6],
       [0, 1, 2, 0, 0, 0, 0]])
