I have a large numpy array with thousands of values. I want to decrease its size by averaging adjacent values.
For example:
a = [2,3,4,8,9,10]
#average down to 2 values here
a = [3,9]
#it averaged 2,3,4 and 8,9,10 together
So, basically, I have n elements in an array, and I want to average it down to X values, the way it is done above.
Is there some way to do that with numpy? (I'm already using it for other things, so I'd like to stick with it.)
Using reshape and mean, you can average every m adjacent values of a 1D array of size N*m, where N is any positive integer. For example:
import numpy as np
m = 3
a = np.array([2, 3, 4, 8, 9, 10])
b = a.reshape(-1, m).mean(axis=1)
#array([3., 9.])
1) a.reshape(-1, m) creates a 2D view of the array without copying data:
array([[ 2,  3,  4],
       [ 8,  9, 10]])
2) Taking the mean along the second axis (axis=1) then computes the mean of each row, resulting in:
array([3., 9.])
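If you want to name the number of output values X directly, and len(a) is not necessarily a multiple of X, one option is np.array_split, which splits an array into X nearly equal chunks. This is only a sketch: the helper name average_down is made up for illustration, and the chunk means are computed in a Python loop, so it is not fully vectorized.

```python
import numpy as np

def average_down(a, x):
    """Average a 1D array down to x values using nearly equal chunks."""
    chunks = np.array_split(np.asarray(a, dtype=float), x)
    return np.array([chunk.mean() for chunk in chunks])

print(average_down([2, 3, 4, 8, 9, 10], 2))      # [3. 9.]
print(average_down([2, 3, 4, 8, 9, 10, 11], 2))  # chunks of 4 and 3 values
```

When the length divides evenly, this gives the same result as the reshape approach.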
Try this:
import numpy as np
n_averaged_elements = 3
averaged_array = []
a = np.array([ 2, 3, 4, 8, 9, 10])
for i in range(0, len(a), n_averaged_elements):
    slice_from_index = i
    slice_to_index = slice_from_index + n_averaged_elements
    averaged_array.append(np.mean(a[slice_from_index:slice_to_index]))
>>> averaged_array
[3.0, 9.0]
Looks like a simple non-overlapping moving window average to me, how about:
In [3]:
import numpy as np
a = np.array([2,3,4,8,9,10])
window_sz = 3
a[:len(a)//window_sz*window_sz].reshape(-1,window_sz).mean(1)
#the [:len(a)//window_sz*window_sz] slice trims the array so it can be reshaped properly (integer division // is required in Python 3)
Out[3]:
array([ 3., 9.])
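The slice above drops any leftover values at the end. If you would rather keep them, one possible variation is to pad the array with NaN and use np.nanmean, so the last window is averaged over whatever values it has. This is a sketch under that assumption; window_mean_padded is a hypothetical name, not something from the answer above.

```python
import numpy as np

def window_mean_padded(a, window_sz):
    """Mean of non-overlapping windows; the last window may be shorter."""
    a = np.asarray(a, dtype=float)
    pad = (-len(a)) % window_sz  # number of values missing from the last window
    padded = np.concatenate([a, np.full(pad, np.nan)])
    # nanmean ignores the NaN padding when averaging each row
    return np.nanmean(padded.reshape(-1, window_sz), axis=1)

result = window_mean_padded([2, 3, 4, 8, 9, 10, 20], 3)
print(result)  # [ 3.  9. 20.]
```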
In this example, I presume that a is the 1D numpy array that needs to be averaged. In the method below, we first find the factors of the length of a, and then choose an appropriate factor as the step size to average the array with.
Here is the code.
import numpy as np
from functools import reduce
''' Function to find the factors of a given number 'n', in ascending order '''
def factors(n):
    return sorted(set(reduce(list.__add__,
        ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0))))
a = [2,3,4,8,9,10] #Given array.
'''fac: sorted list of factors of the length of a.
In this example, len(a) = 6. So, fac = [1, 2, 3, 6] '''
fac = factors(len(a))
'''step: choose an appropriate step size from the list 'fac'.
In this example, we choose one of the middle numbers in fac
(3). '''
step = fac[int( len(fac)/3 )+1]
'''avg: initialize an empty array. '''
avg = np.array([])
for i in range(0, len(a), step):
    avg = np.append( avg, np.mean(a[i:i+step]) ) #append averaged values to `avg`
print(avg) #Prints the final result
[3. 9.]
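The same idea can be combined with reshape: pick the divisor of len(a) that is closest to the number of values you want, then average each block. This is a rough sketch only; the names divisors and average_to_about are invented here, and ties between two equally close divisors go to the smaller one.

```python
import numpy as np

def divisors(n):
    """Return the sorted divisors of a positive integer n."""
    small = [i for i in range(1, int(n**0.5) + 1) if n % i == 0]
    return sorted(set(small + [n // i for i in small]))

def average_to_about(a, target):
    """Average a 1D array down to the divisor of len(a) closest to target."""
    a = np.asarray(a, dtype=float)
    n_out = min(divisors(len(a)), key=lambda d: abs(d - target))
    # each row of the reshaped array is one block of adjacent values
    return a.reshape(n_out, -1).mean(axis=1)

result = average_to_about([2, 3, 4, 8, 9, 10], 2)
print(result)  # [3. 9.]
```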
Related
I have a matrix in the following form:
import numpy as np
matrix = np.array([[-2,2,6,7,8],[-3,7,1,0,-2]])
I want to find the location of the column with the highest value in the first row, subject to the second row being non-negative in that column. In my case, I want the algorithm to find the 4th column.
solution = np.array([7,0])
column_location = 3
I tried using numpy functions like np.min(), np.max() and np.take(), but I lose the location information when subsampling the matrix.
Simply:
nn = np.where(matrix[1] >= 0)[0]
ix = nn[matrix[0, nn].argmax()]
On your data:
>>> ix
3
Here's a sketch:
pos_inds = np.where(matrix[1, :] >= 0)[0] # indices where 2nd row is non-negative
max_ind = matrix[0, pos_inds].argmax() # max index into row with non-negative values only
orig_max_ind = pos_inds[max_ind] # max index into the original array
print(orig_max_ind) # 3
print(matrix[:, orig_max_ind]) # [7, 0]
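Another way to keep the original column positions is to replace the disallowed entries with -inf before calling argmax, so no index translation is needed. A small sketch using the same matrix; the check for "no valid column at all" is my own addition:

```python
import numpy as np

matrix = np.array([[-2, 2, 6, 7, 8], [-3, 7, 1, 0, -2]])

valid = matrix[1] >= 0  # columns allowed by the second row
if valid.any():
    # disallowed columns become -inf, so they can never win the argmax
    candidates = np.where(valid, matrix[0], -np.inf)
    column_location = int(candidates.argmax())
    solution = matrix[:, column_location]
else:
    column_location = None  # every column is excluded
    solution = None

print(column_location)  # 3
print(solution)         # [7 0]
```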
Here I use masks from numpy.ma, and also handle the case where all the numbers in the second row are negative, in which case there is no solution:
import numpy as np
import numpy.ma as ma
from copy import deepcopy
min_int = -2147483648
matrix = np.array([[-2, 2, 6, 7, 8], [-3, 7, 1, 0, -2]])
# we keep the original matrix untouched
matrix_copy = deepcopy(matrix)
masked_array = ma.masked_less(matrix[1], 0)
matrix_copy[0][masked_array.mask] = min_int
column_location = np.argmax(matrix_copy[0])
if matrix_copy[0][column_location] == min_int:
    print("No solution")
else:
    solution = np.array([matrix[0][column_location], matrix[1][column_location]])
    print(solution) # [7 0]
    print(column_location) # 3
I have an array of data-points, for example:
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
and I need to perform the following sum on the values:
However, the problem is that I need to perform this sum on each value > i. For example, using the last 3 values in the set the sum would be:
and so on up to 10.
If I run something like:
import numpy as np
x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
alpha = 1/np.log(2)
for i in x:
    y = sum(x**(alpha)*np.log(x))
print (y)
It returns a single value of y = 247.7827060452275, whereas I need an array of values. I think I need to reverse the order of the data to achieve what I want but I'm having trouble visualising the problem (hope I explained it properly) as a whole so any suggestions would be much appreciated.
The following computes all the partial sums of the grand sum in your formula:
import numpy as np
# Generate numpy array [1, 10]
x = np.arange(1, 11)
alpha = 1 / np.log(2)
# Compute parts of the sum
parts = x ** alpha * np.log(x)
# Compute all partial sums
part_sums = np.cumsum(parts)
print(part_sums)
You really do not need any explicit loop, or a non-numpy operation (like sum()), here. numpy takes care of all your needs.
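If the data has to stay in its original descending order and you want the sum over the last k values for each k, you can take the cumulative sum of the reversed terms. This is a sketch that assumes this is the quantity being asked for; tail_sums is just an illustrative name.

```python
import numpy as np

x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
alpha = 1 / np.log(2)

# one term of the sum per data point, in the original order
parts = x ** alpha * np.log(x)

# tail_sums[k - 1] is the sum of the terms for the last k values of x
tail_sums = np.cumsum(parts[::-1])
print(tail_sums)
```

The last entry of tail_sums is the full sum over all ten values.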
I want to generate a binary matrix of numbers with M rows and N columns. Each row must sum to <=p and >=q. In other words, each row must have at most p and at least q ones.
This is the code I have been using.
import numpy as np
def randbin(M, N, P):
    return np.random.choice([0, 1], size=(M, N), p=[P, 1 - P])
MyMatrix = randbin(200, 7, 0.5)
Notice that row 0 is all zeros:
I noticed that some rows have all zeros and some rows have all ones. How can I modify this to get what I want? Is there an efficient way of achieving this solution?
You can generate a random number in [q, p] for each row and then set that many random ones in each row. If by efficient you mean vectorized, then yes, there is an efficient way. The trick is to simulate sampling without replacement along one axis but with replacement along the other. This can be done with np.argsort. You can select a variable number of indices by turning a random vector into a mask.
def randbin(m, n, p, q):
    # p is the maximum and q the minimum number of ones per row, as in the question
    # output to assign ones into
    result = np.zeros((m, n), dtype=bool)
    # simulate sampling without replacement in one axis
    col_ind = np.argsort(np.random.random(size=(m, n)), axis=1)
    # figure out how many samples to take in each row (between q and p, inclusive)
    count = np.random.randint(q, p + 1, size=(m, 1))
    # turn it into a mask over col_ind using a clever broadcast
    mask = np.arange(n) < count
    # apply the mask not only to col_ind, but also the corresponding row_ind
    col_ind = col_ind[mask]
    row_ind = np.broadcast_to(np.arange(m).reshape(-1, 1), (m, n))[mask]
    # Set the corresponding elements to 1
    result[row_ind, col_ind] = 1
    return result
The selection is made so that each run of equal values in row_ind is between p and q elements long. The corresponding elements of col_ind are unique and uniformly distributed within each row.
An alternative is @Prune's solution. It requires np.argsort to shuffle the rows independently, since np.random.shuffle would keep the rows together:
def randbin(m, n, p, q):
    # make the unique rows, with q to p ones each (q is the minimum, p the maximum)
    options = np.arange(n) < np.arange(q, p + 1).reshape(-1, 1)
    # select random unique row to go into each output row
    selection = np.random.choice(options.shape[0], size=m, replace=True)
    # perform the selection
    result = options[selection]
    # create indices to shuffle each row independently
    col_ind = np.argsort(np.random.random(result.shape), axis=1)
    row_ind = np.arange(m).reshape(-1, 1)
    # perform the shuffle
    result = result[row_ind, col_ind]
    return result
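For a quick sanity check of the argsort idea, here is a compact variant that uses the newer np.random.default_rng generator. It is a sketch, not a drop-in replacement: the name randbin_rows and the lo/hi parameter names are mine, with lo the minimum and hi the maximum number of ones per row.

```python
import numpy as np

def randbin_rows(m, n, lo, hi, rng=None):
    """Random m x n 0/1 matrix whose row sums all lie in [lo, hi]."""
    rng = np.random.default_rng() if rng is None else rng
    # number of ones wanted in each row (upper bound of integers is exclusive)
    counts = rng.integers(lo, hi + 1, size=(m, 1))
    # argsort of random values gives an independent permutation of 0..n-1 per row
    ranks = rng.random((m, n)).argsort(axis=1)
    # exactly counts[i] entries of row i hold a rank below counts[i]
    return (ranks < counts).astype(int)

sample = randbin_rows(200, 7, 2, 5)
row_sums = sample.sum(axis=1)
print(row_sums.min(), row_sums.max())
```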
Okay, then: a uniform distribution is easy enough. Let's take that case with [2,5] 1s required. Use a list of the allowable combinations:
[ [1, 1, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0] ]
For each of your rows, choose a random element from these four, and then shuffle it. There is your row.
I'm trying to vectorize a very simple operation but can't seem to figure out how.
Given a very large numerical vector (over 1M positions) and another array of size n holding sets of positions, I would like to get back a vector of size n whose elements are the averages of the values of the first vector at the positions given by the second:
a = np.array([1,2,3,4,5,6,7])
b = np.array([[0,1],[2],[3,5],[4,6]], dtype=object)  # ragged lists need dtype=object in recent numpy
c = [1.5,3,5,6]
I need to repeat this operation many times so performance is an issue.
Vanilla python solution:
import numpy as np
import time
a = np.array([1,2,3,4,5,6,7])
b = np.array([[0,1],[2],[3,5],[4,6]], dtype=object)  # ragged lists need dtype=object in recent numpy
begin = time.time()
for i in range(100000):
    c = []
    for d in b:
        c.append(np.mean(a[d]))
print(time.time() - begin, c)
# 3.7529971599578857 [1.5, 3.0, 5.0, 6.0]
I'm not sure if this is necessarily faster but you may as well try:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([[0, 1], [2], [3, 5], [4, 6]], dtype=object)  # ragged lists need dtype=object in recent numpy
# Get the length of each subset of indices
lens = np.fromiter((len(bi) for bi in b), count=len(b), dtype=np.int32)
# Compute reduction indices
reduce_idx = np.roll(np.cumsum(lens), 1)
reduce_idx[0] = 0
# Make flattened array of index lists
idx = np.fromiter((i for bi in b for i in bi), count=lens.sum(), dtype=np.int32)
# Reorder according to indices
a2 = a[idx]
# Sum reordered array at reduction indices and divide by number of indices
c = np.add.reduceat(a2, reduce_idx) / lens
print(c)
# [1.5 3. 5. 6. ]
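Since the operation is repeated many times with the same positions, it may help to build the flat index and the group labels once and reuse them. Below is a sketch of that idea using np.bincount with weights instead of reduceat. The helper name group_means is made up, b is kept as a plain list of lists, and every group is assumed to be non-empty. I have not benchmarked it.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])
b = [[0, 1], [2], [3, 5], [4, 6]]

# Precompute once: group sizes, flattened positions, and a group label per position
lens = np.array([len(bi) for bi in b])
idx = np.concatenate([np.asarray(bi, dtype=np.intp) for bi in b])
groups = np.repeat(np.arange(len(b)), lens)

def group_means(values):
    """Average values at the precomputed positions, one result per group."""
    sums = np.bincount(groups, weights=values[idx], minlength=len(b))
    return sums / lens

c = group_means(a)
print(c)  # [1.5 3.  5.  6. ]
```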
I am trying to create a random square matrix of nxn random numbers with numpy. Of course I am able to generate enough random numbers but I am having trouble using numpy to create a matrix of variable length. This is as far as I have gotten thus far:
def testMatrix(size):
    a = []
    for i in range(0, size*size):
        a.append(randint(0, 5))
How can I put this list into an array of size x size?
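Try the following, where s is the size of the matrix. Note that the upper bound of np.random.randint is exclusive, so this draws integers from 0 to 4:
np.random.randint(0, 5, size=(s, s))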
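To answer the literal question of turning the flat list into a size x size array, you can convert it with np.array and reshape it. This is a sketch that assumes randint in the question is random.randint from the standard library (inclusive upper bound); random_square_matrix is a name I made up.

```python
import numpy as np
from random import randint

def random_square_matrix(size):
    """Build a size x size array from a flat list of random integers in 0..5."""
    values = [randint(0, 5) for _ in range(size * size)]
    return np.array(values).reshape(size, size)

m = random_square_matrix(4)
print(m.shape)  # (4, 4)

# The same thing in one call with numpy's Generator API
# (the upper bound is exclusive, hence 6 to include 5)
rng = np.random.default_rng()
m2 = rng.integers(0, 6, size=(4, 4))
```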
Your test matrix is currently one dimensional. If you want to create a random matrix with numpy, you can just do:
import numpy
num_rows = 3
num_columns = 3
random_matrix = numpy.random.random((num_rows, num_columns))
The result would be:
array([[ 0.15194989, 0.21977027, 0.85063633],
[ 0.1879659 , 0.09024749, 0.3566058 ],
[ 0.18044427, 0.59143149, 0.25449112]])
You can also create a random matrix without numpy:
import random
num_rows = 3
num_columns = 3
random_matrix = [[random.random() for j in range(num_columns)] for i in range(num_rows)]
The result would be:
[[0.9982841729782105, 0.9659048749818827, 0.22838327707784145],
[0.3524666409224604, 0.1918744765283834, 0.7779130503458696],
[0.5239230720346117, 0.0224389713805887, 0.6547162177880549]]
Per the comments, I've added a function to convert a one dimensional array to a matrix:
def convert_to_matrix(arr, chunk_size):
    return [arr[i:i+chunk_size] for i in range(0, len(arr), chunk_size)]
arr = [1,2,3,4,5,6,7,8,9]
matrix = convert_to_matrix(arr, 3)
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]] is the output