Numpy Matrix initialization with ascending numbers for rows - python

I am trying to create a matrix like:
M = [[1, 1, ..., 1],
     [2, 2, ..., 2],
     ...
     [40000, 40000, ..., 40000]]
Here is what I tried:
data = np.mat((40000,8))
print(data.shape)
for i in range(data.shape[0]):
    data[i,:] = i
print(data[:5])
The above code prints:
(1, 2)
[[0 0]]
I know how to fill a matrix with constant values, but I couldn't find a similar question for this case.

Use a simple array and don't forget that Python starts indexing at 0:
data = np.zeros((40000,8))
for i in range(data.shape[0]):
    data[i,:] = i+1
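If you want to skip the Python loop entirely, broadcasting a column of row numbers across the 8 columns builds the same array in one step (a sketch; the shape (40000, 8) is taken from the question):
import numpy as np
# column vector 1..40000 broadcast against a row of 8 ones
data = np.arange(1, 40001)[:, None] * np.ones(8, dtype=int)
# or materialize a broadcast view without the multiplication
data = np.broadcast_to(np.arange(1, 40001)[:, None], (40000, 8)).copy()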

Here's a way using numpy:
rows = 10
cols = 3
l = np.arange(1,rows)
np.tile(l,cols).reshape(cols,rows-1).T
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7],
[8, 8, 8],
[9, 9, 9]])

Matthieu Brucher's answer will do perfectly for your case. However, if you are looking at numbers much higher than 40000 and if time is an issue, you might want to get rid of the for-loop and build a list of lists with a list comprehension before turning it into a numpy array:
a = [[i]*8 for i in range(1, 40001)]
m = np.asarray(a)
In my case, this solution was ~7 times faster.
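To check the speedup on your own machine, a quick comparison with timeit (a sketch; the exact factor depends on hardware and numpy version):
import timeit
import numpy as np

def loop_version():
    data = np.zeros((40000, 8))
    for i in range(data.shape[0]):
        data[i, :] = i + 1
    return data

def listcomp_version():
    return np.asarray([[i] * 8 for i in range(1, 40001)])

print(timeit.timeit(loop_version, number=10))      # loop over rows
print(timeit.timeit(listcomp_version, number=10))  # list comprehension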

To use numpy broadcasting instead of iterating, you can do:
import numpy as np
M = np.ones((40000, 8), dtype=int).T * np.arange(1, 40001)
M = M.T
print(M)
This should be faster than the iteration-based approaches above, if that's what you are looking for.

Very simple:
data = np.arange(1, 40001).repeat(8).reshape(-1,8)
Though this is pure numpy as well, it is considerably slower than @yatu's solution.

Related

Combinations between two arrays numpy different shapes

I know it is possible to use meshgrid to get all combinations between two arrays using numpy.
But in my case I have an array with two columns and n rows, and another 1-D array, and I would like to get all combinations of the rows of the first with the elements of the second.
For example:
a = [[1,1],
[2,2],
[3,3]]
b = [5,6]
# The expected result would be:
final_array = [[1,1,5],
[1,1,6],
[2,2,5],
[2,2,6],
[3,3,5],
[3,3,6]]
Which method is the fastest way to get this result using only numpy?
Proposed solution
OK, I got the result, but I would like to know whether this is a reliable and fast solution for the task. If someone could give me any advice, I would appreciate it.
a_t = np.tile(a, len(b)).reshape(-1,2)
b_t = np.tile(b, len(a)).reshape(1,-1)
final_array = np.hstack((a_t,b_t.T))
array([[1, 1, 5],
[1, 1, 6],
[2, 2, 5],
[2, 2, 6],
[3, 3, 5],
[3, 3, 6]])
Kind of ugly, but here's one way:
xx = np.repeat(a, len(b)).reshape(-1, a.shape[1])
yy = np.tile(b, a.shape[0])[:, None]
np.concatenate((xx, yy), axis=1)
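Both snippets assume a and b are numpy arrays; for reference, a self-contained version that repeats directly along axis 0 instead of flattening and reshaping (a sketch):
import numpy as np
a = np.array([[1, 1], [2, 2], [3, 3]])
b = np.array([5, 6])
xx = np.repeat(a, len(b), axis=0)        # each row of a repeated len(b) times
yy = np.tile(b, a.shape[0])[:, None]     # b cycled once per row of a, as a column
print(np.concatenate((xx, yy), axis=1))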

Python: Concatenate all combinations of numpy array rows

In Python I need to combine two 2-dimensional numpy arrays so that the resulting rows are all combinations of the rows from the input arrays, concatenated together. I need the fastest solution, since it will be used on very large arrays.
For example:
I got:
import numpy as np
array1 = np.array([[1,2],[3,4]])
array2 = np.array([[5,6],[7,8]])
I want the code to return:
[[1,2,5,6]
[1,2,7,8]
[3,4,5,6]
[3,4,7,8]]
Solution using numpy's repeat, tile and hstack
The snippet
result = np.hstack([
np.repeat(array1, array2.shape[0], axis=0),
np.tile(array2, (array1.shape[0], 1))
])
Step by step explanation
We start with the two arrays, array1 and array2:
import numpy as np
array1 = np.array([[1,2],[3,4]])
array2 = np.array([[5,6],[7,8]])
First, we duplicate the content of array1 using repeat:
a = np.repeat(array1, array2.shape[0], axis=0)
The content of a is:
array([[1, 2],
[1, 2],
[3, 4],
[3, 4]])
Then we repeat the second array, array2, using tile. In particular, (array1.shape[0], 1) replicates array2 array1.shape[0] times along the first axis and just once along the second.
b = np.tile(array2, (array1.shape[0],1))
The result is:
array([[5, 6],
[7, 8],
[5, 6],
[7, 8]])
Now we can just proceed to stack the two results, using hstack:
result = np.hstack([a,b])
Achieving the desired output:
array([[1, 2, 5, 6],
[1, 2, 7, 8],
[3, 4, 5, 6],
[3, 4, 7, 8]])
For this small example, itertools.product is actually faster; I don't know how it scales.
import itertools
alist = list(itertools.product(array1.tolist(), array2.tolist()))
np.array(alist).reshape(-1, 4)

How to get max (top) N values across entire numpy matrix

I want to get the top N (maximal) args & values across an entire numpy matrix, as opposed to across a single dimension (rows / columns).
Example input (with N=3):
import numpy as np
mat = np.matrix([[9,8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
print(mat)
[[9 8 1 2]
[3 7 2 5]
[0 3 6 2]
[0 2 1 5]]
Desired output: [9, 8, 7]
Since the top N values are not confined to a single row or column, taking the max along one dimension and keeping the first N doesn't work.
# by rows, no 8
np.squeeze(np.asarray(mat.max(1).reshape(-1)))[:3]
array([9, 7, 6])
# by cols, no 7
np.squeeze(np.asarray(mat.max(0)))[:3]
array([9, 8, 6])
I have code that works, but looks really clunky to me.
# reshape into single vector
mat_as_vector = np.squeeze(np.asarray(mat.reshape(-1)))
# get top 3 arg positions
top3_args = mat_as_vector.argsort()[::-1][:3]
# subset the reshaped matrix
top3_vals = mat_as_vector[top3_args]
print(top3_vals)
array([9, 8, 7])
Would appreciate any shorter way / more efficient way / magic numpy function to do this!
Using numpy.partition() is significantly faster than performing full sort for this purpose:
np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:]
assuming N <= mat.size.
If you need the final result to also be sorted (besides containing the top N), then sort that result; presumably you will be sorting a much smaller array than the original one:
np.sort(np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:])
If you need the result sorted from largest to smallest, append [::-1] to the previous command:
np.sort(np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:])[::-1]
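As a quick check with the example matrix and N = 3 (reusing mat from the question):
N = 3
top_n = np.sort(np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:])[::-1]
print(top_n)
# [9 8 7]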
One way is to flatten, sort in descending order, and slice the top N values:
sorted(mat.flatten().tolist()[0], reverse=True)[:3]
Result:
[9, 8, 7]
The idea is from this answer: How to get indices of N maximum values in a numpy array?
import numpy as np
import heapq
mat = np.matrix([[9,8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
ind = heapq.nlargest(3, range(mat.size), mat.take)
print(mat.take(ind).tolist()[0])
Output
[9, 8, 7]

Calculating Mean of arrays with different lengths

Is it possible to calculate the mean of multiple arrays, when they may have different lengths? I am using numpy. So let's say I have:
numpy.array([[1, 2, 3, 4, 8], [3, 4, 5, 6, 0]])
numpy.array([[5, 6, 7, 8, 7, 8], [7, 8, 9, 10, 11, 12]])
numpy.array([[1, 2, 3, 4], [5, 6, 7, 8]])
Now I want to calculate the mean, but ignore elements that are 'missing'. (Naturally, I cannot just pad with zeros, as this would skew the mean.)
Is there a way to do this without iterating through the arrays?
PS: these arrays are all 2-D, and within each array both rows have the same length, i.e. the 1st array is 5 and 5, the 2nd is 6 and 6, the 3rd is 4 and 4.
An example:
np.array([[1, 2], [3, 4]])
np.array([[1, 2, 3], [3, 4, 5]])
np.array([[7], [8]])
This must give
(1+1+7)/3 (2+2)/2 3/1
(3+3+8)/3 (4+4)/2 5/1
And graphically:
[1, 2] [1, 2, 3] [7]
[3, 4] [3, 4, 5] [8]
Now imagine that these 2-D arrays are placed on top of each other with coordinates overlapping contributing to that coordinate's mean.
I often needed this for plotting the mean of performance curves with different lengths.
I solved it with a simple function (based on @unutbu's answer):
def tolerant_mean(arrs):
    lens = [len(i) for i in arrs]
    arr = np.ma.empty((np.max(lens), len(arrs)))
    arr.mask = True
    for idx, l in enumerate(arrs):
        arr[:len(l), idx] = l
    return arr.mean(axis=-1), arr.std(axis=-1)
y, error = tolerant_mean(list_of_ys_diff_len)
ax.plot(np.arange(len(y)) + 1, y, color='green')
Applying that function to a list of curves of different lengths yields the pointwise mean curve (plot omitted here).
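As a quick numeric check, feeding it the first rows of the question's three example arrays (with tolerant_mean as defined above) reproduces the per-position means shown further below:
ys = [[1, 2, 3, 4, 8], [5, 6, 7, 8, 7, 8], [1, 2, 3, 4]]
y, error = tolerant_mean(ys)
print(y)   # approximately [2.33 3.33 4.33 5.33 7.5 8.0]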
numpy.ma.mean allows you to compute the mean of non-masked array elements. However, to use numpy.ma.mean, you have to first combine your three numpy arrays into one masked array:
import numpy as np
x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 2, 3], [3, 4, 5]])
z = np.array([[7], [8]])
arr = np.ma.empty((2,3,3))
arr.mask = True
arr[:x.shape[0],:x.shape[1],0] = x
arr[:y.shape[0],:y.shape[1],1] = y
arr[:z.shape[0],:z.shape[1],2] = z
print(arr.mean(axis = 2))
yields
[[3.0 2.0 3.0]
[4.66666666667 4.0 5.0]]
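An alternative to masked arrays, if NaN never occurs as real data, is to pad with np.nan and average with np.nanmean (a sketch using the same three arrays):
import numpy as np
x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 2, 3], [3, 4, 5]])
z = np.array([[7], [8]])
arr = np.full((2, 3, 3), np.nan)        # pad everything to the largest shape
arr[:x.shape[0], :x.shape[1], 0] = x
arr[:y.shape[0], :y.shape[1], 1] = y
arr[:z.shape[0], :z.shape[1], 2] = z
print(np.nanmean(arr, axis=2))          # NaN entries are ignored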
The function below also works; it averages each index position across nested lists of different lengths:
def avgNestedLists(nested_vals):
    """
    Averages a 2-D array and returns a 1-D array of all of the columns
    averaged together, regardless of their dimensions.
    """
    output = []
    maximum = 0
    for lst in nested_vals:
        if len(lst) > maximum:
            maximum = len(lst)
    for index in range(maximum):        # go through each index of the longest list
        temp = []
        for lst in nested_vals:         # go through each list
            if index < len(lst):        # if not an index error
                temp.append(lst[index])
        output.append(np.nanmean(temp))
    return output
Going off of your first example:
avgNestedLists([[1, 2, 3, 4, 8], [5, 6, 7, 8, 7, 8], [1, 2, 3, 4]])
Outputs:
[2.3333333333333335,
3.3333333333333335,
4.333333333333333,
5.333333333333333,
7.5,
8.0]
The reason np.amax(nested_lst) or np.max(nested_lst) was not used at the beginning to find the maximum is that it returns an array rather than a scalar when the nested lists have different sizes.
OP, I know you were looking for a non-iterative built-in solution, but the following really only takes 3 lines (2 if you combine transpose and means but then it just gets messy):
arrays = [
    np.array([[1, 2], [3, 4]]),
    np.array([[1, 2, 3], [3, 4, 5]]),
    np.array([[7], [8]])
]
mean = lambda x: sum(x) / float(len(x))
transpose = [[item[i] for item in arrays] for i in range(len(arrays[0]))]
means = [[mean([j[i] for j in t if i < len(j)]) for i in range(len(max(t, key=len)))] for t in transpose]
Outputs:
>>>means
[[3.0, 2.0, 3.0], [4.666666666666667, 4.0, 5.0]]

Numpy - add row to array

How does one add rows to a numpy array?
I have an array A:
A = array([[0, 1, 2], [0, 2, 0]])
I wish to add rows to this array from another array X if the first element of each row in X meets a specific condition.
Numpy arrays do not have a method 'append' like that of lists, or so it seems.
If A and X were lists I would merely do:
for i in X:
    if i[0] < 3:
        A.append(i)
Is there a numpythonic way to do the equivalent?
Thanks,
S ;-)
You can do this:
newrow = [1, 2, 3]
A = numpy.vstack([A, newrow])
What is X? If it is a 2D-array, how can you then compare its row to a number: i < 3?
EDIT after OP's comment:
A = array([[0, 1, 2], [0, 2, 0]])
X = array([[0, 1, 2], [1, 2, 0], [2, 1, 2], [3, 2, 0]])
Add to A all the rows from X where the first element is < 3:
import numpy as np
A = np.vstack((A, X[X[:,0] < 3]))
# returns:
array([[0, 1, 2],
[0, 2, 0],
[0, 1, 2],
[1, 2, 0],
[2, 1, 2]])
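The key step is the boolean mask X[:, 0] < 3, which selects the rows of X whose first element is below 3; a quick illustration with the X above:
mask = X[:, 0] < 3
print(mask)      # [ True  True  True False]
print(X[mask])   # the first three rows of X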
This question is seven years old, but with the latest version I am using (numpy 1.13, Python 3) I do the same thing by appending a row to a matrix. Remember to wrap the new row in double brackets as the second argument, otherwise it will raise a dimension error.
Here I am appending the row [7, 8, 9] to the matrix A = [[1, 2, 3], [4, 5, 6]]; np.r_ has the same usage:
A = [[1, 2, 3], [4, 5, 6]]
np.append(A, [[7, 8, 9]], axis=0)
>> array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
#or
np.r_[A,[[7,8,9]]]
In case someone is interested, if you would like to add a column instead:
array = np.c_[A, np.zeros(len(A))]   # appends a column of zeros; len(A) is A's row count
Following what we did before on matrix A, adding the column [2, 8] to it:
np.c_[A, [2,8]]
>> array([[1, 2, 3, 2],
[4, 5, 6, 8]])
If you want to prepend, you can just flip the order of the arguments, i.e.:
np.r_[[[7, 8, 9]], A]
>> array([[7, 8, 9],
[1, 2, 3],
[4, 5, 6]])
If no calculations are necessary after every row, it's much quicker to add rows in python, then convert to numpy. Here are timing tests using python 3.6 vs. numpy 1.14, adding 100 rows, one at a time:
import numpy as np
from time import perf_counter, sleep
def time_it():
    # Compare performance of two methods for adding rows to a numpy array
    py_array = [[0, 1, 2], [0, 2, 0]]
    py_row = [4, 5, 6]
    numpy_array = np.array(py_array)
    numpy_row = np.array([4, 5, 6])
    n_loops = 100

    start_clock = perf_counter()
    for count in range(0, n_loops):
        numpy_array = np.vstack([numpy_array, numpy_row])  # 5.8 micros
    duration = perf_counter() - start_clock
    print('numpy 1.14 takes {:.3f} micros per row'.format(duration * 1e6 / n_loops))

    start_clock = perf_counter()
    for count in range(0, n_loops):
        py_array.append(py_row)          # .15 micros
    numpy_array = np.array(py_array)     # 43.9 micros, done once after the loop
    duration = perf_counter() - start_clock
    print('python 3.6 takes {:.3f} micros per row'.format(duration * 1e6 / n_loops))
    sleep(15)
#time_it() prints:
numpy 1.14 takes 5.971 micros per row
python 3.6 takes 0.694 micros per row
So the simple solution to the original question, from seven years ago, is to use vstack() to add a new row after converting the row to a numpy array. But a more realistic solution should consider vstack's poor performance under these circumstances: if you don't need to run data analysis on the array after every addition, it is better to buffer the new rows in a Python list of rows (a list of lists, really) and add them as a group to the numpy array with vstack() before doing any data analysis.
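A minimal sketch of that buffering pattern (illustrative names; it mirrors the second half of the timing code above):
import numpy as np
buffered_rows = [[0, 1, 2], [0, 2, 0]]         # start from the existing rows
for i in range(100):
    buffered_rows.append([i, i + 1, i + 2])    # cheap list appends
data = np.vstack(buffered_rows)                # convert once, before analysis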
You can also use concatenate, as long as the new row is itself 2-D:
newrow = [[1, 2, 3]]
A = numpy.concatenate((A, newrow), axis=0)
import numpy as np
array_ = np.array([[1,2,3]])
add_row = np.array([[4,5,6]])
array_ = np.concatenate((array_, add_row), axis=0)
I use np.vstack, which is fast. For example:
import numpy as np
input_array=np.array([1,2,3])
new_row= np.array([4,5,6])
new_array=np.vstack([input_array, new_row])
I use numpy.insert(arr, i, the_object_to_be_added, axis) in order to insert the_object_to_be_added at the i-th row (axis=0) or column (axis=1):
import numpy as np
a = np.array([[1, 2, 3], [5, 4, 6]])
# array([[1, 2, 3],
# [5, 4, 6]])
np.insert(a, 1, [55, 66], axis=1)
# array([[ 1, 55, 2, 3],
# [ 5, 66, 4, 6]])
np.insert(a, 2, [50, 60, 70], axis=0)
# array([[ 1, 2, 3],
# [ 5, 4, 6],
# [50, 60, 70]])
This is an old discussion, but I hope it helps someone.
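To append (rather than insert) a row at the end with the same function, pass the current number of rows as the index (a small sketch reusing a from above):
np.insert(a, a.shape[0], [50, 60, 70], axis=0)
# array([[ 1,  2,  3],
#        [ 5,  4,  6],
#        [50, 60, 70]])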
If you can do the construction in a single operation, then something like the vstack-with-fancy-indexing answer is a fine approach. But if your condition is more complicated or your rows come in on the fly, you may want to grow the array. In fact the numpythonic way to do something like this - dynamically grow an array - is to dynamically grow a list:
A = np.array([[1, 2, 3], [4, 5, 6]])
Alist = [r for r in A]
for i in range(100):
    newrow = np.arange(3) + i
    if i % 5:
        Alist.append(newrow)
A = np.array(Alist)
del Alist
Lists are highly optimized for this kind of access pattern; you don't have convenient numpy multidimensional indexing while in list form, but for as long as you're appending it's hard to do better than a list of row arrays.
You can use numpy.append() to append a row to a numpy array and reshape it to a matrix later on:
import numpy as np
a = np.array([1, 2])
a = np.append(a, [3, 4])
print(a)
# [1 2 3 4]
# in your example
A = [1, 2]
for row in X:
    A = np.append(A, row)
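As mentioned above, a final reshape restores the 2-D shape once everything has been appended; for the first example (a sketch, assuming two columns per row):
print(a.reshape(-1, 2))
# [[1 2]
#  [3 4]]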
