Use numpy to stack combinations of a 1D and 2D array - python

I have 2 numpy arrays, one 2D and the other 1D, for example like this:
import numpy as np
a = np.array(
[
[1, 2],
[3, 4],
[5, 6]
]
)
b = np.array(
[7, 8, 9, 10]
)
I want to get all possible combinations of the elements in a and b, treating a like a 1D array, so that it leaves the rows in a intact, but also joins the rows in a with the items in b. It would look something like this:
>>> combine1d(a, b)
[ [1 2 7] [1 2 8] [1 2 9] [1 2 10]
[3 4 7] [3 4 8] [3 4 9] [3 4 10]
[5 6 7] [5 6 8] [5 6 9] [5 6 10] ]
I know that there are slow solutions for this (like a for loop), but I need a fast solution to this as I am working with datasets with millions of integers.
Any ideas?

This is one of those cases where it's easier to build a higher dimensional object, and then fix the axes when you're done. The first two dimensions are the length of b and the length of a. The third dimension is the number of elements in each row of a plus 1. We can then use broadcasting to fill in this array.
x, y = a.shape
z, = b.shape
result = np.empty((z, x, y + 1))
result[...,:y] = a
result[...,y] = b[:,None]
At this point, to get the exact answer you asked for, you'll need to swap the first two axes, and then merge those two axes into a single axis.
result.swapaxes(0, 1).reshape(-1, y + 1)
An hour later...
I realized by being a little bit more clever, I didn't need to swap axes. This also has the nice benefit that the result is a contiguous array.
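def convert1d(a, b):
    x, y = a.shape  # x rows in a, each of length y
    z, = b.shape    # z values in b
    result = np.empty((x, z, y + 1))
    result[...,:y] = a[:,None,:]  # broadcast each row of a across the z axis
    result[...,y] = b             # broadcast b across the x axis
    return result.reshape(-1, y + 1)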
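For comparison, here is a minimal sketch that builds the same rows from `np.repeat` and `np.tile` instead of a preallocated 3-D array. This is a swapped-in technique, not the method above, and the name `combine_repeat_tile` is hypothetical; it assumes `a` is 2-D and `b` is 1-D.

```python
import numpy as np

def combine_repeat_tile(a, b):
    """Pair every row of the 2-D array a with every element of the 1-D array b."""
    rows = np.repeat(a, len(b), axis=0)   # each row of a repeated len(b) times
    cols = np.tile(b, len(a))             # b cycled once per row of a
    return np.column_stack([rows, cols])  # append b's values as the last column

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([7, 8, 9, 10])
print(combine_repeat_tile(a, b))
```

`convert1d` allocates with `np.empty`, whose default dtype is float64, so integer inputs come back as floats; this version keeps the inputs' common dtype, at the cost of two intermediate arrays.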

This is a very "scotch tape" solution:
import numpy as np
a = np.array(
[
[1, 2],
[3, 4],
[5, 6]
]
)
b = np.array(
[7, 8, 9, 10]
)
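z = []
for y in a:      # outer loop over rows of a
    for x in b:  # inner loop over values of b
        z.append(np.append(y, x))
np.array(z).reshape(3, 4, 3)  # (rows of a, values of b, row length + 1)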

You can use np.c_ to join two arrays column-wise. I also used np.full to generate a column filled with each value of the second array (b). The result looks like this:
result = [np.c_[a, np.full((a.shape[0],1), x)] for x in b]
result
Output
[array([[1, 2, 7],
[3, 4, 7],
[5, 6, 7]]),
array([[1, 2, 8],
[3, 4, 8],
[5, 6, 8]]),
array([[1, 2, 9],
[3, 4, 9],
[5, 6, 9]]),
array([[ 1, 2, 10],
[ 3, 4, 10],
[ 5, 6, 10]])]
The output might look a bit messy, but it contains every combination you asked for, grouped by the value taken from b rather than by row of a. To check, you can run the following to see the first element of the result list:
print(result[0])
Output
[[1 2 7]
 [3 4 7]
 [5 6 7]]
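The list of blocks is grouped by the value taken from `b`. If you need a single 2-D array instead, a sketch under that assumption: concatenate the blocks for the b-grouped order, or stack them along a new axis 1 and merge the first two axes for the row order from the question (grouped by row of `a`).

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([7, 8, 9, 10])

# One (3, 3) block per value of b, as in the answer
result = [np.c_[a, np.full((a.shape[0], 1), x)] for x in b]

# Grouped by b: concatenate the blocks vertically -> shape (12, 3)
by_b = np.concatenate(result)

# Grouped by row of a: stack on axis 1 -> shape (3, 4, 3), then merge the first two axes
by_a = np.stack(result, axis=1).reshape(-1, a.shape[1] + 1)
print(by_a)
```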

Related

Elements overlapping rows and columns

Question:
Create an array x of shape (n_row, n_col), containing the first N natural numbers.
N = 30, n_row= 6, n_col=5
Print elements, overlapping first two rows and last three columns.
Expected output:
[[2 3 4]
[7 8 9]]
My output:
[2 3 7 8]
My approach:
x = np.arange(N)
x = x.reshape(n_row, n_col)
a = np.intersect1d(x[0:2], x[:, -3:-1])
print(a)
I couldn't think of anything else, please help
The overlap of row and column slices of the same array is just the combined slice:
import numpy as np
x = np.arange(30).reshape(6, 5)
x[:2,-3:]
Output
array([[2, 3, 4],
[7, 8, 9]])
Computing the overlap by finding common elements is odd, but possible:
r, c = np.where(np.isin(x, np.intersect1d(x[:2], x[:,-3:])))
x[np.ix_(np.unique(r), np.unique(c))]
Output
array([[2, 3, 4],
[7, 8, 9]])
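When the rows and columns you want are not contiguous ranges, a single slice can no longer express them, but `np.ix_` still selects the overlap directly from index lists. A minimal sketch; the index lists `rows`, `cols`, `rows_sparse` and `cols_sparse` are illustrative:

```python
import numpy as np

x = np.arange(30).reshape(6, 5)

# Contiguous case: first two rows, last three columns (a view)
block = x[:2, -3:]

# Same block from explicit index lists via the open mesh from np.ix_ (a copy)
rows = [0, 1]
cols = [2, 3, 4]
same_block = x[np.ix_(rows, cols)]

# Non-contiguous case that one slice cannot express
rows_sparse = [0, 2, 5]
cols_sparse = [1, 4]
sparse_block = x[np.ix_(rows_sparse, cols_sparse)]
print(sparse_block)
```

Note that `np.ix_` indexing is advanced indexing, so it returns a copy, whereas the plain slice returns a view of `x`.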
I think the answers are a bit convoluted...
Going back to the original question:
Question: Create an array x of shape (n_row, n_col), containing the first N natural numbers. N = 30, n_row= 6, n_col=5
Print elements, overlapping first two rows and last three columns.
I read this as plain sub-indexing (slicing):
N, n_rows, n_cols = 30, 6, 5
a = np.arange(N).reshape(n_rows, n_cols)
print(a[:2, -3:])
Output:
[[2 3 4]
 [7 8 9]]

Appending contents of 1D numpy array to another 2D numpy array

I have three numpy arrays. The shape of the first is (413, 2), the shape of the second is (176, 2), and the shape of the third is (589,). If you'll notice, 413 + 176 = 589. What I want to accomplish is to use the 589 values of the third np array and make the first two arrays of shapes (413, 3) and (176, 3) respectively.
So, what I want is to take the values in the third np array and append them to the columns of the first and second np arrays. I can do the logic for applying to the first and then using the offset of the length of the first to continue appending to the second with the correct values. I suppose I could also combine np arrays 1 and 2, they are separated for a reason though because of my data preprocessing.
To put it visually if that helps, what I have is like this:
Array 1:
[[1 2]
[3 4]
[4 5]]
Array 2:
[[6 7]
[8 9]
[10 11]]
Array 3:
[1 2 3 4 5 6]
And what I want to have is:
Array 1:
[[1 2 1]
[3 4 2]
[4 5 3]]
Array 2:
[[6 7 4]
[8 9 5]
[10 11 6]]
I've tried using np.append, np.concatenate, and np.vstack but have not been able to achieve what I am looking for. I am relatively new to using numpy, and Python in general, so I imagine I am just using these tools incorrectly.
Many thanks for any help that can be offered! This is my first time asking a question here so if I did anything wrong or left anything out please let me know.
Split the third array using the length of array1, then horizontally stack them. You need to use either np.newaxis or array.reshape to change the dimensionality of the slice of array3.
import numpy as np
array1 = np.array(
[[1, 2],
[3, 4],
[4, 5]]
)
array2 = np.array(
[[6, 7],
[8, 9],
[10, 11]]
)
array3 = np.array([1, 2, 3, 4, 5, 6])
array13 = np.hstack([array1, array3[:len(array1), np.newaxis]])
array23 = np.hstack([array2, array3[len(array1):, np.newaxis]])
Outputs:
array13
array([[1, 2, 1],
       [3, 4, 2],
       [4, 5, 3]])
array23
array([[ 6, 7, 4],
[ 8, 9, 5],
[10, 11, 6]])
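A sketch of the same idea using `np.split` to cut the third array at the length of the first, and `np.column_stack`, which treats 1-D inputs as columns so no `np.newaxis` is needed. This is an alternative formulation, not the answer's exact code:

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4], [4, 5]])
array2 = np.array([[6, 7], [8, 9], [10, 11]])
array3 = np.array([1, 2, 3, 4, 5, 6])

# Cut array3 into the part that belongs to array1 and the remainder
part1, part2 = np.split(array3, [len(array1)])

# column_stack turns the 1-D parts into columns
array13 = np.column_stack([array1, part1])
array23 = np.column_stack([array2, part2])
print(array13)
print(array23)
```

With the real data, the same code should give shapes (413, 3) and (176, 3), provided `len(array3)` equals 413 + 176.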

Slice an array into segments

Suppose I have an array [1,2,3,4,5,6,7,8], and the array is composed of two samples [1,2,3,4], and [5,6,7,8]. For each sample, I want to apply a sliding window with window size n. If there are not enough elements, pad the result with the last element. Each row in the return value should be the window starting from the element in that row.
For example:
if n=3, then the result should be:
[[1,2,3],
[2,3,4],
[3,4,4],
[4,4,4],
[5,6,7],
[6,7,8],
[7,8,8],
[8,8,8]]
How can I achieve this with efficient slicing instead of a for loop? Thanks.
A similar approach to @hpaulj's, using some NumPy built-in functionality:
import numpy as np
samples = [[1,2,3,4],[5,6,7,8]]
ws = 3 #window size
# add padding
samples = [s + [s[-1]]*(ws-1) for s in samples]
# rolling window function for arrays
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1]-window+1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
result = sum([rolling_window(np.array(s), ws).tolist() for s in samples ], [])
result
[[1, 2, 3],
[2, 3, 4],
[3, 4, 4],
[4, 4, 4],
[5, 6, 7],
[6, 7, 8],
[7, 8, 8],
[8, 8, 8]]
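On NumPy 1.20 or later, both steps of the answer above can be done by NumPy itself: `np.pad` with `mode="edge"` repeats the last element, and `np.lib.stride_tricks.sliding_window_view` wraps the `as_strided` trick safely (it returns a read-only view). A sketch assuming all samples have the same length so they fit in one 2-D array:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

samples = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
ws = 3  # window size

# Repeat each sample's last element ws - 1 times on the right
padded = np.pad(samples, ((0, 0), (0, ws - 1)), mode="edge")

# Windows along each row: shape (n_samples, sample_len, ws)
windows = sliding_window_view(padded, ws, axis=1)

# One window per row, as in the question
result = windows.reshape(-1, ws)
print(result)
```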
A python list approach:
In [201]: order = [1,3,2,3,5,8]
In [202]: samples = [[1,2,3,4],[5,6,7,8]]
Expand samples to take care of the padding issue (here n = 3, the window size):
In [203]: samples = [row+([row[-1]]*n) for row in samples]
In [204]: samples
Out[204]: [[1, 2, 3, 4, 4, 4, 4], [5, 6, 7, 8, 8, 8, 8]]
define a function:
def foo(i, samples):
    for row in samples:
        try:
            j = row.index(i)
        except ValueError:
            continue
        return row[j:j+n]
In [207]: foo(3,samples)
Out[207]: [3, 4, 4]
In [208]: foo(9,samples) # non-found case isn't handled well
for all the order elements:
In [209]: [foo(i,samples) for i in order]
Out[209]: [[1, 2, 3], [3, 4, 4], [2, 3, 4], [3, 4, 4], [5, 6, 7], [8, 8, 8]]
Here is a simple one-liner:
import numpy as np
samples = np.array([[1,2,3,4],[5,6,7,8]])
n,d = samples.shape
ws = 3
result = samples[:,np.minimum(np.arange(d)[:,None]+np.arange(ws)[None,:],d-1)]
The output is:
[[[1 2 3]
  [2 3 4]
  [3 4 4]
  [4 4 4]]
 [[5 6 7]
  [6 7 8]
  [7 8 8]
  [8 8 8]]]
No loop, only broadcasting, which makes this probably the most efficient way of doing it. The shape of the output is not exactly what you asked for, but it is easy to correct with a simple np.reshape.
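To get the exact 2-D layout from the question, the `(n, d, ws)` result only needs its first two axes merged. A sketch reusing the names from the one-liner; `idx` is just the index array pulled out for readability:

```python
import numpy as np

samples = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
n, d = samples.shape
ws = 3

# idx[i, k] = min(i + k, d - 1): window start i, offset k, clipped at the last element
idx = np.minimum(np.arange(d)[:, None] + np.arange(ws)[None, :], d - 1)

# Fancy indexing along axis 1 gives shape (n, d, ws); merge the first two axes
result = samples[:, idx].reshape(-1, ws)
print(result)
```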

index 3 is out of bounds for axis 0 with size 3 (python)

So I looked it up a couple of times and I just can't figure it out, so I think it would be best to ask for some help. Here is the deal: t0 is a [1000,1], x is a [1000,1] and y is a [1000,1], m=1000, suma1=0, and every time I run it I get this stupid error:
index 3 is out of bounds for axis 0 with size 3
for i in range(m):
    suma1+=((t0[i] + x[i]- y[i])**2)
Note that t0 is a [1000,1], x is a [1000,1] and y is a [1000,1]. This means that each of these values is a two-dimensional list. However, the code that you are showing appears to expect one-dimensional lists whose values are summed.
If t0, x, y are two-dimensional lists, then you are concatenating the lists and not adding the values:
t0 = [[1, 2, 3, 4], 1], [[9, 8, 7, 6], 2]
x = [[8, 7, 4, 7], 3], [[6, 4, 2, 8], 4]
t0[1] + x[1]  # -> [[9, 8, 7, 6], 2, [6, 4, 2, 8], 4]
To do the arithmetic you need one-dimensional arrays such as:
t0 = [1, 2, 3, 4, 5]
x = [8, 7, 4, 7, 1]
y = [3, 7, 9, 4, 8]
You need to show more code and show sample values (using m with a range of 10)
Also run your code as individual lines
suma1 = 0
for i in range(m):
    suma1 += ((t0[i] + x[i]- y[i])**2)
Are x, y and t0 1D numpy arrays, or something else?
Try to find out how to index those arrays (or lists) correctly. Remember that indexing starts at 0, so for a (1000, 1) array the last element is t0[999, 0], while for a (1, 1000) array it is t0[0, 999].
Edit:
In that case I would try using:
suma1 += ((t0[i, 0] + x[i, 0] - y[i, 0])**2)
or
suma1 += ((t0[0, i] + x[0, i] - y[0, i])**2)
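Two things can be checked directly. The error most likely means that one of the arrays has only 3 entries along axis 0 while the loop runs up to `m`, so printing each array's `.shape` is a quick diagnosis. And if the arrays really are `(1000, 1)`, the loop is unnecessary. A sketch with made-up random data standing in for `t0`, `x` and `y` (the values and the array `short` are hypothetical):

```python
import numpy as np

m = 1000
rng = np.random.default_rng(0)
# Hypothetical stand-ins with the shapes described in the question
t0 = rng.random((m, 1))
x = rng.random((m, 1))
y = rng.random((m, 1))

# Loop version: row i, column 0 gives a scalar
suma1 = 0.0
for i in range(m):
    suma1 += (t0[i, 0] + x[i, 0] - y[i, 0]) ** 2

# Vectorized version: elementwise arithmetic, then one sum over all entries
suma1_vec = np.sum((t0 + x - y) ** 2)

# Reproducing the reported error: an array with only 3 rows indexed at 3
short = np.zeros((3, 1))
try:
    short[3]
except IndexError as err:
    print(err)  # index 3 is out of bounds for axis 0 with size 3
```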

Calculating Mean of arrays with different lengths

Is it possible to calculate the mean of multiple arrays, when they may have different lengths? I am using numpy. So let's say I have:
numpy.array([[1, 2, 3, 4, 8], [3, 4, 5, 6, 0]])
numpy.array([[5, 6, 7, 8, 7, 8], [7, 8, 9, 10, 11, 12]])
numpy.array([[1, 2, 3, 4], [5, 6, 7, 8]])
Now I want to calculate the mean, but ignoring elements that are 'missing' (naturally, I cannot just append zeros, as this would mess up the mean).
Is there a way to do this without iterating through the arrays?
PS. These arrays are all 2-D, but both rows of any one array always have the same length, i.e. the 1st array is 5 and 5, the 2nd is 6 and 6, the 3rd is 4 and 4.
An example:
np.array([[1, 2], [3, 4]])
np.array([[1, 2, 3], [3, 4, 5]])
np.array([[7], [8]])
This must give
(1+1+7)/3 (2+2)/2 3/1
(3+3+8)/3 (4+4)/2 5/1
And graphically:
[1, 2] [1, 2, 3] [7]
[3, 4] [3, 4, 5] [8]
Now imagine that these 2-D arrays are placed on top of each other with coordinates overlapping contributing to that coordinate's mean.
I often needed this for plotting the mean of performance curves with different lengths.
I solved it with a simple function (based on @unutbu's answer):
def tolerant_mean(arrs):
    lens = [len(i) for i in arrs]
    arr = np.ma.empty((np.max(lens),len(arrs)))
    arr.mask = True
    for idx, l in enumerate(arrs):
        arr[:len(l),idx] = l
    return arr.mean(axis = -1), arr.std(axis=-1)
y, error = tolerant_mean(list_of_ys_diff_len)
ax.plot(np.arange(len(y))+1, y, color='green')
So applying that function to the list of above-plotted curves yields the following:
numpy.ma.mean allows you to compute the mean of non-masked array elements. However, to use numpy.ma.mean, you have to first combine your three numpy arrays into one masked array:
import numpy as np
x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 2, 3], [3, 4, 5]])
z = np.array([[7], [8]])
arr = np.ma.empty((2,3,3))
arr.mask = True
arr[:x.shape[0],:x.shape[1],0] = x
arr[:y.shape[0],:y.shape[1],1] = y
arr[:z.shape[0],:z.shape[1],2] = z
print(arr.mean(axis = 2))
yields
[[3.0 2.0 3.0]
[4.66666666667 4.0 5.0]]
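An alternative to the masked array, assuming the results may be floats so that NaN can mark missing cells, is to pad every array with `np.nan` to a common shape and let `np.nanmean` skip the padding. A sketch on the same three arrays; the helper name `pad_to` is hypothetical:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 2, 3], [3, 4, 5]])
z = np.array([[7], [8]])
arrays = [x, y, z]

def pad_to(arr, shape):
    """Copy arr into the top-left corner of a NaN-filled float array of the given shape."""
    out = np.full(shape, np.nan)
    out[:arr.shape[0], :arr.shape[1]] = arr
    return out

# Common shape: the largest extent along each axis
shape = tuple(max(a.shape[k] for a in arrays) for k in range(2))
stacked = np.stack([pad_to(a, shape) for a in arrays])  # shape (3, 2, 3)

# Mean over the stacking axis, ignoring the NaN padding
means = np.nanmean(stacked, axis=0)
print(means)
```

If some cell were missing in every array, `np.nanmean` would return NaN there and emit a RuntimeWarning.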
The function below also works, by averaging position-wise across lists of different lengths:
def avgNestedLists(nested_vals):
    """
    Averages a 2-D array and returns a 1-D array of all of the columns
    averaged together, regardless of their dimensions.
    """
    output = []
    maximum = 0
    for lst in nested_vals:
        if len(lst) > maximum:
            maximum = len(lst)
    for index in range(maximum): # Go through each index of longest list
        temp = []
        for lst in nested_vals: # Go through each list
            if index < len(lst): # If not an index error
                temp.append(lst[index])
        output.append(np.nanmean(temp))
    return output
Using the first row of each array from your first example:
avgNestedLists([[1, 2, 3, 4, 8], [5, 6, 7, 8, 7, 8], [1, 2, 3, 4]])
Outputs:
[2.3333333333333335,
3.3333333333333335,
4.333333333333333,
5.333333333333333,
7.5,
8.0]
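The same position-wise average can be written more compactly with `itertools.zip_longest`, which pads the shorter lists with a fill value; using `np.nan` as that value lets `np.nanmean` ignore the padding. The name `avg_columns` is hypothetical:

```python
from itertools import zip_longest

import numpy as np

def avg_columns(nested_vals):
    """Average position-wise across lists of different lengths, skipping missing positions."""
    return [float(np.nanmean(col)) for col in zip_longest(*nested_vals, fillvalue=np.nan)]

print(avg_columns([[1, 2, 3, 4, 8], [5, 6, 7, 8, 7, 8], [1, 2, 3, 4]]))
```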
The reason np.amax(nested_lst) or np.max(nested_lst) is not used at the beginning to find the maximum length is that NumPy cannot treat nested lists of different sizes as a regular array.
OP, I know you were looking for a non-iterative built-in solution, but the following really only takes 3 lines (2 if you combine transpose and means but then it just gets messy):
arrays = [
    np.array([[1,2], [3,4]]),
    np.array([[1,2,3], [3,4,5]]),
    np.array([[7], [8]])
]
mean = lambda x: sum(x)/float(len(x))
# transpose[i] holds row i of every array
transpose = [[item[i] for item in arrays] for i in range(len(arrays[0]))]
# mean() needs a list here, because a generator has no len()
means = [[mean([j[i] for j in t if i < len(j)]) for i in range(len(max(t, key = len)))] for t in transpose]
Outputs:
>>> means
[[3.0, 2.0, 3.0], [4.666666666666667, 4.0, 5.0]]
