combining 2 numpy arrays - python

I have 2 numpy arrays:
one of shape (753,8,1) denoting 8 sequential actions of a customer
and other of shape (753,10) denoting 10 features of a training sample.
How can I combine these two such that:
all 10 features are appended to each of the 8 sequential actions of a training sample, that is, the combined final array should have shape (753,8,11).

Maybe something like this:
import numpy as np
# create dummy arrays
a = np.zeros((753, 8, 1))
b = np.arange(753*10).reshape(753, 10)
# make a new axis for b and repeat the values along axis 1
c = np.repeat(b[:, np.newaxis, :], 8, axis=1)
c.shape
# (753, 8, 10)
# now the first two axes of a and c have the same shape
# append the values in c to a along the last axis
result = np.append(a, c, axis=2)
result.shape
# (753, 8, 11)
result[0]
# array([[0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
#        [0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
#        [0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
#        [0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
#        [0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
#        [0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
#        [0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
#        [0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]])
# values from b (0-9) have been appended to a (0)
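A variant of the same idea, sketched under the assumption that the repeated copy of b never needs to be written to: np.broadcast_to presents b as a read-only (753, 8, 10) view without materialising the eight copies, and np.concatenate then builds the (753, 8, 11) result. The dummy arrays are the same stand-ins as above.

```python
import numpy as np

a = np.zeros((753, 8, 1))                  # 8 sequential actions per sample
b = np.arange(753 * 10).reshape(753, 10)   # 10 features per sample

# View b as (753, 1, 10), then broadcast it to (753, 8, 10) without copying
b_rep = np.broadcast_to(b[:, None, :], (b.shape[0], a.shape[1], b.shape[1]))

# Join along the last axis; the int features are upcast to float64 like a
result = np.concatenate([a, b_rep], axis=2)
print(result.shape)  # (753, 8, 11)
```

np.concatenate still allocates the full result, so the saving is only the intermediate repeated array that np.repeat creates.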

Related

Python - Concatenate or stack more than two arrays with different shape

I would like to get an array of size 11x11 made of different subarrays, for example the array M composed of the following arrays (shapes in parentheses):
CC(3x3) CA(3x4) CB(3x4)
AC(4x3) AA(4x4) AB(4x4)
BC(4x3) BA(4x4) BB(4x4)
I could use concatenate, but it is not optimal. I also tried the stack function, but it requires all arrays to have the same shape. Do you have any ideas on how to do it?
Thanks a lot!
You want np.block(). It builds an array out of 'blocks', like the ones you have. For example:
>>> CC = 1*np.ones((3, 3))
>>> CA = 2*np.ones((3, 4))
>>> CB = 3*np.ones((3, 4))
>>> AC = 4*np.ones((4, 3))
>>> AA = 5*np.ones((4, 4))
>>> AB = 6*np.ones((4, 4))
>>> BC = 7*np.ones((4, 3))
>>> BA = 8*np.ones((4, 4))
>>> BB = 9*np.ones((4, 4))
>>> M = np.block([[CC, CA, CB],
[AC, AA, AB],
[BC, BA, BB]])
>>> M
array([[ 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.],
[ 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.],
[ 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.],
[ 4., 4., 4., 5., 5., 5., 5., 6., 6., 6., 6.],
[ 4., 4., 4., 5., 5., 5., 5., 6., 6., 6., 6.],
[ 4., 4., 4., 5., 5., 5., 5., 6., 6., 6., 6.],
[ 4., 4., 4., 5., 5., 5., 5., 6., 6., 6., 6.],
[ 7., 7., 7., 8., 8., 8., 8., 9., 9., 9., 9.],
[ 7., 7., 7., 8., 8., 8., 8., 9., 9., 9., 9.],
[ 7., 7., 7., 8., 8., 8., 8., 9., 9., 9., 9.],
[ 7., 7., 7., 8., 8., 8., 8., 9., 9., 9., 9.]])
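For 2-D blocks, np.block concatenates each inner list along the last axis and then joins the resulting rows along the first axis. The sketch below rebuilds M that way with np.concatenate, using the same CC through BB stand-ins, which shows why only the heights within each row of blocks and the total width of each row have to agree.

```python
import numpy as np

# Same stand-in blocks as in the answer above
CC, CA, CB = 1*np.ones((3, 3)), 2*np.ones((3, 4)), 3*np.ones((3, 4))
AC, AA, AB = 4*np.ones((4, 3)), 5*np.ones((4, 4)), 6*np.ones((4, 4))
BC, BA, BB = 7*np.ones((4, 3)), 8*np.ones((4, 4)), 9*np.ones((4, 4))
layout = [[CC, CA, CB],
          [AC, AA, AB],
          [BC, BA, BB]]

M_block = np.block(layout)

# Manual equivalent: join each row of blocks side by side, then stack the rows
rows = [np.concatenate(row, axis=1) for row in layout]  # shapes (3, 11), (4, 11), (4, 11)
M_manual = np.concatenate(rows, axis=0)                 # shape (11, 11)

print(np.array_equal(M_block, M_manual))  # True
```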

How to make a loop for going through the input variables in python?

I want to modify this code, perhaps with some sort of loop that goes through the input variables, so that I don't have to write out each 'if data1 is not None' check. I was also wondering whether the function can take a variable number of inputs: for example, if I want to pass in 100 different data sets, I can't list all of them as parameters of the function. What should I do?
Also, how can I put a title on both of the plots? When I use plt.title(), only the last title shows.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(4)
randomSet = np.random.randint(0, 2, (10, 20))
np.random.seed(3)
randomSet3 = np.random.randint(0, 2, (10, 20))
np.random.seed(2)
randomSet2 = np.random.randint(0, 2, (10, 20))
np.random.seed(1)
randomSet1 = np.random.randint(0, 2, (10, 20))
def showResult(data, data1 = None, data2 = None, data3 = None, data4 = None, data5 = None, nscan = 1):
    #index = 0
    total = np.zeros(data.shape[0]*data.shape[1])
    dataList = [data.reshape(data.shape[0]*data.shape[1])]
    if data1 is not None:
        dataList.append(data1.reshape(data1.shape[0]*data1.shape[1]))
    if data2 is not None:
        dataList.append(data2.reshape(data2.shape[0]*data2.shape[1]))
    if data3 is not None:
        dataList.append(data3.reshape(data3.shape[0]*data3.shape[1]))
    if data4 is not None:
        dataList.append(data4.reshape(data4.shape[0]*data4.shape[1]))
    if data5 is not None:
        dataList.append(data5.reshape(data5.shape[0]*data5.shape[1]))
    #total = copy.copy(data)
    for i in range(nscan):
        total += dataList[i]
    fig = plt.figure(figsize = (8, 10))
    ax1 = fig.add_subplot(211)
    ax2 = fig.add_subplot(212)
    ax1.imshow(total.reshape(data.shape[0], data.shape[1]), cmap= 'gray', interpolation= 'nearest')
    #plt.title('Image')
    ax2.hist(total)
    #plt.title('Histogram')
    plt.show()
    return total
showResult(randomSet, randomSet1, randomSet, randomSet3, randomSet, randomSet2, nscan= 6)
Output should be:
array([ 1., 2., 5., 4., 4., 2., 4., 3., 2., 5., 0., 3., 5.,
6., 2., 5., 5., 5., 0., 0., 0., 2., 2., 1., 2., 0.,
4., 0., 5., 4., 4., 4., 1., 6., 2., 1., 3., 1., 4.,
1., 2., 4., 1., 3., 5., 3., 1., 5., 2., 4., 4., 1.,
1., 3., 1., 6., 3., 5., 5., 1., 3., 5., 4., 1., 4.,
3., 5., 5., 4., 5., 2., 1., 4., 1., 2., 1., 6., 3.,
2., 4., 5., 1., 1., 2., 5., 3., 2., 5., 3., 2., 3.,
3., 4., 1., 4., 2., 5., 2., 4., 5., 5., 5., 1., 4.,
5., 0., 4., 1., 5., 1., 5., 2., 2., 2., 1., 3., 1.,
1., 3., 1., 3., 3., 5., 5., 5., 2., 2., 1., 4., 5.,
2., 5., 2., 3., 2., 0., 0., 5., 5., 5., 2., 2., 1.,
1., 4., 4., 4., 2., 5., 2., 4., 5., 4., 2., 2., 1.,
4., 4., 2., 4., 4., 1., 4., 3., 5., 0., 1., 2., 3.,
0., 5., 3., 2., 2., 2., 4., 4., 2., 4., 0., 5., 5.,
2., 3., 0., 1., 1., 5., 3., 1., 3., 5., 1., 2., 3.,
5., 5., 2., 2., 5.])
Output plots
You don't need to hardcode each dataset individually. You can simply call np.random.randint(low, high, (x, y, n)), with n being the number of scans/trials. Summing along the last axis then gives an array with shape (x, y), which np.sum() does trivially.
The way to add a title to a subplot can be found here. In short, plt.title() only applies to the current axes, which is the last subplot created (ax2), so both titles land on the same plot; call set_title() on each Axes object instead. Overall:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
sets = 6
data = np.random.randint(0, 2, (10, 20, sets))
def plot_data(data):
    total = np.sum(data, axis=-1)
    fig = plt.figure(figsize=(8, 10))
    ax1 = fig.add_subplot(211)
    ax2 = fig.add_subplot(212)
    ax1.imshow(total, cmap='gray', interpolation='nearest')
    ax1.set_title('Image')
    # flatten the 2-D totals into 1-D for the histogram
    ax2.hist(total.flatten())
    ax2.set_title('Histogram')
    plt.show()
plot_data(data)
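The answer above stores all scans in one 3-D array. If the datasets arrive as separate 2-D arrays, as in the question, a *args parameter collects any number of them without writing data1 through data5 out by hand. This is a minimal sketch: sum_datasets is a hypothetical name, and it returns the 2-D total rather than the flattened one that showResult returns.

```python
import numpy as np

def sum_datasets(*datasets, nscan=None):
    """Hypothetical helper: sum the first `nscan` 2-D datasets (all of them if None)."""
    if nscan is not None:
        datasets = datasets[:nscan]
    # Stack into shape (rows, cols, n) and sum over the new last axis
    return np.sum(np.stack(datasets, axis=-1), axis=-1)

np.random.seed(0)
many = [np.random.randint(0, 2, (10, 20)) for _ in range(100)]  # e.g. 100 datasets
total = sum_datasets(*many)             # any number of inputs
partial = sum_datasets(*many, nscan=6)  # only the first 6
print(total.shape)  # (10, 20)
```

A list of arrays is unpacked into the call with `*`, and the resulting total can then be handed to the plotting code.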

Relabeling overlapping segments located in adjacent numpy 2-d blocks (without for-loops)

I have a numpy 2-d array which I divided into several numpy 2-d blocks. All blocks have the same shape. On these blocks I performed K-means segmentation using the scikit-learn module. The edges of the blocks overlap (each block shares one row/column with the adjacent block). What I want is to give the overlapping segments in two adjacent blocks the same value. My current code can be downloaded here.
Image of the blocks and their position in the original image:
Blocks in python code
blockNW=np.array([[ 0., 0., 0., 0., 5.],
[ 0., 0., 4., 5., 5.],
[ 0., 4., 4., 5., 2.],
[ 0., 4., 5., 5., 2.],
[ 5., 5., 2., 2., 2.]])
blockNE=np.array([[ 1., 18., 18., 18., 6.],
[ 1., 18., 7., 6., 6.],
[ 3., 7., 7., 7., 6.],
[ 3., 3., 3., 7., 7.],
[ 3., 3., 7., 7., 7.]])
blockSW=np.array([[ 8., 8., 8., 10., 10.],
[ 8., 8., 9., 10., 10.],
[ 8., 8., 9., 9., 10.],
[ 8., 8., 8., 9., 10.],
[ 8., 8., 9., 9., 11.]])
blockSE=np.array([[ 12., 12., 12., 12., 12.],
[ 12., 12., 12., 12., 13.],
[ 12., 12., 12., 13., 13.],
[ 12., 12., 13., 13., 13.],
[ 12., 13., 13., 13., 13.]])
blocksStacked=np.array([blockNW,blockNE,blockSW,blockSE])
What I want is to connect the overlapping segments. For this I would like to use as few for-loops as possible, because they are slowing down the code. My current steps are:
import math
import numpy as np
from scipy import ndimage,stats
n_blocks,blocksize = np.shape(blocksStacked)[0],np.shape(blocksStacked)[1]
# shape of original image
out_shp = (8,8)
# horizontal and vertical blocks
horizontal_blocks=math.ceil(out_shp[1]/float(blocksize))
vertical_blocks=math.ceil(out_shp[0]/float(blocksize))
# numpy 2_d array in the shape of the image with an unique ID for each block
blockindex=np.arange(horizontal_blocks*vertical_blocks).reshape(-1,horizontal_blocks)
Block index
def find_neighbours(values,neighbourslist):
    '''function to find the index of neighbouring blocks'''
    mode=stats.mode(values)
    if mode.count>1:
        values=np.delete(values,np.where(values==mode[0]))
    else:
        values=np.delete(values,np.where(values==np.median(values)))
    neighbourslist.append(values)
    return 0
#Locate overlapping rows and columns per block
neighbourlist=[]
kernel=np.array([[0,1,0],[1,1,1],[0,1,0]],dtype='uint8')
_ =ndimage.generic_filter(blockindex, find_neighbours, footprint=kernel,extra_arguments=(neighbourlist,))
#output (block 0 has neighbours 1 and 2, etc.):
>>> neighbourlist
[array([ 1., 2.]), array([ 0., 3.]), array([ 0., 3.]), array([ 1., 2.])]
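As a possibly simpler alternative to ndimage.generic_filter, the adjacent pairs can be read directly off blockindex by slicing, which yields each pair once (left/right and top/bottom) instead of a per-block neighbour list. This sketch assumes the same 2x2 blockindex as the example. The slices [:, -1], [:, 0], [-1, :] and [0, :] pick the same values as the rightmask, leftmask, downmask and upmask boolean masks in the code that follows.

```python
import numpy as np

# Same block-ID grid as `blockindex` in the question (2 x 2 blocks)
blockindex = np.arange(4).reshape(2, 2)

# Horizontally adjacent pairs (left block, right block): compare left[:, -1] with right[:, 0]
horizontal_pairs = np.stack([blockindex[:, :-1].ravel(), blockindex[:, 1:].ravel()], axis=1)

# Vertically adjacent pairs (upper block, lower block): compare upper[-1, :] with lower[0, :]
vertical_pairs = np.stack([blockindex[:-1, :].ravel(), blockindex[1:, :].ravel()], axis=1)

print(horizontal_pairs.tolist())  # [[0, 1], [2, 3]]
print(vertical_pairs.tolist())    # [[0, 2], [1, 3]]
```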
The next step could be to loop through all blocks and neighbours and select the overlapping rows or columns (if possible, I would also like to remove these loops).
# First I create masks to select overlapping rows or columns:
upmask=np.ones((blocksize,blocksize),dtype=bool)
upmask[1:,:]=0
downmask=np.ones((blocksize,blocksize),dtype=bool)
downmask[:-1,:]=0
rightmask=np.ones((blocksize,blocksize),dtype=bool)
rightmask[:,:-1]=0
leftmask=np.ones((blocksize,blocksize),dtype=bool)
leftmask[:,1:]=0
# Now loop through all blocks and neighbours and select the overlapping rows/columns
for i in range(n_blocks):
    n_neighbours = len(neighbourlist[i])
    block=blocksStacked[i,:,:]
    for j in range(n_neighbours):
        # neighbourlist holds floats, so cast to int before using it as an index
        neighborindex=int(neighbourlist[i][j])
        block_neighbour=blocksStacked[neighborindex,:,:]
        if i+1==neighborindex:
            blockvals=block[rightmask]
            neighbourvals=block_neighbour[leftmask]
        elif i-1==neighborindex:
            blockvals=block[leftmask]
            neighbourvals=block_neighbour[rightmask]
        elif i+horizontal_blocks==neighborindex:
            blockvals=block[downmask]
            neighbourvals=block_neighbour[upmask]
        elif i-horizontal_blocks==neighborindex:
            blockvals=block[upmask]
            neighbourvals=block_neighbour[downmask]
In each iteration I end up with two numpy 1-d arrays representing the overlapping columns or rows. For the first iteration (block 0 and its right-hand neighbour, block 1) I end up with:
>>> blockvals
array([5., 5., 2., 2., 2.])
>>> neighbourvals
array([1., 1., 3., 3., 3.])
I want to relabel the overlapping segments in the neighbouring block so that they take the values of the corresponding segments in the current block:
blockNW=np.array([[ 0., 0., 0., 0., 5.],
[ 0., 0., 4., 5., 5.],
[ 0., 4., 4., 5., 2.],
[ 0., 4., 5., 5., 2.],
[ 5., 5., 2., 2., 2.]])
blockNE=np.array([[ 5., 18., 18., 18., 6.],
[ 5., 18., 7., 6., 6.],
[ 2., 7., 7., 7., 6.],
[ 2., 2., 2., 7., 7.],
[ 2., 2., 7., 7., 7.]])
Any idea on how to detect and relabel these overlapping segments?
Also, my code looks rather cumbersome; any ideas on how to improve it?
A few remarks:
Some segments will not overlap 100%, so it should be possible to set a threshold. For example, if segments overlap by at least 70%, they should be relabeled
The output shape of the function should be similar to the shape of the stacked blocks
The desired output will look like this:
EDIT
With for-loops the code to solve the question would look something like this:
import numpy as np
# scipy.stats.itemfreq has been removed from recent SciPy; np.unique(..., return_counts=True) gives the same value/count pairs
thresh=0.7
# Locate and re-label overlapping segments
for blockval in np.unique(blockvals):
    # count of blockval in the overlapping row/column of the block
    block_val_count=np.count_nonzero(blockvals==blockval)
    # select values in the neighbour at the same locations
    overlap=neighbourvals[blockvals==blockval]
    overlap_vals,overlap_counts=np.unique(overlap,return_counts=True)
    # select the neighbouring value which overlaps the most
    neighval_overlap_count=overlap_counts.max()
    neighval=overlap_vals[overlap_counts.argmax()]
    # count occurrences of the selected neighbouring value
    neigh_val_count=np.count_nonzero(neighbourvals==neighval)
    # if the overlap is at least 70%, relabel the neighbouring value to the value in the block
    if (neighval_overlap_count/float(neigh_val_count)>=thresh) and (neighval_overlap_count/float(block_val_count)>=thresh):
        blocksStacked[neighborindex][blocksStacked[neighborindex]==neighval]=blockval
#output
>>> blocksStacked
array([[[ 0., 0., 0., 0., 5.],
[ 0., 0., 4., 5., 5.],
[ 0., 4., 4., 5., 2.],
[ 0., 4., 5., 5., 2.],
[ 5., 5., 2., 2., 2.]],
[[ 5., 18., 18., 18., 6.],
[ 5., 18., 7., 6., 6.],
[ 2., 7., 7., 7., 6.],
[ 2., 2., 2., 7., 7.],
[ 2., 2., 7., 7., 7.]],
[[ 8., 8., 8., 10., 10.],
[ 8., 8., 9., 10., 10.],
[ 8., 8., 9., 9., 10.],
[ 8., 8., 8., 9., 10.],
[ 8., 8., 9., 9., 11.]],
[[ 10., 10., 10., 10., 10.],
[ 10., 10., 10., 10., 13.],
[ 10., 10., 10., 13., 13.],
[ 10., 10., 13., 13., 13.],
[ 10., 13., 13., 13., 13.]]])
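The per-value loop above can, as a sketch, be replaced by array operations for one pair of strips: count every (block label, neighbour label) pair along the shared strip with np.unique, keep the pairs whose overlap passes the threshold from both sides, and apply the resulting label mapping to the whole neighbouring block with np.searchsorted. relabel_strip is a hypothetical helper that returns a new block rather than editing blocksStacked in place, and it assumes thresh > 0.5, so each label can pair with at most one label on the other side (which makes the loop's "largest overlap" selection implicit). The loop over pairs of blocks remains.

```python
import numpy as np

def relabel_strip(block_strip, neigh_strip, neighbour, thresh=0.7):
    """Hypothetical helper: copy of `neighbour` with overlapping segments relabelled.

    block_strip / neigh_strip: the shared row or column as seen from each block.
    Assumes thresh > 0.5, so each label matches at most one label on the other side.
    """
    # Every (block label, neighbour label) pair along the strip and how often it occurs
    pairs, overlap = np.unique(np.stack([block_strip, neigh_strip], axis=1),
                               axis=0, return_counts=True)
    # How many strip pixels carry each label on either side
    b_labels, b_counts = np.unique(block_strip, return_counts=True)
    n_labels, n_counts = np.unique(neigh_strip, return_counts=True)
    b_total = b_counts[np.searchsorted(b_labels, pairs[:, 0])]
    n_total = n_counts[np.searchsorted(n_labels, pairs[:, 1])]
    # Keep pairs that overlap by at least `thresh` from both sides
    keep = (overlap / b_total >= thresh) & (overlap / n_total >= thresh)
    src, dst = pairs[keep, 1], pairs[keep, 0]  # neighbour label -> block label
    order = np.argsort(src)
    src, dst = src[order], dst[order]
    out = neighbour.copy()
    if src.size:
        # Vectorised lookup: locate each pixel's label in `src` and replace the matches
        idx = np.clip(np.searchsorted(src, out), 0, src.size - 1)
        hit = src[idx] == out
        out[hit] = dst[idx][hit]
    return out

blockNW = np.array([[0, 0, 0, 0, 5],
                    [0, 0, 4, 5, 5],
                    [0, 4, 4, 5, 2],
                    [0, 4, 5, 5, 2],
                    [5, 5, 2, 2, 2]], dtype=float)
blockNE = np.array([[1, 18, 18, 18, 6],
                    [1, 18, 7, 6, 6],
                    [3, 7, 7, 7, 6],
                    [3, 3, 3, 7, 7],
                    [3, 3, 7, 7, 7]], dtype=float)

# NE lies to the right of NW: compare NW's last column with NE's first column
new_NE = relabel_strip(blockNW[:, -1], blockNE[:, 0], blockNE)
print(new_NE)
```

For this pair the sketch maps 1 to 5 and 3 to 2 in one pass, which matches the relabelled blockNE shown in the question.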

Multiplication in Python with arrays of different length

I have five 100x100 arrays, A, and I want to multiply each matrix by a value from an array of length five, B. I wish to multiply the first matrix in A by the first value in B, the second matrix by the second value in B, etc. Am I able to do this?
The answer was actually given by gboffi in his comment, but I want to elaborate on it with a concrete code example:
import numpy as np
#example data, all arrays of ones 100x100
A1 = A2 = A3 =A4 = A5 = np.ones((100, 100))
#example array containing the factor for each matrix
B = np.array([1, 2, 3, 4, 5])
#create an array containing all matrices
A = np.array([A1, A2, A3, A4, A5])
A*B[:,None,None]
The result then looks like this:
array([[[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
...,
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.]],
[[ 2., 2., 2., ..., 2., 2., 2.],
[ 2., 2., 2., ..., 2., 2., 2.],
[ 2., 2., 2., ..., 2., 2., 2.],
...,
[ 2., 2., 2., ..., 2., 2., 2.],
[ 2., 2., 2., ..., 2., 2., 2.],
[ 2., 2., 2., ..., 2., 2., 2.]],
[[ 3., 3., 3., ..., 3., 3., 3.],
[ 3., 3., 3., ..., 3., 3., 3.],
[ 3., 3., 3., ..., 3., 3., 3.],
...,
[ 3., 3., 3., ..., 3., 3., 3.],
[ 3., 3., 3., ..., 3., 3., 3.],
[ 3., 3., 3., ..., 3., 3., 3.]],
[[ 4., 4., 4., ..., 4., 4., 4.],
[ 4., 4., 4., ..., 4., 4., 4.],
[ 4., 4., 4., ..., 4., 4., 4.],
...,
[ 4., 4., 4., ..., 4., 4., 4.],
[ 4., 4., 4., ..., 4., 4., 4.],
[ 4., 4., 4., ..., 4., 4., 4.]],
[[ 5., 5., 5., ..., 5., 5., 5.],
[ 5., 5., 5., ..., 5., 5., 5.],
[ 5., 5., 5., ..., 5., 5., 5.],
...,
[ 5., 5., 5., ..., 5., 5., 5.],
[ 5., 5., 5., ..., 5., 5., 5.],
[ 5., 5., 5., ..., 5., 5., 5.]]])
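B[:, None, None] reshapes B from (5,) to (5, 1, 1), and broadcasting then stretches each factor across its own 100x100 matrix. The same product can be written with np.einsum, whose subscripts spell out out[k, i, j] = A[k, i, j] * B[k]. This sketch uses np.ones stand-ins, as above.

```python
import numpy as np

A = np.ones((5, 100, 100))     # five 100x100 matrices stacked along axis 0
B = np.array([1, 2, 3, 4, 5])  # one factor per matrix

# Shape (5,) -> (5, 1, 1); broadcasting repeats each factor over its matrix
scaled = A * B[:, None, None]

# Same product spelled out with einsum: out[k, i, j] = A[k, i, j] * B[k]
scaled_einsum = np.einsum('kij,k->kij', A, B)

print(scaled.shape, np.array_equal(scaled, scaled_einsum))  # (5, 100, 100) True
```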

Are there alternative way to manage value assignment of n-dim array/matrix/list in Python?

In Python we do something like this, for example:
import numpy
n = 30
A = numpy.zeros(shape=(n,n))
for i in range(0, n):
    for j in range(0, n):
        A[i, j] = i+j
        # i+j is just an example of an assignment
That is how we fill a 2-D array. It's simple: just use nested loops to walk over the rows and columns.
But my friend asked me why it's so complicated. Could you show me another way to do it?
He told me Mathematica has much easier ways to manage n-dimensional arrays (I'm not sure; I've never used Mathematica).
Can you give me an alternative way to manage value assignment on an n-dimensional matrix/array (in NumPy) or list (an ordinary Python list)?
You are looking for numpy.fromfunction:
>>> numpy.fromfunction(lambda x, y: x + y, (5, 5))
array([[ 0., 1., 2., 3., 4.],
[ 1., 2., 3., 4., 5.],
[ 2., 3., 4., 5., 6.],
[ 3., 4., 5., 6., 7.],
[ 4., 5., 6., 7., 8.]])
You can simplify slightly using the operator module:
>>> from operator import add
>>> numpy.fromfunction(add, (5, 5))
array([[ 0., 1., 2., 3., 4.],
[ 1., 2., 3., 4., 5.],
[ 2., 3., 4., 5., 6.],
[ 3., 4., 5., 6., 7.],
[ 4., 5., 6., 7., 8.]])
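One detail worth knowing: numpy.fromfunction calls the function only once, passing whole arrays of indices (floats by default) rather than individual i, j values. The function therefore has to work element-wise; a scalar-style `1 if i == j else 0` would raise an error, because `i == j` is an array. A short sketch:

```python
import numpy as np

# The lambda receives two 4x4 index arrays at once, so use np.where instead of `if`
identity_like = np.fromfunction(lambda i, j: np.where(i == j, 1.0, 0.0), (4, 4))

# dtype sets the type of the index arrays (float by default)
int_sum = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)

print(identity_like)
print(int_sum)
```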
You can also use NumPy broadcasting: adding a row vector to its transpose (a column vector) produces the full n x n table:
import numpy
n = 30
w = numpy.arange(n).reshape(1,-1)
A = w+w.T
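Both ideas generalise beyond two dimensions. The sketch below checks np.add.outer against the question's nested loops (with a smaller n) and uses np.indices, which returns one index grid per axis, to build a 3-D table where A_3d[i, j, k] == i + j + k.

```python
import numpy as np

n = 5

# The question's nested loops, for comparison
A_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        A_loop[i, j] = i + j

# np.add.outer applies + to every (i, j) combination of two 1-D ranges
A_outer = np.add.outer(np.arange(n), np.arange(n))
print(np.array_equal(A_loop, A_outer))  # True

# np.indices returns one index grid per axis, shape (3, n, n, n) here;
# summing over the first axis gives A_3d[i, j, k] == i + j + k
A_3d = np.indices((n, n, n)).sum(axis=0)
print(A_3d[1, 2, 3])  # 6
```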
