Reshape arrays with python - python

I have the array
A = array([[ 1., 2., 3., 10., 11., 12.],
[ 4., 5., 6., 13., 14., 15.],
[ 7., 8., 9., 16., 17., 18.],
[ 19., 20., 21., 28., 29., 30.],
[ 22., 23., 24., 31., 32., 33.],
[ 25., 26., 27., 34., 35., 36.]])
I would like to reshape it in order to obtain
B = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36]
I have tried
>>> B = A.reshape(1,36)
array([[ 1., 2., 3., 10., 11., 12., 4., 5., 6., 13., 14.,
15., 7., 8., 9., 16., 17., 18., 19., 20., 21., 28.,
29., 30., 22., 23., 24., 31., 32., 33., 25., 26., 27.,
34., 35., 36.]])
But, obviously, I didn't reach the result. My real data differs from the example, so I can't sort the array A to obtain B.
I suppose I need more reshapes...

Use a reshape to split each of the two axes in two, giving a 4D array whose first and third axes have length 2 (one index for the block row, one for the block column). Then swap the middle two axes with np.swapaxes() and finally flatten with np.ravel() -
A.reshape(2,3,2,3).swapaxes(1,2).ravel()
Generically put -
m,n = A.shape
A.reshape(2,m//2,2,n//2).swapaxes(1,2).ravel()
Sample run -
In [15]: A
Out[15]:
array([[ 1., 2., 3., 10., 11., 12.],
[ 4., 5., 6., 13., 14., 15.],
[ 7., 8., 9., 16., 17., 18.],
[ 19., 20., 21., 28., 29., 30.],
[ 22., 23., 24., 31., 32., 33.],
[ 25., 26., 27., 34., 35., 36.]])
In [16]: A.reshape(2,3,2,3).swapaxes(1,2).ravel()
Out[16]:
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.,
12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.,
23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33.,
34., 35., 36.])
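For a version that does not hard-code the number of blocks, the same idea can be wrapped in a small helper. This is only a sketch: the name flatten_blocks and its block_rows / block_cols parameters are made up for illustration, and it assumes the array divides evenly into blocks.

```python
import numpy as np

def flatten_blocks(arr, block_rows, block_cols):
    """Flatten a 2D array block by block (hypothetical helper)."""
    m, n = arr.shape
    if m % block_rows or n % block_cols:
        raise ValueError("array shape must be a multiple of the block shape")
    # Axes after reshape: (block row, row in block, block column, column in block)
    blocks = arr.reshape(m // block_rows, block_rows, n // block_cols, block_cols)
    # Put both block indices first, then read each block row by row
    return blocks.swapaxes(1, 2).ravel()

A = np.array([[ 1.,  2.,  3., 10., 11., 12.],
              [ 4.,  5.,  6., 13., 14., 15.],
              [ 7.,  8.,  9., 16., 17., 18.],
              [19., 20., 21., 28., 29., 30.],
              [22., 23., 24., 31., 32., 33.],
              [25., 26., 27., 34., 35., 36.]])

B = flatten_blocks(A, 3, 3)
print(B)
```

With 3x3 blocks this is the same call as A.reshape(2,3,2,3).swapaxes(1,2).ravel().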

Related

Inconsistency in Keras Flatten() layer behavior using Theano Backend

I am trying to understand the behavior of the Flatten() layer in Keras with Theano backend. I have two different versions of Keras installed in two different Conda environments. The way a 4D tensor is unrolled using Flatten() differs in these two versions and I am thoroughly confused about which one is correct.
I have written the following two code snippets to show the problem:
The above code is flattening the input matrix along the channels axis first
However, in another version, the result is different:
The above code is flattening the input matrix along the columns axis first.
Can someone please explain this? Thanks!
Both are correct. The difference is because of the image_data_format setting. This can be set in keras.json or via the backend API.
https://keras.io/backend/
>>> from keras import backend as K
>>> K.image_data_format()
'channels_first'
When format is 'channels_first', output is
array([[ 0., 9., 18., 27., 1., 10., 19., 28., 2., 11., 20., 29., 3.,
12., 21., 30., 4., 13., 22., 31., 5., 14., 23., 32., 6., 15.,
24., 33., 7., 16., 25., 34., 8., 17., 26., 35.]], dtype=float32)
When format is 'channels_last', output is
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.,
13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
26., 27., 28., 29., 30., 31., 32., 33., 34., 35.]], dtype=float32)
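For intuition, the two orderings above can be reproduced in plain NumPy. This is only a sketch: the question's snippets are missing, so the input is assumed to be np.arange(36) with shape (1, 4, 3, 3) read as (batch, channels, rows, cols). It is not a claim about how Keras implements Flatten() internally.

```python
import numpy as np

# Assumed input: 4 channels of 3x3, stored as (batch, channels, rows, cols)
x = np.arange(36, dtype=np.float32).reshape(1, 4, 3, 3)

# Flatten in memory order: 0, 1, 2, ...
flat_as_stored = x.reshape(1, -1)

# Move the channel axis last before flattening, so channels vary fastest
flat_channels_fastest = x.transpose(0, 2, 3, 1).reshape(1, -1)

print(flat_as_stored[0, :8])         # starts 0, 1, 2, 3, ...
print(flat_channels_fastest[0, :8])  # starts 0, 9, 18, 27, 1, 10, ...
```

Both results contain the same 36 values; only the order in which the axes are unrolled differs.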

Is there a way to get batches with continuous examples in Pytorch

I have a dataset with 10,000+ examples and using Dataloader, I create batches of size 50. I'm trying to find a way to have batch 1 start at example 1 and end at example 50 then have batch 2 start at example 2 and end at example 51 and so on.
This is a snippet of where I use DataLoader:
train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, drop_last=True, shuffle=False)
for epoch in range(num_epochs):
    totalEpochs += 1
    for X, y in train_loader:
        train = X.view(-1, 1, X.shape[1]).float()
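One way is to build the overlapping windows yourself, so that each sample is the previous one shifted by one step, and then batch those windows:
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
def timeseries_to_supervised(data, seq_length):
    # Slide a window of seq_length over data; the label is the value right after the window
    x = []
    y = []
    for i in range(len(data)-seq_length-1):
        _x = data[i:(i+seq_length)]
        _y = data[i+seq_length]
        x.append(_x)
        y.append(_y)
    return np.array(x), np.array(y)
# TimeSeriesDataSet was not shown in the original answer; this minimal definition is an assumption
class TimeSeriesDataSet(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels
    def __len__(self):
        return len(self.features)
    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]
data = range(180)
window_size = 30 # 60 mins
x, y = timeseries_to_supervised(data, window_size)
train_data = x[:90]
train_label = y[:90]
test_data = x[90:]
test_label = y[90:]
trainX = torch.tensor(train_data, dtype=torch.float32)
trainY = torch.tensor(train_label, dtype=torch.float32)
testX = torch.tensor(test_data, dtype=torch.float32)
testY = torch.tensor(test_label, dtype=torch.float32)
batch = 24
train_loader = DataLoader(TimeSeriesDataSet(trainX, trainY), batch_size=batch, shuffle=False)
test_loader = DataLoader(TimeSeriesDataSet(testX, testY), batch_size=batch, shuffle=False)
for i, d in enumerate(train_loader):
    print(i, d[0].shape, d[1].shape)
    print(d) # d[0] - features, d[1] - labels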
Results:
0 torch.Size([24, 30]) torch.Size([24])
[tensor([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13.,
14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27.,
28., 29.],
[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14.,
15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28.,
29., 30.],
[ 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15.,
16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.,
30., 31.],
...
[23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36.,
37., 38., 39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50.,
51., 52.]]),
tensor([30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43.,
44., 45., 46., 47., 48., 49., 50., 51., 52., 53.])]
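The Python loop that builds the windows can also be replaced by a strided view. The sketch below uses only NumPy and assumes NumPy 1.20 or newer, where numpy.lib.stride_tricks.sliding_window_view is available. It keeps one more window than the loop above, which stops one step early.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(180, dtype=np.float32)
seq_length = 30

# Every length-30 window of data, as a read-only view (no copying)
windows = sliding_window_view(data, seq_length)

x = windows[:-1]       # windows that still have a value following them
y = data[seq_length:]  # the value right after each window

print(windows.shape, x.shape, y.shape)
print(x[1][:5], y[1])
```

Because x is a read-only view, call x.copy() before passing it to code that needs a writable array.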

Numpy replace values in array using putmask and indexing

I would like to replace values in a NumPy array, in only one column and only on several selected rows, using putmask. I want to use indexing on the array to be modified as well as on the mask. Therefore I create an ndarray, a mask, and an array of desired replacements, as follows:
import numpy as np
a = np.linspace(1,30,30)
a = a.reshape(10,3)
mask = np.random.randint(2, size=8)
replacements = a[[2,4,5,6,7,8],0]*a[[2,4,5,6,7,8],1]
a
array([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[10., 11., 12.],
[13., 14., 15.],
[16., 17., 18.],
[19., 20., 21.],
[22., 23., 24.],
[25., 26., 27.],
[28., 29., 30.]])
mask
array([0, 1, 0, 0, 1, 0, 1, 1])
replacements
array([ 56., 182., 272., 380., 506., 650.])
np.putmask(a[[2,4,5,6,7,8],2], mask[2::], replacements)
My expected result would look like this:
a
array([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[10., 11., 12.],
[13., 14., 15.],
[16., 17., 272.],
[19., 20., 21.],
[22., 23., 506.],
[25., 26., 650.],
[28., 29., 30.]])
But instead I get this:
a
array([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[10., 11., 12.],
[13., 14., 15.],
[16., 17., 18.],
[19., 20., 21.],
[22., 23., 24.],
[25., 26., 27.],
[28., 29., 30.]])
Does anybody have an idea?
Note that you are using fancy indexing, so when using np.putmask you are modifying a copy rather than a sliced view, and thus the original array remains unchanged. You can check this by trying to index using slice notation, np.putmask(a[2:8,2], mask[2::], replacements) for instance, which would in this case modify the values in a.
What you could do is use np.where and reassign the values to the corresponding indices in a:
a[[2,4,5,6,7,8],2] = np.where(mask[2::], replacements, a[[2,4,5,6,7,8],2])
Output (from a run in which the random mask was 1 for every selected row)
array([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 56.],
[ 10., 11., 12.],
[ 13., 14., 182.],
[ 16., 17., 272.],
[ 19., 20., 380.],
[ 22., 23., 506.],
[ 25., 26., 650.],
[ 28., 29., 30.]])
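To make the copy-versus-view point concrete, here is a small sketch that uses the mask printed in the question instead of a random one, so the result matches the asker's expected output. The variable names rows, before and unchanged are only for illustration.

```python
import numpy as np

a = np.linspace(1, 30, 30).reshape(10, 3)
rows = [2, 4, 5, 6, 7, 8]
mask = np.array([0, 1, 0, 0, 1, 0, 1, 1])  # the mask shown in the question
replacements = a[rows, 0] * a[rows, 1]

# Fancy indexing returns a copy, so putmask only modifies a temporary array
before = a.copy()
np.putmask(a[rows, 2], mask[2:], replacements)
unchanged = np.array_equal(a, before)
print(unchanged)

# Assigning through the fancy index writes the values back into a
a[rows, 2] = np.where(mask[2:], replacements, a[rows, 2])
print(a[:, 2])
```

With this mask only rows 5, 7 and 8 change in the last column, to 272, 506 and 650.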

Repeating numpy arrays

I have a 3D NumPy array like this:
x= array([[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]]])
I would like to repeat each block of my array n times (e.g. 3 times) like this:
array([[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]]])
I have tried this:
xx=np.vstack([x]*3)
print(xx.reshape(6,4,3))
array([[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]],
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]],
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]]])
How can I get the order I want? There should be an easy way to do this. Thanks in advance for your suggestions.
After a bit of trial and error I have found a way to do it:
np.tile(x.reshape(2,12), [1,3]).reshape(6,4,3)
You can use np.repeat with axis = 0:
np.repeat(x, [3, 3], axis = 0) # or more generally np.repeat(x, [n] * len(x), axis = 0)
# here n is the repeat times
Out[514]:
array([[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]]])
Another option would be to index it as:
x[[0,0,0,1,1,1]]
Or programmatically:
x[[i for i in range(len(x)) for j in range(3)]]
Out[518]:
array([[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]],
[[ 13., 14., 15.],
[ 16., 17., 18.],
[ 19., 20., 21.],
[ 22., 23., 24.]]])
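As a quick check, the approaches above can be compared side by side. The sketch below builds the same x with np.arange and confirms that np.repeat, the tile/reshape trick and index repetition agree, while np.vstack gives the interleaved order from the question. The variable names are only for illustration.

```python
import numpy as np

x = np.arange(1., 25.).reshape(2, 4, 3)
n = 3

repeated = np.repeat(x, n, axis=0)                     # block 0 n times, then block 1 n times
tiled = np.tile(x.reshape(2, 12), [1, n]).reshape(6, 4, 3)
indexed = x[np.repeat(np.arange(len(x)), n)]           # same as x[[0,0,0,1,1,1]]
stacked = np.vstack([x] * n)                           # whole array n times: 0,1,0,1,0,1

print(np.array_equal(repeated, tiled), np.array_equal(repeated, indexed))
print(np.array_equal(repeated, stacked))
```

Passing a scalar repeat count to np.repeat is equivalent to passing [n] * len(x) here.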

Tolerances, linalg.solv, polynom solve

I have the following problem:
I try to solve the system of equations using linalg.solve, and it seems to work. But when I check the result by inserting the acquired coefficients and one of the original points, I get a difference of about 30% from the original data. Have I made a mistake that I don't see? Or do I have to use a different method to get more accurate results? If so, which one?
Furthermore, if I use values other than the ones I entered when calculating the coefficients, I get strangely high results.
data = np.genfromtxt("data1.csv",dtype=float,delimiter=";")
to = data[0,2:]
tc = data[1:,0]
y = data[1:,2:]
a = np.array([[1, to[0], tc[0], to[0]**2, to[0]*tc[0], tc[0]**2, to[0]**3, tc[0]*to[0]**2, to[0]*tc[0]**2, tc[0]**3],
[1, to[1], tc[1], to[1]**2, to[1]*tc[1], tc[1]**2, to[1]**3, tc[1]*to[1]**2, to[1]*tc[1]**2, tc[1]**3],
[1, to[2], tc[2], to[2]**2, to[2]*tc[2], tc[2]**2, to[2]**3, tc[2]*to[2]**2, to[2]*tc[2]**2, tc[2]**3],
[1, to[3], tc[3], to[3]**2, to[3]*tc[3], tc[3]**2, to[3]**3, tc[3]*to[3]**2, to[3]*tc[3]**2, tc[3]**3],
[1, to[4], tc[4], to[4]**2, to[4]*tc[4], tc[4]**2, to[4]**3, tc[4]*to[4]**2, to[4]*tc[4]**2, tc[4]**3],
[1, to[5], tc[5], to[5]**2, to[5]*tc[5], tc[5]**2, to[5]**3, tc[5]*to[5]**2, to[5]*tc[5]**2, tc[5]**3],
[1, to[6], tc[6], to[6]**2, to[6]*tc[6], tc[6]**2, to[6]**3, tc[6]*to[6]**2, to[6]*tc[6]**2, tc[6]**3],
[1, to[7], tc[7], to[7]**2, to[7]*tc[7], tc[7]**2, to[7]**3, tc[7]*to[7]**2, to[7]*tc[7]**2, tc[7]**3],
[1, to[8], tc[8], to[8]**2, to[8]*tc[8], tc[8]**2, to[8]**3, tc[8]*to[8]**2, to[8]*tc[8]**2, tc[8]**3],
[1, to[9], tc[9], to[9]**2, to[9]*tc[9], tc[9]**2, to[9]**3, tc[9]*to[9]**2, to[9]*tc[9]**2, tc[9]**3]])
b = np.array([y[0,0],y[1,1],y[2,2],y[3,3],y[4,4],y[5,5],y[6,6],y[7,7],y[8,8],y[9,9]])
c = np.linalg.solve(a, b)
ges_to = 10
ges_tc = 35
ges_y = c[0] + c[1]*ges_to + c[2]*ges_tc + c[3]*ges_to**2 + c[4]*ges_to*ges_tc + c[5]*ges_tc**2 + c[6]*ges_to**3 \
+ c[7]*ges_tc*ges_to**2 + c[8]*ges_to*ges_tc**2 + c[9]*ges_tc**3
Here are the values I use to calculate
('to:', array([ 15., 10., 5., 0., -5., -10., -15., -20., -25., -30., -35.]))
('tc:', array([ 30., 35., 40., 45., 50., 55., 60., 65., 70., 80., 90.]))
('b', array([ 24., 31., 35., 36., 35., 33., 30., 25., 21., 18.]))
('y:', array([[ 24., 26., 27., 27., 26., 25., 23., 20., 18., 15., 13.],
[ 30., 31., 31., 30., 29., 27., 24., 21., 18., 16., 14.],
[ 35., 35., 35., 33., 31., 29., 26., 22., 19., 16., 15.],
[ 40., 40., 38., 36., 33., 30., 27., 23., 20., 16., 15.],
[ 45., 44., 41., 39., 35., 32., 28., 24., 20., 17., 16.],
[ 49., 47., 44., 41., 37., 33., 29., 25., 20., 17., 16.],
[ 53., 51., 47., 43., 39., 34., 30., 25., 21., 17., 16.],
[ 57., 54., 50., 45., 40., 35., 30., 25., 21., 17., 16.],
[ 61., 57., 52., 47., 41., 36., 31., 26., 21., 17., 16.],
[ 64., 60., 54., 59., 53., 37., 32., 27., 22., 18., 19.],
[ 67., 63., 56., 61., 55., 59., 34., 29., 24., 18., 19.]]))
('ges_y:', 49.0625)
Floating point arithmetic is leading you astray. If you look at the determinant of your matrix a, it's something incredibly small like 1.551864434916621e-51. If you compute the determinant with the entries as integers (and so avoid floating point rounding), you'll see it's exactly 0, and the rank of your matrix is 5. So it's singular, and in general equations like ax = b may not have any solution.
Another quick thing you can do to see this is np.dot(a, np.linalg.inv(a)) is nowhere close to the identity matrix. Similarly, np.dot(a, c) is nowhere close to b.
There may or may not be an actual solution to ax = b, but np.linalg.lstsq(a,b) will get you an approximate solution in either case, if that's sufficient for your needs.
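The rank can be checked exactly with rational arithmetic. The sketch below rebuilds the ten rows from the to and tc values listed in the question and runs Gaussian elimination on Fraction entries; exact_rank is a helper written for this illustration, not a NumPy function. It also shows one reason the rank is so low: the first nine (to, tc) pairs all satisfy to + tc = 45, so they lie on a single line.

```python
from fractions import Fraction
import numpy as np

to = [15, 10, 5, 0, -5, -10, -15, -20, -25, -30]
tc = [30, 35, 40, 45, 50, 55, 60, 65, 70, 80]
b = [24, 31, 35, 36, 35, 33, 30, 25, 21, 18]

# Same ten cubic terms per row as the matrix a in the question
rows = [[1, t, s, t**2, t*s, s**2, t**3, s*t**2, t*s**2, s**3]
        for t, s in zip(to, tc)]

def exact_rank(matrix):
    """Rank via Gaussian elimination on exact fractions (illustrative helper)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [v - factor * p for v, p in zip(m[r], m[rank])]
        rank += 1
    return rank

print(exact_rank(rows))                                    # exact rank of the 10x10 system
print(all(t + s == 45 for t, s in zip(to[:9], tc[:9])))    # first nine points are collinear

# Least-squares coefficients for the rank-deficient system
c, residuals, rank_estimate, singular_values = np.linalg.lstsq(
    np.array(rows, dtype=float), np.array(b, dtype=float), rcond=None)
print(c)
```

Sampling (to, tc) pairs that do not line up, for example by using more entries of the y table than its diagonal, would give the fit more independent equations to work with.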
