getting mean of value in matrix changes matrix

I am computing the mean of the second column for rows that share the same value in the first column of a matrix:
import pandas as pd
ds2 = [[ 4, 1],
[ 5, 3],
[ 6, 1],
[ 7, 2],
[ 7, 2],
[ 8, 2],
[12, 1],
[ 9, 3],
[12, 1],
[13, 2],
[22, 3]]
ds2= pd.DataFrame(ds2)
print(type(ds2))
print(ds2)
ds2 = ds2.groupby(0).mean()
print(type(ds2))
print(ds2)
output:
<class 'pandas.core.frame.DataFrame'>
0 1
0 4 1
1 5 3
2 6 1
3 7 2
4 8 2
5 9 3
6 12 1
7 13 2
8 22 3
<class 'pandas.core.frame.DataFrame'>
1
0
4 1
5 3
6 1
7 2
8 2
9 3
12 1
13 2
22 3
The type remains the same, but the way the matrix looks changes: the grouped column becomes the index. Is there any way to keep the original matrix layout after processing?

Pass param as_index=False to the groupby method:
In [140]:
ds2 = [[ 4, 1],
[ 5, 3],
[ 6, 1],
[ 7, 2],
[ 7, 2],
[ 8, 2],
[12, 1],
[ 9, 3],
[12, 1],
[13, 2],
[22, 3]]
ds2= pd.DataFrame(ds2)
ds2.groupby(0, as_index=False).mean()
Out[140]:
0 1
0 4 1
1 5 3
2 6 1
3 7 2
4 8 2
5 9 3
6 12 1
7 13 2
8 22 3
By default any columns passed will be used to form the index.
From the docs:
as_index : boolean, default True
For aggregated output, return object
with group labels as the index. Only relevant for DataFrame input.
as_index=False is effectively “SQL-style” grouped output
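As a minimal sketch of the point above, the snippet below compares as_index=False with calling reset_index() on the default grouped result. The column names "key" and "value" are illustrative assumptions and are not part of the original question, which uses the default integer column labels.

```python
import pandas as pd

rows = [[4, 1], [5, 3], [6, 1], [7, 2], [7, 2], [8, 2],
        [12, 1], [9, 3], [12, 1], [13, 2], [22, 3]]
# "key" and "value" are hypothetical names chosen for readability.
df = pd.DataFrame(rows, columns=["key", "value"])

# Group labels stay an ordinary column.
flat = df.groupby("key", as_index=False).mean()

# Default behaviour puts the labels in the index; reset_index() moves them back.
same = df.groupby("key").mean().reset_index()

print(flat)
print(list(flat.columns) == list(same.columns))
```

Either form should give a frame with a plain 0..n-1 index and the group labels as a regular column; which one to use is mostly a matter of taste.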

Related

Find size of numpy array within a Panda dataframe in python

I would like to get the size of each numpy array within a pandas DataFrame. How do I do this?
I have:
x y z
0 [1, 2, 3, 4] [8, 9, 7] [8, 9, 7]
1 [2, 3, 4, 8] [9, 8, 1] [9, 8, 1, 6, 7, 8, 9]
2 [5, 6, 7] [3, 4, 1] [3, 4, 1]
cars= pd.DataFrame({'x': [[1,2,3,4],[2,3,4,8],[5,6,7]],
'y': [[8,9,7],[9,8,1],[3,4,1]],
'z': [[8,9,7],[9,8,1,6,7,8,9],[3,4,1]]})
I want:
x y z
0 4 3 3
1 4 3 7
2 3 3 3
I know how to get the shape and size of the entire DataFrame, but not how to combine them with size of each block.
print(cars)
print(cars.size)
print(cars.shape)

Use Series.str.len in DataFrame.apply to process all columns:
df = cars.apply(lambda x: x.str.len())
print (df)
x y z
0 4 3 3
1 4 3 7
2 3 3 3
If there are no missing values, use DataFrame.applymap to apply the function len element-wise:
df = cars.applymap(len)
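A version-tolerant sketch of the same element-wise idea: DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so the example below goes through Series.map per column, which should behave the same on old and new releases. Like applymap(len), it assumes there are no missing values.

```python
import pandas as pd

cars = pd.DataFrame({'x': [[1, 2, 3, 4], [2, 3, 4, 8], [5, 6, 7]],
                     'y': [[8, 9, 7], [9, 8, 1], [3, 4, 1]],
                     'z': [[8, 9, 7], [9, 8, 1, 6, 7, 8, 9], [3, 4, 1]]})

# Apply len to every cell, one column (Series) at a time.
lengths = cars.apply(lambda col: col.map(len))
print(lengths)
```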

Multidimensional cumulative sum in numpy

I want to be able to calculate the cumulative sum of a large n-dimensional numpy array. The value of each element in the final array should be the sum of all elements which have indices greater than or equal to the current element.
2D: $x_{ij} = \sum_{m \ge i,\; n \ge j} x_{mn}$
3D: $x_{ijk} = \sum_{m \ge i,\; n \ge j,\; o \ge k} x_{mno}$
Examples in 2D:
1 1 0        2 1 0
1 1 1   ->   5 3 1
1 1 1        8 5 2

1 2 3         6  5  3
4 5 6   ->   21 16  9
7 8 9        45 33 18
Example in 3D:
1 1 1         3  2  1
1 1 1         6  4  2
1 1 1         9  6  3
1 1 1         6  4  2
1 1 1   ->   12  8  4
1 1 1        18 12  6
1 1 1         9  6  3
1 1 1        18 12  6
1 1 1        27 18  9

Flip along the last axis, cumsum along the same, flip it back and finally cumsum along the second last axis onwards until the first axis -
def multidim_cumsum(a):
    out = a[...,::-1].cumsum(-1)[...,::-1]
    for i in range(2,a.ndim+1):
        np.cumsum(out, axis=-i, out=out)
    return out
Sample 2D case run -
In [107]: a
Out[107]:
array([[1, 1, 0],
[1, 1, 1],
[1, 1, 1]])
In [108]: multidim_cumsum(a)
Out[108]:
array([[2, 1, 0],
[5, 3, 1],
[8, 5, 2]])
Sample 3D case run -
In [110]: a
Out[110]:
array([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]])
In [111]: multidim_cumsum(a)
Out[111]:
array([[[ 3, 2, 1],
[ 6, 4, 2],
[ 9, 6, 3]],
[[ 6, 4, 2],
[12, 8, 4],
[18, 12, 6]],
[[ 9, 6, 3],
[18, 12, 6],
[27, 18, 9]]])
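To sanity-check the approach, here is a small sketch that compares the function above with a deliberately slow brute-force loop. The helper name brute_force is hypothetical. It encodes the convention that the sample outputs follow: on the last axis, sum over indices greater than or equal to the current one; on every other axis, sum over indices less than or equal to it.

```python
import numpy as np

def multidim_cumsum(a):
    # Reverse cumulative sum on the last axis, forward cumsum on the others.
    out = a[..., ::-1].cumsum(-1)[..., ::-1]
    for i in range(2, a.ndim + 1):
        np.cumsum(out, axis=-i, out=out)
    return out

def brute_force(a):
    # Hypothetical reference implementation: one explicit slice per element.
    out = np.empty_like(a)
    for idx in np.ndindex(a.shape):
        leading = tuple(slice(0, k + 1) for k in idx[:-1])  # indices <= k
        last = (slice(idx[-1], None),)                      # indices >= k
        out[idx] = a[leading + last].sum()
    return out

a2 = np.arange(1, 10).reshape(3, 3)
a3 = np.random.default_rng(0).integers(0, 10, size=(3, 4, 5))
print(np.array_equal(multidim_cumsum(a2), brute_force(a2)))
print(np.array_equal(multidim_cumsum(a3), brute_force(a3)))
```

The brute-force loop is only for verification; it recomputes a full sum for every element and would be far too slow for large arrays.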

For those who want a "numpy-like" cumsum where the top-left corner is smallest:
def multidim_cumsum(a):
    out = a.cumsum(-1)
    for i in range(2,a.ndim+1):
        np.cumsum(out, axis=-i, out=out)
    return out
Modified from @Divakar's answer (thanks to him!)

Here is a general solution. I'm going by the description, not the examples, i.e. order of vertical display is top down not bottom up:
import itertools as it
import functools as ft
ft.reduce(np.cumsum, it.chain((a[a.ndim*(np.s_[::-1],)],), range(a.ndim)))[a.ndim*(np.s_[::-1],)]
Or in-place:
for i in range(a.ndim):
    b = a.swapaxes(0, i)[::-1]
    b.cumsum(axis=0, out=b)
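For readers who find the reduce one-liner dense, the same "reverse every axis, cumsum every axis, reverse back" idea can be sketched with np.flip. The function name suffix_cumsum is an illustrative assumption. It follows the written description (indices greater than or equal to the current one on every axis), not the displayed examples.

```python
import numpy as np

def suffix_cumsum(a):
    # np.flip with no axis argument reverses all axes (NumPy >= 1.15).
    out = np.flip(a)
    for axis in range(a.ndim):
        out = np.cumsum(out, axis=axis)
    return np.flip(out)

a = np.ones((3, 3), dtype=int)
print(suffix_cumsum(a))
```

With this convention the first element (index 0 on every axis) holds the grand total and the last element holds only itself. Unlike the in-place loop, this version leaves the input untouched.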

Writing a 3d numpy array that is readable in matlab

I'm trying to save a 3D numpy array to my disk so that I can later read it in matlab. I've had some difficulty using numpy.savetxt() on a 3D array, so my solution has been to first convert it to a 1D array using the following code:
import numpy
array = numpy.array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
ndarray = numpy.dstack((array, array, array))
darray = ndarray.reshape(36,1)
numpy.savetxt('test.txt', darray, fmt = '%i')
Then in matlab it can be read with the following code:
file = fopen('test.txt')
array = fscanf(file, '%f')
My issue now is converting it back to the original shape. Using reshape(array, 3,4,3) yields the following:
ans(:,:,1) =
0 1 2 3
0 1 2 3
0 1 2 3
ans(:,:,2) =
0 1 1 3
0 1 1 3
0 1 1 3
ans(:,:,3) =
3 1 3 1
3 1 3 1
3 1 3 1
I've tried to transpose the 1D matlab array, then use reshape() but get the same array.
What matlab function can I apply to achieve my original python array?

You want to permute the dimensions. In numpy this is transpose. There are two complications - the 'F' order of MATLAB matrices, and the display pattern, using blocks on the last dimension (which is the outer one with F order). Jump to the end of this answer for details.
===
In [72]: arr = np.array([[0, 1, 2, 3],
...: [0, 1, 1, 3],
...: [3, 1, 3, 1]])
...:
In [80]: np.dstack((arr,arr+1))
Out[80]:
array([[[0, 1],
[1, 2],
[2, 3],
[3, 4]],
[[0, 1],
[1, 2],
[1, 2],
[3, 4]],
[[3, 4],
[1, 2],
[3, 4],
[1, 2]]])
In [81]: np.dstack((arr,arr+1)).shape
Out[81]: (3, 4, 2)
In [75]: from scipy.io import loadmat, savemat
In [76]: pwd
Out[76]: '/home/paul/mypy'
In [83]: savemat('test3',{'arr':arr, 'arr3':arr3})
In Octave
>> load 'test3.mat'
>> arr
arr =
0 1 2 3
0 1 1 3
3 1 3 1
>> arr3
arr3 =
ans(:,:,1) =
0 1 2 3
0 1 1 3
3 1 3 1
ans(:,:,2) =
1 2 3 4
1 2 2 4
4 2 4 2
>> size(arr3)
ans =
3 4 2
Back in numpy I can display the array as two 3x4 blocks with:
In [95]: arr3[:,:,0]
Out[95]:
array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
In [96]: arr3[:,:,1]
Out[96]:
array([[1, 2, 3, 4],
[1, 2, 2, 4],
[4, 2, 4, 2]])
These arrays, ravelled to 1d (showing in effect the layout of values in the underlying databuffer):
In [100]: arr.ravel()
Out[100]: array([0, 1, 2, 3, 0, 1, 1, 3, 3, 1, 3, 1])
In [101]: arr3.ravel()
Out[101]:
array([0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 1, 2])
The corresponding ravel in Octave:
>> arr(:).'
ans =
0 0 3 1 1 1 2 1 3 3 3 1
>> arr3(:).'
ans =
0 0 3 1 1 1 2 1 3 3 3 1 1 1 4 2 2 2 3 2 4 4 4 2
MATLAB uses F (Fortran) order, with the first dimension changing fastest. Thus it is natural to display blocks arr(:,:,i). You can specify order='F' when creating and working with numpy arrays, but it can be tricky keeping the order straight, especially when working with 3d. loadmat/savemat try to do some of the reordering for us. For example, a 2d MATLAB matrix loads as an order F array in numpy.
In [107]: np.array([0,0,3,1,1,1,2,1,3,3,3,1])
Out[107]: array([0, 0, 3, 1, 1, 1, 2, 1, 3, 3, 3, 1])
In [108]: np.array([0,0,3,1,1,1,2,1,3,3,3,1]).reshape(4,3)
Out[108]:
array([[0, 0, 3],
[1, 1, 1],
[2, 1, 3],
[3, 3, 1]])
In [109]: np.array([0,0,3,1,1,1,2,1,3,3,3,1]).reshape(4,3).T
Out[109]:
array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
In [111]: np.array([0,0,3,1,1,1,2,1,3,3,3,1]).reshape((3,4),order='F')
Out[111]:
array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
It might be easier to keep track of shapes with this array:
In [112]: arr3 = np.arange(2*3*4).reshape(2,3,4)
In [113]: arr3f = np.arange(2*3*4).reshape(2,3,4, order='F')
In [114]: arr3
Out[114]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [115]: arr3f
Out[115]:
array([[[ 0, 6, 12, 18],
[ 2, 8, 14, 20],
[ 4, 10, 16, 22]],
[[ 1, 7, 13, 19],
[ 3, 9, 15, 21],
[ 5, 11, 17, 23]]])
In [116]: arr3f.ravel()
Out[116]:
array([ 0, 6, 12, 18, 2, 8, 14, 20, 4, 10, 16, 22, 1, 7, 13, 19, 3,
9, 15, 21, 5, 11, 17, 23])
In [117]: arr3f.ravel(order='F')
Out[117]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
In [118]: savemat('test3',{'arr3':arr3, 'arr3f':arr3f})
In Octave:
>> arr3
arr3 =
ans(:,:,1) =
0 4 8
12 16 20
ans(:,:,2) =
1 5 9
13 17 21
....
>> arr3f
arr3f =
ans(:,:,1) =
0 2 4
1 3 5
ans(:,:,2) =
6 8 10
7 9 11
...
>> arr3.ravel()'
error: int32 matrix cannot be indexed with .
>> arr3(:)'
ans =
Columns 1 through 20:
0 12 4 16 8 20 1 13 5 17 9 21 2 14 6 18 10 22 3 15
Columns 21 through 24:
7 19 11 23
>> arr3f(:)'
ans =
Columns 1 through 20:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Columns 21 through 24:
20 21 22 23
arr3f still looks 'messed up' when printed by blocks, but when raveled we see that the values are in the same F order. That's also evident if we print a 'block' along the last dimension of the numpy array:
In [119]: arr3f[:,:,0]
Out[119]:
array([[0, 2, 4],
[1, 3, 5]])
So to match up numpy and matlab we have to keep 2 things straight - the order, and the block display style.
My MATLAB is rusty, but I found permute, which is similar to np.transpose. Using that to reorder the dimensions:
>> permute(arr3,[3,2,1])
ans =
ans(:,:,1) =
0 4 8
1 5 9
2 6 10
3 7 11
ans(:,:,2) =
12 16 20
13 17 21
14 18 22
15 19 23
>> permute(arr3,[3,2,1])(:)'
ans =
Columns 1 through 20:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Columns 21 through 24:
20 21 22 23
The equivalent transpose in numpy:
In [121]: arr3f.transpose(2,1,0).ravel()
Out[121]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
(Sorry for the rambling answer. I may go back and edit it. Hopefully it gives you something to work with.)
===
Let's try to apply that rambling more explicitly to your case
In [122]: x = np.array([[0, 1, 2, 3],
...: [0, 1, 1, 3],
...: [3, 1, 3, 1]])
...:
In [123]: x3 = np.dstack((x,x,x))
In [125]: dx3 = x3.reshape(36,1)
In [126]: np.savetxt('test3.txt',dx3, fmt='%i')
In [127]: cat test3.txt
0
0
0
....
3
3
1
1
1
In Octave
>> file = fopen('test3.txt')
file = 21
>> array = fscanf(file,'%f')
array =
0
0
....
>> reshape(array,3,4,3)
ans =
ans(:,:,1) =
0 1 2 3
0 1 2 3
0 1 2 3
ans(:,:,2) =
0 1 1 3
0 1 1 3
0 1 1 3
ans(:,:,3) =
3 1 3 1
3 1 3 1
3 1 3 1
and with the permutation:
>> permute(reshape(array,3,4,3),[3,2,1])
ans =
ans(:,:,1) =
0 1 2 3
0 1 1 3
3 1 3 1
ans(:,:,2) =
0 1 2 3
0 1 1 3
3 1 3 1
ans(:,:,3) =
0 1 2 3
0 1 1 3
3 1 3 1
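The reshape-then-permute recipe can be checked from the numpy side alone. The sketch below emulates MATLAB's column-major reshape with order='F'; it is an emulation under that assumption and does not call MATLAB. It also shows an alternative: flatten in F order before writing, so that a plain reshape(array, 3, 4, 3) on the MATLAB side should be enough. The variable names are illustrative.

```python
import numpy as np

x = np.array([[0, 1, 2, 3],
              [0, 1, 1, 3],
              [3, 1, 3, 1]])
nd = np.dstack((x, x + 1, x + 2))          # shape (3, 4, 3), distinct layers

# Option 1: C-order file (what reshape(36, 1) writes).
# MATLAB side: permute(reshape(array, 3, 4, 3), [3, 2, 1]).
# In general the reshape sizes are the numpy shape reversed.
flat_c = nd.ravel()
restored_c = flat_c.reshape(nd.shape[::-1], order='F').transpose(2, 1, 0)

# Option 2: write the values in F order instead.
# MATLAB side: reshape(array, 3, 4, 3), with no permute.
flat_f = nd.ravel(order='F')
restored_f = flat_f.reshape(nd.shape, order='F')

print(np.array_equal(restored_c, nd), np.array_equal(restored_f, nd))
```

Option 2 keeps the MATLAB code simpler, at the cost of remembering that the text file no longer matches numpy's default flattening.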

Dynamically partition a 2d tensor into multiple tensors in tensorflow

Given a 2d tensor (matrix), I would like to partition it into several small ones with equal size. You can regard it as the preprocessing of the max pooling. For instance,
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
3 4 5 6 7 8 9 10
4 5 6 7 8 9 10 11
Given a dynamic desired_size of 2 * 4, the outputs should be:
1 2 3 4
2 3 4 5
5 6 7 8
6 7 8 9
3 4 5 6
4 5 6 7
7 8 9 10
8 9 10 11
I have studied slice and gather for a while, but I still have no idea how to do it. Could you tell me how to get that? Thanks in advance!

You could use tf.extract_image_patches, even though it turns out somewhat verbose:
import numpy as np
import tensorflow as tf
x = tf.constant(np.arange(8) + np.arange(1,5)[:,np.newaxis])
e = tf.extract_image_patches(x[tf.newaxis,:,:,tf.newaxis],
[1, 2, 4, 1], [1, 2, 4, 1], [1, 1, 1, 1], padding='VALID')
e = tf.reshape(e, [-1, 2, 4])
sess = tf.InteractiveSession()
e.eval()
# returns
# array([[[ 1, 2, 3, 4],
# [ 2, 3, 4, 5]],
# [[ 5, 6, 7, 8],
# [ 6, 7, 8, 9]],
# [[ 3, 4, 5, 6],
# [ 4, 5, 6, 7]],
# [[ 7, 8, 9, 10],
# [ 8, 9, 10, 11]]])
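For comparison, here is a NumPy sketch of the same non-overlapping partition using only reshape and an axis swap. It is an analogue and does not use TensorFlow: the helper name partition_blocks is hypothetical, and whether the same reshape/transpose sequence is convenient with a dynamic desired_size in TensorFlow is an assumption not tested here.

```python
import numpy as np

def partition_blocks(x, block_shape):
    """Split a 2-D array into non-overlapping blocks, row-major over blocks."""
    h, w = block_shape
    rows, cols = x.shape
    if rows % h or cols % w:
        raise ValueError("matrix shape must be divisible by block shape")
    # (block_row, row_in_block, block_col, col_in_block)
    #   -> (block_row, block_col, row_in_block, col_in_block)
    return (x.reshape(rows // h, h, cols // w, w)
             .swapaxes(1, 2)
             .reshape(-1, h, w))

x = np.arange(8) + np.arange(1, 5)[:, np.newaxis]
blocks = partition_blocks(x, (2, 4))
print(blocks)
```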

I tried with tf.split():
num_splits = 2
desired_size = (2, 4)
A = tf.constant(a)
C = tf.concat(tf.split(A, desired_size[0], 0),1)
D = tf.reshape(tf.concat(tf.split(C, num_splits*desired_size[0], 1), 0), (-1, desired_size[0], desired_size[1]))
#The result
[[[ 1 2 3 4]
[ 2 3 4 5]]
[[ 5 6 7 8]
[ 6 7 8 9]]
[[ 3 4 5 6]
[ 4 5 6 7]]
[[ 7 8 9 10]
[ 8 9 10 11]]]
# For num_splits = 4, desired_size = (2, 2) you get
[[[ 1 2]
[ 2 3]]
[[ 3 4]
[ 4 5]]
[[ 5 6]
[ 6 7]]
[[ 7 8]
[ 8 9]]
[[ 3 4]
[ 4 5]]
[[ 5 6]
[ 6 7]]
[[ 7 8]
[ 8 9]]
[[ 9 10]
[10 11]]]

In numpy, how to efficiently list all fixed-size submatrices?

I have an arbitrary NxM matrix, for example:
1 2 3 4 5 6
7 8 9 0 1 2
3 4 5 6 7 8
9 0 1 2 3 4
I want to get a list of all 3x3 submatrices in this matrix:
1 2 3 2 3 4 0 1 2
7 8 9 ; 8 9 0 ; ... ; 6 7 8
3 4 5 4 5 6 2 3 4
I can do this with two nested loops:
rows, cols = input_matrix.shape
patches = []
# "+ 1" so that the last valid window position is included
for row in np.arange(0, rows - 3 + 1):
    for col in np.arange(0, cols - 3 + 1):
        patches.append(input_matrix[row:row+3, col:col+3])
But for a large input matrix, this is slow. Is there a way to do this faster with numpy?
I've looked at np.split, but that gives me non-overlapping sub-matrices, whereas I want all possible submatrices, regardless of overlap.

You want a windowed view:
import numpy as np
from numpy.lib.stride_tricks import as_strided
arr = np.arange(1, 25).reshape(4, 6) % 10
sub_shape = (3, 3)
view_shape = tuple(np.subtract(arr.shape, sub_shape) + 1) + sub_shape
arr_view = as_strided(arr, view_shape, arr.strides * 2)
arr_view = arr_view.reshape((-1,) + sub_shape)
>>> arr_view
array([[[1, 2, 3],
        [7, 8, 9],
        [3, 4, 5]],
       [[2, 3, 4],
        [8, 9, 0],
        [4, 5, 6]],
       ...
       [[9, 0, 1],
        [5, 6, 7],
        [1, 2, 3]],
       [[0, 1, 2],
        [6, 7, 8],
        [2, 3, 4]]])
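The good part of doing it like this is that the as_strided view does not copy any data; you are simply accessing the data of your original array in a different way. For large arrays this can result in tremendous memory savings. Note that the final reshape to a flat list of submatrices cannot be expressed as a view of the overlapping windows, so that step does make a copy; skip it and index the 4-D view directly if memory matters.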
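On NumPy 1.20 or later, the same windowed view can be sketched with sliding_window_view, which works out the shape and strides itself and returns a read-only view. This is an alternative to the as_strided call above and not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(1, 25).reshape(4, 6) % 10

# One 3x3 window per valid top-left corner: shape (2, 4, 3, 3), no copy.
windows = sliding_window_view(arr, (3, 3))

# Flattening the window grid into a list of patches does copy the data.
patches = windows.reshape(-1, 3, 3)

print(windows.shape, patches.shape)
print(np.shares_memory(windows, arr))
```

Because the windows overlap in memory, writing through a hand-built as_strided view can have surprising effects; the read-only default of sliding_window_view guards against that.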
