Is it possible to have a 3-D record array in numpy? (Maybe this is not possible, or there is simply an easier way to do things -- I am open to other options.)
Assume I want an array that holds data for 3 variables (say temp, precip, humidity), where each variable's data is a 2-D array of 2 years (rows) by 6 months (columns). I could create that like this:
>>> import numpy as np
>>> d = np.array(np.arange(3*2*6).reshape(3,2,6))
>>> d
# comments added for explanation...
#          jan feb mar apr may jun
array([[[ 0,  1,  2,  3,  4,  5],    # yr1 temp
        [ 6,  7,  8,  9, 10, 11]],   # yr2 temp
       [[12, 13, 14, 15, 16, 17],    # yr1 precip
        [18, 19, 20, 21, 22, 23]],   # yr2 precip
       [[24, 25, 26, 27, 28, 29],    # yr1 humidity
        [30, 31, 32, 33, 34, 35]]])  # yr2 humidity
I'd like to be able to type:
>>> d['temp']
and get this (the first "page" of the data):
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
or:
>>> d['Jan']  # assume months are Jan-Jun
and get this:
array([[ 0,  6],
       [12, 18],
       [24, 30]])
I have been through this: http://www.scipy.org/RecordArrays a number of times, but I don't see how to set up what I am after.
Actually, you can do something similar to this with structured arrays, but it's generally more trouble than it's worth.
What you want is basically labeled axes.
Pandas (which is built on top of numpy) provides what you want, and is a better choice if you want this type of indexing. There's also Larry (for labeled array), but it's largely been superseded by Pandas.
Also, you should be looking at the numpy documentation for structured arrays for info on this, rather than an FAQ. The numpy documentation has considerably more information. http://docs.scipy.org/doc/numpy/user/basics.rec.html
If you do want to take a pure-numpy route, note that structured arrays can contain multidimensional arrays. (Note the shape argument when specifying a dtype.) This will rapidly get more complex than it's worth, though.
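As a sketch of that pure-numpy route (the field names and dtype here are assumptions for illustration, not something the question specifies):

```python
import numpy as np

# One record whose fields each hold a 2x6 sub-array (2 years x 6 months).
dt = np.dtype([('temp', 'i8', (2, 6)),
               ('precip', 'i8', (2, 6)),
               ('humidity', 'i8', (2, 6))])

d = np.zeros((), dtype=dt)
d['temp'] = np.arange(12).reshape(2, 6)
d['precip'] = np.arange(12, 24).reshape(2, 6)
d['humidity'] = np.arange(24, 36).reshape(2, 6)

print(d['temp'])  # the 2x6 "page" for temperature
```

Notice this only labels the variable axis: there is no built-in way to ask for `d['Jan']`, which is exactly why labeled-axis libraries are the better fit here.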
In pandas terminology, what you want is a Panel. You should probably get familiar with DataFrames first, though. (Note that Panel has since been removed from pandas; a MultiIndex DataFrame or xarray is the modern replacement.)
Here's how you'd do it with Pandas:
import numpy as np
import pandas
d = np.array(np.arange(3*2*6).reshape(3,2,6))
dat = pandas.Panel(d, items=['temp', 'precip', 'humidity'],
                   major_axis=['yr1', 'yr2'],
                   minor_axis=['jan', 'feb', 'mar', 'apr', 'may', 'jun'])
print(dat['temp'])
print(dat.major_xs('yr1'))
print(dat.minor_xs('may'))
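Since Panel was removed in pandas 0.25, here is a sketch of the same labeled indexing with a MultiIndex DataFrame, which works on current pandas (the labels are from the question; the reshape assumes variables vary slowest, as in the original array):

```python
import numpy as np
import pandas as pd

d = np.arange(3 * 2 * 6).reshape(3, 2, 6)

index = pd.MultiIndex.from_product(
    [['temp', 'precip', 'humidity'], ['yr1', 'yr2']],
    names=['variable', 'year'])
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun']

# Flatten the 3x2x6 cube into a (variable, year) x month table.
df = pd.DataFrame(d.reshape(6, 6), index=index, columns=months)

print(df.loc['temp'])              # 2x6 block, like dat['temp']
print(df.xs('yr1', level='year'))  # 3x6 block, like dat.major_xs('yr1')
print(df['may'])                   # one month, like dat.minor_xs('may')
```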
This might be a dumb question, but I'm trying to construct a model that filters its inputs before feeding the filtered output to another network.
For example, say I have an image that I would like to match against a database of about 100 pictures. I would apply a first network that outputs the top 10 pictures most likely to be correct matches. Afterwards, I would apply a second network to re-match those top 10 pictures.
INPUT --> | NETWORK 1 | --> FILTERED OUTPUT --> | NETWORK 2 | --> FINAL OUTPUT
I'm wondering if there is a way to accomplish this sort of filtering behaviour, where the filtered output is fed to the second model like that.
You could take a look at Boolean index arrays in numpy:
>>> import numpy as np
>>> x = np.array(range(20))
>>> x
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
       10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> x[x > 10]
array([11, 12, 13, 14, 15, 16, 17, 18, 19])
x > 10 returns an array of 20 booleans, so you can try something like this:
x = pic_arr[network1(pic_arr)]
network2(x)
where pic_arr is an array holding your pictures and network1 returns an array of booleans indicating which pictures to select.
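A minimal sketch of that two-stage pipeline (the network functions here are stand-in placeholders, not real models):

```python
import numpy as np

def network1(pics):
    # Placeholder "network 1": return a boolean mask selecting pictures
    # whose mean intensity exceeds a threshold.
    return pics.mean(axis=(1, 2)) > 0.5

def network2(pics):
    # Placeholder "network 2": re-rank the surviving pictures
    # by mean intensity, best first.
    order = np.argsort(pics.mean(axis=(1, 2)))[::-1]
    return pics[order]

rng = np.random.default_rng(0)
pic_arr = rng.random((100, 8, 8))      # 100 fake 8x8 "pictures"

filtered = pic_arr[network1(pic_arr)]  # boolean-mask filtering
final = network2(filtered)
print(filtered.shape, final.shape)
```

In a real setup, network1 would score every picture and the mask would keep the top 10 scores; the boolean-indexing step stays the same.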
I have a 2D array like the one below, with 6 rows (1, 2, ..., 6) and 4 columns (the letters).
1 - A,B,C,D
2 - E,F,G,H
3 - I,J,K,L
4 - M,N,O,P
5 - Q,R,S,T
6 - U,V,W,X
I need to reshape this array to a 2x3 array, as shown below, so that if I access "1" it returns the alphabets A,B,C,D:
1, 2, 3
4, 5, 6
After reshaping, I will save the array as a multi-band tiff.
I know it's a very simple task and I could do it by creating an empty array and filling it with for loops, but I want to do it with the reshape function or some other simple method.
Kindly help me, guys.
Your array format is a bit misleading. However, here's a minimal example I could prepare:
import numpy as np

# example array (the original used pd.np, which is deprecated;
# import numpy directly instead)
arr = np.random.randint(1, 30, 30).reshape(-1, 5)
array([[25, 13, 24, 10, 14],
       [13, 11,  2, 24, 20],
       [16, 28,  5, 12, 24],
       [ 2, 21, 24, 29, 21],
       [21,  5, 18, 23, 23],
       [22,  9, 10, 29,  9]])
# reshape the array by taking the first value from each row
np.apply_along_axis(lambda x: x[0], 1, arr).reshape(-1, 3)
array([[25, 13, 16],
       [ 2, 21, 22]])
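For what it's worth, the same first-column extraction can be done with plain slicing, which is simpler than apply_along_axis (using a deterministic array here so the result is checkable):

```python
import numpy as np

arr = np.arange(1, 31).reshape(-1, 5)  # 6x5: rows [1..5], [6..10], ...
out = arr[:, 0].reshape(-1, 3)         # first value of each row, as 2x3
print(out)
# [[ 1  6 11]
#  [16 21 26]]
```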
I got the answer from this link:
how to save an array representing an image with 40 band to a .tif file
Actually, I was saving the tiff file with skimage's imsave command, and skimage can only handle 4-channel data. The following command solved my issue:
tifffile.imsave("y.tif", x, planarconfig='contig')
i.e. band dimension last for 'contig'.
Sorry for the badly explained title. I am trying to parallelise part of my code and got stuck on a dot product. I am looking for an efficient way to do what the code below does; I'm sure there is a simple linear-algebra solution, but I'm very stuck:
import numpy as np

puy = np.arange(8).reshape(2, 4)
puy2 = np.arange(12).reshape(3, 4)

print(puy, '\n')
print(puy2.T)

zz = np.zeros([4, 2, 3])
for i in range(4):
    zz[i, :, :] = np.dot(np.array([puy[:, i]]).T,
                         np.array([puy2.T[i, :]]))
One way would be to use np.einsum, which allows you to specify what you want to happen to the indices:
>>> np.einsum('ik,jk->kij', puy, puy2)
array([[[ 0,  0,  0],
        [ 0, 16, 32]],

       [[ 1,  5,  9],
        [ 5, 25, 45]],

       [[ 4, 12, 20],
        [12, 36, 60]],

       [[ 9, 21, 33],
        [21, 49, 77]]])
>>> np.allclose(np.einsum('ik,jk->kij', puy, puy2), zz)
True
Here's another way with broadcasting -
(puy[None,...]*puy2[:,None,:]).T
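Both the einsum and the broadcasting versions reproduce the original loop; a quick check:

```python
import numpy as np

puy = np.arange(8).reshape(2, 4)
puy2 = np.arange(12).reshape(3, 4)

# The original loop
zz = np.zeros([4, 2, 3])
for i in range(4):
    zz[i, :, :] = np.dot(np.array([puy[:, i]]).T,
                         np.array([puy2.T[i, :]]))

via_einsum = np.einsum('ik,jk->kij', puy, puy2)
via_broadcast = (puy[None, ...] * puy2[:, None, :]).T

print(np.allclose(zz, via_einsum) and np.allclose(zz, via_broadcast))  # True
```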
I am trying to make a column graph where the y-axis is the mean grain size, the x-axis is the distance along the transect, and each series is a date and/or number value (it doesn't really matter).
I have been trying a few different methods in Excel 2010 but I cannot figure it out. My hope is that, let's say at the first location, 9, there will be three columns, and then at 12 there will be two columns. If it matters at all, let's say the total distance is 50. The result of this data should have 7 sets of columns along the transect/x-axis.
I have tried to do this using python but my coding knowledge is close to nil. Here is my code so far:
import numpy as np
import matplotlib.pyplot as plt
grainsize = [0.7912, 0.513, 0.4644, 1.0852, 1.8515, 1.812, 6.371, 1.602, 1.0251, 5.6884, 0.4166, 24.8669, 0.5223, 37.387, 0.5159, 0.6727]
series = [2, 3, 4, 1, 4, 2, 3, 4, 1, 4, 1, 4, 1, 4, 1, 4]
distance = [9, 9, 9, 12, 12, 15, 15, 15, 17, 17, 25, 25, 32.5, 32.5, 39.5, 39.5]
If someone happen to know of a code to use, it would be very helpful. A recommendation for how to do this in Excel would be awesome too.
There's a plotting library called seaborn, built on top of matplotlib, that does this in one line. Your example:
import numpy as np
import seaborn as sns
from matplotlib.pyplot import show
grainsize = [0.7912, 0.513, 0.4644, 1.0852, 1.8515, 1.812, 6.371, 1.602,
             1.0251, 5.6884, 0.4166, 24.8669, 0.5223, 37.387, 0.5159, 0.6727]
series = [2, 3, 4, 1, 4, 2, 3, 4, 1, 4, 1, 4, 1, 4, 1, 4]
distance = [9, 9, 9, 12, 12, 15, 15, 15, 17, 17, 25, 25, 32.5, 32.5, 39.5, 39.5]
ax = sns.barplot(x=distance, y=grainsize, hue=series, palette='muted')
ax.set_xlabel('distance')
ax.set_ylabel('grainsize')
show()
You will be able to do a lot even as a total newbie by editing the many examples in the seaborn gallery. Use them as training wheels: edit only one thing at a time and think about what changes.
How do I replicate this indexing done in MATLAB with Numpy?
X=magic(5);
M=[0,0,1,2,1];
X(M==0,M==2)
that returns:
ans =
     8
    14
I've found that doing this in NumPy is not correct, since it does not give me the same results:
X = np.matrix([[17, 24,  1,  8, 15],
               [23,  5,  7, 14, 16],
               [ 4,  6, 13, 20, 22],
               [10, 12, 19, 21,  3],
               [11, 18, 25,  2,  9]])
M = np.array([0, 0, 1, 2, 1])
X.take([M==0]).take([M==2], axis=1)
since I get:
matrix([[24, 24, 24, 24, 24]])
What is the correct way to logically index with two indices in numpy?
In general there are two ways to interpret X[a, b] when both a and b are arrays (vectors in MATLAB): "inner-style" indexing or "outer-style" indexing.
The designers of MATLAB chose "outer-style" indexing and the designers of numpy chose "inner-style" indexing. To do "outer-style" indexing in numpy one can use:
X[np.ix_(a, b)]   # roughly equivalent to MATLAB's X(a, b)
For completeness, you can do "inner-style" indexing in MATLAB with:
X(sub2ind(size(X), a, b))   % roughly equivalent to numpy's X[a, b]
In short, try X[np.ix_(M == 0, M == 2)].
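Applied to the arrays from the question, this reproduces MATLAB's answer:

```python
import numpy as np

X = np.array([[17, 24,  1,  8, 15],
              [23,  5,  7, 14, 16],
              [ 4,  6, 13, 20, 22],
              [10, 12, 19, 21,  3],
              [11, 18, 25,  2,  9]])
M = np.array([0, 0, 1, 2, 1])

# Rows where M == 0 (the first two), the column where M == 2 (the fourth).
result = X[np.ix_(M == 0, M == 2)]
print(result)
# [[ 8]
#  [14]]
```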