Creating a numpy array and then sorting array - python

I have a problem I can't seem to solve. I need to take a list of strings, calculate some values, and then add each string and its corresponding integer to a numpy array. I've been told to create the numpy array of zeros first, since its length is known in advance, so I can do that. My problem is how to iteratively add each string (names) to the first column and each value (labels) to the second, and then sort the whole array alphabetically by the first column.
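import os
import numpy as np

# inputDirectory is the root folder path, defined elsewhere
fileCount = sum(len(files) for r, d, files in os.walk(inputDirectory))
# np.zeros creates a float64 array, so it cannot hold the string names below
labelArray = np.zeros(shape = (fileCount, 2))
arrayInsertCounter = 0
for label, subDirectory in enumerate(os.listdir(inputDirectory)):
    subDirPath = os.path.join(inputDirectory, subDirectory)
    for name in os.listdir(subDirPath):
        labelArray[arrayInsertCounter] = [name, label]  # fails: a float array cannot store a string
        arrayInsertCounter += 1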

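You could do it in numpy using a structured array:
import numpy as np
# 8 scrambled three-letter labels ('ato', 'jex', 'sni', ...), just test data
labels = list(map(''.join, zip(*3*((chr(ord('a')+(19*i)%24) for i in range(24)),))))
numbers = np.arange(8)
# each record has an object field 'label' and an integer field 'value'
dt = np.dtype([('label', object), ('value', int)])
table = np.empty((8,), dtype = dt)
table['label'] = labels
table['value'] = numbers
print(table)
table.sort()  # in place; compares 'label' first, then 'value'
print(table)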
Output:
#[('ato', 0) ('jex', 1) ('sni', 2) ('dwr', 3) ('mhc', 4) ('vql', 5)
# ('gbu', 6) ('pkf', 7)]
#[('ato', 0) ('dwr', 3) ('gbu', 6) ('jex', 1) ('mhc', 4) ('pkf', 7)
# ('sni', 2) ('vql', 5)]
Edit: How to access individual records:
table[2] = 'new label', 1000
table
# array([('ato', 0), ('dwr', 3), ('new label', 1000), ('jex', 1),
# ('mhc', 4), ('pkf', 7), ('sni', 2), ('vql', 5)],
# dtype=[('label', 'O'), ('value', '<i8')])
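To match the question more closely (fill a preallocated array row by row, then sort by name), here is a minimal sketch. The (name, label) pairs are hypothetical stand-ins for whatever the os.walk loop produces, and the object field is swapped for a fixed-width Unicode field ('U64', an assumed maximum name length) so the whole table is plain numpy data. np.sort with order= selects the field to sort on.

```python
import numpy as np

# Hypothetical (file name, label) pairs standing in for the os.walk results
pairs = [("dog.png", 1), ("cat.png", 0), ("bird.png", 2)]

# 'U64' assumes no name is longer than 64 characters (longer ones would be truncated)
dt = np.dtype([("name", "U64"), ("label", int)])
table = np.zeros(len(pairs), dtype=dt)  # length known up front, like np.zeros in the question

counter = 0
for name, label in pairs:
    table[counter] = (name, label)  # one record per row, assigned as a tuple
    counter += 1

table_sorted = np.sort(table, order="name")  # alphabetical by the 'name' field
print(table_sorted["name"].tolist())   # ['bird.png', 'cat.png', 'dog.png']
print(table_sorted["label"].tolist())  # [2, 0, 1]
```

np.sort returns a sorted copy; table.sort(order="name") would sort in place instead.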

Related

Padding an input vector, a 4-D matrix, using numpy for a convolutional neural network (CNN)

This is the entire code related to my question. You should be able to run it and see the plots by pasting it into your IDE.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)
x_pad = np.pad(x, ((0,0), (2, 2), (2, 2), (0,0)), mode='constant', constant_values = (0,0))
print ("x.shape =\n", x.shape)
print ("x_pad.shape =\n", x_pad.shape)
print ("x[1,1] =\n", x[1,1])
print ("x_pad[1,1] =\n", x_pad[1,1])
fig, axarr = plt.subplots(1, 2)
axarr[0].set_title('x')
axarr[0].imshow(x[0,:,:,0])
axarr[1].set_title('x_pad')
axarr[1].imshow(x_pad[0,:,:,0])
plt.show()
Specifically, my question is related to these two lines of code:
x = np.random.randn(4, 3, 3, 2)
x_pad = np.pad(x, ((0,0), (2, 2), (2, 2), (0,0)), mode='constant', constant_values = (0,0))
I want to pad the 2nd and 3rd dimensions of x. So, I want to pad x.shape[1], which is 3, and x.shape[2], which is also 3. Based on the problem I am solving, x.shape[0] and x.shape[3], which are 4 and 2 respectively, represent something else: x.shape[0] is the number of such 3*3 matrices and x.shape[3] is the number of channels.
My question is about how Python represents this information and how we interpret it. Are these the same?
The statement x = np.random.randn(4, 3, 3, 2) creates a matrix of 4 rows by 3 columns, and each element of this 4*3 matrix is a 3-row by 2-column matrix. That is how I understand Python to represent x and x_pad. Is this understanding correct?
If so, then in the np.pad statement we are padding the number of columns of the outer matrix (the 3 in 4*3) and also the number of rows of the inner matrix (the 3 in 3*2).
But wasn't the 3, 3 in (4, 3, 3, 2) supposed to be part of a single matrix, rather than the columns of the outer matrix and the rows of the inner matrix? I am having trouble visualizing this. Can someone please clarify? Thank you!
These lines:
x = np.random.randn(4, 3, 3, 2)
x_pad = np.pad(x, ((0,0), (2, 2), (2, 2), (0,0)), mode='constant', constant_values = (0,0))
are equivalent to:
x = np.random.randn(4, 3, 3, 2)
x_pad = np.zeros((4, 3+2+2, 3+2+2, 2))
x_pad[:, 2:-2, 2:-2, :] = x
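A quick way to convince yourself of that equivalence is to build both arrays and compare them. This sketch reuses the question's seed and shapes; np.array_equal simply checks the two results element by element.

```python
import numpy as np

np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)

# Pad only axes 1 and 2 with two zeros on each side; axes 0 and 3 are untouched
x_pad = np.pad(x, ((0, 0), (2, 2), (2, 2), (0, 0)), mode="constant", constant_values=0)

# Same result built by hand: allocate zeros, then copy x into the centre
x_manual = np.zeros((4, 3 + 2 + 2, 3 + 2 + 2, 2))
x_manual[:, 2:-2, 2:-2, :] = x

print(x_pad.shape)                      # (4, 7, 7, 2)
print(np.array_equal(x_pad, x_manual))  # True
```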
You could interpret a 4-D array as a 2-D array of 2-D arrays if that fits whatever the data represents for you, but numpy internally stores an array's values in one flat block of memory. For a C-contiguous array (the default), x[i,j,k,l] refers to data[l+n3*(k + n2*(j + n1*i))], where n1, n2 and n3 are the lengths of axes 1, 2 and 3.
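The offset formula can be checked directly. This sketch assumes a freshly created (therefore C-contiguous) array, uses ravel() to expose the flat sequence of values, and compares the hand-computed offset with np.ravel_multi_index, which performs the same row-major calculation.

```python
import numpy as np

n0, n1, n2, n3 = 4, 3, 3, 2
x = np.arange(n0 * n1 * n2 * n3).reshape(n0, n1, n2, n3)  # C-contiguous by default
data = x.ravel()  # the flat sequence of values in memory order

i, j, k, l = 2, 1, 0, 1
offset = l + n3 * (k + n2 * (j + n1 * i))
print(offset)                                               # 43
print(x[i, j, k, l] == data[offset])                        # True
print(offset == np.ravel_multi_index((i, j, k, l), x.shape))  # True
```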
Visualizing 4-D (and higher) arrays is very difficult for humans. You just have to keep track of the indices for the four axes when you deal with such arrays.
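In practice, keeping track of the indices mostly means knowing what each slice keeps. Assuming the question's reading of the axes (number of matrices, rows, columns, channels), the shapes below show which pieces the question's plotting code looks at:

```python
import numpy as np

x = np.random.randn(4, 3, 3, 2)  # axes: (matrix, row, column, channel), per the question

one_example = x[0]          # every row, column and channel of matrix 0
one_image = x[0, :, :, 0]   # matrix 0, channel 0: the 3x3 array passed to imshow
one_pixel = x[:, 1, 1, :]   # position (1, 1) in every matrix and channel

print(one_example.shape)  # (3, 3, 2)
print(one_image.shape)    # (3, 3)
print(one_pixel.shape)    # (4, 2)
```

Padding axes 1 and 2 therefore grows every 3x3 image to 7x7 while leaving the count of matrices and channels alone.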

Why is my array coming out as shape: (6, 1, 2) when it is made of two (6, ) arrays?

I'm trying to import data from an Excel file and create an array pos with 6 rows and 2 columns. Later, when I index the array with pos[0][1], I get an error: IndexError: index 1 is out of bounds for axis 0 with size 1.
I looked at the shape of my array and it returns (6, 1, 2). I was expecting (6, 2). The individual shapes of the arrays that make up pos are (6, ) and (6, ), which I don't really understand: why not (6, 1)? I don't quite understand the difference between the two.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plot  # assumed aliases, matching the names used below
import cartopy.crs as crs
import cartopy.feature as cf

irmadata = pd.read_excel("DangerZone.xlsx")
irma_lats = irmadata["Average Latitude"].tolist()
irma_longs = irmadata["Average Longitude"].tolist()
shipdata = pd.read_excel("ShipPositions.xlsx")
ship_lats = shipdata["Latitude"].to_numpy() ## these are the (6, ) arrays
ship_longs = shipdata["Longitude"].to_numpy()
pos = np.array([[ship_lats], [ship_longs]], dtype = "d").T
extent = [-10, -90, 0, 50]
ax = plot.axes(projection = crs.PlateCarree())
ax.stock_img()
ax.add_feature(cf.COASTLINE)
ax.coastlines(resolution = "50m")
ax.set_title("Base Map")
ax.set_extent(extent)
ax.plot(irma_longs, irma_lats)
for i in range(len(ship_lats)):
    lat = pos[i][0]
    lon = pos[i][1] ## This is where my error occurs
    ax.plot(lon, lat, 'o', label = "Ship " + str(i+1))
plot.show()
Obviously, I could just index pos[0][0][1], but I'd like to know why I'm getting this issue. I'm coming from MATLAB, so I suppose a lot of my issues will stem from differences in how numpy and MATLAB work; any tips would be appreciated!
I solved it: I didn't realise I could just use single square brackets to combine my two column arrays. Changing pos = np.array([[ship_lats], [ship_longs]], dtype = "d").T to pos = np.array([ship_lats, ship_longs], dtype = "d").T worked. The extra brackets wrapped each (6, ) array in another list, so np.array built an array of shape (2, 1, 6), and .T reverses all the axes, giving (6, 1, 2).
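The shapes involved can be reproduced without the Excel files. In this sketch two np.arange arrays are hypothetical stand-ins for the (6, ) Latitude and Longitude columns, and np.column_stack is shown as one alternative way to build the (6, 2) array. For MATLAB users: a NumPy array of shape (6, ) is genuinely 1-D (MATLAB arrays always have at least two dimensions), and pos[i, 1] indexes a single element in one step.

```python
import numpy as np

ship_lats = np.arange(6.0)          # hypothetical stand-in for the (6,) Latitude column
ship_longs = np.arange(6.0) + 100   # hypothetical stand-in for the (6,) Longitude column

nested = np.array([[ship_lats], [ship_longs]])
print(nested.shape, nested.T.shape)   # (2, 1, 6) (6, 1, 2)

pos = np.array([ship_lats, ship_longs]).T
print(pos.shape)                      # (6, 2)

# Alternative that stacks the 1-D arrays as columns directly
pos2 = np.column_stack((ship_lats, ship_longs))
print(np.array_equal(pos, pos2))      # True

# (6,) is 1-D; (6, 1) is a 2-D column, e.g. obtained with reshape
print(ship_lats.shape, ship_lats.reshape(-1, 1).shape)  # (6,) (6, 1)

print(pos[2, 1])  # 102.0: row 2, column 1 in a single index
```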

How to create a numpy array from two lists of tuples, but only when the tuples are the same

For image analysis, I loaded a float image with SciPy's imread.
Next, I used SciPy's argrelmax to search for local maxima along axes 0 and 1 and stored the results:
data = msc.imread(prediction1, 'F')
datarelmax_0 = almax(data, axis = 0)
datarelmax_1 = almax(data, axis = 1)
How can I create a numpy array from both results that contains only the tuples present in both?
Edit:
argrelmax returns a tuple of two arrays, for example:
datarelmax_0 = ([1,2,3,4,5],[6,7,8,9,10])
datarelmax_1 = ([11,2,13,14,5], [11,7,13,14,10])
I want to create a numpy array that looks like:
result_ar = [(2, 7), (5, 10)]
How about this "naive" way?
import numpy as np
result = np.array([x for x in datarelmax_0 if x in datarelmax_1])
Pretty simple. Maybe there's a better/faster/fancier way by using some numpy methods but this should work for now.
EDIT:
To answer your edited question, you can do this:
result = [x for x in zip(datarelmax_0[0], datarelmax_0[1]) if x in zip(datarelmax_1[0], datarelmax_1[1])]
This gives you
result = [(2, 7), (5, 10)]
If you convert it to a numpy array by using
result = np.array(result)
it looks like this:
result = array([[ 2, 7],
[ 5, 10]])
In case you are interested in what zip does (in Python 3 it returns an iterator, so wrap it in list() to see the pairs):
>>> list(zip(datarelmax_0[0], datarelmax_0[1]))
[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]
>>> list(zip(datarelmax_1[0], datarelmax_1[1]))
[(11, 11), (2, 7), (13, 13), (14, 14), (5, 10)]
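The list comprehension rescans the second list for every pair, which gets slow for large images. A sketch of two faster alternatives, using the example data from the question: a set intersection of the coordinate pairs, and a pure-NumPy variant that encodes each (row, column) pair as a single flat index. The NumPy variant assumes the image shape is known; (16, 16) here is hypothetical and only needs to exceed the largest index.

```python
import numpy as np

# Example data from the question: argrelmax returns (row_indices, col_indices)
datarelmax_0 = (np.array([1, 2, 3, 4, 5]), np.array([6, 7, 8, 9, 10]))
datarelmax_1 = (np.array([11, 2, 13, 14, 5]), np.array([11, 7, 13, 14, 10]))

# Set intersection of (row, col) pairs; sets are unordered, so sort the result
pairs_0 = set(zip(datarelmax_0[0].tolist(), datarelmax_0[1].tolist()))
pairs_1 = set(zip(datarelmax_1[0].tolist(), datarelmax_1[1].tolist()))
result = np.array(sorted(pairs_0 & pairs_1))
print(result.tolist())  # [[2, 7], [5, 10]]

# Pure-NumPy variant: map each pair to one flat index, intersect, map back
shape = (16, 16)  # hypothetical image shape
flat_0 = np.ravel_multi_index(datarelmax_0, shape)
flat_1 = np.ravel_multi_index(datarelmax_1, shape)
common = np.intersect1d(flat_0, flat_1)  # sorted, unique
result_np = np.column_stack(np.unravel_index(common, shape))
print(result_np.tolist())  # [[2, 7], [5, 10]]
```

With a real image, data.shape would be the natural value for shape.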

How to concatenate two parts of an array to make a new array

Here I need to get a 2D array
import numpy as np

x = np.zeros((10, 4))
y = np.ones((10, 4))
c = np.array([x[0:3, :], y[0:3, :]])
print(c.shape)  # I get (2, 3, 4)
np.reshape(c, (6, 4))
print(c.shape)  # I get (2, 3, 4)
I need to get a 2D array of 6 rows by 4 columns.
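np.concatenate((x[0:3,:], y[0:3,:]), axis=0)
Or
np.vstack((x[0:3,:], y[0:3,:]))
Your reshape attempt left c unchanged because np.reshape returns a new array rather than modifying its argument; c = np.reshape(c, (6, 4)) would also give a (6, 4) array.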
The most concise solution is probably
c = np.r_[x[:3], y[:3]]
(The most concise solution isn't necessarily the most readable solution.)
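The options above can be compared side by side. This sketch checks that concatenate, vstack, np.r_ and an assigned np.reshape all produce the same 6 x 4 array (three rows of zeros followed by three rows of ones).

```python
import numpy as np

x = np.zeros((10, 4))
y = np.ones((10, 4))

a = np.concatenate((x[0:3, :], y[0:3, :]), axis=0)
b = np.vstack((x[0:3, :], y[0:3, :]))
c = np.r_[x[:3], y[:3]]
# np.reshape returns a new array, so its result must be assigned
d = np.reshape(np.array([x[0:3, :], y[0:3, :]]), (6, 4))

print(a.shape)  # (6, 4)
print(all(np.array_equal(a, other) for other in (b, c, d)))  # True
```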

Cartesian product of two 2d arrays

Suppose I have a 2d image, with associated coordinates (x,y) at every point.
I want to find the inner product of the position vector at every point $i$ with every other point $j$. Essentially, the Cartesian product of two 2d arrays.
What would be the fastest way to accomplish this, in Python?
My current implementation looks something like this:
from functools import reduce
import numpy as np

def cartesian_product(arrays):
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    rows, cols = reduce(np.multiply, broadcasted[0].shape), len(broadcasted)
    out = np.empty(rows * cols, dtype=broadcasted[0].dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T

def inner_product():
    x, y = np.meshgrid(np.arange(4), np.arange(4))
    cart_x = cartesian_product([x.flatten(), x.flatten()])
    cart_y = cartesian_product([y.flatten(), y.flatten()])
    Nx = x.shape[0]
    xx = (cart_x[:, 0] * cart_x[:, 1]).reshape((Nx**2, Nx, Nx))
    yy = (cart_y[:, 0] * cart_y[:, 1]).reshape((Nx**2, Nx, Nx))
    inner_products = xx + yy
    return inner_products
(Credit where credit is due: cartesian_product is taken from Using numpy to build an array of all combinations of two arrays)
But this doesn't scale: for larger arrays (say, 256x256) it gives me a memory error.
You're probably storing the whole generated Cartesian product.
You're taking the product of 2-dimensional arrays: the product of m x m and n x n matrices produces m*m*n*n values.
For 256*256 matrices that is 2^32 = 4,294,967,296 elements.
If you don't need all the values at the same time, you could generate a few, process them, and discard them before generating the next ones.
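If the goal is the inner product of every position vector with every other, the Cartesian product of coordinates is not needed at all: stacking the positions into an (N, 2) matrix P gives all pairwise inner products as P @ P.T (a Gram matrix). This sketch swaps in that technique in place of the question's cartesian_product. For a 256x256 image N is 65536, so the full matrix still has 2^32 entries (32 GiB at 8 bytes each), which is why the hypothetical inner_product_blocks helper yields it a block of rows at a time; the block size is an assumption to tune against available memory.

```python
import numpy as np

def positions(n):
    """(x, y) coordinates of every pixel on an n-by-n grid, one row per pixel."""
    x, y = np.meshgrid(np.arange(n), np.arange(n))
    return np.column_stack((x.ravel(), y.ravel()))  # shape (n*n, 2)

def inner_products(n):
    """Full Gram matrix G[i, j] = x_i*x_j + y_i*y_j. Only viable for small n."""
    P = positions(n)
    return P @ P.T  # shape (n*n, n*n)

def inner_product_blocks(n, block=256):
    """Yield (start, rows) where rows == G[start:start + block] for the same G."""
    P = positions(n)
    for start in range(0, P.shape[0], block):
        yield start, P[start:start + block] @ P.T  # at most block * n*n values at once

G = inner_products(4)
print(G.shape)  # (16, 16)

# Example of block-wise processing: the maximum of each row, without building G
row_max = np.concatenate([rows.max(axis=1) for _, rows in inner_product_blocks(4, block=5)])
print(np.array_equal(row_max, G.max(axis=1)))  # True
```

For n = 4, G.reshape(16, 4, 4) should have the same layout as the (Nx**2, Nx, Nx) array returned by the question's inner_product().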
A simpler way of taking the Cartesian product would be something like:
import itertools

xMax = 2
yMax = 2
m1 = [[x + y * xMax for x in range(xMax)] for y in range(yMax)]
print("m1=" + repr(m1))
m2 = [[chr(ord('a') + x + y * xMax) for x in range(xMax)] for y in range(yMax)]
print("m2=" + repr(m2))
for x in m1:
    for y in m2:
        # generates xMax*xMax pairs at a time; process them one by one or in a batch
        for e in itertools.product(x, y):
            print(e)
The code above generates the following output:
m1=[[0, 1], [2, 3]]
m2=[['a', 'b'], ['c', 'd']]
(0, 'a')
(0, 'b')
(1, 'a')
(1, 'b')
(0, 'c')
(0, 'd')
(1, 'c')
(1, 'd')
(2, 'a')
(2, 'b')
(3, 'a')
(3, 'b')
(2, 'c')
(2, 'd')
(3, 'c')
(3, 'd')
