Cartesian product of two 2d arrays - python

Suppose I have a 2d image, with associated coordinates (x,y) at every point.
I want to find the inner product of the position vector at every point $i$ with every other point $j$. Essentially, the Cartesian product of two 2d arrays.
What would be the fastest way to accomplish this, in Python?
My current implementation looks something like this:
import numpy as np
from functools import reduce
def cartesian_product(arrays):
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    rows, cols = reduce(np.multiply, broadcasted[0].shape), len(broadcasted)
    out = np.empty(rows * cols, dtype=broadcasted[0].dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T
def inner_product():
    x, y = np.meshgrid(np.arange(4),np.arange(4))
    cart_x = cartesian_product([x.flatten(),x.flatten()])
    cart_y = cartesian_product([y.flatten(),y.flatten()])
    Nx = x.shape[0]
    xx = (cart_x[:,0]*cart_x[:,1]).reshape((Nx**2,Nx,Nx))
    yy = (cart_y[:,0]*cart_y[:,1]).reshape((Nx**2,Nx,Nx))
    inner_products = xx+yy
    return inner_products
(Credit where credit is due: cartesian_product is taken from Using numpy to build an array of all combinations of two arrays)
But this doesn't work. For larger arrays (say, 256x256), this gives me a memory error.

You're probably storing the whole generated Cartesian product in memory.
You're taking the product of two 2-dimensional arrays. The product of an m x m and an n x n matrix produces m*m*n*n values.
For 256x256 matrices that is 256^4 = 2^32 = 4,294,967,296 elements.
If you don't need all the values at the same time, you could generate a few, process them, and discard them before generating the next ones.
A simpler way of taking the Cartesian product would be:
import itertools
xMax = 2
yMax = 2
m1 = [[(x + y*xMax) for x in range(xMax)] for y in range(yMax)]
print("m1=" + repr(m1))
m2 = [[chr(ord('a') + (x + y*xMax)) for x in range(xMax)] for y in range(yMax)]
print("m2=" + repr(m2))
for x in m1:
    for y in m2:
        # generates xMax*xMax pairs at a time; process them one by one or in a batch
        for e in itertools.product(x, y):
            print(e)
The code above generates the following output:
m1=[[0, 1], [2, 3]]
m2=[['a', 'b'], ['c', 'd']]
(0, 'a')
(0, 'b')
(1, 'a')
(1, 'b')
(0, 'c')
(0, 'd')
(1, 'c')
(1, 'd')
(2, 'a')
(2, 'b')
(3, 'a')
(3, 'b')
(2, 'c')
(2, 'd')
(3, 'c')
(3, 'd')
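For the original inner-product problem, the "generate a few, process, discard" idea can be sketched with NumPy as below. This is only an illustration under assumptions: the function name inner_products_in_chunks and the chunk parameter are made up for this example, and it relies on the fact that all pairwise inner products of the position vectors are the entries of P @ P.T, where P holds one (x, y) row per pixel.

```python
import numpy as np

def inner_products_in_chunks(nx, ny, chunk=256):
    """Yield (start, block) pairs; block[a, b] is r_(start+a) . r_b.

    Hypothetical helper: only `chunk` rows of the full
    (nx*ny) x (nx*ny) result exist in memory at any time.
    """
    x, y = np.meshgrid(np.arange(nx), np.arange(ny))
    positions = np.column_stack([x.ravel(), y.ravel()])  # shape (nx*ny, 2)
    for start in range(0, len(positions), chunk):
        # Inner products of rows start..start+chunk with every position
        block = positions[start:start + chunk] @ positions.T
        yield start, block

# Small demo on a 4x4 grid: accumulate a statistic without keeping the blocks
total = 0
for start, block in inner_products_in_chunks(4, 4, chunk=5):
    total += block.sum()
print(total)
```

For a 256x256 image and chunk=256, each block holds 256 x 65,536 values (about 128 MiB as 64-bit integers), whereas the full result would need 2^32 values at once.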

Related

how to select some lines created by points based on their distances in python

I have some lines created by connecting the points of a regular grid, and I want to group the correct lines to create surfaces. These are the coordinates of my point array:
coord=np.array([[0.,0.,2.], [0.,1.,3.], [0.,2.,2.], [1.,0.,1.], [1.,1.,3.],\
[1.,2.,1.], [2.,0.,1.], [2.,1.,1.], [3.,0.,1.], [4.,0.,1.]])
Then, I created lines by connecting points. My points are from a regular grid. So, I have two perpendicular sets of lines. I called them blue (vertical) and red (horizontal) lines. To do so:
blue_line=[]
for ind, i in enumerate (range (len(coord)-1)):
    if coord[i][0]==coord[i+1][0]:
        line=[ind, ind+1]
        # line=[x+1 for x in line]
        blue_line.append(line)
threshold_x = 1.5
threshold_y = 1.5
i, j = np.where((coord[:, 1] == coord[:, np.newaxis, 1]) &
                (abs(coord[:, 0] - coord[:, np.newaxis, 0]) < 1.2 * threshold_y))
# Restrict to where i is before j
i, j = i[i < j], j[i < j]
# Combine and print the indices
red_line=np.vstack([i, j]).T
blue_line=np.array(blue_line)
red_line=np.array(red_line)
all_line=np.concatenate((blue_line, red_line), axis=0)
To find the correct lines for creating surfaces, I compare the center of each line with the centers of the adjacent ones. I start from the first blue line and check whether it has three adjacent lines. If I find a line whose center is closer than threshold_x and whose x coordinate differs from that of the current line, I keep it as part of the group. Then I continue searching for adjacent lines with the same rule. My figure shows this clearly: the first blue line is connected by an arrow to blue line number 3 and to red lines number 6 and 7. It is not paired with blue line number 2 because they have the same x coordinate. I tried the following, but it does not do all of this and I could not get it to work:
ave_x=[]
ave_y=[]
ave_z=[]
for ind, line in enumerate (all_line):
    x = (coord[line][0][0]+coord[line][1][0])/2
    ave_x.append (x)
    y = (coord[line][0][1]+coord[line][1][1])/2
    ave_y.append (y)
    z = (coord[line][0][2]+coord[line][1][2])/2
    ave_z.append (z)
avs=np.concatenate((ave_x, ave_y, ave_z), axis=0)
avs=avs.reshape(-1,len (ave_x))
avs_f=avs.T
blue_red=[len (blue_line), len (red_line)]
avs_split=np.split(avs_f,np.cumsum(blue_red))[:-1] # first array is center of
# blue lines and second is center of red lines
dists=[]
for data in avs_split:
    for ind, val in enumerate (data):
        if ind < len(data):
            for ind in range (len(data)-1):
                squared_dist = np.sum((data[ind]-data[ind+1])**2, axis=0)
                dists.append (squared_dist)
In fact, I expect my code to give me the following list, containing the groups of lines that create the three surfaces:
[(1, 6, 3, 7), (2, 7, 4, 8), (3, 9, 5, 10)]
Finally, I want to find the numbers of the lines that are either not used in creating the surfaces, or are used but are closer than a limit to the dashed line in my figure. I have the coordinates of the two points that define the dashed line:
coord_dash=np.array([[2., 2., 2.], [5., 0., 1.]])
adjacency_threshold=2
The expected line numbers are:
[4, 10, 5, 11, 12]
I appreciate any help in advance.
I'm not sure my answer is what you are looking for, because your question is a bit unclear. To start off, I create the blue and red lines as dictionaries, where the keys are the line numbers and the values are tuples with the start and end point numbers. I also create a dictionary all_mid where the key is the line number and the value is a pandas Series with the coordinates of the midpoint.
import numpy as np
import pandas as pd
coord = np.array([[0.,0.,2.], [0.,1.,3.], [0.,2.,2.], [1.,0.,1.], [1.,1.,3.],
                  [1.,2.,1.], [2.,0.,1.], [2.,1.,1.], [3.,0.,1.], [4.,0.,1.]])
df = pd.DataFrame(
    data=sorted(coord, key=lambda item: (item[0], item[1], item[2])),
    columns=['x', 'y', 'z'],
    index=range(1, len(coord) + 1))
count = 1
blue_line = {}
for start, end in zip(df.index[:-1], df.index[1:]):
    if df.loc[start, 'x'] == df.loc[end, 'x']:
        blue_line[count] = (start, end)
        count += 1
red_line = []
index = df.sort_values('y').index
for start, end in zip(index[:-1], index[1:]):
    if df.loc[start, 'y'] == df.loc[end, 'y']:
        red_line.append((start, end))
red_line = {i + count: (start, end)
            for i, (start, end) in enumerate(sorted(red_line))}
all_line = {**blue_line, **red_line}
all_mid = {i: (df.loc[start] + df.loc[end])/2
           for i, (start, end) in all_line.items()}
The lines look like this:
In [875]: blue_line
Out[875]: {1: (1, 2), 2: (2, 3), 3: (4, 5), 4: (5, 6), 5: (7, 8)}
In [876]: red_line
Out[876]:
{6: (1, 4),
7: (2, 5),
8: (3, 6),
9: (4, 7),
10: (5, 8),
11: (7, 9),
12: (9, 10)}
Then I define some utility functions:
adjacent returns True if the input points are adjacent.
left_to_right returns True if the x coordinate of the first point is less than the x coordinate of the second point.
connections returns a dictionary in which the key is a line number and the value is a list with the line numbers connected to it.
def adjacent(p, q, threshold=1):
    dx = abs(p['x'] - q['x'])
    dy = abs(p['y'] - q['y'])
    dxy = np.sqrt(dx**2 + dy**2)
    return np.max([dx, dy, dxy]) <= threshold
def left_to_right(p, q):
    return p['x'] < q['x']
def connections(midpoints, it):
    mapping = {}
    for start, end in it:
        if adjacent(midpoints[start], midpoints[end]):
            if left_to_right(midpoints[start], midpoints[end]):
                if start in mapping:
                    if end not in mapping[start]:
                        mapping[start].append(end)
                else:
                    mapping[start] = [end]
    return mapping
We are now ready to create a list of lists, in which each sublist has the line numbers that make up a surface:
from itertools import product, combinations
blues = blue_line.keys()
reds = red_line.keys()
blue_to_red = connections(all_mid, product(blues, reds))
blue_to_blue = connections(all_mid, combinations(blues, r=2))
surfaces = []
for start in blue_line:
    red_ends = blue_to_red.get(start, [])
    blue_ends = blue_to_blue.get(start, [])
    if len(red_ends) == 2 and len(blue_ends) == 1:
        surfaces.append(sorted([start] + red_ends + blue_ends))
This is what you get:
In [879]: surfaces
Out[879]: [[1, 3, 6, 7], [2, 4, 7, 8], [3, 5, 9, 10]]
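As a side note, the midpoint computation that both the question and the answer rely on can also be written without an explicit loop. The snippet below is a minimal sketch on a made-up subset of the data: the arrays coord and lines here are illustrative, with lines holding 0-based indices into coord, as in the question's blue_line and red_line arrays.

```python
import numpy as np

# Illustrative data: four points and three lines given as pairs of point indices
coord = np.array([[0., 0., 2.], [0., 1., 3.], [0., 2., 2.], [1., 0., 1.]])
lines = np.array([[0, 1], [1, 2], [0, 3]])

# coord[lines] has shape (n_lines, 2, 3): both endpoints of every line.
# Averaging over axis 1 gives one (x, y, z) midpoint per line.
midpoints = coord[lines].mean(axis=1)
print(midpoints)
```

Each row of midpoints corresponds to the line in the same row of lines, so the result can be compared pairwise with a distance threshold.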

padding a input vector, a 4-D matrix, using numpy for a convolutional neural network (CNN)

This is the entire code related to my question. You should be able to see the plots it creates just by pasting it into your IDE and running it.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)
x_pad = np.pad(x, ((0,0), (2, 2), (2, 2), (0,0))\
, mode='constant', constant_values = (0,0))
print ("x.shape =\n", x.shape)
print ("x_pad.shape =\n", x_pad.shape)
print ("x[1,1] =\n", x[1,1])
print ("x_pad[1,1] =\n", x_pad[1,1])
fig, axarr = plt.subplots(1, 2)
axarr[0].set_title('x')
axarr[0].imshow(x[0,:,:,0])
axarr[1].set_title('x_pad')
axarr[1].imshow(x_pad[0,:,:,0])
Specifically, my question is related to these two lines of code:
x = np.random.randn(4, 3, 3, 2)
x_pad = np.pad(x, ((0,0), (2, 2), (2, 2), (0,0)), mode='constant', constant_values = (0,0))
I want to pad the 2nd and 3rd dimensions of x. That is, I want to pad x.shape[1], which has a value of 3, and x.shape[2], which also has a value of 3. In the problem I am solving, x.shape[0] and x.shape[3], which are 4 and 2 respectively, represent something else: x.shape[0] is the number of such 3*3 matrices and x.shape[3] is the number of channels.
My question is about how Python represents this information and how we interpret it. Are these the same?
The statement x = np.random.randn(4, 3, 3, 2) created a matrix of 4 rows by 3 columns, and each element of this 4*3 matrix is a 3-row by 2-column matrix. That is how Python is representing the x_pad. Is this understanding correct?
If so, then in the np.pad statement we are padding the number of columns of the outer matrix (the 3 in 4*3). We are also padding the number of rows of the inner matrix (the 3 in 3*2).
Weren't the 3, 3 in (4, 3, 3, 2) supposed to be part of just one matrix, rather than the columns of the outer matrix and the rows of the inner matrix? I am having trouble visualizing this. Can someone please clarify? Thank you!
These lines:
x = np.random.randn(4, 3, 3, 2)
x_pad = np.pad(x, ((0,0), (2, 2), (2, 2), (0,0)), mode='constant', constant_values = (0,0))
are equivalent to:
x = np.random.randn(4, 3, 3, 2)
x_pad = np.zeros((4, 3+2+2, 3+2+2, 2))
x_pad[:, 2:-2, 2:-2, :] = x
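The equivalence can be checked directly. The following is a small sketch; the name x_manual is introduced here only for the comparison, and np.random.default_rng is used in place of the question's np.random.seed purely for convenience.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 3, 3, 2))

# Padding with np.pad: 2 zeros on each side of axes 1 and 2 only
x_pad = np.pad(x, ((0, 0), (2, 2), (2, 2), (0, 0)),
               mode='constant', constant_values=0)

# Manual version: allocate zeros and copy x into the middle
x_manual = np.zeros((4, 3 + 2 + 2, 3 + 2 + 2, 2))
x_manual[:, 2:-2, 2:-2, :] = x

same = np.array_equal(x_pad, x_manual)
print(x_pad.shape, same)
```

Axes 0 and 3 (the 4 examples and the 2 channels) keep their lengths; only the two middle axes grow from 3 to 7.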
You could interpret a 4-D array as a 2-D array of 2-D arrays if that fits whatever this data represents for you, but numpy internally stores an array as a 1-D block of data. For an array of shape (n0, n1, n2, n3) in the default C (row-major) order, x[i,j,k,l] points to data[l + n3*(k + n2*(j + n1*i))], where n1, n2, n3 are the lengths of the corresponding axes.
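The index formula can be verified on a small array. This sketch assumes the default C (row-major) layout; the particular index (2, 1, 0, 1) is an arbitrary choice for illustration.

```python
import numpy as np

# Fill the array with 0, 1, 2, ... so each element equals its own flat position
x = np.arange(4 * 3 * 3 * 2).reshape(4, 3, 3, 2)
n0, n1, n2, n3 = x.shape

i, j, k, l = 2, 1, 0, 1
flat_index = l + n3 * (k + n2 * (j + n1 * i))

data = x.ravel()  # 1-D view of the same data in C order
print(flat_index, data[flat_index], x[i, j, k, l])
```

np.ravel_multi_index((i, j, k, l), x.shape) computes the same flat position, so it can be used instead of writing the formula by hand.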
Visualizing 4-D (and higher) arrays is very difficult for humans. You just have to keep track of the indices for the four axes when you deal with such arrays.

How do I find the coordinates of a value in a 2D matrix?

Suppose the 3x4 matrix is as shown below:
a=[[1,2,3,4], [5,6,7,8], [9,10,11,12]]
I want to find 7 and store its coordinates (2,3) in a variable.
Is there a built-in function for this?
In MATLAB, [row, col] = find(a==7) gives row=2, col=3.
I'm curious how to do this in Python.
After setting the value you want to find,
val = 7
here is a nice one-liner:
array = [(ix,iy) for ix, row in enumerate(a) for iy, i in enumerate(row) if i == val]
Output of print(array):
[(1, 2)]
Note the one-liner will catch all instances of the number 7 in a matrix, not just one. Also note the indexes start at 0, so row 2 will be displayed as 1 and column 3 will be displayed as 2. If, say, you have more than one instance of 7 in a row and want the actual row and column numbers (not starting at 0), this may be helpful:
a=[[1,7,7,4], [5,6,7,8], [9,10,11,7]]
val = 7
array = [(ix+1,iy+1) for ix, row in enumerate(a) for iy, i in enumerate(row) if i == val]
print(array)
Output:
[(1, 2), (1, 3), (2, 3), (3, 4)]
To do it similar to Matlab you would have to use numpy
import numpy as np
a = [[1,2,3,4], [5,6,7,8], [9,10,11,12]]
a = np.array(a)
rows, cols = np.where(a == 7)
print(rows[0], cols[0])
np.where finds every 7 in the matrix, so it returns rows and cols as arrays.
It counts rows and columns starting at 0, so you may have to add 1 to get the same results as MATLAB.
I would use numpy's where function. Here's another post that displays its use nicely. I'd apply it to your use case like so:
import numpy as np
arr = np.array([[1, 2, 3],[4, 100, 6],[100, 8, 9]])
positions = np.where(arr == 100)
# positions = (array([1, 2], dtype=int64), array([1, 0], dtype=int64))
positions = [tuple(cor.item() for cor in pos) for pos in positions]
# positions = [(1, 2), (1, 0)], i.e. the row indices followed by the column indices; the matches are at (1, 1) and (2, 0)
Note that this solution allows for the possibility that the desired pattern might occur more than once.
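If one (row, column) pair per match is more convenient than separate row and column arrays, np.argwhere is one option. A minimal sketch on the same example array (the name pairs is made up here):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 100, 6], [100, 8, 9]])

# np.argwhere returns one row per match, each row being (row_index, col_index)
pairs = np.argwhere(arr == 100).tolist()
print(pairs)  # [[1, 1], [2, 0]]

# MATLAB-style 1-based coordinates, if needed
pairs_one_based = [(r + 1, c + 1) for r, c in pairs]
print(pairs_one_based)  # [(2, 2), (3, 1)]
```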

Creating a numpy array and then sorting array

I have been encountering a problem that I can't seem to solve. I need to take a list of strings, calculate some values, and then add each string and its corresponding integer to a numpy array. I've been told to create the numpy array of zeros first, since its length is known, and I can do that. My problem is how to iteratively add each string to the first column (names) and each value (labels) to the second, and then sort the full array alphabetically by the first column.
fileCount = sum([len(files) for r, d, files in os.walk(inputDirectory)])
labelArray = np.zeros(shape = (fileCount,2))
arrayInsertCounter = 0
for label, subDirectories in enumerate(inputDirectory):
    subDirPath = os.path.join(inputDirectory, subDirectories)
    for name in subDirPath:
        labelArray[arrayInsertCounter] = [name,label]
        arrayInsertCounter += 1
You could do it in numpy using a structured array
import numpy as np
labels = list(map(''.join, zip(*3*((chr(ord('a')+(19*i)%24) for i in range(24)),))))
numbers = np.arange(8)
dt = np.dtype([('label', object), ('value', int)])
table = np.empty((8,), dtype = dt)
table['label'] = labels
table['value'] = numbers
print(table)
table.sort()
print(table)
Output:
#[('ato', 0) ('jex', 1) ('sni', 2) ('dwr', 3) ('mhc', 4) ('vql', 5)
# ('gbu', 6) ('pkf', 7)]
#[('ato', 0) ('dwr', 3) ('gbu', 6) ('jex', 1) ('mhc', 4) ('pkf', 7)
# ('sni', 2) ('vql', 5)]
Edit: How to access individual records:
table[2] = 'new label', 1000
table
# array([('ato', 0), ('dwr', 3), ('new label', 1000), ('jex', 1),
# ('mhc', 4), ('pkf', 7), ('sni', 2), ('vql', 5)],
# dtype=[('label', 'O'), ('value', '<i8')])
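To connect this back to the question's loop, here is a hedged sketch that preallocates a structured array, fills it row by row, and sorts it by the name field. The file names and label values are invented for illustration, and the fixed-width string dtype "U32" is an assumption (names longer than 32 characters would be truncated).

```python
import numpy as np

# Hypothetical inputs standing in for the names and labels found on disk
names = ["pear.png", "apple.png", "mango.png"]
values = [2, 0, 1]

# One string field and one integer field per record
dt = np.dtype([("name", "U32"), ("label", np.int64)])
table = np.zeros(len(names), dtype=dt)  # preallocated, as in the question

# Fill one record per iteration
for row, (name, value) in enumerate(zip(names, values)):
    table[row] = (name, value)

# Sort in place, alphabetically by the name field
table.sort(order="name")
print(table["name"].tolist(), table["label"].tolist())
```

Sorting with order="name" keeps each label attached to its name, which a plain 2-column float array cannot do because it cannot hold strings.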

Distance of Robot from Starting Point after a Series of Movements

I am trying to write a program which takes a list of directions and magnitudes and outputs the distance of the robot from its starting position.
I get an error when executing the following code but I cannot identify why I get the error.
import math
position = [0,0]
direction = ['+Y','-X','-Y','+X','-X','-Y','+X']
magnitude = [9,7,4,8,3,6,2]
i = 0
while i < len(direction):
    if direction[i] == '+Y': position[0] += magnitude[i]
    elif direction[i] == '-Y': position[0] -= magnitude[i]
    elif direction[i] == '+X': position[1] += magnitude[i]
    elif direction[i] == '-X': position[1] -= magnitude[i]
    else: pass
    i += 1
print(float(math.sqrt(position[1]**2+position[0]**2)))
Edit:
I get this error:
IndentationError: unindent does not match any outer indentation level
Most probably you have mixed up your spaces and tabs. In this instance, it might be easier to put the sign within the magnitude and filter with x and y like so:
In [15]: mDr = [ (int(d[0]+m), d[1]) for (d, m) in zip(direction, map(str, magnitude))]
In [16]: mDr
Out[16]: [(9, 'Y'), (-7, 'X'), (-4, 'Y'), (8, 'X'), (-3, 'X'), (-6, 'Y'), (2, 'X')]
In this case, you can get the total x and y distances pretty easily. For example, the y distances:
In [17]: [md[0] for md in mDr if md[1] =='Y']
Out[17]: [9, -4, -6]
And the total y distance in the particular direction:
In [18]: sum( [md[0] for md in mDr if md[1] =='Y'] )
Out[18]: -1
You can do the same for x and then calculate the distance that way.
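Put together as a plain script, the approach might look like the sketch below. The variable names signed_moves, total_x and total_y are introduced here for illustration; math.hypot is used for the final distance.

```python
import math

direction = ['+Y', '-X', '-Y', '+X', '-X', '-Y', '+X']
magnitude = [9, 7, 4, 8, 3, 6, 2]

# Fold the sign into the magnitude: ('+Y', 9) -> (9, 'Y'), ('-X', 7) -> (-7, 'X')
signed_moves = [(int(d[0] + str(m)), d[1]) for d, m in zip(direction, magnitude)]

# Sum the signed steps per axis
total_x = sum(step for step, axis in signed_moves if axis == 'X')
total_y = sum(step for step, axis in signed_moves if axis == 'Y')

distance = math.hypot(total_x, total_y)
print(total_x, total_y, distance)  # 0 -1 1.0
```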
Here comes my off-topic reaction (your problem was mixing tabs and spaces; my answer is a simple rewrite).
import math
xymoves = {"+X": (1, 0), "-X": (-1, 0), "+Y": (0, 1), "-Y": (0, -1)}
position = [0, 0]
directions = ['+Y', '-X', '-Y', '+X', '-X', '-Y', '+X']
assert all(xymove in xymoves for xymove in directions)
magnitudes = [9, 7, 4, 8, 3, 6, 2]
for direction, magnitude in zip(directions, magnitudes):
    xmove, ymove = xymoves[direction]
    position[0] += magnitude * xmove
    position[1] += magnitude * ymove
print(math.sqrt(position[1]**2+position[0]**2))
Changes:
looping with for and zip instead of while with a manually incremented index
the "where to move" logic moved from the if/elif chain into the dictionary xymoves
unexpected directions are rejected up front by the assert
math.sqrt always returns a float, so the conversion to float is removed
Note that the xymoves dictionary could be extended with other directions, e.g. "N" for North, "NE" for North-East, etc.
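One possible shape for such an extension is sketched below. Everything in it is an assumption made for illustration: the compass_moves table, the function name distance_from_start, and in particular the choice to treat diagonal directions as unit-length vectors, so that a magnitude of 2 along "NE" moves a total distance of 2.

```python
import math

DIAG = math.sqrt(0.5)  # x and y component of a unit-length diagonal step

# Hypothetical compass table: direction name -> (x, y) unit vector
compass_moves = {
    "N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0),
    "NE": (DIAG, DIAG), "NW": (-DIAG, DIAG),
    "SE": (DIAG, -DIAG), "SW": (-DIAG, -DIAG),
}

def distance_from_start(directions, magnitudes, moves=compass_moves):
    """Return the straight-line distance after applying all moves."""
    x = y = 0.0
    for direction, magnitude in zip(directions, magnitudes):
        dx, dy = moves[direction]  # raises KeyError for an unknown direction
        x += magnitude * dx
        y += magnitude * dy
    return math.hypot(x, y)

print(distance_from_start(["N", "E"], [3, 4]))  # 5.0
```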
