Numpy trouble vectorizing certain kind of aggregation


I am having difficulty in vectorizing the below operation:
# x.shape = (a,)
# y.shape = (a, b)
# x and y are ordered over a.
# Want to combine x, y into z.shape(num_unique_x, b)
# Below works and illustrates intent but is iterative
z = np.zeros((num_unique_x, b))
for i in range(a):
    z[x[i], y[i, :]] += 1

Your use of num_unique_x and the size of z suggest that x and y have repeats, so some entries of z will be larger than 1. In that case we need to use np.add.at. But to set that up I'd have to review its documentation, and possibly test some alternatives.
But first a no-repeats case
In [522]: x=np.arange(6)
In [523]: y=np.arange(3)+x[:,None]
In [524]: y
Out[524]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7]])
This is why I ask for a diagnostic example: I'm guessing at possible values, and I have to make a z with more than 3 columns.
In [529]: z=np.zeros((6,8),dtype=int)
In [530]: for i in range(6):
     ...:     z[x[i],y[i,:]]+=1
In [531]: z
Out[531]:
array([[1, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 1, 1, 1]])
The vectorized equivalent
In [532]: z[x[:,None],y]
Out[532]:
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
In [533]: z[x[:,None],y] += 1
In [534]: z
Out[534]:
array([[2, 2, 2, 0, 0, 0, 0, 0],
[0, 2, 2, 2, 0, 0, 0, 0],
[0, 0, 2, 2, 2, 0, 0, 0],
[0, 0, 0, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 2, 2, 2, 0],
[0, 0, 0, 0, 0, 2, 2, 2]])
The corresponding add.at expression is
In [538]: np.add.at(z,(x[:,None],y),1)
In [539]: z
Out[539]:
array([[3, 3, 3, 0, 0, 0, 0, 0],
[0, 3, 3, 3, 0, 0, 0, 0],
[0, 0, 3, 3, 3, 0, 0, 0],
[0, 0, 0, 3, 3, 3, 0, 0],
[0, 0, 0, 0, 3, 3, 3, 0],
[0, 0, 0, 0, 0, 3, 3, 3]])
So that works for this no-repeats case.
For repeats in x:
In [542]: x1=np.array([0,1,1,2,3,5])
In [543]: z1=np.zeros((6,8),dtype=int)
In [544]: np.add.at(z1,(x1[:,None],y),1)
In [545]: z1
Out[545]:
array([[1, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 2, 2, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 1]])
Without add.at we miss the 2s.
In [546]: z2=np.zeros((6,8),dtype=int)
In [547]: z2[x1[:,None],y] += 1
In [548]: z2
Out[548]:
array([[1, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 1]])
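
The comparison above can be condensed into one script. This is only a sketch on the same made-up data as the transcript (x1 with a repeated row index); the names z_loop, z_at and z_buf are illustrative and not from the question. It checks that np.add.at reproduces the explicit loop, while the buffered `+=` counts each repeated (row, column) pair only once.

```python
import numpy as np

x = np.array([0, 1, 1, 2, 3, 5])          # row indices, with a repeated 1
y = np.arange(3) + np.arange(6)[:, None]  # column indices, shape (6, 3)

# Reference: the explicit loop from the question.
z_loop = np.zeros((6, 8), dtype=int)
for i in range(len(x)):
    z_loop[x[i], y[i, :]] += 1

# Unbuffered in-place add: repeated (row, col) pairs accumulate.
z_at = np.zeros((6, 8), dtype=int)
np.add.at(z_at, (x[:, None], y), 1)

# Buffered fancy-index add: each repeated pair is incremented only once.
z_buf = np.zeros((6, 8), dtype=int)
z_buf[x[:, None], y] += 1

print(np.array_equal(z_loop, z_at))   # loop and add.at agree
print(np.array_equal(z_loop, z_buf))  # the buffered version differs
```

Here x[:, None] has shape (6, 1) and broadcasts against y of shape (6, 3), so every column index in row i of y is paired with x[i].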

Related

Generating binary entries array in python

I would like to generate an array as follows:
[[0,0,0],
[0,0,1],
[0,1,0],
[0,1,1],
[1,0,0],
[1,0,1],
[1,1,0],
[1,1,1]]
I tried to achieve this with 3 nested for loops, but I want to go further to 4, 5, and higher bit counts, and that approach would not scale easily to those numbers.
Is there any simple way for doing this?

I can't figure out why you want this, but here goes:
For 3:
>>> [[int(x) for x in "{0:03b}".format(y)] for y in range(8)]
[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
>>>
For 5:
>>> [[int(x) for x in "{0:05b}".format(y)] for y in range(32)]
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 1, 0], [0, 0, 0, 1, 1], [0, 0, 1, 0, 0], [0, 0, 1, 0, 1], [0, 0, 1, 1, 0], [0, 0, 1, 1, 1], [0, 1, 0, 0, 0], [0, 1, 0, 0, 1], [0, 1, 0, 1, 0], [0, 1, 0, 1, 1], [0, 1, 1, 0, 0], [0, 1, 1, 0, 1], [0, 1, 1, 1, 0], [0, 1, 1, 1, 1], [1, 0, 0, 0, 0], [1, 0, 0, 0, 1], [1, 0, 0, 1, 0], [1, 0, 0, 1, 1], [1, 0, 1, 0, 0], [1, 0, 1, 0, 1], [1, 0, 1, 1, 0], [1, 0, 1, 1, 1], [1, 1, 0, 0, 0], [1, 1, 0, 0, 1], [1, 1, 0, 1, 0], [1, 1, 0, 1, 1], [1, 1, 1, 0, 0], [1, 1, 1, 0, 1], [1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
>>>
Matching your formatting is harder.
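
The width and the range in the two expressions above are tied together (3 bits and 8 values, 5 bits and 32 values), so they can be folded into one function of the bit count. A minimal sketch; the name binary_rows is only an illustrative choice:

```python
def binary_rows(n):
    """Return all n-bit patterns as lists of 0/1, in counting order."""
    width_spec = "0{}b".format(n)  # e.g. "03b": binary, zero-padded to n digits
    return [[int(ch) for ch in format(value, width_spec)]
            for value in range(2 ** n)]

for row in binary_rows(3):
    print(row)
```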

You can use itertools.product to do this.
>>> import itertools
>>> list(itertools.product([0,1], repeat=3))
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
https://docs.python.org/3/library/itertools.html#itertools.product
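
If the result is wanted as a NumPy array rather than a list of tuples, the product can be wrapped in np.array. The sketch below also shows an alternative that avoids itertools by shifting and masking the integers 0 to 2**n - 1. Both function names are hypothetical, chosen only for this example.

```python
import itertools
import numpy as np

def binary_array(n):
    # One row per tuple produced by itertools.product, in counting order.
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=int)

def binary_array_bits(n):
    # Column k holds bit (n - 1 - k) of each value, so rows read as binary numbers.
    values = np.arange(2 ** n)[:, None]   # shape (2**n, 1)
    shifts = np.arange(n - 1, -1, -1)     # shape (n,), most significant bit first
    return (values >> shifts) & 1

print(binary_array(3))
print(np.array_equal(binary_array(4), binary_array_bits(4)))
```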

You can use a recursive function like the following:
def generate_binary_entries(n, t=None):  # n: number of bits
    if t is None:  # avoid a mutable default argument
        t = [[]]
    if n == 0:
        return t
    new_t = []
    for entry in t:
        new_t.append(entry + [0])
        new_t.append(entry + [1])
    return generate_binary_entries(n - 1, new_t)
Then
generate_binary_entries(4)
generates
[[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 0, 1, 1],
[0, 1, 0, 0],
[0, 1, 0, 1],
[0, 1, 1, 0],
[0, 1, 1, 1],
[1, 0, 0, 0],
[1, 0, 0, 1],
[1, 0, 1, 0],
[1, 0, 1, 1],
[1, 1, 0, 0],
[1, 1, 0, 1],
[1, 1, 1, 0],
[1, 1, 1, 1]]

Need a recursive function to get all permutations of an array where each element is itself plus 0 to n

Sorry for the wording of the title as I am unsure how to phrase the question.
I am trying to get all permutations of an array where each element could be its value plus 0 to n (the 'wild' value)
e.g.
The array [0, 1, 0, 2, 1] with the wild value equal to 1 would have the permutations:
[1, 1, 0, 2, 1]
[0, 2, 0, 2, 1]
[0, 1, 1, 2, 1]
[0, 1, 0, 3, 1]
[0, 1, 0, 2, 2]
The array [1, 2, 0, 0] with the wild value equal to 2 would have the permutations:
[3, 2, 0, 0]
[2, 3, 0, 0]
[2, 2, 1, 0]
[2, 2, 0, 1]
[1, 4, 0, 0]
[2, 3, 0, 0]
[1, 3, 1, 0]
[1, 3, 0, 1]
[1, 2, 2, 0]
[2, 2, 1, 0]
[1, 3, 1, 0]
[1, 2, 1, 1]
... and so on...
This is the code I have tried, but it is not producing the desired results:
def generateAllMatrices(length, buckets, ind, wild):
    if ind == length:
        # possible_buckets.append(buckets.copy())
        print(buckets)
        return
    if wild != 0:
        for i in range(1, wild + 1):
            buckets[ind] += 1
            generateAllMatrices(length, buckets, 0, wild - 1)
        buckets[ind] -= wild
    generateAllMatrices(length, buckets, ind + 1, wild)
An example result produced from the above code is:
Original = [1, 0, 0, 2, 0, 1, 0, 0, 0, 1, 1, 0, 0]
Wild = 1
Permutations:
[2, 0, 0, 2, 0, 1, 0, 0, 0, 1, 1, 0, 0]
[1, 1, 0, 2, 0, 1, 0, 0, 0, 1, 1, 0, 0]
[1, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 0, 0]
[1, 0, 0, 3, 0, 1, 0, 0, 0, 1, 1, 0, 0]
[1, 0, 0, 2, 1, 1, 0, 0, 0, 1, 1, 0, 0]
[1, 0, 0, 2, 0, 2, 0, 0, 0, 1, 1, 0, 0]
[1, 0, 0, 2, 0, 1, 1, 0, 0, 1, 1, 0, 0]
[1, 0, 0, 2, 0, 1, 0, 1, 0, 1, 1, 0, 0]
[1, 0, 0, 2, 0, 1, 0, 0, 1, 1, 1, 0, 0]
[1, 0, 0, 2, 0, 1, 0, 0, 0, 2, 1, 0, 0]
[1, 0, 0, 2, 0, 1, 0, 0, 0, 1, 2, 0, 0]
[1, 0, 0, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0]
[1, 0, 0, 2, 0, 1, 0, 0, 0, 1, 1, 0, 1]
[1, 0, 0, 2, 0, 1, 0, 0, 0, 1, 1, 0, 0]
Are there any similar algorithms I could reference for this? Or what route should I take regarding developing something that will produce what I need.
Thanks!

You could do the following:
import itertools
def make_reps(l, wild):
    for indices in itertools.product(range(len(l)), repeat=wild):
        new_l = list(l)
        for i in indices:
            new_l[i] += 1
        yield new_l
With your given examples:
In [12]: list(make_reps([0, 1, 0, 2, 1], 1))
Out[12]:
[[1, 1, 0, 2, 1],
[0, 2, 0, 2, 1],
[0, 1, 1, 2, 1],
[0, 1, 0, 3, 1],
[0, 1, 0, 2, 2]]
In [14]: list(make_reps([1, 2, 0, 0], 2))
Out[14]:
[[3, 2, 0, 0],
[2, 3, 0, 0],
[2, 2, 1, 0],
[2, 2, 0, 1],
[2, 3, 0, 0],
[1, 4, 0, 0],
[1, 3, 1, 0],
[1, 3, 0, 1],
[2, 2, 1, 0],
[1, 3, 1, 0],
[1, 2, 2, 0],
[1, 2, 1, 1],
[2, 2, 0, 1],
[1, 3, 0, 1],
[1, 2, 1, 1],
[1, 2, 0, 2]]
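
The second output above contains repeats (for example [2, 3, 0, 0] appears twice), because itertools.product treats the index tuples (0, 1) and (1, 0) as different. If each distinct result should appear only once, one option is to iterate over unordered index selections instead. This is a sketch of that variation, not part of the original answer, and the name make_unique_reps is made up for the example:

```python
import itertools

def make_unique_reps(values, wild):
    """Yield each way of distributing `wild` increments over `values` once."""
    n = len(values)
    # combinations_with_replacement yields sorted index tuples, so (0, 1)
    # appears but (1, 0) does not.
    for indices in itertools.combinations_with_replacement(range(n), wild):
        new_values = list(values)
        for i in indices:
            new_values[i] += 1
        yield new_values

for row in make_unique_reps([1, 2, 0, 0], 2):
    print(row)
```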

scipy.ndimage.label: include error margin

After reading an interesting topic on scipy.ndimage.label (Variable area threshold for identifying objects - python), I'd like to include an 'error margin' in the labelling.
In the above linked discussion:
How can the blue dot on top be included, too (let's say it is wrongly disconnected from the orange, biggest, object)?
I found the structure attribute, which should be able to include that dot by changing the array from np.ones((3,3,3)) to anything larger (I'd like it to be 3D). However, adjusting the 'structure' attribute to a larger array does not seem to work, unfortunately. It either gives a dimension error (RuntimeError: structure and input must have equal rank) or it does not change anything.
Thanks!
this is the code:
labels, nshapes = ndimage.label(a, structure=np.ones((3,3,3)))
in which a is a 3D array.

Here's a possible approach that uses scipy.ndimage.binary_dilation. It is easier to see what is going on in a 2D example, but I'll show how to generalize to 3D at the end.
In [103]: a
Out[103]:
array([[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 0, 0],
[1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 0]])
In [104]: from scipy.ndimage import label, binary_dilation
Extend each "shape" by one pixel down and to the right:
In [105]: b = binary_dilation(a, structure=np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]])).astype(int)
In [106]: b
Out[106]:
array([[0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0],
[1, 1, 1, 0, 1, 1, 0],
[1, 1, 1, 0, 1, 1, 1],
[1, 1, 1, 0, 0, 1, 1],
[1, 1, 1, 1, 0, 1, 1]])
Apply label to the dilated array:
In [107]: labels, numlabels = label(b)
In [108]: numlabels
Out[108]: 2
In [109]: labels
Out[109]:
array([[0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0],
[2, 2, 2, 0, 1, 1, 0],
[2, 2, 2, 0, 1, 1, 1],
[2, 2, 2, 0, 0, 1, 1],
[2, 2, 2, 2, 0, 1, 1]], dtype=int32)
By multiplying a by labels, we get the desired array of labels of a:
In [110]: alab = labels*a
In [111]: alab
Out[111]:
array([[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[2, 2, 0, 0, 1, 0, 0],
[2, 2, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1],
[2, 2, 2, 0, 0, 0, 0]])
(This assumes that the values in a are 0 or 1. If they are not, you can use alab = labels * (a > 0).)
For a 3D input, you have to change the structure argument to binary_dilation:
struct = np.zeros((3, 3, 3), dtype=int)
struct[1:, 1:, 1:] = 1
b = binary_dilation(a, structure=struct).astype(int)
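
To see the 3D variant end to end, here is a small sketch on a made-up 6x6x6 array (the voxel positions are assumptions chosen for illustration). Two voxels are separated by a single empty voxel, and a third is far away. With the default connectivity of label, all three are separate before dilation; after dilation the two near voxels share a label.

```python
import numpy as np
from scipy.ndimage import label, binary_dilation

a = np.zeros((6, 6, 6), dtype=int)
a[1, 1, 1] = 1
a[1, 1, 3] = 1   # one empty voxel away from the first
a[4, 4, 0] = 1   # far from the other two

_, n_before = label(a)          # each voxel is its own object

# Extend every object by one voxel along each axis.
struct = np.zeros((3, 3, 3), dtype=int)
struct[1:, 1:, 1:] = 1
b = binary_dilation(a, structure=struct).astype(int)

labels, n_after = label(b)      # label the dilated array
alab = labels * a               # keep labels only where a was set

print(n_before, n_after)
print(alab[1, 1, 1], alab[1, 1, 3], alab[4, 4, 0])
```

The size of the tolerated gap is set by how far the structure extends; a larger gap would need a larger structure or repeated dilation (binary_dilation has an iterations parameter).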

Label regions with unique combinations of values in two numpy arrays?

I have two labelled 2D numpy arrays a and b with identical shapes. I would like to re-label the array b by something similar to a GIS geometric union of the two arrays, such that cells with unique combination of values in array a and b are assigned new unique IDs:
I'm not concerned with the specific numbering of the regions in the output, so long as the values are all unique. I have attached sample arrays and desired outputs below: my real datasets are much larger, with both arrays having integer labels which range from "1" to "200000". So far I've experimented with concatenating the array IDs to form unique combinations of values, but ideally I would like to output a simple set of new IDs in the form of 1, 2, 3..., etc.
import numpy as np
import matplotlib.pyplot as plt
# Example labelled arrays a and b
input_a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0],
[0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
[0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
input_b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
# Plot inputs
# ("spectral" was renamed "nipy_spectral" in newer matplotlib releases)
plt.imshow(input_a, cmap="nipy_spectral", interpolation='nearest')
plt.imshow(input_b, cmap="nipy_spectral", interpolation='nearest')
# Desired output, union of a and b
output = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 2, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 2, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 4, 7, 7, 7, 7, 0, 0],
[0, 0, 5, 5, 5, 6, 7, 7, 7, 7, 0, 0],
[0, 0, 5, 5, 5, 6, 7, 7, 7, 7, 0, 0],
[0, 0, 5, 5, 5, 6, 7, 7, 7, 7, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
# Plot desired output
plt.imshow(output, cmap="nipy_spectral", interpolation='nearest')

If I understood the circumstances correctly, you are looking to have unique pairings from a and b. So, 1 from a and 1 from b would have one unique tag in the output; 1 from a and 3 from b would have another unique tag in the output. Also looking at the desired output in the question, it seems that there is an additional conditional situation here that if b is zero, the output is to be zero as well irrespective of the unique pairings.
The following implementation tries to solve all of that -
c = a*(b.max()+1) + b
c[b==0] = 0
_,idx = np.unique(c,return_inverse= True)
out = idx.reshape(b.shape)
Sample run -
In [21]: a
Out[21]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0],
[0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
[0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [22]: b
Out[22]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [23]: out
Out[23]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 3, 5, 5, 5, 5, 0, 0],
[0, 0, 1, 1, 1, 3, 5, 5, 5, 5, 0, 0],
[0, 0, 1, 1, 1, 2, 4, 4, 4, 4, 0, 0],
[0, 0, 6, 6, 6, 7, 4, 4, 4, 4, 0, 0],
[0, 0, 6, 6, 6, 7, 4, 4, 4, 4, 0, 0],
[0, 0, 6, 6, 6, 7, 4, 4, 4, 4, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Sample plot -
# Plot inputs ("spectral" was renamed "nipy_spectral" in newer matplotlib releases)
plt.figure()
plt.imshow(a, cmap="nipy_spectral", interpolation='nearest')
plt.figure()
plt.imshow(b, cmap="nipy_spectral", interpolation='nearest')
# Plot output
plt.figure()
plt.imshow(out, cmap="nipy_spectral", interpolation='nearest')
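
The four-line implementation above can be wrapped in a function and tried on a tiny input. This is a sketch under a few assumptions that are not stated in the answer: labels are non-negative integers, and the inputs are cast to 64-bit integers first, because with labels up to 200000 the product a*(b.max()+1) reaches roughly 4e10 and would overflow a 32-bit integer array. The function name relabel_pairs is made up for the example.

```python
import numpy as np

def relabel_pairs(a, b):
    """Give each distinct (a, b) pair a new ID; cells with b == 0 get 0."""
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    combined = a * (b.max() + 1) + b   # unique integer per (a, b) pair
    combined[b == 0] = 0               # background wherever b is zero
    _, inverse = np.unique(combined, return_inverse=True)
    return inverse.reshape(b.shape)    # consecutive IDs 0, 1, 2, ...

a_small = np.array([[0, 1, 1],
                    [2, 2, 1]])
b_small = np.array([[0, 1, 2],
                    [1, 0, 2]])
print(relabel_pairs(a_small, b_small))
```

One caveat: the background ends up as 0 only because 0 is the smallest combined value present. If no cell has b == 0, the smallest pair would receive ID 0 instead.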

Here is a way to do it conceptually in terms of a set union, though not a GIS geometric union, since that was mentioned after I answered.
Make a list of all possible unique 2-tuples of values with one from a and the other from b in that order. Map each tuple in that list to its index in it. Create the union array using that map.
For example say a and b are arrays each containing values in range(4) and assume for simplicity they have the same shape. Then:
import numpy as np
from itertools import permutations
v = range(4)
p = list(permutations(v, 2))  # note: omits equal pairs such as (0, 0)
m = {}
for i, x in enumerate(p):
    m[x] = i
union = np.empty_like(a)
for i, x in np.ndenumerate(a):
    union[i] = m[(x, b[i])]
For demonstration, generating a and b with
np.random.randint(4, size=(3, 3))
produced:
a = array([[3, 0, 3],
[1, 3, 2],
[0, 0, 3]])
b = array([[1, 3, 1],
[0, 0, 1],
[2, 3, 0]])
m = {(0, 1): 0,
(0, 2): 1,
(0, 3): 2,
(1, 0): 3,
(1, 2): 4,
(1, 3): 5,
(2, 0): 6,
(2, 1): 7,
(2, 3): 8,
(3, 0): 9,
(3, 1): 10,
(3, 2): 11}
union = array([[10, 2, 10],
[ 3, 9, 7],
[ 1, 2, 9]])
In this case the property that a union should be bigger than or equal to its components is reflected in increased numerical values rather than in an increased number of elements.
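
One thing to watch with the approach above: itertools.permutations never produces pairs with equal members, so a lookup fails with KeyError wherever a and b hold the same value in the same cell (the random demonstration arrays happen not to contain such a cell). A sketch of the same idea using itertools.product, which does include (0, 0), (1, 1) and so on; the function name union_labels and the tiny test arrays are assumptions made for illustration:

```python
from itertools import product
import numpy as np

def union_labels(a, b, values):
    # Map every ordered pair, including equal pairs, to its position.
    pair_to_id = {pair: i for i, pair in enumerate(product(values, repeat=2))}
    out = np.empty_like(a)
    for idx, x in np.ndenumerate(a):
        out[idx] = pair_to_id[(int(x), int(b[idx]))]
    return out

a_demo = np.array([[0, 1],
                   [2, 3]])
b_demo = np.array([[0, 3],
                   [2, 0]])
print(union_labels(a_demo, b_demo, range(4)))
```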

An issue with using itertools permutations is that the number of permutations could be much larger than needed. It would be much larger if the number of overlaps per area is much smaller than the number of areas.
The question uses Union but the picture shows an Intersection. Divakar's answer replicates the pictured Intersection, and is more elegant than my solution below, which produces the Union.
One could make a dictionary of only the actual overlaps, and then work from that. Flattening the input arrays first makes this easier for me to see, though I'm not sure if that is feasible for you:
import numpy
shp = numpy.shape(input_a)
a = input_a.flatten()
b = input_b.flatten()
s = set(zip(a, b))  # unique pairings
d = {p: i for i, p in enumerate(sorted(s))}  # dict{pair: index}
output_c = numpy.array([d[i, j] for i, j in zip(a, b)]).reshape(shp)
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 0],
[ 0, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 0],
[ 0, 1, 2, 2, 2, 4, 7, 7, 7, 7, 5, 0],
[ 0, 1, 2, 2, 2, 4, 7, 7, 7, 7, 5, 0],
[ 0, 1, 2, 2, 2, 3, 6, 6, 6, 6, 5, 0],
[ 0, 8, 9, 9, 9, 10, 6, 6, 6, 6, 5, 0],
[ 0, 0, 9, 9, 9, 10, 6, 6, 6, 6, 0, 0],
[ 0, 0, 9, 9, 9, 10, 6, 6, 6, 6, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
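
For arrays as large as those described in the question, building the set and dictionary in Python may be slow. The same "index of each pair among the sorted unique pairs" can be obtained from np.unique with axis=0. This is a sketch of that alternative rather than the answer's own method; the function name and the small test arrays are made up:

```python
import numpy as np

def label_unique_pairs(a, b):
    """ID of each (a, b) cell pair among the sorted unique pairs."""
    pairs = np.stack([a.ravel(), b.ravel()], axis=1)   # shape (n_cells, 2)
    # axis=0 treats each row (pair) as one item; rows sort lexicographically.
    _, inverse = np.unique(pairs, axis=0, return_inverse=True)
    return inverse.reshape(a.shape)

a_small = np.array([[0, 1, 1],
                    [2, 2, 1]])
b_small = np.array([[0, 1, 2],
                    [1, 0, 2]])
print(label_unique_pairs(a_small, b_small))
```

Unlike the version that zeroes out cells where b is 0, this keeps pairs such as (2, 0) as their own region, matching the union behaviour shown above.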

How can I use a Matrix as a dataset on PyBrain?

I'm using pybrain in order to train a simple neural network in which the input is going to be a 7x5 matrix.
The following are the inputs:
A = [[0, 0, 1, 0, 0],
[0, 1, 1, 0, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[1, 1, 1, 1, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1]]
E = [[1, 1, 1, 1, 1],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 1, 1, 1, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 1, 1, 1, 1]]
I = [[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0]]
O = [[1, 1, 1, 1, 0],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 1, 1, 1, 0]]
U = [[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[0, 1, 0, 0, 1],
[0, 0, 1, 1, 0]]
I thought writing something like:
ds = SupervisedDataSet(1, 1)
ds.addSample((A), ("A",))
might work, but I'm getting:
ValueError: cannot copy sequence with size 7 to array axis with dimension 1
Is there any way I can give these datasets to PyBrain?

First you have to know that SupervisedDataSet works with flat lists, so you will need to convert each 2D array into a single list. You can do it with something like this:
def convertToList(matrix):
    flat = [y for x in matrix for y in x]  # avoid shadowing the built-in list
    return flat
Then you will need to pass the new list to the dataset through addSample.
Also, if you would like to use that data to train the network, you should use a number to identify each letter, such as A = 1, E = 2, I = 3, O = 4, U = 5. To do this, the second parameter of SupervisedDataSet should simply be 1. In this way you are saying something like "for an input with 35 elements, the target is a single number".
Finally your code should look like this:
from pybrain.datasets import SupervisedDataSet
ds = SupervisedDataSet(35, 1)
A2 = convertToList(A)
ds.addSample(A2, (1,))
E2 = convertToList(E)
ds.addSample(E2, (2,))
I2 = convertToList(I)
ds.addSample(I2, (3,))
O2 = convertToList(O)
ds.addSample(O2, (4,))
U2 = convertToList(U)
ds.addSample(U2, (5,))
Hope this could help.
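
The flattening step can be checked on its own, without PyBrain installed. The sketch below uses the letter A from the question and compares the list comprehension with NumPy's ravel, which gives the same row-by-row order; the variable names are illustrative only.

```python
import numpy as np

A = [[0, 0, 1, 0, 0],
     [0, 1, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0],
     [1, 1, 1, 1, 1],
     [1, 0, 0, 0, 1],
     [1, 0, 0, 0, 1]]

flat_loop = [y for x in A for y in x]         # same as convertToList(A)
flat_numpy = np.asarray(A).ravel().tolist()   # row-major flattening

print(len(flat_loop))            # 7 rows * 5 columns
print(flat_loop == flat_numpy)
```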
