I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond -- i.e. shuffle them in unison with respect to their leading indices.
This code works, and illustrates my goals:
def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
    shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
    permutation = numpy.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b
For example:
>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
        [1, 1],
        [3, 3]]), array([2, 1, 3]))
However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays -- I'd rather shuffle them in-place, since they'll be quite large.
Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.
One other thought I had was this:
def shuffle_in_unison_scary(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)
This works... but it's a little scary, as I see little guarantee it'll continue to work -- it doesn't look like the sort of thing that's guaranteed to survive across numpy versions, for example.
You can use NumPy's array indexing:
def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = numpy.random.permutation(len(a))
    return a[p], b[p]
This will result in the creation of separate, unison-shuffled copies of the arrays.
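For example, reusing the arrays from the question (one possible output, since the permutation is random):
>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> unison_shuffled_copies(a, b)
(array([[3, 3],
        [1, 1],
        [2, 2]]), array([3, 1, 2]))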
Alternatively, scikit-learn's shuffle utility does this in one call:
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])

X, y = shuffle(X, y, random_state=0)
To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html
Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
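A quick way to convince yourself of this (my own check, not part of the original reasoning):

import numpy as np

a = np.arange(6)
b = a * 10
rng_state = np.random.get_state()
np.random.shuffle(a)
np.random.set_state(rng_state)
np.random.shuffle(b)
assert (b == a * 10).all()   # both arrays received the same permutation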
If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.
Example: Let's assume the arrays a and b look like this:
a = numpy.array([[[  0.,   1.,   2.],
                  [  3.,   4.,   5.]],
                 [[  6.,   7.,   8.],
                  [  9.,  10.,  11.]],
                 [[ 12.,  13.,  14.],
                  [ 15.,  16.,  17.]]])
b = numpy.array([[ 0.,  1.],
                 [ 2.,  3.],
                 [ 4.,  5.]])
We can now construct a single array containing all the data:
c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[  0.,   1.,   2.,   3.,   4.,   5.,   0.,   1.],
#        [  6.,   7.,   8.,   9.,  10.,  11.,   2.,   3.],
#        [ 12.,  13.,  14.,  15.,  16.,  17.,   4.,   5.]])
Now we create views simulating the original a and b:
a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)
The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).
In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.
This solution could be adapted to the case that a and b have different dtypes.
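For example, here is a sketch of that adaptation using a structured array; the field names and dtypes are my own choices for illustration:

import numpy as np

# One record per leading index; each field may have its own dtype and shape.
c = np.empty(3, dtype=[('a', 'f8', (2, 3)), ('b', 'f8', 2)])
c['a'], c['b'] = a, b
a2, b2 = c['a'], c['b']   # views into c, shaped like the original a and b
np.random.shuffle(c)      # permutes whole records, so a2 and b2 move in unison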
Very simple solution:
randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]
The two arrays x and y are now both randomly shuffled in the same way.
James wrote an sklearn solution in 2015 which is helpful, but he added a random_state variable that is not needed. In the code below, NumPy's global random state is used automatically.
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y)
from numpy.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array
# Data is currently unshuffled; we should shuffle
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]
Shuffle any number of arrays together, in-place, using only NumPy.
import numpy as np
def shuffle_arrays(arrays, set_seed=-1):
    """Shuffles arrays in-place, in the same order, along axis=0.

    Parameters
    ----------
    arrays : list of NumPy arrays.
    set_seed : Seed value if int >= 0, else seed is random.
    """
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed

    for arr in arrays:
        rstate = np.random.RandomState(seed)
        rstate.shuffle(arr)
It can be used like this:
a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])
shuffle_arrays([a, b, c])
A few things to note:
- The assert ensures that all input arrays have the same length along their first dimension.
- The arrays are shuffled in-place along their first dimension; nothing is returned.
- The random seed stays within the positive int32 range.
- If a repeatable shuffle is needed, the seed value can be set.
After the shuffle, the data can be split using np.split or referenced using slices, depending on the application.
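For instance, a sketch of the slice-based pattern (the shapes here are my own illustration):

import numpy as np

data = np.arange(15, dtype=float).reshape(5, 3)
shuffle_arrays([data])
X, y = data[:, :2], data[:, 2]   # feature columns and label column, still aligned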
You can make an index array like this:
s = np.arange(len(x_data))
then shuffle it:
np.random.shuffle(s)
Now use s to index both arrays; the same shuffled index array yields correspondingly shuffled vectors:
x_data = x_data[s]
x_label = x_label[s]
There is a well-known function that can handle this:
from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X, Y, test_size=0.0)
Setting test_size to 0 avoids the split and gives you back shuffled data. Though this function is usually used to split train and test data, it shuffles them too.
From the documentation:
Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and
next(ShuffleSplit().split(X, y)) and application to input data into a
single call for splitting (and optionally subsampling) data in a
oneliner.
This seems like a very simple solution:
import numpy as np
def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    c = np.arange(len(a))
    np.random.shuffle(c)
    return a[c], b[c]
a = np.asarray([[1, 1], [2, 2], [3, 3]])
b = np.asarray([11, 22, 33])
shuffle_in_unison(a,b)
Out[94]:
(array([[3, 3],
        [2, 2],
        [1, 1]]),
 array([33, 22, 11]))
One way to shuffle connected lists in-place is to fix a seed (it could itself be random) and use numpy.random.shuffle to do the shuffling.
# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
    np.random.seed(seed)
    np.random.shuffle(a)
    np.random.seed(seed)
    np.random.shuffle(b)
That's it. This will shuffle both a and b in the exact same way. This is also done in-place which is always a plus.
Edit: don't use np.random.seed(); use np.random.RandomState instead:
def shuffle(a, b, seed):
    rand_state = np.random.RandomState(seed)
    rand_state.shuffle(a)
    rand_state.seed(seed)
    rand_state.shuffle(b)
When calling it just pass in any seed to feed the random state:
a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)
Output:
>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]
Edit: Fixed code to re-seed the random state
Say we have two arrays: a and b.
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]])
We can first obtain row indices by permuting the first dimension:
indices = np.random.permutation(a.shape[0])
[1 2 0]
Then use advanced indexing. Here we use the same indices to shuffle both arrays in unison:
a_shuffled = a[indices[:, np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:, np.newaxis], np.arange(b.shape[1])]
This is equivalent to
np.take(a, indices, axis=0)
[[4 5 6]
[7 8 9]
[1 2 3]]
np.take(b, indices, axis=0)
[[6 6 6]
[4 2 0]
[9 1 1]]
If you want to avoid copying arrays, then instead of generating a permutation list, I would suggest going through every element in the array and randomly swapping it to another position in the array:
for old_index in range(len(a)):
    new_index = numpy.random.randint(old_index + 1)
    # Swap via .copy() so that rows of multi-dimensional arrays are not
    # clobbered through views during the tuple assignment.
    a[old_index], a[new_index] = a[new_index].copy(), a[old_index].copy()
    b[old_index], b[new_index] = b[new_index].copy(), b[old_index].copy()
This implements the Knuth-Fisher-Yates shuffle algorithm.
Shortest and easiest way in my opinion: use a seed:
import random

random.seed(seed)
random.shuffle(x_data)
# Reset the same seed to get the identical random sequence and shuffle y.
random.seed(seed)
random.shuffle(y_data)
Most solutions above work; however, if you have column vectors you have to transpose them first. Here is an example:
def shuffle(self) -> None:
    """Shuffles X and Y."""
    x = self.X.T
    y = self.Y.T
    p = np.random.permutation(len(x))
    self.X = x[p].T
    self.Y = y[p].T
With an example, this is what I'm doing:
import numpy as np
from random import shuffle

combo = []
for i in range(60000):
    combo.append((images[i], labels[i]))
shuffle(combo)

im = []
lab = []
for c in combo:
    im.append(c[0])
    lab.append(c[1])
images = np.asarray(im)
labels = np.asarray(lab)
I extended Python's random.shuffle() to take a second argument:
import random

def shuffle_together(x, y):
    assert len(x) == len(y)
    for i in reversed(range(1, len(x))):
        # Pick an element in x[:i+1] with which to exchange x[i].
        j = int(random.random() * (i + 1))
        x[i], x[j] = x[j], x[i]
        y[i], y[j] = y[j], y[i]
That way I can be sure that the shuffling happens in-place, and the function is not all too long or complicated.
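For example (plain Python lists, shuffled in place; the exact permutation is of course random):

x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
shuffle_together(x, y)
# x and y now carry the same permutation, e.g. [3, 1, 4, 2] and [30, 10, 40, 20]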
Just use NumPy. First merge the two input arrays (the 1-D array is the labels y, the 2-D array is the data x), shuffle the result with NumPy's shuffle method, then split them again and return:
import numpy as np

def shuffle_2d(a, b):
    rows = a.shape[0]
    if b.shape != (rows, 1):
        b = b.reshape((rows, 1))
    S = np.hstack((b, a))
    np.random.shuffle(S)
    b, a = S[:, 0], S[:, 1:]
    return a, b
features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(x, y)
I am interested in implementing this paper on Kronecker Recurrent Units in TensorFlow.
This involves the computation of a Kronecker Product. TensorFlow does not have an operation for Kronecker Products. I am looking for an efficient and robust way to compute this.
Does this exist, or would I need to define a TensorFlow op manually?
If you read the mathematical definition of conv2d_transpose and compare it with what the Kronecker product computes, you will see that with the appropriate strides for conv2d_transpose (the width and height of the second matrix), it does the same thing.
Moreover, you even get batching of Kronecker products out of the box with conv2d_transpose.
Here is an example which calculates the Kronecker product for the matrices from the Wikipedia article:
import tensorflow as tf

a = [[1, 2], [3, 4]]
b = [[0, 5], [6, 7]]

i, k, s = len(a), len(b), len(b)
o = s * (i - 1) + k

a_tf = tf.reshape(tf.constant(a, dtype=tf.float32), [1, i, i, 1])
b_tf = tf.reshape(tf.constant(b, dtype=tf.float32), [k, k, 1, 1])

res = tf.squeeze(tf.nn.conv2d_transpose(a_tf, b_tf, (1, o, o, 1), [1, s, s, 1], "VALID"))

with tf.Session() as sess:
    print(sess.run(res))
Notice that in the case of a non-square matrix, you will need to calculate more dimensions in the lines:
i, k, s = len(a), len(b), len(b)
o = s * (i - 1) + k
and use them accordingly as your strides and output_shape arguments.
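For reference, here is a sketch of that non-square generalisation; the shape bookkeeping is my own reading of the same argument, not code from the answer:

ia, ja = len(a), len(a[0])                        # first matrix: ia x ja
kb, lb = len(b), len(b[0])                        # second matrix: kb x lb
oh, ow = kb * (ia - 1) + kb, lb * (ja - 1) + lb   # output: kb*ia x lb*ja

a_tf = tf.reshape(tf.constant(a, dtype=tf.float32), [1, ia, ja, 1])
b_tf = tf.reshape(tf.constant(b, dtype=tf.float32), [kb, lb, 1, 1])
res = tf.squeeze(tf.nn.conv2d_transpose(
    a_tf, b_tf, (1, oh, ow, 1), [1, kb, lb, 1], "VALID"))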
TensorFlow 1.7+ provides the function kronecker_product in tf.contrib.kfac.utils.kronecker_product:
a = tf.eye(3)
b = tf.constant([[1., 2.], [3., 4.]])
kron = tf.contrib.kfac.utils.kronecker_product(a, b)
tf.Session().run(kron)
Output:
array([[1., 2., 0., 0., 0., 0.],
       [3., 4., 0., 0., 0., 0.],
       [0., 0., 1., 2., 0., 0.],
       [0., 0., 3., 4., 0., 0.],
       [0., 0., 0., 0., 1., 2.],
       [0., 0., 0., 0., 3., 4.]], dtype=float32)
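Note that tf.contrib was removed in TensorFlow 2.x; as far as I know, the equivalent dense result can be built there from tf.linalg.LinearOperatorKronecker:

import tensorflow as tf

a = tf.eye(3)
b = tf.constant([[1., 2.], [3., 4.]])
op = tf.linalg.LinearOperatorKronecker([
    tf.linalg.LinearOperatorFullMatrix(a),
    tf.linalg.LinearOperatorFullMatrix(b),
])
kron = op.to_dense()   # same 6x6 matrix as above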
Here's the utility I use for this; see kronecker_test for an example of its usage.
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops

def check_equal(a, b):
    # Helper assumed by kronecker_test; it was not defined in the original
    # snippet, so a NumPy-based stand-in is provided here.
    np.testing.assert_allclose(a, b)

def fix_shape(tf_shape):
    return tuple(int(dim) for dim in tf_shape)

def concat_blocks(blocks, validate_dims=True):
    """Takes a 2d grid of blocks representing matrices and concatenates to a
    single matrix (aka ArrayFlatten)."""
    if validate_dims:
        col_dims = np.array([[int(b.shape[1]) for b in row] for row in blocks])
        col_sums = col_dims.sum(1)
        assert (col_sums[0] == col_sums).all()
        row_dims = np.array([[int(b.shape[0]) for b in row] for row in blocks])
        row_sums = row_dims.sum(0)
        assert (row_sums[0] == row_sums).all()
    block_rows = [tf.concat(row, axis=1) for row in blocks]
    return tf.concat(block_rows, axis=0)

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

original_shape_func = ops.set_shapes_for_outputs

def disable_shape_inference():
    ops.set_shapes_for_outputs = lambda _: _

def enable_shape_inference():
    ops.set_shapes_for_outputs = original_shape_func

def kronecker(A, B, do_shape_inference=True):
    """Kronecker product of A, B.

    do_shape_inference: if False, shape inference is turned off,
    which makes a 10x10 kron go 2.4 sec -> 0.9 sec.
    """
    Arows, Acols = fix_shape(A.shape)
    Brows, Bcols = fix_shape(B.shape)
    Crows, Ccols = Arows * Brows, Acols * Bcols

    temp = tf.reshape(A, [-1, 1, 1]) * tf.expand_dims(B, 0)
    Bshape = tf.constant((Brows, Bcols))

    # Turn off shape inference if requested.
    if not do_shape_inference:
        disable_shape_inference()

    # [1, n, m] => [n, m]
    slices = [tf.reshape(s, Bshape) for s in tf.split(temp, Crows)]

    grid = list(chunks(slices, Acols))
    assert len(grid) == Arows
    result = concat_blocks(grid, validate_dims=do_shape_inference)

    if not do_shape_inference:
        enable_shape_inference()
        result.set_shape((Arows * Brows, Acols * Bcols))

    return result

def kronecker_test():
    A0 = [[1, 2], [3, 4]]
    B0 = [[6, 7], [8, 9]]
    A = tf.constant(A0)
    B = tf.constant(B0)
    C = kronecker(A, B)
    sess = tf.Session()
    C0 = sess.run(C)
    Ct = [[6, 7, 12, 14], [8, 9, 16, 18], [18, 21, 24, 28], [24, 27, 32, 36]]
    Cnp = np.kron(A0, B0)
    check_equal(C0, Ct)
    check_equal(C0, Cnp)
Try the following solution and see if it works for you:
def tf_kron(a, b):
    a_shape = [a.shape[0].value, a.shape[1].value]
    b_shape = [b.shape[0].value, b.shape[1].value]
    return tf.reshape(
        tf.reshape(a, [a_shape[0], 1, a_shape[1], 1]) *
        tf.reshape(b, [1, b_shape[0], 1, b_shape[1]]),
        [a_shape[0] * b_shape[0], a_shape[1] * b_shape[1]])
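A quick TF1-style sanity check against np.kron (this usage snippet is my own addition):

import numpy as np

a0 = [[1., 2.], [3., 4.]]
b0 = [[0., 5.], [6., 7.]]
with tf.Session() as sess:
    out = sess.run(tf_kron(tf.constant(a0), tf.constant(b0)))
np.testing.assert_allclose(out, np.kron(a0, b0))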
How about something like this:
def kron(x, y):
    """Computes the Kronecker product of two matrices.

    Args:
      x: A matrix (or batch thereof) of size m x n.
      y: A matrix (or batch thereof) of size p x q.

    Returns:
      z: Kronecker product of matrices x and y of size mp x nq.
    """
    with tf.name_scope('kron'):
        x = tf.convert_to_tensor(x, dtype_hint=tf.float32)
        y = tf.convert_to_tensor(y, dtype_hint=x.dtype)

        def _maybe_expand(x):
            xs = tf.pad(
                tf.shape(x),
                paddings=[[tf.maximum(2 - tf.rank(x), 0), 0]],
                constant_values=1)
            x = tf.reshape(x, xs)
            _, mx, nx = tf.split(xs, num_or_size_splits=[-1, 1, 1])
            return x, mx, nx

        x, mx, nx = _maybe_expand(x)
        y, my, ny = _maybe_expand(y)
        x = x[..., :, tf.newaxis, :, tf.newaxis]
        y = y[..., tf.newaxis, :, tf.newaxis, :]
        z = x * y
        bz = tf.shape(z)[:-4]
        z = tf.reshape(z, tf.concat([bz, mx * my, nx * ny], axis=0))
        return z
This solution:
- supports batches
- supports broadcasting
- works in XLA
- clearly shows the relationship between NumPy broadcasting and Kronecker products
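For instance, under the assumption that both inputs carry a matching batch dimension:

x = tf.random.normal([8, 2, 3])   # batch of 8 matrices, each 2x3
y = tf.random.normal([8, 4, 5])   # batch of 8 matrices, each 4x5
z = kron(x, y)                    # shape [8, 8, 15], i.e. [8, 2*4, 3*5]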
For my astronomy homework, I need to simulate the elliptical orbit of a planet around a sun. To do this, I need to use a for loop to repeatedly calculate the motion of the planet. However, every time I try to run the program, I get the following error:
RuntimeWarning: invalid value encountered in power
r=(x**2+y**2)**1.5
Traceback (most recent call last):
File "planetenstelsel3-4.py", line 25, in <module>
ax[i] = a(x[i],y[i])*x[i]
ValueError: cannot convert float NaN to integer
I've done some testing, and I think the problem is that the calculated values are larger than what fits in an integer, and the arrays are defined as int arrays. So if there were a way to define them as float arrays, maybe it would work. Here is my code:
import numpy as np
import matplotlib.pyplot as plt

dt = 3600   #s
N = 5000

x = np.tile(0, N)
y = np.tile(0, N)
x[0] = 1.496e11   #m
y[0] = 0.0

vx = np.tile(0, N)
vy = np.tile(0, N)
vx[0] = 0.0
vy[0] = 28000   #m/s

ax = np.tile(0, N)
ay = np.tile(0, N)

m1 = 1.988e30   #kg
G = 6.67e-11    #Nm^2kg^-2

def a(x, y):
    r = (x**2 + y**2)**1.5
    return (-G * m1) / r

for i in range(0, N):
    r = x[i], y[i]
    ax[i] = a(x[i], y[i]) * x[i]
    ay[i] = a(x[i], y[i]) * y[i]
    vx[i+1] = vx[i] + ax[i] * dt
    vy[i+1] = vy[i] + ay[i] * dt
    x[i+1] = x[i] + vx[i] * dt
    y[i+1] = y[i] + vy[i] * dt

plt.plot(x, y)
plt.show()
The first few lines are just some starting parameters.
Thanks for the help in advance!
When you are doing physics simulations you should definitely use floats for everything. 0 is an integer constant in Python, so np.tile creates integer arrays; use 0.0 as the argument to np.tile to get floating-point arrays, or preferably use np.zeros(N) instead.
You can check the datatype of any array variable from its dtype member:
>>> np.tile(0, 10).dtype
dtype('int64')
>>> np.tile(0.0, 10).dtype
dtype('float64')
>>> np.zeros(10)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> np.zeros(10).dtype
dtype('float64')
To get a zeroed array of float32 you'd need to give a float32 as the argument:
>>> np.tile(np.float32(0), 10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
or, preferably, use zeros with a defined dtype:
>>> np.zeros(10, dtype='float32')
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
You need x = np.zeros(N), etc.: this declares the arrays as float arrays.
This is the standard way of putting zeros in an array (np.tile() is instead convenient for creating a tiling from a fixed array).
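Putting the answers together, the corrected setup might look like the sketch below. Note that the range(N - 1) loop bound is my own addition: the original loop writes to index i + 1 and would run off the end of the arrays.

import numpy as np

dt = 3600                            # s
N = 5000
x, y = np.zeros(N), np.zeros(N)      # float64 by default
vx, vy = np.zeros(N), np.zeros(N)
ax, ay = np.zeros(N), np.zeros(N)
x[0] = 1.496e11                      # m
vy[0] = 28000.0                      # m/s

for i in range(N - 1):               # stop early: the loop writes to index i + 1
    acc = a(x[i], y[i])              # the a() function from the question
    ax[i], ay[i] = acc * x[i], acc * y[i]
    vx[i+1] = vx[i] + ax[i] * dt
    vy[i+1] = vy[i] + ay[i] * dt
    x[i+1] = x[i] + vx[i] * dt
    y[i+1] = y[i] + vy[i] * dt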
I have a multidimensional numpy array.
The first array indicates the quality of the data. 0 is good, 1 is not so good.
For a first check I only want to use good data.
How do I split the array into two new ones?
My own idea does not work:
good_data = [x for x in data[0,:] if x = 1.0]
bad_data = [x for x in data[0,:] if x = 0.0]
Here is a small example indicating my problem:
import numpy as np
flag = np.array([0., 0., 0., 1., 1., 1.])
temp = np.array([300., 310., 320., 300., 222., 333.])
pressure = np.array([1013., 1013., 1013., 900., 900., 900.])
data = np.array([flag, temp, pressure])
good_data = data[0,:][data[0,:] == 1.0]
bad_data = data[0,:][data[0,:] == 0.0]
print(good_data)
The print statement gives me [1., 1., 1.].
But I am looking for [[1., 1., 1.], [300., 222., 333.], [900., 900., 900.]].
Is this what you are looking for?
good_data = data[0,:][data[0,:] == 1.0]
bad_data = data[0,:][data[0,:] == 0.0]
This returns a numpy.array.
Alternatively, you can do as you suggested, but convert the resulting list to numpy.array:
good_data = np.array([x for x in data[0,:] if x == 1.0])
Notice the comparison operator == in place of the assignment operator =.
For your particular example, subset data using flag == 1 while iterating over the first index:
good_data = [data[n,:][flag == 1] for n in range(data.shape[0])]
If you really want the elements of good_data to be lists, convert inside the comprehension:
good_data = [data[n,:][flag == 1].tolist() for n in range(data.shape[0])]
Thanks to Jaime who pointed out that the easy way to do this is:
good_data = data[:, data[0] == 1]
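The complementary subset falls out of the same idea by negating the boolean mask:

mask = data[0] == 1
good_data = data[:, mask]
bad_data = data[:, ~mask]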
Given a 3 times 3 numpy array
a = numpy.arange(0,27,3).reshape(3,3)
# array([[ 0, 3, 6],
# [ 9, 12, 15],
# [18, 21, 24]])
To normalize the rows of the 2-dimensional array I thought of
row_sums = a.sum(axis=1) # array([ 9, 36, 63])
new_matrix = numpy.zeros((3,3))
for i, (row, row_sum) in enumerate(zip(a, row_sums)):
    new_matrix[i, :] = row / row_sum
There must be a better way, isn't there?
Perhaps to clarify: by normalizing I mean that the sum of the entries per row must be one. But I think that will be clear to most people.
Broadcasting is really good for this:
row_sums = a.sum(axis=1)
new_matrix = a / row_sums[:, numpy.newaxis]
row_sums[:, numpy.newaxis] reshapes row_sums from being (3,) to being (3, 1). When you do a / b, a and b are broadcast against each other.
You can learn more about broadcasting in the NumPy documentation.
Scikit-learn offers a function normalize() that lets you apply various normalizations. The "make it sum to 1" normalization is the L1 norm. Therefore:
from sklearn.preprocessing import normalize
matrix = numpy.arange(0,27,3).reshape(3,3).astype(numpy.float64)
# array([[ 0., 3., 6.],
# [ 9., 12., 15.],
# [ 18., 21., 24.]])
normed_matrix = normalize(matrix, axis=1, norm='l1')
# [[ 0. 0.33333333 0.66666667]
# [ 0.25 0.33333333 0.41666667]
# [ 0.28571429 0.33333333 0.38095238]]
Now your rows will sum to 1.
I think this should work,
a = numpy.arange(0,27.,3).reshape(3,3)
a /= a.sum(axis=1)[:,numpy.newaxis]
In case you are trying to normalize each row such that its magnitude is one (i.e. each row is a unit vector, with the sum of the squares of its elements equal to one):
import numpy as np
a = np.arange(0,27,3).reshape(3,3)
result = a / np.linalg.norm(a, axis=-1)[:, np.newaxis]
# array([[ 0. , 0.4472136 , 0.89442719],
# [ 0.42426407, 0.56568542, 0.70710678],
# [ 0.49153915, 0.57346234, 0.65538554]])
Verifying:
np.sum( result**2, axis=-1 )
# array([ 1., 1., 1.])
I think you can normalize the row elements' sum to 1 with this:
new_matrix = a / a.sum(axis=1, keepdims=1)
Column normalization can be done analogously with new_matrix = a / a.sum(axis=0, keepdims=1). Hope this can help.
You could use the built-in numpy function and divide by the result:
a / np.linalg.norm(a, axis=1, keepdims=True)
This also works, provided the row sums are given a trailing axis so that broadcasting divides rows rather than columns:
def normalizeRows(M):
    row_sums = M.sum(axis=1)
    return M / row_sums[:, np.newaxis]
You could also use matrix transposition:
(a.T / row_sums).T
Here is one more possible way using reshape:
a_norm = (a/a.sum(axis=1).reshape(-1,1)).round(3)
print(a_norm)
Or using None works too:
a_norm = (a/a.sum(axis=1)[:,None]).round(3)
print(a_norm)
Output:
array([[0. , 0.333, 0.667],
[0.25 , 0.333, 0.417],
[0.286, 0.333, 0.381]])
Use:
a = a / np.linalg.norm(a, ord=2, axis=1, keepdims=True)
Due to broadcasting, each row is divided by its own L2 norm, so the rows end up with unit length. (With axis=0, the columns would be normalized instead.)
Or using a lambda function with map (note that in Python 3 map is lazy, so materialize the result):
>>> import numpy as np
>>> vec = np.arange(0, 27, 3).reshape(3, 3)
>>> norm_vec = np.array(list(map(lambda row: row / np.linalg.norm(row), vec)))
Each row of norm_vec will have a unit norm.
We can achieve the same effect by premultiplying with the diagonal matrix whose main diagonal is the reciprocal of the row sums:
A = np.diag(A.sum(1)**-1) @ A
(A must be a float array here: NumPy does not allow negative integer powers of integer arrays.)
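A quick check that this agrees with the broadcasting approach (my own verification snippet):

import numpy as np

A = np.arange(0, 27, 3, dtype=float).reshape(3, 3)
left = np.diag(A.sum(1)**-1) @ A
right = A / A.sum(1, keepdims=True)
np.testing.assert_allclose(left, right)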