Function numpy.reshape - python

I have this function in MATLAB:
cn = reshape(repmat(sn, n_rep, 1), 1, []);
In Python I have the following code:
import numpy as np
from numpy.random import randint
M = 2
N = 2 * 10 ** 8  # number of data values
n_rep = 3  # number of repetitions
sn = randint(0, M, size=N)  # integers 0 and 1
print("sn =", sn)
cn_repmat = np.tile(sn, n_rep)
print("cn_repmat =", cn_repmat)
cn = np.reshape(cn_repmat, 1, [])
print(cn)
It fails with the following error, which I do not understand:
  File "C:/Users/Sergio Malhao/.spyder-py3/Desktop/untitled6.py", line 17, in <module>
    cn = np.reshape(cn_repmat, 1, [])
  File "E:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py", line 232, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "E:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: cannot reshape array of size 600000000 into shape (1,)

NumPy is not meant to be a 1:1 clone of MATLAB. It works similarly, but not identically.
I assume you want to flatten the matrix into a single row.
Try:
np.reshape(cn_repmat, (1, -1))
where (1, -1) is a tuple giving the shape of the new array.
One shape dimension can be -1. In this case, the value is inferred
from the length of the array and the remaining dimensions.
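As a small illustration of the -1 placeholder, the sketch below uses a tiny stand-in for sn (the values and n_rep = 3 are chosen here purely for demonstration) and prints the shapes NumPy infers.

```python
import numpy as np

sn = np.array([0, 1, 1, 0, 1])        # small stand-in for the random data
n_rep = 3
cn_repmat = np.tile(sn, n_rep)        # 1-D array of length 15

row = np.reshape(cn_repmat, (1, -1))  # -1 is inferred as 15
flat = np.reshape(cn_repmat, -1)      # a plain 1-D result

print(row.shape)   # (1, 15)
print(flat.shape)  # (15,)
```

Passing the shape as one tuple (or a single integer) is what matters; the second positional slots of np.reshape are not additional dimensions.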

In Octave:
>> sn = [0,1,2,3,4]
sn =
0 1 2 3 4
>> repmat(sn,4,1)
ans =
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
>> reshape(repmat(sn,4,1),1,[])
ans =
0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4
In numpy:
In [595]: sn=np.array([0,1,2,3,4])
In [596]: np.repeat(sn,4)
Out[596]: array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
In [597]: np.tile(sn,4)
Out[597]: array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
In MATLAB matrices are at least 2d; in numpy they may be 1d. Out[596] is 1d.
We could get closer to the MATLAB by making the sn 2d:
In [608]: sn2 = sn[None,:] # = sn.reshape((1,-1))
In [609]: sn2
Out[609]: array([[0, 1, 2, 3, 4]])
In [610]: np.repeat(sn2,4,1)
Out[610]: array([[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]])
With tile we have to transpose or play order games (MATLAB is order F):
In [613]: np.tile(sn,[4,1])
Out[613]:
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
In [614]: np.tile(sn,[4,1]).T.ravel()
Out[614]: array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
In [615]: np.tile(sn,[4,1]).ravel(order='F')
Out[615]: array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
ravel is the equivalent of reshape(...., -1). -1 functions like [] in MATLAB when reshaping.
In numpy repeat is the basic function; tile uses repeat with a different user interface (more like repmat).
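Putting this together for the original question, a minimal sketch of the MATLAB expression reshape(repmat(sn, n_rep, 1), 1, []) might look like the following. The helper name matlab_style_repeat is made up for this illustration; it simply delegates to np.repeat and cross-checks the result against the tile-based route shown above.

```python
import numpy as np

def matlab_style_repeat(sn, n_rep):
    """Hypothetical helper: mimic reshape(repmat(sn, n_rep, 1), 1, [])."""
    sn = np.asarray(sn)
    # Each element is repeated n_rep times in place; the result stays 1-D.
    return np.repeat(sn, n_rep)

sn = np.array([0, 1, 2, 3, 4])
cn = matlab_style_repeat(sn, 4)
print(cn)

# Cross-check against the tile-based route with column-major flattening.
via_tile = np.tile(sn, (4, 1)).ravel(order="F")
print(np.array_equal(cn, via_tile))
```

np.repeat produces the final array directly, without the intermediate 2-D array, which may matter for an input as large as N = 2 * 10**8.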

Related

Python for loop syntax with two dimensional array

for f in train.columns:
    missings = train[train[f] == -1][f].count()
What does train[][] mean? How can this be a two-dimensional array if f refers to another column?
For vanilla Python, this is certainly very odd and poorly written code, but it could be valid in a very limited number of situations. Below are a couple of examples in which it would work. I am sure there are more, but either way it is not easy to understand and I do not recommend using it in your own code.
Note: the list.count() method requires one argument.
Example 1
f = 4
train = [[1, 2, 3, 4, [0, 0, 1, 0]], [1, 2, 3, 4, [1, 0, 1, 1]], 0, 1, -1]
missings = train[train[f] == -1][f].count(1)
print(missings) # output = 3
Example 2
f = 4
train = {True: [1, 2, 3, 4, [0, 0, 0, 1]], False: [1, 2, 3, 4, [1, 1, 1, 0]], 4: 1}
missings = train[train[f] == -1][f].count(1)
print(missings) # output = 3
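To see why such a chained expression evaluates at all, it can help to unpack it one step at a time. The sketch below does this for the list-based example; the intermediate names (is_missing, row, inner) are introduced here only for illustration.

```python
f = 4
train = [[1, 2, 3, 4, [0, 0, 1, 0]], [1, 2, 3, 4, [1, 0, 1, 1]], 0, 1, -1]

is_missing = train[f] == -1   # train[4] is -1, so this is True
row = train[is_missing]       # True is used as index 1 -> second sublist
inner = row[f]                # element 4 of that sublist: [1, 0, 1, 1]
missings = inner.count(1)     # number of 1s in the inner list

print(is_missing, inner, missings)  # True [1, 0, 1, 1] 3
```

The trick relies on bool being a subclass of int in Python, so True and False act as the indices 1 and 0.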
It looks like you are already reading values out of a 2D structure, i.e. train[train[f] == -1][f].
You can create a 2D array like this:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
or as a nested list:
arr = [[12, 13, 5, 4], [14, 8, 11], [12, 10, 12, 6], [15, 17, 9, 0]]

Find runs and lengths of consecutive values in an array

I'd like to find equal values in an array, and their indices, if they occur consecutively more than 2 times.
[0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4]
So in this example I would find that the value 2 occurred 4 times in a row, starting from position 8. Is there any built-in function to do that?
I found a way with collections.Counter:
collections.Counter(a)
# Counter({2: 5, 1: 4, 0: 3, 3: 2, 4: 1})
but this is not what I am looking for.
Of course I can write a loop, compare neighbouring values, and count them, but maybe there is a more elegant solution?
Find consecutive runs and length of runs with condition
import numpy as np
arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4])
res = np.ones_like(arr)
np.bitwise_xor(arr[:-1], arr[1:], out=res[1:]) # set equal, consecutive elements to 0
# use this for np.floats instead
# arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2.4, 2.4, 2.4, 2, 1, 3, 4, 4, 4, 5])
# res = np.hstack([True, ~np.isclose(arr[:-1], arr[1:])])
idxs = np.flatnonzero(res) # get indices of non zero elements
values = arr[idxs]
counts = np.diff(idxs, append=len(arr)) # difference between consecutive indices are the length
cond = counts > 2
values[cond], counts[cond], idxs[cond]
Output
(array([2]), array([4]), array([8]))
# (array([2.4, 4. ]), array([3, 3]), array([ 8, 14]))
_, i, c = np.unique(np.r_[[0], ~np.isclose(arr[:-1], arr[1:])].cumsum(),
                    return_index=True,
                    return_counts=True)
for index, count in zip(i, c):
    if count > 1:
        print([arr[index], count, index])
Out[]: [2, 4, 8]
A slightly more compact way of doing it that works for all input types.
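The same idea can be wrapped in a small reusable function. This is only a sketch: the name find_runs and the min_length parameter are invented here, and it compares neighbours with != rather than np.isclose, so it should also accept non-numeric dtypes (for floats with rounding noise, the np.isclose variant is the safer choice).

```python
import numpy as np

def find_runs(arr, min_length=1):
    """Hypothetical helper: return (values, lengths, starts) of runs."""
    arr = np.asarray(arr)
    if arr.size == 0:
        empty = np.array([], dtype=int)
        return arr, empty, empty
    # A run starts at index 0 and wherever an element differs from its predecessor.
    is_start = np.concatenate(([True], arr[1:] != arr[:-1]))
    starts = np.flatnonzero(is_start)
    # Run length = distance to the next start (or to the end of the array).
    lengths = np.diff(np.concatenate((starts, [arr.size])))
    keep = lengths >= min_length
    return arr[starts][keep], lengths[keep], starts[keep]

a = [0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4]
values, lengths, starts = find_runs(a, min_length=3)
print(values, lengths, starts)  # [2] [4] [8]
```

With min_length=3 this matches the "more than 2 times" condition from the question.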

How to create a sequence of sequences of numbers in NumPy?

Inspired by the post How to create a sequence of sequences of numbers in R?.
Question:
I would like to make the following sequence in NumPy.
[1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
I have tried the following:
Non-generic and hard coding using np.r_
np.r_[1:6, 2:6, 3:6, 4:6, 5:6]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
Pure Python to generate the desired array.
n = 5
a = np.r_[1:n+1]
[i for idx in range(a.shape[0]) for i in a[idx:]]
# [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
Create a 2D array and take the upper triangle from it.
n = 5
a = np.r_[1:n+1]
arr = np.tile(a, (n, 1))
print(arr)
# [[1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]]
o = np.triu(arr).flatten()
# array([1, 2, 3, 4, 5,
# 0, 2, 3, 4, 5,
# 0, 0, 3, 4, 5, # This is 1D array
# 0, 0, 0, 4, 5,
# 0, 0, 0, 0, 5])
out = o[o > 0]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
The above solution is generic but I want to know if there's a more efficient way to do it in NumPy.
I'm not sure if this is a good idea but I tried running it against your python method and it seems to be faster.
np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])
Here is the full code:
import numpy as np
from time import time
n = 5000
t = time()
c = np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])
print(time() - t)
# 0.039876699447631836
t = time()
a = np.r_[1:n+1]
b = np.array([i for idx in range(a.shape[0]) for i in a[idx:]])
print(time() - t)
# 2.0875167846679688
print(all(b == c))
# True
A really plain Python (no numpy) way is:
n = 5
a = [r for start in range(1, n+1) for r in range(start, n+1)]
This will be faster for small n (up to about 150) but slower than @tangolin's solution for larger n. It is still faster than the OP's "pure Python" way.
A faster implementation prepares the data in advance, which avoids creating a new range each time:
source = np.arange(1, n+1)
d = np.concatenate([source[i: n+1] for i in range(0, n)])
NOTE
My original implementation both allocated space for the return value and prepared the data in advance, but it was not Pythonic. I changed it to use concatenate after reading @tangolin's answer and noticing that concatenate does the same.
Original implementation:
e = np.empty((n*(n+1)//2, ), dtype='int64')
source = np.arange(1, n+1)
for i in range(n):
    init = n * i - i*(i-1)//2  # offset where the i-th sub-sequence starts
    end = n - i + init         # it holds n - i values
    e[init:end] = source[i:n]
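Another way to avoid the Python-level loop entirely is to read the column indices of the upper triangle directly. This is an alternative not taken from the answers above, offered as a sketch: np.triu_indices returns the row and column indices of the upper triangle in row-major order, so the column indices plus one give the desired sequence.

```python
import numpy as np

n = 5
# Column indices of the upper triangle (row by row), shifted to start at 1.
seq = np.triu_indices(n)[1] + 1
print(seq)  # [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]

# Cross-check against the concatenate-based version.
expected = np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])
print(np.array_equal(seq, expected))  # True
```

This builds intermediate index arrays, so it is worth timing against the concatenate version before relying on it for large n.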

Resize matrix by repeating copies of it, in python

Say you have two matrices: A is 2x2 and B is 2x7 (2 rows, 7 columns). I want to create a matrix C of shape 2x7 out of copies of A. The problem is that np.hstack only handles cases where the column counts divide evenly (say 2 and 8, where you can simply stack 4 copies of A to get C), but what about when they do not? Any ideas?
A = [[0,1],
     [2,3]]
B = [[1,2,3,4,5,6,7],
     [1,2,3,4,5,6,7]]
C = [[0,1,0,1,0,1,0],
     [2,3,2,3,2,3,2]]
Here's an approach with modulus -
In [23]: ncols = 7 # No. of cols in output array
In [24]: A[:,np.mod(np.arange(ncols),A.shape[1])]
Out[24]:
array([[0, 1, 0, 1, 0, 1, 0],
       [2, 3, 2, 3, 2, 3, 2]])
Or with the % operator -
In [27]: A[:,np.arange(ncols)%A.shape[1]]
Out[27]:
array([[0, 1, 0, 1, 0, 1, 0],
       [2, 3, 2, 3, 2, 3, 2]])
For such repeated indices, using np.take would be more performant -
In [29]: np.take(A, np.arange(ncols)%A.shape[1], axis=1)
Out[29]:
array([[0, 1, 0, 1, 0, 1, 0],
       [2, 3, 2, 3, 2, 3, 2]])
A solution without numpy (although the numpy solution posted above is a lot nicer):
A = [[0,1],
     [2,3]]
B = [[1,2,3,4,5,6,7],
     [1,2,3,4,5,6,7]]
i_max, j_max = len(A), len(A[0])
C = []
for i, line_b in enumerate(B):
    line_c = [A[i % i_max][j % j_max] for j, _ in enumerate(line_b)]
    C.append(line_c)
print(C)
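The loop above wraps both the row index and the column index. A NumPy sketch of the same idea, extending the modulus approach to both axes, could use np.ix_; the function name tile_to_shape is hypothetical and not part of NumPy.

```python
import numpy as np

def tile_to_shape(A, shape):
    """Hypothetical helper: fill an array of `shape` with wrapped copies of A."""
    A = np.asarray(A)
    rows = np.arange(shape[0]) % A.shape[0]   # wrapped row indices
    cols = np.arange(shape[1]) % A.shape[1]   # wrapped column indices
    # np.ix_ builds an open mesh so every (row, col) pair is selected.
    return A[np.ix_(rows, cols)]

A = np.array([[0, 1],
              [2, 3]])
C = tile_to_shape(A, (2, 7))
print(C)
```

Because the indices wrap on both axes, the target shape does not have to be a multiple of A's shape in either direction.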
The first solution is very nice. Another possible way would be to still use hstack; if you don't want the last copy repeated in full, you can use array slicing to keep only the columns you need:
# a.shape is (2, 2), b.shape is (2, 7)
repeats = int(np.ceil(b.shape[1] / a.shape[1]))  # copies needed to cover b's columns
c = np.hstack([a] * repeats)[:, :b.shape[1]]     # drop the surplus columns
# array([[0, 1, 0, 1, 0, 1, 0],
#        [2, 3, 2, 3, 2, 3, 2]])

Generate 1D NumPy array of concatenated ranges

I want to generate the following array a:
nv = np.random.randint(3, 10+1, size=(1000000,))
a = np.concatenate([np.arange(1,i+1) for i in nv])
Thus, the output would be something like -
[1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, ...]
Is there a better way to do it?
Here's a vectorized approach using cumulative summation -
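def ranges(nv, start = 1):
    shifts = nv.cumsum()                     # index just past the end of each range
    id_arr = np.ones(shifts[-1], dtype=int)  # default step of +1 everywhere
    id_arr[shifts[:-1]] = -nv[:-1]+1         # at each boundary, step back to the start value
    id_arr[0] = start # Skip if we know the start of ranges is 1 already
    return id_arr.cumsum()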
Sample runs -
In [23]: nv
Out[23]: array([3, 2, 5, 7])
In [24]: ranges(nv, start=0)
Out[24]: array([0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6])
In [25]: ranges(nv, start=1)
Out[25]: array([1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7])
Runtime test -
In [62]: nv = np.random.randint(3, 10+1, size=(100000,))
In [63]: %timeit your_func(nv) # @MSeifert's solution
10 loops, best of 3: 129 ms per loop
In [64]: %timeit ranges(nv)
100 loops, best of 3: 5.54 ms per loop
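A related vectorized formulation, shown here only as a sketch and not taken from the answer above, subtracts each range's starting offset from a single global arange. The function name ranges_via_repeat is made up for this example.

```python
import numpy as np

def ranges_via_repeat(nv, start=1):
    """Hypothetical helper: concatenate arange(start, start + n) for n in nv."""
    nv = np.asarray(nv)
    offsets = nv.cumsum() - nv              # index where each range begins
    # Position within its own range = global position - start of that range.
    return np.arange(nv.sum()) - np.repeat(offsets, nv) + start

nv = np.array([3, 2, 5, 7])
print(ranges_via_repeat(nv, start=0))
# [0 1 2 0 1 0 1 2 3 4 0 1 2 3 4 5 6]
```

It trades the in-place boundary assignment for an extra np.repeat, which some readers may find easier to follow; no timing claim is made for it here.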
Instead of doing this with numpy methods, you could use normal Python ranges and just convert the result to an array:
from itertools import chain
import numpy as np
def your_func(nv):
    ranges = (range(1, i+1) for i in nv)
    flattened = list(chain.from_iterable(ranges))
    return np.array(flattened)
This avoids hard-to-understand numpy slicing and constructs. To show a sample case:
>>> import random
>>> nv = [random.randint(1, 10) for _ in range(5)]
>>> print(nv)
[4, 2, 10, 5, 3]
>>> print(your_func(nv))
[ 1 2 3 4 1 2 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 1 2 3]
Why two steps?
a = np.concatenate([np.arange(0,np.random.randint(3,11)) for i in range(1000000)])
