I'm new to programming, and I'm trying to write a Python function to find the inverse of a permutation on {1,2,3,...,n} using the following code:
def inv(str):
    result = []
    i = list(str).index(min(list(str)))
    while min(list(str)) < len(list(str)) + 1:
        list(str)[i : i + 1] = [len(list(str)) + 1]
        result.append(i + 1)
    return result
However, when I try to use the function, inv('<mypermutation>') returns []. Am I missing something? Is Python skipping over my while loop for some syntactical reason I don't understand? None of my google and stackoverflow searches on topics I think of are returning anything helpful.
Other answers are correct, but for what it's worth, there's a much more performant alternative using numpy:
inverse_perm = np.argsort(permutation)
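Why this works, as I understand it: argsort puts at output position k the index of the k-th smallest value, and for a permutation of 0..n-1 the k-th smallest value is k itself, so position k ends up holding the index at which k occurs, which is exactly the inverse. A quick sketch of a check:
perm = np.array([3, 0, 2, 1])
np.argsort(perm)        # array([1, 3, 2, 0])
np.argsort(perm)[perm]  # array([0, 1, 2, 3]) -> composing with perm gives the identity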
EDIT: and the fourth function below is even faster.
Timing code:
import numpy as np
def invert_permutation_list_scan(p):
    return [p.index(l) for l in range(len(p))]
def invert_permutation_list_comp(permutation):
    return [i for i, j in sorted(enumerate(permutation), key=lambda i_j: i_j[1])]
def invert_permutation_numpy(permutation):
    return np.argsort(permutation)
def invert_permutation_numpy2(permutation):
    inv = np.empty_like(permutation)
    inv[permutation] = np.arange(len(inv), dtype=inv.dtype)
    return inv
x = np.random.randn(1000)
perm = np.argsort(x)
permlist = list(perm)
assert np.array_equal(invert_permutation_list_scan(permlist), invert_permutation_numpy(perm))
assert np.array_equal(invert_permutation_list_comp(perm), invert_permutation_numpy(perm))
assert np.array_equal(invert_permutation_list_comp(perm), invert_permutation_numpy2(perm))
%timeit invert_permutation_list_scan(permlist)
%timeit invert_permutation_list_comp(perm)
%timeit invert_permutation_numpy(perm)
%timeit invert_permutation_numpy2(perm)
Results:
invert_permutation_list_scan:  82.2 ms ± 7.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
invert_permutation_list_comp:  479 µs ± 9.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
invert_permutation_numpy:      18 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
invert_permutation_numpy2:     4.22 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
If you only want the inverse permutation, you can use
def inv(perm):
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return inverse
perm = [3, 0, 2, 1]
print(inv(perm))
for i in perm:
    print(inv(perm)[i])
[1, 3, 2, 0]
0
1
2
3
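Note that this assumes the permutation acts on {0, 1, ..., n-1}. For a permutation of {1, 2, ..., n}, as stated in the question, a shifted variant would work; a sketch (inv_one_based is just a hypothetical name):
def inv_one_based(perm):
    # perm is a permutation of {1, ..., n}; the result is 1-based as well
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p - 1] = i + 1
    return inverse
print(inv_one_based([3, 1, 2]))  # [2, 3, 1]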
I believe the best way to invert a permutation perm is
pinv = sorted(range(len(perm)), key=perm.__getitem__)
This avoids repeated calls to .index() (as in the answer by SeF), which may not be very efficient (quadratic time complexity, while sorting should only take O(n log n)).
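For example, reusing the permutation from the earlier answer:
perm = [3, 0, 2, 1]
pinv = sorted(range(len(perm)), key=perm.__getitem__)
print(pinv)  # [1, 3, 2, 0]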
Note, however, that this yields as a result a permutation of {0,1,...n-1}, regardless of whether the input was a permutation of {0,1,...,n-1} or of {1,2,...,n} (the latter is what is stated in the question). If the output is supposed to be a permutation of {1,2,...,n}, each element of the result has to be increased by one, for example, like this:
pinv = [i+1 for i in sorted(range(len(perm)), key=perm.__getitem__)]
Correct me if I have this wrong, but I think the problem with my code comes when I change str to a list: str is a string, and list(str) is a list of string elements. However, since string elements can't be numerically compared to numbers, the code fails to produce a result (other than []).
A "functional style" version:
def invert_permutation(permutation):
    return [i for i, j in sorted(enumerate(permutation), key=lambda ij: ij[1])]
Basically, sorting the indices i of the permutation by their values j in the permutation yields the desired inverse.
p = [2, 1, 5, 0, 4, 3]
invert_permutation(p)
# [3, 1, 0, 5, 4, 2]
# inverse of inverse = identity
invert_permutation(invert_permutation(p)) == p
# True
Just since no one has recommended it here yet, I think it should be mentioned that SymPy has an entire combinatorics module, with a Permutation class:
from sympy.combinatorics import Permutation
o = [3, 0, 2, 1]
p = Permutation(o)
inv = ~p  # equivalent to p.__invert__()
print(inv.array_form) # [1, 3, 2, 0]
Using the SymPy class gives you access to a whole lot of other useful methods, such as comparison between equivalent permutations with ==.
You can read the sympy.combinatorics.Permutation source code here.
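For instance (a small sketch, assuming a reasonably recent SymPy version), the ~ operator and permutation multiplication make the round trip easy to verify:
from sympy.combinatorics import Permutation
p = Permutation([3, 0, 2, 1])
print((~p).array_form)       # [1, 3, 2, 0]
print((p * ~p).is_Identity)  # True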
Other than that, I would recommend the answer on this page using np.arange and argsort.
Maybe there is a shorter way:
def invert(p):
    return [p.index(l) for l in range(len(p))]
so that:
perm = [3, 0, 2, 1]; print(invert(perm))
returns
[1, 3, 2, 0]
Basically, I have:
An array giving indexes "I", e.g. (1, 2),
And a list of the same length giving the corresponding number of repetitions "N", e.g. [1, 3]
And I want to create an array containing the indexes I repeated N times, i.e. (1, 2, 2, 2) here, where 1 is repeated one time and 2 is repeated 3 times.
The best solution I've come up with uses the np.repeat and np.concatenate functions:
import numpy as np
list_index = np.arange(2)
list_no_repetition = [1, 3]
result = np.concatenate([np.repeat(index, no_repetition)
                         for index, no_repetition in zip(list_index, list_no_repetition)])
print(result)
I wonder if there is a "prettier"/"more efficient" solution.
Thank you for your help.
Not sure about prettier, but you could solve it completely with list comprehension:
[x for i,l in zip(list_index, list_no_repetition) for x in [i]*l]
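For example, with the data from the question's code (indices [0, 1] repeated 1 and 3 times):
list_index = [0, 1]
list_no_repetition = [1, 3]
print([x for i, l in zip(list_index, list_no_repetition) for x in [i] * l])
# [0, 1, 1, 1]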
Hello, this is the alternative that I propose:
import numpy as np
list_index = np.arange(2)
list_no_repetition = [1, 3]
result = np.array([])
for i in range(len(list_index)):
    tempA = np.empty(list_no_repetition[i])
    tempA.fill(list_index[i])
    result = np.concatenate([result, tempA])
result
You could also use a dictionary with the key as the index and the value as the number of times it is repeated. I think that Andreas had it right with the list comprehension.
import numpy as np
repeatdict = {
    1: 1,
    2: 3,
    3: 6
}
result = [x for key, value in repeatdict.items() for x in [key]*value]
print(result)
If by "efficiency" you mean speed, you can use timeit. Here are some results for some arbitrary, larger data.
First, define the functions and data:
# generate some data (list values/indices and number of reps)
N = 1000
li_2 = np.arange(N)
lnr_2 = np.random.randint(low=0, high=10, size=N)
# three functions produce the same result
def by_range(items, rep_cts):
    x = np.full(sum(rep_cts), np.nan)
    i = 0
    for val, reps in zip(items, rep_cts):
        x[i:i + reps] = val
        i = i + reps
    return x
def by_comp(items, reps):
    return np.array([val for val, rep in zip(items, reps) for i in range(rep)])
def by_cat(list_index, list_no_repetition):
    return np.concatenate([np.repeat(index, no_repetition)
                           for index, no_repetition in zip(list_index, list_no_repetition)])
About the same speed: first allocating an array and then filling it in, vs. doing a one-line double-for comprehension.
# 820 µs ± 11.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit by_range(li_2, lnr_2)
# 829 µs ± 4.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit by_comp(li_2, lnr_2)
Original method of concatenation is slightly slower:
# 2.19 ms ± 98.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit by_cat(li_2, lnr_2)
Note that the results will differ depending on where/how you run this, and the specific data you're dealing with.
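One more aside, not benchmarked above: as far as I know np.repeat itself accepts an array of per-element repeat counts, so the concatenation in by_cat can be collapsed into a single call. A sketch:
import numpy as np
list_index = np.arange(2)
list_no_repetition = [1, 3]
print(np.repeat(list_index, list_no_repetition))  # [0 1 1 1]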
I am trying to interleave the zeroes and ones according to the order array. The expected output is what I am trying to get to, preferably without using a list comprehension.
import numpy as np
order = np.array([0,1,0,1,0])
zeroes= np.array([10,55, 30])
ones = np.array([3,8])
Expected Output
[10, 3, 55, 8, 30]
How about this (no Python loops: 750x faster than a list comprehension, when tested on 200k elements):
# note: updated version: faster and more robust to faulty input
def altcat(zeroes, ones, order):
    i0 = np.nonzero(order == 0)[0][:len(zeroes)]
    i1 = np.nonzero(order == 1)[0][:len(ones)]
    z = np.zeros_like(order, dtype=zeroes.dtype)
    z[i0] = zeroes[:len(i0)]
    z[i1] = ones[:len(i1)]
    return z
On your example:
>>> altcat(zeroes=np.array([10,55, 30]), ones=np.array([3,8]),
... order=np.array([0,1,0,1,0]))
array([10, 3, 55, 8, 30])
Speed
# set up
n = 200_000
np.random.seed(0)
order = np.random.randint(0, 2, size=n)
n1 = order.sum()
n0 = n - n1
ones = np.random.randint(100, size=n1)
zeroes = np.random.randint(100, size=n0)
# for comparison, a method proposed elsewhere, based on lists
def altcat_list(zeroes, ones, order):
    zeroes = list(zeroes)
    ones = list(ones)
    return [zeroes.pop(0) if i == 0 else ones.pop(0) for i in order]
Test:
a = %timeit -o altcat(zeroes, ones, order)
# 2.38 ms ± 573 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
b = %timeit -o altcat_list(zeroes, ones, order)
# 1.84 s ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
b.average / a.average
# 773.59
Note: I initially tried with n = 1_000_000, but while altcat does that in 12.4ms, the list-based version would take forever and I had to stop it.
It seems that the list-based method is worse than O(n) (100K: 0.4s; 200K: 1.84s; 400K: 10.4s), which makes sense: each pop(0) shifts the remaining list elements, so the whole thing is roughly O(n²).
Addendum
If you really want to do it with a list comprehension and not in pure numpy, then at least consider this:
def altcat_list_mod(zeroes, ones, order):
    it = [iter(zeroes), iter(ones)]
    return [next(it[i]) for i in order]
That's faster than altcat_list(), but still almost 25x slower than altcat():
# on 200k elements
c = %timeit -o altcat_list_mod(zeroes, ones, order)
# 60 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
c.average / a.average
# 24.93
For my class I need to write more optimized math functions using NumPy. The problem is, when using NumPy my solutions are slower than native Python.
A function which cubes all the elements of an array and sums them
Python:
def cube(x):
    result = 0
    for i in range(len(x)):
        result += x[i] ** 3
    return result
Mine, using NumPy (15-30% slower):
def cube(x):
    it = numpy.nditer([x, None])
    for a, b in it:
        b[...] = a*a*a
    return numpy.sum(it.operands[1])
Some random calculation function
Python:
def calc(x):
    m = sum(x) / len(x)
    result = 0
    for i in range(len(x)):
        result += (x[i] - m)**4
    return result / len(x)
NumPy (>10x slower):
def calc(x):
    m = numpy.mean(x)
    result = 0
    for i in range(len(x)):
        result += numpy.power((x[i] - m), 4)
    return result / len(x)
I don't know how to approach this; so far I have tried random functions from NumPy.
To elaborate on what has been said in comments:
Numpy's power comes from being able to do all the looping in fast c/fortran rather than slow Python looping. For example, if you have an array x and you want to calculate the square of every value in that array, you could do
y = []
for value in x:
    y.append(value**2)
or even (with a list comprehension)
y = [value**2 for value in x]
but it will be much faster if you can do all the looping inside numpy with
y = x**2
(assuming x is already a numpy array).
So for your examples, the proper way to do it in numpy would be
1.
def sum_of_cubes(x):
    result = 0
    for i in range(len(x)):
        result += x[i] ** 3
    return result
def sum_of_cubes_numpy(x):
    return (x**3).sum()
2.
def calc(x):
    m = sum(x) / len(x)
    result = 0
    for i in range(len(x)):
        result += (x[i] - m)**4
    return result / len(x)
def calc_numpy(x):
    m = numpy.mean(x) # or just x.mean()
    return numpy.sum((x - m)**4) / len(x)
Note that I've assumed that the input x is already a numpy array, not a regular Python list: if you have a list lst, you can create an array from it with arr = numpy.array(lst).
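A quick sanity check (a sketch; assumes the functions above are defined and numpy has been imported) that the loop versions and the vectorized versions agree:
x = numpy.linspace(0.0, 1.0, 101)
assert abs(sum_of_cubes(x) - sum_of_cubes_numpy(x)) < 1e-9
assert abs(calc(x) - calc_numpy(x)) < 1e-9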
In [337]: def cube(x):
     ...:     result = 0
     ...:     for i in range(len(x)):
     ...:         result += x[i] ** 3
     ...:     return result
     ...:
nditer is not a good numpy iterator, at least not when used in Python-level code. It's really just a stepping stone toward writing compiled code. Its docs need a better disclaimer.
In [338]: def cube1(x):
     ...:     it = numpy.nditer([x, None])
     ...:     for a, b in it:
     ...:         b[...] = a*a*a
     ...:     return numpy.sum(it.operands[1])
     ...:
In [339]: cube(list(range(10)))
Out[339]: 2025
In [340]: cube1(list(range(10)))
Out[340]: 2025
In [341]: cube1(np.arange(10))
Out[341]: 2025
A more direct numpy iteration:
In [342]: def cube2(x):
     ...:     it = [a*a*a for a in x]
     ...:     return numpy.sum(it)
     ...:
The better whole-array code. Just as sum can work with the whole array, the power also applies to the whole array.
In [343]: def cube3(x):
     ...:     return numpy.sum(x**3)
     ...:
In [344]: cube2(np.arange(10))
Out[344]: 2025
In [345]: cube3(np.arange(10))
Out[345]: 2025
Doing some timings:
The list reference:
In [346]: timeit cube(list(range(1000)))
438 µs ± 9.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The slow nditer:
In [348]: timeit cube1(np.arange(1000))
2.8 ms ± 5.65 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The partial numpy:
In [349]: timeit cube2(np.arange(1000))
520 µs ± 20 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I can improve its time by passing a list instead of an array. Iteration on lists is faster.
In [352]: timeit cube2(list(range(1000)))
229 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
But the time for a 'pure' numpy version blows all of those out of the water:
In [350]: timeit cube3(np.arange(1000))
23.6 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The general rule is that numpy methods applied to a numpy array are fastest. But if you must loop, it's usually better to use lists.
Sometimes the pure numpy approach creates a very large temporary array. Then memory management complexities can reduce performance. In such cases a modest number of iterations on a complex task may be best.
I want to apply outer addition of multiple vectors/matrices. Let's say four times:
import numpy as np
x = np.arange(100)
B = np.add.outer(x,x)
B = np.add.outer(B,x)
B = np.add.outer(B,x)
I would like it best if the number of additions could be a variable, like a=4 --> 4 times the addition. Is this possible?
Approach #1
Here's one with array-initialization -
n = 4 # number of iterations to add outer versions
l = len(x)
out = np.zeros([l]*n,dtype=x.dtype)
for i in range(n):
    out += x.reshape(np.insert([1]*(n-1),i,l))
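As a quick check (a sketch, on a small x) that this matches the chained np.add.outer construction from the question:
x = np.arange(5)
B = np.add.outer(np.add.outer(np.add.outer(x, x), x), x)
n = 4; l = len(x)
out = np.zeros([l]*n, dtype=x.dtype)
for i in range(n):
    out += x.reshape(np.insert([1]*(n-1), i, l))
print(np.array_equal(B, out))  # True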
Why this approach and not iterative addition to create new arrays at each iteration?
Iteratively creating new arrays at each iteration would require more memory and hence incur memory overhead. With array-initialization, we are adding elements of x into an already initialized array. Hence, it tries to be memory-efficient.
Alternative #1
We can remove one iteration by initializing with x. Hence, the changes would be -
out = np.broadcast_to(x,[l]*n).copy()
for i in range(n-1):
    out += x.reshape(np.insert([1]*(n-1),i,l))
Approach # 2: With np.add.reduce -
Another way would be with np.add.reduce, which again doesn't create any intermediate arrays, but being a reduction method might be better here as that's what it's implemented for -
l = len(x); n = 4
np.add.reduce([x.reshape(np.insert([1]*(n-1),i,l)) for i in range(n)])
Timings -
In [17]: x = np.arange(100)
In [18]: %%timeit
    ...: n = 4 # number of iterations to add outer versions
    ...: l = len(x)
    ...: out = np.zeros([l]*n,dtype=x.dtype)
    ...: for i in range(n):
    ...:     out += x.reshape(np.insert([1]*(n-1),i,l))
829 ms ± 28.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [19]: l = len(x); n = 4
In [20]: %timeit np.add.reduce([x.reshape(np.insert([1]*(n-1),i,l)) for i in range(n)])
183 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
I don't think there's a builtin argument to repeat this procedure several times, but you can define a custom function for it fairly easily
def recursive_outer_add(arr, num):
    # num is the number of copies of arr combined in the outer sum
    if num == 1:
        return arr
    x = np.add.outer(arr, arr)
    for i in range(num - 2):
        x = np.add.outer(x, arr)
    return x
Just as a warning: the array gets really big really fast
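To make that warning concrete (a sketch, using the function above): the result has len(arr)**num elements, so even moderate sizes add up quickly.
x = np.arange(20, dtype=np.int64)
out = recursive_outer_add(x, 4)
print(out.shape)   # (20, 20, 20, 20)
print(out.nbytes)  # 1280000 bytes; with len(x) == 100 this would already be ~800 MB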
Short and reasonably fast:
n = 4
l = 10
x = np.arange(l)
sum(np.ix_(*n*(x,)))
timeit(lambda:sum(np.ix_(*n*(x,))),number=1000)
# 0.049082988989539444
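In case it is not obvious why this works: np.ix_ returns n arrays with shapes (l,1,...,1), (1,l,1,...,1), ..., (1,...,1,l), so the plain Python sum broadcasts them into the full n-dimensional outer sum. A quick sketch:
x3 = np.arange(3)
grids = np.ix_(*3*(x3,))
[g.shape for g in grids]  # [(3, 1, 1), (1, 3, 1), (1, 1, 3)]
np.array_equal(sum(grids), np.add.outer(np.add.outer(x3, x3), x3))  # True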
We can speed this up a little by going back to front:
timeit(lambda:sum(reversed(np.ix_(*n*(x,)))),number=1000)
# 0.03847671199764591
We can also build our own reversed np.ix_:
from operator import getitem
from itertools import accumulate,chain,repeat
sum(accumulate(chain((x,),repeat((slice(None),None),n-1)),getitem))
timeit(lambda:sum(accumulate(chain((x,),repeat((slice(None),None),n-1)),getitem)),number=1000)
# 0.02427654700295534
I have a sparse matrix in csr format, e.g.:
>>> a = sp.random(3, 3, 0.6, format='csr') # an example
>>> a.toarray() # just to see how it looks like
array([[0.31975333, 0.88437035, 0. ],
[0. , 0. , 0. ],
[0.14013856, 0.56245834, 0.62107962]])
>>> a.data # data array
array([0.31975333, 0.88437035, 0.14013856, 0.56245834, 0.62107962])
For this particular example, I want to get [0, 4] which are the data-array indices of the non-zero diagonal elements 0.31975333 and 0.62107962.
A simple way to do this is the following:
ind = []
seen = set()
for i, val in enumerate(a.data):
    if val in a.diagonal() and val not in seen:
        ind.append(i)
        seen.add(val)
But in practice the matrix is very big, so I don't want to use for loops or convert to a numpy array using the toarray() method. Is there a more efficient way to do it?
Edit: I just realized that the above code gives an incorrect result when there are off-diagonal elements equal to and preceding some of the diagonal elements: it returns the indices of those off-diagonal elements. Also, it doesn't return the indices of repeated diagonal elements. For example:
a = np.array([[0.31975333, 0.88437035, 0. ],
[0.62107962, 0.31975333, 0. ],
[0.14013856, 0.56245834, 0.62107962]])
a = sp.csr_matrix(a)
>>> a.data
array([0.31975333, 0.88437035, 0.62107962, 0.31975333, 0.14013856,
0.56245834, 0.62107962])
My code returns ind = [0, 2], but it should be [0, 3, 6].
The code provided by Andras Deak (his get_rowwise function), returns the correct result.
I've found a possibly more efficient solution, though it still loops. However, it loops over the rows of the matrix rather than on the elements themselves. Depending on the sparsity pattern of your matrix this might or might not be faster. This is guaranteed to cost N iterations for a sparse matrix with N rows.
We just loop through each row, fetch the filled column indices via a.indices and a.indptr, and if the diagonal element for the given row is present in the filled values then we compute its index:
import numpy as np
import scipy.sparse as sp
def orig_loopy(a):
    ind = []
    seen = set()
    for i, val in enumerate(a.data):
        if val in a.diagonal() and val not in seen:
            ind.append(i)
            seen.add(val)
    return ind
def get_rowwise(a):
    datainds = []
    indices = a.indices # column indices of filled values
    indptr = a.indptr   # auxiliary "pointer" to data indices
    for irow in range(a.shape[0]):
        rowinds = indices[indptr[irow]:indptr[irow+1]] # column indices of the row
        if irow in rowinds:
            # then we've got a diagonal in this row
            # so let's find its index
            datainds.append(indptr[irow] + np.flatnonzero(irow == rowinds)[0])
    return datainds
a = sp.random(300, 300, 0.6, format='csr')
orig_loopy(a) == get_rowwise(a) # True
For a (300,300)-shaped random input with the same density the original version runs in 3.7 seconds, the new version runs in 5.5 milliseconds.
Method 1
This is a vectorized approach, which generates all nonzero indices first and then gets the positions where the row and column index are the same. This is a bit slow and has high memory usage.
import numpy as np
import scipy.sparse as sp
import numba as nb
def get_diag_ind_vec(csr_array):
    inds=csr_array.nonzero()
    return np.array(np.where(inds[0]==inds[1])[0])
Method 2
Loopy approaches are in general no problem regarding performance, as long as you make use of a compiler, e.g. Numba or Cython. I allocated memory for the maximum number of diagonal elements that could occur. If this method uses too much memory it can be easily modified.
@nb.jit()
def get_diag_ind(csr_array):
    ind=np.empty(csr_array.shape[0],dtype=np.uint64)
    rowPtr=csr_array.indptr
    colInd=csr_array.indices
    ii=0
    for i in range(rowPtr.shape[0]-1):
        for j in range(rowPtr[i],rowPtr[i+1]):
            if (i==colInd[j]):
                ind[ii]=j
                ii+=1
    return ind[:ii]
Timings
csr_array = sp.random(1000, 1000, 0.5, format='csr')
get_diag_ind_vec(csr_array) -> 8.25ms
get_diag_ind(csr_array) -> 0.65ms (first call excluded)
Here's my solution which seems to be faster than get_rowwise (Andras Deak) and get_diag_ind_vec (max9111) (I do not consider the use of Numba or Cython).
The idea is to set the non-zero diagonal elements of the matrix (or its copy) to some unique value x that is not in the original matrix (I chose the max value + 1), and then simply use np.where(a.data == x) to return the desired indices.
def diag_ind(a):
    a = a.copy()
    i = a.diagonal() != 0
    x = np.max(a.data) + 1
    a[i, i] = x
    return np.where(a.data == x)
Timing:
A = sp.random(1000, 1000, 0.5, format='csr')
>>> %timeit diag_ind(A)
6.32 ms ± 335 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit get_diag_ind_vec(A)
14.6 ms ± 292 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit get_rowwise(A)
24.3 ms ± 5.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Edit: copying the sparse matrix (in order to preserve the original matrix) is not memory efficient, so a better solution would be to store the diagonal elements and later use them for restoring the original matrix.
def diag_ind2(a):
    a_diag = a.diagonal()
    i = a_diag != 0
    x = np.max(a.data) + 1
    a[i, i] = x
    ind = np.where(a.data == x)
    a[i, i] = a_diag[np.nonzero(a_diag)]
    return ind
This is even faster:
>>> %timeit diag_ind2(A)
2.83 ms ± 419 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
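As a quick sanity check (a sketch) on the small example from the question's edit, both variants return the expected data indices [0, 3, 6]:
dense = np.array([[0.31975333, 0.88437035, 0.        ],
                  [0.62107962, 0.31975333, 0.        ],
                  [0.14013856, 0.56245834, 0.62107962]])
A = sp.csr_matrix(dense)
print(diag_ind(A)[0])   # [0 3 6]
print(diag_ind2(A)[0])  # [0 3 6]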