Python: finding distances between list fields

Hi, I need to calculate the distances between every pair of neighbouring numbers in a list, including the distance between the last and the first (it's a circle).
Naively I can do something like this:
l = [10,-12,350]
ret = []
for i in range(len(l)-1):
    ret.append(abs(l[i] - l[i+1]))
ret.append(abs(l[-1] - l[0]))
print ret
out: [22, 362, 340]
I tried "enumerate", which is a slightly better way:
print [abs(v - (l+[l[0]])[i+1]) for i, v in enumerate(l)]
out: [22, 362, 340]
Is there more elegant and "pythonic" way?

I'd argue this is a small improvement. There could well be a cleaner way than this though:
print [abs(v - l[(i+1)%len(l)]) for i, v in enumerate(l)]

Another method:
print map(lambda x,y: abs(x-y), l[1:] + l[:1], l)
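Along the same lines, collections.deque can express the wrap-around directly via rotate; a small standard-library sketch (no numpy needed):
from collections import deque

l = [10, -12, 350]
d = deque(l)
d.rotate(-1)  # rotate left by one: deque([-12, 350, 10])
print([abs(a - b) for a, b in zip(l, d)])  # [22, 362, 340]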

Not a huge improvement:
>>> [abs(a - b) for a, b in zip(l, l[1:] + l[:-1])]
[22, 362, 340]

It's probably not as good as other answers in this case, but if used as part of a larger codebase, it could be useful to define an iterator which returns pairs of items over a list, e.g.:
def pairs(l):
    if len(l) < 2:
        return
    for i in range(len(l)-1):
        yield l[i], l[i+1]
    yield l[-1], l[0]
print [abs(a - b) for a,b in pairs([10,-12,350])]
It's not a one-liner, but is fairly readable.
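If you prefer to keep it lazy, the same circular pairing can be built from itertools; a sketch (equivalent to pairs above for lists of two or more elements):
from itertools import cycle, islice

def circular_pairs(l):
    # pair each element with its successor, wrapping the last around to the first
    return zip(l, islice(cycle(l), 1, None))

print([abs(a - b) for a, b in circular_pairs([10, -12, 350])])  # [22, 362, 340]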

If you're happy to use numpy...
list(numpy.abs(numpy.ediff1d(l, to_end=l[0]-l[-1])))
This scales well with longer l. Not converting to or from a list will speed things up quite a bit (very often a numpy array can be used in place of a list anyway).
Or you can construct it yourself using numpy.roll:
list(numpy.abs(l - numpy.roll(l, -1)))
A few timings:
In [37]: l = list(numpy.random.randn(1000))
In [38]: timeit [abs(v - l[(i+1)%len(l)]) for i, v in enumerate(l)]
1000 loops, best of 3: 936 us per loop
In [39]: timeit list(numpy.abs(numpy.ediff1d(l, to_end=l[0]-l[-1])))
1000 loops, best of 3: 367 us per loop
In [40]: _l = numpy.array(l)
In [41]: timeit numpy.abs(numpy.ediff1d(_l, to_end=l[0]-l[-1]))
10000 loops, best of 3: 48.9 us per loop
In [42]: timeit _l = numpy.array(l); list(numpy.abs(_l - numpy.roll(_l, -1)))
1000 loops, best of 3: 350 us per loop
In [43]: timeit numpy.abs(_l - numpy.roll(_l, -1))
10000 loops, best of 3: 32.2 us per loop
If raw speed is your thing, quicker still, but not so neat, you can use sliced arrays directly:
In [78]: timeit a = numpy.empty(_l.shape, _l.dtype); a[:-1] = _l[:-1] - _l[1:]; a[-1] = _l[-1] - _l[0]; a = numpy.abs(a)
10000 loops, best of 3: 20.5 us per loop

Combining the answer from icecrime with this answer provides another pythonic possibility:
print [numpy.linalg.norm(a-b) for a, b in zip(l, l[1:] + l[:-1])]


Numpy.dot nests vector when multiplying [duplicate]

I am using numpy. I have a matrix with 1 column and N rows and I want to get an array from it with N elements.
For example, if I have M = matrix([[1], [2], [3], [4]]), I want to get A = array([1,2,3,4]).
To achieve it, I use A = np.array(M.T)[0]. Does anyone know a more elegant way to get the same result?
Thanks!
If you'd like something a bit more readable, you can do this:
A = np.squeeze(np.asarray(M))
Equivalently, you could also do: A = np.asarray(M).reshape(-1), but that's a bit less easy to read.
result = M.A1
See https://numpy.org/doc/stable/reference/generated/numpy.matrix.A1.html:
matrix.A1 -- the 1-d base array
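For example (assuming M is the numpy.matrix from the question):
import numpy as np

M = np.matrix([[1], [2], [3], [4]])
A = M.A1           # the matrix data as a flat 1-D ndarray
print(A)           # [1 2 3 4]
print(A.shape)     # (4,)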
A, = np.array(M.T)
Depends what you mean by elegance, I suppose, but that's what I would do.
You can try the following variant:
result = np.array(M).flatten()
If you care about speed:
np.array(M).ravel()
But if you care about memory:
np.asarray(M).ravel()
Or you could try to avoid some temporaries with
A = M.view(np.ndarray)
A.shape = -1
First, Mv = numpy.asarray(M.T), which gives you a 1x4, but still 2-D, array.
Then, perform A = Mv[0,:], which gives you what you want. You could put them together, as numpy.asarray(M.T)[0,:].
This will convert the matrix into an array:
A = np.ravel(M).T
The ravel() and flatten() functions from numpy are two techniques I would try here. I would like to add to the posts made by Joe, Siraj, bubble and Kevad.
Ravel:
M = np.array([[1], [2], [3], [4]])
A = M.ravel()
print A, A.shape
>>> [1 2 3 4] (4,)
Flatten:
A = M.flatten()
print A, A.shape
>>> [1 2 3 4] (4,)
numpy.ravel() is faster, since it is a library-level function that does not copy the array when it can avoid it (it returns a flattened view where possible). The flip side is that any change to array A will carry over to the original array M if you are using numpy.ravel().
numpy.flatten() is slower than numpy.ravel(), but changes to an A created with numpy.flatten() will not carry over to the original array M, because flatten() always returns a copy.
numpy.squeeze() and M.reshape(-1) are slower than numpy.flatten() and numpy.ravel().
%timeit M.ravel()
>>> 1000000 loops, best of 3: 309 ns per loop
%timeit M.flatten()
>>> 1000000 loops, best of 3: 650 ns per loop
%timeit M.reshape(-1)
>>> 1000000 loops, best of 3: 755 ns per loop
%timeit np.squeeze(M)
>>> 1000000 loops, best of 3: 886 ns per loop
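A quick way to check the view-versus-copy claim above; a minimal sketch (np.shares_memory needs numpy >= 1.11):
import numpy as np

M = np.array([[1], [2], [3], [4]])
r = M.ravel()    # a view when the data is contiguous: writes show up in M
f = M.flatten()  # always an independent copy
print(np.shares_memory(M, r), np.shares_memory(M, f))  # True False
r[0] = 99
print(M[0, 0])   # 99 -- the ravel view wrote through to M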
Came in a little late, hope this helps someone,
np.array(M.flat)

How do I split integers within a list into single digits only?

Let's say I have something like this:
list(range(9, 12))
Which gives me a list:
[9,10,11]
However I want it to be like:
[9,1,0,1,1]
Which splits every integer into single digits, is there anyway of achieving this without sacrificing too much performance? Or is there a way of generating list like these in the first place?
You can build the final result efficiently, without having to build one large intermediate string (or many small ones), by using itertools.chain.from_iterable (the sessions below assume from itertools import chain has been run).
In [18]: list(map(int, chain.from_iterable(map(str, range(9, 12)))))
Out[18]: [9, 1, 0, 1, 1]
In [12]: %%timeit
...: list(map(int, chain.from_iterable(map(str, range(9, 20)))))
...:
100000 loops, best of 3: 8.19 µs per loop
In [13]: %%timeit
...: [int(i) for i in ''.join(map(str, range(9, 20)))]
...:
100000 loops, best of 3: 9.15 µs per loop
In [14]: %%timeit
...: [int(x) for i in range(9, 20) for x in str(i)]
...:
100000 loops, best of 3: 9.92 µs per loop
Timings scale with input. The itertools version also uses memory efficiently although it is marginally slower than the str.join version if used with list(map(int, ...)):
In [15]: %%timeit
...: list(map(int, chain.from_iterable(map(str, range(9, 200)))))
...:
10000 loops, best of 3: 138 µs per loop
In [16]: %%timeit
...: [int(i) for i in ''.join(map(str, range(9, 200)))]
...:
10000 loops, best of 3: 159 µs per loop
In [17]: %%timeit
...: [int(x) for i in range(9, 200) for x in str(i)]
...:
10000 loops, best of 3: 182 µs per loop
In [18]: %%timeit
...: list(map(int, ''.join(map(str, range(9, 200)))))
...:
10000 loops, best of 3: 130 µs per loop
The simplest way is:
>>> [int(c) for n in range(9, 12) for c in str(n)]
[9, 1, 0, 1, 1]
Convert the integers to strings, then iterate over each string's characters and convert the digits back to ints.
li = range(9,12)
digitlist = [int(d) for number in li for d in str(number)]
Output:
[9,1,0,1,1]
I've investigated how performant this can be made. The first function I wrote was naive_single_digits, which uses the str approach in a pretty efficient list comprehension.
def naive_single_digits(l):
    return [int(c) for n in l
            for c in str(n)]
As you can see, this approach works:
In [2]: naive_single_digits(range(9, 15))
Out[2]: [9, 1, 0, 1, 1, 1, 2, 1, 3, 1, 4]
However, I thought that it would surely be unnecessary to always build a str object for each item in the list - all we actually need is a base conversion to digits. Out of laziness, I copied this function from here. I've optimised it a bit by specialising it to base 10.
def base10(n):
    if n == 0:
        return [0]
    digits = []
    while n:
        digits.append(n % 10)
        n //= 10
    return digits[::-1]
Using this, I made
def arithmetic_single_digits(l):
    return [i for n in l
            for i in base10(n)]
which also behaves correctly:
In [3]: arithmetic_single_digits(range(9, 15))
Out[3]: [9, 1, 0, 1, 1, 1, 2, 1, 3, 1, 4]
Now to time it. I've also tested against one other answer (full disclosure: I modified it a bit to work in Python2, but that shouldn't have affected the performance much)
In [11]: %timeit -n 10 naive_single_digits(range(100000))
10 loops, best of 3: 173 ms per loop
In [10]: %timeit -n 10 list(map(int, itertools.chain(*map(str, range(100000)))))
10 loops, best of 3: 154 ms per loop
In [12]: %timeit arithmetic_single_digits(range(100000))
10 loops, best of 3: 93.3 ms per loop
As you can see, arithmetic_single_digits is actually somewhat faster, although this is at the cost of more code and possibly less clarity. I've tested against ridiculously large inputs, so you can see a difference in performance - at any kind of reasonable scale, every answer here will be blazingly fast. Note that Python's integer arithmetic is probably relatively slow, as it doesn't use a primitive integer type. If this were implemented in C, I'd suspect my approach would get a bit faster.
Comparing this to viblo's answer, using (pure) Python 3 (to my shame I haven't installed ipython for python 3):
print(timeit.timeit("digits(range(1, 100000))", number=10, globals=globals()))
print(timeit.timeit("arithmetic_single_digits(range(1, 100000))", number=10, globals=globals()))
This has the output of:
3.5284318959747907
0.806847038998967
My approach is quite a bit faster, presumably because I'm purely using integer arithmetic.
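For what it's worth, the digit-peeling loop reads slightly cleaner with divmod; a sketch, behaviourally equivalent to base10 above for non-negative n:
def base10_divmod(n):
    if n == 0:
        return [0]
    digits = []
    while n:
        n, d = divmod(n, 10)  # strip the least-significant digit each pass
        digits.append(d)
    return digits[::-1]

print(base10_divmod(907))  # [9, 0, 7]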
Another way to write an arithmetic solution. Compared to Izaak van Dongen's solution, this doesn't use a while loop but calculates upfront how many iterations it needs in the list comprehension.
import itertools, math

def digits(ns):
    return list(itertools.chain.from_iterable(
        [[(abs(n) - (abs(n) // 10**x) * 10**x) // 10**(x - 1)
          for x in range(1 + math.floor(math.log10(abs(n) if n != 0 else 1)), 0, -1)]
         for n in ns]))

digits([-11, -10, -9, 0, 9, 10, 11])
Turn it into a string and then back into a list :) The join yields characters, so convert each one back to int:
lambda x: [int(c) for c in ''.join(str(e) for e in x)]
You can also do it with the map function:
a = range(9, 12)
res = []
b = [map(int, str(i)) for i in a]
for i in b:
    res.extend(i)
print(res)
Here is how I did it:
ls = range(9, 12)
lsNew = []
length = len(ls)
for i in range(length):
    item = ls[i]
    string = str(item)
    if len(string) > 1:
        split = [int(c) for c in string]  # list(string) alone would leave the digits as strings
        lsNew = lsNew + split
    else:
        lsNew.append(item)
ls = lsNew
print(ls)
def breakall(L):
    if L == []:
        return []
    elif L[0] < 10:
        return [L[0]] + breakall(L[1:])
    else:
        return breakall([L[0] // 10]) + [L[0] % 10] + breakall(L[1:])

print(breakall([9, 10, 12]))
-->
[9, 1, 0, 1, 2]

Cannot understand numpy argpartition output

I am trying to use argpartition from numpy, but it seems there is something going wrong and I cannot seem to figure it out. Here is what's happening:
These are the first 5 elements of the sorted array norms:
np.sort(norms)[:5]
array([ 53.64759445, 54.91434479, 60.11617279, 64.09630585, 64.75318909], dtype=float32)
But when I use indices_sorted = np.argpartition(norms, 5)[:5]
norms[indices_sorted]
array([ 60.11617279, 64.09630585, 53.64759445, 54.91434479, 64.75318909], dtype=float32)
But shouldn't I get the same result as the sorted array?
It works just fine when I use 3 as the parameter indices_sorted = np.argpartition(norms, 3)[:3]
norms[indices_sorted]
array([ 53.64759445, 54.91434479, 60.11617279], dtype=float32)
This isn't making much sense to me, hoping someone can offer some insight?
EDIT: Rephrasing this question as whether argpartition preserves order of the k partitioned elements makes more sense.
We need to pass a list of indices that are to be kept in sorted order, instead of feeding the kth parameter as a scalar. Thus, to maintain the sorted nature across the first 5 elements, instead of np.argpartition(a,5)[:5], simply do -
np.argpartition(a,range(5))[:5]
Here's a sample run to make things clear -
In [84]: a = np.random.rand(10)
In [85]: a
Out[85]:
array([ 0.85017222, 0.19406266, 0.7879974 , 0.40444978, 0.46057793,
0.51428578, 0.03419694, 0.47708 , 0.73924536, 0.14437159])
In [86]: a[np.argpartition(a,5)[:5]]
Out[86]: array([ 0.19406266, 0.14437159, 0.03419694, 0.40444978, 0.46057793])
In [87]: a[np.argpartition(a,range(5))[:5]]
Out[87]: array([ 0.03419694, 0.14437159, 0.19406266, 0.40444978, 0.46057793])
Please note that argpartition makes sense performance-wise only if we are looking to get sorted indices for a small subset of elements, say k elements, where k is a small fraction of the total number of elements.
Let's use a bigger dataset and try to get sorted indices for all elements to make the above-mentioned point clear -
In [51]: a = np.random.rand(10000)*100
In [52]: %timeit np.argpartition(a,range(a.size-1))[:5]
10 loops, best of 3: 105 ms per loop
In [53]: %timeit a.argsort()
1000 loops, best of 3: 893 µs per loop
Thus, to sort all elements, np.argpartition isn't the way to go.
Now, let's say I want to get sorted indices for only the first 5 elements with that big dataset and also keep the order for those -
In [68]: a = np.random.rand(10000)*100
In [69]: np.argpartition(a,range(5))[:5]
Out[69]: array([1647, 942, 2167, 1371, 2571])
In [70]: a.argsort()[:5]
Out[70]: array([1647, 942, 2167, 1371, 2571])
In [71]: %timeit np.argpartition(a,range(5))[:5]
10000 loops, best of 3: 112 µs per loop
In [72]: %timeit a.argsort()[:5]
1000 loops, best of 3: 888 µs per loop
Very useful here!
Given the task of indirectly sorting a subset (the top k, top meaning first in sort order), there are two builtin solutions: argsort and argpartition, cf. @Divakar's answer.
If, however, performance is a consideration then it may (depending on the sizes of the data and the subset of interest) be well worth resisting the "lure of the one-liner", investing one more line and applying argsort on the output of argpartition:
>>> def top_k_sort(a, k):
... return np.argsort(a)[:k]
...
>>> def top_k_argp(a, k):
... return np.argpartition(a, range(k))[:k]
...
>>> def top_k_hybrid(a, k):
... b = np.argpartition(a, k)[:k]
... return b[np.argsort(a[b])]
>>> k = 100
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_sort, 'rng': np.random.random, 'k': k})
8.348663672804832
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_argp, 'rng': np.random.random, 'k': k})
9.869098862167448
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_hybrid, 'rng': np.random.random, 'k': k})
1.2305558240041137
argsort is O(n log n), argpartition with range argument appears to be O(nk) (?), and argpartition + argsort is O(n + k log k)
Therefore in an interesting regime n >> k >> 1 the hybrid method is expected to be fastest
UPDATE: ND version:
import numpy as np
from timeit import timeit

def top_k_sort(A, k, axis=-1):
    return A.argsort(axis=axis)[(*axis % A.ndim * (slice(None),), slice(k))]

def top_k_partition(A, k, axis=-1):
    return A.argpartition(range(k), axis=axis)[(*axis % A.ndim * (slice(None),), slice(k))]

def top_k_hybrid(A, k, axis=-1):
    B = A.argpartition(k, axis=axis)[(*axis % A.ndim * (slice(None),), slice(k))]
    return np.take_along_axis(B, np.take_along_axis(A, B, axis).argsort(axis), axis)

A = np.random.random((100, 10000))
k = 100

for f in globals().copy():
    if f.startswith("top_"):
        print(f, timeit(f"{f}(A,k)", globals=globals(), number=10) * 100)
Sample run:
top_k_sort 63.72379460372031
top_k_partition 99.30561298970133
top_k_hybrid 10.714635509066284
Let's describe the partition method in a simplified way, which helps a lot in understanding argpartition. Let B = numpy.partition(A, 3), i.e. A rearranged so that the element at index 3 is the one a full sort would put there, with everything smaller or equal to its left and everything larger or equal to its right. If we execute C = numpy.argpartition(A, 3), C will be the resulting array holding the position of every element of B with respect to the A array, i.e. with
Idx(z) = index of element z in array A
then C would be
C = [ Idx(B[0]), Idx(B[1]), Idx(B[2]), Idx(B[3]), ..., Idx(B[N]) ]
As previously mentioned, this method is very helpful and comes in very handy when you have a huge array and you are only interested in a selected group of ordered elements, not the whole array.
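A small self-contained check of that invariant (the order within each half is unspecified, so the first printed array can vary between runs and numpy versions):
import numpy as np

A = np.array([9, 1, 8, 2, 7, 3])
C = np.argpartition(A, 3)
print(A[C])  # e.g. [2 1 3 7 8 9] -- A[C][3] is the 4th-smallest element
# everything left of position 3 is <= A[C][3], everything to the right is >=
print(np.all(A[C][:3] <= A[C][3]), np.all(A[C][4:] >= A[C][3]))  # True True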

Processing time difference between tuple and list

Wondering why this tuple construction:
x = tuple((t for t in range(100000)))
# 0.014001131057739258 seconds
took longer than this list comprehension:
y = [z for z in range(100000)]
# 0.005000114440917969 seconds
I had learned that tuple operations are faster than list operations since tuples are immutable.
Edit: After I changed the code to:
x = tuple(t for t in range(100000))
y = list(z for z in range(100000))
>>>
0.009999990463256836
0.0
>>>
These are the results: the tuple is still the slower one.
Tuple operations aren't necessarily faster. Being immutable at most opens the door to more optimisations, but that doesn't mean Python does them or that they apply in every case.
The difference here is very marginal, and - without profiling to confirm - it seems likely that it relates to the generator version having an extra name lookup and function call. As mentioned in the comments, rewriting the list comprehension as a call to list wrapped around a generator expression, the difference will likely shrink.
Using comparable methods of testing, the tuple is actually slightly faster:
In [12]: timeit tuple(t for t in range(100000))
100 loops, best of 3: 7.41 ms per loop
In [13]: timeit list(t for t in range(100000))
100 loops, best of 3: 7.53 ms per loop
calling list does actually create a list:
In [19]: x = list(t for t in range(10))
In [20]: x
Out[20]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
We can also see that calling list on the generator does not allocate as much space as using a list comprehension:
In [28]: x = list(t for t in range(10))
In [29]: sys.getsizeof(x)
Out[29]: 168
In [30]: x = [t for t in range(10)]
In [31]: sys.getsizeof(x)
Out[31]: 200
So both operations are very similar.
A better comparison would be creating lists and tuples as subelements:
In [41]: timeit tuple((t,) for t in range(1000000))
10 loops, best of 3: 151 ms per loop
In [42]: timeit list([t] for t in range(1000000))
1 loops, best of 3: 247 ms per loop
Now we see a much larger difference.
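If you want to reproduce numbers like these outside IPython, the standard-library timeit gives a comparable measurement; a minimal sketch (absolute times will vary by machine and Python version):
import timeit

for stmt in ('tuple(t for t in range(100000))',
             'list(t for t in range(100000))',
             '[t for t in range(100000)]'):
    print(stmt, timeit.timeit(stmt, number=100))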

Mapping an array of booleans to an array of values and returning True instances in Python

I have a list of values:
a = [1,2,3,4]
And a corresponding list of Booleans:
b = [True, True, False, True]
I want to map b onto a so that I get all values in a whose corresponding value in b is True. So the answer in this instance would be [1, 2, 4].
The only way I can think of doing is to loop through the elements of b, get the indices that are True, and then retrieve the corresponding index in a. So something like:
def maplist(l1, l2):
    # l1 is a list of Booleans to map onto l2
    l2_true = []
    for el in range(len(l1)):
        if l1[el] == True:
            l2_true.append(l2[el])
    return l2_true
Is there a better way to do this?
Here is a list comprehension that should do what you want:
[v for i, v in enumerate(a) if b[i]]
Another approach:
[x for x, y in zip(a, b) if y]
I know that the question states two lists and doesn't mention numpy. But if you're willing to consider it, and given that a and b are numpy arrays, the mapping operation becomes trivial:
a[b]
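Concretely, assuming numpy arrays from the start:
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([True, True, False, True])
print(a[b])  # [1 2 4]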
I've taken the liberty of benchmarking the suggested options, using 1000x the elements:
import numpy
a = [1,2,3,4] * 1000
b = [True, True, False, True] * 1000
def question_fn():
    l2_true = []
    for el in range(len(a)):
        if b[el] == True:
            l2_true.append(a[el])
    return l2_true

def suggestion_1():
    return [v for i, v in enumerate(a) if b[i]]

def suggestion_2():
    return [x for x, y in zip(a, b) if y]

x = numpy.array(a)
y = numpy.array(b)

def using_numpy():
    return x[y]
python -m timeit -s 'import so' 'so.question_fn()'
1000 loops, best of 3: 453 usec per loop
python -m timeit -s 'import so' 'so.suggestion_1()'
10000 loops, best of 3: 203 usec per loop
python -m timeit -s 'import so' 'so.suggestion_2()'
1000 loops, best of 3: 238 usec per loop
python -m timeit -s 'import so' 'so.using_numpy()'
10000 loops, best of 3: 23 usec per loop
Please note that the numpy timing does not include converting to arrays, otherwise it would be much slower than all of the other suggested solutions. But if using numpy arrays from the start is an option, it might be a viable solution.
Or this:
[a[i] for i in range(len(a)) if b[i]]
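For completeness, the standard library also covers this: itertools.compress performs exactly this selection. A minimal sketch:
from itertools import compress

a = [1, 2, 3, 4]
b = [True, True, False, True]
print(list(compress(a, b)))  # [1, 2, 4]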
