Assignment to discontinuous slices in Python

In Matlab I can do this:
s1 = 'abcdef'
s2 = 'uvwxyz'
s1(1:2:end) = s2(1:2:end)
s1 is now 'ubwdyf'
This is just an example of the general:
A(I) = B
Where A and B are vectors, I is a vector of indices, and B is the same length as I. (I'm ignoring matrices for the moment.)
What would be the Pythonic equivalent of the general case? Preferably it should also run on Jython/IronPython (no numpy).
Edit: I used strings as a simple example but solutions with lists (as already posted, wow) are what I was looking for. Thanks.

>>> s1 = list('abcdef')
>>> s2 = list('uvwxyz')
>>> s1[0::2] = s2[0::2]
>>> s1
['u', 'b', 'w', 'd', 'y', 'f']
>>> ''.join(s1)
'ubwdyf'
The main differences are:
Strings are immutable in Python. You can use lists of characters instead though.
Indexing is 0-based in Python.
The slicing syntax is [start : stop : step] where all parameters are optional.
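To make the slicing rule concrete, here is a small sketch (the variable name `letters` is just for illustration). It also shows one thing worth knowing: when the step is not 1, the right-hand side has to have exactly as many items as the slice selects.

```python
# Extended-slice assignment on a plain list (no NumPy needed).
letters = list("abcdef")

# Every second element starting at index 0: positions 0, 2, 4.
letters[0::2] = ["u", "w", "y"]
print("".join(letters))  # ubwdyf

# With a step other than 1, the right-hand side must match the slice length.
try:
    letters[0::2] = ["x", "y"]  # the slice has 3 slots, only 2 values given
except ValueError as exc:
    print("rejected:", exc)
```

The failed assignment leaves the list untouched, so `letters` still spells 'ubwdyf' afterwards.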

Strings are immutable in Python, so I will use lists in my examples.
You can assign to slices like this:
a = list(range(5))   # list() is needed on Python 3, where range() is not a list
b = list(range(5, 7))
a[1::2] = b
print(a)
which will print
[0, 5, 2, 6, 4]
This will only work for slices with a constant increment. For the more general A[I] = B, you need to use a for loop:
for i, b in zip(I, B):  # zip works on Python 2 and 3; itertools.izip is Python 2 only
    A[i] = b
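If you want that loop wrapped up for reuse, something like the following might do. This is only a sketch: `assign_at` is a made-up helper name, not a standard function, and the length check is my own addition to mimic Matlab's requirement that B be as long as I.

```python
def assign_at(target, indices, values):
    """Pure-Python stand-in for Matlab's A(I) = B on a list (0-based indices)."""
    indices = list(indices)
    values = list(values)
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    for i, v in zip(indices, values):
        target[i] = v  # modifies the list in place
    return target


A = list("abcdef")
assign_at(A, [5, 0, 3], "XYZ")
print(A)  # ['Y', 'b', 'c', 'Z', 'e', 'X']
```

It uses nothing beyond built-ins, so it should also be fine on Jython or IronPython.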

NumPy arrays can be indexed with an arbitrary list, much as in Matlab:
>>> x = numpy.array(range(10)) * 2 + 5
>>> x
array([ 5, 7, 9, 11, 13, 15, 17, 19, 21, 23])
>>> x[[1,6,4]]
array([ 7, 17, 13])
and assignment:
>>> x[[1,6,4]] = [0, 0, 0]
>>> x
array([ 5, 0, 9, 11, 0, 15, 0, 19, 21, 23])
Unfortunately, I don't think it is possible to get this without numpy, so you'd just need to loop for those.

Related

Minimum distance for each value in array respect to other

I have two numpy arrays of integers, A and B. The values in A and B correspond to the time points at which events A and B occurred. I would like to transform A so that it contains the time since the most recent event B occurred.
I know I need to subtract from each element of A the nearest smaller element of B, but I am unsure how to do so. Any help would be greatly appreciated.
>>> import numpy as np
>>> A = np.array([11, 12, 13, 17, 20, 22, 33, 34])
>>> B = np.array([5, 10, 15, 20, 25, 30])
Desired Result:
>>> cond_a = relative_timestamp(to_transform=A, reference=B)
>>> cond_a
array([1, 2, 3, 2, 0, 2, 3, 4])
You can use np.searchsorted to find the indices where the elements of A should be inserted in B to maintain order. With side='right', the element just before each insertion point is the largest element of B that is less than or equal to the corresponding element of A:
idx = np.searchsorted(B, A, side='right')
result = A-B[idx-1] # subtract one to get the index of the preceding element
According to the docs searchsorted uses binary search, so it will scale fine for large inputs.
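One caveat with the snippet above: if an element of A is smaller than everything in B, idx is 0 and idx-1 becomes -1, which silently wraps around to the last element of B. Here is a sketch of the same binary-search idea using the standard library's bisect module, which makes that case explicit. The function name is taken from the question; returning None for values with no earlier event is my own assumption about what you would want.

```python
from bisect import bisect_right


def relative_timestamp(to_transform, reference):
    """Time since the most recent reference event at or before each value.

    `reference` must be sorted ascending. Values earlier than the first
    reference event have no predecessor and are reported as None.
    """
    result = []
    for t in to_transform:
        idx = bisect_right(reference, t)  # number of reference events <= t
        result.append(t - reference[idx - 1] if idx > 0 else None)
    return result


A = [11, 12, 13, 17, 20, 22, 33, 34]
B = [5, 10, 15, 20, 25, 30]
print(relative_timestamp(A, B))    # [1, 2, 3, 2, 0, 2, 3, 4]
print(relative_timestamp([3], B))  # [None]
```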
Here's an approach based on computing the pairwise differences. Note that it has O(n**2) complexity, so for larger arrays @brenlla's answer will perform much better.
The idea here is to use np.subtract.outer and then find the minimum difference along axis 1 over a masked array, where only the values in B that are no larger than the given element of A are considered:
dif = np.abs(np.subtract.outer(A,B))
np.ma.array(dif, mask = A[:,None] < B).min(1).data
# array([1, 2, 3, 2, 0, 2, 3, 4])
I am not sure it is really faster to calculate all pairwise differences than to loop over each array entry in Python (worst case O(len(A) + len(B))), so here is the solution with a loop. It relies on both arrays being sorted:
A = np.array([11, 12, 13, 17, 20, 22, 33, 34])
B = np.array([5, 10, 15, 20, 25, 30])
def calculate_next_distance(to_transform, reference):
    max_reference = len(reference) - 1
    current_reference = 0
    transformed_values = np.zeros_like(to_transform)
    for i, value in enumerate(to_transform):
        while current_reference < max_reference and reference[current_reference+1] <= value:
            current_reference += 1
        transformed_values[i] = value - reference[current_reference]
    return transformed_values
calculate_next_distance(A,B)
# array([1, 2, 3, 2, 0, 2, 3, 4])

Python: turn single array of sorted, repeat values into an array of arrays?

I have a sorted array with some repeated values. How can this array be turned into an array of arrays with the subarrays grouped by value (see below)? In actuality, my_first_array has ~8 million entries, so the solution would preferably be as time efficient as possible.
my_first_array = [1,1,1,3,5,5,9,9,9,9,9,10,23,23]
wanted_array = [ [1,1,1], [3], [5,5], [9,9,9,9,9], [10], [23,23] ]
itertools.groupby makes this trivial:
import itertools
wanted_array = [list(grp) for _, grp in itertools.groupby(my_first_array)]
With no key function, it just yields groups consisting of runs of identical values, so you list-ify each one in a list comprehension; easy-peasy. You can think of it as basically a within-Python API for doing the work of the GNU toolkit program, uniq, and related operations.
In CPython (the reference interpreter), groupby is implemented in C, and it operates lazily and linearly; the data must already appear in runs matching the key function, so sorting might make it too expensive, but for already sorted data like you have, there is nothing that will be more efficient.
Note: If the inputs might be value identical, but different objects, it may make sense for memory reasons to change list(grp) for _, grp to [k] * len(list(grp)) for k, grp. The former would retain the original (possibly value but not identity duplicate) objects in the final result, the latter would replicate the first object from each group instead, reducing the final cost per group to the cost of N references to a single object, instead of N references to between 1 and N objects.
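To illustrate both points (runs rather than global groups, and the memory note), here is a small sketch. The sample lists are made up for the demonstration; mixing 1.0, 1 and True is just a convenient way to get values that compare equal but are different objects.

```python
from itertools import groupby

# groupby only merges *adjacent* equal values, like the Unix `uniq` tool.
unsorted_values = [1, 1, 3, 1, 1]
runs = [list(g) for _, g in groupby(unsorted_values)]
print(runs)  # [[1, 1], [3], [1, 1]]

# Equal-but-distinct objects: 1.0 == 1 == True, so they form one group.
sorted_values = [1.0, 1, True, 3]
keep_originals = [list(g) for _, g in groupby(sorted_values)]
share_first = [[k] * len(list(g)) for k, g in groupby(sorted_values)]

print([type(x).__name__ for x in keep_originals[0]])  # ['float', 'int', 'bool']
print([type(x).__name__ for x in share_first[0]])     # ['float', 'float', 'float']
```

The second variant repeats the first object of each run, which is where the memory saving comes from.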
I am assuming that the input is a NumPy array and that you are looking for a list of arrays as output. You can split the input array with np.split at the indices where the value changes, i.e. at the boundaries between groups of repeats. There are two ways to find those indices: np.unique with its optional argument return_index set to True, or a combination of np.where and np.diff. That gives the two approaches listed next.
With np.unique -
import numpy as np
_,idx = np.unique(my_first_array, return_index=True)
out = np.split(my_first_array, idx)[1:]
With np.where and np.diff -
idx = np.where(np.diff(my_first_array)!=0)[0] + 1
out = np.split(my_first_array, idx)
Sample run -
In [28]: my_first_array
Out[28]: array([ 1, 1, 1, 3, 5, 5, 9, 9, 9, 9, 9, 10, 23, 23])
In [29]: _,idx = np.unique(my_first_array, return_index=True)
...: out = np.split(my_first_array, idx)[1:]
...:
In [30]: out
Out[30]:
[array([1, 1, 1]),
array([3]),
array([5, 5]),
array([9, 9, 9, 9, 9]),
array([10]),
array([23, 23])]
In [31]: idx = np.where(np.diff(my_first_array)!=0)[0] + 1
...: out = np.split(my_first_array, idx)
...:
In [32]: out
Out[32]:
[array([1, 1, 1]),
array([3]),
array([5, 5]),
array([9, 9, 9, 9, 9]),
array([10]),
array([23, 23])]
Here is a solution, although it might not be very efficient:
my_first_array = [1,1,1,3,5,5,9,9,9,9,9,10,23,23]
wanted_array = [ [1,1,1], [3], [5,5], [9,9,9,9,9], [10], [23,23] ]
new_array = [ [my_first_array[0]] ]
count = 0
for i in range(1,len(my_first_array)):
    a = my_first_array[i]
    if a == my_first_array[i - 1]:
        new_array[count].append(a)
    else:
        count += 1
        new_array.append([])
        new_array[count].append(a)
new_array == wanted_array
This is O(n):
a = [1,1,1,3,5,5,9,9,9,9,9,10,23,23,24]
res = []
s = 0
e = 0
length = len(a)
while s < length:
    b = []
    # collect the run of values equal to a[s]
    while e < length and a[s] == a[e]:
        b.append(a[s])
        e += 1
    res.append(b)
    s = e  # the next run starts where this one ended
print(res)

Numpy - Stacked memory view of two 1D arrays

I know that I can do the following:
import numpy as np
c = np.random.randn(20, 2)
a = c[:, 0]
b = c[:, 1]
Here, a and b are views onto c's first and second columns, respectively. Modifying a or b will change c (and vice versa).
However, what I want to achieve is exactly the opposite. I want to create a 2D memory view where each column (or row) points to the memory of a different 1D array. Assuming I already have two 1D arrays, is it possible to create a 2D view of these arrays where each row/column points to one of them?
I can create c from a and b in the following way:
c = np.c_[a, b]
However, this copies a's and b's memory into c. Can I somehow create c as a 'view' of [a b], so that modifying an element of c is reflected in the corresponding 1D array, a or b?
I don't think it is possible.
In your first example, the values of the a and b views are interwoven, as can be seen from this variation:
In [51]: c=np.arange(10).reshape(5,2)
In [52]: a, b = c[:,0], c[:,1]
In [53]: a
Out[53]: array([0, 2, 4, 6, 8])
In [54]: c.flatten()
Out[54]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
The data buffers of c and a start at the same memory address; b starts 4 bytes into that buffer.
In [55]: c.__array_interface__
Out[55]:
{'strides': None,
'data': (172552624, False),...}
In [56]: a.__array_interface__
Out[56]:
{'strides': (8,),
'data': (172552624, False),...}
In [57]: b.__array_interface__
Out[57]:
{'strides': (8,),
'data': (172552628, False),...}
Even if the a,b split were by rows, b would start just further along in the same shared data buffer.
From the .flags we see that c is C-contiguous, b is not. But b values are accessed with constant strides in that shared data buffer.
When a and b are created separately, their data buffers are entirely separate. The numpy striding mechanism cannot step back and forth between these two data buffers. A 2d composite of a and b has to work with its own data buffer.
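A quick way to check this for yourself is np.shares_memory. The sketch below is only an illustration (the names `x`, `y` and `stacked` are made up): column views of an existing 2D array share its buffer, while a 2D array built from two separately allocated 1D arrays is a copy.

```python
import numpy as np

c = np.arange(10).reshape(5, 2)
a, b = c[:, 0], c[:, 1]              # views: both live inside c's buffer
print(np.shares_memory(a, c))        # True
print(a.strides, c.strides)          # a skips one full row of c per element

x = np.arange(5)
y = np.arange(5, 10)                 # separately allocated buffers
stacked = np.column_stack((x, y))
print(np.shares_memory(stacked, x))  # False: column_stack copied the data

stacked[0, 0] = 99
print(x[0])                          # 0, the original array is unaffected
```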
I can imagine writing a class that ends up looking like what you want. The indexing_tricks file that defines np.c_ might give you ideas (e.g. a class with a custom __getitem__ method). But it wouldn't have the speed advantages of a regular 2d array. And it might be hard to implement all of the ndarray functionality.
While @hpaulj's answer is the correct one, for your particular case, and more as an exercise in understanding numpy memory layout than as anything with practical applications, here's how you can get a view of two 1-D arrays as columns of a common array:
>>> from numpy.lib.stride_tricks import as_strided
>>> a = np.arange(10)
>>> b = np.arange(20, 30)
>>> col_stride = (b.__array_interface__['data'][0] -
...               a.__array_interface__['data'][0])
>>> c = as_strided(a, shape=(10, 2), strides=(a.strides[0], col_stride))
>>> c
array([[ 0, 20],
       [ 1, 21],
       [ 2, 22],
       [ 3, 23],
       [ 4, 24],
       [ 5, 25],
       [ 6, 26],
       [ 7, 27],
       [ 8, 28],
       [ 9, 29]])
>>> c[4, 1] = 0
>>> c[6, 0] = 0
>>> a
array([0, 1, 2, 3, 4, 5, 0, 7, 8, 9])
>>> b
array([20, 21, 22, 23, 0, 25, 26, 27, 28, 29])
There are many things that can go wrong here, mainly that the array b has not had its reference count increased, so if you delete it, its memory will be released while the view is still accessing it. It also cannot be extended to more than two 1-D arrays, and it requires that both 1-D arrays have the same stride.
Of course, just because you can do it doesn't mean you should do it! And you should definitely not do this.

Find common numbers in Python

I have two lists:
a = [1,9] # signifies the start point and end point, ie numbers 1,2,3,4,5,6,7,8,9
b = [4,23] # same for this.
Now I need to find whether the numbers from a intersect with the numbers from b.
I can do it by making a list of the numbers from a and b and then intersecting the two lists, but I'm looking for a more Pythonic solution.
Is there a better solution?
My output should be 4,5,6,7,8,9
This intersects two sets built from the ranges (sorted, because sets do not guarantee any order):
>>> c = sorted(set(range(a[0], a[1]+1)) & set(range(b[0], b[1]+1)))
>>> print(c)
[4, 5, 6, 7, 8, 9]
This is using min and max:
>>> c = list(range(max(a[0], b[0]), min(a[1], b[1]) + 1))
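As a sketch of why the min/max version is attractive: on Python 3 the result can stay a lazy range object, so nothing is materialised until you ask for it, and disjoint inputs simply give an empty range. The function name `common_numbers` is made up for this example.

```python
def common_numbers(a, b):
    """Numbers shared by two inclusive [start, end] pairs, as a lazy range."""
    return range(max(a[0], b[0]), min(a[1], b[1]) + 1)


print(list(common_numbers([1, 9], [4, 23])))  # [4, 5, 6, 7, 8, 9]
print(list(common_numbers([1, 3], [5, 8])))   # []
print(len(common_numbers([1, 9], [4, 23])))   # 6
```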
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
The most efficient way is using sets:
result = set(a).intersection(b)
Of course, you can also use a generator expression (a Pythonic way of applying your logic):
result = (x for x in a if x in b)
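You need to get [] or None or something similar if the ranges do not intersect. Something like this would be most efficient:
def intersect(l1, l2):
    bg = max(l1[0], l2[0])
    end = min(l1[1], l2[1])  # the overlap ends at the smaller of the two end points
    return [bg, end] if bg <= end else []  # <= because the end points are inclusive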

Python sort array by another positions array

Assume I have two arrays, the first one containing int data, the second one containing positions
a = [11, 22, 44, 55]
b = [0, 1, 10, 11]
i.e. I want a[i] to be moved to position b[i] for all i. If I haven't specified a position, then insert a -1,
i.e.
sorted_a = [11, 22,-1,-1,-1,-1,-1,-1,-1,-1, 44, 55]
            ^   ^                           ^   ^
            0   1                           10  11
Another example:
a = [int1, int2, int3]
b = [5, 3, 1]
sorted_a = [-1, int3, -1, int2, -1, int1]
Here's what I've tried:
def sort_array_by_second(a, b):
    sorted = []
    for e1 in a:
        sorted.appendAt(b[e1])
    return sorted
Which I've obviously messed up.
Something like this:
res = [-1]*(max(b)+1) # create a list of required size with only -1's
for i, v in zip(b, a):
    res[i] = v
The idea behind the algorithm:
Create the resulting list with a size capable of holding up to the largest index in b
Populate this list with -1
Iterate through b elements
Set elements in res[b[i]] with its proper value a[i]
This will leave the resulting list with -1 in every position other than the indexes contained in b, which will have their corresponding value of a.
I would use a custom key function as an argument to sort. This will sort the values according to the corresponding value in the other list:
to_be_sorted = ['int1', 'int2', 'int3', 'int4', 'int5']
sort_keys = [4, 5, 1, 2, 3]
sort_key_dict = dict(zip(to_be_sorted, sort_keys))
to_be_sorted.sort(key = lambda x: sort_key_dict[x])
This has the benefit of not counting on the values in sort_keys to be valid integer indexes, which is not a very stable thing to bank on.
>>> a = ["int1", "int2", "int3", "int4", "int5"]
>>> b = [4, 5, 1, 2, 3]
>>> sorted(a, key=lambda x, it=iter(sorted(b)): b.index(next(it)))
['int4', 'int5', 'int1', 'int2', 'int3']
Paulo Bu's answer is the most Pythonic way. If you want to stick with a function like yours:
def sort_array_by_second(a, b):
    result = []  # avoid shadowing the built-in sorted()
    for n in b:
        result.append(a[n-1])  # positions in b are treated as 1-based
    return result
will do the trick.
Sorts A by the values of B:
A = ['int1', 'int2', 'int3', 'int4', 'int5']
B = [4, 5, 1, 2, 3]
from operator import itemgetter
C = [a for a, b in sorted(zip(A, B), key = itemgetter(1))]
print(C)
Output
['int3', 'int4', 'int5', 'int1', 'int2']
a = [11, 22, 44, 55] # values
b = [0, 1, 10, 11] # indexes to sort by
sorted_a = [-1] * (max(b) + 1)
for index, value in zip(b, a):
    sorted_a[index] = value
print(sorted_a)
# -> [11, 22, -1, -1, -1, -1, -1, -1, -1, -1, 44, 55]
