Python sort array by another positions array


Assume I have two arrays, the first one containing int data, the second one containing positions
a = [11, 22, 44, 55]
b = [0, 1, 10, 11]
i.e. I want a[i] to be moved to position b[i] for all i. If I haven't specified a position, then insert a -1,
i.e.
sorted_a = [11, 22,-1,-1,-1,-1,-1,-1,-1,-1, 44, 55]
            ^   ^                           ^   ^
            0   1                           10  11
Another example:
a = [int1, int2, int3]
b = [5, 3, 1]
sorted_a = [-1, int3, -1, int2, -1, int1]
Here's what I've tried:
def sort_array_by_second(a, b):
    sorted = []
    for e1 in a:
        sorted.appendAt(b[e1])
    return sorted
Which I've obviously messed up.

Something like this:
res = [-1]*(max(b)+1) # create a list of required size with only -1's
for i, v in zip(b, a):
    res[i] = v
The idea behind the algorithm:
Create the resulting list with a size capable of holding up to the largest index in b
Populate this list with -1
Iterate through b elements
Set elements in res[b[i]] with its proper value a[i]
This will leave the resulting list with -1 in every position other than the indexes contained in b, which will have their corresponding value of a.
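The steps above can be wrapped in a small function. This is only a sketch: the name place_by_positions, the fill parameter, and the input checks are assumptions added for illustration and are not part of the original answer.

```python
def place_by_positions(values, positions, fill=-1):
    """Return a list where values[i] sits at index positions[i]; gaps get `fill`."""
    if len(values) != len(positions):
        raise ValueError("values and positions must have the same length")
    if not positions:
        return []
    if min(positions) < 0:
        raise ValueError("positions must be non-negative")
    # Size the result so that the largest position is a valid index.
    result = [fill] * (max(positions) + 1)
    for pos, value in zip(positions, values):
        result[pos] = value  # a repeated position silently overwrites the earlier value
    return result


print(place_by_positions([11, 22, 44, 55], [0, 1, 10, 11]))
# [11, 22, -1, -1, -1, -1, -1, -1, -1, -1, 44, 55]
print(place_by_positions(["int1", "int2", "int3"], [5, 3, 1]))
# [-1, 'int3', -1, 'int2', -1, 'int1']
```

Rejecting negative positions is a design choice made here: Python would otherwise accept them and write from the end of the list, which is probably not what the question intends.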

I would use a custom key function as an argument to sort. This will sort the values according to the corresponding value in the other list:
to_be_sorted = ['int1', 'int2', 'int3', 'int4', 'int5']
sort_keys = [4, 5, 1, 2, 3]
sort_key_dict = dict(zip(to_be_sorted, sort_keys))
to_be_sorted.sort(key = lambda x: sort_key_dict[x])
This has the benefit of not counting on the values in sort_keys to be valid integer indexes, which is not a very stable thing to bank on.

>>> a = ["int1", "int2", "int3", "int4", "int5"]
>>> b = [4, 5, 1, 2, 3]
>>> sorted(a, key=lambda x, it=iter(sorted(b)): b.index(next(it)))
['int4', 'int5', 'int1', 'int2', 'int3']

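Paulo Bu's answer is the most Pythonic way. If you want to stick with a function like yours, this will do the trick (it treats the values in b as 1-based indexes into a):
def sort_array_by_second(a, b):
    result = []  # avoid shadowing the built-in sorted()
    for n in b:
        result.append(a[n - 1])
    return result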

Sorts A by the values of B:
from operator import itemgetter
A = ['int1', 'int2', 'int3', 'int4', 'int5']
B = [4, 5, 1, 2, 3]
C = [a for a, b in sorted(zip(A, B), key=itemgetter(1))]
print(C)
Output:
['int3', 'int4', 'int5', 'int1', 'int2']
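The answers on this page read the second list in two different ways, which is why their outputs differ for the same input. The sketch below puts the two readings side by side; the helper names order_by_keys and pick_by_one_based_index are hypothetical and chosen only for this illustration.

```python
def order_by_keys(values, keys):
    """Treat keys[i] as the sort key of values[i] (smallest key first)."""
    return [v for _, v in sorted(zip(keys, values), key=lambda pair: pair[0])]


def pick_by_one_based_index(values, indexes):
    """Treat each entry of indexes as a 1-based index into values."""
    return [values[i - 1] for i in indexes]


a = ["int1", "int2", "int3", "int4", "int5"]
b = [4, 5, 1, 2, 3]
print(order_by_keys(a, b))            # ['int3', 'int4', 'int5', 'int1', 'int2']
print(pick_by_one_based_index(a, b))  # ['int4', 'int5', 'int1', 'int2', 'int3']
```

Which reading is right depends on what the second list is meant to encode, so it is worth checking a small example by hand before choosing one.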

a = [11, 22, 44, 55] # values
b = [0, 1, 10, 11] # indexes to sort by
sorted_a = [-1] * (max(b) + 1)
for index, value in zip(b, a):
    sorted_a[index] = value
print(sorted_a)
# -> [11, 22, -1, -1, -1, -1, -1, -1, -1, -1, 44, 55]

Related

IndexError: list assignment index out of range , Python

I am trying to implement the following functionality. It should work this way:
It takes two lists.
Mark some indexes, preferably a few in the center.
Both parents swap the marked indexes.
The other indexes go sequentially to their parent element.
If the same element is already present in that parent, it maps and checks where the same element was in the other parent and goes there.
import random
def pm(indA, indB):
    size = min(len(indA), len(indB))
    c1, c2 = [0] * size, [0] * size
    # Initialize the position of each indices in the individuals
    for i in range(1,size):
        c1[indA[i]] = i
        c2[indB[i]] = i
    crosspoint1 = random.randint(0, size)
    crosspoint2 = random.randint(0, size - 1)
    if crosspoint2 >= crosspoint1:
        crosspoint2 += 1
    else: # Swap the two cx points
        crosspoint1, crosspointt2 = crosspoint2, crosspoint1
    for i in range(crosspoint1, crosspoint2):
        # Keep track of the selected values
        temp1 = indA[i]
        temp2 = indB[i]
        # Swap the matched value
        indA[i], indA[c1[temp2]] = temp2, temp1
        indB[i], indB[c2[temp1]] = temp1, temp2
        # Position bookkeeping
        c1[temp1], c1[temp2] = c1[temp2], c1[temp1]
        c2[temp1], c2[temp2] = c2[temp2], c2[temp1]
    return indA, indB
a,b = pm([3, 4, 8, 2, 7, 1, 6, 5],[4, 2, 5, 1, 6, 8, 3, 7])
Error:
in pm
c1[indA[i]] = i
IndexError: list assignment index out of range

Not sure whether there are other errors in your code (I didn't run it), but here's the explanation for this one. In Python (as in most other languages), list (sequence, to be more precise) indexes are 0-based:
>>> l = [1, 2, 3, 4, 5, 6]
>>>
>>> for e in l:
...     print(e, l.index(e))
...
1 0
2 1
3 2
4 3
5 4
6 5
>>>
>>> l[0]
1
>>> l[5]
6
>>> l[6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
To summarize your problem:
1. Your indA and indB lists each have 6 elements ([1..6]), with indexes [0..5]
2. Your c1 and c2 lists also have 6 elements (indexes also [0..5])
3. But you're using values from #1 as indexes into the lists from #2, and the value 6 is a problem, as there's no such index
To fix your problem, you should use valid index values. Either:
Have the proper values in indA and indB (this is the one I'd choose):
a, b = pmxCrossover([0, 3, 1, 2, 5, 4], [4, 0, 2, 3, 5, 1])
Subtract 1 wherever you encounter values from indA or indB used as indexes:
c1[indA[i] - 1] = i
As general advice: whenever you encounter errors, add print statements before the faulty line (printing parts of it); that might give you clues that lead to solving the problem yourself.
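Along the same lines, the index values can be checked before they are used. The sketch below is not from the original answer: check_index_values and to_zero_based are hypothetical helper names, and it assumes each individual is meant to be a permutation of 0..n-1 once converted.

```python
def check_index_values(individual):
    """Raise ValueError unless `individual` is a permutation of 0..len-1."""
    size = len(individual)
    if sorted(individual) != list(range(size)):
        raise ValueError(
            "expected a permutation of 0..{}, got {}".format(size - 1, individual)
        )


def to_zero_based(individual, offset=1):
    """Shift 1-based values down so they can be used as list indexes."""
    return [value - offset for value in individual]


one_based = [1, 4, 2, 3, 6, 5]
try:
    check_index_values(one_based)   # 6 is not a valid index for 6 elements
except ValueError as exc:
    print("invalid:", exc)

zero_based = to_zero_based(one_based)
check_index_values(zero_based)      # passes silently
print(zero_based)                   # [0, 3, 1, 2, 5, 4]
```

A check like this fails early with a readable message instead of an IndexError deep inside the crossover loop.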
#EDIT0
Posting (a slightly modified version of) the original code, with the index conversion:
Before the algorithm: subtract 1 (from each element) to have valid indexes
After the algorithm: add 1 to come back to 1 based indexes
code00.py:
#!/usr/bin/env python3
import sys
import random
def pmx_crossover(ind_a, ind_b):
    size = min(len(ind_a), len(ind_b))
    c1, c2 = [0] * size, [0] * size
    # Initialize the position of each index in the individuals
    for i in range(size):
        c1[ind_a[i]] = i
        c2[ind_b[i]] = i
    # Choose crossover points
    crosspoint1 = random.randint(0, size)
    crosspoint2 = random.randint(0, size - 1)
    if crosspoint2 >= crosspoint1:
        crosspoint2 += 1
    else:  # Swap the two cx points
        crosspoint1, crosspoint2 = crosspoint2, crosspoint1
    # Apply crossover between cx points
    for i in range(crosspoint1, crosspoint2):
        # Keep track of the selected values
        temp1 = ind_a[i]
        temp2 = ind_b[i]
        # Swap the matched value
        ind_a[i], ind_a[c1[temp2]] = temp2, temp1
        ind_b[i], ind_b[c2[temp1]] = temp1, temp2
        # Position bookkeeping
        c1[temp1], c1[temp2] = c1[temp2], c1[temp1]
        c2[temp1], c2[temp2] = c2[temp2], c2[temp1]
    return ind_a, ind_b
def main():
    #initial_a, initial_b = [1, 2, 3, 4, 5, 6, 7, 8], [3, 7, 5, 1, 6, 8, 2, 4]
    initial_a, initial_b = [1, 4, 2, 3, 6, 5], [5, 1, 3, 4, 6, 2]
    index_offset = 1
    temp_a = [i - index_offset for i in initial_a]
    temp_b = [i - index_offset for i in initial_b]
    a, b = pmx_crossover(temp_a, temp_b)
    final_a = [i + index_offset for i in a]
    final_b = [i + index_offset for i in b]
    print("Initial: {0:}, {1:}".format(initial_a, initial_b))
    print("Final: {0:}, {1:}".format(final_a, final_b))
if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main()
    print("\nDone.")
Output (one of the possibilities (due to random.randint)):
[cfati#CFATI-5510-0:e:\Work\Dev\StackOverflow\q058424002]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code00.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32
Initial: [1, 4, 2, 3, 6, 5], [5, 1, 3, 4, 6, 2]
Final: [1, 3, 2, 4, 6, 5], [5, 1, 4, 3, 6, 2]
Done.

c1 is out of range because, in your for loop, at index 4 the value of indA[4] is 6.
The valid index range of c1 is 0-5 (its length is 6).
With c1[indA[i]] = i
you try to do c1[6] = 4

Finding consecutive duplicates and listing their indexes of where they occur in python

I have a list in Python, for example:
mylist = [1,1,1,1,1,1,1,1,1,1,1,
          0,0,1,1,1,1,0,0,0,0,0,
          1,1,1,1,1,1,1,1,0,0,0,0,0,0]
My goal is to find where there are five or more zeros in a row and then list the indexes where this happens. For example, the output for this list would be:
[17,21][30,35]
Here is what I have tried, based on other questions asked here:
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
runs = zero_runs(mylist)
This gives the output:
[0,10]
[11,12]
...
which is basically just listing the indexes of all duplicates. How would I go about separating this data into what I need?

You could use itertools.groupby, it will identify the contiguous groups in the list:
from itertools import groupby
lst = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
groups = [(k, sum(1 for _ in g)) for k, g in groupby(lst)]
cursor = 0
result = []
for k, l in groups:
    if not k and l >= 5:
        result.append([cursor, cursor + l - 1])
    cursor += l
print(result)
Output
[[17, 21], [30, 35]]

Your current attempt is very close. It returns all of the runs of consecutive zeros in an array, so all you need to do is add a check that filters out runs of fewer than 5 consecutive zeros.
def threshold_zero_runs(a, threshold):
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    m = (np.diff(ranges, 1) >= threshold).ravel()
    return ranges[m]
For the list above with a threshold of 5, this gives the following. Note that the end index of each range is exclusive, so subtract 1 from it to get the inclusive [17, 21] and [30, 35]:
array([[17, 22],
       [30, 36]], dtype=int64)

Use the shift operator on the array. Compare the shifted version with the original. Where they do not match, you have a transition. You then need only to identify adjacent transitions that are at least 5 positions apart.
Can you take it from there?
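One way to read that hint, sketched in plain Python rather than with an array library: compare each element with its right-hand neighbour (the list shifted by one) and pair up the positions where they differ. The function name long_zero_runs and the sentinel padding are assumptions made for this illustration, not something the answer specifies.

```python
def long_zero_runs(values, min_length=5):
    """Return [start, end] (inclusive) for each run of zeros of at least min_length."""
    # Pad with a non-zero sentinel so runs touching either end still produce transitions.
    padded = [1] + [0 if v == 0 else 1 for v in values] + [1]
    # Compare the sequence with a copy shifted by one position;
    # index i is a transition when padded[i] != padded[i + 1].
    transitions = [i for i, (cur, nxt) in enumerate(zip(padded, padded[1:])) if cur != nxt]
    # Transitions alternate: start of a zero run, then one past its end.
    runs = []
    for start, stop in zip(transitions[0::2], transitions[1::2]):
        if stop - start >= min_length:
            runs.append([start, stop - 1])
    return runs


mylist = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0,
          1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(long_zero_runs(mylist))  # [[17, 21], [30, 35]]
```

Because of the leading sentinel, a transition index maps directly to an index in the original list, so no offset correction is needed for the start of each run.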

Another way using itertools.groupby and enumerate.
First find the zeros and the indices:
from operator import itemgetter
from itertools import groupby
zerosList = [
    list(map(itemgetter(0), g))
    for i, g in groupby(enumerate(mylist), key=itemgetter(1))
    if not i
]
print(zerosList)
#[[11, 12], [17, 18, 19, 20, 21], [30, 31, 32, 33, 34, 35]]
Now just filter zerosList:
runs = [[x[0], x[-1]] for x in zerosList if len(x) >= 5]
print(runs)
#[[17, 21], [30, 35]]

Python: how to avoid loop?

I have a list of entries
l = [5, 3, 8, 12, 24]
and a matrix M
M:
12 34 5 8 7
0 24 12 3 1
I want to find the indices of the matrix where the numbers in l appear. For the k-th entry of l I want to save a random pair of indices i, j where M[i][j]==l[k]. I am doing the following:
indI = []
indJ = []
for i in l:
    tmp = np.where(M == i)
    rd = randint(len(tmp))
    indI.append(tmp[0][rd])
    indJ.append(tmp[1][rd])
I would like to see if there is a way to avoid that loop

One way in which you should be able to significantly speed up your code is to avoid duplicate work:
tmp = np.where(M == i)
As this gives you a list of all locations in M where the value is equal to i, it must be searching through the entire matrix. So for each element in l, you are searching through the full matrix.
Instead of doing that, try indexing your matrix as a first step:
matrix_index = {}
for i in range(len(M)):
    for j in range(len(M[i])):
        if M[i][j] not in matrix_index:
            matrix_index[M[i][j]] = [(i, j)]
        else:
            matrix_index[M[i][j]].append((i, j))
Then for each value in l, instead of doing a costly search through the full matrix, you can just get it straight from your matrix index.
Note: I haven't worked with numpy very much, so I may have gotten the specific syntax wrong. There may also be a more idiomatic way of doing this in numpy.
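A minimal sketch of the whole idea, from building the index to picking a random position per value. It assumes M is given as plain nested lists; the function names build_matrix_index and random_positions and the rng parameter are hypothetical and not part of the original answer.

```python
import random
from collections import defaultdict


def build_matrix_index(matrix):
    """Map each value to the list of (row, col) positions where it occurs."""
    index = defaultdict(list)
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            index[value].append((i, j))
    return index


def random_positions(matrix, wanted, rng=random):
    """For each entry of `wanted`, pick one random (row, col) holding that value."""
    index = build_matrix_index(matrix)  # one pass over the matrix
    ind_i, ind_j = [], []
    for value in wanted:
        i, j = rng.choice(index[value])  # IndexError if the value is absent from the matrix
        ind_i.append(i)
        ind_j.append(j)
    return ind_i, ind_j


M = [[12, 34, 5, 8, 7],
     [0, 24, 12, 3, 1]]
l = [5, 3, 8, 12, 24]
ind_i, ind_j = random_positions(M, l)
print(all(M[i][j] == v for i, j, v in zip(ind_i, ind_j, l)))  # True
```

The loop over l is still there, but each step is now a dictionary lookup instead of a scan of the full matrix.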

If l and M are not large, like the following:
In: l0 = [5, 3, 8, 12, 34, 1, 12]
In: M0 = [[12, 34, 5, 8, 7],
          [ 0, 24, 12, 3, 1]]
In: l = np.asarray(l0)
In: M = np.asarray(M0)
You can try this:
In: np.where(l[None, None, :] == M[:, :, None])
Out:
(array([0, 0, 0, 0, 0, 1, 1, 1, 1]), <- i
array([0, 0, 1, 2, 3, 2, 2, 3, 4]), <- j
array([3, 6, 4, 0, 2, 3, 6, 1, 5])) <- k
The rows are i, j, and k, respectively; read down each column to get every (i, j, k) you need. For example, the 1st column [0, 0, 3] means M[0, 0] = l[3], the 2nd column [0, 0, 6] means M[0, 0] = l[6], and so on. I think these are what you want.
However, this numpy trick does not scale to very large inputs, such as 2M elements in l or 2500x2500 elements in M. They need a lot of memory and a very long time to compute, if they don't crash with an out-of-memory error first. :)

One solution that does not use the word for is
c = np.apply_along_axis(lambda row: np.random.choice(np.argwhere(row).ravel()), 1, M.ravel()[np.newaxis, :] == l[:, np.newaxis])
indI, indJ = c // M.shape[1], c % M.shape[1]
Note that while that solves the problem, M.ravel()[np.newaxis, :] == l[:, np.newaxis] will quickly produce MemoryErrors. A more pragmatic approach would be to get the indices of interest through something like
s = np.argwhere(M.ravel()[np.newaxis, :] == l[:, np.newaxis])
and then do the random choice post-processing by hand. This, however, probably does not yield any significant performance improvements over your search.
What makes it slow, though, is that you search through the entire matrix in every step of your loop; pre-sorting the matrix (at a certain cost) gives you a straightforward way of making each individual search much faster:
In [312]: %paste
def direct_search(M, l):
    indI = []
    indJ = []
    for i in l:
        tmp = np.where(M == i)
        rd = np.random.randint(len(tmp[0])) # Note the fix here
        indI.append(tmp[0][rd])
        indJ.append(tmp[1][rd])
    return indI, indJ
def using_presorted(M, l):
    a = np.argsort(M.ravel())
    M_sorted = M.ravel()[a]
    def find_indices(i):
        s = np.searchsorted(M_sorted, i)
        j = 0
        # Stop at the end of the array as well as at the end of the run
        while s + j < len(M_sorted) and M_sorted[s + j] == i:
            yield a[s + j]
            j += 1
    indices = [list(find_indices(i)) for i in l]
    c = np.array([np.random.choice(i) for i in indices])
    return c // M.shape[1], c % M.shape[1]
## -- End pasted text --
In [313]: M = np.random.randint(0, 1000000, (1000, 1000))
In [314]: l = np.random.choice(M.ravel(), 1000)
In [315]: %timeit direct_search(M, l)
1 loop, best of 3: 4.76 s per loop
In [316]: %timeit using_presorted(M, l)
1 loop, best of 3: 208 ms per loop
In [317]: indI, indJ = using_presorted(M, l) # Let us check that it actually works
In [318]: np.all(M[indI, indJ] == l)
Out[318]: True

One Function in Python to Iterate Over Concurrent Lists to be Used in Equation

I have four separate lists of integers that I need to use concurrently in an equation:
h = [160, 193, 162, 17, 0]
d = [32, 1, 34, 35, 4]
t = [1, 2, 3, 4, 5]
r = [2, 5, 1, 3, 4]
s = h - (d + t + r)
I am trying to create one function to which I can pass each separate list as an argument to use in the function. I want to be able to take the value at each successive index on each list and then use them in the correct place in the equation. I would then take the value of s at each index and populate a new list.
So for example the equation at index[0] should read:
s = 160 - (32 + 1 + 2)
How can I take each integer value from list? I have tried to use the enumerate function and I have read about the * function, but I am not sure that I am supposed to be unpacking the lists - should I not just be iterating over them with a for loop?
def getSingles(h, d, r, t):
    singles = []
    for n, val in enumerate(h):
        hit = val
    for n, val in enumerate(d):
        double = val
    for n, val in enumerate(t):
        triple = val
    for n, val in enumerate(r):
        run = val
I am basically stuck here. Is this even possible? Thank you!

You could zip them together. Something like this should work:
>>> H = [160, 193, 162, 17, 0]
>>> D = [32, 1, 34, 35, 4]
>>> T = [1, 2, 3, 4, 5]
>>> R = [2, 5, 1, 3, 4]
>>>
>>> for h, d, t, r in zip(H, D, T, R):
...     s = h - (d + t + r)
...     print(s)
...
125
185
124
-25
-13
Note that if you're using Python 2.x and using very large lists, you might want to use itertools.izip instead.
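To get the single function and result list the question asks for, the same zip loop can be written as a list comprehension. This is a sketch under assumptions: the name get_singles and its parameter names are invented here, and all four lists are taken to have the same length.

```python
def get_singles(hits, doubles, triples, runs):
    """Compute s = h - (d + t + r) element-wise and return the results as a list."""
    # zip stops at the shortest input, so unequal lengths are truncated silently.
    return [h - (d + t + r) for h, d, t, r in zip(hits, doubles, triples, runs)]


h = [160, 193, 162, 17, 0]
d = [32, 1, 34, 35, 4]
t = [1, 2, 3, 4, 5]
r = [2, 5, 1, 3, 4]
print(get_singles(h, d, t, r))  # [125, 185, 124, -25, -13]
```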

You can use the zip function and a lambda in map (in Python 3, wrap the call in list() because map returns an iterator):
>>> list(map(lambda x: x[0] - (x[1] + x[2] + x[3]), zip(h, d, t, r)))
[125, 185, 124, -25, -13]

A pandas Series makes this easy:
>>> import pandas as pd
>>> h = pd.Series([160, 193, 162, 17, 0])
>>> d = pd.Series([32, 1, 34, 35, 4])
>>> t = pd.Series([1, 2, 3, 4, 5])
>>> r = pd.Series([2, 5, 1, 3, 4])
>>> s = h - (d + t + r)
>>> s
0 125
1 185
2 124
3 -25
4 -13
dtype: int64
If you have h, d, t, r data in a CSV file, you can use pandas.read_csv() to read that into a pandas DataFrame. A DataFrame is like an array of Series, and can calculate new columns in a similar fashion.

Finding indices of the lowest 5 numbers of an array in Python

How to find out which indices belong to the lowest x (say, 5) numbers of an array?
[10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
Also, how to directly find the sorted (from low to high) lowest x numbers?

The existing answers are nice, but here's the solution if you're using numpy:
import numpy as np
mylist = np.array([10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7])
x = 5
lowestx = np.argsort(mylist)[:x]
#array([ 2, 3, 5, 10, 4])

You could do something like this:
>>> l = [5, 1, 2, 4, 6]
>>> sorted(range(len(l)), key=lambda i: l[i])
[1, 2, 3, 0, 4]

mylist = [10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
# lowest 5
lowest = sorted(mylist)[:5]
# indices of lowest 5
lowest_ind = [i for i, v in enumerate(mylist) if v in lowest]
# 5 indices of lowest 5
import operator
lowest_5ind = [i for i, v in sorted(enumerate(mylist), key=operator.itemgetter(1))[:5]]
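As a possible alternative that avoids sorting the whole list, the standard library's heapq.nsmallest returns the x smallest items already ordered from low to high. The sketch below is an addition for illustration rather than part of the answer above; the variable names lowest_values and lowest_indices are made up here.

```python
import heapq

mylist = [10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549,
          9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
x = 5

# The x smallest values, already sorted from low to high.
lowest_values = heapq.nsmallest(x, mylist)
# The indices of those values, in the same low-to-high order.
lowest_indices = heapq.nsmallest(x, range(len(mylist)), key=mylist.__getitem__)

print(lowest_values)   # [9.41220631, 9.42846614, 9.69949144, 9.7, 9.7300549]
print(lowest_indices)  # [2, 3, 5, 10, 4]
```

Ranking the indices by their value, rather than looking values up with list.index, also keeps duplicates apart because each index appears only once.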

[a.index(b) for b in sorted(a)[:5]]
sorted(a)[:x]
