Handling duplicates when using Partially Matched Crossover for Genetic Algorithm - python

I am new to Genetic Algorithms and am working on a python implementation. I am up to the crossover step and am attempting a Partially Matched Crossover. For my final output I am hoping for a list that contains no duplicated numbers. However, in some cases, I am introducing duplicates.
For example, take the lists
Mate 1 [1,2,3,5,4,6]
Mate 2 [6,5,4,3,2,1]
If the crossover portion is [3,5,4] -> [4,3,2]
Then the offspring before mapping becomes [1,2,4,3,2,6]. My understanding of the algorithm is that the mapping outside the crossover is 4 -> 3, 5 -> 3 and 2 -> 4. However, this results in an output of [1,4,4,3,2,6], which has a duplicate 4 and is missing a 5.
How do I work around this problem? Does the first 4 just become a 5? And how would this scale to larger lists that might introduce multiple duplicates?

I am not sure you have implemented it right: for Partially Matched Crossover (see explanation), if your crossover points are 2 and 5 as suggested in the example, then you can only obtain
offspring1 = [6, 2, 3, 5, 4, 1]
offspring2 = [1, 5, 4, 3, 2, 6]
If you select 3, 5, 4 from mate1 and fill the rest in the order of mate2, you will get offspring1; if you select 4, 3, 2 from mate2 and fill the rest in the order of mate1, you will get offspring2.
See the implementation below:
mate1 = [1,2,3,5,4,6]
mate2 = [6,5,4,3,2,1]
crossoverpoint1 = 2
crossoverpoint2 = 5
child = []

# fill in the initial genes in order of mate1
count = 0
for i in mate1:
    if count == crossoverpoint1:
        break
    if i not in mate2[crossoverpoint1:crossoverpoint2]:
        child.append(i)
        count = count + 1

# select the genes within the crossover points from mate2
child.extend(mate2[crossoverpoint1:crossoverpoint2])

# fill in the remaining genes in order of mate1
child.extend([x for x in mate1 if x not in child])

print(child)
output:
[1, 5, 4, 3, 2, 6]
To obtain offspring1, swap mate1 and mate2.
You can also try different crossover points. Let me know if this helps.
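For reference, here is how the mapping-based repair the question asks about is usually written: copy mate2's window into the child, take mate1's gene for every position outside the window, and whenever that gene already appears in the copied window, follow the window's mapping until a free gene is found. This is only a sketch; the function name pmx is made up for illustration:
def pmx(parent1, parent2, c1, c2):
    # the child keeps parent2's genes inside the crossover window
    child = [None] * len(parent1)
    child[c1:c2] = parent2[c1:c2]
    window = set(parent2[c1:c2])
    # each gene in parent2's window maps to the gene at the same position in parent1
    mapping = {parent2[i]: parent1[i] for i in range(c1, c2)}
    for i in list(range(c1)) + list(range(c2, len(parent1))):
        gene = parent1[i]
        while gene in window:   # resolve duplicates by following the mapping
            gene = mapping[gene]
        child[i] = gene
    return child

print(pmx([1,2,3,5,4,6], [6,5,4,3,2,1], 2, 5))  # [1, 5, 4, 3, 2, 6]
On the question's example this resolves the duplicated 2 through the chain 2 -> 4 -> 3 -> 5, which is exactly the missing 5.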

Related

Python find index of last element smaller than number

I have a list of numbers from 0 to 3 and I want to remove every number that is smaller than 2 xor is not connected to the last 3 in the list. This is also going to be done about 200 million times, so it should preferably perform well. For example, I could have a list like this:
listIWantToCheck = [3, 0, 1, 2, 0, 2, 3, 2, 2, 3, 2, 0, 2, 1]
listIWantToGet = [2, 3, 2, 2, 3]
I already have the index of the last 3 so what I would do is:
listIWantToGet = listIWantToCheck[??? : indexOfLastThree + 1]
??? being 4 in this instance; it is the index that meets the mentioned conditions.
So how do I get the index of the last number smaller than 2?
Nailed it; the index I want is:
index = ([0]+[i for i, e in enumerate(listIWantToCheck[:indexOfLastThree]) if e < 2])[-1] + 1
List comprehension is truly beautiful.
I enumerated through the slice, created a list of all indices that point to a number smaller than 2, and took the last one. The 0 in front is added to avoid an IndexError, which would occur if there were no elements smaller than 2.
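For reference, a quick check of that expression on the data from the question (indexOfLastThree is assumed to be known already; it is 9 here):
listIWantToCheck = [3, 0, 1, 2, 0, 2, 3, 2, 2, 3, 2, 0, 2, 1]
indexOfLastThree = 9
index = ([0] + [i for i, e in enumerate(listIWantToCheck[:indexOfLastThree]) if e < 2])[-1] + 1
print(listIWantToCheck[index:indexOfLastThree + 1])  # [2, 3, 2, 2, 3]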
So if I get this right, you want the index of the last number smaller than 2 that comes before the last 3.
My approach would be to take the part of the list from index 0 to the index of the last 3, then walk through it in reverse and check whether each number is smaller than 2.
If, however, you want the last number smaller than 2 in the entire list, just reverse the whole list and loop through it the same way:
for i in reversed(range(indexOfLastThree)):
    if listIWantToCheck[i] < 2:
        return i
Correct me if I misunderstood your problem.
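For comparison, a quick check of that reverse scan on the question's data, wrapped in a hypothetical helper (the name last_small_index is made up for illustration):
def last_small_index(listIWantToCheck, indexOfLastThree):
    # walk backwards from just before the last 3
    for i in reversed(range(indexOfLastThree)):
        if listIWantToCheck[i] < 2:
            return i

listIWantToCheck = [3, 0, 1, 2, 0, 2, 3, 2, 2, 3, 2, 0, 2, 1]
print(last_small_index(listIWantToCheck, 9))  # 4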

How can I optimize this Python code to reduce time complexity?

I'm trying to solve a problem.
Given 2 arrays of integers, A and B, both of size n.
Suppose you have a list of integers, initially empty. Now, you traverse the 2 arrays simultaneously for every i from 1 to n, and perform the following operation:
You append Ai to the list.
If the list now contains at least Bi distinct elements, you remove Bi distinct elements from the list. If the list has more than Bi distinct elements, you remove the Bi distinct elements ordered by their frequency in the list, from highest frequency to lowest, i.e. an element with a higher frequency is removed first.
If 2 elements have the same frequency in the list, then you remove the element with the lower value.
If you can perform this removal operation, then it is counted as a successful operation, else it is unsuccessful.
Note that in case of a successful operation, the updated list, from which Bi distinct elements have been removed, is taken for the next operation.
Example
Assumptions
n = 4
A = [1, 1, 2, 3]
B = [3, 2, 3, 2]
Approach
Initially, the list is empty.
For i = 1, you first append A1 to the list. So, the list becomes [1]. Now, B1 = 3, that is, you need at least 3 distinct elements in the list. So, the operation is unsuccessful.
For i = 2, you first append A2 to the list. So, the list becomes [1, 1]. Now, B2=2, that is, you need at least 2 distinct elements in the list, but the list has only 1 distinct element. So, the operation is unsuccessful.
For i = 3, you first append A3 to the list. So, list becomes [1, 1, 2]. Now, B3 = 3, that is, you need to remove at least 3 distinct elements in the list, but the list has 2 distinct elements. So, the operation is unsuccessful.
For i = 4, you first append A4 to the list. So, the list becomes [1, 1, 2, 3]. Now, B4 = 2, that is, you need to remove at least 2 distinct elements from the list, and the list has 3 distinct elements. So, the operation is successful. You remove a 1 as it has the highest frequency in the list, and then you remove 2 as it has a lower value than 3. So, the updated list is [1, 3].
Hence, the answer, which is the number of successful operations, is 1.
Below is my code. It works fine for small cases, but for larger cases it gives a time limit exceeded error.
T = int(input()) # No of test cases
for _ in range(T):
    N = int(input()) # Number of elements in array
    A = list(map(int, input().split()))
    B = list(map(int, input().split()))
    Lis = [] # New list
    Sucess = 0
    for i in range(N):
        Lis.append(A[i])
        if len(set(Lis)) >= B[i]:
            Lis = sorted(Lis, key=Lis.count, reverse=True)
            Set = []
            for S in Lis:
                if S not in Set:
                    Set.append(S)
                if len(Set) == B[i]:
                    break
            for K in range(B[i]):
                Lis.remove(Set[K])
            Sucess += 1
    print(Sucess)
Input
1
4
1 1 2 3
3 2 3 2
Output
1
How can I optimize this code further?
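One direction worth exploring, sketched below under the assumption that the repeated Lis.count calls inside the sort key are the main cost: keep the frequencies in a collections.Counter so each step only sorts the distinct values. The helper name count_successful_ops is made up for illustration:
from collections import Counter

def count_successful_ops(A, B):
    freq = Counter()   # value -> how many copies are currently in the list
    successes = 0
    for a, b in zip(A, B):
        freq[a] += 1
        if len(freq) >= b:
            # pick b distinct values: highest frequency first, ties broken by smaller value
            victims = sorted(freq, key=lambda v: (-freq[v], v))[:b]
            for v in victims:
                freq[v] -= 1   # remove one occurrence of each chosen value
                if freq[v] == 0:
                    del freq[v]
            successes += 1
    return successes

print(count_successful_ops([1, 1, 2, 3], [3, 2, 3, 2]))  # 1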

Sorting Algorithm Idea

I want to create a sorting algorithm for a specific game inventory.
Each item has an ID and a size (1-3). The size reflects how many slots it occupies in the inventory, vertically.
I want to create a sorting algorithm based mainly on size, so that the largest items come first; that alone would be very simple. However, the inventory has multiple pages, each page having 5 columns of 10 rows, and this is where the problem appears. Logically you would fill up the first page with size-3 items; however, that means that in the last row there won't be any items. So the algorithm has to fill the first 6 rows with size-3 items and the remaining 4 rows with size-2 items. The number of items is dynamic, so that may not be the case every time. Can anyone point me in the right direction? I am using Python. Thank you very much!
If your goal is to:
minimize the number of unoccupied rows,
and, among equivalent solutions, prefer the one with the most "big" items,
then you may apply a 0-1 knapsack algorithm: maximize the "cost" up to 10.
Below is a solution adapted from a previous answer of mine.
Long story short:
apply knapsack (do it yourself, the code is just for illustration)
a candidate is a set of items picked among all the available items
in the implementation below, we grow the candidate size, so at equal sum, the shorter the candidate, the bigger the items in it (which fulfills our requirement)
default to the candidate whose sum is closest to 10 if none reaches 10 (best_fallback)
from collections import namedtuple

def pick_items(values):
    S = 10
    Candidate = namedtuple('Candidate', ['sum', 'lastIndex', 'path'])
    tuples = [Candidate(0, -1, [])]
    best_fallback = tuples[0]
    while len(tuples):
        next = []
        for (sum, i, path) in tuples:
            for j in range(i + 1, len(values)):
                v = values[j]
                if v + sum <= S:
                    candidate = Candidate(sum = v + sum, lastIndex = j, path = path + [v])
                    if candidate[0] > best_fallback[0]:
                        best_fallback = candidate
                    next.append(candidate)
                    if v + sum == S:
                        return path + [v]
        tuples = next
    return best_fallback[2]
print(pick_items([3,3,3,1])) #finds the trivial sum [3, 3, 3, 1]
print(pick_items([1,3,3,1])) #returns the closest to goal [1, 3, 3, 1]
print(pick_items([2,2,2,2,2,1,3,3,1])) #returns the shortest set [2, 2, 3, 3]
print(pick_items([3,3,2,2,3])) #returns an exact count [3, 3, 2, 2]
print(pick_items([3,1,1,1,2,2,2,2])) #shortest set as well [3, 1, 2, 2, 2]
PS: regarding the set [2,2,2,2,2,3,1,3,1] (where there are two solutions of equal size, (3,1,3,1,2) and (2,2,2,2,2)), we may force the order in which the solutions are explored by adding values = sorted(values, reverse=True) at the beginning:
def pick_items(values):
    # force the biggest-items solution to be explored first
    values = sorted(values, reverse=True)
    S = 10
    # ... (the rest of the function is unchanged)
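For completeness, a hypothetical driver showing how pick_items could be used for the inventory from the question: fill one 10-row column at a time and remove the picked sizes from the remaining pool. The name fill_columns and the greedy column-by-column strategy are my own assumptions, not part of the answer above:
def fill_columns(sizes, n_columns=5):
    remaining = sorted(sizes, reverse=True)   # explore the biggest items first
    columns = []
    for _ in range(n_columns):
        if not remaining:
            break
        picked = pick_items(remaining)
        for v in picked:
            remaining.remove(v)   # drop one occurrence of each picked size
        columns.append(picked)
    return columns

print(fill_columns([3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 1]))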

Why does the swapping not take place when the indices are 0 and 1?

I have an unordered array consisting of consecutive integers [1, 2, 3, ..., n] without any duplicates. It is allowed to swap any two elements. I need to find the minimum number of swaps required to sort the array in ascending order.
Starting from the first element of the list, I try to put each element in its right position (for example, if the first element is 7, it belongs at index 6). To go through the list one by one, I make a copy and do the swapping in the second list.
a = [4,3,1,2]
b = a[:]
swap = 0
for p in a:
    if p != b[p-1]:
        b[p-1], b[b.index(p)] = b[b.index(p)], b[p-1]
        swap += 1
print(swap)
This code works, except when I have to swap two elements whose positions are 0 and 1; in that case the swap does not happen, which I don't understand, since I'm not exceeding the list bounds.
Can anyone please explain to me why this happens?
For example, here is what I get if I print p, the two indices where swapping happens, the updated list b, and the updated number of swaps:
p = 4
idx1= 3 idx2= 0
b= [2, 3, 1, 4]
swap = 1
p = 3
idx1= 2 idx2= 1
b= [2, 1, 3, 4]
swap = 2
p = 1
idx1= 0 idx2= 1
b= [2, 1, 3, 4]
swap = 3
p = 2
idx1= 1 idx2= 0
b= [1, 2, 3, 4]
swap = 4
In this case, you can see that for p = 1, when indices are 0 and 1, the swapping is not taking place.
I changed the order of b[p-1], b[b.index(p)] and I don't have the same problem anymore, but I don't understand the reason.
I have encountered the same problem before and was stuck on it for a while. The cause is the order of evaluation in the multiple assignment.
b[p-1], b[b.index(p)] = b[b.index(p)], b[p-1]
Multiple assignment does not assign both targets at exactly the same time. There is a pack-and-unpack mechanism behind it: the right-hand side is evaluated into a tuple first, then the targets are assigned left to right. So b[p-1] is changed first, and when b[b.index(p)] is evaluated afterwards, b.index(p) finds a new index, which is p-1 in the case p = 1, idx1 = 0, idx2 = 1.
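A minimal demonstration of that evaluation order, using the failing case from the question's trace (b = [2, 1, 3, 4], p = 1):
b = [2, 1, 3, 4]
p = 1
# the right-hand side is evaluated first: (b[b.index(1)], b[0]) == (1, 2)
# then the targets are assigned left to right:
#   b[p-1] = 1  -> b becomes [1, 1, 3, 4]
#   b.index(1) now finds index 0, so b[0] = 2  -> b is back to [2, 1, 3, 4]
b[p-1], b[b.index(p)] = b[b.index(p)], b[p-1]
print(b)  # [2, 1, 3, 4] -- the swap undid itself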
If you change the assignment order, it will work fine:
b[b.index(p)], b[p - 1] = b[p - 1], b[b.index(p)]
or calculate idx first:
idx1, idx2 = p - 1, b.index(p)
b[idx1], b[idx2] = b[idx2], b[idx1]
I recommend the second version, because the first version calls index twice and therefore takes roughly twice as long.
You can refer to my related question here: The mechanism behind multiple assignment in Python.
By the way, I think your algorithm is inefficient here: it reduces the number of swaps, but each swap uses an O(n) index operation, and you also copy the array. You could use the same idea and just swap in the original array, keeping track of each value's position, as sketched below.
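A minimal sketch of that in-place idea, assuming the array contains exactly the values 1..n as in the question; the helper name min_swaps and the pos dictionary are made up for illustration:
def min_swaps(a):
    b = a[:]
    pos = {v: i for i, v in enumerate(b)}   # current index of each value
    swaps = 0
    for i in range(len(b)):
        want = i + 1                        # the value that belongs at index i
        if b[i] != want:
            j = pos[want]                   # where that value currently is
            pos[b[i]], pos[want] = j, i     # keep the position map in sync
            b[i], b[j] = b[j], b[i]
            swaps += 1
    return swaps

print(min_swaps([4, 3, 1, 2]))  # 3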

Max path triangle

I have a triangle with two-hundred rows, where I have to find the maximum distance to get from the top to the bottom of the triangle.
5
9 8
5 4 6
9 7 3 4
Here, the shortest distance would be 5+8+4+3=20. The maximum distance would be 5+9+5+9=28.
I have a good idea of the algorithm I want to implement but I am struggling to turn it into code.
My plan is: start at the 2nd to last row, add the maximum of the possible paths from the bottom row, and iterate to the top.
For instance, the above triangle would turn into:
28
23 19
14 11 10
9 7 3 4
This is vastly more efficient than brute-forcing, but I have two general questions:
Using brute-force, how do I list all the possible paths from top to bottom (you can only move to adjacent points)? I tried using this (triangle is the list of lists containing the triangle):
points = list(itertools.product(*triangle))
but this contains all possible combinations from each row, not just adjacent members.
Project Euler #18 - how to brute force all possible paths in tree-like structure using Python?
This somewhat explains a possible approach, but I'd like to use itertools and any other modules (as Pythonic as possible).
How would I go about iterating the strategy of adding each maximum from the previous row and working up to the top? I know I have to implement a nested loop:
for x in triangle:
    for i in x:
        i += ?  # <- not sure if this would even increment it
Edit: what I was thinking was:
triangle[y][x] = max([triangle[y+1][x], triangle[y+1][x+1]])
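For what it's worth, a minimal sketch of that bottom-up update done in place on triangle; note that the row's own value has to be accumulated, so the update is += rather than a plain assignment:
# from the second-to-last row up to the top, as described in the question
for y in reversed(range(len(triangle) - 1)):
    for x in range(len(triangle[y])):
        triangle[y][x] += max(triangle[y + 1][x], triangle[y + 1][x + 1])
print(triangle[0][0])  # 28 for the example triangle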
It does not use itertools and it is recursive, but I memoize the results, so it's still fast...
def memoize(function):
    memo = {}
    def wrapper(*args):
        if args in memo:
            return memo[args]
        else:
            rv = function(*args)
            memo[args] = rv
            return rv
    return wrapper

@memoize
def getmaxofsub(x, y):
    if y == len(triangle) or x > y:
        return 0
    # print(x, y)
    return triangle[y][x] + max(getmaxofsub(x, y+1), getmaxofsub(x+1, y+1))

getmaxofsub(0, 0)
I read your algorithm suggestion a few more times, and your "cumulative triangle" is stored in memo of the memoize decorator, so in the end it is very similar. If you want to prevent a large call stack during the recursive "down calling" through the triangle, you can fill the cache of memoize by calling getmaxofsub() bottom-up:
for i in reversed(range(len(triangle))):
    getmaxofsub(0, i), getmaxofsub(i//2, i), getmaxofsub(i, i)
print(getmaxofsub(0, 0))
Edit
getmaxofsub: How does this function work? First you have to know that you can divide your triangle into sub-triangles. I take your triangle as an example:
5
9 8
5 4 6
9 7 3 4
That's the complete one. The "coordinates" of the peak are x=0, y=0.
Now I extract the sub triangle of the peak x=0, y=1:
9
5 4
9 7 3
or x=1, y=2
4
7 3
So this is how my algorithm works: The peak of the whole triangle (x=0, y=0) asks its sub triangles (x=0, y=1) and (x=1, y=1), "What is your maximum distance to the ground?" And each of them will ask their sub-triangles and so on…
This goes on until the function reaches the ground (y == len(triangle)): the ground entries want to ask their sub-triangles, but since there are none, they get the answer 0.
After each triangle has asked its sub-triangles, it decides which one is greater, adds its own value and returns this sum.
So now you see the principle of this algorithm. Such algorithms are called recursive algorithms. A function calling itself is pretty standard… and it works…
If you think about the whole algorithm, you will see that a lot of sub-triangles are called several times, and they would ask their sub-triangles and so on… but each time they return the same value. That is why I used the memoize decorator: if the function is called with the same arguments x and y, the decorator returns the last calculated value for those arguments and avoids the time-consuming recalculation… it is a simple cache…
That is why this function is as easy to implement as a recursive algorithm and as fast as an iteration.
To answer your first question (how to brute-force iterate over all paths): if you start at the top of the triangle and move down along some path, you have to decide whether to go left or right for every level that you go down. The number of different paths is thus 2^(nrows-1). For your problem with 200 rows, there are thus about 8e59 different paths, way too many to check in a brute-force way.
For a small triangle, you can still iterate over all possible paths in a brute-force way, for example like this:
In [10]: from itertools import product

In [11]: triangle = [[5], [9,8], [5,4,6], [9,7,3,4]]

In [12]: for decisions in product((0,1), repeat=len(triangle)-1):
    ...:     pos = 0
    ...:     path = [triangle[0][0]]
    ...:     for lr, row in zip(decisions, triangle[1:]):
    ...:         pos += lr  # cumulative sum of left-right decisions
    ...:         path.append(row[pos])
    ...:     print(path)
[5, 9, 5, 9]
[5, 9, 5, 7]
[5, 9, 4, 7]
[5, 9, 4, 3]
[5, 8, 4, 7]
[5, 8, 4, 3]
[5, 8, 6, 3]
[5, 8, 6, 4]
The way this works is to use itertools.product to iterate over all possible combinations of nrows-1 left/right decisions, where a 0 means go left and a 1 means go right (so you are more or less generating the bits of all binary numbers up to 2^(nrows-1)). If you store the triangle as a list of lists, going left means staying at the same index in the next row, while going right means adding 1. To keep track of the position in the row, you thus simply calculate the cumulative sum of all left/right decisions.
To answer your second question: First of all, your algorithm seems pretty good, you only need to iterate once backwards over all rows and you do not have the exponential number of cases to check as in the brute-force solution. The only thing I would add to that is to build a new triangle, which indicates at every step whether the maximum was found to the left or to the right. This is useful to reconstruct the optimal path afterwards. All this can be implemented like this:
mx = triangle[-1]  # maximum distances so far, start with last row
directions = []    # upside-down triangle with left/right direction towards max
for row in reversed(triangle[:-1]):  # iterate from penultimate row backwards
    directions.append([l < r for l, r in zip(mx[:-1], mx[1:])])
    mx = [x + max(l, r) for x, l, r in zip(row, mx[:-1], mx[1:])]
    print('Maximum so far:', mx)

print('The maximum distance is', mx[0])

directions.reverse()
pos = 0
path = [triangle[0][0]]
for direction, row in zip(directions, triangle[1:]):
    pos += direction[pos]
    path.append(row[pos])

print('The optimal path is', path)
As before, I used the trick that False = 0 and True = 1 to indicate going left and right. Using the same triangle as before, the result:
Maximum so far: [14, 11, 10]
Maximum so far: [23, 19]
Maximum so far: [28]
The maximum distance is 28
The optimal path is [5, 9, 5, 9]
