How to compare 4 consecutive elements in a list? - python

I am new to coding so I apologize in advance if what I am asking is simple or doesn't make much sense but I will try to elaborate as much as I can. First of all this is not for any work or project I am simply studying to learn a bit of coding for my satisfaction. I've been trying to find some real life problems to apply into coding (pseudo code mostly but python language would also be kind of understandable to me).
I wanted to be able to have a list of x elements and compare 4 of them sequentially.
For example, myList = [a, b, c, d, e, f, g, h, i, j, k, l]
First I want to compare a,b,c and d.
If b>a, c>b, d>c and d> all of 3 previous ones (d>a, d>b, d>c) I want to do something otherwise go to next comparison.
Then I wanted to compare b,c,d and e.
Similarly if c>b, d>c, e>d and e> all of 3 previous ones (e>b, e>c, e>d) I want to do something otherwise go to next comparison.
What if my list contains infinite elements? myList = [:]
Where do I start? Do I have to have a starting point?
I am guessing I have to use a for loop to iterate through the list but I honestly can't figure out how to iterate through the first 4 elements and then continue from the second element in 4 element batches.
Since I am currently studying arrays and lists, maybe there is some functionality I am missing? Or maybe my brain simply can't grasp it.
I tried looking at other posts in stackoverflow but honestly I can't figure it out from other people's answers. I would appreciate any help or guidance.
Thanks in advance.

You can use the built-in all() function for this problem:
myList = [5, 4, 3, 6, 3, 5, 6, 2, 3, 10, 11, 3]
def do_something():
    # your code here
    pass
# there are len(myList) - 3 start positions, so the last 4-element window is included
for i in range(len(myList) - 3):
    window = myList[i:i+4]  # list slicing gives the 4 elements starting at index i
    # each element must be greater than the one before it (b>a, c>b, d>c);
    # that also guarantees the last element is greater than all three before it
    if all(window[j] < window[j+1] for j in range(3)):
        do_something()

L = [...]
# the 4-element windows start at indices 0 .. len(L)-4, so there are len(L)-3 of them
for i in range(len(L)-3):
    window = L[i:i+4] # the 4 elements you want to compare
    print("I am considering the elements starting at index", i, ". They are:", window)
    a,b,c,d = window
    if a<b<c<d: # b>a, c>b and d>c, which also makes d the largest of the four
        print("The checks pass!")
Now, there is a simpler way to do this:
for a,b,c,d in (L[i:i+4] for i in range(len(L)-3)):
    if a<b<c<d:
        print("The checks pass!")

To consume just one item at a time from an iterator and operate on 4 lagged elements, try a circular buffer:
# make a generator as an example of an 'infinite list'
import string
agen = (e for e in string.ascii_lowercase)
# initialize a length-4 circular buffer
cb = [next(agen) for _ in range(4)] # assumes there are at least 4 items
ptr = 0 # index of the oldest item in the buffer
while True:
    a,b,c,d = (cb[(i+ptr)%4] for i in range(4)) # get current 4 saved items, oldest first
    # some function here
    print(a,b,c,d)
    # get next item from generator, catch StopIteration on empty
    try:
        cb[ptr] = next(agen)
    except StopIteration:
        break
    ptr = (ptr + 1)%4 # update circular buffer pointer
a b c d
b c d e
c d e f
d e f g
e f g h
f g h i
g h i j
h i j k
i j k l
j k l m
k l m n
l m n o
m n o p
n o p q
o p q r
p q r s
q r s t
r s t u
s t u v
t u v w
u v w x
v w x y
w x y z
'some function' could include a stopping condition too:
# random.choice() as an example of an 'infinite iterator'
import string
import random
# initialize a length-4 circular buffer
cb = [random.choice(string.ascii_lowercase) for _ in range(4)]
ptr = 0 # index of the oldest item in the buffer
while True:
    a,b,c,d = (cb[(i+ptr)%4] for i in range(4)) # get current 4 saved items, oldest first
    # some function here
    print(a,b,c,d)
    if a<b<c<d: # stopping condition
        print("found ordered string: ", a,b,c,d)
        break
    # random.choice() never runs out, so no StopIteration handling is needed here
    cb[ptr] = random.choice(string.ascii_lowercase)
    ptr = (ptr + 1)%4 # update circular buffer pointer
o s w q
s w q k
w q k j
q k j r
k j r q
j r q r
r q r u
q r u v
found ordered string: q r u v

Since you can index a list, how about start from index 0, compare the 0th, (0+1)th, (0+2)th, and (0+3)th elements. Then, by the next round, increase your index to 1, and compare the 1st, (1+1)th, (1+2)th, and (1+3)th elements, and so on. For the nth round, you compare the n, n+1, n+2, and (n+3)th elements, until you reach the 4th element before the end. This is how you generally do stuff like 'testing m elements each time from a sequence of length n', and you can easily expand this pattern to matrices or 3d arrays. The code you see in other answers are basically all doing this, and certain features in Python make this job very easy.
Now, 'what if the list contains infinite elements'? Well, then you'll need a generator, which is a bit advanced at this stage I assume, but the concept is very simple: you let a function read that infinite stream of elements in a (might be infinite) loop, set a cursor on one of them, return (yield) the element under the cursor as well as the 3 elements following it each time, and increase the cursor by one before the next loop starts:
def unsc_infinity(somelist):
    cur = 0
    while True:
        yield somelist[cur:cur+4]
        cur = cur + 1
infinity_reader = unsc_infinity(endless_stream)
next(infinity_reader)
# gives the 0, 1, 2, 3 th elements in endless_stream
next(infinity_reader)
# gives the 1, 2, 3, 4 th elements in endless_stream
next(infinity_reader)
# ...
And you can loop over that generator too:
for a, b, c, d in unsc_infinity(endless_stream):
    if a<b<c<d:
        do_something()
Hope that helps you build a mental model of how this kind of problem is solved.
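For a stream that cannot be sliced (a generator or any other iterator), the same sliding window can be kept with collections.deque, which drops the oldest item automatically once maxlen is reached. This is only a minimal sketch: the helper names windows and is_increasing are made up for illustration, and the check assumes "each element greater than the previous one" is the condition wanted.

```python
from collections import deque
from itertools import count, islice

def windows(iterable, size=4):
    """Yield successive overlapping tuples of `size` items from any iterable."""
    buf = deque(maxlen=size)  # the oldest item is discarded automatically
    for item in iterable:
        buf.append(item)
        if len(buf) == size:
            yield tuple(buf)

def is_increasing(window):
    """True if every element is greater than the one before it."""
    return all(x < y for x, y in zip(window, window[1:]))

# Works on a finite list ...
hits = [w for w in windows([5, 4, 3, 6, 7, 9, 2]) if is_increasing(w)]
# ... and on an endless iterator, as long as we stop consuming at some point.
first_three = list(islice(windows(count()), 3))
```

Because only the last four items are kept, memory use stays constant however long the stream runs.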

Related

Complex behaviour with lists

I need to do a somewhat strange and complex calculation with lists. I have tried to get it up and running, but it runs into an error; or rather, I find the behavior quite difficult to achieve.
I have following lists.
A = [1,1,1,2,2,2]
B = [3,3] # B list is length of numbers 1 and 2.
E = [10,10]
C = 1
D = []
I have this code, but not really working:
for k in B:
    for j in E:
        for i in range(len(A)-k):
            print(i)
            if i == 0:
                D.append(C)
            else:
                D.append(C+(E[k]))
print(D)
Explaining to achieve results.
I want to have a for-loop, which enables to append values to my empty list, which looks at first 3 values in the beginning of list A by taking B[0]= 3, do something with first 3 values. And looks at B[1]= 3, ie. take the last 3 values in the list A, then do something to them and append them all in order to empty list.
First 3 values:
When A[0] is selected, I want to have D[0] = C, and for A[1] and A[2], the D list should have D[1] = C + 1*E[0] and D[2] = C + 2*E[0].
Last 3 values:
When A[3] is selected, I want to have D[3] = C, and for A[4] and A[5], the D list should have D[4] = C + 1*E[1] and D[5] = C + 2*E[1].
Expected output:
[1,11,21,1,11,21]
I want to get it programmatically, in case changing A list to A = [1,1,2,2] and B = [2,2] or something else.
Initialization of your lists
A = [1,1,1,2,2,2]
B = [3,3] # B list is length of numbers 1 and 2.
E = [10,10]
C = 1
D = []
we want to count in A starting the first time from 0, the next times from the previous start plus how many items we have used, hence we initialize start
start = 0
We start a loop on the elements b of B, counting them in k, we extract from A the elements we need and update the start position for the next pass
for k, b in enumerate(B):
    sub_A = A[start:start+b]
    start = start+b
Now an inner loop on the elements a of the sub-list, counting them with i, note that for the first item i is zero and so we append C+0*E[k]=C, as requested
    for i, _ in enumerate(sub_A):
        D.append(C+i*E[k])
To see everything without my comments
start = 0
for k, b in enumerate(B):
    sub_A = A[start:start+b]
    start = start+b
    for i, _ in enumerate(sub_A):
        D.append(C+i*E[k])
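Since the values stored in A are never used, only the group lengths in B, the same idea can be wrapped in a small function that is easy to try with other inputs. A sketch only; the name build_ramps is hypothetical and it assumes E has one step value per entry of B:

```python
def build_ramps(B, E, C):
    """For each group k of length B[k], emit C, C + E[k], C + 2*E[k], ..."""
    D = []
    for k, b in enumerate(B):
        # i counts the position inside the current group, starting at 0
        D.extend(C + i * E[k] for i in range(b))
    return D

result = build_ramps([3, 3], [10, 10], 1)    # the lists from the question
smaller = build_ramps([2, 2], [10, 10], 1)   # the A = [1,1,2,2], B = [2,2] case
```

With the question's input this gives [1, 11, 21, 1, 11, 21], and [1, 11, 1, 11] for the shorter case.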

Python Array out of bounds error

I am getting an array out-of-bounds error. I am trying to loop through a multi-dimensional array and assign the value from the formula to each element. How do I fix the loop so that I don't go outside the array bounds?
z=int(4.3/7.9)
V =51
T =51
r = 1
c = 1
a=[[0]*c for i in range(r)]
for r in range(1,51):
    for c in range(1,51):
        a[c][r]=35.74 + 0.6215*T - (35.75*V)**0.16 + (0.4275*T*V)**0.16
print()
#print to html file down below
outfile=open("c:\\data\\pfile.html","w")
outfile.write("<!DOCTYPE html>")
outfile.write("<html>")
outfile.write("<head>")
outfile.write("<title>Kye Fullwood</title>")
outfile.write("<style> table,td{border:1px solid black;border-collaspse:collapse;background-color:aqua;}</style>\r\n")
outfile.write("</head>")
outfile.write("<body>")
outfile.write("<h1>This is a Windchill table</h1>")
outfile.write("<table>")
for V in range(1,51,1):
    outfile.write("<tr>")
    for TV in range(1,51,1):
        outfile.write("<td>"+str(a[r][c])+"</td>\r\n")
    outfile.write("</tr>")
outfile.write("</table>")
outfile.write("</body>")
outfile.write("</html>")
outfile.close()
print("complete")
a=[[0]*c for i in range(r)]
basically means a=[[0]] because your code starts with c=1 and r=1. So when you try to access matrix a with indices in range(1,51) you got an "out of range" exception, as there is only one element in your matrix.
To create a 50x50 null matrix, the following python idiom could be used:
a = [[0 for col in range(50)] for row in range(50)]
but I guess from your code that you actually need this one:
a = [[35.74 + 0.6215*T - (35.75*V)**0.16 + (0.4275*T*V)**0.16
      for T in range(1,51)] for V in range(1,51)]
When you're initializing your list the way you do above
r = 1
c = 1
a=[[0]*c for i in range(r)]
because at that time c and r are both 1, your list looks like this -- [[0]] -- so you're going to get out of range errors when you try to update any indices in the list other than a[0][0]. Because in this code
for r in range(1,51):
    for c in range(1,51):
        a[c][r]=35.74 + 0.6215*T - (35.75*V)**0.16 + (0.4275*T*V)**0.16
you're going up to a[50][50] (range(1,51) stops at 50), so when you initialize the list in the first place you would need both r and c set to at least 51.
For that matter, in this code
for V in range(1,51,1):
    outfile.write("<tr>")
    for TV in range(1,51,1):
        outfile.write("<td>"+str(a[r][c])+"</td>\r\n")
you're just going to be printing the same value 2500 times, because you're never changing r and c in those loops.
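Putting both points together, one possible shape for the whole job is to build the table once and then walk its rows when producing the HTML, so no index can run out of bounds. This is a sketch under assumptions: the formula is copied unchanged from the question, the helper name windchill and the two-decimal formatting are illustrative choices, and the result is kept in a string instead of being written to a file.

```python
def windchill(T, V):
    # Same expression as in the question, kept unchanged
    return 35.74 + 0.6215*T - (35.75*V)**0.16 + (0.4275*T*V)**0.16

# table[v][t] holds the value for V = v + 1 and T = t + 1
table = [[windchill(T, V) for T in range(1, 51)] for V in range(1, 51)]

rows = []
for row in table:  # iterate over the rows themselves, no manual indices
    cells = "".join("<td>{:.2f}</td>".format(value) for value in row)
    rows.append("<tr>" + cells + "</tr>")
html_table = "<table>\n" + "\n".join(rows) + "\n</table>"
```

Iterating over the rows directly avoids both the off-by-one indexing and the problem of r and c never changing inside the output loops.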

When printing outputs an empty line appears before my outputs

I have attempted to write a program which asks the user for a string and a number (on the same line) and then prints all possible combinations of the string up to the size of the number. The output format should be: all capitals, one combination per line, ordered by length (shortest first) and then alphabetically.
My code outputs the right combinations in the right order, but it places an empty line before the outputs and I'm not sure why.
from itertools import combinations
allcombo = []
S = input().strip()
inputlist = S.split()
k = int(inputlist[1])
S = inputlist[0]
#
for L in range(0, k+1):
    allcombo = []
    for pos in combinations(S, L):
        pos = sorted(pos)
        pos = str(pos).translate({ord(c): None for c in "[]()', "})
        allcombo.append(pos)
    allcombo = sorted(allcombo)
    print(*allcombo, sep = '\n')
Input:
HACK 2
Output:
(Empty Line)
A
C
H
K
AC
AH
AK
CH
CK
HK
Also I've only been coding for about a week so if anyone would like to show me how to write this properly, I'd be very pleased.
Observe the line:
for L in range(0, k+1): # Notice that L is starting at 0.
Now, observe this line:
for pos in combinations(S, L):
So, we will have the following during the first iteration of the outer for loop:
for pos in combinations(S, 0): # This yields only a single empty tuple.
That empty tuple is turned into an empty string, so the only thing printed on the first pass is an empty line.
Change the following code:
for L in range(0, k+1):
to this:
for L in range(1, k+1): # Skips the empty combination since L starts at 1.
and this will fix your problem.
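Since the asker also wanted to see a tidier version, here is one possible sketch. The function name all_combos is made up, and it assumes the letters of the input are distinct; sorting the letters first means combinations() already produces each length group in alphabetical order, so no string clean-up with translate() is needed.

```python
from itertools import combinations

# combinations(S, 0) yields exactly one item: the empty tuple
zero_length = list(combinations("HACK", 0))

def all_combos(s, k):
    """Combinations of the letters of s, lengths 1..k, shortest first."""
    letters = sorted(s.upper())
    result = []
    for size in range(1, k + 1):
        # joining the tuple avoids converting it to str and stripping brackets
        result.extend("".join(c) for c in combinations(letters, size))
    return result

lines = all_combos("HACK", 2)
```

Printing the result with print(*lines, sep='\n') gives the expected output without the leading empty line.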

Motif search with Gibbs sampler

I am a beginner in both programming and bioinformatics. So, I would appreciate your understanding. I tried to develop a python script for motif search using Gibbs sampling as explained in Coursera class, "Finding Hidden Messages in DNA". The pseudocode provided in the course is:
GIBBSSAMPLER(Dna, k, t, N)
    randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna
    BestMotifs ← Motifs
    for j ← 1 to N
        i ← Random(t)
        Profile ← profile matrix constructed from all strings in Motifs except for Motifi
        Motifi ← Profile-randomly generated k-mer in the i-th sequence
        if Score(Motifs) < Score(BestMotifs)
            BestMotifs ← Motifs
    return BestMotifs
Problem description:
CODE CHALLENGE: Implement GIBBSSAMPLER.
Input: Integers k, t, and N, followed by a collection of strings Dna.
Output: The strings BestMotifs resulting from running GIBBSSAMPLER(Dna, k, t, N) with
20 random starts. Remember to use pseudocounts!
Sample Input:
8 5 100
CGCCCCTCTCGGGGGTGTTCAGTAACCGGCCA
GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG
TAGTACCGAGACCGAAAGAAGTATACAGGCGT
TAGATCAAGTTTCAGGTGCACGTCGGTGAACC
AATCCACCAGCTCCACGTGCAATGTTGGCCTA
Sample Output:
TCTCGGGG
CCAAGGTG
TACAGGCG
TTCAGGTG
TCCACGTG
I followed the pseudocode to the best of my knowledge. Here is my code:
def BuildProfileMatrix(dnamatrix):
    ProfileMatrix = [[1 for x in xrange(len(dnamatrix[0]))] for x in xrange(4)]
    indices = {'A':0, 'C':1, 'G': 2, 'T':3}
    for seq in dnamatrix:
        for i in xrange(len(dnamatrix[0])):
            ProfileMatrix[indices[seq[i]]][i] += 1
    ProbMatrix = [[float(x)/sum(zip(*ProfileMatrix)[0]) for x in y] for y in ProfileMatrix]
    return ProbMatrix
def ProfileRandomGenerator(profile, dna, k, i):
    indices = {'A':0, 'C':1, 'G': 2, 'T':3}
    score_list = []
    for x in xrange(len(dna[i]) - k + 1):
        probability = 1
        window = dna[i][x : k + x]
        for y in xrange(k):
            probability *= profile[indices[window[y]]][y]
        score_list.append(probability)
    rnd = uniform(0, sum(score_list))
    current = 0
    for z, bias in enumerate(score_list):
        current += bias
        if rnd <= current:
            return dna[i][z : k + z]
def score(motifs):
    ProfileMatrix = [[0 for x in xrange(len(motifs[0]))] for x in xrange(4)]
    indices = {'A':0, 'C':1, 'G': 2, 'T':3}
    for seq in motifs:
        for i in xrange(len(motifs[0])):
            ProfileMatrix[indices[seq[i]]][i] += 1
    score = len(motifs)*len(motifs[0]) - sum([max(x) for x in zip(*ProfileMatrix)])
    return score
from random import randint, uniform
def GibbsSampler(k, t, N):
    dna = ['CGCCCCTCTCGGGGGTGTTCAGTAACCGGCCA',
           'GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG',
           'TAGTACCGAGACCGAAAGAAGTATACAGGCGT',
           'TAGATCAAGTTTCAGGTGCACGTCGGTGAACC',
           'AATCCACCAGCTCCACGTGCAATGTTGGCCTA']
    Motifs = []
    for i in [randint(0, len(dna[0])-k) for x in range(len(dna))]:
        j = 0
        kmer = dna[j][i : k+i]
        j += 1
        Motifs.append(kmer)
    BestMotifs = []
    s_best = float('inf')
    for i in xrange(N):
        x = randint(0, t-1)
        Motifs.pop(x)
        profile = BuildProfileMatrix(Motifs)
        Motif = ProfileRandomGenerator(profile, dna, k, x)
        Motifs.append(Motif)
        s_motifs = score(Motifs)
        if s_motifs < s_best:
            s_best = s_motifs
            BestMotifs = Motifs
    return [s_best, BestMotifs]
k, t, N =8, 5, 100
best_motifs = [float('inf'), None]
# Repeat the Gibbs sampler search 20 times.
for repeat in xrange(20):
    current_motifs = GibbsSampler(k, t, N)
    if current_motifs[0] < best_motifs[0]:
        best_motifs = current_motifs
# Print and save the answer.
print '\n'.join(best_motifs[1])
Unfortunately, my code never gives the same output as the solved example. Besides, while trying to debug the code I found that I get weird scores that define the mismatches between motifs. However, when I tried to run the score function separately, it worked perfectly.
Each time I run the script, the output changes, but anyway here is an example of one of the outputs for the input present in the code:
Example output of my code
TATGTGTA
TATGTGTA
TATGTGTA
GGTGTTCA
TATACAGG
Could you please help me debug this code?!! I spent the whole day trying to find out what's wrong with it although I know it might be some silly mistake I made, but my eye failed to catch it.
Thank you all!!
Finally, I found out what was wrong in my code! It was in line 54:
Motifs.append(Motif)
After randomly removing one of the motifs, followed by building a profile out of these motifs then randomly selecting a new motif based on this profile, I should have added the selected motif in the same position before removal NOT appended to the end of the motif list.
Now, the correct code is:
Motifs.insert(x, Motif)
The new code worked as expected.
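To see why the position matters, here is a small stand-alone illustration of the two variants. The helper names and the toy motifs are made up for this sketch; the only point is that motifs[i] must keep belonging to sequence i.

```python
def resample_with_append(motifs, x, new_motif):
    """What the original code did: remove position x, add the new motif at the end."""
    out = list(motifs)
    out.pop(x)
    out.append(new_motif)
    return out

def resample_with_insert(motifs, x, new_motif):
    """The fix: put the new motif back at position x."""
    out = list(motifs)
    out.pop(x)
    out.insert(x, new_motif)
    return out

motifs = ["AAAA", "CCCC", "GGGG", "TTTT"]   # motifs[i] was picked from sequence i
after_append = resample_with_append(motifs, 1, "cccc")
after_insert = resample_with_insert(motifs, 1, "cccc")
```

After the append variant, the motif for sequence 1 sits at index 3 and every later motif has shifted down by one, so the next pop(x) removes a motif that belongs to a different sequence than the one being resampled. It may also be worth checking that BestMotifs = Motifs stores a reference to the same list and not a copy, so later changes to Motifs would show up in BestMotifs as well.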

How to check if two permutations are symmetric?

Given two permutations A and B of L different elements, L is even, let's call these permutations "symmetric" (for lack of a better term), if there exist n and m, m > n, such that (in python notation):
- A[n:m] == B[L-m:L-n]
- B[n:m] == A[L-m:L-n]
- all other elements are in place
Informally, consider
A = 0 1 2 3 4 5 6 7
Take any slice of it, for example 1 2. It starts at the second index and its length is 2. Now take a slice symmetric to it: it ends at the penultimate index and is 2 chars long too, so it's 5 6. Swapping these slices gives
B = 0 5 6 3 4 1 2 7
Now, A and B are "symmetric" in the above sense (n=1, m=3). On the other hand
A = 0 1 2 3 4 5 6 7
B = 1 0 2 3 4 5 7 6
are not "symmetric" (no n,m with above properties exist).
How can I write an algorithm in python that finds if two given permutations (=lists) are "symmetric" and if yes, find the n and m? For simplicity, let's consider only even L (because the odd case can be trivially reduced to the even one by eliminating the middle fixed element) and assume correct inputs (set(A)==set(B), len(set(A))==len(A)).
(I have no problem bruteforcing all possible symmetries, but looking for something smarter and faster than that).
Fun fact: the number of symmetric permutations for the given L is a Triangular number.
I use this code to test out your answers.
Bounty update: many excellent answers here. @Jared Goguen's solution appears to be the fastest.
Final timings:
testing 0123456789 L= 10
test_alexis ok in 15.4252s
test_evgeny_kluev_A ok in 30.3875s
test_evgeny_kluev_B ok in 27.1382s
test_evgeny_kluev_C ok in 14.8131s
test_ian ok in 26.8318s
test_jared_goguen ok in 10.0999s
test_jason_herbburn ok in 21.3870s
test_tom_karzes ok in 27.9769s
Here is the working solution for the question:
def isSymmetric(A, B):
    L = len(A) #assume equivalent to len(B), modifying this would be as simple as checking if len(A) != len(B), return []
    la = L//2 # half-list length
    Al = A[:la]
    Ar = A[la:]
    Bl = B[:la]
    Br = B[la:]
    for i in range(la):
        lai = la - i #just to reduce the number of computation we need to perform
        for j in range(1, lai + 1):
            k = lai - j #same here, reduce computation
            if Al[i] != Br[k] or Ar[k] != Bl[i]: #the key for efficient computation is here: do not proceed unnecessarily
                continue
            n = i #written only for the sake of clarity. i is n, and we can use i directly
            m = i + j
            if A[n:m] == B[L-m:L-n] and B[n:m] == A[L-m:L-n]: #possibly symmetric
                if A[0:n] == B[0:n] and A[m:L-m] == B[m:L-m] and A[L-n:] == B[L-n:]:
                    return [n, m]
    return []
As you have mentioned, the idea looks simple, but it is actually quite a tricky one. Once we see the patterns, however, the implementation is straightforward.
The central idea of the solution is this single line:
if Al[i] != Br[k] or Ar[k] != Bl[i]: #the key for efficient computation is here: do not proceed unnecessarily
All other lines are just either direct code translation from the problem statement or optimization made for more efficient computation.
There are few steps involved in order to find the solution:
Firstly, we need to split both list A and list B into two half-lists (called Al, Ar, Bl, and Br). Each half-list contains half of the members of the original list:
Al = A[:la]
Ar = A[la:]
Bl = B[:la]
Br = B[la:]
Secondly, to make the evaluation efficient, the goal here is to find what I would call pivot index to decide whether a position in the list (index) is worth evaluated or not to check if the lists are symmetric. This pivot index is the central idea to find an efficient solution. So I would try to elaborate it quite a bit:
Consider the left half part of the A list, suppose you have a member like this:
Al = [al1, al2, al3, al4, al5, al6]
We can imagine that there is a corresponding index list for the mentioned list like this
Al = [al1, al2, al3, al4, al5, al6]
iAl = [0, 1, 2, 3, 4, 5 ] #corresponding index list, added for explanation purpose
(Note: the reason why I mention of imagining a corresponding index list is for ease of explanation purposes)
Likewise, we can imagine that the other three lists may have similar index lists. Let's name them iAr, iBl, and iBr respectively and they are all having identical members with iAl.
It is the index of the lists which would really matter for us to look into - in order to solve the problem.
Here is what I mean: suppose we have two parameters:
index (let's give a variable name i to it, and I would use symbol ^ for current i)
length (let's give a variable name j to it, and I would use symbol == to visually represent its length value)
for each evaluation of the index element in iAl - then each evaluation would mean:
Given an index value i and length value of j in iAl, do
something to determine if it is worth to check for symmetric
qualifications starting from that index and with that length
(Hence the name pivot index come).
Now, let's take example of one evaluation when i = 0 and j = 1. The evaluation can be illustrated as follow:
iAl = [0, 1, 2, 3, 4, 5]
       ^ <-- now evaluate this index (i) = 0
       == <-- now this has length (j) of 1
For that index i and length j to be worth evaluating further, the counterpart iBr must hold the same item value over the same length, but at a different index (let's name it index k):
iBr = [0, 1, 2, 3, 4, 5]
                      ^ <-- must compare the value at this index to what is pointed to in iAl
                      == <-- must evaluate with the same length = 1
For example, for the above case, this is a possible "symmetric" permutation just for the two lists Al-Br (we will consider the other two lists Ar-Bl later):
Al = [0, x, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, x, 0]
At this moment, it is good to note that
It won't worth evaluating further if even the above condition is not
true
And this is where you get the algorithm to be more efficient; that is, by selectively evaluating only the few possible cases among all possible cases. And how to find the few possible cases?
By trying to find relationship between indexes and lengths of the
four lists. That is, for a given index i and length j in a
list (say Al), what must be the index k in the counterpart
list (in the case is Br). Length for the counterpart list need not
be found because it is the same as in the list (that is j).
Having know that, let's now proceed further to see if we can see more patterns in the evaluation process.
Consider now the effect of length (j). For example, if we are to evaluate from index 0, but the length is 2 then the counterpart list would need to have different index k evaluated than when the length is 1
iAl = [0, 1, 2, 3, 4, 5]
       ^ <-- now evaluate this index (i) = 0
       ===== <-- now this has length (j) of 2
iBr = [0, 1, 2, 3, 4, 5]
                   ^ <-- must compare the value in this index to what is pointed by iAl
                   ===== <-- must evaluate with the same length = 2
Or, for the illustration above, what really matters for i = 0 and j = 2 is something like this:
# when i = 0 and j = 2
Al = [0, y, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, 0, y] #y means to be checked later
Note that the above pattern is a bit different from when i = 0 and j = 1: the index position of the 0 value in the example is shifted:
# when i = 0 and j = 1, k = 5
Al = [0, x, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, x, 0]
# when i = 0 and j = 2, k = 4
Al = [0, y, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, 0, y] #y means to be checked later
Thus, the length shifts the index at which the counterpart list must be checked. In the first case, when i = 0 and j = 1, k = 5. But in the second case, when i = 0 and j = 2, k = 4. Thus we have found how, for a fixed index i (in this case 0), changing the length j changes the counterpart list index k.
Now, consider the effect of index i with fixed length j on the counterpart list index k. For example, let's fix the length as j = 4; then, starting from index i = 0, we have:
iAl = [0, 1, 2, 3, 4, 5]
       ^ <-- now evaluate this index (i) = 0
       ========== <-- now this has length (j) of 4
iAl = [0, 1, 2, 3, 4, 5]
          ^ <-- now evaluate this index (i) = 1
          ========== <-- now this has length (j) of 4
iAl = [0, 1, 2, 3, 4, 5]
             ^ <-- now evaluate this index (i) = 2
             ========== <-- now this has length (j) of 4
#And no more needed
In the above example, it can be seen that we need to evaluate 3 possibilities for the given i and j, but if the index i is changed to 1 with the same length j = 4:
iAl = [0, 1, 2, 3, 4, 5]
          ^ <-- now evaluate this index (i) = 1
          ========== <-- now this has length (j) of 4
iAl = [0, 1, 2, 3, 4, 5]
             ^ <-- now evaluate this index (i) = 2
             ========== <-- now this has length (j) of 4
Note that we only need to evaluate 2 possibilities. Thus the increase of index i decreases the number of possible cases to be evaluated!
With all the above patterns found, we almost found all the basis we need to make the algorithm works. But to complete that, we need to find the relationship between indexes which appear in Al-Br pair for a given [i, j] => [k, j] with the indexes in Ar-Bl pair for the same [i, j].
Now, we can actually see that they are simply mirroring the relationship we found in Al-Br pair!
(IMHO, this is really beautiful! and thus I think term "symmetric" permutation is not far from truth)
For example, if we have the following Al-Br pair evaluated with i = 0 and j = 2
Al = [0, y, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, 0, y] #y means to be checked later
Then, to make it symmetric, we must have the corresponding Ar-Bl:
Ar = [x, x, x, x, 3, y] #x means don't care for now
Bl = [3, y, x, x, x, x] #y means to be checked later
The indexing of Al-Br pair is mirroring (or, is symmetric to) the indexing of Ar-Bl pair!
Therefore, combining all the pattern we found above, we now could find the pivot indexes for evaluating Al, Ar, Bl, and Br.
We only need to check the values of the lists in the pivot index
first. If the values of the lists in the pivot indexes of Al, Ar, Bl, and Br
matches in the evaluation then and only then we need to check
for symmetric criteria (thus making the computation efficient!)
Putting up all the knowledge above into code, the following is the resulting for-loop Python code to check for symmetricity:
for i in range(len(Al)): #for every index in the list
    lai = la - i #just simplification
    for j in range(1, lai + 1): #get the length from 1 to la - i + 1
        k = lai - j #get the mirror index
        if Al[i] != Br[k] or Ar[k] != Bl[i]: #if the value in the pivot indexes do not match
            continue #skip, no need to evaluate
        #at this point onwards, then the values in the pivot indexes match
        n = i #assign n
        m = i + j #assign m
        #test if the first two conditions for symmetric are passed
        if A[n:m] == B[L-m:L-n] and B[n:m] == A[L-m:L-n]: #possibly symmetric
            #if it passes, test the third condition for symmetric, the rests of the elements must stay in its place
            if A[0:n] == B[0:n] and A[m:L-m] == B[m:L-m] and A[L-n:] == B[L-n:]:
                return [n, m] #if all three conditions are passed, symmetric lists are found! return [n, m] immediately!
        #passing this but not outside of the loop means
        #any of the 3 conditions to find symmetry are failed
        #though values in the pivot indexes match, simply continue
return [] #nothing can be found - asymmetric lists
And there you go with the symmetric test!
(OK, this is quite a challenge and it takes quite a while for me to figure out how.)
I rewrote the code without some of the complexity (and errors).
def test_o_o(a, b):
    L = len(a)
    H = L//2
    n, m = 0, H-1
    # find the first difference in the left-side
    while n < H:
        if a[n] != b[n]: break
        n += 1
    else: return
    # find the last difference in the left-side
    while m > -1:
        if a[m] != b[m]: break
        m -= 1
    else: return
    # for slicing, we want end_index+1
    m += 1
    # compare each slice for equality
    # order: beginning, block 1, block 2, middle, end
    if (a[0:n] == b[0:n] and \
        a[n:m] == b[L-m:L-n] and \
        b[n:m] == a[L-m:L-n] and \
        a[m:L-m] == b[m:L-m] and \
        a[L-n:L] == b[L-n:L]):
        return n, m
The implementation is both elegant and efficient.
The break into else: return structures ensure that the function returns at the soonest possible point. They also validate that n and m have been set to valid values, but this does not appear to be necessary when explicitly checking the slices. These lines can be removed with no noticeable impact on the timing.
Explicitly comparing the slices will also short-circuit as soon as one evaluates to False.
Originally, I checked whether a permutation existed by transforming b into a:
b = b[:]
b[n:m], b[L-m:L-n] = b[L-m:L-n], b[n:m]
if a == b:
    return n, m
But this is slower than explicitly comparing the slices. Let me know if the algorithm doesn't speak for itself and I can offer further explanation (maybe even proof) as to why it works and is minimal.
I tried to implement 3 different algorithms for this task. All of them have O(N) time complexity and require O(1) additional space. Interesting fact: all other answers (known so far) implement 2 of these algorithms (though they not always keep optimal asymptotic time/space complexity). Here is high-level description for each algorithm:
Algorithm A
Compare the lists, group "non-equal" intervals, make sure there are exactly two such intervals (with special case when intervals meet in the middle).
Check if "non-equal" intervals are positioned symmetrically, and their contents is also "symmetrical".
Algorithm B
Compare first halves of the lists to guess where are "intervals to be exchanged".
Check if contents of these intervals is "symmetrical". And make sure the lists are equal outside of these intervals.
Algorithm C
Compare first halves of the lists to find first mismatched element.
Find this mismatched element of first list in second one. This hints the position of "intervals to be exchanged".
Check if contents of these intervals is "symmetrical". And make sure the lists are equal outside of these intervals.
There are two alternative implementations for step 1 of each algorithm: (1) using itertools, and (2) using plain loops (or list comprehensions). itertools are efficient for long lists but relatively slow on short lists.
Here is algorithm C with first step implemented using itertools. It looks simpler than other two algorithms (at the end of this post). And it is pretty fast, even for short lists:
import itertools as it
import operator as op
def test_C(a, b):
    length = len(a)
    half = length // 2
    # compare half-lists (it.imap is Python 2; in Python 3 use the built-in map)
    mismatches = it.imap(op.ne, a, b[:half])
    try:
        n = next(it.compress(it.count(), mismatches))
        nr = length - n
        mr = a.index(b[n], half, nr)
        m = length - mr
    except StopIteration: return None
    except ValueError: return None
    if a[n:m] == b[mr:nr] and b[n:m] == a[mr:nr] \
            and a[m:mr] == b[m:mr] and a[nr:] == b[nr:]:
        return (n, m)
This could be done using mostly itertools:
def test_A(a, b):
    equals = it.imap(op.eq, a, b) # compare lists
    e1, e2 = it.tee(equals)
    l = it.chain(e1, [True])
    r = it.chain([True], e2)
    borders = it.imap(op.ne, l, r) # delimit equal/non-equal intervals
    ranges = list(it.islice(it.compress(it.count(), borders), 5))
    if len(ranges) == 4:
        n1, m1 = ranges[0], ranges[1]
        n2, m2 = ranges[2], ranges[3]
    elif len(ranges) == 2:
        n1, m1 = ranges[0], len(a) // 2
        n2, m2 = len(a) // 2, ranges[1]
    else:
        return None
    if n1 == len(a) - m2 and m1 == len(a) - n2 \
            and a[n1:m1] == b[n2:m2] and b[n1:m1] == a[n2:m2]:
        return (n1, m1)
A high-level description of this algorithm is already provided in the OP comments by @j_random_hacker. Here are some details:
Start with comparing the lists:
A 0 1 2 3 4 5 6 7
B 0 5 6 3 4 1 2 7
= E N N E E N N E
Then find borders between equal/non-equal intervals:
= E N N E E N N E
B _ * _ * _ * _ *
Then determine ranges for non-equal elements:
B _ * _ * _ * _ *
[1 : 3] [5 : 7]
Then check that there are exactly 2 ranges (with a special case when both ranges meet in the middle), that the ranges themselves are symmetrical, and that their contents are too.
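As a rough illustration, the trace above can be reproduced step by step on the same example lists. This is only a sketch for inspection: it builds the intermediate lists eagerly so they can be printed, and the names left and right are chosen here for readability (they play the role of l and r in test_A).

```python
import itertools as it
import operator as op

A = [0, 1, 2, 3, 4, 5, 6, 7]
B = [0, 5, 6, 3, 4, 1, 2, 7]

# Step 1: element-wise comparison (E = True, N = False in the diagram).
equals = list(map(op.eq, A, B))

# Step 2: a border is any position where the "equal" flag changes.
# Padding with True on each side makes the lists count as equal
# before the first and after the last element.
left = equals + [True]
right = [True] + equals
borders = list(map(op.ne, left, right))

# Step 3: the indices of the borders delimit the non-equal ranges.
ranges = list(it.compress(it.count(), borders))

print(equals)
print(ranges)  # [1, 3, 5, 7], i.e. the ranges [1:3] and [5:7]
```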
Another alternative is to use itertools to process only half of each list. This allows a slightly simpler (and slightly faster) algorithm because there is no need to handle a special case:
def test_B(a, b):
    # map is lazy on Python 3; on Python 2 use it.imap instead
    equals = map(op.eq, a, b[:len(a) // 2])  # compare half-lists
    e1, e2 = it.tee(equals)
    l = it.chain(e1, [True])
    r = it.chain([True], e2)
    borders = map(op.ne, l, r)  # delimit equal/non-equal intervals
    ranges = list(it.islice(it.compress(it.count(), borders), 2))
    if len(ranges) != 2:
        return None
    n, m = ranges[0], ranges[1]
    nr, mr = len(a) - n, len(a) - m
    if a[n:m] == b[mr:nr] and b[n:m] == a[mr:nr] \
            and a[m:mr] == b[m:mr] and a[nr:] == b[nr:]:
        return (n, m)
This does the right thing:
# A and B are the two lists and L = len(A), as in the question
Br = B[L//2:] + B[:L//2]  # B rotated by half its length
same_full = [a == b for (a, b) in zip(A, Br)]
same_part = [a + b for (a, b) in zip(same_full[L//2:], same_full[:L//2])]
for n, vn in enumerate(same_part):
    if vn != 2:
        continue
    m = n
    for vm in same_part[n+1:]:
        if vm != 2:
            break
        m += 1
    if m > n:
        print("n=", n, "m=", m+1)
I'm pretty sure you could do the counting a bit better, but... meh
I believe the following pseudocode should work:
Find the first element i for which A[i] != B[i], set n = i. If no such element, return success. If n >= L/2, return fail.
Find the first element i > n for which A[i] == B[i], set m = i. If no such element or m > L/2, set m = L/2.
Check that A[0:n] == B[0:n], A[n:m] == B[L-m:L-n], B[n:m] == A[L-m:L-n], A[m:L-m] == B[m:L-m] and A[L-n:L] == B[L-n:L]. If all are true, return success. Else, return fail.
Complexity is O(n) which should be the lowest possible as one always needs to compare all elements in the lists.
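The pseudocode above might translate to Python roughly as follows. This is a sketch, not a tested reference: the function name find_swap is made up here, "success" is reported as the pair (n, m) and "fail" as None, identical lists are reported as the empty swap (0, 0), and L/2 is taken as L // 2 in step 2. All of these are assumptions beyond what the pseudocode states.

```python
def find_swap(a, b):
    """Return (n, m) if b is a with a[n:m] and a[L-m:L-n] exchanged, else None."""
    L = len(a)
    if len(b) != L:
        return None

    # Step 1: first index where the lists differ.
    n = next((i for i in range(L) if a[i] != b[i]), None)
    if n is None:
        return (0, 0)  # assumption: identical lists count as an empty swap
    if n >= L / 2:
        return None

    # Step 2: first index after n where the lists agree again.
    m = next((i for i in range(n + 1, L) if a[i] == b[i]), None)
    if m is None or m > L // 2:
        m = L // 2

    # Step 3: verify all five segments.
    ok = (a[:n] == b[:n]
          and a[n:m] == b[L - m:L - n]
          and b[n:m] == a[L - m:L - n]
          and a[m:L - m] == b[m:L - m]
          and a[L - n:] == b[L - n:])
    return (n, m) if ok else None

print(find_swap([0, 1, 2, 3, 4, 5, 6, 7], [0, 5, 6, 3, 4, 1, 2, 7]))  # (1, 3)
```

Each step is a single pass or a slice comparison, so the sketch stays linear in the list length.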
I build a map of where the characters are in list B, then use that to determine the implied subranges in list A. Once I have the subranges, I can sanity check some of the info, and compare the slices.
If A[i] == x, then where does x appear in B? Call that position p.
I know i, the start of the left subrange.
I know L (= len(A)), so I know L-i, the end of the right subrange.
If I know p, then I know the implied start of the right subrange, assuming that B[p] and A[i] are the start of a symmetric pair of ranges. Thus, the OP's L - m would be p if the lists were symmetric.
Setting L-m == p gives me m, so I have all four end points.
Sanity tests are:
n and m are in left half of list(s)
n <= m (note: OP did not prohibit n == m)
L-n is in right half of list (computed)
L-m is in right half (this is a good check for quick fail)
If all those check out, compare A[left] == B[right] and B[left] == A[right]. Return left if true.
def find_symmetry(a:list, b:list) -> slice or None:
    assert len(a) == len(b)
    assert set(a) == set(b)
    assert len(set(a)) == len(a)
    length = len(a)
    assert length % 2 == 0
    half = length // 2
    b_loc = {bi:n for n,bi in enumerate(b)}
    for n,ai in enumerate(a[:half]):
        L_n = length - 1 - n     # L - n
        L_m = b_loc[ai]          # L - m (speculative)
        if L_m < half:           # Sanity: bail if on wrong side
            continue
        m = b_loc[a[L_n]]        # If A[n] starts range, A[m] ends it.
        if m < n or m > half:    # Sanity: bail if backwards or wrong side
            continue
        left = slice(n, m+1)
        right = slice(L_m, L_n+1)
        if a[left] == b[right] and \
                b[left] == a[right]:
            return left
    return None

res = find_symmetry(
    [ 10, 11, 12, 13, 14, 15, 16, 17, ],
    [ 10, 15, 16, 13, 14, 11, 12, 17, ])
assert res == slice(1,3)
res = find_symmetry(
    [ 0, 1, 2, 3, 4, 5, 6, 7, ],
    [ 1, 0, 2, 3, 4, 5, 7, 6, ])
assert res is None
res = find_symmetry("abcdefghijklmn", "nbcdefghijklma")
assert res == slice(0,1)
res = find_symmetry("abcdefghijklmn", "abjklfghicdmen")
assert res == slice(3,4)
res = find_symmetry("abcdefghijklmn", "ancjkfghidelmb")
assert res == slice(3,5)
res = find_symmetry("abcdefghijklmn", "bcdefgaijklmnh")
assert res is None
res = find_symmetry("012345", "013245")
assert res == slice(2,3)
Here's an O(N) solution which passes the test code:
def sym_check(a, b):
    cnt = len(a)
    ml = [a[i] == b[i] for i in range(cnt)]
    sl = [i for i in range(cnt) if (i == 0 or ml[i-1]) and not ml[i]]
    el = [i+1 for i in range(cnt) if not ml[i] and (i == cnt-1 or ml[i+1])]
    assert(len(sl) == len(el))
    range_cnt = len(sl)
    if range_cnt == 1:
        start1 = sl[0]
        end2 = el[0]
        if (end2 - start1) % 2 != 0:
            return None
        end1 = (start1 + end2) // 2
        start2 = end1
    elif range_cnt == 2:
        start1, start2 = sl
        end1, end2 = el
    else:
        return None
    if end1 - start1 != end2 - start2:
        return None
    if start1 != cnt - end2:
        return None
    if a[start1:end1] != b[start2:end2]:
        return None
    if b[start1:end1] != a[start2:end2]:
        return None
    return start1, end1
I only tested it with Python 2, but I believe it will also work with Python 3.
It identifies the ranges where the two lists differ. It looks for two such ranges (if there is only one such range, it tries to divide it in half). It then checks that both ranges are the same length and in the proper positions relative to each other. If so, then it checks that the elements in the ranges match.
Yet another version:
def compare(a, b):
    i_zip = list(enumerate(zip(a, b)))
    llen = len(a)
    hp = llen // 2
    def find_index(i_zip):
        for i, (x, y) in i_zip:
            if x != y:
                return i
        return i_zip[0][0]
    # n and m are determined by the unmoved items:
    n = find_index(i_zip[:hp])
    p = find_index(i_zip[hp:])
    m = llen - p
    q = llen - n
    # Symmetric?
    if a[:n] + a[p:q] + a[m:p] + a[n:m] + a[q:] != b:
        return None
    return n, m
This solution is based on:
All validly permuted list pairs A, B adhering to the symmetry requirement will have the structure:
A = P1  +  P2  +  P3  +  P4  +  P5
B = P1  +  P4  +  P3  +  P2  +  P5
           ^n     ^m ^hp ^p     ^q   <- indexes
where len(P1) == len(P5) and len(P2) == len(P4)
Therefore the 3 last lines of the above function will determine the correct solution provided the indexes n, m are correctly determined. (p & q are just mirror indexes of m & n)
Finding n is a matter of determining when items of A and B start to diverge. Next the same method is applied to finding p starting from midpoint hp. m is just mirror index of p. All involved indexes are found and the solution emerges.
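The P1..P5 structure also gives a direct way to build B from A for known n and m, which can be handy for generating test pairs. A minimal sketch, assuming 0 <= n <= m <= len(a) - m; the helper name swap_blocks is hypothetical and not part of the answer above.

```python
def swap_blocks(a, n, m):
    """Return a copy of a with a[n:m] and a[L-m:L-n] exchanged."""
    L = len(a)
    if not 0 <= n <= m <= L - m:
        raise ValueError("need 0 <= n <= m <= len(a) - m")
    p, q = L - m, L - n  # mirror indexes of m and n
    #      P1     P4       P3       P2       P5
    return a[:n] + a[p:q] + a[m:p] + a[n:m] + a[q:]

print(swap_blocks([0, 1, 2, 3, 4, 5, 6, 7], 1, 3))  # [0, 5, 6, 3, 4, 1, 2, 7]
print(swap_blocks("01234567", 0, 4))                # 45670123
```

Applying the helper twice with the same n and m gives back the original sequence, since the two blocks simply trade places again.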
Make a list (ds) of indices where the first halves of the two lists differ.
A possible n is the first such index, the last such index is m - 1.
Check if valid symmetry. len(ds) == m - n makes sure there aren't any gaps.
import itertools as it
import operator as op
def test(a, b):
    sz = len(a)
    ds = list(it.compress(it.count(), map(op.ne, a[:sz//2], b[:sz//2])))
    if not ds:  # first halves are identical: no candidate interval
        return None
    n, m = ds[0], ds[-1] + 1
    if a[n:m] == b[sz-m:sz-n] and b[n:m] == a[sz-m:sz-n] and len(ds) == m - n:
        return n, m
    else:
        return None
Here's a simple solution that passes my tests, and yours:
Compare the inputs, looking for a subsequence that does not match.
Transform A by transposing the mismatched subsequence according to the rules. Does the result match B?
The algorithm is O(N); there are no embedded loops, explicit or implicit.
In step 1, I need to detect the case where the swapped substrings are adjacent. This can only happen in the middle of the string, but I found it easier to just look out for the first element of the moved piece (firstval). Step 2 is simpler (and hence less error-prone) than explicitly checking all the constraints.
def compare(A, B):
    same = True
    n = m = 0  # defaults (added): avoids a NameError when no mismatch or no end is found
    for i, (a, b) in enumerate(zip(A, B)):
        if same and a != b:  # Found the start of a presumed transposition
            same = False
            n = i
            firstval = a  # First element of the transposed piece
        elif (not same) and (a == b or b == firstval):  # end of the transposition
            m = i
            break
    # Construct the transposed string, compare it to B
    origin = A[n:m]
    if n == 0:  # swap begins at the edge
        dest = A[-m:]
        B_expect = dest + A[m:-m] + origin
    else:
        dest = A[-m:-n]
        B_expect = A[:n] + dest + A[m:-m] + origin + A[-n:]
    return bool(B_expect == B)
Sample use:
>>> compare("01234567", "45670123")
True
Bonus: I believe the name for this relationship would be "symmetric block transposition". A block transposition swaps two subsequences, taking ABCDE to ADCBE. (See definition 4 here; I actually found this by googling "ADCBE"). I've added "symmetric" to the name to describe the length conditions.
