When given a letter sequence, find the closest from a given list - python

I am writing a cable mapping script that needs to find the closest switch. So if a patch panel is in Rack 08AB, and there are switches in Racks 08AC and 08AD, I need it to pick the closest (08AC). The problem is the rack numbering sequence. The first two digits will always be the same (08) and the letters increment, but they increment A-Z and then AA-AZ. So if the panel is in Rack 08Z, 08AA is closer than 08X.
I have this working by converting the letters into numbers and seeing which is closest, but it seems clunky and I'm wondering if there's a better way:
from functools import reduce  # reduce is not a builtin in Python 3
###CLOSEST SWITCH IS IN 08AA
full_panel_location = '08Z'
full_switches_location = ['08X', '08AA']
switches_checksum = []
###REMOVE DIGITS AND CONVERT LETTERS TO CHECKSUM
panel_letters = ''.join([i for i in full_panel_location if not i.isdigit()])
panel_checksum = int(reduce(lambda x,y:x+y, map(ord, panel_letters)))
if panel_checksum > 100:
    panel_checksum -= 39
for switch in full_switches_location:
    switch_letters = ''.join([i for i in switch if not i.isdigit()])
    switch_checksum = int(reduce(lambda x,y:x+y, map(ord, switch_letters)))
    if switch_checksum > 100:
        switch_checksum -= 39
    switches_checksum.append(switch_checksum)
###FIND CLOSEST CHECKSUM/INDEX/SWITCH
closest_switch_checksum = min(switches_checksum, key=lambda x: abs(x - panel_checksum))
closest_switch_index = switches_checksum.index(closest_switch_checksum)
closest_switch = full_switches_location[closest_switch_index]
This provides the closest switch being 08AA, which is what I want. My question is, is there a better way to be doing this?

This is basically a base-26 conversion problem, with A=1 through Z=26 so that AA comes out as 27. Iterate through the letters of a location in reverse order; for each letter, take its offset from 'A' plus one, multiply it by 26 raised to the letter's position from the right, and add it to the checksum. The absolute difference between a switch's checksum and the panel's checksum then serves as the key for the min function:
def checksum(x):
    s = 0
    for i, c in enumerate(reversed(x)):
        s += (ord(c) - ord('A') + 1) * 26 ** i
    return s
full_panel_location = '08Z'
full_switches_location = ['08X', '08AA']
panel_checksum = checksum(full_panel_location[2:])
print(min(full_switches_location, key=lambda i: abs(checksum(i[2:]) - panel_checksum)))
This outputs:
08AA
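As a quick check of the base-26 idea, the sketch below prints the positions of a few rack suffixes (A=1, Z=26, AA=27) and then picks the nearest switch for both examples from the question. The names rack_index and closest_switch are hypothetical, and stripping the digits with a regular expression (instead of the fixed [2:] slice) is an assumption made for this illustration.

```python
import re

def rack_index(location):
    """Convert the letter suffix of a rack label to its position: A=1, Z=26, AA=27."""
    letters = re.sub(r"\d", "", location)  # assumption: digits only form a prefix to drop
    index = 0
    for c in letters:
        index = index * 26 + (ord(c) - ord("A") + 1)
    return index

def closest_switch(panel, switches):
    """Return the switch whose rack position is nearest to the panel's."""
    target = rack_index(panel)
    return min(switches, key=lambda s: abs(rack_index(s) - target))

print(rack_index("08X"), rack_index("08Z"), rack_index("08AA"))  # 24 26 27
print(closest_switch("08Z", ["08X", "08AA"]))  # 08AA
print(closest_switch("08AB", ["08AC", "08AD"]))  # 08AC
```

Note that min returns the first candidate on a tie, so the order of the switch list decides between two equally distant racks.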

Related

My Function To Count The Largest Binary Gap Doesn't Work But I Can't Figure Out Why

I'm working through the Codility problems and I have gotten the first one almost correct. The task is to write a function which returns the longest binary gap (a sequence of 0s in between 1s) in a binary number. I have gotten all of the test numbers correct apart from 9, which should be 2 (its binary representation is 1001) but which my function returns as 0. I can't seem to figure out why.
My function is as follows:
def Solution(N):
    x = bin(N)[2:]
    x_string = str(x)
    y = (len(x_string))
    count = 0
    max = 0
    for index, item in enumerate(x_string):
        if item == "1":
            count = 0
        elif item == "0" and x_string[index + 1:y-1] != "0"*(y -1 - (index + 1)):
            count = count + 1
        if count > max:
            max = count
    print(max)
The complicated indexing and second condition in the elif statement is so that when a 0 is not contained between two 1s then it isn't recognised as the beginning of a binary gap e.g. when the for loop looks at the second character in bin(16) = 10000, it doesn't set count to 1 because all of the remaining characters in the string are also zero.

Simple solution
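x_string[index + 1:y-1] != "0"*(y -1 - (index + 1))
This slice is meant to look at the whole rest of the string, but the end index of a slice is exclusive, not inclusive, so y-1 stops one character short. For a string of length 4, string[0:4] is the whole string. Slicing with x_string[index + 1:y] and comparing against "0"*(y - (index + 1)) gives 2 for 9.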
source: https://docs.python.org/3/tutorial/introduction.html
-Sam
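For comparison, here is a minimal sketch of a different approach that avoids the index arithmetic altogether: strip the trailing zeros (they are not closed by a 1) and split on "1". The name binary_gap is hypothetical, and the sketch assumes a non-negative integer input.

```python
def binary_gap(n):
    """Length of the longest run of 0s enclosed by 1s in the binary form of n."""
    bits = bin(n)[2:]  # assumption: n >= 0, so there is no '-' sign to handle
    # Trailing zeros are not followed by a 1, so they cannot form a gap.
    runs = bits.rstrip("0").split("1")
    return max(len(run) for run in runs)

print(binary_gap(9))   # 2  (1001)
print(binary_gap(16))  # 0  (10000)
```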

How can I convert an IntVector to an Int in z3py

I am using z3py and I have an IntVector of size 3. I need to parse each digit in the IntVector as one whole number. Meaning, if I have an IntVector which has constraints like this:
iv = IntVector('iv', 3)
s = Solver()
s.add(iv[0] == 5)
s.add(iv[1] == 2)
s.add(iv[2] == 6)
….
I need to be able to operate on the number 526 as an Int sort in z3 because I need to both add constraints that apply to each individual member of the IntVector (digit) AND constraints which apply to the whole number, in this case 526. I cannot do something like:
s.add(iv[0] / iv == 55)
because those are 2 separate types. iv[0] is an Int while iv is an IntVector

Here is an example that uses the concept of IntVector as separate digits and as numbers formed by these digits.
It solves the traditional riddle to replace each of the letters of "SEND + MORE = MONEY" to a different digit.
from z3 import *
# trying to find different digits for each letter for SEND+MORE=MONEY
letters = 'SENDMOREMONEY'
iv = IntVector('iv', len(letters))
send = Int('send')
more = Int('more')
money = Int('money')
s = Solver()
# all letters to be replaced by digits 0..9
s.add([And(i >= 0, i <= 9) for i in iv])
# the first digit of a number can not be 0
s.add(And(iv[0] > 0, iv[4] > 0, iv[8] > 0))
# distinct letters need distinct digits
s.add(Distinct([i for k, i in enumerate(iv) if letters[k] not in letters[:k]]))
# "send" is the number formed by the first 4 digits, "more" the 4 next, "money" the last
s.add(send == Sum([10**(3-k)*i for k,i in enumerate(iv[:4])]))
s.add(more == Sum([10**(3-k)*i for k,i in enumerate(iv[4:8])]))
s.add(money == Sum([10**(4-k)*i for k,i in enumerate(iv[8:])]))
# the numbers for "send" and "more" sum together to "money"
s.add(send + more == money)
if s.check() == sat:
    m = s.model()
    # list each digit of iv
    print([m[i].as_long() for i in iv])
    # show the sum as "send" + "more" = "money"
    print("{} + {} = {}".format(m[send].as_long(), m[more].as_long(), m[money].as_long()))
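The positional-weight sum used for send, more and money above can be checked on plain Python integers without a solver. The helper name digits_to_number is hypothetical, and this sketch only illustrates the arithmetic; in the z3 script the same shape of expression is built from Int terms inside Sum([...]).

```python
def digits_to_number(digits):
    """Combine a list of decimal digits (most significant first) into one number."""
    n = len(digits)
    return sum(10 ** (n - 1 - k) * d for k, d in enumerate(digits))

print(digits_to_number([5, 2, 6]))  # 526
# The well-known solution of the riddle: 9567 + 1085 = 10652
print(digits_to_number([9, 5, 6, 7]) + digits_to_number([1, 0, 8, 5]))  # 10652
```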

What is a faster version of this code instead of the double for-loop (python)?

When I hand this code in on a site (from my university) that grades it, it runs too long for the site's limits.
Here is the code:
def pangram(String):
    import string
    alfabet = list(string.ascii_lowercase)
    interpunctie = string.punctuation + "’" + "123456789"
    String = String.lower()
    string_1 = ""
    for char in String:
        if not char in interpunctie:
            string_1 += char
    string_1 = string_1.replace(" ", "")
    List = list(string_1)
    List.sort()
    list_2 = []
    for index, char in enumerate(List):
        if not List[index] == 0:
            if not (char == List[index - 1]):
                list_2.append(char)
    return list_2 == alfabet
def venster(tekst):
    pangram_gevonden = False
    if pangram(tekst) == False:
        return None
    for lengte in range(26, len(tekst)):
        if pangram_gevonden == True:
            break
        for n in range(0, len(tekst) - lengte):
            if pangram(tekst[n:lengte+n]):
                kortste_pangram = tekst[n:lengte+n]
                pangram_gevonden = True
                break
    return kortste_pangram
So the first function (pangram) is fine: it determines whether or not a given string is a pangram, i.e. whether it contains every letter of the alphabet at least once.
The second function checks whether the string (usually a longer text) is a pangram, and if it is, returns the shortest possible pangram within that text (even if that's not correct English). If there are two pangrams of the same length, the leftmost one is returned.
For this second function I used a double for loop: the first one determines the length of the substring being checked (from 26 up to len(string)), and the second one uses this length to go through the string at each possible starting point to check whether that substring is a pangram. Once the shortest (and leftmost) pangram is found, it breaks out of both for loops.
However, this (apparently) still takes too long, so I wonder if anyone knows a faster way of tackling this second function. It doesn't necessarily have to use a for loop.
Thanks in advance
Lucas

Create a map {letter: int} and an activecount counter.
Make two indexes, left and right, and set both to 0.
Move the right index.
If l=s[right] is a letter, increment the value for map key l.
If the value goes from zero to one, increment activecount.
Continue until activecount reaches 26.
Now move the left index.
If l=s[left] is a letter, decrement the value for map key l.
If the value becomes zero, decrement activecount and stop.
Start moving the right index again, and so on.
The minimal difference between left and right while activecount==26 corresponds to the shortest pangram.
The algorithm is linear.
Example code for a string containing only lowercase letters from the alphabet 'abcd'. It returns the length of the shortest substring that contains all letters of 'abcd'. It does not check for valid chars and is not thoroughly tested.
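import string
def findpangram(s):
    alfabet = list(string.ascii_lowercase)
    counts = dict(zip(alfabet, [0]*len(alfabet)))  # letter -> occurrences inside the window
    left = 0
    right = 0
    ac = 0  # number of distinct letters inside the window
    minlen = 100000
    while left < len(s):
        # grow the window to the right until it holds all 4 letters of 'abcd'
        while right < len(s):
            l = s[right]
            c = counts[l]
            counts[l] = c + 1
            right += 1
            if c == 0:
                ac += 1
                if ac == 4:
                    break
        if ac < 4:
            break
        if right - left < minlen:
            minlen = right - left
        # shrink from the left until one letter drops out of the window
        while left < right:
            l = s[left]
            c = counts[l]
            counts[l] = c - 1
            left += 1
            if c == 1:
                ac -= 1
                break
        # s[left - 1:right] was the last window that still held all letters
        if right - left + 1 < minlen:
            minlen = right - left + 1
    return minlen
print(findpangram("acacdbcca"))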
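The same two-index idea can be sketched for the full 26-letter case, returning the substring itself rather than its length. The name shortest_pangram_window is hypothetical; the sketch assumes only a-z (case-insensitive) count as letters and treats every other character as filler that may sit inside the window.

```python
import string
from collections import Counter

LETTERS = set(string.ascii_lowercase)

def shortest_pangram_window(text):
    """Return the shortest (leftmost on ties) substring of text that contains
    every letter a-z at least once, or None if no such substring exists."""
    counts = Counter()   # letter -> occurrences inside the current window
    distinct = 0         # how many different letters the window holds
    best = None          # (start, end) of the best window so far, end exclusive
    left = 0
    for right, raw in enumerate(text):
        ch = raw.lower()
        if ch in LETTERS:
            counts[ch] += 1
            if counts[ch] == 1:
                distinct += 1
        # While the window is a pangram, record it and shrink from the left.
        while distinct == 26:
            if best is None or right + 1 - left < best[1] - best[0]:
                best = (left, right + 1)
            out = text[left].lower()
            if out in LETTERS:
                counts[out] -= 1
                if counts[out] == 0:
                    distinct -= 1
            left += 1
    return None if best is None else text[best[0]:best[1]]

print(shortest_pangram_window("xx the quick brown fox jumps over the lazy dog xx"))
# quick brown fox jumps over the lazy dog
```

Each index only moves forward, so the scan visits every character at most twice. The strict comparison keeps the first window found when two windows have the same length, which is the leftmost one.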

Next lexicographically bigger string permutation and solution efficiency

I'm trying to solve a Hackerrank question: find the next lexicographically bigger string permutation for a given input string.
Here is my solution:
import string
def biggerIsGreater(w):
    if len(w)<=1: return w
    # pair letters in w string with int representing positional index in alphabet
    letter_mapping = dict(zip(string.ascii_lowercase, range(1, len(string.ascii_lowercase)+1)))
    char_ints = [letter_mapping[letter] for letter in w.lower() if letter in letter_mapping]
    # reverse it
    reversed_char_ints = char_ints[::-1]
    # get char set to reorder, including pivot.
    scanned_char_ints = []
    index = 0
    zipped = list(zip(reversed_char_ints, reversed_char_ints[1:]))
    while index < len(zipped):
        char_tuple = zipped[index]
        scanned_char_ints.append(char_tuple[0])
        if char_tuple[0] <= char_tuple[1]:
            if index == len(zipped) - 1:
                return "no answer"
        else:
            scanned_char_ints.append(char_tuple[1])
            break
        index += 1
    # get smallest among bigger values of pivot
    char_to_switch = None
    char_to_switch_index = None
    for item in scanned_char_ints[:-1]:
        if item > scanned_char_ints[-1]:
            if char_to_switch == None or item <= char_to_switch:
                char_to_switch = item
                char_to_switch_index = scanned_char_ints.index(item)
    # switch pivot and smallest of bigger chars in scanned chars
    pivot_index = len(scanned_char_ints) - 1
    scanned_char_ints[pivot_index], scanned_char_ints[char_to_switch_index] = scanned_char_ints[char_to_switch_index], scanned_char_ints[pivot_index]
    # order from second to end the other chars, so to find closest bigger number of starting number
    ord_scanned_char_ints = scanned_char_ints[:-1]
    ord_scanned_char_ints.sort(reverse=True)
    ord_scanned_char_ints.append(scanned_char_ints[-1])
    # reverse scanned chars
    ord_scanned_char_ints.reverse()
    # rebuild char int list
    result_ints = char_ints[:len(char_ints) - len(ord_scanned_char_ints)]
    result_ints.extend(ord_scanned_char_ints)
    result_ = ""
    for char_intx in result_ints:
        for char, int_charz in letter_mapping.items():
            if int_charz == char_intx:
                result_ += char
    return result_
(I know there are solutions on the internet that implement this more concisely, but I am obviously trying to succeed by myself.)
Now, it seems to run fine for 1, 2, or 100 input strings of at most 100 characters.
But when the Hackerrank test procedure runs it against 100000 strings of at most 100 letters, an error results, with no further information about it. Running a test with a similar input size on my machine does not throw any error.
What is wrong with this solution?
Thanks in advance
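For cross-checking outputs on large random inputs, a sketch of the commonly used pivot, swap and reverse approach is given below. It is not a diagnosis of the error described above. The name next_permutation is hypothetical, and returning "no answer" for inputs with no larger permutation (including single-character and empty strings) is an assumption about the expected output.

```python
def next_permutation(w):
    """Return the next lexicographically greater permutation of w, or "no answer"."""
    chars = list(w)
    # 1. Find the rightmost position whose character is smaller than its successor.
    i = len(chars) - 2
    while i >= 0 and chars[i] >= chars[i + 1]:
        i -= 1
    if i < 0:
        return "no answer"  # the string is already in non-increasing order
    # 2. Find the rightmost character that is greater than the pivot and swap them.
    j = len(chars) - 1
    while chars[j] <= chars[i]:
        j -= 1
    chars[i], chars[j] = chars[j], chars[i]
    # 3. The suffix is still non-increasing, so reversing it makes it the smallest.
    chars[i + 1:] = reversed(chars[i + 1:])
    return "".join(chars)

print(next_permutation("dkhc"))  # hcdk
print(next_permutation("bb"))    # no answer
```

Comparing this against biggerIsGreater on generated strings, including one-letter strings and strings with repeated letters, may help locate an input where the two disagree.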

Compressing multiple nested `for` loops

Similar to this and many other questions, I have many nested loops (up to 16) of the same structure.
Problem: I have a 4-letter alphabet and want to get all possible words of length 16. I need to filter those words. These are DNA sequences (hence 4 letters: ATGC), and the filtering rules are quite simple:
no XXXX substrings (i.e. the same letter can't appear more than 3 times in a row; ATGCATGGGGCTA is "bad")
specific GC content, that is, the number of Gs plus the number of Cs should be in a specific range (40-50%). ATATATATATATA and GCGCGCGCGCGC are bad words
itertools.product will work for that, but the data structure here is going to be giant (4^16 ≈ 4.3*10^9 words).
More importantly, if I do use product, I still have to go through each element to filter it out. Thus I will have 4 billion steps times 2.
My current solution is nested for loops:
alphabet = ['a','t','g','c']
for p1 in alphabet:
    for p2 in alphabet:
        for p3 in alphabet:
            ...skip...
                for p16 in alphabet:
                    word = p1+p2+p3+...+p16
                    if word_is_good(word):
                        good_words.append(word)
                        counter+=1
Is there a good pattern to program this without 16 nested loops? Is there a way to parallelize it efficiently (on multiple cores or multiple EC2 nodes)?
Also, with that pattern I can plug the word_is_good check into the middle of the loops, since a word that starts badly is bad:
...skip...
        for p3 in alphabet:
            word_3 = p1+p2+p3
            if not word_is_good(word_3):
                break
            for p4 in alphabet:
                ...skip...

from itertools import product, islice
from time import time
length = 16
def generate(start, alphabet):
    """
    A recursive generator function which works like itertools.product
    but restricts the alphabet as it goes based on the letters accumulated so far.
    """
    if len(start) == length:
        yield start
        return
    gcs = start.count('g') + start.count('c')
    if gcs >= length * 0.5:
        alphabet = 'at'
    # consider the maximum number of Gs and Cs we can have in the end
    # if we add one more A/T now
    elif length - len(start) - 1 + gcs < length * 0.4:
        alphabet = 'gc'
    for c in alphabet:
        if start.endswith(c * 3):
            continue
        for string in generate(start + c, alphabet):
            yield string
def brute_force():
    """ Straightforward method for comparison """
    lower = length * 0.4
    upper = length * 0.5
    for s in product('atgc', repeat=length):
        if lower <= s.count('g') + s.count('c') <= upper:
            s = ''.join(s)
            if not ('aaaa' in s or
                    'tttt' in s or
                    'cccc' in s or
                    'gggg' in s):
                yield s
def main():
    funcs = [
        lambda: generate('', 'atgc'),
        brute_force
    ]
    # Testing performance
    for func in funcs:
        # This needs to be big to get an accurate measure,
        # otherwise `brute_force` seems slower than it really is.
        # This is probably because of how `itertools.product`
        # is implemented.
        count = 100000000
        start = time()
        for _ in islice(func(), count):
            pass
        print(time() - start)
    # Testing correctness
    global length
    length = 12
    for x, y in zip(*[func() for func in funcs]):
        assert x == y, (x, y)
main()
On my machine, generate was just a bit faster than brute_force, at about 390 seconds vs 425. This was pretty much as fast as I could make them. I think the full thing would take about 2 hours. Of course, actually processing them will take much longer. The problem is that your constraints don't reduce the full set much.
Here's an example of how to use this in parallel across 16 processes:
from multiprocessing.pool import Pool
alpha = 'atgc'
def generate_worker(start):
    start = ''.join(start)
    for s in generate(start, alpha):
        print(s)
Pool(16).map(generate_worker, product(alpha, repeat=2))

Since you happen to have an alphabet of length 4 (or any power of 2), the idea of using an integer ID and bit-wise operations comes to mind instead of checking for consecutive characters in strings. We can assign an integer value to each of the characters in the alphabet; for simplicity, let's use the index corresponding to each letter.
Example:
65463543 (base 10) = 3321232103313 (base 4) = 'aaaddcbcdcbaddbd'
The following function converts from a base 10 integer to a word using alphabet.
def id_to_word(word_id, word_len):
    word = ''
    while word_id:
        rem = word_id & 0x3  # 2 bits per letter
        word = ALPHABET[rem] + word
        word_id >>= 2  # Bit shift to the next letter
    return '{2:{0}>{1}}'.format(ALPHABET[0], word_len, word)
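To make the ID/word correspondence concrete, the sketch below round-trips the example value 65463543. The inverse helper word_to_id is hypothetical (it is not part of the answer), and id_to_word is restated here with a fixed-length loop so that the block runs on its own; ALPHABET is assumed to be the 4-letter tuple defined later in the answer.

```python
ALPHABET = ('a', 'b', 'c', 'd')

def word_to_id(word):
    """Pack a word into an integer, two bits per letter (hypothetical helper)."""
    word_id = 0
    for letter in word:
        word_id = (word_id << 2) | ALPHABET.index(letter)
    return word_id

def id_to_word(word_id, word_len):
    """Unpack an integer ID into a word of exactly word_len letters."""
    letters = []
    for _ in range(word_len):
        letters.append(ALPHABET[word_id & 0x3])  # lowest two bits = last letter
        word_id >>= 2
    return ''.join(reversed(letters))

print(word_to_id('aaaddcbcdcbaddbd'))  # 65463543
print(id_to_word(65463543, 16))        # aaaddcbcdcbaddbd
```

Leading 'a' letters correspond to leading zero bits, which is why the word length has to be passed in when converting back.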
Now for a function to check whether a word is "good" based on its integer ID. The following method is of a similar format to id_to_word, except a counter is used to keep track of consecutive characters. The function will return False if the maximum number of identical consecutive characters is exceeded, otherwise it returns True.
def check_word(word_id, max_consecutive):
    consecutive = 0
    previous = None
    while word_id:
        rem = word_id & 0x3
        if rem != previous:
            consecutive = 0
        consecutive += 1
        if consecutive == max_consecutive + 1:
            return False
        word_id >>= 2
        previous = rem
    return True
We're effectively thinking of each word as an integer in base 4. If the alphabet length were not a power of 2, then modulo % alpha_len and integer division // alpha_len could be used in place of & (alpha_len - 1) and >> log2(alpha_len) respectively, although it would take much longer.
Finally, finding all the good words for a given word_len. The advantage of using a range of integer values is that you can reduce the number of for-loops in your code from word_len to 2, albeit the outer loop is very large. This may allow for more friendly multiprocessing of your good word finding task. I have also added in a quick calculation to determine the smallest and largest IDs corresponding to good words, which helps significantly narrow down the search for good words
ALPHABET = ('a', 'b', 'c', 'd')
def find_good_words(word_len):
    max_consecutive = 3
    alpha_len = len(ALPHABET)
    # Determine the words corresponding to the smallest and largest ids
    smallest_word = ''  # aaabaaabaaabaaab
    largest_word = ''  # dddcdddcdddcdddc
    for i in range(word_len):
        # append (rather than prepend) so the words are built left to right
        # and match the patterns in the comments above
        if (i + 1) % (max_consecutive + 1):
            smallest_word += ALPHABET[0]
            largest_word += ALPHABET[-1]
        else:
            smallest_word += ALPHABET[1]
            largest_word += ALPHABET[-2]
    # Determine the integer ids of said words
    trans_table = str.maketrans({c: str(i) for i, c in enumerate(ALPHABET)})
    smallest_id = int(smallest_word.translate(trans_table), alpha_len)  # 16843009 for word_len 16
    largest_id = int(largest_word.translate(trans_table), alpha_len)  # 4278124286 for word_len 16
    # Find and store the id's of "good" words
    counter = 0
    goodies = []
    for i in range(smallest_id, largest_id + 1):
        if check_word(i, max_consecutive):
            goodies.append(i)
            counter += 1
In this loop I have specifically stored the word's ID as opposed to the actual word itself in case you are going to use the words for further processing. However, if you are just after the words, then change the second to last line to read goodies.append(id_to_word(i, word_len)).
NOTE: I receive a MemoryError when attempting to store all good IDs for word_len >= 14. I suggest writing these IDs/words to a file of some sort!
