sequential counting using letters instead of numbers [duplicate] - python

This question already has answers here:
How to count sequentially using letters instead of numbers?
(3 answers)
Closed 2 months ago.
I need a method that 'increments' the string a to z, then aa to az, then ba to bz, and so on, like the columns in an Excel sheet. I will feed the method the previous string and it should increment to the next letter.
PSEUDO CODE
def get_next_letter(last_letter):
    return last_letter += 1
So I could use it like so:
>>> get_next_letter('a')
'b'
>>> get_next_letter('b')
'c'
>>> get_next_letter('c')
'd'
...
>>> get_next_letter('z')
'aa'
>>> get_next_letter('aa')
'ab'
>>> get_next_letter('ab')
'ac'
...
>>> get_next_letter('az')
'ba'
>>> get_next_letter('ba')
'bb'
...
>>> get_next_letter('zz')
'aaa'

I believe there are better ways to handle this, but you can implement the algorithm for adding two numbers on paper, carrying from the rightmost letter:
def get_next_letter(string):
    x = list(map(ord, string))  # convert to a list of character codes
    x[-1] += 1  # increment the last element
    result = ''
    carry = 0
    for c in reversed(x):
        c += carry  # add the carry coming from the letter to the right
        result = chr(c) + result  # an overflowed 'z' shows up as '{' here
        carry = c > ord('z')
    if carry:  # add a new letter at the beginning in case there is still a carry
        result = 'a' + result
    return result.replace('{', 'a')  # replace each overflowed 'z' with 'a'
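The same pen-and-paper carry can also be done directly on the letters, without the '{' sentinel: turn every trailing 'z' into 'a', then bump the first letter that is not a 'z', or prepend an 'a' when every letter was a 'z'. A minimal sketch, assuming lowercase input; next_label is a hypothetical name:

```python
def next_label(s: str) -> str:
    """Increment a lowercase label: 'a' -> 'b', 'az' -> 'ba', 'zz' -> 'aaa'."""
    chars = list(s)
    i = len(chars) - 1
    # every trailing 'z' overflows to 'a' and carries one to the left
    while i >= 0 and chars[i] == 'z':
        chars[i] = 'a'
        i -= 1
    if i < 0:
        # the carry ran off the left end (all 'z', or empty input): add a new letter
        return 'a' + ''.join(chars)
    chars[i] = chr(ord(chars[i]) + 1)
    return ''.join(chars)


print([next_label(x) for x in ['a', 'z', 'az', 'zz', 'abz']])
# ['b', 'aa', 'ba', 'aaa', 'aca']
```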

All the proposed solutions are just way too complicated. I came up with the one below, using a recursive call:
def getNextLetter(previous_letter):
    """
    'increments' the provided string to the next letter recursively
    raises TypeError if previous_letter is not a string
    returns "a" if the provided previous_letter was an empty string
    """
    if not isinstance(previous_letter, str):
        raise TypeError("the previous letter should be a letter, doh")
    if previous_letter == '':
        return "a"
    characters = "abcdefghijklmnopqrstuvwxyz"
    last_letter = previous_letter[-1]
    if last_letter == "z":
        # 'z' rolls over to 'a' and the carry moves to the rest of the string
        return getNextLetter(previous_letter[:-1]) + "a"
    # otherwise only the last letter changes
    return previous_letter[:-1] + characters[characters.find(last_letter) + 1]

Why not use openpyxl's get_column_letter and column_index_from_string?
from openpyxl.utils import get_column_letter, column_index_from_string
# or `from openpyxl.utils.cell import get_column_letter, column_index_from_string`
def get_next_letter(s: str) -> str:
    # column name -> 1-based column index, add one, then back to a column name
    return get_column_letter(
        column_index_from_string(s) + 1
    ).lower()
and then
>>> get_next_letter('aab')
'aac'
>>> get_next_letter('zz')
'aaa'
Keep in mind that this solution only works in [A, ZZZ[, that is, for inputs from A up to but not including ZZZ.

In fact, what you want to achieve is to increment a number expressed in base 26 (using the 26 letters of the alphabet as symbols).
We all know the decimal base that we use daily.
We know hexadecimal, which is in fact base 16, with symbols including the digits and a, b, c, d, e, f.
Example: 0xff equals 255.
One approach is to convert to base 10, increment the resulting decimal number, then convert it back to base 26.
Let me explain.
I define two functions.
The first function converts a string (base 26) into a base 10 (decimal) number:
str_tobase10("abcd") # 19010
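That value is just the positional expansion of "abcd" with a = 1, b = 2, c = 3, d = 4, worked out by hand:

```latex
\text{abcd} = 1 \cdot 26^3 + 2 \cdot 26^2 + 3 \cdot 26 + 4 = 17576 + 1352 + 78 + 4 = 19010
```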
The inverse function converts a base 10 (decimal) number back to a string (base 26):
base10_tostr(19010) # abcd
get_next_letter() just has to convert the string to a number, increment it by one and convert it back to a string.
Advantages:
pure Python, no extra lib/dependency required.
works with very long strings
Example:
get_next_letter("abcdefghijz") # abcdefghika
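def str_tobase10(value: str) -> int:
    # each letter is a digit: a = 1, ..., z = 26
    n = 0
    for letter in value:
        n *= 26
        n += ord(letter) - ord("a") + 1
    return n
def base10_tostr(value: int) -> str:
    # There is no zero digit (a = 1, ..., z = 26), so this is bijective base 26:
    # subtract 1 before each division so that multiples of 26 map to "z"
    # (a plain n % 26 would turn 52 into "b`" instead of "az").
    s = ""
    n = value
    while n > 0:
        n, r = divmod(n - 1, 26)
        s = chr(ord("a") + r) + s
    return s
def get_next_letter(value: str) -> str:
    n = str_tobase10(value)
    return base10_tostr(n + 1)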
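If you need the whole sequence rather than the successor of a single label, a different technique (not one of the answers above) is to enumerate labels length by length with itertools.product. A minimal sketch; column_labels is a hypothetical name:

```python
from itertools import count, islice, product
from string import ascii_lowercase


def column_labels():
    """Yield 'a', ..., 'z', 'aa', ..., 'zz', 'aaa', ... forever."""
    for length in count(1):
        # product() varies the rightmost letter fastest, which matches
        # the a..z, aa..az, ba..bz order of spreadsheet columns
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


first_30 = list(islice(column_labels(), 30))
print(first_30[24:])  # ['y', 'z', 'aa', 'ab', 'ac', 'ad']
```

Because the generator is infinite, take a finite slice with islice (or stop in a loop) rather than calling list() on it directly.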

Related

Count the numbers of length 5 that have exactly two repeating digits

I'm trying to count the numbers of length 5 (zeros can be leading, like 00123) that have exactly two repeating digits. What I did:
def checkNumber(num):
    temp = [0] * 10  # how many times each digit 0-9 occurs
    for d in map(int, str(num)):
        temp[d] += 1
    contains_two_unique_digits = False
    for d in temp:
        if d > 2:
            return False
        if d == 2:
            if contains_two_unique_digits:
                return False
            contains_two_unique_digits = True
    # True only if exactly one digit occurred exactly twice
    return contains_two_unique_digits
counter = 0
for num in range(10000, 100000):
    counter += checkNumber(num)
print(counter)
But of course it does not count the cases with leading zeros. How can I add them here? Python does not allow numbers like 001234.
The zfill method of the str type might be of help.
>>> "123".zfill(5)
'00123'
>>> "123456789".zfill(5)
'123456789'
To convert an int to a str, simply use str:
>>> str(123)
'123'
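Putting zfill into the counting loop, a minimal sketch iterates over all of range(100000) and pads each number to five characters before counting digits. It assumes the goal is exactly one digit appearing twice with the other three digits all different; has_exactly_one_pair is a hypothetical helper name:

```python
from collections import Counter


def has_exactly_one_pair(digits: str) -> bool:
    """True if exactly one digit appears twice and no digit appears more often."""
    counts = list(Counter(digits).values())
    return max(counts) <= 2 and counts.count(2) == 1


# zfill(5) turns 123 into "00123", so the leading-zero cases are included
count = sum(has_exactly_one_pair(str(num).zfill(5)) for num in range(100000))
print(count)  # 50400
```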
If you NEED to accept integers in Python, then you are correct that the leading zeros cannot be preserved, because an int does not store them. If you can accept a string and treat the values inside as integers, then you can definitely make it happen. An integer literal with leading zeros, such as 00123, is a syntax error in Python 3.
def checkNumber(string_input):
    # is it a string?
    if not isinstance(string_input, str):
        string_input = str(string_input)
    # does it have five characters?
    if not len(string_input) == 5:
        raise ValueError('You must enter a string of length 5.')
    counter_list = [0] * 10
    # iterate over the string and count how many of each digit we have
    for value in string_input:
        if value.isdigit():
            counter_list[int(value)] += 1
    # exactly one digit must occur exactly twice, and none more than twice
    return counter_list.count(2) == 1 and max(counter_list) <= 2
print(checkNumber('01234'))
If you want to iterate over numbers by their digit representation, then itertools.product() will likely be more useful. itertools.product(string.digits, repeat=5) will yield each of the numbers you need, as digit tuples.
Taking advantage of collections.Counter can also help here, and avoids the flag / loop logic.
from collections import Counter
from itertools import product
from typing import Tuple
import string
def has_two_repeats(digits: Tuple[str, ...]) -> bool:
    counts = Counter(digits).values()
    return (
        # no digits occur more than 2 times
        sum(count > 2 for count in counts) == 0 and
        # one digit occurs two times
        sum(count == 2 for count in counts) == 1
    )
def all_two_repeats_five_digits() -> int:
    return sum(
        has_two_repeats(digits)
        for digits in product(string.digits, repeat=5)
    )
print(all_two_repeats_five_digits())
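Under the same interpretation (one digit exactly twice, the other three digits all different), the brute-force count can be cross-checked combinatorially: pick the repeated digit (10 ways), its two positions (C(5, 2) = 10 ways), then fill the remaining three positions with distinct digits from the other nine (9 · 8 · 7 = 504 ways). A small sketch using math.comb and math.perm, which need Python 3.8 or later:

```python
from math import comb, perm

# 10 choices for the repeated digit, comb(5, 2) ways to place it,
# and perm(9, 3) ordered choices for the three remaining distinct digits
expected = 10 * comb(5, 2) * perm(9, 3)
print(expected)  # 50400
```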

How to find the max number of times a sequence of characters repeats consecutively in a string? [duplicate]

This question already has answers here:
How to count consecutive repetitions of a substring in a string?
(4 answers)
Closed 1 year ago.
I'm working on a cs50/pset6/dna project. I'm struggling to find a way to analyze a string and get the maximum number of times a certain sequence of characters repeats consecutively. Here is an example:
String: JOKHCNHBVDBVDBVDJHGSBVDBVD
Sequence of characters I should look for: BVD
Result: My function should return 3, because at one point the characters BVD repeat three times consecutively, and even though they repeat again two times later, I should look for the longest run.
It's a bit lame, but one "brute-force"-ish way would be to just check for the presence of the longest substring possible. As soon as a substring is found, break out of the loop:
EDIT - Using a function might be more straightforward:
def get_longest_repeating_pattern(string, pattern):
    if not pattern:
        return ""
    # try the longest possible repetition first, then shorter ones
    for i in range(len(string)//len(pattern), 0, -1):
        current_pattern = pattern * i
        if current_pattern in string:
            return current_pattern
    return ""
string = "JOKHCNHBVDBVDBVDJHGSBVDBVD"
pattern = "BVD"
longest_repeating_pattern = get_longest_repeating_pattern(string, pattern)
# divide by the pattern length to get the repetition count (3), not the character count (9)
print(len(longest_repeating_pattern) // len(pattern))
EDIT - explanation:
First, just a simple for-loop that starts at a larger number and goes down to a smaller number. For example, we start at 5 and go down to 0 (but not including 0), with a step size of -1:
>>> for i in range(5, 0, -1):
...     print(i)
...
5
4
3
2
1
>>>
If string = "JOKHCNHBVDBVDBVDJHGSBVDBVD", then len(string) is 26, and if pattern = "BVD", then len(pattern) is 3.
Back to my original code:
for i in range(len(string)//len(pattern), 0, -1):
Plugging in the numbers:
for i in range(26//3, 0, -1):
26//3 is an integer division which yields 8, so this becomes:
for i in range(8, 0, -1):
So, it's a for-loop that goes from 8 to 1 (remember, it doesn't go down to 0). i takes on the new value in each iteration: first 8, then 7, and so on.
In Python, you can "multiply" strings, like so:
>>> pattern = "BVD"
>>> pattern * 1
'BVD'
>>> pattern * 2
'BVDBVD'
>>> pattern * 3
'BVDBVDBVD'
>>>
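The last piece is the membership test: the function tries pattern * i for each i from 8 down, and the first one that occurs in string (checked with in) is the longest run. Listing every candidate that matches makes this visible for the example:

```python
string = "JOKHCNHBVDBVDBVDJHGSBVDBVD"
pattern = "BVD"

# candidates from 8 repetitions down to 1; keep the ones that occur in string
hits = [i for i in range(len(string) // len(pattern), 0, -1) if pattern * i in string]
print(hits)     # [3, 2, 1]
print(hits[0])  # 3, the first (largest) hit, which is where the function returns
```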
A slightly less brute-force solution:
string = 'JOKHCNHBVDBVDBVDJHGSBVDBVD'
key = 'BVD'
len_k = len(key)
max_l = 0
passes = 0
curr_len = 0
# note: because matched windows are skipped, runs of a key that overlaps itself
# (e.g. 'ABA' in 'ABABAABA') can be missed
for i in range(len(string) - len_k + 1):  # slide a window of the same length as key over the string
    if passes > 0:  # key matched a previous window: skip its remaining characters (with key 'BVD', ignore 'VD.' and 'D..')
        passes -= 1
        continue
    s = string[i:i+len_k]
    if s == key:
        curr_len += 1
        if curr_len > max_l:
            max_l = curr_len
        passes = len_k - 1
    else:
        curr_len = 0
print(max_l)
You can do that very easily, elegantly and efficiently using a regex.
We look for all sequences of at least one repetition of your search string. Then, we just need to take the maximum length of these sequences, and divide by the length of the search string.
The regex we use is '(?:<your_sequence>)+': at least one repetition (the +) of the group (<your_sequence>). The ?: is just here to make the group non-capturing, so that findall returns the whole match, and not just the group.
In case there is no match, we use the default parameter of the max function to return 0.
The code is very short, then:
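import re
def max_consecutive_repetitions(search, data):
    # re.escape() guards against characters in search that have a special meaning in regexes
    search_re = re.compile('(?:' + re.escape(search) + ')+')
    return max((len(seq) for seq in search_re.findall(data)), default=0) // len(search)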
Sample run:
print(max_consecutive_repetitions("BVD", "JOKHCNHBVDBVDBVDJHGSBVDBVD"))
# 3
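To see why the group has to be non-capturing, compare what findall returns for both forms on the example string. With a capturing group, findall returns only the group's last repetition for each match, so the run lengths are lost:

```python
import re

data = "JOKHCNHBVDBVDBVDJHGSBVDBVD"

# non-capturing group: findall returns each whole run
print(re.findall("(?:BVD)+", data))  # ['BVDBVDBVD', 'BVDBVD']

# capturing group: findall returns the group's last repetition instead
print(re.findall("(BVD)+", data))    # ['BVD', 'BVD']
```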
This is my contribution. I'm not a professional, but it worked for me (sorry for my bad English).
# Assumes reader is a csv.DictReader over the STR database and
# sequence is the DNA string read from sequence.txt
results = {}
# Loops through all the STRs (starts at 1 to skip the first column, the names)
for i in range(1, len(reader.fieldnames)):
    STR = reader.fieldnames[i]
    j = 0
    s = 0
    pre_s = 0
    # Loops through all the characters in sequence.txt
    # (<= so that an STR at the very end of the sequence is also checked)
    while j <= (len(sequence) - len(STR)):
        # checks if the character we are currently looping over is the same as the first STR character
        if STR[0] == sequence[j]:
            # while the substring from j to j + len(STR) is the same as STR, I call this a streak
            while sequence[j:(j + len(STR))] == STR:
                # j skips to the end of the substring
                j += len(STR)
                # streak counter
                s += 1
            # if s > 0, the whole STR matched the sequence at least once
            if s > 0:
                # save the largest streak as pre_s
                if s > pre_s:
                    pre_s = s
                # restart the streak counter to continue exploring the sequence
                s = 0
        j += 1
    # assigns pre_s value to a dictionary with the current STR as key
    results[STR] = pre_s
print(results)
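One caveat with the left-to-right scans (the sliding-window answer, the regex answer, since findall only returns non-overlapping matches, and the streak loop): when the STR overlaps itself, a run that starts inside an earlier match can be missed. For example, in "ABABAABA" the longest run of "ABA" is 2 (starting at index 2), but those approaches report 1. A minimal sketch that avoids this by trying every start position and counting copies laid end to end with str.startswith; max_repeats is a hypothetical name, and it does more work than the skipping versions:

```python
def max_repeats(sequence: str, key: str) -> int:
    """Longest run of back-to-back copies of key anywhere in sequence."""
    if not key:
        return 0  # an empty key would otherwise loop forever
    k = len(key)
    best = 0
    for start in range(len(sequence)):
        count = 0
        # count copies of key laid end to end from this start position
        while sequence.startswith(key, start + count * k):
            count += 1
        best = max(best, count)
    return best


print(max_repeats("JOKHCNHBVDBVDBVDJHGSBVDBVD", "BVD"))  # 3
print(max_repeats("ABABAABA", "ABA"))                    # 2
```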

How to improve python dict performance?

I recently coded a Python solution using dictionaries which got a TLE verdict. The solution is essentially the same as a multiset solution in C++ which works. So, we are sure that the logic is correct, but the implementation is not up to the mark.
The problem description, needed to understand the code below (http://codeforces.com/contest/714/problem/C):
For each number we need to get a string of 0s and 1s such that the i-th character is 0/1 if the respective i-th digit of the number is even/odd.
We need to maintain the count of numbers that have the same such mapping.
Any hints/pointers to improve the performance of the code below? It gave TLE (Time Limit Exceeded) for a large test case (http://codeforces.com/contest/714/submission/20594344).
from collections import defaultdict
def getPattern(s):
    return ''.join(list(s.zfill(19)))
def getSPattern(s):
    news = s.zfill(19)
    patlist = [ '0' if (int(news[i])%2 == 0) else '1' for i in range(19) ]
    return "".join(patlist)
t = int(raw_input())
pat = defaultdict(str) # holds strings as keys and int as value
for i in range(0, t):
    oper, num = raw_input().strip().split(' ')
    if oper == '+' :
        pattern = getSPattern(str(num))
        if pattern in pat:
            pat[pattern] += 1
        else:
            pat[pattern] = 1
    elif oper == '-' :
        pattern = getSPattern(str(num))
        pat[pattern] = max( pat[pattern] - 1, 0)
    elif oper == '?' :
        print pat.get(getPattern(num) , 0 )
I see lots of small problems with your code but can't say if they add up to significant performance issues:
You've set up, and used, your defaultdict() incorrectly:
pat = defaultdict(str)
...
if pattern in pat:
    pat[pattern] += 1
else:
    pat[pattern] = 1
The argument to the defaultdict() constructor should be the type of the values, not the keys. Once you've set up your defaultdict properly, you can simply do:
pat = defaultdict(int)
...
pat[pattern] += 1
As the value will now default to zero if the pattern isn't there already.
Since the specification says:
- ai — delete a single occurrence of non-negative integer ai from the multiset. It's guaranteed, that there is at least one ai in the multiset.
Then this:
pat[pattern] = max( pat[pattern] - 1, 0)
can simply be this:
pat[pattern] -= 1
You're working with 19-character strings, but since the specification says the numbers will be less than 10 ** 18, you can work with 18-character strings instead.
getSPattern() does a zfill() and then processes the string; it should do it in the reverse order (process the string and then zfill() it), as there's no need to run the logic on the leading zeros.
We don't need the overhead of int() to convert the characters to numbers:
(int(news[i])%2 == 0)
Consider using ord() instead as the ASCII values of the digits have the same parity as the digits themselves: ord('4') -> 52
And you don't need to loop over the indexes, you can simply loop over the characters.
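The parity claim is easy to check for all ten digits. A small Python 3 illustration, separate from the answer's Python 2 code:

```python
# '0' is ASCII 48 (even), so each digit character's code has the same parity as the digit
assert ord('0') == 48
for c in "0123456789":
    assert ord(c) % 2 == int(c) % 2

print(ord('4'), ord('4') % 2)  # 52 0
```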
Below is my rework of your code with the above changes, see if it still works (!) and gains you any performance:
from collections import defaultdict
def getPattern(string):
    return string.zfill(18)
def getSPattern(string):
    # pattern_list = (('0', '1')[ord(character) % 2] for character in string)
    pattern_list = ('0' if ord(character) % 2 == 0 else '1' for character in string)
    return ("".join(pattern_list)).zfill(18)
patterns = defaultdict(int)  # keys are pattern strings, values are counts
text = int(raw_input())
for _ in range(text):
    operation, number = raw_input().strip().split()
    if operation == '+':
        pattern = getSPattern(number)
        patterns[pattern] += 1
    elif operation == '-':
        pattern = getSPattern(number)
        patterns[pattern] -= 1
    elif operation == '?':
        print patterns.get(getPattern(number), 0)
With the explanation already done by @cdlane, I just need to add my rewrite of getSPattern, where I think the bulk of the time is spent. As per my initial comment, this is available on https://eval.in/641639
def getSPattern(s):
    patlist = ['0' if c in ['0', '2', '4', '6', '8'] else '1' for c in s]
    return "".join(patlist).zfill(19)
Using zfill(18) might marginally spare you some time.
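A further option, not taken from the answers above, is to let str.translate map every digit to its parity bit in a single call instead of a Python-level loop. A minimal Python 3 sketch (in Python 2 the table would come from string.maketrans('0123456789', '0101010101') instead); PARITY_TABLE and get_s_pattern are hypothetical names, and whether it is fast enough for the time limit is untested:

```python
# translation table: even digits -> '0', odd digits -> '1'
PARITY_TABLE = str.maketrans("0123456789", "0101010101")


def get_s_pattern(number: str) -> str:
    """Map each digit of number to its parity bit, left-padded to 18 characters."""
    return number.translate(PARITY_TABLE).zfill(18)


print(get_s_pattern("92"))
```

For "92" this prints sixteen zeros followed by "10", since 9 is odd and 2 is even.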

New Programmer: For Loops and While Loops(python) [duplicate]

This question already has answers here:
Understanding slicing
(38 answers)
Closed 7 years ago.
I'm a new programmer and I'm having a lot of trouble understanding for loops and while loops. In what situations should I use a for loop, and in what situations a while loop?
Also, could you explain to me what these two functions do? I'm quite confused.
1st function:
def every_nth_character(s, n):
    """ (str, int) -> str
    Precondition: n > 0
    Return a string that contains every nth character from s, starting at index 0.
    >>> every_nth_character('Computer Science', 3)
    'CpeSee'
    """
    result = ''
    i = 0
    while i < len(s):
        result = result + s[i]
        i = i + n
    return result
What does s[i] mean?
2nd function:
def find_letter_n_times(s, letter, n):
    """ (str, str, int) -> str
    Precondition: letter occurs at least n times in s
    Return the smallest substring of s starting from index 0 that contains
    n occurrences of letter.
    >>> find_letter_n_times('Computer Science', 'e', 2)
    'Computer Scie'
    """
    i = 0
    count = 0
    while count < n:
        if s[i] == letter:
            count = count + 1
        i = i + 1
    return s[:i]
What do s[i] and s[:i] mean?
s is a string, 'Computer Science', and you can index it like a sequence of characters ("C", "o", "m", "p", ...); i is the index position of a character in s, so s[i] is the character at position i. In the first function the loop takes every third (n = 3) character for as long as i is less than len(s): s[0] = 'C', s[3] = 'p', s[6] = 'e', s[9] = 'S', s[12] = 'e' and s[15] = 'e', which gives 'CpeSee'.
In the second function i starts at 0 and increases by 1 on each pass of the loop (i = i + 1), and s[:i] returns the characters of s from the start up to, but not including, index i. The loop stops once the second 'e' (at index 12) has been counted, when i is 13, so s[:13] is 'Computer Scie'; one more iteration would have given s[:14], 'Computer Scien'.
Your first question, about the differences between while and for loops, is completely answered here.
Strings and indexing:
Variable s holds a string value. As you may have noticed, it is passed as an argument to the every_nth_character(s, n) function.
Every character in a string is at some position, and that position is called its index. Indexing starts from 0. So if we have a string s containing the value 'foo123', its first character s[0] is 'f' and its last character s[5] is '3'.
A string can be sliced using ':' inside the square brackets. Referring to the previous example, you can take only the first three characters of s with s[:3] and get 'foo' as a result. Read more about it here
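The indexing and slicing above, run on the same 'foo123' example. Negative indexes, which count from the end, are an extra not mentioned in the answer:

```python
s = 'foo123'

print(s[0])   # f    first character (indexing starts at 0)
print(s[5])   # 3    last character; it is the string '3', not the number 3
print(s[-1])  # 3    negative indexes count from the end
print(s[:3])  # foo  everything before index 3
print(s[3:])  # 123  everything from index 3 on
```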
Loops:
While and for loops repeat their body over and over until they reach the limit you have set.
For example:
x = 0
while x < 5:
    print(x)
    x = x + 1
prints the numbers from 0 to 4. Variable x increases by 1 on every iteration, and the loop ends when x reaches the value 5.
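For comparison, here is the first function written with a for loop: range(0, len(s), n) produces the indexes 0, n, 2n, ... up front, so no counter has to be updated by hand, and a slice with a step, s[::n], does the same in one expression. As a rule of thumb, a for loop fits when you know in advance what you are iterating over, and a while loop fits when you only know the stopping condition, as in find_letter_n_times, which runs until count reaches n. The name every_nth_character_for is hypothetical:

```python
def every_nth_character_for(s: str, n: int) -> str:
    """Same result as every_nth_character, using a for loop over range()."""
    result = ''
    for i in range(0, len(s), n):  # i takes the values 0, n, 2n, ... while below len(s)
        result = result + s[i]
    return result


word = 'Computer Science'
print(every_nth_character_for(word, 3))  # CpeSee
print(word[::3])                         # CpeSee, slicing with a step of 3
```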
Get familiar with the Python documentation; it will help you a lot in the future, especially with the basics. Google search: Python (your-python-version) documentation

Encoding a 128-bit integer in Python?

Inspired by the "encoding scheme" of the answer to this question, I implemented my own encoding algorithm in Python.
Here is what it looks like:
import random
from math import pow
from string import ascii_letters, digits
# RFC 2396 unreserved URI characters
unreserved = '-_.!~*\'()'
characters = ascii_letters + digits + unreserved
size = len(characters)
seq = range(0,size)
# Seed random generator with same randomly generated number
random.seed(914576904)
random.shuffle(seq)
dictionary = dict(zip(seq, characters))
reverse_dictionary = dict((v,k) for k,v in dictionary.iteritems())
def encode(n):
    d = []
    n = n
    while n > 0:
        qr = divmod(n, size)
        n = qr[0]
        d.append(qr[1])
    chars = ''
    for i in d:
        chars += dictionary[i]
    return chars
def decode(str):
    d = []
    for c in str:
        d.append(reverse_dictionary[c])
    value = 0
    for i in range(0, len(d)):
        value += d[i] * pow(size, i)
    return value
The issue I'm running into is encoding and decoding very large integers. For example, this is how a large number is currently encoded and decoded:
s = encode(88291326719355847026813766449910520462)
# print s -> "3_r(AUqqMvPRkf~JXaWj8"
i = decode(s)
# print i -> "8.82913267194e+37"
# print long(i) -> "88291326719355843047833376688611262464"
The highest 16 places match up perfectly, but after those the number deviates from its original.
I assume this is a problem with the precision of extremely large integers when dividing in Python. Is there any way to circumvent this problem? Or is there another issue that I'm not aware of?
The problem lies within this line:
value += d[i] * pow(size, i)
It seems like you're using math.pow here instead of the built-in pow function. math.pow always returns a floating-point number, so you lose accuracy for your large numbers. You should use the built-in pow or the ** operator or, even better, keep the current power of the base in an integer variable:
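To see the float issue in isolation: a Python float has a 53-bit mantissa (roughly 16 significant decimal digits), while ** (or the built-in pow) on ints stays exact. A small Python 3 illustration, using 71, which is the length of the question's character set:

```python
import math

n = 88291326719355847026813766449910520462

print(type(math.pow(71, 3)))  # <class 'float'>
print(type(71 ** 3))          # <class 'int'>

# converting the 38-digit number to float and back loses its low digits
print(int(float(n)) == n)     # False
```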
def decode(s):
    d = [reverse_dictionary[c] for c in s]
    result, power = 0, 1
    for x in d:
        result += x * power
        power *= size
    return result
It gives me the following result now:
print decode(encode(88291326719355847026813766449910520462))
# => 88291326719355847026813766449910520462
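The question's code is Python 2 (range() returning a list, dict.iteritems, print statements). A sketch of the same scheme in Python 3 with the integer-only decode, checking the round trip for the question's number and for the largest 128-bit value. Note that the permutation produced by random.shuffle for this seed is not guaranteed to match the one Python 2 produced, so strings encoded under Python 2 may not decode with it:

```python
import random
from string import ascii_letters, digits

# RFC 2396 unreserved URI characters, as in the question
unreserved = "-_.!~*'()"
characters = ascii_letters + digits + unreserved
size = len(characters)

# range() is not a list in Python 3, so convert it before shuffling
seq = list(range(size))
random.seed(914576904)
random.shuffle(seq)
dictionary = dict(zip(seq, characters))
reverse_dictionary = {v: k for k, v in dictionary.items()}


def encode(n: int) -> str:
    """Least significant digit first, as in the question."""
    chars = []
    while n > 0:
        n, r = divmod(n, size)
        chars.append(dictionary[r])
    return "".join(chars)


def decode(s: str) -> int:
    """Inverse of encode, using only integer arithmetic (no math.pow)."""
    result, power = 0, 1
    for c in s:
        result += reverse_dictionary[c] * power
        power *= size
    return result


n = 88291326719355847026813766449910520462
print(decode(encode(n)) == n)                      # True
print(decode(encode(2**128 - 1)) == 2**128 - 1)    # True
```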
