Understanding repetition count in a Python String


I have this challenge to solve: Repeated String. I have been trying to solve it but keep failing because of a memory error. I cannot work around this because the HackerRank platform does not support my solution; it might be a 32-bit platform.
I have a solution that works for inputs of smaller length, but I want to learn how to solve it with less memory usage.
My Code:
def repeatedString(s, n):
    if(s != 'a'):
        return len([x for x in (s*n)[:n] if x == 'a'])
    return n
Now this throws a MemoryError for inputs with a very large n or a very long string.
I have researched on it, and saw some submissions, and found this out.
Correct Solution from Leaderboard:
def repeatedString(s, n):
    L = len(s)
    return (s.count('a') * (n//L) + s[:n % L].count('a'))
That's it! I got so confused by this solution that I could not figure out what is actually happening and how. Could anybody please let me know how the above correct solution works? I am new to Python and trying my best to get my hands dirty with competitive coding. Thank you!

Your function throws a MemoryError because s*n builds a string of len(s) * n characters before it is sliced down to n, so the memory needed grows at least as fast as the input parameter n.
n can be as large as 10^12, so even the sliced string would hold 1000 billion letters, which at one byte per character is about 1 terabyte (possibly more depending on the encoding of your string).
So there has to be another way to count the number of a's in a string of that size right?
Yes (That's why the correct answer is different from your solution).
1. First we get the length of the input string.
L = len(s)
For example 'abcde' has a length of 5.
2. Then, we count the number of 'a's in s.
s.count('a')
3. Next, we want to know how many times s is repeated as a whole before we reach a string with a length of n.
(n//L)
The // operator is called floor (integer) division, which results in a whole number rounded down. For instance with s='abcde' and n=14, n//L equals 2.
4. Multiply the number of 'a's in s by the number of times s can fit into a string of length n.
s.count('a') * (n//L)
5. We are almost done, but for our example, something is still missing. 'abcde' can be repeated twice inside a string of length n, but there are still 4 characters left, in our example 'abcd'.
Here, we construct the remaining string from s with s[:n % L], or in our example s[:14 % 5] or s[:4], which results in 'abcd'.
Then we count the number of 'a's in this string with s[:n % L].count('a')
6. Add it all together and we get the function in your question:
def repeatedString(s, n):
    L = len(s)
    return (s.count('a') * (n//L) + s[:n % L].count('a'))

So, the key difference between the two algorithms is that in your original, you do s*n, which will actually try to build the massive string in-memory. This is why you get the memory error.
The second algorithm essentially says: "For a string s of length X that's repeated out to length N, s will fit into it N//X times, possibly with a chunk of s left over (the division remainder)."
E.g. if your string is aab (X=3) and N is 10, you know the string will fit 3 times, with 1 character left over.
So, given there are 2 letter a in s, you know that there will be 2*3 a in the first 9 chars. Now you need to deal with the remainder. The final character (the remainder) will be the first character of s.
In the second solution, s.count('a') * (n//L) + s[:n % L].count('a') is the sum of these two parts:
s.count('a') * (n//L) - This gives you the 2 * 3 in my example.
s[:n % L].count('a') - this gives you the 'remainder' part.
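To make the two parts concrete, here is a minimal sketch that checks the closed-form count against a brute-force count on small inputs. The names repeated_count and brute_force are made up for this illustration, and the brute-force version is only safe for small n.

```python
def repeated_count(s, n, letter="a"):
    # Whole copies of s that fit in n characters, plus the leftover prefix.
    full_copies, leftover = divmod(n, len(s))
    return s.count(letter) * full_copies + s[:leftover].count(letter)


def brute_force(s, n, letter="a"):
    # Actually builds the repeated string, so only use it for small n.
    copies_needed = n // len(s) + 1
    return (s * copies_needed)[:n].count(letter)


# The examples from the answers above.
print(repeated_count("aab", 10))    # 2 * 3 for the full copies, plus 1 for the leftover "a"
print(repeated_count("abcde", 14))  # 1 * 2 for the full copies, plus 1 from "abcd"

# The closed form agrees with brute force on small inputs.
for n in range(1, 50):
    assert repeated_count("aab", n) == brute_force("aab", n)
```

Because only s itself is ever held in memory, the same call also works for n = 10**12.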

Related

How to tackle calculating then verifying 10^8 solutions to find the one true answer?

I have a number which is 615 digits in length. Throughout the number, there are 8 fixed places where a digit is missing. I have to find what those missing digits are. So there are 10^8 possibilities. After computing them I have to raise a ciphertext to each possible number, and see what the output is (mod N), and see which number gives the correct output. In other words, I am trying to find the decryption key in an RSA problem. My main concern right now is how to efficiently/properly create all 10^8 possible answers.
I am using gmpy2, and to get that to work, I had to download Python2.7 just to not get an error when trying to install gmpy2. I hope they are adequate enough to tackle this problem. If not, I would really appreciate someone pointing me in the correct direction.
I have not tried anything yet, as I'm sure this will take hours to compute. So I really want to make sure I am doing everything correctly, so that if I let my laptop run for a couple of hours it does not get damaged or freeze, leaving me sitting here not knowing whether my laptop messed up or is still computing.
So I suppose I am trying to seek advice on how I should proceed further.
In terms of actual code, I suppose looping through 0-9 8 times is not that hard, but I don't know how to insert a number into another number. In Python, how do I make it so that a number will only be inserted into the position I need it to? The number looks like this example:
X = 124621431523_13532535_62635292 //this is only 30 digits long, mine is 615 digits long
where each "_" is where a number is missing.
I am completely at a loss on how to do this.
Once all the numbers are generated, I aim to loop through them all and raise them until I get the answer required. This part seems to be a bit easier, as it seems like just a simple loop.
So I guess my main question is how to loop through 10^8 numbers but placing them in a specific spot inside a number that is already 615 digits long? I am seeking advice on technical as well as code design so as to not take too long to generate them all.
Thank you for reading.

Turn the number into a string, use the format method, use itertools.product to generate numbers to fill the holes, then turn it back.
Example:
from itertools import product

def make_seed(n, replace_positions):
    seed = str(n)
    to_replace = [-1] + replace_positions
    substrings = [seed[start + 1:end]
                  for start, end
                  in zip(to_replace, to_replace[1:])] + [seed[to_replace[-1] + 1:]]
    return '{}'.join(substrings)

def make_number(seed):
    n = seed.count('{}')
    for numbers in product([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], repeat=n):
        yield int(seed.format(*numbers))

seed = make_seed(123456789, [3, 5, 7])
# seed = '123{}5{}7{}9'
for i in make_number(seed):
    print(i)
Output:
123050709
123050719
123050729
123050739
123050749
123050759
123050769
123050779
123050789
123050799
123051709
123051719
123051729
...

Since a decimal number is just the sum of digit * pow(10, n) over its positions, you can assume the unknown digits to be zero and then add the digit products for each trial.
from itertools import product

# 124621431523_13532535_62635292 is the original number
x = 124621431523013532535062635292
positions = [8, 17]  # zero-based positions of the missing digits, counted from the right
trials = product(range(0, 10), repeat=2)
for t in trials:
    x_prime = x
    for (digit, pos) in zip(t, positions):
        x_prime = x_prime + digit * pow(10, pos)
    print(x_prime)  # do your checking here
outputs:
124621431523013532535062635292
124621431523113532535062635292
124621431523213532535062635292
124621431523313532535062635292
...
etc
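For the checking step, Python's built-in three-argument pow(base, exp, mod) does modular exponentiation without building the huge intermediate power. Below is a rough sketch on a toy RSA key; the tiny numbers (N = 3233, public exponent 17, message 65) and the helper name find_exponents are assumptions for illustration only, not values from the question.

```python
from itertools import product


def find_exponents(template, ciphertext, modulus, expected):
    """Fill every '_' in template with each digit and keep exponents that decrypt correctly."""
    seed = template.replace("_", "{}")
    holes = template.count("_")
    matches = []
    for digits in product("0123456789", repeat=holes):
        candidate = int(seed.format(*digits))
        # Three-argument pow reduces modulo `modulus` at every step.
        if pow(ciphertext, candidate, modulus) == expected:
            matches.append(candidate)
    return matches


# Toy key (assumed for this example): N = 61 * 53, e = 17, known plaintext 65.
N = 3233
message = 65
ciphertext = pow(message, 17, N)

# Pretend one digit of the private exponent is unknown.
found = find_exponents("27_3", ciphertext, N, message)
print(found)
```

With 8 holes the loop body runs 10^8 times, so the cost is dominated by the modular exponentiations rather than by generating the candidates.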

Pseudorandom Algorithm for VERY Large (10^1.2mil) Numbers?

I'm looking for a pseudo-random number generator (an algorithm where you input a seed number and it outputs a different 'random-looking' number, and the same seed will always generate the same output) for numbers between 1 and 95^1,312,000.
I would use the Linear Feedback Shift Register (LFSR) PRNG, but if I did, I would have to convert the seed number (which could be up to 1.2 million digits long in base-10) into a binary number, which would be so massive that I think it would take too long to compute.
In response to a similar question, the Feistel cipher was recommended, but I didn't understand the vocabulary of the wiki page for that method (I'm going into 10th grade so I don't have a degree in encryption), so if you could use layman's terms, I would strongly appreciate it.
Is there an efficient way of doing this which won't take until the end of time, or is this problem impossible?
Edit: I forgot to mention that the prng sequence needs to have a full period. My mistake.

A simple way to do this is to use a linear congruential generator with modulus m = 95^1312000.
The formula for the generator is x_(n+1) = a*x_n + c (mod m). By the Hull-Dobell Theorem, it will have full period if and only if gcd(m,c) = 1 and 95 divides a-1. Furthermore, if you want good second values (right after the seed) even for very small seeds, a and c should be fairly large. Also, your code can't store these values as literals (they would be much too big). Instead, you need to be able to reliably produce them on the fly. After a bit of trial and error to make sure gcd(m,c) = 1, I hit upon:
import random

def get_book(n):
    random.seed(1941) #Borges' Library of Babel was published in 1941
    m = 95**1312000
    a = 1 + 95 * random.randint(1, m//100)
    c = random.randint(1, m - 1) #math.gcd(c,m) = 1
    return (a*n + c) % m
For example:
>>> book = get_book(42)
>>> book % 10**100
4779746919502753142323572698478137996323206967194197332998517828771427155582287891935067701239737874
shows the last 100 digits of "book" number 42. Given Python's built-in support for large integers, the code runs surprisingly fast (it takes less than 1 second to grab a book on my machine)
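The full-period condition can be checked by brute force on a much smaller modulus with the same prime factors. The sketch below uses m = 95**2 and hand-picked a and c; these small values and the helper name lcg_period are assumptions chosen for illustration, not the constants used by get_book.

```python
from math import gcd


def lcg_period(a, c, m, seed=0):
    """Count the steps until x -> (a*x + c) % m returns to the seed."""
    x = (a * seed + c) % m
    steps = 1
    while x != seed:
        x = (a * x + c) % m
        steps += 1
    return steps


m = 95 ** 2        # 9025, prime factors 5 and 19 only
c = 1234           # coprime to m
a_good = 1 + 95 * 7  # a - 1 is divisible by both 5 and 19
a_bad = 6            # a - 1 = 5 is not divisible by 19

assert gcd(c, m) == 1
print(lcg_period(a_good, c, m))  # equals m: every value is visited once
print(lcg_period(a_bad, c, m))   # shorter than m
```

Both multipliers are coprime to m, so the map is a permutation and the loop is guaranteed to return to the seed.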

If you have a method that can produce a pseudo-random digit, then you can concatenate as many together as you want. It will be just as repeatable as the underlying prng.
However, you'll probably run out of memory scaling that up to millions of digits and attempting to do arithmetic. Normally stuff on that scale isn't done on "numbers". It's done on byte vectors, or something similar.

How do I represent a string as a number?

I need to represent a string as a number, however it is 8928313 characters long, note this string can contain more than just alphabet letters, and I have to be able to convert it back efficiently too. My current (too slow) code looks like this:
alpha = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ,.?!#()+-=[]/*1234567890^*{}\'"$\\&#;|%<>:`~_'
alphaLeng = len(alpha)
def letterNumber(letters):
    letters = str(letters)
    cof = 1
    nr = 0
    for i in range(len(letters)):
        nr += cof*alpha.find(letters[i])
        cof *= alphaLeng
        print(i,' ',len(letters))
    return str(nr)

Ok, since other people are giving awful answers, I'm going to step in.
You shouldn't do this.
You shouldn't do this.
An integer and an array of characters are ultimately the same thing: bytes. You can access the values in the same way.
Most number representations cap out at 8 bytes (64-bits). You're looking at 8 MB, or 1 million times the largest integer representation. You shouldn't do this. Really.
You shouldn't do this. Your number will just be a custom, gigantic number type that would be identical under the hood.
If you really want to do this, despite all the reasons above, here's how...
Code
def lshift(a, b):
    # bitwise left shift by b bytes (8 * b bits)
    return (a << (8 * b))

def string_to_int(data):
    # data is a byte string (a Python 2 str, or bytes in Python 3)
    sum_ = 0
    r = range(len(data)-1, -1, -1)
    for a, b in zip(bytearray(data), r):
        sum_ += lshift(a, b)
    return sum_
DONT DO THIS
Explanation
Characters are essentially bytes: they can be encoded in different ways, but ultimately you can treat them within a given encoding as a sequence of bytes. In order to convert them to a number, we can shift each one left by 8 bits per position in the sequence, creating a unique number. r, the range value, is the position in reverse order: the 4th element from the end needs to go left 24 bits (3*8), etc.
After getting the range and converting our data to 8-bit integers, we can then transform the data and take the sum, giving us our unique identifier. It will be identical byte-wise (or in reverse byte-order) of the original number, but just "as a number". This is entirely futile. Don't do it.
Performance
Any performance is going to be outweighed by the fact that you're creating an identical object for no valid reason, but this solution is decently performant.
1,000 elements takes ~486 microseconds, 10,000 elements takes ~20.5 ms, while 100,000 elements takes about 1.5 seconds. It would work, but you shouldn't do it. This means it's scaled as O(n**2), which is likely due to memory overhead of reallocating the data each time the integer size gets larger. This might take ~4 hours to process all 8e6 elements (14365 seconds, calculated fitting the lower-order data to ax**2+bx+c). Remember, this is all to get the identical byte representation as the original data.
Futility
Remember, there are ~1e78 to 1e82 atoms in the entire universe, on current estimates. This is ~2^275. Your value will be able to represent 2^71426504, or about 260,000 times as many bits as you need to represent every atom in the universe. You don't need such a number. You never will.
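If the conversion is needed anyway, Python 3 can build the same big-endian integer as the shift-and-sum loop above with the built-in int.from_bytes, and int.to_bytes reverses it. This is a sketch under the assumption that the text is encoded as UTF-8 and does not start with a NUL character (leading zero bytes are lost in the round trip); the function names are chosen for this example.

```python
def string_to_int(text, encoding="utf-8"):
    # First byte becomes the most significant byte, as in the loop above.
    return int.from_bytes(text.encode(encoding), "big")


def int_to_string(number, encoding="utf-8"):
    # Smallest byte count that can hold the number.
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, "big").decode(encoding)


original = "hello, world"
as_number = string_to_int(original)
assert int_to_string(as_number) == original

# Same value as shifting each byte by hand: ord("a") << 8 plus ord("b").
assert string_to_int("ab") == (97 << 8) + 98
```

Both calls work on the raw bytes directly, so no per-character Python loop is involved.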

If there are only ASCII characters, you can use the built-in functions ord() and chr().

There are several optimizations you can perform. For example, the find method requires searching through your string for the corresponding letter. A dictionary would be faster. Even faster might be (benchmark!) the chr function (if you're not too picky about the letter ordering) and the ord function to reverse the chr. But if you're not picky about ordering, it might be better if you just left-NULL-padded your string and treated it as a big binary number in memory if you don't need to display the value in any particular format.
You might get some speedup by iterating over characters instead of character indices. If you're using Python 2, a large range will be slow since a list needs to be generated (use xrange instead for Python 2); Python 3's range is a lazy sequence, so it's better.
Your print function is going to slow down output a fair bit, especially if you're outputting to a tty.
A big number library may also buy you speed-up: Handling big numbers in code

Your alpha.find() function needs to iterate through alpha on each loop.
You can probably speed things up by using a dict, as dictionary lookups are O(1):
alpha = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ,.?!#()+-=[]/*1234567890^*{}\'"$\\&#;|%<>:`~_'
alpha_dict = { letter: index for index, letter in enumerate(alpha)}
print(alpha.find('$'))
# 83
print(alpha_dict['$'])
# 83
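As a rough sketch of how the dictionary lookup could slot into the original conversion, and how to convert back, the code below encodes with the first character as the least significant digit (as letterNumber does) and decodes with divmod. The short ALPHABET is a hypothetical stand-in that is assumed to contain no duplicate characters; with duplicates, the dictionary keeps the last index while find returns the first. The length is passed to decode because trailing characters with index 0 leave no trace in the number.

```python
# Hypothetical alphabet for illustration; it must not contain duplicates.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(text):
    # Horner's rule over the reversed text: first character is least significant.
    number = 0
    for ch in reversed(text):
        number = number * BASE + INDEX[ch]
    return number


def decode(number, length):
    # Peel off one base-BASE digit per character.
    chars = []
    for _ in range(length):
        number, digit = divmod(number, BASE)
        chars.append(ALPHABET[digit])
    return "".join(chars)


text = "Hello World"
assert decode(encode(text), len(text)) == text
```

This removes the linear scan per character, but the big-integer arithmetic itself still grows with the size of the number.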

Store your strings in an array of distinct values; i.e. a string table. In your dataset, use a reference number. A reference number of n corresponds to the nth element of the string table array.
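A minimal sketch of such a string table, with a made-up helper name and sample data, might look like this:

```python
def build_string_table(strings):
    """Return (table, refs): distinct strings, and one table index per input string."""
    table = []
    index_of = {}
    refs = []
    for s in strings:
        if s not in index_of:
            index_of[s] = len(table)
            table.append(s)
        refs.append(index_of[s])
    return table, refs


table, refs = build_string_table(["red", "blue", "red"])
# Each reference number n maps back to the nth entry of the table.
assert [table[n] for n in refs] == ["red", "blue", "red"]
```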

Solving recursive sequence

Lately I've been solving some challenges from Google Foobar for fun, and now I've been stuck in one of them for more than 4 days. It is about a recursive function defined as follows:
R(0) = 1
R(1) = 1
R(2) = 2
R(2n) = R(n) + R(n + 1) + n (for n > 1)
R(2n + 1) = R(n - 1) + R(n) + 1 (for n >= 1)
The challenge is writing a function answer(str_S) where str_S is a base-10 string representation of an integer S, which returns the largest n such that R(n) = S. If there is no such n, return "None". Also, S will be a positive integer no greater than 10^25.
I have investigated a lot about recursive functions and about solving recurrence relations, but with no luck. I outputted the first 500 numbers and I found no relation with each one whatsoever. I used the following code, which uses recursion, so it gets really slow when numbers start getting big.
def getNumberOfZombits(time):
    if time == 0 or time == 1:
        return 1
    elif time == 2:
        return 2
    else:
        if time % 2 == 0:
            newTime = time // 2
            return getNumberOfZombits(newTime) + getNumberOfZombits(newTime+1) + newTime
        else:
            newTime = time // 2 # floor division, so rounds down
            return getNumberOfZombits(newTime-1) + getNumberOfZombits(newTime) + 1
The challenge also included some test cases so, here they are:
Test cases
==========
Inputs:
(string) str_S = "7"
Output:
(string) "4"
Inputs:
(string) str_S = "100"
Output:
(string) "None"
I don't know if I need to solve the recurrence relation to anything simpler, but as there is one for even and one for odd numbers, I find it really hard to do (I haven't learned about it in school yet, so everything I know about this subject is from internet articles).
So, any help at all guiding me to finish this challenge will be welcome :)

Instead of trying to simplify this function mathematically, I simplified the algorithm in Python. As suggested by @LambdaFairy, I implemented memoization in the getNumberOfZombits(time) function. This optimization sped up the function a lot.
Then, I passed to the next step, of trying to see what was the input to that number of rabbits. I had analyzed the function before, by watching its plot, and I knew the even numbers got higher outputs first and only after some time the odd numbers got to the same level. As we want the highest input for that output, I first needed to search in the even numbers and then in the odd numbers.
As you can see, the odd numbers take always more time than the even to reach the same output.
The problem is that we could not search for the numbers increasing 1 each time (it was too slow). What I did to solve that was to implement a binary search-like algorithm. First, I would search the even numbers (with the binary search like algorithm) until I found one answer or I had no more numbers to search. Then, I did the same to the odd numbers (again, with the binary search like algorithm) and if an answer was found, I replaced whatever I had before with it (as it was necessarily bigger than the previous answer).
I have the source code I used to solve this, so if anyone needs it I don't mind sharing it :)

The key to solving this puzzle was using a binary search.
As you can see from the sequence generators, they rely on a roughly n/2 recursion, so calculating R(N) takes about 2*log2(N) recursive calls; and of course you need to do it for both the odd and the even.
That's not too bad, but you need to figure out where to search for the N which will give you the input. To do this, I first implemented a search for upper and lower bounds for N. I walked up N by powers of 2, until I had N and 2N that formed the lower and upper bounds respectively for each sequence (odd and even).
With these bounds, I could then do a binary search between them to quickly find the value of N, or its non-existence.
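The approach described in these answers can be sketched roughly as follows. This is not the posters' actual code: it assumes, as both answers do, that R is increasing along the even indices and along the odd indices separately, and the helper name largest_with_parity is made up for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def R(n):
    # Memoized version of the recurrence from the question.
    if n < 2:
        return 1
    if n == 2:
        return 2
    half = n // 2
    if n % 2 == 0:
        return R(half) + R(half + 1) + half
    return R(half - 1) + R(half) + 1


def largest_with_parity(S, parity):
    """Binary search over k for R(2*k + parity) == S (assumes R grows with k)."""
    hi = 1
    while R(2 * hi + parity) < S:  # walk up by powers of 2 to find an upper bound
        hi *= 2
    lo = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        value = R(2 * mid + parity)
        if value == S:
            return 2 * mid + parity
        if value < S:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


def answer(str_S):
    S = int(str_S)
    found = [n for n in (largest_with_parity(S, 0), largest_with_parity(S, 1))
             if n is not None]
    return str(max(found)) if found else "None"


print(answer("7"))    # the question's first test case
print(answer("100"))  # the question's second test case
```

Each R call recurses only about log2(n) levels deep, so even S near 10^25 stays well within Python's default recursion limit.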

Slow Big Int Output in python

Is there any way to improve the performance of "str(bigint)" and "print bigint" in Python? Printing big integer values takes a lot of time. I tried to use the following recursive technique:
def p(x,n):
    if n < 10:
        sys.stdout.write(str(x))
        return
    n >>= 1
    l = 10**n
    k = x/l
    p(k,n)
    p(x-k*l,n)
n = number of digits,
x = bigint
But the method fails for certain cases where x in a sub call has leading zeros. Is there any alternative to it or any faster method. ( Please do not suggest using any external module or library ).
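The leading-zero failure is easy to reproduce on a small number. In this sketch (helper names are made up for the illustration), splitting 1005 into a high and a low half loses the zeros of the low half unless it is padded back to its expected width:

```python
def join_halves_naive(x, digits_in_low):
    high, low = divmod(x, 10 ** digits_in_low)
    # The low half is printed without its leading zeros.
    return str(high) + str(low)


def join_halves_padded(x, digits_in_low):
    high, low = divmod(x, 10 ** digits_in_low)
    # Pad the low half back to the width it was split at.
    return str(high) + str(low).zfill(digits_in_low)


print(join_halves_naive(1005, 2))   # "105": the low half 05 lost its zero
print(join_halves_padded(1005, 2))  # "1005"
```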

Conversion from a Python integer to a string has a running time of O(n^2), where n is the number of digits. For sufficiently large numbers, it will be slow. For a 1,000,001 digit number, str() takes approximately 24 seconds on my computer.
If you are really needing to convert very large numbers to a string, your recursive algorithm is a good approach.
The following version of your recursive code should work:
def p(x,n=0):
    if n == 0:
        n = int(x.bit_length() * 0.3)
    if n < 100:
        return str(x)
    n >>= 1
    l = 10**n
    a,b = divmod(x, l)
    upper = p(a,n)
    lower = p(b,n).rjust(n, "0")
    return upper + lower
It automatically estimates the number of digits in the output. It is about 4x faster for a 1,000,001 digit number.
If you need to go faster, you'll probably need to use an external library.

For interactive applications, the built-in print and str functions run in the blink of an eye.
>>> print(2435**356)
392312129667763499898262143039114894750417507355276682533585134425186395679473824899297157270033375504856169200419790241076407862555973647354250524748912846623242257527142883035360865888685267386832304026227703002862158054991819517588882346178140891206845776401970463656725623839442836540489638768126315244542314683938913576544051925370624663114138982037489687849052948878188837292688265616405774377520006375994949701519494522395226583576242344239113115827276205685762765108568669292303049637000429363186413856005994770187918867698591851295816517558832718248949393330804685089066399603091911285844172167548214009780037628890526044957760742395926235582458565322761344968885262239207421474370777496310304525709023682281880997037864251638836009263968398622163509788100571164918283951366862838187930843528788482813390723672536414889756154950781741921331767254375186751657589782510334001427152820459605953449036021467737998917512341953008677012880972708316862112445813219301272179609511447382276509319506771439679815804130595523836440825373857906867090741932138749478241373687043584739886123984717258259445661838205364797315487681003613101753488707273055848670365977127506840194115511621930636465549468994140625
>>> str(2435**356)
'392312129667763499898262143039114894750417507355276682533585134425186395679473824899297157270033375504856169200419790241076407862555973647354250524748912846623242257527142883035360865888685267386832304026227703002862158054991819517588882346178140891206845776401970463656725623839442836540489638768126315244542314683938913576544051925370624663114138982037489687849052948878188837292688265616405774377520006375994949701519494522395226583576242344239113115827276205685762765108568669292303049637000429363186413856005994770187918867698591851295816517558832718248949393330804685089066399603091911285844172167548214009780037628890526044957760742395926235582458565322761344968885262239207421474370777496310304525709023682281880997037864251638836009263968398622163509788100571164918283951366862838187930843528788482813390723672536414889756154950781741921331767254375186751657589782510334001427152820459605953449036021467737998917512341953008677012880972708316862112445813219301272179609511447382276509319506771439679815804130595523836440825373857906867090741932138749478241373687043584739886123984717258259445661838205364797315487681003613101753488707273055848670365977127506840194115511621930636465549468994140625'
If however you are printing big integers to (standard output, say) so that they can be read (from standard input) by another process, and you are finding the binary-to-decimal operations impacting the overall performance, you can look at Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes? (although the accepted answer suggests numpy, which is an external library, though there are other suggestions).
