Counting number of binary substrings of length 3+ with one unique element

Essentially, I'm given two lines of input. The first line is an integer N, the length of the string on the second line, which consists only of Gs and Hs (similar to 0s and 1s).
N = int(input())
chars = list(input())
lonely = 0
for i in range(3, N + 1):
    for j in range(N - i + 1):
        if ((chars[j:j + i].count('G') == 1) or (chars[j:j + i].count('H') == 1)):
            lonely += 1
print(lonely)
An example input is:
5
GHGHG
for which the answer is 3. The answer is the number of substrings of the original string, of length 3 or greater, that contain exactly one G or exactly one H (this G or H is 'lonely'). For the sample above, the substrings that meet this criterion are chars[0:3], chars[1:4], and chars[2:5]. While I think my code is technically correct, there are no constraints on N, so I am timing out on test cases where N = 5000 or so (I have a time limit of 4 seconds per test case).
How do I work around this?
Thank you!

You could split the string on "G" and look at the lengths of the streaks of H on the left and right of each split. This lets you compute the number of substrings that contain that lonely G. The number of substrings for a given G is the sum of 3 parts: the n Hs on the left form n-1 substrings that end with the G; the m Hs on the right form m-1 substrings that start with the G; and the product of the left and right (n x m) counts the substrings with the G between Hs. The same count is then repeated with the roles of G and H swapped.
def count3(chars):
    count = 0
    for lonely in "GH":                    # count for G, then H
        streaks = map(len,chars.split(lonely))
        left = next(streaks)               # first left side
        for right in streaks:              # get right sides
            count += max(0,left-1)         # HH...G
            count += max(0,right-1)        # G...HH
            count += left*right            # H...G...H
            left = right                   # track new left side
    return count
Output:
for testCase in ("G","GH","GHH","HG","HGH","HGHH","HHG","HHGH","HHGHH",
                 "GG","HHHGHHH","GGHGG","GGH"):
    print(testCase,count3(testCase))
G 0
GH 0
GHH 1
HG 0
HGH 1
HGHH 3
HHG 1
HHGH 3
HHGHH 6
GG 0
HHHGHHH 13
GGHGG 6
GGH 1
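As a sanity check, the streak-based count can be compared against the brute-force definition on short random strings. The sketch below is only a test harness, not part of the original answer: `brute_force` and `count_lonely` are hypothetical names, and `count_lonely` restates the same streak arithmetic as `count3` so that the block is self-contained.

```python
import random

def brute_force(chars):
    """Count substrings of length >= 3 with exactly one G or exactly one H."""
    n = len(chars)
    total = 0
    for size in range(3, n + 1):
        for start in range(n - size + 1):
            piece = chars[start:start + size]
            if piece.count("G") == 1 or piece.count("H") == 1:
                total += 1
    return total

def count_lonely(chars):
    """Streak-based count, same arithmetic as count3 above."""
    total = 0
    for lonely in "GH":
        streaks = [len(part) for part in chars.split(lonely)]
        # each adjacent pair of streaks surrounds one lonely character
        for left, right in zip(streaks, streaks[1:]):
            total += max(0, left - 1)   # HH...G
            total += max(0, right - 1)  # G...HH
            total += left * right       # H...G...H
    return total

random.seed(0)
for _ in range(200):
    s = "".join(random.choice("GH") for _ in range(random.randint(1, 12)))
    assert brute_force(s) == count_lonely(s), s

print(count_lonely("GHGHG"))  # 3, matching the sample in the question
```

The brute force is quadratic in the number of substrings and only meant for short inputs; the streak version does a single pass per letter, which is what makes N = 5000 feasible.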

Related

My code doesn't allow 2 times a character (how to fix)

I need to write code that counts the number of closed areas and the number of ends within a word (so B has 2 closed areas), but when a character occurs twice in one input it is only counted once.
I tried something that should count the number of characters, but that just gave me more errors.
G = 0
U = 0
Chosen_word = str(input("Choose a word of max 60 character(only uppercase)"))
if "A" in Chosen_word:
    U = U + 2
    G = G + 1
If you type AA it should print 4 ends and 2 closed areas, but it prints 2 ends and 1 closed area.

You're only going through this code once: "A" in Chosen_word is a single membership test, so it runs once no matter how many times the letter occurs. To go through each letter, you need to use a loop (a for loop that goes through every character would be best here):
for letter in Chosen_word:
    if letter == 'A':
        U = U + 2
        G = G + 1
    elif letter == 'B':
        ...

G = 0
U=0
Chosen_word = str(input("Choose a word of max 60 character(only uppercase)"))
n = Chosen_word.count("A")
U = n * 2
G = n
print (U)
print (G)
OUTPUT:
Choose a word of max 60 character(only uppercase)SADDSAAAA
10
5
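The per-letter loop from the first answer can also be driven by a lookup table instead of a long if/elif chain. This is a minimal sketch under stated assumptions: `SHAPES` and `count_shapes` are hypothetical names, only 'A' and 'B' are filled in, and the value of 0 ends for 'B' is an assumption (the question only states that B has 2 closed areas).

```python
# Hypothetical per-letter table: letter -> (ends, closed areas).
# Only 'A' and 'B' are listed; the other letters still have to be added.
SHAPES = {
    "A": (2, 1),  # from the question: each A adds 2 ends and 1 closed area
    "B": (0, 2),  # 2 closed areas per the question; 0 ends is an assumption
}

def count_shapes(word):
    ends = 0
    closed = 0
    for letter in word:
        # letters missing from the table contribute nothing
        letter_ends, letter_closed = SHAPES.get(letter, (0, 0))
        ends += letter_ends
        closed += letter_closed
    return ends, closed

print(count_shapes("AA"))    # (4, 2)
print(count_shapes("ABBA"))  # (4, 6)
```

Because every occurrence of a letter is visited, repeated letters are counted once per occurrence, which is the behaviour the question asks for.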

Find the shortest substring whose replacement makes the string contain equal number of each character

I have a string of length n composed of letters A,G,C and T. The string is steady if it contains equal number of A,G,C and T(each n/4 times). I need to find the minimum length of the substring that when replaced makes it steady. Here's a link to the full description of the problem.
Suppose s1=AAGAAGAA.
Now since n=8, ideally it should have 2 As, 2 Ts, 2 Gs and 2 Cs. It has 4 excess As. Hence we need a substring which contains at least 4 As.
I start by taking a 4-character substring from the left, and if none is found I increment a variable mnum (i.e. look for 5-character substrings, and so on).
We get AAGAA as an answer. But it's too slow.
from collections import Counter
import sys
n=int(input()) #length of string
s1=input()
s=Counter(s1)
le=int(n/4) #ideal count of each element
comp={'A':le,'G':le,'C':le,'T':le} #dictionary containing equal number of all elements
s.subtract(comp) #Finding by how much each element ('A','G'...) is in excess or loss
a=[]
b=[]
for x in s.values(): #storing the counts of the elements which are in excess
    if(x>0):
        a.append(x)
for x in s.keys(): #storing the corresponding elements
    if(s[x]>0):
        b.append(x)
mnum=sum(a) #minimum substring length to start with
if(mnum==0):
    print(0)
    sys.exit() #must be called; a bare sys.exit does nothing
flag=0
while(mnum<=n): #(when length 4 substring with all the A's and G's is not found increasing to 5 and so on)
    for i in range(n-mnum+1): #Finding substrings with length mnum in s1
        for j in range(len(a)): #Checking if all of excess elements are present
            if(s1[i:i+mnum].count(b[j])==a[j]):
                flag=1
            else:
                flag=0
        if(flag==1):
            print(mnum)
            sys.exit()
    mnum+=1

The minimal substring can be found in O(N) time and O(N) space.
First count a frequency fr[i] of each character from the input of length n.
Now, the most important thing to realise is the necessary and sufficient condition for a substring to be a valid candidate: it must contain each excessive character at least fr[i] - n/4 times. Otherwise, replacing the substring cannot remove the whole excess of that character. So, our task is to go through each such substring and pick the one with the minimal length.
But how to find all of them efficiently?
At the start, minLength is n. We introduce two indices, left and right (initially 0), that define a substring str[left:right] of the original string str. Then we increment right until the frequency of each excessive character in str[left:right] is at least fr[i] - n/4. That is not all yet, since str[left:right] may contain unnecessary characters on the left (for example, characters that are not excessive and so can be dropped). So we increment left as long as str[left:right] still contains enough excessive characters. When we're finished, we update minLength if it's larger than right - left. We repeat the procedure until right >= n.
Let's consider an example. Let GAAAAAAA be the input string. Then, the algorithm steps are as below:
1.Count frequencies of each character:
['G'] = 1, ['A'] = 7, ['T'] = 0, ['C'] = 0 ('A' is excessive here: the substring needs at least 7 - 8/4 = 5 of them)
2.Now iterate through the original string:
Step#1: |G|AAAAAAA
substr = 'G' - no excessive chars (left = 0, right = 0)
Step#2: |GA|AAAAAA
substr = 'GA' - 1 excessive char, we need 5 (left = 0, right = 1)
Step#3: |GAA|AAAAA
substr = 'GAA' - 2 excessive chars, we need 5 (left = 0, right = 2)
Step#4: |GAAA|AAAA
substr = 'GAAA' - 3 excessive chars, we need 5 (left = 0, right = 3)
Step#5: |GAAAA|AAA
substr = 'GAAAA' - 4 excessive chars, we need 5 (left = 0, right = 4)
Step#6: |GAAAAA|AA
substr = 'GAAAAA' - 5 excessive chars, nice but can we remove something from left? 'G' is not excessive anyways. (left = 0, right = 5)
Step#7: G|AAAAA|AA
substr = 'AAAAA' - 5 excessive chars, wow, it's smaller now. minLength = 5 (left = 1, right = 5)
Step#8: G|AAAAAA|A
substr = 'AAAAAA' - 6 excessive chars, nice, but can we reduce the substr? There's a redundant 'A'(left = 1, right = 6)
Step#9: GA|AAAAA|A
substr = 'AAAAA' - 5 excessive chars, nice, minLen = 5 (left = 2, right = 6)
Step#10: GA|AAAAAA|
substr = 'AAAAAA' - 6 excessive chars, nice, but can we reduce the substr? There's a redundant 'A'(left = 2, right = 7)
Step#11: GAA|AAAAA|
substr = 'AAAAA' - 5 excessive chars, nice, minLen = 5 (left = 3, right = 7)
Step#12: That's it as right >= 8
Or the full code below:
from collections import Counter
n = int(input())
gene = input()
char_counts = Counter()
for i in range(n):
    char_counts[gene[i]] += 1
n_by_4 = n // 4  # integer division; n is a multiple of 4
min_length = n
left = 0
right = 0
substring_counts = Counter()
while right < n:
    substring_counts[gene[right]] += 1
    right += 1
    has_enough_excessive_chars = True
    for ch in "ACTG":
        diff = char_counts[ch] - n_by_4
        # the char cannot be used to replace other items
        if (diff > 0) and (substring_counts[ch] < diff):
            has_enough_excessive_chars = False
            break
    if has_enough_excessive_chars:
        # drop characters on the left that the window can spare
        while left < right and substring_counts[gene[left]] > (char_counts[gene[left]] - n_by_4):
            substring_counts[gene[left]] -= 1
            left += 1
        min_length = min(min_length, right - left)
print (min_length)
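For testing without standard input, the same two-pointer idea can be wrapped in a function. This is a sketch rather than the answer's exact code: `min_replacement_length` is a hypothetical name, the input length is assumed to be a multiple of 4, and the window is shrunk until it stops covering the excess (recording the length at each step) instead of shrinking only while a character can be spared.

```python
from collections import Counter

def min_replacement_length(gene):
    n = len(gene)
    quota = n // 4  # assumes len(gene) is a multiple of 4
    # excess[ch] = how many copies of ch the replaced substring must contain
    excess = {ch: cnt - quota for ch, cnt in Counter(gene).items() if cnt > quota}
    if not excess:
        return 0  # already steady
    window = Counter()
    best = n
    left = 0
    for right, ch in enumerate(gene):
        window[ch] += 1
        # shrink from the left while the window still covers every excess
        while left <= right and all(window[c] >= need for c, need in excess.items()):
            best = min(best, right - left + 1)
            window[gene[left]] -= 1
            left += 1
    return best

print(min_replacement_length("GAAAAAAA"))  # 5, as in the walkthrough above
print(min_replacement_length("AAGAAGAA"))  # 5, the AAGAA case from the question
```

Each index enters and leaves the window at most once, so the running time stays linear in the length of the string.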

Here's one solution with limited testing done. This should give you some ideas on how to improve your code.
from collections import Counter
import sys
import math
n = int(input())
s1 = input()
s = Counter(s1)
if all(e <= n/4 for e in s.values()):
    print(0)
    sys.exit(0)
result = math.inf
out = 0
for mnum in range(n):
    s[s1[mnum]] -= 1
    while all(e <= n/4 for e in s.values()) and out <= mnum:
        result = min(result, mnum - out + 1)
        s[s1[out]] += 1
        out += 1
print(result)

Determining Longest run of Heads and Tails

I have a question about my fourth function, LongestRun. I want to output what the longest run of heads was and the longest run of tails based on how many flips (n) the user enters. I have tried a ton of different things, and it doesn't seem to work. Can you guys help me out?:
import random

def LongestRun(n):
    H = 0
    T = 1
    myList = []
    for i in range(n):
        # one random draw per flip
        if random.randint(0,1) == H:
            myList.append('H')
        else:
            myList.append('T')
I want this next piece to output two things:
"The longest run of heads was: " and then whatever the longest run of heads was.
"The longest run of tails was: " and whatever the longest run of tails was.
Please help me! Thank you guys!

from itertools import groupby
my_list = [1,1,0,0,0,1,1,1,1,0,0,1,1,1,1,1,1,0,1]
max(len(list(v)) for k,v in groupby(my_list) if k==1)
This is a fun way to group consecutive values and then find the length of the longest run of 1s. If you use "H"/"T" instead, just change the if condition at the end.
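For the heads/tails case, the same groupby idea might look like the sketch below. The function name `longest_run` and the sample flips are made up for illustration; `default=0` is added so that a face that never appears gives 0 instead of raising ValueError on an empty sequence.

```python
from itertools import groupby

def longest_run(seq, face):
    # groupby yields one (key, group) pair per run of equal consecutive items
    return max((len(list(group)) for key, group in groupby(seq) if key == face),
               default=0)

flips = list("HHTHTTTHH")
print(longest_run(flips, "H"))  # 2
print(longest_run(flips, "T"))  # 3
```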

I guess there is a way with higher performance than my solution, but it gets you what you want:
You can do the following also with lists instead of np.arrays:
import numpy as np
n = 100
choices = ['H', 'T']
HT_array = np.random.choice(choices, n) # creates a 1-D array of length n with random entries of H and T
max_h = 0
max_t = 0
count_h = 0
count_t = 0
for item in HT_array:
    if item == 'H':
        count_h += 1
        count_t = 0
        if count_h > max_h:
            max_h = count_h
    elif item == 'T':
        count_t += 1
        count_h = 0
        if count_t > max_t:
            max_t = count_t
print(max_t)
print(max_h)

My not so optimized version:
def LongestRun(myList, lookFor='H'):
    current_longest = 0
    max_longest = 0
    for x in myList:
        if x == lookFor:
            current_longest+=1
            if current_longest > max_longest:
                max_longest = current_longest
        else:
            current_longest=0
    return max_longest
myList = 'H H H H H T H T H T T H T T T T T T H H H H H H H H H H H H H T'.split()
print(LongestRun(myList))
print(LongestRun(myList, 'T'))

As @khuderm suggested, one solution is to have a counter that keeps track of the current run of heads or tails and two variables that keep track of the max run for each one.
Here's what the process should look like:
Initialize counter, max_H and max_T to zero.
If the new flip differs from the previous one ('H' after 'T' or vice versa), reset counter to zero.
Each time you append an 'H' or 'T', increment counter by 1.
After incrementing counter, if the corresponding max is less than counter, update max to the value of counter.
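The process described above can be sketched as follows. The function name `longest_runs` and the `previous` variable are illustrative choices, and the reset is done before the increment so that the first flip of a new run is counted.

```python
import random

def longest_runs(flips):
    counter = 0        # length of the current run
    max_h = 0
    max_t = 0
    previous = None    # face of the previous flip, None before the first one
    for flip in flips:
        if flip != previous:
            counter = 0            # a new run starts here
        counter += 1
        if flip == "H":
            max_h = max(max_h, counter)
        else:
            max_t = max(max_t, counter)
        previous = flip
    return max_h, max_t

print(longest_runs("HHTHTTTHH"))  # (2, 3)

# usage with random flips; the result differs from run to run
flips = [random.choice("HT") for _ in range(1000)]
max_h, max_t = longest_runs(flips)
print("The longest run of heads was:", max_h)
print("The longest run of tails was:", max_t)
```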

Keep track of the longest sequence as you go, resetting each sequence after comparing the current to the longest sequence:
import random

def LongestRun(n):
    my_t, my_h = 0, 0
    long_h, long_t = 0, 0
    for i in range(n):
        if not random.randint(0, 1):
            my_h += 1
            # if we have heads, check current len of tails seq and reset
            if my_t > long_t:
                long_t = my_t
            my_t = 0
        else:
            # else we have tails, check current len of heads seq and reset
            my_t += 1
            if my_h > long_h:
                long_h = my_h
            my_h = 0
    # the final run is still pending when the loop ends, so compare it too
    long_h = max(long_h, my_h)
    long_t = max(long_t, my_t)
    print("Longest run of heads was {}\nLongest run of tails was {}".format(long_h, long_t))
Output:
In [4]: LongestRun(1000)
Longest run of heads was 11
Longest run of tails was 13
In [5]: LongestRun(1000)
Longest run of heads was 7
Longest run of tails was 10
In [6]: LongestRun(1000)
Longest run of heads was 13
Longest run of tails was 8

Adding a triangle up from the bottom rows

I'm taking some guidance from this question (Max path triangle (Python)) but I stumbled upon it after I already started to write out what I thought.
I want to find the maximum of numbers within a triangle, leading from the bottom to the top. So once the loops reach the end the final position of the triangle will be the largest addition of the numbers from the rows below.
For instance...if this was the triangle:
2
3 7
8 2 10
2 6 9 4
It adds row n to row n-1 to find the maximum values, so if my code ran, the triangle would look like this after one iteration:
2
3 7
14 11 19
However the code I've written seems to not replace the elements in the list above.
for i in range(len(a)-1, 0, 1):
    for j in range(0, len(a[i])-1, 1):
        '''
        i = Row position
        j = Column position
        '''
        a[i-1][j] = max(a[i][j] + a[i-1][j], a[i][j+1] + a[i-1][j])
print(a)
I know it works, because when I put in numbers to check it spits out the correct answer. From the triangle I provided, the first numbers checked would be 2+8 and 6+8, making 14 the correct answer.
So what is wrong with my code?
Thanks :)

In your first for statement, you need to change the step to -1. range cannot count down from len(a)-1 to 0 with a positive step, so the original loop body never runs.
for i in range(len(a)-1, 0, -1):
    for j in range(0, len(a[i])-1, 1):
        '''
        i = Row position
        j = Column position
        '''
        a[i-1][j] = max(a[i][j] + a[i-1][j], a[i][j+1] + a[i-1][j])
print(a)
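As a self-contained illustration of the corrected loop, here is a small sketch that runs it on the triangle from the question. The function name `max_path_sum` is hypothetical, and the rows are copied first so the caller's list is left unchanged, which the original in-place code does not do.

```python
def max_path_sum(triangle):
    rows = [row[:] for row in triangle]  # work on a copy of each row
    # walk from the bottom row up to row 1, folding each row into the one above
    for i in range(len(rows) - 1, 0, -1):
        for j in range(len(rows[i]) - 1):
            rows[i - 1][j] += max(rows[i][j], rows[i][j + 1])
    return rows[0][0]

triangle = [[2], [3, 7], [8, 2, 10], [2, 6, 9, 4]]
print(max_path_sum(triangle))  # 28 (2 + 7 + 10 + 9)
```

After the first pass the third row becomes 14 11 19, matching the intermediate triangle shown in the question.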

python - print squares of numbers which are palindromes : improve efficiency

I have an assignment to do. The problem is something like this: you give a number, say x. The program calculates the squares of the numbers starting from 1 and prints a square only if it is a palindrome. The program continues to print such numbers until it reaches the number x provided by you.
I have solved the problem. It works fine up to x = 10000000, in that it executes in a reasonable amount of time. I want to improve the efficiency of my code, and I am open to changing the entire code if required. My aim is a program that can handle x = 10^20 within around 5 minutes.
limit = int(input("Enter a number"))
def palindrome(limit):
    count = 1
    base = 1
    while count < limit:
        base = base * base #square the number
        base = list(str(base)) #convert the number into a list of strings
        rbase = base[:] #make a copy of the number
        rbase.reverse() #reverse this copy
        if len(base) > 1:
            i = 0
            flag = 1
            while i < len(base) and flag == 1:
                if base[i] == rbase[i]: #compare the values at the indices
                    flag = 1
                else:
                    flag = 0
                i += 1
            if flag == 1:
                print(''.join(base)) #print if values match
        base = ''.join(base)
        base = int(base)
        base = count + 1
        count = count + 1
palindrome(limit)

Here's my version:
import sys
def palindrome(limit):
    for i in range(limit):
        istring = str(i*i)
        if istring == istring[::-1]:
            print(istring,end=" ")
    print()
palindrome(int(sys.argv[1]))
Timings for your version on my machine:
pu@pumbair: ~/Projects/Stackexchange time python3 palin1.py 100000
121 484 676 10201 12321 14641 40804 44944 69696 94249 698896 1002001 1234321
4008004 5221225 6948496 100020001 102030201 104060401 121242121 123454321 125686521
400080004 404090404 522808225 617323716 942060249
real 0m0.457s
user 0m0.437s
sys 0m0.012s
and for mine:
pu@pumbair: ~/Projects/Stackexchange time python3 palin2.py 100000
0 1 4 9
121 484 676 10201 12321 14641 40804 44944 69696 94249 698896 1002001 1234321
4008004 5221225 6948496 100020001 102030201 104060401 121242121 123454321 125686521
400080004 404090404 522808225 617323716 942060249
real 0m0.122s
user 0m0.104s
sys 0m0.010s
BTW, my version gives more results (0, 1, 4, 9).

Surely something like this will perform better (avoiding the unnecessary extra list operations) and is more readable:
def palindrome(limit):
    base = 1
    while base < limit:
        squared = str(base * base)
        reversed_str = squared[::-1]  # avoid shadowing the built-in reversed()
        if squared == reversed_str:
            print(squared)
        base += 1
limit = int(input("Enter a number: "))
palindrome(limit)

I think we can do it a little bit easier.
def palindrome(limit):
    count = 1
    while count < limit:
        base = count * count # square the number
        base = str(base) # convert the number into a string
        rbase = base[::-1] # make a reverse of the string
        if base == rbase:
            print(base) #print if values match
        count += 1
limit = int(input("Enter a number: "))
palindrome(limit)
The repeated conversions between number, string and list were unnecessary. Strings can be compared directly, so the character-by-character loop is not needed.

You can keep a list of square palindromes up to a certain limit (say L) in memory. If the input number x is less than sqrt(L), you can simply iterate over the list of palindromes and print them. This way you won't have to iterate over every number and check whether its square is a palindrome.
You can find a list of square palindromes here : http://www.fengyuan.com/palindrome.html

OK, here's my program. It caches valid suffixes for squares (i.e. the values of n^2 mod 10^k for a fixed k), and then searches for squares which have both that suffix and start with the suffix reversed. This program is very fast: in 24 seconds, it lists all the palindromic squares up to 10^24.
from collections import defaultdict
# algorithm will print palindromic squares x**2 up to x = 10**n.
# efficiency is O(max(10**k, n*10**(n-k)))
n = 16
k = 6
cache = defaultdict(list)
print(0, 0) # special case
# Calculate everything up to 10**k; these will be the prefix/suffix pairs we use later
tail = 10**k
for i in range(tail):
    if i % 10 == 0: # can't end with 0 and still be a palindrome
        continue
    sq = i*i
    s = str(sq)
    if s == s[::-1]:
        print(i, s)
    prefix = int(str(sq % tail).zfill(k)[::-1])
    cache[prefix].append(i)
prefixes = sorted(cache)
# Loop through the rest, but only consider matching prefix/suffix pairs
for l in range(k*2+1, n*2+1):
    for p in prefixes:
        low = (p * 10**(l-k))**.5
        high = ((p+1) * 10**(l-k))**.5
        low = int(low / tail) * tail
        high = (int(high / tail) + 1) * tail
        for base in range(low, high, tail): # renamed from n to avoid shadowing the limit above
            for suf in cache[p]:
                x = base + suf
                s = str(x*x)
                if s == s[::-1]:
                    print(x, s)
Sample output:
0 0
1 1
2 4
3 9
11 121
22 484
26 676
101 10201
111 12321
121 14641
202 40804
212 44944
<snip>
111010010111 12323222344844322232321
111100001111 12343210246864201234321
111283619361 12384043938083934048321
112247658961 12599536942224963599521
128817084669 16593841302620314839561
200000000002 40000000000800000000004
