I am solving LeetCode problem https://leetcode.com/problems/longest-substring-without-repeating-characters/:
Given a string s, find the length of the longest substring without repeating characters.
Constraints:
0 <= s.length <= 5 * 10^4
s consists of English letters, digits, symbols and spaces.
I used this sliding window algorithm:
def lengthOfLongestSubstring(str):
    # define base case
    if (len(str) < 2): return len(str)
    # define pointers and frequency counter
    left = 0
    right = 0
    freqCounter = {}  # used to store the character count
    maxLen = 0
    while (right < len(str)):
        # adds the character count into the frequency counter dictionary
        if (str[right] not in freqCounter):
            freqCounter[str[right]] = 1
        else:
            freqCounter[str[right]] += 1
        # print (freqCounter)
        # runs the while loop if we have a key-value with value greater than 1.
        # this means that there are repeated characters in the substring.
        # we want to move the left pointer by 1 until that value decreases to 1 again. E.g., {'a':2,'b':1,'c':1} to {'a':1,'b':1,'c':1}
        while (len(freqCounter) != right-left+1):
            # while (freqCounter[str[right]] > 1): ## Time Limit Exceeded Error
            print(len(freqCounter), freqCounter)
            freqCounter[str[left]] -= 1
            # remove the key-value if value is 0
            if (freqCounter[str[left]] == 0):
                del freqCounter[str[left]]
            left += 1
        maxLen = max(maxLen, right-left+1)
        # print(freqCounter, maxLen)
        right += 1
    return maxLen
print(lengthOfLongestSubstring("abcabcbb")) # 3 'abc'
I got the error "Time Limit Exceeded" when I submitted with this while loop:
while (freqCounter[str[right]] > 1):
instead of
while (len(freqCounter) != right-left+1):
I thought the first condition is just accessing an element in a dictionary, which is O(1), so I'm not sure why it would be significantly slower than the second version. Either way, this seems to mean my approach is not optimal. I thought sliding window would be the most efficient algorithm; did I implement it wrong?
Your algorithm's running time is close to the timeout limit for some tests -- I even got a time-out with the len(freqCounter) version. The two conditions you tried cannot differ that much in cost, so I would look into more drastic ways to improve the efficiency of the algorithm:
Instead of counting the frequency of letters, you could store the index of where you last found the character. This allows you to update left in one go, avoiding a second loop where you had to decrease frequencies at each unit step.
Performing a del is really not necessary.
You can also use some more pythonic looping, like with enumerate
Here is the update of your code applying those ideas (the first one is the most important one):
class Solution(object):
    def lengthOfLongestSubstring(self, s):
        lastpos = {}
        left = 0
        maxLen = 0
        for right, ch in enumerate(s):
            if lastpos.setdefault(ch, -1) >= left:
                left = lastpos[ch] + 1
            else:
                maxLen = max(maxLen, right - left + 1)
            lastpos[ch] = right
        return maxLen
Another boost can be achieved when you work with ASCII codes instead of characters, as then you can use a list instead of a dictionary. As the code challenge guarantees the characters are from a small set of basic characters, we don't need to take other character codes into consideration:
class Solution(object):
    def lengthOfLongestSubstring(self, s):
        lastpos = [-1] * 128
        left = 0
        maxLen = 0
        for right, asc in enumerate(map(ord, s)):
            if lastpos[asc] >= left:
                left = lastpos[asc] + 1
            else:
                maxLen = max(maxLen, right - left + 1)
            lastpos[asc] = right
        return maxLen
When submitting this, it scored very well in terms of running time.
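As a quick sanity check (my addition, not part of the original answer), either version can be exercised like this:

sol = Solution()
print(sol.lengthOfLongestSubstring("abcabcbb"))  # 3 ('abc')
print(sol.lengthOfLongestSubstring("pwwkew"))    # 3 ('wke')
print(sol.lengthOfLongestSubstring(""))          # 0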
Related
I wanted to know if the algorithm that I wrote just below in Python is correct.
My goal is to find an algorithm that prints/finds all the possible combinations of words that can be built from the characters '!' (decimal value 33) to '~' (decimal value 126) in the ASCII table.
Here is the code, using recursion:
import sys

byteWord = bytearray(b'\x20')  # Hex = '\x21' & Dec = '33' & Char = '!'
cntVerif = 0  # Test

def comb_fct(bytes_arr, cnt: int):
    global cntVerif  # Test
    if len(bytes_arr) > 3:  # Test
        print(f'{cntVerif+1}:TEST END')
        sys.exit()
    if bytes_arr[cnt] == 126:
        if cnt == len(bytes_arr) or len(bytes_arr) == 1:
            bytes_arr.insert(0, 32)
        bytes_arr[cnt] = 32
        cnt += 1
        cntVerif += 1  # Test
        print(f'{cntVerif}:if bytes_arr[cnt] == 126: \n\tbytes_arr = {bytes_arr}')  # Test
        comb_fct(bytes_arr, cnt)
    if cnt == -1 or cnt == len(bytes_arr)-1:
        bytes_arr[cnt] = bytes_arr[cnt] + 1
        cntVerif += 1  # Test
        print(f'{cntVerif}:if cnt==-1: \n\tbytes_arr = {bytes_arr}')  # Test
        comb_fct(bytes_arr, cnt=-1)  # index = -1 means last index
    bytes_arr[cnt] = bytes_arr[cnt] + 1
    cntVerif += 1  # Test
    print(f'{cntVerif}:None if: \n\tbytes_arr={bytes_arr}')  # Test
    comb_fct(bytes_arr, cnt+1)

comb_fct(byteWord, -1)
Thank you for your help. Python only allows a limited recursion depth (996 on my computer), so I can't verify, for example, whether my algorithm produces all the words of length 3 that can be built from the character range described above.
Of course, if anyone has a better idea for writing this algorithm (a faster algorithm, for example), I will be happy to read it.
Although you might be able to tweak this a bit, I think the code below is close to the most efficient solution to your problem, which I take to be "generate all possible sequences of maximum length N from a given set of characters". That might be a bit more general than you need, since your set of characters is fixed, but the general solution is more useful and little overhead is added.
Note that the function is written as a generator, using functions from the itertools standard library module. Itertools is described as a set of "functions creating iterators for efficient looping" (emphasis added), and it indeed is. Generators are one of Python's great features, since they allow you to easily and efficiently iterate over complex sequences. If you want to write efficient and "pythonic" code, you should familiarise yourself with these concepts (as well as other essential features, such as comprehensions). So I'm not going to explain these features further; please read the tutorial sections for details.
So here's the simple solution:
from itertools import product, chain

def genseq(maxlen, chars):
    return map(''.join,
               chain.from_iterable(product(chars, repeat=i)
                                   for i in range(maxlen+1)))

# Example usage:
chars = ''.join(chr(i) for i in range(33, 127))
for word in genseq(4, chars):
    pass  # Do something with word
There are 78,914,411 possible words (including the empty word); the above generates all of them in 7 seconds on my laptop. Much of that time is spent creating (and garbage collecting) those strings; you might well be able to do better using a bytearray and recycling it for each generated word. I didn't try that.
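That recycling idea could look roughly like this (my own sketch, untimed; genseq_bytes is a name I made up). Note that the caller must copy a word it wants to keep, as discussed further below:

from itertools import product

def genseq_bytes(maxlen, charcodes):
    for length in range(maxlen + 1):
        buf = bytearray(length)            # reused for every word of this length
        for combo in product(charcodes, repeat=length):
            buf[:] = combo                 # overwrite in place, no new object per word
            yield buf                      # caller must copy if it keeps the word

charcodes = range(33, 127)
count = sum(1 for _ in genseq_bytes(3, charcodes))
print(count)   # 1 + 94 + 94**2 + 94**3 == 839515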
For the record, here's a simpler way of "unindexing" an enumeration of such strings. The enumeration starts with the empty word, followed by all 1-character words, then 2-character words, and so on. This ordering makes it unnecessary to specify the length (or even maximum length) of the resulting string.
def unindex(i, chars):
    v = []
    n = len(chars)
    while i > 0:
        i -= 1
        v.append(i % n)
        i //= n
    return ''.join(chars[j] for j in v[::-1])

# Example to generate the same words as above:
# chars as above
index_limit = (len(chars) ** 5 - 1) // (len(chars) - 1)
for i in range(0, index_limit):
    word = unindex(i, chars)
    # Do something with word
Again, you can probably speed this up a bit by using a recycled bytearray. As written above, it took about two minutes, sixteen times as long as my first version.
Note that using bytearrays in the way you do in your answer does not significantly speed things up, because it creates a new bytearray each time. In order to achieve the savings, you have to use a single bytearray for the entire generation, modifying it rather than recreating it. That's more awkward in practice, because it means that if you need to keep a generated word around for later, perhaps because it passed some test, you must copy it. It's easy to forget that, and the resulting bug can be very hard to track down.
You don't need recursion here. Consider your word as an n-digit number, where the digits are ASCII symbols in the range of interest ([!..~]). Start with the smallest one (all !) and increment it by 1 until you reach the largest (all ~).
To increment the long number, add 1 to the least significant byte. If it goes past ~, wrap it back to ! and carry into (increment) the next byte, and so on.
Keep in mind that the number of words is huge: there are 94 ** n n-letter words. For n == 4 there are 78,074,896 of them.
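A small sketch of this counting approach (my own illustration, not the answerer's code; the names are mine):

FIRST, LAST = ord('!'), ord('~')   # 33 and 126

def words_up_to(maxlen):
    for length in range(1, maxlen + 1):
        word = bytearray([FIRST] * length)   # smallest word of this length: all '!'
        while True:
            yield word.decode('ascii')
            # increment the least significant byte, carrying to the left on overflow
            pos = length - 1
            while pos >= 0 and word[pos] == LAST:
                word[pos] = FIRST
                pos -= 1
            if pos < 0:                      # wrapped past all '~': done with this length
                break
            word[pos] += 1

# Example: count all words of length up to 2 -> 94 + 94**2 == 8930
print(sum(1 for _ in words_up_to(2)))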
EXPLANATION:
To solve this problem, I think I've found a more elegant and faster way to do it, without using a recursive algorithm.
Complexity:
I also think it is time and space optimal.
It runs in O(n) time, with n the total number of possible combinations, which can be very large, and theoretically in O(1) space. Concerning space: because of how Python works, my code in practice creates a lot of bytearrays. This can be corrected with a light modification, but for better code check the solution posted by @ricci, which I marked as the accepted answer.
Mathematical principle used:
I am using the fact that there exists a bijection between the numbers in decimal (base 10) and the numbers in base 94.
Each number in base 94 can be written as a unique sequence of characters taken from the range [33, 126] (in decimal value) of the ASCII code.
Example of base conversion:
https://www.rapidtables.com/convert/number/decimal-to-hex.html
The operator '//' is the quotient operator and the operator '%' is the modulo operator.
I will be happy if anyone can confirm that my solution is correct. :-)
ALGORITHM
VERSION 1:
If you are NOT interested in getting the sequences of words starting with '!'.
For example, with length 2, you are NOT interested in the words of the form '!!' ... '!A', '!B', ... '!R' ... '!~' (since in our base '!' is equivalent to zero).
# Get all ascii relevant character in a list
asciiList = []
for c in (chr(i) for i in range(33, 127)):
    asciiList.append(c)
print(f'ascii List: \n{asciiList} \nlist length: {len(asciiList)}')

def base10_to_base94_fct(int_to_convert: int) -> str:
    sol_str = ''
    loop_condition = True
    while loop_condition is True:
        quo = int_to_convert // 94
        mod = int_to_convert % 94
        sol_str = asciiList[mod] + sol_str
        int_to_convert = quo
        if quo == 0:
            loop_condition = False
    return sol_str

# test = base10_to_base94_fct(94**2-1)
# print(f'TEST result: {test}')

def comb_fct(word_length: int) -> None:
    max_iter = 94**word_length
    cnt = 1
    while cnt < max_iter:
        str_tmp = base10_to_base94_fct(cnt)
        cnt += 1
        print(f'{cnt}: Current word check:{str_tmp}')

# Test
comb_fct(3)
VERSION 2:
If you are interested in getting the sequences of words starting with '!'.
For example, with length 2, you are interested in the words of the form '!!' ... '!A', '!B', ... '!R' ... '!~' (since in our base '!' is equivalent to zero).
# Get all ascii relevant character in a list
asciiList = []
for c in (chr(i) for i in range(33, 127)):
    asciiList.append(c)
print(f'The word should contain only the character in the following ascii List: \n{asciiList} \nlist length: {len(asciiList)}')

def base10_to_base94_fct(int_to_convert: int, str_length: int) -> bytearray:
    sol_str = bytearray(b'\x21') * str_length
    digit_nbr = str_length-1
    loop_condition = True
    while loop_condition is True:
        quo = int_to_convert // 94
        mod = int_to_convert % 94
        sol_str[digit_nbr] = 33 + mod
        digit_nbr -= 1
        int_to_convert = quo
        if digit_nbr == -1:
            loop_condition = False
    return sol_str

def comb_fct(max_word_length: int) -> None:
    max_iter_abs = (94/93) * (94**max_word_length-1)  # sum of a geometric series: 94 + 94^2 + 94^3 + 94^4 + ... + 94^N
    max_iter_rel = 94
    word_length = 1
    cnt_rel = 0  # rel = relative
    cnt_abs = 0  # abs = absolute
    while cnt_rel < max_iter_rel**word_length and cnt_abs < max_iter_abs:
        str_tmp = base10_to_base94_fct(cnt_rel, word_length)
        print(f'{cnt_abs}:Current word test:{str_tmp}.')
        print(f'cnt_rel = {cnt_rel} and cnt_abs={cnt_abs}')
        if str_tmp == bytearray(b'\x7e') * word_length:
            word_length += 1
            cnt_rel = 0
            continue
        cnt_rel += 1
        cnt_abs += 1

comb_fct(2)  # Test
Recently I saw a competitive coding question where the brute-force approach doesn't meet the time limit. Is there any other solution for this?
Question:
An expanding sequence is given which starts with 'a'; we should replace each character in the following way:
a=>ab
b=>cd
c=>cd
d=>ab
Therefore it will look like this after each iteration:
a
ab
abcd
abcdcdab
abcdcdabcdababcd
.......
A number n will be given as input; the function should return the character at the nth position.
I have tried the brute-force approach of forming the full string and returning the character at n, but the time limit was exceeded.
I have tried the following:
dictionary={
    'a':'ab',
    'b':'cd',
    'c':'cd',
    'd':'ab'
}
string="a"
n=128
while len(string)<n:
    new_string=''
    for i in string:
        new_string+=dictionary[i]
    string=new_string
print(string[n-1])
The solution to problems like this is never to actually generate all the strings.
Here's a fast solution that descends directly through the tree of substitutions:
dictionary={
    'a':['a','b'],
    'b':['c','d'],
    'c':['c','d'],
    'd':['a','b']
}

def nth_char(n):
    # Determine how many levels of substitution are required
    # to produce the nth character.
    # Remember the size of the last level
    levels = 1
    totalchars = 1
    lastlevelsize = 1
    while totalchars < n:
        levels += 1
        lastlevelsize *= 2
        totalchars += lastlevelsize
    # position of the target char in the last level
    pos = (n-1) - (totalchars - lastlevelsize)
    # start at char 1, and find the path to the target char
    # through the levels
    current = 'a'
    while levels > 1:
        levels -= 1
        # next iteration, we'll go to the left or right subtree
        totalchars -= lastlevelsize
        # half of the last level size is the last level size in the next iteration
        lastlevelsize = lastlevelsize//2
        # is the target char a child of the left or right substitution product?
        # each corresponds to a contiguous part of the last level
        if pos < lastlevelsize:
            # left - take the left part of the last level
            current = dictionary[current][0]
        else:
            # right - take the right part of the last level
            current = dictionary[current][1]
            pos -= lastlevelsize
    return current

print(nth_char(17))
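As a quick sanity check (my addition, not part of the original answer), the result can be compared against a brute-force expansion. Note that nth_char, as written, counts positions across the concatenation of all iterations ("a" + "ab" + "abcd" + ...), which is how totalchars is accumulated above:

mapping = {'a': 'ab', 'b': 'cd', 'c': 'cd', 'd': 'ab'}
levels = ['a']
for _ in range(4):
    levels.append(''.join(mapping[ch] for ch in levels[-1]))
concatenated = ''.join(levels)   # "a" + "ab" + "abcd" + "abcdcdab" + ...
assert all(nth_char(n) == concatenated[n - 1] for n in range(1, len(concatenated) + 1))
print('nth_char matches the brute-force expansion for the first', len(concatenated), 'positions')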
This is my current approach:
def isPalindrome(s):
    if (s[::-1] == s):
        return True
    return False

def solve(s):
    l = len(s)
    ans = ""
    for i in range(l):
        subStr = s[i]
        for j in range(i + 1, l):
            subStr += s[j]
            if (j - i + 1 <= len(ans)):
                continue
            if (isPalindrome(subStr)):
                ans = max(ans, subStr, key=len)
    return ans if len(ans) > 1 else s[0]

print(solve(input()))
My code exceeds the time limit according to the auto-scoring system. I've already spent some time looking on Google; all of the solutions I found have the same idea with no optimization, or use dynamic programming, but sadly I must use only brute force to solve this problem. I was trying to break out of the loops earlier by skipping all the substrings that are shorter than the longest palindromic string found so far, but I still ended up failing to meet the time requirement. Is there any other way to break these loops earlier, or a more time-efficient approach than the above?
With subStr += s[j], a new string is created over the length of the previous subStr. And with s[::-1], the substring from the previous offset j is copied over and over again. Both are inefficient because strings are immutable in Python and have to be copied as a new string for any string operation. On top of that, the string comparison in s[::-1] == s is also inefficient because you've already compared all of the inner substrings in the previous iterations and need to compare only the outermost two characters at the current offset.
You can instead keep track of just the index and the offset of the longest palindrome so far, and only slice the string upon return. To account for palindromes of both odd and even lengths, you can either increase the index by 0.5 at a time, or double the length to avoid having to deal with float-to-int conversions:
def solve(s):
    length = len(s) * 2
    index_longest = offset_longest = 0
    for index in range(length):
        offset = 0
        for offset in range(1 + index % 2, min(index, length - index), 2):
            if s[(index - offset) // 2] != s[(index + offset) // 2]:
                offset -= 2
                break
        if offset > offset_longest:
            index_longest = index
            offset_longest = offset
    return s[(index_longest - offset_longest) // 2: (index_longest + offset_longest) // 2 + 1]
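A quick usage check (my addition, not part of the original answer), using the example string mentioned in another answer in this thread:

print(solve("xyzabccbalmn"))  # abccba
print(solve("abcba"))         # abcba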
Solved by using the "Expand Around Center" approach, thanks @Maruthi Adithya.
This modification of your code should improve performance. You can stop when the maximum possible substring is smaller than your already computed answer. Also, you should start your second loop at i + len(ans) + 1 instead of i + 1 to avoid useless iterations:
def solve(s):
    l = len(s)
    ans = ""
    for i in range(l):
        # stop when the remaining suffix cannot beat the current answer
        if l - i <= len(ans):
            break
        # start directly with a substring one character longer than the current answer
        subStr = s[i:i + len(ans) + 1]
        for j in range(i + len(ans) + 1, l + 1):
            if isPalindrome(subStr):
                ans = subStr
            if j < l:
                subStr += s[j]
    return ans if len(ans) > 1 else s[0]
This is an approach whose time complexity is greater than that of the solutions provided above.
Note: This post is to think about the problem better and does not specifically answer the question. I have taken a mathematical approach to find a time complexity greater than 2^L (where L is the size of the input string).
Note: This is a post to discuss potential algorithms. You will not find the answer here. And the logic shown here has not been proven extensively.
Do let me know if there is something that I haven't considered.
Approach: Create the set of possible substrings. Compare and find the maximum pair* from this set that gives the longest possible palindrome.
Example case with input string: "abc".
In this example, the substring set is: "a","b","c","ab","ac","bc","abc".
7 elements.
Comparing each element with all other elements will involve 7^2 = 49 calculations.
Hence, the input size is 3 and the number of calculations is 49.
Time Complexity:
First compute time complexity for generating the substring set:
\sum_{a=1}^{L} \left( C_{a}^{L} \right)
Here, we are adding all the different substring size combination from the input size L.
To make it clear: In the above example input size is 3. So we find all the pairs with size =1 (i.e: "a","b","c"). Then size =2 (i.e: "ab","ac","bc") and finally size = 3 (i.e: "abc").
So choosing 1 character from input string = combination of taking L things 1 at a time without repetition.
In our case number of combinations = 3.
This can be mathematically shown as (where a = 1):
C_{a}^{L}
Similarly choosing 2 char from input string = 3
Choosing 3 char from input string = 1
Finding time complexity of palindrome pair from generated set with maximum length:
Size of generated set: N
For this we have to compare each string in set with all other strings in set.
So N*N, or 2 for loops. Hence the final time complexity is:
\sum_{a=1}^{L} \left( C_{a}^{L} \right)^{2}
This is a diverging function, greater than 2^L for L > 1.
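For reference (my addition, not in the original post), the Vandermonde identity gives this sum a closed form that supports the claim:

\sum_{a=1}^{L} \left( C_{a}^{L} \right)^{2} \;=\; C_{L}^{2L} - 1 \;\sim\; \frac{4^{L}}{\sqrt{\pi L}},

which is indeed larger than 2^L for every L > 1.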
However, there can be multiple optimizations applied to this. For example: there is no need to compare "a" with "abc" as "a" will also be compared with "a". Even if this optimization is applied, it will still have a time complexity > 2^L (For the most cases).
Hope this gave you a new perspective to the problem.
PS: This is my first post.
You should not search for the palindrome starting from the beginning of the string; instead, start from the middle and expand the current string outward.
For example, for the string xyzabccbalmn, your solution will cost ~ 6 * 11 comparisons, but searching from the middle will cost ~ 11 * 2 + 2 operations.
But anyhow, brute-forcing will never ensure that your solution will run fast enough for any arbitrary string.
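Here is a minimal sketch of that center-expansion idea (my own illustration, not the answerer's code; the function name longest_palindrome is mine):

def longest_palindrome(s):
    if not s:
        return ""
    best = s[0]
    for center in range(len(s)):
        # odd-length palindromes centered at `center`,
        # even-length palindromes centered between `center` and `center + 1`
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            candidate = s[left + 1:right]
            if len(candidate) > len(best):
                best = candidate
    return best

print(longest_palindrome("xyzabccbalmn"))  # abccba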
Try this:
def solve(s):
    if len(s)==1:
        print(0)
        return '1'
    if len(s)<=2 and not(isPalindrome(s)):
        print(0)
        return '1'
    elif isPalindrome(s):
        print(len(s))
        return '1'
    elif isPalindrome(s[0:len(s)-1]) or isPalindrome(s[1:len(s)]):
        print(len(s)-1)
        return '1'
    elif len(s)>=2:
        solve(s[0:len(s)-1])
        return '1'
    return 0
An array of length t has all elements initialized to 1. Now we can perform two types of queries on the array:
replace the element at the ith index with 0; this query is denoted by 0 index
find and print an integer denoting the index of the kth 1 in the array on a new line; if no such index exists, print -1; this query is denoted by 1 k
Now suppose for an array of length t=4 all elements are initially [1,1,1,1]; for query 0 2 the array becomes [1,0,1,1], and for query 1 3 the output is 4.
I have used a brute-force approach, but how can I make the code more efficient?
n,q=4,2
arr=[1]*4
for i in range(q):
    a,b=map(int,input().split())
    if a==0:
        arr[b-1]=0
    else:
        flag=True
        count=0
        target=b
        for i,j in enumerate(arr):
            if j ==1:
                count+=1
                if count==target:
                    print(i+1)
                    flag=False
                    break
        if flag:
            print(-1)
I have also tried first appending all the indexes of 1s to a list and then doing binary search, but the pop on a 0-query changes the indices, which makes the code fail:
def binary_search(low,high,b):
    while(low<=high):
        mid=((high+low)//2)
        #print(mid)
        if mid+1==b:
            print(stack[mid]+1)
            return
        elif mid+1>b:
            high=mid-1
        else:
            low=mid+1

n=int(input())
q=int(input())
stack=list(range(n))
for i in range(q):
    a,b=map(int,input().split())
    if a==0:
        stack.pop(b-1)
        print(stack)
    else:
        if len(stack)<b:
            print(-1)
            continue
        else:
            low=0
            high=len(stack)-1
            binary_search(low,high,b)
You could build a binary tree where each node gives you the number of ones that are below and at the left of it. So if n is 7, that tree would initially look like this (the actual list with all ones is shown below it):
          4
       /     \
      2       2
     / \     / \
    1   1   1   1
  -----------------
   1 1 1 1 1 1 1 -
Setting the array element at index 4 (zero-based) to 0, would change that tree to:
          4
       /     \
      2       1*
     / \     / \
    1   1   0*  1
  -----------------
   1 1 1 1 0*1 1 -
Setting an element to 0 thus takes O(log n) time.
Finding the index of the k-th one can also be done in the same time complexity, by using the node counts to decide the direction while descending down the tree.
Here is Python code you could use. It represents the tree in a list in breadth-first order. I have not gone to great lengths to further optimise the code, but it has the above time complexities:
class Ones:
    def __init__(self, n):  # O(n)
        self.lst = [1] * n
        self.one_count = n
        self.tree = []
        self.size = 1 << (n-1).bit_length()
        at_left = self.size // 2
        width = 1
        while width <= at_left:
            self.tree.extend([at_left//width] * width)
            width *= 2

    def clear_index(self, i):  # O(logn)
        if i >= len(self.lst) or self.lst[i] == 0:
            return
        self.one_count -= 1
        self.lst[i] = 0
        # Update tree
        j = 0
        bit = self.size >> 1
        while bit >= 1:
            go_right = (i & bit) > 0
            if not go_right:
                self.tree[j] -= 1
            j = j*2 + 1 + go_right
            bit >>= 1

    def get_index_of_ith_one(self, num_ones):  # O(logn)
        if num_ones <= 0 or num_ones > self.one_count:
            return -1
        j = 0
        k = 0
        bit = self.size >> 1
        while bit >= 1:
            go_right = num_ones > self.tree[j]
            if go_right:
                k |= bit
                num_ones -= self.tree[j]
            j = j*2 + 1 + go_right
            bit >>= 1
        return k

    def is_consistent(self):  # Only for debugging
        # Check that list can be derived by calling get_index_of_ith_one for all i
        lst = [0] * len(self.lst)
        for i in range(1, self.one_count+1):
            lst[self.get_index_of_ith_one(i)] = 1
        return lst == self.lst

# Example use
ones = Ones(12)
print('tree', ones.tree)
ones.clear_index(5)
ones.clear_index(2)
ones.clear_index(1)
ones.clear_index(10)
print('tree', ones.tree)
print('lst', ones.lst)
print('consistent = ', ones.is_consistent())
Be aware that this treats indexes as zero-based, while the method get_index_of_ith_one expects an argument that is at least 1 (but it returns a zero-based index).
It should be easy to adapt to your needs.
Complexity
Creation: O(n)
Clear at index: O(log n)
Get index of one: O(log n)
Space complexity: O(n)
Let's start with some general tricks:
Check if the n-th element is too big for the list before iterating. If you also keep a "counter" that stores the number of zeros, you could even check if nth >= len(the_list) - number_of_zeros (not sure if >= is correct here, it seems like the example uses 1-based indices so I could be off-by-one). That way you save time whenever too big values are used.
Use more efficient functions.
So instead of input you could use sys.stdin.readline (note that it will include the trailing newline).
And, even though it's probably not useful in this context, the built-in bisect module would be better than the binary_search function you created.
You could also use for _ in itertools.repeat(None, q) instead of for i in range(q), that's a bit faster and you don't need that index.
Then you can use some more specialized facts about the problem to improve the code:
You only store zeros and ones, so you can use if j to check for ones and if not j to check for zeros. These will be a bit faster than manual comparisons especially in when you do that in a loop.
Every time you look for the nth 1, you could create a temporary dictionary (or a list) that maps each encountered n to its index. Then re-use that cache for subsequent queries (dict lookup and list random access are O(1), while your search is O(n)). You could even extend it if you have further queries with no change in between.
However, if a change happens, you either need to discard that dictionary (or list) or update it (a rough sketch of this idea follows below).
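Here is a rough sketch of that caching idea (my own illustration, hypothetical names, using the simplest possible invalidation strategy):

def answer_queries(arr, queries):
    ones_positions = []    # 1-based positions of ones found so far, in scan order
    scanned_upto = 0       # how far the array has been scanned already
    for a, b in queries:
        if a == 0:
            arr[b - 1] = 0
            ones_positions = []    # simplest strategy: discard the cache on any change
            scanned_upto = 0
        else:
            # extend the cache only as far as this query needs
            while len(ones_positions) < b and scanned_upto < len(arr):
                if arr[scanned_upto]:
                    ones_positions.append(scanned_upto + 1)
                scanned_upto += 1
            print(ones_positions[b - 1] if len(ones_positions) >= b else -1)

answer_queries([1, 1, 1, 1], [(0, 2), (1, 3)])   # prints 4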
A few nitpicks:
The variable names are not very descriptive, you could use for index, item in enumerate(arr): instead of i and j.
You use a list, so arr is a misleading variable name.
You have two i variables.
But don't get me wrong. It's a very good attempt and the fact that you use enumerate instead of a range is great and shows that you already write pythonic code.
Consider something akin to the interval tree:
root node covers the entire array
children nodes cover left and right halves of the parent range respectively
each node holds the number of ones in its range
Both replace and search queries could be completed in logarithmic time.
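Here is a minimal sketch of such a structure (my own illustration of the idea, not the answerer's code; class and method names are mine). Each node stores the count of ones in its range; a "0 index" query walks up from a leaf, and a "1 k" query descends from the root:

class OnesTree:
    def __init__(self, n):
        self.n = n
        self.size = 1
        while self.size < n:
            self.size *= 2
        # leaves live at positions [size .. 2*size-1]; the first n leaves start as 1
        self.count = [0] * (2 * self.size)
        for i in range(n):
            self.count[self.size + i] = 1
        for i in range(self.size - 1, 0, -1):
            self.count[i] = self.count[2 * i] + self.count[2 * i + 1]

    def set_zero(self, index):          # query "0 index", 1-based
        i = self.size + index - 1
        if self.count[i] == 0:
            return
        while i >= 1:                   # decrement the leaf and all its ancestors
            self.count[i] -= 1
            i //= 2

    def kth_one(self, k):               # query "1 k", returns a 1-based index or -1
        if k > self.count[1]:
            return -1
        i = 1
        while i < self.size:            # descend toward the k-th one
            if self.count[2 * i] >= k:
                i = 2 * i
            else:
                k -= self.count[2 * i]
                i = 2 * i + 1
        return i - self.size + 1

# Example from the question: t = 4, queries "0 2" then "1 3" -> 4
tree = OnesTree(4)
tree.set_zero(2)
print(tree.kth_one(3))  # 4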
Refactored with fewer lines, so more efficient in terms of line count, but the run time is probably the same O(n):
n,q=4,2
arr=[1]*4
for i in range(q):
    query, target = map(int,input('query target: ').split())
    if query == 0:
        arr[target-1] = 0
    else:
        count=0
        items = enumerate(arr, 1)
        try:
            while count < target:
                index, item = next(items)
                count += item
        except StopIteration as e:
            index = -1
        print(index)
Assumes arr contains ONLY ones and zeroes - you don't have to check if an item is one before you add it to count, since adding zero has no effect.
No flags to check, just keep calling next on the enumerate object (items) till you reach your target or the end of arr.
For runtime efficiency, using an external library but basically the same process (algorithm):
import numpy as np

for i in range(q):
    query, target = map(int,input('query target: ').split())
    if query == 0:
        arr[target-1] = 0
    else:
        index = -1
        a = np.array(arr).cumsum() == target
        if np.any(a):
            index = np.argmax(a) + 1
        print(index)
I am trying to use the regex module to find non-overlapping repeats (duplicated sub-strings) within a given string (30 char), with the following requirements:
I am only interested in non-overlapping repeats that are 6-15 char long.
allow 1 mis-match
return the positions for each match
One way I thought of is, for each possible repeat length, to let Python loop through the 30-char input string. For example:
string = "ATAGATATATGGCCCGGCCCATAGATATAT" #input
#for 6char repeats, first one in loop would be for the following event:
text = "ATAGAT"
text2 ="(" + text + ")"+ "{e<=1}" #this is to allow 1 mismatch later in regex
string2="ATATGGCCCGGCCCATAGATATAT" #string after excluding text
for x in regex.finditer(text2,string2,overlapped=True):
print x.span()
#then still for 6char repeats, I will move on to text = "TAGATA"...
#after 6char, loop again for 7char...
There should be two outputs for this particular string "ATAGATATATGGCCCGGCCCATAGATATAT": 1. the two occurrences of "ATAGATATAT" + 1 mismatch, i.e. "ATAGATATATG" & "CATAGATATAT", with position indexes returned as (0,10) & (19,29); 2. "TGGCCC" & "GGCCCA" (one mismatch needs to be added to reach at least 6 chars), with indexes (9,14) & (15,20). The numbers can be in a list or table.
I'm sorry that I didn't include a real loop, but I hope the idea is clear. As you can see, this is a rather inefficient method, not to mention that it would create redundancy -- e.g. a 10-char repeat would be counted more than once, because it would also match in the 9-, 8-, 7- and 6-char repeat loops. Moreover, I have a lot of such 30-char strings to work with, so I would appreciate your advice on cleaner methods.
Thank you very much:)
I'd try a straightforward algorithm instead of regex (which is quite confusing in this instance):
s = "ATAGATATATGGCCCGGCCCATAGATATAT"
def fuzzy_compare(s1, s2):
# sanity check
if len(s1) != len(s2):
return False
diffs = 0
for a, b in zip(s1, s2):
if a != b:
diffs += 1
if diffs > 1:
return False
return True
slen = len(s) # 30
for l in range(6, 16):
i = 0
while (i + l * 2) <= slen:
sub1 = s[i:i+l]
for j in range(i+l, slen - l):
sub2 = s[j:j+l]
if fuzzy_compare(sub1, sub2):
# checking if this could be partial
partial = False
if i + l < j and j + l < slen:
extsub1 = s[i:i+l+1]
extsub2 = s[j:j+l+1]
# if it is partial, we'll get it later in the main loop
if fuzzy_compare(extsub1, extsub2):
partial = True
if not partial:
print (i, i+l), (j, j+l)
i += 1
It's a first draft, so feel free to experiment with it. It also seems clunky and not optimal, but try running it first - it may be sufficient.