Extracting the subsequence of maximum length from a sequence [PYTHON] [duplicate] - python

This question already has an answer here:
Longest increasing unique subsequence
(1 answer)
Closed 6 years ago.
I have a sequence of values [1,2,3,4,1,5,1,6,7], and I have to find the longest run of increasing values. The function needs to stop counting once it reaches a number lower than the previous one. The answer for this sequence is [1,2,3,4], as it has 4 values before the count is reset. How would I write the Python code for this?
Note: Finding the "longest increasing subsequence" seems to be a common challenge and so searching online I find a lot of solutions that would count for the entire length of the sequence, and return a subsequence of increasing values, ignoring any decrease, so in this case it would return [1,2,3,4,5,6,7]. That is not what I'm looking for.
It needs to count each subsequence, and reset the count upon reaching a number lower than the previous one. It then needs to compare all the subsequences counted, and return the longest one.
Thanks in advance.

Consider a function that generates all possible ascending subsequences: you would start with an empty list and add items until one element is less than (or equal to?) the previous one, at which point you save (yield) the subsequence and restart with a new one.
One implementation using a generator could be this:
def all_ascending_subsequences(sequence):
    #use an iterator so we can pull out the first element without slicing
    seq = iter(sequence)
    try: #NOTE 1
        last = next(seq) # grab the first element from the sequence
    except StopIteration: # or if there are none just return
        #yield [] #NOTE 2
        return
    sub = [last]
    for value in seq:
        if value > last: #or check if their difference is exactly 1 etc.
            sub.append(value)
        else: #end of the subsequence, yield it and reset sub
            yield sub
            sub = [value]
        last = value
    #after the loop we send the final subsequence
    yield sub
Two notes about the handling of empty sequences:
NOTE 1: To finish a generator a StopIteration needs to be raised, so we could just let the one from next(seq) propagate out. However, when from __future__ import generator_stop is in effect (the default since Python 3.7) it would cause a RuntimeError, so to be future compatible we need to catch it and explicitly return.
NOTE 2: As I've written it, passing an empty list to all_ascending_subsequences generates no values, which may not be the desired behaviour. Feel free to uncomment the yield [] to generate an empty list when passed an empty list.
Then you can just get the longest by calling max on the result with key=len
b = [1,2,3,4,1,5,1,6,7]
result = max(all_ascending_subsequences(b),key=len)
print("longest is", result)
#print(*all_ascending_subsequences(b))

b = [4,1,6,3,4,5,6,7,3,9,1,0]
def findsub(a):
    start = -1
    count = 0
    cur = a[0]
    for i, n in enumerate(a):
        if n == cur+1:  # compare values with ==, not identity with `is`
            if start == -1:
                start = i - 2
                count=1
            count+=1
            cur = n
        if n < cur and count > 1:
            return [a[j] for j in range(start,start+count+1)]
print(findsub(b))
A somewhat sloppy algorithm, but I believe it does what you want. Usually I would not have simply shown you code, but I suspect that is what you wanted, and I hope you can learn from it and create your own from what you learn.
A slightly better-looking way, because I didn't like that:
b = [1,2,0,1,2,3,4,5]
def findsub(a):
    answer = [a[0]]
    for i in a[1:]:
        if answer[-1] + 1 == i:  # compare values with ==, not identity with `is`
            answer.append(i)
        if not len(answer) > 1:
            answer = [i]
        elif i < answer[-1] and len(answer) > 1:
            return answer
    return answer
print(findsub(b))

You need to do the following:
1. Create a function W that, given a list, returns the index of the first item that breaks the strictly increasing run from the start of the list (or the length of the list if no item does).
   For example, given the lists [1,2,3,4,1,2,3], [4,2,1], [5,6,7,8], it should return 4, 1, 4, respectively.
2. Create a variable maxim and set its value to 0.
3. Repeatedly do the following until your list is empty:
   - Call your function W on your list, and call the return value x.
   - If x is greater than maxim, set maxim to x. At this point, if you wish to store this sequence, you can use list-slice notation to get the portion of your list that contains it.
   - Delete that portion of your list from the list.
4. Finally, print maxim and, if you were storing the parts of your list containing the longest sequence, print the last one you stored.
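The steps above can be sketched in Python roughly as follows. This is only an illustration, not the answerer's own code: the names increasing_prefix_length (playing the role of W) and longest_increasing_run are hypothetical, and it assumes "strictly increasing" is the intended comparison.

```python
def increasing_prefix_length(seq):
    """Role of W: index of the first item that breaks the strictly
    increasing prefix (len(seq) if nothing breaks it)."""
    for i in range(1, len(seq)):
        if seq[i] <= seq[i - 1]:
            return i
    return len(seq)


def longest_increasing_run(seq):
    """Repeatedly peel off the increasing prefix and keep the longest one."""
    rest = list(seq)          # work on a copy so the caller's list survives
    best = []
    while rest:
        x = increasing_prefix_length(rest)
        if x > len(best):     # len(best) plays the role of maxim
            best = rest[:x]
        rest = rest[x:]       # delete the portion just examined
    return best


print(longest_increasing_run([1, 2, 3, 4, 1, 5, 1, 6, 7]))
```

Re-slicing the remainder on every pass keeps the sketch close to the described steps; for very long inputs, tracking a start index instead would avoid the repeated copying.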

Related

Find the number from the set that is not present in the array

I was given this question in an interview: You are given a set of numbers {1..N} and an array A[N-1]. Find the number from the set that is not present in the array. Below is the code and pseudocode I have so far, that doesn't work.
I am assuming that there is one (and only one) number in the set that isn’t in the array
loop through each element in the set
    loop through each element in the array O(n)
        check to see if the number is in the array
            if it is, do nothing
            else, early return the number
def findMissingNo(arr, s):
    for num in s: #loop through each element in the set
        for num2 in arr: ##loop through each element in the array O(n)
            if (num == num2): #if the number in the set is in the array, break
                break
        print (num)
        return num #if the number in the set is not in the array, early return the number
    return -1 #return -1 if there is no missing element
s1 = {1,4,5}
arr1 = [1,4]
findMissingNo(arr1, s1)
By definition, we have a set of numbers from 1 to N and an array of size N-1 that contains the numbers from 1 to N with one number missing, and we have to find that number.
Since only one number is missing, the set has n elements, and the array has n-1 elements, the array is a subset of the set with exactly one element absent. That means
all_number_of_set = all_number_of_array + missing_number
also
sum_of_all_number_of_set = sum_of_array_number + missing_number
which implies
missing_number = sum_of_all_number_of_set - sum_of_array_number
pseudo code
def findMissingNo(set_, arr_):
    return sum(set_) - sum(arr_)
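When the set really is {1..N}, as the interview question states, its sum does not need to be computed from the set at all: it is N(N+1)/2. The sketch below relies on that assumption; the function name find_missing_number is hypothetical, and it would not apply to the asker's test data {1,4,5}, which is not a contiguous range.

```python
def find_missing_number(arr):
    """Return the one number from 1..N that is absent from arr,
    assuming arr holds N-1 distinct values drawn from 1..N."""
    n = len(arr) + 1                    # the array is one element short of N
    expected_total = n * (n + 1) // 2   # 1 + 2 + ... + N
    return expected_total - sum(arr)


print(find_missing_number([1, 2, 4, 5]))
```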
If I understood your question correctly, you are looking for an efficient way to find the number in the set that does not exist in the list. I see you are using a nested loop, which is O(n^2). I would suggest building a dict from the list, which is O(n), and then looping over the set, O(n), with an O(1) dictionary lookup for each element. Considering a large list that is a subset of the set:
def findMissingNo(arr_list, s_list):
    d = dict()
    for el in arr_list:
        d.update({el: el})
    for s in s_list:
        try:
            d[s]
            pass
        except KeyError:
            return s
    return -1
s1 = {1,4,5}
arr1 = [1,4]
findMissingNo(arr1, s1)
Hope it helps:)
Your function is quadratic, because it has to check the whole list for each item in the set.
It's important that you don't iterate over the set. Yes, that can work, but you're showing that you don't know the time complexity advantages that you can get from a set or dict in python (or hashtables in general). But you can't iterate over the list either, because the missing item is ... missing. So you won't find it there.
Instead, you build a set from the list, and use the difference function. Or better, symmetric_difference (^) see https://docs.python.org/3.8/library/stdtypes.html#set
def findMissingNo(arr, s):
    d = set(arr) ^ s # symmetric difference
    if 1 == len(d):
        for item in d:
            return item
print (findMissingNo([1,4], {1,4,5}))
5
I took a few shortcuts because I knew we wanted one item, and I knew which container it was supposed to be in. I decided to return None if no item was found, but I didn't check for multiple items.
What about something like:
def findMissingNo(arr, s):
    for num in s: # loop through each element in the set
        if num in arr:
            pass
        else:
            return num # if the number in the set is not in the array, early return the number
    return -1 # return -1 if there is no missing element

Problem with coding a function that returns the maximum integer from even positions of a string

Make a function that receives a string containing only digits (it may also be an empty string) and returns an integer value which is the maximum of all the digits that are in the EVEN POSITIONS of the original string.
If the string is empty, the function should return -1.
For example:
max_even_pos("123454321") returns 5, because 5 is the maximum of 1,3,5,3,1, which are the digits in the original string in even positions.
# My code
def max_even_pos(st):
    if not st:
        return -1 ### This line satisfies the empty list condition
    for i in range(len(st)): ### Problem I have. Trying to find
                             ## the max integer from even positions
        num_list = [i] # assigns elements to the list
        if i%2 == 0: # checks if elements are even
            return max(num_list) # result
My only problem is trying to get the code to find the greatest integer in the even position of the original string. I think "num_list = [i]" causes the error, but I am unsure how to change this so it executes properly.
As of right now, it outputs 0 for all cases
Your current code ensures that num_list has no more than a single element. When you hit the first even index, 0, you stop and return that index, without regard to the input value. You have several errors to correct:
Put the return after the loop. You have to get through all of the input before you can return the required value.
Accumulate the values with append. Your current code keeps only the last one.
Accumulate the input values; i is the position, not the value. I believe that you want st[i]
Also look into better ways to iterate through a list. Look at for loops or list slicing with a step of 2. If you are ready for another level of learning, look up list comprehension; you can reduce this function to a one-line return.
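As a rough illustration of the slicing and comprehension hint above (not the answerer's own code), the function could be reduced to a single return. The sketch assumes "even positions" means indices 0, 2, 4, ..., as in the question's example, and uses the default argument of max, available from Python 3.4.

```python
def max_even_pos(st):
    """Largest digit at indices 0, 2, 4, ... of st, or -1 for an empty string."""
    # st[::2] takes every second character starting at index 0;
    # default=-1 covers the empty-string case.
    return max((int(ch) for ch in st[::2]), default=-1)


print(max_even_pos("123454321"))
```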
@Prune is correct. Maybe comparing the lines below to your original will help you for next time...
test_string = "123454321"
def max_even_pos(st):
    num_list = []
    if not st:
        return -1
    for i in range(len(st)):
        if i%2 == 0:
            num_list.append(st[i])
    return int(max(num_list))  # convert the winning digit character to an integer
print(max_even_pos(test_string))

Dynamic substrings on List. 10 elements before variable

I have a problem with dynamic substrings. I have a list which can have 1000 elements, 100 elements or even 20. I want to make a copy of that list containing the 10 elements that come before a variable index.
For example(pseudo-code):
L = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
variable = 12
print L[substring:variable]
>>> L = [2,3,4,5,6,7,8,9,10,12]
I can't figure out how to make it correct. The point is that variable always changes by one.
Here is my piece of code:
def Existing(self, Pages):
    if(self.iter <= 10):
        list = self.other_list[:self.iter]
    else:
        list = self.other_list[self.iter-10:self.iter]
    result = 0
    page = Pages[0]
    list.reverse()
    for blocks in Pages:
        if(list.index(blocks) > result):
            result = list.index(blocks)
            page = blocks
    return page
That method looks for the element which has the farthest index.
This part may be unclear, so assume that we have
list = [1,2,3,4,1,5,2,1,2,3,4]
The method should return 5, because it is the farthest element. The list has duplicates and .index() returns the index of the first occurrence, so I reverse the list. With that code the program sometimes reports that some element does not exist in the list. The problem (after a thorough review with the debugger) is with the substrings of self.other_list.
Could you help me with that problem? How do I make it correct? Thanks for any advice.
EDIT: Because my problem was not clear enough (I was sure that it was), here are more examples.
Okay, so the list Pages contains the pages which are currently in use. The second list, "list", contains all pages which HAVE BEEN used. The method looks at the pages which are already in use and chooses the one which has not been used for the longest time. By "use" I mean the index of the element. What does the farthest element mean? The one with the smallest index (remember the duplicates: the last duplicate gives the real index).
So we have:
Pages = [1,3,5,9]
and
list = [1,2,5,3,6,3,5,1,2,9,3,2]
Method should return 5.
To sum up:
I'm looking for substring which give result:
With list =[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
For variable 12: [2,3,4,5,6,7,8,9,10,12]
for 13: [3,4,5,6,7,8,9,10,11,13]
etc. :-)
I know that the problem can be complicated, so I would ask you to focus only on the substrings. :-) Thank you very much!
If I understood your problem correctly, you want to find the item from pages whose most recent occurrence in lst is at the smallest position (taking duplicates into consideration).
So, for this you first need to reverse the list and then find the index of each item of pages in lst; if an item is not found, return negative infinity. The item with the maximum of those indices is your answer.
from functools import partial
pages = [1,3,5,9]
lst = [1,2,5,3,6,3,5,1,2,9,3,2]
def get_index(seq, i):
    try:
        return seq.index(i)
    except ValueError:
        return float('-inf')
lst.reverse()
print(max(pages, key=partial(get_index, lst)))
#5
Note that the above method takes quadratic time, so it won't perform well for huge lists. If you can spare some additional memory in exchange for linear time, you can use a set and a dict for this:
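# assumes lst is in its original order (not already reversed in place)
pages_set = set(pages)
d = {}
for i, k in enumerate(reversed(lst), 1):
    if k not in d and k in pages_set:
        d[k] = len(lst) - i  # index of the last occurrence of k in lst
print(min(d, key=d.get))
#5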
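The slicing part of the question (the 10 elements before a moving index) is not covered by the answer above. A minimal sketch follows, assuming the window should simply shrink near the start of the list; the helper name window_before and its parameters are hypothetical. Clamping the start at 0 replaces the if/else in the question, because a negative start such as other_list[-5:5] would count from the end of the list instead.

```python
def window_before(items, end, size=10):
    """Return up to `size` elements of items that precede index `end`."""
    start = max(0, end - size)   # clamp so a small `end` never wraps around
    return items[start:end]


L = list(range(1, 21))
print(window_before(L, 12))
print(window_before(L, 3))
```

Note that a page from Pages may simply not occur inside such a window, in which case list.index raises ValueError; that may be the "element does not exist" error described in the question, and the get_index helper above guards against it.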

Multiple mismatches in DNA search sequence regex

I have written this barbaric script to create permutations of a string of characters that contain n (up to n=4) $'s in all possible combinations of positions within the string. I will eventually .replace('$','(\\w)') to use for mismatches in a dna search sequence. Because of the way I wrote the script, some of the permutations have less than the requested number of $'s. I then wrote a script to remove them, but it doesn't seem to be effective, and each time I run the removal script, it removes more of the unwanted permutations. In the code pasted below, you will see that I test the function with a simple sequence with 4 mismatches. I then run a series of removal scripts that count how many expressions are removed each time...in my experience, it takes about 8 times to remove all expressions with less than 4 wild-card $'s. I have a couple questions about this:
1. Is there a built-in function for searches with 'n' mismatches? Maybe even in biopython? So far, I've seen the Paul_McGuire_regex function:
Search for string allowing for one mismatch in any location of the string,
which seems only to generate 1 mismatch. I must admit, I don't fully understand all of the code in the remaining functions on that page, as I am a very new coder.
2. Since I see this as a good exercise for me, is there a better way to write this entire script? Can I iterate the Paul_McGuire_regex function as many times as I need?
3. Most perplexing to me, why won't the removal script work 100% the first time?
Thanks for any help you can provide!
def Mismatch(Search,n):
    List = []
    SearchL = list(Search)
    if n > 4:
        return("Error: Maximum of 4 mismatches")
    for i in range(0,len(Search)):
        if n == 1:
            SearchL_i = list(Search)
            SearchL_i[i] = '$'
            List.append(''.join(SearchL_i))
        if n > 1:
            for j in range (0,len(Search)):
                if n == 2:
                    SearchL_j = list(Search)
                    SearchL_j[i] = '$'
                    SearchL_j[j] = '$'
                    List.append(''.join(SearchL_j))
                if n > 2:
                    for k in range(0,len(Search)):
                        if n == 3:
                            SearchL_k = list(Search)
                            SearchL_k[i] = '$'
                            SearchL_k[j] = '$'
                            SearchL_k[k] = '$'
                            List.append(''.join(SearchL_k))
                        if n > 3:
                            for l in range(0,len(Search)):
                                if n ==4:
                                    SearchL_l = list(Search)
                                    SearchL_l[i] = '$'
                                    SearchL_l[j] = '$'
                                    SearchL_l[k] = '$'
                                    SearchL_l[l] = '$'
                                    List.append(''.join(SearchL_l))
    counter=0
    for el in List:
        if el.count('$') < n:
            counter+=1
            List.remove(el)
    return(List)
List_RE = Mismatch('abcde',4)
counter = 0
for el in List_RE:
    if el.count('$') < 4:
        List_RE.remove(el)
        counter+=1
print("Filter2="+str(counter))
We can do away with questions 2 and 3 by answering question 1, but understanding question 3 is important so I'll do that first and then show how you can avoid it entirely:
Question 3
As to question 3, it's because when you loop over a list in python and make changes to it within the loop, the list that you loop over changes.
From the python docs on control flow (for statement section):
It is not safe to modify the sequence being iterated over in the loop
(this can only happen for mutable sequence types, such as lists).
Say your list is [a,b,c,d] and you loop through it with for el in List.
Say el is currently a and you do List.remove(el).
Now, your list is [b,c,d]. However, the iterator points to the second element in the list (since it's done the first), which is now c.
In essence, you've skipped b. So the problem is that you are modifying the list you are iterating over.
There are a few ways to fix this: if your List is not expensive to duplicate, you could make a copy. So iterate over List[:] but remove from List.
But suppose it's expensive to make copies of List all the time.
Then what you do is iterate over it backwards. Note the reversed below:
for el in reversed(List):
    if el.count('$') < n:
        counter+=1
        List.remove(el)
return(List)
In the example above, suppose we iterate backwards over List.
The iterator starts at d, and then goes to c.
Suppose we remove c, so that List=[a,b,d].
Since the iterator is going backwards, it now points to element b, so we haven't skipped anything.
Basically, this avoids modifying bits of the list you have yet to iterate over.
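The skipping effect can be seen on a tiny list. The sketch below also shows a third option that the answer does not mention, building a new list with a list comprehension instead of removing in place; the sample data and variable names are made up for illustration.

```python
items = ['a$', 'b$', '$$', 'c$', 'd$']

# Removing while iterating: each removal shifts the remaining elements
# left, so the element that slides into the current slot is never visited.
buggy = list(items)
for el in buggy:
    if el.count('$') < 2:
        buggy.remove(el)

# Building a new list never mutates the sequence being iterated over.
filtered = [el for el in items if el.count('$') >= 2]

print(buggy)      # some one-'$' entries survive the in-place removal
print(filtered)
```

Running the in-place loop again on buggy would remove a few more entries each time, which matches the behaviour described in the question.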
Questions 1 & 2
If I understand your question correctly, you basically want to choose n out of m positions, where m is the length of the string (abcde), and place a '$' in each of these n positions.
In that case, you can use the itertools module to do that.
import itertools
def Mismatch(Search,n):
    SearchL = list(Search)
    List = [] # hold output
    # generate every combination of n indices to replace with '$'
    idxs = itertools.combinations(range(len(SearchL)),n)
    # for each combination `idx` in idxs, replace chars[i] with '$' for every i in idx:
    for idx in idxs:
        chars = SearchL[:] # make a copy (named chars to avoid shadowing the built-in str)
        for i in idx:
            chars[i]='$'
        List.append( ''.join(chars) ) # convert back to string
    return List
Let's look at how this works:
Turn the Search string into a list so its characters can be replaced by index, and create an empty List to hold the results.
idxs = itertools.combinations(range(len(SearchL)),n) says "find all subsets of length n in the set [0,1,2,3,...,length-of-search-string-1]".
Try
idxs = itertools.combinations(range(5),4)
for idx in idxs:
    print(idx)
to see what I mean.
Each element of idxs is a tuple of n indices from 0 to len(SearchL)-1 (e.g. (0,1,2,4)). Replace the i'th character of SearchL with a '$' for each i in the tuple.
Convert the result back into a string and add it to List.
As an example:
Mismatch('abcde',3)
['$$$de', '$$c$e', '$$cd$', '$b$$e', '$b$d$', '$bc$$', 'a$$$e', 'a$$d$', 'a$c$$', 'ab$$$']
Mismatch('abcde',4) # note, the code you had made lots of duplicates.
['$$$$e', '$$$d$', '$$c$$', '$b$$$', 'a$$$$']
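To connect this back to the regex search the question is aiming at, the same combinations can be turned directly into one compiled pattern. This is only a sketch under assumptions: the function name mismatch_regex is hypothetical, it uses \w as the wildcard as the asker planned, and it assumes n is no larger than the length of the search string. Because \w also matches the original character, the pattern accepts up to n mismatches rather than exactly n.

```python
import itertools
import re


def mismatch_regex(search, n):
    """Compile a regex matching `search` with at most n mismatched positions."""
    alternatives = []
    for idx in itertools.combinations(range(len(search)), n):
        chars = [re.escape(c) for c in search]   # literal characters
        for i in idx:
            chars[i] = r'\w'                     # wildcard at each chosen position
        alternatives.append(''.join(chars))
    return re.compile('|'.join(alternatives))


pattern = mismatch_regex('ACGT', 1)
print(bool(pattern.search('TTAGGTCC')))   # 'AGGT' differs from 'ACGT' in one position
```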

Count the number of occurrences of a given item in a (sorted) list?

I'm asked to create a method that returns the number of occurrences of a given item in a list. I know how to write code to find a specific item, but how can I code it so that it counts the number of occurrences of an arbitrary item?
For example, if I have a list [4, 6, 4, 3, 6, 4, 9] and I type something like
s1.count(4), it should return 3, and s1.count(6) should return 2.
I'm not allowed to use any built-in functions though.
In a recent assignment, I was asked to count the number of times the substring "ou" appeared in a given string, and I coded it as
def count_pattern(astr):
    if len(astr) < 2:
        return 0
    else:
        return (astr[:2] == "ou") + count_pattern(astr[1:])
Would something like this work??
def count(self, item):
    num=0
    for i in self.s_list:
        if i in self.s_list:
            num[i] +=1
def __str__(self):
    return str(self.s_list)
If this list is already sorted, the "most efficient" method -- in terms of Big-O -- would be to perform a binary search with a count-forward/count-backward if the value was found.
However, for an unsorted list as in the example, then the only way to count the occurrences is to go through each item in turn (or sort it first ;-). Here is some pseudo-code, note that it is simpler than the code presented in the original post (there is no if x in list or count[x]):
set count to 0
for each element in the list:
    if the element is what we are looking for:
        add one to count
Happy coding.
If I told you to count the number of fours in the following list, how would you do it?
1 4 2 4 3 8 2 1 4 2 4 9 7 4
You would start by remembering no fours yet, and add 1 for each element that equals 4. To traverse a list, you can use a for statement. Given an element of the list el, you can check whether it is four like this:
if el == 4:
    # TODO: Add 1 to the counter here
In response to your edit:
You're currently testing if i in self.s_list:, which doesn't make any sense since i is an element of the list and therefore always present in it.
When adding to a number, you simply write num += 1. Brackets are only necessary if you want to access the values of a list or dictionary.
Also, don't forget to return num at the end of the function so that somebody calling it gets the result back.
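Putting those three corrections together, the method might look roughly like the sketch below. The wrapper class SortedList and its constructor are hypothetical stand-ins, since the question shows only the s_list attribute and the two methods.

```python
class SortedList:
    """Hypothetical stand-in for the asker's class; only s_list is from the question."""

    def __init__(self, items):
        self.s_list = list(items)

    def count(self, item):
        num = 0
        for el in self.s_list:
            if el == item:      # compare each element with the item we look for
                num += 1        # a plain counter, no brackets needed
        return num              # hand the result back to the caller

    def __str__(self):
        return str(self.s_list)


s1 = SortedList([4, 6, 4, 3, 6, 4, 9])
print(s1.count(4), s1.count(6))
```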
Actually the most efficient method in terms of Big-O would be O(log n). @pst's method would result in O(log n + s), where s is the number of occurrences, which could become linear if the array is made up of equal elements.
The way to achieve O(log n) would be to use 2 binary searches (which gives O(2 log n), but we discard constants, so it is still O(log n)) that are modified to not have an equality test, therefore making all searches unsuccessful. However, on an unsuccessful search (low > high) we return low.
In the first search, if the middle element is greater than your search term, recurse into the lower part of the array, else (including when it is equal) recurse into the higher part. In the second search, reverse the binary comparison.
The first search yields the right boundary of the run of equal elements and the second search yields the left boundary. Simply subtract to get the number of occurrences.
Based on algorithm described in Skiena.
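The two boundary searches can be sketched as follows. This is an illustrative reading of the description above rather than code from the book: it is written iteratively instead of recursively, the names count_sorted and boundary are hypothetical, and the list is assumed to be sorted in ascending order.

```python
def count_sorted(a, key):
    """Count occurrences of key in the ascending sorted list a in O(log n)."""

    def boundary(go_right_on_equal):
        # Binary search with no equality test: it always "fails" and
        # returns low, which ends up just past one edge of the run.
        low, high = 0, len(a) - 1
        while low <= high:
            mid = (low + high) // 2
            if a[mid] > key or (a[mid] == key and not go_right_on_equal):
                high = mid - 1      # continue in the lower part
            else:
                low = mid + 1       # continue in the higher part
        return low

    right = boundary(True)    # index of the first element greater than key
    left = boundary(False)    # index of the first element not less than key
    return right - left


print(count_sorted([3, 4, 4, 4, 6, 6, 9], 4))
```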
This seems like a homework... anyways. Try list.count(item). That should do the job.
Third or fourth element here:
http://docs.python.org/tutorial/datastructures.html
Edit:
try something else like:
bukket = dict()
for elem in astr:
    if elem not in bukket.keys():
        bukket[elem] = 1
    else:
        bukket[elem] += 1
You can now get all the elements in the list with dict.keys() as list and the corresponding occurences with dict[key].
So you can test it:
import random
l = []
for i in range(0,200):
    l.append(random.randint(0,20))
print(l)
l.sort()
print(l)
bukket = dict()
for elem in l:
    if elem not in bukket.keys():
        bukket[elem] = 1
    else:
        bukket[elem] += 1
print(bukket)
