Why these two almost same codes have different efficient (Python)?

Why these two almost same codes have different efficient (Python)? - python

I was practicing programming by Python on leetcode.
So this is the problem:
https://leetcode.com/problems/reverse-vowels-of-a-string/
And this is my answer:
def reverseVowels(s):
result = list(s)
v_str = 'aeiouAEIOU'
v_list = [item for item in s if item in v_str]
v_list.reverse()
v_index = 0
for i, item in enumerate(s):
if item in v_list:
result[i] = v_list[v_index]
v_index+=1
return ''.join(result)
The result: Time Limit Exceeded
And I found an extremely similar answer in the discussion:
def reverseVowels(s):
lst = list(s)
vowels_str = "aeiouAEIOU"
vowels_list = [item for item in lst if item in vowels_str]
vowels_list.reverse()
vowels_index = 0
for index, item in enumerate(lst):
if item in vowels_str:
lst[index] = vowels_list[vowels_index]
vowels_index += 1
return ''.join(lst)
The result: Accepted
This is so weird. I think these two code seem exactly the same.
The difference of them is nothing but parameters.
I am curious about why these codes cause distinct results.

There are two different lines between both codes.
First one is:
for index, item in enumerate(lst):
for i, item in enumerate(s):
In the first case it iterates through the list and in the second it iterates through the string. There may be some performance loss here but it is not the main problem.
if item in vowels_str:
if item in v_list:
This is where the running time goes. In the first case (working code) it looks for the character in the string made of vowels, which has constant length.
In the second case, it looks for the character in the list of all vowels contained in the string, which can be huge depending of the string given in test.

In the first of these you are iterating over the string (s) directly, multiple times. In the second, after converting to a list, you are iterating over that list (lst).
The exact reason this causes a difference is an (admittedly large and probably important for correctness) implementation detail in the python interpreter.
See related question for more discussion: Why is it slower to iterate over a small string than a small list?

Related

double list in return statement. need explanation in python

So I was trying to complete this kata on code wars and I ran across an interesting solution. The kata states:
"Given an array of integers, find the one that appears an odd number of times.
There will always be only one integer that appears an odd number of times."
and one of the solutions for it was:
def find_it(seq):
return [x for x in seq if seq.count(x) % 2][0]
My question is why is there [0] at the end of the statement. I tried playing around with it and putting [1] instead and when testing, it passed some tests but not others with no obvious pattern.
Any explanation will be greatly appreciated.

The first brackets are a list comprehension, the second is indexing the resulting list. It's equivalent to:
def find_it(seq):
thelist = [x for x in seq if seq.count(x) % 2]
return thelist[0]
The code is actually pretty inefficient, because it builds the whole list just to get the first value that passed the test. It could be implemented much more efficiently with next + a generator expression (like a listcomp, but lazy, with the values produced exactly once, and only on demand):
def find_it(seq):
return next(x for x in seq if seq.count(x) % 2)
which would behave the same, with the only difference being that the exception raised if no values passed the test would be IndexError in the original code, and StopIteration in the new code, and it would operate more efficiently by stopping the search the instant a value passed the test.
Really, you should just give up on using the .count method and count all the elements in a single pass, which is truly O(n) (count solutions can't be, because count itself is O(n) and must be called a number of times roughly proportionate to the input size; even if you dedupe it, in the worst case scenario all elements appear twice and you have to call count n / 2 times):
from collections import Counter
def find_it(it):
# Counter(it) counts all items of any iterable, not just sequence,
# in a single pass, and since 3.6, it's insertion order preserving,
# so you can just iterate the items of the result and find the first
# hit cheaply
return next(x for x, cnt in Counter(it).items() if cnt % 2)

That list comprehension yields a sequence of values that occur an odd number of times. The first value of that sequence will occur an odd number of times. Therefore, getting the first value of that sequence (via [0]) gets you a value that occurs an odd number of times.
Happy coding!

That code [x for x in seq if seq.count(x) % 2] return the list which has 1 value appears in input list an odd numbers of times.
So, to make the output as number, not as list, he indicates 0th index, so it returns 0th index of list with one value.

There is a nice another answer here by ShadowRanger, so I won't duplicate it providing partially only another phrasing of the same.
The expression [some_content][0] is not a double list. It is a way to get elements out of the list by using indexing. So the second "list" is a syntax for choosing an element of a list by its index (i.e. the position number in the list which begins in Python with zero and not as sometimes intuitively expected with one. So [0] addresses the first element in the list to the left of [0].
['this', 'is', 'a', 'list'][0] <-- this an index of 'this' in the list
print( ['this', 'is', 'a', 'list'][0] )
will print
this
to the stdout.
The intention of the function you are showing in your question is to return a single value and not a list.
So to get the single value out of the list which is built by the list comprehension the index [0] is used. The index guarantees that the return value result is taken out of the list [result] using [result][0] as
[result][0] == result.
The same function could be also written using a loop as follows:
def find_it(seq):
for x in seq:
if seq.count(x) % 2 != 0:
return x
but using a list comprehension instead of a loop makes it in Python mostly more effective considering speed. That is the reason why it sometimes makes sense to use a list comprehension and then unpack the found value(s) out of the list. It will be in most cases faster than an equivalent loop, but ... not in this special case where it will slow things down as mentioned already by ShadowRanger.
It seems that your tested sequences not always have only one single value which occurs an odd number of times. This will explain why you experience that sometimes the index [1] works where it shouldn't because it was stated that the tested seq will contain one and only one such value.
What you experienced looking at the function in your question is a failed attempt to make it more effective by using a list comprehension instead of a loop. The actual improvement can be achieved but by using a generator expression and another way of counting as shown in the answer by ShadowRanger:
from collections import Counter
def find_it(it):
return next(x for x, cnt in Counter(it).items() if cnt % 2)

What is the purpose of the fourth line of code particularly "l[j][-1]" here? [duplicate]

This question already has answers here:
Understanding slicing
(38 answers)
Closed 7 years ago.
The ff is a code to find the longest increasing subsequence of an array/list in this case it is d. The 4th line confuses me like what does l[j][-1] mean. Let's say i=1. What would l[j][-1] be?
d = [10,300,40,43,2,69]
l = []
for i in range(len(d)):
l.append(max([l[j] for j in range(i) if l[j][-1] < d[i]] or [[]], key=len) + [d[i]])

When I hit something like this, I consider that either I'm facing a smart algorithm that's beyond my trivial understanding, or that I'm just seeing tight code written by somebody while their mind was well soaked with the problem at hand, and not thinking about the guy who'd try to read it later (that guy most often being the writer himself, six months later... unless he's an APL hacker...).
So I take the code and try deobfuscate it. I copy/paste it into my text editor, and from there I rework it to split complex statements or expressions into smaller ones, and to assign intermediate values to variables with meaningful names. I do so iteratively, peeling off one layer of convolution at a time, until the result makes sense to me.
I did just that for your code. I believe you'll understand it easily by yourself in this shape. And me not being all too familiar with Python, I commented a few things for my own understanding at least as much as for yours.
This resulting code could use a bit of reordering and simplification, but I left in the form that would map most easily to the original four-liner, being the direct result of its expansion. Here we go :
data = [10,300,40,43,2,69]
increasingSubsequences = []
for i in range(len(data)):
# increasingSubsequences is a list of lists
# increasingSubsequences[j][-1] is the last element of increasingSubsequences[j]
candidatesForMoreIncrease = []
for j in range(i):
if data[i] > increasingSubsequences[j][-1]:
candidatesForMoreIncrease.append( increasingSubsequences[j] )
if len(candidatesForMoreIncrease) != 0:
nonEmpty = candidatesForMoreIncrease
else:
nonEmpty = [[]]
longestCandidate = max( nonEmpty, key=len ) # keep the LONGEST of the retained lists
# with the empty list included as a bottom element
# (stuff + [data[i]]) is like copying stuff and and then appending data[i] to the copy
increasingSubsequences.append( longestCandidate + [data[i]] )
print "All increasingSubsequences : ", increasingSubsequences
print "The result you expected : ", max(increasingSubsequences, key=len)

This looks like python to me.
max(arg1, arg2, *args[, key])
Returns the largest item in an iterable or the largest of two or more arguments.
If one positional argument is provided, iterable must be a non-empty iterable (such as a non-empty string, tuple or list). The largest item in the iterable is returned. If two or more positional arguments are provided, the largest of the positional arguments is returned.
The optional key argument specifies a one-argument ordering function like that used for list.sort(). The key argument, if supplied, must be in keyword form (for example, max(a,b,c,key=func)).
In this case it appears the variable l = [] is going to be appended with the largest iterable or the largest of 2 arguments from the stuff inside of the max function.
iterable 1:
[l[j] for j in range(i) if l[j][-1] < d[i]]
or
[[]]
Sometimes in words it helps too:
j in l for j in range i if j in l times -1 is less than i in d or [[]]
iterable 2:
key=len
Add [d[i]] to the one that returns the max result
Append this result to l = [] making l = [result]
Append simply adds an item to the end of the list; equivalent to a[len(a):] = [x].

Dynamic substrings on List. 10 elements before variable

I have problem with dynamic substrings. I have list which can have 1000 elements, 100 elements or even 20. I want to make copy of that list, which will have elements from -10 to variable.
For example(pseudo-code):
L = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
variable = 12
print L[substring:variable]
>>> L = [2,3,4,5,6,7,8,9,10,12]
I can't figure out how make it correct. The point is that variable is always changing by one.
Here is my piece of code:
def Existing(self, Pages):
if(self.iter <= 10):
list = self.other_list[:self.iter]
else:
list = self.other_list[self.iter-10:self.iter]
result = 0
page = Pages[0]
list.reverse()
for blocks in Pages:
if(list.index(blocks) > result):
result = list.index(blocks)
page = blocks
return page
That method is looking for the element which has the farest index.
This part can be unclear. So assume that we have
list = [1,2,3,4,1,5,2,1,2,3,4]
Method should return 5, because it is the farest element. List has duplicates and .index() is returning index of the first element so i reverse list. With that code sometimes program returns that some element do not exist in List. The problem (after deep review with debbuger) is with substrings in self.other_list.
Could you help me with that problem? How to make it correct? Thanks for any advice.
EDIT: Because my problem is not clear enough (I was sure that it can be), so here are more examples.
Okay, so list Pages are list which cointains currently pages which are used. Second list "list" are list of all pages which HAS BEEN used. Method is looking for pages which are already used and choose that one which has been not used for the longest time. With word "use" I mean the index of element. What means the farest element? That one which the smallest index (remember about duplicates, the last duplicates means the real index).
So we have:
Pages = [1,3,5,9]
and
list = [1,2,5,3,6,3,5,1,2,9,3,2]
Method should return 5.
To sum up:
I'm looking for substring which give result:
With list =[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
For variable 12: [2,3,4,5,6,7,8,9,10,12]
for 13: [3,4,5,6,7,8,9,10,11,13]
ect :-)
I know that problem can be complicated. So i would aks you to focus only about substrings. :-) Thanks you very much!

If I understood your problem correctly you want to find the index of items from pages that is at minimum position in lst(taking duplicates in consideration).
So, for this you need to first reverse the list and then first the index of each item in pages in lst, if item is not found then return negative Infinity. Out of those indices you can find the max item and you'll get your answer.
from functools import partial
pages = [1,3,5,9]
lst = [1,2,5,3,6,3,5,1,2,9,3,2]
def get_index(seq, i):
try:
return seq.index(i)
except ValueError:
return float('-inf')
lst.reverse()
print max(pages, key=partial(get_index, lst))
#5
Note that the above method will take quadratic time, so it won't perform well for huge lists. If you're not concerned with some additional memory but linear time then you can use set and dict for this:
pages_set = set(pages)
d = {}
for i, k in enumerate(reversed(lst), 1):
if k not in d and k in pages_set:
d[k] = len(lst) - i
print min(d, key=d.get)
#5

I think this should raise an error, but it doesn't

Below is a simple function to remove duplicates in a list while preserving order. I've tried it and it actually works, so the problem here is my understanding. It seems to me that the second time you run uniq.remove(item) for a given item, it will return an error (KeyError or ValueError I think?) because that item has already been removed from the unique set. Is this not the case?
def unique(seq):
uniq = set(seq)
return [item for item in seq if item in uniq and not uniq.remove(item)]

There's a check if item in uniq which gets executed before the item is removed. The and operator is nice in that it "short circuits". This means that if the condition on the left evaluates to False-like, then the condition on the right doesn't get evaluated -- We already know the expression can't be True-like.

set.remove is an in-place operation. This means that it does not return anything (well, it returns None); and bool(None) is False.
So your list comprehension is effectively this:
answer = []
for item in seq:
if item in uniq and not uniq.remove(item):
answer.append(item)
and since python does short circuiting of conditionals (as others have pointed out), this is effectively:
answer = []
for item in seq:
if item in uniq:
if not uniq.remove(item):
answer.append(item)
Of course, since unique.remove(item) returns None (the bool of which is False), either both conditions are evaluated or neither.
The reason that the second condition exists is to remove item from uniq. This way, if/when you encounter item again (as a duplicate in seq), it will not be found in uniq because it was deleted from uniq the last time it was found there.
Now, keep in mind, that this is fairly dangerous as conditions that modify variables are considered bad style (imagine debugging such a conditional when you aren't fully familiar with what it does). Conditionals should really not modify the variables they check. As such, they should only read the variables, not write to them as well.
Hope this helps

mgilson and others has answered this question nicely, as usual. I thought I might point out what is probably the canonical way of doing this in python, namely using the unique_everseen recipe from the recipe section of the itertools docs, quoted below:
from itertools import ifilterfalse
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in ifilterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element

def unique_with_order(seq):
final = []
for item in seq:
if item not in final:
final.append(item)
return final
print unique_with_order([1,2,3,3,4,3,6])
Break it down, make it simple :) Not everything has to be a list comprehension these days.

#mgilson's answer is the right one, but here, for your information, is a possible lazy (generator) version of the same function. This means it'll work for iterables that don't fit in memory - including infinite iterators - as long as the set of its elements will.
def unique(iterable):
uniq = set()
for item in iterable:
if item not in uniq:
uniq.add(item)
yield item

The first time you run this function, you will get [1,2,3,4] from your list comprehension and the set uniq will be emptied. The second time you run this function, you will get [] because your set uniq will be empty. The reason you don't get any errors on the second run is that Python's and short circuits - it sees the first clause (item in uniq) is false and doesn't bother to run the second clause.

Multiple mismatches in DNA search sequence regex

I have written this barbaric script to create permutations of a string of characters that contain n (up to n=4) $'s in all possible combinations of positions within the string. I will eventually .replace('$','(\\w)') to use for mismatches in a dna search sequence. Because of the way I wrote the script, some of the permutations have less than the requested number of $'s. I then wrote a script to remove them, but it doesn't seem to be effective, and each time I run the removal script, it removes more of the unwanted permutations. In the code pasted below, you will see that I test the function with a simple sequence with 4 mismatches. I then run a series of removal scripts that count how many expressions are removed each time...in my experience, it takes about 8 times to remove all expressions with less than 4 wild-card $'s. I have a couple questions about this:
Is there a built in function for searches with 'n' mismatches? Maybe even in biopython? So far, I've seen the Paul_McGuire_regex function:
Search for string allowing for one mismatch in any location of the string,
which seems only to generate 1 mismatch. I must admit, I don't fully understand all of the code in the remainining functions on that page, as I am a very new coder.
Since I see this as a good exercise for me, is there a better way to write this entire script?...Can I iterate Paul_McGuire_regex function as many times as I need?
Most perplexing to me, why won't the removal script work 100% the first time?
Thanks for any help you can provide!
def Mismatch(Search,n):
List = []
SearchL = list(Search)
if n > 4:
return("Error: Maximum of 4 mismatches")
for i in range(0,len(Search)):
if n == 1:
SearchL_i = list(Search)
SearchL_i[i] = '$'
List.append(''.join(SearchL_i))
if n > 1:
for j in range (0,len(Search)):
if n == 2:
SearchL_j = list(Search)
SearchL_j[i] = '$'
SearchL_j[j] = '$'
List.append(''.join(SearchL_j))
if n > 2:
for k in range(0,len(Search)):
if n == 3:
SearchL_k = list(Search)
SearchL_k[i] = '$'
SearchL_k[j] = '$'
SearchL_k[k] = '$'
List.append(''.join(SearchL_k))
if n > 3:
for l in range(0,len(Search)):
if n ==4:
SearchL_l = list(Search)
SearchL_l[i] = '$'
SearchL_l[j] = '$'
SearchL_l[k] = '$'
SearchL_l[l] = '$'
List.append(''.join(SearchL_l))
counter=0
for el in List:
if el.count('$') < n:
counter+=1
List.remove(el)
return(List)
List_RE = Mismatch('abcde',4)
counter = 0
for el in List_RE:
if el.count('$') < 4:
List_RE.remove(el)
counter+=1
print("Filter2="+str(counter))

We can do away with questions 2 and 3 by answering question 1, but understanding question 3 is important so I'll do that first and then show how you can avoid it entirely:
Question 3
As to question 3, it's because when you loop over a list in python and make changes to it within the loop, the list that you loop over changes.
From the python docs on control flow (for statement section):
It is not safe to modify the sequence being iterated over in the loop
(this can only happen for mutable sequence types, such as lists).
Say your list is [a,b,c,d] and you loop through it with for el in List.
Say el is currently a and you do List.remove(el).
Now, your list is [b,c,d]. However, the iterator points to the second element in the list (since it's done the first), which is now c.
In essence, you've skipped b. So the problem is that you are modifying the list you are iterating over.
There are a few ways to fix this: if your List is not expensive to duplicate, you could make a copy. So iterate over List[:] but remove from List.
But suppose it's expensive to make copies of List all the time.
Then what you do is iterate over it backwards. Note the reversed below:
for el in reversed(List):
if el.count('$') < n:
counter+=1
List.remove(el)
return(List)
In the example above, suppose we iterate backwards over List.
The iterator starts at d, and then goes to c.
Suppose we remove c, so that List=[a,b,d].
Since the iterator is going backwards, it now points to element b, so we haven't skipped anything.
Basically, this avoids modifying bits of the list you have yet to iterate over.
Questions 1 & 2
If I understand your question correctly, you basically want to choose n out of m positions, where m is the length of the string (abcde), and place a '$' in each of these n positions.
In that case, you can use the itertools module to do that.
import itertools
def Mismatch(Search,n):
SearchL = list(Search)
List = [] # hold output
# print list of indices to replace with '$'
idxs = itertools.combinations(range(len(SearchL)),n)
# for each combination `idx` in idxs, replace str[idx] with '$':
for idx in idxs:
str = SearchL[:] # make a copy
for i in idx:
str[i]='$'
List.append( ''.join(str) ) # convert back to string
return List
Let's look at how this works:
turn the Search string into a list so it can be iterated over, create empty List to hold results.
idxs = itertools.combinations(range(len(SearchL)),n) says "find all subsets of length n in the set [0,1,2,3,...,length-of-search-string -1].
Try
idxs = itertools.combinations(range(5),4)
for idx in idxs:
print idx
to see what I mean.
Each element of idxs is a tuple of n indices from 0 to len(SearchL)-1 (e.g. (0,1,2,4). Replace the i'th character of SearchL with a '$' for each i in the tuple.
Convert the result back into a string and add it to List.
As an example:
Mismatch('abcde',3)
['$$$de', '$$c$e', '$$cd$', '$b$$e', '$b$d$', '$bc$$', 'a$$$e', 'a$$d$', 'a$c$$', 'ab$$$']
Mismatch('abcde',4) # note, the code you had made lots of duplicates.
['$$$$e', '$$$d$', '$$c$$', '$b$$$', 'a$$$$']

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why these two almost same codes have different efficient (Python)? - python

Related

double list in return statement. need explanation in python

What is the purpose of the fourth line of code particularly "l[j][-1]" here? [duplicate]

Dynamic substrings on List. 10 elements before variable

I think this should raise an error, but it doesn't

Multiple mismatches in DNA search sequence regex

Categories

Resources