EDIT: I know there are other solutions to this. My question is what I am doing wrong, i.e. where my logic fails. Nothing else.
I was solving the minion work-assignment problem in Python. The question is the following:
Write a function called solution(data, n) that takes in a list of less than 100 integers and a
number n, and returns that same list but with all of the numbers that occur more than n times
removed entirely. The returned list should retain the same ordering as the original list - you don't want to mix up those carefully planned shift rotations! For instance, if data was [5, 10,
15, 10, 7] and n was 1, solution(data, n) would return the list [5, 15, 7] because 10 occurs
twice, and thus was removed from the list entirely.
My code is the following:

from collections import OrderedDict

def solution(data, n):
    # Your code here
    if len(data) >= 100:
        return []
    seen = OrderedDict()
    s = []
    for i in data:
        if i in seen:
            seen[i] += 1
        else:
            seen[i] = 1
    for k in seen:
        if seen[k] <= n:
            s.append(k)
    return s
My logic was to use an OrderedDict to keep track of the numbers and how many times each shows up. This way the code runs in linear time instead of O(n^2) (which checking the count of every value in data would cost). This works for most cases but fails on some. What am I missing? Is there some space constraint? Some overlooked case?
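For what it's worth, one case the OrderedDict version seems to miss can be seen with data = [1, 2, 3, 2] and n = 2: appending each key once returns [1, 2, 3], yet every occurrence of 2 should survive, since 2 occurs no more than n times. A sketch of a count-then-filter variant that keeps allowed duplicates (using Counter, my substitution for the OrderedDict):

```python
from collections import Counter

def solution(data, n):
    # Guard from the original problem statement.
    if len(data) >= 100:
        return []
    # Count every value once, then keep each element of data whose total
    # count does not exceed n. Walking data itself (rather than the dict
    # keys) preserves duplicates that occur n times or fewer.
    counts = Counter(data)
    return [x for x in data if counts[x] <= n]
```

This keeps both the original ordering and the repeated elements that fall under the limit.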
I am just trying to merge two sorted lists into one sorted list. I know it is a simple task with plenty of solutions online, but my question is different. Here's my code:
def merge(list1, list2):
    len1 = len(list1)
    len2 = len(list2)
    list3 = []
    pointer = 0
    for i in range(len1):
        if (list1[i] >= list2[pointer]):
            while (pointer < len2 and list1[i] >= list2[pointer]):
                list3.append(list2[pointer])
                pointer += 1
            i -= 1
        else:
            list3.append(list1[i])
    while (pointer < len2):
        list3.append(list2[pointer])
        pointer += 1
    return list3

if __name__ == "__main__":
    print(merge([1, 2, 3, 10, 11, 22], [4, 5, 6, 7, 20, 21, 30]))
I did some debugging and was confused to see that when I decrease the value of i by 1, for example from 3 to 2, on the next iteration it jumps back to 4. I have no idea why. You can check it by running the code and seeing the result. I just need an explanation of why that is happening. Thanks.
I was confused to see that when I decrease the value i by 1, for example from 3 to 2, on the next iteration it goes back to 4. I have no idea why?
Because for i in range(x) means "execute the for body with i assuming the values of 0 through x-1". Assigning a different value to i does not affect its value in the next iteration.
In other words, for i in range(10) is not a translation of C's or JavaScript's for (i = 0; i < 10; i++). Instead, you can think of it as for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Seen like that, it is clear that changing one value of i will not affect the subsequent value, which is blindly taken out of a pre-generated list. If you need to modify the iteration progress based on changing conditions, you can write the C/JS-style loop explicitly:
i = 0
while i < len1:
    # ... loop body goes here ...
    i += 1
Written like this, modifying i in the loop body will affect the iteration in the way you expected.
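A full merge written in that style, with explicit indices for both lists, might look like this (a sketch, not the only way to structure it):

```python
def merge(list1, list2):
    # Two explicit indices; each iteration advances exactly one of them,
    # so manual index adjustments behave the way a C-style loop would.
    i, j = 0, 0
    merged = []
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged
```

With the example input, merge([1, 2, 3, 10, 11, 22], [4, 5, 6, 7, 20, 21, 30]) produces the fully sorted combined list.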
You are editing i inside the for loop that iterates over i. I don't believe it will work the way you intend.
Also, you can simply merge the lists and sort the outcome with this:
list1 = [1,2,3,10,11,22]
list2 = [4,5,6,7,20,21,30]
list3 = list1 + list2
list3.sort()
print(list3)
Hope this helps.
This is because range() produces a lazy sequence. It does not create the full list of numbers, as you might expect, but produces each number as you need it. And even if it created the list, the numbers would be taken from that list one after the other, regardless of how you modify i. You can think of the result of range() in a sense as 'read-only'. user4815162342 below is right, you should not confuse it with a C-style loop. It is more like a Fortran loop, where the number of iterations is computed in advance.
From https://pynative.com/python-range-function/:
Python 3's range() produces values lazily: it produces a value only when the for loop iteration asks for it, i.e. range() doesn't produce all the numbers at once.
Python's range() function returns an immutable sequence object of integers, so it is possible to convert range() output to a Python list. Use the list class to convert range output to a list. Let's understand this with the following example.
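The example that the quote refers to is not reproduced above; a minimal illustration (mine, not pynative's) might be:

```python
r = range(5)      # a lazy, immutable sequence; values are produced on demand
print(r)          # prints: range(0, 5) -- not the expanded numbers
print(list(r))    # prints: [0, 1, 2, 3, 4]
```

Note that iterating or converting the range never changes it; assigning to the loop variable only rebinds a local name.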
I am taking in an integer value, finding the factorial of that value, and trying to count the number of trailing zeros, if any are present. For example:
def zeros(n):
    import math
    factorial = str(math.factorial(n))
    zeros_lst = [number if number == "0" (else) for number in factorial[::-1]]
    return len(zeros_lst)
The "else" in parentheses is where the issue occurs. I want to leave the loop as soon as it encounters a digit that is not zero. I tried using break like you normally would, then looked up some examples, but found nothing similar.
If someone knows how to break out of a list comprehension, or whether it is even possible, that would be great. I am sure there are better ways to solve this problem; please post if you have one.
There is no "breaking" in list comprehensions, but there are other tricks, e.g. itertools.takewhile which iterates an iterable while a condition is satisfied:
>>> from itertools import takewhile
>>>
>>> values = [7, 9, 11, 4, 2, 78, 9]
>>> list(takewhile(lambda x: x > 5, values))
[7, 9, 11]
In your case (I want to leave the loop as soon as it encounters a digit that is not zero):
zeros_lst = list(takewhile(lambda x: x=="0", factorial[::-1]))
There is a more mathematical approach to this problem that is very simple and easy to implement. We only need to count how many factors of ten there are in factorial(n). We have an excess of factors of 2, so we choose to count factors of 5. It doesn't look as clean, but it avoids the computation of a factorial. The algorithm accounts for extra factors of 5 that show up in numbers like 25, 50, 125 and all of the rest.
def find_zeros_in_factorial(n):
    # Count multiples of 5, 25, 125, ... using integer division.
    factors_of_5 = [n // 5]
    while factors_of_5[-1] > 0:
        factors_of_5.append(factors_of_5[-1] // 5)
    return sum(factors_of_5)
Here is a function that will count the zeros, you just need to pass it your number. This saves the string operations you had before. It will terminate once there are no more trailing zeros.
import math

def count_zeros(n):
    n_zeros = 0
    while True:
        if n % 10 == 0:
            n = n // 10
            n_zeros += 1
        else:
            return n_zeros

print(count_zeros(math.factorial(12)))
If someone knows how to break from a list comprehension
You cannot break out of a list comprehension.
But you can modify your list comprehension with an if condition in the for clause. With if, you decide which values become part of the list:
def zeros(n):
    import math
    factorial = str(math.factorial(n))
    # Check this line
    zeros_lst = [number for number in factorial[::-1] if number == '0']
    return len(zeros_lst)
A simple for loop is also an option; in some cases a plain for loop can even be faster than a list comprehension. Check HERE for the comparison I did for another question. Even so, list comprehensions are often preferred because they are clean and more readable. Again, it is opinion-based: readability vs. speed.
Suggestion:
Also, there is an easier way to achieve what you are doing:
import math

def find_zeros_in_factorial(n):
    num_str = str(math.factorial(n))
    return len(num_str) - len(num_str.rstrip('0'))
The idea here is to subtract the length of the string with its trailing zeros stripped from the length of the full string.
So my code is shown below. The input is a list with exactly one duplicate item and one missing item. The answer is a two-element list: the first element is the duplicate in the list, and the second is the element missing from the range 1 to n.
Example: A = [1, 4, 2, 5, 1], answer = [1, 3]
The code below works.
Am I wrong about the complexity being O(n), and is there any faster way of achieving this in Python?
Also, is there any way I can do this without using extra space?
Note: the elements may be of the order 10^5 or larger.
n = max(A)
answer = []
seen = set()
for i in A:
    if i in seen:
        answer.append(i)
    else:
        seen.add(i)
for i in xrange(1, n + 1):
    if i not in seen:  # set membership keeps this loop O(n)
        answer.append(i)
print answer
You are indeed correct: the complexity of this algorithm is O(n), which is the best you can achieve. You can try to optimize it by aborting the search as soon as you find the duplicate value, but in the worst case your duplicate is at the back of the list and you still need to traverse it completely.
The use of hashing (your use of a set) is a good solution. There are a lot of other approaches, for instance the use of Counter, but this won't change the asymptotic complexity of the algorithm.
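As a sketch of that Counter idea (my own illustration, not the asker's code; still a single counting pass plus a scan of the counts):

```python
from collections import Counter

def find_duplicate(A):
    # Count all values in one pass, then pick the one seen more than once.
    counts = Counter(A)
    for value, count in counts.items():
        if count > 1:
            return value
    return None
```

For A = [1, 4, 2, 5, 1] this returns 1, the duplicated value.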
As @Emisor advises, you can leverage the information that you have a list with 1 duplicate and 1 missing value. As you might know, if you had a list with no duplicate and no missing value, summing up all elements of the list would result in 1+2+3+...+n, which can be rewritten as the mathematical equivalent n*(n+1)/2.
When you've discovered the duplicate value, you can calculate the missing value, without having to perform:
for i in xrange(1,n):
if i not in A:
answer.append(i)
Since you know the sum if all values were present: total = n*(n+1)/2 = 15, and you know which value is duplicated. Taking the sum of the array A = [1, 4, 2, 5, 1], which is 13, and removing the duplicated value 1 results in 12.
Subtracting 12 from the calculated total results in the missing value, 3.
This all can be written in a single line:
(len(A) * (len(A) + 1)) // 2 - sum(A) + duplicate
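As a quick sanity check against the worked example above (the duplicate is assumed to have been found already):

```python
A = [1, 4, 2, 5, 1]
duplicate = 1
n = len(A)  # one duplicate and one missing, so len(A) == n
# total of 1..n, minus the actual sum, plus the duplicate counted twice
missing = n * (n + 1) // 2 - sum(A) + duplicate
print(missing)  # prints: 3
```

This matches the arithmetic in the text: 15 - 13 + 1 = 3.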
Slight optimization (I think):
def lalala2(A):
    _max = 0
    _sum = 0
    seen = set()
    duplicate = None
    for i in A:
        _sum += i
        if _max < i:
            _max = i
        if i in seen:
            duplicate = i
        elif duplicate is None:
            seen.add(i)
    # The last term is the sum of every number from 1 to N
    missing = -_sum + duplicate + (_max * (_max + 1) // 2)
    return [duplicate, missing]
It looks a bit uglier, and I'm doing things like sum() and max() on my own instead of relying on Python's built-ins, but this way we only check every element once. It also stops adding items to the set once it has found the duplicate, since it can calculate the missing element from the duplicate once it knows the max.
This question already has answers here:
How do I remove duplicates from a list, while preserving order?
(31 answers)
Closed 8 years ago.
I have a function, unique(a), that takes a list a of numbers and returns only one of each value, while maintaining the order of the list. I also have a function, big_list(n), that generates a list of length n.
The reason I reverse the list is so that values are removed from the back of the original list, just to make the modified list cleaner and more readable when comparing it to the original.
The function works for relatively small list lengths, but for larger lengths, such as 1,000,000, the execution takes FOREVER.
If anyone can help me make my function a lot faster, that would be great!
FYI: I need to use a set somewhere in the function for the assignment I am working on. I still need to remove list items from the back as well.
Thanks in advance!
from random import randrange

def big_list(n):
    # Create a list of n 'random' values in the range [-n/2, n/2]
    return [randrange(-n // 2, n // 2) for i in range(n)]

def unique(a):
    a = a[::-1]
    b = set(a)
    for i in b:
        while a.count(i) != 1:
            a.remove(i)
            a.count(i)
    a = a[::-1]
    return a
Your algorithm is doing a lot of extra work moving elements around. Consider:
def unique(a):
    b = set()
    r = []
    for x in a:
        if x not in b:
            r.append(x)
            b.add(x)
    return r
Every time you call a.count(i), it loops over the entire list to count the occurrences. This is an O(n) operation which you repeat over and over. When you factor in the O(n) runtime of the outer for i in b: loop, the overall algorithmic complexity is O(n^2).
It doesn't help that there's a second unnecessary a.count(i) inside the while loop. That call doesn't do anything but chew up time.
This entire problem can be done in O(n) time. Your best bet would be to avoid list.count() altogether and figure out how you can loop over the list and count elements yourself. If you're clever you can do everything in a single pass, no nested loops (or implicit nested loops) required.
You can find a thorough benchmark of "unique" functions at this address. My personal favorite is
def unique(seq):
# Order preserving
seen = set()
return [x for x in seq if x not in seen and not seen.add(x)]
because it's the fastest and it preserves order, while using sets smartly. I think it's the one called f7, given in the comments.
Inspired by this earlier stack overflow question I have been considering how to randomly interleave iterables in python while preserving the order of elements within each iterable. For example:
>>> def interleave(*iterables):
... "Return the source iterables randomly interleaved"
... <insert magic here>
>>> interleave(xrange(1, 5), xrange(5, 10), xrange(10, 15))
[1, 5, 10, 11, 2, 6, 3, 12, 4, 13, 7, 14, 8, 9]
The original question asked to randomly interleave two lists, a and b, and the accepted solution was:
>>> c = [x.pop(0) for x in random.sample([a]*len(a) + [b]*len(b), len(a)+len(b))]
However, this solution works for only two lists (though it can easily be extended) and relies on the fact that a and b are lists so that pop() and len() can be called on them, meaning it cannot be used with iterables. It also has the unfortunate side effect of emptying the source lists a and b.
Alternate answers given for the original question take copies of the source lists to avoid modifying them, but this strikes me as inefficient, especially if the source lists are sizeable. The alternate answers also make use of len() and therefore cannot be used on mere iterables.
I wrote my own solution that works for any number of input lists and doesn't modify them:
def interleave(*args):
    iters = [i for i, b in ((iter(a), a) for a in args) for _ in xrange(len(b))]
    random.shuffle(iters)
    return map(next, iters)
but this solution also relies on the source arguments being lists so that len() can be used on them.
So, is there an efficient way to randomly interleave iterables in python, preserving the original order of elements, which doesn't require knowledge of the length of the iterables ahead of time and doesn't take copies of the iterables?
Edit: Please note that, as with the original question, I don't need the randomisation to be fair.
Here is one way to do it using a generator:
import random

def interleave(*args):
    iters = map(iter, args)
    while iters:
        it = random.choice(iters)
        try:
            yield next(it)
        except StopIteration:
            iters.remove(it)

print list(interleave(xrange(1, 5), xrange(5, 10), xrange(10, 15)))
Not if you want it to be "fair".
Imagine you have a list containing one million items and another containing just two items. A "fair" randomization would have the first element from the short list occurring at about index 300000 or so.
a,a,a,a,a,a,a,...,a,a,a,b,a,a,a,....
^
But there's no way to know in advance until you know the length of the lists.
If you just take from each list with 50% (1/n) probability then it can be done without knowing the lengths of the lists but you'll get something more like this:
a,a,b,a,b,a,a,a,a,a,a,a,a,a,a,a,...
^ ^
I am satisfied that the solution provided by aix meets the requirements of the question. However, after reading the comments by Mark Byers I wanted to see just how "unfair" the solution was.
Furthermore, sometime after I wrote this question, stack overflow user EOL posted another solution to the original question which yields a "fair" result. EOL's solution is:
>>> a.reverse()
>>> b.reverse()
>>> [(a if random.randrange(0, len(a)+len(b)) < len(a) else b).pop()
... for _ in xrange(len(a)+len(b))]
I also further enhanced my own solution so that it does not rely on its arguments supporting len() but does make copies of the source iterables:
def interleave(*args):
    iters = sum(([iter(list_arg)] * len(list_arg) for list_arg in map(list, args)), [])
    random.shuffle(iters)
    return map(next, iters)
or, written differently:
def interleave(*args):
    iters = [i for i, j in ((iter(k), k) for k in map(list, args)) for _ in j]
    random.shuffle(iters)
    return map(next, iters)
I then compared the accepted solution to the original question, written by F.J and reproduced in my question above, with the solutions of aix, EOL and my own. The test involved interleaving a list of 30000 elements with a single-element list (the sentinel). I repeated the test 1000 times, and the following table shows, for each algorithm, the minimum, maximum and mean index of the sentinel after interleaving, along with the total time taken. We would expect a "fair" algorithm to produce a mean of approx. 15,000:
algo      min    max     mean  total_seconds
----      ---    ---     ----  -------------
F.J         5  29952  14626.3          152.1
aix         0      8      0.9           27.5
EOL        45  29972  15091.0           61.2
srgerg     23  29978  14961.6           18.6
As can be seen from the results, each of the algorithms of F.J, EOL and srgerg produces ostensibly "fair" results (at least under the given test conditions). However, aix's algorithm always placed the sentinel within the first 10 elements of the result. I repeated the experiment several times with similar results.
So Mark Byers is proved correct. If a truly random interleaving is desired, the length of the source iterables will need to be known ahead of time, or copies will need to be made so the length can be determined.