Project Euler 92

Project Euler 92 - python

My code for solving problem 92 was correct but too slow and hence I tried to modify it by considering only one number for each possible permutation of that number, effectively reducing the size of the problem to 11439 from the original 10 million. Here's my code
import time
from Euler import multCoeff
start = time.time()
def newNum(n):
return sum([int(dig)**2 for dig in str(n)])
def chain(n, l):
if n in l:
return n, l
else:
l.append(n)
return chain(newNum(n), l)
nums = []
for i in range(1,10000000):
if all(str(i)[j] <= str(i)[j+1] for j in range(len(str(i))-1)):
nums.append(i)
count = 0
for i in nums:
if 89 in chain(i,[])[1]:
perms = multCoeff(i)
count += perms
end = time.time() - start
print count, end
multCoeff is a method that I created which is basically equivalent to len(set(permutations([int(j) for j in str(i)])))
and works just fine. Anyway, the problem is that the result I get is not the correct one, and it looks like I'm ignoring some of the cases but I can't really see which ones. I'd be really grateful if someone could point me in the right direction. Thanks.

We're missing the code for multCoeff, so I'm guessing here.
You're trying to filter from 1 to 999,999,999 by excluding numbers that do not have digits in ascending order and then re-calculating their permutations after.
Your problem is 0.
According to your filter 2, 20, 200, 2000, 20000, 200000, 2000000 are all represented by 2, however you're probably not adding back this many permutations.
General observations about your code:
item in list has O(n) time complexity; try to avoid doing this for large lists.
You are throwing away the results of many computations; any number in a chain that results in 89 or 1 will always have that end result.
Every function call has a time cost; try to keep the number of functions in looping calls low.
Casting str to int is somewhat expensive; try to keep the number of casts to a minimum.

Related

Finding the product of first Million natural numbers in Python

I have just started with Python programming language. I tried to write a function which takes input either a list or multiple integers to find their product. I am trying to find the product of first million natural numbers but its displaying an MemoryError.
def product(*arg):
answer=1
if type(arg) == tuple:
arg=str(arg)
arg=arg.lstrip('[(')
arg=arg.rstrip('],)')
arg=arg.split(',')
for i in arg:
answer*=int(i)
return answer
else:
for i in arg:
answer*=int(i)
return answer
j=range(1,1000000,1)
j=list(j)
print(product(j))
Steps:
I convert the range object into list object if i am to pass a list as
argument
Now, within the function, i try to split the tuple by converting it
string.
I convert the resultant string into a list and then loop over the
elements to find the product
Q1: How to avoid the memory error as i try to find the product of first Million natural numbers?
Q2 How to improve this code?

You can use a Generator in Python.
def generate_product():
r = 1
for i in range(1,1000000):
r *= i + 1
yield r
list(generate_product())[0]
It is more memory efficient and better in terms of performance.

To calculate the product of all numbers from 1 to 1 million use a simple loop:
r = 1
for l in range(1,1000000):
r*=(i+1)
print(res)
But keep in mind that the result will be a pretty big number.
That means that your calculation might take long and the resulting number will need a lot memory.
EDIT Then i missread your question a little. This is a function that multiplies the elements in a list:
def multiply_list_elements(_list):
result = 1
for element in _list:
result*=element
return result
multiply_list_elements([1,2,3,4])
>>> 24
The memory error probably came from the huge number as #ZabirAlNazi calculated so nicely.

All of the solution is fine, but one point to make - your question is equivalent to find the factorial of 1 million.
The number of digits of n! = log10(1) + log10(2) + ... log10(n)
import math
num_dig = 1
for i in range(1,1000000):
num_dig += math.log10(i)
print(num_dig)
So, the number of digits in your answer is 5565703 (approx.).
That's only the final n, if you also want the intermediate results it will require squared memory O(m^2).
import math
ans = 1
for i in range(2,1000001):
ans *= i
print(ans)
N.B: You can approximate with logarithms and Stirling numbers with faster run-time.

A very simple solution would be:
def prod_of():
p=1
for i in range(1,1000000):
p* = i
print(p)

Elements of Programming Interview 5.15 (Random Subset Computation)

Algorithm problem:
Write a program which takes as input a positive integer n and size
k <= n; return a size-k subset of {0, 1, 2, .. , n -1}. The subset
should be represented as an array. All subsets should be equally
likely, and in addition, all permutations of elements of the array
should be equally likely. You may assume you have a function which
takes as input a nonnegative integer t and returns an integer in the
set {0, 1,...,t-1}.
My original solution to this in pseudocode is as follows:
Set t = n, and output the result of the random number generator into a set() until set() has size(set) == t. Return list(set)
The author solution is as follows:
def online_sampling(n, k):
changed_elements = {}
for i in range(k):
rand_idx = random.randrange(i, n)
rand_idx_mapped = changed_elements.get(rand_idx, rand_idx)
i_mapped = changed_elements.get(i, i)
changed_elements[rand_idx] = i_mapped
changed_elements[i] = rand_idx_mapped
return [changed_elements[i] for i in range(k)]
I totally understand the author's solution - my question is more about why my solution is incorrect. My guess is that it becomes greatly inefficient as t approaches n, because in that case, the probability that I need to keep running the random num function until I get a number that isn't in t gets higher and higher. If t == n, for the very last element to add to set there is just a 1/n chance that I get the correct element, and would probabilistically need to run the given rand() function n times just to get the last item.
Is this the correct reason for why my solution isn't efficient? Is there anything else I'm missing? And how would one describe the time complexity of my solution then? By the above rationale, I believe would be O(n^2) since probabilistically need to run n + n - 1 + n - 2... times.

Your solution is (almost) correct.
Firstly, it will run in O(n log n) instead of O(n^2), assuming that all operations with set are O(1). Here's why.
The expected time to add the first element to the set is 1 = n/n.
The expected time to add the second element to the set is n/(n-1), because the probability to randomly choose yet unchosen element is (n-1)/n. See geometric distribution for an explanation.
...
For k-th element, the expected time is n/(n-k). So for n elements the total time is n/n + n/(n-1) + ... + n/1 = n * (1 + 1/2 + ... + 1/n) = n log n.
Moreover, we can prove by induction that all chosen subsets will be equiprobable.
However, when you do list(set(...)), it is not guaranteed the resulting list will contain elements in the same order as you put them into a set. For example, if set is implemented as a binary search tree then the list will always be sorted. So you have to store the list of unique found elements separately.
UPD (#JimMischel): we proved the average case running time. There still is a possibility that the algorithm will run indefinitely (for example, if rand() always returns 1).

Your method has a big problem. You may return duplicate numbers if you random number generator create same number two times isn't it?
If you say set() will not keep duplicate numbers, your method has created members of set with different chance. So numbers in your set will not be equally likely.
Problem with your method is not efficiency, it does not create an equally likely result set. The author uses a variation of Fisher-Yates method for creating that subset which will be equally likely.

Better python logic that prevent time out when comparing arrays in nested loops

I was attempting to solve a programing challenge and the program i wrote solved the small test data correctly for this question. But When they run it against the larger datasets, my program timed out on some of the occasions . I am mostly a self taught programmer, if there is a better algorithm/implementation than my logic can you guys tell me.thanks.
Question
Given an array of integers, a, return the maximum difference of any
pair of numbers such that the larger integer in the pair occurs at a
higher index (in the array) than the smaller integer. Return -1 if you
cannot find a pair that satisfies this condition.
My Python Function
def maxDifference( a):
diff=0
find=0
leng = len(a)
for x in range(0,leng-1):
for y in range(x+1,leng):
if(a[y]-a[x]>=diff):
diff=a[y]-a[x]
find=1
if find==1:
return diff
else:
return -1
Constraints:
1 <= N <= 1,000,000
-1,000,000 <= a[i] <= 1,000,000 i belongs to [1,N]
Sample Input:
Array { 2,3,10,2,4,8,1}
Sample Output:
8

Well... since you don't care for anything else than finding the highest number following the lowest number, provided that difference is the highest so far, there's no reason to do several passes or using max() over a slice of the array:
def f1(a):
smallest = a[0]
result = 0
for b in a:
if b < smallest:
smallest = b
if b - smallest > result:
result = b - smallest
return result if result > 0 else -1
Thanks #Matthew for the testing code :)
This is very fast even on large sets:
The maximum difference is 99613 99613 99613
Time taken by Sojan's method: 0.0480000972748
Time taken by #Matthews's method: 0.0130000114441
Time taken by #GCord's method: 0.000999927520752

The reason your program takes too long is that your nested loop inherently means quadratic time.
The outer loop goes through N-1 indices. The inner loop goes through a different number of indices each time, but the average is obviously (N-1)/2 rounded up. So, the total number of times through the inner loop is (N-1) * (N-1)/2, which is O(N^2). For the maximum N=1000000, that means 499999000001 iterations. That's going to take a long time.
The trick is to find a way to do this in linear time.
Here's one solution (as a vague description, rather than actual code, so someone can't just copy and paste it when they face the same test as you):
Make a list of the smallest value before each index. Each one is just min(smallest_values[-1], arr[i]), and obviously you can do this in N steps.
Make a list of the largest value after each index. The simplest way to do this is to reverse the list, do the exact same loop as above (but with max instead of min), then reverse again. (Reversing a list takes N steps, of course.)
Now, for each element in the list, instead of comparing to every other element, you just have to compare to smallest_values[i] and largest_values[i]. Since you're only doing 2 comparisons for each of the N values, this takes 2N time.
So, even being lazy and naive, that's a total of N + 3N + 2N steps, which is O(N). If N=1000000, that means 6000000 steps, which is a whole lot faster than 499999000001.
You can obviously see how to remove the two reverses, and how to skip the first and last comparisons. If you're smart, you can see how to take the whole largest_values out of the equation entirely. Ultimately, I think you can get it down to 2N - 3 steps, or 1999997. But that's all just a small constant improvement; nowhere near as important as fixing the basic algorithmic problem. You'd probably get a bigger improvement than 3x (maybe 20x), for less work, by just running the naive code in PyPy instead of CPython, or by converting to NumPy—but you're not going to get the 83333x improvement in any way other than changing the algorithm.

Here's a linear time solution. It keeps a track of the minimum value before each index of the list. These minimum values are stored in a list min_lst. Finally, the difference between corresponding elements of the original and the min list is calculated into another list of differences by zipping the two. The maximum value in this differences list should be the required answer.
def get_max_diff(lst):
min_lst = []
running_min = lst[0]
for item in lst:
if item < running_min:
running_min = item
min_lst.append(running_min)
val = max(x-y for (x, y) in zip(lst, min_lst))
if not val:
return -1
return val
>>> get_max_diff([5, 6, 2, 12, 8, 15])
13
>>> get_max_diff([2, 3, 10, 2, 4, 8, 1])
8
>>> get_max_diff([5, 4, 3, 2, 1])
-1

Well, I figure since someone in the same problem can copy your code and run with that, I won't lose any sleep over them copying some more optimized code:
import time
import random
def max_difference1(a):
# your function
def max_difference2(a):
diff = 0
for i in range(0, len(a)-1):
curr_diff = max(a[i+1:]) - a[i]
diff = max(curr_diff, diff)
return diff if diff != 0 else -1
my_randoms = random.sample(range(100000), 1000)
t01 = time.time()
max_dif1 = max_difference1(my_randoms)
dt1 = time.time() - t01
t02 = time.time()
max_dif2 = max_difference2(my_randoms)
dt2 = time.time() - t02
print("The maximum difference is", max_dif1)
print("Time taken by your method:", dt1)
print("Time taken by my method:", dt2)
print("My method is", dt1/dt2, "times faster.")
The maximum difference is 99895
Time taken by your method: 0.5533690452575684
Time taken by my method: 0.08005285263061523
My method is 6.912546237558299 times faster.
Similar to what #abarnert said (who always snipes me on these things I swear), you don't want to loop over the list twice. You can exploit the fact that you know that your larger value has to be in front of the smaller one. You also can exploit the fact that you don't care for anything except the largest number, that is, in the list [1,3,8,5,9], the maximum difference is 8 (9-1) and you don't care that 3, 8, and 5 are in there. Thus: max(a[i+1:]) - a[i] is the maximum difference for a given index.
Then you compare it with diff, and take the larger of the 2 with max, as calling default built-in python functions is somewhat faster than if curr_diff > diff: diff = curr_diff (or equivalent).
The return line is simply your (fixed) line in 1 line instead of 4
As you can see, in a sample of 1000, this method is ~6x faster (note: used python 3.4, but nothing here would break on python 2.x)

I think the expected answer for
1, 2, 4, 2, 3, 8, 5, 6, 10
will be 8 - 2 = 6 but instead Saksham Varma code will return 10 - 1 = 9.
Its max(arr) - min(arr).
Don't we have to reset the min value when there is a dip
. ie; 4 -> 2 will reset current_smallest = 2 and continue diff the calculation with value '2'.
def f2(a):
current_smallest = a[0]
large_diff = 0
for i in range(1, len(a)):
# Identify the dip
if a[i] < a[i-1]:
current_smallest = a[i]
if a[i] - current_smallest > large_diff:
large_diff = a[i] - current_smallest

Python: speed up removal of every n-th element from list

I'm trying to solve this programming riddle and although the solution (see code below) works correctly, it is too slow for succesful submission.
Any pointers as how to make this run
faster (removal of every n-th element from a list)?
Or suggestions for a better algorithm to calculate the same; seems I can't think of anything
else than brute-force for now...
Basically, the task at hand is:
GIVEN:
L = [2,3,4,5,6,7,8,9,10,11,........]
1. Take the first remaining item in list L (in the general case 'n'). Move it to
the 'lucky number list'. Then drop every 'n-th' item from the list.
2. Repeat 1
TASK:
Calculate the n-th number from the 'lucky number list' ( 1 <= n <= 3000)
My original code (it calculated the 3000 first lucky numbers in about a second on my machine - unfortunately too slow):
"""
SPOJ Problem Set (classical) 1798. Assistance Required
URL: http://www.spoj.pl/problems/ASSIST/
"""
sieve = range(3, 33900, 2)
luckynumbers = [2]
while True:
wanted_n = input()
if wanted_n == 0:
break
while len(luckynumbers) < wanted_n:
item = sieve[0]
luckynumbers.append(item)
items_to_delete = set(sieve[::item])
sieve = filter(lambda x: x not in items_to_delete, sieve)
print luckynumbers[wanted_n-1]
EDIT: thanks to the terrific contributions of Mark Dickinson, Steve Jessop and gnibbler, I got at the following, which is quite a whole lot faster than my original code (and succesfully got submitted at http://www.spoj.pl with 0.58 seconds!)...
sieve = range(3, 33810, 2)
luckynumbers = [2]
while len(luckynumbers) < 3000:
if len(sieve) < sieve[0]:
luckynumbers.extend(sieve)
break
luckynumbers.append(sieve[0])
del sieve[::sieve[0]]
while True:
wanted_n = input()
if wanted_n == 0:
break
else:
print luckynumbers[wanted_n-1]

This series is called ludic numbers
__delslice__ should be faster than __setslice__+filter
>>> L=[2,3,4,5,6,7,8,9,10,11,12]
>>> lucky=[]
>>> lucky.append(L[0])
>>> del L[::L[0]]
>>> L
[3, 5, 7, 9, 11]
>>> lucky.append(L[0])
>>> del L[::L[0]]
>>> L
[5, 7, 11]
So the loop becomes.
while len(luckynumbers) < 3000:
item = sieve[0]
luckynumbers.append(item)
del sieve[::item]
Which runs in less than 0.1 second

Try using these two lines for the deletion and filtering, instead of what you have; filter(None, ...) runs considerably faster than the filter(lambda ...).
sieve[::item] = [0]*-(-len(sieve)//item)
sieve = filter(None, sieve)
Edit: much better to simply use del sieve[::item]; see gnibbler's solution.
You might also be able to find a better termination condition for the while loop: for example, if the first remaining item in the sieve is i then the first i elements of the sieve will become the next i lucky numbers; so if len(luckynumbers) + sieve[0] >= wanted_n you should already have computed the number you need---you just need to figure out where in sieve it is so that you can extract it.
On my machine, the following version of your inner loop runs around 15 times faster than your original for finding the 3000th lucky number:
while len(luckynumbers) + sieve[0] < wanted_n:
item = sieve[0]
luckynumbers.append(item)
sieve[::item] = [0]*-(-len(sieve)//item)
sieve = filter(None, sieve)
print (luckynumbers + sieve)[wanted_n-1]

An explanation on how to solve this problem can be found here. (The problem I linked to asks for more, but the main step in that problem is the same as the one you're trying to solve.) The site I linked to also contains a sample solution in C++.
The set of numbers can be represented in a binary tree, which supports the following operations:
Return the nth element
Erase the nth element
These operations can be implemented to run in O(log n) time, where n is the number of nodes in the tree.
To build the tree, you can either make a custom routine that builds the tree from a given array of elements, or implement an insert operation (make sure to keep the tree balanced).
Each node in the tree need the following information:
Pointers to the left and right children
How many items there are in the left and right subtrees
With such a structure in place, solving the rest of the problem should be fairly straightforward.
I also recommend calculating the answers for all possible input values before reading any input, instead of calculating the answer for each input line.
A Java implementation of the above algorithm gets accepted in 0.68 seconds at the website you linked.
(Sorry for not providing any Python-specific help, but hopefully the algorithm outlined above will be fast enough.)

You're better off using an array and zeroing out every Nth item using that strategy; after you do this a few times in a row, the updates start getting tricky so you'd want to re-form the array. This should improve the speed by at least a factor of 10. Do you need vastly better than that?

Why not just create a new list?
L = [x for (i, x) in enumerate(L) if i % n]

How can I merge two lists and sort them working in 'linear' time?

I have this, and it works:
# E. Given two lists sorted in increasing order, create and return a merged
# list of all the elements in sorted order. You may modify the passed in lists.
# Ideally, the solution should work in "linear" time, making a single
# pass of both lists.
def linear_merge(list1, list2):
finalList = []
for item in list1:
finalList.append(item)
for item in list2:
finalList.append(item)
finalList.sort()
return finalList
# +++your code here+++
return
But, I'd really like to learn this stuff well. :) What does 'linear' time mean?

Linear means O(n) in Big O notation, while your code uses a sort() which is most likely O(nlogn).
The question is asking for the standard merge algorithm. A simple Python implementation would be:
def merge(l, m):
result = []
i = j = 0
total = len(l) + len(m)
while len(result) != total:
if len(l) == i:
result += m[j:]
break
elif len(m) == j:
result += l[i:]
break
elif l[i] < m[j]:
result.append(l[i])
i += 1
else:
result.append(m[j])
j += 1
return result
>>> merge([1,2,6,7], [1,3,5,9])
[1, 1, 2, 3, 5, 6, 7, 9]

Linear time means that the time taken is bounded by some undefined constant times (in this context) the number of items in the two lists you want to merge. Your approach doesn't achieve this - it takes O(n log n) time.
When specifying how long an algorithm takes in terms of the problem size, we ignore details like how fast the machine is, which basically means we ignore all the constant terms. We use "asymptotic notation" for that. These basically describe the shape of the curve you would plot in a graph of problem size in x against time taken in y. The logic is that a bad curve (one that gets steeper quickly) will always lead to a slower execution time if the problem is big enough. It may be faster on a very small problem (depending on the constants, which probably depends on the machine) but for small problems the execution time isn't generally a big issue anyway.
The "big O" specifies an upper bound on execution time. There are related notations for average execution time and lower bounds, but "big O" is the one that gets all the attention.
O(1) is constant time - the problem size doesn't matter.
O(log n) is a quite shallow curve - the time increases a bit as the problem gets bigger.
O(n) is linear time - each unit increase means it takes a roughly constant amount of extra time. The graph is (roughly) a straight line.
O(n log n) curves upwards more steeply as the problem gets more complex, but not by very much. This is the best that a general-purpose sorting algorithm can do.
O(n squared) curves upwards a lot more steeply as the problem gets more complex. This is typical for slower sorting algorithms like bubble sort.
The nastiest algorithms are classified as "np-hard" or "np-complete" where the "np" means "non-polynomial" - the curve gets steeper quicker than any polynomial. Exponential time is bad, but some are even worse. These kinds of things are still done, but only for very small problems.
EDIT the last paragraph is wrong, as indicated by the comment. I do have some holes in my algorithm theory, and clearly it's time I checked the things I thought I had figured out. In the mean time, I'm not quite sure how to correct that paragraph, so just be warned.
For your merging problem, consider that your two input lists are already sorted. The smallest item from your output must be the smallest item from one of your inputs. Get the first item from both and compare the two, and put the smallest in your output. Put the largest back where it came from. You have done a constant amount of work and you have handled one item. Repeat until both lists are exhausted.
Some details... First, putting the item back in the list just to pull it back out again is obviously silly, but it makes the explanation easier. Next - one input list will be exhausted before the other, so you need to cope with that (basically just empty out the rest of the other list and add it to the output). Finally - you don't actually have to remove items from the input lists - again, that's just the explanation. You can just step through them.

Linear time means that the runtime of the program is proportional to the length of the input. In this case the input consists of two lists. If the lists are twice as long, then the program will run approximately twice as long. Technically, we say that the algorithm should be O(n), where n is the size of the input (in this case the length of the two input lists combined).
This appears to be homework, so I will no supply you with an answer. Even though this is not homework, I am of the opinion that you will be best served by taking a pen and a piece of paper, construct two smallish example lists which are sorted, and figure out how you would merge those two lists, by hand. Once you figured that out, implementing the algorithm is a piece of cake.
(If all goes well, you will notice that you need to iterate over each list only once, in a single direction. That means that the algorithm is indeed linear. Good luck!)

If you build the result in reverse sorted order, you can use pop() and still be O(N)
pop() from the right end of the list does not require shifting the elements, so is O(1)
Reversing the list before we return it is O(N)
>>> def merge(l, r):
... result = []
... while l and r:
... if l[-1] > r[-1]:
... result.append(l.pop())
... else:
... result.append(r.pop())
... result+=(l+r)[::-1]
... result.reverse()
... return result
...
>>> merge([1,2,6,7], [1,3,5,9])
[1, 1, 2, 3, 5, 6, 7, 9]

This thread contains various implementations of a linear-time merge algorithm. Note that for practical purposes, you would use heapq.merge.

Linear time means O(n) complexity. You can read something about algorithmn comlexity and big-O notation here: http://en.wikipedia.org/wiki/Big_O_notation .
You should try to combine those lists not after getting them in the finalList, try to merge them gradually - adding an element, assuring the result is sorted, then add next element... this should give you some ideas.

A simpler version which will require equal sized lists:
def merge_sort(L1, L2):
res = []
for i in range(len(L1)):
if(L1[i]<L2[i]):
first = L1[i]
secound = L2[i]
else:
first = L2[i]
secound = L1[i]
res.extend([first,secound])
return res

itertoolz provides an efficient implementation to merge two sorted lists
https://toolz.readthedocs.io/en/latest/_modules/toolz/itertoolz.html#merge_sorted

'Linear time' means that time is an O(n) function, where n - the number of items input (items in the lists).
f(n) = O(n) means that that there exist constants x and y such that x * n <= f(n) <= y * n.
def linear_merge(list1, list2):
finalList = []
i = 0
j = 0
while i < len(list1):
if j < len(list2):
if list1[i] < list2[j]:
finalList.append(list1[i])
i += 1
else:
finalList.append(list2[j])
j += 1
else:
finalList.append(list1[i])
i += 1
while j < len(list2):
finalList.append(list2[j])
j += 1
return finalList

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.