Finding median of list in Python - python

How do you find the median of a list in Python? The list can be of any size and the numbers are not guaranteed to be in any particular order.
If the list contains an even number of elements, the function should return the average of the middle two.
Here are some examples (sorted for display purposes):
median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2

Python 3.4 has statistics.median:
Return the median (middle value) of numeric data.
When the number of data points is odd, return the middle data point.
When the number of data points is even, the median is interpolated by taking the average of the two middle values:
>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0
Usage:
import statistics
items = [6, 1, 8, 2, 3]
statistics.median(items)
#>>> 3
It's pretty careful with types, too:
statistics.median(map(float, items))
#>>> 3.0
from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')

(Works with python-2.x):
def median(lst):
n = len(lst)
s = sorted(lst)
return (s[n//2-1]/2.0+s[n//2]/2.0, s[n//2])[n % 2] if n else None
>>> median([-5, -5, -3, -4, 0, -1])
-3.5
numpy.median():
>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0
For python-3.x, use statistics.median:
>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0

The sorted() function is very helpful for this. Use the sorted function
to order the list, then simply return the middle value (or average the two middle
values if the list contains an even amount of elements).
def median(lst):
sortedLst = sorted(lst)
lstLen = len(lst)
index = (lstLen - 1) // 2
if (lstLen % 2):
return sortedLst[index]
else:
return (sortedLst[index] + sortedLst[index + 1])/2.0

Of course you can use build in functions, but if you would like to create your own you can do something like this. The trick here is to use ~ operator that flip positive number to negative. For instance ~2 -> -3 and using negative in for list in Python will count items from the end. So if you have mid == 2 then it will take third element from beginning and third item from the end.
def median(data):
data.sort()
mid = len(data) // 2
return (data[mid] + data[~mid]) / 2

Here's a cleaner solution:
def median(lst):
quotient, remainder = divmod(len(lst), 2)
if remainder:
return sorted(lst)[quotient]
return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.
Note: Answer changed to incorporate suggestion in comments.

You can try the quickselect algorithm if faster average-case running times are needed. Quickselect has average (and best) case performance O(n), although it can end up O(n²) on a bad day.
Here's an implementation with a randomly chosen pivot:
import random
def select_nth(n, items):
pivot = random.choice(items)
lesser = [item for item in items if item < pivot]
if len(lesser) > n:
return select_nth(n, lesser)
n -= len(lesser)
numequal = items.count(pivot)
if numequal > n:
return pivot
n -= numequal
greater = [item for item in items if item > pivot]
return select_nth(n, greater)
You can trivially turn this into a method to find medians:
def median(items):
if len(items) % 2:
return select_nth(len(items)//2, items)
else:
left = select_nth((len(items)-1) // 2, items)
right = select_nth((len(items)+1) // 2, items)
return (left + right) / 2
This is very unoptimised, but it's not likely that even an optimised version will outperform Tim Sort (CPython's built-in sort) because that's really fast. I've tried before and I lost.

You can use the list.sort to avoid creating new lists with sorted and sort the lists in place.
Also you should not use list as a variable name as it shadows python's own list.
def median(l):
half = len(l) // 2
l.sort()
if not len(l) % 2:
return (l[half - 1] + l[half]) / 2.0
return l[half]

def median(x):
x = sorted(x)
listlength = len(x)
num = listlength//2
if listlength%2==0:
middlenum = (x[num]+x[num-1])/2
else:
middlenum = x[num]
return middlenum

def median(array):
"""Calculate median of the given list.
"""
# TODO: use statistics.median in Python 3
array = sorted(array)
half, odd = divmod(len(array), 2)
if odd:
return array[half]
return (array[half - 1] + array[half]) / 2.0

A simple function to return the median of the given list:
def median(lst):
lst = sorted(lst) # Sort the list first
if len(lst) % 2 == 0: # Checking if the length is even
# Applying formula which is sum of middle two divided by 2
return (lst[len(lst) // 2] + lst[(len(lst) - 1) // 2]) / 2
else:
# If length is odd then get middle value
return lst[len(lst) // 2]
Some examples with the median function:
>>> median([9, 12, 20, 21, 34, 80]) # Even
20.5
>>> median([9, 12, 80, 21, 34]) # Odd
21
If you want to use library you can just simply do:
>>> import statistics
>>> statistics.median([9, 12, 20, 21, 34, 80]) # Even
20.5
>>> statistics.median([9, 12, 80, 21, 34]) # Odd
21

I posted my solution at Python implementation of "median of medians" algorithm , which is a little bit faster than using sort(). My solution uses 15 numbers per column, for a speed ~5N which is faster than the speed ~10N of using 5 numbers per column. The optimal speed is ~4N, but I could be wrong about it.
Per Tom's request in his comment, I added my code here, for reference. I believe the critical part for speed is using 15 numbers per column, instead of 5.
#!/bin/pypy
#
# TH #stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random
items_per_column = 15
def find_i_th_smallest( A, i ):
t = len(A)
if(t <= items_per_column):
# if A is a small list with less than items_per_column items, then:
#
# 1. do sort on A
# 2. find i-th smallest item of A
#
return sorted(A)[i]
else:
# 1. partition A into columns of k items each. k is odd, say 5.
# 2. find the median of every column
# 3. put all medians in a new list, say, B
#
B = [ find_i_th_smallest(k, (len(k) - 1)/2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]
# 4. find M, the median of B
#
M = find_i_th_smallest(B, (len(B) - 1)/2)
# 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
# 6. find which above set has A's i-th smallest, recursively.
#
P1 = [ j for j in A if j < M ]
if(i < len(P1)):
return find_i_th_smallest( P1, i)
P3 = [ j for j in A if j > M ]
L3 = len(P3)
if(i < (t - L3)):
return M
return find_i_th_smallest( P3, i - (t - L3))
# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])
# create a list of random positive integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]
# Show the original list
#
# print L
# This is for validation
#
# print sorted(L)[int((len(L) - 1)/2)]
# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print find_i_th_smallest( L, (len(L) - 1) / 2)

In case you need additional information on the distribution of your list, the percentile method will probably be useful. And a median value corresponds to the 50th percentile of a list:
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9])
median_value = np.percentile(a, 50) # return 50th percentile
print median_value

Here what I came up with during this exercise in Codecademy:
def median(data):
new_list = sorted(data)
if len(new_list)%2 > 0:
return new_list[len(new_list)/2]
elif len(new_list)%2 == 0:
return (new_list[(len(new_list)/2)] + new_list[(len(new_list)/2)-1]) /2.0
print median([1,2,3,4,5,9])

Just two lines are enough.
def get_median(arr):
'''
Calculate the median of a sequence.
:param arr: list
:return: int or float
'''
arr = sorted(arr)
return arr[len(arr)//2] if len(arr) % 2 else (arr[len(arr)//2] + arr[len(arr)//2-1])/2

median Function
def median(midlist):
midlist.sort()
lens = len(midlist)
if lens % 2 != 0:
midl = (lens / 2)
res = midlist[midl]
else:
odd = (lens / 2) -1
ev = (lens / 2)
res = float(midlist[odd] + midlist[ev]) / float(2)
return res

I had some problems with lists of float values. I ended up using a code snippet from the python3 statistics.median and is working perfect with float values without imports. source
def calculateMedian(list):
data = sorted(list)
n = len(data)
if n == 0:
return None
if n % 2 == 1:
return data[n // 2]
else:
i = n // 2
return (data[i - 1] + data[i]) / 2

def midme(list1):
list1.sort()
if len(list1)%2>0:
x = list1[int((len(list1)/2))]
else:
x = ((list1[int((len(list1)/2))-1])+(list1[int(((len(list1)/2)))]))/2
return x
midme([4,5,1,7,2])

def median(array):
if len(array) < 1:
return(None)
if len(array) % 2 == 0:
median = (array[len(array)//2-1: len(array)//2+1])
return sum(median) / len(median)
else:
return(array[len(array)//2])

I defined a median function for a list of numbers as
def median(numbers):
return (sorted(numbers)[int(round((len(numbers) - 1) / 2.0))] + sorted(numbers)[int(round((len(numbers) - 1) // 2.0))]) / 2.0

import numpy as np
def get_median(xs):
mid = len(xs) // 2 # Take the mid of the list
if len(xs) % 2 == 1: # check if the len of list is odd
return sorted(xs)[mid] #if true then mid will be median after sorting
else:
#return 0.5 * sum(sorted(xs)[mid - 1:mid + 1])
return 0.5 * np.sum(sorted(xs)[mid - 1:mid + 1]) #if false take the avg of mid
print(get_median([7, 7, 3, 1, 4, 5]))
print(get_median([1,2,3, 4,5]))

A more generalized approach for median (and percentiles) would be:
def get_percentile(data, percentile):
# Get the number of observations
cnt=len(data)
# Sort the list
data=sorted(data)
# Determine the split point
i=(cnt-1)*percentile
# Find the `floor` of the split point
diff=i-int(i)
# Return the weighted average of the value above and below the split point
return data[int(i)]*(1-diff)+data[int(i)+1]*(diff)
# Data
data=[1,2,3,4,5]
# For the median
print(get_percentile(data=data, percentile=.50))
# > 3
print(get_percentile(data=data, percentile=.75))
# > 4
# Note the weighted average difference when an int is not returned by the percentile
print(get_percentile(data=data, percentile=.51))
# > 3.04

Try This
import math
def find_median(arr):
if len(arr)%2==1:
med=math.ceil(len(arr)/2)-1
return arr[med]
else:
return -1
print(find_median([1,2,3,4,5,6,7,8]))

Implement it:
def median(numbers):
"""
Calculate median of a list numbers.
:param numbers: the numbers to be calculated.
:return: median value of numbers.
>>> median([1, 3, 3, 6, 7, 8, 9])
6
>>> median([1, 2, 3, 4, 5, 6, 8, 9])
4.5
>>> import statistics
>>> import random
>>> numbers = random.sample(range(-50, 50), k=100)
>>> statistics.median(numbers) == median(numbers)
True
"""
numbers = sorted(numbers)
mid_index = len(numbers) // 2
return (
(numbers[mid_index] + numbers[mid_index - 1]) / 2 if mid_index % 2 == 0
else numbers[mid_index]
)
if __name__ == "__main__":
from doctest import testmod
testmod()
source from

Function median:
def median(d):
d=np.sort(d)
n2=int(len(d)/2)
r=n2%2
if (r==0):
med=d[n2]
else:
med=(d[n2] + d[n2+1]) / 2
return med

Simply, Create a Median Function with an argument as a list of the number and call the function.
def median(l):
l = sorted(l)
lent = len(l)
if (lent % 2) == 0:
m = int(lent / 2)
result = l[m]
else:
m = int(float(lent / 2) - 0.5)
result = l[m]
return result

What I did was this:
def median(a):
a = sorted(a)
if len(a) / 2 != int:
return a[len(a) / 2]
else:
return (a[len(a) / 2] + a[(len(a) / 2) - 1]) / 2
Explanation: Basically if the number of items in the list is odd, return the middle number, otherwise, if you half an even list, python automatically rounds the higher number so we know the number before that will be one less (since we sorted it) and we can add the default higher number and the number lower than it and divide them by 2 to find the median.

Here's the tedious way to find median without using the median function:
def median(*arg):
order(arg)
numArg = len(arg)
half = int(numArg/2)
if numArg/2 ==half:
print((arg[half-1]+arg[half])/2)
else:
print(int(arg[half]))
def order(tup):
ordered = [tup[i] for i in range(len(tup))]
test(ordered)
while(test(ordered)):
test(ordered)
print(ordered)
def test(ordered):
whileloop = 0
for i in range(len(ordered)-1):
print(i)
if (ordered[i]>ordered[i+1]):
print(str(ordered[i]) + ' is greater than ' + str(ordered[i+1]))
original = ordered[i+1]
ordered[i+1]=ordered[i]
ordered[i]=original
whileloop = 1 #run the loop again if you had to switch values
return whileloop

It is very simple;
def median(alist):
#to find median you will have to sort the list first
sList = sorted(alist)
first = 0
last = len(sList)-1
midpoint = (first + last)//2
return midpoint
And you can use the return value like this median = median(anyList)

Related

How do I check the list elements satisfy a given condition?

I have a list of numbers where in the sum of any two adjacent numbers is a perfect square.
The list is x=[1,8,28,21,4,32,17,19,30,6,3,13,12,24]
for i in range(len(x)-1):
y= x[i]+x[i+1]
z=y**0.5
#till here found the square root of the sum of the adjacent numbers in list
if(z.is_integer==True):
//code
I want to check the remaining numbers in the list. If all the elements of the list satisfy the condition. Then I want to print the list
The expected output should be
[1,8,28,21,4,32,17,19,30,6,3,13,12,24] satisfies the condition
Maybe something like this? Make function that will be called for list and return True if list satisfies condition and False if it doesn't.
def some_function(nums):
for i in range(len(nums) - 1):
y = nums[i] + nums[i + 1]
z = y ** 0.5
#till here found the square root of the sum of the adjacent numbers in list
if z.is_integer() not True:
# if there is some two numbers that don't meet condition, function will return False
return False
return True
You call it like this: meet_condition = some_function(x)
After that just check if it's True and if it is print list and appropriate text.
You can use this approach. I am sure there is a better way, with fewer lines of code. Have a blast!
numbers = [1,8,28,21,4,32,17,19,30,6,3,13,12,24]
for i in range(len(numbers)):
if i < len(numbers) - 1:
if ((numbers[i] + numbers[i+1]) ** 0.5) % 1 == 0:
continue
else:
print("Does not satisfy condition")
break
print(numbers)
Output:
[1, 8, 28, 21, 4, 32, 17, 19, 30, 6, 3, 13, 12, 24]
import math
x=[1,8,28,21,4,32,17,19,30,6,3,13,12,24]
b = [x[i] for i in range(0,len(x)-1) if math.sqrt(x[i]+x[i+1]).is_integer()]
if (b[-1] == x[-2]):
b.append(x[-1])
print(b)
Output : [1, 8, 28, 21, 4, 32, 17, 19, 30, 6, 3, 13, 12, 24]
I'd suggest you review the way you check the integer is a square:
y = 9
z = y ** 0.5
z #=> 3.0
z.is_integer==True #=> False
So, it seems 9 not to be a square.
is_integer should be is_integer() without check.
Please refer to this topic for example: Check if a number is a perfect square
For a solution to the question, I propose to split the problem:
Define a method that returns consecutive pairs from the list;
Define a method that checks if the number is a squere;
Put it together using the all(iterable) function.
First the method for getting consecutive elements, in this case is a generator:
def each_cons(iterable, n = 2):
if n < 2: n = 1
i, size = 0, len(iterable)
while i < size-n+1:
yield iterable[i:i+n]
i += 1
Given your list x=[1,8,28,21,4,32,17,19,30,6,3,13,12,24], you call it this way:
list(each_cons(x));
# => [[1, 8],[8, 28],[28, 21],[21, 4],[4, 32],[32, 17],[17, 19],[19, 30],[30, 6],[6, 3],[3, 13],[13, 12],[12, 24]]
Then a method for the square, which I've stolen here:
def is_square(apositiveint):
x = apositiveint // 2
seen = set([x])
while x * x != apositiveint:
x = (x + (apositiveint // x)) // 2
if x in seen: return False
seen.add(x)
return True
Finally put all together:
all(is_square(a + b) for a, b in each_cons(x))
#=> True
Something pythonic should be like this.
import numpy as np
import functools
nums = np.array([1,8,28,21,4,32,17,19,30,6,3,13,12,24])
result = functools.reduce(lambda a,b: a and b, map(lambda x : (x ** 0.5).is_integer(), nums[1:] + nums[:-1]))

min sum of consecutive values with Divide and Conquer

Given an array of random integers
N = [1,...,n]
I need to find min sum of two consecutive values using divide and conquer.
What is not working here but my IQ?
def minSum(array):
if len(array) < 2:
return array[0]+array[1]
if (len(a)%2) != 0:
mid = int(len(array)/2)
leftArray = array[:mid]
rightArray = array[mid+1:]
return min(minSum(leftArray),minSum(rightArray),crossSum(array,mid))
else:
mid = int(len(array)/2)
leftArray = array[:mid]
rightArray = array[mid:]
return min(minSum(leftArray), minSum(rightArray), array[mid]+array[mid+1])
def crossSum(array,mid):
return min(array[mid-1]+array[mid],array[mid]+array[mid+1])
The main problem seems to be that the first condition is wrong: If len(array) < 2, then the following line is bound to raise an IndexError. Also, a is not defined. I assume that that's the name of the array in the outer scope, thus this does not raise an exception but just silently uses the wrong array. Apart from that, the function seems to more-or-less work (did not test it thoroughly, though.
However, you do not really need to check whether the array has odd or even length, you can just use the same code for both cases, making the crossSum function unneccesary. Also, it is kind of confusing that the function for returning the min sum is called maxSum. If you really want a divide-and-conquer approach, try this:
def minSum(array):
if len(array) < 2:
return 10**100
elif len(array) == 2:
return array[0]+array[1]
else:
# len >= 3 -> both halves guaranteed non-empty
mid = len(array) // 2
leftArray = array[:mid]
rightArray = array[mid:]
return min(minSum(leftArray),
minSum(rightArray),
leftArray[-1] + rightArray[0])
import random
lst = [random.randint(1, 10) for _ in range(20)]
r = minSum(lst)
print(lst)
print(r)
Random example output:
[1, 5, 6, 4, 1, 2, 2, 10, 7, 10, 8, 4, 9, 5, 7, 6, 5, 1, 4, 9]
3
However, a simple loop would be much better suited for the problem:
def minSum(array):
return min(array[i-1] + array[i] for i in range(1, len(array)))

Return elements of array in subset sum problem

For the following recursive solution to the subset sum problem (see this link), the following code returns true if there is any subset in a whose sum equals the value of sum
def isSubsetSum(set,n, sum) :
# Base Cases
if (sum == 0) :
return True
if (n == 0 and sum != 0) :
return False
# If last element is greater than sum, then ignore it
if (set[n - 1] > sum) :
return isSubsetSum(set, n - 1, sum);
# else, check if sum can be obtained by any of the following
# (a) including the last element
# (b) excluding the last element
return isSubsetSum(set, n-1, sum) or isSubsetSum(set, n-1, sum-set[n-1])
set = [2, 1, 14, 12, 15, 2]
sum = 9
n = len(set)
if (isSubsetSum(set, n, sum) == True) :
print("Found a subset with given sum")
else :
print("No subset with given sum")
How could I also return the indices of a which satisfy the sum?
Also, how can I make the function work for negative integers in array a.
This is one possibility:
def isSubsetSum(numbers, n, x, indices):
# Base Cases
if (x == 0):
return True
if (n == 0 and x != 0):
return False
# If last element is greater than x, then ignore it
if (numbers[n - 1] > x):
return isSubsetSum(numbers, n - 1, x, indices)
# else, check if x can be obtained by any of the following
# (a) including the last element
found = isSubsetSum(numbers, n - 1, x, indices)
if found: return True
# (b) excluding the last element
indices.insert(0, n - 1)
found = isSubsetSum(numbers, n - 1, x - numbers[n - 1], indices)
if not found: indices.pop(0)
return found
numbers = [2, 1, 4, 12, 15, 3]
x = 9
n = len(numbers)
indices = []
found = isSubsetSum(numbers, n, x, indices)
print(found)
# True
print(indices)
# [0, 2, 5]
EDIT: For a cleaner interface, you can wrap the previous function in another one that returns the list of indices on success and None otherwise:
def isSubsetSum(numbers, x):
indices = []
found = _isSubsetSum_rec(numbers, len(numbers), x, indices)
return indices if found else None
def _isSubsetSum_rec(numbers, n, x, indices):
# Previous function
Here's an approach: Instead of True and False we return the subarray if found, else None:
def isSubsetSum(sset,n, ssum) :
# Base Cases
if (ssum == 0) :
return []
# not found
if (n == 0 and ssum != 0) :
return None
# If last element is greater than sum, then ignore it
if (sset[n - 1] > ssum) :
return isSubsetSum(sset, n - 1, ssum);
# else, check if sum can be obtained by any of the following
# (a) including the last element
# (b) excluding the last element
a1 = isSubsetSum(sset, n-1, ssum)
# (b) excluding last element fails
if a1 is None:
a2 = isSubsetSum(sset, n-1, ssum-sset[n-1])
# (a) including last element fails
if a2 is None:
return None
# (a) including last element successes
else: return a2 + [sset[n-1]]
else:
return a1
sset = [2, 1, 4, 12, 15, 2]
ssum = 9
n = len(sset)
subset = isSubsetSum(sset, n, ssum)
if subset is not None :
print(subset)
else :
print("No subset with given sum")
# [2, 1, 4, 2]
The "easy" way is to make the index list another parameter to the function. Maintain it as you move down the call tree.
The "clean" way is to build it as you return back up the tree with a successful result. Instead of returning a Boolean, return a list of the indices; None indicates failure.
One important note: do not "shadow" a built-in type with a variable name; set is both dangerous as a variable name and misleading. Try coins, for example.
Base cases:
if (sum == 0) :
return [n]
if (n == 0 and sum != 0) :
return None
Recursion:
include = isSubsetSum(coins, n-1, sum-coins[n-1]):
if include:
return include.append(n-1)
Those are immediate additions for your existing code. I strongly recommend that you research this problem on line to see how others have solved it. You waste space and time by passing the entire list and a pointer as parameters; rather, you can start with the last element and work your way back, snipping the end off the list on each call. This will make your indexing problems much simpler.

Find the nth lucky number generated by a sieve in Python

I'm trying to make a program in Python which will generate the nth lucky number according to the lucky number sieve. I'm fairly new to Python so I don't know how to do all that much yet. So far I've figured out how to make a function which determines all lucky numbers below a specified number:
def lucky(number):
l = range(1, number + 1, 2)
i = 1
while i < len(l):
del l[l[i] - 1::l[i]]
i += 1
return l
Is there a way to modify this so that I can instead find the nth lucky number? I thought about increasing the specified number gradually until a list of the appropriate length to find the required lucky number was created, but that seems like a really inefficient way of doing it.
Edit: I came up with this, but is there a better way?
def lucky(number):
f = 2
n = number * f
while True:
l = range(1, n + 1, 2)
i = 1
while i < len(l):
del l[l[i] - 1::l[i]]
i += 1
if len(l) >= number:
return l[number - 1]
f += 1
n = number * f
I came up with this, but is there a better way?
Truth is, there will always be a better way, the remaining question being: is it good enough for your need?
One possible improvement would be to turn all this into a generator function. That way, you would only compute new values as they are consumed. I came up with this version, which I only validated up to about 60 terms:
import itertools
def _idx_after_removal(removed_indices, value):
for removed in removed_indices:
value -= value / removed
return value
def _should_be_excluded(removed_indices, value):
for j in range(len(removed_indices) - 1):
value_idx = _idx_after_removal(removed_indices[:j + 1], value)
if value_idx % removed_indices[j + 1] == 0:
return True
return False
def lucky():
yield 1
removed_indices = [2]
for i in itertools.count(3, 2):
if not _should_be_excluded(removed_indices, i):
yield i
removed_indices.append(i)
removed_indices = list(set(removed_indices))
removed_indices.sort()
If you want to extract for example the 100th term from this generator, you can use itertools nth recipe:
def nth(iterable, n, default=None):
"Returns the nth item or a default value"
return next(itertools.islice(iterable, n, None), default)
print nth(lucky(), 100)
I hope this works, and there's without any doubt more room for code improvement (but as stated previously, there's always room for improvement!).
With numpy arrays, you can make use of boolean indexing, which may help. For example:
>>> a = numpy.arange(10)
>>> print a
[0 1 2 3 4 5 6 7 8 9]
>>> print a[a > 3]
[4 5 6 7 8 9]
>>> mask = np.array([True, False, True, False, True, False, True, False, True, False])
>>> print a[mask]
[0 2 4 6 8]
Here is a lucky number function using numpy arrays:
import numpy as np
class Didnt_Findit(Exception):
pass
def lucky(n):
'''Return the nth lucky number.
n --> int
returns int
'''
# initial seed
lucky_numbers = [1]
# how many numbers do you need to get to n?
candidates = np.arange(1, n*100, 2)
# use numpy array boolean indexing
next_lucky = candidates[candidates > lucky_numbers[-1]][0]
# accumulate lucky numbers till you have n of them
while next_lucky < candidates[-1]:
lucky_numbers.append(next_lucky)
#print lucky_numbers
if len(lucky_numbers) == n:
return lucky_numbers[-1]
mask_start = next_lucky - 1
mask_step = next_lucky
mask = np.array([True] * len(candidates))
mask[mask_start::mask_step] = False
#print mask
candidates = candidates[mask]
next_lucky = candidates[ candidates > lucky_numbers[-1]][0]
raise Didnt_Findit('n = ', n)
>>> print lucky(10)
33
>>> print lucky(50)
261
>>> print lucky(500)
3975
Checked mine and #icecrime's output for 10, 50 and 500 - they matched.
Yours is much faster than mine and scales better with n.
n=input('enter n ')
a= list(xrange(1,n))
x=a[1]
for i in range(1,n):
del a[x-1::x]
x=a[i]
l=len(a)
if i==l-1:
break
print "lucky numbers till %d" % n
print a
lets do this with an example.lets print lucky numbers till 100
put n=100
firstly a=1,2,3,4,5....100
x=a[1]=2
del a[1::2] leaves
a=1,3,5,7....99
now l=50
and now x=3
then del a[2::3] leaving a =1,3,7,9,13,15,.....
and loop continues till i==l-1

Efficient way to find missing elements in an integer sequence

Suppose we have two items missing in a sequence of consecutive integers and the missing elements lie between the first and last elements. I did write a code that does accomplish the task. However, I wanted to make it efficient using less loops if possible. Any help will be appreciated. Also what about the condition when we have to find more missing items (say close to n/4) instead of 2. I think then my code should be efficient right because I am breaking out from the loop earlier?
def missing_elements(L,start,end,missing_num):
complete_list = range(start,end+1)
count = 0
input_index = 0
for item in complete_list:
if item != L[input_index]:
print item
count += 1
else :
input_index += 1
if count > missing_num:
break
def main():
L = [10,11,13,14,15,16,17,18,20]
start = 10
end = 20
missing_elements(L,start,end,2)
if __name__ == "__main__":
main()
If the input sequence is sorted, you could use sets here. Take the start and end values from the input list:
def missing_elements(L):
start, end = L[0], L[-1]
return sorted(set(range(start, end + 1)).difference(L))
This assumes Python 3; for Python 2, use xrange() to avoid building a list first.
The sorted() call is optional; without it a set() is returned of the missing values, with it you get a sorted list.
Demo:
>>> L = [10,11,13,14,15,16,17,18,20]
>>> missing_elements(L)
[12, 19]
Another approach is by detecting gaps between subsequent numbers; using an older itertools library sliding window recipe:
from itertools import islice, chain
def window(seq, n=2):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
it = iter(seq)
result = tuple(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
def missing_elements(L):
missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
return list(missing)
This is a pure O(n) operation, and if you know the number of missing items, you can make sure it only produces those and then stops:
def missing_elements(L, count):
missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
return list(islice(missing, 0, count))
This will handle larger gaps too; if you are missing 2 items at 11 and 12, it'll still work:
>>> missing_elements([10, 13, 14, 15], 2)
[11, 12]
and the above sample only had to iterate over [10, 13] to figure this out.
Assuming that L is a list of integers with no duplicates, you can infer that the part of the list between start and index is completely consecutive if and only if L[index] == L[start] + (index - start) and similarly with index and end is completely consecutive if and only if L[index] == L[end] - (end - index). This combined with splitting the list into two recursively gives a sublinear solution.
# python 3.3 and up, in older versions, replace "yield from" with yield loop
def missing_elements(L, start, end):
if end - start <= 1:
if L[end] - L[start] > 1:
yield from range(L[start] + 1, L[end])
return
index = start + (end - start) // 2
# is the lower half consecutive?
consecutive_low = L[index] == L[start] + (index - start)
if not consecutive_low:
yield from missing_elements(L, start, index)
# is the upper part consecutive?
consecutive_high = L[index] == L[end] - (end - index)
if not consecutive_high:
yield from missing_elements(L, index, end)
def main():
L = [10,11,13,14,15,16,17,18,20]
print(list(missing_elements(L,0,len(L)-1)))
L = range(10, 21)
print(list(missing_elements(L,0,len(L)-1)))
main()
missingItems = [x for x in complete_list if not x in L]
a=[1,2,3,7,5,11,20]
b=[]
def miss(a,b):
for x in range (a[0],a[-1]):
if x not in a:
b.append(x)
return b
print (miss(a,b))
ANS:[4, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19]
works for sorted,unsorted , with duplicates too
Using collections.Counter:
from collections import Counter
dic = Counter([10, 11, 13, 14, 15, 16, 17, 18, 20])
print([i for i in range(10, 20) if dic[i] == 0])
Output:
[12, 19]
arr = [1, 2, 5, 6, 10, 12]
diff = []
"""zip will return array of tuples (1, 2) (2, 5) (5, 6) (6, 10) (10, 12) """
for a, b in zip(arr , arr[1:]):
if a + 1 != b:
diff.extend(range(a+1, b))
print(diff)
[3, 4, 7, 8, 9, 11]
If the list is sorted we can lookup for any gap. Then generate a range object between current (+1) and next value (not inclusive) and extend it to the list of differences.
Using scipy lib:
import math
from scipy.optimize import fsolve
def mullist(a):
mul = 1
for i in a:
mul = mul*i
return mul
a = [1,2,3,4,5,6,9,10]
s = sum(a)
so = sum(range(1,11))
mulo = mullist(range(1,11))
mul = mullist(a)
over = mulo/mul
delta = so -s
# y = so - s -x
# xy = mulo/mul
def func(x):
return (so -s -x)*x-over
print int(round(fsolve(func, 0))), int(round(delta - fsolve(func, 0)))
Timing it:
$ python -mtimeit -s "$(cat with_scipy.py)"
7 8
100000000 loops, best of 3: 0.0181 usec per loop
Other option is:
>>> from sets import Set
>>> a = Set(range(1,11))
>>> b = Set([1,2,3,4,5,6,9,10])
>>> a-b
Set([8, 7])
And the timing is:
Set([8, 7])
100000000 loops, best of 3: 0.0178 usec per loop
My take was to use no loops and set operations:
def find_missing(in_list):
complete_set = set(range(in_list[0], in_list[-1] + 1))
return complete_set - set(in_list)
def main():
sample = [10, 11, 13, 14, 15, 16, 17, 18, 20]
print find_missing(sample)
if __name__ == "__main__":
main()
# => set([19, 12])
Simply walk the list and look for non-consecutive numbers:
prev = L[0]
for this in L[1:]:
if this > prev+1:
for item in range(prev+1, this): # this handles gaps of 1 or more
print item
prev = this
Here's a one-liner:
In [10]: l = [10,11,13,14,15,16,17,18,20]
In [11]: [i for i, (n1, n2) in enumerate(zip(l[:-1], l[1:])) if n1 + 1 != n2]
Out[11]: [1, 7]
I use the list, slicing to offset the copies by one, and use enumerate to get the indices of the missing item.
For long lists, this isn't great because it's not O(log(n)), but I think it should be pretty efficient versus using a set for small inputs. izip from itertools would probably make it quicker still.
>>> l = [10,11,13,14,15,16,17,18,20]
>>> [l[i]+1 for i, j in enumerate(l) if (l+[0])[i+1] - l[i] > 1]
[12, 19]
We found a missing value if the difference between two consecutive numbers is greater than 1:
>>> L = [10,11,13,14,15,16,17,18,20]
>>> [x + 1 for x, y in zip(L[:-1], L[1:]) if y - x > 1]
[12, 19]
Note: Python 3. In Python 2 use itertools.izip.
Improved version for more than one value missing in a row:
>>> import itertools as it
>>> L = [10,11,14,15,16,17,18,20] # 12, 13 and 19 missing
>>> [x + diff for x, y in zip(it.islice(L, None, len(L) - 1),
it.islice(L, 1, None))
for diff in range(1, y - x) if diff]
[12, 13, 19]
def missing_elements(inlist):
if len(inlist) <= 1:
return []
else:
if inlist[1]-inlist[0] > 1:
return [inlist[0]+1] + missing_elements([inlist[0]+1] + inlist[1:])
else:
return missing_elements(inlist[1:])
First we should sort the list and then we check for each element, except the last one, if the next value is in the list. Be carefull not to have duplicates in the list!
l.sort()
[l[i]+1 for i in range(len(l)-1) if l[i]+1 not in l]
I stumbled on this looking for a different kind of efficiency -- given a list of unique serial numbers, possibly very sparse, yield the next available serial number, without creating the entire set in memory. (Think of an inventory where items come and go frequently, but some are long-lived.)
def get_serial(string_ids, longtail=False):
int_list = map(int, string_ids)
int_list.sort()
n = len(int_list)
for i in range(0, n-1):
nextserial = int_list[i]+1
while nextserial < int_list[i+1]:
yield nextserial
nextserial+=1
while longtail:
nextserial+=1
yield nextserial
[...]
def main():
[...]
serialgenerator = get_serial(list1, longtail=True)
while somecondition:
newserial = next(serialgenerator)
(Input is a list of string representations of integers, yield is an integer, so not completely generic code. longtail provides extrapolation if we run out of range.)
There's also an answer to a similar question which suggests using a bitarray for efficiently handling a large sequence of integers.
Some versions of my code used functions from itertools but I ended up abandoning that approach.
A bit of mathematics and we get a simple solution. The below solution works for integers from m to n.
Works for both sorted and unsorted postive and negative numbers.
#numbers = [-1,-2,0,1,2,3,5]
numbers = [-2,0,1,2,5,-1,3]
sum_of_nums = 0
max = numbers[0]
min = numbers[0]
for i in numbers:
if i > max:
max = i
if i < min:
min = i
sum_of_nums += i
# Total : sum of numbers from m to n
total = ((max - min + 1) * (max + min)) / 2
# Subtract total with sum of numbers which will give the missing value
print total - sum_of_nums
With this code you can find any missing values in a sequence, except the last number. It in only required to input your data into excel file with column name "numbers".
import pandas as pd
import numpy as np
data = pd.read_excel("numbers.xlsx")
data_sort=data.sort_values('numbers',ascending=True)
index=list(range(len(data_sort)))
data_sort['index']=index
data_sort['index']=data_sort['index']+1
missing=[]
for i in range (len(data_sort)-1):
if data_sort['numbers'].iloc[i+1]-data_sort['numbers'].iloc[i]>1:
gap=data_sort['numbers'].iloc[i+1]-data_sort['numbers'].iloc[i]
numerator=1
for j in range (1,gap):
mis_value=data_sort['numbers'].iloc[i+1]-numerator
missing.append(mis_value)
numerator=numerator+1
print(np.sort(missing))

Categories

Resources