Efficient way to find missing elements in an integer sequence

Suppose two items are missing from a sequence of consecutive integers, and the missing elements lie between the first and last elements. I wrote code that accomplishes the task, but I'd like to make it more efficient, using fewer loops if possible. Any help will be appreciated. Also, what if we have to find more missing items (say, close to n/4) instead of 2? I think my code should be efficient then, because I break out of the loop early, right?
def missing_elements(L, start, end, missing_num):
    complete_list = range(start, end + 1)
    count = 0
    input_index = 0
    for item in complete_list:
        if item != L[input_index]:
            print(item)
            count += 1
            if count >= missing_num:  # stop as soon as all expected items are found
                break
        else:
            input_index += 1
def main():
    L = [10,11,13,14,15,16,17,18,20]
    start = 10
    end = 20
    missing_elements(L, start, end, 2)
if __name__ == "__main__":
    main()

If the input sequence is sorted, you could use sets here. Take the start and end values from the input list:
def missing_elements(L):
    start, end = L[0], L[-1]
    return sorted(set(range(start, end + 1)).difference(L))
This assumes Python 3; for Python 2, use xrange() to avoid building a list first.
The sorted() call is optional; without it you get a set of the missing values, with it you get a sorted list.
Demo:
>>> L = [10,11,13,14,15,16,17,18,20]
>>> missing_elements(L)
[12, 19]
Another approach is by detecting gaps between subsequent numbers; using an older itertools library sliding window recipe:
from itertools import islice, chain
def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result
def missing_elements(L):
    missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
    return list(missing)
This is a pure O(n) operation, and if you know the number of missing items, you can make sure it only produces those and then stops:
def missing_elements(L, count):
    missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
    return list(islice(missing, 0, count))
This will handle larger gaps too; if you are missing 2 items at 11 and 12, it'll still work:
>>> missing_elements([10, 13, 14, 15], 2)
[11, 12]
and the above sample only had to iterate over [10, 13] to figure this out.

Assuming that L is a sorted list of integers with no duplicates, the part of the list between start and index is completely consecutive if and only if L[index] == L[start] + (index - start), and similarly the part between index and end is completely consecutive if and only if L[index] == L[end] - (end - index). Combining this with recursively splitting the list in two skips every consecutive stretch, which gives a sublinear solution when only a few elements are missing (each gap costs roughly log n probes).
# Python 3.3 and up; in older versions, replace "yield from" with a for loop that yields each item
def missing_elements(L, start, end):
    if end - start <= 1:
        if L[end] - L[start] > 1:
            yield from range(L[start] + 1, L[end])
        return
    index = start + (end - start) // 2
    # is the lower half consecutive?
    consecutive_low = L[index] == L[start] + (index - start)
    if not consecutive_low:
        yield from missing_elements(L, start, index)
    # is the upper part consecutive?
    consecutive_high = L[index] == L[end] - (end - index)
    if not consecutive_high:
        yield from missing_elements(L, index, end)
def main():
    L = [10,11,13,14,15,16,17,18,20]
    print(list(missing_elements(L,0,len(L)-1)))
    L = range(10, 21)
    print(list(missing_elements(L,0,len(L)-1)))
main()
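The same consecutiveness test also lets a plain binary search find just the first missing value in O(log n) probes: in a sorted, duplicate-free list, L[i] - i equals L[0] up to the first gap and is larger after it. This is a minimal sketch under that assumption; first_missing is a hypothetical helper, not part of the answer above.

```python
def first_missing(L):
    """Return the smallest integer missing between L[0] and L[-1], or None.

    Assumes L is sorted ascending with no duplicates.
    """
    lo, hi = 0, len(L) - 1
    if not L or L[hi] - L[0] == hi:
        return None  # the whole list is consecutive
    # Invariant: L[lo] - lo == L[0] (no gap up to lo) and L[hi] - hi > L[0]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if L[mid] - mid == L[0]:
            lo = mid  # still consecutive at mid: the gap is to the right
        else:
            hi = mid  # a gap exists at or before mid
    return L[lo] + 1


print(first_missing([10, 11, 13, 14, 15, 16, 17, 18, 20]))  # 12
```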

missingItems = [x for x in complete_list if x not in L]

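a = [1,2,3,7,5,11,20]
def miss(a):
    b = []
    for x in range(min(a), max(a)):
        if x not in a:
            b.append(x)
    return b
print(miss(a))
Output: [4, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19]
Because it uses min() and max() rather than the first and last elements, this works for sorted and unsorted input, with or without duplicates. Each "x not in a" check scans the whole list, though, so it gets slow for long inputs.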

Using collections.Counter:
from collections import Counter
dic = Counter([10, 11, 13, 14, 15, 16, 17, 18, 20])
print([i for i in range(10, 20) if dic[i] == 0])
Output:
[12, 19]

arr = [1, 2, 5, 6, 10, 12]
diff = []
# zip pairs each element with the next one: (1, 2) (2, 5) (5, 6) (6, 10) (10, 12)
for a, b in zip(arr, arr[1:]):
    if a + 1 != b:
        diff.extend(range(a + 1, b))
print(diff)
Output: [3, 4, 7, 8, 9, 11]
If the list is sorted, we can look for gaps between neighbours. For each gap, generate a range object from the current value + 1 up to (but not including) the next value, and extend the list of differences with it.

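Using the scipy library (this approach assumes exactly two numbers are missing):
import math
from scipy.optimize import fsolve
def mullist(a):
    mul = 1
    for i in a:
        mul = mul*i
    return mul
a = [1,2,3,4,5,6,9,10]
s = sum(a)
so = sum(range(1,11))
mulo = mullist(range(1,11))
mul = mullist(a)
over = mulo/mul
delta = so - s
# x + y = so - s, so y = so - s - x
# x * y = mulo / mul
def func(x):
    return (so - s - x)*x - over
root = fsolve(func, 0)[0]  # fsolve returns an array; take its single element
print(int(round(root)), int(round(delta - root)))
Timing it:
$ python -mtimeit -s "$(cat with_scipy.py)"
7 8
100000000 loops, best of 3: 0.0181 usec per loop
Another option (Python 2 only; the sets module was removed in Python 3, where the built-in set() works the same way) is:
>>> from sets import Set
>>> a = Set(range(1,11))
>>> b = Set([1,2,3,4,5,6,9,10])
>>> a-b
Set([8, 7])
And the timing is:
Set([8, 7])
100000000 loops, best of 3: 0.0178 usec per loop
Note that the timing command passes the script only as setup code (-s), so timeit times its default statement, pass; these figures do not measure either approach.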
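An exact-integer alternative to fsolve (a swapped-in technique, not the author's method): with exactly two values x and y missing, the differences of the plain sums and of the sums of squares give x + y and x**2 + y**2, and (x - y)**2 = 2*(x**2 + y**2) - (x + y)**2. This sketch assumes Python 3.8+ for math.isqrt; two_missing is a hypothetical helper name.

```python
import math


def two_missing(nums, lo, hi):
    """Return the two values missing from lo..hi (inclusive), smaller first.

    Assumes nums holds every integer in lo..hi exactly once except two.
    """
    full = range(lo, hi + 1)
    s = sum(full) - sum(nums)                                # x + y
    q = sum(v * v for v in full) - sum(v * v for v in nums)  # x**2 + y**2
    d = math.isqrt(2 * q - s * s)                            # y - x, taking y > x
    x = (s - d) // 2
    return x, s - x


print(two_missing([1, 2, 3, 4, 5, 6, 9, 10], 1, 10))  # (7, 8)
```

Everything stays in Python integers, so there is no rounding step and no root-finding starting guess to choose.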

My take was to use set operations instead of explicit loops:
def find_missing(in_list):
    complete_set = set(range(in_list[0], in_list[-1] + 1))
    return complete_set - set(in_list)
def main():
    sample = [10, 11, 13, 14, 15, 16, 17, 18, 20]
    print(find_missing(sample))
if __name__ == "__main__":
    main()
# => a set containing 12 and 19 (the display order of a set is not guaranteed)

Simply walk the (sorted) list and look for non-consecutive numbers:
prev = L[0]
for this in L[1:]:
    if this > prev+1:
        for item in range(prev+1, this):  # this handles gaps of 1 or more
            print(item)
    prev = this

Here's a one-liner:
In [10]: l = [10,11,13,14,15,16,17,18,20]
In [11]: [i for i, (n1, n2) in enumerate(zip(l[:-1], l[1:])) if n1 + 1 != n2]
Out[11]: [1, 7]
I use slicing to offset two copies of the list by one, and enumerate to get the index of the element just before each gap (so the missing values start at l[i] + 1).
For long lists this is O(n) rather than O(log n), but I think it should be pretty efficient compared with using a set for small inputs. In Python 2, izip from itertools would probably make it quicker still.

>>> l = [10,11,13,14,15,16,17,18,20]
>>> [l[i]+1 for i, j in enumerate(l) if (l+[0])[i+1] - l[i] > 1]
[12, 19]

We found a missing value if the difference between two consecutive numbers is greater than 1:
>>> L = [10,11,13,14,15,16,17,18,20]
>>> [x + 1 for x, y in zip(L[:-1], L[1:]) if y - x > 1]
[12, 19]
Note: Python 3. In Python 2 use itertools.izip.
Improved version for more than one value missing in a row:
>>> import itertools as it
>>> L = [10,11,14,15,16,17,18,20] # 12, 13 and 19 missing
>>> [x + diff for x, y in zip(it.islice(L, None, len(L) - 1),
it.islice(L, 1, None))
for diff in range(1, y - x) if diff]
[12, 13, 19]

def missing_elements(inlist):
    if len(inlist) <= 1:
        return []
    else:
        if inlist[1]-inlist[0] > 1:
            return [inlist[0]+1] + missing_elements([inlist[0]+1] + inlist[1:])
        else:
            return missing_elements(inlist[1:])

First sort the list, then check for each element except the last one whether the next value is in the list. Be careful not to have duplicates in the list! Note that this reports only the first missing value of each gap.
l.sort()
[l[i]+1 for i in range(len(l)-1) if l[i]+1 not in l]

I stumbled on this looking for a different kind of efficiency -- given a list of unique serial numbers, possibly very sparse, yield the next available serial number, without creating the entire set in memory. (Think of an inventory where items come and go frequently, but some are long-lived.)
def get_serial(string_ids, longtail=False):
    int_list = sorted(map(int, string_ids))  # map() is lazy in Python 3, so sort into a list
    n = len(int_list)
    for i in range(0, n-1):
        nextserial = int_list[i]+1
        while nextserial < int_list[i+1]:
            yield nextserial
            nextserial += 1
    while longtail:
        nextserial += 1
        yield nextserial
[...]
def main():
    [...]
    serialgenerator = get_serial(list1, longtail=True)
    while somecondition:
        newserial = next(serialgenerator)
(Input is a list of string representations of integers, yield is an integer, so not completely generic code. longtail provides extrapolation if we run out of range.)
There's also an answer to a similar question which suggests using a bitarray for efficiently handling a large sequence of integers.
Some versions of my code used functions from itertools but I ended up abandoning that approach.

A bit of mathematics gives a simple solution. The solution below works when exactly one integer is missing from a range m to n.
It works for both sorted and unsorted input, with positive and negative numbers.
#numbers = [-1,-2,0,1,2,3,5]
numbers = [-2,0,1,2,5,-1,3]
sum_of_nums = 0
max_num = numbers[0]  # avoid shadowing the built-in max()
min_num = numbers[0]  # avoid shadowing the built-in min()
for i in numbers:
    if i > max_num:
        max_num = i
    if i < min_num:
        min_num = i
    sum_of_nums += i
# Total: sum of the numbers from m to n (arithmetic series)
total = ((max_num - min_num + 1) * (max_num + min_num)) // 2
# Subtracting the sum of the numbers present from the total gives the missing value
print(total - sum_of_nums)
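A variant of the same single-missing-value idea uses XOR instead of sums (a swapped-in technique, not the author's): XOR-ing every expected value with every present value cancels each present value, because a ^ a == 0, leaving only the missing one. This is a sketch assuming exactly one value is missing and there are no duplicates; one_missing_xor is a hypothetical name.

```python
from functools import reduce
from operator import xor


def one_missing_xor(numbers):
    """Return the single integer missing from min(numbers)..max(numbers)."""
    lo, hi = min(numbers), max(numbers)
    expected = reduce(xor, range(lo, hi + 1), 0)  # XOR of the complete range
    present = reduce(xor, numbers, 0)             # XOR of the values we have
    return expected ^ present                     # everything but the gap cancels


print(one_missing_xor([-2, 0, 1, 2, 5, -1, 3]))  # 4
```

Python integers behave like infinite two's-complement values under XOR, so negative numbers cancel just like positive ones.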

With this code you can find any missing values in a sequence, except a missing last number (the search stops at the largest value present). You only need to put your data in an Excel file with a column named "numbers".
import pandas as pd
import numpy as np
data = pd.read_excel("numbers.xlsx")
data_sort = data.sort_values('numbers', ascending=True)
index = list(range(len(data_sort)))
data_sort['index'] = index
data_sort['index'] = data_sort['index'] + 1
missing = []
for i in range(len(data_sort)-1):
    if data_sort['numbers'].iloc[i+1] - data_sort['numbers'].iloc[i] > 1:
        gap = data_sort['numbers'].iloc[i+1] - data_sort['numbers'].iloc[i]
        numerator = 1
        for j in range(1, gap):
            mis_value = data_sort['numbers'].iloc[i+1] - numerator
            missing.append(mis_value)
            numerator = numerator + 1
print(np.sort(missing))
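If the numbers are already in a NumPy array or a pandas column, the gap search does not need a Python loop. A minimal sketch, assuming integer values; with the DataFrame above you would pass data["numbers"].to_numpy(). np.setdiff1d returns the sorted, unique values of its first argument that are absent from the second, so input order and duplicates do not matter. missing_values is a hypothetical name.

```python
import numpy as np


def missing_values(numbers):
    """Return the integers between min(numbers) and max(numbers) that are absent."""
    arr = np.asarray(numbers)
    full = np.arange(arr.min(), arr.max() + 1)  # every integer in the range
    return np.setdiff1d(full, arr)              # sorted values of full not in arr


print(missing_values([10, 11, 13, 14, 15, 16, 17, 18, 20]))  # [12 19]
```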

Related

How do I check the list elements satisfy a given condition?

I have a list of numbers in which the sum of any two adjacent numbers is a perfect square.
The list is x=[1,8,28,21,4,32,17,19,30,6,3,13,12,24]
for i in range(len(x)-1):
    y = x[i]+x[i+1]
    z = y**0.5
    #till here found the square root of the sum of the adjacent numbers in list
    if(z.is_integer==True):
        pass  # code
I want to check the remaining numbers in the list too. If all the elements of the list satisfy the condition, then I want to print the list.
The expected output should be:
[1,8,28,21,4,32,17,19,30,6,3,13,12,24] satisfies the condition

Maybe something like this? Make a function that is called with the list and returns True if the list satisfies the condition and False if it doesn't.
def some_function(nums):
    for i in range(len(nums) - 1):
        y = nums[i] + nums[i + 1]
        z = y ** 0.5
        #till here found the square root of the sum of the adjacent numbers in list
        if not z.is_integer():
            # if any two adjacent numbers don't meet the condition, return False
            return False
    return True
You call it like this: meet_condition = some_function(x)
After that, just check whether it's True, and if it is, print the list and the appropriate text.

You can use this approach. I am sure there is a better way, with fewer lines of code. Have a blast!
numbers = [1,8,28,21,4,32,17,19,30,6,3,13,12,24]
for i in range(len(numbers) - 1):
    if ((numbers[i] + numbers[i+1]) ** 0.5) % 1 != 0:
        print("Does not satisfy condition")
        break
else:
    # runs only if the loop finished without break, i.e. every pair passed
    print(numbers)
Output:
[1, 8, 28, 21, 4, 32, 17, 19, 30, 6, 3, 13, 12, 24]

import math
x=[1,8,28,21,4,32,17,19,30,6,3,13,12,24]
b = [x[i] for i in range(0,len(x)-1) if math.sqrt(x[i]+x[i+1]).is_integer()]
if (b[-1] == x[-2]):
    b.append(x[-1])
print(b)
Output: [1, 8, 28, 21, 4, 32, 17, 19, 30, 6, 3, 13, 12, 24]
If b ends up equal to x, every adjacent pair sums to a perfect square.

I'd suggest you review the way you check the integer is a square:
y = 9
z = y ** 0.5
z #=> 3.0
z.is_integer==True #=> False
So 9 appears not to be a square.
is_integer must be called as is_integer(), without comparing it to True: z.is_integer without parentheses is the method object itself, which never equals True.
Please refer to this topic, for example: Check if a number is a perfect square
For a solution to the question, I propose splitting the problem:
Define a method that returns consecutive pairs from the list;
Define a method that checks if a number is a square;
Put it together using the all(iterable) function.
First the method for getting consecutive elements, in this case is a generator:
def each_cons(iterable, n = 2):
    if n < 2: n = 1
    i, size = 0, len(iterable)
    while i < size-n+1:
        yield iterable[i:i+n]
        i += 1
Given your list x=[1,8,28,21,4,32,17,19,30,6,3,13,12,24], you call it this way:
list(each_cons(x))
# => [[1, 8],[8, 28],[28, 21],[21, 4],[4, 32],[32, 17],[17, 19],[19, 30],[30, 6],[6, 3],[3, 13],[13, 12],[12, 24]]
Then a method for the square, which I've borrowed from another answer:
def is_square(apositiveint):
    x = apositiveint // 2 or 1  # 'or 1' avoids dividing by zero when apositiveint is 1
    seen = set([x])
    while x * x != apositiveint:
        x = (x + (apositiveint // x)) // 2
        if x in seen: return False
        seen.add(x)
    return True
Finally put all together:
all(is_square(a + b) for a, b in each_cons(x))
#=> True
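If you are on Python 3.8 or newer, math.isqrt gives an exact integer square root, so the square test needs neither floats nor a hand-written Newton loop, and zip(nums, nums[1:]) can stand in for each_cons. This is a sketch of the same split-the-problem approach; is_perfect_square and adjacent_sums_are_squares are hypothetical names.

```python
import math


def is_perfect_square(n):
    """Exact integer test; math.isqrt avoids float rounding on large n."""
    return n >= 0 and math.isqrt(n) ** 2 == n


def adjacent_sums_are_squares(nums):
    # zip(nums, nums[1:]) pairs each element with its right-hand neighbour
    return all(is_perfect_square(a + b) for a, b in zip(nums, nums[1:]))


x = [1, 8, 28, 21, 4, 32, 17, 19, 30, 6, 3, 13, 12, 24]
if adjacent_sums_are_squares(x):
    print(x, "satisfies the condition")
```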

With NumPy, something like this works (result is True only if every adjacent sum is a perfect square):
import numpy as np
import functools
nums = np.array([1,8,28,21,4,32,17,19,30,6,3,13,12,24])
result = functools.reduce(lambda a,b: a and b, map(lambda x : (x ** 0.5).is_integer(), nums[1:] + nums[:-1]))

Find two numbers from a list that add up to a specific number

This is super bad and messy; I am new to this, so please help me.
Basically, I was trying to find two numbers from a list that add up to a target number.
I have set up an example with lst = [2, 4, 6, 10] and a target value of target = 8. The answer in this example would be (2, 6) and (6, 2).
Below is my code but it is long and ugly and I am sure there is a better way of doing it. Can you please see how I can improve from my code below?
from itertools import product, permutations
numbers = [2, 4, 6, 10]
target_number = 8
two_nums = (list(permutations(numbers, 2)))
print(two_nums)
result1 = (two_nums[0][0] + two_nums[0][1])
result2 = (two_nums[1][0] + two_nums[1][1])
result3 = (two_nums[2][0] + two_nums[2][1])
result4 = (two_nums[3][0] + two_nums[3][1])
result5 = (two_nums[4][0] + two_nums[4][1])
result6 = (two_nums[5][0] + two_nums[5][1])
result7 = (two_nums[6][0] + two_nums[6][1])
result8 = (two_nums[7][0] + two_nums[7][1])
result9 = (two_nums[8][0] + two_nums[8][1])
result10 = (two_nums[9][0] + two_nums[9][1])
my_list = (result1, result2, result3, result4, result5, result6, result7, result8, result9, result10)
print (my_list)
for i in my_list:
    if i == 8:
        print ("Here it is:" + str(i))

For every number in the list, you can look for its complement (the number that, added to it, gives the required target sum). If it exists, print the pair and exit; otherwise, move on.
This would look like the following:
numbers = [2, 4, 6, 10]
target_number = 8
for i, number in enumerate(numbers[:-1]):  # note 1
    complementary = target_number - number
    if complementary in numbers[i+1:]:  # note 2
        print("Solution Found: {} and {}".format(number, complementary))
        break
else:  # note 3
    print("No solutions exist")
which produces:
Solution Found: 2 and 6
Notes:
You do not have to check the last number; if there was a pair you would have already found it by then.
Notice that the membership check (which is quite costly for lists) only considers the slice numbers[i+1:]; the previous numbers have already been checked. A positive side effect of the slicing is that a single 4 in the list does not pair with itself for a target value of 8.
This is an excellent setup to explain the misunderstood and often confusing use of else in for loops: the else block runs only if the loop was not ended early by a break.
If a 4 + 4 solution is acceptable to you even when there is a single 4 in the list, you can modify the code as follows:
numbers = [2, 4, 6, 10]
target_number = 8
for i, number in enumerate(numbers):
    complementary = target_number - number
    if complementary in numbers[i:]:
        print("Solution Found: {} and {}".format(number, complementary))
        break
else:
    print("No solutions exist")
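The list-slice membership check above is itself O(n), so the loop is O(n^2) overall. A common alternative remembers the values seen so far in a set, which makes each lookup O(1) on average. This is a sketch; find_pair is a hypothetical name, and it returns the first pair found (earlier value first) or None.

```python
def find_pair(numbers, target):
    """Return the first (earlier, later) pair summing to target, or None."""
    seen = set()
    for number in numbers:
        complement = target - number
        if complement in seen:   # average O(1) membership test
            return complement, number
        seen.add(number)         # only earlier elements can be partners
    return None


print(find_pair([2, 4, 6, 10], 8))  # (2, 6)
```

Because a value is added to seen only after it has been checked, a single 4 never pairs with itself for a target of 8, matching the first version above.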

A list comprehension will work well here. Try this:
from itertools import permutations
numbers = [2, 4, 6, 10]
target_number = 8
solutions = [pair for pair in permutations(numbers, 2) if sum(pair) == target_number]
print('Solutions:', solutions)
Basically, this list comprehension looks at all the pairs that permutations(numbers, 2) returns, but only keeps the ones whose sum equals target_number.

The simplest general way to do this is to iterate over your list and, for each item, iterate over the rest of the list to see if it adds up to the target value. The downside is that this is an O(n^2) operation. I don't know off the top of my head if there is a more efficient solution. It should look something like the following:
done = False
for i, val in enumerate(numbers):
    if val >= target_number:  # skip values that cannot pair (assumes all numbers are positive)
        continue
    # enumerate's second argument only sets the starting count, so slice the list first
    for j, val2 in enumerate(numbers[i+1:], i+1):
        if val + val2 == target_number:
            print("Here it is: " + str(i) + "," + str(j))
            done = True
            break
    if done:
        break
Of course, you should make this a function that returns your result instead of just printing it. That would remove the need for the "done" variable.
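One more efficient option is to sort first: sorting costs O(n log n), and a single walk with one index moving up from the left and one moving down from the right then finds the pairs in O(n). This is a sketch of that two-pointer technique (not part of the answer above); pairs_two_pointer is a hypothetical name, and it reports value pairs rather than original indices.

```python
def pairs_two_pointer(numbers, target):
    """Return value pairs (a, b), a <= b, that sum to target.

    Each position of the sorted list is used in at most one pair.
    """
    values = sorted(numbers)
    i, j = 0, len(values) - 1
    pairs = []
    while i < j:
        total = values[i] + values[j]
        if total == target:
            pairs.append((values[i], values[j]))
            i += 1
            j -= 1
        elif total < target:
            i += 1  # sum too small: move the left index to a larger value
        else:
            j -= 1  # sum too large: move the right index to a smaller value
    return pairs


print(pairs_two_pointer([2, 4, 6, 10], 8))  # [(2, 6)]
```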

If you are trying to find all pairs in a long list that has duplicate values, I would recommend using frozenset. The accepted answer only gets the first pair and then stops.
import numpy as np
numbers = np.random.randint(0, 100, 1000)
target = 17
def adds_to_target(base_list, target):
    return_list = []
    for i in range(len(base_list)):
        return_list.extend([list((base_list[i], b)) for b in base_list if (base_list[i] + b) == target])
    return set(map(frozenset, return_list))
# sample output
{frozenset({7, 10}),
frozenset({4, 13}),
frozenset({8, 9}),
frozenset({5, 12}),
frozenset({2, 15}),
frozenset({3, 14}),
frozenset({0, 17}),
frozenset({1, 16}),
frozenset({6, 11})}
1) In the for loop, lists containing two integers that sum to the target value are added to "return_list", i.e. a list of lists is created.
2) Converting each pair to a frozenset and collecting them in a set then removes duplicate pairs, including reversed ones such as (7, 10) and (10, 7).
%timeit adds_to_target(numbers, target)
# 312 ms ± 8.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

You can do it in one line with a list comprehension, like below:
from itertools import permutations
numbers = [2, 4, 6, 10]
target_number = 8
two_nums = (list(permutations(numbers, 2)))
result=[i for i in two_nums if i[0]+i[1] == target_number]
[(2,6) , (6,2)]

If you want a way to do this efficiently without itertools (each occurrence of a value is used in at most one pair):
numbers = [1,3,4,5,6,2,3,4,1]
target = 5
number_dict = {}  # value -> count of seen occurrences not yet used in a pair
pairs = []
for num in numbers:
    complement = target - num
    if number_dict.get(complement, 0) > 0:
        pairs.append((num, complement))
        number_dict[complement] -= 1  # that occurrence is now used up
    else:
        number_dict[num] = number_dict.get(num, 0) + 1

It's this simple:
def func(array, target):
    flag = 0
    for x in array:
        for y in array:
            if (target-x) == y and x != y:
                print(x, y)
                flag = 1
                break
        if flag == 1:
            break

list_ = [1,2,4,6,8]
num = 10
for number in list_:
    num_add = number
    for number_ in list_:
        if number_ + num_add == num and number_ != num_add:
            print(number_, num_add)

n is the desired sum and L is the list. The outer loop picks each element, and the inner loop runs from that element to the end of the list. If L[i] and L[j] add up to n and L[i] != L[j], the pair is printed.
numbers=[1,2,3,4,9,8,5,10,20,30,6]
def two_no_summer(n, L):
    for i in range(0, len(L)):
        for j in range(i, len(L)):
            if L[i] + L[j] == n and L[i] != L[j]:
                print(L[i], L[j])

Finding if the next element is smaller than the one before it and deleting it from the list python

I am having trouble with my code, I am writing a method that will check if the next element is smaller than the previous element and if it is, it will delete it.
Example:
Input: [1, 20, 10, 30]
Desired output: [1,20,30]
Actual output: [30]
def findSmaller(s):
i = -1
y = []
while i <= len(s):
for j in range(len(s)):
if s[i+1] <= s[i]:
del s[i + 1]
y.append(s[i])
i += 1
return y

If you are uncertain about how your loops work, I recommend adding some print statements. That way you can see what your loop is actually doing; this is especially useful in more complicated problems.
Something like this would solve your problem:
a = [1,2,3,2,4]
k = 0
# len(a) is re-checked on every pass because items are deleted inside the loop
while k < len(a) - 1:
    # print(k, a)
    if a[k] > a[k+1]:
        del a[k+1]  # delete the next element if it is smaller, then compare again
    else:
        k += 1

>>> s = [5, 20, 10, 15, 30]
>>> max_so_far = s[0]
>>> result = []
>>> for x in s:
...     if x >= max_so_far:
...         result.append(x)
...         max_so_far = x
...
>>> result
[5, 20, 30]
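The same running-maximum filter can be written without an explicit loop using itertools.accumulate with max, which yields the maximum so far including the current item. A value is at least as large as everything before it exactly when it equals that running maximum. This is a sketch; keep_running_max is a hypothetical name.

```python
from itertools import accumulate


def keep_running_max(values):
    """Keep each value that is >= every value before it."""
    # accumulate(values, max) yields values[0], max(values[0], values[1]), ...
    return [x for x, best in zip(values, accumulate(values, max)) if x == best]


print(keep_running_max([5, 20, 10, 15, 30]))  # [5, 20, 30]
```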

Depending on whether you need to do some calculation with the list later, you can use a generator:
s = [1, 20, 10, 30]
def find_smaller_generator(l: list):
    last_item = None
    for item in l:
        if last_item is None or item >= last_item:
            last_item = item
            yield item
def find_smaller_list(l: list):
    return list(find_smaller_generator(l))
print(find_smaller_list(s))
for i in find_smaller_generator(s):
    print(i)
print([i**2 for i in find_smaller_generator(s)])
this returns:
[1, 20, 30]
1
20
30
[1, 400, 900]

You can try something like this
def findSmaller(s):
    # sets p (previous) as the first value in s
    p = s[0]
    # initializes y as a list whose first value is p
    y = [p]
    # iterates over s, starting with the second element
    for i in s[1::]:
        # checks if i is greater than or equal to the previous element
        if i >= p:
            # if it is, i is appended to the list y
            y.append(i)
            # also set the previous value to i, so the next iteration can check against p
            p = i
    #returns the list
    return y
What this does is iterate over s and check whether the current item in the list is greater than or equal to the previous kept element. If it is, it is appended to y, and y is returned at the end.

Finding median of list in Python

How do you find the median of a list in Python? The list can be of any size and the numbers are not guaranteed to be in any particular order.
If the list contains an even number of elements, the function should return the average of the middle two.
Here are some examples (sorted for display purposes):
median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2

Python 3.4 has statistics.median:
Return the median (middle value) of numeric data.
When the number of data points is odd, return the middle data point.
When the number of data points is even, the median is interpolated by taking the average of the two middle values:
>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0
Usage:
import statistics
items = [6, 1, 8, 2, 3]
statistics.median(items)
#>>> 3
It's pretty careful with types, too:
statistics.median(map(float, items))
#>>> 3.0
from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')
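The statistics module also has median_low and median_high, which, for an even number of data points, return one of the two middle values instead of their average. That is useful when the median has to be an actual element of the data. A short illustration:

```python
import statistics

items = [6, 1, 8, 2]  # even length: the two middle values are 2 and 6

print(statistics.median(items))       # 4.0, the average of 2 and 6
print(statistics.median_low(items))   # 2, the smaller middle value
print(statistics.median_high(items))  # 6, the larger middle value
```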

(Works with python-2.x):
def median(lst):
    n = len(lst)
    s = sorted(lst)
    return (s[n//2-1]/2.0+s[n//2]/2.0, s[n//2])[n % 2] if n else None
>>> median([-5, -5, -3, -4, 0, -1])
-3.5
numpy.median():
>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0
For python-3.x, use statistics.median:
>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0

The sorted() function is very helpful for this. Use the sorted function
to order the list, then simply return the middle value (or average the two middle
values if the list contains an even number of elements).
def median(lst):
    sortedLst = sorted(lst)
    lstLen = len(lst)
    index = (lstLen - 1) // 2
    if (lstLen % 2):
        return sortedLst[index]
    else:
        return (sortedLst[index] + sortedLst[index + 1])/2.0

Of course you can use built-in functions, but if you would like to write your own, you can do something like this. The trick here is the ~ operator, which turns a non-negative number n into -(n + 1); for instance, ~2 is -3. Negative indices count from the end of a Python list, so if mid == 2, data[mid] is the third element from the beginning and data[~mid] is the third element from the end. For an odd-length list both refer to the same middle element; for an even-length list they are the two middle elements. Note that data.sort() sorts the caller's list in place.
def median(data):
    data.sort()
    mid = len(data) // 2
    return (data[mid] + data[~mid]) / 2

Here's a cleaner solution:
def median(lst):
    quotient, remainder = divmod(len(lst), 2)
    if remainder:
        return sorted(lst)[quotient]
    return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.
Note: Answer changed to incorporate suggestion in comments.

You can try the quickselect algorithm if faster average-case running times are needed. Quickselect has average (and best) case performance O(n), although it can end up O(n²) on a bad day.
Here's an implementation with a randomly chosen pivot:
import random
def select_nth(n, items):
    pivot = random.choice(items)
    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)
    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal
    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)
You can trivially turn this into a method to find medians:
def median(items):
    if len(items) % 2:
        return select_nth(len(items)//2, items)
    else:
        left = select_nth((len(items)-1) // 2, items)
        right = select_nth((len(items)+1) // 2, items)
        return (left + right) / 2
This is very unoptimised, but it's not likely that even an optimised version will outperform Tim Sort (CPython's built-in sort) because that's really fast. I've tried before and I lost.

You can use list.sort to sort the list in place and avoid creating a new list with sorted (note that this modifies the caller's list).
Also, you should not use list as a variable name, as it shadows Python's built-in list.
def median(l):
    half = len(l) // 2
    l.sort()
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]

def median(x):
    x = sorted(x)
    listlength = len(x)
    num = listlength//2
    if listlength%2==0:
        middlenum = (x[num]+x[num-1])/2
    else:
        middlenum = x[num]
    return middlenum

def median(array):
    """Calculate median of the given list.
    """
    # TODO: use statistics.median in Python 3
    array = sorted(array)
    half, odd = divmod(len(array), 2)
    if odd:
        return array[half]
    return (array[half - 1] + array[half]) / 2.0

A simple function to return the median of the given list:
def median(lst):
    lst = sorted(lst)  # Sort the list first
    if len(lst) % 2 == 0:  # Checking if the length is even
        # Applying formula which is sum of middle two divided by 2
        return (lst[len(lst) // 2] + lst[(len(lst) - 1) // 2]) / 2
    else:
        # If length is odd then get middle value
        return lst[len(lst) // 2]
Some examples with the median function:
>>> median([9, 12, 20, 21, 34, 80]) # Even
20.5
>>> median([9, 12, 80, 21, 34]) # Odd
21
If you want to use a library, you can simply do:
>>> import statistics
>>> statistics.median([9, 12, 20, 21, 34, 80]) # Even
20.5
>>> statistics.median([9, 12, 80, 21, 34]) # Odd
21

I posted my solution at Python implementation of "median of medians" algorithm , which is a little bit faster than using sort(). My solution uses 15 numbers per column, for a speed ~5N which is faster than the speed ~10N of using 5 numbers per column. The optimal speed is ~4N, but I could be wrong about it.
Per Tom's request in his comment, I added my code here, for reference. I believe the critical part for speed is using 15 numbers per column, instead of 5.
#!/bin/pypy
#
# TH #stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random
items_per_column = 15
def find_i_th_smallest( A, i ):
    t = len(A)
    if(t <= items_per_column):
        # if A is a small list with less than items_per_column items, then:
        #
        # 1. do sort on A
        # 2. find i-th smallest item of A
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of k items each. k is odd, say 5.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #    (// keeps the index an int in Python 3)
        B = [ find_i_th_smallest(k, (len(k) - 1)//2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]
        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1)//2)
        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which above set has A's i-th smallest, recursively.
        #
        P1 = [ j for j in A if j < M ]
        if(i < len(P1)):
            return find_i_th_smallest( P1, i)
        P3 = [ j for j in A if j > M ]
        L3 = len(P3)
        if(i < (t - L3)):
            return M
        return find_i_th_smallest( P3, i - (t - L3))
# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])
# create a list of random positive integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]
# Show the original list
#
# print(L)
# This is for validation
#
# print(sorted(L)[(len(L) - 1) // 2])
# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print(find_i_th_smallest( L, (len(L) - 1) // 2))

In case you need additional information on the distribution of your list, the percentile method will probably be useful. And a median value corresponds to the 50th percentile of a list:
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9])
median_value = np.percentile(a, 50) # return 50th percentile
print(median_value)

Here's what I came up with during this exercise on Codecademy:
def median(data):
    new_list = sorted(data)
    if len(new_list)%2 > 0:
        return new_list[len(new_list)//2]
    elif len(new_list)%2 == 0:
        return (new_list[len(new_list)//2] + new_list[len(new_list)//2 - 1]) / 2.0
print(median([1,2,3,4,5,9]))

Just two lines are enough.
def get_median(arr):
    '''
    Calculate the median of a sequence.
    :param arr: list
    :return: int or float
    '''
    arr = sorted(arr)
    return arr[len(arr)//2] if len(arr) % 2 else (arr[len(arr)//2] + arr[len(arr)//2-1])/2

Median function:
def median(midlist):
    midlist.sort()
    lens = len(midlist)
    if lens % 2 != 0:
        midl = lens // 2
        res = midlist[midl]
    else:
        odd = (lens // 2) - 1
        ev = lens // 2
        res = float(midlist[odd] + midlist[ev]) / float(2)
    return res

I had some problems with lists of float values. I ended up using a code snippet from Python 3's statistics.median, and it works perfectly with float values without any imports.
def calculateMedian(values):
    data = sorted(values)
    n = len(data)
    if n == 0:
        return None
    if n % 2 == 1:
        return data[n // 2]
    else:
        i = n // 2
        return (data[i - 1] + data[i]) / 2

def midme(list1):
    list1.sort()
    if len(list1)%2>0:
        x = list1[int((len(list1)/2))]
    else:
        x = ((list1[int((len(list1)/2))-1])+(list1[int(((len(list1)/2)))]))/2
    return x
midme([4,5,1,7,2])

def median(array):
    if len(array) < 1:
        return None
    array = sorted(array)  # the middle positions only mean something after sorting
    if len(array) % 2 == 0:
        median = array[len(array)//2-1: len(array)//2+1]
        return sum(median) / len(median)
    else:
        return array[len(array)//2]

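I defined a median function for a list of numbers as:
def median(numbers):
    s = sorted(numbers)
    # for odd n both indices are the middle one; for even n they are the two middle ones
    # (integer division avoids round(), which rounds halves to even in Python 3)
    return (s[len(s) // 2] + s[(len(s) - 1) // 2]) / 2.0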

import numpy as np
def get_median(xs):
    mid = len(xs) // 2  # Take the mid of the list
    if len(xs) % 2 == 1:  # check if the len of list is odd
        return sorted(xs)[mid]  # if true then mid will be median after sorting
    else:
        #return 0.5 * sum(sorted(xs)[mid - 1:mid + 1])
        return 0.5 * np.sum(sorted(xs)[mid - 1:mid + 1])  # if false take the avg of mid
print(get_median([7, 7, 3, 1, 4, 5]))
print(get_median([1,2,3, 4,5]))

A more generalized approach for median (and percentiles) would be:
def get_percentile(data, percentile):
    # Get the number of observations
    cnt = len(data)
    # Sort the list
    data = sorted(data)
    # Determine the split point
    i = (cnt - 1) * percentile
    # Fractional part of the split point (its distance above the floor)
    diff = i - int(i)
    # At the top of the range there is no value above the split point
    if int(i) + 1 >= cnt:
        return data[int(i)]
    # Return the weighted average of the value above and below the split point
    return data[int(i)] * (1 - diff) + data[int(i) + 1] * diff
# Data
data = [1,2,3,4,5]
# For the median
print(get_percentile(data=data, percentile=.50))
# > 3.0
print(get_percentile(data=data, percentile=.75))
# > 4.0
# Note the weighted average when the split point is not a whole number
print(get_percentile(data=data, percentile=.51))
# > approximately 3.04

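Try this:
import math
def find_median(arr):
    arr = sorted(arr)
    if len(arr) % 2 == 1:
        med = math.ceil(len(arr) / 2) - 1  # index of the middle element
        return arr[med]
    else:
        # even length: average the two middle elements
        mid = len(arr) // 2
        return (arr[mid - 1] + arr[mid]) / 2
print(find_median([1,2,3,4,5,6,7,8]))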

Implement it:
def median(numbers):
    """
    Calculate median of a list numbers.
    :param numbers: the numbers to be calculated.
    :return: median value of numbers.
    >>> median([1, 3, 3, 6, 7, 8, 9])
    6
    >>> median([1, 2, 3, 4, 5, 6, 8, 9])
    4.5
    >>> import statistics
    >>> import random
    >>> numbers = random.sample(range(-50, 50), k=100)
    >>> statistics.median(numbers) == median(numbers)
    True
    """
    numbers = sorted(numbers)
    mid_index = len(numbers) // 2
    return (
        # the parity of the length (not of mid_index) decides odd vs even
        (numbers[mid_index] + numbers[mid_index - 1]) / 2 if len(numbers) % 2 == 0
        else numbers[mid_index]
    )
if __name__ == "__main__":
    from doctest import testmod
    testmod()
source from
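For comparison, the standard library's statistics module (already used in the doctest above) performs the same computation directly, and also offers median_low and median_high for when the result must be an element of the data. A short usage sketch:

```python
import statistics

odd = [1, 3, 3, 6, 7, 8, 9]
even = [1, 2, 3, 4, 5, 6, 8, 9]

print(statistics.median(odd))        # 6
print(statistics.median(even))       # 4.5 (mean of the two middle values)
print(statistics.median_low(even))   # 4   (lower middle value)
print(statistics.median_high(even))  # 5   (upper middle value)
```

The input does not need to be sorted first; these functions sort a copy internally.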

Function median:
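import numpy as np
def median(d):
    d = np.sort(d)
    n2 = len(d) // 2
    if len(d) % 2 == 1:
        med = d[n2]  # odd length: the single middle element
    else:
        med = (d[n2 - 1] + d[n2]) / 2  # even length: mean of the two middle elements
    return med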

Simply create a median function that takes a list of numbers as its argument, and call it.
def median(l):
    l = sorted(l)
    lent = len(l)
    if (lent % 2) == 0:
        m = int(lent / 2)
        # even length: average the two middle values
        result = (l[m - 1] + l[m]) / 2
    else:
        m = int(float(lent / 2) - 0.5)
        result = l[m]
    return result

What I did was this:
def median(a):
    a = sorted(a)
    if len(a) % 2 == 1:
        return a[len(a) // 2]
    else:
        return (a[len(a) // 2] + a[(len(a) // 2) - 1]) / 2
Explanation: Basically, if the number of items in the list is odd, return the middle item. If it is even, len(a) // 2 is the index of the upper of the two middle items, and because the list is sorted, the item just before it is the lower one; add the two and divide by 2 to get the median.
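The index arithmetic in that explanation can be checked on its own. A tiny sketch with a hypothetical helper, middle_indices, that returns the position(s) the median is read from in a sorted list of length n:

```python
def middle_indices(n):
    """Hypothetical helper: index (or indices) of the middle of a sorted list of length n >= 1."""
    if n % 2 == 1:
        return (n // 2,)          # odd: n // 2 rounds down onto the single middle item
    return (n // 2 - 1, n // 2)   # even: n // 2 is the upper middle, the one before is the lower

print(middle_indices(5))  # (2,)
print(middle_indices(6))  # (2, 3)
```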

Here's the tedious way to find the median without using the median function:
def median(*arg):
    arg = order(arg)  # sorted copy of the arguments
    numArg = len(arg)
    half = int(numArg / 2)
    if numArg % 2 == 0:
        print((arg[half - 1] + arg[half]) / 2)
    else:
        print(arg[half])
def order(tup):
    ordered = [tup[i] for i in range(len(tup))]
    while test(ordered):  # keep making passes until nothing was swapped
        pass
    return ordered
def test(ordered):
    # one bubble-sort pass, swapping neighbours that are out of order
    whileloop = 0
    for i in range(len(ordered) - 1):
        if ordered[i] > ordered[i + 1]:
            original = ordered[i + 1]
            ordered[i + 1] = ordered[i]
            ordered[i] = original
            whileloop = 1  # run the loop again if you had to switch values
    return whileloop

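It is very simple:
def median(alist):
    # to find the median you have to sort the list first
    sList = sorted(alist)
    first = 0
    last = len(sList) - 1
    midpoint = (first + last) // 2
    if len(sList) % 2 == 1:
        return sList[midpoint]
    # even length: midpoint is the lower middle index, so average it with the next item
    return (sList[midpoint] + sList[midpoint + 1]) / 2
And you can use the return value like this: m = median(anyList) (assigning to a different name keeps the median function usable).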

Choose at random from combinations

I can make a list of all combinations using list(itertools.combinations(range(n), m)), but this will typically be very large.
Given n and m, how can I choose a combination uniformly at random without first constructing a massive list?

The itertools documentation includes a recipe for returning a random combination from an iterable. Below are two versions of the code, one for Python 2.x and one for Python 3.x. Neither version builds the list of all combinations: only a tuple of the n input items and the r sampled indices are held in memory.
Assumes Python 2.x
def random_combination(iterable, r):
    "Random selection from itertools.combinations(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(xrange(n), r))
    return tuple(pool[i] for i in indices)
In your case then it would be simple to do:
>>> import random
>>> def random_combination(iterable, r):
...     "Random selection from itertools.combinations(iterable, r)"
...     pool = tuple(iterable)
...     n = len(pool)
...     indices = sorted(random.sample(xrange(n), r))
...     return tuple(pool[i] for i in indices)
...
>>> n = 10
>>> m = 3
>>> print(random_combination(range(n), m))
(3, 5, 9) # Returns a random tuple with length 3 from the iterable range(10)
In the case of Python 3.x
For Python 3.x, replace the xrange call with range; the use case is still the same.
def random_combination(iterable, r):
    "Random selection from itertools.combinations(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(range(n), r))
    return tuple(pool[i] for i in indices)

From http://docs.python.org/2/library/itertools.html#recipes
def random_combination(iterable, r):
    "Random selection from itertools.combinations(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(xrange(n), r))
    return tuple(pool[i] for i in indices)

A generator would be more memory efficient for iteration:
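import random
def random_combination(iterable, r):
    i = 0
    pool = tuple(iterable)
    n = len(pool)
    rng = range(n)
    # note: the loop count reuses r, so this yields r combinations of length r
    while i < r:
        i += 1
        # sort the sampled positions so each result is in itertools.combinations order
        yield [pool[j] for j in sorted(random.sample(rng, r))]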

I've modified Jthorpe's generator so that it works when the number of combinations you need is greater than the length of the iterable you pass:
import random
def random_combination(iterable, r):
    i = 0
    pool = tuple(iterable)
    n = len(pool)
    rng = range(n)
    # yields about n**2/2 combinations, independent of r
    while i < n**2 / 2:
        i += 1
        yield [pool[j] for j in sorted(random.sample(rng, r))]
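Both generators above reuse a size-related value (r, or n**2/2) as the number of combinations to yield. A sketch that takes the count as its own parameter instead; random_combinations and count are hypothetical names, and because each draw is independent, the same combination can appear more than once:

```python
import random

def random_combinations(iterable, r, count):
    """Yield `count` independent, uniformly random r-combinations of `iterable`."""
    pool = tuple(iterable)
    n = len(pool)
    for _ in range(count):
        # r distinct positions, sorted so each result follows itertools.combinations order
        indices = sorted(random.sample(range(n), r))
        yield tuple(pool[i] for i in indices)

for combo in random_combinations(range(10), 3, count=5):
    print(combo)
```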

And so, this code may help you:
I created an empty list, lista, to append the results of np.random.permutation.
i is the counter that controls the while loop.
r is the number of times to loop (in my case 1 million).
Inside the while loop I take a permutation of 20 numbers and get (output) a NumPy array holding 0 to 19 in shuffled order, like the ones in the results below.
Observation: there were no repeated permutations (repeats are possible in principle, but with 20! possible orderings they are extremely unlikely).
Take this output is very
import numpy as np
lista = []
i = 0
r = 1000000
while i < r:
    lista.append(np.random.permutation(20))
    i += 1
print(lista)
# thanks to my friend WIniston for asking about it
Results:
[array([11, 15, 12, 18, 5, 0, 9, 8, 14, 13, 19, 10, 7, 16, 3, 1, 17,4, 6, 2]),
array([ 5, 15, 12, 4, 17, 16, 14, 7, 19, 1, 2, 10, 3, 0, 18, 6, 9, 11, 13, 8]),
array([16, 5, 12, 19, 18, 17, 7, 1, 10, 4, 11, 3, 0, 14, 15, 9, 6, 2, 13, 8]), ...

The problem with the existing answers is that they sample with replacement or wastefully discard duplicate samples. This is fine if you only want to draw one sample, but not for drawing many samples (like I wanted to do!).
You can uniformly sample from the space of all non-empty subsets (combinations of every size) very efficiently by making use of Python's built-in methods.
from random import sample

def sample_from_all_combinations(all_items, num_samples = 10):
    """Randomly sample from the combination space"""
    num_items = len(all_items)
    num_combinations = 2 ** num_items
    # range(1, num_combinations) skips the empty subset, so it has num_combinations - 1 members
    num_samples = min(num_combinations - 1, num_samples)
    samples = sample(range(1, num_combinations), k=num_samples)
    for combination_num in samples:
        # each bit of combination_num says whether the matching item is included
        items_subset = [
            all_items[item_idx]
            for item_idx, item_sampled in enumerate(
                format(combination_num, f"0{num_items}b")
            )
            if item_sampled == "1"
        ]
        yield items_subset
NOTE: the above will fail with an OverflowError when len(all_items) >= 64, because range(1, 2 ** 64) has more elements than fit in a C ssize_t, so random.sample cannot take its length. You can solve this by subclassing random.Random and modifying its sample method as follows (then use sample = RandomLarge().sample in place of the from random import sample line):
import random

class RandomLarge(random.Random):
    """
    Clone of random inbuilt methods but modified to work with large numbers
    >= 2^64
    """

    # NOTE: this relies on private names of the random module (_Set, _Sequence,
    # _ceil, _log and the _randbelow method); they are implementation details
    # and may differ between Python versions.
    def sample(self, population, k):
        """
        Chooses k unique random elements from a population sequence or set.
        Modified random sample method to work with large ranges (>= 2^64)
        """
        if isinstance(population, random._Set):
            population = tuple(population)
        if not isinstance(population, random._Sequence):
            raise TypeError(
                "Population must be a sequence or set. For dicts, use list(d)."
            )
        randbelow = self._randbelow
        # NOTE this is the only line modified from the original function
        # n = len(population)
        n = population.stop - population.start  # assumes population is a range with step 1
        if not 0 <= k <= n:
            raise ValueError("Sample larger than population or is negative")
        result = [None] * k
        setsize = 21  # size of a small set minus size of an empty list
        if k > 5:
            setsize += 4 ** random._ceil(
                random._log(k * 3, 4)
            )  # table size for big sets
        if n <= setsize:
            # An n-length list is smaller than a k-length set
            pool = list(population)
            for i in range(k):  # invariant: non-selected at [0,n-i)
                j = randbelow(n - i)
                result[i] = pool[j]
                pool[j] = pool[n - i - 1]  # move non-selected item into vacancy
        else:
            selected = set()
            selected_add = selected.add
            for i in range(k):
                j = randbelow(n)
                while j in selected:
                    j = randbelow(n)
                selected_add(j)
                result[i] = population[j]
        return result
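If relying on private members of random is a concern, a different technique also avoids the OverflowError: random.getrandbits(n) returns a uniformly random integer in [0, 2**n), i.e. a uniformly random subset bitmask, with no length limit. The sketch below (sample_subsets is a hypothetical name) swaps sampling without replacement for rejection of repeats and of the empty subset. That stays cheap while num_samples is much smaller than 2**n, but becomes slow if you ask for nearly all subsets.

```python
import random

def sample_subsets(all_items, num_samples=10):
    """Yield distinct, non-empty, uniformly random subsets of all_items (sketch)."""
    n = len(all_items)
    # there are 2**n - 1 non-empty subsets; never ask for more than exist
    num_samples = min(num_samples, 2 ** n - 1)
    seen = set()
    while len(seen) < num_samples:
        mask = random.getrandbits(n)  # uniform integer in [0, 2**n)
        if mask == 0 or mask in seen:
            continue  # reject the empty subset and repeats
        seen.add(mask)
        # bit i of mask decides whether all_items[i] is included
        yield [item for i, item in enumerate(all_items) if mask >> i & 1]

for subset in sample_subsets(["a", "b", "c", "d"], num_samples=3):
    print(subset)
```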

To choose a random subset of multiple combinations
You can keep picking random samples and discarding the ones that were already picked.
import math
import random

def random_subset_of_combos2(iterable, r, k):
    """Returns at most `k` random samples of
    `r` length (combinations of) subsequences of elements in `iterable`.
    """
    def random_combination(iterable, r):
        "Random selection from itertools.combinations(iterable, r)"
        pool = tuple(iterable)
        n = len(pool)
        indices = sorted(random.sample(range(n), r))
        return tuple(pool[i] for i in indices)

    pool = tuple(iterable)  # so an iterator is not used up by the first draw
    # never ask for more distinct combinations than exist
    target = min(k, math.comb(len(pool), r))
    results = set()
    while len(results) < target:
        new_combo = random_combination(pool, r)
        if new_combo not in results:
            results.add(new_combo)
    return results
This seems faster in most cases.
Alternatively, you can assign an index number to each combination, and create a list indexing the targeted number of random samples.
import itertools as it
import math
import random

def random_subset_of_combos(iterable, r, k):
    """Returns at most `k` random samples of
    `r` length (combinations of) subsequences of elements in `iterable`.
    """
    max_combinations = math.comb(len(iterable), min(r, len(iterable)))
    k = min(k, max_combinations)  # Limit sample size to ...
    if k == 0:
        return []  # nothing to sample (also avoids max() of an empty set)
    indexes = set(random.sample(range(max_combinations), k))
    max_idx = max(indexes)
    results = []
    for i, combo in zip(it.count(), it.combinations(iterable, r)):
        if i in indexes:
            results.append(combo)
        elif i > max_idx:
            break
    return results
Technically, this answer may not be what the OP asked for, but search engines (and at least three SO members) have considered this to be the same question.
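The index-based idea can be pushed further: instead of walking itertools.combinations up to the largest sampled index, an index can be turned directly into its combination by counting how many combinations start with each candidate element (lexicographic unranking, using math.comb from Python 3.8+). The itertools documentation's recipes include a similar nth_combination; the version below is an independent sketch, and nth_combination and random_combination_by_index are hypothetical names. Because random.randrange picks a uniform index, the resulting combination is uniform, and nothing proportional to the number of combinations is ever built.

```python
import itertools
import math
import random

def nth_combination(iterable, r, index):
    """Return the combination at position `index` in itertools.combinations(iterable, r) order."""
    pool = tuple(iterable)
    n = len(pool)
    total = math.comb(n, r)
    if not 0 <= index < total:
        raise IndexError("combination index out of range")
    result = []
    start = 0
    for k in range(r, 0, -1):  # k = how many elements are still to be chosen
        i = start
        # combinations whose next element is pool[i] pick the other k - 1 from pool[i+1:]
        while index >= math.comb(n - i - 1, k - 1):
            index -= math.comb(n - i - 1, k - 1)
            i += 1
        result.append(pool[i])
        start = i + 1
    return tuple(result)

def random_combination_by_index(iterable, r):
    """Uniformly random r-combination obtained from a uniformly random index."""
    pool = tuple(iterable)
    return nth_combination(pool, r, random.randrange(math.comb(len(pool), r)))

# sanity check against itertools for a small case
assert all(
    nth_combination(range(6), 3, i) == combo
    for i, combo in enumerate(itertools.combinations(range(6), 3))
)
print(random_combination_by_index(range(10), 3))
```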

Efficient way to find missing elements in an integer sequence - python

missingItems = [x for x in complete_list if not x in L]

a = [1,2,3,7,5,11,20]
b = []
def miss(a, b):
    for x in range(min(a), max(a)):
        if x not in a:
            b.append(x)
    return b
print(miss(a, b))
ANS: [4, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19]
This works for sorted and unsorted lists, and with duplicates too.

Using collections.Counter:
from collections import Counter
dic = Counter([10, 11, 13, 14, 15, 16, 17, 18, 20])
print([i for i in range(10, 20) if dic[i] == 0])
Output: [12, 19]

My take was to use no loops and set operations:
def find_missing(in_list):
    complete_set = set(range(in_list[0], in_list[-1] + 1))
    return complete_set - set(in_list)
def main():
    sample = [10, 11, 13, 14, 15, 16, 17, 18, 20]
    print(find_missing(sample))
if __name__ == "__main__":
    main()
# => {19, 12} (a set, so the display order may vary)

Simply walk the list and look for non-consecutive numbers:
prev = L[0]
for this in L[1:]:
    if this > prev + 1:
        for item in range(prev + 1, this):  # this handles gaps of 1 or more
            print(item)
    prev = this
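The same walk can be written as a function that collects the gaps instead of printing them. A sketch, with missing_from_sorted as a hypothetical name; it assumes the input is sorted, and its running time is proportional to the list length plus the number of missing values, since each gap is filled with a single range():

```python
def missing_from_sorted(seq):
    """Return the integers missing between consecutive items of a sorted list."""
    missing = []
    for prev, cur in zip(seq, seq[1:]):
        # every integer strictly between prev and cur is absent (empty for duplicates)
        missing.extend(range(prev + 1, cur))
    return missing

print(missing_from_sorted([10, 11, 13, 14, 15, 16, 17, 18, 20]))  # [12, 19]
```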

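>>> l = [10,11,13,14,15,16,17,18,20]
>>> [l[i]+1 for i, j in enumerate(l) if (l+[0])[i+1] - l[i] > 1]
[12, 19]
Note that this reports only the first missing number of each gap, so it assumes no gap is wider than one number.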

def missing_elements(inlist):
    if len(inlist) <= 1:
        return []
    else:
        if inlist[1] - inlist[0] > 1:
            return [inlist[0] + 1] + missing_elements([inlist[0] + 1] + inlist[1:])
        else:
            return missing_elements(inlist[1:])

First sort the list, then check for each element except the last whether the next value is in the list. Be careful not to have duplicates in the list! Note that this only reports the first missing number of each gap.
l.sort()
[l[i]+1 for i in range(len(l)-1) if l[i]+1 not in l]
