Generate a random derangement of a list - python

How can I randomly shuffle a list so that none of the elements remains in its original position?
In other words, given a list A with distinct elements, I'd like to generate a permutation B of it so that
this permutation is random
and for each n, A[n] != B[n]
e.g.
a = [1,2,3,4]
b = [4,1,2,3] # good
b = [4,2,1,3] # good
b = [2,4,3,1] # bad (3 stays in its original position)
I don't know the proper term for such a permutation (is it "total"?), so I'm having a hard time googling. (Edit: the correct term appears to be "derangement".)

After some research I was able to implement the "early refusal" algorithm as described e.g. in this paper [1]. It goes like this:
import random

def random_derangement(n):
    while True:
        v = [i for i in range(n)]
        for j in range(n - 1, -1, -1):
            p = random.randint(0, j)
            if v[p] == j:
                break
            else:
                v[j], v[p] = v[p], v[j]
        else:
            if v[0] != 0:
                return tuple(v)
The idea: we keep shuffling the array; as soon as the permutation in progress turns out to be invalid (it would leave an element in its original position), we break and start from scratch.
A quick test shows that this algorithm generates all derangements uniformly:
N = 4

# enumerate all derangements for testing
import itertools
counter = {}
for p in itertools.permutations(range(N)):
    if all(p[i] != i for i in range(N)):
        counter[p] = 0

# make M probes for each derangement
M = 5000
for _ in range(M * len(counter)):
    # generate a random derangement
    p = random_derangement(N)
    # is it really?
    assert p in counter
    # ok, record it
    counter[p] += 1

# the distribution looks uniform
for p, c in sorted(counter.items()):
    print(p, c)
Results:
(1, 0, 3, 2) 4934
(1, 2, 3, 0) 4952
(1, 3, 0, 2) 4980
(2, 0, 3, 1) 5054
(2, 3, 0, 1) 5032
(2, 3, 1, 0) 5053
(3, 0, 1, 2) 4951
(3, 2, 0, 1) 5048
(3, 2, 1, 0) 4996
I chose this algorithm for simplicity; this presentation [2] briefly outlines other ideas.
References:
[1] An analysis of a simple algorithm for random derangements. Merlini, Sprugnoli, Verri. WSPC Proceedings, 2007.
[2] Generating random derangements. Martínez, Panholzer, Prodinger.

Such permutations are called derangements. In practice you can just try random permutations until you hit a derangement; the fraction of permutations that are derangements approaches 1/e (about 0.37) as n grows, so the expected number of tries is roughly e.
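For intuition, the derangement counts D(n) satisfy D(n) = (n - 1)(D(n-1) + D(n-2)), and n!/D(n) tends to e. A quick check (my own sketch, not part of the original answer):
from math import factorial

# Count derangements via D(n) = (n - 1) * (D(n-1) + D(n-2)),
# with base cases D(0) = 1 and D(1) = 0.
def count_derangements(n):
    d = [1, 0]
    for k in range(2, n + 1):
        d.append((k - 1) * (d[k - 1] + d[k - 2]))
    return d[n]

for n in (4, 8, 12):
    print(n, factorial(n) / count_derangements(n))  # approaches e ~ 2.71828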

As a possible starting point, the Fisher-Yates shuffle goes like this:
import random

def swap(xs, a, b):
    xs[a], xs[b] = xs[b], xs[a]

def permute(xs):
    for a in range(len(xs)):
        b = random.choice(range(a, len(xs)))
        swap(xs, a, b)
Perhaps this will do the trick?
def derange(xs):
    # shuffle all but the last element among themselves...
    for a in range(len(xs) - 2):
        b = random.choice(range(a + 1, len(xs) - 1))
        swap(xs, a, b)
    # ...then swap the last element with a random earlier one
    swap(xs, len(xs) - 1, random.choice(range(len(xs) - 1)))
Here's the version described by Vatine:
def derange(xs):
    for a in range(1, len(xs)):
        b = random.choice(range(0, a))
        swap(xs, a, b)
    return xs
A quick statistical test:
from collections import Counter

def test(n):
    derangements = (tuple(derange(list(range(n)))) for _ in range(10000))
    for k, v in Counter(derangements).items():
        print('{} {}'.format(k, v))
test(4) prints:
(1, 3, 0, 2) 1665
(2, 0, 3, 1) 1702
(3, 2, 0, 1) 1636
(1, 2, 3, 0) 1632
(3, 0, 1, 2) 1694
(2, 3, 1, 0) 1671
This does appear uniform over its range, and it has the nice property that each element has an equal chance to appear in each allowed slot.
But unfortunately it doesn't include all of the derangements. There are 9 derangements of size 4. (The formula and an example for n=4 are given on the Wikipedia article).
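You can confirm the count of 9 by brute force (a quick check of my own, not part of the original answer):
from itertools import permutations

derangements = [p for p in permutations(range(4))
                if all(p[i] != i for i in range(4))]
print(len(derangements))  # 9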

This should work:
import random

totalrandom = False
array = [1, 2, 3, 4]
it = 0
while not totalrandom:
    it += 1
    shuffledArray = sorted(array, key=lambda k: random.random())
    total = 0
    for i in range(len(array)):
        if array[i] != shuffledArray[i]:
            total += 1
    if total == len(array):
        totalrandom = True
    if it > 10 * len(array):
        print("'Total random' shuffle impossible")
        exit()
print(shuffledArray)
Note the variable it, which exits the loop if too many iterations have run. This accounts for arrays that cannot be deranged, such as [1, 1, 1] or [3].
EDIT
Turns out that if you're using this with large arrays (bigger than 15 elements or so), it gets CPU-intensive. Using a randomly generated 100-element array and raising the iteration limit to len(array)**3, it takes my Samsung Galaxy S4 a long time to solve.
EDIT 2
After about 1200 seconds (20 minutes), the program ended saying 'Total Random shuffle impossible'. For large arrays, you need a very large number of permutations... Say len(array)**10 or something.
Code:
import random, time

totalrandom = False
array = []
it = 0
for i in range(1, 100):
    array.append(random.randint(1, 6))
start = time.time()
while not totalrandom:
    it += 1
    shuffledArray = sorted(array, key=lambda k: random.random())
    total = 0
    for i in range(len(array)):
        if array[i] != shuffledArray[i]:
            total += 1
    if total == len(array):
        totalrandom = True
    if it > len(array)**3:
        end = time.time()
        print(end - start)
        print("'Total random' shuffle impossible")
        exit()
end = time.time()
print(end - start)
print(shuffledArray)

Here is a smaller one, with pythonic syntax:
import random

def derange(s):
    d = s[:]
    while any(a == b for a, b in zip(d, s)):
        random.shuffle(d)
    return d
All it does is shuffle the list until there is no element-wise match. Also, be careful: it will run forever if a list that cannot be deranged is passed, which happens when there are duplicates. To remove duplicates, simply call the function like this: derange(list(set(my_list_to_be_deranged))).

import random

a = [1, 2, 3, 4]
c = []
i = 0
while i < len(a):
    while 1:
        k = random.choice(a)
        # print(k, a[i])
        if k == a[i]:
            pass
        else:
            if k not in c:
                if i == len(a) - 2:
                    if a[len(a) - 1] not in c:
                        if k == a[len(a) - 1]:
                            c.append(k)
                            break
                    else:
                        c.append(k)
                        break
                else:
                    c.append(k)
                    break
    i = i + 1
print(c)

A quick way is to keep shuffling your list until you are left with a list that satisfies your condition.
import random
import copy

def is_derangement(l_original, l_proposal):
    return all(l_original[i] != item for i, item in enumerate(l_proposal))

l_original = [1, 2, 3, 4, 5]
l_proposal = copy.copy(l_original)
while not is_derangement(l_original, l_proposal):
    random.shuffle(l_proposal)
print(l_proposal)


Is there a statistical test that can compare two ordered lists

I would like to get a statistical test statistic to compare two lists. Suppose my Benchmark list is
Benchmark = [a,b,c,d,e,f,g]
and I have two other lists
A = [g,c,b,a,f,e,d]
C = [c,d,e,a,b,f,g]
I want the test to inform me which list is closer to the Benchmark. The test should consider absolute location, but also relative location: for example, it should penalize the fact that in list A 'g' is at the start while in the Benchmark it is at the end (how far is something from its true location), but it should also reward the fact that 'a' and 'b' are close to each other in list C, just like in the Benchmark.
A and C are always shuffles of the Benchmark. I would like a statistical test or some kind of metric that informs me that the orderings of lists A, B, and C are not statistically different from that of the Benchmark, but that a certain list D is significantly different at a certain threshold or p-value such as 5%. And even among lists A, B, and C, the test should clearly indicate which ordering is closer to the Benchmark.
Well, if you come to the conclusion that a metric will suffice, here you go:
def dist(a, b):
    perm = []
    for v in b:
        perm.append(a.index(v))
    perm_vals = [a[p] for p in perm]
    # displacement
    ret = 0
    for i, v in enumerate(perm):
        ret += abs(v - i)
    # coherence break
    current = perm_vals.index(a[0])
    for v in a[1:]:
        new = perm_vals.index(v)
        ret += abs(new - current) - 1
        current = new
    return ret
I've created a few samples to test this:
import random

ground_truth = [0, 1, 2, 3, 4, 5, 6]
samples = []
for i in range(7):
    samples.append(random.sample(ground_truth, len(ground_truth)))
samples.append([0, 6, 1, 5, 3, 4, 2])
samples.append([6, 5, 4, 3, 2, 1, 0])
samples.append([0, 1, 2, 3, 4, 5, 6])

for s in samples:
    print(s, dist(ground_truth, s))
The metric is a cost: the lower it is, the better. I designed it to yield 0 iff the permutation is the identity. The job left for you, which no one can do for you, is deciding how strict you want to be when evaluating samples using this metric; that definitely depends on what you're trying to achieve.

How to shuffle an array of numbers without two consecutive elements repeating?

I'm currently trying to get an array of numbers like this one randomly shuffled:
label_array = np.repeat(np.arange(6), 12)
The only constraint is that no two consecutive elements of the shuffle may be the same number. For that I'm currently using this code:
# Check if there are any occurrences of two consecutive
# elements being of the same category (same number)
num_occurrences = np.sum(np.diff(label_array) == 0)
# While there are any occurrences of this...
while num_occurrences != 0:
    # ...shuffle the array...
    np.random.shuffle(label_array)
    # ...create a flag for occurrences...
    flag = np.hstack(([False], np.diff(label_array) == 0))
    flag_array = label_array[flag]
    # ...and shuffle them.
    np.random.shuffle(flag_array)
    # Then re-assign them to the original array...
    label_array[flag] = flag_array
    # ...and check the number of occurrences again.
    num_occurrences = np.sum(np.diff(label_array) == 0)
Although this works for an array of this size, I don't know if it would work for much bigger arrays. And even so, it may take a lot of time.
So, is there a better way of doing this?
This may not technically be the best answer; hopefully it suffices for your requirements.
import numpy as np

def generate_random_array(block_length, block_count):
    for blocks in range(0, block_count):
        nums = np.arange(block_length)
        np.random.shuffle(nums)
        try:
            # avoid a repeat across the boundary between consecutive blocks
            if nums[0] == randoms_array[-1]:
                nums[0], nums[-1] = nums[-1], nums[0]
        except NameError:
            randoms_array = []
        randoms_array.extend(nums)
    return randoms_array

generate_random_array(block_length=1000, block_count=1000)
Here is a way to do it for Python >= 3.6, using random.choices, which allows choosing from a population with weights.
The idea is to generate the numbers one by one. Each time we generate a new number, we exclude the previous one by temporarily setting its weight to zero. Then, we decrement the weight of the chosen one.
As @roganjosh duly noted, we have a problem at the end when we are left with more than one instance of the last value, and that can happen really frequently, especially with a small number of values and a large number of repeats.
The solution I used is to insert these values back into the list where they don't create a conflict, with the short send_back function.
import random

def send_back(value, number, lst):
    idx = len(lst) - 2
    for _ in range(number):
        while lst[idx] == value or lst[idx - 1] == value:
            idx -= 1
        lst.insert(idx, value)

def shuffle_without_doubles(nb_values, repeats):
    population = list(range(nb_values))
    weights = [repeats] * nb_values
    out = []
    prev = None
    for i in range(nb_values * repeats):
        if prev is not None:
            # remove prev from the list of possible choices
            # by turning its weight temporarily to zero
            old_weight = weights[prev]
            weights[prev] = 0
        try:
            chosen = random.choices(population, weights)[0]
        except (IndexError, ValueError):
            # We are here because all of our weights are 0, which means
            # that all that is left to choose from is old_weight copies
            # of the previous value. (Older Pythons raise IndexError
            # from choices() here; newer ones raise ValueError.)
            send_back(prev, old_weight, out)
            break
        out.append(chosen)
        weights[chosen] -= 1
        if prev is not None:
            # restore weight
            weights[prev] = old_weight
        prev = chosen
    return out

print(shuffle_without_doubles(6, 12))
[5, 1, 3, 4, 3, 2, 1, 5, 3, 5, 2, 0, 5, 4, 3, 4, 5,
3, 4, 0, 4, 1, 0, 1, 5, 3, 0, 2, 3, 4, 1, 2, 4, 1,
0, 2, 0, 2, 5, 0, 2, 1, 0, 5, 2, 0, 5, 0, 3, 2, 1,
2, 1, 5, 1, 3, 5, 4, 2, 4, 0, 4, 2, 4, 0, 1, 3, 4,
5, 3, 1, 3]
Some crude timing: it takes about 30 seconds to generate shuffle_without_doubles(600, 1200), i.e. 720000 values.
I came here from Creating a list without back-to-back repetitions from multiple repeating elements (referred to as "problem A") while organising my notes, and there was no correct answer under problem A nor in the current question. The two problems also seem different, because problem A requires some elements to be the same.
Basically, what you asked is the same as an algorithm problem (link) where randomness is not required. But when almost half of all the numbers are the same, the result can only look like "ABACADAEA...", where "ABCDE" are numbers. In the most-voted answer to that problem, a priority queue is used, so the time complexity is O(n log m), where n is the length of the output and m is the count of options.
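For reference, here is a sketch of that priority-queue idea using heapq (my own sketch, not the linked answer's code; note it is greedy and deterministic, not uniformly random):
import heapq
from collections import Counter

def interleave(items):
    # Greedily emit the most frequent remaining value that differs
    # from the previously emitted one; O(n log m) overall.
    heap = [(-count, value) for value, count in Counter(items).items()]
    heapq.heapify(heap)
    out = []
    prev = None
    while heap:
        count, value = heapq.heappop(heap)
        if value == prev:
            if not heap:
                raise ValueError("no arrangement without adjacent duplicates")
            count2, value2 = heapq.heappop(heap)
            heapq.heappush(heap, (count, value))
            count, value = count2, value2
        out.append(value)
        prev = value
        if count + 1 < 0:  # counts are stored negated
            heapq.heappush(heap, (count + 1, value))
    return out

print(interleave([0] * 12 + [1] * 12 + [2] * 12))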
As for this problem, an easier way is to use itertools.permutations and randomly select some of them with different beginnings and endings, so the result looks "random".
I wrote draft code here and it works.
from itertools import permutations
from random import choice

def no_dup_shuffle(ele_count: int, repeat: int):
    """
    Return a shuffle of `ele_count` elements repeating `repeat` times.
    """
    # materialize the permutations once; choice() needs a sequence,
    # and repeatedly calling list() on the iterator would exhaust it
    perms = list(permutations(range(ele_count)))
    res = []
    last = (-1,)  # -1 is a dummy value so the first comparison always fails
    for _ in range(repeat):
        curr = choice(perms)
        while curr[0] == last[-1]:  # avoid a repeat across the boundary
            curr = choice(perms)
        res.extend(curr)
        last = curr
    return res

def test_no_dup_shuffle(count, rep):
    r = no_dup_shuffle(count, rep)
    assert len(r) == count * rep  # check result length
    assert len(set(r)) == count  # check all elements are used and in `range(count)`
    for i in range(1, len(r)):  # check no adjacent duplicates
        assert r[i] != r[i - 1]
    print(r)

if __name__ == "__main__":
    test_no_dup_shuffle(5, 3)
    test_no_dup_shuffle(3, 17)

Finding median of list in Python

How do you find the median of a list in Python? The list can be of any size and the numbers are not guaranteed to be in any particular order.
If the list contains an even number of elements, the function should return the average of the middle two.
Here are some examples (sorted for display purposes):
median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2
Python 3.4 has statistics.median:
Return the median (middle value) of numeric data.
When the number of data points is odd, return the middle data point.
When the number of data points is even, the median is interpolated by taking the average of the two middle values:
>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0
Usage:
import statistics
items = [6, 1, 8, 2, 3]
statistics.median(items)
#>>> 3
It's pretty careful with types, too:
statistics.median(map(float, items))
#>>> 3.0
from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')
(Works with python-2.x):
def median(lst):
    n = len(lst)
    s = sorted(lst)
    return (s[n//2 - 1]/2.0 + s[n//2]/2.0, s[n//2])[n % 2] if n else None
>>> median([-5, -5, -3, -4, 0, -1])
-3.5
numpy.median():
>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0
For python-3.x, use statistics.median:
>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0
The sorted() function is very helpful for this. Use the sorted function
to order the list, then simply return the middle value (or average the two middle
values if the list contains an even amount of elements).
def median(lst):
    sortedLst = sorted(lst)
    lstLen = len(lst)
    index = (lstLen - 1) // 2
    if lstLen % 2:
        return sortedLst[index]
    else:
        return (sortedLst[index] + sortedLst[index + 1]) / 2.0
Of course you can use built-in functions, but if you would like to create your own you can do something like this. The trick here is the ~ operator, which flips a positive number to a negative one: for instance ~2 -> -3, and negative indices count from the end of a list in Python. So if you have mid == 2, it will take the third element from the beginning and the third item from the end.
def median(data):
    data.sort()
    mid = len(data) // 2
    return (data[mid] + data[~mid]) / 2
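For example (my own check of the snippet above; note that it mutates its argument via sort):
print(median([1, 3, 5]))     # 3.0: mid == 1, and data[1] == data[~1] == 3
print(median([1, 2, 3, 4]))  # 2.5: averages data[2] == 3 and data[~2] == 2
Both odd and even lengths fall out of the same expression, since for odd lengths data[mid] and data[~mid] are the same element.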
Here's a cleaner solution:
def median(lst):
    quotient, remainder = divmod(len(lst), 2)
    if remainder:
        return sorted(lst)[quotient]
    return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.0
Note: Answer changed to incorporate suggestion in comments.
You can try the quickselect algorithm if faster average-case running times are needed. Quickselect has average (and best) case performance O(n), although it can end up O(n²) on a bad day.
Here's an implementation with a randomly chosen pivot:
import random

def select_nth(n, items):
    pivot = random.choice(items)
    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)
    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal
    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)
You can trivially turn this into a method to find medians:
def median(items):
    if len(items) % 2:
        return select_nth(len(items)//2, items)
    else:
        left = select_nth((len(items) - 1) // 2, items)
        right = select_nth((len(items) + 1) // 2, items)
        return (left + right) / 2
This is very unoptimised, but it's not likely that even an optimised version will outperform Tim Sort (CPython's built-in sort) because that's really fast. I've tried before and I lost.
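As a sanity check (my own snippet, assuming the two functions above are in scope):
import random
import statistics

xs = [random.randint(0, 100) for _ in range(101)]  # odd length
assert median(xs) == statistics.median(xs)

ys = [random.randint(0, 100) for _ in range(100)]  # even length
assert median(ys) == statistics.median(ys)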
You can use list.sort to avoid creating new lists with sorted, sorting the list in place.
Also, you should not use list as a variable name, as it shadows Python's own list.
def median(l):
    half = len(l) // 2
    l.sort()
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]
def median(x):
    x = sorted(x)
    listlength = len(x)
    num = listlength // 2
    if listlength % 2 == 0:
        middlenum = (x[num] + x[num - 1]) / 2
    else:
        middlenum = x[num]
    return middlenum
def median(array):
    """Calculate median of the given list."""
    # TODO: use statistics.median in Python 3
    array = sorted(array)
    half, odd = divmod(len(array), 2)
    if odd:
        return array[half]
    return (array[half - 1] + array[half]) / 2.0
A simple function to return the median of the given list:
def median(lst):
    lst = sorted(lst)  # Sort the list first
    if len(lst) % 2 == 0:  # Check if the length is even
        # Apply the formula: sum of the middle two divided by 2
        return (lst[len(lst) // 2] + lst[(len(lst) - 1) // 2]) / 2
    else:
        # If the length is odd, take the middle value
        return lst[len(lst) // 2]
Some examples with the median function:
>>> median([9, 12, 20, 21, 34, 80]) # Even
20.5
>>> median([9, 12, 80, 21, 34]) # Odd
21
If you want to use library you can just simply do:
>>> import statistics
>>> statistics.median([9, 12, 20, 21, 34, 80]) # Even
20.5
>>> statistics.median([9, 12, 80, 21, 34]) # Odd
21
I posted my solution at Python implementation of "median of medians" algorithm, which is a little bit faster than using sort(). My solution uses 15 numbers per column, for a speed of ~5N, which is faster than the ~10N of using 5 numbers per column. The optimal speed is ~4N, but I could be wrong about it.
Per Tom's request in his comment, I added my code here, for reference. I believe the critical part for speed is using 15 numbers per column, instead of 5.
#!/bin/pypy
#
# TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random

items_per_column = 15

def find_i_th_smallest(A, i):
    t = len(A)
    if t <= items_per_column:
        # if A is a small list with less than items_per_column items, then:
        #
        # 1. do sort on A
        # 2. find i-th smallest item of A
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of k items each. k is odd, say 15.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #
        B = [find_i_th_smallest(k, (len(k) - 1) // 2)
             for k in [A[j:(j + items_per_column)]
                       for j in range(0, len(A), items_per_column)]]
        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1) // 2)
        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which of the above sets holds A's i-th smallest, recursively.
        #
        P1 = [j for j in A if j < M]
        if i < len(P1):
            return find_i_th_smallest(P1, i)
        P3 = [j for j in A if j > M]
        L3 = len(P3)
        if i < (t - L3):
            return M
        return find_i_th_smallest(P3, i - (t - L3))

# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])

# create a list of random positive integers
#
L = [random.randint(0, number_of_numbers) for i in range(0, number_of_numbers)]

# Show the original list
#
# print(L)

# This is for validation
#
# print(sorted(L)[(len(L) - 1) // 2])

# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print(find_i_th_smallest(L, (len(L) - 1) // 2))
In case you need additional information on the distribution of your list, the percentile method will probably be useful. And a median value corresponds to the 50th percentile of a list:
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
median_value = np.percentile(a, 50)  # return 50th percentile
print(median_value)
Here is what I came up with during this exercise in Codecademy:
def median(data):
    new_list = sorted(data)
    if len(new_list) % 2 > 0:
        return new_list[len(new_list) // 2]
    elif len(new_list) % 2 == 0:
        return (new_list[len(new_list) // 2] + new_list[len(new_list) // 2 - 1]) / 2.0

print(median([1, 2, 3, 4, 5, 9]))
Just two lines are enough.
def get_median(arr):
    '''
    Calculate the median of a sequence.
    :param arr: list
    :return: int or float
    '''
    arr = sorted(arr)
    return arr[len(arr)//2] if len(arr) % 2 else (arr[len(arr)//2] + arr[len(arr)//2 - 1]) / 2
A median function:
def median(midlist):
    midlist.sort()
    lens = len(midlist)
    if lens % 2 != 0:
        midl = lens // 2
        res = midlist[midl]
    else:
        odd = (lens // 2) - 1
        ev = lens // 2
        res = float(midlist[odd] + midlist[ev]) / float(2)
    return res
I had some problems with lists of float values. I ended up using a code snippet from the Python 3 statistics.median, and it works perfectly with float values, without imports (source):
def calculateMedian(values):
    data = sorted(values)
    n = len(data)
    if n == 0:
        return None
    if n % 2 == 1:
        return data[n // 2]
    else:
        i = n // 2
        return (data[i - 1] + data[i]) / 2
def midme(list1):
    list1.sort()
    if len(list1) % 2 > 0:
        x = list1[len(list1) // 2]
    else:
        x = (list1[len(list1) // 2 - 1] + list1[len(list1) // 2]) / 2
    return x

midme([4, 5, 1, 7, 2])
def median(array):
    if len(array) < 1:
        return None
    if len(array) % 2 == 0:
        median = array[len(array)//2 - 1: len(array)//2 + 1]
        return sum(median) / len(median)
    else:
        return array[len(array)//2]
I defined a median function for a list of numbers as
def median(numbers):
    s = sorted(numbers)
    # the lower and upper middle elements coincide for odd-length lists
    return (s[(len(numbers) - 1) // 2] + s[len(numbers) // 2]) / 2.0
import numpy as np

def get_median(xs):
    mid = len(xs) // 2  # take the mid of the list
    if len(xs) % 2 == 1:  # check if the length of the list is odd
        return sorted(xs)[mid]  # if so, mid will be the median after sorting
    else:
        # return 0.5 * sum(sorted(xs)[mid - 1:mid + 1])
        return 0.5 * np.sum(sorted(xs)[mid - 1:mid + 1])  # otherwise take the avg of the middle two

print(get_median([7, 7, 3, 1, 4, 5]))
print(get_median([1, 2, 3, 4, 5]))
A more generalized approach for the median (and percentiles) would be:
def get_percentile(data, percentile):
    # Get the number of observations
    cnt = len(data)
    # Sort the list
    data = sorted(data)
    # Determine the split point
    i = (cnt - 1) * percentile
    # Find the fractional part of the split point
    diff = i - int(i)
    # Return the weighted average of the values above and below the split point
    return data[int(i)] * (1 - diff) + data[int(i) + 1] * diff

# Data
data = [1, 2, 3, 4, 5]
# For the median
print(get_percentile(data=data, percentile=.50))
# > 3.0
print(get_percentile(data=data, percentile=.75))
# > 4.0
# Note the weighted average difference when the split point is not an integer
print(get_percentile(data=data, percentile=.51))
# > 3.04
Try this (note it assumes arr is already sorted):
import math

def find_median(arr):
    if len(arr) % 2 == 1:
        med = math.ceil(len(arr) / 2) - 1
        return arr[med]
    else:
        # average the two middle elements for even-length lists
        return (arr[len(arr) // 2 - 1] + arr[len(arr) // 2]) / 2

print(find_median([1, 2, 3, 4, 5, 6, 7, 8]))
Implement it:
def median(numbers):
    """
    Calculate median of a list numbers.
    :param numbers: the numbers to be calculated.
    :return: median value of numbers.

    >>> median([1, 3, 3, 6, 7, 8, 9])
    6
    >>> median([1, 2, 3, 4, 5, 6, 8, 9])
    4.5
    >>> import statistics
    >>> import random
    >>> numbers = random.sample(range(-50, 50), k=100)
    >>> statistics.median(numbers) == median(numbers)
    True
    """
    numbers = sorted(numbers)
    mid_index = len(numbers) // 2
    return (
        (numbers[mid_index] + numbers[mid_index - 1]) / 2 if len(numbers) % 2 == 0
        else numbers[mid_index]
    )

if __name__ == "__main__":
    from doctest import testmod
    testmod()
The median function:
import numpy as np

def median(d):
    d = np.sort(d)
    n = len(d)
    if n % 2 != 0:
        med = d[n // 2]
    else:
        med = (d[n // 2 - 1] + d[n // 2]) / 2
    return med
Simply create a median function with a list of numbers as its argument and call the function.
def median(l):
    l = sorted(l)
    lent = len(l)
    m = lent // 2
    if lent % 2 == 0:
        result = (l[m - 1] + l[m]) / 2
    else:
        result = l[m]
    return result
What I did was this:
def median(a):
    a = sorted(a)
    if len(a) % 2 != 0:
        return a[len(a) // 2]
    else:
        return (a[len(a) // 2] + a[len(a) // 2 - 1]) / 2
Explanation: if the number of items in the list is odd, return the middle number. Otherwise, halving the length of an even sorted list points at the upper of the two middle values, so take that value and the one just before it (since the list is sorted) and divide their sum by 2 to find the median.
Here's the tedious way to find the median without using the median function:
def median(*arg):
    ordered = order(arg)
    numArg = len(ordered)
    half = numArg // 2
    if numArg % 2 == 0:
        print((ordered[half - 1] + ordered[half]) / 2)
    else:
        print(ordered[half])

def order(tup):
    # bubble sort: keep passing over the list until no more swaps are needed
    ordered = [tup[i] for i in range(len(tup))]
    while test(ordered):
        pass
    return ordered

def test(ordered):
    swapped = 0
    for i in range(len(ordered) - 1):
        if ordered[i] > ordered[i + 1]:
            ordered[i], ordered[i + 1] = ordered[i + 1], ordered[i]
            swapped = 1  # run the loop again if you had to switch values
    return swapped
It is very simple:
def median(alist):
    # to find the median you have to sort the list first
    sList = sorted(alist)
    first = 0
    last = len(sList) - 1
    midpoint = (first + last) // 2
    if len(sList) % 2 == 0:
        return (sList[midpoint] + sList[midpoint + 1]) / 2.0
    return sList[midpoint]
And you can use the return value like this: median_value = median(anyList)

How to split a list into subsets with no repeating elements in python

I need code that takes a list (up to n=31) and returns all possible groupings into subsets of n=3 such that no two elements appear together in more than one subset (think of people who are teaming up in groups of 3 with new people every time):
list=[1,2,3,4,5,6,7,8,9]
and returns
[1,2,3][4,5,6][7,8,9]
[1,4,7][2,5,8][3,6,9]
[1,6,8][2,4,9][3,5,7]
but not:
[1,5,7][2,4,8][3,6,9]
because 1 and 7 have appeared together already (likewise, 3 and 9).
I would also like to do this for subsets of n=2.
Thank you!!
Here's what I came up with:
from itertools import combinations, chain

people = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# get all combinations of 3 sets of 3 people
combos_combos = combinations(combinations(people, 3), 3)

# filter out sets that don't contain all 9 people
valid_sets = (combo for combo in combos_combos
              if len(set(chain.from_iterable(combo))) == 9)

# a set of pairs of people that have already been together
already_together = set()
for sets in valid_sets:
    # get all (sorted) combinations of pairings in this set
    pairings = list(chain.from_iterable(combinations(combo, 2) for combo in sets))
    pairings = set(map(tuple, map(sorted, pairings)))
    # if none of the pairings have been paired before, we have a new one
    if len(pairings.intersection(already_together)) == 0:
        print(sets)
        already_together.update(pairings)
This prints:
~$ time python test_combos.py
((1, 2, 3), (4, 5, 6), (7, 8, 9))
((1, 4, 7), (2, 5, 8), (3, 6, 9))
((1, 5, 9), (2, 6, 7), (3, 4, 8))
((1, 6, 8), (2, 4, 9), (3, 5, 7))
real 0m0.182s
user 0m0.164s
sys 0m0.012s
Try this:
from itertools import permutations

lst = list(range(1, 10))
n = 3
triplets = list(permutations(lst, n))
triplets = [set(x) for x in triplets]

def array_unique(seq):
    checked = []
    for x in seq:
        if x not in checked:
            checked.append(x)
    return checked

triplets = array_unique(triplets)
result = []
m = n * 3
for x in triplets:
    for y in triplets:
        for z in triplets:
            if len(x.union(y.union(z))) == m:
                result += [[x, y, z]]

def groups(sets, i):
    result = [sets[i]]
    for x in sets:
        flag = True
        for y in result:
            for r in x:
                for p in y:
                    if len(r.intersection(p)) >= 2:
                        flag = False
                        break
                else:
                    continue
                if flag == False:
                    break
        if flag == True:
            result.append(x)
    return result

for i in range(len(result)):
    print('%d:' % (i + 1))
    for x in groups(result, i):
        print(x)
Output for n = 10:
http://pastebin.com/Vm54HRq3
Here's my attempt at a fairly general solution to your problem.
from itertools import combinations

n = 3
l = list(range(1, 10))

def f(l, n, used, top):
    if len(l) == n:
        if all(set(x) not in used for x in combinations(l, 2)):
            yield [l]
    else:
        for group in combinations(l, n):
            if any(set(x) in used for x in combinations(group, 2)):
                continue
            for rest in f([i for i in l if i not in group], n, used, False):
                config = [list(group)] + rest
                if top:
                    # Running at top level, this is a valid
                    # configuration. Update used list.
                    for c in config:
                        used.extend(set(x) for x in combinations(c, 2))
                yield config
                break

for i in f(l, n, [], True):
    print(i)
However, it is very slow for high values of n, far too slow for n=31. I don't have time right now to try to improve the speed, but I might try later. Suggestions are welcome!
My wife had this problem trying to arrange breakout groups for a meeting with nine people; she wanted no pairs of attendees to repeat.
I immediately busted out itertools, got stumped, and came to StackOverflow. But in the meantime, my non-programmer wife solved it visually. The key insight is to create a tic-tac-toe grid:
1 2 3
4 5 6
7 8 9
And then simply take 3 groups going down, 3 groups going across, and 3 groups going diagonally wrapping around, and 3 groups going diagonally the other way, wrapping around.
You can do it just in your head then.
- : 123,456,789
| : 147,258,369
\ : 159,267,348
/ : 168,249,357
I suppose the next question is how far can you take a visual method like this? Does it rely on the coincidence that the desired subset size * the number of subsets = the number of total elements?
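For what it's worth, the grid construction can be generated programmatically: rows, columns, and the two wrapped diagonals are exactly the lines of slope 0, infinity, 1, and 2 over Z/3. A sketch of my own (the same idea works for any p x p grid with p prime, since two distinct lines then share at most one cell):
n = 3  # must be prime for the wrapped diagonals to pair everyone exactly once
grid = [[r * n + c + 1 for c in range(n)] for r in range(n)]

groupings = [
    [grid[r] for r in range(n)],                         # rows
    [[grid[r][c] for r in range(n)] for c in range(n)],  # columns
]
for slope in range(1, n):                                # wrapped diagonals
    groupings.append(
        [[grid[r][(c + slope * r) % n] for r in range(n)] for c in range(n)])

for g in groupings:
    print(g)
This prints the same four groupings as the grid reading above.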

find length of sequences of identical values in a numpy array (run length encoding)

In a pylab program (which could probably be a matlab program as well) I have a numpy array of numbers representing distances: d[t] is the distance at time t (and the timespan of my data is len(d) time units).
The events I'm interested in are when the distance is below a certain threshold, and I want to compute the duration of these events. It's easy to get an array of booleans with b = d<threshold, and the problem comes down to computing the sequence of the lengths of the True-only words in b. But I do not know how to do that efficiently (i.e. using numpy primitives), and I resorted to walk the array and to do manual change detection (i.e. initialize counter when value goes from False to True, increase counter as long as value is True, and output the counter to the sequence when value goes back to False). But this is tremendously slow.
How can I efficiently detect that sort of sequence in numpy arrays?
Below is some python code that illustrates my problem : the fourth dot takes a very long time to appear (if not, increase the size of the array)
from pylab import *

threshold = 7
print('.')
d = 10 * rand(10000000)
print('.')
b = d < threshold
print('.')
durations = []
counter = 0
for i in range(len(b)):
    if b[i] and (i == 0 or not b[i - 1]):
        counter = 1
    if i > 0 and b[i - 1] and b[i]:
        counter += 1
    if (i > 0 and b[i - 1] and not b[i]) or i == len(b) - 1:
        durations.append(counter)
print('.')
Fully numpy vectorized and generic RLE for any array (works with strings, booleans etc too).
Outputs tuple of run lengths, start positions, and values.
import numpy as np

def rle(inarray):
    """ run length encoding. Partial credit to the R rle function.
    Multi-datatype arrays catered for, including non-numpy ones.
    returns: tuple (runlengths, startpositions, values) """
    ia = np.asarray(inarray)               # force numpy
    n = len(ia)
    if n == 0:
        return (None, None, None)
    else:
        y = ia[1:] != ia[:-1]               # pairwise unequal (string safe)
        i = np.append(np.where(y), n - 1)   # must include last element position
        z = np.diff(np.append(-1, i))       # run lengths
        p = np.cumsum(np.append(0, z))[:-1] # positions
        return (z, p, ia[i])
Pretty fast (i7):
xx = np.random.randint(0, 5, 1000000)
%timeit yy = rle(xx)
100 loops, best of 3: 18.6 ms per loop
Multiple data types:
rle([True, True, True, False, True, False, False])
Out[8]:
(array([3, 1, 1, 2]),
array([0, 3, 4, 5]),
array([ True, False, True, False], dtype=bool))
rle(np.array([5, 4, 4, 4, 4, 0, 0]))
Out[9]: (array([1, 4, 2]), array([0, 1, 5]), array([5, 4, 0]))
rle(["hello", "hello", "my", "friend", "okay", "okay", "bye"])
Out[10]:
(array([2, 1, 1, 2, 1]),
array([0, 2, 3, 4, 6]),
array(['hello', 'my', 'friend', 'okay', 'bye'],
dtype='|S6'))
Same results as Alex Martelli above:
xx = np.random.randint(0, 2, 20)
xx
Out[60]: array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1])
am = runs_of_ones_array(xx)
tb = rle(xx)
am
Out[63]: array([4, 5, 2, 5])
tb[0][tb[2] == 1]
Out[64]: array([4, 5, 2, 5])
%timeit runs_of_ones_array(xx)
10000 loops, best of 3: 28.5 µs per loop
%timeit rle(xx)
10000 loops, best of 3: 38.2 µs per loop
Slightly slower than Alex (but still very fast), and much more flexible.
While not numpy primitives, itertools functions are often very fast, so do give this one a try (and measure times for various solutions including this one, of course):
import itertools

def runs_of_ones(bits):
    for bit, group in itertools.groupby(bits):
        if bit:
            yield sum(group)
If you do need the values in a list, just use list(runs_of_ones(bits)), of course; but maybe a list comprehension might be marginally faster still:
def runs_of_ones_list(bits):
    return [sum(g) for b, g in itertools.groupby(bits) if b]
Moving to "numpy-native" possibilities, what about:
import numpy

def runs_of_ones_array(bits):
    # make sure all runs of ones are well-bounded
    bounded = numpy.hstack(([0], bits, [0]))
    # get 1 at run starts and -1 at run ends
    difs = numpy.diff(bounded)
    run_starts, = numpy.where(difs > 0)
    run_ends, = numpy.where(difs < 0)
    return run_ends - run_starts
Again: be sure to benchmark solutions against each other in realistic-for-you examples!
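For instance, a quick harness along those lines (my own sketch; adjust the array size and density to match your data):
import timeit
import numpy

bits = numpy.random.rand(10**6) < 0.7  # boolean array, ~70% True

for fn in (runs_of_ones_list, runs_of_ones_array):
    t = timeit.timeit(lambda: fn(bits), number=10)
    print(fn.__name__, t)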
Here is a solution using only arrays: it takes an array containing a sequence of bools and counts the lengths of the transitions.
>>> from numpy import array, arange
>>> b = array([0,0,0,1,1,1,0,0,0,1,1,1,1,0,0], dtype=bool)
>>> sw = (b[:-1] ^ b[1:]); print(sw)
[False False True False False True False False True False False False
True False]
>>> isw = arange(len(sw))[sw]; print(isw)
[ 2 5 8 12]
>>> lens = isw[1::2] - isw[::2]; print(lens)
[3 4]
sw contains True where there is a switch, isw converts those into indexes. The items of isw are then subtracted pairwise in lens.
Notice that if the sequence started with a 1 it would count the lengths of the 0 sequences instead: this can be fixed in the indexing that computes lens. Also, I have not tested corner cases such as sequences of length 1.
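One way to handle both boundary cases is to pad with False first, so every run of ones is bounded (my sketch, building on the same XOR idea):
import numpy as np

def runs_of_true(b):
    # Pad with False so every True-run has a switch on both sides.
    padded = np.concatenate(([False], b, [False]))
    sw = padded[:-1] ^ padded[1:]   # True at every 0->1 or 1->0 switch
    isw = np.flatnonzero(sw)        # indices of the switches
    return isw[1::2] - isw[::2]     # lengths of the True runs

b = np.array([1, 1, 0, 1, 1, 1, 0, 0, 1], dtype=bool)
print(runs_of_true(b))  # [2 3 1]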
Full function that returns the start positions and lengths of all True-subarrays.
import numpy as np

def count_adjacent_true(arr):
    assert len(arr.shape) == 1
    assert arr.dtype == np.bool_
    if arr.size == 0:
        return np.empty(0, dtype=int), np.empty(0, dtype=int)
    sw = np.insert(arr[1:] ^ arr[:-1], [0, arr.shape[0] - 1], values=True)
    swi = np.arange(sw.shape[0])[sw]
    offset = 0 if arr[0] else 1
    lengths = swi[offset + 1::2] - swi[offset:-1:2]
    return swi[offset:-1:2], lengths
Tested for different bool 1D-arrays (empty array; single/multiple elements; even/odd lengths; started with True/False; with only True/False elements).
Just in case anyone is curious (and since you mentioned MATLAB in passing), here's one way to solve it in MATLAB:
threshold = 7;
d = 10*rand(1,100000); % Sample data
b = diff([false (d < threshold) false]);
durations = find(b == -1)-find(b == 1);
I'm not too familiar with Python, but maybe this could help give you some ideas. =)
Perhaps late to the party, but a Numba-based approach is going to be fastest by far and large.
import numpy as np
import numba as nb

@nb.njit
def count_steps_nb(arr):
    result = 1
    last_x = arr[0]
    for x in arr[1:]:
        if last_x != x:
            result += 1
            last_x = x
    return result

@nb.njit
def rle_nb(arr):
    n = count_steps_nb(arr)
    values = np.empty(n, dtype=arr.dtype)
    lengths = np.empty(n, dtype=np.int_)
    last_x = arr[0]
    length = 1
    i = 0
    for x in arr[1:]:
        if last_x != x:
            values[i] = last_x
            lengths[i] = length
            i += 1
            length = 1
            last_x = x
        else:
            length += 1
    values[i] = last_x
    lengths[i] = length
    return values, lengths
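Example usage (my own; note that Numba wants a numpy array as input, not a plain list):
arr = np.array([1, 1, 2, 2, 2, 3])
values, lengths = rle_nb(arr)
print(values, lengths)  # [1 2 3] [2 3 1]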
Numpy-based approaches (inspired by @ThomasBrowne's answer, but faster because the use of the expensive numpy.concatenate() is reduced to a minimum) are the runners-up (here two similar approaches are presented, one using not-equality to find the positions of the steps, and the other one using differences):
import numpy as np

def rle_np_neq(arr):
    n = len(arr)
    if n == 0:
        values = np.empty(0, dtype=arr.dtype)
        lengths = np.empty(0, dtype=np.int_)
    else:
        positions = np.concatenate(
            [[-1], np.nonzero(arr[1:] != arr[:-1])[0], [n - 1]])
        lengths = positions[1:] - positions[:-1]
        values = arr[positions[1:]]
    return values, lengths

def rle_np_diff(arr):
    n = len(arr)
    if n == 0:
        values = np.empty(0, dtype=arr.dtype)
        lengths = np.empty(0, dtype=np.int_)
    else:
        positions = np.concatenate(
            [[-1], np.nonzero(arr[1:] - arr[:-1])[0], [n - 1]])
        lengths = positions[1:] - positions[:-1]
        values = arr[positions[1:]]
    return values, lengths
These both outperform the naïve and simple single-loop approach:
def rle_loop(arr):
    values = []
    lengths = []
    last_x = arr[0]
    length = 1
    for x in arr[1:]:
        if last_x != x:
            values.append(last_x)
            lengths.append(length)
            length = 1
            last_x = x
        else:
            length += 1
    values.append(last_x)
    lengths.append(length)
    return np.array(values), np.array(lengths)
On the contrary, using itertools.groupby() is not going to be any faster than the simple loop (unless in very special cases like @AlexMartelli's answer, or unless someone implements __len__ on the group object), because in general there is no simple way of extracting the group size information other than looping through the group itself, which is not exactly fast:
import itertools

def rle_groupby(arr):
    values = []
    lengths = []
    for x, group in itertools.groupby(arr):
        values.append(x)
        lengths.append(sum(1 for _ in group))
    return np.array(values), np.array(lengths)
Benchmarks on random integer arrays of varying size bear out this ranking (the benchmark plot is not reproduced here; the full analysis is linked from the original answer).
durations = []
counter = 0
for value in b:
    if value:
        counter += 1
    elif counter > 0:
        durations.append(counter)
        counter = 0
if counter > 0:
    durations.append(counter)
import numpy as np
a = np.array([3, 3, 3, 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3])
is_not_in_run = (a != 3)
is_not_in_run = np.concatenate(([True], is_not_in_run, [True]))
gap_indices = np.where(is_not_in_run)[0]
run_lengths = np.diff(gap_indices) - 1
run_lengths = np.delete(run_lengths, np.where(run_lengths == 0)[0])
print(run_lengths)
