Sample irregular list of numbers with a set delta - python

Is there a simpler way, using e.g. numpy, to get samples for a given X and delta than the below code?
>>> X = [1, 4, 5, 6, 11, 13, 15, 20, 21, 22, 25, 30]
>>> delta = 5
>>> samples = [X[0]]
>>> for x in X:
...     if x - samples[-1] >= delta:
...         samples.append(x)
...
>>> samples
[1, 6, 11, 20, 25, 30]

If you are aiming to "vectorize" the process for performance reasons (e.g. using numpy), you could compute, for each element, the number of elements that are less than that element plus delta. This gives you indices for the items to select, with the items that need to be skipped getting the same index as the preceding ones to be kept.
import numpy as np
X = np.array([1, 4, 5, 6, 11, 13, 15, 20, 21, 22, 25, 30])
delta = 5
i = np.sum(X<X[:,None]+delta,axis=1) # index of first to keep
i = np.insert(i[:-1],0,0) # always want the first, never the last
Y = X[np.unique(i)] # extract values as unique indexes
print(Y)
[ 1  6 11 20 25 30]
This assumes that the numbers are in ascending order.
[EDIT]
As indicated in my comment, the above solution is flawed and will only work some of the time. Although vectorizing a Python function with np.frompyfunc does not really leverage parallelism (and is slower than the plain Python loop), it is possible to implement the filter like this:
X = np.array([1, 4, 5, 6, 10,11,12, 13, 15, 20, 21, 22, 25, 30])
delta = 5
fdelta = np.frompyfunc(lambda a,b:a if a+delta>b else b,2,1)
Y = X[X == fdelta.accumulate(X, dtype=object)]  # plain object dtype; the np.object alias was removed in NumPy 1.24
print(Y)
[ 1  6 11 20 25 30]
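For sorted input, another option is to jump ahead with a binary search instead of testing every element: bisect_left finds the first value that is at least delta above the last kept one (np.searchsorted(X, value, side='left') is the NumPy counterpart). A minimal sketch, assuming X is sorted in ascending order and delta > 0; sample_with_delta is a hypothetical name.

```python
from bisect import bisect_left

def sample_with_delta(xs, delta):
    """Greedy sampling: keep xs[0], then repeatedly jump to the first value
    that is at least `delta` larger than the last kept value.

    Assumes xs is sorted in ascending order and delta > 0.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    samples = []
    i = 0
    while i < len(xs):
        samples.append(xs[i])
        # Binary search (only to the right of i) for the first element
        # >= xs[i] + delta; every element in between is skipped.
        i = bisect_left(xs, xs[i] + delta, i + 1)
    return samples

X = [1, 4, 5, 6, 11, 13, 15, 20, 21, 22, 25, 30]
print(sample_with_delta(X, 5))  # [1, 6, 11, 20, 25, 30]
```

Each kept value costs one O(log n) search, so the loop runs once per sample rather than once per element.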

Related

Obtaining a list of ordered integers from a list of "pairs" in Python

Hello, I am currently working with a large set of data that contains an even number of integers, each of which has a matching value. I am trying to create a list made up of "one of each pair" in Python. The same value can appear in multiple pairs, so simply using the set function does not work. For example, if I have a list:
List = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
In this example, indices 0 and 1 would be a pair, then 2 and 7, 3 and 5, 4 and 6, 8 and 9.
I want to extract from that list the values that make up each pair and create a new list with said values to produce something such as:
newList = [10, 11, 20, 15, 10]
Using the set function keeps only one copy of each distinct value, whereas I need half of the total data from List. For situations where I have more than one pair of the same value, it would look something like this:
List = [10, 10, 11, 10, 11, 10]
Would need to produce a list such as:
newList = [10, 11, 10]
Any insight would be great as I am new to Python and there are a lot of functions I may not be aware of.
Thank you
Just try:
new_list = set(List)
This should return your desired output.
If I've understood correctly, you don't want any duplicated values; you want to keep a list of the unique values from a particular list.
If I'm right, a simple way to do so would be:
List = [10, 10, 11, 11, 15, 20, 15, 20]
newList = []
for x in List:
    if x not in newList:
        newList.append(x)
print(newList)
A more Pythonic way to do so (which returns an unordered set rather than a list) would be:
newList = set(List)
Here is a slight variation on @Alain T's answer:
[i for s in [set()] for i in List if (s.remove(i) if i in s else (not s.add(i)))]
NB: the following was my answer before you added the ordering requirement:
sorted(List)[::2]
This sorts the input List and then takes one value out of each two consecutive ones.
As a general approach, this'll do:
l = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
i = 0
while i < len(l):
    del l[l.index(l[i], i + 1)]
    i += 1
It iterates through the list one by one, finding the index of the next occurrence of the current value, and deletes it, shortening the list. This can probably be dressed up in various ways, but is a simple algorithm. Should a number not have a matching pair, this will raise a ValueError.
The following code creates a new list with half the number of items in the input list. Values are grouped and ordered by their first occurrence in the input list.
>>> from collections import Counter
>>> d = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
>>> c = Counter(d)
>>> c
Counter({10: 4, 11: 2, 20: 2, 15: 2})
>>> answer = sum([[key] * (val // 2) for key, val in c.items()], [])
>>> answer
[10, 10, 11, 20, 15]
>>>
If you need to preserve the order of the first occurrence of each pair, you could use a set with an XOR operation on values to alternate between first and second occurrences.
List = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
paired = [ i for pairs in [set()] for i in List if pairs.symmetric_difference_update({i}) or i in pairs]
print(paired)
# [10, 11, 20, 15, 10]
You could also do this with the accumulate function from itertools:
from itertools import accumulate
paired = [a for a,b in zip(List,accumulate(({n} for n in List),set.__xor__)) if a in b]
print(paired)
# [10, 11, 20, 15, 10]
Or use a bitmap instead of a set (if your values are relatively small non-negative integers, e.g. between 0 and 64):
paired = [ n for n,m in zip(List,accumulate((1<<n for n in List),int.__xor__)) if (1<<n)&m ]
print(paired)
# [10, 11, 20, 15, 10]
Or you could use a Counter from collections
from collections import Counter
paired = [ i for c in [Counter(List)] for i in List if c.update({i:-1}) or c[i]&1 ]
print(paired)
# [10, 11, 20, 15, 10]
And, if you're not too worried about efficiency, a double sort with a stride of 2 could do it:
paired = [List[i] for i,_ in sorted(sorted(enumerate(List),key=lambda n:n[1])[::2])]
print(paired)
# [10, 11, 20, 15, 10]
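For readers new to Python, the set-toggling idea behind several of the one-liners above can be written as a plain loop: the first occurrence of a value opens a pair (and is kept), the next occurrence closes it. A minimal sketch; first_of_pairs is a hypothetical name, and raising ValueError for unpaired values mirrors the while-loop answer's behaviour rather than anything the question specifies.

```python
def first_of_pairs(values):
    """Return the first element of each pair, in the order the pairs start.

    A value's occurrences are paired up in order: 1st with 2nd, 3rd with 4th, ...
    """
    open_values = set()   # values whose pair has been opened but not yet closed
    result = []
    for v in values:
        if v in open_values:
            open_values.remove(v)   # second occurrence: closes the pair
        else:
            open_values.add(v)      # first occurrence: opens a pair, keep it
            result.append(v)
    if open_values:
        raise ValueError("unpaired values: %r" % (open_values,))
    return result

print(first_of_pairs([10, 10, 11, 20, 15, 20, 15, 11, 10, 10]))  # [10, 11, 20, 15, 10]
print(first_of_pairs([10, 10, 11, 10, 11, 10]))                  # [10, 11, 10]
```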

How do i reference values from various ranges within a list?

What I want to do is reference several different ranges from within a list, i.e. I want the 4-6th elements, the 12 - 18th elements, etc. This was my initial attempt:
test = theList[4:7, 12:18]
which I expected to do the same thing as:
test = theList[4,5,6,12,13,14,15,16,17]
But I got an error. What is the best/easiest way to do this?
You can add the two lists.
>>> theList = list(range(20))
>>> theList[4:7] + theList[12:18]
[4, 5, 6, 12, 13, 14, 15, 16, 17]
You can also use the itertools module:
>>> from itertools import islice,chain
>>> theList=range(20)
>>> list(chain.from_iterable(islice(theList,*t) for t in [(4,7),(12,18)]))
[4, 5, 6, 12, 13, 14, 15, 16, 17]
Note that since islice returns a lazy iterator rather than a new list, this avoids building intermediate lists and so uses less memory than list slicing (although islice still has to step over the skipped elements).
You can also wrap this in a function to handle any number of ranges:
>>> def slicer(iterable,*args):
...     return chain.from_iterable(islice(iterable,*i) for i in args)
...
>>> list(slicer(range(40),(2,8),(10,16),(30,38)))
[2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 30, 31, 32, 33, 34, 35, 36, 37]
Note: if you only want to loop over the result, you don't need to convert it to a list.
You can add the two lists as @Bhargav_Rao stated. More generically, you can also use a list comprehension:
test = [theList[i] for i in range(len(theList)) if 4 <= i < 7 or 12 <= i < 18]
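A variant of the slicer function above that slices the sequence directly with built-in slice objects: unlike itertools.islice, this accepts negative indices, at the cost of building each slice as a new object. A minimal sketch; multi_slice is a hypothetical name and assumes seq supports slicing (lists, tuples, ranges).

```python
def multi_slice(seq, *bounds):
    """Concatenate several slices of seq into one list.

    Each item in bounds is a (start, stop) or (start, stop, step) tuple that is
    passed to the built-in slice(), so negative indices and None work as usual.
    """
    result = []
    for b in bounds:
        result.extend(seq[slice(*b)])
    return result

theList = list(range(20))
print(multi_slice(theList, (4, 7), (12, 18)))  # [4, 5, 6, 12, 13, 14, 15, 16, 17]
print(multi_slice(theList, (-3, None)))        # [17, 18, 19]
```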

Intersection between all elements of same list where elements are set [duplicate]

This question already has answers here:
Best way to find the intersection of multiple sets?
(7 answers)
Closed 8 years ago.
I have a list:
list_of_sets = [{0, 1, 2}, {0}]
I want to calculate the intersection between the elements of the list. I have thought about this solution:
a = list_of_sets[0]
b = list_of_sets[1]
c = set.intersection(a,b)
This works because I know the number of elements in the list (so I can declare as many variables as I need, like a, b, etc.).
My problem is that I can't figure out a solution for the other case, where the number of elements in the list is unknown.
N.B.: I have already considered counting the elements of the list with a loop and then creating variables according to the result. Since my code has to live in a function (whose argument is list_of_sets), I need a more general solution that works for a list of any length.
Edit 1:
I need the intersection of all the elements of the list (not pairwise, or for 3 or 4 elements).
If you wanted the intersection between all elements of all_sets:
intersection = set.intersection(*all_sets)
Here all_sets is a list of sets, and set is the built-in set type.
For pairwise calculations, the following computes the intersections of all unordered pairs of sets in all_sets. If you need triples instead, pass 3 as the second argument to combinations.
from itertools import combinations, starmap
all_intersections = starmap(set.intersection, combinations(all_sets, 2))
If you did need the sets a, b for calculations, then:
for a, b in combinations(all_sets, 2):
    # do whatever with a, b
    pass
You want the intersection of all the sets. Then:
list_of_sets[0].intersection(*list_of_sets[1:])
Should work.
Take the first set from the list and then intersect it with the rest (unpack the list with the *).
You can use reduce for this. If you're using Python 3 you will have to import it from functools. Here's a short demo:
#!/usr/bin/env python3
from functools import reduce
n = 30
m = 5
# Find sets of numbers i: 1 <= i <= n that are not divisible by j, for each j: 2 <= j <= m
list_of_sets = [set(i for i in range(1, n+1) if i % j) for j in range(2, m+1)]
print('Sets in list_of_sets:')
for s in list_of_sets:
    print(sorted(s))
# Get intersection of all the sets
print('Numbers less than or equal to %d that are coprime to it:' % n)
print(sorted(reduce(set.intersection, list_of_sets)))
Output:
Sets in list_of_sets:
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29]
[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26, 28, 29]
[1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19, 21, 22, 23, 25, 26, 27, 29, 30]
[1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29]
Numbers less than or equal to 30 that are coprime to it:
[1, 7, 11, 13, 17, 19, 23, 29]
Actually, we don't even need reduce() for this; we can simply do:
set.intersection(*list_of_sets)
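Since the question asks for a function whose argument is list_of_sets, here is a minimal sketch that wraps set.intersection(*list_of_sets). Calling set.intersection() with no arguments raises TypeError, so the empty-list case needs a decision; returning an empty set is an assumption made here. intersect_all is a hypothetical name, and the elements are assumed to be set objects.

```python
def intersect_all(list_of_sets):
    """Intersection of every set in list_of_sets, for a list of any length.

    An empty list returns an empty set (an assumption: set.intersection()
    called with no arguments raises TypeError instead).
    """
    if not list_of_sets:
        return set()
    # Unpacking passes the first set as `self` and the rest as arguments
    return set.intersection(*list_of_sets)

print(intersect_all([{0, 1, 2}, {0}]))  # {0}
print(intersect_all([]))                # set()
```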

Find common numbers in Python

I have 2 lists:
a = [1,9] # signifies the start point and end point, i.e. the numbers 1,2,3,4,5,6,7,8,9
b = [4,23] # same for this.
Now I need to find whether the numbers from a intersect with the numbers from b.
I can do it by making a list of numbers from a and b and then intersecting the 2 lists, but I'm looking for a more Pythonic solution.
Is there a better solution?
My output should be 4,5,6,7,8,9
Using the intersection of two sets:
>>> c = sorted(set(range(a[0], a[1]+1)) & set(range(b[0], b[1]+1)))
>>> print(c)
[4, 5, 6, 7, 8, 9]
Using min and max:
>>> c = list(range(max(a[0], b[0]), min(a[1], b[1]) + 1))
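Regarding the no-overlap case: the min/max version already returns an empty range when the intervals don't intersect, because range() with stop <= start is empty, so the result can be tested directly for truthiness. A minimal sketch in Python 3 (where range is lazy); common_numbers is a hypothetical name, and a and b are assumed to be inclusive [start, end] pairs as in the question.

```python
def common_numbers(a, b):
    """Integers in both inclusive intervals a = [start, end] and b = [start, end]."""
    # range() excludes its stop value, hence the +1; if the intervals don't
    # overlap, stop <= start and the range is simply empty.
    return range(max(a[0], b[0]), min(a[1], b[1]) + 1)

print(list(common_numbers([1, 9], [4, 23])))  # [4, 5, 6, 7, 8, 9]
print(list(common_numbers([1, 3], [5, 8])))   # []
print(bool(common_numbers([1, 3], [5, 8])))   # False
```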
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
The most efficient way is using sets:
result = set(a).intersection(b)
Of course, you can also use a generator expression (a Pythonic way of applying your logic):
result = (x for x in a if x in b)
You need to get [] or None or something similar if the ranges do not intersect. Something like this would be most efficient:
def intersect(l1, l2):
    bg = max(l1[0], l2[0])
    end = min(l1[1], l2[1])
    return [bg, end] if bg <= end else []

How to choose days off evenly in a month using python?

I will assign 8 days off to a crew randomly in a calendar month.
I would like to randomly choose 8 days, and the distribution of the days off should be as even as possible. For example, the 8 days off shouldn't all be bunched into the first 8 days of the month.
For example: [1, 5, 8, 14, 18, 24, 27, 30] is a good distribution.
[1,2,3,4,26,27,28,29] is not a good distribution.
Actually, a crew can't work 7 consecutive days. In every 7 days, there must be 1 day-off.
All days are treated equally, i.e. Sundays are not automatically days off. The crew may work on weekends as well.
I want to choose the days off one by one, not all 8 at once.
Could you recommend an algorithm using python to achieve this?
Not all days in the month may be available to be days off.
Best Regards
Use random.sample() to get a random set from a sequence. List the days that are available, then pass that to the .sample() function:
import random
daysoff = [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]
picked = random.sample(daysoff, 8)
In the example above I used days of the month; the list omits certain days (say, Sundays and the last 10 days of the month), and then we pick 8 random days from that population.
This is the key here:
Actually, a crew can't work 7 consecutive days. In every 7 days, there must be 1 day-off.
Reword the problem to say a random 2 days in every 7 days (or divide the month into four lengths of time as appropriate). You are then guaranteed an even-ish distribution. Use random.sample() as Martijn Pieters suggests.
You can generate two values using this technique from the first week, then yield them in sequence if you want them one by one.
edit:
As observed by tcaswell, there are still some cases where you end up with ten days in a row on duty. To combat this, you can assign a day off every three days, create a list of ten, and remove two days at random from the subset of days whose removal doesn't violate the 7-consecutive-day criterion.
Alternatively, you could just keep generating lists using the original algorithm until it fits the criteria, since you're very likely to get a valid solution anyway. You'd have to write a validation function of some kind, but it would be very easy to do since you're just counting the longest continuous string of days on.
CODE:
An implementation of the second option.
import random
from itertools import chain, count
def candidate(m):
    ''' Returns 2 days per week, in m days, where m is the length of the month. '''
    weeks = weeksmaker(m)
    return sorted(chain.from_iterable(random.sample(week, 2) for week in weeks))
def weeksmaker(m):
    ''' Divides a month up into four weeks, randomly assigning extra days to weeks. '''
    weeks = [list(range(i, i + 7)) for i in range(1, 29, 7)]
    for i in range(m - 28):
        random.choice(weeks).append(i)  # placeholder value; days are renumbered below
    c = count(1)
    # Renumber the days so they run 1..m across the (now uneven) weeks
    return [[next(c) for day in week] for week in weeks]
def valid(days, c):
    ''' Validity check. Can't work c or more consecutive days between two days off. '''
    for i in range(1, len(days)):
        if days[i] - days[i-1] > c:
            return False
    return True
def daysoff(m, n, c):
    ''' In month length m, need n days off, can't work c or more consecutive days. '''
    # candidate() always picks 2 days per week (8 in total), so n is not used
    while True:
        days = candidate(m)
        if valid(days, c):
            return days
>>> for i in range(28, 32):
...     daysoff(i, 8, 7)
...
[6, 7, 10, 14, 18, 20, 27, 28]
[4, 7, 10, 13, 19, 21, 23, 24]
[2, 4, 9, 13, 15, 20, 25, 27]
[1, 3, 9, 12, 18, 19, 24, 28]
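The validation step described above (counting the longest run of days on) can also account for the start and end of the month, which valid() does not check. A minimal sketch; longest_run_on and is_valid are hypothetical names, and treating the days before the first and after the last day off as working days (with no carry-over from adjacent months) is an assumption.

```python
def longest_run_on(days_off, month_length):
    """Length of the longest stretch of consecutive working days in a month.

    days_off holds day numbers in 1..month_length, in any order. Days before the
    first day off and after the last one count as working days (assumption: no
    carry-over from the previous or next month).
    """
    # Sentinels 0 and month_length + 1 act as virtual days off at both ends
    bounds = [0] + sorted(days_off) + [month_length + 1]
    # A gap of g between two days off leaves g - 1 working days
    return max(b - a - 1 for a, b in zip(bounds, bounds[1:]))

def is_valid(days_off, month_length, max_run=6):
    # max_run=6 encodes "a crew can't work 7 consecutive days"
    return longest_run_on(days_off, month_length) <= max_run

print(longest_run_on([1, 5, 8, 14, 18, 24, 27, 30], 30))  # 5
print(is_valid([1, 2, 3, 4, 26, 27, 28, 29], 30))         # False
```

Plugged into the retry loop, is_valid would replace valid() as the acceptance test.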
You should just split the total number of days.
This code works regardless of the number of days off needed and of the total number of days.
from random import randint
def foo(l, n):
dist = round(len(l)/n)
return [randint(l[i*dist], l[(i+1)*dist-1]) for i in range(n)]
In [1]: days = [i for i in range(1,31)]
In [2]: foo(days, 8)
Out[2]: [1, 4, 6, 9, 13, 16, 20, 27]
In [3]: mylist = [i for i in range(500)]
In [4]: foo(mylist, 5)
Out[4]: [80, 147, 250, 346, 448]
Some problems can occur with the rounding, though: the list index may go out of range.
This (I think) does what @Martijn did, with the extra benefit of limiting how many days off are consecutive (e.g. if you don't want 8 days off in a row):
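One way to avoid the rounding problem is to compute the chunk boundaries with integer division (i * len(days) // n), which always stays within the list and covers every day. Picking with random.choice from each chunk also works when the available days are not consecutive. A minimal sketch; spread_pick and its rng parameter are hypothetical.

```python
import random

def spread_pick(days, n, rng=random):
    """Pick one random day from each of n nearly equal consecutive chunks of days.

    Chunk boundaries use integer arithmetic (i * len(days) // n), so they never
    run past the end of the list. Assumes 1 <= n <= len(days).
    """
    if not 1 <= n <= len(days):
        raise ValueError("need 1 <= n <= len(days)")
    picks = []
    for i in range(n):
        lo = i * len(days) // n
        hi = (i + 1) * len(days) // n   # exclusive upper bound, never > len(days)
        picks.append(rng.choice(days[lo:hi]))
    return picks

print(spread_pick(list(range(1, 31)), 8))
```

Because the chunks are disjoint and in order, the picks come out already sorted.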
# Day selector
import random
Ndays = 8
daysoff = range(1, 25)
concurrent_tol = 3
while True:
    cntr = 0
    sample = random.sample(daysoff, Ndays)
    sample.sort()
    # Score neighbouring days off; interior pairs of consecutive days are counted twice
    for i in range(1, Ndays - 1):
        if abs(sample[i] - sample[i-1]) == 1:
            cntr += 1
        if abs(sample[i] - sample[i+1]) == 1:
            cntr += 1
    if cntr < concurrent_tol:
        print("Found a good set of off-days :")
        print(sample)
        break
    else:
        print("Didn't find a good set, trying again")
        print(sample)
Output example:
Didn't find a good set, trying again
[3, 4, 5, 6, 7, 8, 9, 11]
Didn't find a good set, trying again
[1, 5, 6, 7, 12, 14, 19, 20]
Didn't find a good set, trying again
[4, 5, 7, 9, 11, 15, 16, 20]
Didn't find a good set, trying again
[3, 4, 6, 7, 12, 13, 14, 23]
Didn't find a good set, trying again
[1, 7, 10, 12, 15, 16, 17, 22]
Didn't find a good set, trying again
[5, 7, 8, 11, 17, 18, 19, 23]
Didn't find a good set, trying again
[3, 8, 11, 12, 13, 15, 17, 21]
Didn't find a good set, trying again
[2, 5, 7, 8, 9, 12, 13, 21]
Found a good set of off-days :
[1, 2, 5, 12, 15, 17, 19, 20]
This also has the added benefit of looking ugly. Note that possible days are 1-24, as defined in daysoff.
Generate (and store) a list of all valid work schedules by brute force; there are only C(30, 8) = 5,852,925 ways to choose 8 days out of 30. You can then safely and quickly pick from that list later.
import itertools
import numpy as np
good_lst = []
for days_off in itertools.combinations(range(30), 8):
    # pad with the month boundaries and reject schedules containing a gap of 7 or more
    if np.max(np.diff((0,) + days_off + (30,))) < 7:
        good_lst.append(days_off)
(there may be some off-by-one bugs in there someplace)
This ran on a decent machine in ~5min. You will probably want to do more pruning as (0, 1, 2, 3, 6, 12, 18, 24) is a valid work schedule, but involves 4 sections of 6 work days.
