If you choose 6 numbers from the range 1-49, there are nearly 14 million combinations. Using my current script, I have only 7.2 million combinations remaining. From those 7.2 million, I want to eliminate every combination containing 3, 4, 5, or 6 consecutive numbers, as well as double and triple separate consecutive pairs.
Example:
3 consecutive: 1, 2, 3, x, x, x
4 consecutive: 3, 4, 5, 6, x, x
5 consecutive: 4, 5, 6, 7, 8, x
6 consecutive: 5, 6, 7, 8, 9, 10
double separate consecutive: 1, 2, 5, 6, 14, 18
triple separate consecutive: 1, 2, 9, 10, 22, 23
Note: combinations such as 1, 2, 12, 13, 14, 15 must also be eliminated, or else they conflict with the rule that double and triple separate consecutive combinations are to be eliminated.
I'm looking to find how many of the 7.2 million remaining combinations have zero consecutive numbers (all mixed) or only one consecutive pair.
Thank you!
import functools

_MIN_SUM = 120
_MAX_SUM = 180
_MIN_NUM = 1
_MAX_NUM = 49
_NUM_CHOICES = 6
_MIN_ODDS = 2
_MAX_ODDS = 4

@functools.lru_cache(maxsize=None)
def f(n, l, s=0, odds=0):
    # n: numbers left to pick, l: smallest value allowed next,
    # s: running sum, odds: odd numbers picked so far
    if s > _MAX_SUM or odds > _MAX_ODDS:
        return 0
    if n == 0:
        return int(s >= _MIN_SUM and odds >= _MIN_ODDS)
    return sum(f(n - 1, i + 1, s + i, odds + i % 2) for i in range(l, _MAX_NUM + 1))

result = f(_NUM_CHOICES, _MIN_NUM)
print('Number of choices = {}'.format(result))
While my answer should work, I think someone might be able to offer a faster solution.
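As a side note, the same recursive counting idea can be aimed directly at the consecutive-number rule in the question. The sketch below is my own adaptation, not part of the script above: it drops the sum/odd-count constraints, and the limit of one consecutive pair is an assumption taken from the question.

from functools import lru_cache

_HI = 49        # largest selectable number
_MAX_PAIRS = 1  # keep combinations with at most one consecutive pair

@lru_cache(maxsize=None)
def count(remaining, start, pairs, run_continues):
    # run_continues is True when picking `start` would extend the previous run
    if pairs > _MAX_PAIRS:
        return 0
    if remaining == 0:
        return 1
    total = 0
    for v in range(start, _HI - remaining + 2):
        adjacent = run_continues and v == start
        total += count(remaining - 1, v + 1, pairs + adjacent, True)
    return total

print(count(6, 1, 0, False))  # 12,489,092, matching the numpy answer further down

Because each recursive call starts at the value right after the previous pick, a new value extends a run exactly when it equals start, which keeps the memoized state small.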
Consider the following code:
not_allowed = []
for x in range(1, 48):
    not_allowed.append({x, x + 1, x + 2})
# not_allowed = [ {1,2,3}, {2,3,4}, ... {12,13,14}, ... {47,48,49} ]

my_numbers = [[1, 2, 5, 9, 11, 33], [1, 3, 7, 8, 9, 31], [12, 13, 14, 15, 23, 43]]
kept = []
for x in my_numbers:
    if not any(y <= set(x) for y in not_allowed):  # e.g. is {1,2,3} a subset of {1,2,5,9,11,33}?
        kept.append(x)  # keep x only if no consecutive triple is a subset
This code drops every combination that contains a run of three consecutive numbers; quadruple, quintuple, and sextuple runs are removed along the way, because each of them contains a consecutive triple. Try implementing this and let me know how it works.
The easiest approach is probably to generate and filter. I used numpy to try to vectorize as much of this as I could:
import numpy as np
from itertools import combinations
combos = np.array(list(combinations(range(1, 50), 6))) # build all combos
# combos is shape (13983816, 6)
filt = np.where(np.bincount(np.where(np.abs(
np.subtract(combos[:, :-1], combos[:, 1:])) == 1)[0]) <= 1)[0] # magic!
filtered = combos[filt]
# filtered is shape (12489092, 6)
Breaking down that "magic" line
First we subtract the first five items in the list from the last five items to get the differences between them. We do this for the entire set of combinations in one shot with np.subtract(combos[:, :-1], combos[:, 1:]). Note that itertools.combinations produces sorted combinations, on which this depends.
Next we take the absolute value of these differences to make sure we only look at positive distances between numbers with np.abs(...).
Next we grab the indices from this operation for the entire dataset that indicate a difference of 1 (consecutive numbers) with np.where(... == 1)[0]. Note that np.where returns a tuple where the first item is an array of all the row indices, and the second item is an array of the corresponding column indices for our condition. This is important because any row value that shows up more than once tells us that we have more than one consecutive number in that row!
So we count how many times each row shows up in our results with np.bincount(...), which will return something like [5, 4, 4, 4, 3, 2, 1, 0] indicating how many consecutive pairs are in each row of our combinations dataset.
Finally we grab only the row numbers where there are 0 or 1 consecutive values with np.where(... <= 1)[0].
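To make those steps concrete, here is the same pipeline run on three hand-picked combinations. The demo rows and the minlength argument are my additions; the full script above can get away without minlength because its last combination (44 through 49) always registers consecutive pairs, so the bincount already covers every row.

import numpy as np

demo = np.array([[1, 2, 3, 10, 20, 30],    # two consecutive pairs (1-2 and 2-3)
                 [1, 2, 10, 20, 30, 40],   # exactly one consecutive pair (1-2)
                 [1, 5, 10, 20, 30, 40]])  # no consecutive pairs

diffs = np.abs(np.subtract(demo[:, :-1], demo[:, 1:]))
rows = np.where(diffs == 1)[0]                    # array([0, 0, 1])
counts = np.bincount(rows, minlength=len(demo))   # array([2, 1, 0])
keep = np.where(counts <= 1)[0]                   # array([1, 2])
print(demo[keep])  # only rows with at most one consecutive pair survive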
I am returning way more combinations than you seem to indicate, but I feel fairly confident that this is working. By all means, poke holes in it in the comments and I will see if I can find fixes!
Bonus, because it's all vectorized, it's super fast!
I have been struggling with this quite a bit, and was wondering if there is a solution for this.
I would like to use the range(start, stop, step) function in Python, but in a different order than its normal functionality allows. From what I understand, the range function can do 3 things:
0, 1, 2, 3 etc
10, 9, 8, 7 etc
2, 4, 6, 8 etc
Now I am looking for the following order:
0, 10, 1, 9, 2, 8, 3, 7 etc
In this case 10 is the len(df), so the last row of the df.
You can create a generator to do that:
def my_range(n):
    """Yields numbers from 0 to n, in order 0, n, 1, n-1, ..."""
    low = 0
    high = n
    while low <= high:
        yield low
        if high != low:
            yield high
        low += 1
        high -= 1
Some examples:
print(list(my_range(10)))
# [0, 10, 1, 9, 2, 8, 3, 7, 4, 6, 5]
for i in my_range(5):
    print(i)
0
5
1
4
2
3
This will produce every number in the interval exactly once, and lazily, just like range.
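Because it is a generator, it can also be consumed lazily; for instance, pulling only the first few values of an enormous range (a small illustration of my own, using itertools.islice):

from itertools import islice

print(list(islice(my_range(10**9), 5)))
# [0, 1000000000, 1, 999999999, 2]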
To answer the question in your comment:
if you want to mix the numbers in your_list = [1,3,4,5,7,9,10,12,13,14], you can just use this function to generate the indices:
your_list = [1,3,4,5,7,9,10,12,13,14]
for index in my_range(len(your_list)-1):
    print(your_list[index], end=' ')
# 1 14 3 13 4 12 5 10 7 9
or you could build a new list in the mixed order:
new = [your_list[index] for index in my_range(len(your_list)-1)]
print(new)
# [1, 14, 3, 13, 4, 12, 5, 10, 7, 9]
range just has start, stop, step params. But you can achieve what you want by zipping two ranges together along with the chain.from_iterable function:
from itertools import chain
for val in chain.from_iterable(zip(range(11), range(10, -1, -1))):
    print(val)
# 0 10 1 9 2 8 3 7 4 6 5 5 6 4 7 3 8 2 9 1 10 0
Note this solution repeats values; if you want no repeated values, then a generator is the way to go.
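If you prefer to stay with the zip approach, one way (my own addition, not part of the original answer) to drop the repeats while keeping the first occurrence of each value is dict.fromkeys, which preserves insertion order:

from itertools import chain

mixed = chain.from_iterable(zip(range(11), range(10, -1, -1)))
unique = list(dict.fromkeys(mixed))
print(unique)
# [0, 10, 1, 9, 2, 8, 3, 7, 4, 6, 5]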
To keep things simple and clear: the range function cannot do this on its own, since it can only increment or decrement in one direction, but the same output can be produced with a loop and a little arithmetic. This is how to do it:
for i in range(0, 11):
    print(i, 10 - i, end=" ")
# prints 0 10 1 9 2 8 3 7 4 6 5 5 6 4 7 3 8 2 9 1 10 0
This code will do it, although, like the zip solution above, it repeats values.
I have an array of 9 elements.
I sample 4 elements randomly and repeat each one 3 times.
But I also want to repeat twice (in another array) the numbers that were not sampled.
For example:
yeses = [0,0,0,4,4,4,1,1,1,8,8,8]
I need:
noes = [1,1,2,2,3,3,5,5,6,6,7,7,9,9]
How can I do that?
import random
import numpy as np

allStims = [0, 1, 2, 3, 4, 5, 6, 7, 8]
## Pick 4 numbers randomly and repeat each 3 times
yeses = np.repeat(random.sample(allStims, 4), 3)
print(yeses)
You can use a list comprehension to get all the values in the original list that aren't in yeses.
nos = np.repeat([x for x in allStims if x not in yeses], 2)
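Putting the question's setup and this answer together (the seed is my addition, purely so the demo is reproducible):

import random
import numpy as np

random.seed(0)  # assumed seed, only to make the output repeatable
allStims = [0, 1, 2, 3, 4, 5, 6, 7, 8]
yeses = np.repeat(random.sample(allStims, 4), 3)
nos = np.repeat([x for x in allStims if x not in yeses], 2)
print(yeses)  # the 4 sampled stimuli, each repeated 3 times
print(nos)    # the 5 unsampled stimuli, each repeated twice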
I'm trying to sum a portion of the sessions in my dictionary so I can get totals for the current and previous week.
I've converted the JSON into a pandas dataframe in one test. I'm summing the total of the sessions using the .sum() function in pandas. However, I also need to know the total sessions from this week and the week prior. I've tried a few methods to sum values (-1:-7) and (-8:-15), but I'm pretty sure I need to use .iloc.
IN:
response = requests.get("url")
data = response.json()
df=pd.DataFrame(data['DailyUsage'])
total_sessions = df['Sessions'].sum()
current_week= df['Sessions'].iloc[-1:-7]
print(current_week)
total_sessions =['current_week'].sum
OUT:
Series([], Name: Sessions, dtype: int64)
AttributeError 'list' object has no attribute 'sum'
Note: I've tried this with and without pd.to_numeric and also with variations on the syntax of the slice and sum methods. Pandas doesn't feel very Pythonic and I'm out of ideas as to what to try next.
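For reference, the empty Series in the output comes from the slice direction rather than from the data: iloc[-1:-7] walks forward from the last row with the default step of +1, so it selects nothing. A minimal sketch with a stand-in Series (assumed data, just to show the behaviour):

import pandas as pd

s = pd.Series(range(1, 15), name='Sessions')  # 14 days of stand-in data
print(s.iloc[-1:-7])      # empty: start -1 is already past stop -7 with step +1
print(s.iloc[-7:])        # the last seven values, oldest first
print(s.iloc[:-8:-1])     # the last seven values, newest first
print(s.iloc[-7:].sum())  # total for the most recent seven days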
Assuming that df['Sessions'] holds one value per day, and you are comparing the current and previous week only, you can use reshape to create a weekly sum for the last 14 values.
weekly_matrix = df['Sessions'][:-15:-1].values.reshape((2, 7))
Then, you can sum each row and get the weekly sum, most recent will be the first element.
import numpy as np
weekly_sum = np.sum(weekly_matrix, axis=1)
current_week = weekly_sum[0]
previous_week = weekly_sum[1]
EDIT: how the code works
Let's take the 1D array accessed through the values attribute of the pandas Series. It contains the last 14 days, ordered from most recent to oldest. I will call it x.
x = array([14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
The array's reshape function is then called on x to split this data into a 2D-array (matrix) with 2 rows and 7 columns.
The default behavior of the reshape function is to first fill all columns in a row before moving to the next row. Therefore, x[0] will be the element (1,1) in the reshaped array, x[1] will be the element (1,2), and so on. After the element (1,7) is filled with x[6] (ending the current week), the next element x[7] will then be placed in (2,1). This continues until finishing the reshape operation, with the placement of x[13] in (2,7).
This results in placing the first 7 elements of x (current week) in the first row, and the last 7 elements of x (previous week) in the second row. This was called weekly_matrix.
weekly_matrix = x.reshape((2, 7))
# weekly_matrix = array([[14, 13, 12, 11, 10, 9, 8],
# [ 7, 6, 5, 4, 3, 2, 1]])
Since we now have the values of each week organized in a matrix, we can use the numpy.sum function to finish our operation. numpy.sum takes an axis argument, which controls how the sum is computed:
if axis=None, all elements are added into a grand total.
if axis=0, the rows are added within each column. In the case of weekly_matrix, this results in a 7-element 1D array ([21, 19, 17, 15, 13, 11, 9]), which is not the result we want, as it adds the equivalent days of each week.
if axis=1 (as in the solution), the columns are added within each row, producing a 2-element 1D array in the case of weekly_matrix. The order of this result array follows the order of the rows in the matrix (i.e., element 0 is the total of the first row, and element 1 is the total of the second row). Since we know that the first row is the current week and the second row is the previous week, we can extract the information using those indexes:
# weekly_sum = array([77, 28])
current_week = weekly_sum[0] # sum of [14, 13, 12, 11, 10, 9, 8] = 77
previous_week = weekly_sum[1] # sum of [ 7, 6, 5, 4, 3, 2, 1] = 28
To group and sum by a fixed number of values, for instance with daily data and weekly aggregation, consider groupby. You can do this forwards or backwards by slicing your series as appropriate:
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'col': np.random.randint(0, 10, 21)})
print(df['col'].values)
# array([5, 0, 3, 3, 7, 9, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1, 5])
# forwards groupby
res = df['col'].groupby(df.index // 7).sum()
# 0 30
# 1 40
# 2 35
# Name: col, dtype: int32
# backwards groupby
df['col'].iloc[::-1].reset_index(drop=True).groupby(df.index // 7).sum()
# 0 35
# 1 40
# 2 30
# Name: col, dtype: int32
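The grouping key here is just integer division of the default RangeIndex, which maps rows 0-6 to group 0, rows 7-13 to group 1, and rows 14-20 to group 2 (quick check, assuming the 21-row frame above):

print((df.index // 7).tolist())
# [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2]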
I have the following list:
Sum=[54,1536,36,14,9,360]
I need to generate 4 other lists, where each list consists of 6 random numbers starting from 0, and the numbers add up to the corresponding value in Sum. For example:
l1=[a,b,c,d,e,f] where a+b+c+d+e+f=54
l2=[g,h,i,j,k,l] where g+h+i+j+k+l=1536
and so on up to l6. And I need to do this in Python. Can it be done?
Generating a list of random numbers that sum to a certain integer is a very difficult task. Keeping track of the remaining quantity and generating items sequentially with the remaining available quantity results in a non-uniform distribution, where the first numbers in the series are generally much larger than the others. On top of that, the last one will always be different from zero because the previous items in the list will never sum up to the desired total (random generators usually use open intervals in the maximum). Shuffling the list after generation might help a bit but won't generally give good results either.
A solution could be to generate random numbers and then normalize the result, rounding it afterwards if you need integers.
import numpy as np

totals = np.array([54, 1536, 36, 14])  # don't use Sum as a name: sum is a builtin and it's confusing
a = np.random.random((6, 4))           # create random numbers
a = a / np.sum(a, axis=0) * totals     # force the columns to sum to totals

# Ignore the following if you don't need integers
a = np.round(a)                          # transform them into integers
remainings = totals - np.sum(a, axis=0)  # check if there are corrections to be done
for j, r in enumerate(remainings):       # implement the correction
    step = 1 if r > 0 else -1
    while r != 0:
        i = np.random.randint(6)
        if a[i, j] + step >= 0:
            a[i, j] += step
            r -= step
Each column of a represents one of the lists you want.
Hope this helps.
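A quick sanity check on the result (assuming the code above has just run): once the correction loop finishes, the column sums land exactly on the requested totals.

print(a.sum(axis=0))  # the four column sums: 54, 1536, 36 and 14, matching totals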
This might not be the most efficient way, but it will work:
import numpy as np

totals = [54, 1536, 36, 14]
nums = []
for i in totals:
    x = np.random.randint(0, i, size=(6,))   # keep drawing until the six numbers hit the target
    while sum(x) != i:
        x = np.random.randint(0, i, size=(6,))
    nums.append(x)
print(nums)
[array([ 3, 19, 21, 11,  0,  0]), array([111, 155, 224, 511, 457,  78]),
 array([ 8,  5,  4, 12,  2,  5]), array([3, 1, 3, 2, 1, 4])]
This is a much more efficient way to do it:
import numpy as np

totals = [54, 1536, 36, 14, 9, 360, 0]
nums = []
for i in totals:
    if i == 0:
        nums.append([0 for i in range(6)])
        continue
    total = i
    temp = []
    for i in range(5):
        val = np.random.randint(0, total)   # each draw is strictly less than what is left
        temp.append(val)
        total -= val
    temp.append(total)                      # the remainder makes the sum exact
    nums.append(temp)
print(nums)
[[22, 4, 16, 0, 2, 10], [775, 49, 255, 112, 185, 160], [2, 10, 18, 2, 0, 4],
 [10, 2, 1, 0, 0, 1], [8, 0, 0, 0, 0, 1], [330, 26, 1, 0, 2, 1], [0, 0, 0, 0, 0, 0]]
I have a numpy matrix M and I need to apply some operations to all rows of the matrix except for certain rows.
For example, suppose rows [3, 5] should be excluded from an operation like M[:, 8] = 4. So I want all rows of the 8th column set to 4, except rows 3 and 5. How can I do this in numpy?
Edit: basically, I need this to avoid a division by zero when normalizing each row by the sum of its elements. Some rows are all zeros, so their sum is zero, and dividing by that sum gives a division by zero. What I'm doing is finding out which rows are all zeros, and then skipping the normalization for those specific rows.
Perhaps something like this?
>>> import numpy as np
>>> M = np.arange(32).reshape(8, 4)
>>> ignore = {3, 5}
>>> rest = [i for i in range(M.shape[0]) if i not in ignore]
>>> M[rest, 3] = 4
>>> M
array([[ 0, 1, 2, 4],
[ 4, 5, 6, 4],
[ 8, 9, 10, 4],
[12, 13, 14, 15],
[16, 17, 18, 4],
[20, 21, 22, 23],
[24, 25, 26, 4],
[28, 29, 30, 4]])
Based on your edit, in order to solve your specific problem, where you seem to be manipulating a matrix with non-negative entries, you may exploit the following trick:
import numpy as np
rng = np.random.RandomState(42)
M = rng.randn(10, 10) ** 2
M[[0, 5]] = 0. # set 2 lines to 0
M_norm = M / (M.sum(axis=1) + 1e-18)[:, np.newaxis]
Obviously this result is not exact, but exact enough to not notice the difference. To make it slightly better, you can also write
M_norm = M / np.maximum(M.sum(axis=1), 1e-18)[:, np.newaxis]
If this still isn't sufficient and you want it exact for the general case (negative entries allowed), you can write
row_sums = M.sum(axis=1)
row_sums[row_sums == 0] = 1.
M_norm = M / row_sums[:, np.newaxis] # dividing the zeros by 1 still yields 0
To add some robustness, you could also do
tolerance = 1e-6
row_sums = M.sum(axis=1)
OK_rows = np.abs(row_sums) > tolerance
M_norm = np.zeros_like(M)
M_norm[OK_rows] = M[OK_rows] / row_sums[OK_rows][:, np.newaxis]
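A quick check of this last variant (assuming the arrays defined above): the all-zero rows stay at zero and every other row now sums to one.

print(M_norm[[0, 5]].sum(axis=1))                                      # [0. 0.]
print(np.allclose(np.delete(M_norm, [0, 5], axis=0).sum(axis=1), 1))   # True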