How do I output a list which counts and displays the number of times different values fit into a range?
Based on the below example, the output would be x = [0, 3, 2, 1, 0] as there are 3 Pro scores (11, 24, 44), 2 Champion scores (101, 888), and 1 King score (1234).
- P1 = 11
- P2 = 24
- P3 = 44
- P4 = 101
- P5 = 1234
- P6 = 888
totalsales = [11, 24, 44, 101, 1234, 888]
Here is ranking corresponding to the sales :
Sales___________________Ranking
0-10____________________Noob
11-100__________________Pro
101-1000________________Champion
1001-10000______________King
100001 - 200000__________Lord
This is one way, assuming your values are integers and ranges do not overlap.
from collections import Counter
# Ranges go to end + 1
score_ranges = [
range(0, 11), # Noob
range(11, 101), # Pro
range(101, 1001), # Champion
range(1001, 10001), # King
range(10001, 200001) # Lord
]
total_sales = [11, 24, 44, 101, 1234, 888]
# This counter counts how many values fall into each score range (by index).
# It works by taking the index of the first range containing each value (or -1 if none found).
c = Counter(next((i for i, r in enumerate(score_ranges) if s in r), -1) for s in total_sales)
# This converts the above counter into a list, taking the count for each index.
result = [c[i] for i in range(len(score_ranges))]
print(result)
# [0, 3, 2, 1, 0]
As a general rule homework should not be posted on stackoverflow. As such, just a pointer on how to solve this, implementation is up to you.
Iterate over the totalsales list and check if each number is in range(start,stop). Then for each matching check increment one per category in your result list (however using a dict to store the result might be more apt).
Here a possible solution with no use of modules such as numpy or collections:
totalsales = [11, 24, 44, 101, 1234, 888]
bins = [10, 100, 1000, 10000, 20000]
output = [0]*len(bins)
for s in totalsales:
slot = next(i for i, x in enumerate(bins) if s <= x)
output[slot] += 1
output
>>> [0, 3, 2, 1, 0]
If your sales-to-ranking mapping always follows a logarithmic curve, the desired output can be calculated in linear time using math.log10 with collections.Counter. Use an offset of 0.5 and the abs function to handle sales of 0 and 1:
from collections import Counter
from math import log10
counts = Counter(int(abs(log10(abs(s - .5)))) for s in totalsales)
[counts.get(i, 0) for i in range(5)]
This returns:
[0, 3, 2, 1, 0]
Here, I have used the power of dataframe to store the values, then using bin and cut to group the values into the right categories. The extracting the value count into list.
Let me know if it is okay.
import pandas as pd
import numpy
df = pd.DataFrame([11, 24, 44, 101, 1234, 888], columns=['P'])# Create dataframe
bins = [0, 10, 100, 1000, 10000, 200000]
labels = ['Noob','Pro', 'Champion', 'King', 'Lord']
df['range'] = pd.cut(df.P, bins, labels = labels)
df
outputs:
P range
0 11 Pro
1 24 Pro
2 44 Pro
3 101 Champion
4 1234 King
5 888 Champion
Finally, to get the value count. Use:
my = df['range'].value_counts().sort_index()#this counts to the number of occurences
output=map(int,my.tolist())#We want the output to be integers
output
The result below:
[0, 3, 2, 1, 0]
You can use collections.Counter and a dict:
from collections import Counter
totalsales = [11, 24, 44, 101, 1234, 888]
ranking = {
0: 'noob',
10: 'pro',
100: 'champion',
1000: 'king',
10000: 'lord'
}
c = Counter()
for sale in totalsales:
for k in sorted(ranking.keys(), reverse=True):
if sale > k:
c[ranking[k]] += 1
break
Or as a two-liner (credits to #jdehesa for the idea):
thresholds = sorted(ranking.keys(), reverse=True)
c = Counter(next((ranking[t] for t in thresholds if s > t)) for s in totalsales)
Related
So here is my problem: I have an array like this:
arr = array([0, 0, 1, 8, 10, 20, 26, 32, 37, 52, 0, 0, 46, 42, 30, 19, 8, 2, 0, 0, 0])
In this array I want to find n consecutive values, greater than zero with the biggest sum. In this example with n = 5 this would be array([20, 26, 32, 37, 52]) and the index would be 5.
What I tried is of course a loop:
n = 5
max_sum = 0
max_loc = 0
for i in range(arr.size - n):
if all(arr[i:i + n] > 0) and arr[i:i + n].sum() > max_sum:
max_sum = arr[i:i + n].sum()
max_loc = i
print(max_loc)
This is fine for not too many short arrays but of course I need to use this on many not so short arrays.
I was experimenting with numpy so I would only have to iterate non-zero value groups:
diffs = np.concatenate((np.array([False]), np.diff(arr > 0)))
groups = np.split(arr, np.where(diffs)[0])
for group in groups:
if group.sum() > 0 and group.size >= n:
...
but I believe this is nice but not the right direction. I am looking for a simpler and faster numpy / pandas solution that really uses the powers of these packages.
Using cross-correlation, numpy.correlate, is a possible, concise and fast solution:
n=5
arr[arr<0] = np.iinfo(arr.dtype).min # The greatest negative integer possible
#Thanks for the np.iinfo suggestion, #Corralien
idx = np.argmax(np.correlate(arr, np.ones(n), 'valid'))
idx, arr[idx:(idx+5)]
Another possible solution:
n, l = 5, arr.size
arr[arr<0] = np.iinfo(arr.dtype).min # The greatest negative integer possible
#Thanks for the np.iinfo suggestion, #Corralien
idx = np.argmax([np.sum(np.roll(arr,-x)[:n]) for x in range(l-n+1)])
idx, arr[idx:(idx+n)]
Output:
(5, array([20, 26, 32, 37, 52]))
You can use sliding_window_view:
from numpy.lib.stride_tricks import sliding_window_view
N = 5
win = sliding_window_view(arr, N)
idx = ((win.sum(axis=1)) * ((win>0).all(axis=1))).argmax()
print(idx, arr[idx:idx+N])
# Output
5 [20 26 32 37 52]
Answer greatly enhanced by chrslg to save memory and keep a win as a view.
Update
A nice bonus is this should work with Pandas Series just fine.
N = 5
idx = pd.Series(arr).where(lambda x: x > 0).rolling(N).sum().shift(-N+1).idxmax()
print(idx, arr[idx:idx+N])
# Output
5 [20 26 32 37 52]
I have a numpy array with many rows in it that look roughly as follows:
0, 50, 50, 2, 50, 1, 50, 99, 50, 50
50, 2, 1, 50, 50, 50, 98, 50, 50, 50
0, 50, 50, 98, 50, 1, 50, 50, 50, 50
0, 50, 50, 50, 50, 99, 50, 50, 2, 50
2, 50, 50, 0, 98, 1, 50, 50, 50, 50
I am given a variable n<50. Each row, of length 10, has the following in it:
Every number from 0 to n, with one possibly missing. In the example above, n=2.
Possibly a 98, which will be in the place of the missing number, if there is a number missing.
Possibly a 99, which will be in the place of the missing number, if there is a number missing, and there is not already a 98.
Many 50's.
What I want to get is an array with all the indices of the 0s in the first row, all the indices of the 1s in the second row, all the indices of the 2s in the third row, etc. For the above example, my desired output is this:
0, 6, 0, 0, 3
5, 2, 5, 5, 5
3, 1, 3, 8, 0
You may have noticed the catch: sometimes, exactly one of the numbers is replaced either by a 98, or a 99. It's pretty easy to write a for loop which determines which number, if any, was replaced, and uses that to get the array of indices.
Is there a way to do this with numpy?
The follwing numpy solution rather aggressively uses the assumptions listed in OP. If they are not 100% guaranteed some more checks may be in order.
The mildly clever bit (even if I say so myself) here is to use the data array itself for finding the right destinations of their indices. For example, all the 2's need their indices stored in row 2 of the output array. Using this we can bulk store most of the indices in a single operation.
Example input is in array data:
n = 2
y,x = data.shape
out = np.empty((y,n+1),int)
# find 98 falling back to 99 if necessary
# and fill output array with their indices
# if neither exists some nonsense will be written but that does no harm
# most of this will be overwritten later
out.T[...] = ((data-98)&127).argmin(axis=1)
# find n+1 lowest values in each row
idx = data.argpartition(n,axis=1)[:,:n+1]
# construct auxiliary indexer
yr = np.arange(y)[:,None]
# put indices of low values where they belong
out[yr,data[yr,idx[:,:-1]]] = idx[:,:-1]
# ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
# the clever bit
# rows with no missing number still need the last value
nomiss, = (data[range(y),idx[:,n]] == n).nonzero()
out[nomiss,n] = idx[nomiss,n]
# admire
print(out.T)
outputs:
[[0 6 0 0 3]
[5 2 5 5 5]
[3 1 3 8 0]]
I don't think you're getting away without a for-loop here. But here's how you could go about it.
For each number in n, find all of the locations where it is known. Example:
locations = np.argwhere(data == 1)
print(locations)
[[0 5]
[1 2]
[2 5]
[4 5]]
You can then turn this into a map for easy lookup per number in n:
known = {
i: dict(np.argwhere(data == i))
for i in range(n + 1)
}
pprint(known)
{0: {0: 0, 2: 0, 3: 0, 4: 3},
1: {0: 5, 1: 2, 2: 5, 4: 5},
2: {0: 3, 1: 1, 3: 8, 4: 0}}
Do the same for the unknown numbers:
unknown = dict(np.argwhere((data == 98) | (data == 99)))
pprint(unknown)
{0: 7, 1: 6, 2: 3, 3: 5, 4: 4}
And now for each location in the result, you can lookup the index in the known list and fallback to the unknown.
result = np.array(
[
[known[i].get(j, unknown.get(j)) for j in range(len(data))]
for i in range(n + 1)
]
)
print(result)
[[0 6 0 0 3]
[5 2 5 5 5]
[3 1 3 8 0]]
Bonus: Getting fancy with dictionary constructor and unpacking:
from collections import OrderedDict
unknown = np.argwhere((data == 98) | (data == 99))
results = np.array([
[*OrderedDict((*unknown, *np.argwhere(data == i))).values()]
for i in range(n + 1)
])
print(results)
trying to select subset from a list, however the order is reversed after selection
tried using pandas isin
df.mon =[1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12,...]
# selecting
results = df[df.month.isin([10,11,12,1,2,3])]
print(results.mon]
mon = [1,2,3,10,11,12, 1,2,3,10,11,12,...]
desired results
mon= [10,11,12,1,2,3,10,11,12,1,2,3,...]
# sorting results in this
mon = [1,1,2,2,3,3,10,10,11,11,12,12] and I dont want that either
thanks for the help
I work most with basic python lists, so I have converted the df to a list.
Data
The data is displayed in an xlsx file like this.
The input is a xlsx document which goes 1, 2, .. 12, 1, 2, .. 12 only twice, the "Values" start at 90 and count by 10 all the way out to the second 12.
Process
import pandas as pd
df = pd.read_excel('Book1.xlsx')
arr = df['Column'].tolist()
arr2 = df['Values'].tolist()
monthsofint = [10, 11, 12, 1, 2, 3]
locs = []
dictor = {}
for i in range(len(monthsofint)):
dictor[monthsofint[i]] = []
for i in range(len(monthsofint)): # !! Assumption !!
for j in range(len(arr)):
if monthsofint[i] == arr[j]:
dictor[monthsofint[i]].append(j)
newlist = []
newlist2 = []
for i in range(len(dictor[monthsofint[0]])):
for j in range(len(monthsofint)):
newlist.append(arr[dictor[monthsofint[j]][i]])
newlist2.append(arr2[dictor[monthsofint[j]][i]])
print(newlist)
print(newlist2)
Output: [10, 11, 12, 1, 2, 3, 10, 11, 12, 1, 2, 3] and [180, 190, 200, 90, 100, 110, 300, 310, 320, 210, 220, 230]
Note on Assumption: The assumption made is that there will always be 12 months for every year in the file.
In your case , we using Categorical + cumcount
#results = df[df.mon.isin([10, 11, 12, 1, 2, 3])].copy()
results.mon=pd.Categorical(results.mon,[10,11,12,1,2,3])
s=results.sort_values('mon')
s=s.iloc[s.groupby('mon').cumcount().argsort()]
s
Out[172]:
mon
9 10
10 11
11 12
0 1
1 2
2 3
21 10
22 11
23 12
12 1
13 2
14 3
I think you can take what we can have for each category, then use izip_longest to zip those lists.
So I found a relatively easy and simple way to do it from another source
For those who might be interested:
df[(df.index > 4) & (df.month.isin([10, 11, 12, 1, 2, 3]))]
If you have a range of numbers from 1-49 with 6 numbers to choose from, there are nearly 14 million combinations. Using my current script, I currently have only 7.2 million combinations remaining. Of the 7.2 million remaining combinations, I want to eliminate all 3, 4, 5, 6, dual, and triple consecutive numbers.
Example:
3 consecutive: 1, 2, 3, x, x, x
4 consecutive: 3, 4, 5, 6, x, x
5 consecutive: 4, 5, 6, 7, 8, x
6 consecutive: 5, 6, 7, 8, 9, 10
double separate consecutive: 1, 2, 5, 6, 14, 18
triple separate consecutive: 1, 2, 9, 10, 22, 23
Note: combinations such as 1, 2, 12, 13, 14, 15 must also be eliminated or else they conflict with the rule that double and triple consecutive combinations to be eliminated.
I'm looking to find how many combinations of the 7.2 million remaining combinations have zero consecutive numbers (all mixed) and only 1 consecutive pair.
Thank you!
import functools
_MIN_SUM = 120
_MAX_SUM = 180
_MIN_NUM = 1
_MAX_NUM = 49
_NUM_CHOICES = 6
_MIN_ODDS = 2
_MAX_ODDS = 4
#functools.lru_cache(maxsize=None)
def f(n, l, s = 0, odds = 0):
if s > _MAX_SUM or odds > _MAX_ODDS:
return 0
if n == 0 :
return int(s >= _MIN_SUM and odds >= _MIN_ODDS)
return sum(f(n-1, i+1, s+i, odds + i % 2) for i in range(l, _MAX_NUM+1))
result = f(_NUM_CHOICES, _MIN_NUM)
print('Number of choices = {}'.format(result))
While my answer should work, I think someone might be able to offer a faster solution.
Consider the following code:
not_allowed = []
for x in range(48):
not_allowed.append([x, x+1, x+2])
# not_allowed = [ [0,1,2], [1,2,3], ... [11,12,13], ... [47,48,49] ]
my_numbers = [[1, 2, 5, 9, 11, 33], [1, 3, 7, 8, 9, 31], [12, 13, 14, 15, 23, 43]]
for x in my_numbers:
for y in not_allowed:
if set(y) <= set(x): # if [1,2,3] is a subset of [1,2,5,9,11,33], etc.
# drop x
This code will remove all instances that contain double consecutive numbers, which is all you really need to check for, because triple, quadruple, etc. all imply double consecutive. Try implementing this and let me know how it works.
The easiest approach is probably to generate and filter. I used numpy to try to vectorize as much of this as I could:
import numpy as np
from itertools import combinations
combos = np.array(list(combinations(range(1, 50), 6))) # build all combos
# combos is shape (13983816, 6)
filt = np.where(np.bincount(np.where(np.abs(
np.subtract(combos[:, :-1], combos[:, 1:])) == 1)[0]) <= 1)[0] # magic!
filtered = combos[filt]
# filtered is shape (12489092, 6)
Breaking down that "magic" line
First we subtract the first five items in the list from the last five items to get the differences between them. We do this for the entire set of combinations in one shot with np.subtract(combos[:, :-1], combos[:, 1:]). Note that itertools.combinations produces sorted combinations, on which this depends.
Next we take the absolute value of these differences to make sure we only look at positive distances between numbers with np.abs(...).
Next we grab the indicies from this operation for the entire dataset that indicate a difference of 1 (consecutive numbers) with np.where(... == 1)[0]. Note that np.where returns a tuple where the first item are all of the rows, and the second item are all of the corresponding columns for our condition. This is important because any row value that shows up more than once tells us that we have more than one consecutive number in that row!
So we count how many times each row shows up in our results with np.bincount(...), which will return something like [5, 4, 4, 4, 3, 2, 1, 0] indicating how many consecutive pairs are in each row of our combinations dataset.
Finally we grab only the row numbers where there are 0 or 1 consecutive values with np.where(... <= 1)[0].
I am returning way more combinations than you seem to indicate, but I feel fairly confident that this is working. By all means, poke holes in it in the comments and I will see if I can find fixes!
Bonus, because it's all vectorized, it's super fast!
This question already has answers here:
Generate random numbers summing to a predefined value
(7 answers)
Closed 4 years ago.
I have the following list:
Sum=[54,1536,36,14,9,360]
I need to generate 4 other lists, where each list will consist of 6 random numbers starting from 0, and the numbers will add upto the values in sum. For eg;
l1=[a,b,c,d,e,f] where a+b+c+d+e+f=54
l2=[g,h,i,j,k,l] where g+h+i+j+k+l=1536
and so on upto l6. And I need to do this in python. Can it be done?
Generating a list of random numbers that sum to a certain integer is a very difficult task. Keeping track of the remaining quantity and generating items sequentially with the remaining available quantity results in a non-uniform distribution, where the first numbers in the series are generally much larger than the others. On top of that, the last one will always be different from zero because the previous items in the list will never sum up to the desired total (random generators usually use open intervals in the maximum). Shuffling the list after generation might help a bit but won't generally give good results either.
A solution could be to generate random numbers and then normalize the result, eventually rounding it if you need them to be integers.
import numpy as np
totals = np.array([54,1536,36,14]) # don't use Sum because sum is a reserved keyword and it's confusing
a = np.random.random((6, 4)) # create random numbers
a = a/np.sum(a, axis=0) * totals # force them to sum to totals
# Ignore the following if you don't need integers
a = np.round(a) # transform them into integers
remainings = totals - np.sum(a, axis=0) # check if there are corrections to be done
for j, r in enumerate(remainings): # implement the correction
step = 1 if r > 0 else -1
while r != 0:
i = np.random.randint(6)
if a[i,j] + step >= 0:
a[i, j] += step
r -= step
Each column of a represents one of the lists you want.
Hope this helps.
This might not be the most efficient way but it will work
totals = [54, 1536, 36, 14]
nums = []
x = np.random.randint(0, i, size=(6,))
for i in totals:
while sum(x) != i: x = np.random.randint(0, i, size=(6,))
nums.append(x)
print(nums)
[array([ 3, 19, 21, 11, 0, 0]), array([111, 155, 224, 511, 457,
78]), array([ 8, 5, 4, 12, 2, 5]), array([3, 1, 3, 2, 1, 4])]
This is a way more efficient way to do this
totals = [54,1536,36,14,9,360, 0]
nums = []
for i in totals:
if i == 0:
nums.append([0 for i in range(6)])
continue
total = i
temp = []
for i in range(5):
val = np.random.randint(0, total)
temp.append(val)
total -= val
temp.append(total)
nums.append(temp)
print(nums)
[[22, 4, 16, 0, 2, 10], [775, 49, 255, 112, 185, 160], [2, 10, 18, 2,
0, 4], [10, 2, 1, 0, 0, 1], [8, 0, 0, 0, 0, 1], [330, 26, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0]]