Python - Weighted Average from a single list - python

I have a single list as this:
prices = [10, 10, 10, 40, 40, 50]
and I would like to calculate the weighted average from this list, so the weight of the number 10 would be 3, weight of 40 would be 2 and weight of 50 would be 1. How do I do this without having 2 separated lists?

You neeed some form of lookup for your weights, either a list of the same lenght or a dictionary seem prudent.
Compute it using a dictionary of weights:
l = [10, 10, 10, 40, 40, 50]
w = {10:3,40:2,50:1}
avg_w = sum(i*w[i] for i in l)/len(l)
print(avg_w)
or use a weight list:
w2= [3,3,3,2,2,1]
avg_w = sum(v*w2[i] for i,v in enumerate(l))/len(l)
print(avg_w)
Output:
50

Related

Knapsack with constraint same value

I am Solving a Multiple Knapsacks Problem in python :
The problem is to pack a subset of the items into five bins, each of which has a maximum capacity of 100, so that the total packed value is a maximum.
data = {}
data['weights'] = [
48, 30, 42, 36, 36, 48, 42, 42, 36, 24, 30, 30, 42, 36, 36
]
data['values'] = [
10, 30, 25, 50, 35, 30, 15, 40, 30, 35, 45, 10, 20, 30, 25
]
assert len(data['weights']) == len(data['values'])
data['num_items'] = len(data['weights'])
data['all_items'] = range(data['num_items'])
data['bin_capacities'] = [100, 100, 100, 100, 100]
data['num_bins'] = len(data['bin_capacities'])
data['all_bins'] = range(data['num_bins'])
The data includes the following:
weights: A vector containing the weights of the items.
values: A vector containing the values of the items.
capacities: A vector containing the capacities of the bins.
The following code declares the MIP solver.
solver = pywraplp.Solver.CreateSolver('SCIP')
if solver is None:
print('SCIP solver unavailable.')
return
The following code creates the variables for the problem.
# x[i, b] = 1 if item i is packed in bin b.
x = {}
for i in data['all_items']:
for b in data['all_bins']:
x[i, b] = solver.BoolVar(f'x_{i}_{b}')
The following code defines the constraints for the problem:
Each x[(i, j)] is a 0-1 variable, where i is an item and j is a bin. In the solution, x[(i, j)] will be 1 if item i is placed in bin j, and 0 otherwise.
# Each item is assigned to at most one bin.
for i in data['all_items']:
solver.Add(sum(x[i, b] for b in data['all_bins']) <= 1)
# The amount packed in each bin cannot exceed its capacity.
for b in data['all_bins']:
solver.Add(
sum(x[i, b] * data['weights'][i]
for i in data['all_items']) <= data['bin_capacities'][b])
# Maximize total value of packed items.
objective = solver.Objective()
for i in data['all_items']:
for b in data['all_bins']:
objective.SetCoefficient(x[i, b], data['values'][i])
objective.SetMaximization()
I Try to add another contraint which consist that all items in the same bag should have the same weight, but I struggle to do it in python . Can you help me to code it?
Thanks
Just a sketch. Think about it... Maybe fix it (it's just an idea).
What you have: Assignment-matrix A items <-> bins
item 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0
1
2
3
4
<=1 <=1 <=1 ...
bin
What you should add: Assignment-matrix B item-classes <-> bins
Item-class: set of all items of same weight
e.g.:
import numpy as np
weights = np.array([48, 30, 42, 36, 36, 48, 42, 42, 36, 24, 30, 30, 42, 36, 36])
unique_weights = set(weights)
partition = [np.where(weights == i)[0] for i in unique_weights]
# [array([ 3, 4, 8, 13, 14]), array([ 2, 6, 7, 12]), array([0, 5]), array([9]), array([ 1, 10, 11])]
Additional assignment-matrix:
item-class 0 1 2 3 4
0 <=1
1 <=1
2 <=1
3 <=1
4 <=1
bin
Then: Link/Channel those
Sum of assigned items to bin of class C is 0 if class C is not UNIQUELY assigned to this bin, unbounded (big-M) otherwise.
Something like:
for b in range(n_bins):
for c in range(n_partitions):
sum(A[b, all_indices_of_items_in_class(c)]) <= B[b, c] * len(all_indices_of_items_in_class(c))
Remarks
Obviously, this is more of some addition to the status quo.
It might make more sense to not model A as big boolean-mat, but just introduce cardinality-constraints (how much identical items are picked) as we already have variables expressing "what" we pick.

How to plot a histogram in matplotlib in python?

I know how to plot a histogram when individual datapoints are given like:
(33, 45, 54, 33, 21, 29, 15, ...)
by simply using something matplotlib.pyplot.hist(x, bins=10)
but what if I only have grouped data like:
and so on.
I know that I can use bar plots to mimic a histogram by changing xticks but what if I want to do this by using only hist function of matplotlib.pyplot?
Is it possible to do this?
You can build the hist() params manually and use the existing value counts as weights.
Say you have this df:
>>> df = pd.DataFrame({'Marks': ['0-10', '10-20', '20-30', '30-40'], 'Number of students': [8, 12, 24, 26]})
Marks Number of students
0 0-10 8
1 10-20 12
2 20-30 24
3 30-40 26
The bins are all the unique boundary values in Marks:
>>> bins = pd.unique(df.Marks.str.split('-', expand=True).astype(int).values.ravel())
array([ 0, 10, 20, 30, 40])
Choose one x value per bin, e.g. the left edge to make it easy:
>>> x = bins[:-1]
array([ 0, 10, 20, 30])
Use the existing value counts (Number of students) as weights:
>>> weights = df['Number of students'].values
array([ 8, 12, 24, 26])
Then plug these into hist():
>>> plt.hist(x=x, bins=bins, weights=weights)
One possibility is to “ungroup” data yourself.
For example, for the 8 students with a mark between 0 and 10, you can generate 8 data points of value of 5 (the mean). For the 12 with a mark between 10 and 20, you can generate 12 data points of value 15.
However, the “ungrouped” data will only be an approximation of the real data. Thus, it is probably better to just use a matplotlib.pyplot.bar plot.

Obtaining a list of ordered integers from a list of "pairs" in Python

Hello I am currently working with a large set of data which contains an even amount of integers, all of which have a matching value. I am trying to create a list which is made up of "one of a pair" in Python.I am able to have multiple pairs of the same value, thus simply using the set function does not work. For example, if I have a list:
List = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
In this example, indices 0 and 1 would be a pair, then 2 and 7, 3 and 5, 4 and 6, 8 and 9.
I want to extract from that list the values that make up each pair and create a new list with said values to produce something such as:
newList = [10, 11, 20, 15, 10]
Using the set function makes it such that only one element from the entire set of data is put into the list, where I need half of the total data from List. For situations where I have more than one pair of the same value, it would look something such as:
List = [10, 10, 11, 10, 11, 10]
Would need to produce a list such as:
newList = [10, 11, 10]
Any insight would be great as I am new to Python and there are a lot of functions I may not be aware of.
Thank you
Just try:
new_list = set(list)
This should return your desired output.
If I've understood correctly, you don't want to have any duplicated value, want to retain a list with unique values from a particular list.
If I'm right, a simple way to do so would be:
List = [10, 10, 11, 11, 15, 20, 15, 20]
newList = []
for x in List:
if x not in newList:
newList.append(x)
print(newList)
A python-like way to do so would be:
newList = set(List)
Here is a slight variation on one of #Alain T's answer:
[i for s in [set()] for i in List if (s.remove(i) if i in s else (not s.add(i)))]
NB: the following was my answer before you add the ordering requirement
sorted(List)[::2]
This sorts the input List and then take only one value out of each two consecutive.
As a general approach, this'll do:
l = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
i = 0
while i < len(l):
del l[l.index(l[i], i + 1)]
i += 1
It iterates through the list one by one, finding the index of the next occurrence of the current value, and deletes it, shortening the list. This can probably be dressed up in various ways, but is a simple algorithm. Should a number not have a matching pair, this will raise a ValueError.
The following code reates a new list of half the number of items occuring in the input list. The order is in the order of first occurrence in the input list.
>>> from collections import Counter
>>> d = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
>>> c = Counter(d)
>>> c
Counter({10: 4, 11: 2, 20: 2, 15: 2})
>>> answer = sum([[key] * (val // 2) for key, val in c.items()], [])
>>> answer
[10, 10, 11, 20, 15]
>>>
If you need to preserve the order of the first occurrence of each pair, you could use a set with an XOR operation on values to alternate between first and second occurrences.
List = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
paired = [ i for pairs in [set()] for i in List if pairs.symmetric_difference_update({i}) or i in pairs]
print(p)
# [10, 11, 20, 15, 10]
You could also do this with the accumulate function from itertools:
from itertools import accumulate
paired = [a for a,b in zip(List,accumulate(({n} for n in List),set.__xor__)) if a in b]
print(paired)
# [10, 11, 20, 15, 10]
Or use a bitmap instead of a set (if your values are relatively small positive integers (e.g. between 0 and 64):
paired = [ n for n,m in zip(List,accumulate((1<<n for n in List),int.__xor__)) if (1<<n)&m ]
print(paired)
# [10, 11, 20, 15, 10]
Or you could use a Counter from collections
from collections import Counter
paired = [ i for c in [Counter(List)] for i in List if c.update({i:-1}) or c[i]&1 ]
print(paired)
# [10, 11, 20, 15, 10]
And , if you're not too worried about efficiency, a double sort with a 2 step striding could do it:
paired = [List[i] for i,_ in sorted(sorted(enumerate(List),key=lambda n:n[1])[::2])]
print(paired)
# [10, 11, 20, 15, 10]

Count the number of times values appear within a range of values

How do I output a list which counts and displays the number of times different values fit into a range?
Based on the below example, the output would be x = [0, 3, 2, 1, 0] as there are 3 Pro scores (11, 24, 44), 2 Champion scores (101, 888), and 1 King score (1234).
- P1 = 11
- P2 = 24
- P3 = 44
- P4 = 101
- P5 = 1234
- P6 = 888
totalsales = [11, 24, 44, 101, 1234, 888]
Here is ranking corresponding to the sales :
Sales___________________Ranking
0-10____________________Noob
11-100__________________Pro
101-1000________________Champion
1001-10000______________King
100001 - 200000__________Lord
This is one way, assuming your values are integers and ranges do not overlap.
from collections import Counter
# Ranges go to end + 1
score_ranges = [
range(0, 11), # Noob
range(11, 101), # Pro
range(101, 1001), # Champion
range(1001, 10001), # King
range(10001, 200001) # Lord
]
total_sales = [11, 24, 44, 101, 1234, 888]
# This counter counts how many values fall into each score range (by index).
# It works by taking the index of the first range containing each value (or -1 if none found).
c = Counter(next((i for i, r in enumerate(score_ranges) if s in r), -1) for s in total_sales)
# This converts the above counter into a list, taking the count for each index.
result = [c[i] for i in range(len(score_ranges))]
print(result)
# [0, 3, 2, 1, 0]
As a general rule homework should not be posted on stackoverflow. As such, just a pointer on how to solve this, implementation is up to you.
Iterate over the totalsales list and check if each number is in range(start,stop). Then for each matching check increment one per category in your result list (however using a dict to store the result might be more apt).
Here a possible solution with no use of modules such as numpy or collections:
totalsales = [11, 24, 44, 101, 1234, 888]
bins = [10, 100, 1000, 10000, 20000]
output = [0]*len(bins)
for s in totalsales:
slot = next(i for i, x in enumerate(bins) if s <= x)
output[slot] += 1
output
>>> [0, 3, 2, 1, 0]
If your sales-to-ranking mapping always follows a logarithmic curve, the desired output can be calculated in linear time using math.log10 with collections.Counter. Use an offset of 0.5 and the abs function to handle sales of 0 and 1:
from collections import Counter
from math import log10
counts = Counter(int(abs(log10(abs(s - .5)))) for s in totalsales)
[counts.get(i, 0) for i in range(5)]
This returns:
[0, 3, 2, 1, 0]
Here, I have used the power of dataframe to store the values, then using bin and cut to group the values into the right categories. The extracting the value count into list.
Let me know if it is okay.
import pandas as pd
import numpy
df = pd.DataFrame([11, 24, 44, 101, 1234, 888], columns=['P'])# Create dataframe
bins = [0, 10, 100, 1000, 10000, 200000]
labels = ['Noob','Pro', 'Champion', 'King', 'Lord']
df['range'] = pd.cut(df.P, bins, labels = labels)
df
outputs:
P range
0 11 Pro
1 24 Pro
2 44 Pro
3 101 Champion
4 1234 King
5 888 Champion
Finally, to get the value count. Use:
my = df['range'].value_counts().sort_index()#this counts to the number of occurences
output=map(int,my.tolist())#We want the output to be integers
output
The result below:
[0, 3, 2, 1, 0]
You can use collections.Counter and a dict:
from collections import Counter
totalsales = [11, 24, 44, 101, 1234, 888]
ranking = {
0: 'noob',
10: 'pro',
100: 'champion',
1000: 'king',
10000: 'lord'
}
c = Counter()
for sale in totalsales:
for k in sorted(ranking.keys(), reverse=True):
if sale > k:
c[ranking[k]] += 1
break
Or as a two-liner (credits to #jdehesa for the idea):
thresholds = sorted(ranking.keys(), reverse=True)
c = Counter(next((ranking[t] for t in thresholds if s > t)) for s in totalsales)

Converting (and back) discontinuous values to continuous values in Python 2

Let's say I have a list of:
5 10 10 20 50 50 20
(there are 4 distinguish numbers).
I want to convert them to:
0 1 1 2 3 3 2
(then convert back to the original form).
There are tons of ways to do that, but I am not sure what is the best and Pythonic way?
(a way is to generate a set, convert the set to a list, sort the list, then generate output by the sorted list, but I think it is not the best one)
I think this is a good problem for make use of collections.defaultdict() and itertools.count() methods.
from itertools import count
from collections import defaultdict
c = count()
dct = defaultdict(lambda: next(c))
lst = [5, 10, 10, 20, 50, 50, 20]
conv = [dct[i] for i in lst]
# [0, 1, 1, 2, 3, 3, 2]
back = [k for c in conv for k, v in dct.items() if v == c]
# [5, 10, 10, 20, 50, 50, 20]
The suggested answer by Delgan is O(n^2) due to the nested loops in back. This solution is O(n).
An alternative solution is as follows:
lst = [5, 10, 10, 20, 50, 50, 20]
# Convert (and build reverse mapping)
mapping = {}
reverse_mapping = {}
conv = []
for i in lst:
v = mapping.setdefault(i, len(mapping))
reverse_mapping[v] = i
conv.append(v)
# Convert back
back = [reverse_mapping[v] for v in conv]

Categories

Resources