Sum combination of lists by element - python

I have a nested list, which can be of varying length (each sublist will always contain the same number of elements as the others):
list1=[[4.1,2.9,1.2,4.5,7.9,1.2],[0.7,1.1,2.0,0.4,1.8,2.2],[5.1,4.1,6.5,7.1,2.3,3.6]]
I can find every possible combination of n sublists using itertools:
n = 2
list(itertools.combinations(list1, n))
[([4.1, 2.9, 1.2, 4.5, 7.9, 1.2], [0.7, 1.1, 2.0, 0.4, 1.8, 2.2]),
 ([4.1, 2.9, 1.2, 4.5, 7.9, 1.2], [5.1, 4.1, 6.5, 7.1, 2.3, 3.6]),
 ([0.7, 1.1, 2.0, 0.4, 1.8, 2.2], [5.1, 4.1, 6.5, 7.1, 2.3, 3.6])]
I would like to sum the lists in each tuple element-wise (by index). In this example, I would end up with:
[[4.8, 4.0, 3.2, 4.9, 9.7, 3.4],
 [9.2, 7.0, 7.7, 11.6, 10.2, 4.8],
 [5.8, 5.2, 8.5, 7.5, 4.1, 5.8]]
I have tried:
[sum(x) for x in itertools.combinations(list1, n)]
[sum(x) for x in zip(*itertools.combinations(list1, n))]
Both run into errors.

You can use zip for this:
>>> [tuple(map(sum, zip(*x))) for x in itertools.combinations(list1, n)]
[(4.8, 4.0, 3.2, 4.9, 9.700000000000001, 3.4000000000000004),
(9.2, 7.0, 7.7, 11.6, 10.2, 4.8),
(5.8, 5.199999999999999, 8.5, 7.5, 4.1, 5.800000000000001)]
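If NumPy is already a dependency, an equivalent spelling (my variation, not from the thread) stacks each pair into a 2x6 array and sums along axis 0:
import numpy as np
from itertools import combinations
list1 = [[4.1, 2.9, 1.2, 4.5, 7.9, 1.2],
         [0.7, 1.1, 2.0, 0.4, 1.8, 2.2],
         [5.1, 4.1, 6.5, 7.1, 2.3, 3.6]]
# axis=0 adds matching indices across the sublists in each pair.
[np.array(pair).sum(axis=0).tolist() for pair in combinations(list1, 2)]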

Try this:
>>> list1=[[4.1,2.9,1.2,4.5,7.9,1.2],[0.7,1.1,2.0,0.4,1.8,2.2],[5.1,4.1,6.5,7.1,2.3,3.6]]
>>> from itertools import combinations as c
>>> list(list(map(sum, zip(*k))) for k in c(list1, 2))
[[4.8, 4.0, 3.2, 4.9, 9.700000000000001, 3.4000000000000004], [9.2, 7.0, 7.7, 11.6, 10.2, 4.8], [5.8, 5.199999999999999, 8.5, 7.5, 4.1, 5.800000000000001]]

I would like to create a new column based on conditions using .loc

I have the code:
to_test['averageRating'].unique()
array([5.8, 5.2, 5. , 6.5, 5.5, 7.3, 7.2, 4.2, 6.4, 7.1, 6.6, 5.4, 6.9,
6. , 6.1, 8.1, 6.3, 7.8, 3.9, 6.8, 6.2, 7.9, 7. , 4.9, 5.9, 7.5,
6.7, 8. , 5.7, 3.2, 4.8, 5.6, 7.4, 4.5, 3.6, 4.3, 3.4, 5.1, 4.4,
4.7, 7.7, 5.3, 4. , 8.4, 7.6, 3.3, 2.2, 3.7, 8.2, 4.1, 8.3, 1.7,
9. , 4.6, 8.5, 3.1, 3.8, 3.5, 1.9, 2.9, 2.8, 2.7, 9.2, 1.2, 2.1,
3. , 1.3, 1.1, 8.6, 2.5, 1. , 9.8, 8.7, 1.5, 9.3])
# create a list of our conditions
conditions = [(to_test.loc[(to_test['averageRating'] >= 0.0) & (to_test['averageRating'] <= 3.3)]),
              (to_test.loc[(to_test['averageRating'] >= 3.4) & (to_test['averageRating'] <= 6.6)]),
              (to_test.loc[(to_test['averageRating'] >= 6.7) & (to_test['averageRating'] <= 10)])]
# create a list of the values we want to assign for each condition
values = ['group1', 'group2', 'group3']
# create a new column and use np.select to assign values to it using our lists as arguments
to_test['group'] = np.select(conditions, values)
# display updated DataFrame
to_test.head()
but it's not working.
This is a classic use case for pd.cut. Sample code:
df = pd.DataFrame({'averageRating': np.random.uniform(0, 10, 100)})
df['group_using_cut'] = pd.cut(df['averageRating'],
                               [0, 3.3, 6.6, 10],
                               labels=['group1', 'group2', 'group3'])
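One caveat (my note, not part of the original answer): pd.cut builds half-open bins (0, 3.3], (3.3, 6.6], (6.6, 10] by default, so a value of exactly 0 would fall outside every bin and come back as NaN; pass include_lowest=True to pd.cut if that matters for your data.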
If you want to use np.select, pass the conditions as plain boolean masks, without .loc. Sample code:
conds = [
    (df['averageRating'] >= 0.0) & (df['averageRating'] <= 3.3),
    (df['averageRating'] >= 3.4) & (df['averageRating'] <= 6.6),
    (df['averageRating'] >= 6.7) & (df['averageRating'] <= 10),
]
df['group_using_selec'] = np.select(conds,['group1','group2','group3'])
Check the output with df.head().
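Another caveat (my note): the three conditions leave gaps such as (3.3, 3.4) and (6.6, 6.7). A continuous value like 3.35 matches none of them, and np.select then falls back to its default of 0. The one-decimal ratings in the question never hit the gaps, but the uniform sample above can, so pd.cut is the safer choice here.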

Best way to split a dict based on key

I have a dict that looks like this (shortened)
{'18/09/2022-morning': [5.4, 6.0, 6.5, 6.7, 6.9, 7.9, 8.5, 7.5, 7.9, 7.8, 7.6, 6.8],
'18/09/2022-night': [6.4, 5.7, 4.8, 5.4, 4.7, 4.3],
'19/09/2022-morning': [3.8],
'19/09/2022-night': [4.1, 4.4, 4.3, 3.8, 3.5, 2.8]}
What is the best way to split it into two different dictionaries based on morning/night?
I can't think of an easy way to do this! Example of desired output:
dic1 = {'18/09/2022-morning': [5.4, 6.0, 6.5, 6.7, 6.9, 7.9, 8.5, 7.5, 7.9, 7.8, 7.6, 6.8], '19/09/2022-morning': [3.8]}
dic2 = {'18/09/2022-night': [6.4, 5.7, 4.8, 5.4, 4.7, 4.3], '19/09/2022-night': [4.1, 4.4, 4.3, 3.8, 3.5, 2.8]}
mornings = {}
nights = {}
for k, v in d.items():
    if k.endswith("morning"):
        mornings[k] = v
    else:
        nights[k] = v
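The same split can also be written as two dict comprehensions (my variation, assuming every key ends in either "morning" or "night"):
d = {'18/09/2022-morning': [5.4, 6.0, 6.5],
     '18/09/2022-night': [6.4, 5.7, 4.8]}
# Each comprehension keeps the entries whose key names that period.
mornings = {k: v for k, v in d.items() if k.endswith("morning")}
nights = {k: v for k, v in d.items() if k.endswith("night")}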

Use itertools.groupby (or a neat, pythonic way) to group a list by the difference between consecutive numbers

I've read this question
But my question is a little different:
For example:
[0.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 5.0, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10.0]
should give me:
[[0.0], [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9], [5.0], [9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10.0]]
All the elements are floats. Consecutive elements within a sublist should differ by at most 0.1.
I'm trying to solve it without any third-party modules (not homework, just Python practice).
What I have tried: a lot of code with itertools.groupby, none of which solved it. One of my attempts:
import itertools
lst = [0.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 5.0, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10]
res = []
for _, item in itertools.groupby(enumerate(lst), key=lambda index_num: index_num[0] - index_num[1]):
    print(list(item))  # Not expected. The solution I mentioned didn't work.
I want a neat, pythonic way to solve it. Any tricks are welcome; I just want to learn more skills. :)
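For reference, one dependency-free way to get the desired grouping is a plain loop that starts a new sublist whenever the gap to the previous number exceeds 0.1 (a sketch, not an answer from the thread; the small epsilon absorbs floating-point noise):
lst = [0.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 5.0,
       9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10.0]
res = []
for num in lst:
    # Extend the current group while consecutive numbers differ by at most 0.1.
    if res and num - res[-1][-1] <= 0.1 + 1e-9:
        res[-1].append(num)
    else:
        res.append([num])
print(res)
# [[0.0], [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9], [5.0],
#  [9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10.0]]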

Fit gamma distribution with fixed mean with scipy?

scipy.stats.rv_continuous.fit allows you to fix parameters when fitting a distribution, but it's dependent on scipy's choice of parametrization. For the gamma distribution it uses the k, theta (shape, scale) parametrization, so it would be easy to fit while holding theta constant, for example. I want to fit to a data set where I know the mean, but the observed mean might vary due to sampling error. This would be easy if scipy used the parametrization with mu = k*theta instead of theta. Is there a way to make scipy do this? And if not, is there another library that can?
Here's some example code with a data set that has an observed mean of 9.952, though I know the actual mean of the underlying distribution is 11:
from scipy.stats import gamma
observations = [17.6, 24.9, 3.9, 17.6, 11.8, 10.4, 4.1, 11.7, 5.7, 1.6,
8.6, 12.9, 5.7, 8.0, 7.4, 1.2, 11.3, 10.4, 1.0, 1.9,
6.0, 9.3, 13.3, 5.4, 9.1, 4.0, 12.8, 11.1, 23.1, 4.2,
7.9, 11.1, 10.0, 3.4, 27.8, 7.2, 14.9, 2.9, 5.5, 7.0,
3.9, 12.3, 10.6, 22.1, 5.0, 4.1, 21.3, 15.9, 34.5, 8.1,
19.6, 10.8, 13.4, 22.8, 27.6, 6.8, 5.9, 9.0, 7.1, 21.2,
1.0, 14.6, 16.9, 1.0, 6.5, 2.9, 7.1, 14.1, 15.2, 7.8,
9.0, 4.9, 2.1, 9.5, 5.6, 11.1, 7.7, 18.3, 3.8, 11.0,
4.2, 12.5, 8.4, 3.2, 4.0, 3.8, 2.0, 24.7, 24.6, 3.4,
4.3, 3.2, 7.6, 8.3, 14.5, 8.3, 8.4, 14.0, 1.0, 9.0]
shape, _, scale = gamma.fit(observations, floc=0)
print(shape*scale)
and this gives
9.952
but I would like a fit such that shape*scale = 11.0
The fit method of the SciPy distributions provides the maximum likelihood estimate of the parameters. You are correct that it only provides for fitting the shape, location and scale. (Actually, you said shape and scale, but SciPy also includes a location parameter. Sometimes this is called the three parameter gamma distribution.)
For most of the distributions in SciPy, the fit method uses a numerical optimizer to minimize the negative log-likelihood, as defined in the nnlf method. Instead of using the fit method, you could do this yourself with just a couple lines of code. This allows you to create an objective function with just one parameter, say the shape k, and within that function, set theta = mean/k, where mean is the desired mean, and call gamma.nnlf to evaluate the negative log-likelihood. Here's one way you could do it:
import numpy as np
from scipy.stats import gamma
from scipy.optimize import fmin
def nll(k, mean, x):
    # Negative log-likelihood of a gamma distribution with loc fixed at 0 and
    # the scale tied to the shape so that shape*scale equals the desired mean.
    return gamma.nnlf(np.array([k[0], 0, mean/k[0]]), x)
observations = [17.6, 24.9, 3.9, 17.6, 11.8, 10.4, 4.1, 11.7, 5.7, 1.6,
8.6, 12.9, 5.7, 8.0, 7.4, 1.2, 11.3, 10.4, 1.0, 1.9,
6.0, 9.3, 13.3, 5.4, 9.1, 4.0, 12.8, 11.1, 23.1, 4.2,
7.9, 11.1, 10.0, 3.4, 27.8, 7.2, 14.9, 2.9, 5.5, 7.0,
3.9, 12.3, 10.6, 22.1, 5.0, 4.1, 21.3, 15.9, 34.5, 8.1,
19.6, 10.8, 13.4, 22.8, 27.6, 6.8, 5.9, 9.0, 7.1, 21.2,
1.0, 14.6, 16.9, 1.0, 6.5, 2.9, 7.1, 14.1, 15.2, 7.8,
9.0, 4.9, 2.1, 9.5, 5.6, 11.1, 7.7, 18.3, 3.8, 11.0,
4.2, 12.5, 8.4, 3.2, 4.0, 3.8, 2.0, 24.7, 24.6, 3.4,
4.3, 3.2, 7.6, 8.3, 14.5, 8.3, 8.4, 14.0, 1.0, 9.0]
# This is the desired mean of the distribution.
mean = 11
# Initial guess for the shape parameter.
k0 = 3.0
opt = fmin(nll, k0, args=(mean, np.array(observations)),
           xtol=1e-11, disp=False)
k_opt = opt[0]
theta_opt = mean / k_opt
print(f"k_opt: {k_opt:9.7f}")
print(f"theta_opt: {theta_opt:9.7f}")
This script prints
k_opt: 1.9712604
theta_opt: 5.5801861
Alternatively, one can modify the first-order conditions for the extremum of the log-likelihood shown on Wikipedia so that there is just one parameter, k. The condition for the extremum can then be implemented as a scalar equation whose root can be found with, say, scipy.optimize.fsolve. The following is a variation of the above script that uses this technique.
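For reference, substituting theta = mean/k into the gamma log-likelihood and differentiating with respect to k gives the scalar condition implemented in first_order_eq below (my derivation, worth double-checking against the Wikipedia article):
log(k) - digamma(k) + mean(log(x)) - mean(x)/mu - log(mu) + 1 = 0
where mu is the fixed mean and mean(.) averages over the observations.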
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve
def first_order_eq(k, mean, x):
    # Derivative of the log-likelihood with respect to k after substituting
    # theta = mean/k; its root is the constrained MLE of the shape k.
    mean_logx = np.mean(np.log(x))
    return (np.log(k) - digamma(k) + mean_logx - np.mean(x)/mean
            - np.log(mean) + 1)
observations = [17.6, 24.9, 3.9, 17.6, 11.8, 10.4, 4.1, 11.7, 5.7, 1.6,
8.6, 12.9, 5.7, 8.0, 7.4, 1.2, 11.3, 10.4, 1.0, 1.9,
6.0, 9.3, 13.3, 5.4, 9.1, 4.0, 12.8, 11.1, 23.1, 4.2,
7.9, 11.1, 10.0, 3.4, 27.8, 7.2, 14.9, 2.9, 5.5, 7.0,
3.9, 12.3, 10.6, 22.1, 5.0, 4.1, 21.3, 15.9, 34.5, 8.1,
19.6, 10.8, 13.4, 22.8, 27.6, 6.8, 5.9, 9.0, 7.1, 21.2,
1.0, 14.6, 16.9, 1.0, 6.5, 2.9, 7.1, 14.1, 15.2, 7.8,
9.0, 4.9, 2.1, 9.5, 5.6, 11.1, 7.7, 18.3, 3.8, 11.0,
4.2, 12.5, 8.4, 3.2, 4.0, 3.8, 2.0, 24.7, 24.6, 3.4,
4.3, 3.2, 7.6, 8.3, 14.5, 8.3, 8.4, 14.0, 1.0, 9.0]
# This is the desired mean of the distribution.
mean = 11
# Initial guess for the shape parameter.
k0 = 3
sol = fsolve(first_order_eq, k0, args=(mean, observations),
             xtol=1e-11)
k_opt = sol[0]
theta_opt = mean / k_opt
print(f"k_opt: {k_opt:9.7f}")
print(f"theta_opt: {theta_opt:9.7f}")
Output:
k_opt: 1.9712604
theta_opt: 5.5801861
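As expected, this matches the fmin result above, and k_opt * theta_opt recovers the fixed mean of 11 exactly by construction.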

How can I convert a dict_keys list to integers

I am trying to find a way of converting a list within dict_keys() to an integer so I can use it as a trigger to send to another system. My code (below) imports a list of 100 words (a txt file with words each on a new line) which belong to 10 categories (e.g. the first 10 words belong to category 1, second 10 words belong to category 2 etc...).
Code:
from numpy.random import choice
from collections import defaultdict
number_of_elements = 10
words = open('file_location').read().split()
categories = defaultdict(list)
for i in range(len(words)):
    categories[i/number_of_elements].append(words[i])
category_labels = categories.keys()
category_labels
Output
dict_keys([0.0, 1.1, 2.0, 3.0, 4.9, 5.0, 0.5, 1.9, 8.0, 9.0, 1.3, 2.7, 3.9, 9.2, 9.4, 7.2, 4.2, 8.6, 5.1, 5.4, 3.3, 1.0, 6.6, 7.4, 7.7, 8.4, 5.8, 9.8, 0.7, 8.8, 2.1, 7.0, 6.4, 4.3, 0.1, 2.5, 3.8, 1.2, 6.9, 7.1, 5.6, 0.4, 5.3, 2.9, 7.3, 3.5, 9.5, 8.2, 2.8, 3.1, 0.9, 2.3, 8.1, 4.0, 6.3, 6.7, 4.5, 0.2, 1.7, 2.2, 8.9, 1.4, 7.6, 9.1, 7.8, 5.5, 4.8, 0.6, 3.2, 2.4, 6.5, 9.9, 9.6, 1.5, 6.0, 3.7, 4.7, 3.4, 5.9, 4.1, 1.6, 6.8, 9.3, 3.6, 8.5, 8.7, 0.3, 0.8, 7.5, 5.2, 2.6, 4.6, 5.7, 7.9, 6.1, 1.8, 8.3, 6.2, 9.7, 4.4])
What I need:
I would like the first number before the point (e.g. if it was 6.7, I just want the 6 as an int).
Thank you in advance for any help and/or advice!
Just convert your keys to integers using a list comprehension; note that there is no need to call .keys() here as iteration over the dictionary directly suffices:
[int(k) for k in categories]
You may want to bucket your values directly into integer categories rather than by floating point values:
categories = defaultdict(list)
for i, word in enumerate(words):
    categories[int(i / number_of_elements)].append(word)
I used enumerate() to pair words up with their index, rather than use range() plus indexing back into words.
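Equivalently, floor division avoids the float keys from the start: categories[i // number_of_elements].append(word). For non-negative i, i // n and int(i / n) give the same result.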
