Finding closest combinations of a given sum without replacement [closed] - python

I have been dipping my feet into programming and originally wrote a simple VBA macro to find combinations from a list of inputs that sum to a given number. The idea is a set of tasks, each taking a certain number of minutes, and a number of people among whom the minutes should be divided approximately equally. To give a concrete example, with eight tasks and three people I might have the following variables:
M = [44,39,29,77,102,35,40,59]
N = 3
Avg = sum(M)/N
I want the program to find, for each person, a combination of tasks whose sum is as close as possible to the average value. For instance, in this example I would like output something like:
A = [102, 40], B = [44,39,59], C = [29,77,35]
If anyone can at least point me in the right direction with this project, I would be grateful. While this began as an aside from a macro for an Excel sheet, I wouldn't mind learning more about optimization algorithms in a more suitable language like Python.

Getting each person's workload as close as possible to the mean is equivalent to the max-min fair allocation problem.
It's essentially an optimization problem; Google Research did some work on this here:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45386.pdf
A CS professor wrote a Python module for this: https://github.com/anirudhSK/cell-codel/blob/master/schism/utils/max-min-fairness.py
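If you just want something workable to start from, below is a minimal Python sketch of a greedy heuristic (assign the largest remaining task to the person with the least work so far, often called LPT scheduling). It is not guaranteed to find the optimal split that the linked approaches aim for, but it usually lands close to the average:

M = [44, 39, 29, 77, 102, 35, 40, 59]
N = 3

def greedy_balance(tasks, n_people):
    # Assign each task, largest first, to whoever currently has the least work.
    groups = [[] for _ in range(n_people)]
    totals = [0] * n_people
    for t in sorted(tasks, reverse=True):
        i = totals.index(min(totals))
        groups[i].append(t)
        totals[i] += t
    return groups, totals

groups, totals = greedy_balance(M, N)
print(groups)   # [[102, 39], [77, 40, 29], [59, 44, 35]]
print(totals)   # [141, 146, 138] vs. an average of ~141.7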

Related

Use scenarios for complex data type in Python [closed]

As we all know, Python has a complex data type. I read the docs but couldn't find its use cases: when should it be used, and what are its characteristics?
a = 20
b = 328
c = complex(a, b)
print(type(c))  # <class 'complex'>
The complex type is used for advanced calculations in fields such as electronics, applied physics, and astrophysics, just as complex numbers are used in the real world.
Complex numbers are used in electronics and electromagnetism. A single complex number puts together two real quantities, making them easier to work with. For example, in electronics, the state of a circuit element is defined by the voltage (V) and the current (I).
In Python, some libraries that make use of complex numbers include:
SkiDL
PySpice
NumPy
and so on.
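As a concrete illustration, here is a small hypothetical example using the built-in complex type and the cmath module to compute the impedance of a series RLC circuit; the component values are made up:

import cmath

# Series RLC impedance: Z = R + j*(w*L - 1/(w*C)); Python spells the
# imaginary unit as j, e.g. 3 + 4j.
R, L, C = 100.0, 0.5, 1e-6          # ohms, henries, farads (made-up values)
w = 2 * cmath.pi * 50               # angular frequency for 50 Hz
Z = complex(R, w * L - 1 / (w * C))

print(Z)                            # rectangular form: real + imag*j
print(abs(Z))                       # magnitude of the impedance
print(cmath.phase(Z))               # phase angle in radians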

To optimize 20 parameters, which would be the best algorithm to use? [closed]

I have 20 parameters that each take a binary value and are passed to a function that returns a score, like this:
score = fmin(para1, para2, para3, ..., para20)
Which algorithm would be best to optimize this?
I read about genetic algorithms, where chromosomes undergo mutation and crossover to select the best combination out of the 2^20 search points.
I also read about hyperopt, which optimizes the function in fewer trials.
Which would be the better choice? What are the pros and cons of these algorithms?
It really depends on the properties you expect your function to have. If you have reason to believe that similar parameter sets have similar scores, then you can try simulated annealing or genetic algorithms.
However, if you don't have reason to expect similar parameters will generate similar scores, those methods won't help: you would do just as well picking parameter sets at random. But (as mentioned in the comments), 2^20 isn't much more than a million trials: if your function isn't too expensive, you could just try them all.
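For example, exhaustive search over all 2^20 combinations is only a few lines of Python; score below is a hypothetical stand-in for the real scoring function in the question:

from itertools import product

def score(params):
    # Hypothetical placeholder; replace with the real scoring function.
    return sum(params)

best_params, best_score = None, float("inf")
for params in product((0, 1), repeat=20):   # all 2**20 = 1,048,576 combinations
    s = score(params)
    if s < best_score:                      # assuming a lower score is better
        best_params, best_score = params, s

print(best_params, best_score)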

Fast way to determine the optimal number of topics for a large corpus using LDA [closed]

I have a corpus of around 160,000 documents. I want to do topic modeling on it using LDA in R (specifically the function lda.collapsed.gibbs.sampler in the lda package).
I want to determine the optimal number of topics. The common procedure seems to be to take a vector of candidate topic numbers, e.g. from 1 to 100, run the model for each, and pick the one with the largest harmonic mean or smallest perplexity.
However, given the large number of documents, the optimal number of topics can easily reach several hundred or even a few thousand, and the computation time grows significantly as the number of topics increases. Even with parallel computing, it would take several days or weeks.
Is there a better (more time-efficient) way to choose the optimal number of topics, or any suggestion for reducing the computation time?
Any suggestions are welcome.
Start with a guess somewhere in the middle, then increase and decrease the number of topics in steps of, say, 50 or 100 instead of 1, and check in which direction the coherence score improves. I am sure it will converge.
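A rough sketch of that coarse-to-fine search in Python (the question uses R, but the idea is the same). coherence_for(k) is a hypothetical placeholder for whatever fits one model with k topics and returns its coherence (or negative perplexity):

def coherence_for(k):
    # Hypothetical: fit LDA with k topics (ideally on a subsample or with
    # few iterations) and return a quality score where higher is better.
    raise NotImplementedError

def best_num_topics(k_min=50, k_max=1000, coarse_step=100, fine_step=25):
    # Pass 1: coarse grid, e.g. 50, 150, 250, ...
    coarse = {k: coherence_for(k) for k in range(k_min, k_max + 1, coarse_step)}
    k_star = max(coarse, key=coarse.get)
    # Pass 2: refine only in the neighbourhood of the coarse winner.
    lo = max(k_min, k_star - coarse_step)
    hi = min(k_max, k_star + coarse_step)
    fine = {k: coherence_for(k) for k in range(lo, hi + 1, fine_step)}
    return max(fine, key=fine.get)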

Solution search solved using DFS or Greedy BFS? [closed]

I have a problem that goes like this: a company has 4 taxis at 4 different locations (A, B, C, D). 4 people (W, X, Y, Z) call the company for a taxi. I need to find the fastest way to get the taxis to the callers, knowing that each taxi can pick up only one person and that there is a cost assigned between each taxi's location and each caller's location.
I was thinking of building a tree with all the possible combinations, e.g. AW-BX-CY-DZ or AX-BW-CY-DZ, and finding the minimum total cost among them, but I need to solve this using DFS or greedy BFS. Any ideas how that would work? I can't picture it.
I just want the idea of how to solve this with DFS/GBFS. I can't figure out how the search would proceed or when it would end, since I'm looking for the minimum total distance.
This is an instance of the assignment problem: finding a maximum/minimum weight matching in a weighted bipartite graph. The most common algorithm for solving it is the Hungarian algorithm, which runs in O(n^3). There is a Python module implementing it: munkres.
However, if you really want to use DFS/BFS, you can write a naive algorithm that generates every possible assignment and then searches that solution space with DFS/BFS, but it will be highly inefficient.
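For comparison, here is a sketch of the assignment-problem route using scipy.optimize.linear_sum_assignment (an alternative to the munkres module mentioned above); the cost matrix values are made up for illustration:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows are taxis A-D, columns are callers W-Z; entries are travel costs.
cost = np.array([
    [4,  2,  5,  7],   # taxi A to W, X, Y, Z
    [8,  3, 10,  8],   # taxi B
    [12, 5,  4,  5],   # taxi C
    [6,  3,  7, 14],   # taxi D
])

rows, cols = linear_sum_assignment(cost)    # Hungarian-style O(n^3) solve
for taxi, person in zip("ABCD", cols):
    print(f"taxi {taxi} -> person {'WXYZ'[person]}")
print("total cost:", cost[rows, cols].sum())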

What is the average length of a Python module [closed]

I have bits and bobs of code and I'm thinking of collecting them into a Python module, but I might need a Python package.
I know it mostly comes down to how I want to divide my code.
But I still need to know what the average length (in lines) of a Python module is.
Using the following numbers, please select small | average | big:
1,000 lines of python
10,000 lines
50,000 lines
100,000 lines
1,000,000 lines
Please help.
A module should be the smallest independently usable unit of code. That's what modules are for: modularity, independence, take only what you need. A package should be a set of modules that functionally belong together to cover a certain problem area, e.g. statistical computations or 3D graphics.
So the number of lines is not really important. Still, I think modules of 10,000+ lines are rare, but there's no lower bound.
