I'm currently coding an algorithm for the 4-parametric RAINFLOW method. The idea of this method is to eliminate load cycles from a load history, which is normally given as a load (for example force) vs. time diagram. It is a very frequently used method in mechanical engineering to determine the life span of a product/element that is exposed to a certain number of load cycles.
However, the result of this method is a so-called FROM-TO table or FROM-TO matrix, where the rows represent the FROM values and the columns the TO values, as shown in the picture below:
example of from-to table/matrix
This example is unrealistic, as you normally get a file with millions of measurement points, which means that some cycles won't occur just once (1) or twice (2) as shown in the table, but may occur thousands of times.
Now to the problem:
I coded the algorithm and, as a result, formed a vector of FROM values and a vector of TO values, like this:
vek_from = []
vek_to = []
d = len(a) / 2
for i in range(int(d)):
    vek_from.append(a[2*i])      # FROM
    vek_to.append(a[2*i + 1])    # TO
a is the vector with all values, like a=[from, to, from, to,...]
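As a small aside, assuming a really is a flat Python list laid out like that, the same split can also be done with slicing:

vek_from = a[0::2]   # every even index: FROM values
vek_to = a[1::2]     # every odd index: TO values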
Now I'm trying to form a matrix out of this, like this:
mat_from_to = np.zeros(shape=(int(d), int(d)))
MAT = np.zeros(shape=(int(d), int(d)))
s = int(d - 1)
for i in range(s):
    mat_from_to[vek_from[i] - 2, vek_to[i] - 2] += 1
So the problem is that I don't know how to code it so that, when a load cycle occurs several times (i.e. it has the same FROM-TO values), +1 is added to that FROM-TO combination every time it happens. With what I've coded, it only replaces the previous value with 1, so I can never exceed 1...
To make the explanation shorter: whenever a FROM-TO combination occurs that determines the position of an element in the matrix, how do I add +1 at that position?
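For illustration, a minimal sketch of that kind of counting (the example data and matrix size below are made up; the -2 offset follows the indexing used above):

import numpy as np

vek_from = [3, 5, 3, 5, 3]            # hypothetical FROM values
vek_to   = [5, 2, 5, 4, 5]            # hypothetical TO values

n = max(max(vek_from), max(vek_to))   # matrix just large enough for these levels
mat_from_to = np.zeros((n, n), dtype=int)

for f, t in zip(vek_from, vek_to):
    mat_from_to[f - 2, t - 2] += 1    # the same cell is incremented on every repeat

print(mat_from_to)                    # the FROM=3, TO=5 cell holds 3, the other two combinations hold 1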
Hopefully I didn't make it too complicated and someone will be happy to help me with this.
Regards,
Luka
I keep encountering the same issue while trying to solve an Integer Programming problem with cvxpy, particularly with the constraints.
Some background on my problem and use case: I am trying to write a program that optimizes cut locations for 3D objects. The goal is to have as few interfaces as possible, but there is a constraint that each section can only have a certain maximum length.

To visualize, you could picture a tree. If you cut at the bottom, you only have to make one large cut, but if the tree is larger than the maximum allowed length (if you needed to move it with a trailer of a certain length, for example) you would need to make one or several more cuts along the tree. As you go further up, it is likely that in addition to the main stem of the tree, you would need to cut some smaller side branches along the same horizontal plane.

I have written a program that outputs the number of interfaces (or cuts) needed at many evenly spaced horizontal planes along the height of an object. Now I am trying to pass that data to a new piece of code that will perform an Integer Programming optimization to determine the best location(s) to cut the tree, treating each of the horizontal cutting planes as either active or inactive.
Below is my code:
# Create ideal configuration to solve for
config = cp.Variable(layer_split_number, boolean=True)

# Create objective
objective = sum(config * layer_islands_data[0])
problem = cp.Problem(cp.Minimize(objective),
                     [layer_height_constraint(layer_split_number, layer_height, config) <= ChunkingParameters.max_reach_z])

# Solve
problem.solve(solver=cp.GLPK_MI)
Layer Height Constraint function
def layer_height_constraint(layer_split_number, layer_height, config):
    # create array of the absolute height (relative to ground) of each layer
    layer_height_array = (np.array(range(1, layer_split_number + 1)) + 1) * layer_height
    # filter: set inactive cuts to 0
    active_heights = layer_height_array * config
    # filter out all 0's
    active_heights_trim = active_heights[active_heights != 0]
    # insert top and bottom values
    active_heights = np.append(active_heights, [(layer_split_number + 1) * layer_height])
    active_heights_trim = np.insert(active_heights, 0, 0)
    # take the difference between active cuts to find distance
    active_heights_diff = np.diff(active_heights_trim)
    # find the maximum of those differences
    max_height = max(active_heights_diff)
    return max_height
With this setup, I get the following error:
Cannot evaluate the truth value of a constraint or chain constraints, e.g., 1 >= x >= 0.
I know that the two problem spots are the use of the Python max function in the last step and the step in the middle where I filter out the 0s in the array (because this introduces another equality of sorts). However, I can't really think of another way to solve this or to set up the constraints differently. Is it possible to have cvxpy just accept a value into the constraint? My function is set up to output a single maximum distance value for a given configuration, so to me it would make sense if I could just feed it the configuration being tried (an array of 0s and 1s representing inactive and active cuts respectively) for the current iteration, and the function would return the result, which could then be compared to the maximum allowed distance. However, I'm pretty sure IP solvers are a bit more complex than just running a bunch of iterations, but I don't really know.
Any help on this or ideas would be greatly appreciated. I have tried an exhaustive search of the solution space, but when I have 10 or even 50+ cuts, an exhaustive search is super inefficient; I would need to try 2^n combinations for n potential cuts.
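For what it's worth, one way to sketch this kind of length limit without calling max() or filtering the cvxpy variable is to precompute, from the fixed layer geometry, which candidate cuts could cover each over-long gap, and add one linear covering constraint per gap. The snippet below is only an illustration with made-up numbers (layer_split_number, layer_height, max_reach_z and cut_cost are hypothetical stand-ins for the names in the question), not a drop-in replacement for the code above:

import numpy as np
import cvxpy as cp

# Hypothetical small instance
layer_split_number = 10
layer_height = 1.0
max_reach_z = 3.5
cut_cost = np.arange(1, layer_split_number + 1)     # stand-in for layer_islands_data[0]

heights = np.arange(1, layer_split_number + 1) * layer_height                  # candidate cut planes
points = np.concatenate(([0.0], heights, [(layer_split_number + 1) * layer_height]))  # add bottom and top

config = cp.Variable(layer_split_number, boolean=True)

# For every pair of positions (bottom, candidate planes, top) farther apart than
# max_reach_z, require at least one active cut strictly between them.  This is
# equivalent to "largest section length <= max_reach_z" and stays purely linear.
constraints = []
for i, p in enumerate(points):
    for q in points[i + 1:]:
        if q - p > max_reach_z:
            idx = np.where((heights > p) & (heights < q))[0]
            if idx.size:                                  # if empty, the gap is unavoidable
                constraints.append(cp.sum(config[idx]) >= 1)
            break                                         # larger q are covered by this same constraint

problem = cp.Problem(cp.Minimize(cut_cost @ config), constraints)
problem.solve(solver=cp.GLPK_MI)
print(config.value)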
I want to generate many randomized realizations of a low-discrepancy sequence with scipy.stats.qmc. I only know this way, which directly provides a randomized sequence:
from scipy.stats import qmc
ld = qmc.Sobol(d=2, scramble=True)
r = ld.random_base2(m=10)
But if I run
r = ld.random_base2(m=10)
twice, I get:
The balance properties of Sobol' points require n to be a power of 2. 2048 points have been previously generated, then: n=2048+2**10=3072. If you still want to do this, the function 'Sobol.random()' can be used.
It seems like using Sobol.random() is discouraged by the docs.
What I would like (and it should be faster) is to first get
ld = qmc.Sobol(d=2, scramble=False)
and then to generate, say, 1000 scramblings (or other randomizations) from this initial sequence.
That avoids having to regenerate the Sobol sequence for each sample and just does the scrambling.
How to do that?
It seems to me like this is the proper way to do many randomized QMC realizations, but I might be wrong and there might be other ways.
As the warning suggests, Sobol' is a sequence, meaning that each sample is linked to the previous ones. You have to respect the 2^m properties. It's perfectly fine to use Sobol.random() if you understand how to use it; this is why we created Sobol.random_base2(), which prints a warning if you try to do something that would break the properties of the sequence. Remember that with Sobol' you cannot skip 10 points and then sample 5, or do arbitrary things like that. If you do, you will not get the convergence rate guaranteed by Sobol'.
In your case, what you want to do is to reset the sequence between the draws (Sobol.reset). A new draw will be different from the previous one if scramble=True. Another way (using a non-scrambled sequence, for instance) is to sample 2^k points and skip the first 2^(k-1); then you can sample 2^n with n < k-1.
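For reference, a minimal sketch of the mechanics of both suggestions (the dimension, seed and exponents below are arbitrary; the calls are from scipy.stats.qmc, scipy >= 1.7):

from scipy.stats import qmc

# 1) reset a scrambled engine between draws
ld = qmc.Sobol(d=2, scramble=True, seed=42)
first = ld.random_base2(m=10)
ld.reset()                        # rewind the engine to its base state
second = ld.random_base2(m=10)    # drawn from the start again, no power-of-2 warning

# 2) with an unscrambled sequence: skip the first 2^(k-1) points, then sample 2^n with n < k-1
ld2 = qmc.Sobol(d=2, scramble=False)
ld2.fast_forward(2 ** 9)          # skip the first 2^(k-1) = 512 points
block = ld2.random(2 ** 8)        # then sample 2^n = 256 points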
I'm building a web app to match high school students considering a gap year with students who have taken a gap year, based on interests as denoted by tags. A prototype is up at covidgapyears.com. I have never written a matching/recommendation algorithm, so although people have suggested things like collaborative filtering, association rule mining, or adapting the stable marriage problem, I don't think any of those will work because it's a small dataset (a few hundred users right now, a few thousand soon). So I wrote my own algorithm using common sense.
It essentially takes in a list of tags that the student is interested in, then searches for an exact match of those tags with someone who has taken a gap year and registered with the site (and who also selected tags on registration). An exactMatch, as given below, is when the tags the user specifies are ALL contained by some profile (i.e., are a subset). If it can't find an exact match with ALL of the user's inputted tags, it checks all (n-1)-length subsets of the tags list to see if any less selective queries have matches. It does this recursively until at least 3 matches are found. While it works fine for small tag selections (up to 5-7), it gets slow for larger tag selections (7-13), taking several seconds to return a result. When 11-13 tags are selected, it hits a Heroku error due to worker timeout.
I did some tests by putting variables inside the algorithm to count computations, and it seems that when it goes a bit deep into the recursive stack, it checks a few hundred subsets each time (to see if there's an exactMatch for that subset and, if there is, add it to the results list to output), and the total number of computations doubles as you add one more tag (it went 54, 150, 270, 500, 1000, 1900, 3400 operations for more and more tags). It is true that there are a few hundred subsets at each depth. But exactMatches is O(1) as I've written it (no iteration), and aside from the other O(1) operations like IF, the FOR inside the subset loop will, at most, be gone through around 10 times. This agrees with the measured result of a few thousand computations each time.
This did not surprise me, as selecting and iterating over all subsets seems like something that gets harder non-linearly, but my question is about why it's so slow despite only doing a few thousand computations. I know my computer operates at GHz speeds and I expect web servers are similar, so surely a few thousand computations would be near-instantaneous? What am I missing, and how can I improve this algorithm? Any other approaches I should look into?
# takes in a list of length n and returns a list of all combos of subsets of depth n
def arbSubsets(seq, n):
    return list(itertools.combinations(seq, len(seq) - n))

# takes in a tagsList and checks Gapper.objects.all() to see if any gapper has all those tags
def exactMatches(tagsList):
    tagsSet = set(tagsList)
    exactMatches = []
    for gapper in Gapper.objects.all():
        gapperSet = set(gapper.tags.names())
        if tagsSet.issubset(gapperSet):
            exactMatches.append(gapper)
    return exactMatches
# takes in a tagsList that has been cleaned to remove any tags that NO gappers have,
# then checks gapper objects to find the optimal match
def matchGapper(tagsList, depth, results):
    # handles the case where we're only given tags contained by no gappers
    if depth == len(tagsList):
        return []

    # counter variable is to measure complexity for debugging
    counter += 1

    # we don't want too many results or it stops feeling tailored
    upper_limit_results = 3

    # now we must check subsets for a match
    subsets = arbSubsets(tagsList, depth)
    for subset in subsets:
        counter += 1
        matches = exactMatches(subset)
        if matches:
            for match in matches:
                counter += 1
                # need to check because we might be adding depth-2 results to depth-1 results,
                # which we didn't do before, to make sure we have at least 3 results
                if match not in results:
                    # don't want to show too many or it doesn't feel tailored anymore
                    counter += 1
                    if len(results) > upper_limit_results:
                        break
                    results.append(match)

    # always give at least 3 results
    if len(results) > 2:
        return results
    else:
        # check one level deeper (less specific) into the tags if there aren't enough matching gappers
        counter += 1
        return matchGapper(tagsList, depth + 1, results)

# this is the list of matches we then return to the user
matches = matchGapper(tagsList, 0, [])
It doesn't seem like you are doing just a few hundred computation steps. In fact, you have a few hundred options at each depth, so you should not add but multiply the number of steps at each depth to estimate the complexity of your solution.
Additionally, the statement "or adapting the stable marriage problem, I don't think any of those will work because it's a small dataset" is also not true. Although these algorithms may be overkill for some very simple cases, they are still valid and will work for them.
Okay, so after much fiddling with timers I've figured it out. There are a few functions at play when matching: exactMatches, matchGapper and arbSubsets. When I put the counter into a global variable and measured operations (measured as lines of my code being executed), it came in at around 2-10K for large inputs (around 10 tags).
It is true that arbSubsets, which returns a list of subsets, at first seems like a plausible bottleneck. But if you look closely, we are 1) handling small numbers of tags (on the order of 10-50) and, more importantly, 2) only calling arbSubsets when we recurse into matchGapper, which only happens a maximum of about 10 times, since tagsList can only be around 10 long (order of 10-50, as above). And when I checked the time it took to generate arbSubsets, it was on the order of 2e-5 s, so the total time spent generating the subsets of arbitrary size is only about 2e-4 s. In other words, not the source of the 5-30 second waiting time in the web app.
With that aside, knowing that arbSubsets is only called on the order of 10 times and is fast at that, and knowing that only around 10K computations take place in my code, it starts to become clear that I must be using some out-of-the-box function, I don't know--like set() or .issubset() or something like that--that takes a nontrivial amount of time to compute and is executed many times. Adding counters in a few more places, it becomes clear that exactMatches() accounts for around 95-99% of all computations that take place (as would be expected if we have to check all combinations of subsets of various sizes for exactMatches).
So the problem, at this point, is reduced to the fact that exactMatches takes around 0.02 s (empirically) as implemented, and is called several thousand times. We can either try to make it faster by a couple of orders of magnitude (it's already pretty optimal), or take another approach that doesn't involve finding matches using subsets. A friend of mine suggested creating a dict with all the combinations of tags (so 2^len(tagsList) keys) and setting them equal to lists of registered profiles with that exact combination. This way, querying is just a lookup in a (huge) dict, which can be done fast. Any other suggestions are welcome.
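For illustration, a rough sketch of that dict idea (the profiles list and its tag sets are made-up stand-ins for Gapper.objects.all(); note the index grows with 2^(tags per profile), so it only stays practical while profiles carry few tags):

from collections import defaultdict
from itertools import combinations

def build_tag_index(profiles):
    """Map every combination of a profile's tags to the profiles containing it."""
    index = defaultdict(list)
    for profile in profiles:
        tags = sorted(profile["tags"])
        for r in range(1, len(tags) + 1):
            for combo in combinations(tags, r):
                index[combo].append(profile)
    return index

def exact_matches(index, tags_list):
    """Single dict lookup instead of scanning every profile for every subset."""
    return index.get(tuple(sorted(tags_list)), [])

profiles = [
    {"name": "A", "tags": {"travel", "volunteering"}},
    {"name": "B", "tags": {"travel", "research", "volunteering"}},
]
index = build_tag_index(profiles)
print([p["name"] for p in exact_matches(index, ["travel", "volunteering"])])   # ['A', 'B']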
I have written some code in Python for the CRP problem. The problem itself can be found here:
http://cog.brown.edu/~mj/classes/cg168/slides/ChineseRestaurants.pdf
And to give a short description of it:
Suppose we want to assign people entering a restaurant to a potentially infinite number of tables. If $z_i$ represents the random variable assigned to the $i$'th person entering the restaurant, the following should hold:
With probability $p(z_i=a \mid z_1,\dots,z_{i-1})=\frac{n_a}{i-1+\alpha}$ for $n_a>0$ (where $n_a$ is the number of people already sitting at table $a$), the $i$'th person will sit at table $a$, and with probability $p(z_i=a \mid z_1,\dots,z_{i-1})=\frac{\alpha}{i-1+\alpha}$ the $i$'th person will sit at a new table.
I am not quite sure if my code is correct, because I am surprised at how small the final number of tables is.
I would be happy if somebody could say whether the implementation is correct and, if so, whether there are any possible improvements.
import numpy as np

def CRP(alpha, N):
    """Chinese Restaurant Process with alpha as concentration parameter and N
    the number of samples."""
    # Array which will save, for each i, the number of people sitting
    # at tables up to table i
    summed = np.ones(1)  # first person assigned to the first table
    for i in range(1, N):
        # A loop that assigns the people to tables
        # randind represents the random number from the interval [1, i-1+alpha]
        randind = (float(i) + alpha) * np.random.uniform(low=0.0, high=1.0, size=1)
        # update is the index of the table where the person should be placed;
        # if it is greater than the total number of tables, the person is placed at a new table
        update = np.searchsorted(summed, randind, side='left')
        if randind > i:
            summed = np.append(summed, i + 1)
        else:
            zerovec = np.zeros(update)
            onevec = np.ones(summed.size - update)
            summed += np.append(zerovec, onevec)
    # This part converts the summed array into a tables array which holds the number
    # of persons assigned to each table
    tables = np.zeros(summed.size)
    tables[0] = summed[0]
    for i in range(1, summed.size):
        tables[i] = summed[i] - summed[i - 1]
    return tables

a = CRP(0.9999, 1000)
print(a)
Suggestion. Forget about the code you have written. Construct declarative tests of the code. By taking that approach, you start with examples for which you know the correct answer. That would have answered Brainiac's question, for example.
Then write your program. You will likely find that if you start approaching problems this way, you may create sub-problems first, for which you can also write tests. Until they all pass, there is no need to rush on to the full problem.
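For instance, a sketch of two such declarative tests against known properties of the process (this assumes the CRP function from the question is defined; the run count and tolerance are arbitrary):

import numpy as np

def test_total_people(alpha=0.9999, N=1000):
    tables = CRP(alpha, N)
    # every person must be seated exactly once
    assert tables.sum() == N

def test_expected_number_of_tables(alpha=0.9999, N=1000, runs=200):
    # known result: E[number of tables] = sum_{i=1}^{N} alpha / (alpha + i - 1)
    expected = sum(alpha / (alpha + i - 1) for i in range(1, N + 1))
    observed = np.mean([np.count_nonzero(CRP(alpha, N)) for _ in range(runs)])
    # stochastic check, so use a loose tolerance
    assert abs(observed - expected) / expected < 0.2

test_total_people()
test_expected_number_of_tables()

Incidentally, with alpha close to 1 and N = 1000 that formula gives an expected table count of only about 7.5, so a small final number of tables is not by itself a sign of a bug.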
Hi, I have done some research and I believe I was heading in the right direction when I ended up at this thread:
http://code.activestate.com/recipes/117241/
Basically my question is: what is the code in the link doing line by line. You could potentially ignore all that I wrote below if your explanation makes me understand what the code in the link does to a satisfactory extent.
I BELIEVE that the code at that link generates a random number BUT the random number is directly related to the probability.
In my own code I am attempting to take a "number" and its probability of appearing, and get an output "number", that will appear according to the probability. I know this is confusing but if you look at the link above then I hope it will be clear what I am trying to do. My code below is in reference to the link above.
so in my program, these are my global variables:
HIGH= 3
MED= 2
LOW= 1
This is the list I am working with:
n = [(LOW, lowAttackProb), (MED, medAttackProb), (HIGH, highAttackProb)]
# lowAttackProb, medAttackProb, etc. are based on user input and are just percentages converted to decimals that add up to 1 in every case
This is how I implemented the random code as per the link above:
x = random.uniform(0, 1)
for alevel, probability in n:
    if x < probability:
        break
    x = x - probability
return alevel
I am unsure exactly what is happening inside the for loop and what x=x-probability is doing.
Let's say that x = 0.90, and that in my list the chance of the second list entry occurring is 0.60. Then, since x < probability is False (I'm not too sure what "if x < probability" even does), the code moves on to x = x - probability.
I really hope this makes sense. If it does not please let me know what is unclear and I will try to fix it up. Thank you for any and all help.
This code implements the selection of an event, taking the probabilities of the possible events into account. Here is the idea behind it.
There are three events (or levels, as you call them), LOW, MED, HIGH, each with a certain nonzero probability, and all probabilities sum up to exactly 1. Using standard means of Python one can generate a random number between 0 and 1. So how can we "map" them to each other? Let's align our probabilities (let's call them L, M, and H for brevity) along the number line the following way:
0__________________L______________L+M_________________________L+M+H ( = 1)
Now, taking our randomly generated number x, we can say that:
If x lies in the interval [0, L], then the first event occurred.
If x lies in the half-interval (L, L+M], then the second event occurred.
If x lies in the half-interval (L+M, L+M+H], then the third event occurred.
The code you are asking about simply matches x to one of the intervals and returns the corresponding event (or level).
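A small, self-contained sketch of that interval idea (the probabilities below are made-up example values):

import random

HIGH, MED, LOW = 3, 2, 1
events = [(LOW, 0.5), (MED, 0.3), (HIGH, 0.2)]   # probabilities sum to 1

def pick(events):
    x = random.uniform(0, 1)
    for level, probability in events:
        if x < probability:
            return level            # x fell inside this event's slice of [0, 1)
        x -= probability            # shift x past this slice and test the next one
    return events[-1][0]            # guard against floating-point round-off

counts = {LOW: 0, MED: 0, HIGH: 0}
for _ in range(10000):
    counts[pick(events)] += 1
print(counts)                        # roughly 5000 / 3000 / 2000 draws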