I want to generate many randomized realizations of a low-discrepancy sequence using scipy.stats.qmc. The only way I know directly provides a randomized sequence:
from scipy.stats import qmc
ld = qmc.Sobol(d=2, scramble=True)
r = ld.random_base2(m=10)
But if I run
r = ld.random_base2(m=10)
twice I get
The balance properties of Sobol' points require n to be a power of 2. 2048 points have been previously generated, then: n=2048+2**10=3072. If you still want to do this, the function 'Sobol.random()' can be used.
Judging from the documentation, using Sobol.random() seems to be discouraged.
What I would like to do (and it should be faster) is to first create
ld = qmc.Sobol(d=2, scramble=False)
and then generate, say, 1000 scramblings (or other randomizations) from this initial sequence.
That would avoid regenerating the Sobol' sequence for each sample; only the scrambling would be redone.
How can I do that?
It seems to me that this is the proper way to do many randomized QMC runs, but I might be wrong and there might be other ways.
As the warning suggests, Sobol' is a sequence, meaning that each sample is linked to the previous ones. You have to respect the 2^m property. It's perfectly fine to use Sobol.random() if you understand how to use it; this is why we created Sobol.random_base2(), which prints a warning if you try to do something that would break the properties of the sequence. Remember that with Sobol' you cannot skip 10 points and then sample 5, or do arbitrary things like that. If you do, you will not get the convergence rate guaranteed by Sobol'.
In your case, what you want to do is to reset the sequence between the draws (Sobol.reset). A new draw will be different from the previous one if scramble=True. Another way (using a non scrambled sequence for instance) is to sample 2^k and skip the first 2^(k-1) points then you can sample 2^n with n<k-1.
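A minimal sketch of one way to obtain many scrambled realizations, assuming it is acceptable to build a fresh scrambled engine per realization instead of calling reset on a single one. The helper name scrambled_realizations and its parameters are illustrative and not part of SciPy.

```python
from scipy.stats import qmc


def scrambled_realizations(n_realizations, d=2, m=10):
    """Return a list of scrambled Sobol' samples, each with 2**m points."""
    samples = []
    for _ in range(n_realizations):
        # Assumption: every new engine draws its own scrambling randomness,
        # so each realization starts from the beginning of the sequence.
        engine = qmc.Sobol(d=d, scramble=True)
        samples.append(engine.random_base2(m=m))
    return samples


realizations = scrambled_realizations(n_realizations=5)
```

Each realization contains exactly 2**m points, so the balance properties mentioned in the warning are respected within every sample.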
I'm currently coding an algorithm for the 4-parametric RAINFLOW method. The idea of this method is to eliminate load cycles from a cycle history, which is normally given as a load (for example force) versus time diagram. This method is used very frequently in mechanical engineering to determine the life span of a product or element that is exposed to a certain number of load cycles.
The result of this method is a so-called FROM-TO table or FROM-TO matrix, where the rows represent the FROM value and the columns represent the TO value, as shown in the picture below:
example of from-to table/matrix
This example is not realistic, as you normally get a file with millions of measurement points, which means that some cycles won't occur only once (1) or twice (2) as shown in the table, but may occur thousands of times.
Now to the problem:
I coded the algorithm of the method and as a result formed a vector with FROM values and a vector with TO values, like this:
vek_from=[]
vek_to=[]
d=len(a)/2
for i in range(int(d)):
    vek_from.append(a[2*i]) # FROM
    vek_to.append(a[2*i+1]) # TO
a is the vector with all values, like a=[from, to, from, to,...]
Now I'm trying to form a matrix out of this, like this:
mat_from_to = np.zeros(shape=(int(d),int(d)))
MAT = np.zeros(shape=(int(d),int(d)))
s=int(d-1)
for i in range(s):
    mat_from_to[vek_from[i]-2, vek_to[i]-2] += 1
The problem is that I don't know how to add +1 to a FROM-TO combination every time a load cycle with the same from-to values occurs again. With what I've coded, it seems to only replace the previous value with 1, so I can never exceed 1...
To make the explanation shorter: whenever a combination of FROM-TO values determines the position of an element in the matrix, how do I add +1 there?
Hopefully I didn't make it too complicated and someone will be happy to help me with this.
Regards,
Luka
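One way to accumulate such counts, sketched under the assumption that the load levels are integers starting at a known minimum (2 here, matching the -2 offset in the code above) and that a has even length. The function name from_to_matrix and the parameters min_level and n_levels are illustrative. np.add.at is used because a plain fancy-index assignment such as mat[rows, cols] += 1 counts a repeated (row, col) pair only once.

```python
import numpy as np


def from_to_matrix(a, min_level, n_levels):
    """Count how often each FROM-TO pair occurs in a = [from, to, from, to, ...]."""
    a = np.asarray(a)
    rows = a[0::2] - min_level  # FROM values shifted to 0-based indices
    cols = a[1::2] - min_level  # TO values shifted to 0-based indices
    mat = np.zeros((n_levels, n_levels), dtype=int)
    # Unbuffered in-place add: repeated index pairs are accumulated.
    np.add.at(mat, (rows, cols), 1)
    return mat


a = [2, 5, 3, 4, 2, 5, 2, 5]
mat = from_to_matrix(a, min_level=2, n_levels=4)
```

Here the cycle from 2 to 5 appears three times in a, so the corresponding cell mat[0, 3] ends up holding 3. Note that the matrix is sized by the number of load levels, not by the number of cycles.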
I am trying to make an algorithm that's able to solve a big tileset in the tiling problem. Right now it's able to find the correct Tiles to place based on their width and height, but there are some problems with making it properly recursive.
As you can see the idea is that after each tile that's placed the field will be separated in a Field Right and a Field Below. The algorithm will first try to fill the Field Right and as soon as that's done it has to start trying to fill Field Below.
The problem I have is that once Field Right is solved, control has to be "sent back" in some way (I think using recursion, though this is quite complex) so that the algorithm goes back one Tile and continues with the Field Below that belongs to that Tile. I put the idea in some pseudocode to make it a bit easier to follow.
As you can see when FieldRightWidth is solved and the FieldBelowHeight is also solved I want to make it return to the previous tile to check if FieldBelow is solved. I think that's where I need to put some code to make this work, but after hours of Googling I still have no clue.
Pseudocode:
def Methods:
    create global tileset
    create local tileset (without duplicates)
    If globaltiles is empty:
        solved
        end
    else:
        if fieldRightWidth == solved:
            if fieldBelowHeight == solved:
                return True???
            #FIELD BELOW
            else:
                Search for Tile
                Place Tile
                Return Methods
        #FIELD RIGHT
        else:
            Search for Tile
            Place Tile
            Return Methods
And a picture of what I want the algorithm to do:
And all of the code:
http://pastebin.com/8t4PeiZP
http://www.filedropper.com/tilingnew
I'm still a newbie in coding, so any advice or help is much appreciated!
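The "fill Field Right, then go back and fill Field Below" idea can be sketched with a recursive generator, so that returning to the previous tile happens automatically when the inner call finishes. This is only a sketch under stated assumptions: tiles are (width, height) tuples that are not rotated, every tile is placed in the top-left corner of its field, and the names fill and solve are illustrative. Because of the fixed right/below split it does not explore every possible tiling.

```python
from collections import Counter


def fill(width, height, tiles):
    """Yield the tiles left over after each exact tiling of a width x height field.

    tiles is a Counter mapping (w, h) -> number of such tiles still available.
    """
    if width == 0 or height == 0:
        yield tiles  # an empty field is already solved
        return
    for (w, h), count in list(tiles.items()):
        if count == 0 or w > width or h > height:
            continue
        remaining = tiles.copy()
        remaining[(w, h)] -= 1
        # Field Right: next to the placed tile, as high as the tile.
        for after_right in fill(width - w, h, remaining):
            # Field Below: under the placed tile, as wide as the whole field.
            yield from fill(width, height - h, after_right)


def solve(width, height, tiles):
    """Return True if the field can be tiled exactly using all tiles."""
    for leftover in fill(width, height, Counter(tiles)):
        if sum(leftover.values()) == 0:
            return True
    return False
```

For example, solve(3, 2, [(2, 2), (1, 2)]) places the 2x2 tile, fills the 1x2 Field Right with the second tile, and then finds the Field Below empty. If a branch fails, the loop simply tries the next tile, which gives the backtracking for free.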
Alright, let's assume the area you want to fill is a square or a rectangle (not rotated) that starts at the minimum [x,y] and ends at the maximum [x,y], like so:
SMaxX = 5
SMinX = 0
SMaxY = 5
SMinY = 0
Or, if you are familiar with 2D vectors, you can simplify it like so:
S = [5,5]
You might already know about 2D vectors; just in case, here is what a vector means in 2D Cartesian coordinates:
S = [5,5] means that if S starts from [0,0], it ends at [5,5] (simpler, right?)
The boxes can be described the same way:
#space each box takes
box1 = [3,3]
box2 = [2,2]
box3 = [1,1]
And since each box has a priority, let's say:
#priority of each box
box1 = 6
box2 = 4
box3 = 2
We can merge both space and priority into a dictionary like so:
#Items
dic = {'box1':{'space':[3,3] , 'priority':6},
       'box2':{'space':[2,2] , 'priority':4},
       'box3':{'space':[1,1] , 'priority':2}}
With a priority and a space for each box, this looks like the Knapsack problem.
If you are familiar with the Knapsack problem algorithm: in a table we try to find the highest total priority that fills the space perfectly, or in other words the best possible way of fitting the boxes. Check this link1 and link2.
However, the Knapsack table is a 1D solution. If you apply it here, you will get 10, so Box1 and Box2. But since your problem is 2D and the boxes have different heights and widths, the standard 1D formula won't work. Maybe you need to look into it and see if you can come up with a 2D formula, or ask around to see if someone has done that before.
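To show where the value 10 comes from, here is a small 0/1 knapsack sketch. It assumes that the 1D size of each box is its side length (3, 2 and 1) and that the capacity is the side length of the area (5); the function name knapsack and the (name, size, priority) tuples are illustrative.

```python
def knapsack(items, capacity):
    """0/1 knapsack over items given as (name, size, priority) tuples.

    Returns (best total priority, names of the chosen items).
    """
    # best[c] holds the best (priority, names) found for capacity c so far.
    best = [(0, [])] * (capacity + 1)
    for name, size, priority in items:
        # Go through capacities downwards so each item is used at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + priority
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[capacity]


items = [('box1', 3, 6), ('box2', 2, 4), ('box3', 1, 2)]
total, chosen = knapsack(items, capacity=5)
```

With these numbers total is 10 and chosen is ['box1', 'box2'], which matches the result mentioned above.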
Other than the Knapsack problem algorithm, you can try a Flood fill algorithm, which is a bit slower if you have a huge area, but it works just like the game Tetris.
You need to set a standard cell size like 1x1, define the whole area as 1x1 cells, store it in a variable and set each cell to True (Boolean). Then fill the area with the boxes, highest priority first, and set the covered 1x1 cells to False. After that you can easily check how many cells are still True and which area they cover.
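A minimal sketch of the 1x1 cell idea described above. It assumes boxes are placed at the first free position found when scanning row by row, with no rotation; the names find_spot and place_boxes are illustrative, and the placement rule is only one of many possible choices.

```python
def find_spot(free, bw, bh):
    """Return the first (x, y) where a bw x bh box fits on free cells, or None."""
    height, width = len(free), len(free[0])
    for y in range(height - bh + 1):
        for x in range(width - bw + 1):
            if all(free[y + dy][x + dx] for dy in range(bh) for dx in range(bw)):
                return x, y
    return None


def place_boxes(area, boxes):
    """Place boxes by descending priority; return positions and the cell grid."""
    width, height = area
    free = [[True] * width for _ in range(height)]  # True means the cell is empty
    placed = {}
    order = sorted(boxes, key=lambda name: boxes[name]['priority'], reverse=True)
    for name in order:
        bw, bh = boxes[name]['space']
        spot = find_spot(free, bw, bh)
        if spot is None:
            continue  # this box does not fit anywhere
        x, y = spot
        for dy in range(bh):
            for dx in range(bw):
                free[y + dy][x + dx] = False
        placed[name] = spot
    return placed, free


boxes = {'box1': {'space': [3, 3], 'priority': 6},
         'box2': {'space': [2, 2], 'priority': 4},
         'box3': {'space': [1, 1], 'priority': 2}}
placed, free = place_boxes([5, 5], boxes)
free_cells = sum(cell for row in free for cell in row)
```

For the 5x5 area all three boxes fit, and 11 of the 25 cells stay True.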
Anyway, I'm trying to figure out the same thing for irregular shapes, so that is all I found out. I hope it helps you.
(Check this link as well, I got some useful answers.)
Edit: okay, if you use the Tetris idea to define the area, apply the Knapsack problem algorithm along one axis, and then, based on the standard Tetris area, apply the Knapsack problem algorithm again along the other axis, it should work perfectly.
I am trying to solve a problem related to graph theory but can't seem to remember/find/understand the proper/best approach, so I figured I'd ask the experts...
I have a list of paths between two nodes (1 and 10 in the example code). I'm trying to find the minimum number of nodes to remove to cut all paths. I'm also only able to remove certain nodes.
I currently have it implemented (below) as a brute force search. This works fine on my test set but is going to be an issue when scaling up to graphs that have on the order of 100K paths and 100 available nodes (factorial issue). Right now, I don't care about the order in which I remove nodes, but at some point I will want to take that into account (switching sets to lists in the code below).
I believe there should be a way to solve this using a max flow/min cut algorithm. Everything I'm reading, though, is going way over my head in some way. It's been several (SEVERAL) years since I did this type of stuff and I can't seem to remember anything.
So my questions are:
1) Is there a better way to solve this problem other than testing all combinations and taking the smallest set?
2) If so, can you either explain it or, preferably, give pseudocode to help explain? I'm guessing there is probably a library that already does this in some way (I have been looking at and using networkX lately but am open to others).
3) If not (or even if so), do you have suggestions for how to multithread/multiprocess the solution? I want to get every bit of performance I can from the computer. (I have found a few good threads on this question; I just haven't had a chance to implement them, so I figured I'd ask at the same time just in case. I first want to get everything working properly before optimizing.)
4) General suggestions on making the code more "Pythonic" (which will probably help with performance too). I know there are improvements I can make and I am still new to Python.
Thanks for the help.
#!/usr/bin/env python
def bruteForcePaths(paths, availableNodes, setsTested, testCombination, results, loopId):
    #for each node available, we are going to
    # check if we have already tested set with node
    #   if true- move to next node
    #   if false- remove the paths effected,
    #     if there are paths left,
    #       record combo, continue removing with current combo,
    #     if there are no paths left,
    #       record success, record combo, continue to next node
    #local copy
    currentPaths = list(paths)
    currentAvailableNodes = list(availableNodes)
    currentSetsTested = set(setsTested)
    currentTestCombination= set(testCombination)
    currentLoopId = loopId+1
    print("loop ID: %d" % (currentLoopId))
    print("currentAvailableNodes:")
    for set1 in currentAvailableNodes:
        print(" %s" % (set1))
    for node in currentAvailableNodes:
        #add to the current test set
        print("%d-current node: %s current combo: %s" % (currentLoopId, node, currentTestCombination))
        currentTestCombination.add(node)
        # print("Testing: %s" % currentTestCombination)
        # print("Sets tested:")
        # for set1 in currentSetsTested:
        #     print(" %s" % set1)
        if currentTestCombination in currentSetsTested:
            #we already tested this combination of nodes so go to next node
            print("Already test: %s" % currentTestCombination)
            currentTestCombination.remove(node)
            continue
        #get all the paths that don't have node in it
        currentRemainingPaths = [path for path in currentPaths if not (node in path)]
        #if there are no paths left
        if len(currentRemainingPaths) == 0:
            #save this combination
            print("successful combination: %s" % currentTestCombination)
            results.append(frozenset(currentTestCombination))
            #add to remember we tested combo
            currentSetsTested.add(frozenset(currentTestCombination))
            #now remove the node that was add, and go to the next one
            currentTestCombination.remove(node)
        else:
            #this combo didn't work, save it so we don't test it again
            currentSetsTested.add(frozenset(currentTestCombination))
            newAvailableNodes = list(currentAvailableNodes)
            newAvailableNodes.remove(node)
            bruteForcePaths(currentRemainingPaths,
                            newAvailableNodes,
                            currentSetsTested,
                            currentTestCombination,
                            results,
                            currentLoopId)
            currentTestCombination.remove(node)
    print("-------------------")
    #need to pass "up" the tested sets from this loop
    setsTested.update(currentSetsTested)
    return None
if __name__ == '__main__':
    testPaths = [
        [1,2,14,15,16,18,9,10],
        [1,2,24,25,26,28,9,10],
        [1,2,34,35,36,38,9,10],
        [1,3,44,45,46,48,9,10],
        [1,3,54,55,56,58,9,10],
        [1,3,64,65,66,68,9,10],
        [1,2,14,15,16,7,10],
        [1,2,24,7,10],
        [1,3,34,35,7,10],
        [1,3,44,35,6,10],
    ]
    setsTested = set()
    availableNodes = [2, 3, 6, 7, 9]
    results = list()
    currentTestCombination = set()
    bruteForcePaths(testPaths, availableNodes, setsTested, currentTestCombination, results, 0)
    print("results:")
    for result in sorted(results, key=len):
        print(result)
UPDATE:
I reworked the code using itertools to generate the combinations. It makes the code cleaner and faster (and should be easier to multiprocess). Now to try to figure out the dominated nodes as suggested and to multiprocess the function.
from itertools import combinations

def bruteForcePaths3(paths, availableNodes, results):
    #start by taking the nodes 1 at a time, then 2, then 3, etc
    for i in range(1,len(availableNodes)+1):
        print("combo number: %d" % i)
        currentCombos = combinations(availableNodes, i)
        for combo in currentCombos:
            #get a fresh copy of paths for this combination
            currentPaths = list(paths)
            currentRemainingPaths = []
            # print(combo)
            for node in combo:
                #determine better way to remove nodes, for now- if it's in, we remove
                currentRemainingPaths = [path for path in currentPaths if not (node in path)]
                currentPaths = currentRemainingPaths
            #if there are no paths left
            if len(currentRemainingPaths) == 0:
                #save this combination
                print(combo)
                results.append(frozenset(combo))
    return None
Here is an answer which ignores the list of paths. It just takes a network, a source node, and a target node, and finds the minimum set of nodes within the network, not either source or target, so that removing these nodes disconnects the source from the target.
If I wanted to find the minimum set of edges, I could find out how just by searching for Max-Flow min-cut. Note that the Wikipedia article at http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem#Generalized_max-flow_min-cut_theorem states that there is a generalized max-flow min-cut theorem which considers vertex capacity as well as edge capacity, which is at least encouraging. Note also that edge capacities are given as Cuv, where Cuv is the maximum capacity from u to v. In the diagram they seem to be drawn as u/v. So the edge capacity in the forward direction can be different from the edge capacity in the backward direction.
To disguise a minimum vertex cut problem as a minimum edge cut problem I propose to make use of this asymmetry. First of all give all the existing edges a huge capacity - for example 100 times the number of nodes in the graph. Now replace every vertex X with two vertices Xi and Xo, which I will call the incoming and outgoing vertices. For every edge between X and Y create an edge between Xo and Yi with the existing capacity going forwards but 0 capacity going backwards - these are one-way edges. Now create an edge between Xi and Xo for each X with capacity 1 going forwards and capacity 0 going backwards.
Now run max-flow min-cut on the resulting graph. Because all the original links have huge capacity, the min cut must all be made up of the capacity 1 links (actually the min cut is defined as a division of the set of nodes into two: what you really want is the set of pairs of nodes Xi, Xo with Xi in one half and Xo in the other half, but you can easily get one from the other). If you break these links you disconnect the graph into two parts, as with standard max-flow min-cut, so deleting these nodes will disconnect the source from the target. Because you have the minimum cut, this is the smallest such set of nodes.
If you can find code for max-flow min-cut, such as those pointed to by http://www.cs.sunysb.edu/~algorith/files/network-flow.shtml, I would expect that it will give you the min-cut. If not, for instance if you do it by solving a linear programming problem because you happen to have a linear programming solver handy, notice for example from http://www.cse.yorku.ca/~aaw/Wang/MaxFlowMinCutAlg.html that one half of the min cut is the set of nodes reachable from the source when the graph has been modified to subtract out the edge capacities actually used by the solution - so given just the edge capacities used at max flow you can find it pretty easily.
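The vertex-splitting construction described above can be sketched as follows. This is a self-contained illustration, not a tuned implementation. It assumes the network consists of the directed edges between consecutive nodes of the given paths, it gives nodes that may not be removed the same huge capacity as the edges, and it uses a plain breadth-first augmenting-path search (Edmonds-Karp). The name min_vertex_cut and the 'in'/'out' labels are illustrative.

```python
from collections import defaultdict, deque


def min_vertex_cut(paths, source, target, removable):
    """Smallest set of removable nodes whose removal separates source from target."""
    edges = {(p[i], p[i + 1]) for p in paths for i in range(len(p) - 1)}
    nodes = {n for edge in edges for n in edge}
    big = 100 * len(nodes)  # "huge" capacity for everything that must not be cut

    # cap[u][v] is the residual capacity of the arc u -> v.
    cap = defaultdict(lambda: defaultdict(int))
    for n in nodes:
        # Split n into (n, 'in') -> (n, 'out'); capacity 1 if n may be removed.
        cap[(n, 'in')][(n, 'out')] = 1 if n in removable else big
    for u, v in edges:
        cap[(u, 'out')][(v, 'in')] = big  # one-way edge, 0 capacity backwards

    s, t = (source, 'in'), (target, 'out')
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # parent now holds every node still reachable from s
        # Collect the path back from t, then push the bottleneck flow along it.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck

    if flow >= big:
        return None  # no cut made of removable nodes only
    # Cut nodes: the 'in' half is reachable from s, the 'out' half is not.
    return {n for n in nodes if (n, 'in') in parent and (n, 'out') not in parent}


testPaths = [
    [1, 2, 14, 15, 16, 18, 9, 10], [1, 2, 24, 25, 26, 28, 9, 10],
    [1, 2, 34, 35, 36, 38, 9, 10], [1, 3, 44, 45, 46, 48, 9, 10],
    [1, 3, 54, 55, 56, 58, 9, 10], [1, 3, 64, 65, 66, 68, 9, 10],
    [1, 2, 14, 15, 16, 7, 10], [1, 2, 24, 7, 10],
    [1, 3, 34, 35, 7, 10], [1, 3, 44, 35, 6, 10],
]
cut = min_vertex_cut(testPaths, source=1, target=10, removable={2, 3, 6, 7, 9})
```

On the example from the question this gives the cut {2, 3}. Keep in mind that the network built from the paths can contain routes that are not in the original list.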
If the paths were not provided as part of the problem I would agree that there should be some way to do this via http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem, given a sufficiently ingenious network construction. However, because you haven't given any indication as to what is a reasonable path and what is not I am left to worry that a sufficiently malicious opponent might be able to find strange collections of paths which don't arise from any possible network.
In the worst case, this might make your problem as difficult as http://en.wikipedia.org/wiki/Set_cover_problem, in the sense that somebody, given a problem in Set Cover, might be able to find a set of paths and nodes that produces a path-cut problem whose solution can be turned into a solution of the original Set Cover problem.
If so - and I haven't even attempted to prove it - your problem is NP-Complete, but since you have only 100 nodes it is possible that some of the many papers you can find on Set Cover will point at an approach that will work in practice, or can provide a good enough approximation for you. Apart from the Wikipedia article, http://www.cs.sunysb.edu/~algorith/files/set-cover.shtml points you at two implementations, and a quick search finds the following summary at the start of a paper in http://www.ise.ufl.edu/glan/files/2011/12/EJORpaper.pdf:
The SCP is an NP-hard problem in the strong sense (Garey and Johnson, 1979) and many algorithms
have been developed for solving the SCP. The exact algorithms (Fisher and Kedia, 1990; Beasley and
Jørnsten, 1992; Balas and Carrera, 1996) are mostly based on branch-and-bound and branch-and-cut.
Caprara et al. (2000) compared different exact algorithms for the SCP. They show that the best exact
algorithm for the SCP is CPLEX. Since exact methods require substantial computational effort to solve
large-scale SCP instances, heuristic algorithms are often used to find a good or near-optimal solution in a
reasonable time. Greedy algorithms may be the most natural heuristic approach for quickly solving large
combinatorial problems. As for the SCP, the simplest such approach is the greedy algorithm of Chvatal
(1979). Although simple, fast and easy to code, greedy algorithms could rarely generate solutions of good
quality....
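To make the greedy idea from the quoted summary concrete, here is a short sketch applied to the path-cut setting: every path is an element to be covered, and every available node covers the paths that contain it. The function name greedy_cut is illustrative, and the result is a heuristic that is not guaranteed to be minimal.

```python
def greedy_cut(paths, available):
    """Greedy set cover: repeatedly pick the node that cuts the most remaining paths."""
    covers = {v: {i for i, p in enumerate(paths) if v in p} for v in available}
    remaining = set(range(len(paths)))
    chosen = []
    while remaining:
        best = max(available, key=lambda v: len(covers[v] & remaining))
        if not covers[best] & remaining:
            return None  # some path contains no available node
        chosen.append(best)
        remaining -= covers[best]
    return chosen


testPaths = [
    [1, 2, 14, 15, 16, 18, 9, 10], [1, 2, 24, 25, 26, 28, 9, 10],
    [1, 2, 34, 35, 36, 38, 9, 10], [1, 3, 44, 45, 46, 48, 9, 10],
    [1, 3, 54, 55, 56, 58, 9, 10], [1, 3, 64, 65, 66, 68, 9, 10],
    [1, 2, 14, 15, 16, 7, 10], [1, 2, 24, 7, 10],
    [1, 3, 34, 35, 7, 10], [1, 3, 44, 35, 6, 10],
]
chosen = greedy_cut(testPaths, [2, 3, 6, 7, 9])
```

On the example from the question the greedy choice starts with node 9 (it lies on six paths) and ends up with three nodes, although removing 2 and 3 would be enough. This illustrates why the quoted text calls the solution quality of greedy algorithms into question.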
Edit: If you in fact want to destroy all paths, and not just those from a given list, then the max-flow techniques explained by mcdowella are much better than this approach.
As mentioned by mcdowella, the problem is NP-hard in general. However, the way your example looks, an exact approach might be feasible.
First, you can delete all vertices from the paths that are not available for deletion. Then, reduce the instance by eliminating dominated vertices. For example, every path that contains 15 also contains 2, so it never makes sense to delete 15. In the example if all vertices were available, 2, 3, 9, and 35 dominate all other vertices, so you'd have the problem down to 4 vertices.
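The dominance check can be sketched like this, assuming "u dominates v" means that every path containing v also contains u. The function name dominated_vertices is illustrative; when two vertices lie on exactly the same paths, one of them is kept.

```python
def dominated_vertices(paths, available):
    """Return the available vertices that never need to be deleted."""
    # For each vertex, the set of indices of the paths it lies on.
    occurs = {v: frozenset(i for i, p in enumerate(paths) if v in p)
              for v in available}
    dominated = set()
    for v in available:
        for u in available:
            if u == v or u in dominated:
                continue
            # Every path through v also goes through u, so deleting u is
            # at least as good as deleting v.
            if occurs[v] <= occurs[u]:
                dominated.add(v)
                break
    return dominated


paths = [
    [1, 2, 14, 15, 16, 18, 9, 10],
    [1, 2, 14, 15, 16, 7, 10],
    [1, 2, 24, 7, 10],
]
dominated = dominated_vertices(paths, [2, 7, 9, 15])
```

In this small excerpt of the example, vertex 2 lies on every path, so 7, 9 and 15 are all reported as dominated.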
Then take a vertex from the shortest path and branch recursively into two cases: delete it (remove all paths containing it) or don't delete it (delete it from all paths). (If the path has length one, omit the second case.) You can then check for dominance again.
This is exponential in the worst case, but might be sufficient for your examples.
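A minimal sketch of the branching step described above, without the dominance reduction and with a simple bound on the best solution found so far. The name min_hitting_set is illustrative, and picking the smallest vertex of the shortest path is an arbitrary choice.

```python
def min_hitting_set(paths, available):
    """Exact search for a smallest set of available vertices hitting every path."""
    available = frozenset(available)
    # Keep only the vertices that may actually be deleted.
    sets = [frozenset(p) & available for p in paths]
    if any(not s for s in sets):
        return None  # some path cannot be cut at all
    best = None

    def branch(remaining, chosen):
        nonlocal best
        if best is not None and len(chosen) >= len(best):
            return  # cannot improve on the best solution so far
        if not remaining:
            best = set(chosen)
            return
        shortest = min(remaining, key=len)
        v = min(shortest)
        # Case 1: delete v, which removes every path containing it.
        branch([s for s in remaining if v not in s], chosen | {v})
        # Case 2: keep v, so drop it from all paths (skipped for length one).
        if len(shortest) > 1:
            branch([s - {v} for s in remaining], chosen)

    branch(sets, frozenset())
    return best


testPaths = [
    [1, 2, 14, 15, 16, 18, 9, 10], [1, 2, 24, 25, 26, 28, 9, 10],
    [1, 2, 34, 35, 36, 38, 9, 10], [1, 3, 44, 45, 46, 48, 9, 10],
    [1, 3, 54, 55, 56, 58, 9, 10], [1, 3, 64, 65, 66, 68, 9, 10],
    [1, 2, 14, 15, 16, 7, 10], [1, 2, 24, 7, 10],
    [1, 3, 34, 35, 7, 10], [1, 3, 44, 35, 6, 10],
]
smallest = min_hitting_set(testPaths, [2, 3, 6, 7, 9])
```

For the example from the question the search returns {2, 3}.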