How to generate a solved grid for the binary-puzzle - python

I'm working on my first game with Python and Pygame, and I have to create a binary puzzle.
I'm facing a problem generating a solved grid with these conditions:
Each box should contain either a zero or a one.
More than two equal numbers immediately next to or below each other are not allowed.
Each row and each column should contain an equal number of zeros and ones.
Each row is unique and each column is unique. Thus, any row cannot be exactly equal to another row, and any column cannot be exactly equal to another column.
I tried something like
import numpy as np

CELL = 6  # grid dimension

parents = []
unique_found = False
while not unique_found:
    candidate_array = np.random.choice([0, 1], size=(CELL, CELL))
    if not any((candidate_array == x).all() for x in parents):
        [(i, j) for i, j in enumerate(candidate_array)]
        unique_found = True
        parents.append(candidate_array)
It's generating a random grid of one and zeros:
[[0 0 0 1 0 1]
[1 0 0 1 1 1]
[0 0 0 0 1 0]
[1 0 0 0 1 0]
[0 1 1 1 1 1]
[0 1 1 0 1 1]]
but I don't know how to add the conditions I want to make this grid less random.

For a direct solution, there's basically no way to get around coding up those constraints. Once you do code them up, try backtracking: try putting a 0 in the first cell and check the constraints. If the constraints are satisfied, recursively step to the next cell and try a 0 there.
If a constraint is ever breached and you've tried all of the possible values on a square, one of the assumptions somewhere along the way must have been invalid, so undo the most recent 0 or 1 placed, pop the call stack and backtrack to the previous index to try the next possible value. This backtracking might unwind all of the moves, but eventually it'll home in on a solution if the board is solvable. It's brute force with the optimization that it stops exploring impossible positions.
This is basically the same as the most straightforward way of solving Sudoku, but with different constraints and different values per square (much fewer, but more squares). The only difference between generating a solved board from scratch as you're doing and solving one with a few pre-filled values is that you skip the squares with pre-filled values. It's a trivial difference. Check out this gif which illustrates the backtracking algorithm working.
Looking into Sudoku solving can offer deeper ideas for domain-specific improvements you could apply to this puzzle to gain a speed increase, once you have a basic backtracking approach working. Even without any domain-specific knowledge, the backtracking approach can be multithreaded/multiprocessed for a speed increase, with each worker starting from a different configuration of values for the first few squares at the root of the search.
By the way, following this method is deterministic on an empty board, so you'll get the same filled board every time. You could randomize the choices along the way, or even the order of visiting squares, if you want a non-deterministic result.
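To make that concrete, here is a minimal sketch of the cell-by-cell backtracking described above (plain Python; SIZE, valid and solve are names I made up for the example, and -1 marks an empty cell):

import random

SIZE = 6  # assumed even board size; -1 marks an empty cell

def no_triples(line):
    # no three equal filled values may be adjacent in a row or column
    return not any(line[i] != -1 and line[i] == line[i + 1] == line[i + 2]
                   for i in range(len(line) - 2))

def valid(grid, r, c):
    # check the constraints that placing grid[r][c] could have broken
    n = SIZE
    row = grid[r]
    col = [grid[i][c] for i in range(n)]
    for line in (row, col):
        if not no_triples(line):
            return False
        if line.count(0) > n // 2 or line.count(1) > n // 2:
            return False
    # uniqueness only needs checking once a row/column is completely filled
    if -1 not in row and any(i != r and grid[i] == row for i in range(n)):
        return False
    if -1 not in col and any(j != c and [grid[i][j] for i in range(n)] == col
                             for j in range(n)):
        return False
    return True

def solve(grid, cell=0):
    if cell == SIZE * SIZE:
        return True
    r, c = divmod(cell, SIZE)
    for value in random.sample([0, 1], 2):  # random order -> non-deterministic board
        grid[r][c] = value
        if valid(grid, r, c) and solve(grid, cell + 1):
            return True
    grid[r][c] = -1  # neither value worked: undo and let the caller backtrack
    return False

grid = [[-1] * SIZE for _ in range(SIZE)]
solve(grid)
print(*grid, sep="\n")

Swapping random.sample for a fixed [0, 1] order makes the result deterministic again.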
You can improve the backtracking approach by permuting a pre-filled grid that satisfies constraint 3 (equal numbers of 0s and 1s per row/column), but it's still basically backtracking (the granularity of a stack frame changes -- now it's a possible configuration of a row rather than a cell), and is purely an optimization, so I'd start with the cell-by-cell approach.
That said, I'd avoid blindly generating and testing random configurations. It's not much easier to code than backtracking, because you still need to code up and check all of the constraints, and a non-systematic approach would likely take much longer to find a solution than a systematic one.
If you're open to using external tools, you can add the constraints to an off-the-shelf constraint solver like PuLP or Z3 and it'll spit the answer out.
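For instance, a rough, untested sketch of how the four conditions could be handed to Z3's Python bindings (variable names are mine):

from z3 import Solver, Int, Or, Sum, sat

n = 6
s = Solver()
g = [[Int(f"cell_{r}_{c}") for c in range(n)] for r in range(n)]

for r in range(n):
    for c in range(n):
        s.add(Or(g[r][c] == 0, g[r][c] == 1))          # condition 1: cells are 0 or 1
for r in range(n):
    for c in range(n - 2):
        # condition 2: no three equal values in a row, horizontally or vertically
        s.add(g[r][c] + g[r][c + 1] + g[r][c + 2] != 0,
              g[r][c] + g[r][c + 1] + g[r][c + 2] != 3)
        s.add(g[c][r] + g[c + 1][r] + g[c + 2][r] != 0,
              g[c][r] + g[c + 1][r] + g[c + 2][r] != 3)
for r in range(n):
    s.add(Sum(g[r]) == n // 2)                           # condition 3: balanced rows
    s.add(Sum([g[i][r] for i in range(n)]) == n // 2)    # ...and balanced columns
for a in range(n):
    for b in range(a + 1, n):
        s.add(Or([g[a][c] != g[b][c] for c in range(n)]))  # condition 4: unique rows
        s.add(Or([g[r][a] != g[r][b] for r in range(n)]))  # ...and unique columns

if s.check() == sat:
    model = s.model()
    for r in range(n):
        print([model.evaluate(g[r][c]) for c in range(n)])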

You could create an initial solution like the one below, which only satisfies conditions 1 and 3.
[1, 1, 1, 0, 0, 0]
[1, 1, 1, 0, 0, 0]
[1, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 1]
[0, 0, 0, 1, 1, 1]
[0, 0, 0, 1, 1, 1]
Then swap rows and columns randomly; those swaps won't break the third condition.
The idea is to keep swapping rows or columns until you hit a valid solution.

Related

Can this (find + sort) problem be solved in O(n)?

I went through this problem on geeksforgeeks.com and, while my solution managed to pass all test cases, I actually used .sort(), so I know it doesn't meet the expected time complexity of O(n): no comparison sort runs in O(n), not even the best implementation of Timsort (which is what Python uses). So I went to check the website's answer/solution and found this:
def printRepeating(arr, n):
    # First pass: treat each value as an index into the array and
    # add n to the element at that index, stacking a counter on top
    # of the original value.
    for i in range(0, n):
        index = arr[i] % n
        arr[index] += n
    # Second pass: divide by the size of the array to recover the
    # count; any index whose count is at least 2 occurs more than once.
    for i in range(0, n):
        if (arr[i] / n) >= 2:
            print(i, end=" ")
I tried to follow the logic behind that algorithm but honestly couldn't, so I tested different datasets until I found that it failed for some. For instance:
arr = [5, 6, 3, 1, 3, 6, 6, 0, 0, 11, 11, 1, 1, 50, 50]
Output: 0 1 3 5 6 11 13 14
Notice that:
Number 5 IS NOT repeated in the array,
Numbers 13 and 14 are not even present in the array, and
Number 50 is both present and repeated, and the solution won't show it.
I already reported the problem to the website, I just wanted to know if, since these problems are supposed to be curated, there is a solution in O(n). My best guess is there isn't unless you can somehow insert every repeated number in O(1) within the mapping of all keys/values.
The reason the code doesn't work with your example data set is that you're violating one of the constraints that is given in the problem. The input array (of length n) is supposed to only contain values from 0 to n-1. Your values of 50 are too big (since you have 15 elements in your list). That constraint is why adding n to the existing values doesn't break things. You have a less-than-n original value (that can be extracted with arr[i] % n), and the count (that can be extracted with arr[i] // n). The two values are stacked on top of each other, cleverly reusing the existing array with no extra space needed.
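To illustrate the encoding with made-up, in-range data, here is the same trick written out as a standalone function (using // to recover the count):

def print_repeating(arr):
    # O(n) duplicate finder; relies on every value being in the range 0..n-1
    n = len(arr)
    for i in range(n):
        arr[arr[i] % n] += n       # bump the counter stacked on top of the value
    for i in range(n):
        if arr[i] // n >= 2:       # recover the count part of the encoding
            print(i, end=" ")

print_repeating([1, 2, 3, 1, 3, 6, 6])   # prints: 1 3 6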
The problem can be solved with dict().
For Python, see: https://docs.python.org/3.10/library/stdtypes.html#mapping-types-dict
It's an associative data type with amortized O(1) access, which, as you mentioned, is exactly what you need.
Python stdlib also has collections.Counter, which is a specialization of dict that accomplishes 90% of what the problem asks for.
edit
Oh, the results have to be sorted too. Looks like they want you to use a list() "as a dict", mapping integers to their number of occurrences via their own value as an index.
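A quick sketch of both options, a Counter and a plain list used "as a dict"; the sorted output falls out of visiting the values in increasing order:

from collections import Counter

def print_repeating(arr):
    counts = Counter(arr)               # value -> number of occurrences
    for value in sorted(counts):
        if counts[value] >= 2:
            print(value, end=" ")

def print_repeating_list(arr):
    counts = [0] * len(arr)             # list indexed by value, exploiting 0..n-1
    for value in arr:
        counts[value] += 1
    for value, c in enumerate(counts):  # indices come out already sorted
        if c >= 2:
            print(value, end=" ")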

Suppose an array contains only two kinds of elements, how to quickly find their boundaries?

I've asked a similar question before, but this time it's different.
Since our array contains only two elements, we might as well set it to 1 and -1, where 1 is on the left side of the array and -1 is on the right side of the array:
[1,...,1,1,-1,-1,...,-1]
Both 1 and -1 exist at the same time and the number of 1 and -1 is not necessarily the same. Also, the numbers of 1 and -1 are both very large.
Then, define the boundary between 1 and -1 as the index of the -1 closest to 1. For example, for the following array:
[1,1,1,-1,-1,-1,-1]
Its boundary is 3.
Now, for each number in the array, I cover it with a device that you have to unlock to see the number in it.
I want to try to unlock as few devices as possible that cover 1, because it takes much longer to see a '1' than it takes to see a '-1'. And I also want to reduce my time cost as much as possible.
How can I search to get the boundary as quickly as possible?
The problem is very much like the "egg dropping" problem, but here reading a 1 has a large fixed cost (100) and reading a -1 has a small cost (1).
Let E(n) be the (optimal) expected cost of finding the index of the right-most 1 in an array (or finding that the array is all -1), assuming each possible position of the boundary is equally likely. Define the index of the right-most 1 to be -1 if the array is all -1.
If you choose to look at the array element at index i, it's -1 with probability (i+1)/(n+1) and 1 with probability (n-i)/(n+1): of the n+1 equally likely boundary positions (-1 through n-1), the i+1 positions below i make element i a -1, and the n-i positions at or above i make it a 1.
So if you look at array element i, your expected cost for finding the boundary is (1+E(i)) * (i+1)/(n+1) + (100+E(n-i-1)) * (n-i)/(n+1).
Thus E(n) = min((1+E(i)) * (i+1)/(n+1) + (100+E(n-i-1)) * (n-i)/(n+1), i=0..n-1), with E(0) = 0.
For each n, the i that minimizes the equation is the optimal array element to look at for an array of that length.
I don't think you can solve these equations analytically, but you can solve them with dynamic programming in O(n^2) time.
The solution is going to look like a very skewed binary search for large n. For smaller n, it'll be skewed so much that it will be a traversal from the right.
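For concreteness, a small dynamic-programming sketch of that recurrence (the cost constants and names are only illustrative):

COST_NEG, COST_POS = 1, 100   # cost of reading a -1 vs. reading a 1

def optimal_strategy(n_max):
    # E[n] = expected cost for a length-n array; best[n] = index to probe first
    E = [0.0] * (n_max + 1)
    best = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        E[n], best[n] = min(
            ((COST_NEG + E[i]) * (i + 1) / (n + 1)
             + (COST_POS + E[n - i - 1]) * (n - i) / (n + 1), i)
            for i in range(n)
        )
    return E, best

E, best = optimal_strategy(1000)
print(E[1000], best[1000])   # expected cost and the first index to probe for n = 1000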
If I am right, a strategy that minimizes the expected cost is to probe at a fraction of the interval that favors the -1 outcome, in inverse proportion to the costs. So instead of picking the middle index, probe near the right centile.
But this still corresponds to a logarithmic asymptotic complexity.
There is probably nothing that you can do regarding the worst case.
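A sketch of that strategy; read is a hypothetical accessor that unlocks index i and returns the value, and the 0.99 fraction mirrors the 1:100 cost ratio:

def find_boundary(read, n, fraction=0.99):
    lo, hi = 0, n               # invariant: index of the first -1 lies in [lo, hi]
    while lo < hi:
        mid = lo + int((hi - lo - 1) * fraction)   # probe near the right end
        if read(mid) == 1:
            lo = mid + 1        # expensive read, but it eliminates most of the interval
        else:
            hi = mid            # cheap read, small reduction of the interval
    return lo                   # index of the -1 closest to the 1s

Each expensive 1-read is unlikely, and when it does happen it removes almost the whole remaining interval.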

How do I get rid of subtours and force a specific order in visiting points using gurobipy?

I am aware that there is a TSP example on the Gurobi website. I took a great amount of time trying to understand it, but did not completely succeed. Therefore I decided to make a simpler one myself.
The problem: I am not able to get rid of sub-tours, and I don't know which constraint I should include to force visiting pick-up points before going to the delivery points. It is not necessary to go immediately from a pick-up point to its delivery point; it is fine if multiple pick-up points are visited before going to the delivery points.
My code does the following: it generates random orders containing pick-up points, delivery points and the number of packages to be sent (if this succeeds I also want to include delivery times, vehicle capacity, multiple vehicles, etc.). After generating the orders it creates a distance matrix between the vehicle, the pick-up points and the delivery points. Then I use the Gurobi optimizer to find the optimal route. I was able to add constraints that prevent traveling from point i to point i. I added the constraint that every point should be visited, and since the matrix is symmetric I also added a constraint that prevents bouncing back and forth between two points, e.g. 0 - 4 - 0 - 4 - ...
# Using spyder 3.6 and gurobi. For simplicity I only added the DISTANCE matrix, not how the orders and distance matrix are created
import numpy as np
from gurobipy import *

DISTANCE = [[0, 96.64884893, 94.41398202, 88.23264702, 18.86796226, 97.52948272, 105.99056562],
            [96.64884893, 0, 183.14202139, 19.02629759, 80.77747211, 194.17775362, 189.1692364],
            [94.41398202, 183.14202139, 0, 169.53170795, 113.25193155, 55.22680509, 122.58874337],
            [88.23264702, 19.02629759, 169.53170795, 0, 74.81310046, 185.01081049, 187.01069488],
            [18.86796226, 80.77747211, 113.25193155, 74.81310046, 0, 114.28035702, 113.35784049],
            [97.52948272, 194.17775362, 55.22680509, 185.01081049, 114.28035702, 0, 73.00684899],
            [105.99056562, 189.1692364, 122.58874337, 187.01069488, 113.35784049, 73.00684899, 0]]
dist = np.array(DISTANCE)
n = dist.shape[0]

# Create model
m = Model('Pickup_Deliver_Optimizer')

# Add variables
x = {}
for i in range(n):
    for j in range(n):
        x[i,j] = m.addVar(vtype=GRB.BINARY)

# Objective function
obj = quicksum(x[i,j]*DISTANCE[i][j] for i in range(n) for j in range(n))
m.setObjective(obj, GRB.MINIMIZE)

# Constraints
# Constraint that does not allow to travel from point i to point i
for i in range(n):
    m.addConstr(x[i,i], GRB.EQUAL, 0)

# To prevent the vehicle from returning to the same point from the previous point: if x[i,j] == 1, then x[j,i] == 0
m.addConstrs((x[i,j] == 1) >> (x[j,i] == 0) for i in range(n) for j in range(n))

# Visit all points by making a connection between two points
for i in range(n):
    m.addConstr(quicksum(x[i,j] for j in range(n)), GRB.EQUAL, 1)
    m.addConstr(quicksum(x[j,i] for j in range(n)), GRB.EQUAL, 1)

# The vehicle should return to its starting point
m.addConstr(quicksum(x[i,0] for i in range(n)), GRB.EQUAL, 1)

# The vehicle always starts at its starting point
m.addConstr(quicksum(x[0,i] for i in range(n)), GRB.EQUAL, 1)

m.update()
m.optimize()
Using the TSP example from Gurobi, the optimal solution should be ~522. The result I get is ~457. The difference comes from different routes. According to the Gurobi example the correct route should be [0, 4, 1, 3, 2, 5, 6, 0]. My code creates the following two loops: [0, 4, 1, 3, 0] and [2, 5, 6, 2]. The two routes combined have a shorter distance than the single route from the Gurobi example, so I get the two routes as the optimal solution. I know what goes wrong, but I have no idea how to solve this issue in terms of addConstr(). I looked online for the theory behind subtours; according to the Miller-Tucker-Zemlin formulation on Wikipedia https://en.wikipedia.org/wiki/Travelling_salesman_problem, I would have to add new auxiliary variables 'u' with extra constraints. However, I would think it could be solved more easily by adding a constraint like this:
# Create one route
for j in range(n):
    if j != 0:
        m.addConstrs((x[i,j] == 1) >> (quicksum(x[j,k] == 1)) for i in range(n) for k in range(n))
This line of code does not work, but it would seem like something like this should solve the issue. It would force the nodes to be connected with each other.
The second issue that I can't seem to solve is visiting the pick-up point before the delivery point. Odd rows/columns in the DISTANCE matrix represent pick-up points and even rows/columns represent delivery points (row/column 0 is the starting point of the vehicle). Can I solve this with the current variable x[i,j], or do I need to add an additional variable (which I expect)?
Any help is much appreciated.
If you want to use indicator constraints you need to define new binary variables (see here).
I suppose you are better off adding subtour elimination constraints as cuts. For this you need to check your current solution for any subtours, that is, tours that do not contain all nodes, and forbid them in the next optimization round. This is outlined here and here. I am linking these because you did not specify which TSP code you already inspected.
There is another intuitive implementation in PySCIPOpt that uses networkx to compute subtours.
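If you prefer a compact formulation over cuts, a rough sketch of the Miller-Tucker-Zemlin variables the question already mentions (added on top of your existing m, x and n; untested against your data) could look like this:

# MTZ ordering variables: u[i] is the position of node i in the tour (node 0 is the depot)
u = {}
for i in range(1, n):
    u[i] = m.addVar(lb=1, ub=n - 1, vtype=GRB.CONTINUOUS)

# if the vehicle travels i -> j, force u[j] >= u[i] + 1, which rules out subtours
for i in range(1, n):
    for j in range(1, n):
        if i != j:
            m.addConstr(u[i] - u[j] + n * x[i, j] <= n - 1)

# the same u variables can also express pick-up-before-delivery,
# e.g. for a pick-up node p and its delivery node d:
# m.addConstr(u[p] + 1 <= u[d])

m.optimize()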

Is there a non-brute-force solution to minimise the sum of a 2D array using only one value from each row and column?

I have two arrays: one is an ordered array generated from a set of previous positions for connected points; the second is a new set of points specifying the new positions of those points. The task is to match up each old point with the best-fitting new position. The distance between each pair of old and new points is stored in a new array of size n*n. The objective is to find a mapping from each previous point to a new point with the smallest total sum. Each old point is a row of the matrix and must match to a single column.
I have already looked into an exhaustive search. Although this works, it has complexity O(n!), which is just not a viable solution.
The code below can be used to generate test data for the 2D array.
import numpy as np

def make_data():
    org = np.random.randint(5000, size=(100, 2))
    new = np.random.randint(5000, size=(100, 2))
    arr = []
    # ranges = []
    for i, j in enumerate(org):
        values = np.linalg.norm(new - j, axis=1)
        arr.append(values)
    # print(arr)
    # print(ranges)
    arr = np.array(arr)
    return arr
Here are some small examples of the array and the expected output.
Ex. 1
1 3 5
0 2 3
5 2 6
The above should return [0, 2, 1] to signify that row 0 maps to column 0, row 1 to column 2 and row 2 to column 1, since the optimal picks are 1, 3 and 2.
Ideally the algorithm would be 100% accurate, although something much quicker that is 85%+ accurate would also be acceptable.
Google search terms: "weighted graph minimum matching". You can consider your array to be a weighted graph, and you're looking for a matching that minimizes edge length.
The assignment problem is a fundamental combinatorial optimization problem. It consists of finding, in a weighted bipartite graph, a matching in which the sum of weights of the edges is as large as possible. A common variant consists of finding a minimum-weight perfect matching.
https://en.wikipedia.org/wiki/Assignment_problem
The Hungarian method is a combinatorial optimization algorithm that solves the assignment problem in polynomial time and which anticipated later primal-dual methods.
https://en.wikipedia.org/wiki/Hungarian_algorithm
I'm not sure whether to post the whole algorithm here; it's several paragraphs and in wikipedia markup. On the other hand I'm not sure whether leaving it out makes this a "link-only answer". If people have strong feelings either way, they can mention them in the comments.
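In Python you don't have to implement the algorithm yourself; SciPy ships a solver for exactly this. A quick sketch on the 3x3 example from the question:

import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[1, 3, 5],
                 [0, 2, 3],
                 [5, 2, 6]])
rows, cols = linear_sum_assignment(cost)   # minimum-weight perfect matching
print(cols)                    # [0 2 1] -> row 0 -> col 0, row 1 -> col 2, row 2 -> col 1
print(cost[rows, cols].sum())  # 6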

Effectively count the number of repetitions for each number in a two-dimensional array

I need to find the duplicate numbers in multiple one-dimensional arrays and count the number of repetitions for each of them. np.unique works well for a single one-dimensional array, but it does not seem to apply to two-dimensional arrays. I have searched for similar answers, but I need a more detailed report (the number of occurrences of every number, and the position index).
Can numpy bincount work with 2D arrays?
That answer does not quite match: I hope to get a mapping containing more information about the data, such as which number occurs most often, and I would rather not loop. Maybe that is not realistic, but I will try to find a way to avoid loops, because I have very strict speed requirements.
For example:
a = np.array([[1, 2, 2, 2, 3],
              [0, 1, 1, 1, 2],
              [0, 0, 0, 1, 0]])
# Occurrences of each value in the first row [1, 2, 2, 2, 3]:
#   value: 0 1 2 3
#   count: 0 1 3 1
# Needed output: one row of counts per input row,
# where column v holds the number of times value v occurs in that row
[[0 1 3 1]
 [1 3 1 0]
 [4 1 0 0]]
Because this runs inside a larger loop, I need an efficient, vectorized way to compute the statistics for many rows at once and to avoid looping again.
I've used group-by aggregation to count the results. The function works by constructing a key1 that differentiates the rows, using the data itself as key2, and a two-dimensional array of all 1s as the values. It does produce the output, but I think it is only a temporary measure; I need the right way.
import numpy as np
import numpy_indexed as npi

def unique2d(x):
    x = x.astype(int); mx = np.nanmax(x) + 1
    ltbe = np.tile(np.arange(x.shape[0])[:, None], (1, x.shape[1]))
    vtbe = np.zeros(x.shape).astype(int) + 1
    groups = npi.group_by((ltbe.ravel(), x.ravel().astype(int)))
    unique, median = groups.sum(vtbe.ravel())
    ctbe = np.zeros(x.shape[0] * mx.astype(int)).astype(int)
    ctbe[(unique[0] * mx + unique[1]).astype(int)] = median
    ctbe.shape = (x.shape[0], mx)
    return ctbe

unique2d(a)
>array([[0, 1, 3, 1],
[1, 3, 1, 0],
[4, 1, 0, 0]])
Hope there are good suggestions and algorithms, thanks
The fewest lines of code I can come up with is as follows:
import numpy as np
import numpy_indexed as npi
a = np.array([[1, 2, 2, 2, 3],
              [0, 1, 1, 1, 2],
              [0, 0, 0, 1, 0]])
row_idx = np.indices(a.shape, dtype=np.int32)[0]
axes, table = npi.Table(row_idx.flatten(), a.flatten()).count()
I haven't profiled this, but it does not contain any hidden un-vectorized for-loops, and I doubt you could do it much faster in numpy by any means. Nor do I expect it to perform a whole lot faster than your current solution, though. Using the smallest possible int types may help.
Note that this function does not assume that the elements of a form a contiguous set; the axis labels are returned in the axes tuple, which may or may not be the behavior you are looking for. Modifying the code in the Table class to conform to your current layout shouldn't be hard, though.
If speed is your foremost concern, your problem would probably map really well to numba.
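For comparison, here is a plain-NumPy sketch of the same per-row counting built from a single flat bincount; it assumes small non-negative integers, as in your example:

import numpy as np

a = np.array([[1, 2, 2, 2, 3],
              [0, 1, 1, 1, 2],
              [0, 0, 0, 1, 0]])

def count_per_row(x):
    # per-row value counts via one flat bincount
    n_rows, _ = x.shape
    n_vals = x.max() + 1
    # offset each row so its values land in a separate block of the flat histogram
    offsets = np.arange(n_rows)[:, None] * n_vals
    flat = (x + offsets).ravel()
    return np.bincount(flat, minlength=n_rows * n_vals).reshape(n_rows, n_vals)

print(count_per_row(a))
# [[0 1 3 1]
#  [1 3 1 0]
#  [4 1 0 0]]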
