Solving a system of Quadratic Equations - python

I am doing a cryptography program in Python.
It consists of reading a random phrase, like HELLO. Then it assigns each character its respective ASCII value, like:
H = 72, E = 69.
Then, using Pythagoras' Theorem, it creates two numbers C1 and C2, like this:
C1 = sqrt(A^2 + (A+B)^2);
C2 = sqrt(B^2 + (A+B)^2)
where, in this case, A = H and B = E. That is the encryption part, but I am having problems solving the system, which will act as the decryptor.
How can I solve this system using python?
C1 = sqrt(A^2 + (A+B)^2);
C2 = sqrt(B^2 + (A+B)^2);
Of course only C1 and C2 are known.
Do I need a new module? Which one?
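For clarity, here is a minimal sketch of the encryption step exactly as described above (encrypt_pair is just an illustrative name, not part of my actual program):

from math import sqrt

# Minimal sketch of the encryption described above.
def encrypt_pair(A, B):
    C1 = sqrt(A**2 + (A + B)**2)
    C2 = sqrt(B**2 + (A + B)**2)
    return C1, C2

print(encrypt_pair(ord('H'), ord('E')))  # H = 72, E = 69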

If you're only talking about using the two characters for encryption, that's not a good idea.
That only gives 65,536 possible variants (ASCII characters were mentioned, but I'll assume a full 8-bit octet for each, so 256 multiplied by 256), so it's easy enough to brute-force. First, we know that every pair of values A and B generates a unique C1/C2 pair, as per the following program, which generates no duplicates:
lookup = {}
for a in range(256):
    for b in range(256):
        c1s = a*a + (a+b)*(a+b)
        c2s = b*b + (a+b)*(a+b)
        lkey = "%d:%d" % (c1s, c2s)
        lookup[lkey] = 1
print(len(lookup))  # gives 65536 (256 squared)
Also, since both A and B are integers, so too will be C1^2 and C2^2.
So the first step is to work out the squares of the values you're given (since sqrt is a potentially expensive operation), taking into account the possibility of floating point inaccuracies:
c1s = int(c1 * c1 + 0.1)
c2s = int(c2 * c2 + 0.1)
Then, simply brute-force the solution:
import sys

for a in range(256):
    for b in range(256):
        if c1s != a*a + (a+b)*(a+b):
            continue
        if c2s == b*b + (a+b)*(a+b):
            print(a, b)
            sys.exit(0)
print("No solution")
On my machine, searching for the slowest solution (both a and b set to 255), it takes just a smidgeon over six hundredths of a second.
But you should keep in mind that, if an attacker has the C1/C2 values, they too can get the results that fast. And, even if they don't have it, the fact that there are only 64K possibilities means that they can try every possible value in a little over one and a quarter hours. So I wouldn't be using this method to store anything that's valuable for very long :-)

Related

Karatsuba multiplication error for large integers in Python

I've been trying to implement the Karatsuba algorithm in Python3 in the following way:
def karatsuba(num1, num2):
    n_max = max(len(str(int(num1))), len(str(int(num2))))
    if n_max == 1:
        return int(num1*num2)
    n = n_max + n_max%2
    a = num1//10**(n/2)
    b = num1%10**(n/2)
    c = num2//10**(n/2)
    d = num2%10**(n/2)
    t1 = karatsuba(a,c)
    t3 = karatsuba(b,d)
    t2 = karatsuba(a+b,c+d) - t1 - t3
    return int(t1*10**n + t2*10**(n/2) + t3)
While the function works for small products, it fails for ones that exceed 18 digits. One can see this by running, say,
import random

for i in range(1, 12):
    a = random.randint(10**i, 10**(i+1)-1)
    b = random.randint(10**i, 10**(i+1)-1)
    print(f"{len(str(a*b))} digits, error: {abs(a*b - karatsuba(a,b))}")
I would appreciate it if someone could explain the root of this problem and, if possible, how this code could be modified to fix it. My best guess is that some round-off error is committed by Python at some point. That said, I don't really know how int fundamentally works in this language.
Use n//2 instead of n/2 to stay with ints and avoid the precision loss caused by the float value.
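A minimal sketch of that fix, with the rest of the function left as in the question (the half variable is only introduced for readability):

def karatsuba(num1, num2):
    n_max = max(len(str(int(num1))), len(str(int(num2))))
    if n_max == 1:
        return int(num1 * num2)
    n = n_max + n_max % 2
    half = 10**(n // 2)  # integer exponent, so this power is computed exactly
    a, b = num1 // half, num1 % half
    c, d = num2 // half, num2 % half
    t1 = karatsuba(a, c)
    t3 = karatsuba(b, d)
    t2 = karatsuba(a + b, c + d) - t1 - t3
    return t1 * 10**n + t2 * half + t3

With integer exponents there is no float anywhere in the computation, so the error that appeared for large products disappears.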

if condition using sympy equation solver/ sympy very slow

I want to solve this equation with the following parameters:
from sympy import symbols, solve

gamma = 0.1
F = 0.5
w = 0
A = symbols('A')
a = 1 + w**4 - w**2 + 4*(gamma**2)*w**2
b = 1 - w**2
sol = solve(a*A**2 + (9/16)*A**6 + (3/2)*b*A**4 - F**2)
list_A = []
for i in range(len(sol)):
    if type(sol[i]) == float:
        print(sol[i])
        list_A = sol[i]
However, as expected, I am getting some real and some complex values, and I want to remove the complex ones and keep only the floats. But the condition I implemented does not work, because the type of sol[i] is either sympy.core.add.Add for complex values or sympy.core.numbers.Float for floats.
My question is: how can I modify my condition so that it keeps only the float values?
In addition, is there a way to speed it up? It is very slow if I put it in a loop for many values of omega.
This is my first time working with sympy.
When SymPy is able to validate solutions relative to assumptions on symbols, it will; so if you tell SymPy that A is real then -- if it can verify the solutions -- it will only show the real ones:
>>> A = symbols('A',real=True)
>>> sol = solve(a*A**2 + (9/16)*A**6 + (3/2)*b*A**4 -F**2)
>>> sol
[-0.437286658108243, 0.437286658108243]
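If you prefer not to add the real=True assumption, filtering the returned solutions is another option; this is just a sketch relying on SymPy's is_real attribute:

# Alternative sketch: keep only the solutions SymPy knows to be real.
real_sol = [s for s in sol if s.is_real]
# real_sol should contain just the two real roots shown above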

Probability that a formula fails in IEEE 754

On my computer, I can check that
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
evaluates to False.
More generally, I can estimate that the formula (a + b) + c == a + (b + c) fails roughly 17% of the time when a,b,c are chosen uniformly and independently on [0,1], using the following simulation:
import numpy as np
import numexpr

np.random.seed(0)

formula = '(a + b) + c == a + (b + c)'

def failure_expectation(formula=formula, N=10**6):
    a, b, c = np.random.rand(3, N)
    return 1.0 - numexpr.evaluate(formula).mean()

# e.g. 0.171744
I wonder if it is possible to arrive at this probability by hand, e.g. using the definitions in the floating point standard and some assumption on the uniform distribution.
Given the answer below, I assume that the following part of the original question is out of reach, at least for now.
Is there a tool that computes the failure probability for a given formula without running a simulation? Formulas can be assumed to be simple, e.g. involving the use of parentheses, addition, subtraction, and possibly multiplication and division.
(What follows may be an artifact of numpy random number generation, but still seems fun to explore.)
Bonus question based on an observation by NPE. We can use the following code to generate failure probabilities for uniform distributions on a sequence of ranges [[-n,n] for n in range(100)]:
import pandas as pd

def failures_in_symmetric_interval(n):
    a, b, c = (np.random.rand(3, 10**4) - 0.5) * n
    return 1.0 - numexpr.evaluate(formula).mean()

s = pd.Series({
    n: failures_in_symmetric_interval(n)
    for n in range(100)
})
The plot looks something like this:
In particular, failure probability dips down to 0 when n is a power of 2 and seems to have a fractal pattern. It also looks like every "dip" has a failure probability equal to that of some previous "peak". Any elucidation of why this happens would be great!
It's definitely possible to evaluate these things by hand, but the only methods I know are tedious and involve a lot of case-by-case enumeration.
For example, for your specific example of determining the probability that (a + b) + c == a + (b + c), that probability is 53/64, to within a few multiples of the machine epsilon. So the probability of a mismatch is 11/64, or around 17.19%, which agrees with what you were observing from your simulation.
To start with, note that there's a major simplifying factor in this particular case, and that's that Python and NumPy's "uniform-on-[0, 1]" random numbers are always of the form n/2**53 for some integer n in range(2**53), and within the constraints of the underlying Mersenne Twister PRNG, each such number is equally likely to occur. Since there are around 2**62 IEEE 754 binary64 representable values in the range [0.0, 1.0], that means that the vast majority of those IEEE 754 values aren't generated by random.random() (or np.random.rand()). This fact greatly simplifies the analysis, but also means that it's a bit of a cheat.
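A quick way to see this empirically (an illustration, not a proof) is to check that scaling the generated values by 2**53 always yields an integer:

import numpy as np

np.random.seed(0)
x = np.random.rand(10**6)
# every sample is an exact multiple of 2**-53
print(np.all(x * 2**53 == np.floor(x * 2**53)))  # True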
Here's an incomplete sketch, just to give an idea of what's involved. To compute the value of 53/64, I had to divide into five separate cases:
The case where both a + b < 1 and b + c < 1. In this case, both a + b and b + c are computed without error, and (a + b) + c and a + (b + c) therefore both give the closest float to the exact result, rounding ties to even as usual. So in this case, the probability of agreement is 1.
The case where a + b < 1 and b + c >= 1. Here (a + b) + c will be the correctly rounded value of the true sum, but a + (b + c) may not be. We can divide further into subcases, depending on the parity of the least significant bits of a, b and c. Let's abuse terminology and call a "odd" if it's of the form n/2**53 with n odd, and "even" if it's of the form n/2**53 with n even, and similarly for b and c. If b and c have the same parity (which will happen half the time), then (b + c) is computed exactly and again a + (b + c) must match (a + b) + c. For the other cases, the probability of agreement is 1/2 in each case; the details are all very similar, but for example in the case where a is odd, b is odd and c is even, (a + b) + c is computed exactly, while in computing a + (b + c) we incur two rounding errors, each of magnitude exactly 2**-53. If those two errors are in opposite directions, they cancel and we get agreement. If not, we don't. Overall, there's a 3/4 probability of agreement in this case.
The case where a + b >= 1 and b + c < 1. This is identical to the previous case after swapping the roles of a and c; the probability of agreement is again 3/4.
The case where a + b >= 1 and b + c >= 1, but a + b + c < 2. Again, one can split on the parities of a, b and c and look at each of the resulting 8 cases in turn. For the cases even-even-even and odd-odd-odd we always get agreement. For the case odd-even-odd, the probability of agreement turns out to be 3/4 (by yet further subanalysis). For all the other cases, it's 1/2. Putting those together gives an aggregate probability of 21/32 for this case.
The case where a + b + c >= 2. In this case, since we're rounding the final result to a multiple of four times 2**-53, it's necessary to look not just at the parities of a, b, and c, but at the last two significant bits. I'll spare you the gory details, but the probability of agreement turns out to be 13/16.
Finally, we can put all these cases together. To do that, we also need to know the probability that our triple (a, b, c) lands in each case. The probability that a + b < 1 and b + c < 1 is the volume of the square-based pyramid described by 0 <= a, b, c <= 1, a + b < 1, b + c < 1, which is 1/3. The probabilities of the other four cases can be seen (either by a bit of solid geometry, or by setting up suitable integrals) to be 1/6 each.
So our grand total is 1/3 * 1 + 1/6 * 3/4 + 1/6 * 3/4 + 1/6 * 21/32 + 1/6 * 13/16, which comes out to be 53/64, as claimed.
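As a quick sanity check of that arithmetic, the same weighted sum can be reproduced with exact rationals:

from fractions import Fraction as Fr

# Aggregate the per-case agreement probabilities derived above.
total = (Fr(1, 3) * 1
         + Fr(1, 6) * Fr(3, 4)
         + Fr(1, 6) * Fr(3, 4)
         + Fr(1, 6) * Fr(21, 32)
         + Fr(1, 6) * Fr(13, 16))
print(total, 1 - total)  # 53/64 11/64, i.e. about a 17.19% chance of mismatch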
A final note: 53/64 almost certainly isn't quite the right answer - to get a perfectly accurate answer we'd need to be careful about all the corner cases where a + b, b + c, or a + b + c hit a binade boundary (1.0 or 2.0). It would certainly be possible to refine the above approach to compute exactly how many of the 2**159 possible triples (a, b, c) satisfy (a + b) + c == a + (b + c), but not before it's time for me to go to bed. But the corner cases should constitute on the order of 1/2**53 of the total number of cases, so our estimate of 53/64 should be accurate to at least 15 decimal places.
Of course, there are lots of details missing above, but I hope it gives some idea of how it might be possible to do this.

Using SMT-LIB to count the number of models of a formula

I am not sure whether this is possible using SMT-LIB; if it is not, does an alternative solver exist that can do it?
Consider the equations
a < 10 and a > 5
b < 5 and b > 0
b < c < a
with a, b and c integers
The values of a and b for which the maximum number of models satisfying the equations exists are a = 9 and b = 1.
Does SMT-LIB support the following: for each pair of values of a and b, count the number of models that satisfy the formulas, and give the values of a and b that maximize the count?
I don't think you can do this in general; that is, when you can have arbitrary constraints over arbitrary theories. You are asking a "meta"-question: "Maximize the number of models" is not a question about the problem itself, but rather about the models of the problem; something SMTLib cannot deal with.
Having said that, however, I think it should be possible to code it for specific problems. In the example you gave, the model space is maximized when a - b is the greatest; so you can simply write:
(set-option :produce-models true)
(declare-fun a () Int)
(declare-fun b () Int)
(declare-fun c () Int)
(assert (< 5 a 10))
(assert (< 0 b 5))
(assert (< b c a))
(maximize (- a b))
(check-sat)
(get-value (a b))
To which z3 responds:
sat
((a 9)
(b 1))
as desired. Or, you can use the Python bindings:
from z3 import *

a, b, c = Ints('a b c')
o = Optimize()
o.add(And(5 < a, a < 10, 0 < b, b < 5, b < c, c < a))
o.maximize(a - b)
if o.check() == sat:
    m = o.model()
    print("a = %s, b = %s" % (m[a], m[b]))
else:
    print("unsatisfiable or unknown")
which prints:
a = 9, b = 1
There are also bindings for C/C++/Java/Scala/Haskell etc. that let you do more or less the same from those hosts as well.
But the crucial point here is that we had to manually come up with the goal that maximizing a - b would solve the problem here. That step is something that needs human intervention as it applies to whatever your current problem is. (Imagine you're working with the theory of floats, or arbitrary data-types; coming up with such a measure might be impossible.) I don't think that part can be automated magically using traditional SMT solving. (Unless Patrick comes up with a clever encoding, he's quite clever that way!)
Let's break down your goals:
You want to enumerate all possible ways in which a and b (...and more) can be assigned
For each combination, you want to count the number of satisfiable models
In general, this is not possible, as the domain of some variables in the problem might contain an infinite number of elements.
Even when one can safely assume that the domain of every other variable contains a finite number of elements, it is still highly inefficient.
For instance, if you had only Boolean variables in your problem, you would still have an exponential number of combinations of values --and therefore candidate models-- to consider along the search.
However, it is also possible that your actual application is not that complex in practice, and therefore it can be handled by an SMT Solver.
The general idea could be to use some SMT Solver API and proceed as follows:
assert the whole formula
repeat until all combinations of values have been tried:
    push a back-track point
    assert one specific combination of values, e.g. a = 8 and b = 2
    repeat forever:
        check for a solution
        if UNSAT, exit the inner-most loop
        if SAT, increase the counter of models for the given combination of values of a and b
        take the model value of any other variable, e.g. c = 5 and d = 6
        assert a new constraint requiring that at least one of the "other" variables changes its value, e.g. c != 5 or d != 6
    pop the back-track point
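Here is a rough sketch of that scheme with the z3 Python bindings (as in the previous answer), specialised to the toy constraints from the question; only c needs to be enumerated for each (a, b) pair:

from z3 import Ints, Solver, And, sat

a, b, c = Ints('a b c')
s = Solver()
s.add(And(5 < a, a < 10, 0 < b, b < 5, b < c, c < a))

counts = {}
for va in range(6, 10):          # candidate values for a
    for vb in range(1, 5):       # candidate values for b
        s.push()                 # back-track point
        s.add(a == va, b == vb)  # one specific combination of values
        n = 0
        while s.check() == sat:
            m = s.model()
            n += 1
            s.add(c != m[c])     # force the remaining variable to change
        s.pop()
        counts[(va, vb)] = n

print(max(counts, key=counts.get))  # expected: (9, 1)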
Alternatively, you may enumerate the possible assignments over a and b implicitly rather than explicitly. The idea would be as follows:
assert the whole formula
repeat forever:
    check for a solution
    if UNSAT, exit the loop
    if SAT, take the combination of values of your control variables from the model (e.g. a = 8 and b = 2); check in an internal map whether you encountered this combination before: if not, set its counter to 1, otherwise increase the counter by 1
    take the model value of any other variable, e.g. c = 5 and d = 6
    assert a new constraint requesting a new solution, e.g. a != 8 or b != 2 or c != 5 or d != 6
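A rough sketch of this second scheme, again with the z3 Python bindings and the same toy constraints (each found model is blocked so that the next check() must produce a different one):

from z3 import Ints, Solver, And, Or, sat

a, b, c = Ints('a b c')
s = Solver()
s.add(And(5 < a, a < 10, 0 < b, b < 5, b < c, c < a))

counts = {}
while s.check() == sat:
    m = s.model()
    key = (m[a].as_long(), m[b].as_long())
    counts[key] = counts.get(key, 0) + 1
    # block this exact assignment before asking for another model
    s.add(Or(a != m[a], b != m[b], c != m[c]))

best = max(counts, key=counts.get)
print(best, counts[best])  # expected: (9, 1) with the largest count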
In case you are in doubt about which SMT solver to pick, I would advise you to start solving your task with pysmt, which allows one to choose among several SMT engines with ease.
If for your application an explicit enumeration of models is too slow to be practical, then I would advise you to look at the vast literature on Counting Solutions of CSPs, where this problem has already been tackled and there seem to exist several ways to approximately estimate the number of solutions of CSPs.

Simultaneous Equations with given conditions

To start off, I have already solved this problem, so it's not a big deal; I'm just asking to satisfy my own curiosity. The question is how to solve a series of simultaneous equations given a set of constraints. The equations are:
tau = 62.4*d*0.0007
A = (b + 1.5*d)*d
P = b + 2*d*sqrt(1 + 1.5**2)
R = A/P
Q = (1.486/0.03)*A*(R**(2.0/3.0))*(0.0007**0.5)
and the conditions are:
tau <= 0.29, Q = 10000 +- say 3, and minimize b
As I mentioned I was already able to come up with a solution using a series of nested loops:
from numpy import linspace, sqrt

b = linspace(320, 330, 1000)
d = linspace(0.1, 6.6392, 1000)
ansQ = []
ansv = []
anstau = []
i_index = []
j_index = []
for i in range(len(b)):
    for j in range(len(d)):
        tau = 62.4*d[j]*0.0007
        A = (b[i] + 1.5*d[j])*d[j]
        P = b[i] + 2*d[j]*sqrt(1 + 1.5**2)
        R = A/P
        Q = (1.486/0.03)*A*(R**(2.0/3.0))*(0.0007**0.5)
        if Q >= 10000 and tau <= 0.29:
            ansQ.append(Q)
            ansv.append(Q/A)
            anstau.append(tau)
            i_index.append(i)
            j_index.append(j)
This takes a while, and there is something in the back of my head saying that there must be an easier/more elegant solution to this problem. Thanks (Linux Mint 13, Python 2.7.x, scipy 0.11.0)
You seem to have only two degrees of freedom here---you can rewrite everything in terms of b and d, or b and tau, or (pick your two favorites). Your constraint on tau directly implies a constraint on d, and you can use your constraint on Q to imply a constraint on b.
And it doesn't look (to me at least, I still haven't finished my coffee) like your code is doing anything other than plotting some two-dimensional functions over a grid you've defined--NOT solving a system of equations. I normally understand "solving" to involve setting something equal to something else, and writing one variable as a function of another variable.
It does appear you've only posted a snippet, though, so I'll assume you do something else with your data downstream.
Ok, I see. I think this isn't really a minimization problem, it's a plotting problem. The first thing I'd do is see what ranges your constraints imply for b and d (the constraint on tau directly gives you a bound on d). Then you can mesh those points with meshgrid (as you mentioned below) and run over all combinations.
Since you're applying the constraint before you apply the mesh (as opposed to after, as in your code), you'll only be sampling the parameter space that you're interested in. In your code you generate a bunch of junk you're not interested in, and pick out the gems. If you apply your constraints first, you'll only be left with gems!
I'd define my functions like:
P = lambda b, d: b + 2*d*np.sqrt(1 + 1.5**2)
which works like
>>> import numpy as np
>>> P = lambda b, d: b + 2*d*np.sqrt(1 + 1.5**2)
>>> P(1,2)
8.2111025509279791
Then you can write another function to serve up b and d for you, so you can do something like:
def get_func_vals(b, d):
    pvals.append(P(b, d))
or, better yet, store b and d as tuples in a function that doesn't return but yields:
pvals = [P(b,d) for (b,d) in thing_that_yields_b_and_d_tuples]
I didn't test this last line of code, and I always screw up these parentheses, but I think it's right.
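Putting the "constraints first" idea together, here is a rough sketch (the numeric ranges are the ones from the question; the tau constraint directly caps d, so only the feasible region is meshed):

import numpy as np

d_max = 0.29 / (62.4 * 0.0007)  # ~6.639, from tau = 62.4*d*0.0007 <= 0.29
b, d = np.meshgrid(np.linspace(320, 330, 1000),
                   np.linspace(0.1, d_max, 1000))

A = (b + 1.5*d) * d
P = b + 2*d*np.sqrt(1 + 1.5**2)
R = A / P
Q = (1.486/0.03) * A * R**(2.0/3.0) * 0.0007**0.5

feasible = Q >= 10000  # same acceptance test as in the question's loop
print(b[feasible].min() if feasible.any() else "no feasible points")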
