I'm using cvxopt to calculate the Nash equilibrium of the following two-person zero-sum game:
[-5, 3, 1, 8]
[ 5, 5, 4, 6]
[-4, 6, 0, 5]
Here's the code (with doctest) I'm using.
from cvxopt import matrix, solvers
from cvxopt.modeling import op, dot, variable
import numpy as np

def solve_lp(a, b, c):
    """
    >>> a = matrix([[-5., 3., 1., 8., 1.],
    ...             [ 5., 5., 4., 6., 1.],
    ...             [-4., 6., 0., 5., 1.],
    ...             [-1.,-1.,-1.,-1., 0.],
    ...             [ 1., 1., 1., 1., 0.],
    ...             [-1., 0., 0., 0., 0.],
    ...             [ 0.,-1., 0., 0., 0.],
    ...             [ 0., 0.,-1., 0., 0.],
    ...             [ 0., 0., 0.,-1., 0.]])
    >>> b = matrix([0.,0.,0.,0.,1.])
    >>> c = matrix([0.,0.,0., 1.,-1.,0.,0.,0.,0.])
    >>> solve_lp(a, b, c)
    """
    variables = c.size[0]
    x = variable(variables, 'x')
    eq = (a*x == b)
    ineq = (x >= 0)
    lp = op(dot(c, x), [eq, ineq])
    lp.solve(solver='glpk')
    return (lp.objective.value(), x.value)
Running it generates the following error:
Traceback (most recent call last):
...
TypeError: 'G' must be a dense or sparse 'd' matrix with 9 columns
It seems that cvxopt is rejecting the ineq constraint, even though I'm following the syntax for constraints from the modeling examples.
What I've tried so far
Changing the code by multiplying x by a vector of 1s:
def solve_lp(a, b, c):
    variables = c.size[0]
    x = variable(variables, 'x')
    e = matrix(1.0, (1, variables))
    eq = (a*x == b)
    ineq = (e*x >= 0)
    lp = op(dot(c, x), [eq, ineq])
    lp.solve(solver='glpk')
    return (lp.objective.value(), x.value)
At least this version gets to GLPK, which in turn produces this error:
Scaling...
A: min|aij| = 1.000e+00 max|aij| = 8.000e+00 ratio = 8.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part = 6
* 0: obj = 0.000000000e+00 infeas = 0.000e+00 (0)
PROBLEM HAS UNBOUNDED SOLUTION
glp_simplex: unable to recover undefined or non-optimal solution
How do I fix this?
I think you should follow the usage of the GLPK solver shown in this file:
https://github.com/benmoran/L1-Sudoku/blob/master/sudoku.py
If you follow it exactly, you should be able to use the GLPK solver correctly:
def solve_plain_l1(A, b, solver='glpk'):
    '''Find x with min l1 such that Ax=b,
    using plain L1 minimization'''
    n = A.size[1]
    c0 = ones_v(2*n)
    G1 = concathoriz(A, -A)  # concatenate horizontally
    G2 = concathoriz(-A, A)
    G3 = -eye(2*n)
    G = reduce(concatvert, [G1, G2, G3])  # concatenate vertically
    hh = reduce(concatvert, [b, -b, zeros_v(2*n)])
    u = cvxopt.solvers.lp(c0, G, hh, solver=solver)
    v = u['x'][:n]
    return v
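Note that ones_v, concathoriz, concatvert, eye and zeros_v are small helpers defined in the linked sudoku.py. If you don't want to pull in that file, equivalents are easy to write against the cvxopt API; a rough sketch (my reconstruction, so double-check against the original file):

import cvxopt
from cvxopt import matrix, spmatrix

def ones_v(n):
    return matrix(1.0, (n, 1))    # column vector of ones

def zeros_v(n):
    return matrix(0.0, (n, 1))    # column vector of zeros

def eye(n):
    return spmatrix(1.0, range(n), range(n))  # sparse n-by-n identity

def concathoriz(A, B):
    # cvxopt block syntax: each inner list is one block column,
    # so two inner lists place A and B side by side
    return matrix([[A], [B]])

def concatvert(A, B):
    # one inner list stacks its blocks vertically
    return matrix([[A, B]])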
I'm trying to convert this R function to Python. However, I am running into issues with the results being incorrect.
The R function in question:
construct_omega <- function(k){
  E <- diag(2*k)
  omega <- matrix(0, ncol=2*k, nrow=2*k)
  for (i in 1:k){
    omega <- omega +
      E[,2*i-1] %*% t(E[,2*i]) -
      E[,2*i] %*% t(E[,2*i-1])
  }
  return(omega)
}
This is my current attempt at porting the function to Python:
def construct_omega(k=1):
    E = np.identity(2*k)
    omega = np.zeros((2*k, 2*k))
    for i in range(1, k):
        omega = omega + \
            E[:,2*i-1] * np.transpose(E[:,2*i]) - \
            E[:,2*i] * np.transpose(E[:,2*i-1])
    return omega
In R, the result matrix is this:
> construct_omega(2)
[,1] [,2] [,3] [,4]
[1,] 0 1 0 0
[2,] -1 0 0 0
[3,] 0 0 0 1
[4,] 0 0 -1 0
But in Python, the result is the 4x4 zero matrix.
Any help would be appreciated, thanks!
The issue here is an edge case of array multiplication in numpy. You should consult the docs and this post or this one. Basically, `*` on two 1-D arrays is an elementwise product (and .T is a no-op on 1-D arrays), not the outer product the R code computes with %*%; since distinct basis vectors have disjoint support, every term comes out as zeros. The fix is to use np.outer. The other issue is that indexing in Python starts at 0, so you need to rewrite the loop bounds a bit.
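A quick way to see the difference:

import numpy as np

E = np.identity(4)
# elementwise product of two distinct basis vectors is all zeros,
# and .T is a no-op on 1-D arrays, so the R-style expression vanishes
print(E[:, 0] * E[:, 1].T)         # [0. 0. 0. 0.]
# np.outer builds the rank-one matrix that R's %*% produces
print(np.outer(E[:, 0], E[:, 1]))  # 1 at position (0, 1), zeros elsewhere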
import numpy as np

def construct_omega(k=1):
    E = np.identity(2*k)
    omega = np.zeros((2*k, 2*k))
    for i in range(k):
        omega = omega + \
            np.outer(E[:,2*i], E[:,2*i+1].T) - \
            np.outer(E[:,2*i+1], E[:,2*i].T)
    return omega
construct_omega(1)
# array([[ 0.,  1.],
#        [-1.,  0.]])
and
construct_omega(2)
# array([[ 0.,  1.,  0.,  0.],
#        [-1.,  0.,  0.,  0.],
#        [ 0.,  0.,  0.,  1.],
#        [ 0.,  0., -1.,  0.]])
I'm trying to find the determinant of a matrix using torch.det. However, it seems like I'm either not doing it right or the function is not working properly (the results should be 0 rather than a small number).
a = torch.tensor([1.0, 1.0])
b = torch.tensor([3.0, 3.0])
c = torch.stack([a,b], dim = 1)
print(c)
torch.det(d)
tensor([[1., 3.],
        [1., 3.]])
tensor(1.2517e-06)
Another example:
a = torch.tensor([2, -1, 1]).float()
b = torch.tensor([3, -4, -2]).float()
c = torch.tensor([5, -10, -8]).float()
d = torch.stack([a,b,c], dim = 1)
print(d)
print(torch.det(d))
>>>
tensor([[ 2., 3., 5.],
[ -1., -4., -10.],
[ 1., -2., -8.]])
tensor(1.2517e-06)
Update 1:
I think I had a typo in the first example (I restarted everything and reran it):
import torch
a = torch.tensor([1.0, 1.0])
b = torch.tensor([3.0, 3.0])
c = torch.stack([a,b], dim = 1)
print(c)
torch.det(c)
>>> tensor([[1., 3.],
[1., 3.]])
tensor(0.)
Though I believe the second example should also be 0.
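This looks like ordinary float32 round-off: torch.det goes through an LU-style factorization, so an exactly singular matrix typically yields a tiny residual like 1.2517e-06 rather than an exact 0. A minimal sketch of two more robust checks, casting to float64 and testing against a tolerance:

import torch

a = torch.tensor([2., -1., 1.])
b = torch.tensor([3., -4., -2.])
c = torch.tensor([5., -10., -8.])
d = torch.stack([a, b, c], dim=1)

print(torch.det(d))           # float32: tiny nonzero residual, e.g. ~1e-06
print(torch.det(d.double()))  # float64: residual shrinks by many orders of magnitude
# treat "numerically zero" as singular instead of testing == 0
print(torch.isclose(torch.det(d), torch.tensor(0.0), atol=1e-5))  # tensor(True)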
I have implemented Gaussian elimination, back substitution, and solver functions to solve simultaneous linear equations. I've been stepping through the code, but I don't understand why I'm getting an error from the functions below:
import numpy as np

A = np.array([[ 4., -1., -1., -1.],
              [-1.,  3.,  0., -1.],
              [-1.,  3.,  0., -1.],
              [-1., -1., -1.,  4.]])
v = np.array([5., 0., 5., 0.])

def gaussian_elimination(A, v):
    N = len(v)          # A should be shaped (N, N), and v is an (N,) vector
    A_new = np.copy(A)  # make copies, so our elimination doesn't affect the originals
    v_new = np.copy(v)
    for m in range(N):
        # divide by the diagonal element to normalize
        div = A_new[m, m]
        A_new[m, :] /= div
        v_new[m] /= div
        # now subtract from the lower rows
        for i in range(m+1, N):
            mult = A_new[i, m]
            A_new[i, :] -= mult * A_new[m, :]
            v_new[i] -= mult * v_new[m]
    return A_new, v_new

def backsubstitution(A, v):
    N = len(v)              # A should be shaped (N, N), and v is an (N,) vector
    x = np.empty(N, float)  # holds results we will return
    for m in range(N-1, -1, -1):
        x[m] = v[m]
        for i in range(m+1, N):
            x[m] -= A[m, i] * x[i]
    return x

def linear_system_solve(A, v):
    A_upper, v_upper = gaussian_elimination(A, v)
    x = backsubstitution(A_upper, v_upper)
    return x

# solve the system of equations using the gaussian elimination function
x = linear_system_solve(A, v)
print(A)
print(v)
print(x)

# use the NumPy linear algebra library to solve the same system
x2 = np.linalg.solve(A, v)
print(x2)
For my Gaussian elimination functions I'm getting the following output (note the nan values):
[[ 4. -1. -1. -1.]
[-1. 3. 0. -1.]
[-1. 3. 0. -1.]
[-1. -1. -1. 4.]]
[5. 0. 5. 0.]
[nan nan nan nan]
For the NumPy linear algebra library I'm getting the following error:
---------------------------------------------------------------------------
LinAlgError Traceback (most recent call last)
<ipython-input-5-cc74f430c404> in <module>()
1 # use the NumPy linear algebra library to solve the same system here
2
----> 3 x2 = np.linalg.solve(A, v)
4 print( x2 )
~/anaconda3/lib/python3.7/site-packages/numpy/linalg/linalg.py in solve(a, b)
392 signature = 'DD->D' if isComplexType(t) else 'dd->d'
393 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 394 r = gufunc(a, b, signature=signature, extobj=extobj)
395
396 return wrap(r.astype(result_t, copy=False))
~/anaconda3/lib/python3.7/site-packages/numpy/linalg/linalg.py in
_raise_linalgerror_singular(err, flag)
87
88 def _raise_linalgerror_singular(err, flag):
---> 89 raise LinAlgError("Singular matrix")
90
91 def _raise_linalgerror_nonposdef(err, flag):
LinAlgError: Singular matrix
Can anyone please help me to understand the glitch?
The procedure you implemented requires the matrix to be of full rank, and so does np.linalg.solve. Your matrix A, however, is not of full rank, since rows A[1] and A[2] are not linearly independent. Otherwise, your code works fine. See the following example with modified data.
# set A[2, 2] to 1
A = np.array([[ 4., -1., -1., -1.],
              [-1.,  3.,  0., -1.],
              [-1.,  3.,  1., -1.],
              [-1., -1., -1.,  4.]])
v = np.array([5., 0., 5., 0.])

np.allclose(np.linalg.solve(A, v), linear_system_solve(A, v))  # True
You can check the rank of a matrix using np.linalg.matrix_rank.
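For instance, checking the rank of the original matrix confirms the problem:

import numpy as np

A = np.array([[ 4., -1., -1., -1.],
              [-1.,  3.,  0., -1.],
              [-1.,  3.,  0., -1.],
              [-1., -1., -1.,  4.]])
print(np.linalg.matrix_rank(A))  # 3, so the 4x4 matrix is rank-deficient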
I use np.einsum to calculate the flow of material in a graph (one node to four nodes in this example). The amount of flow is given by amount (amount.shape == (1, 1, 2); the dimensions define certain criteria, let's call them a, b, c).
The boolean matrix route determines the permissible flows into y based on the a, b, c criteria (route.shape == (4, 1, 1, 2); yabc). I label the dimensions y, a, b, c: abc are equivalent to amount's dimensions abc, and y is the direction of the flow (0, 1, 2 or 3). To determine the amount of material in y, I calculate np.einsum('abc,yabc->y', amount, route) and get a y-dim vector with the flows into y. There is also an implicit prioritization in route: wherever route[0, ...] == True, the entries for y=1..3 are False; wherever route[1, ...] == True, the entries for the higher y-dim routes are False, and so on. route[3, ...] (the last y-index) defines the catch-all route, that is, its values are True where all previous y-index values were False ((route[0] ^ route[1] ^ route[2] ^ route[3]).all() == True).
This works fine. However, when I introduce another criterion (dimension) x which only exists in route, but not in amount, this logic seems to break. The code below demonstrates the problem:
>>> import numpy as np
>>> amount = np.asarray([[[5000.0, 0.0]]])
>>> route = np.asarray([[[[[False, True]]], [[[False, True]]], [[[False, True]]]], [[[[True, False]]], [[[False, False]]], [[[False, False]]]], [[[[False, False]]], [[[True, False]]], [[[False, False]]]], [[[[False, False]]], [[[False, False]]], [[[True, False]]]]], dtype=bool)
>>> amount.shape
(1, 1, 2)
>>> # Added dimension `x`
>>> # y,x,a,b,c
>>> route.shape
(4, 3, 1, 1, 2)
>>> # Attempt 1: `5000` can flow into y=1, 2 or 3. I expect
>>> # `flows1.sum() == amount.sum()` as it would be without `x`.
>>> # Correct solution would be `[0, 5000, 0, 0]` because material is routed
>>> # to y=1, and is not available for y=2 and y=3 as they are lower
>>> # priority (higher index)
>>> flows1 = np.einsum('abc,yxabc->y', amount, route)
>>> flows1
array([ 0., 5000., 5000., 5000.])
>>> # Attempt 2: try to collapse `x` => not much different, duplication
>>> np.einsum('abc,yabc->y', amount, route.any(1))
array([ 0., 5000., 5000., 5000.])
>>> # This is the flow by `y` and `x`. I'd only expect a `5000` in the
>>> # 2nd row (`[5000., 0., 0.]`) not the others.
>>> np.einsum('abc,yxabc->yx', amount, route)
array([[ 0., 0., 0.],
[5000., 0., 0.],
[ 0., 5000., 0.],
[ 0., 0., 5000.]])
Is there any feasible operation which I can apply to route (.all(1) doesn't work either) to ignore the x-dimension?
Another example:
>>> amount2 = np.asarray([[[5000.0, 1000.0]]])
>>> np.einsum('abc,yabc->y', amount2, route.any(1))
array([1000., 5000., 5000., 5000.])
can be interpreted as 1000.0 being routed to y=0 (and none of the other y-destinations) and 5000.0 being compatible with destinations y=1, y=2 and y=3; but ideally, I'd only like 5000.0 to show up in y=1 (as that's the lowest index and highest destination priority).
Solution attempt
The below works, but is not very numpy-ish. It would be great if the loop could be eliminated.
# Initialise destination
result = np.zeros(route.shape[0])
# Calculate flow while maintaining all dimensions (this will cause
# double-ups, because `x` is not part of `amount2`)
temp = np.einsum('abc,yxabc->yxabc', amount2, route)
temp_ixs = np.asarray(np.where(temp))
# For each original amount, find the destination (`y`)
for a, b, c in zip(*np.where(amount2)):
    # Find where dimensions `abc` are equal in the destination.
    # Take the first vector which contains `yxabc` (we get `yx` as result)
    ix = np.where((temp_ixs[2:].T == [a, b, c]).all(axis=1))[0][0]
    y_ix = temp_ixs.T[ix][0]
    # ignored
    x_ix = temp_ixs.T[ix][1]
    v = amount2[a, b, c]
    # build resulting destination
    result[y_ix] += v

# result == array([1000., 5000., 0., 0.])
In other words: for each value in amount2, I am looking for the lowest indices yx in temp so that the value can be written to result[y] = value (x is ignored).
>>> temp = np.einsum('abc,yxabc->yx', amount2, route)
>>> temp
#          +--- value=1000 at y=0 => result[0] += 1000
#         /
array([[1000., 1000., 1000.],
#          +--- value=5000 at y=1 => result[1] += 5000
#         /
       [5000.,    0.,    0.],
       [   0., 5000.,    0.],
       [   0.,    0., 5000.]])
>>> result
array([1000., 5000., 0., 0.])
>>> amount2
array([[[5000., 1000.]]])
Another attempt to reduce the dimensionality of route is:
>>> r = route.any(1)
>>> for x in xrange(1, route.shape[0]):
...     r[x] = r[x] & (r[:x] == False).all(axis=0)
...
>>> np.einsum('abc,yabc->y', amount2, r)
array([1000., 5000., 0., 0.])
This essentially preserves the above-mentioned priority given by the first dimension of route: a lower-priority (higher-index) slice cannot contain a True value at a sub-index where a higher-priority slice already holds True. While this is a lot better than my explicit approach, it would be great if the for x in xrange... loop could be expressed as a numpy vector operation.
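Update: the loop can be collapsed into a single vectorized step with a cumulative count (my own follow-up, reusing route and amount2 from above). Along the priority axis, the first True is exactly where the running count of Trues equals one:

>>> r = route.any(1)
>>> # keep only the highest-priority True: the cumulative count of
>>> # Trues along axis 0 equals 1 exactly at the first hit
>>> r &= np.cumsum(r, axis=0) == 1
>>> np.einsum('abc,yabc->y', amount2, r)
array([1000., 5000., 0., 0.])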
I haven't tried to follow your 'flow' interpretation of the multiplication problem. I'm just focusing on the calculation options.
Stripped of unnecessary dimensions, your arrays are:
In [194]: amount
Out[194]: array([5000., 0.])
In [195]: route
Out[195]:
array([[[0, 1],
[0, 1],
[0, 1]],
[[1, 0],
[0, 0],
[0, 0]],
[[0, 0],
[1, 0],
[0, 0]],
[[0, 0],
[0, 0],
[1, 0]]])
And the yx calculation is:
In [197]: np.einsum('a,yxa->yx',amount, route)
Out[197]:
array([[ 0., 0., 0.],
[5000., 0., 0.],
[ 0., 5000., 0.],
[ 0., 0., 5000.]])
which is just this slice of route times 5000.
In [198]: route[:,:,0]
Out[198]:
array([[0, 0, 0],
[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
Omitting the x on the RHS of the einsum results in summation across that dimension.
Equivalently we can multiply (with broadcasting):
In [200]: (amount*route).sum(axis=2)
Out[200]:
array([[ 0., 0., 0.],
[5000., 0., 0.],
[ 0., 5000., 0.],
[ 0., 0., 5000.]])
In [201]: (amount*route).sum(axis=(1,2))
Out[201]: array([ 0., 5000., 5000., 5000.])
Maybe looking at amount*route will help visualize the problem. You can also use max, min, argmax etc instead of sum, or along with it on one or more of the axes.
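For instance, since argmax on a boolean array returns the first True, it is one way to pick out the highest-priority (lowest-index) y that receives any flow; a small sketch with the stripped-down arrays:

import numpy as np

amount = np.array([5000., 0.])
route = np.array([[[0, 1], [0, 1], [0, 1]],
                  [[1, 0], [0, 0], [0, 0]],
                  [[0, 0], [1, 0], [0, 0]],
                  [[0, 0], [0, 0], [1, 0]]])

flow_yx = (amount * route).sum(axis=2)        # same yx table as In [200]
y_first = (flow_yx > 0).any(axis=1).argmax()  # first y with any flow
print(y_first)                                # 1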
I'm trying to convert from a numpy array of signs (i.e., a numpy array whose entries are either 1. or -1.) to an integer and back through a binary representation. I have something that works, but it's not Pythonic, and I expect it'll be slow.
def sign2int(s):
    s[s==-1.] = 0.
    bstr = ''
    for i in range(len(s)):
        bstr = bstr + str(int(s[i]))
    return int(bstr, 2)

def int2sign(i, m):
    bstr = bin(i)[2:].zfill(m)
    s = []
    for d in bstr:
        s.append(float(d))
    s = np.array(s)
    s[s==0.] = -1.
    return s
Then
>>> m = 4
>>> s0 = np.array([1., -1., 1., 1.])
>>> i = sign2int(s0)
>>> print i
11
>>> s = int2sign(i, m)
>>> print s
[ 1. -1. 1. 1.]
I'm concerned about (1) the for loops in each and (2) having to build an intermediate representation as a string.
Ultimately, I will want something that works with a 2-d numpy array too, e.g.:
>>> s = np.array([[1., -1., 1.], [1., 1., 1.]])
>>> print sign2int(s)
[5, 7]
For 1-d arrays you can use this one-line Numpythonic approach, using np.packbits:
>>> np.packbits(np.pad((s0+1).astype(bool).astype(int), (8-s0.size, 0), 'constant'))
array([11], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([11], dtype=np.uint8))[-4:]).astype(float)
>>> unpack[unpack==0] = -1
>>> unpack
array([ 1., -1., 1., 1.])
And for a 2-d array:
>>> x, y = s.shape
>>> np.packbits(np.pad((s+1).astype(bool).astype(int), (8-y, 0), 'constant')[-2:])
array([5, 7], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([5, 7], dtype='uint8'))).astype(float).reshape(x, 8)[:,-y:]
>>> unpack[unpack==0] = -1
>>> unpack
array([[ 1., -1., 1.],
[ 1., 1., 1.]])
I'll start with sign2int. Convert from a sign representation to binary:
>>> a
array([ 1., -1., 1., -1.])
>>> (a + 1) / 2
array([ 1., 0., 1., 0.])
>>>
Then you can simply create an array of powers of two, multiply it by the binary representation, and sum:
>>> powers = np.arange(a.shape[-1])[::-1]
>>> np.power(2, powers)
array([8, 4, 2, 1])
>>> a = (a + 1) / 2
>>> powers = np.power(2, powers)
>>> a * powers
array([ 8., 0., 2., 0.])
>>> np.sum(a * powers)
10.0
>>>
Then make it operate on rows by adding axis information and relying on broadcasting:
def sign2int(a):
    # powers of two
    powers = np.arange(a.shape[-1])[::-1]
    np.power(2, powers, powers)
    # sign to "binary" - add one and divide by two
    np.add(a, 1, a)
    np.divide(a, 2, a)
    # scale by powers of two and sum
    np.multiply(a, powers, a)
    return np.sum(a, axis = -1)
>>> b = np.array([a, a, a, a, a])
>>> sign2int(b)
array([ 11., 11., 11., 11., 11.])
>>>
I tried it on a 4-by-100 bit array and it seemed fast:
>>> a = a.repeat(100)
>>> b = np.array([a, a, a, a, a])
>>> b
array([[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.]])
>>> sign2int(b)
array([ 2.58224988e+120, 2.58224988e+120, 2.58224988e+120,
2.58224988e+120, 2.58224988e+120])
>>>
I'll add the reverse if I can figure it out. The best I could do relies on some plain Python without any numpy vectorization magic, and I haven't figured out how to make it work with a sequence of ints other than to iterate over them and convert them one at a time, but the time still seems acceptable.
def foo(n):
    '''yields bits in increasing powers of two
    bit sequence from lsb --> msb
    '''
    while n > 0:
        n, r = divmod(n, 2)
        yield r

def int2sign(n):
    n = int(n)
    a = np.fromiter(foo(n), dtype = np.int8, count = n.bit_length())
    np.multiply(a, 2, a)
    np.subtract(a, 1, a)
    return a[::-1]
Works on 1324:
>>> bin(1324)
'0b10100101100'
>>> a = int2sign(1324)
>>> a
array([ 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1], dtype=int8)
Seems to work with 1.2e305:
>>> n = int(1.2e305)
>>> n.bit_length()
1014
>>> a = int2sign(n)
>>> a.shape
(1014,)
>>> s = bin(n)
>>> s = s[2:]
>>> all(2 * int(x) -1 == y for x, y in zip(s, a))
True
>>>
Here are some vectorized versions of your functions:
def sign2int(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)

def int2sign(i, m):
    tmp = np.array(list(bin(i)[2:].zfill(m)))
    return np.where(tmp == "0", "-1", tmp).astype(int)
s0 = np.array([1., -1., 1., 1.])
sign2int(s0)
# 11
int2sign(11, 5)
# array([-1, 1, -1, 1, 1])
To use your functions on 2-d arrays, you can use the map function:
s = np.array([[1., -1., 1.], [1., 1., 1.]])
map(sign2int, s)
# [5, 7]
map(lambda x: int2sign(x, 4), [5, 7])
# [array([-1, 1, -1, 1]), array([-1, 1, 1, 1])]
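(Note that in Python 3, map returns a lazy iterator, so wrap these calls in list(...) to get the lists shown.)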
After a bit of testing, the Numpythonic approach of @wwii that doesn't use strings seems to fit what I need best. For int2sign, I used a for loop over the exponents with a standard algorithm for the conversion, which will have at most 64 iterations for 64-bit integers. Numpy's broadcasting happens across each integer very efficiently.
packbits and unpackbits are restricted to 8-bit integers; otherwise, I suspect that would've been the best (though I didn't try).
Here are the specific implementations I tested that follow the suggestions in the other answers (thanks to everyone!):
def _sign2int_str(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)

def sign2int_str(s):
    return np.array(map(_sign2int_str, s))

def _int2sign_str(i, m):
    tmp = np.array(list(bin(i)[2:])).astype(int)
    return np.pad(np.where(tmp == 0, -1, tmp), (m - len(tmp), 0), "constant", constant_values = -1)

def int2sign_str(i, m):
    return np.array(map(lambda x: _int2sign_str(x, m), i.astype(int).tolist())).transpose()

def sign2int_np(s):
    p = np.arange(s.shape[-1])[::-1]
    b = (s + 1) / 2                  # map the -1/1 signs to 0/1 bits
    return np.sum(b * np.power(2, p), axis = -1).astype(int)

def int2sign_np(i, m):
    N = i.shape[-1]
    S = np.zeros((m, N))
    for k in range(m):
        b = np.power(2, m - 1 - k).astype(int)
        S[k, :] = np.floor_divide(i.astype(int), b).astype(float)  # extract the k-th bit
        i = np.mod(i, b)
    S[S == 0.] = -1.
    return S
And here is my test:
import time

X = np.sign(np.random.normal(size=(5000, 20)))
N = 100

t = time.time()
for i in range(N):
    S = sign2int_np(X)
print 'sign2int_np: \t{:10.8f} sec'.format((time.time() - t)/N)

t = time.time()
for i in range(N):
    S = sign2int_str(X)
print 'sign2int_str: \t{:10.8f} sec'.format((time.time() - t)/N)

m = 20
S = np.random.randint(0, high=np.power(2, m), size=(5000,))

t = time.time()
for i in range(N):
    X = int2sign_np(S, m)
print 'int2sign_np: \t{:10.8f} sec'.format((time.time() - t)/N)

t = time.time()
for i in range(N):
    X = int2sign_str(S, m)
print 'int2sign_str: \t{:10.8f} sec'.format((time.time() - t)/N)
This produced the following results:
sign2int_np: 0.00165325 sec
sign2int_str: 0.04121902 sec
int2sign_np: 0.00318024 sec
int2sign_str: 0.24846984 sec
I think numpy.packbits is worth another look. Given a real-valued sign array a, you can use numpy.packbits(a > 0). Decompression is done by numpy.unpackbits. This implicitly flattens multi-dimensional arrays so you'll need to reshape after unpackbits if you have a multi-dimensional array.
Note that you can combine bit packing with conventional compression (e.g., zlib or lzma). If there is a pattern or bias to your data, you may get a useful compression factor, but for unbiased random data, you'll typically see a moderate size increase.
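For example, a quick check of both points (a sketch; exact sizes vary from run to run):

import zlib
import numpy as np

a = np.sign(np.random.normal(size=10000))  # random +/-1 signs
packed = np.packbits(a > 0)                # one bit per sign: 1250 bytes
compressed = zlib.compress(packed.tobytes())
print(len(packed), len(compressed))        # for unbiased data, zlib output is typically a bit larger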