I'm looking to port an algorithm from MATLAB to Python. One step in said algorithm involves taking A^(-1/2), where A is a 9x9 square complex matrix. As I understand it, square roots of matrices (and by extension their inverses) are not unique.
I've been experimenting with scipy.linalg.fractional_matrix_power and an approximation using A^(-1/2) = exp((-1/2)*log(A)) with SciPy's expm and logm functions. The former is exceptionally poor and only provides 3 decimal places of precision, whereas the latter is decently correct for elements in the top left corner but gets progressively worse as you move down and to the right. This may or may not be a perfectly valid mathematical solution to the expression; however, it doesn't suffice for this application.
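For reference, a minimal sketch of the two approaches being compared (the tiny 2x2 Hermitian matrix here is only a stand-in; the 9x9 matrix from the MCVE below would be substituted for A):

import numpy as np
from scipy.linalg import fractional_matrix_power, expm, logm

A = np.array([[2.0, 1.0j], [-1.0j, 3.0]])  # small Hermitian stand-in example

B1 = fractional_matrix_power(A, -0.5)      # direct fractional matrix power
B2 = expm(-0.5 * logm(A))                  # exp/log formulation

# For a principal inverse square root, B @ B @ A should be close to the identity.
print(np.linalg.norm(B1 @ B1 @ A - np.eye(2)))
print(np.linalg.norm(B2 @ B2 @ A - np.eye(2)))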
As a result, I'm looking to directly implement MATLAB's matrix power algorithm in Python so that I can 100% confirm the same result each time. Does anyone have any insight or documentation on how this would work? The more parallelizable this algorithm is, the better, as eventually the goal would be to rewrite it in OpenCL for GPU acceleration.
EDIT: An MCVE as requested:
[[(0.591557294607941+4.33680868994202e-19j), (-0.219707725574605-0.35810724986609j), (-0.121305654177909+0.244558388829046j), (0.155552026648172-0.0180264818714123j), (-0.0537690384136066-0.0630740244116577j), (-0.0107526931263697+0.0397896274845627j), (0.0182892503609312-0.00653264433724856j), (-0.00710188853532244-0.0050445035279044j), (-2.20414002823034e-05+0.00373184532662288j)], [(-0.219707725574605+0.35810724986609j), (0.312038814492119+2.16840434497101e-19j), (-0.109433401402399-0.174379997015402j), (-0.0503362231078033+0.108510948023091j), (0.0631826956936223-0.00992931123813742j), (-0.0219902325360141-0.0233215237172002j), (-0.00314837555001163+0.0148621558916679j), (0.00630295247506065-0.00266790359447072j), (-0.00249343102520442-0.00156160619280611j)], [(-0.121305654177909-0.244558388829046j), (-0.109433401402399+0.174379997015402j), (0.136649392858215-1.76182853028894e-19j), (-0.0434623984527311-0.0669251299161109j), (-0.0168737559719828+0.0393768358149159j), (0.0211288536117387-0.00417146769324491j), (-0.00734306979471257-0.00712443264825166j), (-0.000742681625102133+0.00455752452374196j), (0.00179068247786595-0.000862706240042082j)], [(0.155552026648172+0.0180264818714123j), (-0.0503362231078033-0.108510948023091j), (-0.0434623984527311+0.0669251299161109j), (0.0467980890488569+5.14996031930615e-19j), (-0.0140208255975664-0.0209483313237692j), (-0.00472995448413803+0.0117916398375124j), (0.00589653974090387-0.00134198920550751j), (-0.00202109265416585-0.00184021636458858j), (-0.000150793859056431+0.00116822322464066j)], [(-0.0537690384136066+0.0630740244116577j), (0.0631826956936223+0.00992931123813742j), (-0.0168737559719828-0.0393768358149159j), (-0.0140208255975664+0.0209483313237692j), (0.0136137125669776-2.03287907341032e-20j), (-0.00387854073283377-0.0056769786724813j), (-0.0011741038702424+0.00306007798625676j), (0.00144000687517355-0.000355251914809693j), (-0.000481433965262789-0.00042129815655098j)], [(-0.0107526931263697-0.0397896274845627j), (-0.0219902325360141+0.0233215237172002j), (0.0211288536117387+0.00417146769324491j), (-0.00472995448413803-0.0117916398375124j), (-0.00387854073283377+0.0056769786724813j), (0.00347771689075251+8.21621958836671e-20j), (-0.000944046302699304-0.00136521328407881j), (-0.00026318475762475+0.000704212317211994j), (0.00031422288569727-8.10033316327328e-05j)], [(0.0182892503609312+0.00653264433724856j), (-0.00314837555001163-0.0148621558916679j), (-0.00734306979471257+0.00712443264825166j), (0.00589653974090387+0.00134198920550751j), (-0.0011741038702424-0.00306007798625676j), (-0.000944046302699304+0.00136521328407881j), (0.000792908166233942-7.41153828847513e-21j), (-0.00020531962049495-0.000294952695922854j), (-5.36226164765808e-05+0.000145645628243286j)], [(-0.00710188853532244+0.00504450352790439j), (0.00630295247506065+0.00266790359447072j), (-0.000742681625102133-0.00455752452374196j), (-0.00202109265416585+0.00184021636458858j), (0.00144000687517355+0.000355251914809693j), (-0.00026318475762475-0.000704212317211994j), (-0.00020531962049495+0.000294952695922854j), (0.000162971629601464-5.39321759384574e-22j), (-4.03304806590714e-05-5.77159110863666e-05j)], [(-2.20414002823034e-05-0.00373184532662288j), (-0.00249343102520442+0.00156160619280611j), (0.00179068247786595+0.000862706240042082j), (-0.000150793859056431-0.00116822322464066j), (-0.000481433965262789+0.00042129815655098j), (0.00031422288569727+8.10033316327328e-05j), (-5.36226164765808e-05-0.000145645628243286j), (-4.03304806590714e-05+5.77159110863666e-05j), 
(3.04302590501313e-05-4.10281583826302e-22j)]]
I can think of two explanations; in both cases I accuse user error. In chronological order:
Theory #1 (the subtle one)
My suspicion is that you're copying the printed values of the input matrix from one code as input into the other. I.e. you're throwing away double precision when you switch codes, which gets amplified during the inverse-square-root calculation.
As proof, I compared MATLAB's inverse square root with the very function you're using in python. I will show a 3x3 example due to size considerations, but—spoiler warning—I did the same with a 9x9 random matrix and got two results with condition number 11.245754109790719 (MATLAB) and 11.245754109790818 (numpy). That should tell you something about the similarity of the results without having to save and load the actual matrices between the two codes. I suggest you do this though: keywords are scipy.io.loadmat and savemat.
What I did was generate the random data in python (because that's what I prefer):
>>> import numpy as np
>>> print((np.random.rand(3,3) + 1j*np.random.rand(3,3)).tolist())
[[(0.8404782758300281+0.29389006737780765j), (0.741574080512219+0.7944606900644321j), (0.12788250870304718+0.37304665786925073j)], [(0.8583402784463595+0.13952117266781894j), (0.2138809231406249+0.6233427148017449j), (0.7276466404131303+0.6480559739625379j)], [(0.1784816129006297+0.72452362541158j), (0.2870462766764591+0.8891190037142521j), (0.0980355896905617+0.03022344706473823j)]]
By copying the same truncated output into both codes, I guarantee the correspondence of the inputs.
Example in MATLAB:
>> M = [[(0.8404782758300281+0.29389006737780765j), (0.741574080512219+0.7944606900644321j), (0.12788250870304718+0.37304665786925073j)]; [(0.8583402784463595+0.13952117266781894j), (0.2138809231406249+0.6233427148017449j), (0.7276466404131303+0.6480559739625379j)]; [(0.1784816129006297+0.72452362541158j), (0.2870462766764591+0.8891190037142521j), (0.0980355896905617+0.03022344706473823j)]];
>> A = M^(-0.5);
>> format long
>> disp(A)
0.922112307438377 + 0.919346397931976i 0.108620882045523 - 0.649850434897895i -0.778737740194425 - 0.320654127149988i
-0.423384022626231 - 0.842737730824859i 0.592015668030645 + 0.661682656423866i 0.529361991464903 - 0.388343838121371i
-0.550789874427422 + 0.021129515921025i 0.472026152514446 - 0.502143106675176i 0.942976466768961 + 0.141839849623673i
>> cond(A)
ans =
3.429368520364765
Example in python:
>>> import numpy as np
>>> from scipy.linalg import fractional_matrix_power
>>> M = [[(0.8404782758300281+0.29389006737780765j), (0.741574080512219+0.7944606900644321j), (0.12788250870304718+0.37304665786925073j)], [(0.8583402784463595+0.13952117266781894j), (0.2138809231406249+0.6233427148017449j), (0.7276466404131303+0.6480559739625379j)], [(0.1784816129006297+0.72452362541158j), (0.2870462766764591+0.8891190037142521j), (0.0980355896905617+0.03022344706473823j)]]
>>> A = fractional_matrix_power(M,-0.5)
>>> print(A)
[[ 0.92211231+0.9193464j 0.10862088-0.64985043j -0.77873774-0.32065413j]
[-0.42338402-0.84273773j 0.59201567+0.66168266j 0.52936199-0.38834384j]
[-0.55078987+0.02112952j 0.47202615-0.50214311j 0.94297647+0.14183985j]]
>>> np.linalg.cond(A)
3.4293685203647408
My suspicion is that if you scipy.io.loadmat the matrix into python, do the calculation, scipy.io.savemat the result and load it back in with MATLAB, you'll see less than 1e-12 absolute error (hopefully even less) between the results.
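For illustration, that round trip might look like this (the file and variable names here are hypothetical):

import numpy as np
from scipy.io import loadmat, savemat
from scipy.linalg import fractional_matrix_power

data = loadmat('input_from_matlab.mat')      # produced in MATLAB with save('input_from_matlab.mat', 'M')
M = data['M']
A = fractional_matrix_power(M, -0.5)
savemat('result_from_python.mat', {'A': A})  # load back in MATLAB and compare against M^(-0.5)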
Theory #2 (the facepalm one)
My suspicion is that you're using Python 2, where -1/2 is evaluated with integer division as -1, so your "-1/2 power" is really just a plain matrix inverse:
>>> # python 3 below
>>> # python 3's // is python 2's /, i.e. integer division
>>> 1/2
0.5
>>> 1//2
0
>>> -1/2
-0.5
>>> -1//2
-1
So if you're using python 2, then calling
fractional_matrix_power(M,-1/2)
is actually the inverse of M. The obvious solution is to switch to python 3. The less obvious solution is to keep using python 2 (which you shouldn't, as the above exemplifies), but use
from __future__ import division
at the top of every source file. This will override the behaviour of the plain / division operator so that it matches the Python 3 semantics, and you will have one less headache.
Related
I'm curious as to why it's so much faster to multiply than to take powers in python (though from what I've read this may well be true in many other languages too). For example it's much faster to do
x*x
than
x**2
I suppose the ** operator is more general and can also deal with fractional powers. But if that's why it's so much slower, why doesn't it perform a check for an int exponent and then just do the multiplication?
Edit: Here's some example code I tried...
def pow1(r, n):
    for i in range(r):
        p = i**n

def pow2(r, n):
    for i in range(r):
        p = 1
        for j in range(n):
            p *= i
Now, pow2 is just a quick example and is clearly not optimised!
But even so I find that with n = 2 and r = 1,000,000, pow1 takes ~2500 ms and pow2 takes ~1700 ms.
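For reference, one way such timings might be measured (exact numbers will of course vary by machine and Python version):

import timeit

# assumes pow1 and pow2 from the snippet above are defined in __main__
print(timeit.timeit('pow1(1000000, 2)', 'from __main__ import pow1', number=1))
print(timeit.timeit('pow2(1000000, 2)', 'from __main__ import pow2', number=1))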
I admit that for large values of n, pow1 does get much quicker than pow2. But that's not too surprising.
Basically, naive repeated multiplication is O(n) with a very low constant factor. Taking the power is O(log n) with a higher constant factor (there are special cases that need to be tested: fractional exponents, negative exponents, etc.). Edit: just to be clear, that's O(n) where n is the exponent.
Of course the naive approach will be faster for small n; you're only really implementing a small subset of exponentiation, so your constant factor is negligible.
Adding a check is an expense, too. Do you always want that check there? A compiled language could make the check for a constant exponent to see if it's a relatively small integer because there's no run-time cost, just a compile-time cost. An interpreted language might not make that check.
It's up to the particular implementation unless that kind of detail is specified by the language.
Python doesn't know what distribution of exponents you're going to feed it. If it's going to be 99% non-integer values, do you want the code to check for an integer every time, making runtime even slower?
Putting this check inside the exponentiation operator would very slightly slow down every case where the exponent isn't the literal 2, so it isn't necessarily a win. However, in cases where the exponent is known in advance (e.g. the literal 2 is used), the bytecode generated could be optimised with a simple peephole optimisation. Presumably this simply hasn't been considered worth doing (it's a fairly specific case).
Here's a quick proof of concept that does such an optimisation (usable as a decorator). Note: you'll need the byteplay module to run it.
import byteplay, timeit

def optimise(func):
    c = byteplay.Code.from_code(func.func_code)
    for i, (op, arg) in enumerate(c.code):
        if op == byteplay.BINARY_POWER:
            if c.code[i-1] == (byteplay.LOAD_CONST, 2):
                # replace "LOAD_CONST 2; BINARY_POWER" with "DUP_TOP; BINARY_MULTIPLY"
                c.code[i-1] = (byteplay.DUP_TOP, None)
                c.code[i] = (byteplay.BINARY_MULTIPLY, None)
    func.func_code = c.to_code()
    return func

def square(x):
    return x**2

print "Unoptimised :", timeit.Timer('square(10)', 'from __main__ import square').timeit(10000000)
square = optimise(square)
print "Optimised :", timeit.Timer('square(10)', 'from __main__ import square').timeit(10000000)
Which gives the timings:
Unoptimised : 6.42024898529
Optimised : 4.52667593956
[Edit]
Actually, thinking about it a bit more, there's a very good reason why this optimisation isn't done. There's no guarantee that someone won't create a user-defined class that overrides the __mul__ and __pow__ methods and does something different for each. The only way to do it safely is if you can guarantee that the object on the top of the stack is one for which "x**2" and "x*x" give the same result, but working that out is much harder. E.g. in my example it's impossible, as any object could be passed to the square function.
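To make that concrete, here is a contrived (entirely hypothetical) class for which x**2 and x*x disagree, which is exactly why the interpreter can't blindly rewrite one into the other:

class Weird(object):
    def __init__(self, v):
        self.v = v
    def __mul__(self, other):
        return Weird(self.v * other.v)
    def __pow__(self, n):
        # deliberately NOT repeated multiplication
        return Weird(self.v + n)

x = Weird(3)
print((x * x).v)   # 9
print((x ** 2).v)  # 5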
An implementation of b^p with binary exponentiation
def power(b, p):
    """
    Calculates b^p by square-and-multiply
    Complexity O(log p)
    b -> double
    p -> non-negative integer
    res -> double
    """
    res = 1
    while p:
        if p & 0x1: res *= b
        b *= b
        p >>= 1
    return res
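A quick sanity check of that routine (assuming it is defined as above):

print(power(2.0, 10))          # 1024.0
print(power(1.5, 7), 1.5**7)   # both 17.0859375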
I'd suspect that nobody was expecting this to be all that important. Typically, if you want to do serious calculations, you do them in Fortran or C or C++ or something like that (and perhaps call them from Python).
Treating everything as exp(n * log(x)) works well in cases where n isn't integral or is pretty large, but is relatively inefficient for small integers. Checking to see if n is a small enough integer does take time, and adds complication.
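For instance, the exp/log formulation that general-purpose power routines fall back on looks like this (standard library math, purely illustrative):

import math

print(math.exp(3 * math.log(2.0)))    # ~8.0, up to floating-point rounding
print(math.exp(0.5 * math.log(2.0)))  # ~1.41421..., i.e. sqrt(2)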
Whether the check is worth it depends on the expected exponents, how important it is to get best performance here, and the cost of the extra complexity. Apparently, Guido and the rest of the Python gang decided the check wasn't worth doing.
If you like, you could write your own repeated-multiplication function.
how about x*x*x*x*x?
is it still faster than x**5?
As integer exponents get larger, taking powers might become faster than repeated multiplication.
But the point where the actual crossover occurs depends on various conditions, so in my opinion that's why the optimization was not done (or couldn't be done) at the language/library level. Users can still optimize for some special cases though :)
I'm looking for a pseudo-random number generator (an algorithm where you input a seed number and it outputs a different 'random-looking' number, and the same seed will always generate the same output) for numbers between 1 and 95^1,312,000.
I would use the Linear Feedback Shift Register (LFSR) PRNG, but if I did, I would have to convert the seed number (which could be up to 1.2 million digits long in base-10) into a binary number, which would be so massive that I think it would take too long to compute.
In response to a similar question, the Feistel cipher was recommended, but I didn't understand the vocabulary of the wiki page for that method (I'm going into 10th grade so I don't have a degree in encryption), so if you could use layman's terms, I would strongly appreciate it.
Is there an efficient way of doing this which won't take until the end of time, or is this problem impossible?
Edit: I forgot to mention that the PRNG sequence needs to have a full period. My mistake.
A simple way to do this is to use a linear congruential generator with modulus m = 95^1312000.
The formula for the generator is x_(n+1) = a*x_n + c (mod m). By the Hull-Dobell Theorem, it will have full period if and only if gcd(m,c) = 1 and 95 divides a-1. Furthermore, if you want good second values (right after the seed) even for very small seeds, a and c should be fairly large. Also, your code can't store these values as literals (they would be much too big). Instead, you need to be able to reliably produce them on the fly. After a bit of trial and error to make sure gcd(m,c) = 1, I hit upon:
import random

def get_book(n):
    random.seed(1941)  # Borges' Library of Babel was published in 1941
    m = 95**1312000
    a = 1 + 95 * random.randint(1, m//100)
    c = random.randint(1, m - 1)  # math.gcd(c, m) == 1
    return (a*n + c) % m
For example:
>>> book = get_book(42)
>>> book % 10**100
4779746919502753142323572698478137996323206967194197332998517828771427155582287891935067701239737874
shows the last 100 digits of "book" number 42. Given Python's built-in support for large integers, the code runs surprisingly fast (it takes less than 1 second to grab a book on my machine).
If you have a method that can produce a pseudo-random digit, then you can concatenate as many together as you want. It will be just as repeatable as the underlying PRNG.
However, you'll probably run out of memory scaling that up to millions of digits and attempting to do arithmetic. Normally stuff on that scale isn't done on "numbers". It's done on byte vectors, or something similar.
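As a rough sketch of that idea (the function and names here are hypothetical, and this particular scheme makes no full-period guarantee):

import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."   # 29 symbols, a stand-in for the real 95

def book_text(book_id, n_chars=80):
    """Return the first n_chars characters of a reproducible pseudo-random 'book'."""
    rng = random.Random(book_id)             # same id -> same sequence
    return "".join(rng.choice(ALPHABET) for _ in range(n_chars))

print(book_text(42))
print(book_text(42) == book_text(42))        # True: fully repeatable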
I have this line of MATLAB code:
a/b
I am using these inputs:
a = [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9]
b = ones(25, 18)
This is the result (a 1x25 matrix):
[5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
What is MATLAB doing? I am trying to duplicate this behavior in Python, and the mrdivide documentation in MATLAB was unhelpful. Where does the 5 come from, and why are the rest of the values 0?
I have tried this with other inputs and receive similar results, usually just a different first element and zeros filling the remainder of the matrix. In Python when I use linalg.lstsq(b.T,a.T), all of the values in the first matrix returned (i.e. not the singular one) are 0.2. I have already tried right division in Python and it gives something completely off with the wrong dimensions.
I understand what a least square approximation is, I just need to know what mrdivide is doing.
Related:
Array division- translating from MATLAB to Python
MRDIVIDE or the / operator actually solves the xb = a linear system, as opposed to MLDIVIDE or the \ operator which will solve the system bx = a.
To solve a system xb = a with a non-symmetric, non-invertible matrix b, you can either rely on mrdivide(), which is done via factorization of b with Gauss elimination, or on pinv(), which is done via singular value decomposition and zeroing of the singular values below a (default) tolerance level.
Here is the difference (for the case of mldivide): What is the difference between PINV and MLDIVIDE when I solve A*x=b?
When the system is overdetermined, both algorithms provide the same answer. When the system is underdetermined, PINV will return the solution x that has the minimum norm (min NORM(x)). MLDIVIDE will pick the solution with the least number of non-zero elements.
In your example:
% solve xb = a
a = [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9];
b = ones(25, 18);
the system is underdetermined, and the two different solutions will be:
x1 = a/b; % MRDIVIDE: sparsest solution (min L0 norm)
x2 = a*pinv(b); % PINV: minimum norm solution (min L2)
>> x1 = a/b
Warning: Rank deficient, rank = 1, tol = 2.3551e-014.
ans =
5.0000 0 0 ... 0
>> x2 = a*pinv(b)
ans =
0.2 0.2 0.2 ... 0.2
In both cases the approximation error of xb-a is non-negligible (non-exact solution) and the same, i.e. norm(x1*b-a) and norm(x2*b-a) will return the same result.
What is MATLAB doing?
A great break-down of the algorithms (and checks on properties) invoked by the '\' operator, depending upon the structure of matrix b is given in this post in scicomp.stackexchange.com. I am assuming similar options apply for the / operator.
For your example, MATLAB is most probably doing Gaussian elimination, giving the sparsest solution amongst an infinitude of solutions (that's where the 5 comes from).
What is Python doing?
Python, in linalg.lstsq, uses the pseudo-inverse/SVD, as demonstrated above (that's why you get a vector of 0.2's). In effect, both of the following will give you the same result as MATLAB's pinv():
from numpy import *
a = array([1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9])
b = ones((25, 18))
# xb = a: solve b.T x.T = a.T instead
x2 = linalg.lstsq(b.T, a.T)[0]
x2 = dot(a, linalg.pinv(b))
TL;DR: A/B = np.linalg.solve(B.conj().T, A.conj().T).conj().T
I did not find that the earlier answers provided a satisfactory substitute, so I dug further into MATLAB's reference documentation for mrdivide and found the solution. I cannot explain the actual mathematics here or take credit for coming up with the answer; I'm just following MATLAB's explanation. Additionally, I wanted to post the actual detail from MATLAB to give credit. If it's a copyright issue, someone tell me and I'll remove the actual text.
%/ Slash or right matrix divide.
% A/B is the matrix division of B into A, which is roughly the
% same as A*INV(B) , except it is computed in a different way.
% More precisely, A/B = (B'\A')'. See MLDIVIDE for details.
%
% C = MRDIVIDE(A,B) is called for the syntax 'A / B' when A or B is an
% object.
%
% See also MLDIVIDE, RDIVIDE, LDIVIDE.
% Copyright 1984-2005 The MathWorks, Inc.
Note that the ' symbol indicates the complex conjugate transpose. In python using numpy, that requires .conj().T chained together.
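A minimal check of that translation for a square, well-conditioned B (for the rank-deficient, rectangular b in this question you would fall back to lstsq or pinv as in the other answers):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# MATLAB's A/B, i.e. solve X*B = A, written as (B'\A')':
X = np.linalg.solve(B.conj().T, A.conj().T).conj().T
print(np.allclose(X @ B, A))   # True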
Per this handy "cheat sheet" of numpy for MATLAB users, the suggested equivalent is linalg.lstsq(b, a); here linalg is numpy.linalg, a lightweight version of the full scipy.linalg.
a/b finds the least-squares solution to the system of linear equations xb = a.
If b is invertible, this is a*inv(b), but if it isn't, then it is the x which minimises norm(xb - a).
You can read more about least squares on wikipedia.
According to the MATLAB documentation, mrdivide will return at most k non-zero values, where k is the computed rank of b. My guess is that MATLAB in your case solves the least-squares problem obtained by replacing b with b(1,:) (which has the same rank). In this case the Moore-Penrose inverse b2 = b(1,:); inv(b2*b2')*b2*a' is defined and gives the same answer.
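A quick numerical check of that guess (purely illustrative): with a single row of ones in place of b, the pseudo-inverse solution collapses to sum(a)/18 = 5, matching the first (and only non-zero) entry MATLAB reports.

import numpy as np

a = np.array([1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9], dtype=float)
b2 = np.ones((1, 18))                      # b(1,:) from the question

x = a @ b2.T @ np.linalg.inv(b2 @ b2.T)    # Moore-Penrose solution for the rank-1 row
print(x)                                   # [5.]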
I'm just working on Project Euler problem 12, so I need to do some testing against numbers that are multiples of over 500 unique factors.
I figured that the array [1, 2, 3... 500] would be a good starting point, since the product of that array is the lowest possible such number. However, numpy.prod() returns zero for this array. I'm sure I'm missing something obvious, but what the hell is it?
>>> import numpy as np
>>> array = []
>>> for i in range(1,100):
... array.append(i)
...
>>> np.prod(array)
0
>>> array.append(501)
>>> np.prod(array)
0
>>> array.append(5320934)
>>> np.prod(array)
0
Note that Python uses "unlimited" integers, but in numpy everything is typed, and so it is a "C"-style (probably 64-bit) integer here. You're probably experiencing an overflow.
If you look at the documentation for numpy.prod, you can see the dtype parameter:
The type of the returned array, as well as of the accumulator in which the elements are multiplied.
There are a few things you can do:
Drop back to Python, and multiply using its "unlimited integers" (see this question for how to do so); a short sketch follows after this list.
Consider whether you actually need to find the product of such huge numbers. Often, when you're working with the product of very small or very large numbers, you switch to sums of logarithms. As @WarrenWeckesser notes, this is obviously imprecise (it's not like exponentiating at the end will give you the exact solution); rather, it's used to gauge whether one product is growing faster than another.
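For the first option, a minimal sketch (math.prod needs Python 3.8+; dtype=object keeps the accumulation in Python integers at the cost of speed):

import math
import numpy as np

values = list(range(1, 100))

print(math.prod(values))                         # exact 99!, pure Python
print(np.prod(np.array(values, dtype=object)))   # also exact: object dtype multiplies Python ints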
Those numbers get very big, fast.
>>> np.prod(array[:25])
7034535277573963776
>>> np.prod(array[:26])
-1569523520172457984
>>> type(_)
numpy.int64
You're actually overflowing numpy's data type here, hence the wack results. If you stick to python ints, you won't have overflow.
>>> import operator
>>> reduce(operator.mul, array, 1)
933262154439441526816992388562667004907159682643816214685929638952175999932299156089414639761565182862536979208272237582511852109168640000000000000000000000L
You get the result 0 due to the large number of factors of 2 in the product; there are more than 450 of those factors. Thus, in a reduction modulo 2^64, the result is zero.
Why the data type forces this reduction is explained in the other answers.
250+125+62+31+15+7+3+1 = 494 is the multiplicity of 2 in 500!
Added 12/2020: or, reading the question and its code more closely,
49+24+12+6+3+1 = 95 as the multiplicity of 2 in 99!
which is the product of the first part of your list. That is still enough binary zeros at the end of the number to fill all the bit positions of a 64-bit integer. Just to compare, you get
19+3 = 22 factors of 5 in 99!
which is also the number of trailing zeros in the decimal expression of this factorial.
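Those multiplicities follow Legendre's formula; here is a small helper reproducing the hand computations above:

def multiplicity(p, n):
    """Exponent of the prime p in n! (Legendre's formula)."""
    count, q = 0, p
    while q <= n:
        count += n // q
        q *= p
    return count

print(multiplicity(2, 500))  # 494
print(multiplicity(2, 99))   # 95
print(multiplicity(5, 99))   # 22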
More specifically, in numpy:
In [24]: a=np.random.RandomState(4)
In [25]: a.rand()
Out[25]: 0.9670298390136767
In [26]: a.get_state()
Out[26]: ('MT19937', array([1248735455, ..., 1532921051], dtype=uint32), 2, 0, 0.0)
octave:
octave:17> rand('state',4)
octave:18> rand()
ans = 0.23605
octave:19> rand('seed',4)
octave:20> rand()
ans = 0.12852
Octave claims to perform the same algorithm (Mersenne Twister with a period of 2^19937 - 1).
Anybody know why the difference?
Unfortunately the MT19937 generator in Octave does not allow you to initialise it using a single 32-bit integer the way np.random.RandomState(4) does. If you use rand("seed",4), this actually switches to an earlier PRNG used previously in Octave, which is not MT19937 at all, but rather the Fortran RANDLIB generator.
It is possible to get the same numbers in NumPy and Octave, but you have to hack around the random seed generation algorithm in Octave and write your own function to construct the state vector out of the initial 32-bit integer seed. I am not an Octave guru, but with several Internet searches on bit manipulation functions and integer classes in Octave/Matlab I was able to write the following crude script to implement the seeding:
function state = mtstate(seed)
    state = uint32(zeros(625,1));
    state(1) = uint32(seed);
    for i = 1:623,
        tmp = uint64(1812433253)*uint64(bitxor(state(i),bitshift(state(i),-30)))+i;
        state(i+1) = uint32(bitand(tmp,uint64(intmax('uint32'))));
    end
    state(625) = 1;
Use it like this:
octave:9> rand('state',mtstate(4));
octave:10> rand(1,5)
ans =
0.96703 0.54723 0.97268 0.71482 0.69773
Just for comparison with NumPy:
>>> a = numpy.random.RandomState(4)
>>> a.rand(5)
array([ 0.96702984, 0.54723225, 0.97268436, 0.71481599, 0.69772882])
The numbers (or at least the first five of them) match.
Note that the default random number generator in Python, provided by the random module, is also MT19937, but it uses a different seeding algorithm, so random.seed(4) produces a completely different state vector and hence the PRN sequence is then different.
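A quick illustration of that last point (the NumPy value is the one shown in the question; the stdlib value will simply be something different):

import random
import numpy as np

random.seed(4)
print(random.random())                   # stdlib MT19937, but different seeding -> different value
print(np.random.RandomState(4).rand())   # 0.9670298390136767, as in the question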
If you look at the details of the Mersenne Twister algorithm, there are lots of parameters that affect the actual numbers produced. I don't think Python and Octave are trying to produce the same sequence of numbers.
It looks like numpy is returning the raw random integers, whereas octave is normalising them to floats between 0 and 1.0.