I have a text file containing the values of a matrix, but only half of them, like this:
1. 1. 0.01
2. 1. 0.052145
2. 2. 0.045
3. 1. 0.054521
3. 2. 0.05424
3. 3. 0.05459898
The first two columns give the (x, y) position in the matrix, and the last one is the value at that position. The first two values are 1-based, so the actual array indices would be value-1.
I made a function that reads the file and mirrors these values to a full matrix:
def expand_mirror_matrix(matrix_path='data.txt'):
    data = np.loadtxt(matrix_path)
    shape = (int(data[-1][0]), int(data[-1][1]))
    m = np.zeros(shape)
    for d in data:
        x, y, z = int(d[0]), int(d[1]), d[2]
        m[x-1,y-1] = z
        m[shape[0]-x,shape[1]-y]=z
    return m
But it has some unnecessary iterations, like the first and the last, and the iteration that rewrites the value at the center of the matrix.
Is there a way to optimize it? The file actually has thousands of lines, so it would be great to reduce the execution time of this loop.
I believe this does what you want, at least without the mirroring:
import numpy as np

def expand_mirror_matrix(matrix_path='data.txt'):
    data = np.loadtxt(matrix_path)
    shape = (int(data[-1][0]), int(data[-1][1]))
    xs = data[:,0].astype(int) - 1 # Numpy uses zero-based indexing.
    ys = data[:,1].astype(int) - 1
    m = np.zeros(shape)
    m[(xs, ys)] = data[:,2]
    return m
For your example file above this returns:
array([[0.01 , 0. , 0. ],
[0.052145 , 0.045 , 0. ],
[0.054521 , 0.05424 , 0.05459898]])
If you wish to mirror it you probably want to edit the above function with the following:
m[(xs, ys)] = data[:,2]
m[(ys, xs)] = data[:,2] # Mirrored.
The result of that is:
array([[0.01 , 0.052145 , 0.054521 ],
[0.052145 , 0.045 , 0.05424 ],
[0.054521 , 0.05424 , 0.05459898]])
Note that this assumes the matrix is square.
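As a self-contained check of the indexing approach above, here is a minimal sketch. It loads the example rows from an in-memory string instead of data.txt and sizes the matrix from the largest index rather than from the last row. The `io.StringIO` stand-in and the name `mirror_from_triples` are assumptions made for illustration, not part of the original function.

```python
import io

import numpy as np

# The example rows from the question, held in memory instead of data.txt.
EXAMPLE = """\
1. 1. 0.01
2. 1. 0.052145
2. 2. 0.045
3. 1. 0.054521
3. 2. 0.05424
3. 3. 0.05459898
"""


def mirror_from_triples(source):
    """Build a symmetric matrix from rows of (row, col, value), 1-based."""
    data = np.loadtxt(source)
    xs = data[:, 0].astype(int) - 1  # convert to zero-based indices
    ys = data[:, 1].astype(int) - 1
    n = max(xs.max(), ys.max()) + 1  # assumes the matrix is square
    m = np.zeros((n, n))
    m[xs, ys] = data[:, 2]
    m[ys, xs] = data[:, 2]  # mirrored across the diagonal
    return m


m = mirror_from_triples(io.StringIO(EXAMPLE))
print(m)
```

Both assignments are single vectorized operations, so no Python-level loop runs over the rows of the file.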
I have to scale a matrix to [0, 1]. For each element of the matrix I have to apply this formula:
(Element - min_cols) / (max_cols - min_cols)
min_cols -> array with the minimum of each column of the matrix. max_cols -> the same, but with the maximum.
My problem is that I want to calculate the result with this:
result = (Element - min_cols) / (max_cols - min_cols)
In other words, for each element of the matrix I have to take the difference between that element and the minimum of its column, and divide it by the difference between the maximum and the minimum of that column.
But when, for example, the value from min_cols is negative and the value from max_cols is also negative, the result is the sum of both.
I want to specify that the matrix is: _mat = np.random.randn(1000, 1000) * 50
Use numpy
Example
import numpy as np
x = 50*np.random.rand(6,4)
array([[26.7041017 , 46.88118463, 41.24541748, 31.17881807],
[47.57036124, 16.49040094, 6.62454156, 37.15976348],
[46.7157895 , 8.53357717, 39.01399714, 5.14287858],
[24.36012016, 5.67603151, 40.7697121 , 13.09877845],
[21.69045322, 12.61989002, 8.74692768, 46.23368735],
[ 3.9058066 , 35.50845507, 4.66785679, 2.34177134]])
Apply your formula
np.divide(np.subtract(x, x.min(axis=0)), x.max(axis=0)-x.min(axis=0))
array([[0.52212361, 1. , 1. , 0.65700132],
[1. , 0.26245187, 0.05349413, 0.79326663],
[0.98042871, 0.06934923, 0.93899483, 0.06381829],
[0.46844205, 0. , 0.98699461, 0.24507946],
[0.40730168, 0.16851918, 0.1115184 , 1. ],
[0. , 0.7239974 , 0. , 0. ]])
The max value of each column is mapped to 1, the min value of each column is mapped to 0, and the intermediate values are linearly mapped between 0 and 1.
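Since the question is about negative minima and maxima, here is a minimal sketch of the same formula applied to data that contains negative values. The function name `scale_columns` and the seeded generator are assumptions for illustration, and the sketch assumes no column is constant (otherwise the denominator would be zero).

```python
import numpy as np


def scale_columns(mat):
    """Min-max scale each column of a 2D array to the range [0, 1]."""
    mat = np.asarray(mat, dtype=float)
    min_cols = mat.min(axis=0)  # one minimum per column
    max_cols = mat.max(axis=0)  # one maximum per column
    # Broadcasting applies the per-column values to every row.
    return (mat - min_cols) / (max_cols - min_cols)


# A column whose minimum and maximum are both negative.
small = scale_columns(np.array([[-10.0], [-6.0], [-2.0]]))
print(small.ravel())

# A matrix like the one in the question (seeded here for repeatability).
rng = np.random.default_rng(0)
scaled = scale_columns(rng.standard_normal((1000, 1000)) * 50)
print(scaled.min(), scaled.max())
```

Subtracting a negative minimum adds its magnitude, which is what the formula intends: for the column above, max_cols - min_cols is -2 - (-10) = 8, the width of the column's range.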
I am calculating the difference of each element in a numpy array. My code is
import numpy as np
M = 10
x = np.random.uniform(0,1,M)
y = np.array([x])
# Calculate the difference
z = np.array(y[:,None]-y)
When I run my code I get [[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]. I don't get a 10 by 10 array.
Where do I go wrong?
You should read the broadcasting rules for numpy
y.T - x
Another way:
np.subtract.outer(x, x)
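To make the shapes explicit, here is a minimal sketch of the broadcasting involved. In the question, y has shape (1, 10), so y[:,None] has shape (1, 1, 10) and the subtraction pairs each element with itself, which gives zeros. Starting from the 1-D array x instead:

```python
import numpy as np

M = 10
x = np.random.uniform(0, 1, M)  # shape (M,)

# x[:, None] has shape (M, 1); subtracting x of shape (M,) broadcasts
# the pair to (M, M), so z[i, j] == x[i] - x[j].
z = x[:, None] - x

print(z.shape)
```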
You are not getting 10 by 10 array because value of M is 10. Try:
M = (10,10)
I searched stackoverflow but could not find an answer to this specific question. Sorry if it is a naive question, I am a newbie to python.
I have several 2d arrays (or lists) that I would like to read into a 3d array (list) in python. In Matlab, I can simply do
for i=1:N
    # read 2d array "a"
    newarray(:,:,i)=a(:,:)
end
so newarray is a 3d array with "a" being the 2d slices arranged along the 3rd dimension.
Is there a simple way to do this in python?
Edit: I am currently trying the following:
for file in files:
    img=mpimg.imread(file)
    newarray=np.array(0.289*cropimg[:,:,0]+0.5870*cropimg[:,:,1]+0.1140*cropimg[:,:,2])
    i=i+1
I tried newarray[:,:,i] and it gives me an error
NameError: name 'newarray' is not defined
Seems like I have to define newarray as a numpy array? Not sure.
Thanks!
If you're familiar with MATLAB, translating that into using NumPy is fairly straightforward.
Let's say you have a couple of arrays
a = np.eye(3)
b = np.arange(9).reshape((3, 3))
print(a)
# [[ 1. 0. 0.]
# [ 0. 1. 0.]
# [ 0. 0. 1.]]
print(b)
# [[0 1 2]
# [3 4 5]
# [6 7 8]]
If you simply want to put them into another dimension, pass them both to the array constructor in an iterable (e.g. a list) like so:
x = np.array([a, b])
print(x)
# [[[ 1. 0. 0.]
# [ 0. 1. 0.]
# [ 0. 0. 1.]]
#
# [[ 0. 1. 2.]
# [ 3. 4. 5.]
# [ 6. 7. 8.]]]
Numpy is smart enough to recognize the arrays are all the same size and creates a new dimension to hold it all.
print(x.shape)
# (2, 3, 3)
You can loop through it, but if you want to apply the same operations to it across some dimensions, I would strongly suggest you use broadcasting so that NumPy can vectorize the operation and it runs a whole lot faster.
For example, along one dimension, let's multiply one slice by 2 and another by 3. (If the multiplier is not a pure scalar, we need to reshape it to the same number of dimensions as the array in order to broadcast; the size along each axis then has to either match the array or be 1.) Note that I'm working along the 0th axis; your image is probably different. I don't have a handy image to load up and toy with.
y = x * np.array([2, 3]).reshape((2, 1, 1))
print(y)
#[[[ 2. 0. 0.]
# [ 0. 2. 0.]
# [ 0. 0. 2.]]
#
# [[ 0. 3. 6.]
# [ 9. 12. 15.]
# [ 18. 21. 24.]]]
Then we can add them up
z = np.sum(y, axis=0)
print(z)
#[[ 2. 3. 6.]
# [ 9. 14. 15.]
# [ 18. 21. 26.]]
If you're using NumPy arrays, you can translate almost directly from Matlab:
for i in range(1, N+1):
    # read 2d array "a"
    newarray[:, :, i] = a[:, :]
Of course you'd probably want to use range(N), because arrays use 0-based indexing. And obviously you're going to need to pre-create newarray in some way, just as you'd have to in Matlab, but you can translate that pretty directly too. (Look up the zeros function if you're not sure how.)
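A minimal sketch of that pre-allocation, assuming every 2D slice has the same known shape. The helper `read_slice` is a hypothetical stand-in for whatever actually reads each array, and the sizes are made up for the example.

```python
import numpy as np

N = 4               # number of 2D slices (assumed for the example)
rows, cols = 3, 5   # shape of each slice (assumed for the example)


def read_slice(i):
    """Hypothetical stand-in for reading the i-th 2D array from a file."""
    return np.full((rows, cols), float(i))


# Pre-create the 3D array, then fill one slice per iteration (0-based).
newarray = np.zeros((rows, cols, N))
for i in range(N):
    a = read_slice(i)
    newarray[:, :, i] = a

# The same result without an explicit pre-allocation.
stacked = np.stack([read_slice(i) for i in range(N)], axis=2)

print(newarray.shape)
```

`np.stack` with `axis=2` places each 2D array along the third dimension, which matches the MATLAB layout `newarray(:,:,i)`.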
If you're using lists, you can't do this directly—but you probably don't want to anyway. A better solution would be to build up a list of 2D lists on the fly:
newarray = []
for i in range(N):
    # read 2d list of lists "a"
    newarray.append(a)
Or, more simply:
newarray = [read_next_2d_list_of_lists() for i in range(N)]
Or, even better, make that read function a generator, then just:
newarray = list(read_next_2d_list_of_lists())
If you want to transpose the order of the axes, you can use the zip function for that.
When you know the number of dimensions of your lattice ahead of time, it is straight-forward to use meshgrid to evaluate a function over a mesh.
from pylab import *
lattice_points = linspace(0,3,4)
xs,ys = meshgrid(lattice_points,lattice_points)
zs = xs+ys # <- stand-in function, to be replaced by something more interesting
print(zs)
Produces
[[ 0. 1. 2. 3.]
[ 1. 2. 3. 4.]
[ 2. 3. 4. 5.]
[ 3. 4. 5. 6.]]
But I would like to have a version of something similar, for which the number of dimensions is determined during runtime, or is passed as a parameter.
from pylab import *
#np.vectorize
def fn(listOfVars) :
    return sum(listOfVars) # <- stand-in function, to be replaced
                           # by something more interesting
n_vars = 2
lattice_points = linspace(0,3,4)
indices = meshgrid(*(n_vars*[lattice_points])) # this works fine
zs = fn(indices) # <-- this line is wrong, but I don't
# know what would work instead
print(zs)
Produces
[[[ 0. 1. 2. 3.]
[ 0. 1. 2. 3.]
[ 0. 1. 2. 3.]
[ 0. 1. 2. 3.]]
[[ 0. 0. 0. 0.]
[ 1. 1. 1. 1.]
[ 2. 2. 2. 2.]
[ 3. 3. 3. 3.]]]
But I want it to produce the same result as above.
There is probably a solution where you can find the indices of each dimension and use itertools.product to generate all of the possible combinations of indices etc. etc., but is there not a nice pythonic way of doing this?
Joe Kington and user2357112 have helped me to see the error in my ways. For those of you that would like to see a complete solution:
from pylab import *
## 2D "preknown case" (for testing / to compare output)
lattice_points = linspace(0,3,4)
xs,ys = meshgrid(lattice_points,lattice_points)
zs = xs+ys
print('2-D Case')
print(zs)
## 3D "preknown case" (for testing / to compare output)
lattice_points = linspace(0,3,4)
ws,xs,ys = meshgrid(lattice_points,lattice_points,lattice_points)
zs = ws+xs+ys
print('3-D Case')
print(zs)
## Solution, thanks to comments from Joe Kington and user2357112
def fn(listOfVars) :
    return sum(listOfVars)
n_vars = 3 ## can change to 2 or 3 to compare to example cases above
lattice_points = linspace(0,3,4)
indices = meshgrid(*(n_vars*[lattice_points]))
zs = np.apply_along_axis(fn,0,indices)
print('adaptable n-D Case')
print(zs)
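An alternative sketch that avoids `from pylab import *` and calls the function once on the whole stacked grid instead of once per lattice point. The names `evaluate_on_lattice` and `total` are assumptions for illustration, and the sketch assumes the function can reduce over axis 0 of an array of shape (n_vars, ...).

```python
import numpy as np


def evaluate_on_lattice(fn, lattice_points, n_vars):
    """Evaluate fn over an n_vars-dimensional lattice.

    fn receives an array of shape (n_vars, len, ..., len) and is
    expected to reduce over axis 0.
    """
    grids = np.meshgrid(*([lattice_points] * n_vars), indexing="ij")
    return fn(np.stack(grids))


def total(coords):
    # Stand-in function: the sum of the coordinates at each lattice point.
    return coords.sum(axis=0)


lattice_points = np.linspace(0, 3, 4)
zs = evaluate_on_lattice(total, lattice_points, n_vars=2)
print(zs)
```

Because the stand-in function is symmetric in its arguments, the choice of `indexing="ij"` does not change the result here; for a non-symmetric function the axis order would matter.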
I am trying to replicate the results from a paper.
"Two-dimensional Fourier Transform (2D-FT) in space and time along sections of constant latitude (east-west) and longitude (north-south) were used to characterize the spectrum of the simulated flux variability south of 40degS." - Lenton et al(2006)
The figures published show "the log of the variance of the 2D-FT".
I have tried to create an array consisting of the seasonal cycle of similar data as well as the noise. I have defined the noise as the original array minus the signal array.
Here is the code that I used to plot the 2D-FT of the signal array averaged in latitude:
import numpy as np
from numpy import ma
from matplotlib import pyplot as plt
from Scientific.IO.NetCDF import NetCDFFile
### input directory
indir = '/home/nicholas/data/'
### get the flux data which is in
### [time(5day ave for 10 years),latitude,longitude]
nc = NetCDFFile(indir + 'CFLX_2000_2009.nc','r')
cflux_southern_ocean = nc.variables['Cflx'][:,10:50,:]
cflux_southern_ocean = ma.masked_values(cflux_southern_ocean,1e+20) # mask land
nc.close()
cflux = cflux_southern_ocean*1e08 # change units of data from mmol/m^2/s
### create an array that consists of the seasonal signal for each pixel
year_stack = np.split(cflux, 10, axis=0)
year_stack = np.array(year_stack)
signal_array = np.tile(np.mean(year_stack, axis=0), (10, 1, 1))
signal_array = ma.masked_where(signal_array > 1e20, signal_array) # need to mask
### average the array over latitude(or longitude)
signal_time_lon = ma.mean(signal_array, axis=1)
### do a 2D Fourier Transform of the time/space image
ft = np.fft.fft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log(mgft)
log_mgft= np.log(mgft)
Every second row of the ft consists completely of zeros. Why is this?
Would it be acceptable to add a small random number to the signal to avoid this?
signal_time_lon = signal_time_lon + np.random.randint(0,9,size=(730, 182))*1e-05
EDIT: Adding images and clarify meaning
The output of rfft2 still appears to be a complex array. Using fftshift shifts the edges of the image to the centre; I still have a power spectrum regardless. I expect that the reason that I get rows of zeros is that I have re-created the timeseries for each pixel. The ft[0, 0] pixel contains the mean of the signal. So the ft[1, 0] corresponds to a sinusoid with one cycle over the entire signal in the rows of the starting image.
Here is the starting image, produced with the following code:
plt.pcolormesh(signal_time_lon); plt.colorbar(); plt.axis('tight')
Here is the result, produced with the following code:
ft = np.fft.rfft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log1p(mgft)
plt.pcolormesh(log_ps); plt.colorbar(); plt.axis('tight')
It may not be clear in the image, but it is only every second row that consists entirely of zeros. Every tenth pixel (log_ps[10, 0]) is a high value. The other pixels (log_ps[2, 0], log_ps[4, 0] etc) have very low values.
Consider the following example:
In [59]: from scipy import absolute, fft
In [60]: absolute(fft([1,2,3,4]))
Out[60]: array([ 10. , 2.82842712, 2. , 2.82842712])
In [61]: absolute(fft([1,2,3,4, 1,2,3,4]))
Out[61]:
array([ 20. , 0. , 5.65685425, 0. ,
4. , 0. , 5.65685425, 0. ])
In [62]: absolute(fft([1,2,3,4, 1,2,3,4, 1,2,3,4]))
Out[62]:
array([ 30. , 0. , 0. , 8.48528137,
0. , 0. , 6. , 0. ,
0. , 8.48528137, 0. , 0. ])
If X = fft(x) and Y = fft([x x]), where x has length N, then Y[2k] = 2*X[k] for k in {0, 1, ..., N-1}, and Y is zero at all odd indices.
Therefore, I would look into how your signal_time_lon is being tiled. That may be where the problem lies.
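The same relation holds for any number of repetitions, and it can be checked numerically with a minimal sketch. The random test signal and the variable names are assumptions for illustration; `np.tile` stands in for however the signal was repeated.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 8, 3                      # signal length and number of repetitions
x = rng.standard_normal(N)

X = np.fft.fft(x)
Y = np.fft.fft(np.tile(x, r))    # spectrum of x repeated r times

# Every r-th bin of Y carries r times the corresponding bin of X ...
print(np.allclose(Y[::r], r * X))

# ... and all the other bins are zero up to rounding error.
other_bins = np.ones(r * N, dtype=bool)
other_bins[::r] = False
print(np.allclose(Y[other_bins], 0))
```

So a signal built by tiling one period r times along an axis has non-zero Fourier coefficients only at every r-th frequency along that axis.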