I have a series of maps with two different indices, i and j. Let this be indexed like map_series[i][j].
EDIT 1/21: A minimal working example would be something like
map_series=np.array([np.array([np.arange(12) + 0.1*(i+1) + 0.01*(j+1) for j in range(3)]) for i in range(5)])
I'd like to apply the same mask to each; if map_series is one-dimensional, these each work.
I can imagine a few different ways of applying these maps:
(A) Applying the mask to the whole array:
map_series_ma = hp.ma(map_series)
map_series_ma.mask = predefined_mask
(B1) Applying the mask to each element of the array:
map_series_ma = np.zeros_like(map_series)
for i in range(len(map_series)):
for j in range(len(map_series[0])):
temp = hp.ma(map_series[i][j])
temp.mask = predefined_mask
map_series_ma[i][j] = temp
(B2) Applying the mask to each element of the array:
map_series_ma = np.zeros_like(map_series)
for i in range(len(map_series)):
for j in range(len(map_series[0])):
map_series_ma[i][j] = hp.ma(map_series[i][j])
map_series_ma[i][j].mask = predefined_mask
(C) Pythonically enumerating the list:
map_series_ma = np.array([hp.ma(map_series[i][j]) for j in range(j_max) for i in range(i_max)])
map_series_ma.mask = predetermined_mask
All of these fail to give my desired output, however.
Upon trying (A) or (C) I get an error after the first step, telling me TypeError: bad number of pixels.
Upon trying (B1) I don't get an error, but I also none of the elements of the maps_series_ma have masks; in fact, they do not even appear to be hp.ma objects. Oddly enough, though: when I return temp it does have the appropriate mask.
Upon trying (B2) I get the error
AttributeError: 'numpy.ndarray' object has no attribute 'mask' (which, after looking at my syntax, I totally understand!)
I'm a little confused how to go about this. Both (A) and (B1) seem acceptable to me...
Any help is much appreciated,
Thanks,
Sam
this works for me:
import numpy as np
import healpy as hp
map_series=np.array([np.array([np.arange(12) + 0.1*(i+1) + 0.01*(j+1) for j in range(3)]) for i in range(5)])
map_series_ma = map(lambda x: hp.ma(x), map_series)
pm=[True, True,True,True,True,True,False,False,False,False,False,False]
for m in map_series_ma:
for mm in m:
mm.mask=pm
Related
I am using numpy arrays aside from pandas for speed purposes. However, I am unable to advance my codes using broadcasting, indexing etc. Instead, I am using loop in loops as below. It is working but seems so ugly and inefficient to me.
Basically what I am doing is, I am trying to imitate groupby of pandas at the step mydata[mydata[:,1]==i]. You may consider it as a firm id number. Then with respect to the lookup data, I am checking if it is inside the selected firm or not at the step all(np.isin(lookup[u],d[:,3])). But as I denoted at the beginning, I feel so uncomfortable about this.
out = []
for i in np.unique(mydata[:,1]):
d = mydata[mydata[:,1]==i]
for u in range(0,len(lookup)):
control = all(np.isin(lookup[u],d[:,3]))
if(control):
out.append(d[np.isin(d[:,3],lookup[u])])
It takes about 0.27 seconds. However there must exist some clever alternatives.
I also tried Numba jit() but it does not work.
Could anyone help me about that?
Thanks in advance!
Fake Data:
a = np.repeat(np.arange(100)+5000, np.random.randint(50, 100, 100))
b = np.random.randint(100,200,len(a))
c = np.random.randint(10,70,len(a))
index = np.arange(len(a))
mydata = np.vstack((index,a, b,c)).T
lookup = []
for i in range(0,60):
lookup.append(np.random.randint(10,70,np.random.randint(3,6,1) ))
I had some problems getting the goal of your Program, but I got a decent performance improvement, by refactoring your second for loop. I was able to compress your code to 3 or 4 lines.
f = (
lambda lookup: out1.append(d[np.isin(d[:, 3], lookup)])
if all(np.isin(lookup, d[:, 3]))
else None
)
out = []
for i in np.unique(mydata[:, 1]):
d = mydata[mydata[:, 1] == i]
list(map(f, lookups))
This resolves to the same output list you received previously and the code runs almost twice as quick (at least on my machine).
I'm trying to translate this line of code from Python to MATLAB:
new_img[M[0, :] - corners[0][0], M[1, :] - corners[1][0], :] = img[T[0, :], T[1, :], :]
So, naturally, I wrote something like this:
new_img(M(1,:)-corners(2,1),M(2,:)-corners(2,2),:) = img(T(1,:),T(2,:),:);
But it gives me the following error when it reaches that line:
Requested 106275x106275x3 (252.4GB) array exceeds maximum array size
preference. Creation of arrays greater than this limit may take a long
time and cause MATLAB to become unresponsive. See array size limit or
preference panel for more information.
This has made me believe that it is not assigning things correctly. Img is at most a 1000 × 1500 RGB image. The same code works in less than 5 seconds in Python. How can I do vector assignment like the code in the first line in MATLAB?
By the way, I didn't paste all lines of my code for this post not to get too long. If I need to add anything else, please let me know.
Edit:
Here's an explanation of what I want my code to do (basically, this is what the Python code does):
Consider this line of code. It's not a real MATLAB code, I'm just trying to explain what I want to do:
A([2 3 5], [1 3 5]) = B([1 2 3], [2 4 6])
It is interpreted like this:
A(2,1) = B(1,2)
A(3,1) = B(2,2)
A(5,1) = B(3,2)
A(2,3) = B(1,4)
A(3,3) = B(2,4)
A(5,3) = B(3,4)
...
...
...
Instead, I want it to be interpreted like this:
A(2,1) = B(1,2)
A(3,3) = B(2,4)
A(5,5) = B(3,6)
When you do A[vector1, vector2] in Python, you index the set:
A[vector1[0], vector2[0]]
A[vector1[1], vector2[1]]
A[vector1[2], vector2[2]]
A[vector1[3], vector2[3]]
...
In MATLAB, the similar-looking A(vector1, vector2) instead indexes the set:
A(vector1(1), vector2(1))
A(vector1(1), vector2(2))
A(vector1(1), vector2(3))
A(vector1(1), vector2(4))
...
A(vector1(2), vector2(1))
A(vector1(2), vector2(2))
A(vector1(2), vector2(3))
A(vector1(2), vector2(4))
...
That is, you get each combination of indices. You should think of it as a sub-array composed of the rows and columns specified in the two vectors.
To accomplish the same as the Python code, you need to use linear indexing:
index = sub2ind(size(A), vector1, vector2);
A(index)
Thus, your MATLAB code should do:
index1 = sub2ind(size(new_img), M(1,:)-corners(2,1), M(2,:)-corners(2,2));
index2 = sub2ind(size(img), T(1,:), T(2,:));
% these indices are for first 2 dims only, need to index in 3rd dim also:
offset1 = size(new_img,1) * size(new_img,2);
offset2 = size(img,1) * size(img,2);
index1 = index1.' + offset1 * (0:size(new_img,3)-1);
index2 = index2.' + offset2 * (0:size(new_img,3)-1);
new_img(index1) = img(index2);
What the middle block does here is add linear indexes for the same elements along the 3rd dimension. If ii is the linear index to an element in the first channel, then ii + offset1 is an index to the same element in the second channel, and ii + 2*offset1 is an index to the same element in the third channel, etc. So here we're generating indices to all those matrix elements. The + operation is doing implicit singleton expansion (what they call "broadcasting" in Python). If you have an older version of MATLAB this will fail, you need to replace that A+B with bsxfun(#plus,A,B).
I am working on a problem which involves a batch of 19 tokens each with 400 features. I get the shape (19,1,400) when concatenating two vectors of size (1, 200) into the final feature vector. If I squeeze the 1 out I am left with (19,) but I am trying to get (19,400). I have tried converting to list, squeezing and raveling but nothing has worked.
Is there a way to convert this array to the correct shape?
def attn_output_concat(sample):
out_h, state_h = get_output_and_state_history(agent.model, sample)
attns = get_attentions(state_h)
inner_outputs = get_inner_outputs(state_h)
if len(attns) != len(inner_outputs):
print 'Length err'
else:
tokens = [np.zeros((400))] * largest
print(tokens.shape)
for j, (attns_token, inner_token) in enumerate(zip(attns, inner_outputs)):
tokens[j] = np.concatenate([attns_token, inner_token], axis=1)
print(np.array(tokens).shape)
return tokens
The easiest way would be to declare tokens to be a numpy.shape=(19,400) array to start with. That's also more memory/time efficient. Here's the relevant portion of your code revised...
import numpy as np
attns_token = np.zeros(shape=(1,200))
inner_token = np.zeros(shape=(1,200))
largest = 19
tokens = np.zeros(shape=(largest,400))
for j in range(largest):
tokens[j] = np.concatenate([attns_token, inner_token], axis=1)
print(tokens.shape)
BTW... It makes it difficult for people to help you if you don't include a self-contained and runnable segment of code (which is probably why you haven't gotten a response on this yet). Something like the above snippet is preferred and will help you get better answers because there's less guessing at what your trying to accomplish.
Following is the numpy array I have. I need to create a matrix containing zeors for an instance like np.zeroes([1,1]).
newEdges =
array([['0', 'Firm'],
['1', 'Firm'],
['2', 'Firm'],
...,
['binA', 'year2017_bin'],
['binA', 'year2017_bin'],
['binA', 'year2017_bin']],
dtype='<U21')
newEdges.shape
#(63673218, 2)
newEdges.size
#127346436
However, based on the size of my matrix (as you can see above, that is, (63673218, 2)), if I run syntax to generate the zeroes matrix I get a Memory Error.
He is full syntax:
print(newEdges)
unique_Bin = np.unique(newEdges[:,0])
n_unique_Bin = len(unique_Bin)
unique_Bin
n_unique_Bin
#3351248
Q = np.zeros([n_unique_Bin,n_unique_Bin])
--------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-16-581dfaca2eab> in <module>()
----> 1 Q = np.zeros([n_unique_Bin,n_unique_Bin])
MemoryError:
How do I resolve this error? Or, how would I safely convert this huge matrix to a sparse matrix for further calculation done below:
for n, employer_employee in enumerate(newEdges):
#print(employer_employee)
#copy the array for the original o be intact
eee = np.copy(newEdges)
#sustitue the current tuple with a empty one to avoid self comparing
eee[n] = (None,None)
#get the index for the current employee, the one on the y axis
employee_index = np.where(employer_employee[0] != unique_Bin)
#get the indexes where the the employees letter match
eq_index = np.where(eee[:,1] == employer_employee[1])[0]
eq_employee = eee[eq_index,0]
#add at the final array Q by index
for emp in eq_employee:
#print(np.unique(emp))
emp_index = np.where(unique_Bin == emp)
#print(emp)
Q[employee_index,emp_index]+= 1
# print(Q)
print(Q)
I have 24GB left in the memory for this calculation.
Just to point this out you are trying to create an array which is 3,351,248 x 3,351,248 in size. This is 11,230,863,157,504 entries! 11.2 trillion entries! The fact that you tried to print this makes me think you hadn't realised how big it was. I don't think you will be able to do this without sparse matrices. First of all you should probably make sure you need to do this and see if there is some other way.
Otherwise you can create a sparse matrix using scipy
import numpy as np
import scipy
Q = scipy.sparse.csr_matrix((n_unique_Bin,n_unique_Bin), dtype = np.int8)
Then go from there.
This question may be a little specialist, but hopefully someone might be able to help. I normally use IDL, but for developing a pipeline I'm looking to use python to improve running times.
My fits file handling setup is as follows:
import numpy as numpy
from astropy.io import fits
#Directory: /Users/UCL_Astronomy/Documents/UCL/PHASG199/M33_UVOT_sum/UVOTIMSUM/M33_sum_epoch1_um2_norm.img
with fits.open('...') as ima_norm_um2:
#Open UVOTIMSUM file once and close it after extracting the relevant values:
ima_norm_um2_hdr = ima_norm_um2[0].header
ima_norm_um2_data = ima_norm_um2[0].data
#Individual dimensions for number of x pixels and number of y pixels:
nxpix_um2_ext1 = ima_norm_um2_hdr['NAXIS1']
nypix_um2_ext1 = ima_norm_um2_hdr['NAXIS2']
#Compute the size of the images (you can also do this manually rather than calling these keywords from the header):
#Call the header and data from the UVOTIMSUM file with the relevant keyword extensions:
corrfact_um2_ext1 = numpy.zeros((ima_norm_um2_hdr['NAXIS2'], ima_norm_um2_hdr['NAXIS1']))
coincorr_um2_ext1 = numpy.zeros((ima_norm_um2_hdr['NAXIS2'], ima_norm_um2_hdr['NAXIS1']))
#Check that the dimensions are all the same:
print(corrfact_um2_ext1.shape)
print(coincorr_um2_ext1.shape)
print(ima_norm_um2_data.shape)
# Make a new image file to save the correction factors:
hdu_corrfact = fits.PrimaryHDU(corrfact_um2_ext1, header=ima_norm_um2_hdr)
fits.HDUList([hdu_corrfact]).writeto('.../M33_sum_epoch1_um2_corrfact.img')
# Make a new image file to save the corrected image to:
hdu_coincorr = fits.PrimaryHDU(coincorr_um2_ext1, header=ima_norm_um2_hdr)
fits.HDUList([hdu_coincorr]).writeto('.../M33_sum_epoch1_um2_coincorr.img')
I'm looking to then apply the following corrections:
# Define the variables from Poole et al. (2008) "Photometric calibration of the Swift ultraviolet/optical telescope":
alpha = 0.9842000
ft = 0.0110329
a1 = 0.0658568
a2 = -0.0907142
a3 = 0.0285951
a4 = 0.0308063
for i in range(nxpix_um2_ext1 - 1): #do begin
for j in range(nypix_um2_ext1 - 1): #do begin
if (numpy.less_equal(i, 4) | numpy.greater_equal(i, nxpix_um2_ext1-4) | numpy.less_equal(j, 4) | numpy.greater_equal(j, nxpix_um2_ext1-4)): #then begin
#UVM2
corrfact_um2_ext1[i,j] == 0
coincorr_um2_ext1[i,j] == 0
else:
xpixmin = i-4
xpixmax = i+4
ypixmin = j-4
ypixmax = j+4
#UVM2
ima_UVM2sum = total(ima_norm_um2[xpixmin:xpixmax,ypixmin:ypixmax])
xvec_UVM2 = ft*ima_UVM2sum
fxvec_UVM2 = 1 + (a1*xvec_UVM2) + (a2*xvec_UVM2*xvec_UVM2) + (a3*xvec_UVM2*xvec_UVM2*xvec_UVM2) + (a4*xvec_UVM2*xvec_UVM2*xvec_UVM2*xvec_UVM2)
Ctheory_UVM2 = - alog(1-(alpha*ima_UVM2sum*ft))/(alpha*ft)
corrfact_um2_ext1[i,j] = Ctheory_UVM2*(fxvec_UVM2/ima_UVM2sum)
coincorr_um2_ext1[i,j] = corrfact_um2_ext1[i,j]*ima_sk_um2[i,j]
The above snippet is where it is messing up, as I have a mixture of IDL syntax and python syntax. I'm just not sure how to convert certain aspects of IDL to python. For example, the ima_UVM2sum = total(ima_norm_um2[xpixmin:xpixmax,ypixmin:ypixmax]) I'm not quite sure how to handle.
I'm also missing the part where it will update the correction factor and coincidence correction image files, I would say. If anyone could have the patience to go over it with a fine tooth comb and suggest the neccessary changes I need that would be excellent.
The original normalised image can be downloaded here: Replace ... in above code with this file
One very important thing about numpy is that it does every mathematical or comparison function on an element-basis. So you probably don't need to loop through the arrays.
So maybe start where you convolve your image with a sum-filter. This can be done for 2D images by astropy.convolution.convolve or scipy.ndimage.filters.uniform_filter
I'm not sure what you want but I think you want a 9x9 sum-filter that would be realized by
from scipy.ndimage.filters import uniform_filter
ima_UVM2sum = uniform_filter(ima_norm_um2_data, size=9)
since you want to discard any pixel that are at the borders (4 pixel) you can simply slice them away:
ima_UVM2sum_valid = ima_UVM2sum[4:-4,4:-4]
This ignores the first and last 4 rows and the first and last 4 columns (last is realized by making the stop value negative)
now you want to calculate the corrections:
xvec_UVM2 = ft*ima_UVM2sum_valid
fxvec_UVM2 = 1 + (a1*xvec_UVM2) + (a2*xvec_UVM2**2) + (a3*xvec_UVM2**3) + (a4*xvec_UVM2**4)
Ctheory_UVM2 = - np.alog(1-(alpha*ima_UVM2sum_valid*ft))/(alpha*ft)
these are all arrays so you still do not need to loop.
But then you want to fill your two images. Be careful because the correction is smaller (we inored the first and last rows/columns) so you have to take the same region in the correction images:
corrfact_um2_ext1[4:-4,4:-4] = Ctheory_UVM2*(fxvec_UVM2/ima_UVM2sum_valid)
coincorr_um2_ext1[4:-4,4:-4] = corrfact_um2_ext1[4:-4,4:-4] *ima_sk_um2
still no loop just using numpys mathematical functions. This means it is much faster (MUCH FASTER!) and does the same.
Maybe I have forgotten some slicing and that would yield a Not broadcastable error if so please report back.
Just a note about your loop: Python's first axis is the second axis in FITS and the second axis is the first FITS axis. So if you need to loop over the axis bear that in mind so you don't end up with IndexErrors or unexpected results.