How to loop over the bands of a raster dataset in Python - python

I have a multiband raster (84 bands). I read the raster with GDAL and convert it to a NumPy array. The array shape is (84, 3, 5): 84 bands, 3 rows and 5 columns. I want to compute the ratio between band(0)/band(n+1) for n in 1 to 84. This gives 83 arrays, each holding a pixel-by-pixel ratio. For example, I have:
Band 1
[[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]]
Band 2
[[21, 22, 23, 24, 25],
[26, 27, 28, 29, 30],
[31, 32, 33, 34, 35]]
Band 3
[[31, 32, 33, 34, 35],
[36, 37, 38, 39, 40],
[41, 42, 43, 44, 45]]
...
...
Band84
I need to loop through all the bands in such a way that I get these: Band2/Band1; Band3/Band1; ... ; Band84/Band1
Band2/Band1
[[1/21, 2/22, 3/23, 4/24, 5/25],
[6/26, 7/27, 8/28, 9/29, 10/30],
[11/31, 12/32, 13/33, 14/34, 15/35]]
And so on...
Is there any way to vectorize this calculation?
I really appreciate your advice.

If I understand correctly, you need
Band2/Band1; Band3/Band1 ... Band84/Band1
Band3/Band2; Band4/Band2 ... Band84/Band2
...
Band84/Band83
It should be something like this
for a in range(0, len(all_bands)-1):
    for b in range(a+1, len(all_bands)):
        print(all_bands[b] / all_bands[a])
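If only the ratios against the first band are needed, NumPy broadcasting can probably replace the loop entirely. The following is a minimal sketch, not a definitive solution: `all_bands` here is a small made-up array standing in for the GDAL-derived (84, 3, 5) array, and the names `ratios` and `inverse_ratios` are illustrative.

```python
import numpy as np

# Hypothetical stand-in for the raster array: 4 bands, 3 rows, 5 columns.
# Values start at 1 so this toy example never divides by zero.
all_bands = np.arange(1, 61, dtype=np.float64).reshape(4, 3, 5)

# Broadcasting divides every remaining band by the first band in one step.
# all_bands[1:] has shape (3, 3, 5); all_bands[0] has shape (3, 5).
ratios = all_bands[1:] / all_bands[0]

# The opposite direction (first band over each other band) works the same way.
inverse_ratios = all_bands[0] / all_bands[1:]

print(ratios.shape)
```

With the real 84-band array, `ratios` would have shape (83, 3, 5). Real rasters often contain zeros or nodata values, so the denominator may need masking before dividing.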

Related

numpy array replace value with a conditional loop

I have a numpy array,
myarray= np.array([49, 7, 44, 27, 13, 35, 171])
I wanted to replace every value that is greater than 45, so I applied the code below:
myarray=np.where(myarray> 45,myarray - 45, myarray)
But the subtraction is applied only once to each value. For example, the array above becomes
myarray= np.array([4, 7, 44, 27, 13, 35, 126])
Expected array
myarray= np.array([4, 7, 44, 27, 13, 35, 36])
How do I run np.where until the condition is satisfied? Basically, I don't want any value in the array above to be greater than 45. Is there a Pythonic way of doing it? Thanks in advance.
The subtraction happens only once because np.where is evaluated only once: 171 - 45 gives 126, and then the operation is finished. Try the modulo operator if you want to do it this way.
myarray = np.array([49, 7, 44, 27, 13, 35, 171])
myarray = np.where(myarray> 45, myarray % 45, myarray)
print(myarray)
The output matches your expected array.
[ 4 7 44 27 13 35 36]
Not the most pythonic or even optimal way, but pretty easy to understand. You can try this:
import numpy as np
myarray= np.array([49, 7, 44, 27, 13, 35, 171])
for i in range(len(myarray)):
    while myarray[i] > 45:
        myarray[i] = myarray[i] - 45
print(myarray)
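The two answers above agree on the sample data but differ for exact multiples of 45: repeated subtraction stops at 45, while `% 45` yields 0. A small sketch of a vectorized variant that should reproduce the loop's behaviour follows. The extra value 90 is added here purely for illustration and is not part of the original question, and the names `plain_mod` and `shifted_mod` are made up.

```python
import numpy as np

# Original sample values plus a hypothetical multiple of 45 (90) to expose the edge case.
myarray = np.array([49, 7, 44, 27, 13, 35, 171, 90])

# Plain modulo: values above 45 that are multiples of 45 collapse to 0.
plain_mod = np.where(myarray > 45, myarray % 45, myarray)

# Shifted modulo: mirrors "subtract 45 while the value is greater than 45",
# so multiples of 45 end at 45 instead of 0.
shifted_mod = np.where(myarray > 45, (myarray - 1) % 45 + 1, myarray)

print(plain_mod)
print(shifted_mod)
```

Which of the two is right depends on whether a result of 0 or 45 is wanted for such values.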

Matlab to Python - extracting lower subdiagonal triangle, why different order?

I am translating code from MATLAB to Python. I need to extract the lower subdiagonal values of a matrix. My attempt in python seems to extract the same values (sum is equal), but in different order. This is a problem as I need to apply corrcoef after.
The original Matlab code is using an array of indices to subset a matrix.
MATLAB code:
values = 1:100;
matrix = reshape(values,[10,10]);
subdiag = find(tril(ones(10),-1));
matrix_subdiag = matrix(subdiag);
subdiag_sum = sum(matrix_subdiag);
disp(matrix_subdiag(1:10))
disp(subdiag_sum)
Output:
2
3
4
5
6
7
8
9
10
13
1530
My attempt in Python
import numpy as np
matrix = np.arange(1,101).reshape(10,10)
matrix_t = matrix.T #to match MATLAB arrangement
matrix_subdiag = matrix_t[np.tril_indices((10), k = -1)]
subdiag_sum = np.sum(matrix_subdiag)
print(matrix_subdiag[0:10], subdiag_sum)
Output:
[2 3 13 4 14 24 5 15 25 35] 1530
How do I get the same order output? Where is my error?
Thank you!
For the sum, use numpy.triu directly on the non-transposed matrix:
S = np.triu(matrix, k=1).sum()
# 1530
For the values in MATLAB's order, index the matrix with numpy.triu_indices_from, which returns them as a flat array:
idx = matrix[np.triu_indices_from(matrix, k=1)]
output:
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18, 19, 20,
24, 25, 26, 27, 28, 29, 30, 35, 36, 37, 38, 39, 40, 46, 47, 48, 49,
50, 57, 58, 59, 60, 68, 69, 70, 79, 80, 90])
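The order differs because MATLAB stores arrays column by column, so find() walks the lower triangle column-wise, whereas np.tril_indices walks it row by row. The sketch below checks that the upper triangle of the row-major matrix, read row by row, holds the same values in the same order as the lower triangle of the MATLAB-style matrix read column by column. The variable names are illustrative only.

```python
import numpy as np

matrix = np.arange(1, 101).reshape(10, 10)   # row-major fill (NumPy default)

# Upper triangle of the row-major matrix, walked row by row.
upper_rowwise = matrix[np.triu_indices_from(matrix, k=1)]

# MATLAB-style layout: matrix.T equals reshape(1:100, [10, 10]) in MATLAB.
matlab_like = matrix.T
rows, cols = np.tril_indices(10, k=-1)

# Reorder the lower-triangle indices column by column
# (lexsort uses the last key, cols, as the primary sort key).
order = np.lexsort((rows, cols))
lower_colwise = matlab_like[rows[order], cols[order]]

print(upper_rowwise[:10], upper_rowwise.sum())
print(np.array_equal(upper_rowwise, lower_colwise))
```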

Multiprocessing of two for loops

I'm struggling with the implementation of an algorithm in python (2.7) to parallelize the computation of a physics problem. There's a parameter space over two variables (let's say a and b) over which I would like to run my written program f(a,b) which returns two other variables c and d.
Up to now, I worked with two for-loops over a and b to calculate two arrays for c and d which are then saved as txt documents. Since the parameter space is relatively large and each calculation of a point f(a,b) in it takes relatively long, it would be great to use all of my 8 CPU cores for the parameter space scan.
I've read about multithreading and multiprocessing and it seems that multiprocessing is what I'm searching for. Do you know of a good code example for this application or resources to learn about the basics of multiprocessing for my rather simple application?
Here is an example of how you might use multiprocessing with a simple function that takes two arguments and returns a tuple of two numbers, and a parameter space over which you want to do the calculation:
from itertools import product
from multiprocessing import Pool
import numpy as np
def f(a, b):
    c = a + b
    d = a * b
    return (c, d)
a_vals = [1, 2, 3, 4, 5, 6]
b_vals = [10, 11, 12, 13, 14, 15, 16, 17]
na = len(a_vals)
nb = len(b_vals)
p = Pool(8) # <== maximum number of simultaneous worker processes
answers = np.array(p.starmap(f, product(a_vals, b_vals))).reshape(na, nb, 2)
c_vals = answers[:,:,0]
d_vals = answers[:,:,1]
This gives the following:
>>> c_vals
array([[11, 12, 13, 14, 15, 16, 17, 18],
[12, 13, 14, 15, 16, 17, 18, 19],
[13, 14, 15, 16, 17, 18, 19, 20],
[14, 15, 16, 17, 18, 19, 20, 21],
[15, 16, 17, 18, 19, 20, 21, 22],
[16, 17, 18, 19, 20, 21, 22, 23]])
>>> d_vals
array([[ 10, 11, 12, 13, 14, 15, 16, 17],
[ 20, 22, 24, 26, 28, 30, 32, 34],
[ 30, 33, 36, 39, 42, 45, 48, 51],
[ 40, 44, 48, 52, 56, 60, 64, 68],
[ 50, 55, 60, 65, 70, 75, 80, 85],
[ 60, 66, 72, 78, 84, 90, 96, 102]])
The p.starmap returns a list of 2-tuples, from which the c and d values are then extracted.
This assumes that you will do your file I/O in the main program after getting back all the results.
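To see why the reshape lines up with the parameter grid, here is a serial sketch that swaps Pool.starmap for itertools.starmap, so it shows only the ordering and not any parallel speed-up. The shortened `a_vals` and `b_vals` are made up for brevity.

```python
from itertools import product, starmap

import numpy as np


def f(a, b):
    # Same toy function as above: returns the pair (c, d).
    return (a + b, a * b)


a_vals = [1, 2, 3]   # hypothetical, shortened parameter lists
b_vals = [10, 11]

# product() varies b fastest, so the flat results are ordered
# (a0, b0), (a0, b1), (a1, b0), ... and reshape cleanly to (na, nb, 2).
flat = list(starmap(f, product(a_vals, b_vals)))
answers = np.array(flat).reshape(len(a_vals), len(b_vals), 2)

c_vals = answers[:, :, 0]   # c_vals[i, j] belongs to (a_vals[i], b_vals[j])
d_vals = answers[:, :, 1]
print(c_vals)
print(d_vals)
```

Pool.starmap returns results in input order as well, so the same reshape applies to the parallel version. On platforms that spawn worker processes (for example Windows), the Pool code generally needs to sit under an `if __name__ == "__main__":` guard.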
Addendum:
If p.starmap is unavailable (Python 2), then instead you can change your function to take a single input (a 2-element tuple):
def f(inputs):
    a, b = inputs
    # ... etc as before ...
and then use p.map in place of p.starmap in the above code.
If it is not convenient to change the function (e.g. it is also called from elsewhere), then you can of course write a wrapper function:
def f_wrap(inputs):
    a, b = inputs
    return f(a, b)
and call that instead.

For loop in python- to generate output containing list of items

I am new to programming and to Python, currently using Python 3 via Anaconda/Jupyter. My input (test_set) is a list of quality score codes from a FASTQ file. The loop converts the quality score codes into quality scores. The following is the code I used:
test_set=['.GA', '<AG', '#<<']
output1=[]
output2=[]
for i in test_set:
    s=i
    for j in range(len(s)):
        qs=ord(s[j])-33
        output1.append(qs)
    output2.append(output1)
The outputs I get are:
output1: [13, 38, 32, 27, 32, 38, 2, 27, 27]
output2: [[13, 38, 32, 27, 32, 38, 2, 27, 27],
[13, 38, 32, 27, 32, 38, 2, 27, 27],
[13, 38, 32, 27, 32, 38, 2, 27, 27]]
But the output I am trying to achieve is:
desired_output: [[13, 38, 32], [27, 32, 38], [2, 27, 27]]
I would like to know what I am doing wrong with my loops and how to change them to achieve the desired output.
Thank you. I appreciate any help, including resources for understanding for and while loops.
You almost got it right. You only need to "reset" output1 in every pass of the for loop.
Here's a simplified version that does what you want:
test_set = ['.GA', '<AG', '#<<']
output2 = []
for s in test_set:
    output1 = [ord(c)-33 for c in s]
    output2.append(output1)
Because the string s is itself an iterable, you don't need range(len(s)) to index into it. Technically, you could do it in a one-liner, if you find it understandable enough:
[[ord(c)-33 for c in s] for s in test_set]
Alternatively, keeping the explicit index-based loop:
test_set=['.GA', '<AG', '#<<']
output=[]
for i in test_set:
    aux = []  # new list for each string
    for j in range(len(i)):
        qs=ord(i[j])-33
        aux.append(qs)
    output.append(aux)
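The reason the original loop produced three identical rows is that output2 held three references to one and the same list, which kept growing. A short sketch of that effect next to the fix follows; the names `shared`, `aliased`, `fresh` and `fixed` are illustrative and not taken from the question.

```python
test_set = ['.GA', '<AG', '#<<']

shared = []
aliased = []
for s in test_set:
    for c in s:
        shared.append(ord(c) - 33)
    aliased.append(shared)   # appends a reference to the same list, not a copy

# Every entry of `aliased` is the very same list object.
print(all(row is shared for row in aliased))

fixed = []
for s in test_set:
    fresh = []               # a new list is created for each string
    for c in s:
        fresh.append(ord(c) - 33)
    fixed.append(fresh)

print(fixed)
```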

Combining many 3D numpy arrays into one, from shape from (3, 2, 1) to (3, 2, 4)

I know that this has probably been asked before, but all of the questions I have looked at talk about a different type of reshaping.
Let's say that we have the following numpy arrays:
data1 = np.array([[[12], [13]], [[14], [15]], [[16], [17]]])
data2 = np.array([[[22], [23]], [[24], [25]], [[26], [27]]])
data3 = np.array([[[32], [33]], [[34], [35]], [[36], [37]]])
data4 = np.array([[[42], [43]], [[44], [45]], [[46], [47]]])
each with a shape of (3, 2, 1).
How can we combine these four arrays to get the following array with shape (3, 2, 4)?
result = np.array([[[12, 22, 32, 42], [13, 23, 33, 43]], [[14, 24, 34, 44], [15, 25, 35, 45]], [[16, 26, 36, 46], [17, 27, 37, 47]]])
You can use np.concatenate():
np.concatenate((data1, data2, data3, data4), axis=2)
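A quick self-contained check of the call above, using the arrays from the question. This is only a sketch to confirm the resulting shape and ordering.

```python
import numpy as np

data1 = np.array([[[12], [13]], [[14], [15]], [[16], [17]]])
data2 = np.array([[[22], [23]], [[24], [25]], [[26], [27]]])
data3 = np.array([[[32], [33]], [[34], [35]], [[36], [37]]])
data4 = np.array([[[42], [43]], [[44], [45]], [[46], [47]]])

# Join along the last axis: four (3, 2, 1) arrays become one (3, 2, 4) array.
result = np.concatenate((data1, data2, data3, data4), axis=2)

print(result.shape)
print(result[0, 0])
```

Since the last axis is the one being joined, `axis=-1` would give the same result here.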
