Using list comprehension in data cubes

I am trying to use list comprehensions to filter values in a data cube of images, but I am stuck going from 2 dimensions (as shown here or here) to 3.
For a single image, the line of code that accomplishes what I want is:
AM2 = [[x if x > 1e-5 else 0 for x in line] for line in AM[0]]
How do I extend this to all the images that are stacked on top of each other? I assume I need a third nested loop, but all my attempts so far have failed.
In my particular case the data cube is made up of numpy arrays with dimensions (100x400x900). Are list comprehensions still advisable for filtering values over that volume of data?
Thanks for your time.

Don't use list comprehensions on numpy arrays; you lose their speed and power. Use numpy boolean (advanced) indexing instead. For example, your comprehension can be written as
AM2 = AM.copy()  # use AM2 = AM[0].copy() if you only want the first image, as in your example
AM2[AM2 <= 1e-5] = 0  # <= so that, like your comprehension, only values strictly greater than 1e-5 are kept
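As a minimal sketch of the masking approach on a small stand-in cube (the array contents and the name `threshold` are illustrative assumptions, not the asker's data), `np.where` builds the filtered copy in one step and keeps the same `x > 1e-5` condition as the comprehension:

```python
import numpy as np

# Small stand-in for the (100, 400, 900) cube: 2 images of 2x3 pixels.
AM = np.array([[[2e-5, 1e-6, 0.5],
                [0.0, 3e-5, 1e-5]],
               [[1e-7, 1.0, 2e-6],
                [4e-5, 0.0, 9e-6]]])
threshold = 1e-5  # hypothetical name for the cutoff used in the question

# Keep values strictly above the threshold, replace everything else with 0.
AM2 = np.where(AM > threshold, AM, 0)

# The same result via an in-place boolean mask on a copy.
AM3 = AM.copy()
AM3[~(AM3 > threshold)] = 0
```

Both versions work on the whole cube at once, so no loop over the images is needed.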

For pure Python nested lists, try this (one comprehension level per dimension, so the result keeps the cube's shape):
AM2 = [[[x if x > 1e-5 else 0 for x in line] for line in A] for A in AM]
See @FHTMitchell's answer if these are numpy arrays.

Related

Is there a better way to send multiple arguments to itertools.product?

I am trying to create itertools.product from a 2D list containing many rows. For example, consider a list s:
[[0.7168573116730971,
1.3404415914042531,
1.8714268721791336,
11.553051251803975],
[0.6702207957021266,
1.2476179147860895,
1.7329576877705954,
10.635778602978927],
[0.6238089573930448,
1.1553051251803976,
1.5953667904468385,
9.725277699842893],
[0.5776525625901988,
1.0635778602978927,
1.4587916549764335,
8.822689900641748]]
I want to compute itertools.product between the 4 rows of the list:
import itertools as it
pr = []
for j in it.product(s[0], s[1], s[2], s[3]):
    pr.append(j)
This gives the result I need: pr has dimensions 256 x 4, where 256 is (number of columns ^ number of rows). But is there a better way to pass all the rows of s as arguments without writing out each row by name? That would be tedious for a larger list.
I guess numpy.meshgrid could be used if s were a numpy.array, but even there I would have to write out each row as a separate argument.

You can use the unpacking notation * in Python for this:
import itertools as it
s = [[0.7168573116730971,
1.3404415914042531,
1.8714268721791336,
11.553051251803975],
[0.6702207957021266,
1.2476179147860895,
1.7329576877705954,
10.635778602978927],
[0.6238089573930448,
1.1553051251803976,
1.5953667904468385,
9.725277699842893],
[0.5776525625901988,
1.0635778602978927,
1.4587916549764335,
8.822689900641748]]
pr = []
for j in it.product(*s):
    pr.append(j)
The * unpacks s, so each row of the list is passed to the product function as a separate positional argument.
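To see the unpacking at a glance, here is a minimal sketch on a smaller, made-up list (3 rows of 2 values, not the data from the question):

```python
import itertools as it

# Small illustrative input (not the asker's data): 3 rows of 2 values each.
s = [[1, 2], [10, 20], [100, 200]]

# *s unpacks the rows, so this is the same as it.product(s[0], s[1], s[2]).
pr = list(it.product(*s))

print(len(pr))  # 2 ** 3 = 8 combinations
print(pr[0])    # (1, 10, 100)
```

Wrapping the call in `list(...)` also replaces the explicit append loop.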

Minimum of pairs between two lists, is there a quicker way?

I have two (very long) lists. I want to find the sum of the minimums of each pair of corresponding elements. E.g., if
X = [2,3,4]
Y = [5,4,2]
then, the sum would be 2+3+2 = 7.
At the moment, I'm doing this by zipping the lists and using a list comprehension. My lists are X and Y:
mins = [min(x,y) for x,y in zip(X,Y)]
summed_mins = sum(mins)
This is causing serious runtime issues in my program. Is there a faster way to do this? List comprehensions are the fastest that I know of.

You can use the built-in map function, which is lazy in Python 3, to avoid creating the intermediate list, but this will probably be only slightly faster (thanks to Veedrac):
summed_mins = sum(map(min, X, Y))
Alternatively, you can use Numpy. Here is how:
summed_mins = np.stack((X, Y)).min(axis=0).sum()
If you can store the input lists directly as Numpy arrays, this can be much faster.
If you can even store them directly in a single 2D Numpy array, you don't need the np.stack call, which makes the code faster still.
If you cannot store or create the input directly as Numpy arrays, you can build the arrays on the fly quickly by specifying the data type (assuming you are sure the lists contain small integers). Here is an example:
summed_mins = np.stack((np.array(X, np.int64), np.array(Y, np.int64))).min(axis=0).sum()
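Another option, not mentioned above, is `np.minimum`, which takes the element-wise minimum of two inputs and so skips the intermediate stacked array. A minimal sketch using the example lists from the question:

```python
import numpy as np

X = [2, 3, 4]
Y = [5, 4, 2]

# np.minimum compares the two inputs element by element, no stacking needed.
summed_mins = int(np.minimum(X, Y).sum())
print(summed_mins)  # 7
```

Whether this beats the `np.stack` version on very long inputs would need to be timed on the real data.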

How to have an array of arrays in Python

I'm new to python, but I'm solid in coding in vb.net. I'm trying to hold numerical values in a jagged array; to do this in vb.net I would do the following:
Dim jag(3)() As Double
For i = 0 To 3
    ReDim jag(i)(length of this row)
Next
Now, I know Python doesn't use explicit declarations like this (maybe it can, but I don't know how!). I have tried something like this:
a(0) = someOtherArray
But that doesn't work: I get the error Can't assign to function call. Any advice on a smoother way to do this? I'd prefer to stay away from a 2D matrix, as the different elements of a (i.e. a(0), a(1), ...) have different lengths.

arr = [[]]
I'm not sure what you're trying to do. Python lists are sized dynamically, but if you want a predefined length and dimension, use a list comprehension:
arr = [[0 for x in range(3)] for y in range(3)]

From Microsoft documentation:
A jagged array is an array whose elements are arrays. The elements of
a jagged array can be of different dimensions and sizes
Python documentation about Data Structures.
You could store a list inside another list or a dictionary that stores a list. Depending on how deep your arrays go, this might not be the best option.
numbersList = []
listofNumbers = [1,2,3]
secondListofNumbers = [4,5,6]
numbersList.append(listofNumbers)
numbersList.append(secondListofNumbers)
for number in numbersList:
    print(number)
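Applied to the jagged array from the question, a minimal sketch could look like the following. The name `row_lengths` and its values are hypothetical, chosen only for illustration:

```python
# Rows of different lengths; the values are illustrative only.
row_lengths = [2, 4, 1, 3]  # hypothetical lengths, one per row

# Build the jagged structure: one inner list per row, filled with 0.0.
jag = [[0.0] * n for n in row_lengths]

# Indexing and assignment use square brackets, not parentheses.
jag[0] = [1.5, 2.5, 3.5]   # replace a whole row (its length may change)
jag[1][2] = 9.0            # set a single element

for row in jag:
    print(len(row), row)
```

The error in the question comes from writing `a(0)`, which Python reads as a function call; `a[0]` is the indexing syntax.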

Python Iterating through nested list using list comprehension

I'm working on Project Euler, problem 11, which involves finding the greatest product of all possible combinations of four adjacent numbers in a grid. I've split the numbers into a nested list and used a list comprehension to slice the relevant numbers, like this:
if x+4 <= len(matrix[x]): #check right
    my_slice = [int(matrix[x][n]) for n in range(y,y+4)]
...and so on for the other cardinal directions. So far, so good. But when I get to the diagonals things get problematic. I tried to use two ranges like this:
if x+4 <= len(matrix[x]) and y-4 >=0:# check up, right
    my_slice = [int(matrix[m][n]) for m,n in ((range(x,x+4)),range(y,y+4))]
But this yields the following error:
<ipython-input-53-e7c3ebf29401> in <listcomp>(.0)
48 if x+4 <= len(matrix[x]) and y-4 >=0:# check up, right
---> 49 my_slice = [int(matrix[m][n]) for m,n in ((range(x,x+4)),range(y,y+4))]
ValueError: too many values to unpack (expected 2)
My desired indices for x,y values of [0,0] would be ['0,0','1,1','2,2','3,3']. This does not seem all that different from using the enumerate function to iterate over a list, but clearly I'm missing something.
P.S. My apologies for my terrible variable nomenclature, I'm a work in progress.

You do not need two ranges; simply use one and derive the second index from it:
my_slice = [int(matrix[m][m-x+y]) for m in range(x,x+4)]
Since your n is supposed to run over range(y,y+4), there is always a difference of y-x between m and n. So instead of using two variables, we can account for the difference ourselves.
Or, in case you still wish to use two range(..) constructs, you can use zip(..), which takes several iterables, consumes them in parallel, and emits tuples:
my_slice = [int(matrix[m][n]) for m,n in zip(range(x,x+4),range(y,y+4))]
But I think this will not improve performance because of the tuple packing and unpacking overhead.

[int(matrix[x+d][y+d]) for d in range(4)] for one diagonal.
[int(matrix[x+d][y-d]) for d in range(4)] for the other.
Btw, better use standard matrix index names, i.e., row i and column j, not x and y. It's confusing. I think you even confused yourself, as for example your if x+4 <= len(matrix[x]) tests x against the second dimension length but uses it in the first dimension. Huh?
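Putting the two diagonal comprehensions together with bounds checks, a minimal sketch might look like this. The helper name `diagonals` and the small grid are made up for illustration, and the row/column names i and j follow the suggestion above:

```python
# Small illustrative grid of digit strings, standing in for the puzzle input.
matrix = [["1", "2", "3", "4"],
          ["5", "6", "7", "8"],
          ["9", "1", "2", "3"],
          ["4", "5", "6", "7"]]

def diagonals(matrix, i, j, length=4):
    """Return the down-right and down-left runs starting at row i, column j.

    A run that would leave the grid is returned as None.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    down_right = down_left = None
    if i + length <= n_rows and j + length <= n_cols:
        down_right = [int(matrix[i + d][j + d]) for d in range(length)]
    if i + length <= n_rows and j - (length - 1) >= 0:
        down_left = [int(matrix[i + d][j - d]) for d in range(length)]
    return down_right, down_left

print(diagonals(matrix, 0, 0))  # ([1, 6, 2, 7], None)
print(diagonals(matrix, 0, 3))  # (None, [4, 7, 1, 4])
```

Rows are checked against `len(matrix)` and columns against `len(matrix[0])`, which avoids the mixed-up bounds test pointed out above.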

How to find the index of an array within an array

I have created an array in the way shown below; which represents 3 pairs of co-ordinates. My issue is I don't seem to be able to find the index of a particular pair of co-ordinates within the array.
import numpy as np
R = np.random.uniform(size=(3,2))
R
Out[5]:
array([[ 0.57150157, 0.46611662],
[ 0.37897719, 0.77653461],
[ 0.73994281, 0.7816987 ]])
R.index([ 0.57150157, 0.46611662])
The following is returned:
AttributeError: 'numpy.ndarray' object has no attribute 'index'
The reason I'm trying to do this is so I can extend a list, with the index of a co-ordinate pair, within a for-loop.
e.g.
v = []
for A in R:
    v.append(R.index(A))
I'm just not sure why the index function isn't working, and can't seem to find a way around it.
I'm new to programming so excuse me if this seems like nonsense.

index() is a method of the list type, not of numpy.ndarray. Try:
R.tolist().index(x)
where x is a plain list, for example R[2].tolist() for the third entry of R. This first converts your array into a list, and then you can use index ;)

You can achieve the desired result by converting your inner arrays (the coordinates) to tuples.
R = map(lambda x: (x), R);
And then you can find the index of a tuple using R.index((number1, number2));
Hope this helps!
[Edit] To explain what's going on in the code above, the map function goes through (iterates) the items in the array R, and for each one replaces it with the return result of the lambda function.
So it's equivalent to something along these lines:
def someFunction(x):
    return (x)
for x in range(0, len(R)):
    R[x] = someFunction(R[x])
So it takes each item and does something to it, putting it back in the list. I realized that it may not actually do what I thought it did (returning (x) doesn't seem to change a regular array to a tuple), but it does help your situation because I think by iterating through it python might create a regular array out of the numpy array.
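To actually convert the rows to tuples, the following code should work (in Python 3, map returns an iterator, so wrap it in list() to get something with an index method):
R = list(map(tuple, R))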
(credits to https://stackoverflow.com/a/10016379/2612012)

Numpy arrays don't have an index function, for a number of reasons. However, I think you're wanting something different.
For example, the code you mentioned:
v = []
for A in R:
v.append(R.index(A))
Would just be (assuming R has unique rows, for the moment):
v = list(range(len(R)))
However, I think you might be wanting the built-in function enumerate. E.g.
for i, row in enumerate(R):
    # Presumably you're doing something else with "row"...
    v.append(i)
For example, let's say we wanted to know the indices where the sum of each row was greater than 1.
One way to do this would be:
v = []
for i, row in enumerate(R):
    if sum(row) > 1:
        v.append(i)
However, numpy also provides other ways of doing this, if you're working with numpy arrays. For example, the equivalent to the code above would be:
v, = np.where(R.sum(axis=1) > 1)
If you're just getting started with Python, focus on understanding the first example before worrying too much about the best way to do things with numpy. Just be aware that numpy arrays behave very differently from lists.
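If the goal really is to look up the position of a known coordinate pair, a numpy-only sketch could compare every row against the target. The fixed values below replace the random array from the question so the result is reproducible, and `target` is a name chosen for illustration:

```python
import numpy as np

# Fixed illustrative coordinates instead of random ones, so the result is reproducible.
R = np.array([[0.57150157, 0.46611662],
              [0.37897719, 0.77653461],
              [0.73994281, 0.7816987]])
target = [0.37897719, 0.77653461]

# Compare every row with the target (with a floating-point tolerance),
# then keep the rows where all coordinates match.
matches = np.all(np.isclose(R, target), axis=1)
indices = np.flatnonzero(matches)

print(indices)  # [1]
```

`np.isclose` is used instead of `==` because printed floats are usually rounded, so an exact comparison against values copied from the console can fail.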
