How to calculate the weighted median for every subarray in a 2D matrix? - python

This question is a new one (I've already looked into similar questions and did not find what I need). Therefore:
What is the most efficient way to apply a weighted median to every subarray (row) of a 2D numpy matrix? (No extra frameworks; pure numpy if possible.)
Data = np.asarray([[ 1.1, 7.8, 3.3, 4.9],
                   [ 6.1, 9.8, 5.3, 7.9],
                   [ 4.1, 4.8, 3.3, 7.1],
                   ...
                   [ 1.1, 7.4, 3.1, 4.9],
                   [ 7.1, 3.8, 7.3, 8.1],
                   [19.1, 2.8, 3.2, 1.1]])
weights = [0.64, 0.79, 0.91, 0]
Note: the answers to the other questions only show a 1D problem. This one has to handle 1,000,000 subarrays efficiently.

Using the Data provided by @JoonyoungPark, you can use a list comprehension:
[np.median(i*weights) for i in Data]
[1.8535000000000001,
4.3635,
2.8135,
1.7625000000000002,
3.7729999999999997,
2.5620000000000003]
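The list comprehension above calls np.median once per row from Python. For the stated 1,000,000-row case, a single vectorized call should be much faster. This is a sketch of the same computation (the median of the weighted values, as in the answer above), assuming that is what's wanted:

```python
import numpy as np

Data = np.asarray([[1.1, 7.8, 3.3, 4.9],
                   [6.1, 9.8, 5.3, 7.9],
                   [4.1, 4.8, 3.3, 7.1]])
weights = np.asarray([0.64, 0.79, 0.91, 0.0])

# Broadcasting multiplies every row by weights at once; axis=1 then
# takes the median of each row in a single C-level call, no Python loop.
medians = np.median(Data * weights, axis=1)
```

This matches the per-row results shown above for the first three rows.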

Related

How to combine these two numpy arrays?

How would I combine these two arrays:
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
[4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
Into something like this:
xy = [[0.1, [1.0, 1.1, 1.2, 1.3]], [0.2, [2.0, 2.1, 2.2, 2.3]...
Thank you for the assistance!
Someone suggested I post code that I have tried, and I realized I had forgotten to:
xy = np.array(list(zip(x, y)))
This is my current solution; however, it is extremely inefficient.
You can use zip to combine them:
[[a,b] for a,b in zip(y,x)]
Out:
[[array([0.1]), array([1. , 1.1, 1.2, 1.3])],
[array([0.2]), array([2. , 2.1, 2.2, 2.3])],
[array([0.3]), array([3. , 3.1, 3.2, 3.3])],
[array([0.4]), array([4. , 4.1, 4.2, 4.3])],
[array([0.5]), array([5. , 5.1, 5.2, 5.3])]]
A pure numpy solution will be much faster than list comprehension for large arrays.
I do have to say your use case makes no sense, as there is no logic in putting these arrays into a single data structure, and I believe you should recheck your design.
As @user2357112 supports Monica was subtly implying, this is very likely an XY problem. Check whether this is really what you are trying to solve, and not something else. If you want something else, try asking about that.
I strongly suggest reconsidering what you want to do before moving on, as otherwise you will lock yourself into a bad design.
That aside, here's a solution:
import numpy as np
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
[4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
xy = np.hstack([y, x])
print(xy)
prints
[[0.1 1. 1.1 1.2 1.3]
[0.2 2. 2.1 2.2 2.3]
[0.3 3. 3.1 3.2 3.3]
[0.4 4. 4.1 4.2 4.3]
[0.5 5. 5.1 5.2 5.3]]
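As a pure-numpy aside, np.concatenate (or np.column_stack) performs the same join as hstack here; a minimal sketch on a shortened version of the same data:

```python
import numpy as np

x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3]])
y = np.asarray([[0.1], [0.2]])

# For 2D inputs, concatenating along axis=1 is equivalent to hstack:
# each y value is prepended to the matching row of x.
xy = np.concatenate([y, x], axis=1)
```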

Find median for each element in list

I have a large list of data, between 1,000 and 10,000 elements. Now I want to filter out some peak values with the help of the median function.
#example list with just 10 elements
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
import statistics

#list of medians calculated from 3 elements
my_median_list = []
for i in range(len(my_list)):
    if i == 0:
        my_median_list.append(statistics.median([my_list[0], my_list[1], my_list[2]]))
    elif i == (len(my_list) - 1):
        my_median_list.append(statistics.median([my_list[-1], my_list[-2], my_list[-3]]))
    else:
        my_median_list.append(statistics.median([my_list[i-1], my_list[i], my_list[i+1]]))
print(my_median_list)
# [4.7, 4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6, 4.6]
This works so far, but I think it looks ugly and may be inefficient. Is there a faster way with statistics or NumPy? Or another solution? I am also looking for a solution where I can pass an argument for how many elements the median is calculated from. In my example I always used the median of 3 elements, but with my real data I want to experiment with the window size and then maybe use the median of 10 elements.
You are calculating too many values, since:
my_median_list.append(statistics.median([my_list[i-1], my_list[i], my_list[i+1]]))
and
my_median_list.append(statistics.median([my_list[0], my_list[1], my_list[2]]))
are the same when i == 1. The same duplication happens at the end, so you get one value too many at each end.
It's easier and less error-prone to do this with zip(), which will make the three-element tuples for you:
from statistics import median
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
[median(l) for l in zip(my_list, my_list[1:], my_list[2:])]
# [4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6]
For groups of arbitrary size, collections.deque is super handy because you can set a maximum size. Then you just keep pushing items onto one end and it removes items from the other to maintain the size. Here's a generator example that takes your group size as n:
from statistics import median
from collections import deque
def rolling_median(l, n):
    d = deque(l[0:n], n)
    yield median(d)
    for num in l[n:]:
        d.append(num)
        yield median(d)
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
list(rolling_median(my_list, 3))
# [4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6]
list(rolling_median(my_list, 5))
# [4.7, 5.1, 5.1, 4.3, 5.0, 4.6]
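Since the question also asks about NumPy: sliding_window_view gives the same rolling median without a Python-level loop per window. A sketch, assuming NumPy 1.20+ (where this function was added):

```python
import numpy as np

my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]

# sliding_window_view builds an (n - k + 1, k) strided view without
# copying; np.median along axis=1 then reduces every window at once.
windows = np.lib.stride_tricks.sliding_window_view(np.asarray(my_list), 3)
rolling = np.median(windows, axis=1)
```

The window size is just the second argument, so trying 10 instead of 3 is a one-character change.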

Numpy - custom sort of rows and columns in array

Can I sort the rows or columns of an array according to values stored in a separate list?
For example:
row_keys = [10, 11, 5, 6]
z = np.array([[2.77, 11.0, 4.1, 7.2],
              [3.7, 2.2, 1.1, 0.5],
              [2.5, 3.5, 5.0, 9.0],
              [4.3, 2.2, 5.1, 6.1]])
Should produce something like
array([[ 2.5 ,  3.5 ,  5.  ,  9.  ],
       [ 4.3 ,  2.2 ,  5.1 ,  6.1 ],
       [ 2.77, 11.  ,  4.1 ,  7.2 ],
       [ 3.7 ,  2.2 ,  1.1 ,  0.5 ]])
And similar functionality applied to the columns, please.
Another way for rows:
z_rows = z[np.argsort(row_keys)]
and for columns (reusing row_keys here; substitute your column keys as appropriate):
z_columns = z.T[np.argsort(row_keys)].T
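Both forms reduce to one argsort plus fancy indexing; a self-contained sketch with the question's data (indexing columns via z[:, order] avoids the double transpose):

```python
import numpy as np

row_keys = [10, 11, 5, 6]
z = np.array([[2.77, 11.0, 4.1, 7.2],
              [3.7, 2.2, 1.1, 0.5],
              [2.5, 3.5, 5.0, 9.0],
              [4.3, 2.2, 5.1, 6.1]])

# argsort returns the permutation that sorts the keys; fancy indexing
# applies that permutation to rows (axis 0) or columns (axis 1).
order = np.argsort(row_keys)   # the permutation [2, 3, 0, 1]
z_rows = z[order]              # rows reordered by key
z_cols = z[:, order]           # columns reordered by the same keys
```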

Creating a numpy array of 3D coordinates from three 1D arrays, first index changing fastest

similar to the question here
I have three arbitrary 1D arrays, for example:
x_p = np.array((0.0,1.1, 2.2, 3.3, 4.4))
y_p = np.array((5.5,6.6,7.7))
z_p = np.array((8.8, 9.9))
I need
points = np.array([[0.0, 5.5, 8.8],
                   [1.1, 5.5, 8.8],
                   [2.2, 5.5, 8.8],
                   ...
                   [4.4, 7.7, 9.9]])
1) The first index should change fastest.
2) The points are float coordinates, not integer indices.
3) I noticed that from version 1.7.0, numpy.meshgrid changed behavior with the default indexing='xy', and I need to use
np.vstack(np.meshgrid(x_p, y_p, z_p, indexing='ij')).reshape(3,-1).T
but that gives the result with the last index changing fastest, which is not what I want. (It was mentioned that meshgrid supports more than 2 dimensions only from 1.7.0; I didn't check.)
I found this with some trial and error.
I think the ij vs. xy indexing has been in meshgrid forever (it's the sparse parameter that's newer). It just affects the order of the 3 returned elements.
To get x_p varying fastest I put it last in the argument list, and then used ::-1 to reverse the column order at the end.
I used stack to join the arrays on a new axis at the end, so I don't need to transpose. But the reshapes and transposes are all cheap (time-wise), so they can be used in any combination that works and is understandable.
In [100]: np.stack(np.meshgrid(z_p, y_p, x_p, indexing='ij'),3).reshape(-1,3)[:,::-1]
Out[100]:
array([[ 0. , 5.5, 8.8],
[ 1.1, 5.5, 8.8],
[ 2.2, 5.5, 8.8],
[ 3.3, 5.5, 8.8],
[ 4.4, 5.5, 8.8],
[ 0. , 6.6, 8.8],
...
[ 2.2, 7.7, 9.9],
[ 3.3, 7.7, 9.9],
[ 4.4, 7.7, 9.9]])
You might permute axes with np.transpose to achieve the output in that desired format -
np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3,1,2,0).reshape(-1,3)
Sample output -
In [104]: np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3,1,2,0).reshape(-1,3)
Out[104]:
array([[ 0. , 5.5, 8.8],
[ 1.1, 5.5, 8.8],
[ 2.2, 5.5, 8.8],
[ 3.3, 5.5, 8.8],
[ 4.4, 5.5, 8.8],
[ 0. , 6.6, 8.8],
[ 1.1, 6.6, 8.8],
[ 2.2, 6.6, 8.8],
[ 3.3, 6.6, 8.8],
[ 4.4, 6.6, 8.8],
[ 0. , 7.7, 8.8],
[ 1.1, 7.7, 8.8],
....
[ 3.3, 7.7, 9.9],
[ 4.4, 7.7, 9.9]])
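For comparison, the same ordering can be spelled out with itertools.product, which varies its last argument fastest. This sketch is slower than the meshgrid versions for large inputs, but easy to verify by eye:

```python
import numpy as np
from itertools import product

x_p = np.array((0.0, 1.1, 2.2, 3.3, 4.4))
y_p = np.array((5.5, 6.6, 7.7))
z_p = np.array((8.8, 9.9))

# product varies its last argument fastest, so pass (z_p, y_p, x_p)
# and reverse each tuple to get x_p changing fastest in the output.
points = np.array([(x, y, z) for z, y, x in product(z_p, y_p, x_p)])
```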

List conversion

I am looking for a way to convert a list like this
[[1.1, 1.2, 1.3, 1.4, 1.5],
[2.1, 2.2, 2.3, 2.4, 2.5],
[3.1, 3.2, 3.3, 3.4, 3.5],
[4.1, 4.2, 4.3, 4.4, 4.5],
[5.1, 5.2, 5.3, 5.4, 5.5]]
to something like this
[[(1.1,1.2),(1.2,1.3),(1.3,1.4),(1.4,1.5)],
[(2.1,2.2),(2.2,2.3),(2.3,2.4),(2.4,2.5)]
.........................................
The following line should do it:
[list(zip(row, row[1:])) for row in m]
where m is your initial 2-dimensional list
UPDATE for second question in comment
You have to transpose (= exchange columns with rows) your 2-dimensional list. The Pythonic way to achieve a transposition of m is zip(*m):
[list(zip(column, column[1:])) for column in zip(*m)]
In response to a further comment from the questioner, two answers:
# Original grid
grid = [[1.1, 1.2, 1.3, 1.4, 1.5],
        [2.1, 2.2, 2.3, 2.4, 2.5],
        [3.1, 3.2, 3.3, 3.4, 3.5],
        [4.1, 4.2, 4.3, 4.4, 4.5],
        [5.1, 5.2, 5.3, 5.4, 5.5]]
# Window function to return sequence of pairs.
def window(row):
    return [(row[i], row[i + 1]) for i in range(len(row) - 1)]
ORIGINAL QUESTION:
# Print sequences of pairs for grid
print([window(y) for y in grid])
UPDATED QUESTION:
# Take the nth item from every row to get that column.
def column(grid, columnNumber):
    return [row[columnNumber] for row in grid]
# Transpose grid to turn it into columns.
def transpose(grid):
    # Assume all rows are the same length.
    numColumns = len(grid[0])
    return [column(grid, columnI) for columnI in range(numColumns)]
# Return windowed pairs for transposed matrix.
print([window(y) for y in transpose(grid)])
Another version would be to use lambda and map:
list(map(lambda x: list(zip(x, x[1:])), m))
where m is your matrix of choice. (In Python 3, map and zip return iterators, hence the list calls.)
List comprehensions provide a concise way to create lists:
http://docs.python.org/tutorial/datastructures.html#list-comprehensions
[[(a[i], a[i+1]) for i in range(len(a)-1)] for a in A]
