Find median for each element in list - python

I have a large list of data, between 1000 and 10000 elements. Now I want to filter out some peak values with the help of the median function.
import statistics

# example list with just 10 elements
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
# list of medians calculated from 3 elements
my_median_list = []
for i in range(len(my_list)):
    if i == 0:
        my_median_list.append(statistics.median([my_list[0], my_list[1], my_list[2]]))
    elif i == (len(my_list) - 1):
        my_median_list.append(statistics.median([my_list[-1], my_list[-2], my_list[-3]]))
    else:
        my_median_list.append(statistics.median([my_list[i-1], my_list[i], my_list[i+1]]))
print(my_median_list)
# [4.7, 4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6, 4.6]
This works so far, but I think it looks ugly and is maybe inefficient. Is there a way with statistics or NumPy to do it faster? Or another solution? Also, I'm looking for a solution where I can pass an argument for how many elements the median is calculated from. In my example I always used the median of 3 elements, but with my real data I want to play with the median setting and then maybe use the median of 10 elements.

You are calculating too many values, since:
my_median_list.append(statistics.median([my_list[i-1], my_list[i], my_list[i+1]]))
and
my_median_list.append(statistics.median([my_list[0], my_list[1], my_list[2]]))
are the same when i == 1. The same duplication happens at the end, so you get one extra value at each end.
It's easier and less error-prone to do this with zip(), which builds the three-element tuples for you:
from statistics import median
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
[median(l) for l in zip(my_list, my_list[1:], my_list[2:])]
# [4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6]
For groups of arbitrary size, collections.deque is super handy because you can set a max size. Then you just keep pushing items on one end and it removes items off the other to maintain the size. Here's a generator example that takes your group size as n:
from statistics import median
from collections import deque
def rolling_median(l, n):
    d = deque(l[0:n], n)
    yield median(d)
    for num in l[n:]:
        d.append(num)
        yield median(d)
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
list(rolling_median(my_list, 3))
# [4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6]
list(rolling_median(my_list, 5))
# [4.7, 5.1, 5.1, 4.3, 5.0, 4.6]
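If NumPy is an option, here is a vectorized sketch (my addition, not part of the answer above; it assumes NumPy >= 1.20 for sliding_window_view) that computes all the window medians without a Python-level loop:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
# view the data as overlapping windows of length n, then take the
# median across each window in a single call
n = 3
print(np.median(sliding_window_view(np.asarray(my_list), n), axis=1))
# [4.7 4.7 5.1 5.6 5.6 4.3 4.3 4.6]
For lists of 1000 to 10000 elements this avoids the per-window Python overhead entirely.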

Related

Reduce python list size while preserving information

I have multiple long lists in my program. Each list has approximately 3000 float values, and there are around 100 such lists.
I want to reduce the size of each list to, say, 500, while preserving the information in the original list. I know that it is not possible to completely preserve the information, but I would like the elements of the original list to contribute to the values of the smaller list.
Let's say we have the following list and want to shorten it to lists of size 3 or 4.
myList = [[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
          [7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1],
          [4.7, 2.6, 5.6, 7.4, 3.7, 7.7, 3.5, 6.5, 7.2, 4.1],
          [7.3, 7.3, 4.1, 6.6, 2.2, 3.9, 1.6, 3.0, 2.3, 4.6],
          [4.7, 2.3, 5.7, 6.4, 3.4, 6.8, 7.2, 6.9, 8.4, 7.1]]
Is there some way to do this? Maybe by averaging of some sort?
You can do something like this:
from statistics import mean, stdev
myList = [[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1], [2.3, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1]]
shorten_list = [[max(i)-min(i), mean(i), round(stdev(i), 5)] for i in myList]
You can also include information such as the sum of the list or the mode. If you just want to take the mean of each list within your list, you can just do this:
from statistics import mean
mean_list = list(map(mean, myList))
Batching may work. Have a look at this question: How do I split a list into equally-sized chunks? It shows how to convert the list into equal-sized batches, which you can then average (see the sketch below).
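As a minimal sketch (my own illustration, not from the linked question, with a hypothetical shrink helper), chunk each list and keep the mean of each chunk; with chunk_size = 6 this reduces 3000 values to 500:
from statistics import mean

def shrink(lst, chunk_size):
    # average each consecutive chunk of chunk_size elements
    return [mean(lst[i:i + chunk_size]) for i in range(0, len(lst), chunk_size)]

myList = [4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1]
print(shrink(myList, 2))  # [3.3, 5.75, 5.45, 4.0, 5.75] (up to float rounding)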
Alternatively, you can reduce the dimension of the list using a max-pooling layer:
import numpy as np
from keras.models import Sequential
from keras.layers import MaxPooling2D

image = np.array([[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
                  [7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1],
                  [4.7, 2.6, 5.6, 7.4, 3.7, 7.7, 3.5, 6.5, 7.2, 4.1],
                  [7.3, 7.3, 4.1, 6.6, 2.2, 3.9, 1.6, 3.0, 2.3, 4.6],
                  [4.7, 2.3, 5.7, 6.4, 3.4, 6.8, 7.2, 6.9, 8.4, 7.1]])
image = image.reshape(1, 5, 10, 1)
model = Sequential([MaxPooling2D(pool_size=(1, 10), strides=1)])
output = model.predict(image)
print(output)
This gives the output:
[[[[7.7]]
  [[7.4]]
  [[7.7]]
  [[7.3]]
  [[8.4]]]]
If you want to change the output size, you can change the pool size.
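If pulling in Keras just for this feels heavy, the same max pooling can be done in plain NumPy (my addition, assuming each row length is a multiple of the pool size):
import numpy as np

image = np.array([[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
                  [7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1]])
# split each row into windows of 10 elements and take the max of each window
pooled = image.reshape(image.shape[0], -1, 10).max(axis=2)
print(pooled)  # [[7.7] [7.4]] as a 2x1 array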

algorithm on how to put as many numbers of a list into different capacity of buckets

I am trying to figure out an algorithm to pack as many numbers as possible into a list of buckets with different capacities. It is aimed at a problem where a group of runners, who run different mileages, cover as many Ragnar relay legs as possible without skipping a leg.
The thirty-six numbers (leg lengths in miles) below can be repeated many times, and a solution can start with any leg in the list.
legs = [3.3, 4.2, 5.2, 3, 2.7, 4,
        5.3, 4.5, 3, 5.8, 3.3, 4.9,
        3.1, 3.2, 4, 3.5, 4.9, 2.3,
        3.2, 4.6, 4.5, 4, 5.3, 5.9,
        2.8, 1.9, 2.1, 3, 2.5, 5.6,
        1.3, 4.6, 1.5, 1.2, 4.1, 8.1]
A list of runs:
runs = [3.2, 12.3, 5.2, 2.9, 2.9, 5.5]
It becomes an optimization problem that tries to put as many numbers as possible into buckets of different capacities, if we think of the legs as the list of numbers and the runs as the buckets.
Given a start leg (1 in this case), find out how many legs can be covered. Below is a sample output starting with leg 1:
Total run mileage = 32.0
Total legs covered = 7 (L1, L2, L3, L4, L5, L6, L7)
Total mileage used = 27.7
Total mileage wasted = 4.3
{'Total': 3.2, 'Reminder': 0.2, 'Index': 0, 'L4': 3}
{'Total': 12.3, 'Reminder': 0.8, 'Index': 1, 'L1': 3.3, 'L2': 4.2, 'L6': 4}
{'Total': 5.2, 'Reminder': 0.0, 'Index': 2, 'L3': 5.2}
{'Total': 2.9, 'Reminder': 0.2, 'Index': 3, 'L5': 2.7}
{'Total': 2.9, 'Reminder': 2.9, 'Index': 4}
{'Total': 5.5, 'Reminder': 0.2, 'Index': 5, 'L7': 5.3}
I think this can be written and solved as an explicit optimization problem (to be precise, an integer programming model):
Input data:
  L[i], r[j]   (legs and runs)
Binary variables:
  assign[i,j]  (runner j does leg i)
  covered[i]   (leg i is covered by a runner)
Model:
  max sum(i, covered[i])                 (objective)
  sum(i, L[i]*assign[i,j]) <= r[j]       (runner capacity)
  covered[i] <= sum(j, assign[i,j])      (leg is covered)
  covered[i] <= covered[i-1]             (ordering of legs)
This is not code but the mathematical model; the code will depend on the MIP solver (and modeling tool) being used. When I solve this model I get:
----     52 PARAMETER report  results

          run1    run2    run3    run4    run5    run6
leg1              3.300
leg2              4.200
leg3                      5.200
leg4      3.000
leg5                              2.700
leg6              4.000
leg7                                              5.300
total     3.000  11.500   5.200   2.700           5.300
runLen    3.200  12.300   5.200   2.900   2.900   5.500
Note: I basically printed assign[i,j]*covered[i]*L[i] here. The reason is that some variables assign[i,j] may be turned on while the corresponding covered[i] = 0, so just printing assign may be a bit misleading.
A sample implementation using cvxpy can look like:
import cvxpy as cp

legs = [3.3, 4.2, 5.2, 3, 2.7, 4,
        5.3, 4.5, 3, 5.8, 3.3, 4.9,
        3.1, 3.2, 4, 3.5, 4.9, 2.3,
        3.2, 4.6, 4.5, 4, 5.3, 5.9,
        2.8, 1.9, 2.1, 3, 2.5, 5.6,
        1.3, 4.6, 1.5, 1.2, 4.1, 8.1]
runs = [3.2, 12.3, 5.2, 2.9, 2.9, 5.5]
n = len(legs)
m = len(runs)

assign = cp.Variable((n, m), boolean=True)   # assign[i,j]: runner j does leg i
covered = cp.Variable(n, boolean=True)       # covered[i]: leg i is covered

prob = cp.Problem(cp.Maximize(cp.sum(covered)),
                  [assign.T @ legs <= runs,            # runner capacity
                   covered <= cp.sum(assign, axis=1),  # leg is covered
                   covered[1:n] <= covered[0:(n-1)]])  # ordering of legs
prob.solve()

# reporting of solution
L = round(prob.value)
result = assign.value[0:L, ]
for i in range(L):
    for j in range(m):
        result[i, j] *= covered.value[i] * legs[i]
print(result)
I just transcribed the mathematical model here.

How to combine these two numpy arrays?

How would I combine these two arrays:
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
                [4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
Into something like this:
xy = [[0.1, [1.0, 1.1, 1.2, 1.3]], [0.2, [2.0, 2.1, 2.2, 2.3]...
Thank you for the assistance!
Someone suggested I post the code I have tried, and I realized I had forgotten to:
xy = np.array(list(zip(x, y)))
This is my current solution; however, it is extremely inefficient.
You can use zip to combine them:
[[a,b] for a,b in zip(y,x)]
Out:
[[array([0.1]), array([1. , 1.1, 1.2, 1.3])],
 [array([0.2]), array([2. , 2.1, 2.2, 2.3])],
 [array([0.3]), array([3. , 3.1, 3.2, 3.3])],
 [array([0.4]), array([4. , 4.1, 4.2, 4.3])],
 [array([0.5]), array([5. , 5.1, 5.2, 5.3])]]
A pure numpy solution will be much faster than a list comprehension for large arrays.
I do have to say your use case makes no sense, as there is no logic in putting these arrays into a single data structure, and I believe you should re-check your design.
Like @user2357112 supports Monica was subtly implying, this is very likely an XY problem. See if this is really what you are trying to solve, and not something else. If you want something else, try asking about that.
I strongly suggest checking what you want to do before moving on, as otherwise you will put yourself in a place with bad design.
That aside, here's a solution:
import numpy as np
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
                [4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
xy = np.hstack([y, x])
print(xy)
prints
[[0.1 1.  1.1 1.2 1.3]
 [0.2 2.  2.1 2.2 2.3]
 [0.3 3.  3.1 3.2 3.3]
 [0.4 4.  4.1 4.2 4.3]
 [0.5 5.  5.1 5.2 5.3]]
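np.column_stack is an equivalent alternative (my addition, not from the answer above) that reads a bit more literally as "prepend y as a column":
import numpy as np

x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3]])
y = np.asarray([[0.1], [0.2]])
# y becomes the first column, exactly as with np.hstack([y, x])
print(np.column_stack([y, x]))
# [[0.1 1.  1.1 1.2 1.3]
#  [0.2 2.  2.1 2.2 2.3]]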

How to calculate a weighted median for every subarray in a 2D matrix?

This question is a new one (I've already looked into similar questions and did not find what I need). Therefore:
What is the most efficient way to apply a weighted median to every subarray of a 2D numpy matrix? (No extra frameworks, but pure numpy if possible.)
Data = np.asarray([[ 1.1,  7.8,  3.3,  4.9],
                   [ 6.1,  9.8,  5.3,  7.9],
                   [ 4.1,  4.8,  3.3,  7.1],
                   ...
                   [ 1.1,  7.4,  3.1,  4.9],
                   [ 7.1,  3.8,  7.3,  8.1],
                   [19.1,  2.8,  3.2,  1.1]])
weights = [0.64, 0.79, 0.91, 0]
Note: the answers to the other questions only show a 1D problem. This problem has to deal with 1,000,000 subarrays efficiently.
Using the Data provided by @JoonyoungPark, you can use a list comprehension:
[np.median(i*weights) for i in Data]
[1.8535000000000001,
4.3635,
2.8135,
1.7625000000000002,
3.7729999999999997,
2.5620000000000003]
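For 1,000,000 subarrays, a fully vectorized form of the same computation (my addition; note that, like the comprehension above, it takes the plain median of the weighted values rather than a true weighted median) broadcasts the weights over all rows at once:
import numpy as np

Data = np.asarray([[1.1, 7.8, 3.3, 4.9],
                   [6.1, 9.8, 5.3, 7.9]])
weights = np.array([0.64, 0.79, 0.91, 0])
# multiply every row by the weights, then take row-wise medians in one call
print(np.median(Data * weights, axis=1))  # [1.8535 4.3635]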

List conversion

I am looking for a way to convert a list like this
[[1.1, 1.2, 1.3, 1.4, 1.5],
 [2.1, 2.2, 2.3, 2.4, 2.5],
 [3.1, 3.2, 3.3, 3.4, 3.5],
 [4.1, 4.2, 4.3, 4.4, 4.5],
 [5.1, 5.2, 5.3, 5.4, 5.5]]
to something like this
[[(1.1, 1.2), (1.2, 1.3), (1.3, 1.4), (1.4, 1.5)],
 [(2.1, 2.2), (2.2, 2.3), (2.3, 2.4), (2.4, 2.5)],
 ...]
The following line should do it:
[list(zip(row, row[1:])) for row in m]
where m is your initial 2-dimensional list.
UPDATE for second question in comment
You have to transpose (i.e. exchange columns with rows) your 2-dimensional list. The Python way to achieve a transposition of m is zip(*m):
[list(zip(column, column[1:])) for column in zip(*m)]
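As a quick check (my addition, with a smaller 2x3 example), zip(*m) turns the rows into columns, and the same pairing comprehension then runs over each column:
m = [[1.1, 1.2, 1.3],
     [2.1, 2.2, 2.3]]
# the columns are (1.1, 2.1), (1.2, 2.2), (1.3, 2.3)
print([list(zip(column, column[1:])) for column in zip(*m)])
# [[(1.1, 2.1)], [(1.2, 2.2)], [(1.3, 2.3)]]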
In response to a further comment from the questioner, two answers:
# Original grid
grid = [[1.1, 1.2, 1.3, 1.4, 1.5],
        [2.1, 2.2, 2.3, 2.4, 2.5],
        [3.1, 3.2, 3.3, 3.4, 3.5],
        [4.1, 4.2, 4.3, 4.4, 4.5],
        [5.1, 5.2, 5.3, 5.4, 5.5]]

# Window function to return a sequence of pairs.
def window(row):
    return [(row[i], row[i + 1]) for i in range(len(row) - 1)]

ORIGINAL QUESTION:
# Print sequences of pairs for the grid.
print([window(y) for y in grid])

UPDATED QUESTION:
# Take the nth item from every row to get that column.
def column(grid, columnNumber):
    return [row[columnNumber] for row in grid]

# Transpose the grid to turn it into columns.
def transpose(grid):
    # Assume all rows are the same length.
    numColumns = len(grid[0])
    return [column(grid, columnI) for columnI in range(numColumns)]

# Return windowed pairs for the transposed matrix.
print([window(y) for y in transpose(grid)])
Another version would be to use lambda and map:
list(map(lambda x: list(zip(x, x[1:])), m))
where m is your matrix of choice.
List comprehensions provide a concise way to create lists:
http://docs.python.org/tutorial/datastructures.html#list-comprehensions
[[(a[i], a[i+1]) for i in range(len(a)-1)] for a in A]
