Calculate a discrete mean in python

I have a set of data points, and I wrote a program that takes every n points from the set, sums them, and puts each sum in a new list. With that list I can make simple bar plots.
Now I'd like to calculate a discrete mean for my new list.
The formula I'm using is this: t_av=(1/nsmp) Sum[N_i*t_i,{i,n_l,n_u}]
Basically I have nsmp bins, bin i holds the count N_i, t_i is the time of bin i, n_l is the first bin, and n_u is the last bin.
So if my list is [373, 156, 73, 27, 16],
I have 5 bins, and t_av = 1/5 (373*1+156*2+73*3+27*4+16*5) = 218.4
Now I have run into a problem. I tried with this:
for i in range(0,len(L)):
    sr_vr = L[i]*i
tsr=sr_vr/nsmp
Here nsmp is the number of bins, which I can set, and L is already calculated. Since range goes through 0,1,2,3,4, I won't get the correct answer, because my first bin is multiplied by 0. If I use range(1,len(L)+1) instead, I get IndexError: list index out of range: L[i]*i then multiplies the second element (index 1) by 1, and on the last pass i equals len(L), which is past the end of the list.
How do I correct this?

You can just use L[i]*(i+1) (assuming you stick with zero-based indexing).
However you can also use enumerate() to loop over indices and values together, and you can even provide 1 as the second argument so that the indexing starts at 1 instead of 0.
Here is how I would write this:
tsr = sum(x * i for i, x in enumerate(L, 1)) / len(L)
Note that on Python 2.x, if L contains only integers, this performs integer division. To get a float, convert one of the operands to a float (for example float(len(L))), or use from __future__ import division.
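As a quick check of the one-liner against the worked example in the question, here is a minimal sketch for Python 3. The function name discrete_mean is made up for illustration, and it divides by the number of bins, as the question's formula does.

```python
def discrete_mean(counts):
    """Sum of count * (1-based bin number), divided by the number of bins."""
    # enumerate(counts, 1) pairs each count with its 1-based bin number
    weighted = sum(n * t for t, n in enumerate(counts, 1))
    return weighted / len(counts)  # true division in Python 3


L = [373, 156, 73, 27, 16]
tsr = discrete_mean(L)
print(tsr)
```

The weighted sum here is 373 + 312 + 219 + 108 + 80 = 1092, so the result matches the 218.4 from the question.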

Related

Random partitioning given array with given bin sizes

How to randomly partition given array with given bin sizes?
Is there a built-in function for that? For example, I want something like
function(12,(2,3,3,2,2)) to output five partitions of the numbers from 1 to 12 (or 0 to 11, doesn't matter). So the output may be a list like [[3,4],[7,8,11],[12,1,2],[5,9],[6,10]] (or some other efficient data structure). The first argument of the function may be just a number n, in which case it will consider np.arange(n) as the input; otherwise it may be any other ndarray.
Of course we can randomly permute the list and then pick the first 2, next 3, next 3, next 2 and last 2 elements. But does something more efficient exist?
numpy.partition() has a different meaning (it performs a step in quicksort), and I couldn't find any such function in the numpy.random submodule either.

Try this following solution:
import numpy as np

def func(a, b):
    # a is an integer, b is a sequence of bin sizes that sum to a
    indx = np.random.permutation(a) # randomly ordered indices 0..a-1
    b = np.asarray(b)
    # split at the cumulative bin boundaries; returns a list of arrays
    return np.split(indx, b.cumsum()[:-1])
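For comparison, the shuffle-then-slice idea mentioned in the question can be sketched with only the standard library. The name random_partition and its seed parameter are made up for illustration and are not part of NumPy; the sketch assumes the bin sizes add up to n.

```python
import random
from itertools import accumulate


def random_partition(n, sizes, seed=None):
    """Shuffle range(n) and cut it into consecutive chunks of the given sizes."""
    if sum(sizes) != n:
        raise ValueError("bin sizes must add up to n")
    rng = random.Random(seed)  # seed only makes runs repeatable
    items = list(range(n))
    rng.shuffle(items)
    ends = list(accumulate(sizes))  # cumulative end position of each bin
    starts = [0] + ends[:-1]
    return [items[s:e] for s, e in zip(starts, ends)]


parts = random_partition(12, (2, 3, 3, 2, 2), seed=0)
print(parts)
```

Every element lands in exactly one bin, and the bin lengths follow the requested sizes in order.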

Fredo and Array Update in python

I will have an interview with a company that uses a platform like hackerearth.com. I don't know how to approach it or how to write the code properly. Could you help me with the following example?
This example is from hackerearth.com, but I don't know whether I should account for the constraints in the code. Can I use a package like NumPy, or should I only use basic calculations that I write myself? Could you check my answer and tell me what is wrong with it? Thank you so much.
Input Format:
First line of input consists of an integer N denoting the number of elements in the array A.
Second line consists of N space separated integers denoting the array elements.
Output Format:
The only line of output consists of the value of x.
Input Constraints:
1<N<100
1<A[i]<100
Explanation:
The initial sum of the array is 1+2+3+4+5=15.
When we update all elements to 4, the sum of the array becomes 20, which is greater than 15.
Note that if we had updated the array elements to 3, the sum would be 15, which is not greater than 15. So, 4 is the minimum value to which the array elements need to be updated.
# Write your code here
import numpy as np
A= [1, 2, 3,4,5]
for i in range(1, max(A)+1):
    old = sum(A)
    new = sum(i*np.ones(len(A)))
    diff = new-old
    if diff>0:
        print(i)
        break

Well this isn't Code Review stack exchange, but:
You don't say how to calculate x. It seems to have something to do with finding an average value, but no one can judge your code without knowing what it's trying to do. A web search suggests it is this:
Fredo is assigned a new task today. He is given an array A containing N integers. His task is to update all elements of the array to some minimum value x, that is, A[i] = x for every i, such that the sum of this new array is strictly greater than the sum of the initial array. Note that x should be as small as possible such that the sum of the new array is greater than the sum of the initial array.
Given that the task starts by accepting input, it's important that your program does this part.
N = int(input()) # you can put a prompt string in here, but may conflict with limited output
A = list(map(int,input().split()))
# might need input checks
# might need range checks
# might check that A has exactly N values
You don't need to recalculate old = sum(A) every time around your search loop.
The calculation of new doesn't need a sum at all: it's just new = i * len(A).
There's no point in checking values of i at or below min(A).
Your search will fail if all values of A are the same (try it), because you never look above max(A).
These remarks apply to your approach; a more efficient search would be a binary chop, and there is also a mathematical way to go straight to the answer from sum(A) without any searching:
x = sum(A) // len(A) + 1
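To see that the closed form agrees with a straightforward search, here is a small sketch. Both function names are made up for illustration; the search starts at min(a) because any valid x must exceed the average, which is never below the minimum.

```python
def min_update_value(a):
    """Smallest integer x with x * len(a) > sum(a), via the closed form."""
    return sum(a) // len(a) + 1


def min_update_value_search(a):
    """Same value found by linear search, used only as a cross-check."""
    total = sum(a)
    x = min(a)
    while x * len(a) <= total:
        x += 1
    return x


for sample in ([1, 2, 3, 4, 5], [7, 7, 7], [2, 99]):
    print(sample, min_update_value(sample), min_update_value_search(sample))
```

The second sample covers the case where all values are equal, which is where a search capped at max(A) finds nothing.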

You don't need numpy or looping for this. Get the average of the array elements, then get the next higher integer from this.
from math import floor

N = 5
A = [1, 2, 3, 4, 5]
total = sum(A)
avg = total/N # not checking for zero-divide because conditions say N > 1
x = floor(avg + 1)
print(x)
Adding 1 is necessary to make the new sum greater than the original sum when the average is an exact integer (e.g. 15/5 == 3).

How to loop through an xarray and calculating using an index in python

I have a data variable (sst) in an xarray (nino6). First I use enumerate to give each value of the data variable an index, then I want to do a calculation with the values of the data variable using that index. The code below calculates with the indices themselves instead of the data values, but I wanted to show you what I tried.
How can I loop over an index while actually calculating with the values?
for i, entry in enumerate(nino6['sst']):
    a=((i-1)+i+(i+1))/3
    ssta.append(a)
I apologise, as my question is very likely really simple (I just started programming), but I searched unsuccessfully here and on YouTube.

If you are trying to get the average of every 3 adjacent numbers in sst, you do it like this:
lst = nino6['sst']
ssta = []
for i in range(1,len(lst) - 1):
    a = (lst[i-1] + lst[i] + lst[i+1])/3
    ssta.append(a)
Notice that in this implementation, the length of ssta will be smaller than the length of sst by 2 because the first and last numbers do not have flanking numbers. You can have other variations, where you just get the average of two numbers for the first and last numbers.
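The edge-handling variation mentioned above could look roughly like this. It is a sketch on a plain Python list with made-up numbers standing in for nino6['sst'], and the function name moving_average_3 is hypothetical.

```python
def moving_average_3(values):
    """Centered 3-point mean; the end points average the neighbours that exist."""
    n = len(values)
    out = []
    for i in range(n):
        # the slice holds 2 values at either end and 3 values everywhere else
        window = values[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


sst = [1.0, 2.0, 6.0, 4.0]  # made-up sample values
ssta = moving_average_3(sst)
print(ssta)
```

With this variant ssta has the same length as the input, at the cost of the first and last entries being two-point averages.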

Coding an iterated sum of sums in python

For alpha and k fixed integers with i < k also fixed, I am trying to encode a sum of the form
where all the x and y variables are known beforehand. (this is essentially the alpha coordinate of a big iterated matrix-vector multiplication)
For a normal sum varying over one index I usually create a 1D array A, set A[i] equal to the i-indexed entry of the sum, and then use sum(A). In the above instance, however, the entries of the innermost sum depend on the indices in the previous sum, which in turn depend on the indices in the sum before that, all the way back out to the first sum, which prevents me from using this tack in a straightforward manner.
I tried making a 2D array B of appropriate length and width and setting the 0 row to be the entries in the innermost sum, then the 1 row as the entries in the next sum times sum(np.transpose(B),0) and so on, but the value of the first sum (of row 0) needs to vary with each entry in row 1 since that sum still has indices dependent on our position in row 1, so on and so forth all the way up to sum k-i.
A sum that allows for a 'variable' filled in by each position of the array it's summing through would thus do the trick, but I can't find anything along these lines in numpy, and my attempts to hack one together have so far failed. My intuition says there is a solution that involves summing along the axes of a (k-i)-dimensional array, but I haven't been able to make this precise yet. Any assistance is greatly appreciated.

One simple attempt to hard-code something like this would be:
for j0 in range(0,n0):
    for j1 in range(0,n1):
        ....
Edit: (a vectorized version)
You could do something like this: (I didn't test it)
temp = np.ones(n[k-i])
for j in range(0,k-i):
    temp = x[:n[k-i-1-j],:n[k-i-j]].T@(y[:n[k-i-j]]*temp)
result = x[alpha,:n[0]]@(y[:n[0]]*temp)
The basic idea is that you try to press it into a matrix-vector form. (note that this is python3 syntax)
Edit: You should note that you need to change the "k-1" to where the innermost sum is (I just did it for all sums up to index k-i)

This is 95% identical to @sehigle's answer, but includes a generic N vector:
import numpy as np

def nested_sum(XX, Y, N, alpha):
    intermediate = np.ones(N[-1], dtype=XX.dtype)
    for n1, n2 in zip(N[-2::-1], N[:0:-1]):
        intermediate = np.sum(XX[:n1, :n2] * Y[:n2] * intermediate, axis=1)
    return np.sum(XX[alpha, :N[0]] * Y[:N[0]] * intermediate)
Similarly, I have no knowledge of the expression, so I'm not sure how to build appropriate tests. But it runs :\
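One way to build a test is to compare the inside-out accumulation against an explicit nested loop on tiny inputs. The sketch below uses plain lists and assumes the sum has the shape that nested_sum computes, namely a sum over j0 < N[0] of X[alpha][j0] * Y[j0] times a sum over j1 < N[1] of X[j0][j1] * Y[j1], and so on. That shape is read off the answer code, not from the original expression, and both function names are made up.

```python
def nested_sum_loops(X, Y, N, alpha):
    """Explicit recursion over the nested indices j0 < N[0], j1 < N[1], ..."""
    def inner(prev, level):
        if level == len(N):
            return 1  # empty product below the innermost sum
        return sum(X[prev][j] * Y[j] * inner(j, level + 1)
                   for j in range(N[level]))
    return inner(alpha, 0)


def nested_sum_backward(X, Y, N, alpha):
    """Same value, accumulated from the innermost sum outward."""
    acc = [1] * N[-1]
    for n1, n2 in zip(N[-2::-1], N[:0:-1]):
        # acc[r] becomes the value of all inner sums given previous index r
        acc = [sum(X[r][j] * Y[j] * acc[j] for j in range(n2))
               for r in range(n1)]
    return sum(X[alpha][j] * Y[j] * acc[j] for j in range(N[0]))


X = [[1, 2], [3, 4]]
Y = [1, 2]
print(nested_sum_loops(X, Y, [2, 2], 0), nested_sum_backward(X, Y, [2, 2], 0))
```

For this input the value can be checked by hand: 1*1*(1*1 + 2*2) + 2*2*(3*1 + 4*2) = 5 + 44 = 49.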

Plotting occurrences for values higher than a threshold in Python

I have a non-uniform array 'A'.
A = [1,3,2,4,..., 12002, 13242, ...]
I want to explore how many elements from the array 'A' have values above certain threshold values.
For example, there are 1000 elements that have values larger than 1200, so I want to plot the number of elements that have values larger than 1200. Also, there are other 1500 elements that have values larger than 110 (this includes the 1000 elements, whose values are larger than 1200).
This is a rather large data set, so I would not like to omit any kind of information.
Then, I want to plot the number of elements 'N' above a value A vs. Log (A), i.e.
'Log N(> A)' vs. 'Log (A)'.
I thought of binning the data, but I was rather unsuccessful.
I haven't done that much statistics in python, so I was wondering if there is a good way to plot this data?
Thanks in advance.

Let me take another crack at what we have:
A = [1, 3, 2, 4, ..., 12002, 13242, ...]
# This is a list of 12,000 zeros.
num_above = [0]*(12000)
# Notice how we can re-write this for-loop!
for i in A:
    num_above = [val+1 if key < i else val for key,val in enumerate(num_above)]
I believe this is what you want. The final list num_above will be such that num_above[5] equals the number of elements in A that are above 5.
Explanation:
That last line is where all the magic happens. It goes through the elements of A (i) and adds one to all the elements of num_above whose index is less than i.
The enumerate(num_above) call produces an iterator of (index, value) tuples for the elements of num_above. For illustration, enumerate(A) would yield (0,1) -> (1,3) -> (2,2) -> (3,4) -> ...
Also, the num_above = [x for y in List] statement is known as a list comprehension, and is a really powerful tool in Python.
Improvements: I see you already modified your question to include these changes, but I think they were important.
I removed the numpy dependency. When possible, removing dependencies reduces the complexity of projects, especially larger projects.
I also removed the original list A. This could be replaced with something that was basically like A = range(12000).
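The loop above rebuilds a 12,000-element list once per element of A. If only a handful of thresholds are needed, sorting once and using a binary search per threshold is a cheaper alternative. This is a sketch with made-up sample data, and count_above is a hypothetical helper name.

```python
from bisect import bisect_right


def count_above(data, thresholds):
    """For each threshold t, count the elements of data strictly greater than t."""
    ordered = sorted(data)
    n = len(ordered)
    # bisect_right returns how many elements are <= t in the sorted list
    return [n - bisect_right(ordered, t) for t in thresholds]


A = [1, 3, 2, 4, 120, 1300, 12002, 13242]  # made-up sample
thresholds = [1, 10, 110, 1200]
counts = count_above(A, thresholds)
print(list(zip(thresholds, counts)))
```

The resulting (threshold, count) pairs are what would go on the two axes of the log-log plot, for example through matplotlib's plt.loglog(thresholds, counts).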
