Random partitioning given array with given bin sizes - python

How to randomly partition given array with given bin sizes?
Is there a built-in function for that? For example, I want something like
function(12,(2,3,3,2,2)) to output five partitions of the numbers from 1 to 12 (or 0 to 11, it doesn't matter). So the output may be a list like [[3,4],[7,8,11],[12,1,2],[5,9],[6,10]] (or some other efficient data structure). The first argument of the function may be just a number n, in which case it will consider np.arange(n) as the input; otherwise it may be any other ndarray.
Of course we can randomly permute the list and then pick the first 2, next 3, next 3, next 2 and last 2 elements. But does there exist something more efficient?
numpy.partition() function has a different meaning, it performs a step in quicksort, and I also couldn't find any such function in the numpy.random submodule.

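Try the following solution:
import numpy as np

def func(a, b):
    # a is an integer and b is a Python list of bin sizes that sum to a
    indx = np.random.rand(a).argsort()  # randomly arranged indices 0..a-1
    b = np.array(b)
    # split the indices at the cumulative bin boundaries; the result is a
    # list of arrays, because bins of unequal size cannot be merged into one array
    return np.split(indx, b.cumsum()[:-1])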
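The permute-then-split idea can also be sketched so that it accepts either an integer n or an existing array, as the question asks. This is only a sketch under assumptions: the name random_partition and the optional rng parameter are hypothetical, and the bin sizes are assumed to sum to the number of elements.

```python
import numpy as np

def random_partition(data, sizes, rng=None):
    """Randomly split `data` into consecutive bins of the given sizes."""
    rng = np.random.default_rng() if rng is None else rng
    # An integer n stands for np.arange(n), as in the question.
    if isinstance(data, (int, np.integer)):
        arr = np.arange(data)
    else:
        arr = np.asarray(data)
    sizes = np.asarray(sizes)
    if sizes.sum() != arr.shape[0]:
        raise ValueError("bin sizes must sum to the number of elements")
    shuffled = rng.permutation(arr)  # shuffled copy; the input is left untouched
    # Cut the shuffled array at the cumulative bin boundaries.
    return np.split(shuffled, np.cumsum(sizes)[:-1])

parts = random_partition(12, (2, 3, 3, 2, 2), rng=np.random.default_rng(0))
for p in parts:
    print(p)
```

Since every element has to be placed in some bin, a linear-time shuffle followed by a split is unlikely to be beaten by much; the split itself only creates views of the shuffled array.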

Related

Finding the two elements in a list that give the minimum absolute difference among all elements

Let's say I have a list l=[7,2,20,9] and I want to find the minimum absolute difference among all pairs of its elements (in this case it would be 9-7 = 2, or equivalently |7-9|). To do it in O(n log n) time, I sort the list, take the differences between neighbours, and find the minimum:
import numpy as np
sorted_l = sorted(l) # sort list
diff_sorted = abs(np.diff(sorted_l)) # get absolute value differences
min_diff = min(diff_sorted) # get min element
However, after doing this, I need to track which elements were used in the original l list that gave rise to this difference. So for l the minimum difference is 2 and the output I need is 7 and 9 since 9-7 is 2. Is there a way to do this? sorted method ruins the order and it's hard to backtrack. Am I missing something obvious? Thanks.
Use:
index = diff_sorted.tolist().index(min_diff)
sorted_l[index:index+2]
Output
[7, 9]
Whole Script
import numpy as np
l=[12,24,36,35,7]
sorted_l = sorted(l)
diff_sorted = np.diff(sorted_l)
min_diff = min(diff_sorted)
index = diff_sorted.tolist().index(min_diff)
sorted_l[index:index+2]
Output
[35, 36]
Explanation
tolist converts the NumPy array into a Python list, and a list has an index method that returns the position of the first element equal to its argument. Using tolist and index together therefore gives the position of the minimum difference in the array of sorted differences. Because that difference is between neighbouring elements of the sorted list, [index:index+2] selects the two numbers in the sorted list that produced it.
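If the positions of the two elements in the original, unsorted list are needed as well, one option is to sort indirectly with np.argsort and map back. The following is a sketch only; the helper name closest_pair is hypothetical and not part of the answer above.

```python
import numpy as np

def closest_pair(values):
    """Return the two closest values and their positions in the original list."""
    arr = np.asarray(values)
    order = np.argsort(arr)       # order[k] = original position of the k-th smallest
    diffs = np.diff(arr[order])   # gaps between neighbours in sorted order
    k = int(np.argmin(diffs))     # position of the smallest gap
    i, j = int(order[k]), int(order[k + 1])
    return (arr[i].item(), arr[j].item()), (i, j)

pair, positions = closest_pair([7, 2, 20, 9])
print(pair, positions)
```

The sort still dominates the cost, so this stays O(n log n).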

Fredo and Array Update in python

I have an interview coming up with a company that uses a platform like hackerearth.com. I am not sure how these tests work or how to write the code properly. Could you help me with the following example?
This example is from hackerearth.com. I don't know whether I should enforce the constraints in my code. Can I use a package like NumPy, or should I only use basic calculations that I write myself? Could you check my answer and let me know what is wrong with it? Thank you so much.
Input Format:
First line of input consists of an integer N denoting the number of elements in the array A.
Second line consists of N space separated integers denoting the array elements.
Output Format:
The only line of output consists of the value of x.
Input Constraints:
1<N<100
1<A[i]<100
Explanation:
The initial sum of the array is 1+2+3+4+5=15.
When we update all elements to 4, the sum of the array becomes 20, which is greater than 15.
Note that if we had updated the array elements to 3, the sum would be 15, which is not greater than 15. So 4 is the minimum value to which the array elements need to be updated.
# Write your code here
import numpy as np
A = [1, 2, 3, 4, 5]
for i in range(1, max(A)+1):
    old = sum(A)
    new = sum(i*np.ones(len(A)))
    diff = new-old
    if diff>0:
        print(i)
        break
Well this isn't Code Review stack exchange, but:
You don't say how to calculate x. It seems to be something to do with finding an average value, but no one can judge your code without knowing what it's trying to do. A web search suggests it is this:
Fredo is assigned a new task today. He is given an array A containing N integers. His task is to update all elements of array to some minimum value x, such that sum of this new array is strictly greater than the sum of the initial array. Note that x should be as minimum as possible such that sum of the new array is greater than the sum of the initial array.
Given that the task starts by accepting input, it's important that your program does this part.
N = int(input()) # you can put a prompt string in here, but may conflict with limited output
A = list(map(int,input().split()))
# might need input checks
# might need range checks
# might check that A has exactly N values
you don't need to recalculate old = sum(A) every time around your search loop
calculation of new doesn't need a sum at all - it's just new = i * len(A)
there's no point in checking values of i at or below min(A)
your search will fail if all values of A are the same (try it), because you never look above max(A)
These remarks apply to your approach; a more efficient search would be binary chop, and there is also a mathematical way to go straight to the answer from sum(A) without any searching:
x = sum(A) // len(A) + 1
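The closed-form expression can be checked against a plain search. This is a minimal sketch; the function names min_update_value and min_update_value_search are hypothetical and chosen only for illustration.

```python
def min_update_value(A):
    """Smallest integer x with x * len(A) > sum(A)."""
    return sum(A) // len(A) + 1

def min_update_value_search(A):
    """Linear search used only as a cross-check of the formula."""
    total, n = sum(A), len(A)
    x = min(A)              # no value at or below min(A) can beat the sum
    while x * n <= total:   # keep going past max(A) if necessary
        x += 1
    return x

for sample in ([1, 2, 3, 4, 5], [7, 7, 7], [2, 99]):
    print(sample, min_update_value(sample), min_update_value_search(sample))
```

The formula works because x * len(A) > sum(A) holds exactly when x exceeds the average, and integer division followed by + 1 gives the next integer above the average even when the average is itself an integer.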
You don't need numpy or looping for this. Get the average of the array elements, then get the next higher integer from this.
from math import floor

N = 5
A = [1, 2, 3, 4, 5]
total = sum(A)
avg = total / N  # not checking for zero-divide because conditions say N > 1
x = floor(avg + 1)
print(x)
Adding 1 is necessary to make the new sum greater than the original sum when the average is an exact integer (e.g. 15/5 == 3).

Randomly select a length of numbers from numpy array

I have a number of data files, each containing a large amount of data points.
After loading the file with numpy, I get a numpy array:
f=np.loadtxt("...-1.txt")
How do I randomly select a run of x consecutive values, without changing the order of the numbers?
For example:
f = [1,5,3,7,4,8]
if I wanted to select a random length of 3 data points, the output should be:
1,5,3, or
3,7,4, or
5,3,7, etc.
Pure logic will get you there.
For a list f and a length x, the valid starting points of your random slices are limited to the indices 0 through len(f)-x (shown here for x = 3):
     0 1 2 3
f = [1,5,3,7,4,8]
So all valid starting points can be selected with random.randrange(len(f)-x+1) (where the +1 is needed because randrange, like range, excludes its upper bound).
Store the random starting point in a variable start and slice your array with [start:start+x], or be creative and use another slice after the first:
result = f[random.randrange(len(f)-x+1):][:x]
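Building on usr2564301's answer, you can take out only the elements you need in one go using a range, so you avoid building a potentially very large intermediate array:
start = random.randrange(len(f)-x+1)
result = f[range(start, start+x)]
The range has to run from start to start+x, not from start to x. Indexing with a range like this works on a NumPy array (such as the one returned by np.loadtxt), not on a plain Python list. A range also saves you from writing out a large explicit index list as x grows.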
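Both answers can be wrapped into one small helper. This is a sketch under assumptions: the name random_window and the optional rng parameter are hypothetical, and NumPy's Generator is used in place of the random module.

```python
import numpy as np

def random_window(f, x, rng=None):
    """Return x consecutive elements of f, starting at a random position."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.asarray(f)
    if not 0 < x <= len(f):
        raise ValueError("x must be between 1 and len(f)")
    # Valid starts are 0 .. len(f)-x inclusive; `high` is exclusive here.
    start = int(rng.integers(0, len(f) - x + 1))
    return f[start:start + x]  # basic slicing returns a view, not a copy

f = np.array([1, 5, 3, 7, 4, 8])
window = random_window(f, 3, rng=np.random.default_rng(1))
print(window)
```

On a NumPy array a plain slice is a view into the original data, so no large intermediate array is created regardless of how big f or x is.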

Plotting occurrences for values higher than a threshold in Python

I have a non-uniform array 'A'.
A = [1,3,2,4,..., 12002, 13242, ...]
I want to explore how many elements from the array 'A' have values above certain threshold values.
For example, there are 1000 elements that have values larger than 1200, so I want to plot the number of elements that have values larger than 1200. Also, there are other 1500 elements that have values larger than 110 (this includes the 1000 elements, whose values are larger than 1200).
This is a rather large data set, so I would not like to omit any kind of information.
Then, I want to plot the number of elements 'N' above a value A vs. Log (A), i.e.
'Log N(> A)' vs. 'Log (A)'.
I thought of binning the data, but I was rather unsuccessful.
I haven't done that much statistics in python, so I was wondering if there is a good way to plot this data?
Thanks in advance.
Let me take another crack at what we have:
A = [1, 3, 2, 4, ..., 12002, 13242, ...]
# This is a list of 12,000 zeros.
num_above = [0]*(12000)
# Notice how we can rewrite this for-loop!
for i in A:
    num_above = [val+1 if key < i else val for key,val in enumerate(num_above)]
I believe this is what you want. The final list num_above will be such that num_above[5] equals the number of elements in A that are above 5.
Explanation:
That last line is where all the magic happens. It goes through the elements of A (i) and adds one to every element of num_above whose index is less than i.
The enumerate(num_above) call returns an iterator of (index, value) tuples over num_above. As an illustration of what enumerate produces, enumerate(A) would yield (0,1) -> (1,3) -> (2,2) -> (3,4) -> ...
Also, the num_above = [x for y in List] statement is known as List Comprehension, and is a really powerful tool in Python.
Improvements: I see you already modified your question to include these changes, but I think they were important.
I removed the numpy dependency. When possible, removing dependencies reduces the complexity of projects, especially larger projects.
I also removed the original list A. This could be replaced with something that was basically like A = range(12000).
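For a large data set, the same counts can be obtained without the nested loop by sorting once and using a binary search per threshold. This is a sketch that brings NumPy back in; the function name count_above and the sample thresholds are assumptions, and the short list A below only reuses the values visible in the question.

```python
import numpy as np

def count_above(values, thresholds):
    """For each threshold t, count the elements of `values` strictly greater than t."""
    sorted_vals = np.sort(np.asarray(values))
    # side="right" gives, for each threshold, how many elements are <= it.
    n_le = np.searchsorted(sorted_vals, thresholds, side="right")
    return sorted_vals.size - n_le

A = [1, 3, 2, 4, 12002, 13242]            # illustrative sample only
thresholds = np.array([0, 2, 4, 12002, 20000])
counts = count_above(A, thresholds)
print(counts)
```

The resulting thresholds and counts can then be passed to a log-log plot, for example matplotlib's plt.loglog(thresholds, counts); thresholds with a count of zero (or a threshold of zero) cannot be shown on logarithmic axes and would need to be filtered out first.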

Calculate a discrete mean in python

I have a set of data points and a program that goes through the data set, takes every n points, sums them, and puts each sum in a new list. With that list I can make simple bar plots.
Now I'd like to calculate a discrete mean for my new list.
The formula I'm using is this: t_av=(1/nsmp) Sum[N_i*t_i,{i,n_l,n_u}]
Basically I have nsmp bins that have N_i number in them, t_i is a time of a bin, and n_l is the first bin, and n_u is the last bin.
So if my list is this: [373, 156, 73, 27, 16],
I have 5 bins, and I have: t_av=1/5 (373*1+156*2+73*3+27*4+16*5)=218.4
Now I have run into a problem. I tried with this:
for i in range(0,len(L)):
    sr_vr = L[i]*i
tsr=sr_vr/nsmp
Here nsmp is the number of bins, which I can set, and L has already been calculated. Since range goes through 0,1,2,3,4, I won't get the correct answer, because my first bin is multiplied by 0. If I use range(1,len(L)+1) I get IndexError: list index out of range, because L[i]*i then multiplies the second element of the list (index 1) by 1, and on the last iteration the index runs one past the end of the list.
How do I correct this?
You can just use L[i]*(i+1) (assuming you stick with zero-based indexing).
However you can also use enumerate() to loop over indices and values together, and you can even provide 1 as the second argument so that the indexing starts at 1 instead of 0.
Here is how I would write this:
tsr = sum(x * i for i, x in enumerate(L, 1)) / len(L)
Note that if you are on Python 2.x and L contains entirely integers this will perform integer division. To get a float just convert one of the arguments to a float (for example float(len(L))). You can also use from __future__ import division.
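Applied to the list from the question, the enumerate-based expression reproduces the hand-calculated value. A minimal sketch for Python 3; the function name discrete_mean is hypothetical, and the number of bins is taken to be len(counts) as in the one-liner above.

```python
def discrete_mean(counts):
    """Weighted sum of bin numbers (starting at 1) divided by the number of bins."""
    # enumerate(counts, 1) pairs each count with its 1-based bin number t.
    return sum(n * t for t, n in enumerate(counts, 1)) / len(counts)

L = [373, 156, 73, 27, 16]
t_av = discrete_mean(L)
print(t_av)  # the question's hand calculation gives 218.4
```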
