Given a single integer and a number of bins, how can I split the integer into parts that are as equal as possible?
The outputs should sum to the input integer, e.g.:
[in]: x = 20 , num_bins = 3
[out]: (7, 7, 6)
Another e.g.
[in]: x = 20 , num_bins = 6
[out]: (4, 4, 3, 3, 3, 3)
I've tried this:
x = 20
num_bins = 3
y = [x // num_bins] * num_bins
for i in range(x % num_bins):
    y[i] += 1
It works but there must be a simpler/better way, maybe using bisect or numpy?
Using numpy from https://stackoverflow.com/a/48899071/610569 , I could do this too:
list(map(len, np.array_split(range(x), num_bins)))
But that's a little convoluted: it creates a range to stand in for a pretend list, splits it, and then takes the length of each chunk.
The built-in divmod function could be useful for this.
def near_split(x, num_bins):
    quotient, remainder = divmod(x, num_bins)
    return [quotient + 1] * remainder + [quotient] * (num_bins - remainder)
Demo
In [11]: near_split(20, 3)
Out[11]: [7, 7, 6]
In [12]: near_split(20, 6)
Out[12]: [4, 4, 3, 3, 3, 3]
Update: simplified using integer arithmetic.
Here's a one-liner:
np.arange(n+k-1, n-1, -1) // k
Little demo:
>>> for k in range(4, 10, 3):
...     for n in range(10, 17):
...         np.arange(n+k-1, n-1, -1) // k
...
array([3, 3, 2, 2])
array([3, 3, 3, 2])
array([3, 3, 3, 3])
array([4, 3, 3, 3])
array([4, 4, 3, 3])
array([4, 4, 4, 3])
array([4, 4, 4, 4])
array([2, 2, 2, 1, 1, 1, 1])
array([2, 2, 2, 2, 1, 1, 1])
array([2, 2, 2, 2, 2, 1, 1])
array([2, 2, 2, 2, 2, 2, 1])
array([2, 2, 2, 2, 2, 2, 2])
array([3, 2, 2, 2, 2, 2, 2])
array([3, 3, 2, 2, 2, 2, 2])
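As a sanity check, the one-liner can be compared against the divmod version from the earlier answer. This is only a sketch; the wrapper name near_split_np is made up for illustration. The idea is that (n + i) // k for i = k-1, ..., 0 equals quotient + 1 exactly for the first n % k entries and quotient for the rest.

```python
import numpy as np

def near_split(x, num_bins):
    # divmod version from the earlier answer
    quotient, remainder = divmod(x, num_bins)
    return [quotient + 1] * remainder + [quotient] * (num_bins - remainder)

def near_split_np(n, k):
    # hypothetical wrapper around the one-liner:
    # computes (n + i) // k for i = k-1, k-2, ..., 0
    return np.arange(n + k - 1, n - 1, -1) // k

print(near_split_np(20, 3))  # [7 7 6]
print(near_split(20, 3))     # [7, 7, 6]
```

Both return the larger parts first, so the results can be compared element by element.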
Related
I have arrays like
arr1['a'] = np.array([1, 1, 1])
arr1['b'] = np.array([1, 1, 1])
arr1['c'] = np.array([1, 1, 1])
b_index = [0, 2, 5]
arr2['a'] = np.array([2, 2, 2, 2, 2, 2])
arr2['b'] = np.array([2, 2, 2, 2, 2, 2])
arr2['c'] = np.array([2, 2, 2, 2, 2, 2])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
b_index is the list of indexes.
I want to copy from arr1 to arr2 at indexes in b_index.
so the result should be something like
arr2['a'] = np.array([1, 2, 1, 2, 2, 1])
arr2['b'] = np.array([1, 2, 1, 2, 2, 1])
arr2['c'] = np.array([1, 2, 1, 2, 2, 1])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
I can obviously do this with loops, but I'm not sure that is the right way to do it.
We are talking about 100 columns ('a', 'b', 'c', ...) and around 1 million rows.
One solution, which might not be optimal, is to use advanced array indexing:
In [1]: arr = np.ones((5, 3))
In [2]: arr2 = np.full((5, 5), 2)
In [3]: arr2[:, [1, 2, 4]] = arr
In [4]: arr2
Out[4]:
array([[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1]])
Does that help?
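Applied to the layout in the question, the same indexing idea might look like the sketch below. It assumes arr1 and arr2 are plain dicts of 1-D arrays (the question does not say what container they are) and that b_index has one entry per element of each arr1 array.

```python
import numpy as np

b_index = [0, 2, 5]

# Assumed layout: dicts mapping column name -> 1-D array
arr1 = {key: np.array([1, 1, 1]) for key in "abc"}
arr2 = {key: np.array([2, 2, 2, 2, 2, 2]) for key in "abcf"}

# One vectorized assignment per column; only the columns present in arr1 change
for key, values in arr1.items():
    arr2[key][b_index] = values

print(arr2["a"])  # [1 2 1 2 2 1]
print(arr2["f"])  # [2 2 2 2 2 2]
```

The Python-level loop runs once per column (about 100 times here), while the copy over the rows happens inside NumPy.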
I work with large data sets in my research.
I need to duplicate an element in a Numpy array. The code below achieves this, but is there a function in Numpy that performs the operation in a more efficient manner?
"""
Example output
>>> (executing file "example.py")
Choose a number between 1 and 10:
2
Choose number of repetitions:
9
Your output array is:
[1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>
"""
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = int(input('Choose the number you want to repeat (1-10):\n'))
repetitions = int(input('Choose number of repetitions:\n'))
output = []
for i in range(len(x)):
    if x[i] != y:
        output.append(x[i])
    else:
        for j in range(repetitions):
            output.append(x[i])
print('Your output array is:\n', output)
One approach would be to find the index of the element to be repeated with np.searchsorted (this assumes x is sorted and contains y). Use that index to slice the left and right sides of the array and insert the repeated array in between.
Thus, one solution would be -
idx = np.searchsorted(x,y)
out = np.concatenate(( x[:idx], np.repeat(y, repetitions), x[idx+1:] ))
Let's consider a bit more generic sample case with x as -
x = [2, 4, 5, 6, 7, 8, 9, 10]
Let the number to be repeated be y = 5 and repetitions = 7.
Now, using the proposed code -
In [57]: idx = np.searchsorted(x,y)
In [58]: idx
Out[58]: 2
In [59]: np.concatenate(( x[:idx], np.repeat(y, repetitions), x[idx+1:] ))
Out[59]: array([ 2, 4, 5, 5, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10])
For the specific case of x always being [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], we would have a more compact/elegant solution, like so -
np.r_[x[:y-1], [y]*repetitions, x[y:]]
There is the numpy.repeat function:
>>> np.repeat(3, 4)
array([3, 3, 3, 3])
>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 2)
array([1, 1, 2, 2, 3, 3, 4, 4])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
[3, 3, 3, 4, 4, 4]])
>>> np.repeat(x, [1, 2], axis=0)
array([[1, 2],
[3, 4],
[3, 4]])
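For the original question, np.repeat also accepts one repeat count per element, so the matching value can be expanded in place. The following is a sketch rather than a drop-in answer; the function name repeat_value is invented for illustration. Unlike the searchsorted approach, it does not need x to be sorted, and it expands every occurrence of y.

```python
import numpy as np

def repeat_value(x, y, repetitions):
    # hypothetical helper: repeat each element equal to y `repetitions` times
    x = np.asarray(x)
    counts = np.where(x == y, repetitions, 1)  # one repeat count per element
    return np.repeat(x, counts)

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(repeat_value(x, 2, 9))
# [ 1  2  2  2  2  2  2  2  2  2  3  4  5  6  7  8  9 10]
```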
Say you have a numpy vector [0,3,1,1,1] and you run argsort
you will get [0,2,3,4,1] but all the ones are the same!
What I want is an efficient way to shuffle indices of identical values.
Any idea how to do that without a while loop with two indices on the sorted vector?
numpy.array([0,3,1,1,1]).argsort()
Use lexsort:
np.lexsort((b,a)) means Sort by a, then by b
>>> a
array([0, 3, 1, 1, 1])
>>> b=np.random.random(a.size)
>>> b
array([ 0.00673736, 0.90089115, 0.31407214, 0.24299867, 0.7223546 ])
>>> np.lexsort((b,a))
array([0, 3, 2, 4, 1])
>>> a.argsort()
array([0, 2, 3, 4, 1])
>>> a[[0, 3, 2, 4, 1]]
array([0, 1, 1, 1, 3])
>>> a[[0, 2, 3, 4, 1]]
array([0, 1, 1, 1, 3])
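Wrapped up as a function, the lexsort idea could look like this sketch. The name argsort_random_ties and the optional rng argument are assumptions added here for reproducibility; they are not part of the answer above.

```python
import numpy as np

def argsort_random_ties(a, rng=None):
    # hypothetical helper: argsort of `a` with ties broken at random
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    tie_breaker = rng.random(a.size)
    # lexsort uses the LAST key as the primary key: sort by a, then by tie_breaker
    return np.lexsort((tie_breaker, a))

a = np.array([0, 3, 1, 1, 1])
idx = argsort_random_ties(a)
print(a[idx])  # [0 1 1 1 3]
```

The positions of indices 2, 3 and 4 inside idx vary from call to call, while index 0 stays first and index 1 stays last.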
This is a bit of a hack, but if your array contains integers only you could add random values and argsort the result. np.random.rand gives you results in [0, 1) so in this case you're guaranteed to maintain the order for non-identical elements.
>>> import numpy as np
>>> arr = np.array([0,3,1,1,1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 3, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 2, 3, 1])
Here we see index 0 is always first in the argsort result and index 1 is last, but the rest of the results are in a random order.
In general you could generate random values bounded by the smallest nonzero gap between the sorted values (gaps = np.diff(np.sort(arr)); gaps[gaps > 0].min()), but you might run into precision issues at some point.
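One way to sidestep the precision concern entirely is to shuffle first and then sort stably. This is an alternative technique, not what the answer above does, and the name shuffled_argsort is made up for this sketch. A stable sort keeps equal values in their incoming order, and that order has already been randomized by the permutation.

```python
import numpy as np

def shuffled_argsort(arr, rng=None):
    # hypothetical helper: random permutation followed by a stable argsort
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr)
    perm = rng.permutation(arr.size)              # random order of all indices
    order = np.argsort(arr[perm], kind="stable")  # ties keep their shuffled order
    return perm[order]                            # map back to original indices

arr = np.array([0.5, 3.0, 1.25, 1.25, 1.25])
idx = shuffled_argsort(arr)
print(arr[idx])  # [0.5  1.25 1.25 1.25 3.  ]
```

Because no noise is added to the values, this works for floats as well as integers.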
Edit
Working!!! Thanks everyone for your input!
//= in the function is required to port from 2.x to 3.x
I am attempting to factor very large numbers in Python in a timely manner. This is working out except there is a large discrepancy in the values of the primes when multiplied together vs. the original value.
Code:
import math
x = 4327198439888438284329493298321832193892183218382918932183128863216694329
def getPrimes(n):
    num = abs(n)
    factor = 2
    primes = []
    while num > 1:
        factor = getNext(num, factor)
        primes.append(factor)
        num /= factor
    if n < -1:
        primes[0] = -primes[0]
    return primes
def getNext(n, f):
    if n % 2 == 0:
        return 2
    for x in range(max(f, 3), int(math.sqrt(n) + 1), 2):
        if n % x == 0:
            return x
    return n
values = getPrimes(x)
orig = int(1);
print(values)
for y in values:
    orig *= int(y)
print("\n")
print(x)
print("\n")
print(orig)
print("\n")
print(orig-x)
Output:
[17, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 83, 20845357395553.0]
4327198439888438284329493298321832193892183218382918932183128863216694329
4327198439888438374354383059307859040070974971297410068584490149575917568
90024889760986026846178791752914491136401361286359223239
???
When dividing the original number down, it is able to reach one of the prime factors just fine. This makes me confident that the factors that I am getting in the above factorization are correct.
>>> x /= 17
>>> x /= 20845357395553
>>> x /= (2**185)
>>> x /= 3
>>> x
83.0
>>> x /= 83
>>> x
1.0
>>>
TL;DR
I believe that python's code has an error with large-number (int) multiplication, or maybe I'm doing something absolutely crazy, sanity check!
Thanks!
EDIT
I did the second example code in an online python interpreter, notably 2.xx not 3.xx but I did run the code up top in 3.x as some of you noted. Redid the second operation in 3.xx and replaced. Unaware if anyone has an answer to why the code has two separate values for what should be the same.
EDIT - 7/12/2014
After further examination it appears that I have a case in which the factors are incorrect (checked with Wolfram Alpha), and I've switched algorithms. I'll be testing later with longs in 2.7.
Python3 has changed what the division operator does.
In Python 2:
>>> 3 / 2
1
In Python 3:
>>> 3 / 2
1.5
>>> 3 // 2
1
Therefore, your getPrimes() function should use floor division (//=) so that num stays an integer:
while num > 1:
    factor = getNext(num, factor)
    primes.append(factor)
    num //= factor
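Putting the fix together, a version that stays in exact integer arithmetic throughout might look like the sketch below. The snake_case names are renamed here for illustration, and math.isqrt (Python 3.8+) is swapped in for int(math.sqrt(n) + 1), since math.sqrt converts to float and can lose precision on very large integers. That swap goes beyond what the answer above suggests.

```python
import math

def next_factor(n, start):
    # Smallest prime factor of n, assuming odd candidates below `start`
    # have already been divided out.
    if n % 2 == 0:
        return 2
    # math.isqrt is exact for arbitrarily large ints
    for candidate in range(max(start, 3) | 1, math.isqrt(n) + 1, 2):
        if n % candidate == 0:
            return candidate
    return n  # no divisor found, so n itself is prime

def prime_factors(n):
    num = abs(n)
    factor = 2
    primes = []
    while num > 1:
        factor = next_factor(num, factor)
        primes.append(factor)
        num //= factor  # floor division keeps num an exact int
    if n < -1:
        primes[0] = -primes[0]
    return primes

n = 17 * 2**185 * 3 * 83
factors = prime_factors(n)
print(math.prod(factors) == n)  # True
```

This is still plain trial division, so it is only practical when all but the largest prime factor are small.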