How to "inverse" list in python? Like Inverse Function - python

I have some complicated function called dis(x), which returns a number.
I am making two lists, let's call them "indices" and "values". What I do is the following:

for i in np.arange(0.01, 4, 0.01):
    values.append(dis(i))
    indices.append(i)

My problem is this: how do I find an index j (from indices) for which dis(j) (from values) is closest to some number k?

numpy's argmin function will do the job for you.
import numpy as np

values = []
indices = []

def dis(x):
    return 1e6 * x**2

for i in np.arange(0.01, 4, 0.01):
    values.append(dis(i))
    indices.append(i)

target = 10000
# index of the value closest to the target; indices[closest_index] gives the corresponding x
closest_index = np.argmin([np.abs(x - target) for x in values])
print(closest_index)

The way you are stating it, I see two options:
Brute force it: try many indices i, and then see which dis(i) ended up closest to k. This works best when dis is reasonably fast and the possible indices are reasonably few.
Learn about optimization: https://en.wikipedia.org/wiki/Optimization_problem. This is a pretty extensive field, but the SciPy package has many optimization functions (see the sketch below).
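For example, a minimal sketch of the SciPy route (a stand-in dis and a target k are assumed here, since the question's function is not given) minimizes the absolute difference directly:

from scipy import optimize

def dis(x):
    # stand-in for the question's complicated function
    return 1e6 * x**2

k = 10000  # the target value

# find the x in [0.01, 4] where dis(x) is closest to k
res = optimize.minimize_scalar(lambda x: abs(dis(x) - k), bounds=(0.01, 4), method='bounded')
print(res.x, dis(res.x))

This avoids building the lists at all, but it assumes dis is reasonably well behaved on the interval.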

Using numpy
closestindice = np.argmin(np.abs(np.array(values)-k))
But it is a bit strange, as it does not use the 'indices' list.
Maybe you could skip the definition of the 'indices' list and get the values directly into a numpy array.
import numpy as np

def dis(i):
    return (i - 1)**2

nprange = np.arange(0.01, 4, 0.01)
npvalues = np.array([dis(x) for x in nprange])
k = .5
closestindice = np.abs(npvalues - k).argmin()
print(closestindice, npvalues[closestindice])
Output:
28 0.5041
By the way, if the 'dis' function is not monotone on the range, you can have more than one valid answer, one on each side of a local extremum.
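If you want to see all such candidates rather than a single argmin, a small sketch (assuming npvalues and k from the snippet above, with a tolerance you pick yourself):

tolerance = 0.01  # how close counts as a match; chosen arbitrarily here
candidates = np.where(np.abs(npvalues - k) < tolerance)[0]
print(candidates, npvalues[candidates])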

Related

Numpy: how to vectorize a for loop generating random numbers?

Say I have the following simple function that I use to generate random numbers:
import numpy as np

def my_func():
    rvs = np.random.random(size=3)
    return rvs[2] - rvs[1]
I want to call this function a number of times, let's say 1000, and I want to store the results in an array, for example:

result = []
for _ in range(1000):
    result += [my_func()]
Is there a way to use numpy to vectorize this operation and make everything faster? I don't mind if the workflow changes.
If I understand your question correctly, you just need to use the np.random.rand function:
np.random.rand(1000)
This function creates an array of the given shape and populates it with random samples from a uniform distribution over [0, 1).
You can vectorize as follows:
rvs_vect = np.random.rand(10000, 3)
result = rvs_vect[:,2] - rvs_vect[:,1]
rvs_vect[:,1] selects all rows in column 1.
rvs_vect[:,2] selects all rows in column 2.
On my machine, execution times for instances of 10000 elements are about 100 times faster than your solution and the other proposed ones (np.vectorize and a list comprehension).
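If you want to reproduce the comparison yourself, a rough timing sketch (absolute numbers will vary by machine) might look like this:

import timeit
import numpy as np

def my_func():
    rvs = np.random.random(size=3)
    return rvs[2] - rvs[1]

def vectorized(n):
    rvs_vect = np.random.rand(n, 3)
    return rvs_vect[:, 2] - rvs_vect[:, 1]

n = 10000
print(timeit.timeit(lambda: [my_func() for _ in range(n)], number=10))
print(timeit.timeit(lambda: vectorized(n), number=10))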
Extras
I have prepared an example for you with Numba. Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.
That said, you will not gain a substantial advantage over numpy on this type of operation.
import numba as nb
import numpy as np

@nb.njit
def my_rand(n):
    rvs_vect = np.random.rand(n, 3)
    return rvs_vect[:, 2] - rvs_vect[:, 1]
You could try: result = [my_func() for i in range(1000)]; it is already fast enough.
Try this:
import numpy as np

def my_func(arr):
    rvs = np.random.random(size=3)
    return rvs[2] - rvs[1]

vfunc = np.vectorize(my_func)
result = []
result.append(vfunc([1]*1000))
print(result)
Hope it helps!
Explanation:
np.vectorize vectorizes the function. Normally you would pass a numpy array to a function that performs some task on its elements, but here I just passed a throwaway list so that the function gets executed 1000 times; the rest is as you were doing.

Float required in list output

I am trying to create a custom filter to run with the generic filter from the SciPy package:
scipy.ndimage.filters.generic_filter
The problem is that I don't know how to make the returned value a scalar, which the generic filter needs in order to work. I read through the threads listed at the bottom, but I can't find a way to make my function work.
The code is this:
import numpy as np
import scipy.ndimage as sc

def minimum(window):
    list = []
    for i in range(window.shape[0]):
        window[i] -= min(window)
        list.append(window[i])
    return list

test = np.ones((10, 10)) * np.arange(10)
result = sc.generic_filter(test, minimum, size=3)
It gives the error:
cval, origins, extra_arguments, extra_keywords)
TypeError: a float is required
Scipy filter with multi-dimensional (or non-scalar) output
How to apply ndimage.generic_filter()
http://ilovesymposia.com/2014/06/24/a-clever-use-of-scipys-ndimage-generic_filter-for-n-dimensional-image-processing/
If I understand correctly, you want to subtract from each pixel the minimum of its 3-pixel neighbourhood. It's not good practice to do that with lists, because numpy is built for efficiency (~100 times faster). The simplest way to do it is just:
test-sc.generic_filter(test, np.min, size=3)
The subtraction is then vectorized over the whole array.
You can also do:
test-np.min([np.roll(test,1),np.roll(test,-1),test],axis=0)
This is about 10 times faster, if you accept the artefact at the border.
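Put together as a self-contained sketch (using the test array from the question):

import numpy as np
import scipy.ndimage as sc

test = np.ones((10, 10)) * np.arange(10)

# subtract from each pixel the minimum of its 3x3 neighbourhood
filtered = test - sc.generic_filter(test, np.min, size=3)

# rolled alternative: faster, but with wrap-around artefacts at the borders
rolled = test - np.min([np.roll(test, 1), np.roll(test, -1), test], axis=0)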
Using the example in Scipy filter with multi-dimensional (or non-scalar) output I converted your code to:
import numpy as np
import scipy.ndimage as sc

def minimum(window, out):
    list = []
    for i in range(window.shape[0]):
        window[i] -= min(window)
        list.append(window[i])
    out.append(list)
    return 0

test = np.ones((10, 10)) * np.arange(10)
result = []
sc.generic_filter(test, minimum, size=3, extra_arguments=(result,))
Now your function minimum writes its result to the parameter out, and the return value is not used any more. The final result list contains all the per-window results concatenated, not the output of generic_filter.
Edit 1: When generic_filter is used with a function that returns a scalar, a matrix of the same dimensions as the input is returned. Here, however, a list is appended on each call of the filter, which results in a larger structure (100x9 in this case).
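If what you actually want is the standard behaviour of generic_filter (one scalar per window), a minimal sketch of a scalar-returning variant could look like this; it assumes you want each centre pixel minus the window minimum, with the centre taken as len(window) // 2 of the flattened window:

import numpy as np
import scipy.ndimage as sc

def minimum_scalar(window):
    # the window arrives as a flattened 1-D array; return a single float
    return window[len(window) // 2] - window.min()

test = np.ones((10, 10)) * np.arange(10)
result = sc.generic_filter(test, minimum_scalar, size=3)  # same shape as test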

Efficiently convert a vector of bin counts to a vector of bin indices [duplicate]

Given an array of integer counts c, how can I transform that into an array of integers inds such that np.all(np.bincount(inds) == c) is true?
For example:
>>> c = np.array([1,3,2,2])
>>> inverse_bincount(c) # <-- what I need
array([0,1,1,1,2,2,3,3])
Context: I'm trying to keep track of the location of multiple sets of data, while performing computation on all of them at once. I concatenate all the data together for batch processing, but I need an index array to extract the results back out.
Current workaround:
from itertools import chain
import numpy as np

def inverse_bincount(c):
    return np.array(list(chain.from_iterable([i] * n for i, n in enumerate(c))))
Using numpy.repeat:
np.repeat(np.arange(c.size), c)
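A quick usage check against the condition from the question:

import numpy as np

c = np.array([1, 3, 2, 2])
inds = np.repeat(np.arange(c.size), c)
print(inds)                            # [0 1 1 1 2 2 3 3]
print(np.all(np.bincount(inds) == c))  # True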
No numpy needed:

from functools import reduce
c = [1, 3, 2, 2]
reduce(lambda x, y: x + [y] * c[y], range(len(c)), [])
The following is about twice as fast on my machine as the currently accepted answer, although I must say I am surprised by how well np.repeat does. I would have expected it to suffer a lot from temporary object creation, but it holds up pretty well.
import numpy as np

c = np.array([1, 3, 2, 2])
p = np.cumsum(c)
i = np.zeros(p[-1], dtype=int)
np.add.at(i, p[:-1], 1)   # mark the position where each new bin starts
print(np.cumsum(i))       # cumulative sum turns the markers into bin indices

How to find the index of an array within an array

I have created an array in the way shown below, which represents 3 pairs of co-ordinates. My issue is that I don't seem to be able to find the index of a particular pair of co-ordinates within the array.
import numpy as np
R = np.random.uniform(size=(3,2))
R
Out[5]:
array([[ 0.57150157,  0.46611662],
       [ 0.37897719,  0.77653461],
       [ 0.73994281,  0.7816987 ]])
R.index([ 0.57150157, 0.46611662])
The following is returned:
AttributeError: 'numpy.ndarray' object has no attribute 'index'
The reason I'm trying to do this is so that I can extend a list with the index of a co-ordinate pair, inside a for-loop.
e.g.
v = []
for A in R:
    v.append(R.index(A))
I'm just not sure why the index function isn't working, and can't seem to find a way around it.
I'm new to programming so excuse me if this seems like nonsense.
index() is a method of the type list, not of numpy.array. Try:
R.tolist().index(x)
where x is, for example, the third entry of R. This first converts your array into a list of lists, and then you can use index ;)
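Note that with floats an exact match can be fragile. A sketch of a numpy-native alternative that tolerates rounding (using np.isclose and np.where) could be:

import numpy as np

R = np.random.uniform(size=(3, 2))
target = R[2]  # the pair whose index we want

idx = np.where(np.isclose(R, target).all(axis=1))[0][0]
print(idx)  # 2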
You can achieve the desired result by converting your inner arrays (the coordinates) to tuples.
R = map(lambda x: (x), R)
And then you can find the index of a tuple using R.index((number1, number2)).
Hope this helps!
[Edit] To explain what's going on in the code above, the map function goes through (iterates) the items in the array R, and for each one replaces it with the return result of the lambda function.
So it's equivalent to something along these lines:
def someFunction(x):
    return (x)

for x in range(0, len(R)):
    R[x] = someFunction(R[x])
So it takes each item and does something to it, putting it back in the list. I realized that it may not actually do what I thought (returning (x) doesn't turn a regular item into a tuple), but it does help your situation because, by iterating through it, Python creates a regular list out of the numpy array.
To actually convert to a tuple, the following code should work:
R = map(tuple, R)
(credits to https://stackoverflow.com/a/10016379/2612012)
Numpy arrays don't have an index function, for a number of reasons. However, I think you're after something different.
For example, the code you mentioned:
v = []
for A in R:
    v.append(R.index(A))
Would just be (assuming R has unique rows, for the moment):
v = list(range(len(R)))
However, I think you might be wanting the built-in function enumerate. E.g.
for i, row in enumerate(R):
    # Presumably you're doing something else with "row"...
    v.append(i)
For example, let's say we wanted to know the indices where the sum of each row was greater than 1.
One way to do this would be:
v = []
for i, row in enumerate(R):
    if sum(row) > 1:
        v.append(i)
However, numpy also provides other ways of doing this, if you're working with numpy arrays. For example, the equivalent to the code above would be:
v, = np.where(R.sum(axis=1) > 1)
If you're just getting started with python, focus on understanding the first example before worrying too much about the best way to do things with numpy. Just be aware that numpy arrays behave very differently from lists.

Best way to create a NumPy array from a dictionary?

I'm just starting with NumPy so I may be missing some core concepts...
What's the best way to create a NumPy array from a dictionary whose values are lists?
Something like this:
d = { 1: [10,20,30] , 2: [50,60], 3: [100,200,300,400,500] }
Should turn into something like:
data = [
    [10, 20, 30, ?, ?],
    [50, 60, ?, ?, ?],
    [100, 200, 300, 400, 500]
]
I'm going to do some basic statistics on each row, e.g.:
deviations = numpy.std(data, axis=1)
Questions:
What's the best / most efficient way to create the numpy.array from the dictionary? The dictionary is large; a couple of million keys, each with ~20 items.
The number of values for each 'row' is different. If I understand correctly, numpy wants uniform sizes, so what do I fill in for the missing items to make std() happy?
Update: One thing I forgot to mention - while the python techniques are reasonable (e.g. looping over a few million items is fast), they're constrained to a single CPU. Numpy operations scale nicely to the hardware and hit all the CPUs, so they're attractive.
You don't need to create numpy arrays to call numpy.std().
You can call numpy.std() in a loop over all the values of your dictionary. Each list will be converted to a numpy array on the fly to compute the standard deviation.
The downside of this method is that the main loop will be in python and not in C. But I guess it should be fast enough: you will still compute std at C speed, and you will save a lot of memory, as you won't have to store padding values where you have variable-sized rows.
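A minimal sketch of this per-row approach, using the small dictionary from the question:

import numpy as np

d = {1: [10, 20, 30], 2: [50, 60], 3: [100, 200, 300, 400, 500]}

# one standard deviation per key; each list is converted to an array on the fly
deviations = {key: np.std(row) for key, row in d.items()}
print(deviations)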
If you want to further optimize this, you can store your values into a list of numpy arrays, so that you do the python list -> numpy array conversion only once.
If you find that this is still too slow, try using Psyco to optimize the python loop.
If this is still too slow, try using Cython together with the numpy module. This tutorial claims impressive speed improvements for image processing. Or simply program the whole std function in Cython (see this for benchmarks and examples with the sum function).
An alternative to Cython would be to use SWIG with numpy.i.
If you want to use only numpy and have everything computed at the C level, try grouping all the records of the same size into different arrays and calling numpy.std() on each of them. It should look like the following example.
Example with O(N) complexity:
import numpy

list_size_1 = []
list_size_2 = []
for row in data.values():
    if len(row) == 1:
        list_size_1.append(row)
    elif len(row) == 2:
        list_size_2.append(row)
list_size_1 = numpy.array(list_size_1)
list_size_2 = numpy.array(list_size_2)
std_1 = numpy.std(list_size_1, axis=1)
std_2 = numpy.std(list_size_2, axis=1)
While there are already some pretty reasonable ideas present here, I believe the following is worth mentioning.
Filling missing data with any default value would spoil the statistical characteristics (std, etc.). Evidently that's why Mapad proposed the nice trick of grouping same-sized records.
The problem with it (assuming there is no a priori data on record lengths at hand) is that it involves even more computation than the straightforward solution:
at least O(N*logN) 'len' calls and comparisons for sorting with an effective algorithm
O(N) checks on the second pass through the list to obtain the groups (their beginning and end indices on the 'vertical' axis)
Using Psyco is a good idea (it's strikingly easy to use, so be sure to give it a try).
It seems that the optimal way is to take the strategy described by Mapad in bullet #1, but with a modification: do not generate the whole list, but iterate through the dictionary, converting each row into a numpy.array and performing the required computations. Like this:
for row in data.values():
    np_row = numpy.array(row)
    this_row_std = numpy.std(np_row)
    # compute any other statistical descriptors needed and then save them to some list
In any case, a few million loops in python won't take as long as one might expect. Besides, this doesn't look like a routine computation, so who cares if it takes an extra second or minute when it is run once in a while, or even just once.
A generalized variant of what was suggested by Mapad:
from numpy import array, mean, std

def get_statistical_descriptors(a):
    ax = len(a.shape) - 1
    functions = [mean, std]
    return [f(a, axis=ax) for f in functions]

def process_long_list_stats(data):
    groups = {}
    for key, row in data.items():
        size = len(row)
        try:
            groups[size].append(key)
        except KeyError:
            groups[size] = [key]
    results = []
    for gr_keys in groups.values():
        gr_rows = array([data[k] for k in gr_keys])
        stats = get_statistical_descriptors(gr_rows)
        results.extend(zip(gr_keys, zip(*stats)))
    return dict(results)
numpy dictionary
You can use a structured array to preserve the ability to address a numpy object by a key, like a dictionary.
import numpy as np

dd = {'a': 1, 'b': 2, 'c': 3}
dtype = [(key, float) for key in dd.keys()]
values = [tuple(dd.values())]
numpy_dict = np.array(values, dtype=dtype)
numpy_dict['c']
will now output
array([ 3.])
