I've been writing a program to brute-force a sequence of numbers looking for Euler bricks, but the method I came up with involves a triple loop. Since nested Python loops are notoriously slow, I was wondering whether there is a better way to create the array of values I need using numpy.
# x = max side length of brick. User input.
for t in range(3, x):
    a = []; b = []; c = []
    for u in range(2, t):
        for v in range(1, u):
            a.append(t)
            b.append(u)
            c.append(v)
    a = np.array(a)
    b = np.array(b)
    c = np.array(c)
    ...
Is there a better way to generate the array of values, using numpy commands?
Thanks.
Example:
If x=10, when t=3 I want to get:
a=[3]
b=[2]
c=[1]
the first time through the loop. After that, when t=4:
a=[4, 4, 4]
b=[2, 3, 3]
c=[1, 1, 2]
The third time (t=5) I want:
a=[5, 5, 5, 5, 5, 5]
b=[2, 3, 3, 4, 4, 4]
c=[1, 1, 2, 1, 2, 3]
and so on, up to max side lengths around 5000 or so.
EDIT: Solution
a = np.array(3)
b = np.array(2)
c = np.array(1)
for i in range(4, x):  # removing the (3,2,1) check from the code does not affect results
    # the (u, v) pairs for side i are those for i-1 plus (u = i-1, v = 1 .. i-2)
    foo = np.arange(1, i - 1)
    foo2 = np.empty(len(foo))
    foo2.fill(i - 1)
    c = np.hstack((c, foo))
    b = np.hstack((b, foo2))
    a = np.empty(len(b))
    a.fill(i)
    ...
Works many times faster now. Thanks all.
Try using numpy.empty together with ndarray.fill (http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.fill.html).
There are a couple of things which could help, but probably only for large values of x. For starters, use xrange instead of range; that will save creating a list you never need. You could also create empty numpy arrays of the correct length and fill them with values as you go, instead of appending to a list and then converting it into a numpy array.
I believe this code will work (no python access right this second):
for t in xrange(3, x):
    # number of (u, v) pairs with 1 <= v < u < t
    size = (t - 2) * (t - 1) // 2
    a = np.zeros(size)
    b = np.zeros(size)
    c = np.zeros(size)
    idx = 0
    for u in xrange(2, t):
        for v in xrange(1, u):
            a[idx] = t
            b[idx] = u
            c[idx] = v
            idx += 1
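For completeness, here is a fully vectorized sketch of my own (it uses np.tril_indices, which neither answer above relies on) that builds all three arrays for a fixed t with no Python loop at all. The pairs (u, v) with 2 <= u < t and 1 <= v < u map onto the lower triangle (including the diagonal) of a (t-2) x (t-2) grid:

import numpy as np

def triples(t):
    # row/column indices of the lower triangle, diagonal included
    u_idx, v_idx = np.tril_indices(t - 2)
    a = np.full(len(u_idx), t)  # t repeated once per pair
    b = u_idx + 2               # u runs over 2 .. t-1
    c = v_idx + 1               # v runs over 1 .. u-1
    return a, b, c

For t = 5 this reproduces the arrays from the question: a = [5 5 5 5 5 5], b = [2 3 3 4 4 4], c = [1 1 2 1 2 3].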
Let's say we have a simple 1D ndarray. That is:
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
I want to get the first 3 and the last 2 values, so that the output would be [ 1 2 3 9 10].
I have already solved this by slicing and then concatenating the pieces, as follows:
b = a[:3]
c = a[-2:]
a = np.concatenate([b, c])
However, I would like to know if there is a more direct way to achieve this using slices, such as a[:3 and -2:] for instance. As an alternative, I have already tried this:
a = a[np.r_[:3, -2:]]
but it does not seem to work. It returns only the first 3 values, that is [1 2 3].
Thanks in advance!
Slicing a numpy array needs to be contiguous, AFAIK. The np.r_[-2:] part does not work because np.r_ does not know how big the array a is. You could do np.r_[:3, len(a)-2:len(a)], but this will still copy the data, since you are indexing with another array.
If you want to avoid copying data or doing any concatenation operation you could use np.lib.stride_tricks.as_strided:
ds = a.dtype.itemsize
np.lib.stride_tricks.as_strided(a, shape=(2,2), strides=(ds * 8, ds)).ravel()
Output:
array([ 1, 2, 9, 10])
But since you want the first 3 and the last 2 values, the strides for accessing the two groups will not be equal. This is a bit trickier, but I suppose you could do:
np.lib.stride_tricks.as_strided(a, shape=(2,3), strides=(ds * 8, ds)).ravel()[:-1]
Output:
array([ 1, 2, 3, 9, 10])
Note that this is a potentially dangerous operation, because the last element of the view reads outside the array's allocated memory.
On reflection, I cannot find a way to do this operation without copying the data somehow. The ravel calls in the snippets above are forced to make a copy of the data. If you can live with using the shapes (2,2) or (2,3) it might work in some cases, but you will only have read permission on the strided view, and this should be enforced by setting the keyword writeable=False.
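For example, a read-only view along those lines (a minimal sketch reusing the same a and ds as above, just to make the writeable=False point concrete):

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
ds = a.dtype.itemsize

# two rows of two elements, starting at a[0] and a[8]; the view is read-only
view = np.lib.stride_tricks.as_strided(
    a, shape=(2, 2), strides=(ds * 8, ds), writeable=False
)
view.ravel()  # array([ 1,  2,  9, 10]) -- ravel() makes the copy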
You could try to access the elements with a list of indices.
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a[[0,1,2,8,9]] # b should now be array([ 1, 2, 3, 9, 10])
Obviously, if your array is too long, you would not want to type out all the indices.
Thus, you could build the inner index list from for loops.
Something like that:
index_list = list(range(3)) + list(range(8, 10))
b = a[index_list] # b should now be array([ 1, 2, 3, 9, 10])
Therefore, as long as you know where your desired elements are, you can access them individually.
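As a side note (my own sketch, not part of either answer above): np.r_ can be made to work here after all if the second slice is given an explicit stop of 0, because fancy indexing accepts negative indices:

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# np.r_[-2:0] expands to [-2, -1], which index the last two elements
b = a[np.r_[0:3, -2:0]]  # array([ 1,  2,  3,  9, 10])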
I am trying to use numpy in Python to solve a problem in my project.
I have a random binary array rndm = [1, 0, 1, 1] and a resource_arr = [[2, 3], 4, 2, [1, 2]]. What I am trying to do is multiply them element-wise and then take the sum of each nested element. For the sample above, the expected output is
output = [5, 0, 2, 3]. I find this hard to solve because of the nested array/list.
So far my code looks like this:
def fitness_score():
    output = numpy.add(rndm * resource_arr)
    return output

fitness_score()
I keep getting
ValueError: invalid number of arguments.
I think this is because of the addition that I am trying to do. Any help would be appreciated. Thank you!
NumPy wants rectangular (non-jagged) arrays, and resource_arr is not a valid one because its elements have different lengths. In your case a plain Python list is more suitable:
def sum_nested(l):
    tmp = []
    for element in l:
        if isinstance(element, list):
            tmp.append(numpy.sum(element))
        else:
            tmp.append(element)
    return tmp
In this function we check, for each element inside l, whether it is a list. If so, we sum its elements. On the other hand, if the encountered element is just a number, we leave it untouched. Please note that this only works for one level of nesting.
Now, if we run sum_nested([[2, 3], 4, 2, [1, 2]]) we will get [5, 4, 2, 3]. All that's left is multiplying this result by the elements of rndm, which can be achieved easily using numpy:
def fitness_score(a, b):
    return numpy.multiply(a, sum_nested(b))
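As a quick usage check with the sample inputs from the question, this gives the expected result:

rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]

fitness_score(rndm, resource_arr)  # array([5, 0, 2, 3])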
Numpy is all about non-jagged arrays. You can do things with jagged arrays, but doing so efficiently and elegantly isn't trivial.
Almost always, finding a way to map your data structure to a non-nested one, for instance by encoding the information as below, will be more flexible and more performant.
resource_arr = (
    [0, 0, 1, 2, 3, 3],
    [2, 3, 4, 2, 1, 2],
)
That is, an integer denoting the 'row' each value belongs to, paired with an array of equal size of the values themselves.
This may 'feel' wasteful coming from a C-style way of doing arrays (more memory consumption!), but staying away from nested data structures is almost certainly your best bet, both for performance and for how much of the numpy/scipy ecosystem will actually be compatible with your data representation. Whether it really uses more memory is questionable: every Python object costs a ton of bytes, so if you have only a few elements per nesting level, the flat layout is the more memory-efficient solution too.
In this case, that would give you the following efficient solution to your problem:
output = np.bincount(*resource_arr) * rndm
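A quick sanity check with the numbers from the question (note that passing weights makes bincount return floats):

import numpy as np

rndm = [1, 0, 1, 1]
resource_arr = (
    [0, 0, 1, 2, 3, 3],
    [2, 3, 4, 2, 1, 2],
)

# bincount sums, for each row index, the weights that share that index
np.bincount(*resource_arr) * rndm  # array([5., 0., 2., 3.])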
I have not worked much with pandas/numpy, so I'm not sure if this is the most efficient way, but it works (at least for the example you have shown):
import numpy as np

rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]

# newer NumPy versions refuse to build a ragged array implicitly,
# so the object dtype is made explicit here
multiplied_output = np.multiply(rndm, np.array(resource_arr, dtype=object))
print(multiplied_output)

output = []
for elem in multiplied_output:
    output.append(sum(elem) if isinstance(elem, list) else elem)

final_output = np.array(output)
print(final_output)
I have a 41000x3 numpy array that I call "sortedlist" in the function below. The third column has a bunch of values, some of which are duplicates and some of which are not. I'd like to take a sample of unique values (no duplicates) from the third column, which is sortedlist[:,2]. I think I can do this easily with numpy.random.sample(sortedlist[:,2], sample_size). The problem is that I'd like to return not only those values, but all three columns of the rows whose last column contains the randomly chosen values.
EDIT: By unique values I mean I want to choose random values which appear only once. So If I had an array:
array = [[0, 6, 2],
         [5, 3, 9],
         [3, 7, 1],
         [5, 3, 2],
         [3, 1, 1],
         [5, 2, 8]]
And I wanted to choose 4 values of the third column, I want to get something like new_array_1 out:
new_array_1 = [[5, 3, 9],
               [3, 7, 1],
               [5, 3, 2],
               [5, 2, 8]]
But I don't want something like new_array_2, where two values in the 3rd column are the same:
new_array_2 = [[5, 3, 9],
               [3, 7, 1],
               [5, 3, 2],
               [3, 1, 1]]
I have the code to choose random values but without the criterion that they shouldn't be duplicates in the third column.
sample_size = 100
rand_sortedlist = sortedlist[np.random.randint(len(sortedlist), size=sample_size), :]
I'm trying to enforce this criterion by doing something like this
array_index = where( array[:,2] == sample(SelectionWeight, sample_size) )
But I'm not sure if I'm on the right track. Any help would be greatly appreciated!
I can't think of a clever numpythonic way to do this that doesn't involve multiple passes over the data. (Sometimes numpy with multiple passes is still so much faster than pure Python that it's the fastest way to go, but it never feels right.)
In pure Python, I'd do something like
import random

def draw_unique(vec, n):
    # group indices by value
    d = {}
    for i, x in enumerate(vec):
        d.setdefault(x, []).append(i)
    # sample n distinct values, then pick one index for each chosen value
    drawn = [random.choice(d[k]) for k in random.sample(list(d), n)]
    return drawn
which would give
>>> a = np.random.randint(0, 10, (41000, 3))
>>> drawn = draw_unique(a[:,2], 3)
>>> drawn
[4219, 6745, 25670]
>>> a[drawn]
array([[5, 6, 0],
       [8, 8, 1],
       [5, 8, 3]])
I can think of some tricks with np.bincount and scipy.stats.rankdata, but they hurt my head, and there always winds up being one step at the end that I can't see how to vectorize... and if I'm not vectorizing the whole thing, I might as well use the above, which at least is simple.
I believe this will do what you want. Note that the running time will almost certainly be dominated by whatever method you use to generate your random numbers. (An exception is if the dataset is gigantic but you only need a small number of rows, in which case very few random numbers need to be drawn.) So I'm not sure this will run much faster than a pure python method would.
# arrayify your list of lists
# please don't use `array` as a variable name!
a = np.asarray(arry)

# sort by the third column ... always the first step for efficiency
a2 = a[np.argsort(a[:, 2])]

# identify positions where the 3rd column repeats
# note this mask is one element shorter than a2
duplicate_rows = np.diff(a2[:, 2]) == 0

# if duplicate_rows[N], then we want to remove rows N and N+1
keep_mask = np.ones(len(a2), dtype=bool)   # all True
keep_mask[:-1][duplicate_rows] = False     # remove row N
keep_mask[1:][duplicate_rows] = False      # remove row N + 1

# now actually slice the array
a3 = a2[keep_mask]

# select rows from a3 using your preferred random number generator
# I actually prefer `random` over numpy.random for sampling w/o replacement
import random
result = a3[random.sample(xrange(len(a3)), DESIRED_NUMBER_OF_ROWS)]
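On a reasonably recent NumPy, np.unique offers a shorter route to one row per distinct third-column value. Note that this sketch always keeps the first occurrence of each value rather than a random representative, so it is an alternative under that assumption, not a drop-in replacement for the approaches above:

import numpy as np

def sample_distinct_rows(a, n):
    # one row index per distinct value in the third column
    # (np.unique with return_index keeps the first occurrence)
    _, first_idx = np.unique(a[:, 2], return_index=True)
    # sample n of those representative rows without replacement
    chosen = np.random.choice(first_idx, size=n, replace=False)
    return a[chosen]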
I want to set specific elements of my array to a specific value with low overhead.
For example, if I have a = numpy.array([1,2,3,0,4,0]), I want to change every 0 value to 10; in the end I want to have [1, 2, 3, 10, 4, 10].
In Matlab you can do this easily with a(a==0) = 10; is there any equivalent in numpy?
Remarkably similar to Matlab:
>>> a[a == 0] = 10
>>> a
array([ 1, 2, 3, 10, 4, 10])
There's a really nice "NumPy for Matlab Users" guide at the SciPy website.
I should note, this doesn't work on regular Python lists. NumPy arrays are a different datatype that work a lot more like a Matlab matrix than a Python list in terms of access and math operators.
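If you would rather not modify a in place, np.where gives a copying variant of the same idea:

import numpy as np

a = np.array([1, 2, 3, 0, 4, 0])

# returns a new array; a itself is left unchanged
b = np.where(a == 0, 10, a)  # array([ 1,  2,  3, 10,  4, 10])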
A little more pythonic way would be like this, I guess:
import numpy
a = numpy.array([1, 2, 3, 0, 4, 0])

for k, v in enumerate(a):
    if v == 0:
        a[k] = 10

print a
An even more pythonic way (provided by @mtrw):
[10 if k == 0 else k for k in a]
What is the best way to work with two consecutive values in a numpy array?
example:
npdata = np.array([13, 15, 20, 25])

for i in range(len(npdata)):
    print npdata[i] - npdata[i+1]
this looks really messy and additionally needs special handling for the last iteration of the loop (as written, npdata[i+1] runs off the end of the array).
any ideas?
Thanks!
numpy provides a function diff for this basic use case
>>> import numpy
>>> x = numpy.array([1, 2, 4, 7, 0])
>>> numpy.diff(x)
array([ 1, 2, 3, -7])
Your snippet computes something closer to -numpy.diff(x).
How about range(len(npdata) - 1) ?
Here's code (using a simple array, but it doesn't matter):
>>> ar = [1, 2, 3, 4, 5]
>>> for i in range(len(ar) - 1):
... print ar[i] + ar[i + 1]
...
3
5
7
9
As you can see it successfully prints the sums of all consecutive pairs in the array, without any exceptions for the last iteration.
You can use ediff1d to get the differences of consecutive elements. More generally, a[1:] - a[:-1] gives the differences of consecutive elements, and the same slicing pattern can be used with other operators as well.
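For example, with the array from the question:

import numpy as np

npdata = np.array([13, 15, 20, 25])

np.ediff1d(npdata)        # array([2, 5, 5])
npdata[1:] - npdata[:-1]  # the same differences via slicing
npdata[:-1] - npdata[1:]  # the question's orientation: array([-2, -5, -5])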