For performance reasons I'd like to use the Python list insert() method. I will demonstrate why:
My final list is a 31k * 31k matrix:
w=31*10**3
h=31*10**3
distance_matrix = [[0 for x in range(w)] for y in range(h)]
I intend to update the matrix one iteration at a time:
for i in range(len(index)):
    for j in range(len(index)):
        distance_matrix[index[i]][index[j]] = k[0][i][j]
Obviously this doesn't perform well.
I'd rather start with an empty list and fill it up gradually, so the computation is intense only at the end of the process (and easy at the beginning):
distance_matrix = []
for i in range(len(index)):
    for j in range(len(index)):
        distance_matrix.insert([index[i]][index[j]], k[0][i][j])
But this multi-index or list-in-list insert doesn't seem to be possible.
How would you advise me to proceed? I've also looked into numpy arrays, but without luck so far.
To be precise: updating the (ordered) large array of zeros index by index is the issue here. In a DataFrame I can use custom columns/indices, but that does not scale performance-wise.
Additional information:
I split up the entire original data matrix in parts to compute distance matrices in parallel. The issue in this process is to aggregate the distance matrix again with the computed values. The distance matrix/array is very large, therefore a simple list insert or edit takes very long.
I think this approach achieves what I had in mind:
distance_matrix = []

def dynamic_append(x, i, j, val):
    # Grow the outer list until row i exists, then grow row i until column j exists,
    # so the structure is only ever as large as the highest index written so far.
    if (len(x) - 1) < i:
        dif_x = i - len(x) + 1
        for _ in range(dif_x):
            x.append([])
        dif_y = j - len(x[i]) + 1
        for _ in range(dif_y):
            x[i].append([])
    elif (len(x[i]) - 1) < j:
        dif_y = j - len(x[i]) + 1
        for _ in range(dif_y):
            x[i].append([])
    x[i][j] = val
    return x
for i in range(len(index)):
    for j in range(len(index)):
        distance_matrix = dynamic_append(distance_matrix, index[i], index[j], k[0][i][j])
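If the per-block write is the bottleneck, a preallocated NumPy array with advanced indexing may also be worth revisiting. This is only a minimal sketch under the question's assumptions (index and k[0] are the structures computed per block), and note that a 31k x 31k float64 array needs roughly 7.7 GB of RAM:
import numpy as np

w = h = 31 * 10**3
distance_matrix = np.zeros((h, w), dtype=np.float64)   # preallocated once, ~7.7 GB

# index lists the rows/columns this computed block belongs to; k[0] is the
# block of distances, as in the question.
block = np.ix_(index, index)           # build a 2-D fancy index from the 1-D index list
distance_matrix[block] = k[0]          # write the whole block in one vectorized step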
Related
I am currently working with sparse matrices in Python. I chose to use lil_matrix for my problem because, as explained in the documentation, lil_matrix is intended for constructing sparse matrices. My sparse matrix has dimensions 2500x2500.
I have two pieces of code inside two loops (which iterate over the matrix elements) that have very different execution times, and I want to understand why. The first one is
current = lil_matrix_A[i,j]
lil_matrix_A[i, j] = current + 1
lil_matrix_A[j, i] = current + 1
Basically just taking every element of the matrix and incrementing its value by one.
And the second one is as below
value = lil_matrix_A[i, j]
temp = (value * 10000) / (dictionary[listA[i]] * dictionary[listB[j]])
lil_matrix_A[i, j] = temp
lil_matrix_A[j, i] = temp
Basically taking the value, applying a formula, and inserting the new value into the matrix.
The first piece of code executes in around 0.4 seconds and the second in around 32 seconds.
I understand that the second one has an extra calculation in the middle, but the time difference, in my opinion, does not make sense. Dictionary and list indexing are O(1), so they should not be the problem. Does anyone have a suggestion as to what is causing this difference in execution time?
Note: the number of elements in the lists and the dictionary is also 2500.
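For anyone who wants to reproduce the comparison, a minimal self-contained harness along these lines should work; the matrix contents, listA, listB and the dictionary values are placeholders, and n is kept small so the demo finishes quickly:
import time
from scipy.sparse import lil_matrix

n = 500                                     # placeholder size; the question uses 2500
lil_matrix_A = lil_matrix((n, n))
listA = list(range(n))
listB = list(range(n))
dictionary = {key: 2.0 for key in range(n)}

start = time.time()
for i in range(n):
    for j in range(n):
        current = lil_matrix_A[i, j]
        lil_matrix_A[i, j] = current + 1
        lil_matrix_A[j, i] = current + 1
print('increment loop: %.2f s' % (time.time() - start))

start = time.time()
for i in range(n):
    for j in range(n):
        value = lil_matrix_A[i, j]
        temp = (value * 10000) / (dictionary[listA[i]] * dictionary[listB[j]])
        lil_matrix_A[i, j] = temp
        lil_matrix_A[j, i] = temp
print('formula loop: %.2f s' % (time.time() - start))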
I would like to perform the operation A[i, j, k] = sum_p X[i, j, p] * alpha[i, p, k].
If X had a regular shape, then I could use np.einsum; I believe the syntax would be
np.einsum('ijp,ipk->ijk',X, alpha)
Unfortunately, my data X has a non-regular structure on the 1st axis (if we zero-index).
To give a little more context, X[i][j][p] refers to the p-th feature of the j-th member of the i-th group. Because groups have different sizes, X is effectively a list of lists of different lengths, of lists of the same length.
alpha has a regular structure and thus can be stored as a standard numpy array (it comes in 1-dimensional, and then I use alpha.reshape(a,b,c), where a, b, c are problem-specific integers).
I would like to avoid storing X as a list of lists of lists or a list of np.arrays of different dimensions and writing something like
A = []
for i in range(num_groups):
    temp = np.empty((group_sizes[i], alpha.shape[2]), dtype=float)
    for j in range(group_sizes[i]):
        temp[j] = np.einsum('p,pk->k', X[i][j], alpha[i, :, :])
    A.append(temp)
Is there some nice numpy function/data structure for doing this, or am I going to have to settle for a partially vectorised implementation?
I know this sounds obvious, but, if you can afford the memory, I'd start by simply checking the performance you get by padding the data to a uniform size, that is, just adding zeros and performing the operation (sketched below). Sometimes a simpler solution is faster than a supposedly more optimal one that has more Python/C roundtrips.
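A rough sketch of that padding idea, assuming X and alpha follow the question's conventions (the padded array costs memory proportional to the largest group):
import numpy as np

max_size = max(len(x_i) for x_i in X)          # size of the largest group
p = len(X[0][0])                               # number of features

X_padded = np.zeros((len(X), max_size, p), dtype=alpha.dtype)
for i, x_i in enumerate(X):
    X_padded[i, :len(x_i)] = x_i               # zero-pad the ragged member axis

A_padded = np.einsum('ijp,ipk->ijk', X_padded, alpha)
# Rows beyond each group's real size are zero; trim them off if needed:
A = [A_padded[i, :len(x_i)] for i, x_i in enumerate(X)]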
If that doesn't work, then your best bet, as Tom Wyllie suggested, is probably a bucketing strategy. Assuming X is your list of lists of lists and alpha is an array, you can start by collecting the sizes of the second index (maybe you already have this):
X_sizes = np.array([len(x_i) for x_i in X])
And sort them:
idx_sort = np.argsort(X_sizes)
X_sizes_sorted = X_sizes[idx_sort]
Then you choose a number of buckets, which is the number of divisions of your work. Let's say you pick BUCKETS = 4. You just need to divide the data so that more or less each piece is the same size:
sizes_cumsum = np.cumsum(X_sizes_sorted)
total = sizes_cumsum[-1]
bucket_idx = []
for i in range(BUCKETS):
    low = np.round(i * total / float(BUCKETS))
    high = np.round((i + 1) * total / float(BUCKETS))
    if i == BUCKETS - 1:
        high = total + 1  # make sure the largest group lands in the last bucket
    m = (sizes_cumsum >= low) & (sizes_cumsum < high)
    idx = np.where(m)[0]
    # Make relative to X, not idx_sort
    idx = idx_sort[idx]
    bucket_idx.append(idx)
And then you make the computation for each bucket:
bucket_results = []
for idx in bucket_idx:
    # The last index in the bucket will be the biggest
    bucket_size = X_sizes[idx[-1]]
    # Fill the (zero-padded) bucket array
    X_bucket = np.zeros((len(idx), bucket_size, len(X[0][0])), dtype=alpha.dtype)
    for i, X_i in enumerate(idx):
        X_bucket[i, :X_sizes[X_i]] = X[X_i]
    # Compute, keeping only the alpha slices for the groups in this bucket
    res = np.einsum('ijp,ipk->ijk', X_bucket, alpha[idx])
    bucket_results.append(res)
Filling the array X_bucket will probably be the slow part here. Again, if you can afford the memory, it would be more efficient to have X in a single padded array and then just slice X[idx, :bucket_size, :].
Finally, you can put back your results into a list:
result = [None] * len(X)
for res, idx in zip(bucket_results, bucket_idx):
    for r, X_i in zip(res, idx):
        result[X_i] = r[:X_sizes[X_i]]
Sorry I'm not giving a proper function, but I'm not sure exactly what your input or expected output looks like, so I've just laid out the pieces for you to use as you see fit.
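For reference, here is one way the pieces above could be strung together into a single helper; this is only a sketch under the same assumptions as above (X is a list of groups, each a list of equal-length feature vectors, and alpha has shape (len(X), p, k)):
import numpy as np

def bucketed_einsum(X, alpha, buckets=4):
    """Compute A[i][j, k] = sum_p X[i][j][p] * alpha[i, p, k] for ragged X."""
    X_sizes = np.array([len(x_i) for x_i in X])
    idx_sort = np.argsort(X_sizes)
    sizes_cumsum = np.cumsum(X_sizes[idx_sort])
    total = sizes_cumsum[-1]

    result = [None] * len(X)
    for b in range(buckets):
        low = np.round(b * total / float(buckets))
        high = np.round((b + 1) * total / float(buckets))
        if b == buckets - 1:
            high = total + 1                      # keep the largest group
        m = (sizes_cumsum >= low) & (sizes_cumsum < high)
        idx = idx_sort[np.where(m)[0]]
        if len(idx) == 0:
            continue
        bucket_size = X_sizes[idx[-1]]            # biggest group in this bucket
        X_bucket = np.zeros((len(idx), bucket_size, len(X[0][0])), dtype=alpha.dtype)
        for i, X_i in enumerate(idx):
            X_bucket[i, :X_sizes[X_i]] = X[X_i]
        res = np.einsum('ijp,ipk->ijk', X_bucket, alpha[idx])
        for r, X_i in zip(res, idx):
            result[X_i] = r[:X_sizes[X_i]]
    return result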
I have some very large lists that I am working with (>1M rows), and I am trying to find a fast (the fastest?) way of, given a float, ranking that float against the list of floats and finding its percentage rank relative to the range of the list. Here is my attempt, but it's extremely slow:
X =[0.595068426145485,
0.613726840488019,
1.1532608695652,
1.92952380952385,
4.44137931034496,
3.46432160804035,
2.20331487122673,
2.54736842105265,
3.57702702702689,
1.93202764976956,
1.34720184204056,
0.824997304105564,
0.765782842381996,
0.615110856990126,
0.622708022872803,
1.03211045820975,
0.997225012974318,
0.496352327702226,
0.67103858866700,
0.452224068868272,
0.441842124852685,
0.447584524952608,
0.4645525042246]
val = 1.5
arr = np.array(X) #X is actually a pandas column, hence the conversion
arr = np.insert(arr,1,val, axis=None) #insert the val into arr, to then be ranked
st = np.sort(arr)
RANK = float([i for i,k in enumerate(st) if k == val][0])+1 #Find position
PCNT_RANK = (1-(1-round(RANK/len(st),6)))*100 #Find percentage of value compared to range
print RANK, PCNT_RANK
>>> 17.0 70.8333
For the percentage ranks I could probably build a distribution and sample from it; I'm not quite sure yet, and any suggestions are welcome. It's going to be used heavily, so any speed-up will be advantageous.
Thanks.
Sorting the array seems to be rather slow. If you don't need the array to be sorted in the end, then numpy's boolean operations are quicker.
arr = np.array(X)
bool_array = arr < val # Returns boolean array
RANK = float(np.sum(bool_array))
PCT_RANK = RANK/len(X)
Or, better yet, use a list comprehension and avoid numpy altogether.
RANK = float(sum([x<val for x in X]))
PCT_RANK = RANK/len(X)
Doing some timing, the numpy solution above gives 6.66 us on my system while the list comprehension method gives 3.74 us.
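For completeness, a sketch of how such micro-benchmarks can be run with timeit; it assumes X and val are defined at module level as in the question:
import timeit

setup = "from __main__ import X, val; import numpy as np; arr = np.array(X)"

numpy_stmt = "float(np.sum(arr < val)) / len(X)"
listcomp_stmt = "float(sum([x < val for x in X])) / len(X)"

print(timeit.timeit(numpy_stmt, setup=setup, number=100000))     # total seconds for 100k runs
print(timeit.timeit(listcomp_stmt, setup=setup, number=100000))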
The two slow parts of your code are:
st = np.sort(arr). Sorting the list takes on average O(n log n) time, where n is the size of the list.
RANK = float([i for i, k in enumerate(st) if k == val][0]) + 1. Iterating through the list takes O(n) time.
If you don't need to sort the list, then as #ChrisMueller points out, you can just iterate through it once without sorting, which takes O(n) time and will be the fastest option.
If you do need to sort the list (or have access to it pre-sorted), then the fastest option for the second step is RANK = np.searchsorted(st, val) + 1. Since the list is already sorted, finding the index will only take O(log n) time by binary search instead of having to iterate through the whole list. This will still be a lot faster than your original code.
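To illustrate, a quick sketch of the sorted-path variant, reusing X and val from the question:
import numpy as np

# Pre-sorted array with val already inserted, as in the original code.
st = np.sort(np.insert(np.array(X), 1, val, axis=None))

RANK = np.searchsorted(st, val) + 1          # binary search, O(log n), instead of a full scan
PCNT_RANK = RANK / float(len(st)) * 100
print(RANK, PCNT_RANK)                        # 17 and ~70.83, matching the original output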
I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me an array of zeros that will be 5X5. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        .........
        "do function and fill the array corresponding to (i,j)"
however, right now I would like a[2,10] to hold the function of 2 and 10, but instead the result for 2 and 10 ends up at an index like a[1,4] or whatever.
Again, maybe this is elementary; I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops I could not vectorize in Cython. I also used Joblib to parallelize the operation. I stored the results in a list because an array was not filling correctly when running in parallel. I then used itertools to split the list into individual results and pandas to organize them.
Thank you for all the help
Some tips to help you get things done while keeping good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys):
    return xs**2 + ys**2 + xs*ys
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding of how array operations work. This will help you design a function that handles arrays properly.
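As a concrete sketch for the 5x5 grid in the question (f here is just the toy example above; the real "complicated function" would need to accept arrays in the same way):
import numpy as np

def f(xs, ys):                      # toy example from above
    return xs**2 + ys**2 + xs*ys

x = np.arange(0, 10, 2)
y = np.arange(0, 10, 2)

# A column vector broadcast against a row vector evaluates f on the whole
# 5x5 grid at once, so a[i, j] == f(x[i], y[j]).
a = f(x[:, np.newaxis], y[np.newaxis, :])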
First, you are missing some parentheses in the call to zeros; the first argument should be a tuple:
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i/2 and j/2:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        # do function and fill the array corresponding to (i,j)
        a[i/2, j/2] = 1
But I second Saullo Castro, you should try to vectorize your computations.
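If the explicit loop has to stay, another option (a sketch, not from the original answer) is to enumerate the range so the loop counters are already array indices and no division is needed:
from numpy import zeros

a = zeros((5, 5))
for ii, i in enumerate(xrange(0, 10, 2)):
    for jj, j in enumerate(xrange(0, 10, 2)):
        # i, j are the physical coordinates; ii, jj are the array indices
        a[ii, jj] = 1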
My question is about a specific array operation that I want to express using numpy.
I have an array of floats w and an array of indices idx of the same length as w, and I want to sum up all entries of w that share the same idx value and collect the sums in an array v.
As a loop, this looks like this:
for i, x in enumerate(w):
    v[idx[i]] += x
Is there a way to do this with array operations?
My guess was v[idx] += w but that does not work, since idx contains the same index multiple times.
Thanks!
numpy.bincount was introduced for this purpose:
tmp = np.bincount(idx, w)
v[:len(tmp)] += tmp
I think as of 1.6 you can also pass a minlength to bincount.
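With minlength available, the length bookkeeping above disappears; a small self-contained sketch with made-up v, idx and w:
import numpy as np

v = np.zeros(5)
idx = np.array([0, 1, 1, 3])
w = np.array([0.5, 1.0, 2.0, 0.25])

# minlength pads the counts up to len(v), so the shapes always match
v += np.bincount(idx, weights=w, minlength=len(v))
# v is now [0.5, 3.0, 0.0, 0.25, 0.0]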
This is a known behavior and, though somewhat unfortunate, does not have a numpy-level workaround. (bincount can be used for this if you twist its arm.) Doing the loop yourself is really your best bet.
Note that your code might have been a bit clearer without re-using the name w and without introducing another set of indices, like
for i, w_thing in zip(idx, w):
    v[i] += w_thing
If you need to speed up this loop, you might have to drop down to C. Cython makes this relatively easy.