This may be more of an 'approach' or conceptual question.
Basically, I have a multi-dimensional Python list like so:
my_list = [[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]]
What I have to do is iterate through the array and compare each element with those directly surrounding it, as though the list were laid out as a matrix.
For instance, given the first element of the first row, my_list[0][0], I need to know the value of my_list[0][1], my_list[1][0] and my_list[1][1]. The values of the 'surrounding' elements will determine how the current element should be operated on. Of course, for an element in the heart of the array, 8 comparisons will be necessary.
Now I know I could simply iterate through the array and compare with the indexed values, as above. I was curious as to whether there is a more efficient way that limits the amount of iteration required. Should I iterate through the array as is, or iterate and compare only values to either side, then transpose the array and run it again? This, however, would ignore the diagonal values. And should I store the results of the element lookups, so I don't keep determining the value of the same element multiple times?
I suspect this may have a fundamental approach in Computer Science, and I am eager to get feedback on the best approach using Python as opposed to looking for a specific answer to my problem.
You may get faster, and possibly even simpler, code by using numpy or other alternatives (see below for details). But from a theoretical point of view, in terms of algorithmic complexity, the best you can get is O(N*M), and you can do that with your design (if I understand it correctly). For example:
def neighbors(matrix, row, col):
    # Yield the up-to-8 cells adjacent to (row, col), skipping out-of-bounds
    # indices and the cell itself.
    for i in row-1, row, row+1:
        if i < 0 or i == len(matrix): continue
        for j in col-1, col, col+1:
            if j < 0 or j == len(matrix[i]): continue
            if i == row and j == col: continue
            yield matrix[i][j]

matrix = [[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]]
for i, row in enumerate(matrix):
    for j, cell in enumerate(row):
        for neighbor in neighbors(matrix, i, j):
            do_stuff(cell, neighbor)  # placeholder for your per-neighbor operation
This takes N * M * 8 steps (actually, a bit less than that, because many cells will have fewer than 8 neighbors). And algorithmically, there's no way you can do better than O(N * M). So, you're done.
(In some cases, you can make things simpler, with no significant change either way in performance, by thinking in terms of iterator transformations. For example, you can easily create a grouper over adjacent triplets from a list a by properly zipping a, a[1:], and a[2:], and you can extend this to adjacent 2-dimensional nonets. But I think in this case, it would just make your code more complicated than writing an explicit neighbors iterator and explicit for loops over the matrix.)
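For instance, here is a minimal sketch of the 1-D triplet grouper mentioned in the parenthetical above (variable names are mine):

a = [0, 1, 1, 1, 0, 1]
triplets = list(zip(a, a[1:], a[2:]))
# [(0, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1)]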
However, practically, you can get a whole lot faster, in various ways. For example:
Using numpy, you may get an order of magnitude or so faster. When you're iterating a tight loop and doing simple arithmetic, that's one of the things that Python is particularly slow at, and numpy can do it in C (or Fortran) instead.
Using your favorite GPGPU library, you can explicitly vectorize your operations.
Using multiprocessing, you can break the matrix up into pieces and perform multiple pieces in parallel on separate cores (or even separate machines).
Of course for a single 4x6 matrix, none of these are worth doing… except possibly for numpy, which may make your code simpler as well as faster, as long as you can express your operations naturally in matrix/broadcast terms.
In fact, even if you can't easily express things that way, just using numpy to store the matrix may make things a little simpler (and save some memory, if that matters). For example, numpy can let you access a single column from a matrix naturally, while in pure Python, you need to write something like [row[col] for row in matrix].
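For instance (a small sketch of that column-access point, with names of my choosing):

import numpy as np

matrix = np.array([[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]])
third_column = matrix[:, 2]   # natural column access in numpy
# pure-Python equivalent: [row[2] for row in my_list]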
So, how would you tackle this with numpy?
First, you should read over numpy.matrix and ufunc (or, better, some higher-level tutorial, but I don't have one to recommend) before going too much further.
Anyway, it depends on what you're doing with each set of neighbors, but there are three basic ideas.
First, if you can convert your operation into simple matrix math, that's always easiest.
If not, you can create 8 "neighbor matrices" just by shifting the matrix in each direction, then perform simple operations against each neighbor. For some cases, it may be easier to start with an N+2 x N+2 matrix with suitable "empty" values (usually 0 or nan) in the outer rim. Alternatively, you can shift the matrix over and fill in empty values. Or, for some operations, you don't need an identical-sized matrix, so you can just crop the matrix to create a neighbor. It really depends on what operations you want to do.
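To illustrate the padded-rim idea, here is one possible way to build it (np.pad is my choice here; the example below constructs the rim by hand instead):

import numpy as np

matrix = np.array([[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]])
padded = np.pad(matrix, 1, mode='constant', constant_values=0)  # rim of zeros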
For example, taking your input as a fixed 4x6 board for the Game of Life:
import numpy as np

def neighbors(matrix):
    # Yield the 8 copies of the board shifted one step in each direction.
    # np.roll wraps around at the edges, which is why the board below is
    # padded with a rim of zeros.
    for i in -1, 0, 1:
        for j in -1, 0, 1:
            if i == 0 and j == 0: continue
            yield np.roll(np.roll(matrix, i, 0), j, 1)

matrix = np.matrix([[0,0,0,0,0,0,0,0],
                    [0,0,1,1,1,0,1,0],
                    [0,1,1,1,0,0,1,0],
                    [0,1,1,0,0,0,1,0],
                    [0,1,1,1,1,1,1,0],
                    [0,0,0,0,0,0,0,0]])

while True:
    livecount = sum(neighbors(matrix))
    matrix = (matrix & (livecount==2)) | (livecount==3)
(Note that this isn't the best way to solve this problem, but I think it's relatively easy to understand, and likely to illuminate whatever your actual problem is.)
Given a two-dimensional array T of size NxN, filled with various natural numbers (they do not have to be sorted in any way, as in the example below). My task is to write a program that transforms the array in such a way that all elements lying above the main diagonal are larger than each element lying on the diagonal, and all elements lying below the main diagonal are smaller than each element on the diagonal.
For example:
T looks like this:
[2, 3, 5]
[7, 11, 13]
[17, 19, 23]

and one of the possible solutions is:

[13, 19, 23]
[3, 7, 17]
[5, 2, 11]
I have no clue how to do this. Would anyone have an idea what algorithm should be used here?
Let's say the matrix is NxN.
Put all N² values inside an array.
Sort the array with whatever method you prefer (ascending order).
In your final array, the first (N²-N)/2 values go below the diagonal, the following N values go on the diagonal, and the final (N²-N)/2 values go above the diagonal.
The following pseudo-code should do the job:
mat <- array[N][N] // To be initialized.
vec <- array[N*N]

for i : 0 to (N-1)
    for j : 0 to (N-1)
        vec[i*N+j] = mat[i][j]
    next j
next i

sort(vec)

p_below <- 0
p_diag  <- (N*N-N)/2
p_above <- (N*N+N)/2

for i : 0 to (N-1)
    for j : 0 to (N-1)
        if (i>j)
            mat[i][j] = vec[p_above]
            p_above <- p_above + 1
        endif
        if (i<j)
            mat[i][j] = vec[p_below]
            p_below <- p_below + 1
        endif
        if (i=j)
            mat[i][j] = vec[p_diag]
            p_diag <- p_diag + 1
        endif
    next j
next i
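For reference, here is a direct Python translation of the pseudo-code (my sketch, not the answer's code; it keeps the pseudo-code's index convention):

def rearrange(mat):
    N = len(mat)
    # Flatten and sort all N*N values.
    vec = sorted(mat[i][j] for i in range(N) for j in range(N))
    p_below = 0
    p_diag = (N * N - N) // 2
    p_above = (N * N + N) // 2
    for i in range(N):
        for j in range(N):
            if i > j:
                mat[i][j] = vec[p_above]
                p_above += 1
            elif i < j:
                mat[i][j] = vec[p_below]
                p_below += 1
            else:
                mat[i][j] = vec[p_diag]
                p_diag += 1
    return mat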
The code can be heavily optimized by sorting the matrix directly, using a (quite complex) custom sort operator, so it can be sorted "in place". Technically, you'll build a bijection between the matrix indices and a partitioned set of indices representing the "below diagonal", "diagonal" and "above diagonal" positions.
But I'm unsure that it can be considered an algorithm in itself, because it will be highly dependent on the language used AND on how you stored your matrix internally (and how iterators/indices are used). I could write one in C++, but I lack the knowledge to give you such an operator in Python.
Obviously, if you can't use a standard sorting function (because it can't work on anything but an array), then you can write your own with the tricky comparison built into the algorithm.
For such small matrices, even a bubble sort can work properly, but obviously implementing at least a quicksort would be better.
A few notes on optimizing:
First, we use the trivial bijection from matrix coordinates [x][y] to a linear index i: i = x + y*N. The inverse is just as simple: y = floor(i/N) and x = i mod N. With this, you can treat the matrix as a vector.
This is already what I do in the first part initializing vec, BTW.
With matrix coordinates, it's easy:
Diagonal is all cells where x=y.
The "below" partition is everywhere x<y.
The "above" partition is everywhere x>y.
Look at the coordinates in the 3x3 matrix below; it's quite evident once you see it.
0,0 1,0 2,0
0,1 1,1 2,1
0,2 1,2 2,2
We already know that the ordered vector will be composed of three parts: first the "below" partition, then the "diagonal" partition, then the "above" partition.
The next bijection is far more tricky, since it requires either a piecewise linear function OR a look-up table. The first requires no additional memory but uses more CPU power; the second uses as much memory as the matrix but requires less CPU power.
As always, optimizing for speed often costs memory. If memory is scarce because you use huge matrices, then you'll prefer the function.
To keep this short, I'll explain it only for the "below" partition. In the vector, the first (N-1) elements belong to the first column. Then there are (N-2) elements for the 2nd column, (N-3) for the third, and so on, until there is only 1 element for the (N-1)th column. You can see the scheme: the number of elements plus the (zero-based) column index is always (N-1).
I won't write the function, because it's quite complex and, honestly, it wouldn't help much with understanding. Simply know that converting from matrix indices to a vector index is "quite easy".
The opposite is more tricky and CPU-intensive, and it SHOULD use an (N-1)-element vector storing where each column starts within the vector, to GREATLY speed up the process. Thankfully, this vector can also be used (from end to beginning) for the "above" partition, so it won't burn too much memory.
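As a small illustration of that look-up table (my sketch): the "below" partition stores (N-1) elements for the first column, (N-2) for the second, and so on, so the start offsets are just cumulative sums of those lengths:

N = 4
starts = [0]
for c in range(N - 2):
    starts.append(starts[-1] + (N - 1 - c))
# For N = 4: starts == [0, 3, 5], i.e. columns 0, 1 and 2 of the "below"
# partition begin at vector offsets 0, 3 and 5.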
Now, you can sort your "vector" normally, simply by chaining the two bijections together with the vector index, so each linear index resolves to a matrix cell. As long as the sorting algorithm is stable (that's usually the case), it will work and will sort your matrix "in place", at the expense of a lot of arithmetic to "route" the linear indexes to matrix indexes.
Please note that, although we speak about bijections, we need ONLY the "vector to matrix" formulas. The "matrix to vector" formulas are important (it MUST be a bijection!) but you won't use them, since you'll sort the (virtual) vector from 0 to N²-1 directly.
So I have a matrix looking something like
0.8, 0.7, nan, ...
0.1, nan, 0.1, ...
0.9, nan, 0.3, ...
with dimensions N X M, where M >> N
Now I want to go through all columns and change the values of the matrix as follows:
If the column has exactly K values >= 0, do nothing
If the column has L < K values >= 0, set K-L values of this column to X
If the column has S > K values >= 0, set S-K values of this column to nan
I am looking for a fast way to do this with python for a large optimization and my for loop is taking too long still.
The outcome in the above example, with K = 2, should look like:
0.8, 0.7, nan, ...
0.1, X,   0.1, ...
nan, nan, 0.3, ...
What I am considering at the moment is first sorting each column of the entire matrix (I would have to restore the previous order later, though) and then somehow applying np.where(num_cols != K, applyChanges(myMat), myMat), but I am not sure how to make np.where give me the columns of the matrix rather than the individual scalars.
Thanks for any suggestion :)
edit:
So I was able to speed up my code by a factor of slightly more than 5 by first using argsort on the columns. This way I could directly set the appropriate values without requiring any further "argwhere" calls, which were what made my previous code slow.
All that seems left to do now is to move from numpy arrays to Python lists, since I now run a for loop and access the elements directly, which is faster with lists.
edit2:
Using numba's @jit decorator I was able to increase the speed again by a factor of 5. Wow, I am impressed; pretty cool library (is that even the proper term for it?). Now my code is fast enough for my current requirements.
edit3:
As requested, here is my for loop; it is straightforward:
idx_sorted = np.argsort(pred_new, axis=0)
num_bands_selected = np.sum(pred_new >= 0, axis=0)

for idx_col in range(0, num_cols):
    if num_bands_selected[idx_col] < N:
        pred_new[idx_sorted[num_bands_selected[idx_col]-N:, idx_col], idx_col] = dummy_value
    elif num_bands_selected[idx_col] > N:
        pred_new[idx_sorted[-num_bands_selected[idx_col]:-N, idx_col], idx_col] = np.nan
pred_new is something a neural network returns, but the data has to be in a specific format for some postprocessing algorithm to work.
This could be sped up, to the best of my knowledge, in two ways (aside from the @jit, which was not included):
Using Python lists instead of numpy arrays (direct element access is faster on lists)
First selecting only the columns that require modification. Here I go through everything, but often only some subset (~50%) might require modification (a vectorized sketch follows below).
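For what it's worth, here is a fully vectorized sketch of the column rule (my own code, not from the post; since the post doesn't pin down which entries get changed, this version demotes the smallest non-negative entries and promotes the largest negative ones, and assumes X >= 0; the name adjust_columns is hypothetical):

import numpy as np

def adjust_columns(mat, K, X):
    # Enforce exactly K entries >= 0 per column, with no Python-level loop.
    out = mat.copy()
    n_rows = mat.shape[0]
    keys = np.where(np.isnan(mat), -np.inf, mat)   # nan sorts to the bottom
    order = np.argsort(keys, axis=0)               # ascending, per column
    ranks = np.empty_like(order)
    np.put_along_axis(ranks, order, np.arange(n_rows)[:, None], axis=0)
    top_k = ranks >= n_rows - K                    # the K largest cells per column
    selected = out >= 0                            # nan compares as False
    out[top_k & ~selected] = X                     # too few:  promote to X
    out[~top_k & selected] = np.nan                # too many: demote to nan
    return out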
If I use the code
from collections import deque

q = deque(maxlen=2)
while step <= step_max:
    calculate(item)
    q.append(item)
    another_calculation(q)
how does it compare in efficiency and readability to
q = []
while step <= step_max:
    calculate(item)
    q.append(item)
    q = q[-2:]
    another_calculation(q)
calculate() and another_calculation() are not real in this case, but in my actual program they are simply two calculations. I'm doing these calculations every step for millions of steps (I'm simulating an ion in 2-D space). Because there are so many steps, q gets very long and uses a lot of memory, while another_calculation() only uses the last two values of q. I had been using the latter method, then heard deque mentioned and thought it might be more efficient; thus the question.
I.e., how do deques in python compare to just normal list slicing?
q = q[-2:]
Now this is a costly operation because it recreates a list every time (and copies the references). A nasty side effect is that it also rebinds the name q, although you can use q[:] = q[-2:] to avoid that.
The deque object just advances its internal start pointer and "forgets" the oldest item. So it's faster, and this is one of the use cases it was designed for.
Of course, for 2 values, there isn't much difference, but for a bigger number there is.
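If you want to measure it yourself, here is a quick sketch with timeit (my code; absolute numbers vary by machine and Python version):

import timeit

deque_version = timeit.timeit(
    "q.append(1)",
    setup="from collections import deque; q = deque(maxlen=2)",
    number=1_000_000)
list_version = timeit.timeit(
    "q.append(1); q = q[-2:]",
    setup="q = []",
    number=1_000_000)
print(deque_version, list_version)  # the deque should come out well ahead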
If I interpret your question correctly, you have a function that calculates a value, and you want to do another calculation with this value and the previous one. The best way is to use two variables:
previous_item = initial_value  # seed before the loop; otherwise the first
                               # iteration would use an undefined name
while step <= step_max:
    item = calculate()
    another_calculation(previous_item, item)
    previous_item = item
If the calculations are some form of vector math, you should consider using numpy.
How do I arrange 5 integers in ascending order using an array, with no use of sort()? And then, how do I build a 5x5 multiplication table, also using an array (like using a list and array append)?
I'm going to take a stab at what I believe you're asking, mostly because I hope it's educational. You're lucky I'm procrastinating studying at the moment.
Sorting, because who likes entropy anyway?
Bubbles!
Your first task is to look at the bubble sort, a sorting algorithm that's as simple to code as it is to understand. (It performs poorly with large arrays due to its O(n²) behavior, but it is probably among the first sorts a lot of people encounter.) I highly, highly suggest you understand the algorithm before even thinking about looking at code.
How does it work?
Start at the beginning! Look at the first pair of numbers. If they're in the wrong order, swap them. Increment your starting position by 1 and repeat until the end of the array. Then make another pass, and keep making passes until the array is sorted; each pass bubbles the next-largest element into its final position.
What would this look like in Python?
I'm glad you asked. You need to loop through the whole array and swap whenever appropriate. Thankfully Python makes swapping very easy, allowing you to pull tricks like a, b = b, a. We can (hopefully quickly) write down some code to do what we want:
def bubble_sort(array):
    # After pass i, the last i elements are already in their final positions,
    # so each inner pass can stop one element earlier.
    for i in xrange(len(array)):
        for j in xrange(len(array) - i - 1):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
    return array
This should be straightforward and follows the sorting procedure directly. Pass in an array (or list of numbers) and it will return an array sorted in ascending order.
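For example, a quick check at the interpreter:

>>> bubble_sort([5, 2, 4, 1, 3])
[1, 2, 3, 4, 5]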
Multiplication Table
I'm assuming you mean something like the table you learn in first grade. The requirement I'm imposing on your vague wording is that we want to return a 2D array where the first row holds multiples of 0, the second multiples of 1, etc. The same goes for the columns, since multiplication tables are symmetric between rows and columns. There are a number of possible approaches, but I'm only going to consider the one I personally find the most elegant and Pythonic. Python comes packed with great list comprehensions, so why not make use of them? Try this:
table = [[x*y for x in xrange(6)] for y in xrange(6)]
This creates a 6x6 matrix, i.e. the multiplication table for 0-5. Take some time to really understand this code. I think that list comprehension is absolutely fundamental to Python and is something that sets it apart. If you look at the (i, j)th element of the array, you'll see that it equals i*j. For example, table[3][2] == 6 is true.
I desperately hope you learned something useful from this. Next time you post a question, hopefully you'll give us more to work on.
I have two boolean sparse square matrices of c. 80,000 x 80,000, generated from 12 MB of data (and I am likely to have orders-of-magnitude larger matrices when I use GBs of data).
I want to multiply them (which would produce a triangular matrix; however, I don't get one, since I don't limit the dot product to yield a triangular matrix).
I am wondering what the best way of multiplying them is (memory-wise and speed-wise). I am going to do the computation on an m2.4xlarge AWS instance, which has >60 GB of RAM. I would prefer to keep the calculation in RAM for speed reasons.
I appreciate that SciPy has sparse matrices, and so does h5py, but I have no experience with either.
What's the best option to go for?
Thanks in advance
UPDATE: sparsity of the boolean matrices is <0.6%
If your matrices are relatively empty, it might be worthwhile encoding them as a data structure of the non-False values: say, a list of tuples describing the locations of the non-False values, or a dictionary with those tuples as keys.
If you use e.g. a list of tuples, you could use a list comprehension to find the items in the second list that can be multiplied with an element from the first list:
a = [(0, 0), (3, 7), (5, 2)]  # et cetera
b = ...                        # idem

result = []
for r, c in a:
    # (r, c) from a pairs with (j, k) from b whenever c == j,
    # contributing a True entry at (r, k) in the product
    result.extend((r, k) for j, k in b if j == c)
You're asking how to multiply matrices fast and easily.
SOLUTION 1: This is a solved problem: use numpy. All these operations are easy in numpy, and since they are implemented in C, they are blazingly fast.
http://www.numpy.org/
http://www.scipy.org
also see:
Very large matrices using Python and NumPy
http://docs.scipy.org/doc/scipy/reference/sparse.html
SciPy and NumPy have sparse matrices and sparse matrix multiplication. A sparse format doesn't use much memory, since it only stores the actual datapoints plus some index overhead. And it will almost certainly be blazingly fast compared to a pure Python solution.
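For instance, a minimal sketch (mine, not the answer's code) of a sparse product with SciPy, storing True entries as 1s in CSR format:

import numpy as np
from scipy import sparse

# Only the nonzero (True) entries are kept in memory.
rows = np.array([0, 3, 5])
cols = np.array([0, 7, 2])
data = np.ones(3, dtype=np.int8)
a = sparse.csr_matrix((data, (rows, cols)), shape=(8, 8))
b = a.T.tocsr()
c = a @ b      # sparse-sparse product; the result stays sparse
print(c.nnz)   # number of stored nonzero entries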
SOLUTION 2
Another answer here suggests storing values as (x, y) tuples, presuming a value is False unless it exists, in which case it's True. The alternative for a numeric matrix is (x, y, value) tuples.
REGARDLESS: multiplying these would be nasty time-wise: find element one, decide which other array element to multiply it by, then search the entire dataset for that specific tuple, and if it exists, multiply and insert the result into the result matrix.
SOLUTION 3 (PREFERRED vs. Solution 2, IMHO)
I would prefer this because it's simpler / faster.
Represent your sparse matrix with a set of dictionaries. Matrix one is a dict mapping each (x, y) position to its value v (with x1,y1, x2,y2, etc.):
matrixDictOne = { 'x1:y1' : v1, 'x2:y2': v2, ... }
matrixDictTwo = { 'x1:y1' : v1, 'x2:y2': v2, ... }
Since a Python dict lookup is O(1) on average, it's fast. This does not require searching the entire second matrix's data for element presence before multiplication. So, it's fast. It's easy to write the multiply and easy to understand the representations.
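Here is a short sketch of that dict-based multiply (my code; I use (x, y) tuples as keys instead of 'x:y' strings for simplicity, and boolean values, so "multiply" reduces to existence checks):

from collections import defaultdict

def sparse_bool_matmul(a, b):
    # a, b: dicts mapping (row, col) -> True for each nonzero entry.
    # Index b by row first, so we never scan all of b for an element of a.
    b_by_row = defaultdict(list)
    for (r, c) in b:
        b_by_row[r].append(c)
    result = {}
    for (r, c) in a:                   # A[r][c] is True
        for k in b_by_row.get(c, ()):  # B[c][k] is True, so C[r][k] is True
            result[(r, k)] = True
    return result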
SOLUTION 4 (if you are a glutton for punishment)
Code this solution by using a memory-mapped file of the required size. Initialize the file with null values, compute the offsets yourself, and write to the appropriate locations in the file as you do the multiplication. Linux has a VMM which will page data in and out for you with little overhead or work on your part. This is a solution for very, very large matrices that are NOT SPARSE and thus won't fit in memory.
Note this addresses the objection raised in the comments that the data won't fit in memory. However, the OP did say sparse, which implies very few actual datapoints spread out in giant arrays, and NumPy / SciPy handle this natively and thus nicely (lots of people at Fermilab use NumPy / SciPy regularly; I'm confident the sparse matrix code is well tested).