Apply function to 2D numpy array elements - python

I've seen this post and want to do something similar, but not exactly the same.
I am implementing a little Game of Life and using numpy arrays to represent the states of the game. So I need to check how many alive neighbors a cell has. I already have a function for getting a window of neighbors given a coordinate and a row count and column count for the window size that I want.
So usually my windows will be of 3x3 size like this:
T = True
F = False
[[T,T,T],
 [F,T,T],
 [F,F,F]]  # some random truth values
In this representation True stands for a cell being alive.
Now I wrote some code iterating over all cells of the state, counting the True values and so on using a double for loop, but I think there is probably a better numpy solution.
What I'd do in the naive approach:
1. Iterate over all cells of the state (not only the window). I'd like to formulate some code to be executed if a cell meets one criterion or another (being alive and surviving, or being dead and coming alive).
2. Get the window (wrapping or not wrapping); I already have a function for that.
3. Check if the current cell is alive (could just be a lookup in the state's numpy array).
4. If it is alive, start with an alive-neighbors count of -1, otherwise start with 0.
5. Count all True values of the window (np.sum) and add that to the alive-neighbors count (which started at -1 if the cell itself was alive, so that I only count neighbors, not the cell itself).
6. Depending on whether the count of alive neighbors lies in certain (configurable) ranges, write True values at the corresponding positions of another (new) state array. I'd start out with an array created using: np.full((height, width), False, dtype=bool).
7. Go on with that new array, keeping the old one in a list for history or logging purposes.
Basically:
if cell meets criteria:
    write True at the cell's position in a new array
However, meeting the criteria depends on multiple rows, because the state's numpy array is a 2D array. That's why I think the linked post is close to, but not exactly, what I need.
How can I do this in an efficient numpy-y way, avoiding unnecessary looping?
Clarification
I am searching for the best way of implementing this in Python using numpy and scipy, one that is both very readable and performs well.

Perhaps I did not understand all you are trying to do, but what is stopping you from simply using the numpy.sum function?
Example - Let the state be:
import numpy as np
state = np.random.randint(0, 2, (9,9))
Here I am using {0, 1} as values for the state, where 1 means "alive".
Then you can just slice around the cell being investigated, e.g. [2,3]
s = state[1:4, 2:5]
if s[1, 1]:
    val = -1
else:
    val = 0
val += s.sum()
If you put this in a for loop and pay attention to border cases, clamping or wrapping as appropriate, it should do what you describe.
If you are looking for a short, elegant implementation, it can be done very efficiently with numpy and scipy.
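For instance, here is a minimal sketch of that idea using scipy.ndimage.convolve (the variable names and the standard Conway rules are my assumptions; mode='wrap' gives a toroidal board, mode='constant' would clamp at the borders instead):
import numpy as np
from scipy.ndimage import convolve

state = np.random.rand(9, 9) > 0.5  # a random boolean board, assumed for the demo

kernel = np.array([[1, 1, 1],
                   [1, 0, 1],   # center is 0, so a cell never counts itself
                   [1, 1, 1]])

# Count alive neighbors for every cell in one vectorized call
neighbor_count = convolve(state.astype(int), kernel, mode='wrap')

# Apply the (here: standard, but configurable) survival/birth ranges at once
new_state = (state & (neighbor_count == 2)) | (neighbor_count == 3)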

Related

Appending to a numpy array in for loop

I'm trying to create a Monte Carlo simulation to simulate future stock prices using Numpy arrays.
My current approach is: create a for loop which fills an array, stock_price_array, with simulated stock prices. These stock prices are generated by taking the last stock price and multiplying it by 1 + an annual return. The annual returns are drawn randomly from a normal distribution and stored in the array annual_ret.
My problem is that although the "stock price" variables I print from my for loop appear to be correct, I simply cannot figure out how to append these stock price variables to stock_price_array.
I've tried various methods, including initializing stock_price_array using .full instead of .empty, changing the order of where the array appears in the for loop, and checking the size of the array.
I've read other Stack Overflow posts on similar topics but can't figure out what I'm doing wrong.
Thank you in advance for your help!
import numpy as np

annual_mean = .06
annual_stdev = .15
start_stock_price = 100
numYears = 3
numSimulations = 4

stock_price_array = np.empty(numYears)
# draw annual returns from a normal distribution; these annual returns are random
annual_ret = np.random.normal(annual_mean, annual_stdev, numSimulations)

for i in range(numYears):
    stock_price = np.multiply(start_stock_price, (1 + annual_ret[i]))
    np.append(stock_price_array, [stock_price])
    start_stock_price = stock_price
The 1st rule of numpy is: never iterate your array yourself. Use numpy functions that do all the computation in batch (and to do so, they iterate the array, sure, but that iteration is not a Python iteration, so it is way faster).
No-for solution
For example, here, you could do something like this
np.cumprod(np.hstack([start_stock_price, annual_ret+1]))
What it does is first build an array of the initial value followed by the growth factors.
So if the initial value is 100 and the interest rates are 0.1, -0.1, 0.2, 0.2 (for example), then hstack builds an array of the values 100, 1.1, 0.9, 1.2, 1.2.
And cumprod just builds the cumulative product of those:
100, 100×1.1=110, 110×0.9=99, 99×1.2=118.8, 118.8×1.2=142.56
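As a quick check of those mechanics (the numbers are the ones from this example, not from the question):
import numpy as np

start = 100
rates = np.array([0.1, -0.1, 0.2, 0.2])

prices = np.cumprod(np.hstack([start, rates + 1]))
print(prices)  # -> [100. 110. 99. 118.8 142.56] (up to float rounding)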
Correction of yours
To answer your initial question anyway (even if I strongly advise that you try a cumprod-style solution like the one I've shown), you have 2 choices:
Either you allocate an array in advance, as you did (your stock_price_array = np.empty(numYears)). Then, instead of trying to append the new stock_price to stock_price_array, you simply fill one of the empty places that are already there, by doing stock_price_array[i] = stock_price (a sketch of this fix follows below).
Or you don't. Then you replace the np.empty line with stock_price_array = [], and at each step you append the result, creating a new array each time: stock_price_array = np.append(stock_price_array, [stock_price]).
I strongly advise against the 2nd solution. Since you already know the final size of the array, it is far better to create it once. np.append creates a brand-new array and then copies the input data into it; it does not just extend the existing array (generally speaking, we can't do that anyway).
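Sketched out, the first (preallocating) fix would look like this (a minimal sketch reusing the question's names; here I draw exactly numYears returns):
import numpy as np

annual_mean, annual_stdev = .06, .15
start_stock_price, numYears = 100, 3

stock_price_array = np.empty(numYears)  # allocated once, up front
annual_ret = np.random.normal(annual_mean, annual_stdev, numYears)

for i in range(numYears):
    stock_price = start_stock_price * (1 + annual_ret[i])
    stock_price_array[i] = stock_price  # fill the i-th slot in place
    start_stock_price = stock_price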
But, well, anyway, I advise against both solutions, since I find mine (with cumprod) preferable. for is the taboo word in numpy, and it is even more so when what's inside the for is the creation of a new array, as with append.
Monte-Carlo
Since you've mentioned Monte-Carlo, and then shown code that computes only one result (you draw 1 set of annual returns and perform one computation of future values), I am wondering if that is really what you want.
In particular, I see that you have numSimulations and numYears, which appear to play redundant roles in your code (and therefore in mine).
The only reason it doesn't just throw an index error is that numSimulations is used only to decide how many annual_ret values you draw. And since numSimulations > numYears, you have more than enough annual_ret values to compute the result.
Wasn't your initial intention to redo the simulation numSimulations times over the years, to get numSimulations results?
In which case, you probably need numSimulations sets of numYears annual rates. So, a 2D array. And likewise, you should be computing numSimulations series of numYears results.
If my guess is not completely off, I surmise that what you really wanted to do was rather in the effect of:
annual_ret = np.random.normal(annual_mean, annual_stdev, (numSimulations, numYears)) # 2d array of interest rate. 1 simulation per row, 1 year per column
t = np.pad(annual_ret+1, ((0,0), (1,0)), constant_values=start_stock_price) # Add 1 as we did earlier. And pad with an initial 100 (`start_stock_price`) at the beginning of each simulation
res = np.cumprod(t, axis=1) # cumulative multiplication. `axis=1` means that it is done along axis 1 (along years) for each row (for each simulation)
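Put together as a self-contained sketch (parameter values copied from the question, everything else as in the three lines above):
import numpy as np

annual_mean, annual_stdev = .06, .15
start_stock_price = 100
numYears, numSimulations = 3, 4

# one row per simulation, one column per year
annual_ret = np.random.normal(annual_mean, annual_stdev, (numSimulations, numYears))
t = np.pad(annual_ret + 1, ((0, 0), (1, 0)), constant_values=start_stock_price)
res = np.cumprod(t, axis=1)

print(res.shape)  # (4, 4): numSimulations rows of [start price, year 1, year 2, year 3]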

Vectorize Conway's Game of Life in pure numpy?

I'm wondering if there is a way to implement Conway's game of life without resorting to for loops, if statements and other control structures typical of programming.
It should be pretty easy to vectorize for loops, but how would you convert the checks on the neighborhood to a matrix operation?
The base logic is something like this:
def neighbors(cell, distance=1):
    """Return the neighbors of cell."""
    x, y = cell
    r = range(0 - distance, 1 + distance)
    return ((x + i, y + j)          # new cell offset from center
            for i in r for j in r   # iterate over range in 2d
            if not i == j == 0)     # exclude the center cell
I hope this is not considered off-topic by the mods; I'm genuinely curious and am just starting out with CAs.
Cheers
The answer to your question is "yes, it is possible" (particularly the board updates from board n to board n+1).
I describe the process in detail here. The main technique for generating the neighborhood around a central cell involves using "strides" (the way that numpy and other array-computation systems know how to walk across rows and columns of elements when those elements are really stored in memory as a flat 1D block) in a custom fashion to generate neighborhoods around cells. I describe that process here.
One last comment: since Game of Life iterates from state n to state n+1, while you could literally remove all imperative looping, it doesn't really make sense to take out the top-level control loop. So the program keeps one loop: for round in range(num_rounds): board.update(), where board.update doesn't use loops (except to do some side calculations; again, you could remove those, but it would make the program longer and less elegant).
To give you a concrete example (and be more compatible with StackOverflow's answer requirements), here's some selected cutting and pasting from my posts to generate the central neighborhoods from a simple 4x4 board:
import numpy as np

board = np.arange(16).reshape((4, 4))
print(board)
print(board.shape)
We want to pick out the four "complete" neighborhoods, centered around 5, 6, 9, and 10. Let's look at the neighborhood for 5. What is the shape of the result? 3×3. What are the strides? Well, to walk across a row is still just walking one element at a time, and to get to the next row is still 4 elements at a time. These are the same as the strides in the original. The difference is we don't take "everything", we just take a selection. Let's see if that actually works:
from numpy.lib.stride_tricks import as_strided

neighbors = as_strided(board, shape=(3, 3), strides=board.strides)
print(neighbors)
Ok, nice. Now, if we want all four neighborhoods, what is the output shape? We have several 3×3 results. How many? In this case, we have 2×2 of them (for each of the "center" cells). This gives a shape of (2,2,3,3) – the neighborhoods are the inner dimensions and the organization of the neighborhoods is the outer dimensions.
So, our strides (in terms of elements) end up being (4,1) within one neighborhood and (4,1) for progressing from neighborhood to neighborhood. The total stride (element-wise) is: (4,1,4,1). But numpy measures strides in bytes, and the component strides (our outer two dimensions) are the same as the strides of the board. This means that our neighborhood strides are board.strides + board.strides.
print(board.strides + board.strides)

neighborhoods = as_strided(board,
                           shape=(2, 2, 3, 3),
                           strides=board.strides + board.strides)
print(neighborhoods[0, 0])
print(neighborhoods[-1, -1])
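From there, one hedged sketch of how those neighborhoods could drive a full Game of Life step (the axis arithmetic here is mine, not from the linked posts): sum each 3×3 neighborhood, subtract the center so it doesn't count itself, and apply the rules to the interior cells.
import numpy as np
from numpy.lib.stride_tricks import as_strided

board = (np.random.rand(4, 4) > 0.5).astype(int)  # a random 0/1 board, assumed for the demo

neighborhoods = as_strided(board,
                           shape=(2, 2, 3, 3),
                           strides=board.strides + board.strides)

centers = board[1:-1, 1:-1]                            # the cells with complete neighborhoods
livecount = neighborhoods.sum(axis=(2, 3)) - centers   # neighbor counts, excluding each center

# standard rules applied to the interior; the border is left unchanged in this sketch
new_inner = (((centers == 1) & ((livecount == 2) | (livecount == 3)))
             | ((centers == 0) & (livecount == 3)))
new_board = board.copy()
new_board[1:-1, 1:-1] = new_inner.astype(int)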

How to calculate Delta F / F using python?

I've recently "taught" myself Python in order to analyze data for my experiments. As such, I'm pretty clueless on many aspects. I've managed to make my analysis work for certain files, but in some cases it breaks down, and I imagine that is a result of faulty programming.
Currently I export a file containing 3 numpy arrays. One of these arrays is my signal (float values from -10 to 10). What I wish to do is normalize every datum in this array to the range of values that precede it (i.e. the 3001st value must have the average of the preceding 3000 values subtracted from it, and the difference must then be divided by this very same average of the preceding 3000 values). My data is collected at a rate of 100 Hz, so to get a normalization over the last 30 s I must use the preceding 3000 values.
As it stands, this is how I've managed to make it work:
This stores the signal in the variable photosignal:
photosignal = np.array(seg.analogsignals[0], ndmin=1)
Now, this is the part I use to get the delta F/F over a moving window of 30 s:
normalizedphotosignal = [(uu-(np.mean(photosignal[uu-3000:uu])))/abs(np.mean(photosignal[uu-3000:uu])) for uu in photosignal[3000:]]
The following adds 3000 values to the beginning to keep the array the same length, since later on I must time-lock it to another list of the same length:
holder = list(range(3000))
normalizedphotosignal = holder + normalizedphotosignal
What I have noticed is that for certain files this code gives me an error, because it says that the "slice" is empty and therefore it cannot compute a mean.
I think maybe there is a better way to program this that avoids the problem altogether. Or is this a correct way to approach the problem?
So I tried the solution, but it is quite slow, and it nevertheless still gives me the "empty slice" error.
I went over the moving average post and found this method:
def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N
However, I'm having trouble accommodating it to my desired output, namely (x - running average) / running average.
Alright, so I finally figured it out, thanks to your help and the posts you referred me to.
The calculation for my entire data set (300,000+ points) takes about a second!
I used the following code:
def runningmean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N

photosignal = np.array(seg.analogsignals[0], ndmin=1)
photosignalaverage = runningmean(photosignal, 3000)
holder = np.zeros(2999)  # pad so the result lines up with photosignal; these zeros make the first 2999 ratios below inf/nan, so they should be ignored
photosignalaverage = np.append(holder, photosignalaverage)
deltafsignal = (photosignal - photosignalaverage) / abs(photosignalaverage)
photosignal stores my raw signal in a numpy array.
photosignalaverage uses cumsum to calculate the running average of every data point in photosignal. I then prepend 2999 zeros to maintain the same length as my photosignal.
I then use basic numpy calculations to get my delta F/F signal.
Thank you once more for the feedback, was truly helpful!
Your approach goes in the right direction. However, you made a mistake in your list comprehension: you are using uu as an index, whereas the uu are actually the elements of your input data photosignal.
You want something like this (note that the window of the 3000 preceding values is photosignal[i:i+3000], and that each result must be written to position i of the output array):
normalizedphotosignal2 = np.zeros(photosignal.shape[0] - 3000)
for i, uu in enumerate(photosignal[3000:]):
    window_mean = np.mean(photosignal[i:i + 3000])  # mean of the 3000 values preceding uu
    normalizedphotosignal2[i] = (uu - window_mean) / abs(window_mean)
Keep in mind that for loops are relatively slow in Python. If performance is an issue here, you could try avoiding the for loop and using numpy methods instead (e.g. have a look at Moving average or running mean).
Hope this helps.

Avoid python loop in deleting error prone values

My timelines are stored in simple numpy arrays, and they are long (>10 million entries).
I have to detect machine shutdowns, which show up as jumps in the time vector. After a shutdown I want to delete the next 10 values (the sensors give bad results for a while after being switched on) and continue.
I came up with the following code:
Keep_data = np.empty_like(Timestamp_new, dtype=bool)
Keep_data[0] = False
Keep_data[1:] = Timestamp_new[1:] > (Timestamp_new[:-1] + min_shutdown_length)
for item in np.nonzero(np.logical_not(Keep_data))[0]:
    Keep_data[item:min(item + 10, len(Keep_data))] = False
Timestamp_new = Timestamp_new[Keep_data]
Can anyone suggest more efficient code, without a pure Python loop?
Thank you.
Basically you are trying to spread/grow, or in image-processing terms dilate, the False regions. For that, we have a built-in: scipy's binary_dilation. Now, you are trying to make each region grow from each such False element in the input array Keep_data towards higher indices. So we need to use a different offset (or, as scipy calls it, origin) than the default of 0, which would otherwise dilate across both sides of each element.
Thus, to sum up, an implementation that gets rid of the loopy portion of the code would look like so -
from scipy.ndimage import binary_dilation

N = 10  # Interval length
dilated_mask = binary_dilation(~Keep_data, structure=np.ones(N), origin=-int(N/2))
Keep_data[dilated_mask] = False
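As a quick sanity check on a toy array (a sketch with N shrunk to 3 so the effect is easy to see; the input values are made up):
import numpy as np
from scipy.ndimage import binary_dilation

Keep_data = np.array([True, True, False, True, True, True, True, True])
N = 3

dilated_mask = binary_dilation(~Keep_data, structure=np.ones(N), origin=-int(N/2))
Keep_data[dilated_mask] = False
print(Keep_data)  # the False at index 2 should have spread forward to indices 2, 3 and 4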
An alternative approach, closer to the loopy code posted in the question but vectorized with NumPy's broadcasting feature, would look something like this -
N = 10  # Interval length
Keep_datac = Keep_data.copy()  # output mask, so the original detections in Keep_data stay intact
idx = np.nonzero(np.logical_not(Keep_data[:-N]))[0]
Keep_datac[(idx + np.arange(N)[:, None]).ravel()] = False
rest = np.nonzero(np.logical_not(Keep_data[-N:]))[0]
if len(rest) > 0:
    Keep_datac[-N + rest[0]:] = False

Python - comparing elements of list with 'neighbour' elements

This may be more of an 'approach' or conceptual question.
Basically, I have a Python multi-dimensional list like so:
my_list = [[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]]
What I have to do is iterate through the array and compare each element with those directly surrounding it, as though the list were laid out as a matrix.
For instance, given the first element of the first row, my_list[0][0], I need to know the value of my_list[0][1], my_list[1][0] and my_list[1][1]. The values of the 'surrounding' elements will determine how the current element should be operated on. Of course, for an element in the heart of the array, 8 comparisons will be necessary.
Now I know I could simply iterate through the array and compare with the indexed values, as above. I was curious as to whether there is a more efficient way that limits the amount of iteration required. Should I iterate through the array as is, or iterate and compare only the values to either side, then transpose the array and run it again? That, however, would ignore the diagonal values. And should I store the results of the element lookups, so I don't keep determining the value of the same element multiple times?
I suspect this may have a fundamental approach in Computer Science, and I am eager to get feedback on the best approach using Python as opposed to looking for a specific answer to my problem.
You may get faster, and possibly even simpler, code by using numpy, or other alternatives (see below for details). But from a theoretical point of view, in terms of algorithmic complexity, the best you can get is O(N*M), and you can do that with your design (if I understand it correctly). For example:
def neighbors(matrix, row, col):
    for i in row-1, row, row+1:
        if i < 0 or i == len(matrix): continue
        for j in col-1, col, col+1:
            if j < 0 or j == len(matrix[i]): continue
            if i == row and j == col: continue
            yield matrix[i][j]
matrix = [[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]]
for i, row in enumerate(matrix):
    for j, cell in enumerate(row):
        for neighbor in neighbors(matrix, i, j):
            do_stuff(cell, neighbor)
This takes N * M * 8 steps (actually, a bit fewer, because many cells will have fewer than 8 neighbors). And algorithmically, there's no way you can do better than O(N * M). So, you're done.
(In some cases, you can make things simpler, with no significant change either way in performance, by thinking in terms of iterator transformations. For example, you can easily create a grouper over adjacent triplets from a list a by properly zipping a, a[1:], and a[2:], and you can extend this to adjacent 2-dimensional nonets. But I think in this case it would just make your code more complicated than writing an explicit neighbors iterator and explicit for loops over the matrix.)
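For what it's worth, the adjacent-triplet grouper mentioned above is just (a quick sketch):
a = [0, 1, 1, 1, 0, 1]
triplets = list(zip(a, a[1:], a[2:]))
print(triplets)  # [(0, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1)]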
However, practically, you can get a whole lot faster, in various ways. For example:
Using numpy, you may get an order of magnitude or so faster. When you're iterating a tight loop and doing simple arithmetic, that's one of the things that Python is particularly slow at, and numpy can do it in C (or Fortran) instead.
Using your favorite GPGPU library, you can explicitly vectorize your operations.
Using multiprocessing, you can break the matrix up into pieces and perform multiple pieces in parallel on separate cores (or even separate machines).
Of course for a single 4x6 matrix, none of these are worth doing… except possibly for numpy, which may make your code simpler as well as faster, as long as you can express your operations naturally in matrix/broadcast terms.
In fact, even if you can't easily express things that way, just using numpy to store the matrix may make things a little simpler (and save some memory, if that matters). For example, numpy can let you access a single column from a matrix naturally, while in pure Python, you need to write something like [row[col] for row in matrix].
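Concretely (a small sketch using the question's my_list):
import numpy as np

my_list = [[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]]
col = 2

print([row[col] for row in my_list])  # pure Python: [1, 1, 0, 1]
print(np.array(my_list)[:, col])      # numpy: [1 1 0 1]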
So, how would you tackle this with numpy?
First, you should read over numpy.matrix and ufunc (or, better, some higher-level tutorial, but I don't have one to recommend) before going too much further.
Anyway, it depends on what you're doing with each set of neighbors, but there are three basic ideas.
First, if you can convert your operation into simple matrix math, that's always easiest.
If not, you can create 8 "neighbor matrices" just by shifting the matrix in each direction, then perform simple operations against each neighbor. For some cases, it may be easier to start with an N+2 x N+2 matrix with suitable "empty" values (usually 0 or nan) in the outer rim. Alternatively, you can shift the matrix over and fill in empty values. Or, for some operations, you don't need an identical-sized matrix, so you can just crop the matrix to create a neighbor. It really depends on what operations you want to do.
For example, taking your input as a fixed 4x6 board (padded below with a rim of zeros) for the Game of Life:
def neighbors(matrix):
    for i in -1, 0, 1:
        for j in -1, 0, 1:
            if i == 0 and j == 0: continue
            yield np.roll(np.roll(matrix, i, 0), j, 1)
matrix = np.matrix([[0,0,0,0,0,0,0,0],
                    [0,0,1,1,1,0,1,0],
                    [0,1,1,1,0,0,1,0],
                    [0,1,1,0,0,0,1,0],
                    [0,1,1,1,1,1,1,0],
                    [0,0,0,0,0,0,0,0]])
while True:
    livecount = sum(neighbors(matrix))
    matrix = (matrix & (livecount == 2)) | (livecount == 3)
(Note that this isn't the best way to solve this problem, but I think it's relatively easy to understand, and likely to illuminate whatever your actual problem is.)
