The problem
Create a higher dimensional NumPy array with zeros on the new dimensions
Details
Analyzing the last dimension, the result is similar to this:
(not actual code, just a didactic example)
a.shape = (100,2,10)
a[0,0,0]=1
a[0,0,1]=2
...
a[0,0,9]=10
b.shape = (100,2,10,10)
b[0,0,0,:]=[0,0,0,0,0,0,0,0,0,1]
b[0,0,1,:]=[0,0,0,0,0,0,0,0,2,1]
b[0,0,2,:]=[0,0,0,0,0,0,0,3,2,1]
...
b[0,0,9,:]=[10,9,8,7,6,5,4,3,2,1]
a -> b
The objective is to transform a into b. The difficulty is that b is not only padded with zeros: each new row is also built up sequentially from the values of the original array.
Simpler problem for better understanding
Another way to visualize is using lower-dimensional arrays:
We have this:
a = [1,2]
And I want this:
b = [[0,1],[2,1]]
Using NumPy arrays and avoiding long for loops.
2d to 3d case
We have this:
a = [[1,2,3],[4,5,6],[7,8,9]]
And I want this:
b[0] = [[0,0,1],[0,2,1],[3,2,1]]
b[1] = [[0,0,4],[0,5,4],[6,5,4]]
b[2] = [[0,0,7],[0,8,7],[9,8,7]]
I feel that for the 4-dimensional problem only one for loop with 10 iterations is enough.
Try something like this in the framework of numpy:
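import numpy as np
# example input (the 2d case from the question); any array works
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
# create transformation tensors
N = a.shape[-1]
sigma1 = np.zeros((N,N,N))
sigma2 = np.zeros((N,N,N))
E = np.ones((N,N))
for i in range(N):
    sigma1[...,i] = np.diag(np.diag(E,N-1-i),N-1-i)
    sigma2[N-1-i,N-1-i:,i] = 1
# contract the last axis of a with the first axis of each tensor
b1 = np.tensordot(a, sigma1, axes=([-1],[0]))
b2 = np.tensordot(a, sigma2, axes=([-1],[0]))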
where sigma1 and sigma2 are the transformation tensors that map the data along the last dimension of a into the layout you want (the two versions you mentioned in your question and comments). Here the loop is only used to create the transformation tensors.
For a = [[1,2,3],[4,5,6],[7,8,9]], the first algorithm gives:
[[[0. 0. 1.] [0. 1. 2.] [1. 2. 3.]]
[[0. 0. 4.] [0. 4. 5.] [4. 5. 6.]]
[[0. 0. 7.] [0. 7. 8.] [7. 8. 9.]]]
and the last algorithm gives:
[[[0. 0. 1.] [0. 2. 1.] [3. 2. 1.]]
[[0. 0. 4.] [0. 5. 4.] [6. 5. 4.]]
[[0. 0. 7.] [0. 8. 7.] [9. 8. 7.]]]
Try to avoid lists and loops when using numpy as they slow down the execution speed.
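For comparison, here is a minimal loop-free sketch of the second layout (the one with the reversed sequence) using broadcasting. It is not the method from the answer above, only an illustration; the function name expand_last_axis is made up for this example.

```python
import numpy as np

def expand_last_axis(a):
    """Return b with b[..., j, :] = [0, ..., 0, a[..., j], ..., a[..., 0]]."""
    a = np.asarray(a)
    n = a.shape[-1]
    # mask[j, i] is True where column i of row j should hold a value:
    # a lower triangle mirrored left-to-right
    mask = np.tri(n, dtype=bool)[:, ::-1]
    # reversed copy of the last axis, with a new axis inserted for the rows
    reversed_a = a[..., np.newaxis, ::-1]
    # broadcasting (n, n) against (..., 1, n) gives shape (..., n, n)
    return np.where(mask, reversed_a, 0)

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = expand_last_axis(a)
print(b[0])
# [[0 0 1]
#  [0 2 1]
#  [3 2 1]]
```

The same function should handle the (100,2,10) case from the question, since only the last axis is touched and everything in front of it is broadcast.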
I was able to solve the problem but there is probably a more efficient way to do it:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]]) # two dim case
a = np.array([[[1,2,3],[4,5,6],[7,8,9]],[[1,2,3],[4,5,6],[7,8,9]],[[1,2,3],[4,5,6],[7,8,9]]]) # three dim case
def increase_dim(arr):
    stack_list = []
    stack_list.append(arr)
    for i in range(1,arr.shape[-1]):
        # pad with zeros on the left, then keep a window shifted by i positions
        stack_list.append(np.delete(np.delete(np.append(np.zeros(arr.shape),arr,axis=-1),np.s_[-i:],axis = len(arr.shape)-1),np.s_[:arr.shape[-1]-i],axis = -1))
    return np.stack(stack_list,axis = -1)
b = increase_dim(a)
I hope this helps in understanding the question.
I'm working on an animated bar plot to show how the number frequencies of rolling a six-sided die converge the more you roll the die. I'd like to show the number frequencies after each iteration, and for that I have to get a list of the number frequencies for that iteration in another list. Here's the code so far:
import numpy as np
import numpy.random as rd
rd.seed(23)
n_samples = 3
freqs = np.zeros(6)
frequencies = []
for roll in range(n_samples):
    x = rd.randint(0, 6)
    freqs[x] += 1
    print(freqs)
    frequencies.append(freqs)
print()
for x in frequencies:
    print(x)
Output:
[0. 0. 0. 1. 0. 0.]
[1. 0. 0. 1. 0. 0.]
[1. 1. 0. 1. 0. 0.]
[1. 1. 0. 1. 0. 0.]
[1. 1. 0. 1. 0. 0.]
[1. 1. 0. 1. 0. 0.]
Desired output:
[0. 0. 0. 1. 0. 0.]
[1. 0. 0. 1. 0. 0.]
[1. 1. 0. 1. 0. 0.]
[0. 0. 0. 1. 0. 0.]
[1. 0. 0. 1. 0. 0.]
[1. 1. 0. 1. 0. 0.]
The upper three lists indeed show the number frequencies after each iteration. However, when I try to append the list to the 'frequencies' list, in the end it just shows the final number frequencies each time as can be seen in the lower three lists. This one's got me stumped, and I am rather new to Python. How would one get each list like in the first three lists of the output, in another? Thanks in advance!
You only need to change frequencies.append(freqs) to frequencies.append(freqs.copy()). That appends a copy of freqs that is independent of the original, so a later change to freqs won't change the copies already stored.
import numpy as np
import numpy.random as rd
rd.seed(23)
n_samples = 3
freqs = np.zeros(6)
frequencies = []
for roll in range(n_samples):
    x = rd.randint(0, 6)
    freqs[x] += 1
    print(freqs)
    frequencies.append(freqs.copy())
print(frequencies)
print()
for x in frequencies:
    print(x)
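To see the difference between appending the array itself and appending a copy, here is a small sketch that is independent of the die example. The names aliased and copied are made up for the illustration.

```python
import numpy as np

freqs = np.zeros(3)
aliased = []   # will hold three references to the same array
copied = []    # will hold three independent snapshots

for i in range(3):
    freqs[i] += 1
    aliased.append(freqs)          # no copy: just another reference to freqs
    copied.append(freqs.copy())    # snapshot of the current values

# every entry of aliased is the very same object as freqs
same_object = all(item is freqs for item in aliased)
print(same_object)                        # True
print([c.tolist() for c in aliased])      # [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print([c.tolist() for c in copied])       # [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
```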
Python keeps track of freqs as a single object, so every entry appended to frequencies is a reference to that same array, and its value keeps changing even after it has been appended.
Here is a quick and dirty workaround:
import numpy as np
import numpy.random as rd
rd.seed(23)
n_samples = 3
freqs = np.zeros(6)
frequencies = []
for roll in range(n_samples):
    x = rd.randint(0, 6)
    # update the running counts first, so the totals stay cumulative
    freqs[x] += 1
    # build a fresh list holding the current counts
    freqs_copy = []
    for item in freqs:
        freqs_copy.append(item)
    print(freqs_copy)
    frequencies.append(freqs_copy)
print()
for x in frequencies:
    print(x)
The idea is to make a copy of "freqs" that would be independent of original "freqs". In the code above "freqs_copy" would be unique to each iteration.
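If the loop is not needed for anything else, the running frequencies can probably be computed in one go. The following is only a sketch under that assumption: it uses NumPy's newer Generator interface (np.random.default_rng) instead of rd.seed, so the rolls will differ from the ones shown in the question.

```python
import numpy as np

rng = np.random.default_rng(23)
n_samples = 3

# draw all rolls at once; values are 0..5, one per roll
rolls = rng.integers(0, 6, size=n_samples)

# one row per roll with a single 1 in the column of the rolled face
one_hot = np.eye(6)[rolls]

# cumulative sum down the rows: row k holds the counts after k+1 rolls
frequencies = np.cumsum(one_hot, axis=0)

for row in frequencies:
    print(row)
```

Each row of frequencies is already a separate slice of a new array, so no copying is required.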
I have a list of x, y points like the picture above.
In code it looks like this:
np.array([[1.3,2.1],[1.5,2.2],[3.1,4.8]])
Now I would like to define a grid for which I can set the start, the number of columns and rows, and the row and column size, and then count the number of points in each cell.
In this example [0,0] has 1 point in it, [1,0] has 1, [2,0] has 3, [0,1] has 4 and so on.
I know it would probably be trivial to do by hand, even without numpy, but I need it to be as fast as possible, since I will have to process a ton of data this way.
What's a good way to do this? Basically, how do I create a 2D histogram of points? And more importantly, how can I do it as fast as possible?
I think numpy.histogram2d is the best option.
import numpy as np
a = np.array([[1.3,2.1],[1.5,2.2],[3.1,4.8]])
# bin edges 0,1,...,5 along x and y give a 5x5 grid of unit cells
H, _, _ = np.histogram2d(a[:, 0], a[:, 1], bins=(range(6), range(6)))
print(H)
# [[0. 0. 0. 0. 0.]
# [0. 0. 2. 0. 0.]
# [0. 0. 0. 0. 0.]
# [0. 0. 0. 0. 1.]
# [0. 0. 0. 0. 0.]]
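To cover the configurable grid from the question (start, number of columns and rows, cell size), the bin edges can be built explicitly and passed to histogram2d. This is a rough sketch; the helper count_points_in_grid and its parameter names are made up for the example.

```python
import numpy as np

def count_points_in_grid(points, start, n_cols, n_rows, cell_width, cell_height):
    """Count points per cell; the result is indexed as [column, row]."""
    # n_cols cells need n_cols + 1 edges along x, likewise for the rows along y
    x_edges = start[0] + cell_width * np.arange(n_cols + 1)
    y_edges = start[1] + cell_height * np.arange(n_rows + 1)
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                  bins=(x_edges, y_edges))
    # histogram2d returns floats; counts are whole numbers
    return counts.astype(int)

points = np.array([[1.3, 2.1], [1.5, 2.2], [3.1, 4.8]])
counts = count_points_in_grid(points, start=(0.0, 0.0), n_cols=5, n_rows=5,
                              cell_width=1.0, cell_height=1.0)
print(counts[1, 2], counts[3, 4])   # 2 1
```

Points that fall outside the grid are simply not counted, which may or may not be what you want.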
When you know the number of dimensions of your lattice ahead of time, it is straightforward to use meshgrid to evaluate a function over a mesh.
from pylab import *
lattice_points = linspace(0,3,4)
xs,ys = meshgrid(lattice_points,lattice_points)
zs = xs+ys # <- stand-in function, to be replaced by something more interesting
print(zs)
Produces
[[ 0. 1. 2. 3.]
[ 1. 2. 3. 4.]
[ 2. 3. 4. 5.]
[ 3. 4. 5. 6.]]
But I would like to have a version of something similar, for which the number of dimensions is determined during runtime, or is passed as a parameter.
from pylab import *
#np.vectorize
def fn(listOfVars) :
    return sum(listOfVars) # <- stand-in function, to be replaced
                           # by something more interesting
n_vars = 2
lattice_points = linspace(0,3,4)
indices = meshgrid(*(n_vars*[lattice_points])) # this works fine
zs = fn(indices) # <-- this line is wrong, but I don't
                 # know what would work instead
print(zs)
Produces
[[[ 0. 1. 2. 3.]
[ 0. 1. 2. 3.]
[ 0. 1. 2. 3.]
[ 0. 1. 2. 3.]]
[[ 0. 0. 0. 0.]
[ 1. 1. 1. 1.]
[ 2. 2. 2. 2.]
[ 3. 3. 3. 3.]]]
But I want it to produce the same result as above.
There is probably a solution where you can find the indices of each dimension and use itertools.product to generate all of the possible combinations of indices etc. etc., but is there not a nice pythonic way of doing this?
Joe Kington and user2357112 have helped me to see the error in my ways. For those of you that would like to see a complete solution:
from pylab import *
## 2D "preknown case" (for testing / to compare output)
lattice_points = linspace(0,3,4)
xs,ys = meshgrid(lattice_points,lattice_points)
zs = xs+ys
print('2-D Case')
print(zs)
## 3D "preknown case" (for testing / to compare output)
lattice_points = linspace(0,3,4)
ws,xs,ys = meshgrid(lattice_points,lattice_points,lattice_points)
zs = ws+xs+ys
print('3-D Case')
print(zs)
## Solution, thanks to comments from Joe Kington and user2357112
def fn(listOfVars) :
    return sum(listOfVars)
n_vars = 3 ## can change to 2 or 3 to compare to example cases above
lattice_points = linspace(0,3,4)
indices = meshgrid(*(n_vars*[lattice_points]))
zs = np.apply_along_axis(fn,0,indices)
print('adaptable n-D Case')
print(zs)
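A variant of the same idea with explicit numpy imports instead of pylab might look like the sketch below. It assumes the function can be written to work on a whole stacked array at once (variables along axis 0), which avoids calling it once per lattice point. The names evaluate_on_lattice and sum_of_vars are made up for the example.

```python
import numpy as np

def evaluate_on_lattice(fn, lattice_points, n_vars):
    """Evaluate fn on an n_vars-dimensional lattice built from lattice_points."""
    # one coordinate grid per variable, each with n_vars dimensions
    grids = np.meshgrid(*([lattice_points] * n_vars))
    # stack them so that axis 0 indexes the variables
    stacked = np.stack(grids, axis=0)
    return fn(stacked)

def sum_of_vars(variables):
    # stand-in function: adds up the variables at every lattice point
    return variables.sum(axis=0)

lattice_points = np.linspace(0, 3, 4)
zs = evaluate_on_lattice(sum_of_vars, lattice_points, n_vars=2)
print(zs)
# [[0. 1. 2. 3.]
#  [1. 2. 3. 4.]
#  [2. 3. 4. 5.]
#  [3. 4. 5. 6.]]
```

For a function that only accepts one point at a time, np.apply_along_axis(fn, 0, stacked) as in the solution above remains an option.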