I want to create a 2D list of floats with Perlin Noise. I would like the values generated to be different every time I run the program. However, I'm not sure how to provide randomized seeds for the noise library I found on GitHub here.
How can I make the program generate different values every time it's run?
My Code:
from __future__ import division
import noise
import math
from singleton import ST

def create_map_list():
    """
    This creates a 2D list of floats using the noise library. It then assigns
    ST.map_list to the list created. The range of the floats inside the list
    is [0, 1].
    """
    # used to normalize noise to [0, 1]
    min_val = -math.sqrt(2) / 2
    max_val = abs(min_val)

    map_list = []
    for y in range(0, ST.MAP_HEIGHT):
        row = []
        for x in range(0, ST.MAP_WIDTH):
            nx = x / ST.MAP_WIDTH - 0.5
            ny = y / ST.MAP_HEIGHT - 0.5
            row.append((noise.pnoise2(nx, ny, 8) - min_val) / (max_val - min_val))
        map_list.append(row)
    ST.map_list = map_list
The noise library does not support seeding, so in its current state you cannot get different output from one run to the next.
However, a pull request has been posted that fixes this. To use it, you will have to rebuild the library once you get the modified code (python setup.py install).
An easier workaround is to add a random offset to x and y before calling the noise function; that way you can also control the output with random.seed(). I'm working on a Minecraft world generation project and I'm using noise this way.
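Here is a minimal sketch of that workaround (the helper name and the 0-10000 offset range are my own choices, not anything the library prescribes):

import random
import noise

# Calling random.seed(some_value) before this point would make a particular
# "world" reproducible; left unseeded, the offsets differ on every run.
x_off = random.uniform(0, 10000)
y_off = random.uniform(0, 10000)

def seeded_pnoise2(nx, ny, octaves=8):
    # pnoise2 itself is deterministic, so shifting the sample coordinates
    # by per-run offsets changes the generated values each run
    return noise.pnoise2(nx + x_off, ny + y_off, octaves)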
I'm trying to plot a simple moving average, but the resulting array is a few elements short of the full sample size. How do I plot such a line alongside a standard line that extends over the full sample size? The code below results in this error message:
ValueError: x and y must have same first dimension, but have shapes (96,) and (100,)
This is using standard matplotlib.pyplot. I've tried deleting x values using remove and del, switching all arrays to numpy arrays (since that's the output format of my moving-average function), and adding an if condition to the append in the while loop, but none of these has worked.
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

vX = np.array([])
vY = np.array([])

x = 0
val = 0
while x < sampleSize:
    val += random.randint(min, max)
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1

plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
Expected results would be two lines on the same graph - one a simple moving average of the other.
Just change this line to the following:
smas = np.convolve(values, weights, 'same')
The 'valid' option only produces output where the window fully overlaps the values array, so the result is shorter than the input. What you want is 'same', which keeps the output the same length as the input.
Edit: this, however, comes with its own issue: near the edges, where the window does not fully sit on top of the data, the convolution acts as if the signal were padded with zeros. You can accept that, as this solution does, or pad the array with specific values of your choosing instead (see Mike Sperry's answer).
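For reference, a quick check of what the three np.convolve modes return for a toy input (the shapes, not the values, are the point here):

import numpy as np

a = np.arange(10, dtype=float)
w = np.repeat(1.0, 5) / 5
print(np.convolve(a, w, 'valid').shape)  # (6,)  -> n - window + 1
print(np.convolve(a, w, 'same').shape)   # (10,) -> n
print(np.convolve(a, w, 'full').shape)   # (14,) -> n + window - 1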
Here is how you would pad a numpy array out to the desired length with NaNs (replace np.nan with other values, or 'constant' with another mode, depending on the desired result):
https://docs.scipy.org/doc/numpy/reference/generated/numpy.pad.html
import numpy as np

bob = np.asarray([1, 2, 3], dtype=float)  # NaN needs a float array
alice = np.pad(bob, (0, 100 - len(bob)), 'constant', constant_values=(np.nan, np.nan))
So in your code it would look something like this:
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    # pad symmetrically back to the input length instead of hardcoding 100
    shorted = (len(values) - len(smas)) // 2
    print(shorted)
    smas = np.pad(smas, (shorted, shorted), 'constant', constant_values=(np.nan, np.nan))
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

vX = np.array([])
vY = np.array([])

x = 0
val = 0
while x < sampleSize:
    val += random.randint(min, max)
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1

plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
To answer your basic question, the key is to take a slice of the x-axis appropriate to the data of the moving average. Since you convolve 100 data elements with a window of size 5, the 'valid' result has 96 elements, each aligned with the last element of its window. You would plot it like this:
plt.plot(vX[window - 1:], movingaverage(vY, window))
That being said, your code could stand some optimization. For example, numpy arrays are stored in fixed-size buffers: any time you append to or delete from them, the entire array gets reallocated and copied, unlike Python lists, which have amortized growth built in. It is always better to preallocate if you know the array size ahead of time (which you do).
Secondly, an explicit loop is rarely necessary. You are generally better off using the under-the-hood loops implemented at the lowest level in the numpy functions instead; this is called vectorization. Random number generation, cumulative sums and incremental arrays are all fully vectorized in numpy. More generally, it's usually not very effective to mix Python-level computational functions (including the random module) with numpy ones.
Finally, you may want to consider a different convolution method. I would suggest something based on numpy.lib.stride_tricks.as_strided. This is a somewhat arcane, but very effective way to implement a sliding window with numpy arrays. I will show it here as an alternative to the convolution method you used, but feel free to ignore this part.
All in all:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    # this step creates a sliding-window view into the same buffer
    values = np.lib.stride_tricks.as_strided(values, shape=(window, values.size - window + 1), strides=values.strides * 2)
    # true division, so integer input yields a float result
    smas = values.sum(axis=0) / window
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

v_x = np.arange(sampleSize)
v_y = np.cumsum(np.random.random_integers(min, max, sampleSize))

plt.plot(v_x, v_y)
plt.plot(v_x[window - 1:], movingaverage(v_y, window))
plt.show()
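As an aside, NumPy 1.20+ ships a safer wrapper around this exact stride trick, so the shape and strides no longer need to be spelled out by hand. A sketch (movingaverage_swv is just an illustrative name):

from numpy.lib.stride_tricks import sliding_window_view

def movingaverage_swv(values, window):
    # each row of the view is one overlapping window of length `window`
    return sliding_window_view(values, window).mean(axis=1)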
A note on names: in Python, variable and function names are conventionally written name_with_underscore (snake_case); CamelCase is reserved for class names. np.random.random_integers uses inclusive bounds just like random.randint, but lets you specify the number of samples to generate. Confusingly, np.random.randint has an exclusive upper bound, more like random.randrange. (random_integers has since been deprecated in NumPy; np.random.randint(min, max + 1, sampleSize) is the forward-compatible equivalent.)
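A quick illustration of the bounds difference (the deprecated alias is shown for comparison only and may warn or fail on recent NumPy):

import numpy as np

np.random.random_integers(-10, 10, 5)  # inclusive upper bound (deprecated alias)
np.random.randint(-10, 11, 5)          # exclusive upper bound: pass 11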
I am trying to convert a set of 3D points into a heightmap (a 2D image that shows the largest displacement of the points from the floor).
The only way I can come up with is writing a for loop that iterates through all points and updates the heightmap; this method is quite slow.
import numpy as np

heightmap_resolution = 0.02

# generate some random 3D points
points = np.array([[x, y, z] for x in np.random.uniform(0, 2, 100) for y in np.random.uniform(0, 2, 100) for z in np.random.uniform(0, 2, 100)])

heightmap = np.zeros((int(np.max(points[:, 1]) / heightmap_resolution) + 1,
                      int(np.max(points[:, 0]) / heightmap_resolution) + 1))

for point in points:
    y = int(point[1] / heightmap_resolution)
    x = int(point[0] / heightmap_resolution)
    if point[2] > heightmap[y][x]:
        heightmap[y][x] = point[2]
I wonder if there is a better way of doing this. Any improvement is greatly appreciated!
The intuition:
If you find yourself using a for loop with numpy, check whether numpy already has an operation for it. I saw you wanted to compare items to get a maximum, and I wasn't sure if the 2D structure was important, so I changed it.
The second point is that heightmap preallocates a lot of memory you aren't going to use. Try a dictionary with an (x, y) tuple as the key, or a DataFrame like this:
import numpy as np
import pandas as pd

heightmap_resolution = 0.02

# generate some random 3D points
points = np.array([[x, y, z] for x in np.random.uniform(0, 2, 100) for y in np.random.uniform(0, 2, 100) for z in np.random.uniform(0, 2, 100)])

points_df = pd.DataFrame(points, columns=['x', 'y', 'z'])
# didn't know if you wanted to keep the x and y columns, so I made new ones
points_df['x_normalized'] = (points_df['x'] / heightmap_resolution).astype(int)
points_df['y_normalized'] = (points_df['y'] / heightmap_resolution).astype(int)
# a sparse heightmap: a Series of per-cell maxima indexed by (x, y) cell
heights = points_df.groupby(['x_normalized', 'y_normalized'])['z'].max()
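If you would rather stay in plain NumPy, here is a sketch of the same per-cell maximum using np.maximum.at, reusing points and heightmap_resolution from the question; it scatters each z into its cell without a Python loop:

import numpy as np

xi = (points[:, 0] / heightmap_resolution).astype(int)
yi = (points[:, 1] / heightmap_resolution).astype(int)
heightmap = np.zeros((yi.max() + 1, xi.max() + 1))
np.maximum.at(heightmap, (yi, xi), points[:, 2])  # per-cell running maximum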
I need to generate a HEALPix map (using healpy) from random $a_{\ell m}$, for a spin-2 function.
Schematically, it should look something like this:
import numpy as np
import healpy as hp

nside = 16  # for example

for el in range(1, L + 1):  # loop over ell modes
    for m in range(-el, el):  # for each ell mode, loop over m
        ind = hp.sphtfunc.Alm.getidx(nside, el, m)
        if m == 0:
            a_lm[ind] = np.random.randn()
        else:
            a_lm[ind] = np.random.randn() + 1j * np.random.randn()

a_tmp = hp.sphtfunc.alm2map(a_lm, nside, pol=True)
My two questions are:
1) How do I initialise a_lm? Specifically, what would its dimension be, using
a_lm = np.zeros(???)
2) If I understood correctly, the output a_tmp is a one-dimensional array. How do I reshape it into a two-dimensional array (the map) for plotting?
1) What properties do you want your alm to have? You could also just assume a certain power spectrum ($C_\ell$) and use hp.synalm() or hp.synfast().
For the initialization: you've already implemented that m runs from $-\ell$ to $+\ell$, so you have a one-dimensional array of length $\sum_{\ell=0}^{\ell_{\max}} (2\ell+1) = (\ell_{\max}+1)^2$. (Note that healpy itself stores only the $m \ge 0$ coefficients, which is why hp.Alm.getsize() below gives a smaller number.)
2) For the plotting, you could just directly generate a random map and then use e.g. hp.mollview(), which takes the 1-dimensional HEALPix map.
Alternatively, you can use hp.alm2map() to convert your alm to a map.
I also suggest you check out the tutorial for the plotting.
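A minimal sketch of the power-spectrum route mentioned above (the flat $C_\ell$ is purely illustrative):

import numpy as np
import healpy as hp

nside = 16
lmax = 3 * nside - 1
cl = np.ones(lmax + 1)          # assumed flat spectrum, for illustration only
alm = hp.synalm(cl, lmax=lmax)  # random alm consistent with cl
m = hp.alm2map(alm, nside)      # 1-D HEALPix map
hp.mollview(m)                  # plots directly from the 1-D map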
Usually we can follow these steps to get the length of a_lm:

import numpy as np
import healpy as hp

nside = 16

# Get the maximum multipole for the current nside
lmax = 3*nside - 1  # this can vary according to the use; in cosmology, the common value is 2*nside
alm_len = hp.Alm.getsize(lmax)
a_lm = np.zeros(alm_len, dtype=complex)  # the alm coefficients are complex
I think the tutorial linked in #Daniel's answer is a good resource for plotting Healpix maps.
I have two sets of points, one is a map consisting of x,y coordinates, and the second is a path of x,y coordinates. I'm trying to find the closest map points to my path points, pretty simple. Except my map is 380000 points and my paths (of which I have several) each consist of ~ 350000 points themselves.
Other than sampling my data to get smaller datasets, I'm trying to find a faster way to accomplish this task.
base algorithm:
import pandas as pd
from scipy.spatial.distance import cdist
...

def closest_point(point, points):
    return points[cdist([point], points).argmin()]

# log['point'].shape; 333000
# map_data['point'].shape; 380000
closest = [closest_point(log_p, list(map_data['point'])) for log_p in log['point']]
as per this example: Find closest point in Pandas DataFrames
After converting this to a tqdm progress bar to see how long it would take (as it was taking a while, obviously), I noticed it would take about 10hrs to complete.
tqdm loop:
for i in trange(len(log), desc='finding closest points'):
    closest.append(closest_point(log['point'].loc[i], list(map_data['point'])))

>> finding closest points: 5%| | 16432/333456 [32:11<10:13:52, 8.60it/s]
While 10 hours is not impossible, I wonder if there is a way to speed this up? I have a solid gpu/cpu/ram at my disposal so I feel this should be doable. I'm also learning tensorflow (but honestly my math is atrocious so I'm very in the dark with it)
Any ideas on how to speed this up with either multi-threading, gpu computation, tensorflow or some other sort of wizardry?
inb4 python is slow ;)
*edit: the image shows what I'm trying to do: green is the path, blue is the map, orange is what I'm trying to find.
The following is a mini example of what you're trying to do. Consider the variable coords1 as your log['point'] and coords2 as your map_data['point']. The end result is, for each point in coords1, the index of the closest point in coords2.
from scipy.spatial import distance
import numpy as np

coords1 = [(35.0456, -85.2672),
           (35.1174, -89.9711),
           (35.9728, -83.9422),
           (36.1667, -86.7833)]

coords2 = [(35.0456, -85.2672),
           (35.1174, -89.9711),
           (35.9728, -83.9422),
           (34.9728, -83.9422),
           (36.1667, -86.7833)]

tmp = distance.cdist(coords1, coords2, "sqeuclidean")  # sqeuclidean per Mark Setchell's comment, to improve speed further
result = np.argmin(tmp, 1)
# result: array([0, 1, 2, 4])
This should be way faster, because everything is done in one vectorized call rather than one point at a time.
After 3 years, but in case anyone is still looking at this issue... you may want to try Numba. I get almost a 9x reduction in runtime compared to scipy's distance.cdist on a 1.5 million point set against a 1.5 K set of path points. Also, as Mark Setchell commented, removing the np.sqrt can save considerable time on a big enough set of points, since the nearest neighbour under squared distance is the same point.
Results
size: (1459383, 2)
numba: 0.06402060508728027
cdist: 0.5371212959289551
Code
import numba
import numpy as np

# EUCLIDEAN DISTANCE
@numba.njit('(float64[:,::1], float64[::1], float64[::1])', parallel=True, fastmath=True)
def pz_dist(p_array, x_flat, y_flat):
    m = p_array.shape[0]
    n = x_flat.shape[0]
    d = np.empty(shape=(m, n), dtype=np.float64)
    for i in numba.prange(m):
        p1 = p_array[i, :]
        for j in range(n):
            _x = x_flat[j] - p1[0]
            _y = y_flat[j] - p1[1]
            _d = np.sqrt(_x**2 + _y**2)
            d[i, j] = _d
    return d
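A hypothetical usage sketch (the array sizes are made up), pairing the distance matrix with argmin as in the cdist answer above:

import numpy as np

map_pts = np.random.rand(10000, 2)    # stand-in for the map points
path_x = np.random.rand(500)          # stand-in for the path x coordinates
path_y = np.random.rand(500)          # stand-in for the path y coordinates
d = pz_dist(map_pts, path_x, path_y)  # shape (10000, 500)
nearest = np.argmin(d, axis=0)        # nearest map-point index per path point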
In my application, the data is sampled on a distorted grid, and I would like to resample it to an undistorted grid. In order to test this, I wrote the following program with example distortions and a simple function as data:
from __future__ import division
import numpy as np
import scipy.interpolate as intp
import pylab as plt

# Defining some variables:
quadratic = -3/128
linear = 1/16
pn = np.poly1d([quadratic, linear, 0])

pixels_x = 50
pixels_y = 30

frame = np.zeros((pixels_x, pixels_y))
# per-row half-width; the split point is illustrative -- the two pieces just need to total pixels_y entries
x_width = np.concatenate((np.linspace(8, 7.8, pixels_y // 2), np.linspace(7.8, 8, pixels_y - pixels_y // 2)))

def data(x, y):
    z = y*(np.exp(-(x-5)**2/3) + np.exp(-(x)**2/5) + np.exp(-(x+5)**2))
    return(z)

# Generating grid coordinates
yt = np.arange(380, 380 + pixels_y*4, 4)
xt = np.linspace(-7.8, 7.8, pixels_x)
X, Y = np.meshgrid(xt, yt)
Y = Y.T
X = X.T

Y_m = np.zeros((pixels_x, pixels_y))
X_m = np.zeros((pixels_x, pixels_y))

# generating distorted grid coordinates:
for i in range(pixels_y):
    Y_m[:, i] = Y[:, i] - pn(xt)
    X_m[:, i] = np.linspace(-x_width[i], x_width[i], pixels_x)

# Sample data:
for i in range(pixels_y):
    for j in range(pixels_x):
        frame[j, i] = data(X_m[j, i], Y_m[j, i])

Y_m = Y_m.flatten()
X_m = X_m.flatten()
frame = frame.flatten()

Y = Y.flatten()
X = X.flatten()

ipf = intp.interp2d(X_m, Y_m, frame)
interpolated_frame = ipf(xt, yt)
At this point, I have two questions:
The code works, but I get the following warning:
Warning: No more knots can be added because the number of B-spline coefficients
already exceeds the number of data points m. Probably causes: either
s or m too small. (fp>s)
kx,ky=1,1 nx,ny=54,31 m=1500 fp=0.000006 s=0.000000
Also, some interpolation artifacts appear, and I assume that they are related to the warning - Do you guys know what I am doing wrong?
For my actual application, the frames need to be around 500*100, but when I do this, I get a MemoryError. Is there something I can do about that, apart from splitting the frame into several parts?
Thanks!
This problem is most likely related to the use of bisplrep and bisplev within interp2d. The docs mention that it uses a smoothing factor of s=0.0, and that bisplrep and bisplev should be used directly if more control over s is needed. The related docs state that s should lie in the range $(m - \sqrt{2m},\ m + \sqrt{2m})$, where m is the number of points used to construct the splines. I had a similar problem and found it solved when using bisplrep and bisplev directly, where s is an optional parameter.
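A minimal sketch of that direct route, reusing the flattened X_m, Y_m, frame and the grid vectors xt, yt from the question (placing s at the top of the recommended range is my assumption, not something the docs mandate):

import numpy as np
import scipy.interpolate as intp

m = X_m.size                  # number of scattered data points
s = m + np.sqrt(2 * m)        # smoothing factor within the recommended range
tck = intp.bisplrep(X_m, Y_m, frame, kx=1, ky=1, s=s)
interpolated_frame = intp.bisplev(xt, yt, tck)  # evaluate on the regular grid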
For 2D interpolation, griddata is solid, local, and fast. Take a look at problem-with-2d-interpolation-in-scipy-non-rectangular-grid on SO.
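A short sketch of the griddata route, again reusing the scattered arrays from the question (method='linear' is one of several options):

import numpy as np
from scipy.interpolate import griddata

XI, YI = np.meshgrid(xt, yt)  # target (undistorted) grid
interpolated_frame = griddata((X_m, Y_m), frame, (XI, YI), method='linear')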
You might want to look at the following interp method in basemap:
mpl_toolkits.basemap.interp
http://matplotlib.sourceforge.net/basemap/doc/html/api/basemap_api.html
unless you really need spline-based interpolation.