I have a data set of 2 1D arrays. My goal is to count the points in each section of a grid (with a size of my choosing).
import numpy as np
import matplotlib.pyplot as plt

plt.figure(figsize=(8,7))
np.random.seed(5)
x = np.random.random(100)
y = np.random.random(100)
plt.plot(x,y,'bo')
plt.grid(True)
[Image: My Plot]
I would like to be able to split each section into its own unique set of two 1D arrays or one 2D array.
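import numpy as np

def split(arr, cond):
    # Elements where cond is True, followed by elements where it is False
    return [arr[cond], arr[~cond]]

a = np.array([1,3,5,7,2,4,6,8])
print(split(a, a<5))
This will return a list of two arrays containing [1,3,2,4] and [5,7,6,8] (the original order is preserved within each part).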
Try using this function based on the conditions you set (intervals of 0.2 it seems)
NOTE: to implement this correctly for your problem, you'll have to modify the split function seeing that you want to split the data into more than two sections. I'll leave that as an exercise for you to do :)
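One possible way to approach that exercise is sketched below. It assumes the sections are defined by a sorted list of bin edges and uses np.digitize to assign each value to a section; the helper name split_by_bins and the example edges are illustrative and not part of the answer above.

```python
import numpy as np

def split_by_bins(arr, edges):
    """Split arr into len(edges) + 1 parts using sorted bin edges (hypothetical helper)."""
    # np.digitize returns, for each value, the index of the bin it falls in:
    # 0 for values below edges[0], len(edges) for values >= edges[-1].
    idx = np.digitize(arr, edges)
    return [arr[idx == k] for k in range(len(edges) + 1)]

a = np.array([1, 3, 5, 7, 2, 4, 6, 8])
parts = split_by_bins(a, edges=[3, 5, 7])
for part in parts:
    print(part)
```

For the grid in the question, the same helper could be applied with edges such as [0.2, 0.4, 0.6, 0.8], once for x and once for y.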
This function takes in two 1D arrays and returns a 2D matrix, in which each element is the number of points in the grid section corresponding to your image:
import numpy as np

def count_points(arr1, arr2, bin_width):
    x = np.floor(arr1/bin_width).astype(int)  # Column bin number for each value
    y = np.floor(arr2/bin_width).astype(int)  # Row bin number for each value
    # Rows follow y and columns follow x
    counts = np.zeros(shape=(max(y)+1, max(x)+1), dtype=int)
    for i in range(x.shape[0]):
        row = max(y) - y[i]  # Flip so that row 0 is the top of the plot
        col = x[i]
        counts[row, col] += 1
    return counts
Note that x and y don't line up with the column and row index, since the origin is at the bottom left in the plot but the "origin" (index [0,0]) of the matrix is the top left. I rearranged the matrix so that the elements line up with what you see in the plot.
Example:
np.random.seed(0)
x = np.random.random(100)
y = np.random.random(100)
print(count_points(x, y, 0.2)) # 0.2 matches the default gridlines in matplotlib
# Output:
#[[8 4 5 4 0]
# [2 5 5 7 4]
# [7 1 3 8 3]
# [4 2 5 3 4]
# [4 4 3 1 4]]
This matches the number of points in each grid cell of the corresponding scatter plot.
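As an alternative to the explicit loop, the same kind of count can probably be obtained with np.histogram2d. The sketch below assumes that all points lie in [0, extent]; the function name count_points_hist and the extent parameter are illustrative additions, not part of the answer above.

```python
import numpy as np

def count_points_hist(x, y, bin_width, extent=1.0):
    """Count points per grid cell with np.histogram2d (assumes data in [0, extent])."""
    n_bins = int(round(extent / bin_width))
    edges = np.linspace(0.0, extent, n_bins + 1)
    hist, _, _ = np.histogram2d(x, y, bins=[edges, edges])
    # hist[i, j] counts points in x-bin i and y-bin j. Transposing puts y on
    # the rows, and reversing the rows puts the largest y at the top, as in the plot.
    return hist.T[::-1].astype(int)

rng = np.random.default_rng(0)
x = rng.random(100)
y = rng.random(100)
counts = count_points_hist(x, y, 0.2)
print(counts)
```

Unlike the loop version, the shape of the result here is fixed by the chosen extent rather than by the largest value in the data.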
I would like to apply a function func over each row of a 2D ndarray arr shaped n x m, with a provided list of arguments args (of length n). That is, for each row i the function is executed as func(arr[i, :], args[i]).
This task can be accomplished with np.fromiter (using a for loop):
iterable = (func(row, arg) for row, arg in zip(arr, args))
results = np.fromiter(iterable, dtype=int)
However, this can take some time for large arrays. According to unutbu's answer, using NumPy's Python utility functions (e.g. np.apply_along_axis) does not provide a significant speedup. Is there a way to optimize this process?
To avoid falling into the XY problem trap, below is my original problem statement:
I have an ndarray representing an image, shaped n x m. This image undergoes processing during which a specific index i is calculated for each row. I want to compose an image of the original shape (n x m) using the data to the right of index i in each row. That is, I want to resample each row[i:] of length m - i to m samples. Note that I want to use my own implementation of the resampling function (I don't want to use scipy.signal.resample etc.).
EDIT:
Test code with func example (added count argument to fromiter as suggested by LudvigH):
import numpy as np
import matplotlib.pyplot as plt


def simple_slant_range_correction(
    row, height, n_samples, max_ground_range, max_slant_range, slant_range_resolution
):
    ground_ranges = np.linspace(height, max_ground_range, n_samples)
    slant_ranges = np.sqrt(ground_ranges ** 2 + height ** 2)
    slant_ranges_indicies = slant_ranges / slant_range_resolution - 1
    slant_ranges_indicies_floor = np.floor(slant_ranges_indicies).astype(np.int16)
    # np.clip takes (array, min, max)
    slant_ranges_indicies_ceil = np.clip(
        slant_ranges_indicies_floor + 1, 0, n_samples - 1
    )
    # Linear interpolation between the two neighbouring samples
    weight = slant_ranges_indicies - slant_ranges_indicies_floor
    return (
        weight * row[slant_ranges_indicies_ceil]
        + (1 - weight) * row[slant_ranges_indicies_floor]
    ).astype(np.float32)


if __name__ == "__main__":
    # Test parameters
    n, m = 100, 100
    max_slant_range = 50
    slant_range_resolution = max_slant_range / m

    # Create some dummy data
    data = np.zeros((n, m))
    h_indicies = np.ones((n), dtype=int)
    for i in np.arange(0, n, 5):
        data[:i, :i] += i
        h_indicies[:i] += 1
    heights = h_indicies * slant_range_resolution
    max_ground_ranges = np.sqrt(max_slant_range ** 2 - heights ** 2)

    # Perform resampling based on h_index
    iters = (
        simple_slant_range_correction(
            row, height, m, max_ground_range, max_slant_range, slant_range_resolution
        )
        for row, height, max_ground_range in zip(data, heights, max_ground_ranges)
    )
    data_sampled = np.fromiter(iters, dtype=np.dtype((np.float32, m)), count=n)

    # Plot data
    fig, axs = plt.subplots(1, 2)
    axs[0].plot(h_indicies + 0.5, np.arange(n) + 0.5, c="red")
    axs[0].imshow(data, vmin=0, vmax=data.max())
    axs[1].imshow(data_sampled, vmin=0, vmax=data.max())
    axs[0].set_axis_off()
    axs[1].set_axis_off()
    plt.tight_layout()
    plt.show()
It is typically faster to take advantage of vectorization by using numpy operations to manipulate the data, as compared to using python functions and objects to manipulate the data. Below is an example of a way to solve the problem described at the end of your question using numpy vectorization.
import numpy as np
Choosing some array and column indices as an example:
#     1 2 3 3                  1
# A = 4 5 6 6    row_indices = 3
#     7 8 9 9                  2
A = np.array([[1,2,3,3],[4,5,6,6],[7,8,9,9]])
row_indices = np.array([1,3,2])
Use vector operations to build a boolean masking array and then multiply the original array by the mask:
NM = np.shape(A)
N = NM[0]
M = NM[1]
col = np.arange(M,dtype=np.uint32)
B = np.outer(np.ones([1,N],dtype=np.uint32),col)
C = np.outer(row_indices,np.ones([1,M],dtype=np.uint32))
A_sampled = (B>=C)*A
print(A_sampled)
# output:
# 0 2 3 3
# 0 0 0 6
# 0 0 9 9
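The same mask can probably be written more compactly with broadcasting, and the idea extends to the resampling described in the question. The sketch below is only one possible approach: resample_rows is a hypothetical helper that stretches row[i:] to the full row length with linear interpolation, which is an assumption about the kind of resampling wanted.

```python
import numpy as np

A = np.array([[1, 2, 3, 3], [4, 5, 6, 6], [7, 8, 9, 9]])
row_indices = np.array([1, 3, 2])

# Broadcasting a (M,) column index against a (N, 1) start index gives an (N, M) mask.
mask = np.arange(A.shape[1]) >= row_indices[:, None]
A_sampled = np.where(mask, A, 0)

def resample_rows(arr, start):
    """Stretch arr[i, start[i]:] to the full row length (hypothetical helper)."""
    n, m = arr.shape
    t = np.linspace(0.0, 1.0, m)                      # relative position in the output row
    pos = start[:, None] + t * (m - 1 - start[:, None])  # fractional source column, (n, m)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, m - 1)
    w = pos - lo                                      # interpolation weight of the right neighbour
    left = np.take_along_axis(arr, lo, axis=1)
    right = np.take_along_axis(arr, hi, axis=1)
    return (1 - w) * left + w * right

print(A_sampled)
print(resample_rows(A.astype(float), row_indices))
```

All rows are processed at once, so no Python-level loop over rows is needed. Whether this is faster in practice depends on the array sizes.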
I am using Numpy as part of a neural network, and when updating the weights I am struggling to implement a step in a natural way.
The step takes as input rho_deltas (shape: (m,)) and self.node_layers[i - 1].vals (shape: (n,)) and outputs self.previous_edge_layer[i - 1] (shape: (m,n)).
It should be such that self.previous_edge_layer[i - 1][j][k] == rho_deltas[j] * self.node_layers[i - 1].vals[k]
Example working inputs and outputs here.
(I'll try to update these so it is easier to copy and paste for testing your methods.)
I have managed to get it working well like:
self.previous_edge_layer[i - 1] = np.array([rho_delta * self.node_layers[i - 1].vals for rho_delta in rho_deltas])
However, it feels to me as though there is a Numpy operator/function that should be able to do this without the iteration over the full list. My inclination is matrix multiplication (@); however, I have not been able to get this to work. Or perhaps the dot product (*); however, for n != m this fails.
Furthermore, I struggled to come up with a useful name for this question, so feel free to rename it to something better :).
Matrix multiplication is the right idea: the preliminary is to form matrices from your 1D vectors. We need 2D matrices here, even though one dimension will be of size 1. Something like this:
import numpy as np
rho_deltas = np.array([7.6, 12.3, 11.1]) # example data with m = 3
layer_vals = np.array([1.5, 20.9, -3.5, 7.0]) # example with n = 4
rho_deltas_col_mat = rho_deltas.reshape(-1, 1) # m rows, 1 column
layer_vals_row_mat = layer_vals.reshape(1, -1) # 1 row, n columns
res = rho_deltas_col_mat @ layer_vals_row_mat
print(res.shape)
print(all(res[j][k] == rho_deltas[j] * layer_vals[k] for j in range(rho_deltas.shape[0]) for k in range(layer_vals.shape[0])))
prints:
(3, 4)
True
Alternatively, you could reshape both of them to column matrices and transpose the second one, something like:
rho_deltas_col_mat = rho_deltas.reshape(-1, 1)
layer_vals_col_mat = layer_vals.reshape(-1, 1)
res = rho_deltas_col_mat @ layer_vals_col_mat.T
Based on the link you provided, you can use NumPy's meshgrid function to repeat each of the two arrays along the other's dimension and then simply multiply them element-wise.
The following will do what you want (I tested it on your example and it produced the same results):
import numpy as np
a = np.array([1,2,3])
b = np.array([10,20,30,40,50])
bv, av = np.meshgrid(b,a) # returns each array repeated along the other one's dimension
# av = [[1 1 1 1 1]
# [2 2 2 2 2]
# [3 3 3 3 3]]
# bv = [[10 20 30 40 50]
# [10 20 30 40 50]
# [10 20 30 40 50]]
c = av*bv
# c = [[ 10 20 30 40 50]
# [ 20 40 60 80 100]
# [ 30 60 90 120 150]]
A similar result can also be achieved with NumPy's einsum function if you are familiar with Einstein summation notation.
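For comparison, here is a small sketch of three equivalent ways to form this outer product: np.outer, broadcasting, and np.einsum. The example values are the ones used earlier in this thread; the variable names are illustrative.

```python
import numpy as np

rho_deltas = np.array([7.6, 12.3, 11.1])        # m = 3
layer_vals = np.array([1.5, 20.9, -3.5, 7.0])   # n = 4

# 1. Dedicated outer-product function
via_outer = np.outer(rho_deltas, layer_vals)

# 2. Broadcasting: (m, 1) * (1, n) -> (m, n)
via_broadcast = rho_deltas[:, None] * layer_vals[None, :]

# 3. Einstein summation: result[j, k] = rho_deltas[j] * layer_vals[k]
via_einsum = np.einsum("j,k->jk", rho_deltas, layer_vals)

print(via_outer.shape)
print(np.allclose(via_outer, via_broadcast), np.allclose(via_outer, via_einsum))
```

np.outer is arguably the most readable of the three when both inputs are 1D.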
I am trying to plot a 1D lattice graph, but I get the following error:
NetworkXPointlessConcept: the null graph has no paths, thus there is no average shortest path length
What is the problem with this code?
Thanks.
N = 1000
x = 0
for n in range(1, N, 10):
    lattice_1d_distance = list()
    d = 0
    lattice_1d = nx.grid_graph(range(1,n))
    d = nx.average_shortest_path_length(lattice_1d)
    lattice_1d_distance.append(d)
    x.append(n)
plt.plot(x, lattice_1d_distance)
plt.show()
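According to the networkx documentation, the input to nx.grid_graph is a list of dimensions. In your first iteration n = 1, so range(1, n) is empty, grid_graph returns the null graph, and average_shortest_path_length raises the error.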
Example:
print(list(range(1,4)))
nx.draw(nx.grid_graph(list(range(1,4))))  # effectively a 2-dimensional grid: there are 3 entries, but one of them is 1
[1, 2, 3]
print(list(range(1,5)))
nx.draw(nx.grid_graph([1,2,3,4]))  # effectively a 3-dimensional grid: there are 4 entries, but one of them is 1
[1, 2, 3, 4]
Therefore, let's say you want to either (1) plot the distance against an increasing number of dimensions, with a constant size for each dimension, or (2) plot the distance against an increasing size of each dimension, with a constant number of dimensions:
import networkx as nx
import matplotlib.pyplot as plt

N = 10
x = []
lattice_1d_distance = []
for n in range(1, 10):
    d = 0
    lattice_1d = nx.grid_graph([2]*n)  # increasing number of dimensions, each dimension has the same length
    d = nx.average_shortest_path_length(lattice_1d)
    lattice_1d_distance.append(d)
    x.append(n)
plt.plot(x, lattice_1d_distance)
plt.show()

N = 10
x = []
lattice_1d_distance = []
for n in range(1, 10):
    d = 0
    lattice_1d = nx.grid_graph([n,n])  # 2-dimensional graphs, with increasing length for each dimension
    d = nx.average_shortest_path_length(lattice_1d)
    lattice_1d_distance.append(d)
    x.append(n)
plt.plot(x, lattice_1d_distance)
plt.show()
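Also, pay attention to how the list variables are declared: in your code, x is initialised as an integer (which has no append method), and lattice_1d_distance is re-created inside the loop, so it never holds more than the latest value.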
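If the goal really is a one-dimensional lattice of n nodes, a path graph may be what is wanted. The sketch below swaps in nx.path_graph for that purpose, which is an assumption about the intent of the question; the names sizes and distances are illustrative.

```python
import networkx as nx

sizes = list(range(2, 103, 10))   # lattice lengths; start at 2 so every graph has an edge
distances = []                    # declared once, outside the loop
for n in sizes:
    lattice_1d = nx.path_graph(n)  # n nodes in a line, i.e. a 1D lattice
    distances.append(nx.average_shortest_path_length(lattice_1d))

for n, d in zip(sizes, distances):
    print(n, d)
```

The two lists can then be passed to plt.plot(sizes, distances) as in the code above. For a path of n nodes the average shortest path length works out to (n + 1) / 3, which gives a quick sanity check.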
I am having trouble parsing a text file that I created with another program. The text file looks something like this:
velocity 4
0 0
0.0800284750334461 0.0702333599787275
0.153911082737118 0.128537103048848
0.222539323234924 0.176328826156044
0.286621942300277 0.21464146333504
0.346732028739683 0.244229944930359
0.403339781262399 0.265638972071027
...
velocity 8
0 0
0.169153136373962 0.124121036173475
0.312016311613761 0.226778846267302
0.435889653693839 0.312371513797743
0.545354054604357 0.383832483710643
0.643486956562741 0.443203331839287
...
I want to grab the number in the same row as velocity (the header) and save it as the title of the plot of the subsequent data. Every other row apart from the header represents the x and y coordinates of a shooting ball.
So if I have five different headers, I would like to see five different lines on a single graph with a legend displaying the different velocities.
Here is my python code so far. I am close to what I want to get, but I am missing the first set of data (velocity = 4 m/s) and the colors on my legend don't match the line colors.
import matplotlib.pyplot as plt

xPoints = []
yPoints = []
fig, ax = plt.subplots()
with open('artilleryMotion.txt') as inf:
    for line in inf:
        column = line.split()
        if line.startswith("v"):
            velocity = column[1]
            ax.plot(xPoints, yPoints, label = '%s m/s' % velocity)
        else:
            xPoints.append(column[0])
            yPoints.append(column[1])
ax.legend()
plt.title("Ping-Pong Ball Artillery Motion")
plt.xlabel("distance")
plt.ylabel("height")
plt.ylim(ymin = 0)
ax.set_autoscaley_on(1)
I have been struggling with this for a while.
Edit_1: This is my output at the moment:
[Image: Artillery motion plot]
Edit_2: I removed the indentation of the last lines of code. The color problem still occurs.
Edit_3: How would I go about saving the x and y points to a new array for each velocity? This may solve my issues.
Edit_4: Thanks to Charles Morris, I was able to create these plots. I just need to now determine if the initial upwards "arcing" motion by the ping pong ball for the higher velocities is representative of the physics or is a limitation of my code.
[Image: Artillery Motion Final]
Edit: Ignore the old information and see the solved solution below:
The following code works on an example text file, input.txt:
velocity 4
0 0
0.0800284750334461 0.0702333599787275
0.153911082737118 0.128537103048848
0.222539323234924 0.176328826156044
0.286621942300277 0.21464146333504
0.346732028739683 0.244229944930359
0.403339781262399 0.265638972071027
velocity 8
0 0
0.169153136373962 0.124121036173475
0.312016311613761 0.226778846267302
0.435889653693839 0.312371513797743
0.545354054604357 0.383832483710643
0.643486956562741 0.443203331839287
1) Import our text file
We use np.genfromtxt() for the import. In this case, we can specify dtype = float. The effect is that numbers are imported as floats and strings (in this case 'velocity') are imported as NaN.
Source:
https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html
How to use numpy.genfromtxt when first column is string and the remaining columns are numbers?
import numpy as np
from matplotlib import pyplot as plt
from itertools import groupby
A = np.genfromtxt('input.txt',dtype=float)
>>>
array([[ nan, 4. ],
[ 0. , 0. ],
[ 0.08002848, 0.07023336],
[ 0.15391108, 0.1285371 ],
[ 0.22253932, 0.17632883],
[ 0.28662194, 0.21464146],
[ 0.34673203, 0.24422994],
[ 0.40333978, 0.26563897],
[ nan, 8. ],
[ 0. , 0. ],
[ 0.16915314, 0.12412104],
[ 0.31201631, 0.22677885],
[ 0.43588965, 0.31237151],
[ 0.54535405, 0.38383248],
[ 0.64348696, 0.44320333]])
2) Slice the imported array A
We can slice these arrays into separate X and Y arrays representing our X and Y values. Read up on array slicing in numpy here: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
In this case, we take all values with index = 0 (X) and all values with index 1 (Y):
# x values
X = A[:,0]
# y values
Y = A[:,1]
>>> X = array([ nan, 0. , 0.08002848, 0.15391108, 0.22253932,
0.28662194, 0.34673203, 0.40333978, nan, 0. ,
0.16915314, 0.31201631, 0.43588965, 0.54535405, 0.64348696])
>>> Y = array([ 4. , 0. , 0.07023336, 0.1285371 , 0.17632883,
0.21464146, 0.24422994, 0.26563897, 8. , 0. ,
0.12412104, 0.22677885, 0.31237151, 0.38383248, 0.44320333])
3) Split the data for each velocity.
Here we want to separate our X and Y values into one group per velocity. Our X values are separated by nan and our Y values are separated by 4, 8, 16, ...
Thus, for x, we split by nan. The nan is a result of genfromtxt() parsing the word velocity as a float.
Sources:
numpy: split 1D array of chunks separated by nans into a list of the chunks
Split array at value in numpy
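For y, we split our array on the numbers 4, 8, 16 etc. To do this, we exclude non-zero values that leave zero remainder when divided by 4 (using Python's % operator). Note that this relies on every velocity being a multiple of 4 and on no non-zero height being one.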
Sources:
Split array at value in numpy
How to check if a float value is a whole number
Split NumPy array according to values in the array (a condition)
Find the division remainder of a number
How do I use Python's itertools.groupby()?
XX = [list(v) for k,v in groupby(X,np.isfinite) if k]
YY = [list(v) for k,v in groupby(Y,lambda x: x % 4 != 0 or x == 0) if k]
>>>
XX = [[0.0,
0.080028475033446095,
0.15391108273711801,
0.22253932323492401,
0.28662194230027699,
0.34673202873968301,
0.403339781262399],
[0.0,
0.16915313637396201,
0.31201631161376098,
0.43588965369383897,
0.54535405460435704,
0.64348695656274102]]
>>> YY =
[[0.0,
0.070233359978727497,
0.12853710304884799,
0.17632882615604401,
0.21464146333504,
0.24422994493035899,
0.26563897207102699],
[0.0,
0.124121036173475,
0.22677884626730199,
0.31237151379774297,
0.38383248371064299,
0.44320333183928701]]
4) Extract labels
Using a similar technique as above, we keep only the values equal to our velocities 4, 8, 16 etc. In this case, we accept only those numbers that leave zero remainder when divided by 4 and are not 0. We then convert each one to a string and append m/s.
Ylabels = [list(v) for k,v in groupby(Y,lambda x: x % 4 == 0 and x != 0) if k]
Velocities = [str(i[0]) + ' m/s' for i in Ylabels]
>>> Ylabels = [[4.0], [8.0]]
>>> Velocities = ['4.0 m/s', '8.0 m/s']
5) Plot
Plot values by index for each velocity.
fig, ax = plt.subplots()
for i in range(0,len(XX)):
    plt.plot(XX[i],YY[i],label = Velocities[i])
ax.legend()
plt.title("Ping-Pong Ball Artillery Motion")
plt.xlabel("distance")
plt.ylabel("height")
plt.ylim(bottom = 0)  # the ymin keyword is no longer accepted by recent matplotlib versions
ax.set_autoscaley_on(1)
Code Altogether:
import numpy as np
from matplotlib import pyplot as plt
from itertools import groupby

# Header rows ("velocity N") are parsed as [nan, N]
A = np.genfromtxt('input.txt',dtype=float)
X = A[:,0]
Y = A[:,1]

# Velocity labels: non-zero multiples of 4 in the y column
Ylabels = [list(v) for k,v in groupby(Y,lambda x: x % 4 == 0 and x != 0) if k]
Velocities = [str(i[0]) + ' m/s' for i in Ylabels]

# One list of x values and one list of y values per velocity
XX = [list(v) for k,v in groupby(X,np.isfinite) if k]
YY = [list(v) for k,v in groupby(Y,lambda x: x % 4 != 0 or x == 0) if k]

fig, ax = plt.subplots()
for i in range(0,len(XX)):
    plt.plot(XX[i],YY[i],label = Velocities[i])
ax.legend()
plt.title("Ping-Pong Ball Artillery Motion")
plt.xlabel("distance")
plt.ylabel("height")
plt.ylim(bottom = 0)  # the ymin keyword is no longer accepted by recent matplotlib versions
ax.set_autoscaley_on(1)
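Since the modulo test depends on the actual velocity values, a possibly more robust variation is to reuse the nan rows themselves as separators for both columns. The sketch below assumes an array laid out like A above (header rows parsed as [nan, velocity]); the small inline array and the names header_rows and blocks are illustrative.

```python
import numpy as np

nan = np.nan
# Stand-in for the array returned by np.genfromtxt('input.txt', dtype=float)
A = np.array([
    [nan, 4.0],
    [0.0, 0.0],
    [0.08, 0.07],
    [nan, 8.0],
    [0.0, 0.0],
    [0.17, 0.12],
    [0.31, 0.23],
])

# Rows whose first column is nan are the "velocity" headers.
header_rows = np.flatnonzero(np.isnan(A[:, 0]))
velocities = A[header_rows, 1]

# Split at every header row; the first piece (before the first header) is
# empty here and is dropped, and each remaining piece starts with its header row.
blocks = [block[1:] for block in np.split(A, header_rows)[1:]]

for v, block in zip(velocities, blocks):
    print(v, block[:, 0], block[:, 1])
```

Each block is an (n_points, 2) array, so block[:, 0] and block[:, 1] can be passed directly to plt.plot with a label built from v.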
Old Answer:
The first time you iterate over all lines in the file, your xPoints and yPoints arrays are empty. Therefore, when you try and plot values for v = 4, you are plotting an empty array - hence your missing line.
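You need to populate the arrays first, and then plot them. At the moment, the points read for v = 4 are plotted in the line labelled v = 8, the points read for v = 8 in the line labelled v = 16, and so on. Because the lists are never cleared, each line also still contains the points of all earlier velocities.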
Ignore:
For the array population, try the following:
xPoints = []
yPoints = []
with open('artilleryMotion.txt') as inf:
    # initialize placeholder velocity variable
    velocity = 0
    for line in inf:
        column = line.split()
        if line.startswith("v"):
            velocity = column[1]
        else:
            xPoints.append({velocity: column[0]})
            yPoints.append({velocity: column[1]})
In the above, you save the data as a list of dictionaries (separate for x and y points), where the key is equal to the velocity that has been read in most recently, and the values are the x and y coordinates.
As a new velocity is read in, the placeholder variable velocity is updated, so the x and y values can be identified according to the key that they have.
This allows you to separate your plots by dictionary key (look up D.items(), or D.iteritems() in Python 2) and plot each set of points individually.
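A variation on this idea is to keep a single dictionary that maps each velocity to its own pair of coordinate lists, which also addresses the question of storing separate arrays per velocity. This is a minimal sketch assuming the file format shown in the question (without the "..." lines); read_trajectories and the inline sample text are illustrative.

```python
import io

def read_trajectories(lines):
    """Group x/y points by the most recent 'velocity' header (hypothetical helper)."""
    trajectories = {}
    current = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "velocity":
            current = float(parts[1])
            trajectories[current] = ([], [])  # new, empty lists for this velocity
        elif current is not None:
            xs, ys = trajectories[current]
            xs.append(float(parts[0]))  # convert the text to numbers before plotting
            ys.append(float(parts[1]))
    return trajectories

sample = io.StringIO(
    "velocity 4\n0 0\n0.08 0.07\n0.15 0.13\n"
    "velocity 8\n0 0\n0.17 0.12\n"
)
trajectories = read_trajectories(sample)
for velocity, (xs, ys) in trajectories.items():
    print(velocity, xs, ys)
```

With a real file, the same function can be called as read_trajectories(open('artilleryMotion.txt')), and each entry plotted with ax.plot(xs, ys, label='%g m/s' % velocity) so that every line gets its own data and label.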
I have two arrays, let's say x and y, that contain a few thousand data points.
Plotting a scatterplot gives a beautiful representation of them. Now I'd like to select all points within a certain radius, for example r = 10.
I tried this, but it does not work, as it's not a grid.
x = [1,2,4,5,7,8,....]
y = [-1,4,8,-1,11,17,....]
RAdeccircle = x**2+y**2
r = 10
regstars = np.where(RAdeccircle < r**2)
This is not the same as an nxn array, and RAdeccircle = x**2+y**2 does not seem to work as it does not try all permutations.
You can only apply ** element-wise to a NumPy array. In your case you are using lists, and using ** on a list raises an error, so you first need to convert the lists to NumPy arrays using np.array():
import numpy as np
x = np.array([1,2,4,5,7,8])
y = np.array([-1,4,8,-1,11,17])
RAdeccircle = x**2+y**2
print(RAdeccircle)
r = 10
regstars = np.where(RAdeccircle < r**2)
print(regstars)
>>> [ 2 20 80 26 170 353]
>>> (array([0, 1, 2, 3], dtype=int64),)
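To actually pull out the selected points, a boolean mask can be used directly as an index. The sketch below also allows a circle that is not centred on the origin; the function within_radius and its center parameter are illustrative additions, not part of the answer above.

```python
import numpy as np

def within_radius(x, y, r, center=(0.0, 0.0)):
    """Boolean mask of the points strictly inside a circle (hypothetical helper)."""
    dx = x - center[0]
    dy = y - center[1]
    return dx**2 + dy**2 < r**2

x = np.array([1, 2, 4, 5, 7, 8])
y = np.array([-1, 4, 8, -1, 11, 17])

mask = within_radius(x, y, r=10)
x_in, y_in = x[mask], y[mask]   # coordinates of the selected points
print(x_in, y_in)
```

Each x[i] is paired only with its own y[i], which is what is needed for a scatter of points, so no permutations or grid are required.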