General way to quantize floating point numbers into arbitrary number of bins? - python

I want to quantize a series of numbers with a maximum value of X and a minimum value of Y into an arbitrary number of bins. For instance, if the maximum value of my array is 65535 and the minimum is 0 (do not assume these are all integers) and I want to quantize the values into 2 bins, all values greater than floor(65535/2) would become 65535 and the rest would become 0. The same applies if I want to quantize the array into any number of bins between 1 and 65535. Is there an efficient and easy way to do this? If not, how can I do this efficiently when the number of bins is a power of 2? Pseudocode would be fine, but Python + NumPy is preferred.

It's not the most elegant solution, but:
import numpy as np

MIN_VALUE = 0
MAX_VALUE = 65535
NO_BINS = 2
# Create a random dataset from the [0, 65535] interval
numbers = np.random.randint(0, 65535 + 1, 100)
# Create bin edges
bins = np.arange(0, 65535, (MAX_VALUE - MIN_VALUE) / NO_BINS)
# Assign each number to a bin
digits = np.digitize(numbers, bins)
# Get bin values
_, bin_val = np.histogram(numbers, NO_BINS - 1, range=(MIN_VALUE, MAX_VALUE))
# Change the values to the bin value
for iter_bin in range(1, NO_BINS + 1):
    numbers[np.where(digits == iter_bin)] = bin_val[iter_bin - 1]
UPDATE
Does the same job:
import pandas as pd
import numpy as np
# or bin_labels = [i*((MAX_VALUE - MIN_VALUE) / (NO_BINS-1)) for i in range(NO_BINS)]
_, bin_labels = np.histogram(numbers, NO_BINS-1, range=(MIN_VALUE, MAX_VALUE))
pd.cut(numbers, NO_BINS, right=False, labels=bin_labels)
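For reference, here is a more self-contained sketch of the same idea using np.digitize (the quantize helper name and its defaults are illustrative, not from the answers above): the bin "labels" are n evenly spaced levels between the minimum and maximum, and each value is snapped to the level of the bin it falls into.

import numpy as np

def quantize(values, n_bins, vmin=None, vmax=None):
    # Snap each value to one of n_bins evenly spaced levels in [vmin, vmax]
    values = np.asarray(values, dtype=float)
    vmin = values.min() if vmin is None else vmin
    vmax = values.max() if vmax is None else vmax
    levels = np.linspace(vmin, vmax, n_bins)            # output value of each bin
    edges = np.linspace(vmin, vmax, n_bins + 1)[1:-1]   # interior bin edges
    return levels[np.digitize(values, edges)]

numbers = np.random.randint(0, 65536, 100)
print(quantize(numbers, 2))  # every value becomes either 0.0 or 65535.0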

Related

Compare two coordinates represented as complex numbers if they are within (1,1) of each other

I have two arrays loaded with complex numbers that represent positions in Cartesian coordinates (x, y).
sensors= np.array([-1.6-0.8j,-1.1-0.8j])
cameras= np.array([-3.7-0.8j,-1.6+0.9j,-1.6-0.9j])
The real part represents X and the imaginary part represents Y. The values are in meters, so 1.5-0.5j means 1.5 meters in +X and 0.5 meters in -Y.
Using the isclose function has issues when the position of the sensors gets further from 0.0.
def close_to_sensors(sensors, observations):
    tolerance = 0.6
    observe_indices = np.zeros(observations.size, dtype=bool)
    for sensor in sensors:
        closeness = np.isclose(observations, np.ones(observations.size, dtype=np.complex128) * sensor, rtol=tolerance, atol=tolerance)
        observe_indices = np.logical_or(observe_indices, closeness)
        print("Closeness : ", closeness)
    return np.argwhere(observe_indices).flatten()
This returns
Closeness : [False False True]
Likely Close: [2]
The isclose function is the wrong function to use. I need to return the indices of the cameras that are within 1 meter of the sensors. What would be the best way to do this?
To calculate distances between complex numbers, subtracting them and taking the absolute value of the difference is a straightforward way to solve this problem:
import numpy as np
sensors = np.array([-1.6 - 0.8j, -1.1 - 0.8j])
cameras = np.array([-3.7 - 0.8j, -1.6 + 0.9j, -1.6 - 0.9j])
distance_limit = 1
# calculate difference of each sensor to each camera
# "None" is used to create a new axis, which enables broadcasting to a (sensors x cameras) matrix
complex_differences = sensors[:, None] - cameras
axis_sensor, axis_camera = (0,1)
distances = np.abs(complex_differences)
# check cameras which have any sensor within distance limit
within_range = distances < distance_limit
valid_cameras = np.any(within_range, axis=axis_sensor)
# show indices of valid cameras
print(np.where(valid_cameras)[0])
Thank you all for your responses, but those produced undesired results. I eventually decided to convert the complex-number arrays to [real, imag] lists, then load the sensors list into a KDTree and search the tree for observations that were close, where 1 = 1 meter. This provided the results I needed.
EDIT: Added code with data
import numpy as np
import scipy.spatial as spatial

def close_to_sensors(bifrost_sensors, observations):
    sensors_x_y = []
    observations_x_y = []
    for i in range(bifrost_sensors.size):
        sensors_x_y.append((bifrost_sensors[i].real, bifrost_sensors[i].imag))
    for i in range(observations.size):
        observations_x_y.append((observations[i].real, observations[i].imag))
    observe_indices = np.zeros(observations.size, dtype=bool)
    # KDTree the sensor list
    sensor_tree = spatial.cKDTree(np.c_[sensors_x_y])
    for i in range(len(observations_x_y)):
        closeness = sensor_tree.data[sensor_tree.query_ball_point(observations_x_y[i], 1)]
        if closeness.size == 0:
            observe_indices[i] = np.logical_or(observe_indices[i], 0)
        else:
            observe_indices[i] = np.logical_or(observe_indices[i], 1)
    # Find the indices of array elements that are non-zero, grouped by element.
    return np.argwhere(observe_indices).flatten()
#Excel copied data into arrays - 12 entries
sensors = np.array([-0.6-0.8j,-0.8-1.2j,-0.9-1.2j,-1.-0.9j,-1.1-1.j,1.1+1.j,-1.5-1.5j,-1.6-1.1j,-1.7-1.5j,1.1+1.j,1.8+0.8j,-2.-1.6j])
cameras = np.array([-4.03-1.1j,-4.15-1.14j,-1.5-1.16j,-4.05-1.14j,-4.05-1.14j,4.03+2.19j,-4.08-1.13j,-4.06-1.14j,-1.15-0.98j,3.21+1.92j,3.9+1.65j,-4.08-1.13j])
likely_bifrost = close_to_sensors(sensors, cameras)
print("Likely bifrost : ", likely_bifrost.size, " : ",likely_bifrost)

(Python) Equally Spacing Trues in a Series of Booleans

I am trying to make a custom PWM script to work with my Charlieplexed series of LEDs. However, I am struggling to make certain intensity values look smooth with no flashing. Different brightness values keep the LED on for a different number of ticks. In order to make it feel smooth, I need to optimize the spacing of the on and off ticks for the LEDs, but I can't quite figure out how to do so.
If you have a pattern of x booleans, n of which are true, how would you go about spacing out the trues as evenly as possible?
Here is an example of what I am looking for:
x = 10, n = 7
Desired result: 1,1,1,0,1,1,0,1,1,0
x = 10, n = 4
Desired result: 1,0,1,0,0,1,0,1,0,0
You can use the linspace function of numpy for this.
import numpy as np

x = 10
n = 7

# Gets the indices that should be true
def get_spaced_indices(array_length, num_trues):
    array = np.arange(array_length)  # Create an array of indices 0 to array_length-1
    indices = np.round(np.linspace(0, len(array) - 1, num_trues)).astype(int)  # Evenly space the indices
    values = array[indices]
    return values

true_indices = get_spaced_indices(x, n)
print("true_indices:", true_indices)

result = [0] * x  # Initialize your whole result to "false"
for i in range(x):
    if i in true_indices:  # Set evenly-spaced values to true based on indices from get_spaced_indices
        result[i] = 1
print("result:", result)

How do I force two arrays to be equal for use in pyplot?

I'm trying to plot a simple moving average of a data series, but the resulting array is a few numbers short of the full sample size. How do I plot such a line alongside a more standard line that extends for the full sample size? The code below results in this error message:
ValueError: x and y must have same first dimension, but have shapes (96,) and (100,)
This uses standard matplotlib.pyplot. I've tried deleting X values using remove and del, switching all arrays to numpy arrays (since that's the output format of my moving-average function), and adding an if condition to the append in the while loop, but none of these has worked.
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

vX = np.array([])
vY = np.array([])

x = 0
val = 0
while x < sampleSize:
    val += random.randint(min, max)
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1

plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
Expected results would be two lines on the same graph - one a simple moving average of the other.
Just change this line to the following:
smas = np.convolve(values, weights,'same')
The 'valid' option only convolves where the window completely overlaps the values array. What you want is 'same', which does what you are looking for.
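To make the length difference concrete, a small sketch (arbitrary values) comparing the two modes:

import numpy as np

values = np.arange(10, dtype=float)
weights = np.repeat(1.0, 5) / 5
print(np.convolve(values, weights, 'valid').shape)  # (6,)  - shorter than the input
print(np.convolve(values, weights, 'same').shape)   # (10,) - same length, edges computed as if zero-padded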
Edit: This, however, comes with its own issue: it acts as if there were extra data with value 0 wherever the window does not fully sit on top of the data. You can simply accept that, as this solution does, or pad the array with specific values of your choosing instead (see Mike Sperry's answer).
Here is how you would pad a numpy array out to the desired length with NaNs (replace np.nan with other values, or 'constant' with another mode, depending on the desired result):
https://docs.scipy.org/doc/numpy/reference/generated/numpy.pad.html
import numpy as np

bob = np.asarray([1, 2, 3], dtype=float)  # must be a float array to hold NaN
alice = np.pad(bob, (0, 100 - len(bob)), 'constant', constant_values=(np.nan, np.nan))
So in your code it would look something like this:
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    shorted = int((100 - len(smas)) / 2)
    print(shorted)
    smas = np.pad(smas, (shorted, shorted), 'constant', constant_values=(np.nan, np.nan))
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

vX = np.array([])
vY = np.array([])

x = 0
val = 0
while x < sampleSize:
    val += random.randint(min, max)
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1

plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
To answer your basic question, the key is to take a slice of the x-axis appropriate to the data of the moving average. Since you have a convolution of 100 data elements with a window of size 5, the result is valid for the last 96 elements. You would plot it like this:
plt.plot(vX[window - 1:], movingaverage(vY, window))
That being said, your code could stand to have some optimization done on it. For example, numpy arrays are stored in fixed size static buffers. Any time you do append or delete on them, the entire thing gets reallocated, unlike Python lists, which have amortization built in. It is always better to preallocate if you know the array size ahead of time (which you do).
Secondly, running an explicit loop is rarely necessary. You are generally better off using the under-the-hood loops implemented at the lowest level in the numpy functions instead. This is called vectorization. Random number generation, cumulative sums and incremental arrays are all fully vectorized in numpy. In a more general sense, it's usually not very effective to mix Python and numpy computational functions, including random.
Finally, you may want to consider a different convolution method. I would suggest something based on numpy.lib.stride_tricks.as_strided. This is a somewhat arcane, but very effective way to implement a sliding window with numpy arrays. I will show it here as an alternative to the convolution method you used, but feel free to ignore this part.
All in all:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    # this step creates a strided view into the same buffer
    values = np.lib.stride_tricks.as_strided(values, shape=(window, values.size - window + 1), strides=values.strides * 2)
    smas = values.sum(axis=0)
    smas = smas / window  # float division (in-place /= would fail here because the sums are integers)
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

v_x = np.arange(sampleSize)
v_y = np.cumsum(np.random.random_integers(min, max, sampleSize))

plt.plot(v_x, v_y)
plt.plot(v_x[window - 1:], movingaverage(v_y, window))
plt.show()
A note on names: in Python, variable and function names are conventionally name_with_underscore. CamelCase is reserved for class names. np.random.random_integers uses inclusive bounds just like random.randint, but allows you to specify the number of samples to generate. Confusingly, np.random.randint has an exclusive upper bound, more like random.randrange.
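As an aside, np.random.random_integers has since been deprecated in favor of the Generator API; a sketch of an equivalent call (assuming the same inclusive bounds are wanted):

import numpy as np

rng = np.random.default_rng()
# endpoint=True makes the upper bound inclusive, like random.randint
v_y = np.cumsum(rng.integers(-10, 10, size=100, endpoint=True))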

Nan values when using np.linspace() as input

I am trying to calculate the probability of transmission for an electron through a series of potential wells. When looping through energy values generated with np.linspace() I get nan for any value under 15. I understand this for values of 0 and 15, since they produce a zero in the denominator of the k and q values. If I simply call getT(5), for example, I get a real value. However, when getT(5) gets called from the loop over np.linspace(0,30,2001), it returns nan. Shouldn't it return either nan or a value in both cases?
import numpy as np
import matplotlib.pyplot as plt

def getT(Ein):
    # constants
    hbar = 1.055e-34  # J-s
    m = 9.109e-31  # mass of electron, kg
    N = 10  # number of cells
    a = 1e-10  # meters
    b = 2e-10  # meters
    # convert energy and potential to Joules
    conv_J = 1.602e-19
    E_eV = Ein
    V_eV = 15
    E = conv_J*E_eV
    V = conv_J*V_eV
    # calculate values for k and q
    k = (2*m*E/hbar**2)**.5
    q = (2*m*(E-V)/hbar**2)**.5
    # create M1, M2 matrices
    M1 = np.matrix([[((q+k)/(2*q))*np.exp(1j*k*b), ((q-k)/(2*q))*np.exp(-1j*k*b)],
                    [((q-k)/(2*q))*np.exp(1j*k*b), ((q+k)/(2*q))*np.exp(-1j*k*b)]])
    M2 = np.matrix([[((q+k)/(2*k))*np.exp(1j*q*a), ((k-q)/(2*k))*np.exp(-1j*q*a)],
                    [((k-q)/(2*k))*np.exp(1j*q*a), ((q+k)/(2*k))*np.exp(-1j*q*a)]])
    # calculate M_Cell
    M_Cell = M1*M2
    # calculate M for N cells
    M = M_Cell**N
    # get items in M_Cell
    M11 = M.item(0, 0)
    M12 = M.item(0, 1)
    M21 = M.item(1, 0)
    M22 = M.item(1, 1)
    # calculate r and t values
    r = -M21/M22
    t = M11 - M12*M21/M22
    # calculate final T value
    T = abs(t)**2
    return Ein, T

# create empty list for data to plot
data = []
# Calculate T for 2001 values of E between 0 and 30 eV
for i in np.linspace(0, 30, 2001):
    data.append(getT(i))
data = np.transpose(data)

# generate plot
fig, (ax1) = plt.subplots(1)
ax1.set_xlim([0, 30])
ax1.set_xlabel('Energy (eV)', fontsize=32)
ax1.set_ylabel('T', fontsize=32)
ax1.grid()
plt.tick_params(labelsize=32)
plt.plot(data[0], data[1], lw=6)
plt.draw()
plt.show()
I think the difference comes from the line
q=(2*m*(E-V)/hbar**2)**.5
When testing with single (plain Python) values between 0 and 15, you're taking the root of a negative number (because E-V is negative), which gives a complex result, for example:
(-2)**0.5
>> (8.659560562354934e-17+1.4142135623730951j)
But when using np.linspace, Ein is a NumPy float, so you take the root of a negative NumPy value, which results in nan (and a warning):
np.array(-2)**0.5
>> RuntimeWarning: invalid value encountered in power
>> nan

Normal distributed sub-sampling from a numpy array in python

I have a numpy array whose values are distributed as shown in the attached histogram.
From this array I need to draw a random sub-sample which is normally distributed.
I need to get rid of the values which lie above the red line in the picture, i.e. I need to remove some occurrences of certain values so that the distribution is smoothed once the abrupt peaks are removed.
My array's distribution should then look like the second picture.
Can this be achieved in Python without manually looking for the entries corresponding to the peaks and removing some occurrences of them? Is there a simpler way to do this?
The following kind of works; it is rather aggressive, though:
It works by ordering the samples, transforming them to uniform via the CDF, and then trying to select a regular grid-like subsample. If you feel it is too aggressive, you can increase ns, which is essentially the number of samples kept.
Also, please note that it requires knowledge of the true distribution. In the case of a normal distribution you should be fine using the sample mean and the unbiased variance estimate (the one with n-1).
Code (without plotting):
import scipy.stats as ss
import numpy as np

# make a test sample: normal data with a wiggly perturbation on ~40% of the points
a = ss.norm.rvs(size=1000)
b = ss.uniform.rvs(size=1000) < 0.4
a[b] += 0.1 * np.sin(10 * a[b])

def smooth(a, gran=25):
    o = np.argsort(a)
    s = ss.norm.cdf(a[o])
    ns = int(gran / np.max(s[gran:] - s[:-gran]))
    grid, dp = np.linspace(0, 1, ns, endpoint=False, retstep=True)
    grid += dp / 2
    idx = np.searchsorted(s, grid)
    c = np.flatnonzero(idx[1:] <= idx[:-1])
    while c.size > 0:
        idx[c+1] = idx[c] + 1
        c = np.flatnonzero(idx[1:] <= idx[:-1])
    idx = idx[:np.searchsorted(idx, len(a))]
    return o[idx]

ap = a[smooth(a)]
c, b = np.histogram(a, 40)
cp, _ = np.histogram(ap, b)
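For completeness, a minimal plotting sketch (assuming matplotlib) that compares the original and sub-sampled histograms using the c, b and cp arrays computed above:

import matplotlib.pyplot as plt

centers = (b[:-1] + b[1:]) / 2  # bin centers from the shared edges
plt.plot(centers, c, label='original')
plt.plot(centers, cp, label='sub-sampled')
plt.legend()
plt.show()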
