Problem
Assume you have a sorted np.array like this:
first_array = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9])
and you would like to create a new np.array of the same size, but shuffled.
E.g.
second_array = np.random.permutation(first_array)  # note: np.random.shuffle shuffles in place and returns None
second_array
# np.array([3, 2, 9, 5, 6, 1, 1, 6, 9, 7, 8, 5, 2, 7, 8, 3, 4, 4])
So far, so good. However, random shuffling leads to some duplicates being very close to each other, which is something I'm trying to avoid.
Question
How do I shuffle this array so that the integer order is pseudo-random, but each duplicate has a high probability of being far away from its twin? Is there a more specific term for this problem?
This is more of an algorithmic problem than a NumPy one. A naive approach is to split the array into as many subarrays as the minimum target spacing (spacing_condition), i.e. the number of positions that duplicates should stay apart.
import numpy as np
first_array = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9])
spacing_condition = 3
subarrays = np.split(first_array, spacing_condition)
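For the example array, this yields three subarrays of six elements each, so each duplicate pair lands in the same subarray:
# [array([1, 1, 2, 2, 3, 3]), array([4, 4, 5, 5, 6, 6]), array([7, 7, 8, 8, 9, 9])]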
The next step is to choose one element from each subarray in turn, which guarantees the spacing condition, and to remove each choice from its subarray as we go.
However, for these last two steps, naive loops will be slow for large arrays. A naive implementation follows; the seed is there only for reproducibility.
np.random.seed(42)
def choose_over_spacing(subarrays):
    choices = []
    new_subarrays_ = []
    subarray_indices = np.arange(len(subarrays[0]))
    for subarray in subarrays:
        index_to_choose = np.random.choice(subarray_indices, 1)[0]
        number_choice = subarray[index_to_choose]
        choices.append(number_choice)
        new_subarray = np.delete(subarray, index_to_choose)
        new_subarrays_.append(new_subarray)
    return choices, new_subarrays_

all_choices = []
for _ in np.arange(len(subarrays[0])):
    choices, subarrays = choose_over_spacing(subarrays)
    all_choices = all_choices + choices
Inspecting the result, we see that duplicated numbers are guaranteed to be at least 3 positions apart, as conditioned by spacing_condition; one could choose a different spacing condition as long as the initial split works.
[2, 6, 8, 3, 6, 7, 2, 5, 9, 3, 4, 9, 1, 4, 8, 1, 5, 7]
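For larger arrays, the same split-and-interleave idea can be vectorized. Below is a minimal sketch (my own variation, not from the original post), assuming spacing_condition divides the array length: it shuffles each subarray independently, then interleaves them column-wise, which preserves the spacing guarantee.

import numpy as np

rng = np.random.default_rng(42)
first_array = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9])
spacing_condition = 3

blocks = first_array.reshape(spacing_condition, -1)  # one row per subarray
shuffled = rng.permuted(blocks, axis=1)              # shuffle within each row (NumPy >= 1.20)
result = shuffled.T.ravel()                          # interleave rows: duplicates stay >= 3 apart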
Related
I would like to solve the following problem; the problem statement itself is not my issue, I only give it as context.
"You are given an integer array height of length n. There are n vertical lines drawn such that the two endpoints of the ith line are (i, 0) and (i, height[i]).
Find two lines that together with the x-axis form a container, such that the container contains the most water.
Return the maximum amount of water a container can store."
The vertical lines in that example are represented by the array [1,8,6,2,5,4,8,3,7]; in this case, the maximum area of water the container can contain is 49.
I made a simple nested for loop to solve this problem:
maxim = 0
for i in range(0, len(height)):
    for j in range(0, len(height)):
        maxim = max(min(height[i], height[j]) * abs(j - i), maxim)
But this solution takes too long for a bigger array, so I tried to do it with a list comprehension:
mxm = [min(height[i], height[j] * abs(j - i)) for i in range(0, len(height)) for j in range(0, len(height))]
maxim = max(mxm)
The problem is, I get two different outputs: the nested for loop works (it returns 49) but the second one returns 8. (The mxm list has these elements: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 6, 4, 8, 8, 8, 8, 8, 2, 6, 0, 2, 6, 6, 6, 6, 6, 2, 2, 2, 0, 2, 2, 2, 2, 2, 4, 5, 5, 2, 0, 4, 5, 5, 5, 4, 4, 4, 4, 4, 0, 4, 4, 4, 6, 8, 8, 6, 8, 4, 0, 3, 8, 3, 3, 3, 3, 3, 3, 3, 0, 3, 7, 7, 7, 7, 7, 7, 7, 3, 0])
Why are they different? And how can I make my solution faster?
In the first example you're applying the min function to just the height values
min(height[i], height[j])
In the second you include the absolute distance between the index positions inside that min call as well, so the multiplication always applies to height[j] alone instead of to the actual minimum.
min(height[i], height[j] * abs(j - i))
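For completeness, moving the multiplication outside the min call makes the comprehension agree with the loop (a small sketch):

mxm = [min(height[i], height[j]) * abs(j - i)
       for i in range(len(height)) for j in range(len(height))]
maxim = max(mxm)  # 49 for height = [1, 8, 6, 2, 5, 4, 8, 3, 7]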
Also, regarding making your solution faster: I believe I've seen this problem before. I think what you're looking for is the two-pointer technique (a form of sliding window), which solves it in O(n).
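A minimal sketch of that two-pointer approach (my illustration, not the original poster's code):

def max_area(height):
    # The area is limited by the shorter line, so moving the taller
    # pointer inward can never help; always advance the shorter one.
    left, right = 0, len(height) - 1
    best = 0
    while left < right:
        best = max(best, min(height[left], height[right]) * (right - left))
        if height[left] < height[right]:
            left += 1
        else:
            right -= 1
    return best

print(max_area([1, 8, 6, 2, 5, 4, 8, 3, 7]))  # 49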
My aim is to transform a one-dimensional time-series into a two-dimensional phase space. Since the time-series is one-dimensional, the phase space will be a pseudo (lag) phase space.
One theoretical approach to transform a time-series into pseudo phase space is as follows:
The original list of data is the full-length time-series x(t).
A subseries of data is the "lagged" version of the original time-series, starting with the second value of the time-series (instead of with its first one, as the original time-series) x(t+1).
Consequently, the subseries will always have one value less in its list. For a 3D phase space, a second subseries would have two values less in its list.
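To make this concrete, a minimal sketch of those subseries (my illustration; note the unequal lengths, which is exactly the issue described next):

x_t  = Data        # x(t): the full series, length n
x_t1 = Data[1:]    # x(t+1): lagged once, length n-1
x_t2 = Data[2:]    # x(t+2): lagged twice, length n-2 (for a 3D phase space)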
This is where my code-related problem comes in, since matplotlib does not allow me to plot a two-dimensional plane when the lengths of the two lists are not equal.
Here is my current code:
import numpy as np
import matplotlib.pyplot as plt
# Example time-series
Data = [924, -5, 24, 1, 0, 242, -5, 42, 5, 1, -9, 50, 3, 432, 0, -5, 4, 1, 2, 3]
# Embedding (time-series to phase space)
x_list = Data[:-1]
y_list = Data[1:]
# Plot
plt.plot(x_list, y_list, c="blue", linewidth=0.5)
plt.show()
This code uses the whole length of the time-series except the last value in the list by x_list = Data[:-1] for the x-axis. For the y-axis, the code uses the whole time-series except the very first item in the list by Data[1:].
While this code works, its result is not a real embedding of a time-series into its two-dimensional phase space, since x_list = Data[:-1] does not include the last value of the time-series.
What would be a proper way for coding and plotting the phase space of subseries that increasingly diminish in length compared to the original time-series data?
A simple approach is to use pandas and its shift method:
Data = [924, -5, 24, 1, 0, 242, -5, 42, 5, 1, -9, 50, 3, 432, 0, -5, 4, 1, 2, 3]
import pandas as pd
import matplotlib.pyplot as plt
timeseries = pd.Series(Data)
plt.plot(timeseries, timeseries.shift(), c='blue', linewidth=0.5)
For a lag of 2 use shift(2)
The output is a plot of the series against its one-step lagged copy (the 2D phase portrait).
NB. you can also shift with numpy, but it is less elegant IMO
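For instance, a minimal numpy version (my sketch, equivalent to the slicing already used in the question):

import numpy as np
data = np.asarray(Data)
lag = 1
plt.plot(data[:-lag], data[lag:], c='blue', linewidth=0.5)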
autocorrelation
I am not sure what your end goal is, but if you are trying to determine whether the series has a period, or to perform an autocorrelation analysis, you can use pandas.plotting.autocorrelation_plot:
pd.plotting.autocorrelation_plot(timeseries)
The output is the autocorrelation plot of the series.
For a wrap-around solution you could use a list comprehension to shift the data:
Data = list(range(10))
d = 5
multi = [Data[dims:]+Data[:dims] for dims in range(d) ]
print(*multi, sep="\n")
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
[2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
[4, 5, 6, 7, 8, 9, 0, 1, 2, 3]
If you do not want to wrap around, fix it like so:
d = 5
multi = [(Data[dims:]+Data[:dims])[:-d+1] for dims in range(d)]
to get
[0, 1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6]
[2, 3, 4, 5, 6, 7]
[3, 4, 5, 6, 7, 8]
[4, 5, 6, 7, 8, 9]
If you want a hypothetical last value you would have to do some extrapolation of the series you have, if that makes more sense than cutting it short.
Let's say I have a Python NumPy array a.
a = numpy.array([1,2,3,4,5,6,7,8,9,10,11])
I want to create a matrix of subsequences from this array, of length 5 with stride 3. The resulting matrix will hence look as follows:
numpy.array([[1,2,3,4,5],[4,5,6,7,8],[7,8,9,10,11]])
One possible way of implementing this would be using a for-loop.
result_matrix = np.zeros((3, 5))
for i in range(0, len(a) - 4, 3):
    result_matrix[i // 3] = a[i:i + 5]
Is there a cleaner way to implement this in Numpy?
Approach #1 : Using broadcasting -
def broadcasting_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    return a[S * np.arange(nrows)[:, None] + np.arange(L)]
Approach #2 : Using more efficient NumPy strides -
def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))
Sample run -
In [143]: a
Out[143]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [144]: broadcasting_app(a, L = 5, S = 3)
Out[144]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
In [145]: strided_app(a, L = 5, S = 3)
Out[145]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
Starting in Numpy 1.20, we can make use of the new sliding_window_view to slide/roll over windows of elements.
And coupled with a stepping [::3], it simply becomes:
from numpy.lib.stride_tricks import sliding_window_view
# values = np.array([1,2,3,4,5,6,7,8,9,10,11])
sliding_window_view(values, window_shape = 5)[::3]
# array([[ 1, 2, 3, 4, 5],
# [ 4, 5, 6, 7, 8],
# [ 7, 8, 9, 10, 11]])
where the intermediate result of the sliding is:
sliding_window_view(values, window_shape = 5)
# array([[ 1, 2, 3, 4, 5],
# [ 2, 3, 4, 5, 6],
# [ 3, 4, 5, 6, 7],
# [ 4, 5, 6, 7, 8],
# [ 5, 6, 7, 8, 9],
# [ 6, 7, 8, 9, 10],
# [ 7, 8, 9, 10, 11]])
Modified version of @Divakar's code, with checking to ensure that memory is contiguous and that the returned array cannot be modified. (Variable names changed for my DSP application.)
def frame(a, framelen, frameadv):
    """frame - Frame a 1D array

    a - 1D array
    framelen - Samples per frame
    frameadv - Samples between starts of consecutive frames
        Set to framelen for non-overlapping consecutive frames

    Modified from Divakar's 10/17/16 11:20 solution:
    https://stackoverflow.com/questions/40084931/taking-subarrays-from-numpy-array-with-given-stride-stepsize

    CAVEATS:
    Assumes array is contiguous
    Output is not writable as there are multiple views on the same memory
    """
    if not isinstance(a, np.ndarray) or \
            not (a.flags['C_CONTIGUOUS'] or a.flags['F_CONTIGUOUS']):
        raise ValueError("Input array a must be a contiguous numpy array")

    # Output shape
    nrows = ((a.size - framelen) // frameadv) + 1
    oshape = (nrows, framelen)
    # Size of each element in a
    n = a.strides[0]
    # Indexing in the new object will advance by frameadv * element size
    ostrides = (frameadv * n, n)
    return np.lib.stride_tricks.as_strided(a, shape=oshape,
                                           strides=ostrides, writeable=False)
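A quick usage check (my own example, reusing the array from the question):

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
frame(a, framelen=5, frameadv=3)
# array([[ 1,  2,  3,  4,  5],
#        [ 4,  5,  6,  7,  8],
#        [ 7,  8,  9, 10, 11]])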
I need a function that decimates a numpy array, i.e. removes m in n of its elements, for example 1 in 2 or 2 in 3. So an array which is:
[7, 4, 3, 5, 9, 2, 4, 1, 6, 8]
decimated by 1:2 would become:
[7, 3, 9, 4, 6]
I wonder if it is possible to reshape the 1D array of length N into a 2D array of shape (N/2, 2) and then drop the extra dimension?
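That reshape idea does work when N is even; a minimal sketch (my illustration):

a = np.array([7, 4, 3, 5, 9, 2, 4, 1, 6, 8])
decimated = a.reshape(-1, 2)[:, 0]  # keep the first value of every pair
# array([7, 3, 9, 4, 6])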
Ideally, rather than just dumping the decimated samples, I would like to find the maximum value across each set (in this example, pair) of values. For example:
[7, 5, 9, 4, 8]
Is there a way to find the maximum value across each set rather than just dropping it?
The added challenge is that the point here is to plot the values.
The decimation is required because plotting every value takes too long, meaning that I have to reduce the size of the array before plotting it, but I need to do this quickly, so for or while loops would take too long.
A quick and dirty way is
k,N = 3,18
a = np.random.randint(0,10,N) #[9, 6, 6, 6, 8, 4, 1, 4, 8, 1, 2, 6, 1, 8, 9, 8, 2, 8]
a = a[:-k:k] #[9, 6, 1, 1, 1]
This should work regardless of k dividing into N or not.
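Note that the plain slice a[::k] is the more common spelling; unlike a[:-k:k], it also keeps a sample from the final, possibly partial group:

a[::k]  # [9, 6, 1, 1, 1, 8] for the same example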
It is worth being wary of simply throwing out readings, because significant readings can be lost. For the task you described, it is worth using proper decimation; unfortunately it is not in numpy, but it is in scipy.
In the code below, I give an example where discarding samples leads to an error. The original data ('data') has a peak, and manual thinning ('manual decimate') can simply skip it, whereas decimation from the library ('scipy decimate') keeps the peak in the result.
from scipy import signal
import matplotlib.pyplot as plt
import numpy as np
downsampling_factor = 2
t = np.linspace(0, 1, 50)
y = list(np.random.randint(0,10,int(len(t)/2))) + [50] + list(np.random.randint(0,10,int(len(t)/2-1)))
ydem = signal.decimate(y, downsampling_factor)
t_new = np.linspace(0, 1, len(ydem))
manual_decimation = y[:-downsampling_factor:downsampling_factor]
t_manual_decimation = np.linspace(0, 1, len(manual_decimation))
plt.plot(t, y, '.-', t_new, ydem, 'o-', t_manual_decimation, manual_decimation, 'x-')
plt.legend(['data', 'scipy decimate', 'manual decimate'], loc='best')
plt.show()
In general, this is not such a trivial task, please be careful.
Update: note that the length of the vector must be greater than 27 (the default order-8 IIR filter in scipy.signal.decimate pads the signal with 27 samples).
To find the maximum:
1) k divides N:
k,N = 3,18
a = np.random.randint(0,10,N)
a
# array([0, 6, 6, 3, 7, 0, 9, 2, 3, 2, 5, 4, 2, 6, 9, 6, 3, 2])
a.reshape(-1,k).max(1)
# array([6, 7, 9, 5, 9, 6])
2) k does not divide N:
k,N = 4,21
a = np.random.randint(0,10,N)
a
# array([4, 4, 6, 0, 0, 1, 7, 8, 2, 3, 0, 5, 7, 1, 1, 5, 7, 8, 3, 1, 7])
np.maximum.reduceat(a, np.arange(0,N,k))
# array([6, 8, 5, 7, 8, 7])
2) should always work, but I suspect 1) is faster where applicable.
I am struggling with a pretty easy thing but unfortunately I cannot solve it. I have a 64x64 matrix (shown as an image in the original post) where the red cells are zeros and the green cells are the values I am interested in.
I would like to end up with only the lower triangular part under the diagonal (the green values) in one array.
I use Python 2.7
Thank you a lot,
Michael
Assuming you can pull your data into a numpy array, use the tril_indices function. It looks like your data doesn't include the main diagonal, so you can shift by -1:
data = np.arange(4096).reshape(64, 64)
inds = np.tril_indices(64, -1)
vals = data[inds]
You can use np.tril_indices, which returns the indices of the lower-triangular part of a matrix with a given shape; the indices can then be used to extract values from the matrix. Suppose your matrix is called arr:
arr[np.tril_indices(n=64,m=64)]
You can provide an extra offset parameter if you want to exclude the diagonal:
arr[np.tril_indices(n = 64, m = 64, k = -1)]
An example:
arr = np.array([list(range(i, 5+i)) for i in range(5)])
arr
#array([[0, 1, 2, 3, 4],
# [1, 2, 3, 4, 5],
# [2, 3, 4, 5, 6],
# [3, 4, 5, 6, 7],
# [4, 5, 6, 7, 8]])
arr[np.tril_indices(n = 5, m = 5)]
# array([0, 1, 2, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 8])
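And with the diagonal excluded (k = -1), only the strictly lower part remains:

arr[np.tril_indices(n = 5, m = 5, k = -1)]
# array([1, 2, 3, 3, 4, 5, 4, 5, 6, 7])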
About twice as fast as the indexing approaches above on this example (n is the number of rows; the diagonal is excluded):
n = arr.shape[0]
np.concatenate([arr[i, :i] for i in range(1, n)])