I want to implement a rolling concatenation function for a numpy array of arrays. For example, if my numpy array is the following:
[[1.0]
[1.5]
[1.6]
[1.8]
...
...
[1.2]
[1.3]
[1.5]]
then, for a window size of 3, my function should return:
[[1.0]
[1.0 1.5]
[1.0 1.5 1.6]
[1.5 1.6 1.8]
...
...
[1.2 1.3 1.5]]
The input array could have elements of different shapes as well. For example, if input is:
[[1.0]
[1.5]
[1.6 1.7]
[1.8]
...
...
[1.2]
[1.3]
[1.5]]
then output should be:
[[1.0]
[1.0 1.5]
[1.0 1.5 1.6 1.7]
[1.5 1.6 1.7 1.8]
...
...
[1.2 1.3 1.5]]
First, make your array into a list. There's no benefit to keeping an array of arrays (an object array) in numpy.
l = arr.tolist()  # l is a list of arrays
Now use a list comprehension to take each trailing window, and concatenate its elements with np.r_ (n is the window size):
l2 = [np.r_[tuple(l[max(i - n, 0):i])] for i in range(1, len(l) + 1)]
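For example, with the second (ragged) input above, built here as a hypothetical object array since the question doesn't show its construction:
import numpy as np

# hypothetical reconstruction of the ragged input as an object array
arr = np.empty(4, dtype=object)
arr[:] = [np.array([1.0]), np.array([1.5]), np.array([1.6, 1.7]), np.array([1.8])]

n = 3  # window size
l = arr.tolist()
l2 = [np.r_[tuple(l[max(i - n, 0):i])] for i in range(1, len(l) + 1)]
for row in l2:
    print(row)
Output:
[1.]
[1.  1.5]
[1.  1.5 1.6 1.7]
[1.5 1.6 1.7 1.8]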
I would like to remove NaN elements from a pair of numpy arrays of different dimensions using Python: one array with shape (8, 3) and the other with shape (8,). If at least one NaN appears in a row (including the corresponding entry of the second array), the entire row needs to be removed. However, I ran into issues because the two arrays have different dimensions.
For example,
[1.7 2.3 3.4] 4.2
[2.3 3.4 4.2] 4.6
[3.4 nan 4.6] 4.8
[4.2 4.6 4.8] 4.6
[4.6 4.8 4.6] nan
[4.8 4.6 nan] nan
[4.6 nan nan] nan
[nan nan nan] nan
I want it to become
[1.7 2.3 3.4] 4.2
[2.3 3.4 4.2] 4.6
[4.2 4.6 4.8] 4.6
This is my code, which generates the sequence data:
from numpy import array

def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
n_steps = 3
sequence = df_sensorRefill['sensor'].to_list()
X, y = split_sequence(sequence, n_steps)
Thanks
You could use np.isnan() and np.any() to find rows containing NaNs, and np.delete() to remove such rows.
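A minimal sketch of that approach on the example data (X and y here are hypothetical names for your two arrays):
import numpy as np

X = np.array([[1.7, 2.3, 3.4],
              [2.3, 3.4, 4.2],
              [3.4, np.nan, 4.6],
              [4.2, 4.6, 4.8],
              [4.6, 4.8, 4.6],
              [4.8, 4.6, np.nan],
              [4.6, np.nan, np.nan],
              [np.nan, np.nan, np.nan]])
y = np.array([4.2, 4.6, 4.8, 4.6, np.nan, np.nan, np.nan, np.nan])

# a row is bad if X has any NaN in it or the matching y entry is NaN
bad = np.isnan(X).any(axis=1) | np.isnan(y)
X_clean = np.delete(X, np.where(bad)[0], axis=0)
y_clean = np.delete(y, np.where(bad)[0])
print(X_clean)  # rows 0, 1 and 3 survive
print(y_clean)  # [4.2 4.6 4.6]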
I am trying to find "missing" values in a Python array of floats.
For example, given [1.1, 1.3, 2.1, 2.2, 2.3], I would like to print "1.2".
I don't have much experience with floats. I have tried something like How to find a missing number from a list?, but it doesn't work on floats.
Thanks!
To solve this, the problem needs to be simplified first. I am assuming that all the values are floats with one decimal place, that there can be multiple ranges like 1.1-1.3 and 2.1-2.3, and that the numbers are in sorted order. Here is a solution; it is written in Python 3, by the way.
vals = [1.1, 1.3, 2.1, 2.2, 2.3]  # the values in which to find the missing number

# The logic starts from here
for i in range(len(vals) - 1):
    # work in tenths and round, so float representation noise can't break the comparison
    if round(vals[i + 1] * 10) - round(vals[i] * 10) == 2:
        print(round(vals[i] * 10 + 1) / 10)

print("\nfinished")
You might want to use https://numpy.org/doc/stable/reference/generated/numpy.arange.html and create a list of floats (if you know the start, end, and step values).
Then you can create two sets and use their difference to find the missing values.
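A quick sketch of that idea, assuming you know the expected start, end, and step (0.1 here). Note that this reports every grid point absent from the list, not just single-step gaps:
import numpy as np

vals = [1.1, 1.3, 2.1, 2.2, 2.3]
# the full expected grid; rounding tames float noise from arange
full = np.arange(1.1, 2.4, 0.1).round(1).tolist()
missing = sorted(set(full) - set(vals))
print(missing)  # [1.2, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]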
Simplest yet dumb way:
Split each float into its integer and decimal parts.
Create the Cartesian product of both to generate the full array.
Use set XOR to find the missing ones.
from itertools import product

source = [1.1, 1.3, 2.1, 2.2, 2.3]
# split each number into its integer and decimal parts, as strings
separated = [str(n).split(".") for n in source]
integers, decimals = map(set, zip(*separated))
# rebuild every integer/decimal combination
products = [float(f"{i}.{d}") for i, d in product(integers, decimals)]
# the symmetric difference is exactly the values missing from source
print(*(set(products) ^ set(source)))
output:
1.2
I guess that the solutions to the question you quote probably work in your case; you just need to replace the built-in range function with numpy.arange, which lets you create a range of floats.
It goes something like this (just a simple example):
import numpy as np

np_range = np.arange(1, 2, 0.1)
float_list = [1.2, 1.3, 1.4, 1.6]
for i in np_range:
    if round(i, 1) not in float_list:
        print(round(i, 1))
output:
1.0
1.1
1.5
1.7
1.8
1.9
This is an absolutely AWFUL way to do this, but depending on how many numbers you have in the list and how difficult the other solutions are, you might appreciate it.
If you write
firstvalue = 1.1
secondvalue = 1.2
thirdvalue = 1.3
# assign these for every value you are keeping track of

if firstvalue in val:  # (or whatever you named your list)
    print("1.1 is in the list")
else:
    print("1.1 is missing!")

if secondvalue in val:
    print("1.2 is in the list")
else:
    print("1.2 is missing!")

# etc. for every value in the list. It's tedious and dumb, but if you have
# few enough values in your list it might be your simplest option
With numpy
import numpy as np

arr = [1.1, 1.3, 2.1, 2.2, 2.3]
find_gaps = np.array(arr).round(1)
# a rounded difference of 0.2 between neighbours means exactly one tenth is missing
find_gaps[np.r_[np.diff(find_gaps).round(1), False] == 0.2] + 0.1
Output
array([1.2])
Test with random data
import numpy as np
np.random.seed(10)
arr = np.arange(0.1, 10.4, 0.1)
mask = np.random.randint(0, 2, len(arr)).astype(bool)
gaps = arr[mask]
print(gaps)
find_gaps = np.array(gaps).round(1)
print('missing values:')
print(find_gaps[np.r_[np.diff(find_gaps).round(1), False] == 0.2] + 0.1)
Output
[ 0.1 0.2 0.4 0.6 0.7 0.9 1. 1.2 1.3 1.6 2.2 2.5 2.6 2.9
3.2 3.6 3.7 3.9 4. 4.1 4.2 4.3 4.5 5. 5.2 5.3 5.4 5.6
5.8 5.9 6.1 6.4 6.8 6.9 7.3 7.5 7.6 7.8 7.9 8.1 8.7 8.9
9.7 9.8 10. 10.1]
missing values:
[0.3 0.5 0.8 1.1 3.8 4.4 5.1 5.5 5.7 6. 7.4 7.7 8. 8.8 9.9]
More general solution
Find all missing values with a specific gap size:
import numpy as np

def find_missing(find_gaps, gaps=1):
    find_gaps = np.array(find_gaps)
    gaps_diff = np.r_[np.diff(find_gaps).round(1), False]
    gaps_index = find_gaps[(gaps_diff >= 0.2) & (gaps_diff <= round(0.1 * (gaps + 1), 1))]
    gaps_values = np.searchsorted(find_gaps, gaps_index)
    ranges = np.vstack([(find_gaps[gaps_values] + 0.1).round(1), find_gaps[gaps_values + 1]]).T
    return np.concatenate([np.arange(start, end, 0.1001) for start, end in ranges]).round(1)
vals = [0.1, 0.3, 0.6, 0.7, 1.1, 1.5, 1.8, 2.1]
print('Vals:', vals)
print('gap=1', find_missing(vals, gaps = 1))
print('gap=2', find_missing(vals, gaps = 2))
print('gap=3', find_missing(vals, gaps = 3))
Output
Vals: [0.1, 0.3, 0.6, 0.7, 1.1, 1.5, 1.8, 2.1]
gap=1 [0.2]
gap=2 [0.2 0.4 0.5 1.6 1.7 1.9 2. ]
gap=3 [0.2 0.4 0.5 0.8 0.9 1. 1.2 1.3 1.4 1.6 1.7 1.9 2. ]
I am working with a numpy ndarray for preprocessing data for a neural network. It basically contains several fixed-length arrays of sensor data. For example:
>>> type(arr)
<class 'numpy.ndarray'>
>>> arr.shape
(400,1,5,4)
>>> arr
[
[[ 9.4 -3.7 -5.2 3.8]
[ 2.8 1.4 -1.7 3.4]
[ 0.0 0.0 0.0 0.0]
[ 0.0 0.0 0.0 0.0]
[ 0.0 0.0 0.0 0.0]]
..
[[ 0.0 -1.0 2.1 0.0]
[ 3.0 2.8 -3.0 8.2]
[ 7.5 1.7 -3.8 2.6]
[ 0.0 0.0 0.0 0.0]
[ 0.0 0.0 0.0 0.0]]
]
Each nested array has shape (1, 5, 4). The goal is to run through arr and select only those nested arrays whose first three rows are each non-zero (a single entry can be zero, but not a whole row).
So in the example above, the first nested array should be deleted, because only its first 2 rows are non-zero, whereas we need 3 or more.
Here's a trick you can use:
mask = arr[:,:,:3].any(axis=3).all(axis=2)
arr_filtered = arr[mask]
Quick explanation: to keep a nested array, each of its first 3 rows (hence we only need to look at arr[:,:,:3]) must have at least one non-zero entry (hence .any(axis=3)), and that must hold for all three rows (hence .all(axis=2) at the end).
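A quick sanity check on a tiny hypothetical array:
import numpy as np

arr = np.zeros((2, 1, 5, 4))
arr[0, 0, :2] = 1.0  # only the first 2 rows non-zero -> should be dropped
arr[1, 0, :3] = 1.0  # first 3 rows non-zero -> should be kept

mask = arr[:, :, :3].any(axis=3).all(axis=2)  # shape (2, 1)
arr_filtered = arr[mask]
print(arr_filtered.shape)  # (1, 5, 4)
Note that indexing with the (N, 1) boolean mask collapses the singleton axis; use arr[mask[:, 0]] instead if you want to keep the (k, 1, 5, 4) shape.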
The question of how to compute a running mean of a series of numbers has been asked and answered before. However, I am trying to compute the running mean of a series of ndarrays, with an unknown length of series. So, for example, I have an iterator data where I would do:
running_mean = np.zeros((1000, 3))
while True:
    datum = next(data)
    running_mean = calc_running_mean(datum)
What would calc_running_mean look like? My primary concern here is memory, as I can't have the entirety of the data in memory, and I don't know how much data I will be receiving. datum would be an ndarray, let's say that for this example it's a (1000,3) array, and the running mean would be an array of the same size, with each element containing the elementwise mean of every element we've seen in that position so far.
The key distinction this question has from previous questions is that it's calculating the elementwise mean of a series of ndarrays, and the number of arrays is unknown.
You can use itertools together with standard operators:
>>> import itertools, operator
>>> running_sum = itertools.accumulate(data)
>>> running_mean = map(operator.truediv, running_sum, itertools.count(1))
Example:
>>> data = (np.linspace(-i, i*i, 6) for i in range(10))
>>>
>>> running_sum = itertools.accumulate(data)
>>> running_mean = map(operator.truediv, running_sum, itertools.count(1))
>>>
>>> for i in running_mean:
... print(i)
...
[0. 0. 0. 0. 0. 0.]
[-0.5 -0.3 -0.1 0.1 0.3 0.5]
[-1. -0.46666667 0.06666667 0.6 1.13333333 1.66666667]
[-1.5 -0.5 0.5 1.5 2.5 3.5]
[-2. -0.4 1.2 2.8 4.4 6. ]
[-2.5 -0.16666667 2.16666667 4.5 6.83333333 9.16666667]
[-3. 0.2 3.4 6.6 9.8 13. ]
[-3.5 0.7 4.9 9.1 13.3 17.5]
[-4. 1.33333333 6.66666667 12. 17.33333333 22.66666667]
[-4.5 2.1 8.7 15.3 21.9 28.5]
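If you'd rather have an explicit calc_running_mean as in your skeleton, here is a minimal constant-memory sketch using the standard incremental-mean update (the generator is just a hypothetical stand-in for your iterator):
import numpy as np

data = (np.random.rand(1000, 3) for _ in range(100))  # hypothetical stand-in iterator

def calc_running_mean(mean, datum, n):
    # incremental update: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
    return mean + (datum - mean) / n

running_mean = np.zeros((1000, 3))
n = 0
for datum in data:
    n += 1
    running_mean = calc_running_mean(running_mean, datum, n)
Only the current mean and a single datum are ever held in memory.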
I have a numpy array (a):
array([[ 1. , 5.1, 3.5, 1.4, 0.2],
[ 1. , 4.9, 3. , 1.4, 0.2],
[ 2. , 4.7, 3.2, 1.3, 0.2],
[ 2. , 4.6, 3.1, 1.5, 0.2]])
I would like to make a pandas DataFrame with values=a, columns A, B, C, D, and the index set to the first column of my numpy array, so that it finally looks like this:
A B C D
1 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
2 4.6 3.1 1.5 0.2
I am trying this:
df = pd.DataFrame(a, index=a[:,0], columns=['A', 'B','C','D'])
and I get the following error:
ValueError: Shape of passed values is (5, 4), indices imply (4, 4)
Any help?
Thanks
You passed the complete array as the data param; you need to slice your array if you want just 4 of its columns as the data:
In [158]:
df = pd.DataFrame(a[:,1:], index=a[:,0], columns=['A', 'B','C','D'])
df
Out[158]:
A B C D
1 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
2 4.6 3.1 1.5 0.2
Also, having duplicate values in the index will make filtering/indexing problematic.
So with a[:, 1:] I take all the rows but only the columns from index 1 onwards, as desired; see the docs.
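For example, just to illustrate the duplicate-index caveat, label-based selection now returns every row with that label:
df.loc[1.0]
# returns both rows labelled 1.0:
#        A    B    C    D
# 1.0  5.1  3.5  1.4  0.2
# 1.0  4.9  3.0  1.4  0.2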