Create a matrix with solution arrays in it - python

I need to create a matrix in Python containing a list of unknown arrays r that have this form:
r_i=[r1,r2,r3,r4,th_12,th_13]
I am running for loops with a couple of if conditions, which output a number of r_i arrays that I don't know in advance.
I am looking for a function like append, which I normally use to build a vector of all the solutions I generate. This time, however, each solution is not a single value but an array of 6 values, and I am not able to generate what I want.
I need to create a matrix in which every row r_i has the form written above.
EDIT: I would like to generate a NumPy array (R_tot should be a NumPy array).

You can generate the array normally as I explained in my comment:
r_tot = []
for r_i in however_many_rs_there_are:  # each r_i contains an array of 6 values
    r_tot.append(r_i)
You can then convert r_tot into a numpy array like so:
import numpy
np_array = numpy.array(r_tot)
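Here is a minimal sketch of the same pattern with a filtering step included. The loop range, the if test and the values are hypothetical stand-ins for the real computation. The point is only that each accepted solution is appended as a whole 6-value row and converted once at the end.

```python
import numpy as np

r_tot = []  # one entry per accepted solution
for k in range(10):  # hypothetical loop standing in for the real for statements
    if k % 3 == 0:  # hypothetical condition; keeps k = 0, 3, 6, 9
        r1, r2, r3, r4 = k, k + 1, k + 2, k + 3
        th_12, th_13 = 0.5 * k, 0.25 * k
        r_tot.append([r1, r2, r3, r4, th_12, th_13])  # append the whole 6-value row

R_tot = np.array(r_tot)  # stacks the rows into a 2D array
print(R_tot.shape)  # (4, 6)
```

Converting once after the loop is usually cheaper than growing a NumPy array inside the loop, because appending to a Python list does not copy the rows already collected.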
Here's a very simple proof of concept:
>>> import random, numpy
>>> r_tot = []
>>> for i in range(0, random.randint(1, 20)):  # append an arbitrary number of arrays
...     r_i = [1, 2, 3, 4, 5, 6]  # all of size six
...     r_tot.append(r_i)  # to r_tot
...
>>> np_array = numpy.array(r_tot)  # then convert to numpy array!
>>> np_array  # did it work?
array([[1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6]])
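One edge case is worth guarding against. This is standard NumPy behaviour rather than something from the answer above. If no r_i passes the if conditions, r_tot stays empty and numpy.array(r_tot) is a 1D array of shape (0,), not (0, 6). Reshaping keeps the column count fixed either way. The helper name to_matrix is hypothetical.

```python
import numpy as np

def to_matrix(rows, width=6):
    """Convert a list of equal-length rows to a 2D array, even when the list is empty."""
    # -1 lets NumPy infer the row count, which is 0 for an empty list
    return np.array(rows, dtype=float).reshape(-1, width)

empty = to_matrix([])
full = to_matrix([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
print(empty.shape, full.shape)  # (0, 6) (2, 6)
```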

Related

Creating shifted Hankel matrix

Say I have some time-series data in the form of a simple array.
X1 = np.array([1, 2, 3, 4])
The Hankel matrix can be obtained by using scipy.linalg.hankel, which would look something like this:
hankel(X1)
array([[1, 2, 3, 4],
       [2, 3, 4, 0],
       [3, 4, 0, 0],
       [4, 0, 0, 0]])
Now assume I had a larger array in the form of
X2 = np.array([1, 2, 3, 4, 5, 6, 7])
What I want to do is fill in the zeros in this matrix with the values that come next in X2 (specific to each row). Taking the same Hankel matrix as before, built from the first four values of X2, I'd like to see the following output instead of the zero-filled matrix that hankel(X2[:4]) returns:
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6],
       [4, 5, 6, 7]])
How would I do this? I'd ideally like to use this for larger data.
Appreciate any tips or pointers given. Thanks!
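For reference, scipy.linalg.hankel accepts an optional second argument r, which is the last row of the matrix. Its first element is ignored in favour of c[-1]. Passing the first N values as c and X2[N-1:2N-1] as r should therefore produce exactly the matrix asked for. This is a minimal sketch, assuming a square N x N window and an input at least 2N-1 values long.

```python
import numpy as np
from scipy.linalg import hankel

X2 = np.array([1, 2, 3, 4, 5, 6, 7])
N = 4  # window length (rows and columns of the result)

# c is the first column, r the last row; r[0] is ignored (here it equals c[-1] anyway)
H = hankel(X2[:N], X2[N - 1:2 * N - 1])
print(H)
```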
If you have a matrix with the appropriate index values into your dataset, you can use integer array indexing directly into your dataset.
To create the index matrix, you can simply use the upper-left quadrant of a double-sized Hankel array. There are likely simpler ways to create the index matrix, but this does the trick.
>>> import numpy as np
>>> import scipy.linalg
>>> X = np.array([9, 8, 7, 6, 5, 4, 3])
>>> N = 4  # the size of the "window"
>>> indices = scipy.linalg.hankel(np.arange(N*2))[:N, :N]
>>> indices
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6]])
>>> X[indices]
array([[9, 8, 7, 6],
       [8, 7, 6, 5],
       [7, 6, 5, 4],
       [6, 5, 4, 3]])
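The answer notes that there are likely simpler ways to build the index matrix. Two possible ones are sketched here. The first uses broadcasting, because entry (i, j) of the index matrix is simply i + j. The second uses numpy.lib.stride_tricks.sliding_window_view, available in NumPy 1.20 and later, which returns a read-only view of every length-N window instead of copying data. That may matter for larger inputs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

X = np.array([9, 8, 7, 6, 5, 4, 3])
N = 4  # window size

# Entry (i, j) of the index matrix is i + j: broadcast a column against a row
indices = np.arange(N)[:, None] + np.arange(N)
H_indexed = X[indices]

# Every consecutive window of length N; number of rows = len(X) - N + 1 (4 here)
H_windows = sliding_window_view(X, N)

print(H_indexed)
print(np.array_equal(H_indexed, H_windows))  # True
```

For an input longer than 2N-1, sliding_window_view returns more than N rows. Slice with [:N] if only the square matrix is wanted.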

Shuffling two 2D tensors in PyTorch and maintaining same order correlation

Is it possible to shuffle two 2D tensors in PyTorch by their rows, but maintain the same order for both? I know you can shuffle a 2D tensor by rows with the following code:
a=a[torch.randperm(a.size()[0])]
To elaborate, suppose I had two tensors:
a = torch.tensor([[1, 1, 1, 1, 1],
                  [2, 2, 2, 2, 2],
                  [3, 3, 3, 3, 3]])
b = torch.tensor([[4, 4, 4, 4, 4],
                  [5, 5, 5, 5, 5],
                  [6, 6, 6, 6, 6]])
I would run them through some function or block of code that shuffles them randomly but keeps the rows paired, producing something like the following:
a = torch.tensor([[2, 2, 2, 2, 2],
                  [1, 1, 1, 1, 1],
                  [3, 3, 3, 3, 3]])
b = torch.tensor([[5, 5, 5, 5, 5],
                  [4, 4, 4, 4, 4],
                  [6, 6, 6, 6, 6]])
My current solution is converting to a list, using the random.shuffle() function like below.
a_list = a.tolist()
b_list = b.tolist()
temp_list = list(zip(a_list , b_list ))
random.shuffle(temp_list) # Shuffle
a_temp, b_temp = zip(*temp_list)
a_list, b_list = list(a_temp), list(b_temp)
# Convert back to tensors
a = torch.tensor(a_list)
b = torch.tensor(b_list)
This takes quite a while, and I was wondering if there is a better way.
Draw a single permutation and use it to index both tensors:
indices = torch.randperm(a.size()[0])
a = a[indices]
b = b[indices]
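A self-contained version of that answer is shown below. The seed is there only to make the run repeatable. In this example b is built as a + 3 row by row, which is an assumption of the sketch. Checking that relationship after shuffling confirms that both tensors were reordered with the same permutation.

```python
import torch

torch.manual_seed(0)  # only for a reproducible demo

a = torch.tensor([[1, 1, 1, 1, 1],
                  [2, 2, 2, 2, 2],
                  [3, 3, 3, 3, 3]])
b = a + 3  # rows of 4s, 5s, 6s: row i of b is paired with row i of a

indices = torch.randperm(a.size(0))  # one random permutation of the row indices
a_shuffled = a[indices]
b_shuffled = b[indices]  # the same permutation keeps the pairs aligned

print(a_shuffled)
print(b_shuffled)
print(torch.equal(b_shuffled, a_shuffled + 3))  # True
```

This stays entirely in tensor indexing. It avoids the round trip through tolist(), zip and random.shuffle, which is likely where the original approach spent its time.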

Trying to convert pandas df to np array, dtaidistance computes list instead

I am attempting to compute the distance matrix for an ndarray that I have converted from pandas. I tried to convert the pandas df currently in this format:
move_df =
movement
0 [4, 3, 6, 2]
1 [5, 2, 3, 6, 2]
2 [4, 7, 2, 3, 6, 1]
3 [4, 4, 4, 3]
... ...
33410 [2, 6, 3, 1, 8]
[33410 x 1 columns]
to a numpy ndarray by using the following:
1) m = move_df.to_numpy()
2) m = pd.DataFrame(move_df.tolist()).values
3) m = [move_df.tolist() for i in move_df.columns]
Each of these conversions resulted in a numpy array in this format:
[[list([4, 3, 6, 2])]
[list([5, 2, 3, 6, 2])]
[list([4, 7, 2, 3, 6, 1])]
[list([4, 4, 4, 3])]
...
[list([2, 6, 3, 1, 8])]]
So when I try to run dtaidistance matrix, I get the following error:
d_m = dtw.distance_matrix(m)
TypeError: unsupported operand type(s) for -: 'list' and 'list'
However, when I create a list of lists by copying and pasting several of the arrays created with any of the methods mentioned above, the code works. This is not feasible in the long run, though, since there are over 30k rows. Is there something I am doing wrong in the conversion from a pandas df to a numpy array? I used
print(type(m))
and it outputs that it is a numpy array and I already know that I cannot subtract a list from a list, hence the error.
EDIT:
Output of move_df.head(10).to_dict():
{'movement': {0: [4, 3, 6, 2],
              1: [5, 2, 3, 6, 2],
              2: [4, 7, 2, 3, 6, 1],
              3: [4, 4, 4, 3],
              4: [3, 6, 2, 3, 3],
              5: [6, 2, 1],
              6: [1, 1, 1, 1],
              7: [7, 2, 3, 1, 1],
              8: [7, 2, 3, 2, 1],
              9: [6, 2, 3, 1]}}
(one of the dtaidistance authors here)
The dtaidistance package expects one of three formats:
A 2D numpy array (where all sequences have the same length by definition)
A Python list of 1D numpy.array or array.array.
A Python list of Python lists
In your case you could do:
series = move_df['movement'].to_list()
dtw.distance_matrix(series)
which then works on a list of lists.
To use the fast C implementation, an array is required (either a NumPy array or a standard-library array.array). If you want to keep the different lengths, you can do:
series = move_df['movement'].apply(lambda a: np.array(a, dtype=np.double)).to_list()
dtw.distance_matrix_fast(series)
Note that it might make sense to do the apply operation in place on your move_df data structure, so that you only have to do it once and do not need to keep track of two nearly identical data structures. After you do this, the to_list call is sufficient. Thus:
move_df['movement'] = move_df['movement'].apply(lambda a: np.array(a, dtype=np.double))
series = move_df['movement'].to_list()
dtw.distance_matrix_fast(series)
If you want to use a 2D numpy matrix, you would need to truncate or pad all series to the same length, as explained in other answers (for DTW, padding is more common because it does not lose information).
PS: This assumes you want to do univariate DTW. The ndim subpackage for multivariate time series expects a different data structure.
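A minimal NumPy sketch of the padding route follows. It right-pads each series by repeating its own last value (mode="edge" of numpy.pad). That fill strategy is an assumption of this sketch and not a recommendation from the dtaidistance docs, so choose whatever fill suits the analysis.

```python
import numpy as np

movement = [[4, 3, 6, 2], [5, 2, 3, 6, 2], [4, 7, 2, 3, 6, 1], [6, 2, 1]]

max_len = max(len(s) for s in movement)
padded = np.array([
    # pad on the right only, repeating the final value until the series reaches max_len
    np.pad(np.asarray(s, dtype=np.double), (0, max_len - len(s)), mode="edge")
    for s in movement
])
print(padded.shape)  # (4, 6)
```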
Assuming you want to form an array with the lists of length 4:
m = df['movement'].str.len().eq(4)
a = np.array(df.loc[m, 'movement'].to_list())
output:
array([[4, 3, 6, 2],
       [4, 4, 4, 3],
       [1, 1, 1, 1],
       [6, 2, 3, 1]])
Input used:
df = pd.DataFrame({'movement': [[4, 3, 6, 2],
                                [5, 2, 3, 6, 2],
                                [4, 7, 2, 3, 6, 1],
                                [4, 4, 4, 3],
                                [3, 6, 2, 3, 3],
                                [6, 2, 1],
                                [1, 1, 1, 1],
                                [7, 2, 3, 1, 1],
                                [7, 2, 3, 2, 1],
                                [6, 2, 3, 1]]})
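That idea generalizes from length 4 to every length. The sketch below does it without pandas by grouping the lists by length and building one 2D array per length. The dictionary-based grouping is an assumption of this sketch and not part of the answer above.

```python
import numpy as np

movement = [[4, 3, 6, 2], [5, 2, 3, 6, 2], [4, 7, 2, 3, 6, 1], [4, 4, 4, 3],
            [3, 6, 2, 3, 3], [6, 2, 1], [1, 1, 1, 1], [7, 2, 3, 1, 1],
            [7, 2, 3, 2, 1], [6, 2, 3, 1]]

by_length = {}
for seq in movement:
    by_length.setdefault(len(seq), []).append(seq)  # bucket each list by its length

arrays = {n: np.array(group) for n, group in by_length.items()}  # one 2D array per length
print({n: arr.shape for n, arr in sorted(arrays.items())})
# {3: (1, 3), 4: (4, 4), 5: (4, 5), 6: (1, 6)}
```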
A dataframe created with:
In [112]: df = pd.DataFrame({'movement': {0: [4, 3, 6, 2],
...: 1: [5, 2, 3, 6, 2],
...: 2: [4, 7, 2, 3, 6, 1],
...: 3: [4, 4, 4, 3],
...: 4: [3, 6, 2, 3, 3],
...: 5: [6, 2, 1],
...: 6: [1, 1, 1, 1],
...: 7: [7, 2, 3, 1, 1],
...: 8: [7, 2, 3, 2, 1],
...: 9: [6, 2, 3, 1]}})
has an object dtype column that contains lists. The array derived from that column is object dtype:
In [121]: arr = df['movement'].to_numpy()
In [122]: arr
Out[122]:
array([list([4, 3, 6, 2]), list([5, 2, 3, 6, 2]),
       list([4, 7, 2, 3, 6, 1]), list([4, 4, 4, 3]),
       list([3, 6, 2, 3, 3]), list([6, 2, 1]), list([1, 1, 1, 1]),
       list([7, 2, 3, 1, 1]), list([7, 2, 3, 2, 1]), list([6, 2, 3, 1])],
      dtype=object)
Because I selected the column, I get a 1d array rather than the 2d one you got. Otherwise it's the same.
This cannot be converted into a 2d numeric dtype array. For most purposes we can think of this as a list of lists.
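The following short sketch illustrates that point. Asking NumPy for a numeric 2D array from rows of different lengths raises a ValueError. Converting row by row instead gives a plain list of 1D float arrays, which is one of the formats the dtaidistance author lists above.

```python
import numpy as np

rows = [[4, 3, 6, 2], [5, 2, 3, 6, 2], [6, 2, 1]]  # ragged, like the movement column

try:
    np.array(rows, dtype=float)  # request a 2D numeric array
    ragged_ok = True
except ValueError:
    ragged_ok = False  # lengths differ, so there is no rectangular shape to fill

series = [np.asarray(r, dtype=float) for r in rows]  # list of 1D float arrays
print(ragged_ok, [s.shape for s in series])  # False [(4,), (5,), (3,)]
```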
In [123]: arr.tolist()
Out[123]:
[[4, 3, 6, 2],
[5, 2, 3, 6, 2],
[4, 7, 2, 3, 6, 1],
[4, 4, 4, 3],
[3, 6, 2, 3, 3],
[6, 2, 1],
[1, 1, 1, 1],
[7, 2, 3, 1, 1],
[7, 2, 3, 2, 1],
[6, 2, 3, 1]]
If the lists were all the same length, or if we pick a subset, it is possible to construct a 2d array:
In [125]: arr[[0,3,6,9]]
Out[125]:
array([list([4, 3, 6, 2]), list([4, 4, 4, 3]), list([1, 1, 1, 1]),
       list([6, 2, 3, 1])], dtype=object)
In [126]: np.stack(arr[[0,3,6,9]])
Out[126]:
array([[4, 3, 6, 2],
       [4, 4, 4, 3],
       [1, 1, 1, 1],
       [6, 2, 3, 1]])
Padding and slicing could also be used to force the lists to matching lengths - but that could mean losing information.
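Here is a sketch of the slicing route. Cutting every list to the shortest length yields a rectangular numeric array, at the cost of discarding the tail of every longer list.

```python
import numpy as np

rows = [[4, 3, 6, 2], [5, 2, 3, 6, 2], [4, 7, 2, 3, 6, 1], [6, 2, 1]]

min_len = min(len(r) for r in rows)
truncated = np.array([r[:min_len] for r in rows])  # keep only the first min_len values
print(truncated.tolist())  # [[4, 3, 6], [5, 2, 3], [4, 7, 2], [6, 2, 1]]
```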
But without knowing what dtw.distance_matrix expects (it looks like it wants a 2d numeric array), or what these lists represent, I can't go further.
The fundamental point is that your dataframe contains lists that vary in length.

swap two elements in 2d array

I have an array of shape (10296, 6). I want to swap the last two elements in each subarray (row):
a = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], ...
so that the 5 and 6 in each row are swapped, giving:
a = [[1, 2, 3, 4, 6, 5], [1, 2, 3, 4, 6, 5], ...
Try NumPy's advanced (integer array) indexing:
import numpy as np
a = np.array([[1, 2, 3, 4, 5, 6],
              [1, 2, 3, 4, 5, 6]])
a[:, [4, 5]] = a[:, [5, 4]]
# a is now:
array([[1, 2, 3, 4, 6, 5],
       [1, 2, 3, 4, 6, 5]])
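The same idea can be written with negative indices, so that it swaps the last two columns whatever the row width. The sketch below runs it on an array shaped like the one in the question (10296 rows of 1..6). The right-hand side uses advanced indexing, so it is a copy, and the assignment does not overwrite values it still needs to read.

```python
import numpy as np

a = np.tile(np.arange(1, 7), (10296, 1))  # 10296 rows of [1, 2, 3, 4, 5, 6]

a[:, [-2, -1]] = a[:, [-1, -2]]  # swap the last two columns in every row
print(a.shape, a[0])  # (10296, 6) [1 2 3 4 6 5]
```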

How to sum columns of an array in Python

How do I add up all of the values of a column in a python array? Ideally I want to do this without importing any additional libraries.
input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]
output_val = [3, 6, 9, 12, 15]
I know this can be done with a nested for loop, but I'm wondering if there is a better way (like a list comprehension)?
zip and sum can get that done:
Code:
[sum(x) for x in zip(*input_val)]
zip(*input_val) transposes the input: it yields one tuple per column, containing the elements at that position from every inner list. On the first iteration sum sees the first element of each list, on the next iteration the second element of each list, and so on.
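The sketch below prints what zip(*input_val) yields, which makes the transposition visible. One caveat comes from standard zip behaviour and is not mentioned in the answer: zip stops at the shortest row. If rows can differ in length, itertools.zip_longest with fillvalue=0 sums every column instead.

```python
from itertools import zip_longest

input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]
print(list(zip(*input_val)))  # [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5)]
column_sums = [sum(col) for col in zip(*input_val)]  # [3, 6, 9, 12, 15]

ragged = [[1, 2, 3], [1, 2], [1, 2, 3, 4]]
ragged_zip = [sum(col) for col in zip(*ragged)]  # [3, 6]: truncated to the shortest row
ragged_sums = [sum(col) for col in zip_longest(*ragged, fillvalue=0)]  # [3, 6, 6, 4]
print(ragged_zip, ragged_sums)
```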
Test Code:
input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]
print([sum(x) for x in zip(*input_val)])
Results:
[3, 6, 9, 12, 15]
In case you decide to use a library, NumPy does this easily:
import numpy as np
np.sum(input_val, axis=0)
You may also use sum with zip inside the map function:
# In Python 3.x
>>> list(map(sum, zip(*input_val)))
[3, 6, 9, 12, 15]
# explicitly convert to a list, because map returns a lazy map object (an iterator) in Python 3
# In Python 2.x, the explicit conversion to list is not needed, as `map` returns a list
>>> map(sum, zip(*input_val))
[3, 6, 9, 12, 15]
Try this:
input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]
output_val = [sum([i[b] for i in input_val]) for b in range(len(input_val[0]))]
print(output_val)
Construct your array using the NumPy library:
import numpy as np
Create the array with the array() function and save it in a variable:
arr = np.array(([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
Apply the sum() method to the array, setting the axis parameter to zero so that it sums down the columns:
arr.sum(axis=0)
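The following sketch shows what the axis parameter selects. axis=0 collapses the rows, giving one total per column, which is what the question asks for. axis=1 collapses the columns, giving one total per row.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [1, 2, 3, 4, 5],
                [1, 2, 3, 4, 5]])

col_totals = arr.sum(axis=0)  # down each column
row_totals = arr.sum(axis=1)  # across each row
print(col_totals.tolist(), row_totals.tolist())  # [3, 6, 9, 12, 15] [15, 15, 15]
```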
This should work:
[sum(i) for i in zip(*input_val)]
I guess you can use:
import numpy as np
new_list = sum(map(np.array, input_val))
I think this is the most Pythonic way of doing this (on Python 3, wrap it in list(), since map is lazy):
list(map(sum, zip(*input_val)))
One-liner using list comprehensions: for each column (length of one row), make a list of all the entries in that column, and sum that list.
output_val = [sum([input_val[i][j] for i in range(len(input_val))])
              for j in range(len(input_val[0]))]
Try this code. Given your input_val, it makes output_val end up as [3, 6, 9, 12, 15]:
input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]
vals_length = len(input_val[0])
output_val = [0] * vals_length  # start with one zero per column
for i in range(vals_length):  # iterate over each column index
    for vals in input_val:
        output_val[i] += vals[i]  # add this row's value to the column total
print(output_val)  # [3, 6, 9, 12, 15]
Using NumPy, you can easily solve this in one line:
1: Input
input_val = [[1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5],
             [1, 2, 3, 4, 5]]
2: NumPy does the math for you
import numpy as np
np.sum(input_val, axis=0)
3: The result
array([ 3,  6,  9, 12, 15])
If input_val is already a NumPy array (a plain list has no .sum method), this makes the code even simpler:
output_val = input_val.sum(axis=0)
You can also simply use the built-in sum function instead of np.sum. On a 2D array it adds the rows together:
import numpy as np
input_val = np.array([[1, 2, 3, 4, 5],
                      [1, 2, 3, 4, 5],
                      [1, 2, 3, 4, 5]])
sum(input_val)
output: array([ 3,  6,  9, 12, 15])
