I'm currently creating a new column in my pandas dataframe whose value is computed from another column minus a constant. This is my current code, which almost gives me the output I want (example shortened for reproduction):
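import pandas as pd

subtraction_value = 3
data = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9]})
# [::-1] has no visible effect here: assignment realigns the result on the index
data['new_column'] = data['test'][::-1] - subtraction_value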
When run, this gives me the current output:
print(data['new_column'])
[9,1,2,1,-2,0,-1,2,7,6]
However, I'd like to use a different subtraction value at position 0, the original value at positions 1 to 3, the second value again at position 4, and keep repeating this pattern (so every 4th row gets the second value). I realize I could do this with a for loop, but for performance reasons I'd like a vectorized approach. My new output would ideally look like this:
subtraction_value_2 = 6
print(data['new_column'])
[6,1,2,1,-5,0,-1,2,4,6]
You can use positional indexing (this assumes 'new_column' has already been created as in the question):
subtraction_value_2 = 6
col = data.columns.get_loc('new_column')
data.iloc[0::4, col] = data['test'].iloc[0::4].sub(subtraction_value_2)
or with numpy.where (after import numpy as np):
data['new_column'] = np.where(data.index%4,
data['test']-subtraction_value,
data['test']-subtraction_value_2)
output:
test new_column
0 12 6
1 4 1
2 5 2
3 4 1
4 1 -5
5 3 0
6 2 -1
7 5 2
8 10 4
9 9 6
import pandas as pd

subtraction_value = 3
subtraction_value_2 = 6
data = pd.DataFrame({"test":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9]})
data['new_column'] = data.test - subtraction_value
# .loc avoids chained assignment, which does not update data under pandas copy-on-write
data.loc[::4, 'new_column'] = data.test[::4] - subtraction_value_2
print(list(data.new_column))
Output:
[6, 1, 2, 1, -5, 0, -1, 2, 4, 6]
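A note on the data.index % 4 approach: it relies on the default RangeIndex (0, 1, 2, ...), so after filtering or re-indexing the labels no longer match row positions. A minimal sketch that works on positions instead, assuming the same test column and subtraction values as the question; the helper name subtract_with_period and its period parameter are hypothetical, introduced only for illustration:

```python
import numpy as np
import pandas as pd


def subtract_with_period(values, default_sub, special_sub, period=4):
    """Hypothetical helper: subtract special_sub at positions 0, period, 2*period, ...
    and default_sub everywhere else, based on position rather than index label."""
    positions = np.arange(len(values))  # 0..n-1, independent of index labels
    subtrahend = np.where(positions % period == 0, special_sub, default_sub)
    return values - subtrahend


# Deliberately non-default index (1..10) so label-based % 4 would pick the wrong rows
data = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9]}, index=range(1, 11))
data["new_column"] = subtract_with_period(data["test"], 3, 6)
print(data["new_column"].tolist())
```

With this index, data.index % 4 == 0 would select labels 4 and 8 (positions 3 and 7), whereas the positional mask still selects positions 0, 4 and 8.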
In the dataframe below, the column "CumRetperTrade" consists of a few vertical vectors (sequences of numbers) separated by zeros; these vectors correspond to the non-zero elements of column "Portfolio". I would like to find the cumulative local maxima of every non-zero vector contained in column "CumRetperTrade".
To be precise, I would like to transform (using vectorization or other methods) column "CumRetperTrade" into column "PeakCumRet" (the desired result), which gives, for every vector in "CumRetperTrade" (i.e. every run of rows where 'Portfolio' = 1), the cumulative maximum of all its values so far. The numeric example is below. Thanks in advance!
PS In other words, I guess we need to use cummax(), but apply it separately to each run of consecutive rows where 'Portfolio' = 1 in 'CumRetperTrade'.
import numpy as np
import pandas as pd
df1 = pd.DataFrame({"Portfolio": [1, 1, 1, 1, 0 , 0, 0, 1, 1, 1],
"CumRetperTrade": [2, 3, 2, 1, 0 , 0, 0, 4, 2, 1],
"PeakCumRet": [2, 3, 3, 3, 0 , 0, 0, 4, 4, 4]})
df1
Portfolio CumRetperTrade PeakCumRet
0 1 2 2
1 1 3 3
2 1 2 3
3 1 1 3
4 0 0 0
5 0 0 0
6 0 0 0
7 1 4 4
8 1 2 4
9 1 1 4
PPS I previously asked a similar question (Dataframe column: to find local maxima) and received a correct answer, but in that question I did not explicitly mention the requirement of cumulative local maxima.
You only need a small modification to the previous answer:
df1["PeakCumRet"] = (
df1.groupby(df1["Portfolio"].diff().ne(0).cumsum())
["CumRetperTrade"].expanding().max()
.droplevel(0)
)
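expanding().max() is what produces the cumulative maximum within each group; the groupby key, a counter that increments whenever "Portfolio" changes, labels each consecutive run.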
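Since the question mentions cummax(), a variant worth trying is groupby(...).cummax(), which returns a Series already aligned with the original index (no droplevel needed) and keeps the integer dtype. A minimal sketch using the same run-labelling key as the answer above:

```python
import pandas as pd

df1 = pd.DataFrame({"Portfolio": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1],
                    "CumRetperTrade": [2, 3, 2, 1, 0, 0, 0, 4, 2, 1]})

# Label each run of consecutive equal "Portfolio" values: 1,1,1,1,2,2,2,3,3,3
run_id = df1["Portfolio"].diff().ne(0).cumsum()

# Running maximum that restarts at the beginning of every run
df1["PeakCumRet"] = df1.groupby(run_id)["CumRetperTrade"].cummax()
print(df1["PeakCumRet"].tolist())
```

The Portfolio = 0 runs come out as 0 only because CumRetperTrade is 0 there; if those rows could hold other values, multiplying the result by df1["Portfolio"] would zero them out.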
I have a minimum and a maximum value, and I also have a 70x3 array. I would like to find all the values in the second column of the array that lie within the range between the min and max values, and export all 3 columns of the array for those rows.
For example
A=[2,3,4
3,5,6
5,5,5
5,6,7
10,11,22
3,50,6]
The max value is 11 and the min is 5, so the resulting matrix would be something like:
B=[3,5,6
5,5,5
5,6,7
10,11,22]
Up till now, what I did is:
aa = []
for i in MatrixA[:, 1]:
    if minimum <= i <= maximum:
        aa.append(i)
aa = np.asarray(aa)
But this only finds the values I need from the second column, not the corresponding values from columns 1 and 3.
You can use
A[numpy.logical_and(5 <= A[:, 1], A[:, 1] <= 11), :]
With a simple NumPy expression:
import numpy as np
a = np.array([[2,3,4],[3,5,6], [5,5,5], [5,6,7],[10,11,22], [3,50,6]])
b = a[(a[:,1] >= 5) & (a[:,1] <= 11)]
print(b)
The output:
[[ 3 5 6]
[ 5 5 5]
[ 5 6 7]
[10 11 22]]
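a[:,1] selects the second column (index 1 along axis 1); comparing it with >= and <= gives one boolean per row, & combines the two conditions, and indexing a with the result keeps the matching rows with all their columns.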
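The same idea can be wrapped in a small reusable function so the bounds and the column are parameters. A minimal sketch; rows_in_range is a hypothetical name, and the bounds are treated as inclusive, matching the example where 5 and 11 are kept:

```python
import numpy as np


def rows_in_range(a, lo, hi, col=1):
    """Hypothetical helper: return the rows of a whose value in column `col` lies in [lo, hi]."""
    mask = (a[:, col] >= lo) & (a[:, col] <= hi)  # 1-D boolean array, one entry per row
    return a[mask]                                # keeps whole rows where mask is True


a = np.array([[2, 3, 4], [3, 5, 6], [5, 5, 5], [5, 6, 7], [10, 11, 22], [3, 50, 6]])
print((a[:, 1] >= 5) & (a[:, 1] <= 11))  # the mask itself
print(rows_in_range(a, 5, 11))
```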
I know that the order of the keys is not guaranteed, and that's OK, but what exactly does it mean that the order of the values is not guaranteed either*?
For example, I am representing a matrix as a dictionary, like this:
signatures_dict = {}
M = 3
# one row per key 1..M, each row being the list [1, 2, 3, 4, 5]
for i in range(1, M + 1):
    row = []
    for j in range(1, 6):
        row.append(j)
    signatures_dict[i] = row
print(signatures_dict)
Are the columns of my matrix correctly constructed? Let's say I have 3 rows, and at the line signatures_dict[i] = row, row will always be 1, 2, 3, 4, 5. What will signatures_dict be?
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
or something like
1 2 3 4 5
1 4 3 2 5
5 1 3 4 2
? I am worried about cross-platform support.
In my application, the rows are words and the columns documents, so can I say that the first column is the first document?
*Are order of keys() and values() in python dictionary guaranteed to be the same?
You are guaranteed to have 1 2 3 4 5 in each row; nothing reorders them. The lack of ordering of values() refers to the fact that if you call signatures_dict.values(), the values could come out in any order. But the values are the rows, not the elements of each row. Each row is a list, and lists maintain their order.
If you want a dict which maintains order, Python has that too: https://docs.python.org/2/library/collections.html#collections.OrderedDict
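A quick way to check both points. The sketch assumes Python 3.7 or later, where plain dicts are guaranteed to keep insertion order; independently of that, the docs have long guaranteed that keys() and values() correspond when the dict is not modified in between. Each row list keeps its element order, and zipping keys() with values() matches items():

```python
M = 3
signatures_dict = {}
for i in range(1, M + 1):
    signatures_dict[i] = list(range(1, 6))  # each row is the list [1, 2, 3, 4, 5]

for key, row in zip(signatures_dict.keys(), signatures_dict.values()):
    print(key, row)  # every row prints as [1, 2, 3, 4, 5]

# keys() and values() line up; items() pairs them explicitly
assert list(zip(signatures_dict.keys(), signatures_dict.values())) == list(signatures_dict.items())
```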
Why not use a list of lists as your matrix? It would have whatever order you gave it:
In [1]: matrix = [[i for i in range(4)] for _ in range(4)]
In [2]: matrix
Out[2]: [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
In [3]: matrix[0][0]
Out[3]: 0
In [4]: matrix[3][2]
Out[4]: 2
Let's say I have the matrix
import numpy as np
A = np.matrix([[1,2,3,33],[4,5,6,66],[7,8,9,99]])
I am trying to understand the function argmax; as far as I know, it returns the largest value.
If I try it in Python:
np.argmax(A[1:,2])
Shouldn't I get the largest element from the second row to the end (which is the third row), along the third column? So the selection should be the array [6 9], and argmax should return 9? But why, when I run it in Python, does it return 1?
And if I want to return the largest element from row 2 onwards in column 3 (which is 9), how should I modify the code?
I have checked the Python documentation but still a bit unclear. Thanks for the help and explanation.
No, argmax returns the position of the largest value; max returns the largest value itself.
import numpy as np
A = np.matrix([[1,2,3,33],[4,5,6,66],[7,8,9,99]])
np.argmax(A) # 11, which is the position of 99
np.argmax(A[:,:]) # 11, which is the position of 99
np.argmax(A[:1]) # 3, which is the position of 33
np.argmax(A[:,2]) # 2, which is the position of 9
np.argmax(A[1:,2]) # 1, which is the position of 9
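For the follow-up (getting 9 rather than its position), a minimal sketch using a plain np.array instead of np.matrix, since NumPy's documentation no longer recommends np.matrix. With an ndarray, A[1:, 2] is 1-D, so the position from argmax can index it directly, and np.max gives the value in one step:

```python
import numpy as np

A = np.array([[1, 2, 3, 33], [4, 5, 6, 66], [7, 8, 9, 99]])
sub = A[1:, 2]          # rows from index 1 onward, column index 2 -> array([6, 9])
pos = np.argmax(sub)    # position of the largest value within sub
print(pos, sub[pos], np.max(sub))  # 1 9 9
```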
It took me a while to figure this function out. Basically argmax returns you the index of the maximum value in the array. Now the array can be 1 dimensional or multiple dimensions. Following are some examples.
1 dimensional
a = [1,2,3,4,5]
np.argmax(a)
>>4
The array is 1 dimensional, so the function simply returns the index of the maximum value (5), which is 4.
Multiple dimensions
a = [[1,2,3],[4,5,6]]
np.argmax(a)
>>5
In this example the array is 2 dimensional, with shape (2,3). Since no axis parameter is specified, NumPy flattens the array to 1 dimension and returns the index of the maximum value in the flattened array. Here the array becomes [1,2,3,4,5,6], and the index of 6 is 5.
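If you need the row and column that correspond to such a flattened index, np.unravel_index converts it back using the array's shape. A short sketch on the same array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
flat = np.argmax(a)                         # index into the flattened array
row, col = np.unravel_index(flat, a.shape)  # back to (row, column) for shape (2, 3)
print(flat, row, col, a[row, col])          # 5 1 2 6
```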
When parameter is axis = 0
a = [[1,2,3],[4,5,6]]
np.argmax(a, axis=0)
>>array([1, 1, 1])
The result here was a bit confusing to me at first. With axis=0, argmax works down each column: for every column it returns the row index at which that column's maximum occurs. In each column the larger value (4, 5 and 6) is in the second row, whose index is 1, hence [1, 1, 1]. According to the documentation, the dimension given in the axis parameter is removed from the result: the original shape is (2,3), so the result has shape (3,), one entry per column.
When parameter is axis = 1
a = [[1,2,3],[4,5,6]]
np.argmax(a, axis=1)
>>array([2, 2])
Same concept as above, but now argmax works along each row and returns, for every row, the column index of that row's maximum. The row maxima are 3 and 6, both in the 3rd column (index 2), hence [2, 2]. Axis 1 is removed from the shape (2,3), giving shape (2,): one entry per row.
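To turn the per-axis indices back into the maximum values themselves, the argmax result can be fed to np.take_along_axis (available since NumPy 1.15). A minimal sketch on the same 2x3 array; np.expand_dims restores the removed axis so the index array has the same number of dimensions as a:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

col_idx = np.argmax(a, axis=0)  # for each column, the row holding its max
row_idx = np.argmax(a, axis=1)  # for each row, the column holding its max

# Gather the maxima with those indices; equivalent to a.max(axis=0) / a.max(axis=1)
col_max = np.take_along_axis(a, np.expand_dims(col_idx, axis=0), axis=0)[0]
row_max = np.take_along_axis(a, np.expand_dims(row_idx, axis=1), axis=1)[:, 0]
print(col_idx, row_idx, col_max, row_max)
```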
Here is how argmax works. Suppose we are given the array
matrix = np.array([[1,2,3],[4,5,6],[7,8,9], [9, 9, 9]])
Now find the max value of the given array
np.max(matrix)
The answer will be -> 9
Now find the argmax of the given array
np.argmax(matrix)
The answer will be -> 8
Let's understand how it got 8.
NumPy flattens the array to one dimension, so it will look like
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9])
Index 0 1 2 3 4 5 6 7 8 9 10 11
So the max value is 9, and the first occurrence of 9 is at index 8. That's why argmax returns 8.
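Because argmax reports only the first occurrence, finding every position of a tied maximum needs a different call. A minimal sketch using an equality mask with np.flatnonzero (flattened indices) and np.argwhere ((row, column) pairs):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [9, 9, 9]])
print(np.argmax(matrix))                       # 8, first occurrence only
print(np.flatnonzero(matrix == matrix.max()))  # flattened indices of every 9
print(np.argwhere(matrix == matrix.max()))     # (row, column) of every 9
```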
axis = 0 (column wise max)
Now find the position of the max value in each column
np.argmax(matrix, axis=0)
Index 0 1 2
0 [1, 2, 3]
1 [4, 5, 6]
2 [7, 8, 9]
3 [9, 9, 9]
In the first column the values are 1 4 7 9; the max value is 9, at index 3.
In the second column the values are 2 5 8 9; again the max value is 9, at index 3.
In the third column the values are 3 6 9 9; the max value 9 appears at indices 2 and 3, and argmax returns the first occurrence, index 2.
So the output will be [3, 3, 2].
axis = 1 (row wise)
Now find the position of the max value in each row
np.argmax(matrix, axis=1)
Index 0 1 2
0 [1, 2, 3]
1 [4, 5, 6]
2 [7, 8, 9]
3 [9, 9, 9]
In the first row the values are 1 2 3; the max value is 3, at index 2.
In the second row the values are 4 5 6; the max value is 6, at index 2.
In the third row the values are 7 8 9; the max value is 9, at index 2.
In the fourth row the values are 9 9 9; the max value 9 appears at indices 0, 1 and 2, and argmax returns the first occurrence, index 0.
So the output will be [2, 2, 2, 0].
argmax gives the index of the greatest number, and the axis parameter decides the direction: with axis=0 it returns, for each column, the row index of that column's maximum; with axis=1 it returns, for each row, the column index of that row's maximum.
In your example, A[1:, 2] first takes the rows from index 1 onward (the second and third rows) and only the column with index 2 (the third column) of those rows, giving [6, 9]; argmax then returns the index of the max value within that result, which is 1.
When I was starting out in Python I tested this function, and the result of this example made it clear to me how argmax works.
Example:
import numpy as np

# Generating 2D array for input
array = np.arange(20).reshape(4, 5)
array[1][2] = 25
print("The input array: \n", array)
# without axis: index into the flattened array
print("\nIndex of the max element (flattened): ", np.argmax(array))
# with axis
print("\nRow index of the max in each column: ", np.argmax(array, axis=0))
print("\nColumn index of the max in each row: ", np.argmax(array, axis=1))
Result Example:
The input array:
[[ 0  1  2  3  4]
 [ 5  6 25  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
Index of the max element (flattened): 7
Row index of the max in each column: [3 3 1 3 3]
Column index of the max in each row: [4 2 4 4]
In that output we can see 3 results.
The highest element of the whole array is at flattened position 7.
The highest element of every column is in the last row (index 3), except in the third column, where the highest value is in the second row (index 1).
The highest element of every row is in the last column (index 4), except in the second row, where the highest value is in the third column (index 2).
Reference: https://www.crazygeeks.org/numpy-argmax-in-python/
I hope that it helps.