Plotting multidimensional binary data as horizontal bars - python
I'm trying to plot a multidimensional array (see below) as a set of parallel horizontal bars, so that each bar is filled where the value is True and white where it is False.
Here's my data:
bar_1 bar_2 bar_3
0 True False False
1 True False False
2 True False True
3 False True False
4 False True False
5 False True False
6 False False False
7 False False False
8 False False False
9 False True False
10 False True False
11 False True False
12 False True False
13 False True False
14 False True False
15 False True False
16 True False False
17 True False False
18 True False True
19 False True False
20 False True False
21 False True False
22 True False True
23 False True False
24 False True False
25 False True False
Here's how I'd like to display it:
I was looking through the matplotlib docs for something similar, but no luck. Perhaps I'm missing the keywords for this type of plotting. What'd be the name of this type of plot? Is it possible to generate this with matplotlib?
Thanks to ImportanceOfBeingErnest I came up with a solution. I am going to use broken_barh, which was not exactly designed for this purpose, but with a little bit of tweaking it can be used here.
For the sake of simplicity, I'll only display a function and mark in red the places where it's above 0.5.
import numpy as np
import pandas as pd

index = np.linspace(start=-1, stop=5, num=100)
sin = np.sin(index**2)
df = pd.DataFrame({'sin': sin, 'filtered': sin > .5}, index=index)
Produces
Note that I'm not going to use the same data as in the plots, because that would take up too much space.
In the next step, I compute the flag points where my function crosses the threshold. Just to visualize it:
0 0 0 1 1 1 0 0 1 1 0 0 1 0 0
I compute the shifted values:
0 0 1 1 1 0 0 1 1 0 0 1 0 0 0
0 0 0 1 1 1 0 0 1 1 0 0 1 0 0    # the original values
0 0 0 0 1 1 1 0 0 1 1 0 0 1 0
And I keep only the positions where the original array is True and the two shifted arrays XOR to True (either one of them is True or the other, but not both):
0 0 0 1 0 1 0 0 1 1 0 0 0 0 0
Note that a single isolated spike that only touches the threshold without crossing it (a lone True with False on both sides) will be 0.
This can be easily achieved with
flags = df.filtered & (df.filtered.shift().fillna(False) ^ df.filtered.shift(-1).fillna(False))
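As a quick self-contained check (not part of the original code), here is the same trick applied to the toy 0/1 sequence above; shift(fill_value=False) is used instead of .shift().fillna(False) just to keep the boolean dtype:

import pandas as pd

# The toy sequence visualized above.
s = pd.Series([0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0], dtype=bool)

prev = s.shift(1, fill_value=False)   # previous value
nxt = s.shift(-1, fill_value=False)   # next value

# Keep only points where the value itself is True and exactly one
# neighbour is True, i.e. the start or the end of a run;
# an isolated single True drops out.
flags = s & (prev ^ nxt)
print(flags.astype(int).tolist())
# [0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0]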
Now I multiply the flag points with the index (not necessarily an integer index).
flag_values = flags * flags.index
0 0 0 3 0 5 0 0 8 9 0 0 0 0 0
And drop the 0 values:
flag_values = flag_values[flag_values != 0]
[3, 5, 8, 9]
I still need to reshape it:
value_pairs = flag_values.values.reshape((-1, 2))
[[3, 5],
[8, 9]]
And now I need to subtract the first column from the second one, so that each row becomes an (xmin, xwidth) pair, which is the format broken_barh expects:
value_pairs[:, 1] = value_pairs[:, 1] - value_pairs[:, 0]
And I can plot it as follows:
ax = df.sin.plot()
# The second parameter (ymin, yheight) sets the vertical position of the bar and its thickness.
ax.broken_barh(value_pairs, (0.49, 0.02,), facecolors='red')
Here's the result
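A minimal sketch of how the same broken_barh idea could be applied to the boolean columns from the original question; only the first six rows of the data are used here, and the true_runs helper is just an illustrative name, not part of the answer above:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First six rows of the boolean data from the question.
df = pd.DataFrame({'bar_1': [True, True, True, False, False, False],
                   'bar_2': [False, False, False, True, True, True],
                   'bar_3': [False, False, True, False, False, False]})

def true_runs(mask):
    # Return (start, width) pairs for every run of True in a boolean Series.
    m = mask.to_numpy()
    # Pad with False on both sides so every run has a clear start and end.
    edges = np.diff(np.concatenate(([False], m, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts, ends - starts))

fig, ax = plt.subplots()
for i, col in enumerate(df.columns):
    # One horizontal bar per column, filled only over the True runs.
    ax.broken_barh(true_runs(df[col]), (i - 0.4, 0.8))
ax.set_yticks(range(len(df.columns)))
ax.set_yticklabels(df.columns)
plt.show()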
Related
How to change the particular elements of an array
I have an array, for example:

import numpy as np
data = np.array([[4,4,4,0,1,1,1,1,1,1,0,0,0,0,1],
                 [3,0,1,0,1,1,1,1,1,1,1,1,1,1,0],
                 [6,0,0,0,1,1,1,1,1,1,1,1,1,1,0],
                 [2,0,0,0,1,1,1,0,1,0,1,1,1,0,0],
                 [2,0,1,0,1,1,1,0,1,0,1,0,1,0,0]])

Requirement: keep every 1 that is part of at least one full (3, 3) square of 1s; replace all other 1s with zero.

Expected output:
[[4 4 4 0 1 1 1 1 1 1 0 0 0 0 0]
 [3 0 0 0 1 1 1 1 1 1 1 1 1 0 0]
 [6 0 0 0 1 1 1 1 1 1 1 1 1 0 0]
 [2 0 0 0 1 1 1 0 0 0 1 1 1 0 0]
 [2 0 0 0 1 1 1 0 0 0 0 0 0 0 0]]

Current code:
k = np.ones((3,3))
print(k)
jk = binary_dilation(binary_erosion(data==1, k), k)
print(jk)

Current output:
[[False False False False  True  True  True  True  True  True False False False False False]
 [False False False False  True  True  True  True  True  True  True  True  True False False]
 [False False False False  True  True  True  True  True  True  True  True  True False False]
 [False False False False  True  True  True False False False  True  True  True False False]
 [False False False False  True  True  True False False False False False False False False]]
You can use a 2D convolution on the 1s with a 3x3 kernel of 1s to identify the centers of the 3x3 squares, then dilate them and restore the non-1 numbers:

from scipy.signal import convolve2d
from scipy.ndimage import binary_dilation

# get 1s (as boolean)
m = data == 1
kernel = np.ones((3, 3))
# get centers
conv = convolve2d(m, kernel, mode='same')
# dilate and restore the other numbers
out = np.where(m, binary_dilation(conv == 9, kernel).astype(int), data)
print(out)

Alternative, erosion and dilation, similarly to my previous answer:

from scipy.ndimage import binary_dilation, binary_erosion

m = data == 1
kernel = np.ones((3, 3))
out = np.where(m, binary_dilation(binary_erosion(m, kernel), kernel), data)

Output:
[[4 4 4 0 1 1 1 1 1 1 0 0 0 0 0]
 [3 0 0 0 1 1 1 1 1 1 1 1 1 0 0]
 [6 0 0 0 1 1 1 1 1 1 1 1 1 0 0]
 [2 0 0 0 1 1 1 0 0 0 1 1 1 0 0]
 [2 0 0 0 1 1 1 0 0 0 0 0 0 0 0]]
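To see why conv == 9 marks the centers, here is a small self-contained sketch (reusing the data array from the question) that only prints the intermediate convolution; every position whose 3x3 neighbourhood consists entirely of 1s sums to exactly 9:

import numpy as np
from scipy.signal import convolve2d

data = np.array([[4,4,4,0,1,1,1,1,1,1,0,0,0,0,1],
                 [3,0,1,0,1,1,1,1,1,1,1,1,1,1,0],
                 [6,0,0,0,1,1,1,1,1,1,1,1,1,1,0],
                 [2,0,0,0,1,1,1,0,1,0,1,1,1,0,0],
                 [2,0,1,0,1,1,1,0,1,0,1,0,1,0,0]])

m = data == 1
kernel = np.ones((3, 3))

# Each entry counts how many cells in the 3x3 neighbourhood are 1
# (cells outside the array count as 0 with the default zero padding).
conv = convolve2d(m, kernel, mode='same')
print(conv)

# A count of 9 means the cell is the center of a full 3x3 block of 1s.
print(np.argwhere(conv == 9))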
The solution proposed below uses skimage.util.view_as_windows in order to get all of the possible 3x3 views onto the larger array, and then loops over them in Python to set the found hits back to ones after zeroing all the ones in a first step. It seems that OpenCV, skimage and numpy don't provide a method able to label the areas in which a 3x3 square can move around having all the values beneath set to 1, but I am not sure here. So if you read this and know much about dilations, convolutions, etc., please leave a note pointing me in the right direction toward a method implemented in C code, for better speed on huge arrays.

# https://stackoverflow.com/questions/73649612/how-to-change-the-particuler-elements-of-an-array
# Example array:
import numpy as np
D_in = np.array([[4,4,4,0,1,1,1,1,1,1,0,0,0,0,1],
                 [3,0,1,0,1,1,1,1,1,1,1,1,1,1,0],
                 [6,0,0,0,1,1,1,1,1,1,1,1,1,1,0],
                 [2,0,0,0,1,1,1,0,1,0,1,1,1,0,0],
                 [2,0,1,0,1,1,1,0,1,0,1,0,1,0,0]])
# Requirement: if at least one 3x3 square of 1's is found in D_in,
# keep the 1's covered by such squares and replace all other 1's with 0's;
# otherwise leave D_in unchanged.
# Expected output:
D_tgt = np.array([[4,4,4,0,1,1,1,1,1,1,0,0,0,0,0],
                  [3,0,0,0,1,1,1,1,1,1,1,1,1,0,0],
                  [6,0,0,0,1,1,1,1,1,1,1,1,1,0,0],
                  [2,0,0,0,1,1,1,0,0,0,1,1,1,0,0],
                  [2,0,0,0,1,1,1,0,0,0,0,0,0,0,0]])
# ----------------------------------------------------------------------
D_out = np.copy(D_in)     # D_out for restoring the hit 3x3 ONEs
D_out[D_out == 1] = 0     # set all ONEs to zero
needle = np.ones((3, 3))  # not necessary here, except documentation

# Create all possible 3x3 needle views on the larger D_in haystack:
from skimage.util import view_as_windows as winview
all3x3 = winview(D_in, (3, 3))                  # the CORE of the algorithm

all3x3_rows, all3x3_cols, needle_rows, needle_cols = all3x3.shape
print(f'{all3x3_rows=}, {all3x3_cols=}, {needle_rows=}, {needle_cols=}')

noOfHits = 0  # also used to decide about the output if no hits are found
for row in range(all3x3_rows):
    for col in range(all3x3_cols):
        if np.all(all3x3[row, col, :, :] == 1):  # the view is a full 3x3 square of 1's
            noOfHits += 1
            # print(f'3x3 Ones at: {row=}, {col=}')
            D_out[row:row+3, col:col+3] = 1

print('--------------------------')
print(D_in)
print('--------------------------')
if noOfHits > 0:
    print(D_out)
    assert (D_out == D_tgt).all()  # make sure the result is correct
else:
    print(D_in)

gives on output:

all3x3_rows=3, all3x3_cols=13, needle_rows=3, needle_cols=3
--------------------------
[[4 4 4 0 1 1 1 1 1 1 0 0 0 0 1]
 [3 0 1 0 1 1 1 1 1 1 1 1 1 1 0]
 [6 0 0 0 1 1 1 1 1 1 1 1 1 1 0]
 [2 0 0 0 1 1 1 0 1 0 1 1 1 0 0]
 [2 0 1 0 1 1 1 0 1 0 1 0 1 0 0]]
--------------------------
[[4 4 4 0 1 1 1 1 1 1 0 0 0 0 0]
 [3 0 0 0 1 1 1 1 1 1 1 1 1 0 0]
 [6 0 0 0 1 1 1 1 1 1 1 1 1 0 0]
 [2 0 0 0 1 1 1 0 0 0 1 1 1 0 0]
 [2 0 0 0 1 1 1 0 0 0 0 0 0 0 0]]

The same output can be achieved with the one-liner from the answer by mozway. Here is some preliminary code:

import numpy as np
data = np.array([[4,4,4,0,1,1,1,1,1,1,0,0,0,0,1],
                 [3,0,1,0,1,1,1,1,1,1,1,1,1,1,0],
                 [6,0,0,0,1,1,1,1,1,1,1,1,1,1,0],
                 [2,0,0,0,1,1,1,0,1,0,1,1,1,0,0],
                 [2,0,1,0,1,1,1,0,1,0,1,0,1,0,0]])
print(data)
print(" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^")
from scipy.signal import convolve2d
from scipy.ndimage import binary_dilation
b = binary_dilation
c = convolve2d
d = data
k = (3, 3)
m = data == 1  # ; print( m.astype(int) )
n = np.ones
s = 'same'
w = np.where

then

out = w(m, b(c(d == 1, n(k), mode=s) == 9, n(k)), d)

and finally

print(out)
Python - Check values in consecutive columns (i.e. across rows)
I have a 6x4 dataframe containing numerical values. I would like to check whether the value in the current column is the same as the value in the next column, i.e. are there any equal values in consecutive columns per row? How do I perform this check as a new column?

import itertools as it
import pandas as pd

x_list = list(set(it.permutations([1, 1, 0, 0])))
x_df = pd.DataFrame(x_list)
x_df.columns = ['one', 'two', 'three', 'four']
If I understood you correctly:

x = x_df.diff(periods=-1, axis=1)
x['four'] = x_df['four'] - x_df['three']
print(x == 0)

Input:

   one  two  three  four
0    1    0      1     0
1    1    1      0     0
2    1    0      0     1
3    0    1      1     0
4    0    1      0     1
5    0    0      1     1

Output:

     one    two  three   four
0  False  False  False  False
1   True  False   True   True
2  False   True  False  False
3  False   True  False  False
4  False  False  False  False
5   True  False   True   True
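An alternative sketch (my own variation, not part of the answer above) that produces the single new column the question asks for, using a column-wise shift; the column name any_consec_equal is just illustrative:

import itertools as it
import pandas as pd

x_list = list(set(it.permutations([1, 1, 0, 0])))
x_df = pd.DataFrame(x_list, columns=['one', 'two', 'three', 'four'])

# Compare every column with the column to its right; the last column is
# compared against NaN, so that comparison is always False.
equal_next = x_df.eq(x_df.shift(-1, axis=1))

# True if any pair of consecutive columns in the row holds equal values.
x_df['any_consec_equal'] = equal_next.iloc[:, :-1].any(axis=1)
print(x_df)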
Count number of consecutive True in column, restart when False
I work with the following column in a pandas df:

A
True
True
True
False
True
True

I want to add a column B that counts the number of consecutive "True" values in A, and I want the count to restart every time a "False" comes up. Desired output:

A      B
True   1
True   2
True   3
False  0
True   1
True   2
Using cumsum, identify the blocks of rows where the value in column A stays True, then group column A by these blocks and calculate the cumulative sum within each block to assign the ordinal numbers:

df['B'] = df['A'].groupby((~df['A']).cumsum()).cumsum()

       A  B
0   True  1
1   True  2
2   True  3
3  False  0
4   True  1
5   True  2
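For illustration, a small self-contained sketch (the DataFrame construction is added here only for reproducibility) that prints the intermediate group ids to show how the blocks are formed:

import pandas as pd

df = pd.DataFrame({'A': [True, True, True, False, True, True]})

# Each False increments the cumulative sum, so every False starts a new
# group that covers it and the run of Trues that follows.
group_ids = (~df['A']).cumsum()
print(group_ids.tolist())  # [0, 0, 0, 1, 1, 1]

# The cumulative sum of the booleans inside each group counts the Trues
# and stays at 0 on the False rows.
df['B'] = df['A'].groupby(group_ids).cumsum()
print(df)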
Using a simple & native approach (for a small code sample it worked fine):

import pandas as pd

df = pd.DataFrame({'A': [True, False, True, True, True, False, True, True]})

class ToNums:
    counter = 0

    @staticmethod
    def convert(bool_val):
        if bool_val:
            ToNums.counter += 1
        else:
            ToNums.counter = 0
        return ToNums.counter

df['B'] = df.A.map(ToNums.convert)
df

       A  B
0   True  1
1  False  0
2   True  1
3   True  2
4   True  3
5  False  0
6   True  1
7   True  2
Here's an example:

v = 0
for i, val in enumerate(df['A']):
    if val:
        df.loc[i, "C"] = v = v + 1
    else:
        df.loc[i, "C"] = v = 0
df.head()

This will give the desired output:

       A  C
0   True  1
1   True  2
2   True  3
3  False  0
4   True  1
You can use a combination of groupby, cumsum, and cumcount:

df['B'] = (df.groupby((df['A'] & ~df['A'].shift(1).fillna(False))  # row is True and the previous row is False (start of a run)
             .cumsum()                                             # make group id
            )
           .cumcount().add(1)                                      # cumulative count within each group
           * df['A']                                               # multiply by 0 where False, by 1 where True
          )

output:

       A  B
0   True  1
1   True  2
2   True  3
3  False  0
4   True  1
5   True  2
In Python, how to shift and fill with specific values for all the shifted rows in a DataFrame?
I have the following dataframe:

import numpy as np
import pandas as pd

y = pd.DataFrame(np.zeros((10, 1), dtype='bool'), columns=['A'])
y.iloc[[3, 5], 0] = True

       A
0  False
1  False
2  False
3   True
4  False
5   True
6  False
7  False
8  False
9  False

And I want to make 'True' the three rows starting from where a 'True' is found in the above dataframe (i.e. the 'True' row itself and the two rows after it). The expected result is shown below.

       A
0  False
1  False
2  False
3   True
4   True
5   True
6   True
7   True
8  False
9  False

I can do that in the following way, but I wonder if there is a smarter way to do so.

y['B'] = y['A'].shift()
y['C'] = y['B'].shift()
y['D'] = y.any(axis=1)
y['A'] = y['D']
y = y['A']

Thank you for the help in advance.
I use forward filling with the parameter limit: first replace False with missing values, then forward fill with limit=2, and finally replace the remaining NaNs with False:

y.A = y.A.replace(False, np.nan).ffill(limit=2).fillna(False)
print(y)

       A
0  False
1  False
2  False
3   True
4   True
5   True
6   True
7   True
8  False
9  False

Another idea with Rolling.apply and any, testing for at least one True per window:

y.A = y.A.rolling(3, min_periods=1).apply(lambda x: x.any()).astype(bool)
Adding a count to prior cell value in Pandas
In Pandas I am looking to add a value in one column 'B' depending on the boolean values in another column 'A': when 'A' is True, start counting at 1, keep adding one on each new line for as long as 'A' stays False, and when 'A' is True again, reset and start counting from 1. I managed to do this with a 'for' loop, but this is very time consuming, so I am wondering if there is a more time-efficient solution. The result should look like this:

Date     A      B
01.2010  False  0
02.2010  True   1
03.2010  False  2
04.2010  False  3
05.2010  True   1
06.2010  False  2
You can use cumsum with groupby and cumcount:

print(df)
    Date      A
0  1.201  False
1  1.201   True
2  1.201  False
3  2.201   True
4  3.201  False
5  4.201  False
6  5.201   True
7  6.201  False

roll = df.A.cumsum()
print(roll)
0    0
1    1
2    1
3    2
4    2
5    2
6    3
7    3
Name: A, dtype: int32

df['B'] = df.groupby(roll).cumcount() + 1
# if the first values are False, the output there is 0
df.loc[roll == 0, 'B'] = 0
print(df)
    Date      A  B
0  1.201  False  0
1  1.201   True  1
2  1.201  False  2
3  2.201   True  1
4  3.201  False  2
5  4.201  False  3
6  5.201   True  1
7  6.201  False  2
Thanks, I got the solution from another post similar to this:

rolling_count = 0

def set_counter(val):
    global rolling_count
    if val == False:
        rolling_count += 1
    else:  # val == True
        rolling_count = 1
    return rolling_count

df['B'] = df['A'].map(set_counter)