Markers on seaborn line plot in Python

I'm new here, so I'm posting hyperlinks. My dataframe looks like this:
HR     ICULOS  SepsisLabel  PatientID
100.3  1       0            1
117.0  2       0            1
103.9  3       0            1
104.7  4       0            1
102.0  5       0            1
88.1   6       0            1
Access the whole file here. What I want is to add a marker on the HR graph based on SepsisLabel (see the file). For example, at ICULOS = 249, SepsisLabel changes from 0 to 1, and I want to show on the graph that the sepsis label changed at this point. I was able to calculate the position using this code:
mark = dummy.loc[dummy['SepsisLabel'] == 1, 'ICULOS'].iloc[0]
print("The ICULOS where SepsisLabel changes from 0 to 1 is:", mark)
Output: The ICULOS where SepsisLabel changes from 0 to 1 is: 249
I plotted the graph using this code:
plt.figure(figsize=(15,6))
ax = plt.gca()
ax.set_title("Patient ID = 1")
ax.set_xlabel('ICULOS')
ax.set_ylabel('HR Readings')
sns.lineplot(ax=ax,
             x="ICULOS",
             y="HR",
             data=dummy,
             marker='^',
             markersize=5,
             markeredgewidth=1,
             markeredgecolor='black',
             markevery=mark)
plt.show()
This is what I got: Graph. The marker was supposed to appear only at position 249, but it also appears at position 0. Why is this happening? Can someone help me out?
Thanks.

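Working with markevery can be tricky in this case. When markevery is an integer N, matplotlib draws a marker on every N-th point starting with the first one, so markevery=249 puts markers at positions 0, 249, 498, and so on; that is why a marker also shows up at the start. markevery also counts positions in the plotted data, not ICULOS values, so it strongly depends on there being exactly one entry for each patient and each ICULOS.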
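The difference between an integer and a list for markevery can be seen in a small sketch. It uses plain ax.plot, since sns.lineplot forwards extra keyword arguments such as markevery to matplotlib's Axes.plot. The data is synthetic and only assumed to resemble the question's single-patient frame (one row per ICULOS, starting at 1, with sepsis starting at ICULOS 249).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for one patient: one row per ICULOS, SepsisLabel switches to 1 at ICULOS 249
iculos = np.arange(1, 301)
dummy = pd.DataFrame({
    "HR": 100 + np.random.default_rng(0).normal(0, 5, iculos.size),
    "ICULOS": iculos,
    "SepsisLabel": (iculos >= 249).astype(int),
})

mark = dummy.loc[dummy["SepsisLabel"] == 1, "ICULOS"].iloc[0]  # an ICULOS value (249)

# markevery=N (an int) means "every N-th point, starting with point 0",
# so markevery=mark would put markers at positions 0 and 249 here.
int_positions = np.arange(len(dummy))[::mark]

# A list of 0-based positions marks only those points, so convert the
# ICULOS value into its row position in plotting (sorted) order first.
dummy = dummy.sort_values("ICULOS").reset_index(drop=True)
pos = int(np.flatnonzero(dummy["ICULOS"].to_numpy() == mark)[0])

fig, ax = plt.subplots(figsize=(15, 6))
(line,) = ax.plot(dummy["ICULOS"], dummy["HR"], marker="^", markersize=8,
                  markeredgecolor="black", markevery=[pos])
ax.set_xlabel("ICULOS")
ax.set_ylabel("HR Readings")
```

With seaborn, the same markevery=[pos] can be passed to sns.lineplot in place of markevery=mark.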
Here is an alternative approach, using an explicit scatter plot to draw the marker:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'HR': np.random.randn(200).cumsum() + 60,
                   'ICULOS': np.tile(np.arange(1, 101), 2),
                   'SepsisLabel': np.random.binomial(1, 0.05, 200),  # 0/1 labels
                   'PatientID': np.repeat([1, 2], 100)})
for patient_id in [1, 2]:
    dummy = df[df['PatientID'] == patient_id]
    fig, ax = plt.subplots(figsize=(15, 6))
    ax.set_title(f"Patient ID = {patient_id}")
    ax.set_xlabel('ICULOS')
    ax.set_ylabel('HR Readings')
    sns.lineplot(ax=ax,
                 x="ICULOS",
                 y="HR",
                 data=dummy)
    sepsis = dummy[dummy['SepsisLabel'] == 1]
    if not sepsis.empty:  # mark the first row where SepsisLabel is 1
        x = sepsis["ICULOS"].values[0]
        y = sepsis["HR"].values[0]
        ax.scatter(x=x,
                   y=y,
                   marker='^',
                   s=100,  # s is the marker area in points**2
                   linewidth=1,
                   edgecolor='black',
                   zorder=3)  # draw the marker on top of the line
        ax.text(x, y, str(x) + '\n', ha='center', va='center', color='red')
    plt.show()
For your new question, here is an example of how to convert the 'ICULOS' column to pandas dates. The example uses the date 20210101 to correspond to ICULOS == 1. You probably have a different starting date for each patient.
df_fb = pd.DataFrame()
df_fb['Y'] = df['HR']
df_fb['DS'] = pd.to_datetime('20210101') + pd.to_timedelta(df['ICULOS'] - 1, unit='D')
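For a different starting date per patient, one possible sketch maps each PatientID to its own start date before adding the ICULOS offset. The dates in start_dates are invented placeholders, and unit='D' simply mirrors the example above; if ICULOS actually counts hours, unit='h' would be the analogue.

```python
import pandas as pd

# Hypothetical data: two patients, ICULOS counting from 1 for each
df = pd.DataFrame({
    "HR": [100.3, 117.0, 103.9, 88.0, 90.5],
    "ICULOS": [1, 2, 3, 1, 2],
    "PatientID": [1, 1, 1, 2, 2],
})

# Assumed per-patient start dates (made up for illustration)
start_dates = {1: "2021-01-01", 2: "2021-03-15"}

df_fb = pd.DataFrame()
df_fb["Y"] = df["HR"]
# Start date of each row's patient plus (ICULOS - 1) steps
df_fb["DS"] = (pd.to_datetime(df["PatientID"].map(start_dates))
               + pd.to_timedelta(df["ICULOS"] - 1, unit="D"))
```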

Related

Is it possible to plot a pandas dataframe that includes missing data as a heatmap, with a special color for the missing cells?

I was wondering whether I can get heatmaps of all columns of a pandas dataframe in one window, using a self-made 24x20 square matrix model that I designed to map each block of 480 values of a column (i.e. one cycle) into it, for all cycles. The challenge is that I want to show missing data in a special color that lies outside the color range of the colormap cmap='coolwarm'.

I already tried converting all inf values to NaN with df = df.replace([np.inf, -np.inf], np.nan), and then replacing zeros with df = df.replace(0, np.nan), before calling sns.heatmap(df, vmin=-1, vmax=+1, cmap='coolwarm'). After that I can recognize missing values as white cells (seaborn does not draw NaN cells, so the white background shows through), but this has two problems:
First, if the dataset contains 0, it is also shown in white like missing data, so you can't distinguish between inf/nan and 0 in the columns. Second, you can't even differentiate between nan and inf values!
I also tried passing mask=df.isnull() to sns.heatmap(), so that data is not shown in cells whose mask value is True, but based on this answer GH375 it again covers the 0 values as well. I'm not sure whether the answer here by @Scotty1-, which adds markers and interpolates the values with newdf = newdf.interpolate(), is the right solution for my case.
Is it a good idea to filter missing data by subsetting?
import math
df = df[df['a'].apply(lambda x: math.isnan(x))]
df = df[df['a'] == float('inf')]
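As an alternative to apply(math.isnan), pandas and NumPy have vectorised checks that keep NaN, inf and real zeros apart. Note that df['a'] == float('inf') only catches +inf, while np.isinf catches both signs. The column values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing real zeros, NaN and +/-inf
df = pd.DataFrame({"a": [0.0, 1.5, np.nan, np.inf, -np.inf, 0.0]})

is_nan = df["a"].isna()        # True only for NaN (inf is not treated as missing)
is_inf = np.isinf(df["a"])     # True for +inf and -inf
is_zero = df["a"].eq(0)        # real zeros, kept separate from missing data

nan_rows = df[is_nan]
inf_rows = df[is_inf]
finite_rows = df[np.isfinite(df["a"])]  # values that can go on the colormap
```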
My script follows. However, the for-loop doesn't produce the right output: in each cycle it plots each column 3 times in the combined window. For example, it plots A on the left and then plots A again under the names B and C in the middle and on the right. Next it plots B 3 times instead of once, and finally it plots C 3 times instead of once, filling the left, middle and right positions!
import numpy as np
import pandas as pd
import os
import seaborn as sns
import matplotlib.pyplot as plt
# mkdf, print_df and normalize are the author's own helpers (not shown here)
#extract the parameters and put them in lists based on id_set
df = pd.read_csv(r'D:\SOF.TXT', header=None)  # raw string keeps the backslash literal
id_set = df[df.index % 4 == 0].astype('int').values
a = df[df.index % 4 == 1].values
b = df[df.index % 4 == 2].values
c = df[df.index % 4 == 3].values
data = {'A': a[:,0], 'B': b[:,0], 'C': c[:,0] }
#main_data contains all the data
main_data = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])
#next iteration creates all plots; change the number of cycles
cycles = int(len(main_data)/480)
print(cycles)
for i in main_data:
    os.makedirs(i, exist_ok=True)  # one output folder per column
    min_val = main_data[i].min()
    min_nor = -1
    max_val = main_data[i].max()
    max_nor = 1
    for cycle in range(1): #iterate through all cycles: replace range(1) with range(int(len(main_data)/480))
        count = '{:04}'.format(cycle)
        j = cycle * 480
        ordered_data = mkdf(main_data.iloc[j:j+480][i])
        csv = print_df(ordered_data)
        #Print .csv files containing the matrix of each parameter, named by cycle
        csv.to_csv(f'{i}/{i}{count}.csv', header=None, index=None)
        if 'C' in i:
            min_nor = -40
            max_nor = 150
            #Applying normalization for C between [-40,+150]
            new_value = normalize(main_data.iloc[j:j+480][i].values, min_val, max_val, -40, 150)
            n_cbar_kws = {"ticks":[-40,150,-20,0,25,50,75,100,125]}
        else:
            #Applying normalization for A,B between [-1,+1]
            new_value = normalize(main_data.iloc[j:j+480][i].values, min_val, max_val, -1, 1)
            n_cbar_kws = {"ticks":[-1.0,-0.75,-0.50,-0.25,0.00,0.25,0.50,0.75,1.0]}
        Sections = mkdf(new_value)
        df = print_df(Sections)
        #Plotting parameters by using HeatMap
        plt.figure()
        sns.heatmap(df, vmin=min_nor, vmax=max_nor, cmap ='coolwarm', cbar_kws=n_cbar_kws)
        plt.title(i, fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        #Print .PNG images containing the HeatMap plot of each parameter, named by cycle
        plt.savefig(f'{i}/{i}{count}.png')
        #plotting all columns ['A','B','C'] in one window side by side
        fig, axes = plt.subplots(nrows=1, ncols=3 , figsize=(20,10))
        plt.subplot(131)
        sns.heatmap(df, vmin=-1, vmax=1, cmap ="coolwarm", cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
        fig.axes[-1].set_ylabel('[MPa]', size=20) #cbar_kws={'label': 'Celsius'}
        plt.title('A', fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        plt.subplot(132)
        sns.heatmap(df, vmin=-1, vmax=1, cmap ="coolwarm", cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
        fig.axes[-1].set_ylabel('[MPa]', size=20) #cbar_kws={'label': 'Celsius'}
        #sns.despine(left=True)
        plt.title('B', fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        plt.subplot(133)
        sns.heatmap(df, vmin=-40, vmax=150, cmap ="coolwarm" , cbar=True , cbar_kws={"ticks":[-40,150,-20,0,25,50,75,100,125]})
        fig.axes[-1].set_ylabel('[°C]', size=20) #cbar_kws={'label': 'Celsius'}
        #sns.despine(left=True)
        plt.title('C', fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        plt.suptitle(f'Analysis of data in cycle Nr.: {count}', color='yellow', backgroundcolor='black', fontsize=48, fontweight='bold')
        plt.subplots_adjust(top=0.7, bottom=0.3, left=0.05, right=0.95, hspace=0.2, wspace=0.2)
        #plt.subplot_tool()
        plt.savefig(f'{i}/{i}{i}{count}.png')
        plt.show()
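A likely reason for the repeated panels is that the side-by-side figure is built inside the per-column loop, and all three sns.heatmap calls receive the same df, which at that point holds only the current column's block. A minimal sketch of the alternative, assuming one figure per cycle with each column drawn on its own axis: it uses ax.imshow as a stand-in for sns.heatmap(..., ax=ax), and a row-major reshape plus min-max scaling (normalize_block, a hypothetical helper) in place of the question's unshown mkdf/normalize. The data is random.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for main_data: 480 rows = one cycle, three columns
rng = np.random.default_rng(0)
main_data = pd.DataFrame({"A": rng.normal(2, 1, 480),
                          "B": rng.normal(-3, 1, 480),
                          "C": rng.normal(-340, 5, 480)})

# Colour ranges per column, taken from the question's code
ranges = {"A": (-1, 1), "B": (-1, 1), "C": (-40, 150)}

def normalize_block(values, data_min, data_max, lo, hi):
    """Hypothetical helper: min-max scale to [lo, hi] and reshape 480 values to 24x20."""
    v = np.asarray(values, dtype=float)
    scaled = lo + (v - data_min) * (hi - lo) / (data_max - data_min)
    return scaled.reshape(24, 20)

cycle = 0
j = cycle * 480
count = f"{cycle:04}"
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(20, 10))
# One figure per cycle: each column gets its OWN axis and its OWN data
for ax, col in zip(axes, main_data.columns):
    lo, hi = ranges[col]
    block = normalize_block(main_data[col].iloc[j:j + 480],
                            main_data[col].min(), main_data[col].max(), lo, hi)
    im = ax.imshow(block, cmap="coolwarm", vmin=lo, vmax=hi, aspect="auto")
    fig.colorbar(im, ax=ax)
    ax.set_title(col, loc="left", style="italic")
    ax.axis("off")
fig.suptitle(f"Analysis of data in cycle Nr.: {count}")
```

In the full script this block would sit inside the cycle loop only, not inside the loop over columns.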
My dataframe looks like this:
A B C
0 2.291171 -2.689658 -344.047912
10 2.176816 -4.381186 -335.936524
20 2.291171 -2.589725 -342.544885
30 2.176597 -6.360999 0.000000
40 2.577268 -1.993412 -344.326376
50 9.844076 -2.690917 -346.125859
60 2.061782 -2.889378 -346.375655
Below is an overview of a sample of my dataset from the .TXT file: dataset
If you want to test with missing values, change the last 3 values at the end of the text file to nan/inf, save it and debug it.
7590 7590
0 nan
7.19025828418 nan
-1738.000075 inf
I'd like to visualise a large pandas dataframe with 3 columns, columns=['A','B','C'], as heatmaps in one window. This dataframe has two kinds of values: strings (nan or inf) and floats.
I want the heatmap to show missing-data cells inside the square matrix model in fixed colors, e.g. nan in black and inf in silver or gray, and the rest of the dataframe as a normal heatmap, with the floats on the cmap='coolwarm' scale.
Here is an image of the desired output when there is no nan/inf in the dataset:
I'm looking forward to hearing from anyone who has dealt with these issues.
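One way to get fixed colors for nan and inf while keeping real zeros on the colormap is to build the RGBA image yourself: map finite values through coolwarm, then overwrite the nan and inf cells. This sketch swaps sns.heatmap for matplotlib's imshow; the random 24x20 block, the black/silver choice and the helper name missing_aware_rgba are illustrative assumptions. Because it works on the raw values, no replace(0, np.nan) step is needed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm, colors
from matplotlib.patches import Patch

def missing_aware_rgba(data, vmin, vmax, cmap_name="coolwarm",
                       nan_color="black", inf_color="silver"):
    """Hypothetical helper: colormap for finite values, fixed colors for NaN / inf."""
    data = np.asarray(data, dtype=float)
    norm = colors.Normalize(vmin=vmin, vmax=vmax)
    cmap = plt.get_cmap(cmap_name)
    # nan_to_num only keeps the colormap call happy; those cells are overwritten below
    rgba = cmap(norm(np.nan_to_num(data, nan=0.0, posinf=0.0, neginf=0.0)))
    rgba[np.isnan(data)] = colors.to_rgba(nan_color)
    rgba[np.isinf(data)] = colors.to_rgba(inf_color)
    return rgba, norm, cmap

# Hypothetical 24x20 block (480 values = one cycle) with a few special cells
rng = np.random.default_rng(0)
block = rng.uniform(-1, 1, size=(24, 20))
block[0, 0] = np.nan
block[1, 1] = np.inf
block[2, 2] = 0.0  # a real zero stays on the colormap

rgba, norm, cmap = missing_aware_rgba(block, vmin=-1, vmax=1)

fig, ax = plt.subplots(figsize=(5, 6))
ax.imshow(rgba, aspect="auto", interpolation="nearest")
ax.axis("off")
fig.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax)  # colorbar for finite values only
ax.legend(handles=[Patch(color="black", label="nan"), Patch(color="silver", label="inf")],
          loc="upper left", bbox_to_anchor=(1.3, 1.0))
```

The same helper can be called once per column on a plt.subplots grid to get the three panels in one window.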

Python: draw multiple positive/negative Bar Charts by conditions

This is my first time drawing bar charts in python.
My df op:
   key      descript  score
0  noodles  taste         5
1  noodles  color        -2
2  noodles  health        3
3  apple    color         7
4  apple    hard          9
My code:
import matplotlib.pyplot as plt
op['positive'] = op['score'] > 0
op['score'].plot(kind='barh', color=op.positive.map({True: 'r', False: 'k'}), use_index=True)
plt.savefig('sample1.png')  # save before show(); after the window closes the figure may be blank
plt.show()
Output:
But this is not what I expected. I would like to draw one chart per key (two in this case), using the descriptions as the index, and maybe different colors, like below:
How can I accomplish this?
Try:
#Fix some data issues/typos (before counting the keys)
op['key']=op.key.str.replace('noodels','noodles')
fig, ax = plt.subplots(1,op.key.nunique(), figsize=(15,5), sharex=True)
i = 0
for n, g in op.assign(positive=op['score'] >= 0).groupby('key'):
    g.plot.barh(y='score', x='descript', ax=ax[i], color=g['positive'].map({True:'red',False:'blue'}), legend=False)\
        .set_xlabel(n)
    ax[i].set_ylabel('Score')
    ax[i].spines['top'].set_visible(False)
    ax[i].spines['right'].set_visible(False)
    ax[i].spines['left'].set_position('zero')
    i += 1
Output:
Update: the y-axis labels are now moved as well, thanks to this SO solution by @ImportanceOfBeingErnest.
#Fix some data issues/typos (before counting the keys)
op['key']=op.key.str.replace('noodels','noodles')
fig, ax = plt.subplots(1,op.key.nunique(), figsize=(15,5), sharex=True)
i = 0
for n, g in op.assign(positive=op['score'] >= 0).groupby('key'):
    g.plot.barh(y='score', x='descript', ax=ax[i], color=g['positive'].map({True:'red',False:'blue'}), legend=False)\
        .set_xlabel(n)
    ax[i].set_ylabel('Score')
    ax[i].spines['top'].set_visible(False)
    ax[i].spines['right'].set_visible(False)
    ax[i].spines['left'].set_position('zero')
    # keep the tick labels at the left edge instead of following the moved spine
    plt.setp(ax[i].get_yticklabels(), transform=ax[i].get_yaxis_transform())
    i += 1
Output:
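For reference, a self-contained version of the answer's loop, built from the question's sample data and using zip over the axes instead of a manual counter. squeeze=False is an added assumption so the indexing also works when there is only one key, and the figure is saved with fig.savefig rather than after plt.show(), where the figure may already be closed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# The question's sample data
op = pd.DataFrame({"key": ["noodles", "noodles", "noodles", "apple", "apple"],
                   "descript": ["taste", "color", "health", "color", "hard"],
                   "score": [5, -2, 3, 7, 9]})

groups = op.assign(positive=op["score"] >= 0).groupby("key")
fig, axes = plt.subplots(1, op["key"].nunique(), figsize=(15, 5),
                         sharex=True, squeeze=False)
for ax, (name, g) in zip(axes[0], groups):
    g.plot.barh(y="score", x="descript", ax=ax, legend=False,
                color=g["positive"].map({True: "red", False: "blue"}).tolist())
    ax.set_xlabel(name)
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.spines["left"].set_position("zero")  # draw the y-axis at score 0
    # keep tick labels at the left edge instead of following the moved spine
    plt.setp(ax.get_yticklabels(), transform=ax.get_yaxis_transform())
fig.savefig("sample1.png")
```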

Plot gets shifted when using secondary_y

I want to plot temperature and precipitation from a weather station in the same plot with two y-axes. However, when I try this, one of the plots gets shifted for no apparent reason. This is my code (I have only tried it with two precipitation measurements so far, but you get the idea):
ax = m_prec_ra.plot()
ax2 = m_prec_po.plot(kind='bar',secondary_y=True,ax=ax)
ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
This returns the following plot:
My plot can be found here.
I saw someone asking the same question, but I can't seem to figure out how to manually shift one of my datasets.
Here is my data:
print(m_prec_ra,m_prec_po)
Time
1 0.593436
2 0.532058
3 0.676219
4 1.780795
5 4.956048
6 11.909394
7 17.820051
8 14.225257
9 10.261061
10 2.628336
11 0.240568
12 0.431227
Name: Precipitation (mm), dtype: float64
Time
1 0.704339
2 1.225169
3 1.905223
4 4.156270
5 11.531221
6 22.246230
7 30.133800
8 27.634639
9 20.693056
10 5.282412
11 0.659365
12 0.622562
Name: Precipitation (mm), dtype: float64
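The explanation for this behaviour is found in this Q & A. In short, a pandas bar plot draws its bars at positions 0, 1, ..., n-1 and only labels them with the index values, whereas a line plot uses the index values themselves as x coordinates. With an index that starts at 1, the line therefore ends up one unit to the right of the bars.
Here, the solution is to shift the line one unit to the left, i.e. plot it against an index that starts at 0 instead of 1.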
import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A" : np.arange(1,11),
"B" : np.random.rand(10),
"C" : np.random.rand(10)})
df.set_index("A", inplace=True)
ax = df.plot(y='B', kind = 'bar', legend = False)
df2 = df.reset_index()
df2.plot(ax = ax, secondary_y = True, y = 'B', kind = 'line')
plt.show()
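Applied to the question's own data, the same idea means giving the line a 0-based index so it lines up with the bars. The values are copied from the printed series above; reset_index(drop=True) is just one way to do the shift.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

time = pd.Index(range(1, 13), name="Time")
m_prec_ra = pd.Series([0.593436, 0.532058, 0.676219, 1.780795, 4.956048, 11.909394,
                       17.820051, 14.225257, 10.261061, 2.628336, 0.240568, 0.431227],
                      index=time, name="Precipitation (mm)")
m_prec_po = pd.Series([0.704339, 1.225169, 1.905223, 4.156270, 11.531221, 22.246230,
                       30.133800, 27.634639, 20.693056, 5.282412, 0.659365, 0.622562],
                      index=time, name="Precipitation (mm)")

# Bars are drawn at x = 0..11 and only labelled 1..12,
# so plot the line against 0-based positions as well.
ax = m_prec_ra.reset_index(drop=True).plot()
ax2 = m_prec_po.plot(kind="bar", secondary_y=True, ax=ax)
ax.set_xlabel("Times")
ax.set_ylabel("Left axes label")
ax2.set_ylabel("Right axes label")
```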
What version of pandas are you using for this plotting?
Using pandas 0.23.4 and running this code:
df1 = pd.DataFrame({'Data_1':[1,2,4,8,16,12,8,4,1]})
df2 = pd.DataFrame({'Data_2':[1,2,4,8,16,12,8,4,1]})
ax = df1.plot()
ax2 = df2.plot(kind='bar',secondary_y=True,ax=ax)
ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
I get:
If you want to add sample data we could look at that.

Group multiple plots in one figure in Python

My function returns 28 plots (figures), but I need to group them in one figure. This is my code for generating the 28 plots:
for cat in df.ASS_ASSIGNMENT.unique() :
    a = df.loc[df['ASS_ASSIGNMENT'] == cat]
    dates = a['DATE']
    prediction = a['CSPL_RECEIVED_CALLS']
    plt.plot(dates,prediction)
    plt.ylabel("nmb_app")
    plt.legend([cat.decode('utf-8')],loc='best')
    plt.xlabel(cat.decode('utf-8'))
Use plt.subplots. For example,
import numpy as np
import matplotlib.pyplot as plt
fig, axes = plt.subplots(ncols=7, nrows=4)
for i, ax in enumerate(axes.flatten()):
    x = np.random.randint(-5, 5, 20)
    y = np.random.randint(-5, 5, 20)
    ax.scatter(x, y)
    ax.set_title('Axis {}'.format(i))
plt.tight_layout()
Going a little deeper, as Mauve points out, it depends on whether you want 28 curves on a single plot in a single figure, or 28 individual plots, each with its own axis, all in one figure.
Assuming you have a dataframe, df, with 28 columns, you can put all 28 curves on a single plot in a single figure using plt.subplots like so:
fig1, ax1 = plt.subplots()
df.plot(color=colors, ax=ax1)
plt.legend(ncol=4, loc='best')
If instead you want 28 individual axes all in one figure you can use plt.subplots this way
fig2, axes = plt.subplots(nrows=4, ncols=7)
for i, ax in enumerate(axes.flatten()):
    df[df.columns[i]].plot(color=colors[i], ax=ax)
    ax.set_title(df.columns[i])
Here df looks like
In [114]: df.shape
Out[114]: (15, 28)
In [115]: df.head()
Out[115]:
IYU ZMK DRO UIC DOF ASG DLU \
0 0.970467 1.026171 -0.141261 1.719777 2.344803 2.956578 2.433358
1 7.982833 7.667973 7.907016 7.897172 6.659990 5.623201 6.818639
2 4.608682 4.494827 6.078604 5.634331 4.553364 5.418964 6.079736
3 1.299400 3.235654 3.317892 2.689927 2.575684 4.844506 4.368858
4 10.690242 10.375313 10.062212 9.150162 9.620630 9.164129 8.661847
BO1 JFN S9Q ... X4K ZQG 2TS \
0 2.798409 2.425745 3.563515 ... 7.623710 7.678988 7.044471
1 8.391905 7.242406 8.960973 ... 5.389336 5.083990 5.857414
2 7.631030 7.822071 5.657916 ... 2.884925 2.570883 2.550461
3 6.061272 4.224779 5.709211 ... 4.961713 5.803743 6.008319
4 10.240355 9.792029 8.438934 ... 6.451223 5.072552 6.894701
RS0 P6T FOU LN9 CFG C9D ZG2
0 9.380106 9.654287 8.065816 7.029103 7.701655 6.811254 7.315282
1 3.931037 3.206575 3.728755 2.972959 4.436053 4.906322 4.796217
2 3.784638 2.445668 1.423225 1.506143 0.786983 -0.666565 1.120315
3 5.749563 7.084335 7.992780 6.998563 7.253861 8.845475 9.592453
4 4.581062 5.807435 5.544668 5.249163 6.555792 8.299669 8.036408
and was created by
import pandas as pd
import numpy as np
import string
import random
m = 28
n = 15
def random_data(m, n):
    return np.cumsum(np.random.randn(m*n)).reshape(m, n)
def id_generator(number, size=6, chars=string.ascii_uppercase + string.digits):
    sequence = []
    for n in range(number):
        sequence.append(''.join(random.choice(chars) for _ in range(size)))
    return sequence
df = pd.DataFrame(random_data(n, m), columns=id_generator(number=m, size=3))
The colors were defined as
import seaborn as sns
colors = sns.cubehelix_palette(28, rot=-0.4)
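The question's own loop can be adapted the same way: create the grid once, then give each category its own axis instead of calling plt.plot on the current figure. The dataframe below is invented and only reuses the question's column names; the grid size is derived from the number of categories, and if the categories are bytes (as the .decode('utf-8') calls suggest), they would need decoding for the titles.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with the question's column names: 28 categories, 10 dates each
rng = np.random.default_rng(0)
cats = [f"cat{k}" for k in range(28)]
dates = pd.date_range("2021-01-01", periods=10)
df = pd.DataFrame({
    "ASS_ASSIGNMENT": np.repeat(cats, len(dates)),
    "DATE": np.tile(dates.to_numpy(), len(cats)),
    "CSPL_RECEIVED_CALLS": rng.poisson(20, len(cats) * len(dates)),
})

categories = df["ASS_ASSIGNMENT"].unique()
ncols = 7
nrows = math.ceil(len(categories) / ncols)  # 4 rows for 28 categories
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(21, 12), sharex=True)
flat_axes = axes.flatten()
for ax, cat in zip(flat_axes, categories):
    a = df.loc[df["ASS_ASSIGNMENT"] == cat]
    ax.plot(a["DATE"], a["CSPL_RECEIVED_CALLS"])
    ax.set_title(cat)
    ax.set_ylabel("nmb_app")
# Hide leftover cells when the grid is larger than the number of categories
for ax in flat_axes[len(categories):]:
    ax.set_visible(False)
fig.autofmt_xdate()
fig.tight_layout()
```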

Error when trying to plot multi-colored line in Python

I am unable to plot a variable where the points are coloured by reference to an index. What I ultimately want is the line-segment of each point (connecting to the next point) to be a particular colour. I tried with both Matplotlib and pandas. Each method throws a different error.
Generating a trend-line:
datums = np.linspace(0,10,5)
sinned = np.sin(datums)
plt.plot(sinned)
So now we generate a new column of the labels:
sinned['labels'] = np.where((sinned < 0), 1, 2)
print(sinned)
Which generates our final dataset:
0 labels
0 0.000000 2
1 0.598472 2
2 -0.958924 1
3 0.938000 2
4 -0.544021 1
And now for the plotting attempt:
plt.plot(sinned[0], c = sinned['labels'])
Which results in the error: length of rgba sequence should be either 3 or 4
I also tried setting the labels to be the strings 'r' or 'b', which didn't work either :-/
1 and 2 are not colors; the example below uses 'b' (blue) and 'r' (red) instead. You need to plot each segment separately.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
datums = np.linspace(0,10,5)
sinned = pd.DataFrame(data=np.sin(datums))
sinned['labels'] = np.where(sinned[0] < 0, 'b', 'r')  # one colour per point
fig, ax = plt.subplots()
for s in range(0, len(sinned[0]) - 1):
    # segment from point s to point s + 1, coloured by point s
    x=(sinned.index[s], sinned.index[s + 1])
    y=(sinned[0][s], sinned[0][s + 1])
    ax.plot(x, y, c=sinned['labels'][s])
plt.show()
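The loop of ax.plot calls works; matplotlib also offers LineCollection, which draws all colored segments as one artist. This is a swapped-in alternative to the answer's loop, sketched on the same sine data: segment i runs from point i to point i+1 and takes the color of point i, as in the loop.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

datums = np.linspace(0, 10, 5)
y = np.sin(datums)
x = np.arange(len(y))  # same x positions as the DataFrame index in the loop version

points = np.column_stack([x, y])                        # shape (5, 2)
segments = np.stack([points[:-1], points[1:]], axis=1)  # shape (4, 2, 2): start/end of each segment
seg_colors = np.where(y[:-1] < 0, "b", "r").tolist()    # colour of each segment's starting point

fig, ax = plt.subplots()
lc = LineCollection(segments, colors=seg_colors, linewidths=2)
ax.add_collection(lc)
ax.autoscale_view()  # make sure the view covers the newly added segments
```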
