I have data that describes the makeup of a binary file. Each point of data specifies a start and end range in bytes as well as a type:
[0x046270, 0x057574, "type1"]
[0x057574, 0x05BF20, "type2"]
[0x05BF20, 0x05EF80, "type1"]
[0x05EF80, 0x05F050, "type2"]
I would like to be able to visualize the file by coloring sections and getting something similar to what can be seen in the old Windows disk defragmentation utility.
I have tried using matplotlib's stacked bar chart for this, but I am seeing some issues and think I may be misusing it for this purpose. Is there a name for the type of graph below or any clean way of going about rendering this?
Here is a basic stacked bar chart with 256 sectors. To make it two tiers like the image in the question, you need to add a second axes (ax2) or change the structure of the data. The process is very heavy, so it takes some time to render.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
FAT_No = np.arange(0, pow(2,8))
sector_st = random.choices(['type1','type2','type3','type4'], k=256)
value = [1]*256
before = ['before']*256
df = pd.DataFrame({'before':before,'fat_no':FAT_No, 'sector':sector_st, 'value':value})
df
before fat_no sector value
0 before 0 type1 1
1 before 1 type1 1
2 before 2 type4 1
3 before 3 type2 1
4 before 4 type2 1
... ... ... ... ...
251 before 251 type2 1
252 before 252 type2 1
253 before 253 type3 1
254 before 254 type2 1
255 before 255 type4 1
fig = plt.figure(figsize=(16,3),dpi=144)
ax = fig.add_subplot(111)
color = {'type1':'b','type2':'g','type3':'r','type4':'w'}
for i in range(len(df)):
    ax.barh(df['before'], df['value'].iloc[i], color=color[df['sector'].iloc[i]], left=df['value'].iloc[:i].sum())
plt.show()
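For the byte ranges in the question, a lighter alternative might be matplotlib's broken_barh, which draws a set of horizontal spans in one call. The following is only a sketch under some assumptions: the colors are arbitrary, and the four segments are the ones listed in the question.

```python
import matplotlib.pyplot as plt

# Byte ranges from the question: (start, end, type)
segments = [
    (0x046270, 0x057574, "type1"),
    (0x057574, 0x05BF20, "type2"),
    (0x05BF20, 0x05EF80, "type1"),
    (0x05EF80, 0x05F050, "type2"),
]
# Arbitrary color per type (an assumption for this sketch)
colors = {"type1": "tab:blue", "type2": "tab:green"}

fig, ax = plt.subplots(figsize=(16, 2))
for kind, color in colors.items():
    # broken_barh expects (start, width) pairs, not (start, end)
    spans = [(start, end - start) for start, end, t in segments if t == kind]
    ax.broken_barh(spans, (0, 1), facecolors=color, label=kind)

ax.set_xlim(segments[0][0], segments[-1][1])
ax.set_yticks([])
ax.set_xlabel("byte offset")
ax.legend(loc="upper right")
```

Call plt.show() or fig.savefig(...) afterwards to see the result. One collection is drawn per type, so this should stay fast even with many ranges.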
Consider the following pandas DataFrame:
value  other_value  cluster
1382   2.1          0
10     3.9          1
104    5.9          1
82     -1.1         0
100    0.9          2
1003   0.85         2
232    4.1          0
19     0.6          3
1434   0.3          3
23     1.6          3
Using the seaborn module, I want to display a set of boxplots for each column of values, showing the comparative information per value of the cluster column.
That is, for the above DataFrame, it would show a first graph for the 'value' column with 4 boxplots, one for each cluster value. The second graph would include information for the 'other_value' column also showing 1 boxplot for each cluster.
I want to do the same thing as in this R question, but in Python: Boxplots of different variables by cluster assigned on one graph in ggplot
My code only shows one graph at a time. I would like to get a joint figure with all the graphs, as in the link above:
sns.boxplot(y='value', x='cluster',
            data=df,
            palette="colorblind",
            hue='cluster')
Thanks for the help offered.
Most seaborn functions work best with the data in "long form".
Here is how the code could look:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_html('https://stackoverflow.com/questions/72301993/')[0]
df_long = df.melt(id_vars='cluster', value_vars=df.columns[:-1], var_name='variable', value_name='values')
sns.catplot(kind='box', data=df_long,
            col='variable', y='values', x='cluster', hue='cluster', palette="colorblind", sharey=False, col_wrap=2)
plt.tight_layout()
plt.show()
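To see what "long form" means here, the melt step can be tried on its own. This sketch rebuilds the DataFrame from the question by hand instead of reading it from the web page; the variable names follow the answer above.

```python
import pandas as pd

# The DataFrame from the question, typed in by hand
df = pd.DataFrame({
    "value": [1382, 10, 104, 82, 100, 1003, 232, 19, 1434, 23],
    "other_value": [2.1, 3.9, 5.9, -1.1, 0.9, 0.85, 4.1, 0.6, 0.3, 1.6],
    "cluster": [0, 1, 1, 0, 2, 2, 0, 3, 3, 3],
})

# Wide -> long: one row per (cluster, variable) measurement
df_long = df.melt(id_vars="cluster", var_name="variable", value_name="values")
print(df_long.head())
print(df_long.shape)
```

Each original row becomes two rows, one per measured column, which is the layout that catplot can split into columns via col='variable'.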
I am making a graph to plot the gender count for time series data that looks like the following. Each row represents hourly data for the respective patient.
HR   SBP  DBP  Sepsis  Gender  P_ID
92   120  80   0       0       0
98   115  85   0       0       0
93   125  75   1       1       1
95   130  90   1       1       1
102  120  80   0       0       2
109  115  75   0       0       2
94   135  100  0       0       2
97   100  70   1       1       3
85   120  80   1       1       3
88   115  75   1       1       3
93   125  85   1       1       3
78   130  90   1       0       4
115  140  110  1       0       4
102  120  80   0       1       5
98   140  110  0       1       5
This is my code:
gender = df_n['Gender'].value_counts()
plt.figure(figsize=(7, 6))
ax = gender.plot(kind='bar', rot=0, color="c")
ax.set_title("Bar Graph of Gender", y = 1)
ax.set_xlabel('Gender')
ax.set_ylabel('Number of People')
ax.set_xticklabels(('Male', 'Female'))
for rect in ax.patches:
    y_value = rect.get_height()
    x_value = rect.get_x() + rect.get_width() / 2
    space = 1
    label = format(y_value)
    ax.annotate(label, (x_value, y_value), xytext=(0, space), textcoords="offset points", ha='center', va='bottom')
plt.show()
What happens now is that the code counts the total number of rows per gender (0: Male, 1: Female) and plots that. But I want to plot the number of male and female patients, not the total number of 0s and 1s, because the same patient has multiple rows of data (as per P_ID). In other words: how many patients are male and how many are female?
Can someone help me out? I guess sns.countplot could be used, but I don't know how.
Thanks for helping me out >.<
__________ Update ________________
How can I group the genders by whether they have sepsis (1) or no sepsis (0)?
__________ Update 2 ___________
So, I got the actual total count of males and females, thanks to @Shaido.
In the whole dataset, there are only 2932 septic patients. The rest are non-septic. This is what I got from @JohanC's answer.
Now the problem is that, although there are 2932 septic patients, the graph suggests that only 426 patients (251 male and 175 female) are septic and that the rest are non-septic. This is not true. Please help. Thanks.
I have a working example for selecting the unique IDs. It looks ugly, so there is probably a better way, but it works:
import pandas as pd
# example of data:
data = {'gender': [0, 0, 1, 1, 1, 1, 0, 0], 'id': [1, 1, 2, 2, 3, 3, 4, 4]}
df = pd.DataFrame(data)
# get all unique ids:
ids = set(df.id)
# Go over all id, get first element of gender:
g = [list(df[df['id'] == i]['gender'])[0] for i in ids]
# count genders, the lazy way using pandas, since the rest of the code also assumes a dataframe for plotting:
gender_counts = pd.DataFrame(g).value_counts()
# from here you can use your plot function.
# Or Counter
from collections import Counter
gender_counts = Counter(g)
# You have to create another method for plotting the gender.
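A possibly cleaner variant of the same idea is to keep one row per id with drop_duplicates and then count. This is a sketch on the same toy data as above, and it assumes that every id has exactly one gender.

```python
import pandas as pd

# Same example data as above
data = {'gender': [0, 0, 1, 1, 1, 1, 0, 0], 'id': [1, 1, 2, 2, 3, 3, 4, 4]}
df = pd.DataFrame(data)

# Keep the first row of every id, then count genders over patients
per_patient = df.drop_duplicates(subset='id')
gender_counts = per_patient['gender'].value_counts()
print(gender_counts)
```

The resulting Series can be passed to the plotting code from the question in place of df_n['Gender'].value_counts().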
You can group by 'P_ID' and take the first row for each of them (supposing a 'P_ID' has only one gender and only one sepsis). Then you can call sns.countplot on that dataframe, using gender for x and sepsis for hue (or vice versa). You can rename the values in the columns to show their names in the legend and in the tick labels.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from io import StringIO
data_str = '''
HR|SBP|DBP|Sepsis|Gender|P_ID
92|120|80|0|0|0
98|115|85|0|0|0
93|125|75|1|1|1
95|130|90|1|1|1
102|120|80|0|0|2
109|115|75|0|0|2
94|135|100|0|0|2
97|100|70|1|1|3
85|120|80|1|1|3
88|115|75|1|1|3
93|125|85|1|1|3
78|130|90|1|0|4
115|140|110|1|0|4
102|120|80|0|1|5
98|140|110|0|1|5
'''
df = pd.read_csv(StringIO(data_str), delimiter='|')
# new df: take Sepsis and Gender from the first row for every P_ID
df_per_PID = df.groupby('P_ID')[['Sepsis', 'Gender']].first()
# give names to the values in the columns
df_per_PID = df_per_PID.replace({'Gender': {0: 'Male', 1: 'Female'}, 'Sepsis': {0: 'No sepsis', 1: 'Sepsis'}})
# show counts per Gender and Sepsis
ax = sns.countplot(data=df_per_PID, x='Gender', hue='Sepsis', palette='rocket')
ax.legend(title='') # remove title, as it is clear from the legend items
ax.set_xlabel('')
for bars in ax.containers:
    ax.bar_label(bars)
# ax.margins(y=0.1) # make some extra space for the labels
ax.locator_params(axis='y', integer=True)
sns.despine()
plt.show()
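To check the numbers behind the plot, the same per-patient table can be summarized with pd.crosstab. This is a sketch that uses only the Sepsis, Gender and P_ID columns of the sample data; the other columns are left out for brevity.

```python
import pandas as pd

# Sepsis, Gender and P_ID columns of the sample data above
df = pd.DataFrame({
    "Sepsis": [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    "Gender": [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1],
    "P_ID":   [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5],
})

# One row per patient, then a Gender x Sepsis table of patient counts
per_patient = df.groupby("P_ID")[["Sepsis", "Gender"]].first()
counts = pd.crosstab(per_patient["Gender"], per_patient["Sepsis"])
print(counts)
```

If these counts look wrong on the real dataset, it may be worth checking whether Sepsis is really constant within a P_ID, since first() only looks at the first row of each patient.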
I am using pandas to scrape this site https://www.mapsofworld.com/lat_long/poland-lat-long.html but I am only getting 3 elements. How can I get all the elements from the table?
import numpy as np
import pandas as pd
# for drawing the world map
import folium
# Retrieving latitude and longitude coordinates
info = pd.read_html("https://www.mapsofworld.com/lat_long/poland-lat-long.html",match='Augustow',skiprows=2)
# converting the table data into a DataFrame
coordinates = pd.DataFrame(info[0])
data = coordinates.head()
print(data)
It looks like if you install and use html5lib as your parser it may fix your issues:
df = pd.read_html("https://www.mapsofworld.com/lat_long/poland-lat-long.html",attrs={"class":"tableizer-table"},skiprows=2,flavor="html5lib")
>>>df
[ 0 1 2
0 Locations Latitude Longitude
1 NaN NaN NaN
2 Augustow 53°51'N 23°00'E
3 Auschwitz/Oswiecim 50°02'N 19°11'E
4 Biala Podxlaska 52°04'N 23°06'E
.. ... ... ...
177 Zawiercie 50°30'N 19°24'E
178 Zdunska Wola 51°37'N 18°59'E
179 Zgorzelec 51°10'N 15°0'E
180 Zyrardow 52°3'N 20°28'E
181 Zywiec 49°42'N 19°10'E
[182 rows x 3 columns]]
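Note that the result still has the header text as row 0 and an empty row 1. One possible cleanup is sketched below on a tiny hand-made frame that mimics the output above (only two of the locations are included, and the name raw is just for illustration):

```python
import pandas as pd

# Hand-made stand-in for df[0] from the output above
raw = pd.DataFrame([
    ["Locations", "Latitude", "Longitude"],
    [None, None, None],
    ["Augustow", "53°51'N", "23°00'E"],
    ["Zywiec", "49°42'N", "19°10'E"],
])

# Drop the header row and fully empty rows, then use row 0 as column names
table = raw.iloc[1:].dropna(how="all").reset_index(drop=True)
table.columns = list(raw.iloc[0])
print(table)
```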
I want to display the confidence interval for each bar in my plot, but the intervals do not show up. I have two dataframes, and my plot displays the average of the NUMBER_GIRLS column from each of them.
For example, consider the two dataframes (shown below).
schools_north_df
ID NAME NUMBER_GIRLS
----------------------------
1 SCHOOL_1 32
2 SCHOOL_2 12
3 SCHOOL_3 26
schools_south_df
ID NAME NUMBER_GIRLS
----------------------------
1 SCHOOL_1 56
2 SCHOOL_2 33
3 SCHOOL_3 34
Therefore, I have used this code (shown below) to plot my barplot with the confidence intervals showing for each bar - but when plotting it, the confidence interval does not show up.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
objects = ('North', 'South')
y_pos = np.arange(len(objects))
avg_girls = [schools_north_df['NUMBER_GIRLS'].mean(), schools_south_df['NUMBER_GIRLS'].mean()]
sns.barplot(y_pos, avg_girls, ci=95)
plt.xticks(y_pos, objects)
plt.title('Average Number of Girls')
plt.show()
Could anyone kindly help me and indicate what is wrong with my code? I really need the confidence interval to display on my barplot.
Thank you very much!
If you want seaborn to display the confidence intervals, you need to let seaborn aggregate the data by itself (that is to say, provide the raw data instead of calculating the mean yourself).
I would create a new dataframe with an extra column (region) to indicate whether the data are from the "north" or the "south" and then request seaborn to plot NUMBER_GIRLS vs region:
df = pd.concat([schools_north_df.assign(region='North'), schools_south_df.assign(region='South')])
output:
ID NAME NUMBER_GIRLS region
0 1 SCHOOL_1 32 North
1 2 SCHOOL_2 12 North
2 3 SCHOOL_3 26 North
0 1 SCHOOL_1 56 South
1 2 SCHOOL_2 33 South
2 3 SCHOOL_3 34 South
plot:
sns.barplot(data=df, x='region', y='NUMBER_GIRLS', ci=95)
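To illustrate why the raw data is needed: as far as I understand, the interval is estimated by resampling the observations, which is impossible once only the mean is left. The sketch below is a plain NumPy bootstrap, not seaborn's actual implementation; bootstrap_ci is a hypothetical helper written for this example.

```python
import numpy as np

# NUMBER_GIRLS values from the two dataframes in the question
north = np.array([32, 12, 26])
south = np.array([56, 33, 34])

def bootstrap_ci(values, level=95, n_boot=1000, seed=0):
    """Percentile bootstrap interval for the mean (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Resample with replacement: one row per bootstrap replicate
    samples = rng.choice(values, size=(n_boot, len(values)), replace=True)
    means = samples.mean(axis=1)
    half = (100 - level) / 2
    low, high = np.percentile(means, [half, 100 - half])
    return low, high

for name, values in [("North", north), ("South", south)]:
    low, high = bootstrap_ci(values)
    print(f"{name}: mean={values.mean():.1f}, interval=({low:.1f}, {high:.1f})")

# A single pre-computed mean leaves nothing to resample: zero-width interval
print(bootstrap_ci(np.array([north.mean()])))
```

With only three schools per region the interval is very wide, so the error bars in the plot will be large.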
Below are three columns: VMDensity, correctableCount (servers with correctable errors) and avgVMReboots.
VMDensity correctableCount avgVMReboots
LowDensity 7 5
HighDensity 1 23
LowDensity 5 11
HighDensity 1 23
LowDensity 9 5
HighDensity 1 22
HighDensity 1 22
LowDensity 9 2
LowDensity 9 6
LowDensity 5 3
I tried the following, but I am not sure how to plot the groups in different colors.
import matplotlib.pyplot as plt
import pandas as pd
plt.scatter(df.correctableCount, df.avgVMReboots)
Now I need to generate a scatter plot grouped by VMDensity. The low-density VMs should be in one color and the high-density ones in another.
If I understand you correctly, you do not need to "group" the data: you want to plot all data points regardless. You just want to color them differently. So try something like
plt.scatter(df.correctableCount, df.avgVMReboots, c=df.VMDensity)
You will need to map the df.VMDensity strings to numbers and/or play with scatter's cmap parameter.
See this example from matplotlib's gallery.
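One way to do that mapping is with a dictionary from category to color. This is a sketch using the data from the question; the two colors and the name palette are arbitrary choices for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data from the question
df = pd.DataFrame({
    "VMDensity": ["LowDensity", "HighDensity", "LowDensity", "HighDensity",
                  "LowDensity", "HighDensity", "HighDensity", "LowDensity",
                  "LowDensity", "LowDensity"],
    "correctableCount": [7, 1, 5, 1, 9, 1, 1, 9, 9, 5],
    "avgVMReboots": [5, 23, 11, 23, 5, 22, 22, 2, 6, 3],
})

# Map each density label to a color (arbitrary choice)
palette = {"LowDensity": "tab:blue", "HighDensity": "tab:orange"}
colors = df.VMDensity.map(palette)

fig, ax = plt.subplots()
ax.scatter(df.correctableCount, df.avgVMReboots, c=colors.tolist())
ax.set_xlabel("correctableCount")
ax.set_ylabel("avgVMReboots")
```

Call plt.show() afterwards to display the figure. If a legend is needed, plotting each density in its own scatter call with a label is probably the simpler route.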