This question already has answers here:
Plot key count per unique value count in pandas
(3 answers)
Closed 2 years ago.
I am trying to count the number of labels for my multilabel classification, but I cannot get a bar graph of my Labels column to plot. Can anybody help me out? I already tried the code below, but it raises this error:
'DataFrame' object has no attribute 'arange'
As you can see, the Labels column contains multiple labels, and I want to plot a bar graph of them. Please help me out.
i=data.arange(20)
tag_df_sorted.head(20).plot(kind='bar')
plt.title('Frequency of top 20 tags')
plt.xticks(i, tag_df_sorted['Labels'])
plt.xlabel('Tags')
plt.ylabel('Counts')
plt.show()
It seems like you want a histogram.
You can either group by label and count:
tag_df_sorted.groupby('Labels').count().plot()
or use pandas' hist function:
# number of unique values in the column "Labels"
num_labels = len(tag_df_sorted['Labels'].unique())
# plot a histogram with one bin per unique label
hist = tag_df_sorted['Labels'].hist(bins=num_labels)
There is a nice little tutorial on plotting histograms here.
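A side note on the error in the question: arange is a NumPy function (np.arange), not a DataFrame method, which is why data.arange(20) fails. Below is a minimal sketch of counting and plotting the labels without it. It assumes, hypothetically, that each row of Labels holds a comma-separated string of labels; the sample data is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the asker's data: comma-separated labels per row.
data = pd.DataFrame({"Labels": ["cat,dog", "dog", "bird,dog", "cat", "dog,fish"]})

# Split each row into individual labels, give every label its own row, then count.
label_counts = (
    data["Labels"]
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)

# Plot at most the 20 most frequent labels; pandas puts the label names on the x-axis,
# so no manual plt.xticks(...) call is needed.
ax = label_counts.head(20).plot(kind="bar")
ax.set_title("Frequency of top 20 tags")
ax.set_xlabel("Tags")
ax.set_ylabel("Counts")
plt.tight_layout()
```

If every row holds exactly one label, the split/explode steps can be dropped and value_counts() applied to the column directly. Call plt.show() in an interactive session to display the figure.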
This question already has answers here:
How to plot multiple pandas columns
(3 answers)
Plot multiple columns of pandas DataFrame using Seaborn
(2 answers)
How do I create a multiline plot using seaborn?
(3 answers)
Closed 26 days ago.
Newbie to Python so am unsure whether this can be done in one graph or not. I have one DataFrame containing Year, Number of Accidents and Number of Fatalities:
I am trying to generate a line plot that shows x axis = Year, y axis = number of instances per year, and 2 lines showing number of each individual column. Using Seaborn, I can only see a way to map 2 columns and hue. Can anyone please provide any advice on whether this is achievable in either Matplotlib or Seaborn.
Tried using Seaborn but cannot work out how to set up x and y axis as required and show 2 individual columns within that:
sns.lineplot(x=f1_safety['NumberOfFatalities'],y=f1_safety['NumberOfAccidents'].count(), hue = f1_safety['year'].count())
plt.show()
There are at least two ways to accomplish what you want to do here.
The simpler one uses pandas' built-in plotting API. You can plot DataFrames directly when they are already in the correct form. In your case, you need to set the year as the index, and then you can plot right away:
f1_safety.set_index("year").plot()
If you want to use seaborn, you first need to transform the data into the correct format. seaborn takes a single x and a single y, so you cannot pass several y columns directly (like y1, y2 and so on). Instead, you need to transform the data into "long format". Such a table has one id column (here, the year), one column naming the variable, and one column holding the value. It works like this:
f1_long = pd.melt(f1_safety, id_vars="year", value_vars=["NumberOfAccidents", "NumberOfFatalities"])
sns.lineplot(data=f1_long, x="year", y="value", hue="variable")
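To make the "long format" concrete, here is a small sketch of what pd.melt produces. The numbers are invented sample data, not the asker's; only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical sample in the same "wide" shape as the table in the question.
wide = pd.DataFrame({
    "year": [2018, 2019, 2020],
    "NumberOfAccidents": [12, 9, 15],
    "NumberOfFatalities": [1, 0, 2],
})

# Wide -> long: one row per (year, variable) pair.
long_table = pd.melt(
    wide,
    id_vars="year",
    value_vars=["NumberOfAccidents", "NumberOfFatalities"],
)

print(long_table.columns.tolist())  # ['year', 'variable', 'value']
print(len(long_table))              # 6 (3 years x 2 variables)
```

The "variable" column is what gets passed as hue, so seaborn draws one line per original column.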
The plot in both cases looks quite the same:
There are other ways. In particular, in Jupyter you can execute two plot statements in the same cell, and matplotlib will put the plots into the same figure, even cycling through the colors as necessary.
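Outside Jupyter, the same effect can be had by drawing both lines on one explicit Axes. This is only a sketch with made-up numbers; the axis labels are my own wording.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data using the column names from the question.
f1_safety = pd.DataFrame({
    "year": [2018, 2019, 2020],
    "NumberOfAccidents": [12, 9, 15],
    "NumberOfFatalities": [1, 0, 2],
})

fig, ax = plt.subplots()
# Two plot calls on the same Axes; matplotlib cycles the line colors automatically.
for column in ["NumberOfAccidents", "NumberOfFatalities"]:
    ax.plot(f1_safety["year"], f1_safety[column], label=column)

ax.set_xlabel("Year")
ax.set_ylabel("Number of instances per year")
ax.legend()
```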
This question already has answers here:
How to plot and annotate grouped bars in seaborn / matplotlib
(1 answer)
How to plot and annotate a grouped bar chart
(1 answer)
How to add value labels on a bar chart
(7 answers)
Closed 1 year ago.
I have been working on a campus recruitment dataset. The target variable in the dataset is "status", which indicates whether the student is placed or not. I am comparing each variable (e.g. gender) with the target variable (placement status) to find out which variable affects the target the most. To compare two variables, I have been using countplots in seaborn. The plot for the variable "gender" looks like this.
Image showing the sns plot
The code for the sns plot is as follows:
ax = sns.countplot(x="cat_degree_t", hue="status", order=df["cat_degree_t"].value_counts().index, data=df)
abs_values = df["cat_degree_t"].value_counts().values
ax.bar_label(container=ax.containers[0], labels=abs_values)
Now I want to know how I could add values of individual bars in the countplot (not the total value like already written in the figure shown above, but on every individual bar). This would help me find out the percentage of placed and not placed for each category in the variable "gender".
Any help would be really appreciated.
Thanks
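One possible approach, sketched under assumptions: each hue level is drawn as its own bar container, so calling bar_label once per container labels every individual bar. The example below uses plain pandas/matplotlib and invented data rather than the asker's dataset. Axes.bar_label needs matplotlib 3.4 or newer, and the same loop over ax.containers should also apply to the Axes returned by sns.countplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the recruitment data.
df = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "F", "M", "F", "M"],
    "status": ["Placed", "Placed", "Not Placed", "Placed",
               "Not Placed", "Placed", "Placed", "Not Placed"],
})

# Rows: gender, columns: status, cells: number of students.
counts = pd.crosstab(df["gender"], df["status"])

ax = counts.plot(kind="bar", rot=0)

# One container per status value; label all of them, not only containers[0].
bar_texts = []
for container in ax.containers:
    bar_texts.extend(ax.bar_label(container))
ax.set_ylabel("count")
```

For percentages instead of raw counts, the crosstab could be built with normalize="index" before plotting.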
This question already has answers here:
Detect and exclude outliers in a pandas DataFrame
(19 answers)
Closed 1 year ago.
See the violinplot:
Here I'm showing the points to demonstrate that the long tail of the violin is due to a single point. I would like to ignore these outlier points so that I get a more compact violin plot. Can I do that with seaborn when plotting the violin, or do I have to remove them from the distribution myself?
You can do it by filtering out the outliers before passing the data to the plot function,
e.g.
sns.violinplot(y=df[df["Column"] < x]["Column"])
where df is your DataFrame, Column is the name of the column you want to plot, and x is the threshold: values at or above it are treated as outliers and excluded.
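If you would rather not pick the threshold by eye, one common convention (my assumption here, not something seaborn does for you) is Tukey's rule: treat anything above Q3 + 1.5 * IQR as an outlier. A sketch with invented data:

```python
import pandas as pd

# Hypothetical data: values near 10 plus one extreme point.
df = pd.DataFrame({"Column": [9.5, 10.1, 10.4, 9.8, 10.0, 10.2, 9.9, 55.0]})

# Upper fence from the interquartile range (Tukey's rule, an assumed choice).
q1 = df["Column"].quantile(0.25)
q3 = df["Column"].quantile(0.75)
upper = q3 + 1.5 * (q3 - q1)

# Keep only the rows below the fence; the single extreme point is dropped.
filtered = df[df["Column"] < upper]
```

filtered["Column"] can then be passed as y to sns.violinplot exactly as in the one-liner above. A matching lower fence (Q1 - 1.5 * IQR) can be added in the same way if the tail is on the other side.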
This question already has an answer here:
pyplot, why isn't the x-axis showing?
(1 answer)
Closed 3 years ago.
I am trying to plot a graph using matplotlib library.
This is my code:
df = pd.DataFrame()
df = milo_data2.loc[milo_data2['id'] == device]
plt.figure()
plt.title(device)
plt.ylabel('Counter')
plt.plot(df['timestamp'],df['counter'])
The graph looks like
The values on the x-axis are crowded and not readable. (The bold black line is the group of values overlapping each other.) How do I reduce the number of values shown on the x-axis so that I can read some of them and get an estimate?
You can manually set the ticks to display. For instance, you can keep every tenth tick:
ticks = list(df['timestamp'])
plt.xticks(ticks[::10], rotation='vertical')
For more information see documentation
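A self-contained version of the same idea, for trying it out. The data is made up, and I am assuming the timestamps are strings (matplotlib then treats them as categories and labels every one of them, which would explain the crowding).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: 50 readings with string timestamps.
df = pd.DataFrame({
    "timestamp": [f"12:{minute:02d}" for minute in range(50)],
    "counter": range(50),
})

fig, ax = plt.subplots()
ax.plot(df["timestamp"], df["counter"])
ax.set_ylabel("Counter")

# Keep every tenth timestamp as a tick and rotate the labels.
ticks = list(df["timestamp"])[::10]
plt.xticks(ticks, rotation="vertical")
```

If the column holds real datetime values instead, converting it with pd.to_datetime before plotting usually lets matplotlib choose a sensible number of date ticks on its own.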
This question already has answers here:
reducing number of plot ticks
(10 answers)
Pandas: bar plot xtick frequency
(1 answer)
Closed 4 years ago.
I'm having a slightly frustrating situation with pandas/matplotlib (not sure which one's at fault here). If I create a simple random series and plot it as a line chart, the formatting is pretty clean:
test1 = pd.Series(index = pd.date_range(start='2000-06-30', end='2018-06-30', freq='3M'),
data = np.random.rand(73))
test1.plot(kind='line')
Most satisfactory.
If I change the format to bar, though, the x-axis formatting goes to hell in a handbasket:
Matplotlib seems to feel obliged to label every single bar. Is there a logical reason for this or an easy fix?
Thanks in advance.
Matplotlib is trying to use each time as its own bar, and is trying to label each bar appropriately. To change this behaviour, you should change the xticks or the xticklabels. For example, one quick way to just remove the labels entirely is to do something like
ax = test1.plot(kind='bar')
# ax.set_xticks([])  # alternatively, you can manually adjust the ticks
ax.set_xticklabels([])  # or their labels
f = ax.get_figure()
f.show()
which will produce
In the same way, you can reduce the number of ticks, for example by keeping only every nth tick and its label.
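A sketch of the every-nth-tick variant. The step of 8 and the "%Y-%m" label format are arbitrary choices of mine, and I use the "QS" (quarter start) frequency instead of the question's '3M' only because that alias behaves the same across pandas versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Quarterly series similar to the one in the question (values are random).
index = pd.date_range(start="2000-06-30", periods=73, freq="QS")
test1 = pd.Series(np.random.rand(73), index=index)

ax = test1.plot(kind="bar")

# pandas draws bar k at x = k, so the tick positions are plain integers.
step = 8  # hypothetical choice: label every 8th bar, i.e. one label per two years
positions = np.arange(0, len(test1), step)
ax.set_xticks(positions)
ax.set_xticklabels(list(test1.index[positions].strftime("%Y-%m")), rotation=45)
```

Setting both the ticks and the labels matters here: changing only the labels would leave 73 tick positions with mismatched text.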