secondary_y=True changes x axis in pandas

I'm trying to plot two series together in pandas, from different DataFrames.
Both have datetime values on their x axis, so they can be plotted together:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").plot()
plt.plot()
Yields:
All fine, but I need the green graph to have its own scale. So I use the secondary_y argument:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").plot(secondary_y=True)
plt.plot()
This secondary_y creates a problem, as instead of having the desired graph, I have the following:
Any help with this is hugely appreciated.
(Less relevant notes: I'm (evidently) using pandas and Matplotlib, and all this is in an IPython notebook.)
EDIT:
I've since noticed that removing the resample("W") solves the issue. It is still a problem however as the non-resampled data is too noisy to be visible. Being able to plot sampled data with a secondary axis would be hugely helpful.

import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
df.a = df.a*100
fig, ax1 = plt.subplots(1,1)
df.a.plot(ax=ax1, color='blue', label='a')
ax2 = ax1.twinx()
df.b.plot(ax=ax2, color='green', label='b')
ax1.set_ylabel('a')
ax2.set_ylabel('b')
ax1.legend(loc=3)
ax2.legend(loc=0)
plt.show()
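For the question's actual situation (a daily series plus a weekly-resampled one), the same twinx idea might be sketched as below. The data is synthetic and the names close and sentiment are hypothetical stand-ins for amazon_prices.Close and BULL_MINUS_BEAR. Two assumptions are made: that a weekly mean is the aggregation wanted, and that drawing through matplotlib's own ax.plot (rather than the pandas .plot accessor) is an acceptable way to keep both lines on plain datetime x values.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's two series (synthetic data).
days = pd.date_range("2015-01-01", periods=200, freq="D")
rng = np.random.default_rng(0)
close = pd.Series(300 + rng.normal(0, 1, len(days)).cumsum(), index=days)
sentiment = pd.Series(rng.normal(0, 0.2, len(days)), index=days)

# Resample with an explicit aggregation (the weekly mean is an assumption).
weekly = sentiment.resample("W").mean()

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y axis that shares the x axis of ax1

# Draw with matplotlib directly so both lines use plain datetime x values.
ax1.plot(close.index, close.values, color="blue", label="Close")
ax2.plot(weekly.index, weekly.values, color="green", label="weekly mean")

ax1.set_ylabel("Close")
ax2.set_ylabel("BULL_MINUS_BEAR")
ax1.legend(loc="upper left")
ax2.legend(loc="upper right")
fig.autofmt_xdate()
# Call plt.show() here when running outside a notebook.
```

Because ax2 comes from twinx, both lines share one x axis while each keeps its own y scale.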

I had the same issue, always getting a strange plot when I wanted a secondary_y.
I don't know why no-one mentioned this method in this post, but here's how I got it to work, using the same example as cphlewis:
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
ax = df.plot(secondary_y=['b'])
plt.show()
Here's what it'll look like
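If you also want to label both y axes with this approach, the secondary axes that pandas creates can be reached through the right_ax attribute of the returned axes. A minimal sketch on the same kind of random data (the scaling of column a is only there to make the two scales differ):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((15, 2)), columns=["a", "b"])
df["a"] = df["a"] * 100  # give the two columns clearly different scales

# df.plot returns the primary (left) axes; column b goes on the right axis.
ax = df.plot(secondary_y=["b"])
ax.set_ylabel("a")
ax.right_ax.set_ylabel("b")  # axes pandas created for the secondary y axis
# Call plt.show() here when running outside a notebook.
```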

Related

Plot separately different parts of a dataframe

I would like to plot from the seaborn dataset 'tips'.
import seaborn as sns
import pandas as pd
tips = sns.load_dataset("tips")
x1 = tips.loc[(df['time']=='lunch'), 'tip']
x2 = tips.loc[(df['time']=='dinner'),'tip']
x1.plot.kde(color='orange')
x2.plot.kde(color='blue')
plt.show()
I don't know exactly where it's wrong...
Thanks for the help.

Seaborn's sns.kdeplot() supports the hue argument to split the plot between different categories:
import seaborn as sns
import pandas as pd
tips = sns.load_dataset("tips")
sns.kdeplot(data=tips, x='tip', hue='time')
Of course your approach could work too, but there are several problems with your code:
What is df? Shouldn't that be tips?
The category names Lunch and Dinner must be capitalized, as in the data.
You're mixing different indexing techniques. It should be e.g. x1 = tips.tip[tips['time'] == 'Lunch'].
If you want to plot two KDEs in the same diagram, they should be scaled according to sample size. With my approach above, seaborn has done that automatically.
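For reference, the original pandas-only approach with the first three points fixed might look roughly like this. To keep the sketch runnable offline it uses a small synthetic frame as a hypothetical stand-in for the tips data (only the time and tip columns are imitated), and it relies on pandas' plot.kde, which needs SciPy installed.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tips dataset (synthetic values).
rng = np.random.default_rng(1)
tips = pd.DataFrame({
    "time": rng.choice(["Lunch", "Dinner"], size=200),
    "tip": rng.gamma(shape=4.0, scale=0.75, size=200),
})

# One indexing style throughout, with the capitalized category names.
lunch = tips.loc[tips["time"] == "Lunch", "tip"]
dinner = tips.loc[tips["time"] == "Dinner", "tip"]

# Draw both densities on the same axes.
ax = lunch.plot.kde(color="orange", label="Lunch")
dinner.plot.kde(ax=ax, color="blue", label="Dinner")
ax.set_xlabel("tip")
ax.legend()
# Call plt.show() here when running outside a notebook.
```

Note that each curve here is normalized to unit area on its own, so this version does not address the sample-size scaling mentioned in the last point.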

Since you are loading the data from seaborn's built-in datasets, keep in mind that the names are case-sensitive; replace them with the correct ones.
You can plot the cumulative distribution of total_bill for each time as follows:
sns.kdeplot(
data=tips, x="total_bill", hue="time",
cumulative=True, common_norm=False, common_grid=True,
)

Creating multiple plot using for loop from dataframe

I am trying to create a figure which contains 9 subplots (3 x 3). X, and Y axis data is coming from the dataframe using groupby. Here is my code:
fig, axs = plt.subplots(3,3)
for index,cause in enumerate(cause_list):
    df[df['CAT']==cause].groupby('RYQ')['NO_CONSUMERS'].mean().axs[index].plot()
    axs[index].set_title(cause)
plt.show()
However, it does not produce the desired output; in fact, it raises an error. If I remove the axs[index] before plot() and pass it to the plot() call instead, as plot(ax=axs[index]), then it runs and produces nine subplots, but no data is displayed in them (as shown in the figure).
Could anyone point out where I am making the mistake?

You need to flatten axs, otherwise it is a 2D array. You can also pass the ax to the plot function (see the pandas plot documentation). For example:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
cause_list = np.arange(9)
df = pd.DataFrame({'CAT':np.random.choice(cause_list,100),
'RYQ':np.random.choice(['A','B','C'],100),
'NO_CONSUMERS':np.random.normal(0,1,100)})
fig, axs = plt.subplots(3,3,figsize=(8,6))
axs = axs.flatten()
for index,cause in enumerate(cause_list):
    df[df['CAT']==cause].groupby('RYQ')['NO_CONSUMERS'].mean().plot(ax=axs[index])
    axs[index].set_title(cause)
plt.tight_layout()
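As a possible alternative to the explicit loop, pandas can build the grid itself if the grouped means are first reshaped so that each cause becomes a column. This is only a sketch on the same kind of synthetic data as above (seeded here for repeatability); it uses the subplots and layout arguments of DataFrame.plot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of synthetic data as in the example above, seeded for repeatability.
rng = np.random.default_rng(0)
cause_list = np.arange(9)
df = pd.DataFrame({
    "CAT": rng.choice(cause_list, 100),
    "RYQ": rng.choice(["A", "B", "C"], 100),
    "NO_CONSUMERS": rng.normal(0, 1, 100),
})

# Rows: RYQ, columns: one per CAT, values: mean NO_CONSUMERS.
means = df.groupby(["RYQ", "CAT"])["NO_CONSUMERS"].mean().unstack("CAT")

# subplots=True draws each column on its own axes; layout arranges them 3 x 3.
axs = means.plot(subplots=True, layout=(3, 3), figsize=(8, 6), legend=False)
for ax, cause in zip(axs.flatten(), means.columns):
    ax.set_title(f"CAT {cause}")
plt.tight_layout()
# Call plt.show() here when running outside a notebook.
```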

How to Create Double or Stacked Bar Graph Using Matplotlib

How do you create a grouped or stacked bar graph using three sets of value_counts() data from a csv file in Python? I can successfully graph each set individually using this code:
dfwidget1.country.value_counts().plot(kind='bar', color='blue')
I don't know, however, how to get them all to plot onto the same graph. Obviously this, which I've tried, doesn't work:
dfwidget1.country.value_counts().plot(kind='bar', color='blue')
dfwidget2.country.value_counts().plot(kind='bar', color='red')
dfwidget3.country.value_counts().plot(kind='bar', color='yellow')
After researching, I've also tried using groupby(), but without success. If that's the way to go, I'd appreciate affirmation along those lines, and I'll go study up. If there is a simpler way to do it, I'm all ears.
Here's the toy df (saved as widget-by-country.csv):
Here's the code I've tried:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()
df = pd.read_csv('widget-by-country.csv')
dfwidget1 = df[df['category'].str.contains('widget1', na=False)]
dfwidget2 = df[df['category'].str.contains('widget2', na=False)]
dfwidget3 = df[df['category'].str.contains('widget3', na=False)]
dfwidget1.country.value_counts().plot(kind='bar', color='blue')
dfwidget2.country.value_counts().plot(kind='bar', color='red')
dfwidget3.country.value_counts().plot(kind='bar', color='yellow')
This is what I get, which does not show me the full distribution of countries where each widget is made:
I'd really like to see a grouped bar graph showing this data that looks something like this:
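
One way this could be approached, sketched under assumptions: count every (country, category) pair in a single table with pd.crosstab and plot that table, instead of plotting three separate value_counts() results on top of each other. The rows below are hypothetical stand-ins for widget-by-country.csv; only the column names category and country are taken from the code above, and exact category values are assumed rather than matched with str.contains.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for widget-by-country.csv.
df = pd.DataFrame({
    "category": ["widget1", "widget1", "widget2", "widget2",
                 "widget3", "widget3", "widget1", "widget3"],
    "country": ["USA", "China", "USA", "Germany",
                "China", "China", "USA", "Germany"],
})

# One row per country, one column per widget, values are counts.
counts = pd.crosstab(df["country"], df["category"])

# Grouped bars: one cluster per country, one bar per widget.
ax = counts.plot(kind="bar", color=["blue", "red", "yellow"])
ax.set_ylabel("count")
# For stacked bars instead, pass stacked=True to counts.plot.
# Call plt.show() here when running outside a notebook.
```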

Plotting a bar plot for a large amount of data

I have 752 data points that I need to plot.
I plotted the data as a bar plot using the seaborn library in Python, but the resulting graph is very unclear and I cannot analyze anything from it. Is there any way to view this graph more clearly, so that all data points fit and the labels are readable?

The code is as follows:
import seaborn as sns
sns.set_style("whitegrid")
ax = sns.barplot(x="Events", y = "Count" , data = Unique_Complaints)

It is always difficult to visualise so many points. Nihal has rightly pointed out that it is best to use pandas and statistical analysis to extract information from your data. That said, IDEs like Spyder and PyCharm and packages like Bokeh allow interactive plots where you can zoom in on different parts of the plot. Here is an example with PyCharm:
Code:
# Import libraries
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Exponential decay function
x = np.arange(1,10, 0.1)
A = 7000
y = A*np.exp(-x)
# Plot the exponential function
sns.barplot(x = x, y = y)
plt.show()
Figure without magnification
Magnified figure

To view a large amount of data, you can enlarge the figure with matplotlib.pyplot's figure function, like this:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(num=None, figsize=(20,18), dpi=80, facecolor='w', edgecolor='r')
sns.barplot(x="Events", y = "Count" , data = Unique_Complaints)
plt.show()
I use this to view a graph with 49 variables. My code is:
import matplotlib.pyplot as plt
import seaborn as sns
# miss_val_per: presumably the missing-value percentage of each column in df (not defined in this snippet)
plt.figure(num=None, figsize=(20,18), dpi=256, facecolor='w', edgecolor='r')
plt.title("Missing Value Percentage")
sns.barplot(x=miss_val_per, y=df.columns)
plt.show()
Data I am using is:
https://www.kaggle.com/sobhanmoosavi/us-accidents

Just swap x and y, and try increasing the figure size.
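A rough sketch of that idea, combined with the earlier suggestion to reduce the data before plotting: keep only the largest counts and draw them as horizontal bars so the labels have room. The frame below is synthetic, a hypothetical stand-in for Unique_Complaints with the Events and Count columns from the question; the cutoff of 30 rows is an arbitrary assumption, and plain matplotlib barh is used in place of seaborn.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for Unique_Complaints (752 synthetic rows).
rng = np.random.default_rng(2)
Unique_Complaints = pd.DataFrame({
    "Events": [f"event_{i}" for i in range(752)],
    "Count": rng.integers(1, 500, size=752),
})

# Keep the 30 largest counts (arbitrary cutoff), sorted so the longest bar ends up on top.
top = Unique_Complaints.nlargest(30, "Count").sort_values("Count")

# A tall figure with horizontal bars leaves room for one readable label per bar.
fig, ax = plt.subplots(figsize=(8, 10))
ax.barh(top["Events"], top["Count"])
ax.set_xlabel("Count")
ax.set_ylabel("Events")
fig.tight_layout()
# Call plt.show() here when running outside a notebook.
```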

Is it possible to plot a "checkerboard" type plot in python?

I have a data set that has two independent variables and 1 dependent variable. I thought the best way to represent the dataset is by a checkerboard-type plot wherein the color of the cells represent a range of values, like this:
I can't seem to find a code to do this automatically.

You need to use a plotting package to do this. For example, with matplotlib:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
X = 100*np.random.rand(6,6)
fig, ax = plt.subplots()
i = ax.imshow(X, cmap=cm.jet, interpolation='nearest')
fig.colorbar(i)
plt.show()

For those who come across this years later, as I did: what the original poster wants is a heatmap.
The Matplotlib documentation has an example of this.
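For data that starts out as two independent variables and one dependent variable, a heatmap might be built roughly as follows: pivot the long table into a grid, then hand the grid to imshow. The column names x, y and value and the data itself are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical long-format data: two independent variables, one dependent.
rng = np.random.default_rng(3)
records = pd.DataFrame(
    [(x, y, rng.uniform(0, 100)) for x in range(1, 6) for y in ["a", "b", "c", "d"]],
    columns=["x", "y", "value"],
)

# Reshape to a grid: one row per y, one column per x.
grid = records.pivot(index="y", columns="x", values="value")

fig, ax = plt.subplots()
im = ax.imshow(grid.to_numpy(), cmap="viridis", interpolation="nearest")

# Label the cells with the values of the two independent variables.
ax.set_xticks(range(grid.shape[1]))
ax.set_xticklabels([str(c) for c in grid.columns])
ax.set_yticks(range(grid.shape[0]))
ax.set_yticklabels([str(r) for r in grid.index])
fig.colorbar(im, ax=ax, label="value")
# Call plt.show() here when running outside a notebook.
```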
