How to plot the string values on the graph in matplotlib? - python

I am working on data segmentation with a dataset of 1200 rows and 17 columns. I want to plot graphs for the entire dataset, by country and population.
When I run the code below, I get this error:
ValueError: could not convert string to float: 'Canada'
The code:
import pandas as pd # for dataframes
import matplotlib.pyplot as plt # for plotting graphs
import seaborn as sns # for plotting graphs
import datetime as dt
data = pd.read_excel("TestData.xls")
plt.figure(1, figsize=(15, 6))
n=0
for x in ['Country', 'Product', 'Sales']:
    n += 1
    plt.subplot(1,3,n)
    plt.subplots_adjust(hspace=0.5, wspace=0.5)
    sns.distplot(data[x], bins=20)
    plt.title('Displot of {}'.format(x))
plt.show()

If you're passing an object of type str as the first argument in the seaborn.distplot() method, make sure that the string is in the format of an integer or float.
You can try:
import pandas as pd # for dataframes
import matplotlib.pyplot as plt # for plotting graphs
import seaborn as sns # for plotting graphs
import datetime as dt
data = pd.read_excel("TestData.xls")
plt.figure(1, figsize=(15, 6))
n=0
for x in ['Country', 'Product', 'Sales']:
    n += 1
    plt.subplot(1,3,n)
    plt.subplots_adjust(hspace=0.5, wspace=0.5)
    a = data[x]
    # data[x] is a whole column (a Series), so check its dtype
    # instead of calling string methods on it
    if pd.api.types.is_numeric_dtype(a):
        sns.distplot(a, bins=20)
    else:
        print(f"{x} is not a numeric column.")
    plt.title('Displot of {}'.format(x))
plt.show()
But do note from the linked documentation:
Warning
This function is deprecated and will be removed in a future version. Please adapt your code to use one of two new functions:
displot(), a figure-level function with a similar flexibility over the kind of plot to draw
histplot(), an axes-level function for plotting histograms, including with kernel density smoothing
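As a rough sketch of one way around the error without distplot(), the loop can branch on each column's dtype: a histogram for numeric columns and a bar chart of value counts for string columns. The small DataFrame below is a hypothetical stand-in for TestData.xls, and plain matplotlib calls are used here as an assumption rather than as the answer's own method.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-in for the contents of TestData.xls
data = pd.DataFrame({
    "Country": ["Canada", "France", "Canada", "Germany", "France", "Canada"],
    "Product": ["A", "B", "A", "C", "B", "B"],
    "Sales": [120.5, 80.0, 95.25, 130.0, 60.75, 110.0],
})

columns = ["Country", "Product", "Sales"]
fig, axes = plt.subplots(1, len(columns), figsize=(15, 6))

for ax, col in zip(axes, columns):
    if is_numeric_dtype(data[col]):
        # Numeric column: a histogram is meaningful
        ax.hist(data[col], bins=20)
        ax.set_title(f"Histogram of {col}")
    else:
        # String column: count how often each value occurs
        counts = data[col].value_counts()
        ax.bar(list(counts.index), counts.to_numpy())
        ax.set_title(f"Counts of {col}")

fig.tight_layout()
```

With real data, replace the hypothetical DataFrame with the result of pd.read_excel and call plt.show() (or fig.savefig) at the end.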

Related

Plot dates on x,y axes of matplotlib 3d graph

I have a large Pandas DataFrame that contains three columns: two different dates and one of measurements (floats). I want to plot a 3D figure (e.g. trisurf, plot_surface, etc.) where the dates are on the x and y axes and the measurement is on the z axis. I tried the suggestions in this post, but they didn't help.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as dates
import datetime
import matplotlib.ticker as ticker
import pandas as pd
df = pd.DataFrame()
df['date1'] = pd.date_range(start='2018-01-05', end='2018-04-15', freq='1D')
df['date2'] = pd.date_range(start='2018-01-19', end='2018-04-29', freq='1D')
df['mydata'] = np.sin(2*np.linspace(-1,1,len(df))) # dummy variable
def format_date(x, pos=None):
    return dates.num2date(x).strftime('%Y-%m-%d') #use FuncFormatter to format dates
plt.figure()
ax = Axes3D(fig,rect=[0,0.1,1,1]) #make room for date labels
ax.plot_trisurf(df.date1, df.date2, df.mydata, cmap=cm.coolwarm, linewidth=0.2)
ax.w_xaxis.set_major_locator(ticker.FixedLocator(some_dates)) # I want all the dates on my xaxis
ax.w_xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
ax.w_yaxis.set_major_locator(ticker.FixedLocator(some_dates))
ax.w_yaxis.set_major_formatter(ticker.FuncFormatter(format_date))
for tl in ax.w_xaxis.get_ticklabels(): # re-create what autofmt_xdate but with w_xaxis
    tl.set_ha('right')
    tl.set_rotation(30)
for tl in ax.w_yaxis.get_ticklabels():
    tl.set_ha('right')
    #tl.set_rotation(30)
ax.set_xlabel('date1')
ax.set_ylabel('date2')
ax.set_zlabel('mydata')
plt.show()
I keep getting the error RuntimeError: Error in qhull Delaunay triangulation calculation: singular input data (exitcode=2); use python verbose option (-v) to see original qhull error. What am I doing wrong and how do I resolve it?
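One possible reading of the qhull message, offered as a diagnostic sketch rather than a confirmed answer: in the dummy data, date2 is always date1 plus a fixed offset, so every (x, y) point lies on a single straight line, and a set of collinear points cannot be triangulated. The check below rebuilds the two date columns from the question and measures this with NumPy only; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same dummy dates as in the question
df = pd.DataFrame()
df["date1"] = pd.date_range(start="2018-01-05", end="2018-04-15", freq="1D")
df["date2"] = pd.date_range(start="2018-01-19", end="2018-04-29", freq="1D")

# Express both columns as day counts from a common origin
origin = df["date1"].min()
x = (df["date1"] - origin).dt.days.to_numpy()
y = (df["date2"] - origin).dt.days.to_numpy()

# If date2 - date1 is constant, the points are collinear
offsets = (df["date2"] - df["date1"]).dt.days.unique()

# Rank of the centered point cloud: 2 for a genuine 2D spread, 1 for a line
points = np.column_stack([x, y]).astype(float)
rank = np.linalg.matrix_rank(points - points.mean(axis=0))

print("distinct offsets (days):", offsets)
print("rank of centered points:", rank)
```

If the rank comes out as 1, a surface plot would need data where date1 and date2 vary independently (for example, a grid of date pairs); whether that matches the real dataset is an assumption to check.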

How to make a distplot for each column in a pandas dataframe

I'm using Seaborn in a Jupyter notebook to plot histograms like this:
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('CTG.csv', sep=',')
sns.distplot(df['LBE'])
I have a list of columns whose values I want to plot as histograms, so I tried plotting one histogram per column:
continous = ['b', 'e', 'LBE', 'LB', 'AC']
for column in continous:
    sns.distplot(df[column])
And I get this result - only one plot with (presumably) all histograms:
My desired result is multiple histograms that look like this (one for each variable):
How can I do this?
Insert plt.figure() before each call to sns.distplot().
Here's an example with plt.figure():
Here's an example without plt.figure():
Complete code:
# imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [6, 2]
%matplotlib inline
# sample time series data
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10,12,size=(300, 4)), columns=list('ABCD'))
datelist = pd.date_range('2014-07-01', periods=300).tolist()  # pd.datetime is no longer available in recent pandas
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0]=0
df=df.cumsum()
# create distplots
for column in df.columns:
    plt.figure() # <==================== here!
    sns.distplot(df[column])
distplot() has been deprecated since seaborn 0.11.0. You can, however, use sns.histplot() to plot histograms for every column of the dataframe (numerical features only) in the following way:
fig, axes = plt.subplots(2,5, figsize=(15, 5))
ax = axes.flatten()
for i, col in enumerate(df.columns):
    sns.histplot(df[col], ax=ax[i]) # histogram call
    ax[i].set_title(col)
    # remove scientific notation for both axes
    ax[i].ticklabel_format(style='plain', axis='both')
fig.tight_layout(w_pad=6, h_pad=4) # change padding
plt.show()
If you specifically want to estimate the probability density function of a continuous random variable using kernel density estimation (mimicking the default behavior of sns.distplot()), add kde=True inside the sns.histplot() call, and you will get curves overlaying the histograms.
Also works when looping with plt.show() inside:
for column in df.columns:
    sns.distplot(df[column])
    plt.show()

Passing datetime-like object to seaborn.lmplot

I am trying to do a plot of values over time using seaborn linear model plot but I get the error
TypeError: invalid type promotion
I have read that it is not possible to plot pandas date objects, but that seems really strange given seaborn requires you pass a pandas DataFrame to the plots.
Below is a simple example. Does anyone know how I can get this to work?
import pandas as pd
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pd.DataFrame({'date':date, 'value': value})
df['date'] = pd.to_datetime(df['date'])
g = sns.lmplot(x="date", y="value", data=df, size = 4, aspect = 1.5)
I am trying to make a plot like this one, which I created in R using ggplot, which is why I want to use sns.lmplot.
You need to convert your dates to floats, then format the x-axis to reinterpret and format the floats into dates.
Here's how I would do this:
import pandas
import seaborn
from matplotlib import pyplot, dates
%matplotlib inline
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pandas.DataFrame({
    'date': pandas.to_datetime(date),   # pandas dates
    'datenum': dates.datestr2num(date), # matplotlib dates
    'value': value
})
@pyplot.FuncFormatter
def fake_dates(x, pos):
    """ Custom formatter to turn floats into e.g., 2016-05-08"""
    return dates.num2date(x).strftime('%Y-%m-%d')
fig, ax = pyplot.subplots()
# just use regplot if you don't need a FacetGrid
seaborn.regplot(x='datenum', y='value', data=df, ax=ax)
# here's the magic:
ax.xaxis.set_major_formatter(fake_dates)
# legible labels
ax.tick_params(labelrotation=45)
I derived a solution from Paul H.'s answer for plotting timestamps in seaborn. I had to adapt it for my data because the original was returning backend error messages.
In my solution, I wrap the fake_dates function in a matplotlib.ticker FuncFormatter when passing it to ax.xaxis.set_major_formatter. This way, one doesn't need to apply pyplot.FuncFormatter to the function beforehand.
Here is my solution:
import pandas
import seaborn
from matplotlib import pyplot, dates
from matplotlib.ticker import FuncFormatter
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pandas.DataFrame({
    'date': pandas.to_datetime(date),   # pandas dates
    'datenum': dates.datestr2num(date), # matplotlib dates
    'value': value
})
def fake_dates(x, pos):
    """ Custom formatter to turn floats into e.g., 2016-05-08"""
    return dates.num2date(x).strftime('%Y-%m-%d')
fig, ax = pyplot.subplots()
# just use regplot if you don't need a FacetGrid
seaborn.regplot(x='datenum', y='value', data=df, ax=ax)
# here's the magic:
ax.xaxis.set_major_formatter(FuncFormatter(fake_dates))
# legible labels
ax.tick_params(labelrotation=45)
fig.tight_layout()
I hope that works.
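For comparison, here is a sketch of the same "dates as floats, then relabel the axis" idea using only matplotlib and NumPy, assuming a plain least-squares line is an acceptable stand-in for the seaborn regression plot. matplotlib.dates.DateFormatter does the float-to-date relabelling in place of the custom fake_dates function; the column name datenum follows the answers above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["1975-12-03", "2008-08-20", "2011-03-16"]),
    "value": [1, 4, 5],
})

# Step 1: convert the dates to matplotlib's float representation
df["datenum"] = mdates.date2num(df["date"])

# Step 2: fit and draw a straight line on the numeric axis
slope, intercept = np.polyfit(df["datenum"], df["value"], deg=1)
fig, ax = plt.subplots()
ax.scatter(df["datenum"], df["value"])
ax.plot(df["datenum"], slope * df["datenum"] + intercept)

# Step 3: tell the axis to render those floats as dates again
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
```

This draws no confidence band, which regplot would add by default, so it is a simplification rather than a drop-in replacement.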

Arranging multiple for loop categorical plots with Seaborn

I am creating multiple categorical plots for data frame df with a for loop:
object_bol = df.dtypes == 'object'
for catplot in df.dtypes[object_bol].index:
    sns.countplot(y=catplot,data=df)
    plt.show()
The output is all the plots one after the other. How do I arrange them in a grid with n columns and m rows (n and m vary depending on the number of object columns in the data frame)?
You would want to extend the example from How do I plot two countplot graphs side by side in seaborn? to more subplots.
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame(np.random.choice(list("abcd"), size=(100,20), p=[.4,.3,.2,.1]))
fig, axes =plt.subplots(5,4, figsize=(10,10), sharex=True)
axes = axes.flatten()
object_bol = df.dtypes == 'object'
for ax, catplot in zip(axes, df.dtypes[object_bol].index):
    sns.countplot(y=catplot, data=df, ax=ax, order=np.unique(df.values))
plt.tight_layout()
plt.show()
You would get something similar without seaborn directly from pandas:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame(np.random.choice(list("abcd"), size=(100,20), p=[.4,.3,.2,.1]))
df.apply(pd.Series.value_counts).plot(kind="barh", subplots=True, layout=(4,5), legend=False)
plt.tight_layout()
plt.show()
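Since the question asks for a grid whose shape depends on the number of categorical columns, here is a sketch that computes the row count from a chosen column count instead of hard-coding 5x4. The sample data, the choice of three plots per row, and the use of plain matplotlib bars in place of sns.countplot are all assumptions for illustration.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: seven categorical columns plus one numeric column
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice(list("abcd"), size=(100, 7)),
                  columns=[f"c{i}" for i in range(7)])
df["number"] = rng.normal(size=100)

# Every non-numeric column is treated as categorical here
cat_cols = df.select_dtypes(exclude="number").columns

n_cols = 3  # assumed number of plots per row
n_rows = math.ceil(len(cat_cols) / n_cols)

fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 3 * n_rows), squeeze=False)
axes = axes.flatten()

for ax, col in zip(axes, cat_cols):
    counts = df[col].value_counts().sort_index()
    ax.barh(list(counts.index), counts.to_numpy())  # horizontal bars, like countplot(y=...)
    ax.set_title(col)

# Hide the cells that have no column to show
for ax in axes[len(cat_cols):]:
    ax.set_visible(False)

fig.tight_layout()
```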

Python Pandas Matplotlib Plot Colored by type value defined in single column

I have data of the following format:
import pandas as ps
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
       'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
       'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=ps.DataFrame(table,columns=['time','data','type'])
I would like to plot data as a function of time connected as a line, but I would like each line to be a separate color for unique types. In this example, the result would be three lines: a data(time) line for each type a, b, and, c. Any guidance is appreciated.
I have been unable to produce a line with this data: pandas.scatter will produce a plot, while pandas.plot will not. I have been experimenting with loops to produce a plot for each type, but I have not found a straightforward way to do this. My data typically has an unknown number of unique 'type' values. Do pandas and/or matplotlib have a way to create this type of plot?
Pandas plotting capabilities will allow you to do this if everything is indexed properly. However, sometimes it's easier to just use matplotlib directly:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
groups = df.groupby('type')
fig, ax = plt.subplots()
for name, group in groups:
    ax.plot(group['time'], group['data'], label=name)
ax.legend(loc='best')
plt.show()
If you'd prefer to use the pandas plotting wrapper, you'll need to override the legend labels:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
df.index = df['time']
groups = df[['data', 'type']].groupby('type')
fig, ax = plt.subplots()
groups.plot(ax=ax, legend=False)
names = [item[0] for item in groups]
ax.legend(ax.lines, names, loc='best')
plt.show()
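One way to read "indexed properly" is to reshape the table so that each type becomes its own column, after which the pandas plotting wrapper labels the lines by itself. The sketch below assumes each (time, type) pair occurs only once, which holds for the sample data; the name wide is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

table = {'time': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
         'data': [1, 1, 2, 2, 2, 1, 2, 3, 4, 5, 1, 2, 2, 2, 3],
         'type': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c']}
df = pd.DataFrame(table, columns=['time', 'data', 'type'])

# Reshape from long to wide: rows are times, columns are types
wide = df.pivot(index='time', columns='type', values='data')

fig, ax = plt.subplots()
wide.plot(ax=ax)  # one line per column, each with its own color and legend entry
ax.set_ylabel('data')
```

This works for any number of unique types, since pivot creates one column per distinct value.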
Just to throw in the seaborn solution.
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.FacetGrid(df, hue="type", height=5)  # the size parameter was renamed to height in seaborn 0.9
g.map(plt.plot, "time", "data")
g.add_legend()
