Issue with x-axis tick labels in matplotlib scatter plot

I'm trying to plot some data and I'm having issues with the x-axis tick labels. Does anyone have a fix for this? Also, is there an easier way to plot this data under certain conditions? For example, I'm looking at poker hands here, and I only want to plot the data for individuals who have over 50 hands (i.e. data points). To do this, I created a new list and filtered out those with Hands < 50. Is there a way of plotting this with pandas without creating a new list?
## For data handling
import pandas as pd
import numpy as np
from pandas import plotting
## For plotting
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.figure import Figure
preflop = pd.read_csv("all_player_preflop_report_tourney.csv", thousands=',')
#preflop['Hands'] = preflop['Hands'].astype(int) if preflop['Hands'] < 20000
preflop['Hands'] = preflop['Hands'].astype(int)
preflop = preflop.rename(columns={'BB/100':'BB_100','Raise First':'RFI','WTSD %': 'WTSD', 'All-In Adj BB/100':'adj_BB_100','Avg PF All-In Equity':'pf_all_in','CC 2Bet PF':'cc_2bet','3Bet PF':'3bet','2Bet PF & Call 3Bet':'2Bet_call_3Bet','Raise & 4Bet+ PF':'rfi_and_4bet+','2Bet PF & Fold':'2bet_and_fold','5Bet+ PF':'5bet+','3Bet PF & Fold':'3bet_and_fold','Call Any PFR':'call_any_pfr','Call Steal':'call_steal', 'Call vs BTN Open':'call_btn_open','CC 3Bet+ PF':'cc_3bet+','Limp Behind':'limp_behind','Raise Limpers':'raise_limpers'})
preflop = preflop.set_index('Player')
preflop_copy = preflop.copy()
preflop_train = preflop_copy.sample(frac = .75, random_state = 250)
preflop_test = preflop_copy.drop(preflop_train.index)
## first make a figure
## this makes a figure that is 8 inches by 8 inches
plt.figure(figsize = (8,8))
preflop_50 = preflop_copy.loc[preflop_copy.Hands > 50]
#preflop_50.plot.scatter(x="RFI", y="BB_100")
plt.scatter(preflop_50.RFI,preflop_50.BB_100)
x = np.arange(0,1,0.1)
plt.xticks(x)
#Figure.align_xlabels(plot)
## Always good practice to label well when
## presenting a figure to others
## place an xlabel
plt.xlabel("RFI", fontsize =16)
## place a ylabel
plt.ylabel("BB/100", fontsize = 16)
## type this to show the plot
plt.show()
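On the filtering part of the question, here is a minimal sketch. It assumes the column names produced by the rename above (Hands, RFI, BB_100) and uses made-up numbers in place of the CSV. A filter such as DataFrame.query can be chained straight into DataFrame.plot.scatter, so no named intermediate object is needed. The tick positions are derived from the plotted range instead of being hard-coded, on the assumption (not confirmed in the question) that the tick problem comes from using a 0 to 1 scale while RFI is stored as a percentage.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the poker report; only the column names come from the question.
rng = np.random.default_rng(250)
preflop = pd.DataFrame({
    "Hands": rng.integers(1, 500, size=200),
    "RFI": rng.uniform(5, 45, size=200),    # assumed to be a percentage
    "BB_100": rng.normal(0, 30, size=200),
})

# Filter and plot in one chain; query() returns only the matching rows.
ax = preflop.query("Hands > 50").plot.scatter(x="RFI", y="BB_100", figsize=(8, 8))

# Place a tick every 5 units across the plotted range instead of hard-coding 0..1.
lo, hi = ax.get_xlim()
ticks = np.arange(np.floor(lo / 5) * 5, hi + 5, 5)
ax.set_xticks(ticks)
ax.set_xlabel("RFI", fontsize=16)
ax.set_ylabel("BB/100", fontsize=16)
```

A boolean mask, preflop[preflop.Hands > 50].plot.scatter(...), is equivalent; query() just reads a little closer to the condition as stated in words.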

Related

Avoiding overlapping plots in seaborn bar plot

I have the following code, in which I am trying to draw a bar plot in seaborn. (This is sample data; both the x and y variables are continuous.)
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
xvar = [1,2,2,3,4,5,6,8]
yvar = [3,6,-4,4,2,0.5,-1,0.5]
year = [2010,2011,2012,2010,2011,2012,2010,2011]
df = pd.DataFrame()
df['xvar'] = xvar
df['yvar']=yvar
df['year']=year
df
sns.set_style('whitegrid')
fig,ax=plt.subplots()
fig.set_size_inches(10,5)
sns.barplot(data=df,x='xvar',y='yvar',hue='year',lw=0,dodge=False)
It results in the following plot:
Two questions here:
I want to plot the two bars at x = 2 side by side rather than overlapped the way they are now.
For the x-labels, the original data has a lot of them. Is there a way to set the xticks to a specific frequency? For instance, in the chart above I only want to see 1, 3 and 6 as x-labels.
Note: If I set dodge=True, the bars become very thin with the original data.

For the first question, get the patches from the bar chart and halve the width of the two overlapping patches, then shift the x position of one of them so the bars sit side by side.
For the second question, set the tick labels from a list, built either with slices or by hand in the required order.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
xvar = [1,2,2,3,4,5,6,8]
yvar = [3,6,-4,4,2,0.5,-1,0.5]
year = [2010,2011,2012,2010,2011,2012,2010,2011]
df = pd.DataFrame({'xvar':xvar,'yvar':yvar,'year':year})
fig,ax = plt.subplots(figsize=(10,5))
sns.set_style('whitegrid')
g = sns.barplot(data=df, x='xvar', y='yvar', hue='year', lw=0, dodge=False)
for idx,patch in enumerate(ax.patches):
    current_width = patch.get_width()
    current_pos = patch.get_x()
    if idx == 8 or idx == 15:
        patch.set_width(current_width/2)
        if idx == 15:
            patch.set_x(current_pos+(current_width/2))
ax.set_xticklabels([1,'',3,'','',6,''])
plt.show()
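The label list for the second point can also be generated rather than typed by hand. This is a minimal sketch, assuming the categories are the unique xvar values from the question and that 1, 3 and 6 are the labels to keep. A plain matplotlib bar call stands in for the seaborn plot, and wanted is a hypothetical name introduced for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = [1, 2, 3, 4, 5, 6, 8]      # unique xvar values, in plotting order
heights = [3, 6, 4, 2, 0.5, -1, 0.5]    # one made-up bar per category
wanted = {1, 3, 6}                      # hypothetical: labels that should stay visible

fig, ax = plt.subplots(figsize=(10, 5))
positions = list(range(len(categories)))  # categorical bars sit at 0, 1, 2, ...
ax.bar(positions, heights)

# Blank out every label that is not in `wanted`; the ticks themselves stay in place.
labels = [str(c) if c in wanted else "" for c in categories]
ax.set_xticks(positions)
ax.set_xticklabels(labels)

# For "every n-th label", an index test gives the same kind of list.
every_third = [str(c) if i % 3 == 0 else "" for i, c in enumerate(categories)]
```

Keeping all ticks and blanking labels preserves the bar positions; dropping ticks entirely would also work if the tick marks are not wanted.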

Convert a histogram plot from a Pandas dataframe to a scatter plot

I have a histogram:
# Let's load the diabetes dataset that ships with scikit-learn.
from sklearn.datasets import load_diabetes
#sklearn gives you the data as a dictionary, so
diabetes = load_diabetes(as_frame=True)
data = diabetes['frame']
import matplotlib.pyplot as plt
%matplotlib inline
bmi_hist = plt.hist(data['bmi'], density=False)
bmi_hist = plt.ylabel("Frequency")
bmi_hist = plt.xlabel("Normalized BMI")
bp_hist = plt.hist(data['bp'], density=False)
bp_hist = plt.ylabel("Frequency")
bp_hist = plt.xlabel("Normalized BP")
This is a histogram for two of the columns in the frame above.
I want to compare these two in a scatter graph. My attempts haven't been quite successful as I know I need an X and a Y to plot.
I thought I would use the same axis as the histogram:
y_bmi = data['bmi'].value_counts() # frequency
x_bmi = data['bmi'] # normalized value
ax1 = df.plot.scatter(x = x_bmi, y= y_bmi, c='DarkBlue')
But this can only be used on a DataFrame, so do I have to copy the values of the bmi column into a new DataFrame, or is there a simpler method?
Any help would be greatly appreciated.
Many Thanks.
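No answer is recorded here, so the following is only one reading of the question. A scatter plot needs one value per row for each axis, so the two columns can be plotted against each other directly and value_counts() is not needed. If the goal is instead to draw the histogram itself as points, np.histogram returns bin counts that can be scattered against the bin centres. The sketch uses random stand-in data in place of load_diabetes; only the column names bmi and bp come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in for the diabetes frame; only the column names are from the question.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "bmi": rng.normal(0, 0.05, size=300),
    "bp": rng.normal(0, 0.05, size=300),
})

fig, (ax_pair, ax_counts) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: compare the two columns row by row (x and y are column names).
data.plot.scatter(x="bmi", y="bp", c="DarkBlue", ax=ax_pair)

# Option 2: the histogram drawn as points (bin centre vs. frequency).
counts, edges = np.histogram(data["bmi"], bins=10)
centres = (edges[:-1] + edges[1:]) / 2
ax_counts.scatter(centres, counts, color="darkblue")
ax_counts.set_xlabel("Normalized BMI")
ax_counts.set_ylabel("Frequency")
```

Note that DataFrame.plot.scatter expects column names for x and y, which is why passing the Series objects themselves, as in the attempt above, does not work.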

Matplotlib Plot time series with different periodicity

I have two DataFrames. One has data for a month; the other has averages for the past quarters. I want to plot the averages in front of the monthly data. How can I do it? Please note that I am trying to plot the averages as dots and the monthly data as a line chart.
So far my best result was achieved with ax1 = ax.twiny(), but it is still not ideal: the data points appear throughout the chart rather than just in front.
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
from matplotlib.ticker import ScalarFormatter, FormatStrFormatter, FuncFormatter
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
date_base = pd.date_range(start='1/1/2018', end='1/30/2018')
df_base = pd.DataFrame(np.random.randn(30,4), columns=list("ABCD"), index=date_base)
date_ext = pd.date_range(start='1/1/2017', end='1/1/2018', freq="Q")
df_ext = pd.DataFrame(np.random.randn(4,4), columns=list("ABCD"), index=date_ext)
def drawChartsPlt(df_base, df_ext):
    fig = plt.figure(figsize=(10,5))
    ax = fig.add_subplot(111)
    number_of_plots = len(df_base.columns)
    LINE_STYLES = ['-', '--', '-.', 'dotted']
    colormap = plt.cm.nipy_spectral
    ax.set_prop_cycle("color", [colormap(i) for i in np.linspace(0,1,number_of_plots)])
    date_base = df_base.index
    date_base = [i.strftime("%Y-%m-%d") for i in date_base]
    q_ends = df_ext.index
    q_ends = [i.strftime("%Y-%m-%d") for i in q_ends]
    date_base.insert(0, "") #to shift xticks so they match chart
    date_base += q_ends
    for i in range(number_of_plots):
        # .ix was removed in pandas 1.0; select the column, then slice by position
        df_base[df_base.columns[i]].iloc[:-3].plot(kind="line", linestyle=LINE_STYLES[i%2], subplots=False, ax=ax)
    #ax.set_xticks(date_base)
    #ax.set_xticklabels(date_base)
    # ax.xaxis.set_major_locator(ticker.MultipleLocator(20))
    ax.xaxis.set_major_locator(ticker.LinearLocator(len(date_base)))
    ax.xaxis.set_major_formatter(plt.FixedFormatter(date_base))
    fig.autofmt_xdate()
    # ax1=ax.twinx()
    ax1=ax.twiny()
    ax1.set_prop_cycle("color", [colormap(i) for i in np.linspace(0,1,number_of_plots)])
    for i in range(len(df_ext.columns)):
        ax1.scatter(x=df_ext.index, y=df_ext[df_ext.columns[i]])
    ax.set_title("Test")
    #plt.minorticks_off())
    ax.minorticks_off()
    #ax1.minorticks_off()
    #ax1.set_xticklabels(date_base)
    #ax1.set_xticklabels(q_ends)
    ax.legend(loc="center left", bbox_to_anchor=(1,0.5))
    ax.xaxis.label.set_size(12)
    plt.xlabel("TEST X Label")
    plt.ylabel("TEST Y Label")
    ax1.set_xlabel("Quarters")
    plt.show()
drawChartsPlt(df_base, df_ext)

The way I ended up coding it was to save the quarterly index of df_ext to a temp variable, overwrite it with dates close to df_base.index using pd.date_range(start=df_base.index[-1], periods=len(df_ext), freq='D'), and then finally set the date labels I need with ax.set_xticklabels(list(date_base)+list(date_ext)).
It looks like it could also be achieved using broken axes, as indicated in "Break // in x axis of matplotlib" and "Python/Matplotlib - Is there a way to make a discontinuous axis?", but I haven't tried that solution.
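The first approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the author's exact code: the data is random, the quarter dates are made up, plot_dates is a hypothetical name, and the parked dates start one day after the last daily observation (rather than on it) so the first point does not sit on top of the line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_base = pd.DataFrame(rng.standard_normal((30, 2)), columns=list("AB"),
                       index=pd.date_range("2018-01-01", periods=30, freq="D"))
quarter_ends = pd.to_datetime(["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"])
df_ext = pd.DataFrame(rng.standard_normal((4, 2)), columns=list("AB"), index=quarter_ends)

# Keep the real quarter dates for labelling, and park the points on the
# days right after the last daily observation.
plot_dates = pd.date_range(start=df_base.index[-1] + pd.Timedelta(days=1),
                           periods=len(df_ext), freq="D")

fig, ax = plt.subplots(figsize=(10, 5))
for col in df_base.columns:
    (line,) = ax.plot(df_base.index, df_base[col], label=col)
    ax.scatter(plot_dates, df_ext[col], color=line.get_color())

# Weekly ticks for the daily data, plus one tick per parked quarter point,
# labelled with the quarter date it really belongs to.
weekly = df_base.index[::7]
tick_positions = mdates.date2num(weekly.append(plot_dates))
tick_labels = [d.strftime("%Y-%m-%d") for d in weekly.append(quarter_ends)]
ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels, rotation=45, ha="right")
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
```

Because everything lives on a single axes, twiny() is not needed, at the cost of an x-axis whose last few tick labels are not to scale.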

Generating a smooth line with Pandas dataframe and Matplotlib

I am trying to generate a smooth line using a dataset that contains time (measured as number of days) and a set of numbers that represent a socioeconomic variable.
Here is a sample of my data:
date, data
726,1.2414
727,1.2414
728,1.2414
729,1.2414
730,1.2414
731,1.2414
732,1.2414
733,1.2414
734,1.2414
735,1.2414
736,1.2414
737,1.804597701
738,1.804597701
739,1.804597701
740,1.804597701
741,1.804597701
742,1.804597701
743,1.804597701
744,1.804597701
745,1.804597701
746,1.804597701
747,1.804597701
748,1.804597701
749,1.804597701
750,1.804597701
751,1.804597701
752,1.793103448
753,1.793103448
754,1.793103448
755,1.793103448
756,1.793103448
757,1.793103448
758,1.793103448
759,1.793103448
760,1.793103448
761,1.793103448
762,1.793103448
763,1.793103448
764,1
765,1
This is my code so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
out_file = "path_to_file/file.csv"
df = pd.read_csv(out_file)
time = df['date']
data = df['data']
ax1 = plt.subplot2grid((4,3),(0,0), colspan = 2, rowspan = 2) # Will be adding other plots
plt.plot(time, data)
plt.yticks(np.arange(1,5,1)) # Include classes 1-4 showing only 1 step changes
plt.gca().invert_yaxis() # Reverse y axis
plt.ylabel('Trend', fontsize = 8, labelpad = 10)
This generates the following plot:
[Image: Test plot]
I have seen posts that answer similar questions (like the ones below), but can't seem to get my code to work. Can anyone suggest an elegant solution?
Generating smooth line graph using matplotlib
Python Matplotlib - Smooth plot line for x-axis with date values
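The question is left open here. One simple option, offered as a sketch and not as an accepted fix, is a centred rolling mean from pandas, which rounds off the steps without needing SciPy. The 7-day window is an arbitrary assumption, and the values below are a rounded stand-in with the same shape as the sample data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Step-shaped stand-in for the CSV: a day number and a value per row.
df = pd.DataFrame({
    "date": np.arange(726, 766),
    "data": [1.2414] * 11 + [1.8046] * 15 + [1.7931] * 12 + [1.0] * 2,
})

# Centred rolling mean; min_periods=1 keeps the ends instead of returning NaN.
window = 7  # assumed width in days, tune to taste
df["smooth"] = df["data"].rolling(window, center=True, min_periods=1).mean()

fig, ax = plt.subplots()
ax.plot(df["date"], df["data"], alpha=0.4, label="raw")
ax.plot(df["date"], df["smooth"], label=f"{window}-day rolling mean")
ax.set_yticks(np.arange(1, 5, 1))   # classes 1-4 in steps of 1
ax.invert_yaxis()                   # reverse the y axis, as in the question
ax.set_ylabel("Trend", fontsize=8, labelpad=10)
ax.legend()
```

A wider window gives a smoother but flatter curve; spline interpolation, as in the linked posts, is the alternative when the line must pass through the original points.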

Set Seaborn PairGrid x-axis with 2 different value ranges

[The resolution is described below.]
I'm trying to create a PairGrid. The x-axis has at least 2 different value ranges, and even when 'cvar' below is plotted by itself the x-axis labels overlap each other.
My question: is there a way to tilt the x-axis labels to be vertical or have fewer x-axis labels so they don't overlap? Is there another way to solve this issue?
====================
import seaborn as sns
import matplotlib.pylab as plt
import pandas as pd
import numpy as np
columns = ['avar', 'bvar', 'cvar']
index = np.arange(10)
df = pd.DataFrame(columns=columns, index = index)
myarray = np.random.random((10, 3))
for val, item in enumerate(myarray):
    df.loc[val] = item  # .ix was removed in pandas 1.0; .loc selects by label
df['cvar'] = [400,450,43567,23000,19030,35607,38900,30202,24332,22322]
fig1 = sns.PairGrid(df, y_vars=['avar'],
x_vars=['bvar', 'cvar'],
palette="GnBu_d")
fig1.map(plt.scatter, s=40, edgecolor="white")
# The fix: Add the following to rotate the x axis.
plt.xticks( rotation= -45 )
=====================
The code above produces this image
Thanks!

I finally figured it out. I added "plt.xticks( rotation= -45 )" to the original code above. More can be found in the Matplotlib documentation.
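One caveat: plt.xticks acts on the current axes only, so in a grid with several panels it rotates the labels of just one of them (here the last panel, which is the one with the large cvar values). A minimal sketch of rotating every panel follows, using plain plt.subplots as a stand-in for the grid. With seaborn, the same loop should work over the grid's axes array (fig1.axes.flat for the PairGrid above); that part is an assumption and is not run against the exact code from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
avar = rng.random(10)
bvar = rng.random(10)
cvar = [400, 450, 43567, 23000, 19030, 35607, 38900, 30202, 24332, 22322]

# Two panels sharing a y axis, mimicking y_vars=['avar'], x_vars=['bvar', 'cvar'].
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 4))
axes[0].scatter(bvar, avar, s=40, edgecolor="white")
axes[1].scatter(cvar, avar, s=40, edgecolor="white")

# Rotate the x tick labels on every panel, not just the current one.
for ax in axes.flat:
    ax.tick_params(axis="x", labelrotation=-45)
```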
