Matplotlib shows NaN on X axis [duplicate] - python

This question already has answers here:
Use index in pandas to plot data
(6 answers)
Closed 1 year ago.
I'm learning Python, specifically pandas and Matplotlib at the moment. I have a dataset of Premier League hat-trick scorers and have been using pandas to do some basic analysis. I now want to produce a bar chart based on this data extract. I have been able to create a bar chart, but the X axis shows 'nan' instead of the player names.
My code to extract the data...
import matplotlib.pyplot as plt
import pandas as pd
top10 = df.groupby(['Player'])[['Goals']].count().sort_values(['Goals'],ascending=False).head(10)
I know the result is a pandas DataFrame, because printing the type of 'top10' gives:
<class 'pandas.core.frame.DataFrame'>
This produces the following if printed out...
I tried to create a chart directly from this DataFrame, but got the error message 'KeyError: Player'.
So I made a new DataFrame and plotted that, which partly worked, but it displayed 'nan' on the X axis.
top10df = pd.DataFrame(top10,columns=['Player','Goals'])
top10df.plot(x ='Player', y='Goals', kind='bar')
plt.show()
When I created a DataFrame manually it worked, so I'm unsure where to go from here. I tried googling and searching Stack Overflow with no success. Any ideas, please?

You could plot directly using the groupby results in the following way:
top10.plot(kind='bar', title='Your Title', ylabel='Goals',
           xlabel='Player', figsize=(6, 5))
A dummy example, since you did not supply your data (next time it's best to do so):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'category': list('XYZXY'),
                   'sex': list('mfmff'),
                   'ThisColumnIsNotUsed': range(5,10)})
x = df.groupby('sex').count()
x.plot(kind='bar', ylabel='category', xlabel='sex')
This gives a bar chart with one group of bars for each value of 'sex'.
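As for why the original attempt showed 'nan': after the groupby, 'Player' is the index of top10 rather than a column, so pd.DataFrame(top10, columns=['Player','Goals']) creates a new 'Player' column with nothing in it. A minimal sketch with made-up data (the player names and goal counts are assumptions, not the asker's dataset):

```python
import pandas as pd

# Made-up stand-in for the hat-trick dataset (one row per hat-trick).
df = pd.DataFrame({
    "Player": ["A", "B", "A", "C", "A", "B"],
    "Goals": [3, 3, 4, 3, 3, 3],
})

top10 = (
    df.groupby(["Player"])[["Goals"]]
    .count()
    .sort_values(["Goals"], ascending=False)
    .head(10)
)

# 'Player' is the index here, not a column, which explains the KeyError
# when plotting with x='Player'.
player_is_column = "Player" in top10.columns

# Passing a DataFrame together with columns=... reindexes the columns:
# 'Goals' is kept, and the unknown 'Player' column is filled with NaN.
rebuilt = pd.DataFrame(top10, columns=["Player", "Goals"])
all_nan = rebuilt["Player"].isna().all()

# reset_index() turns the index back into a regular 'Player' column.
fixed = top10.reset_index()
print(fixed)
```

With that, fixed.plot(x='Player', y='Goals', kind='bar') should label the bars with the player names, as should plotting top10 directly as shown above.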

Related

How to create a heatmap of Pandas dataframe in Python

I'm trying to create a heatmap, following the approach in this question:
Making heatmap from pandas DataFrame
My dataframe looks like the following picture:
I tried the following code:
years = ["1860","1870", "1880","1890","1900","1910","1920","1930","1940","1950","1960","1970","1980","1990","2000"]
kantons = ["AG","AI","AR","BE","BL","BS","FR","GE","GL","GR","JU","LU","NE","NW","OW","SG","SH","SO","SZ","TG","TI","UR","VD","VS","ZG","ZH"]
df = pd(abs(dfYears), index=years, columns=kantons)
which raises an exception saying:
"AG" can not be used as float
So I wondered whether I need to drop the index column, which does not seem to be possible.
Any suggestions?
When replicating similar data, you can do:
import pandas as pd
import numpy as np
years = ["1860","1870", "1880","1890","1900","1910","1920","1930","1940","1950","1960","1970","1980","1990","2000"]
kantons = ["AG","AI","AR","BE","BL","BS","FR","GE","GL","GR","JU","LU","NE","NW","OW","SG","SH","SO","SZ","TG","TI","UR","VD","VS","ZG","ZH"]
df = pd.DataFrame(np.random.randint(low=10000, high=200000, size=(15, 26)), index=years, columns=kantons)
df.style.background_gradient(cmap='Reds')
pandas has some built-in styles for the most common visualization needs. The .background_gradient method is a simple way to highlight cells based on their values. The cmap parameter selects the color map, using the names of the Matplotlib colormaps.
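The styled table above renders in a notebook or as HTML. If a regular Matplotlib figure is wanted instead, one possible alternative (a different technique from the Styler approach, sketched here with random data) is to draw the values with imshow and label the ticks from the index and columns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

years = [str(y) for y in range(1860, 2001, 10)]
kantons = ["AG", "AI", "AR", "BE", "BL", "BS", "FR", "GE", "GL", "GR", "JU", "LU", "NE",
           "NW", "OW", "SG", "SH", "SO", "SZ", "TG", "TI", "UR", "VD", "VS", "ZG", "ZH"]

# Random placeholder values, as in the answer above (not the asker's data).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(10000, 200000, size=(len(years), len(kantons))),
                  index=years, columns=kantons)

fig, ax = plt.subplots(figsize=(10, 6))
# One colored cell per DataFrame value; rows are years, columns are cantons.
im = ax.imshow(df.to_numpy(), cmap="Reds", aspect="auto")
ax.set_xticks(range(len(kantons)))
ax.set_xticklabels(kantons)
ax.set_yticks(range(len(years)))
ax.set_yticklabels(years)
fig.colorbar(im, ax=ax)
```

Calling plt.show() or fig.savefig(...) afterwards displays or stores the figure.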

How can I get actual values from pandas df in sns.distplot displayed on x axis [duplicate]

This question already has an answer here:
Prevent scientific notation
(1 answer)
Closed 1 year ago.
I'm trying to create a histogram from data I got as homework.
When I plot it, the values on the x axis (0.0 - 1.0) are different from those in the actual dataset (20,000 - 1,000,000).
How do I get the actual range of values from my data displayed on the x axis of the histogram instead?
My code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('okcupid_profiles.csv')
df = df[df['income'] != -1]
income_histogram = sns.distplot(df['income'], bins=40)
income_histogram
the histogram I've created
Thanks
The values displayed on the x-axis are the same as those in the dataset. In the bottom right corner of the plot there is a 1e6, which means every tick label has to be multiplied by 1e6, for example:
0.1 * 1e6 == 100,000
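If the full numbers should appear on the axis instead of the 1e6 multiplier, one option is to set a custom tick formatter. The sketch below uses a plain Matplotlib histogram and random numbers in the range mentioned in the question (both are assumptions; the original used sns.distplot on the 'income' column):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Random placeholder incomes between 20,000 and 1,000,000.
rng = np.random.default_rng(0)
income = rng.integers(20_000, 1_000_000, size=500)

fig, ax = plt.subplots()
ax.hist(income, bins=40)

# Format every x tick as a full number with thousands separators,
# so no 1e6 multiplier is needed in the corner of the plot.
plain = FuncFormatter(lambda value, pos: f"{value:,.0f}")
ax.xaxis.set_major_formatter(plain)
```

A shorter variant is ax.ticklabel_format(style='plain', axis='x'), which switches off scientific notation without adding the thousands separators.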

How to use two columns in x-axis

I'm using the code below to get Segment and Year on the x-axis and Final_Sales on the y-axis, but it throws an error.
CODE
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline
order = pd.read_excel("Sample.xls", sheet_name = "Orders")
order["Year"] = pd.DatetimeIndex(order["Order Date"]).year
result = order.groupby(["Year", "Segment"]).agg(Final_Sales=("Sales", sum)).reset_index()
bar = plt.bar(x = result["Segment","Year"], height = result["Final_Sales"])
ERROR
Can someone help me correct my code so that I get the output below?
Required Output
Try adding another pair of brackets: result[["Segment","Year"]].
What you wrote tries to retrieve a single column named ("Segment","Year"),
but what you actually want is to retrieve a list of columns: ["Segment","Year"].
There are several problems with your code:
When using several columns to index a DataFrame, you need to pass a list of columns to [] (see the docs), as follows:
result[["Segment","Year"]]
From the figure you provide, it looks like you want to use Year as the hue. Matplotlib's plt.bar doesn't have a hue argument, so you would have to build the grouping manually as described here. Instead, you can use the seaborn library, which you are already importing anyway (see https://seaborn.pydata.org/generated/seaborn.barplot.html):
sns.barplot(x = 'Segment', y = 'Final_Sales', hue = 'Year', data = result)
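To make both points concrete, here is a small sketch with made-up orders (the dates, segments and sales figures are assumptions, since Sample.xls was not provided). It shows the failing tuple lookup, the working list lookup, and a pandas-only way to get one bar per Year within each Segment by pivoting:

```python
import pandas as pd

# Made-up stand-in for the "Orders" sheet.
order = pd.DataFrame({
    "Order Date": pd.to_datetime([
        "2015-01-05", "2015-03-10", "2016-02-01",
        "2016-07-19", "2015-06-30", "2016-11-11",
    ]),
    "Segment": ["Consumer", "Corporate", "Consumer",
                "Corporate", "Consumer", "Corporate"],
    "Sales": [100.0, 250.0, 80.0, 40.0, 60.0, 10.0],
})
order["Year"] = pd.DatetimeIndex(order["Order Date"]).year
result = (
    order.groupby(["Year", "Segment"])
    .agg(Final_Sales=("Sales", "sum"))
    .reset_index()
)

# result["Segment", "Year"] looks for ONE column whose name is the tuple
# ("Segment", "Year"), which does not exist.
try:
    result["Segment", "Year"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list inside the brackets selects both columns.
two_columns = result[["Segment", "Year"]]

# Wide layout: one row per Segment, one column per Year.
wide = result.pivot(index="Segment", columns="Year", values="Final_Sales")
print(wide)
```

Calling wide.plot(kind='bar') on the pivoted frame draws grouped bars with Segment on the x-axis and one bar per Year, which is similar in spirit to the hue argument of sns.barplot.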

Make pandas plot() show xlabel and xvalues

I am using the standard pandas DataFrame.plot() function to plot two columns in a dataframe. For some reason, the x-axis values and the xlabel are not visible! There seem to be no options to turn them on in the function either (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html).
Does someone know what is going on, and how to correct it?
import matplotlib.cm as cm
import pandas as pd
ax1 = df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn);
gives this:
This is a bug in how Jupyter notebooks display pandas scatter plots that show a color scale when Matplotlib is the plotting backend.
@june-skeeter has a solution in the answers that works. Alternatively, pass sharex=False to df.plot.scatter and you don't need to create subplots.
import matplotlib.cm as cm
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
df.plot.scatter(
    x='t',
    y='hlReference',
    c='STEP_STRENGTH',
    cmap=cm.autumn,
    sharex=False
)
See the discussion in this closed pandas issue, which references the above solution in a related SO answer.
Still an issue with pandas v1.1.0. You can track the issue here: https://github.com/pandas-dev/pandas/issues/36064
Create your axes instance first and then pass it as the ax argument to plot():
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
fig,ax1=plt.subplots()
df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn,ax=ax1)

Boxplot a dataframe column by date

I have a pandas dataframe of all tweets about a sporting event. The tweets are arranged by date and now include a polarity rating using the afinn sentiment library.
What I'd like to do is create a matplotlib boxplot chart for each day in the range. Unfortunately, I'm a coding newb, and I'm stuck.
Here is my code:
import numpy
import matplotlib.pyplot as plt
%matplotlib inline
boxplot_maker = lambda x: plt.boxplot()
#creating a function that I will run against a specific column in the df
Event_df["P Score"].map(boxplot_maker)
#using the .map function and my boxplot maker to the column in question
Unfortunately, this doesn't work. I get the following error:
TypeError: boxplot() missing 1 required positional argument: 'x'
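The error arises because plt.boxplot() is called with no data at all, and .map would call the lambda once per tweet rather than once per day. A possible approach, sketched under assumptions, is to group the scores by day and pass one array per day to a single boxplot call. The 'date' column name and the sample values below are hypothetical; only "P Score" and Event_df come from the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up tweets; 'date' is a hypothetical name for the timestamp column.
Event_df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-01", "2020-01-01", "2020-01-01",
        "2020-01-02", "2020-01-02",
        "2020-01-03", "2020-01-03", "2020-01-03",
    ]),
    "P Score": [1.0, -2.0, 3.0, 0.0, 4.0, -1.0, 2.0, 5.0],
})

# Group the polarity scores by calendar day.
grouped = Event_df.groupby(Event_df["date"].dt.date)["P Score"]
labels = [str(day) for day, _ in grouped]
data = [scores.to_numpy() for _, scores in grouped]

# One box per day: boxplot accepts a list of arrays.
fig, ax = plt.subplots()
boxes = ax.boxplot(data)
ax.set_xticklabels(labels, rotation=45)
ax.set_ylabel("P Score")
```

In a notebook with %matplotlib inline the figure is shown after the cell runs; elsewhere, plt.show() or fig.savefig(...) can be used.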
