Key error while plotting a bar graph using Matplotlib - python

I am running into an issue while trying to plot a bar graph using the matplotlib library.
Please find the sample data below
[Image: sample data]
count_movies_year = n_db.groupby('release_year').agg({'title':'count'}).rename(columns={'title':'no_of_titles'})
count_movies_year.reset_index()
With the code above I grouped the data by release_year, counted the titles in each group, and renamed the resulting column to no_of_titles. I then wanted to plot this as a bar graph with matplotlib, so I wrote the code below:
plt.bar(count_movies_year['release_year'],count_movies_year['no_of_titles'])
plt.xlabel('release_year')
plt.ylabel('no_of_titles')
plt.show()
However, when I run this I get a KeyError: 'release_year'. I am new to Python and matplotlib; can someone explain what exactly is going wrong so that I can avoid it next time?

After a groupby, "release_year" is no longer a column of your DataFrame: it has become the index.
There are several solutions:
Use reset_index as you did, but assign the result back to your variable, since reset_index returns a new DataFrame by default:
count_movies_year = count_movies_year.reset_index()
Or use the inplace parameter:
count_movies_year.reset_index(inplace=True)
Or use the index directly in your plot:
plt.bar(count_movies_year.index, count_movies_year['no_of_titles'])
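The fix can be checked end to end with a small, made-up stand-in for n_db. The real data was only posted as an image, so the titles and years below are hypothetical. This sketch uses the reassignment option and the non-interactive Agg backend so it also runs without a display; in a notebook you would drop the backend line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove it to get a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for n_db (the original data is not available)
n_db = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "release_year": [2019, 2019, 2020, 2021, 2021],
})

count_movies_year = (
    n_db.groupby("release_year")
    .agg({"title": "count"})
    .rename(columns={"title": "no_of_titles"})
)
# After groupby, release_year is the index, so
# count_movies_year["release_year"] would raise KeyError here
index_name = count_movies_year.index.name

# reset_index returns a new DataFrame; assign it back to keep the change
count_movies_year = count_movies_year.reset_index()

bars = plt.bar(count_movies_year["release_year"], count_movies_year["no_of_titles"])
plt.xlabel("release_year")
plt.ylabel("no_of_titles")
```

Another option, not used above, is to keep release_year as the index and pass count_movies_year.index as the x values.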

Related

Creating a bar plot in python

I'm working on a school project that asks me to create a bar plot. I don't understand the function below; can anyone please help?
def get_barplot(f_dict,title):
    """
    ******* CHANGE 2 (50 points) **********
    Shows and saves the Bar Plot
    """
    #Uncomment and fill the blanks
    freq_df = pd.DataFrame(f_dict._______,columns=['key','value']) #converts the dictionary to a dataframe
    bar_plot = ___.barplot(_________________________)
    bar_plot.set(title=title+'_BarPlot',xlabel='Words', ylabel='Count') #Setting title and labels
    plt.xticks(rotation=45) #Rotating each word because of the length of the words
    plt.show()
    bar_plot.figure.savefig(title+'_barplot.png',bbox_inches='tight') #saving the file
This is the code. Can anyone tell me what I should write in the blanks? I've spent the last hour trying to understand it, and the different approaches I tried didn't work.
It is always useful to look at the API documentation when trying to understand the library functions.
Blank 1: In the first line of your code you are trying to create a pandas DataFrame from a dictionary. The first argument of pd.DataFrame is the data (see pandas.DataFrame), which in this case is the items of your dictionary, i.e. f_dict.items(). The columns parameter is a clue here: "key" and "value" are the two parts of each dictionary item.
Blanks 2 and 3: I assume you are using Seaborn, which has a .barplot method (see seaborn.barplot), and that it has been imported with the alias sns. Seaborn's .barplot method takes a data frame as its first argument, which here is the data frame you created in the first line of your code. To put the words on the x-axis and their counts on the y-axis, also name those columns: sns.barplot(data=freq_df, x='key', y='value').
First, you must pass the DataFrame constructor the dictionary's items, not the dictionary itself:
freq_df = pd.DataFrame(f_dict.items(),columns=['key','value'])
Next, you need to create the bar plot. Pandas has its own, slightly different method for this (.plot.bar()); the .barplot in your template comes from the seaborn library.
As I understand it, you need a bar plot of the word frequencies: one bar per word ('key') with its count ('value') as the bar height. The following code does this:
bar_plot = sns.barplot(x='key', y='value', data=freq_df)
And make sure you import the seaborn library. The abbreviation sns is usually used for it:
import seaborn as sns
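Putting both answers together, the completed function might look like the sketch below. It assumes f_dict maps each word to its count (as the 'Words'/'Count' labels suggest) and that seaborn is installed. Two small departures from the template are labeled in comments: the file is saved before plt.show() so saving does not depend on the figure still being open, and the function returns the axes so the result can be inspected. The word_counts dictionary is hypothetical example data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def get_barplot(f_dict, title):
    """Show and save a bar plot of the word counts in f_dict."""
    # Blank 1: each (key, value) item becomes one row with two columns
    freq_df = pd.DataFrame(list(f_dict.items()), columns=["key", "value"])
    # Blanks 2 and 3: one bar per word, bar height = count
    bar_plot = sns.barplot(data=freq_df, x="key", y="value")
    bar_plot.set(title=title + "_BarPlot", xlabel="Words", ylabel="Count")
    plt.xticks(rotation=45)  # rotate labels so long words do not overlap
    # Changed order (not in the template): save first, then show
    bar_plot.figure.savefig(title + "_barplot.png", bbox_inches="tight")
    plt.show()
    return bar_plot  # added (not in the template) so callers can inspect the axes


word_counts = {"apple": 3, "banana": 5, "cherry": 2}  # hypothetical data
ax = get_barplot(word_counts, "fruits")
```

Running it writes fruits_barplot.png to the current directory.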

How to resolve NameError: name 'Country' is not defined with seaborn.barplot

I am working on data visualization. I am trying to use the seaborn library in my Python code, but every time I try to plot a bar chart I get a NameError for the column I choose for the x or y axis.
[Image: screenshot of the code and the NameError]
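You passed hue=Country without quotes, so Python treats Country as a variable name, and no such variable is defined. Column names must be passed as strings (the same applies to x and y):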
sns.barplot(..., hue="Country")
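A minimal sketch of the difference, using a small hypothetical DataFrame since the original data was only shown in a screenshot. The unquoted name fails with NameError before seaborn is even called, while the quoted name is looked up as a column of df.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

# Hypothetical data standing in for the original DataFrame
df = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021],
    "Country": ["France", "Japan", "France", "Japan"],
    "Value": [10, 7, 12, 9],
})

# Unquoted: Python evaluates the variable Country first, which does not exist
try:
    sns.barplot(data=df, x="Year", y="Value", hue=Country)  # noqa: F821
except NameError as err:
    error_message = str(err)

# Quoted: seaborn looks up "Year", "Value" and "Country" as columns of df
ax = sns.barplot(data=df, x="Year", y="Value", hue="Country")
```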

Why doesn't Seaborn relplot print datetime values on the x-axis?

I'm working on a Kaggle competition to deepen my data science knowledge, and I'm stuck on an issue with the seaborn library. I'm trying to plot a feature over time, but the relplot function does not print the datetime values: on the x-axis I see a big black box instead of dates.
Here is my plotting code:
rainfall_types = list(auser.iloc[:, 1:])
grid = sns.relplot(x='Date', y=rainfall_types[0], kind="line", data=auser);
grid.fig.autofmt_xdate()
[Image: seaborn relplot output and the head of the dataset]
I found the error. When you use pandas.read_csv(dataset), datetime columns are not parsed automatically: they are stored as plain strings (object dtype). When you plot them, matplotlib treats every distinct string as a separate categorical tick label, so with many dates the labels overlap into a solid black block.
To avoid this, convert the column to datetime objects when reading the file:
df = pandas.read_csv(dataset, parse_dates=['Column_Date'])
This tells pandas that the column 'Column_Date' contains dates and should be converted to datetime objects.
If you want, you can also use the date column as the index of your DataFrame, which makes analysis along time easier. To do so, add the argument index_col='Column_Date' to your read_csv call.
I hope you will find it helpful.
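A minimal sketch of the effect of parse_dates, reading a tiny hypothetical CSV from memory instead of the Kaggle file (the column names Date and Rainfall are assumptions). It also shows pd.to_datetime, which converts a column of a DataFrame that has already been loaded.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for the real dataset
csv_text = """Date,Rainfall
2020-01-01,0.0
2020-01-02,3.2
2020-01-03,1.5
"""

# Default: the Date column holds strings, which plot as categorical labels
raw = pd.read_csv(io.StringIO(csv_text))

# parse_dates converts the listed columns to datetime while reading
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Optionally also make the date column the index
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# For a DataFrame that is already loaded, convert the column afterwards
converted = pd.to_datetime(raw["Date"])
```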

Plotting Unsorted Dataframes with Plotly Scatter Plots

Whenever I plot data with the plotly Python library (in this case from a Modeanalytics dataframe), it connects out-of-order data points to each other, which makes the plot a mess.
If I sort my data in the SQL query that generates the dataframe, the plot looks great!
However, I want to sort the data in Python, not in SQL.
I attempted to take the out-of-order dataframe and do this:
df.sort_values(by=['time'])
but it still resulted in the messy plot.
How can I sort my data frame in python such that it is plotted correctly?
By default sort_values() returns a new dataframe without modifying the original.
You can either set the inplace flag to True or assign the output back to the original dataframe.
Try:
df = df.sort_values(by=['time'])
Or
df.sort_values(by=['time'], inplace=True)
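The pitfall is easy to reproduce with pandas alone. In this sketch the time and value columns are hypothetical: the bare sort_values call leaves df untouched, while the reassigned version is sorted. Passing the reassigned df to plotly (for example plotly.express.line(df, x="time", y="value")) should then connect the points in time order.

```python
import pandas as pd

# Hypothetical out-of-order data, as an unsorted query might return it
df = pd.DataFrame({"time": [3, 1, 2], "value": [30, 10, 20]})

df.sort_values(by=["time"])         # returns a sorted copy that is discarded
unchanged = df["time"].tolist()     # df itself is still in the original order

df = df.sort_values(by=["time"])    # keep the sorted result
sorted_times = df["time"].tolist()
```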

Pandas plot sum of occurrence of string

Hi, hoping someone can help. I have a data frame where one of the columns contains names. Some names are repeated and some are not. I am trying to plot a graph with the names on the x-axis and the number of times each name appears in the column on the y-axis.
I have used the following to count the number of times each name appears:
df.groupby('name').name.count()
Then I tried the following to plot the graph, but I get a KeyError message:
df.plot.bar(x='name', y=df.groupby('name').name.count())
Anyone able to tell me what I am doing wrong?
Thanks
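DataFrame.plot.bar expects column labels for x and y, not a Series of counts, which is why you get the KeyError. I believe you need to plot the Series returned by count directly with Series.plot.bar: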
df.groupby('name').name.count().plot.bar()
Or use value_counts:
df['name'].value_counts().plot.bar()
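Both answers produce the same counts and differ only in ordering: groupby sorts by name, while value_counts sorts from most to least frequent. A minimal sketch with hypothetical names, selecting the column with ["name"] instead of attribute access:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical data: some names repeat, some appear once
df = pd.DataFrame({"name": ["Ann", "Bob", "Ann", "Cid", "Ann", "Bob"]})

# One count per name, ordered alphabetically by name
by_group = df.groupby("name")["name"].count()

# Same counts, ordered from most to least frequent
by_value = df["name"].value_counts()

# The counts Series is plotted directly: index on x, counts on y
ax = by_value.plot.bar()
ax.set_xlabel("name")
ax.set_ylabel("occurrences")
```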
