Creating a bar plot in Python

I'm working on my school project which asks me to create a bar plot. I'm unable to understand the function, can anyone please help?
def get_barplot(f_dict,title):
    """
    ******* CHANGE 2 (50 points) **********
    Shows and saves the Bar Plot
    """
    #Uncomment and fill the blanks
    freq_df = pd.DataFrame(f_dict._______,columns=['key','value']) #converts the dictionary to a dataframe
    bar_plot = ___.barplot(_________________________)
    bar_plot.set(title=title+'_BarPlot',xlabel='Words', ylabel='Count') #Setting title and labels
    plt.xticks(rotation=45) #Rotating each word because of the length of the words
    plt.show()
    bar_plot.figure.savefig(title+'_barplot.png',bbox_inches='tight') #saving the file
This is the code. Can anyone please let me know what I should write in the blanks? I've spent the last hour trying to understand it, but I can't.
I tried different methods but they didn't work.

It is always useful to look at the API documentation when trying to understand the library functions.
Blank 1: In the first line of your code you are trying to create a Pandas data frame from a dictionary. The first argument for pd.DataFrame is the data (see pandas.DataFrame). In this case, that is the items in your dictionary, i.e. f_dict.items(). The columns parameter gives you a clue here: "key" and "value" are the two parts of each item in the dictionary.
Blanks 2 and 3: I assume you are using Seaborn, which has a barplot function (see seaborn.barplot), and that it has been imported with the alias sns. seaborn.barplot takes the data frame through its data parameter, which here is the data frame you created in the first line of your code, and the column names to plot through x and y, i.e. sns.barplot(data=freq_df, x='key', y='value').
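To make the first blank concrete, here is a minimal sketch of the dictionary-to-DataFrame step on its own. The dictionary contents are made up for illustration, and list(...) is wrapped around the items only to make the conversion explicit.

```python
import pandas as pd

# Hypothetical word counts standing in for the assignment's f_dict
f_dict = {"apple": 3, "pear": 1, "plum": 2}

# Each (key, value) pair of the dictionary becomes one row
freq_df = pd.DataFrame(list(f_dict.items()), columns=["key", "value"])
print(freq_df)
```

With a frame shaped like this, a call such as sns.barplot(data=freq_df, x="key", y="value") (assuming seaborn is imported as sns) draws one bar per word.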

First, pass the dictionary's items to pd.DataFrame, not the dictionary itself:
freq_df = pd.DataFrame(f_dict.items(),columns=['key','value'])
Next, create the bar plot. Pandas has its own method for this (.plot.bar()), but the template calls .barplot, which is the function from the seaborn library.
The axis labels in the template are 'Words' and 'Count', so the words in the 'key' column go on the x-axis and their counts in the 'value' column go on the y-axis:
bar_plot = sns.barplot(x='key', y='value', data=freq_df)
Make sure you import the seaborn library. It is usually abbreviated as sns:
import seaborn as sns

Related

Key error while plotting a bar graph using Matplotlib

I have been facing one issue while I am trying to plot a bar graph using the matplotlib library.
Please find the sample data below
Sample Data Image
count_movies_year = n_db.groupby('release_year').agg({'title':'count'}).rename(columns={'title':'no_of_titles'})
count_movies_year.reset_index()
With the code above I grouped the data by release year, counted the titles, and renamed the resulting column. I then wanted to plot a bar graph of the result using matplotlib, so I wrote the code below:
plt.bar(count_movies_year['release_year'],count_movies_year['no_of_titles'])
plt.xlabel('release_year')
plt.ylabel('no_of_titles')
plt.show()
However, when I run this I get a KeyError for 'release_year'. What is wrong here? I am new to Python and Matplotlib, so I would like to understand where things are going wrong so that I can avoid it next time.
After a groupby, "release_year" no longer exists as a column in your DataFrame, because it is now the index.
You have several options:
Use reset_index as you did, but assign the result back to your variable, because reset_index returns a new DataFrame:
count_movies_year = count_movies_year.reset_index()
Or use the inplace parameter:
count_movies_year.reset_index(inplace=True)
Or use the .index directly in your plot:
plt.bar(count_movies_year.index, count_movies_year['no_of_titles'])
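The behaviour described above can be checked without plotting anything. This is a small sketch with made-up data standing in for n_db; only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's n_db
n_db = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": [2019, 2019, 2020, 2021],
})

counts = (
    n_db.groupby("release_year")
    .agg({"title": "count"})
    .rename(columns={"title": "no_of_titles"})
)

# After the groupby, release_year is the index, not a column
in_columns_before = "release_year" in counts.columns

# Calling reset_index without keeping the result leaves counts unchanged
counts.reset_index()
still_missing = "release_year" not in counts.columns

# Assigning the result back gives a frame with release_year as a column
counts_flat = counts.reset_index()
print(in_columns_before, still_missing)
print(counts_flat)
```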

How do I display a Grouped Bar Chart for multiple fields? (Altair)

I have the following dataset
I want to display this in some kind of diagram: the parameters confirmed, deaths, and recovered should be located on the X-axis. They must be defined for each region_name. The Y-axis should be the sum of these values. I read about the melt() method in the official documentation, but I didn't quite understand how to use it.
I need to get something like this, only in the following form.
You have wide-form data; you need to convert it to long-form data. You can either do that in pandas using melt() or a similar method, or you can use Altair's transform_fold. You can read more about this in https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data
For your data, it might look something like this:
import pandas as pd
import altair as alt
data = pd.read_csv('data_from_screenshot.csv')
alt.Chart(data).transform_fold(
    ["confirmed", "deaths", "recovered"],
    as_=["field", "value"]
).mark_bar().encode(
    x="field:N",
    y="sum(value):Q",
    column="region_name:N"
)
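For the pandas route mentioned above, here is a minimal sketch of what melt() does to wide-form data. The numbers and region names are invented, since the real dataset was only shown as a screenshot; the column names follow the question.

```python
import pandas as pd

# Hypothetical wide-form data: one column per measure
wide = pd.DataFrame({
    "region_name": ["North", "South"],
    "confirmed": [100, 80],
    "deaths": [5, 3],
    "recovered": [60, 50],
})

# Long form: one row per (region, measure) pair
long = wide.melt(
    id_vars="region_name",
    value_vars=["confirmed", "deaths", "recovered"],
    var_name="field",
    value_name="value",
)
print(long)
```

A frame in this long form can be passed to alt.Chart directly with the same encodings as above, in which case the transform_fold step is not needed.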

How to show more categories in a line plot of a pivot table

I have an Excel file containing rows of objects with at least two columns of variables: one for year and one for category. There are 22 types in the category variable.
So far, I can read the Excel file into a DataFrame and apply a pivot table to show the count of each category per year. I can also plot these yearly counts by category. However, when I do so, only 4 of the 22 categories are plotted. How do I instruct Matplotlib to show plot lines and labels for each of the 22 categories?
Here is my code
import numpy as np
import pandas as pd
import matplotlib as plt
df = pd.read_excel("table_merged.xlsx", sheet_name="records", encoding="utf8")
df.pivot_table(index="year", columns="category", values="y_m_d", aggfunc=np.count_nonzero, fill_value="0").plot(figsize=(10,10))
I checked the matplotlib documentation for plot(). The only argument that seemed remotely related to what I'm trying to accomplish is markevery() but it produced the error "positional argument follows keyword argument", so it doesn't seem right. I was able to use several of the other arguments successfully, like making the lines dashed, etc.
Here is the dataframe
Here is the resulting plot generated by matplotlib
Here are the same data plotted in Excel. I'm trying to make a similar plot using matplotlib
Solution
Change pivot_table(...,fill_value="0") to pivot_table(...,fill_value=0) and all of the categories appear in the figure as coded above. In the original figure, the four displayed categories were the only ones of the 22 that had a count in every year, so they never received the "0" fill. Any category whose column contained the string "0" was no longer numeric and was skipped when plotting.
A simpler and better solution is pd.crosstab(df['year'],df['category']) rather than my line 5 above.
The problem comes from the pivot. You most likely don't need it, since you are just tabulating year against category; the y_m_d column is not needed at all.
Try something like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'year':np.random.randint(2008,2020,1000),
                   'category':np.random.choice(np.arange(10),size=1000,p=np.arange(10)/sum(np.arange(10))),
                   'y_m_d':np.random.choice(['a','b','c'],1000)})
pd.crosstab(df['year'],df['category']).plot()
Looking at the code you have, the error comes from:
pivot(...,fill_value="0")
You are filling with the string "0", which turns the affected columns into non-numeric (object) columns that are ignored when plotting. It should be fill_value=0 and then it will work, although the pivot is a more complicated approach than needed.
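The effect of a string fill value can be seen without drawing anything. The sketch below uses a tiny made-up table and simulates the string fill by hand rather than calling pivot_table, so it illustrates the dtype problem and is not a reproduction of the original data.

```python
import pandas as pd

# Hypothetical records; column names follow the question
df = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019, 2019],
    "category": ["a", "b", "a", "a", "c"],
})

# crosstab counts each (year, category) pair and fills gaps with integer 0
counts = pd.crosstab(df["year"], df["category"])
print(counts)

# Simulate a string fill: put "0" wherever a count is zero
broken = counts.copy()
for col in counts.columns:
    if (counts[col] == 0).any():
        broken[col] = counts[col].astype(object).where(counts[col] != 0, "0")

# Only columns that are still numeric would be drawn by DataFrame.plot()
numeric_cols = broken.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Here only category "a" has a count in every year, so it is the only column that stays numeric, which mirrors the four-of-22 result in the question.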

Adding a key on a density graph with Pandas

I want to add a key so that I'm able to know which color is which column in my data frame. I made this by df.column_name.plot.density() multiple times. I've seen other examples with the key but I haven't been able to locate the code that adds it in.
In matplotlib, the display you're talking about is called a legend. I'm not sure if it's the same in pandas, but it's worth looking at!
Since your example didn't include enough code for me to try it out, I didn't.
Don't plot the variables one by one. Use df.plot.density(), which adds the legend for you. If you want to plot a subset of variables, use df[var_list].plot.density(). If you want to plot them one by one for some reason, you may need to pass the label argument to the plot function and add a legend at the end.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(size = (10,4)),
                  columns = ["Col1", "Col2", "Col3", "Col4"])
df.plot.density()
plt.show()
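If the columns do have to be plotted one at a time, a sketch along the following lines should give the same kind of key. The column names and random data are placeholders, the Agg backend is chosen only so the sketch runs without a display, and density plots in pandas require SciPy to be installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["Col1", "Col2", "Col3"])

# Draw each density on the same axes, labelling every line
fig, ax = plt.subplots()
for name in ["Col1", "Col3"]:
    df[name].plot.density(ax=ax, label=name)

# The legend (the "key") is built from the labels given above
ax.legend()
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```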

How do you replace data for an existing line chart using python-pptx?

I have a prebuilt and populated powerpoint presentation where I am modifying the data for charts and tables. I would like to retain all formatting (and much of the text), but replace the data in a line chart within a slide.
I have a function that will replace the data using a pandas data frame that works with bar charts.
def replaceCategoryChart(df, chart, skipLastCol=0):
    """
    Replaces Category chartdata for a simple series chart. e.g. Nonfarm Employment
    Parameters:
        df: dataframe containing new data. column 0 is the categories
        chart: the powerpoint shape chart.
        skipLast: 0=don't skip last column, 1+ will skip that many columns from the end
    Returns: replaced chart(?)
    """
    cols1= list(df)
    #print(cols1)
    #create chart data object
    chart_data = CategoryChartData()
    #create categories
    chart_data.categories=df[cols1[0]]
    # Loop over all series
    for col in cols1[1:-skipLastCol]:
        chart_data.add_series(col, df[col])
    #replace chart data
    chart.replace_data(chart_data)
...
S0_L= pd.read_excel(EXCEL_BOOK, sheet_name="S0_L", usecols="A:F")
S0_L_chart = prs.slides[0].shapes[3].chart
print(S0_L)
replaceCategoryChart(S0_L, S0_L_chart)
...
The python file runs successfully, however, when I open the powerpoint file I get the error
Powerpoint found a problem with content in Name.pptx.
Powerpoint can attempt to repair the presentation.
If you trust the source of this presentation, click Repair.
After clicking repair, the slide I attempted to modify is replaced by a blank layout.
Because this function works for bar charts, I think there is a mistake in the way I am understanding how to use replace_data() for a line chart.
Thank you for your help!
If your "line chart" is an "XY Scatter" chart, you'll need a different chart-data object, the XyChartData object and then to populate its XySeries objects: https://python-pptx.readthedocs.io/en/latest/api/chart-data.html#pptx.chart.data.XyChartData
I would recommend starting by getting it working using literal values, e.g. "South" and 1.05, and then proceed to supply the values from Pandas dataframes. That way you're sure the python-pptx part of your code is properly structured and you'll know where to go looking for any problems that arise.
As scanny mentioned, replace_data() does work for category line charts.
The repair error was (probably) caused by incorrectly adding series data: there was a bad loop, corrected below.
# Loop over all series
for col in cols1[1:len(cols1)-skipLastCol]:
    print('Type of column is ' + str(type(col)))
    chart_data.add_series(col, df[col])
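The reason the original loop was bad can be shown with plain lists: when skipLastCol is 0, the slice end -0 is just 0, so the slice is empty and no series are added at all. The column names and helper functions below are made up for illustration.

```python
# Hypothetical column names; the first one holds the categories
cols1 = ["Month", "Series A", "Series B", "Total"]

def series_columns_buggy(cols, skip_last=0):
    # With skip_last=0 this is cols[1:-0], i.e. cols[1:0], which is empty
    return cols[1:-skip_last]

def series_columns_fixed(cols, skip_last=0):
    # An explicit end index also works when skip_last is 0
    return cols[1:len(cols) - skip_last]

print(series_columns_buggy(cols1))      # []
print(series_columns_fixed(cols1))      # ['Series A', 'Series B', 'Total']
print(series_columns_fixed(cols1, 1))   # ['Series A', 'Series B']
```

A chart whose data was replaced with zero series is a plausible cause of the repair prompt, although the original post does not confirm this.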
