I have the following dataframe (with different campaigns).
When I use groupby and try to plot, I get several graphs:
df.groupby("Campaign").plot(y=["Visits"], x = "Week")
I would like a single graph that shows the visits for every campaign over the weeks. Also, because the graphs are shown separately, I cannot tell which one belongs to which campaign.
I would appreciate any tips regarding this.
You could do this:
df.set_index(['Week','Campaign'])['Visits'].unstack().plot(title='Visits by Campaign')
If there are several rows for the same Week/Campaign pair, aggregate them first with sum (or use mean to average the values instead):
df.groupby(['Week','Campaign'])['Visits'].sum().unstack().plot(title='Visits by Campaign')
Output:
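To make the reshaping step concrete, here is a minimal sketch with made-up data (the numbers and the campaign names "A" and "B" are my own assumptions, not the asker's data). The unstack call turns each campaign into its own column, and pandas then draws one labelled line per column on a single set of axes.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample data (an assumption for illustration only)
df = pd.DataFrame({
    "Week": [1, 1, 2, 2, 3, 3],
    "Campaign": ["A", "B", "A", "B", "A", "B"],
    "Visits": [10, 20, 15, 25, 30, 5],
})

# One row per week, one column per campaign
wide = df.groupby(["Week", "Campaign"])["Visits"].sum().unstack()

# Each column becomes one line, and the legend names the campaigns
ax = wide.plot(title="Visits by Campaign")
ax.set_ylabel("Visits")
```

Printing wide before plotting is a quick way to check that the table has the shape you expect.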
Another possible solution would be to use seaborn
import seaborn as sns
ax = sns.lineplot(x="Week",
                  y="Visits",
                  hue="Campaign",
                  estimator=None,
                  lw=1,
                  data=df)
See the seaborn documentation for lineplot for the available options.
Related
I have started using python for lots of data problems at work and the datasets are always slightly different. I'm trying to explore more efficient ways of plotting data using the inbuilt pandas function rather than individually writing out the code for each column and editing the formatting to get a nice result.
Background: I'm using Jupyter notebook and looking at histograms where the values are all unique integers.
Problem: I want the xtick labels to align with the centers of the histogram bars when plotting multiple columns of data with the one function e.g. df.hist() to get histograms of all columns at once.
Does anyone know if this is possible?
Or is it recommended to do each graph on its own vs. using the inbuilt function applied to all columns?
I can modify them individually following this post: Matplotlib xticks not lining up with histogram
which gives me what I would like but only for one graph and with some manual processing of the values.
Desired outcome example for one graph:
Basic example of data I have:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create list of datapoints
data = [[170,30,210],
[170,50,200],
[180,50,210],
[165,35,180],
[170,30,190],
[170,70,190],
[170,50,190]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['height', 'width','weight'])
# print dataframe.
df
Code that displays the graphs in the problem statement
df.hist(figsize=(5,5))
plt.show()
Code that displays the graph for weight how I would like it to be for all
df.hist(column='weight',bins=[175,185,195,205,215])
plt.xticks([180,190,200,210])
plt.yticks([0,1,2,3,4,5])
plt.xlim([170, 220])
plt.show()
Any tips or help would be much appreciated!
Thanks
I hope this helps. You take the column and count the frequency of each label (value_counts), then call sort_index so that the result is ordered by label rather than by frequency, and then you plot it as a bar plot.
import pandas as pd
import matplotlib.pyplot as plt
data = [[170,30,210],
[170,50,200],
[180,50,210],
[165,35,180],
[170,30,190],
[170,70,190],
[170,50,190]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['height', 'width','weight'])
df.weight.value_counts().sort_index().plot(kind = 'bar')
plt.show()
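The same idea can probably be extended to every column at once, which is what the question asked for. The sketch below loops over the columns and draws one bar plot per column on a shared figure. The subplot layout and figure size are my own choices, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = [[170, 30, 210],
        [170, 50, 200],
        [180, 50, 210],
        [165, 35, 180],
        [170, 30, 190],
        [170, 70, 190],
        [170, 50, 190]]
df = pd.DataFrame(data, columns=["height", "width", "weight"])

# One subplot per column (layout is an assumption, adjust as needed)
fig, axes = plt.subplots(1, len(df.columns), figsize=(9, 3))
for ax, col in zip(axes, df.columns):
    # Count each distinct value and order the bars by value, not by frequency
    counts = df[col].value_counts().sort_index()
    counts.plot(kind="bar", ax=ax, title=col)
fig.tight_layout()
```

Because each bar is a category, the tick labels sit under the bar centers automatically. The trade-off is that gaps between values are not drawn to scale, unlike in a true histogram.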
I am trying to plot this DataFrame which records various amounts of money over a yearly series:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import date2num
jp = pd.DataFrame([1000,2000,2500,3000,3250,3750,4500], index=['2011','2012','2013','2014','2015','2016','2017'])
jp.index = pd.to_datetime(jp.index, format='%Y')
jp.columns = ['Money']
I would simply like to make a bar graph out of this using PyPlot (i.e. pyplot.bar).
I tried:
plt.figure(figsize=(15,5))
xvals = date2num(jp.index.date)
yvals = jp['Money']
plt.bar(xvals, yvals, color='black')
ax = plt.gca()
ax.xaxis_date()
plt.show()
But the chart turns out like this:
Only by increasing the width substantially will I start seeing the bars. I have a feeling that this graph is attributing the data to the first date of the year (2011-01-01 for example), hence the massive space between each 'bar' and the thinness of the bars.
How can I plot this properly, knowing that this is a yearly series? Ideally the x-axis would contain only the years. Something tells me that I do not need to use date2num(), since this seems like a very common, ordinary plotting exercise.
My guess is that I'm not handling the year correctly. As of now I have them as a DatetimeIndex, but maybe there are other steps I need to take.
This has puzzled me for 2 days. All the solutions I found online seem to use DataFrame.plot, but I would rather learn how to use PyPlot properly. I also intend to add two more sets of bars, and it seems like the most common way to do that is through plt.bar().
Thanks everyone.
You can either do
jp.plot.bar()
which gives:
or plot against the actual years:
plt.bar(jp.index.year, jp.Money)
which gives:
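Since the question mentions adding two more sets of bars later, here is a rough sketch of grouped bars with plt.bar, plotted against the integer years. The "Savings" and "Debt" columns are invented purely for illustration, and the bar width of 0.25 is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

jp = pd.DataFrame(
    {"Money": [1000, 2000, 2500, 3000, 3250, 3750, 4500]},
    index=pd.to_datetime(
        ["2011", "2012", "2013", "2014", "2015", "2016", "2017"], format="%Y"
    ),
)
# Hypothetical extra series, made up for this example
jp["Savings"] = jp["Money"] * 0.5
jp["Debt"] = jp["Money"] * 0.25

years = jp.index.year.to_numpy()  # plain integers: 2011, 2012, ...
width = 0.25                      # bar width in "years" (arbitrary choice)

fig, ax = plt.subplots(figsize=(15, 5))
for i, col in enumerate(jp.columns):
    # Shift each series so the bars sit side by side, centred on the year
    offset = (i - (len(jp.columns) - 1) / 2) * width
    ax.bar(years + offset, jp[col], width=width, label=col)

ax.set_xticks(years)  # one tick per year, no intermediate dates
ax.legend()
```

Plotting against integers avoids the thin-bar problem, because the bar width is then measured in years rather than in days.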
I'm trying to create a bar chart with two series from raw data. My data looks like this:
So what I want to do is group by prev_purchase_count, count unique customers, and split/colour by segment.
I've written a few lines of code which achieve what I want to do but I know there is an easier way - probably one line.
lv_purch = df_customers.loc[df_customers['segment']=='low-value','prev_purchase_count']
hv_purch = df_customers.loc[df_customers['segment']=='high-value','prev_purchase_count']
plt.hist([lv_purch,hv_purch], label=['low-value','high-value'])
plt.legend(loc='upper right')
plt.show()
Thanks a lot!
You can use df.groupby(['prev_purchase_count', 'segment']) to group the rows by purchase count and by segment (low-value or high-value).
Found it!
import matplotlib.pyplot as plt
import seaborn as sns
ax = sns.countplot(x='prev_purchase_count', hue='segment', data=df_customers)
plt.legend(loc='upper right')
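Note that countplot counts rows. If a customer can appear on more than one row and you need unique customers, a pandas-only sketch along these lines might help. The customer_id column name and the sample rows are assumptions, since the real data was not shown.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample data; "customer_id" is a hypothetical column name
df_customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "segment": ["low-value", "low-value", "high-value",
                "high-value", "low-value", "high-value"],
    "prev_purchase_count": [0, 1, 1, 2, 0, 2],
})

# Unique customers per purchase count, one column per segment
counts = (
    df_customers
    .groupby(["prev_purchase_count", "segment"])["customer_id"]
    .nunique()
    .unstack(fill_value=0)
)

# Grouped bar chart: one group per purchase count, one colour per segment
ax = counts.plot(kind="bar")
ax.set_ylabel("unique customers")
```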
How would I graph this data in seaborn? I would like to have the various categories on the x-axis and the data on the y-axis as percentages.
I tried to create a barplot with seaborn, but I can't get it to look right.
Any help would be appreciated!
Thanks
Edit: code:
sns.barplot(x = new_df.columns,data=new_df)
I suggest you organize your DataFrame differently; it will make this type of data much easier to plot.
Rather than keeping one column per category, reshape it into two simple columns like so:
name value
debt_consolidation 0.152388
credit_card 0.115689
all_other 0.170111
etc. By doing this you can simply plot your data in Seaborn by doing the below:
sns.barplot(x="name",y="value", data = df)
This produces a bar chart with one bar per category.
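The reshaping itself can be sketched with melt. I am assuming here that new_df is a single-row DataFrame with one column per category, which is a guess because the original data was only shown as an image. The values are the three from the table above.

```python
import pandas as pd

# Assumed shape of the original data: one row, one column per category
new_df = pd.DataFrame({
    "debt_consolidation": [0.152388],
    "credit_card": [0.115689],
    "all_other": [0.170111],
})

# Wide to long: one row per category with "name" and "value" columns
long_df = new_df.melt(var_name="name", value_name="value")

# The question asked for percentages on the y-axis
long_df["percent"] = long_df["value"] * 100
print(long_df)
```

The resulting long_df can then be passed to sns.barplot with x="name" and either y="value" or y="percent".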
I have a count table as dataframe in Python and I want to plot my distribution as a boxplot. E.g.:
df=pandas.DataFrame({'Quality': [29,30,31,32,33,34,35,36,37,38,39,40], 'Count': [3,38,512,2646,9523,23151,43140,69250,107597,179374,840596,38243]})
I 'solved' it by repeating each quality value by its count. But I don't think it's a good way, and my dataframe is getting very, very big.
In R it's a one-liner:
ggplot(df, aes(x=1,y=Quality,weight=Count)) + geom_boxplot()
This will output a boxplot (image: Boxplot from R).
My aim is to compare the distribution of different groups and it should look like
Can Python solve it like this too?
What are you trying to look at here? The boxplot code below produces the following figure.
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
df=pd.DataFrame({'Quality': [29,30,31,32,33,34,35,36,37,38,39,40], 'Count': [3,38,512,2646,9523,23151,43140,69250,107597,179374,840596,38243]})
plt.figure()
df_box = df.boxplot(column='Quality', by='Count',return_type='axes')
If you want to look at your Quality distribution weighted by Count, you can try plotting a histogram:
plt.figure()
df_hist = plt.hist(df.Quality, bins=10, range=None, density=False, weights=df.Count)
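If you do want a boxplot without repeating every value, one possible approach is to compute the weighted quartiles yourself and hand them to matplotlib's Axes.bxp, which draws boxes from precomputed statistics. This is only a sketch: weighted_quantile is a helper I wrote for this example, and the whiskers are simply set to the minimum and maximum quality, which differs from ggplot's default whisker rule.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "Quality": [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
    "Count": [3, 38, 512, 2646, 9523, 23151, 43140, 69250,
              107597, 179374, 840596, 38243],
})

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches fraction q of the total."""
    order = np.argsort(values)
    values = np.asarray(values)[order]
    cum = np.cumsum(np.asarray(weights)[order])
    return values[np.searchsorted(cum, q * cum[-1])]

stats = {
    "label": "Quality",
    "q1": weighted_quantile(df["Quality"], df["Count"], 0.25),
    "med": weighted_quantile(df["Quality"], df["Count"], 0.50),
    "q3": weighted_quantile(df["Quality"], df["Count"], 0.75),
    # Simplification: whiskers span the full observed range
    "whislo": df["Quality"].min(),
    "whishi": df["Quality"].max(),
    "fliers": [],
}

fig, ax = plt.subplots()
ax.bxp([stats], showfliers=False)  # one dict per box
```

To compare several groups, build one such dict per group and pass them all in the list given to bxp.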