barplot x axis construction from data pandas seaborn python - python

So I'm trying to create a barplot using seaborn. My data is in the form:
Packet number,Flavour,Contents
1,orange,4
2,orange,3
3,orange,2
4,orange,4
...
36, orange,3
1, coffee,5
2, coffee,3
...
1, raisin,4
etc.
My code is currently:
revels_data = pd.read_csv("testtt.txt")
rd = revels_data
ax = sns.barplot(x="Packet number", y="Contents", data=rd)
plt.show()
I'm trying to create one bar per packet number (on the x axis), divided by colour inside each bar according to flavour, with the total contents per packet on the y axis.
I started by trying to compute the total of each packet, i.e.
total_1 = (rd.loc[rd["Packet number"] == 1, "Contents"].sum())
but I'm not sure how to go from there, or whether there is an easier way to do it.
Any advice is much appreciated!

You want to use hue for that. Also, your current code displays the mean of each category; to use a different aggregation function, pass it as estimator.
Thus, your code should be:
ax = sns.barplot(x="Packet number", y="Contents", hue="Flavour", data=rd)
Or, if you want to show the sum instead of the mean:
import numpy as np
ax = sns.barplot(x="Packet number", y="Contents", hue="Flavour", estimator=np.sum, data=rd)
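The hand-computed total_1 from the question generalises to every packet with a single groupby. This sketch uses only the sample rows visible in the question (packets 1 and 2), so the numbers are illustrative; the second groupby gives the per-(packet, flavour) sums, which is what estimator=np.sum aggregates for each hue bar.

```python
import pandas as pd

# Sample rows taken from the question (packets 1 and 2 only), for illustration.
rd = pd.DataFrame({
    "Packet number": [1, 2, 1, 2, 1],
    "Flavour": ["orange", "orange", "coffee", "coffee", "raisin"],
    "Contents": [4, 3, 5, 3, 4],
})

# total_1 from the question, generalised to every packet in one call
totals = rd.groupby("Packet number")["Contents"].sum()

# the per-(packet, flavour) sums that estimator=np.sum draws as separate hue bars
per_bar = rd.groupby(["Packet number", "Flavour"])["Contents"].sum()
```

Here totals.loc[1] is 13 (4 orange + 5 coffee + 4 raisin), which is exactly the stacked bar height the question asks for.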
Edit:
If you want a stacked barplot (hue draws the flavours as separate side-by-side bars within each packet, not as stacked segments), you can make it directly with pandas, but you need to group your data first:
# Sum (or mean if you'd rather) the Contents per packet number and flavor
# unstack() turns the flavours into columns, and fillna(0) puts 0 in
# every packet/flavour combination that is missing
grouped = rd.groupby(["Packet number", "Flavour"])["Contents"].sum().unstack().fillna(0)
# The x axis is taken from the index; each column (flavour) becomes one stacked segment
grouped.plot(kind="bar", stacked=True)
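A self-contained version of the grouping above, run on the few sample rows shown in the question (values copied from there; all other packets are omitted). Packet 2 has no raisin row in that sample, so it exercises the fillna(0) step. The Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question (packets 1 and 2 only)
rd = pd.DataFrame({
    "Packet number": [1, 2, 1, 2, 1],
    "Flavour": ["orange", "orange", "coffee", "coffee", "raisin"],
    "Contents": [4, 3, 5, 3, 4],
})

# rows = packets, columns = flavours, missing combinations filled with 0
grouped = rd.groupby(["Packet number", "Flavour"])["Contents"].sum().unstack().fillna(0)

# one stacked bar per packet; bar height = total contents of that packet
ax = grouped.plot(kind="bar", stacked=True)
ax.set_xlabel("Packet number")
ax.set_ylabel("Contents")
```

Call plt.show() afterwards when using an interactive backend.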

Related

how to use 'for' loop to create multiple figures, not all data in one figure?

I want to plot some statistical results for each region. I have nested 'for' loops: the inner loop computes the statistics and the outer loop selects the region and plots its results. I'm not sure why my code plots the data from all regions in the same figure instead of one figure per region.
Yrstat = []
for j in Regionlist:
    for i in Yrlist:
        dfnew = df.loc[(df['Yr'] == i) & (df['Region'] == j)]
        if not dfnew.empty:
            # calculate the confidence interval and mean for data in each year
            CI = scipy.stats.norm.interval(alpha=0.95, loc=np.mean(dfnew['FluxTot']), scale=scipy.stats.sem(dfnew['FluxTot']))
            list(CI)
            mean = np.mean(dfnew['FluxTot'])
            Yrstat.append((i, mean, CI[0], CI[1]))
    # convert stats list to a dataframe
    yrfullinfo = pd.DataFrame(Yrstat, columns=['Yr', 'mean', 'CI-', 'CI+'])
    # making figures
    fig, ax = plt.subplots()
    ax.plot(yrfullinfo['Yr'], yrfullinfo['mean'], label='mean')
    ax.plot(yrfullinfo['Yr'], yrfullinfo['CI-'], label='95%CI')
    ax.plot(yrfullinfo['Yr'], yrfullinfo['CI+'], label='95%CI')
    ax.legend()
    # exporting figures
    filename = "C:/Users/Christina/Desktop/python test/Summary in {}.png".format(j)
    fig.savefig(filename)
    plt.close(fig)
The problem isn't the figures: the script correctly saves a separate PNG file for each region. The problem is your data.
You initialize Yrstat = [] outside of both loops. You then append to it in every iteration of the inner loop, for every region, and plot the data of the "new" DataFrame yrfullinfo. That DataFrame therefore grows with each region, so the figure for a region also contains the statistics of all regions processed before it.
You need a new list of values for each region, which is why I moved Yrstat = [] into the outer loop, where it is reinitialized for every region:
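for j in Regionlist:
    Yrstat = []  # reinitialized for every region
    for i in Yrlist:
        dfnew = df.loc[(df['Yr'] == i) & (df['Region'] == j)]
        if not dfnew.empty:
            # get all statistics for data in each year
            mean = np.mean(dfnew['FluxTot'])
            CI = scipy.stats.norm.interval(0.95, loc=mean, scale=scipy.stats.sem(dfnew['FluxTot']))
            Yrstat.append((j, i, mean, CI[0], CI[1]))
    # the tuples now also carry the region, so the column list needs 'Region'
    yrfullinfo = pd.DataFrame(Yrstat, columns=['Region', 'Yr', 'mean', 'CI-', 'CI+'])
    # plotting and saving are unchanged, still inside the outer loop
    fig, ax = plt.subplots()
    ax.plot(yrfullinfo['Yr'], yrfullinfo['mean'], label='mean')
    ax.plot(yrfullinfo['Yr'], yrfullinfo['CI-'], label='95%CI')
    ax.plot(yrfullinfo['Yr'], yrfullinfo['CI+'], label='95%CI')
    ax.legend()
    fig.savefig("C:/Users/Christina/Desktop/python test/Summary in {}.png".format(j))
    plt.close(fig)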
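A self-contained sketch of the same fix, assuming made-up data in the question's column layout (Region, Yr, FluxTot) and a temporary output directory instead of the asker's Desktop path. It swaps the nested loops over Regionlist and Yrlist for pandas groupby, which only visits combinations that exist (so the empty check disappears). The point to notice is that the list is created fresh inside the per-region loop. The confidence level is passed positionally because newer SciPy releases renamed the alpha keyword of interval to confidence.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render straight to files, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats

# Hypothetical data: 10 FluxTot readings per (region, year)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Region": np.repeat(["North", "South"], 30),
    "Yr": np.tile(np.repeat([2000, 2001, 2002], 10), 2),
    "FluxTot": rng.normal(10, 2, 60),
})

out_dir = tempfile.mkdtemp()
per_region = {}
for region, region_df in df.groupby("Region"):
    rows = []  # fresh list per region, so earlier regions never leak in
    for yr, year_df in region_df.groupby("Yr"):
        mean = year_df["FluxTot"].mean()
        lo, hi = scipy.stats.norm.interval(
            0.95, loc=mean, scale=scipy.stats.sem(year_df["FluxTot"]))
        rows.append((yr, mean, lo, hi))
    stats = pd.DataFrame(rows, columns=["Yr", "mean", "CI-", "CI+"])
    per_region[region] = stats

    # one figure per region, saved and closed before the next region
    fig, ax = plt.subplots()
    ax.plot(stats["Yr"], stats["mean"], label="mean")
    ax.fill_between(stats["Yr"], stats["CI-"], stats["CI+"], alpha=0.3, label="95% CI")
    ax.legend()
    fig.savefig(os.path.join(out_dir, "Summary in {}.png".format(region)))
    plt.close(fig)
```

Each region's stats frame holds only its own three years; with the list outside the loop, South's frame would have had six rows.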

Python - Plot a graph with times on x-axis

I have the following dataframe in pandas:
dfClicks = pd.DataFrame({'clicks': [700,800,550],'date_of_click': ['10/25/1995 03:30','10/25/1995 04:30','10/25/1995 05:30']})
dfClicks['date_of_click'] = pd.to_datetime(dfClicks['date_of_click'])
dfClicks.set_index('date_of_click')
dfClicks.clicks = pd.to_numeric(dfClicks.clicks)
Could you please advise how I can plot the above so that the x axis shows the date/time and the y axis the number of clicks? I will also need to plot another DataFrame with predicted clicks on the same graph, for comparison. The test frame could be a replica of the above, with minor changes:
dfClicks2 = pd.DataFrame({'clicks': [750,850,500],'date_of_click': ['10/25/1995 03:30','10/25/1995 04:30','10/25/1995 05:30']})
dfClicks2['date_of_click'] = pd.to_datetime(dfClicks2['date_of_click'])
dfClicks2.set_index('date_of_click')
dfClicks2.clicks = pd.to_numeric(dfClicks2.clicks)
Convert the clicks column to numeric and then:
ax = dfClicks.plot()
dfClicks2.plot(ax=ax)
ax.legend(["Clicks","Clicks2"])
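UPDATE:
There is an error in how you set the index: set_index returns a new DataFrame instead of modifying dfClicks in place, so change
dfClicks.set_index('date_of_click')
to:
dfClicks = dfClicks.set_index('date_of_click')
(and likewise for dfClicks2).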
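Putting the pieces together, a runnable sketch with both frames from the question. The helper load_clicks is hypothetical (introduced here only to avoid repeating the construction twice); the key line is the one that assigns the result of set_index back, which is what puts the timestamps on the x axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

def load_clicks(clicks):
    """Hypothetical helper: build a click frame indexed by timestamp."""
    frame = pd.DataFrame({
        "clicks": clicks,
        "date_of_click": ["10/25/1995 03:30", "10/25/1995 04:30", "10/25/1995 05:30"],
    })
    frame["date_of_click"] = pd.to_datetime(frame["date_of_click"])
    frame["clicks"] = pd.to_numeric(frame["clicks"])
    # set_index returns a new frame, so the result must be kept
    return frame.set_index("date_of_click")

dfClicks = load_clicks([700, 800, 550])    # actual clicks
dfClicks2 = load_clicks([750, 850, 500])   # predicted clicks

ax = dfClicks.plot()           # the datetime index becomes the x axis
dfClicks2.plot(ax=ax)          # draw the second frame on the same axes
ax.legend(["Clicks", "Clicks2"])
ax.set_ylabel("clicks")
```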

python - bokeh - stacked bar chart with conditional coloring

How do I decouple the height of the stacked bars from their fill colors?
I have multiple categories which I want to present in a stacked bar chart, so that the height represents the value and the color is conditionally defined by another variable (something like fill= in ggplot).
I am new to bokeh and struggling with the stacked bar chart mechanics. I tried to construct this type of chart, but got nothing except all sorts of errors. The examples of stacked bar charts in the bokeh documentation are very limited.
My data is stored in a pandas DataFrame:
data = [
    ['A', 1, 15, 1],
    ['A', 2, 14, 2],
    ['A', 3, 60, 1],
    ['B', 1, 15, 2],
    ['B', 2, 25, 2],
    ['B', 3, 20, 1],
    ['C', 1, 15, 1],
    ['C', 2, 25, 1],
    ['C', 3, 55, 2],
    ...
]
Columns represent Category, Regime, Value, State.
I want to plot Category on x axis, Regimes stacked on y axis where bar length represents Value and color represents State.
Is this achievable in bokeh?
Can anybody demonstrate, please?
I think this problem becomes much easier if you transform your data to the following form:
from bokeh.plotting import figure
from bokeh.io import show
from bokeh.transform import stack, factor_cmap
import pandas as pd
df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4],
    "Regime1_State": ["A", "B"],
    "Regime2_Value": [2, 5],
    "Regime2_State": ["B", "B"],
    "Regime3_Value": [3, 6],
    "Regime3_State": ["B", "A"]})
p = figure(x_range=["a", "b"])
p.vbar_stack(["Regime1_Value", "Regime2_Value", "Regime3_Value"],
             x="Category",
             fill_color=[
                 factor_cmap(state, palette=["red", "green"], factors=["A", "B"])
                 for state in ["Regime1_State", "Regime2_State", "Regime3_State"]],
             line_color="black",
             width=0.9,
             source=df)
show(p)
This is a bit strange, because vbar_stack behaves unlike a "normal" glyph. Normally you have three options for the attributes of a renderer (assume we want to plot n dots/rectangles/shapes/things):
Give a single value that is used for all n glyphs
Give a column name that is looked up in the source (source[column_name] must produce an "array" of length n)
Give an array of length n of data
But vbar_stack does not create one renderer; it creates as many as there are elements in the first list you give it. Let's call this number k. Then, to make sense of the attributes, you again have three options:
Give a single value that is used for all glyphs
Give a list of k things that are used as column names in the source (each lookup must produce an array of length n).
Give an array of length n of data (so all k glyphs get the same data).
So p.vbar(x=[a,b,c]) and p.vbar_stack(stackers, x=[a,b,c]) actually do different things (the first passes literal data, the second passes column names, one per stacked renderer), which is confusing and not clear from the documentation.
But why do we have to transform your data so strangely? Let's unroll vbar_stack and write it ourselves (details left out for brevity):
plotted_regimes = []
for regime in regimes:
    if not plotted_regimes:
        bottom = 0
    else:
        bottom = stack(*plotted_regimes)
    p.vbar(bottom=bottom, top=stack(*plotted_regimes, regime))
    plotted_regimes.append(regime)
So for each regime we have a separate vbar whose bottom is where the sum of the previous regimes ended. With the original data structure this is not really possible, because there need not be a value for every regime in every category; in the wide form we are forced to fill such gaps with 0.
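To make the bottom/top arithmetic concrete, this sketch computes by hand, with pandas cumsum on the answer's two-category frame, the per-row sums that the stack(...) expressions in the loop stand for. It is only an illustration of the numbers each regime's bar spans; it does not call bokeh and is not how bokeh evaluates stack internally.

```python
import pandas as pd

# Value columns of the wide frame from the answer above
df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4],
    "Regime2_Value": [2, 5],
    "Regime3_Value": [3, 6],
})
stackers = ["Regime1_Value", "Regime2_Value", "Regime3_Value"]

# top of regime k = sum of regimes 1..k, i.e. stack(*plotted_regimes, regime)
tops = df[stackers].cumsum(axis=1)
# bottom of regime k = sum of regimes 1..k-1 (0 for the first regime)
bottoms = tops - df[stackers]

for regime in stackers:
    print(regime, bottoms[regime].tolist(), "->", tops[regime].tolist())
# Regime1_Value [0, 0] -> [1, 4]
# Regime2_Value [1, 4] -> [3, 9]
# Regime3_Value [3, 9] -> [6, 15]
```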
Because the stacked values correspond to column names, we have to put all of them in one DataFrame. The vbar_stack call at the beginning could also be written with stack (basically because vbar_stack is a convenience wrapper around stack).
The factor_cmap is used so that we don't have to manually assign colors. We could also simply add a Regime1_Color column, but this way the mapping is done automatically (and client side).
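The reshaping from the question's long format (Category, Regime, Value, State) into the wide RegimeN_Value / RegimeN_State layout can be sketched with pandas pivot. This uses only the rows listed in the question and assumes regimes are numbered; the state codes are turned into strings because factor_cmap works on categorical factors, and missing regime values are filled with 0 as described above.

```python
import pandas as pd

# The rows listed in the question; the "..." rows are not included.
data = [
    ['A', 1, 15, 1], ['A', 2, 14, 2], ['A', 3, 60, 1],
    ['B', 1, 15, 2], ['B', 2, 25, 2], ['B', 3, 20, 1],
    ['C', 1, 15, 1], ['C', 2, 25, 1], ['C', 3, 55, 2],
]
long_df = pd.DataFrame(data, columns=["Category", "Regime", "Value", "State"])
# factor_cmap maps categorical factors, so treat the state codes as strings
long_df["State"] = long_df["State"].astype(str)

# one row per category, one column per regime
values = long_df.pivot(index="Category", columns="Regime", values="Value").fillna(0)
states = long_df.pivot(index="Category", columns="Regime", values="State")
values.columns = [f"Regime{r}_Value" for r in values.columns]
states.columns = [f"Regime{r}_State" for r in states.columns]

wide = pd.concat([values, states], axis=1).reset_index()
```

wide can then be passed as source to the vbar_stack call from the answer, with x_range=["A", "B", "C"] and factors=["1", "2"] in factor_cmap.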

Python sort_values plot is inverted

New Python learner here. This seems like a very simple task, but I can't do it to save my life.
All I want to do is to grab 1 column from my DataFrame, sort it, and then plot it. THAT'S IT. But when I plot it, the graph is inverted. Upon examination, I find that the values are sorted, but the index is not...
Here is my simple 3 liner code:
import pandas as pd
import matplotlib.pyplot as plt
testData = pd.DataFrame([5,2,4,2,5,7,9,7,8,5,4,6],[9,4,3,1,5,6,7,5,4,3,7,8])
x = testData[0].sort_values()
plt.plot(x)
edit:
Using matplotlib
If you want the values laid out sequentially on the x axis (0, 1, 2, 3, 4, ...), you need to re-index them: sort_values keeps each value's original index label, and plt.plot uses a Series' index as the x coordinates.
x = testData[0].sort_values()
x.index = range(len(x))
plt.plot(x)
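A runnable check of why the plot looks wrong and what re-indexing changes. Note that in the question's code the second positional argument of pd.DataFrame is the index, so the labels start out as 9, 4, 3, 1, ... rather than 0..11. reset_index(drop=True) is used here as an equivalent of assigning range(len(x)) to x.index.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# The asker's frame: the second positional argument is the *index*
testData = pd.DataFrame([5, 2, 4, 2, 5, 7, 9, 7, 8, 5, 4, 6],
                        [9, 4, 3, 1, 5, 6, 7, 5, 4, 3, 7, 8])

x = testData[0].sort_values()
print(x.index.tolist())  # the original labels travel with the sorted values

# same values, fresh labels 0..n-1, so the x axis runs left to right
x_seq = x.reset_index(drop=True)
fig, ax = plt.subplots()
(line,) = ax.plot(x_seq)
```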
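Otherwise, if you want the values sorted in the DataFrame but plotted against their original index, use a scatter plot rather than a line plot (a line plot connects the points in sorted order, so the line jumps back and forth along the x axis):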
plt.scatter(x.index, x.values)

Is there a way for iPython to generate these kinds of charts given a dataframe?

[Image: the example chart]
Please ignore the background image. The foreground chart is what I am interested in showing using pandas or numpy or scipy (or anything in iPython).
I have a dataframe where each row represents temperatures for a single day.
This is an example of some rows:
           100   200   300   400   500   600   ...   2300
10/3/2013  53*C  57*C  48*C  49*C  54*C  54*C  ...   55*C
10/4/2013  45*C  47*C  48*C  49*C  50*C  52*C  ...   57*C
Is there a way to get a chart that represents the changes from hour to hour, using the first column as a 'zero'?
Something quick and dirty that might get you most of the way there, assuming your data frame is named df:
import matplotlib.pyplot as plt
# diff along each row (hour to hour); iloc[:, 1:] drops the first column whatever its label is
plt.imshow(df.T.diff().fillna(0.0).T.iloc[:, 1:].values)
Since I can't easily construct a sample version with your exact column labels, some additional tinkering may be needed to get rid of any index columns that are included in the diff and moved by the transposition. But this worked to make a simple heat-map-ish plot for me on a random data example.
Then you can create a matplotlib figure or axis object and specify whatever you want for the x- and y-axis labels.
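A sketch of the whole pipeline on the two rows shown in the question, assuming the readings are strings like "53*C" and using only the 100 to 600 columns (the elided hours are left out). It strips the unit, takes hour-to-hour differences along each row, and fills the first hour with 0 so it acts as the zero baseline the question asks for, then labels the axes explicitly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Two rows from the question (100-600 columns only)
raw = pd.DataFrame(
    [["53*C", "57*C", "48*C", "49*C", "54*C", "54*C"],
     ["45*C", "47*C", "48*C", "49*C", "50*C", "52*C"]],
    index=["10/3/2013", "10/4/2013"],
    columns=[100, 200, 300, 400, 500, 600],
)

# strip the unit suffix so the readings become numbers
temps = raw.apply(lambda col: col.str.rstrip("*C").astype(float))

# change from the previous hour; the first hour has no predecessor -> 0 baseline
changes = temps.diff(axis=1).fillna(0.0)

fig, ax = plt.subplots()
im = ax.imshow(changes.values, aspect="auto", cmap="coolwarm")
ax.set_xticks(range(len(changes.columns)))
ax.set_xticklabels([str(c) for c in changes.columns])
ax.set_yticks(range(len(changes.index)))
ax.set_yticklabels(changes.index)
fig.colorbar(im, ax=ax, label="change from previous hour")
```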
You could just plot lines one at a time for each row with an offset:
import numpy as np
import matplotlib.pyplot as plt

nrows, ncols = 12, 30
# make up some fake data:
d = np.random.rand(nrows, ncols)
d *= np.sin(2*np.pi*np.arange(ncols)*4/ncols)
d *= np.exp(-0.5*(np.arange(nrows)-nrows/2)**2/(nrows/4)**2)[:,None]
# this is all you need, if you already have the data:
for i, r in enumerate(d):
    plt.fill_between(np.arange(ncols), r+(nrows-i)/2., lw=2, facecolor='white')
You could do it all at once if you don't need the fill color to block the previous line:
d += np.arange(nrows)[:, None]
plt.plot(d.T)
