Iterate through Groupby.unstack() items to make separate plots

I have a DataFrame called afplot:
apple_fplot = apple_f1.groupby(['Year','Domain Category'])['Value'].sum()
afplot = apple_fplot.unstack('Domain Category')
I now need to produce a plot for each column of afplot, and need to save each plot to a unique filename.
I've been trying to do this with a for loop (I know that's inefficient), but I can't seem to get it right:
for index, column in afplot.iteritems():
    plt.figure(index); afplot[column].plot(figsize=(12,6))
    plt.xlabel('Year')
    plt.ylabel('Fungicide used / lb')
    plt.title('Amount of fungicides used on apples in the US')
    plt.legend()
    plt.savefig('C:/Users/User/Documents/Work/Year 3/Project/Plots/apple_fplot{}'.format(index))
I'm not sure if I'm going about this the right way, but the idea is to reset the plot on each iteration, plot only the next column as a line, and then save it to a new filename.

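df.iteritems() yields (column name, Series) pairs (see the pandas docs), so the loop variable already holds the data and there is no need to index afplot again. Note that iteritems() was removed in pandas 2.0; df.items() is the drop-in replacement. So you can simplify:
import matplotlib.pyplot as plt

# items() yields (column name, Series) pairs; it replaces iteritems(), removed in pandas 2.0
for col, data in afplot.items():
    ax = data.plot(title='Amount of fungicides used on apples in the US')
    ax.set_ylabel('Fungicide used / lb')
    plt.gcf().savefig('C:/Users/User/Documents/Work/Year 3/Project/Plots/apple_fplot{}'.format(col))
    plt.close()  # start the next column on a fresh figure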
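The xlabel should already be 'Year', as this seems to be the name of the index. Note that legend defaults to True for DataFrame.plot() but to False for Series.plot(), so pass legend=True here if you want one. See the additional plot parameters in the pandas docs.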
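For reference, the same loop can be wrapped in a small function that creates and closes an explicit figure per column. This is only a sketch under assumptions: save_column_plots is a hypothetical helper name, the data is a made-up stand-in for apple_f1, and the plots go to a temporary directory rather than the path from the question.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd


def save_column_plots(wide, out_dir, prefix="apple_fplot"):
    """Save one line plot per column of `wide` and return the file paths."""
    paths = []
    for col, data in wide.items():  # (column name, Series) pairs
        fig, ax = plt.subplots(figsize=(12, 6))
        data.plot(ax=ax, legend=True,
                  title="Amount of fungicides used on apples in the US")
        ax.set_xlabel("Year")
        ax.set_ylabel("Fungicide used / lb")
        path = os.path.join(out_dir, "{}_{}.png".format(prefix, col))
        fig.savefig(path)
        plt.close(fig)  # release the figure before the next column
        paths.append(path)
    return paths


# Made-up stand-in for apple_f1 (assumption: same three columns as in the question)
apple_f1 = pd.DataFrame({
    "Year": [2001, 2001, 2002, 2002],
    "Domain Category": ["A", "B", "A", "B"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})
afplot = (apple_f1.groupby(["Year", "Domain Category"])["Value"]
          .sum()
          .unstack("Domain Category"))

out_dir = tempfile.mkdtemp()
saved = save_column_plots(afplot, out_dir)
print(saved)
```

If the real column names contain characters that are not valid in filenames, they would need to be cleaned before being used in the path.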

Related

Getting multiple legends from for loop with specific data column and row as legend title

I have a plot that iterates through different databases in a list. Of course, I want to know which database belongs to which graph. The databases don't have a title, so I was thinking of using a specific value from a column/row, which would make it clear to me which database I am seeing.
In order to extract a specific float from a column/row of a database in my list,
I use: specific_float = list_of_databases[list_index][database_column_name][database_index]
I would like to use the same principle in my for loop with plots, so I was thinking of something like the code below.
Some context about the names in the code:
dc_oppervlakte_filter is the list of dataframes
TP_TARGET is a column name from the dataframes
TP_SMOTE is likewise a column name
STRIP_WIDTH is the column whose value at index 0 I want to use for my legend
for k in dc_oppervlakte_filter:
    plt.plot(k['TP_TARGET'], k['TP_SMOTE'], 'o', label = k['STRIP_WIDTH'][0])
plt.xlabel("TP_TARGET in ($\u00b0C$)")
plt.ylabel("TP_SMOTE in ($\u00b0C$)")
plt.legend(loc = 'lower right')
However, this gives me the error: KeyError: 0
So I tried:
for k in dc_oppervlakte_filter:
    plt.plot(k['TP_TARGET'], k['TP_SMOTE'], 'o', label = dc_oppervlakte_filter[k]['STRIP_WIDTH'][0])
plt.xlabel("TP_TARGET in ($\u00b0C$)")
plt.ylabel("TP_SMOTE in ($\u00b0C$)")
plt.legend(loc = 'lower right')
But it gives me the error: TypeError: list indices must be integers or slices, not DataFrame
Ideally, the legend would show, for every database that is plotted, the specific float chosen from that database.
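One likely cause of the KeyError, assuming the dataframes were filtered from a larger one so that the label 0 is no longer in their index: k['STRIP_WIDTH'][0] looks up the index label 0, not the first row. The second error arises because k is already a DataFrame, not a list position. A minimal sketch with made-up data, using the positional .iloc[0] instead:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frames standing in for dc_oppervlakte_filter. Their indexes do not
# contain the label 0, as is typical after filtering a larger DataFrame.
dc_oppervlakte_filter = [
    pd.DataFrame({"TP_TARGET": [20.0, 21.0],
                  "TP_SMOTE": [19.5, 20.8],
                  "STRIP_WIDTH": [1.2, 1.2]}, index=[5, 9]),
    pd.DataFrame({"TP_TARGET": [22.0, 23.0],
                  "TP_SMOTE": [21.7, 23.1],
                  "STRIP_WIDTH": [2.5, 2.5]}, index=[12, 17]),
]

fig, ax = plt.subplots()
for k in dc_oppervlakte_filter:
    # .iloc[0] is positional: the first row, whatever its index label is
    label = str(k["STRIP_WIDTH"].iloc[0])
    ax.plot(k["TP_TARGET"], k["TP_SMOTE"], "o", label=label)
ax.set_xlabel("TP_TARGET (\u00b0C)")
ax.set_ylabel("TP_SMOTE (\u00b0C)")
ax.legend(loc="lower right")

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```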

Ipywidgets and plotly interaction

I'm trying to make an interactive plot with ipywidgets using plotly, but I'm afraid I'm not getting something.
I have a dataframe with coordinates and some value columns. I want to plot it as a scatterplot with coord1=x and coord2=y, where each marker is colored by the value of an interactively selected column.
Additionally, when I change the column with the interactive menu, the color of every point should change to the column I selected, and the min and max of the colorbar should rescale to the min and max of the new column.
Furthermore, when I change another selector (selector2), I want the plot to display only the subset of my dataframe that matches a certain ID: big_grid[big_grid["id_col"]==selector2.value].
Lastly, there should be a range slider widget to adjust the color range of the colorbar.
So far I have this:
big_grid=pd.DataFrame(data=dict(id_col=[1,2,3,4,5],
col1=[0.1,0.2,0.3,0.4,0.5],
col2=[10,20,30,40,50],
coord1=[6,7,8,9,10],
coord2=[6,7,8,9,10]))
list_elem=["col1","col2"]
list_id=big_grid.id_col.values
dropm_elem=widgets.Dropdown(options=list(list_elem))
dropm_id=widgets.SelectMultiple(
options=list_id,
description="Active",
disabled=False
)
rangewidg=widgets.FloatRangeSlider(value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()],
min=big_grid[dropm_elem.value].min(),
max=big_grid[dropm_elem.value].max(),
step=0.001,
readout_format='.3f',
description="Color Range",
continuous_update=False)
fig = go.FigureWidget(data=px.scatter(big_grid,
x="coord1",
y="coord2",
color=big_grid[dropm_elem.value],
color_continuous_scale="Turbo",)
)
def handle_id_change(change):
    fig.data[0]['x']=big_grid[big_grid['id_col'].isin(dropm_id.value)]["coord1"]
    fig.data[0]['y']=big_grid[big_grid['id_col'].isin(dropm_id.value)]["coord2"]
    fig.data[0]['marker']['color']=big_grid[big_grid['id_col'].isin(dropm_id.value)][dropm_elem.value]
    fig.data[0]['marker']['cmin']=big_grid[big_grid['id_col'].isin(dropm_id.value)][dropm_elem.value].min()
    fig.data[0]['marker']['cmax']=big_grid[big_grid['id_col'].isin(dropm_id.value)][dropm_elem.value].max()
def handle_elem_change(change):
    fig.data[0]['marker']['color']=big_grid[big_grid['id_col'].isin(dropm_id.value)][dropm_elem.value]
dropm_elem.observe(handle_elem_change,names='value')
dropm_id.observe(handle_id_change,names='value')
right_box1 =widgets.HBox([fig])
right_box2=widgets.VBox([dropm_elem,dropm_id,rangewidg])
box=widgets.HBox([right_box1,right_box2])
box
With this, selecting the subset (from dropm_id) works, but the range widget and the hovering are broken. When I change dropm_elem, the color doesn't adjust as I expect; instead it becomes dark and uniform. Also, if you change the column and hover over the points, the hover shows the value of col2 but the label still says col1.
I'm afraid that I'm overcomplicating my life and there is surely an easier way. Could someone enlighten me?
EDIT: If I take a different approach, with a global variable to define the subset to plot, a plotting function, and the widgets.interact function, I can make it work. The problem is that in this case the plot is not a widget, so I cannot put it into a VBox or HBox.
It also still feels wrong, and using global variables is not good practice. I'll provide the code anyway for reference:
def plot(elem,rang):
    fig = px.scatter(subset, x="coord1", y="coord2", color=elem,color_continuous_scale="Turbo",range_color=rang)
    fig.show()
def handle_elem_change(change):
    # Hold notifications so that max, min and value are all set before validation;
    # otherwise setting max alone can leave max < min and raise an error.
    with rangewidg.hold_trait_notifications():
        rangewidg.max=big_grid[dropm_elem.value].max()
        rangewidg.min=big_grid[dropm_elem.value].min()
        rangewidg.value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()]
def handle_id_change(change):
    global subset
    subset=big_grid[big_grid['id_col'].isin(dropm_id.value)]
big_grid=pd.DataFrame(data=dict(id_col=[1,2,3,4,5],
col1=[0.1,0.2,0.3,0.4,0.5],
col2=[10,20,30,40,50],
coord1=[6,7,8,9,10],
coord2=[6,7,8,9,10]))
subset=big_grid
list_elem=["col1","col2"]
list_id=big_grid.id_col.values
dropm_elem=widgets.Dropdown(options=list(list_elem))
dropm_id=widgets.SelectMultiple(
options=list_id,
description="Active",
disabled=False
)
rangewidg=widgets.FloatRangeSlider(value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()],
min=big_grid[dropm_elem.value].min(),
max=big_grid[dropm_elem.value].max(),
step=0.001,
readout_format='.3f',
description="Color Range",
continuous_update=False)
dropm_elem.observe(handle_elem_change,names='value')
dropm_id.observe(handle_id_change,names='value')
display(dropm_id)
widgets.interact(plot,elem=dropm_elem,rang=rangewidg)
So I want the behaviour of this second version, but inside a widgets.HBox, and if possible without using global variables.

UPDATE: I managed to get a working version using the following code:
def handle_elem_change(change):
    # Hold notifications so that max, min and value are all set before validation;
    # otherwise setting max alone can leave max < min and raise an error.
    with rangewidg.hold_trait_notifications():
        rangewidg.max=big_grid[dropm_elem.value].max()
        rangewidg.min=big_grid[dropm_elem.value].min()
        rangewidg.value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()]
def plot_change(change):
    df=big_grid[big_grid['id_col'].isin(dropm_id.value)]
    output.clear_output(wait=True)
    with output:
        fig = px.scatter(df, x="coord1", y="coord2", color=dropm_elem.value,hover_data=["info"],
                         width=500,height=800, color_continuous_scale="Turbo",range_color=rangewidg.value)
        fig.show()
#define the widgets dropm_elem and rangewidg, which are the possible df.columns and the color range
#used in the function plot.
big_grid=pd.DataFrame(data=dict(id_col=[1,2,3,4,5],
col1=[0.1,0.2,0.3,0.4,0.5],
col2=[10,20,30,40,50],
coord1=[6,7,8,9,10],
coord2=[6,7,8,9,10],
info=["info1","info2","info3","info4","info5",]))
list_elem=["col1","col2","info"]
list_id=big_grid.id_col.values
dropm_elem=widgets.Dropdown(options=list_elem) #creates a widget dropdown with all the _ppms
dropm_id=widgets.SelectMultiple(
options=list_id,
description="Active Jobs",
disabled=False
)
rangewidg=widgets.FloatRangeSlider(value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()],
min=big_grid[dropm_elem.value].min(),
max=big_grid[dropm_elem.value].max(),
step=0.001,
readout_format='.3f',
description="Color Scale Range",
continuous_update=False)
output=widgets.Output()
# This line is crucial: whenever the dropdown value changes, call handle_elem_change,
# which in turn updates the limits and value of rangewidg.
dropm_elem.observe(handle_elem_change,names='value')
dropm_elem.observe(plot_change,names='value')
dropm_id.observe(plot_change,names='value')
rangewidg.observe(plot_change,names='value')
# The widgets.interact call that is commented out below is no longer needed:
# the observers above already trigger plot_change, which reads the widget values itself.
left_box = widgets.VBox([output])
right_box =widgets.VBox([dropm_elem,rangewidg,dropm_id])
tbox=widgets.HBox([left_box,right_box])
# widgets.interact(plot,elem=dropm_elem,rang=rangewidg)
display(tbox)
This way everything works, but I basically need to create a new dataframe every time that I move anything. It might not be very efficient for big dataframes, but it runs.

dataframe line plot is not plotting a line with column values

I think there is something wrong with the data in my dataframe, but I am having a hard time coming to a conclusion. I think there might be some missing datetime values in the index of the dataframe. Given that there are over 1000 rows, it isn't practical for me to check each row manually. Here is a picture of my data and the corresponding line plot. Clearly this isn't a line plot!
Is there any way to fill in the possibly missing values in my dataframe?
I also did a line plot in seaborn to get another perspective, but it wasn't immediately helpful.

You have effectively done the same as I have simulated below. In effect you have a multi-index of date and age_group. Plotting both age groups as one series means the line jumps between the two. Separate them out, plot them as separate lines, and the result is what you expect.
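import numpy as np
import pandas as pd
# Simulate the data: two age groups sharing the same date index
d = pd.date_range("1-jan-2020", "16-mar-2021")
df = pd.concat([pd.DataFrame({"daily_percent":np.sort(np.random.uniform(0.5,1, len(d)))}, index=d).assign(age_group="0-9 Years"),
                pd.DataFrame({"daily_percent":np.sort(np.random.uniform(0,0.5, len(d)))}, index=d).assign(age_group="20-29 Years")])
# One line through both groups: it jumps back to the first date halfway through
df.plot(kind="line", y="daily_percent", color="red")
# One column per age group, so each group is drawn as its own line
df.set_index("age_group", append=True).unstack(1).droplevel(0, axis=1).plot(kind="line", color=["red","blue"])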
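Rather than checking rows by hand, the two possible causes (repeated dates versus missing dates) can be told apart programmatically. A minimal sketch on made-up data, assuming daily dates in the index and an age_group column as in the simulation above:

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=3)
df = pd.concat([
    pd.DataFrame({"daily_percent": [0.6, 0.7, 0.8],
                  "age_group": "0-9 Years"}, index=dates),
    pd.DataFrame({"daily_percent": [0.1, 0.2, 0.3],
                  "age_group": "20-29 Years"}, index=dates),
])

# Repeated dates mean several series are stacked in one column
has_repeated_dates = bool(df.index.duplicated().any())

# Days in the overall range that never appear in the index
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
missing_days = full_range.difference(df.index)

# Reshape to one column per age group so each group plots as its own line
wide = (df.set_index("age_group", append=True)["daily_percent"]
        .unstack("age_group"))

print(has_repeated_dates, len(missing_days), list(wide.columns))
```

If missing_days turned out to be non-empty, wide.reindex(full_range) would insert those days as NaN rows, which could then be interpolated or left as gaps in the line.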

Sorting based on the alt.Color field in Altair

I am attempting to sort a horizontal barchart based on the group to which it belongs. I have included the dataframe, code that I thought would get me to group-wise sorting, and image. The chart is currently sorted according to the species column in alphabetical order, but I would like it sorted by the group so that all "bads" are together, similarly, all "goods" are together. Ideally, I would like to take it one step further so that the goods and bads are subsequently sorted by value of 'LDA Score', but that was the next step.
Dataframe:
Unnamed: 0,Species,Unknown,group,LDA Score,p value
11,a,3.474929757,bad,3.07502591,5.67e-05
16,b,3.109308852,bad,2.739744898,0.000651725
31,c,3.16979865,bad,2.697247855,0.03310557
38,d,0.06730106400000001,bad,2.347746497,0.013009626000000002
56,e,2.788383183,good,2.223874347,0.0027407140000000004
65,f,2.644346144,bad,2.311106698,0.00541244
67,g,3.626001112,good,2.980960068,0.038597163
74,h,3.132399759,good,2.849798377,0.007021518000000001
117,i,3.192113412,good,2.861299028,8.19e-06
124,j,0.6140430960000001,bad,2.221483531,0.0022149739999999998
147,k,2.873671544,bad,2.390164757,0.002270102
184,l,3.003479213,bad,2.667274876,0.008129727
188,m,2.46344998,good,2.182085465,0.001657861
256,n,0.048663767,bad,2.952260299,0.013009626000000002
285,o,2.783848855,good,2.387345098,0.00092491
286,p,3.636219,good,3.094047,0.001584756
The code:
bars = alt.Chart(df).mark_bar().encode(
alt.X('LDA Score:Q'),
alt.Y("Species:N"),
alt.Color('group:N', sort=alt.EncodingSortField(field="Clinical group", op='distinct', order='ascending'))
)
bars
The resulting figure:

Two things:
1) If you want to sort the y-axis, you should put the sort expression in the y encoding. Above, you are sorting the color labels in the legend.
2) Sorting by field in Vega-Lite only works for numeric data (Edit: this is incorrect; see below), so you can use a calculate transform to map the entries to numbers by which to sort.
The result might look something like this:
alt.Chart(df).transform_calculate(
order='datum.group == "bad" ? 0 : 1'
).mark_bar().encode(
alt.X('LDA Score:Q'),
alt.Y("Species:N", sort=alt.SortField('order')),
alt.Color('group:N')
)
Edit: it turns out the reason sorting by group fails is that the default operation for sort fields is sum, which only works well on quantitative data. If you choose a different operation, you can sort on nominal data directly. For example, this shows the correct output:
alt.Chart(df).mark_bar().encode(
alt.X('LDA Score:Q'),
alt.Y("Species:N", sort=alt.EncodingSortField('group', op='min')),
alt.Color('group:N')
)
See vega/vega-lite#6064 for more information.
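For the follow-up step in the question (sorting by 'LDA Score' within each group), one option is to compute the order in pandas and hand Altair an explicit list. A minimal sketch using four rows from the question's data; species_order is a hypothetical variable name:

```python
import pandas as pd

# Four rows taken from the dataframe in the question
df = pd.DataFrame({
    "Species": ["a", "d", "e", "g"],
    "group": ["bad", "bad", "good", "good"],
    "LDA Score": [3.07502591, 2.347746497, 2.223874347, 2.980960068],
})

# Group first (alphabetical), then highest LDA Score first within each group
species_order = (df.sort_values(["group", "LDA Score"], ascending=[True, False])
                 ["Species"].tolist())
print(species_order)
```

Passing the list as alt.Y("Species:N", sort=species_order) should then draw the bars in that order, since Altair accepts an explicit list of values for sort.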

My time series plot showing the wrong order

I'm plotting:
df['close'].plot(legend=True,figsize=(10,4))
The original data series comes in descending order, so I then did:
df.sort_values(['quote_date'])
The table now looks good and sorted in the desired manner, but the graph is still the same, showing today first and then going back in time.
Does .plot() order by index? If so, how can I fix this?
Alternatively, I'm importing the data with:
df = pd.read_csv(url1)
Can I somehow sort the data there already?

There are two problems with this code:
1) df.sort_values(['quote_date']) does not sort in place. It returns a sorted data frame and leaves df unchanged, so assign the result back:
df = df.sort_values(['quote_date'])
2) Yes, the plot() method plots against the index by default, but you can change this behavior with the keyword use_index (the x-axis then shows row positions instead of the index values):
df['close'].plot(use_index=False, legend=True,figsize=(10,4))
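To answer the last part of the question, the data can be sorted right after reading it. A minimal sketch, assuming the CSV has quote_date and close columns; an in-memory string stands in for url1 here:

```python
import io

import pandas as pd

# Made-up CSV in descending date order, standing in for the data behind url1
csv_text = (
    "quote_date,close\n"
    "2021-03-03,12.0\n"
    "2021-03-02,11.0\n"
    "2021-03-01,10.0\n"
)

df = (
    pd.read_csv(io.StringIO(csv_text), parse_dates=["quote_date"])
    .sort_values("quote_date")   # returns a new, sorted frame
    .set_index("quote_date")     # dates become the x-axis when plotting
)
print(df.index.is_monotonic_increasing)
```

With the dates as a sorted index, df['close'].plot(legend=True, figsize=(10,4)) runs from oldest to newest and keeps the dates on the x-axis, which use_index=False would not.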
