I'm plotting some columns of a DataFrame as a boxplot. So far, no problem: as seen below, I wrote some code and it works. BUT: the second plot also contains the boxes of the first plot. I tried "= None" and "del value", but neither works. Moving the plot function outside doesn't solve the problem either.
What's wrong with my code?
Here is an executable example:
import pandas as pd
d1 = {'ff_opt_time': [10, 20, 11, 5, 15 , 13, 19, 25 ], 'ff_count_opt': [30, 40, 45, 29, 35,38,32,41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1 , 1, 4, 5 ], 'ff_count_opt': [3, 4, 4, 9, 5,3, 2,4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    def plot(df, output):
        boxplot = df.boxplot(rot=45, fontsize=5)
        fig = boxplot.get_figure()
        fig.savefig(output + ".pdf")

    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    plot(df_ot, "bp_opt_time")

    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    plot(df_op, "bp_count_opt_perm")

evaluate2(df1, df2)
Here is another executable example. I even used other variable names.
import pandas as pd
d1 = {'ff_opt_time': [10, 20, 11, 5, 15 , 13, 19, 25 ], 'ff_count_opt': [30, 40, 45, 29, 35,38,32,41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1 , 1, 4, 5 ], 'ff_count_opt': [3, 4, 4, 9, 5,3, 2,4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    boxplot1 = df_ot.boxplot(rot=45, fontsize=5)
    fig1 = boxplot1.get_figure()
    fig1.savefig("bp_opt_time.pdf")

    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    boxplot2 = df_op.boxplot(rot=45, fontsize=5)
    fig2 = boxplot2.get_figure()
    fig2.savefig("bp_count_opt_perm.pdf")

evaluate2(df1, df2)
I can see from your code that both boxplots, boxplot1 and boxplot2, end up in the same graph. You need to tell matplotlib that there are going to be two plots.
This can be achieved in either of two ways:
1. Create a separate figure for each boxplot using pyplot in matplotlib. Calling fig1, ax1 = plt.subplots() starts a new figure and axes, so the next boxplot is drawn there instead of on top of the previous one.
2. Dissolve the evaluate2 function and execute each boxplot in a separate cell of the Jupyter notebook.
Solution 1 : Two subplots using pyplot
import pandas as pd
import matplotlib.pyplot as plt
d1 = {'ff_opt_time': [10, 20, 11, 5, 15 , 13, 19, 25 ], 'ff_count_opt': [30, 40, 45, 29, 35,38,32,41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1 , 1, 4, 5 ], 'ff_count_opt': [3, 4, 4, 9, 5,3, 2,4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    # new figure and axes for the first boxplot
    fig1, ax1 = plt.subplots()
    df_ot.boxplot(rot=45, fontsize=5, ax=ax1)
    fig1.savefig("bp_opt_time.pdf")

    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    # new figure and axes for the second boxplot
    fig2, ax2 = plt.subplots()
    df_op.boxplot(rot=45, fontsize=5, ax=ax2)
    fig2.savefig("bp_count_opt_perm.pdf")
    plt.show()

evaluate2(df1, df2)
Solution 2: Executing boxplot in different cell
Update based on comments : clearing plots
There are two ways to clear a plot:
1. Clear the figure itself using clf()
matplotlib.pyplot.clf() clears the current Figure's state without closing it.
2. Clear the axes using cla()
matplotlib.pyplot.cla() clears the current Axes state without closing the Axes.
Simply call plt.clf() after calling fig.savefig().
Read this documentation on how to clear a plot in Python using matplotlib
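A minimal sketch of the clf() approach, assuming a hypothetical helper named save_boxplot and made-up sample frames (neither is from the question). The Agg backend is selected only so the sketch runs without a display:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd


def save_boxplot(df, output):
    """Hypothetical helper: draw a boxplot, save it as PDF, then clear the figure."""
    ax = df.boxplot(rot=45, fontsize=5)
    fig = ax.get_figure()
    fig.savefig(output + ".pdf")
    plt.clf()  # wipe the current figure so the next call starts empty
    return fig


# Made-up sample data, written to a temporary directory
out_dir = tempfile.mkdtemp()
df_a = pd.DataFrame({"opt_time1": [10, 20, 11, 5], "opt_time2": [1, 2, 1, 5]})
df_b = pd.DataFrame({"count_opt1": [30, 40, 45, 29], "count_opt2": [3, 4, 4, 9]})

fig_a = save_boxplot(df_a, os.path.join(out_dir, "bp_a"))
fig_b = save_boxplot(df_b, os.path.join(out_dir, "bp_b"))
```

Because the figure is cleared after every save, the second PDF should contain only the boxes of df_b. Calling plt.close(fig) instead would discard the figure entirely rather than reuse it.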
Just take the code from Archana David and put it in your plot function: the point is to call "fig, ax = plt.subplots()" so that every call creates a new graph.
import pandas as pd
import matplotlib.pyplot as plt
d1 = {'ff_opt_time': [10, 20, 11, 5, 15, 13, 19, 25],
'ff_count_opt': [30, 40, 45, 29, 35, 38, 32, 41]}
df1 = pd.DataFrame(data=d1)
d2 = {'ff_opt_time': [1, 2, 1, 5, 1, 1, 4, 5],
'ff_count_opt': [3, 4, 4, 9, 5, 3, 2, 4]}
df2 = pd.DataFrame(data=d2)
def evaluate2(df1, df2):
    def plot(df, output):
        # a fresh figure per call keeps earlier boxes out of this plot
        fig, ax = plt.subplots()
        df.boxplot(rot=45, fontsize=5, ax=ax)
        fig.savefig(output + ".pdf")

    df_ot = pd.DataFrame(columns=['opt_time1', 'opt_time2'])
    df_ot['opt_time1'] = df1['ff_opt_time']
    df_ot['opt_time2'] = df2['ff_opt_time']
    plot(df_ot, "bp_opt_time")

    df_op = pd.DataFrame(columns=['count_opt1', 'count_opt2'])
    df_op['count_opt1'] = df1['ff_count_opt']
    df_op['count_opt2'] = df2['ff_count_opt']
    plot(df_op, "bp_count_opt_perm")

evaluate2(df1, df2)
Here is my dataframe:
df = pd.DataFrame({"Date":["2020-01-27","2020-02-27","2020-03-27","2020-04-27", "2020-05-27", "2020-06-27", "2020-07-27",
"2020-01-27","2020-02-27","2020-03-27","2020-04-27", "2020-05-27", "2020-06-27", "2020-07-27"],
"A_item":[2, 8, 0, 1, 8, 10, 4, 7, 2, 15, 5, 12, 10, 7],
"B_item":[1, 7, 10, 6, 5, 9, 2, 5, 6, 1, 2, 6, 15, 8],
"C_item":[9, 2, 9, 3, 9, 18, 7, 2, 8, 1, 2, 8, 1, 3],
"Channel_type":["Chanel_1", "Chanel_1", "Chanel_1", "Chanel_1", "Chanel_1", "Chanel_1", "Chanel_1",
"Chanel_2", "Chanel_2", "Chanel_2", "Chanel_2", "Chanel_2", "Chanel_2", "Chanel_2"]
})
I want to plot a grouped bar chart with a dropdown filter on the Channel_type column. This is what I am trying:
trace1 = go.Bar(x=df["Date"], y=df[["A_item"]])
trace2 = go.Bar(x=df["Date"], y=df[["B_item"]])
trace3 = go.Bar(x=df["Date"], y=df[["C_item"]])
list_updatemenus = [{'label': 'All',
'method': 'update',
'args': [{'visible': [True, True]}, {'title': 'All'}]},
{'label': 'Chanel_1',
'method': 'update',
'args': [{'visible': [True, False]}, {'title': 'Chanel_1'}]},
{'label': 'Chanel_2',
'method': 'update',
'args': [{'visible': [False, True]}, {'title': 'Chanel_2'}]}]
data = [trace1,trace2,trace3]
layout=go.Layout(title='Distribution of Sales by Region',updatemenus=list([dict(buttons= list_updatemenus)]),width=1000,height=800,barmode='group')
fig = go.Figure(data,layout)
fig.show()
But I am not getting the desired output: Plot 1
It filters the graph by "A_item", "B_item" and "C_item", whereas I would like to filter it by the Channel_type column, as mentioned.
So the ideal result would be the graph below, but with a dropdown menu that changes the graph based on Channel_type:
Plot 2
I was able to solve the problem with ipywidgets in a Jupyter notebook, but that doesn't really work for my particular task. Here is the code:
from plotly import graph_objs as go
import ipywidgets as w
from IPython.display import display
x = 'Date'
y1 = 'A_item'
y2 = 'B_item'
y3 = 'C_item'
trace1 = {
'x': df[x],
'y': df[y1],
'type': 'bar',
'name':'A_item'
}
trace2={
'x': df[x],
'y': df[y2],
'type': 'bar',
'name':'B_item'
}
trace3 = {
'x': df[x],
'y': df[y3],
'type': 'bar',
'name':'C_item',
}
data = [trace1, trace2, trace3]
# Create layout for the plot
layout=dict(
title='Channels',
width=1200, height=700, title_x=0.5,
paper_bgcolor='#fff',
plot_bgcolor="#fff",
xaxis=dict(
title='Date',
type='date',
tickformat='%Y-%m-%d',
gridcolor='rgb(255,255,255)',
zeroline= False,
),
yaxis=dict(
title='My Y-axis',
zeroline= False
)
)
fig = go.FigureWidget(data=data, layout=layout)
def update_fig(change):
    aux_df = df[df.Channel_type.isin(change['new'])]
    with fig.batch_update():
        for trace, column in zip(fig.data, [y1, y2, y3]):
            trace.x = aux_df[x]
            trace.y = aux_df[column]

drop = w.Dropdown(options=[
('All', ['Chanel_1', 'Chanel_2']),
('Chanel_1', ['Chanel_1']),
('Chanel_2', ['Chanel_2']),
])
drop.observe(update_fig, names='value')
display(w.VBox([drop, fig]))
And here is the output:
The problem is that I am not able to wrap the VBox into an HTML file and keep the dropdown menu. Also, it doesn't work in the Python shell, as it is intended for Jupyter notebooks, and I need to share it.
So the ideal result would be to reproduce the last figure with Plotly alone, without ipywidgets.
Any help would be really appreciated!
Thank you!
The most important thing to note is that a button's args can carry a 2D array for y: with m go.Bar traces and n dates in x, an array of shape (m, n) gives each trace one row, so Plotly draws a grouped bar chart in which each of the n dates has m bars.
For your DataFrame, something like df[df['Channel_type'] == "Channel_1"][items].T.values will reshape it as needed. So we can apply this to the y field of the args that we pass to the buttons we make.
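As a small illustration of the reshaping described above, here is a sketch on a cut-down, made-up frame (two items, two dates per channel). It only shows the shape of the array, not the Plotly figure:

```python
import pandas as pd

# Shortened, made-up version of the question's data
df_small = pd.DataFrame({
    "Date": ["2020-01-27", "2020-02-27", "2020-01-27", "2020-02-27"],
    "A_item": [2, 8, 7, 2],
    "B_item": [1, 7, 5, 6],
    "Channel_type": ["Channel_1", "Channel_1", "Channel_2", "Channel_2"],
})
items_small = ["A_item", "B_item"]

# Filter one channel, keep the item columns, then transpose:
# one row per item (trace), one column per date
y_channel_1 = df_small[df_small["Channel_type"] == "Channel_1"][items_small].T.values

print(y_channel_1.shape)     # (2, 2)
print(y_channel_1.tolist())  # [[2, 8], [1, 7]]
```

Each row of the result holds the y values of one item for the selected channel, which is the layout the button args rely on.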
Credit to @vestland for the portion of the code that adjusts the buttons to make them a dropdown.
import pandas as pd
import plotly.graph_objects as go
df = pd.DataFrame({"Date":["2020-01-27","2020-02-27","2020-03-27","2020-04-27", "2020-05-27", "2020-06-27", "2020-07-27",
"2020-01-27","2020-02-27","2020-03-27","2020-04-27", "2020-05-27", "2020-06-27", "2020-07-27"],
"A_item":[2, 8, 0, 1, 8, 10, 4, 7, 2, 15, 5, 12, 10, 7],
"B_item":[1, 7, 10, 6, 5, 9, 2, 5, 6, 1, 2, 6, 15, 8],
"C_item":[9, 2, 9, 3, 9, 18, 7, 2, 8, 1, 2, 8, 1, 3],
"Channel_type":["Channel_1", "Channel_1", "Channel_1", "Channel_1", "Channel_1", "Channel_1", "Channel_1",
"Channel_2", "Channel_2", "Channel_2", "Channel_2", "Channel_2", "Channel_2", "Channel_2"]
})
fig = go.Figure()
colors = ['#636efa','#ef553b','#00cc96']
items = ["A_item","B_item","C_item"]
for item, color in zip(items, colors):
    fig.add_trace(go.Bar(
        x=df["Date"], y=df[item], marker_color=color
    ))
# one button for each channel
# slice the DataFrame and apply transpose to reshape it correctly
updatemenu = []
buttons = []
for channel in df['Channel_type'].unique():
    buttons.append(dict(method='update',
                        label=channel,
                        args=[{
                            'y': df[df['Channel_type'] == channel][items].T.values
                        }])
                   )
## add a button for both channels
buttons.append(dict(
method='update',
label='Both Channels',
args=[{
'y': df[items].T.values
}])
)
# some adjustments to the updatemenu
# from code by vestland
updatemenu=[]
your_menu=dict()
updatemenu.append(your_menu)
updatemenu[0]['buttons']=buttons
updatemenu[0]['direction']='down'
updatemenu[0]['showactive']=True
fig.update_layout(updatemenus=updatemenu)
fig.show()
I have a JSON document that I need to convert to Excel.
I'm using Python 3.8 with the xlsxwriter library.
Below is a sample JSON.
{
"companyId": "123456",
"companyName": "Test",
"companyStatus": "ACTIVE",
"document": {
"employee": {
"employeeId": "EM1567",
"employeeLastName": "Test Last",
"employeeFirstName": "Test Fist"
},
"expenseEntry": [
{
"allocation": [
{
"allocationId": "03B249B3598",
"journal": [
{
"journalAccountCode": "888",
"journalPayee": "EMPL",
"journalPayer": "COMP",
"taxGuid": [
"51645A638114E"
]
},
{
"journalAccountCode": "999",
"journalPayee": "EMPL",
"journalPayer": "EMPL",
"taxGuid": [
"8114E51645A63"
]
}
],
"tax": [
{
"taxCode": "TAX123",
"taxSource": "SYST"
},
{
"taxCode": "TAX456",
"taxSource": "SYST"
}
]
}
],
"approvedAmount": 200.0,
"entryDate": "2020-12-10",
"entryId": "ENTRY9988"
}
],
"report": {
"currencyCode": "USD",
"reportCreationDate": "2020-12-10",
"reportId": "ACA849BBB",
"reportName": "Test Report",
"totalApprovedAmount": 200.0
}
},
"id": "c71b7d756f549"
}
And my current code:
https://repl.it/#tonyiscoming/jsontoexcel
I tried with pandas
import pandas as pd
df = pd.json_normalize(data, max_level=5)
df.to_excel('test.xlsx', index=False)
And got the result
I tried with json_excel_converter
from json_excel_converter import Converter
from json_excel_converter.xlsx import Writer
conv = Converter()
conv.convert(data, Writer(file='test.xlsx'))
And got the result
This is my expectation
Would anyone please help me in this case? Thank you so much.
Here is the code you are looking for. I did this using the XlsxWriter package. First I built the template with some cell formatting. After that, I filled in the values according to your JSON.
import xlsxwriter
from itertools import zip_longest
data = [
{
"companyId": "123456",
"companyName": "Test",
"companyStatus": "ACTIVE",
"document": {
"employee": {
"employeeId": "EM1567",
"employeeLastName": "Test Last",
"employeeFirstName": "Test Fist"
},
"expenseEntry": [
{
"allocation": [
{
"allocationId": "03B249B3598",
"journal": [
{
"journalAccountCode": "888",
"journalPayee": "EMPL",
"journalPayer": "COMP",
"taxGuid": [
"51645A638114E"
]
},
{
"journalAccountCode": "999",
"journalPayee": "EMPL",
"journalPayer": "EMPL",
"taxGuid": [
"8114E51645A63"
]
},
],
"tax": [
{
"taxCode": "TAX123",
"taxSource": "SYST"
},
{
"taxCode": "TAX456",
"taxSource": "SYST"
}
]
}
],
"approvedAmount": 200.0,
"entryDate": "2020-12-10",
"entryId": "ENTRY9988"
}
],
"report": {
"currencyCode": "USD",
"reportCreationDate": "2020-12-10",
"reportId": "ACA849BBB",
"reportName": "Test Report",
"totalApprovedAmount": 200.0
}
},
"id": "c71b7d756f549"
}
]
xlsx_file = 'your_file_name_here.xlsx'
# define the excel file
workbook = xlsxwriter.Workbook(xlsx_file)
# create a sheet for our work, defaults to Sheet1.
worksheet = workbook.add_worksheet()
# common merge format
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter'})
# set all column width to 20
worksheet.set_column('A:V', 20)
# column wise template creation (A-V)
worksheet.merge_range(0, 0, 4, 0, 'companyId', merge_format) # A
worksheet.merge_range(0, 1, 4, 1, 'companyName', merge_format) # B
worksheet.merge_range(0, 2, 4, 2, 'companyStatus', merge_format) # C
worksheet.merge_range(0, 3, 0, 20, 'document', merge_format) # D-U
worksheet.merge_range(1, 3, 1, 5, 'employee', merge_format) # D-F
worksheet.merge_range(2, 3, 4, 3, 'employeeId', merge_format) # D
worksheet.merge_range(2, 4, 4, 4, 'employeeLastName', merge_format) # E
worksheet.merge_range(2, 5, 4, 5, 'employeeFirstName', merge_format) # F
worksheet.merge_range(1, 6, 1, 15, 'expenseEntry', merge_format) # G-P
worksheet.merge_range(2, 6, 2, 12, 'allocation', merge_format) # G-M
worksheet.merge_range(3, 6, 4, 6, 'allocationId', merge_format) # G
worksheet.merge_range(3, 7, 3, 10, 'journal', merge_format) # H-K
worksheet.write(4, 7, 'journalAccountCode') # H
worksheet.write(4, 8, 'journalPayee') # I
worksheet.write(4, 9, 'journalPayer') # J
worksheet.write(4, 10, 'taxGuid') # K
worksheet.merge_range(3, 11, 3, 12, 'tax', merge_format) # L-M
worksheet.write(4, 11, 'taxCode') # L
worksheet.write(4, 12, 'taxSource') # M
worksheet.merge_range(2, 13, 4, 13, 'approvedAmount', merge_format) # N
worksheet.merge_range(2, 14, 4, 14, 'entryDate', merge_format) # O
worksheet.merge_range(2, 15, 4, 15, 'entryId', merge_format) # P
worksheet.merge_range(1, 16, 1, 20, 'report', merge_format) # Q-U
worksheet.merge_range(2, 16, 4, 16, 'currencyCode', merge_format) # Q
worksheet.merge_range(2, 17, 4, 17, 'reportCreationDate', merge_format) # R
worksheet.merge_range(2, 18, 4, 18, 'reportId', merge_format) # S
worksheet.merge_range(2, 19, 4, 19, 'reportName', merge_format) # T
worksheet.merge_range(2, 20, 4, 20, 'totalApprovedAmount', merge_format) # U
worksheet.merge_range(0, 21, 4, 21, 'id', merge_format) # V
# inserting data
row = 5
for obj in data:
    worksheet.write(row, 0, obj.get('companyId'))
    worksheet.write(row, 1, obj.get('companyName'))
    worksheet.write(row, 2, obj.get('companyStatus'))
    document = obj.get('document', {})
    # employee details
    employee = document.get('employee', {})
    worksheet.write(row, 3, employee.get('employeeId'))
    worksheet.write(row, 4, employee.get('employeeLastName'))
    worksheet.write(row, 5, employee.get('employeeFirstName'))
    # report details
    report = document.get('report', {})
    worksheet.write(row, 16, report.get('currencyCode'))
    worksheet.write(row, 17, report.get('reportCreationDate'))
    worksheet.write(row, 18, report.get('reportId'))
    worksheet.write(row, 19, report.get('reportName'))
    worksheet.write(row, 20, report.get('totalApprovedAmount'))
    worksheet.write(row, 21, obj.get('id'))
    # expenseEntry details
    expense_entries = document.get('expenseEntry', [])
    for expense_entry in expense_entries:
        worksheet.write(row, 13, expense_entry.get('approvedAmount'))
        worksheet.write(row, 14, expense_entry.get('entryDate'))
        worksheet.write(row, 15, expense_entry.get('entryId'))
        # allocation details
        allocations = expense_entry.get('allocation', [])
        for allocation in allocations:
            worksheet.write(row, 6, allocation.get('allocationId'))
            # journal and tax details
            journals = allocation.get('journal', [])
            taxes = allocation.get('tax', [])
            # fillvalue={} keeps .get() safe when the two lists differ in length
            for journal, tax in zip_longest(journals, taxes, fillvalue={}):
                worksheet.write(row, 7, journal.get('journalAccountCode'))
                worksheet.write(row, 8, journal.get('journalPayee'))
                worksheet.write(row, 9, journal.get('journalPayer'))
                worksheet.write(row, 11, tax.get('taxCode'))
                worksheet.write(row, 12, tax.get('taxSource'))
                # taxGuid details
                tax_guides = journal.get('taxGuid', [])
                if not tax_guides:
                    row = row + 1
                    continue
                for tax_guide in tax_guides:
                    worksheet.write(row, 10, tax_guide)
                    row = row + 1
# finally close the created excel file
workbook.close()
One more thing: instead of creating the template in the script, you can make your own and save it somewhere. Then take a copy of that template and just add the data using the script. This lets you design your own base template; otherwise, you have to do all the Excel formatting in the script, such as borders, merged cells, etc.
I used zip_longest from Python's standard itertools module to zip the journal and tax objects. Just follow the Python – Itertools.zip_longest() or Python's zip_longest Function article for examples. If anything in my code is unclear, please comment below.
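For readers who have not met zip_longest, here is a small standalone sketch of how it pairs two lists of unequal length. The third journal entry ("777") is made up for illustration, and passing fillvalue={} is one possible way to keep .get() calls safe on the padded side:

```python
from itertools import zip_longest

# Sample lists; the "777" journal is hypothetical, added so the lengths differ
journals = [
    {"journalAccountCode": "888"},
    {"journalAccountCode": "999"},
    {"journalAccountCode": "777"},
]
taxes = [
    {"taxCode": "TAX123"},
    {"taxCode": "TAX456"},
]

# zip_longest keeps going until the longer list is exhausted and
# pads the shorter one with fillvalue (None if not given)
rows = [
    (journal.get("journalAccountCode"), tax.get("taxCode"))
    for journal, tax in zip_longest(journals, taxes, fillvalue={})
]

print(rows)  # [('888', 'TAX123'), ('999', 'TAX456'), ('777', None)]
```

With the plain built-in zip, the third journal would be dropped silently, which is why zip_longest suits a sheet where every journal and every tax needs its own row.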
Having empty cells in an Excel grid is not really "proper", which is why json_excel_converter behaves like this.
So, if you want to achieve this, I'm afraid you'll have to develop it all yourself.
I'm studying the Dash library.
This code shows a scatter plot when I select a column of the data frame.
It works without any problem, but a callback error appears on the web page:
Callback error updating spas-graph.figure
I can't understand why this error occurs.
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.graph_objects as go
import pandas as pd
df = pd.DataFrame({
'depth' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'upper_value' : [1, 4, 6, 2, 6, 8, 9, 10, 4, 2],
'middle_value' : [5, 3, 7, 8, 1, 2, 3, 1, 4, 8],
'down_value' : [6, 2, 1, 10, 5, 2, 3, 4, 2, 7]
})
col_list = df.columns[1:4]
app = dash.Dash(__name__)
app.layout = html.Div([
dcc.Dropdown(
id = 'select-cd',
options = [
{'label' : i, 'value' : i}
for i in col_list
]
),
dcc.Graph(id = 'spas-graph')
])
@app.callback(
    Output('spas-graph', 'figure'),
    [Input('select-cd', 'value')]
)
def update_figure(selected_col):
    return {
        'data' : [go.Scatter(
            x = df[selected_col],
            y = df['depth'],
            mode = 'lines + markers',
            marker = {
                'size' : 15,
                'opacity' : 0.5,
                'line' : {'width' : 0.5, 'color' : 'white'}
            }
        )],
        'layout' : go.Layout(
            xaxis={'title': 'x_scale'},
            yaxis={'title': 'y_scale'},
            hovermode='closest'
        )
    }

if __name__ == '__main__':
    app.run_server(debug=True)
You have not set the value parameter of your Dropdown. So when the server starts, the first input the callback receives is None.
You can solve it in two ways:
1. Add a default value to the Dropdown.
2. Handle the None value in the callback.
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.graph_objects as go
import pandas as pd
df = pd.DataFrame({
'depth' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'upper_value' : [1, 4, 6, 2, 6, 8, 9, 10, 4, 2],
'middle_value' : [5, 3, 7, 8, 1, 2, 3, 1, 4, 8],
'down_value' : [6, 2, 1, 10, 5, 2, 3, 4, 2, 7]
})
col_list = df.columns[1:4]
app = dash.Dash(__name__)
app.layout = html.Div([
dcc.Dropdown(
id = 'select-cd',
options = [
{'label' : i, 'value' : i}
for i in col_list
],
value = col_list[0]
),
dcc.Graph(id = 'spas-graph')
])
@app.callback(
    Output('spas-graph', 'figure'),
    [Input('select-cd', 'value')]
)
def update_figure(selected_col):
    # fall back to the first column if nothing is selected yet
    if selected_col is None:
        selected_col = col_list[0]
    return {
        'data' : [go.Scatter(
            x = df[selected_col],
            y = df['depth'],
            mode = 'lines + markers',
            marker = {
                'size' : 15,
                'opacity' : 0.5,
                'line' : {'width' : 0.5, 'color' : 'white'}
            }
        )],
        'layout' : go.Layout(
            xaxis={'title': 'x_scale'},
            yaxis={'title': 'y_scale'},
            hovermode='closest'
        )
    }
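To see why a None selection breaks the callback, the following pandas-only sketch reproduces the failure outside Dash. The helper pick_column and the shortened frame are hypothetical, used only for illustration:

```python
import pandas as pd

# Shortened version of the question's frame
df_demo = pd.DataFrame({
    "depth": [1, 2, 3],
    "upper_value": [1, 4, 6],
    "middle_value": [5, 3, 7],
})
demo_cols = list(df_demo.columns[1:])


def pick_column(selected_col, default):
    """Hypothetical helper: use the default when the dropdown sends None."""
    return default if selected_col is None else selected_col


# Indexing with None is what happens on the first callback without a default value
try:
    df_demo[None]
    lookup_failed = False
except KeyError:
    lookup_failed = True

safe_col = pick_column(None, demo_cols[0])
print(lookup_failed, safe_col)  # True upper_value
```

Either fix in the answer avoids this lookup: a default value means None is never sent on startup, and the guard replaces None before the DataFrame is indexed.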
The plot I am trying to make needs to achieve 3 things.
If a quiz is taken on the same day with the same score, that point needs to be bigger.
If two quiz scores overlap there needs to be some jitter so we can see all points.
Each quiz needs to have its own color
Here is how I am going about it.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'Quiz': [1, 1, 2, 1, 2, 1],
'Score': [7.5, 5.0, 10, 10, 10, 10],
'Day': [2, 5, 5, 5, 11, 11],
'Size': [115, 115, 115, 115, 115, 355]}
df = pd.DataFrame.from_dict(data)
sns.lmplot(x = 'Day', y='Score', data = df, fit_reg=False, x_jitter = True, scatter_kws={'s': df.Size})
plt.show()
Setting the hue, which almost does everything I need, results in this.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'Quiz': [1, 1, 2, 1, 2, 1],
'Score': [7.5, 5.0, 10, 10, 10, 10],
'Day': [2, 5, 5, 5, 11, 11],
'Size': [115, 115, 115, 115, 115, 355]}
df = pd.DataFrame.from_dict(data)
sns.lmplot(x = 'Day', y='Score', data = df, fit_reg=False, hue = 'Quiz', x_jitter = True, scatter_kws={'s': df.Size})
plt.show()
Is there a way I can have hue while keeping the size of my points?
It doesn't work because, when you use hue, seaborn draws a separate scatter plot for each hue level, so the size argument you pass through scatter_kws= no longer lines up with the rows of the dataframe.
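A small pandas-only sketch of that mismatch, using the question's data: once the rows are split by Quiz, each subset has fewer rows than the full Size column, so a size array taken from the whole frame cannot match a subset row for row.

```python
import pandas as pd

data = {'Quiz': [1, 1, 2, 1, 2, 1],
        'Score': [7.5, 5.0, 10, 10, 10, 10],
        'Day': [2, 5, 5, 5, 11, 11],
        'Size': [115, 115, 115, 115, 115, 355]}
df_quiz = pd.DataFrame(data)

# Sizes that actually belong to each hue level
sizes_per_quiz = {int(quiz): group['Size'].tolist()
                  for quiz, group in df_quiz.groupby('Quiz')}

# The full, unsplit size column that scatter_kws={'s': df.Size} hands over
full_sizes = df_quiz['Size'].tolist()

print(sizes_per_quiz)   # {1: [115, 115, 115, 355], 2: [115, 115]}
print(len(full_sizes))  # 6
```

The large point (355) is the fourth row of Quiz 1 but the sixth entry of the full column, which is why the sizes come out wrong once hue is set.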
You can, however, recreate the same effect by hand:
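import numpy as np
import matplotlib.pyplot as plt

x_col = 'Day'
y_col = 'Score'
hue_col = 'Quiz'
size_col = 'Size'
jitter = 0.2
fig, ax = plt.subplots()
# one scatter call per quiz, so sizes stay aligned with their rows
for q, temp in df.groupby(hue_col):
    n = len(temp[x_col])
    # add Gaussian noise to x so overlapping points separate
    x = temp[x_col] + np.random.normal(scale=jitter, size=(n,))
    ax.scatter(x, temp[y_col], s=temp[size_col], label=q)
ax.set_xlabel(x_col)
ax.set_ylabel(y_col)
ax.legend(title=hue_col)
plt.show()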