How to add Prefix to rSquared extracted from Altair?

I'm adding the rSquared to a chart using the method outlined in this answer:
r2 = alt.Chart(df).transform_regression('x', 'y', params=True
).mark_text().encode(x=alt.value(20), y=alt.value(20), text=alt.Text('rSquared:N', format='.4f'))
But I want to prepend "rSquared = " to the final text.
I've seen this answer involving an f string and a value calculated outside the chart, but I'm not clever enough to figure out how to apply that solution to this scenario.
I've tried, e.g., format='rSquared = .4f', but adding any additional text breaks the output, which I'm sure is the system working as intended.

One possible solution using the posts you linked to would be to extract the value of the parameter using altair_transform and then add the value to the plot. This is not the most elegant solution but should achieve what you want.
# pip install git+https://github.com/altair-viz/altair-transform.git
import altair as alt
import pandas as pd
import numpy as np
import altair_transform
np.random.seed(42)
x = np.linspace(0, 10)
y = x - 5 + np.random.randn(len(x))
df = pd.DataFrame({'x': x, 'y': y})
chart = alt.Chart(df).mark_point().encode(
x='x',
y='y'
)
line = chart.transform_regression('x', 'y').mark_line()
params = chart.transform_regression('x','y', params=True).mark_line()
R2 = altair_transform.extract_data(params)['rSquared'][0]
text = alt.Chart({'values':[{}]}).mark_text(
align="left", baseline="top"
).encode(
x=alt.value(5), # pixels from left
y=alt.value(5), # pixels from top
text=alt.value(f"rSquared = {R2:.4f}"),
)
chart + line + text

Altair: adding sorting destroys chart

The following code produces a column chart in which the y axis grows in the wrong direction.
alt.Chart(df).mark_line().encode(
x = alt.X('pub_date', timeUnit='month'),
y = alt.Y('sum(has_kw)', ),
)
I wanted to correct it as suggested by https://stackoverflow.com/a/58326269, and changed my code to
alt.Chart(df).mark_line().encode(
x = alt.X('pub_date', timeUnit='month'),
y = alt.Y('sum(has_kw)', sort=alt.EncodingSortField('y', order='descending') ),
)
But now altair produces a strange diagram, see 2.
That is, sum(has_kw) is calculated wrong. Why this, and how to correct it?

It is hard to know exactly without seeing a sample of your data but you could try one of the following (based on the example you linked). This first approach is similar to what you tried already:
import altair as alt
import numpy as np
import pandas as pd
# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(0, 3), range(0, 3))
z = x ** 2 + y ** 2
# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({
'x': x.ravel(),
'y': y.ravel(),
'z': z.ravel()
})
alt.Chart(source).mark_rect().encode(
x='x:O',
y=alt.Y('y:O', sort='descending'),
color='z:Q'
)
This second approach simply reverses the axis without sorting it and might be more compatible with your data:
alt.Chart(source).mark_rect().encode(
x='x:O',
y=alt.Y('y:O', scale=alt.Scale(reverse=True)),
color='z:Q'
)

Seeking to modify code to pull data from an excel sheet where column A has X numeric value. ( ie all rows with value= 0 )

Just to be upfront, I am a Mechanical Engineer with limited coding experience, though I have some programming classes under my belt (Java, C++, and Lisp).
I have inherited this code from my predecessor and am just trying to make it work for what I'm doing with it. I need to iterate through an Excel file whose column A holds the values 0, 1, 2, and 3 (in the code below this corresponds to "Revs"). I need to pick out all the rows with value = 0 and put them into a separate folder, then do the same for value = 2, and so on. Thank you for bearing with me; I appreciate any help I can get.
import pandas as pd
import numpy as np
import os
import os.path
import xlsxwriter
import matplotlib.pyplot as plt
import six
import matplotlib.backends.backend_pdf
from matplotlib.gridspec import GridSpec
from matplotlib.ticker import AutoMinorLocator, MultipleLocator
def CamAnalyzer(entryName):
    #Enter excel data from file as a dataframe
    df = pd.read_excel (str(file_loc) + str(entryName), header = 1) #header 1 to get correct header row
    print (df)
    #setup grid for plots
    plt.style.use('fivethirtyeight')
    fig = plt.figure(figsize=(17,22))
    gs = GridSpec(3,2, figure=fig)
    props = dict(boxstyle='round', facecolor='w', alpha=1)
    #create a list of 4 smaller dataframes by splitting df when the rev count changes and name them
    dfSplit = list(df.groupby("Revs"))
    names = ["Air Vent","Inlet","Diaphram","Outlet"]
    for x, y in enumerate(dfSplit):
        #for each smaller dataframe #x,(df-y), create a polar plot and assign it to a space in the grid
        dfs = y[1]
        r = dfs["Measurement"].str.strip(" in") #radius measurement column has units. ditch em
        r = r.apply(pd.to_numeric) + zero/2 #convert all values in the frame to a float
        theta = dfs["Rads"]
        if x<2:
            ax = fig.add_subplot(gs[1,x],polar = True)
        else:
            ax = fig.add_subplot(gs[2,x-2],polar = True)
        ax.set_rlim(0,0.1) #set limits to radial axis
        ax.plot(theta, r)
        ax.grid(True)
        ax.set_title(names[x]) #nametag
    #create another subplot in the grid that overlays all 4 smaller dataframes on one plot
    ax2 = fig.add_subplot(gs[0,:],polar = True)
    ax2.set_rlim(0,0.1)
    for x, y in enumerate(dfSplit):
        dfs = y[1]
        r = dfs["Measurement"].str.strip(" in")
        r = r.apply(pd.to_numeric) + zero/2
        theta = dfs["Rads"]
        ax2.plot(theta, r)
    ax2.set_title("Sample " + str(entryName).strip(".xlsx") + " Overlayed")
    ax2.legend(names,bbox_to_anchor=(1.1, 1.05)) #place legend outside of plot area
    plt.savefig(str(file_loc) + "/Results/" + str(entryName).strip(".xlsx") + ".png")
    print("Results Saved")

I'm on my phone, so I can't check exact code examples, but this should get you started.
First, most of the code you posted is about graphing, and therefore not useful for your needs. The basic approach: use pandas (a library) to read in the Excel sheet, use the pandas function 'groupby' to split that sheet by 'Revs', then iterate through each group and use pandas again to write it back to a file. Copying the relevant sections from above:
#this brings in the necessary library
import pandas as pd
#Read excel data from file as a dataframe
#header should point to the row that describes your columns. The first row is row 0.
df = pd.read_excel("filename.xlsx", header = 1)
#create a list of 4 smaller dataframes using GroupBy.
#This returns a 'GroupBy' object.
dfSplit = df.groupby("Revs")
#iterate through the groupby object, saving each
#iterating over key (name) and value (dataframes)
#use the name to build a filename
for name, frame in dfSplit:
    frame.to_excel("Rev "+str(name)+".xlsx")
Edit: I had a chance to test this code, and it should now work. This will depend a little on your actual file (eg, which row is your header row).
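Since the question asks for each value to go into a separate folder, the loop above could be extended roughly as follows. This is a sketch under assumptions: the helper name split_by_revs, the folder and file naming, and the sample data are all made up, and CSV is used here only so the example runs without an Excel writer installed (frame.to_excel could be swapped back in).

```python
from pathlib import Path
import tempfile

import pandas as pd


def split_by_revs(df, out_dir):
    """Write the rows for each distinct 'Revs' value into its own sub-folder."""
    written = []
    for rev, frame in df.groupby("Revs"):
        folder = Path(out_dir) / f"Rev_{rev}"       # hypothetical naming scheme
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / f"rev_{rev}.csv"
        frame.to_csv(target, index=False)           # or frame.to_excel(...)
        written.append(target)
    return written


# Made-up data standing in for the real spreadsheet
df = pd.DataFrame({
    "Revs": [0, 0, 1, 2, 2, 3],
    "Rads": [0.0, 1.5, 0.0, 0.0, 1.5, 0.0],
})

out_dir = tempfile.mkdtemp()
paths = split_by_revs(df, out_dir)

# A single value can also be picked out directly with a boolean filter
only_zero = df[df["Revs"] == 0]
```

The boolean filter on the last line is the simplest option when only one value is needed; groupby is more convenient when every value should be exported.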

How to update histogram based on selection of points (Altair)?

My goal is to update the histogram shown on the right side of image 1 based on the selection of points on the left side.
Initially the plot looks fine, but once a selection is made the histogram is not redrawn (Altair 3.2.0).
Does anybody know how to do this?
Below is the code to recreate the example:
import altair as alt
import pandas as pd
import numpy as np
from random import choice
dates = pd.date_range("1.1.2019", "2.28.2019")
np.random.seed(999)
Y = np.random.normal(0.5, 0.1, len(dates))
features = [choice(["a", "b", "c"]) for i in range(len(dates))]
df = pd.DataFrame({"dates": dates, "Y": Y, "features": features})
base = alt.Chart(df)
area_args = {"opacity": 0.3, "interpolate": "step"}
pts = alt.selection(type="interval", encodings=["x"])
points = (
base.mark_circle()
.encode(alt.X("dates:T"), alt.Y("Y:Q"), color="features:N")
.add_selection(pts)
)
yscale = alt.Scale(domain=(0, 1))
right_hist = (
base.mark_area(**area_args)
.encode(
alt.Y(
"Y:Q", bin=alt.Bin(maxbins=20, extent=yscale.domain), stack=None, title=""
),
alt.X("count()", stack=None, title=""),
alt.Color("features:N"),
)
.transform_filter(pts)
)
(points | right_hist)
edit1: another image to clarify my point #jvp

Solved in the comments: the issue was with the OP's setup and how the plots were rendered on their end.

Only getting one bar in bqplot chart

I have some data of the form:
Name Score1 Score2 Score3 Score4
Bob -2 3 5 7
and I'm trying to use bqplot to plot a really basic bar chart.
I'm trying:
sc_ord = OrdinalScale()
y_sc_rf = LinearScale()
bar_chart = Bars(x=data6.Name,
y=[data6.Score1, data6.Score2, data6.Score3],
scales={'x': sc_ord, 'y': y_sc_rf},
labels=['Score1', 'Score2', 'Score3'],
)
ord_ax = Axis(label='Score', scale=sc_ord, grid_lines='none')
y_ax = Axis(label='Scores', scale=y_sc_rf, orientation='vertical',
grid_lines='solid')
Figure(axes=[ord_ax, y_ax], marks=[bar_chart])
But all I'm getting is one bar, I assume because Name only has one value. Is there a way to set the column headers as the x data, or some other way to solve this?

I think this is what Doug is getting at. Your x and y data should be the same length. In this case, x is the column labels, and y is the score values. You should set the 'Name' column of your DataFrame as the index; this will prevent it from being plotted as a value.
PS. Next time, if you make sure your code is a complete example that can be run from scratch without external data (an MCVE, https://stackoverflow.com/help/mcve), you are likely to get a much quicker answer.
The bqplot documentation has lots of good examples that are worth reading, including this notebook for the Object Model interface used here: https://github.com/bloomberg/bqplot/blob/master/examples/Marks/Object%20Model/Bars.ipynb
from bqplot import Axis, Bars, Figure, LinearScale, OrdinalScale
import pandas as pd
# 'Name' is used as the index so it is not plotted as a value
data = pd.DataFrame(
    index = ['Bob'],
    columns = ['score1', 'score2', 'score3', 'score4'],
    data = [[-2, 3,5,7]]
)
sc_ord = OrdinalScale()
y_sc_rf = LinearScale()
# x is the column labels and y the matching scores, so both have length 4
bar_chart = Bars(x=data.columns, y = data.iloc[0],
                 scales={'x': sc_ord, 'y': y_sc_rf},
                 labels=data.index.tolist(),  # one label per plotted row
                 )
ord_ax = Axis(label='Score', scale=sc_ord, grid_lines='none')
y_ax = Axis(label='Scores', scale=y_sc_rf, orientation='vertical',
            grid_lines='solid')
Figure(axes=[ord_ax, y_ax], marks=[bar_chart])

Python on Raspberry Pi/Using Plotly to graph 5 temperatures . x values duplicated in data. Updating every minute. Not streaming. fileopt = "extend"

I'm recording data from 5 temperature sensors using a Raspberry Pi running Python 3.
All is working well and I now want to display plots of the 5 temperatures on one graph, updating every 10 minutes or so. I'd like to use Plotly.
I wrote the following code to test out the idea.
#many_lines2
# trying to sort out why x is sent more than once when using extend
import time
import plotly.plotly as py
from plotly.graph_objs import *
import plotly.tools as tls
#tls.set_credentials_file(username=, api_key)
from datetime import datetime
for count in range (1,5):
    x1 = count
    y1 = count * 2
    y2 = count * 3
    y3 = count * 4
    trace1 = Scatter(x=x1,y = y1,mode = "lines")
    trace2 = Scatter(x=x1,y = y2,mode = "lines")
    trace3 = Scatter(x=x1,y = y3,mode = "lines")
    data = [trace1,trace2,trace3]
    py.plot (data,filename="3lines6", fileopt = "extend")
    time.sleep(60)
See the plot and the data received by Plotly here: https://plot.ly/~steverat/334/trace0-trace1-trace2/
See the data tab for the data received by Plotly.
It looks to me as though the x value in the data table has been added three times after the first values were sent.
I can get the right results by using .append in Python to create lists of values. This leads to long lists and more data being sent to Plotly, and seems just wrong.
The code to do this is below, and the data on the Plotly server can be found here: https://plot.ly/~steverat/270
# using lists and append to send data to plotly
import time
import plotly.plotly as py
from plotly.graph_objs import *
import plotly.tools as tls
#tls.set_credentials_file(username='steverat', api_key='f0qs8y2vj8')
from datetime import datetime
xlist = []
y1list= []
y2list = []
y3list = []
for count in range (1,5):
    xlist.append (count)
    y1list.append (count * 2)
    y2list.append (count * 3)
    y3list.append (count * 4)
    print("xlist = ", xlist)
    print("y1list = ", y1list)
    print("y2list = ", y2list)
    trace1 = Scatter(x=xlist,y = y1list,mode = "lines")
    trace2 = Scatter(x=xlist,y = y2list,mode = "lines")
    trace3 = Scatter(x=xlist,y = y3list,mode = "lines")
    data = [trace1,trace2,trace3]
    py.plot (data,filename="3lines2")
    time.sleep(60)
I've searched the web and can find examples where data is streamed, but I only want to update the plots every 10 minutes or longer.
Have I missed something obvious?
Cheers
Steve

Andrew from Plotly here. Thanks very much for documenting this so well!
EDIT
This issue should now be fixed, which makes the following workaround obsolete/incorrect. Please don't use the following workaround anymore! (keeping it here for documentation though)
TL;DR (just make it work)
Try this code out:
import time
import plotly.plotly as py
from plotly.graph_objs import Figure, Scatter
filename = 'Stack Overflow 31436471'
# setup original figure (behind the scenes, Plotly assumes you're sharing that x source)
# note that the x arrays are all the same and the y arrays are all *different*
fig = Figure()
fig['data'].append(Scatter(x=[0], y=[1], mode='lines'))
fig['data'].append(Scatter(x=[0], y=[2], mode='lines'))
fig['data'].append(Scatter(x=[0], y=[3], mode='lines'))
print(py.plot(fig, filename=filename, auto_open=False))
# --> https://plot.ly/~theengineear/4949
# start extending the plots
for i in range(1, 3):
    x1 = i
    y1 = i * 2
    y2 = i * 3
    y3 = i * 4
    fig = Figure()
    # note that since the x arrays are shared, you only need to extend one of them
    fig['data'].append(Scatter(x=x1, y=y1))
    fig['data'].append(Scatter(y=y2))
    fig['data'].append(Scatter(y=y3))
    py.plot(fig, filename=filename, fileopt='extend', auto_open=False)
    time.sleep(2)
More info
This appears to be a bug in our backend code. The issue is that we reuse data arrays that hash to the same value. In this case your x value is hashing to the same value and when you go to extend the traces you're actually extending the same x array three times.
The fix proposed above has you only extend one of the x arrays, which is the same array being used by the other traces anyhow.
Do note that for this to work you must supply a non-zero length array in the initial setup. This is because Plotly won't save an array if it doesn't have any data to begin with.
The takeaway is that you'll be A-OK as long as you initialize identical x arrays and ensure that in the initialization your y arrays aren't also identical to any of the x arrays.
Apologies for the inconvenient workaround. I'll edit this response when a fix has been submitted on our end.
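The shared-array behaviour described above can be mimicked with plain Python lists. The following is a toy model only, not Plotly's backend code: the dictionaries and the helper names make_traces, extend_all and extend_first_x_only are invented purely to show why extending every trace's x triples the values when all traces point at one array.

```python
def make_traces():
    """Three traces whose 'x' entries all reference the same list object."""
    shared_x = [0]
    return [{"x": shared_x, "y": [y0]} for y0 in (1, 2, 3)]


def extend_all(traces, x, ys):
    """Naive extend: append x to every trace, hitting the shared list 3 times."""
    for trace, y in zip(traces, ys):
        trace["x"].append(x)
        trace["y"].append(y)


def extend_first_x_only(traces, x, ys):
    """Workaround: append x once; every trace sees it through the shared list."""
    traces[0]["x"].append(x)
    for trace, y in zip(traces, ys):
        trace["y"].append(y)


naive = make_traces()
extend_all(naive, 1, (2, 3, 4))
print(naive[0]["x"])   # x has been appended once per trace

fixed = make_traces()
extend_first_x_only(fixed, 1, (2, 3, 4))
print(fixed[0]["x"])   # x has been appended only once
```

In this model the naive version ends up with three copies of each new x value, which matches the duplicated x column reported in the question, while the workaround keeps x and y the same length.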
