NaN values in bokeh plot - python

I asked this question on the Bokeh Google group earlier (link) and learned quite a bit from the helpful answer provided by Sarah Bird. I am posting the answer here for anyone who runs into something similar. I was using Bokeh 0.9.2 at the time.
I was trying to build a bubble chart for a batch of commercial leases where:
x axis represents the date the lease will end (Datetime: 2015 - 2020)
y axis represents the rent level (Float: 200 - 500)
size / radius of the circles represents the size of the premises (Float: 200 - 8,000 - GFA)
color of the circle represents the floor # (Integer: 1 - 40)
My attempt:
import pandas as pd
import numpy as np
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure, ColumnDataSource
from datetime import datetime
output_notebook()
PATH = ''
filename = 'test_RR.xlsx'
df = pd.read_excel(PATH + filename)
df['TA_END'] = np.where(pd.isnull(df['ET Date']), df.L_END, np.where(df['ET Date'] < df.L_END, df['ET Date'], df.L_END)) # just some data cleaning, don't bother with this
GFA_SCALE_FACTOR = 2
df['GFA_radius'] = np.sqrt( df.GFA / np.pi ) * GFA_SCALE_FACTOR
import seaborn as sns
colors = list(sns.cubehelix_palette(28, start=.5, rot=-.75))
hex_colors = np.array(['#%02x%02x%02x' % (c[0]*255, c[1]*255, c[2]*255) for c in colors])
df['color'] = hex_colors[df.FL - 4]
An error occurred when I was trying:
source = ColumnDataSource(df)
p = figure(x_axis_type="datetime", width = 800, height = 400)
p.circle(x='TA_END', y='Eff Rent',
size= 'GFA_radius',
fill_alpha=0.8, line_width=0.5, line_alpha=0.5, color = 'color', source = source)
show(p)
The error message led me to think there was something wrong with the way datetime was serialized:
ValueError: month must be in 1..12
I shall post Sarah's answer in the answers.

The problem, it turns out, was a NaN value in the "ET Date" column of the DataFrame "df". Although that column is irrelevant to the plot, it caused bokeh's serialization to fail.
So if I just do:
source = ColumnDataSource(df[['TA_END', 'Eff Rent', 'GFA_radius', 'color']])
everything works.
A good tip, also from Sarah, is to always build a ColumnDataSource from only the columns you need; otherwise you use more data and processing power in the browser than necessary.
However, I do hope bokeh can handle some NaN data, as some plots may want to show an empty slot now and then.
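To make the column-selection tip concrete, here is a minimal sketch using pandas only. The DataFrame below is a made-up stand-in for the lease table (the values are invented, and `plot_columns` / `plot_df` are hypothetical names). It keeps only the plotted columns and then checks whether any missing values remain in them.

```python
import pandas as pd

# Hypothetical stand-in for the lease table; all values are made up.
df = pd.DataFrame({
    "TA_END": pd.to_datetime(["2016-03-31", "2018-06-30"]),
    "Eff Rent": [310.0, 420.5],
    "GFA_radius": [35.7, 61.8],
    "color": ["#2d1e3e", "#a2c0b5"],
    # Not used by the plot, but contains a missing value (NaT).
    "ET Date": [pd.NaT, pd.Timestamp("2017-01-15")],
})

# Keep only the columns the glyph actually references.
plot_columns = ["TA_END", "Eff Rent", "GFA_radius", "color"]
plot_df = df[plot_columns]

# Report any missing values that remain in the plotted columns.
missing = plot_df.isna().sum()
print(missing[missing > 0])
```

`plot_df` can then be passed to ColumnDataSource in place of the full DataFrame, as in the line above.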


How can i Plot arrows in a existing mplsoccer pitch?

I tried to follow McKay Johns' tutorial on YouTube (see the Jupyter Notebook for the data: https://github.com/mckayjohns/passmap/blob/main/Pass%20map%20tutorial.ipynb).
I understood everything, but I wanted to make a small change. I wanted to replace plt.plot(...) with:
plt.arrow(df['x'][x],df['y'][x], df['endX'][x] - df['x'][x], df['endY'][x]-df['y'][x],
shape='full', color='green')
But the problem is, I still can't see the arrows. I tried multiple changes but I've failed. So I'd like to ask you in the group.
Below you can see the code.
## Read in the data
df = pd.read_csv('...\Codes\Plotting_Passes\messibetis.csv')
#convert the data to match the mplsoccer statsbomb pitch
#to see how to create the pitch, watch the video here: https://www.youtube.com/watch?v=55k1mCRyd2k
df['x'] = df['x']*1.2
df['y'] = df['y']*.8
df['endX'] = df['endX']*1.2
df['endY'] = df['endY']*.8
# Set Base
fig ,ax = plt.subplots(figsize=(13.5,8))
# Change background color of base
fig.set_facecolor('#22312b')
# Change color of base inside
ax.patch.set_facecolor('#22312b')
#this is how we create the pitch
pitch = Pitch(pitch_type='statsbomb',
pitch_color='#22312b', line_color='#c7d5cc')
# Set the axes to our Base
pitch.draw(ax=ax)
# x axis => 0 to 120
# y axis => 80 to 0
# Solution: invert the y axis:
plt.gca().invert_yaxis()
#use a for loop to plot each pass
for x in range(len(df['x'])):
    if df['outcome'][x] == 'Successful':
        #plt.plot((df['x'][x],df['endX'][x]),(df['y'][x],df['endY'][x]),color='green')
        plt.scatter(df['x'][x],df['y'][x],color='green')
        plt.arrow(df['x'][x],df['y'][x], df['endX'][x] - df['x'][x], df['endY'][x]-df['y'][x],
                  shape='full', color='green') # Here is the problem!
    if df['outcome'][x] == 'Unsuccessful':
        plt.plot((df['x'][x],df['endX'][x]),(df['y'][x],df['endY'][x]),color='red')
        plt.scatter(df['x'][x],df['y'][x],color='red')
plt.title('Messi Pass Map vs Real Betis',color='white',size=20)
It always shows:
The problem is that plt.arrow has default values for head_width and head_length, which are too small for your figure. I.e. it is drawing arrows, the arrow heads are just way too tiny to see them (even if you zoom out). E.g. try something as follows:
import pandas as pd
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
df = pd.read_csv('https://raw.githubusercontent.com/mckayjohns/passmap/main/messibetis.csv')
...
# create a dict for the colors to avoid repetitive code
colors = {'Successful':'green', 'Unsuccessful':'red'}
for x in range(len(df['x'])):
    plt.scatter(df['x'][x],df['y'][x],color=colors[df.outcome[x]], marker=".")
    plt.arrow(df['x'][x],df['y'][x], df['endX'][x] - df['x'][x],
              df['endY'][x]-df['y'][x], color=colors[df.outcome[x]],
              head_width=1, head_length=1, length_includes_head=True)
    # setting `length_includes_head` to `True` ensures that the arrow head is
    # *part* of the line, not added on top
plt.title('Messi Pass Map vs Real Betis',color='white',size=20)
Result:
Note that you can also use plt.annotate for this, passing specific props to the parameter arrowprops. E.g.:
import pandas as pd
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
df = pd.read_csv('https://raw.githubusercontent.com/mckayjohns/passmap/main/messibetis.csv')
...
# create a dict for the colors to avoid repetitive code
colors = {'Successful':'green', 'Unsuccessful':'red'}
for x in range(len(df['x'])):
    plt.scatter(df['x'][x],df['y'][x],color=colors[df.outcome[x]], marker=".")
    props= {'arrowstyle': '-|>,head_width=0.25,head_length=0.5',
            'color': colors[df.outcome[x]]}
    plt.annotate("", xy=(df['endX'][x],df['endY'][x]),
                 xytext=(df['x'][x],df['y'][x]), arrowprops=props)
plt.title('Messi Pass Map vs Real Betis',color='white',size=20)
Result (a bit sharper, if you ask me, but maybe some tweaking with params in plt.arrow can also achieve that):

Holoviews/Datashaded map overlay not displaying

I am using the code below to get a Panel dashboard with a dropdown select box, a histogram and a map.
import pandas as pd
import holoviews as hv
from holoviews.operation.datashader import datashade, rasterize, shade
import panel as pn
from holoviews.element.tiles import OSM
import hvplot.pandas
df = pd.read_parquet('cleanedFiles/AllMNO.parquet')
mno = pn.widgets.Select(options=df['mnc'].unique().tolist())
@pn.depends(mno)
def mnoStats(operator):
    return '### Operator {} has {} samples'.format(operator, len(df[df['mnc'] == operator]))
@pn.depends(mno)
def plotMap(mno):
    opts = dict(width=700, height=300, tools=['hover'])
    tiles = OSM().opts(alpha=0.4, xaxis=None, yaxis=None)
    points = hv.Points(df[df['mnc'] == mno], ['latitude', 'longitude'])
    rasterized = shade(rasterize(points, x_sampling=1, y_sampling=1)).opts(**opts)
    return tiles*rasterized
def plotHist(df):
    return df.hvplot.hist(y='rsrp', by='mnc', bins=20)
pn.Row(pn.Column(pn.WidgetBox('## Ofcom scanner data', mno, mnoStats)),
       pn.Column(plotHist(df))).servable()
pn.Row(plotMap).servable()
The dropdown selector and histogram appear as expected; however, I get a 'blocky' image for the map, as below. I wanted to get the locations (lat/longs) of the measurements, each coloured / datashaded by the signal level given in the column 'rsrp'.
Please advise how this can be corrected.
According to the holoviews docs, hv.rasterize is a high-level resampling interface and passes parameters to several internal methods:
holoviews.core.operation.Operation: group, input_ranges
holoviews.operation.datashader.LinkableOperation: link_inputs
holoviews.operation.datashader.ResamplingOperation: dynamic, streams, expand, height, width, x_range, y_range, x_sampling, y_sampling, target, element_type, precompute
holoviews.operation.datashader.AggregationOperation: vdim_prefix
Based on this, it looks like your arguments x_sampling and y_sampling are passed to ResamplingOperation, where they are described as:
x_sampling = param.Number(allow_None=True, inclusive_bounds=(True, True), label='X sampling')
Specifies the smallest allowed sampling interval along the x axis.
y_sampling = param.Number(allow_None=True, inclusive_bounds=(True, True), label='Y sampling')
Specifies the smallest allowed sampling interval along the y axis.
So, I'd guess that the issue is that providing the arguments x_sampling=1, y_sampling=1 to rasterize has the effect of aggregating all of your data to 1 degree, or approximately 110 km/70 mile blocks, which is causing the blockiness in your figure. Changing these parameters to a smaller value, such as 0.1 or smaller, should resolve the issue, as long as your data itself has sufficient resolution.
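As a rough check on the block sizes mentioned above, here is a small sketch that converts a sampling interval in degrees to kilometres. It assumes the coordinates are plain latitude/longitude degrees and uses a spherical-Earth approximation. `block_size_km` is a hypothetical helper name, and the latitude of 52 degrees is only an assumed example location.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate value, assuming a spherical Earth


def block_size_km(sampling_deg, latitude_deg):
    """Approximate (north-south, east-west) extent in km of one sampling cell."""
    north_south = sampling_deg * KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Compare a few candidate sampling intervals at an assumed latitude.
for sampling in (1, 0.1, 0.01):
    ns, ew = block_size_km(sampling, latitude_deg=52.0)
    print(f"sampling={sampling}: about {ns:.1f} km x {ew:.1f} km")
```

If the measurements are only a few metres or tens of metres apart, even 0.01 degrees is likely still too coarse, so it may be worth leaving x_sampling and y_sampling unset first and seeing how the plot looks.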

How to format median and errors differently in corner plots?

I'm trying to format a corner plot using the corner package in Python. As far as I know, there's the command title_fmt = *arg, however it gives the same format to both the median and the errors, which is inconvenient for reporting measurement errors. I need the error to be shown with 2 significant figures, and then the median to be rounded at the last sig.fig. of its error. Here's an example of what I can do
import numpy as np
from matplotlib import pyplot as plt
import corner
np.random.seed(539)
# generate data
data = np.random.randn(5000,3)
data[:,0] = data[:,0]*20 + 150.75
data[:,1] = data[:,1] + 7.52
data[:,2] = data[:,2]*5 + 31.25
# make plot
labels = ['x','y','f']
fig=plt.figure(figsize=(7,7),dpi=100)
fig=corner.corner(
data, labels=labels, quantiles=(0.16, 0.84),show_titles=True,
title_fmt='g', use_math_text=True, fig=fig)
fig.show()
which gives
I could enter a line like title_fmt = '.2g', which gives
This shows fewer significant figures for the errors, as expected, but then I lose significant figures in the median, and some of the errors don't display the last significant figure when it is 0. For my example I'd need to get something like
x = 151 (+20 / -19)
y = 7.5 (+1.0 / -1.0)
f = 31.3 (+5.2 / -5.0)
I've read the API and it doesn't give any more explanations beyond the title_fmt option. If anyone can help, thanks in advance.
title_fmt = '.2f' should format numbers with 2 decimal places.
Here you have all possible formatting options:
https://docs.python.org/3/reference/lexical_analysis.html#f-strings
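Since title_fmt applies one format to all three numbers, one possible workaround is to build the title strings yourself. The following is a minimal sketch using only the standard library. `decimals_for_sig_figs` and `format_measurement` are hypothetical helper names, not part of corner. The sketch picks the number of decimal places that shows the smaller error with 2 significant figures and applies it to the median and both errors. It does not round errors of 100 or more to tens.

```python
import math


def decimals_for_sig_figs(err, sig=2):
    """Decimal places needed to show `err` with `sig` significant figures."""
    if err <= 0:
        return 0
    exponent = math.floor(math.log10(err))
    # Never return a negative count: f-string precision must be >= 0.
    return max(sig - 1 - exponent, 0)


def format_measurement(median, err_plus, err_minus, sig=2):
    """Format median and errors with the same number of decimal places."""
    ndec = decimals_for_sig_figs(min(err_plus, err_minus), sig)
    return f"{median:.{ndec}f} +{err_plus:.{ndec}f} / -{err_minus:.{ndec}f}"


print(format_measurement(150.75, 20.1, 19.3))
print(format_measurement(7.52, 1.0, 1.0))
```

The quantiles for each parameter could come from np.percentile(data[:, i], [16, 50, 84]), and the resulting string could be set on the corresponding diagonal axis with ax.set_title. Note that Python's float formatting rounds exact ties to the even digit, so a median of exactly 31.25 would print as 31.2 rather than 31.3.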

Save "DataFrame.style" table in Jupyter as png? [duplicate]

I constructed a pandas dataframe of results. This data frame acts as a table. There are MultiIndexed columns and each row represents a name, ie index=['name1','name2',...] when creating the DataFrame. I would like to display this table and save it as a png (or any graphic format really). At the moment, the closest I can get is converting it to html, but I would like a png. It looks like similar questions have been asked such as How to save the Pandas dataframe/series data as a figure?
However, the marked solution converts the dataframe into a line plot (not a table) and the other solution relies on PySide which I would like to stay away simply because I cannot pip install it on linux. I would like this code to be easily portable. I really was expecting table creation to png to be easy with python. All help is appreciated.
Pandas allows you to plot tables using matplotlib (details here).
Usually this plots the table directly onto a plot (with axes and everything) which is not what you want. However, these can be removed first:
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table # older pandas: pandas.tools.plotting, see deprecation warnings below
ax = plt.subplot(111, frame_on=False) # no visible frame
ax.xaxis.set_visible(False) # hide the x axis
ax.yaxis.set_visible(False) # hide the y axis
table(ax, df) # where df is your data frame
plt.savefig('mytable.png')
The output might not be the prettiest but you can find additional arguments for the table() function here.
Also thanks to this post for info on how to remove axes in matplotlib.
EDIT:
Here is a (admittedly quite hacky) way of simulating multi-indexes when plotting using the method above. If you have a multi-index data frame called df that looks like:
first  second
bar    one       1.991802
       two       0.403415
baz    one      -1.024986
       two      -0.522366
foo    one       0.350297
       two      -0.444106
qux    one      -0.472536
       two       0.999393
dtype: float64
First reset the indexes so they become normal columns
df = df.reset_index()
df
  first second         0
0   bar    one  1.991802
1   bar    two  0.403415
2   baz    one -1.024986
3   baz    two -0.522366
4   foo    one  0.350297
5   foo    two -0.444106
6   qux    one -0.472536
7   qux    two  0.999393
Remove all duplicates from the higher order multi-index columns by setting them to an empty string (in my example I only have duplicate indexes in "first"):
df.ix[df.duplicated('first') , 'first'] = '' # see deprecation warnings below
df
  first second         0
0   bar    one  1.991802
1          two  0.403415
2   baz    one -1.024986
3          two -0.522366
4   foo    one  0.350297
5          two -0.444106
6   qux    one -0.472536
7          two  0.999393
Change the column names over your "indexes" to the empty string
new_cols = df.columns.values
new_cols[:2] = '','' # since my index columns are the two left-most on the table
df.columns = new_cols
Now call the table function but set all the row labels in the table to the empty string (this makes sure the actual indexes of your plot are not displayed):
table(ax, df, rowLabels=['']*df.shape[0], loc='center')
et voila:
Your not-so-pretty but totally functional multi-indexed table.
EDIT: DEPRECATION WARNINGS
As pointed out in the comments, the import statement for table:
from pandas.tools.plotting import table
is now deprecated in newer versions of pandas in favour of:
from pandas.plotting import table
EDIT: DEPRECATION WARNINGS 2
The ix indexer was deprecated and has since been removed (as of pandas 1.0), so we should use the loc indexer instead. Replace:
df.ix[df.duplicated('first') , 'first'] = ''
with
df.loc[df.duplicated('first') , 'first'] = ''
There is actually a python library called dataframe_image
Just do a
pip install dataframe_image
Do the imports
import pandas as pd
import numpy as np
import dataframe_image as dfi
df = pd.DataFrame(np.random.randn(6, 6), columns=list('ABCDEF'))
and style your table if you want by:
df_styled = df.style.background_gradient() #adding a gradient based on values in cell
and finally:
dfi.export(df_styled,"mytable.png")
The best solution to your problem is probably to first export your dataframe to HTML and then convert it using an HTML-to-image tool.
The final appearance could be tweaked via CSS.
Popular options for HTML-to-image rendering include:
WeasyPrint
wkhtmltopdf/wkhtmltoimage
Let us assume we have a dataframe named df.
We can generate one with the following code:
import string
import numpy as np
import pandas as pd
np.random.seed(0) # just to get reproducible results from `np.random`
rows, cols = 5, 10
labels = list(string.ascii_uppercase[:cols])
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 10)), columns=labels)
print(df)
# A B C D E F G H I J
# 0 44 47 64 67 67 9 83 21 36 87
# 1 70 88 88 12 58 65 39 87 46 88
# 2 81 37 25 77 72 9 20 80 69 79
# 3 47 64 82 99 88 49 29 19 19 14
# 4 39 32 65 9 57 32 31 74 23 35
Using WeasyPrint
This approach uses a pip-installable package, which will allow you to do everything using the Python ecosystem.
One shortcoming of weasyprint is that it does not seem to provide a way of adapting the image size to its content.
Anyway, removing some background from an image is relatively easy in Python / PIL, and it is implemented in the trim() function below (adapted from here).
One would also need to make sure that the image is large enough, and this can be done with the size property of the CSS @page rule.
The code follows:
import weasyprint as wsp
from PIL import Image, ImageChops  # import the submodules explicitly; a bare `import PIL` does not load them
def trim(source_filepath, target_filepath=None, background=None):
    # crop away the uniform background around the table
    if not target_filepath:
        target_filepath = source_filepath
    img = Image.open(source_filepath)
    if background is None:
        background = img.getpixel((0, 0))
    border = Image.new(img.mode, img.size, background)
    diff = ImageChops.difference(img, border)
    bbox = diff.getbbox()
    img = img.crop(bbox) if bbox else img
    img.save(target_filepath)
img_filepath = 'table1.png'
css = wsp.CSS(string='''
@page { size: 2048px 2048px; padding: 0px; margin: 0px; }
table, td, tr, th { border: 1px solid black; }
td, th { padding: 4px 8px; }
''')
html = wsp.HTML(string=df.to_html())
# note: write_png() is only available in older WeasyPrint releases (to my knowledge it was dropped in version 53)
html.write_png(img_filepath, stylesheets=[css])
trim(img_filepath)
Using wkhtmltopdf/wkhtmltoimage
This approach uses an external open source tool and this needs to be installed prior to the generation of the image.
There is also a Python package, pdfkit, that serves as a front-end to it (it does not waive you from installing the core software yourself), but I will not use it.
wkhtmltoimage can be simply called using subprocess (or any other similar means of running an external program in Python).
One would also need to output to disk the HTML file.
The code follows:
import subprocess
df.to_html('table2.html')
subprocess.call(
    'wkhtmltoimage -f png --width 0 table2.html table2.png', shell=True)
and its aspect could be further tweaked with CSS similarly to the other approach.
Although I am not sure if this is the result you expect, you can save your DataFrame in png by plotting the DataFrame with Seaborn Heatmap with annotations on, like this:
http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.heatmap.html#seaborn.heatmap
It works right away with a Pandas Dataframe. You can look at this example: Efficiently ploting a table in csv format using Python
You might want to change the colormap so it displays a white background only.
Hope this helps.
Edit:
Here is a snippet that does this:
import matplotlib
import seaborn as sns
def save_df_as_image(df, path):
    # Set background to white
    norm = matplotlib.colors.Normalize(-1,1)
    colors = [[norm(-1.0), "white"],
              [norm( 1.0), "white"]]
    cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", colors)
    # Make plot
    plot = sns.heatmap(df, annot=True, cmap=cmap, cbar=False)
    fig = plot.get_figure()
    fig.savefig(path)
The solution of @bunji works for me, but the default options don't always give a good result.
I added some useful parameters to tweak the appearance of the table.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import table
import numpy as np
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
df.index = [item.strftime('%Y-%m-%d') for item in df.index] # Format date
fig, ax = plt.subplots(figsize=(12, 2)) # set size frame
ax.xaxis.set_visible(False) # hide the x axis
ax.yaxis.set_visible(False) # hide the y axis
ax.set_frame_on(False) # no visible frame, uncomment if size is ok
tabla = table(ax, df, loc='upper right', colWidths=[0.17]*len(df.columns)) # where df is your data frame
tabla.auto_set_font_size(False) # Activate set fontsize manually
tabla.set_fontsize(12) # if ++fontsize is necessary ++colWidths
tabla.scale(1.2, 1.2) # change size table
plt.savefig('table.png', transparent=True)
The result:
I had the same requirement for a project I am working on, but none of the answers met it elegantly. Here is what finally helped me, and it might be useful in this case:
from bokeh.io import export_png, export_svgs
from bokeh.models import ColumnDataSource, DataTable, TableColumn
def save_df_as_image(df, path):
    source = ColumnDataSource(df)
    df_columns = [df.index.name]
    df_columns.extend(df.columns.values)
    columns_for_table=[]
    for column in df_columns:
        columns_for_table.append(TableColumn(field=column, title=column))
    data_table = DataTable(source=source, columns=columns_for_table,height_policy="auto",width_policy="auto",index_position=None)
    export_png(data_table, filename = path)
There is a Python library called df2img available at https://pypi.org/project/df2img/ (disclaimer: I'm the author). It's a wrapper/convenience function using plotly as backend.
You can find the documentation at https://df2img.dev.
import pandas as pd
import df2img
df = pd.DataFrame(
    data=dict(
        float_col=[1.4, float("NaN"), 250, 24.65],
        str_col=("string1", "string2", float("NaN"), "string4"),
    ),
    index=["row1", "row2", "row3", "row4"],
)
Saving a pd.DataFrame as a .png-file can be done fairly quickly. You can apply formatting, such as background colors or alternating the row colors for better readability.
fig = df2img.plot_dataframe(
    df,
    title=dict(
        font_color="darkred",
        font_family="Times New Roman",
        font_size=16,
        text="This is a title",
    ),
    tbl_header=dict(
        align="right",
        fill_color="blue",
        font_color="white",
        font_size=10,
        line_color="darkslategray",
    ),
    tbl_cells=dict(
        align="right",
        line_color="darkslategray",
    ),
    row_fill_color=("#ffffff", "#d7d8d6"),
    fig_size=(300, 160),
)
df2img.save_dataframe(fig=fig, filename="plot.png")
If you're okay with the formatting as it appears when you call the DataFrame in your coding environment, then the absolute easiest way is to just use print screen and crop the image using basic image editing software.
Here's how it turned out for me using Jupyter Notebook, and Pinta Image Editor (Ubuntu freeware).
As jcdoming suggested, use Seaborn heatmap():
import seaborn as sns
import matplotlib.pyplot as plt
fig = plt.figure(facecolor='w', edgecolor='k')
sns.heatmap(df.head(), annot=True, cmap='viridis', cbar=False)
plt.savefig('DataFrame.png')
The easiest and fastest way to convert a Pandas dataframe into a png image using Anaconda Spyder IDE- just double-click on the dataframe in variable explorer, and the IDE table will appear, nicely packaged with automatic formatting and color scheme. Just use a snipping tool to capture the table for use in your reports, saved as a png:
This saves me lots of time, and is still elegant and professional.
The following would need extensive customisation to format the table correctly, but the bones of it works:
import numpy as np
from PIL import Image, ImageDraw, ImageFont
import pandas as pd
df = pd.DataFrame({ 'A' : 1.,
'B' : pd.Series(1,index=list(range(4)),dtype='float32'),
'C' : np.array([3] * 4,dtype='int32'),
'D' : pd.Categorical(["test","train","test","train"]),
'E' : 'foo' })
class DrawTable():
    def __init__(self,_df):
        self.rows,self.cols = _df.shape
        img_size = (300,200)
        self.border = 50
        self.bg_col = (255,255,255)
        self.div_w = 1
        self.div_col = (128,128,128)
        self.head_w = 2
        self.head_col = (0,0,0)
        self.image = Image.new("RGBA", img_size,self.bg_col)
        self.draw = ImageDraw.Draw(self.image)
        self.draw_grid()
        self.populate(_df)
        self.image.show()
    def draw_grid(self):
        width,height = self.image.size
        row_step = (height-self.border*2)/(self.rows)
        col_step = (width-self.border*2)/(self.cols)
        for row in range(1,self.rows+1):
            self.draw.line((self.border-row_step//2,self.border+row_step*row,width-self.border,self.border+row_step*row),fill=self.div_col,width=self.div_w)
        for col in range(1,self.cols+1):
            self.draw.line((self.border+col_step*col,self.border-col_step//2,self.border+col_step*col,height-self.border),fill=self.div_col,width=self.div_w)
        self.draw.line((self.border-row_step//2,self.border,width-self.border,self.border),fill=self.head_col,width=self.head_w)
        self.draw.line((self.border,self.border-col_step//2,self.border,height-self.border),fill=self.head_col,width=self.head_w)
        self.row_step = row_step
        self.col_step = col_step
    def populate(self,_df2):
        font = ImageFont.load_default().font
        for row in range(self.rows):
            print(_df2.iloc[row,0])
            self.draw.text((self.border-self.row_step//2,self.border+self.row_step*row),str(_df2.index[row]),font=font,fill=(0,0,128))
            for col in range(self.cols):
                text = str(_df2.iloc[row,col])
                text_w, text_h = font.getsize(text)
                x_pos = self.border+self.col_step*(col+1)-text_w
                y_pos = self.border+self.row_step*row
                self.draw.text((x_pos,y_pos),text,font=font,fill=(0,0,128))
        for col in range(self.cols):
            text = str(_df2.columns[col])
            text_w, text_h = font.getsize(text)
            x_pos = self.border+self.col_step*(col+1)-text_w
            y_pos = self.border - self.row_step//2
            self.draw.text((x_pos,y_pos),text,font=font,fill=(0,0,128))
    def save(self,filename):
        try:
            self.image.save(filename,mode='RGBA')
            print(filename," Saved.")
        except:
            print("Error saving:",filename)
table1 = DrawTable(df)
table1.save('C:/Users/user/Pictures/table1.png')
The output looks like this:
People who use Plotly for data visualization:
You can easily convert the dataframe to go.Table.
You can save the dataframe with columns names.
You can format the dataframe through go.Table.
You can save the dataframe as pdf, jpg, or png with different scales and high resolution.
import plotly.express as px
import plotly.graph_objects as go  # needed for go.Figure and go.Table
df = px.data.medals_long()
fig = go.Figure(data=[
    go.Table(
        header=dict(values=list(df.columns),align='center'),
        cells=dict(values=df.values.transpose(),
                   fill_color = [["white","lightgrey"]*df.shape[0]],
                   align='center'
                   )
    )
])
fig.write_image('image.png',scale=6)  # requires an image export engine such as kaleido
Note: the image is saved in the current working directory.
Output:
I really like the way Jupyter notebooks format the DataFrame and this library exports it in the same format:
import dataframe_image as dfi
dfi.export(df, "df.png")
There is also a dpi argument in case you want to increase the quality of the image. I'd recommend 300 for OK quality, 600 for excellent and 1200 for perfect; more than that is probably too much.
import dataframe_image as dfi
dfi.export(df, "df.png", dpi = 600)

Python on Raspberry Pi/Using Plotly to graph 5 temperatures . x values duplicated in data. Updating every minute. Not streaming. fileopt = "extend"

I'm recording data from 5 temperature sensors using a Raspberry Pi running Python 3.
All is working well and I now want to display plots of the 5 temperatures on one graph, updating every 10 minutes or so. I'd like to use Plotly.
I wrote the following code to test out the idea.
#many_lines2
# trying to sort out why x is sent more than once when using extend
import time
import plotly.plotly as py
from plotly.graph_objs import *
import plotly.tools as tls
#tls.set_credentials_file(username=, api_key)
from datetime import datetime
for count in range (1,5):
    x1 = count
    y1 = count * 2
    y2 = count * 3
    y3 = count * 4
    trace1 = Scatter(x=x1,y = y1,mode = "lines")
    trace2 = Scatter(x=x1,y = y2,mode = "lines")
    trace3 = Scatter(x=x1,y = y3,mode = "lines")
    data = [trace1,trace2,trace3]
    py.plot (data,filename="3lines6", fileopt = "extend")
    time.sleep(60)
See plot and data received by plotly here https://plot.ly/~steverat/334/trace0-trace1-trace2/
See data tab for data received by plotly.
It looks to me as though the x value in the data table has been added three times after the first values were sent.
I can get the right results by using .append in Python to create lists of values. This leads to long lists and more data being sent to Plotly, and seems just wrong.
The code to do this is below, and the data on the Plotly server can be found here: https://plot.ly/~steverat/270
# using lists and append to send data to plotly
import time
import plotly.plotly as py
from plotly.graph_objs import *
import plotly.tools as tls
#tls.set_credentials_file(username='steverat', api_key='f0qs8y2vj8')
from datetime import datetime
xlist = []
y1list= []
y2list = []
y3list = []
for count in range (1,5):
    xlist.append (count)
    y1list.append (count * 2)
    y2list.append (count * 3)
    y3list.append (count * 4)
    # print() calls, since this runs on Python 3
    print("xlist = ", xlist)
    print("y1list = ", y1list)
    print("y2list = ", y2list)
    trace1 = Scatter(x=xlist,y = y1list,mode = "lines")
    trace2 = Scatter(x=xlist,y = y2list,mode = "lines")
    trace3 = Scatter(x=xlist,y = y3list,mode = "lines")
    data = [trace1,trace2,trace3]
    py.plot (data,filename="3lines2")
    time.sleep(60)
I've searched the web and can find examples where data is streamed, but I only want to update the plots every 10 minutes or longer.
Have I missed something obvious???
Cheers
Steve
Andrew from Plotly here. Thanks very much for documenting this so well!
EDIT
This issue should now be fixed, which makes the following workaround obsolete/incorrect. Please don't use the following workaround anymore! (keeping it here for documentation though)
TL;DR (just make it work)
Try this code out:
import time
import plotly.plotly as py
from plotly.graph_objs import Figure, Scatter
filename = 'Stack Overflow 31436471'
# setup original figure (behind the scenes, Plotly assumes you're sharing that x source)
# note that the x arrays are all the same and the y arrays are all *different*
fig = Figure()
fig['data'].append(Scatter(x=[0], y=[1], mode='lines'))
fig['data'].append(Scatter(x=[0], y=[2], mode='lines'))
fig['data'].append(Scatter(x=[0], y=[3], mode='lines'))
print py.plot(fig, filename=filename, auto_open=False)
# --> https://plot.ly/~theengineear/4949
# start extending the plots
for i in xrange(1, 3):
    x1 = i
    y1 = i * 2
    y2 = i * 3
    y3 = i * 4
    fig = Figure()
    # note that since the x arrays are shared, you only need to extend one of them
    fig['data'].append(Scatter(x=x1, y=y1))
    fig['data'].append(Scatter(y=y2))
    fig['data'].append(Scatter(y=y3))
    py.plot(fig, filename=filename, fileopt='extend', auto_open=False)
    time.sleep(2)
More info
This appears to be a bug in our backend code. The issue is that we reuse data arrays that hash to the same value. In this case your x value is hashing to the same value and when you go to extend the traces you're actually extending the same x array three times.
The fix proposed above has you only extend one of the x arrays, which is the same array being used by the other traces anyhow.
Do note that for this to work you must supply a non-zero length array in the initial setup. This is because Plotly won't save an array if it doesn't have any data to begin with.
The takeaway is that you'll be A-OK as long as you initialize identical x arrays and ensure that in the initialization your y arrays aren't also identical to any of the x arrays.
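To illustrate the mechanism described above, here is a toy model in plain Python. It is not Plotly's backend code: `ToyStore` is a hypothetical class that simply shares one stored list among all arrays with identical contents, which is enough to reproduce the symptom of x being extended three times.

```python
# Toy model only: NOT Plotly's backend code, just an illustration of how
# sharing arrays with identical contents makes "extend" apply repeatedly.

class ToyStore:
    def __init__(self):
        self.arrays = {}  # content key -> shared list

    def save(self, values):
        # Arrays with identical contents map to the same stored list.
        key = tuple(values)
        return self.arrays.setdefault(key, list(values))


store = ToyStore()

# Three traces with identical x arrays and different y arrays.
traces = [
    {"x": store.save([0]), "y": store.save([1])},
    {"x": store.save([0]), "y": store.save([2])},
    {"x": store.save([0]), "y": store.save([3])},
]

# Extending x on every trace appends to the same shared list three times.
for trace, new_y in zip(traces, [2, 3, 4]):
    trace["x"].append(1)
    trace["y"].append(new_y)

print(traces[0]["x"])  # [0, 1, 1, 1]
print(traces[0]["y"])  # [1, 2]
```

In this model, extending x on only one trace gives [0, 1] for all three traces, which mirrors the workaround. It also shows why an initial y array identical to the x array would end up sharing storage with it.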
Apologies for the inconvenient workaround. I'll edit this response when a fix has been submitted on our end.
