Using an R function in a Python notebook to visualize missing data

naniar is a common R package for visualizing missing data. I am trying to use rpy2 to call the R function vis_miss() from naniar to plot the missing data.
Python gives me a data frame as output instead of a plot in my notebook, and I would like to fix this. The idea is to use the vis_miss() function in a Python notebook.
Below is a working example using the iris dataset:
# install rpy2 to run R in python
!pip3 install rpy2
%load_ext rpy2.ipython
from sklearn.datasets import load_iris
%R install.packages("naniar")
%R library(naniar)
%R library(ggplot2)
# Load Iris data
iris = load_iris()
# Run vis_miss function, expecting to see a graph showing missing data
%R naniar::vis_miss(iris)
My output should now be an image of missing data but instead I get:
ListVector with 10 elements.
data R/rpy2 DataFrame (750 x 4)
rows variable valueType value
... ... ... ...
layers ListVector with 1 elements.
[no name] [RTYPES.ENVSXP]
scales add: function clone: function find: function get_scales: function has_scale: function input: function n: function non_position_scales: function scales: list super:
... ...
plot_env
labels ListVector with 4 elements.
x [RTYPES.STRSXP]
y [RTYPES.STRSXP]
text [RTYPES.STRSXP]
fill [RTYPES.STRSXP]
guides ListVector with 1 elements.
fill [RTYPES.VECSXP]
How can I get the plot that R would produce to appear within a cell of this Python notebook?
Would I perhaps need to use matplotlib or ggplot2 here?

Use cell magic (%%R) to get the output as an image:
%%R
naniar::vis_miss(iris)
The cell magic also lets you customize the width, height, DPI and format of the image; see the "IPython magic integration" section of the rpy2 documentation.

Related

Python Databricks cannot visualise dtreeviz decision tree

I need to visualize a decision tree in dtreeviz in Databricks.
The code seems to be working fine.
However, instead of showing the decision tree, it only prints the following:
Out[23]: <dtreeviz.trees.DTreeViz at 0x7f5b27a91160>
This is the code I am running:
import pandas as pd
from sklearn import preprocessing, tree
from dtreeviz.trees import dtreeviz
# Toy data: three numeric features and one categorical target
Things = {'Feature01': [3, 4, 5, 0],
          'Feature02': [4, 5, 6, 0],
          'Feature03': [1, 2, 3, 8],
          'Target01': ['Red', 'Blue', 'Teal', 'Red']}
df = pd.DataFrame(Things,
                  columns=['Feature01', 'Feature02',
                           'Feature03', 'Target01'])
# Encode the string labels as integers for the classifier
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(df.Target01)
df['target'] = label_encoder.transform(df.Target01)
classifier = tree.DecisionTreeClassifier()
classifier.fit(df.iloc[:, :3], df.target)
dtreeviz(classifier,
         df.iloc[:, :3],
         df.target,
         target_name='toy',
         feature_names=df.columns[0:3],
         class_names=list(label_encoder.classes_)
         )
If you look at the dtreeviz documentation, you'll see that the dtreeviz function only creates an object; you then need to call a method such as .view() to show it. On Databricks, .view() won't work, but you can use the .svg() method to generate the output as SVG and then pass it to the displayHTML function to show it. The following code:
viz = dtreeviz(classifier,
               ...)
displayHTML(viz.svg())
gives you the desired output.
P.S. You need the dot command-line tool to generate the output. It can be installed by running the following in a notebook cell:
%sh apt-get install -y graphviz

Matplotlib bug in histogram - all stacked in one bar

When I run:
plt.hist('%s/draws.csv' % CONFIG['build']['draw_data'])
I get this weird histogram:
all my data is put under the same frequency bar, and I can neither get rid of the file path shown below the plot nor show the x ticks.
It's weird because my data is simply a CSV file exported from np.random.normal(size=100000).
When I run this code directly:
data = np.random.normal(size=10000)
plt.hist(data)
I get a normal histogram:
What could be the issue here?
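A likely explanation is that plt.hist() receives the path string itself rather than the numbers stored in the file, so the string is treated as a single category. That would account for the single bar labelled with the file path. The sketch below loads the values first and only then passes them to plt.hist(). It assumes draws.csv holds one number per row in its first column; the helper names load_draws and plot_draws are hypothetical.

```python
import csv


def load_draws(path):
    """Read one numeric value per row from a CSV file into a list of floats."""
    values = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            try:
                values.append(float(row[0]))
            except ValueError:
                continue  # skip a header or any other non-numeric row
    return values


def plot_draws(path, bins=50):
    """Load the values first, then pass the numbers (not the path) to hist."""
    # Imported here so that load_draws can be used without matplotlib.
    import matplotlib.pyplot as plt

    plt.hist(load_draws(path), bins=bins)
    plt.show()
```

Calling plot_draws('%s/draws.csv' % CONFIG['build']['draw_data']) would then plot the contents of the file instead of its name.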

Trying to save output of display(Math('f(x) = x + 2C')) to an image file using jupyter notebook

I have learned how to display symbolic math in a Jupyter notebook using display, Math, and Latex from IPython.display.
For example, display(Math('some math eqn')) renders clearly formatted symbolic math within the notebook.
Now I would like to export the rendered output to an image file so I can use it elsewhere.
I tried the code below, but it returned a NoneType object:
Image(display(Math('f(x) = x + 2C')), embed=True, format=PNG)
How would I go about doing this?
Update:
I tried executing display(Math("\dfrac{5x}{3}")) from a command prompt and got this: <IPython.core.display.Math object>. Is there a way to convert this object to an image?

matplotlib.pyplot.plot() doesn't show the graph

I am learning Python, and as a side project I am learning to display data using the matplotlib.pyplot module. Here is an example that plots data from dates[] and prices[]. Does anyone know why we need line 5 and line 6? I am confused about why this step is needed to display the graph.
import numpy
from sklearn import linear_model
import matplotlib.pyplot as plt
def showgraph(dates, prices):
    dates = numpy.reshape(dates, (len(dates), 1)) # line 5
    prices = numpy.reshape(prices, (len(prices), 1)) # line 6
    # Fit a straight line to the (date, price) pairs
    linear_mod = linear_model.LinearRegression()
    linear_mod.fit(dates, prices)
    # Plot the raw points and the fitted line
    plt.scatter(dates, prices, color='yellow')
    plt.plot(dates, linear_mod.predict(dates), color='green')
    plt.show()
Try the following in a Python session to check the backend:
import matplotlib
print(matplotlib.get_backend())
If it shows 'agg', the backend is non-interactive and won't open a window, although plt.savefig still works.
To show the plot, you need to switch to an interactive backend such as TkAgg or Qt4Agg.
You need to edit the backend in the matplotlibrc file. To print its location, run the following:
import matplotlib
print(matplotlib.matplotlib_fname())
See the matplotlib documentation for more about matplotlibrc.
Lines 5 and 6 transform what I'm assuming are row vectors (I'm not sure how dates and prices are encoded before this transformation) into column vectors. So now you have vectors that look like this:
[[0],
 [1],
 [2],
 [3]]
which is the form that linear_model.LinearRegression.fit() expects. The reshaping is not necessary for plotting, assuming dates and prices are row vectors.
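To make the shape change concrete, here is a minimal pure-Python sketch of what lines 5 and 6 do. It uses plain lists as a stand-in for numpy.reshape(values, (len(values), 1)); the helper name to_column and the sample numbers are hypothetical.

```python
def to_column(values):
    """Turn a flat sequence into a list of one-element rows (n_samples x 1)."""
    return [[value] for value in values]


# Hypothetical sample data, flat as it might arrive in showgraph()
dates = [0, 1, 2, 3]
prices = [10.0, 10.5, 11.2, 11.9]

dates_column = to_column(dates)
prices_column = to_column(prices)

print(dates_column)                              # [[0], [1], [2], [3]]
print(len(dates_column), len(dates_column[0]))   # 4 1
```

Each inner list is one sample with one feature, which is the two-dimensional layout the regression is fitted on.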
My approach is exactly like yours, but even without lines 5 and 6 the display is correct. I think those lines are unnecessary. It seems that you do not need the fit() function because your input data are in row format.

How to use lattice in Rpy2 and save result to pdf?

I am following the documentation for rpy2 here (http://rpy.sourceforge.net/rpy2/doc-2.1/html/graphics.html?highlight=lattice). I can successfully plot interactively using lattice from rpy2, e.g.:
iris = r('iris')
p = lattice.xyplot(Formula("Petal.Length ~ Petal.Width"),
data=iris)
rprint = robj.globalenv.get("print")
rprint(p)
rprint displays the graph. However, when I try to save the graph to pdf by first doing:
r.pdf("myfile.pdf")
and then my lattice calls, it does not work and instead results in an empty pdf. If I do the same (call r.pdf, then plot) with ggplot2 or with the R base, then I get a working pdf. Does lattice require anything special from within Rpy2 to save the results to a PDF file? The following does not work either:
iris = r('iris')
r.pdf("myfile.pdf")
grdevices = importr('grDevices')
p = lattice.xyplot(Formula("Petal.Length ~ Petal.Width"),
data=iris)
rprint = robj.globalenv.get("print")
rprint(p)
grdevices.dev_off()
Thank you.
You need some equivalent of dev.off() after the print command.
That is, in order to save your graphs to PDF, the general outline is:
pdf(...)
print(....)
dev.off()
Failing to call dev.off() will result in an empty PDF file.
Judging from the rpy2 documentation, the equivalent in rpy2 might be
grdevices.dev_off()
The solution is to use:
robjects.r["dev.off"]()
For some reason the other variants do not do the trick.
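Putting the pieces from this thread together, the open, print, close sequence can be sketched as a single function. This is only a sketch under stated assumptions: it needs rpy2 plus an R installation with lattice, the function name save_lattice_pdf is hypothetical, and it closes the device with the robjects.r["dev.off"]() variant reported to work above.

```python
def save_lattice_pdf(path="myfile.pdf"):
    """Draw a lattice scatter plot of R's iris data into a PDF file."""
    # rpy2 is imported inside the function, so defining it does not
    # require rpy2 or R to be installed; calling it does.
    import rpy2.robjects as robjects
    from rpy2.robjects import Formula
    from rpy2.robjects.packages import importr

    lattice = importr("lattice")
    robjects.r["pdf"](path)  # open the PDF graphics device
    try:
        plot = lattice.xyplot(Formula("Petal.Length ~ Petal.Width"),
                              data=robjects.r("iris"))
        robjects.r["print"](plot)  # lattice plots are drawn only when printed
    finally:
        robjects.r["dev.off"]()  # close the device so the file is written
```

The try/finally is a design choice rather than something the answers require: it closes the device even if the plotting call fails, so a later plot does not end up in a half-written file.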
