Problem
I use Jupyter a lot, and in every notebook I repeat the same long, cumbersome list of imports and settings, something like:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, HTML
from ipywidgets import interact, IntSlider
pd.options.display.max_columns = 35
pd.options.display.max_rows = 300
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 170 # 200 e.g. is really fine, but slower
import IPython.display as ipd
plt.ion()
display(HTML("<style>.container { width:95% !important; }</style>"))
def plot_frozen(df, num_rows=30, num_columns=30, step_rows=1,
                step_columns=1):
    """
    Freeze the headers (column and index names) of a Pandas DataFrame. A widget
    lets you slide through the rows and columns.
    Parameters
    ----------
    df : Pandas DataFrame
        DataFrame to display
    num_rows : int, optional
        Number of rows to display
    num_columns : int, optional
        Number of columns to display
    step_rows : int, optional
        Step in the rows
    step_columns : int, optional
        Step in the columns
    Returns
    -------
    Displays the DataFrame with the widget
    """
    @interact(last_row=IntSlider(min=min(num_rows, df.shape[0]),
                                 max=df.shape[0],
                                 step=step_rows,
                                 description='rows',
                                 readout=False,
                                 disabled=False,
                                 continuous_update=True,
                                 orientation='horizontal',
                                 slider_color='purple'),
              last_column=IntSlider(min=min(num_columns, df.shape[1]),
                                    max=df.shape[1],
                                    step=step_columns,
                                    description='columns',
                                    readout=False,
                                    disabled=False,
                                    continuous_update=True,
                                    orientation='horizontal',
                                    slider_color='purple'))
    def _freeze_header(last_row, last_column):
        display(df.iloc[max(0, last_row-num_rows):last_row,
                        max(0, last_column-num_columns):last_column])
It's imports and a bunch of plotting/display helper functions.
Is there a way for me to bundle all of this up into a single pip package so that I only need a line or two?
I'm imagining running:
pip install Genesis
then inside my jupyter notebook have:
import Genesis
and nothing else.
What I've tried:
I've tried making a genesis package that is basically a copy of this guide but with a single file called jupyter.py that contains the setup code above.
Then I run the following:
from Genesis import jupyter
jupyter.setup()
But it doesn't import pandas, numpy and matplotlib.pyplot for me. That makes sense, because those packages are imported within the scope of the package. Is there any way to avoid that? Is it even possible in Python?
You can make a package with all your imports, no problem; you just need to be careful about namespaces.
Say I have a file:
# genesis/__init__.py
import pandas as pd
import numpy as np
...
Importing that genesis package will run that code, but the imported names won't be accessible directly:
>>> import genesis
>>> help(np)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'np' is not defined
>>> help(genesis.np) # This should succeed
...
You could address this with from genesis import * which would bring everything into the namespace you expect
e.g.
>>> from genesis import *
>>> help(np) # This should succeed
...
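To make the namespace behaviour concrete, here is a minimal sketch. It assumes a throwaway package called genesis_demo (a hypothetical name) written to a temporary directory, with stdlib modules standing in for pandas and numpy so it runs without extra installs. A plain dict plays the role of the notebook's global namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk. The name genesis_demo is hypothetical,
# and stdlib modules stand in for pandas/numpy so the sketch runs anywhere.
pkg_root = Path(tempfile.mkdtemp())
pkg_dir = pkg_root / "genesis_demo"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(
    "import math as m\n"
    "import json as js\n"
    "__all__ = ['m', 'js']  # names exported by 'from genesis_demo import *'\n"
)
sys.path.insert(0, str(pkg_root))
importlib.invalidate_caches()

# A dict plays the role of the notebook's global namespace.
notebook_ns = {}

exec("import genesis_demo", notebook_ns)
plain_import_has_m = "m" in notebook_ns   # only 'genesis_demo' was bound

exec("from genesis_demo import *", notebook_ns)
star_import_has_m = "m" in notebook_ns    # 'm' and 'js' are bound as well

print(plain_import_has_m, star_import_has_m)
```

Listing the exported names in `__all__` is optional, but it keeps the star import from pulling in every public name the package happens to define.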
Related
Please help, I get the error below running jupyter notebook.
import numpy as np
import pandas as pd
from helper import boston_dataframe
np.set_printoptions(precision=3, suppress=True)
Error:
ImportError Traceback (most recent call last)
<ipython-input-3-a6117bd64450> in <module>
1 import numpy as np
2 import pandas as pd
----> 3 from helper import boston_dataframe
4
5
ImportError: cannot import name 'boston_dataframe' from 'helper' (/Users/irina/opt/anaconda3/lib/python3.8/site-packages/helper/__init__.py)
Since you did not say where the notebook comes from, I have to guess that it is from the course Supervised Learning: Regression provided by IBM.
In the zip folder for week 1, the course provides helper.py.
What you need to do is change the working directory to where this file is. See: Change IPython/Jupyter notebook working directory
Alternatively, you can load the Boston data from sklearn and then load it into a Pandas DataFrame.
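The traceback in the question shows that helper resolved to a package inside site-packages, not to the course's helper.py. The sketch below shows one way to make a local file win: put its folder first on sys.path. The temporary directory and the module name helper_demo are stand-ins for illustration only; in practice the folder would be wherever the course zip was extracted.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the unzipped course folder; in practice this would be the
# directory that actually contains helper.py.
course_dir = Path(tempfile.mkdtemp())
(course_dir / "helper_demo.py").write_text(
    "def boston_dataframe():\n"
    "    return 'placeholder for the real DataFrame'\n"
)

# Put the folder first on the import path so it is searched before site-packages.
sys.path.insert(0, str(course_dir))
importlib.invalidate_caches()

import helper_demo

# __file__ shows which file was actually imported.
print(helper_demo.__file__)
print(helper_demo.boston_dataframe())
```

If a module of the same name was already imported in the running kernel, it stays cached in sys.modules, so a kernel restart may be needed before the local file is picked up.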
Advice for you:
Learn how to use Jupyter notebook
Learn how Python imports work
Learn how to provide information in a question so that no one needs to guess
EDIT Found solution. Plot Python Plots Inline
I have been making a Rmarkdown notebook to take notes on my python studies. Within RStudio I have been able to knit the document containing my python code to HTML with no problem until I started plotting data using matplotlib. The curious part is that the plots are generated correctly within the code chunks. However, after knitting, it spits out an error every time at 80%.
Here is my sample code:
---
title: "Python Plot"
output: html_document
---
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
library(reticulate) #Allows for Python to be run in chunks
```
```{python, eval=F}
import numpy as np
trees = np.array(r.trees) #Imported an internal R dataset. It got rid of headers and first row. Don't know how to deal with that right now.
type(trees)
np.shape(trees)
print(trees[1:6,:])
import matplotlib.pyplot as plt
plt.plot(trees[:,0], trees[:,1])
plt.show()
plt.clf() #Reset plot surface
```
Again, this plot comes out just fine when processing within the chunk, but does not knit. The error message says,
"This application failed to start because it could not find or load the Qt platform plugin "windows" in ",
Reinstalling the application may fix this problem."
I have uninstalled and reinstalled both Rstudio and Python and continue to have the same result. I find it odd that it works within the chunk but not to knit to HTML. All my other python code knits just fine.
I have
python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)]
Rstudio Version 1.2.1335.
I've read other solutions. I believe that the libEGL.dll is in the same place as all the other QT5*.dll.
I found an answer here by Kevin Arseneau. Plot Python Plots Inline
I would not call this a duplicate because the problem was different, but the solution worked for both problems.
What is needed is to add the following code:
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = '/path/to/Anaconda3/Library/plugins/platforms'
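A slightly more defensive variant is sketched below. It assumes the Anaconda-on-Windows layout from the path above (Library/plugins/platforms under the environment prefix) and only sets the variable when that directory exists. The helper name set_qt_plugin_path is made up for this example. The variable should be set before matplotlib.pyplot is imported.

```python
import os
import sys


def set_qt_plugin_path(candidate):
    """Set QT_QPA_PLATFORM_PLUGIN_PATH if candidate is an existing directory."""
    if os.path.isdir(candidate):
        os.environ["QT_QPA_PLATFORM_PLUGIN_PATH"] = candidate
        return True
    return False


# Assumed layout: <environment prefix>/Library/plugins/platforms.
# Other installs may keep the Qt plugins elsewhere.
candidate = os.path.join(sys.prefix, "Library", "plugins", "platforms")
found = set_qt_plugin_path(candidate)
print(candidate, "->", "set" if found else "not found; variable left unchanged")
```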
Here is my updated working code. It is similar to original question and updated with a python chunk for imports that Bryan suggested.
---
title: "Python Plot"
output: html_document
---
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
library(reticulate) #Allows for Python to be run in chunks
```
```{python import}
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = '/path/to/Anaconda3/Library/plugins/platforms'
import numpy as np
import matplotlib.pyplot as plt
```
```{python, eval=TRUE}
trees = np.array(r.trees) #Imported an internal R dataset. It got rid of headers and first row. Don't know how to deal with that right now.
type(trees)
np.shape(trees)
print(trees[1:6,:])
plt.plot(trees[:,0], trees[:,1])
plt.show()
plt.clf() #Reset plot surface
```
Among your library imports, you have to install PyQt5 in your Python environment and import it. For example, my first chunks look like the following; the first line under # Base libraries is import PyQt5:
---
title: "Cancellations TS"
author: "Bryan Butler"
date: "7/1/2019"
output:
  html_document:
    toc: false
    toc_depth: 1
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, cache.lazy = FALSE)
```
# <strong>Time Series of Auto Policies</strong> {.tabset .tabset-fade .tabset-pills}
<style>
.main-container {
max-width: 1200px !important;
margin-left: auto;
margin-right: auto;
}
</style>
```{r, loadPython}
library(reticulate)
use_condaenv('timeseries')
```
## Load Python
```{python importLibaries}
# Base libraries
import PyQt5
import pandas as pd
from pandas import Series, DataFrame
from pandas.plotting import lag_plot
import numpy as np
import pyodbc
```
I was able to run your code with some modifications. I broke the chunks up for error checking. You need to import numpy as np, and I added others. Here is the code that I got to work. Also, I use conda virtual environments so that the Python environment is exact. This is what worked:
---
title: "test"
author: "Bryan Butler"
date: "7/2/2019"
output: html_document
---
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
library(reticulate) #Allows for Python to be run in chunks
use_condaenv('timeseries')
```
```{python import}
import PyQt5
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
```
```{python, test}
trees = np.array(r.trees) #Imported an internal R dataset. It got rid of headers and first row. Don't know how to deal with that right now.
type(trees)
np.shape(trees)
print(trees[1:6,:])
```
```{python plot}
plt.plot(trees[:,0], trees[:,1])
plt.show()
plt.clf() #Reset plot surface
```
This is a python script in the Spyder IDE of Anaconda. I have Python 3.6.2. The last two lines do nothing (apparently) when I run the script, but work if I type them in the IPython console of Spyder. How do I get them to work in the script please?
# import packages
import os # misc operating system functions
import sys # system parameters and functions
import pandas as pd # data frame handling
import statsmodels.formula.api as sm # stats module
import matplotlib.pyplot as plt # matlab-like plotting
import numpy as np # big, fast arrays for maths
# set working directory to here
os.chdir(os.path.dirname(sys.argv[0]))
# read data using pandas
datafile = '1314 Powerview Pasture Potential.csv'
data1 = pd.read_csv(datafile)
#list(data) # to show column names
# plot some data
# don't know how to pop out a separate window
plt.scatter(data1['long'],data1['lat'],s=40,facecolors='none',edgecolors='b')
# simple multiple regression
Y = data1[['Pasture and Crop eaten t DM/ha']]
X = data1[['Net imported Supplements per Ha',
'LWT/ha',
'lat']]
result = sm.OLS(Y,X).fit()
result.summary()
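The thread gives no answer here, so the following is only a likely explanation. An interactive console echoes the value of a bare expression, while a script evaluates it and throws the result away unless it is printed. The sketch uses a made-up helper, run_as_script, to run two tiny sources the way a script file is executed and capture what each writes to standard output.

```python
import contextlib
import io


def run_as_script(source):
    """Execute source the way a script file runs and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<script>", "exec"), {})
    return buffer.getvalue()


# A bare expression on the last line: evaluated, then discarded.
silent = run_as_script("total = 21 * 2\ntotal\n")

# The same value wrapped in print(): written to standard output.
printed = run_as_script("total = 21 * 2\nprint(total)\n")

print(repr(silent), repr(printed))
```

Applied to the script above, that would mean print(result.summary()), and probably plt.show() after plt.scatter(...) to bring up the figure, depending on which matplotlib backend Spyder is configured to use.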
My question is: How can I write an IPython cell magic which has access to the namespace of the IPython notebook?
IPython allows writing user-defined cell magics. My plan is creating a plotting function which can plot one or multiple arbitrary Python expressions (expressions based on Pandas Series objects), whereby each line in the cell string is a separate graph in the chart.
This is the code of the cell magic:
def p(line, cell):
    import pandas as pd
    import matplotlib.pyplot as plt
    df = pd.DataFrame()
    line_list = cell.split('\n')
    counter = 0
    for line in line_list:
        df['series' + str(counter)] = eval(line)
        counter += 1
    plt.figure(figsize = [20,6])
    ax = plt.subplot(111)
    df.plot(ax = ax)
def load_ipython_extension(ipython):
    ipython.register_magic_function(p, 'cell')
The function receives the entire cell contents as a string. This string is then split by line breaks and evaluated using eval(). The result is added to a Pandas DataFrame. Finally the DataFrame is plotted using matplotlib.
Usage example: First define the Pandas Series object in IPython notebook.
import pandas as pd
ts = pd.Series([1,2,3])
Then call the magic in IPython notebook (whereby the whole code below is one cell):
%%p
ts * 3
ts + 1
This code fails with the following error:
NameError: name 'ts' is not defined
I suspect the problem is that the p function only receives ts * 3\n ts + 1 as a string and that it does not have access to the ts variable defined in the namespace of IPython notebook (because the p function is defined in a separate .py file).
How does my code have to be changed so the cell magic has access to the ts variable defined in the IPython notebook (and therefore does not fail with the NameError)?
Use the @needs_local_scope decorator. The documentation is a bit sparse, but you can see how it is used, and contributions to the docs would be welcome.
You could also use shell.user_ns from Magics. For example something like:
from IPython.core.magic import Magics
class MyClass(Magics):
    def myfunc(self):
        print(self.shell.user_ns)
See how it's used in code examples: here and here.
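As a rough sketch of the first suggestion, the magic from the question could be restructured as below. The helper evaluate_lines is a hypothetical name, and the plotting step is left out so the example stays small: it only shows each line of the cell being evaluated against the namespace IPython hands over through the local_ns keyword. Depending on the IPython version, local_ns is either the notebook's user namespace or the local scope the cell runs in.

```python
def evaluate_lines(cell, namespace):
    """Evaluate every non-empty line of cell as an expression in namespace."""
    expressions = [text for text in cell.splitlines() if text.strip()]
    return {
        "series" + str(index): eval(expression, namespace)
        for index, expression in enumerate(expressions)
    }


def p(line, cell, local_ns=None):
    # local_ns is supplied by IPython when the function is marked with
    # needs_local_scope; fall back to an empty namespace otherwise.
    namespace = local_ns if local_ns is not None else {}
    return evaluate_lines(cell, namespace)


def load_ipython_extension(ipython):
    # Imported here so the module can also be loaded outside IPython.
    from IPython.core.magic import needs_local_scope
    ipython.register_magic_function(needs_local_scope(p), 'cell')
```

Outside a notebook the function can be exercised directly, for example p("", "ts * 3\nts + 1", local_ns={"ts": 2}). The returned dict could then be turned into a DataFrame and plotted as in the question.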
I'm using spyder and have written the following class:
class Ray:
    def __init__(self, r, p, k):
        if r.shape == (3,):
            self.r = r
        if p.shape == (3,):
            self.p = p
        if k.shape == (3,):
            self.k = k
    r = array(range(3))
    p = array(range(3))
    k = array(range(3))
It is stored in /home/user/workspace/spyder/project, which is also the console's working directory. In the console I can run array(range(3)) and it returns an array with values 0,1,2. However when doing
import ray
I get the following error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ray.py", line 8, in <module>
    class Ray:
  File "ray.py", line 20, in ray
    r = array(range(3));
NameError: name 'array' is not defined
EDIT:
By default Spyder has the following behaviour. I don't really understand why array() works by default; I thought it was only part of numpy.
import numpy as np # NumPy (multidimensional arrays, linear algebra, ...)
import scipy as sp # SciPy (signal and image processing library)
import matplotlib as mpl # Matplotlib (2D/3D plotting library)
import matplotlib.pyplot as plt # Matplotlib's pyplot: MATLAB-like syntax
from mayavi import mlab # 3D plotting functions
from pylab import * # Matplotlib's pylab interface
ion() # Turned on Matplotlib's interactive mode
Within Spyder, this interpreter also provides:
* special commands (e.g. %ls, %pwd, %clear)
* system commands, i.e. all commands starting with '!' are subprocessed
(e.g. !dir on Windows or !ls on Linux, and so on)
You need from numpy import array.
This is done for you by the Spyder console. But in a program, you must do the necessary imports; the advantage is that your program can be run by people who do not have Spyder, for instance.
I am not sure what Spyder imports for you by default. array might be imported through from pylab import * or equivalently through from numpy import *. If you want to copy code directly from the Spyder console into a program, you might need from numpy import * or even from pylab import *. Doing this in a program is officially discouraged, though, because it pollutes the program's namespace; the customary approach is import numpy as np and then np.array(…).
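For illustration, a version of ray.py with the explicit import might look like the sketch below. Raising ValueError on a wrong shape is an assumption added here; the original class silently skips the assignment in that case.

```python
import numpy as np  # explicit import; a module cannot rely on Spyder's console setup


class Ray:
    """Ray described by three 3-vectors (the names r, p, k follow the question)."""

    def __init__(self, r, p, k):
        for name, value in (("r", r), ("p", p), ("k", k)):
            vector = np.asarray(value)
            # Assumption: reject wrong shapes instead of silently skipping them.
            if vector.shape != (3,):
                raise ValueError(f"{name} must have shape (3,), got {vector.shape}")
            setattr(self, name, vector)


ray = Ray(np.array(range(3)), np.array(range(3)), np.array(range(3)))
print(ray.r)
```

With the np. prefix, the module no longer depends on whatever names the console happened to import, so import ray works from any interpreter that has numpy installed.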