Fail to use imports in jupyter notebook in vscode - python

I have some pretty straightforward code that runs smoothly with Python 3.7:
import academic_data_settings as local_settings
import pandas as pd
import glob
import os
def get_all_data():
    all_files = glob.glob(os.path.join(local_settings.ACADEMIC_DATA_SOURCE_PATH, "*.csv"))
    df_from_each_file = [pd.read_csv(f) for f in all_files]
    concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
    return concatenated_df
if __name__ == "__main__":
    raw_data = get_all_data()
    print(raw_data)
However, it is pretty hard to visualize the data in the pandas dataframe.
In order to view the data, I found the following article on how to use Jupyter notebook directly from VSCode: https://devblogs.microsoft.com/python/data-science-with-python-in-visual-studio-code/
In order to be able to see the Python interactive window, I needed to turn the code into a jupyter cell:
#%%
import academic_data_settings as local_settings
import pandas as pd
import glob
import os
def get_all_data():
    all_files = glob.glob(os.path.join(local_settings.ACADEMIC_DATA_SOURCE_PATH, "*.csv"))
    df_from_each_file = [pd.read_csv(f) for f in all_files]
    concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
    return concatenated_df
if __name__ == "__main__":
    raw_data = get_all_data()
    print(raw_data)
As soon as I try to run or debug the cell, I get an exception at the first line:
import academic_data_settings as local_settings...
ModuleNotFoundError: No module named 'academic_data_settings'
I believe that the cell evaluation only sends the code of the current cell. Is that correct?
Is there a way to get the import to work correctly?
I wouldn't like to end up writing Jupyter notebooks and then copy over the code to what will end up being the 'production' code.

I had a similar issue. I could import modules in IPython in the vscode terminal but not in the vscode interactive window (or jupyter notebook).
Changing the .vscode/settings.json file from
{
    "python.pythonPath": "/MyPythonPath.../bin/python"
}
to
{
    "python.pythonPath": "/MyPythonPath.../bin/python",
    "jupyter.notebookFileRoot": "${workspaceFolder}"
}
resolved it for me.

Jaep. I'm a developer on this extension and I think I know what is happening here based on this comment:
#Lamarus it is sitting next to the file I run
The VSCode Python interactive features use a slightly different relative loading path than Jupyter. In VSCode the relative path is relative to the folder or workspace that you have opened, whereas in Jupyter it is relative to the file. To work around this, you can either change your path to academic_data_settings to be relative to the opened folder, or set the Notebook File Root setting to point to the location that you want to be the working root for this workspace. We have a bug open to support using the current file location as the notebook file root; upvote it if you want that:
https://github.com/microsoft/vscode-python/issues/4441
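Another way to make the import independent of the working directory is to put the module's folder on sys.path before importing. This is only a minimal sketch: the helper name add_to_sys_path is hypothetical and not part of the extension, and "." is a placeholder for whichever folder actually holds academic_data_settings.py.

```python
import os
import sys


def add_to_sys_path(directory):
    """Insert `directory` at the front of sys.path unless it is already listed."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Show where the interactive window is actually running from.
print("cwd:", os.getcwd())

# Hypothetical usage: replace "." with the folder that holds the module to import.
add_to_sys_path(".")
```

Run this in the first cell, before the import that fails. Printing the cwd also confirms whether the interactive window starts in the workspace folder or next to the file.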

Related

Why can I run my PyTorch dataset file from a python file but not a notebook?

I've written a custom dataset class in PyTorch in a file dataset.py and tried to test it by running the following code in my notebook:
from dataset import MyCustomDataset
from torch.utils.data import DataLoader
ds = MyCustomDataset("/Volumes/GoogleDrive/MyDrive/MyProject/data/train.pkl",target_type="labels")
dl = DataLoader(ds,batch_size = 16, shuffle = False)
X, y = next(iter(dl))
print(f"X: {X}, y: {y}")
After some unsuccessful troubleshooting, I tried running the exact same code in a file test.py, which worked without issues!
Why can't I run this from my notebook?
For me, the problem is usually the pathing somehow, but in this case, all of the files, both .py, .ipynb and "data"-directory are in the same directory "MyProject". I've tried with both absolute paths (as in the example) and with relative paths, but it's the same result in both cases. I'm using vscode if that gives any insight.
Furthermore, the error message in the notebook is "list indices must be integers or slices, not str". Unfortunately, the traceback points to the wrong lines (there's a comment on the line where the error is supposed to be). But if this were really an error, it should not work in a Python file either, right?
Any help or suggestions are welcome!
Try to check if there is any problem with the path
import os.path
from os import path
a= path.exists("/Volumes/GoogleDrive/MyDrive/MyProject/data/train.pkl")
print(a)
If it prints True, the path is not the issue and you need to provide more details in your question.
Jupyter and a Python file have different working directories (cwd). You can run this to get the cwd:
import os
print(os.getcwd())
And you can add this in the settings.json file to modify the cwd of jupyter notebook to let it take the ${workspaceFolder} as the cwd like the python file does:
"jupyter.notebookFileRoot": "${workspaceFolder}",

How can I set the path of the script as working directory in Python using Atom editor?

I want to set the path of the Python script as the working directory. I've tried the solutions I've found, but they aren't working for me.
This solution:
import os
path = os.path.dirname(os.path.realpath(sys.argv[0]))
dn = os.path.dirname(os.path.realpath("__file__"))
dn
gives:
'C:\\Users\\23392\\Desktop'
and my script is in a folder of desktop.
This solution:
import os
print(os.path.dirname(os.path.realpath(__file__)))
gives the following error:
NameError: name '__file__' is not defined
I need to define it as a string to prevent the error. And I get the same result as the previous one:
'C:\\Users\\23392\\Desktop'
The path should be:
C:\Users\23392\Desktop\05_Work\test.py
EDIT
I've found a partial solution. If I open the file with right click->open with->Atom, it recognizes the path of the file. It works this way, but there has to be another way to do it.
Please try this version:
import os
# Directory that contains this script
dirpath = os.path.dirname(os.path.abspath(__file__))
basename = os.path.basename(__file__)
fullpath = os.path.join(dirpath, basename)
print(fullpath)
Write this code into a file and then run it using the Python interpreter.
If you try it from an interactive shell it will not work, __file__ is not defined in the interactive interpreter because it is meaningless there.
It is set by the import implementation, so if you use a non-standard import mechanism it might also be unset.
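The case where __file__ is undefined can be handled with an explicit fallback. A minimal sketch, assuming it is acceptable to fall back to the current working directory in an interactive session; the helper name script_dir is hypothetical.

```python
import os


def script_dir(namespace):
    """Return the directory of the running script, or the cwd if __file__ is unset.

    `namespace` is a dict such as globals(); interactive sessions usually
    do not define "__file__" in it.
    """
    file_path = namespace.get("__file__")
    if file_path is None:
        return os.getcwd()
    return os.path.dirname(os.path.abspath(file_path))


print(script_dir(globals()))
```

Passing globals() explicitly avoids the NameError, because the lookup goes through dict.get instead of referencing the name directly.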

Struggling to handle paths in Jupyter Notebook

I'm struggling to handle my paths for the project. To give you an overview I need to show you my directory tree:
I'd like to setup the paths correctly so that I won't have to change them when working on two machines.
In PortfolioOptimizer notebook, I'm using:
# set current working path
notebook_path = os.getcwd()
print (notebook_path)
I don't understand why it prints out C:\xampp\htdocs\tools\python\learn2fly, which is the path to a different project.
Even when I add let's say portfolio_paths.py to Portfolio_analysis directory with this code:
import os
def get_dir_path():
    current_path = os.getcwd()
    return current_path
and then in my notebook I use the below line of code:
from Portfolio_analysis.portfolio_paths import get_dir_path
# set current working path
notebook_path = get_dir_path()
I'm still getting C:\xampp\htdocs\tools\python\learn2fly
getcwd() returns the current working directory, which may change depending on how you run Jupyter Notebook or Lab (for example, if you use --notebook-dir).
Also take a look at this answer.
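Since the working directory depends on how Jupyter was started, one option that carries over between machines is to locate the project root by searching upward for a known marker file. This is a sketch under assumptions: the helper name find_project_root and the marker file name are hypothetical, and the project needs some file or folder that only exists at its root.

```python
from pathlib import Path


def find_project_root(start, marker):
    """Walk upward from `start` until a directory containing `marker` is found.

    Returns the matching directory as a Path, or None if no parent has the marker.
    """
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / marker).exists():
            return candidate
    return None


# Hypothetical usage: ".git" stands in for any marker at the project root.
print(find_project_root(Path.cwd(), ".git"))
```

This still starts from the cwd, so it only helps when the notebook is launched from somewhere inside the project tree.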

Python Sys Path For Calling py File

When I run the predict.py file on its own, it finds and reads the data.csv file. But it fails when I run predict.py from the asd.py file, which is in another path:
My Files
-sbin
  -master
    +asd.py
-scripts
  -bash
    -dataset
      +data.csv
    +predict.py
asd.py
import os
import sys
runPath = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(runPath, "../../scripts/bash"))
from predict import pred
pred().main()
predict.py
import pandas as pd
class pred:
    def main(self):
        data = pd.read_csv('dataset/data.csv', sep=';')
Could this error be caused by the change in the working path, or is it due to another mistake I've missed?
Error:
FileNotFoundError: File b'dataset/data.csv' does not exist
Longer answer to the comment above:
This happens because, although you appended the scripts folder to sys.path in order to import from predict.py, the code still runs with the current working directory of the calling script (asd.py) whenever you call it from there.
That means the relative path dataset/data.csv does not exist relative to the current working directory of asd.py (sbin/master), and consequently the code raises a FileNotFoundError.
The way to fix this, and to be able to run your code from anywhere, is to give predict.py the absolute path of your dataset file.
To do that without hardcoding it, I would do what you did to get your runPath: store the absolute path of the directory containing predict.py in a variable and use os.path.join to join it to the dataset file. That way you can always be sure the dataset file will be found by whatever calling script uses the code in predict.py.
Example below:
predict.py
import pandas as pd
import os
current_dir = os.path.dirname(os.path.realpath(__file__))
class pred:
    def main(self):
        # Build the path relative to predict.py, not to the caller's cwd
        data_fullpath = os.path.join(current_dir, 'dataset/data.csv')
        data = pd.read_csv(data_fullpath, sep=';')
I think you should use an absolute path instead of a relative path.
The sys.path documentation says:
A list of strings that specifies the search path for modules
So I would not expect relative paths to start from the path you set in asd.py.
The script is trying to open dataset/data.csv starting from the current folder where the script was started.
In your case I would try to pass that path somehow to the second script.
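Passing the path to the second script could look roughly like the sketch below. It is not the original predict.py: the standard-library csv module stands in for pandas to keep the example self-contained, and the function names read_rows and default_data_path are hypothetical.

```python
import csv
import os


def default_data_path(module_file):
    """Build the dataset path relative to the module that owns it, not the cwd."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, "dataset", "data.csv")


def read_rows(data_path):
    """Read a semicolon-separated file from an explicit path."""
    with open(data_path, newline="") as handle:
        return list(csv.reader(handle, delimiter=";"))
```

Inside predict.py, the call would be read_rows(default_data_path(__file__)). A caller such as asd.py can instead pass any other path explicitly, so the result no longer depends on where the process was started.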

Configure a first cell by default in Jupyter notebooks

Is there a way to configure a default first cell for a specific Python kernel in the Jupyter notebook? I agree that default Python imports go against good coding practices.
So, can I configure the notebook such that the first cell of a new python notebook is always
import numpy as np
for instance?
Creating an IPython profile as mentioned above is a good first solution, but IMO it isn't totally satisfying, especially when it comes to code sharing.
The names of your libraries imported through the command exec_lines do not appear in the notebook, so you can easily forget it. And running the code on another profile / machine would raise an error.
Therefore I would recommend using a Jupyter notebook extension, because the imported libraries are displayed. It also saves you from always importing the same libraries at the beginning of a notebook.
First you need to install the nbextension of Jupyter.
You can either clone the repo: https://github.com/ipython-contrib/jupyter_contrib_nbextensions
or use pip: pip install jupyter_contrib_nbextensions
Then you can create an nbextension by adding a folder 'default_cells' to the path where the nbextensions are installed. For example, on Ubuntu it's /usr/local/share/jupyter/nbextensions/, and on Windows it may be C:\Users\xxx.xxx\AppData\Roaming\jupyter\nbextensions\
You have to create 3 files in this folder:
main.js which contains the js code of the extension
default_cells.yaml description for the API in Jupyter
README.MD the usual description for the reader appearing in the API.
I used the code from: https://github.com/jupyter/notebook/issues/1451
My main.js is:
define([
    'base/js/namespace'
], function(
    Jupyter
) {
    function load_ipython_extension() {
        // A notebook with a single cell is treated as new
        if (Jupyter.notebook.get_cells().length === 1) {
            // Insert the default cell at the top of the notebook
            Jupyter.notebook.insert_cell_above('code', 0).set_text("# Scientific libraries\nimport numpy as np\nimport scipy\n\n# import Pandas\n\nimport pandas as pd\n\n# Graphic libraries\n\nimport matplotlib.pyplot as plt\n%matplotlib inline\nimport seaborn as sns\nfrom plotly.offline import init_notebook_mode, iplot, download_plotlyjs\ninit_notebook_mode()\nimport plotly.graph_objs as go\n\n# Extra options \n\npd.options.display.max_rows = 10\npd.set_option('display.max_columns', 50)\nsns.set(style='ticks', context='talk')\n\n# Creating alias for magic commands\n%alias_magic t time");
        }
    }
    return {
        load_ipython_extension: load_ipython_extension
    };
});
the .yaml has to be formatted like this :
Type: IPython Notebook Extension
Compatibility: 3.x, 4.x
Name: Default cells
Main: main.js
Link: README.md
Description: |
  Add a default cell for each new notebook. Useful when you always import the same libraries
Parameters:
- none
and the README.md
default_cells
=========
Add default cells to each new notebook. You have to modify this line in the main.js file to change your default cell. For example
`Jupyter.notebook.insert_cell_above('code', 0).set_text("import numpy as np\nimport pandas as pd")`
You can also add another default cell by creating a new line just below:
`Jupyter.notebook.insert_cell_above('code', 1).set_text("from sklearn.metrics import mean_squared_error")`
**Don't forget to increment 1 if you want more than one extra cell.**
Then you just have to enable the 'Default cells' extension in the new tab 'nbextensions' which appeared in Jupyter.
The only issue is that it detects if the notebook is new, by looking at the number of cells in the notebook. But if you wrote all your code in one cell, it will detect it as a new notebook and still add the default cells.
Another half-solution: keep the default code in a file, and manually type and execute a %load command in your first cell.
I keep my standard imports in firstcell.py:
%reload_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
...
Then in each new notebook, I type and run %load firstcell.py in the first cell, and jupyter changes the first cell contents to
# %load firstcell.py
%reload_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
If you really just want a single import statement, this doesn't get you anything, but if you have several you always want to use, this might help.
Go there:
~/.ipython/profile_default/startup/
You can read the README:
This is the IPython startup directory
.py and .ipy files in this directory will be run prior to any code
or files specified via the exec_lines or exec_files configurables
whenever you load this profile.
Files will be run in lexicographical order, so you can control the
execution order of files with a prefix, e.g.::
00-first.py
50-middle.py
99-last.ipy
So you just need to create a file there like 00_imports.py which contains:
import numpy as np
If you want to add things like %matplotlib inline, use a .ipy file, which you can use directly as well.
Alternatively, there seems to exist another solution with notebook extension, but I don't know how it works, see here for the github issue of the topic:
https://github.com/jupyter/notebook/issues/640
HTH
An alternative which I find to be quite handy is to use %load command in your notebook.
As an example I stored the following one line code into a python file __JN_init.py
import numpy as np
Then whenever you need it, you could just type in:
%load __JN_init.py
and run the cell. You will get the intended package to be loaded in. The advantage is that you could keep a set of commonly used initialization code with almost no time to set up.
I came up with this:
1 - Create a startup script that will check for a .jupyternotebookrc file:
# ~/.ipython/profile_default/startup/run_jupyternotebookrc.py
import os
import sys
if 'ipykernel' in sys.modules:  # hackish way to check whether it's running a notebook
    path = os.getcwd()
    while path != "/" and ".jupyternotebookrc" not in os.listdir(path):
        path = os.path.abspath(path + "/../")
    full_path = os.path.join(path, ".jupyternotebookrc")
    if os.path.exists(full_path):
        get_ipython().run_cell(open(full_path).read(), store_history=False)
2 - Create a configuration file in your project with the code you'd like to run:
# .jupyternotebookrc in any folder or parent folder of the notebook
%load_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
You could commit and share your .jupyternotebookrc with others, but they'll also need the startup script that checks for it.
A quick and also flexible solution is to create template notebooks e.g.
One notebook with specific imports for a python 2.7 kernel:
a_template_for_python27.ipynb
Another notebook with different imports:
a_template_for_python36.ipynb
The preceding a_ has the advantage that your notebook shows up on top.
Now you can duplicate the notebook whenever you need it. The advantage over %load firstcell.py is that you don't need another file.
However the problem of this approach is that the imports do not change dynamically when you want to start an existing notebook with another kernel.
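A template notebook like the ones described above can also be generated from a script, since an .ipynb file is plain JSON. This is a minimal sketch assuming the nbformat 4 layout with a single code cell; the function name write_template_notebook is hypothetical, and the file name in the usage note is the one from this answer.

```python
import json


def write_template_notebook(path, source_lines):
    """Write a one-cell notebook whose first cell holds `source_lines`."""
    # Notebook sources are stored as a list of lines; all but the last end in "\n".
    source = [line + "\n" for line in source_lines[:-1]] + source_lines[-1:]
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1)
```

For example, write_template_notebook("a_template_for_python36.ipynb", ["import numpy as np", "import pandas as pd"]) would produce a notebook to duplicate. Kernel metadata is left empty here, so Jupyter will ask for or pick a kernel when the file is opened.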
While looking into the same question I found the Jupytemplate project, a pretty good lightweight solution for this kind of problem. Jupytemplate copies a template notebook on top of the notebook you are working on when you initialise the template or press a button. Afterwards the inserted cells are a completely normal part of your notebook and can be edited/converted/downloaded/exported/imported like any other notebook.
Github Project jupytemplate
