How to prevent the import of a module in a Jupyter notebook? - python

I have set up a Jupyter notebook server for multiple users to run notebooks. I want to provide modules that fetch data and do some pre-processing. Since the data and the processing code are proprietary, I don't want users to be able to retrieve the source code, which they could do via the inspect module.
I have two questions:
Is there a way to prevent the inspect module from loading? I have seen this in Quantopian notebooks, where importing the inspect module throws an error.
Are there other ways to prevent access to the source code of the modules?
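One partial answer, as a sketch: install a meta-path finder that refuses to resolve blocked module names before any user code runs (the `ImportBlocker` class name is mine, not a standard API). Caveats apply: this only stops a plain `import` in the notebook process, and a determined user can remove the hook or dig through `sys.modules`, so truly hiding proprietary source ultimately means not shipping it (compiled extensions, or keeping the processing behind a server-side API).

```python
import sys

class ImportBlocker:
    """Meta-path finder that refuses to resolve blocked module names."""
    def __init__(self, *blocked):
        self.blocked = set(blocked)

    def find_spec(self, name, path=None, target=None):
        # Block the module and any of its submodules
        if name.split(".")[0] in self.blocked:
            raise ImportError(f"import of {name!r} is disabled on this server")
        return None  # not ours -- let the remaining finders handle it

# Install the hook ahead of the normal finders
sys.meta_path.insert(0, ImportBlocker("inspect"))
```

After this runs, `import inspect` raises ImportError, provided inspect is not already cached in `sys.modules` (many libraries import it internally, which is one reason this is only a deterrent).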

Related

How to run multiple notebooks in Google Colab

I have several different Notebooks in Google Colab that I use to automate some tasks. I want to create a single Notebook that runs all these tasks, so I can open a single tab with one Notebook, run it, and have it run all the tasks inside the other Notebooks.
I have two questions regarding this problem:
The solution I found is not working (I describe it below). How do I make it work?
Is there a better solution than the one I found?
About the first question:
Imagine I have Notebook_1 and Notebook_2, each with a bunch of functions that automate my tasks. What I am doing is downloading them as Notebook_1.py and Notebook_2.py and saving these files in a Google Drive folder. Then in Notebook_main, the notebook that should drive all the others, I run:
# Mounts Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Copies .py files into Google Colab
!cp /content/drive/MyDrive/my_modules/Notebook_1.py /content
!cp /content/drive/MyDrive/my_modules/Notebook_2.py /content
# Import Modules
import Notebook_1
import Notebook_2
If I want to run a simple function inside these modules I just do:
Notebook_1.run_simple_function()
and this works. My problem happens when the function I am trying to run from the Notebook_1, for example, uses another module. Then I get the following error:
name 'os' is not defined
I imagine it happens because inside Notebook_1.py I call:
import os
...
os.remove(os.path.join(dir_download, item))
...
And I also think this will happen with all the modules that I call inside Notebook_1.py.
I have tried importing these modules in Notebook_main, but it did not work. I do not know how to fix this. I need help.
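On the `name 'os' is not defined` error: imports made in Notebook_main never become visible inside Notebook_1. Each module resolves names in its own namespace, so every dependency must be imported at the top of the .py file itself. A minimal sketch of what Notebook_1.py should contain (`run_simple_function` is the question's own name; `clean_downloads` is a hypothetical stand-in for the `os.remove` snippet):

```python
# Notebook_1.py -- every dependency this module uses is imported here;
# imports made in Notebook_main do not leak into this namespace.
import os

def run_simple_function():
    return "ok"   # stand-in for the question's automation code

def clean_downloads(dir_download):
    # 'os' resolves here because of the import at the top of *this* file
    for item in os.listdir(dir_download):
        os.remove(os.path.join(dir_download, item))
```

The same applies to Selenium: install it once in Notebook_main (installation is per-runtime, not per-module), but put `from selenium import webdriver` inside Notebook_1.py, where the functions that use it live.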
Another issue is that I use Selenium a lot, and it needs to be installed in Google Colab before it can be imported. So I need to install and import it in Notebook_main, and when I run a function with Notebook_1.run_function_that_uses_selenium() it should use the import from Notebook_main.
The second question is simpler. I just want to know if there is a better way to achieve the same result, i.e. run different Notebooks in Google Colab from a single notebook.
My constraint is that I can only use Google Colab and other Google platforms; I cannot run anything locally.
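On the second question, one common alternative, as a sketch: skip the `!cp` step entirely and load each .py file straight from the mounted Drive with importlib (the Drive path in the comment is the one from the question, left commented out because it only exists inside a Colab session):

```python
import importlib.util
import sys

def import_from_path(name, path):
    """Load a .py file as a module without copying it next to the notebook."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module          # register before exec, like a normal import
    spec.loader.exec_module(module)
    return module

# In Colab, after drive.mount, something like:
# Notebook_1 = import_from_path(
#     "Notebook_1", "/content/drive/MyDrive/my_modules/Notebook_1.py")
```

An even simpler variant is `sys.path.append("/content/drive/MyDrive/my_modules")` followed by a plain `import Notebook_1`; the importlib version is just more explicit about which file is being loaded.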

Import error for CSV file in Azure Databricks

I have a CSV file with a lot of text data. I am trying to import it into Azure Databricks using Python pandas, but it gives a long list of errors, primarily this one: ERROR: Internal Python error in the inspect module. However, when I put the file on my local desktop and import it there using Jupyter/Spyder, it loads without any errors.
I have also tried the encoding='UTF-8' option while importing it in Azure Databricks, but it still shows the error. Any idea how to tackle this?
Problem solved: I had to pass encoding='cp1252'. I'm not sure why this option was needed, but I tried several encodings and this one worked. There were several symbols and brackets in the text fields, so this might be useful for anyone importing similar data and hitting the same problem.
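For reference, a self-contained sketch of the fix (it first writes a small cp1252-encoded file so the example can run anywhere; `data.csv` is a stand-in name, and cp1252 is the Windows "Western European" code page, a frequent source of curly quotes and similar symbols that break a UTF-8 read):

```python
import pandas as pd

# Create a file with a cp1252-only character (a curly quote) to read back
with open("data.csv", "w", encoding="cp1252") as f:
    f.write("id,text\n1,\u201chello\u201d\n")

# Passing the matching encoding is what resolved the question's error
df = pd.read_csv("data.csv", encoding="cp1252")
```

If the true encoding of a file is unknown, trying `utf-8`, `cp1252`, and `latin-1` in that order covers most Western-language exports.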

Python - Can't Get Pandas and Numpy Working in Visual Studio Code or Eclipse

I'm fairly new to IDEs and I'm trying to take courses in Python. No matter what I try, I cannot successfully run a Python script that imports pandas and numpy in either Visual Studio Code or Eclipse (on Windows 10). I have Python 3.8 installed, and when I run those imports in the shell they work fine. I suspect that when I execute an actual Python script instead of using the console, it uses a different interpreter, and that is when I get errors saying numpy is not defined. I also get the error "cannot import name 'numpy' from partially initialized module 'pandas' (most likely due to a circular import)" when I write "from pandas import numpy" rather than "from pandas import *".
I am very frustrated and don't know how to fix this problem. I've tried searching for help but not having a programming background, I don't know where to go to resolve this or how.
I also cannot get pip or pip3 to work at all to install packages. Those commands don't get recognized.
Please help!
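A quick way to test the "different interpreter" suspicion is to print the interpreter path from inside the failing script and compare it with the shell where the imports work; if they differ, point the IDE at the working one:

```python
import sys

# Compare this output with the shell where `import pandas` succeeds.
# If the paths differ, select this interpreter in the IDE -- and install
# packages with `python -m pip install pandas numpy` so pip targets it
# even when the bare `pip` command is not on PATH.
print(sys.executable)
print(sys.version)
```

Running pip as `python -m pip` also sidesteps the "pip is not recognized" problem, since it uses the interpreter directly instead of relying on PATH.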
I recommend using Jupyter Notebooks or PyCharm (an IDE). Both are very useful for learning Python and for working with data: manipulation and visualization.
PyCharm knows your code: rely on it for intelligent code completion, on-the-fly error checking, quick-fixes, and easy project navigation.
Jupyter Notebooks, meanwhile, can run code cell by cell and rerun specific cells after changes, and their inline output is very useful for debugging and visualization. You can get Jupyter from https://jupyter.org.
Zeppelin Notebooks can also serve as an alternative.

AWS Lambda Deployment Package Size Limits in Python

I want to run my code in an AWS Lambda function. To do so, I need to import some Python packages (pandas, numpy, sklearn, scipy).
I have two problems:
First of all, the unzipped size of the packaged Python zip file is greater than 250 MB.
Secondly, I got errors using scipy as well as sklearn:
Unable to import module 'lambda_function': cannot import name
'_ccallback_c'
or
Unable to import module 'lambda_function': No module named
'sklearn.check_build._check_build'
___________________________________________________________________________
Contents of /var/task/sklearn/__check_build:
__pycache__
_check_build.cpython-35m-x86_64-linux-gnu.so
setup.py
__init__.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.
I tried reinstalling many times, but I still have problems with sklearn and scipy.
Any idea?
Sample code in the AWS Lambda function:
import json
import numpy
import pandas
import sklearn
import scipy

def lambda_handler(event, context):
    # TODO implement
    print(event)
    return
You appear to have two issues.
The first (and easiest to solve) is that you need to install the relevant modules on a Linux distro comparable to Amazon Linux.
You can either do this using EC2 or in a Docker container with Amazon Linux on it.
The second issue (which is a bit trickier if not impossible to solve given the size of the modules you want to use) is that you need to get your deployment size down to under 250MB unzipped and under 50MB zipped.
Using the relevant CFLAGS when installing may get you part of the way there. See here for an idea of what might work.
If you are still over limit (which I suspect you will be) your only choice left will be to delete some of the files in the modules which you believe will not be used in your particular program. This is risky, often error prone and usually takes many attempts to get right. Using code coverage tools may help you here, as they can indicate which files are actually being used.
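When trimming files, it helps to measure the unzipped package size against the 250 MB limit before each upload attempt; a small helper for that (the `package/` directory name in the comment is a stand-in for wherever the dependencies were installed):

```python
import os

def dir_size_mb(path):
    """Total size of all files under `path`, in megabytes."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)

# e.g. check a build directory against Lambda's 250 MB unzipped limit:
# assert dir_size_mb("package") < 250, "over the Lambda size limit"
```

Running it after each round of deletions makes the trial-and-error loop at least measurable.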

History saving thread error when trying to import pandas

I just installed IPython on a remote desktop at work. I had to create a shortcut on my desktop to connect to IPython because the remote desktop does not have internet access. I am able to successfully open the IPython notebook. However, when I try to import pandas
import pandas as pd
I get this error that I have never seen before
The history saving thread hit an unexpected error (OperationalError('database or disk is full',)). History will not be written to the database.
Does this error relate to how it was installed on the remote desktop?
I suffered from this problem for a long time. My dirty fix was simply to restart the kernel and go on with my work. However, I did find a way to eliminate it for good. This question seems to have different answers for different users, so I'll try to list them all, based on answers elsewhere (all links at the end).
So the issue seems to be caused by a file called nbsignatures.db, and we simply need to remove it. You may find the file in any one of these locations:
~/.local/share/jupyter/nbsignatures.db (I found mine here)
~/.ipython/profile_default/security/nbsignatures.db
~/Library/Jupyter/nbsignatures.db
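A sketch that automates the cleanup across those three locations (the function names are mine, and the actual deletion call is left commented out so nothing is removed by accident):

```python
from pathlib import Path

def nbsignature_candidates(home):
    # The three locations reported by different users
    return [
        home / ".local/share/jupyter/nbsignatures.db",
        home / ".ipython/profile_default/security/nbsignatures.db",
        home / "Library/Jupyter/nbsignatures.db",
    ]

def remove_nbsignatures(home=None):
    removed = []
    for p in nbsignature_candidates(home or Path.home()):
        if p.exists():
            p.unlink()                # Jupyter rebuilds the database on demand
            removed.append(p)
    return removed

# remove_nbsignatures()  # uncomment to actually delete the file(s)
```

Deleting the file is safe in the sense that Jupyter recreates it on demand; only the notebook trust signatures are reset.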
All links:
https://github.com/ipython/ipython/issues/9293
IPython Notebook error: Error loading notebook
