execution process in Jupyter Notebook - python

I have some questions about the way that Jupyter Notebook reads Python code lines. (Sorry for not being able to upload a code image; my reputation level is too low.)
There is a CSV file named 'train.csv', and I load it into a variable named 'titanic_df':
import pandas as pd
titanic_df=pd.read_csv('train.csv')
print(titanic_df)
This runs fine when executed. However, my question is:
import pandas as pd
# titanic_df=pd.read_csv('train.csv')
print(titanic_df)
This also runs, contrary to my intention. Even though I commented out the step that reads the CSV file, titanic_df still prints the data.
When I run the same code with the Python installed on my computer, the second version doesn't work, so I guess Jupyter Notebook executes code differently. How does Jupyter Notebook work?

Jupyter can be somewhat confusing at first, but I will explain what's going on here.
A sequence of events occurred after the following code was run in Jupyter:
import pandas as pd
titanic_df=pd.read_csv('train.csv')
print(titanic_df)
The first line imports the pandas module and loads it into the kernel's memory, so it is available to use. The second line calls the pd.read_csv function from the pandas module and binds the resulting DataFrame to the name titanic_df.
Both the imported module and the variable titanic_df now live in the kernel's memory, and they stay there until the kernel is restarted (or the names are explicitly deleted).
Therefore, to answer the question: when the pd.read_csv line of code is commented out like this:
# titanic_df=pd.read_csv('train.csv')
nothing has been removed from memory. The pandas module is still loaded, and, more importantly, the variable titanic_df created by the earlier run still exists in the kernel. The only thing that changes is that the commented line is not executed again when you run the cell, so titanic_df is simply not reassigned; print(titanic_df) therefore still prints the DataFrame produced by the earlier run.
Even if the first line of code (the import) were commented out as well, the pandas module and the existing variables would still remain in memory and ready to use in Jupyter. But if the kernel is restarted, nothing is reloaded automatically; the import and the assignment have to run again.
Also, know about restarting the kernel. If you were to comment out the first line of code but not the second, and then select "Restart kernel and run all cells" in Jupyter, the pandas module would not be loaded and the pd.read_csv line would raise a NameError, because your code refers to pd but the pandas module has not been imported.
Note that opening a saved notebook does not re-run its cells: the file stores the code and the previously saved outputs, and you have to run the cells again (for example with "Run All") to rebuild the kernel's state.
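As a minimal sketch of this kernel-state behaviour (the variable x is just an illustrative placeholder, not from the question), run these as two separate cells in a fresh notebook:
# Cell 1 - define a variable
x = 42
# Cell 2 - the assignment is commented out, but the name still exists in the kernel
# x = 42
print(x)  # still prints 42
After "Restart kernel and run all cells", Cell 2 on its own raises NameError: name 'x' is not defined, because the kernel's memory has been wiped.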

Related

Jupyter notebook kernel constantly needs to be restarted

My installation of Jupyter Notebook only allows the execution of a few cells before becoming unresponsive. After moving to this "unresponsive mode", the execution of any cell, even a newly written one with a basic arithmetic command, will not noticeably execute or show output. Restarting the kernel is the only solution I've been able to find and that makes development painfully slow.
I'm running Jupyter 1 and Python 3.9, and I'm on Windows 10. I've read the Jupyter documentation and I can't find a reference to this issue. Similarly, there is no console output when Jupyter goes into "unresponsive mode". I've resolved all warnings shown in the console on startup.
I apologize for such a vague question. My issue is that I'm not quite sure what's gone wrong as well. I'm doing some basic data analysis with pandas:
%pylab
import pandas as pd
import glob
from scipy.signal import find_peaks
# Import data
dataFiles = glob.glob("Data/*.spe")
dataList = [pd.read_csv(f, names=[f]) for f in dataFiles]
# Join data into one DataFrame for ease
combinedData = pd.concat(dataList, axis=1, join="inner")
# Trim off arbitrary header and footers for each data run
lowerJunkRow = 12
upperJunkRow = 16395
combinedData = combinedData.truncate(before=lowerJunkRow, after=upperJunkRow)
combinedData.reset_index(drop=True, inplace=True)
# Cast dataFrame to integers
combinedData = combinedData.astype(int)
# Sum all counts by channel to aggregate data
combinedData["sum"] = combinedData.sum(axis=1)
Edit: I tried working in a different notebook with similar libraries and everything worked fine until I referenced a variable that I hadn't defined. The kernel then exhibited the same behavior as above. I tried saving my data in one combined csv file to avoid the large amount of memory the above code generates, but no dice. I also experience the same issue in Jupyter Lab which leads me to believe it's a kernel issue.
It seems to me that you are processing a very large quantity of data. The reason for the 'unresponsive' state may simply be that the kernel is still executing a cell that takes a long time to run.
If you are concatenating multiple CSV files, I suggest at least saving the concatenated DataFrame as a CSV. You can then check whether this file exists (using the os module) and read it in, instead of going through the rigmarole of concatenating everything again.
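A minimal sketch of that caching idea, reusing the code from the question (the cache filename combined.csv and the to_csv/read_csv round trip are assumptions, not something from the original post):
import os
import glob
import pandas as pd

CACHE = "combined.csv"  # assumed cache filename

if os.path.exists(CACHE):
    # Reuse the previously concatenated data instead of rebuilding it
    combinedData = pd.read_csv(CACHE, index_col=0)
else:
    dataFiles = glob.glob("Data/*.spe")
    dataList = [pd.read_csv(f, names=[f]) for f in dataFiles]
    combinedData = pd.concat(dataList, axis=1, join="inner")
    combinedData.to_csv(CACHE)  # save once, so later runs can skip the concatenation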

pycharm project: source root and imports not updated?

I have a Python project (in PyCharm) with, let's say, two folders in it. One is called data and the other is algorithms. The data folder has a Python file where I import some data from an Excel sheet, and another file where I have defined some constants.
The algorithms folder has, let's say, one Python file where I import the constants and the data from the data folder. I use:
from data.constants import const
from data.struct import table
When I run the algorithms (in the algorithms folder), things work perfectly. But when I change a constant in the constants file, or the data in the Excel sheet, nothing changes: the constants are not updated when imported again, and the same goes for the imported Excel data. The old values of the constants and the table are still used.
I tried marking both folders as source root, but the problem persists.
What I do now is close PyCharm and reopen it, but if there is a better way to handle this than closing it and losing the variables in the Python console, I would be grateful to know about it!
I am not sure I have understood correctly, but try the following. Once you change the constants in the constants file, import them again:
from data.constants import const
After doing this, do you still see the old constants?
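If a plain re-import does not pick up the change, it is because Python caches imported modules in sys.modules. A minimal sketch of forcing a reload instead (module paths taken from the question; this assumes the stale values live in the current console session):
import importlib
import data.constants

importlib.reload(data.constants)  # re-executes constants.py, refreshing its values
from data.constants import const  # re-bind the name to the reloaded value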
Please try this:
from constants.constant import v
print('v =', v)
del v
The problem may be connected with module caching. Here is a similar problem to yours, but for Spyder:
Spyder doesn't detect changes in imported python files
Check this post:
pycharm not updating with environment variables
As it suggests, you may have to take a few steps to set the environment variables,
or check the solution suggested here:
Pycharm: set environment variable for run manage.py Task
I found the answer in this post:
https://stackoverflow.com/a/5399339/13890967
Basically, add these two lines in Settings > Console > Python Console:
%load_ext autoreload
%autoreload 2
see this answer as well for better visualization:
https://stackoverflow.com/a/60099037/13890967
and this answer for syntax errors:
https://stackoverflow.com/a/32994117/13890967

How to keep variable in memory - PyCharm

I'm running code quite frequently while working in PyCharm. The problem is that the code manipulates data temporarily stored in Excel (we'll move this to a database once the program is up and running), and loading the data takes time.
Is there a way in PyCharm to keep a variable in memory (without running the code in the console) even after the program has finished running?
data = pd.read_excel(path, index_col=0)
I want to avoid reloading the data every time I run the program.
No, this feature has not been implemented, and there is no way to do this once the program has exited.
If working in PyCharm is not a necessity, you could work in a Jupyter notebook: https://jupyter.org/
You could load your data in one cell and work with it in the following cells. Once executed, the result of a cell is kept in memory.
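A minimal sketch of that workflow, reusing the pd.read_excel call from the question (path is whatever the questioner already uses), split over two notebook cells:
# Cell 1 - slow load; run once per kernel session
import pandas as pd
data = pd.read_excel(path, index_col=0)

# Cell 2 - edit and re-run as often as needed; `data` stays in the kernel's memory
print(data.describe())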
I've found a dirty trick. I am aware this is a very, very non-Pythonic and inappropriate way of doing this, but it does the trick for me in this case. Again, this code is only used temporarily for testing and will be removed once I'm happy with the code.
The module I run is as follows:
import pandas as pd
from importlib import reload
import TestModule

data = pd.read_excel(path, index_col=0)  # Data is loaded only once
while True:
    reload(TestModule)  # pick up any edits made to TestModule
    TestModule.test_function(data)
    input("Press Enter to rerun the test")
Now in TestModule I have test_function, where I reload the ModyfiedModule I am working on and call any function I want to test.
TestModule:
import importlib
import ModyfiedModule

def test_function(data):
    importlib.reload(ModyfiedModule)  # pick up the latest edits to ModyfiedModule
    from ModyfiedModule import MyClass
    # Run a bunch of tests on MyClass
    # Code to test MyClass goes here
In this case, I load the data only once, and I can modify ModyfiedModule and perform the various tests defined in TestModule without needing to reload the data each time.
The only thing I need to do after modifying the code is to save ModyfiedModule and TestModule and press Enter in the console to rerun the test.

Why does 'from file import function_name' fail in my interactive Python session but works in a script?

I am moving common functions shared between a couple of Python files into a third file, common_commands.py. To test the functions, I imported a couple of them using:
from common_commands import func_one, func_two
But got this error:
ImportError: cannot import name 'func_two'
So I tried importing only func_one, and that worked fine, but then importing just func_two gives me the same error again! Why?! And to make things even more confusing, the exact same import line from above works perfectly fine when I put it in the scripts I am refactoring.
What is causing this odd behavior?
TL;DR:
I had renamed func_two since starting my interactive shell. Starting a new shell got things working.
What I learned:
I don't understand all the inner workings of the interactive shell and what happens when you call import, but after quitting and starting a new shell the exact same import call worked.
When I started the shell, func_two was named old_func_two, but then I decided to rename it and tried to import it with the new name, which failed. After scratching my head and doing some googling I found nothing that helped in my case, so I tried starting a new shell and it worked!
So I decided to do a little more experimenting before asking this question and learned that I could rename the function as much as I wanted after starting the shell but only until I first imported the file in some way.
That is to say, as soon as I called from common_commands import func_one I can no longer rename any functions and import them with the new name, since the file has already been imported. I can, however, still import old_func_two. I also tried changing the 'guts' of func_two after importing it and then importing it again and it kept the original behavior. So from what I can tell, the first time you import a file (not a function or class, but the whole file) it is cached and all future imports are run on the cached version, not the real file on disk.
So, even if you only import func_one i.e. from common_commands import func_one and then rename or change func_two and then import it, you'll have to use the original name of func_two and you'll get the original functionality as well, even though you didn't explicitly import it earlier.
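A minimal sketch of the caching behaviour described above, using the module and function names from the question (the sys.modules check and importlib.reload are the standard-library mechanisms behind this; the reload step is my addition, not something from the original post):
import sys
import importlib

from common_commands import func_one  # first import: the whole file is loaded and cached
print('common_commands' in sys.modules)  # True - the module object is now cached

# Renaming old_func_two to func_two on disk after this point has no effect,
# because later imports reuse the cached module:
# from common_commands import func_two  # would still raise ImportError

import common_commands
importlib.reload(common_commands)  # re-executes the file from disk
from common_commands import func_two  # the new name is now available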

Migrating Python code away from Jupyter and over to PyCharm

I'd like to take advantage of a number of features in PyCharm, hence I'm looking to port code over from my notebooks. I've installed everything but am now faced with issues such as:
The display function appears to fail, so DataFrame outputs (I used print) are not so nicely formatted. Is there an equivalent function?
I'd like to replicate the separate code cells of a Jupyter notebook. The Jupyter code is split over 9 cells in one notebook file, and Shift+Enter is an easy way to check outputs and then move on. Now I've had to place all the code in one project/Python file and have 1200 lines of code. Is there a way to section the code like it is in Jupyter? My VBA background envisions 9 routines and one additional calling routine to get the same result.
Each block of code imports data from SQL Server and some flat files, so there is some validation in between running them. I was hoping there was an alternative to manually selecting large chunks of code to execute, and/or setting breakpoints every time it's run.
Any thoughts/links would be appreciated. I spent some $$ on Udemy on a PyCharm course but it does not help me with this one.
Peter
The migration part is solved in this question: convert json ipython notebook(.ipynb) to .py file, but perhaps you already knew that.
The code-splitting part is harder. One reason why Jupyter is so widely used is the ability to split the output and run each cell separately. I would recommend @Andrew's answer, though.
If you are using classes, put each class in a new file.
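On the code-splitting point, PyCharm's Scientific Mode (in the Professional edition, as far as I know) recognises # %% markers as cell separators in an ordinary .py file, giving a cell-by-cell workflow similar to a notebook. A minimal sketch (the file name and the operations are placeholders, not from the original post), with df.to_string() as a rough plain-text stand-in for display():
# %% Cell 1 - load data (placeholder file name)
import pandas as pd
df = pd.read_csv("flat_file.csv")

# %% Cell 2 - validate before moving on
print(df.head().to_string())  # plain-text table instead of the notebook's display()

# %% Cell 3 - further processing
print(df.describe())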
