Migrating Python code away from Jupyter and over to PyCharm

I'd like to take advantage of a number of features in PyCharm, so I'm looking to port code over from my notebooks. I've installed everything, but I'm now faced with issues such as:
The display() function appears to fail, so dataframe outputs (I used print instead) are not as nicely formatted. Is there an equivalent function?
I'd like to replicate the separate code cells of a Jupyter notebook. The Jupyter code is split over 9 cells in the one notebook file, and Shift+Enter is an easy way to check outputs and then move on. Now I've had to place all the code in one project/Python file and have 1,200 lines of code. Is there a way to section the code like it is in Jupyter? My VBA background envisions 9 routines and one additional calling routine to get the same result.
Each block of code imports data from SQL Server and some flat files, so there is some validation in between running them. I was hoping there was an alternative to manually selecting and executing large chunks of code, and/or setting breakpoints every time it's run.
Any thoughts or links would be appreciated. I spent some $$ on a Udemy PyCharm course, but it does not help me with this one.
Peter

The migration part is solved in this question: convert json ipython notebook(.ipynb) to .py file, but perhaps you already knew that.
The code-splitting part is harder. One reason why Jupyter is so widely used is the ability to split the output and run each cell separately. I would recommend @Andrew's answer though.
If you are using classes, put each class in a new file.
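One concrete option worth adding (an assumption about your setup, but PyCharm's Scientific Mode does support it): "#%%" markers split a plain .py file into cells that can be run one at a time with Shift+Enter, much like Jupyter. A minimal sketch, with placeholder data standing in for the SQL Server and flat-file loads:
# %% imports (run once)
import pandas as pd

# %% block 1: load and validate data (placeholder for the real SQL/flat-file import)
df = pd.DataFrame({"flight": [1, 2, 3], "delay": [5, 0, 12]})
print(df.to_string())   # to_string() gives a notebook-like table in a plain script

# %% block 2: further processing, run only after block 1 looks right
print(df.describe().to_string())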

Related

Can the content of python Jupyter notebook cells be manipulated from within the code running inside said notebook?

I was wondering if it is possible to manipulate the content of a cell of a currently running jupyter notebook from the code running inside it.
The reason I want to do this is because I have a notebook that uses some remote API and requires an APIKEY.
I have APIKEY="".... as the contents of my first cell in the notebook.
From time to time I am required to update this API KEY.
In theory, what I had in mind was that if the running code gets an answer from the API that the token has expired or is not suitable for the current IP, it could ask me for an updated APIKEY (which I copy from the back office of said API) and update that cell on its own for the next runs.
I know that there might be security issues and maybe a ton of other things to consider before actually trying something like this - and I also assume that in theory it should be possible to manipulate the underlying notebook file - but I was still wondering if Jupyter exposes some sort of API for doing this type of in-place code manipulation.
This would also make it possible to do "self evolving" code or code generation of different sorts.
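For what it's worth, the notebook file on disk can be edited programmatically with the nbformat library, although the running kernel and the browser view will not pick up the change until the notebook is reloaded. A rough sketch (the file name and new key are hypothetical, not from the question):
import nbformat

path = "my_notebook.ipynb"                  # hypothetical notebook path
nb = nbformat.read(path, as_version=4)

# Assume, as described above, that the first cell holds the APIKEY assignment.
nb.cells[0]["source"] = 'APIKEY="replace-with-new-key"'

nbformat.write(nb, path)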

Split long and slow jupyter-notebook in separate blocks

Background:
I have a very long Jupyter notebook storing a lot of large NumPy arrays.
As I use it for documenting a project, the notebook consists of several independent blocks and one import block (necessary for all other blocks). The notebook gets very slow after many cells have been calculated, so I want to find a way to speed things up. The question below seems like the most solid and convenient approach to me at the moment, but I am open to other ideas.
My Question:
Is there a convenient way to define independent blocks of a Jupyter notebook and execute them separately from each other with just a few clicks?
Ideas I had so far:
Always put the latest block at the top of my notebook (after the import statements) and write a raise statement at the end of this block to prevent the execution of further blocks: this is somewhat messy, and I cannot execute blocks further down in the document with just a few clicks.
Split the notebook into separate notebook documents: this helps, but I want to keep a better overview of my work.
Delete all variables that were used in the current block after its execution: for whatever reason, this did not bring a considerable speedup. Is it possible that I did something wrong here?
Start the browser I use for the Jupyter notebook with a nice value (I am using Linux): this does not improve the performance of the notebook, but at least the computer keeps running fast and I can do something else on it while waiting for the notebook.
The workaround I will end up with, if I don't find a better solution here, is to define variables
actBlock1=False
actBlock2=True
actBlock3=False
and put if statements in all cells of a block. But I would prefer something that produces fewer unnecessary ifs and indents, to keep my work clean.
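Roughly, that workaround would look like this in each cell (the block's contents below are just a placeholder):
actBlock1 = False
actBlock2 = True
actBlock3 = False

# ...and then at the top of every cell belonging to, say, block 2:
if actBlock2:
    result = sum(range(10))   # stands in for the block's real work
    print(result)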
Thank you very much in advance,
You can take a look at the Jupyter Notebook Extensions package, and, in particular, at the Freeze extension. It will allow you to mark cells as "frozen" which means they cannot be executed (until you "unfreeze" them, that is).
For example: blue-shaded cells are "frozen" (you can mark them with the asterisk button in the toolbar), and after clicking "Run All" only the non-frozen cells are executed.

PyCharm: Storing variables in memory to be able to run code from a "checkpoint"

I've been searching everywhere for an answer to this, but to no avail. I want to be able to run my code and have the variables stored in memory so that I can set a "checkpoint" to run from in the future. The reason is that I have a fairly expensive function that takes some time to compute (as well as user input), and it would be nice if I didn't have to wait for it to finish every time I run the code after changing something downstream.
I'm sure a feature like this exists in PyCharm but I have no idea what it's called and the documentation isn't very clear to me at my level of experience. It would save me a lot of time if someone could point me in the right direction.
Turns out this is (more or less) possible by using the PyCharm console. I guess I should have realized this earlier because it seems so simple now (though I've never used a console in my life so I guess I should learn).
Anyway, the console lets you run blocks of your code, presuming the required variables, functions, libraries, etc. have been defined beforehand. You can actually highlight a block of your code in the PyCharm editor, right-click, and select "Run in console" to execute it.
This feature is not implemented in PyCharm (see the PyCharm forum) but seems to be implemented in Spyder.
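A programmatic workaround, separate from the answers above, is to cache the expensive result to disk so later runs can start from that "checkpoint". A minimal sketch, assuming the result is picklable (the file name and function are hypothetical):
import os
import pickle

CACHE_FILE = "expensive_result.pkl"   # hypothetical cache path

def get_expensive_result(compute, cache_file=CACHE_FILE):
    """Return the cached result if it exists, otherwise compute and cache it."""
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(cache_file, "wb") as f:
        pickle.dump(result, f)
    return result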

Merge CSV into HDF5 in pandas leads to crash

I have about 700 CSV files. They are each typically a few MB and a few thousand rows, so the total folder is ~1 GB. I want to merge them into a single HDF5 file.
I first defined a function read_file(file) that reads a single file and parses it using pd.read_csv(), then returns a dataframe.
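The question does not show read_file itself; a minimal sketch of what such a function might look like (any real parse options are assumptions):
import pandas as pd

def read_file(file):
    # Parse one CSV into a DataFrame; the real code may pass dtypes, parse_dates, etc.
    return pd.read_csv(file)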
I then use this code to convert:
for file in files:
    print(file + " Num: " + str(file_num) + " of: " + str(len(files)))
    file_num = file_num + 1
    in_pd = read_file(file)
    in_pd.to_hdf('AllFlightLogs.h5', 'flights', mode='a', append=True)
And, it works just fine for about 202 files, and then python crashes with: Abort trap: 6
I don't know what this error means. I have also seen it pop up a window showing a stack error.
I have tried using complib='lzo' and that doesn't seem to make any difference. I have tried saving to a different hdf5 file every 100 reads, and that does change the exact number of files before the crash. But, it still happens.
There doesn't seem to be anything special about that particular file. Is there any way to find out anything else about this particular error? I know that the crash happens when I try to call in_pd.to_hdf() (I added print statements before and after).
I am running on a Mac and using pandas 0.16.2.
I upgraded PyTables to 3.2.1 and that seems to have fixed it. So it was not a problem with my code (which was driving me crazy), but a PyTables problem.
Adam's answer solved my problem on my iMac.
But as of 1 Sep 2015, while PyTables is available for Linux and OS X, it is still not available for Windows - I use the Anaconda distribution (very good in every other respect). Does anybody know why? Is there a specific reason for that?

how to have one file work on objects generated by another in spyder

I'm sure someone has come across this before, but it was hard thinking of how to search for it.
Suppose I have a file generate_data.py and another plot_utils.py which contains a function for plotting this data.
Of note, generate_data.py takes a long time to run and I would like to only have to run it once. However, I haven't finished working out the kinks in plot_utils.py, so I have to run this a bunch of times.
It seems in Spyder that when I run generate_data (be it in the current console or in a new dedicated Python interpreter), it doesn't let me modify plot_utils.py and then call "from plot_utils import plotter" on the command line. I mean, it doesn't raise an error, but it's clear the changes haven't been picked up.
I guess I kind of want cell mode between different .py files.
EDIT: After being forced to formulate exactly what I want, I think I got around this by putting "from plot_utils import plotter" \n "plotter(foo)" inside a cell in generate_data.py. I am now wondering if there is a more elegant solution.
SECOND EDIT: actually the method mentioned above in the edit does not work as I said it did. Still looking for a method.
You need to reload the module (reload operates on modules, not on names imported from them) and then re-import the function:
# Python 2.7
import plot_utils
plot_utils = reload(plot_utils)
from plot_utils import plotter
or
# Python 3.x
from importlib import reload
import plot_utils
plot_utils = reload(plot_utils)
from plot_utils import plotter
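As a side note (not part of the original answer), the IPython console that Spyder provides also has an autoreload extension that re-imports changed modules automatically:
# Run once in the IPython/Spyder console:
%load_ext autoreload
%autoreload 2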
