I have code spread across several cells in a Jupyter notebook. The first cell contains the names of the files I need to process. I am having difficulty running a loop in the notebook, because I want to process the files one at a time: first take file1 through all the cells, then come back for file2, and so on.
I know a similar question has been asked (Link1), but I am not sure how this can be done in Jupyter. In Spyder we can indent everything inside a for loop and it runs all the tasks for one file before moving on to the next, but in Jupyter this seems difficult because execution is cell by cell.
Cell 1:
    file_names = ['file1', 'file2', 'file3']
Cell 2:
    a = []
Cell 3:
    for file in file_names:
        a.append(file)
Your distribution of code to the cells is fine (i.e. it is one possibility out of many).
You define variables in cells 1 and 2 and execute the loop in cell 3. Your file processing takes place only in cell 3.
Of course, every cell must contain valid Python syntax (as must a script opened in Spyder). This means the body of the for loop must be correctly indented.
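One way to get the file-by-file behaviour asked about is to put the per-file steps into a function defined in one cell and call it from the loop in a later cell. This is only a minimal sketch: `process_file` and the two steps inside it are hypothetical placeholders for whatever the intermediate cells actually do.

```python
# Cell 1: the files to process
file_names = ['file1', 'file2', 'file3']

# Cell 2: everything that should happen to ONE file lives in a function.
# process_file is a hypothetical stand-in for the real per-file steps.
def process_file(name):
    step1 = name.upper()       # placeholder for the first processing step
    step2 = step1 + '_done'    # placeholder for the second processing step
    return step2

# Cell 3: the loop runs every step for file1, then for file2, and so on
results = []
for file in file_names:
    results.append(process_file(file))

print(results)
```

Because the loop body is a single function call, the notebook can still be split into cells for readability while each file goes through all steps before the next one starts.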
TLDR: How can I make a notebook cell save its own python code to a file so that I can reference it later?
I'm doing tons of small experiments where I make adjustments to Python code to change its behaviour, and then run various algorithms to produce results for my research. I want to save the cell code (the actual python code, not the output) into a new uniquely named file every time I run it so that I can easily keep track of which experiments I have already conducted. I found lots of answers on saving the output of a cell, but this is not what I need. Any ideas how to make a notebook cell save its own code to a file in Google Colab?
For example, I'm looking to save a file that contains the entire below snippet in text:
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
The source code of every executed cell is stored in the list variable In.
For example, you can print the latest cell with
print(In[-1]) # show itself
# print(In[-1]) # show itself
So you can easily save the content of In[-1] or In[-2] to wherever you want.
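As a rough sketch of that idea, the helper below writes a string of cell source to a new, uniquely named file. The function name `save_cell_source`, the `experiments` folder, and the file naming scheme are all assumptions made for illustration, not part of IPython or Colab.

```python
import time
import uuid
from pathlib import Path

def save_cell_source(source, out_dir='experiments'):
    """Write `source` to a new, uniquely named .py file and return its path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # A timestamp plus a short random suffix keeps names unique
    # even when two cells run within the same second.
    stamp = time.strftime('%Y%m%d-%H%M%S')
    path = folder / f'cell_{stamp}_{uuid.uuid4().hex[:8]}.py'
    path.write_text(source, encoding='utf-8')
    return path

# In a notebook, put this as the last line of the cell you want to record:
# save_cell_source(In[-1])
```

The saved text will include the `save_cell_source(In[-1])` line itself, since `In[-1]` holds the whole cell.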
Posting one potential solution but still looking for a better and cleaner option.
By defining the entire cell as a string, I can execute it and save to file with a separate command:
cell_str = '''
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
'''
exec(cell_str)
with open('cell.txt', 'w') as f:
    f.write(cell_str)
Say I have the same piece of code inside 10 separate .ipynb files with different names, and the code is as follows:
x = 1+1
Pretty simple stuff, but I want to change the variable x to y. Is there any way to use Python to loop through each .ipynb file and do some sort of find and replace, so that anywhere it sees x it replaces it with y? Or will I have to open each file in Jupyter Notebook and make the change manually?
I never tried this before, but the .ipynb files are simply JSONs. These pretty much function like nested dictionaries. Each cell is contained within the key 'cells', and then the 'cell_type' tells you if the cell is code. You then access the contents of the code cell (the code part) with the 'source' key.
In a notebook I am writing I can look for a particular piece of code like this:
import json
with open('UW_Demographics.ipynb') as f:
    ff = json.load(f)
for cell in ff['cells']:
    if cell['cell_type'] == 'code':
        for elem in cell['source']:
            if "pd.read_csv('UWdemographics.csv')" in elem:
                print("OK")
You can iterate over your ipynb files, identify the code you want to change using the above, change it and save using json.dump in the normal way.
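A minimal sketch of that loop is below. The function name `replace_in_notebooks` and its parameters are made up for illustration, and it assumes the notebooks are plain JSON files sitting in one folder. It uses a plain substring replacement, so pass a string specific enough to match only the code you mean (for example `'x = 1+1'` rather than `'x'`).

```python
import json
from pathlib import Path

def replace_in_notebooks(folder, old, new):
    """Replace `old` with `new` in the code cells of every .ipynb in `folder`.

    Returns the names of the notebooks that were rewritten.
    """
    changed = []
    for path in sorted(Path(folder).glob('*.ipynb')):
        with open(path, encoding='utf-8') as f:
            nb = json.load(f)
        modified = False
        for cell in nb.get('cells', []):
            if cell.get('cell_type') != 'code':
                continue  # leave markdown and raw cells alone
            source = cell['source']
            # 'source' may be one string or a list of line strings
            if isinstance(source, str):
                updated = source.replace(old, new)
            else:
                updated = [line.replace(old, new) for line in source]
            if updated != source:
                cell['source'] = updated
                modified = True
        if modified:
            with open(path, 'w', encoding='utf-8') as f:
                json.dump(nb, f, indent=1, ensure_ascii=False)
            changed.append(path.name)
    return changed

# Example call (hypothetical folder name):
# replace_in_notebooks('my_notebooks', 'x = 1+1', 'y = 1+1')
```

It is worth running this on copies of the notebooks first, since the files are overwritten in place.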
I have a confusing issue in some code I'm currently running in a Jupyter notebook. I am reading in a couple thousand text files, each of them tens of MB, with Pandas, and searching for certain values within them. To track the progress while this happens, I threw in some print statements every once in a while. The important pieces of the code here are:
import os
read_dir = '/Volumes/Data/'
all_text_files = os.listdir(read_dir)
all_text_files.sort()
# Split the file up so we at least get some output if this doesn't finish running
file_chunks = [all_text_files[i:i+100] for i in range(0,len(all_text_files),100)]
for chunk_num, current_chunk in enumerate(file_chunks):
    print('Chunk', chunk_num)
    for file_num, file in enumerate(current_chunk):
        if file_num % 10 == 0:
            print('File', file_num, '-', file)
        # Read in the data down here
Normally, I see a bunch of print statements like
File 10 - some_file.txt
However, I recently saw one of these print statements stall midway. That is, the Jupyter output sat at File 10 - for a couple of minutes, and then eventually it completed to the above line, and the code continued to run.
Any idea how it got stuck in the middle of a print statement like that? Is that stalling likely within the Python print function, or is it in displaying that output on the Jupyter notebook? And finally, is there any information to take away about this - has something in my own code potentially caused this?
I have data in an excel spreadsheet (*.xlsx) that consists of 1,213 rows of sensitive information (so, I'm sorry I can't share the data) and 35 columns. Every entry is a string (I don't know if that is screwing it up or not). The first row is the column names and I've never had a problem importing it with the column names embedded before (it's just easier to click that they're embedded so I don't have to name every column by hand). I put the path to the data in the quick start wizard and hit the next button and it doesn't do anything. I hit it again and it turns the mouse into the loader as if it's loading. I've waited for it for 15 minutes before, but every time I click on QlikView the program just crashes.
I have a deadline I have to meet here and I can't afford to not finish this project. It's extremely important that I get it working.
Just as a NB, I used Python to merge two Excel spreadsheets together so I don't know if that may be what's causing the problem either. I can open the file perfectly fine in Excel though.
I was using the walkthrough when creating a new file when I should have just made a script. First click Edit Script in the menu bar, then click Table Files.... Choose your file, then make sure that the Labels section has Embedded Labels selected in the dropdown. It will create a query like the following:
LOAD [Resource name],
[Employee ID],
Vertical,
[Contract Type *],
Notes
FROM
[D:\path\to\file\*.xlsx]
(ooxml, embedded labels, table is Sheet1);
That's one part of the solution, but then I ran into a new problem. It said that it was able to fetch all of the rows, but when I started making the charts and graphs it was only showing 6 data points. I recreated the file, did exactly what I did above but also added the transformation step. Now the problem is solved.
I have had QlikView crash when importing an Excel spreadsheet that was exported with the SQuirreL SQL client (from a Firebird database). Opening the spreadsheet in Excel, and saving it again solved the problem.
I know that this is no longer relevant to your problem, but hopefully it can help someone with a similarly appearing issue.
I see that you do not include a 'header is 0 lines' clause, which could be causing an issue.
Below is a snippet from my standard Excel file import (just the FROM section). My setup is done through variables, but they follow this form:
Set vTableW = 'WIP Metrics' ;
Set vPathData = '..\Raw Data Reports\' ;
Set vFile08 = 'Misc Transactions VCB*.xlsx' ;
Set vHeader08 = 2 ;
Set vSheet08 = 'Misc Trans' ;
Set vWhere08 = ( Len(Trim([Date Received])) > 0
And Len(Trim([Lot Number])) > 0
And Len(Trim([Y/N])) > 0
And Len(Trim([Initials])) > 0 ) ;
'$(vTableW)':
Load
AutoNumber(RowNo(), 1) As [_Load WIPWO ID],
...
additional columns
...
If(IsNull([Comments]), '', Trim([Comments])) As [VCB Comments]
From
'$(vPathData)$(vFile08)'
(ooxml, embedded labels, header is $(#vHeader08) lines, table is '$(vSheet08)')
Where($(vWhere08)) ;
Regarding the point made above relating to syn keys, add the line
Exit Script ;
before your import to check that the load is OK up to that point. Then move it to immediately after the Excel load and repeat. Use the debug facility in the Load Script process.
Hope this helps!
In recent versions of MATLAB, one can execute a code region between two lines starting with %% using Ctrl-Enter. Such a region is called a code cell, and it allows for fast code testing and debugging.
E.g.
%% This is the beginning of the 1st cell
a = 5;
%% This is the end of the 1st cell and beginning of the 2nd cell
% This is just a comment
b = 6;
%% This is the end of the 2nd cell
Are there any python editors that support a similar feature?
EDIT: I just found that Spyderlib supports "block" execution (code regions separated with blank lines) with F9, but as this thread mentions, this feature is still not very robust (in particular in combination with loops).
The Interactive Editor for Python IEP has a Matlab-style cell notation to mark code sections (by starting a line with '##'), and the shortcut by default is also Ctrl+Enter:
## Cell one
"""
A cell is everything between two commands starting with '##'
"""
a = 3
b = 4
print('The answer is ' + str(a+b))
## Cell two
print('Hello World')
Spyder3 defines a cell as all code between lines starting with #%%.
Run a cell with Ctrl+Enter, or run a cell and advance with Shift+Enter.
Spyder3 & PyCharm: #%% or # %%
Spyder3: Ctrl+Enter: to run current cell, Shift+Enter: to run current cell and advance.
PyCharm: Ctrl+Enter: to run and advance
# %%
print('You are in cell 1')
# %%
print('You are in cell 2')
# %%
print('You are in cell 3')
I have written a vim plugin in which cells are delimited by ## . It sends cells to an ipython interpreter running in tmux. You can define key mappings to execute the current cell, execute current cell and move to next or execute the current line :
https://github.com/julienr/vim-cellmode
I recently started working on a similar plugin for Intellij PyCharm. It can send the cell to either the internal python console (which has some issues with plots) or to an ipython interpreter running in tmux :
https://github.com/julienr/pycharm-cellmode
PyScripter supports block execution, but it is Windows only and limited to selecting a code block and running it (Ctrl+F7). It has no notion of cells.
IDLE with IdleX has support for Matlab-like and Sage-like cells using SubCodes. Code in between '##' markers can be executed with Ctrl+Return. It also allows for indented markers so that indented code can be executed.
Sage offers something like this. It is meant to be a Python alternative to MATLAB; you should take a look.
In a Sage notebook, you write Python commands within blocks that are pretty similar to MATLAB's cells.