How to save python notebook cell code to file in Colab - python

TLDR: How can I make a notebook cell save its own python code to a file so that I can reference it later?
I'm doing tons of small experiments where I make adjustments to Python code to change its behaviour, and then run various algorithms to produce results for my research. I want to save the cell code (the actual python code, not the output) into a new uniquely named file every time I run it so that I can easily keep track of which experiments I have already conducted. I found lots of answers on saving the output of a cell, but this is not what I need. Any ideas how to make a notebook cell save its own code to a file in Google Colab?
For example, I'm looking to save a file that contains the entire below snippet in text:
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)

All cell codes are stored in a List variable In.
For example you can print the lastest cell by
print(In[-1]) # show itself
# print(In[-1]) # show itself
So you can easily save the content of In[-1] or In[-2] to wherever you want.

Posting one potential solution but still looking for a better and cleaner option.
By defining the entire cell as a string, I can execute it and save to file with a separate command:
cell_str = '''
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
'''
exec(cell_str)
with open('cell.txt', 'w') as f:
f.write(cell_str)

Related

How do I keep a variable open and not refill it when running a Python script?

I want to ask you how I can keep a variable open and not refill it when I execute the script. As an example, I read the file and assigned all of its lines to a variable. Then, I created some processes to interact with data executed from a file. I realized I needed to change something in my process after running the file, so I changed a few lines and ran the script again. The file is large, and I need to wait for it to upload, so I considered how I could keep the variable that refers to this file open at all times and easily make changes to my script without having to wait so long for it to upload.
import numpy as np
from tqdm import tqdm
from scipy import spatial
# This is the variable that I want to keep always open
embeddings_dict = {}
# This is the current file
filename = "/some_filename"
with open(filename, 'r', encoding="utf-8") as f:
lines = f.readlines()
for i in tqdm(range(len(lines))):
values = lines[i].split()
word = values[0]
vector = np.asarray(values[1:], "float32")
embeddings_dict[word] = vector
# This is the process
def find_closest_embeddings_euc(embedding):
return sorted(embeddings_dict.keys(),
key=lambda word: spatial.distance.euclidean(embeddings_dict[word], embedding))
print(find_closest_embeddings_euc(embeddings_dict['software'])[:10])
I expect to understand how can I make it.
You can't really persist memory in RAM once a process finishes. What you're describing is a classic workflow in the ML community (having to load in some huge dataset in memory and then apply and tweak a series of transformations to it) and a notebook environment is usually the answer.
You can check out how to setup your environment at either of these links:
https://docs.jupyter.org/en/latest/install/notebook-classic.html
https://code.visualstudio.com/docs/datascience/jupyter-notebooks (I recommend this one if you are already using VS Code)
Once you create your first notebook, you can add two cells to it - one for the data loading and another for the transformations. Now you can execute them independently - you can load your data once and apply the transformations and experiment with them as many times as you'd like.

How to get LibreOffice headless Calc calculate to save new values from uno?

I am trying to open an excel file from python, get it to recalculate and then save it with the newly calculated values.
The spreadsheet is large and opens fine in LibreOffice with GUI, and initially shows old values. If I then do a Data->Calculate->Recalculate Hard I see the correct values, and I can of course saveas and all seems fine.
But, there are multiple large spreadsheets I want to do it from so I don't want to use a GUI instead I want to use Python. The following all seems to work to create a new spreadsheet but it doesn't have the new values (unless I again manually do a recalculate hard)
I'm running on Linux. First I do this:
soffice --headless --nologo --nofirststartwizard --accept="socket,host=0.0.0.0,port=8100,tcpNoDelay=1;urp"
Then, here is sample python code:
import uno
local = uno.getComponentContext()
resolver = local.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", local)
context = resolver.resolve("uno:socket,host=localhost,port=8100;urp;StarOffice.ServiceManager")
remoteContext = context.getPropertyValue("DefaultContext")
desktop = context.createInstanceWithContext("com.sun.star.frame.Desktop", remoteContext)
document = desktop.getCurrentComponent()
file_url="file://foo.xlsx"
document = desktop.loadComponentFromURL(file_url, "_blank", 0, ())
controller=document.getCurrentController()
sheet=document.getSheets().getByIndex(0)
controller.setActiveSheet(sheet)
document.calculateAll()
file__out_url="file://foo_out.xlsx"
from com.sun.star.beans import PropertyValue
pv_filtername = PropertyValue()
pv_filtername.Name = "FilterName"
pv_filtername.Value = "Calc MS Excel 2007 XML"
document.storeAsURL(file__out_url, (pv_filtername,))
document.dispose()
After running the above code, and opening foo_out.xlsx it shows the "old" values, not the recalculated values. I know that the calculateAll() is taking a little while, as I would expect for it to do the recalculation. But, the new values don't seem to actually get saved.
If I open it in Excel it does an auto-recalculate and shows the correct values and if I open in LibreOffice and do Recalculate Hard it shows the correct values. But, what I need is to save it, from python like above, so that it already contains the recalculated values.
Is there any way to do that?
Essentially, what I want to do from python is:
open, recalculate hard, saveas
It seems that this was a problem with an older version of LibreOffice. I was using 5.0.6.2, on Linux, and even though I was recalculating, the new values were not even showing up when I extracted the cell values directly.
However, I upgraded to 6.2 and the problem has gone away, using the same code and the same input files.
I decided to just answer my own question, instead of deleting it, as this was leading to a frustration until I solved it.

Looping through ipynb files in python

So if I have the same piece of code inside of 10 separate .ipynb files with different names and lets say that the code is as follows.
x = 1+1
so pretty simple stuff, but I want to change the variable x to y. Is their anyway using python to loop through each .ipynb file and do some sort of find and replace anywhere it sees x to change it or replace it with y? Or will I have to open each file up in Jupiter notebook and make the change manually?
I never tried this before, but the .ipynb files are simply JSONs. These pretty much function like nested dictionaries. Each cell is contained within the key 'cells', and then the 'cell_type' tells you if the cell is code. You then access the contents of the code cell (the code part) with the 'source' key.
In a notebook I am writing I can look for a particular piece of code like this:
import json
with open('UW_Demographics.ipynb') as f:
ff = json.load(f)
for cell in ff['cells']:
if cell['cell_type'] == 'code':
for elem in cell['source']:
if "pd.read_csv('UWdemographics.csv')" in elem:
print("OK")
You can iterate over your ipynb files, identify the code you want to change using the above, change it and save using json.dump in the normal way.

Cannot get the output of pstats

I'm trying to use cProfile from: https://docs.python.org/2/library/profile.html#module-cProfile
I can get the data to print but I want to be able to manipulate the data and sort so that I get just the info I want. To get the data to print I use:
b = cProfile.run("function_name")
But after that runs and prints, b = None and I cannot figure out where the data is that it printed so that I can manipulate the data. Of course, I can see the data, but in order to analyze the data I need to able to get some sort of output into my IED editor. I've tried pstats but I get error messages. It seems that to use pstats I have to save some sort of file but I cannot figure out how to run the program and save it to a file.
UPDATE:
I almost have a solution
cProfile.run('re.compile("foo|bar")', 'restats')
There is a second argument where you can save a file as 'restats'. Now I should be able to open it and read it.
SOLVED:
cProfile.run("get_result()", 'data_stats')
p = pstats.Stats('data_stats')
p.strip_dirs().sort_stats(-1).print_stats()
p.sort_stats('name')
cProfile.run("get_result()", 'data_stats')
p = pstats.Stats('data_stats')
p.strip_dirs().sort_stats(-1).print_stats()
p.sort_stats('name')
In addition to the first argument which runs the code, the second argument actually saves the output to a file. The next line will then open the file. Once that file is open you should be able to see the values of p in your IED editor and be able to use normal python operations to manipulate it.

Constant first row of a .csv file?

I have a Python code which is logging some data into a .csv file.
logging_file = 'test.csv'
dt = datetime.datetime.now()
f = open(logging_file, 'a')
f.write('\n "{:%H:%M:%S}",{},{}'.format(dt,x,y,))
The above code is the core part and this produces continuous data in .csv file as
"00:34:09" ,23.05,23.05
"00:36:09" ,24.05,24.05
"00:38:09" ,26.05,26.05
... etc.,
Now I wish to add the following lines in first row of this data. time, data1,data2.I expect output as
time, data1, data2
"00:34:09" ,23.05,23.05
"00:36:09" ,24.05,24.05
"00:38:09" ,26.05,26.05
... etc.,
I tried many ways. Those ways not produced me the result as preferred format.But I am unable to get my expected result.
Please help me to solve the problem.
I would recommend writing a class specifically for creating and managing logs.Have it initialize a file, on creation, with the expected first line (don't forget a \n character!), and keep track of any necessary information about that log(the name of the log it created, where it is, etc). You can then have the class 'write' to the log (append the log, really), you can create new logs as necessary, and, you can have it check for existing logs, and make decisions about either updating what is existing, or scrapping it and starting over.

Categories

Resources