How to keep variable in memory - PyCharm

How to keep variable in memory - PyCharm - python

I'm running code quite frequently when working in PyCharm. Problem is that the entire code manipulates data temporarily stored in excel (we'll be moving this to the database once the program is up and running). Loading data takes time.
Is there a way in PyCharm to keep variable in the initial memory (without running a piece of code in the console) even after the program finished running?
data = pd.read_excel(path, index_col=0)
I want to avoid reloading data every time I am running program.

No, this feature has not been implemented yet and there is no way to do this.

If working in PyCharm is not a neccesity, you could work in a jupyter notebook: https://jupyter.org/
You could load your data in a cell and work with it in the next cells. Once executed, the result of a cell is kept in memory.

I've found a dirty trick - I am aware this is a very, very non-pythonic and not appropriate way of doing this. But it does the trick for me in this example. Again this code is only temporarily used for testing and will be removed once I'm happy with the code.
Module I run is as follow:
data = pd.read_excel(path, index_col=0) #Data is loaded only once
while True:
reload(TestModule)
TestModule.test_function(data)
input("Press Enter to rerun the test")
Now in TestModule I have test_function where I can reload ModyfiedModule I am working on and any function I want to test.
TestModule:
def test_function(data):
from ModyfiedModule import MyClass
#Run bunch of tests from MyClass
#Code to test MyClass is here
In this case, I load data only once and I can modify MyClass module and perform various test defined in TestModule without need to reload data each time.
The only thing I need to do after modifying code is to save MyClass and TestModule and press Enter in the console to rerun the test.

Related

Use static object in LibreOffce python script?

I've got a LibreOffice python script that uses serial IO. On my systems, opening a serial port is a very slow process (around 1 second), so I'd like to keep the serial port open, and just send stuff as required.
But LibreOffice python apparently reloads the python framework every time a call is made. Unlike most python implementations, where the process is persistent, and un-enclosed code in a module is run once, when the module is imported.
Is there a way in LibreOffice python to persist objects between calls?
SerialObject=None
def return_global():
return str(SerialObject) #always returns "None"
def init_serial_object():
SerialObject=True

It looks like there is a simple bug. Add global and then it works.
However, it may be that the real problem is your setup. Here is a working example. Put the code in a file called inctest.py under $LIBREOFFICE_USER_DIR/Scripts/python/pythonpath/incmod/.
def init_serial_object():
global SerialObject
SerialObject=True
def moduleVersion():
return "2.0" #change to verify that this is the most recently updated code
The code to call it should be located in the user profile directory (that is, location=user).
from incmod import inctest
inctest.init_serial_object()
msgbox("{}: {}".format(
inctest.moduleVersion(), inctest.return_global()))
Run the script by going to Tools > Macros > Run Macro and find it under My Macros.
Be aware that inctest.py will not always get reloaded. There are two ways to reload it: restart LibreOffice, or force python to reload it by doing del sys.modules[mod].
The moduleVersion() function is not necessary but helps you see when the module is getting reloaded — make changes to that line and then see whether the output changes.

pycharm: pass console variables to script

I'm running a script which loads some huge amount of data using pickles.
For this big amount of data, running the script takes a lot of time which in turn makes it very hard to work with (especially to debug).
For solving the problem above I thought about passing some of the variables defined in the console to the script. This will allow me to load the pickles only one time and just pass it to the script every time I want to use their data.
I tried to find a way to do this but couldn't find any.
Is there any way of doing passing console variables to a script?

Never mind. I can just create a function in the script and then call it from the console instead of having __main__.
For example, for a script A.py, add a function b(params) and then in the console just run
from <pathToA>.A import *
b(params)

Reproducible test framework

I am looking for a test or integration framework that supports long, costly tests for correctness. The tests should only be rerun if the code affecting the test has changed.
Ideally the test framework would
find the code of the test
produce a hash of it,
run the code and write to an output file with the hash as the name
or skip if that already exists.
provide a simple overview what tests succeeded and which failed.
It would be OK if the test has to specify the modules and files it depends on.
Python would be ideal, but this problem may be high-level enough that other languages would work too.
Perhaps there exists already a test or build integration framework I can adapt to fit this behaviour?

Basically you need to track what is the test doing so you can check whether it has changed.
Python code can be traced with sys.settrace(tracefunc). There is a module trace that can help with it.
But if it is not just Python code - if the tests execute other programs, test input files etc. and you need to watch it for changes too, then you would need tracing on operating system level, like strace, dtrace, dtruss.
I've created a small demo/prototype of simple testing framework that runs only tests that changed from last run: https://gist.github.com/messa/3825eba3ad3975840400 It uses the trace module. It works this way:
collect tests, each test is identified by name
load test fingerprints from JSON file (if present)
for each test:
if the fingerprint matches the current bytecode of functions listed in the fingerprint, the test is skipped
run test otherwise
trace it while running, record all functions being called
create test fingerprint with function names and bytecode MD5 hashes of each recorded function
save updated test fingerprints to a JSON file
But there is one problem: it's slow. Running code while tracing it with trace.Trace is about 40x slower than without tracing. So maybe you will be just better running all tests without tracing :) But if the tracer would be implemented in C like for example it is in the coverage module it should be faster. (Python trace module is not in C.)
Maybe some other tricks could help with speed. Maybe you are interested just in some top-level function whether they changed or not, so you don't need to trace all function calls.
Have you considered other ways how to speed up expensive tests? Like paralellization, ramdisk (tmpfs)... For example, if you test against a database, don't use the "system" or development one, but run a special instance of the database with lightweight configuration (no prealloc, no journal...) from tmpfs. If it is possible, of course - some tests need to be run on configuration similar to the production.
Some test frameworks (or their plugins) can run only the tests that failed last time - that's different, but kind of similar functinality.

This may not be the most efficient way to do this, but this can be done with Python's pickle module.
import pickle
At the end of your file, have it save itself as a pickle.
myfile = open('myfile.py', 'r') #Your script
savefile = open('savefile.pkl', 'w') #File the script will be saved to
#Any file extension can be used but I like .pkl for "pickle"
mytext = myfile.readlines()
pickle.dump(mytext, savefile) #Saves list from readlines() as a pickle
myfile.close()
savefile.close()
And then at the beginning of your script (after you have pickled it once already), add the code bit that checks it against the pickle.
myfile = ('myfile.py', 'r')
savefile = ('savefile.pkl', 'r')
mytext = myfile.readlines
savetext = pickle.load(savefile)
myfile.close()
savefile.close()
if mytext == savetext:
#Do whatever you want it to do
else:
#more code
That should work. It's a little long, but it's pure python and should do what you're looking for.

how to have one file work on objects generated by another in spyder

I'm sure someone has come across this before, but it was hard thinking of how to search for it.
Suppose I have a file generate_data.py and another plot_utils.py which contains a function for plotting this data.
Of note, generate_data.py takes a long time to run and I would like to only have to run it once. However, I haven't finished working out the kinks in plot_utils.py, so I have to run this a bunch of times.
It seems in spyder that when I run generate_data (be it in the current console or in a new dedicated python interpreter) that it doesn't allow me to modify plot_utils.py and call "from plot_utils import plotter" in the command line. -- I mean it doesn't have an error, but it's clear the changes haven't been made.
I guess I kind of want cell mode between different .py files.
EDIT: After being forced to formulate exactly what I want, I think I got around this by putting "from plot_utils import plotter" \n "plotter(foo)" inside a cell in generate_data.py. I am now wondering if there is a more elegant solution.
SECOND EDIT: actually the method mentioned above in the edit does not work as I said it did. Still looking for a method.

You need to reload it:
# Python 2.7
plotter = reload(plotter)
or
# Python 3.x
from imp import reload
plotter = reload(plotter)

Keep persistent variables in memory between runs of Python script

Is there any way of keeping a result variable in memory so I don't have to recalculate it each time I run the beginning of my script?
I am doing a long (5-10 sec) series of the exact operations on a data set (which I am reading from disk) every time I run my script.
This wouldn't be too much of a problem since I'm pretty good at using the interactive editor to debug my code in between runs; however sometimes the interactive capabilities just don't cut it.
I know I could write my results to a file on disk, but I'd like to avoid doing so if at all possible. This should be a solution which generates a variable the first time I run the script, and keeps it in memory until the shell itself is closed or until I explicitly tell it to fizzle out. Something like this:
# Check if variable already created this session
in_mem = var_in_memory() # Returns pointer to var, or False if not in memory yet
if not in_mem:
# Read data set from disk
with open('mydata', 'r') as in_handle:
mytext = in_handle.read()
# Extract relevant results from data set
mydata = parse_data(mytext)
result = initial_operations(mydata)
in_mem = store_persistent(result)
I've an inkling that the shelve module might be what I'm looking for here, but looks like in order to open a shelve variable I would have to specify a file name for the persistent object, and so I'm not sure if it's quite what I'm looking for.
Any tips on getting shelve to do what I want it to do? Any alternative ideas?

You can achieve something like this using the reload global function to re-execute your main script's code. You will need to write a wrapper script that imports your main script, asks it for the variable it wants to cache, caches a copy of that within the wrapper script's module scope, and then when you want (when you hit ENTER on stdin or whatever), it calls reload(yourscriptmodule) but this time passes it the cached object such that yourscript can bypass the expensive computation. Here's a quick example.
wrapper.py
import sys
import mainscript
part1Cache = None
if __name__ == "__main__":
while True:
if not part1Cache:
part1Cache = mainscript.part1()
mainscript.part2(part1Cache)
print "Press enter to re-run the script, CTRL-C to exit"
sys.stdin.readline()
reload(mainscript)
mainscript.py
def part1():
print "part1 expensive computation running"
return "This was expensive to compute"
def part2(value):
print "part2 running with %s" % value
While wrapper.py is running, you can edit mainscript.py, add new code to the part2 function and be able to run your new code against the pre-computed part1Cache.

To keep data in memory, the process must keep running. Memory belongs to the process running the script, NOT to the shell. The shell cannot hold memory for you.
So if you want to change your code and keep your process running, you'll have to reload the modules when they're changed. If any of the data in memory is an instance of a class that changes, you'll have to find a way to convert it to an instance of the new class. It's a bit of a mess. Not many languages were ever any good at this kind of hot patching (Common Lisp comes to mind), and there are a lot of chances for things to go wrong.

If you only want to persist one object (or object graph) for future sessions, the shelve module probably is overkill. Just pickle the object you care about. Do the work and save the pickle if you have no pickle-file, or load the pickle-file if you have one.
import os
import cPickle as pickle
pickle_filepath = "/path/to/picklefile.pickle"
if not os.path.exists(pickle_filepath):
# Read data set from disk
with open('mydata', 'r') as in_handle:
mytext = in_handle.read()
# Extract relevant results from data set
mydata = parse_data(mytext)
result = initial_operations(mydata)
with open(pickle_filepath, 'w') as pickle_handle:
pickle.dump(result, pickle_handle)
else:
with open(pickle_filepath) as pickle_handle:
result = pickle.load(pickle_handle)

Python's shelve is a persistence solution for pickled (serialized) objects and is file-based. The advantage is that it stores Python objects directly, meaning the API is pretty simple.
If you really want to avoid the disk, the technology you are looking for is a "in-memory database." Several alternatives exist, see this SO question: in-memory database in Python.

Weirdly, none of the earlier answers here mention simple text files. The OP says they don't like the idea, but as this is becoming a canonical for duplicates which might not have that constraint, this alternative deserves a mention. If all you need is for some text to survive between invocations of your script, save it in a regular text file.
def main():
# Before start, read data from previous run
try:
with open('mydata.txt', encoding='utf-8') as statefile:
data = statefile.read().rstrip('\n')
except FileNotFound:
data = "some default, or maybe nothing"
updated_data = your_real_main(data)
# When done, save new data for next run
with open('mydata.txt', 'w', encoding='utf-8') as statefile:
statefile.write(updated_data + '\n')
This easily extends to more complex data structures, though then you'll probably need to use a standard structured format like JSON or YAML (for serializing data with tree-like structures into text) or CSV (for a matrix of columns and rows containing text and/or numbers).
Ultimately, shelve and pickle are just glorified generalized versions of the same idea; but if your needs are modest, the benefits of a simple textual format which you can inspect and update in a regular text editor, and read and manipulate with ubiquitous standard tools, and easily copy and share between different Python versions and even other programming languages as well as version control systems etc, are quite compelling.
As an aside, character encoding issues are a complication which you need to plan for; but in this day and age, just use UTF-8 for all your text files.
Another caveat is that beginners are often confused about where to save the file. A common convention is to save it in the invoking user's home directory, though that obviously means multiple users cannot share this data. Another is to save it in a shared location, but this then requires an administrator to separately grant write access to this location (except I guess on Windows; but that then comes with its own tectonic plate of other problems).
The main drawback is that text is brittle if you need multiple processes to update the file in rapid succession, and slow to handle if you have lots of data and need to update parts of it frequently. For these use cases, maybe look at a database (probably start with SQLite which is robust and nimble, and included in the Python standard library; scale up to Postgres or etc if you have entrerprise-grade needs).
And, of course, if you need to store native Python structures, shelve and pickle are still there.

This is a os dependent solution...
$mkfifo inpipe
#/usr/bin/python3
#firstprocess.py
complicated_calculation()
while True:
with open('inpipe') as f:
try:
print( exec (f.read()))
except Exception as e: print(e)
$./first_process.py &
$cat second_process.py > inpipe
This will allow you to change and redefine variables in the first process without copying or recalculating anything. It should be the most efficient solution compared to multiprocessing, memcached, pickle, shelve modules or databases.
This is really nice if you want to edit and redefine second_process.py iteratively in your editor or IDE until you have it right without having to wait for the first process (e.g. initializing a large dict, etc.) to execute each time you make a change.

You can do this but you must use a Python shell. In other words, the shell that you use to start Python scripts must be a Python process. Then, any global variables or classes will live until you close the shell.
Look at the cmd module which makes it easy to write a shell program. You can even arrange so that any commmands that are not implemented in your shell get passed to the system shell for execution (without closing your shell). Then you would have to implement some kind of command, prun for instance, that runs a Python script by using the runpy module.
http://docs.python.org/library/runpy.html
You would need to use the init_globals parameter to pass your special data to the program's namespace, ideally a dict or a single class instance.

You could run a persistent script on the server through the os which loads/calcs, and even periodically reloads/recalcs the sql data into memory structures of some sort and then acess the in-memory data from your other script through a socket.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to keep variable in memory - PyCharm - python

No, this feature has not been implemented yet and there is no way to do this.

If working in PyCharm is not a neccesity, you could work in a jupyter notebook: https://jupyter.org/ You could load your data in a cell and work with it in the next cells. Once executed, the result of a cell is kept in memory.

Related

Use static object in LibreOffce python script?

pycharm: pass console variables to script

Reproducible test framework

how to have one file work on objects generated by another in spyder

Keep persistent variables in memory between runs of Python script

Categories

Resources