I am doing some data analysis with Python, and the script reads a data file at the beginning. I am currently debugging it, and it is cumbersome to wait for the data file to be read each time. Is there any way to do something similar to a breakpoint, so that Python does not need to read the data each time? The run would just begin with the code below the reading step.
It sounds from your question like you have some lines at the beginning of a script which you do not want to process each time you run the script. That particular scenario is not really something that makes a lot of sense from a scripting point of view. Scripts are read from the top down unless you call a function or something. With that said, here is what I'm gathering you want your workflow to be like:
Do some time consuming data loading (once)
Try out code variations until one works
Be able to run the entire thing when you're done
If that's accurate, I suggest 3 options:
If you don't need the data that's loaded from step 1 in the specific code you're testing, just comment out the time consuming portion until you're done with the new code
If you do need the data, but not ALL of the data to test your new code, create a variable that looks like a small subset of the actual data returned, comment out the time consuming portion, then switch it back when complete. Something like this:
# data_result = time_consuming_file_parser()
data_result = [row1, row2, row3]
# new code using data_result
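For instance, a minimal runnable sketch of that swap (the analyze function and the row contents are made up for illustration):

```python
def analyze(data_result):
    # hypothetical "new code" under development
    return [len(row) for row in data_result]

# data_result = time_consuming_file_parser()  # slow: comment out while debugging
data_result = [["a", "b"], ["c"], ["d", "e", "f"]]  # tiny hand-made stand-in

print(analyze(data_result))  # [2, 1, 3]
```

When the new code works against the stand-in, swap the comments back and run the full load once.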
Finally, if you absolutely need the full data set but don't want to wait for it to load every time before you make changes, try looking into pdb, the Python debugger. This will let you put a breakpoint after your data load and then experiment in the Python shell until you are satisfied with your result.
import pdb

data_result = time_consuming_file_parser()  # the slow step runs once
pdb.set_trace()  # execution pauses here; inspect data_result interactively
I love PyCharm as it is very useful for my data analysis.
However, there is one big problem that I still can't figure out.
When I have a lot of variables saved, it is very useful. But sometimes, especially when I run a piece of code using seaborn to create a new graph, all my variables disappear and I have to reload them again from scratch.
Do you know a way to keep the data stored and run only a piece of my code without this problem?
Thank you
I am not really sure what issue you are having, but from the sound of it you just need to store a bunch of data and only use some of it.
If that's the case, I would save a file for each set of data and then import the one you want to use at that time.
If you stick to the same data names then your program can just do something like:
import data_435 as data
# now your code can always access data.whatever_var
# even though you have 435 different data sets
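A runnable sketch of that pattern (the file name data_435.py and the variable whatever_var are invented for illustration; the module is generated on the fly here only so the example is self-contained):

```python
import importlib
import os
import sys
import tempfile

# Stand-in for one saved data set: a small module holding the data as variables.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "data_435.py"), "w") as f:
    f.write("whatever_var = [1, 2, 3]\n")

sys.path.insert(0, module_dir)
data = importlib.import_module("data_435")  # same effect as: import data_435 as data

print(data.whatever_var)  # [1, 2, 3]
```

In practice you would simply keep data_435.py on disk and change only the import line to pick a different data set.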
I am trying to design a program in VBA (used within a simulation software package) which is able to call Python for some calculations and receive the result to proceed. The VBA-Python call will happen many times.
My first idea is based on communication through text files, e.g. something like this:
in VBA:

do something
write text file 'calc_todo.txt' in specific directory
while not exists 'calc_finished.txt':
    wait 1 second
read 'calc_finished.txt'
delete 'calc_finished.txt'
delete 'calc_todo.txt'
do something
write text file 'calc_todo.txt' in specific directory
... repeat
in Python:

do something
while not exists 'calc_todo.txt':
    wait 1 second
read 'calc_todo.txt'
do calculations based on 'calc_todo.txt'
write 'calc_finished.txt'
delete 'calc_todo.txt'
do something
while not exists 'calc_todo.txt':
    wait 1 second
... repeat
I have done something similar in the past and unfortunately there are a lot of things I do not like about it:
a fixed waiting time of e.g. 1 second might slow down performance
if something breaks, VBA and/or Python will get stuck in a while loop or run into an error
to fix the second issue, error handling with initialisation can be implemented, but last time it was a mess
What would be a more professional way on how to handle such communication?
I'm pretty new to Python. I am writing a script that loads some data from a file and generates another file. My script has several functions, and it also needs two user inputs (paths) to work.
Now I am wondering if there is a way to test each function individually. Since there are no classes, I don't think I can do it with unit tests, can I?
What is the common way to test a script if I don't want to run the whole script all the time? Someone else has to maintain the script later, so something similar to unit tests would be awesome.
Thanks for your inputs!
If you write your code in the form of functions that operate on file objects (streams) or, if the data is small enough, that accept and return strings, you can easily write tests that feed in the appropriate data and check the results. If the real data is large enough to need streams but the test data is not, use the io.StringIO class in the test code to adapt.
Then use the __name__ == "__main__" trick to allow your unit test driver to import the file without running the user-facing script.
I was able to write a program in Python to do my data analyses. The program runs well with a small MCVE dataset from beginning to end. But when I run it on my big dataset, all works well until somewhere the data structure becomes faulty and I get a TypeError. Since the program is big and creates several datasets on the fly, I am not able to track at which specific line of the big dataset the data structure is really messed up.
Problem: I want to know at what line of my data the data structure is wrong. Is there an easy way to do it?
I can tell which function the problem is coming from. But my problem isn't with the function; it's with the data, which probably has a subtle structural problem somewhere. The data runs through several times until it hits the problem, but I cannot tell where. I tried adding a print function to visually trace it down. But the data is huge, with lots of similar patterns, and it is really hard to trace the error back to the main big dataset.
I am not sure if I should put my scripts here, but I think there are possible suggestions I can receive without posting my program on SE.
Any info appreciated.
Code would help, but without it, all I can think of is to keep track of the line number and include it with your error. Use a try/except:

with open(your_filename) as your_file:
    for line_number, line in enumerate(your_file, start=1):
        try:
            process(line)  # whatever you currently do per line
        except TypeError:
            print("Error at line number {}".format(line_number))

EDIT: This will simply print the line number and keep going. You could also re-raise the error inside the except block if you want to halt processing.
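A self-contained variant of that idea, collecting the bad line numbers instead of just printing them (int() stands in for whatever per-line processing raises the error):

```python
def find_bad_lines(lines):
    # Try to process each line; record 1-based line numbers that fail.
    results, bad_lines = [], []
    for line_number, line in enumerate(lines, start=1):
        try:
            results.append(int(line) * 2)  # placeholder for the real work
        except (TypeError, ValueError):
            bad_lines.append(line_number)  # note the line and keep going
    return results, bad_lines

results, bad = find_bad_lines(["1", "2", "oops", "4"])
print(bad)  # [3]
```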
I'm sure someone has come across this before, but it was hard to think of how to search for it.
Suppose I have a file generate_data.py and another, plot_utils.py, which contains a function for plotting this data.
Of note, generate_data.py takes a long time to run, and I would like to only have to run it once. However, I haven't finished working out the kinks in plot_utils.py, so I have to run that one a bunch of times.
It seems in Spyder that when I run generate_data (be it in the current console or in a new dedicated Python interpreter), it doesn't let me modify plot_utils.py and then call "from plot_utils import plotter" in the command line. -- I mean there is no error, but it's clear the changes haven't been picked up.
I guess I kind of want cell mode between different .py files.
EDIT: After being forced to formulate exactly what I want, I think I got around this by putting "from plot_utils import plotter" \n "plotter(foo)" inside a cell in generate_data.py. I am now wondering if there is a more elegant solution.
SECOND EDIT: actually the method mentioned above in the edit does not work as I said it did. Still looking for a method.
You need to reload the module the function comes from (reload works on modules, not on individual functions), then re-import the function so the name points at the fresh version:

# Python 2.7
import plot_utils
reload(plot_utils)
from plot_utils import plotter

or

# Python 3.x
from importlib import reload
import plot_utils
reload(plot_utils)
from plot_utils import plotter
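A self-contained demonstration of the reload cycle (a throwaway plot_utils.py is generated in a temp directory so the example can run anywhere; bytecode caching is disabled so the reload always recompiles the edited source):

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # force reload to recompile source, not a stale .pyc

module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "plot_utils.py")
with open(module_path, "w") as f:
    f.write("def plotter(x):\n    return 'old ' + x\n")

sys.path.insert(0, module_dir)
import plot_utils
from plot_utils import plotter
print(plotter("plot"))  # old plot

# Simulate editing plot_utils.py while the session is still running
with open(module_path, "w") as f:
    f.write("def plotter(x):\n    return 'new ' + x\n")

importlib.reload(plot_utils)    # re-executes the module's source
from plot_utils import plotter  # re-bind the name to the reloaded function
print(plotter("plot"))  # new plot
```

Note that re-importing the name after the reload matters: a name bound with "from plot_utils import plotter" before the reload still points at the old function object.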