When making changes to larger modules, this is my current (inefficient) process:
Make needed change to code
Run the program to test (using pdb: python3 -m pdb path/to/script.py)
Program will throw an error
Fix error/create an exception
Run again
New error appears
Rinse and repeat
The data processing module I'm working on has many steps, and rerunning the whole thing after every code change to make sure there are no errors takes a long time and is frustrating. It's also obviously an inefficient way to develop a program, but I don't know what the alternative is.
What advice do you have so that I don't have to run, and wait for, my whole data processing pipeline just to find what the next error will be? Is there any way to make changes to the code and continue executing from just before where the last error appeared?
You could write unit tests for every module and for every step. Basically it's "create fake data to pass to each step and check that the result after the step is what you expect", automated of course.
Check the internet to learn about testing in general and testing in Python.
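For example, a tiny pytest-style test for one step could look like the sketch below; clean_rows and the fake rows are made-up placeholders, not something from your actual pipeline:

# test_pipeline.py -- run with: python3 -m pytest test_pipeline.py
# Hypothetical step: clean_rows drops rows that have empty fields.

def clean_rows(rows):
    # in the real module this would be imported from your pipeline
    return [row for row in rows if all(row.values())]

def test_clean_rows_drops_incomplete_rows():
    fake_data = [
        {"id": 1, "value": "ok"},
        {"id": 2, "value": ""},   # incomplete row should be dropped
    ]
    assert clean_rows(fake_data) == [{"id": 1, "value": "ok"}]

Each test exercises one step with tiny fake data, so it runs in milliseconds and tells you exactly which step broke, without rerunning the whole pipeline.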
Related
I wrote a Python program which uses a module (pytesseract, specifically), and I notice it takes a few seconds to import the module when I run it. I am wondering if there is a way to initialise the module before running the main program, in order to cut the duration of the actual program by a few seconds. Any suggestions?
One possible solution for slow startup time would be to split your program into two parts: one that is always running as a daemon or service, and another that communicates with it to process individual tasks.
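A rough sketch of that split, assuming a local TCP socket; the port number, the one-image-path-per-request protocol, and the file layout are all made up for illustration and are not part of pytesseract:

# ocr_daemon.py -- long-running part; the slow imports happen only once, at startup
import socketserver
import pytesseract
from PIL import Image

class OCRHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # each request is one line containing an image path
        path = self.rfile.readline().decode().strip()
        text = pytesseract.image_to_string(Image.open(path))
        self.wfile.write(text.encode())

if __name__ == "__main__":
    with socketserver.TCPServer(("127.0.0.1", 5555), OCRHandler) as server:
        server.serve_forever()

# ocr_client.py -- short-lived part; no heavy imports, so it starts almost instantly
import socket
import sys

with socket.create_connection(("127.0.0.1", 5555)) as conn:
    conn.sendall((sys.argv[1] + "\n").encode())
    conn.shutdown(socket.SHUT_WR)
    print(conn.recv(65536).decode())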
As a quick answer without more info, pytesseract also imports (if they are installed) PIL, numpy, and pandas. If you don't need these, you could uninstall them to reduce load time.
I presume that you need to start your application multiple times with different arguments and you don't want to waste time on imports every time, right?
You can wrap the actual code in a while True: loop and use input() to get new arguments, or read the arguments from a file.
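Roughly like this; run_ocr and the prompt text are placeholders for whatever your program actually does with each argument:

# keep the interpreter (and the slow pytesseract import) alive between tasks
import pytesseract
from PIL import Image

def run_ocr(image_path):
    # placeholder for the real per-task work
    return pytesseract.image_to_string(Image.open(image_path))

while True:
    arg = input("image path (blank line to quit): ").strip()
    if not arg:
        break
    print(run_ocr(arg))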
Call a function in a running program
I am fairly new to programming and recently decided that I want to expand my Python knowledge and practice a bit. For that reason I decided to create a little weather station with a Raspberry Pi.
The program I am currently creating takes the output from a thermometer, parses it and writes it into a database. At the moment the program is started every minute, and after completing the aforementioned procedure the program ends and all the instances get deleted (is that how you say it?).
I think that restarting it every minute wastes resources and loses time, so I want my program to run all the time and be accessible via command-line commands (bash).
For example I have the class Thermometer:
class Thermometer:
    def measure(self):
        # do stuff
        # return stuff
        pass
An instance of this class is created like this:
if __name__ == "__main__":
    thermo = Thermometer()
    while True:
        pass
Is it possible that I could call the function measure like this?:
sudo python3 < thermo.measure()
Or is there another way to achieve this, or is my approach completely wrong? Also, how would one describe this problem? I tried searching for "Python call function from outside" or "Python call function from bash" but I didn't find anything useful except Call a function from a running process on Stack Overflow, but it seems that this is the wrong Python version.
You can find my code here: github Jocomol/weatherstation
or, if you don't trust the link, go to GitHub and search for "Jocomol/weatherstation".
Code review
As I already mentioned, I am quite new to Python and programming itself, so I make a lot of mistakes and write useless code that could be done differently. So I am thankful for any feedback I can get on my code. If you could take a look at my code and point out places that aren't optimal or where I did something completely useless, I would be very happy.
I have code that repeats itself every 10 seconds, but I can't test it for a long time because my PowerShell keeps hanging and the code just stops for no particular reason (the code is running but it doesn't give out results). Is there a way to test the code, or to safely run it without it being interrupted? I tried to search, but it seems that a library like unittest would just crash along with my code because of the shell if I want to run it for, let's say, a day. It usually hangs just a few hours after I start testing manually.
The code is something like this:
import time
import requests

while True:
    getting = requests.get(some_url)   # some_url, another_url and headers are placeholders
    result = getting.json()
    posting = requests.post(another_url, headers=headers, json=result)
    time.sleep(10)
Thank you for your help.
So after testing and experimenting, it seems to be resource mismanagement by Windows when I use PowerShell, cmd.exe or the default Python IDE. Thus, in case someone wants to test their code for a prolonged period of time, it is recommended to use PyCharm, as it has been running for more than a day for me. This should mean that PyCharm has better management in this specific area.
For more details check the comments under the question itself.
In the PyCharm debugger we can pause a process. I have a program to debug that takes a long time before it arrives at the part I'm debugging.
The program can be modeled like that: GOOD_CODE -> CODE_TO_DEBUG.
I'm wondering if there is a way to:
run GOOD_CODE
save the process
edit the code in CODE_TO_DEBUG
restore the process and run it with the edited CODE_TO_DEBUG
Is serialization a good way to do it, or is there some tool for that?
I'm working on OSX with PyCharm.
Thank you for your kind answers.
The classic method is to write a program that reproduces the conditions that lead into the buggy code without taking a bunch of time (say, read in the data from a file instead of generating it), and then paste in the code you're trying to fix. If you get it fixed in the test wrapper and it still doesn't work in the original program, you then "only" have to find the faulty interaction with the rest of the program (global variables, bad parameter passing, etc.).
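A minimal sketch of that approach, assuming the boundary between GOOD_CODE and CODE_TO_DEBUG can be expressed as a single intermediate object; the function names and the state.pkl filename are made up for illustration:

# save_state.py -- run the slow GOOD_CODE once and persist its result
import pickle

def run_good_code():
    # placeholder for the slow part of the real program
    return {"rows": [1, 2, 3]}

with open("state.pkl", "wb") as f:
    pickle.dump(run_good_code(), f)

# debug_harness.py -- iterate quickly on CODE_TO_DEBUG
import pickle

with open("state.pkl", "rb") as f:
    intermediate = pickle.load(f)

def code_to_debug(state):
    # paste the code you are trying to fix here
    print(state)

code_to_debug(intermediate)

Once the state file exists, every debugging iteration only pays for loading the pickle, not for rerunning GOOD_CODE.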
So I am working on a Matlab application that has to do some communication with a Python script. The script that is called is a simple piece of client software. As a side note, if it were possible to have a Matlab client and a Python server communicating, that would solve this issue completely, but I haven't found a way to make that work.
Anyhow, after searching the web I have found two ways to call Python scripts: either with the system() command or by editing the perl.m file to call Python scripts instead. Both ways are too slow, though (timing them with tic/toc gives > 20 ms, and the call must run in under 6 ms), as this call will be in a loop that is very time sensitive.
As a solution I figured I could instead save a file at a certain location and have my Python script continuously check for this file and, when it finds it, execute the command I want. After timing each of these steps and summing them up I found this to be very much faster (almost 100x, so certainly fast enough), and I can't really believe that; or rather, I can't understand why calling Python scripts is so slow (not that I have more than a superficial knowledge of the subject). I also found this solution to be really messy and ugly, so I just wanted to check: first, is it a good idea, and second, is there a better one?
Finally, I realize that Python's time.time() and Matlab's tic/toc might not be precise enough to measure time correctly on that scale, which is another reason why I ask.
Spinning up new instances of the Python interpreter takes a while. If you spin up the interpreter once, and reuse it, this cost is paid only once, rather than for every run.
This is normal (expected) behaviour, since startup includes large numbers of allocations and imports. For example, on my machine, the startup time is:
$ time python -c 'import sys'
real 0m0.034s
user 0m0.022s
sys 0m0.011s
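A rough sketch of the "start the interpreter once and reuse it" idea, in the spirit of the file-based approach described in the question; the watch-file name, the polling interval, and handle_command() are made up for illustration:

# worker.py -- started once, so the interpreter startup cost is paid only once
import os
import time

WATCH_FILE = "command.txt"      # hypothetical file the Matlab side writes

def handle_command(text):
    # placeholder for whatever the script actually does with the command
    print("got:", text)

while True:
    if os.path.exists(WATCH_FILE):
        with open(WATCH_FILE) as f:
            handle_command(f.read())
        os.remove(WATCH_FILE)
    time.sleep(0.001)           # poll every millisecond, well inside the 6 ms budget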