How to profile performance in a QGIS Python plugin?

Is it possible to use kernprof.py, line_profiler.py, or something similar to profile a QGIS plugin? I can't run the plugin outside of QGIS because it requires state from QGIS and makes calls to the QGIS API.
It seems like I might be able to modify the plugin's initializer to call kernprof, have it call back into the plugin, and pass the state all the way through, but I can't wrap my head around it.
Does anyone have experience with running a Python profiler from inside another tool?

I used a simpler way to profile my plugin, using cProfile. In the constructor of the plugin's main class (the one returned by classFactory), I used this code:
self.pr = cProfile.Profile()
self.pr.enable()
and in the unload method of the class, or anywhere you need to print the profiling stats:
self.pr.disable()
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(self.pr, stream=s).sort_stats(sortby)
ps.print_stats()
Remember to use the following imports:
import cProfile, pstats, io
from pstats import SortKey
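Put together, a minimal sketch of where these pieces live in a plugin's main class (MyPlugin and its methods follow the usual QGIS plugin skeleton and are illustrative, not exact):

import cProfile
import io
import pstats
from pstats import SortKey

class MyPlugin:  # illustrative name: the class returned by classFactory
    def __init__(self, iface):
        self.iface = iface
        self.pr = cProfile.Profile()
        self.pr.enable()  # profiling starts when QGIS instantiates the plugin

    def unload(self):
        # Called by QGIS when the plugin is unloaded; dump the stats here.
        self.pr.disable()
        s = io.StringIO()
        ps = pstats.Stats(self.pr, stream=s).sort_stats(SortKey.CUMULATIVE)
        ps.print_stats()
        print(s.getvalue())  # or write s.getvalue() to a file or the QGIS log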

It's possible to use line_profiler while running your script inside QGIS.
You need to import it in the main file of your plugin alongside your other imports, then create profile = line_profiler.LineProfiler() before your main class, add the @profile decorator just before the main function you want to profile, and finally call profile.print_stats(stream=stream) just before the function returns.
I suppose there are other ways to do it, but this is the way I found that works well enough for me.
Below is an example for a Processing plugin:
import os
import line_profiler

from qgis.core import (QgsProcessingAlgorithm,
                       QgsProcessingParameterFile)

profile = line_profiler.LineProfiler()

class processingScriptExample(QgsProcessingAlgorithm):

    INPUT_directory = 'INPUT_directory'

    def initAlgorithm(self, config):
        # Directory parameter (a file parameter with Folder behavior).
        self.addParameter(QgsProcessingParameterFile(self.INPUT_directory,
                                                     self.tr('Output directory'),
                                                     QgsProcessingParameterFile.Folder))

    @profile
    def processAlgorithm(self, parameters, context, feedback):
        directory = self.parameterAsString(parameters, self.INPUT_directory, context)
        ls = []
        for ii in range(1000000):
            ls.append(ii)
        ls = [ii for ii in range(1000000)]
        path_profiling = os.path.join(directory, "line_profiling.txt")
        with open(path_profiling, 'w') as stream:
            profile.print_stats(stream=stream)
        return {'Profiling file': path_profiling}
The resulting file:
Timer unit: 1e-07 s

Total time: 1.31260 s
File: C:\OSGeo4W\profiles\default/python/plugins\test\algo_test.py
Function: processAlgorithm at line 70

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    70                                           @profile
    71                                           def processAlgorithm(self, parameters, context, feedback):
    72         1        248.0     248.0      0.0      directory = self.parameterAsInt(parameters, self.INPUT_directory, context)
    73
    74         1          8.0       8.0      0.0      ls = []
    75   1000001    5054594.0       5.1     38.5      for ii in range(1000000):
    76   1000000    6633146.0       6.6     50.5          ls.append(ii)
    77
    78         1    1418416.0 1418416.0     10.8      ls = [ii for ii in range(1000000)]
    79
    80         1        561.0     561.0      0.0      path_profiling = os.path.join(directory, "line_profiling.txt")
    81         1      19001.0   19001.0      0.1      with open(path_profiling, 'w') as stream:
    82                                                    profile.print_stats(stream=stream)
    83
    84                                                return {"Profiling file":path_profiling}

Related

I am trying to run a Python script from inside a very simple Docker container. The script runs as expected on my machine, but not in the container

First, I'll post the Python script:
import os
import re
import statistics
from collections import defaultdict

def read_file():
    cwd = os.getcwd()
    file_path = f"{cwd}/SensorLog.txt"
    reference_temp = 0
    reference_humidity = 0
    current_sensor_name = ""
    sensors = defaultdict(list)
    with open(file_path) as file:
        for row in file.readlines():
            reference = re.search(r"^reference temp +(\d+.?\d*) +humidity +(\d+.?\d*)$", row)
            if reference:
                reference_temp = float(reference.group(1))
                reference_humidity = float(reference.group(2))
            current_sensor = re.search(r"^(?:(?:temperature .*)|(?:humidity .*))$", row)
            if current_sensor:
                current_sensor_name = current_sensor.group()
            value = re.search(r"(?<= )\d{1,5}.\d{1,4}(?=\n)$", row)
            if value and current_sensor_name:
                sensors[current_sensor_name].append(value.group())
    return reference_temp, reference_humidity, sensors

def rate(key, reference_temp, reference_humidity, stdev, mean):
    result = ''
    if 'temperature' in key:
        reference = reference_temp
        if abs(reference - mean) <= 0.5:
            if stdev < 3:
                result = 'is ultra-precise'
            if stdev < 5:
                result = 'is very precise'
        if not result:
            result = 'is precise'
    if 'humidity' in key:
        reference = reference_humidity
        if abs(reference - mean) <= 0.01:
            result = 'is precise'
        else:
            result = 'is discarded'
    return result

def main():
    reference_temp, reference_humidity, sensors = read_file()
    result = dict()
    for key, values in sensors.items():
        float_values = list(map(float, values))
        stdev = statistics.stdev(float_values)
        mean = statistics.mean(float_values)
        result[key] = rate(key, reference_temp, reference_humidity, stdev, mean)
    print(result)

main()
When I run this on my computer, it prints a dictionary full of the information I'm expecting to see, properly sorted. I copied the script and the file it's supposed to process into a plain Docker container, with the intent that the Python script would run automatically, but all that gets printed to the console when the container runs is a pair of empty curly brackets. I'll post the Dockerfile next:
FROM python:3
COPY "./sensor.py" "./SensorLog.txt" ./
ENTRYPOINT ["python3", "./sensor.py"]
Like I said, very simple. I opened a bash shell into the container, and SensorLog.txt is there; when I run 'cat' on it, it contains the same information as on my computer. I suspect the problem is related to my use of os.getcwd(), but I'm honestly not sure what the workaround should be. Just in case, here are the contents of the text file, as found on my computer and when running cat in the container as well:
reference temp 70.0 humidity 45.0
temperature thermometer-1
2007-04-05T22:00 72.4
2007-04-05T22:01 76.0
2007-04-05T22:02 79.1
2007-04-05T22:03 75.6
2007-04-05T22:04 71.2
2007-04-05T22:05 71.4
2007-04-05T22:06 69.2
2007-04-05T22:07 65.2
2007-04-05T22:08 62.8
2007-04-05T22:09 61.4
2007-04-05T22:10 64.0
2007-04-05T22:11 67.5
2007-04-05T22:12 69.4
temperature thermometer-2
2007-04-05T22:01 69.5
2007-04-05T22:02 70.1
2007-04-05T22:03 71.3
2007-04-05T22:04 71.5
2007-04-05T22:05 69.8
humidity hygrometer-1
2007-04-05T22:04 45.2
2007-04-05T22:05 45.3
2007-04-05T22:06 45.1
humidity hygrometer-2
2007-04-05T22:04 44.4
2007-04-05T22:05 43.9
2007-04-05T22:06 44.9
2007-04-05T22:07 43.8
2007-04-05T22:08 42.1
temperature thermometer-3
2007-04-05T22:00 70.4
2007-04-05T22:01 70.0
2007-04-05T22:02 72.1
2007-04-05T22:03 71.6
2007-04-05T22:04 72.2
2007-04-05T22:05 70.4
2007-04-05T22:06 69.2
2007-04-05T22:07 70.2
EDIT 9/13/21: On the advice of @Steve Trotta I edited the Dockerfile as follows:
FROM python:3
RUN mkdir -p "/var/env/script"
COPY "./sensor.py" "./SensorLog.txt" "/var/env/script"
ENTRYPOINT ["python3", "/var/env/script/sensor.py"]
Now, when running the container, I get this error message:
Traceback (most recent call last):
  File "/var/env/script/sensor.py", line 70, in <module>
    main()
  File "/var/env/script/sensor.py", line 59, in main
    reference_temp, reference_humidity, sensors = read_file()
  File "/var/env/script/sensor.py", line 14, in read_file
    with open(file_path) as file:
FileNotFoundError: [Errno 2] No such file or directory: '//SensorLog.txt'
And yet, when I open a bash shell into that same container and run the script in that folder, it works just fine, printing a populated dict just as on my computer.
Try adding a:
RUN mkdir /path/to/some/folder
then change your COPY command to:
COPY "./sensor.py" "./SensorLog.txt" /path/to/some/folder
then change your final line to:
ENTRYPOINT ["python3", "path/to/some/folder/sensor.py"]
That would make it easier to diagnose, I would imagine, and I think you're onto something with the working directory comment.
Okay, so, after continuing to fiddle with it, here is the solution that worked. I changed the Dockerfile to read:
FROM python:3
RUN mkdir -p "/var/env/script"
COPY "./sensor.py" "./SensorLog.txt" /var/env/script
WORKDIR "/var/env/script"
ENTRYPOINT ["python3", "sensor.py"]
This solved my problems: the Python script runs automatically when the container starts and prints a populated dict as expected.
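That works because WORKDIR sets the working directory of the container process, so os.getcwd() now resolves to /var/env/script, where SensorLog.txt was copied. An alternative that removes the working-directory dependency entirely is to resolve the data file relative to the script itself; a minimal sketch of the change to read_file, assuming SensorLog.txt sits next to sensor.py:

import os

def read_file():
    # Locate SensorLog.txt next to this script instead of in the current
    # working directory, so the script behaves the same from anywhere.
    script_dir = os.path.dirname(os.path.abspath(__file__))
    file_path = os.path.join(script_dir, "SensorLog.txt")
    ...  # rest of the function unchanged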

DS18B20, W1ThermSensor, Raspberry pi Zero W, and Python3.9 - Does not consistently read sensor

Before I start, I browsed the "similar questions" section before writing this and could not find one that matched a situation like mine. If one is found, please let me know and I will mark this as "answered" if it is in fact similar. I am a .NET full-stack developer by profession; I only recently started dabbling in Python and electrical engineering as a hobby.
I am creating an automated aquaponics control system. Part of the project reads the temperature of the grow-bed media and, with the input of various other sensors, recalculates the frequency at which the pump cycles to flood the bed. I am using a DS18B20 with Python 3.9 and the W1ThermSensor v2.0.0a2 library. Here are the init and the first of several functions for the sensor. I have the W1ThermSensor as a property of the class instead of using inheritance just during initial testing, since it is easier for me to manipulate the code this way.
#!/usr/bin/env python3
from w1thermsensor import W1ThermSensor, Sensor, Unit
from datetime import datetime
import os
import numpy
import traceback

class DS18B20:
    def __init__(self, min_temp=18, max_temp=26):
        self.sensor = W1ThermSensor()
        self.temp_string = "{dt} : Sensor: {id} :: {temp_c}C - {temp_f}F"
        self.temp_c = 0.00
        self.temp_f = 0.00
        self.is_active = False
        self.is_alert = False
        self.min_temp = min_temp
        self.max_temp = max_temp
        self.values = [0.00, 0.00, 0.00, 0.00, 0.00]
        self.value = 0.00

    def start(self):
        if self.sensor is None:
            return False
        os.system('modprobe w1-gpio')
        os.system('modprobe w1-therm')
        # Set baseline for values average
        self.is_active = True
        self.monitor()
        self.values = [self.temp_c, self.temp_c, self.temp_c, self.temp_c, self.temp_c]
        self.value = numpy.average(self.values)
The issue I am running into is that it exhibits one of three problems:
Raises w1thermsensor.errors.NoSensorFoundError
Raises w1thermsensor.errors.SensorNotReadyError
Returns no value in the temp_c property after calling get_temperature()
I looked into this a bit more. If I load up the interactive interpreter in a terminal using the 'sudo python3' command, I can enter the following commands and it works no problem:
sudo python3
>>> from w1thermsensor import W1ThermSensor, Sensor
>>> import time
>>> temp_sensor = W1ThermSensor(Sensor.DS18B20)
>>> while True:
...     print(str(round(temp_sensor.get_temperature())))
...     time.sleep(2)
and it works without issue. I also tried the 'cat' command:
cd /sys/bus/w1/devices
cd 28-3c01d607414b
cat w1_slave
94 01 55 05 7f a5 81 66 5b : crc=5b YES
94 01 55 05 7f a5 81 66 5b t=25250
The stack trace shows that the errors are thrown when W1ThermSensor() is called in __init__(). My question is: is it my code or implementation that is causing the issue, or is it something else? My sleep is set to 2 seconds in the hope that I am just catching the sensor in the middle of an update. Any help would be a big help.
Additional info:
The DS18B20 is wired to a separate 5V power source; the capacitor is to stabilize the voltage, since there is a 5V relay and an LED array on the same 5V power rail of the power supply.
5v+ -------------+---------VCC------
| |
4.7 Kohm |
| |
GPIO4 ---------------------DQ = 1uf polCap
|
|
|
GND ----------------------GND-------
I have double-checked that I have 1-wire enabled.
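One defensive pattern for transient 1-wire failures, offered only as a sketch rather than a confirmed fix, is to construct the sensor and read it inside a retry loop (using only the error classes named above):

import time
from w1thermsensor import W1ThermSensor
from w1thermsensor.errors import NoSensorFoundError, SensorNotReadyError

def read_temperature(retries=5, delay=2.0):
    # Retry both construction and the read, since either can raise while
    # the 1-wire bus is settling.
    last_error = None
    for _ in range(retries):
        try:
            return W1ThermSensor().get_temperature()
        except (NoSensorFoundError, SensorNotReadyError) as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error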

PyQt4 - QWebView.load(url) leaks memory (not from Python)

Basically, I pull a series of links from my database and want to scrape them for specific links I'm looking for. I then feed those links back into the link queue that my multiple QWebViews reference, and they continue to pull those down for processing/storage.
My issue is that as this runs for, say, 200 or 500 links, it starts to use up more and more RAM.
I have looked into this exhaustively, using heapy, memory_profiler, and objgraph to figure out what's causing the memory leak. The Python heap's objects stay about the same in both count and size over time. This made me think the C++ objects weren't getting removed. Sure enough, using memory_profiler, the RAM only goes up when the self.load(self.url) lines of code are called. I've tried to fix this, but to no avail.
Code:
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebView, QWebSettings
from PyQt4.QtGui import QApplication
from lxml.etree import HTMLParser

# My functions
from util import dump_list2queue, parse_doc

class ThreadFlag:
    def __init__(self, threads, jid, db):
        self.threads = threads
        self.job_id = jid
        self.db_direct = db
        self.xml_parser = HTMLParser()

class WebView(QWebView):
    def __init__(self, thread_flag, id_no):
        super(QWebView, self).__init__()
        self.loadFinished.connect(self.handleLoadFinished)
        self.settings().globalSettings().setAttribute(QWebSettings.AutoLoadImages, False)

        # This is actually a dict with a few additional details about the url we want to pull
        self.url = None

        # doing one instance of this to avoid memory leaks
        self.qurl = QUrl()

        # id of the webview instance
        self.id = id_no

        # Status of webview instance: green means it isn't working and yellow means it is.
        self.status = 'GREEN'

        # Reference to a single universal object all the webview instances can see.
        self.thread_flag = thread_flag

    def handleLoadFinished(self):
        try:
            self.processCurrentPage()
        except Exception as e:
            print e
        self.status = 'GREEN'
        if not self.fetchNext():
            # We're finished!
            self.loadFinished.disconnect()
            self.stop()
        else:
            # We're not finished! Do next url.
            self.qurl.setUrl(self.url['url'])
            self.load(self.qurl)

    def processCurrentPage(self):
        self.frame = str(self.page().mainFrame().toHtml().toUtf8())
        # This is the case for the initial web pages I want to gather links from.
        if 'name' in self.url:
            # Parse html string for links I'm looking for.
            new_links = parse_doc(self.thread_flag.xml_parser, self.url, self.frame)
            if len(new_links) == 0: return 0
            fkid = self.url['pkid']
            new_links = map(lambda x: (fkid, x['title'], x['url'], self.thread_flag.job_id), new_links)
            # Post links to database, db de-dupes and then repull ones that made it.
            self.thread_flag.db_direct.post_links(new_links)
            added_links = self.thread_flag.db_direct.get_links(self.thread_flag.job_id, fkid)
            # Add the pulled links to central queue all the qwebviews pull from
            dump_list2queue(added_links, self._urls)
            del added_links
        else:
            # Process one of the links I pulled from the initial set of data that was originally in the queue.
            print "Processing target link!"

    # Get next url from the universal queue!
    def fetchNext(self):
        if self._urls and self._urls.empty():
            self.status = 'GREEN'
            return False
        else:
            self.status = 'YELLOW'
            self.url = self._urls.get()
            return True

    def start(self, urls):
        # This is where the reference to the universal queue gets made.
        self._urls = urls
        if self.fetchNext():
            self.qurl.setUrl(self.url['url'])
            self.load(self.qurl)

# uq = central url queue shared between webview instances
# ta = array of webview objects
# tf = thread flag (basically just a custom universal object that all the webviews can access).
# This main "program" is started by another script elsewhere.
def main_program(uq, ta, tf):
    app = QApplication([])
    webviews = ta
    threadflag = tf
    tf.app = app
    print "Beginning the multiple async web calls..."

    # Create n "threads" (really just webviews) that each will make asynchronous calls.
    for n in range(0, threadflag.threads):
        webviews.append(WebView(threadflag, n + 1))
        webviews[n].start(uq)

    app.exec_()
Here's what my memory tools say (they all stay roughly constant throughout the run):
RAM: resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
2491 (MB)
Objgraph most common types:
methoddescriptor 9959
function 8342
weakref 6440
tuple 6418
dict 4982
wrapper_descriptor 4380
getset_descriptor 2314
list 1890
method_descriptor 1445
builtin_function_or_method 1298
Heapy:
Partition of a set of 9879 objects. Total size = 1510000 bytes.
 Index  Count   %     Size   %  Cumulative   %  Kind (class / dict of class)
     0   2646  27   445216  29      445216  29  str
     1    563   6   262088  17      707304  47  dict (no owner)
     2   2267  23   199496  13      906800  60  __builtin__.weakref
     3   2381  24   179128  12     1085928  72  tuple
     4    212   2   107744   7     1193672  79  dict of guppy.etc.Glue.Interface
     5     50   1    52400   3     1246072  83  dict of guppy.etc.Glue.Share
     6    121   1    40200   3     1286272  85  list
     7    116   1    32480   2     1318752  87  dict of guppy.etc.Glue.Owner
     8    240   2    30720   2     1349472  89  types.CodeType
     9     42   0    24816   2     1374288  91  dict of class
Your program is indeed growing because of C++ allocations, but it is not an actual leak in the sense of objects that are no longer referenced. What is happening, at least in part, is that your QWebView holds a QWebPage, which holds a QWebHistory; each time you call self.load, the history gets a bit longer.
Note that QWebHistory has a clear() function.
Documentation is available: http://pyqt.sourceforge.net/Docs/PyQt4/qwebview.html#history
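A minimal sketch of applying that inside the WebView class from the question (clear() is the documented QWebHistory call; placing it at the top of handleLoadFinished is just one reasonable option):

def handleLoadFinished(self):
    # Drop the accumulated history entries so the per-page C++ history
    # cannot keep growing with every load() call.
    self.history().clear()
    try:
        self.processCurrentPage()
    except Exception as e:
        print e
    # ... rest of the method as in the question ...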

ipython/python console with my own parser/processor?

Is there an easy way to make the IPython console redirect the command line to an external parser and then output the result in the current session?
Say, for example, I have a parser that calculates expressions (just for the sake of the example).
Then, when the command line starts with "calc:", I want to pass it to this external parser. Here is a hypothetical example:
In [XX]: calc: 5 + 5
external calc: 5 + 5 = 10
and so on, you get the idea.
This is the closest I found so far. First, create a shell script:
#!/bin/sh
echo $1
Then in IPython:
In [473]: !./x 123
123
If it is on the system path, it's even shorter:
In [475]: !x 123
123
Now if only I could share state across invocations.
I made it work as an extension:
from __future__ import print_function
from IPython.core.magic import (Magics, magics_class, line_magic,
                                cell_magic, line_cell_magic)
from bi_lang import *

@magics_class
class BiMagics(Magics):
    def __init__(self, shell):
        super(BiMagics, self).__init__(shell)
        self.bi = BiLang()

    @line_magic
    def do(self, line):
        rv = self.bi.run(line)
        return rv

ip = get_ipython()
magics = BiMagics(ip)
ip.register_magics(magics)
Then :
In [3]: %reload_ext ipython_extension
In [4]: %do 5 + 6
===== return ast =====
Value
+- val 11
`- vtype 'num'
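One note on the registration at the bottom of that module: for %load_ext / %reload_ext to pick the magics up, IPython expects the extension module to define the standard load_ipython_extension hook, which also avoids relying on get_ipython() at import time. A minimal sketch, reusing the BiMagics class above:

def load_ipython_extension(ipython):
    # IPython calls this with the active shell when the extension is
    # loaded via %load_ext / %reload_ext.
    ipython.register_magics(BiMagics(ipython))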
If you need more info, look here :
http://ipython.readthedocs.io/en/stable/config/custommagics.html#defining-magics

line_profiler not working as expected

Trying to use line_profiler as an API. Following their docs and this tutorial (scroll down to Line Profiling), I get a minimalist test case for profiling some numpy ufuncs:
import numpy as np
import line_profiler
import time

shp = (1000, 1000)
a = np.ones(shp)
o = np.zeros(shp)

def main():
    t = time.time()
    np.divide(a, 1, o)
    for i in xrange(200):
        np.multiply(a, 2, o)
        np.add(a, 1, o)
    print 'duration', time.time() - t

profiler = line_profiler.LineProfiler()
profiler.add_function(main)
main()
profiler.print_stats()
I get this on stdout, which indicates that main ran but was not profiled:
duration 2.6779999733
Timer unit: 5.59936e-07 s

File: testprof.py
Function: main at line 9
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           def main():
    10                                               t = time.time()
    11                                               np.divide(a,1,o)
    12                                               for i in xrange(200):
    13                                                   np.multiply(a,2,o)
    14                                                   np.add(a,1,o)
    15                                               print 'duration', time.time()-t
I'm new to line_profiler. See my other question if you're curious why I don't use cProfile.
Try adding
profiler.enable_by_count()
before
main()
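Put together, a minimal sketch of the corrected driver section (enable_by_count() and disable_by_count() are the LineProfiler calls that turn recording on and off; pairing them around the call is one reasonable arrangement):

profiler = line_profiler.LineProfiler()
profiler.add_function(main)
profiler.enable_by_count()    # start recording line events before the call
main()
profiler.disable_by_count()   # stop recording once the call returns
profiler.print_stats()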
