Locust requests counter and users - python

I am currently working with the python performance test framework Locust.
I prepared a script that uses Locust as a library and runs multiple tests one after the other.
After every run I multiply the number of users by the test number (e.g. second test -> 2 * users) in order to test how an API responds to changes in this variable.
What I saw for "high" numbers of users was not what I was expecting: the number of requests sent stayed the same even after the increase in users.
For a range between 100 and 1000 users, the request counts shown in the CSV files were practically the same, and I wanted to better understand what could cause this behaviour.
In this image, extracted from Grafana, every sawtooth wave represents the request count.
It can be seen that the request counter peaks are similar, although there is a difference of 100 users between one and the other.
Could this be a limitation of Locust as a library?
I tried to explore the documentation on this topic but I did not find anything about this problem.
If someone knows about a reliable source of information it would be very useful.
Thanks to everyone who takes the time to answer my question!

Related

Python Agent how to track various counters/values evolution over time with ElasticAPM?

I'm really new to APM & Kibana, but ok with Python & ElasticSearch. Before I had Graphite and it was quite easy to do custom tracking.
I'm looking to track 3 simple custom metrics and their evolution over time.
1. A counter name and its value, e.g. queue_size: 23, sent by any of the workers. What happens when different workers send different values? (Because of timing, the value might increase/decrease rapidly.) I have 20 queue names to track. Should I put them all under a service_name, or should I use labels?
Before I used:
self._graphite.gauge("service.queuesize", 3322)
No idea what to have here now:
....
2. Time spent within a method. I saw here that it's possible to use a context manager.
Before I had:
with self._graphite.timer("service.action"):
This will become:
with elasticapm.capture_span('service.action'):
3. Number of requests (only a count, no other tracking).
Before I had:
self._graphite.incr("service.incoming_requests")
Is this correct?
client.begin_transaction('processors')
client.end_transaction('processors')
...
Thanks a lot!
You can add a couple of different types of metadata to your events in APM. Since it sounds like you want to be able to search/dashboard/aggregate over these counters, you probably want labels, using elasticapm.label().
elasticapm.capture_span is indeed the correct tool here. Note that it can be used either as a function decorator, or as a context manager.
Transactions are indeed the best way to keep track of request volume. If you're using one of the supported frameworks these transactions will be created automatically, so you don't have to deal with keeping track of the Client object or starting the transactions yourself.

Responsible time delays - web crawling

What is a responsible / ethical time delay to put in a web crawler that only crawls one root page?
I'm using time.sleep(#) between the following calls
requests.get(url)
I'm looking for a rough idea on what timescales are:
1. Way too conservative
2. Standard
3. Going to cause problems / get you noticed
I want to touch every page (at least 20,000, probably a lot more) meeting certain criteria. Is this feasible within a reasonable timeframe?
EDIT
This question is less about avoiding being blocked (though any relevant info. would be appreciated) and rather what time delays do not cause issues to the host website / servers.
I've tested with 10 second time delays and around 50 pages. I just don't have a clue if I'm being over cautious.
I'd check their robots.txt. If it lists a Crawl-delay, use it! If not, try something reasonable (this depends on the size of the page): if it's a large page, try 2 requests/second; if it's a simple .txt file, 10 requests/second should be fine.
If all else fails, contact the site owner to see what they're capable of handling nicely.
(I'm assuming this is an amateur server with minimal bandwidth)
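The Crawl-delay lookup is in the standard library; a small sketch, parsing a made-up robots.txt inline instead of fetching a real one:

```python
from urllib.robotparser import RobotFileParser

# In practice you would call rp.set_url("https://example.com/robots.txt")
# followed by rp.read(); here we parse a made-up file inline.
robots_txt = """
User-agent: *
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Fall back to a conservative default when no Crawl-delay is listed.
delay = rp.crawl_delay("*") or 2
print(delay)  # -> 10
```

You would then time.sleep(delay) between each requests.get(url). For scale: 20,000 pages at a 10-second delay is roughly 55 hours for a single pass, so the delay you settle on directly decides whether the crawl is feasible in your timeframe.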

New Relic Servers API Getting data using available metrics

I'm running a New Relic server agent on a couple of Linux boxes (in the R&D stage right now) to gather performance data: CPU utilization, memory, etc. I've used the NR API to get back the available metrics and the names passable to them. However, I'm not entirely sure how to get that data back correctly (and I'm not convinced it's even possible at this point). The one I'm most concerned about at this point is:
System/Disk/^dev^xvda1/Utilization/percent.
With available names:
[u'average_response_time', u'calls_per_minute', u'call_count', u'min_response_time', u'max_response_time', u'average_exclusive_time', u'average_value', u'total_call_time_per_minute', u'requests_per_minute', u'standard_deviation']
According to the NR API doc, the proper end point for this is https://api.newrelic.com/v2/servers/${APP_ID}/metrics/data.xml. Where I assume ${APP_ID} is the Server ID.
So, I'm able to send the request, however, the data I'm getting back is not at all what I'm looking for.
Response:
<average_response_time>0</average_response_time>
<calls_per_minute>1.4</calls_per_minute>
<call_count>1</call_count>
<min_response_time>0</min_response_time>
<max_response_time>0</max_response_time>
<average_exclusive_time>0</average_exclusive_time>
<average_value>0</average_value>
<total_call_time_per_minute>0</total_call_time_per_minute>
<requests_per_minute>1.4</requests_per_minute>
<standard_deviation>0</standard_deviation>
This is what I'd expect for these generic value names. I think the data in these metrics is accurate, but I don't think it's to be taken at face value. The reason I say it's not to be taken at face value is this statement in the NR API docs:
Metric values include:
Total disk space used, indicated by average_response_time
Capacity of the disk, indicated by average_exclusive_time.
This would lead one to believe that the data we want is listed under one of the available value names for the request. So, essentially, my question is: is there a more specific way I need to hit the NR API to actually get the disk utilization as a percentage? Or is that not possible, even though the aforementioned information leads me to believe otherwise? I'm hoping there is information I'm missing here... Thanks!
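For what it's worth, a metrics/data request usually needs explicit names[] and values[] parameters to pull back a specific metric; a hedged sketch using only the standard library (the API key and server ID are placeholders, and the .json endpoint is used here instead of .xml purely for readability):

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_KEY = "YOUR_API_KEY"   # placeholder
SERVER_ID = "1234567"      # placeholder: the server ID

# Request only the metric we care about, plus the value names that
# (per the docs quoted above) actually carry the disk figures.
params = {
    "names[]": "System/Disk/^dev^xvda1/Utilization/percent",
    "values[]": ["average_response_time", "average_exclusive_time"],
}
url = ("https://api.newrelic.com/v2/servers/"
       f"{SERVER_ID}/metrics/data.json?" + urlencode(params, doseq=True))

if __name__ == "__main__":
    req = Request(url, headers={"X-Api-Key": API_KEY})
    with urlopen(req) as resp:
        print(resp.read().decode())
```

The request itself sits under the __main__ guard since it needs a real key; the point is the query string, which names both the metric and the two value fields explicitly.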

Is it possible to make writing to files/reading from files safe for a questionnaire type website?

My web app asks users 3 questions and simply writes the answers to a file as a1,a2,a3. I also have a real-time visualization of the average of the data (read from the file in real time).
Must I use a database to ensure that no (or minimal) information is lost? Is it possible to produce a queue of reads/writes? (Since the files are small, I'm not too worried about the execution time of each call.) Does Python/Flask already take care of this?
I am quite experienced with Python itself, but not in this area (with Flask).
I see a few solutions:
1. Read /dev/urandom a few times, calculate the SHA-256 of the bytes, and use the digest as a file name; a collision is extremely improbable.
2. Use Redis and a command like LPUSH (using it from Python is very easy); then RPOP from the right end of the linked list, and there's your queue.
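The first suggestion is pure standard library; a sketch, with a made-up directory name and answer format:

```python
import hashlib
import json
import os

def unique_answer_path(directory="answers"):
    # Hash bytes from the OS entropy pool and use the hex digest as a
    # file name; with 256 bits of randomness, collisions are vanishingly rare.
    digest = hashlib.sha256(os.urandom(32)).hexdigest()
    return os.path.join(directory, digest + ".json")

os.makedirs("answers", exist_ok=True)
path = unique_answer_path()
with open(path, "w") as f:
    json.dump({"a1": "yes", "a2": "no", "a3": "maybe"}, f)
print(path)
```

Because every submission lands in its own file, concurrent Flask requests never write to the same file, and the averaging reader can simply scan the directory.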

Check for json change in python

I am running this python code on my raspberry pi, which checks USGS data and finds the magnitude of all earthquakes within the last hour. The only problem is that the json is always changing. How do I make it keep checking to see if it changed again?
The simplest setup would be to periodically run the request logic over and over, caching the results each time, perhaps with an optional increasing backoff if several requests yield the same results.
You could then compare the newly parsed values with the previous ones if the delta is what you care about, or just replace them inline if you only want to ensure you have the freshest data. Since json.loads deserializes to a dictionary by default, all the standard dictionary methods are available for making comparisons.
Very simple examples of timed-interval callbacks are available in other SO posts.
Alternatively, there are heavier solutions like APScheduler, though that's probably a lot more than you need on a Raspberry Pi.
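A minimal sketch of that poll-and-compare loop; fetch_feed is a stub standing in for the real USGS request:

```python
import json
import time

def fetch_feed():
    # Stub: in the real script this would be the HTTP body from the
    # USGS endpoint, e.g. requests.get(USGS_URL).text.
    raw = '{"quake1": 4.2, "quake2": 5.1}'
    return json.loads(raw)

def diff(prev, new):
    # json.loads gives dicts, so standard dict/set operations apply.
    added = {k: new[k] for k in new.keys() - prev.keys()}
    removed = {k: prev[k] for k in prev.keys() - new.keys()}
    changed = {k: (prev[k], new[k])
               for k in prev.keys() & new.keys() if prev[k] != new[k]}
    return added, removed, changed

prev = {}
for _ in range(2):          # in the real script: while True
    new = fetch_feed()
    added, removed, changed = diff(prev, new)
    if added or removed or changed:
        print("feed changed:", added, removed, changed)
    prev = new
    time.sleep(0.1)         # real script: sleep minutes, perhaps with backoff
```

The first iteration reports everything as "added"; after that, only genuine changes between polls are printed.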
