Speeding up API Response Time - python

I want to reduce the total time it takes for a web server to request and receive data from an API server for a given query.
Assuming MySQL was the bottleneck, I switched the API server's database to Cassandra, but the total time remains the same. Maybe something else is the bottleneck that I have not been able to figure out.
Environment:
Estimated number of requests per minute: 100
Database: MySQL / Cassandra
Hardware: EC2 Small
Server Used: Apache HTTP
Current Observations:
Cassandra Query Response Time: 0.03 sec
Time between request made and response received: 4 sec
Required:
Time between request made and response received: 1 sec
BOTTOM LINE: How can we reduce the total time taken in this case?
Feel free to ask for more details if required. Thanks

Summarizing from the chat:
Environment:
Running on a small Amazon EC2 instance (1 virtual CPU, 1.7GB RAM)
Web server is Apache
100 worker threads
Python is using Pylons (which implies WSGI)
Test client in the EC2
Tests:
1.8k requests, single thread
Unknown CPU cost
Cassandra request time: 0.079s (spread 0.048->0.759)
MySQL request time: 0.169s (spread 0.047->1.52)
10k requests, multiple threads
CPU runs at 90%
Cassandra request time: 2.285s (spread 0.102->6.321)
MySQL request time: 7.879s (spread 0.831->14.065)
Observation: 100 threads is probably far too many for your small EC2 instance. Bear in mind that each thread spawns a Python process which occupies memory and resources, even when not doing anything. Reducing the number of threads reduces:
Memory contention (and memory paging kills performance)
CPU cache misses
CPU contention
DB contention
Recommendation: You should aim to run only as many threads as are needed to max out your CPU (or fewer, if they max out memory or other resources first). Running more threads than that only increases overhead and decreases throughput.
Observation: Your best performance in single-threaded mode suggests a probable best-case cost of 0.05 CPU-seconds per request (and assuming some latency waiting on IO, the true CPU cost may be quite a lot lower). Assuming CPU is the bottleneck in your architecture, you are probably capable of 20-40 transactions a second on your EC2 server with thread tuning alone.
Recommendation: Use a standard Python profiler to profile the system (when running with an optimum number of threads). The profiler will indicate where the CPU spends the most time. Distinguish between waits (i.e. for the DB to return, for disk to read or write data) vs. the inherent CPU cost of the code.
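For example, a minimal way to do this with the standard library's cProfile (handle_request() here is a hypothetical stand-in for whatever function services one request in your app):
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    handle_request()                 # hypothetical entry point for one request
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative")       # DB and IO waits show up as high cumulative time
stats.print_stats(20)                # show the 20 most expensive entries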
Where you have a high inherent CPU cost: can you decrease the cost? If this is not in your code, can you avoid that code path by doing something different? Caching? Using another library?
Where there is latency: given your single-threaded results, latency is not necessarily bad, provided the CPU can service another request in the meantime. In fact you can get a rough idea of the number of threads you need by calculating: total time / (total time - wait time)
However, check to see that, while Python is waiting, the DB (for instance) isn't working hard to return a result.
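As a rough worked example using the single-threaded numbers above (0.169s total per MySQL request, and assuming roughly 0.05s of that is CPU time, i.e. about 0.119s of waiting):
total_time = 0.169                 # single-threaded MySQL request time (s)
wait_time = 0.119                  # assumed DB/IO wait if CPU cost is ~0.05s
threads = total_time / (total_time - wait_time)
print(round(threads, 1))           # ~3.4, i.e. a handful of threads, not 100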
Other thoughts: consider how the test harness delivers HTTP requests - does it fire them as fast as it can (e.g. trying to open 10k TCP sockets simultaneously)? If so, this may be skewing your results, and it may be better to use a different loading pattern and tool.

Cassandra performs better under high load, and an average time of 3-4 seconds between two systems on different sides of the world is OK.

What are the settings for generating 15,000 POST requests per second in JMeter for at least an hour continuously?

What are the settings/approach for generating 15,000 POST requests per second in JMeter for at least an hour continuously?
As per the project requirements, at least 15K users will post messages per second, and they will keep doing so for an hour or more.
What should the Number of Threads, Ramp-up time, and Loop Count be?
Number of Threads: this depends on your application's response time - if it responds fast you will need fewer threads, if it responds slowly you will need more, something like the following (see the sketch after this list):
if your application response time is 500ms - you will need 7500 threads
if your application response time is 1s - you will need 15000 threads
if your application response time is 2s - you will need 30000 threads
etc.
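These numbers follow from Little's Law (threads ≈ target throughput × response time); a quick sketch for computing it from your own measurements:
target_rps = 15000                 # required requests per second
avg_response_time = 0.5            # measured average response time in seconds (example value)
threads = target_rps * avg_response_time
print(int(threads))                # 7500 threads for a 0.5s response time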
Ramp-up - depending on the number of threads and your test scenario. A good option would be:
10 minutes to ramp up
60 minutes (or more) to hold the load
10 minutes to ramp down
Loop Count: Forever. The duration of the test can be limited via the "Scheduler" section of the Thread Group, by using a Runtime Controller, or by stopping the test manually when needed.
You can use e.g. the Constant Throughput Timer or the Precise Throughput Timer to set JMeter's throughput to 15k requests per second.
Make sure to follow JMeter Best Practices, as 15k requests per second is quite a high load; you may have to go for Distributed Testing if you cannot generate the required load from a single machine.
Make sure the JMeter machine(s) have enough headroom in terms of CPU, RAM, etc.; if JMeter lacks resources, it will not be able to send requests fast enough even if the application under test could handle more load. You can monitor resource usage with e.g. the JMeter PerfMon Plugin.

(AWS) What happens to a python script without enough CPU?

My small AWS EC2 instance runs two Python scripts: one receives JSON messages over a WebSocket (~2 msg/ms) and writes them to a CSV file, and one compresses and uploads the CSVs. After testing, the data recorded by the EC2 instance (~2.4 GB/day) is sparser than when recorded on my own computer (~5 GB/day). Monitoring shows the EC2 instance has consumed all of its CPU credits and is operating at baseline performance. My question is: does the instance drop messages because it cannot write them fast enough?
Thank you to anyone that can provide any insight!
It depends on the WebSocket server.
If your first script cannot run fast enough to keep up with the message generation speed on the server side, the TCP receive buffer will fill up and the server will slow down its sending of packets. Assuming a near-constant message production rate, unprocessed messages will then pile up on the server, and the server could be coded either to let them accumulate or to eventually drop them.
Even if the server never dropped a message, without enough computational power your instance would never catch up - on 8/15 it could still be processing messages from 8/10 - so an instance upgrade is needed.
Does the data rate vary greatly throughout the day (e.g. many more messages during the evening rush around 20:00)? If so, data loss may have occurred during that period.
But is Python really that slow? 5 GB/day is roughly 60 KB per second, and even a fraction of one modern CPU core can easily handle that. Perhaps you should stress-test your scripts and optimize them (reduce small disk writes, etc.).
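For instance, one way to reduce small disk writes is to buffer messages in memory and flush them in batches; a minimal sketch (the batch size and file name are just placeholders):
import csv

BATCH_SIZE = 1000                  # flush to disk every 1000 messages (tunable)
buffer = []

def handle_message(fields):
    # fields: a list of values parsed from one incoming JSON message
    buffer.append(fields)
    if len(buffer) >= BATCH_SIZE:
        flush()

def flush():
    global buffer
    # one larger append instead of thousands of tiny writes
    with open("messages.csv", "a", newline="") as f:   # placeholder file name
        csv.writer(f).writerows(buffer)
    buffer = []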

Simpy - accessing multiple resources

I am starting to learn the SimPy DES framework. I want to implement a simulation in which requests arrive at a server at different times. There are different types of requests, each of which puts a specific memory/CPU load on the server. So, for instance, one type of request might typically use 10% of the CPU and 100 MB of memory, while another might need 15% of the CPU and 150 MB of RAM (these are just example numbers). The server has its own characteristics and some amount of memory. If a request arrives at the server and the required amount of resources is not available, the request should wait. I know how to handle the case of a single resource - I could, for instance, model CPU load with a Container of capacity 100 and initial amount 100, and similarly for memory. However, how do I implement the situation in which my requests should wait for both CPU and memory to be available?
Thanks in advance!
The easiest solution would be to use the AllOf condition event like this:
cpu_req = cpu.get(15) # Request 15% CPU capacity
mem_req = mem.get(10) # Request 10 units of memory
yield cpu_req & mem_req # Wait until we have both CPU and memory
yield env.timeout(10) # Use the resources for 10 time units
This causes your process to wait until both request events have been triggered. However, if the CPU became available at t=5 and the memory only at t=20, the CPU capacity would be held for the whole time (from t=5 to t=20, plus the time you actually use the CPU).
Might this work for you?
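A slightly fuller sketch of the same idea, with two Containers and the resources returned after use (the capacities, request sizes and durations below are just illustrative):
import simpy

def request(env, name, cpu, mem, cpu_need, mem_need, duration):
    # Wait until both the CPU share and the memory are available
    yield cpu.get(cpu_need) & mem.get(mem_need)
    yield env.timeout(duration)               # hold the resources while "processing"
    # Give the resources back so other requests can proceed
    yield cpu.put(cpu_need) & mem.put(mem_need)
    print(name, "finished at t =", env.now)

env = simpy.Environment()
cpu = simpy.Container(env, capacity=100, init=100)    # CPU in percent
mem = simpy.Container(env, capacity=1000, init=1000)  # memory in MB

env.process(request(env, "req-1", cpu, mem, 10, 100, 5))
env.process(request(env, "req-2", cpu, mem, 15, 150, 5))
env.run()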

How to optimize uWSGI python app + nginx on Ubuntu?

I have a simple Flask application that exposes one API. Calling the API runs a Python algorithm that does a lot of string manipulation and file reading (no writing). The algorithm takes about 1000 ms. I'm trying to see if there is any way to optimize for concurrent requests. I'm running on a single 4-vCPU VM.
I wrote a client that makes a request every 1000 ms. There is minimal RAM usage, and CPU usage is about 35%. When I increase the rate to one request every 750 ms, RAM usage does not increase by much, but CPU usage doubles to 70%. If I increase it further to every 500 ms, responses start taking longer and eventually time out; CPU usage is at 100%, and RAM usage is still minimal.
I followed this tutorial to set up my application and enabled threads in my uWSGI settings, but I did not really notice much difference.
I was hoping to get some advice on what I can do software/settings-wise to respond better to concurrent requests.
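For reference, the kind of uWSGI settings involved look roughly like this (the values are illustrative, not a recommendation; since the work here is CPU-bound Python, worker processes generally matter more than threads):
[uwsgi]
# placeholder module:callable for the Flask app
module = app:app
master = true
# roughly one worker process per vCPU for CPU-bound work
processes = 4
# threads mainly help when requests spend time waiting on IO
threads = 2
enable-threads = true
# placeholder socket path that nginx proxies to
socket = /tmp/myapp.sock
chmod-socket = 660
vacuum = true
die-on-term = true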

Speed up web requests by making multiple at once?

Will running multiple processes that all make HTTP requests be notably faster than one?
I'm parsing about a million URLs using lxml.html.parse.
At first, I ran a single Python process that simply looped through the URLs, called lxml.html.parse(myUrl) on each, and waited for the rest of the method to deal with the data before moving on. This way, I was able to process on the order of 10,000 URLs/hour.
I imagined that if I ran a few identical processes (dealing with different sets of URLs), I would speed up the rate at which I could fetch these URLs. Surprisingly (to me at least), I measured about 10,400 URLs/hour this time, which isn't notably better, considering I'm sure both rates were fluctuating dramatically.
My question is: why isn't running three of these processes much faster than one?
I know for a fact that my requests aren't meaningfully affecting their target in any way, so I don't think it's them. Do I not have enough bandwidth to make these extra processes worthwhile? If not, how can I measure this? Am I totally misunderstanding how my MacBook runs these processes? (I'm assuming they run concurrently on different cores, or something roughly equivalent to that.) Something else entirely?
(Apologies if I mangled any web terminology -- I'm new to this kind of stuff. Corrections are appreciated.)
Note: I imagine that running these processes on three different servers would probably be about 3x as fast. (Is that correct?) I'm not interested in that - worst case, 10,000/hour is sufficient for my purposes.
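For reference, the sequential loop and the multi-process variant described above look roughly like this (the URL list here is a tiny placeholder for the real one):
from multiprocessing import Pool
import lxml.html

# placeholder list; the real job iterates over ~1M URLs
urls = ["http://example.com/page1", "http://example.com/page2"]

def fetch_and_process(url):
    # lxml.html.parse() accepts a URL, fetches it and returns a parsed tree;
    # the per-page data handling described above would go after this call
    tree = lxml.html.parse(url)
    return tree.docinfo.URL

if __name__ == "__main__":
    # sequential version: one URL at a time
    # for url in urls:
    #     fetch_and_process(url)

    # parallel version: a few identical worker processes on different URLs
    with Pool(processes=3) as pool:
        results = pool.map(fetch_and_process, urls)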
Edit: from speedtest.net (twice):
With 3 running:
Ping: 29 ms (25 ms)
Download speed: 6.63 mbps (7.47 mbps)
Upload speed: 3.02 mbps (3.32 mbps)
With all paused:
Ping: 26 ms (28 ms)
Download speed: 9.32 mbps (8.82 mbps)
Upload speed: 5.15 mbps (6.56 mbps)
Considering you have roughly 7 Mbit/s (about 1 MB/s, rounding up), and you are getting 2.888 pages per second (10,400 pages per hour), I'd say you're maxing out your connection speed (especially if you're running ADSL or WiFi, where the TCP connection handshakes alone cost you plenty).
At that rate each page works out to roughly 354 kB of data (1 MB/s ÷ 2.888 pages/s), which isn't half bad considering that's close to the limit of your bandwidth.
Take into account TCP headers and everything that happens when you actually establish a connection (SYN, ACK, etc.), and you're at a decent speed, to be honest.
Note: this only considers the download rate, which is much higher than your upload speed - and the upload speed also matters, since it is what carries your connection requests, headers to the web server, and so on. And while most 3G modems and ADSL lines claim to be "full duplex", they really aren't (especially ADSL); you'll never get full speed in both directions at once, despite what your ISP tells you. If you want to achieve that, you need to switch to fiber optics.
P.S. I assume you understand the basic difference between a megabit and a megabyte.
