After figuring out that Celery is rather slow, I moved on to test another part of my environment: the web proxy server. This test compares two proxy implementations, one written with Python's Twisted and one in Go. The backend is a simple web server written in Go, which is probably about as fast as it gets when it comes to serving static HTML.
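The backend's source isn't shown in the post, but a minimal static file server using only Go's standard library looks roughly like this (the address and the idea of serving the saved snapshot from the current directory are assumptions based on the wrk command lines below):

```go
package main

import (
	"log"
	"net/http"
)

// newServer builds a plain static file server from Go's standard library,
// serving files (e.g. the saved HTML snapshot) out of dir.
func newServer(addr, dir string) *http.Server {
	mux := http.NewServeMux()
	mux.Handle("/", http.FileServer(http.Dir(dir)))
	return &http.Server{Addr: addr, Handler: mux}
}

func main() {
	// Serve the snapshot on the port the benchmarks below point at.
	log.Fatal(newServer("127.0.0.1:8000", ".").ListenAndServe())
}
```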
The test content is a snapshot of the front page of this blog, taken a few days ago. The system is a standard Lenovo X220 laptop with a 4-core Intel Core i7 CPU. The measurement instrument is the popular wrk tool, with a custom Lua script to redirect the requests through the proxy.
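The proxy.lua script itself isn't shown in the post. A sketch of what such a wrk script could look like: wrk passes everything after "--" on the command line to init(), and the request asks the proxy for the absolute URI of the backend page, which is how plain HTTP proxies are addressed (the variable names here are hypothetical):

```lua
-- Hypothetical sketch of scripts/proxy.lua; the real script isn't shown.
local target

function init(args)
   -- first argument after "--", e.g. http://127.0.0.1:8000/atodorov.html
   target = args[1]
end

function request()
   -- ask the proxy for the absolute URI of the backend page
   return wrk.format("GET", target)
end
```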
All tests were repeated several times; only the best results are shown here.
I waited between the tests so that all open TCP ports could close.
I've also observed the number of open ports (i.e. sockets) during each run.
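One way to watch this (assuming Linux and iproute2's ss tool; the post doesn't say what was actually used) is to summarize TCP socket states:

```shell
# Summarize TCP socket states; closed connections linger in TIME_WAIT
# for about 60s, which is why port counts stay high between test runs.
count_states() {
    awk 'NR > 1 { states[$1]++ } END { for (s in states) print s, states[s] }'
}

if command -v ss >/dev/null; then
    ss -tan | count_states
fi
```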
Using wrk against the web server in Go yields around 30000 requests per second with an average of 2000 TCP ports in use:
$ ./wrk -c1000 -t20 -d30s http://127.0.0.1:8000/atodorov.html
Running 30s test @ http://127.0.0.1:8000/atodorov.html
  20 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   304.43ms  518.27ms    1.47s   82.69%
    Req/Sec     1.72k     2.45k   17.63k    88.70%
  1016810 requests in 29.97s, 34.73GB read
  Non-2xx or 3xx responses: 685544
Requests/sec:  33928.41
Transfer/sec:      1.16GB
The Twisted implementation performs at a little over 1000 reqs/sec, with average TCP port usage between 20000 and 30000:
$ ./wrk -c1000 -t20 -d30s http://127.0.0.1:8080 -s scripts/proxy.lua -- http://127.0.0.1:8000/atodorov.html
Running 30s test @ http://127.0.0.1:8080
  20 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   335.53ms  117.26ms  707.57ms   64.77%
    Req/Sec    104.14     72.19   335.00     55.94%
  40449 requests in 30.02s, 3.67GB read
  Socket errors: connect 0, read 0, write 0, timeout 8542
  Non-2xx or 3xx responses: 5382
Requests/sec:   1347.55
Transfer/sec:    125.12MB
With the Go proxy I've first run several 30-second tests; performance was around 8000 req/sec, with around 20000 ports in use (most of them remaining in the TIME_WAIT state for a while). Then I've modified proxy.go to make use of all available CPUs on the system and let the test run for 5 minutes.
$ ./wrk -c1000 -t20 -d300s http://127.0.0.1:9090 -s scripts/proxy.lua -- http://127.0.0.1:8000/atodorov.html
Running 5m test @ http://127.0.0.1:9090
  20 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   137.22ms  437.95ms    4.45s   97.55%
    Req/Sec    669.54    198.52     1.71k   76.40%
  3423108 requests in 5.00m, 58.27GB read
  Socket errors: connect 0, read 26, write 181, timeout 24268
  Non-2xx or 3xx responses: 2870522
Requests/sec:  11404.19
Transfer/sec:    198.78MB
Performance peaked at 10000 req/sec. TCP port usage initially rose to around 30000
but rapidly dropped and stayed around 3000. Both the web server and the proxy were
printing the following messages on the console:
2014/11/18 21:53:06 http: Accept error: accept tcp [::]:9090: too many open files; retrying in 1s
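"too many open files" means the process hit its per-process file descriptor limit; every open socket counts as a descriptor, so a benchmark holding tens of thousands of connections exhausts the usual default quickly. The soft limit can be inspected and raised from the shell before launching the servers (65536 is just an example value):

```shell
# Each TCP connection consumes a file descriptor; the common default soft
# limit of 1024 is far below the ~30000 sockets these tests keep open.
ulimit -n          # show the current soft limit
ulimit -n 65536    # raise it for this shell and its children (example value)
```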
There's no doubt that Go is blazingly fast compared to Python, and I'm likely to use it further in my experiments. Still, I didn't expect a 3x difference in performance between the web server and the proxy.
Another thing that worries me is the huge number of open TCP ports, which then drops and stays consistent over time, as well as the error messages from both the web server and the proxy (most likely caused by the per-process open file descriptor limit, given the "too many open files" text).
At the moment I don't know enough about the internals of wrk, Go itself, or the goproxy library to conclude whether this is a bad thing or expected behavior. I'm eager to hear what others think in the comments. Thanks!