There have been many claims around the internet that SSL/TLS adds negligible CPU overhead (note, I’m only considering HTTPS here). Most of these focus on many small transfers, typical of most websites, where performance is dominated by the handshake rather than the message encryption. Although SSL may be less useful for typical large transfers, as HTTPS becomes more pervasive we’re likely to see large transfers over SSL more often.
However, after seeing a CPU core get maxed out during a large upload*, I was interested in the performance impact on single large transfers. Presumably reasonable CPUs should be fast enough to serve content to typical internet clients, even on a 1Gbps line, but how close are they to this?
* As it turns out, the reason for this was actually SSL/TLS compression, so if you’re reading this after seeing high CPU usage during SSL transfer and the figures don’t match empirical speeds, check that it isn’t enabled!
So I decided to run a few (relatively unscientific) tests on a few dedicated servers I happen to have access to at the moment. The test is fairly simple – create a 1GB file and measure CPU usage over SSL. Note that I’ll be measuring the usage of the client rather than the server, since the latter is a little more difficult to perform – presumably the client should give a decent ballpark of the CPU usage of the server.
Test Setup
The 1GB file was created using dd if=/dev/zero of=1g bs=1M count=1024
This file was served by nginx 1.4/1.6 on Debian 7. SSLv3 was disabled, as it seems to be out of favour these days, so the test is only over TLS. I tested various cipher suites using the ssl_ciphers directive:
- No SSL: just as a baseline (transfer over HTTP)
- NULL-MD5: another baseline
- ECDHE-RSA-AES256-GCM-SHA384: labelled “Default“, this seems to be the preferred cipher if you don’t give nginx a ssl_ciphers directive
- RC4-MD5: clients may not accept this, but perhaps the fastest crypto/hashing combo that might be accepted (unless the CPU supports crypto h/w accel)
- AES128-SHA: probably the fastest cipher likely accepted by clients
- ECDHE-RSA-AES128-GCM-SHA256: labelled “AES128-GCM” (no-one has space to fit that in a table; oh, and why does this WordPress theme have a limited column width?!); this is likely just a faster version of Default
The following commands were used for testing:
- CPU benchmark:
openssl speed [-decrypt] -evp [algorithm]
- Wget download:
time wget --no-check-certificate https://localhost/1g -O /dev/null
- cURL download:
time curl -k https://localhost/1g > /dev/null
- cURL upload:
time curl -kF file=@1g https://localhost/null.php > /dev/null
For checking CPU speed, the ‘user time’ measurement was taken from the time command. I suspect wget uses GnuTLS whilst cURL uses OpenSSL for handling SSL.
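If you’d rather script the measurement than read it off the time command, here is a minimal sketch of the same idea. It assumes a Unix-like system with Python 3. `user_cpu_seconds` is a hypothetical helper name, not something used for the figures in this post. It reads the user CPU time accumulated by child processes before and after running a command.

```python
import resource
import subprocess
import sys


def user_cpu_seconds(cmd):
    """Run cmd to completion and return the user-mode CPU seconds it consumed."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    # Discard the output, much like redirecting to /dev/null in the commands above.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere; swap in something like
    # ["curl", "-k", "-o", "/dev/null", "https://localhost/1g"] to mirror the test.
    print(user_cpu_seconds([sys.executable, "-c", "sum(range(10**6))"]))
```

Like the time command, this only counts CPU time spent in the client process, not in the webserver.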
I ran the test on 4 rather different CPUs:
- VIA Nano U2250
- Note that this is a single core CPU, so transfer speeds will be affected by the webserver performing encryption whilst the client does decryption on the same core
- OpenSSL was patched to support VIA Padlock (hardware accelerated AES/SHA1/SHA256)
- AMD Athlon II X2 240
- Intel Xeon E3 1246v3
- This CPU supports hardware accelerated AES
- Fairly close to the fastest you’ll get from a CPU today
- Marvell Armada 370/XP
- A quad core ARMv7 CPU; quite a weak CPU, perhaps comparable to a Pentium III in terms of performance
CPU Benchmark
To get an idea of the speed of the CPU, I ran some hashing/encryption benchmarks using OpenSSL’s speed test. The following figures are in MB/s, taken from the 8192 bytes column. CPUs across the top, ciphers down the side.
Algorithm | Nano | Athlon | Xeon | Armada |
---|---|---|---|---|
RC4 | 235.80 | 514.05 | 943.37 | 98.84 |
MD5 | 289.68 | 551.29 | 755.16 | 141.54 |
AES-128-CBC | 899.14 | 227.45 | 854.48 | 50.05 |
AES-128-CBC (decrypt) | 899.56 | 218.91 | 4871.77 | 48.61 |
AES-256-CBC | 693.24 | 159.82 | 615.08 | 37.95 |
AES-256-CBC (decrypt) | 696.25 | 162.48 | 3655.11 | 38.14 |
AES-128-GCM | 51.38 | 68.63 | 1881.33 | 24.37 |
AES-256-GCM | 41.61 | 51.48 | 1642.22 | 21.22 |
SHA1 | 459.06 | 413.71 | 881.87 | 105.54 |
SHA256 | 396.90 | 178.01 | 296.98 | 52.73 |
SHA512 | 100.98 | 277.43 | 464.86 | 24.42 |
(decryption for RC4 and AES-GCM is likely the same as encryption, them being stream(-like) ciphers and all)
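For anyone wanting to pull comparable figures out of their own runs, here’s a rough sketch of reading one column from the text that openssl speed prints. It assumes the human-readable output format (a header line starting with “type”, then a result line with values in thousands of bytes per second) and decimal megabytes; the conversion used for the table above may have differed. `throughput_mb_s` is a hypothetical helper, and the sample text is made up purely to show the shape of the input.

```python
import re


def throughput_mb_s(speed_output, block_size=8192):
    """Return MB/s for one block size from `openssl speed` text output."""
    sizes = None
    for line in speed_output.splitlines():
        if line.startswith("type"):
            # Header row, e.g. "type  16 bytes  64 bytes ... 8192 bytes"
            sizes = [int(n) for n in re.findall(r"(\d+) bytes", line)]
        elif sizes:
            # Result row: one "<number>k" value per block size
            values = re.findall(r"(\d+(?:\.\d+)?)k\b", line)
            if len(values) == len(sizes):
                k_per_s = float(values[sizes.index(block_size)])
                return k_per_s / 1000.0  # assumption: 1 MB = 1,000,000 bytes
    raise ValueError("no result line found")


# Made-up sample output, not a real measurement
SAMPLE = """\
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     100000.00k   200000.00k   300000.00k   400000.00k   500000.00k
"""

print(throughput_mb_s(SAMPLE))  # 500.0
```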
Test Results
Notes:
- wget doesn’t seem to like NULL-MD5 or “AES128-GCM”
- Columns:
- Transfer (MB/s): download/upload speed, probably not useful, but may be interesting
- CPU Speed (MB/s): = 1024MB ÷ (user) CPU Time (s)
- I’ve included pretty graphs for management type people who can’t read tables; the speed is log scale though, so stay awake!
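The CPU Speed column is just a division, sketched here for clarity. `cpu_speed_mb_s` and `user_seconds_for` are illustrative names, and the 1024MB default matches the test file.

```python
def cpu_speed_mb_s(user_seconds, size_mb=1024):
    """CPU Speed (MB/s) = size transferred / user CPU time."""
    if user_seconds <= 0:
        raise ValueError("user CPU time must be positive")
    return size_mb / user_seconds


def user_seconds_for(cpu_speed, size_mb=1024):
    """Inverse: the user CPU time implied by a CPU Speed figure."""
    return size_mb / cpu_speed


print(cpu_speed_mb_s(0.5))  # 2048.0
```

Going the other way, a CPU Speed of 41.07 MB/s (the Athlon’s wget figure for Default) corresponds to roughly 25 seconds of user time for the 1GB file.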
VIA Nano U2250
Cipher | Wget download: Transfer | Wget download: CPU Speed | cURL download: Transfer | cURL download: CPU Speed | cURL upload: Transfer | cURL upload: CPU Speed |
---|---|---|---|---|---|---|
No SSL | 495 | 4129.03 | 457 | 1580.25 | 55.9 | 1706.67 |
NULL-MD5 | n/a | n/a | 57.1 | 144.88 | 40.4 | 155.06 |
Default | 14.3 | 16.95 | 17.7 | 37.83 | 15.5 | 37.61 |
RC4-MD5 | 29.7 | 46.65 | 44 | 103.18 | 32 | 106.00 |
AES128-SHA | 19.2 | 23.62 | 48.9 | 96.49 | 37.7 | 145.95 |
AES128-GCM | n/a | n/a | 21 | 45.55 | 18.1 | 45.41 |
AMD Athlon II X2 240
Cipher | Wget download: Transfer | Wget download: CPU Speed | cURL download: Transfer | cURL download: CPU Speed | cURL upload: Transfer | cURL upload: CPU Speed |
---|---|---|---|---|---|---|
No SSL | 1782 | 10240.00 | 1975 | 12800.00 | 404 | 13473.68 |
NULL-MD5 | n/a | n/a | 308 | 438.36 | 211 | 416.94 |
Default | 40.7 | 41.07 | 46.9 | 49.55 | 43.1 | 49.35 |
RC4-MD5 | 86.6 | 88.43 | 263 | 346.88 | 189 | 340.43 |
AES128-SHA | 59.1 | 60.00 | 118 | 127.11 | 98 | 130.15 |
AES128-GCM | n/a | n/a | 55.8 | 65.56 | 56.7 | 64.65 |
Intel Xeon E3 1246v3
Cipher | Wget download: Transfer | Wget download: CPU Speed | cURL download: Transfer | cURL download: CPU Speed | cURL upload: Transfer | cURL upload: CPU Speed |
---|---|---|---|---|---|---|
No SSL | 4854 | 32000.00 | 5970 | 32000.00 | 1363 | 51200.00 |
NULL-MD5 | n/a | n/a | 556 | 638.40 | 452 | 677.25 |
Default | 88.6 | 88.55 | 997 | 1312.82 | 699 | 1422.22 |
RC4-MD5 | 182 | 185.91 | 514 | 587.16 | 420 | 587.16 |
AES128-SHA | 128 | 128.00 | 556 | 643.22 | 449 | 664.94 |
AES128-GCM | n/a | n/a | 1102 | 1497.08 | 723 | 1641.03 |
Marvell Armada 370/XP
Cipher | Wget download: Transfer | Wget download: CPU Speed | cURL download: Transfer | cURL download: CPU Speed | cURL upload: Transfer | cURL upload: CPU Speed |
---|---|---|---|---|---|---|
No SSL | 223 | 882.76 | 182 | 544.68 | 44.4 | 403.15 |
NULL-MD5 | n/a | n/a | 44.3 | 62.48 | 25.7 | 60.24 |
Default | 7.01 | 7.23 | 16 | 18.52 | 13.7 | 18.56 |
RC4-MD5 | 20.5 | 22.14 | 32.6 | 41.90 | 23.1 | 41.80 |
AES128-SHA | 9.16 | 9.62 | 21.5 | 24.11 | 16.2 | 23.63 |
AES128-GCM | n/a | n/a | 17.5 | 20.15 | 14.8 | 20.15 |
Conclusion
On slower CPUs (okay, I’ll ignore the Armada here), it does appear that SSL can have a significant impact on CPU usage for single large transfers. Even on an Athlon II, the effect can be noticeable if you’re transferring at 1Gbps: whilst the CPU can achieve it with the faster ciphers, if you’re using the CPU significantly for other purposes, you may find it to be a bottleneck. On modern CPUs (especially those with AES-NI), though, the impact is relatively low, unless you’re trying to saturate a 10Gbps link with a single connection (or CPU core). It may drop further once Intel’s Skylake platform comes out; I suspect GCM modes will still be faster, but it may help clients that don’t support TLS1.2.
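To put rough numbers on that, here’s a back-of-the-envelope sketch that turns a CPU Speed figure into the share of one core needed to keep a link full. `core_fraction` is a hypothetical helper. It assumes decimal units (1Gbps = 125MB/s), which don’t quite match the 1024MB file, and the figures are client-side numbers copied from the cURL download columns, so treat the output as a ballpark only.

```python
def core_fraction(link_gbps, cpu_speed_mb_s):
    """Share of one core spent on TLS to keep a link of link_gbps full."""
    link_mb_s = link_gbps * 1000 / 8  # assumption: decimal units, 1Gbps = 125MB/s
    return link_mb_s / cpu_speed_mb_s


# cURL download "CPU Speed" figures (MB/s) copied from the tables above
CURL_DOWNLOAD = {
    ("Athlon II", "Default"): 49.55,
    ("Athlon II", "AES128-SHA"): 127.11,
    ("Xeon E3", "Default"): 1312.82,
}

for (cpu, cipher), speed in CURL_DOWNLOAD.items():
    print(f"{cpu} {cipher}: {core_fraction(1, speed):.2f} cores at 1Gbps, "
          f"{core_fraction(10, speed):.2f} cores at 10Gbps")
```

A result above 1.0 means a single core can’t keep up with that cipher on its own; a result close to 1.0 leaves almost nothing for anything else.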
Cipher selection can be quite important in making things fast if you’re on a slower CPU, although if you want client support, most of the time the choice comes down to AES crypto with either SHA or GCM for integrity checking.
The crypto library likely has a noticeable effect, but this wasn’t particularly tested (it appears that OpenSSL is usually faster than GnuTLS, but this is only a very rough guess from the results).
Oh and AES-GCM is ridiculously fast on CPUs with AES-NI.
Stuff I left out
- Triple DES ciphers: cause they’re slow, and caring about IE6 isn’t popular any more
- AES crypto without AES-NI on modern Intel CPUs: this is perhaps only useful for those running inside VMs that don’t pass on the capability to guests, but I cbf testing this
- Ads throughout the post