TLS/SSL CPU Usage for Large Transfers

There have been many claims around the internet that SSL/TLS adds negligible CPU overhead (note, I’m only considering HTTPS here).  Most of these focus on many small transfers, typical of most websites, where performance is dominated by the handshake rather than the message encryption.  Although SSL may be less useful for typical large transfers, as HTTPS becomes more pervasive we’re likely to see large transfers over SSL more often.

However, after seeing a CPU core get maxed out during a large upload*, I was interested in the performance impact on single large transfers.  Presumably reasonable CPUs should be fast enough to serve content to typical internet clients, even on a 1Gbps line, but how close are they to the limit?
* As it turns out, the reason for this was actually SSL/TLS compression, so if you’re reading this after seeing high CPU usage during SSL transfer and the figures don’t match empirical speeds, check that it isn’t enabled!

So I decided to run a few (relatively unscientific) tests on a few dedicated servers I happen to have access to at the moment.  The test is fairly simple – create a 1GB file and measure CPU usage over SSL.  Note that I’ll be measuring the usage of the client rather than the server, since the latter is a little more difficult to measure – presumably the client should give a decent ballpark of the CPU usage of the server.

Test Setup

The 1GB file was created using dd if=/dev/zero of=1g bs=1M count=1024

This file was served by nginx 1.4/1.6 on Debian 7. SSLv3 was disabled, as it seems to be out of favour these days, so the test is only over TLS.  I tested various cipher suites using the ssl_ciphers directive:

  • No SSL: just as a baseline (transfer over HTTP)
  • NULL-MD5: another baseline
  • ECDHE-RSA-AES256-GCM-SHA384: labelled “Default”, this seems to be the preferred cipher if you don’t give nginx an ssl_ciphers directive
  • RC4-MD5: clients may not accept this, but perhaps the fastest crypto/hashing combo that might be accepted (unless the CPU supports crypto h/w accel)
  • AES128-SHA: probably the fastest cipher likely accepted by clients
  • ECDHE-RSA-AES128-GCM-SHA256: labelled “AES128-GCM” (no-one has space to fit that in a table; oh, and why does this WordPress theme have a limited column width?!); this is likely just a faster version of Default
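Since a couple of these suites (NULL-MD5, RC4-MD5) may simply be refused by the other end, it can be worth checking what a given OpenSSL build still offers before benchmarking. The following is a rough sketch using Python's ssl module; the names TESTED_SUITES and locally_available are made up for illustration, and the result only describes the OpenSSL that Python is linked against, not nginx, wget or cURL.

```python
import ssl

# OpenSSL cipher names as used in the list above
TESTED_SUITES = [
    "NULL-MD5",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "RC4-MD5",
    "AES128-SHA",
    "ECDHE-RSA-AES128-GCM-SHA256",
]


def locally_available(suites=TESTED_SUITES):
    """Map each cipher name to whether the local OpenSSL build will offer it."""
    result = {}
    for suite in suites:
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        try:
            ctx.set_ciphers(suite)
        except ssl.SSLError:
            # OpenSSL could not select any cipher for this string
            result[suite] = False
            continue
        # TLS 1.3 suites may be listed regardless, so match on the exact name
        result[suite] = any(c["name"] == suite for c in ctx.get_ciphers())
    return result


if __name__ == "__main__":
    for suite, ok in locally_available().items():
        print(f"{suite}: {'available' if ok else 'not available'}")
```

On a recent distribution I'd expect the NULL and RC4 suites to show up as not available, but that depends entirely on how OpenSSL was built and configured.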

The following commands were used for testing:

  • CPU benchmark:
    openssl speed [-decrypt] -evp [algorithm]
  • Wget download:
    time wget --no-check-certificate https://localhost/1g -O /dev/null
  • cURL download:
    time curl -k https://localhost/1g > /dev/null
  • cURL upload:
    time curl -kF file=@1g https://localhost/null.php > /dev/null

For checking CPU speed, the ‘user time’ measurement was taken from the time command.  I suspect wget uses GnuTLS whilst cURL uses OpenSSL for handling SSL.
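For anyone wanting to script this rather than eyeball the output of time, here is a minimal sketch of the same measurement in Python. It assumes a Unix-like system (the resource module isn't available elsewhere), and user_cpu_seconds is a name invented for this example. The stand-in workload is only there so the snippet runs without a web server.

```python
import resource
import subprocess
import sys


def user_cpu_seconds(cmd):
    """Run cmd to completion and return the user-mode CPU seconds it consumed."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    # Discard output, like the "> /dev/null" in the commands above
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before


if __name__ == "__main__":
    # Stand-in workload; swap in ["curl", "-k", "https://localhost/1g"]
    # to mirror the cURL download test.
    busy = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    print(f"user time: {user_cpu_seconds(busy):.2f}s")
```

RUSAGE_CHILDREN only counts children that have been waited for, which subprocess.run takes care of. Like the time command, this ignores system (kernel) time.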

I ran the test on 4 rather different CPUs:

  • VIA Nano U2250
    • Note that this is a single core CPU, so transfer speeds will be affected by the webserver performing encryption whilst the client does decryption on the same core
    • OpenSSL was patched to support VIA Padlock (hardware accelerated AES/SHA1/SHA256)
  • AMD Athlon II X2 240
  • Intel Xeon E3 1246v3
  • Marvell Armada 370/XP
    • A quad core ARMv7 CPU; quite a weak CPU, perhaps comparable to a Pentium III in terms of performance

CPU Benchmark

To get an idea of the speed of the CPU, I ran some hashing/encryption benchmarks using OpenSSL’s speed test.  The following figures are in MB/s, taken from the 8192-byte column. CPUs across the top, ciphers down the side.

Algorithm              Nano     Athlon   Xeon     Armada
RC4                    235.80   514.05   943.37   98.84
MD5                    289.68   551.29   755.16   141.54
AES-128-CBC            899.14   227.45   854.48   50.05
AES-128-CBC (decrypt)  899.56   218.91   4871.77  48.61
AES-256-CBC            693.24   159.82   615.08   37.95
AES-256-CBC (decrypt)  696.25   162.48   3655.11  38.14
AES-128-GCM            51.38    68.63    1881.33  24.37
AES-256-GCM            41.61    51.48    1642.22  21.22
SHA1                   459.06   413.71   881.87   105.54
SHA256                 396.90   178.01   296.98   52.73
SHA512                 100.98   277.43   464.86   24.42

(decryption for RC4 and AES-GCM is likely the same as encryption, them being stream(-like) ciphers and all)

Test Results

Notes:

  • wget doesn’t seem to like NULL-MD5 or “AES128-GCM”
  • Columns:
    • Transfer (MB/s): download/upload speed, probably not useful, but may be interesting
    • CPU Speed (MB/s): = 1024MB ÷ (user) CPU Time (s)
  • I’ve included pretty graphs for management type people who can’t read tables; the speed is log scale though, so stay awake!
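The CPU Speed column is just the formula above applied to the output of time. A small sketch of it in both directions follows; the function names are mine, and FILE_SIZE_MB is the 1024MB test file.

```python
FILE_SIZE_MB = 1024  # the 1GB test file created with dd (1024 x 1M blocks)


def cpu_speed(user_seconds, size_mb=FILE_SIZE_MB):
    """MB of payload handled per second of user CPU time."""
    return size_mb / user_seconds


def implied_user_seconds(cpu_speed_mb_s, size_mb=FILE_SIZE_MB):
    """Invert the formula: the user CPU time behind a CPU Speed figure."""
    return size_mb / cpu_speed_mb_s


print(cpu_speed(8.0))                  # 128.0
print(implied_user_seconds(51200.0))   # 0.02
```

Running it backwards is a useful sanity check: a CPU Speed of 51200.00 corresponds to only 0.02s of user time, so the “No SSL” baselines sit close to the resolution of the time command and shouldn't be read too precisely.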

VIA Nano U2250

All figures in MB/s; “-” means no result.

            Wget download        cURL download        cURL upload
Cipher      Transfer  CPU Speed  Transfer  CPU Speed  Transfer  CPU Speed
No SSL      495       4129.03    457       1580.25    55.9      1706.67
NULL-MD5    -         -          57.1      144.88     40.4      155.06
Default     14.3      16.95      17.7      37.83      15.5      37.61
RC4-MD5     29.7      46.65      44        103.18     32        106.00
AES128-SHA  19.2      23.62      48.9      96.49      37.7      145.95
AES128-GCM  -         -          21        45.55      18.1      45.41

[Speed graph for the VIA Nano U2250; log scale]

AMD Athlon II X2 240

All figures in MB/s; “-” means no result.

            Wget download        cURL download        cURL upload
Cipher      Transfer  CPU Speed  Transfer  CPU Speed  Transfer  CPU Speed
No SSL      1782      10240.00   1975      12800.00   404       13473.68
NULL-MD5    -         -          308       438.36     211       416.94
Default     40.7      41.07      46.9      49.55      43.1      49.35
RC4-MD5     86.6      88.43      263       346.88     189       340.43
AES128-SHA  59.1      60.00      118       127.11     98        130.15
AES128-GCM  -         -          55.8      65.56      56.7      64.65

[Speed graph for the AMD Athlon II X2 240; log scale]

Intel Xeon E3 1246v3

All figures in MB/s; “-” means no result.

            Wget download        cURL download        cURL upload
Cipher      Transfer  CPU Speed  Transfer  CPU Speed  Transfer  CPU Speed
No SSL      4854      32000.00   5970      32000.00   1363      51200.00
NULL-MD5    -         -          556       638.40     452       677.25
Default     88.6      88.55      997       1312.82    699       1422.22
RC4-MD5     182       185.91     514       587.16     420       587.16
AES128-SHA  128       128.00     556       643.22     449       664.94
AES128-GCM  -         -          1102      1497.08    723       1641.03

[Speed graph for the Intel Xeon E3 1246v3; log scale]

Marvell Armada 370/XP

All figures in MB/s; “-” means no result.

            Wget download        cURL download        cURL upload
Cipher      Transfer  CPU Speed  Transfer  CPU Speed  Transfer  CPU Speed
No SSL      223       882.76     182       544.68     44.4      403.15
NULL-MD5    -         -          44.3      62.48      25.7      60.24
Default     7.01      7.23       16        18.52      13.7      18.56
RC4-MD5     20.5      22.14      32.6      41.90      23.1      41.80
AES128-SHA  9.16      9.62       21.5      24.11      16.2      23.63
AES128-GCM  -         -          17.5      20.15      14.8      20.15

[Speed graph for the Marvell Armada 370/XP; log scale]

Conclusion

On slower CPUs (okay, I’ll ignore the Armada here), it does appear that SSL can have a significant impact on CPU usage for single large transfers.  Even on an Athlon II, the effect can be noticeable if you’re transferring at 1Gbps – whilst the CPU can achieve it, if you’re using the CPU significantly for other purposes, you may find it to be a bottleneck.  On modern CPUs (especially those with AES-NI) though, the impact is relatively low, unless you’re trying to saturate a 10Gbps link with a single connection (or CPU core).  It may get lower still once Intel’s Skylake platform comes out: I suspect GCM modes will still be faster, but it may help clients that don’t support TLS1.2.
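To put rough numbers on “how close”, the CPU Speed figures can be compared against the raw line rate. This is a back-of-envelope sketch with a few assumptions of mine: the MB in the tables are treated as MiB (the test file was 1024 x 1M), TCP/IP and TLS framing overhead is ignored, and the client-side user time is taken as a stand-in for the server's, as in the rest of the post. The function names are invented for the example.

```python
MIB = 2 ** 20


def line_rate_mib_s(gbps):
    """Raw link rate in MiB/s, ignoring TCP/IP and TLS framing overhead."""
    return gbps * 1e9 / 8 / MIB


def cores_needed(cpu_speed_mib_s, gbps):
    """Fraction of one core, at the measured CPU Speed, needed to fill the link."""
    return line_rate_mib_s(gbps) / cpu_speed_mib_s


# cURL download "CPU Speed" figures taken from the result tables
samples = {
    "Athlon II, AES128-SHA": 127.11,
    "Athlon II, Default": 49.55,
    "Xeon E3, Default": 1312.82,
}

for name, speed in samples.items():
    print(f"{name}: {cores_needed(speed, 1):.2f} cores at 1Gbps, "
          f"{cores_needed(speed, 10):.2f} cores at 10Gbps")
```

Anything approaching or exceeding 1.00 means a single connection would be limited by one core rather than by the link.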

Cipher selection can be quite important in making things fast if you’re on a slower CPU, although most of the time it’s AES crypto with a choice of SHA/GCM for integrity checking, if you want client support.

The crypto library likely has a noticeable effect, but this wasn’t particularly tested (it appears that OpenSSL is usually faster than GnuTLS, but this is only a very rough guess from the results).

Oh and AES-GCM is ridiculously fast on CPUs with AES-NI.

Stuff I left out

  • Triple DES ciphers: cause they’re slow, and caring about IE6 isn’t popular any more
  • AES crypto without AES-NI on modern Intel CPUs: this is perhaps only useful for those running inside VMs that don’t pass on the capability to guests, but I cbf testing this
  • Ads throughout the post
