AWS Graviton Performance Test

Published on 19 April 2022

The Graviton is AWS's own line of processors, but people can be reluctant to move to a new processor platform. There are many solid reasons for this, including not knowing how it will perform, or how easy it is to use. So, to get some experience of my own, I wanted to throw together a quick test using a couple of servers.

Preparation

I wanted to test a couple of servers internally (using their private IP addresses, rather than public). This was because, although I would be testing at fairly low levels in the grand scheme of things, I didn't want to risk any additional complication by hitting these servers externally. A relatively easy way to do load testing is to install a tool called apachebench, and use that to connect to each server in turn. I also wanted to be able to connect to each instance, so when creating them I used an IAM role that has the correct SSM permissions. There is a managed IAM policy called AmazonEC2RoleforSSM that you can attach to the role that you attach to the instances.
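
If it helps, a rough AWS CLI sketch of that setup looks something like the following; the role and instance profile names (ssm-test-role, ssm-test-profile) are just placeholders I have made up for illustration:

    # Create a role that EC2 can assume, attach the SSM policy, and wrap it in an instance profile
    aws iam create-role --role-name ssm-test-role \
        --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    aws iam attach-role-policy --role-name ssm-test-role \
        --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
    aws iam create-instance-profile --instance-profile-name ssm-test-profile
    aws iam add-role-to-instance-profile --instance-profile-name ssm-test-profile \
        --role-name ssm-test-role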

Once that is done, you can install apachebench on the load-testing instance and use it for some simple load testing.

To install apachebench, you can use yum, for example via userdata:

    #!/bin/bash
    yum update -y
    yum install httpd-tools -y

Or you can just install it when you connect via SSM.
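
For example (the instance ID here is a placeholder, and this assumes the Session Manager plugin is installed on your machine):

    # Open a shell on the instance via SSM (instance ID is a placeholder)
    aws ssm start-session --target i-0123456789abcdef0

    # Then, on the instance itself, install the httpd-tools package that provides ab
    sudo yum install httpd-tools -y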

You also need a couple of test servers. I needed two Arm servers, so that I could compare the original Graviton against the newer Graviton2 processor. To do this, I launched two instances using the Amazon Linux 2 Arm AMI (Amazon Linux 2 LTS Arm64 Kernel 5.10 AMI 2.0.20220606.1 arm64 HVM gp2), and one on x86 (Amazon Linux 2 Kernel 5.10 AMI 2.0.20220606.1 x86_64 HVM gp2) that I used to run the load test from.

For each server to be tested, I used userdata to install Apache so I could get the test page:

    #!/bin/bash
    yum update -y
    yum install httpd -y
    service httpd start
    chkconfig httpd on
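
Putting that together, launching one of the Arm test instances might look something like this; the AMI and subnet IDs are placeholders, and I am assuming the userdata above has been saved locally as userdata.sh:

    # All IDs below are placeholders - substitute your own AMI, subnet, etc.
    aws ec2 run-instances \
        --image-id ami-0123456789abcdef0 \
        --instance-type a1.medium \
        --iam-instance-profile Name=ssm-test-profile \
        --subnet-id subnet-0123456789abcdef0 \
        --user-data file://userdata.sh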

The Tests

Once Apache was installed and started, I was able to obtain the private IP for each instance (because I was testing internally, rather than from the Internet). I could then connect to the load-testing instance and run something like the following to check that I was getting a webpage successfully:

    sh-4.2$ curl http://172.31.9.252/
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
            <head>
                    <title>Test Page for the Apache HTTP Server</title>
                    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
                    <style type="text/css">
                            /*<![CDATA[*/
                            body {
                                    background-color: #fff;
    <truncated>

What I wanted to test for primarily was requests per second, but also the time within which 75% of requests were completed. I also wanted to use multiple concurrent connections, and a test that ran long enough to give a reasonable, if quick, result. So I decided to go for 250 concurrent connections, and run the test for a total of 500,000 requests.
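
With apachebench, that translates into a command along these lines, pointed at the private IP of whichever instance is being tested:

    # -n is the total number of requests, -c the number of concurrent connections
    ab -n 500000 -c 250 http://172.31.9.252/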

Then I had to decide which instances to test against. I chose the a1.medium (1 vCPU, 2 GiB RAM), as it is the smallest of the original Graviton instances that I could get (and so easier to put under stress), and then I chose a similar or cheaper t4g instance with the newer processor. That was the t4g.small, which is actually only about 64% of the price of the a1.medium, while having the same 2 GiB of RAM but 2 vCPUs. Despite the additional vCPU, it still represents a significant cost saving.

Ireland on-demand pricing (per hour)
a1.medium    $0.0288
a1.large     $0.0576
t4g.small    $0.0184
t3.small     $0.0228

Results

Given the t4g.small has the extra vCPU, I expected it to perform better, despite being cheaper, and it did.

t4g.small:

Requests per second:    3259.02 [#/sec] (mean)
Time per request:       0.307 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
50%     62
66%     66
75%     69

That seemed pretty good actually, compared to what I was expecting for the price, but I wanted to be sure, so I ran the same test on the a1.medium.

a1.medium:

Requests per second:    2821.06 [#/sec] (mean)
Time per request:       0.354 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
50%     83
66%     88
75%     90

That is quite a significant difference for an instance that is actually cheaper, given that they both run the same version of Linux and the same version of Apache. For an instance that is about 64% of the cost, you get a 16% increase in requests per second. I was curious to see how a larger a1 would stack up, so I also fired up an a1.large. That has 2 vCPUs and 4 GiB of RAM, so it should perform better, but it is also a lot more expensive, at over 3x the price of the t4g.small. So did the a1.large deliver close to 3x the performance of the t4g.small? No, not even close.

a1.large:

Requests per second:    4772.74 [#/sec] (mean)
Time per request:       0.210 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
50%     31
66%     32
75%     34

It did perform better, 46% better in terms of requests per second, but at more than 3x the price, meaning that the t4g range still represented much better bang for your buck.
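
To put some rough numbers on that, dividing the requests per second figures above by the hourly prices from the table gives a simple price-performance figure. This is just back-of-the-envelope arithmetic on the results in this post:

    # Requests per second per $/hour, using the results and Ireland prices above
    awk 'BEGIN {
        printf "t4g.small: %.0f req/s per $/hr\n", 3259.02 / 0.0184
        printf "a1.medium: %.0f req/s per $/hr\n", 2821.06 / 0.0288
        printf "a1.large:  %.0f req/s per $/hr\n", 4772.74 / 0.0576
    }'

That works out at roughly 177,000 requests per second per dollar-hour for the t4g.small, against around 98,000 for the a1.medium and 83,000 for the a1.large.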

So that was good to know, but not completely unexpected. There was still one question though, one elephant in the room: if you need to run Apache on Linux, and you must have performance, should you choose Arm, or is an x86 platform still better?

Fear not, I tested that too. The t3.small is more expensive than the t4g.small, but it is a closer price match than the t3.micro would have been, and it is still cheaper than the a1.medium. I was hoping the t4g.small would be faster, but didn't really expect it.

t3.small:

Requests per second:    5932.85 [#/sec] (mean)
Time per request:       42.138 [ms] (mean)

Percentage of the requests served within a certain time (ms)
50%     39
66%     42
75%     44

The t3.small is about 24% more expensive (put another way, the t4g.small is about 19% cheaper), so how much better was the performance? On this simple test, it delivered 82% more requests per second than the t4g.small. Still, this clearly shows that the Graviton line is getting more and more efficient, and the Graviton2 is a big step up from the original Graviton. Beyond that, there is now the Graviton3, which should be a step up again.

[Graph of the results above]

Conclusion

Of course, it isn't all about raw performance, or virtualisation would never have taken off like it did. It is about right-sizing and, more and more, environmental impact.

The fact that the Graviton processors run at a lower wattage and use less energy (being Arm based) should be a significant factor in any decision, over and above raw performance. The tests I did show that the Graviton2 (or Graviton3!) is certainly capable enough to be considered for many production workloads.
