Use AQUA with Amazon Redshift RA3.xlplus nodes

Amazon Redshift RA3 is the latest generation node type that lets you scale compute and storage in your data warehouses independently. The RA3 node family includes RA3.16xlarge, RA3.4xlarge, and RA3.xlplus nodes for large, medium, and small workloads, respectively. RA3.xlplus, the newest member of the RA3 node family, offers one third of the computing power of RA3.4xlarge at one third of the price. RA3.xlplus is the smallest node in the RA3 family, but it offers the same advanced functionality. It is widely used in environments with light computing demand, such as QA, data analytics for small teams, or processing smaller datasets.

In 2021, Amazon Redshift launched AQUA (Advanced Query Accelerator) for Amazon Redshift to boost the performance of analytical queries that scan, filter, and aggregate large datasets. AQUA uses AWS-designed processors with AWS Nitro chips adapted to speed up data encryption and compression, and custom analytical processors implemented in FPGAs to accelerate applications requiring text search of a very large dataset, such as marketing and personalization.

Customers have asked us to support AQUA for RA3.xlplus, and we recently launched AQUA for RA3.xlplus nodes. In this post, we continue to build on the post AQUA (Advanced Query Accelerator) – A Speed Boost for Your Amazon Redshift Queries and show that with AQUA support, RA3.xlplus provides the same benefit as the currently supported RA3 nodes in the following areas:

  • Automatically boosting certain types of queries
  • Reducing the impact on your Amazon Redshift cluster by offloading certain queries that scan, filter, and aggregate large datasets to AQUA

Test environment

To test AQUA for RA3.xlplus, we started by creating an RA3.xlplus cluster with the following details:

  • Amazon Redshift cluster – 2-node RA3.xlplus
  • Dataset – 3 TB TPC-DS, 3 TB TPC-H
  • Query set – Sample queries based on the TPC-H and TPC-DS workloads

Sample queries

To test AQUA, we created six text search queries that scan, filter, and aggregate the lineitem table in the TPC-H dataset, which has 18 billion rows, with a WHERE clause predicate against the l_comment column.

The following table summarizes our table definition.

| table | encoded | diststyle | sortkey1 | rows |
|---|---|---|---|---|
| lineitem | Y | KEY | l_shipdate | 18,000,048,306 |

We randomly generated a query set with queries of varying complexity. The queries are designed to measure scan cost, which is an area of focus for AQUA. Each query has a predicate with LIKE and OR. The number of LIKE and OR predicates gets progressively higher to simulate complex workloads.

For example, Query 1 has one OR predicate:

SELECT COUNT(l_orderkey)
FROM lineitem
WHERE (l_comment LIKE '%throughout%') OR (l_comment LIKE '%courageous,%');

In contrast, Query 4 has 50 OR predicates:

SELECT COUNT(l_orderkey)
  FROM lineitem
  WHERE (l_comment LIKE '%outsi%') OR
  (l_comment LIKE '%uthless%') OR
  (l_comment LIKE '%capades%') OR
  (l_comment LIKE '%horses%') OR
  (l_comment LIKE '%ornis%' AND l_comment LIKE '%phins?%') OR
  (l_comment LIKE '%affix%') OR
  (l_comment LIKE '%integrat%') OR
....
  (l_comment LIKE '%ithin%' AND l_comment LIKE '%quiet%') OR
  (l_comment LIKE '%taphs%') OR
  (l_comment LIKE '%dugouts%' AND l_comment LIKE '%ches%') OR
  (l_comment LIKE '%telets%' AND l_comment LIKE '%detect!%') OR
  (l_comment LIKE '%develop%') OR
  (l_comment LIKE '%promise!%') OR
  (l_comment LIKE '%was%') OR
  (l_comment LIKE '%accounts%') OR
  (l_comment LIKE '%idly%' AND l_comment LIKE '%deposits%') OR
  (l_comment LIKE '%combine!%' AND l_comment LIKE '%rely%') OR
  (l_comment LIKE '%ins%' AND l_comment LIKE '%makes use of!%') OR
  (l_comment LIKE '%epitaphs!%' AND l_comment LIKE '%breac%') OR
  (l_comment LIKE '%pliers%' AND l_comment LIKE '%phins%') OR
  (l_comment LIKE '%hogs%' AND l_comment LIKE '%sentiments%') OR
  (l_comment LIKE '%ctions%' AND l_comment LIKE '%daringly%') OR
  (l_comment LIKE '%ies%' AND l_comment LIKE '%esias%');

The following table summarizes the complexity of each query.

| Query Number | Number of OR | Number of LIKE |
|---|---|---|
| Query 1 | 1 | 2 |
| Query 2 | 5 | 7 |
| Query 3 | 10 | 12 |
| Query 4 | 50 | 66 |
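
The post does not show how the query set was generated, so the following is only a minimal Python sketch of one way to assemble such a query and count its operators. The helper names `build_text_scan_query` and `complexity` are hypothetical, and the sketch assumes the search terms contain no single quotes that would need escaping.

```python
def build_text_scan_query(groups, table="lineitem", column="l_comment"):
    """Build a COUNT query whose WHERE clause ORs together groups of LIKE terms.

    Each group is a list of substrings; terms inside one group are ANDed.
    Assumption: terms contain no single quotes, so no escaping is done here.
    """
    clauses = []
    for group in groups:
        likes = [f"{column} LIKE '%{term}%'" for term in group]
        clauses.append("(" + " AND ".join(likes) + ")")
    where = " OR\n  ".join(clauses)
    return f"SELECT COUNT(l_orderkey)\nFROM {table}\nWHERE {where};"


def complexity(groups):
    """Return (number of OR operators, number of LIKE predicates)."""
    return len(groups) - 1, sum(len(group) for group in groups)


# The two single-term groups that make up Query 1 as printed in this post.
query_1_groups = [["throughout"], ["courageous,"]]
query_1_sql = build_text_scan_query(query_1_groups)
print(query_1_sql)
print(complexity(query_1_groups))
```

For the two terms of Query 1, `complexity` returns one OR and two LIKE predicates, which matches the first row of the table. A group with two terms, such as `["ornis", "phins?"]`, adds one OR and two LIKE predicates, which is why Query 4 has more LIKE predicates than OR operators.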

Scan performance improvement with AQUA

We ran the four queries sequentially without any other workload on the system. With AQUA, the queries ran roughly 7–13 times faster, as summarized in the following table.

| Query Number | Amazon Redshift with AQUA (seconds) | Amazon Redshift Only (seconds) | Improvement |
|---|---|---|---|
| Query 1 | 78.53 | 635.89 | 709.74% |
| Query 2 | 92.75 | 810.04 | 773.36% |
| Query 3 | 130.68 | 956.83 | 632.19% |
| Query 4 | 137.68 | 1950.9 | 1316.98% |
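
The post does not state the formula behind the Improvement column. The figures are consistent with the time saved divided by the AQUA runtime, so the Python sketch below uses that formula as an assumption. The function name `improvement_pct` is illustrative only.

```python
def improvement_pct(with_aqua_s, redshift_only_s):
    """Time saved relative to the AQUA runtime, as a percentage.

    Assumption: this is the formula behind the Improvement column; it
    reproduces the published figures from the rounded timings.
    """
    return (redshift_only_s - with_aqua_s) / with_aqua_s * 100


# (with AQUA, Amazon Redshift only) runtimes in seconds, from the table above.
timings = {
    "Query 1": (78.53, 635.89),
    "Query 2": (92.75, 810.04),
    "Query 3": (130.68, 956.83),
    "Query 4": (137.68, 1950.9),
}

for name, (with_aqua, redshift_only) in timings.items():
    pct = improvement_pct(with_aqua, redshift_only)
    speedup = redshift_only / with_aqua
    print(f"{name}: {pct:.2f}% improvement, {speedup:.1f}x runtime ratio")
```

Under this definition, an improvement of 709.74% means the query without AQUA takes about 8.1 times as long as the query with AQUA.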

AQUA impact on multiple workloads

In this environment, we simulated a multi-user workload using TPC-DS queries on the Amazon Redshift cluster. We recorded query runtime for three scenarios:

  • Baseline – We measured the end-to-end runtime running all TPC-DS queries serially on the Amazon Redshift cluster. In this scenario, AQUA was off and no additional workload was run (a single user was on the cluster).
  • Baseline with additional workload – This was the same as the baseline scenario with an additional workload run in parallel. We simulated a user load by running text scan queries randomly chosen from Query 1, Query 2, and Query 3. These queries have relatively short runtimes. We had two variations of this scenario:
    • AQUA turned off
    • AQUA turned on

From the results, we observed the following:

  • With AQUA turned on for all workloads, the impact of a text scan query on the baseline runtime was negligible.
  • Without AQUA, the baseline runtime was impacted by the additional workload created with text scan queries. In our case, overhead was about 31%.

| | Baseline | Baseline with additional workload (AQUA turned off) | Baseline with additional workload (AQUA turned on) | Improvement with AQUA |
|---|---|---|---|---|
| TPC-DS End-to-End Time | 3:43:35 | 4:54:50 | 3:44:36 | 31.27% |
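
To check the arithmetic behind these end-to-end times, the short Python sketch below converts each h:mm:ss value to seconds. It assumes the improvement figure uses the same definition as the scan tables (time saved divided by the AQUA runtime). The variable and function names are illustrative only.

```python
def to_seconds(hms):
    """Convert an h:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# End-to-end TPC-DS times from the table above.
baseline = to_seconds("3:43:35")
aqua_off = to_seconds("4:54:50")
aqua_on = to_seconds("3:44:36")

# Assumed definition: time saved by AQUA relative to the AQUA-on runtime.
improvement = (aqua_off - aqua_on) / aqua_on * 100

# Overhead of the extra text scan workload on the baseline when AQUA is on.
aqua_on_overhead = (aqua_on - baseline) / baseline * 100

print(f"Improvement with AQUA: {improvement:.2f}%")
print(f"Overhead with AQUA on: {aqua_on_overhead:.2f}%")
```

With AQUA on, the additional workload adds about one minute to a run of nearly four hours, which is what the post describes as a negligible impact.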

Single-node RA3.xlplus support

AQUA also supports the recently launched Amazon Redshift single-node RA3.xlplus. In a single-node configuration, resources are shared among all Amazon Redshift operations, which are traditionally handled separately by a leader node and compute nodes. A single-node configuration is typically used in a personal or small group environment for data exploration.

We ran the same set of queries as before, using Query 1, Query 2, and Query 3. The results demonstrated that AQUA provides a similar level of acceleration for these queries in a single-node environment.

| Query Number | Amazon Redshift with AQUA (seconds) | Amazon Redshift Only (seconds) | Improvement |
|---|---|---|---|
| Query 1 | 157.91 | 1,254.03 | 694.13% |
| Query 2 | 193.64 | 2,037.79 | 952.36% |
| Query 3 | 260.75 | 2,495.85 | 857.19% |

Summary

In this post, we ran a set of simulated performance tests on the Amazon Redshift RA3.xlplus platform with AQUA. With AQUA on, RA3.xlplus provides the same benefit as previously supported platforms. It provides a query scan performance boost with AQUA-supported operators, which will expand over time. It can reduce the performance impact on your existing workload by offloading the scan to AQUA.

We invite you to share your comments and use cases with the Amazon Redshift AQUA team.

For more information about how AQUA accelerates Amazon Redshift, see AQUA (Advanced Query Accelerator) for Amazon Redshift.

For more information about queries accelerated by AQUA, see When does Amazon Redshift use AQUA to run queries?


About the Authors

Quan Li is a Senior Database Engineer at Amazon Redshift. His focus is enabling customers to deliver maximum business value. Quan is passionate about optimizing high-performance analytical databases. In his spare time, he enjoys traveling and experiencing different types of cuisines with his family.

Steffen Rochel is a Sr. Software Development Manager at AWS. He is focused on data analytics acceleration. He has expertise in hardware-software design and operation of large-scale, high-performance distributed systems.
