Home Big Data Saved process enhancements in Amazon Redshift

Saved process enhancements in Amazon Redshift

0
Saved process enhancements in Amazon Redshift

[ad_1]

Amazon Redshift is a completely managed, petabyte-scale knowledge warehouse service within the cloud. With Amazon Redshift, you may analyze all of your knowledge to derive holistic insights about your online business and your prospects. It helps saved procedures the place ready SQL code is saved and the code might be reused again and again.

Saved procedures are generally used to encapsulate logic for knowledge transformation, knowledge validation, and business-specific logic. By combining a number of SQL steps right into a saved process, you may create reusable code blocks that may run collectively as a single transaction or a number of particular person transactions. You too can schedule saved procedures to automate knowledge processing on Amazon Redshift. For extra info, check with Bringing your saved procedures to Amazon Redshift.

Within the Redshift saved process default atomic transaction mode, a name to a Redshift saved process will create its personal transaction when the decision begins or is a part of the present transaction if an express transaction is opened earlier than the saved process is named. All of the statements inside a process behave as if they’re in a single transaction block that ends when the saved process name finishes. A nested name to a different process is handled like some other SQL assertion and operates inside the context of the identical transaction because the caller. Statements for TRUNCATE, COMMIT, and ROLLBACK and the exception dealing with block with arbitrary SQL statements shut the present transaction and begin a brand new transaction implicitly. This habits may cause challenges in migration to Amazon Redshift from different methods like Teradata.

On this put up, we talk about the enhancements to Amazon Redshift saved procedures for non-atomic transaction mode. This mode supplies enhanced transaction controls that allow you to robotically commit the statements contained in the saved process.

Non-atomic transaction mode

The brand new non-atomic transaction mode characteristic supplies three enhancements on saved procedures in Amazon Redshift:

  • Except the DML or DDL statements are a part of an express open transaction, every assertion within the saved process will run in its personal implicit transaction and a brand new transaction will likely be opened to deal with following statements. If an express transaction is opened, then all subsequent statements are run and stay un-committed till an express transaction management command (COMMIT or ROLLBACK) is run to finish the transaction.
  • Amazon Redshift is not going to re-raise the exception after the exception handler statements are full. Due to this fact, a brand new RAISE assertion with none INFO or EXCEPTION has been supplied to re-throw the exception caught by the exception dealing with block. This RAISE assertion with none INFO or EXCEPTION will solely be allowed within the exception dealing with block.
  • Additionally, the brand new START TRANSACTION assertion begins an express transaction contained in the non-atomic transaction mode saved process. Use the present transaction management command (COMMIT or ROLLBACK) to finish the explicitly began transaction.
    • Amazon Redshift doesn’t assist sub-transactions so if there may be already an open transaction, then calling this assertion once more will do nothing, and no error is raised.
    • If an express transaction continues to be open when the nonatomic transaction mode saved process name ends, then the specific transaction stays open till a transaction management command is run within the session.
    • If the session disconnects earlier than working a transaction management command, the entire transaction is robotically rolled again.

Extra restrictions

Some restrictions have additionally been launched for Redshift saved procedures:

  • For nesting saved process calls, all of the procedures have to be created in the identical transaction mode, irrespective of if it’s in atomic (default) transaction mode or the brand new non-atomic transaction mode
  • You’ll be able to’t nest saved procedures throughout the 2 transaction modes (atomic and non-atomic)
  • You’ll be able to’t set the SECURITY DEFINER possibility or SET configuration_parameter possibility for non-atomic transaction mode saved procedures

Impression to cursors

Cursors in non-atomic transaction mode saved procedures will behave in another way in comparison with the default atomic transaction mode:

  • Cursor statements will want an express transaction block earlier than starting the cursor to make sure that every iteration of the cursor loop will not be auto-committed.
  • To return a cursor from non-atomic transaction mode saved process, you will want an express transaction block earlier than starting the cursor. In any other case, the cursor will likely be closed when the SQL assertion contained in the loop is robotically dedicated.

Benefits

The next are key benefits of this characteristic from a person perspective:

  • It supplies the potential to carry and shift Teradata saved procedures to run in Teradata session mode. This helps in seamless migrations from knowledge warehouses like Teradata and SQL Server.
  • It allows Amazon Redshift to supply extra versatile operations within saved procedures when encountering errors and exceptions. Amazon Redshift can now protect earlier motion’s state earlier than reaching an exception.

Syntax

The brand new non-obligatory key phrase NONATOMIC has been added to the saved process definition syntax, as proven within the following code:

CREATE [ OR REPLACE ] PROCEDURE sp_procedure_name
( [ [ argname ] [ argmode ] argtype [, ...] ] )
[ NONATOMIC ]
AS $$
procedure_body
$$ LANGUAGE plpgsql

This non-obligatory key phrase creates the saved process beneath the non-atomic transaction mode. When you don’t specify the key phrase, then the default atomic mode would be the transaction mode when creating the saved process.

NONATOMIC means every DML and DDL assertion within the process will likely be implicitly dedicated.

With out non-atomic mode, the process will create its personal transaction when the decision begins or be a part of the present transaction if an express transaction is opened earlier than it’s known as. Each assertion inside the saved process will belong to this one transaction.

Instance of NONATOMIC mode

Let’s take into account the client contact desk custcontacts, which shops buyer main and secondary contact cellphone numbers:

CREATE desk custcontacts(
custid int4 not null,
primaryphone char(10),
secondaryphone char(10));

We insert three pattern buyer information with no contact values:

INSERT INTO custcontacts VALUES (101, 'xxxxxxxxxx', 'xxxxxxxxxx');
INSERT INTO custcontacts VALUES (102, 'xxxxxxxxxx', 'xxxxxxxxxx');
INSERT INTO custcontacts VALUES (103, 'xxxxxxxxxx', 'xxxxxxxxxx');

It is advisable to create a saved process to replace the first and secondary cellphone numbers. The requirement is to not roll again updates to the first contact quantity if updates to the secondary contact quantity fail for some cause.

You’ll be able to obtain this by creating the saved process with the NONATOMIC key phrase. The NONATOMIC key phrase ensures that every assertion within the saved process runs in its personal implicit transaction block. Due to this fact, if the UPDATE assertion for the secondary cellphone fails, then it received’t roll again the info replace made to the first cellphone. See the next code:

CREATE PROCEDURE sp_update_custcontacts(cid int4,pphone char(15),sphone char(15)) NONATOMIC AS
$$
BEGIN
UPDATE custcontacts SET primaryphone=pphone WHERE custid=cid;
UPDATE custcontacts SET secondaryphone=sphone WHERE custid=cid;
END;
$$
LANGUAGE plpgsql;

Now let’s name the saved process passing the secondary cellphone quantity with greater than 10 digits, which can fail within the secondaryphone UPDATE assertion as a result of incorrect size:

name sp_update_custcontacts(101,'1234567890','345443345324');

The previous process name will replace the first cellphone quantity efficiently. The secondary cellphone quantity replace fails. Nevertheless, the primaryphone replace is not going to roll again as a result of it ran in its personal implicit transaction block as a result of NONATOMIC clause within the saved process definition.

choose * from custcontacts;

custcontacts | primaryphone | secondaryphone
-------------+---------------+---------------
101 | 1234567890 | XXXXXXXXXX
102 | XXXXXXXXXX | XXXXXXXXXX
103 | XXXXXXXXXX | XXXXXXXXXX

Exception dealing with in NONATOMIC mode

Exceptions are dealt with in saved procedures in another way based mostly on the atomic or non-atomic mode:

  • Atomic (default) – Exceptions are at all times re-raised
  • Non-atomic – Exceptions are dealt with and you may select to re-raise or not

Let’s proceed with the earlier instance for example exception dealing with in non-atomic mode.

Create the next desk to log exceptions raised by saved procedures:

CREATE TABLE procedure_log
(log_timestamp timestamp, procedure_name varchar(100), error_message varchar(255));

Now replace the sp_update_custcontacts() process to deal with exceptions. Notice that we’re including an EXCEPTION block within the process definition. It inserts a report within the procedure_log desk within the occasion of an exception.

CREATE OR REPLACE PROCEDURE sp_update_custcontacts(cid int4,pphone char(15),sphone char(15)) NONATOMIC AS
$$
BEGIN
UPDATE custcontacts SET primaryphone=pphone WHERE custid=cid;
UPDATE custcontacts SET secondaryphone=sphone WHERE custid=cid;
EXCEPTION
WHEN OTHERS THEN
INSERT INTO procedure_log VALUES (getdate(), 'sp_update_custcontacts', sqlerrm);
END;
$$
LANGUAGE plpgsql;

Now create another saved process, which can name the previous process. It additionally has an EXCEPTION block and inserts a report within the procedure_log desk within the occasion of an exception.

CREATE PROCEDURE sp_update_customer() NONATOMIC AS
$$
BEGIN
-- Allow us to assume you could have further staments right here to replace different fields. For this instance, ommitted them for simplifiction.
-- Nested name to replace contacts
name sp_update_custcontacts(101,'1234567890','345443345324');
EXCEPTION
WHEN OTHERS THEN
INSERT INTO procedure_log VALUES (getdate(), 'sp_update_customer', sqlerrm);
END;
$$
LANGUAGE plpgsql;

Let’s name the mother or father process we created:

name sp_update_customer();

This in flip will name the sp_update_custcontacts() process. The interior process sp_update_custcontacts() will fail as a result of we’re updating the secondary cellphone with an invalid worth. The management will enter the EXCEPTION block of the sp_update_custcontacts() process and make an insert into the procedure_log desk.

Nevertheless, it is not going to re-raise the exception in non-atomic mode. Due to this fact, the mother or father process sp_update_customer() is not going to get the exception handed from the sp_update_custcontacts() process. The management is not going to enter the EXCEPTION block of the sp_update_customer() process.

When you question the procedure_log desk, you will note an entry just for the error dealt with by the sp_update_custcontacts() process:

choose * from procedure_log;

Procedure Log Output

Now redefine the sp_update_custcontacts() process with the RAISE assertion:

CREATE PROCEDURE sp_update_custcontacts(cid int4,pphone char(15),sphone char(15)) NONATOMIC AS
$$
BEGIN
UPDATE custcontacts SET primaryphone=pphone WHERE custid=cid;
UPDATE custcontacts SET secondaryphone=sphone WHERE custid=cid;
EXCEPTION
WHEN OTHERS THEN
INSERT INTO procedure_log VALUES (getdate(), 'sp_update_custcontacts', sqlerrm);
RAISE;
END;
$$
LANGUAGE plpgsql;

Let’s name the mother or father saved process sp_update_customer() once more:

name sp_update_customer();

Now the interior process sp_update_custcontacts() will re-raise the exception to the mother or father process sp_update_customer() after dealing with the exception in its personal EXCEPTION block. Then the management will attain the EXCEPTION block within the mother or father process and insert one other report into the procedure_log desk.

When you question the procedure_log desk now, you will note two entries: one by the interior process sp_update_custcontacts() and one other by the mother or father process sp_update_customer(). This demonstrates that the RAISE assertion within the interior process re-raised the exception.

choose * from procedure_log;

Procedure log output

Express START TRANSACTION assertion in non-atomic mode

You’ll be able to situation a START TRANSACTION assertion to start a transaction block contained in the saved process. It should open a brand new transaction contained in the saved process. For examples, check with Nonatomic mode saved process transaction administration.

Conclusion

On this put up, we mentioned the enhancements to Redshift saved procedures for non-atomic transaction mode, which supplies enhanced transaction controls to allow you to robotically commit the statements contained in the saved process. This mode additionally allows simpler migration to Amazon Redshift from different methods like Teradata. Check out these enhancements and tell us your expertise in feedback.


In regards to the Authors

Milind Oke is a Knowledge Warehouse Specialist Options Architect based mostly out of New York. He has been constructing knowledge warehouse options for over 15 years and makes a speciality of Amazon Redshift.

Satesh Sonti is a Sr. Analytics Specialist Options Architect based mostly out of Atlanta, specialised in constructing enterprise knowledge platforms, knowledge warehousing, and analytics options. He has over 17 years of expertise in constructing knowledge belongings and main complicated knowledge platform packages for banking and insurance coverage shoppers throughout the globe.

Kiran Chinta is a Software program Improvement Supervisor at Amazon Redshift. He leads a robust group in question processing, SQL language, knowledge safety, and efficiency. Kiran is captivated with delivering merchandise that seamlessly combine with prospects’ enterprise purposes with the suitable ease of use and efficiency. In his spare time, he enjoys studying and enjoying tennis.

Huichen Liu is a software program growth engineer on the Amazon Redshift question processing group. She focuses on question optimization, statistics and SQL language options. In her spare time, she enjoys climbing and images.

[ad_2]