In this post, we explore how to use the AWS Glue native connector for Teradata Vantage to streamline data integrations and unlock the full potential of your data.
Businesses often rely on Amazon Simple Storage Service (Amazon S3) for storing large amounts of data from various data sources in a cost-effective and secure manner. For those using Teradata for data analysis, integrations through the AWS Glue native connector for Teradata Vantage unlock new possibilities. AWS Glue enhances the flexibility and efficiency of data management, allowing companies to seamlessly integrate their data, regardless of its location, with Teradata's analytical capabilities. This new connector eliminates technical hurdles related to configuration, security, and management, enabling companies to effortlessly export or import their datasets into Teradata Vantage. As a result, businesses can focus more on extracting meaningful insights from their data, rather than dealing with the intricacies of data integration.
AWS Glue is a serverless data integration service that makes it straightforward for analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. With AWS Glue, you can discover and connect to more than 100 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.
Teradata Corporation is a leading connected multi-cloud data platform for enterprise analytics, focused on helping companies use all their data across an enterprise, at scale. As an AWS Data & Analytics Competency partner, Teradata offers a complete cloud analytics and data platform, including for machine learning.
Introducing the AWS Glue native connector for Teradata Vantage
AWS Glue provides support for Teradata, accessible through both AWS Glue Studio and AWS Glue ETL scripts. With AWS Glue Studio, you benefit from a visual interface that simplifies the process of connecting to Teradata and authoring, running, and monitoring AWS Glue ETL jobs. For data developers, this support extends to AWS Glue ETL scripts, where you can use Python or Scala to create and manage more specific data integration and transformation tasks.
The AWS Glue native connector for Teradata Vantage lets you efficiently read and write data from Teradata without the need to install or manage any connector libraries. You can add Teradata as both the source and target within AWS Glue Studio's no-code, drag-and-drop visual interface, or use the connector directly in an AWS Glue ETL script job.
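For the script route, reading a Teradata table looks roughly like the following PySpark fragment. The connection name and table are placeholders, and the exact connection option keys shown here are an assumption to verify against the AWS Glue documentation for your Glue version.

```python
# Fragment of a Glue ETL script (Python): read a Teradata table via the native connector.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

teradata_dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="teradata",
    connection_options={
        "connectionName": "teradata_connection",  # placeholder Glue connection name
        "dbtable": "my_database.my_table",        # placeholder schema.table
    },
)
teradata_dyf.printSchema()
```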
Solution overview
In this example, you use AWS Glue Studio to enrich and upload data stored on Amazon S3 to Teradata Vantage. You start by joining the Event and Venue files from the TICKIT dataset. Next, you filter the results to a single geographic region. Finally, you upload the refined data to Teradata Vantage.
The TICKIT dataset tracks sales activity for the fictional TICKIT website, where users buy and sell tickets online for sporting events, shows, and concerts. In this dataset, analysts can identify ticket movement over time, success rates for sellers, and best-selling events, venues, and seasons.
For this example, you use AWS Glue Studio to develop a visual ETL pipeline. This pipeline reads data from Amazon S3, performs transformations, and then loads the transformed data into Teradata. The following diagram illustrates this architecture.
By the end of this post, your visual ETL job will resemble the following screenshot.
Prerequisites
For this example, you should have access to an existing Teradata database endpoint with network reachability from AWS and permissions to create tables and to load and query data.
AWS Glue needs network access to Teradata to read or write data. How this is configured depends on where your Teradata is deployed and your specific network configuration. For Teradata deployed on AWS, you might need to configure VPC peering or AWS PrivateLink, security groups, and network access control lists (NACLs) to allow AWS Glue to communicate with Teradata over TCP. If Teradata is outside AWS, networking services such as AWS Site-to-Site VPN or AWS Direct Connect may be required. Public internet access is not recommended due to security risks. If you choose public access, it's safer to run the AWS Glue job in a VPC behind a NAT gateway. This approach lets you allow list just one IP address for incoming traffic on your network firewall. For more information, refer to Infrastructure security in AWS Glue.
Set up Amazon S3
Every object in Amazon S3 is stored in a bucket. Before you can store data in Amazon S3, you must create an S3 bucket to hold the results. Complete the following steps (or use the boto3 sketch after this list to script them):
- On the Amazon S3 console, choose Buckets in the navigation pane.
- Choose Create bucket.
- For Name, enter a globally unique name for your bucket; for example, tickit8530923.
- Choose Create bucket.
- Download the TICKIT dataset and unzip it.
- Create the folder tickit in your S3 bucket and upload the allevents_pipe.txt and venue_pipe.txt files.
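If you prefer to automate these steps, the following boto3 sketch creates the bucket and uploads the two files. The bucket name, Region, and local file paths are placeholders; replace them with your own values.

```python
# Hypothetical helper: create the bucket and upload the TICKIT files with boto3.
import boto3

bucket_name = "tickit8530923"   # must be globally unique
region = "us-east-1"            # example Region

s3 = boto3.client("s3", region_name=region)

# CreateBucket needs a LocationConstraint outside us-east-1
if region == "us-east-1":
    s3.create_bucket(Bucket=bucket_name)
else:
    s3.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

# Upload the two pipe-delimited files into the tickit/ prefix
for file_name in ["venue_pipe.txt", "allevents_pipe.txt"]:
    s3.upload_file(file_name, bucket_name, f"tickit/{file_name}")
```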
Configure Teradata connections
To connect to Teradata from AWS Glue, see Configuring Teradata Connection.
You must create and store your Teradata credentials in an AWS Secrets Manager secret and then associate that secret with a Teradata AWS Glue connection. We discuss these two steps in more detail later in this post.
Create an IAM role for the AWS Glue ETL job
When you create the AWS Glue ETL job, you specify an AWS Identity and Access Management (IAM) role for the job to use. The role must grant access to all resources used by the job, including Amazon S3 (for any sources, targets, scripts, driver files, and temporary directories) and Secrets Manager. For instructions, see Configure an IAM role for your ETL job.
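As an illustration only, the following boto3 sketch creates such a role with a trust policy for AWS Glue, attaches the AWSGlueServiceRole managed policy, and adds an inline policy for the example bucket and secret. The role name, bucket, and secret ARN are placeholders, and your own policies may need to be broader or narrower.

```python
# Hypothetical sketch: create an IAM role for the Glue job with boto3.
import json
import boto3

iam = boto3.client("iam")

role_name = "GlueTeradataJobRole"          # placeholder role name
bucket_arn = "arn:aws:s3:::tickit8530923"  # placeholder bucket
secret_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-teradata-secret"  # placeholder

# Allow the AWS Glue service to assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
iam.create_role(RoleName=role_name, AssumeRolePolicyDocument=json.dumps(trust_policy))

# Baseline Glue permissions
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
)

# Access to the example bucket and the Teradata secret
inline_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
         "Resource": [bucket_arn, f"{bucket_arn}/*"]},
        {"Effect": "Allow",
         "Action": "secretsmanager:GetSecretValue",
         "Resource": secret_arn},
    ],
}
iam.put_role_policy(
    RoleName=role_name,
    PolicyName="GlueTeradataJobAccess",
    PolicyDocument=json.dumps(inline_policy),
)
```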
Create a table in Teradata
Using your preferred database tool, log in to Teradata and create the table where you will load your data.
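The exact DDL depends on your environment; the sketch below shows one plausible table definition, run here through the teradatasql Python driver (the same CREATE TABLE statement can be issued from any SQL client). The database name, table name, and data types are assumptions; the column names match those mapped later in the Glue job.

```python
# Hypothetical sketch: create the target table with the teradatasql driver
# (pip install teradatasql). Adjust names and types for your environment.
import teradatasql

create_table_sql = """
CREATE TABLE my_database.tickit_event_venue (
    venueid    INTEGER,
    venuename  VARCHAR(100),
    venuecity  VARCHAR(30),
    venuestate CHAR(2),
    venueseats INTEGER,
    eventid    INTEGER,
    catid      INTEGER,
    dateid     INTEGER,
    eventname  VARCHAR(200),
    starttime  TIMESTAMP
)
"""

with teradatasql.connect(
    host="url_of_teradata",            # your Teradata host
    user="teradata_user",              # your Teradata user name
    password="teradata_user_password", # your Teradata password
    database="my_database",            # hypothetical database name
) as con:
    with con.cursor() as cur:
        cur.execute(create_table_sql)
```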
Store Teradata login credentials
An AWS Glue connection is a Data Catalog object that stores login credentials, URI strings, and more. The Teradata connector requires Secrets Manager for storing the Teradata user name and password that you use to connect to Teradata.
To store the Teradata user name and password in Secrets Manager, complete the following steps (a boto3 alternative is sketched after the list):
- On the Secrets Manager console, choose Secrets in the navigation pane.
- Choose Store a new secret.
- Select Other type of secret.
- Enter the key/value USER and teradata_user, then choose Add row.
- Enter the key/value PASSWORD and teradata_user_password, then choose Next.
- For Secret name, enter a descriptive name, then choose Next.
- Choose Next to move to the review step, then choose Store.
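If you prefer the AWS SDK, the following boto3 sketch stores the same USER and PASSWORD keys as a secret. The secret name is a placeholder; the key names must be USER and PASSWORD so that the Glue Teradata connection can find them.

```python
# Hypothetical sketch: store the Teradata credentials in Secrets Manager with boto3.
import json
import boto3

secrets = boto3.client("secretsmanager")

secrets.create_secret(
    Name="teradata-vantage-credentials",  # placeholder secret name
    SecretString=json.dumps({
        "USER": "teradata_user",               # your Teradata user name
        "PASSWORD": "teradata_user_password",  # your Teradata password
    }),
)
```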
Create the Teradata connection in AWS Glue
Now you're ready to create an AWS Glue connection to Teradata. Complete the following steps:
- On the AWS Glue console, choose Connections under Data Catalog in the navigation pane.
- Choose Create connection.
- For Name, enter a name (for example, teradata_connection).
- For Connection type, choose Teradata.
- For Teradata URL, enter jdbc:teradata://url_of_teradata/database=name_of_your_database.
- For AWS Secret, choose the secret with your Teradata credentials that you created earlier.
Create an AWS Glue visual ETL job to transform and load data to Teradata
Complete the following steps to create your AWS Glue ETL job:
- On the AWS Glue console, under ETL jobs in the navigation pane, choose Visual ETL.
- Choose Visual ETL.
- Choose the pencil icon to enter a name for your job.
We add venue_pipe.txt as our first dataset.
- Choose Add nodes and choose Amazon S3 on the Sources tab.
- Enter the following data source properties:
  - For Name, enter Venue.
  - For S3 source type, select S3 location.
  - For S3 URL, enter the S3 path to venue_pipe.txt.
  - For Data format, choose CSV.
  - For Delimiter, choose Pipe.
  - Deselect First line of source file contains column headers.
Now we add allevents_pipe.txt as our second dataset.
- Choose Add nodes and choose Amazon S3 on the Sources tab.
- Enter the following data source properties:
  - For Name, enter Event.
  - For S3 source type, select S3 location.
  - For S3 URL, enter the S3 path to allevents_pipe.txt.
  - For Data format, choose CSV.
  - For Delimiter, choose Pipe.
  - Deselect First line of source file contains column headers.
Next, we rename the columns of the Venue dataset.
- Choose Add nodes and choose Change Schema on the Transforms tab.
- Enter the following transform properties:
  - For Name, enter Rename Venue data.
  - For Node parents, choose Venue.
  - In the Change Schema section, map the source keys to the target keys:
    - col0: venueid
    - col1: venuename
    - col2: venuecity
    - col3: venuestate
    - col4: venueseats
Now we filter the Venue dataset to a specific geographic region.
- Choose Add nodes and choose Filter on the Transforms tab.
- Enter the following transform properties:
  - For Name, enter Location Filter.
  - For Node parents, choose Rename Venue data.
  - For Filter condition, choose venuestate for Key, choose matches for Operation, and enter DC for Value.
Now we rename the columns in the Event dataset.
- Choose Add nodes and choose Change Schema on the Transforms tab.
- Enter the following transform properties:
  - For Name, enter Rename Event data.
  - For Node parents, choose Event.
  - In the Change Schema section, map the source keys to the target keys:
    - col0: eventid
    - col1: e_venueid
    - col2: catid
    - col3: dateid
    - col4: eventname
    - col5: starttime
Next, we join the Venue and Event datasets.
- Choose Add nodes and choose Join on the Transforms tab.
- Enter the following transform properties:
  - For Name, enter Join.
  - For Node parents, choose Location Filter and Rename Event data.
  - For Join type, choose Inner join.
  - For Join conditions, choose venueid for Location Filter and e_venueid for Rename Event data.
Now we drop the duplicate column.
- Choose Add nodes and choose Change Schema on the Transforms tab.
- Enter the following transform properties:
  - For Name, enter Drop column.
  - For Node parents, choose Join.
  - In the Change Schema section, select Drop for e_venueid.
Next, we load the data into the Teradata table.
- Choose Add nodes and choose Teradata on the Targets tab.
- Enter the following data sink properties:
  - For Name, enter Teradata.
  - For Node parents, choose Drop column.
  - For Teradata connection, choose teradata_connection.
  - For Table name, enter the schema.tablename of the table you created in Teradata.
Finally, we run the job and load the data into Teradata.
- Choose Save, then choose Run.
A banner will display that the job has started.
- Choose Runs, which displays the status of the job.
The run status will change to Succeeded when the job is complete.
- Connect to your Teradata instance and query the table that the data was loaded into.
The filtered and joined data from the two datasets will be in the table.
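For readers who prefer to work in code rather than the visual editor, here is a rough sketch of an equivalent AWS Glue ETL script in Python. The bucket name, connection name, and target table are the example values used above; the exact connection option keys for the Teradata sink are an assumption, so verify them against the AWS Glue documentation for your Glue version.

```python
# Sketch of a Glue ETL script equivalent to the visual job above (illustrative only).
import sys
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

bucket = "tickit8530923"  # example bucket from earlier

# Read the pipe-delimited files from Amazon S3 (no header row) and rename the columns
venue = (spark.read.option("sep", "|").option("header", "false")
         .csv(f"s3://{bucket}/tickit/venue_pipe.txt")
         .toDF("venueid", "venuename", "venuecity", "venuestate", "venueseats"))

event = (spark.read.option("sep", "|").option("header", "false")
         .csv(f"s3://{bucket}/tickit/allevents_pipe.txt")
         .toDF("eventid", "e_venueid", "catid", "dateid", "eventname", "starttime"))

# Filter venues to a single geographic region, join with events, and drop the duplicate key
result = (venue.filter(F.col("venuestate") == "DC")
          .join(event, venue["venueid"] == event["e_venueid"], "inner")
          .drop("e_venueid"))

# Write the result to Teradata through the native connector
result_dyf = DynamicFrame.fromDF(result, glue_context, "result_dyf")
glue_context.write_dynamic_frame.from_options(
    frame=result_dyf,
    connection_type="teradata",
    connection_options={
        "connectionName": "teradata_connection",      # the Glue connection created earlier
        "dbtable": "my_database.tickit_event_venue",  # hypothetical schema.tablename
    },
)

job.commit()
```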
Clean up
To avoid incurring additional charges for resources created as part of this post, make sure you delete the items you created in the AWS account for this post:
- The Secrets Manager secret created for the Teradata credentials
- The AWS Glue connection for Teradata Vantage
- The data loaded in the S3 bucket
- The AWS Glue visual ETL job
Conclusion
In this post, you created a connection to Teradata using AWS Glue and then created an AWS Glue job to transform and load data into Teradata. The AWS Glue native connector for Teradata Vantage empowers your data analytics journey by providing a seamless and efficient pathway for integrating your data with Teradata. This new capability in AWS Glue not only simplifies your data integration workflows but also opens up new avenues for advanced analytics, business intelligence, and machine learning innovations.
With the AWS Glue Teradata connector, you have a powerful tool at your disposal for simplifying data integration tasks. Whether you're looking to load Amazon S3 data into Teradata for analytics, reporting, or business insights, this new connector streamlines the process, making it more accessible and cost-effective.
To get started with AWS Glue, refer to Getting Started with AWS Glue.
About the Authors
Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect and AWS Glue expert. He's on a mission to make life easier for customers who are facing complex data integration challenges. His secret weapon? Fully managed, low-code AWS services that can get the job done with minimal effort and no coding. Follow Kamen on LinkedIn to keep up to date with the latest AWS Glue news!
Sean Bjurstrom is a Technical Account Manager in ISV accounts at Amazon Web Services, where he specializes in analytics technologies and draws on his background in consulting to support customers on their analytics and cloud journeys. Sean is passionate about helping businesses harness the power of data to drive innovation and growth. Outside of work, he enjoys running and has participated in several marathons.
Vinod Jayendra is an Enterprise Support Lead in ISV accounts at Amazon Web Services, where he helps customers solve their architectural, operational, and cost-optimization challenges. With a particular focus on serverless technologies, he draws from his extensive background in application development to help customers build top-tier solutions. Beyond work, he finds joy in quality family time, embarking on biking adventures, and coaching youth sports teams.
Doug Mbaya is a Senior Partner Solutions Architect with a focus on analytics and machine learning. Doug works closely with AWS Partners and helps them integrate their solutions with AWS analytics and machine learning services in the cloud.