
Use multiple bookmark keys in AWS Glue JDBC jobs

By admin | February 8, 2024 | 7 min read


AWS Glue is a serverless data integration service that you can use to catalog data and prepare it for analytics. With AWS Glue, you can discover your data, develop scripts to transform sources into targets, and schedule and run extract, transform, and load (ETL) jobs in a serverless environment. AWS Glue jobs are responsible for running the data processing logic.

One important feature of AWS Glue jobs is the ability to use bookmark keys to process data incrementally. When an AWS Glue job runs, it reads data from a data source and processes it. One or more columns from the source table can be specified as bookmark keys. Each column should have sequentially increasing or decreasing values without gaps. These values are used to mark the last processed record in a batch, and the next run of the job resumes from that point. This lets you process large amounts of data incrementally. Without job bookmark keys, AWS Glue jobs would have to reprocess all the data during every run, which can be time-consuming and costly. By using bookmark keys, AWS Glue jobs can resume processing from where they left off, saving time and reducing costs.
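
To make this concrete, the following is a minimal sketch of how multiple bookmark key columns are passed through the JDBC connection options of a Glue read. The endpoint, database, and credentials are placeholders, not values from this post's stack:

    # Minimal sketch: multiple bookmark key columns in Glue JDBC connection options.
    connection_options = {
        "url": "jdbc:postgresql://<rds-endpoint>:5432/<database>",  # placeholder
        "user": "<user>",
        "password": "<password>",
        "dbtable": "product",
        # Glue tracks the last processed combination of these columns:
        "jobBookmarkKeys": ["product_id", "version"],
        "jobBookmarkKeysSortOrder": "asc",
    }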

This post explains how to use multiple columns as job bookmark keys in an AWS Glue job with a JDBC connection to the source data store. It also demonstrates how to parameterize the bookmark key columns and table names in the AWS Glue job connection options.

This post is targeted toward architects and data engineers who design and build ETL pipelines on AWS. You are expected to have a basic understanding of the AWS Management Console, AWS Glue, Amazon Relational Database Service (Amazon RDS), and Amazon CloudWatch Logs.

Solution overview

To implement this solution, we complete the following steps:

1. Create an Amazon RDS for PostgreSQL instance.
2. Create two tables and insert sample data.
3. Create and run an AWS Glue job to extract data from the RDS for PostgreSQL DB instance using multiple job bookmark keys.
4. Create and run a parameterized AWS Glue job to extract data from different tables with separate bookmark keys.

The following diagram illustrates the components of this solution.

Deploy the solution

For this solution, we provide an AWS CloudFormation template that sets up the services included in the architecture, to enable repeatable deployments. This template creates the following resources:

• An RDS for PostgreSQL instance
• An Amazon Simple Storage Service (Amazon S3) bucket to store the data extracted from the RDS for PostgreSQL instance
• An AWS Identity and Access Management (IAM) role for AWS Glue
• Two AWS Glue jobs with job bookmarks enabled to incrementally extract data from the RDS for PostgreSQL instance

To deploy the solution, complete the following steps:

1. Choose Launch Stack to launch the CloudFormation stack.
2. Enter a stack name.
3. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
4. Choose Create stack.
5. Wait until the creation of the stack is complete, as shown on the AWS CloudFormation console.
6. When the stack is complete, copy the AWS Glue scripts to the S3 bucket job-bookmark-keys-demo-<accountid>.
7. Open AWS CloudShell.
8. Run the following commands, replacing <accountid> with your AWS account ID:
    aws s3 cp s3://aws-blogs-artifacts-public/artifacts/BDB-2907/glue/scenario_1_job.py s3://job-bookmark-keys-demo-<accountid>/scenario_1_job.py
    aws s3 cp s3://aws-blogs-artifacts-public/artifacts/BDB-2907/glue/scenario_2_job.py s3://job-bookmark-keys-demo-<accountid>/scenario_2_job.py

Add sample data and run AWS Glue jobs

In this section, we connect to the RDS for PostgreSQL instance via AWS Lambda and create two tables. We also insert sample data into both tables.

1. On the Lambda console, choose Functions in the navigation pane.
2. Choose the function LambdaRDSDDLExecute.
3. Choose Test and choose Invoke for the Lambda function to insert the data.


The two tables product and address will be created with sample data, as shown in the following screenshot.
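
For reference, here is a hedged sketch of the kind of DDL and insert logic such a Lambda function might run with psycopg2. The schemas and sample row below are assumptions for illustration; the actual tables created by the stack's LambdaRDSDDLExecute function may differ.

    import psycopg2  # assumes a psycopg2 Lambda layer is available

    def handler(event, context):
        # Placeholders: the stack wires real connection details into the function.
        conn = psycopg2.connect(host="<rds-endpoint>", dbname="<database>",
                                user="<user>", password="<password>")
        with conn, conn.cursor() as cur:
            # Assumed schemas; the post's exact column lists may differ.
            cur.execute("""CREATE TABLE IF NOT EXISTS product (
                               product_id INT,
                               version    INT,
                               name       VARCHAR(100),
                               PRIMARY KEY (product_id, version))""")
            cur.execute("""CREATE TABLE IF NOT EXISTS address (
                               address_id INT PRIMARY KEY,
                               city       VARCHAR(100))""")
            cur.execute("INSERT INTO product VALUES (45, 1, 'sample product') "
                        "ON CONFLICT DO NOTHING")
        conn.close()
        return {"status": "ok"}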

    Run the multiple_job_bookmark_keys AWS Glue job

We run the multiple_job_bookmark_keys AWS Glue job twice to extract data from the product table of the RDS for PostgreSQL instance. In the first run, all of the existing records will be extracted. Then we insert new records and run the job again; the job should extract only the newly inserted records in the second run.

1. On the AWS Glue console, choose Jobs in the navigation pane.
2. Choose the job multiple_job_bookmark_keys.
3. Choose Run to run the job and choose the Runs tab to monitor the job progress.
4. Choose the Output logs link under CloudWatch logs after the job is complete.
5. Choose the log stream in the next window to see the output logs printed.

The AWS Glue job extracted all records from the source table product. It keeps track of the last combination of values in the columns product_id and version. Next, we run another Lambda function to insert a new record. The product_id 45 already exists, but the inserted record will have a new version of 2, keeping the combination sequentially increasing.

6. Run the LambdaRDSDDLExecute_incremental Lambda function to insert the new record in the product table.
7. Run the AWS Glue job multiple_job_bookmark_keys again after you insert the record, and wait for it to succeed.
8. Choose the Output logs link under CloudWatch logs.
9. Choose the log stream in the next window to see only the newly inserted record printed.

The job extracts only those records whose key combination is greater than that of the previously extracted records.
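
As a rough picture of what a job like multiple_job_bookmark_keys does, here is a minimal PySpark sketch under the same assumptions as before (placeholder endpoint and credentials). It is not the post's actual scenario_1_job.py; only the read, write, and bookmark mechanics are shown.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)  # bookmark state is scoped to the job name

    # Incremental read: Glue remembers the last processed (product_id, version)
    # combination and resumes from there on the next run.
    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="postgresql",
        connection_options={
            "url": "jdbc:postgresql://<rds-endpoint>:5432/<database>",  # placeholder
            "user": "<user>",
            "password": "<password>",
            "dbtable": "product",
            "jobBookmarkKeys": ["product_id", "version"],
            "jobBookmarkKeysSortOrder": "asc",
        },
        transformation_ctx="product_read",  # bookmark state is stored per context
    )

    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": "s3://job-bookmark-keys-demo-<accountid>/product/"},
        format="json",
        transformation_ctx="product_write",
    )

    job.commit()  # persists the bookmark for the next run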

    Run the parameterised_job_bookmark_keys AWS Glue job

We now run the parameterized AWS Glue job that takes the table name and bookmark key column as parameters. We run this job to extract data from different tables while maintaining separate bookmarks.

The first run will be for the address table with bookmarkkey as address_id. These are already populated in the job parameters.

1. On the AWS Glue console, choose Jobs in the navigation pane.
2. Choose the job parameterised_job_bookmark_keys.
3. Choose Run to run the job and choose the Runs tab to monitor the job progress.
4. Choose the Output logs link under CloudWatch logs after the job is complete.
5. Choose the log stream in the next window to see all records from the address table printed.
6. On the Actions menu, choose Run with parameters.
7. Expand the Job parameters section.
8. Change the job parameter values as follows:
  • Key --bookmarkkey with value product_id
  • Key --table_name with value product
  • The S3 bucket name is unchanged (job-bookmark-keys-demo-<accountnumber>)
9. Choose Run job to run the job and choose the Runs tab to monitor the job progress.
10. Choose the Output logs link under CloudWatch logs after the job is complete.
11. Choose the log stream to see all the records from the product table printed.

The job maintains a separate bookmark for each table when extracting the data from the source data store. This is achieved by adding the table name to the job name and transformation contexts in the AWS Glue job script.
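
A hedged sketch of that parameterization follows. The --table_name and --bookmarkkey parameter names match the console steps above; everything else (endpoint, credentials, the exact naming scheme) is an assumption rather than the post's scenario_2_job.py.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # --table_name and --bookmarkkey are the job parameters set on the console.
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "table_name", "bookmarkkey"])
    table_name = args["table_name"]

    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    # Appending the table name to the job name and transformation context
    # scopes the bookmark per table, as described above.
    job.init(f'{args["JOB_NAME"]}_{table_name}', args)

    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="postgresql",
        connection_options={
            "url": "jdbc:postgresql://<rds-endpoint>:5432/<database>",  # placeholder
            "user": "<user>",
            "password": "<password>",
            "dbtable": table_name,
            "jobBookmarkKeys": [args["bookmarkkey"]],
            "jobBookmarkKeysSortOrder": "asc",
        },
        transformation_ctx=f"read_{table_name}",  # per-table bookmark state
    )

    job.commit()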

Clean up

To avoid incurring future charges, complete the following steps (a scripted alternative is sketched after the list):

1. On the Amazon S3 console, choose Buckets in the navigation pane.
2. Select the bucket with job-bookmark-keys in its name.
3. Choose Empty to delete all the files and folders in it.
4. On the AWS CloudFormation console, choose Stacks in the navigation pane.
5. Select the stack you created to deploy the solution and choose Delete.
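
If you prefer to script the cleanup, a boto3 equivalent of these console steps might look like the following; the bucket and stack names are placeholders to substitute with your own:

    # Scripted cleanup, equivalent to the console steps above.
    import boto3

    bucket = boto3.resource("s3").Bucket("job-bookmark-keys-demo-<accountid>")
    bucket.objects.all().delete()    # empty the bucket before stack deletion
    bucket.object_versions.delete()  # in case versioning is enabled

    boto3.client("cloudformation").delete_stack(StackName="<stack-name>")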

    Conclusion

This post demonstrated passing more than one column of a table as jobBookmarkKeys in a JDBC connection to an AWS Glue job. It also explained how you can use a parameterized AWS Glue job to extract data from multiple tables while maintaining their respective bookmarks. As a next step, you can test the incremental data extraction by changing data in the source tables.


About the Authors

Durga Prasad is a Sr. Lead Consultant enabling customers to build their data analytics solutions on AWS. He is a coffee lover and enjoys playing badminton.

Murali Reddy is a Lead Consultant at Amazon Web Services (AWS), helping customers build and implement data analytics solutions. When he's not working, Murali is an avid bike rider and loves exploring new places.


