
In today’s data-driven landscape, organizations are seeking ways to streamline their data management processes and unlock the full potential of their data assets, while controlling access and enforcing governance. That’s why we introduced Amazon DataZone.
Amazon DataZone is a powerful data management service that empowers data engineers, data scientists, product managers, analysts, and business users to seamlessly catalog, discover, analyze, and govern data across organizational boundaries, AWS accounts, data lakes, and data warehouses.
On March 21, 2024, Amazon DataZone introduced several exciting enhancements to its Amazon Redshift integration that simplify the process of publishing and subscribing to data warehouse assets like tables and views, while enabling Amazon Redshift customers to take advantage of the data management and governance capabilities of Amazon DataZone.
These updates improve the experience for both data users and administrators.
Data producers and consumers can now quickly create data warehouse environments using preconfigured credentials and connection parameters provided by their Amazon DataZone administrators.
Additionally, these enhancements grant administrators greater control over who can access and use the resources within their AWS accounts and Redshift clusters, and for what purpose.
As an administrator, you can now create parameter sets on top of DefaultDataWarehouseBlueprint by providing parameters such as cluster, database, and an AWS secret. You can use these parameter sets to create environment profiles and authorize Amazon DataZone projects to use these environment profiles for creating environments.
In turn, data producers and data consumers can now select an environment profile to create environments without having to provide the parameters themselves, saving time and reducing the risk of issues.
In this post, we explain how you can use these enhancements to the Amazon Redshift integration to publish your Redshift tables to the Amazon DataZone data catalog, and enable users across the organization to discover and access them in a self-service fashion. We present a sample end-to-end customer workflow that covers the core functionalities of Amazon DataZone, and include a step-by-step guide of how you can implement this workflow.
The same workflow is available as a video demonstration on the Amazon DataZone official YouTube channel.
Solution overview
To get started with the new Amazon Redshift integration enhancements, consider the following scenario:
- A sales team acts as the data producer, owning and publishing product sales data (a single table in a Redshift cluster called catalog_sales)
- A marketing team acts as the data consumer, needing access to the sales data in order to analyze it and build product adoption campaigns
At a high level, the steps we walk you through in the following sections include tasks for the Amazon DataZone administrator, Sales team, and Marketing team.
Prerequisites
For the workflow described in this post, we assume a single AWS account, a single AWS Region, and a single AWS Identity and Access Management (IAM) user, who will act as Amazon DataZone administrator, Sales team (producer), and Marketing team (consumer).
To follow along, you need an AWS account. If you don’t have an account, you can create one.
In addition, you must have the following resources configured in your account:
- An Amazon DataZone domain with admin, sales, and marketing projects
- A Redshift namespace and workgroup
If you don’t have these resources already configured, you can create them by deploying an AWS CloudFormation stack:
- Choose Launch Stack to deploy the provided CloudFormation template.
- For AdminUserPassword, enter a password, and take note of this password to use in later steps.
- Leave the remaining settings as default.
- Select I acknowledge that AWS CloudFormation might create IAM resources, then choose Submit.
- When the stack deployment is complete, on the Amazon DataZone console, choose View domains in the navigation pane to see the newly created Amazon DataZone domain.
- On the Amazon Redshift Serverless console, in the navigation pane, choose Workgroup configuration and see the newly created resource.
You should be logged in using the same role that you used to deploy the CloudFormation stack, and you should verify that you’re in the same Region.
As a final prerequisite, you need to create a catalog_sales table in the default Redshift database (dev).
- On the Amazon Redshift Serverless console, select your workgroup and choose Query data to open the Amazon Redshift query editor.
- In the query editor, choose your workgroup and select Database user name and password as the type of connection, then provide your admin database user name and password.
- Use the following query to create the catalog_sales table, which the Sales team will publish in the workflow:
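As a stand-in for readers who want to follow along, here is a minimal sketch of such a table. The column names and types are hypothetical and chosen only for illustration; they are not the schema used in this post. The helper assumes a client that behaves like boto3.client("redshift-data") and a hypothetical function name; the SQL string can equally be pasted into the query editor.

```python
# Hypothetical columns: illustrative only, not the schema used in this post.
CREATE_CATALOG_SALES_SQL = """
CREATE TABLE IF NOT EXISTS catalog_sales (
    sold_date   DATE,
    item_id     INTEGER,
    customer_id INTEGER,
    quantity    INTEGER,
    sales_price DECIMAL(7, 2)
);
""".strip()


def create_catalog_sales(client, workgroup_name, database="dev", secret_arn=None):
    """Submit the DDL through a Redshift Data API style client.

    `client` is assumed to expose execute_statement like
    boto3.client("redshift-data"). `secret_arn` is optional and points to a
    Secrets Manager secret holding the admin database credentials.
    """
    request = {
        "WorkgroupName": workgroup_name,
        "Database": database,
        "Sql": CREATE_CATALOG_SALES_SQL,
    }
    if secret_arn is not None:
        request["SecretArn"] = secret_arn
    response = client.execute_statement(**request)
    # The Data API runs statements asynchronously and returns a statement ID.
    return response["Id"]
```

With the resources from the CloudFormation template, the call might look like create_catalog_sales(boto3.client("redshift-data"), "sales-workgroup"), using dev as the default database.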
Now you’re ready to get started with the new Amazon Redshift integration enhancements.
Amazon DataZone administrator tasks
As the Amazon DataZone administrator, you perform the following tasks:
- Configure the DefaultDataWarehouseBlueprint.
- Authorize the Amazon DataZone admin project to use the blueprint to create environment profiles.
- Create a parameter set on top of DefaultDataWarehouseBlueprint by providing parameters such as cluster, database, and AWS secret.
- Set up environment profiles for the Sales and Marketing teams.
Configure the DefaultDataWarehouseBlueprint
Amazon DataZone blueprints define what AWS tools and services are provisioned to be used within an Amazon DataZone environment. Enabling the data warehouse blueprint will allow data consumers and data producers to use Amazon Redshift and the Query Editor for sharing, accessing, and consuming data.
- On the Amazon DataZone console, choose View domains in the navigation pane.
- Choose your Amazon DataZone domain.
- Choose Default Data Warehouse.
If you used the CloudFormation template, the blueprint is already enabled.
Part of the new Amazon Redshift experience involves the Managing projects and Parameter sets tabs. The Managing projects tab lists the projects that are allowed to create environment profiles using the data warehouse blueprint. By default, this is set to all projects. For our purpose, let’s grant only the admin project.
- On the Managing projects tab, choose Edit.
- Select Restrict to only managing projects and choose the AdminPRJ project.
- Choose Save changes.
With this enhancement, the administrator can control which projects can use default blueprints in their account to create environment profiles.
The Parameter sets tab lists parameters that you can create on top of DefaultDataWarehouseBlueprint by providing parameters such as Redshift cluster or Redshift Serverless workgroup name, database name, and the credentials that allow Amazon DataZone to connect to your cluster or workgroup. You can also create AWS secrets on the Amazon DataZone console. Before these enhancements, AWS secrets had to be managed separately using AWS Secrets Manager, making sure to include the proper tags (key-value) for Amazon Redshift Serverless.
For our scenario, we need to create a parameter set to connect a Redshift Serverless workgroup containing sales data.
- On the Parameter sets tab, choose Create parameter set.
- Enter a name and optional description for the parameter set.
- Choose the Region containing the resource you want to connect to (for example, our workgroup is in us-east-1).
- In the Environment parameters section, select Amazon Redshift Serverless.
If you already have an AWS secret with credentials to your Redshift Serverless workgroup, you can provide the existing AWS secret ARN. In this case, the secret must be tagged with the following (key-value): AmazonDataZoneDomain: <Amazon DataZone domain ID>.
- Because we don’t have an existing AWS secret, we create a new one by choosing Create new AWS Secret.
- In the pop-up, enter a secret name and your Amazon Redshift credentials, then choose Create new AWS Secret.
Amazon DataZone creates a new secret using Secrets Manager and makes sure the secret is tagged with the domain in which you’re creating the parameter set.
- Enter the Redshift Serverless workgroup name and database name to complete the parameters list. If you used the provided CloudFormation template, use sales-workgroup for the workgroup name and dev for the database name.
- Choose Create parameter set.
You can see the parameter set created for your Redshift environment and the blueprint enabled with a single managing project configured.
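If you manage the secret yourself instead of letting Amazon DataZone create it, the tagging requirement described earlier can be sketched as follows. This is a minimal example under two assumptions: the secret stores a JSON document with username and password keys, and the resulting dictionary is passed to the Secrets Manager create_secret operation. The function and argument names are hypothetical. The optional AmazonDataZoneProject tag is the one this post mentions for the case where you provide your own parameters in the environment profile.

```python
import json


def build_secret_request(secret_name, username, password, domain_id, project_id=None):
    """Build keyword arguments for a Secrets Manager create_secret call.

    The AmazonDataZoneDomain tag is always added. The AmazonDataZoneProject
    tag is added only when a project ID is given.
    """
    tags = [{"Key": "AmazonDataZoneDomain", "Value": domain_id}]
    if project_id is not None:
        tags.append({"Key": "AmazonDataZoneProject", "Value": project_id})
    return {
        "Name": secret_name,
        # Assumed secret layout: a JSON object with username and password.
        "SecretString": json.dumps({"username": username, "password": password}),
        "Tags": tags,
    }


# Usage (requires boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   request = build_secret_request("sales-secret", "admin", "<password>", "<domain ID>")
#   boto3.client("secretsmanager").create_secret(**request)
```

Keeping the request builder separate from the AWS call makes it easy to check the tags before anything is created.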
Set up environment profiles for the Sales and Marketing teams
Environment profiles are predefined templates that encapsulate technical details required to create an environment, such as the AWS account, Region, and resources and tools to be added to projects. The next Amazon DataZone administrator task consists of setting up environment profiles, based on the default enabled blueprint, for the Sales and Marketing teams.
This task will be performed from the admin project in the Amazon DataZone data portal, so let’s follow the data portal URL and start creating an environment profile for the Sales team to publish their data.
- On the details page of your Amazon DataZone domain, in the Summary section, choose the link for your data portal URL.
When you open the data portal for the first time, you’re prompted to create a project. If you used the provided CloudFormation template, the projects are already created.
- Choose the AdminPRJ project.
- On the Environments page, choose Create environment profile.
- Enter a name (for example, SalesEnvProfile) and optional description (for example, Sales DWH Environment Profile) for the new environment profile.
- For Owner, choose AdminPRJ.
- For Blueprint, select the DefaultDataWarehouse blueprint (you’ll only see blueprints where the admin project is listed as a managing project).
- Choose the current enabled account and the parameter set you previously created.
You will then see each value for Redshift Serverless prefilled from the parameter set. Under Authorized projects, you can pick the authorized projects allowed to use this environment profile to create an environment. By default, this is set to All projects.
- Select Authorized projects only.
- Choose Add projects and choose the SalesPRJ project.
- Configure the publishing permissions for this environment profile. Because the Sales team is our data producer, we select Publish from any schema.
- Choose Create environment profile.
Next, you create a second environment profile for the Marketing team to consume data. To do this, you repeat steps similar to those you followed for the Sales team.
- Choose the AdminPRJ project.
- On the Environments page, choose Create environment profile.
- Enter a name (for example, MarketingEnvProfile) and optional description (for example, Marketing DWH Environment Profile).
- For Owner, choose AdminPRJ.
- For Blueprint, select the DefaultDataWarehouse blueprint.
- Select the parameter set you created earlier.
- This time, keep All projects as the default (alternatively, you could select Authorized projects only and add MarketingPRJ).
- Configure the publishing permissions for this environment profile. Because the Marketing team is our data consumer, we select Don’t allow publishing.
- Choose Create environment profile.
With these two environment profiles in place, the Sales and Marketing teams can start working on their projects on their own to create their proper environments (resources and tools) with fewer configurations and less risk of errors, and publish and consume data securely and efficiently within these environments.
To recap, the new enhancements offer the following features:
- When creating an environment profile, you can choose to provide your own Amazon Redshift parameters or use one of the parameter sets from the blueprint configuration. If you choose to use the parameter set created in the blueprint configuration, the AWS secret only requires the AmazonDataZoneDomain tag (the AmazonDataZoneProject tag is only required if you choose to provide your own parameter sets in the environment profile).
- In the environment profile, you can specify a list of authorized projects, so that only authorized projects can use this environment profile to create data warehouse environments.
- You can also specify what data authorized projects are allowed to publish. You can choose one of the following options: Publish from any schema, Publish from the default environment schema, and Don’t allow publishing.
These enhancements grant administrators more control over Amazon DataZone resources and projects and facilitate the common activities of all roles involved.
Sales team tasks
As a data producer, the Sales team performs the following tasks:
- Create a sales environment.
- Create a data source.
- Publish sales data to the Amazon DataZone data catalog.
Create a sales environment
Now that you have an environment profile, you need to create an environment in order to work with data and analytics tools in this project.
- Choose the SalesPRJ project.
- On the Environments page, choose Create environment.
- Enter a name (for example, SalesDwhEnv) and optional description (for example, Environment DWH for Sales) for the new environment.
- For Environment profile, choose SalesEnvProfile.
Data producers can now select an environment profile to create environments, without the need to provide their own Amazon Redshift parameters. The AWS secret, Region, workgroup, and database are ported over to the environment from the environment profile, streamlining and simplifying the experience for Amazon DataZone users.
- Review your data warehouse parameters to confirm everything is correct.
- Choose Create environment.
The environment will be automatically provisioned by Amazon DataZone with the preconfigured credentials and connection parameters, allowing the Sales team to publish Amazon Redshift tables seamlessly.
Create a data source
Now, let’s create a new data source for our sales data.
- Choose the SalesPRJ project.
- On the Data page, choose Create data source.
- Enter a name (for example, SalesDataSource) and optional description.
- For Data source type, select Amazon Redshift.
- For Environment, choose SalesDwhEnv.
- For Redshift credentials, you can use the same credentials you provided during environment creation, because you’re still using the same Redshift Serverless workgroup.
- Under Data Selection, enter the schema name where your data is located (for example, public) and then specify a table selection criterion (for example, *).
Here, the * indicates that this data source will bring into Amazon DataZone all the technical metadata from the database tables of your schema (in this case, a single table called catalog_sales).
- Choose Next.
On the next page, automated metadata generation is enabled. This means that Amazon DataZone will automatically generate the business names of the table and columns for that asset.
- Leave the settings as default and choose Next.
- For Run preference, select when to run the data source. Amazon DataZone can automatically publish these assets to the data catalog, but let’s select Run on demand so we can curate the metadata before publishing.
- Choose Next.
- Review all settings and choose Create data source.
- After the data source has been created, you can manually pull technical metadata from the Redshift Serverless workgroup by choosing Run.
When the data source has finished running, you can see the catalog_sales asset correctly added to the inventory.
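For readers who prefer to script this step, the console choices above map roughly onto a single request. The following is a minimal sketch, not the method used in this post: it assumes the field names of the boto3 datazone create_data_source operation as we understand them, so verify them against the current API reference before use. The function name and all identifier arguments are hypothetical.

```python
def build_redshift_data_source_request(
    domain_id,
    project_id,
    environment_id,
    workgroup_name,
    secret_arn,
    database="dev",
    schema="public",
    table_filter="*",
):
    """Build keyword arguments for a DataZone create_data_source call.

    Field names are assumptions based on the boto3 datazone client and
    should be checked against the API reference.
    """
    return {
        "domainIdentifier": domain_id,
        "projectIdentifier": project_id,
        "environmentIdentifier": environment_id,
        "name": "SalesDataSource",
        "type": "REDSHIFT",
        # Keep assets in the inventory so metadata can be curated first.
        "publishOnImport": False,
        # Corresponds to automated business name generation in the console.
        "recommendation": {"enableBusinessNameGeneration": True},
        "configuration": {
            "redshiftRunConfiguration": {
                "redshiftCredentialConfiguration": {"secretManagerArn": secret_arn},
                "redshiftStorage": {
                    "redshiftServerlessSource": {"workgroupName": workgroup_name}
                },
                "relationalFilterConfigurations": [
                    {
                        "databaseName": database,
                        "schemaName": schema,
                        # "*" includes every table in the schema.
                        "filterExpressions": [
                            {"expression": table_filter, "type": "INCLUDE"}
                        ],
                    }
                ],
            }
        },
    }
```

No schedule is set, which mirrors the Run on demand choice. The dictionary would be passed as keyword arguments to the client, for example boto3.client("datazone").create_data_source(**request).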
Publish sales data to the Amazon DataZone data catalog
Open the catalog_sales asset to see details of the new asset (business metadata, technical metadata, and so on).
In a real-world scenario, this pre-publishing phase is when you can enrich the asset by providing more business context and information, such as a readme, glossaries, or metadata forms. For example, you can start by accepting some of the automatically generated metadata recommendations and renaming the asset or its columns to make them more readable, descriptive, and easy for a business user to search and understand.
For this post, simply choose Publish asset to complete the Sales team tasks.
Marketing team tasks
Let’s switch to the Marketing team and subscribe to the catalog_sales asset published by the Sales team. As a consumer team, the Marketing team will complete the following tasks:
- Create a marketing environment.
- Discover and subscribe to sales data.
- Query the data in Amazon Redshift.
Create a marketing environment
To subscribe to and access Amazon DataZone assets, the Marketing team needs to create an environment.
- Choose the MarketingPRJ project.
- On the Environments page, choose Create environment.
- Enter a name (for example, MarketingDwhEnv) and optional description (for example, Environment DWH for Marketing).
- For Environment profile, choose MarketingEnvProfile.
As with data producers, data consumers can also benefit from a preconfigured profile (created and managed by the administrator) to speed up the environment creation process and reduce the risk of errors.
- Review your data warehouse parameters to confirm everything is correct.
- Choose Create environment.
Discover and subscribe to sales data
Now that we have a consumer environment, let’s search for the catalog_sales table in the Amazon DataZone data catalog.
- Enter sales in the search bar.
- Choose the catalog_sales table.
- Choose Subscribe.
- In the pop-up window, choose your marketing consumer project, provide a reason for the subscription request, and choose Subscribe.
When you get a subscription request as a data producer, Amazon DataZone will notify you through a task in the sales producer project. Because you’re acting as both subscriber and publisher here, you will see a notification.
- Choose the notification, which will open the subscription request.
You can see details including which project has requested access, who the requestor is, and why access is needed.
- To approve, enter a message for approval and choose Approve.
Now that the subscription has been approved, let’s go back to the MarketingPRJ project. On the Subscribed data page, catalog_sales is listed as an approved asset, but access hasn’t been granted yet. If we choose the asset, we can see that Amazon DataZone is working on the backend to automatically grant the access. When it’s complete, you’ll see the subscription as granted and the message “Asset added to 1 environment.”
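The request and approval steps above can also be sketched programmatically. This is a minimal example, assuming a client that behaves like boto3.client("datazone") and assuming the create_subscription_request and accept_subscription_request parameter names as we understand them; check both against the API reference. The function name and identifiers are hypothetical. In practice the requester and approver are usually different identities; here a single user plays both roles, as in this post.

```python
def request_and_approve(client, domain_id, listing_id, consumer_project_id,
                        reason, approval_comment):
    """Request a subscription for a project, then approve it.

    `client` is assumed to expose the DataZone subscription operations.
    Returns the status reported by the approval call.
    """
    # Consumer side: ask for access to the published listing.
    request = client.create_subscription_request(
        domainIdentifier=domain_id,
        requestReason=reason,
        subscribedListings=[{"identifier": listing_id}],
        subscribedPrincipals=[{"project": {"identifier": consumer_project_id}}],
    )
    # Producer side: approve the request with a comment.
    decision = client.accept_subscription_request(
        domainIdentifier=domain_id,
        identifier=request["id"],
        decisionComment=approval_comment,
    )
    return decision["status"]
```

Approval only records the decision. As described above, granting access to the Redshift table happens afterward in the background.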
Question information in Amazon Redshift
Now that the marketing project has access to the sales data, we can use the Amazon Redshift Query Editor V2 to analyze the sales data.
- Under MarketingPRJ, go to the Environments page and select the marketing environment.
- Under the analytics tools, choose Query data with Amazon Redshift, which redirects you to the query editor within the environment of the project.
- To connect to Amazon Redshift, choose your workgroup and select Federated user as the connection type.
When you’re connected, you will see the catalog_sales table under the public schema.
- To make sure that you have access to this table, run the following query:
SELECT * FROM catalog_sales LIMIT 10
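The same check can be run outside the query editor. The following is a minimal sketch, assuming a client that behaves like boto3.client("redshift-data") and a calling identity that maps to a database user with access to the subscribed table; both are assumptions, and the function name is hypothetical. It submits the statement, polls until it finishes, and returns the first page of rows as dictionaries.

```python
import time


def run_query(client, workgroup_name, database, sql, poll_seconds=1.0, max_polls=60):
    """Run a SQL statement through the Redshift Data API and return rows.

    Only the first page of results is read, which is enough for a
    LIMIT 10 query.
    """
    statement_id = client.execute_statement(
        WorkgroupName=workgroup_name, Database=database, Sql=sql
    )["Id"]

    # The Data API is asynchronous, so poll until the statement completes.
    for _ in range(max_polls):
        description = client.describe_statement(Id=statement_id)
        status = description["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(description.get("Error", status))
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("statement did not finish in time")

    result = client.get_statement_result(Id=statement_id)
    columns = [column["name"] for column in result["ColumnMetadata"]]
    rows = []
    for record in result["Records"]:
        # Each field is a one-key dict such as {"longValue": 7} or {"isNull": True}.
        values = [
            None if field.get("isNull") else next(iter(field.values()))
            for field in record
        ]
        rows.append(dict(zip(columns, values)))
    return rows
```

For example, run_query(boto3.client("redshift-data"), "sales-workgroup", "dev", "SELECT * FROM catalog_sales LIMIT 10") would return up to ten rows if access has been granted.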
As a consumer, you’re now able to explore data and create reports, or you can aggregate data and create new assets to publish in Amazon DataZone, becoming a producer of a new data product to share with other users and departments.
Clean up
To clean up your resources, complete the following steps:
- On the Amazon DataZone console, delete the projects used in this post. This will delete most project-related objects like data assets and environments.
- Clean up all Amazon Redshift resources (workgroup and namespace) to avoid incurring additional charges.
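The Redshift part of the cleanup can be sketched as follows. This is a minimal example, assuming a client that behaves like boto3.client("redshift-serverless"); the function name and the namespace name you pass in are hypothetical. Because a namespace can't be deleted while a workgroup is still attached to it, the sketch deletes the workgroup first and waits for it to disappear before deleting the namespace.

```python
import time


def delete_redshift_serverless(client, workgroup_name, namespace_name,
                               poll_seconds=5.0, max_polls=120):
    """Delete a Redshift Serverless workgroup, then its namespace.

    Only the first page of list_workgroups is checked, which assumes the
    account has few workgroups.
    """
    client.delete_workgroup(workgroupName=workgroup_name)

    # Workgroup deletion is asynchronous, so wait until it is gone.
    for _ in range(max_polls):
        names = {wg["workgroupName"] for wg in client.list_workgroups()["workgroups"]}
        if workgroup_name not in names:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("workgroup was not deleted in time")

    client.delete_namespace(namespaceName=namespace_name)
```

If you created the resources with the CloudFormation template, use the names it assigned, for example sales-workgroup for the workgroup.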
Conclusion
In this post, we demonstrated how you can get started with the new Amazon Redshift integration in Amazon DataZone. We showed how to streamline the experience for data producers and consumers and how to grant administrators control over data resources.
Embrace these enhancements and unlock the full potential of Amazon DataZone and Amazon Redshift for your data management needs.
Resources
For more information, refer to the following resources:
About the author
Carmen is a Solutions Architect at AWS, based in Milan (Italy). She is a Data Lover who enjoys helping companies in the adoption of Cloud technologies, especially with Data Analytics and Data Governance. Outside of work, she is a creative person who loves being in contact with nature and sometimes practicing adrenaline activities.