AWS launches Amazon DataZone

New data management service helps customers catalog, discover, share, and govern data across their organization.

Thursday, 1st December 2022, by Phil Alsop

Amazon Web Services Inc. (AWS) has launched Amazon DataZone, a new data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources. With Amazon DataZone, administrators and data stewards who oversee an organization’s data assets can manage and govern access to data using fine-grained controls to ensure it is accessed with the right level of privileges and in the right context. Amazon DataZone makes it easy for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate with data to derive insights. To learn more, visit aws.amazon.com/datazone.

Organizations today collect petabytes, and even exabytes, of data spread across multiple departments, services, on-premises databases, and third-party sources (e.g., partner solutions and public datasets). Before organizations can unlock the full value of this data, administrators and data stewards (i.e., data producers) who generate and manage data need to make it accessible, while maintaining control and governance to ensure it can only be accessed by the right person and in the right context. Simultaneously, employees across the company (i.e., data consumers) want to discover and analyze information from data producers to drive their decision making. Organizations must balance the need for control, to ensure data remains secure, with the need for access, to drive new insights, but it is challenging to implement governance policies that take into account the variety of data, departments, and use cases across an organization. Some businesses build catalogs to curate their information, but these systems are time consuming to maintain, require data producers to manually label each dataset with additional context (e.g., origin and description) to make it discoverable, and lack built-in access controls to make governance simple. Organizations also struggle to enforce a consistent data taxonomy, and individual data producers must keep their own information in sync, which makes it hard to search for data across an organization and can lead to information becoming stale. Even if a data consumer finds the information they need, they do not have a simple way to request access from the owner directly from the catalog, to load the data into analytics services, and to collaborate with others. As a result, decision-makers cannot get the information they need in a timely manner, or they may make poor decisions based on incomplete or outdated data.

Amazon DataZone is a new data management service that makes it easier for data producers to manage and govern access to data and enables data consumers to discover, use, and collaborate on data to drive business insights. Data producers use Amazon DataZone’s web portal to set up their own business data catalog by defining their data taxonomy, configuring governance policies, and connecting to a range of AWS services (e.g., Amazon S3 and Amazon Redshift), partner solutions (e.g., Salesforce and ServiceNow), and on-premises systems. Amazon DataZone removes the heavy lifting of maintaining a catalog by using machine learning to collect and suggest metadata (e.g., origin and data type) for each dataset and by training on a customer’s taxonomy and preferences to improve over time. After the catalog is set up, data consumers can use the Amazon DataZone web portal to search and discover data assets, examine metadata for context, and request access to datasets. When a data consumer is ready to start analyzing data, they create an Amazon DataZone Data Project, a shared space in the web portal where users can pull in different datasets, share access with colleagues, and collaborate on analysis. Amazon DataZone is integrated with AWS analytics services, such as Amazon Redshift, Amazon Athena, and Amazon QuickSight, which enables data consumers to access these services in the context of their data project, so they do not need to manage separate login credentials and their data is automatically available in these services. Amazon DataZone also provides application programming interfaces (APIs) to integrate with custom solutions or partners like Databricks, Snowflake, and Tableau, so customers can easily publish, search, and work with all their data assets.
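For readers who think in code, the producer and consumer workflow described above (publish an asset with metadata, search the catalog, request access, have the owner approve) can be sketched as a small in-memory model. This is purely illustrative and is not the Amazon DataZone API: the names `DataAsset`, `BusinessCatalog`, `publish`, `search`, `request_access`, `approve`, and `can_read` are hypothetical, and the owner-approves rule is an assumption made for the sketch rather than a description of how the service enforces governance.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """A catalog entry; all field names here are hypothetical."""
    name: str
    owner: str                      # the data producer
    metadata: dict[str, str]        # e.g. origin, description
    approved_users: set[str] = field(default_factory=set)


class BusinessCatalog:
    """Toy in-memory catalog. Not the Amazon DataZone API."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}
        self._pending: list[tuple[str, str]] = []  # (asset name, user)

    def publish(self, asset: DataAsset) -> None:
        # A producer registers an asset so consumers can find it.
        self._assets[asset.name] = asset

    def search(self, term: str) -> list[str]:
        # Case-insensitive match on asset name or any metadata value.
        term = term.lower()
        return sorted(
            a.name
            for a in self._assets.values()
            if term in a.name.lower()
            or any(term in v.lower() for v in a.metadata.values())
        )

    def request_access(self, asset_name: str, user: str) -> None:
        # A consumer asks for access; nothing is granted yet.
        if asset_name not in self._assets:
            raise KeyError(asset_name)
        self._pending.append((asset_name, user))

    def approve(self, asset_name: str, user: str, approver: str) -> None:
        # Assumption for this sketch: only the asset owner may approve.
        asset = self._assets[asset_name]
        if approver != asset.owner:
            raise PermissionError("only the asset owner can approve access")
        if (asset_name, user) not in self._pending:
            raise ValueError("no pending request for this user and asset")
        self._pending.remove((asset_name, user))
        asset.approved_users.add(user)

    def can_read(self, asset_name: str, user: str) -> bool:
        asset = self._assets[asset_name]
        return user == asset.owner or user in asset.approved_users


if __name__ == "__main__":
    catalog = BusinessCatalog()
    catalog.publish(
        DataAsset(
            name="sales_2022",
            owner="alice",
            metadata={"origin": "Amazon Redshift", "description": "quarterly sales"},
        )
    )
    print(catalog.search("redshift"))
    catalog.request_access("sales_2022", "bob")
    print(catalog.can_read("sales_2022", "bob"))
    catalog.approve("sales_2022", "bob", approver="alice")
    print(catalog.can_read("sales_2022", "bob"))
```

The sketch keeps discovery (search over metadata) separate from authorization (the approval step), mirroring the balance between access and control that the announcement emphasizes.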

“Good governance is the foundation that makes data accessible to the entire organization, but we often hear from customers that it is difficult to strike the right balance between making data discoverable and maintaining control,” said Swami Sivasubramanian, vice president of Databases, Analytics, and Machine Learning at AWS. “With Amazon DataZone, customers can use a single service that balances strong governance controls with streamlined access to make it easy to find, organize, and collaborate with data. Amazon DataZone sets data free across the organization, so every employee can help drive new insights to maximize its value.”