
This post is written in collaboration with Clarisa Tavolieri, Austin Rappeport and Samantha Gignac from Zurich Insurance Group.
The volume and number of logging sources have grown exponentially over the last few years and will continue to grow in the coming years. As a result, customers across all industries are facing multiple challenges such as:
- Balancing storage costs against meeting long-term log retention requirements
- Bandwidth issues when moving logs between the cloud and on premises
- Resource scaling and performance issues when trying to analyze massive amounts of log data
- Keeping pace with the growing storage requirements, while also being able to provide insights from the data
- Aligning license costs for Security Information and Event Management (SIEM) vendors with log processing, storage, and performance requirements. SIEM solutions help you implement real-time reporting by monitoring your environment for security threats and alerting on threats once detected.
Zurich Insurance Group (Zurich) is a leading multi-line insurer providing property, casualty, and life insurance solutions globally. In 2022, Zurich began a multi-year program to accelerate their digital transformation and innovation through the migration of 1,000 applications to AWS, including core insurance and SAP workloads.
The Zurich Cyber Fusion Center management team faced similar challenges, such as balancing the licensing costs of ingestion against long-term retention requirements for both business application log data and security log data within the existing SIEM architecture. Zurich wanted to identify a log management solution to work in conjunction with their existing SIEM solution. The new approach would need to offer the flexibility to integrate new technologies such as machine learning (ML), the scalability to handle long-term retention at forecasted growth levels, and options for cost optimization. In this post, we discuss how Zurich built a hybrid architecture on AWS incorporating AWS services to satisfy their requirements.
Solution overview
Zurich and AWS Professional Services collaborated to build an architecture that addressed decoupling long-term storage of logs, distributing analytics and alerting capabilities, and optimizing storage costs for log data. The solution was based on categorizing and prioritizing log data into priority levels between 1–3, and routing logs to different destinations based on priority. The following diagram illustrates the solution architecture.
The workflow steps are as follows:
- All of the logs (P1, P2, and P3) are collected and ingested into an extract, transform, and load (ETL) service, AWS Partner Cribl’s Stream product, in real time. Capturing and streaming of logs is configured per use case based on the capabilities of the source, such as using built-in forwarders, installing agents, using Cribl Streams, and using AWS services like Amazon Data Firehose. This ETL service performs two functions before data reaches the analytics layer:
  - Data normalization and aggregation – The raw log data is normalized and aggregated in the required format to perform analytics. The process consists of normalizing log field names, standardizing on JSON, removing unused or duplicate fields, and compressing to reduce storage requirements.
  - Routing mechanism – Upon completing data normalization, the ETL service applies the necessary routing mechanisms to ingest log data into the respective downstream systems based on category and priority.
- Priority 1 logs, such as network detection and response (NDR), endpoint detection and response (EDR), and cloud threat detection services (for example, Amazon GuardDuty), are ingested directly into the existing on-premises SIEM solution for real-time analytics and alerting.
- Priority 2 logs, such as operating system security logs, firewall, identity provider (IdP), email metadata, and AWS CloudTrail, are ingested into Amazon OpenSearch Service to enable the following capabilities. Previously, P2 logs were ingested into the SIEM.
  - Systematically detect potential threats and react to a system’s state through alerting, and integrate these alerts back into Zurich’s SIEM for larger correlation, reducing the amount of data ingested into Zurich’s SIEM by approximately 85%. Eventually, Zurich plans to use ML plugins such as anomaly detection to enhance analysis.
  - Develop log and trace analytics solutions with interactive queries and visualize results with high adaptability and speed.
  - Reduce the average time to ingest and the average time to search while accommodating the increasing scale of log data.
  - In the future, Zurich plans to use OpenSearch’s security analytics plugin, which can help security teams quickly detect potential security threats by using over 2,200 pre-built, publicly available Sigma security rules or by creating custom rules.
- Priority 3 logs, such as logs from enterprise applications and vulnerability scanning tools, aren’t ingested into the SIEM or OpenSearch Service, but are forwarded to Amazon Simple Storage Service (Amazon S3) for storage. These can be queried as needed using one-time queries.
- Copies of all log data (P1, P2, P3) are sent in real time to Amazon S3 for highly durable, long-term storage to satisfy the following:
  - Long-term data retention – S3 Object Lock is used to enforce data retention per Zurich’s compliance and regulatory requirements.
  - Cost-optimized storage – Lifecycle policies automatically transition data with less frequent access patterns to lower-cost Amazon S3 storage classes. Zurich also uses lifecycle policies to automatically expire objects after a predefined period. Lifecycle policies provide a mechanism to balance the cost of storing data against meeting retention requirements.
  - Historical data analysis – Data stored in Amazon S3 can be queried to satisfy one-time audit or analysis tasks. Eventually, this data could be used to train ML models to support better anomaly detection. Zurich has done testing with Amazon SageMaker and plans to add this capability in the near future.
  - One-time query analysis – Simple audit use cases require historical data to be queried based on different time intervals, which can be performed using Amazon Athena and AWS Glue analytic services. By using Athena and AWS Glue, both serverless services, Zurich can perform simple queries without the heavy lifting of running and maintaining servers. Athena supports a variety of compression formats for reading and writing data. Therefore, Zurich is able to store compressed logs in Amazon S3 to achieve cost-optimized storage while still being able to perform one-time queries on the data.
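The normalization and priority-based routing steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cribl Stream configuration and not Zurich's actual rules: the field aliases, the category names, the category-to-priority mapping, and the choice to treat unknown categories as P3 are all assumptions made for the example.

```python
import gzip
import json

# Hypothetical aliases: source-specific field names mapped to one common name.
FIELD_ALIASES = {"srcip": "source_ip", "src_ip": "source_ip", "ts": "timestamp"}

# Hypothetical mapping from log category to priority level (1, 2, or 3).
CATEGORY_PRIORITY = {
    "edr": 1, "ndr": 1, "guardduty": 1,
    "firewall": 2, "idp": 2, "cloudtrail": 2,
    "business_app": 3, "vuln_scan": 3,
}

# Destinations per priority. Every priority also gets a copy in S3.
PRIORITY_DESTINATIONS = {
    1: ["siem", "s3"],
    2: ["opensearch", "s3"],
    3: ["s3"],
}


def normalize(raw: dict) -> dict:
    """Lowercase and rename fields, drop empty values and duplicate fields."""
    event = {}
    for key, value in raw.items():
        if value in (None, ""):
            continue  # unused field
        name = FIELD_ALIASES.get(key.lower(), key.lower())
        event.setdefault(name, value)  # first occurrence wins for duplicates
    return event


def route(event: dict) -> list[str]:
    """Return destination names; unknown categories fall back to P3 (assumption)."""
    priority = CATEGORY_PRIORITY.get(event.get("category"), 3)
    return PRIORITY_DESTINATIONS[priority]


def encode(event: dict) -> bytes:
    """Standardize on compact JSON and gzip-compress to reduce storage."""
    text = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


raw = {"SrcIP": "10.0.0.1", "src_ip": "10.0.0.1", "category": "firewall", "note": ""}
event = normalize(raw)
destinations = route(event)
payload = encode(event)
```

Keeping the routing table as plain data makes it easy to move a log source between priorities without touching the normalization logic.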
As a future capability, on-demand, complex query, analysis, and reporting on large historical datasets could be performed using Amazon OpenSearch Serverless. Also, OpenSearch Service supports zero-ETL integration with Amazon S3, where users can query their data stored in Amazon S3 using OpenSearch Service query capabilities.
The solution outlined in this post provides Zurich an architecture that supports scalability, resilience, cost optimization, and flexibility. We discuss these key benefits in the following sections.
Scalability
Given the volume of data currently being ingested, Zurich needed a solution that could satisfy existing requirements and provide room for growth. In this section, we discuss how Amazon S3 and OpenSearch Service help Zurich achieve scalability.
Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. The total volume of data and number of objects you can store in Amazon S3 are virtually unlimited. Based on its unique architecture, Amazon S3 is designed to exceed 99.999999999% (11 nines) of data durability. Additionally, Amazon S3 stores data redundantly across a minimum of three Availability Zones (AZs) by default, providing built-in resilience against widespread disaster. For example, the S3 Standard storage class is designed for 99.99% availability. For more information, check out the Amazon S3 FAQs.
Zurich uses AWS Partner Cribl’s Stream solution to route copies of all log information to Amazon S3 for long-term storage and retention, enabling Zurich to decouple log storage from their SIEM solution, a common challenge facing SIEM solutions today.
OpenSearch Service is a managed service that makes it simple to run OpenSearch without having to manage the underlying infrastructure. Zurich’s current on-premises SIEM infrastructure comprises more than 100 servers, all of which have to be operated and maintained. Zurich hopes to reduce this infrastructure footprint by 75% by offloading priority 2 and 3 logs from their existing SIEM solution.
To support geographies with restrictions on cross-border data transfer and to meet availability requirements, AWS and Zurich worked together to define an Amazon OpenSearch Service configuration that would support 99.9% availability using multiple AZs in a single Region.
OpenSearch Service supports cross-Region and cross-cluster queries, which helps with distributing analysis and processing of logs without moving data, and provides the ability to aggregate information across clusters. Because Zurich plans to deploy multiple OpenSearch domains in different Regions, they will use cross-cluster search functionality to query data seamlessly across different regional domains without moving data. Zurich also configured a connector for their existing SIEM to query OpenSearch, which further allows distributed processing from on premises and enables aggregation of data across data sources. As a result, Zurich is able to distribute processing, decouple storage, and publish key information in the form of alerts and queries to their SIEM solution without having to ship log data.
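As a rough illustration of cross-cluster search, a query against a remote domain addresses the remote index as `<connection-alias>:<index>`, and several targets can be combined in one comma-separated search request. The helper below only builds that request path. The connection aliases and the index pattern are hypothetical names, not Zurich's configuration.

```python
def cross_cluster_target(connection_aliases, index_pattern, include_local=True):
    """Build a comma-separated index expression spanning local and remote domains."""
    targets = [index_pattern] if include_local else []
    # Remote indexes are addressed as "<connection-alias>:<index>".
    targets += [f"{alias}:{index_pattern}" for alias in connection_aliases]
    return ",".join(targets)


def search_path(connection_aliases, index_pattern, include_local=True):
    """Return the REST path for a search across all targets."""
    target = cross_cluster_target(connection_aliases, index_pattern, include_local)
    return f"/{target}/_search"


# Hypothetical aliases for outbound connections to two other regional domains.
path = search_path(["eu-domain", "us-domain"], "p2-logs-*")
```

The query body sent to this path is an ordinary search request, so the log data stays in the domain where it was indexed and only results travel between Regions.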
In addition, many of Zurich’s business units have logging requirements that could also be satisfied using the same AWS services (OpenSearch Service, Amazon S3, AWS Glue, and Amazon Athena). As such, the AWS components of the architecture were templatized using Infrastructure as Code (IaC) for consistent, repeatable deployment. These components are already being used across Zurich’s business units.
Cost optimization
In thinking about optimizing costs, Zurich had to consider how they would continue to ingest 5 TB per day of security log information just for their centralized security logs. In addition, lines of business needed similar capabilities to meet requirements, which could include processing 500 GB per day.
With this solution, Zurich can control (by offloading P2 and P3 log sources) the portion of logs that are ingested into their primary SIEM solution. As a result, Zurich has a mechanism to manage licensing costs, as well as to improve the efficiency of queries by reducing the amount of information the SIEM needs to parse on search.
Because copies of all log data are going to Amazon S3, Zurich is able to take advantage of the different Amazon S3 storage tiers, such as using S3 Intelligent-Tiering to automatically move data among Infrequent Access and Archive Access tiers, to optimize the cost of retaining multiple years’ worth of log data. When data is moved to the Infrequent Access tier, costs are reduced by up to 40%. Similarly, when data is moved to the Archive Instant Access tier, storage costs are reduced by up to 68%.
Refer to Amazon S3 pricing for current pricing, as well as for information by Region. Moving data to S3 Infrequent Access and Archive Access tiers provides a significant cost savings opportunity while meeting long-term retention requirements.
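To get a feel for what these tier discounts mean for a whole log archive, the following sketch computes a blended storage cost relative to keeping everything in the frequent access tier. The 40% and 68% figures are the "up to" reductions quoted above, so the result is a best case. The split of data across tiers is a hypothetical example, and request, retrieval, and monitoring charges are ignored.

```python
# Maximum storage-cost reductions quoted in the text, relative to frequent access.
TIER_DISCOUNT = {
    "frequent": 0.0,
    "infrequent": 0.40,
    "archive_instant": 0.68,
}


def blended_cost_factor(tier_shares: dict) -> float:
    """Return storage cost as a fraction of the all-frequent-access cost.

    tier_shares maps tier name to the fraction of stored data in that tier
    and must sum to 1.
    """
    if abs(sum(tier_shares.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(share * (1.0 - TIER_DISCOUNT[tier]) for tier, share in tier_shares.items())


# Hypothetical split: most of a multi-year archive is rarely touched.
shares = {"frequent": 0.1, "infrequent": 0.2, "archive_instant": 0.7}
factor = blended_cost_factor(shares)  # 0.1*1.0 + 0.2*0.6 + 0.7*0.32, about 0.444
```

With this assumed split, storage would cost roughly 44% of the all-frequent-access baseline at best. The actual figure depends on real access patterns and on current pricing in the Region.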
The team at Zurich analyzed priority 2 log sources and, based on historical analytics and query patterns, determined that only the most recent 7 days of logs are typically required. Therefore, OpenSearch Service was right-sized for retaining 7 days of logs in a hot tier. Rather than configuring UltraWarm and cold storage tiers for OpenSearch Service, copies of the remaining logs were simultaneously sent to Amazon S3 for long-term retention, where they could be queried using Athena.
The combination of cost-optimization options is projected to reduce the cost per GB of log data ingested and stored for 13 months by 53% compared to the previous approach.
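The lifecycle policies that transition and expire log objects in Amazon S3 can be expressed as a small configuration document. The sketch below builds one in the shape accepted by the S3 lifecycle API. The rule ID, the prefix, and the day counts are hypothetical: 396 days is chosen only to echo the 13-month figure above and is not Zurich's actual retention setting. S3 Object Lock is configured separately and is not shown.

```python
def lifecycle_configuration(rule_id, prefix, transition_after_days, expire_after_days,
                            storage_class="INTELLIGENT_TIERING"):
    """Build an S3 lifecycle configuration with one transition and one expiration."""
    if expire_after_days <= transition_after_days:
        raise ValueError("objects must expire after they transition")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                # Apply the rule only to objects under this key prefix.
                "Filter": {"Prefix": prefix},
                # Move objects to the lower-cost storage class.
                "Transitions": [
                    {"Days": transition_after_days, "StorageClass": storage_class}
                ],
                # Delete objects once the retention period has passed.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Hypothetical rule for priority 3 logs stored under the "p3/" prefix.
config = lifecycle_configuration("p3-tier-and-expire", "p3/", 0, 396)
```

A dictionary like this could then be applied with the boto3 S3 client call `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`. In an IaC template, the same rule would be declared on the bucket resource instead.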
Flexibility
Another key consideration for the architecture was the flexibility to integrate with existing alerting systems and data pipelines, as well as the ability to incorporate new technology into Zurich’s log management approach. For example, Zurich also configured a connector for their existing SIEM to query OpenSearch, which further allows distributed processing from on premises and enables aggregation of data across data sources.
Within the OpenSearch Service software, there are options to expand log analysis using security analytics with predefined indicators of compromise across common log types. OpenSearch Service also offers the ability to integrate with ML capabilities such as anomaly detection and alert correlation to enhance log analysis.
With the introduction of Amazon Security Lake, there’s another opportunity to expand the solution to more efficiently manage AWS logging sources and add to this architecture. For example, you can use Amazon OpenSearch Ingestion to generate security insights on security data from Amazon Security Lake.
Summary
In this post, we reviewed how Zurich was able to build a log data management architecture that provided the scalability, flexibility, performance, and cost-optimization mechanisms needed to meet their requirements.
To learn more about components of this solution, visit the Centralized Logging with OpenSearch implementation guide, review Querying AWS service logs, or run through the SIEM on Amazon OpenSearch Service workshop.
About the Authors
Clarisa Tavolieri is a Software Engineering graduate with qualifications in Business, Audit, and Strategy Consulting. With an extensive career in the financial and tech industries, she specializes in data management and has been involved in initiatives ranging from reporting to data architecture. She currently serves as the Global Head of Cyber Data Management at Zurich Group. In her role, she leads the data strategy to support the protection of company assets and implements advanced analytics to enhance and monitor cybersecurity tools.
Austin Rappeport is a Computer Engineer who graduated from the University of Illinois Urbana/Champaign in 2011 with a focus in Computer Security. After graduation, he worked for the Federal Energy Regulatory Commission in the Office of Electric Reliability, working with the North American Electric Reliability Corporation’s Critical Infrastructure Protection Standards on both the audit and enforcement side, as well as standards development. Austin currently works for Zurich Insurance as the Global Head of Detection Engineering and Automation, where he leads the team responsible for using Zurich’s security tools to detect suspicious and malicious activity and improve internal processes through automation.
Samantha Gignac is a Global Security Architect at Zurich Insurance. She graduated from Ferris State University in 2014 with a Bachelor’s degree in Computer Systems & Network Engineering. With experience in the insurance, healthcare, and supply chain industries, she has held roles such as Storage Engineer, Risk Management Engineer, Vulnerability Management Engineer, and SOC Engineer. As a Cybersecurity Architect, she designs and implements secure network systems to protect organizational data and infrastructure from cyber threats.
Claire Sheridan is a Principal Solutions Architect with Amazon Web Services working with global financial services customers. She holds a PhD in Informatics and has more than 15 years of industry experience in tech. She loves traveling and visiting art galleries.
Jake Obi is a Principal Security Consultant with Amazon Web Services based in South Carolina, US, with over 20 years’ experience in information technology. He helps financial services customers improve their security posture in the cloud. Prior to joining Amazon, Jake was an Information Assurance Manager for the US Navy, where he worked on a large satellite communications program as well as hosting government websites using the public cloud.
Srikanth Daggumalli is an Analytics Specialist Solutions Architect at AWS. With 18 years of experience, he has spent over a decade architecting cost-effective, performant, and secure enterprise applications that improve customer reachability and experience, using big data, AI/ML, cloud, and security technologies. He has built high-performing data platforms for major financial institutions, enabling improved customer reach and exceptional experiences. He specializes in areas such as cross-border transactions and in architecting robust analytics platforms.
Freddy Kasprzykowski is a Senior Security Consultant with Amazon Web Services based in Florida, US, with over 20 years’ experience in information technology. He helps customers adopt AWS services securely according to industry best practices, standards, and compliance regulations. He is a member of the Customer Incident Response Team (CIRT), helping customers during security events, a seasoned speaker at AWS re:Invent and AWS re:Inforce conferences, and a contributor to open source projects related to AWS security.