You can use Amazon Data Firehose to aggregate and deliver log events from your applications and services captured in Amazon CloudWatch Logs to your Amazon Simple Storage Service (Amazon S3) bucket and Splunk destinations, for use cases such as data analytics, security analysis, application troubleshooting, and more. By default, CloudWatch Logs are delivered as gzip-compressed objects. You might want the data to be decompressed, or want logs to be delivered to Splunk, which requires decompressed data input, for application monitoring and auditing.
AWS released a feature to support decompression of CloudWatch Logs in Firehose. With this new feature, you can specify an option in Firehose to decompress CloudWatch Logs. You no longer have to perform additional processing using AWS Lambda or post-processing to get decompressed logs, and can deliver decompressed data to Splunk. Additionally, you can use optional Firehose features such as record format conversion to convert CloudWatch Logs to Parquet or ORC, and dynamic partitioning to automatically group streaming records based on keys in the data (for example, by month) and deliver the grouped records to corresponding Amazon S3 prefixes.
In this post, we look at how you can enable the decompression feature for Splunk and Amazon S3 destinations. We start with Splunk and then Amazon S3 for new streams, then we cover the migration steps to take advantage of this feature and simplify your existing pipeline.
Decompress CloudWatch Logs for Splunk
You can use a subscription filter in CloudWatch log groups to ingest data directly into Firehose or through Amazon Kinesis Data Streams.
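If you prefer to set up the subscription filter programmatically rather than in the console, the following is a minimal boto3 sketch. The log group, filter name, delivery stream ARN, and IAM role are placeholders; the role must allow CloudWatch Logs to put records into Firehose.

```python
import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Subscribe a log group directly to an existing Firehose delivery stream.
# All names and ARNs below are placeholders for illustration.
logs.put_subscription_filter(
    logGroupName="/aws/cloudtrail/example-log-group",
    filterName="firehose-decompression-demo",
    filterPattern="",  # an empty pattern forwards every log event
    destinationArn="arn:aws:firehose:us-east-1:111122223333:deliverystream/cwlogs-to-splunk",
    roleArn="arn:aws:iam::111122223333:role/CWLtoFirehoseRole",
)
```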
Note: For the CloudWatch Logs decompression feature, you need an HTTP Event Collector (HEC) data input created in Splunk, with indexer acknowledgement enabled and the source type set. This is required to map the decompressed logs to the right source type. When creating the HEC input, include the source type mapping (for example, aws:cloudtrail).
To create a Firehose delivery stream for the decompression feature, complete the following steps (a minimal SDK-based sketch of the same configuration follows the list):
- Provide your destination settings and select Raw endpoint as the endpoint type.
You can use a raw endpoint for the decompression feature to ingest both raw and JSON-formatted event data to Splunk. For example, VPC Flow Logs data is raw data, and AWS CloudTrail data is in JSON format.
- Enter the HEC token for Authentication token.
- To enable the decompression feature, deselect Transform source records with AWS Lambda under Transform records.
- Select Turn on decompression and Turn on message extraction for Decompress source records from Amazon CloudWatch Logs.
- Select Turn on message extraction for the Splunk destination.
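If you script the same setup with the AWS SDK instead of the console, the configuration looks roughly like the boto3 sketch below. The processor types (Decompression, CloudWatchLogProcessing) and parameter names (CompressionFormat, DataMessageExtraction) reflect my reading of the Firehose API for this feature, so verify them against the Firehose API reference; the endpoint, token, and ARNs are placeholders.

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# Minimal sketch: Splunk destination with a raw HEC endpoint, Lambda transformation
# left off, and the decompression and message extraction processors turned on.
firehose.create_delivery_stream(
    DeliveryStreamName="cwlogs-to-splunk",
    DeliveryStreamType="DirectPut",
    SplunkDestinationConfiguration={
        "HECEndpoint": "https://http-inputs-firehose-example.splunkcloud.com:443",
        "HECEndpointType": "Raw",           # raw endpoint, as selected in the console
        "HECToken": "REPLACE-WITH-HEC-TOKEN",
        "S3BackupMode": "FailedEventsOnly",
        "S3Configuration": {                # backup bucket for failed events
            "RoleARN": "arn:aws:iam::111122223333:role/FirehoseSplunkRole",
            "BucketARN": "arn:aws:s3:::example-splunk-backup-bucket",
        },
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                # Decompress the gzip-compressed CloudWatch Logs records.
                {"Type": "Decompression",
                 "Parameters": [{"ParameterName": "CompressionFormat",
                                 "ParameterValue": "GZIP"}]},
                # Extract only the message fields (message extraction).
                {"Type": "CloudWatchLogProcessing",
                 "Parameters": [{"ParameterName": "DataMessageExtraction",
                                 "ParameterValue": "true"}]},
            ],
        },
    },
)
```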
Message extraction feature
After decompression, CloudWatch Logs are in JSON format, as shown in the following figure. You can see that the decompressed data has metadata information such as logGroup, logStream, and subscriptionFilters, and the actual data is included within the message field under logEvents (the following example shows CloudTrail events in the CloudWatch Logs).
When you enable message extraction, Firehose extracts just the contents of the message fields and concatenates them with a new line between them, as shown in the following figure. With the CloudWatch Logs metadata filtered out by this feature, Splunk successfully parses the actual log data and maps it to the source type configured in the HEC token.
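To make the before-and-after concrete, here is a small illustrative Python snippet (not Firehose's implementation) that mimics what message extraction produces from a decompressed CloudWatch Logs payload; the sample values are made up.

```python
import json

# Shape of a decompressed CloudWatch Logs record (sample values are illustrative).
decompressed = {
    "messageType": "DATA_MESSAGE",
    "owner": "111122223333",
    "logGroup": "CloudTrail",
    "logStream": "111122223333_CloudTrail_us-east-1",
    "subscriptionFilters": ["firehose-decompression-demo"],
    "logEvents": [
        {"id": "0001", "timestamp": 1710000000000,
         "message": json.dumps({"eventSource": "ec2.amazonaws.com", "eventName": "StartInstances"})},
        {"id": "0002", "timestamp": 1710000001000,
         "message": json.dumps({"eventSource": "s3.amazonaws.com", "eventName": "PutObject"})},
    ],
}

# Message extraction keeps only the message fields, joined by newlines,
# so Splunk receives plain events without the CloudWatch Logs metadata.
extracted = "\n".join(event["message"] for event in decompressed["logEvents"])
print(extracted)
```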
Additionally, if you want to deliver these CloudWatch events to your Splunk destination in real time, you can use zero buffering, a feature that was recently launched in Firehose. You can set the buffer interval to 0 seconds, or to any value between 0–60 seconds, to deliver data to the Splunk destination within seconds.
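Expressed through the API, zero buffering is a buffering hint on the Splunk destination. The fragment below shows my assumption of the field names; check the Firehose API reference for the exact shape.

```python
# Added to SplunkDestinationConfiguration in the earlier sketch:
# a 0-second buffer interval delivers events to Splunk within seconds.
splunk_buffering_fragment = {
    "BufferingHints": {
        "IntervalInSeconds": 0,  # any value between 0 and 60
        "SizeInMBs": 1,
    }
}
```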
With these settings, you can now seamlessly ingest decompressed CloudWatch log data into Splunk using Firehose.
Decompress CloudWatch Logs for Amazon S3
The CloudWatch Logs decompression feature for an Amazon S3 destination works similarly to Splunk: you turn off data transformation using Lambda and turn on the decompression and message extraction options. You can use the decompression feature to write the log data as a text file to the Amazon S3 destination, or combine it with other Amazon S3 destination features such as record format conversion to Parquet or ORC, or dynamic partitioning to partition the data.
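For the Amazon S3 destination, the same two processors appear under the extended S3 configuration. The following boto3 fragment is a sketch under the same API assumptions as the Splunk example, with placeholder ARNs.

```python
# Sketch of an S3 destination with decompression and message extraction enabled.
extended_s3_config = {
    "RoleARN": "arn:aws:iam::111122223333:role/FirehoseS3Role",
    "BucketARN": "arn:aws:s3:::example-cwlogs-bucket",
    "ProcessingConfiguration": {
        "Enabled": True,
        "Processors": [
            {"Type": "Decompression",
             "Parameters": [{"ParameterName": "CompressionFormat", "ParameterValue": "GZIP"}]},
            {"Type": "CloudWatchLogProcessing",
             "Parameters": [{"ParameterName": "DataMessageExtraction", "ParameterValue": "true"}]},
        ],
    },
}

# Used as:
# firehose.create_delivery_stream(
#     DeliveryStreamName="cwlogs-to-s3",
#     DeliveryStreamType="DirectPut",
#     ExtendedS3DestinationConfiguration=extended_s3_config,
# )
```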
Dynamic partitioning with decompression
For the Amazon S3 destination, Firehose supports dynamic partitioning, which enables you to continuously partition streaming data by using keys within the data, and then deliver the data grouped by those keys into corresponding Amazon S3 prefixes. This enables you to run high-performance, cost-efficient analytics on streaming data in Amazon S3 using services such as Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, and Amazon QuickSight. Partitioning your data minimizes the amount of data scanned, optimizes performance, and reduces the cost of your analytics queries on Amazon S3.
With the new decompression feature, you can perform dynamic partitioning without any Lambda function for mapping the partitioning keys on CloudWatch Logs. You can enable the Inline parsing for JSON option, scan the decompressed log data, and select the partitioning keys. The following screenshot shows an example where inline parsing is enabled for CloudTrail log data, with a partitioning schema selected for the account ID and AWS Region in the CloudTrail record.
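In API terms, inline JSON parsing maps to the MetadataExtraction processor with a JQ query, combined with a dynamic partitioning prefix. The fragment below is a sketch under the same assumptions as the earlier examples; .recipientAccountId and .awsRegion are standard CloudTrail record fields, and the prefix layout is only an illustration.

```python
# Additional keys for ExtendedS3DestinationConfiguration (merged with the earlier sketch).
dynamic_partitioning_fragment = {
    "DynamicPartitioningConfiguration": {"Enabled": True},
    # Partition keys extracted below feed these prefix expressions.
    "Prefix": ("cloudtrail/account_id=!{partitionKeyFromQuery:account_id}/"
               "region=!{partitionKeyFromQuery:region}/"),
    "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
}

# Appended to the Processors list of the earlier ProcessingConfiguration:
# inline parsing for JSON pulls the partition keys out of each CloudTrail record with JQ.
metadata_extraction_processor = {
    "Type": "MetadataExtraction",
    "Parameters": [
        {"ParameterName": "MetadataExtractionQuery",
         "ParameterValue": "{account_id: .recipientAccountId, region: .awsRegion}"},
        {"ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6"},
    ],
}
```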
Record format conversion with decompression
For CloudWatch Logs data, you can use the record format conversion feature on decompressed data for the Amazon S3 destination. Firehose can convert the input data format from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3. Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON. You can use the record format conversion settings under Transform and convert records to convert the CloudWatch log data to Parquet or ORC format. The following screenshot shows an example of record format conversion settings for Parquet format using an AWS Glue schema and table for CloudTrail log data. When dynamic partitioning is configured, record format conversion works along with dynamic partitioning to create the files in the output format with a partition folder structure in the target S3 bucket.
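The same setting expressed through the API is a DataFormatConversionConfiguration on the S3 destination. The Glue database, table, role, and Region below are placeholders, and this sketch assumes the Glue table schema already matches the CloudTrail records.

```python
# Additional key for ExtendedS3DestinationConfiguration: convert decompressed JSON
# records to Parquet using a schema stored in an AWS Glue table (placeholder names).
data_format_conversion_fragment = {
    "DataFormatConversionConfiguration": {
        "Enabled": True,
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
        "SchemaConfiguration": {
            "RoleARN": "arn:aws:iam::111122223333:role/FirehoseGlueRole",
            "DatabaseName": "cloudtrail_db",
            "TableName": "cloudtrail_logs",
            "Region": "us-east-1",
        },
    }
}
```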
Migrate existing delivery streams for decompression
If you want to migrate an existing Firehose stream that uses Lambda for decompression to this new decompression feature, refer to the steps outlined in Enabling and disabling decompression.
Pricing
The Firehose decompression feature decompresses the data and charges per GB of decompressed data. To understand decompression pricing, refer to Amazon Data Firehose pricing.
Clean up
To avoid incurring future charges, delete the resources you created in the following order (a scripted version of these steps follows the list):
- Delete the CloudWatch Logs subscription filter.
- Delete the Firehose delivery stream.
- Delete the S3 buckets.
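If you created the resources with the earlier sketches, the cleanup can be scripted the same way; the names below are the placeholders used above.

```python
import boto3

logs = boto3.client("logs", region_name="us-east-1")
firehose = boto3.client("firehose", region_name="us-east-1")
s3 = boto3.resource("s3", region_name="us-east-1")

# 1. Remove the subscription filter from the log group.
logs.delete_subscription_filter(
    logGroupName="/aws/cloudtrail/example-log-group",
    filterName="firehose-decompression-demo",
)

# 2. Delete the Firehose delivery stream.
firehose.delete_delivery_stream(DeliveryStreamName="cwlogs-to-splunk")

# 3. Empty and delete the S3 bucket used for delivery or backup.
bucket = s3.Bucket("example-cwlogs-bucket")
bucket.objects.all().delete()
bucket.delete()
```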
Conclusion
The decompression and message extraction feature of Firehose simplifies delivery of CloudWatch Logs to Amazon S3 and Splunk destinations without requiring any code development or additional processing. For an Amazon S3 destination, you can use Parquet or ORC conversion and dynamic partitioning capabilities on decompressed data.
For more information, refer to the following resources:
About the Authors
Ranjit Kalidasan is a Senior Solutions Architect with Amazon Web Services based in Boston, Massachusetts. He is a Partner Solutions Architect helping security ISV partners co-build and co-market solutions with AWS. He brings over 25 years of experience in information technology, helping global customers implement complex solutions for security and analytics. You can connect with Ranjit on LinkedIn.
Phaneendra Vuliyaragoli is a Product Management Lead for Amazon Data Firehose at AWS. In this role, Phaneendra leads the product and go-to-market strategy for Amazon Data Firehose.