This post is cowritten with Pramod Nayak, LakshmiKanth Mannem and Vivek Aggarwal from the Low Latency Group of LSEG.
Transaction cost analysis (TCA) is widely used by traders, portfolio managers, and brokers for pre-trade and post-trade analysis, and helps them measure and optimize transaction costs and the effectiveness of their trading strategies. In this post, we analyze options bid-ask spreads from the LSEG Tick History – PCAP dataset using Amazon Athena for Apache Spark. We show you how to access data, define custom functions to apply on data, query and filter the dataset, and visualize the results of the analysis, all without having to worry about setting up infrastructure or configuring Spark, even for large datasets.
Background
Options Price Reporting Authority (OPRA) serves as a crucial securities information processor, collecting, consolidating, and disseminating last sale reports, quotes, and pertinent information for US Options. With 18 active US Options exchanges and over 1.5 million eligible contracts, OPRA plays a pivotal role in providing comprehensive market data.
On February 5, 2024, the Securities Industry Automation Corporation (SIAC) is set to upgrade the OPRA feed from 48 to 96 multicast channels. This enhancement aims to optimize symbol distribution and line capacity utilization in response to escalating trading activity and volatility in the US options market. SIAC has recommended that firms prepare for peak data rates of up to 37.3 GBits per second.
Although the upgrade does not immediately alter the total volume of published data, it enables OPRA to disseminate data at a significantly faster rate. This transition is crucial for addressing the demands of the dynamic options market.
OPRA stands out as one of the most voluminous feeds, with a peak of 150.4 billion messages in a single day in Q3 2023 and a capacity headroom requirement of 400 billion messages over a single day. Capturing every single message is critical for transaction cost analytics, market liquidity monitoring, trading strategy evaluation, and market research.
About the data
LSEG Tick History – PCAP is a cloud-based repository, exceeding 30 PB, housing ultra-high-quality global market data. This data is meticulously captured directly within the exchange data centers, employing redundant capture processes strategically positioned in major primary and backup exchange data centers worldwide. LSEG’s capture technology ensures lossless data capture and uses a GPS time-source for nanosecond timestamp precision. Additionally, sophisticated data arbitrage techniques are employed to seamlessly fill any data gaps. Subsequent to capture, the data undergoes meticulous processing and arbitration, and is then normalized into Parquet format using LSEG’s Real Time Ultra Direct (RTUD) feed handlers.
The normalization process, which is integral to preparing the data for analysis, generates up to 6 TB of compressed Parquet files per day. The massive volume of data is attributed to the encompassing nature of OPRA, spanning multiple exchanges, and featuring numerous options contracts characterized by diverse attributes. Increased market volatility and market making activity on the options exchanges further contribute to the volume of data published on OPRA.
The attributes of Tick History – PCAP enable firms to conduct various analyses, including the following:
- Pre-trade analysis – Evaluate potential trade impact and explore different execution strategies based on historical data
- Post-trade evaluation – Measure actual execution costs against benchmarks to assess the performance of execution strategies
- Optimized execution – Fine-tune execution strategies based on historical market patterns to minimize market impact and reduce overall trading costs
- Risk management – Identify slippage patterns, identify outliers, and proactively manage risks associated with trading activities
- Performance attribution – Separate the impact of trading decisions from investment decisions when analyzing portfolio performance
The LSEG Tick History – PCAP dataset is available in AWS Data Exchange and can be accessed on AWS Marketplace. With AWS Data Exchange for Amazon S3, you can access PCAP data directly from LSEG’s Amazon Simple Storage Service (Amazon S3) buckets, eliminating the need for firms to store their own copy of the data. This approach streamlines data management and storage, providing clients immediate access to high-quality PCAP or normalized data with ease of use, integration, and substantial data storage savings.
Athena for Apache Spark
For analytical endeavors, Athena for Apache Spark offers a simplified notebook experience accessible through the Athena console or Athena APIs, allowing you to build interactive Apache Spark applications. With an optimized Spark runtime, Athena supports the analysis of petabytes of data by dynamically scaling the number of Spark engines in less than a second. Moreover, common Python libraries such as pandas and NumPy are seamlessly integrated, allowing for the creation of intricate application logic. The flexibility extends to the importation of custom libraries for use in notebooks. Athena for Spark accommodates most open-data formats and is seamlessly integrated with the AWS Glue Data Catalog.
Dataset
For this analysis, we used the LSEG Tick History – PCAP OPRA dataset from May 17, 2023. This dataset comprises the following components:
- Best bid and offer (BBO) – Reports the highest bid and lowest ask for a security at a given exchange
- National best bid and offer (NBBO) – Reports the highest bid and lowest ask for a security across all exchanges
- Trades – Records completed trades across all exchanges
The dataset involves the following data volumes:
- Trades – 160 MB distributed across approximately 60 compressed Parquet files
- BBO – 2.4 TB distributed across approximately 300 compressed Parquet files
- NBBO – 2.8 TB distributed across approximately 200 compressed Parquet files
Analysis overview
Analyzing OPRA Tick History data for Transaction Cost Analysis (TCA) involves scrutinizing market quotes and trades around a specific trade event. We use the following metrics as part of this study:
- Quoted spread (QS) – Calculated as the difference between the BBO ask and the BBO bid
- Effective spread (ES) – Calculated as the difference between the trade price and the midpoint of the BBO (BBO bid + (BBO ask – BBO bid)/2)
- Effective/quoted spread (EQF) – Calculated as (ES / QS) * 100
We calculate these spreads before the trade and additionally at four intervals after the trade (just after, 1 second, 10 seconds, and 60 seconds after the trade).
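The three metrics above translate into a few lines of Python. The following is a minimal sketch rather than the notebook's own helper code: the function names and the sample quote are hypothetical, and it assumes that "difference" in the ES definition means trade price minus midpoint, so the result is signed.

```python
def quoted_spread(bid: float, ask: float) -> float:
    """QS: BBO ask minus BBO bid."""
    return ask - bid


def midpoint(bid: float, ask: float) -> float:
    """BBO midpoint: bid + (ask - bid) / 2."""
    return bid + (ask - bid) / 2


def effective_spread(trade_price: float, bid: float, ask: float) -> float:
    """ES: trade price minus the BBO midpoint (signed; an assumption)."""
    return trade_price - midpoint(bid, ask)


def effective_quoted_ratio(trade_price: float, bid: float, ask: float):
    """EQF: (ES / QS) * 100, or None when the quoted spread is zero."""
    qs = quoted_spread(bid, ask)
    if qs == 0:
        return None  # a locked quote has no spread to compare against
    return effective_spread(trade_price, bid, ask) / qs * 100


# Hypothetical quote and trade, chosen only to illustrate the arithmetic.
bid, ask, trade_price = 1.00, 1.50, 1.375

qs = quoted_spread(bid, ask)
es = effective_spread(trade_price, bid, ask)
eqf = effective_quoted_ratio(trade_price, bid, ask)

print(f"QS={qs} ES={es} EQF={eqf}%")
```

The zero-spread guard is a design choice of this sketch; the post does not say how locked quotes are treated.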
Configure Athena for Apache Spark
To configure Athena for Apache Spark, full the next steps:
- On the Athena console, under Get started, select Analyze your data using PySpark and Spark SQL.
- If this is your first time using Athena Spark, choose Create workgroup.
- For Workgroup name, enter a name for the workgroup, such as tca-analysis.
- In the Analytics engine section, select Apache Spark.
- In the Additional configurations section, you can choose Use defaults or provide a custom AWS Identity and Access Management (IAM) role and Amazon S3 location for calculation results.
- Choose Create workgroup.
- After you create the workgroup, navigate to the Notebooks tab and choose Create notebook.
- Enter a name for your notebook, such as tca-analysis-with-tick-history.
- Choose Create to create your notebook.
Launch your notebook
If you have already created a Spark workgroup, select Launch notebook editor under Get started.
After your notebook is created, you will be redirected to the interactive notebook editor.
Now we can add and run the following code to our notebook.
Create an analysis
Complete the following steps to create an analysis:
- Create our data frames for BBO, NBBO, and trades:
- Now we can identify a trade to use for transaction cost analysis:
We get the following output:
We use the highlighted trade information going forward for the trade product (tp), trade price (tpr), and trade time (tt).
- Here we create a number of helper functions for our analysis:
- In the following function, we create the dataset that contains all the quotes before and after the trade. Athena Spark automatically determines how many DPUs to launch for processing our dataset.
- Now let’s call the TCA analysis function with the information from our selected trade:
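As a rough illustration of what the quote-windowing step does, the following plain-Python sketch picks, for each exchange, the quote that applies before the trade and at each of the four later checkpoints. It is a stand-in for the Spark logic, not the post's `tca_analysis` function. The record fields (`exchange`, `time_ns`, `bid`, `ask`), the helper names, and the interval definitions are all assumptions: "before" is taken as the last quote strictly before the trade time, "just after" as the first quote after it, and the 1, 10, and 60 second checkpoints as the prevailing quote at that offset.

```python
NS_PER_SECOND = 1_000_000_000


def prevailing_quotes(quotes, as_of_ns):
    """Latest quote per exchange with time_ns <= as_of_ns."""
    latest = {}
    for q in quotes:
        if q["time_ns"] <= as_of_ns:
            best = latest.get(q["exchange"])
            if best is None or q["time_ns"] >= best["time_ns"]:
                latest[q["exchange"]] = q
    return latest


def first_quotes_after(quotes, after_ns):
    """Earliest quote per exchange with time_ns > after_ns."""
    earliest = {}
    for q in quotes:
        if q["time_ns"] > after_ns:
            best = earliest.get(q["exchange"])
            if best is None or q["time_ns"] < best["time_ns"]:
                earliest[q["exchange"]] = q
    return earliest


def quotes_around_trade(quotes, trade_time_ns):
    """Return {interval name: {exchange: quote}} for the five intervals."""
    snapshots = {
        "before": prevailing_quotes(quotes, trade_time_ns - 1),
        "just_after": first_quotes_after(quotes, trade_time_ns),
    }
    for seconds in (1, 10, 60):
        checkpoint = trade_time_ns + seconds * NS_PER_SECOND
        snapshots[f"{seconds}s_after"] = prevailing_quotes(quotes, checkpoint)
    return snapshots


# Hypothetical quotes for two exchanges around a trade at t = 100 s.
tt = 100 * NS_PER_SECOND
quotes = [
    {"exchange": "X", "time_ns": 99_000_000_000, "bid": 1.0, "ask": 1.5},
    {"exchange": "X", "time_ns": 100_500_000_000, "bid": 1.0, "ask": 1.25},
    {"exchange": "X", "time_ns": 105_000_000_000, "bid": 1.25, "ask": 1.5},
    {"exchange": "Y", "time_ns": 98_000_000_000, "bid": 1.0, "ask": 1.75},
    {"exchange": "Y", "time_ns": 130_000_000_000, "bid": 1.0, "ask": 1.5},
]

snapshots = quotes_around_trade(quotes, tt)
for name, by_exchange in snapshots.items():
    print(name, {ex: (q["bid"], q["ask"]) for ex, q in by_exchange.items()})
```

On the full BBO and NBBO data, the same selection would be expressed as Spark filters and aggregations so that the work is distributed across DPUs; the loop form here is only meant to make the interval logic easy to read.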
Visualize the analysis results
Now let’s create the data frames we use for our visualization. Each data frame contains quotes for one of the five time intervals for each data feed (BBO, NBBO):
In the following sections, we provide example code to create different visualizations.
Plot QS and NBBO before the trade
Use the following code to plot the quoted spread and NBBO before the trade:
Plot QS for each market and NBBO after the trade
Use the following code to plot the quoted spread for each market and NBBO immediately after the trade:
Plot QS for each time interval and each market for BBO
Use the following code to plot the quoted spread for each time interval and each market for BBO:
Plot ES for each time interval and market for BBO
Use the following code to plot the effective spread for each time interval and market for BBO:
Plot EQF for each time interval and market for BBO
Use the following code to plot the effective/quoted spread for each time interval and market for BBO:
Athena Spark calculation performance
When you run a code block, Athena Spark automatically determines how many DPUs it requires to complete the calculation. In the last code block, where we call the tca_analysis function, we are actually instructing Spark to process the data, and we then convert the resulting Spark dataframes into Pandas dataframes. This constitutes the most intensive processing part of the analysis, and when Athena Spark runs this block, it shows the progress bar, elapsed time, and how many DPUs are currently processing data. For example, in the following calculation, Athena Spark is utilizing 18 DPUs.
When you configure your Athena Spark notebook, you have the option of setting the maximum number of DPUs that it can use. The default is 20 DPUs, but we tested this calculation with 10, 20, and 40 DPUs to demonstrate how Athena Spark automatically scales to run our analysis. We observed that Athena Spark scales close to linearly, taking 15 minutes and 21 seconds when the notebook was configured with a maximum of 10 DPUs, 8 minutes and 23 seconds when the notebook was configured with 20 DPUs, and 4 minutes and 44 seconds when the notebook was configured with 40 DPUs. Because Athena Spark charges based on DPU usage, at a per-second granularity, the cost of these calculations is comparable, but if you set a higher maximum DPU value, Athena Spark can return the result of the analysis much faster. For more details on Athena Spark pricing, refer to the Amazon Athena pricing page.
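The scaling claim can be checked with simple arithmetic on the three reported run times. The sketch below computes the speedup relative to the 10-DPU run and an upper bound on DPU-seconds. That bound assumes every configured DPU was busy for the whole run, which the post does not state, so the real billed usage may be lower.

```python
# Reported runs: maximum DPUs -> (minutes, seconds) of elapsed time.
runs = {10: (15, 21), 20: (8, 23), 40: (4, 44)}


def elapsed_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair into total seconds."""
    return minutes * 60 + seconds


baseline = elapsed_seconds(*runs[10])

summary = {}
for max_dpus, (minutes, seconds) in runs.items():
    elapsed = elapsed_seconds(minutes, seconds)
    summary[max_dpus] = {
        "elapsed_s": elapsed,
        "speedup": baseline / elapsed,
        # Upper bound only: assumes all configured DPUs ran the whole time.
        "dpu_seconds_upper_bound": max_dpus * elapsed,
    }

for max_dpus, stats in summary.items():
    print(
        f"{max_dpus} DPUs: {stats['elapsed_s']} s, "
        f"speedup {stats['speedup']:.2f}x, "
        f"at most {stats['dpu_seconds_upper_bound']} DPU-seconds"
    )
```

Doubling the maximum to 20 DPUs gives a speedup of about 1.83x, and quadrupling it to 40 DPUs gives about 3.24x. Under the all-DPUs-busy assumption, the DPU-seconds bound rises by roughly 9% and 23% respectively.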
Conclusion
In this post, we demonstrated how you can use high-fidelity OPRA data from LSEG’s Tick History – PCAP to perform transaction cost analytics using Athena Spark. The availability of OPRA data in a timely manner, complemented with accessibility innovations of AWS Data Exchange for Amazon S3, strategically reduces the time to analytics for firms looking to create actionable insights for critical trading decisions. OPRA generates about 7 TB of normalized Parquet data each day, and managing the infrastructure to provide analytics based on OPRA data is challenging.
Athena’s scalability in handling large-scale data processing for Tick History – PCAP for OPRA data makes it a compelling choice for organizations seeking swift and scalable analytics solutions in AWS. This post shows the seamless interaction between the AWS ecosystem and Tick History – PCAP data and how financial institutions can take advantage of this synergy to drive data-driven decision-making for critical trading and investment strategies.
About the Authors
Pramod Nayak is the Director of Product Management of the Low Latency Group at LSEG. Pramod has over 10 years of experience in the financial technology industry, focusing on software development, analytics, and data management. Pramod is a former software engineer and passionate about market data and quantitative trading.
LakshmiKanth Mannem is a Product Manager in the Low Latency Group of LSEG. He focuses on data and platform products for the low-latency market data industry. LakshmiKanth helps customers build the most optimal solutions for their market data needs.
Vivek Aggarwal is a Senior Data Engineer in the Low Latency Group of LSEG. Vivek works on developing and maintaining data pipelines for processing and delivery of captured market data feeds and reference data feeds.
Alket Memushaj is a Principal Architect in the Financial Services Market Development team at AWS. Alket is responsible for technical strategy, working with partners and customers to deploy even the most demanding capital markets workloads to the AWS Cloud.