Oleg Zabluda's blog
Wednesday, December 07, 2016
Petabytes and petabytes of Wall St trades in the public cloud.
Wall Street regulator FINRA has put its hand up to build what financial commentators are calling “the biggest database in history” in the AWS public cloud.

FINRA is one of three shortlisted bidders competing to build a “consolidated audit trail" of every single share trade and options order made in US financial markets each day.
The US Securities and Exchange Commission (SEC) has given the database build the green light, and a committee of stock exchanges will vote in early 2017 on who gets paid an estimated US$2.4 billion to construct it.

What makes FINRA's bid different from its competitors - like fintech firm Fidelity National Information Services (FIS), which has partnered with Google cloud services for its own push - is that it has already started.

The regulator has built a version of the system for its own surveillance purposes, ready to scale as soon as it gets the go-ahead.
On its own, FINRA already collects and processes up to 75 billion records of US share transactions each day.
“Stitch all this data together over weeks and months and then we are talking trillions of records - over 20 petabytes,” he said.

The not-for-profit regulator is responsible for enforcing SEC rules over 90 percent of the US equities market, and about 60 percent of the US options market by volume.

In the business of stamping out fraud and market manipulation, milliseconds are critical. FINRA has to be able to effectively “replay” the whole network of trades in a time-sequenced order - even though the 3,876 securities firms and 641,494 brokers under its watch can all be working to marginally different clocks.
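The article doesn't say how FINRA reconciles the clocks. A minimal sketch of the idea, assuming each firm's offset from a shared reference clock is already known and constant (real clock drift is messier), is to re-stamp every firm's stream and then k-way merge them. `Report`, `corrected` and `replay` are hypothetical names used only for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

# (corrected timestamp in microseconds, firm, order id)
Replayed = Tuple[int, str, str]


@dataclass(frozen=True)
class Report:
    firm: str          # hypothetical reporting-firm identifier
    local_ts_us: int   # timestamp from the firm's own clock, in microseconds
    order_id: str      # hypothetical order identifier


def corrected(reports: Iterable[Report], offset_us: int) -> Iterator[Replayed]:
    """Re-stamp one firm's reports on a shared reference clock.

    offset_us is how far the firm's clock runs ahead of the reference
    (negative if it runs behind); it is assumed to be known and constant.
    """
    for r in reports:
        yield (r.local_ts_us - offset_us, r.firm, r.order_id)


def replay(streams: Dict[str, List[Report]], offsets_us: Dict[str, int]) -> List[Replayed]:
    """Merge per-firm streams into one time-sequenced replay.

    Each stream must already be sorted by local time. Subtracting a constant
    offset preserves that order, so a k-way heap merge is enough; exact ties
    fall back to comparing firm and order id, which is an arbitrary choice.
    """
    per_firm = [corrected(reports, offsets_us.get(firm, 0))
                for firm, reports in streams.items()]
    return list(heapq.merge(*per_firm))


if __name__ == "__main__":
    streams = {
        "A": [Report("A", 1_000_120, "a1")],
        "B": [Report("B", 1_000_050, "b1"), Report("B", 1_000_300, "b2")],
    }
    # Firm A's clock runs 100 microseconds fast, so its report actually came first.
    print(replay(streams, {"A": 100, "B": 0}))
    # prints [(1000020, 'A', 'a1'), (1000050, 'B', 'b1'), (1000300, 'B', 'b2')]
```

Without the correction, firm B's report would be replayed first. A 100-microsecond skew is enough to reverse the apparent order of events.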

It has to keep the data for a minimum of two years, because you never know when a fraud prosecution will kick off.

And the 75-billion-record daily peak reflects only today's volumes: FINRA’s regulation technology director Brett Shriver said trade volumes are growing by around 20 percent a year, driven by trends like high-frequency trading.
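To put the 20 percent figure in perspective, here is a back-of-the-envelope projection. It assumes, purely for illustration, that the growth rate stays constant and compounds on today's 75-billion-record daily peak. The article makes no such forecast.

```python
import math


def projected_daily_peak(base_records: float, annual_growth: float, years: int) -> float:
    """Compound today's daily peak forward, assuming a constant annual growth rate."""
    return base_records * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years for volume to double at a constant growth rate: ln 2 / ln(1 + g)."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    for years in range(6):
        billions = projected_daily_peak(75e9, 0.20, years) / 1e9
        print(f"year {years}: ~{billions:.1f} billion records/day")
    print(f"doubling time: ~{doubling_time_years(0.20):.1f} years")
```

Under that assumption, the daily peak would pass 186 billion records within five years, with volumes doubling roughly every 3.8 years.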
In the middle of this year FINRA stood up a brand-new regulatory platform built on Apache Spark, HBase, and Hive, running on Amazon EMR with Amazon S3 as its primary storage.

FINRA chief information officer Steve Randich said he had to run the gauntlet of naysayers when the regulator decided to go with public cloud and open source.

“I had one of the most senior executives at one of the largest technology companies in the world tell me this doesn’t belong in the cloud. It is not going to work,” he said.

“We had streams of proprietary database vendors coming in one-by-one telling us it wouldn’t scale, it wasn’t mature, it won’t work.

"We have proven them all wrong.”

The effort earned FINRA the praise of AWS CEO Andy Jassy, who called the regulator “one of the very top practitioners of building on top of AWS” in the world today.

FINRA currently has 2 trillion rows of data in HBase, a number the team expects to grow dramatically.

The impact on its investigators has been immediate: their database queries now return results 400 times faster on average.
Financially, FINRA’s use of AWS spot pricing - Amazon’s cheap but unpredictable auctions for spare EC2 capacity - has delivered “an order of magnitude in savings”, according to Shriver, who says queries that are not time-sensitive can be queued until cheap compute becomes available.

“We can trade off what we want to pay and how fast we need it done. It has been a real game-changer for FINRA to help us keep up with demand,” he said.
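The trade-off Shriver describes can be illustrated with a toy model. FINRA's actual queueing logic is not described, so everything below is a hypothetical sketch. It uses a simulated list of hourly spot prices, a per-job price cap (`max_price`) and a start deadline (`deadline_hour`). A job waits for an hour it can afford and falls back to on-demand capacity if none arrives in time. It calls no AWS API, and it ignores the fact that spot capacity can be reclaimed mid-job.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    max_price: float    # hypothetical: highest hourly price we are willing to pay
    deadline_hour: int  # hypothetical: latest hour by which the job must start


def schedule(jobs: List[Job], spot_prices: List[float],
             on_demand_price: float) -> List[Tuple[str, int, float]]:
    """Assign each job a (start hour, hourly price) from a simulated spot-price series.

    A job starts in the first hour whose spot price is at or below its max_price.
    If no such hour comes before its deadline, it runs on on-demand capacity at
    the deadline instead, paying more to finish on time.
    """
    plan = []
    for job in jobs:
        last = min(job.deadline_hour, len(spot_prices) - 1)
        for hour in range(last + 1):
            if spot_prices[hour] <= job.max_price:
                plan.append((job.name, hour, spot_prices[hour]))
                break
        else:  # no affordable spot hour before the deadline
            plan.append((job.name, last, on_demand_price))
    return plan


if __name__ == "__main__":
    prices = [0.30, 0.25, 0.08, 0.12, 0.07]  # simulated hourly spot prices
    jobs = [
        Job("urgent", 0.40, 0),           # must start now, will pay up to on-demand
        Job("overnight-batch", 0.10, 4),  # can wait for cheap capacity
        Job("cheap-only", 0.05, 4),       # cap never met, so it falls back
    ]
    for entry in schedule(jobs, prices, on_demand_price=0.40):
        print(entry)
```

Loosening a job's deadline gives it more hours in which to find a cheap price. Tightening it buys speed at a higher cost, which is the trade-off between what FINRA wants to pay and how fast it needs the work done.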

