How NerdWallet and BlockCypher are Building Data Platforms
On February 23, SF DATA hosted a tech talk in San Francisco on Data Platforms in FinTech. The speakers explained how they are building data platforms to support blockchain applications and cryptocurrencies and to detect fraudulent activity.
Matthieu Riou, CTO and co-founder of BlockCypher, explained how BlockCypher used data to hunt down $70 million in stolen Bitcoins for the Department of Homeland Security.
Vaibhav Jajoo, head of Data Infrastructure at NerdWallet, described how the data team at NerdWallet is using a new breed of data analytics solutions built on Kafka, Python, EMR, and Redshift.
Analytics for Bitcoin and Other Cryptocurrencies at BlockCypher
Developers, companies, and government agencies use the BlockCypher API to build cryptocurrency applications and analyze patterns in blockchain transactions. Bitcoin alone has 8.2 million new transactions per month, with 250 million IP addresses to monitor. The Department of Homeland Security recently used the BlockCypher Analytics API to track down $70 million in Bitcoins stolen in the Bitfinex heist.
In August 2016, BlockCypher noticed that 0.75% of Bitcoins suddenly started moving in unusual patterns.
While the culprits are still unknown, BlockCypher was able to filter the data to pinpoint where the transactions were coming from: in this instance, bitcoin wallet provider BitGo. The BlockCypher architecture uses a combination of Cassandra, Redshift, and Spark.
According to Matthieu, the Holy Grail in cryptocurrency is to deanonymize every transaction, tie each one to off-chain transactions, classify transactions using machine learning, and provide APIs for law enforcement and industry.
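To make the idea of filtering transactions down to a source more concrete, here is a minimal Python sketch. It is not BlockCypher's method, which the talk did not detail. It assumes the flagged transactions already carry a source label, and the records, labels, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (transaction id, source label, amount in BTC).
# The labels and amounts are invented for illustration only.
transactions = [
    ("tx1", "wallet_provider_a", 120.0),
    ("tx2", "wallet_provider_b", 3.5),
    ("tx3", "wallet_provider_a", 310.0),
    ("tx4", "unlabeled", 0.8),
    ("tx5", "wallet_provider_a", 95.0),
]


def volume_by_source(records):
    """Sum the moved amount per source label."""
    totals = defaultdict(float)
    for _tx_id, source, amount in records:
        totals[source] += amount
    return dict(totals)


def dominant_source(records):
    """Return the source with the largest volume and its share of the total."""
    if not records:
        raise ValueError("no transactions to analyze")
    totals = volume_by_source(records)
    total = sum(totals.values())
    source = max(totals, key=totals.get)
    return source, totals[source] / total


source, share = dominant_source(transactions)
print(f"{source} accounts for {share:.1%} of the moved volume")
```

In practice the hard part is the labeling itself, that is, deciding which addresses belong to which entity. The aggregation step shown here is the easy end of the problem.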
Building Data Solutions and Innovations at NerdWallet
NerdWallet gives consumers and small businesses clarity around all of life’s financial decisions by building accessible online tools and providing research and expert advice.
Vaibhav joined NerdWallet in 2014 and started the Data Analytics Team to help everybody at NerdWallet create meaning from the large volume and variety of data that NerdWallet customers generate every day (popular products, click-through rates, platform attributes, etc.). NerdWallet has ~450 employees, and data consumers (i.e. “everybody”) range from analysts to the CEO. Product and marketing analysts care about granular views of a specific product or campaign. The CEO cares about high-level views of the business. The data platform at NerdWallet needs to be flexible enough to serve data in a form each audience can understand, while still allowing users to slice and dice data across many dimensions.
Using a combination of Kafka, Amazon Redshift, and EMR, NerdWallet has been able to create “ETL at Scale” and manage dynamic workloads. There are 250+ named SQL users with varying levels of SQL skill. That makes the Redshift environment challenging to manage, especially when some users write large ad-hoc queries. A key to balancing resources with workloads is Redshift’s WLM (Workload Management).
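As a rough illustration of what a manual WLM setup can look like, the Python sketch below builds a queue definition and serializes it to the JSON that Redshift accepts for the `wlm_json_configuration` parameter of a cluster parameter group. The queue layout, user group names, and numbers are hypothetical, not NerdWallet's actual settings. The validation helper is also an assumption for illustration.

```python
import json

# Hypothetical queue layout; group names and numbers are illustrative,
# not NerdWallet's actual settings.
wlm_queues = [
    # Scheduled ETL jobs get the largest share of memory.
    {"user_group": ["etl"], "query_concurrency": 5, "memory_percent_to_use": 50},
    # Ad-hoc analyst queries: more slots, less memory, and a timeout
    # (max_execution_time is in milliseconds) to stop runaway queries.
    {
        "user_group": ["analysts"],
        "query_concurrency": 10,
        "memory_percent_to_use": 30,
        "max_execution_time": 600000,
    },
    # Default queue: listed last, with no user or query group.
    {"query_concurrency": 5, "memory_percent_to_use": 20},
]


def validate_wlm(queues):
    """Check basic manual-WLM limits and return the JSON configuration string."""
    total_memory = sum(q.get("memory_percent_to_use", 0) for q in queues)
    total_slots = sum(q["query_concurrency"] for q in queues)
    if total_memory > 100:
        raise ValueError(f"memory allocation is {total_memory}%, must be <= 100%")
    # Manual WLM caps total concurrency across user-defined queues at 50.
    if total_slots > 50:
        raise ValueError(f"total concurrency is {total_slots}, must be <= 50")
    return json.dumps(queues)


wlm_json = validate_wlm(wlm_queues)
print(wlm_json)
```

Separating ETL from ad-hoc users this way means a single heavy analyst query can only occupy slots in its own queue, rather than starving the scheduled pipelines.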
— —
Interested in building data platforms? Subscribe to SF Data Weekly, for more stories on data engineering you don’t want to miss.
Want to attend our next event? Follow the SF Data Facebook page.