Thursday, July 4, 2019

Serializing Spark dataframes to Avro using KafkaAvroSerializer

I recently worked on a project that used Spark Structured Streaming with Confluent Schema Registry and Apache Kafka. Due to some versioning constraints between the various components, I had to write a custom implementation of the KafkaAvroSerializer class for serializing Spark DataFrames into Avro format. The serialized data was then published to Kafka. This post is based on the examples in the Confluent documentation here.

In newer versions of Confluent Schema Registry, a lot of the implementation detailed below has been simplified and is much easier to use. The standard recommended usage of the Confluent KafkaAvroSerializer is fairly simple: you set it as one of the Kafka properties used when initializing a KafkaProducer:

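import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig}

val props = new Properties()
props.put(...)
...
...
// Register the Confluent Avro serializer for message values
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[io.confluent.kafka.serializers.KafkaAvroSerializer])
// Key and value types depend on the serializers configured above
val producer = new KafkaProducer[AnyRef, AnyRef](props)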

This abstracts away many of the implementation details. When an object is sent to Kafka using the KafkaProducer, the KafkaAvroSerializer internally does the following:
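
One part of that work is framing the serialized bytes. As a minimal sketch, assuming the wire format described in the Confluent documentation (a zero magic byte, a 4-byte big-endian schema ID, then the Avro binary payload), the framing could look like the following. The object and method names are hypothetical, and the payload is assumed to be Avro-encoded already; this is not the actual KafkaAvroSerializer code.

```scala
import java.nio.ByteBuffer

// Hypothetical helper object, for illustration only.
object WireFormatSketch {
  // Assumed Confluent wire format: magic byte 0, then a 4-byte schema ID.
  val MagicByte: Byte = 0x0
  val HeaderSize: Int = 5

  // Prepend the header to a payload that is already Avro-encoded.
  def frame(schemaId: Int, avroPayload: Array[Byte]): Array[Byte] = {
    val buffer = ByteBuffer.allocate(HeaderSize + avroPayload.length)
    buffer.put(MagicByte)
    buffer.putInt(schemaId) // ByteBuffer writes big-endian by default
    buffer.put(avroPayload)
    buffer.array()
  }

  // Split a framed message back into (schemaId, payload).
  def unframe(message: Array[Byte]): (Int, Array[Byte]) = {
    val buffer = ByteBuffer.wrap(message)
    require(buffer.get() == MagicByte, "unknown magic byte")
    val schemaId = buffer.getInt()
    val payload = new Array[Byte](buffer.remaining())
    buffer.get(payload)
    (schemaId, payload)
  }
}
```

In a real pipeline the schema ID would come from Schema Registry rather than being passed in by hand; here it is just a parameter so the sketch stays self-contained.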

Thursday, January 17, 2019

My 2018 Reading Challenge



Towards the end of 2017, I found myself falling behind on my habit of reading books. It wasn't that I was reading less than usual or not reading at all. It was just that I spent most of my time reading newspapers and magazines; mostly the latter, which is something I enjoyed very much. Due to this, when it came to reading books, I did not have much to be happy about.

It was during this time that I came across a friend using Goodreads to keep track of his "to read" list and progress through it during the course of the year. I decided to follow suit, joined the Goodreads 2018 Reading Challenge and set myself a target of 20 books. During the course of this challenge, I came across a wide variety of books across various genres, from inspiring memoirs, spellbinding narratives and political thrillers to some that were slow and painful to progress through and had to be dropped halfway. By the end of the year, I had read a total of 17 books and had a couple of abandoned ones in various stages of progress.

This post, after a rather long time, lists some of my favourite books from last year's reading challenge. In no order of preference, they are:

Bad Blood by John Carreyrou


If I had to pick a favourite among the 17 books that I read last year, it would have to be Bad Blood, written by the WSJ journalist (and Pulitzer Prize winner) John Carreyrou, on the rise and fall of the infamous startup Theranos. When I mentioned "spellbinding narratives" above, I had this book in mind. I stumbled across it while browsing Bill Gates' reading recommendations. I was sold on his review and gave the book a try and boy, did I have a hard time putting it down. It was a spectacular read. I won't be able to do justice to the review that this book deserves, so I recommend that you read Bill Gates' review of it here.




Sunday, August 14, 2016

Book Review: The Intel Trinity: How Robert Noyce, Gordon Moore, and Andy Grove Built the World's Most Important Company

I recently read Michael Malone's "The Intel Trinity: How Robert Noyce, Gordon Moore, and Andy Grove Built the World's Most Important Company". I enjoyed reading it so much that I recommended it to a few of my friends. Eventually, I figured that I might as well tell more people about it and wrote a review. The following review is cross-posted from Goodreads.com.

The Intel Trinity: How Robert Noyce, Gordon Moore, and Andy Grove Built the World's Most Important Company by Michael S. Malone
My rating: 5 of 5 stars

I began reading this book on a whim, having already been familiar with the backgrounds of the three founders - Robert Noyce, Gordon Moore and Andy Grove. Michael Malone chronicles Intel from its birth to its present state methodically and brilliantly. During the course of the narrative, the book beautifully digresses to give the reader a detailed look into the early lives of the three founders, starting with Noyce, then Moore and finally Grove.

The book acknowledges that it borrows from the individual biographies of its founders but doesn't come across as redundant at any point. Deeply engaging in its narrative and often poignant in its story line, I found this book beautifully compiled and enjoyed every bit of it to the point that I was a little sad when it ended.

If you are a fan of biographies, you will find this to be a great book. I'd rate it So-good-I-can't-stop-telling-you-about-it good!

Sunday, November 2, 2014

#Humblebrag - This blog got featured in a documentary on Aaron Swartz

 TL;DR - Some code that I wrote got featured in the documentary "The Internet's Own Boy: The Story of Aaron Swartz".
-- 
Some code that I wrote and blogged about in memory of Aaron Swartz in January 2013 got featured in the documentary on Aaron's life "The Internet's Own Boy: The Story of Aaron Swartz" below. Skip to 5:41 if you're curious.

How do I know it's mine? The novice level of the crudely written code is uniquely mine. I was supposed to improve upon it later but I never really got around to doing it. Meanwhile, some people who forked the code from my GitHub repo did a far better job anyway. This project was more of a tribute than anything else, so it was a humbling experience to have randomly spotted it while watching the documentary again yesterday.




Tuesday, July 1, 2014

30 Day Challenge - July 2014 and a brief summary

July has arrived and with it come 31 new days to attempt another 30 Day Challenge (with a one day buffer ;) ). For those of you who are not familiar with the 30 Day Challenge, I would encourage you to watch this TED talk by Matt Cutts, who heads Google's webspam team. He presents the idea, the rationale and the aim wonderfully in less than 4 minutes.


Here is the list of 30 day challenges that I've attempted in the past. I've had fun doing some of them and have blogged about my experiences.

Thursday, January 2, 2014

Report on 30 Day Challenge - Read 15 books in 30 days

This past year, during the month of June 2013, I attempted a 30 day challenge - to read 15 books. While I got off to a good start with 4 books in 11 days, I managed to complete only 11 by the end of the month.

This post has been long overdue but considering that I did not attempt to complete the remaining four books until now, I guess this post is not so ill-timed after all. I should mention here that my previous (best) failed attempt at this same challenge was 12 books. I am hoping to beat this sometime in the future but that would depend on whether, and when, I get my hands on a stack of books this engaging.

During the course of the challenge, I read a few great books which I would gladly recommend to any fellow bibliophile. The list goes as follows (in no particular order):

Friday, January 25, 2013

Remembering Aaron Swartz - Raw Thought

Be curious; Read widely. Try new things. I think a lot of what people call intelligence just boils down to curiosity.
 - Aaron Swartz

Aaron Swartz (1986 - 2013)
Last week, I was saddened to hear the news about the death of Aaron Swartz. I never met Aaron in person but I was a regular follower of his blog and work for many years. Both of these led me to deeply admire and respect him. His tireless work against the passing of the SOPA bill was of significant interest to me because of the serious implications of the bill. As a programmer his talents were legendary and as an activist his efforts were admirable. His passing is indeed a great loss. The Economist has a nice obituary here that perfectly reflects this sentiment.

As an avid reader of his blog, I am deeply saddened to know that there will not be any more updates to the site. So I decided to collect all of his blog posts over the years and compile them into a PDF/Ebook. This past week, I wrote a simple Python script that crawled Aaron's weblog, retrieved all of the posts one by one and compiled them into a single file.