Data Analytics: Big Data & Deep Learning

Dates covered: from 2013 onward



Most projects on this site cover the pre-PhD period. Projects done during the PhD are documented under PhD Thesis and under Publications. Projects after the PhD, i.e., from 2013 onward, are largely undocumented due to confidentiality restrictions, as I have been working in industry. My work in this period, including the most recent, has been in Big Data and Deep Learning. There are a few items that I worked on on the side and could publish:

  • Productionization of TensorSpark in yarn-cluster mode (tested on an HDP cluster): I contributed to the TensorSpark project, helping people run it in a YARN-based production environment. TensorSpark implements Downpour SGD, an idea from Google. This asynchronous form of stochastic gradient descent (SGD) is intuitively better suited to cloud-based Spark clusters: the workers are typically scattered across the data center, and you do not want a network bottleneck that affects only a few workers to slow down the whole model training. See the GitHub issue/PR for details.
  • Class Activation Map (CAM): a great tool for fine-tuning and better understanding a Deep Learning model (ConvNets). I created a notebook to help with this. Tech setup: Jupyter notebook / Python / TensorFlow / VGG model / Caltech256 dataset
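
To make the asynchronous idea behind Downpour SGD concrete, here is a minimal single-process sketch. It is not TensorSpark's code or API: the names ParameterServer, worker and train are hypothetical, threads stand in for Spark workers, and the model is a toy one-parameter regression (y = 3x). What it shows is that each worker pulls the current weight, computes a gradient on its own data shard, and pushes the update back without waiting for the other workers.

```python
import random
import threading


class ParameterServer:
    """Hypothetical stand-in for a shared parameter store (one scalar weight)."""

    def __init__(self, weight, learning_rate):
        self._weight = weight
        self._learning_rate = learning_rate
        self._lock = threading.Lock()

    def pull(self):
        # Workers fetch the current weight; it may be stale by the time they push.
        with self._lock:
            return self._weight

    def push(self, gradient):
        # Apply a worker's gradient as soon as it arrives (no global barrier).
        with self._lock:
            self._weight -= self._learning_rate * gradient


def worker(server, shard, steps, batch_size, seed):
    """Run SGD steps on one data shard, independently of the other workers."""
    rng = random.Random(seed)
    for _ in range(steps):
        w = server.pull()
        batch = rng.sample(shard, batch_size)
        # Gradient of the mean squared error for the model y_hat = w * x.
        gradient = sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size
        server.push(gradient)


def train(num_workers=4, steps=300, learning_rate=0.1, batch_size=5):
    # Toy data (an assumption for illustration): y = 3x for x in (0, 1].
    data = [(i / 100.0, 3.0 * i / 100.0) for i in range(1, 101)]
    shards = [data[i::num_workers] for i in range(num_workers)]
    server = ParameterServer(weight=0.0, learning_rate=learning_rate)
    threads = [
        threading.Thread(target=worker,
                         args=(server, shards[i], steps, batch_size, i))
        for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return server.pull()


if __name__ == "__main__":
    print(train())  # should end up close to the true slope of 3.0
```

Because no worker waits for the others, a slow worker delays only its own updates. The price is that some gradients are computed from slightly stale weights.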

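For the Class Activation Map item, the core computation can be sketched in a few lines. This assumes the setting of the original CAM formulation, where the last convolutional layer is followed by global average pooling and a single dense layer. Whether the notebook is set up exactly this way is an assumption, and the function names below are hypothetical. The map for a class is the weighted sum of the last-layer feature maps, using that class's dense-layer weights.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) using K class weights.

    feature_maps: list of K maps, each a list of H rows of W floats.
    class_weights: list of K floats, the dense-layer weights for one class.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def normalize(cam):
    """Rescale the map to [0, 1] so it can be overlaid as a heat map."""
    values = [v for row in cam for v in row]
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - low) / span for v in row] for row in cam]


if __name__ == "__main__":
    # Two tiny 2x2 feature maps, made up purely for illustration.
    maps = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 2.0]]]
    print(normalize(class_activation_map(maps, [2.0, 0.5])))
```

In practice the normalized map is upsampled to the input image size and overlaid on the image, showing which regions drove the prediction for the chosen class.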

While the above items were developed in my own time, I subsequently used them in projects at the companies I was working for at the time.

