Puneet Mehrotra

PhD student | Coffee Drinker
Meme Connoisseur

University of British Columbia

Biography

Hi! I am a third-year PhD student in Computer Science at the University of British Columbia. I am a member of the Systopia Lab and am supervised by Prof. Margo Seltzer.

Previously, I worked with Prof. Ivan Beschastnikh for my master's degree. My thesis explored trusted execution for cross-platform data privacy. Prior to that, I worked at NetApp Inc., building tools and utilities that make Linux hosts work seamlessly with NetApp’s Data ONTAP.

Interests
  • Operating Systems
  • Distributed Systems
  • Data Processing at Scale
Education
  • PhD in Computer Science

    University of British Columbia

  • MSc in Computer Science, 2019

    University of British Columbia

  • BEng in Computer Science, 2013

    Birla Institute of Technology and Science, India

  • MSc in Biological Sciences, 2013

    Birla Institute of Technology and Science, India

Research

My research is focussed on graph processing systems and graph data management. An incredible amount of data can be naturally expressed as graphs, and there is a growing need to develop efficient and scalable systems to process and manage such data. Much research and industry effort has been spent on developing efficient systems for graph processing and graph storage individually, but rarely are they considered together. Most graph processing systems are either entirely in-memory or spend much time preprocessing the graph to make out-of-core processing feasible and efficient. Either way, all systems are designed to be plugged into deep ETL pipelines that extract the graph from a primary source (usually a database) and prepare it for processing. Once extracted, the data become disjoint from the primary source, and any subsequent update to the primary source must initiate another round of ETL and preprocessing. Moreover, the results of the computation are often discarded and are not used in subsequent rounds of computation.

Ideally, the database used to store the primary data source would also allow fairly performant analytics along with all the rich features one expects from an ACID-compliant transactional store. Relational databases and their SQL interfaces make writing iterative graph algorithms very difficult, a situation often remedied by providing a graph DSL on top of the database (e.g., Oracle PGX). Graph databases exist but have not managed to find widespread adoption and are instead relegated to specialized tasks such as fraud detection.

We take a more fundamental approach to this incongruity and ask: why do people not use graph databases? To answer this question, we work from the ground up to understand how the representation of a graph on disk and in memory impacts the performance of popular graph analytics tasks. We have built a graph database that stores graphs in (currently) three different representations, which support fast inserts and range queries to varying extents. This allows us to experiment with different representations, styles of algorithm writing (edge-centric or vertex-centric), and the statistical properties of the graph. A clear understanding of these together is essential in designing a system that performs well on both transactional and analytical workloads.
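As a rough illustration of the edge-centric and vertex-centric styles mentioned above, the following Python sketch computes in-degrees both ways on a toy graph. It is not code from the database described here: the graph, the function names, and the choice of an edge list and an adjacency list as the two representations are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy graph, given as a list of directed edges (src, dst).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]


def build_adjacency(edges):
    """Adjacency-list representation: vertex -> sorted list of out-neighbours."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj[dst]  # touching the key makes sink vertices appear too
    return {v: sorted(ns) for v, ns in adj.items()}


def in_degree_edge_centric(edges):
    """Edge-centric style: a single sequential pass over a flat edge list."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 0  # register the source vertex without counting it
        deg[dst] += 1
    return dict(deg)


def in_degree_vertex_centric(adj):
    """Vertex-centric style: each vertex walks its own out-neighbours."""
    deg = {v: 0 for v in adj}
    for v, neighbours in adj.items():
        for n in neighbours:
            deg[n] += 1
    return deg


if __name__ == "__main__":
    print(in_degree_edge_centric(EDGES))
    print(in_degree_vertex_centric(build_adjacency(EDGES)))
```

Both functions return the same answer, but the edge-centric version streams through the edges in storage order, while the vertex-centric version depends on being able to look up a vertex's neighbours quickly. That difference is one reason the choice of representation and the choice of algorithm style are worth studying together.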

Recent Publications

(2020). Smooth Kronecker: Solving the Combing Problem in Kronecker Graphs.

Recent & Upcoming Talks

Cross-platform Data Integrity and Confidentiality with Graduated Access Control

Experience

Graduate Research Assistant
Jan 2018 – Oct 2019 Vancouver, BC

I worked on the Trusted Capsules project in the NSS Lab. Trusted Capsules provides graduated access control on remote devices.

  • Used Linaro OP-TEE to manage fine-grained and trackable access to data on remote devices by linking the data to its access policy and encrypting them together
  • Leveraged FUSE to intercept operations on encrypted files and facilitate their on-demand decryption and re-encryption by the trusted application running in the Secure World
  • Wrote the prototype in C on a LeMaker HiKey board with ARM TrustZone and Linux 4.15

Member of Technical Staff - II
Jul 2015 – Jul 2017 Bangalore, India
  • Designed and implemented RussianRiver – a tool to validate and set host multipath settings for NetApp SAN
  • Worked on Unified Host Utilities Kits for Linux and Unix – a tool for checking the health of storage on the OS when connected to NetApp storage controllers. It provides path and state information for all NetApp LUNs present on a host by issuing queries to the Host Bus Adapter API libraries.
  • Handled infrastructure orchestration and configuration management for interop QA infrastructure
  • Designed and developed the SAN Host Remediation Tool, which automates the tasks to be performed on hosts when storage migrates from NetApp 7-Mode Data ONTAP to clustered Data ONTAP. Supports all major host OS variants

Member of Technical Staff - I
Jul 2013 – Jul 2015 Bangalore, India
  • Created a web application for iLAB using Django to help users select test configuration and track progress
  • Wrote Python and Perl scripts and libraries to test the interoperability of new Linux host and Data ONTAP features
  • Wrote SystemTap scripts to capture additional details during regression testing, such as I/O latency, CPU utilization, and Device Mapper Multipath queue depths

Engineering Intern
Jul 2012 – Jun 2015 Bangalore, India
  • Designed and developed iLAB – a framework that handles dynamic testbed creation, resource allocation and initialization, test execution, and testbed tear-down. Increased execution efficiency by 95%

Teaching

Graduate Teaching Assistant - Software Engineering
Sep 2017 – Apr 2019 Vancouver, BC

I was the Teaching Assistant for CPSC 319: Software Engineering Project at UBC from September 2017 to April 2018. This course gives undergraduate students an opportunity to design, implement, and test a large software system for an industry sponsor. The course focuses on applying the waterfall SDLC methodology to the solution from inception to production, producing key documentation artefacts while working in a team.

I acted as the industry sponsor liaison and “engineering manager” to the teams. Over the course of the year, I oversaw two projects and managed about 40 students.

  • Managed two teams implementing a self-service tool for Uniserve clients and IT support staff. The tool allows the IT staff to manage their devices and see real-time information about their network health and usage trends.

  • Supervised two student teams implementing a self-checkout and payment portal for ChainXY.

Advanced Operating Systems
Sep 2019 – Dec 2017 Vancouver, BC

I was the Teaching Assistant for CPSC 508: Advanced Operating Systems. This seminar-style course introduces students to the theory and practice of conducting systems research. The papers discussed cover the history of operating systems research, with a special emphasis on understanding what constitutes systems research and how it has evolved.

As a Teaching Assistant, I helped students with their assignments and term projects, from defining scope to finding the right compute resources for them to use. I also ran tutorials to bring the undergraduate students up to speed so that they could have a richer learning experience.

Service

Systems Lab Representative
Faculty Recruitment Committee
Jan 2021 – Jun 2021

Systems Lab Representative
Graduate Recruitment and Admissions Committee
Jan 2020 – Apr 2020

Tuesday Tea Czar
Computer Science Graduate Students Association
Mar 2018 – Mar 2019

Program Committee Member
The 1st Annual CS-Can Student Symposium
Apr 2019 – Jun 2020