Modeling Projects

Multi-Class Website Classification Research

Sophos Antivirus

Was a core technical member of a team researching solutions to a low-data, multi-class classification problem. In support of the research questions, implemented active learning methods, an attention-based URL encoder, and a novel semi-supervised method in Python/PyTorch. Authored and gave an accepted talk at CAMLIS on a novel clustered loss function that incorporates asymmetric misclassification costs in a multi-class setting.
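The clustered loss itself is not described here. As a hedged illustration of the general idea it builds on, the sketch below computes a generic expected-misclassification-cost loss: each example's predicted class probabilities are weighted by the row of a cost matrix for its true class, so expensive mistakes are penalized more than cheap ones. It uses NumPy and computes the forward value only, not PyTorch training code. The function names and the example cost matrix are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_cost_loss(logits, labels, cost_matrix):
    """Mean expected misclassification cost over a batch.

    cost_matrix[i, j] is the (hypothetical) cost of predicting class j
    when the true class is i; the diagonal is zero.
    """
    probs = softmax(logits)            # (n, k) predicted probabilities
    costs = cost_matrix[labels]        # (n, k) cost row for each true label
    return float((probs * costs).sum(axis=1).mean())

# Hypothetical 3-class costs: mistaking class 0 for class 2 is expensive.
costs = np.array([[0.0, 1.0, 5.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
uniform = np.zeros((1, 3))  # equal logits give probability 1/3 per class
loss = expected_cost_loss(uniform, np.array([0]), costs)  # (0 + 1 + 5) / 3 = 2.0, up to rounding
```

Unlike plain cross-entropy, this objective distinguishes between wrong answers, which is the property an asymmetric-cost multi-class loss needs.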

Portable Executable Model Research

Sophos Antivirus

Was part of a team working to improve a core production model for identifying malicious binary files. Proposed adding a non-binary auxiliary loss, which produced one of the largest performance gains of the research period. Built a Python framework for easily calculating metrics across the core label structures (multi-class, multi-label). Co-author of a paper, currently being submitted, on the value gained through creative use of auxiliary loss functions.
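The auxiliary-loss idea can be sketched generically: the model's main binary output (malicious or benign) is trained with binary cross-entropy, an extra output head is trained on a non-binary target, and the two losses are summed with a weight. This minimal sketch assumes the auxiliary target is a hypothetical multi-class label (for example, a family tag) and uses an arbitrary `aux_weight`; it is a NumPy forward computation only, not the production model's loss.

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-12):
    # Clip probabilities so log() never sees exactly 0 or 1.
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def categorical_cross_entropy(probs, labels, eps=1e-12):
    # Probability the model assigned to the correct class for each example.
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(np.clip(picked, eps, 1.0)).mean())

def combined_loss(p_malicious, y_malicious, aux_probs, aux_labels, aux_weight=0.1):
    # Main binary objective plus a down-weighted auxiliary (hypothetical) multi-class term.
    main = binary_cross_entropy(p_malicious, y_malicious)
    aux = categorical_cross_entropy(aux_probs, aux_labels)
    return main + aux_weight * aux

# Usage: an uncertain main prediction and a uniform 4-way auxiliary prediction.
p = np.array([0.5])
y = np.array([1.0])
aux_p = np.full((1, 4), 0.25)
aux_y = np.array([0])
total = combined_loss(p, y, aux_p, aux_y, aux_weight=0.5)  # ln 2 + 0.5 * ln 4 = 2 ln 2
```

The auxiliary head shares the network's hidden layers, so its gradient shapes the shared representation even though only the main output is used at inference time.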

Hierarchical Shared Weight HTML Detection Paper

Sophos Antivirus

Author on a paper, accepted into the S&P Deep Learning in Security workshop, on a neural network design that uses shared weights over document aggregations at multiple resolutions for HTML detection. Helped design and write Keras implementations of baseline network structures to test the value of specific architectural choices. Gave the talk for the accepted paper at the S&P conference in May 2018.
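A minimal sketch of multi-resolution aggregation with shared weights, assuming each HTML document has already been converted into a sequence of token vectors: at each level the sequence is split into 2**level contiguous chunks, each chunk is summed, and a single weight matrix (shared across every chunk and every level) is applied before max-pooling over chunks. The split scheme, sum aggregation, ReLU, and function name are assumptions for illustration, not the paper's exact architecture, and the code is NumPy rather than Keras.

```python
import numpy as np

def multi_resolution_features(token_vectors, weights, levels=3):
    """token_vectors: (n_tokens, d); weights: (d, h), shared by all chunks and levels."""
    outputs = []
    for level in range(levels):
        n_chunks = 2 ** level
        chunks = np.array_split(token_vectors, n_chunks, axis=0)
        # Aggregate (sum) the token vectors inside each chunk at this resolution.
        pooled = np.stack([c.sum(axis=0) for c in chunks])   # (n_chunks, d)
        # The same weight matrix is applied to every chunk at every level.
        hidden = np.maximum(pooled @ weights, 0.0)           # ReLU, (n_chunks, h)
        # Max over chunks yields a fixed-size vector regardless of document length.
        outputs.append(hidden.max(axis=0))
    return np.concatenate(outputs)                           # (levels * h,)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))   # hypothetical 16 tokens, 8-dim embeddings
W = rng.normal(size=(8, 4))         # hypothetical shared weights
features = multi_resolution_features(tokens, W, levels=3)   # shape (12,)
```

Sharing `W` across resolutions keeps the parameter count independent of the number of levels, while coarse levels summarize the whole document and fine levels preserve local structure.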

Prediction Explanation Literature Review


Surveyed the current state of the art in model explanation techniques to support both the data science and compliance teams. Evaluated LIME and feature perturbation analysis as explanation methods, comparing both the quality of their results and their efficiency. Built a NumPy-optimized Python implementation of a feature perturbation explanation system.
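The core of a vectorized feature perturbation explainer can be sketched as follows. It assumes the model is exposed through a hypothetical `predict` function that scores a batch of rows, and that each feature has a baseline value (for example, its training-set mean). All single-feature perturbations are built at once as a d-by-d matrix, so the model is called in one batch instead of once per feature in a Python loop. The function and parameter names are illustrative.

```python
import numpy as np

def perturbation_importance(predict, x, baseline):
    """Score drop when each feature of x is individually set to its baseline.

    predict: maps an (m, d) array to an (m,) array of scores.
    x, baseline: (d,) arrays.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    # Row i is a copy of x with feature i replaced by baseline[i].
    perturbed = np.tile(x, (d, 1))
    idx = np.arange(d)
    perturbed[idx, idx] = baseline
    original = predict(x[None, :])[0]
    # A single batched predict call covers all d perturbations.
    return original - predict(perturbed)

# Usage with a hypothetical linear model.
w = np.array([1.0, 2.0, 3.0])

def linear_model(X):
    return X @ w

print(perturbation_importance(linear_model, np.ones(3), np.zeros(3)))  # [1. 2. 3.]
```

For a linear model the importance of feature i reduces to w[i] * (x[i] - baseline[i]), which makes it a convenient sanity check for the vectorized indexing.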

Subspace Clustering for Anomaly Detection

Sophos Antivirus

Researched methods to detect high-density anomalies in multivariate, categorical time-series data. Designed, implemented, and launched a time-series variant of CLICKS, a subspace clustering technique for categorical data.
