Erik M. Ferragut

  • Optum (UnitedHealth Group)
  • 11000 Optum Circle
  • Eden Prairie, MN 55344
  • Telecommuting from Oak Ridge, TN
  • erik.ferragut (at) optum (dot) com

I am a Principal Data Scientist in the Commercial Data Science team within the Advanced Analytics Laboratory at Optum, a part of UnitedHealth Group. I am interested in solving urgent, real-world problems by applying state-of-the-art machine learning algorithms, inventing new ones when needed. In this group, my focus is on detection and prevention of fraud, waste, abuse, and error within medical claims data. Since labeled data can be limited, I am interested in combining unsupervised, semi-supervised, and supervised methods as appropriate.

In the recent past, I worked on cyber security and cyber-physical security. Cyber systems have peculiarities that often require new algorithms. For example, much of the data collected is computer-generated, discrete, and structured. Also, many of the underlying structures are networks or occur within them. Furthermore, these systems must operate in adversarial scenarios where simple fault tolerance and reliability analyses cannot properly account for the planning and intelligence of adversaries. I explored the development of new algorithms that exploit the particulars of cyber problems to improve defensive and offensive capabilities. My later focus was on situation awareness from network sensor data and protection of the power grid.

I worked at Oak Ridge National Laboratory (ORNL) between 2009 and 2017. In addition to being a research scientist, I was also the team lead for research-operations integration, where I supported the operational deployment of research results. Previously, I worked as a cryptologic researcher for over 10 years. I earned my Ph.D. in Mathematics from the University of Michigan, Ann Arbor, in 2003 (defending my thesis just hours before the Northeast blackout).

Research Interests

My research focuses on the effective use of machine learning methods to address real, operational problems. Many amazing results have been achieved recently using Deep Learning, but most of these are within the domains of images, voice, and text. In many applications, such as medical claims data and cyber security, the idiosyncrasies of the data make the routine application of advanced algorithms inappropriate. I am especially interested in determining the best analytics for addressing any given problem, whether they come from Deep Learning, other machine learning methods, optimization, probabilistic modeling, or game theory.