Harrison Pim

I'm a data scientist / machine learning engineer with a background in computational / quantum physics. I write loads of Python and a little bit of everything else.

I like working on hard R&D problems involving computer vision, natural language processing, graph theory, representation learning, recommendation systems, and information retrieval.
I love turning those research projects into services which help people in the real world.

Jobs

Data Scientist, Wellcome Collection
2018-
  • Researching, implementing, and deploying cutting-edge machine learning models to extract features from millions of digitised historical texts and images
  • Developing recommendation systems on top of those features for content discovery
  • Developing, monitoring, and iterating on core search algorithms
  • Collaborating with data engineering / devops colleagues to manage research infrastructure and deploy pipelines and models
  • Mentoring junior data scientists and analysts across Wellcome, and students externally
  • Communicating the team's work through talks at conferences and universities, writing blog posts, etc.
Data Analyst, The British Museum
2016-2018
  • Analysis and predictive modelling of visitor behaviour online, and within the physical museum
  • Audience segmentation, collaborating closely with membership and marketing teams to drive growth and conversion
  • Developed data pipelines in Azure
  • Explored novel data sources, with a focus on delivering real business impact
  • Communicated research at conferences, collaborated with external academic researchers
Data Science Intern, Great Little Place
2015
  • Researched and implemented the core recommendation system for a global Tinder-style restaurant/bar/venue matching app
  • Developed human-in-the-loop feedback systems for internal model validation
  • Geographic/behavioural analysis and segmentation of existing users and places, driving growth

Education

University College London (UCL)
2012-2016

MSci Physics, 2:1

The Windsor Boys' School
2007-2012
  • A-Level Maths (A), Physics (A), Design (C)
  • 13 GCSEs, grades A* to B

Skills & Tools

Machine Learning
pytorch, keras, tensorflow, sklearn
Analysis
jupyter, pandas, numpy, scipy, neo4j, networkx, kibana, PostgreSQL, MySQL, SQLite
Web Development
typescript, javascript, next.js, react, tailwind, fastapi, flask, prisma, stripe, jekyll
Deployment
AWS, docker, elasticsearch, terraform, typer, netlify, vercel

Recent Projects

Wellcome image search, 2020

Built and deployed computer vision services to extract features from images, enabling search, filtering, comparison, recommendation, etc. Automated image/illustration extraction from millions of printed book pages.

Wellcome knowledge graph, 2020

Built an NLP model capable of recognising and disambiguating entities and concepts in text, linking each one to its page in Wikidata and/or other controlled vocabularies. Subsequently modelled the concept network in neo4j, and did a bit of graph-neural-network research.

theblackhart.co.uk, 2020

Jamstack ecommerce site for The Black Hart, built with next.js, typescript, tailwind CSS, prismic, and stripe, and deployed on vercel.

Bid, 2021

Auctions as a service, using stripe.

HAL, 2019

Developed a friendly CLI for managing machine learning research infrastructure in AWS.

Responsible machine learning, 2020

Member of Mozilla's "Building Trustworthy AI" working group. Core member of "Museums + AI Network", developing policy recommendations for ML use in the museum sector.

Bookworm, 2017

Side project. Coupled entity-recognition and graph theory to find characters in novels and learn their social network, enabling all sorts of fun subsequent analysis.

Invisible Insights from TripAdvisor, 2016

ML driven analysis of British Museum reviews. Collaboration with TripAdvisor, the Oxford Internet Institute, and The Alan Turing Institute.

Columbus recommender system, 2015

Recommender system for the Great Little Place app.

Modelling correlations between H2O fragments on Si, 2015

MSci project. High-dimensional density-functional-theory simulations of novel atomic-scale structures. Heavy use of high performance computing clusters.

Other Stuff

Teaching and mentoring

I try to share what I'm working on with people who are interested, directly or indirectly through blog posts, papers, or talks at conferences and universities. I'm a UCL MSc project supervisor, PyData mentor, and Google/DataKind Impact Challenge reviewer.

Cycling and outdoor swimming

I like swimming outdoors, especially in winter. I've been cycling in London for eight years and I've only been hit by four cabs.