Vectology – exploring biomedical variable relationships using sentence embedding and vectors

Abstract

Many biomedical data sets contain variables that are identified only by simple, often short, descriptions. Traditionally, these variables would be manually annotated or assigned to ontologies using expert knowledge, which facilitates interaction with other data sets and helps to show where the variables lie in the biomedical knowledge space. An alternative approach is to use sentence embedding methods to convert these descriptions into vectors, calculated from precomputed models derived from biomedical literature. This provides a data-driven alternative to manual expert annotation, automatically harnessing the expert knowledge captured in the existing literature. Because each vector represents the biomedical space embodied by a specific piece of text, we can apply vector-space methods to explore relationships between variables, notably by comparing the distances between vectors. From here, it is possible to recommend the set of variables that are most conceptually similar to a given piece of text or an existing vector, whilst also gaining insight into how the variables in a group are related to one another. Vectology is made available via an API (http://vectology-api.mrcieu.ac.uk/) and basic usage can be explored via a web application (http://vectology.mrcieu.ac.uk).
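
The distance comparison and recommendation step described above can be illustrated with a minimal sketch. It assumes cosine similarity as the measure and uses small hand-written toy vectors; the abstract does not specify the metric, the embedding model, or its dimensionality, and the function names (cosine_similarity, recommend) and the example variable names are hypothetical. It does not call the Vectology API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def recommend(query, candidates, top_n=3):
    """Rank candidate variables by similarity to the query vector.

    `candidates` maps a variable description to its vector.
    Returns up to `top_n` (description, similarity) pairs, most similar first.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy 3-dimensional vectors, invented for illustration only.
# Real sentence embeddings would have many more dimensions.
toy_vectors = {
    "body mass index": [0.9, 0.1, 0.2],
    "waist circumference": [0.6, 0.3, 0.4],
    "years of schooling": [0.1, 0.9, 0.1],
}

# Hypothetical vector for a new piece of text.
query_vector = [1.0, 0.0, 0.1]

ranking = recommend(query_vector, toy_vectors, top_n=3)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy values, "body mass index" ranks first and "years of schooling" last. In practice the vectors would come from a sentence embedding model rather than being written by hand.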

Conference proceedings, poster, and demo for 2019 1st International ‘Alan Turing’ Conference on Decision Support and Recommender Systems