May 23, 2022

Before you can even think about building algorithms for reading x-rays or interpreting blood smears, the machine needs to know what's in the image. None of the promises of AI in healthcare, a sector that attracted $11.3 billion in private investment in 2021, can be realized without carefully labeled datasets that tell machines exactly what they are looking at.

Creating these tagged datasets has become an industry in its own right, one that is already minting companies with unicorn status. Now Encord, a small startup fresh out of Y Combinator, wants in on the action. The company has launched the beta version of CordVision, an AI-assisted labeling program for building labeled datasets for computer vision projects. The launch follows pilot programs at Stanford Medicine, Memorial Sloan Kettering and King's College London, along with testing at Kheer Medical.

Encord has developed a set of tools that let radiologists view and annotate DICOM images, the format commonly used for storing and transferring medical images. And instead of asking the radiologist to interpret and annotate the entire image, the software is designed to direct labeling toward only the most important parts of the image.
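Encord hasn't published the internals of its viewer, but for readers who haven't worked with the format, here is a minimal sketch of what opening a DICOM file looks like in Python using the open-source pydicom library (the file path is hypothetical):

```python
import pydicom

# Read a DICOM file (path is illustrative). The object carries both the
# pixel data and standardized metadata about the scan.
ds = pydicom.dcmread("chest_xray.dcm")

print(ds.Modality)          # e.g. "CR" or "DX" for an x-ray
print(ds.Rows, ds.Columns)  # image dimensions in pixels

# The image itself as a NumPy array, ready to display or annotate.
# (Requires NumPy; compressed transfer syntaxes may need extra handlers.)
pixels = ds.pixel_array
print(pixels.shape, pixels.dtype)
```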

Encord was founded in 2020 by Erik Landau, who comes from an applied physics background, and Ulrich Stig Hansen. Hansen's master's thesis project at Imperial College London focused on visualizing large medical image datasets, and it was there that he first noticed how long it took to assemble labeled datasets.

These labeled datasets are important because they provide a "ground truth" from which algorithms can learn. There are ways to build AI that don't require labeled datasets, but for the most part AI, especially in healthcare, relies on supervised learning, which does.
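To make that dependence concrete, here is a minimal supervised-learning sketch using scikit-learn. The feature matrix stands in for image data, and the `y` vector is exactly the kind of ground truth that labeling teams produce; all values here are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for image features: 200 samples, 64 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))

# The "ground truth" labels a human annotator would supply,
# e.g. 0 = healthy, 1 = abnormal. Without y, training cannot start.
y = (X[:, 0] > 0).astype(int)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)           # learning happens only against the labels
print(model.score(X, y))  # accuracy measured relative to ground truth
```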

Typically, multiple doctors view the same images, drawing polygons around the relevant features to create a labeled dataset. In some cases this is done with open-source tools; in others, with software built in-house. Either way, the scientific literature suggests that labeling is a major hurdle for AI in healthcare, especially in radiology, a field where AI was expected to make great strides but has so far failed to deliver a major paradigm shift.
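The output of all that polygon-drawing is usually just structured coordinates. One common convention is a COCO-style annotation record like the sketch below; the field names follow the public COCO format, while the IDs and coordinates are invented for illustration:

```python
# One COCO-style polygon annotation: the polygon is a flattened list of
# x, y vertex pairs tracing the feature a doctor outlined.
annotation = {
    "image_id": 42,
    "category_id": 1,  # e.g. 1 = "lesion" in a project's label map
    "segmentation": [[210.5, 118.0, 236.0, 121.5, 241.0, 150.0, 215.5, 147.5]],
    "bbox": [210.5, 118.0, 30.5, 32.0],  # x, y, width, height
    "area": 812.0,
}
```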

"I know there are many doubts [about AI in the medical world]. I think progress is very slow," Landau told gaming-updates. "We think moving to an approach where you really think about the training data will help accelerate the development of these models."

As the authors of a 2021 paper in Frontiers in Radiology note, it would take a labeling professional 24 years to label a dataset of approximately 100,000 images. A 2021 joint statement issued by the European Association of Nuclear Medicine (EANM) and the European Association of Cardiovascular Imaging (EACVI) likewise notes that "obtaining labeled data for medical image analysis can be time consuming and costly," while suggesting that new technologies are emerging that can speed up the process.

Image Credits: Encord DICOM labeling platform

Ironically, these new technologies are versions of artificial intelligence itself. The 2021 Frontiers in Radiology paper, for example, showed that an active learning approach can speed up the process by up to 87%. Going back to the 100,000-image example, an 87% reduction of 24 years of work leaves only about 3.2 years.

CordVision is essentially built around a version of active learning based on what Encord calls micro-models. The method works by having the team label a small, representative sample of images. A narrowly specialized AI is trained on those images and then applied to the larger pool, which the AI labels. Reviewers can then correct the AI's work rather than labeling everything from scratch.
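Encord hasn't published CordVision's code, but the loop described above maps onto a standard pattern. Here is a rough, runnable sketch under assumed conditions: a generic classifier stands in for the micro-model, and `human_label` and `human_review` are hypothetical stand-ins for the annotation and review steps:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def human_label(samples):
    # Stand-in for human annotation (a real system would show these
    # samples to reviewers); synthetic rule for demonstration only.
    return (samples[:, 0] > 0).astype(int)

def human_review(indices, proposals):
    # Stand-in for reviewers accepting or correcting each proposed label.
    return dict(zip(indices.tolist(), proposals.tolist()))

def micro_model_loop(X_pool, n_seed=50):
    rng = np.random.default_rng(0)

    # 1. Humans label a small, representative seed sample.
    seed_idx = rng.choice(len(X_pool), size=n_seed, replace=False)
    y_seed = human_label(X_pool[seed_idx])

    # 2. Train a narrow "micro-model" on just that sample.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_pool[seed_idx], y_seed)

    # 3. Apply it to the larger pool to produce proposed labels.
    rest_idx = np.setdiff1d(np.arange(len(X_pool)), seed_idx)
    proposals = model.predict(X_pool[rest_idx])

    # 4. Reviewers correct proposals instead of labeling from scratch,
    #    which is where the claimed speedup comes from.
    return human_review(rest_idx, proposals)

labels = micro_model_loop(np.random.default_rng(1).normal(size=(1000, 16)))
```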

Landau explained the idea in a blog post on his Medium page: "Imagine building an algorithm designed to detect Batman in Batman movies." One micro-model might be trained on five images of Christian Bale's Batman. Another can be taught to recognize Ben Affleck's Batman, and so on. Together, the smaller models build up into a larger algorithm, which is then released down the chain.

"We've found that it works very well because you can get by with very, very few annotations and jump-start the process," he said.

Encord has released data supporting Landau's claims. In one study run in collaboration with King's College London, CordVision was compared with a labeling program developed by Intel. Five labelers processed 25,744 endoscopy video frames; the gastroenterologists using CordVision worked 6.4 times faster.

The method also proved effective when applied to a test set of 15,521 COVID-19 x-rays. Humans labeled only 5% of the total images, and the final AI labeling model reached an accuracy of 93.7%.

However, Encord is far from the only company that has identified this bottleneck and tried to use AI to ease the labeling process. Established companies in the sector are already commanding huge valuations: Scale AI, for example, reached a $7.3 billion valuation in 2021, well past unicorn status.

Landau acknowledges that the company's biggest competitor is probably Labelbox. When gaming-updates covered Labelbox at the Series A stage, the company had around 50 customers. In January, it closed a $110 million Series D, pushing its valuation to the $1 billion mark.

CordVision is still a very small fish, but it is swimming in a tidal wave of data that needs labeling. Landau says the company is targeting teams that are still using open-source or proprietary tools to create their data labels.

To date, the company has raised $17.1 million in seed and Series A funding since leaving Y Combinator, and has grown from its two founders to a team of 20. Landau says Encord is in no danger of running out of money: the company is not currently fundraising and believes its current growth will be enough to carry the product through commercialization.
