Oleg Zabluda's blog
Friday, November 11, 2016
To better understand and combat [cancer], medical researchers rely on cancer registry programs—a national network of organizations that systematically collect demographic and clinical information related to the diagnosis, treatment, and history of cancer incidence in the United States. [...] Much of this data is drawn from electronic, text-based clinical reports that must be manually curated—a time-intensive process—before it can be used in research. For example, cancer pathology reports, text documents that describe cancerous tissue in detail, must be individually read and annotated by experts before becoming part of a cancer registry. With millions of new reports being produced each year [...] “The manual model is not scalable,”
Since 2014 Tourassi has led a team focused on creating software that can quickly identify valuable information in cancer reports, [...] Using a dataset composed of 1,976 pathology reports provided [...] Tourassi’s team trained a deep-learning algorithm to carry out two different but closely related information-extraction tasks. In the first task the algorithm scanned each report to identify the primary location of the cancer. In the second task the algorithm identified the cancer site’s laterality—or on which side of the body the cancer was located.

By setting up a neural network designed to exploit the related information shared by the two tasks, an arrangement known as multitask learning, the team found the algorithm performed substantially better than competing methods.
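The multitask arrangement described above amounts to a single shared representation feeding two task-specific output heads, so that gradients from both the primary-site task and the laterality task shape the same encoder. The sketch below illustrates this structure in plain NumPy with made-up dimensions (300-dim document features, 12 primary sites, 3 laterality labels); none of the sizes or names come from the team's actual system.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative dimensions (assumptions, not from the article):
# 300-dim report vectors, 64 shared hidden units,
# 12 candidate primary sites, 3 laterality labels.
n_features, n_hidden = 300, 64
n_sites, n_lateralities = 12, 3

# Shared encoder parameters, updated by losses from BOTH tasks
W_shared = rng.normal(0, 0.1, (n_features, n_hidden))
# Task-specific output heads
W_site = rng.normal(0, 0.1, (n_hidden, n_sites))
W_lat = rng.normal(0, 0.1, (n_hidden, n_lateralities))

def forward(x):
    """One forward pass: a shared hidden layer feeds two task heads."""
    h = np.tanh(x @ W_shared)              # representation shared across tasks
    return softmax(h @ W_site), softmax(h @ W_lat)

# A batch of 5 random "report" vectors standing in for encoded text
p_site, p_lat = forward(rng.normal(size=(5, n_features)))
```

During training, the two cross-entropy losses would be summed (possibly with weights) and backpropagated together, which is what lets the laterality task inform the primary-site task and vice versa.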



