{"id":1394,"date":"2020-01-15T17:24:10","date_gmt":"2020-01-15T17:24:10","guid":{"rendered":"https:\/\/www.danielparente.net\/en\/2020\/01\/15\/prostate-cancer-detection-using-deep-convolutional-neural-networks\/"},"modified":"2020-01-15T17:24:10","modified_gmt":"2020-01-15T17:24:10","slug":"prostate-cancer-detection-using-deep-convolutional-neural-networks","status":"publish","type":"post","link":"https:\/\/www.danielparente.net\/en\/2020\/01\/15\/prostate-cancer-detection-using-deep-convolutional-neural-networks\/","title":{"rendered":"Prostate Cancer Detection using Deep Convolutional Neural Networks"},"content":{"rendered":"<div id=\"Sec2-content\">\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec3\">Data<\/h3>\n<p>A cohort of 427 consecutive patients with a PI-RADS score of 3 or higher who underwent biopsy was included. Out of 427 patients, 175 patients had clinically significant prostate cancer and 252 patients did not. A total of 5,832 2D slices of each DWI sequence (e.g., b0) that contained the prostate gland were used as our dataset. We defined patients with a Gleason score of 7 or higher (International Society of Urological Pathology grade group (GG) &gt;= 2) as having clinically significant prostate cancer, and patients with a Gleason score of 6 or lower (GG = 1) or with no cancer (GG = 0) as not having clinically significant prostate cancer.<\/p>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec4\">MRI Acquisition<\/h3>\n<p>The DWI data were acquired between January 2014 and July 2017 using a Philips Achieva 3T whole-body unit MR imaging scanner. 
The transverse plane of DWI sequences was obtained using a single-shot spin-echo echo-planar imaging sequence with four b values (0, 100, 400, and 1000 s mm<span class=\"mathjax-tex\">\\({}^{-2}\\)<\/span>), repetition time (TR) 5000\u20137000 ms, echo time (TE) 61 ms, slice thickness 3 mm, field of view (FOV) 240 mm <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 240 mm, and matrix of 140 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 140.<\/p>\n<p>DWI is an MRI sequence that measures the sensitivity of tissue to Brownian motion, and it has been found to be a promising imaging technique for PCa detection<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Padhani, A. R. et al. Diffusion-weighted magnetic resonance imaging as a cancer biomarker: consensus and recommendations. Neoplasia 11, 102&#x2013;125 (2009).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR32\" id=\"ref-link-section-d40498e812\" target=\"_blank\" rel=\"noopener\">32<\/a><\/sup>. DWI images are usually generated with different b values (0, 100, 400, and 1000 s mm<span class=\"mathjax-tex\">\\({}^{-2}\\)<\/span>), which produce various signal intensities representing the amount of water diffusion in the tissue and can be used to estimate ADC and compute high b-value images (b1600)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"Glaister, J., Cameron, A., Wong, A. &amp; Haider, M.A. Quantitative investigative analysis of tumour separability in the prostate gland using ultra-high b-value computed diffusion imaging. 
In Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE, 420&#x2013; 423 (IEEE, 2012).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR33\" id=\"ref-link-section-d40498e829\" target=\"_blank\" rel=\"noopener\">33<\/a><\/sup>.<\/p>\n<p>In order to use DWI images as input to our deep learning network, we resized all of the DWI slices to 144 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 144 pixels and center-cropped them to 66 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 66 pixels such that the prostate was covered. The CNNs were modified to accept DWI data with 6 channels (ADC, b0, b100, b400, b1000, and b1600) instead of images with 3 channels (red, green, and blue).<\/p>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec5\">Training, validation, and test sets<\/h3>\n<p>We separated the DWI images of the 427 patients into three sets: the training set with 271 patients (3,692 slices), the validation set with 48 patients (654 slices), and the test set with 108 patients (1,486 slices), giving a training\/validation\/test ratio of 64%, 11%, and 25%. The separation procedure was as follows. First, we separated the dataset into two sets, the training\/validation set (75%) and the test set (25%), to maintain a reasonable sample size for the test set. Second, we separated the training\/validation set into a training set (85% of the training\/validation set) and a validation set (15% of the training\/validation set) (Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#Tab1\" target=\"_blank\" rel=\"noopener\">1<\/a>). 
The ratios between the PCa patients and non PCa patients were kept roughly similar across the three sets.<\/p>\n<div class=\"c-article-table\" data-test=\"inline-table\" data-container-section=\"table\" id=\"table-1\">\n<figure><figcaption class=\"c-article-table__figcaption\"><b id=\"Tab1\" data-test=\"table-caption\">Table 1 Number of patients and slices with and without PCa for training, validation, and test sets.<\/b><\/figcaption><\/figure>\n<\/div>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec6\">Data preprocessing<\/h3>\n<p>All DWI images were normalized across the entire dataset using the following function.<\/p>\n<div id=\"Equ1\" class=\"c-article-equation\">\n<p><span class=\"mathjax-tex\">$${X}_{i{\\rm{\\_}}normalized}=\\frac{{X}_{i}-\\mu }{std}$$<\/span><\/p>\n<p>\n                    (1)\n                <\/p>\n<\/div>\n<p> where <span class=\"mathjax-tex\">\\({X}_{i}\\)<\/span> denotes the pixels of an individual MRI slice, <span class=\"mathjax-tex\">\\(\\mu \\)<\/span> is the mean of the dataset, std is the standard deviation of the dataset, and <span class=\"mathjax-tex\">\\({X}_{i{\\rm{\\_}}normalized}\\)<\/span> is the normalized individual MRI slice.<\/p>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec7\">Pipeline<\/h3>\n<p>The proposed pipeline consists of three stages. In the first stage, each DWI slice is classified using five individually trained CNN models. In the second stage, first-order statistical features (e.g., mean, standard deviation, median, etc.) are extracted from the probability sets of the CNN outputs, and important features are selected through a decision tree-based feature selector. In the last stage, a Random Forest classifier is used to classify patients into groups with and without PCa using these first-order statistical features. The Random Forest classifier was trained and fine-tuned on the features extracted from the validation set using 10-fold cross-validation. 
Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#Fig1\" target=\"_blank\" rel=\"noopener\">1<\/a> shows the block diagram of the proposed pipeline.<\/p>\n<div class=\"c-article-section__figure js-c-reading-companion-figures-item\" data-test=\"figure\" data-container-section=\"figure\" id=\"figure-1\">\n<figure><figcaption><b id=\"Fig1\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Figure 1<\/b><\/figcaption><div class=\"c-article-section__figure-content\">\n<div class=\"c-article-section__figure-item\"><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4\/figures\/1\" rel=\"nofollow noopener\" target=\"_blank\"><picture><source type=\"image\/webp\" srcset=\"https:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41598-019-55972-4\/MediaObjects\/41598_2019_55972_Fig1_HTML.png?as=webp\"\/><img decoding=\"async\" aria-describedby=\"figure-1-desc\" src=\"https:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41598-019-55972-4\/MediaObjects\/41598_2019_55972_Fig1_HTML.png\" alt=\"figure1\" loading=\"lazy\"\/><\/picture><\/a><\/div>\n<div class=\"c-article-section__figure-description\" data-test=\"bottom-caption\" id=\"figure-1-desc\">\n<p>Block diagram of the proposed pipeline for prostate cancer detection. The inputs to each CNN are 66 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 66 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 6 (ADC, b0, b100, b400, b1000, b1600) MRI slices. 
The outputs are the slice-level and patient-level results.<\/p>\n<\/div>\n<\/div>\n<\/figure>\n<\/div>\n<h4 class=\"c-article__sub-heading u-h3 c-article__sub-heading--light\" id=\"Sec8\">ResNet<\/h4>\n<p>Since the ResNet architecture has shown promising performance in multiple computer vision tasks<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"He, K., Zhang, X., Ren, S. &amp; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770&#x2013; 778 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR17\" id=\"ref-link-section-d40498e1064\" target=\"_blank\" rel=\"noopener\">17<\/a><\/sup>, we chose it as our base architecture for this research. Each Residual Block consists of convolutional layers<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"LeCun, Y., Bengio, Y. &amp; Hinton, G. Deep learning. nature 521, 436 (2015).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR21\" id=\"ref-link-section-d40498e1068\" target=\"_blank\" rel=\"noopener\">21<\/a><\/sup> and an identity shortcut connection<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"He, K., Zhang, X., Ren, S. &amp; Sun, J. Deep residual learning for image recognition. 
In Proceedings of the IEEE conference on computer vision and pattern recognition, 770&#x2013; 778 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR17\" id=\"ref-link-section-d40498e1072\" target=\"_blank\" rel=\"noopener\">17<\/a><\/sup> that skips those layers, and their outputs are added at the end, as shown in Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#Fig2\" target=\"_blank\" rel=\"noopener\">2-a<\/a>. When input and output dimensions are the same, the identity shortcuts, denoted by x, can be directly applied. The following formula shows the identity mapping process.<\/p>\n<div id=\"Equ2\" class=\"c-article-equation\">\n<p><span class=\"mathjax-tex\">$$y=F\\left(x,{{W}_{i}}\\right)+x$$<\/span><\/p>\n<p>\n                    (2)\n                <\/p>\n<\/div>\n<p> where <span class=\"mathjax-tex\">\\(F(x,{W}_{i})\\)<\/span> is the output from the convolutional layers and x is the input. 
When the dimension of the input differs from that of the output (e.g., at the end of the Residual Block), a linear projection <span class=\"mathjax-tex\">\\({W}_{s}\\)<\/span> changes the dimension of the input to match that of the output, which is defined as: <\/p>\n<div id=\"Equ3\" class=\"c-article-equation\">\n<p><span class=\"mathjax-tex\">$$y=F\\left(x,{{W}_{i}}\\right)+{W}_{s}x.$$<\/span><\/p>\n<p>\n                    (3)\n                <\/p>\n<\/div>\n<div class=\"c-article-section__figure js-c-reading-companion-figures-item\" data-test=\"figure\" data-container-section=\"figure\" id=\"figure-2\">\n<figure><figcaption><b id=\"Fig2\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Figure 2<\/b><\/figcaption><div class=\"c-article-section__figure-content\">\n<div class=\"c-article-section__figure-item\"><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4\/figures\/2\" rel=\"nofollow noopener\" target=\"_blank\"><picture><source type=\"image\/webp\" srcset=\"https:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41598-019-55972-4\/MediaObjects\/41598_2019_55972_Fig2_HTML.png?as=webp\"\/><img decoding=\"async\" aria-describedby=\"figure-2-desc\" src=\"https:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41598-019-55972-4\/MediaObjects\/41598_2019_55972_Fig2_HTML.png\" alt=\"figure2\" loading=\"lazy\"\/><\/picture><\/a><\/div>\n<div class=\"c-article-section__figure-description\" data-test=\"bottom-caption\" id=\"figure-2-desc\">\n<p>The structural difference between the original residual network and the fully pre-activated residual network.<\/p>\n<\/div>\n<\/div>\n<\/figure>\n<\/div>\n<p>To improve the performance of the architecture, we implemented a fully pre-activated residual 
network<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"He, K., Zhang, X., Ren, S. &amp; Sun, J. Identity mappings in deep residual networks. In European conference on computer vision, 630&#x2013; 645 (Springer, 2016).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR34\" id=\"ref-link-section-d40498e1182\" target=\"_blank\" rel=\"noopener\">34<\/a><\/sup>. In the original ResNet, the batch normalization and ReLU activation layers follow the convolution layers, but in the pre-activation ResNet, they come before the convolution layers. The advantage of this structure is that the gradient of a layer does not vanish even when the weights are arbitrarily small<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"He, K., Zhang, X., Ren, S. &amp; Sun, J. Identity mappings in deep residual networks. In European conference on computer vision, 630&#x2013; 645 (Springer, 2016).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR34\" id=\"ref-link-section-d40498e1186\" target=\"_blank\" rel=\"noopener\">34<\/a><\/sup>. Instead of a 2-layer-deep ResNet block, we implemented a 3-layer-deep &#8220;bottleneck&#8221; building block since it significantly reduces training time without sacrificing performance<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"He, K., Zhang, X., Ren, S. &amp; Sun, J. Deep residual learning for image recognition. 
In Proceedings of the IEEE conference on computer vision and pattern recognition, 770&#x2013; 778 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR17\" id=\"ref-link-section-d40498e1190\" target=\"_blank\" rel=\"noopener\">17<\/a><\/sup> (Figure <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#Fig2\" target=\"_blank\" rel=\"noopener\">2-b<\/a>).<\/p>\n<h4 class=\"c-article__sub-heading u-h3 c-article__sub-heading--light\" id=\"Sec9\">CNNs architecture and training<\/h4>\n<p>A 41-layer-deep ResNet was created for the slice-level classification. The architecture is composed of 2D convolutional layers with a 7 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 7 filter followed by a 3 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 3 max pooling layer and residual blocks (Res Blocks). The depth of 41 layers was found to be optimal through a hyper-parameter fine-tuning procedure using the validation set. Since the input images were small (66 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 66 pixels) and the tumorous regions were even smaller (e.g., 4 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 3 pixels), additional ResNet blocks or deeper networks were needed. The first ResNet Block (ResNet Block1 in Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#Tab2\" target=\"_blank\" rel=\"noopener\">2<\/a>) consists of 3-layer bottleneck blocks with 2D CNN layers with filter sizes 64, 64, and 256, stacked 4 times. 
The second ResNet Block (ResNet Block2 in Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#Tab2\" target=\"_blank\" rel=\"noopener\">2<\/a>) consists of 3-layer bottleneck blocks with 2D CNN layers with filter sizes 128, 128, and 512, stacked 9 times. The Res Blocks are followed by a 2 <span class=\"mathjax-tex\">\\(\\times \\)<\/span> 2 2D average pooling layer, a dropout layer, and a 2D fully connected layer with 1000 nodes for two probabilistic outputs. Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#Tab2\" target=\"_blank\" rel=\"noopener\">2<\/a> shows the overview of the proposed CNNs architecture.<\/p>\n<div class=\"c-article-table\" data-test=\"inline-table\" data-container-section=\"table\" id=\"table-2\">\n<figure><figcaption class=\"c-article-table__figcaption\"><b id=\"Tab2\" data-test=\"table-caption\">Table 2 The Architecture of the proposed CNNs.<\/b><\/figcaption><\/figure>\n<\/div>\n<p>Stochastic Gradient Descent<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT&#x2019;2010, 177&#x2013; 186 (Springer, 2010).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR35\" id=\"ref-link-section-d40498e1394\" target=\"_blank\" rel=\"noopener\">35<\/a><\/sup> was used as the optimizer with an initial learning rate of 0.001, which was reduced by a factor of 10 when the model stopped improving over iterations. The model was trained with a batch size of 8. The dropout rate was set to 0.90. We used a weight decay of 0.000001 and a momentum of 0.90. 
Since the dataset is extremely unbalanced, binary cross entropy<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"De Boer, P.-T., Kroese, D. P., Mannor, S. &amp; Rubinstein, R. Y. A tutorial on the cross-entropy method. Annals of operations research 134, 19&#x2013;67 (2005).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR36\" id=\"ref-link-section-d40498e1398\" target=\"_blank\" rel=\"noopener\">36<\/a><\/sup> was used as the loss function.<\/p>\n<h4 class=\"c-article__sub-heading u-h3 c-article__sub-heading--light\" id=\"Sec10\">Stacked generalization<\/h4>\n<p>Due to the randomness in training CNNs (for instance, at the beginning of training, weights are initialized to random values), each CNN may be different despite an identical set of hyper-parameters and input datasets. This means each CNN may capture different features for the patient-level classification. Stacked generalization<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Wolpert, D. H. Stacked generalization. Neural networks 5, 241&#x2013;259 (1992).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR37\" id=\"ref-link-section-d40498e1410\" target=\"_blank\" rel=\"noopener\">37<\/a><\/sup> is an ensemble technique that trains multiple classifiers with the same dataset and makes a final prediction using a combination of individual classifiers\u2019 predictions. Stacked generalization typically yields better classification performance compared to a single classifier<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Wolpert, D. H. Stacked generalization. 
Neural networks 5, 241&#x2013;259 (1992).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR37\" id=\"ref-link-section-d40498e1414\" target=\"_blank\" rel=\"noopener\">37<\/a><\/sup>. We implemented a simple stacked generalization method using five CNNs. The number of stacked CNNs was selected based on the best performance and increasing the number of CNNs did not show improvement on the patient-level performance. Since there is a limited sample size for patient level (48 patients for validation, which was used to train Random Forest classifier for patient-level detection), increasing the number of CNNs, which leads to an increased number of patient-level features (as discussed in the next section), increases the likelihood of overfitting and hence, decreases the model\u2019s robustness<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Vapnik, V.The nature of statistical learning theory (Springer science and business media, 2013).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR38\" id=\"ref-link-section-d40498e1418\" target=\"_blank\" rel=\"noopener\">38<\/a><\/sup>. All the slice-level probabilities generated by the five CNNs were fed into a first-order statistical features extractor to generate one set of features for each patient. In the proposed pipeline, the patient-level performance significantly improved (2-tailed P = 0.048) using five CNNs compared to a single CNN (AUC: 0.84, CI: 0.76\u20130.91, vs. 
AUC: 0.71, CI: 0.61\u20130.81).<\/p>\n<h4 class=\"c-article__sub-heading u-h3 c-article__sub-heading--light\" id=\"Sec11\">First order statistical feature extraction<\/h4>\n<p>Let <span class=\"mathjax-tex\">\\({p}_{ij}\\)<\/span> and <span class=\"mathjax-tex\">\\({n}_{ij}\\)<\/span> be the probabilities of an MRI slice being associated with PCa and non PCa, respectively, where <span class=\"mathjax-tex\">\\(i\\)<\/span> represents one of five individually trained CNNs and <span class=\"mathjax-tex\">\\(j\\)<\/span> represents each MRI slice of a patient. Each CNN produces two probability sets, <span class=\"mathjax-tex\">\\({P}_{i}=\\left\\{{p}_{i1},\\ldots ,{p}_{iN}\\right\\}\\)<\/span> and <span class=\"mathjax-tex\">\\({N}_{i}=\\left\\{{n}_{i1},\\ldots ,{n}_{iN}\\right\\}\\)<\/span>, where <span class=\"mathjax-tex\">\\(N\\)<\/span> is the total number of MRI slices for each patient. Within each probability set, the top five probabilities higher than 0.74 were selected (<span class=\"mathjax-tex\">\\(\\acute{{P}_{i}}\\)<\/span> and <span class=\"mathjax-tex\">\\(\\acute{{N}_{i}}\\)<\/span>). This was done to ensure that less relevant probabilities at the slice level were not used for patient-level classification. The probability cutoff of 0.74 was selected by grid search using the validation set. Next, from the new probability sets, <span class=\"mathjax-tex\">\\(\\acute{{P}_{i}}\\)<\/span> and <span class=\"mathjax-tex\">\\(\\acute{{N}_{i}}\\)<\/span>, the first-order statistical feature set, <span class=\"mathjax-tex\">\\({F}_{i}=\\{{f}_{i1},\\ldots ,{f}_{iK}\\}\\)<\/span>, where K represents the total number of statistical features, was extracted for each patient. 
Next, the important features, <span class=\"mathjax-tex\">\\(\\acute{{F}_{i}}\\)<\/span>, were selected by a decision tree-based feature selector<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\" title=\"Saeys, Y., Inza, I. &amp; Larra&#xF1;aga, P. 
A review of feature selection techniques in bioinformatics. bioinformatics 23, 2507&#x2013;2517 (2007).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR39\" id=\"ref-link-section-d40498e1663\" target=\"_blank\" rel=\"noopener\">39<\/a><\/sup>. The final feature set was constructed by combining the important features, <span class=\"mathjax-tex\">\\(\\acute{{F}_{i}}\\)<\/span>, for all five CNNs, where <span class=\"mathjax-tex\">\\(F=\\{\\acute{{F}_{1}},\\ldots ,\\acute{{F}_{5}}\\}\\)<\/span>.<\/p>\n<p>We extracted nine first-order features from each probability set: the mean, standard deviation, variance, median, sum, minimum (only from the non PCa class), maximum (only from the PCa class), skewness<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\" title=\"Kim, H.-Y. Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis. Restorative dentistry &amp; endodontics 38, 52&#x2013;54 (2013).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR40\" id=\"ref-link-section-d40498e1721\" target=\"_blank\" rel=\"noopener\">40<\/a><\/sup>, kurtosis<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\" title=\"Kim, H.-Y. Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis. Restorative dentistry &amp; endodontics 38, 52&#x2013;54 (2013).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR40\" id=\"ref-link-section-d40498e1725\" target=\"_blank\" rel=\"noopener\">40<\/a><\/sup>, and the range from the minimum to the maximum. This produced 90 features for each patient (9 features for the PCa class and 9 features for the non PCa class for each CNN). 
We selected the 26 best features using the decision tree-based feature selector<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\" title=\"Saeys, Y., Inza, I. &amp; Larra&#xF1;aga, P. A review of feature selection techniques in bioinformatics. bioinformatics 23, 2507&#x2013;2517 (2007).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR39\" id=\"ref-link-section-d40498e1729\" target=\"_blank\" rel=\"noopener\">39<\/a><\/sup>. The decision tree-based feature selector was fine-tuned and trained with 10-fold cross-validation using the validation set (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#Fig3\" target=\"_blank\" rel=\"noopener\">3<\/a>).<\/p>\n<div class=\"c-article-section__figure js-c-reading-companion-figures-item\" data-test=\"figure\" data-container-section=\"figure\" id=\"figure-3\">\n<figure><figcaption><b id=\"Fig3\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Figure 3<\/b><\/figcaption><div class=\"c-article-section__figure-content\">\n<div class=\"c-article-section__figure-item\"><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4\/figures\/3\" rel=\"nofollow noopener\" target=\"_blank\"><picture><source type=\"image\/webp\" srcset=\"https:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41598-019-55972-4\/MediaObjects\/41598_2019_55972_Fig3_HTML.png?as=webp\"\/><img decoding=\"async\" aria-describedby=\"figure-3-desc\" 
src=\"https:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41598-019-55972-4\/MediaObjects\/41598_2019_55972_Fig3_HTML.png\" alt=\"figure3\" loading=\"lazy\"\/><\/picture><\/a><\/div>\n<div class=\"c-article-section__figure-description\" data-test=\"bottom-caption\" id=\"figure-3-desc\">\n<p>Block diagram of the proposed first-order statistical feature extractor. PCa Set: probabilistic output set from each CNN which is associated with PCa class. Non PCa Set: probabilistic output set from each CNN which is associated with non PCa class.<\/p>\n<\/div>\n<\/div>\n<\/figure>\n<\/div>\n<p>Once first-order statistical features were extracted for each patient, a Random Forest classifier<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 30\" title=\"Breiman, L. Random forests. Machine learning 45, 5&#x2013;32 (2001).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR30\" id=\"ref-link-section-d40498e1746\" target=\"_blank\" rel=\"noopener\">30<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Nguyen, C., Wang, Y. &amp; Nguyen, H. N. Random forest classifier combined with feature selection for breast cancer diagnosis and prognostic. J. of Biomed. Sci. and Eng. 6, 551 (2013).\" href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4#ref-CR31\" id=\"ref-link-section-d40498e1749\" target=\"_blank\" rel=\"noopener\">31<\/a><\/sup> was trained using the validation set and tested on the test set for patient-level classification.<\/p>\n<h4 class=\"c-article__sub-heading u-h3 c-article__sub-heading--light\" id=\"Sec12\">Computational time<\/h4>\n<p>The CNNs were trained using one Nvidia Titan X GPU, an 8-core Intel i7 CPU, and 32 GB of memory. 
It took 6 hours to train all five CNNs with up to 100 iterations, less than 10 seconds to train the Random Forest classifier, and less than 1 minute to test all 108 patients.<\/p>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec13\">Ethics approval and consent to participate<\/h3>\n<p>The Sunnybrook Health Sciences Centre Research Ethics Boards approved this retrospective single institution study and waived the requirement for informed consent.<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.nature.com\/articles\/s41598-019-55972-4\" target=\"_blank\" rel=\"noopener\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data A cohort of 427 consecutive patients with a PI-RADS score of 3 or higher who underwent biopsy were included. Out of 427 patients, 175 patients had clinically significant prostate cancer and 252 patients did not. A total of 5,832 2D slices of each DWI sequence (e.g., b0) which contained prostate gland were used [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1395,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","jetpack_post_was_ever_published":false},"categories":[94,92,98],"tags":[],"class_list":["post-1394","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-data-science","category-machine-learning"],"blocksy_meta":[],"jetpack_featured_media_url":"https:\/\/e928cfdc7rs.exactdn.com\/info\/uploads\/sites\/3\/2020\/01\/Prostate-Cancer-Detection-using-Deep-Convolutional-Neural-Networks.png?strip=all","jetpack_shortlink":"https:\/\/wp.me\/p2TFCd-mu","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/posts\/1394","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/w
ww.danielparente.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/comments?post=1394"}],"version-history":[{"count":0,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/posts\/1394\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/media\/1395"}],"wp:attachment":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/media?parent=1394"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/categories?post=1394"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/tags?post=1394"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}