Double-stage discretization approaches for biomarker-based bladder cancer survival modeling
Published Online: Aug 10, 2021
Page range: 29 - 47
Received: Feb 02, 2021
Accepted: Jul 06, 2021
DOI: https://doi.org/10.2478/caim-2021-0003
© 2021 Mauro Nascimben et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Bioinformatic techniques applied to gene expression data require dedicated analysis pipelines to study properties, adaptation, and disease outcomes in a sample population. The present investigation compared the results of four numerical experiments that modeled survival rates from bladder cancer genetic profiles. The results showed that a sequence of two discretization phases outperformed a classic approach that applies a single discretization to the gene expression data. The two-phase analyses consisted either of a primary discretizer followed by a refinement step, or of pre-binning the input values before the main discretization scheme. Among all tests, the best model comprised three steps: a data transformation to compensate for skewness, a discretization phase using the class-attribute interdependence maximization (CAIM) algorithm, and a final classification by voting feature intervals (VFI), a classifier that also optimizes the discrete intervals.
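The discretization step of the best model relies on CAIM. The following is a minimal sketch of the standard greedy CAIM procedure applied to a single continuous feature with class labels. It is an illustration under stated assumptions, not the authors' implementation. The function names `caim_score` and `caim_discretize` are hypothetical, candidate cut points are assumed to be midpoints between adjacent distinct values, and the skewness transform and the VFI classifier are not shown.

```python
import numpy as np


def caim_score(x, y, boundaries):
    """CAIM criterion for a scheme defined by sorted interior cut points.

    CAIM = (1/n) * sum_r max_r**2 / M_r, where n is the number of intervals,
    M_r is the number of samples in interval r, and max_r is the largest
    single-class count in interval r. Empty intervals contribute nothing.
    """
    # Interval index per sample; a value equal to a cut goes to the left interval.
    idx = np.searchsorted(boundaries, x, side="left")
    n_intervals = len(boundaries) + 1
    total = 0.0
    for r in range(n_intervals):
        labels = y[idx == r]
        if labels.size == 0:
            continue
        _, counts = np.unique(labels, return_counts=True)
        total += counts.max() ** 2 / labels.size
    return total / n_intervals


def caim_discretize(x, y):
    """Greedy CAIM: add the cut point that maximizes CAIM, one at a time."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    values = np.unique(x)
    # Assumed candidate cut points: midpoints between adjacent distinct values.
    candidates = list((values[:-1] + values[1:]) / 2.0)
    n_classes = np.unique(y).size
    cuts = []
    global_caim = 0.0
    while candidates:
        best_score, best_cut = -1.0, None
        for c in candidates:
            score = caim_score(x, y, np.sort(cuts + [c]))
            if score > best_score:
                best_score, best_cut = score, c
        # Keep the cut if CAIM improves, or while there are fewer
        # intervals than classes; otherwise stop.
        if best_score > global_caim or len(cuts) + 1 < n_classes:
            cuts.append(best_cut)
            candidates.remove(best_cut)
            global_caim = best_score
        else:
            break
    return np.sort(np.array(cuts))
```

For example, `caim_discretize([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])` returns the single cut point 6.5. Any further cut would lower the CAIM value, so the search stops after one step. The greedy stopping rule keeps the number of intervals small, close to the number of classes. This compactness is the main reason CAIM is often paired with interval-based classifiers.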