Publication: A Hybrid Statistical Data Pre-processing Approach for Language-Independent Text Classification.