This repository was archived by the owner on Jun 22, 2022. It is now read-only.

LightGBM on dimension reduced dataset

Kamil A. Kaczmarek edited this page Jul 10, 2018 · 4 revisions

whale 🐳

Feature Extraction

  • truncated svd projection
  truncated_svd__n_components: 50
  truncated_svd__n_iter: 10
  • pca projection
  pca__n_components: 100
  • fast ica projection
  fast_ica__n_components: 15
  • factor analysis
  factor_analysis__n_components: 50
  • gaussian random projection
  gaussian_random_projection__n_components: 50
  gaussian_projection__eps: 0.1

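Note: the eps parameter turned out not to matter. Values of 0.01, 0.1, and 1.0 gave exactly the same results. This is the expected behavior if the projection is scikit-learn's GaussianRandomProjection, which uses eps only when n_components is set to 'auto'; here n_components is fixed at 50.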
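The observation above can be checked with a minimal sketch, assuming the projection is scikit-learn's GaussianRandomProjection. The synthetic data, the fixed random_state, and the eps values (chosen inside (0, 1)) are illustrative assumptions, not taken from the original pipeline.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

# Synthetic stand-in data (assumption): 100 rows, 200 features.
rng = np.random.RandomState(0)
X = rng.rand(100, 200)


def project(eps):
    """Project X to 50 dimensions with a given eps and a fixed seed."""
    grp = GaussianRandomProjection(n_components=50, eps=eps, random_state=42)
    return grp.fit_transform(X)


# With n_components given as an integer, eps is not used to size the
# projection, so every run with the same seed yields the same output.
outputs = [project(eps) for eps in (0.01, 0.1, 0.5)]
eps_is_ignored = all(np.array_equal(outputs[0], out) for out in outputs[1:])
print(eps_is_ignored)
```

If n_components were set to 'auto' instead, eps would determine the target dimensionality through the Johnson-Lindenstrauss bound, and the results would differ.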

  • sparse random projection
  sparse_random_projection__n_components: 50
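The settings listed above could be expressed with scikit-learn estimators roughly as follows. This is a sketch under assumptions rather than the project's actual code: the mapping from config keys to scikit-learn classes, the SEED value, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis, FastICA, TruncatedSVD
from sklearn.random_projection import (
    GaussianRandomProjection,
    SparseRandomProjection,
)

SEED = 0  # assumption: the wiki does not state a seed

# One estimator per bullet, using the component counts from the list above.
projections = {
    "truncated_svd": TruncatedSVD(n_components=50, n_iter=10, random_state=SEED),
    "pca": PCA(n_components=100, random_state=SEED),
    "fast_ica": FastICA(n_components=15, random_state=SEED),
    "factor_analysis": FactorAnalysis(n_components=50, random_state=SEED),
    "gaussian_random_projection": GaussianRandomProjection(
        n_components=50, eps=0.1, random_state=SEED
    ),
    "sparse_random_projection": SparseRandomProjection(
        n_components=50, random_state=SEED
    ),
}

# Synthetic stand-in data (assumption): 200 rows, 300 features.
X = np.random.RandomState(SEED).rand(200, 300)

# Fit each projection and keep the reduced representation.
projected = {name: model.fit_transform(X) for name, model in projections.items()}
for name, Z in projected.items():
    print(name, Z.shape)
```

Each entry of `projected` has the same number of rows as X and as many columns as the configured n_components. FastICA may emit convergence warnings on random data like this.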

Model and results

| model | CV | LB 🏆 |
|---|---|---|
| lightGBM truncated svd | 1.56 | |
| lightGBM pca | 1.55 | |
| lightGBM fast ica | 1.57 | |
| lightGBM factor analysis | 1.51 | |
| lightGBM gaussian random projection | 1.63 | |
| lightGBM sparse random projection | 1.47 | |
| lightGBM projections (all) | 1.47 | |
| lightGBM projections best (sparse random projection + factor analysis + truncated svd + fast-ica) | 1.448 | |
| lightGBM projections second best (sparse random projection) | 1.452 | |
| lightGBM raw + projections (second best) | 1.393 | |
| lightGBM projections (second best) + aggregations | 1.345 | |
| lightGBM raw + projections (second best) + aggregations | 1.3416 | 1.41 🚀 |
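The best-scoring feature set combines raw features, the sparse random projection, and row aggregations. A minimal sketch of that combination follows. The wiki does not list which aggregations were used, so the statistics in `row_aggregations`, the helper names, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection


def row_aggregations(X):
    """Hypothetical per-row summary statistics (the exact set is an assumption)."""
    return np.column_stack([
        X.mean(axis=1),
        X.std(axis=1),
        X.min(axis=1),
        X.max(axis=1),
        (X != 0).sum(axis=1),  # count of non-zero entries per row
    ])


def build_features(X_train, X_test, n_components=50, seed=0):
    """Stack raw features, a sparse random projection, and row aggregations."""
    proj = SparseRandomProjection(
        n_components=n_components, dense_output=True, random_state=seed
    )
    proj.fit(X_train)  # fit on training rows only, then reuse for test rows

    def combine(X):
        return np.hstack([X, proj.transform(X), row_aggregations(X)])

    return combine(X_train), combine(X_test)


# Synthetic stand-in data (assumption): 120 raw features.
rng = np.random.RandomState(0)
X_train, X_test = rng.rand(80, 120), rng.rand(20, 120)
F_train, F_test = build_features(X_train, X_test)
print(F_train.shape, F_test.shape)
```

The resulting matrices can be passed to any regressor, for example a LightGBM model evaluated with cross-validation as in the table above.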

Pipeline diagram

pipeline-solution-4

Open solutions

  1. honey bee 🐝 LightGBM and 5fold CV
  2. beetle 🪲 LightGBM on binarized dataset
  3. dromedary camel 🐪 LightGBM with row aggregations
  4. whale 🐳 LightGBM on dimension reduced dataset
  5. water buffalo 🐃 Exploring various dimension reduction techniques
  6. blowfish 🐡 bucketing row aggregations
