Everybody loves gradient boosting. It delivers strong results in most real-world applications, and the phrase "stacking xgboost" has become a meme. Usually this means boosting decision trees, trained on CPUs and machines with plenty of RAM. Recently, many people have bought graphics cards and wondered: why not run boosting on them too, given that neural networks train much faster on GPUs?
Unfortunately, it's not that simple. GPU implementations of boosting do exist, but there are many nuances regarding how useful and convenient they are. Do we need a graphics card to train gradient boosting models in 2017-2018? Let's try to figure it out together.