Usman Raza
Mar 16, 2019


Hi Rochelle Silva, thanks for your comment. Yes indeed, evaluating on a test set that contains no synthetic data is generally better (though it really depends on the objectives one is trying to achieve). One issue to consider is that k-fold cross-validation becomes harder when you split the data before applying SMOTE-like techniques, because the oversampling then has to be repeated on the training portion of every fold. Apart from all this, the only real test is out-of-sample testing in the real world, which is where the rubber meets the road.
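One way to combine the two ideas, keeping synthetic points out of the test data while still cross-validating, is to oversample inside each fold. The snippet below is only a rough sketch of that idea in plain Python, not a reference implementation: `smote_like` and `oversampled_folds` are hypothetical helper names, the minority class is assumed to be labelled `1`, and the interpolation is a simplification (real SMOTE interpolates toward one of the k nearest minority neighbours rather than a random partner).

```python
import random


def smote_like(minority, n_new, rng):
    """Create n_new synthetic rows by interpolating between random pairs of
    minority rows. Simplification: real SMOTE uses nearest neighbours."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def oversampled_folds(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test) for each fold.

    Oversampling is applied to the training part only, so every test row
    is a real observation (assumes the minority class is labelled 1)."""
    rng = random.Random(seed)
    folds = kfold_indices(len(X), k, rng)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_train = [X[j] for j in train_idx]
        y_train = [y[j] for j in train_idx]

        minority = [row for row, label in zip(X_train, y_train) if label == 1]
        n_new = (len(y_train) - len(minority)) - len(minority)
        if len(minority) >= 2 and n_new > 0:
            new_rows = smote_like(minority, n_new, rng)
            X_train = X_train + new_rows
            y_train = y_train + [1] * len(new_rows)

        yield X_train, y_train, [X[j] for j in test_idx], [y[j] for j in test_idx]


# Toy imbalanced data: 100 rows, 20 of them in the minority class.
X_toy = [[float(i), float(i % 7)] for i in range(100)]
y_toy = [1 if i % 5 == 0 else 0 for i in range(100)]

for X_tr, y_tr, X_te, y_te in oversampled_folds(X_toy, y_toy, k=5):
    print(len(X_tr), y_tr.count(1), y_tr.count(0), len(X_te))
```

The cost is that the oversampling step runs once per fold instead of once overall, which is the extra effort referred to above.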
