Wednesday, February 3, 2016

Decision trees and overfitting

Hey!
So I am about to watch the second video in CMU's ML course. I am hoping that, unlike earlier attempts, I'll find the time to complete the full course this time. The title of the video is "Decision trees, overfitting and probability". So let's see what I remember from before:


  1. Overfitting: a phenomenon which occurs when your classifier performs very well on your training data but fails to perform well on unseen (testing) data.
  2. Why does it happen? Because your training set is too limited?? uhh.... not able to explain this properly... let's see what Tom has to say about it.
  3. How to mitigate it? Use regularization wherever applicable; L1 and L2 regularization can be used in some linear classifiers like logistic regression (a minimal sketch follows this list). A great line from a great man: the amount of training data should never be a problem (in case there is more of it!!). All in all, there is a lot that I need to learn about overfitting.
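
A minimal sketch of the regularization point with scikit-learn (the toy dataset and the C values here are my own placeholders, not from the course). In scikit-learn, C is the inverse of the regularization strength, so a smaller C means stronger L2 regularization and, hopefully, a smaller train/test gap:

```python
# Hypothetical toy example: L2-regularized logistic regression on synthetic
# data. Smaller C = stronger regularization = less room to overfit.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in [100.0, 1.0, 0.01]:
    clf = LogisticRegression(penalty="l2", C=C).fit(X_train, y_train)
    print("C=%g: train=%.2f, test=%.2f"
          % (C, clf.score(X_train, y_train), clf.score(X_test, y_test)))
```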
So let's begin watching the video and taking down notes; hopefully I'll be a bit clearer about the concept by the end.

Comments and questions:

  • Everything in machine learning depends on the assumption that the testing data is randomly drawn from the same distribution as the training data, and is therefore a good approximation of the unseen, unlabelled data.
  • Class imbalance hasn't been touched yet! Though come to think of it, does upsampling even help in the case of decision trees? (A rough sketch of the question follows this list.)
  • Short answer for why a short tree is preferred: Occam's razor. We hope that it means fewer chances of overfitting (see the toy illustration after this list).
  • Have to watch/read again about the guarantees she was talking about in supervised learning.... feeling extremely sleepy right now...... have to do it sometime!!
  • The prof says that post-pruning is better than pre-pruning because we evaluate more trees. Yup. It may be more computationally intensive, and hence might not be practical on large datasets if you are constrained by limited time. Should look at some literature/practical open source software on how it is done (one concrete recipe is sketched after this list). Might even look at Amazon's random forest implementation, a thing which I won't (and can't) share with you guys ;)
  • Reasons for overfitting: noise and statistical coincidences. 
  • I remember someone saying that a decision tree is polynomial in terms of the input attributes.... what is the meaning and significance of this statement?
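
On the class imbalance question above, here is a rough sketch of what upsampling would look like (the synthetic data and numbers are mine, not from the lecture): duplicate minority-class rows until the classes balance, then fit the tree. scikit-learn's class_weight="balanced" gets a similar effect by reweighting instead of duplicating:

```python
# Hypothetical sketch: upsample the minority class before fitting a tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = (X[:, 0] > 1.2).astype(int)            # roughly 11% positives: imbalanced

# Resample the minority class (with replacement) up to the majority count.
X_up, y_up = resample(X[y == 1], y[y == 1], replace=True,
                      n_samples=int((y == 0).sum()), random_state=0)
X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])

tree_up = DecisionTreeClassifier(random_state=0).fit(X_bal, y_bal)
# The reweighting alternative, without duplicating any rows:
tree_w = DecisionTreeClassifier(class_weight="balanced",
                                random_state=0).fit(X, y)
```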
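
And a toy illustration of the Occam's razor bullet, again on made-up data with deliberately noisy labels: a fully grown tree memorizes the noise, while a depth-limited one usually holds up better on the test split:

```python
# Hypothetical illustration: short trees vs. fully grown trees on noisy labels.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2,
                           random_state=0)     # flip_y injects label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in [None, 3]:                    # None = grow until leaves are pure
    t = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print("max_depth=%s: train=%.2f, test=%.2f"
          % (depth, t.score(X_tr, y_tr), t.score(X_te, y_te)))
```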
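
Finally, one concrete post-pruning recipe (this is cost-complexity pruning as exposed by newer scikit-learn versions, not necessarily what the lecture or Amazon uses): grow the full tree, enumerate the candidate pruning strengths, fit one tree per alpha, and keep whichever scores best on held-out data. That is the "evaluate more trees" idea made literal:

```python
# Hypothetical sketch: cost-complexity post-pruning with scikit-learn (>= 0.22),
# picking the pruned tree that does best on a validation split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Candidate alphas: each one corresponds to pruning away some subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
trees = [DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_tr, y_tr)
         for a in path.ccp_alphas]
best = max(trees, key=lambda t: t.score(X_val, y_val))
print("best depth: %d, validation accuracy: %.2f"
      % (best.get_depth(), best.score(X_val, y_val)))
```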
