Quote: Originally posted by OUrocketman  | This is an interesting discussion indeed.
Carl, interesting work using machine learning in an effort to elicit the most probable predictors of future success!
At a high level, I think it's interesting that with a large number of indicators, the importance falls on the indicators themselves. That seems to support the overall
notion that when we down-select the indicators, we are beginning to sniff out interesting features that, while they may not be easily intelligible by human
inspection, are in fact there. Once this is accomplished, it makes intuitive sense that the big money has knowledge of this as well, and
seeks to make more money in the future in the most efficient way possible, based on the underlying features that tend to move a market; hence the
higher weight on performance.
So it seems like, at a high level, your feature extraction verifies Peter's two-pass approach in some way. I'm curious to know: have you considered
ASTAB-C, RSTAB-C, WFE, and OOS fitness as features as well, and did they just not make the cut on the importance scale in your screenshot above?
Also, have you checked out Microsoft's open-source LightGBM (Light Gradient Boosting Machine)? It's rumored to outperform random forest, but I'll confess I'm not
currently familiar enough with the topic to assess it; that's a longer-term goal of mine.
MS stuff is here: https://lightgbm.readthedocs.io/en/latest/ |
Thanks OUrocketman,
There are a lot of boosting models: LightGBM, CatBoost, XGBoost.
My teacher always says: try an ensemble of models to see what works best on your data set.
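For what it's worth, here is a minimal sketch of that kind of bake-off, assuming scikit-learn plus the lightgbm, xgboost and catboost packages are installed; the synthetic data and all parameters are placeholders, not a tuned setup:

```python
# A quick bake-off of tree ensembles on one data set, scored with
# 5-fold cross-validation. The synthetic data below is a stand-in
# for real GSB build/validation metrics (hypothetical setup).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

models = {
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "lightgbm": LGBMClassifier(n_estimators=300, random_state=42),
    "xgboost": XGBClassifier(n_estimators=300, random_state=42),
    "catboost": CatBoostClassifier(iterations=300, verbose=0, random_state=42),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name:>13}: AUC {scores.mean():.3f} +/- {scores.std():.3f}")
```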
I have used the WF metrics in previous data analysis projects.
The issue with WF is that I only have a couple of hundred rows of data, while it is easy to get 100k rows of data from build and validation results.
By using sampling I divide the GSB test data into an in-sample and an out-of-sample part, and that large number of data points makes the analysis more reliable.
I will look for these old WF test results and will let you know.
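As a rough illustration of that kind of split (a generic random hold-out, not necessarily the exact GSB sampling procedure):

```python
# Stand-in for the ~100k rows of GSB build/validation results
# (hypothetical data; in practice these rows come from the tester).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

results = pd.DataFrame(np.random.rand(100_000, 5),
                       columns=[f"metric_{i}" for i in range(5)])

# Random 70/30 split into in-sample and out-of-sample parts.
in_sample, out_of_sample = train_test_split(results, test_size=0.3,
                                            random_state=42)
print(len(in_sample), "in-sample rows /", len(out_of_sample), "out-of-sample rows")
```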
Very high parameter stability like 100% seems very good at first glance, but it isn't always: 100% stability could also mean that only one
particular set of parameter values gives the best result, and a small change in a parameter value might cause the results to collapse.
So 100% parameter stability doesn't always mean the strategy is robust (I think...).
I prefer 50% to 80% GSB parameter stability, but always do an extensive rolling and anchored WF in TS and EWFO to be sure.
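A toy way to picture that fragility (purely illustrative, nothing to do with GSB's actual stability calculation): compare the best parameter value's score with its immediate neighbours in a sweep. A sharp isolated peak is the "100% stable but fragile" case; a broad plateau is the robust one:

```python
import numpy as np

def neighbourhood_robustness(scores: np.ndarray) -> float:
    """Ratio of the mean neighbour score to the peak score.
    Near 1.0 -> plateau (robust); near 0 -> isolated spike (fragile).
    Toy illustration only, not GSB's stability metric."""
    best = int(np.argmax(scores))
    neighbours = [scores[i] for i in (best - 1, best + 1)
                  if 0 <= i < len(scores)]
    return float(np.mean(neighbours) / scores[best])

# A sharp spike: one parameter value works, its neighbours collapse.
spike = np.array([0.1, 0.1, 0.1, 1.0, 0.1, 0.1])
# A plateau: a whole region of parameter values performs similarly.
plateau = np.array([0.1, 0.8, 0.9, 1.0, 0.9, 0.8])

print(f"spike robustness:   {neighbourhood_robustness(spike):.2f}")    # ~0.10
print(f"plateau robustness: {neighbourhood_robustness(plateau):.2f}")  # ~0.90
```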
With all this said, not all the strategies I selected with this process were profitable going forward.