admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
Quote: Originally posted by cyrus68  | The video proposes a useful test that I may adopt because it is quite fast. However, I would like to note that it is a test of the viability of a
particular strategy applied to a given symbol. For example, in the case of ES, if you selected 15-minute data for the primary symbol, different
secondary datasets, 5 indicators and 2 operators – this would constitute a different strategy. And let’s not forget different historical periods over
which the test may be carried out. The test may produce different results compared with your first strategy.
One thing I noticed is that, for ES, you set the max # of unique systems at 600. Yet at the end you had 1735. Either GSB didn’t stop or you let it run
further. I have the same problem with max # completed. Typically, I set it at 5 million. So, if GSB is processing 35,000 a minute, it should stop
after 2 hours and 20 minutes. That doesn’t happen. So, I have to stop it manually.
Currently I use a rather biased, but simple method, to gauge the overall viability of a strategy. I look at the proportion of WF results with PAS
scores above 50. If, out of 60 WF, there are 40 with scores above 50 - and most are bunched closer to 100 – that is a good sign. If there are only 10
or 15 with acceptable scores, that is not a good sign. The test is biased because WF is carried out on systems that are pre-selected on the basis of
attractive metrics.
|
The reason for the overshoot in systems generated is that GSB stops the workers once this number is reached, but the workers then finish off their current
population and generations. This doesn't matter in the least to me. You can always delete all systems after the 600 or so you chose. However, the more
systems the better. Using fewer workers will give less overshoot. I typically used 10 workers for these tests.
It would be good to compare the two methods. This is all fairly unexplored territory, but worth investigating.
|
|
|
cyrus68
Member
 
Posts: 171
Registered: 5-6-2017
Member Is Offline
Mood: No Mood
|
|
Thanks Carl.
I didn't notice the bad link. I copied and pasted from my earlier post.
Your link is correct.
|
|
|
edgetrader
Junior Member

Posts: 24
Registered: 16-5-2018
Member Is Offline
Mood: No Mood
|
|
Other metrics can of course be added and tried. I like the way Peter compares IS and OOS in his videos. He's looking at the metrics that matter for
trading. Net profit, average trade, and profit factor are also what I would look at when evaluating a system.
The only thing I'd do differently is to divide OOS by IS (not the other way round) before forming the overall average. This would output percentages, like
"OOS is 60% as good as IS", and a higher number would be better.
A great benefit of Peter's approach is that it includes market direction, trading sessions and other settings. For example VXX is a market that trends
down. Long-only systems will be much harder to make than short-only systems. Since Peter's IS and OOS systems share the same settings, like
short-only, the metrics of the IS systems are indeed the best benchmark to compare metrics of OOS systems. Looking at OOS just by itself with no
suitable benchmark wouldn't account for things like short bias.
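A minimal sketch of the OOS-divided-by-IS idea, in Python. The metric names and all figures are made up for illustration and are not GSB output; the only point is that each ratio reads as "OOS is x% as good as IS" and that the ratios are averaged afterwards.

```python
from statistics import mean

def oos_to_is_ratios(is_metrics, oos_metrics):
    """Return OOS/IS as a percentage for each metric present in both dicts."""
    return {
        name: 100.0 * oos_metrics[name] / is_metrics[name]
        for name in is_metrics
        if name in oos_metrics and is_metrics[name] != 0
    }

# Hypothetical averages over all systems in one run (not real results).
is_avg = {"net_profit": 40000.0, "avg_trade": 100.0, "profit_factor": 2.0}
oos_avg = {"net_profit": 24000.0, "avg_trade": 70.0, "profit_factor": 1.5}

ratios = oos_to_is_ratios(is_avg, oos_avg)
overall = mean(ratios.values())  # average of the per-metric percentages
print(ratios)
print(f"OOS is {overall:.1f}% as good as IS")
```

Averaging the ratios, rather than taking the ratio of averages, keeps metrics with very different scales (dollars versus profit factor) from dominating each other.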
|
|
|
cyrus68
Member
 
Posts: 171
Registered: 5-6-2017
Member Is Offline
Mood: No Mood
|
|
Just to explain, for everybody's benefit, the t test is a statistically sound method of summarising system quality. It is done on the P/L, by trade,
of the system. It allows us to rank systems by their overall quality. We would also be able to see the average value for all systems and compare the
IS and OOS averages. Its primary use is to evaluate OOS systems.
The Wilcoxon test is a measure of the quality of a strategy. It is calculated on paired values of IS and OOS systems. For example, using NP or PF. It
is also a statistically sound test.
In my view, these tests would greatly enhance GSB's capabilities.
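As a rough illustration of the t test on P/L by trade, here is a small Python sketch using only the standard library. It computes the one-sample t statistic (mean P/L divided by its standard error) and ranks systems by it. The trade lists are invented for the example, and turning the statistic into a p-value would still need a t distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(trade_pnl):
    """One-sample t statistic testing whether mean P/L per trade differs from zero."""
    n = len(trade_pnl)
    if n < 2:
        raise ValueError("need at least two trades")
    sd = stdev(trade_pnl)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("P/L has zero variance")
    return mean(trade_pnl) / (sd / sqrt(n))

# Hypothetical per-trade P/L in dollars for two systems (illustrative only).
systems = {
    "system_a": [120.0, -80.0, 200.0, 50.0, -30.0, 160.0, 90.0, -60.0],
    "system_b": [300.0, -250.0, 400.0, -350.0, 280.0, -200.0, 310.0, -270.0],
}

# Rank systems from highest to lowest t statistic.
ranked = sorted(systems, key=lambda name: t_statistic(systems[name]), reverse=True)
for name in ranked:
    print(name, round(t_statistic(systems[name]), 2))
```

A system with a modest but consistent average trade can rank above one with bigger but erratic trades, which is the sense in which the statistic summarises quality.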
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
Quote: Originally posted by cyrus68  | Just to explain, for everybody's benefit, the t test is a statistically sound method of summarising system quality. It is done on the P/L, by trade,
of the system. It allows us to rank systems by their overall quality. We would also be able to see the average value for all systems and compare the
IS and OOS averages. Its primary use is to evaluate OOS systems.
The Wilcoxon test is a measure of the quality of a strategy. It is calculated on paired values of IS and OOS systems. For example, using NP or PF. It
is also a statistically sound test.
In my view, these tests would greatly enhance GSB's capabilities. |
I'm going to chat to the programmer about this. This last week I unexpectedly went to New Zealand for family matters, and have done just the bare
minimum. Hence no great changes recently. There is a new build, but it just has some bug fixes. Still waiting on the version with multi time frame /
symbol support for the same system.
|
|
|
cyrus68
Member
 
Posts: 171
Registered: 5-6-2017
Member Is Offline
Mood: No Mood
|
|
On Sunday, while imbibing my favourite scotch, a thought occurred to me. Implementing the Wilcoxon test would be a real pain in the butt for the
programmer. There is a simpler way. Why not enable exporting the results of running a strategy (the metrics panel) as a csv file? We can easily import
the data into Excel and carry out the Wilcoxon test in Excel, or in any number of statistical packages.
My main objective is to do the test on the NP of IS and OOS. But you could also do the test on PF or anything else. The whole process of exporting the
data and running the test could be done in 5 minutes. In addition, you can do a graph of the data.
As for the t test, it should be relatively simple to do the programming.
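For anyone who prefers a script to Excel, the export-then-test workflow could look roughly like the Python sketch below. The column names `np_is` and `np_oos` are assumptions (the real export headers may differ), the numbers are invented, and the inline text stands in for a file on disk. It computes the Wilcoxon signed-rank sums by hand; a statistical package would add the p-value.

```python
import csv
import io

def signed_rank_sums(pairs):
    """Wilcoxon signed-rank sums (W+, W-) for paired (is_value, oos_value) data."""
    diffs = [oos - is_ for is_, oos in pairs if oos != is_]  # zero differences are dropped
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Tied absolute differences share the average of their rank positions.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return w_plus, w_minus

# Stand-in for an exported metrics file; the column names are assumptions.
EXPORT = """system,np_is,np_oos
1,42000,30000
2,38000,35000
3,51000,29000
4,27000,28000
5,33000,21000
6,45000,40000
"""

rows = csv.DictReader(io.StringIO(EXPORT))
pairs = [(float(r["np_is"]), float(r["np_oos"])) for r in rows]
w_plus, w_minus = signed_rank_sums(pairs)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

When W+ is much smaller than W-, OOS net profit is systematically below IS; the smaller of the two sums is what gets compared against a critical value.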
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
Quote: Originally posted by cyrus68  | On Sunday, while imbibing my favourite scotch, a thought occurred to me. Implementing the Wilcoxon test would be a real pain in the butt for the
programmer. There is a simpler way. Why not enable exporting the results of running a strategy (the metrics panel) as a csv file? We can easily import
the data into Excel and carry out the Wilcoxon test in Excel, or in any number of statistical packages.
My main objective is to do the test on the NP of IS and OOS. But you could also do the test on PF or anything else. The whole process of exporting the
data and running the test could be done in 5 minutes. In addition, you can do a graph of the data.
As for the t test, it should be relatively simple to do the programming. |
I think Excel export is a great idea.
|
|
|
Carl
Member
 
Posts: 342
Registered: 10-5-2017
Member Is Offline
Mood: No Mood
|
|
In episode 143 of the Better System Trader podcast Bruce Vanstone mentions a couple of other statistical methods to quantify system performance.
http://bettersystemtrader.com/the-dna-approach-to-trading-wi...
Anova
T-test
Ledoit-Wolf
Diebold-Mariano
|
|
|
cyrus68
Member
 
Posts: 171
Registered: 5-6-2017
Member Is Offline
Mood: No Mood
|
|
Hi Carl
I haven't listened to the podcast yet. It looks to be useful.
The statistical methods that you mentioned are all available in a number of packages.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
This short video is to show how to get session times on your chosen market.
This is critical for all markets. However, for S&P 500, S&P 400, Nasdaq, mini Dow and Russell 2000 you can just use the equivalent of 830 to 1500 Central USA
time (don't use the last 15 minutes of the day).
Also explained is a little on exchange time versus local time. Most users are confused about this. You should be using local time, because exchange time
will not allow other symbols that are in different time zones to be added to your chart.
If you use GSB-supplied data (typically Central USA time) and your own local time is not Central USA, you MUST NEVER mix different local times.
Otherwise the data will be forward or backward looking by the difference between the local times.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
The above video is now on youtube.
https://youtu.be/NFC7ego_Y70
The market validation video has been slightly tweaked, though the last bit on crude oil still could have been improved.
https://youtu.be/iG7MVOC56zk
If you are new to this thread, both videos are essential to view. The first one is simple and short.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
Export of system metrics can now be done, but the requested newer system quality metrics have not been done yet. The priority is multi time frame / data
support.
Get the exe in the member only section of beta builds.
http://trademaid.info/forum/viewthread.php?tid=39
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
I did some interesting market validation today on sqqq 15 minutes (short only).
Market validation failed quite badly, though some of my systems on sqqq (but not all of them) have gone well.
This implies that failing market validation doesn't mean GSB can't make systems on the market, but that they are less likely to work out of sample.
My gut reaction is that market validation is going to work much better with 14, 15, 16 or 29, 30, 31 minute bars. This is what is being worked on in GSB now.
The system on the left I traded on a live account; the system on the right I did not trade.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
The csv export can also be used to determine which indicators are most used, and which are never used.
Many years ago, with the earlier generation of GSB, I found that common indicators plus the odd less common indicator was better than allowing
only the common indicators. (I will need to re-verify this.) But this is also going to be useful to see how often your custom indicators are used.
Long term I want GSB to have its own analytics.
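Until then, counting indicator usage from the csv export could be sketched in Python as below. The `indicators` column, the ";" separator and the indicator names are all assumptions for illustration; the actual export layout may differ, and the inline text stands in for the exported file.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported file; "indicators" and the ";" separator are
# assumptions, the real export layout may differ.
EXPORT = """system,indicators
1,RSI;ADX;Stochastic
2,RSI;CCI
3,ADX;RSI;MyCustomOsc
"""

# Hypothetical list of every indicator that was enabled for the run.
AVAILABLE = ["RSI", "ADX", "CCI", "Stochastic", "MyCustomOsc", "ROC"]

def indicator_usage(csv_text, column="indicators", sep=";"):
    """Count how many systems use each indicator."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts.update(name.strip() for name in row[column].split(sep) if name.strip())
    return counts

usage = indicator_usage(EXPORT)
never_used = [name for name in AVAILABLE if usage[name] == 0]
print(usage.most_common())
print("never used:", never_used)
```

Comparing the counts against the full list of enabled indicators is what exposes the ones that never appear, including custom indicators.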
|
|
|
rws
Member
 
Posts: 114
Registered: 12-6-2017
Member Is Offline
Mood: No Mood
|
|
Something like this, including multiple timeframe and multiple ticker confirmation, is useful.
Quote: Originally posted by admin  | I did some interesting market validation today on sqqq 15 minutes (short only)
Market validation failed quite badly, though some of my systems on sqqq (but not all of them) have gone well.
This implies that failing market validation doesn't mean GSB cant make systems on the market, but that it is less likely to work out of sample.
My gut reaction is market validation is going work much better with 14,15,16 or 29,30,31 minute bars. This is what is being worked on in GSB now.
The system on the left I traded on live account, the system on the right I did not trade.
|
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
This is a bit different to normal Monte Carlo usage. I think it's better. Randomize history data and randomize parameters are what I think are the
best options for GSB. I doubt there is much need for it once we have the multi time frame and market features working.
These are being worked on right now, but progress is slower than I would have liked.
EWFO has an nth best feature (not related to the GSB nth feature) which is similar to randomize indicator variables.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
GSB 46.13 should be out later this week. It brings very significant improvements in out of sample results.
If we leave every 2nd day out of sample on ES.30 minute bars, my tests show a 25% drop in out of sample results.
All systems, no matter how bad, are included in these results.
Criteria of a system was all systems with PF
I've got that down to a 5.4% drop in out of sample results (best case) compared to in sample.
However, my favorite method, the one that gave the highest fitness (net profit * average trade), gave a 9.4% drop in out of sample results
but an average net profit of $38,900 compared to $28,300 out of sample on 30 min bars.
This is research you can objectively test yourself, with minutes of human time and an hour or so of CPU time per market.
Note all GSB code will expire at the end of the month. This is needed as there are compatibility issues between the versions on the cloud.
The 2.8% drop was with 5 indicators, with es 29, 30, 31 min bars and data2 $spx 29, 30, 31.
But the highest result.... will publish soon.
Let me explain an issue.
A child wants a pet mouse. He says to his parents, "The only thing I want for my birthday is a horse."
The parents are not happy. They compromise on a pet mouse. That's what the child wanted. (Smart kid)
So let's say we build systems on ES, EMD, ER, DOW, NASDAQ (same parameters on each index).
Pretend the degradation is 1% lower results out of sample compared to in sample. Awesome, you think!
But the out of sample profit is $10,000.
Let's say we build systems on ES29, ES30, ES31.
Degradation out of sample is, say, 10% (still a good figure).
But the out of sample profit is $30,000.
We are better off using es 29, 30, 31 to build systems.
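The arithmetic behind the example can be sketched in a few lines of Python. It assumes degradation is defined as the percentage drop from in sample to out of sample, (IS - OOS) / IS, which is my reading of the figures rather than a confirmed GSB formula; the labels are only for display.

```python
def implied_is_profit(oos_profit, degradation_pct):
    """In-sample profit implied by an OOS profit and a percentage drop from IS."""
    return oos_profit / (1.0 - degradation_pct / 100.0)

# Figures from the example above; the labels are just for display.
cases = {
    "ES, EMD, ER, DOW, NASDAQ": {"oos_profit": 10_000.0, "degradation_pct": 1.0},
    "ES 29, 30, 31 min": {"oos_profit": 30_000.0, "degradation_pct": 10.0},
}

for label, c in cases.items():
    is_profit = implied_is_profit(c["oos_profit"], c["degradation_pct"])
    print(f"{label}: IS ~ ${is_profit:,.0f}, OOS ${c['oos_profit']:,.0f}, "
          f"drop {c['degradation_pct']}%")

# Low degradation alone is not the goal; out-of-sample profit decides.
best = max(cases, key=lambda k: cases[k]["oos_profit"])
print("Higher out-of-sample profit:", best)
```

The low-degradation case starts from a much smaller in-sample profit, so it still ends up with a third of the out-of-sample profit.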
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
I have now done tests on
es30
es 29 30 31 with and without $spx and with and without $idx
es 28 30 32 with spx 28 30 32,
as above but 27 30 33
as above but 26 30 34
as above but 25 30 35
as above but 27,28,28,29 30,31,32, 33
as above but 25,26,27,28,28,29 30,31,32, 33 ,34,35
es with er emd data1 and $spx $idx $rut data2
as above but also ym and nq added
The results are riveting. Will publish in private forum tomorrow
|
|
|
rws
Member
 
Posts: 114
Registered: 12-6-2017
Member Is Offline
Mood: No Mood
|
|
A test with 5 ES tickers with random offset in a portfolio would be interesting.
A test on a portfolio will often show less NP because it is not so fitted to the curve.
But if the system on the portfolio also has a good result if you run it on only ES you could have more robust real OOS.
You only know how good something is after you tried it live 10 times.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
Quote: Originally posted by rws  | A test with 5 ES tickers with random offset in a portfolio would be interesting.
A test on a portfolio will often show less NP because it is not so fitted to the curve.
But if the system on the portfolio also has a good result if you run it on only ES you could have more robust real OOS.
You only know how good something is after you tried it live 10 times.
|
Hope to add random noise in later builds. I'm interested to see if it's better than, say, 20 30 31 min data.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
Market validation on CL.30, 900 to 230pm exchange time, gave market degradation of 42.8% with a secondary filter of CLose-CloseD.
I got it down to an acceptable -33% with tweaks.
Fascinating: 29, 30, 31 min bars did NOT improve results, which they clearly did for ES.
I will publish all my findings in the private forum when done.
For a benchmark, ES30 with secondary filter close-Closed had -14.8% degradation. With tweaks I got it down to an amazing -1.2%.
This is a whole universe of stats to explore on many markets.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
I've got CL down to -28.7% OOS vs IS degradation, which is very acceptable.
ES down to -1.2%!!!
CL made no improvement going from 30 min bars to 29, 30, 31.
NG went from -39.1% to -12.4% using 29, 30, 31 min bars. More improvement is possible.
Amazing how CL didn't improve but ES & NG did.
This is thrilling.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
NG I got down to -11.3%, but the fitness was 2.64 times higher. Multi bar time frames and another tweak were used.
|
|
|
cyrus68
Member
 
Posts: 171
Registered: 5-6-2017
Member Is Offline
Mood: No Mood
|
|
Peter
In your tests you reported that 5 indicators produced better results than 3.
However, it is not clear whether you stress-tested the given systems. One sort of stress-test is walk forward, which we can do in GSB or elsewhere.
Another test is Monte Carlo simulation, where random noise is introduced in the indicator parameters. It isn't possible to do this outside GSB for the
systems that it produces, but it is highly likely that a 5-indicator system will produce poorer results than a 3-indicator system.
The general modelling principle is that parsimony and simplicity in model construction are more likely to produce robust systems.
|
|
|
admin
Super Administrator
       
Posts: 5060
Registered: 7-4-2017
Member Is Offline
Mood: No Mood
|
|
Quote: Originally posted by cyrus68  | Peter
In your tests you reported that 5 indicators produced better results than 3.
However, it is not clear whether you stress-tested the given systems. One sort of stress-test is walk forward, which we can do in GSB or elsewhere.
Another test is Monte Carlo simulation, where random noise is introduced in the indicator parameters. It isn't possible to do this outside GSB for the
systems that it produces, but it is highly likely that a 5-indicator system will produce poorer results than a 3-indicator system.
The general modelling principle is that parsimony and simplicity in model construction is more likely to produce robust systems.
|
Using multiple time frames is a decent stress test. For the 830 to 1500 session there are 13 bars if 30 minute bars are used.
With 29 min you get 14 bars, with 31 min you get 12 bars. This is a decent stress test, and on ES & NG it gave much better out of sample results.
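A quick way to check bar counts for any session and bar length is sketched below in Python. Whether a short final bar is counted depends on the charting platform, which I am not assuming either way, so both counts are returned.

```python
from math import ceil

def bars_in_session(start_hhmm, end_hhmm, bar_minutes):
    """Return (full_bars, bars_including_partial) for a session and bar length."""
    start = (start_hhmm // 100) * 60 + start_hhmm % 100  # minutes since midnight
    end = (end_hhmm // 100) * 60 + end_hhmm % 100
    session_minutes = end - start
    return session_minutes // bar_minutes, ceil(session_minutes / bar_minutes)

for minutes in (29, 30, 31):
    full, with_partial = bars_in_session(830, 1500, minutes)
    print(f"{minutes} min: {full} full bars, {with_partial} counting a final partial bar")
```

The 830 to 1500 session is 390 minutes, so only the 30 minute bars divide it evenly; the 29 and 31 minute bars shift every bar boundary after the first, which is what makes them a stress test.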
Regarding 5 indicators, let's go to extremes. 1 indicator might be robust but will produce systems with really poor metrics. 10 is excessively complex.
To me these results show that 5 is the balance. I tested 2 indicators as well, and got worse results than with 3. I'm going to do another test shortly to
double check this.
Keep in mind that 5 osc combined using "*" has fewer parameters than using "+" and is much less problematic,
i.e. compared with osc1*a+osc2*b+osc3*c (because we don't need a weighting on each osc if we use "*").
We are likely to introduce random ticks in the input data, rather than the oscillators, then use Monte Carlo.
I have no idea if this will help, but the great news is we will objectively have the answer when GSB has the feature.
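To give a feel for what random ticks in the input data could mean, here is a hedged Python sketch. It is not how GSB will implement the feature (that is not known yet); the function names, the toy metric and the price series are all invented for illustration. The 0.25 tick size matches ES.

```python
import random

def add_tick_noise(prices, tick_size=0.25, max_ticks=1, seed=None):
    """Shift each price by a random whole number of ticks in [-max_ticks, max_ticks]."""
    rng = random.Random(seed)
    return [p + rng.randint(-max_ticks, max_ticks) * tick_size for p in prices]

def monte_carlo(prices, evaluate, runs=100, seed=0, **noise_kwargs):
    """Evaluate a system on many noisy copies of the same price series."""
    results = []
    for i in range(runs):
        noisy = add_tick_noise(prices, seed=seed + i, **noise_kwargs)
        results.append(evaluate(noisy))
    return results

# Toy stand-in for a system: total close-to-close change (not a GSB system).
def toy_metric(prices):
    return prices[-1] - prices[0]

closes = [2800.00, 2801.25, 2799.50, 2803.00, 2804.75]  # made-up closes
outcomes = monte_carlo(closes, toy_metric, runs=200, max_ticks=2)
print(min(outcomes), max(outcomes))
```

A robust system should give a tight spread of outcomes across the noisy copies; a wide spread suggests the result depends on the exact ticks in the history.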
|
|
|