GSB Forums


Futures and forex trading contains substantial risk and is not for every investor. An investor could
potentially lose all or more than the initial investment. Risk capital is money that can be lost without
jeopardizing one's financial security or lifestyle. Only risk capital should be used for trading and only
those with sufficient risk capital should consider trading. Past performance is not necessarily indicative of
future results.
Author: Subject: Update to GSB methodology. A must read, the backpacker and the Art of war by Sun Tzu
admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 17-5-2018 at 11:52 PM


Quote: Originally posted by cyrus68  
The video proposes a useful test that I may adopt because it is quite fast. However, I would like to note that it is a test of the viability of a particular strategy applied to a given symbol. For example, in the case of ES, if you selected 15-minute data for the primary symbol, different secondary datasets, 5 indicators and 2 operators – this would constitute a different strategy. And let’s not forget different historical periods over which the test may be carried out. The test may produce different results compared with your first strategy.

One thing I noticed is that, for ES, you set the max # of unique systems at 600. Yet at the end you had 1735. Either GSB didn’t stop or you let it run further. I have the same problem with max # completed. Typically, I set it at 5 million. So, if GSB is processing 35,000 a minute, it should stop after roughly 2 hours and 23 minutes. That doesn’t happen. So, I have to stop it manually.

Currently I use a rather biased, but simple method, to gauge the overall viability of a strategy. I look at the proportion of WF results with PAS scores above 50. If, out of 60 WF, there are 40 with scores above 50 - and most are bunched closer to 100 – that is a good sign. If there are only 10 or 15 with acceptable scores, that is not a good sign. The test is biased because WF is carried out on systems that are pre-selected on the basis of attractive metrics.

The reason for the overshoot in systems generated is that GSB stops workers after this number is reached, but the workers then finish off their current population and generations. This doesn't matter in the least to me. You can always delete all systems after the 600 or so you chose. However, the more systems the better. Using fewer workers will give less overshoot. I typically used 10 workers for these tests.
It would be good to compare the two methods. This is all fairly unexplored territory, but worth investigating.


cyrus68
Member
Posts: 171
Registered: 5-6-2017

[*] posted on 18-5-2018 at 03:02 AM


Thanks Carl.
I didn't notice the bad link. I copied and pasted from my earlier post.
Your link is correct.


edgetrader
Junior Member
Posts: 24
Registered: 16-5-2018

[*] posted on 18-5-2018 at 06:31 AM


Other metrics can of course be added and tried. I like the way Peter compares IS and OOS in his videos. He's looking at the metrics that matter for trading. Net profit, average trade, and profit factor are also what I would look at when evaluating a system.

The only thing I'd do differently is to divide OOS by IS (not the other way round) before forming the overall average. This would output percentages, e.g. OOS is 60% as good as IS, and a higher number would be better.
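A minimal sketch of that OOS/IS calculation, assuming each metric has already been averaged over the systems. The metric names and all numbers are made up for illustration and are not taken from the videos.

```python
from statistics import mean


def oos_to_is_ratios(in_sample, out_of_sample):
    """Return OOS/IS for every metric present in both dicts."""
    return {
        name: out_of_sample[name] / in_sample[name]
        for name in in_sample
        if name in out_of_sample and in_sample[name] != 0
    }


# Hypothetical averaged metrics for one batch of systems (made-up numbers).
is_metrics = {"net_profit": 50000.0, "avg_trade": 100.0, "profit_factor": 2.0}
oos_metrics = {"net_profit": 30000.0, "avg_trade": 70.0, "profit_factor": 1.5}

ratios = oos_to_is_ratios(is_metrics, oos_metrics)
overall = mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name}: OOS is {ratio:.0%} as good as IS")
print(f"overall: {overall:.0%}")
```

One caveat with this sketch: a plain ratio treats a profit factor of 0 as the floor, while break-even is a profit factor of 1, so the profit factor ratio flatters weak OOS results somewhat.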

A great benefit of Peter's approach is that it includes market direction, trading sessions and other settings. For example VXX is a market that trends down. Long-only systems will be much harder to make than short-only systems. Since Peter's IS and OOS systems share the same settings, like short-only, the metrics of the IS systems are indeed the best benchmark to compare metrics of OOS systems. Looking at OOS just by itself with no suitable benchmark wouldn't account for things like short bias.


cyrus68
Member
Posts: 171
Registered: 5-6-2017

[*] posted on 19-5-2018 at 12:31 AM


Just to explain, for everybody's benefit, the t test is a statistically sound method of summarising system quality. It is done on the P/L, by trade, of the system. It allows us to rank systems by their overall quality. We would also be able to see the average value for all systems and compare the IS and OOS averages. Its primary use is to evaluate OOS systems.

The Wilcoxon test is a measure of the quality of a strategy. It is calculated on paired values of IS and OOS systems. For example, using NP or PF. It is also a statistically sound test.

In my view, these tests would greatly enhance GSB's capabilities.
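As a rough illustration of the t test idea, here is a sketch that assumes a one-sample t statistic against a mean per-trade P/L of zero (the post does not say which variant is meant). The trade lists are made up.

```python
import math
from statistics import mean, stdev


def t_statistic(trade_pnl):
    """One-sample t statistic for H0: mean per-trade P/L is zero."""
    n = len(trade_pnl)
    if n < 2:
        raise ValueError("need at least two trades")
    sd = stdev(trade_pnl)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all trades identical; t is undefined")
    return mean(trade_pnl) / (sd / math.sqrt(n))


# Made-up per-trade P/L in dollars for two hypothetical systems.
system_a = [120.0, -80.0, 200.0, 50.0, -30.0, 90.0, 160.0, -40.0]
system_b = [300.0, -400.0, 500.0, -350.0, 450.0, -200.0, 250.0, -100.0]

# Rank systems from highest to lowest t statistic.
ranked = sorted(
    {"A": t_statistic(system_a), "B": t_statistic(system_b)}.items(),
    key=lambda item: item[1],
    reverse=True,
)
for name, t in ranked:
    print(f"system {name}: t = {t:.2f}")
```

The two systems have a similar average trade, but B's trades are far more scattered, so A ranks higher. That is the sense in which the t statistic summarises quality rather than raw profit.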


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 19-5-2018 at 02:10 AM


I'm going to chat to the programmer about this. This last week I unexpectedly went to New Zealand for family matters, and have done just the bare minimum. Hence no great changes recently. There is a new build but it just has some bug fixes. Still waiting on the version with multi time frame / symbol support for the same system.


cyrus68
Member
Posts: 171
Registered: 5-6-2017

[*] posted on 20-5-2018 at 09:20 PM


On Sunday, while imbibing my favourite scotch, a thought occurred to me. Implementing the Wilcoxon test would be a real pain in the butt for the programmer. There is a simpler way. Why not enable exporting the results of running a strategy (the metrics panel) as a csv file? We can easily import the data into Excel and carry out the Wilcoxon test in Excel, or in any number of statistical packages.

My main objective is to do the test on the NP of IS and OOS. But you could also do the test on PF or anything else. The whole process of exporting the data and running the test could be done in 5 minutes. In addition, you can do a graph of the data.
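For anyone who prefers a script to Excel, the export-then-test idea could look roughly like this. It is only a sketch: the column names `is_np` and `oos_np` and the numbers are hypothetical, and the signed-rank statistic is computed by hand with a normal approximation (no tie or continuity correction) so that nothing beyond the standard library is needed.

```python
import csv
import io
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, z) for paired samples x and y.

    Zero differences are dropped, tied |differences| share an average rank,
    and z uses the normal approximation without tie or continuity correction.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all pairs are identical")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, w_minus, (w_plus - mu) / sigma


# Stand-in for an exported metrics file; column names are hypothetical.
EXPORT = """system,is_np,oos_np
1,42000,30500
2,38000,41000
3,51000,29000
4,36500,22000
5,47000,39500
6,40000,18000
"""

rows = list(csv.DictReader(io.StringIO(EXPORT)))
is_np = [float(r["is_np"]) for r in rows]
oos_np = [float(r["oos_np"]) for r in rows]

w_plus, w_minus, z = wilcoxon_signed_rank(is_np, oos_np)
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.2f}")
```

With only a handful of systems the normal approximation is crude. A statistics package that offers an exact Wilcoxon signed-rank test would be the better choice for a real export.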

As for the t test, it should be relatively simple to do the programming.


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 20-5-2018 at 10:19 PM


I think Excel export is a great idea.


Carl
Member
Posts: 342
Registered: 10-5-2017

[*] posted on 20-5-2018 at 11:33 PM



In episode 143 of the Better System Trader podcast Bruce Vanstone mentions a couple of other statistical methods to quantify system performance.

http://bettersystemtrader.com/the-dna-approach-to-trading-wi...

Anova
T-test
Ledoit-Wolf
Diebold-Mariano


cyrus68
Member
Posts: 171
Registered: 5-6-2017

[*] posted on 22-5-2018 at 02:31 AM


Hi Carl

I haven't listened to the podcast yet. It looks to be useful.
The statistical methods that you mentioned are all available in a number of packages.


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 22-5-2018 at 07:31 PM


This short video is to show how to get session times on your chosen market.
This is critical for all markets. However, for S&P 500, S&P 400, Nasdaq, mini Dow and Russell 2000 you can just use the equivalent of 830 to 1500 US Central time (don't use the last 15 minutes of the day).

Also explained is a little on exchange time versus local time. Most users are confused about this. You should be using local time, because exchange time will not allow other symbols that are in different time zones to be added to your chart.
If you use GSB-supplied data (typically US Central time) and your own local time is not US Central, you MUST NEVER mix different local times. Otherwise the data will be forward or backward looking by the difference between the local times.
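To illustrate the forward/backward-looking problem, here is a small sketch assuming one feed stamped in US Central time and a second feed stamped in US Eastern time. The zones, the date and the fixed offsets (which ignore daylight saving) are assumptions chosen to keep the example short.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch short; they ignore daylight saving time.
US_CENTRAL = timezone(timedelta(hours=-6))  # assumed stamp zone of the supplied data
US_EASTERN = timezone(timedelta(hours=-5))  # assumed stamp zone of a second feed


def to_utc(naive_stamp, zone):
    """Attach the zone a bar was stamped in, then convert to UTC."""
    return naive_stamp.replace(tzinfo=zone).astimezone(timezone.utc)


# The same 30-minute bar close as each feed would stamp it.
central_bar = datetime(2018, 1, 15, 9, 0)   # 09:00 Central
eastern_bar = datetime(2018, 1, 15, 10, 0)  # 10:00 Eastern, the same instant

# Comparing the raw stamps makes the bars look an hour apart.
naive_gap = eastern_bar - central_bar

# After normalising to one zone, they line up.
aligned_gap = to_utc(eastern_bar, US_EASTERN) - to_utc(central_bar, US_CENTRAL)

print("raw stamp gap:", naive_gap)
print("gap after normalising:", aligned_gap)
```

If the raw stamps were merged as they are, one series would be shifted by a full hour against the other, which is exactly the look-ahead (or look-back) described above.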




admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 22-5-2018 at 10:28 PM


The above video is now on YouTube.
https://youtu.be/NFC7ego_Y70

The market validation video has been slightly tweaked, though the last bit on crude oil still could have been improved.
https://youtu.be/iG7MVOC56zk
If you are new to this thread, both videos are essential to view. The first one is simple and short.


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 28-5-2018 at 01:02 AM


Export of system metrics can now be done, but the requested newer system quality metrics have not been done yet. Priority is multi time frame / data support.
Get the exe in the members-only section of beta builds.
http://trademaid.info/forum/viewthread.php?tid=39


csv.png - 85kB


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 28-5-2018 at 09:55 PM


I did some interesting market validation today on SQQQ 15 minutes (short only).
Market validation failed quite badly, though some of my systems on SQQQ (but not all of them) have gone well.
This implies that failing market validation doesn't mean GSB can't make systems on the market, but that they are less likely to work out of sample.
My gut reaction is that market validation is going to work much better with 14,15,16 or 29,30,31 minute bars. This is what is being worked on in GSB now.
The system on the left I traded on a live account; the system on the right I did not trade.


sqqq1.png - 188kB sqqq6.png - 51kB


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 30-5-2018 at 05:02 AM


The csv export can also be used to determine which indicators are used most and which are never used.
Many years ago, with the earlier generation of GSB, I found that common indicators plus the odd less common indicator worked better than allowing only the common indicators (I will need to re-verify this). This is also going to be useful to see how often your custom indicators are used.
Long term I want GSB to have its own analytics.
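Until then, counting indicator usage from an export is only a few lines of script. This is a sketch: the `indicators` column, the `;` separator and the indicator names are hypothetical and will need adjusting to whatever the real export contains.

```python
import csv
import io
from collections import Counter

# Stand-in for the exported file; the "indicators" column and the
# indicator names are hypothetical, not GSB's actual export layout.
EXPORT = """system,indicators
1,RSI;Stochastic;ADX
2,RSI;CCI
3,Stochastic;RSI;Momentum
4,CCI;RSI
"""

# Indicators that were enabled for the run (also hypothetical).
AVAILABLE = ["RSI", "Stochastic", "ADX", "CCI", "Momentum", "MyCustomOsc"]

usage = Counter()
for row in csv.DictReader(io.StringIO(EXPORT)):
    usage.update(
        name.strip() for name in row["indicators"].split(";") if name.strip()
    )

# Anything enabled but absent from every system was never used.
never_used = [name for name in AVAILABLE if usage[name] == 0]

for name, count in usage.most_common():
    print(f"{name}: {count}")
print("never used:", never_used)
```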


rws
Member
Posts: 114
Registered: 12-6-2017

[*] posted on 30-5-2018 at 09:48 AM


Something like this, including multiple timeframe and multiple ticker confirmation, is useful.




Quote: Originally posted by admin  
I did some interesting market validation today on SQQQ 15 minutes (short only).
Market validation failed quite badly, though some of my systems on SQQQ (but not all of them) have gone well.
This implies that failing market validation doesn't mean GSB can't make systems on the market, but that they are less likely to work out of sample.
My gut reaction is that market validation is going to work much better with 14,15,16 or 29,30,31 minute bars. This is what is being worked on in GSB now.
The system on the left I traded on a live account; the system on the right I did not trade.




monte carlo.png - 99kB


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 31-5-2018 at 05:48 AM


This is a bit different to normal Monte Carlo usage. I think it's better. Randomize history data and randomize parameters are what I think are the best options for GSB. I doubt there is much need for it once we have the multi time frame and market features working.
These are being worked on right now, but progress is slower than I would have liked.
EWFO has an nth best feature (not related to the GSB nth feature) which is similar to randomize indicator variables.


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 25-6-2018 at 12:28 AM


GSB 46.13 should be out later this week. It brings very significant improvements in out-of-sample results.
If we leave every 2nd day out of sample on ES 30-minute bars, my tests show a 25% drop in out-of-sample results.
All systems no matter how bad are included in these results.
Criteria of a system was all systems with PF

I've got that down to a 5.4% drop in out-of-sample results (best case) compared to in sample.
However, my favorite method, the one that gave the highest fitness (net profit * average trade), gave a 9.4% drop in out-of-sample results
but an average net profit of $38,900, compared to $28,300 out of sample on 30 min bars.
This is research you can objectively test yourself, with minutes of human time and an hour or so of CPU time per market.
Note all GSB code will expire at the end of the month. This is needed as there are compatibility issues between the versions on the cloud.
The 2.8% drop was 5 indicators, with es 29,30,31 min bars with data2 $spx 29 30 31
But the highest result.... will publish soon.

Let me explain an issue.
A child wants a pet mouse. He says to his parents, the only thing I want for my birthday is a horse.
The parents are not happy. They compromise with a pet mouse. That's what the child wanted. (Smart kid)

So let's say we build systems on ES, EMD, ER, DOW, NASDAQ. (Same parameters on each index)
Pretend the degradation is 1%, i.e. results out of sample are 1% lower than in sample. Awesome, you think!
But out-of-sample profit is $10,000 :(

Let's say we build systems on ES29, ES30, ES31.
Degradation out of sample is, say, 10% (still a good figure).
But out-of-sample profit is $30,000.
We are better off using ES 29,30,31 to build systems.
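The trade-off can be put into numbers with a short sketch. It uses the two pretend scenarios above; the in-sample figures are back-calculated from the stated drops, and degradation is expressed here as a positive percentage drop (later posts in this thread write the same thing with a minus sign).

```python
def degradation_pct(in_sample, out_of_sample):
    """Percentage drop from in-sample to out-of-sample (positive = worse OOS)."""
    return (in_sample - out_of_sample) / in_sample * 100.0


# The two pretend scenarios; in-sample profit is back-calculated
# from the stated 1% and 10% drops.
scenarios = {
    "five indices": {"is": 10000 / 0.99, "oos": 10000.0},
    "ES 29/30/31": {"is": 30000 / 0.90, "oos": 30000.0},
}
for s in scenarios.values():
    s["drop"] = degradation_pct(s["is"], s["oos"])

lowest_drop = min(scenarios, key=lambda n: scenarios[n]["drop"])
highest_oos = max(scenarios, key=lambda n: scenarios[n]["oos"])

for name, s in scenarios.items():
    print(f"{name}: drop {s['drop']:.1f}%, OOS profit ${s['oos']:,.0f}")
print("lowest degradation:", lowest_drop)
print("highest OOS profit:", highest_oos)
```

The setup with the smallest degradation is not the one with the most out-of-sample profit, which is why degradation should not be judged on its own.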






admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 25-6-2018 at 05:13 AM


I have now done tests on
es30
es 29 30 31 with and without $spx and with and without $idx
es 28 30 32 with spx 28 30 32,
as above but 27 30 33
as above but 26 30 34
as above but 25 30 35
as above but 27,28,28,29 30,31,32, 33
as above but 25,26,27,28,28,29 30,31,32, 33 ,34,35
es with er emd data1 and $spx $idx $rut data2
as above but also ym and nq added
The results are riveting. Will publish in the private forum tomorrow.




rws
Member
Posts: 114
Registered: 12-6-2017

[*] posted on 25-6-2018 at 02:36 PM


A test with 5 ES tickers with random offset in a portfolio would be interesting.

A test on a portfolio will often show less NP because it is not so fitted to the curve.
But if the system built on the portfolio also gives a good result when you run it on ES alone, you could have more robust real OOS.
You only know how good something is after you have tried it live 10 times.

admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 26-6-2018 at 02:40 AM


Hope to add random noise in later builds. I'm interested in whether it's better than, say, 20 30 31 min data.


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 3-7-2018 at 10:15 PM


Market validation on CL.30, 900 to 230pm exchange time, gave market degradation of 42.8% with a secondary filter of Close-CloseD.
I got it down to an acceptable -33% with tweaks.
Fascinating: 29 30 31 min bars did NOT improve results, which they clearly did for ES.
I will publish all my findings in the private forum when done.
For a benchmark, ES30 with secondary filter Close-CloseD was -14.8% degradation. With tweaks I got it down to an amazing -1.2%.
This is a whole universe of stats to explore on many markets.


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 5-7-2018 at 03:49 AM


I've got CL down to -28.7% OOS vs IS degradation, which is very acceptable.
ES down to -1.2%!!!
CL made no improvement going from 30 min bars to 29,30,31.
NG went from -39.1% to -12.4% using 29,30,31 min bars. More improvement possible.
Amazing how CL didn't improve but ES & NG did.

This is thrilling.


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 6-7-2018 at 01:56 AM


NG I got down to -11.3%, but the fitness was 2.64 times higher. Multi bar time frames and another tweak were used.

cyrus68
Member
Posts: 171
Registered: 5-6-2017

[*] posted on 6-7-2018 at 06:09 AM


Peter
In your tests you reported that 5 indicators produced better results than 3.
However, it is not clear whether you stress-tested the given systems. One sort of stress-test is walk forward, which we can do in GSB or elsewhere.

Another test is Monte Carlo simulation, where random noise is introduced in the indicator parameters. It isn't possible to do this outside GSB for the systems that it produces, but it is highly likely that a 5-indicator system will produce poorer results than a 3-indicator system.

The general modelling principle is that parsimony and simplicity in model construction are more likely to produce robust systems.


admin
Super Administrator
Posts: 5060
Registered: 7-4-2017

[*] posted on 6-7-2018 at 03:59 PM


Using multiple time frames is a decent stress test. For the 830 to 1500 session there are 13 bars if 30 minute bars are used.
For 29 min you get 14 bars, for 31 min you get 12 bars. This is a decent stress test and on ES & NG gave much better out-of-sample results.
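The bar counts are easy to check with a quick sketch. The exact number depends on whether a trailing partial bar at the end of the session is counted, which is not stated here, so the function below (a hypothetical helper, not anything from GSB) reports both conventions.

```python
import math


def bars_per_session(open_hhmm, close_hhmm, bar_minutes):
    """Return (full_bars, bars_including_partial) for one session."""
    open_min = (open_hhmm // 100) * 60 + open_hhmm % 100
    close_min = (close_hhmm // 100) * 60 + close_hhmm % 100
    length = close_min - open_min  # session length in minutes
    return length // bar_minutes, math.ceil(length / bar_minutes)


for minutes in (29, 30, 31):
    full, with_partial = bars_per_session(830, 1500, minutes)
    print(
        f"{minutes} min: {full} full bars, "
        f"{with_partial} counting a final partial bar"
    )
```

Either way, 29 and 31 minute bars slice the same 390 minute session at different points than 30 minute bars do, which is what makes the comparison a stress test.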

Regarding 5 indicators: let's go to extremes. 1 indicator might be robust but will produce systems with really poor metrics. 10 is excessively complex. To me these results show that 5 is the balance. I tested 2 indicators as well, and got worse results than 3. I'm going to do another test shortly to double check this.
Keep in mind that 5 oscillators combined using "*" need fewer parameters than when combined using "+", and are much less problematic.
With "+" you get osc1*a+osc2*b+osc3*c; we don't need a weighting on each osc if we use "*".

We are likely to introduce random ticks in the input data, rather than in the oscillators, then use Monte Carlo.
I have no idea if this will help, but the great news is we will objectively have the answer when GSB has the feature.


Trademaid forum. Software tools for TradeStation, MultiCharts & NinjaTrader