boosted - 10-7-2018 at 02:53 PM
I started this new topic because I thought this newer approach to Training and Test data sets was interesting and could quite possibly be used in GSB to
create even more robust strategies using less Training Data.
I'm not a math wiz and don't pretend to understand everything this Differential Privacy algorithm does for backtesting data sets, but from a layman's viewpoint
it sounded logical and like an improvement in how Training and Test data sets are used (a controlled cross-contamination of sorts).
I wonder if this could be a way to use less Training data and still produce better Test and OOS results.
I would be interested in hearing Peter's or any other member's feedback on this subject.
Here is the link to a blog that describes in detail this newer machine learning algorithm.
www.win-vector.com/blog/2015/10/a-simpler-explanation-of-differential-privacy/
admin - 10-7-2018 at 04:44 PM
I only had a quick look at the article. Something was posted in the last week (I forget by whom) about Monte Carlo, adding noise to data.
This will not be hard for GSB to do, and we can now objectively test with GSB whether this works.
i.e. build 1000 systems with 2 ticks of noise randomly added to or subtracted from the data feeds, with say 10 different noisy data feeds per system.
Then turn the noise off out of sample, and see how the OOS performance compares to the same test without noise.
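A quick sketch of that noise-injection step in Python. Everything here is illustrative, not GSB internals: the function name, the 0.25 tick size, and the ±2-tick range are just placeholders matching the numbers in the post.

```python
import numpy as np

def add_tick_noise(prices, tick_size=0.25, max_ticks=2, n_feeds=10, seed=42):
    """Return n_feeds noisy copies of a price series, each point shifted
    by a random whole number of ticks in [-max_ticks, +max_ticks]."""
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    noise = rng.integers(-max_ticks, max_ticks + 1,
                         size=(n_feeds, prices.size)) * tick_size
    return prices + noise  # shape: (n_feeds, n_bars)

# Example: one clean feed -> 10 noisy training feeds
clean = np.array([100.0, 100.25, 100.5, 100.25, 100.75])
noisy_feeds = add_tick_noise(clean)
print(noisy_feeds.shape)  # (10, 5)
```

The idea of the test would then be to train on the noisy feeds only, and score on the untouched clean series out of sample.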
boosted - 10-7-2018 at 08:01 PM
When I read about Differential Privacy and how it introduces noise to the data set(s), which in turn allows less IS data to be used to produce
improved OOS results, it sounded a bit similar to what I think Jonathan Kinlay may be doing with his data. I brought up his name last year in the forum
because I thought it was interesting that he was creating real-world prototypes of strategies that used little IS data and in turn produced
superior, robust OOS results compared to what most were doing with their data.
His one-page write-up on his use of Genetic Programming and using less IS data to produce robust strategies caught my eye. Here is an excerpt:
"Firstly, we have evolved methods for transforming original data series that enables us to avoid over-using the same
old data-sets and, more importantly, allows new patterns to be revealed in the underlying market structure. This effectively eliminates the data
mining bias that has plagued the GP approach. At the same time, because our process produces a stronger signal relative to the background noise, we
consume far less data – typically no more than a couple of years worth. Secondly, we have found we can enhance the robustness of prototype strategies
by using double-blind testing: i.e. data sets on which the performance of the model remains unknown to the machine, or the researcher, prior to the
final model selection. Finally, we are able to test not only the alpha signal, but also multiple variations of the trade expression, including
different types of entry and exit logic, as well as profit targets and stop loss constraints."
He posts results on 4 different futures markets that are extremely good on all metrics one would find important for a robust strategy. They can
be found at the bottom of his page.
I may be wrong on this, but it seems he may be using Differential Privacy algorithm(s) to allow for less IS data and more OOS data while still producing robust
results. It's clear he's figured something out about improving the use of IS and OOS data. I would like to know what it is, and I'm not smart enough
to figure it out, but I don't think it's a stretch to think he's introducing noise and getting the advantage of the DP algorithm's logic: tapping into IS
data without disturbing the OOS results or over-optimizing.
The link to his page about his newer way of using data and GP is below. His write-up is circa Nov 2014, which is about the same time Differential
Privacy logic was being written about. Anyway, I would like to hear your and other members' thoughts on this DP idea and whether it holds the key, or the
potential, to use less IS data for more robust OOS results.
I find the whole thing intriguing and am always looking for an edge from people smarter than I am. It's one of the reasons I have lasted this long as a
full-time discretionary trader (20 yrs).
http://jonathankinlay.com/2014/11/building-systematic-strate...

cyrus68 - 11-7-2018 at 02:42 AM
Hi boosted
You provided very interesting links that I will read, in good time. I would like to note that GP, along with a host of machine learning techniques,
is largely about data mining. In other words, the researcher has no idea about the data generation process that is producing the data - the hidden
recurring patterns, along with the noise. Contrast this with statistical arbitrage, which posits a model of what we see. If the series are
cointegrated, then systems can be built to take advantage of this process in the underlying data.
There is nothing bad about data mining, as such. But there is a difference between pure number crunching and using judgement and finesse in setting
the criteria and the methods to search for underlying patterns.
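As a toy illustration of that contrast, here is a minimal Python sketch of the stat-arb idea: posit a linear relationship between two series and standardize the residual spread, which the cointegration model says should mean-revert. All names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def spread_zscore(y, x):
    """Regress y on x to get a hedge ratio, then standardize the
    residual spread. A mean-reverting z-score is the tradable
    signal the cointegration model predicts."""
    beta = np.polyfit(x, y, 1)[0]   # OLS hedge ratio (slope)
    spread = y - beta * x           # residual spread
    return (spread - spread.mean()) / spread.std()

# Two synthetic series sharing a common trend, so they move together
rng = np.random.default_rng(0)
trend = np.cumsum(rng.normal(size=500))
x = trend + rng.normal(scale=0.5, size=500)
y = 2.0 * trend + rng.normal(scale=0.5, size=500)
z = spread_zscore(y, x)  # trade when |z| is large, exit near zero
```

The point is the modeling step: unlike pure data mining, the researcher specifies the structure (a linear cointegrating relationship) and only estimates its parameters from data.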
boosted - 11-7-2018 at 08:58 AM
Hi cyrus68
Understood.
I was most interested in Kinlay's assertion that he's able to use much less IS data to produce very robust OOS strategies. He apparently has found a
way to pre-process his data set, OR he is using a machine learning technique as you say (I'm thinking something similar to Differential Privacy) that allows
smaller IS data without over-optimization showing up in the OOS data.
From my layman's understanding of the DP approach, it somehow allows OOS data to be "sampled" while building the Training set, without crossing the
line into over-optimization, while overall improving the OOS results. It seems to finesse the data in such a way as to build a more robust OOS
result without becoming over-optimized (and doomed to fail).
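For what it's worth, the mechanism that win-vector post describes is the "reusable holdout" (Thresholdout) idea. A rough Python sketch of how a holdout set can be queried repeatedly without being burned; the threshold and noise levels here are illustrative only, not the paper's recommended values.

```python
import random

def thresholdout(train_score, holdout_score, threshold=0.04, sigma=0.01):
    """Reusable-holdout check: only reveal a (noisy) holdout value
    when the training estimate drifts too far from it."""
    if abs(train_score - holdout_score) > threshold + random.gauss(0, sigma):
        # Training set disagrees with holdout: answer from the holdout,
        # blurred with noise so repeated queries leak little information.
        return holdout_score + random.gauss(0, sigma)
    # Otherwise answer from the training set alone -- the holdout
    # stays effectively untouched and can be reused for the next model.
    return train_score
```

So the "sampling" of OOS is really this gatekeeper: most queries are answered from IS data alone, and when the holdout does answer, it answers through noise.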
I read Kinlay's approach a few years ago and then happened to stumble upon this DP approach to IS and OOS sampling to produce better results.
That caught my eye, and I wondered if that may be what he is using, or some custom derivative of the DP approach to data.
If you read Kinlay's link, any idea what his secret sauce is for using less IS data while improving OOS robustness? He posts results in TS reports
showing what I, and most, would consider extremely good metrics (bottom of his page link).
rws - 11-7-2018 at 04:33 PM
I am not sure what he uses, but generating more data from existing data that has some edge is nothing new.
http://www.financial-hacker.com/better-tests-with-oversampli...
rws - 11-7-2018 at 04:52 PM
You could shift the data while generating and make 29 different versions of the 30-min data.
Since the new GSB can generate on multiple tickers, you could input these shifted versions
of the 30-min data. Probably 5 versions (5-35, 10-40, 15-45, 20-50, 25-55) would already show
whether it makes sense. The 30-minute bars would all look different.
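A rough Python/pandas sketch of that bar-shifting idea, using synthetic 5-minute data; the session times and price values are made up for illustration.

```python
import numpy as np
import pandas as pd

# One 8-hour session of 5-minute closes (synthetic prices)
idx = pd.date_range("2018-07-11 09:00", periods=96, freq="5min")
rng = np.random.default_rng(1)
closes = pd.Series(100 + np.cumsum(rng.normal(scale=0.1, size=96)), index=idx)

# Phase-shifted 30-minute versions of the same underlying data:
# bars whose boundaries start 0, 5, 10, 15, 20, 25 minutes later.
shifted = {
    f"+{m}min": closes.resample("30min", offset=f"{m}min").last()
    for m in (0, 5, 10, 15, 20, 25)
}
for name, bars in shifted.items():
    print(name, len(bars), "bars")
```

Each shifted series samples the same price path at different bar boundaries, so the 30-minute bars all look different even though the underlying data is identical.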
cyrus68 - 12-7-2018 at 04:27 AM
boosted
I read the material in your links. It doesn't appear to me that Kinlay is using the differential privacy technique. He refers to transforming the raw
data. Presumably, the transformation produces clearer signals that are, then, fed into the GP engine.
As for OOS testing, if you select the nth mode in GSB, you will have an OOS sample that is essentially hidden from you.
In what he refers to as "trade expression", alpha is merely the return attributable to the model, over and above the return produced by the market. The
other items refer to pretty standard things about setting entries, exits, targets, etc.
boosted - 12-7-2018 at 01:36 PM
-cyrus68
Your explanation makes sense.
Guess he's transforming raw data into some custom synthetic data, or re-arranging it to reveal more signal than
noise. His method of extracting more signal vs. noise from data is at the crux of what I am trying to achieve myself. Looking at the prototype
strategies he built, incorporating his method of transforming data among other factors, it's clear to me that his way of using data has a nice
advantage over anything I have seen to date.
I think GSB's nth method is a bit different from what the DP algorithm is doing. Similar in idea but different in execution, from what I understand of
how DP works. Which method produces better results, or whether they are the same, I don't have a clue, but I would like to see what DP would achieve, if anything, in
GSB.
boosted - 12-7-2018 at 01:59 PM
rws
Yes, I agree, oversampling has been shown to improve overall results. I remember reading his site on that subject last year.
I understand the general premise of the DP algorithm I mentioned, but without a good grasp of the math involved I can't measure whether its approach to
data is much different from GSB's nth method.
I like the documented math behind the DP algorithm's approach to IS and OOS data and results.
The way I read it, less is more with the DP approach: less IS data, coupled with the DP algorithm's handling of OOS data, produces better OOS results than
only using IS data and getting less robust OOS results.
It seems they found a way to commingle IS and OOS data while improving overall OOS robustness, generally speaking.
No doubt there are endless ways to trade, build strategies, use data, etc., but at the very least I think the DP approach is very encouraging, more so
than anything else I have seen to date.