So which scheme works best?
Without stronger statistics, score alone is too data-dependent to characterize scheme quality well. If not score, then perhaps rate of improvement could be useful. On average, sa1 was the most efficient approach, producing the most improvement per attempt. It was also the fastest and, as it turns out, consistently the worst performer on all three datasets; efficiency per attempt evidently says little about the quality of the final result.
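For concreteness, the rate-of-improvement metric discussed above can be computed as the score improvement divided by the number of attempts, averaged over a scheme's runs. The sketch below is illustrative only; the record fields (`start_score`, `final_score`, `attempts`) and the sample values are assumptions, not data from the original experiments.

```python
def improvement_per_attempt(runs):
    """Average (start - final) score improvement per attempt over runs.

    Each run is a dict with hypothetical fields: start_score,
    final_score, and attempts. Runs with zero attempts are skipped.
    """
    rates = [(r["start_score"] - r["final_score"]) / r["attempts"]
             for r in runs if r["attempts"] > 0]
    return sum(rates) / len(rates) if rates else 0.0

# Made-up example runs for one scheme (scores are minimized).
runs_sa1 = [
    {"start_score": 120.0, "final_score": 90.0, "attempts": 50},
    {"start_score": 110.0, "final_score": 95.0, "attempts": 30},
]
print(improvement_per_attempt(runs_sa1))
```

A high value here only says the scheme improves quickly per attempt; as the text notes, it is compatible with stopping early at a poor final score.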
Turning to the `Score vs Total Attempts' scatter plots (Figures 4.8 through 4.10) for assistance, we would ideally like to see data clustered in the bottom left-hand corner of each graph, with few results in the upper right. Instead, sa1 produced some of the highest scores, while the results of the other schemes fall in the same score range; a few of the sa4 rounds came out lowest.
In terms of attempts, sa0 and rldhc take the most, sa1 and sa3 the fewest, leaving sa2 and sa4 in the middle. Furthermore, sa4 had the widest variability in total attempts, while sa2 and sa3 were more predictable. Overall, the dynamic adaptive schedules gave the best performance.
Given so much variability in scheme performance, and such sensitivity to changes in the objective function and the data, rather than trying to pick a single winner we use all of the schemes in rotation, with multiple runs as time permits, sampling the different approaches for the best result.
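The rotation strategy just described can be sketched as a round-robin over the available schemes, keeping the best (lowest) score seen. This is a minimal illustration under assumed names: `run_scheme` is a stand-in stub, not the actual sa0 through sa4 or rldhc implementations.

```python
import itertools
import random

def run_scheme(name, seed):
    """Placeholder for one full optimization run of a named scheme.

    Returns a fake final score; a real version would run the
    corresponding annealing or hill-climbing schedule.
    """
    rng = random.Random(hash((name, seed)))
    return rng.uniform(0.0, 100.0)

def rotate_schemes(schemes, total_runs):
    """Run schemes in round-robin order; return the best (name, score)."""
    best = None
    for i, name in zip(range(total_runs), itertools.cycle(schemes)):
        score = run_scheme(name, seed=i)
        if best is None or score < best[1]:
            best = (name, score)
    return best

best = rotate_schemes(["sa0", "sa1", "sa2", "sa3", "sa4", "rldhc"], 12)
print(best)
```

In practice `total_runs` would be set by the available time budget rather than fixed in advance, and ties between schemes could be broken by attempt count.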