One way to order rays so that we take better advantage of the CPU cache is to shoot eye rays in buckets grouped by scene material. You can obtain that information by looking into the object data structures. Because consecutive rays keep running the same material and texture calculations over and over again, they tend to be served from the CPU cache more often than from main memory (RAM). The eye-ray buckets don't need to match the materials exactly: tiles at material boundaries can sample bits of other materials too, but the idea is to concentrate on one material before moving on to the next. This technique has its share of inefficiencies too, for instance code overhead, or cases where the material result depends on a secondary ray (reflection, transmission, GI). It should work at least for the diffuse component of direct lighting.
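A minimal sketch of the bucketing idea, assuming the per-pixel material IDs are already known (for example from the object data structures or a cheap ID pass). The function name, the flat row-major ID list and the tile size are hypothetical. Each screen tile is assigned to the material that covers most of its pixels, and tiles are then emitted material by material, so boundary tiles still contain a few pixels of other materials, as described above.

```python
from collections import Counter, defaultdict


def order_tiles_by_material(material_ids, width, height, tile=4):
    """Return (material, (tile_x, tile_y)) pairs grouped by dominant material.

    material_ids: flat row-major list with one (hypothetical) material ID per pixel.
    Tiles whose dominant material is the same are returned consecutively, so the
    tracer can finish one material before moving on to the next.
    """
    buckets = defaultdict(list)
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Count how many pixels of this tile each material covers.
            counts = Counter(
                material_ids[y * width + x]
                for y in range(ty, min(ty + tile, height))
                for x in range(tx, min(tx + tile, width))
            )
            # Boundary tiles go to the material covering most of their pixels.
            dominant = counts.most_common(1)[0][0]
            buckets[dominant].append((tx, ty))

    # Emit all tiles of one material before starting the next material.
    order = []
    for material in sorted(buckets):
        order.extend((material, origin) for origin in buckets[material])
    return order


if __name__ == "__main__":
    # 8x4 image: left half uses material 1, right half uses material 0.
    ids = [1 if x < 4 else 0 for y in range(4) for x in range(8)]
    print(order_tiles_by_material(ids, 8, 4))
```

The eye rays for each tile would then be traced in the returned order. Pixels of a boundary tile that belong to a different material are still shaded with their own material; that is the small loss of coherence the technique accepts in exchange for long runs of identical shading code and texture data.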