Better tile rendering performance

Component:YafaRay Core
Category:feature request
Assigned:David Bluecame

There is a long-standing issue that occurs when fewer tiles remain than there are available cores. In a scene rendered with several passes, this slowdown happens at the end of every pass. It is a minor problem with low sampling, but with high sampling settings those last tiles take too long to render while most of the computing capacity stays idle. I think most YafaRay users render with 4 cores, if not 8.

With the limited knowledge I have of maths, I think a good solution could be to render a succession of increasingly smaller areas, each made of tiles of the same size, following a mathematical pattern for consistency and formal beauty, such as a Fibonacci sequence, the golden ratio, a geometric series or fractals.



Status:active» postponed


I agree, this is very annoying. For now the only workarounds are:

* To choose a smaller tile size. However, sometimes a smaller tile size will make the render a bit slower overall.

* To render several images at the same time in parallel. However, this only works if you really need to render several images and you are not in a hurry to get the results.


Perhaps this should be included in a project to implement progressive rendering, but at this time there are other pressing matters, such as improving the material system. I will postpone this until a better time.


Hi David, thank you for the feedback!

If you or any other developer has time to look into this in the future, I would like to propose an easier alternative to implementing a completely new tile sampling pattern: keep the existing one, but make it subdivide the final tiles. We set some initial conditions, like

threads > 1
tile size >= 4 pixels

And the pseudocode:

set "available tiles" value
set "threads" value
if available tiles <= threads / 2
    then subdivide available tiles / 4
render next tiles

I have attached an example below using the simplest case, which is two threads. There would still be a performance hit, but the idea is to make it as small as possible. Ideally, when there are several adaptive passes to render, idle threads would sample tiles of the subsequent pass while the running threads finish tiles in the current one, but the idea proposed here could still be useful for single passes or for the final pass. Thanks.
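
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration, not YafaRay code: the tile layout `(x, y, width, height)`, the names `split_in_four` and `next_tiles`, and the reading of "subdivide / 4" as "cut each remaining tile into four quarters that are never smaller than 4 pixels" are all assumptions.

```python
from typing import List, Tuple

# Hypothetical tile layout: (x, y, width, height) in pixels.
Tile = Tuple[int, int, int, int]

MIN_TILE = 4  # assumed reading of the "tile size >= 4 pixels" condition


def split_in_four(tile: Tile) -> List[Tile]:
    """Cut a tile into four quarters, or return it unchanged if too small."""
    x, y, w, h = tile
    hw, hh = w // 2, h // 2
    if hw < MIN_TILE or hh < MIN_TILE:
        return [tile]
    return [
        (x, y, hw, hh),
        (x + hw, y, w - hw, hh),
        (x, y + hh, hw, h - hh),
        (x + hw, y + hh, w - hw, h - hh),
    ]


def next_tiles(pending: List[Tile], threads: int) -> List[Tile]:
    """Return the tiles to render next, subdividing when few tiles are left."""
    if threads > 1 and len(pending) <= threads / 2:
        subdivided: List[Tile] = []
        for tile in pending:
            subdivided.extend(split_in_four(tile))
        return subdivided
    return list(pending)
```

With two threads, a single remaining 32x32 tile becomes four 16x16 tiles, while two remaining tiles are left untouched because 2 is not <= 2 / 2.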

xz4ul.gif 106.08 KB


Assigned to:Anonymous» David Bluecame
Status:postponed» ready to commit


A first implementation of this feature will be included in the next YafaRay-E version, possibly to be refined in subsequent versions.



I've been testing this feature, and I see that regardless of the tile size, it goes straight to size 4. I think it should halve the current tile size on every iteration, because tile size 4 is a lot slower than 16.

Also, I think the behaviour is not correct. As you can see in the example, there is plenty of room to continue normal tiling, but it already starts with the small tiles. In my opinion, this feature should kick in when one of the threads would be left without work, which normally happens when only one tile is left. In that case, you could divide that last tile by the total number of threads, or by 4, for example.

Sorry for the mouse pointer at the end :P. Also, there is a red square that is a compression artifact.


tile_animation.gif 81.77 KB
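
One possible reading of the suggestion above (subdivide only when a thread would otherwise go idle, and halve rather than jump straight to size 4) is sketched below. It is purely illustrative: the function name `split_until_busy`, the `(x, y, width, height)` tile layout and the choice to always halve the largest pending tile are assumptions, not part of YafaRay.

```python
from typing import List, Tuple

# Hypothetical tile layout: (x, y, width, height) in pixels.
Tile = Tuple[int, int, int, int]


def split_until_busy(pending: List[Tile], threads: int, min_size: int = 4) -> List[Tile]:
    """Halve the largest pending tile until every thread has a tile to render.

    Nothing is split while there are at least as many tiles as threads.
    Splitting stops early if the largest tile cannot be halved without
    going below min_size.
    """
    tiles = list(pending)
    while 0 < len(tiles) < threads:
        # Pick the tile with the largest area.
        idx = max(range(len(tiles)), key=lambda i: tiles[i][2] * tiles[i][3])
        x, y, w, h = tiles[idx]
        if w >= h and w // 2 >= min_size:
            half = w // 2  # halve along the width
            parts = [(x, y, half, h), (x + half, y, w - half, h)]
        elif h > w and h // 2 >= min_size:
            half = h // 2  # halve along the height
            parts = [(x, y, w, half), (x, y + half, w, h - half)]
        else:
            break  # largest tile is already at the minimum size
        tiles[idx:idx + 1] = parts
    return tiles
```

For example, with 4 threads and one 32x32 tile left, this produces four 16x16 tiles; with 5 tiles still pending, nothing is split.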


Status:ready to commit» needs work


Status:needs work» ready to commit

I've been investigating and unfortunately I cannot see an easy way to implement a "runtime" subdivision of tiles. The main reasons:

* YafaRay code is made for an initial setup of tiles. It's not designed for runtime tile changes during render. Nothing is impossible, but this would require a lot of work.

* Even if runtime tile changes were possible, it's not possible to know beforehand how long a certain tile will take to render, in order to decide whether it should be split or not. We would have to render it first and use the timing to decide what to do in subsequent passes.

As the above points make this too complicated, I've tried to fine-tune my original "pre-split" idea in a slightly cleverer way. In the beta version of v3.0.0 the last (threads * 2) tiles will be subdivided. The first "threads" number of tiles in that group will be divided by 2 and the second "threads" number of tiles divided by 4 (all with a minimum of 4x4). I hope this makes more sense to you.
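
As a rough illustration of the pre-split rule described above (not the actual v3.0.0 code), the tile size used at each position of the render order could be computed like this. The function name `presplit_sizes` is hypothetical, and "divided by 2" / "divided by 4" is assumed here to mean that the tile side length is divided, with 4 pixels as the lower limit.

```python
from typing import List


def presplit_sizes(num_tiles: int, threads: int, tile_size: int, min_size: int = 4) -> List[int]:
    """Side length to use for each original tile position, in render order.

    The last `threads` positions use tile_size / 4, the `threads` positions
    before them use tile_size / 2, and everything earlier keeps tile_size.
    """
    sizes = []
    for i in range(num_tiles):
        remaining = num_tiles - i  # tiles left, counting this one
        if remaining <= threads:
            divisor = 4
        elif remaining <= 2 * threads:
            divisor = 2
        else:
            divisor = 1
        sizes.append(max(min_size, tile_size // divisor))
    return sizes
```

For instance, with 10 tiles of 32 pixels and 2 threads, the first six positions keep 32, the next two use 16 and the last two use 8. Because the sizes depend only on the position in the render order, they can be fixed during the initial tile setup, which fits the constraint that tiles cannot change at runtime.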


Ok thanks!


Status:ready to commit» fixed

Fixed in v3.0.0-beta:


Status:fixed» closed

Closing as it was marked as "fixed" for 4 weeks without any further comments from users.