Given sufficient cores, FFRend runs each plugin on its own core, but this only maximizes throughput for symmetrical loads, i.e. the rare ideal case where each plugin requires the same amount of CPU time to render a frame. In practice it's common for some of a project's plugins to require significantly more CPU time than others. This situation is analogous to Amdahl's law: in a parallel system, throughput is limited by the work that can't be spread across cores, which here is the slowest plugin. FFRend provides a workaround, called load balancing, which can greatly improve the throughput of asymmetrical loads in some (but not all) cases.
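To make the cost of an asymmetrical load concrete, here is a minimal sketch of an idealized pipeline with one thread per plugin. The function name and the timings are hypothetical, and the model assumes plugins overlap perfectly with no overhead; it is not how FFRend measures anything.

```python
def pipeline_fps(frame_ms):
    """Frame rate of an idealized pipeline with one thread per plugin.

    frame_ms: CPU time each plugin needs per frame, in milliseconds.
    Assumes the plugins overlap perfectly, so a finished frame emerges
    once per slowest-plugin time.
    """
    return 1000.0 / max(frame_ms)


symmetrical = [10.0, 10.0, 10.0]   # hypothetical timings, 30 ms in total
asymmetrical = [5.0, 5.0, 20.0]    # same total, unevenly distributed

print(pipeline_fps(symmetrical))   # 100.0
print(pipeline_fps(asymmetrical))  # 50.0
```

Both projects need the same total CPU time per frame, but in this model the asymmetrical one runs at half the frame rate because its slowest plugin sets the pace.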
The main limitation of load balancing is that it only works for stateless filters, i.e. plugins that don't store any type of history. Plugins that won't work include source plugins, time blurs, feedbacks, and plugins with internal oscillators. The reason they don't work can be illustrated as follows. Suppose we have a crude time blur that simply outputs the average of the current frame and the previous frame. Allocating two threads to this plugin results in two instances of it, one processing even-numbered frames, and the other processing odd-numbered frames. Since each instance keeps its own previous frame, neither sees the frame that actually came before it: each one averages the current frame with the frame two steps back. We now have two different versions of the average, and the output alternates rapidly between them. This appears as strobing and clearly isn't what we want. The good news is that many useful but CPU-intensive plugins are stateless, including most blurs and keys. Note that if a plugin is only stateful because of its internal oscillators, it may be possible to make it stateless by disabling the oscillators.
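The time blur problem can be reproduced in a few lines. This is only an illustrative sketch: each frame is reduced to a single brightness value, the function names are hypothetical, and the round-robin distribution of frames is an assumption based on the even/odd description above. The flickering input is chosen deliberately to make the effect obvious.

```python
def make_time_blur():
    """Return a stateful filter: average of the current and previous input."""
    prev = None

    def blur(frame):
        nonlocal prev
        out = frame if prev is None else (frame + prev) / 2
        prev = frame  # this history is what makes the filter stateful
        return out

    return blur


def render(frames, num_instances):
    """Hand frames round-robin to independent instances of the filter."""
    instances = [make_time_blur() for _ in range(num_instances)]
    return [instances[i % num_instances](f) for i, f in enumerate(frames)]


# A flickering input: even frames dark, odd frames bright.
frames = [0.0, 10.0, 0.0, 10.0, 0.0, 10.0]

print(render(frames, 1))  # [0.0, 5.0, 5.0, 5.0, 5.0, 5.0]
print(render(frames, 2))  # [0.0, 10.0, 0.0, 10.0, 0.0, 10.0]
```

With one instance the blur smooths the flicker as intended. With two instances, each one only ever sees its own half of the frames, so the output alternates between two unrelated averages.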
To show the Load Balance control bar, use View/Load Balance or Shift+L. The bar allows you to specify the number of threads allocated to each plugin. Plugins are initially allocated one thread apiece. The bar also shows the current CPU usage of each plugin thread, both as a percentage and in milliseconds. The percentage indicates how much of one core the corresponding thread is using. If it's 100%, the thread is saturating a core and may be limiting throughput.
To change the number of threads allocated to a plugin, edit its thread count and press Tab, or left-click the edit control's up/down buttons. Changes are applied immediately, but the load balance statistics can take up to a second to update. Load balance settings are project-specific and are therefore saved in the project file.
Successful load balancing often involves trial and error. The following example may be helpful. Suppose a project has three plugins: A, B, and C, connected in series. B takes twice as long as A, and C takes twice as long as B. Assume the frame size is large enough that we're not achieving the desired frame rate, and also assume we have eight cores to play with. The Load Balance bar shows that A is using only about 25% of a core, B is using about 50%, and C is using 100%. C appears to be the bottleneck, so we allocate two threads to C. B now increases to 100%, and C's two threads each use about 100%, but the frame rate is still unsatisfactory, so we add another thread to C. C now has three threads, but they each use only about 65%, and the frame rate doesn't improve. Surprise! This means C is starved for input. Since B is now the limiting factor, we add a second thread to B. B's two threads each use 75%, and C's three threads increase to 100%. We're back to C being the bottleneck, so we add one more thread to C. B now has two threads at 100%, while C has four threads at 100%, and that's the best we can do with eight cores.
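The walkthrough above can be checked against a simple model. The sketch below assumes an idealized series pipeline with no overhead, in which a plugin with n threads can deliver a frame every (time per frame / n); the function name and the 1:2:4 timings are hypothetical stand-ins for A, B, and C.

```python
def utilization(frame_ms, threads):
    """Idealized series pipeline: return (frame period, per-thread usage).

    frame_ms: plugin name -> CPU time per frame, in milliseconds
    threads:  plugin name -> number of threads allocated to that plugin
    """
    # With n threads, a plugin can deliver a frame every frame_ms / n.
    period = max(frame_ms[p] / threads[p] for p in frame_ms)
    # Each thread handles every n-th frame, so it is busy for frame_ms
    # out of every n * period milliseconds.
    usage = {p: frame_ms[p] / (threads[p] * period) for p in frame_ms}
    return period, usage


frame_ms = {"A": 1.0, "B": 2.0, "C": 4.0}  # hypothetical times, ratio 1:2:4
steps = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 2},
    {"A": 1, "B": 1, "C": 3},
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 2, "C": 4},
]
for threads in steps:
    period, usage = utilization(frame_ms, threads)
    summary = ", ".join(f"{p} x{threads[p]} at {usage[p]:.0%}" for p in frame_ms)
    print(f"period {period:.2f} ms: {summary}")
```

Under these assumptions the model follows the same path as the example: C's three threads drop to roughly two-thirds usage while B is the bottleneck, B's two threads sit at 75% once C limits the rate again, and the final allocation has every thread at 100%. In the last step A's single thread is also saturated, so seven threads are fully busy and no eight-thread allocation yields a shorter frame period in this model.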
If adding extra threads to a plugin fails to increase throughput, it's because either that plugin is starved for input, or the CPU is fully loaded. In the former case the solution is to identify and address the bottleneck; in the latter case, there's no solution, other than a new CPU. Both cases exhibit the same symptom: more threads doing less work. The cases can be distinguished by examining the task manager. If you're not achieving your requested frame rate, but the CPU isn't fully loaded according to the task manager, one or more plugins must be starved for input. It's also possible to determine this from the percentages in the Load Balance bar. The sum of all the percentages can't exceed 100% times the number of cores. If you have four cores, but the percentages only add up to 300%, a core must be idle (assuming no other CPU-intensive applications are running), and tweaking the balance could yield further gains. The ultimate proof of success is improved frame rate.
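The rule of thumb above can be written down as a small check. This is a sketch only: the function, its parameters, and the half-core tolerance are hypothetical, and the percentages are assumed to be read by hand from the Load Balance bar with no other CPU-intensive applications running.

```python
def diagnose(thread_percent, num_cores, achieved_fps, target_fps,
             slack_cores=0.5):
    """Classify a load-balance reading using the sum-of-percentages rule.

    thread_percent: CPU usage of each plugin thread (100 = one full core).
    slack_cores:    hypothetical tolerance; less unused capacity than this
                    is treated as a fully loaded CPU.
    """
    if achieved_fps >= target_fps:
        return "ok"
    unused_cores = num_cores - sum(thread_percent) / 100.0
    if unused_cores > slack_cores:
        return "starved"    # spare CPU exists: some plugin is waiting for input
    return "cpu-bound"      # no spare CPU: rebalancing can't help


# Four cores, but the percentages only add up to 300%.
print(diagnose([100, 100, 50, 50], num_cores=4,
               achieved_fps=20, target_fps=30))   # starved

# Four cores, nearly all of them busy.
print(diagnose([100, 100, 100, 95], num_cores=4,
               achieved_fps=20, target_fps=30))   # cpu-bound
```

A "starved" result means it's worth looking for the thread that sits at 100% and feeds the others; a "cpu-bound" result means further rebalancing is unlikely to help.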
Note that running all cores at 100% can potentially make the GUI sluggish, and may worsen MIDI latency. This is an important factor if you're using FFRend for live performances, but it might not matter if you're only recording movies.