About WaveActiveLerp()

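Hi there, Pocol here.
While digging through articles on wave intrinsics, I came across someone who had published an implementation of WaveActiveLerp() on GitHub, so I'd like to introduce it here.
The write-up explaining it is at the link below.
https://github.com/AlexSabourinDev/cranberry_blog/blob/master/WaveActiveLerp.md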

The implementation lives at https://github.com/AlexSabourinDev/cranberry_blog/blob/master/WaveActiveLerp_Shaders/WaveActiveLerp.hlsl and looks like this:

[hlsl]
uint WaveGetLastLaneIndex()
{
    uint4 ballot = WaveActiveBallot(true);
    uint4 bits = firstbithigh(ballot); // Returns -1 (0xFFFFFFFF) if no bits set.

    // For reasons unclear to me, firstbithigh causes us to consider `bits` as a vector when compiling for RDNA
    // This then causes us to generate a waterfall loop later on in WaveReadLaneAt :(
    // Force scalarization here. See: https://godbolt.org/z/barT3rM3W
    bits = WaveReadLaneFirst(bits);
    bits = select(bits == 0xFFFFFFFF, 0, bits + uint4(0, 32, 64, 96));

    return max(max(max(bits.x, bits.y), bits.z), bits.w);
}

float WaveReadLaneLast(float t)
{
    uint lastLane = WaveGetLastLaneIndex();
    return WaveReadLaneAt(t, lastLane);
}

// Interpolates as lerp(lerp(Lane2, Lane1, t1), Lane0, t0), etc
//
// NOTE: Values need to be sorted in order of last interpolant to first interpolant.
//
// As an example, say we have the loop:
// for(int i = 0; i < 4; i++)
//     result = lerp(result, values[i], interpolations[i]);
//
// Lane0 should hold the last value, i.e. values[3]. NOT values[0].
//
// WaveActiveLerp instead implements the loop as a reverse loop:
// for(int i = 3; i >= 0; i--)
//     result = lerp(result, values[i], interpolations[i]);
//
// return.x == result of the wave's interpolation
// return.y == product of all the wave's (1-t) for continued interpolation.
float2 WaveActiveLerp(float value, float t)
{
    // lerp(v1, v0, t0) = v1 * (1 - t0) + v0 * t0
    // lerp(lerp(v2, v1, t1), v0, t0)
    // = (v2 * (1 - t1) + v1 * t1) * (1 - t0) + v0 * t0
    // = v2 * (1 - t1) * (1 - t0) + v1 * t1 * (1 - t0) + v0 * t0

    // We can then split the elements of our sum for each thread.
    // Lane0 = v0 * t0
    // Lane1 = v1 * t1 * (1 - t0)
    // Lane2 = v2 * (1 - t1) * (1 - t0)

    // As you can see, each thread's (1 - tn) term is simply the product of the previous thread's terms.
    // We can achieve this result by using WavePrefixProduct

    float prefixProduct = WavePrefixProduct(1.0f - t);
    float laneValue = value * t * prefixProduct;
    float interpolation = WaveActiveSum(laneValue);

    // If you don't need this for a continued interpolation, you can simply remove this part.
    float postfixProduct = prefixProduct * (1.0f - t);
    float oneMinusT = WaveReadLaneLast(postfixProduct);

    return float2(interpolation, oneMinusT);
}
[/hlsl]
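
To convince myself that the prefix-product trick really matches the sequential loop, here is a small CPU-side sketch in Python. It is not shader code and not part of the original repository: a plain list stands in for the lanes of a wave, and the helper names (`wave_prefix_product`, `wave_active_lerp`, `reference_lerp`) are hypothetical ones I made up for illustration. It assumes all lanes are active and that WavePrefixProduct is exclusive, i.e. each lane gets the product of the lanes with a lower index. Since every lane multiplies its value by its own t, the result corresponds to the reverse loop started from `result = 0`.

```python
def lerp(a, b, t):
    # Same definition as the HLSL intrinsic: a * (1 - t) + b * t.
    return a * (1.0 - t) + b * t


def wave_prefix_product(xs):
    # Stand-in for WavePrefixProduct: exclusive prefix product.
    # Lane i receives the product of lanes 0..i-1; lane 0 receives 1.
    out, acc = [], 1.0
    for x in xs:
        out.append(acc)
        acc *= x
    return out


def wave_active_lerp(values, ts):
    # List-based emulation of the HLSL WaveActiveLerp above (assumption:
    # every lane is active, so the "last lane" is simply the last element).
    prefix = wave_prefix_product([1.0 - t for t in ts])
    lane_values = [v * t * p for v, t, p in zip(values, ts, prefix)]
    interpolation = sum(lane_values)              # WaveActiveSum
    one_minus_t = prefix[-1] * (1.0 - ts[-1])     # WaveReadLaneLast(postfixProduct)
    return interpolation, one_minus_t


def reference_lerp(values, ts, initial=0.0):
    # The reverse loop from the comment: lane 0 is applied last.
    result = initial
    for i in reversed(range(len(values))):
        result = lerp(result, values[i], ts[i])
    return result


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    ts = [0.25, 0.5, 0.75, 0.1]
    x, y = wave_active_lerp(values, ts)
    print(x, reference_lerp(values, ts))

    # Continued interpolation: fold in a value carried over from earlier work.
    previous = 10.0
    print(previous * y + x, reference_lerp(values, ts, initial=previous))
```

The second print shows what return.y is for: if an earlier result `previous` should sit at the very start of the chain, `previous * y + x` gives the same value as running the whole reverse loop from `previous`.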

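I can't think of a use case off the top of my head right now, but it feels like the kind of thing that will come in handy somewhere once you know about it.
...So that was a quick look at an implementation of WaveActiveLerp().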