WaveActiveLerp()について

こんちゃわ。Pocolです。
Wave組み込み命令の記事を漁っていたら,GithubにWaveActiveLerp()の実装を書いている人がいたので紹介しようと思います。
下記に説明の記事があります。
https://github.com/AlexSabourinDev/cranberry_blog/blob/master/WaveActiveLerp.md

実装は,https://github.com/AlexSabourinDev/cranberry_blog/blob/master/WaveActiveLerp_Shaders/WaveActiveLerp.hlslにあって,次のような感じみたいです。

uint WaveGetLastLaneIndex()
{
	uint4 ballot = WaveActiveBallot(true);
	uint4 bits = firstbithigh(ballot); // Returns -1 (0xFFFFFFFF) if no bits set.
	
	// For reasons unclear to me, firstbithigh causes us to consider `bits` as a vector when compiling for RDNA
	// This then causes us to generate a waterfall loop later on in WaveReadLaneAt :(
	// Force scalarization here. See: https://godbolt.org/z/barT3rM3W
	bits = WaveReadLaneFirst(bits);
	bits = select(bits == 0xFFFFFFFF, 0, bits + uint4(0, 32, 64, 96));

	return max(max(max(bits.x, bits.y), bits.z), bits.w);
}

float WaveReadLaneLast(float t)
{
	uint lastLane = WaveGetLastLaneIndex();
	return WaveReadLaneAt(t, lastLane);
}

// Interpolates as lerp(lerp(Lane2, Lane1, t1), Lane0, t0), etc
// 
// NOTE: Values need to be sorted in order of last interpolant to first interpolant.
// 
// As an example, say we have the loop:
// for(int i = 0; i < 4; i++)
//    result = lerp(result, values[i], interpolations[i]);
// 
// Lane0 should hold the last value, i.e. values[3]. NOT values[0].
// 
// WaveActiveLerp instead implements the loop as a reverse loop:
// for(int i = 3; i >= 0; i--)
//    result = lerp(result, values[i], interpolations[i]);
// 
// return.x == result of the wave's interpolation
// return.y == product of all the wave's (1-t) for continued interpolation.
float2 WaveActiveLerp(float value, float t)
{
	// lerp(v1, v0, t0) = v1 * (1 - t0) + v0 * t0
	// lerp(lerp(v2, v1, t1), v0, t0)
	// = (v2 * (1 - t1) + v1 * t1) * (1 - t0) + v0 * t0
	// = v2 * (1 - t1) * (1 - t0) + v1 * t1 * (1 - t0) + v0 * t0

	// We can then split the elements of our sum for each thread.
	// Lane0 = v0 * t0
	// Lane1 = v1 * t1 * (1 - t0)
	// Lane2 = v2 * (1 - t1) * (1 - t0)

	// As you can see, each thread's (1 - tn) term is simply the product of the previous thread's terms.
	// We can achieve this result by using WavePrefixProduct
		
	float prefixProduct = WavePrefixProduct(1.0f - t);
	float laneValue = value * t * prefixProduct;
	float interpolation = WaveActiveSum(laneValue);

	// If you don't need this for a continued interpolation, you can simply remove this part.
	float postfixProduct = prefixProduct * (1.0f - t);
	float oneMinusT = WaveReadLaneLast(postfixProduct);

	return float2(interpolation, oneMinusT);
}

いまのところで,使いどころがパッと浮かばないのですが,知っていればどこかで使えそうな気がしています。
…というわけで,WaveActiveLerp()の実装紹介でした。