Screen Gloom Optimization in Wavetale

Intro

I wasn't part of the original Wavetale team as I was busy working on Lost in Random. Over the last year and a half, however, I have been part of a small team working to port Wavetale to more platforms. I've taken notes of some of the more interesting developments during this time and intend to publish a series of articles about them. This is the first of those, not counting the one introducing my Shader Cleanup tool.

We started our work by targeting the current gen platforms: PlayStation 5, Xbox Series X|S, and PC. While looking into graphics performance I noticed that a surprising amount of time was spent in the UI rendering: a whopping 7 ms per frame on Xbox Series X. For a game with a relatively sparse UI this was clearly not normal. Looking closer I soon found the culprit: a single post effect, running one full screen pass per frame, was taking those 7 ms to render. The effect in question was the Screen Gloom.

The original gloom shader

I was actually quite excited for this task. Most of the time my tasks are quite straightforward: I've identified some part of the rendering pipeline which is taking more resources than I feel it should. I look into why and find some minor changes which fix the issue. Sometimes I tweak the code, sometimes I change some settings. The larger tasks are mainly about refactoring things, implementing known techniques, etc. It's quite rare that I get free rein to just create a visual effect with no requirement other than it being efficient enough and evoking some kind of feeling. It's also highly unusual that what I work on is this abstract and artistic. I'm usually writing code to try and simulate some real-world phenomenon and trying to find a good balance between realism and efficiency. With this task I was quite free to just do anything which was fast enough and looked somewhat like gloom floating across the screen. As I took on the task I joked that I was donning my beret because today I was an artist!

The task itself wasn't big. After some initial discussions I only spent two days actually looking into the problem and creating my replacement effect. But it was a surprisingly enjoyable task and I was quite happy with the result. What really made me want to make a writeup, however, was the fact that this is a short and relatively simple shader packed with fun little details to talk about. It's also entirely standalone and as such doesn't touch on any sensitive information about our code base in general.

Purpose of the shader

In Wavetale all enemies are made of a mysterious gooey substance known as gloom. All damage you take is from being covered in gloom which threatens to enclose you entirely, as it already has many of the NPCs you need to rescue throughout the world. When our hero Sigrid takes damage you can see gloom begin to cover her body. To reinforce this there is also a shader where the more damage you take the more the screen becomes covered in blobs of gloom floating around menacingly. This is our Screen Gloom effect. As originally shipped on Stadia this effect clearly did its intended job, and performance-wise it wasn't an issue.

The problem

While the original screen gloom shader did what was needed for Stadia it wasn't efficient enough for other platforms. As mentioned above, on an Xbox Series X the pass took 7 ms. To put that in perspective at 60 frames per second we only have 16.7 ms to render the entire screen including all of the world and its characters, our intricate water shader, water reflections, etc. To have over 40% of that time eaten up by a single visual effect visible only while taking damage is clearly unacceptable.
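As a quick sanity check of the numbers above, here is a small Python sketch of the frame budget arithmetic. The function names are made up for illustration; only the 7 ms and 60 frames per second figures come from the text.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / fps


def budget_share(pass_ms: float, fps: float) -> float:
    """Fraction of the frame budget consumed by a single pass."""
    return pass_ms / frame_budget_ms(fps)


budget = frame_budget_ms(60.0)       # roughly 16.7 ms
old_share = budget_share(7.0, 60.0)  # roughly 0.42, i.e. over 40%
```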

So why was this original version so expensive? The basic idea of it was to animate two layers of gloom blobs creeping across the screen. These blobs are generated using Voronoi noise, which partitions the screen into a set of cells.

Voronoi noise generated by the Voronoi node in Unity's Shader Graph

This type of noise is a good match because of its organic look which resembles biological cells. Unfortunately, a single instance of this noise requires 9 samples per pixel each of which is calculated using two calls to sin() and one to cos(). With two blocks of two layers each used to generate the blobs and spread them across the screen this ends up being 108 trigonometric functions for the noise alone. Add to this a bunch of code for fake lighting and a few other details and you have yourself one very heavy effect. For comparison, the original shader contains over a thousand floating point operations while the replacement contains just over a hundred. I spent some time comparing them and for every metric (float32 operations, instruction count, average cycles, ...) the original is between 6 and 11 times worse than the replacement.
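The count of 108 follows directly from the figures in the paragraph above. A tiny Python sketch of that arithmetic, with constant names invented for this example:

```python
SAMPLES_PER_PIXEL = 9       # samples needed for one Voronoi noise lookup
TRIG_CALLS_PER_SAMPLE = 3   # two sin() and one cos(), as described above
BLOCKS = 2                  # two blocks ...
LAYERS_PER_BLOCK = 2        # ... of two layers each

trig_calls_per_noise = SAMPLES_PER_PIXEL * TRIG_CALLS_PER_SAMPLE
total_trig_calls = trig_calls_per_noise * BLOCKS * LAYERS_PER_BLOCK
```

That is 27 trigonometric calls per noise lookup and four lookups per pixel, before any of the lighting code has run.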

Initial ideas

I wanted to create something which would give the same general impression as the original solution but with far fewer calculations. The most important consideration was making something significantly simpler which wouldn't have to be replaced again as we moved on to optimizing the game for weaker platforms. It was also early in the porting project and I had a lot of other things which needed doing. As such, I needed a solution which was simple, quick to implement, efficient, and looked decent enough. I sat down and wrote down a number of alternative approaches:

The last approach was the one I felt had the most potential and so I decided to dedicate a couple of days to trying it out and seeing how close it would get me. This ended up being good enough that the resulting shader got signed off and the task closed. So I moved on with my life, until we finished the work on current gen consoles and I realised this might actually make for an interesting article.

The results

Before we go into the more technical details, let's look at the results. Here you can see a comparison between the original Stadia version of the Screen Gloom and my replacement used in all ports of the game.

The original screen gloom implementation followed by the replacement

You will notice that there are some differences in the visual characteristics of the two, but overall they convey the same idea: a set of gloom blobs are swimming across the screen, crawling their way across by extending outwards and contracting. The original effect looks its best at 0% health. The replacement is instead optimized to look its best at ~50% health, which is where I expect you to be most of the time when seeing this effect.

As for performance? The original version took 7 ms on Xbox Series X. The replacement shader, on the other hand, costs 0.3 ms.

The chosen idea

So what, more exactly, is the intuition behind this idea and how did I go about implementing it? To understand what the shader does, let's look at this video. This was recorded in Unity's game view with my debug grid active and live tweaking the parameters to add each aspect of the solution one at a time. I've added a link to the relevant timestamp in each step of the explanation below.

A video displaying how the screen gloom replacement works by successively modifying parameters to add its different effects.

  1. Here you can see that we start off with a simple grid of circles of pseudorandom radius. You may notice that a lot of cells are empty. This helps obscure the underlying grid pattern.
  2. We then start scrolling the grid across the screen.
  3. Afterwards we warp the coordinate system by offsetting it by a combination of sine and cosine of time and uv coordinates. This makes our blobs sway back and forth as they traverse the screen.
  4. On top of this, we apply a scaled version of the same warp, with swapped x and y coordinates. Now our blobs move forward through undulating contractions instead of just floating by. This helps their movements look more organic and less synchronized.
  5. We now add a second layer of blobs with another scale.
  6. Finally, a mask is added such that the blob radius is scaled towards zero the closer they get to the screen center. This makes the blobs appear to be rendered as a vignette.

Implementation

So, how is this effect actually computed? The main part of the shader consists of the blobGrid() function which, after simplifying it somewhat (mainly removing the debug visualization), looks like this:

int blobGrid(float2 uv, float gridSize, float gravity, out float lightFactor)
{
	float aspect = _ScreenParams.x / _ScreenParams.y;
	float2 aspectCorrectedUV = float2(uv.x * aspect,uv.y);
	float2 toCenter = (uv - float2(0.5,0.5));
	float toCenterDistSq = dot(toCenter,toCenter);

	// UV offset
	float2 offsetUV = 0.5 + _OffsetScale0 * 
		float2(2.446 * cos(_Time.x * 40.0) * sin(13.4 * uv.y - 2.3 * uv.x) * cos(0.4 * uv.y + 9.3 * uv.x)
				, -1.453 * sin(uv.y));

	// Combine UV distortions
	float2 compositedUV = aspectCorrectedUV + offsetUV + gravity * _OffsetScale1 * float2(offsetUV.y, -offsetUV.x);

	// Add gravity
	float2 finalUV = compositedUV + float2(0.0,gravity * _Time.x);

	float2 gridIndex = floor(finalUV * gridSize);

	// Drop some to obscure grid pattern
	if(fmod(gridIndex.x, 5.0) == 0.0 
			|| fmod(gridIndex.y, 5.0) == 0.0 
			|| fmod(gridIndex.x - gridIndex.y, 3.0) == 0.0)
		return COLOR_NONE;

	// Radius
	float health = 1.0 - _HealthLevel;
	float centerFade = saturate((toCenterDistSq - _CenterMaskRadius) / _CenterMaskRadius);
	float radiusshift = saturate(0.8 + cos(gridIndex.x + gridIndex.y));
	float cosIndex = cos(157.25 * gridIndex.x + 4414.146 * gridIndex.y);
	float cosRadius = 0.5 * (2.0 - cosIndex);
	float radius = _BlobRadius * radiusshift * cosRadius * centerFade * health;
	float rimWidth = _RimWidth * centerFade * health;

	int exitCode = blobIntersect(finalUV, radius, rimWidth, gridSize);

	// Fake lighting
	lightFactor = 1.0;
	if(exitCode == COLOR_BASE)
	{
		float2 localUV = frac(gridSize * finalUV);
		float2 toHighlight = localUV - float2(0.4,0.6);
		float highlight = 0.3 * (dot(toHighlight, toHighlight) < 0.05);
		float dropShadow = 1.0 - localUV.y;
		lightFactor = 1.0 + highlight - (dropShadow * dropShadow);
	}

	return exitCode;

}

The blobIntersect function called near the end simply checks whether the generated coordinates are within the base, rim, or entirely outside of the corresponding blob:

int blobIntersect(float2 uv, float radius, float rimWidth, float gridSize)
{
	float2 blobGrid = frac(gridSize * uv);
	float len = distance(blobGrid,float2(0.5,0.5));
	float diff = len - radius;
	if(diff <= 0)
		return COLOR_BASE;
	else if(diff < rimWidth)
		return COLOR_RIM;
	return COLOR_NONE;
}

All of this generates an exit code which is one of COLOR_NONE, COLOR_RIM, or COLOR_BASE. This is later translated to an actual colour configurable in the material properties and modified using the lightFactor.
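The article doesn't show the code which turns an exit code into a colour, so the following Python sketch is only a guess at what that step might look like. The numeric values of the exit codes, the function name resolve_color, and the choice to multiply only the RGB channels by lightFactor are all assumptions made for illustration.

```python
# Assumed values: the article names these constants but not their values.
COLOR_NONE, COLOR_RIM, COLOR_BASE = 0, 1, 2


def resolve_color(exit_code, light_factor, base_color, rim_color):
    """Return an (r, g, b, a) tuple for one blob layer, or None if the
    pixel is not covered by a blob in that layer."""
    if exit_code == COLOR_NONE:
        # blobGrid() can return early with COLOR_NONE before it writes
        # lightFactor, so the light factor is ignored on this path.
        return None
    r, g, b, a = base_color if exit_code == COLOR_BASE else rim_color
    # lightFactor is 1.0 for rim pixels, so this only changes base pixels.
    return (r * light_factor, g * light_factor, b * light_factor, a)
```

With two layers, a caller would presumably evaluate both and let one take priority where both cover the pixel, but that detail is not described in the text.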

Breaking down the code

So let's go through these lines and look in detail at what most of them actually do and why. I'll be skipping some of the most boring or obvious parts.

// UV offset
float2 offsetUV = 0.5 + _OffsetScale0 * 
	float2(2.446 * cos(_Time.x * 40.0) * sin(13.4 * uv.y - 2.3 * uv.x) * cos(0.4 * uv.y + 9.3 * uv.x)
			, -1.453 * sin(uv.y));

Here we calculate a first offset to our uv coordinates. We take the input _OffsetScale0, configurable on the material, and multiply by a combination of cos() and sin(), each with different scales. These numbers, like most here, are magic constants based on playing around and seeing what works well. I tried a lot of things which didn't end up used. Originally I based the distortion on a rotation of the grid, but it was quite far from the original effect and never looked natural. I also tried using the exponential function exp() but it ended up skewing things far too strongly. In the end this simpler position offset in two steps ended up looking the best. The important part here is that for x we combine uv coordinates and current time in a way which is messy and unpredictable to the human eye. For y we only add a simple vertical distortion based on the y coordinate itself. This is because while we want a lot of spurious horizontal movement we mainly want our blobs to move at a steady pace vertically.

If you haven't worked with shaders before you might want to note the 0.5 added in front: this is a commonly used trick. Since our trigonometric functions all return values in the range [-1,1] and we want to map this to [0,1] we need to remap the range. To do this, given a value uv we can simply do 0.5 + 0.5 * uv. If you're wondering where the factor 0.5 went in the above code it's assumed to be baked into the _OffsetScale0 variable. The same remapping is often used to move from normalized device coordinates to texture coordinates as well.
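For readers who want to see the remap in isolation, here is a minimal Python sketch. The function name is made up for this example.

```python
import math


def remap_signed_to_unit(value: float) -> float:
    """Map a value from the range [-1, 1] to the range [0, 1]."""
    return 0.5 + 0.5 * value


# sin() and cos() never leave [-1, 1], so the remapped value stays in [0, 1].
remapped = [remap_signed_to_unit(math.sin(0.1 * i)) for i in range(100)]
```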

You'll notice throughout this code that I largely generate values in the [0,1] range before applying a user configurable scale. This makes it a lot easier to balance our equations so cells don't become degenerate or discontinuous.
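To make the behaviour of the first offset easier to poke at outside of Unity, here is a CPU-side Python port of that one expression. It is a sketch, not the shipped code: time_x stands in for Unity's _Time.x, offset_scale0 for the _OffsetScale0 material property, and the values passed in below are arbitrary.

```python
import math


def offset_uv(u, v, time_x, offset_scale0):
    """Python port of the offsetUV expression from blobGrid()."""
    x = (2.446 * math.cos(time_x * 40.0)
         * math.sin(13.4 * v - 2.3 * u)
         * math.cos(0.4 * v + 9.3 * u))
    y = -1.453 * math.sin(v)
    return (0.5 + offset_scale0 * x, 0.5 + offset_scale0 * y)


# Same pixel at two different times: only the x component changes.
early = offset_uv(0.3, 0.7, 0.0, 0.1)
late = offset_uv(0.3, 0.7, 5.0, 0.1)
```

The x component is a product of three terms in [-1,1] scaled by 2.446, so it can never stray further than 2.446 * offset_scale0 from 0.5. The y component has no time term at all, which matches the goal of keeping vertical movement steady.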

// Combine UV distortions
float2 compositedUV = aspectCorrectedUV + offsetUV + gravity * _OffsetScale1 * float2(offsetUV.y, -offsetUV.x);

Here we add our aspect corrected uv to the offset computed above. We also add a flipped version of the same distortion, with x and y swapped and one of them negated, scaled by both gravity and its own parameter. This is a simple way to perform another layer of unpredictable distortions with very little extra computational cost. The combination of our two offsets is what gives the undulating movement.
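Swapping the components of a 2D vector and negating one of them turns it by a quarter turn, so the second distortion always pushes at a right angle to the first. A small Python sketch of this step, with made-up function names and arbitrary sample values:

```python
def rotate_quarter_turn(offset):
    """Swap x and y and negate one, as float2(offsetUV.y, -offsetUV.x) does."""
    x, y = offset
    return (y, -x)


def composite_uv(aspect_uv, offset, gravity, offset_scale1):
    """Python port of the compositedUV expression from blobGrid()."""
    rx, ry = rotate_quarter_turn(offset)
    k = gravity * offset_scale1
    return (aspect_uv[0] + offset[0] + k * rx,
            aspect_uv[1] + offset[1] + k * ry)


offset = (0.7, 0.4)
rotated = rotate_quarter_turn(offset)
# The dot product of a vector and its quarter-turn rotation is zero.
dot = offset[0] * rotated[0] + offset[1] * rotated[1]
```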

float2 gridIndex = floor(finalUV * gridSize);

// Drop some to obscure grid pattern
if(fmod(gridIndex.x, 5.0) == 0.0 
		|| fmod(gridIndex.y, 5.0) == 0.0 
		|| fmod(gridIndex.x - gridIndex.y, 3.0) == 0.0)
	return COLOR_NONE;

A lot of the details throughout this function are simply things tweaked to make it more difficult for the player to discern any repeating patterns. Dropping a carefully chosen pattern of cells entirely was one of the best ways I found to obscure the fact that this is an animated grid. As long as the user can tell it's a grid the motion of different blobs becomes too predictable, which breaks the illusion.

We calculate which cell our current pixel resides in using floor(), which rounds down to the nearest integer. This corresponds to row and column number. I then drop every fifth row and column as well as each cell where the difference between column and row index is a multiple of three. I actually spent quite a lot of time, maybe an hour of active time spread over a few different sessions, trying different patterns before settling on this one.
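To get a feel for how sparse the grid becomes, the drop rule can be evaluated on the CPU. This Python sketch mirrors the three fmod() tests from the shader; the function name is made up, and the pattern is counted over a 15 by 15 block since it repeats with the least common multiple of 5 and 3.

```python
import math


def is_dropped(col: int, row: int) -> bool:
    """Mirror of the cell drop test in blobGrid()."""
    return (math.fmod(col, 5.0) == 0.0
            or math.fmod(row, 5.0) == 0.0
            or math.fmod(col - row, 3.0) == 0.0)


PERIOD = 15  # the pattern repeats every lcm(5, 3) cells in each direction
kept = sum(not is_dropped(c, r)
           for c in range(PERIOD) for r in range(PERIOD))
kept_fraction = kept / PERIOD ** 2
```

Only 96 of every 225 cells survive, so a bit more than half of the grid is empty before the radius modifiers have removed anything.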

float health = 1.0 - _HealthLevel;
float centerFade = saturate((toCenterDistSq - _CenterMaskRadius) / _CenterMaskRadius);

Here we calculate two values which are used in scaling blob radius. Health scales all radii, meaning the more damage you take, the more our blobs cover the screen. Then we calculate the aforementioned fade of blobs as they approach the center of the screen. Both of these are quite similar to what was done in the original shader. The video below shows how blob radius is scaled up as health goes from 100% to 0%. One nice improvement from the original is that we scale the rim width by the same health and center fade factors as the blob radius, so small blobs don't look like all rim and no base.

A video displaying how health affects blob radius. Health is briefly kept around 50% at this point.
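The two scale factors are easy to reproduce outside the shader. The following Python sketch ports them directly; the function names and the sample mask radius of 0.1 are assumptions for illustration, while health_level stands in for _HealthLevel.

```python
def saturate(x: float) -> float:
    """Clamp to [0, 1], like the HLSL intrinsic of the same name."""
    return max(0.0, min(1.0, x))


def center_fade(u, v, center_mask_radius):
    """0 near the screen center, ramping up to 1 further out."""
    du, dv = u - 0.5, v - 0.5
    dist_sq = du * du + dv * dv
    return saturate((dist_sq - center_mask_radius) / center_mask_radius)


def radius_scale(u, v, center_mask_radius, health_level):
    """Combined factor applied to both blob radius and rim width."""
    return center_fade(u, v, center_mask_radius) * (1.0 - health_level)
```

Note that the mask radius is compared against the squared distance to the center, so the fade is zero while the squared distance is below _CenterMaskRadius and reaches one at twice that value.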

float radiusshift = saturate(0.8 + cos(gridIndex.x + gridIndex.y));
float cosIndex = cos(157.25 * gridIndex.x + 4414.146 * gridIndex.y);
float cosRadius = 0.5 * (2.0 - cosIndex);
float radius = _BlobRadius * radiusshift * cosRadius * centerFade * health;
float rimWidth = _RimWidth * centerFade * health;

The code to compute blob radius is surprisingly intricate. This is because I quickly noticed that varied and unpredictable radii between neighbouring blobs were important to obscure the underlying patterns. These various factors were the results of a lot of experiments and each helped with reinforcing the illusion. After experimenting with these factors based on indices of grid cells and seeing the difference it made for radius I actually tried really hard to use them in other parts of the calculation. In the end they simply didn't contribute much to the position offset, mainly causing issues with discontinuity, but felt really good as a radius modifier.
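To see what ranges the two index-based factors actually cover, they can be ported to Python and evaluated per cell. This is a sketch with made-up function names; the constants are the ones from the shader.

```python
import math


def saturate(x: float) -> float:
    return max(0.0, min(1.0, x))


def radius_factors(col: int, row: int):
    """Per-cell pseudorandom radius modifiers from blobGrid()."""
    radius_shift = saturate(0.8 + math.cos(col + row))
    cos_index = math.cos(157.25 * col + 4414.146 * row)
    cos_radius = 0.5 * (2.0 - cos_index)
    return radius_shift, cos_radius


def blob_radius(col, row, blob_radius_scale, center_fade, health):
    """Final radius: material scale times all modifiers."""
    shift, cos_radius = radius_factors(col, row)
    return blob_radius_scale * shift * cos_radius * center_fade * health
```

radius_shift lands in [0,1] and cos_radius in [0.5,1.5]. Since radius_shift is clamped to zero whenever cos(col + row) is below -0.8, some cells which survive the drop pattern still end up with no blob at all, for example the cell at column 1, row 2.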

// Fake lighting
lightFactor = 1.0;
if(exitCode == COLOR_BASE)
{
	float2 localUV = frac(gridSize * finalUV);
	float2 toHighlight = localUV - float2(0.4,0.6);
	float highlight = 0.3 * (dot(toHighlight, toHighlight) < 0.05);
	float dropShadow = 1.0 - localUV.y;
	lightFactor = 1.0 + highlight - (dropShadow * dropShadow);
}

This adds a highlight spot to each blob and a simple drop shadow which tapers off quadratically across the height of each blob. It's a simple touch but it makes everything look a lot less flat. The original shader applied a much more intricate fake lighting to the blobs.
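The lighting term is small enough to evaluate by hand, but a Python port makes it easy to try different positions within a cell. The function name is made up; local_u and local_v correspond to the components of localUV.

```python
def light_factor(local_u: float, local_v: float) -> float:
    """Python port of the fake lighting applied to base pixels."""
    du, dv = local_u - 0.4, local_v - 0.6
    # Hard-edged highlight disc around (0.4, 0.6) in cell space.
    highlight = 0.3 if (du * du + dv * dv) < 0.05 else 0.0
    # Shadow grows quadratically as local_v goes from 1 down to 0.
    drop_shadow = 1.0 - local_v
    return 1.0 + highlight - drop_shadow * drop_shadow
```

At the middle of the highlight this gives 1 + 0.3 - 0.16 = 1.14, at local_v = 1 outside the highlight it gives exactly 1, and at local_v = 0 it drops to 0.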

Potential improvements

While I'm satisfied with the end result it could of course always be improved. From a visual perspective I would say the biggest issue is that neighbouring blobs still move a bit too much in unison. It isn't noticeable during normal gameplay, especially since it's only really visible when heavily damaged for a prolonged period of time. This is unlikely since you heal over time so you'd have to sustain continuous damage without dying. Not to mention that you'd have to stare at these blobs instead of the monsters trying to kill you. Still, when staring at the blobs during development my eyes often fix on the overall pattern in a way which makes it clear that these are circles painted on a deformed surface, not separate blobs moving roughly together. This could possibly be alleviated with a more careful choice of how to deform the grid. I did try to be clever and choose scaling factors for the various distortions and the grid resolution in such a way that nearby cells move less in unison. Unfortunately, this mostly resulted either in discontinuity or extreme warping and shearing of the blobs.

Another potential improvement would be to tweak the range of sizes for these blobs. Being a grid there's an unfortunate link between blob radius and density which complicates fine-tuning. One potential solution would be to have more layers and make each very sparse, dropping the majority of blobs. A related concern is that the current solution tends to create a lot of little specks. This is mainly due to blob radius being faded towards zero as they near the center. I tried limiting this by adding a minimum size and culling blobs below it. This made the blob surface discontinuous unless radius is based entirely on grid index. But such a radius makes for a less organic look. It also leads to flickering for blobs which happen to hover near this cutoff. While it would have been a nice improvement it was yet another thing which would have required more tweaking than I had time for.

Knowing myself and the relatively low priority of this task I had given myself a hard limit for how long to spend on it unless asked by our art director or producer to continue improving it. There will always be improvements and I find it far too easy to spend days on end just tweaking minor details without considering how worthwhile it really is. So yes, I do think the visual quality could be improved, but I already picked the low hanging fruit here and I couldn't justify spending more time on this task.

It's also worth considering further performance improvements. I did this while developing the effect and in short there's not much to be gained. There are some obvious slight improvements. The blobIntersect() function uses distance() but only uses the result in comparisons against the radius and rim width. This implies an unnecessary sqrt: we could instead dot blobGrid - float2(0.5,0.5) with itself and compare directly to radius * radius, and to the square of radius + rimWidth for the rim. Furthermore, looking at the equations for UV I haven't really put much thought into order of operations. I might be able to decrease the number of instructions by reordering things to get more MADs. That is, multiply-add (or its fused variant, FMA), a type of instruction which allows you to get an addition for free when performing a multiplication. Would this type of micro optimization help us? No. As a quick test I tried replacing my shader with a trivial one, or even removing the effect altogether, and it did nothing for performance. This shader is run in parallel with other UI work which isn't compute bound, so we just end up with slightly lower GPU utilization instead.
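For completeness, here is what the sqrt-free comparison could look like, written as a Python sketch next to a port of the current version. The exit code values and function names are assumptions, and the squared variant relies on radius and rim width never being negative, which holds as long as the material parameters aren't.

```python
import math

# Assumed values: the article names these constants but not their values.
COLOR_NONE, COLOR_RIM, COLOR_BASE = 0, 1, 2


def blob_intersect_sqrt(local_u, local_v, radius, rim_width):
    """Port of blobIntersect() as written, operating on cell-local UV."""
    diff = math.hypot(local_u - 0.5, local_v - 0.5) - radius
    if diff <= 0.0:
        return COLOR_BASE
    if diff < rim_width:
        return COLOR_RIM
    return COLOR_NONE


def blob_intersect_squared(local_u, local_v, radius, rim_width):
    """Same classification using squared lengths, so no sqrt is needed."""
    du, dv = local_u - 0.5, local_v - 0.5
    len_sq = du * du + dv * dv
    if len_sq <= radius * radius:
        return COLOR_BASE
    outer = radius + rim_width
    if len_sq < outer * outer:
        return COLOR_RIM
    return COLOR_NONE
```

Both versions classify a point the same way, apart from possible rounding differences exactly on a boundary. As noted above, this would not have changed the measured frame time in this case.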

Wavetale is out now!

I hope you enjoyed this trip through one of the many shaders used in Wavetale. The game is out now for Stadia, Xbox Series X|S, Xbox One, PlayStation 5, PlayStation 4, Nintendo Switch, and PC. It's a fun adventure exploring a beautiful world filled with quirky characters. I feel happy to have been a part of it and I really enjoyed taking the time to play through the story during development.

About the author

Hello,

My name is Daniel "Agentlien" Kvick and I'm a Software Engineer with a passion for games.
I currently work as a Graphics Programmer at Thunderful Development.

Here you'll find a selection of things I have worked on.