# The reverse engineering of the Witcher 3 rendering: Milky Way, portals and color grading

## Part 1: The Milky Way

In a previous post, I talked about how shooting stars are implemented in The Witcher 3. There is no such effect in Blood and Wine. In this post, I will describe an effect that only this DLC has: the Milky Way.

Here is a video showing the Milky Way.

And a few screenshots: (1) before the call to draw the dome of the sky, (2) only with the color of the Milky Way, (3) after the call:

**Screenshots**

The finished frame with only the Milky Way (without the sky color and the stars) looks like this:

The Milky Way effect, one of the most striking differences from the 2015 version of the game, is briefly mentioned in the section "Silly stunts with the sky". Let's see how it is implemented!

The outline will be familiar: first, we will briefly explain everything related to geometry, and then we'll talk about the pixel shader.

### 1. Geometry

Let's start with the sky dome mesh. There are two major differences between the 2015 dome (the main game + the “Hearts of Stone” DLC, I usually call them both the “2015 game”) and the dome in the Blood and Wine DLC (2016):

a) In “Blood and Wine” the mesh is much denser,

b) The Blood and Wine sky dome mesh contains normal vectors.

Here's the 2015 sky dome mesh - DrawIndexed (720):

*Sky dome mesh in The Witcher 3 (2015) - 720 indices*

And here is the mesh from Blood and Wine - DrawIndexed (2640):

*Sky dome mesh in the "The Witcher 3: Blood and Wine" DLC - 2640 indices*

Here is the Blood and Wine mesh again: I drew how the normals are distributed - they point towards the "center" of the mesh.

*DLC Blood and Wine Dome Mesh with Normals*

### 2. Vertex Shader

The vertex shader of the sky dome is quite simple. Here is the corresponding assembly code. For the sake of simplicity, I skipped the calculation of SV_Position:

```
vs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb1[4], immediateIndexed
dcl_constantbuffer cb2[6], immediateIndexed
dcl_input v0.xyz
dcl_input v1.xy
dcl_input v2.xyz
dcl_output o0.xyzw
dcl_output o1.xyzw
dcl_output_siv o2.xyzw, position
dcl_temps 3
0: mov o0.xy, v1.xyxx
1: mad r0.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx
2: mov r0.w, l(1.000000)
3: dp4 o0.z, r0.xyzw, cb2[0].xyzw
4: dp4 o0.w, r0.xyzw, cb2[1].xyzw
5: mad r1.xyz, v2.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), l(-1.000000, -1.000000, -1.000000, 0.000000)
6: dp3 r2.x, r1.xyzx, cb2[0].xyzx
7: dp3 r2.y, r1.xyzx, cb2[1].xyzx
8: dp3 r2.z, r1.xyzx, cb2[2].xyzx
9: dp3 r1.x, r2.xyzx, r2.xyzx
10: rsq r1.x, r1.x
11: mul o1.xyz, r1.xxxx, r2.xyzx
12: dp4 o1.w, r0.xyzw, cb2[2].xyzw
```

The input from the vertex buffer is:

1) Position in the local space [0-1] - v0.xyz,

2) Texcoords - v1.xy,

3) The normal vector [0-1] - v2.xyz

Incoming data from cbuffer:

1) The world matrix (0-3) - the classic approach: uniform scaling and translation by the camera position,

2) The scale and offset for the vertex (4-5) - a trick used throughout the game to convert from local space [0-1] to the [-1; 1] range, and for potential “flattening” of meshes.

And here is a brief description of what is happening in the shader:

The shader begins by simply passing the texcoords through (line 0). The scale and the offset are applied to the local vertex position (line 1), and the result is multiplied by the world matrix (lines 3-4, 12). The normal vector has to be converted from the interval [0-1] to [-1; 1] (line 5); it is then multiplied by the world matrix (lines 6-8) and normalized at the end (lines 9-11).
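To make the data flow of the steps above explicit, here is a minimal Python sketch of them. It is not the game's code: the function name `sky_dome_vertex`, the way the matrix rows are passed in and all the sample values are my assumptions, chosen only for illustration.

```python
import math

def sky_dome_vertex(pos01, normal01, scale, offset, world):
    """Mimic lines 1-12 of the vertex shader.

    `world` holds the first three rows of the world matrix (cb2[0..2]),
    each row given as (x, y, z, w).
    """
    # Lines 1-2: apply scale and offset to the local position, set w = 1
    p = [pos01[i] * scale[i] + offset[i] for i in range(3)] + [1.0]
    # Lines 3-4 and 12: one dp4 per matrix row gives the world position
    world_pos = [sum(row[i] * p[i] for i in range(4)) for row in world]
    # Line 5: decode the normal from [0-1] to [-1; 1]
    n = [2.0 * c - 1.0 for c in normal01]
    # Lines 6-8: transform by the upper 3x3 part of the matrix
    wn = [sum(row[i] * n[i] for i in range(3)) for row in world]
    # Lines 9-11: normalize
    inv_len = 1.0 / math.sqrt(sum(c * c for c in wn))
    return world_pos, [c * inv_len for c in wn]

# Hypothetical inputs: uniform scale of 100, camera at (10, 20, 30)
world = [(100.0, 0.0, 0.0, 10.0),
         (0.0, 100.0, 0.0, 20.0),
         (0.0, 0.0, 100.0, 30.0)]
pos, nrm = sky_dome_vertex((1.0, 0.5, 0.5), (0.0, 0.5, 0.5),
                           (2.0, 2.0, 2.0), (-1.0, -1.0, -1.0), world)
print(pos, nrm)
```

With these made-up values, the vertex ends up 100 units from the camera along X and its normal points back towards the dome center, which matches the picture of the normals above.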

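The finished output has the following layout:

1) Texcoords - o0.xy,

2) World-space position XY - o0.zw,

3) Normalized world-space normal - o1.xyz,

4) World-space position Z - o1.w,

5) SV_Position - o2.xyzw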

### 3. Pixel Shader

The Milky Way calculations are just one part of the sky shader. In Blood and Wine it is much longer than in the 2015 version: 385 lines of assembly versus 267 lines in 2015.

Let's look at the assembly fragment responsible for the Milky Way:

```
175: sample_indexable(texturecube)(float,float,float,float) r4.xyz, r2.xyzx, t0.xyzw, s0
176: mul r4.xyz, r4.xyzx, r4.xyzx
177: sample_indexable(texturecube)(float,float,float,float) r0.w, r2.xyzx, t1.yzwx, s0
178: dp3 r1.w, v1.xyzx, v1.xyzx
179: rsq r1.w, r1.w
180: mul r2.xyz, r1.wwww, v1.xyzx
181: dp3 r1.w, cb12[204].yzwy, cb12[204].yzwy
182: rsq r1.w, r1.w
183: mul r5.xyz, r1.wwww, cb12[204].yzwy
184: dp3 r1.w, r2.xyzx, r5.xyzx
185: mad_sat r0.w, r0.w, l(0.200000), r1.w
186: ge r1.w, l(0.497925), r0.w
187: if_nz r1.w
188:   ge r1.w, l(0.184939), r0.w
189:   mul r2.y, r0.w, l(5.407188)
190:   min r2.z, r2.y, l(1.000000)
191:   mad r2.w, r2.z, l(-2.000000), l(3.000000)
192:   mul r2.z, r2.z, r2.z
193:   mul r2.z, r2.z, r2.w
194:   mul r5.xyz, r2.zzzz, l(0.949254, 0.949254, 0.949254, 0.000000)
195:   mov r2.x, l(0.949254)
196:   movc r2.xw, r1.wwww, r2.xxxy, l(0.000000, 0.000000, 0.000000, 0.500000)
197:   not r4.w, r1.w
198:   if_z r1.w
199:     ge r1.w, l(0.239752), r0.w
200:     add r5.w, r0.w, l(-0.184939)
201:     mul r6.y, r5.w, l(18.243849)
202:     mov_sat r5.w, r6.y
203:     mad r6.z, r5.w, l(-2.000000), l(3.000000)
204:     mul r5.w, r5.w, r5.w
205:     mul r5.w, r5.w, r6.z
206:     mad r5.w, r5.w, l(-0.113726), l(0.949254)
207:     movc r5.xyz, r1.wwww, r5.wwww, r5.zzzz
208:     and r7.xyz, r1.wwww, l(0.949254, 0.949254, 0.949254, 0.000000)
209:     mov r6.x, l(0.835528)
210:     movc r2.xw, r1.wwww, r6.xxxy, r2.xxxw
211:     mov r2.xyzw, r2.xxxw
212:   else
213:     mov r7.xyz, l(0, 0, 0, 0)
214:     mov r2.xyzw, r2.xxxw
215:     mov r1.w, l(-1)
216:   endif
217:   not r5.w, r1.w
218:   and r4.w, r4.w, r5.w
219:   if_nz r4.w
220:     ge r5.w, r0.w, l(0.239752)
221:     ge r6.x, l(0.294564), r0.w
222:     and r1.w, r5.w, r6.x
223:     add r5.w, r0.w, l(-0.239752)
224:     mul r6.w, r5.w, l(18.244175)
225:     mov_sat r5.w, r6.w
226:     mad r7.w, r5.w, l(-2.000000), l(3.000000)
227:     mul r5.w, r5.w, r5.w
228:     mul r5.w, r5.w, r7.w
229:     mad r5.w, r5.w, l(0.015873), l(0.835528)
230:     movc r5.xyz, r1.wwww, r5.wwww, r5.xyzx
231:     movc r7.xyz, r1.wwww, l(0.835528, 0.835528, 0.835528, 0.000000), r7.xyzx
232:     mov r6.xyz, l(0.851401, 0.851401, 0.851401, 0.000000)
233:     movc r2.xyzw, r1.wwww, r6.xyzw, r2.xyzw
234:   endif
235:   not r5.w, r1.w
236:   and r4.w, r4.w, r5.w
237:   if_nz r4.w
238:     ge r1.w, r0.w, l(0.294564)
239:     add r0.w, r0.w, l(-0.294564)
240:     mul r6.w, r0.w, l(4.917364)
241:     mov_sat r0.w, r6.w
242:     mad r4.w, r0.w, l(-2.000000), l(3.000000)
243:     mul r0.w, r0.w, r0.w
244:     mul r0.w, r0.w, r4.w
245:     mad r0.w, r0.w, l(-0.851401), l(0.851401)
246:     movc r5.xyz, r1.wwww, r0.wwww, r5.xyzx
247:     movc r7.xyz, r1.wwww, l(0.851401, 0.851401, 0.851401, 0.000000), r7.xyzx
248:     mov r6.xyz, l(0, 0, 0, 0)
249:     movc r2.xyzw, r1.wwww, r6.xyzw, r2.xyzw
250:   endif
251: else
252:   mov r7.xyz, l(0, 0, 0, 0)
253:   mov r2.xyzw, l(0.000000, 0.000000, 0.000000, 0.500000)
254:   mov r1.w, l(0)
255: endif
256: mov_sat r2.w, r2.w
257: mad r0.w, r2.w, l(-2.000000), l(3.000000)
258: mul r2.w, r2.w, r2.w
259: mul r0.w, r0.w, r2.w
260: add r2.xyz, -r7.xyzx, r2.xyzx
261: mad r2.xyz, r0.wwww, r2.xyzx, r7.xyzx
262: movc r2.xyz, r1.wwww, r5.xyzx, r2.xyzx
263: mul r2.xyz, r2.xyzx, l(0.150000, 0.200000, 0.250000, 0.000000)
```

Pretty scary, isn't it? When I first saw it (this was before I had looked at the shooting stars shader), I thought: “What the hell is this? This code cannot be reversed!”

But there is a catch: if you have read the post about shooting stars, you can easily recognize the pattern. The code works very much like the shooting stars rendering! We will get to the curve soon.

The fragment starts by sampling the stars cubemap (line 175); the sampling direction is stored in r2.xyz. As you can see, line 177 samples another cubemap. Unlike the 2015 shader, the Blood and Wine shader has one more texture, a “noise cubemap”, whose faces look something like this:

Before we get to the curve, let's find its input. First, the dot product (line 184) is calculated between the normalized normal vector of the sky dome (lines 178-180) and the moon light vector (lines 181-183) - in fact, this is N·L.

Here's a visualization of the dot product (in linear space):

The value used as the input of the Milky Way curve function is obtained on line 185:

*x = saturate(noise * 0.2 + NdotL);*
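As a quick sanity check of line 185, here is the same expression as a tiny Python sketch. The names `saturate` and `milkyway_input` and the sample values are mine; `noise` stands for the value read from the noise cubemap and `n_dot_l` for the dot product from line 184.

```python
def saturate(v):
    """HLSL-style saturate: clamp to [0, 1]."""
    return max(0.0, min(1.0, v))

def milkyway_input(noise, n_dot_l):
    # Line 185: mad_sat r0.w, r0.w, l(0.2), r1.w
    return saturate(noise * 0.2 + n_dot_l)

print(milkyway_input(0.5, 0.3))    # noise shifts N.L by at most 0.2
print(milkyway_input(1.0, 0.95))   # clamped at 1.0
print(milkyway_input(0.0, -0.5))   # clamped at 0.0
```

The noise can therefore only move the input by up to 0.2, which is enough to break up the otherwise perfectly smooth N·L gradient.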

And here is a visualization of this perturbed N·L, also in linear space:

Now let's move on to the Milky Way function! It is a little more complicated than the function of shooting stars. As I said in a previous post, we start with a list of control points along the x axis. Having looked at the assembler code, we will immediately see them:

```hlsl
// Control points (x-axis)
float controlPoint0 = 0.0;
float controlPoint1 = 0.184939;
float controlPoint2 = 0.239752;
float controlPoint3 = 0.294564;
float controlPoint4 = 0.497925;
```

How do we know that the first control point is zero? It's pretty simple: line 189 is not preceded by an add instruction.

According to the post about shooting stars, control points determine the number of segments, and then we need to find weights for them.

For the first segment it is quite simple. Weight is 0.949254:

```
194: mul r5.xyz, r2.zzzz, l(0.949254, 0.949254, 0.949254, 0.000000)
195: mov r2.x, l(0.949254)
```

Let's try to find them for the second and third segments:

```
206: mad r5.w, r5.w, l(-0.113726), l(0.949254)
207: movc r5.xyz, r1.wwww, r5.wwww, r5.zzzz
208: and r7.xyz, r1.wwww, l(0.949254, 0.949254, 0.949254, 0.000000)
209: mov r6.x, l(0.835528)
...
229: mad r5.w, r5.w, l(0.015873), l(0.835528)
230: movc r5.xyz, r1.wwww, r5.wwww, r5.xyzx
231: movc r7.xyz, r1.wwww, l(0.835528, 0.835528, 0.835528, 0.000000), r7.xyzx
232: mov r6.xyz, l(0.851401, 0.851401, 0.851401, 0.000000)
```

It was at this point that I stopped writing the article because something was wrong here (one of the moments when you think “hmm”). Look, everything is not as easy as a simple multiplication by one weight. Also, where did the values like -0.113726 and 0.015873 come from?

Then I realized that these values are simply the differences between the maximum possible values in each segment (0.835528 - 0.949254=-0.113726 and 0.851401 - 0.835528=0.015873)! Pretty obvious (one of the moments when you think “eureka!”). As it turned out, these values are not weights, but simply the y coordinates of the points that form the curve!
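To convince myself, the second segment can be evaluated in both ways: the way the assembly does it (lines 200-206) and as a plain interpolation between two y values. The following is a small Python check; the function names are hypothetical and the constants are copied from the listing.

```python
def smoothstep01(t):
    """smoothstep(0, 1, t): clamp, then 3t^2 - 2t^3."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def segment2_assembly(x):
    # Lines 200-206: subtract the control point, scale, smoothstep, then mad
    s = smoothstep01((x - 0.184939) * 18.243849)
    return s * -0.113726 + 0.949254

def segment2_lerp(x):
    # Same segment written as an interpolation between two y coordinates
    t = smoothstep01((x - 0.184939) / (0.239752 - 0.184939))
    return 0.949254 + (0.835528 - 0.949254) * t

for x in (0.19, 0.20, 0.21, 0.22, 0.23):
    print(x, segment2_assembly(x), segment2_lerp(x))
```

Both columns agree up to the rounding of the constants in the disassembly, which supports reading -0.113726 as "next y minus previous y" rather than as a weight.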

This changes and simplifies a lot. First, we can get rid of the weight in the function used in the previous post:

```hlsl
float getSmoothTransition(float cpLeft, float cpRight, float x)
{
    return smoothstep( 0, 1, linstep(cpLeft, cpRight, x) );
}
```

And we can write the Milky Way function as follows:

```hlsl
float milkyway_curve( float x )
{
    // Define a set of 2D points which form the curve
    // Of course, you can use a Point2D-like struct here

    // Control points (x-axis)
    float controlPoint0 = 0.0;
    float controlPoint1 = 0.184939;
    float controlPoint2 = 0.239752;
    float controlPoint3 = 0.294564;
    float controlPoint4 = 0.497925;

    // Values at points (y-axis)
    float value0 = 0.0;
    float value1 = 0.949254;
    float value2 = 0.835528;
    float value3 = 0.851401;
    float value4 = 0.0;

    float function_value = 0.0;

    [branch] if (x <= controlPoint4)
    {
        [branch] if (x <= controlPoint1)
        {
            float t = getSmoothTransition(controlPoint0, controlPoint1, x);
            function_value = lerp(value0, value1, t);
        }

        [branch] if (x >= controlPoint1 && x <= controlPoint2)
        {
            float t = getSmoothTransition(controlPoint1, controlPoint2, x);
            function_value = lerp(value1, value2, t);
        }

        [branch] if (x >= controlPoint2 && x <= controlPoint3)
        {
            float t = getSmoothTransition(controlPoint2, controlPoint3, x);
            function_value = lerp(value2, value3, t);
        }

        [branch] if (x >= controlPoint3)
        {
            float t = getSmoothTransition(controlPoint3, controlPoint4, x);
            function_value = lerp(value3, value4, t);
        }
    }

    return function_value;
}
```

This is a generalized solution for any number of points forming a smooth curve. In addition, it explains the origin of the “strange” values of control points - probably the developers used some kind of visual editor to set the points.
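For anyone who wants to plot or test the curve outside of a shader, here is a rough Python port written as a loop over the points. It assumes that `linstep` from the previous post is a clamped linear remap; the names `POINTS`, `linstep` and `smooth_transition` are mine.

```python
# (x, y) pairs taken from the control points and values listed above
POINTS = [(0.0, 0.0), (0.184939, 0.949254), (0.239752, 0.835528),
          (0.294564, 0.851401), (0.497925, 0.0)]

def linstep(lo, hi, x):
    # Assumed definition: linear remap of [lo, hi] to [0, 1], clamped
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))

def smooth_transition(cp_left, cp_right, x):
    t = linstep(cp_left, cp_right, x)
    return t * t * (3.0 - 2.0 * t)      # smoothstep(0, 1, t)

def milkyway_curve(x):
    value = 0.0
    if x > POINTS[-1][0]:
        return value                     # past the last control point
    for (x0, y0), (x1, y1) in zip(POINTS, POINTS[1:]):
        if x0 <= x <= x1:
            t = smooth_transition(x0, x1, x)
            value = y0 + (y1 - y0) * t   # lerp(y0, y1, t)
    return value

for px, py in POINTS:
    print(px, py, milkyway_curve(px))
```

At every control point the function returns the corresponding y value, and adding another point only means appending one more pair to the list.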

Of course, the same principle applies to the shooting star code.

Here is the function graph:

*Milky Way function graph.*

Red - function value,

Green - x coordinates

Blue - y coordinates

Yellow dots - control points

OK, but what next? On line 263, we multiply the value from the function by a bluish color:

`263: mul r2.xyz, r2.xyzx, l(0.150000, 0.200000, 0.250000, 0.000000) `

But this is not the end! We just need to perform gamma correction:

```
263: mul r2.xyz, r2.xyzx, l(0.150000, 0.200000, 0.250000, 0.000000)
264: mad r2.xyz, r4.xyzx, l(3.000000, 3.000000, 3.000000, 0.000000), r2.xyzx
...
269: log r2.xyz, r2.xyzx
270: mul r2.xyz, r2.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
271: exp r2.xyz, r2.xyzx
```

Now an interesting thing: I assigned different colors to the control points along the x axis:

```hlsl
float3 gradient0 = float3(1, 0, 0);
float3 gradient1 = float3(0, 1, 0);
float3 gradient2 = float3(0, 0, 1);
float3 gradient3 = float3(1, 1, 0);
float3 gradient4 = float3(0, 1, 1);
```

And here's what I got:

And with that, the Milky Way is practically done.

Line 264 has r4.xyz and...

### 4. Toussaint Stars (Bonus)

I know that this part of the article is called “The Milky Way”, but I could not resist briefly describing how the stars of Toussaint are created. They are much brighter than in Novigrad, Skellige or Velen.

In one of the previous posts I talked about the stars of 2015; it's time to talk about the stars of 2016!

In fact, the bulk of the assembly code looks like this:

```
175: sample_indexable(texturecube)(float,float,float,float) r4.xyz, r2.xyzx, t0.xyzw, s0
176: mul r4.xyz, r4.xyzx, r4.xyzx
...
264: mad r2.xyz, r4.xyzx, l(3.000000, 3.000000, 3.000000, 0.000000), r2.xyzx
...
269: log r2.xyz, r2.xyzx
270: mul r2.xyz, r2.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
271: exp r2.xyz, r2.xyzx
...
302: add r0.z, -cb0[9].w, l(1.000000)
303: mul r2.xyz, r0.zzzz, r2.xyzx
304: add r2.xyz, r2.xyzx, r2.xyzx
```

In HLSL it can be written like this:

```hlsl
float3 stars = texStars.Sample(samplerStars, starsDir).rgb;
stars *= stars;

float3 milkyway = milkyway_curve(noisePerturbed) * float3(0.15, 0.20, 0.25);
float3 skyContribution = milkyway + 3.0 * stars;

// gamma correction
skyContribution = pow(skyContribution, 2.2);

// starsOpacity - 0.0 during the day (so stars and the Milky Way are not visible then), 1.0 during the night
float starsOpacity = 1.0 - cb0_v9.w;
skyContribution *= starsOpacity;
skyContribution *= 2;
```

That is, the stars themselves are simply multiplied by 3 (line 264), and then, together with the Milky Way contribution, by 2 (line 304). It is an old-school method, but it works fine!
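The same chain of operations can be tried out on plain numbers. This is only a sketch of the lines quoted above (the elided assembly lines in between are ignored), and the name `sky_contribution` as well as the parameter `day_factor`, standing in for cb0[9].w, are my own.

```python
def sky_contribution(stars_rgb, milkyway_value, day_factor):
    """Combine stars and the Milky Way as in lines 176, 263-264, 269-271, 302-304."""
    tint = (0.15, 0.20, 0.25)                     # bluish Milky Way color
    result = []
    for star, tint_channel in zip(stars_rgb, tint):
        v = milkyway_value * tint_channel + 3.0 * (star * star)
        v = v ** 2.2                              # log / mul 2.2 / exp
        v *= 1.0 - day_factor                     # starsOpacity
        result.append(v + v)                      # line 304: multiply by 2
    return result

print(sky_contribution((0.5, 0.5, 0.5), 0.9, 0.0))   # night
print(sky_contribution((0.5, 0.5, 0.5), 0.9, 1.0))   # day: everything is zero
```

During the day the whole contribution is multiplied by zero, so neither the stars nor the Milky Way are visible.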

Of course, something else happens later (for example, flickering stars with integer noise, etc.), but this does not apply to the topic of the article.

### Conclusion

In this part, I figured out how “The Witcher 3: Blood and Wine” implements the Milky Way and the stars.

Let's replace the original shader with the code we just wrote. The finished frame looks like this:

and with the original shader, the frame looks like this:

Not bad.

## Part 2: Color grading

One of the post-processing effects that can be found almost everywhere in The Witcher 3 is color grading. It works by using a lookup table (LUT) texture to convert one set of colors to another.

The usual workflow looks like this: a neutral lookup table (output color = input color) is edited in a tool like Adobe Photoshop, adjusting its contrast, brightness, saturation, hue and so on, that is, all the modifications that would be quite expensive to compute in real time. Thanks to LUTs, these operations can be replaced with a cheaper texture lookup.

I know of at least three kinds of color LUTs: three-dimensional, “long” two-dimensional, and “square” two-dimensional.

*Neutral "long" two-dimensional LUT*

*Neutral "square" two-dimensional LUT*

Before we get to the implementation of color grading in The Witcher 3, here are some useful links on this technique:

Good OpenGL implementation with online demo

Color Correction

Metal Gear Solid V graphics study (a good article as a whole, with a section on color grading)

Color correction using lookup textures (LUT)

Topic from the gamedev.net forum

Article from the book “GPU Gems 2” - color correction using 3D textures

UE4 documentation on creating and using color LUTs

Let's take a look at an example LUT used near the beginning of the game, in White Orchard - most of the greens have been replaced by yellows:

The Witcher 3 uses two-dimensional 512x512 textures.

In general, color correction is expected to be performed in LDR space. That gives 256³ possible input values (more than 16 million combinations), which are mapped onto just 512² = 262,144 values. To cover the entire range of input values, bilinear sampling is used.
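The arithmetic behind these numbers is easy to verify. In the sketch below the 8x8 grid of slices is an assumption based on the shader code discussed later in this part.

```python
# Sizes quoted in the text
input_colors = 256 ** 3            # every 8-bit RGB triple
lut_texels = 512 ** 2              # texels in one 512x512 LUT

# Assumed layout: an 8x8 grid of slices, one slice per blue level
slices_per_side = 8
slice_size = 512 // slices_per_side
blue_levels = slices_per_side ** 2

print(input_colors, lut_texels, input_colors // lut_texels)
print(slice_size, blue_levels)
```

So every texel of the LUT has to stand in for 64 input colors, and each of the 64 slices stores red and green at a resolution of 64x64.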

And here are screenshots for comparison: before and after the passage of color correction.

As you can see, the difference is small, but noticeable - the sky has a more orange tint.

As for the implementation in The Witcher 3, both the input and output render targets are full-screen floating-point textures (R11G11B10). It is curious that specifically in this scene, the channels of the brightest pixels (near the Sun) have values exceeding 1.0f - even up to almost 2.0f!

Here is the assembly code of the pixel shader:

```
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb3[2], immediateIndexed
dcl_sampler s0, mode_default
dcl_sampler s1, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_input_ps linear v1.xy
dcl_output o0.xyzw
dcl_temps 5
0: max r0.xy, v1.xyxx, cb3[0].xyxx
1: min r0.xy, r0.xyxx, cb3[0].zwzz
2: sample_indexable(texture2d)(float,float,float,float) r0.xyzw, r0.xyxx, t0.xyzw, s0
3: log r1.xyz, abs(r0.xyzx)
4: mul r1.xyz, r1.xyzx, l(0.454545, 0.454545, 0.454545, 0.000000)
5: exp r1.xyz, r1.xyzx
6: mad r2.xyz, r1.xyzx, l(1.000000, 1.000000, 0.996094, 0.000000), l(0.000000, 0.000000, 0.015625, 0.000000)
7: min r2.xyz, r2.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
8: min r2.z, r2.z, l(0.999990)
9: add r2.xy, r2.xyxx, l(0.007813, 0.007813, 0.000000, 0.000000)
10: mul r2.xyzw, r2.xyzz, l(0.996094, 0.996094, 64.000000, 8.000000)
11: max r2.xy, r2.xyxx, l(0.015625, 0.015625, 0.000000, 0.000000)
12: min r2.xy, r2.xyxx, l(0.984375, 0.984375, 0.000000, 0.000000)
13: round_ni r3.xz, r2.wwww
14: mad r2.z, -r3.x, l(8.000000), r2.z
15: round_ni r3.y, r2.z
16: mul r2.zw, r3.yyyz, l(0.000000, 0.000000, 0.125000, 0.125000)
17: mad r2.xy, r2.xyxx, l(0.125000, 0.125000, 0.000000, 0.000000), r2.zwzz
18: sample_l(texture2d)(float,float,float,float) r2.xyz, r2.xyxx, t1.xyzw, s1, l(0)
19: mul r2.w, r1.z, l(63.750000)
20: round_ni r2.w, r2.w
21: mul r1.w, r2.w, l(0.015625)
22: mad r1.z, r1.z, l(63.750000), -r2.w
23: min r1.xyw, r1.xyxw, l(1.000000, 1.000000, 0.000000, 1.000000)
24: min r1.w, r1.w, l(0.999990)
25: add r1.xy, r1.xyxx, l(0.007813, 0.007813, 0.000000, 0.000000)
26: mul r1.xy, r1.xyxx, l(0.996094, 0.996094, 0.000000, 0.000000)
27: max r1.xy, r1.xyxx, l(0.015625, 0.015625, 0.000000, 0.000000)
28: min r1.xy, r1.xyxx, l(0.984375, 0.984375, 0.000000, 0.000000)
29: mul r3.xy, r1.wwww, l(64.000000, 8.000000, 0.000000, 0.000000)
30: round_ni r4.xz, r3.yyyy
31: mad r1.w, -r4.x, l(8.000000), r3.x
32: round_ni r4.y, r1.w
33: mul r3.xy, r4.yzyy, l(0.125000, 0.125000, 0.000000, 0.000000)
34: mad r1.xy, r1.xyxx, l(0.125000, 0.125000, 0.000000, 0.000000), r3.xyxx
35: sample_l(texture2d)(float,float,float,float) r1.xyw, r1.xyxx, t1.xywz, s1, l(0)
36: add r2.xyz, -r1.xywx, r2.xyzx
37: mad r1.xyz, r1.zzzz, r2.xyzx, r1.xywx
38: log r1.xyz, abs(r1.xyzx)
39: mul r1.xyz, r1.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
40: exp r1.xyz, r1.xyzx
41: mad r1.xyz, cb3[1].zzzz, r1.xyzx, -r0.xyzx
42: mad o0.xyz, cb3[1].yyyy, r1.xyzx, r0.xyzx
43: mov o0.w, r0.w
44: ret
```

In general, the developers of The Witcher 3 did not reinvent the wheel and used a lot of well-proven code. This makes sense, because this is one of those effects where you need to be extremely careful with texture coordinates.

Two LUT lookups are required, however. This is a consequence of using a 2D texture: the interpolation along the blue channel has to be simulated by hand. In the OpenGL implementation linked above, the blend between the two lookups depends on the fractional part of the blue channel.

What I found interesting is that the assembly code contains no ceil (*round_pi*) or frac (*frc*) instructions. It does, however, contain quite a few floor (*round_ni*) instructions.

The shader starts by sampling the input color texture and converting the color to gamma space:

```hlsl
float3 LinearToGamma(float3 c) { return pow(c, 1.0/2.2); }
float3 GammaToLinear(float3 c) { return pow(c, 2.2); }

...

// Set range of allowed texcoords
float2 minAllowedUV = cb3_v0.xy;
float2 maxAllowedUV = cb3_v0.zw;
float2 samplingUV = clamp( Input.Texcoords, minAllowedUV, maxAllowedUV );

// Get color in *linear* space
float4 inputColorLinear = texture0.Sample( samplerPointClamp, samplingUV );

// Calculate color in *gamma* space for RGB
float3 inputColorGamma = LinearToGamma( inputColorLinear.rgb );
```

Valid sampling coordinates min and max are taken from cbuffer:

This particular frame was captured in 1920x1080 resolution - the max values are: (1919/1920, 1079/1080)

It's pretty easy to notice that the assembly code contains two very similar blocks, each followed by a LUT lookup. So I created a helper function that calculates the UV for the LUT. Let's first take a look at the corresponding assembly code:

```
7: min r2.xyz, r2.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
8: min r2.z, r2.z, l(0.999990)
9: add r2.xy, r2.xyxx, l(0.007813, 0.007813, 0.000000, 0.000000)
10: mul r2.xyzw, r2.xyzz, l(0.996094, 0.996094, 64.000000, 8.000000)
11: max r2.xy, r2.xyxx, l(0.015625, 0.015625, 0.000000, 0.000000)
12: min r2.xy, r2.xyxx, l(0.984375, 0.984375, 0.000000, 0.000000)
13: round_ni r3.xz, r2.wwww
14: mad r2.z, -r3.x, l(8.000000), r2.z
15: round_ni r3.y, r2.z
16: mul r2.zw, r3.yyyz, l(0.000000, 0.000000, 0.125000, 0.125000)
17: mad r2.xy, r2.xyxx, l(0.125000, 0.125000, 0.000000, 0.000000), r2.zwzz
18: sample_l(texture2d)(float,float,float,float) r2.xyz, r2.xyxx, t1.xyzw, s1, l(0)
```

Here r2.xyz is the input color.

First, the input is limited so that it does not exceed 1.0 (line 7). This matters for pixels with values > 1.0, like the aforementioned pixels near the Sun.

Next, the blue channel is clamped to at most 0.99999 (line 8) so that *floor(color.b * 8)* returns a value in the interval [0-7].

To calculate the LUT coordinates, the shader first “squeezes” the red and green channels into the upper-left segment. The blue channel [0-1] is cut into 64 fragments that correspond to the 64 segments of the lookup texture. Based on the current value of the blue channel, the corresponding segment is selected and the offset for it is calculated.

**Example**

Let's, for example, choose (0.75, 0.5, 1.0). Red and green channels are converted to the upper left segment, which gives us:

*float2 rgOffset=(0.75, 0.5)/8=(0.09375, 0.0625)*

Next, we check in which of the 64 segments the blue value (1.0) is located. Of course, in our case this is the last segment - 64.

The offset is expressed in segments (rowOffset, columnOffset):

*float blue_rowOffset=7.0;*

*float blue_columnOffset=7.0;*

*float2 blueOffset=float2 (blue_rowOffset, blue_columnOffset)/8.0=(0.875, 0.875)*

In the end, we just sum the offsets:

*float2 finalUV=rgOffset + blueOffset;*

*finalUV=(0.09375, 0.0625) + (0.875, 0.875)=(0.96875, 0.9375)*
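The example can be reproduced with a few lines of Python. This is the simplified mapping only, without the half-pixel offset, scaling and clamping that the real shader adds; the function name `simplified_lut_uv` is hypothetical.

```python
import math

def simplified_lut_uv(r, g, b):
    """Simplified mapping from an RGB color to LUT coordinates."""
    # Red/green squeezed into the upper-left segment
    rg_offset = (r / 8.0, g / 8.0)
    # Blue selects one of the 64 segments (b = 1.0 lands in the last one)
    segment = min(math.floor(b * 64.0), 63)
    row, column = divmod(segment, 8)
    blue_offset = (column / 8.0, row / 8.0)
    return (rg_offset[0] + blue_offset[0], rg_offset[1] + blue_offset[1])

print(simplified_lut_uv(0.75, 0.5, 1.0))   # (0.96875, 0.9375)
```

All the values involved are exactly representable in binary floating point, so the result matches the hand calculation exactly.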

It was just a short example. Now let's look at the implementation details.

For the red and green channels (r2.xy), line 9 adds an offset of half a pixel (0.5/64). Then they are multiplied by 0.996094 (line 10) and clamped to a special interval (lines 11-12).

The need for a half-pixel offset is pretty obvious - we want to sample from the center of a pixel. A much more mysterious aspect is the scaling factor from line 10 - it is 63.75/64.0. More on that soon.

At the end, the coordinates are clamped to the interval [1/64 - 63/64].

Why do we need this? I don’t know for sure, but it seems to be done so that bilinear sampling never reads texels outside the segment.

Here is an example image in the form of a 6x6 segment demonstrating how this clamp operation works:

Here is a scene without using clamp - notice a rather serious discoloration around the Sun:

For simplicity of comparison, I will show once again the result of the game:

Here is the code snippet for this part:

```hlsl
// * Calculate red/green offset

// half-pixel offset to always sample within centre of a pixel
const float halfOffset = 0.5 / 64.0;
const float scale = 63.75 / 64.0;

float2 rgOffset;
rgOffset = halfOffset + color.rg;
rgOffset *= scale;

rgOffset.xy = clamp(rgOffset.xy, float2(1.0/64.0, 1.0/64.0), float2(63.0/64.0, 63.0/64.0) );

// place within the top left slice
rgOffset.xy /= 8.0;
```

Now it's time to find the offset for the blue channel.

To find the row offset, the blue channel is divided into 8 parts, each of which covers exactly one row of the lookup texture.

```hlsl
// rows
bOffset.y = floor(color.b * 8);
```

To find the column offset, the resulting range must be divided again into 8 smaller parts, one for each of the 8 segments in the row. The shader equation is pretty confusing:

```hlsl
// columns
bOffset.x = floor(color.b * 64 - 8*bOffset.y );
```

At this stage, it is worth noting that:

*frac (x)=x - floor (x)*

Therefore, the equation can be rewritten as follows:

`bOffset.x=floor(8 * frac(color.b * 8) ); `
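A brute-force check of this rewrite takes only a few lines of Python. The function names are mine; the test simply compares both forms over a thousand blue values.

```python
import math

def column_shader(b):
    # Form used by the shader: floor(b * 64 - 8 * floor(b * 8))
    row = math.floor(b * 8)
    return math.floor(b * 64 - 8 * row)

def column_frac(b):
    # Rewritten form: floor(8 * frac(b * 8)), with frac(x) = x - floor(x)
    scaled = b * 8
    return math.floor(8 * (scaled - math.floor(scaled)))

mismatches = [i for i in range(1000)
              if column_shader(i / 1000) != column_frac(i / 1000)]
print(mismatches)   # []
```

Since multiplying by 8 or 64 only changes the exponent of a float, both forms agree exactly and not merely approximately.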

And here is the code snippet for this:

```hlsl
// * Calculate blue offset
float2 bOffset;

// rows
bOffset.y = floor(color.b * 8);

// columns
bOffset.x = floor(color.b * 64 - 8*bOffset.y );
// or:
// bOffset.x = floor(8 * frac(color.b * 8) );

// at this moment bOffset stores values in [0-7] range, we have to divide it by 8.0.
bOffset /= 8.0;

float2 lutPos = rgOffset + bOffset;
return lutPos;
```

Thus, we got a function that returns the coordinates of the texture for sampling the LUT texture. Let's call this function “getUV.”

`float2 getUV(in float3 color) { ... } `
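Putting the pieces together, the whole helper can be sketched in Python as follows. This is my reading of assembly lines 7-17, not the original source: `get_uv` and `clamp` are hypothetical names, and the two `min` operations from lines 7-8 are included at the top.

```python
import math

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def get_uv(r, g, b):
    """Sketch of getUV: map a gamma-space color to 2D LUT coordinates."""
    # Lines 7-8: limit the input, keep blue just below 1.0
    r, g, b = min(r, 1.0), min(g, 1.0), min(b, 0.99999)

    # Lines 9-12: half-pixel offset, 63.75/64 scale, clamp inside the segment
    half_offset = 0.5 / 64.0
    scale = 63.75 / 64.0
    u = clamp((r + half_offset) * scale, 1.0 / 64.0, 63.0 / 64.0) / 8.0
    v = clamp((g + half_offset) * scale, 1.0 / 64.0, 63.0 / 64.0) / 8.0

    # Lines 13-16: row and column of the segment selected by blue
    row = math.floor(b * 8)
    column = math.floor(b * 64 - 8 * row)

    # Line 17: add the segment offset
    return (u + column / 8.0, v + row / 8.0)

print(get_uv(0.75, 0.5, 1.0))
print(get_uv(2.0, 2.0, 2.0) == get_uv(1.0, 1.0, 1.0))   # True
```

For the color from the earlier example the result is close to, but not equal to, (0.96875, 0.9375): the difference comes from the half-pixel offset and the 63.75/64 scale. Over-bright inputs such as (2.0, 2.0, 2.0) are mapped to the same place as pure white.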

Now back to the main function of the shader. As mentioned above, due to the use of a two-dimensional LUT to simulate a bilinear sampling of the blue channel, two requests to the LUT are needed (from two segments adjacent to each other).

Consider the following HLSL snippet:

```hlsl
// Part 1
float scale_1 = 63.75 / 64.0;
float offset_1 = 1.0 / 64.0;   // 0.015625
float3 inputColor1 = inputColorGamma;
inputColor1.b = inputColor1.b * scale_1 + offset_1;

float2 uv1 = getUV(inputColor1);
float3 color1 = texLUT.SampleLevel( sampler1, uv1, 0 ).rgb;

// Part 2
float3 inputColor2 = inputColorGamma;
inputColor2.b = floor(inputColorGamma.b * 63.75) / 64;

float2 uv2 = getUV(inputColor2);
float3 color2 = texLUT.SampleLevel( sampler1, uv2, 0 ).rgb;

// frac(x) = x - floor(x);
// float blueInterp = inputColorGamma.b*63.75 - floor(inputColorGamma.b * 63.75);
float blueInterp = frac(inputColorGamma.b * 63.75);

// Final LUT-corrected color
const float lutCorrectedMult = cb3_v1.z;
float3 finalLUT = lerp(color2, color1, blueInterp);
finalLUT = lutCorrectedMult * GammaToLinear(finalLUT);
```

The principle is to fetch colors from two adjacent segments and then interpolate between them - the amount of interpolation depends on the fractional part of the incoming blue.

Part 1 gets the color from the “far” segment due to an explicitly set blue offset (+ 1.0/64); Part 2 gets the color from the “near” segment.

The result of the interpolation is stored in the variable "finalLUT". Notice that after this the result is converted back to linear space and multiplied by *lutCorrectedMult*. In this particular frame, its value is 1.00916. This allows the brightness of the LUT color to be adjusted.

Obviously, the most intriguing part is 63.75 and 63.75/64. I don’t quite understand where they come from. The only explanation I found: 63.75/64.0=510.0/512.0. As mentioned above, there is a clamp for the .rg channels, so once the blue offset is added, the outermost rows and columns of the LUT are never used directly. I think that the colors are explicitly “squeezed” into the central 510x510 area of the lookup texture.

Let's say that *inputColorGamma.b*=0.75/64.0.

Here's how it works:

Here we have the first four segments (1-4), which cover the blue range [0 - 4/64].

Judging by the location of the pixel, it seems that the channels of red and green are approximately equal to 0.75 and 0.5.

We query the LUT twice - “Part 1” points to the second segment, and “Part 2” points to the first segment.

The interpolation is based on the fractional part of the scaled blue value, which is approximately 0.75.

That is, since the code computes lerp(color2, color1, blueInterp), the final result takes about 25% of its color from the first segment and about 75% from the second.
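The segment indices and the blend weight for this example can be checked numerically. The sketch below only mirrors the blue-channel arithmetic of Part 1 and Part 2; `blue_lookups` is a hypothetical helper, and the segments it returns are 0-based.

```python
import math

def blue_lookups(b):
    """Return (near_segment, far_segment, weight_of_far) for a blue value."""
    far_b = b * (63.75 / 64.0) + 1.0 / 64.0      # Part 1
    near_b = math.floor(b * 63.75) / 64.0        # Part 2
    weight = b * 63.75 - math.floor(b * 63.75)   # frac(b * 63.75)

    # Segment selection as in getUV (blue is kept just below 1.0)
    far_segment = math.floor(min(far_b, 0.99999) * 64)
    near_segment = math.floor(min(near_b, 0.99999) * 64)
    return near_segment, far_segment, weight

near, far, weight = blue_lookups(0.75 / 64.0)
print(near, far, weight)   # 0 1 0.7470703125
```

With 0-based indices 0 and 1 corresponding to the first and second segments, the far (second) segment gets a weight of about 0.747 and the near (first) segment the remaining 0.253.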

We are almost done. The last thing to do is:

```hlsl
// Calculate the final color
const float lutCorrectedInfluence = cb3_v1.y;   // 0.20 in this frame
float3 finalColor = lerp(inputColorLinear.rgb, finalLUT, lutCorrectedInfluence);

return float4( finalColor, inputColorLinear.a );
```

Ha! In this case, the final color consists of 80% of the input color and 20% of the LUT color!

Let's do a brief comparison of the images again: the input color (that is, in fact, with 0% color correction), the final frame (20%) and the fully processed image (100% of the effect of color correction):

*0% color grading*

*20% color grading (true shader)*

*100% color grading*

### Multiple LUTs

In some cases, The Witcher 3 uses multiple LUTs.

Here is a scene that uses two LUTs:

*Before color grading*

*After going through color grading*

LUTs used:

*LUT 1 (texture1)*

*LUT 2 (texture2)*

Let's examine the assembly fragment from this version of the shader:

```
18: sample_l(texture2d)(float,float,float,float) r3.xyz, r2.xyxx, t2.xyzw, s2, l(0)
19: sample_l(texture2d)(float,float,float,float) r2.xyz, r2.xyxx, t1.xyzw, s1, l(0)
...
36: sample_l(texture2d)(float,float,float,float) r4.xyz, r1.xyxx, t2.xyzw, s2, l(0)
37: sample_l(texture2d)(float,float,float,float) r1.xyw, r1.xyxx, t1.xywz, s1, l(0)
38: add r3.xyz, r3.xyzx, -r4.xyzx
39: mad r3.xyz, r1.zzzz, r3.xyzx, r4.xyzx
40: log r3.xyz, abs(r3.xyzx)
41: mul r3.xyz, r3.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
42: exp r3.xyz, r3.xyzx
43: add r2.xyz, -r1.xywx, r2.xyzx
44: mad r1.xyz, r1.zzzz, r2.xyzx, r1.xywx
45: log r1.xyz, abs(r1.xyzx)
46: mul r1.xyz, r1.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
47: exp r1.xyz, r1.xyzx
48: add r2.xyz, -r1.xyzx, r3.xyzx
49: mad r1.xyz, cb3[1].xxxx, r2.xyzx, r1.xyzx
50: mad r1.xyz, cb3[1].zzzz, r1.xyzx, -r0.xyzx
51: mad o0.xyz, cb3[1].yyyy, r1.xyzx, r0.xyzx
52: mov o0.w, r0.w
53: ret
```

Fortunately, everything is pretty simple. According to the assembly code, we get:

```hlsl
// Part 1
// ...
float2 uv1 = getUV(inputColor1);
float3 lut2_color1 = texture2.SampleLevel( sampler2, uv1, 0 ).rgb;
float3 lut1_color1 = texture1.SampleLevel( sampler1, uv1, 0 ).rgb;

// Part 2
// ...
float2 uv2 = getUV(inputColor2);
float3 lut2_color2 = texture2.SampleLevel( sampler2, uv2, 0 ).rgb;
float3 lut1_color2 = texture1.SampleLevel( sampler1, uv2, 0 ).rgb;

float blueInterp = frac(inputColorGamma.b * 63.75);

float3 lut2_finalLUT = lerp(lut2_color2, lut2_color1, blueInterp);
lut2_finalLUT = GammaToLinear(lut2_finalLUT);

float3 lut1_finalLUT = lerp(lut1_color2, lut1_color1, blueInterp);
lut1_finalLUT = GammaToLinear(lut1_finalLUT);

const float lut_Interp = cb3_v1.x;
float3 finalLUT = lerp(lut1_finalLUT, lut2_finalLUT, lut_Interp);

const float lutCorrectedMult = cb3_v1.z;
finalLUT *= lutCorrectedMult;

// Calculate the final color
const float lutCorrectedInfluence = cb3_v1.y;
float3 finalColor = lerp(inputColorLinear.rgb, finalLUT, lutCorrectedInfluence);

return float4( finalColor, inputColorLinear.a );
```

After two colors are fetched from the LUTs, they are interpolated using *lut_Interp*; everything else is almost the same as in the version with one LUT. This is the only additional variable here, and it controls how the two LUTs are mixed.

Its value in this particular frame is approximately 0.96, that is, *finalLUT* contains 96% of the color from LUT2 and 4% of the color from LUT1.
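The blend can be reproduced on the CPU to get a feel for the numbers. The following is a minimal Python sketch, not the game's code: the function names and the input colors are made up for illustration, and `gamma_to_linear` assumes that `GammaToLinear` is the plain `pow(abs(c), 2.2)` suggested by the `log`/`mul`/`exp` triplet in the assembly.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation, like HLSL lerp()."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def gamma_to_linear(color, gamma=2.2):
    """pow(abs(c), 2.2), which is what the log/mul/exp triplet computes."""
    return tuple(abs(c) ** gamma for c in color)


def grade_two_luts(scene_linear, lut1_gamma, lut2_gamma,
                   lut_interp, lut_mult, lut_influence):
    """Mix two LUT lookups, scale the result, then blend with the scene color."""
    lut1 = gamma_to_linear(lut1_gamma)
    lut2 = gamma_to_linear(lut2_gamma)
    final_lut = lerp(lut1, lut2, lut_interp)             # cb3_v1.x
    final_lut = tuple(c * lut_mult for c in final_lut)   # cb3_v1.z
    return lerp(scene_linear, final_lut, lut_influence)  # cb3_v1.y


# Hypothetical inputs: LUT1 returns black, LUT2 returns white.
scene = (0.2, 0.2, 0.2)
graded = grade_two_luts(scene, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0),
                        lut_interp=0.96, lut_mult=1.0, lut_influence=1.0)
untouched = grade_two_luts(scene, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0),
                           lut_interp=0.96, lut_mult=1.0, lut_influence=0.0)
```

With an influence of 0 the scene color passes through unchanged, which makes it easy to see that the LUTs only ever pull the scene color toward the graded one.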

However, this is not the end! The scene I studied in the part about fog uses **three** LUTs!

Let's take a look!

*Before color grading*

*After going through color grading*

*LUT1 (texture1)*

*LUT2 (texture2)*

*LUT3 (texture3)*

And again the assembler fragment:

`23: mad r2.yz, r2.yyzy, l(0.000000, 0.125000, 0.125000, 0.000000), r3.xxyx 24: sample_l(texture2d)(float,float,float,float) r3.xyz, r2.yzyy, t2.xyzw, s2, l(0) ... 34: mad r1.xy, r1.xyxx, l(0.125000, 0.125000, 0.000000, 0.000000), r1.zwzz 35: sample_l(texture2d)(float,float,float,float) r4.xyz, r1.xyxx, t2.xyzw, s2, l(0) 36: add r4.xyz, -r3.xyzx, r4.xyzx 37: mad r3.xyz, r2.xxxx, r4.xyzx, r3.xyzx 38: log r3.xyz, abs(r3.xyzx) 39: mul r3.xyz, r3.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000) 40: exp r3.xyz, r3.xyzx 41: sample_l(texture2d)(float,float,float,float) r4.xyz, r1.xyxx, t1.xyzw, s1, l(0) 42: sample_l(texture2d)(float,float,float,float) r1.xyz, r1.xyxx, t3.xyzw, s3, l(0) 43: sample_l(texture2d)(float,float,float,float) r5.xyz, r2.yzyy, t1.xyzw, s1, l(0) 44: sample_l(texture2d)(float,float,float,float) r2.yzw, r2.yzyy, t3.wxyz, s3, l(0) 45: add r4.xyz, r4.xyzx, -r5.xyzx 46: mad r4.xyz, r2.xxxx, r4.xyzx, r5.xyzx 47: log r4.xyz, abs(r4.xyzx) 48: mul r4.xyz, r4.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000) 49: exp r4.xyz, r4.xyzx 50: add r3.xyz, r3.xyzx, -r4.xyzx 51: mad r3.xyz, cb3[1].xxxx, r3.xyzx, r4.xyzx 52: mad r3.xyz, cb3[1].zzzz, r3.xyzx, -r0.xyzx 53: mad r3.xyz, cb3[1].yyyy, r3.xyzx, r0.xyzx 54: add r1.xyz, r1.xyzx, -r2.yzwy 55: mad r1.xyz, r2.xxxx, r1.xyzx, r2.yzwy 56: log r1.xyz, abs(r1.xyzx) 57: mul r1.xyz, r1.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000) 58: exp r1.xyz, r1.xyzx 59: mad r1.xyz, cb3[2].zzzz, r1.xyzx, -r0.xyzx 60: mad r0.xyz, cb3[2].yyyy, r1.xyzx, r0.xyzx 61: mov o0.w, r0.w 62: add r0.xyz, -r3.xyzx, r0.xyzx 63: mad o0.xyz, cb3[2].wwww, r0.xyzx, r3.xyzx 64: ret `

Unfortunately, this version of the shader is much more confusing than the previous two. For example, the UV pair I call “uv1” used to appear in the assembler code before “uv2” (compare the assembler code of the shader with just one LUT). That is not the case here: the UVs for “Part 1” are calculated on line 34, and the UVs for “Part 2” on line 23.

After spending much more time than expected studying what is happening here and wondering why Part 2 seems to have swapped places with Part 1, I wrote an HLSL code fragment for three LUTs:

`//Part 1//... float2 uv1=getUV(inputColor1); float3 lut3_color1=texture3.SampleLevel( sampler3, uv1, 0 ).rgb; float3 lut2_color1=texture2.SampleLevel( sampler2, uv1, 0 ).rgb; float3 lut1_color1=texture1.SampleLevel( sampler1, uv1, 0 ).rgb;//Part 2//... float2 uv2=getUV(inputColor2); float3 lut3_color2=texture3.SampleLevel( sampler3, uv2, 0 ).rgb; float3 lut2_color2=texture2.SampleLevel( sampler2, uv2, 0 ).rgb; float3 lut1_color2=texture1.SampleLevel( sampler1, uv2, 0 ).rgb; float blueInterp=frac(inputColorGamma.b * 63.75);//First compute linear color for LUT 2 [assembly lines 36-40] float3 lut2_finalLUT=lerp(lut2_color2, lut2_color1, blueInterp); lut2_finalLUT=GammaToLinear(lut2_finalLUT);//Compute linear color for LUT 1 [assembly: 45-49] float3 lut1_finalLUT=lerp(lut1_color2, lut1_color1, blueInterp); lut1_finalLUT=GammaToLinear(lut1_finalLUT);//Interpolate between LUT 1 and LUT 2 [assembly: 50-51] const float lut12_Interp=cb3_v1.x; float3 lut12_finalLUT=lerp(lut1_finalLUT, lut2_finalLUT, lut12_Interp);//Multiply the LUT1-2 intermediate result with scale factor [assembly: 52] const float lutCorrectedMult_LUT1_2=cb3_v1.z; lut12_finalLUT *= lutCorrectedMult_LUT1_2;//Mix LUT1-2 intermediate result with the scene color [assembly: 52-53] const float lutCorrectedInfluence_12=cb3_v1.y; lut12_finalLUT=lerp(inputColorLinear.rgb, lut12_finalLUT, lutCorrectedInfluence_12);//Compute linear color for LUT3 [assembly: 54-58] float3 lut3_finalLUT=lerp(lut3_color2, lut3_color1, blueInterp); lut3_finalLUT=GammaToLinear(lut3_finalLUT);//Multiply the LUT3 intermediate result with the scale factor [assembly: 59] const float lutCorrectedMult_LUT3=cb3_v2.z; lut3_finalLUT *= lutCorrectedMult_LUT3;//Mix LUT3 intermediate result with the scene color [assembly: 59-60] const float lutCorrectedInfluence3=cb3_v2.y; lut3_finalLUT=lerp(inputColorLinear.rgb, lut3_finalLUT, lutCorrectedInfluence3);//The final mix between LUT1+2 and LUT3 influence [assembly: 62-63] const float finalInfluence=cb3_v2.w; float3 finalColor=lerp(lut12_finalLUT, lut3_finalLUT, finalInfluence); return float4( finalColor, inputColorLinear.a ); } `

After all texture fetches are complete, the results of LUT1 and LUT2 are first interpolated, then multiplied by a scaling factor, and then combined with the linear color of the main scene. Let's call the result *lut12_finalLUT*.

Then roughly the same thing happens for LUT3: it is multiplied by another scaling factor and combined with the color of the main scene, which gives us *lut3_finalLUT*.

At the end, the two intermediate results are interpolated once more.
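The order of operations is easier to follow in a compact form. Below is a small Python sketch of the three-LUT combination, under the assumption that the three LUT colors have already been fetched and converted to linear space; the function names and the sample values are hypothetical, while the tuple layout mirrors the `cb3_v1` and `cb3_v2` components named in the HLSL above.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation, like HLSL lerp()."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def scale(color, k):
    return tuple(c * k for c in color)


def grade_three_luts(scene, lut1, lut2, lut3, cb3_v1, cb3_v2):
    """cb3_v1 = (lut12 interp, lut12 influence, lut12 mult, unused),
    cb3_v2 = (unused, lut3 influence, lut3 mult, final influence)."""
    # LUT1 and LUT2 are mixed, scaled, then blended with the scene color.
    lut12 = lerp(lut1, lut2, cb3_v1[0])
    lut12 = lerp(scene, scale(lut12, cb3_v1[2]), cb3_v1[1])
    # LUT3 is scaled and blended with the scene color on its own.
    lut3_mix = lerp(scene, scale(lut3, cb3_v2[2]), cb3_v2[1])
    # The two intermediate results are interpolated at the very end.
    return lerp(lut12, lut3_mix, cb3_v2[3])


# Hypothetical linear colors and constants, for illustration only.
scene = (0.5, 0.5, 0.5)
lut1, lut2, lut3 = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.25, 0.25, 0.25)
only_lut12 = grade_three_luts(scene, lut1, lut2, lut3,
                              (0.5, 1.0, 1.0, 0.0), (0.0, 1.0, 1.0, 0.0))
only_lut3 = grade_three_luts(scene, lut1, lut2, lut3,
                             (0.5, 1.0, 1.0, 0.0), (0.0, 1.0, 1.0, 1.0))
```

Setting the final influence to 0 or 1 isolates one of the two branches, which is a handy way to check each half against a captured frame.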

Here are the values from cbuffer:

## Part 3: Portals

If you have played The Witcher 3 for long enough, you know that Geralt is not a big fan of portals. Let's see if they are really that scary.

There are two types of portals in the game:

*Blue Portal*

*Fire Portal*

I will explain how the fire portal is created, mostly because its code is simpler than the blue one's :)

This is how the fire portal looks in the game:

The most important part is, of course, the fire rotating towards the center, but the effect itself does not only consist of the visible part. More on this later.

The plan for this part is pretty standard: first geometry, then vertex and pixel shaders. There will be quite a few screenshots and videos.

From the point of view of rendering, portals are drawn in a forward pass with blending turned on, which is a fairly common technique in the game; see the shooting stars section for more details.

Well, let's get started.

## 1. Geometry

Here's what the portal mesh looks like:

*Local Space - Front View*

*Local Space - Side View*

The mesh resembles Gabriel’s horn. The vertex shader compresses it along one axis; here is the same mesh after compression, in the side view (in world space):

*Portal mesh after vertex shader (side view)*

Besides position, each vertex carries additional data. The following attributes matter to us (for now I will just show their visualizations from RenderDoc and explain them in more detail later):

Texcoords (float2):

Tangent (float3):

Color (float3):

All of this data will be used later, but at this stage we already have too much data for an .obj file, so exporting this mesh can cause problems. I exported each channel as a separate .csv file, then loaded all the .csv files into my C++ application, which builds the mesh from the collected data at runtime.
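The idea of merging per-channel files can be sketched in a few lines. This is only an illustration in Python rather than the C++ used by the author's tool, and it assumes a deliberately simple layout (one vertex per row, comma-separated floats, no header); the real exported files may look different, and the channel names are hypothetical.

```python
import csv
import io


def read_channel(csv_text):
    """Parse one exported channel: one row of floats per vertex."""
    rows = csv.reader(io.StringIO(csv_text))
    return [tuple(float(x) for x in row) for row in rows if row]


def build_vertices(channels):
    """Zip per-channel lists into one list of per-vertex records."""
    counts = {len(values) for values in channels.values()}
    if len(counts) != 1:
        raise ValueError("all channels must have the same vertex count")
    vertex_count = counts.pop()
    return [{name: values[i] for name, values in channels.items()}
            for i in range(vertex_count)]


# Tiny made-up data set with two vertices.
channels = {
    "position": read_channel("0,0,0\n1,0,0\n"),
    "texcoord": read_channel("0,0\n1,0\n"),
    "color": read_channel("1,0,0\n0,1,1\n"),
}
vertices = build_vertices(channels)
```

Checking that every channel has the same number of rows catches the most likely export mistake early, before a mismatched mesh reaches the GPU.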

## 2. Vertex Shader

The vertex shader is not particularly interesting, but let's take a look at the corresponding fragment anyway:

`vs_5_0 dcl_globalFlags refactoringAllowed dcl_constantbuffer cb1[7], immediateIndexed dcl_constantbuffer cb2[6], immediateIndexed dcl_input v0.xyz dcl_input v1.xy dcl_input v3.xyz dcl_input v4.xyzw dcl_input v6.xyzw dcl_input v7.xyzw dcl_input v8.xyzw dcl_output o0.xyz dcl_output o1.xyzw dcl_output o2.xyz dcl_output o3.xyz dcl_output_siv o4.xyzw, position dcl_temps 3 0: mov o0.xy, v1.xyxx 1: mul r0.xyzw, v7.xyzw, cb1[6].yyyy 2: mad r0.xyzw, v6.xyzw, cb1[6].xxxx, r0.xyzw 3: mad r0.xyzw, v8.xyzw, cb1[6].zzzz, r0.xyzw 4: mad r0.xyzw, cb1[6].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw 5: mad r1.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx 6: mov r1.w, l(1.000000) 7: dp4 o0.z, r1.xyzw, r0.xyzw 8: mov o1.xyzw, v4.xyzw 9: dp4 o2.x, r1.xyzw, v6.xyzw 10: dp4 o2.y, r1.xyzw, v7.xyzw 11: dp4 o2.z, r1.xyzw, v8.xyzw 12: mad r0.xyz, v3.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), l(-1.000000, -1.000000, -1.000000, 0.000000) 13: dp3 r2.x, r0.xyzx, v6.xyzx 14: dp3 r2.y, r0.xyzx, v7.xyzx 15: dp3 r2.z, r0.xyzx, v8.xyzx 16: dp3 r0.x, r2.xyzx, r2.xyzx 17: rsq r0.x, r0.x 18: mul o3.xyz, r0.xxxx, r2.xyzx `

The vertex shader is very similar to the other shaders we met earlier.

After a brief analysis and a comparison with the input layout, I found out that the output struct can be written like this:

`struct VS_OUTPUT { float3 TexcoordAndViewSpaceDepth : TEXCOORD0; float3 Color : TEXCOORD1; float3 WorldSpacePosition : TEXCOORD2; float3 Tangent : TEXCOORD3; float4 PositionH : SV_Position; }; `

I wanted to point out one aspect: how the shader gets the depth in view space (o0.z). It is simply the .w component of SV_Position.

There is a thread on gamedev.net that explains this in a little more detail.
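A quick numeric check shows why the .w component carries view-space depth. The Python sketch below assumes a Direct3D-style left-handed perspective matrix used with row vectors (the same layout as `XMMatrixPerspectiveFovLH`); whether the game uses exactly this convention is an assumption, and the helper names are made up.

```python
import math


def perspective_fov_lh(fov_y, aspect, z_near, z_far):
    """Left-handed perspective matrix for row vectors (v * M)."""
    h = 1.0 / math.tan(fov_y * 0.5)
    w = h / aspect
    q = z_far / (z_far - z_near)
    return [
        [w, 0.0, 0.0, 0.0],
        [0.0, h, 0.0, 0.0],
        [0.0, 0.0, q, 1.0],             # the 1.0 copies view-space z into clip.w
        [0.0, 0.0, -z_near * q, 0.0],
    ]


def mul_vec_mat(v, m):
    """Row vector times 4x4 matrix."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


proj = perspective_fov_lh(math.radians(60.0), 16.0 / 9.0, 0.1, 1000.0)
view_space_pos = [1.0, 2.0, 7.5, 1.0]   # hypothetical point, 7.5 units ahead
clip = mul_vec_mat(view_space_pos, proj)
view_space_depth = clip[3]
```

Only the last column of the matrix contributes to `clip[3]`, and that column has a single 1 in the z row, so the result is the view-space z regardless of field of view or clip planes.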

## 3. Pixel Shader

Here is an example scene immediately before rendering the portal:

... and after rendering:

In addition, the RenderDoc texture viewer has a useful “Clear Before Draw” overlay option, with which we can accurately see the rendered portal:

The first interesting aspect is that the flame layer itself is drawn only in the central region of the mesh.

The pixel shader consists of 186 lines; for convenience I posted it here. As usual, I will cite the corresponding assembler fragments during the explanation.

It’s also worth noting that 100 of the 186 lines relate to the fog calculation.

First of all, four textures are bound as inputs: fire (t0), noise/smoke (t1), scene color (t6) and scene depth (t15):

*Fire Texture*

*Noise/Smoke Texture*

*Scene Color*

*Scene Depth*

There is also a separate constant buffer with 14 parameters that control the effect:

The input data (position, tangent and texcoords) are pretty clear concepts, but let's take a closer look at the Color channel. After several experiments, it seems to me that it is not a color as such, but three different masks that the shader uses to tell the separate layers apart and to decide where to apply the various effects:

Color.r - heat haze mask. As the name implies, it is used for the effect of thermal distortion of air (more about it later):

Color.g - inner mask. It is mainly used for the effect of fire.

Color.b - back mask. Used to determine where the "back" of the portal is.

I believe that in the case of such effects, it is better to describe the individual layers, rather than analyze the assembler code from beginning to end, as I did before.

So, let's go:

### 3.1. Layer of fire

First, let's explore the most important part: the layer of fire. Here is a video of it:

The basic principle behind this effect is to take the static per-vertex texcoords and animate them using the elapsed time variable from the constant buffer. These animated texcoords are then used to sample the texture (in our case, fire) with a wrap/repeat sampler.

Interestingly, only the .r channel of the fire texture is sampled in this particular effect. To make the effect more believable, two layers of fire are obtained in the manner described above and then combined with each other.

Well, let's finally look at the code!

We start by making texcoords more dynamic when they reach the center of the mesh:

`const float2 texcoords=Input.TextureUV; const float uvSquash=cb4_v4.x;//2.50 ... const float y_cutoff=0.2; const float y_offset=pow(texcoords.y - y_cutoff, uvSquash); `

And here is the same thing in assembler:

`21: add r1.z, v0.y, l(-0.200000) 22: log r1.z, r1.z 23: mul r1.z, r1.z, cb4[4].x 24: exp r1.z, r1.z `
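The `log`/`mul`/`exp` triplet is the usual way `pow` is compiled, and it has a side effect worth knowing about. Here is a small Python sketch of it, assuming base-2 `log` and `exp` as in shader assembly; the function names are hypothetical and the default values are the ones quoted above (cutoff 0.2, squash 2.5).

```python
import math


def pow_via_log_exp(base, exponent):
    """exp2(exponent * log2(base)), mirroring the log/mul/exp triplet."""
    if base < 0.0:
        return math.nan        # the logarithm of a negative number is NaN
    if base == 0.0:
        return 0.0             # log2(0) = -inf; valid for positive exponents
    return 2.0 ** (exponent * math.log2(base))


def y_offset(texcoord_y, y_cutoff=0.2, uv_squash=2.5):
    return pow_via_log_exp(texcoord_y - y_cutoff, uv_squash)


at_edge = y_offset(1.0)         # (1.0 - 0.2) ** 2.5
at_cutoff = y_offset(0.2)       # exactly at the cutoff
below_cutoff = y_offset(0.1)    # below the cutoff: not a number
```

So for texcoords below the cutoff the offset is undefined rather than merely small. In this sketch that shows up as NaN; what the rest of the shader does with such pixels is not something this snippet can tell.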

The shader then receives texcoords for the first layer of fire and samples the texture of the fire:

`const float elapsedTimeSeconds=cb0_v0.x; const float uvScaleGlobal1=cb4_v2.x;//1.00 const float uvScale1=cb4_v3.x;//0.15 ...//Sample fire1 - the first fire layer float fire1;//r1.w { float2 fire1Uv; fire1Uv.x=texcoords.x; fire1Uv.y=uvScale1 * elapsedTimeSeconds + y_offset; const float scaleGlobal=floor(uvScaleGlobal1);//1.0 fire1Uv *= scaleGlobal; fire1=texFire.Sample(samplerLinearWrap, fire1Uv).x; } `

Here is the corresponding snippet in assembler:

`25: round_ni r1.w, cb4[2].x 26: mad r2.y, cb4[3].x, cb0[0].x, r1.z 27: mov r2.x, v0.x 28: mul r2.xy, r1.wwww, r2.xyxx 29: sample_indexable(texture2d)(float,float,float,float) r1.w, r2.xyxx, t0.yzwx, s0 `

This is what the first layer looks like with *elapsedTimeSeconds* = 50.0:

To show what *y_cutoff* does, here is the same scene with *y_cutoff* = 0.5:

So we got the first layer. Next, the shader gets the second:

`const float uvScale2=cb4_v6.x;//0.06 const float uvScaleGlobal2=cb4_v7.x;//1.00 ...//Sample fire2 - the second fire layer float fire2;//r1.z { float2 fire2Uv; fire2Uv.x=texcoords.x - uvScale2 * elapsedTimeSeconds; fire2Uv.y=uvScale2 * elapsedTimeSeconds + y_offset; const float fire2_scale=floor(uvScaleGlobal2); fire2Uv *= fire2_scale; fire2=texFire.Sample(samplerLinearWrap, fire2Uv).x; } `

And here is the corresponding assembler fragment:

`144: mad r2.x, -cb0[0].x, cb4[6].x, v0.x 145: mad r2.y, cb0[0].x, cb4[6].x, r1.z 146: round_ni r1.z, cb4[7].x 147: mul r2.xy, r1.zzzz, r2.xyxx 148: sample_indexable(texture2d)(float,float,float,float) r1.z, r2.xyxx, t0.yzxw, s0 `

As you can see, the only difference is in the UVs: now X is animated as well.
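To compare the two layers side by side, here is a Python sketch that computes both sets of animated UVs for one vertex. It is an illustration only: the default constants are the values quoted in the HLSL comments, the function names are made up, and `wrap` models a repeat sampler simply as the fractional part of each coordinate.

```python
import math


def fire_uvs(u, y_offset, elapsed, uv_scale1=0.15, uv_scale2=0.06,
             uv_scale_global1=1.0, uv_scale_global2=1.0):
    """Return (fire1_uv, fire2_uv) for one pixel."""
    s1 = math.floor(uv_scale_global1)
    s2 = math.floor(uv_scale_global2)
    # Layer 1: only V scrolls with time.
    fire1 = (u * s1, (uv_scale1 * elapsed + y_offset) * s1)
    # Layer 2: U scrolls as well, in the opposite direction.
    fire2 = ((u - uv_scale2 * elapsed) * s2,
             (uv_scale2 * elapsed + y_offset) * s2)
    return fire1, fire2


def wrap(uv):
    """Repeat addressing: keep only the fractional part."""
    return tuple(c - math.floor(c) for c in uv)


fire1_uv, fire2_uv = fire_uvs(u=0.5, y_offset=0.0, elapsed=50.0)
```

Because the two layers scroll at different speeds and the second one also slides sideways, their product never repeats in an obvious way, which is what makes the fire look less mechanical.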

The second layer looks like this:

Having obtained two layers of **inner fire**, we can combine them. However, this process is a bit more complicated than a plain multiplication, since the inner mask is also involved:

`const float innerMask=Input.Color.y; const float portalInnerColorSqueeze=cb4_v8.x;//3.00 const float portalInnerColorBoost=cb4_v9.x;//188.00 ...//Calculate inner fire influence float inner_influence;//r1.z {//innerMask and "-1.0" are used here to control where the inner part of a portal is. inner_influence=fire1 * fire2 + innerMask; inner_influence=saturate(inner_influence - 1.0);//Exponentiation to hide less luminous elements of inner portal inner_influence=pow(inner_influence, portalInnerColorSqueeze);//Boost the intensity inner_influence *= portalInnerColorBoost; } `

Here is the corresponding assembly code:

`149: mad r1.z, r1.w, r1.z, v1.y 150: add_sat r1.z, r1.z, l(-1.000000) 151: log r1.z, r1.z 152: mul r1.z, r1.z, cb4[8].x 153: exp r1.z, r1.z 154: mul r1.z, r1.z, cb4[9].x `
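Plugging in a few numbers shows how the mask and the `-1.0` work together. The Python sketch below follows the HLSL above step by step; the function names are hypothetical and the defaults are the constant buffer values quoted in the comments (3.0 and 188.0).

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def inner_influence(fire1, fire2, inner_mask, squeeze=3.0, boost=188.0):
    """fire1 * fire2 + innerMask - 1, clamped, sharpened, then boosted."""
    value = saturate(fire1 * fire2 + inner_mask - 1.0)
    return (value ** squeeze) * boost


full = inner_influence(1.0, 1.0, 1.0)    # brightest fire, center of the mask
outside = inner_influence(0.5, 0.5, 0.5) # weak fire, half mask: clamped to 0
partial = inner_influence(1.0, 1.0, 0.5) # bright fire, half mask
```

Subtracting 1 means the fire can only show through where the product and the mask together exceed 1, so the mask acts as a threshold rather than a simple multiplier.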

Having obtained *inner_influence*, which is nothing more than a mask for the inner fire, we can simply multiply the mask by the color of the inner fire:

`//Calculate portal color const float3 colorPortalInner=cb4_v5.rgb;//(1.00, 0.60, 0.21961) ... const float3 portal_inner_final=pow(colorPortalInner, 2.2) * inner_influence; `

Assembler Code:

`155: log r2.xyz, cb4[5].xyzx 156: mul r2.xyz, r2.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000) 157: exp r2.xyz, r2.xyzx ... 170: mad r2.xyz, r2.xyzx, r1.zzzz, r3.xyzx `

Here's a video that demonstrates the individual layers of inner fire. Order: first layer, second layer, inner influences and final inner color:

### 3.2. Glow

Having created the inner fire, we move on to the second layer: the glow. Here is a video showing first only the inner fire, then only the glow, and then their sum, the finished fire effect:

This is how the shader calculates the glow. As with the inner fire, a mask is generated first and then multiplied by the glow color from the constant buffer.

`const float portalOuterGlowAttenuation=cb4_v10.x;//0.30 const float portalOuterColorBoost=cb4_v11.x;//1.50 const float3 colorPortalOuterGlow=cb4_v12.rgb;//(1.00, 0.61961, 0.30196) ...//Calculate outer portal glow float outer_glow_influence; { float outer_mask=(1.0 - backMask) * innerMask; const float perturbParam=fire1*fire1; float outer_mask_perturb=lerp( 1.0 - portalOuterGlowAttenuation, 1.0, perturbParam ); outer_mask *= outer_mask_perturb; outer_glow_influence=outer_mask * portalOuterColorBoost; }//the final glow color const float3 portal_outer_final=pow(colorPortalOuterGlow, 2.2) * outer_glow_influence;//and the portal color, the sum of fire and glow float3 portal_final=portal_inner_final + portal_outer_final; `

This is what outer_mask looks like:

(1.0 - backMask) * innerMask

The glow does not have a constant color. To make it look more interesting, the animated first layer of fire (squared) is used, so ripples moving toward the center are noticeable:

Here is the assembler code responsible for the glow:

`158: add r2.w, -v1.z, l(1.000000) 159: mul r2.w, r2.w, v1.y 160: mul r1.w, r1.w, r1.w 161: add r3.x, l(1.000000), -cb4[10].x 162: add r3.y, -r3.x, l(1.000000) 163: mad r1.w, r1.w, r3.y, r3.x 164: mul r1.w, r1.w, r2.w 165: mul r1.w, r1.w, cb4[11].x 166: log r3.xyz, cb4[12].xyzx 167: mul r3.xyz, r3.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000) 168: exp r3.xyz, r3.xyzx 169: mul r3.xyz, r1.wwww, r3.xyzx 170: mad r2.xyz, r2.xyzx, r1.zzzz, r3.xyzx `
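The glow mask is simple enough to evaluate by hand. Here is a Python sketch of it that follows the assembly above; the function name is made up and the defaults are the constant buffer values quoted in the HLSL comments (0.30 and 1.50).

```python
def outer_glow_influence(fire1, inner_mask, back_mask,
                         attenuation=0.30, boost=1.50):
    """(1 - backMask) * innerMask, modulated by fire1 squared, then boosted."""
    outer_mask = (1.0 - back_mask) * inner_mask
    perturb = fire1 * fire1
    low = 1.0 - attenuation
    # lerp(1 - attenuation, 1, fire1^2): the glow never drops below 70%.
    outer_mask *= low + (1.0 - low) * perturb
    return outer_mask * boost


brightest = outer_glow_influence(fire1=1.0, inner_mask=1.0, back_mask=0.0)
dimmest = outer_glow_influence(fire1=0.0, inner_mask=1.0, back_mask=0.0)
at_back = outer_glow_influence(fire1=1.0, inner_mask=1.0, back_mask=1.0)
```

With these constants the fire texture can only modulate the glow between 70% and 100% of its strength, so the glow pulses gently instead of flickering on and off.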

### 3.3. Heat Haze

When I began to analyze the implementation of the portal shader, it was not clear to me why the color of the scene without the portal was used as one of the input textures. I reasoned like this: “here we use blending, so it’s enough to return a pixel with a zero alpha value to preserve the background color.”

The answer is that the shader has a small but beautiful heat haze effect (thermal distortion of the air): heat and energy emanate from the portal, so the background is distorted.

The principle is to offset the pixel texcoords and sample the background color texture with the new coordinates; such an operation cannot be performed by blending alone.

Here is a video demonstrating how this works. Order: first the full effect, then the haze from the shader, and at the end I multiply the offset by 10 to enhance the effect.

Let's see how the offset is calculated.

`const float ViewSpaceDepth=Input.ViewSpaceDepth; const float3 Tangent=Input.Tangent; const float backgroundDistortionStrength=cb4_v1.x;//0.40//Fades smoothly from the outer edges to the back of a portal const float heatHazeMask=Input.Color.x; ...//The heat haze effect is view dependent thanks to tangent vectors in view space. float2 heatHazeOffset=mul( normalize(Tangent), (float3x4)g_mtxView); heatHazeOffset *= float2(-1, 1);//Fade the effect as camera is further from a portal const float heatHazeDistanceFade=backgroundDistortionStrength/ViewSpaceDepth; heatHazeOffset *= heatHazeDistanceFade; heatHazeOffset *= heatHazeMask;//this is what animates the heat haze effect heatHazeOffset *= pow(fire1, 0.2);//Actually I don't know what's this :)//It was 1.0 usually so I won't bother discussing this. heatHazeOffset *= vsDepth2; `

The corresponding assembler is scattered across the shader:

`11: dp3 r1.x, v3.xyzx, v3.xyzx 12: rsq r1.x, r1.x 13: mul r1.xyz, r1.xxxx, v3.xyzx 14: mul r1.yw, r1.yyyy, cb12[2].xxxy 15: mad r1.xy, cb12[1].xyxx, r1.xxxx, r1.ywyy 16: mad r1.xy, cb12[3].xyxx, r1.zzzz, r1.xyxx 17: mul r1.xy, r1.xyxx, l(-1.000000, 1.000000, 0.000000, 0.000000) 18: div r1.z, cb4[1].x, v0.z 19: mul r1.xy, r1.zzzz, r1.xyxx 20: mul r1.xy, r1.xyxx, v1.xxxx ... 33: mul r1.xy, r1.xyxx, r2.xxxx 34: mul r1.xy, r0.zzzz, r1.xyxx `

We calculated the offset, so let's use it!

`const float2 backgroundSceneMaxUv=cb0_v2.zw;//(1.0, 1.0) const float2 invViewportSize=cb0_v1.zw;//(1.0/1920.0, 1.0/1080.0 )//Obtain background scene color - we need to obtain it from texture//for distortion effect float3 sceneColor; { const float2 sceneUv_0=pixelUv + backgroundSceneMaxUv*heatHazeOffset; const float2 sceneUv_1=backgroundSceneMaxUv - 0.5*invViewportSize; const float2 sceneUv=min(sceneUv_0, sceneUv_1); sceneColor=texScene.SampleLevel(sampler6, sceneUv, 0).rgb; } `

`175: mad r0.xy, cb0[2].zwzz, r1.xyxx, r0.xyxx 176: mad r1.xy, -cb0[1].zwzz, l(0.500000, 0.500000, 0.000000, 0.000000), cb0[2].zwzz 177: min r0.xy, r0.xyxx, r1.xyxx 178: sample_l(texture2d)(float,float,float,float) r1.xyz, r0.xyxx, t6.xyzw, s6, l(0) `
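The `min` in this fragment is what keeps the distorted lookup inside the texture. Here is a Python sketch of the same computation; the function name is hypothetical and the defaults are the values quoted in the comments above (a maximum UV of (1.0, 1.0) and a 1920x1080 viewport).

```python
def distorted_scene_uv(pixel_uv, heat_haze_offset,
                       max_uv=(1.0, 1.0),
                       inv_viewport=(1.0 / 1920.0, 1.0 / 1080.0)):
    """Offset the UV, then clamp it half a texel inside the valid region."""
    return tuple(
        min(p + m * o, m - 0.5 * inv)
        for p, o, m, inv in zip(pixel_uv, heat_haze_offset,
                                max_uv, inv_viewport)
    )


inside = distorted_scene_uv((0.5, 0.5), (0.01, -0.02))
clamped = distorted_scene_uv((0.999, 0.5), (0.01, 0.0))
```

The clamp limit sits at the center of the last texel, so an offset that pushes the lookup past the right or bottom edge still reads a valid scene pixel.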

So, in the end, we have *sceneColor*.

### 3.4. Portal Target Color

By the “target” color I mean the central part of the portal:

Unfortunately, it is all black. And the reason for this is fog.

I already talked about how fog is implemented in this article. In the portal shader, the fog calculations are in lines [35-135] of the original assembler code.

HLSL:

`struct FogResult { float4 paramsFog; float4 paramsAerial; }; ... FogResult fog; { const float3 CameraPosition=cb12_v0.xyz; const float fogStart=cb12_v22.z;//near plane fog=CalculateFog( WSPosition, CameraPosition, fogStart, false ); } ... const float3 destination_color=fog.paramsFog.a * fog.paramsFog.rgb; `

And so we get the finished scene:

The thing is, the camera in this frame is so close to the portal that the calculated *destination_color* is zero; that is, the black center of the portal is actually fog (or, strictly speaking, its absence).

Since with RenderDoc we can inject shaders into the game, let's try moving the camera manually:

`const float3 CameraPosition=cb12_v0.xyz + float3(100, 100, 0); `

And here is the result:

Ha!

So, although it makes little sense to use the fog calculations in this particular case, in theory nothing prevents us from using something else as *destination_color*, for example a landscape from another world (you may need an extra pair of texcoords, but it is quite feasible).

Using fog can be useful in the case of a huge portal that the player can see from a long distance.

### 3.5. Mixing the color of the scene (with the superimposed haze) with the “target”

I was wondering where to put this section, in “Portal Target Color” or in “Putting It All Together”, but decided to create a new subsection.

So, at this stage we have the *sceneColor* described in 3.3, which already contains the heat haze effect (thermal distortion), and we also have the *destination_color* from section 3.4.

They are interpolated as follows:

`178: sample_l(texture2d)(float,float,float,float) r1.xyz, r0.xyxx, t6.xyzw, s6, l(0) 179: mad r3.xyz, r4.wwww, r4.xyzx, -r1.xyzx 180: mad r0.xyw, r0.wwww, r3.xyxz, r1.xyxz `

What is the value that interpolates them (r0.w)?

The noise/smoke texture is applied here.

It is used to create what I call the “portal target mask.”

Here's the video (first the full effect, then the target mask, then the scene color and the target color interpolated):

Take a look at this HLSL snippet:

`//Determines the back part of a portal const float backMask=Input.Color.z; const float ViewSpaceDepth=Input.TexcoordAndViewSpaceDepth.z; const float viewSpaceDepthScale=cb4_v0.x;//0.50 ...//Load depth from texture float hardwareDepth=texDepth.SampleLevel(sampler15, pixelUv, 0).x; float linearDepth=getDepth(hardwareDepth);//cb4_v0.x=0.5 float vsDepthScale=saturate( (linearDepth - ViewSpaceDepth) * viewSpaceDepthScale ); float vsDepth1=2*vsDepthScale; ...//Calculate 'portal destination' mask - maybe we would like see a glimpse of where a portal leads//like landscape from another planet - the shader allows for it. float portal_destination_mask; { const float region_mask=dot(backMask.xx, vsDepth1.xx); const float2 _UVScale=float2(4.0, 1.0); const float2 _TimeScale=float2(0.0, 0.2); const float2 _UV=texcoords * _UVScale + elapsedTimeSeconds * _TimeScale; portal_destination_mask=texNoise.Sample(sampler0, _UV).x; portal_destination_mask=saturate(portal_destination_mask + region_mask - 1.0); portal_destination_mask *= portal_destination_mask;//line 143, r0.w } `

On the whole, the portal’s target mask is obtained in the same way as the fire: using animated texture coordinates. The variable *region_mask* is used to adjust where the effect appears.

Another variable, *vsDepth1*, is used to compute *region_mask*. I will describe it in more detail in the next section. However, it has little effect on the target mask.

The assembly code for the target mask looks like this:

`137: dp2 r0.w, v1.zzzz, r0.zzzz 138: mul r2.xy, cb0[0].xxxx, l(0.000000, 0.200000, 0.000000, 0.000000) 139: mad r2.xy, v0.xyxx, l(4.000000, 1.000000, 0.000000, 0.000000), r2.xyxx 140: sample_indexable(texture2d)(float,float,float,float) r2.x, r2.xyxx, t1.xyzw, s0 141: add r0.w, r0.w, r2.x 142: add_sat r0.w, r0.w, l(-1.000000) 143: mul r0.w, r0.w, r0.w `
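The mask math can be checked with a few hand-picked values. Below is a Python sketch of it; the function names and the sample inputs are made up, the noise value stands in for the texture fetch, and the `dp2` of two splatted scalars is written out as `2 * backMask * vsDepth1`.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def vs_depth1(linear_depth, view_space_depth, depth_scale=0.5):
    """2 * saturate((sceneDepth - portalDepth) * scale)."""
    return 2.0 * saturate((linear_depth - view_space_depth) * depth_scale)


def portal_destination_mask(noise, back_mask, depth_term):
    # dot(backMask.xx, vsDepth1.xx) expands to 2 * backMask * vsDepth1.
    region_mask = 2.0 * back_mask * depth_term
    mask = saturate(noise + region_mask - 1.0)
    return mask * mask


depth_term = vs_depth1(linear_depth=10.0, view_space_depth=9.0)
back = portal_destination_mask(noise=0.5, back_mask=1.0, depth_term=depth_term)
front = portal_destination_mask(noise=0.5, back_mask=0.0, depth_term=depth_term)
edge = portal_destination_mask(noise=0.5, back_mask=0.5, depth_term=depth_term)
```

Where the back mask is 1 the region term alone saturates the result, so the noise only matters in the transition zone, which is where the smoky edge of the portal's center comes from.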

### 3.6. Putting It All Together

Phew, almost done.

Let's get the portal color first:

`//Calculate portal color float3 portal_final; { const float3 portal_inner_color=pow(colorPortalInner, 2.2) * inner_influence; const float3 portal_outer_color=pow(colorPortalOuterGlow, 2.2) * outer_glow_influence; portal_final=portal_inner_color + portal_outer_color; portal_final *= vsDepth1;//fade the effect to avoid harsh artifacts due to depth test portal_final *= portalFinalColorFilter;//this was (1,1,1) - so not relevant } `

The only aspect I want to discuss here is *vsDepth1*.

Here's what this mask looks like:

In the previous subsection I showed how it is computed; in effect, it is a “linear depth buffer” used to fade out the portal color so that there is no sharp edge caused by the depth test.

Let's look once again at the finished scene, with and without the multiplication by *vsDepth1*.

Once *portal_final* is ready, getting the final color is easy:

`const float finalPortalAmount=cb2_v0.x;//0.99443 const float3 finalColorFilter=cb2_v2.rgb;//(1.0, 1.0, 1.0) const float finalOpacityFilter=cb2_v2.a;//1.0 ...//Alpha component for blending float opacity=saturate( lerp(cb2_v0.x, 1, cb4_v13.x) );//Calculate the final color float3 finalColor; {//Mix the scene color (with heat haze effect) with the 'destination color'.//In this particular example fog is used as destination (which is black where camera is nearby)//but in theory there is nothing which stops us from putting here a landscape from another world. const float3 destination_color=fog.paramsFog.a * fog.paramsFog.rgb; finalColor=lerp( sceneColor, destination_color, portal_destination_mask );//Add the portal color finalColor += portal_final * finalPortalAmount;//Final filter finalColor *= finalColorFilter; } opacity *= finalOpacityFilter; return float4(finalColor * opacity, opacity); `

That's all. There is one more variable, *finalPortalAmount*, which determines how much flame the player sees. I did not test it in detail, but I assume that it is used when the portal appears and disappears: for a short period of time the player does not see the fire, but sees everything else (glow, target color, etc.).
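The last step can be summarized as a single function. The Python sketch below mirrors the final HLSL fragment; the function and parameter names are hypothetical, the sample colors are made up, and `opacity_param` stands for `cb4_v13.x`, whose meaning is not known here.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def lerp3(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def compose_portal(scene_color, destination_color, portal_final,
                   destination_mask, portal_amount, opacity_param,
                   color_filter=(1.0, 1.0, 1.0), opacity_filter=1.0):
    """Return premultiplied RGBA, as in float4(finalColor * opacity, opacity)."""
    opacity = saturate(portal_amount + (1.0 - portal_amount) * opacity_param)
    opacity *= opacity_filter
    # Scene color (already distorted by the heat haze) versus destination color.
    color = lerp3(scene_color, destination_color, destination_mask)
    # Add the fire and the glow, then apply the final filter.
    color = tuple((c + p * portal_amount) * f
                  for c, p, f in zip(color, portal_final, color_filter))
    return tuple(c * opacity for c in color) + (opacity,)


scene = (0.2, 0.2, 0.2)
fire = (1.0, 0.5, 0.0)
edge_pixel = compose_portal(scene, (0.0, 0.0, 0.0), fire, 0.0, 1.0, 0.0)
center_pixel = compose_portal(scene, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0, 1.0, 0.0)
```

Returning the color already multiplied by opacity suggests that the blend state expects premultiplied alpha, although the blend state itself is not shown in this section.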

## 4. To summarize

The complete HLSL shader has been posted here. I had to swap several lines to get the same assembler code as the original, but this does not affect the overall flow of execution. The shader is ready to use with RenderDoc, all cbuffers are present, and so on, so you can inject it and experiment yourself.

Hope you enjoyed it, thanks for reading!
