19f2c52c66
Mesa 18.x needs an ld with build-id for at least the intel code Mesa 18.2 assumes linux only memfd syscalls in intel code Tested by matthieu@, kettenis@ and myself on a variety of hardware and architectures. ok kettenis@
523 lines
20 KiB
Plaintext
523 lines
20 KiB
Plaintext
Name
|
|
|
|
MESA_shader_integer_functions
|
|
|
|
Name Strings
|
|
|
|
GL_MESA_shader_integer_functions
|
|
|
|
Contact
|
|
|
|
Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
Contributors
|
|
|
|
All the contributors of GL_ARB_gpu_shader5
|
|
|
|
Status
|
|
|
|
Supported by all GLSL 1.30 capable drivers in Mesa 12.1 and later
|
|
|
|
Version
|
|
|
|
Version 3, March 31, 2017
|
|
|
|
Number
|
|
|
|
OpenGL Extension #495
|
|
|
|
Dependencies
|
|
|
|
This extension is written against the OpenGL 3.2 (Compatibility Profile)
|
|
Specification.
|
|
|
|
This extension is written against Version 1.50 (Revision 09) of the OpenGL
|
|
Shading Language Specification.
|
|
|
|
GLSL 1.30 (OpenGL) or GLSL ES 3.00 (OpenGL ES) is required.
|
|
|
|
This extension interacts with ARB_gpu_shader5.
|
|
|
|
This extension interacts with ARB_gpu_shader_fp64.
|
|
|
|
This extension interacts with NV_gpu_shader5.
|
|
|
|
Overview
|
|
|
|
GL_ARB_gpu_shader5 extends GLSL in a number of useful ways. Much of this
|
|
added functionality requires significant hardware support. There are many
|
|
aspects, however, that can be easily implmented on any GPU with "real"
|
|
integer support (as opposed to simulating integers using floating point
|
|
calculations).
|
|
|
|
This extension provides a set of new features to the OpenGL Shading
|
|
Language to support capabilities of these GPUs, extending the
|
|
capabilities of version 1.30 of the OpenGL Shading Language and version
|
|
3.00 of the OpenGL ES Shading Language. Shaders using the new
|
|
functionality provided by this extension should enable this
|
|
functionality via the construct
|
|
|
|
#extension GL_MESA_shader_integer_functions : require (or enable)
|
|
|
|
This extension provides a variety of new features for all shader types,
|
|
including:
|
|
|
|
* support for implicitly converting signed integer types to unsigned
|
|
types, as well as more general implicit conversion and function
|
|
overloading infrastructure to support new data types introduced by
|
|
other extensions;
|
|
|
|
* new built-in functions supporting:
|
|
|
|
* splitting a floating-point number into a significand and exponent
|
|
(frexp), or building a floating-point number from a significand and
|
|
exponent (ldexp);
|
|
|
|
* integer bitfield manipulation, including functions to find the
|
|
position of the most or least significant set bit, count the number
|
|
of one bits, and bitfield insertion, extraction, and reversal;
|
|
|
|
* extended integer precision math, including add with carry, subtract
|
|
with borrow, and extenended multiplication;
|
|
|
|
The resulting extension is a strict subset of GL_ARB_gpu_shader5.
|
|
|
|
IP Status
|
|
|
|
No known IP claims.
|
|
|
|
New Procedures and Functions
|
|
|
|
None
|
|
|
|
New Tokens
|
|
|
|
None
|
|
|
|
Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
|
|
(OpenGL Operation)
|
|
|
|
None.
|
|
|
|
Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
|
|
(Rasterization)
|
|
|
|
None.
|
|
|
|
Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
|
|
(Per-Fragment Operations and the Frame Buffer)
|
|
|
|
None.
|
|
|
|
Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
|
|
(Special Functions)
|
|
|
|
None.
|
|
|
|
Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
|
|
(State and State Requests)
|
|
|
|
None.
|
|
|
|
Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
|
|
Specification (Invariance)
|
|
|
|
None.
|
|
|
|
Additions to the AGL/GLX/WGL Specifications
|
|
|
|
None.
|
|
|
|
Modifications to The OpenGL Shading Language Specification, Version 1.50
|
|
(Revision 09)
|
|
|
|
Including the following line in a shader can be used to control the
|
|
language features described in this extension:
|
|
|
|
#extension GL_MESA_shader_integer_functions : <behavior>
|
|
|
|
where <behavior> is as specified in section 3.3.
|
|
|
|
New preprocessor #defines are added to the OpenGL Shading Language:
|
|
|
|
#define GL_MESA_shader_integer_functions 1
|
|
|
|
|
|
Modify Section 4.1.10, Implicit Conversions, p. 27
|
|
|
|
(modify table of implicit conversions)
|
|
|
|
Can be implicitly
|
|
Type of expression converted to
|
|
--------------------- -----------------
|
|
int uint, float
|
|
ivec2 uvec2, vec2
|
|
ivec3 uvec3, vec3
|
|
ivec4 uvec4, vec4
|
|
|
|
uint float
|
|
uvec2 vec2
|
|
uvec3 vec3
|
|
uvec4 vec4
|
|
|
|
(modify second paragraph of the section) No implicit conversions are
|
|
provided to convert from unsigned to signed integer types or from
|
|
floating-point to integer types. There are no implicit array or structure
|
|
conversions.
|
|
|
|
(insert before the final paragraph of the section) When performing
|
|
implicit conversion for binary operators, there may be multiple data types
|
|
to which the two operands can be converted. For example, when adding an
|
|
int value to a uint value, both values can be implicitly converted to uint
|
|
and float. In such cases, a floating-point type is chosen if either
|
|
operand has a floating-point type. Otherwise, an unsigned integer type is
|
|
chosen if either operand has an unsigned integer type. Otherwise, a
|
|
signed integer type is chosen.
|
|
|
|
|
|
Modify Section 5.9, Expressions, p. 57
|
|
|
|
(modify bulleted list as follows, adding support for implicit conversion
|
|
between signed and unsigned types)
|
|
|
|
Expressions in the shading language are built from the following:
|
|
|
|
* Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
|
|
types, and all matrix types.
|
|
|
|
...
|
|
|
|
* The operator modulus (%) operates on signed or unsigned integer scalars
|
|
or vectors. If the fundamental types of the operands do not match, the
|
|
conversions from Section 4.1.10 "Implicit Conversions" are applied to
|
|
produce matching types. ...
|
|
|
|
|
|
Modify Section 6.1, Function Definitions, p. 63
|
|
|
|
(modify description of overloading, beginning at the top of p. 64)
|
|
|
|
Function names can be overloaded. The same function name can be used for
|
|
multiple functions, as long as the parameter types differ. If a function
|
|
name is declared twice with the same parameter types, then the return
|
|
types and all qualifiers must also match, and it is the same function
|
|
being declared. For example,
|
|
|
|
vec4 f(in vec4 x, out vec4 y); // (A)
|
|
vec4 f(in vec4 x, out uvec4 y); // (B) okay, different argument type
|
|
vec4 f(in ivec4 x, out uvec4 y); // (C) okay, different argument type
|
|
|
|
int f(in vec4 x, out ivec4 y); // error, only return type differs
|
|
vec4 f(in vec4 x, in vec4 y); // error, only qualifier differs
|
|
vec4 f(const in vec4 x, out vec4 y); // error, only qualifier differs
|
|
|
|
When function calls are resolved, an exact type match for all the
|
|
arguments is sought. If an exact match is found, all other functions are
|
|
ignored, and the exact match is used. If no exact match is found, then
|
|
the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
|
|
applied to find a match. Mismatched types on input parameters (in or
|
|
inout or default) must have a conversion from the calling argument type
|
|
to the formal parameter type. Mismatched types on output parameters (out
|
|
or inout) must have a conversion from the formal parameter type to the
|
|
calling argument type.
|
|
|
|
If implicit conversions can be used to find more than one matching
|
|
function, a single best-matching function is sought. To determine a best
|
|
match, the conversions between calling argument and formal parameter
|
|
types are compared for each function argument and pair of matching
|
|
functions. After these comparisons are performed, each pair of matching
|
|
functions are compared. A function definition A is considered a better
|
|
match than function definition B if:
|
|
|
|
* for at least one function argument, the conversion for that argument
|
|
in A is better than the corresponding conversion in B; and
|
|
|
|
* there is no function argument for which the conversion in B is better
|
|
than the corresponding conversion in A.
|
|
|
|
If a single function definition is considered a better match than every
|
|
other matching function definition, it will be used. Otherwise, a
|
|
semantic error occurs and the shader will fail to compile.
|
|
|
|
To determine whether the conversion for a single argument in one match is
|
|
better than that for another match, the following rules are applied, in
|
|
order:
|
|
|
|
1. An exact match is better than a match involving any implicit
|
|
conversion.
|
|
|
|
2. A match involving an implicit conversion from float to double is
|
|
better than a match involving any other implicit conversion.
|
|
|
|
3. A match involving an implicit conversion from either int or uint to
|
|
float is better than a match involving an implicit conversion from
|
|
either int or uint to double.
|
|
|
|
If none of the rules above apply to a particular pair of conversions,
|
|
neither conversion is considered better than the other.
|
|
|
|
For the function prototypes (A), (B), and (C) above, the following
|
|
examples show how the rules apply to different sets of calling argument
|
|
types:
|
|
|
|
f(vec4, vec4); // exact match of vec4 f(in vec4 x, out vec4 y)
|
|
f(vec4, uvec4); // exact match of vec4 f(in vec4 x, out ivec4 y)
|
|
f(vec4, ivec4); // matched to vec4 f(in vec4 x, out vec4 y)
|
|
// (C) not relevant, can't convert vec4 to
|
|
// ivec4. (A) better than (B) for 2nd
|
|
// argument (rule 2), same on first argument.
|
|
f(ivec4, vec4); // NOT matched. All three match by implicit
|
|
// conversion. (C) is better than (A) and (B)
|
|
// on the first argument. (A) is better than
|
|
// (B) and (C).
|
|
|
|
|
|
Modify Section 8.3, Common Functions, p. 84
|
|
|
|
(add support for single-precision frexp and ldexp functions)
|
|
|
|
Syntax:
|
|
|
|
genType frexp(genType x, out genIType exp);
|
|
genType ldexp(genType x, in genIType exp);
|
|
|
|
The function frexp() splits each single-precision floating-point number in
|
|
<x> into a binary significand, a floating-point number in the range [0.5,
|
|
1.0), and an integral exponent of two, such that:
|
|
|
|
x = significand * 2 ^ exponent
|
|
|
|
The significand is returned by the function; the exponent is returned in
|
|
the parameter <exp>. For a floating-point value of zero, the significant
|
|
and exponent are both zero. For a floating-point value that is an
|
|
infinity or is not a number, the results of frexp() are undefined.
|
|
|
|
If the input <x> is a vector, this operation is performed in a
|
|
component-wise manner; the value returned by the function and the value
|
|
written to <exp> are vectors with the same number of components as <x>.
|
|
|
|
The function ldexp() builds a single-precision floating-point number from
|
|
each significand component in <x> and the corresponding integral exponent
|
|
of two in <exp>, returning:
|
|
|
|
significand * 2 ^ exponent
|
|
|
|
If this product is too large to be represented as a single-precision
|
|
floating-point value, the result is considered undefined.
|
|
|
|
If the input <x> is a vector, this operation is performed in a
|
|
component-wise manner; the value passed in <exp> and returned by the
|
|
function are vectors with the same number of components as <x>.
|
|
|
|
|
|
(add support for new integer built-in functions)
|
|
|
|
Syntax:
|
|
|
|
genIType bitfieldExtract(genIType value, int offset, int bits);
|
|
genUType bitfieldExtract(genUType value, int offset, int bits);
|
|
|
|
genIType bitfieldInsert(genIType base, genIType insert, int offset,
|
|
int bits);
|
|
genUType bitfieldInsert(genUType base, genUType insert, int offset,
|
|
int bits);
|
|
|
|
genIType bitfieldReverse(genIType value);
|
|
genUType bitfieldReverse(genUType value);
|
|
|
|
genIType bitCount(genIType value);
|
|
genIType bitCount(genUType value);
|
|
|
|
genIType findLSB(genIType value);
|
|
genIType findLSB(genUType value);
|
|
|
|
genIType findMSB(genIType value);
|
|
genIType findMSB(genUType value);
|
|
|
|
The function bitfieldExtract() extracts bits <offset> through
|
|
<offset>+<bits>-1 from each component in <value>, returning them in the
|
|
least significant bits of corresponding component of the result. For
|
|
unsigned data types, the most significant bits of the result will be set
|
|
to zero. For signed data types, the most significant bits will be set to
|
|
the value of bit <offset>+<base>-1. If <bits> is zero, the result will be
|
|
zero. The result will be undefined if <offset> or <bits> is negative, or
|
|
if the sum of <offset> and <bits> is greater than the number of bits used
|
|
to store the operand. Note that for vector versions of bitfieldExtract(),
|
|
a single pair of <offset> and <bits> values is shared for all components.
|
|
|
|
The function bitfieldInsert() inserts the <bits> least significant bits of
|
|
each component of <insert> into the corresponding component of <base>.
|
|
The result will have bits numbered <offset> through <offset>+<bits>-1
|
|
taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
|
|
directly from the corresponding bits of <base>. If <bits> is zero, the
|
|
result will simply be <base>. The result will be undefined if <offset> or
|
|
<bits> is negative, or if the sum of <offset> and <bits> is greater than
|
|
the number of bits used to store the operand. Note that for vector
|
|
versions of bitfieldInsert(), a single pair of <offset> and <bits> values
|
|
is shared for all components.
|
|
|
|
The function bitfieldReverse() reverses the bits of <value>. The bit
|
|
numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
|
|
<value>, where <bits> is the total number of bits used to represent
|
|
<value>.
|
|
|
|
The function bitCount() returns the number of one bits in the binary
|
|
representation of <value>.
|
|
|
|
The function findLSB() returns the bit number of the least significant one
|
|
bit in the binary representation of <value>. If <value> is zero, -1 will
|
|
be returned.
|
|
|
|
The function findMSB() returns the bit number of the most significant bit
|
|
in the binary representation of <value>. For positive integers, the
|
|
result will be the bit number of the most significant one bit. For
|
|
negative integers, the result will be the bit number of the most
|
|
significant zero bit. For a <value> of zero or negative one, -1 will be
|
|
returned.
|
|
|
|
|
|
(support for unsigned integer add/subtract with carry-out)
|
|
|
|
Syntax:
|
|
|
|
genUType uaddCarry(genUType x, genUType y, out genUType carry);
|
|
genUType usubBorrow(genUType x, genUType y, out genUType borrow);
|
|
|
|
The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
|
|
<y>, returning the sum modulo 2^32. The value <carry> is set to zero if
|
|
the sum was less than 2^32, or one otherwise.
|
|
|
|
The function usubBorrow() subtracts the 32-bit unsigned integer or vector
|
|
<y> from <x>, returning the difference if non-negative or 2^32 plus the
|
|
difference, otherwise. The value <borrow> is set to zero if x >= y, or
|
|
one otherwise.
|
|
|
|
|
|
(support for signed and unsigned multiplies, with 32-bit inputs and a
|
|
64-bit result spanning two 32-bit outputs)
|
|
|
|
Syntax:
|
|
|
|
void umulExtended(genUType x, genUType y, out genUType msb,
|
|
out genUType lsb);
|
|
void imulExtended(genIType x, genIType y, out genIType msb,
|
|
out genIType lsb);
|
|
|
|
The functions umulExtended() and imulExtended() multiply 32-bit unsigned
|
|
or signed integers or vectors <x> and <y>, producing a 64-bit result. The
|
|
32 least significant bits are returned in <lsb>; the 32 most significant
|
|
bits are returned in <msb>.
|
|
|
|
|
|
GLX Protocol
|
|
|
|
None.
|
|
|
|
Dependencies on ARB_gpu_shader_fp64
|
|
|
|
This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
|
|
of implicit conversions supported in the OpenGL Shading Language. If more
|
|
than one of these extensions is supported, an expression of one type may
|
|
be converted to another type if that conversion is allowed by any of these
|
|
specifications.
|
|
|
|
If ARB_gpu_shader_fp64 or a similar extension introducing new data types
|
|
is not supported, the function overloading rule in the GLSL specification
|
|
preferring promotion an input parameters to smaller type to a larger type
|
|
is never applicable, as all data types are of the same size. That rule
|
|
and the example referring to "double" should be removed.
|
|
|
|
|
|
Dependencies on NV_gpu_shader5
|
|
|
|
This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
|
|
of implicit conversions supported in the OpenGL Shading Language. If more
|
|
than one of these extensions is supported, an expression of one type may
|
|
be converted to another type if that conversion is allowed by any of these
|
|
specifications.
|
|
|
|
If NV_gpu_shader5 is supported, integer data types are supported with four
|
|
different precisions (8-, 16, 32-, and 64-bit) and floating-point data
|
|
types are supported with three different precisions (16-, 32-, and
|
|
64-bit). The extension adds the following rule for output parameters,
|
|
which is similar to the one present in this extension for input
|
|
parameters:
|
|
|
|
5. If the formal parameters in both matches are output parameters, a
|
|
conversion from a type with a larger number of bits per component is
|
|
better than a conversion from a type with a smaller number of bits
|
|
per component. For example, a conversion from an "int16_t" formal
|
|
parameter type to "int" is better than one from an "int8_t" formal
|
|
parameter type to "int".
|
|
|
|
Such a rule is not provided in this extension because there is no
|
|
combination of types in this extension and ARB_gpu_shader_fp64 where this
|
|
rule has any effect.
|
|
|
|
|
|
Errors
|
|
|
|
None
|
|
|
|
|
|
New State
|
|
|
|
None
|
|
|
|
New Implementation Dependent State
|
|
|
|
None
|
|
|
|
Issues
|
|
|
|
(1) What should this extension be called?
|
|
|
|
UNRESOLVED. This extension borrows from GL_ARB_gpu_shader5, so creating
|
|
some sort of a play on that name would be viable. However, nothing in
|
|
this extension should require SM5 hardware, so such a name would be a
|
|
little misleading and weird.
|
|
|
|
Since the primary purpose is to add integer related functions from
|
|
GL_ARB_gpu_shader5, call this extension GL_MESA_shader_integer_functions
|
|
for now.
|
|
|
|
(2) Why is some of the formatting in this extension weird?
|
|
|
|
RESOLVED: This extension is formatted to minimize the differences (as
|
|
reported by 'diff --side-by-side -W180') with the GL_ARB_gpu_shader5
|
|
specification.
|
|
|
|
(3) Should ldexp and frexp be included?
|
|
|
|
RESOLVED: Yes. Few GPUs have native instructions to implement these
|
|
functions. These are generally implemented using existing GLSL built-in
|
|
functions and the other functions provided by this extension.
|
|
|
|
(4) Should umulExtended and imulExtended be included?
|
|
|
|
RESOLVED: Yes. These functions should be implementable on any GPU that
|
|
can support the rest of this extension, but the implementation may be
|
|
complex. The implementation on a GPU that only supports 32bit x 32bit =
|
|
32bit multiplication would be quite expensive. However, many GPUs
|
|
(including OpenGL 4.0 GPUs that already support this function) have a
|
|
32bit x 16bit = 48bit multiplier. The implementation there is only
|
|
trivially more expensive than regular 32bit multiplication.
|
|
|
|
(5) Should the pack and unpack functions be included?
|
|
|
|
RESOLVED: No. These functions are already available via
|
|
GL_ARB_shading_language_packing.
|
|
|
|
(6) Should the "BitsTo" functions be included?
|
|
|
|
RESOLVED: No. These functions are already available via
|
|
GL_ARB_shader_bit_encoding.
|
|
|
|
Revision History
|
|
|
|
Rev. Date Author Changes
|
|
---- ----------- -------- -----------------------------------------
|
|
3 31-Mar-2017 Jon Leech Add ES support (OpenGL-Registry/issues/3)
|
|
2 7-Jul-2016 idr Fix typo in #extension line
|
|
1 20-Jun-2016 idr Initial version based on GL_ARB_gpu_shader5.
|