Import Mesa 13.0.3
parent b93dd40b0b
commit 6d4ad6fa4b
@@ -1 +1 @@
-13.0.2
+13.0.3
@@ -31,7 +31,8 @@ because compatibility contexts are not supported.

 <h2>SHA256 checksums</h2>
 <pre>
-TBD
+6014233a5db6032ab8de4881384871bbe029de684502707794ce7b3e6beec308  mesa-13.0.2.tar.gz
+a6ed622645f4ed61da418bf65adde5bcc4bb79023c36ba7d6b45b389da4416d5  mesa-13.0.2.tar.xz
 </pre>
176 lib/mesa/docs/relnotes/13.0.3.html Normal file
@@ -0,0 +1,176 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>

<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>

<iframe src="../contents.html"></iframe>
<div class="content">

<h1>Mesa 13.0.3 Release Notes / January 5, 2017</h1>

<p>
Mesa 13.0.3 is a bug fix release which fixes bugs found since the 13.0.2 release.
</p>
<p>
Mesa 13.0.3 implements the OpenGL 4.4 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.4. OpenGL
4.4 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>

<h2>SHA256 checksums</h2>
<pre>
TBD
</pre>

<h2>New features</h2>
<p>None</p>

<h2>Bug fixes</h2>

<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=77662">Bug 77662</a> - Fail to render to different faces of depth-stencil cube map</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92234">Bug 92234</a> - [BDW] GPU hang in Shogun2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98329">Bug 98329</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99038">Bug 99038</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.negative_api.create_pixmap_surface crashes</li>
</ul>

<h2>Changes</h2>

<p>Chad Versace (2):</p>
<ul>
<li>i965/mt: Disable aux surfaces after making miptree shareable</li>
<li>egl: Fix crashes in eglCreate*Surface()</li>
</ul>

<p>Dave Airlie (4):</p>
<ul>
<li>anv: set maxFragmentDualSrcAttachments to 1</li>
<li>radv: set maxFragmentDualSrcAttachments to 1</li>
<li>radv: fix another regression since shadow fixes.</li>
<li>radv: add missing license file to radv_meta_bufimage.</li>
</ul>

<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.2</li>
<li>anv: don't double-close the same fd</li>
<li>anv: don't leak memory if anv_init_wsi() fails</li>
<li>radv: don't leak the fd if radv_physical_device_init() succeeds</li>
<li>Update version to 13.0.3</li>
</ul>

<p>Eric Anholt (1):</p>
<ul>
<li>vc4: In a loop break/continue, jump if everyone has taken the path.</li>
</ul>

<p>Gwan-gyeong Mun (3):</p>
<ul>
<li>anv: Add missing error-checking to anv_block_pool_init (v2)</li>
<li>anv: Update the teardown in reverse order of the anv_CreateDevice</li>
<li>vulkan/wsi: Fix resource leak in success path of wsi_queue_init()</li>
</ul>

<p>Haixia Shi (1):</p>
<ul>
<li>compiler/glsl: fix precision problem of tanh</li>
</ul>

<p>Ilia Mirkin (1):</p>
<ul>
<li>mesa: only verify that enabled arrays have backing buffers</li>
</ul>

<p>Jason Ekstrand (8):</p>
<ul>
<li>anv/cmd_buffer: Re-emit MEDIA_CURBE_LOAD when CS push constants are dirty</li>
<li>anv/image: Rename hiz_surface to aux_surface</li>
<li>anv/cmd_buffer: Remove the 1-D case from the HiZ QPitch calculation</li>
<li>genxml/gen9: Change the default of MI_SEMAPHORE_WAIT::RegisterPoleMode</li>
<li>anv/device: Return the right error for failed maps</li>
<li>anv/device: Implicitly unmap memory objects in FreeMemory</li>
<li>anv/descriptor_set: Write the state offset in the surface state free list.</li>
<li>spirv: Use a simpler and more correct implementaiton of tanh()</li>
</ul>

<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Allocate at least some URB space even when max_vertices = 0.</li>
</ul>

<p>Marek Olšák (17):</p>
<ul>
<li>radeonsi: always set all blend registers</li>
<li>radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blending</li>
<li>radeonsi: disable RB+ blend optimizations for dual source blending</li>
<li>radeonsi: consolidate max-work-group-size computation</li>
<li>radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips</li>
<li>radeonsi: apply a TC L1 write corruption workaround for SI</li>
<li>radeonsi: apply a tessellation bug workaround for SI</li>
<li>radeonsi: add a tess+GS hang workaround for VI dGPUs</li>
<li>radeonsi: apply the double EVENT_WRITE_EOP workaround to VI as well</li>
<li>cso: don't release sampler states that are bound</li>
<li>radeonsi: always restore sampler states when unbinding sampler views</li>
<li>radeonsi: fix incorrect FMASK checking in bind_sampler_states</li>
<li>radeonsi: allow specifying simm16 of emit_waitcnt at call sites</li>
<li>radeonsi: wait for outstanding memory instructions in TCS barriers</li>
<li>tgsi: fix the src type of TGSI_OPCODE_MEMBAR</li>
<li>radeonsi: wait for outstanding LDS instructions in memory barriers if needed</li>
<li>radeonsi: disable the constant engine (CE) on Carrizo and Stoney</li>
</ul>

<p>Matt Turner (3):</p>
<ul>
<li>i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.</li>
<li>i965/fs: Add unit tests for copy propagation pass.</li>
<li>i965/fs: Reject copy propagation into SEL if not min/max.</li>
</ul>

<p>Nanley Chery (1):</p>
<ul>
<li>mesa/fbobject: Update CubeMapFace when reusing textures</li>
</ul>

<p>Nicolai Hähnle (4):</p>
<ul>
<li>radeonsi: fix isolines tess factor writes to control ring</li>
<li>radeonsi: update all GSVS ring descriptors for new buffer allocations</li>
<li>radeonsi: do not kill GS with memory writes</li>
<li>radeonsi: fix an off-by-one error in the bounds check for max_vertices</li>
</ul>

<p>Rhys Kidd (1):</p>
<ul>
<li>glsl: Add pthread libs to cache_test</li>
</ul>

<p>Timothy Arceri (2):</p>
<ul>
<li>mesa: fix active subroutine uniforms properly</li>
<li>Revert "nir: Turn imov/fmov of undef into undef."</li>
</ul>

</div>
</body>
</html>
@@ -3545,7 +3545,7 @@ static void visit_tex(struct nir_to_llvm_context *ctx, nir_tex_instr *instr)

    if (instr->op == nir_texop_query_levels)
       result = LLVMBuildExtractElement(ctx->builder, result, LLVMConstInt(ctx->i32, 3, false), "");
-   else if (instr->is_shadow && instr->op != nir_texop_txs && instr->op != nir_texop_lod)
+   else if (instr->is_shadow && instr->op != nir_texop_txs && instr->op != nir_texop_lod && instr->op != nir_texop_tg4)
       result = LLVMBuildExtractElement(ctx->builder, result, ctx->i32zero, "");
    else if (instr->op == nir_texop_txs &&
             instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE &&
@@ -91,6 +91,7 @@ radv_physical_device_init(struct radv_physical_device *device,

    fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n");
    device->name = device->rad_info.name;
+   close(fd);
    return VK_SUCCESS;

 fail:
@@ -424,7 +425,7 @@ void radv_GetPhysicalDeviceProperties(
    .maxGeometryTotalOutputComponents = 1024,
    .maxFragmentInputComponents = 128,
    .maxFragmentOutputAttachments = 8,
-   .maxFragmentDualSrcAttachments = 2,
+   .maxFragmentDualSrcAttachments = 1,
    .maxFragmentCombinedOutputResources = 8,
    .maxComputeSharedMemorySize = 32768,
    .maxComputeWorkGroupCount = { 65535, 65535, 65535 },
@@ -1,6 +1,33 @@
+/*
+ * Copyright © 2016 Red Hat.
+ * Copyright © 2016 Bas Nieuwenhuizen
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
 #include "radv_meta.h"
 #include "nir/nir_builder.h"

 /*
  * Compute shader implementation of image->buffer copy.
  */

 static nir_shader *
 build_nir_itob_compute_shader(struct radv_device *dev)
 {
@@ -1 +1 @@
-#define RADV_TIMESTAMP "1480346204"
+#define RADV_TIMESTAMP "1483631585"
@@ -3563,9 +3563,17 @@ builtin_builder::_tanh(const glsl_type *type)
    ir_variable *x = in_var(type, "x");
    MAKE_SIG(type, v130, 1, x);

+   /* Clamp x to [-10, +10] to avoid precision problems.
+    * When x > 10, e^(-x) is so small relative to e^x that it gets flushed to
+    * zero in the computation e^x + e^(-x). The same happens in the other
+    * direction when x < -10.
+    */
+   ir_variable *t = body.make_temp(type, "tmp");
+   body.emit(assign(t, min2(max2(x, imm(-10.0f)), imm(10.0f))));
+
    /* (e^x - e^(-x)) / (e^x + e^(-x)) */
-   body.emit(ret(div(sub(exp(x), exp(neg(x))),
-                     add(exp(x), exp(neg(x))))));
+   body.emit(ret(div(sub(exp(t), exp(neg(t))),
+                     add(exp(t), exp(neg(t))))));

    return sig;
 }
@@ -633,6 +633,8 @@ private:
       uniform->opaque[shader_type].index = this->next_subroutine;
       uniform->opaque[shader_type].active = true;

+      prog->_LinkedShaders[shader_type]->NumSubroutineUniforms++;
+
       /* Increment the subroutine index by 1 for non-arrays and by the
        * number of array elements for arrays.
        */
@@ -3118,7 +3118,6 @@ link_calculate_subroutine_compat(struct gl_shader_program *prog)
       if (!uni)
          continue;

-      sh->NumSubroutineUniforms++;
       count = 0;
       if (sh->NumSubroutineFunctions == 0) {
          linker_error(prog, "subroutine uniform %s defined but no valid functions found\n", uni->type->name);
@@ -0,0 +1 @@
+# dummy
@@ -79,9 +79,7 @@ opt_undef_vecN(nir_builder *b, nir_alu_instr *alu)
 {
    if (alu->op != nir_op_vec2 &&
        alu->op != nir_op_vec3 &&
-       alu->op != nir_op_vec4 &&
-       alu->op != nir_op_fmov &&
-       alu->op != nir_op_imov)
+       alu->op != nir_op_vec4)
       return false;

    assert(alu->dest.dest.is_ssa);
@@ -565,16 +565,21 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 entrypoint,
                               build_exp(nb, nir_fneg(nb, src[0]))));
       return;

-   case GLSLstd450Tanh:
-      /* (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x))) */
-      val->ssa->def =
-         nir_fdiv(nb, nir_fmul(nb, nir_imm_float(nb, 0.5f),
-                               nir_fsub(nb, build_exp(nb, src[0]),
-                                        build_exp(nb, nir_fneg(nb, src[0])))),
-                  nir_fmul(nb, nir_imm_float(nb, 0.5f),
-                           nir_fadd(nb, build_exp(nb, src[0]),
-                                    build_exp(nb, nir_fneg(nb, src[0])))));
+   case GLSLstd450Tanh: {
+      /* tanh(x) := (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x)))
+       *
+       * With a little algebra this reduces to (e^2x - 1) / (e^2x + 1)
+       *
+       * We clamp x to (-inf, +10] to avoid precision problems. When x > 10,
+       * e^2x is so much larger than 1.0 that 1.0 gets flushed to zero in the
+       * computation e^2x +/- 1 so it can be ignored.
+       */
+      nir_ssa_def *x = nir_fmin(nb, src[0], nir_imm_float(nb, 10));
+      nir_ssa_def *exp2x = build_exp(nb, nir_fmul(nb, x, nir_imm_float(nb, 2)));
+      val->ssa->def = nir_fdiv(nb, nir_fsub(nb, exp2x, nir_imm_float(nb, 1)),
+                               nir_fadd(nb, exp2x, nir_imm_float(nb, 1)));
       return;
+   }

    case GLSLstd450Asinh:
       val->ssa->def = nir_fmul(nb, nir_fsign(nb, src[0]),
@@ -849,7 +849,7 @@ _eglCreateWindowSurfaceCommon(_EGLDisplay *disp, EGLConfig config,
       RETURN_EGL_ERROR(disp, EGL_BAD_NATIVE_WINDOW, EGL_NO_SURFACE);

 #ifdef HAVE_SURFACELESS_PLATFORM
-   if (disp->Platform == _EGL_PLATFORM_SURFACELESS) {
+   if (disp && disp->Platform == _EGL_PLATFORM_SURFACELESS) {
       /* From the EGL_MESA_platform_surfaceless spec (v1):
        *
        *    eglCreatePlatformWindowSurface fails when called with a <display>
@@ -970,7 +970,7 @@ _eglCreatePixmapSurfaceCommon(_EGLDisplay *disp, EGLConfig config,
    EGLSurface ret;

 #if HAVE_SURFACELESS_PLATFORM
-   if (disp->Platform == _EGL_PLATFORM_SURFACELESS) {
+   if (disp && disp->Platform == _EGL_PLATFORM_SURFACELESS) {
      /* From the EGL_MESA_platform_surfaceless spec (v1):
       *
       *    [Like eglCreatePlatformWindowSurface,] eglCreatePlatformPixmapSurface
@@ -188,7 +188,9 @@ cso_insert_state(struct cso_cache *sc,
                  void *state)
 {
    struct cso_hash *hash = _cso_hash_for_type(sc, type);
-   sanitize_hash(sc, hash, type, sc->max_size);
+
+   if (type != CSO_SAMPLER)
+      sanitize_hash(sc, hash, type, sc->max_size);

    return cso_hash_insert(hash, hash_key, state);
 }
@@ -485,6 +485,7 @@ tgsi_opcode_infer_src_type( uint opcode )
    case TGSI_OPCODE_UMUL_HI:
    case TGSI_OPCODE_UP2H:
    case TGSI_OPCODE_U2I64:
+   case TGSI_OPCODE_MEMBAR:
       return TGSI_TYPE_UNSIGNED;
    case TGSI_OPCODE_IMUL_HI:
    case TGSI_OPCODE_I2F:
@@ -85,7 +85,8 @@ void r600_gfx_write_fence(struct r600_common_context *ctx, struct r600_resource
 {
    struct radeon_winsys_cs *cs = ctx->gfx.cs;

-   if (ctx->chip_class == CIK) {
+   if (ctx->chip_class == CIK ||
+       ctx->chip_class == VI) {
       /* Two EOP events are required to make all engines go idle
        * (and optional cache flushes executed) before the timestamp
        * is written.
@@ -114,7 +115,8 @@ unsigned r600_gfx_write_fence_dwords(struct r600_common_screen *screen)
 {
    unsigned dwords = 6;

-   if (screen->chip_class == CIK)
+   if (screen->chip_class == CIK ||
+       screen->chip_class == VI)
       dwords *= 2;

    if (!screen->info.has_virtual_memory)
@@ -343,6 +343,7 @@ static bool si_switch_compute_shader(struct si_context *sctx,
       lds_blocks += align(program->local_size, 512) >> 9;
    }

+   /* TODO: use si_multiwave_lds_size_workaround */
    assert(lds_blocks <= 0xFF);

    config->rsrc2 &= C_00B84C_LDS_SIZE;
@@ -413,13 +413,13 @@ static void si_set_sampler_view(struct si_context *sctx,
    struct si_sampler_views *views = &sctx->samplers[shader].views;
    struct si_sampler_view *rview = (struct si_sampler_view*)view;
    struct si_descriptors *descs = si_sampler_descriptors(sctx, shader);
+   uint32_t *desc = descs->list + slot * 16;

    if (views->views[slot] == view && !disallow_early_out)
       return;

    if (view) {
       struct r600_texture *rtex = (struct r600_texture *)view->texture;
-      uint32_t *desc = descs->list + slot * 16;

       assert(rtex); /* views with texture == NULL aren't supported */
       pipe_sampler_view_reference(&views->views[slot], view);
@@ -468,9 +468,14 @@ static void si_set_sampler_view(struct si_context *sctx,
                                 rview->is_stencil_sampler, true);
    } else {
       pipe_sampler_view_reference(&views->views[slot], NULL);
-      memcpy(descs->list + slot*16, null_texture_descriptor, 8*4);
+      memcpy(desc, null_texture_descriptor, 8*4);
       /* Only clear the lower dwords of FMASK. */
-      memcpy(descs->list + slot*16 + 8, null_texture_descriptor, 4*4);
+      memcpy(desc + 8, null_texture_descriptor, 4*4);
+      /* Re-set the sampler state if we are transitioning from FMASK. */
+      if (views->sampler_states[slot])
+         memcpy(desc + 12,
+                views->sampler_states[slot], 4*4);

       views->enabled_mask &= ~(1u << slot);
    }
@@ -803,10 +808,10 @@ static void si_bind_sampler_states(struct pipe_context *ctx,
       /* If FMASK is bound, don't overwrite it.
        * The sampler state will be set after FMASK is unbound.
        */
-      if (samplers->views.views[i] &&
-          samplers->views.views[i]->texture &&
-          samplers->views.views[i]->texture->target != PIPE_BUFFER &&
-          ((struct r600_texture*)samplers->views.views[i]->texture)->fmask.size)
+      if (samplers->views.views[slot] &&
+          samplers->views.views[slot]->texture &&
+          samplers->views.views[slot]->texture->target != PIPE_BUFFER &&
+          ((struct r600_texture*)samplers->views.views[slot]->texture)->fmask.size)
          continue;

       memcpy(desc->list + slot * 16 + 12, sstates[i]->val, 4*4);
@@ -187,7 +187,10 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen,

    /* SI + AMDGPU + CE = GPU hang */
    if (!(sscreen->b.debug_flags & DBG_NO_CE) && ws->cs_add_const_ib &&
-       sscreen->b.chip_class != SI) {
+       sscreen->b.chip_class != SI &&
+       /* These can't use CE due to a power gating bug in the kernel. */
+       sscreen->b.family != CHIP_CARRIZO &&
+       sscreen->b.family != CHIP_STONEY) {
       sctx->ce_ib = ws->cs_add_const_ib(sctx->b.gfx.cs);
       if (!sctx->ce_ib)
          goto fail;
@@ -2577,10 +2577,18 @@ static void si_write_tess_factors(struct lp_build_tgsi_context *bld_base,
                             lp_build_const_int32(gallivm,
                                                  tess_outer_index * 4), "");

-   for (i = 0; i < outer_comps; i++)
-      out[i] = lds_load(bld_base, TGSI_TYPE_SIGNED, i, lds_outer);
-   for (i = 0; i < inner_comps; i++)
-      out[outer_comps+i] = lds_load(bld_base, TGSI_TYPE_SIGNED, i, lds_inner);
+   if (shader->key.tcs.epilog.prim_mode == PIPE_PRIM_LINES) {
+      /* For isolines, the hardware expects tess factors in the
+       * reverse order from what GLSL / TGSI specify.
+       */
+      out[0] = lds_load(bld_base, TGSI_TYPE_SIGNED, 1, lds_outer);
+      out[1] = lds_load(bld_base, TGSI_TYPE_SIGNED, 0, lds_outer);
+   } else {
+      for (i = 0; i < outer_comps; i++)
+         out[i] = lds_load(bld_base, TGSI_TYPE_SIGNED, i, lds_outer);
+      for (i = 0; i < inner_comps; i++)
+         out[outer_comps+i] = lds_load(bld_base, TGSI_TYPE_SIGNED, i, lds_inner);
+   }

    /* Convert the outputs to vectors for stores. */
    vec0 = lp_build_gather_values(gallivm, out, MIN2(stride, 4));
@@ -3301,6 +3309,7 @@ static void build_tex_intrinsic(const struct lp_build_tgsi_action *action,
  * point in the program by emitting empty inline assembly that is marked as
  * having side effects.
  */
+#if 0 /* unused currently */
 static void emit_optimization_barrier(struct si_shader_context *ctx)
 {
    LLVMBuilderRef builder = ctx->gallivm.builder;
@@ -3308,13 +3317,19 @@ static void emit_optimization_barrier(struct si_shader_context *ctx)
    LLVMValueRef inlineasm = LLVMConstInlineAsm(ftype, "", "", true, false);
    LLVMBuildCall(builder, inlineasm, NULL, 0, "");
 }
+#endif

-static void emit_waitcnt(struct si_shader_context *ctx)
+/* Combine these with & instead of |. */
+#define NOOP_WAITCNT 0xf7f
+#define LGKM_CNT 0x07f
+#define VM_CNT 0xf70
+
+static void emit_waitcnt(struct si_shader_context *ctx, unsigned simm16)
 {
    struct gallivm_state *gallivm = &ctx->gallivm;
    LLVMBuilderRef builder = gallivm->builder;
    LLVMValueRef args[1] = {
-      lp_build_const_int32(gallivm, 0xf70)
+      lp_build_const_int32(gallivm, simm16)
    };
    lp_build_intrinsic(builder, "llvm.amdgcn.s.waitcnt",
                       ctx->voidt, args, 1, 0);
@@ -3326,8 +3341,23 @@ static void membar_emit(
       struct lp_build_emit_data *emit_data)
 {
    struct si_shader_context *ctx = si_shader_context(bld_base);
+   LLVMValueRef src0 = lp_build_emit_fetch(bld_base, emit_data->inst, 0, 0);
+   unsigned flags = LLVMConstIntGetZExtValue(src0);
+   unsigned waitcnt = NOOP_WAITCNT;

-   emit_waitcnt(ctx);
+   if (flags & TGSI_MEMBAR_THREAD_GROUP)
+      waitcnt &= VM_CNT & LGKM_CNT;
+
+   if (flags & (TGSI_MEMBAR_ATOMIC_BUFFER |
+                TGSI_MEMBAR_SHADER_BUFFER |
+                TGSI_MEMBAR_SHADER_IMAGE))
+      waitcnt &= VM_CNT;
+
+   if (flags & TGSI_MEMBAR_SHARED)
+      waitcnt &= LGKM_CNT;
+
+   if (waitcnt != NOOP_WAITCNT)
+      emit_waitcnt(ctx, waitcnt);
 }

 static LLVMValueRef
@@ -3481,7 +3511,8 @@ static void image_append_args(
       struct si_shader_context *ctx,
       struct lp_build_emit_data * emit_data,
       unsigned target,
-      bool atomic)
+      bool atomic,
+      bool force_glc)
 {
    const struct tgsi_full_instruction *inst = emit_data->inst;
    LLVMValueRef i1false = LLVMConstInt(ctx->i1, 0, 0);
@@ -3489,6 +3520,7 @@ static void image_append_args(
    LLVMValueRef r128 = i1false;
    LLVMValueRef da = tgsi_is_array_image(target) ? i1true : i1false;
    LLVMValueRef glc =
+      force_glc ||
       inst->Memory.Qualifier & (TGSI_MEMORY_COHERENT | TGSI_MEMORY_VOLATILE) ?
          i1true : i1false;
    LLVMValueRef slc = i1false;
@@ -3543,7 +3575,8 @@ static void buffer_append_args(
       LLVMValueRef rsrc,
       LLVMValueRef index,
       LLVMValueRef offset,
-      bool atomic)
+      bool atomic,
+      bool force_glc)
 {
    const struct tgsi_full_instruction *inst = emit_data->inst;
    LLVMValueRef i1false = LLVMConstInt(ctx->i1, 0, 0);
@@ -3554,6 +3587,7 @@ static void buffer_append_args(
    emit_data->args[emit_data->arg_count++] = offset; /* voffset */
    if (!atomic) {
       emit_data->args[emit_data->arg_count++] =
+         force_glc ||
         inst->Memory.Qualifier & (TGSI_MEMORY_COHERENT | TGSI_MEMORY_VOLATILE) ?
            i1true : i1false; /* glc */
    }
@@ -3583,7 +3617,7 @@ static void load_fetch_args(
       offset = LLVMBuildBitCast(builder, tmp, bld_base->uint_bld.elem_type, "");

       buffer_append_args(ctx, emit_data, rsrc, bld_base->uint_bld.zero,
-                         offset, false);
+                         offset, false, false);
    } else if (inst->Src[0].Register.File == TGSI_FILE_IMAGE) {
       LLVMValueRef coords;

@@ -3593,14 +3627,14 @@ static void load_fetch_args(
       if (target == TGSI_TEXTURE_BUFFER) {
         rsrc = extract_rsrc_top_half(ctx, rsrc);
         buffer_append_args(ctx, emit_data, rsrc, coords,
-                           bld_base->uint_bld.zero, false);
+                           bld_base->uint_bld.zero, false, false);
      } else {
         emit_data->args[0] = coords;
         emit_data->args[1] = rsrc;
         emit_data->args[2] = lp_build_const_int32(gallivm, 15); /* dmask */
         emit_data->arg_count = 3;

-        image_append_args(ctx, emit_data, target, false);
+        image_append_args(ctx, emit_data, target, false, false);
      }
    }
 }
@@ -3727,7 +3761,7 @@ static void load_emit(
    }

    if (inst->Memory.Qualifier & TGSI_MEMORY_VOLATILE)
-      emit_waitcnt(ctx);
+      emit_waitcnt(ctx, VM_CNT);

    if (inst->Src[0].Register.File == TGSI_FILE_BUFFER) {
       load_emit_buffer(ctx, emit_data);
@@ -3790,11 +3824,19 @@ static void store_fetch_args(
       offset = LLVMBuildBitCast(builder, tmp, bld_base->uint_bld.elem_type, "");

       buffer_append_args(ctx, emit_data, rsrc, bld_base->uint_bld.zero,
-                         offset, false);
+                         offset, false, false);
    } else if (inst->Dst[0].Register.File == TGSI_FILE_IMAGE) {
       unsigned target = inst->Memory.Texture;
       LLVMValueRef coords;

+      /* 8bit/16bit TC L1 write corruption bug on SI.
+       * All store opcodes not aligned to a dword are affected.
+       *
+       * The only way to get unaligned stores in radeonsi is through
+       * shader images.
+       */
+      bool force_glc = ctx->screen->b.chip_class == SI;
+
       coords = image_fetch_coords(bld_base, inst, 0);

       if (target == TGSI_TEXTURE_BUFFER) {
@@ -3802,14 +3844,14 @@ static void store_fetch_args(

         rsrc = extract_rsrc_top_half(ctx, rsrc);
         buffer_append_args(ctx, emit_data, rsrc, coords,
-                           bld_base->uint_bld.zero, false);
+                           bld_base->uint_bld.zero, false, force_glc);
      } else {
         emit_data->args[1] = coords;
         image_fetch_rsrc(bld_base, &memory, true, &emit_data->args[2]);
         emit_data->args[3] = lp_build_const_int32(gallivm, 15); /* dmask */
         emit_data->arg_count = 4;

-        image_append_args(ctx, emit_data, target, false);
+        image_append_args(ctx, emit_data, target, false, force_glc);
      }
    }
 }
@@ -3929,7 +3971,7 @@ static void store_emit(
    }

    if (inst->Memory.Qualifier & TGSI_MEMORY_VOLATILE)
-      emit_waitcnt(ctx);
+      emit_waitcnt(ctx, VM_CNT);

    if (inst->Dst[0].Register.File == TGSI_FILE_BUFFER) {
       store_emit_buffer(ctx, emit_data);
@@ -3993,7 +4035,7 @@ static void atomic_fetch_args(
       offset = LLVMBuildBitCast(builder, tmp, bld_base->uint_bld.elem_type, "");

       buffer_append_args(ctx, emit_data, rsrc, bld_base->uint_bld.zero,
-                         offset, true);
+                         offset, true, false);
    } else if (inst->Src[0].Register.File == TGSI_FILE_IMAGE) {
       unsigned target = inst->Memory.Texture;
       LLVMValueRef coords;
@@ -4005,12 +4047,12 @@ static void atomic_fetch_args(
       if (target == TGSI_TEXTURE_BUFFER) {
         rsrc = extract_rsrc_top_half(ctx, rsrc);
         buffer_append_args(ctx, emit_data, rsrc, coords,
-                           bld_base->uint_bld.zero, true);
+                           bld_base->uint_bld.zero, true, false);
      } else {
         emit_data->args[emit_data->arg_count++] = coords;
         emit_data->args[emit_data->arg_count++] = rsrc;

-        image_append_args(ctx, emit_data, target, true);
+        image_append_args(ctx, emit_data, target, true, false);
      }
    }
 }
@@ -5247,6 +5289,7 @@ static void si_llvm_emit_vertex(
    struct si_shader *shader = ctx->shader;
    struct tgsi_shader_info *info = &shader->selector->info;
    struct gallivm_state *gallivm = bld_base->base.gallivm;
+   struct lp_build_if_state if_state;
    LLVMValueRef soffset = LLVMGetParam(ctx->main_fn,
                                        SI_PARAM_GS2VS_OFFSET);
    LLVMValueRef gs_next_vertex;
@@ -5264,19 +5307,28 @@ static void si_llvm_emit_vertex(
                                    "");

    /* If this thread has already emitted the declared maximum number of
-    * vertices, kill it: excessive vertex emissions are not supposed to
-    * have any effect, and GS threads have no externally observable
-    * effects other than emitting vertices.
+    * vertices, skip the write: excessive vertex emissions are not
+    * supposed to have any effect.
+    *
+    * If the shader has no writes to memory, kill it instead. This skips
+    * further memory loads and may allow LLVM to skip to the end
+    * altogether.
     */
-   can_emit = LLVMBuildICmp(gallivm->builder, LLVMIntULE, gs_next_vertex,
+   can_emit = LLVMBuildICmp(gallivm->builder, LLVMIntULT, gs_next_vertex,
                             lp_build_const_int32(gallivm,
                                                  shader->selector->gs_max_out_vertices), "");
-   kill = lp_build_select(&bld_base->base, can_emit,
-                          lp_build_const_float(gallivm, 1.0f),
-                          lp_build_const_float(gallivm, -1.0f));

-   lp_build_intrinsic(gallivm->builder, "llvm.AMDGPU.kill",
-                      ctx->voidt, &kill, 1, 0);
+   bool use_kill = !info->writes_memory;
+   if (use_kill) {
+      kill = lp_build_select(&bld_base->base, can_emit,
+                             lp_build_const_float(gallivm, 1.0f),
+                             lp_build_const_float(gallivm, -1.0f));
+
+      lp_build_intrinsic(gallivm->builder, "llvm.AMDGPU.kill",
+                         ctx->voidt, &kill, 1, 0);
+   } else {
+      lp_build_if(&if_state, gallivm, can_emit);
+   }

    for (i = 0; i < info->num_outputs; i++) {
       LLVMValueRef *out_ptr =
@@ -5302,6 +5354,7 @@ static void si_llvm_emit_vertex(
                            1, 0, 1, 1, 0);
       }
    }
+
    gs_next_vertex = lp_build_add(uint, gs_next_vertex,
                                  lp_build_const_int32(gallivm, 1));

@@ -5312,6 +5365,9 @@ static void si_llvm_emit_vertex(
    args[1] = LLVMGetParam(ctx->main_fn, SI_PARAM_GS_WAVE_ID);
    lp_build_intrinsic(gallivm->builder, "llvm.SI.sendmsg",
                       ctx->voidt, args, 2, 0);
+
+   if (!use_kill)
+      lp_build_endif(&if_state);
 }

 /* Cut one primitive from the geometry shader */
@@ -5344,7 +5400,7 @@ static void si_llvm_emit_barrier(const struct lp_build_tgsi_action *action,
     * always fits into a single wave.
     */
    if (ctx->type == PIPE_SHADER_TESS_CTRL) {
-      emit_optimization_barrier(ctx);
+      emit_waitcnt(ctx, LGKM_CNT & VM_CNT);
       return;
    }
@@ -5481,6 +5537,23 @@ static void declare_tess_lds(struct si_shader_context *ctx)
                "tess_lds");
 }

+static unsigned si_get_max_workgroup_size(struct si_shader *shader)
+{
+   const unsigned *properties = shader->selector->info.properties;
+   unsigned max_work_group_size =
+                  properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] *
+                  properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT] *
+                  properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH];
+
+   if (!max_work_group_size) {
+      /* This is a variable group size compute shader,
+       * compile it for the maximum possible group size.
+       */
+      max_work_group_size = SI_MAX_VARIABLE_THREADS_PER_BLOCK;
+   }
+   return max_work_group_size;
+}
+
 static void create_function(struct si_shader_context *ctx)
 {
    struct lp_build_tgsi_context *bld_base = &ctx->soa.bld_base;
@@ -5706,22 +5779,9 @@ static void create_function(struct si_shader_context *ctx)
                  S_0286D0_FRONT_FACE_ENA(1) |
                  S_0286D0_POS_FIXED_PT_ENA(1));
    } else if (ctx->type == PIPE_SHADER_COMPUTE) {
-      const unsigned *properties = shader->selector->info.properties;
-      unsigned max_work_group_size =
-                     properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] *
-                     properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT] *
-                     properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH];
-
-      if (!max_work_group_size) {
-         /* This is a variable group size compute shader,
-          * compile it for the maximum possible group size.
-          */
-         max_work_group_size = SI_MAX_VARIABLE_THREADS_PER_BLOCK;
-      }
-
       si_llvm_add_attribute(ctx->main_fn,
                             "amdgpu-max-work-group-size",
-                            max_work_group_size);
+                            si_get_max_workgroup_size(shader));
    }

    shader->info.num_input_sgprs = 0;
@@ -6643,20 +6703,11 @@ int si_compile_tgsi_shader(struct si_screen *sscreen,
     * LLVM 3.9svn has this bug.
     */
    if (sel->type == PIPE_SHADER_COMPUTE) {
-      unsigned *props = sel->info.properties;
       unsigned wave_size = 64;
       unsigned max_vgprs = 256;
       unsigned max_sgprs = sscreen->b.chip_class >= VI ? 800 : 512;
       unsigned max_sgprs_per_wave = 128;
-      unsigned max_block_threads;
-
-      if (props[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH])
-         max_block_threads = props[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] *
-                             props[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT] *
-                             props[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH];
-      else
-         max_block_threads = SI_MAX_VARIABLE_THREADS_PER_BLOCK;
-
+      unsigned max_block_threads = si_get_max_workgroup_size(shader);
       unsigned min_waves_per_cu = DIV_ROUND_UP(max_block_threads, wave_size);
       unsigned min_waves_per_simd = DIV_ROUND_UP(min_waves_per_cu, 4);
@@ -7746,11 +7797,31 @@ static bool si_shader_select_ps_parts(struct si_screen *sscreen,
    return true;
 }

-static void si_fix_num_sgprs(struct si_shader *shader)
+void si_multiwave_lds_size_workaround(struct si_screen *sscreen,
+                                      unsigned *lds_size)
+{
+   /* SPI barrier management bug:
+    *   Make sure we have at least 4k of LDS in use to avoid the bug.
+    *   It applies to workgroup sizes of more than one wavefront.
+    */
+   if (sscreen->b.family == CHIP_BONAIRE ||
+       sscreen->b.family == CHIP_KABINI ||
+       sscreen->b.family == CHIP_MULLINS)
+      *lds_size = MAX2(*lds_size, 8);
+}
+
+static void si_fix_resource_usage(struct si_screen *sscreen,
+                                  struct si_shader *shader)
 {
    unsigned min_sgprs = shader->info.num_input_sgprs + 2; /* VCC */

    shader->config.num_sgprs = MAX2(shader->config.num_sgprs, min_sgprs);
+
+   if (shader->selector->type == PIPE_SHADER_COMPUTE &&
+       si_get_max_workgroup_size(shader) > 64) {
+      si_multiwave_lds_size_workaround(sscreen,
+                                       &shader->config.lds_size);
+   }
 }

 int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
@@ -7846,7 +7917,7 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
       }
    }

-   si_fix_num_sgprs(shader);
+   si_fix_resource_usage(sscreen, shader);
    si_shader_dump(sscreen, shader, debug, sel->info.processor,
                   stderr);
@@ -482,6 +482,8 @@ int si_shader_binary_upload(struct si_screen *sscreen, struct si_shader *shader)
 void si_shader_dump(struct si_screen *sscreen, struct si_shader *shader,
                     struct pipe_debug_callback *debug, unsigned processor,
                     FILE *f);
+void si_multiwave_lds_size_workaround(struct si_screen *sscreen,
+                                      unsigned *lds_size);
 void si_shader_apply_scratch_relocs(struct si_context *sctx,
                                     struct si_shader *shader,
                                     struct si_shader_config *config,
@@ -453,8 +453,14 @@ static void *si_create_blend_state_mode(struct pipe_context *ctx,
          S_028760_ALPHA_COMB_FCN(V_028760_OPT_COMB_BLEND_DISABLED);

       /* Only set dual source blending for MRT0 to avoid a hang. */
-      if (i >= 1 && blend->dual_src_blend)
+      if (i >= 1 && blend->dual_src_blend) {
+         /* Vulkan does this for dual source blending. */
+         if (i == 1)
+            blend_cntl |= S_028780_ENABLE(1);
+
+         si_pm4_set_reg(pm4, R_028780_CB_BLEND0_CONTROL + i * 4, blend_cntl);
          continue;
+      }

       /* Only addition and subtraction equations are supported with
        * dual source blending.
@@ -463,16 +469,14 @@ static void *si_create_blend_state_mode(struct pipe_context *ctx,
           (eqRGB == PIPE_BLEND_MIN || eqRGB == PIPE_BLEND_MAX ||
            eqA == PIPE_BLEND_MIN || eqA == PIPE_BLEND_MAX)) {
          assert(!"Unsupported equation for dual source blending");
+         si_pm4_set_reg(pm4, R_028780_CB_BLEND0_CONTROL + i * 4, blend_cntl);
          continue;
       }

-      if (!state->rt[j].colormask)
-         continue;
-
       /* cb_render_state will disable unused ones */
       blend->cb_target_mask |= (unsigned)state->rt[j].colormask << (4 * i);

-      if (!state->rt[j].blend_enable) {
+      if (!state->rt[j].colormask || !state->rt[j].blend_enable) {
          si_pm4_set_reg(pm4, R_028780_CB_BLEND0_CONTROL + i * 4, blend_cntl);
          continue;
       }
@@ -553,6 +557,17 @@ static void *si_create_blend_state_mode(struct pipe_context *ctx,
    }

    if (sctx->b.family == CHIP_STONEY) {
+      /* Disable RB+ blend optimizations for dual source blending.
+       * Vulkan does this.
+       */
+      if (blend->dual_src_blend) {
+         for (int i = 0; i < 8; i++) {
+            sx_mrt_blend_opt[i] =
+               S_028760_COLOR_COMB_FCN(V_028760_OPT_COMB_NONE) |
+               S_028760_ALPHA_COMB_FCN(V_028760_OPT_COMB_NONE);
+         }
+      }
+
       for (int i = 0; i < 8; i++)
          si_pm4_set_reg(pm4, R_028760_SX_MRT0_BLEND_OPT + i * 4,
                         sx_mrt_blend_opt[i]);
@@ -154,6 +154,12 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
     */
    *num_patches = MIN2(*num_patches, 40);

+   /* SI bug workaround - limit LS-HS threadgroups to only one wave. */
+   if (sctx->b.chip_class == SI) {
+      unsigned one_wave = 64 / MAX2(num_tcs_input_cp, num_tcs_output_cp);
+      *num_patches = MIN2(*num_patches, one_wave);
+   }
+
    output_patch0_offset = input_patch_size * *num_patches;
    perpatch_output_offset = output_patch0_offset + pervertex_output_patch_size;

@@ -162,11 +168,13 @@ static void si_emit_derived_tess_state(struct si_context *sctx,

    if (sctx->b.chip_class >= CIK) {
       assert(lds_size <= 65536);
-      ls_rsrc2 |= S_00B52C_LDS_SIZE(align(lds_size, 512) / 512);
+      lds_size = align(lds_size, 512) / 512;
    } else {
       assert(lds_size <= 32768);
-      ls_rsrc2 |= S_00B52C_LDS_SIZE(align(lds_size, 256) / 256);
+      lds_size = align(lds_size, 256) / 256;
    }
+   si_multiwave_lds_size_workaround(sctx->screen, &lds_size);
+   ls_rsrc2 |= S_00B52C_LDS_SIZE(lds_size);

    if (sctx->last_ls == ls->current &&
        sctx->last_tcs == tcs &&
@@ -284,10 +292,18 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx,

       /* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
       if (sctx->screen->has_distributed_tess) {
-         if (sctx->gs_shader.cso)
+         if (sctx->gs_shader.cso) {
             partial_es_wave = true;
-         else
-            partial_vs_wave = true;
+
+            /* GPU hang workaround. */
+            if (sctx->b.family == CHIP_TONGA ||
+                sctx->b.family == CHIP_FIJI ||
+                sctx->b.family == CHIP_POLARIS10 ||
+                sctx->b.family == CHIP_POLARIS11)
+               partial_vs_wave = true;
+         } else {
+            partial_vs_wave = true;
+         }
       }
    }
@@ -1777,10 +1777,15 @@ static bool si_update_gs_ring_buffers(struct si_context *sctx)
                         sctx->esgs_ring, 0, sctx->esgs_ring->width0,
                         false, false, 0, 0, 0);
    }
-   if (sctx->gsvs_ring)
+   if (sctx->gsvs_ring) {
       si_set_ring_buffer(&sctx->b.b, SI_VS_RING_GSVS,
                         sctx->gsvs_ring, 0, sctx->gsvs_ring->width0,
                         false, false, 0, 0, 0);
+
+      /* Also update SI_GS_RING_GSVSi descriptors. */
+      sctx->last_gsvs_itemsize = 0;
+   }

    return true;
 }
@@ -1865,22 +1865,29 @@ ntq_emit_if(struct vc4_compile *c, nir_if *if_stmt)
 static void
 ntq_emit_jump(struct vc4_compile *c, nir_jump_instr *jump)
 {
+        struct qblock *jump_block;
         switch (jump->type) {
         case nir_jump_break:
-                qir_SF(c, c->execute);
-                qir_MOV_cond(c, QPU_COND_ZS, c->execute,
-                             qir_uniform_ui(c, c->loop_break_block->index));
+                jump_block = c->loop_break_block;
                 break;

         case nir_jump_continue:
-                qir_SF(c, c->execute);
-                qir_MOV_cond(c, QPU_COND_ZS, c->execute,
-                             qir_uniform_ui(c, c->loop_cont_block->index));
+                jump_block = c->loop_cont_block;
                 break;

         case nir_jump_return:
                 unreachable("All returns shouold be lowered\n");
         default:
                 unreachable("Unsupported jump type\n");
         }

+        qir_SF(c, c->execute);
+        qir_MOV_cond(c, QPU_COND_ZS, c->execute,
+                     qir_uniform_ui(c, jump_block->index));
+
+        /* Jump to the destination block if everyone has taken the jump. */
+        qir_SF(c, qir_SUB(c, c->execute, qir_uniform_ui(c, jump_block->index)));
+        qir_BRANCH(c, QPU_COND_BRANCH_ALL_ZS);
+        struct qblock *new_block = qir_new_block(c);
+        qir_link_blocks(c->cur_block, jump_block);
+        qir_link_blocks(c->cur_block, new_block);
+        qir_set_emit_block(c, new_block);
 }

 static void
@@ -3194,7 +3194,7 @@
       <value name="Per Process Graphics Address" value="0"/>
       <value name="Global Graphics Address" value="1"/>
     </field>
-    <field name="Register Poll Mode" start="16" end="16" type="uint" default="1"/>
+    <field name="Register Poll Mode" start="16" end="16" type="bool"/>
     <field name="Wait Mode" start="15" end="15" type="uint">
       <value name="Polling Mode" value="1"/>
       <value name="Signal Mode" value="0"/>
@@ -8220,7 +8220,6 @@ GEN9_MI_SEMAPHORE_SIGNAL_pack(__gen_user_data *data, void * restrict dst,
 #define GEN9_MI_SEMAPHORE_WAIT_header \
    .CommandType = 0, \
    .MICommandOpcode = 28, \
-   .RegisterPollMode = 1, \
    .DWordLength = 2

 struct GEN9_MI_SEMAPHORE_WAIT {
@@ -8229,7 +8228,7 @@ struct GEN9_MI_SEMAPHORE_WAIT {
    uint32_t MemoryType;
 #define PerProcessGraphicsAddress 0
 #define GlobalGraphicsAddress 1
-   uint32_t RegisterPollMode;
+   bool RegisterPollMode;
    uint32_t WaitMode;
 #define PollingMode 1
 #define SignalMode 0
@@ -15247,8 +15247,7 @@ static const uint8_t gen9_xml[] = {
    0x72, 0x20, 0x50, 0x6f, 0x6c, 0x6c, 0x20, 0x4d, 0x6f, 0x64, 0x65, 0x22,
    0x20, 0x73, 0x74, 0x61, 0x72, 0x74, 0x3d, 0x22, 0x31, 0x36, 0x22, 0x20,
    0x65, 0x6e, 0x64, 0x3d, 0x22, 0x31, 0x36, 0x22, 0x20, 0x74, 0x79, 0x70,
-   0x65, 0x3d, 0x22, 0x75, 0x69, 0x6e, 0x74, 0x22, 0x20, 0x64, 0x65, 0x66,
-   0x61, 0x75, 0x6c, 0x74, 0x3d, 0x22, 0x31, 0x22, 0x2f, 0x3e, 0x0a, 0x20,
+   0x65, 0x3d, 0x22, 0x62, 0x6f, 0x6f, 0x6c, 0x22, 0x2f, 0x3e, 0x0a, 0x20,
    0x20, 0x20, 0x20, 0x3c, 0x66, 0x69, 0x65, 0x6c, 0x64, 0x20, 0x6e, 0x61,
    0x6d, 0x65, 0x3d, 0x22, 0x57, 0x61, 0x69, 0x74, 0x20, 0x4d, 0x6f, 0x64,
    0x65, 0x22, 0x20, 0x73, 0x74, 0x61, 0x72, 0x74, 0x3d, 0x22, 0x31, 0x35,
@@ -246,10 +246,12 @@ anv_ptr_free_list_push(void **list, void *elem)
 static uint32_t
 anv_block_pool_grow(struct anv_block_pool *pool, struct anv_block_state *state);

-void
+VkResult
 anv_block_pool_init(struct anv_block_pool *pool,
                     struct anv_device *device, uint32_t block_size)
 {
+   VkResult result;
+
    assert(util_is_power_of_two(block_size));

    pool->device = device;
@@ -260,17 +262,23 @@ anv_block_pool_init(struct anv_block_pool *pool,

    pool->fd = memfd_create("block pool", MFD_CLOEXEC);
    if (pool->fd == -1)
-      return;
+      return vk_error(VK_ERROR_INITIALIZATION_FAILED);

    /* Just make it 2GB up-front. The Linux kernel won't actually back it
     * with pages until we either map and fault on one of them or we use
     * userptr and send a chunk of it off to the GPU.
     */
-   if (ftruncate(pool->fd, BLOCK_POOL_MEMFD_SIZE) == -1)
-      return;
+   if (ftruncate(pool->fd, BLOCK_POOL_MEMFD_SIZE) == -1) {
+      result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+      goto fail_fd;
+   }

-   u_vector_init(&pool->mmap_cleanups,
-                 round_to_power_of_two(sizeof(struct anv_mmap_cleanup)), 128);
+   if (!u_vector_init(&pool->mmap_cleanups,
+                      round_to_power_of_two(sizeof(struct anv_mmap_cleanup)),
+                      128)) {
+      result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+      goto fail_fd;
+   }

    pool->state.next = 0;
    pool->state.end = 0;
@@ -279,6 +287,13 @@ anv_block_pool_init(struct anv_block_pool *pool,

    /* Immediately grow the pool so we'll have a backing bo. */
    pool->state.end = anv_block_pool_grow(pool, &pool->state);
+
+   return VK_SUCCESS;
+
+fail_fd:
+   close(pool->fd);
+
+   return result;
 }

 void
@@ -498,6 +498,7 @@ anv_descriptor_set_destroy(struct anv_device *device,
          struct surface_state_free_list_entry *entry =
             set->buffer_views[b].surface_state.map;
          entry->next = pool->surface_state_free_list;
+         entry->offset = set->buffer_views[b].surface_state.offset;
          pool->surface_state_free_list = entry;
       }
@@ -24,6 +24,7 @@
 #include <assert.h>
 #include <stdbool.h>
 #include <string.h>
+#include <sys/mman.h>
 #include <unistd.h>
 #include <fcntl.h>

@@ -162,8 +163,6 @@ anv_physical_device_init(struct anv_physical_device *device,
       device->info.max_cs_threads = max_cs_threads;
    }

-   close(fd);
-
    brw_process_intel_debug_variable();

    device->compiler = brw_compiler_create(NULL, &device->info);
@@ -175,12 +174,15 @@ anv_physical_device_init(struct anv_physical_device *device,
    device->compiler->shader_perf_log = compiler_perf_log;

    result = anv_init_wsi(device);
-   if (result != VK_SUCCESS)
-      goto fail;
+   if (result != VK_SUCCESS) {
+      ralloc_free(device->compiler);
+      goto fail;
+   }

    /* XXX: Actually detect bit6 swizzling */
    isl_device_init(&device->isl_dev, &device->info, swizzled);

+   close(fd);
    return VK_SUCCESS;

 fail:
@@ -527,7 +529,7 @@ void anv_GetPhysicalDeviceProperties(
    .maxGeometryTotalOutputComponents = 1024,
    .maxFragmentInputComponents = 128,
    .maxFragmentOutputAttachments = 8,
-   .maxFragmentDualSrcAttachments = 2,
+   .maxFragmentDualSrcAttachments = 1,
    .maxFragmentCombinedOutputResources = 8,
    .maxComputeSharedMemorySize = 32768,
    .maxComputeWorkGroupCount = { 65535, 65535, 65535 },
@@ -967,10 +969,10 @@ void anv_DestroyDevice(
 {
    ANV_FROM_HANDLE(anv_device, device, _device);

-   anv_queue_finish(&device->queue);
-
    anv_device_finish_blorp(device);

+   anv_queue_finish(&device->queue);
+
 #ifdef HAVE_VALGRIND
    /* We only need to free these to prevent valgrind errors. The backing
     * BO will go away in a couple of lines so we don't actually leak.
@@ -978,22 +980,27 @@ void anv_DestroyDevice(
    anv_state_pool_free(&device->dynamic_state_pool, device->border_colors);
 #endif

+   anv_scratch_pool_finish(device, &device->scratch_pool);
+
    anv_gem_munmap(device->workaround_bo.map, device->workaround_bo.size);
    anv_gem_close(device, device->workaround_bo.gem_handle);

-   anv_bo_pool_finish(&device->batch_bo_pool);
-   anv_state_pool_finish(&device->dynamic_state_pool);
-   anv_block_pool_finish(&device->dynamic_state_block_pool);
-   anv_state_pool_finish(&device->instruction_state_pool);
-   anv_block_pool_finish(&device->instruction_block_pool);
    anv_state_pool_finish(&device->surface_state_pool);
    anv_block_pool_finish(&device->surface_state_block_pool);
-   anv_scratch_pool_finish(device, &device->scratch_pool);
+   anv_state_pool_finish(&device->instruction_state_pool);
+   anv_block_pool_finish(&device->instruction_block_pool);
+   anv_state_pool_finish(&device->dynamic_state_pool);
+   anv_block_pool_finish(&device->dynamic_state_block_pool);
+
+   anv_bo_pool_finish(&device->batch_bo_pool);
+
+   pthread_cond_destroy(&device->queue_submit);
+   pthread_mutex_destroy(&device->mutex);

    anv_gem_destroy_context(device, device->context_id);

    close(device->fd);

-   pthread_mutex_destroy(&device->mutex);
-
    vk_free(&device->alloc, device);
 }
@@ -1236,6 +1243,9 @@ VkResult anv_AllocateMemory(

    mem->type_index = pAllocateInfo->memoryTypeIndex;

+   mem->map = NULL;
+   mem->map_size = 0;
+
    *pMem = anv_device_memory_to_handle(mem);

    return VK_SUCCESS;
@@ -1257,6 +1267,9 @@ void anv_FreeMemory(
    if (mem == NULL)
       return;

+   if (mem->map)
+      anv_UnmapMemory(_device, _mem);
+
    if (mem->bo.map)
       anv_gem_munmap(mem->bo.map, mem->bo.size);

@@ -1303,8 +1316,12 @@ VkResult anv_MapMemory(
    /* Let's map whole pages */
    map_size = align_u64(map_size, 4096);

-   mem->map = anv_gem_mmap(device, mem->bo.gem_handle,
-                           map_offset, map_size, gem_flags);
+   void *map = anv_gem_mmap(device, mem->bo.gem_handle,
+                            map_offset, map_size, gem_flags);
+   if (map == MAP_FAILED)
+      return vk_error(VK_ERROR_MEMORY_MAP_FAILED);
+
+   mem->map = map;
    mem->map_size = map_size;

    *ppData = mem->map + (offset - map_offset);
@@ -1322,6 +1339,9 @@ void anv_UnmapMemory(
       return;

    anv_gem_munmap(mem->map, mem->map_size);
+
+   mem->map = NULL;
+   mem->map_size = 0;
 }

 static void
@@ -88,10 +88,8 @@ anv_gem_mmap(struct anv_device *device, uint32_t gem_handle,
    };

    int ret = anv_ioctl(device->fd, DRM_IOCTL_I915_GEM_MMAP, &gem_mmap);
-   if (ret != 0) {
-      /* FIXME: Is NULL the right error return? Cf MAP_INVALID */
-      return NULL;
-   }
+   if (ret != 0)
+      return MAP_FAILED;

    VG(VALGRIND_MALLOCLIKE_BLOCK(gem_mmap.addr_ptr, gem_mmap.size, 0, 1));
    return (void *)(uintptr_t) gem_mmap.addr_ptr;
@@ -194,8 +194,8 @@ make_surface(const struct anv_device *dev,
          anv_finishme("Test gen8 multisampled HiZ");
       } else {
          isl_surf_get_hiz_surf(&dev->isl_dev, &image->depth_surface.isl,
-                               &image->hiz_surface.isl);
-         add_surface(image, &image->hiz_surface);
+                               &image->aux_surface.isl);
+         add_surface(image, &image->aux_surface);
       }
    }
@@ -306,16 +306,16 @@ VkResult anv_BindImageMemory(
       /* The offset and size must be a multiple of 4K or else the
        * anv_gem_mmap call below will return NULL.
        */
-      assert((image->offset + image->hiz_surface.offset) % 4096 == 0);
-      assert(image->hiz_surface.isl.size % 4096 == 0);
+      assert((image->offset + image->aux_surface.offset) % 4096 == 0);
+      assert(image->aux_surface.isl.size % 4096 == 0);

      /* HiZ surfaces need to have their memory cleared to 0 before they
       * can be used. If we let it have garbage data, it can cause GPU
       * hangs on some hardware.
       */
      void *map = anv_gem_mmap(device, image->bo->gem_handle,
-                              image->offset + image->hiz_surface.offset,
-                              image->hiz_surface.isl.size,
+                              image->offset + image->aux_surface.offset,
+                              image->aux_surface.isl.size,
                               device->info.has_llc ? 0 : I915_MMAP_WC);

      /* If anv_gem_mmap returns NULL, it's likely that the kernel was
@@ -324,9 +324,9 @@ VkResult anv_BindImageMemory(
      if (map == NULL)
         return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);

-     memset(map, 0, image->hiz_surface.isl.size);
+     memset(map, 0, image->aux_surface.isl.size);

-     anv_gem_munmap(map, image->hiz_surface.isl.size);
+     anv_gem_munmap(map, image->aux_surface.isl.size);
    }

    return VK_SUCCESS;
@@ -416,8 +416,8 @@ anv_state_clflush(struct anv_state state)
    anv_clflush_range(state.map, state.alloc_size);
 }

-void anv_block_pool_init(struct anv_block_pool *pool,
-                         struct anv_device *device, uint32_t block_size);
+VkResult anv_block_pool_init(struct anv_block_pool *pool,
+                             struct anv_device *device, uint32_t block_size);
 void anv_block_pool_finish(struct anv_block_pool *pool);
 int32_t anv_block_pool_alloc(struct anv_block_pool *pool);
 int32_t anv_block_pool_alloc_back(struct anv_block_pool *pool);
@@ -1526,10 +1526,11 @@ struct anv_image {

       struct {
          struct anv_surface depth_surface;
-         struct anv_surface hiz_surface;
          struct anv_surface stencil_surface;
       };
    };
+
+   struct anv_surface aux_surface;
 };

 static inline uint32_t
@@ -1593,11 +1594,11 @@ anv_image_get_surface_for_aspect_mask(const struct anv_image *image,
 static inline bool
 anv_image_has_hiz(const struct anv_image *image)
 {
-   /* We must check the aspect because anv_image::hiz_surface belongs to
-    * a union.
+   /* We must check the aspect because anv_image::aux_surface may be used for
+    * any type of auxiliary surface, not just HiZ.
     */
    return (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT) &&
-          image->hiz_surface.isl.size > 0;
+          image->aux_surface.isl.size > 0;
 }

 struct anv_buffer_view {
@@ -1 +1 @@
-#define ANV_TIMESTAMP "1480346204"
+#define ANV_TIMESTAMP "1483631584"
@@ -1356,22 +1356,13 @@ flush_compute_descriptor_set(struct anv_cmd_buffer *cmd_buffer)
       result = emit_binding_table(cmd_buffer, MESA_SHADER_COMPUTE, &surfaces);
       assert(result == VK_SUCCESS);
    }

    result = emit_samplers(cmd_buffer, MESA_SHADER_COMPUTE, &samplers);
    assert(result == VK_SUCCESS);

-   struct anv_state push_state = anv_cmd_buffer_cs_push_constants(cmd_buffer);
-
    const struct brw_cs_prog_data *cs_prog_data = get_cs_prog_data(pipeline);
    const struct brw_stage_prog_data *prog_data = &cs_prog_data->base;

-   if (push_state.alloc_size) {
-      anv_batch_emit(&cmd_buffer->batch, GENX(MEDIA_CURBE_LOAD), curbe) {
-         curbe.CURBETotalDataLength = push_state.alloc_size;
-         curbe.CURBEDataStartAddress = push_state.offset;
-      }
-   }
-
    const uint32_t slm_size = encode_slm_size(GEN_GEN, prog_data->total_shared);

    struct anv_state state =
@@ -1441,6 +1432,18 @@ genX(cmd_buffer_flush_compute_state)(struct anv_cmd_buffer *cmd_buffer)
       cmd_buffer->state.descriptors_dirty &= ~VK_SHADER_STAGE_COMPUTE_BIT;
    }

+   if (cmd_buffer->state.push_constants_dirty & VK_SHADER_STAGE_COMPUTE_BIT) {
+      struct anv_state push_state =
+         anv_cmd_buffer_cs_push_constants(cmd_buffer);
+
+      if (push_state.alloc_size) {
+         anv_batch_emit(&cmd_buffer->batch, GENX(MEDIA_CURBE_LOAD), curbe) {
+            curbe.CURBETotalDataLength = push_state.alloc_size;
+            curbe.CURBEDataStartAddress = push_state.offset;
+         }
+      }
+   }
+
    cmd_buffer->state.compute_dirty = 0;

    genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
@@ -1796,10 +1799,10 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer *cmd_buffer)
    if (has_hiz) {
       anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_HIER_DEPTH_BUFFER), hdb) {
          hdb.HierarchicalDepthBufferObjectControlState = GENX(MOCS);
-         hdb.SurfacePitch = image->hiz_surface.isl.row_pitch - 1;
+         hdb.SurfacePitch = image->aux_surface.isl.row_pitch - 1;
          hdb.SurfaceBaseAddress = (struct anv_address) {
             .bo = image->bo,
-            .offset = image->offset + image->hiz_surface.offset,
+            .offset = image->offset + image->aux_surface.offset,
          };
 #if GEN_GEN >= 8
          /* From the SKL PRM Vol2a:
@@ -1809,11 +1812,14 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer *cmd_buffer)
           *    - SURFTYPE_1D: distance in pixels between array slices
           *    - SURFTYPE_2D/CUBE: distance in rows between array slices
           *    - SURFTYPE_3D: distance in rows between R - slices
+          *
+          * Unfortunately, the docs aren't 100% accurate here. They fail to
+          * mention that the 1-D rule only applies to linear 1-D images.
+          * Since depth and HiZ buffers are always tiled, they are treated as
+          * 2-D images. Prior to Sky Lake, this field is always in rows.
           */
          hdb.SurfaceQPitch =
-            image->hiz_surface.isl.dim == ISL_SURF_DIM_1D ?
-               isl_surf_get_array_pitch_el(&image->hiz_surface.isl) >> 2 :
-               isl_surf_get_array_pitch_el_rows(&image->hiz_surface.isl) >> 2;
+            isl_surf_get_array_pitch_el_rows(&image->aux_surface.isl) >> 2;
 #endif
       }
    } else {
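
Note: with the 1-D special case gone (tiled depth/HiZ buffers are
treated as 2-D, per the new comment), the QPitch expression always
takes the row-based array pitch. The ">> 2" suggests the packed
hardware field stores the pitch in units of four rows; a small worked
example under that assumption (consult the PRM for the real encoding):

    #include <stdio.h>

    int main(void)
    {
       /* Say isl_surf_get_array_pitch_el_rows() reports 128 rows
        * between HiZ array slices. */
       unsigned array_pitch_rows = 128;
       unsigned qpitch_field = array_pitch_rows >> 2;
       printf("SurfaceQPitch field = %u (encodes %u rows)\n",
              qpitch_field, qpitch_field << 2);
       return 0;
    }
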
@@ -5692,7 +5692,7 @@ fs_visitor::optimize()
 
       OPT(opt_algebraic);
       OPT(opt_cse);
-      OPT(opt_copy_propagate);
+      OPT(opt_copy_propagation);
       OPT(opt_predicated_break, this);
       OPT(opt_cmod_propagation);
       OPT(dead_code_eliminate);
@@ -5716,7 +5716,7 @@ fs_visitor::optimize()
    }
 
    if (OPT(lower_d2x)) {
-      OPT(opt_copy_propagate);
+      OPT(opt_copy_propagation);
       OPT(dead_code_eliminate);
    }
 
@@ -5728,12 +5728,12 @@ fs_visitor::optimize()
    OPT(lower_logical_sends);
 
    if (progress) {
-      OPT(opt_copy_propagate);
+      OPT(opt_copy_propagation);
       /* Only run after logical send lowering because it's easier to implement
        * in terms of physical sends.
        */
       if (OPT(opt_zero_samples))
-         OPT(opt_copy_propagate);
+         OPT(opt_copy_propagation);
       /* Run after logical send lowering to give it a chance to CSE the
        * LOAD_PAYLOAD instructions created to construct the payloads of
        * e.g. texturing messages in cases where it wasn't possible to CSE the
@@ -5762,7 +5762,7 @@ fs_visitor::optimize()
    if (devinfo->gen <= 5 && OPT(lower_minmax)) {
       OPT(opt_cmod_propagation);
       OPT(opt_cse);
-      OPT(opt_copy_propagate);
+      OPT(opt_copy_propagation);
       OPT(dead_code_eliminate);
    }
 

@@ -133,11 +133,11 @@ public:
    bool opt_redundant_discard_jumps();
    bool opt_cse();
    bool opt_cse_local(bblock_t *block);
-   bool opt_copy_propagate();
+   bool opt_copy_propagation();
    bool try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry);
    bool try_constant_propagate(fs_inst *inst, acp_entry *entry);
-   bool opt_copy_propagate_local(void *mem_ctx, bblock_t *block,
-                                 exec_list *acp);
+   bool opt_copy_propagation_local(void *mem_ctx, bblock_t *block,
+                                   exec_list *acp);
    bool opt_drop_redundant_mov_to_flags();
    bool opt_register_renaming();
    bool register_coalesce();

@@ -129,7 +129,7 @@ fs_copy_prop_dataflow::fs_copy_prop_dataflow(void *mem_ctx, cfg_t *cfg,
       foreach_in_list(acp_entry, entry, &out_acp[block->num][i]) {
          acp[next_acp] = entry;
 
-         /* opt_copy_propagate_local populates out_acp with copies created
+         /* opt_copy_propagation_local populates out_acp with copies created
           * in a block which are still live at the end of the block. This
           * is exactly what we want in the COPY set.
           */
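
Note: as the renamed comment above describes, the pass runs a local
scan per block to collect the copies still live at the block's end
(out_acp), feeds those into a dataflow intersection, and then rescans
each block seeded with the incoming set. The local half can be
sketched in a few lines (toy registers and opcodes, not the real
fs_inst machinery):

    #include <stdio.h>

    enum op { MOV, ADD };
    struct inst { enum op op; int dst, src0, src1; };

    #define NREG 8

    /* src_of[r] records which register r is currently a plain copy of,
     * or -1 when unknown: a one-pass ACP table for a single block. */
    static void local_copy_prop(struct inst *insts, int n)
    {
       int src_of[NREG];
       for (int r = 0; r < NREG; r++)
          src_of[r] = -1;

       for (int i = 0; i < n; i++) {
          struct inst *in = &insts[i];
          /* Rewrite sources through known copies. */
          if (src_of[in->src0] >= 0)
             in->src0 = src_of[in->src0];
          if (in->op == ADD && src_of[in->src1] >= 0)
             in->src1 = src_of[in->src1];
          /* The destination is overwritten: kill entries reading it. */
          for (int r = 0; r < NREG; r++)
             if (src_of[r] == in->dst)
                src_of[r] = -1;
          src_of[in->dst] = (in->op == MOV) ? in->src0 : -1;
       }
    }

    int main(void)
    {
       /* Mirrors the new unit test below: mov v0, v2; add v1, v0, v3. */
       struct inst block[] = { { MOV, 0, 2, -1 }, { ADD, 1, 0, 3 } };
       local_copy_prop(block, 2);
       printf("add now reads v%d and v%d\n", block[1].src0, block[1].src1);
       return 0;
    }
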
@@ -431,7 +431,9 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
    if (entry->saturate) {
       switch(inst->opcode) {
       case BRW_OPCODE_SEL:
-         if (inst->src[1].file != IMM ||
+         if ((inst->conditional_mod != BRW_CONDITIONAL_GE &&
+              inst->conditional_mod != BRW_CONDITIONAL_L) ||
+             inst->src[1].file != IMM ||
             inst->src[1].f < 0.0 ||
             inst->src[1].f > 1.0) {
            return false;
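
Note: the tightened condition above only lets a saturated mov propagate
into a SEL that implements a min or max (conditional mod GE or L)
against an immediate already inside [0.0, 1.0]; only then does folding
the saturate into the SEL preserve semantics. A quick numeric check of
the max case:

    #include <stdio.h>

    static float satf(float x) { return x < 0 ? 0 : (x > 1 ? 1 : x); }
    static float maxf(float a, float b) { return a > b ? a : b; }

    int main(void)
    {
       /* sel.ge computes max(). With imm in [0,1],
        * max(sat(x), imm) == sat(max(x, imm)) for all x, so
        * "mov.sat v0, v1; sel.ge v2, v0, imm" can become
        * "sel.sat.ge v2, v1, imm". With imm outside [0,1] the two
        * sides diverge, which is what the new check rejects. */
       const float xs[] = { -2.0f, 0.25f, 0.75f, 3.0f };
       for (unsigned i = 0; i < 4; i++) {
          float x = xs[i];
          printf("x=%5.2f  imm=0.5: %g vs %g   imm=1.5: %g vs %g\n",
                 x, maxf(satf(x), 0.5f), satf(maxf(x, 0.5f)),
                    maxf(satf(x), 1.5f), satf(maxf(x, 1.5f)));
       }
       return 0;
    }
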
@@ -735,8 +737,8 @@ can_propagate_from(fs_inst *inst)
  * list.
  */
 bool
-fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
-                                     exec_list *acp)
+fs_visitor::opt_copy_propagation_local(void *copy_prop_ctx, bblock_t *block,
+                                       exec_list *acp)
 {
    bool progress = false;
 
@@ -819,7 +821,7 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
 }
 
 bool
-fs_visitor::opt_copy_propagate()
+fs_visitor::opt_copy_propagation()
 {
    bool progress = false;
    void *copy_prop_ctx = ralloc_context(NULL);
@@ -832,8 +834,8 @@ fs_visitor::opt_copy_propagate()
     * the set of copies available at the end of the block.
     */
    foreach_block (block, cfg) {
-      progress = opt_copy_propagate_local(copy_prop_ctx, block,
-                                          out_acp[block->num]) || progress;
+      progress = opt_copy_propagation_local(copy_prop_ctx, block,
+                                            out_acp[block->num]) || progress;
    }
 
    /* Do dataflow analysis for those available copies. */
@@ -852,7 +854,8 @@ fs_visitor::opt_copy_propagate()
       }
    }
 
-      progress = opt_copy_propagate_local(copy_prop_ctx, block, in_acp) || progress;
+      progress = opt_copy_propagation_local(copy_prop_ctx, block, in_acp) ||
+                 progress;
    }
 
    for (int i = 0; i < cfg->num_blocks; i++)

@@ -780,7 +780,13 @@ brw_compile_gs(const struct brw_compiler *compiler, void *log_data,
    if (compiler->devinfo->gen >= 8)
       output_size_bytes += 32;
 
-   assert(output_size_bytes >= 1);
+   /* Shaders can technically set max_vertices = 0, at which point we
+    * may have a URB size of 0 bytes.  Nothing good can come from that,
+    * so enforce a minimum size.
+    */
+   if (output_size_bytes == 0)
+      output_size_bytes = 1;
+
    unsigned max_output_size_bytes = GEN7_MAX_GS_URB_ENTRY_SIZE_BYTES;
    if (compiler->devinfo->gen == 6)
       max_output_size_bytes = GEN6_MAX_GS_URB_ENTRY_SIZE_BYTES;

@@ -2159,6 +2159,8 @@ intel_miptree_make_shareable(struct brw_context *brw,
       intel_miptree_release(&mt->mcs_mt);
       mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_NO_MCS;
    }
+
+   mt->disable_aux_buffers = true;
 }
 
 

213
lib/mesa/src/mesa/drivers/dri/i965/test_fs_copy_propagation.cpp
Normal file
@@ -0,0 +1,213 @@
+/*
+ * Copyright © 2016 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <gtest/gtest.h>
+#include "brw_fs.h"
+#include "brw_cfg.h"
+#include "program/program.h"
+
+using namespace brw;
+
+class copy_propagation_test : public ::testing::Test {
+   virtual void SetUp();
+
+public:
+   struct brw_compiler *compiler;
+   struct gen_device_info *devinfo;
+   struct gl_context *ctx;
+   struct brw_wm_prog_data *prog_data;
+   struct gl_shader_program *shader_prog;
+   fs_visitor *v;
+};
+
+class copy_propagation_fs_visitor : public fs_visitor
+{
+public:
+   copy_propagation_fs_visitor(struct brw_compiler *compiler,
+                               struct brw_wm_prog_data *prog_data,
+                               nir_shader *shader)
+      : fs_visitor(compiler, NULL, NULL, NULL,
+                   &prog_data->base, (struct gl_program *) NULL,
+                   shader, 8, -1) {}
+};
+
+
+void copy_propagation_test::SetUp()
+{
+   ctx = (struct gl_context *)calloc(1, sizeof(*ctx));
+   compiler = (struct brw_compiler *)calloc(1, sizeof(*compiler));
+   devinfo = (struct gen_device_info *)calloc(1, sizeof(*devinfo));
+   compiler->devinfo = devinfo;
+
+   prog_data = ralloc(NULL, struct brw_wm_prog_data);
+   nir_shader *shader =
+      nir_shader_create(NULL, MESA_SHADER_FRAGMENT, NULL);
+
+   v = new copy_propagation_fs_visitor(compiler, prog_data, shader);
+
+   devinfo->gen = 4;
+}
+
+static fs_inst *
+instruction(bblock_t *block, int num)
+{
+   fs_inst *inst = (fs_inst *)block->start();
+   for (int i = 0; i < num; i++) {
+      inst = (fs_inst *)inst->next;
+   }
+   return inst;
+}
+
+static bool
+copy_propagation(fs_visitor *v)
+{
+   const bool print = getenv("TEST_DEBUG");
+
+   if (print) {
+      fprintf(stderr, "= Before =\n");
+      v->cfg->dump(v);
+   }
+
+   bool ret = v->opt_copy_propagation();
+
+   if (print) {
+      fprintf(stderr, "\n= After =\n");
+      v->cfg->dump(v);
+   }
+
+   return ret;
+}
+
+TEST_F(copy_propagation_test, basic)
+{
+   const fs_builder &bld = v->bld;
+   fs_reg vgrf0 = v->vgrf(glsl_type::float_type);
+   fs_reg vgrf1 = v->vgrf(glsl_type::float_type);
+   fs_reg vgrf2 = v->vgrf(glsl_type::float_type);
+   fs_reg vgrf3 = v->vgrf(glsl_type::float_type);
+   bld.MOV(vgrf0, vgrf2);
+   bld.ADD(vgrf1, vgrf0, vgrf3);
+
+   /* = Before =
+    *
+    * 0: mov(8)        vgrf0  vgrf2
+    * 1: add(8)        vgrf1  vgrf0  vgrf3
+    *
+    * = After =
+    * 0: mov(8)        vgrf0  vgrf2
+    * 1: add(8)        vgrf1  vgrf2  vgrf3
+    */
+
+   v->calculate_cfg();
+   bblock_t *block0 = v->cfg->blocks[0];
+
+   EXPECT_EQ(0, block0->start_ip);
+   EXPECT_EQ(1, block0->end_ip);
+
+   EXPECT_TRUE(copy_propagation(v));
+   EXPECT_EQ(0, block0->start_ip);
+   EXPECT_EQ(1, block0->end_ip);
+
+   fs_inst *mov = instruction(block0, 0);
+   EXPECT_EQ(BRW_OPCODE_MOV, mov->opcode);
+   EXPECT_TRUE(mov->dst.equals(vgrf0));
+   EXPECT_TRUE(mov->src[0].equals(vgrf2));
+
+   fs_inst *add = instruction(block0, 1);
+   EXPECT_EQ(BRW_OPCODE_ADD, add->opcode);
+   EXPECT_TRUE(add->dst.equals(vgrf1));
+   EXPECT_TRUE(add->src[0].equals(vgrf2));
+   EXPECT_TRUE(add->src[1].equals(vgrf3));
+}
+
+TEST_F(copy_propagation_test, maxmax_sat_imm)
+{
+   const fs_builder &bld = v->bld;
+   fs_reg vgrf0 = v->vgrf(glsl_type::float_type);
+   fs_reg vgrf1 = v->vgrf(glsl_type::float_type);
+   fs_reg vgrf2 = v->vgrf(glsl_type::float_type);
+
+   static const struct {
+      enum brw_conditional_mod conditional_mod;
+      float immediate;
+      bool expected_result;
+   } test[] = {
+      /*   conditional mod,      imm, expected_result */
+      { BRW_CONDITIONAL_GE  ,  0.1f,   true },
+      { BRW_CONDITIONAL_L   ,  0.1f,   true },
+      { BRW_CONDITIONAL_GE  ,  0.5f,   true },
+      { BRW_CONDITIONAL_L   ,  0.5f,   true },
+      { BRW_CONDITIONAL_GE  ,  0.9f,   true },
+      { BRW_CONDITIONAL_L   ,  0.9f,   true },
+      { BRW_CONDITIONAL_GE  , -1.5f,   false },
+      { BRW_CONDITIONAL_L   , -1.5f,   false },
+      { BRW_CONDITIONAL_GE  ,  1.5f,   false },
+      { BRW_CONDITIONAL_L   ,  1.5f,   false },
+
+      { BRW_CONDITIONAL_NONE,  0.5f,   false },
+      { BRW_CONDITIONAL_Z   ,  0.5f,   false },
+      { BRW_CONDITIONAL_NZ  ,  0.5f,   false },
+      { BRW_CONDITIONAL_G   ,  0.5f,   false },
+      { BRW_CONDITIONAL_LE  ,  0.5f,   false },
+      { BRW_CONDITIONAL_R   ,  0.5f,   false },
+      { BRW_CONDITIONAL_O   ,  0.5f,   false },
+      { BRW_CONDITIONAL_U   ,  0.5f,   false },
+   };
+
+   for (unsigned i = 0; i < sizeof(test) / sizeof(test[0]); i++) {
+      fs_inst *mov = set_saturate(true, bld.MOV(vgrf0, vgrf1));
+      fs_inst *sel = set_condmod(test[i].conditional_mod,
+                                 bld.SEL(vgrf2, vgrf0,
+                                         brw_imm_f(test[i].immediate)));
+
+      v->calculate_cfg();
+
+      bblock_t *block0 = v->cfg->blocks[0];
+
+      EXPECT_EQ(0, block0->start_ip);
+      EXPECT_EQ(1, block0->end_ip);
+
+      EXPECT_EQ(test[i].expected_result, copy_propagation(v));
+      EXPECT_EQ(0, block0->start_ip);
+      EXPECT_EQ(1, block0->end_ip);
+
+      EXPECT_EQ(BRW_OPCODE_MOV, mov->opcode);
+      EXPECT_TRUE(mov->saturate);
+      EXPECT_TRUE(mov->dst.equals(vgrf0));
+      EXPECT_TRUE(mov->src[0].equals(vgrf1));
+
+      EXPECT_EQ(BRW_OPCODE_SEL, sel->opcode);
+      EXPECT_EQ(test[i].conditional_mod, sel->conditional_mod);
+      EXPECT_EQ(test[i].expected_result, sel->saturate);
+      EXPECT_TRUE(sel->dst.equals(vgrf2));
+      if (test[i].expected_result) {
+         EXPECT_TRUE(sel->src[0].equals(vgrf1));
+      } else {
+         EXPECT_TRUE(sel->src[0].equals(vgrf0));
+      }
+      EXPECT_TRUE(sel->src[1].equals(brw_imm_f(test[i].immediate)));
+
+      delete v->cfg;
+      v->cfg = NULL;
+   }
+}

@@ -925,7 +925,7 @@ valid_draw_indirect(struct gl_context *ctx,
     * buffer bound.
     */
    if (_mesa_is_gles31(ctx) &&
-       ctx->Array.VAO->_Enabled != ctx->Array.VAO->VertexAttribBufferMask) {
+       ctx->Array.VAO->_Enabled & ~ctx->Array.VAO->VertexAttribBufferMask) {
       _mesa_error(ctx, GL_INVALID_OPERATION, "%s(No VBO bound)", name);
       return GL_FALSE;
    }
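
Note: the fix above changes an equality test into a subset test.
Requiring the enabled-array mask to exactly equal the buffer-backed
mask wrongly rejected draws where a buffer was bound to a disabled
attribute array; only enabled arrays that lack a buffer should fail.
In bitmask form (illustrative values):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
       uint32_t enabled = 0x5; /* arrays 0 and 2 enabled */
       uint32_t buffers = 0x7; /* arrays 0, 1, 2 buffer-backed: legal */

       /* Old: any mismatch at all raised GL_INVALID_OPERATION. */
       printf("old check rejects: %s\n", enabled != buffers ? "yes" : "no");

       /* New: only an enabled array with no buffer is an error. */
       printf("new check rejects: %s\n",
              (enabled & ~buffers) ? "yes" : "no");
       return 0;
    }
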
@@ -2850,6 +2850,7 @@ reuse_framebuffer_texture_attachment(struct gl_framebuffer *fb,
    dst_att->Type = src_att->Type;
    dst_att->Complete = src_att->Complete;
    dst_att->TextureLevel = src_att->TextureLevel;
+   dst_att->CubeMapFace = src_att->CubeMapFace;
    dst_att->Zoffset = src_att->Zoffset;
    dst_att->Layered = src_att->Layered;
 }

@@ -67,9 +67,7 @@ supported_interface_enum(struct gl_context *ctx, GLenum iface)
 }
 
 static struct gl_shader_program *
-lookup_linked_program(GLuint program,
-                      const char *caller,
-                      bool raise_link_error)
+lookup_linked_program(GLuint program, const char *caller)
 {
    GET_CURRENT_CONTEXT(ctx);
    struct gl_shader_program *prog =
@@ -79,66 +77,13 @@ lookup_linked_program(GLuint program,
       return NULL;
 
    if (prog->LinkStatus == GL_FALSE) {
-      if (raise_link_error)
-         _mesa_error(ctx, GL_INVALID_OPERATION, "%s(program not linked)",
-                     caller);
+      _mesa_error(ctx, GL_INVALID_OPERATION, "%s(program not linked)",
+                  caller);
       return NULL;
    }
    return prog;
 }
 
-static GLenum
-stage_from_program_interface(GLenum programInterface)
-{
-   switch(programInterface) {
-   case GL_VERTEX_SUBROUTINE_UNIFORM:
-      return MESA_SHADER_VERTEX;
-   case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
-      return MESA_SHADER_TESS_CTRL;
-   case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
-      return MESA_SHADER_TESS_EVAL;
-   case GL_GEOMETRY_SUBROUTINE_UNIFORM:
-      return MESA_SHADER_GEOMETRY;
-   case GL_FRAGMENT_SUBROUTINE_UNIFORM:
-      return MESA_SHADER_FRAGMENT;
-   case GL_COMPUTE_SUBROUTINE_UNIFORM:
-      return MESA_SHADER_COMPUTE;
-   default:
-      unreachable("unexpected programInterface value");
-   }
-}
-
-static struct gl_linked_shader *
-lookup_linked_shader(GLuint program,
-                     GLenum programInterface,
-                     const char *caller)
-{
-   struct gl_shader_program *shLinkedProg =
-      lookup_linked_program(program, caller, false);
-   gl_shader_stage stage = stage_from_program_interface(programInterface);
-
-   if (!shLinkedProg)
-      return NULL;
-
-   return shLinkedProg->_LinkedShaders[stage];
-}
-
-static bool
-is_subroutine_uniform_program_interface(GLenum programInterface)
-{
-   switch(programInterface) {
-   case GL_VERTEX_SUBROUTINE_UNIFORM:
-   case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
-   case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
-   case GL_GEOMETRY_SUBROUTINE_UNIFORM:
-   case GL_FRAGMENT_SUBROUTINE_UNIFORM:
-   case GL_COMPUTE_SUBROUTINE_UNIFORM:
-      return true;
-   default:
-      return false;
-   }
-}
-
 void GLAPIENTRY
 _mesa_GetProgramInterfaceiv(GLuint program, GLenum programInterface,
                             GLenum pname, GLint *params)
@@ -174,49 +119,9 @@ _mesa_GetProgramInterfaceiv(GLuint program, GLenum programInterface,
    /* Validate pname against interface. */
    switch(pname) {
    case GL_ACTIVE_RESOURCES:
-      if (is_subroutine_uniform_program_interface(programInterface)) {
-         /* ARB_program_interface_query doesn't explicitly says that those
-          * uniforms would need a linked shader, or that should fail if it is
-          * not the case, but Section 7.6 (Uniform Variables) of the OpenGL
-          * 4.4 Core Profile says:
-          *
-          *    "A uniform is considered an active uniform if the compiler and
-          *     linker determine that the uniform will actually be accessed
-          *     when the executable code is executed. In cases where the
-          *     compiler and linker cannot make a conclusive determination,
-          *     the uniform will be considered active."
-          *
-          * So in order to know the real number of active subroutine uniforms
-          * we would need a linked shader .
-          *
-          * At the same time, Section 7.3 (Program Objects) of the OpenGL 4.4
-          * Core Profile says:
-          *
-          *    "The GL provides various commands allowing applications to
-          *     enumerate and query properties of active variables and in-
-          *     terface blocks for a specified program. If one of these
-          *     commands is called with a program for which LinkProgram
-          *     succeeded, the information recorded when the program was
-          *     linked is returned. If one of these commands is called with a
-          *     program for which LinkProgram failed, no error is generated
-          *     unless otherwise noted."
-          *    <skip>
-          *    "If one of these commands is called with a program for which
-          *     LinkProgram had never been called, no error is generated
-          *     unless otherwise noted, and the program object is considered
-          *     to have no active variables or interface blocks."
-          *
-          * So if the program is not linked we will return 0.
-          */
-         struct gl_linked_shader *sh =
-            lookup_linked_shader(program, programInterface, "glGetProgramInterfaceiv");
-
-         *params = sh ? sh->NumSubroutineUniforms : 0;
-      } else {
-         for (i = 0, *params = 0; i < shProg->NumProgramResourceList; i++)
-            if (shProg->ProgramResourceList[i].Type == programInterface)
-               (*params)++;
-      }
+      for (i = 0, *params = 0; i < shProg->NumProgramResourceList; i++)
+         if (shProg->ProgramResourceList[i].Type == programInterface)
+            (*params)++;
       break;
    case GL_MAX_NAME_LENGTH:
       if (programInterface == GL_ATOMIC_COUNTER_BUFFER ||
@@ -500,7 +405,7 @@ _mesa_GetProgramResourceLocation(GLuint program, GLenum programInterface,
    }
 
    struct gl_shader_program *shProg =
-      lookup_linked_program(program, "glGetProgramResourceLocation", true);
+      lookup_linked_program(program, "glGetProgramResourceLocation");
 
    if (!shProg || !name)
       return -1;
@@ -556,7 +461,7 @@ _mesa_GetProgramResourceLocationIndex(GLuint program, GLenum programInterface,
    }
 
    struct gl_shader_program *shProg =
-      lookup_linked_program(program, "glGetProgramResourceLocationIndex", true);
+      lookup_linked_program(program, "glGetProgramResourceLocationIndex");
 
    if (!shProg || !name)
       return -1;

@@ -65,6 +65,7 @@ wsi_queue_init(struct wsi_queue *queue, int length)
    if (ret)
       goto fail_cond;
 
+   pthread_condattr_destroy(&condattr);
    return 0;
 
 fail_cond:
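
Note: the added pthread_condattr_destroy() plugs a leak on the success
path; a condattr only parameterizes pthread_cond_init() and can be
destroyed as soon as the condition variable exists. The shape of the
fix, as a standalone sketch (error handling condensed):

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
       pthread_cond_t cond;
       pthread_condattr_t condattr;

       if (pthread_condattr_init(&condattr) != 0)
          return 1;
       if (pthread_condattr_setclock(&condattr, CLOCK_MONOTONIC) != 0)
          goto fail_cond;
       if (pthread_cond_init(&cond, &condattr) != 0)
          goto fail_cond;

       /* The fix: destroy the attribute on the success path too. */
       pthread_condattr_destroy(&condattr);

       /* ... use cond ... */
       pthread_cond_destroy(&cond);
       puts("ok");
       return 0;

    fail_cond:
       pthread_condattr_destroy(&condattr);
       return 1;
    }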