Import Mesa 13.0.3

This commit is contained in:
jsg 2017-01-19 05:38:52 +00:00
parent b93dd40b0b
commit 6d4ad6fa4b
47 changed files with 810 additions and 292 deletions

View File

@ -1 +1 @@
13.0.2
13.0.3

View File

@ -31,7 +31,8 @@ because compatibility contexts are not supported.
<h2>SHA256 checksums</h2>
<pre>
TBD
6014233a5db6032ab8de4881384871bbe029de684502707794ce7b3e6beec308 mesa-13.0.2.tar.gz
a6ed622645f4ed61da418bf65adde5bcc4bb79023c36ba7d6b45b389da4416d5 mesa-13.0.2.tar.xz
</pre>

View File

@ -0,0 +1,176 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 13.0.3 Release Notes / January 5, 2017</h1>
<p>
Mesa 13.0.3 is a bug fix release which fixes bugs found since the 13.0.2 release.
</p>
<p>
Mesa 13.0.3 implements the OpenGL 4.4 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.4. OpenGL
4.4 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=77662">Bug 77662</a> - Fail to render to different faces of depth-stencil cube map</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92234">Bug 92234</a> - [BDW] GPU hang in Shogun2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98329">Bug 98329</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99038">Bug 99038</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.negative_api.create_pixmap_surface crashes</li>
</ul>
<h2>Changes</h2>
<p>Chad Versace (2):</p>
<ul>
<li>i965/mt: Disable aux surfaces after making miptree shareable</li>
<li>egl: Fix crashes in eglCreate*Surface()</li>
</ul>
<p>Dave Airlie (4):</p>
<ul>
<li>anv: set maxFragmentDualSrcAttachments to 1</li>
<li>radv: set maxFragmentDualSrcAttachments to 1</li>
<li>radv: fix another regression since shadow fixes.</li>
<li>radv: add missing license file to radv_meta_bufimage.</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.2</li>
<li>anv: don't double-close the same fd</li>
<li>anv: don't leak memory if anv_init_wsi() fails</li>
<li>radv: don't leak the fd if radv_physical_device_init() succeeds</li>
<li>Update version to 13.0.3</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: In a loop break/continue, jump if everyone has taken the path.</li>
</ul>
<p>Gwan-gyeong Mun (3):</p>
<ul>
<li>anv: Add missing error-checking to anv_block_pool_init (v2)</li>
<li>anv: Update the teardown in reverse order of the anv_CreateDevice</li>
<li>vulkan/wsi: Fix resource leak in success path of wsi_queue_init()</li>
</ul>
<p>Haixia Shi (1):</p>
<ul>
<li>compiler/glsl: fix precision problem of tanh</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>mesa: only verify that enabled arrays have backing buffers</li>
</ul>
<p>Jason Ekstrand (8):</p>
<ul>
<li>anv/cmd_buffer: Re-emit MEDIA_CURBE_LOAD when CS push constants are dirty</li>
<li>anv/image: Rename hiz_surface to aux_surface</li>
<li>anv/cmd_buffer: Remove the 1-D case from the HiZ QPitch calculation</li>
<li>genxml/gen9: Change the default of MI_SEMAPHORE_WAIT::RegisterPoleMode</li>
<li>anv/device: Return the right error for failed maps</li>
<li>anv/device: Implicitly unmap memory objects in FreeMemory</li>
<li>anv/descriptor_set: Write the state offset in the surface state free list.</li>
<li>spirv: Use a simpler and more correct implementaiton of tanh()</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Allocate at least some URB space even when max_vertices = 0.</li>
</ul>
<p>Marek Olšák (17):</p>
<ul>
<li>radeonsi: always set all blend registers</li>
<li>radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blending</li>
<li>radeonsi: disable RB+ blend optimizations for dual source blending</li>
<li>radeonsi: consolidate max-work-group-size computation</li>
<li>radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips</li>
<li>radeonsi: apply a TC L1 write corruption workaround for SI</li>
<li>radeonsi: apply a tessellation bug workaround for SI</li>
<li>radeonsi: add a tess+GS hang workaround for VI dGPUs</li>
<li>radeonsi: apply the double EVENT_WRITE_EOP workaround to VI as well</li>
<li>cso: don't release sampler states that are bound</li>
<li>radeonsi: always restore sampler states when unbinding sampler views</li>
<li>radeonsi: fix incorrect FMASK checking in bind_sampler_states</li>
<li>radeonsi: allow specifying simm16 of emit_waitcnt at call sites</li>
<li>radeonsi: wait for outstanding memory instructions in TCS barriers</li>
<li>tgsi: fix the src type of TGSI_OPCODE_MEMBAR</li>
<li>radeonsi: wait for outstanding LDS instructions in memory barriers if needed</li>
<li>radeonsi: disable the constant engine (CE) on Carrizo and Stoney</li>
</ul>
<p>Matt Turner (3):</p>
<ul>
<li>i965/fs: Rename opt_copy_propagate -&gt; opt_copy_propagation.</li>
<li>i965/fs: Add unit tests for copy propagation pass.</li>
<li>i965/fs: Reject copy propagation into SEL if not min/max.</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>mesa/fbobject: Update CubeMapFace when reusing textures</li>
</ul>
<p>Nicolai Hähnle (4):</p>
<ul>
<li>radeonsi: fix isolines tess factor writes to control ring</li>
<li>radeonsi: update all GSVS ring descriptors for new buffer allocations</li>
<li>radeonsi: do not kill GS with memory writes</li>
<li>radeonsi: fix an off-by-one error in the bounds check for max_vertices</li>
</ul>
<p>Rhys Kidd (1):</p>
<ul>
<li>glsl: Add pthread libs to cache_test</li>
</ul>
<p>Timothy Arceri (2):</p>
<ul>
<li>mesa: fix active subroutine uniforms properly</li>
<li>Revert "nir: Turn imov/fmov of undef into undef."</li>
</ul>
</div>
</body>
</html>

View File

@ -3545,7 +3545,7 @@ static void visit_tex(struct nir_to_llvm_context *ctx, nir_tex_instr *instr)
if (instr->op == nir_texop_query_levels)
result = LLVMBuildExtractElement(ctx->builder, result, LLVMConstInt(ctx->i32, 3, false), "");
else if (instr->is_shadow && instr->op != nir_texop_txs && instr->op != nir_texop_lod)
else if (instr->is_shadow && instr->op != nir_texop_txs && instr->op != nir_texop_lod && instr->op != nir_texop_tg4)
result = LLVMBuildExtractElement(ctx->builder, result, ctx->i32zero, "");
else if (instr->op == nir_texop_txs &&
instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE &&

View File

@ -91,6 +91,7 @@ radv_physical_device_init(struct radv_physical_device *device,
fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n");
device->name = device->rad_info.name;
close(fd);
return VK_SUCCESS;
fail:
@ -424,7 +425,7 @@ void radv_GetPhysicalDeviceProperties(
.maxGeometryTotalOutputComponents = 1024,
.maxFragmentInputComponents = 128,
.maxFragmentOutputAttachments = 8,
.maxFragmentDualSrcAttachments = 2,
.maxFragmentDualSrcAttachments = 1,
.maxFragmentCombinedOutputResources = 8,
.maxComputeSharedMemorySize = 32768,
.maxComputeWorkGroupCount = { 65535, 65535, 65535 },

View File

@ -1,6 +1,33 @@
/*
* Copyright © 2016 Red Hat.
* Copyright © 2016 Bas Nieuwenhuizen
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include "radv_meta.h"
#include "nir/nir_builder.h"
/*
* Compute shader implementation of image->buffer copy.
*/
static nir_shader *
build_nir_itob_compute_shader(struct radv_device *dev)
{

View File

@ -1 +1 @@
#define RADV_TIMESTAMP "1480346204"
#define RADV_TIMESTAMP "1483631585"

View File

@ -3563,9 +3563,17 @@ builtin_builder::_tanh(const glsl_type *type)
ir_variable *x = in_var(type, "x");
MAKE_SIG(type, v130, 1, x);
/* Clamp x to [-10, +10] to avoid precision problems.
* When x > 10, e^(-x) is so small relative to e^x that it gets flushed to
* zero in the computation e^x + e^(-x). The same happens in the other
* direction when x < -10.
*/
ir_variable *t = body.make_temp(type, "tmp");
body.emit(assign(t, min2(max2(x, imm(-10.0f)), imm(10.0f))));
/* (e^x - e^(-x)) / (e^x + e^(-x)) */
body.emit(ret(div(sub(exp(x), exp(neg(x))),
add(exp(x), exp(neg(x))))));
body.emit(ret(div(sub(exp(t), exp(neg(t))),
add(exp(t), exp(neg(t))))));
return sig;
}

View File

@ -633,6 +633,8 @@ private:
uniform->opaque[shader_type].index = this->next_subroutine;
uniform->opaque[shader_type].active = true;
prog->_LinkedShaders[shader_type]->NumSubroutineUniforms++;
/* Increment the subroutine index by 1 for non-arrays and by the
* number of array elements for arrays.
*/

View File

@ -3118,7 +3118,6 @@ link_calculate_subroutine_compat(struct gl_shader_program *prog)
if (!uni)
continue;
sh->NumSubroutineUniforms++;
count = 0;
if (sh->NumSubroutineFunctions == 0) {
linker_error(prog, "subroutine uniform %s defined but no valid functions found\n", uni->type->name);

View File

@ -79,9 +79,7 @@ opt_undef_vecN(nir_builder *b, nir_alu_instr *alu)
{
if (alu->op != nir_op_vec2 &&
alu->op != nir_op_vec3 &&
alu->op != nir_op_vec4 &&
alu->op != nir_op_fmov &&
alu->op != nir_op_imov)
alu->op != nir_op_vec4)
return false;
assert(alu->dest.dest.is_ssa);

View File

@ -565,16 +565,21 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 entrypoint,
build_exp(nb, nir_fneg(nb, src[0]))));
return;
case GLSLstd450Tanh:
/* (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x))) */
val->ssa->def =
nir_fdiv(nb, nir_fmul(nb, nir_imm_float(nb, 0.5f),
nir_fsub(nb, build_exp(nb, src[0]),
build_exp(nb, nir_fneg(nb, src[0])))),
nir_fmul(nb, nir_imm_float(nb, 0.5f),
nir_fadd(nb, build_exp(nb, src[0]),
build_exp(nb, nir_fneg(nb, src[0])))));
case GLSLstd450Tanh: {
/* tanh(x) := (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x)))
*
* With a little algebra this reduces to (e^2x - 1) / (e^2x + 1)
*
* We clamp x to (-inf, +10] to avoid precision problems. When x > 10,
* e^2x is so much larger than 1.0 that 1.0 gets flushed to zero in the
* computation e^2x +/- 1 so it can be ignored.
*/
nir_ssa_def *x = nir_fmin(nb, src[0], nir_imm_float(nb, 10));
nir_ssa_def *exp2x = build_exp(nb, nir_fmul(nb, x, nir_imm_float(nb, 2)));
val->ssa->def = nir_fdiv(nb, nir_fsub(nb, exp2x, nir_imm_float(nb, 1)),
nir_fadd(nb, exp2x, nir_imm_float(nb, 1)));
return;
}
case GLSLstd450Asinh:
val->ssa->def = nir_fmul(nb, nir_fsign(nb, src[0]),

View File

@ -849,7 +849,7 @@ _eglCreateWindowSurfaceCommon(_EGLDisplay *disp, EGLConfig config,
RETURN_EGL_ERROR(disp, EGL_BAD_NATIVE_WINDOW, EGL_NO_SURFACE);
#ifdef HAVE_SURFACELESS_PLATFORM
if (disp->Platform == _EGL_PLATFORM_SURFACELESS) {
if (disp && disp->Platform == _EGL_PLATFORM_SURFACELESS) {
/* From the EGL_MESA_platform_surfaceless spec (v1):
*
* eglCreatePlatformWindowSurface fails when called with a <display>
@ -970,7 +970,7 @@ _eglCreatePixmapSurfaceCommon(_EGLDisplay *disp, EGLConfig config,
EGLSurface ret;
#if HAVE_SURFACELESS_PLATFORM
if (disp->Platform == _EGL_PLATFORM_SURFACELESS) {
if (disp && disp->Platform == _EGL_PLATFORM_SURFACELESS) {
/* From the EGL_MESA_platform_surfaceless spec (v1):
*
* [Like eglCreatePlatformWindowSurface,] eglCreatePlatformPixmapSurface

View File

@ -188,7 +188,9 @@ cso_insert_state(struct cso_cache *sc,
void *state)
{
struct cso_hash *hash = _cso_hash_for_type(sc, type);
sanitize_hash(sc, hash, type, sc->max_size);
if (type != CSO_SAMPLER)
sanitize_hash(sc, hash, type, sc->max_size);
return cso_hash_insert(hash, hash_key, state);
}

View File

@ -485,6 +485,7 @@ tgsi_opcode_infer_src_type( uint opcode )
case TGSI_OPCODE_UMUL_HI:
case TGSI_OPCODE_UP2H:
case TGSI_OPCODE_U2I64:
case TGSI_OPCODE_MEMBAR:
return TGSI_TYPE_UNSIGNED;
case TGSI_OPCODE_IMUL_HI:
case TGSI_OPCODE_I2F:

View File

@ -85,7 +85,8 @@ void r600_gfx_write_fence(struct r600_common_context *ctx, struct r600_resource
{
struct radeon_winsys_cs *cs = ctx->gfx.cs;
if (ctx->chip_class == CIK) {
if (ctx->chip_class == CIK ||
ctx->chip_class == VI) {
/* Two EOP events are required to make all engines go idle
* (and optional cache flushes executed) before the timestamp
* is written.
@ -114,7 +115,8 @@ unsigned r600_gfx_write_fence_dwords(struct r600_common_screen *screen)
{
unsigned dwords = 6;
if (screen->chip_class == CIK)
if (screen->chip_class == CIK ||
screen->chip_class == VI)
dwords *= 2;
if (!screen->info.has_virtual_memory)

View File

@ -343,6 +343,7 @@ static bool si_switch_compute_shader(struct si_context *sctx,
lds_blocks += align(program->local_size, 512) >> 9;
}
/* TODO: use si_multiwave_lds_size_workaround */
assert(lds_blocks <= 0xFF);
config->rsrc2 &= C_00B84C_LDS_SIZE;

View File

@ -413,13 +413,13 @@ static void si_set_sampler_view(struct si_context *sctx,
struct si_sampler_views *views = &sctx->samplers[shader].views;
struct si_sampler_view *rview = (struct si_sampler_view*)view;
struct si_descriptors *descs = si_sampler_descriptors(sctx, shader);
uint32_t *desc = descs->list + slot * 16;
if (views->views[slot] == view && !disallow_early_out)
return;
if (view) {
struct r600_texture *rtex = (struct r600_texture *)view->texture;
uint32_t *desc = descs->list + slot * 16;
assert(rtex); /* views with texture == NULL aren't supported */
pipe_sampler_view_reference(&views->views[slot], view);
@ -468,9 +468,14 @@ static void si_set_sampler_view(struct si_context *sctx,
rview->is_stencil_sampler, true);
} else {
pipe_sampler_view_reference(&views->views[slot], NULL);
memcpy(descs->list + slot*16, null_texture_descriptor, 8*4);
memcpy(desc, null_texture_descriptor, 8*4);
/* Only clear the lower dwords of FMASK. */
memcpy(descs->list + slot*16 + 8, null_texture_descriptor, 4*4);
memcpy(desc + 8, null_texture_descriptor, 4*4);
/* Re-set the sampler state if we are transitioning from FMASK. */
if (views->sampler_states[slot])
memcpy(desc + 12,
views->sampler_states[slot], 4*4);
views->enabled_mask &= ~(1u << slot);
}
@ -803,10 +808,10 @@ static void si_bind_sampler_states(struct pipe_context *ctx,
/* If FMASK is bound, don't overwrite it.
* The sampler state will be set after FMASK is unbound.
*/
if (samplers->views.views[i] &&
samplers->views.views[i]->texture &&
samplers->views.views[i]->texture->target != PIPE_BUFFER &&
((struct r600_texture*)samplers->views.views[i]->texture)->fmask.size)
if (samplers->views.views[slot] &&
samplers->views.views[slot]->texture &&
samplers->views.views[slot]->texture->target != PIPE_BUFFER &&
((struct r600_texture*)samplers->views.views[slot]->texture)->fmask.size)
continue;
memcpy(desc->list + slot * 16 + 12, sstates[i]->val, 4*4);

View File

@ -187,7 +187,10 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen,
/* SI + AMDGPU + CE = GPU hang */
if (!(sscreen->b.debug_flags & DBG_NO_CE) && ws->cs_add_const_ib &&
sscreen->b.chip_class != SI) {
sscreen->b.chip_class != SI &&
/* These can't use CE due to a power gating bug in the kernel. */
sscreen->b.family != CHIP_CARRIZO &&
sscreen->b.family != CHIP_STONEY) {
sctx->ce_ib = ws->cs_add_const_ib(sctx->b.gfx.cs);
if (!sctx->ce_ib)
goto fail;

View File

@ -2577,10 +2577,18 @@ static void si_write_tess_factors(struct lp_build_tgsi_context *bld_base,
lp_build_const_int32(gallivm,
tess_outer_index * 4), "");
for (i = 0; i < outer_comps; i++)
out[i] = lds_load(bld_base, TGSI_TYPE_SIGNED, i, lds_outer);
for (i = 0; i < inner_comps; i++)
out[outer_comps+i] = lds_load(bld_base, TGSI_TYPE_SIGNED, i, lds_inner);
if (shader->key.tcs.epilog.prim_mode == PIPE_PRIM_LINES) {
/* For isolines, the hardware expects tess factors in the
* reverse order from what GLSL / TGSI specify.
*/
out[0] = lds_load(bld_base, TGSI_TYPE_SIGNED, 1, lds_outer);
out[1] = lds_load(bld_base, TGSI_TYPE_SIGNED, 0, lds_outer);
} else {
for (i = 0; i < outer_comps; i++)
out[i] = lds_load(bld_base, TGSI_TYPE_SIGNED, i, lds_outer);
for (i = 0; i < inner_comps; i++)
out[outer_comps+i] = lds_load(bld_base, TGSI_TYPE_SIGNED, i, lds_inner);
}
/* Convert the outputs to vectors for stores. */
vec0 = lp_build_gather_values(gallivm, out, MIN2(stride, 4));
@ -3301,6 +3309,7 @@ static void build_tex_intrinsic(const struct lp_build_tgsi_action *action,
* point in the program by emitting empty inline assembly that is marked as
* having side effects.
*/
#if 0 /* unused currently */
static void emit_optimization_barrier(struct si_shader_context *ctx)
{
LLVMBuilderRef builder = ctx->gallivm.builder;
@ -3308,13 +3317,19 @@ static void emit_optimization_barrier(struct si_shader_context *ctx)
LLVMValueRef inlineasm = LLVMConstInlineAsm(ftype, "", "", true, false);
LLVMBuildCall(builder, inlineasm, NULL, 0, "");
}
#endif
static void emit_waitcnt(struct si_shader_context *ctx)
/* Combine these with & instead of |. */
#define NOOP_WAITCNT 0xf7f
#define LGKM_CNT 0x07f
#define VM_CNT 0xf70
static void emit_waitcnt(struct si_shader_context *ctx, unsigned simm16)
{
struct gallivm_state *gallivm = &ctx->gallivm;
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef args[1] = {
lp_build_const_int32(gallivm, 0xf70)
lp_build_const_int32(gallivm, simm16)
};
lp_build_intrinsic(builder, "llvm.amdgcn.s.waitcnt",
ctx->voidt, args, 1, 0);
@ -3326,8 +3341,23 @@ static void membar_emit(
struct lp_build_emit_data *emit_data)
{
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMValueRef src0 = lp_build_emit_fetch(bld_base, emit_data->inst, 0, 0);
unsigned flags = LLVMConstIntGetZExtValue(src0);
unsigned waitcnt = NOOP_WAITCNT;
emit_waitcnt(ctx);
if (flags & TGSI_MEMBAR_THREAD_GROUP)
waitcnt &= VM_CNT & LGKM_CNT;
if (flags & (TGSI_MEMBAR_ATOMIC_BUFFER |
TGSI_MEMBAR_SHADER_BUFFER |
TGSI_MEMBAR_SHADER_IMAGE))
waitcnt &= VM_CNT;
if (flags & TGSI_MEMBAR_SHARED)
waitcnt &= LGKM_CNT;
if (waitcnt != NOOP_WAITCNT)
emit_waitcnt(ctx, waitcnt);
}
static LLVMValueRef
@ -3481,7 +3511,8 @@ static void image_append_args(
struct si_shader_context *ctx,
struct lp_build_emit_data * emit_data,
unsigned target,
bool atomic)
bool atomic,
bool force_glc)
{
const struct tgsi_full_instruction *inst = emit_data->inst;
LLVMValueRef i1false = LLVMConstInt(ctx->i1, 0, 0);
@ -3489,6 +3520,7 @@ static void image_append_args(
LLVMValueRef r128 = i1false;
LLVMValueRef da = tgsi_is_array_image(target) ? i1true : i1false;
LLVMValueRef glc =
force_glc ||
inst->Memory.Qualifier & (TGSI_MEMORY_COHERENT | TGSI_MEMORY_VOLATILE) ?
i1true : i1false;
LLVMValueRef slc = i1false;
@ -3543,7 +3575,8 @@ static void buffer_append_args(
LLVMValueRef rsrc,
LLVMValueRef index,
LLVMValueRef offset,
bool atomic)
bool atomic,
bool force_glc)
{
const struct tgsi_full_instruction *inst = emit_data->inst;
LLVMValueRef i1false = LLVMConstInt(ctx->i1, 0, 0);
@ -3554,6 +3587,7 @@ static void buffer_append_args(
emit_data->args[emit_data->arg_count++] = offset; /* voffset */
if (!atomic) {
emit_data->args[emit_data->arg_count++] =
force_glc ||
inst->Memory.Qualifier & (TGSI_MEMORY_COHERENT | TGSI_MEMORY_VOLATILE) ?
i1true : i1false; /* glc */
}
@ -3583,7 +3617,7 @@ static void load_fetch_args(
offset = LLVMBuildBitCast(builder, tmp, bld_base->uint_bld.elem_type, "");
buffer_append_args(ctx, emit_data, rsrc, bld_base->uint_bld.zero,
offset, false);
offset, false, false);
} else if (inst->Src[0].Register.File == TGSI_FILE_IMAGE) {
LLVMValueRef coords;
@ -3593,14 +3627,14 @@ static void load_fetch_args(
if (target == TGSI_TEXTURE_BUFFER) {
rsrc = extract_rsrc_top_half(ctx, rsrc);
buffer_append_args(ctx, emit_data, rsrc, coords,
bld_base->uint_bld.zero, false);
bld_base->uint_bld.zero, false, false);
} else {
emit_data->args[0] = coords;
emit_data->args[1] = rsrc;
emit_data->args[2] = lp_build_const_int32(gallivm, 15); /* dmask */
emit_data->arg_count = 3;
image_append_args(ctx, emit_data, target, false);
image_append_args(ctx, emit_data, target, false, false);
}
}
}
@ -3727,7 +3761,7 @@ static void load_emit(
}
if (inst->Memory.Qualifier & TGSI_MEMORY_VOLATILE)
emit_waitcnt(ctx);
emit_waitcnt(ctx, VM_CNT);
if (inst->Src[0].Register.File == TGSI_FILE_BUFFER) {
load_emit_buffer(ctx, emit_data);
@ -3790,11 +3824,19 @@ static void store_fetch_args(
offset = LLVMBuildBitCast(builder, tmp, bld_base->uint_bld.elem_type, "");
buffer_append_args(ctx, emit_data, rsrc, bld_base->uint_bld.zero,
offset, false);
offset, false, false);
} else if (inst->Dst[0].Register.File == TGSI_FILE_IMAGE) {
unsigned target = inst->Memory.Texture;
LLVMValueRef coords;
/* 8bit/16bit TC L1 write corruption bug on SI.
* All store opcodes not aligned to a dword are affected.
*
* The only way to get unaligned stores in radeonsi is through
* shader images.
*/
bool force_glc = ctx->screen->b.chip_class == SI;
coords = image_fetch_coords(bld_base, inst, 0);
if (target == TGSI_TEXTURE_BUFFER) {
@ -3802,14 +3844,14 @@ static void store_fetch_args(
rsrc = extract_rsrc_top_half(ctx, rsrc);
buffer_append_args(ctx, emit_data, rsrc, coords,
bld_base->uint_bld.zero, false);
bld_base->uint_bld.zero, false, force_glc);
} else {
emit_data->args[1] = coords;
image_fetch_rsrc(bld_base, &memory, true, &emit_data->args[2]);
emit_data->args[3] = lp_build_const_int32(gallivm, 15); /* dmask */
emit_data->arg_count = 4;
image_append_args(ctx, emit_data, target, false);
image_append_args(ctx, emit_data, target, false, force_glc);
}
}
}
@ -3929,7 +3971,7 @@ static void store_emit(
}
if (inst->Memory.Qualifier & TGSI_MEMORY_VOLATILE)
emit_waitcnt(ctx);
emit_waitcnt(ctx, VM_CNT);
if (inst->Dst[0].Register.File == TGSI_FILE_BUFFER) {
store_emit_buffer(ctx, emit_data);
@ -3993,7 +4035,7 @@ static void atomic_fetch_args(
offset = LLVMBuildBitCast(builder, tmp, bld_base->uint_bld.elem_type, "");
buffer_append_args(ctx, emit_data, rsrc, bld_base->uint_bld.zero,
offset, true);
offset, true, false);
} else if (inst->Src[0].Register.File == TGSI_FILE_IMAGE) {
unsigned target = inst->Memory.Texture;
LLVMValueRef coords;
@ -4005,12 +4047,12 @@ static void atomic_fetch_args(
if (target == TGSI_TEXTURE_BUFFER) {
rsrc = extract_rsrc_top_half(ctx, rsrc);
buffer_append_args(ctx, emit_data, rsrc, coords,
bld_base->uint_bld.zero, true);
bld_base->uint_bld.zero, true, false);
} else {
emit_data->args[emit_data->arg_count++] = coords;
emit_data->args[emit_data->arg_count++] = rsrc;
image_append_args(ctx, emit_data, target, true);
image_append_args(ctx, emit_data, target, true, false);
}
}
}
@ -5247,6 +5289,7 @@ static void si_llvm_emit_vertex(
struct si_shader *shader = ctx->shader;
struct tgsi_shader_info *info = &shader->selector->info;
struct gallivm_state *gallivm = bld_base->base.gallivm;
struct lp_build_if_state if_state;
LLVMValueRef soffset = LLVMGetParam(ctx->main_fn,
SI_PARAM_GS2VS_OFFSET);
LLVMValueRef gs_next_vertex;
@ -5264,19 +5307,28 @@ static void si_llvm_emit_vertex(
"");
/* If this thread has already emitted the declared maximum number of
* vertices, kill it: excessive vertex emissions are not supposed to
* have any effect, and GS threads have no externally observable
* effects other than emitting vertices.
* vertices, skip the write: excessive vertex emissions are not
* supposed to have any effect.
*
* If the shader has no writes to memory, kill it instead. This skips
* further memory loads and may allow LLVM to skip to the end
* altogether.
*/
can_emit = LLVMBuildICmp(gallivm->builder, LLVMIntULE, gs_next_vertex,
can_emit = LLVMBuildICmp(gallivm->builder, LLVMIntULT, gs_next_vertex,
lp_build_const_int32(gallivm,
shader->selector->gs_max_out_vertices), "");
kill = lp_build_select(&bld_base->base, can_emit,
lp_build_const_float(gallivm, 1.0f),
lp_build_const_float(gallivm, -1.0f));
lp_build_intrinsic(gallivm->builder, "llvm.AMDGPU.kill",
ctx->voidt, &kill, 1, 0);
bool use_kill = !info->writes_memory;
if (use_kill) {
kill = lp_build_select(&bld_base->base, can_emit,
lp_build_const_float(gallivm, 1.0f),
lp_build_const_float(gallivm, -1.0f));
lp_build_intrinsic(gallivm->builder, "llvm.AMDGPU.kill",
ctx->voidt, &kill, 1, 0);
} else {
lp_build_if(&if_state, gallivm, can_emit);
}
for (i = 0; i < info->num_outputs; i++) {
LLVMValueRef *out_ptr =
@ -5302,6 +5354,7 @@ static void si_llvm_emit_vertex(
1, 0, 1, 1, 0);
}
}
gs_next_vertex = lp_build_add(uint, gs_next_vertex,
lp_build_const_int32(gallivm, 1));
@ -5312,6 +5365,9 @@ static void si_llvm_emit_vertex(
args[1] = LLVMGetParam(ctx->main_fn, SI_PARAM_GS_WAVE_ID);
lp_build_intrinsic(gallivm->builder, "llvm.SI.sendmsg",
ctx->voidt, args, 2, 0);
if (!use_kill)
lp_build_endif(&if_state);
}
/* Cut one primitive from the geometry shader */
@ -5344,7 +5400,7 @@ static void si_llvm_emit_barrier(const struct lp_build_tgsi_action *action,
* always fits into a single wave.
*/
if (ctx->type == PIPE_SHADER_TESS_CTRL) {
emit_optimization_barrier(ctx);
emit_waitcnt(ctx, LGKM_CNT & VM_CNT);
return;
}
@ -5481,6 +5537,23 @@ static void declare_tess_lds(struct si_shader_context *ctx)
"tess_lds");
}
static unsigned si_get_max_workgroup_size(struct si_shader *shader)
{
const unsigned *properties = shader->selector->info.properties;
unsigned max_work_group_size =
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] *
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT] *
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH];
if (!max_work_group_size) {
/* This is a variable group size compute shader,
* compile it for the maximum possible group size.
*/
max_work_group_size = SI_MAX_VARIABLE_THREADS_PER_BLOCK;
}
return max_work_group_size;
}
static void create_function(struct si_shader_context *ctx)
{
struct lp_build_tgsi_context *bld_base = &ctx->soa.bld_base;
@ -5706,22 +5779,9 @@ static void create_function(struct si_shader_context *ctx)
S_0286D0_FRONT_FACE_ENA(1) |
S_0286D0_POS_FIXED_PT_ENA(1));
} else if (ctx->type == PIPE_SHADER_COMPUTE) {
const unsigned *properties = shader->selector->info.properties;
unsigned max_work_group_size =
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] *
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT] *
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH];
if (!max_work_group_size) {
/* This is a variable group size compute shader,
* compile it for the maximum possible group size.
*/
max_work_group_size = SI_MAX_VARIABLE_THREADS_PER_BLOCK;
}
si_llvm_add_attribute(ctx->main_fn,
"amdgpu-max-work-group-size",
max_work_group_size);
si_get_max_workgroup_size(shader));
}
shader->info.num_input_sgprs = 0;
@ -6643,20 +6703,11 @@ int si_compile_tgsi_shader(struct si_screen *sscreen,
* LLVM 3.9svn has this bug.
*/
if (sel->type == PIPE_SHADER_COMPUTE) {
unsigned *props = sel->info.properties;
unsigned wave_size = 64;
unsigned max_vgprs = 256;
unsigned max_sgprs = sscreen->b.chip_class >= VI ? 800 : 512;
unsigned max_sgprs_per_wave = 128;
unsigned max_block_threads;
if (props[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH])
max_block_threads = props[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] *
props[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT] *
props[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH];
else
max_block_threads = SI_MAX_VARIABLE_THREADS_PER_BLOCK;
unsigned max_block_threads = si_get_max_workgroup_size(shader);
unsigned min_waves_per_cu = DIV_ROUND_UP(max_block_threads, wave_size);
unsigned min_waves_per_simd = DIV_ROUND_UP(min_waves_per_cu, 4);
@ -7746,11 +7797,31 @@ static bool si_shader_select_ps_parts(struct si_screen *sscreen,
return true;
}
static void si_fix_num_sgprs(struct si_shader *shader)
void si_multiwave_lds_size_workaround(struct si_screen *sscreen,
unsigned *lds_size)
{
/* SPI barrier management bug:
* Make sure we have at least 4k of LDS in use to avoid the bug.
* It applies to workgroup sizes of more than one wavefront.
*/
if (sscreen->b.family == CHIP_BONAIRE ||
sscreen->b.family == CHIP_KABINI ||
sscreen->b.family == CHIP_MULLINS)
*lds_size = MAX2(*lds_size, 8);
}
static void si_fix_resource_usage(struct si_screen *sscreen,
struct si_shader *shader)
{
unsigned min_sgprs = shader->info.num_input_sgprs + 2; /* VCC */
shader->config.num_sgprs = MAX2(shader->config.num_sgprs, min_sgprs);
if (shader->selector->type == PIPE_SHADER_COMPUTE &&
si_get_max_workgroup_size(shader) > 64) {
si_multiwave_lds_size_workaround(sscreen,
&shader->config.lds_size);
}
}
int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
@ -7846,7 +7917,7 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
}
}
si_fix_num_sgprs(shader);
si_fix_resource_usage(sscreen, shader);
si_shader_dump(sscreen, shader, debug, sel->info.processor,
stderr);

View File

@ -482,6 +482,8 @@ int si_shader_binary_upload(struct si_screen *sscreen, struct si_shader *shader)
void si_shader_dump(struct si_screen *sscreen, struct si_shader *shader,
struct pipe_debug_callback *debug, unsigned processor,
FILE *f);
void si_multiwave_lds_size_workaround(struct si_screen *sscreen,
unsigned *lds_size);
void si_shader_apply_scratch_relocs(struct si_context *sctx,
struct si_shader *shader,
struct si_shader_config *config,

View File

@ -453,8 +453,14 @@ static void *si_create_blend_state_mode(struct pipe_context *ctx,
S_028760_ALPHA_COMB_FCN(V_028760_OPT_COMB_BLEND_DISABLED);
/* Only set dual source blending for MRT0 to avoid a hang. */
if (i >= 1 && blend->dual_src_blend)
if (i >= 1 && blend->dual_src_blend) {
/* Vulkan does this for dual source blending. */
if (i == 1)
blend_cntl |= S_028780_ENABLE(1);
si_pm4_set_reg(pm4, R_028780_CB_BLEND0_CONTROL + i * 4, blend_cntl);
continue;
}
/* Only addition and subtraction equations are supported with
* dual source blending.
@ -463,16 +469,14 @@ static void *si_create_blend_state_mode(struct pipe_context *ctx,
(eqRGB == PIPE_BLEND_MIN || eqRGB == PIPE_BLEND_MAX ||
eqA == PIPE_BLEND_MIN || eqA == PIPE_BLEND_MAX)) {
assert(!"Unsupported equation for dual source blending");
si_pm4_set_reg(pm4, R_028780_CB_BLEND0_CONTROL + i * 4, blend_cntl);
continue;
}
if (!state->rt[j].colormask)
continue;
/* cb_render_state will disable unused ones */
blend->cb_target_mask |= (unsigned)state->rt[j].colormask << (4 * i);
if (!state->rt[j].blend_enable) {
if (!state->rt[j].colormask || !state->rt[j].blend_enable) {
si_pm4_set_reg(pm4, R_028780_CB_BLEND0_CONTROL + i * 4, blend_cntl);
continue;
}
@ -553,6 +557,17 @@ static void *si_create_blend_state_mode(struct pipe_context *ctx,
}
if (sctx->b.family == CHIP_STONEY) {
/* Disable RB+ blend optimizations for dual source blending.
* Vulkan does this.
*/
if (blend->dual_src_blend) {
for (int i = 0; i < 8; i++) {
sx_mrt_blend_opt[i] =
S_028760_COLOR_COMB_FCN(V_028760_OPT_COMB_NONE) |
S_028760_ALPHA_COMB_FCN(V_028760_OPT_COMB_NONE);
}
}
for (int i = 0; i < 8; i++)
si_pm4_set_reg(pm4, R_028760_SX_MRT0_BLEND_OPT + i * 4,
sx_mrt_blend_opt[i]);

View File

@ -154,6 +154,12 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
*/
*num_patches = MIN2(*num_patches, 40);
/* SI bug workaround - limit LS-HS threadgroups to only one wave. */
if (sctx->b.chip_class == SI) {
unsigned one_wave = 64 / MAX2(num_tcs_input_cp, num_tcs_output_cp);
*num_patches = MIN2(*num_patches, one_wave);
}
output_patch0_offset = input_patch_size * *num_patches;
perpatch_output_offset = output_patch0_offset + pervertex_output_patch_size;
@ -162,11 +168,13 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
if (sctx->b.chip_class >= CIK) {
assert(lds_size <= 65536);
ls_rsrc2 |= S_00B52C_LDS_SIZE(align(lds_size, 512) / 512);
lds_size = align(lds_size, 512) / 512;
} else {
assert(lds_size <= 32768);
ls_rsrc2 |= S_00B52C_LDS_SIZE(align(lds_size, 256) / 256);
lds_size = align(lds_size, 256) / 256;
}
si_multiwave_lds_size_workaround(sctx->screen, &lds_size);
ls_rsrc2 |= S_00B52C_LDS_SIZE(lds_size);
if (sctx->last_ls == ls->current &&
sctx->last_tcs == tcs &&
@ -284,10 +292,18 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx,
/* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
if (sctx->screen->has_distributed_tess) {
if (sctx->gs_shader.cso)
if (sctx->gs_shader.cso) {
partial_es_wave = true;
else
/* GPU hang workaround. */
if (sctx->b.family == CHIP_TONGA ||
sctx->b.family == CHIP_FIJI ||
sctx->b.family == CHIP_POLARIS10 ||
sctx->b.family == CHIP_POLARIS11)
partial_vs_wave = true;
} else {
partial_vs_wave = true;
}
}
}

View File

@ -1777,10 +1777,15 @@ static bool si_update_gs_ring_buffers(struct si_context *sctx)
sctx->esgs_ring, 0, sctx->esgs_ring->width0,
false, false, 0, 0, 0);
}
if (sctx->gsvs_ring)
if (sctx->gsvs_ring) {
si_set_ring_buffer(&sctx->b.b, SI_VS_RING_GSVS,
sctx->gsvs_ring, 0, sctx->gsvs_ring->width0,
false, false, 0, 0, 0);
/* Also update SI_GS_RING_GSVSi descriptors. */
sctx->last_gsvs_itemsize = 0;
}
return true;
}

View File

@ -1865,22 +1865,29 @@ ntq_emit_if(struct vc4_compile *c, nir_if *if_stmt)
static void
ntq_emit_jump(struct vc4_compile *c, nir_jump_instr *jump)
{
struct qblock *jump_block;
switch (jump->type) {
case nir_jump_break:
qir_SF(c, c->execute);
qir_MOV_cond(c, QPU_COND_ZS, c->execute,
qir_uniform_ui(c, c->loop_break_block->index));
jump_block = c->loop_break_block;
break;
case nir_jump_continue:
qir_SF(c, c->execute);
qir_MOV_cond(c, QPU_COND_ZS, c->execute,
qir_uniform_ui(c, c->loop_cont_block->index));
jump_block = c->loop_cont_block;
break;
case nir_jump_return:
unreachable("All returns shouold be lowered\n");
default:
unreachable("Unsupported jump type\n");
}
qir_SF(c, c->execute);
qir_MOV_cond(c, QPU_COND_ZS, c->execute,
qir_uniform_ui(c, jump_block->index));
/* Jump to the destination block if everyone has taken the jump. */
qir_SF(c, qir_SUB(c, c->execute, qir_uniform_ui(c, jump_block->index)));
qir_BRANCH(c, QPU_COND_BRANCH_ALL_ZS);
struct qblock *new_block = qir_new_block(c);
qir_link_blocks(c->cur_block, jump_block);
qir_link_blocks(c->cur_block, new_block);
qir_set_emit_block(c, new_block);
}
static void

View File

@ -3194,7 +3194,7 @@
<value name="Per Process Graphics Address" value="0"/>
<value name="Global Graphics Address" value="1"/>
</field>
<field name="Register Poll Mode" start="16" end="16" type="uint" default="1"/>
<field name="Register Poll Mode" start="16" end="16" type="bool"/>
<field name="Wait Mode" start="15" end="15" type="uint">
<value name="Polling Mode" value="1"/>
<value name="Signal Mode" value="0"/>

View File

@ -8220,7 +8220,6 @@ GEN9_MI_SEMAPHORE_SIGNAL_pack(__gen_user_data *data, void * restrict dst,
#define GEN9_MI_SEMAPHORE_WAIT_header \
.CommandType = 0, \
.MICommandOpcode = 28, \
.RegisterPollMode = 1, \
.DWordLength = 2
struct GEN9_MI_SEMAPHORE_WAIT {
@ -8229,7 +8228,7 @@ struct GEN9_MI_SEMAPHORE_WAIT {
uint32_t MemoryType;
#define PerProcessGraphicsAddress 0
#define GlobalGraphicsAddress 1
uint32_t RegisterPollMode;
bool RegisterPollMode;
uint32_t WaitMode;
#define PollingMode 1
#define SignalMode 0

View File

@ -15247,8 +15247,7 @@ static const uint8_t gen9_xml[] = {
0x72, 0x20, 0x50, 0x6f, 0x6c, 0x6c, 0x20, 0x4d, 0x6f, 0x64, 0x65, 0x22,
0x20, 0x73, 0x74, 0x61, 0x72, 0x74, 0x3d, 0x22, 0x31, 0x36, 0x22, 0x20,
0x65, 0x6e, 0x64, 0x3d, 0x22, 0x31, 0x36, 0x22, 0x20, 0x74, 0x79, 0x70,
0x65, 0x3d, 0x22, 0x75, 0x69, 0x6e, 0x74, 0x22, 0x20, 0x64, 0x65, 0x66,
0x61, 0x75, 0x6c, 0x74, 0x3d, 0x22, 0x31, 0x22, 0x2f, 0x3e, 0x0a, 0x20,
0x65, 0x3d, 0x22, 0x62, 0x6f, 0x6f, 0x6c, 0x22, 0x2f, 0x3e, 0x0a, 0x20,
0x20, 0x20, 0x20, 0x3c, 0x66, 0x69, 0x65, 0x6c, 0x64, 0x20, 0x6e, 0x61,
0x6d, 0x65, 0x3d, 0x22, 0x57, 0x61, 0x69, 0x74, 0x20, 0x4d, 0x6f, 0x64,
0x65, 0x22, 0x20, 0x73, 0x74, 0x61, 0x72, 0x74, 0x3d, 0x22, 0x31, 0x35,

View File

@ -246,10 +246,12 @@ anv_ptr_free_list_push(void **list, void *elem)
static uint32_t
anv_block_pool_grow(struct anv_block_pool *pool, struct anv_block_state *state);
void
VkResult
anv_block_pool_init(struct anv_block_pool *pool,
struct anv_device *device, uint32_t block_size)
{
VkResult result;
assert(util_is_power_of_two(block_size));
pool->device = device;
@ -260,17 +262,23 @@ anv_block_pool_init(struct anv_block_pool *pool,
pool->fd = memfd_create("block pool", MFD_CLOEXEC);
if (pool->fd == -1)
return;
return vk_error(VK_ERROR_INITIALIZATION_FAILED);
/* Just make it 2GB up-front. The Linux kernel won't actually back it
* with pages until we either map and fault on one of them or we use
* userptr and send a chunk of it off to the GPU.
*/
if (ftruncate(pool->fd, BLOCK_POOL_MEMFD_SIZE) == -1)
return;
if (ftruncate(pool->fd, BLOCK_POOL_MEMFD_SIZE) == -1) {
result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
goto fail_fd;
}
u_vector_init(&pool->mmap_cleanups,
round_to_power_of_two(sizeof(struct anv_mmap_cleanup)), 128);
if (!u_vector_init(&pool->mmap_cleanups,
round_to_power_of_two(sizeof(struct anv_mmap_cleanup)),
128)) {
result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
goto fail_fd;
}
pool->state.next = 0;
pool->state.end = 0;
@ -279,6 +287,13 @@ anv_block_pool_init(struct anv_block_pool *pool,
/* Immediately grow the pool so we'll have a backing bo. */
pool->state.end = anv_block_pool_grow(pool, &pool->state);
return VK_SUCCESS;
fail_fd:
close(pool->fd);
return result;
}
void

View File

@ -498,6 +498,7 @@ anv_descriptor_set_destroy(struct anv_device *device,
struct surface_state_free_list_entry *entry =
set->buffer_views[b].surface_state.map;
entry->next = pool->surface_state_free_list;
entry->offset = set->buffer_views[b].surface_state.offset;
pool->surface_state_free_list = entry;
}

View File

@ -24,6 +24,7 @@
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
@ -162,8 +163,6 @@ anv_physical_device_init(struct anv_physical_device *device,
device->info.max_cs_threads = max_cs_threads;
}
close(fd);
brw_process_intel_debug_variable();
device->compiler = brw_compiler_create(NULL, &device->info);
@ -175,12 +174,15 @@ anv_physical_device_init(struct anv_physical_device *device,
device->compiler->shader_perf_log = compiler_perf_log;
result = anv_init_wsi(device);
if (result != VK_SUCCESS)
goto fail;
if (result != VK_SUCCESS) {
ralloc_free(device->compiler);
goto fail;
}
/* XXX: Actually detect bit6 swizzling */
isl_device_init(&device->isl_dev, &device->info, swizzled);
close(fd);
return VK_SUCCESS;
fail:
@ -527,7 +529,7 @@ void anv_GetPhysicalDeviceProperties(
.maxGeometryTotalOutputComponents = 1024,
.maxFragmentInputComponents = 128,
.maxFragmentOutputAttachments = 8,
.maxFragmentDualSrcAttachments = 2,
.maxFragmentDualSrcAttachments = 1,
.maxFragmentCombinedOutputResources = 8,
.maxComputeSharedMemorySize = 32768,
.maxComputeWorkGroupCount = { 65535, 65535, 65535 },
@ -967,10 +969,10 @@ void anv_DestroyDevice(
{
ANV_FROM_HANDLE(anv_device, device, _device);
anv_queue_finish(&device->queue);
anv_device_finish_blorp(device);
anv_queue_finish(&device->queue);
#ifdef HAVE_VALGRIND
/* We only need to free these to prevent valgrind errors. The backing
* BO will go away in a couple of lines so we don't actually leak.
@ -978,22 +980,27 @@ void anv_DestroyDevice(
anv_state_pool_free(&device->dynamic_state_pool, device->border_colors);
#endif
anv_scratch_pool_finish(device, &device->scratch_pool);
anv_gem_munmap(device->workaround_bo.map, device->workaround_bo.size);
anv_gem_close(device, device->workaround_bo.gem_handle);
anv_bo_pool_finish(&device->batch_bo_pool);
anv_state_pool_finish(&device->dynamic_state_pool);
anv_block_pool_finish(&device->dynamic_state_block_pool);
anv_state_pool_finish(&device->instruction_state_pool);
anv_block_pool_finish(&device->instruction_block_pool);
anv_state_pool_finish(&device->surface_state_pool);
anv_block_pool_finish(&device->surface_state_block_pool);
anv_scratch_pool_finish(device, &device->scratch_pool);
anv_state_pool_finish(&device->instruction_state_pool);
anv_block_pool_finish(&device->instruction_block_pool);
anv_state_pool_finish(&device->dynamic_state_pool);
anv_block_pool_finish(&device->dynamic_state_block_pool);
anv_bo_pool_finish(&device->batch_bo_pool);
pthread_cond_destroy(&device->queue_submit);
pthread_mutex_destroy(&device->mutex);
anv_gem_destroy_context(device, device->context_id);
close(device->fd);
pthread_mutex_destroy(&device->mutex);
vk_free(&device->alloc, device);
}
@ -1236,6 +1243,9 @@ VkResult anv_AllocateMemory(
mem->type_index = pAllocateInfo->memoryTypeIndex;
mem->map = NULL;
mem->map_size = 0;
*pMem = anv_device_memory_to_handle(mem);
return VK_SUCCESS;
@ -1257,6 +1267,9 @@ void anv_FreeMemory(
if (mem == NULL)
return;
if (mem->map)
anv_UnmapMemory(_device, _mem);
if (mem->bo.map)
anv_gem_munmap(mem->bo.map, mem->bo.size);
@ -1303,8 +1316,12 @@ VkResult anv_MapMemory(
/* Let's map whole pages */
map_size = align_u64(map_size, 4096);
mem->map = anv_gem_mmap(device, mem->bo.gem_handle,
map_offset, map_size, gem_flags);
void *map = anv_gem_mmap(device, mem->bo.gem_handle,
map_offset, map_size, gem_flags);
if (map == MAP_FAILED)
return vk_error(VK_ERROR_MEMORY_MAP_FAILED);
mem->map = map;
mem->map_size = map_size;
*ppData = mem->map + (offset - map_offset);
@ -1322,6 +1339,9 @@ void anv_UnmapMemory(
return;
anv_gem_munmap(mem->map, mem->map_size);
mem->map = NULL;
mem->map_size = 0;
}
static void

View File

@ -88,10 +88,8 @@ anv_gem_mmap(struct anv_device *device, uint32_t gem_handle,
};
int ret = anv_ioctl(device->fd, DRM_IOCTL_I915_GEM_MMAP, &gem_mmap);
if (ret != 0) {
/* FIXME: Is NULL the right error return? Cf MAP_INVALID */
return NULL;
}
if (ret != 0)
return MAP_FAILED;
VG(VALGRIND_MALLOCLIKE_BLOCK(gem_mmap.addr_ptr, gem_mmap.size, 0, 1));
return (void *)(uintptr_t) gem_mmap.addr_ptr;

View File

@ -194,8 +194,8 @@ make_surface(const struct anv_device *dev,
anv_finishme("Test gen8 multisampled HiZ");
} else {
isl_surf_get_hiz_surf(&dev->isl_dev, &image->depth_surface.isl,
&image->hiz_surface.isl);
add_surface(image, &image->hiz_surface);
&image->aux_surface.isl);
add_surface(image, &image->aux_surface);
}
}
@ -306,16 +306,16 @@ VkResult anv_BindImageMemory(
/* The offset and size must be a multiple of 4K or else the
* anv_gem_mmap call below will return NULL.
*/
assert((image->offset + image->hiz_surface.offset) % 4096 == 0);
assert(image->hiz_surface.isl.size % 4096 == 0);
assert((image->offset + image->aux_surface.offset) % 4096 == 0);
assert(image->aux_surface.isl.size % 4096 == 0);
/* HiZ surfaces need to have their memory cleared to 0 before they
* can be used. If we let it have garbage data, it can cause GPU
* hangs on some hardware.
*/
void *map = anv_gem_mmap(device, image->bo->gem_handle,
image->offset + image->hiz_surface.offset,
image->hiz_surface.isl.size,
image->offset + image->aux_surface.offset,
image->aux_surface.isl.size,
device->info.has_llc ? 0 : I915_MMAP_WC);
/* If anv_gem_mmap returns NULL, it's likely that the kernel was
@ -324,9 +324,9 @@ VkResult anv_BindImageMemory(
if (map == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
memset(map, 0, image->hiz_surface.isl.size);
memset(map, 0, image->aux_surface.isl.size);
anv_gem_munmap(map, image->hiz_surface.isl.size);
anv_gem_munmap(map, image->aux_surface.isl.size);
}
return VK_SUCCESS;

View File

@ -416,8 +416,8 @@ anv_state_clflush(struct anv_state state)
anv_clflush_range(state.map, state.alloc_size);
}
void anv_block_pool_init(struct anv_block_pool *pool,
struct anv_device *device, uint32_t block_size);
VkResult anv_block_pool_init(struct anv_block_pool *pool,
struct anv_device *device, uint32_t block_size);
void anv_block_pool_finish(struct anv_block_pool *pool);
int32_t anv_block_pool_alloc(struct anv_block_pool *pool);
int32_t anv_block_pool_alloc_back(struct anv_block_pool *pool);
@ -1526,10 +1526,11 @@ struct anv_image {
struct {
struct anv_surface depth_surface;
struct anv_surface hiz_surface;
struct anv_surface stencil_surface;
};
};
struct anv_surface aux_surface;
};
static inline uint32_t
@ -1593,11 +1594,11 @@ anv_image_get_surface_for_aspect_mask(const struct anv_image *image,
static inline bool
anv_image_has_hiz(const struct anv_image *image)
{
/* We must check the aspect because anv_image::hiz_surface belongs to
* a union.
/* We must check the aspect because anv_image::aux_surface may be used for
* any type of auxiliary surface, not just HiZ.
*/
return (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT) &&
image->hiz_surface.isl.size > 0;
image->aux_surface.isl.size > 0;
}
struct anv_buffer_view {

View File

@ -1 +1 @@
#define ANV_TIMESTAMP "1480346204"
#define ANV_TIMESTAMP "1483631584"

View File

@ -1356,22 +1356,13 @@ flush_compute_descriptor_set(struct anv_cmd_buffer *cmd_buffer)
result = emit_binding_table(cmd_buffer, MESA_SHADER_COMPUTE, &surfaces);
assert(result == VK_SUCCESS);
}
result = emit_samplers(cmd_buffer, MESA_SHADER_COMPUTE, &samplers);
assert(result == VK_SUCCESS);
struct anv_state push_state = anv_cmd_buffer_cs_push_constants(cmd_buffer);
const struct brw_cs_prog_data *cs_prog_data = get_cs_prog_data(pipeline);
const struct brw_stage_prog_data *prog_data = &cs_prog_data->base;
if (push_state.alloc_size) {
anv_batch_emit(&cmd_buffer->batch, GENX(MEDIA_CURBE_LOAD), curbe) {
curbe.CURBETotalDataLength = push_state.alloc_size;
curbe.CURBEDataStartAddress = push_state.offset;
}
}
const uint32_t slm_size = encode_slm_size(GEN_GEN, prog_data->total_shared);
struct anv_state state =
@ -1441,6 +1432,18 @@ genX(cmd_buffer_flush_compute_state)(struct anv_cmd_buffer *cmd_buffer)
cmd_buffer->state.descriptors_dirty &= ~VK_SHADER_STAGE_COMPUTE_BIT;
}
if (cmd_buffer->state.push_constants_dirty & VK_SHADER_STAGE_COMPUTE_BIT) {
struct anv_state push_state =
anv_cmd_buffer_cs_push_constants(cmd_buffer);
if (push_state.alloc_size) {
anv_batch_emit(&cmd_buffer->batch, GENX(MEDIA_CURBE_LOAD), curbe) {
curbe.CURBETotalDataLength = push_state.alloc_size;
curbe.CURBEDataStartAddress = push_state.offset;
}
}
}
cmd_buffer->state.compute_dirty = 0;
genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
@ -1796,10 +1799,10 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer *cmd_buffer)
if (has_hiz) {
anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_HIER_DEPTH_BUFFER), hdb) {
hdb.HierarchicalDepthBufferObjectControlState = GENX(MOCS);
hdb.SurfacePitch = image->hiz_surface.isl.row_pitch - 1;
hdb.SurfacePitch = image->aux_surface.isl.row_pitch - 1;
hdb.SurfaceBaseAddress = (struct anv_address) {
.bo = image->bo,
.offset = image->offset + image->hiz_surface.offset,
.offset = image->offset + image->aux_surface.offset,
};
#if GEN_GEN >= 8
/* From the SKL PRM Vol2a:
@ -1809,11 +1812,14 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer *cmd_buffer)
* - SURFTYPE_1D: distance in pixels between array slices
* - SURFTYPE_2D/CUBE: distance in rows between array slices
* - SURFTYPE_3D: distance in rows between R - slices
*
* Unfortunately, the docs aren't 100% accurate here. They fail to
* mention that the 1-D rule only applies to linear 1-D images.
* Since depth and HiZ buffers are always tiled, they are treated as
* 2-D images. Prior to Sky Lake, this field is always in rows.
*/
hdb.SurfaceQPitch =
image->hiz_surface.isl.dim == ISL_SURF_DIM_1D ?
isl_surf_get_array_pitch_el(&image->hiz_surface.isl) >> 2 :
isl_surf_get_array_pitch_el_rows(&image->hiz_surface.isl) >> 2;
isl_surf_get_array_pitch_el_rows(&image->aux_surface.isl) >> 2;
#endif
}
} else {

View File

@ -5692,7 +5692,7 @@ fs_visitor::optimize()
OPT(opt_algebraic);
OPT(opt_cse);
OPT(opt_copy_propagate);
OPT(opt_copy_propagation);
OPT(opt_predicated_break, this);
OPT(opt_cmod_propagation);
OPT(dead_code_eliminate);
@ -5716,7 +5716,7 @@ fs_visitor::optimize()
}
if (OPT(lower_d2x)) {
OPT(opt_copy_propagate);
OPT(opt_copy_propagation);
OPT(dead_code_eliminate);
}
@ -5728,12 +5728,12 @@ fs_visitor::optimize()
OPT(lower_logical_sends);
if (progress) {
OPT(opt_copy_propagate);
OPT(opt_copy_propagation);
/* Only run after logical send lowering because it's easier to implement
* in terms of physical sends.
*/
if (OPT(opt_zero_samples))
OPT(opt_copy_propagate);
OPT(opt_copy_propagation);
/* Run after logical send lowering to give it a chance to CSE the
* LOAD_PAYLOAD instructions created to construct the payloads of
* e.g. texturing messages in cases where it wasn't possible to CSE the
@ -5762,7 +5762,7 @@ fs_visitor::optimize()
if (devinfo->gen <= 5 && OPT(lower_minmax)) {
OPT(opt_cmod_propagation);
OPT(opt_cse);
OPT(opt_copy_propagate);
OPT(opt_copy_propagation);
OPT(dead_code_eliminate);
}

View File

@ -133,11 +133,11 @@ public:
bool opt_redundant_discard_jumps();
bool opt_cse();
bool opt_cse_local(bblock_t *block);
bool opt_copy_propagate();
bool opt_copy_propagation();
bool try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry);
bool try_constant_propagate(fs_inst *inst, acp_entry *entry);
bool opt_copy_propagate_local(void *mem_ctx, bblock_t *block,
exec_list *acp);
bool opt_copy_propagation_local(void *mem_ctx, bblock_t *block,
exec_list *acp);
bool opt_drop_redundant_mov_to_flags();
bool opt_register_renaming();
bool register_coalesce();

View File

@ -129,7 +129,7 @@ fs_copy_prop_dataflow::fs_copy_prop_dataflow(void *mem_ctx, cfg_t *cfg,
foreach_in_list(acp_entry, entry, &out_acp[block->num][i]) {
acp[next_acp] = entry;
/* opt_copy_propagate_local populates out_acp with copies created
/* opt_copy_propagation_local populates out_acp with copies created
* in a block which are still live at the end of the block. This
* is exactly what we want in the COPY set.
*/
@ -431,7 +431,9 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
if (entry->saturate) {
switch(inst->opcode) {
case BRW_OPCODE_SEL:
if (inst->src[1].file != IMM ||
if ((inst->conditional_mod != BRW_CONDITIONAL_GE &&
inst->conditional_mod != BRW_CONDITIONAL_L) ||
inst->src[1].file != IMM ||
inst->src[1].f < 0.0 ||
inst->src[1].f > 1.0) {
return false;
@ -735,8 +737,8 @@ can_propagate_from(fs_inst *inst)
* list.
*/
bool
fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
exec_list *acp)
fs_visitor::opt_copy_propagation_local(void *copy_prop_ctx, bblock_t *block,
exec_list *acp)
{
bool progress = false;
@ -819,7 +821,7 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
}
bool
fs_visitor::opt_copy_propagate()
fs_visitor::opt_copy_propagation()
{
bool progress = false;
void *copy_prop_ctx = ralloc_context(NULL);
@ -832,8 +834,8 @@ fs_visitor::opt_copy_propagate()
* the set of copies available at the end of the block.
*/
foreach_block (block, cfg) {
progress = opt_copy_propagate_local(copy_prop_ctx, block,
out_acp[block->num]) || progress;
progress = opt_copy_propagation_local(copy_prop_ctx, block,
out_acp[block->num]) || progress;
}
/* Do dataflow analysis for those available copies. */
@ -852,7 +854,8 @@ fs_visitor::opt_copy_propagate()
}
}
progress = opt_copy_propagate_local(copy_prop_ctx, block, in_acp) || progress;
progress = opt_copy_propagation_local(copy_prop_ctx, block, in_acp) ||
progress;
}
for (int i = 0; i < cfg->num_blocks; i++)

View File

@ -780,7 +780,13 @@ brw_compile_gs(const struct brw_compiler *compiler, void *log_data,
if (compiler->devinfo->gen >= 8)
output_size_bytes += 32;
assert(output_size_bytes >= 1);
/* Shaders can technically set max_vertices = 0, at which point we
* may have a URB size of 0 bytes. Nothing good can come from that,
* so enforce a minimum size.
*/
if (output_size_bytes == 0)
output_size_bytes = 1;
unsigned max_output_size_bytes = GEN7_MAX_GS_URB_ENTRY_SIZE_BYTES;
if (compiler->devinfo->gen == 6)
max_output_size_bytes = GEN6_MAX_GS_URB_ENTRY_SIZE_BYTES;

View File

@ -2159,6 +2159,8 @@ intel_miptree_make_shareable(struct brw_context *brw,
intel_miptree_release(&mt->mcs_mt);
mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_NO_MCS;
}
mt->disable_aux_buffers = true;
}

View File

@ -0,0 +1,213 @@
/*
* Copyright © 2016 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include <gtest/gtest.h>
#include "brw_fs.h"
#include "brw_cfg.h"
#include "program/program.h"
using namespace brw;
class copy_propagation_test : public ::testing::Test {
virtual void SetUp();
public:
struct brw_compiler *compiler;
struct gen_device_info *devinfo;
struct gl_context *ctx;
struct brw_wm_prog_data *prog_data;
struct gl_shader_program *shader_prog;
fs_visitor *v;
};
class copy_propagation_fs_visitor : public fs_visitor
{
public:
copy_propagation_fs_visitor(struct brw_compiler *compiler,
struct brw_wm_prog_data *prog_data,
nir_shader *shader)
: fs_visitor(compiler, NULL, NULL, NULL,
&prog_data->base, (struct gl_program *) NULL,
shader, 8, -1) {}
};
void copy_propagation_test::SetUp()
{
ctx = (struct gl_context *)calloc(1, sizeof(*ctx));
compiler = (struct brw_compiler *)calloc(1, sizeof(*compiler));
devinfo = (struct gen_device_info *)calloc(1, sizeof(*devinfo));
compiler->devinfo = devinfo;
prog_data = ralloc(NULL, struct brw_wm_prog_data);
nir_shader *shader =
nir_shader_create(NULL, MESA_SHADER_FRAGMENT, NULL);
v = new copy_propagation_fs_visitor(compiler, prog_data, shader);
devinfo->gen = 4;
}
static fs_inst *
instruction(bblock_t *block, int num)
{
fs_inst *inst = (fs_inst *)block->start();
for (int i = 0; i < num; i++) {
inst = (fs_inst *)inst->next;
}
return inst;
}
static bool
copy_propagation(fs_visitor *v)
{
const bool print = getenv("TEST_DEBUG");
if (print) {
fprintf(stderr, "= Before =\n");
v->cfg->dump(v);
}
bool ret = v->opt_copy_propagation();
if (print) {
fprintf(stderr, "\n= After =\n");
v->cfg->dump(v);
}
return ret;
}
TEST_F(copy_propagation_test, basic)
{
const fs_builder &bld = v->bld;
fs_reg vgrf0 = v->vgrf(glsl_type::float_type);
fs_reg vgrf1 = v->vgrf(glsl_type::float_type);
fs_reg vgrf2 = v->vgrf(glsl_type::float_type);
fs_reg vgrf3 = v->vgrf(glsl_type::float_type);
bld.MOV(vgrf0, vgrf2);
bld.ADD(vgrf1, vgrf0, vgrf3);
/* = Before =
*
* 0: mov(8) vgrf0 vgrf2
* 1: add(8) vgrf1 vgrf0 vgrf3
*
* = After =
* 0: mov(8) vgrf0 vgrf2
* 1: add(8) vgrf1 vgrf2 vgrf3
*/
v->calculate_cfg();
bblock_t *block0 = v->cfg->blocks[0];
EXPECT_EQ(0, block0->start_ip);
EXPECT_EQ(1, block0->end_ip);
EXPECT_TRUE(copy_propagation(v));
EXPECT_EQ(0, block0->start_ip);
EXPECT_EQ(1, block0->end_ip);
fs_inst *mov = instruction(block0, 0);
EXPECT_EQ(BRW_OPCODE_MOV, mov->opcode);
EXPECT_TRUE(mov->dst.equals(vgrf0));
EXPECT_TRUE(mov->src[0].equals(vgrf2));
fs_inst *add = instruction(block0, 1);
EXPECT_EQ(BRW_OPCODE_ADD, add->opcode);
EXPECT_TRUE(add->dst.equals(vgrf1));
EXPECT_TRUE(add->src[0].equals(vgrf2));
EXPECT_TRUE(add->src[1].equals(vgrf3));
}
TEST_F(copy_propagation_test, maxmax_sat_imm)
{
const fs_builder &bld = v->bld;
fs_reg vgrf0 = v->vgrf(glsl_type::float_type);
fs_reg vgrf1 = v->vgrf(glsl_type::float_type);
fs_reg vgrf2 = v->vgrf(glsl_type::float_type);
static const struct {
enum brw_conditional_mod conditional_mod;
float immediate;
bool expected_result;
} test[] = {
/* conditional mod, imm, expected_result */
{ BRW_CONDITIONAL_GE , 0.1f, true },
{ BRW_CONDITIONAL_L , 0.1f, true },
{ BRW_CONDITIONAL_GE , 0.5f, true },
{ BRW_CONDITIONAL_L , 0.5f, true },
{ BRW_CONDITIONAL_GE , 0.9f, true },
{ BRW_CONDITIONAL_L , 0.9f, true },
{ BRW_CONDITIONAL_GE , -1.5f, false },
{ BRW_CONDITIONAL_L , -1.5f, false },
{ BRW_CONDITIONAL_GE , 1.5f, false },
{ BRW_CONDITIONAL_L , 1.5f, false },
{ BRW_CONDITIONAL_NONE, 0.5f, false },
{ BRW_CONDITIONAL_Z , 0.5f, false },
{ BRW_CONDITIONAL_NZ , 0.5f, false },
{ BRW_CONDITIONAL_G , 0.5f, false },
{ BRW_CONDITIONAL_LE , 0.5f, false },
{ BRW_CONDITIONAL_R , 0.5f, false },
{ BRW_CONDITIONAL_O , 0.5f, false },
{ BRW_CONDITIONAL_U , 0.5f, false },
};
for (unsigned i = 0; i < sizeof(test) / sizeof(test[0]); i++) {
fs_inst *mov = set_saturate(true, bld.MOV(vgrf0, vgrf1));
fs_inst *sel = set_condmod(test[i].conditional_mod,
bld.SEL(vgrf2, vgrf0,
brw_imm_f(test[i].immediate)));
v->calculate_cfg();
bblock_t *block0 = v->cfg->blocks[0];
EXPECT_EQ(0, block0->start_ip);
EXPECT_EQ(1, block0->end_ip);
EXPECT_EQ(test[i].expected_result, copy_propagation(v));
EXPECT_EQ(0, block0->start_ip);
EXPECT_EQ(1, block0->end_ip);
EXPECT_EQ(BRW_OPCODE_MOV, mov->opcode);
EXPECT_TRUE(mov->saturate);
EXPECT_TRUE(mov->dst.equals(vgrf0));
EXPECT_TRUE(mov->src[0].equals(vgrf1));
EXPECT_EQ(BRW_OPCODE_SEL, sel->opcode);
EXPECT_EQ(test[i].conditional_mod, sel->conditional_mod);
EXPECT_EQ(test[i].expected_result, sel->saturate);
EXPECT_TRUE(sel->dst.equals(vgrf2));
if (test[i].expected_result) {
EXPECT_TRUE(sel->src[0].equals(vgrf1));
} else {
EXPECT_TRUE(sel->src[0].equals(vgrf0));
}
EXPECT_TRUE(sel->src[1].equals(brw_imm_f(test[i].immediate)));
delete v->cfg;
v->cfg = NULL;
}
}

View File

@ -925,7 +925,7 @@ valid_draw_indirect(struct gl_context *ctx,
* buffer bound.
*/
if (_mesa_is_gles31(ctx) &&
ctx->Array.VAO->_Enabled != ctx->Array.VAO->VertexAttribBufferMask) {
ctx->Array.VAO->_Enabled & ~ctx->Array.VAO->VertexAttribBufferMask) {
_mesa_error(ctx, GL_INVALID_OPERATION, "%s(No VBO bound)", name);
return GL_FALSE;
}

View File

@ -2850,6 +2850,7 @@ reuse_framebuffer_texture_attachment(struct gl_framebuffer *fb,
dst_att->Type = src_att->Type;
dst_att->Complete = src_att->Complete;
dst_att->TextureLevel = src_att->TextureLevel;
dst_att->CubeMapFace = src_att->CubeMapFace;
dst_att->Zoffset = src_att->Zoffset;
dst_att->Layered = src_att->Layered;
}

View File

@ -67,9 +67,7 @@ supported_interface_enum(struct gl_context *ctx, GLenum iface)
}
static struct gl_shader_program *
lookup_linked_program(GLuint program,
const char *caller,
bool raise_link_error)
lookup_linked_program(GLuint program, const char *caller)
{
GET_CURRENT_CONTEXT(ctx);
struct gl_shader_program *prog =
@ -79,66 +77,13 @@ lookup_linked_program(GLuint program,
return NULL;
if (prog->LinkStatus == GL_FALSE) {
if (raise_link_error)
_mesa_error(ctx, GL_INVALID_OPERATION, "%s(program not linked)",
caller);
_mesa_error(ctx, GL_INVALID_OPERATION, "%s(program not linked)",
caller);
return NULL;
}
return prog;
}
static GLenum
stage_from_program_interface(GLenum programInterface)
{
switch(programInterface) {
case GL_VERTEX_SUBROUTINE_UNIFORM:
return MESA_SHADER_VERTEX;
case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
return MESA_SHADER_TESS_CTRL;
case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
return MESA_SHADER_TESS_EVAL;
case GL_GEOMETRY_SUBROUTINE_UNIFORM:
return MESA_SHADER_GEOMETRY;
case GL_FRAGMENT_SUBROUTINE_UNIFORM:
return MESA_SHADER_FRAGMENT;
case GL_COMPUTE_SUBROUTINE_UNIFORM:
return MESA_SHADER_COMPUTE;
default:
unreachable("unexpected programInterface value");
}
}
static struct gl_linked_shader *
lookup_linked_shader(GLuint program,
GLenum programInterface,
const char *caller)
{
struct gl_shader_program *shLinkedProg =
lookup_linked_program(program, caller, false);
gl_shader_stage stage = stage_from_program_interface(programInterface);
if (!shLinkedProg)
return NULL;
return shLinkedProg->_LinkedShaders[stage];
}
static bool
is_subroutine_uniform_program_interface(GLenum programInterface)
{
switch(programInterface) {
case GL_VERTEX_SUBROUTINE_UNIFORM:
case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
case GL_GEOMETRY_SUBROUTINE_UNIFORM:
case GL_FRAGMENT_SUBROUTINE_UNIFORM:
case GL_COMPUTE_SUBROUTINE_UNIFORM:
return true;
default:
return false;
}
}
void GLAPIENTRY
_mesa_GetProgramInterfaceiv(GLuint program, GLenum programInterface,
GLenum pname, GLint *params)
@ -174,49 +119,9 @@ _mesa_GetProgramInterfaceiv(GLuint program, GLenum programInterface,
/* Validate pname against interface. */
switch(pname) {
case GL_ACTIVE_RESOURCES:
if (is_subroutine_uniform_program_interface(programInterface)) {
/* ARB_program_interface_query doesn't explicitly says that those
* uniforms would need a linked shader, or that should fail if it is
* not the case, but Section 7.6 (Uniform Variables) of the OpenGL
* 4.4 Core Profile says:
*
* "A uniform is considered an active uniform if the compiler and
* linker determine that the uniform will actually be accessed
* when the executable code is executed. In cases where the
* compiler and linker cannot make a conclusive determination,
* the uniform will be considered active."
*
* So in order to know the real number of active subroutine uniforms
* we would need a linked shader .
*
* At the same time, Section 7.3 (Program Objects) of the OpenGL 4.4
* Core Profile says:
*
* "The GL provides various commands allowing applications to
* enumerate and query properties of active variables and in-
* terface blocks for a specified program. If one of these
* commands is called with a program for which LinkProgram
* succeeded, the information recorded when the program was
* linked is returned. If one of these commands is called with a
* program for which LinkProgram failed, no error is generated
* unless otherwise noted."
* <skip>
* "If one of these commands is called with a program for which
* LinkProgram had never been called, no error is generated
* unless otherwise noted, and the program object is considered
* to have no active variables or interface blocks."
*
* So if the program is not linked we will return 0.
*/
struct gl_linked_shader *sh =
lookup_linked_shader(program, programInterface, "glGetProgramInterfaceiv");
*params = sh ? sh->NumSubroutineUniforms : 0;
} else {
for (i = 0, *params = 0; i < shProg->NumProgramResourceList; i++)
if (shProg->ProgramResourceList[i].Type == programInterface)
(*params)++;
}
for (i = 0, *params = 0; i < shProg->NumProgramResourceList; i++)
if (shProg->ProgramResourceList[i].Type == programInterface)
(*params)++;
break;
case GL_MAX_NAME_LENGTH:
if (programInterface == GL_ATOMIC_COUNTER_BUFFER ||
@ -500,7 +405,7 @@ _mesa_GetProgramResourceLocation(GLuint program, GLenum programInterface,
}
struct gl_shader_program *shProg =
lookup_linked_program(program, "glGetProgramResourceLocation", true);
lookup_linked_program(program, "glGetProgramResourceLocation");
if (!shProg || !name)
return -1;
@ -556,7 +461,7 @@ _mesa_GetProgramResourceLocationIndex(GLuint program, GLenum programInterface,
}
struct gl_shader_program *shProg =
lookup_linked_program(program, "glGetProgramResourceLocationIndex", true);
lookup_linked_program(program, "glGetProgramResourceLocationIndex");
if (!shProg || !name)
return -1;

View File

@ -65,6 +65,7 @@ wsi_queue_init(struct wsi_queue *queue, int length)
if (ret)
goto fail_cond;
pthread_condattr_destroy(&condattr);
return 0;
fail_cond: