// Copyright 2013 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*

Package pointer implements Andersen's analysis, an inclusion-based
pointer analysis algorithm first described in (Andersen, 1994).

The implementation is similar to that described in (Pearce et al,
PASTE'04).  Unlike many algorithms which interleave constraint
generation and solving, constructing the callgraph as they go, this
implementation has a strict phase ordering: generation before
solving.  Only simple (copy) constraints may be generated during
solving.  This improves the traction of presolver optimisations, but
imposes certain restrictions, e.g. potential context sensitivity is
limited since all variants must be created a priori.

We intend to add various presolving optimisations such as Pointer and
Location Equivalence from (Hardekopf & Lin, SAS '07) and solver
optimisations such as Hybrid- and Lazy-Cycle Detection from
(Hardekopf & Lin, PLDI'07).


TERMINOLOGY

We occasionally use C's x->f notation to distinguish the case where x
is a struct pointer from x.f, where x is a struct value.


NODES

Nodes are the key data structure of the analysis, and have a dual
role: they represent both constraint variables (equivalence classes
of pointers) and members of points-to sets (things that can be
pointed at, i.e. "labels").

Nodes are naturally numbered.  The numbering enables compact
representations of sets of nodes such as bitvectors or BDDs, and the
ordering enables a very cheap way to group related nodes together.
(For example, passing n parameters consists of generating n parallel
constraints from caller+i to callee+i for 0<=i<n.)


Structs

The nodes of a struct consist of a special 'identity' node (whose
type is that of the struct itself), followed by the nodes for all the
struct's fields, recursively flattened out.  A pointer to the struct
is a pointer to its identity node.  That node allows us to
distinguish a pointer to a struct from a pointer to its first field.

Field offsets are currently the logical field offsets (plus one for
the identity node), so the sizes of the fields can be ignored by the
analysis.

Sound treatment of unsafe.Pointer conversions (not yet implemented)
would require us to model memory layout using physical field offsets
to ascertain which object field(s) might be aliased by a given
FieldAddr of a different base pointer type.  It would also require us
to dispense with the identity node.

*ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.

*ssa.FieldAddr y = &x->f requires a dynamic closure rule to create
  simple edges for each struct discovered in pts(x).

Both rules are illustrated by the small example following the Arrays
section below.


Arrays

We model an array by an identity node (whose type is that of the
array itself) followed by a node representing all the elements of the
array; the analysis does not distinguish elements with different
indices.  Effectively, an array is treated like struct{elem T}, a
load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the
index i is ignored.

A pointer to an array is a pointer to its identity node.  (A slice is
also a pointer to an array's identity node.)  The identity node
allows us to distinguish a pointer to an array from a pointer to one
of its elements, but it is rather costly because it introduces more
offset constraints into the system.  Furthermore, sound treatment of
unsafe.Pointer would require us to dispense with this node.

Arrays may be allocated by Alloc, by make([]T), by calls to append,
and via reflection.
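The following fragment is a minimal, purely hypothetical illustration
of the struct and array modelling described above (it is not taken
from this package or any real client); the offsets mentioned in the
comments follow the identity-node-plus-logical-field-offset scheme:

	package example // hypothetical illustration only

	type T struct {
		A *int
		B *int
	}

	// Under the scheme above, T flattens to three consecutive nodes:
	// offset 0: identity node (what a *T points to)
	// offset 1: field A
	// offset 2: field B
	func f(x T, p *T, a [4]*int, i int) (*int, **int, *int) {
		y := x.B  // *ssa.Field: simple edge to y from x's node at offset 2
		q := &p.B // *ssa.FieldAddr: dynamic rule; for each object in pts(p),
		          // q receives a pointer to that object's node at offset 2
		z := a[i] // modelled as z = a.elem; the index i is ignored
		return y, q, z
	}

Note that p itself points to an object's identity node, which is what
keeps a pointer to the whole struct distinct from a pointer to its
first field A.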
Tuples (T, ...)

Tuples are treated like structs with naturally numbered fields.
*ssa.Extract is analogous to *ssa.Field.  However, tuples have no
identity node since, by construction, they cannot be address-taken.


FUNCTION CALLS

There are three kinds of function call:
(1) static "call"-mode calls of functions.
(2) dynamic "call"-mode calls of functions.
(3) dynamic "invoke"-mode calls of interface methods.
Cases 1 and 2 apply equally to methods and standalone functions.  (A
small source-level example of all three kinds appears below, just
before FURTHER READING.)

Static calls

A static call consists of three steps:
- finding the function object of the callee;
- creating copy edges from the actual parameter value nodes to the
  params block in the function object (this includes the receiver if
  the callee is a method);
- creating copy edges from the results block in the function object
  to the value nodes for the result of the call.

Context sensitivity

Static calls (alone) may be treated context sensitively, i.e. each
callsite may cause a distinct re-analysis of the callee, improving
precision.  Our current context-sensitivity policy treats all
intrinsics and getter/setter methods in this manner since such
functions are small and seem like an obvious source of spurious
confluences, though this has not yet been evaluated.

Dynamic function calls

Dynamic calls work in a similar manner except that the creation of
copy edges occurs dynamically, in a similar fashion to a pair of
struct copies:

	*fn->params = callargs
	callresult = *fn->results

(Recall that the function object's params and results blocks are
contiguous.)

Interface method invocation

For invoke-mode calls, we create a params/results block for the
callsite and attach a dynamic closure rule to the interface.  For
each new interface object that flows to the interface, we look up the
concrete method, find its function object, and connect its P/R block
to the callsite's P/R block.

Recording call targets

The analysis notifies its clients of each callsite it encounters,
passing a CallSite interface.  Among other things, the CallSite
contains a synthetic constraint variable ("targets") whose points-to
solution includes the set of all function objects to which the call
may dispatch.  It is via this mechanism that the callgraph is made
available.  Clients may also elect to be notified of callgraph edges
directly; internally this just iterates all "targets" variables'
pts(·)s.


SOLVER

The solver is currently a very naive Andersen-style implementation,
although it does use difference propagation (Pearce et al, SQC'04).
There is much to do.
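To make the three kinds of call described under FUNCTION CALLS
concrete, here is a small, purely hypothetical fragment (it is not
taken from this package or any real client) with one callsite of each
kind:

	package example // hypothetical illustration only

	type I interface{ M() }

	type T struct{}

	func (T) M() {}

	func g() {}

	func calls(fn func(), i I) {
		g()   // (1) static "call"-mode call of a function
		fn()  // (2) dynamic "call"-mode call through a function value:
		      //     copy edges are added for each function in pts(fn)
		i.M() // (3) dynamic "invoke"-mode call: each interface object that
		      //     flows to i contributes its concrete M method to the
		      //     callsite's "targets" variable
	}

For case (2), the copy edges follow the *fn->params = callargs and
callresult = *fn->results rules above; for case (3), the concrete
method's P/R block is connected to the callsite's P/R block as each
new interface object is discovered.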
FURTHER READING

Andersen, L. O. 1994. Program analysis and specialization for the C
programming language. Ph.D. dissertation. DIKU, University of
Copenhagen.

David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Efficient
field-sensitive pointer analysis for C. In Proceedings of the 5th ACM
SIGPLAN-SIGSOFT workshop on Program analysis for software tools and
engineering (PASTE '04). ACM, New York, NY, USA, 37-42.
http://doi.acm.org/10.1145/996821.996835

David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online
Cycle Detection and Difference Propagation: Applications to Pointer
Analysis. Software Quality Control 12, 4 (December 2004), 311-337.
http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2

David Grove and Craig Chambers. 2001. A framework for call graph
construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6
(November 2001), 685-746. http://doi.acm.org/10.1145/506315.506316

Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast
and accurate pointer analysis for millions of lines of code. In
Proceedings of the 2007 ACM SIGPLAN conference on Programming
language design and implementation (PLDI '07). ACM, New York, NY,
USA, 290-299. http://doi.acm.org/10.1145/1250734.1250767

Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location
equivalence to optimize pointer analysis. In Proceedings of the 14th
international conference on Static Analysis (SAS'07), Hanne Riis
Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin,
Heidelberg, 265-280.

*/
package pointer