/*

Package pointer implements Andersen's analysis, an inclusion-based
pointer analysis algorithm first described in (Andersen, 1994).

The implementation is similar to that described in (Pearce et al,
PASTE'04).  Unlike many algorithms which interleave constraint
generation and solving, constructing the callgraph as they go, this
implementation has a strict phase ordering: generation before solving.
Only simple (copy) constraints may be generated during solving.  This
improves the traction of presolver optimisations, but imposes certain
restrictions, e.g. potential context sensitivity is limited since all
variants must be created a priori.

We intend to add various presolving optimisations such as Pointer and
Location Equivalence from (Hardekopf & Lin, SAS '07) and solver
optimisatisions such as Hybrid- and Lazy- Cycle Detection from
(Hardekopf & Lin, PLDI'07),


TERMINOLOGY

We occasionally use C's x->f notation to distinguish the case where x
is a struct pointer from x.f where is a struct value.


NODES

Nodes are the key datastructure of the analysis, and have a dual role:
they represent both constraint variables (equivalence classes of
pointers) and members of points-to sets (things that can be pointed
at, i.e. "labels").

Nodes are naturally numbered.  The numbering enables compact
representations of sets of nodes such as bitvectors or BDDs; and the
ordering enables a very cheap way to group related nodes together.
(For example, passing n parameters consists of generating n parallel
constraints from caller+i to callee+i for 0<=i<n.)

The zero nodeid means "not a pointer".  Currently it's only used for
struct{} or ().  We generate all flow constraints, even for non-pointer
types, with the expectations that (a) presolver optimisations will
quickly collapse all the non-pointer ones and (b) we may get more
precise results by treating uintptr as a possible pointer.

Each node represents a scalar (word-sized) part of a value or object.
Aggregate types (structs, tuples, arrays) are recursively flattened
out into a sequential list of scalar component types, and all the
elements of an array are represented by a single node.  (The
flattening of a basic type is a list containing a single node.)

Nodes are connected into a graph with various kinds of labelled edges:
simple edges (or copy constraints) represent value flow.  Complex
edges (load, store, etc) trigger the creation of new simple edges
during the solving phase.


OBJECTS

An "object" is a contiguous sequence of nodes denoting an addressable
location: something that a pointer can point to.  The first node of an
object has the ntObject flag, and its size indicates the extent of the
object.

Objects include:
   - functions and globals;
   - variable allocations in the stack frame or heap;
   - maps, channels and slices created by calls to make();
   - allocations to construct an interface;
   - allocations caused by literals and conversions,
     e.g. []byte("foo"), []byte(str).
   - arrays allocated by calls to append();

Many objects have no Go types.  For example, the func, map and chan
type kinds in Go are all varieties of pointers, but the respective
objects are actual functions, maps, and channels.  Given the way we
model interfaces, they too are pointers to interface objects with no
Go type.  And an *ssa.Global denotes the address of a global variable,
but the object for a Global is the actual data.  So, types of objects
are usually "off by one indirection".

The individual nodes of an object are sometimes referred to as
"labels".

Objects containing no nodes (i.e. just empty structs; tuples may be
values but never objects in Go) are padded with an invalid-type node
to have at least one node so that there is something to point to.
(TODO(adonovan): I think this is unnecessary now that we have identity
nodes; check.)


ANALYSIS ABSTRACTION OF EACH TYPE

Variables of the following "scalar" types may be represented by a
single node: basic types, pointers, channels, maps, slices, 'func'
pointers, interfaces.

Pointers
  Nothing to say here.

Basic types (bool, string, numbers, unsafe.Pointer)
  Currently all fields in the flattening of a type, including
  non-pointer basic types such as int, are represented in objects and
  values.  Though non-pointer nodes within values are uninteresting,
  non-pointer nodes in objects may be useful (if address-taken)
  because they permit the analysis to deduce, in this example,

     var s struct{ ...; x int; ... }
     p := &s.x

  that p points to s.x.  If we ignored such object fields, we could only
  say that p points somewhere within s.

  All other basic types are ignored.  Expressions of these types have
  zero nodeid, and fields of these types within aggregate other types
  are omitted.

  unsafe.Pointer conversions are not yet modelled as pointer
  conversions.  Consequently uintptr is always a number and uintptr
  nodes do not point to any object.

Channels
  An expression of type 'chan T' is a kind of pointer that points
  exclusively to channel objects, i.e. objects created by MakeChan (or
  reflection).

  'chan T' is treated like *T.
  *ssa.MakeChan is treated as equivalent to new(T).
  *ssa.Send and receive (*ssa.UnOp(ARROW)) and are equivalent to store
   and load.

Maps
  An expression of type 'map[K]V' is a kind of pointer that points
  exclusively to map objects, i.e. objects created by MakeMap (or
  reflection).

  map K[V] is treated like *M where M = struct{k K; v V}.
  *ssa.MakeMap is equivalent to new(M).
  *ssa.MapUpdate is equivalent to *y=x where *y and x have type M.
  *ssa.Lookup is equivalent to y=x.v where x has type *M.

Slices
  A slice []T, which dynamically resembles a struct{array *T, len, cap int},
  is treated as if it were just a *T pointer; the len and cap fields are
  ignored.

  *ssa.MakeSlice is treated like new([1]T): an allocation of a
   singleton array.
  *ssa.Index on a slice is equivalent to a load.
  *ssa.IndexAddr on a slice returns the address of the sole element of the
  slice, i.e. the same address.
  *ssa.Slice is treated as a simple copy.

Functions
  An expression of type 'func...' is a kind of pointer that points
  exclusively to function objects.

  A function object has the following layout:

     identity         -- typ:*types.Signature; flags ⊇ {ntFunction}; data:*cgnode
     params_0         -- (the receiver, if a method)
     ...
     params_n-1
     results_0
     ...
     results_m-1

  There may be multiple function objects for the same *ssa.Function
  due to context-sensitive treatment of some functions.

  The first node is the function's identity node; its .data is the
  callgraph node (*cgnode) this object represents.
  Associated with every callsite is a special "targets" variable,
  whose pts(·) contains the identity node of each function to which
  the call may dispatch.  Identity words are not otherwise used.

  The following block of nodes represent the flattened-out types of
  the parameters and results of the function object, and are
  collectively known as its "P/R block".

  The treatment of free variables of closures (*ssa.Capture) is like
  that of global variables; it is not context-sensitive.
  *ssa.MakeClosure instructions create copy edges to Captures.

  A Go value of type 'func' (i.e. a pointer to one or more functions)
  is a pointer whose pts() contains function objects.  The valueNode()
  for an *ssa.Function returns a singleton for that function.

Interfaces
 An expression of type 'interface{...}' is a kind of pointer that
 points exclusively to interface objects.

 An interface object has the following layout:

    conctype          -- flags ⊇ {ntInterface}; data:*ssa.MakeInterface?
    value
    ...

 The conctype node's typ field is the concrete type of the interface
 value which follows, flattened out.  It has the ntInterface flag.
 Its associated data is the originating MakeInterface instruction, if
 any.

 Constructing an interface value causes generation of constraints for
 all of the concrete type's methods; we can't tell a priori which ones
 may be called.

 TypeAssert y = x.(T) is implemented by a dynamic filter triggered by
 each interface object E added to pts(x).  If T is an interface that
 E.conctype implements, pts(y) gets E.  If T is a concrete type then
 edge pts(y) <- E.value is added.

 ChangeInterface is a simple copy because the representation of
 interface objects is independent of the interface type (in contrast
 to the "method tables" approach used by the gc runtime).

 y := Invoke x.m(...) is implemented by allocating a contiguous P/R
 block for the callsite and adding a dynamic rule triggered by each
 interface object E added to pts(x).  The rule adds param/results copy
 edges to/from each discovered concrete method.

 (Q. Why do we model an interface as a pointer to a pair of type and
  value, rather than as a pair of a pointer to type and a pointer to
  value?
  A. Control-flow joins would merge interfaces ({T1}, {V1}) and ({T2},
  {V2}) to make ({T1,T2}, {V1,V2}), leading to the infeasible and
  type-unsafe combination (T1,V2).  Treating the value and its concrete
  type as inseparable makes the analysis type-safe.)


Aggregate types:

Aggregate types are treated as if all directly contained
aggregates are recursively flattened out.

Structs
  *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.

  *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create
   simple edges for each struct discovered in pts(x).

  The nodes of a struct consist of a special 'identity' node (whose
  type is that of the struct itself), followed by the nodes for all
  the struct's fields, recursively flattened out.  A pointer to the
  struct is a pointer to its identity node.  That node allows us to
  distinguish a pointer to a struct from a pointer to its first field.

  Field offsets are currently the logical field offsets (plus one for
  the identity node), so the sizes of the fields can be ignored by the
  analysis.

  Sound treatment of unsafe.Pointer conversions (not yet implemented)
  would require us to model memory layout using physical field offsets
  to ascertain which object field(s) might be aliased by a given
  FieldAddr of a different base pointer type.  It would also require
  us to dispense with the identity node.

  *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.

  *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create
   simple edges for each struct discovered in pts(x).

Arrays
  We model an array by an identity node (whose type is that of the
  array itself) followed by a node representing all the elements of
  the array; the analysis does not distinguish elements with different
  indices.  Effectively, an array is treated like struct{elem T}, a
  load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the
  index i is ignored.

  A pointer to an array is pointer to its identity node.  (A slice is
  also a pointer to an array's identity node.)  The identity node
  allows us to distinguish a pointer to an array from a pointer to one
  of its elements, but it is rather costly because it introduces more
  offset constraints into the system.  Furthermore, sound treatment of
  unsafe.Pointer would require us to dispense with this node.

  Arrays may be allocated by Alloc, by make([]T), by calls to append,
  and via reflection.

Tuples (T, ...)
  Tuples are treated like structs with naturally numbered fields.
  *ssa.Extract is analogous to *ssa.Field.

  However, tuples have no identity field since by construction, they
  cannot be address-taken.


FUNCTION CALLS

  There are three kinds of function call:
  (1) static "call"-mode calls of functions.
  (2) dynamic "call"-mode calls of functions.
  (3) dynamic "invoke"-mode calls of interface methods.
  Cases 1 and 2 apply equally to methods and standalone functions.

  Static calls.
    A static call consists three steps:
    - finding the function object of the callee;
    - creating copy edges from the actual parameter value nodes to the
      params block in the function object (this includes the receiver
      if the callee is a method);
    - creating copy edges from the results block in the function
      object to the value nodes for the result of the call.

    Context sensitivity

      Static calls (alone) may be treated context sensitively,
      i.e. each callsite may cause a distinct re-analysis of the
      callee, improving precision.  Our current context-sensitivity
      policy treats all intrinsics and getter/setter methods in this
      manner since such functions are small and seem like an obvious
      source of spurious confluences, though this has not yet been
      evaluated.

  Dynamic function calls

    Dynamic calls work in a similar manner except that the creation of
    copy edges occurs dynamically, in a similar fashion to a pair of
    struct copies:

      *fn->params = callargs
      callresult = *fn->results

    (Recall that the function object's params and results blocks are
    contiguous.)

  Interface method invocation

    For invoke-mode calls, we create a params/results block for the
    callsite and attach a dynamic closure rule to the interface.  For
    each new interface object that flows to the interface, we look up
    the concrete method, find its function object, and connect its P/R
    block to the callsite's P/R block.

  Recording call targets

    The analysis notifies its clients of each callsite it encounters,
    passing a CallSite interface.  Among other things, the CallSite
    contains a synthetic constraint variable ("targets") whose
    points-to solution includes the set of all function objects to
    which the call may dispatch.

    It is via this mechanism that the callgraph is made available.
    Clients may also elect to be notified of callgraph edges directly;
    internally this just iterates all "targets" variables' pts(·)s.


SOLVER

The solver is currently a very naive Andersen-style implementation,
although it does use difference propagation (Pearce et al, SQC'04).
There is much to do.


FURTHER READING.

Andersen, L. O. 1994. Program analysis and specialization for the C
programming language. Ph.D. dissertation. DIKU, University of
Copenhagen.

David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004.  Efficient
field-sensitive pointer analysis for C. In Proceedings of the 5th ACM
SIGPLAN-SIGSOFT workshop on Program analysis for software tools and
engineering (PASTE '04). ACM, New York, NY, USA, 37-42.
http://doi.acm.org/10.1145/996821.996835

David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online
Cycle Detection and Difference Propagation: Applications to Pointer
Analysis. Software Quality Control 12, 4 (December 2004), 311-337.
http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2

David Grove and Craig Chambers. 2001. A framework for call graph
construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6
(November 2001), 685-746.
http://doi.acm.org/10.1145/506315.506316

Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast
and accurate pointer analysis for millions of lines of code. In
Proceedings of the 2007 ACM SIGPLAN conference on Programming language
design and implementation (PLDI '07). ACM, New York, NY, USA, 290-299.
http://doi.acm.org/10.1145/1250734.1250767

Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location
equivalence to optimize pointer analysis. In Proceedings of the 14th
international conference on Static Analysis (SAS'07), Hanne Riis
Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin, Heidelberg,
265-280.

*/
package pointer