

								// Copyright 2013 The Go Authors. All rights reserved.

								// Use of this source code is governed by a BSD-style

								// license that can be found in the LICENSE file.


								/*


								Package pointer implements Andersen's analysis, an inclusion-based

								pointer analysis algorithm first described in (Andersen, 1994).


								A pointer analysis relates every pointer expression in a whole program

								to the set of memory locations to which it might point.  This

								information can be used to construct a call graph of the program that

								precisely represents the destinations of dynamic function and method

								calls.  It can also be used to determine, for example, which pairs of

								channel operations operate on the same channel.


								The package allows the client to request a set of expressions of

								interest for which the points-to information will be returned once the

								analysis is complete.  In addition, the client may request that a

								callgraph be constructed.  The example program in example_test.go

								demonstrates both of these features.  Clients should not request more

								information than they need since it may increase the cost of the

								analysis significantly.


								CLASSIFICATION


								Our algorithm is INCLUSION-BASED: the points-to sets for x and y will

								be related by pts(y) ⊇ pts(x) if the program contains the statement

								y = x.
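
As a rough illustration of the inclusion rule (not this package's solver), the
sketch below applies copy constraints repeatedly until no points-to set grows.
The names copyConstraint and solve, and the use of strings for variables and
labels, are assumptions made for the example.

```go
package main

import "fmt"

// copyConstraint is a hypothetical encoding of the statement "dst = src",
// which demands pts(dst) ⊇ pts(src).
type copyConstraint struct{ dst, src string }

// solve repeatedly applies every constraint until no points-to set grows.
// pts maps a variable name to the set of labels it may point to.
func solve(pts map[string]map[string]bool, cs []copyConstraint) {
	for changed := true; changed; {
		changed = false
		for _, c := range cs {
			for label := range pts[c.src] {
				if pts[c.dst] == nil {
					pts[c.dst] = make(map[string]bool)
				}
				if !pts[c.dst][label] {
					pts[c.dst][label] = true
					changed = true
				}
			}
		}
	}
}

func main() {
	// Models: x = &a; y = &b; y = x; z = y (statement order is irrelevant).
	pts := map[string]map[string]bool{
		"x": {"a": true},
		"y": {"b": true},
	}
	solve(pts, []copyConstraint{{"z", "y"}, {"y", "x"}})
	fmt.Println(len(pts["x"]), len(pts["y"]), len(pts["z"])) // 1 2 2
}
```

Note that pts(x) stays at {a}: inclusion constraints are directional, unlike
the unification used by equality-based analyses.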


								It is FLOW-INSENSITIVE: it ignores all control flow constructs and the

								order of statements in a program.  It is therefore a "MAY ALIAS"

								analysis: its facts are of the form "P may/may not point to L",

								not "P must point to L".


								It is FIELD-SENSITIVE: it builds separate points-to sets for distinct

								fields, such as x and y in struct { x, y *int }.


								It is mostly CONTEXT-INSENSITIVE: most functions are analyzed once,

								so values can flow in at one call to the function and return out at

								another.  Only some smaller functions are analyzed with consideration

								of their calling context.


								It has a CONTEXT-SENSITIVE HEAP: objects are named by both allocation

								site and context, so the objects returned by two distinct calls to f:

								   func f() *T { return new(T) }

								are distinguished up to the limits of the calling context.


								It is a WHOLE PROGRAM analysis: it requires SSA-form IR for the

								complete Go program and summaries for native code.


								See the (Hind, PASTE'01) survey paper for an explanation of these terms.


								SOUNDNESS


								The analysis is fully sound when invoked on pure Go programs that do not

								use reflection or unsafe.Pointer conversions.  In other words, if there

								is any possible execution of the program in which pointer P may point to

								object O, the analysis will report that fact.


								REFLECTION


								By default, the "reflect" library is ignored by the analysis, as if all

								its functions were no-ops, but if the client enables the Reflection flag,

								the analysis will make a reasonable attempt to model the effects of

								calls into this library.  However, this comes at a significant

								performance cost, and not all features of that library are yet

								implemented.  In addition, some simplifying approximations must be made

								to ensure that the analysis terminates; for example, reflection can be

								used to construct an infinite set of types and values of those types,

								but the analysis arbitrarily bounds the depth of such types.


								Most but not all reflection operations are supported.

								In particular, addressable reflect.Values are not yet implemented, so

								operations such as (reflect.Value).Set have no analytic effect.


								UNSAFE POINTER CONVERSIONS


								The pointer analysis makes no attempt to understand aliasing between the

								operand x and result y of an unsafe.Pointer conversion:

								   y = (*T)(unsafe.Pointer(x))

								It is as if the conversion allocated an entirely new object:

								   y = new(T)
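
The small program below shows the run-time aliasing that this treatment
misses.  It is ordinary Go with no analysis involved; the helper name
reinterpret is made up for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// reinterpret returns a *uint32 that refers to the same memory as x.
// At run time the operand and the result alias; the analysis described
// above would instead treat the result as a fresh object.
func reinterpret(x *int32) *uint32 {
	return (*uint32)(unsafe.Pointer(x))
}

func main() {
	x := new(int32)
	y := reinterpret(x)
	*y = 7
	fmt.Println(*x) // 7: the write through y is visible through x
}
```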


								NATIVE CODE


								The analysis cannot model the aliasing effects of functions written in

								languages other than Go, such as runtime intrinsics in C or assembly, or

								code accessed via cgo.  The result is as if such functions are no-ops.

								However, various important intrinsics are understood by the analysis,

								along with built-ins such as append.


								The analysis currently provides no way for users to specify the aliasing

								effects of native code.


								------------------------------------------------------------------------


								IMPLEMENTATION


								The remaining documentation is intended for package maintainers and

								pointer analysis specialists.  Maintainers should have a solid

								understanding of the referenced papers (especially those by H&L and PKH)

								before making significant changes.


								The implementation is similar to that described in (Pearce et al,

								PASTE'04).  Unlike many algorithms which interleave constraint

								generation and solving, constructing the callgraph as they go, this

								implementation for the most part observes a phase ordering (generation

								before solving), with only simple (copy) constraints being generated

								during solving.  (The exception is reflection, which creates various

								constraints during solving as new types flow to reflect.Value

								operations.)  This improves the traction of presolver optimisations,

								but imposes certain restrictions, e.g. potential context sensitivity

								is limited since all variants must be created a priori.


								TERMINOLOGY


								A type is said to be "pointer-like" if it is a reference to an object.

								Pointer-like types include pointers and also interfaces, maps, channels,

								functions and slices.


								We occasionally use C's x->f notation to distinguish the case where x

								is a struct pointer from x.f where x is a struct value.


								Pointer analysis literature (and our comments) often uses the notation

								dst=*src+offset to mean something different than what it means in Go.

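								It means: for each node index p in pts(src), pts(dst) ⊇ pts(p+offset),

								i.e. a copy edge is added from node p+offset to dst.  Similarly

								*dst+offset=src is used for store constraints and dst=src+offset for

								offset-address constraints, in which the node index p+offset itself is

								added to pts(dst).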
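
A minimal sketch of the three complex constraints follows.  It takes a load
to mean pts(dst) ⊇ pts(p+offset) for each p in pts(src), a store to mean the
converse, and an offset-address constraint to add p+offset itself to
pts(dst); that reading is our assumption about the usual resolution rules.
The graph type and its methods are hypothetical, and copies are applied once
rather than recorded as persistent edges.

```go
package main

import "fmt"

// nodeid and nodeset are hypothetical stand-ins for the analysis's
// node numbering and points-to sets.
type nodeid int
type nodeset map[nodeid]bool

type graph struct {
	pts map[nodeid]nodeset // points-to set of each node
}

// add inserts label into pts(n).
func (g *graph) add(n, label nodeid) {
	if g.pts[n] == nil {
		g.pts[n] = nodeset{}
	}
	g.pts[n][label] = true
}

// include makes pts(dst) ⊇ pts(src) hold for the current contents of
// pts(src) (a one-shot copy; a real solver would add a persistent edge).
func (g *graph) include(dst, src nodeid) {
	for l := range g.pts[src] {
		g.add(dst, l)
	}
}

// load resolves dst = *src + offset.
func (g *graph) load(dst, src, offset nodeid) {
	for p := range g.pts[src] {
		g.include(dst, p+offset)
	}
}

// store resolves *dst + offset = src.
func (g *graph) store(dst, offset, src nodeid) {
	for p := range g.pts[dst] {
		g.include(p+offset, src)
	}
}

// offsetAddr resolves dst = src + offset.
func (g *graph) offsetAddr(dst, src, offset nodeid) {
	for p := range g.pts[src] {
		g.add(dst, p+offset)
	}
}

func main() {
	g := &graph{pts: map[nodeid]nodeset{}}
	const p, q, r, a = 1, 2, 3, 4 // variables
	const s, o = 10, 20           // objects; s occupies nodes 10 and 11
	g.add(p, s)
	g.add(q, o)
	g.store(p, 1, q)      // *p+1 = q
	g.load(r, p, 1)       // r = *p+1
	g.offsetAddr(a, p, 1) // a = p+1
	fmt.Println(g.pts[r][o], g.pts[a][s+1]) // true true
}
```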


								NODES


								Nodes are the key datastructure of the analysis, and have a dual role:

								they represent both constraint variables (equivalence classes of

								pointers) and members of points-to sets (things that can be pointed

								at, i.e. "labels").


								Nodes are naturally numbered.  The numbering enables compact

								representations of sets of nodes such as bitvectors (or BDDs); and the

								ordering enables a very cheap way to group related nodes together.  For

								example, passing n parameters consists of generating n parallel

								constraints from caller+i to callee+i for 0<=i<n.


								The zero nodeid means "not a pointer".  For simplicity, we generate flow

								constraints even for non-pointer types such as int.  The pointer

								equivalence (PE) presolver optimization detects which variables cannot

								point to anything; this includes not only all variables of non-pointer

								types (such as int) but also variables of pointer-like types if they are

								always nil, or are parameters to a function that is never called.


								Each node represents a scalar part of a value or object.

								Aggregate types (structs, tuples, arrays) are recursively flattened

								out into a sequential list of scalar component types, and all the

								elements of an array are represented by a single node.  (The

								flattening of a basic type is a list containing a single node.)


								Nodes are connected into a graph with various kinds of labelled edges:

								simple edges (or copy constraints) represent value flow.  Complex

								edges (load, store, etc) trigger the creation of new simple edges

								during the solving phase.


								OBJECTS


								Conceptually, an "object" is a contiguous sequence of nodes denoting

								an addressable location: something that a pointer can point to.  The

								first node of an object has a non-nil obj field containing information

								about the allocation: its size, context, and ssa.Value.


								Objects include:

								   - functions and globals;

								   - variable allocations in the stack frame or heap;

								   - maps, channels and slices created by calls to make();

								   - allocations to construct an interface;

								   - allocations caused by conversions, e.g. []byte(str);

								   - arrays allocated by calls to append().


								Many objects have no Go types.  For example, the func, map and chan type

								kinds in Go are all varieties of pointers, but their respective objects

								are actual functions (executable code), maps (hash tables), and channels

								(synchronized queues).  Given the way we model interfaces, they too are

								pointers to "tagged" objects with no Go type.  And an *ssa.Global denotes

								the address of a global variable, but the object for a Global is the

								actual data.  So, the type of an ssa.Value that creates an object is

								"off by one indirection": a pointer to the object.


								The individual nodes of an object are sometimes referred to as "labels".


								For uniformity, all objects have a non-zero number of fields, even those

								of the empty type struct{}.  (All arrays are treated as if of length 1,

								so there are no empty arrays.  The empty tuple is never address-taken,

								so is never an object.)


								TAGGED OBJECTS


								A tagged object has the following layout:


								    T          -- obj.flags ⊇ {otTagged}

								    v

								    ...


								The T node's typ field is the dynamic type of the "payload": the value

								v which follows, flattened out.  The T node's obj has the otTagged

								flag.


								Tagged objects are needed when generalizing across types: interfaces,

								reflect.Values, reflect.Types.  Each of these three types is modelled

								as a pointer that exclusively points to tagged objects.


								Tagged objects may be indirect (obj.flags ⊇ {otIndirect}) meaning that

								the value v is not of type T but *T; this is used only for

								reflect.Values that represent lvalues.  (These are not implemented yet.)


								ANALYSIS ABSTRACTION OF EACH TYPE


								Variables of the following "scalar" types may be represented by a

								single node: basic types, pointers, channels, maps, slices, 'func'

								pointers, interfaces.


								Pointers

								  Nothing to say here, oddly.


								Basic types (bool, string, numbers, unsafe.Pointer)

								  Currently all fields in the flattening of a type, including

								  non-pointer basic types such as int, are represented in objects and

								  values.  Though non-pointer nodes within values are uninteresting,

								  non-pointer nodes in objects may be useful (if address-taken)

								  because they permit the analysis to deduce, in this example,


								     var s struct{ ...; x int; ... }

								     p := &s.x


								  that p points to s.x.  If we ignored such object fields, we could only

								  say that p points somewhere within s.


								  All other basic types are ignored.  Expressions of these types have

								  zero nodeid, and fields of these types within other aggregate types

								  are omitted.


								  unsafe.Pointers are not modelled as pointers, so a conversion of an

								  unsafe.Pointer to *T is (unsoundly) treated equivalent to new(T).


								Channels

								  An expression of type 'chan T' is a kind of pointer that points

								  exclusively to channel objects, i.e. objects created by MakeChan (or

								  reflection).


								  'chan T' is treated like *T.

								  *ssa.MakeChan is treated as equivalent to new(T).

								  *ssa.Send and receive (*ssa.UnOp(ARROW)) are equivalent to store

								   and load.


								Maps

								  An expression of type 'map[K]V' is a kind of pointer that points

								  exclusively to map objects, i.e. objects created by MakeMap (or

								  reflection).


								  map[K]V is treated like *M where M = struct{k K; v V}.

								  *ssa.MakeMap is equivalent to new(M).

								  *ssa.MapUpdate is equivalent to *y=x where *y and x have type M.

								  *ssa.Lookup is equivalent to y=x.v where x has type *M.


								Slices

								  A slice []T, which dynamically resembles a struct{array *T, len, cap int},

								  is treated as if it were just a *T pointer; the len and cap fields are

								  ignored.


								  *ssa.MakeSlice is treated like new([1]T): an allocation of a

								   singleton array.

								  *ssa.Index on a slice is equivalent to a load.

								  *ssa.IndexAddr on a slice returns the address of the sole element of the

								  slice, i.e. the same address.

								  *ssa.Slice is treated as a simple copy.


								Functions

								  An expression of type 'func...' is a kind of pointer that points

								  exclusively to function objects.


								  A function object has the following layout:


								     identity         -- typ:*types.Signature; obj.flags ⊇ {otFunction}

								     params_0         -- (the receiver, if a method)

								     ...

								     params_n-1

								     results_0

								     ...

								     results_m-1


								  There may be multiple function objects for the same *ssa.Function

								  due to context-sensitive treatment of some functions.


								  The first node is the function's identity node.

								  Associated with every callsite is a special "targets" variable,

								  whose pts() contains the identity node of each function to which

								  the call may dispatch.  Identity nodes are not otherwise used during

								  the analysis, but we construct the call graph from the pts()

								  solution for such nodes.


								  The following block of contiguous nodes represents the flattened-out

								  types of the parameters ("P-block") and results ("R-block") of the

								  function object.


								  The treatment of free variables of closures (*ssa.FreeVar) is like

								  that of global variables; it is not context-sensitive.

								  *ssa.MakeClosure instructions create copy edges to Captures.


								  A Go value of type 'func' (i.e. a pointer to one or more functions)

								  is a pointer whose pts() contains function objects.  The valueNode()

								  for an *ssa.Function returns a singleton for that function.


								Interfaces

								  An expression of type 'interface{...}' is a kind of pointer that

								  points exclusively to tagged objects.  All tagged objects pointed to

								  by an interface are direct (the otIndirect flag is clear) and

								  concrete (the tag type T is not itself an interface type).  The

								  associated ssa.Value for an interface's tagged objects may be an

								  *ssa.MakeInterface instruction, or nil if the tagged object was

								  created by an intrinsic (e.g. reflection).


								  Constructing an interface value causes generation of constraints for

								  all of the concrete type's methods; we can't tell a priori which

								  ones may be called.


								  TypeAssert y = x.(T) is implemented by a dynamic constraint

								  triggered by each tagged object O added to pts(x): a typeFilter

								  constraint if T is an interface type, or an untag constraint if T is

								  a concrete type.  A typeFilter tests whether O.typ implements T; if

								  so, O is added to pts(y).  An untag constraint tests whether O.typ is

								  assignable to T, and if so, a copy edge O.v -> y is added.
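
The two cases can be sketched as follows.  This is a toy model, not the
package's constraint code: types are reduced to a name and a method set,
assignability to a concrete T is approximated by name equality, and the names
typ, tagged and typeAssert are all hypothetical.

```go
package main

import "fmt"

// typ is a hypothetical stand-in for a Go type: a name plus a method set.
type typ struct {
	name    string
	methods []string
	iface   bool
}

// implements reports whether t has every method of the interface type i.
func implements(t, i typ) bool {
	have := map[string]bool{}
	for _, m := range t.methods {
		have[m] = true
	}
	for _, m := range i.methods {
		if !have[m] {
			return false
		}
	}
	return true
}

// tagged models a tagged object: a dynamic type and the points-to set of
// its payload (labels are plain strings here).
type tagged struct {
	typ     typ
	payload []string
}

// typeAssert models y = x.(T) over the tagged objects in pts(x).
// For an interface T it returns the objects that pass the type filter;
// for a concrete T it returns the payload labels that flow to y.
func typeAssert(ptsX []tagged, T typ) (objs []tagged, labels []string) {
	for _, o := range ptsX {
		switch {
		case T.iface && implements(o.typ, T):
			objs = append(objs, o) // typeFilter: O joins pts(y)
		case !T.iface && o.typ.name == T.name:
			labels = append(labels, o.payload...) // untag: copy O.v -> y
		}
	}
	return objs, labels
}

func main() {
	stringer := typ{name: "Stringer", methods: []string{"String"}, iface: true}
	file := typ{name: "*File", methods: []string{"String", "Close"}}
	num := typ{name: "int"}
	ptsX := []tagged{{file, []string{"f1"}}, {num, nil}}

	objs, _ := typeAssert(ptsX, stringer)
	_, labels := typeAssert(ptsX, file)
	fmt.Println(len(objs), labels) // 1 [f1]
}
```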


								  ChangeInterface is a simple copy because the representation of

								  tagged objects is independent of the interface type (in contrast

								  to the "method tables" approach used by the gc runtime).


								  y := Invoke x.m(...) is implemented by allocating contiguous P/R

								  blocks for the callsite and adding a dynamic rule triggered by each

								  tagged object added to pts(x).  The rule adds param/results copy

								  edges to/from each discovered concrete method.


								  (Q. Why do we model an interface as a pointer to a pair of type and

								  value, rather than as a pair of a pointer to type and a pointer to

								  value?

								  A. Control-flow joins would merge interfaces ({T1}, {V1}) and ({T2},

								  {V2}) to make ({T1,T2}, {V1,V2}), leading to the infeasible and

								  type-unsafe combination (T1,V2).  Treating the value and its concrete

								  type as inseparable makes the analysis type-safe.)
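
The difference between the two designs can be seen in a few lines.  The
sketch below is only an illustration; pair, joinSplit and joinTagged are
hypothetical names, and types and values are plain strings.

```go
package main

import "fmt"

// pair is one feasible (dynamic type, value) combination.
type pair struct{ typ, val string }

// joinSplit models the rejected design: types and values are tracked in
// two independent sets, so a join yields their cross product.
func joinSplit(a, b pair) []pair {
	types := []string{a.typ, b.typ}
	vals := []string{a.val, b.val}
	var out []pair
	for _, t := range types {
		for _, v := range vals {
			out = append(out, pair{t, v})
		}
	}
	return out
}

// joinTagged models the chosen design: each tagged object keeps its
// type and value together, so a join is a plain set union.
func joinTagged(a, b pair) []pair {
	return []pair{a, b}
}

func main() {
	a, b := pair{"T1", "V1"}, pair{"T2", "V2"}
	fmt.Println(joinSplit(a, b))  // [{T1 V1} {T1 V2} {T2 V1} {T2 V2}]
	fmt.Println(joinTagged(a, b)) // [{T1 V1} {T2 V2}]
}
```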


								reflect.Value

								  A reflect.Value is modelled very similarly to an interface{}, i.e. as

								  a pointer exclusively to tagged objects, but with two generalizations.


								  1) a reflect.Value that represents an lvalue points to an indirect

								     (obj.flags ⊇ {otIndirect}) tagged object, which has a similar

								     layout to a tagged object except that the value is a pointer to

								     the dynamic type.  Indirect tagged objects preserve the correct

								     aliasing so that mutations made by (reflect.Value).Set can be

								     observed.


								     Indirect objects only arise when an lvalue is derived from an

								     rvalue by indirection, e.g. the following code:


								        type S struct { X T }

								        var s S

								        var i interface{} = &s    // i points to a *S-tagged object (from MakeInterface)

								        v1 := reflect.ValueOf(i)  // v1 points to same *S-tagged object as i

								        v2 := v1.Elem()           // v2 points to an indirect S-tagged object, pointing to s

								        v3 := v2.FieldByName("X") // v3 points to an indirect T-tagged object, pointing to s.X

								        v3.Set(y)                 // pts(s.X) ⊇ pts(y)


								     Whether indirect or not, the concrete type of the tagged object

								     corresponds to the user-visible dynamic type, and the existence

								     of a pointer is an implementation detail.


								     (NB: indirect tagged objects are not yet implemented)
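
For reference, here is a runnable rendering of the example above, showing the
run-time effect that indirect tagged objects are meant to capture.  It
assumes T is *int and wraps y with reflect.ValueOf; the helper name setX is
made up for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

type S struct{ X *int }

// setX stores y into the X field of *s using only reflection, following
// the v1/v2/v3 steps of the example above.
func setX(s *S, y *int) {
	var i interface{} = s
	v1 := reflect.ValueOf(i)  // a *S value
	v2 := v1.Elem()           // the addressable S that s points to
	v3 := v2.FieldByName("X") // the addressable field s.X
	v3.Set(reflect.ValueOf(y))
}

func main() {
	var s S
	y := new(int)
	setX(&s, y)
	fmt.Println(s.X == y) // true
}
```

After the call, s.X and y alias, which is the fact pts(s.X) ⊇ pts(y) records.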


								  2) The dynamic type tag of a tagged object pointed to by a

								     reflect.Value may be an interface type; it need not be concrete.


								     This arises in code such as this:

								        tEface := reflect.TypeOf(new(interface{})).Elem() // interface{}

								        eface := reflect.Zero(tEface)

								     pts(eface) is a singleton containing an interface{}-tagged

								     object.  That tagged object's payload is an interface{} value,

								     i.e. the pts of the payload contains only concrete-tagged

								     objects, although in this example it's the zero interface{} value,

								     so its pts is empty.
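
The run-time behaviour being modelled can be checked with a short program
that follows the snippet above, with its parentheses balanced.  The wrapper
name zeroEface is an assumption made for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

// zeroEface builds the interface{} type descriptor and the zero value of
// that type, following the snippet above.
func zeroEface() (reflect.Type, reflect.Value) {
	tEface := reflect.TypeOf(new(interface{})).Elem() // interface{}
	eface := reflect.Zero(tEface)
	return tEface, eface
}

func main() {
	tEface, eface := zeroEface()
	fmt.Println(tEface.Kind(), eface.IsNil()) // interface true
}
```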


								reflect.Type

								  Just as in the real "reflect" library, we represent a reflect.Type

								  as an interface whose sole implementation is the concrete type,

								  *reflect.rtype.  (This choice is forced on us by go/types: clients

								  cannot fabricate types with arbitrary method sets.)


								  rtype instances are canonical: there is at most one per dynamic

								  type.  (rtypes are in fact large structs but since identity is all

								  that matters, we represent them by a single node.)


								  The payload of each *rtype-tagged object is an *rtype pointer that

								  points to exactly one such canonical rtype object.  We exploit this

								  by setting the node.typ of the payload to the dynamic type, not

								  '*rtype'.  This saves us an indirection in each resolution rule.  As

								  an optimisation, *rtype-tagged objects are canonicalized too.


								Aggregate types:


								Aggregate types are treated as if all directly contained

								aggregates are recursively flattened out.


								Structs

								  *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.


								  *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create

								   simple edges for each struct discovered in pts(x).


								  The nodes of a struct consist of a special 'identity' node (whose

								  type is that of the struct itself), followed by the nodes for all

								  the struct's fields, recursively flattened out.  A pointer to the

								  struct is a pointer to its identity node.  That node allows us to

								  distinguish a pointer to a struct from a pointer to its first field.


								  Field offsets are logical field offsets (plus one for the identity

								  node), so the sizes of the fields can be ignored by the analysis.
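
A sketch of how such logical offsets could be computed follows.  The typ
representation and the functions size and offset are hypothetical, and the
sketch assumes that a nested struct contributes its own identity node when
flattened.

```go
package main

import "fmt"

// typ is a hypothetical miniature type representation: a scalar has no
// fields; a struct lists its field types in order.
type typ struct {
	name   string
	fields []typ // nil for scalars
}

// size returns the number of nodes in the flattening of t: one per
// scalar, and for a struct one identity node plus its flattened fields.
func size(t typ) int {
	if t.fields == nil {
		return 1
	}
	n := 1 // identity node
	for _, f := range t.fields {
		n += size(f)
	}
	return n
}

// offset returns the node offset of field i within struct t: one for the
// identity node plus the flattened sizes of the preceding fields.
func offset(t typ, i int) int {
	off := 1
	for _, f := range t.fields[:i] {
		off += size(f)
	}
	return off
}

func main() {
	intT := typ{name: "int"}
	ptrT := typ{name: "*int"}
	inner := typ{name: "inner", fields: []typ{intT, ptrT}}
	outer := typ{name: "outer", fields: []typ{ptrT, inner, intT}}
	fmt.Println(size(outer), offset(outer, 0), offset(outer, 1), offset(outer, 2)) // 6 1 2 5
}
```

The byte sizes of int and *int never enter the computation; only the count of
scalar nodes does.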


								  (The identity node is non-traditional but enables the distinction

								  described above, which is valuable for code comprehension tools.

								  Typical pointer analyses for C, whose purpose is compiler

								  optimization, must soundly model unsafe.Pointer (void*) conversions,

								  and this requires fidelity to the actual memory layout using physical

								  field offsets.)


								Arrays

								  We model an array by an identity node (whose type is that of the

								  array itself) followed by a node representing all the elements of

								  the array; the analysis does not distinguish elements with different

								  indices.  Effectively, an array is treated like struct{elem T}, a

								  load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the

								  index i is ignored.


								  A pointer to an array is a pointer to its identity node.  (A slice is

								  also a pointer to an array's identity node.)  The identity node

								  allows us to distinguish a pointer to an array from a pointer to one

								  of its elements, but it is rather costly because it introduces more

								  offset constraints into the system.  Furthermore, sound treatment of

								  unsafe.Pointer would require us to dispense with this node.


								  Arrays may be allocated by Alloc, by make([]T), by calls to append,

								  and via reflection.


								Tuples (T, ...)

								  Tuples are treated like structs with naturally numbered fields.

								  *ssa.Extract is analogous to *ssa.Field.


								  However, tuples have no identity field since by construction, they

								  cannot be address-taken.


								FUNCTION CALLS


								  There are three kinds of function call:

								  (1) static "call"-mode calls of functions.

								  (2) dynamic "call"-mode calls of functions.

								  (3) dynamic "invoke"-mode calls of interface methods.

								  Cases 1 and 2 apply equally to methods and standalone functions.


								  Static calls.

								    A static call consists of three steps:

								    - finding the function object of the callee;

								    - creating copy edges from the actual parameter value nodes to the

								      P-block in the function object (this includes the receiver if

								      the callee is a method);

								    - creating copy edges from the R-block in the function object to

								      the value nodes for the result of the call.


								    A static function call is little more than two struct value copies

								    between the P/R blocks of caller and callee:


								       callee.P = caller.P

								       caller.R = callee.R
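
The two block copies might be generated along these lines.  This is a sketch
under assumed names (funcObj, edge, staticCall); it relies only on the layout
given earlier, in which the identity node is followed by the P-block and then
the R-block.

```go
package main

import "fmt"

type nodeid int

// edge is a hypothetical copy constraint: pts(dst) ⊇ pts(src).
type edge struct{ dst, src nodeid }

// funcObj is a toy function object: an identity node followed by
// nparams parameter nodes (P-block) and nresults result nodes (R-block).
type funcObj struct {
	identity          nodeid
	nparams, nresults int
}

func (f funcObj) params() nodeid  { return f.identity + 1 }
func (f funcObj) results() nodeid { return f.params() + nodeid(f.nparams) }

// staticCall returns the copy edges for a call whose flattened arguments
// start at node args and whose flattened results start at node res.
func staticCall(callee funcObj, args, res nodeid) []edge {
	var edges []edge
	for i := 0; i < callee.nparams; i++ { // callee.P = caller.P
		edges = append(edges, edge{callee.params() + nodeid(i), args + nodeid(i)})
	}
	for i := 0; i < callee.nresults; i++ { // caller.R = callee.R
		edges = append(edges, edge{res + nodeid(i), callee.results() + nodeid(i)})
	}
	return edges
}

func main() {
	f := funcObj{identity: 50, nparams: 2, nresults: 1}
	fmt.Println(staticCall(f, 10, 20)) // [{51 10} {52 11} {20 53}]
}
```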


								    Context sensitivity


								      Static calls (alone) may be treated context sensitively,

								      i.e. each callsite may cause a distinct re-analysis of the

								      callee, improving precision.  Our current context-sensitivity

								      policy treats all intrinsics and getter/setter methods in this

								      manner since such functions are small and seem like an obvious

								      source of spurious confluences, though this has not yet been

								      evaluated.


								  Dynamic function calls


								    Dynamic calls work in a similar manner except that the creation of

								    copy edges occurs dynamically, in a similar fashion to a pair of

								    struct copies in which the callee is indirect:


								       callee->P = caller.P

								       caller.R = callee->R


								    (Recall that the function object's P- and R-blocks are contiguous.)


								  Interface method invocation


								    For invoke-mode calls, we create a params/results block for the

								    callsite and attach a dynamic closure rule to the interface.  For

								    each new tagged object that flows to the interface, we look up

								    the concrete method, find its function object, and connect its P/R

								    blocks to the callsite's P/R blocks, adding copy edges to the graph

								    during solving.


								  Recording call targets


								    The analysis notifies its clients of each callsite it encounters,

								    passing a CallSite interface.  Among other things, the CallSite

								    contains a synthetic constraint variable ("targets") whose

								    points-to solution includes the set of all function objects to

								    which the call may dispatch.


								    It is via this mechanism that the callgraph is made available.

								    Clients may also elect to be notified of callgraph edges directly;

								    internally this just iterates all "targets" variables' pts(·)s.


								PRESOLVER


								We implement Hash-Value Numbering (HVN), a pre-solver constraint

								optimization described in Hardekopf & Lin, SAS'07.  This is documented

								in more detail in hvn.go.  We intend to add its cousins HR and HU in

								future.


								SOLVER


								The solver is currently a naive Andersen-style implementation; it does

								not perform online cycle detection, though we plan to add solver

								optimisations such as Hybrid- and Lazy- Cycle Detection from (Hardekopf

								& Lin, PLDI'07).


								It uses difference propagation (Pearce et al, SQC'04) to avoid

								redundant re-triggering of closure rules for values already seen.
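
The idea can be sketched with a toy worklist solver over copy edges only.
This is not the package's solver; the solver type and its fields are
hypothetical.  Each node keeps a delta of labels it has not yet forwarded,
and only that delta is sent along its outgoing edges.

```go
package main

import "fmt"

// solver is a toy worklist solver over copy edges only. Each node keeps
// its full points-to set and a delta of labels not yet propagated.
type solver struct {
	succs map[int][]int        // copy edges: src -> dsts
	pts   map[int]map[int]bool // full solution
	delta map[int][]int        // labels added since the node was last visited
	work  []int                // nodes with a non-empty delta
	sent  int                  // number of (label, edge) propagations performed
}

func newSolver() *solver {
	return &solver{succs: map[int][]int{}, pts: map[int]map[int]bool{}, delta: map[int][]int{}}
}

// add inserts label into pts(n); new labels go to n's delta and n is queued.
func (s *solver) add(n, label int) {
	if s.pts[n] == nil {
		s.pts[n] = map[int]bool{}
	}
	if s.pts[n][label] {
		return // already seen: nothing to re-trigger
	}
	s.pts[n][label] = true
	if len(s.delta[n]) == 0 {
		s.work = append(s.work, n)
	}
	s.delta[n] = append(s.delta[n], label)
}

// solve propagates only each node's delta along its outgoing edges.
func (s *solver) solve() {
	for len(s.work) > 0 {
		n := s.work[0]
		s.work = s.work[1:]
		d := s.delta[n]
		s.delta[n] = nil
		for _, dst := range s.succs[n] {
			for _, label := range d {
				s.sent++
				s.add(dst, label)
			}
		}
	}
}

func main() {
	s := newSolver()
	s.succs[1] = []int{2}
	s.succs[2] = []int{3, 1} // 1 -> 2 -> 3, plus a cycle 2 -> 1
	s.add(1, 100)
	s.add(1, 101)
	s.solve()
	fmt.Println(len(s.pts[3]), s.sent) // 2 6
}
```

Each label crosses each of the three edges exactly once, despite the cycle.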


								Points-to sets are represented using sparse bit vectors (similar to

								those used in LLVM and gcc), which are more space- and time-efficient

								than sets based on Go's built-in map type or dense bit vectors.
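
To make the data structure concrete, here is a toy sparse bit vector.  It is
a sketch only, not the representation used by the package: the names block
and sparse are hypothetical, and blocks are kept in a sorted slice rather
than a linked list.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// block holds the 64 bits for the integers [64*index, 64*index+63].
type block struct {
	index int
	word  uint64
}

// sparse is a toy sparse bit vector: blocks sorted by index, with
// all-zero blocks omitted. Elements must be non-negative.
type sparse struct{ blocks []block }

// find returns the position of the block with the given index, or the
// position where it would be inserted, and whether it is present.
func (s *sparse) find(index int) (int, bool) {
	i := sort.Search(len(s.blocks), func(i int) bool { return s.blocks[i].index >= index })
	return i, i < len(s.blocks) && s.blocks[i].index == index
}

// Insert adds x and reports whether the set grew.
func (s *sparse) Insert(x int) bool {
	i, ok := s.find(x / 64)
	if !ok {
		// Open a gap at position i and place a fresh block there.
		s.blocks = append(s.blocks, block{})
		copy(s.blocks[i+1:], s.blocks[i:])
		s.blocks[i] = block{index: x / 64}
	}
	mask := uint64(1) << uint(x%64)
	grew := s.blocks[i].word&mask == 0
	s.blocks[i].word |= mask
	return grew
}

// Has reports whether x is in the set.
func (s *sparse) Has(x int) bool {
	i, ok := s.find(x / 64)
	return ok && s.blocks[i].word&(uint64(1)<<uint(x%64)) != 0
}

// Len returns the number of elements.
func (s *sparse) Len() int {
	n := 0
	for _, b := range s.blocks {
		n += bits.OnesCount64(b.word)
	}
	return n
}

func main() {
	var s sparse
	s.Insert(3)
	s.Insert(1000000)
	s.Insert(3)
	fmt.Println(s.Len(), len(s.blocks), s.Has(1000000), s.Has(4)) // 2 2 true false
}
```

Two widely separated elements cost two blocks here, where a dense bit vector
would need to span the whole range between them.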


								Nodes are permuted prior to solving so that object nodes (which may

								appear in points-to sets) are lower numbered than non-object (var)

								nodes.  This improves the density of the set over which the PTSs

								range, and thus the efficiency of the representation.
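
A renumbering of this kind might look as follows.  The function renumber is a
hypothetical sketch; it ignores the special zero node and simply preserves
relative order within the object and non-object groups.

```go
package main

import "fmt"

// renumber returns a permutation mapping old node ids to new ones such
// that every object node receives a smaller id than every non-object
// node, preserving relative order within each group.
func renumber(isObject []bool) []int {
	perm := make([]int, len(isObject))
	next := 0
	for id, obj := range isObject {
		if obj {
			perm[id] = next
			next++
		}
	}
	for id, obj := range isObject {
		if !obj {
			perm[id] = next
			next++
		}
	}
	return perm
}

func main() {
	// Nodes 1 and 3 are object nodes; 0, 2 and 4 are variables.
	fmt.Println(renumber([]bool{false, true, false, true, false})) // [2 0 3 1 4]
}
```

After the permutation, every possible member of a points-to set lies in the
range [0, 2), so the sets draw from a dense prefix of the id space.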


								Partly thanks to avoiding map iteration, the execution of the solver is

								100% deterministic, a great help during debugging.


								FURTHER READING


								Andersen, L. O. 1994. Program analysis and specialization for the C

								programming language. Ph.D. dissertation. DIKU, University of

								Copenhagen.


								David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004.  Efficient

								field-sensitive pointer analysis for C. In Proceedings of the 5th ACM

								SIGPLAN-SIGSOFT workshop on Program analysis for software tools and

								engineering (PASTE '04). ACM, New York, NY, USA, 37-42.

								http://doi.acm.org/10.1145/996821.996835


								David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online

								Cycle Detection and Difference Propagation: Applications to Pointer

								Analysis. Software Quality Control 12, 4 (December 2004), 311-337.

								http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2


								David Grove and Craig Chambers. 2001. A framework for call graph

								construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6

								(November 2001), 685-746.

								http://doi.acm.org/10.1145/506315.506316


								Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast

								and accurate pointer analysis for millions of lines of code. In

								Proceedings of the 2007 ACM SIGPLAN conference on Programming language

								design and implementation (PLDI '07). ACM, New York, NY, USA, 290-299.

								http://doi.acm.org/10.1145/1250734.1250767


								Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location

								equivalence to optimize pointer analysis. In Proceedings of the 14th

								international conference on Static Analysis (SAS'07), Hanne Riis

								Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin, Heidelberg,

								265-280.


								Atanas Rountev and Satish Chandra. 2000. Off-line variable substitution

								for scaling points-to analysis. In Proceedings of the ACM SIGPLAN 2000

								conference on Programming language design and implementation (PLDI '00).

								ACM, New York, NY, USA, 47-56.

								http://doi.acm.org/10.1145/349299.349310


								*/

								package pointer // import "golang.org/x/tools/go/pointer"