1
0
mirror of https://github.com/golang/go synced 2024-11-05 15:56:12 -07:00
go/internal/apidiff/README.md
Jonathan Amsterdam 9e5445377b internal/apidiff: diffs of package APIs
This is copied unchanged from x/exp.

Change-Id: I944b912212f7fd844a4bea81605433baf4bcc9a2
Reviewed-on: https://go-review.googlesource.com/c/tools/+/170862
Reviewed-by: Jay Conrod <jayconrod@google.com>
2019-04-09 17:10:12 +00:00

22 KiB

Checking Go Package API Compatibility

The apidiff tool in this directory determines whether two versions of the same package are compatible. The goal is to help the developer make an informed choice of semantic version after they have changed the code of their module.

apidiff reports two kinds of changes: incompatible ones, which require incrementing the major part of the semantic version, and compatible ones, which require a minor version increment. If no API changes are reported but there are code changes that could affect client code, then the patch version should be incremented.

Because apidiff ignores package import paths, it may be used to display API differences between any two packages, not just different versions of the same package.

The current version of apidiff compares only packages, not modules.

Compatibility Desiderata

Any tool that checks compatibility can offer only an approximation. No tool can detect behavioral changes; and even if it could, whether a behavioral change is a breaking change or not depends on many factors, such as whether it closes a security hole or fixes a bug. Even a change that causes some code to fail to compile may not be considered a breaking change by the developers or their users. It may only affect code marked as experimental or unstable, for example, or the break may only manifest in unlikely cases.

For a tool to be useful, its notion of compatibility must be relaxed enough to allow reasonable changes, like adding a field to a struct, but strict enough to catch significant breaking changes. A tool that is too lax will miss important incompatibilities, and users will stop trusting it; one that is too strict may generate so much noise that users will ignore it.

To a first approximation, this tool reports a change as incompatible if it could cause client code to stop compiling. But apidiff ignores five ways in which code may fail to compile after a change. Three of them are mentioned in the Go 1 Compatibility Guarantee.

Unkeyed Struct Literals

Code that uses an unkeyed struct literal would fail to compile if a field was added to the struct, making any such addition an incompatible change. An example:

// old
type Point struct { X, Y int }

// new
type Point struct { X, Y, Z int }

// client
p := pkg.Point{1, 2} // fails in new because there are more fields than expressions

Here and below, we provide three snippets: the code in the old version of the package, the code in the new version, and the code written in a client of the package, which refers to it by the name pkg. The client code compiles against the old code but not the new.

Embedding and Shadowing

Adding an exported field to a struct can break code that embeds that struct, because the newly added field may conflict with an identically named field at the same struct depth. A selector referring to the latter would become ambiguous and thus erroneous.

// old
type Point struct { X, Y int }

// new
type Point struct { X, Y, Z int }

// client
type z struct { Z int }

var v struct {
    pkg.Point
    z
}

_ = v.Z // fails in new

In the new version, the last line fails to compile because there are two embedded Z fields at the same depth, one from z and one from pkg.Point.

Using an Identical Type Externally

If it is possible for client code to write a type expression representing the underlying type of a defined type in a package, then external code can use it in assignments involving the package type, making any change to that type incompatible.

// old
type Point struct { X, Y int }

// new
type Point struct { X, Y, Z int }

// client
var p struct { X, Y int } = pkg.Point{} // fails in new because of Point's extra field

Here, the external code could have used the provided name Point, but chose not to. I'll have more to say about this and related examples later.

unsafe.Sizeof and Friends

Since unsafe.Sizeof, unsafe.Offsetof and unsafe.Alignof are constant expressions, they can be used in an array type literal:

// old
type S struct{ X int }

// new
type S struct{ X, y int }

// client
var a [unsafe.Sizeof(pkg.S{})]int = [8]int{} // fails in new because S's size is not 8

Use of these operations could make many changes to a type potentially incompatible.

Type Switches

A package change that merges two different types (with same underlying type) into a single new type may break type switches in clients that refer to both original types:

// old
type T1 int
type T2 int

// new
type T1 int
type T2 = T1

// client
switch x.(type) {
case T1:
case T2:
} // fails with new because two cases have the same type

This sort of incompatibility is sufficiently esoteric to ignore; the tool allows merging types.

First Attempt at a Definition

Our first attempt at defining compatibility captures the idea that all the exported names in the old package must have compatible equivalents in the new package.

A new package is compatible with an old one if and only if:

  • For every exported package-level name in the old package, the same name is declared in the new at package level, and
  • the names denote the same kind of object (e.g. both are variables), and
  • the types of the objects are compatible.

We will work out the details (and make some corrections) below, but it is clear already that we will need to determine what makes two types compatible. And whatever the definition of type compatibility, it's certainly true that if two types are the same, they are compatible. So we will need to decide what makes an old and new type the same. We will call this sameness relation correspondence.

Type Correspondence

Go already has a definition of when two types are the same: type identity. But identity isn't adequate for our purpose: it says that two defined types are identical if they arise from the same definition, but it's unclear what "same" means when talking about two different packages (or two versions of a single package).

The obvious change to the definition of identity is to require that old and new defined types have the same name instead. But that doesn't work either, for two reasons. First, type aliases can equate two defined types with different names:

// old
type E int

// new
type t int
type E = t

Second, an unexported type can be renamed:

// old
type u1 int
var V u1

// new
type u2 int
var V u2

Here, even though u1 and u2 are unexported, their exported fields and methods are visible to clients, so they are part of the API. But since the name u1 is not visible to clients, it can be changed compatibly. We say that u1 and u2 are exposed: a type is exposed if a client package can declare variables of that type.

We will say that an old defined type corresponds to a new one if they have the same name, or one can be renamed to the other without otherwise changing the API. In the first example above, old E and new t correspond. In the second, old u1 and new u2 correspond.

Two or more old defined types can correspond to a single new type: we consider "merging" two types into one to be a compatible change. As mentioned above, code that uses both names in a type switch will fail, but we deliberately ignore this case. However, a single old type can correspond to only one new type.

So far, we've explained what correspondence means for defined types. To extend the definition to all types, we parallel the language's definition of type identity. So, for instance, an old and a new slice type correspond if their element types correspond.

Definition of Compatibility

We can now present the definition of compatibility used by apidiff.

Package Compatibility

A new package is compatible with an old one if:

  1. Each exported name in the old package's scope also appears in the new package's scope, and the object (constant, variable, function or type) denoted by that name in the old package is compatible with the object denoted by the name in the new package, and
  2. For every exposed type that implements an exposed interface in the old package, its corresponding type should implement the corresponding interface in the new package.

Otherwise the packages are incompatible.

As an aside, the tool also finds exported names in the new package that are not exported in the old, and marks them as compatible changes.

Clause 2 is discussed further in "Whole-Package Compatibility."

Object Compatibility

This section provides compatibility rules for constants, variables, functions and types.

Constants

A new exported constant is compatible with an old one of the same name if and only if

  1. Their types correspond, and
  2. Their values are identical.

It is tempting to allow changing a typed constant to an untyped one. That may seem harmless, but it can break code like this:

// old
const C int64 = 1

// new
const C = 1

// client
var x = C          // old type is int64, new is int
var y int64 = x // fails with new: different types in assignment

A change to the value of a constant can break compatiblity if the value is used in an array type:

// old
const C = 1

// new
const C = 2

// client
var a [C]int = [1]int{} // fails with new because [2]int and [1]int are different types

Changes to constant values are rare, and determining whether they are compatible or not is better left to the user, so the tool reports them.

Variables

A new exported variable is compatible with an old one of the same name if and only if their types correspond.

Correspondence doesn't look past names, so this rule does not prevent adding a field to MyStruct if the package declares var V MyStruct. It does, however, mean that

var V struct { X int }

is incompatible with

var V struct { X, Y int }

I discuss this at length below in the section "Compatibility, Types and Names."

Functions

A new exported function or variable is compatible with an old function of the same name if and only if their types (signatures) correspond.

This rule captures the fact that, although many signature changes are compatible for all call sites, none are compatible for assignment:

var v func(int) = pkg.F

Here, F must be of type func(int) and not, for instance, func(...int) or func(interface{}).

Note that the rule permits changing a function to a variable. This is a common practice, usually done for test stubbing, and cannot break any code at compile time.

Exported Types

A new exported type is compatible with an old one if and only if their names are the same and their types correspond.

This rule seems far too strict. But, ignoring aliases for the moment, it demands only that the old and new defined types correspond. Consider:

// old
type T struct { X int }

// new
type T struct { X, Y int }

The addition of Y is a compatible change, because this rule does not require that the struct literals have to correspond, only that the defined types denoted by T must correspond. (Remember that correspondence stops at type names.)

If one type is an alias that refers to the corresponding defined type, the situation is the same:

// old
type T struct { X int }

// new
type u struct { X, Y int }
type T = u

Here, the only requirement is that old T corresponds to new u, not that the struct types correspond. (We can't tell from this snippet that the old T and the new u do correspond; that depends on whether u replaces T throughout the API.)

However, the following change is incompatible, because the names do not denote corresponding types:

// old
type T = struct { X int }

// new
type T = struct { X, Y int }

Type Literal Compatibility

Only five kinds of types can differ compatibly: defined types, structs, interfaces, channels and numeric types. We only consider the compatibility of the last four when they are the underlying type of a defined type. See "Compatibility, Types and Names" for a rationale.

We justify the compatibility rules by enumerating all the ways a type can be used, and by showing that the allowed changes cannot break any code that uses values of the type in those ways.

Values of all types can be used in assignments (including argument passing and function return), but we do not require that old and new types are assignment compatible. That is because we assume that the old and new packages are never used together: any given binary will link in either the old package or the new. So in describing how a type can be used in the sections below, we omit assignment.

Any type can also be used in a type assertion or conversion. The changes we allow below may affect the run-time behavior of these operations, but they cannot affect whether they compile. The only such breaking change would be to change the type T in an assertion x.T so that it no longer implements the interface type of x; but the rules for interfaces below disallow that.

A new type is compatible with an old one if and only if they correspond, or one of the cases below applies.

Defined Types

Other than assignment, the only ways to use a defined type are to access its methods, or to make use of the properties of its underlying type. Rule 2 below covers the latter, and rules 3 and 4 cover the former.

A new defined type is compatible with an old one if and only if all of the following hold:

  1. They correspond.
  2. Their underlying types are compatible.
  3. The new exported value method set is a superset of the old.
  4. The new exported pointer method set is a superset of the old.

An exported method set is a method set with all unexported methods removed. When comparing methods of a method set, we require identical names and corresponding signatures.

Removing an exported method is clearly a breaking change. But removing an unexported one (or changing its signature) can be breaking as well, if it results in the type no longer implementing an interface. See "Whole-Package Compatibility," below.

Channels

A new channel type is compatible with an old one if

  1. The element types correspond, and
  2. Either the directions are the same, or the new type has no direction.

Other than assignment, the only ways to use values of a channel type are to send and receive on them, to close them, and to use them as map keys. Changes to a channel type cannot cause code that closes a channel or uses it as a map key to fail to compile, so we need not consider those operations.

Rule 1 ensures that any operations on the values sent or received will compile. Rule 2 captures the fact that any program that compiles with a directed channel must use either only sends, or only receives, so allowing the other operation by removing the channel direction cannot break any code.

Interfaces

A new interface is compatible with an old one if and only if:

  1. The old interface does not have an unexported method, and it corresponds to the new interfaces (i.e. they have the same method set), or
  2. The old interface has an unexported method and the new exported method set is a superset of the old.

Other than assignment, the only ways to use an interface are to implement it, embed it, or call one of its methods. (Interface values can also be used as map keys, but that cannot cause a compile-time error.)

Certainly, removing an exported method from an interface could break a client call, so neither rule allows it.

Rule 1 also disallows adding a method to an interface without an existing unexported method. Such an interface can be implemented in client code. If adding a method were allowed, a type that implements the old interface could fail to implement the new one:

type I interface { M1() }         // old
type I interface { M1(); M2() }   // new

// client
type t struct{}
func (t) M1() {}
var i pkg.I = t{} // fails with new, because t lacks M2

Rule 2 is based on the observation that if an interface has an unexported method, the only way a client can implement it is to embed it. Adding a method is compatible in this case, because the embedding struct will continue to implement the interface. Adding a method also cannot break any call sites, since no program that compiles could have any such call sites.

Structs

A new struct is compatible with an old one if all of the following hold:

  1. The new set of top-level exported fields is a superset of the old.
  2. The new set of selectable exported fields is a superset of the old.
  3. If the old struct is comparable, so is the new one.

The set of selectable exported fields is the set of exported fields F such that x.F is a valid selector expression for a value x of the struct type. F may be at the top level of the struct, or it may be a field of an embedded struct.

Two fields are the same if they have the same name and corresponding types.

Other than assignment, there are only four ways to use a struct: write a struct literal, select a field, use a value of the struct as a map key, or compare two values for equality. The first clause ensures that struct literals compile; the second, that selections compile; and the third, that equality expressions and map index expressions compile.

Numeric Types

A new numeric type is compatible with an old one if and only if they are both unsigned integers, both signed integers, both floats or both complex types, and the new one is at least as large as the old on both 32-bit and 64-bit architectures.

Other than in assignments, numeric types appear in arithmetic and comparison expressions. Since all arithmetic operations but shifts (see below) require that operand types be identical, and by assumption the old and new types underly defined types (see "Compatibility, Types and Names," below), there is no way for client code to write an arithmetic expression that compiles with operands of the old type but not the new.

Numeric types can also appear in type switches and type assertions. Again, since the old and new types underly defined types, type switches and type assertions that compiled using the old defined type will continue to compile with the new defined type.

Going from an unsigned to a signed integer type is an incompatible change for the sole reason that only an unsigned type can appear as the right operand of a shift. If this rule is relaxed, then changes from an unsigned type to a larger signed type would be compatible. See this issue.

Only integer types can be used in bitwise and shift operations, and for indexing slices and arrays. That is why switching from an integer to a floating-point type--even one that can represent all values of the integer type--is an incompatible change.

Conversions from floating-point to complex types or vice versa are not permitted (the predeclared functions real, imag, and complex must be used instead). To prevent valid floating-point or complex conversions from becoming invalid, changing a floating-point type to a complex type or vice versa is considered an incompatible change.

Although conversions between any two integer types are valid, assigning a constant value to a variable of integer type that is too small to represent the constant is not permitted. That is why the only compatible changes are to a new type whose values are a superset of the old. The requirement that the new set of values must include the old on both 32-bit and 64-bit machines allows conversions from int32 to int and from int to int64, but not the other direction; and similarly for uint.

Changing a type to or from uintptr is considered an incompatible change. Since its size is not specified, there is no way to know whether the new type's values are a superset of the old type's.

Whole-Package Compatibility

Some changes that are compatible for a single type are not compatible when the package is considered as a whole. For example, if you remove an unexported method on a defined type, it may no longer implement an interface of the package. This can break client code:

// old
type T int
func (T) m() {}
type I interface { m() }

// new
type T int // no method m anymore

// client
var i pkg.I = pkg.T{} // fails with new because T lacks m

Similarly, adding a method to an interface can cause defined types in the package to stop implementing it.

The second clause in the definition for package compatibility handles these cases. To repeat:

  1. For every exposed type that implements an exposed interface in the old package, its corresponding type should implement the corresponding interface in the new package. Recall that a type is exposed if it is part of the package's API, even if it is unexported.

Other incompatibilities that involve more than one type in the package can arise whenever two types with identical underlying types exist in the old or new package. Here, a change "splits" an identical underlying type into two, breaking conversions:

// old
type B struct { X int }
type C struct { X int }

// new
type B struct { X int }
type C struct { X, Y int }

// client
var b B
_ = C(b) // fails with new: cannot convert B to C

Finally, changes that are compatible for the package in which they occur can break downstream packages. That can happen even if they involve unexported methods, thanks to embedding.

The definitions given here don't account for these sorts of problems.

Compatibility, Types and Names

The above definitions state that the only types that can differ compatibly are defined types and the types that underly them. Changes to other type literals are considered incompatible. For instance, it is considered an incompatible change to add a field to the struct in this variable declaration:

var V struct { X int }

or this alias definition:

type T = struct { X int }

We make this choice to keep the definition of compatibility (relatively) simple. A more precise definition could, for instance, distinguish between

func F(struct { X int })

where any changes to the struct are incompatible, and

func F(struct { X, u int })

where adding a field is compatible (since clients cannot write the signature, and thus cannot assign F to a variable of the signature type). The definition should then also allow other function signature changes that only require call-site compatibility, like

func F(struct { X, u int }, ...int)

The result would be a much more complex definition with little benefit, since the examples in this section rarely arise in practice.