author pmikus <peter.mikus@protonmail.ch> 2023-03-09 13:32:33 +0000
committer Peter Mikus <peter.mikus@protonmail.ch> 2023-03-15 10:06:55 +0000
commit b928e56347c682fdb9762ccbe2f368329d4037e4 (patch)
tree 9c0f6a02efc1704821229b8e18a23468fdf8674a /docs/automating_vpp_api_flag_day.rst
parent ddcdf45806d0efa9e89dd4446b4c7da39cfb27a8 (diff)
feat(docs): Hugo
Signed-off-by: pmikus <peter.mikus@protonmail.ch>
Change-Id: Id8b43ef1f31f39b19a0629c52581514fda278f3b
Diffstat (limited to 'docs/automating_vpp_api_flag_day.rst')
-rw-r--r-- docs/automating_vpp_api_flag_day.rst 325
1 files changed, 0 insertions, 325 deletions
diff --git a/docs/automating_vpp_api_flag_day.rst b/docs/automating_vpp_api_flag_day.rst
deleted file mode 100644
index 5e4803822f..0000000000
--- a/docs/automating_vpp_api_flag_day.rst
+++ /dev/null
@@ -1,325 +0,0 @@
-..
- Copyright (c) 2021 Cisco and/or its affiliates.
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at:
-..
- http://www.apache.org/licenses/LICENSE-2.0
-..
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-
-
-VPP API Flag Day Algorithm
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Abstract
-~~~~~~~~
-
-This document describes the current solution for automatically
-detecting VPP API changes that are not backward compatible
-with existing CSIT tests. It defines the "Flag Day" process
-for deploying a new set of CSIT tests that are compatible
-with the new version of the VPP API, without halting
-the normal VPP/CSIT operational CI process. Detection is initially
-limited to changes in \*.api files contained in the vpp repo.
-Eventually the detection algorithm could be extended to include
-other integration points, such as the "directory" structure of the stats
-segment or PAPI Python library dependencies.
-
-Motivation
-~~~~~~~~~~
-
-Aside from per-release activities (release report), CSIT also provides testing
-that requires somewhat tight coupling to the latest (merged but not released)
-VPP code. Currently, HEAD of one project is run against a somewhat older codebase
-of the other project. The definition of which older codebase to use
-is maintained by the CSIT project. For the older CSIT codebase, there are so-called
-"oper" branches. For the older VPP codebase, CSIT master HEAD contains identifiers
-for "stable" VPP builds. Such older codebases are also used for verify jobs,
-where HEAD of the other project is replaced by the commit under review.
-
-One particular type of job useful for VPP development is the trending job.
-Trending jobs test the latest VPP build with the latest oper branch of CSIT,
-and analytics are applied to detect regressions in performance.
-For this to work properly, the VPP project needs a warning against breaking
-the assumptions the current oper branch makes about VPP behavior.
-In the past, the most frequent type of such breakage was an API change.
-
-Earlier attempts to create a process to minimize breakage have focused
-on creating a new verify job for VPP (called api-crc job) that
-votes -1 on a change that affects CRC values for API messages CSIT uses.
-The list of messages and CRC values (multiple "collections" are allowed)
-is maintained in CSIT repository (in oper branch).
-The process was less explicit on how the CSIT project should maintain such a list.
-As CSIT was not willing to support two incompatible API messages
-by the same codebase (commit), there were unavoidable windows
-where either trending jobs or CSIT verify jobs were failing.
-
-Practice showed that human (or infra) errors can create two kinds of breakages.
-Either the unavoidable short window gets long, affecting a trending job run
-or two, or the api-crc job starts giving -1 to innocent changes
-because oper branch went out of sync with VPP HEAD codebase.
-This second type of failure prevents any merges to VPP for a long time
-(12 hours is the typical time, given time zone differences).
-
-The current version of this document introduces two new requirements.
-Firstly, the api-crc job should not give a false -1 under any
-(reasonable) circumstances. That means, if neither a VPP change
-nor any of its unmerged ancestor commits affects any CRC values
-for messages used by CSIT, -1 should only mean "rebase is needed",
-and rebasing to HEAD should result in +1 from the api-crc job.
-Secondly, no more than one VPP change is allowed to be processed
-at the same time.
-
-Naming
-~~~~~~
-
-It is easier to define the process after choosing shorter names
-for notions that need long definitions.
-
-Note: Every time a single job is mentioned,
-in practice it can be a set of jobs covering parts of the functionality.
-A "run" of the set of jobs passes only if each job within the set
-has been run (again) and passed.
-
-Jobs
-----
-
-+ A *vpp verify* job: Any job run automatically, and voting on open VPP changes.
- Some verify jobs compile and package VPP for the target operating system
- and processor architecture; the packages are NOT archived (currently).
- They should be cached somewhere in the future to speed up downstream jobs,
- but currently each such downstream job can clone and build.
-
-+ The *api-crc* job: A quick verify job for VPP changes that accesses
- the CSIT repository (checking out the latest oper branch HEAD) to figure out
- whether merging the change is safe from the CSIT point of view.
- Here, -1 means CSIT is not ready. +1 means CSIT looks to be ready
- for the new CRC values, but there still may be failures on real tests.
-
-+ A *trending* job: Any job that is started by timer and performs testing.
- It checks out the latest CSIT oper branch HEAD, downloads the most recent
- completely uploaded VPP package, and unconditionally runs the tests.
- CRC checks are optional, ideally only written to console log
- without otherwise affecting the test cases.
-
-+ A *vpp-csit* job: A slower verify job for VPP changes, that accesses CSIT
- repository and runs tests from the correct CSIT commit (chosen as in trending)
- against the VPP (built from the VPP patch under review).
- Vote -1 means there were test failures. +1 means no test failures, meaning
- there either was no API change, or it was backward compatible.
-
-+ A *csit-vpp* job: Verify job for open CSIT changes. Downloads the
- (completely uploaded) VPP package marked as "stable", and runs a selection
- of tests (from the CSIT patch under review).
- Vote +1 means all tests have passed, so it is safe to merge
- the patch under review.
-
-+ A *patch-on-patch* job: Manually triggered non-voting job
- for open CSIT changes. Compiles and packages from VPP source
- (usually of an unmerged change). Then runs the same tests as csit-vpp job.
- This job is used to prove the CSIT patch under review is supporting
- the specified VPP code.
- In practice, this can be a vpp-csit job started with CSIT_REF set.
-
-+ A *manual verification* is done by a CSIT committer, locally executing steps
- equivalent to the patch-on-patch job. This can save time and resources.
-
-CRC Collections
----------------
-
-Any commit in/for the CSIT repository contains a file (supported_crcs.yaml),
-which contains either one or two collections. A collection is a mapping
-that maps API message name to its CRC value.
-
-A collection name specifies which VPP build the collection is for.
-An API message name is present in a collection if and only if
-it is used by a test implementation (possibly in a different CSIT commit)
-targeted at the VPP build (pointed out by the collection name).
-
-+ The *stable collection*: Usually required, listed first, has comments and name
- pointing to the VPP build this CSIT commit marks as stable.
- The stable collection is only missing in deactivating changes (see below)
- when not mergeable yet.
-
-+ The *active collection*: Optional, listed second, has comments and name
- pointing to the VPP Gerrit change (including patch set number)
- that the API process is currently processing.
- The patch set number part can be behind the actual Gerrit state.
- This is safe, because api-crc job on the active API change will fail
- if the older patch is no longer API-equivalent to the newer patch.
-
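The role of the collections can be illustrated with a minimal Python sketch. It is not the actual api-crc job code. It assumes a collection is a plain mapping from message name to CRC string, and that a build is acceptable when at least one collection matches it fully. The function names, collection names, message names and CRC values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical type alias: API message name -> CRC value (as a string).
Collection = Dict[str, str]


def mismatches(collection: Collection, vpp_crcs: Dict[str, str]) -> List[str]:
    """Return sorted names of messages whose CRC is missing or different in the build."""
    return sorted(
        name for name, crc in collection.items() if vpp_crcs.get(name) != crc
    )


def api_crc_vote(collections: Dict[str, Collection], vpp_crcs: Dict[str, str]) -> int:
    """Return +1 if at least one collection fully matches the build, else -1."""
    for collection in collections.values():
        if not mismatches(collection, vpp_crcs):
            return 1
    return -1


# Made-up collection names, message names and CRC values, for illustration only.
collections = {
    "stable-build": {"msg_a": "0x00000001", "msg_b": "0x00000002"},
    "active-change": {"msg_a": "0x00000001", "msg_b": "0x000000ff"},
}
old_build = {"msg_a": "0x00000001", "msg_b": "0x00000002", "msg_c": "0x00000003"}
new_build = {"msg_a": "0x00000001", "msg_b": "0x000000ff", "msg_c": "0x00000003"}
other_build = {"msg_a": "0x000000aa", "msg_b": "0x00000002"}

print(api_crc_vote(collections, old_build))    # matches the stable collection
print(api_crc_vote(collections, new_build))    # matches the active collection
print(api_crc_vote(collections, other_build))  # matches neither collection
```

Messages that CSIT does not use (``msg_c`` here) are ignored, which mirrors the rule that a message is listed in a collection only if some test uses it.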
-Changes
--------
-
-+ An *API change*: The name for any Gerrit Change for VPP repository
- that does not pass api-crc job right away, and needs this whole process.
- This usually means .api files are edited, but a patch that affects
- the way CRC values are computed is also an API change.
-
- Full name could be VPP API Change, but as no CSIT change is named "API change"
- (and this document does not talk about other FD.io or external projects),
- "API change" is shorter.
-
- TODO: Is there a magic incantation for Gerrit WebUI to search for API changes?
- Open, -1 from api-crc job, +1 from other (non-csit) jobs.
-
-+ A *blocked change*: The name for an open Gerrit Change for VPP repository
- that got -1 from some of the voting verify jobs.
-
-+ A *VPP-blocked change*: A blocked change which got -1 from some "pure VPP"
- verify job, meaning no CSIT code has been involved in the vote.
- Example: "make test" fails.
-
- The VPP contributor is expected to fix the change, or VPP developers
- are expected to find a cause in an earlier VPP change, and fix it.
- No interaction with CSIT developers is necessary.
-
-+ A *CSIT-blocked change*: A blocked change which is not VPP-blocked,
- but does not pass some vpp-csit job.
- To fix a CSIT-blocked change, an interaction with a CSIT committer
- is usually necessary. Even if a VPP developer is experienced enough
- to identify the cause of the failure, a merge to CSIT is usually needed
- for a full fix.
-
- This process does not specify what to do with CSIT-blocked changes
- that are not also API changes.
-
-+ A *candidate API change*: An API change that meets all requirements
- to become active (see below). Currently, the requirements are:
-
- + No -1 nor -2 from any human reviewer.
-
- + All verify jobs (except vpp-csit ones) pass.
-
- + +1 from a VPP committer.
-
- The reason is to avoid situations where an API change becomes active,
- but the VPP committers are unwilling to merge it for some reason.
-
-+ The *active API change*: The candidate API change currently being processed
- by the API Flag Day Algorithm.
- While many API changes can be candidates at the same time,
- only one is allowed to be active at a time.
-
-+ The *activating change*: The name for a Gerrit Change for CSIT repository
- that does not change the test code, but adds the active CRC collection.
- Merge of the activating change (to latest CSIT oper branch) defines
- which API change has become active.
-
-+ The *deactivating change*: The name for Gerrit Change for CSIT repository
- that only supports tests and CRC values for VPP with the active API change.
- That implies the previously stable CRC collection is deleted,
- and any edits to the test implementation are done here.
-
-+ The *mergeable deactivating change*: The deactivating change with additional
- requirements. Details on the requirements are listed in the next section.
- Merging this change finishes the process for the active API change.
-
-It is possible for a single CSIT change to act both as a mergeable
-deactivating change for one API change, and as an activating change
-for another API change. As English lacks a good adjective for such a thing,
-this document does not name this change.
-When this document says a change is activating or deactivating,
-it allows for the possibility that the change also fulfills other purposes
-(e.g. acting as deactivating / activating change for another API change).
-
-Algorithm Steps
-~~~~~~~~~~~~~~~
-
-The following steps describe the application of the API "Flag Day" algorithm:
-
-#. A VPP patch for an API change is submitted to
- Gerrit for review.
-#. The api-crc job detects the API CRC values have changed
- for some messages used by CSIT.
-#. The api-crc job runs in parallel with any other vpp-csit verify job,
- so those other jobs can hint at the impact on CSIT.
- Currently, any such vpp-csit job is non-voting,
- as the current process does not guarantee such jobs pass
- when the API change is merged.
-#. If the api-crc job fails, an email with the appropriate reason
- is sent to the VPP patch submitter and vpp-api-dev@lists.fd.io
- including the VPP patch information and .api files that are edited.
-#. The VPP patch developer works with a VPP committer
- to ensure the patch meets requirements to become a candidate (see above).
-#. The VPP patch developer and CSIT team create a CSIT JIRA ticket
- to identify the work required to support the new VPP API version.
-#. CSIT developer creates a patch of the deactivating change
- (upload to Gerrit not required yet).
-#. CSIT developer runs patch-on-patch job (or manual verification).
- Both developers iterate until the verification passes.
- Note that in this phase csit-vpp job is expected to vote -1,
- as the deactivating change is not mergeable yet.
-#. CSIT developer creates the activating change, uploads to Gerrit,
- waits for vote (usual review cycle applies).
-#. When CSIT committer is satisfied, the activating change is merged
- to CSIT master branch and cherry-picked to the latest oper branch.
- This enters a "critical section" of the process.
- Merges of other activating changes are not allowed from now on.
- The targeted API change becomes the active API change.
- This does not break any jobs.
-#. VPP developer (or CSIT committer) issues a recheck on the VPP patch.
-#. On failure, VPP and CSIT committers analyze what went wrong.
- Typically, the active CRC collection is matching only an older patch set,
- but a newer patch set needs different CRC values,
- either due to improvements on the VPP change in question,
- or due to a rebase over a previously merged (unrelated) API change.
- VPP perhaps needs to rebase, and CSIT definitely needs
- to merge edits to the active collection. Then issue a recheck again,
- and iterate until success.
-#. On success, VPP committer merges the active API change patch.
- (This is also a delayed verification of the current active CRC collection.)
-#. VPP committer sends an e-mail to vpp-api-dev stating the support for
- the previous CRC values will soon be removed, implying other changes
- (whether API or not) should be rebased soon.
-#. VPP merge jobs create and upload new VPP packages.
- This breaks trending jobs, but both VPP and CSIT verify jobs still work.
-#. CSIT developer makes the deactivating change mergeable:
- The stable VPP build indicator is bumped to the build
- that contains the active API change. The active CRC collection
- (added by the activating change) is renamed to the new stable collection.
- (The previous stable collection has already been deleted.)
- At this time, the deactivating change should be uploaded to Gerrit and
- csit verify jobs should be triggered.
-#. CSIT committer reviews the code, perhaps triggering any additional jobs
- needed to verify the tests using the edited APIs are still working.
-#. When satisfied, CSIT committer merges the mergeable deactivating change
- (to both master and oper).
- The merge fixes trending jobs. VPP and CSIT verify jobs continue to work.
- The merge also breaks some verify jobs for old changes in VPP,
- as announced when the active API change was merged.
- The merge is the point where the process leaves the "critical section",
- thus allowing merges of activating changes for other API changes.
-#. CSIT committer sends an e-mail to vpp-api-dev stating the support for
- the previous CRC values has been removed, and rebase is needed
- for all affected VPP changes.
-#. Recheck of existing VPP patches in Gerrit may cause the "VPP
- API Incompatible Change Test" to send an email asking the patch
- submitter to rebase the patch to pick up the compatible VPP API
- version files.
-
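The "critical section" in the steps above can be sketched as a small state machine. This is an illustrative model only, not part of any CSIT or VPP tooling. The class and method names are hypothetical, and it deliberately ignores the case where one CSIT change is both deactivating and activating.

```python
from typing import Optional


class FlagDayProcess:
    """Toy model of the critical section: at most one active API change."""

    def __init__(self) -> None:
        self.active_change: Optional[str] = None  # hypothetical change identifier
        self.api_change_merged = False

    def merge_activating_change(self, api_change: str) -> None:
        # Entering the critical section; no other activating change may merge.
        if self.active_change is not None:
            raise RuntimeError("another API change is already active")
        self.active_change = api_change
        self.api_change_merged = False

    def merge_api_change(self, api_change: str) -> None:
        # Only the active API change may be merged to VPP.
        if api_change != self.active_change:
            raise RuntimeError("only the active API change may be merged")
        self.api_change_merged = True

    def merge_deactivating_change(self) -> None:
        # Leaving the critical section; requires the API change to be in VPP.
        if not self.api_change_merged:
            raise RuntimeError("the active API change is not merged yet")
        self.active_change = None
        self.api_change_merged = False

    def trending_ok(self) -> bool:
        # Trending is broken between the VPP merge and the deactivating merge.
        return not self.api_change_merged


process = FlagDayProcess()
process.merge_activating_change("api-change-1")
process.merge_api_change("api-change-1")
print(process.trending_ok())  # trending is broken at this point
process.merge_deactivating_change()
print(process.trending_ok())  # trending works again
```

The model makes the ordering constraint explicit: an attempt to merge a second activating change before the deactivating change is merged raises an error.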
-Real life examples
-~~~~~~~~~~~~~~~~~~
-
-Simple API change: https://gerrit.fd.io/r/c/vpp/+/23829
-
-Activating change: https://gerrit.fd.io/r/c/csit/+/23956
-
-Mergeable deactivating change: https://gerrit.fd.io/r/c/csit/+/24280
-
-Less straightforward mergeable deactivating change:
-https://gerrit.fd.io/r/c/csit/+/22526
-It shows:
-
-+ CRC edits: supported_crcs.yaml
-+ Version bump: VPP_STABLE_VER_UBUNTU_BIONIC
-+ And even a way to work around failing tests:
- eth2p-ethicmpv4-ip4base-eth-1tap-dev.robot
-
-Simple change that is both deactivating and activating:
-https://gerrit.fd.io/r/c/csit/+/23969