Diffstat (limited to 'docs/toi')
-rw-r--r--  docs/toi/automating_vpp_api_flag_day.md    303
-rw-r--r--  docs/toi/bash_code_style.md                 651
-rw-r--r--  docs/toi/branches.md                         13
-rw-r--r--  docs/toi/test_code_guidelines.md            294
-rw-r--r--  docs/toi/vagrant.md                         326
5 files changed, 1581 insertions, 6 deletions
diff --git a/docs/toi/automating_vpp_api_flag_day.md b/docs/toi/automating_vpp_api_flag_day.md
new file mode 100644
index 0000000000..131adeab9d
--- /dev/null
+++ b/docs/toi/automating_vpp_api_flag_day.md
@@ -0,0 +1,303 @@
+---
+bookHidden: true
+title: "VPP API Flag Day Algorithm"
+---
+
+# VPP API Flag Day Algorithm
+
+## Abstract
+
+This document describes the current solution to the problem of
+automatically detecting VPP API changes which are not backwards
+compatible with existing CSIT tests. It defines the "Flag Day"
+process for deploying a new set of CSIT tests, compatible
+with the new version of the VPP API, without halting the
+normal VPP/CSIT operational CI process. This is initially
+limited to changes in \*.api files contained in the vpp repo.
+Eventually the detection algorithm could be extended to include
+other integration points, such as the "directory" structure of the
+stats segment or PAPI python library dependencies.
+
+## Motivation
+
+Aside from per-release activities (release report), CSIT also provides testing
+that requires somewhat tight coupling to the latest (merged but not released)
+VPP code. Currently, HEAD of one project is run against a somewhat older
+codebase of the other project. The definition of which older codebase to use
+is maintained by the CSIT project. For older CSIT codebases, there are
+so-called "oper" branches. For older VPP codebases, CSIT master HEAD contains
+identifiers for "stable" VPP builds. Such older codebases are also used for
+verify jobs, where HEAD of the other project is replaced by the commit
+under review.
+
+One type of job particularly useful for VPP development is trending jobs.
+They test the latest VPP build with the latest oper branch of CSIT,
+and analytics is applied to detect regressions in performance.
+For this to work properly, the VPP project needs a warning against breaking
+the assumptions the current oper branch makes about VPP behavior.
+In the past, the most frequent type of such breakage was an API change.
+
+Earlier attempts to create a process to minimize breakage have focused
+on creating a new verify job for VPP (called the api-crc job) that
+votes -1 on a change that affects CRC values for API messages CSIT uses.
+The list of messages and CRC values (multiple "collections" are allowed)
+is maintained in the CSIT repository (in the oper branch).
+The process was less explicit on how the CSIT project should maintain
+such a list. As CSIT was not willing to support two incompatible API messages
+in the same codebase (commit), there were unavoidable windows
+where either trending jobs or CSIT verify jobs were failing.
+
+Practice showed that human (or infra) errors can create two kinds of breakage.
+Either the unavoidable short window gets long, affecting a trending job run
+or two, or the api-crc job starts giving -1 to innocent changes
+because the oper branch went out of sync with the VPP HEAD codebase.
+This second type of failure prevents any merges to VPP for a long time
+(12 hours is typical, given time zone differences).
+
+The current version of this document introduces two new requirements.
+Firstly, the api-crc job should not give a false -1 under any
+(reasonable) circumstances. That means that if a VPP change
+(and all of its unmerged ancestor commits) does not affect any CRC values
+for messages used by CSIT, -1 should only mean "rebase is needed",
+and rebasing to HEAD should result in +1 from the api-crc job.
+Secondly, no more than one VPP change is allowed to be processed
+at the same time.
+
+## Naming
+
+It is easier to define the process after choosing shorter names
+for notions that otherwise need long definitions.
+
+Note: Every time a single job is mentioned,
+in practice it can be a set of jobs covering parts of functionality.
+A "run" of the set of jobs passes only if each job within the set
+has been run (again) and passed.
+
+## Jobs
+
++ A *vpp verify* job: Any job run automatically, and voting on open VPP changes.
+  Some verify jobs compile and package VPP for the target operating system
+  and processor architecture; the packages are NOT archived (currently).
+  They should be cached somewhere in future to speed up downstream jobs,
+  but currently each such downstream job clones and builds on its own.
+
++ The *api-crc* job: A quick verify job for VPP changes that accesses the
+  CSIT repository (checking out the latest oper branch HEAD) to figure out
+  whether merging the change is safe from the CSIT point of view.
+  Here, -1 means CSIT is not ready. +1 means CSIT looks ready
+  for the new CRC values, but there still may be failures in real tests.
+
++ A *trending* job: Any job started by a timer that performs testing.
+  It checks out the latest CSIT oper branch HEAD, downloads the most recent
+  completely uploaded VPP package, and unconditionally runs the tests.
+  CRC checks are optional, ideally only written to the console log
+  without otherwise affecting the test cases.
+
++ A *vpp-csit* job: A slower verify job for VPP changes that accesses the CSIT
+  repository and runs tests from the correct CSIT commit (chosen as in trending)
+  against VPP (built from the VPP patch under review).
+  Vote -1 means there were test failures. +1 means no test failures, so there
+  either was no API change, or it was backward compatible.
+
++ A *csit-vpp* job: Verify job for open CSIT changes. Downloads the
+ (completely uploaded) VPP package marked as "stable", and runs a selection
+ of tests (from the CSIT patch under review).
+ Vote +1 means all tests have passed, so it is safe to merge
+ the patch under review.
+
++ A *patch-on-patch* job: Manually triggered non-voting job
+ for open CSIT changes. Compiles and packages from VPP source
+ (usually of an unmerged change). Then runs the same tests as csit-vpp job.
+ This job is used to prove the CSIT patch under review is supporting
+ the specified VPP code.
+ In practice, this can be a vpp-csit job started with CSIT_REF set.
+
++ A *manual verification* is done by a CSIT committer, locally executing steps
+  equivalent to the patch-on-patch job. This can save time and resources.
+
+## CRC Collections
+
+Any commit in/for the CSIT repository contains a file (supported_crcs.yaml),
+which contains either one or two collections. A collection is a mapping
+that maps API message name to its CRC value.
+
+A collection name specifies which VPP build the collection is for.
+An API message name is present in a collection if and only if
+it is used by a test implementation (possibly in a different CSIT commit)
+targeted at the VPP build (pointed to by the collection name).
+A conceptual sketch of how collections are used ends this section.
+
++ The *stable collection*: Usually required, listed first, has comments and name
+ pointing to the VPP build this CSIT commit marks as stable.
+ The stable collection is only missing in deactivating changes (see below)
+ when not mergeable yet.
+
++ The *active collection*: Optional, listed second, has comments and name
+ pointing to the VPP Gerrit (including patch set number)
+ the currently active API process is processing.
+ The patch set number part can be behind the actual Gerrit state.
+ This is safe, because api-crc job on the active API change will fail
+ if the older patch is no longer API-equivalent to the newer patch.
+
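+Conceptually, the api-crc job compares a collection against the CRC
+values computed from the VPP build under review. A simplified sketch
+of the idea follows; the file names and line format are invented
+for illustration, this is not the actual job implementation:
+
+```
+# Each line in both files: "message_name crc_value".
+while read -r message expected_crc; do
+    actual_crc="$(awk -v m="${message}" '$1 == m {print $2}' \
+        "vpp_computed_crcs.txt")"
+    if [[ "${actual_crc}" != "${expected_crc}" ]]; then
+        echo "CRC mismatch for message: ${message}"
+        exit 1
+    fi
+done < "csit_crc_collection.txt"
+```
+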
+## Changes
+
++ An *API change*: The name for any Gerrit Change for the VPP repository
+  that does not pass the api-crc job right away, and needs this whole process.
+ This usually means .api files are edited, but a patch that affects
+ the way CRC values are computed is also an API change.
+
+ Full name could be VPP API Change, but as no CSIT change is named "API change"
+ (and this document does not talk about other FD.io or external projects),
+ "API change" is shorter.
+
++ A *blocked change*: The name for an open Gerrit Change for VPP repository
+  that got -1 from some of the voting verify jobs.
+
++ A *VPP-blocked change*: A blocked change which got -1 from some "pure VPP"
+ verify job, meaning no CSIT code has been involved in the vote.
+ Example: "make test" fails.
+
+  The VPP contributor is expected to fix the change, or VPP developers
+  are expected to find the cause in an earlier VPP change, and fix it.
+ No interaction with CSIT developers is necessary.
+
++ A *CSIT-blocked change*: A blocked change which is not VPP-blocked,
+ but does not pass some vpp-csit job.
+ To fix a CSIT-blocked change, an interaction with a CSIT committer
+ is usually necessary. Even if a VPP developer is experienced enough
+ to identify the cause of the failure, a merge to CSIT is usually needed
+ for a full fix.
+
+ This process does not specify what to do with CSIT-blocked changes
+ that are not also API changes.
+
++ A *candidate API change*: An API change that meets all requirements
+ to become active (see below). Currently, the requirements are:
+
+  + No -1 nor -2 from any human reviewer.
+
+ + All verify jobs (except vpp-csit ones) pass.
+
+ + +1 from a VPP committer.
+
+ The reason is to avoid situations where an API change becomes active,
+ but the VPP committers are unwilling to merge it for some reason.
+
++ The *active API change*: The candidate API change currently being processed
+ by the API Flag Day Algorithm.
+  While many API changes can be candidates at the same time,
+  only one is allowed to be active at a time.
+
++ The *activating change*: The name for a Gerrit Change for CSIT repository
+  that does not change the test code, but adds the active CRC collection.
+  Merging the activating change (to the latest CSIT oper branch) defines
+  which API change has become active.
+
++ The *deactivating change*: The name for a Gerrit Change for CSIT repository
+  that only supports tests and CRC values for VPP with the active API change.
+ That implies the previously stable CRC collection is deleted,
+ and any edits to the test implementation are done here.
+
++ The *mergeable deactivating change*: The deactivating change with additional
+ requirements. Details on the requirements are listed in the next section.
+ Merging this change finishes the process for the active API change.
+
+It is possible for a single CSIT change to act both as a mergeable
+deactivating change for one API change, and as an activating change
+for another API change. As English lacks a good adjective for such a thing,
+this document does not name this change.
+When this document says a change is activating or deactivating,
+it allows the possibility of the change also fulfilling other purposes
+(e.g. acting as deactivating / activating change for another API change).
+
+## Algorithm Steps
+
+The following steps describe the application of the API "Flag Day" algorithm:
+
+1. A VPP patch for an API change is submitted to
+   Gerrit for review.
+1. The api-crc job detects that the API CRC values have changed
+   for some messages used by CSIT.
+1. The api-crc job runs in parallel with any other vpp-csit verify job,
+   so those other jobs can hint at the impact on CSIT.
+   Currently, any such vpp-csit job is non-voting,
+   as the current process does not guarantee such jobs pass
+   when the API change is merged.
+1. If the api-crc job fails, an email with the appropriate reason
+   is sent to the VPP patch submitter and vpp-api-dev@lists.fd.io,
+   including the VPP patch information and the .api files that are edited.
+1. The VPP patch developer works with a VPP committer
+   to ensure the patch meets the requirements to become a candidate
+   (see above).
+1. The VPP patch developer and the CSIT team create a CSIT JIRA ticket
+   to identify the work required to support the new VPP API version.
+1. A CSIT developer creates a patch of the deactivating change
+   (upload to Gerrit not required yet).
+1. The CSIT developer runs a patch-on-patch job (or manual verification).
+   Both developers iterate until the verification passes.
+   Note that in this phase the csit-vpp job is expected to vote -1,
+   as the deactivating change is not mergeable yet.
+1. The CSIT developer creates the activating change, uploads it to Gerrit,
+   and waits for votes (the usual review cycle applies).
+1. When a CSIT committer is satisfied, the activating change is merged
+   to the CSIT master branch and cherry-picked to the latest oper branch.
+   This enters a "critical section" of the process.
+   Merges of other activating changes are not allowed from now on.
+   The targeted API change becomes the active API change.
+   This does not break any jobs.
+1. The VPP developer (or a CSIT committer) issues a recheck on the VPP patch.
+1. On failure, VPP and CSIT committers analyze what went wrong.
+   Typically, the active CRC collection matches only an older patch set,
+   but a newer patch set needs different CRC values,
+   either due to improvements of the VPP change in question,
+   or due to a rebase over a previously merged (unrelated) API change.
+   VPP perhaps needs to rebase, and CSIT definitely needs
+   to merge edits to the active collection. Then a recheck is issued again,
+   iterating until success.
+1. On success, a VPP committer merges the active API change.
+   (This also serves as delayed verification of the current active
+   CRC collection.)
+1. The VPP committer sends an e-mail to vpp-api-dev stating that support for
+   the previous CRC values will soon be removed, implying other changes
+   (whether API or not) should be rebased soon.
+1. VPP merge jobs create and upload new VPP packages.
+   This breaks trending jobs, but both VPP and CSIT verify jobs still work.
+1. The CSIT developer makes the deactivating change mergeable:
+   The stable VPP build indicator is bumped to the build
+   that contains the active API change. The active CRC collection
+   (added by the activating change) is renamed to the new stable collection.
+   (The previous stable collection has already been deleted.)
+   At this time, the deactivating change should be uploaded to Gerrit and
+   CSIT verify jobs should be triggered.
+1. A CSIT committer reviews the code, perhaps triggering any additional jobs
+   needed to verify the tests using the edited APIs are still working.
+1. When satisfied, the CSIT committer merges the mergeable deactivating change
+   (to both master and oper).
+   The merge fixes trending jobs. VPP and CSIT verify jobs continue to work.
+   The merge also breaks some verify jobs for old changes in VPP,
+   as announced when the active API change was merged.
+   The merge is the point where the process leaves the "critical section",
+   thus allowing merges of activating changes for other API changes.
+1. The CSIT committer sends an e-mail to vpp-api-dev stating that support for
+   the previous CRC values has been removed, and a rebase is needed
+   for all affected VPP changes.
+1. A recheck of existing VPP patches in Gerrit may cause the "VPP
+   API Incompatible Change Test" to send an email to the patch
+   submitter, asking to rebase the patch to pick up the compatible VPP API
+   version files.
+
+### Real Life Examples
+
+Simple API change: https://gerrit.fd.io/r/c/vpp/+/23829
+
+Activating change: https://gerrit.fd.io/r/c/csit/+/23956
+
+Mergeable deactivating change: https://gerrit.fd.io/r/c/csit/+/24280
+
+Less straightforward mergeable deactivating change:
+https://gerrit.fd.io/r/c/csit/+/22526
+It shows:
+
++ CRC edits: supported_crcs.yaml
++ Version bump: VPP_STABLE_VER_UBUNTU_BIONIC
++ And even a way to work around failing tests:
+ eth2p-ethicmpv4-ip4base-eth-1tap-dev.robot
+
+Simple change that is both deactivating and activating:
+https://gerrit.fd.io/r/c/csit/+/23969
diff --git a/docs/toi/bash_code_style.md b/docs/toi/bash_code_style.md
new file mode 100644
index 0000000000..bbd0c37196
--- /dev/null
+++ b/docs/toi/bash_code_style.md
@@ -0,0 +1,651 @@
+---
+bookHidden: true
+title: "Bash Code Style"
+---
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
+"MAY", and "OPTIONAL" in this document are to be interpreted as
+described in [BCP 14](https://tools.ietf.org/html/bcp14),
+[RFC2119](https://tools.ietf.org/html/rfc2119),
+[RFC8174](https://tools.ietf.org/html/rfc8174)
+when, and only when, they appear in all capitals, as shown here.
+
+This document SHALL describe guidelines for writing reliable, maintainable,
+reusable and readable code for CSIT.
+
+# Proposed Style
+
+# File Types
+
+Bash files SHOULD NOT be monolithic. Generally, this document
+considers two types of bash files:
+
++ Entry script: Assumed to be called by user,
+ or a script "external" in some way.
+
+ + Sources bash libraries and calls functions defined there.
+
++ Library file: To be sourced by entry scripts, possibly also by other libraries.
+
+ + Sources other libraries for functions it needs.
+
+ + Or relies on a related file already having sourced that.
+
+ + Documentation SHALL imply which case it is.
+
+ + Defines multiple functions other scripts can call.
+
+# Safety
+
++ Variable expansions MUST be quoted, to prevent word splitting.
+
+ + This includes special "variables" such as "${1}".
+
+ + RECOMMENDED even if the value is safe, as in "$?" and "$#".
+
+ + It is RECOMMENDED to quote strings in general,
+ so text editors can syntax-highlight them.
+
+ + Even if the string is a numeric value.
+
+ + Commands and known options can get their own highlight, no need to quote.
+
+ + Example: You do not need to quote every word of
+ "pip install --upgrade virtualenv".
+
+ + Code SHALL NOT quote glob characters you need to expand (obviously).
+
+  + OPTIONALLY do not quote adjacent characters (such as dot or forward slash),
+ so that syntax highlighting makes them stand out compared to surrounding
+ ordinary strings.
+
+ + Example: cp "logs"/*."log" "."/
+
+  + Command substitutions on the right hand side of an assignment are safe
+    without quotes.
+
+ + Note that command substitution limits the scope for quotes,
+ so it is NOT REQUIRED to escape the quotes in deeper levels.
+
+  + Both backticks and "dollar round-bracket" provide command substitution.
+    The following rules are RECOMMENDED:
+
+ + For simple constructs, use "dollar round-bracket".
+
+ + If there are round brackets in the surrounding text, use backticks,
+ as some editor highlighting logic can get confused.
+
+ + Avoid nested command substitution.
+
+ + Put intermediate results into local variables,
+ use "|| die" on each step of command substitution.
+
+ + Code SHOULD NOT be structured in a way where
+ word splitting is intended.
+
+    + Example: A variable holding a string of multiple command line arguments.
+
+    + Solution: An array variable should be used in this case
+      (see the example at the end of this section).
+
+ + Expansion MUST use quotes then: "${name[@]}".
+
+ + Word splitting MAY be used when creating arrays from command substitution.
+
++ Code MUST always check the exit code of commands.
+
+ + Traditionally, error code checking is done either by "set -e"
+ or by appending "|| die" after each command.
+ The first is unreliable, due to many rules affecting "set -e" behavior
+ (see <https://mywiki.wooledge.org/BashFAQ/105>), but "|| die"
+ relies on humans identifying each command, which is also unreliable.
+    When was the last time you checked the error code of the "echo" command,
+ for example?
+
+ + Another example: "set -e" in your function has no effect
+ if any ancestor call is done with logical or,
+    for example in the "func || code=$?" construct.
+
+ + As there is no reliable method of error detection, and there are two
+    largely independent unreliable methods, the best we can do
+ is to apply both. So, code SHOULD explicitly
+ check each command (with "|| die" and similar) AND have "set -e" applied.
+
+ + Code MUST explicitly check each command, unless the command is well known,
+ and considered safe (such as the aforementioned "echo").
+
+ + The well known commands MUST still be checked implicitly via "set -e".
+
+ + See below for specific "set -e" recommendations.
+
++ Code SHOULD use "readlink -e" (or "-f" if target does not exist yet)
+ to normalize any path value to absolute path without symlinks.
+ It helps with debugging and identifies malformed paths.
+
++ Code SHOULD use such normalized paths for sourcing.
+
++ When exiting on a known error, code MUST print a longer, helpful message,
+ in order for the user to fix their situation if possible.
+
++ When error happens at an unexpected place, it is RECOMMENDED for the message
+ to be short and generic, instead of speculative.
+
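+A minimal sketch combining the rules above. Here "die" is the
+error-reporting function mentioned below, and "${target_dir}"
+is an illustrative variable:
+
+```
+set -exuo pipefail
+
+# Quoted expansions, unquoted glob, exit code checked.
+cp "logs"/*."log" "${target_dir}"/ || die "Copy failed."
+
+# An array variable instead of relying on word splitting.
+options=("--upgrade" "--no-cache-dir")
+pip install "${options[@]}" "virtualenv" || die "Pip failed."
+
+# Command substitution result stored and checked, path normalized.
+abs_path="$(readlink -e "./logs")" || die "Readlink failed."
+```
+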
+# Bash Options
+
++ Code MUST apply "-x" to make debugging easier.
+
+  + Code MAY temporarily suppress such output in order to avoid spam
+ (e.g. in long busy loops), but it is still NOT RECOMMENDED to do so.
+
++ Code MUST apply "-e" for early error detection.
+
+ + But code still SHOULD use "|| die" for most commands,
+ as "-e" has numerous rules and exceptions.
+
+  + Code MAY apply "+e" temporarily for commands whose (possibly nonzero)
+    exit code it is interested in.
+
+    + Code MUST store "$?" and call "set -e" immediately afterwards.
+
+ + Code MUST NOT use this approach when calling functions.
+
+ + That is because functions are instructed to apply "set -e" on their own
+ which (when triggered) will exit the whole entry script.
+
+ + Unless overriden by ERR trap.
+ But code SHOULD NOT set any ERR trap.
+
+  + If code needs the exit code of a function, it is RECOMMENDED to use
+    the pattern 'code="0"; called_function || code="${?}"'
+    (see the example at the end of this section).
+
+ + In this case, contributor MUST make sure nothing in the
+ called_function sub-graph relies on "set -e" behavior,
+ because the call being part of "or construct" disables it.
+
+ + Code MAY append "|| true" for benign commands,
+ when it is clear non-zero exit codes make no difference.
+
+ + Also in this case, the contributor MUST make sure nothing within
+ the called sub-graph depends on "set -e", as it is disabled.
+
++ Code MUST apply "-u", as an unset variable is generally a typo,
+  thus an error.
+
+ + Code MAY temporarily apply "+u" if a command needs that to pass.
+
+ + Virtualenv activation is the only known example so far.
+
++ Code MUST apply "-o pipefail" to make sure "-e" picks errors
+ inside piped construct.
+
+  + Code MAY use "|| true" inside a pipe construct, in the (improbable) case
+ when non-zero exit code still results in a meaningful pipe output.
+
++ All together: "set -exuo pipefail".
+
+ + Code MUST put that line near start of every file, so we are sure
+ the options are applied no matter what.
+
+ + "Near start" means "before any nontrivial code".
+
+ + Basically only copyright is RECOMMENDED to appear before.
+
+ + Also code MUST put the line near start of function bodies
+ and subshell invocations.
+
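+A short illustration of the rules above; "called_function" is
+a hypothetical placeholder, not a real CSIT function:
+
+```
+set -exuo pipefail
+
+# Temporarily allow a nonzero exit code and store it.
+set +e
+grep "pattern" "input.txt"
+code="${?}"
+set -e
+
+# For functions, use the "or construct" instead of "set +e".
+code="0"
+called_function || code="${?}"
+```
+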
+# Functions
+
+There are (at least) two possibilities how a code from an external file
+can be executed. Either the file contains a code block to execute
+on each "source" invocation, or the file just defines functions
+which have to be called separately.
+
+This document considers the "function way" to be better,
+here are some pros and cons:
+
++ Cons:
+
+ + The function way takes more space. Files have more lines,
+ and the code in function body is one indent deeper.
+
+ + It is not easy to create functions for low-level argument manipulation,
+ as "shift" command in the function code does not affect the caller context.
+
+ + Call sites frequently refer to code two times,
+ when sourcing the definition and when executing the function.
+
+  + It is not clear when a library can rely on a related file
+    having performed the sourcing already.
+
+ + Ideally, each library should detect if it has been sourced already
+ and return early, which takes even more space.
+
++ Pros:
+
+  + Some code blocks are more useful when used as a function,
+    to make call sites shorter.
+
+ + Examples: Trap functions, "die" function.
+
+ + The "import" part and "function" part usually have different side effects,
+ making the documentation more focused (even if longer overall).
+
+ + There is zero risk of argument-less invocation picking arguments
+ from parent context.
+
+    + This safety feature is the main reason for choosing the "function way"
+      (a sketch follows at the end of this section).
+
+ + This allows code blocks to support optional arguments.
+
++ Rules:
+
+ + Library files MUST be only "source"d. For example if "tox" calls a script,
+ it is an entry script.
+
+  + Library files (upon sourcing) MUST minimize side effects.
+
+    + The only permitted side effects MUST be directly related to:
+
+ + Defining functions (without executing them).
+
+ + Sourcing sub-library files.
+
+  + If a bash script indirectly calls another bash script,
+    it is not a "source" operation; variables are not shared,
+ so the called script MUST be considered an entry script,
+ even if it implements logic fitting into a single function.
+
+ + Entry scripts SHOULD avoid duplicating any logic.
+
+ + Clear duplicated blocks MUST be moved into libraries as functions.
+
+ + Blocks with low amount of duplication MAY remain in entry scripts.
+
+ + Usual motives for not creating functions are:
+
+ + The extracted function would have too much logic for processing
+ arguments (instead of hardcoding values as in entry script).
+
+ + The arguments needed would be too verbose.
+
+ + And using "set +x" would take too much vertical space
+ (when compared to entry script implementation).
+
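+A minimal sketch of the "function way"; the file names and the
+function here are hypothetical:
+
+```
+# common.sh (library): upon sourcing, only defines functions.
+function do_work () {
+    set -exuo pipefail
+    echo "working on ${1}"
+}
+
+# entry.sh (entry script): sources the library, then calls.
+source "$(dirname "${BASH_SOURCE[0]}")/common.sh" || exit 1
+do_work "example_argument"
+```
+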
+# Variables
+
+This document describes two kinds of variables: called "local" and "global".
+
++ Local variables:
+
+ + Variable name MUST contain only lower case letters, digits and underscores.
+
+ + Code MUST NOT export local variables.
+
+ + Code MUST NOT rely on local variables set in different contexts.
+
+ + Documentation is NOT REQUIRED.
+
+ + Variable name SHOULD be descriptive enough.
+
+ + Local variable MUST be initialized before first use.
+
+ + Code SHOULD have a comment if a reader might have missed
+ the initialization.
+
+ + Unset local variables when leaving the function.
+
+ + Explicitly typeset by "local" builtin command.
+
+ + Require strict naming convention, e.g. function_name__variable_name.
+
++ Global variables:
+
+ + Variable name MUST contain only upper case letters, digits and underscores.
+
+ + They SHOULD NOT be exported, unless external commands need them
+ (e.g. PYTHONPATH).
+
+ + Code MUST document if a function (or its inner call)
+ reads a global variable.
+
+ + Code MUST document if a function (or its inner call)
+ sets or rewrites a global variable.
+
+ + If a function "wants to return a value", it SHOULD be implemented
+ as the function setting (or rewriting) a global variable,
+ and the call sites reading that variable.
+
+  + If a function "wants to accept an argument", it is RECOMMENDED
+    to be implemented as the call sites setting or rewriting global variables,
+    and the function reading those variables.
+    But see below for direct arguments, and the sketch ending this section.
+
++ Code MUST use curly brackets when referencing variables,
+ e.g. "${my_variable}".
+
+ + It makes related constructs (such as ${name:-default}) less surprising.
+
+ + It looks more similar to Robot Framework variables (which is good).
+
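+A sketch of the naming and "global variable as return value"
+conventions; the names are illustrative and "die" is assumed
+as described elsewhere in this document:
+
+```
+function count_lines () {
+    # Variables read:
+    # - INPUT_FILE - Path to an existing text file.
+    # Variables set:
+    # - LINE_COUNT - Number of lines in the file.
+    set -exuo pipefail
+    local computed
+    computed="$(wc -l < "${INPUT_FILE}")" || die "Count failed."
+    LINE_COUNT="${computed}"
+}
+```
+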
+# Arguments
+
+Bash scripts and functions MAY accept arguments named "${1}", "${2}" and so on,
+available as a whole via "$@".
+You MAY use the "shift" command to consume an argument.
+A short sketch of common idioms ends this section.
+
+## Contexts
+
+Functions never have access to parent arguments, but they can read and write
+variables set or read by parent contexts.
+
+### Arguments Or Variables
+
++ Both arguments and global variables MAY act as an input.
+
++ In general, if the caller is likely to supply the value already placed
+ in a global variable of known name, it is RECOMMENDED
+ to use that global variable.
+
++ Construct "${NAME:-value}" can be used equally well for arguments,
+ so default values are possible for both input methods.
+
++ Arguments are positional, so there are restrictions on which input
+ is optional.
+
++ Functions SHOULD either look at arguments (possibly also
+ reading global variables to use as defaults), or look at variables only.
+
++ Code MUST NOT rely on "${0}", it SHOULD use "${BASH_SOURCE[0]}" instead
+ (and apply "readlink -e") to get the current block location.
+
++ For entry scripts, it is RECOMMENDED to use standard parsing capabilities.
+
+ + For most Linux distros, "getopt" is RECOMMENDED.
+
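+A brief sketch of the idioms above; "die" is assumed as described
+earlier:
+
+```
+# Current block location without relying on "${0}".
+here="$(readlink -e "$(dirname "${BASH_SOURCE[0]}")")" || die
+
+# Positional arguments, with a default for the optional one.
+input_file="$(readlink -e "${1}")" || die "Bad input path."
+output_dir="${2:-/tmp}"
+```
+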
+# Working Directory Handling
+
++ Functions SHOULD act correctly while neither assuming
+  what the current working directory is, nor changing it.
+
+ + That is why global variables and arguments SHOULD contain
+ (normalized) full paths.
+
+ + Motivation: Different call sites MAY rely on different working directories.
+
++ A function MAY return (also with a nonzero exit code) with the working
+  directory changed.
+
+ + In this case the function documentation MUST clearly state where (and when)
+ is the working directory changed.
+
+ + Exception: Functions with undocumented exit code.
+
+ + Those functions MUST return nonzero code only on "set -e" or "die".
+
+ + Note that both "set -e" and "die" by default result in exit of the whole
+ entry script, but the caller MAY have altered that behavior
+ (by registering ERR trap, or redefining die function).
+
+ + Any callers which use "set +e" or "|| true" MUST make sure
+    their (and their caller ancestors') assumptions on the working directory
+ are not affected.
+
+    + Such callers SHOULD do that by restoring the original working directory
+      either in their own code,
+
+    + or contributors SHOULD do such restoration in the function code
+      (see below) if that is more convenient.
+
+ + Motivation: Callers MAY rely on this side effect to simplify their logic.
+
++ A function MAY assume a particular directory is already set
+ as the working directory (to save space).
+
+ + In this case function documentation MUST clearly state what the assumed
+ working directory is.
+
+ + Motivation: Callers MAY call several functions with common
+ directory of interest.
+
+    + Example: Several download actions to execute in sequence,
+ implemented as functions assuming ${DOWNLOAD_DIR}
+ is the working directory.
+
++ A function MAY change the working directory transiently,
+ before restoring it back before return.
+
+ + Such functions SHOULD use command "pushd" to change the working directory.
+
+  + Such functions SHOULD use "trap 'trap - RETURN; popd' RETURN"
+    immediately after the pushd (see the sketch at the end of this section).
+
+ + In that case, the "trap - RETURN" part MUST be included,
+ to restore any trap set by ancestor.
+
+  + Functions MAY call "trap - RETURN; popd" explicitly.
+
+ + Such functions MUST NOT call another pushd (before an explicit popd),
+ as traps do not stack within a function.
+
++ If entry scripts also use traps to restore working directory (or other state),
+ they SHOULD use EXIT traps instead.
+
+ + That is because "exit" command, as well as the default behavior
+ of "die" or "set -e" cause direct exit (without skipping function returns).
+
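+A sketch of the transient directory change described above;
+"${DOWNLOAD_DIR}" and "die" are assumed as in earlier examples:
+
+```
+function fetch_artifact () {
+    # Transiently works in ${DOWNLOAD_DIR}, restored on return.
+    set -exuo pipefail
+    pushd "${DOWNLOAD_DIR}" || die "Pushd failed."
+    trap 'trap - RETURN; popd' RETURN
+    touch "artifact.txt" || die "Touch failed."
+}
+```
+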
+# Function Size
+
++ In general, code SHOULD follow reasoning similar to how pylint
+ limits code complexity.
+
++ It is RECOMMENDED to have functions somewhat simpler than Python functions,
+ as Bash is generally more verbose and less readable.
+
++ If code contains comments in order to partition a block
+ into sub-blocks, the sub-blocks SHOULD be moved into separate functions.
+
+ + Unless the sub-blocks are essentially one-liners,
+ not readable just because external commands do not have
+ obvious enough parameters. Use common sense.
+
+# Documentation
+
++ The library path and filename are visible from source sites. They SHOULD be
+  descriptive enough, so readers do not need to look inside to determine
+  how and why the sourced file is used.
+
+ + If code would use several functions with similar names,
+ it is RECOMMENDED to create a (well-named) sub-library for them.
+
+ + Code MAY create deep library trees if needed, it SHOULD store
+ common path prefixes into global variables to make sourcing easier.
+
+  + Contributors, look at other files in the subdirectory. You SHOULD
+    improve their filenames when adding or removing files.
+
+ + Library files SHOULD NOT have executable flag set.
+
+ + Library files SHOULD have an extension .sh (or perhaps .bash).
+
+ + It is RECOMMENDED for entry scripts to also have executable flag unset
+ and have .sh extension.
+
++ Each entry script MUST start with a shebang.
+
+  + "#!/usr/bin/env bash" is RECOMMENDED.
+
+ + Code SHOULD put an empty line after shebang.
+
+ + Library files SHOULD NOT contain a shebang, as "source" is the primary
+ method to include them.
+
++ Following that, there SHOULD be a block of comment lines with copyright.
+
+ + It is a boilerplate, but human eyes are good at ignoring it.
+
+ + Overhead for git is also negligible.
+
++ Following that, there MUST be "set -exuo pipefail".
+
+ + It acts as an anchor for humans to start paying attention.
+
+Then it depends on script type.
+
+## Library Documentation
+
++ Following "set -exuo pipefail" SHALL come the "import part" documentation.
+
++ Then SHALL be the import code
+ ("source" commands and a bare minimum they need).
+
++ Then SHALL be the function definitions, and inside:
+
+  + The body SHALL start with the function documentation explaining
+    the API contract, similar to Robot [Documentation]
+    or a Python function-level docstring.
+
+ + See below.
+
+ + "set -exuo pipefail" SHALL be the first executable line
+    in the function body, except in functions which legitimately need
+ different flags. Those SHALL also start with appropriate "set" command(s).
+
+ + Lines containing code itself SHALL follow.
+
+ + "Code itself" SHALL include comment lines
+ explaining any non-obvious logic.
+
+ + There SHALL be two empty lines between function definitions.
+
+More details on function documentation (a sketch ends this subsection):
+
+Generally, code SHOULD use comments to explain anything
+not obvious from the function name.
+
++ Function documentation SHOULD start with short description of function
+ operation or motivation, but only if not obvious from function name.
+
++ Documentation SHOULD continue with listing any non-obvious side effect:
+
+ + Documentation MUST list all read global variables.
+
+ + Documentation SHOULD include descriptions of semantics
+ of global variable values.
+ It is RECOMMENDED to mention which function is supposed to set them.
+
+ + The "include descriptions" part SHOULD apply to other items as well.
+
+ + Documentation MUST list all global variables set, unset, reset,
+ or otherwise updated.
+
+ + It is RECOMMENDED to list all hardcoded values used in code.
+
+ + Not critical, but can hint at future improvements.
+
+ + Documentation MUST list all files or directories read
+ (so caller can make sure their content is ready).
+
+ + Documentation MUST list all files or directories updated
+ (created, deleted, emptied, otherwise edited).
+
+ + Documentation SHOULD list all functions called (so reader can look them up).
+
+ + Documentation SHOULD mention where are the functions defined,
+ if not in the current file.
+
+ + Documentation SHOULD list all external commands executed.
+
+ + Because their behavior can change "out of bounds", meaning
+      the contributor changing the implementation of the external command
+      can be unaware of this particular function being interested
+      in its side effects.
+
+ + Documentation SHOULD explain exit code (coming from
+ the last executed command).
+
+ + Usually, most functions SHOULD be "pass or die",
+ but some callers MAY be interested in nonzero exit codes
+ without using global variables to store them.
+
+ + Remember, "exit 1" ends not only the function, but all scripts
+ in the source chain, so code MUST NOT use it for other purposes.
+
+ + Code SHOULD call "die" function instead. This way the caller can
+ redefine that function, if there is a good reason for not exiting
+ on function failure.
+
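+A sketch of a function documented per the rules above; the function
+and its body are illustrative, not an exact CSIT implementation:
+
+```
+function activate_virtualenv () {
+    # Create and activate a virtualenv in the given directory.
+    #
+    # Arguments:
+    # - ${1} - Path to an existing directory for the environment.
+    # Variables read:
+    # - CSIT_DIR - Path to CSIT checkout, for requirements.txt.
+    # Directories updated:
+    # - ${1}/env - Created (or overwritten) virtualenv directory.
+    # Functions called:
+    # - die - Print to stderr and exit.
+
+    set -exuo pipefail
+
+    local env_dir
+    env_dir="$(readlink -e "${1}")/env" || die "Readlink failed."
+    virtualenv "${env_dir}" || die "Virtualenv creation failed."
+    set +u
+    source "${env_dir}/bin/activate" || die "Activation failed."
+    set -u
+    pip install -r "${CSIT_DIR}/requirements.txt" || die "Pip failed."
+}
+```
+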
+## Entry Script Documentation
+
++ After "set -exuo pipefail", a high-level description SHALL come
+  (a skeleton example ends this subsection).
+
+ + Entry scripts are rarely reused, so detailed side effects
+ are OPTIONAL to document.
+
+ + But code SHOULD document the primary side effects.
+
++ Then SHALL come a few commented lines to import the library
+  with the "die" function.
+
++ Then SHALL come a block of "source" commands for the other needed libraries.
+
+ + In alphabetical order, any "special" library SHOULD be
+ in the previous block (for "die").
+
++ Then SHOULD come a block of commands processing arguments (if needed).
+
++ Then SHALL come a block of function calls (with parameters as needed).
+
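+A skeleton following the structure above; the paths and function
+names are hypothetical:
+
+```
+#!/usr/bin/env bash
+
+# Copyright and license boilerplate goes here.
+
+set -exuo pipefail
+
+# This entry script builds an example component.
+# Primary side effect: artifacts created under the build directory.
+
+# The library defining "die" has to be sourced first.
+here="$(dirname "${BASH_SOURCE[0]}")" || exit 1
+source "${here}/../function/common.sh" || exit 1
+
+# Other needed libraries, alphabetically.
+source "${here}/../function/build.sh" || die "Source failed."
+
+# Argument processing, then function calls.
+build_component "${1}" || die "Build failed."
+```
+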
+# Other General Recommendations
+
++ Code SHOULD NOT repeat itself, even in documentation:
+
+ + For hardcoded values, a general description SHOULD be written
+ (instead of copying the value), so when someone edits the value
+ in the code, the description still applies.
+
+ + If affected directory name is taken from a global variable,
+ documentation MAY distribute the directory description
+ over the two items.
+
+ + If most of side effects come from an inner call,
+ documentation MAY point the reader to the documentation
+ of the called function (instead of listing all the side effects).
+
++ But documentation SHOULD repeat it if the information crosses functions.
+
+ + Item description MUST NOT be skipped just because the reader
+ should have read parent/child documentation already.
+
+ + Frequently it is RECOMMENDED to copy&paste item descriptions
+ between functions.
+
+ + But sometimes it is RECOMMENDED to vary the descriptions. For example:
+
+    + A global variable setter MAY document how it figures out the value
+ (without caring about what it will be used for by other functions).
+
+    + A global variable reader MAY document how it uses the value
+      (without caring about how it has been figured out by the setter).
+
++ When possible, Bash code SHOULD be made to look like Python
+  (or Robot Framework), the three primary languages CSIT code relies on,
+ so it is nicer for the readers to see similar expressions when possible.
+ Examples:
+
+ + Code MUST use indentation, 1 level is 4 spaces.
+
+ + Code SHOULD use "if" instead of "&&" constructs.
+
+ + For comparisons, code SHOULD use operators such as "!=" (needs "[[").
+
++ Code MUST NOT use more than 80 characters per line.
+
+ + If long external command invocations are needed,
+ code SHOULD use array variables to shorten them.
+
+ + If long strings (or arrays) are needed, code SHOULD use "+=" operator
+ to grow the value over multiple lines.
+
+ + If "|| die" does not fit with the command, code SHOULD use curly braces:
+
+ + Current line has "|| {",
+
+ + Next line has the die commands (indented one level deeper),
+
+    + Final line closes with "}" at the original indent level.
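+
+A sketch of the long-line techniques above, reusing the pip example
+from the Safety section:
+
+```
+# Growing a long string over multiple lines.
+message="This message would not fit the line limit"
+message+=" if it were written as a single literal."
+
+# Array variable shortening a long invocation, with curly-brace die.
+pip_cmd=("pip" "install" "--upgrade" "--no-cache-dir" "virtualenv")
+"${pip_cmd[@]}" || {
+    die "Pip invocation failed: ${message}"
+}
+```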
diff --git a/docs/toi/branches.md b/docs/toi/branches.md
index d52ac05d79..20759b9c78 100644
--- a/docs/toi/branches.md
+++ b/docs/toi/branches.md
@@ -1,10 +1,9 @@
-# Git Branches in CSIT
-
-#### Content
+---
+bookHidden: true
+title: "Git Branches in CSIT"
+---
-- [Overview](#overview)
-- [Operational Branches](#operational-branches)
-- [Release Branches](#release-branches)
+# Git Branches in CSIT
## Overview
@@ -122,6 +121,8 @@ patches as an example.
See the patch
[Update of VPP_REPO_URL and VPP_STABLE_VER files](https://gerrit.fd.io/r/c/csit/+/30461)
+and / or
+[rls2101: Update VPP_STABLE_VER files to release version](https://gerrit.fd.io/r/c/csit/+/30976)
as an example.
1. Find the last successful build on the
diff --git a/docs/toi/test_code_guidelines.md b/docs/toi/test_code_guidelines.md
new file mode 100644
index 0000000000..9707d63ea6
--- /dev/null
+++ b/docs/toi/test_code_guidelines.md
@@ -0,0 +1,294 @@
+---
+bookHidden: true
+title: "CSIT Test Code Guidelines"
+---
+
+# CSIT Test Code Guidelines
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
+"MAY", and "OPTIONAL" in this document are to be interpreted as
+described in [BCP 14](https://tools.ietf.org/html/bcp14),
+[RFC2119](https://tools.ietf.org/html/rfc2119),
+[RFC8174](https://tools.ietf.org/html/rfc8174)
+when, and only when, they appear in all capitals, as shown here.
+
+This document SHALL describe guidelines for writing reliable, maintainable,
+reusable and readable code for CSIT.
+
+# Robot Framework test case files and resource files
+
++ General
+
+ + Contributors SHOULD look at requirements.txt in root CSIT directory
+ for the currently used Robot Framework version.
+ Contributors SHOULD read
+ [Robot Framework User Guide](http://robotframework.org/robotframework/latest/RobotFrameworkUserGuide.html)
+ for more details.
+
+  + Robot Framework test case files and resource files
+    SHALL use the special extension .robot.
+
+ + Pipe and space separated file format (without trailing pipe
+ and without pipe aligning) SHALL be used.
+ Tabs are invisible characters, which are error prone.
+    4-space separation is prone to an accidental double space
+ acting as a separator.
+
+ + Files SHALL be encoded in UTF-8 (the default Robot source file encoding).
+ Usage of non-ASCII characters SHOULD be avoided if possible.
+ It is RECOMMENDED to
+ [escape](http://robotframework.org/robotframework/latest/RobotFrameworkUserGuide.html#escaping)
+ non-ASCII characters.
+
+ + Line length SHALL be limited to 80 characters.
+
+  + There SHALL be license text present at the beginning of each file.
+
+  + Copy-pasting of code is a NOT RECOMMENDED practice; any code that could be
+    re-used SHOULD be put into a library (Robot resource, Python library, ...).
+
++ Test cases
+
+ + It is RECOMMENDED to use data-driven test case definitions
+    any time a suite contains test cases similar in structure.
+ Typically, a suite SHOULD define a Template keyword, and test cases
+    SHOULD only specify tags and argument values:
+
+ *** Settings ***
+ | Test Template | Local Template
+ ...
+
+ *** Test Cases ***
+ | tc01-64B-1c-eth-l2patch-mrr
+ | | [Tags] | 64B | 1C
+ | | framesize=${64} | phy_cores=${1}
+
+  + Test case templates (or test cases) SHALL be written in behavior-driven
+    style, i.e. in readable English, so that even non-technical project
+    stakeholders can understand them:
+
+ *** Keywords ***
+ | Local Template
+ | | [Documentation]
+ | | ... | [Cfg] DUT runs L2 patch config with ${phy_cores} phy core(s).
+ | | ... | [Ver] Measure NDR and PDR values using MLRsearch algorithm.\
+ | | ...
+ | | ... | *Arguments:*
+ | | ... | - frame_size - Framesize in Bytes in integer
+ | | ... | or string (IMIX_v4_1). Type: integer, string
+ | | ... | - phy_cores - Number of physical cores. Type: integer
+ | | ... | - rxq - Number of RX queues, default value: ${None}.
+ | | ... | Type: integer
+ | | ...
+ | | [Arguments] | ${frame_size} | ${phy_cores} | ${rxq}=${None}
+ | | ...
+ | | Set Test Variable | \${frame_size}
+ | | ...
+ | | Given Add worker threads and rxqueues to all DUTs
+ | | ... | ${phy_cores} | ${rxq}
+ | | And Add PCI devices to all DUTs
+ | | Set Max Rate And Jumbo And Handle Multi Seg
+ | | And Apply startup configuration on all VPP DUTs
+ | | When Initialize L2 patch
+ | | Then Find NDR and PDR intervals using optimized search
+
+ + Every suite and test case template (or testcase)
+ SHALL contain short documentation.
+ Generated CSIT web pages display the documentation.
+
+ + You SHOULD NOT use hard-coded constants.
+ It is RECOMMENDED to use the variable table
+ (\*\*\*Variables\*\*\*) to define test case specific values.
+ You SHALL use the assignment sign = after the variable name
+ to make assigning variables slightly more explicit
+
+ *** Variables ***
+ | ${traffic_profile}= | trex-stl-2n-ethip4-ip4src254
+
+ + Common test case specific settings of the test environment SHALL be done
+ in Test Setup keyword defined in the Setting table.
+
+ + Run Keywords construction is RECOMMENDED if it is more readable
+ than a keyword.
+
+ + Separate keyword is RECOMMENDED if the construction is less readable.
+
+ + Post-test cleaning and processing actions SHALL be done in Test Teardown
+ part of the Setting table (e.g. download statistics from VPP nodes).
+ This part is executed even if the test case has failed. On the other hand
+ it is possible to disable the tear-down from command line, thus leaving
+ the system in “broken” state for investigation.
+
+ + Every testcase SHALL be correctly tagged. List of defined tags is in
+ csit/docs/introduction/test_tag_documentation.rst
+
+ + Whenever possible, common tags SHALL be set using Force Tags
+ in Settings table.
+
+  + User high-level keywords specific to the particular test suite
+ SHOULD be implemented in the Keywords table of suitable Robot resource file
+ to enable readability and code-reuse.
+
+ + Such keywords MAY be implemented in Keywords table of the suite instead,
+ if the contributor believes no other test will use such keywords.
+ But this is NOT RECOMMENDED in general, as keywords in Resources
+ are easier to maintain.
+
+  + All test case names (and suite names) SHALL conform
+    to the current naming convention:
+ https://wiki.fd.io/view/CSIT/csit-test-naming
+
+ + Frequently, different suites use the same test case layout.
+    It is RECOMMENDED to use the available autogeneration scripts,
+ possibly extending them if their current functionality is not sufficient.
+
++ Resource files
+
+ + SHALL be used to implement higher-level keywords that are used in test cases
+ or other higher-level (or medium-level) keywords.
+
+ + Every keyword SHALL contain Documentation where the purpose and arguments
+ of the keyword are described. Also document types, return values,
+ and any specific assumptions the particular keyword relies on.
+
+  + A keyword usage example SHALL be part of the Documentation.
+    The example SHALL use the pipe and space separated format
+    (with escaped pipes) and with a trailing pipe.
+
+    + The reason is possible usage of Robot's libdoc tool
+      to generate tests and resources documentation. In that case,
+      example keyword usage would be rendered in a table.
+
+ + Keyword name SHALL describe what the keyword does,
+ specifically and in a reasonable length (“short sentence”).
+
+ + Keyword names SHALL be short enough for call sites
+ to fit within line length limit.
+
+  + If a keyword argument has a most commonly used value, it is RECOMMENDED
+    to set it as the default. This makes keyword code longer,
+    but suite code shorter, and readability (and maintainability)
+    of suites SHALL always be more important.
+
+ + If there is intermediate data (created by one keyword, to be used
+ by another keyword) of singleton semantics (it is clear that the test case
+ can have at most one instance of such data, even if the instance
+ is complex, for example ${nodes}), it is RECOMMENDED to store it
+ in test variables. You SHALL document test variables read or written
+ by a keyword. This makes the test template code less verbose.
+ As soon as the data instance is not unique, you SHALL pass it around
+ via arguments and return values explicitly (this makes lower level keywords
+ more reusable and less bug prone).
+
+ + It is RECOMMENDED to pass arguments explicitly via [Arguments] line.
+ Setting test variables takes more space and is less explicit.
+ Using arguments embedded in keyword name makes them less visible,
+ and it makes it harder for the line containing the resulting long name
+ to fit into the maximum character limit, so you SHOULD NOT use them.
+
+# Python library files
+
++ General
+
+ + SHALL be used to implement low-level keywords that are called from
+ resource files (of higher-level keywords) or from test cases.
+
+  + Higher-level keywords MAY be implemented in a Python library file too.
+    It is RECOMMENDED especially in cases where their implementation
+    in a resource file would be too difficult or impossible,
+    e.g. complex data structures or functional programming.
+
+ + Every keyword, Python module, class, method, enum SHALL contain
+ docstring with the short description and used input parameters
+ and possible return value(s) or raised exceptions.
+
+ + The docstrings SHOULD conform to
+ [PEP 257](https://www.python.org/dev/peps/pep-0257/)
+ and other quality standards.
+
+ + CSIT contributions SHALL use a specific formatting for documenting
+ arguments, return values and similar.
+
+ + Keyword usage examples MAY be grouped and used
+ in the class/module documentation string, to provide better overview
+ of the usage and relationships between keywords.
+
+ + Keyword name SHALL describe what the keyword does,
+ specifically and in a reasonable length (“short sentence”).
+ See https://wiki.fd.io/view/CSIT/csit-test-naming
+
+  + The Python implementation of a keyword is a function,
+    so its name in the Python library should be lowercase_with_underscores.
+    Robot call sites should use the name with words capitalized
+    and spaces instead of underscores.
+
++ Coding
+
+ + It is RECOMMENDED to use some standard development tool
+ (e.g. PyCharm Community Edition) and follow
+ [PEP-8](https://www.python.org/dev/peps/pep-0008/) recommendations.
+
+ + All python code (not only Robot libraries) SHALL adhere to PEP-8 standard.
+ This is reported by CSIT Jenkins verify job.
+
+ + Indentation: You SHALL NOT use tab for indents!
+ Indent is defined as four spaces.
+
+ + Line length: SHALL be limited to 80 characters.
+
+ + CSIT Python code assumes PYTHONPATH is set
+ to the root of cloned CSIT git repository, creating a tree of sub-packages.
+ You SHALL use that tree for importing, for example
+
+ from resources.libraries.python.ssh import exec_cmd_no_error
+
+ + Imports SHALL be grouped in the following order:
+
+ 1. standard library imports,
+ 2. related third party imports,
+ 3. local application/library specific imports.
+
+ You SHALL put a blank line between each group of imports.
+
+ + You SHALL use two blank lines between top-level definitions,
+ one blank line between method definitions.
+
+ + You SHALL NOT execute any active code on library import.
+
+ + You SHALL NOT use global variables inside library files.
+
+ + You MAY define constants inside library files.
+
+ + It is NOT RECOMMENDED to use hard-coded constants (e.g. numbers,
+ paths without any description). It is RECOMMENDED to use
+ configuration file(s), like /csit/resources/libraries/python/Constants.py,
+ with appropriate comments.
+
+ + The code SHALL log at the lowest possible level of implementation,
+ for debugging purposes. You SHALL use same style for similar events.
+ You SHALL keep logging as verbose as necessary.
+
+  + You SHALL use the most appropriate exception, not the general one
+    (Exception), if possible. You SHOULD create your own exception
+    if necessary, and implement logging there, at debug level.
+
+  + You MAY use RuntimeError for generally unexpected failures.
+
+ + It is RECOMMENDED to use RuntimeError also for
+ infrastructure failures, e.g. losing SSH connection to SUT.
+
+    + You MAY use EnvironmentError and its subclasses instead,
+ if the distinction is informative for callers.
+
+ + It is RECOMMENDED to use AssertionError when SUT is at fault.
+
+ + For each class (e.g. exception) it is RECOMMENDED to implement __repr__()
+ which SHALL return a string usable as a constructor call
+ (including repr()ed arguments).
+ When logging, you SHOULD log the repr form, unless the internal structure
+ of the object in question would likely result in too long output.
+ This is helpful for debugging.
+
+ + For composing and formatting strings, you SHOULD use .format()
+ with named arguments.
+ Example: "repr() of name: {name!r}".format(name=name)
diff --git a/docs/toi/vagrant.md b/docs/toi/vagrant.md
new file mode 100644
index 0000000000..21890f86eb
--- /dev/null
+++ b/docs/toi/vagrant.md
@@ -0,0 +1,326 @@
+# FD.io CSIT Development Environment
+
+The intent of this document is to give you a quick start guide for setting up a CSIT development and testing environment inside a Vagrant VM.
+
+## Pulling CSIT code
+
+The first step is to pull the FD.io CSIT code. Even though the fastest way is to pull the code anonymously using https by typing the command below, the recommended way is to pull the code via ssh if you intend to develop and commit changes upstream.
+```
+git clone https://gerrit.fd.io/r/csit
+```
+To pull the code via ssh, you'll first need to set up a Linux Foundation (LF) account, as fd.io uses the Linux Foundation's identity system. If you do not have an LF account, proceed to [Linux_Foundations_Identity_Setup](https://identity.linuxfoundation.org) to set one up. Once you have set up your Linux Foundation username and password, you can use it for all fd.io logins.
+
+After you've set up your account, make sure you have registered your [ssh key with
+gerrit](https://wiki.fd.io/view/DEV/Setting_up_Gerrit). Then pull the code by typing the command below, replacing USERNAME with your Linux Foundation username.
+
+```
+git clone ssh://USERNAME@gerrit.fd.io:29418/csit.git
+```
+
+## Standing up Linux VM
+
+To set up your dev environment, you'll want to stand up a Linux VM. The CSIT repo provides a
+Vagrantfile to help you quickly set up an Ubuntu Jammy VM. This file is located in the csit.infra.vagrant folder.
+
+If you haven't already installed Vagrant, install it by following the instructions [here](https://developer.hashicorp.com/vagrant/docs/installation).
+
+Vagrant works well with the VirtualBox provider. We have only tested Vagrant with the VirtualBox provider for setting up a CSIT dev/test environment. Install the VirtualBox hypervisor on your
+host machine by following the instructions for [Installing VirtualBox](https://www.virtualbox.org/wiki/Downloads).
+
+If you have more than one hypervisor in use on the host machine, you'll most likely encounter an error when bringing up the VM. You must ensure that other hypervisors such as Hyper-V or KVM are disabled.
+
+### Ensure KVM and Hyper-V are disabled on the host
+
+If you have a Linux machine, ensure KVM is disabled:
+```
+lsmod | grep kvm
+```
+If you see kvm or kvm_intel in the output, you'll need to add a blacklist entry for it to the deny list:
+```
+echo 'blacklist kvm-intel' | sudo tee -a /etc/modprobe.d/blacklist.conf
+```
+
+If you have a Windows machine, ensure Hyper-V is disabled in system settings.
+
+ - Right click on the Windows button and select 'Apps and Features'.
+ - Select Turn Windows Features on or off.
+ - Unselect Hyper-V and click OK.
+
+ Reboot your host machine for the changes to take effect.
+
+### Starting the Vagrant VM
+
+The CSIT Vagrantfile: csit/csit.infra.vagrant/Vagrantfile is used to start up the Ubuntu
+Jammy VM with 8GB of RAM and 4 VCPUs. Vagrant boots up the VM and provisions software in it
+using ansible local. Ansible installation is not required on the host.
+
+The inventory path for ansible provisioning on the vagrant VM is located at:
+csit/fdio.infra.ansible/inventories/vagrant_inventory/hosts.
+
+The ansible playbook used for the vagrant host is located at:
+/home/vagrant/csit/fdio.infra.ansible/vagrant.yaml
+
+If your host OS is Linux, you may have to increase the maximum map count to a high value to
+ensure that the Linux Kernel allows the VirtualBox hypervisor to allocate the required memory
+maps. You can do this by typing the below command:
+```
+sudo sysctl -w vm.max_map_count=262144
+```
+
+If you're using a proxy, you'll need to export your proxy settings to facilitate software provisioning within the Vagrant VM.
+```
+export VAGRANT_APT_HTTP_PROXY=http://{Your_Proxy_URL}:{Proxy_Port}
+export VAGRANT_APT_HTTPS_PROXY=http://{Your_Proxy_URL}:{Proxy_Port}
+export VAGRANT_HTTPS_PROXY=http://{Your_Proxy_URL}:{Proxy_Port}
+export VAGRANT_HTTP_PROXY=http://{Your_Proxy_URL}:{Proxy_Port}
+```
+
+Ansible downloads stable VPP packages from Packagecloud. The VPP version used for testing
+can be set by updating the file: csit/VPP_STABLE_VER_UBUNTU_JAMMY.
+
+To bring up the Ubuntu Jammy VM with the VirtualBox provider and provision software, type the command:
+```
+vagrant up
+```
+
+If everything goes well, vagrant will boot up the VM, mount shared folders and provision all the required software for running CSIT tests. The csit repository on the host will be mounted at /home/vagrant/csit on the VM.
+
+### Running Device Tests
+
+After your VM is provisioned, start by running VPP device tests. To do this type the
+following commands:
+```
+vagrant ssh # login to the VM
+cd /home/vagrant/csit/resources/libraries/bash/entry
+./bootstrap_vpp_device.sh csit-vpp-device-master-ubuntu2004-1n-vbox
+```
+
+The script will pack and copy the test framework into the docker containers named csit-tg-* and csit-dut1-* via ssh. The copied tarball will be extracted in the docker container.
+Once the nodes are ready, you'll see device tests being executed in the docker container and the test results.
+
+### Your questions answered
+
+1) Where are the tests located and how are they written?
+
+    CSIT tests are written using an open source automation framework called [Robot Framework](https://robotframework.org/). The tests are present in the /tests folder. In fact, these tests are used as templates to generate new robot tests for testing various interface types. The new interface tests are generated at runtime and stored in the /generated/tests folder. For VM based testing of interfaces, you should see robot tests generated for 1GE interfaces in this folder.
+
+2) I am getting a robot error, [ ERROR ] Suite 'Tests' contains no tests matching tag '2 node single link topo', not matching tags 'avf', 'vhost', 'flow', 'NIC "HW 4xxx"', 'NIC "HW C4xxx"', 'NIC Amazon-Nitro-100G', 'NIC Amazon-Nitro-200G', 'NIC Amazon-Nitro-50G', 'NIC Intel-DSA', 'NIC Intel-E810CQ', 'NIC Intel-E810XXV', 'NIC Intel-E822CQ', 'NIC Intel-X520-DA2', 'NIC Intel-X553', 'NIC Intel-X710', 'NIC Intel-XL710', 'NIC Intel-XXV710', 'NIC Mellanox-CX556A', 'NIC Mellanox-CX6DX', 'NIC Mellanox-CX7VEAT' or 'NIC azure-mlx-40g' and matching name 'devicetest' in suite 'tests.vpp.device'. How do I resolve this?
+
+ This error means that the robot framework is missing the virtual interface specification in its configuration. To resolve this issue, update the file resources/libraries/python/Constants.py by adding a mapping for the NIC named virtual. For instance, create two mappings such as:
+ ```
+ "virtual": "1ge1p82540em"
+ "virtual": ["vfio-pci"]
+ ```
+ Add the appropriate mapping into,
+ - NIC_NAME_TO_CODE
+ - NIC_CODE_TO_SHORT_NAME
+ - NIC_NAME_TO_DRIVER
+ - NIC_DRIVER_TO_PLUGINS
+ - NIC_DRIVER_TO_TAG
+ - NIC_DRIVER_TO_SUITE_PREFIX
+ - NIC_DRIVER_TO_VFS
+ - DPDK_NIC_NAME_TO_DRIVER
+
+ After this, delete the /generated/tests folder. Test definitions come from job spec files; see /resources/job_specs/vpp_device. If a job spec is missing for vbox, create one by copying the existing vpp-1n-spr.md to vpp-1n-vbox.md and changing the NIC to virtual (see the sketch below).
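+
+ The steps above might look like this (assuming you are at the root of the csit repository):
+ ```
+ # remove previously generated tests so they are regenerated on the next run
+ rm -rf generated/tests
+ # create a vbox job spec from the existing SPR one
+ cp resources/job_specs/vpp_device/vpp-1n-spr.md \
+    resources/job_specs/vpp_device/vpp-1n-vbox.md
+ # then edit vpp-1n-vbox.md and replace the NIC name with "virtual"
+ ```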
+
+ Next, in the file resources/libraries/bash/function/common.sh, add the below line to create a substitution for the virtual NIC:
+ ```
+ awk_nics_sub_cmd+='gsub("virtual","1ge1p82540em");'
+ ```
+ Also, keep the "virtual" entry in vpp-1n-vbox.md (column 4).
+ Now re-run the tests; in the robot command line (visible in the log) you should start seeing --test <name> --test <name> arguments being passed.
+
+3) Where can I find test run logs?
+
+ Test run logs are present in the /archives folder. You should find a file named log.html in this folder.
+
+4) I am seeing a Docker image not found error when running tests.
+ How do I build the required docker images?
+
+ You should have two docker images inside the VM named:
+ - base-ubuntu2204:local
+ - csit_sut-ubuntu2204:local
+
+ If these images are missing, you can create them by typing the below commands:
+ ```
+ cd /opt/csit-docker-images/base
+ docker build -t base-ubuntu2204:local .
+
+ cd /opt/csit-docker-images/csit-sut
+ docker build -t csit_sut-ubuntu2204:local .
+ ```
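+
+ You can then confirm that both images are present:
+ ```
+ # list the locally built CSIT images
+ docker images | grep -E "base-ubuntu2204|csit_sut-ubuntu2204"
+ ```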
+
+5) VPP is failing to start inside the docker container. How do I fix this?
+
+ First start by looking at the log.html file, where you should find the startup configuration used to start VPP. For instance, your startup.conf file could look like the following:
+ ```
+ unix
+ {
+ log /var/log/vpp/vpp.log
+ cli-listen /run/vpp/cli.sock
+ cli-no-pager
+ gid vpp
+ full-coredump
+ }
+ socksvr
+ {
+ socket-name /run/vpp/api.sock
+ }
+ memory
+ {
+ main-heap-size 2G
+ main-heap-page-size 2M
+ default-hugepage-size 2M
+ }
+ statseg
+ {
+ size 2G
+ page-size 2M
+ per-node-counters on
+ }
+ plugins
+ {
+ plugin default
+ {
+ disable
+ }
+ plugin dpdk_plugin.so
+ {
+ enable
+ }
+ }
+ dpdk
+ {
+ dev 0000:00:10.0
+ dev 0000:00:11.0
+ }
+ ip6
+ {
+ hash-buckets 2000000
+ heap-size 4G
+ }
+ ```
+ One common reason for VPP not starting is that not enough hugepages were allocated for VPP inside the VM. Increase the number of hugepages to 2560 by typing the below command and try running the tests again.
+
+ ```
+ sudo sysctl -w vm.nr_hugepages=2560
+ ```
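+
+ You can verify the allocation afterwards:
+ ```
+ # show total, free and reserved hugepage counts
+ grep -i hugepages /proc/meminfo
+ ```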
+
+6) How do I check if the robot test cases for virtual interfaces have been successfully generated?
+
+ Check the /generated/tests folder for all the generated tests. If you're running VPP device tests, the generated tests will be in the sub-folder vpp/device. If you've named your virtual interface "1ge1p82540em", you will find robot test files with names matching 2n1l-1ge1p82540em-*.
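+
+ For example (assuming the interface name above and the root of the csit repository):
+ ```
+ # list the robot suites generated for the virtual 1GE interface
+ ls generated/tests/vpp/device | grep 2n1l-1ge1p82540em
+ ```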
+
+7) For debugging, how do I prevent the test environment from being torn down after a test run?
+
+ You can prevent the CSIT framework from cleaning up the test environment by setting the environment variable CSIT_NO_CLEANUP=1, for example:
+
+ ```
+ cd /home/vagrant/csit/resources/libraries/bash/entry
+ CSIT_NO_CLEANUP=1 ./bootstrap_vpp_device.sh csit-vpp-device-master-ubuntu2004-1n-vbox
+ ```
+
+ To reset the environment for regular test runs, reboot the VM by typing the command:
+
+ ```
+ vagrant reload
+ ```
+
+ This will terminate all docker containers and free up all PCI interfaces grabbed by DPDK.
+
+8) How do I ssh into the docker container for further troubleshooting?
+
+ First disable test environment cleanup by following the instructions above; this leaves the TG and DUT1 docker containers running. You can then ssh into the csit-dut1-* container for further troubleshooting, such as running VPP or robot tests by hand. To do so, find the port published by the DUT1 container with the commands below and ssh into the container as root. The default root password is Csit1234.
+ ```
+ docker ps # list all running containers and get the csit-dut1-* container ID
+ docker port ${DUT1_CONTAINER_ID} # get the published docker container port
+ ssh root@{HOST_IP_ADDRESS} -p {DOCKER_PORT} # ssh into the container
+ ```
+
+9) What's the CSIT test topology used for VM tests and where's the topology file?
+
+ CSIT generates a 2-node topology with a TG docker node connected to a DUT1 docker node.
+ The topology file is located at topologies/available/vpp_device.yaml.
+ For instance, here's a sample topology file generated by CSIT:
+ ```
+ metadata:
+ version: 0.1
+ schema:
+ - resources/topology_schemas/2_node_topology.sch.yaml
+ - resources/topology_schemas/topology.sch.yaml
+ tags: [dcr, 2-node]
+
+ nodes:
+ TG:
+ type: "TG"
+ host: "10.0.2.15"
+ arch: "x86_64"
+ port: 32768
+ username: "root"
+ interfaces:
+ port0:
+ mac_address: "08:00:27:0f:e0:4d"
+ pci_address: "0000:00:08.0"
+ link: "link0"
+ model: virtual
+ driver: "e1000"
+ vlan: 0
+ port1:
+ mac_address: "08:00:27:61:f7:ad"
+ pci_address: "0000:00:09.0"
+ link: "link1"
+ model: virtual
+ driver: "e1000"
+ vlan: 0
+
+ DUT1:
+ type: "DUT"
+ host: "10.0.2.15"
+ arch: "x86_64"
+ port: 32769
+ username: "root"
+ interfaces:
+ port0:
+ mac_address: "08:00:27:38:5e:58"
+ pci_address: "0000:00:10.0"
+ link: "link0"
+ model: virtual
+ driver: "e1000"
+ vlan: 0
+ port1:
+ mac_address: "08:00:27:e3:f5:42"
+ pci_address: "0000:00:11.0"
+ link: "link1"
+ model: virtual
+ driver: "e1000"
+ vlan: 0
+ ```
+
+10) How do I run an ansible task after the VM is provisioned?
+
+ You can run a specific ansible task after the VM has been provisioned using ansible tags.
+ For instance, to run the tasks that have been tagged docker_images, type the below ansible command:
+
+ ```
+ cd ~/csit/fdio.infra.ansible
+ ansible-playbook vagrant.yaml --tags "docker_images" -i inventories/vagrant_inventory/hosts
+ ```
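+
+ To see which tags are available in the playbook, you can list them first:
+ ```
+ # show all tags defined in the playbook without running any tasks
+ ansible-playbook vagrant.yaml --list-tags -i inventories/vagrant_inventory/hosts
+ ```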
+
+11) Docker image build is failing due to a network error. Where do I set proxy settings for Docker?
+
+ You can set proxy settings for Docker in the file ~/.docker/config.json. Update this file with your environment's proxy info:
+ ```
+ {
+ "proxies":
+ {
+ "default":
+ {
+ "httpProxy": "http://{Proxy_IP_Address}:{Proxy_Port}",
+ "httpsProxy": "http://{Proxy_IP_Address}:{Proxy_Port}",
+ "noProxy": "localhost,127.0.0.1"
+ }
+ }
+ }
+ ```
+
+12) Where should I set the proxy vars for Ansible?
+
+ Set Ansible proxy variables in the file fdio.infra.ansible/roles/common/defaults/main.yaml. Uncomment the proxy_env: section and fill in the correct proxy values for your dev/test environment.
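+
+ After uncommenting, the section could look something like the sketch below; the placeholder address follows the same convention as the Docker example above, and the exact variable names may differ, so check the comments in the file itself:
+ ```
+ # illustrative proxy settings for the Ansible environment
+ proxy_env:
+   http_proxy: http://{Proxy_IP_Address}:{Proxy_Port}
+   https_proxy: http://{Proxy_IP_Address}:{Proxy_Port}
+   no_proxy: localhost,127.0.0.1
+ ```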