aboutsummaryrefslogtreecommitdiffstats
path: root/docs/toi/bash_code_style.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/toi/bash_code_style.md')
-rw-r--r--docs/toi/bash_code_style.md651
1 files changed, 651 insertions, 0 deletions
diff --git a/docs/toi/bash_code_style.md b/docs/toi/bash_code_style.md
new file mode 100644
index 0000000000..bbd0c37196
--- /dev/null
+++ b/docs/toi/bash_code_style.md
@@ -0,0 +1,651 @@
+---
+bookHidden: true
+title: "Bash Code Style"
+---
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
+"MAY", and "OPTIONAL" in this document are to be interpreted as
+described in [BCP 14](https://tools.ietf.org/html/bcp14),
+[RFC2119](https://tools.ietf.org/html/rfc2119),
+[RFC8174](https://tools.ietf.org/html/rfc8174)
+when, and only when, they appear in all capitals, as shown here.
+
+This document SHALL describe guidelines for writing reliable, maintainable,
+reusable and readable code for CSIT.
+
+# Proposed Style
+
+# File Types
+
+Bash files SHOULD NOT be monolithic. Generally, this document
+considers two types of bash files:
+
++ Entry script: Assumed to be called by user,
+ or a script "external" in some way.
+
+ + Sources bash libraries and calls functions defined there.
+
++ Library file: To be sourced by entry scipts, possibly also by other libraries.
+
+ + Sources other libraries for functions it needs.
+
+ + Or relies on a related file already having sourced that.
+
+ + Documentation SHALL imply which case it is.
+
+ + Defines multiple functions other scripts can call.
+
+# Safety
+
++ Variable expansions MUST be quoted, to prevent word splitting.
+
+ + This includes special "variables" such as "${1}".
+
+ + RECOMMENDED even if the value is safe, as in "$?" and "$#".
+
+ + It is RECOMMENDED to quote strings in general,
+ so text editors can syntax-highlight them.
+
+ + Even if the string is a numeric value.
+
+ + Commands and known options can get their own highlight, no need to quote.
+
+ + Example: You do not need to quote every word of
+ "pip install --upgrade virtualenv".
+
+ + Code SHALL NOT quote glob characters you need to expand (obviously).
+
+ + OPTIONALLY do not quote adjacent characters (such as dot or fore-slash),
+ so that syntax highlighting makes them stand out compared to surrounding
+ ordinary strings.
+
+ + Example: cp "logs"/*."log" "."/
+
+ + Command substitution on right hand side of assignment are safe
+ without quotes.
+
+ + Note that command substitution limits the scope for quotes,
+ so it is NOT REQUIRED to escape the quotes in deeper levels.
+
+ + Both backtics and "dollar round-bracket" provide command substitution.
+ The folowing rules are RECOMMENDED:
+
+ + For simple constructs, use "dollar round-bracket".
+
+ + If there are round brackets in the surrounding text, use backticks,
+ as some editor highlighting logic can get confused.
+
+ + Avoid nested command substitution.
+
+ + Put intermediate results into local variables,
+ use "|| die" on each step of command substitution.
+
+ + Code SHOULD NOT be structured in a way where
+ word splitting is intended.
+
+ + Example: Variable holding string of multiple command lines arguments.
+
+ + Solution: Array variable should be used in this case.
+
+ + Expansion MUST use quotes then: "${name[@]}".
+
+ + Word splitting MAY be used when creating arrays from command substitution.
+
++ Code MUST always check the exit code of commands.
+
+ + Traditionally, error code checking is done either by "set -e"
+ or by appending "|| die" after each command.
+ The first is unreliable, due to many rules affecting "set -e" behavior
+ (see <https://mywiki.wooledge.org/BashFAQ/105>), but "|| die"
+ relies on humans identifying each command, which is also unreliable.
+ When was the last time you checked error code of "echo" command,
+ for example?
+
+ + Another example: "set -e" in your function has no effect
+ if any ancestor call is done with logical or,
+ for example in "func || code=$?" construct.
+
+ + As there is no reliable method of error detection, and there are two
+ largely independent unreliable methods, the best what we can do
+ is to apply both. So, code SHOULD explicitly
+ check each command (with "|| die" and similar) AND have "set -e" applied.
+
+ + Code MUST explicitly check each command, unless the command is well known,
+ and considered safe (such as the aforementioned "echo").
+
+ + The well known commands MUST still be checked implicitly via "set -e".
+
+ + See below for specific "set -e" recommendations.
+
++ Code SHOULD use "readlink -e" (or "-f" if target does not exist yet)
+ to normalize any path value to absolute path without symlinks.
+ It helps with debugging and identifies malformed paths.
+
++ Code SHOULD use such normalized paths for sourcing.
+
++ When exiting on a known error, code MUST print a longer, helpful message,
+ in order for the user to fix their situation if possible.
+
++ When error happens at an unexpected place, it is RECOMMENDED for the message
+ to be short and generic, instead of speculative.
+
+# Bash Options
+
++ Code MUST apply "-x" to make debugging easier.
+
+ + Code MAY temporarily supress such output in order to avoid spam
+ (e.g. in long busy loops), but it is still NOT RECOMMENDED to do so.
+
++ Code MUST apply "-e" for early error detection.
+
+ + But code still SHOULD use "|| die" for most commands,
+ as "-e" has numerous rules and exceptions.
+
+ + Code MAY apply "+e" temporarily for commands which (possibly nonzero)
+ exit code it interested in.
+
+ + Code MUST to store "$?" and call "set -e" immediatelly afterwards.
+
+ + Code MUST NOT use this approach when calling functions.
+
+ + That is because functions are instructed to apply "set -e" on their own
+ which (when triggered) will exit the whole entry script.
+
+ + Unless overriden by ERR trap.
+ But code SHOULD NOT set any ERR trap.
+
+ + If code needs exit code of a function, it is RECOMMENDED to use
+ pattern 'code="0"; called_function || code="${?}"'.
+
+ + In this case, contributor MUST make sure nothing in the
+ called_function sub-graph relies on "set -e" behavior,
+ because the call being part of "or construct" disables it.
+
+ + Code MAY append "|| true" for benign commands,
+ when it is clear non-zero exit codes make no difference.
+
+ + Also in this case, the contributor MUST make sure nothing within
+ the called sub-graph depends on "set -e", as it is disabled.
+
++ Code MUST apply "-u" as unset variable is generally a typo, thus an error.
+
+ + Code MAY temporarily apply "+u" if a command needs that to pass.
+
+ + Virtualenv activation is the only known example so far.
+
++ Code MUST apply "-o pipefail" to make sure "-e" picks errors
+ inside piped construct.
+
+ + Code MAY use "|| true" inside a pipe construct, in the (inprobable) case
+ when non-zero exit code still results in a meaningful pipe output.
+
++ All together: "set -exuo pipefail".
+
+ + Code MUST put that line near start of every file, so we are sure
+ the options are applied no matter what.
+
+ + "Near start" means "before any nontrivial code".
+
+ + Basically only copyright is RECOMMENDED to appear before.
+
+ + Also code MUST put the line near start of function bodies
+ and subshell invocations.
+
+# Functions
+
+There are (at least) two possibilities how a code from an external file
+can be executed. Either the file contains a code block to execute
+on each "source" invocation, or the file just defines functions
+which have to be called separately.
+
+This document considers the "function way" to be better,
+here are some pros and cons:
+
++ Cons:
+
+ + The function way takes more space. Files have more lines,
+ and the code in function body is one indent deeper.
+
+ + It is not easy to create functions for low-level argument manipulation,
+ as "shift" command in the function code does not affect the caller context.
+
+ + Call sites frequently refer to code two times,
+ when sourcing the definition and when executing the function.
+
+ + It is not clear when a library can rely on its relative
+ to have performed the sourcing already.
+
+ + Ideally, each library should detect if it has been sourced already
+ and return early, which takes even more space.
+
++ Pros:
+
+ + Some code blocks are more useful when used as function,
+ to make call site shorter.
+
+ + Examples: Trap functions, "die" function.
+
+ + The "import" part and "function" part usually have different side effects,
+ making the documentation more focused (even if longer overall).
+
+ + There is zero risk of argument-less invocation picking arguments
+ from parent context.
+
+ + This safety feature is the main reason for chosing the "function way".
+
+ + This allows code blocks to support optional arguments.
+
++ Rules:
+
+ + Library files MUST be only "source"d. For example if "tox" calls a script,
+ it is an entry script.
+
+ + Library files (upon sourcing) MUST minimize size effect.
+
+ + The only permitted side effects MUST by directly related to:
+
+ + Defining functions (without executing them).
+
+ + Sourcing sub-library files.
+
+ + If a bash script indirectly call another bash script,
+ it is not a "source" operation, variables are not shared,
+ so the called script MUST be considered an entry script,
+ even if it implements logic fitting into a single function.
+
+ + Entry scripts SHOULD avoid duplicating any logic.
+
+ + Clear duplicated blocks MUST be moved into libraries as functions.
+
+ + Blocks with low amount of duplication MAY remain in entry scripts.
+
+ + Usual motives for not creating functions are:
+
+ + The extracted function would have too much logic for processing
+ arguments (instead of hardcoding values as in entry script).
+
+ + The arguments needed would be too verbose.
+
+ + And using "set +x" would take too much vertical space
+ (when compared to entry script implementation).
+
+# Variables
+
+This document describes two kinds of variables: called "local" and "global".
+
++ Local variables:
+
+ + Variable name MUST contain only lower case letters, digits and underscores.
+
+ + Code MUST NOT export local variables.
+
+ + Code MUST NOT rely on local variables set in different contexts.
+
+ + Documentation is NOT REQUIRED.
+
+ + Variable name SHOULD be descriptive enough.
+
+ + Local variable MUST be initialized before first use.
+
+ + Code SHOULD have a comment if a reader might have missed
+ the initialization.
+
+ + Unset local variables when leaving the function.
+
+ + Explicitly typeset by "local" builtin command.
+
+ + Require strict naming convention, e.g. function_name__variable_name.
+
++ Global variables:
+
+ + Variable name MUST contain only upper case letters, digits and underscores.
+
+ + They SHOULD NOT be exported, unless external commands need them
+ (e.g. PYTHONPATH).
+
+ + Code MUST document if a function (or its inner call)
+ reads a global variable.
+
+ + Code MUST document if a function (or its inner call)
+ sets or rewrites a global variable.
+
+ + If a function "wants to return a value", it SHOULD be implemented
+ as the function setting (or rewriting) a global variable,
+ and the call sites reading that variable.
+
+ + If a function "wants to accept an argument", it IS RECOMMENDED
+ to be implemented as the call sites setting or rewriting global variables,
+ and the function reading that variables.
+ But see below for direct arguments.
+
++ Code MUST use curly brackets when referencing variables,
+ e.g. "${my_variable}".
+
+ + It makes related constructs (such as ${name:-default}) less surprising.
+
+ + It looks more similar to Robot Framework variables (which is good).
+
+# Arguments
+
+Bash scripts and functions MAY accept arguments, named "${1}", "${2}" and so on.
+As a whole available via "$@".
+You MAY use "shift" command to consume an argument.
+
+## Contexts
+
+Functions never have access to parent arguments, but they can read and write
+variables set or read by parent contexts.
+
+### Arguments Or Variables
+
++ Both arguments and global variables MAY act as an input.
+
++ In general, if the caller is likely to supply the value already placed
+ in a global variable of known name, it is RECOMMENDED
+ to use that global variable.
+
++ Construct "${NAME:-value}" can be used equally well for arguments,
+ so default values are possible for both input methods.
+
++ Arguments are positional, so there are restrictions on which input
+ is optional.
+
++ Functions SHOULD either look at arguments (possibly also
+ reading global variables to use as defaults), or look at variables only.
+
++ Code MUST NOT rely on "${0}", it SHOULD use "${BASH_SOURCE[0]}" instead
+ (and apply "readlink -e") to get the current block location.
+
++ For entry scripts, it is RECOMMENDED to use standard parsing capabilities.
+
+ + For most Linux distros, "getopt" is RECOMMENDED.
+
+# Working Directory Handling
+
++ Functions SHOULD act correctly without neither assuming
+ what the currect working directory is, nor changing it.
+
+ + That is why global variables and arguments SHOULD contain
+ (normalized) full paths.
+
+ + Motivation: Different call sites MAY rely on different working directories.
+
++ A function MAY return (also with nonzero exit code) when working directory
+ is changed.
+
+ + In this case the function documentation MUST clearly state where (and when)
+ is the working directory changed.
+
+ + Exception: Functions with undocumented exit code.
+
+ + Those functions MUST return nonzero code only on "set -e" or "die".
+
+ + Note that both "set -e" and "die" by default result in exit of the whole
+ entry script, but the caller MAY have altered that behavior
+ (by registering ERR trap, or redefining die function).
+
+ + Any callers which use "set +e" or "|| true" MUST make sure
+ their (and their caller ancestors') assumption on working directory
+ are not affected.
+
+ + Such callers SHOULD do that by restoring the original working directory
+ either in their code,
+
+ + or contributors SHOULD do such restoration in the function code,
+ (see below) if that is more convenient.
+
+ + Motivation: Callers MAY rely on this side effect to simplify their logic.
+
++ A function MAY assume a particular directory is already set
+ as the working directory (to save space).
+
+ + In this case function documentation MUST clearly state what the assumed
+ working directory is.
+
+ + Motivation: Callers MAY call several functions with common
+ directory of interest.
+
+ + Example: Several dowload actions to execute in sequence,
+ implemented as functions assuming ${DOWNLOAD_DIR}
+ is the working directory.
+
++ A function MAY change the working directory transiently,
+ before restoring it back before return.
+
+ + Such functions SHOULD use command "pushd" to change the working directory.
+
+ + Such functions SHOULD use "trap 'trap - RETURN; popd' RETURN"
+ imediately after the pushd.
+
+ + In that case, the "trap - RETURN" part MUST be included,
+ to restore any trap set by ancestor.
+
+ + Functions MAY call "trap - RETURN; popd" exlicitly.
+
+ + Such functions MUST NOT call another pushd (before an explicit popd),
+ as traps do not stack within a function.
+
++ If entry scripts also use traps to restore working directory (or other state),
+ they SHOULD use EXIT traps instead.
+
+ + That is because "exit" command, as well as the default behavior
+ of "die" or "set -e" cause direct exit (without skipping function returns).
+
+# Function Size
+
++ In general, code SHOULD follow reasoning similar to how pylint
+ limits code complexity.
+
++ It is RECOMMENDED to have functions somewhat simpler than Python functions,
+ as Bash is generally more verbose and less readable.
+
++ If code contains comments in order to partition a block
+ into sub-blocks, the sub-blocks SHOULD be moved into separate functions.
+
+ + Unless the sub-blocks are essentially one-liners,
+ not readable just because external commands do not have
+ obvious enough parameters. Use common sense.
+
+# Documentation
+
++ The library path and filename is visible from source sites. It SHOULD be
+ descriptive enough, so reader do not need to look inside to determine
+ how and why is the sourced file used.
+
+ + If code would use several functions with similar names,
+ it is RECOMMENDED to create a (well-named) sub-library for them.
+
+ + Code MAY create deep library trees if needed, it SHOULD store
+ common path prefixes into global variables to make sourcing easier.
+
+ + Contributors, look at other files in the subdirectory. You SHOULD
+ improve their filenames when adding-removing other filenames.
+
+ + Library files SHOULD NOT have executable flag set.
+
+ + Library files SHOULD have an extension .sh (or perhaps .bash).
+
+ + It is RECOMMENDED for entry scripts to also have executable flag unset
+ and have .sh extension.
+
++ Each entry script MUST start with a shebang.
+
+ + "#!/bin/usr/env bash" is RECOMMENDED.
+
+ + Code SHOULD put an empty line after shebang.
+
+ + Library files SHOULD NOT contain a shebang, as "source" is the primary
+ method to include them.
+
++ Following that, there SHOULD be a block of comment lines with copyright.
+
+ + It is a boilerplate, but human eyes are good at ignoring it.
+
+ + Overhead for git is also negligible.
+
++ Following that, there MUST be "set -exuo pipefail".
+
+ + It acts as an anchor for humans to start paying attention.
+
+Then it depends on script type.
+
+## Library Documentation
+
++ Following "set -exuo pipefail" SHALL come the "import part" documentation.
+
++ Then SHALL be the import code
+ ("source" commands and a bare minimum they need).
+
++ Then SHALL be the function definitions, and inside:
+
+ + The body SHALL sart with the function documentation explaining API contract.
+ Similar to Robot [Documentation] or Python function-level docstring.
+
+ + See below.
+
+ + "set -exuo pipefail" SHALL be the first executable line
+ in the function body, except functions which legitimely need
+ different flags. Those SHALL also start with appropriate "set" command(s).
+
+ + Lines containing code itself SHALL follow.
+
+ + "Code itself" SHALL include comment lines
+ explaining any non-obvious logic.
+
+ + There SHALL be two empty lines between function definitions.
+
+More details on function documentation:
+
+Generally, code SHOULD use comments to explain anything
+not obvious from the funtion name.
+
++ Function documentation SHOULD start with short description of function
+ operation or motivation, but only if not obvious from function name.
+
++ Documentation SHOULD continue with listing any non-obvious side effect:
+
+ + Documentation MUST list all read global variables.
+
+ + Documentation SHOULD include descriptions of semantics
+ of global variable values.
+ It is RECOMMENDED to mention which function is supposed to set them.
+
+ + The "include descriptions" part SHOULD apply to other items as well.
+
+ + Documentation MUST list all global variables set, unset, reset,
+ or otherwise updated.
+
+ + It is RECOMMENDED to list all hardcoded values used in code.
+
+ + Not critical, but can hint at future improvements.
+
+ + Documentation MUST list all files or directories read
+ (so caller can make sure their content is ready).
+
+ + Documentation MUST list all files or directories updated
+ (created, deleted, emptied, otherwise edited).
+
+ + Documentation SHOULD list all functions called (so reader can look them up).
+
+ + Documentation SHOULD mention where are the functions defined,
+ if not in the current file.
+
+ + Documentation SHOULD list all external commands executed.
+
+ + Because their behavior can change "out of bounds", meaning
+ the contributor changing the implementation of the extrenal command
+ can be unaware of this particular function interested in its side effects.
+
+ + Documentation SHOULD explain exit code (coming from
+ the last executed command).
+
+ + Usually, most functions SHOULD be "pass or die",
+ but some callers MAY be interested in nonzero exit codes
+ without using global variables to store them.
+
+ + Remember, "exit 1" ends not only the function, but all scripts
+ in the source chain, so code MUST NOT use it for other purposes.
+
+ + Code SHOULD call "die" function instead. This way the caller can
+ redefine that function, if there is a good reason for not exiting
+ on function failure.
+
+## Entry Script Documentation
+
++ After "set -exuo pipefail", high-level description SHALL come.
+
+ + Entry scripts are rarely reused, so detailed side effects
+ are OPTIONAL to document.
+
+ + But code SHOULD document the primary side effects.
+
++ Then SHALL come few commented lines to import the library with "die" function.
+
++ Then block of "source" commands for sourcing other libraries needed SHALL be.
+
+ + In alphabetical order, any "special" library SHOULD be
+ in the previous block (for "die").
+
++ Then block os commands processing arguments SHOULD be (if needed).
+
++ Then SHALL come block of function calls (with parameters as needed).
+
+# Other General Recommendations
+
++ Code SHOULD NOT not repeat itself, even in documentation:
+
+ + For hardcoded values, a general description SHOULD be written
+ (instead of copying the value), so when someone edits the value
+ in the code, the description still applies.
+
+ + If affected directory name is taken from a global variable,
+ documentation MAY distribute the directory description
+ over the two items.
+
+ + If most of side effects come from an inner call,
+ documentation MAY point the reader to the documentation
+ of the called function (instead of listing all the side effects).
+
++ But documentation SHOULD repeat it if the information crosses functions.
+
+ + Item description MUST NOT be skipped just because the reader
+ should have read parent/child documentation already.
+
+ + Frequently it is RECOMMENDED to copy&paste item descriptions
+ between functions.
+
+ + But sometimes it is RECOMMENDED to vary the descriptions. For example:
+
+ + A global variable setter MAY document how does it figure out the value
+ (without caring about what it will be used for by other functions).
+
+ + A global variable reader MAY document how does it use the value
+ (without caring about how has it been figured out by the setter).
+
++ When possible, Bash code SHOULD be made to look like Python
+ (or Robot Framework). Those are three primary languages CSIT code relies on,
+ so it is nicer for the readers to see similar expressions when possible.
+ Examples:
+
+ + Code MUST use indentation, 1 level is 4 spaces.
+
+ + Code SHOULD use "if" instead of "&&" constructs.
+
+ + For comparisons, code SHOULD use operators such as "!=" (needs "[[").
+
++ Code MUST NOT use more than 80 characters per line.
+
+ + If long external command invocations are needed,
+ code SHOULD use array variables to shorten them.
+
+ + If long strings (or arrays) are needed, code SHOULD use "+=" operator
+ to grow the value over multiple lines.
+
+ + If "|| die" does not fit with the command, code SHOULD use curly braces:
+
+ + Current line has "|| {",
+
+ + Next line has the die commands (indented one level deeper),
+
+ + Final line closes with "}" at original intent level.