diff options
Diffstat (limited to 'docs/content/introduction/bash_code_style.md')
-rw-r--r-- | docs/content/introduction/bash_code_style.md | 651 |
1 files changed, 0 insertions, 651 deletions
diff --git a/docs/content/introduction/bash_code_style.md b/docs/content/introduction/bash_code_style.md deleted file mode 100644 index bbd0c37196..0000000000 --- a/docs/content/introduction/bash_code_style.md +++ /dev/null @@ -1,651 +0,0 @@ ---- -bookHidden: true -title: "Bash Code Style" ---- - -The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", -"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", -"MAY", and "OPTIONAL" in this document are to be interpreted as -described in [BCP 14](https://tools.ietf.org/html/bcp14), -[RFC2119](https://tools.ietf.org/html/rfc2119), -[RFC8174](https://tools.ietf.org/html/rfc8174) -when, and only when, they appear in all capitals, as shown here. - -This document SHALL describe guidelines for writing reliable, maintainable, -reusable and readable code for CSIT. - -# Proposed Style - -# File Types - -Bash files SHOULD NOT be monolithic. Generally, this document -considers two types of bash files: - -+ Entry script: Assumed to be called by user, - or a script "external" in some way. - - + Sources bash libraries and calls functions defined there. - -+ Library file: To be sourced by entry scipts, possibly also by other libraries. - - + Sources other libraries for functions it needs. - - + Or relies on a related file already having sourced that. - - + Documentation SHALL imply which case it is. - - + Defines multiple functions other scripts can call. - -# Safety - -+ Variable expansions MUST be quoted, to prevent word splitting. - - + This includes special "variables" such as "${1}". - - + RECOMMENDED even if the value is safe, as in "$?" and "$#". - - + It is RECOMMENDED to quote strings in general, - so text editors can syntax-highlight them. - - + Even if the string is a numeric value. - - + Commands and known options can get their own highlight, no need to quote. - - + Example: You do not need to quote every word of - "pip install --upgrade virtualenv". - - + Code SHALL NOT quote glob characters you need to expand (obviously). - - + OPTIONALLY do not quote adjacent characters (such as dot or fore-slash), - so that syntax highlighting makes them stand out compared to surrounding - ordinary strings. - - + Example: cp "logs"/*."log" "."/ - - + Command substitution on right hand side of assignment are safe - without quotes. - - + Note that command substitution limits the scope for quotes, - so it is NOT REQUIRED to escape the quotes in deeper levels. - - + Both backtics and "dollar round-bracket" provide command substitution. - The folowing rules are RECOMMENDED: - - + For simple constructs, use "dollar round-bracket". - - + If there are round brackets in the surrounding text, use backticks, - as some editor highlighting logic can get confused. - - + Avoid nested command substitution. - - + Put intermediate results into local variables, - use "|| die" on each step of command substitution. - - + Code SHOULD NOT be structured in a way where - word splitting is intended. - - + Example: Variable holding string of multiple command lines arguments. - - + Solution: Array variable should be used in this case. - - + Expansion MUST use quotes then: "${name[@]}". - - + Word splitting MAY be used when creating arrays from command substitution. - -+ Code MUST always check the exit code of commands. - - + Traditionally, error code checking is done either by "set -e" - or by appending "|| die" after each command. - The first is unreliable, due to many rules affecting "set -e" behavior - (see <https://mywiki.wooledge.org/BashFAQ/105>), but "|| die" - relies on humans identifying each command, which is also unreliable. - When was the last time you checked error code of "echo" command, - for example? - - + Another example: "set -e" in your function has no effect - if any ancestor call is done with logical or, - for example in "func || code=$?" construct. - - + As there is no reliable method of error detection, and there are two - largely independent unreliable methods, the best what we can do - is to apply both. So, code SHOULD explicitly - check each command (with "|| die" and similar) AND have "set -e" applied. - - + Code MUST explicitly check each command, unless the command is well known, - and considered safe (such as the aforementioned "echo"). - - + The well known commands MUST still be checked implicitly via "set -e". - - + See below for specific "set -e" recommendations. - -+ Code SHOULD use "readlink -e" (or "-f" if target does not exist yet) - to normalize any path value to absolute path without symlinks. - It helps with debugging and identifies malformed paths. - -+ Code SHOULD use such normalized paths for sourcing. - -+ When exiting on a known error, code MUST print a longer, helpful message, - in order for the user to fix their situation if possible. - -+ When error happens at an unexpected place, it is RECOMMENDED for the message - to be short and generic, instead of speculative. - -# Bash Options - -+ Code MUST apply "-x" to make debugging easier. - - + Code MAY temporarily supress such output in order to avoid spam - (e.g. in long busy loops), but it is still NOT RECOMMENDED to do so. - -+ Code MUST apply "-e" for early error detection. - - + But code still SHOULD use "|| die" for most commands, - as "-e" has numerous rules and exceptions. - - + Code MAY apply "+e" temporarily for commands which (possibly nonzero) - exit code it interested in. - - + Code MUST to store "$?" and call "set -e" immediatelly afterwards. - - + Code MUST NOT use this approach when calling functions. - - + That is because functions are instructed to apply "set -e" on their own - which (when triggered) will exit the whole entry script. - - + Unless overriden by ERR trap. - But code SHOULD NOT set any ERR trap. - - + If code needs exit code of a function, it is RECOMMENDED to use - pattern 'code="0"; called_function || code="${?}"'. - - + In this case, contributor MUST make sure nothing in the - called_function sub-graph relies on "set -e" behavior, - because the call being part of "or construct" disables it. - - + Code MAY append "|| true" for benign commands, - when it is clear non-zero exit codes make no difference. - - + Also in this case, the contributor MUST make sure nothing within - the called sub-graph depends on "set -e", as it is disabled. - -+ Code MUST apply "-u" as unset variable is generally a typo, thus an error. - - + Code MAY temporarily apply "+u" if a command needs that to pass. - - + Virtualenv activation is the only known example so far. - -+ Code MUST apply "-o pipefail" to make sure "-e" picks errors - inside piped construct. - - + Code MAY use "|| true" inside a pipe construct, in the (inprobable) case - when non-zero exit code still results in a meaningful pipe output. - -+ All together: "set -exuo pipefail". - - + Code MUST put that line near start of every file, so we are sure - the options are applied no matter what. - - + "Near start" means "before any nontrivial code". - - + Basically only copyright is RECOMMENDED to appear before. - - + Also code MUST put the line near start of function bodies - and subshell invocations. - -# Functions - -There are (at least) two possibilities how a code from an external file -can be executed. Either the file contains a code block to execute -on each "source" invocation, or the file just defines functions -which have to be called separately. - -This document considers the "function way" to be better, -here are some pros and cons: - -+ Cons: - - + The function way takes more space. Files have more lines, - and the code in function body is one indent deeper. - - + It is not easy to create functions for low-level argument manipulation, - as "shift" command in the function code does not affect the caller context. - - + Call sites frequently refer to code two times, - when sourcing the definition and when executing the function. - - + It is not clear when a library can rely on its relative - to have performed the sourcing already. - - + Ideally, each library should detect if it has been sourced already - and return early, which takes even more space. - -+ Pros: - - + Some code blocks are more useful when used as function, - to make call site shorter. - - + Examples: Trap functions, "die" function. - - + The "import" part and "function" part usually have different side effects, - making the documentation more focused (even if longer overall). - - + There is zero risk of argument-less invocation picking arguments - from parent context. - - + This safety feature is the main reason for chosing the "function way". - - + This allows code blocks to support optional arguments. - -+ Rules: - - + Library files MUST be only "source"d. For example if "tox" calls a script, - it is an entry script. - - + Library files (upon sourcing) MUST minimize size effect. - - + The only permitted side effects MUST by directly related to: - - + Defining functions (without executing them). - - + Sourcing sub-library files. - - + If a bash script indirectly call another bash script, - it is not a "source" operation, variables are not shared, - so the called script MUST be considered an entry script, - even if it implements logic fitting into a single function. - - + Entry scripts SHOULD avoid duplicating any logic. - - + Clear duplicated blocks MUST be moved into libraries as functions. - - + Blocks with low amount of duplication MAY remain in entry scripts. - - + Usual motives for not creating functions are: - - + The extracted function would have too much logic for processing - arguments (instead of hardcoding values as in entry script). - - + The arguments needed would be too verbose. - - + And using "set +x" would take too much vertical space - (when compared to entry script implementation). - -# Variables - -This document describes two kinds of variables: called "local" and "global". - -+ Local variables: - - + Variable name MUST contain only lower case letters, digits and underscores. - - + Code MUST NOT export local variables. - - + Code MUST NOT rely on local variables set in different contexts. - - + Documentation is NOT REQUIRED. - - + Variable name SHOULD be descriptive enough. - - + Local variable MUST be initialized before first use. - - + Code SHOULD have a comment if a reader might have missed - the initialization. - - + Unset local variables when leaving the function. - - + Explicitly typeset by "local" builtin command. - - + Require strict naming convention, e.g. function_name__variable_name. - -+ Global variables: - - + Variable name MUST contain only upper case letters, digits and underscores. - - + They SHOULD NOT be exported, unless external commands need them - (e.g. PYTHONPATH). - - + Code MUST document if a function (or its inner call) - reads a global variable. - - + Code MUST document if a function (or its inner call) - sets or rewrites a global variable. - - + If a function "wants to return a value", it SHOULD be implemented - as the function setting (or rewriting) a global variable, - and the call sites reading that variable. - - + If a function "wants to accept an argument", it IS RECOMMENDED - to be implemented as the call sites setting or rewriting global variables, - and the function reading that variables. - But see below for direct arguments. - -+ Code MUST use curly brackets when referencing variables, - e.g. "${my_variable}". - - + It makes related constructs (such as ${name:-default}) less surprising. - - + It looks more similar to Robot Framework variables (which is good). - -# Arguments - -Bash scripts and functions MAY accept arguments, named "${1}", "${2}" and so on. -As a whole available via "$@". -You MAY use "shift" command to consume an argument. - -## Contexts - -Functions never have access to parent arguments, but they can read and write -variables set or read by parent contexts. - -### Arguments Or Variables - -+ Both arguments and global variables MAY act as an input. - -+ In general, if the caller is likely to supply the value already placed - in a global variable of known name, it is RECOMMENDED - to use that global variable. - -+ Construct "${NAME:-value}" can be used equally well for arguments, - so default values are possible for both input methods. - -+ Arguments are positional, so there are restrictions on which input - is optional. - -+ Functions SHOULD either look at arguments (possibly also - reading global variables to use as defaults), or look at variables only. - -+ Code MUST NOT rely on "${0}", it SHOULD use "${BASH_SOURCE[0]}" instead - (and apply "readlink -e") to get the current block location. - -+ For entry scripts, it is RECOMMENDED to use standard parsing capabilities. - - + For most Linux distros, "getopt" is RECOMMENDED. - -# Working Directory Handling - -+ Functions SHOULD act correctly without neither assuming - what the currect working directory is, nor changing it. - - + That is why global variables and arguments SHOULD contain - (normalized) full paths. - - + Motivation: Different call sites MAY rely on different working directories. - -+ A function MAY return (also with nonzero exit code) when working directory - is changed. - - + In this case the function documentation MUST clearly state where (and when) - is the working directory changed. - - + Exception: Functions with undocumented exit code. - - + Those functions MUST return nonzero code only on "set -e" or "die". - - + Note that both "set -e" and "die" by default result in exit of the whole - entry script, but the caller MAY have altered that behavior - (by registering ERR trap, or redefining die function). - - + Any callers which use "set +e" or "|| true" MUST make sure - their (and their caller ancestors') assumption on working directory - are not affected. - - + Such callers SHOULD do that by restoring the original working directory - either in their code, - - + or contributors SHOULD do such restoration in the function code, - (see below) if that is more convenient. - - + Motivation: Callers MAY rely on this side effect to simplify their logic. - -+ A function MAY assume a particular directory is already set - as the working directory (to save space). - - + In this case function documentation MUST clearly state what the assumed - working directory is. - - + Motivation: Callers MAY call several functions with common - directory of interest. - - + Example: Several dowload actions to execute in sequence, - implemented as functions assuming ${DOWNLOAD_DIR} - is the working directory. - -+ A function MAY change the working directory transiently, - before restoring it back before return. - - + Such functions SHOULD use command "pushd" to change the working directory. - - + Such functions SHOULD use "trap 'trap - RETURN; popd' RETURN" - imediately after the pushd. - - + In that case, the "trap - RETURN" part MUST be included, - to restore any trap set by ancestor. - - + Functions MAY call "trap - RETURN; popd" exlicitly. - - + Such functions MUST NOT call another pushd (before an explicit popd), - as traps do not stack within a function. - -+ If entry scripts also use traps to restore working directory (or other state), - they SHOULD use EXIT traps instead. - - + That is because "exit" command, as well as the default behavior - of "die" or "set -e" cause direct exit (without skipping function returns). - -# Function Size - -+ In general, code SHOULD follow reasoning similar to how pylint - limits code complexity. - -+ It is RECOMMENDED to have functions somewhat simpler than Python functions, - as Bash is generally more verbose and less readable. - -+ If code contains comments in order to partition a block - into sub-blocks, the sub-blocks SHOULD be moved into separate functions. - - + Unless the sub-blocks are essentially one-liners, - not readable just because external commands do not have - obvious enough parameters. Use common sense. - -# Documentation - -+ The library path and filename is visible from source sites. It SHOULD be - descriptive enough, so reader do not need to look inside to determine - how and why is the sourced file used. - - + If code would use several functions with similar names, - it is RECOMMENDED to create a (well-named) sub-library for them. - - + Code MAY create deep library trees if needed, it SHOULD store - common path prefixes into global variables to make sourcing easier. - - + Contributors, look at other files in the subdirectory. You SHOULD - improve their filenames when adding-removing other filenames. - - + Library files SHOULD NOT have executable flag set. - - + Library files SHOULD have an extension .sh (or perhaps .bash). - - + It is RECOMMENDED for entry scripts to also have executable flag unset - and have .sh extension. - -+ Each entry script MUST start with a shebang. - - + "#!/bin/usr/env bash" is RECOMMENDED. - - + Code SHOULD put an empty line after shebang. - - + Library files SHOULD NOT contain a shebang, as "source" is the primary - method to include them. - -+ Following that, there SHOULD be a block of comment lines with copyright. - - + It is a boilerplate, but human eyes are good at ignoring it. - - + Overhead for git is also negligible. - -+ Following that, there MUST be "set -exuo pipefail". - - + It acts as an anchor for humans to start paying attention. - -Then it depends on script type. - -## Library Documentation - -+ Following "set -exuo pipefail" SHALL come the "import part" documentation. - -+ Then SHALL be the import code - ("source" commands and a bare minimum they need). - -+ Then SHALL be the function definitions, and inside: - - + The body SHALL sart with the function documentation explaining API contract. - Similar to Robot [Documentation] or Python function-level docstring. - - + See below. - - + "set -exuo pipefail" SHALL be the first executable line - in the function body, except functions which legitimely need - different flags. Those SHALL also start with appropriate "set" command(s). - - + Lines containing code itself SHALL follow. - - + "Code itself" SHALL include comment lines - explaining any non-obvious logic. - - + There SHALL be two empty lines between function definitions. - -More details on function documentation: - -Generally, code SHOULD use comments to explain anything -not obvious from the funtion name. - -+ Function documentation SHOULD start with short description of function - operation or motivation, but only if not obvious from function name. - -+ Documentation SHOULD continue with listing any non-obvious side effect: - - + Documentation MUST list all read global variables. - - + Documentation SHOULD include descriptions of semantics - of global variable values. - It is RECOMMENDED to mention which function is supposed to set them. - - + The "include descriptions" part SHOULD apply to other items as well. - - + Documentation MUST list all global variables set, unset, reset, - or otherwise updated. - - + It is RECOMMENDED to list all hardcoded values used in code. - - + Not critical, but can hint at future improvements. - - + Documentation MUST list all files or directories read - (so caller can make sure their content is ready). - - + Documentation MUST list all files or directories updated - (created, deleted, emptied, otherwise edited). - - + Documentation SHOULD list all functions called (so reader can look them up). - - + Documentation SHOULD mention where are the functions defined, - if not in the current file. - - + Documentation SHOULD list all external commands executed. - - + Because their behavior can change "out of bounds", meaning - the contributor changing the implementation of the extrenal command - can be unaware of this particular function interested in its side effects. - - + Documentation SHOULD explain exit code (coming from - the last executed command). - - + Usually, most functions SHOULD be "pass or die", - but some callers MAY be interested in nonzero exit codes - without using global variables to store them. - - + Remember, "exit 1" ends not only the function, but all scripts - in the source chain, so code MUST NOT use it for other purposes. - - + Code SHOULD call "die" function instead. This way the caller can - redefine that function, if there is a good reason for not exiting - on function failure. - -## Entry Script Documentation - -+ After "set -exuo pipefail", high-level description SHALL come. - - + Entry scripts are rarely reused, so detailed side effects - are OPTIONAL to document. - - + But code SHOULD document the primary side effects. - -+ Then SHALL come few commented lines to import the library with "die" function. - -+ Then block of "source" commands for sourcing other libraries needed SHALL be. - - + In alphabetical order, any "special" library SHOULD be - in the previous block (for "die"). - -+ Then block os commands processing arguments SHOULD be (if needed). - -+ Then SHALL come block of function calls (with parameters as needed). - -# Other General Recommendations - -+ Code SHOULD NOT not repeat itself, even in documentation: - - + For hardcoded values, a general description SHOULD be written - (instead of copying the value), so when someone edits the value - in the code, the description still applies. - - + If affected directory name is taken from a global variable, - documentation MAY distribute the directory description - over the two items. - - + If most of side effects come from an inner call, - documentation MAY point the reader to the documentation - of the called function (instead of listing all the side effects). - -+ But documentation SHOULD repeat it if the information crosses functions. - - + Item description MUST NOT be skipped just because the reader - should have read parent/child documentation already. - - + Frequently it is RECOMMENDED to copy&paste item descriptions - between functions. - - + But sometimes it is RECOMMENDED to vary the descriptions. For example: - - + A global variable setter MAY document how does it figure out the value - (without caring about what it will be used for by other functions). - - + A global variable reader MAY document how does it use the value - (without caring about how has it been figured out by the setter). - -+ When possible, Bash code SHOULD be made to look like Python - (or Robot Framework). Those are three primary languages CSIT code relies on, - so it is nicer for the readers to see similar expressions when possible. - Examples: - - + Code MUST use indentation, 1 level is 4 spaces. - - + Code SHOULD use "if" instead of "&&" constructs. - - + For comparisons, code SHOULD use operators such as "!=" (needs "[["). - -+ Code MUST NOT use more than 80 characters per line. - - + If long external command invocations are needed, - code SHOULD use array variables to shorten them. - - + If long strings (or arrays) are needed, code SHOULD use "+=" operator - to grow the value over multiple lines. - - + If "|| die" does not fit with the command, code SHOULD use curly braces: - - + Current line has "|| {", - - + Next line has the die commands (indented one level deeper), - - + Final line closes with "}" at original intent level. |