aboutsummaryrefslogtreecommitdiffstats
path: root/docs/bash_code_style.rst
blob: 3af2bb7ba650d3a497094f54a7a494d065b0f083 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
..
   Copyright (c) 2019 Cisco and/or its affiliates.
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at:
..
       http://www.apache.org/licenses/LICENSE-2.0
..
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in `BCP 14 <https://tools.ietf.org/html/bcp14>`_
`[RFC2119] <https://tools.ietf.org/html/rfc2119>`_
`[RFC8174] <https://tools.ietf.org/html/rfc8174>`_
when, and only when, they appear in all capitals, as shown here.

This document SHALL describe guidelines for writing reliable, maintainable,
reusable and readable code for CSIT.

Motivation
^^^^^^^^^^

TODO: List reasons why we need code style document for Bash.

Proposed style
^^^^^^^^^^^^^^

File types
~~~~~~~~~~

Bash files SHOULD NOT be monolithic. Generally, this document
considers two types of bash files:

+ Entry script: Assumed to be called by user,
  or a script "external" in some way.

  + Sources bash libraries and calls functions defined there.

+ Library file: To be sourced by entry scipts, possibly also by other libraries.

  + Sources other libraries for functions it needs.

    + Or relies on a related file already having sourced that.

    + Documentation SHALL imply which case it is.

  + Defines multiple functions other scripts can call.

Safety
~~~~~~

+ Variable expansions MUST be quoted, to prevent word splitting.

  + This includes special "variables" such as "${1}".

    + RECOMMENDED even if the value is safe, as in "$?" and "$#".

  + It is RECOMMENDED to quote strings in general,
    so text editors can syntax-highlight them.

    + Even if the string is a numeric value.

    + Commands and known options can get their own highlight, no need to quote.

      + Example: You do not need to quote every word of
        "pip install --upgrade virtualenv".

  + Code SHALL NOT quote glob characters you need to expand (obviously).

    + OPTIONALLY do not quote adjacent characters (such as dot or fore-slash),
      so that syntax highlighting makes them stand out compared to surrounding
      ordinary strings.

    + Example: cp "logs"/*."log" "."/

    + TODO: Consider giving examples both for good and bad usage.

  + Command substitution on right hand side of assignment should be safe
    without quotes.

    + But still, quotes are RECOMMENDED, unless line length is a concern.

    + Note that command substitution limits the scope for quotes,
      so it is NOT REQUIRED to escape the quotes in deeper levels.

    + TODO: Do we prefer backtics, or "dollar round-bracket"?

      + Backticks are more readable, especially when there are
        round brackets in the surrounding text.

      + Backticks do not allow nested command substitution.

        + But we might want to avoid nested command substitution anyway?

  + Code SHOULD NOT be structured in a way where
    word splitting is intended.

    + Example: Variable holding string of multiple command lines arguments.

    + Solution: Array variable should be used in this case.

    + Expansion MUST use quotes then: "${name[@]}".

    + Word splitting MAY be used when creating arrays from command substitution.

+ Code MUST always check the exit code of commands.

  + Traditionally, error code checking is done either by "set -e"
    or by appending "|| die" after each command.
    The first is unreliable, due to many rules affecting "set -e" behavior
    (see <https://mywiki.wooledge.org/BashFAQ/105>), but "|| die"
    relies on humans identifying each command, which is also unreliable.
    When was the last time you checked error code of "echo" command,
    for example?

  + As there is no reliable method of error detection, and there are two
    largely independent unreliable methods, the best what we can do
    is to apply both. So, code SHOULD explicitly
    check each command (with "|| die" and similar) AND have "set -e" applied.

  + Code MUST explicitly check each command, unless the command is well known,
    and considered safe (such as the aforementioned "echo").

    + The well known commands MUST still be checked implicitly via "set -e".

  + See below for specific "set -e" recommendations.

+ Code SHOULD use "readlink -e" (or "-f" if target does not exist yet)
  to normalize any path value to absolute path without symlinks.
  It helps with debugging and identifies malformed paths.

+ Code SHOULD use such normalized paths for sourcing.

+ When exiting on a known error, code MUST print a longer, helpful message,
  in order for the user to fix their situation if possible.

+ When error happens at an unexpected place, it is RECOMMENDED for the message
  to be short and generic, instead of speculative.

Bash options
~~~~~~~~~~~~

+ Code MUST apply "-x" to make debugging easier.

  + Code MAY temporarily supress such output in order to avoid spam
    (e.g. in long busy loops), but it is still NOT RECOMMENDED to do so.

+ Code MUST apply "-e" for early error detection.

  + But code still SHOULD use "|| die" for most commands,
    as "-e" has numerous rules and exceptions.

  + Code MAY apply "+e" temporarily for commands which (possibly nonzero)
    exit code it interested in.

    + Code MUST to store "$?" and call "set -e" immediatelly afterwards.

  + Code MAY append "|| true" for benign commands,
    when it is clear non-zero exit codes make no difference.

+ Code MUST apply "-u" as unset variable is generally a typo, thus an error.

  + Code MAY temporarily apply "+u" if a command needs that to pass.

    + Virtualenv activation is the only known example so far.

+ Code MUST apply "-o pipefail" to make sure "-e" picks errors
  inside piped construct.

  + Code MAY use "|| true" inside a pipe construct, in the (inprobable) case
    when non-zero exit code still results in a meaningful pipe output.

+ All together: "set -exuo pipefail".

  + Code MUST put that line near start of every file, so we are sure
    the options are applied no matter what.

    + "Near start" means "before any nontrivial code".

    + Basically only copyright and long high-level comments are
      RECOMMENDED to appear before.

  + Also code MUST put the line near start of function bodies
    and subshell invocations.

Functions
~~~~~~~~~

There are (at least) two possibilities how a code from an external file
can be executed. Either the file contains a code block to execute
on each "source" invocation, or the file just defines functions
which have to be called separately.

This document considers the "function way" to be better,
here are some pros and cons:

+ Cons:

  + The function way takes more space. Files have more lines,
    and the code in function body is one indent deeper.

  + It is not easy to create functions for low-level argument manipulation,
    as "shift" command in the function code does not affect the caller context.

  + Call sites frequently refer to code two times,
    when sourcing the definition and when executing the function.

  + It is not clear when a library can rely on its relative
    to have performed the sourcing already.

  + Ideally, each library should detect if it has been sourced already
    and return early, which takes even more space.

+ Pros:

  + Some code blocks are more useful when used as function,
    to make call site shorter.

    + Examples: Trap functions, "die" function.

  + The "import" part and "function" part usually have different side effects,
    making the documentation more focused (even if longer overall).

  + There is zero risk of argument-less invocation picking arguments
    from parent context.

    + This safety feature is the main reason for chosing the "function way".

    + This allows code blocks to support optional arguments.

+ Rules:

  + Library files MUST be only "source"d. For example if "tox" calls a script,
    it is an entry script.

  + Library files (upon sourcing) MUST minimize size effect.

    + The only permitted side effects MUST by directly related to:

      + Defining functions (without executing them).

      + Sourcing sub-library files.

  + If a bash script indirectly call another bash script,
    it is not a "source" operation, variables are not shared,
    so the called script MUST be considered an entry script,
    even if it implements logic fitting into a single function.

  + Entry scripts SHOULD avoid duplicating any logic.

    + Clear duplicated blocks MUST be moved into libraries as functions.

    + Blocks with low amount of duplication MAY remain in entry scripts.

    + Usual motives for not creating functions are:

      + The extracted function would have too much logic for processing
        arguments (instead of hardcoding values as in entry script).

      + The arguments needed would be too verbose.

        + And using "set +x" would take too much vertical space
          (when compared to entry script implementation).

Variables
~~~~~~~~~

This document describes two kinds of variables: called "local" and "global".

TODO: Find better adjectives for the two kinds defined here,
if the usual bash meaning makes reader forget other specifics.
For example, variable with lowercase name and not marked by "local" builtin,
is cosidered "global" from bash point of view, but "local" from this document
point of view.

+ Local variables:

  + Variable name MUST contain only lower case letters, digits and underscores.

  + Code MUST NOT export local variables.

  + Code MUST NOT rely on local variables set in different contexts.

  + Documentation is NOT REQUIRED.

    + Variable name SHOULD be descriptive enough.

  + Local variable MUST be initialized before first use.

    + Code SHOULD have a comment if a reader might have missed
      the initialization.

  + TODO: Agree on level of defensiveness (against local values
    being influenced by other functions) needed.
    Possible alternatives / additions to the "always initialize" rule:

    + Unset local variables when leaving the function.

    + Explicitly typeset by "local" builtin command.

    + Require strict naming convention, e.g. function_name__variable_name.

+ Global variables:

  + Variable name MUST contain only upper case letters, digits and underscores.

  + They SHOULD NOT be exported, unless external commands need them
    (e.g. PYTHONPATH).

  + TODO: Propose a strict naming convention, or a central document
    of all used global variables, to prevent contributors
    from causing variable name conflicts.

  + Code MUST document if a function (or its inner call)
    reads a global variable.

  + Code MUST document if a function (or its inner call)
    sets or rewrites a global variable.

  + If a function "wants to return a value", it SHOULD be implemented
    as the function setting (or rewriting) a global variable,
    and the call sites reading that variable.

  + If a function "wants to accept an argument", it IS RECOMMENDED
    to be implemented as the call sites setting or rewriting global variables,
    and the function reading that variables.
    But see below for direct arguments.

+ Code MUST use curly brackets when referencing variables,
  e.g. "${my_variable}".

  + It makes related constructs (such as ${name:-default}) less surprising.

  + It looks more similar to Robot Framework variables (which is good).

Arguments
~~~~~~~~~

Bash scripts and functions MAY accept arguments, named "${1}", "${2}" and so on.
As a whole available via "$@".
You MAY use "shift" command to consume an argument.

Contexts
````````

Functions never have access to parent arguments, but they can read and write
variables set or read by parent contexts.

Arguments or variables
``````````````````````

+ Both arguments and global variables MAY act as an input.

+ In general, if the caller is likely to supply the value already placed
  in a global variable of known name, it is RECOMMENDED
  to use that global variable.

+ Construct "${NAME:-value}" can be used equally well for arguments,
  so default values are possible for both input methods.

+ Arguments are positional, so there are restrictions on which input
  is optional.

+ Functions SHOULD either look at arguments (possibly also
  reading global variables to use as defaults), or look at variables only.

+ Code MUST NOT rely on "${0}", it SHOULD use "${BASH_SOURCE[0]}" instead
  (and apply "readlink -e") to get the current block location.

+ For entry scripts, it is RECOMMENDED to use standard parsing capabilities.

  + For most Linux distros, "getopt" is RECOMMENDED.

Function size
~~~~~~~~~~~~~

+ In general, code SHOULD follow reasoning similar to how pylint
  limits code complexity.

+ It is RECOMMENDED to have functions somewhat simpler than Python functions,
  as Bash is generally more verbose and less readable.

+ If code contains comments in order to partition a block
  into sub-blocks, the sub-blocks SHOULD be moved into separate functions.

  + Unless the sub-blocks are essentially one-liners,
    not readable just because external commands do not have
    obvious enough parameters. Use common sense.

Documentation
~~~~~~~~~~~~~

+ The library path and filename is visible from source sites. It SHOULD be
  descriptive enough, so reader do not need to look inside to determine
  how and why is the sourced file used.

  + If code would use several functions with similar names,
    it is RECOMMENDED to create a (well-named) sub-library for them.

  + Code MAY create deep library trees if needed, it SHOULD store
    common path prefixes into global variables to make sourcing easier.

  + Contributors, look at other files in the subdirectory. You SHOULD
    improve their filenames when adding-removing other filenames.

  + Library files SHOULD NOT have executable flag set.

  + Library files SHOULD have an extension .sh (or perhaps .bash).

  + It is RECOMMENDED for entry scripts to also have executable flag unset
    and have .sh extension.

+ Each entry script MUST start with a shebang.

  + "#!/bin/usr/env bash" is RECOMMENDED.

  + Code SHOULD put an empty line after shebang.

  + Library files SHOULD NOT contain a shebang, as "source" is the primary
    method to include them.

+ Following that, there SHOULD be a block of comment lines with copyright.

  + It is a boilerplate, but human eyes are good at ignoring it.

  + Overhead for git is also negligible.

+ Following that, there MUST be "set -exuo pipefail".

  + It acts as an anchor for humans to start paying attention.

Then it depends on script type.

Library documentation
`````````````````````

+ Following "set -exuo pipefail" SHALL come the "import part" documentation.

+ Then SHALL be the import code
  ("source" commands and a bare minimum they need).

+ Then SHALL be the function definitions, and inside:

  + "set -exuo pipefail" again.

  + Following that SHALL be the function documentation explaining API contract.
    Similar to Robot [Documentation] or Python function-level docstring.

    + See below.

  + Following that SHALL be varius TODOs, FIXMEs and code itself.

    + "Code itself" SHALL include comment lines
      explaining any non-obvious logic.

    + TODO: Document (in an appropriate place) how TODOs differ from FIXMEs.

  + There SHALL be two empty lines before next function definition.

More details on function documentation:

Generally, code SHOULD use comments to explain anything
not obvious from the funtion name.

+ Function documentation SHOULD start with short description of function
  operation or motivation, but only if not obvious from function name.

+ Documentation SHOULD continue with listing any non-obvious side effect:

  + Documentation MUST list all read global variables.

    + Documentation SHOULD include descriptions of semantics
      of global variable values.
      It is RECOMMENDED to mention which function is supposed to set them.

    + The "include descriptions" part SHOULD apply to other items as well.

  + Documentation MUST list all global variables set, unset, reset,
    or otherwise updated.

  + It is RECOMMENDED to list all hardcoded values used in code.

    + Not critical, but can hint at future improvements.

  + Documentation MUST list all files or directories read
    (so caller can make sure their content is ready).

  + Documentation MUST list all files or directories updated
    (created, deleted, emptied, otherwise edited).

  + Documentation SHOULD list all functions called (so reader can look them up).

    + Documentation SHOULD mention where are the functions defined,
      if not in the current file.

  + Documentation SHOULD list all external commands executed.

    + Because their behavior can change "out of bounds", meaning
      the contributor changing the implementation of the extrenal command
      can be unaware of this particular function interested in its side effects.

  + Documentation SHOULD explain exit code (coming from
    the last executed command).

    + Usually, most functions SHOULD be "pass or die",
      but some callers MAY be interested in nonzero exit codes
      without using global variables to store them.

    + Remember, "exit 1" ends not only the function, but all scripts
      in the source chain, so code MUST NOT use it for other purposes.

      + Code SHOULD call "die" function instead. This way the caller can
        redefine that function, if there is a good reason for not exiting
        on function failure.

  + TODO: Programs installed, services started, URLs downloaded from, ...

  + TODO: Add more items when you spot them.

  + TODO: Is the current order recommended?

Entry script documentation
``````````````````````````

+ After "set -exuo pipefail", high-level description SHALL come.

  + Then TODOs and FIXMEs SHALL be placed (if any).

  + Entry scripts are rarely reused, so detailed side effects
    are OPTIONAL to document.

  + But code SHOULD document the primary side effects.

+ Then SHALL come few commented lines to import the library with "die" function.

+ Then block of "source" commands for sourcing other libraries needed SHALL be.

  + In alphabetical order, any "special" library SHOULD be
    in the previous block (for "die").

+ Then block os commands processing arguments SHOULD be (if needed).

+ Then SHALL come block of function calls (with parameters as needed).

Other general recommendations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+ Code SHOULD NOT not repeat itself, even in documentation:

  + For hardcoded values, a general description SHOULD be written
    (instead of copying the value), so when someone edits the value
    in the code, the description still applies.

  + If affected directory name is taken from a global variable,
    documentation MAY distribute the directory description
    over the two items.

  + If most of side effects come from an inner call,
    documentation MAY point the reader to the documentation
    of the called function (instead of listing all the side effects).

    + TODO: Composite functions can have large effects. Should we require
      intermediate functions to actively hide them whenever possible?

+ But documentation SHOULD repeat it if the information crosses functions.

  + Item description MUST NOT be skipped just because the reader
    should have read parent/child documentation already.

  + Frequently it is RECOMMENDED to copy&paste item descriptions
    between functions.

  + But sometimes it is RECOMMENDED to vary the descriptions. For example:

    + A global variable setter MAY document how does it figure out the value
      (without caring about what it will be used for by other functions).

    + A global variable reader MAY document how does it use the value
      (without caring about how has it been figured out by the setter).

+ When possible, Bash code SHOULD be made to look like Python
  (or Robot Framework). Those are three primary languages CSIT code relies on,
  so it is nicer for the readers to see similar expressions when possible.
  Examples:

  + Code MUST use indentation, 1 level is 4 spaces.

  + Code SHOULD use "if" instead of "&&" constructs.

  + For comparisons, code SHOULD use operators such as "!=" (needs "[[").

+ Code MUST NOT use more than 80 characters per line.

  + If long external command invocations are needed,
    code SHOULD use array variables to shorten them.

  + If long strings (or arrays) are needed, code SHOULD use "+=" operator
    to grow the value over multiple lines.

  + If "|| die" does not fit with the command, code SHOULD use curly braces:

    + Current line has "|| {",

    + Next line has the die commands (indented one level deeper),

    + Final line closes with "}" at original intent level.

  + TODO: Recommend what to do with other constructs.

    + For example multiple piped commands.

    + No, "eval" is too unsafe to use.