VNET (VPP Network Stack)
========================
The files associated with the VPP network stack layer are located in the
*./src/vnet* folder. The Network Stack Layer is basically an
instantiation of the code in the other layers. This layer has a vnet
library that provides vectorized layer-2 and 3 networking graph nodes, a
packet generator, and a packet tracer.
In terms of building a packet processing application, vnet provides a
platform-independent subgraph to which one connects a couple of
device-driver nodes.
Typical RX connections include "ethernet-input" \[full software
classification, feeds ipv4-input, ipv6-input, arp-input etc.\] and
"ipv4-input-no-checksum" \[if hardware can classify, perform ipv4 header
checksum\].
Effective graph dispatch function coding
----------------------------------------
Over the past 15 years, multiple coding styles have emerged: a
single/dual/quad loop coding model (with variations) and a
fully-pipelined coding model.
Single/dual loops
-----------------
The single/dual/quad loop model variations conveniently solve problems
where the number of items to process is not known in advance: typical
hardware RX-ring processing. This coding style is also very effective
when a given node will not need to cover a complex set of dependent
reads.
Here is a quad/single loop which can leverage up-to-avx512 SIMD vector
units to convert buffer indices to buffer pointers:
```c
static uword
simulated_ethernet_interface_tx (vlib_main_t * vm,
                                 vlib_node_runtime_t * node,
                                 vlib_frame_t * frame)
{
  u32 n_left_from, *from;
  u32 next_index = 0;
  u32 n_bytes;
  u32 thread_index = vm->thread_index;
  vnet_main_t *vnm = vnet_get_main ();
  vnet_interface_main_t *im = &vnm->interface_main;
  vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
  u16 nexts[VLIB_FRAME_SIZE], *next;

  n_left_from = frame->n_vectors;
  from = vlib_frame_vector_args (frame);

  /*
   * Convert up to VLIB_FRAME_SIZE indices in "from" to
   * buffer pointers in bufs[]
   */
  vlib_get_buffers (vm, from, bufs, n_left_from);
  b = bufs;
  next = nexts;

  /*
   * While we have at least 4 vector elements (pkts) to process..
   */
  while (n_left_from >= 4)
    {
      /* Prefetch next quad-loop iteration. */
      if (PREDICT_TRUE (n_left_from >= 8))
        {
          vlib_prefetch_buffer_header (b[4], STORE);
          vlib_prefetch_buffer_header (b[5], STORE);
          vlib_prefetch_buffer_header (b[6], STORE);
          vlib_prefetch_buffer_header (b[7], STORE);
        }

      /*
       * $$$ Process 4x packets right here...
       * set next[0..3] to send the packets where they need to go
       */
      do_something_to (b[0]);
      do_something_to (b[1]);
      do_something_to (b[2]);
      do_something_to (b[3]);

      /* Process the next 0..4 packets */
      b += 4;
      next += 4;
      n_left_from -= 4;
    }

  /*
   * Clean up 0...3 remaining packets at the end of the incoming frame
   */
  while (n_left_from > 0)
    {
      /*
       * $$$ Process one packet right here...
       * set next[0] to send the packet where it needs to go
       */
      do_something_to (b[0]);

      /* Process the next packet */
      b += 1;
      next += 1;
      n_left_from -= 1;
    }

  /*
   * Send the packets along their respective next-node graph arcs
   * Considerable locality of reference is expected, most if not all
   * packets in the inbound vector will traverse the same next-node
   * arc
   */
  vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);

  return frame->n_vectors;
}
```
Given a packet processing task to implement, it pays to scout around
looking for similar tasks, and think about using the same coding
pattern. It is not uncommon to recode a given graph node dispatch function
several times during performance optimization.
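The quad/single loop shape does not depend on VPP types, so it can be tried out in isolation. The following is a minimal standalone sketch, not VPP code: `pkt_t`, `process_one` and `process_frame` are hypothetical names invented for illustration, and `__builtin_prefetch` assumes a GCC or Clang compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer; not a VPP type. */
typedef struct
{
  uint32_t length;
  uint16_t next;
} pkt_t;

/* Hypothetical per-packet work: choose a next index from the length. */
static inline void
process_one (pkt_t *p)
{
  p->next = (p->length > 64) ? 1 : 0;
}

/* Process n packets: four at a time, then the 0..3 leftovers singly. */
static size_t
process_frame (pkt_t **pkts, size_t n)
{
  pkt_t **b = pkts;
  size_t n_left = n;

  assert (pkts != NULL || n == 0);

  while (n_left >= 4)
    {
      /* Prefetch the next quad for writing, if there is one. */
      if (n_left >= 8)
        {
          __builtin_prefetch (b[4], 1);
          __builtin_prefetch (b[5], 1);
          __builtin_prefetch (b[6], 1);
          __builtin_prefetch (b[7], 1);
        }

      process_one (b[0]);
      process_one (b[1]);
      process_one (b[2]);
      process_one (b[3]);

      b += 4;
      n_left -= 4;
    }

  /* Clean up the remaining 0..3 packets. */
  while (n_left > 0)
    {
      process_one (b[0]);
      b += 1;
      n_left -= 1;
    }

  return n;
}
```

Calling `process_frame` with a frame of 7 pointers exercises both loops: one quad iteration, then three single iterations.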
Creating Packets from Scratch
-----------------------------
At times, it's necessary to create packets from scratch and send
them. Tasks like sending keepalives or actively opening connections
come to mind. It's not difficult, but accurate buffer metadata setup is
required.
### Allocating Buffers
Use vlib_buffer_alloc, which allocates a set of buffer indices. For
low-performance applications, it's OK to allocate one buffer at a
time. Note that vlib_buffer_alloc(...) does NOT initialize buffer
metadata. See below.
In high-performance cases, allocate a vector of buffer indices,
and hand them out from the end of the vector; decrement _vec_len(..)
as buffer indices are allocated. See tcp_alloc_tx_buffers(...) and
tcp_get_free_buffer_index(...) for an example.
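The idea of handing indices out from the end of a pre-allocated vector can be sketched without VPP. The code below is an illustration only: `index_cache_t`, `refill` and `cache_get` are hypothetical names, a fixed array stands in for a vppinfra vector, and `refill` fakes the bulk allocation that real code would perform with vlib_buffer_alloc(...).

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SIZE 32

/* Hypothetical cache of pre-allocated buffer indices; not a VPP type. */
typedef struct
{
  uint32_t indices[CACHE_SIZE];
  uint32_t n_cached;   /* valid entries; plays the role of the vector length */
  uint32_t next_fake;  /* state for the fake allocator below */
} index_cache_t;

/* Fake bulk allocator: fills the whole cache with consecutive indices. */
static uint32_t
refill (index_cache_t *c)
{
  for (uint32_t i = 0; i < CACHE_SIZE; i++)
    c->indices[i] = c->next_fake++;
  c->n_cached = CACHE_SIZE;
  return CACHE_SIZE;
}

/* Hand out one index from the end of the cache. Returns 0 on success. */
static int
cache_get (index_cache_t *c, uint32_t *bi)
{
  assert (c != 0 && bi != 0);

  if (c->n_cached == 0 && refill (c) == 0)
    return -1;

  /* Decrementing the length both selects and removes the last entry. */
  *bi = c->indices[--c->n_cached];
  return 0;
}
```

Taking from the end keeps the hot path to a decrement and a load; the bulk refill cost is paid once per `CACHE_SIZE` packets.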
### Buffer Initialization Example
The following example shows the **main points**, but is not to be
blindly cut-'n-pasted.
```c
u32 bi0;
vlib_buffer_t *b0;
ip4_header_t *ip;
udp_header_t *udp;
u8 *data_dst;

/* Allocate a buffer */
if (vlib_buffer_alloc (vm, &bi0, 1) != 1)
  return -1;

b0 = vlib_get_buffer (vm, bi0);

/* Initialize the buffer */
VLIB_BUFFER_TRACE_TRAJECTORY_INIT (b0);

/* At this point b0->current_data = 0, b0->current_length = 0 */

/*
 * Copy data into the buffer. This example ASSUMES that data will fit
 * in a single buffer, and is e.g. an ip4 packet.
 */
if (have_packet_rewrite)
  {
    clib_memcpy (b0->data, data, vec_len (data));
    b0->current_length = vec_len (data);
  }
else
  {
    /* OR, build a udp-ip packet (for example) */
    ip = vlib_buffer_get_current (b0);
    udp = (udp_header_t *) (ip + 1);
    data_dst = (u8 *) (udp + 1);

    ip->ip_version_and_header_length = 0x45;
    ip->ttl = 254;
    ip->protocol = IP_PROTOCOL_UDP;
    ip->length = clib_host_to_net_u16 (sizeof (*ip) + sizeof (*udp) +
                                       vec_len (udp_data));
    ip->src_address.as_u32 = src_address->as_u32;
    ip->dst_address.as_u32 = dst_address->as_u32;
    udp->src_port = clib_host_to_net_u16 (src_port);
    udp->dst_port = clib_host_to_net_u16 (dst_port);
    /* The UDP length field covers the UDP header plus the payload */
    udp->length = clib_host_to_net_u16 (sizeof (*udp) + vec_len (udp_data));
    clib_memcpy (data_dst, udp_data, vec_len (udp_data));

    if (compute_udp_checksum)
      {
        /* RFC 7011 section 10.3.2. */
        udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip);
        if (udp->checksum == 0)
          udp->checksum = 0xffff;
      }
    b0->current_length = sizeof (*ip) + sizeof (*udp) + vec_len (udp_data);
  }
b0->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID;

/* sw_if_index 0 is the "local" interface, which always exists */
vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0;

/* Use the default FIB index for tx lookup. Set non-zero to use another fib */
vnet_buffer (b0)->sw_if_index[VLIB_TX] = 0;
```
If your use-case calls for large packet transmission, use
vlib_buffer_chain_append_data_with_alloc(...) to create the requisite
buffer chain.
### Enqueueing packets for lookup and transmission
The simplest way to send a set of packets is to use
vlib_get_frame_to_node(...) to allocate fresh frame(s) to
ip4_lookup_node or ip6_lookup_node, add the constructed buffer
indices, and dispatch the frame using vlib_put_frame_to_node(...).
```c
vlib_frame_t *f;
u32 *to_next;
u32 i;

/* Assumes vec_len (buffer_indices_to_send) <= VLIB_FRAME_SIZE */
f = vlib_get_frame_to_node (vm, ip4_lookup_node.index);
f->n_vectors = vec_len (buffer_indices_to_send);
to_next = vlib_frame_vector_args (f);

for (i = 0; i < vec_len (buffer_indices_to_send); i++)
  to_next[i] = buffer_indices_to_send[i];

vlib_put_frame_to_node (vm, ip4_lookup_node.index, f);
```
It is inefficient to allocate and schedule single packet frames.
That's typical in case you need to send one packet per second, but
should **not** occur in a for-loop!
Packet tracer
-------------
Vlib includes a frame element \[packet\] trace facility, with a simple
debug CLI interface. The CLI is straightforward: "trace add
input-node-name count" to start capturing packet traces.
To trace 100 packets on a typical x86\_64 system running the dpdk
plugin: "trace add dpdk-input 100". When using the packet generator:
"trace add pg-input 100"
To display the packet trace: "show trace"
Each graph node has the opportunity to capture its own trace data. It is
almost always a good idea to do so. The trace capture APIs are simple.
The packet capture APIs snapshot binary data, to minimize processing at
capture time. Each participating graph node initialization provides a
vppinfra format-style user function to pretty-print data when required
by the VLIB "show trace" command.
Set the VLIB node registration ".format\_trace" member to the name of
the per-graph node format function.
Here's a simple example:
```c
u8 * my_node_format_trace (u8 * s, va_list * args)
{
  vlib_main_t * vm = va_arg (*args, vlib_main_t *);
  vlib_node_t * node = va_arg (*args, vlib_node_t *);
  my_node_trace_t * t = va_arg (*args, my_node_trace_t *);

  s = format (s, "My trace data was: %d", t-><whatever>);

  return s;
}
```
The trace framework hands the per-node format function the data it
captured as the packet whizzed by. The format function pretty-prints the
data as desired.
Graph Dispatcher Pcap Tracing
-----------------------------
The vpp graph dispatcher knows how to capture vectors of packets in pcap
format as they're dispatched. The pcap captures are as follows:
```
VPP graph dispatch trace record description:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Major Version | Minor Version |   NStrings    |   ProtoHint   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Buffer index (big endian)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| VPP graph node name ...                   ... | NULL octet    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Buffer Metadata ...                       ... | NULL octet    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Buffer Opaque ...                         ... | NULL octet    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Buffer Opaque 2 ...                       ... | NULL octet    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| VPP ASCII packet trace (if NStrings > 4)      | NULL octet    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Packet data (up to 16K)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
Graph dispatch records comprise a version stamp, an indication of how
many NULL-terminated strings will follow the record header and precede
packet data, and a protocol hint.
The buffer index is an opaque 32-bit cookie which allows consumers of
these data to easily filter/track single packets as they traverse the
forwarding graph.
Multiple records per packet are normal, and to be expected. Packets
will appear multiple times as they traverse the vpp forwarding
graph. In this way, vpp graph dispatch traces are significantly
different from regular network packet captures from an end-station.
This property complicates stateful packet analysis.
Restricting stateful analysis to records from a single vpp graph node
such as "ethernet-input" seems likely to improve the situation.
As of this writing: major version = 1, minor version = 0. NStrings
SHOULD be 4 or 5. Consumers SHOULD be wary of values less than 4 or
greater than 5. They MAY attempt to display the claimed number of
strings, or they MAY treat the condition as an error.
Here is the current set of protocol hints:
```c
typedef enum
{
VLIB_NODE_PROTO_HINT_NONE = 0,
VLIB_NODE_PROTO_HINT_ETHERNET,
VLIB_NODE_PROTO_HINT_IP4,
VLIB_NODE_PROTO_HINT_IP6,
VLIB_NODE_PROTO_HINT_TCP,
VLIB_NODE_PROTO_HINT_UDP,
VLIB_NODE_N_PROTO_HINTS,
} vlib_node_proto_hint_t;
```
Example: VLIB_NODE_PROTO_HINT_IP6 means that the first octet of packet
data SHOULD be 0x60, and should begin an ipv6 packet header.
Downstream consumers of these data SHOULD pay attention to the
protocol hint. They MUST tolerate inaccurate hints, which MAY occur
from time to time.
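A consumer of these records might be sketched as follows. This is a standalone illustration, not the Wireshark dissector or any VPP code: `dispatch_record_t` and `parse_dispatch_record` are hypothetical names, the byte offsets assume the layout is exactly as drawn in the record description, and the sketch chooses to treat an NStrings value outside 4..5 as an error (one of the two behaviors permitted above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed view of one dispatch trace record; not a VPP type. */
typedef struct
{
  uint8_t major, minor, n_strings, proto_hint;
  uint32_t buffer_index;
  const char *strings[5]; /* node name, metadata, opaque, opaque2, [trace] */
  const uint8_t *packet;
  size_t packet_len;
} dispatch_record_t;

/* Returns 0 on success, -1 on a truncated or unexpected record. */
static int
parse_dispatch_record (const uint8_t *p, size_t len, dispatch_record_t *r)
{
  size_t off = 8;

  assert (p != NULL && r != NULL);

  if (len < 8)
    return -1;

  r->major = p[0];
  r->minor = p[1];
  r->n_strings = p[2];
  r->proto_hint = p[3];

  /* The buffer index is stored big endian. */
  r->buffer_index = ((uint32_t) p[4] << 24) | ((uint32_t) p[5] << 16) |
                    ((uint32_t) p[6] << 8) | (uint32_t) p[7];

  /* Design choice of this sketch: reject unexpected string counts. */
  if (r->n_strings < 4 || r->n_strings > 5)
    return -1;

  /* Walk the NULL-terminated strings which precede the packet data. */
  for (unsigned i = 0; i < r->n_strings; i++)
    {
      const uint8_t *nul = memchr (p + off, 0, len - off);
      if (nul == NULL)
        return -1;
      r->strings[i] = (const char *) (p + off);
      off = (size_t) (nul - p) + 1;
    }

  /* Whatever remains is packet data; the protocol hint is only a hint. */
  r->packet = p + off;
  r->packet_len = len - off;
  return 0;
}
```

The parser never trusts the protocol hint to size or validate the packet data, in keeping with the requirement that consumers tolerate inaccurate hints.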
### Dispatch Pcap Trace Debug CLI
To start a dispatch trace capture of up to 10,000 trace records:
```
pcap dispatch trace on max 10000 file dispatch.pcap
```
To start a dispatch trace which will also include standard vpp packet
tracing for packets which originate in dpdk-input:
```
pcap dispatch trace on max 10000 file dispatch.pcap buffer-trace dpdk-input 1000
```
To save the pcap trace, e.g. in /tmp/dispatch.pcap:
```
pcap dispatch trace off
```
### Wireshark dissection of dispatch pcap traces
It almost goes without saying that we built a companion wireshark
dissector to display these traces. As of this writing, we have
upstreamed the wireshark dissector.
Since it will be a while before wireshark/master/latest makes it into
all of the popular Linux distros, please see the "How to build a vpp
dispatch trace aware Wireshark" page for build info.
Here is a sample packet dissection, with some fields omitted for
clarity. The point is that the wireshark dissector accurately
displays **all** of the vpp buffer metadata, and the name of the graph
node in question.
```
Frame 1: 2216 bytes on wire (17728 bits), 2216 bytes captured (17728 bits)
Encapsulation type: USER 13 (58)
[Protocols in frame: vpp:vpp-metadata:vpp-opaque:vpp-opaque2:eth:ethertype:ip:tcp:data]
VPP Dispatch Trace
BufferIndex: 0x00036663
NodeName: ethernet-input
VPP Buffer Metadata
Metadata: flags:
Metadata: current_data: 0, current_length: 102
Metadata: current_config_index: 0, flow_id: 0, next_buffer: 0
Metadata: error: 0, n_add_refs: 0, buffer_pool_index: 0
Metadata: trace_index: 0, recycle_count: 0, len_not_first_buf: 0
Metadata: free_list_index: 0
Metadata:
VPP Buffer Opaque
Opaque: raw: 00000007 ffffffff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Opaque: sw_if_index[VLIB_RX]: 7, sw_if_index[VLIB_TX]: -1
Opaque: L2 offset 0, L3 offset 0, L4 offset 0, feature arc index 0
Opaque: ip.adj_index[VLIB_RX]: 0, ip.adj_index[VLIB_TX]: 0
Opaque: ip.flow_hash: 0x0, ip.save_protocol: 0x0, ip.fib_index: 0
Opaque: ip.save_rewrite_length: 0, ip.rpf_id: 0
Opaque: ip.icmp.type: 0 ip.icmp.code: 0, ip.icmp.data: 0x0
Opaque: ip.reass.next_index: 0, ip.reass.estimated_mtu: 0
Opaque: ip.reass.fragment_first: 0 ip.reass.fragment_last: 0
Opaque: ip.reass.range_first: 0 ip.reass.range_last: 0
Opaque: ip.reass.next_range_bi: 0x0, ip.reass.ip6_frag_hdr_offset: 0
Opaque: mpls.ttl: 0, mpls.exp: 0, mpls.first: 0, mpls.save_rewrite_length: 0, mpls.bier.n_bytes: 0
Opaque: l2.feature_bitmap: 00000000, l2.bd_index: 0, l2.l2_len: 0, l2.shg: 0, l2.l2fib_sn: 0, l2.bd_age: 0
Opaque: l2.feature_bitmap_input: none configured, L2.feature_bitmap_output: none configured
Opaque: l2t.next_index: 0, l2t.session_index: 0
Opaque: l2_classify.table_index: 0, l2_classify.opaque_index: 0, l2_classify.hash: 0x0
Opaque: policer.index: 0
Opaque: ipsec.flags: 0x0, ipsec.sad_index: 0
Opaque: map.mtu: 0
Opaque: map_t.v6.saddr: 0x0, map_t.v6.daddr: 0x0, map_t.v6.frag_offset: 0, map_t.v6.l4_offset: 0
Opaque: map_t.v6.l4_protocol: 0, map_t.checksum_offset: 0, map_t.mtu: 0
Opaque: ip_frag.mtu: 0, ip_frag.next_index: 0, ip_frag.flags: 0x0
Opaque: cop.current_config_index: 0
Opaque: lisp.overlay_afi: 0
Opaque: tcp.connection_index: 0, tcp.seq_number: 0, tcp.seq_end: 0, tcp.ack_number: 0, tcp.hdr_offset: 0, tcp.data_offset: 0
Opaque: tcp.data_len: 0, tcp.flags: 0x0
Opaque: sctp.connection_index: 0, sctp.sid: 0, sctp.ssn: 0, sctp.tsn: 0, sctp.hdr_offset: 0
Opaque: sctp.data_offset: 0, sctp.data_len: 0, sctp.subconn_idx: 0, sctp.flags: 0x0
Opaque: snat.flags: 0x0
Opaque:
VPP Buffer Opaque2
Opaque2: raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Opaque2: qos.bits: 0, qos.source: 0
Opaque2: loop_counter: 0
Opaque2: gbp.flags: 0, gbp.src_epg: 0
Opaque2: pg_replay_timestamp: 0
Opaque2:
Ethernet II, Src: 06:d6:01:41:3b:92 (06:d6:01:41:3b:92), Dst: IntelCor_3d:f6
Transmission Control Protocol, Src Port: 22432, Dst Port: 54084, Seq: 1, Ack: 1, Len: 36
Source Port: 22432
Destination Port: 54084
TCP payload (36 bytes)
Data (36 bytes)
0000 cf aa 8b f5 53 14 d4 c7 29 75 3e 56 63 93 9d 11 ....S...)u>Vc...
0010 e5 f2 92 27 86 56 4c 21 ce c5 23 46 d7 eb ec 0d ...'.VL!..#F....
0020 a8 98 36 5a ..6Z
Data: cfaa8bf55314d4c729753e5663939d11e5f2922786564c21…
[Length: 36]
```
It's a matter of a couple of mouse-clicks in Wireshark to filter the
trace to a specific buffer index. With that specific kind of filtering,
one can watch a packet walk through the forwarding graph; noting any/all
metadata changes, header checksum changes, and so forth.
This should be of significant value when developing new vpp graph
nodes. If new code mispositions b->current_data, it will be completely
obvious from looking at the dispatch trace in wireshark.
## pcap rx, tx, and drop tracing
vpp also supports rx, tx, and drop packet capture in pcap format,
through the "pcap trace" debug CLI command.
This command is used to start or stop a packet capture, or show the
status of packet capture. Each of "pcap trace rx", "pcap trace tx",
and "pcap trace drop" is implemented. Supply one or more of "rx",
"tx", and "drop" to enable multiple simultaneous capture types.
These commands have the following optional parameters:
- <b>rx</b> - trace received packets.
- <b>tx</b> - trace transmitted packets.
- <b>drop</b> - trace dropped packets.
- <b>max _nnnn_</b> - file size, number of packet captures. Once
<nnnn> packets have been received, the trace buffer is flushed
to the indicated file. Defaults to 1000. Can only be updated if packet
capture is off.
- <b>max-bytes-per-pkt _nnnn_</b> - maximum number of bytes to trace
on a per-packet basis. Must be >32 and less than 9000. Default value:
512.
- <b>filter</b> - Use the pcap rx / tx / drop trace filter, which must
be configured. Use <b>classify filter pcap...</b> to configure the
filter. The filter will only be executed if the per-interface or
any-interface tests fail.
- <b>intfc _interface_ | _any_</b> - Used to specify a given interface,
or use '<em>any</em>' to run packet capture on all interfaces.
'<em>any</em>' is the default if not provided. Settings from a previous
packet capture are preserved, so '<em>any</em>' can be used to reset
the interface setting.
- <b>file _filename_</b> - Used to specify the output filename. The
file will be placed in the '<em>/tmp</em>' directory. If _filename_
already exists, file will be overwritten. If no filename is
provided, '<em>/tmp/rx.pcap or tx.pcap</em>' will be used, depending
on capture direction. Can only be updated when pcap capture is off.
- <b>status</b> - Displays the current status and configured
attributes associated with a packet capture. If packet capture is in
progress, '<em>status</em>' also will return the number of packets
currently in the buffer. Any additional attributes entered on
command line with a '<em>status</em>' request will be ignored.
- <b>filter</b> - Capture packets which match the current packet
trace filter set. See next section. Configure the capture filter
first.
## packet trace capture filtering
The "classify filter pcap | <interface-name> | trace" debug CLI command
constructs an arbitrary set of packet classifier tables for use with
"pcap trace rx | tx | drop", and with the vpp packet tracer on a
per-interface or system-wide basis.
Packets which match a rule in the classifier table chain will be
traced. The tables are automatically ordered so that matches in the
most specific table are tried first.
It's reasonably likely that folks will configure a single table with
one or two matches. As a result, we configure 8 hash buckets and 128K
of match rule space by default. One can override the defaults by
specifying "buckets <nnn>" and "memory-size <xxx>" as desired.
To build up complex filter chains, repeatedly issue the classify
filter debug CLI command. Each command must specify the desired mask
and match values. If a classifier table with a suitable mask already
exists, the CLI command adds a match rule to the existing table. If
not, the CLI command adds a new table and the indicated mask rule.
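The mask-and-match idea can be illustrated in a few lines of standalone C. This is a sketch of the concept only, not the vpp classifier: the real classifier organizes match rules into hash tables, whereas the hypothetical `masked_match` below does a single linear byte comparison, and it assumes the match bytes have already been masked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the packet bytes selected by "mask"
 * equal the corresponding "match" bytes, 0 otherwise. All three arrays
 * are "len" bytes long and start at packet offset 0.
 */
static int
masked_match (const uint8_t *pkt, const uint8_t *mask,
              const uint8_t *match, size_t len)
{
  assert (pkt != NULL && mask != NULL && match != NULL);

  for (size_t i = 0; i < len; i++)
    if ((pkt[i] & mask[i]) != match[i])
      return 0;

  return 1;
}
```

For an untagged Ethernet frame carrying IPv4, a "mask l3 ip4 src" style rule corresponds to 0xff in mask bytes 26..29 (14 bytes of Ethernet header plus offset 12 within the IPv4 header) and zero elsewhere, so every other field is ignored.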
### Configure a simple pcap classify filter
```
classify filter pcap mask l3 ip4 src match l3 ip4 src 192.168.1.11
pcap trace rx max 100 filter
```
### Configure a simple per-interface capture filter
```
classify filter GigabitEthernet3/0/0 mask l3 ip4 src match l3 ip4 src 192.168.1.11
pcap trace rx max 100 intfc GigabitEthernet3/0/0
```
Note that per-interface capture filters are _always_ applied.
### Clear per-interface capture filters
```
classify filter GigabitEthernet3/0/0 del
```
### Configure another fairly simple pcap classify filter
```
classify filter pcap mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
pcap trace tx max 100 filter
```
### Configure a vpp packet tracer filter
```
classify filter trace mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
trace add dpdk-input 100 filter
```
### Clear all current classifier filters
```
classify filter [pcap | <interface> | trace] del
```
### To inspect the classifier tables
```
show classify table [verbose]
```
The verbose form displays all of the match rules, with hit-counters.
### Terse description of the "mask <xxx>" syntax:
```
l2 src dst proto tag1 tag2 ignore-tag1 ignore-tag2 cos1 cos2 dot1q dot1ad
l3 ip4 <ip4-mask> ip6 <ip6-mask>
<ip4-mask> version hdr_length src[/width] dst[/width]
           tos length fragment_id ttl protocol checksum
<ip6-mask> version traffic-class flow-label src dst proto
           payload_length hop_limit protocol
l4 tcp <tcp-mask> udp <udp-mask> src_port dst_port
<tcp-mask> src dst  # ports
<udp-mask> src_port dst_port
```
To construct **matches**, add the values to match after the indicated
keywords in the mask syntax. For example: "... mask l3 ip4 src" ->
"... match l3 ip4 src 192.168.1.11"
## VPP Packet Generator
We use the VPP packet generator to inject packets into the forwarding
graph. The packet generator can replay pcap traces, and generate packets
out of whole cloth at respectably high performance.
The VPP pg enables quite a variety of use-cases, ranging from functional
testing of new data-plane nodes to regression testing to performance
tuning.
## PG setup scripts
PG setup scripts describe traffic in detail, and leverage vpp debug
CLI mechanisms. It's reasonably unusual to construct a pg setup script
which doesn't include a certain amount of interface and FIB configuration.
For example:
```
loop create
set int ip address loop0 192.168.1.1/24
set int state loop0 up

packet-generator new {
    name pg0
    limit 100
    rate 1e6
    size 300-300
    interface loop0
    node ethernet-input
    data { IP4: 1.2.3 -> 4.5.6
           UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10
           UDP: 1234 -> 2345
           incrementing 286
    }
}
```
A packet generator stream definition includes two major sections:
- Stream Parameter Setup
- Packet Data
### Stream Parameter Setup
Given the example above, let's look at how to set up stream
parameters:
- **name pg0** - Name of the stream, in this case "pg0"
- **limit 100** - Number of packets to send when the stream is
enabled. "limit 0" means send packets continuously.
- **maxframe \<nnn\>** - Maximum frame size. Handy for injecting
multiple frames no larger than \<nnn\>. Useful for checking dual /
quad loop code.
- **rate 1e6** - Packet injection rate, in this case 1 MPPS. When not
specified, the packet generator injects packets as fast as possible
- **size 300-300** - Packet size range, in this case send 300-byte packets
- **interface loop0** - Packets appear as if they were received on the
specified interface. This datum is used in multiple ways: to select
graph arc feature configuration, to select IP FIBs. Configure
features e.g. on loop0 to exercise those features.
- **tx-interface \<name\>** - Packets will be transmitted on the
indicated interface. Typically required only when injecting packets
into post-IP-rewrite graph nodes.
- **pcap \<filename\>** - Replay packets from the indicated pcap
capture file. "make test" makes extensive use of this feature:
generate packets using scapy, save them in a .pcap file, then inject
them into the vpp graph via a vpp pg "pcap \<filename\>" stream
definition
- **worker \<nn\>** - Generate packets for the stream using the
indicated vpp worker thread. The vpp pg generates and injects O(10
MPPS / core). Use multiple stream definitions and worker threads to
generate and inject enough traffic to easily fill a 40 gbit pipe with
small packets.
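To get a feel for how many worker streams a given link needs, the arithmetic can be sketched in standalone C. The helper names `line_rate_pps` and `workers_needed` are hypothetical. The 20 bytes of per-frame wire overhead (8 bytes of preamble/SFD plus a 12-byte inter-frame gap) is standard Ethernet framing, and the 10 MPPS per-worker figure is only the order-of-magnitude estimate quoted above, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second needed to fill an Ethernet link with fixed-size frames. */
static double
line_rate_pps (double link_bps, uint32_t frame_bytes)
{
  assert (frame_bytes > 0);

  /* Each frame also occupies 8 bytes of preamble/SFD and a 12-byte gap. */
  return link_bps / (8.0 * (frame_bytes + 20));
}

/* Number of workers needed to reach target_pps, rounded up. */
static uint32_t
workers_needed (double target_pps, double pps_per_worker)
{
  assert (pps_per_worker > 0);

  uint32_t n = (uint32_t) (target_pps / pps_per_worker);
  return (n * pps_per_worker < target_pps) ? n + 1 : n;
}
```

Under these assumptions, 64-byte frames on a 40 Gbit link work out to roughly 59.5 MPPS, which `workers_needed` rounds up to 6 worker streams at 10 MPPS each.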
### Data definition
Packet generator data definitions make use of a layered implementation
strategy. Networking layers are specified in order, and the notation can
seem a bit counter-intuitive. In the example above, the data
definition stanza constructs a set of L2-L4 header layers, and
uses an incrementing fill pattern to round out the requested 300-byte
packets.
- **IP4: 1.2.3 -> 4.5.6** - Construct an L2 (MAC) header with the ip4
ethertype (0x800), src MAC address of 00:01:00:02:00:03 and dst MAC
address of 00:04:00:05:00:06. MAC addresses may be specified in either
_xxxx.xxxx.xxxx_ format or _xx:xx:xx:xx:xx:xx_ format.
- **UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10** - Construct an
incrementing set of L3 (IPv4) headers for successive packets with
source addresses ranging from .10 to .254. All packets in the stream
have a constant dest address of 192.168.2.10. Set the protocol field
to 17, UDP.
- **UDP: 1234 -> 2345** - Set the UDP source and destination ports to
1234 and 2345, respectively
- **incrementing 286** - Insert up to 286 incrementing data bytes.
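The mapping from the short _xxxx.xxxx.xxxx_ notation to MAC address bytes, as in "1.2.3" becoming 00:01:00:02:00:03, can be sketched as below. This standalone helper mirrors the mapping described in the list above; `parse_dotted_mac` is a hypothetical name, and the code is not the packet generator's own parser.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse "xxxx.xxxx.xxxx" (three hex groups of up to
 * 16 bits each) into six MAC address bytes. Returns 0 on success.
 */
static int
parse_dotted_mac (const char *s, uint8_t mac[6])
{
  unsigned a, b, c;
  char trailing;

  assert (s != NULL && mac != NULL);

  /* Exactly three groups must convert; a fourth conversion means junk. */
  if (sscanf (s, "%x.%x.%x%c", &a, &b, &c, &trailing) != 3)
    return -1;
  if (a > 0xffff || b > 0xffff || c > 0xffff)
    return -1;

  /* Each group supplies two bytes, most significant byte first. */
  mac[0] = (uint8_t) (a >> 8);
  mac[1] = (uint8_t) (a & 0xff);
  mac[2] = (uint8_t) (b >> 8);
  mac[3] = (uint8_t) (b & 0xff);
  mac[4] = (uint8_t) (c >> 8);
  mac[5] = (uint8_t) (c & 0xff);
  return 0;
}
```

Short groups are zero-extended on the left, which is why "4.5.6" expands to 00:04:00:05:00:06.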
Obvious variations involve "s/IP4/IP6/" in the above, along with
changing from IPv4 to IPv6 address notation.
The vpp pg can set any / all IPv4 header fields, including tos, packet
length, mf / df / fragment id and offset, ttl, protocol, checksum, and
src/dst addresses. Take a look at ../src/vnet/ip/ip[46]_pg.c for
details.
If all else fails, specify the entire packet data in hex:
- **hex 0xabcd...** - copy hex data verbatim into the packet
When replaying pcap files ("**pcap \<filename\>**"), do not specify a
data stanza.
### Diagnosing "packet-generator new" parse failures
If you want to inject packets into a brand-new graph node, remember
to tell the packet generator debug CLI how to parse the packet
data stanza.
If the node expects L2 Ethernet MAC headers, specify ".unformat_buffer
= unformat_ethernet_header":
```
/* *INDENT-OFF* */
VLIB_REGISTER_NODE (ethernet_input_node) =
{
  <snip>
  .unformat_buffer = unformat_ethernet_header,
  <snip>
};
```
Beyond that, it may be necessary to set breakpoints in
.../src/vnet/pg/cli.c. Debug image suggested.
When debugging new nodes, it may be far simpler to directly inject
ethernet frames - and add a corresponding vlib_buffer_advance in the
new node - than to modify the packet generator.
## Debug CLI
The sections above describe the "packet-generator new" debug CLI in
detail.
Additional debug CLI commands include:
```
vpp# packet-generator enable [<stream-name>]
```
Enables the named stream, or all streams.
```
vpp# packet-generator disable [<stream-name>]
```
Disables the named stream, or all streams.
```
vpp# packet-generator delete <stream-name>
```
Deletes the named stream.
```
vpp# packet-generator configure <stream-name> [limit <nnn>]
[rate <f64-pps>] [size <nn>-<nn>]
```
Changes stream parameters without having to recreate the entire stream
definition. Note that re-issuing a "packet-generator new" command will
correctly recreate the named stream.