docs/developer/corearchitecture/multi_thread.rst


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169

.. _vpp_multi_thread:

Multi-threading in VPP
======================

Modes
-----

VPP can work in 2 different modes:

-  single-thread
-  multi-thread with worker threads

Single-thread
~~~~~~~~~~~~~

In a single-thread mode there is one main thread which handles both
packet processing and other management functions (Command-Line Interface
(CLI), API, stats). This is the default setup. There is no special
startup config needed.

Multi-thread with Worker Threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this mode, the main threads handles management functions(debug CLI,
API, stats collection) and one or more worker threads handle packet
processing from input to output of the packet.

Each worker thread polls input queues on subset of interfaces.

With RSS (Receive Side Scaling) enabled multiple threads can service one
physical interface (RSS function on NIC distributes traffic between
different queues which are serviced by different worker threads).

Thread placement
----------------

Thread placement is defined in the startup config under the cpu { … }
section.

The VPP platform can place threads automatically or manually. Automatic
placement works in the following way:

-  if “skip-cores X” is defined first X cores will not be used
-  if “main-core X” is defined, VPP main thread will be placed on core
   X, otherwise 1st available one will be used
-  if “workers N” is defined vpp will allocate first N available cores
   and it will run threads on them
-  if “corelist-workers A,B1-Bn,C1-Cn” is defined vpp will automatically
   assign those CPU cores to worker threads

User can see active placement of cores by using the VPP debug CLI
command show threads:

.. code-block:: console

   vpd# show threads
   ID     Name                Type        LWP     lcore  Core   Socket State
   0      vpe_main                        59723   2      2      0      wait
   1      vpe_wk_0            workers     59755   4      4      0      running
   2      vpe_wk_1            workers     59756   5      5      0      running
   3      vpe_wk_2            workers     59757   6      0      1      running
   4      vpe_wk_3            workers     59758   7      1      1      running
   5                          stats       59775
   vpd#

The sample output above shows the main thread running on core 2 (2nd
core on the CPU socket 0), worker threads running on cores 4-7.

Sample Configurations
---------------------

By default, at start-up VPP uses
configuration values from: ``/etc/vpp/startup.conf``

The following sections describe some of the additional changes that can be made to this file.
This file is initially populated from the files located in the following directory ``/vpp/vpp/conf/``

Manual Placement
~~~~~~~~~~~~~~~~

Manual placement places the main thread on core 1, workers on cores
4,5,20,21.

.. code-block:: console

   cpu {
     main-core 1
     corelist-workers  4-5,20-21
   }

Auto placement
--------------

Auto placement is likely to place the main thread on core 1 and workers
on cores 2,3,4.

.. code-block:: console

   cpu {
     skip-cores 1
     workers 3
   }

Buffer Memory Allocation
~~~~~~~~~~~~~~~~~~~~~~~~

The VPP platform is NUMA aware. It can allocate memory for buffers on
different CPU sockets (NUMA nodes). The amount of memory allocated can
be defined in the startup config for each CPU socket by using the
socket-mem A[[,B],C] statement inside the dpdk { … } section.

For example:

.. code-block:: console

   dpdk {
     socket-mem 1024,1024
   }

The above configuration allocates 1GB of memory on NUMA#0 and 1GB on
NUMA#1. Each worker thread uses buffers which are local to itself.

Buffer memory is allocated from hugepages. VPP prefers 1G pages if they
are available. If not 2MB pages will be used.

VPP takes care of mounting/unmounting hugepages file-system
automatically so there is no need to do that manually.

’‘’NOTE’’’: If you are running latest VPP release, there is no need for
specifying socket-mem manually. VPP will discover all NUMA nodes and it
will allocate 512M on each by default. socket-mem is only needed if
bigger number of mbufs is required (default is 16384 per socket and can
be changed with num-mbufs startup config command).

Interface Placement in Multi-thread Setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On startup, the VPP platform assigns interfaces (or interface, queue
pairs if RSS is used) to different worker threads in round robin
fashion.

The following example shows debug CLI commands to show and change
interface placement:

.. code-block:: console

   vpd# sh dpdk interface placement
   Thread 1 (vpp_wk_0 at lcore 5):
    TenGigabitEthernet2/0/0 queue 0
    TenGigabitEthernet2/0/1 queue 0
   Thread 2 (vpp_wk_1 at lcore 6):
    TenGigabitEthernet2/0/0 queue 1
    TenGigabitEthernet2/0/1 queue 1

The following shows an example of moving TenGigabitEthernet2/0/1 queue 1
processing to 1st worker thread:

.. code-block:: console

   vpd# set interface placement TenGigabitEthernet2/0/1 queue 1 thread 1

   vpp# sh dpdk interface placement
   Thread 1 (vpp_wk_0 at lcore 5):
    TenGigabitEthernet2/0/0 queue 0
    TenGigabitEthernet2/0/1 queue 0
    TenGigabitEthernet2/0/1 queue 1
   Thread 2 (vpp_wk_1 at lcore 6):
    TenGigabitEthernet2/0/0 queue 1