Open vSwitch integration with P-virtualization
At the time of writing, the current RP release was 7.0.13-31. The kernel in this release is 3.10.0-1062.12.1.rv184.108.40.206, which corresponds to Red Hat 7.7, and the OVS version shipped in the RP repository was 2.0.0. The list of OVS 2.0.0 features can be found here. The keys for RP can be found here.
The goal was to try configuring VXLAN and a virtual switch as an alternative to the native kvm-qemu and libvirt bridges. With plain open-source kvm-qemu everything works fine with OVS, but I wanted to try it with RP, which is not just plain kvm-qemu + libvirt but also a lot of patches on top of that stack, plus vstorage.
For this we need hardware, of course. The minimum requirements for RP are three servers with three disks each (two disks are enough if they are all SSDs; in my case one disk was an SSD and the rest were HDDs). The network needs at least two 10-gigabit ports; in my case there were two 20-gigabit ports plus an additional 5-gigabit port for bringing tagged traffic for OVS into the cluster while bypassing the classic bridges, in other words for using the classic P-virtualization bridges and OVS in parallel. In my case I was lucky enough to get a Synergy with 3 blades, enclosures with JBOD disks, and an integrated switch.
Quick Installation Guide:
- Install the first node. In anaconda, assign the host IP and, from the same subnet, a virtual IP for P-virtualization management and an IP for P-storage management. There should be at least 3 disks: one for the host (OS/hypervisor), in my case an HDD, while the second and third disks are configured later through P-storage management, with one SSD for the cache and the other disk for the chunk server. On the second interface, assign an IP from a different subnet for the storage network, without a gateway.
- After installation, open the P-storage management IP address in the Chrome browser, select the servers section, click the server icon (the box with a rectangle), and open the network section. There, assign roles to the interface with the management subnet address (ssh, management, web cp) and to the interface with the storage subnet (ssh, storage).
- Next, click create cluster and enter its name; the interface with the storage role should be picked up. Open the detailed settings and make sure the system has assigned the disk roles correctly: one disk must be system, another metadata service plus cache if it is an SSD, and the third (HDD) storage (chunk server).
- While the first node is installing, you can install the next two. Assign each of them two IP addresses: one from the P-virtualization management subnet and, on the second interface, one from the storage subnet. If you leave the registration fields for the P-virtualization management server and P-storage management empty, you can continue the installation without specifying the management IP addresses and register later.
- After the additional nodes are installed, connect to each of them over ssh and register it from the CLI, where the IP is the P-storage management address and the token can be copied from the P-storage web management when you click add node.
- After both nodes appear in the P-storage web management, perform the same actions as in steps 2 and 3, but instead of creating a cluster, select attach.
- After the cluster is created, the services item appears in it. Create a datastore. Depending on the number of disks and nodes, you can use a replica of 2, 3, and so on; with 3 nodes use replica 2, and leave everything else at the defaults.
- Next, open the P-virtualization management IP address, click add physical server if the server has not been added yet, and add the rest. Then install the trial license. After that, in the host settings you can use “Change host settings for virtual environments” to select, instead of the default local folder, the datastore created in P-storage for all items (a good choice for the first time), and tick the option to apply it to all hosts.
- After this, migrate vstorage-ui and va-nm to any other host. This may take some time because it is a migration from local media to clustered storage.
- Then open an ssh console on all three nodes and run the HA enable command on each node, specifying the storage network, and check the result with the command #shaman stat.
- After that, you can start creating VMs; I installed CentOS 7 as a guest.
The node registration command for step 5 above:
#/usr/libexec/vstorage-ui-agent/bin/register-storage-node.sh -m 10.43.10.14 -t ec234873
The HA enable command for step 10:
#hastart -c <cluster name> -n 192.168.10.0/24
and HA check, where the output should be something like this:
[root@n3 ~]# shaman stat
Cluster 'rptest'
Nodes: 3
Resources: 7
    NODE_IP           STATUS     ROLES            RESOURCES
    192.168.10.10     Active     VM:QEMU,CT:VZ7   0 CT, 0 VM
    192.168.10.11     Active     VM:QEMU,CT:VZ7   0 CT, 0 VM
*M  192.168.10.12     Active     VM:QEMU,CT:VZ7   2 CT, 0 VM
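This check can also be scripted. The sketch below is a hypothetical helper, not part of shaman: it assumes that `shaman stat` keeps the column layout shown above and simply counts the nodes whose status is Active. The function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count nodes reported as Active in `shaman stat` output.
# Assumes the layout shown above (NODE_IP STATUS ROLES RESOURCES). The master
# node line carries an extra leading "*M" marker, so every field is scanned.
count_active_nodes() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "Active") { n++; break } }
         END { print n + 0 }'
}

# Sample input copied from the output above; on a real node use:
#   shaman stat | count_active_nodes
sample_output() {
    cat <<'EOF'
Cluster 'rptest'
Nodes: 3
Resources: 7
    NODE_IP           STATUS     ROLES            RESOURCES
    192.168.10.10     Active     VM:QEMU,CT:VZ7   0 CT, 0 VM
    192.168.10.11     Active     VM:QEMU,CT:VZ7   0 CT, 0 VM
*M  192.168.10.12     Active     VM:QEMU,CT:VZ7   2 CT, 0 VM
EOF
}

sample_output | count_active_nodes
```

Comparing the printed number with the expected node count (3 here) gives a quick pass/fail check after enabling HA.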
Install and configure OVS
OVS was installed on the first and third nodes of the cluster with the following command:
#yum install openvswitch
After installation, you can check it with the command:
#ovs-vsctl show
The output will be something like this:
[root@node1 ~]# ovs-vsctl show
180c5636-2d3d-4e08-9c95-fe5e47f1e5fa
    ovs_version: "2.0.0"
[root@node1 ~]#
Next, create a virtual switch bridge to which we will attach ports, interfaces, or other bridges.
# ovs-vsctl add-br ovsbr0
We’ll name the bridge so that it is clear that this is an instance of one virtual switch.
Next, we can create a tagged bridge to add to a specific VM.
#ovs-vsctl add-br brlv140 ovsbr0 140
The tag does not have to correspond to any real tag on a physical port; it exists only within the virtual switch.
Next, we attach it to the VM through a virtual network, for which we first create an xml file:
<network>
  <name>ovsvl</name>
  <forward mode='bridge'/>
  <bridge name='brlv140'/>
  <vlan>
    <tag id='140'/>
  </vlan>
  <virtualport type='openvswitch'/>
</network>
Unfortunately, the P-management web UI does not yet support OVS settings, but they can be made through the CLI. To create virtual network adapters and add them to the VM I used the web UI, and then changed the binding of these adapters from Bridged to ovsvl and ovsvl2 through the CLI. The main thing to remember is that from then on, changes to the VM's network hardware settings must be made through the CLI; otherwise the web UI, which knows nothing about OVS, will revert them to Bridged.
To view existing networks, use the command:
#virsh net-list --all
To add our network:
#virsh net-define ovsvl.xml
Next, start (activate) it:
#virsh net-start ovsvl
And enable autostart:
#virsh net-autostart ovsvl
Next, add this network to the VM:
#virsh edit <VM name>
Find the relevant interface sections and edit them, or add similar ones of your own, changing the MAC address and the port number (slot):
<interface type='bridge'>
  <mac address='00:1c:42:c6:80:06'/>
  <vlan>
    <tag id='140'/>
  </vlan>
  <virtualport type='openvswitch'>
    <parameters interfaceid='5a70be5b-5576-4734-9f61-61cdfc4a884a'/>
  </virtualport>
  <target dev='vme001c42c68006'/>
  <model type='virtio'/>
  <boot order='2'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
Editing is done with vi editor commands.
After editing, stop and start the VM to apply the new settings:
#prlctl stop <VM name>
#prlctl start <VM name>
To check, you can run:
#virsh dumpxml <VM name> | grep <network name>
After that, you can configure the network from inside the guest, or add network ports to the switch to communicate with another cluster over a VXLAN overlay:
#ovs-vsctl add-port ovsbr0 vxlan0 -- set Interface vxlan0 type=vxlan options:remote_ip=10.43.11.12
Here the IP address is the address of the node on which the same settings are made as shown above. The path between these addresses can go through routers or through a VPN, and the addresses can be in different subnets; the main thing is that a route is configured between them. More importantly, the physical interface that carries this address should be configured with an MTU larger than 1500 to pass large packets, because vxlan adds its own headers. I did not bother counting the bytes and simply assigned 2000.
#ip link set mtu 2000 dev ens3f0
The bridge that depends on this interface must also have an MTU of 2000, but it may not inherit it immediately and may need to be restarted.
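For reference, the overhead can be worked out rather than guessed. With an IPv4 underlay, each encapsulated packet carries the inner Ethernet header (14 bytes), the VXLAN header (8), a UDP header (8) and the outer IPv4 header (20), so 50 bytes on top of the guest MTU, plus 4 more if the inner frame keeps an 802.1Q tag. The sketch below does this arithmetic; the function name is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: minimum underlay MTU needed to carry a VXLAN-encapsulated frame.
# Usage: vxlan_underlay_mtu INNER_MTU [INNER_VLAN_TAGS] [ipv4|ipv6]
vxlan_underlay_mtu() {
    local inner_mtu=$1 vlan_tags=${2:-0} family=${3:-ipv4}
    local outer_ip=20                       # outer IPv4 header without options
    [ "$family" = "ipv6" ] && outer_ip=40   # outer IPv6 fixed header
    local udp=8 vxlan=8 inner_eth=14
    echo $(( inner_mtu + inner_eth + 4 * vlan_tags + vxlan + udp + outer_ip ))
}

vxlan_underlay_mtu 1500        # guest MTU 1500, untagged inner frame
vxlan_underlay_mtu 1500 1      # inner frame carries one 802.1Q tag
```

For a guest MTU of 1500 this gives 1550, or 1554 with a tagged inner frame, so the value of 2000 used here leaves plenty of headroom.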
On the second cluster, on the node with the address 10.43.11.12, apply the same settings as described above, except that the vxlan port points at the address of the first node, in my case:
#ovs-vsctl add-port ovsbr0 vxlan0 -- set Interface vxlan0 type=vxlan options:remote_ip=10.43.11.10
Then configure the MTU there as well.
If everything is set up correctly, pings will go through and ssh connections will work, provided that, for example, you first assign addresses from the same subnet inside the VMs. If you capture the traffic:
#tcpdump -i ens3f0 | grep 4789
you can see the vxlan packets, or the packets with vlan tags:
#tcpdump -ee -vvv -i ens3f0 | grep vlan
Next, you can set up a more convenient way of configuring the network without bridges, using the portgroup functionality of the virtual switch.
To do this, create a network xml with the following content:
<network>
  <name>ovsvl2</name>
  <forward mode='bridge'/>
  <bridge name='ovsbr0'/>
  <virtualport type='openvswitch'/>
  <portgroup name='vlan-120'>
    <vlan>
      <tag id='120'/>
    </vlan>
  </portgroup>
</network>
You can create a new network or edit the previous one with these parameters, but in my case I add another network.
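Since a portgroup network can hold several VLANs, the xml can be generated instead of written by hand. The following is a minimal sketch, assuming one portgroup per tag named `vlan-<tag>` as in the definition above; the function name is hypothetical, and the second tag (130) is only an example.

```bash
#!/usr/bin/env bash
# Sketch: print a libvirt network definition for an OVS bridge with one
# portgroup per VLAN tag. Usage: ovs_portgroup_network_xml NAME BRIDGE TAG...
ovs_portgroup_network_xml() {
    local name=$1 bridge=$2 tag
    shift 2
    printf "<network>\n"
    printf "  <name>%s</name>\n" "$name"
    printf "  <forward mode='bridge'/>\n"
    printf "  <bridge name='%s'/>\n" "$bridge"
    printf "  <virtualport type='openvswitch'/>\n"
    for tag in "$@"; do
        printf "  <portgroup name='vlan-%s'>\n" "$tag"
        printf "    <vlan>\n      <tag id='%s'/>\n    </vlan>\n" "$tag"
        printf "  </portgroup>\n"
    done
    printf "</network>\n"
}

# Prints to stdout; redirect to a file to use it with virsh net-define.
ovs_portgroup_network_xml ovsvl2 ovsbr0 120 130
```

According to the libvirt network documentation, a VM interface of type network can then select a group with a `<source network='ovsvl2' portgroup='vlan-120'/>` element; this article instead sets the vlan tag directly in the interface.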
Attach it to the VM as described above, but with the following interface definition:
<interface type='bridge'>
  <mac address='00:1c:42:c8:f1:cd'/>
  <vlan>
    <tag id='120'/>
  </vlan>
  <virtualport type='openvswitch'>
    <parameters interfaceid='ef717aa4-23b3-4fbe-82bb-193033f933b1'/>
  </virtualport>
  <target dev='vme001c42c8f1cd'/>
  <model type='virtio'/>
  <boot order='3'/>
  <alias name='net1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
Next, save and restart the VM as described above. On one of the virtual switches you can then add a trunk port carrying specific tags from a physical port, that is, a server port that is already connected to a trunk with tagged traffic on the physical switch.
In my case:
#ovs-vsctl add-port ovsbr0 ens3f4
#ovs-vsctl set port ens3f4 trunks=120,130
And you can add a port with tag 120 for the VM:
#ovs-vsctl add-port ovsbr0 vlan120 tag=120 -- set interface vlan120 type=internal
On the other virtual switch, on a node of another cluster where there is no trunk from the physical switch, add this port in the same way:
#ovs-vsctl add-port ovsbr0 vlan120 tag=120 -- set interface vlan120 type=internal
Also add the network as described above.
Example output of OVS and VM settings
The output of the #ovs-vsctl show command on the first node. Ports and interfaces whose names start with vme are the VM's virtual interfaces; they are added to the configuration automatically when a VM connected to OVS starts.
The output of the #ovs-vsctl show command on the third node
The output of the virsh net-list command contains 4 networks: Bridged and Host-Only are the standard P-virtualization defaults, and ovsvl and ovsvl2 are the ones we added following the instructions above. ovsvl is the variant with the tagged bridge (tag 140) on top of the OVS bridge, and ovsvl2 is the variant with a portgroup consisting of one port with tag 120. Portgroup is very convenient: you can add more than one port to it and use a single network instead of the many networks needed in the classic P-virtualization setup with bridges on top of different VLAN interfaces. Next is the output of the #virsh net-dumpxml command for ovsvl and ovsvl2, showing the settings of these networks.
Here is a piece of the VM config with the network interfaces, obtained with the command:
#virsh dumpxml <VM name>
When testing OVS, I also checked compatibility with a running NetworkManager (NM). The scenarios above work fine, but when the service starts automatically, NM may report messages about virtual switch interfaces that it cannot manage. NM can be turned off, but the messages do not affect network functionality.
There are also add-ons for managing OVS through NM, but in that case OVS functionality is limited. So the two can run in parallel, or NM can be shut down if necessary.
Live migration of VMs with OVS networks was also verified successfully. If every node has a network instance with OVS, there are no migration problems, just as in the standard configuration of a P-virtualization cluster with additional networks for VMs.
In the figure above, a ping was running from inside the VM to a VM on the external network while the VM was live-migrated from one node with OVS installed to another node with OVS installed; the delay during the migration is marked in red. The VM parameters were: 4 vCPU, 8GB RAM, 64GB disk.
Exactly the same delay occurs in the classic setup with bridges, so nothing has changed for the VM, while the network stack now works through OVS.
In addition, ssh connections were made successfully between VMs located on different nodes across the vxlan tunnel, and with a VM behind the physical switch. To verify this, I turned the tunnel off or analyzed packets with tcpdump as described above. If you do not set the MTU as described above, only ping will pass, while ssh will not work.
Description of the scenarios
The standard classic option with bridges for a cluster of three P-virtualization nodes plus P-storage, without OVS, is shown below.
The P-storage network is omitted from the diagram. It usually runs over a separate interface intended only for the block level and does not take part in this test. It can be configured without bridges and, as an option, can also go through OVS, for example by configuring OVS link aggregation for P-storage.
The following is a diagram using OVS together with bridges.
There can be one network with OVS and one with a bridge at the same time. With OVS, you can add ports with different tags to a portgroup and hand them out to different VMs.
If we return to the test scenario, it is shown in the following picture:
In my case everything was within the same cluster, but thanks to vxlan tunnels it can also span different clusters. Let's try to imagine that these are nodes of two different clusters.
The tunnel was brought up on a dedicated port on one of the servers in each cluster. A specific VLAN, vlan120, is forwarded through the tunnel; it carries a number of VMs from the whole cluster, chosen to fit the bandwidth of the channel, and OVS can define QoS for the traffic of each VM. Local VMs of this node are reachable through the local OVS, and VMs on other nodes are reachable through the physical switch of each cluster.
OVS fault tolerance is provided by adding commands to the HA (shaman) service script that move the vxlan tunnel to another node with OVS; that node is selected by the default algorithm (drs, round-robin) of the shaman service from P-storage.
Fault tolerance and load balancing across server ports can be achieved with bonding in LACP (802.3ad) mode with layer2+3 or layer3+4 hashing, which can also be configured using OVS.
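As a rough illustration of the OVS side of such a bond, the sketch below prints the commands rather than running them. It is not the configuration used in this test: the bond name and the interface names eth2 and eth3 are placeholders, and OVS uses its own bond mode names (for example balance-tcp, which hashes on L2 to L4 header fields and requires LACP) rather than the layer2+3 and layer3+4 policy names of Linux bonding. Check the options against the installed OVS version.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) the ovs-vsctl commands for an LACP bond on an OVS
# bridge. Pipe the output to `sh` on a host with OVS to apply it.
# Usage: ovs_lacp_bond_cmds BRIDGE BOND IFACE IFACE...
ovs_lacp_bond_cmds() {
    local bridge=$1 bond=$2
    shift 2
    if [ "$#" -lt 2 ]; then
        echo "need at least two interfaces for a bond" >&2
        return 1
    fi
    echo "ovs-vsctl add-bond $bridge $bond $*"
    # balance-tcp hashes on L2-L4 header fields and requires LACP
    echo "ovs-vsctl set port $bond lacp=active bond_mode=balance-tcp"
}

# eth2 and eth3 are placeholder interface names
ovs_lacp_bond_cmds ovsbr0 bond0 eth2 eth3
```

Printing the commands first makes it easy to review them before touching a production uplink; the physical switch ports must be configured for LACP as well.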
I won’t describe how the classic br0 works. ovsbr0 works with the OS IP stack, which in this picture is defined on br0; that is, the virtual switch instance ovsbr0 works through br0 in this case. In other words, the node's static IP address is assigned to the classic br0, and all traffic from the virtual switch that is directed to this subnet goes through br0, just as it does for every application on the node. In terms of configuration, nothing was assigned to br0 from the virtual switch side via the CLI, apart from configuring the vxlan interface with its option. Likewise, if the node has a second classic bridge br1 with a different IP address and subnet on another physical port, for example eth2 or eth3, the virtual switch can send packets to that subnet through the OS stack and its MAC table: attach any virtual switch port for this subnet to the VM, and assign the subnet address directly inside the VM or in its settings.
Thanks to this principle, the virtual switch works like a regular program on the host, using the host's network stack without interfering with the classic bridges, provided of course that you do not configure the same settings in both tools (bridges and OVS).
The virtual switch has its own MAC table
Each virtual switch I created has a certain set of interfaces (by allowing a VLAN on an interface, we add a port to a specific virtual switch), as well as a table mapping MAC addresses to ports (it can be viewed with ovs-appctl fdb/show ovsbr0). MAC addresses were not assigned manually inside the switch. Portgroup is a group of ports, which at the moment contains the vlan120 port to which the VM is connected.
In theory, when a frame with a VLAN tag arrives at a port of the switch, the forwarding decision is made based on the MAC address table of the corresponding virtual switch. If a frame with tag 120 is received, the decision on where to forward it is made based on the entries of that MAC table for tag 120.
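To look at only the entries that matter for one VLAN, the output of `ovs-appctl fdb/show ovsbr0` can be filtered. The sketch below assumes the usual four-column layout of that output (port, VLAN, MAC, Age); the helper name is hypothetical, and the sample table, including its port numbers and one of the MAC addresses, is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: list "port MAC" pairs learned for one VLAN from the output of
# `ovs-appctl fdb/show BRIDGE`, assumed to have columns: port VLAN MAC Age.
fdb_for_vlan() {
    local vlan=$1
    awk -v vlan="$vlan" 'NR > 1 && $2 == vlan { print $1, $3 }'
}

# Made-up sample table; on a real node use:
#   ovs-appctl fdb/show ovsbr0 | fdb_for_vlan 120
sample_fdb() {
    cat <<'EOF'
 port  VLAN  MAC                Age
    3   120  00:1c:42:c8:f1:cd    2
    1   120  00:1c:42:aa:bb:01   15
    2   140  00:1c:42:c6:80:06    7
EOF
}

sample_fdb | fdb_for_vlan 120
```

Each printed line is a port number followed by a MAC address learned on that port for the given tag, which is the information the switch uses when forwarding a frame in that VLAN.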
As for VXLAN, in this case it is static (unicast). The simplest option is to specify the remote vxlan interfaces statically. It comes down to statically listing, in the VNI configuration (vlan vxlan), the addresses of all remote vxlan interfaces that terminate clients in that VNI. In this scenario, vxlan puts the manually specified addresses into the IP header as destination addresses. Naturally, if there are more than two vxlan endpoints, a flooded packet has at least two destinations. Several recipients cannot be specified in one IP header, so the simplest solution is to replicate the VxLAN packet on the outgoing vxlan interface and send the copies as unicast to the remote vxlan interfaces listed in the configuration. On receiving such a packet, the remote vxlan interface decapsulates it, determines which VNI it belongs to, and forwards it to all ports in that VNI. In addition, since switches learn MAC addresses from the source MAC field, after decapsulating the VxLAN packet the vxlan interface associates the source MAC address from the original ethernet header with the tunnel to the vxlan interface from which the packet was received. As mentioned earlier, the switch treats a VxLAN tunnel as a simple trunk port.
The disadvantages of this approach are obvious: the load on the network increases, since BUM traffic is replicated on the outgoing vxlan interface and sent as unicast to every neighbor specified in the configuration, and when a vxlan interface is added or removed you have to edit the configuration on all the other vxlan interfaces and remove or add the neighbor manually, or automatically by means of shaman scripts in case of a node failure. In the age of automation it is somewhat strange to specify neighbors statically. Still, this approach has a right to exist; for example, OVS can only work with statically specified neighbors, at least for the moment.
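The per-neighbor bookkeeping can be sketched as a small generator that prints one tunnel port command per remote address, in the same form as the add-port command used earlier. The function name and the vxlanN port naming are assumptions for illustration, and 10.43.11.13 is a made-up third node.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) one ovs-vsctl command per remote peer, giving each
# tunnel port its own name (vxlan0, vxlan1, ...).
# Usage: vxlan_peer_cmds BRIDGE REMOTE_IP...
vxlan_peer_cmds() {
    local bridge=$1 i=0 ip
    shift
    for ip in "$@"; do
        echo "ovs-vsctl add-port $bridge vxlan$i -- set interface vxlan$i type=vxlan options:remote_ip=$ip"
        i=$(( i + 1 ))
    done
}

# 10.43.11.12 is the peer used in this article; 10.43.11.13 is a made-up
# third node.
vxlan_peer_cmds ovsbr0 10.43.11.12 10.43.11.13
```

Keep in mind that with more than two nodes, a full mesh of such tunnels between plain learning bridges can form a layer 2 loop, so the tunnel topology needs some care.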
For this mode to work, only the presence of connectivity between the loopbacks of all vxlan interfaces is necessary.
Static (Unicast) VxLAN is as simple as felt boots and as trouble-free as a Kalashnikov assault rifle. There's nothing to break here.
MAC learning and flood & learn in OVS are described in more detail here.
When configuring OVS, I liked its simplicity and convenience: you immediately feel that you are working with a switch and not just with bridges :) In general, it makes sense to use it at least in parallel with the classic bridges in RP.
Before the integration described above, along with the links used in it, I studied the following articles:
- https://blog.remibergsma.com/2015/03/26/connecting-two-open-vswitches-to-create-a-l2-connection/