Testing Docker networking with GNS3, Part 1: MacVLAN


Introduction

MacVLAN lets you connect containers in separate Docker networks to your existing VLAN infrastructure, so they behave as if they were directly attached to your network.

From the main interface, the MacVLAN driver creates subinterfaces that handle the 802.1q tags of each VLAN, and assigns them separate IP and MAC addresses.

Because the main interface (with its own MAC address) has to accept traffic destined for the subinterfaces (with their own MAC addresses), the Docker MacVLAN network driver requires the Docker host interface to be in promiscuous mode.

Since most cloud providers (AWS, Azure…) do not allow promiscuous mode, you will most likely be deploying MacVLAN on your own premises.

MacVLAN network characteristics:

Creates subinterfaces to process VLAN tags
Assigns a different IP and MAC address to each subinterface
Requires the main Docker host interface to operate in promiscuous mode, so that it accepts traffic not destined to its own MAC address
Requires the admin to carefully assign IP ranges to the VLANs on the different Docker nodes, consistently with any DHCP range already used by the existing VLANs

Conceptual diagram: Docker node connected to the topology

Conceptual diagram: MACVLAN configured

Conceptual diagram: Logical functioning of MACVLAN

Purpose of this lab:

  • To test and get hands-on practice with Docker MacVLAN.
  • It is easy to deploy complex topologies in GNS3 using a myriad of virtual appliances (https://gns3.com/marketplace/appliances). Building a topology is as easy as dragging devices and drawing the connections between them.

Note:

Some basic practical knowledge of Docker containers is recommended.

1- GNS3 topology:

Devices used:

  • Two VMware virtual machines used as Docker nodes, imported into GNS3.
  • Two Open vSwitch containers (gns3/openvswitch).
  • Ansible container (ajnouri/ansible), used as an SSH client to manage the Docker nodes. In another post I’ll show how to use it to provision package installation on any device (e.g. Docker installation on the VMware nodes).
  • Cisco IOSv 15.6(2)T: router-on-a-stick used to route traffic from each VLAN to the outside world (PAT) and to enforce the communication policy between VLANs.

Note:

Actually, importing a container into GNS3 is very easy and intuitive; here is a video from David Bombal explaining the process.

  • Create two VMware Ubuntu Xenial LTS servers to be used as Docker nodes, each with 1 GB of RAM and 2 interfaces.

  • Install Docker 1.12 or later (latest version recommended).

Here is the script if you want to automate the deployment of Docker, for example from an Ansible container, as shown in this GNS3 container series.

#!/bin/bash
### Install GNS3
sudo add-apt-repository ppa:gns3/ppa
sudo apt-get update
sudo apt-get install -y gns3-gui
sudo apt-get install -y gns3-server

### Install Docker
# https://docs.docker.com/engine/installation/linux/ubuntu/
# Add the official Docker repository GPG signature
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# Add apt repository sources
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update

# Install Docker CE
sudo apt-get install -y docker-ce

# Docker without sudo
sudo groupadd docker
sudo gpasswd -a $USER docker

$ docker version
Client:
Version:      17.03.1-ce
API version:  1.27
Go version:   go1.7.5
Git commit:   c6d412e
Built:        Mon Mar 27 17:14:09 2017
OS/Arch:      linux/amd64
Server:
Version:      17.03.1-ce
API version:  1.27 (minimum version 1.12)
Go version:   go1.7.5
Git commit:   c6d412e
Built:        Mon Mar 27 17:14:09 2017
OS/Arch:      linux/amd64
Experimental: false

Docker node interfaces:

Main interface: e0/0 (Ubuntu: ens33), a trunk interface used to connect containers to your network VLANs

Management interface: e1/0 (ubuntu: ens38) connected to the common VLAN114

Interface configuration (/etc/network/interfaces):
# The primary network interface
auto ens33
iface ens33 inet manual
auto ens38
iface ens38 inet static
address 192.168.114.32
netmask 24
gateway 192.168.114.200
up echo "nameserver 8.8.8.8" > /etc/resolv.conf

# autoconfigured IPv6 interfaces
iface ens33 inet6 auto
iface ens38 inet6 auto

Promiscuous mode:

Without promiscuous mode, containers cannot communicate with hosts outside the Docker node, because the main interface (connected to the VLAN network) does not accept traffic addressed to other MAC addresses (those of the MacVLAN endpoints).

Promiscuous mode is configured in two steps:

Configuring Promiscuous mode on VMWare guest:

Add the below command to /etc/rc.local

ifconfig ens33 up promisc

Check for the letter “P” (promiscuous) in the Flg column:

netstat -i
Kernel Interface table
Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
docker0    1500 0         0      0      0 0             0      0      0      0 BMU
ens33      1500 0        25      0      0 0            32      0      0      0 BMPRU
ens38      1500 0        55      0      0 0            60      0      0      0 BMRU
ens33.10   1500 0         0      0      0 0             8      0      0      0 BMRU
ens33.20   1500 0         0      0      0 0             8      0      0      0 BMRU
ens33.30   1500 0         0      0      0 0             8      0      0      0 BMRU
lo        65536 0       160      0      0 0           160      0      0      0 LRU
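
As a complement to netstat, the promiscuous state can also be read from sysfs. The following is a minimal sketch, assuming a Linux host that exposes /sys/class/net/<interface>/flags and that promiscuous mode was set administratively (as with the ifconfig command above). The function names flags_have_promisc and check_promisc are illustrative only and do not belong to any existing tool.

```shell
# Return 0 when the IFF_PROMISC bit (0x100) is set in an interface flags word.
flags_have_promisc() {
  [ $(( $1 & 0x100 )) -ne 0 ]
}

# Report the promiscuous state of an interface by reading its sysfs flags file.
check_promisc() {
  cp_file="/sys/class/net/$1/flags"
  if [ ! -r "$cp_file" ]; then
    echo "$1: no such interface"
    return 2
  fi
  if flags_have_promisc "$(cat "$cp_file")"; then
    echo "$1: promiscuous"
  else
    echo "$1: not promiscuous"
    return 1
  fi
}

# Example (interface name taken from this lab):
#   check_promisc ens33
```

Reading the flags word avoids parsing the column layout of netstat, which can vary between distributions.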

Authorizations for promiscuous mode on the VMware host:

By default, VMware interfaces cannot operate in promiscuous mode, because a regular user does not have write access to the /dev/vmnet* files.

So, create a dedicated group, add the user running VMware to it, and give the group read/write access to the /dev/vmnet* files:

sudo groupadd promiscuous
sudo usermod -a -G promiscuous $USER
sudo chgrp promiscuous /dev/vmnet*
sudo chmod g+rw /dev/vmnet*

Or simply give read/write access to everyone:

sudo chmod a+rw /dev/vmnet*

To make the change permanent, put it in the /etc/init.d/vmware file as follows:

vmwareStartVmnet() {
 vmwareLoadModule $vnet
 "$BINDIR"/vmware-networks --start >> $VNETLIB_LOG 2>&1
 chmod a+rw /dev/vmnet*
}

GNS3 VLAN topology:

For each Docker node, connect the first interface to a trunk port of OpenVswitch1 and the second interface to a port in VLAN 114.
VLAN 114 is a common VLAN used to reach and manage all other devices.

GNS3 integrates Docker, so you can use containers as simple end-host devices (independently of the Docker network drivers):

  • gns3/openvswitch container: simple L2 switch
  • gns3/webterm container: GUI Firefox browser (no need for an entire VM for that)
  • ajnouri/ansible container: the management end host used to access the Docker nodes through SSH. In a subsequent lab, I’ll show how to manage GNS3 devices from this Ansible container.

A Docker MacVLAN network connects your containers to existing network VLANs seamlessly, as if they were directly connected to your VLAN infrastructure.

The network deploys three isolated VLANs (IDs 10, 20 and 30) plus VLAN 114, which can communicate with all three through a router-on-a-stick (Cisco IOSv 15.6T).

MacVLAN generates subinterfaces (named parent.vlan-id, for example ens33.10) to process (tag/untag) traffic.

The parent (main) interface acts as a trunk interface carrying all VLANs from the “children” interfaces, so the switch port it is linked to should be a trunk port.
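
To make the subinterface idea concrete, the sketch below prints the iproute2 commands that would build a comparable setup by hand: an 802.1q subinterface on the parent, then a macvlan interface in bridge mode on top of it. This only illustrates the concept and is not a description of what the Docker driver executes internally. The function name and the macvlan interface name mv10 are hypothetical, and the commands are printed rather than run.

```shell
# Print the manual equivalent of one VLAN subinterface plus one macvlan child.
print_vlan_macvlan_cmds() {
  # $1=parent interface  $2=VLAN id  $3=macvlan interface name (hypothetical)
  echo "ip link add link $1 name $1.$2 type vlan id $2"
  echo "ip link set $1.$2 up"
  echo "ip link add link $1.$2 name $3 type macvlan mode bridge"
}

print_vlan_macvlan_cmds ens33 10 mv10
```

Running the printed commands requires root privileges; printing them first makes it easy to review what would change on the host.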

OpenVswitch1 ports:

First, let’s clean the configuration and then reintroduce the trunk and VLAN ports:

for br in `ovs-vsctl list-br`; do ovs-vsctl del-br ${br}; done

# Re-create the bridge removed above
ovs-vsctl add-br br0

# Trunk ports:
ovs-vsctl add-port br0 eth1
ovs-vsctl add-port br0 eth2
ovs-vsctl add-port br0 eth6

# VLAN ports:
# (a port can be added only once; eth2 is listed in both groups, so keep the entry that matches your cabling)
ovs-vsctl add-port br0 eth2 tag=114
ovs-vsctl add-port br0 eth4 tag=114
ovs-vsctl add-port br0 eth7 tag=114

ovs-vsctl show
7afbe760-5237-4ae4-a7e6-ac5b4f1cc6df
Bridge "br0"
…
Port "eth1"
Interface "eth1"

In OVS, a port added without a VLAN tag acts as a trunk port.

OpenVswitch2 ports:

OpenVswitch2 connects the two management end hosts: the Ansible container and the Firefox browser container.

for br in `ovs-vsctl list-br`; do ovs-vsctl del-br ${br}; done

# Re-create the bridge removed above
ovs-vsctl add-br br0

# Trunk ports:
ovs-vsctl add-port br0 eth7

# VLAN ports:
ovs-vsctl add-port br0 eth0 tag=114
ovs-vsctl add-port br0 eth1 tag=114

For more information on how to configure advanced switching features with ovs, please refer to my gns3 blog post on gns3 community.

Cisco router-on-a-stick configuration:

This router allows inter-VLAN communication between VLAN 114 and all other VLANs, denies communication between VLANs 10, 20 and 30, and connects the entire topology to the Internet using PAT (Port Address Translation, the equivalent of Linux masquerading).

ROAS#sh run
Building configuration...

Current configuration : 6093 bytes
!
version 15.6
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname ROAS
!
boot-start-marker
boot-end-marker
!
!
logging buffered 1000000
!
no aaa new-model
ethernet lmi ce
!
!
!
mmi polling-interval 60
no mmi auto-configure
no mmi pvc
mmi snmp-timeout 180
!
!
!
!
!
!
!
!
!
!
!
no ip domain lookup
ip cef
no ipv6 cef
!
multilink bundle-name authenticated
!
!
!
crypto pki trustpoint TP-self-signed-4294967295
enrollment selfsigned
subject-name cn=IOS-Self-Signed-Certificate-4294967295
revocation-check none
rsakeypair TP-self-signed-4294967295
!
!
crypto pki certificate chain TP-self-signed-4294967295
certificate self-signed 01
3082022B 30820194 A0030201 02020101 300D0609 2A864886 F70D0101 05050030
31312F30 2D060355 04031326 494F532D 53656C66 2D536967 6E65642D 43657274
69666963 6174652D 34323934 39363732 3935301E 170D3137 30363138 31313034
31345A17 0D323030 31303130 30303030 305A3031 312F302D 06035504 03132649
4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D34 32393439
36373239 3530819F 300D0609 2A864886 F70D0101 01050003 818D0030 81890281
8100B306 1D16E9A7 67E556AD A2A5DEF2 4914C183 5C6B5C7B 9A37CE29 A53F61BB
6FED6E2C 3E4E8E67 355560A7 818590CC 4410B87B 72126999 465A45D4 4627F5DC
185E545B 492840DA A8DB88B3 AC8DBE34 D3109B8D AD4A5522 6C7325E6 405DE12B
91B30192 64AC93BB 618FADB8 2F6F94E0 779B80FF 5002DEA0 1AD6F6D0 5C289790
95590203 010001A3 53305130 0F060355 1D130101 FF040530 030101FF 301F0603
551D2304 18301680 14BF7E97 AE5F2D93 86F08CF4 ED9C8FF0 E92C5D8E D3301D06
03551D0E 04160414 BF7E97AE 5F2D9386 F08CF4ED 9C8FF0E9 2C5D8ED3 300D0609
2A864886 F70D0101 05050003 818100A3 76F489B3 BF33FA87 8E4DD1B5 85913A54
428FB7F2 1D1FDF3E 6D18E3B3 CE0F9400 C574B89C A2D7E89E 7F13AA3F BB4F9B19
10490BF7 4F7C0B3C 70516F75 5C26078F 6A4A14A3 370B63EC 76376758 1B614B98
B4A4FF1D 1B4F7C88 60BFAF98 AF822BB5 DCF6FA16 A31DAD0D 89F53E60 24305110
64839C15 1865D92A D8153B73 8FB8C1
quit
!
redundancy
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
interface GigabitEthernet0/0
no ip address
duplex full
speed auto
media-type rj45
!
interface GigabitEthernet0/0.10
encapsulation dot1Q 10
ip address 10.0.0.200 255.255.255.0
ip access-group 101 in
ip access-group 101 out
ip nat inside
ip virtual-reassembly in
!
interface GigabitEthernet0/0.20
encapsulation dot1Q 20
ip address 20.0.0.200 255.255.255.0
ip access-group 101 in
ip access-group 101 out
ip nat inside
ip virtual-reassembly in
!
interface GigabitEthernet0/0.30
encapsulation dot1Q 30
ip address 30.0.0.200 255.255.255.0
ip access-group 101 in
ip access-group 101 out
ip nat inside
ip virtual-reassembly in
!
interface GigabitEthernet0/0.114
encapsulation dot1Q 114
ip address 192.168.114.200 255.255.255.0
ip nat inside
ip virtual-reassembly in
!
interface GigabitEthernet0/1
ip address 192.168.66.200 255.255.255.0
duplex full
speed auto
media-type rj45
!
interface GigabitEthernet0/2
no ip address
shutdown
duplex auto
speed auto
media-type rj45
!
interface GigabitEthernet0/3
ip address dhcp
ip nat outside
ip virtual-reassembly in
duplex full
speed auto
media-type rj45
!
ip forward-protocol nd
!
!
ip http server
ip http authentication local
ip http secure-server
ip nat inside source list 100 interface GigabitEthernet0/3 overload
ip ssh rsa keypair-name ROAS.cciethebeginning.wordpress.com
ip ssh version 2
!
!
!
access-list 100 permit ip 192.168.114.0 0.0.0.255 any
access-list 100 permit ip 10.0.0.0 0.0.0.255 any
access-list 100 permit ip 20.0.0.0 0.0.0.255 any
access-list 100 permit ip 30.0.0.0 0.0.0.255 any
access-list 101 deny ip 10.0.0.0 0.0.0.255 20.0.0.0 0.0.0.255
access-list 101 deny ip 20.0.0.0 0.0.0.255 10.0.0.0 0.0.0.255
access-list 101 deny ip 10.0.0.0 0.0.0.255 30.0.0.0 0.0.0.255
access-list 101 deny ip 30.0.0.0 0.0.0.255 10.0.0.0 0.0.0.255
access-list 101 deny ip 20.0.0.0 0.0.0.255 30.0.0.0 0.0.0.255
access-list 101 permit ip 192.168.114.0 0.0.0.255 any
access-list 101 permit ip any 192.168.114.0 0.0.0.255
access-list 101 permit ip any any
!
control-plane
!
banner exec ^C
**************************************************************************
* IOSv is strictly limited to use for evaluation, demonstration and IOS *
* education. IOSv is provided as-is and is not supported by Cisco's *
* Technical Advisory Center. Any use or disclosure, in whole or in part, *
* of the IOSv Software or Documentation to any third party for any *
* purposes is expressly prohibited except as otherwise authorized by *
* Cisco in writing. *
**************************************************************************^C
banner incoming ^C
**************************************************************************
* IOSv is strictly limited to use for evaluation, demonstration and IOS *
* education. IOSv is provided as-is and is not supported by Cisco's *
* Technical Advisory Center. Any use or disclosure, in whole or in part, *
* of the IOSv Software or Documentation to any third party for any *
* purposes is expressly prohibited except as otherwise authorized by *
* Cisco in writing. *
**************************************************************************^C
banner login ^C
**************************************************************************
* IOSv is strictly limited to use for evaluation, demonstration and IOS *
* education. IOSv is provided as-is and is not supported by Cisco's *
* Technical Advisory Center. Any use or disclosure, in whole or in part, *
* of the IOSv Software or Documentation to any third party for any *
* purposes is expressly prohibited except as otherwise authorized by *
* Cisco in writing. *
**************************************************************************^C
!
line con 0
line aux 0
line vty 0 4
privilege level 15
login
transport input telnet ssh
!
no scheduler allocate
!
end

ROAS#

Now you can scale your infrastructure by adding any number of new Docker nodes.

2- Configuring MACVLAN network on docker node1

1) Create MacVLAN networks

Create the MacVLAN networks with the following parameters:

  • Driver: macvlan (-d macvlan)
  • Subnet and IP range from which the containers get their IP parameters
  • Gateway of the VLAN in question
  • Parent interface
  • MacVLAN network name

docker network create -d macvlan \
--subnet 10.0.0.0/24 \
--ip-range=10.0.0.0/26 \
--gateway=10.0.0.200 \
-o parent=ens33.10 macvlan10

docker network create -d macvlan \
--subnet 20.0.0.0/24 \
--ip-range=20.0.0.64/26 \
--gateway=20.0.0.200 \
-o parent=ens33.20 macvlan20

docker network create -d macvlan \
--subnet 30.0.0.0/24 \
--ip-range=30.0.0.128/26 \
--gateway=30.0.0.200 \
-o parent=ens33.30 macvlan30
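
Since the three commands differ only in their parameters, they can be generated from a small table. This is a sketch under the assumption that every VLAN follows the same naming pattern as above (parent ens33.<id>, network macvlan<id>); the function name macvlan_create_cmd is illustrative. The commands are only printed, so nothing is changed on the host.

```shell
# Print (do not run) the "docker network create" command for one VLAN.
macvlan_create_cmd() {
  # $1=VLAN id  $2=subnet  $3=IP range  $4=gateway  $5=parent interface
  printf 'docker network create -d macvlan --subnet %s --ip-range=%s --gateway=%s -o parent=%s.%s macvlan%s\n' \
    "$2" "$3" "$4" "$5" "$1" "$1"
}

# One line per VLAN: id, subnet, IP range, gateway (node1 values from this lab).
while read -r vlan subnet range gw; do
  macvlan_create_cmd "$vlan" "$subnet" "$range" "$gw" ens33
done <<'EOF'
10 10.0.0.0/24 10.0.0.0/26 10.0.0.200
20 20.0.0.0/24 20.0.0.64/26 20.0.0.200
30 30.0.0.0/24 30.0.0.128/26 30.0.0.200
EOF
```

Once the printed commands look right, the output of the loop can be piped to sh to actually create the networks.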

List the created subinterfaces with “ip a”.

List the Docker networks with “docker network ls” and make sure the three MacVLAN networks are created:

docker network  ls
NETWORK ID          NAME                DRIVER              SCOPE
916165cd344c        bridge              bridge              local
686ebb8c5399        host                host                local
b3c9487a6cd0        macvlan10           macvlan             local
e1818c46a437        macvlan20           macvlan             local
52ce778548c3        macvlan30           macvlan             local
d97f45467edd        none                null                local

As an example, let’s inspect the Docker network macvlan10:

docker network inspect macvlan10
[
    {
        "Name": "macvlan10",
        "Id": "b3c9487a6cd09054f06e22cf04181473819236d06245710f3763489a326770d2",
        "Created": "2017-06-20T14:36:02.581834167+02:00",
        "Scope": "local",
        "Driver": "macvlan",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "IPRange": "10.0.0.0/26",
                    "Gateway": "10.0.0.200"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {},
        "Options": {
            "parent": "ens33.10"
        },
        "Labels": {}
    }
]

Notice that no containers are attached to the network yet: "Containers": {}.

Let’s remedy that by running simple Apache containers from the custom image ajnouri/apache_ssl_container (you can use any other suitable image that runs “bash/sh” on the console) and connecting them to macvlan10, macvlan20 and macvlan30 respectively.

2) Start and connect containers to MacVLAN networks

docker run --net=macvlan10 -dt --name c11 --restart=unless-stopped ajnouri/apache_ssl_container
docker run --net=macvlan20 -dt --name c12 --restart=unless-stopped ajnouri/apache_ssl_container
docker run --net=macvlan30 -dt --name c13 --restart=unless-stopped ajnouri/apache_ssl_container

The first time, Docker downloads the image; after that, any container from that image is created instantly.

“docker run” command options:

  • --net=macvlan10 : name of the MacVLAN network to attach to
  • -dt : run the container detached (in the background) with a pseudo-terminal allocated
  • --name c11 : container name
  • --restart=unless-stopped : if the Docker host restarts, containers are started and reconnected to their networks, unless they were intentionally stopped
  • ajnouri/apache_ssl_container : custom image with Apache SSL installed and a small PHP script to detect session IP addresses

List running containers with “docker ps”:


$ docker ps
CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS              PORTS               NAMES
1a2104d84519        ajnouri/apache_ssl_container   "/bin/sh -c 'servi..."   6 days ago          Up 14 minutes                           c13
691b468918ee        ajnouri/apache_ssl_container   "/bin/sh -c 'servi..."   6 days ago          Up 14 minutes                           c12
1e2bb1933d10        ajnouri/apache_ssl_container   "/bin/sh -c 'servi..."   6 days ago          Up 14 minutes                           c11

And inspect the containers attached to a MacVLAN network with “docker network inspect macvlan10”:

$ docker network inspect macvlan10
[
    {
        "Name": "macvlan10",
        "Id": "b3c9487a6cd09054f06e22cf04181473819236d06245710f3763489a326770d2",
        "Created": "2017-06-20T14:36:02.581834167+02:00",
        "Scope": "local",
        "Driver": "macvlan",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "IPRange": "10.0.0.0/26",
                    "Gateway": "10.0.0.200"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {
            "1e2bb1933d10f94f2aeb3e83deb4f141d393fc6cfbf09e415ebd1239e421b50f": {
                "Name": "c11",
                "EndpointID": "e2bc0ea1d3cf9806cf880e6cdb34e4914d84a75e1eabc90a8429f7f7668f82b7",
                "MacAddress": "02:42:0a:00:00:01",
                "IPv4Address": "10.0.0.1/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "parent": "ens33.10"
        },
        "Labels": {}
    }
]

Container c11 is connected to macvlan10 (10.0.0.0/24) and was dynamically assigned an IP address from that range.

3) Check connectivity

Now let’s run some connectivity checks from inside container c11 (macvlan10) and see whether it can reach its gateway (the router-on-a-stick) outside the Docker host.

ajn@ubuntu:~$ docker exec c11 ping -t3 10.0.0.200
PING 10.0.0.200 (10.0.0.200) 56(84) bytes of data.
64 bytes from 10.0.0.200: icmp_seq=1 ttl=255 time=1.07 ms
64 bytes from 10.0.0.200: icmp_seq=2 ttl=255 time=1.50 ms
64 bytes from 10.0.0.200: icmp_seq=3 ttl=255 time=1.27 ms
64 bytes from 10.0.0.200: icmp_seq=4 ttl=255 time=1.40 ms
64 bytes from 10.0.0.200: icmp_seq=5 ttl=255 time=1.29 ms
^C

ajn@ubuntu:~$ docker exec c12 ping 20.0.0.200
PING 20.0.0.200 (20.0.0.200) 56(84) bytes of data.
64 bytes from 20.0.0.200: icmp_seq=1 ttl=255 time=2.22 ms
64 bytes from 20.0.0.200: icmp_seq=2 ttl=255 time=1.34 ms
64 bytes from 20.0.0.200: icmp_seq=3 ttl=255 time=1.23 ms
64 bytes from 20.0.0.200: icmp_seq=4 ttl=255 time=1.41 ms
^C

ajn@ubuntu:~$ docker exec c13 ping 30.0.0.200
PING 30.0.0.200 (30.0.0.200) 56(84) bytes of data.
64 bytes from 30.0.0.200: icmp_seq=1 ttl=255 time=1.76 ms
64 bytes from 30.0.0.200: icmp_seq=2 ttl=255 time=1.36 ms
64 bytes from 30.0.0.200: icmp_seq=3 ttl=255 time=1.42 ms
64 bytes from 30.0.0.200: icmp_seq=4 ttl=255 time=1.51 ms
64 bytes from 30.0.0.200: icmp_seq=5 ttl=255 time=1.39 ms
^C

Yes!!!

It can even reach the Internet, thanks to the router-on-a-stick:

docker exec c11 ping gns3.com
PING gns3.com (104.20.168.3) 56(84) bytes of data.
64 bytes from 104.20.168.3: icmp_seq=1 ttl=51 time=2.54 ms
64 bytes from 104.20.168.3: icmp_seq=2 ttl=51 time=2.82 ms
64 bytes from 104.20.168.3: icmp_seq=3 ttl=51 time=2.62 ms
64 bytes from 104.20.168.3: icmp_seq=4 ttl=51 time=2.82 ms
^C

And that’s not all: it can even reach other containers, if you allow it of course. The Cisco router is used for that purpose; you can play with access control lists to implement the policy you want.

3- Configuring MACVLAN network on docker node2

The same steps are applied to Docker node 2. Its interfaces are connected in the same way as on node1: one interface to the common VLAN 114 and a trunk interface used by containers c21, c22 and c23, connected respectively to the MacVLAN networks macvlan10, macvlan20 and macvlan30 (same as node1).

Node2 uses different IP ranges from those used on node1 for each VLAN:

VLAN subnet                Node1            Node2
macvlan10 (10.0.0.0/24)    10.0.0.0/26      10.0.0.64/26
macvlan20 (20.0.0.0/24)    20.0.0.64/26     20.0.0.0/26
macvlan30 (30.0.0.0/24)    30.0.0.128/26    30.0.0.0/26
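
Because both nodes allocate addresses from the same subnets, their ranges must not overlap. The sketch below checks this with plain shell arithmetic; it is an illustration under the assumption of well-formed IPv4 CIDR input, and the helper names (ip_to_int, cidr_bounds, cidr_overlap) are made up for this example.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print "first last" (as integers) for a CIDR block such as 10.0.0.64/26.
cidr_bounds() {
  cb_size=$(( 1 << (32 - ${1#*/}) ))
  cb_base=$(( $(ip_to_int "${1%/*}") & ~(cb_size - 1) ))
  echo "$cb_base $(( cb_base + cb_size - 1 ))"
}

# Return 0 when two CIDR blocks share at least one address.
cidr_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Node1 and node2 ranges from the table above, one VLAN per line.
while read -r n1 n2; do
  if cidr_overlap "$n1" "$n2"; then
    echo "OVERLAP: $n1 and $n2"
  else
    echo "ok: $n1 and $n2"
  fi
done <<'EOF'
10.0.0.0/26 10.0.0.64/26
20.0.0.64/26 20.0.0.0/26
30.0.0.128/26 30.0.0.0/26
EOF
```

With the ranges used in this lab, all three pairs are reported as non-overlapping. The same helper can be used to verify that a range does not collide with an existing DHCP pool.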

1) Create MacVLAN networks

docker network create -d macvlan \
--subnet 10.0.0.0/24 \
--ip-range=10.0.0.64/26 \
--gateway=10.0.0.200 \
-o parent=ens33.10 macvlan10


docker network create -d macvlan \
--subnet 20.0.0.0/24 \
--ip-range=20.0.0.0/26 \
--gateway=20.0.0.200 \
-o parent=ens33.20 macvlan20


docker network create -d macvlan \
--subnet 30.0.0.0/24 \
--ip-range=30.0.0.0/26 \
--gateway=30.0.0.200 \
-o parent=ens33.30 macvlan30

2) Start and connect containers to MacVLAN networks

docker run --net=macvlan10 -dt --name c21 --restart=unless-stopped ajnouri/apache_ssl_container
docker run --net=macvlan20 -dt --name c22 --restart=unless-stopped ajnouri/apache_ssl_container
docker run --net=macvlan30 -dt --name c23 --restart=unless-stopped ajnouri/apache_ssl_container

3) Check connectivity

$ docker exec c22 ping 20.0.0.200
PING 20.0.0.200 (20.0.0.200) 56(84) bytes of data.
64 bytes from 20.0.0.200: icmp_seq=1 ttl=255 time=1.06 ms
64 bytes from 20.0.0.200: icmp_seq=2 ttl=255 time=1.51 ms
64 bytes from 20.0.0.200: icmp_seq=3 ttl=255 time=1.46 ms
^C
$ docker exec c23 ping 30.0.0.200
PING 30.0.0.200 (30.0.0.200) 56(84) bytes of data.
64 bytes from 30.0.0.200: icmp_seq=1 ttl=255 time=2.11 ms
64 bytes from 30.0.0.200: icmp_seq=2 ttl=255 time=1.42 ms
64 bytes from 30.0.0.200: icmp_seq=3 ttl=255 time=1.56 ms

And, according to the policy deployed on the router, communication between VLANs 10, 20 and 30 is not allowed:

$ docker exec c23 ping 10.0.0.200
PING 10.0.0.200 (10.0.0.200) 56(84) bytes of data.
From 30.0.0.200 icmp_seq=1 Packet filtered
From 30.0.0.200 icmp_seq=2 Packet filtered
From 30.0.0.200 icmp_seq=3 Packet filtered

Now let’s check communication between c11 (node1: macvlan10) and c21 (node2: macvlan10):

docker exec c21 ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=1.45 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.879 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.941 ms
^C

Nice!

Containers connected to the same VLAN (from different Docker nodes) can talk to each other.

And to finish, from the GUI browser container gns3/webterm, let’s access all containers on both nodes. Yes, in GNS3 you can run Firefox in a container, no need for an entire VM for that :P-)

Troubleshooting:

Without promiscuous mode you’ll notice that container traffic reaches the outside network, but not the other way around, as a Wireshark capture shows:

  • 10.0.0.200 = outside router
  • 10.0.0.1 = container behind MacVLAN

References:

Routing between Docker containers using GNS3.


The idea is to route (IPv4 and IPv6) between Docker containers using GNS3 and to use them as end hosts instead of virtual machines.

Containers use only the resources necessary for the application they run. They use an image of the host file system and can share the same environment (binaries and libraries).

On the other hand, virtual machines each require an entire OS, with reserved RAM and disk space.

Virtual machines vs Docker containers


 

If you are not familiar with Docker, I urge you to take a look at the excellent short introduction and the additional explanations on the Docker site.

 

 

For now, Docker has limited networking functionality. This is where pipework comes to the rescue. Pipework allows more advanced networking settings, such as adding new interfaces, assigning IPs from different subnets, setting gateways and many more…

To route between the containers using your own GNS3 topology (the sky is the limit!), pipework creates a new interface inside a running container, connects it to a host bridge interface, gives it an IP/mask in any subnet you want and sets a default gateway pointing to a device in GNS3. Consequently, all egress traffic from the container is routed to your GNS3 topology.

 

GNS3 connection to a Docker container

How pipework connects and exposes the container network

Lab requirements:

Docker:
https://docs.docker.com/installation/ubuntulinux/#docker-maintained-package-installation
Pipework:

sudo bash -c "curl https://raw.githubusercontent.com/jpetazzo/pipework/master/pipework\
 > /usr/local/bin/pipework"
sudo chmod a+x /usr/local/bin/pipework

For each container, we will generate a Docker image, run a container with an interactive terminal and set its networking parameters (IP and default gateway).

To demonstrate Docker’s flexibility, we will use 4 Docker containers in 4 different subnets:

 

 

This is how containers are built for this lab:

 


Here is the general workflow for each container.

1- Build an image from a Dockerfile (https://docs.docker.com/reference/builder/):

An image is read-only.

sudo docker build -t <image-tag> .

Or (docker v1.5) sudo docker build -t <image-tag> <DockerfileLocation>

2- Run the built image:

Spawn and run a writable container with interactive console.

The parameters of this command may differ slightly for the GUI containers.

sudo docker run -t -i <image id from `sudo docker images`> /bin/bash

3- Set container networking:

Create a host bridge interface, link it to a new interface inside the container, and assign that interface an IP and a new default gateway.

sudo pipework <bridge> -i <int> <container id from `sudo docker ps`> <ip/mask>@<gateway-ip>

 

To avoid manipulating image id’s and container id’s for each of the images and the containers, I use a bash script to build and run all containers automatically:

https://github.com/AJNOURI/Docker-files/blob/master/gns3-docker.sh

 

#!/bin/bash
# List the repository names of all local images
IMGLIST="$(sudo docker images | awk '{ print $1; }')"
# Build each image only if it is not already present
[[ $IMGLIST =~ "mybimage" ]] || sudo docker build -t mybimage -f phusion-dockerbase .
[[ $IMGLIST =~ "myapache" ]] || sudo docker build -t myapache -f apache-docker .
[[ $IMGLIST =~ "myfirefox" ]] || sudo docker build -t myfirefox -f firefox-docker .

BASE_I1="$(sudo docker images | grep mybimage | awk '{ print $3; }')"
lxterminal -e "sudo docker run -t -i --name baseimage1 $BASE_I1 /bin/bash"
sleep 2
BASE_C1="$(sudo docker ps | grep baseimage1 | awk '{ print $1; }')"
sudo pipework br4 -i eth1 $BASE_C1 192.168.44.1/24@192.168.44.100 

BASE_I2="$(sudo docker images | grep mybimage | awk '{ print $3; }')"
lxterminal -e "sudo docker run -t -i --name baseimage2 $BASE_I2 /bin/bash"
sleep 2
BASE_C2="$(sudo docker ps | grep baseimage2 | awk '{ print $1; }')"
sudo pipework br5 -i eth1 $BASE_C2 192.168.55.1/24@192.168.55.100 

APACHE_I1="$(sudo docker images | grep myapache | awk '{ print $3; }')"
lxterminal -t "Base apache" -e "sudo docker run -t -i --name apache1 $APACHE_I1 /bin/bash"
sleep 2
APACHE_C1="$(sudo docker ps | grep apache1 | awk '{ print $1; }')"
sudo pipework br6 -i eth1 $APACHE_C1 192.168.66.1/24@192.168.66.100 

lxterminal -t "Firefox" -e "sudo docker run -ti --name firefox1 --rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix myfirefox"
sleep 2
FIREFOX_C1="$(sudo docker ps | grep firefox1 | awk '{ print $1; }')"
sudo pipework br7 -i eth1 $FIREFOX_C1 192.168.77.1/24@192.168.77.100

 

And we end up with the following containers:

Containers, images and dependencies.


 

GNS3

All you have to do is to bind a separate cloud to each bridge interface (br4,br5,br6 and br7) created by pipework, and then connect them to the appropriate segment in your topology.

 

Lab topology

Note that the GNS3 topology is already configured for IPv6, so as soon as you start the routers, the Docker containers are assigned IPv6 addresses by the routers through SLAAC (Stateless Address Autoconfiguration), which makes them reachable over IPv6.

 

Here is a video on how to launch the lab:


 

Cleaning up

To clean your host of all containers and images, use the following bash script:

https://github.com/AJNOURI/Docker-files/blob/master/clean_docker.sh which uses the below docker commands:

Stop running containers:

  • sudo docker stop <container id’s from `sudo docker ps`>

Remove the stopped container:

  • sudo docker rm <container id’s from `sudo docker ps -a`>

Remove the image:

  • sudo docker rmi <image id’s from `sudo docker images`>

sudo ./clean_docker.sh
Stopping all running containers...
bf3d37220391
f8ad6f5c354f
Removing all stopped containers...
bf3d37220391
f8ad6f5c354f
Erasing all images...
Make sure you are generating image from a Dockerfile
or have pushed your images to DockerHub.
*** Do you want to continue? No

I answered “No” because I still need those images to spawn containers. You can answer “Yes” if you no longer need the images or if you need to rebuild them.


 


OSPF inter-area and intra-area routing rules


The following lab focuses on intra-area and inter-area route selection process.

For the sake of clarity, I put the final conclusions first, wrapped in a table form, with some explanations to ponder upon, followed by the different lab cases used to check OSPF route selection rules.

For each case, I used interface costs and states to illustrate OSPF selection rules in action.

 

Order of preference and criteria, with the rules that apply:

1. Intra-area (O)

  • Lowest cost
  • Multipath

– Intra-area routes are always preferred over inter-area ones.
– Intra-area routing to a destination inside a non-backbone area takes the shortest path without traversing the backbone area.
– Intra-area routing to a destination inside the backbone area takes the shortest path without traversing a non-backbone area.
– ABRs advertise only intra-area routes from a non-backbone area to the backbone area, and advertise both intra-area and inter-area routes from the backbone area to a non-backbone area.
– ABRs do not take into account, in their SPF calculations, Summary-LSAs received from non-backbone areas.

2. Inter-area (IA)

– An inter-area route between two non-backbone areas must pass through the backbone area.
– An inter-area route takes the path with the lowest total cost.

3. External routes

3a. Type 1:

  • Lowest total cost
  • Multipath

3b. Type 2:

  • Redistribution cost
  • Total cost
  • Multipath

For more information about comparing OSPF external routes, please refer to the lab “OSPF external E1, E2, N1, N2…Who is the winner?”

 

  • References from RFCs:

rfc3509

OSPF prevents inter-area routing loops by implementing a split-horizon mechanism, allowing ABRs to inject into the backbone only Summary-LSAs derived from the intra-area routes, and limiting ABRs’ SPF calculation to consider only Summary-LSAs in the backbone area’s link-state database.

 

rfc2328

Routing in the Autonomous System takes place on two levels, depending on whether the source and destination of a packet reside in the same area (intra-area routing is used) or different areas (inter-area routing is used). In intra-area routing, the packet is routed solely on information obtained within the area; no routing information obtained from outside the area can be used.   This protects intra-area routing from the injection of bad routing information.

 

3.2. Inter-area routing

When routing a packet between two non-backbone areas the backbone is used. The path that the packet will travel can be broken up into three contiguous pieces: an intra-area path from the source to an area border router, a backbone path between the source and destination areas, and then another intra-area path to the destination. The algorithm finds the set of such paths that have the smallest cost.

The topology of the backbone dictates the backbone paths used between areas.

 


There are four possible types of paths used to route traffic to the destination, listed here in decreasing order of preference:
intra-area, inter-area, type 1 external or type 2 external.
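The order of preference quoted above can be sketched as a two-level comparison: path type first, cost second. This is a minimal illustration, not router code: the names `OspfPath`, `PATH_TYPE_RANK` and `best_paths` are hypothetical, and the single `cost` field does not model the two-step comparison used for type 2 externals.

```python
from dataclasses import dataclass
from typing import List

# Lower rank wins, following the order of preference quoted from the RFC.
PATH_TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external-1": 2, "external-2": 3}


@dataclass(frozen=True)
class OspfPath:
    path_type: str
    cost: int
    next_hop: str


def best_paths(candidates: List[OspfPath]) -> List[OspfPath]:
    """Return the preferred path(s): best path type first, then lowest cost.

    Several paths are returned when they tie on both criteria (multipath).
    """
    if not candidates:
        return []
    best_key = min((PATH_TYPE_RANK[p.path_type], p.cost) for p in candidates)
    return [p for p in candidates
            if (PATH_TYPE_RANK[p.path_type], p.cost) == best_key]


if __name__ == "__main__":
    # Illustrative values only: a costly intra-area path still beats
    # a cheaper inter-area path to the same destination.
    paths = [
        OspfPath("inter-area", 3, "192.168.13.3"),
        OspfPath("intra-area", 101, "192.168.12.2"),
    ]
    print(best_paths(paths))
```

Because the path type is the first element of the comparison key, cost only matters between paths of the same type.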

To understand OSPF's loop-prevention mechanism, think of OSPF areas conceptually as nodes in a loop-free tree whose depth never exceeds 2.

 

OSPF tree: loop-free


You can see visually why two non-backbone areas cannot exchange routes directly and must use area 0 as an intermediate area to avoid loops:

 

OSPF tree: loop


Important notes:

  • Throughout the lab, I am using cost to manipulate route selection.

  • OSPF takes into account the cost of the outgoing interface toward the destination, so be careful when you change the cost on only one end of a link: this can cause unwanted asymmetric routing.

  • IGPs split the router (routes are advertised through interfaces), whereas BGP splits the link between routers. This fundamental difference should be clearly depicted in the topology to avoid confusion.

  • If you advertise your loopback networks with a mask shorter than /32, you will have to set their OSPF network type to point-to-point (refer to this lab for more information).

  • Observe the line “Routing Bit Set on this LSA” in the OSPF database information for LSA3: it is specific to Cisco's implementation of OSPF and indicates that the LSA is taken into account in the calculation of the best route.

  • Multipath selection is handled locally through the FIB and provided by the CEF load-balancing mechanism, if there are several next hops leading to the same destination.

 

Low-level lab design topology

Here is the lab topology used for testing:

Figure3: Low Level Design Lab topology


 

Test cases

Case1:

  • Traffic from R1 10.10.0.1 (area 123) to R5 50.10.0.5 (area 0)
  • Default interface ospf costs
Figure4: Case1


R1#Ping 50.10.0.5 source 10.10.0.1 repeat 5

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 50.10.0.5, timeout is 2 seconds:
Packet sent with a source address of 10.10.0.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/27/40 ms
R1#trace 50.10.0.5 source 10.10.0.1

Type escape sequence to abort.
Tracing the route to 50.10.0.5

  1 192.168.31.3 8 msec
    192.168.21.2 12 msec
    192.168.31.3 16 msec
  2 192.168.42.4 16 msec
    192.168.43.4 16 msec
    192.168.42.4 32 msec
  3 192.168.54.5 28 msec 40 msec 40 msec
R1#sh ip route 50.10.0.5
Routing entry for 50.10.0.5/32
  Known via "ospf 666", distance 110, metric 4, type inter area
  Last update from 192.168.12.2 on FastEthernet1/0, 00:42:05 ago
  Routing Descriptor Blocks:
  * 192.168.13.3, from 3.3.3.3, 00:42:15 ago, via FastEthernet1/1
      Route metric is 4, traffic share count is 1
    192.168.12.2, from 2.2.2.2, 00:42:05 ago, via FastEthernet1/0
      Route metric is 4, traffic share count is 1

R1#
R1#sh ip ospf database summary 50.10.0.5

            OSPF Router with ID (1.1.1.1) (Process ID 666)

        Summary Net Link States (Area 123)

  Routing Bit Set on this LSA
  LS age: 543
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 50.10.0.5 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000002
  Checksum: 0x32BD
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 3 

  Routing Bit Set on this LSA
  LS age: 587
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 50.10.0.5 (summary Network Number)
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000002
  Checksum: 0x14D7
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 3 

R1#

 

Case2:

  • Traffic from R1 10.10.0.1 (area 123) to R5 50.10.0.5 (backbone)
  • R1 fa1/0 cost = 10
  • R2 fa1/1 cost = 10
Figure5: Case2


This creates two inter-area paths with unequal total costs (unequal intra-area costs).

R1#trace 50.10.0.5 source 10.10.0.1

Type escape sequence to abort.
Tracing the route to 50.10.0.5

  1  *
    192.168.13.3 12 msec 28 msec
  2  *
    192.168.34.4 16 msec 16 msec
  3  *
    192.168.45.5 44 msec 44 msec
R1#sh ip route 50.10.0.5
Routing entry for 50.10.0.5/32
  Known via "ospf 666", distance 110, metric 4, type inter area
  Last update from 192.168.13.3 on FastEthernet1/1, 00:48:22 ago
  Routing Descriptor Blocks:
  * 192.168.13.3, from 3.3.3.3, 01:06:54 ago, via FastEthernet1/1
      Route metric is 4, traffic share count is 1

R1#

R1#sh ip ospf database summary 50.10.0.5

            OSPF Router with ID (1.1.1.1) (Process ID 666)

        Summary Net Link States (Area 123)

  LS age: 827
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 50.10.0.5 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000007
  Checksum: 0x825F
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 12 

  Routing Bit Set on this LSA
  LS age: 90
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 50.10.0.5 (summary Network Number)
  Advertising Router: 3.3.3.3
  LS Seq Number: 8000000A
  Checksum: 0x4DF
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 3 

R1#

 

R5#trace 10.10.0.1 source 50.10.0.5

Type escape sequence to abort.
Tracing the route to 10.10.0.1

  1 192.168.45.4 8 msec 4 msec 8 msec
  2 192.168.34.3 16 msec *  32 msec
  3  *
    192.168.13.1 44 msec *
R5#

R5#sh ip ospf database summ 10.10.0.1

            OSPF Router with ID (5.5.5.5) (Process ID 666)

        Summary Net Link States (Area 0)

  LS age: 194
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 10.10.0.1 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000007
  Checksum: 0x50C7
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 2 

  Routing Bit Set on this LSA
  LS age: 691
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 10.10.0.1 (summary Network Number)
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000008
  Checksum: 0x30E2
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 2 

        Summary Net Link States (Area 25)

  LS age: 198
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 10.10.0.1 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000007
  Checksum: 0x50C7
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 2 

  LS age: 203
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 10.10.0.1 (summary Network Number)
  Advertising Router: 5.5.5.5
  LS Seq Number: 80000007
  Checksum: 0xAFF
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 4 

R5#

Note that, for the return traffic, R5 receives the summary LSA3 from both R2 and R3, but takes only the one from R3 into account because of the ABR’s router ID (3.3.3.3).

Multipath is not considered because there is only one next-hop (R4) in the FIB.

Case3:

  • Traffic from R1 10.10.0.1 (area 123) to R5 50.10.0.5 (backbone)
  • R1 fa1/0 cost = 10
  • R3 fa1/2 cost = 100
Figure6: Case3


R1#sh ip ospf database summ 50.10.0.5

            OSPF Router with ID (1.1.1.1) (Process ID 666)

        Summary Net Link States (Area 123)

  Routing Bit Set on this LSA
  LS age: 697
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 50.10.0.5 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000004
  Checksum: 0x2EBF
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 3

  LS age: 46
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 50.10.0.5 (summary Network Number)
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000002
  Checksum: 0xF592
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 102

R1#      
R1#sh ip route 50.10.0.5             
Routing entry for 50.10.0.5/32
  Known via "ospf 666", distance 110, metric 13, type inter area
  Last update from 192.168.12.2 on FastEthernet1/0, 00:01:22 ago
  Routing Descriptor Blocks:
  * 192.168.12.2, from 2.2.2.2, 00:01:22 ago, via FastEthernet1/0
      Route metric is 13, traffic share count is 1

R1#
R1#trace 50.10.0.5 source 10.10.0.1         

Type escape sequence to abort.
Tracing the route to 50.10.0.5

  1 192.168.12.2 20 msec 20 msec 20 msec
  2 192.168.24.4 28 msec 20 msec 24 msec
  3 192.168.45.5 28 msec 36 msec 40 msec
R1#

 

With unequal costs to the ABRs and unequal costs advertised by the ABRs, OSPF on R1 has chosen the path with the lowest total cost to the destination: cost to the ABR + cost of the LSA3 summary advertised by that ABR.
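The arithmetic behind this choice can be sketched as follows. The LSA3 metrics (3 and 102) and the interface cost of 10 come from this case; the cost of 1 toward R3 is an assumption (default FastEthernet cost, consistent with the metrics seen in Case1), and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def inter_area_totals(cost_to_abr: Dict[str, int],
                      lsa3_metric: Dict[str, int]) -> Dict[str, int]:
    """Total cost via each ABR = intra-area cost to the ABR + its LSA3 metric."""
    return {abr: cost_to_abr[abr] + lsa3_metric[abr]
            for abr in cost_to_abr if abr in lsa3_metric}


def select_abrs(cost_to_abr: Dict[str, int],
                lsa3_metric: Dict[str, int]) -> Tuple[Optional[int], List[str]]:
    """Return the lowest total cost and every ABR that achieves it."""
    totals = inter_area_totals(cost_to_abr, lsa3_metric)
    if not totals:
        return None, []
    best = min(totals.values())
    return best, sorted(abr for abr, total in totals.items() if total == best)


if __name__ == "__main__":
    # Case3 as seen from R1: fa1/0 (toward R2) costs 10,
    # fa1/1 (toward R3) is assumed to keep the default cost of 1.
    cost_to_abr = {"2.2.2.2": 10, "3.3.3.3": 1}
    lsa3_metric = {"2.2.2.2": 3, "3.3.3.3": 102}
    print(inter_area_totals(cost_to_abr, lsa3_metric))
    print(select_abrs(cost_to_abr, lsa3_metric))
```

With these values the total is 13 through R2 and 103 through R3, which matches the route metric of 13 shown by `sh ip route`. If two ABRs tie on the total, both are returned, which corresponds to multipath.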

Case4:

  • Traffic from R1 10.10.0.1 (area 123) to R5 50.10.0.5 (backbone)
  • R1 fa1/0 cost = 10
  • R3 fa1/2 cost = 10
Figure7: Case4


R1#sh ip ospf database summ 50.10.0.5    

            OSPF Router with ID (1.1.1.1) (Process ID 666)

        Summary Net Link States (Area 123)

  Routing Bit Set on this LSA
  LS age: 1072
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 50.10.0.5 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000004
  Checksum: 0x2EBF
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 3

  Routing Bit Set on this LSA
  LS age: 12
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 50.10.0.5 (summary Network Number)
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000003
  Checksum: 0x6C75
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 12

R1#
R1#sh ip route 50.10.0.5                 
Routing entry for 50.10.0.5/32
  Known via "ospf 666", distance 110, metric 13, type inter area
  Last update from 192.168.13.3 on FastEthernet1/1, 00:01:21 ago
  Routing Descriptor Blocks:
    192.168.13.3, from 3.3.3.3, 00:01:21 ago, via FastEthernet1/1
      Route metric is 13, traffic share count is 1
  * 192.168.12.2, from 2.2.2.2, 00:08:09 ago, via FastEthernet1/0
      Route metric is 13, traffic share count is 1

R1#
R1#trace 50.10.0.5 source 10.10.0.1  

Type escape sequence to abort.
Tracing the route to 50.10.0.5

  1 192.168.13.3 8 msec
    192.168.12.2 8 msec
    192.168.13.3 8 msec
  2 192.168.24.4 20 msec
    192.168.34.4 24 msec
    192.168.24.4 16 msec
  3 192.168.45.5 20 msec 32 msec 24 msec
R1#

 

With unequal costs to the ABRs and unequal costs advertised by the ABRs, OSPF on R1 has chosen multipath because the total costs to the destination are equal: cost to the ABR + cost of the LSA3 summary advertised by that ABR (10 + 3 through R2, 1 + 12 through R3).

Case5:

  • Traffic from R5 50.10.0.5 (backbone) to R1 10.10.0.1 (area 123)
  • R3 fa1/1 cost = 10
Figure8: Case5


R5#sh ip ospf database summary 10.10.0.1

            OSPF Router with ID (50.10.0.5) (Process ID 666)

        Summary Net Link States (Area 0)

  Routing Bit Set on this LSA
  LS age: 1906
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 10.10.0.1 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000011
  Checksum: 0x3CD1
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 2

  LS age: 19
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 10.10.0.1 (summary Network Number)
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000003
  Checksum: 0x947A
  Length: 28
  Network Mask: /32
        TOS: 0     Metric: 11
          
...
R5#
R5#sh ip route 10.10.0.1                
Routing entry for 10.10.0.1/32
  Known via "ospf 666", distance 110, metric 4, type inter area
  Last update from 192.168.45.4 on FastEthernet1/0, 00:02:53 ago
  Routing Descriptor Blocks:
  * 192.168.45.4, from 2.2.2.2, 00:02:53 ago, via FastEthernet1/0
      Route metric is 4, traffic share count is 1

R5#
R5#trace 10.10.0.1 source 50.10.0.5     

Type escape sequence to abort.
Tracing the route to 10.10.0.1

  1 192.168.45.4 4 msec 12 msec 8 msec
  2 192.168.24.2 24 msec 20 msec 20 msec
  3 192.168.12.1 20 msec 28 msec 20 msec
R5#

 

With equal-cost paths to ABRs R2 and R3, OSPF on R5 chooses the path with the lowest total cost (cost to the ABR + cost advertised by the ABR).

Case6:

  • Traffic from R5 50.10.0.5 (backbone) to R1 10.10.0.1 (area 123)
  • R3 fa1/1 cost = 10
  • R4 fa1/1 cost = 5
Figure9: Case6


R5#sh ip ospf database summary 10.10.0.1

            OSPF Router with ID (50.10.0.5) (Process ID 666)

        Summary Net Link States (Area 0)

  Routing Bit Set on this LSA
  LS age: 573
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 10.10.0.1 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000012
  Checksum: 0x3AD2
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 2

  LS age: 710
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 10.10.0.1 (summary Network Number)
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000003
  Checksum: 0x947A
  Length: 28
  Network Mask: /32
        TOS: 0     Metric: 11
          
...   
R5#
R5#sh ip route 10.10.0.1                
Routing entry for 10.10.0.1/32
  Known via "ospf 666", distance 110, metric 8, type inter area
  Last update from 192.168.45.4 on FastEthernet1/0, 00:02:49 ago
  Routing Descriptor Blocks:
  * 192.168.45.4, from 2.2.2.2, 00:02:49 ago, via FastEthernet1/0
      Route metric is 8, traffic share count is 1

R5#
R5#trace 10.10.0.1 source 50.10.0.5     

Type escape sequence to abort.
Tracing the route to 10.10.0.1

  1 192.168.45.4 16 msec 12 msec 8 msec
  2 192.168.24.2 20 msec 20 msec 20 msec
  3 192.168.12.1 28 msec 24 msec 20 msec
R5#

 

Note that OSPF on R5 did not choose the path through the closest ABR (R3), but the path with the lowest total cost.

==> The same applies from area 0 to a non-backbone area: the router compares the cost of the LSA3 plus the cost of the route inside area 0.
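The difference between "closest ABR" and "lowest total cost" can be made explicit with a short sketch. The LSA3 metrics (2 and 11) are read from the database output of this case. The costs from R5 to each ABR are not displayed by the router: 6 toward R2 is inferred from the route metric (8 − 2), and 2 toward R3 is an assumption (two hops at the default cost of 1). The function names are hypothetical.

```python
from typing import Dict


def closest_abr(cost_to_abr: Dict[str, int]) -> str:
    """Naive choice: the ABR with the cheapest path inside the local area."""
    return min(cost_to_abr, key=lambda abr: cost_to_abr[abr])


def lowest_total_abr(cost_to_abr: Dict[str, int], lsa3_metric: Dict[str, int]) -> str:
    """Choice matching the output above: minimise cost to ABR + LSA3 metric."""
    return min(cost_to_abr, key=lambda abr: cost_to_abr[abr] + lsa3_metric[abr])


# Case6 as seen from R5 (see the assumptions stated in the lead-in).
COST_TO_ABR = {"R2": 6, "R3": 2}
LSA3_METRIC = {"R2": 2, "R3": 11}

if __name__ == "__main__":
    print("closest ABR:      ", closest_abr(COST_TO_ABR))
    print("lowest total cost:", lowest_total_abr(COST_TO_ABR, LSA3_METRIC))
```

Under these assumptions R3 is the closest ABR, but its total (2 + 11 = 13) loses against R2 (6 + 2 = 8), which is the metric installed in R5's routing table.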

Case7:

  • Traffic from R1 10.10.0.1 (area123) to R2 20.10.0.2 (area 123)
  • R1 fa1/0 cost = 100
Figure10: Case7


R1#sh ip route 20.10.0.2
Routing entry for 20.10.0.2/32
  Known via "ospf 666", distance 110, metric 101, type intra area
  Last update from 192.168.12.2 on FastEthernet1/0, 00:00:11 ago
  Routing Descriptor Blocks:
  * 192.168.12.2, from 2.2.2.2, 00:00:11 ago, via FastEthernet1/0
      Route metric is 101, traffic share count is 1

R1#trace 20.10.0.2 source 10.10.0.1

Type escape sequence to abort.
Tracing the route to 20.10.0.2

  1 192.168.12.2 16 msec 12 msec 8 msec
R1#

 

R3#sh ip route 20.10.0.2
Routing entry for 20.10.0.2/32
  Known via "ospf 666", distance 110, metric 102, type intra area
  Last update from 192.168.13.1 on FastEthernet1/1, 00:01:24 ago
  Routing Descriptor Blocks:
  * 192.168.13.1, from 2.2.2.2, 00:01:24 ago, via FastEthernet1/1
      Route metric is 102, traffic share count is 1

R3#

 

Case8:

  • Traffic from R1 10.10.0.1 (area123) to R2 20.10.0.2 (area 123)
  • R1-R2 link down (no inter-area route to 20.10.0.2)
Figure11: Case8


R1#sh ip route 20.10.0.2
% Subnet not in table
R1#
R1#
R1#sh ip ospf database summ
R1#sh ip ospf database summary 20.10.0.2

            OSPF Router with ID (1.1.1.1) (Process ID 666)
R1#

 

R1 can no longer reach the destination in its own area, even though R3, which R1 can still reach, has a route to it:

R3#sh ip route 20.10.0.2
Routing entry for 20.10.0.2/32
  Known via "ospf 666", distance 110, metric 3, type inter area
  Last update from 192.168.34.4 on FastEthernet1/2, 00:01:12 ago
  Routing Descriptor Blocks:
  * 192.168.34.4, from 2.2.2.2, 00:01:12 ago, via FastEthernet1/2
      Route metric is 3, traffic share count is 1

R3#ping 20.10.0.2

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 20.10.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 24/27/32 ms
R3#trace 20.10.0.2

Type escape sequence to abort.
Tracing the route to 20.10.0.2

  1 192.168.34.4 12 msec 8 msec 12 msec
  2 192.168.24.2 16 msec 24 msec 16 msec
R3#

 

OSPF will always choose the intra-area path without crossing area 0

Case9:

  • Intra-area traffic from R4 40.10.0.4 (backbone) to R2 20.20.0.2 (backbone)
  • R4 f1/1 cost = 100
Figure12: Case9


R4#sh ip route 20.20.0.2
Routing entry for 20.20.0.2/32
  Known via "ospf 666", distance 110, metric 101, type intra area
  Last update from 192.168.24.2 on FastEthernet1/1, 00:01:51 ago
  Routing Descriptor Blocks:
  * 192.168.24.2, from 2.2.2.2, 00:01:51 ago, via FastEthernet1/1
      Route metric is 101, traffic share count is 1

R4#trace 20.20.0.2 source 40.10.0.4

Type escape sequence to abort.
Tracing the route to 20.20.0.2

  1 192.168.24.2 20 msec 12 msec 8 msec
R4#

 

R3#sh ip route 20.20.0.2
Routing entry for 20.20.0.2/32
  Known via "ospf 666", distance 110, metric 102, type intra area
  Last update from 192.168.34.4 on FastEthernet1/2, 00:02:44 ago
  Routing Descriptor Blocks:
  * 192.168.34.4, from 2.2.2.2, 00:02:44 ago, via FastEthernet1/2
      Route metric is 102, traffic share count is 1

R3#

 

R4 kept the higher-cost path to R2 inside the backbone rather than crossing a non-backbone area.

Case10:

  • Traffic from R1 10.10.0.2 (area123) to R2 20.20.0.2 (backbone)
  • R4-R2 link down (no inter-area route to 20.20.0.2)
Figure13: Case10


R1#sh ip route 20.20.0.2
Routing entry for 20.20.0.2/32
  Known via "ospf 666", distance 110, metric 2, type inter area
  Last update from 192.168.12.2 on FastEthernet1/0, 00:00:02 ago
  Routing Descriptor Blocks:
  * 192.168.12.2, from 2.2.2.2, 00:00:02 ago, via FastEthernet1/0
      Route metric is 2, traffic share count is 1

R1#trace 20.20.0.2 source 10.10.0.2

Type escape sequence to abort.
Tracing the route to 20.20.0.2

  1 192.168.12.2 12 msec 8 msec 8 msec
R1#

R4#sh ip route 20.20.0.2
% Network not in table
R4#
R4#sh ip ospf data summ 20.20.0.2  

            OSPF Router with ID (4.4.4.4) (Process ID 666)
R4#
R3#sh ip route 20.20.0.2
% Network not in table
R3#sh ip ospf data summary  20.20.0.2

            OSPF Router with ID (3.3.3.3) (Process ID 666)

        Summary Net Link States (Area 123)

  LS age: 3429
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 20.20.0.2 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 8000001C
  Checksum: 0x17D7
  Length: 28
  Network Mask: /32
    TOS: 0     Metric: 1

R3#

Although R3 has received the summary LSA3 from R2 through the non-backbone area 123, it did not install the route in its routing table, even though the destination is reachable through R1.
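The behaviour observed on R3 can be sketched as a filter applied before inter-area routes are computed. This is a simplification under stated assumptions: `is_abr` stands for a router with an active backbone connection, and the names `SummaryLsa` and `usable_summaries` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

BACKBONE = 0


@dataclass(frozen=True)
class SummaryLsa:
    prefix: str
    advertising_router: str
    area: int      # area whose link-state database holds this LSA
    metric: int


def usable_summaries(lsas: List[SummaryLsa], is_abr: bool) -> List[SummaryLsa]:
    """Return the summary LSAs a router may use to compute inter-area routes.

    Simplified rule: an ABR only considers summaries found in the backbone
    database, while an internal router uses the summaries of its own area.
    """
    if is_abr:
        return [lsa for lsa in lsas if lsa.area == BACKBONE]
    return list(lsas)


if __name__ == "__main__":
    # The LSA displayed on R3 above: learned in area 123, advertised by R2.
    lsa = SummaryLsa("20.20.0.2/32", "2.2.2.2", area=123, metric=1)
    print("R3 (ABR):     ", usable_summaries([lsa], is_abr=True))
    print("R1 (internal):", usable_summaries([lsa], is_abr=False))
```

The same LSA is therefore usable by R1, an internal router of area 123, but ignored by R3, which is why the prefix sits in R3's database without appearing in its routing table.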

Case11:

  • Traffic between two non-backbone areas. From area123 to area25.
  • Default interface costs
Figure14: Case11


R1#sh ip route 50.20.0.5
Routing entry for 50.20.0.5/32
  Known via "ospf 666", distance 110, metric 3, type inter area
  Last update from 192.168.12.2 on FastEthernet1/0, 00:02:54 ago
  Routing Descriptor Blocks:
  * 192.168.12.2, from 2.2.2.2, 00:02:54 ago, via FastEthernet1/0
      Route metric is 3, traffic share count is 1

R1#trace 50.20.0.5 source 10.10.0.1

Type escape sequence to abort.
Tracing the route to 50.20.0.5

  1 192.168.12.2 16 msec 0 msec 8 msec
  2 192.168.25.5 20 msec 24 msec 32 msec
R1#

From R1, OSPF will choose the path with the lowest total cost within area 123, the backbone and area 25. This happens to be the path through R2, which is directly connected to area 25. This seems to contradict the rule that an inter-area route between two non-backbone areas must pass through the backbone, but it doesn’t, because the ABR R2 has an interface in the backbone.

Case12:

  • Traffic generated from R2: 20.10.0.2 (area 123) to R5 50.20.0.5 (area 25).
  • R2 fa1/2 cost = 100
Figure15: Case12


R2(config-if)#do sh ip route 50.20.0.5           
Routing entry for 50.20.0.5/32
  Known via "ospf 666", distance 110, metric 101, type intra area
  Last update from 192.168.25.5 on FastEthernet1/2, 00:04:03 ago
  Routing Descriptor Blocks:
  * 192.168.25.5, from 5.5.5.5, 00:04:03 ago, via FastEthernet1/2
      Route metric is 101, traffic share count is 1

R2(config-if)#
R2(config-if)#do trace 50.20.0.5 source 20.10.0.2

Type escape sequence to abort.
Tracing the route to 50.20.0.5

  1 192.168.25.5 20 msec 24 msec 20 msec
R2(config-if)#

Even though the cost of the direct link into area 25 is made worse (higher cost), OSPF on R2 still chooses the intra-area path rather than crossing the backbone.

Case13:

  • R2 fa1/1 Down
Figure16: Case13


R2#sh ip route 50.20.0.2
% Subnet not in table
R2#
R1#sh ip route 50.20.0.5           
Routing entry for 50.20.0.5/32
  Known via "ospf 666", distance 110, metric 4, type inter area
  Last update from 192.168.13.3 on FastEthernet1/1, 00:08:28 ago
  Routing Descriptor Blocks:
  * 192.168.13.3, from 3.3.3.3, 00:12:15 ago, via FastEthernet1/1
      Route metric is 4, traffic share count is 1

R1#trace 50.20.0.5 source 10.10.0.1

Type escape sequence to abort.
Tracing the route to 50.20.0.5

  1 192.168.13.3 12 msec 8 msec 8 msec
  2 192.168.34.4 16 msec 16 msec 20 msec
  3 192.168.45.5 20 msec 28 msec 28 msec
R1#

Note that, as soon as the R2 interface connected to the backbone goes down, R2 can no longer reach area 25, and R1 turns to the path advertised by R3.

Case14:

  • R2 fa1/1 Down
  • R1 fa1/1 Down
Figure17: Case14


R1#sh ip route 50.20.0.5           
% Network not in table
R1#t  

Even though the R1-R2 link is up and the R2-R5 link (area 25) is up, R1 cannot use this inter-area path because it does not cross the backbone (R2 no longer has even a connected interface in the backbone).

 

 

Administrative Distance, prefix length, metric… Who is the winner?


  • The Concept
  • Procedural tasks
  • Result table
  • Conclusion

The concept

The idea of the lab is to test the RIB best route election criteria of a border router. To do so, four overlapping subnets are configured in different parts of the network and available to a border router through different routing protocols. One of them is directly connected.

All prefixes are made available and reachable at the same time to see which one is elected as the best route. The winner is then removed from the competition by making the corresponding path unavailable, and the selection process is repeated until the last path.

One directly connected segment and three routing protocols, so four administrative distances: directly connected (AD=0), RIP (AD=120), OSPF (AD=110) and EIGRP internal (AD=90).

Each protocol has two unequal paths (different metrics) to reach the same prefix.

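Prefix masks are configured so that the higher the administrative distance of the source, the longer the mask: RIP (AD=120) advertises the /29, OSPF the /28, EIGRP the /27, and the directly connected segment uses the /26.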

Lab topology

6VPE MPLS

Procedural tasks

For each test case, the routing table is checked for the best route, a traceroute confirms the path, and the winning path is then made unavailable.




Result table

Classification  Mask length  Metric    AD   Prefix        Path  Routing protocol
4               28           110       110  192.168.1.64  A     OSPF
3               28           74        110  192.168.1.64  B     OSPF
1               29           1         120  192.168.1.64  C     RIP
2               29           2         120  192.168.1.64  D     RIP
6               27           32195456  90   192.168.1.64  E     EIGRP
5               27           2195456   90   192.168.1.64  F     EIGRP
7               26           0         0    192.168.1.64  G     Directly connected

The RIB looks at the mask length first. The directly connected prefix, which has the shortest mask, is considered last: the longer the mask, the more specific the prefix.

Conclusion

With the same prefix and different mask lengths, the border router considers the following criteria in order of preference:

  1. Longest mask among all routing protocols
  2. Lowest cost with the same routing protocol
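The selection order above can be replayed with a short sketch built on the values of the result table. It is an illustration, not the router's actual algorithm: the names `Route` and `best_route` are hypothetical, the destination 192.168.1.65 is an arbitrary address covered by all four prefixes, and the administrative distance tie-break between mask length and metric is an assumption that this table never exercises (each mask length belongs to a single source).

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    protocol: str
    ad: int
    metric: int
    path: str


# Values copied from the result table.
ROUTES = [
    Route("192.168.1.64/28", "OSPF", 110, 110, "A"),
    Route("192.168.1.64/28", "OSPF", 110, 74, "B"),
    Route("192.168.1.64/29", "RIP", 120, 1, "C"),
    Route("192.168.1.64/29", "RIP", 120, 2, "D"),
    Route("192.168.1.64/27", "EIGRP", 90, 32195456, "E"),
    Route("192.168.1.64/27", "EIGRP", 90, 2195456, "F"),
    Route("192.168.1.64/26", "connected", 0, 0, "G"),
]


def best_route(routes: List[Route], destination: str) -> Optional[Route]:
    """Pick the route used for destination: longest mask, then AD, then metric."""
    dest = ipaddress.ip_address(destination)
    matching = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not matching:
        return None
    return min(matching,
               key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, r.ad, r.metric))


if __name__ == "__main__":
    remaining = list(ROUTES)
    # Repeat the lab procedure: take the winner, make it unavailable, iterate.
    while remaining:
        winner = best_route(remaining, "192.168.1.65")
        print(winner.path, winner.prefix, winner.protocol, winner.metric)
        remaining.remove(winner)
```

Removing each winner in turn mirrors the procedure of the lab, where the winning path is made unavailable before the next check, and the resulting order follows the Classification column of the table.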