Discussion:
[ale] Fun routing
James Sumners
2017-10-11 20:06:03 UTC
I lied. I'm not having fun.

I have a system with three NICs: eth0, eth1, and eth2. This system is to be a
"load balancer," or, more accurately, a reverse proxy for many endpoints.
My desire is to make eth0 a "maintenance" NIC and bond eth1 & eth2 into the
primary service interface, bond0. I have three subnets in play: 10.0.1.0/24,
10.0.2.0/24, and 10.0.3.0/24. Pretend that 10.0.2/24 and 10.0.3/24 are
public, Internet-accessible subnets and that 10.0.1/24 is private. The proxy
will serve endpoints on all three of these subnets.

Okay, so let's set up the interfaces:

```
$ echo -e "1 service\n2 bond" >> /etc/iproute2/rt_tables

$ ip link set eth0 up
$ ip addr add 10.0.1.2/24 dev eth0
$ ip rule add iif eth0 prio 0 table service
$ ip route add to 0.0.0.0/0 via 10.0.1.1 dev eth0 prio 10000 table main
$ ip route add default via 10.0.1.1 dev eth0 table service
$ ip route flush cache

$ ip link add dev bond0 address 00:00:00:aa:bb:cc type bond
$ echo balance-alb > /sys/class/net/bond0/bonding/mode
$ echo 100 > /sys/class/net/bond0/bonding/miimon
$ ip link set eth1 master bond0
$ ip link set eth2 master bond0
$ ip link set bond0 up
$ ip addr add 10.0.2.2/24 dev bond0 # see note 1 below
$ ip rule add iif bond0 prio 0 table bond
$ ip route add default via 10.0.2.1 dev bond0 table bond
$ ip route flush cache
```

Cool. Now let's add an endpoint:

```
$ ip addr add 10.0.3.15/32 dev bond0
```

So, what's the problem? The switches see 10.0.3.15 as being associated with
eth0, so things don't work correctly. I can use tcpdump to monitor the
traffic on bond0, ping 10.0.3.15, and watch the traffic come in, but the
pinger never gets a pong.
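
A rough sketch of the checks I could run to confirm which MAC ends up answering
ARP for the VIP (the interface name on the remote host is just a placeholder):

```
# On some other host on the 10.0.3.0/24 network: which MAC answers for the VIP?
arping -c 3 -I eth0 10.0.3.15

# On the proxy: watch ARP on both interfaces and compare which one replies
tcpdump -eni eth0 arp
tcpdump -eni bond0 arp
```

If the replies carry eth0's MAC rather than bond0's, that would line up with
what the switches are learning.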

At this point I'm probably just going to say to hell with the maintenance
interface and put all traffic on the bond, routed with the main table.
But I figured I'd see if anyone has any guesses about why this
configuration isn't working. To the best of my knowledge, the following
should be true (a quick way to check each case is sketched after the list):

1. Traffic originating on the system will be routed through 10.0.1.1 via
the eth0 interface as per the "main" routing table.
2. Traffic originating remotely via 10.0.1.2 will route through 10.0.1.1
via the eth0 interface as per the "service" routing table.
3. Traffic originating remotely via 10.0.2.2 or 10.0.3.15 will route
through 10.0.2.1 via the bond0 interface as per the "bond" routing table.
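
A quick way to sanity-check those cases is `ip route get` with an explicit
source address. This is only a diagnostic sketch, with 192.0.2.10 standing in
for an arbitrary remote host:

```
# Policy rules in priority order (the built-in local-table rule also sits at prio 0)
ip rule show

# Case 1: locally originated traffic
ip route get 192.0.2.10

# Cases 2 and 3: locally generated replies sourced from each service address.
# Note that "iif <dev>" rules only match packets received on that interface;
# locally generated traffic is matched by "iif lo" instead.
ip route get 192.0.2.10 from 10.0.1.2
ip route get 192.0.2.10 from 10.0.2.2
ip route get 192.0.2.10 from 10.0.3.15
```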

Note 1: this is actually a pair of systems configured for failover with
Ucarp as provided by https://github.com/jsumners/ucarp-rhel7 . Ucarp needs
a "master IP" to tie the VIPs to.
--
James Sumners
http://james.sumners.info/ (technical profile)
http://jrfom.com/ (personal site)
http://haplo.bandcamp.com/ (music)
Ed Cashin
2017-10-11 20:47:53 UTC
One thing to bear in mind is that on many Linux systems in the default
configuration, ARP replies can be sent for an IP address that isn't actually
reachable via the interface the request came in on.

E.g., host A on networks 1 and 2 has interfaces A1 and A2. Host B is only
on network 2. Say A1 has IP 10.1.1.1 with MAC address aa:bb:cc:dd:ee:ff.

Now do "arping -I B2 10.1.1.1" on B. Even though that's the IP for A1,
which B cannot get to directly, A responds with an ARP reply that says
aa:bb:cc:dd:ee:ff has IP 10.1.1.1.
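
That default is governed by the arp_ignore sysctl (arp_announce is its
companion). A minimal sketch of tightening both on the proxy, assuming this
ARP behavior is in fact what the switches are reacting to:

```
# Only answer ARP when the target IP is configured on the interface the
# request arrived on (the default, 0, answers for any local address).
sysctl -w net.ipv4.conf.all.arp_ignore=1

# When sourcing ARP, prefer an address from the outgoing interface's
# own subnet rather than any local address.
sysctl -w net.ipv4.conf.all.arp_announce=2
```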
--
Ed Cashin <***@noserose.net>
James Sumners
2017-10-21 22:46:11 UTC
I didn't see this since ALE stuff started going to SPAM (that is being
resolved). I had to settle for putting everything on a single bond
interface and forgetting about a service interface. I had to get the
systems operational.
--
James Sumners
http://james.sumners.info/ (technical profile)
http://jrfom.com/ (personal site)
http://haplo.bandcamp.com/ (music)
James Sumners
2017-10-25 13:13:37 UTC
As I mentioned, I had to get this stuff into production so I ended up
settling for making the bond the only interface available. At the southwest
meetup last night we discussed this and I said I would post my scripts for
this configuration. There's nothing too special in them; here they are
(default RHEL `network` service is disabled; some formatting is off due to
Ansible templating):

/etc/systemd/system/bond.service:
```
[Unit]
Description=manual bond routing
Before=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/usr/local/bin/bond-down
ExecStart=/usr/local/bin/bond-up
ExecStop=/usr/local/bin/bond-down

[Install]
WantedBy=multi-user.target
```
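
Enabling the unit is just standard systemd usage (these commands are not part
of the posted files):

```
systemctl daemon-reload
systemctl enable --now bond.service
```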

/usr/local/bin/bond-up:
```
#!/bin/bash

if [ "$(whoami)" != "root" ]; then
echo "must be root"
exit 1
fi

echo "checking for existing bond device"
ip link show dev bond1 2>/dev/null
bond_added=$?

if [ ${bond_added} -eq 0 ]; then
echo "bond device already added, nothing to do"
exit 0
fi

echo "adding bond device"
ip link add dev bond1 address 00:c5:4c:29:11:a7 type bond

if [ $? -eq 0 ]; then
echo "setting bond mode and monitoring config"
echo balance-alb > /sys/class/net/bond1/bonding/mode && \
echo 100 > /sys/class/net/bond1/bonding/miimon

if [ ! $? -eq 0 ]; then
echo "failed to set bond mode/monitor config, aborting"
ip link del dev bond1 type bond
exit 1
fi

echo "enslaving nic em1 to bond bond1"
ip link set em1 down
ip link set em1 master bond1
if [ ! $? -eq 0 ]; then
echo "could not enslave em1"
ip link del dev bond1 type bond
exit 1
fi
echo "enslaving nic em2 to bond bond1"
ip link set em2 down
ip link set em2 master bond1
if [ ! $? -eq 0 ]; then
echo "could not enslave em2"
ip link del dev bond1 type bond
exit 1
fi
fi

echo "bringing bond1 up"
ip link set bond1 up

echo "adding bond ip address"
# This shouldn't be possible, but let's be thorough
ip addr show dev bond1 | grep 10.0.1.5
bond_ip_up=$?
if [ ${bond_ip_up} -eq 0 ]; then
  echo "bond ip already added, nothing to do"
else
  ip addr add 10.0.1.5/24 dev bond1
fi

echo "establishing bond routing"
ip route add default via 10.0.1.1 dev bond1
ip route flush cache

echo "bond bond1 created"
exit 0
```

/usr/local/bin/bond-down:
```
#!/bin/bash

if [ "$(whoami)" != "root" ]; then
echo "must be root"
exit 1
fi

echo "checking for bond device bond1"
ip link show dev bond1 2>/dev/null
bond_added=$?

# could not find the bond device so nothing to do
if [ ${bond_added} -eq 1 ]; then
  echo "no device to bring down"
  exit 0
fi

echo "freeing enslaved nic em1 from bond bond1"
ip link set em1 nomaster
echo "freeing enslaved nic em2 from bond bond1"
ip link set em2 nomaster

echo "removing bond device bond1"
ip link del dev bond1 type bond
exit $?
```
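
And a few sanity checks worth running once bond-up has done its thing (not
part of the scripts, just the usual views of the bond):

```
cat /proc/net/bonding/bond1   # bonding mode, MII status, and slave list
ip addr show dev bond1        # confirm 10.0.1.5/24 is on the bond
ip route show                 # the default route should be via 10.0.1.1 dev bond1
```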