Thursday 27 November 2014

Removing VLAN/MPLS/PPPoE/GRE/GTP/VXLAN Encapsulation Headers from pcap Files

Many years ago, when I worked in a school, I used to port mirror our proxy server to an old PC running driftnet and leave the screen where the kids could see it as a warning that staff could "see what you're doing on the Internet". I haven't played with driftnet since, but certainly at the time it could only handle native frames (no VLAN tags, certainly no MPLS or PPPoE). I vaguely remember some other tools being similar but unfortunately I can't remember which ones.

Looking at the analytics for this blog, I can see I'm not the only one who's had the problem. It's certainly not the number one issue that people are searching for when they get here but there have been a few and the thought occurred that the packet processing engine I wrote for dechap would be really good for this task - it already stripped back VLANs and MPLS, plus it knows how to detect PPPoE and L2TP.

After a couple of hours it was working to the point of being able to strip VLANs and MPLS off, and with a little more effort PPPoE also gave way. GRE came quite easily, too, as it has simple headers and uses the same etypes as Ethernet.

Anyway, here is "stripe" (from STRIP Encapsulation), a command line tool which takes a pcap file as input, re-assembles IP fragments and strips off all the encap it can (currently VLAN tags, MPLS shim headers, PPPoE, L2TP, GRE, GTP and VXLAN), then outputs another pcap containing just payload over Ethernet.

**UPDATE** - Version 0.3b now adds support for VXLAN.

Download


Stripe is available from my github: https://github.com/theclam/stripe

Usage


The command line is pretty straightforward, as shown in the online help:

Harrys-MacBook-Air:stripe foeh$ ./stripe
stripe: a utility to remove VLAN tags, MPLS shims, PPPoE, L2TP headers,
etc. from the frames in a PCAP file and return untagged IP over Ethernet.
Version v0.1 alpha, November 2014

Usage:
./stripe -r inputcapfile -w outputcapfile

Where inputcapfile is a tcpdump-style .cap file containing encapsulated IP 
outputcapfile is the file where the decapsulated IP will be saved

Harrys-MacBook-Air:stripe foeh$ 

Simply specify the files you want to read encapsulated packets from (-r) and write the cleaned up packets to (-w). Stripe will remove as many layers of encap as it can until you are left with straight payload over Ethernet.

How it Works


The majority of stripe's work is done by the "decap" function. This function takes in a block of memory, a length parameter, a data type hint and a frame template. The process runs as follows:


  1. If the type is Ethernet, populate the source / destination MACs of the frame template
  2. If the type has an Ethertype or protocol type field, use this to populate the ethertype of the frame template
  3. If the next protocol is possibly or definitely payload, set the payload pointer of the frame template to the address of the next protocol and return
  4. If the next protocol is possibly or definitely encapsulation, call decap against the remainder of the packet
So essentially it eats up encap, recording MACs and protocol types as it goes, until there is no more encap left. By the end there is a fully populated frame template with source and destination MAC (the innermost copy if there are multiple, as in the case of MPLS pseudowires), the ethertype of the payload and the payload itself. Piecing these together gives a minimally encapsulated frame, i.e. one with just an Ethernet header and payload.
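The recursion described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not stripe's actual code: the `FrameTemplate` class, the `decap` signature and the `rebuild` helper are hypothetical names, only Ethernet and 802.1Q VLAN tags are handled, and anything else is treated as payload.

```python
import struct
from dataclasses import dataclass

ETH_VLAN = 0x8100  # 802.1Q VLAN tag


@dataclass
class FrameTemplate:
    """Hypothetical stand-in for the frame template described above."""
    dst: bytes = b""
    src: bytes = b""
    etype: int = 0
    payload: bytes = b""


def decap(data: bytes, hint: str, tmpl: FrameTemplate) -> FrameTemplate:
    """Peel one header, record what it tells us, then recurse or stop."""
    if hint == "ethernet":
        # Record the MACs and the ethertype from the Ethernet header.
        tmpl.dst, tmpl.src = data[0:6], data[6:12]
        (tmpl.etype,) = struct.unpack("!H", data[12:14])
        rest = data[14:]
    elif hint == "vlan":
        # A VLAN tag is 2 bytes of tag control info, then the next ethertype.
        (tmpl.etype,) = struct.unpack("!H", data[2:4])
        rest = data[4:]
    else:
        raise ValueError(f"unknown hint: {hint}")

    if tmpl.etype == ETH_VLAN:
        # More encapsulation follows, so call decap on the remainder.
        return decap(rest, "vlan", tmpl)
    # In this sketch everything else counts as payload.
    tmpl.payload = rest
    return tmpl


def rebuild(tmpl: FrameTemplate) -> bytes:
    """Piece the template back together as plain payload over Ethernet."""
    return tmpl.dst + tmpl.src + struct.pack("!H", tmpl.etype) + tmpl.payload


# A made-up double-tagged frame carrying four bytes of dummy payload.
dst = bytes.fromhex("000102030405")
src = bytes.fromhex("001122334455")
tagged = dst + src + bytes.fromhex("8100" "0064" "8100" "00c8" "0800") + b"\x45\x00\x00\x14"
plain = rebuild(decap(tagged, "ethernet", FrameTemplate()))
print(plain.hex())
```

Both VLAN tags disappear and the output frame keeps the original MACs with the innermost ethertype (0x0800).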

Here is a worked example for a frame with VLAN, MPLS, GRE over IP and an IP payload:


Step 1 - The "decap" function is called on the entire frame. Since the first header is Ethernet, the frame template gets populated with the source / destination MACs and the etype from the Ethernet header. The frame template's length field gets populated with the size of the frame minus the Ethernet header and the payload pointer is adjusted to point at the next header. The decap function then calls itself on the remainder of the frame, hinting that the type is VLAN tag based on the current header's etype.


Step 2 - The decap function now considers the partial frame starting at the VLAN tag. Since the VLAN tag has an etype associated, the frame template's etype is overwritten with the one from the VLAN header. The length is overwritten with the length of the payload after the VLAN header and the pointer adjusted to point at the next header. The decap function then calls itself again with a hint of MPLS, based on the etype in the VLAN header.


Step 3 - The decap function now considers the partial frame starting at the MPLS label. Since the MPLS label is bottom of stack, we know there are no more MPLS labels left. Unfortunately there is no protocol type in an MPLS header (these are signaled on the control plane) so we have to take a peek at the byte immediately following the label. If we find a "4" or a "6" in the high order nibble then we have to guess that the next protocol is IPv4 or IPv6, respectively. If the following four bytes are all zeroes then we assume Ethernet over MPLS with control word, otherwise we assume Ethernet over MPLS without control word. In this case we find a 4 in the high nibble, so call decap with an "IP" hint.
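The guesswork in this step can be written down as a small Python sketch. The function names are hypothetical and the checks are applied in the order the text gives them, which is an assumption about how stripe itself orders them.

```python
def is_bottom_of_stack(label_entry: bytes) -> bool:
    """The S bit is the lowest bit of the third byte of a 4-byte MPLS label entry."""
    return bool(label_entry[2] & 0x01)


def guess_mpls_payload(after_label: bytes) -> str:
    """Guess what follows the bottom-of-stack label from its first bytes."""
    high_nibble = after_label[0] >> 4
    if high_nibble == 4:
        return "ipv4"
    if high_nibble == 6:
        return "ipv6"
    if after_label[:4] == b"\x00\x00\x00\x00":
        return "ethernet-with-control-word"
    return "ethernet"


# Label 100, bottom of stack, TTL 255, followed by the start of an IPv4 header.
entry = bytes.fromhex("000641ff")
print(is_bottom_of_stack(entry), guess_mpls_payload(bytes.fromhex("45000054")))
```

The heuristic can misfire: an Ethernet frame without a control word whose destination MAC starts with 4 or 6 looks like IP, which is why the text calls it a guess.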


Step 4 - The IP header tells us that GRE is the next protocol so for now nothing changes in the frame template (the remainder could be decodable or not). We just call decap again on the GRE part...



Step 5 - The GRE header is decoded and the etype is copied into the frame header. The length of the remaining payload is updated in the frame template and the pointer is adjusted. Decap is called on the next header, which is IP. When the decap function inspects the IP payload it can go no further and just returns the frame template.



In essence, the process has started with a deeply encapsulated frame and ended with IP over Ethernet. The source and destination MACs are taken from the innermost ones found (which in this case is the outermost Ethernet header) but with the etype changed to match the payload, which is the first non-encapsulating payload found in the frame, in this case the second IP.

References

https://tools.ietf.org/html/rfc2784
https://tools.ietf.org/html/rfc1701
http://www.ieee802.org/1/pages/802.1Q.html
http://www.3gpp.org/DynaReport/29060.htm

Friday 21 November 2014

Troubleshooting PPPoE Client on Cisco Routers

There are essentially a handful of stages involved in bringing up a PPPoE client session on a Cisco router, each of which could fail for a distinct set of reasons. This guide takes a walk through the entire process, step by step, highlighting the most common causes of problems at each stage.

Routing


Even though PPP itself is peer to peer, PPPoE is inherently client-server. That means that the connection has to be originated by the client and, in most cases, the client will only do that when it has some traffic to send over PPPoE. Therefore, the router must know the dialer as its next hop interface for some destination, i.e. it must have a route. It sounds trivial but it's surprising how often people go to all the trouble of putting in a perfectly good PPPoE config, then forget to put a default route in for the traffic!

More broadly, though, there are other things that could stop a route from being installed. For example, if you configure a dialer as a backup interface to another interface then there are some gotchas. Shutting down the primary will usually not enable its backup. You also still need a (static) route pointing traffic towards the dialer in order to make it dial - a step which is often forgotten. It's usually best to remove the backup interface configuration while testing the dialer, then re-apply it when that has been proven to work.

Dialer


Once traffic hits the dialer interface, the router will only attempt to bring up a PPPoE session if its dialer becomes activated. If a route exists but the dialer is not trying to connect then enable dialer debugging as follows:

Client#debug dialer packet
Dial on demand packets debugging is on
Client#debug dialer event
Dial on demand events debugging is on

Now generate some traffic that should bring up the link, for example by sending a ping:

Client#ping 8.8.8.8
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
*Nov  5 21:44:34.147: Di0 DDR: ip (s=10.10.10.10, d=8.8.8.8), 100 bytes, outgoing interesting (ip PERMIT)
*Nov  5 21:44:34.151: Di0 DDR: Cannot place call, no dialer string set.
*Nov  5 21:44:41.491: %DIALER-6-BIND: Interface Vi2 bound to profile Di0
*Nov  5 21:44:41.503: %LINK-3-UPDOWN: Interface Virtual-Access2, changed state to up
*Nov  5 21:44:41.503: Vi2 DDR: Dialer statechange to up
Client#
*Nov  5 21:44:42.195: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access2, changed state to up
Client#

*Nov  5 21:44:42.371: Vi2 DDR: dialer protocol up

When the "dialer protocol up" message is received, the dialer has been activated correctly and troubleshooting should move on to the PPPoE stage.

Common problems:

  • If no debug messages are generated at all:
    • Verify that traffic is being routed towards the dialer interface
    • Verify that the dialer-list is configured correctly and referenced by the dialer
    • Verify that the dialer's encapsulation is configured to ppp
  • Di0 DDR: Cannot place call, no dialer string set. 
    • Appears to be spurious, no dialer string is required for PPPoE and it dials anyway
  • Di0 DDR: ip (s=10.10.10.10, d=8.8.8.8), 100 bytes, outgoing uninteresting (no dialer-group defined).
    • Exactly what it says - dialer is not associated with a dialer list. Ensure a dialer-list is configured and is referred to by the dialer with the "dialer-group x" command.
  • Di0 DDR: ip (s=10.10.10.10, d=8.8.8.8), 100 bytes, outgoing uninteresting (dialer-list 2 not defined).
    • Again, exactly what it says. The dialer is associated with a non-existent dialer-list. Either re-point the dialer using the "dialer-group x" command or create a new dialer-list with the appropriate number.
  • Di0 DDR: ip (s=10.10.10.10, d=8.8.8.8), 100 bytes, outgoing interesting (ip PERMIT)
    • Repeated interesting traffic lines but no dialling can occur if the dialer references an empty dialer pool - ensure your PPPoE interface is configured with both "pppoe enable" and "pppoe-client dial-pool-number x". Also ensure that the interface is admin up.
    • Could also be due to PPPoE discovery phase failing, see below.
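When there is a lot of debug output to wade through, the checklist above can be turned into a simple lookup. The Python sketch below is only a convenience under stated assumptions: the patterns are copied from the debug lines quoted above, the advice strings paraphrase this section, and `triage` is a hypothetical helper rather than anything Cisco provides.

```python
import re

# (pattern, advice) pairs, checked in order; the first match wins.
HINTS = [
    (re.compile(r"no dialer-group defined"),
     "add 'dialer-group x' under the dialer"),
    (re.compile(r"dialer-list \d+ not defined"),
     "create the dialer-list or re-point the dialer-group"),
    (re.compile(r"Cannot place call, no dialer string set"),
     "appears to be spurious for PPPoE"),
    (re.compile(r"outgoing interesting"),
     "if repeated with no dialling, check the dial pool and PPPoE discovery"),
]


def triage(line: str) -> str:
    """Return the first piece of advice whose pattern matches the debug line."""
    for pattern, advice in HINTS:
        if pattern.search(line):
            return advice
    return "no known hint"


print(triage("Di0 DDR: ip (s=10.10.10.10, d=8.8.8.8), 100 bytes, "
             "outgoing uninteresting (dialer-list 2 not defined)."))
```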

PPPoE Discovery


In order to bring up a PPP session over Ethernet, a PPPoE session must be set up to create a point-to-point connection over a broadcast Ethernet network. This is established using PPPoE Active Discovery, where the PPPoE client (our router) searches for a PPPoE access concentrator which is willing to terminate its connection. This phase should operate as follows:

PPPoE Discovery Phase
The client sends a PPPoE Active Discovery Initiation (PADI) frame, asking any available access concentrators to make themselves known. Each access concentrator then responds with a PADO (offer) frame to indicate its availability. The client then sends a PADR (request) frame to its chosen access concentrator which, all being well, will respond with a PADS (session-confirmation) message to indicate that the PPPoE session is now up. At any time either device may issue a PADT (terminate) to close the PPPoE session.

To see what is happening at this stage, run the following command:

Client#debug pppoe packet

*Nov  5 20:38:52.107: pppoe_send_padi
contiguous pak, size 60
FF FF FF FF FF FF 00 01 02 03 04 05 88 63 11 09
00 00 00 10 01 01 00 00 01 03 00 08 2A 00 00 01
00 00 06 CD 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
*Nov  5 20:38:52.143: PPPoE 0: I PADO  R:0011.2233.4455 L:0001.0203.0405 Fa0/0
contiguous pak, size 66
00 01 02 03 04 05 00 11 22 33 44 55 88 63 11 07
00 00 00 2E 01 01 00 00 01 03 00 08 2A 00 00 01
00 00 06 CD 01 02 00 06 4C 61 62 2D 41 43 01 04
00 10 D5 60 38 B8 05 81 B6 69 29 1B 5E 82 77 A0
5E 91
*Nov  5 20:38:54.155: OUT PADR from PPPoE Session
contiguous pak, size 66
00 11 22 33 44 55 00 01 02 03 04 05 88 63 11 19
00 00 00 2E 01 03 00 08 2A 00 00 01 00 00 06 CD
01 02 00 06 4C 61 62 2D 41 43 01 04 00 10 D5 60
38 B8 05 81 B6 69 29 1B 5E 82 77 A0 5E 91 01 01
00 00
*Nov  5 20:38:54.355: PPPoE 14: I PADS  R:0011.2233.4455 L:0001.0203.0405 Fa0/0
contiguous pak, size 66
00 01 02 03 04 05 00 11 22 33 44 55 88 63 11 65
00 0E 00 2E 01 03 00 08 2A 00 00 01 00 00 06 CD
01 02 00 06 4C 61 62 2D 41 43 01 04 00 10 D5 60
38 B8 05 81 B6 69 29 1B 5E 82 77 A0 5E 91 01 01
00 00
*Nov  5 20:38:54.363: [0]PPPoE 0: O PADT  R:0000.0000.0000 L:0000.0000.0000 Fa0/0
contiguous pak, size 60
00 00 00 00 00 00 00 01 02 03 04 05 88 63 11 A7
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00

Client#


The mark of a successful PPPoE Discovery phase is that a PADS packet is received - at this point the PPPoE session is up and troubleshooting focus should shift to the PPP stage.
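The hex dumps in the debug are easier to follow once decoded. The Python sketch below parses a PPPoE discovery frame according to the RFC 2516 layout (ethertype 0x8863, then version/type, code, session ID, length and a list of tags) and runs it against the PADO bytes shown above. The function name `parse_discovery` is hypothetical and only the codes and tags seen in this capture are named.

```python
import struct

CODES = {0x09: "PADI", 0x07: "PADO", 0x19: "PADR", 0x65: "PADS", 0xA7: "PADT"}
TAGS = {0x0101: "Service-Name", 0x0102: "AC-Name",
        0x0103: "Host-Uniq", 0x0104: "AC-Cookie"}


def parse_discovery(frame: bytes):
    """Return (code name, session id, [(tag name, value), ...])."""
    etype, _ver_type, code, session_id, length = struct.unpack("!HBBHH", frame[12:20])
    if etype != 0x8863:
        raise ValueError("not a PPPoE discovery frame")
    tags = []
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        tag_type, tag_len = struct.unpack("!HH", frame[pos:pos + 4])
        value = frame[pos + 4:pos + 4 + tag_len]
        tags.append((TAGS.get(tag_type, hex(tag_type)), value))
        pos += 4 + tag_len
    return CODES.get(code, hex(code)), session_id, tags


# The PADO frame from the debug output above.
pado = bytes.fromhex(
    "00 01 02 03 04 05 00 11 22 33 44 55 88 63 11 07 "
    "00 00 00 2E 01 01 00 00 01 03 00 08 2A 00 00 01 "
    "00 00 06 CD 01 02 00 06 4C 61 62 2D 41 43 01 04 "
    "00 10 D5 60 38 B8 05 81 B6 69 29 1B 5E 82 77 A0 "
    "5E 91"
)
code, session_id, tags = parse_discovery(pado)
print(code, session_id)
for name, value in tags:
    print(name, value)
```

Decoded this way, the PADO carries an AC-Name tag of "Lab-AC", the same name that later appears in the CHAP challenge.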

Common problems:
  • Only PADI seen:
    • Layer 1 or 2 issue between client and server
    • PPPoE traffic being filtered between client and server, max sessions per MAC exceeded
    • Sometimes possible even if interface is admin down!
  • PADT messages received after PADI or PADR:
    • Restrictions on Access Concentrator (e.g. max sessions per MAC exceeded)
  • PADT message received after PADS:
    • Generally a problem further up the stack, continue troubleshooting
Note - PPPoE discovery may occur without the dialer even being activated.

PPP Negotiation (LCP)


With the PPPoE session up, Link Control Protocol (LCP) will attempt to negotiate the parameters for the actual PPP session. These include useful parameters such as the MRU and authentication type, plus potentially many less applicable parameters such as callback, compression or PPP multilink.

The process is that each side will send proposals (CONFiguration REQuests or CONFREQs) to the other indicating its preferred settings. The opposite device can then respond in one of the following ways:

  • Send a CONFiguration ACKnowledgement (CONFACK) to indicate agreement to the proposal
  • Send a CONFiguration Negative AcKnowledgement (CONFNAK) to indicate a particular setting should be changed and providing a suggested alternative value. If multiple values need to change, a single CONFNAK can carry several options, or the changes may be spread over successive rounds.
  • Send a CONFiguration REJect (CONFREJ) to indicate that either the type or the value of the parameter proposed is completely unacceptable.
Only once all the parameters are agreed between the two peers can the connection be established and higher layer protocols be negotiated. Bear in mind that PPP is inherently peer to peer, so both sides will play the roles of both requester and approver / rejecter.

To see what is happening at this stage, run the following command:

Client#debug ppp negotiation
PPP protocol negotiation debugging is on
*Nov  5 22:11:39.191: %DIALER-6-BIND: Interface Vi2 bound to profile Di0
*Nov  5 22:11:39.207: %LINK-3-UPDOWN: Interface Virtual-Access2, changed state to up
*Nov  5 22:11:39.211: Vi2 PPP: Sending cstate UP notification
*Nov  5 22:11:39.219: Vi2 PPP: Processing CstateUp message
*Nov  5 22:11:39.251: PPP: Alloc Context [670153F8]
*Nov  5 22:11:39.255: ppp22 PPP: Phase is ESTABLISHING
*Nov  5 22:11:39.259: Vi2 PPP: Using dialer call direction
*Nov  5 22:11:39.259: Vi2 PPP: Treating connection as a callout
*Nov  5 22:11:39.263: Vi2 PPP: Session handle[F4000016] Session id[22]
*Nov  5 22:11:39.263: Vi2 LCP: Event[OPEN] State[Initial to Starting]
*Nov  5 22:11:39.267: Vi2 PPP: No remote authentication for call-out
*Nov  5 22:11:39.267: Vi2 LCP: O CONFREQ [Starting] id 1 len 10
*Nov  5 22:11:39.271: Vi2 LCP:    MagicNumber 0x03BD43E3 (0x050603BD43E3)
*Nov  5 22:11:39.271: Vi2 LCP: Event[UP] State[Starting to REQsent]
*Nov  5 22:11:39.323: Vi2 LCP: I CONFREQ [REQsent] id 1 len 19
*Nov  5 22:11:39.323: Vi2 LCP:    MRU 1492 (0x010405D4)
*Nov  5 22:11:39.327: Vi2 LCP:    AuthProto CHAP (0x0305C22305)
*Nov  5 22:11:39.327: Vi2 LCP:    MagicNumber 0x02C2DFB3 (0x050602C2DFB3)
*Nov  5 22:11:39.331: Vi2 LCP: O CONFNAK [REQsent] id 1 len 8
*Nov  5 22:11:39.331: Vi2 LCP:    MRU 1500 (0x010405DC)
*Nov  5 22:11:39.335: Vi2 LCP: Event[Receive ConfReq-] State[REQsent to REQsent]
*Nov  5 22:11:39.395: Vi2 LCP: I CONFACK [REQsent] id 1 len 10
*Nov  5 22:11:39.395: Vi2 LCP:    MagicNumber 0x03BD43E3 (0x050603BD43E3)
*Nov  5 22:11:39.399: Vi2 LCP: Event[Receive ConfAck] State[REQsent to ACKrcvd]
*Nov  5 22:11:39.439: Vi2 LCP: I CONFREQ [ACKrcvd] id 2 len 19
*Nov  5 22:11:39.439: Vi2 LCP:    MRU 1500 (0x010405DC)
*Nov  5 22:11:39.443: Vi2 LCP:    AuthProto CHAP (0x0305C22305)
*Nov  5 22:11:39.443: Vi2 LCP:    MagicNumber 0x02C2DFB3 (0x050602C2DFB3)
*Nov  5 22:11:39.447: Vi2 LCP: O CONFACK [ACKrcvd] id 2 len 19
*Nov  5 22:11:39.447: Vi2 LCP:    MRU 1500 (0x010405DC)
*Nov  5 22:11:39.451: Vi2 LCP:    AuthProto CHAP (0x0305C22305)
*Nov  5 22:11:39.451: Vi2 LCP:    MagicNumber 0x02C2DFB3 (0x050602C2DFB3)
*Nov  5 22:11:39.455: Vi2 LCP: Event[Receive ConfReq+] State[ACKrcvd to Open]
*Nov  5 22:11:39.467: Vi2 PPP: Phase is AUTHENTICATING, by the peer
*Nov  5 22:11:39.467: Vi2 LCP: State is Open
Client#

To explain the above transaction, note that it contains two interleaved conversations, one in each direction. "O" indicates an outbound frame, "I" an inbound frame.

The first conversation (O CONFREQ id 1, answered by I CONFACK id 1) is what we (the client) are proposing to the access concentrator. The client sends an essentially empty proposal with only a magic number (used for loop detection). The access concentrator responds with an acknowledgement; after all, there's nothing to argue about!

The second conversation (I CONFREQ id 1 and id 2, where the access concentrator is proposing settings) is slightly more interesting. The first proposal contains a proposed maximum receive unit (MRU) size of 1492 and a proposal to use CHAP authentication. In the next frame our client sends a NAK message to indicate it would prefer the access concentrator used an MRU of 1500. Following that, the access concentrator sends a new proposal with an MRU of 1500 and CHAP authentication, which our client then acknowledges.

Now that both sides are in agreement, the state changes to "Open", which is PPP talk for "up".
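The hex strings in brackets in the debug are the raw LCP options in type / length / value form. As a rough aid, the Python sketch below decodes the three option types that appear in this capture (1 = MRU, 3 = authentication protocol, 5 = magic number). The function name `decode_lcp_option` is hypothetical and other option types are simply reported by number.

```python
def decode_lcp_option(hex_str: str) -> str:
    """Decode one LCP option as printed by 'debug ppp negotiation'."""
    digits = hex_str[2:] if hex_str.lower().startswith("0x") else hex_str
    raw = bytes.fromhex(digits)
    opt_type, opt_len = raw[0], raw[1]
    value = raw[2:opt_len]  # the length byte covers the type and length bytes too
    if opt_type == 1:
        return f"MRU {int.from_bytes(value, 'big')}"
    if opt_type == 3:
        proto = int.from_bytes(value[:2], "big")
        name = {0xC223: "CHAP", 0xC023: "PAP"}.get(proto, hex(proto))
        return f"AuthProto {name}"
    if opt_type == 5:
        return f"MagicNumber 0x{value.hex().upper()}"
    return f"Option {opt_type}"


for option in ("0x010405D4", "0x010405DC", "0x0305C22305", "0x050603BD43E3"):
    print(option, "->", decode_lcp_option(option))
```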

Common Problems:

  • MRU mismatch - many access concentrators are strictly RFC 2516 compliant and allow a maximum MRU of 1492. This is because 1492 bytes IP + 6 bytes PPPoE + 2 bytes PPP is the largest that can fit inside a standard 1500 byte Ethernet payload. It may be necessary to tweak the MTU on the Ethernet interface using the "pppoe-client ppp-max-payload xxxx" command.
  • Authentication type mismatch - if one peer is set for CHAP only while the other is set to PAP only or no authentication, they are not going to talk. A common mistake is forgetting the authentication callin option, which means that the client asks the server to authenticate itself - this is almost invariably not what you want.
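The MRU arithmetic from the first bullet, as a quick Python check. The constant and function names are illustrative only.

```python
PPPOE_HEADER = 6  # version/type, code, session ID and length fields
PPP_PROTOCOL = 2  # PPP protocol ID field


def max_mru(ethernet_payload: int = 1500) -> int:
    """Largest PPP payload that fits in the given Ethernet payload size."""
    return ethernet_payload - PPPOE_HEADER - PPP_PROTOCOL


def ethernet_payload_needed(mru: int) -> int:
    """Ethernet payload size required to carry the given MRU over PPPoE."""
    return mru + PPPOE_HEADER + PPP_PROTOCOL


print(max_mru())                      # 1492
print(ethernet_payload_needed(1500))  # 1508
```

In other words, an MRU of 1500 only works if every Ethernet hop between client and access concentrator can carry a 1508 byte payload.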
General Note:

If you examine the debug output, it will be clear what the local device is saying (marked with "O" for output) and what the other end is saying (marked with "I" for input). Whichever options are being rejected (CONFREJ'd) will be at the root of the problem - just work out which end is rejecting what and the rest should fall into place.

PPP Authentication


PPP has the ability to authenticate either, both or neither of the peers. In a typical deployment, the access concentrator will require the client to authenticate, but will refuse to authenticate itself to the client. This is typically done using CHAP as in the example below (output from "debug ppp negotiation"):

*Nov  5 22:11:39.259: Vi2 PPP: Treating connection as a callout
*Nov  5 22:11:39.263: Vi2 PPP: Session handle[F4000016] Session id[22]
*Nov  5 22:11:39.263: Vi2 LCP: Event[OPEN] State[Initial to Starting]
*Nov  5 22:11:39.267: Vi2 PPP: No remote authentication for call-out
-- SNIP --
*Nov  5 22:11:39.467: Vi2 PPP: Phase is AUTHENTICATING, by the peer
*Nov  5 22:11:39.527: Vi2 CHAP: I CHALLENGE id 1 len 27 from "Lab-AC"
*Nov  5 22:11:39.543: Vi2 CHAP: Using hostname from interface CHAP
*Nov  5 22:11:39.543: Vi2 CHAP: Using password from interface CHAP
*Nov  5 22:11:39.543: Vi2 CHAP: O RESPONSE id 1 len 30 from "pppoeuser"
*Nov  5 22:11:39.835: Vi2 CHAP: I SUCCESS id 1 len 4
*Nov  5 22:11:39.839: Vi2 PPP: Phase is FORWARDING, Attempting Forward

The output clearly shows that this connection is considered to be a "callout", i.e. we are the initiating party. Next, the debug informs us that we do not require the remote party to authenticate.

Following that, the peer (the access concentrator) asks us to authenticate. It sends us a "CHALLENGE", we send a "RESPONSE", then it sends us a "SUCCESS" message, indicating that our credentials were accepted.
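For reference, the CHAP response value is an MD5 hash over the packet identifier, the shared secret and the challenge (RFC 1994). The Python sketch below builds a response packet using that rule. The secret and the challenge bytes are made up; only the identifier (1) and the name ("pppoeuser") come from the debug above, and the function names are hypothetical.

```python
import hashlib
import struct

CHAP_RESPONSE = 2  # CHAP code for a Response packet


def chap_md5_value(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """MD5 over identifier + secret + challenge, as defined for CHAP with MD5."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def chap_response_packet(identifier: int, secret: bytes,
                         challenge: bytes, name: bytes) -> bytes:
    """Code, id, length, value-size, value, then the name."""
    value = chap_md5_value(identifier, secret, challenge)
    length = 4 + 1 + len(value) + len(name)
    header = struct.pack("!BBH", CHAP_RESPONSE, identifier, length)
    return header + bytes([len(value)]) + value + name


# Hypothetical secret and an all-zero 16 byte challenge, for illustration only.
packet = chap_response_packet(1, b"example-secret", bytes(16), b"pppoeuser")
print(len(packet))  # 30
```

The length works out at 30 bytes, matching the "O RESPONSE id 1 len 30" line in the debug. The secret itself never crosses the wire, which is why a wrong password shows up only as a FAILURE from the peer.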

Common problems:


  • It is worth noting the lines which state we are using the hostname and password from interface CHAP. This means that the hostname (in practice essentially a username) and password are configured under the dialer interface with the "ppp chap hostname xxx" and "ppp chap password xxxx" commands. If these are not specified, the router will use its actual hostname and the password will be taken from the local user database, under a user named after the peer's hostname. Usually that's not what you want.
  • A response of "CHAP: I FAILURE id 1 len 25 msg is "Authentication failed"" means exactly what it looks like it means. Check both the username and password are configured correctly.
  • A response of "CHAP: Unable to authenticate for peer" indicates that the device does not know what password to use to authenticate with the peer. This can be because a "ppp chap hostname" is configured but a "ppp chap password" is not, because there is not even a "ppp chap hostname" configured or, where local usernames are being used, because there's no local username which matches the AC's hostname.

IPCP


Once the PPP session has been brought up (negotiated through LCP), the next stage is to negotiate each of the protocols that will run through the PPP tunnel. Normally this is just IPv4, which is negotiated using IPCP, but there are also IPv6CP, CDPCP and so on, collectively known as Network Control Protocols or NCPs. Below is some example debug (output from "debug ppp negotiation"). The AC's IP negotiation is the I CONFREQ / O CONFACK pair for address 1.1.1.1, while the client's IP negotiation is the O CONFREQ / I CONFNAK / O CONFREQ / I CONFACK sequence which ends with address 172.16.0.6:

*Nov  5 22:11:39.867: Vi2 PPP: Queue IPCP code[1] id[1]
*Nov  5 22:11:39.883: Vi2 PPP: Phase is ESTABLISHING, Finish LCP
*Nov  5 22:11:39.887: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access2, changed state to up
*Nov  5 22:11:39.899: Vi2 PPP: Phase is UP
*Nov  5 22:11:39.903: Vi2 IPCP: Protocol configured, start CP. state[Initial]
*Nov  5 22:11:39.903: Vi2 IPCP: Event[OPEN] State[Initial to Starting]
*Nov  5 22:11:39.907: Vi2 IPCP: O CONFREQ [Starting] id 1 len 10
*Nov  5 22:11:39.907: Vi2 IPCP:    Address 0.0.0.0 (0x030600000000)
*Nov  5 22:11:39.911: Vi2 IPCP: Event[UP] State[Starting to REQsent]
*Nov  5 22:11:39.915: Vi2 PPP: Process pending ncp packets
*Nov  5 22:11:39.915: Vi2 IPCP: Redirect packet to Vi2
*Nov  5 22:11:39.915: Vi2 IPCP: I CONFREQ [REQsent] id 1 len 10
*Nov  5 22:11:39.919: Vi2 IPCP:    Address 1.1.1.1 (0x030601010101)
*Nov  5 22:11:39.923: Vi2 IPCP: O CONFACK [REQsent] id 1 len 10
*Nov  5 22:11:39.923: Vi2 IPCP:    Address 1.1.1.1 (0x030601010101)
*Nov  5 22:11:39.927: Vi2 IPCP: Event[Receive ConfReq+] State[REQsent to ACKsent]
*Nov  5 22:11:39.987: Vi2 IPCP: I CONFNAK [ACKsent] id 1 len 10
*Nov  5 22:11:39.987: Vi2 IPCP:    Address 172.16.0.6 (0x0306AC100006)
*Nov  5 22:11:39.991: Vi2 IPCP: O CONFREQ [ACKsent] id 2 len 10
*Nov  5 22:11:39.991: Vi2 IPCP:    Address 172.16.0.6 (0x0306AC100006)
*Nov  5 22:11:39.995: Vi2 IPCP: Event[Receive ConfNak/Rej] State[ACKsent to ACKsent]
*Nov  5 22:11:40.055: Vi2 IPCP: I CONFACK [ACKsent] id 2 len 10
*Nov  5 22:11:40.059: Vi2 IPCP:    Address 172.16.0.6 (0x0306AC100006)
*Nov  5 22:11:40.059: Vi2 IPCP: Event[Receive ConfAck] State[ACKsent to Open]
*Nov  5 22:11:40.071: Vi2 IPCP: State is Open
*Nov  5 22:11:40.075: Di0 IPCP: Install negotiated IP interface address 172.16.0.6
*Nov  5 22:11:40.131: Di0 Added to neighbor route AVL tree: topoid 0, address 1.1.1.1
*Nov  5 22:11:40.135: Di0 IPCP: Install route to 1.1.1.1
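As with LCP, the bracketed hex in the IPCP lines is a type / length / value option. A minimal Python sketch for the IP-Address option (type 3, length 6) seen above follows; `decode_ipcp_address` is a hypothetical helper name.

```python
import ipaddress


def decode_ipcp_address(hex_str: str) -> str:
    """Decode an IPCP IP-Address option such as 0x0306AC100006."""
    digits = hex_str[2:] if hex_str.lower().startswith("0x") else hex_str
    raw = bytes.fromhex(digits)
    opt_type, opt_len = raw[0], raw[1]
    if opt_type != 3 or opt_len != 6:
        raise ValueError("not an IPCP IP-Address option")
    return str(ipaddress.IPv4Address(raw[2:6]))


for option in ("0x030600000000", "0x030601010101", "0x0306AC100006"):
    print(option, "->", decode_ipcp_address(option))
```

The client's opening request of 0.0.0.0 is how it asks to be given an address, and the CONFNAK from the AC carries the address it should use instead.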

Common Problems:


  • If no IPCP appears at all, it could be that both ends have "ppp ncp passive" set.
  • If you see a message similar to "O PROTREJ [Open] id 2 len 16 protocol IPCP (0x0101000C030601010101)" this is because IP is not configured on the local dialer interface. Usually you just want to add "ip address negotiated" under the dialer to fix this.

At this point your session is up and you should be able to pass traffic OK. If you're still looking, it may help to read my blog post on the theory behind bringing up a PPPoE session.

References


http://tools.ietf.org/html/rfc2516 (PPPoE)
http://tools.ietf.org/html/rfc4638 (PPPoE large MRU)
http://tools.ietf.org/html/rfc1661 (PPP)


Thursday 13 November 2014

Basic Internet Connectivity Setup Using HWIC-3G-GSM Card

In addition to working on some spanking new 4G Cisco 819 devices, I've occasionally had to slum it by providing Internet access with a normal 1800 / 2800 series router and an HWIC-3G-GSM card. Once you know what's involved the config is remarkably simple but it can be difficult to understand what's what at first.

The video below takes a very quick walk through setting this up. Some further explanation is given beneath that for those interested.


Building Blocks


While there are a few mobile-specific pieces of configuration, anyone who has previously worked on ISDN, async modems or ADSL on Cisco routers will probably find a lot of familiar concepts. Here are the main elements of a 3G / 4G configuration:

Cellular Profile 


This is where the APN address and authentication mode are configured. These are saved to the modem's NVRAM as soon as they are applied. Here's an example of how to set a cellular profile on the two different platforms:

Router#cell 0/0/0 gsm profile create 1 three.co.uk
Profile 1 will be created with the following values:                            
PDP type = IPv4                                                                 
APN = three.co.uk                                                               
Are you sure? [confirm]                                                         

Profile 1 written to modem                                                      
Router#

The number "1" here indicates which profile slot on the modem will be used to store the details. This is significant later on because there may be multiple APNs configured and the router needs to know which to use when connecting.

This example is for Three, a UK mobile carrier, which is interesting because the APN uses no authentication. If your APN requires authentication, simply follow the APN with either a pap or chap keyword, the username and finally the password.

Note that this is applied at the exec prompt rather than in config mode.



Cellular Interface


The physical radio interfaces are referred to using the "Cellular" prefix, in this case Cellular0/0/0. The Cellular interface is where the dialer, authentication and IP details are normally configured - I say normally as there are many different ways to configure dialers depending on what kind of load balancing and resilience are required. For a typical 3G deployment, though, you will only have one physical interface and so the simplest way is to forget about pools and put the config straight onto that.

Here is an example configuration showing the key elements:

interface Cellular0/0/0
 ip address negotiated
 encapsulation ppp
 dialer in-band
 dialer string "*98*1#"
 dialer-group 1
 ppp chap refuse
!
dialer-list 1 protocol ip permit
ip route 0.0.0.0 0.0.0.0 Cell0/0/0

The key thing to notice here is the "*98*1#" dialer string. The "*98*" and "#" are fixed, the "1" refers to the profile slot number used earlier. If you used a different slot, refer to it here.

The rest is fairly standard dialer stuff. In this example I've made the dialer-list so that any IP traffic will cause it to connect.

Sundry Config

At this point the router should be able to connect to the cellular network. For most purposes, though, you will need to either set up NAT or some sort of VPN tunnel for the connection to be of any use. These are set up the same way as for any other setup.

Testing and Diagnostics

How can you tell whether the cellular connection is coming up? The first clue is that log entries similar to the following should appear:

%LINK-3-UPDOWN: Interface Cellular0/0/0, changed state to up

To check whether the modem is attached to the radio network, use the following commands:

Router#show cell 0/0/0 network                                                  
Current Service Status = Normal, Service Error = None                           
Current Service = Combined                                                      
Packet Service = HSDPA (Attached)                                               
Packet Session Status = Active                                                  
Current Roaming Status = Home                                                   
Network Selection Mode = Automatic                                              
Country = GBR, Network = 3 UK                                                   
Mobile Country Code (MCC) = 234                                                 
Mobile Network Code (MNC) = 20                                                  
Location Area Code (LAC) = 24                                                   
Routing Area Code (RAC) = 24                                                    
Cell ID = 14827                                                                 
Primary Scrambling Code = 81                                                    
PLMN Selection = Automatic                                                      
Registered PLMN = 3 , Abbreviated =                                             
Service Provider =                                                              
Router#show cell 0/0/0 radio                                                    
Radio power mode = ON                                                           
Current Band = WCDMA 2100, Channel Number = 10564                               
Current RSSI = -76 dBm                                                          
Band Selected = Auto                                                            
Number of nearby cells = 1                                                      
Cell 1

        Primary Scrambling Code = 0x51
        RSCP = -77 dBm, ECIO = -0 dBm           
                                                                                
Router#

Note that the band and channel need to be populated, the network should display the expected carrier name and the packet service should show as attached. The actual band and service type will vary depending on carrier, coverage, area and equipment used.
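If you collect this output from several routers, the checks just described are easy to script. The Python sketch below is only an illustration: it assumes the "key = value" layout shown above, and the helper names `parse_show_cell` and `looks_attached` are hypothetical.

```python
def parse_show_cell(text: str) -> dict:
    """Turn 'Key = value, Key = value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        for part in line.split(","):
            if "=" in part:
                key, _, value = part.partition("=")
                fields[key.strip()] = value.strip()
    return fields


def looks_attached(fields: dict) -> bool:
    """Band populated, carrier name present and packet service attached."""
    return (
        "Attached" in fields.get("Packet Service", "")
        and fields.get("Current Band", "") != ""
        and fields.get("Network", "") != ""
    )


# A few lines taken from the outputs above.
sample = """\
Packet Service = HSDPA (Attached)
Country = GBR, Network = 3 UK
Current Band = WCDMA 2100, Channel Number = 10564
Current RSSI = -76 dBm
"""
fields = parse_show_cell(sample)
print(fields["Network"], fields["Channel Number"], looks_attached(fields))
```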

If the network status remains in "Emergency Only" and you get no MSISDN showing in your show cell 0/0/0 hardware command, particularly if it is accompanied by messages saying "%CELLWAN-2-SIM_LOCKED: [Cellular0/0/0]: SIM is locked", then you have probably locked the SIM (i.e. by setting up a startup PIN on a phone handset) and will need to unlock it as follows:

Router #cell 0/1/0 gsm sim unlock 1234
!!!WARNING: SIM will be unlocked with pin=1234(4).
Do not enter new PIN to unlock SIM. Enter PIN that the SIM is configured with.
Call will be disconnected!!!
Are you sure you want to proceed?[confirm]

*Dec  7 22:15:38.035: %LINK-3-UPDOWN: Interface Cellular0/0/0, changed state to up

Router#

If the radio interface is up but a data connection cannot be established then all the usual debugs may be used:

debug dialer (to verify it is trying to dial)
debug chat (sometimes useful to deduce whether APN is configured correctly)
debug ppp negotiation (shows the PPP negotiation process from agreeing basic link properties and authentication type, through the authenticating stage and up to IP being allocated)

A full deep-dive into these debugs wouldn't really be appropriate for this post and, in any case, it's usually fairly evident where the problem lies. UPDATE: The promised dialer / PPP debugging guide is available here - it's written for PPPoE but the vast majority of it is applicable to cellular interfaces as well.

References

Video accompanying this blog post

Monday 6 October 2014

Quiet Mode on ME3x00 Platforms

The Cisco ME3x00 range of devices comes with automatic lockout for SSH as default. If too many bad login attempts are made within a short period of time, the device will go into a locked down state called "quiet mode" which blocks any new management connections.

In a default configuration, entering quiet mode causes the device to completely refuse any and all new telnet, SSH and port 80 sessions directed towards it. Existing (open) sessions are not affected. If you've ever tried a couple of credentials out, then suddenly started getting "connection refused", you have probably run into this feature!

The default threshold and timer values are as follows:

  • An artificial 1 second delay is added to each login
  • 5 bad logins within 60 seconds triggers a lockout
  • lockout lasts 5 minutes

If that's all you needed to know then I suppose you can go now :)

If you're interested in checking / tweaking the settings, read on.

The commands to view and edit settings relevant to this feature all centre around "login", for example:


ME3x00#show login
     A default login delay of 1 seconds is applied.
     No Quiet-Mode access list has been configured.

     Router enabled to watch for login Attacks.
     If more than 5 login failures occur in 60 seconds or less,
     logins will be disabled for 300 seconds.

     Router presently in Normal-Mode.
     Current Watch Window remaining time 37 seconds.
     Present login failure count 1.

ME3x00#

One thing to note here is that if you try to authenticate with public keys (like openssh and others do by default) it will refuse and count that as a failure. If you bail out at the password prompt (i.e. ctrl-C or let it time out) then that also counts as a failure. It doesn't take long to get to 5! Attempts blocked by the VTY ACLs don't count.

If you would like to force a longer delay between login attempts, you can adjust the value (between 1 and 10 seconds) under config mode as follows:

ME3x00(config)# login delay 5

The lockout thresholds can easily be changed using the following command:

ME3x00(config)# login block-for 120 attempts 10 within 60

This example would cause the device to trigger a two minute lockdown into quiet mode if it saw 10 failed logins within 60 seconds.
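To get a feel for how the two thresholds interact, here is a rough Python model of the trigger. It is only a sketch: it assumes a sliding window that fires once "attempts" failures fall within "within" seconds, which may not match exactly how IOS counts (the function name and the counting rule are mine, not Cisco's).

```python
from collections import deque

def quiet_mode_start(failure_times, attempts=10, within=60):
    """Return the time of the failure that would trigger quiet mode, or None.

    Assumption: a sliding window, triggered when `attempts` failures fall
    within `within` seconds. The real IOS counting rules may differ.
    """
    window = deque()
    for t in sorted(failure_times):
        window.append(t)
        # Discard failures that are older than the watch window.
        while t - window[0] > within:
            window.popleft()
        if len(window) >= attempts:
            return t
    return None

# Ten failures five seconds apart all land inside one 60 second window.
print(quiet_mode_start(range(0, 50, 5)))    # 45
# Ten failures ten seconds apart never fit ten into 60 seconds.
print(quiet_mode_start(range(0, 100, 10)))  # None
```

In other words, slow and steady guessing stays under the radar with these values, which is worth bearing in mind when choosing them.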

Now the 5 minute default lockout is quite a long time and being able to force a lockdown could be advantageous to attackers - if you can prevent an administrator from being able to log into a device then it makes it far more difficult for him to detect and / or mitigate attacks on the network. Luckily there is a feature available to effectively whitelist management traffic that should always be allowed through, even when the device goes into lockdown. This comes in the form of the "login quiet-mode access-class", which basically decides what access controls are put in place when the device enters quiet mode. By default, the quiet-mode access-class is set to "sl_def_acl", which is an omnipresent ACL which looks like this:

ME3x00#show ip access-list sl_def_acl
Extended IP access list sl_def_acl
    10 deny tcp any any eq telnet (7 matches)
    20 deny tcp any any eq www
    30 deny tcp any any eq 22 (72 matches)
    40 permit ip any any

ME3x00#

In order to provide back door access when the device enters quiet mode, simply define an ACL which permits the desired traffic but blocks all other management traffic, such as:

ME3x00(config)#ip access-list extended quiet_mode_access
ME3x00(config-ext-nacl)#permit tcp host 10.1.1.1 any eq 22
ME3x00(config-ext-nacl)#deny tcp any any eq telnet
ME3x00(config-ext-nacl)#deny tcp any any eq www
ME3x00(config-ext-nacl)#deny tcp any any eq 22 
ME3x00(config-ext-nacl)#exit

Then apply it to the box using:

ME3x00(config)#login quiet-mode access-class quiet_mode_access 

That way a dedicated management box (10.1.1.1) will always be able to connect and manage the device, even if it is in quiet mode, while everything else will be locked out.
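If it helps to see why the order of the entries matters, here is a small Python sketch of first-match evaluation against the quiet_mode_access entries above. It is a simplified model (TCP destination port and source address only) and the fall-through to an implicit deny is an assumption of the sketch rather than something I've verified for this feature.

```python
import ipaddress

# (action, source, destination TCP port) in the order configured above.
# "any" matches every source address.
QUIET_MODE_ACCESS = [
    ("permit", "10.1.1.1", 22),
    ("deny", "any", 23),   # telnet
    ("deny", "any", 80),   # www
    ("deny", "any", 22),
]

def evaluate(acl, src, dport):
    """Return the action of the first matching entry (first match wins)."""
    for action, source, port in acl:
        src_ok = source == "any" or (
            ipaddress.ip_address(source) == ipaddress.ip_address(src)
        )
        if src_ok and port == dport:
            return action
    # Assumption: nothing matched, so the usual implicit deny applies.
    return "deny"

print(evaluate(QUIET_MODE_ACCESS, "10.1.1.1", 22))     # permit
print(evaluate(QUIET_MODE_ACCESS, "10.10.10.10", 22))  # deny
```

The permit for the management host has to come before the blanket deny for port 22, otherwise it would never be reached.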

Logging


The device also generates some helpful syslog messages to tell you what it's doing, for example here is my device going into quiet mode:

%SEC_LOGIN-1-QUIET_MODE_ON: Still timeleft for watching failures is 22 secs, [user: billybob] [Source: 10.10.10.10] [localport: 22] [Reason: Login Authentication Failed] [ACL: sl_def_acl] at 16:30:00 BST Sun Oct 5 2014

And coming out again:

%SEC_LOGIN-5-QUIET_MODE_OFF: Quiet Mode is OFF, because block period timed out at 16:35:00 BST Sun Oct 5 2014

References


Cisco IOS Security Configuration Guide

Sunday 28 September 2014

Configuring Basic 4G LTE Connectivity on the Cisco 819 Router

I've recently had the mixed fortune to have set up a couple of Cisco routers for 3G and 4G data services. It turns out to be surprisingly simple, although I found myself having to flit around between a handful of different documents to work out how to get it working. Luckily I was sat next to someone who previously worked in one of the UK's biggest mobile carriers while I was working on it, which saved me a bit of head scratching at times.

As a little bonus I had two different devices to work on - the first being an old Cisco 1841 router with a 3G WIC installed, the second being a Cisco 819 (4G LTE model). As it turns out the concepts are pretty similar but the syntax is moderately different between the two platforms, so in time I'll write up the process for each. This post and its accompanying video will explain the 4G version.



Building Blocks


While there are a few mobile-specific pieces of configuration, anyone who has previously worked on ISDN, async modems or ADSL on Cisco routers will probably find a lot of familiar concepts. Here are the main elements of a 3G / 4G configuration:

Cellular Profile 


This is where the APN address and authentication mode are configured. These are saved to the modem's NVRAM as soon as they are applied. Here's an example of how to set a cellular profile on the 819:

Router#cell 0 lte profile create 1 three.co.uk none

PDP Type = IPv4
Access Point Name (APN) =
Username =
Password =
Authentication = NONE

Profile 1 already exists with above parameters. Do you want to overwrite? [confirm]

Profile 1 will be overwritten with the following values:

PDP type = IPv4
APN = three.co.uk
Username =
Password =
Authentication = NONE

Are you sure? [confirm]
Profile 1 written to modem
Router#


Note that this example is for Three, a UK mobile carrier, which is interesting because the APN uses no authentication (and barfs if you try to authenticate with it). I found during my testing that an 819 router running IOS 15.2 does not have the option to use authentication type "none". Under 15.3 the option is there and works fine - luckily I had another 819 with 15.3 installed which worked and so a) I knew that's what the problem was and b) I could copy the image across!

Also note that this is applied at the exec prompt rather than in config mode.

Most carriers use CHAP authentication. These just require the authentication type and credentials to be added to the command, for example:

cell 0 lte profile create 1 everywhere chap eesecure secure

The number "1" here indicates which slot on the modem will be used to store the profile. This is significant later on because there may be multiple APNs configured and the router needs to know which to use when connecting.

Cellular Interface


The physical radio interfaces are referred to using the "Cellular" prefix, in this case Cellular0. The Cellular interface is where the dialer, authentication and IP details are normally configured - I say normally as there are many different ways to configure dialers depending on what kind of load balancing and resilience are required. For a typical 3G / 4G deployment, though, you will only have one physical interface and so the simplest way is to forget about pools and put the config straight onto that.

Here is an example configuration showing the key elements:

interface Cellular0
 ip address negotiated
 encapsulation slip
 dialer in-band
 dialer-group 1
!
dialer-list 1 protocol ip permit
ip route 0.0.0.0 0.0.0.0 Cell0


The key thing to notice here is that unlike the old 3G config there is no "*98*1#" type dialer string. If you want to use alternative profiles you have to mess around with config under the "controller cellular 0" context.

The rest is fairly standard dialer stuff, in this example I've made the dialer-list so that any IP traffic will cause it to connect.

Note that the encapsulation specified on this 4G interface is SLIP. When 4G is not available it will fall back to 3G (which uses PPP encapsulation) - it does this transparently and does not need the encap changed.

Sundry Config

At this point the router should be able to connect to the cellular network. For most purposes, though, you will need to either set up NAT or some sort of VPN tunnel for the connection to be of any use. These are set up the same way as for any other setup.

Testing and Diagnostics

How can you tell whether the cellular connection is coming up? The first clue is that log entries similar to the following should appear:

%LINK-3-UPDOWN: Interface Cellular0, changed state to up

If the above log entries don't appear it could be because the modem is not ready yet. The modems in the 819 routers I was playing with took an incredibly long time to boot. The following logs indicate that the modem has (finally) booted up:

%CISCO800-2-MODEM_UP: Cellular0 modem is now UP.
%CISCO800-6-SIM_STATUS: SIM in slot 0 is present

However at this point the modem will still need to attach to the cellular network, which can take a little time. To check whether the modem is attached to the radio network, use the following commands:

Router#show cell 0 radio
Radio power mode = ON
Channel Number = 1667
Current Band = LTE
Current RSSI = -60 dBm
Current RSRP = -84  dBm
Current RSRQ = -4  dB
Current SNR = 10.6  dB
LTE Technology Preference = AUTO
LTE Technology Selected = LTE

Router#show cell 0 network
Current System Time = Sun Jan 6 0:1:30 1980
Current Service Status = Normal
Current Service = Packet switched
Current Roaming Status = Home
Network Selection Mode = Automatic
Network = 3
Mobile Country Code (MCC) = 234
Mobile Network Code (MNC) = 20
Packet switch domain(PS) state = Attached
Registration state(EMM) = Registered
Location Area Code (LAC) = 107
Cell ID = 9150878
Primary Scrambling Code = 65535


Note that the band and channel need to be populated, the network should display the expected carrier name and the packet service should show as attached. The actual band and service type will vary depending on carrier, coverage, area and equipment used.

If the radio interface is up but a data connection cannot be established then all the usual debugs may be used:

debug dialer (to verify it is trying to dial)
debug chat (sometimes useful to deduce whether APN is configured correctly)
debug cellular 0 messages callcontrol (shows the cellular network assigning the IP and DNS)

Note on Earlier Releases


Prior to IOS 15.3 you had to define your own chat string and apply it to the line - later releases do this automatically. If your "debug chat" output shows anything about "ATDT" or expecting "CONNECT" then this probably applies to you. Making and applying the chat script is pretty simple:

chat-script lte "" "AT!CALL" TIMEOUT 20 "OK"
line 3
 script dialer lte
!

Basically this says to define a script called "lte" which waits for nothing, sends "AT!CALL" to the modem and expects to get "OK" in return within 20 seconds. Then that script gets attached to line 3, which in the 819 router is the 4G cellular modem.
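For anyone who finds the expect / send ordering hard to read, the following Python sketch splits the chat-script line above into its steps. It is purely illustrative: the parser and its handling of TIMEOUT as a standalone directive are my own simplification, not a description of how IOS parses the script.

```python
import shlex

def parse_chat_script(line):
    """Split a chat-script line into (name, steps)."""
    tokens = shlex.split(line)
    if tokens[0] != "chat-script":
        raise ValueError("not a chat-script line")
    name, rest = tokens[1], tokens[2:]
    steps = []
    expecting = True  # the script starts with an expect string
    i = 0
    while i < len(rest):
        if rest[i] == "TIMEOUT":
            # Assumption: TIMEOUT <n> sets the wait for later expects.
            steps.append(("timeout", int(rest[i + 1])))
            i += 2
            continue
        steps.append(("expect" if expecting else "send", rest[i]))
        expecting = not expecting
        i += 1
    return name, steps

name, steps = parse_chat_script('chat-script lte "" "AT!CALL" TIMEOUT 20 "OK"')
print(name)
for step in steps:
    print(step)
```

Read top to bottom, the steps are: expect nothing, send AT!CALL, set a 20 second timeout, expect OK.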

References


Cisco 4G configuration guide
YouTube clip accompanying this post


Thursday 25 September 2014

How to Get the Chassis Serial Number in IOS-XR

This caught me out the other day and I had to find the answer deep within a Cisco document. To hopefully save someone else having to wade through all that, the commands to find out chassis serial numbers on the ASR9k are as follows:

RP/0/RSP0/CPU0:nodename#admin
Thu Sep 25 14:15:16.645 BST
RP/0/RSP0/CPU0:nodename(admin)#show dsc
Thu Sep 25 14:15:27.080 BST
---------------------------------------------------------
           Node  (     Seq)     Role       Serial State
---------------------------------------------------------
    0/RSP0/CPU0  (       0)   ACTIVE  ABC2345X678 PRIMARY-DSC
RP/0/RSP0/CPU0:nodename(admin)#


Err.... simple?

Saturday 13 September 2014

Crippling CPU Load on Back to Back ASAs

I was recently involved in troubleshooting a problem where an ASA firewall's CPU was hitting 100%. One of its interfaces was seeing much higher traffic levels than the others, so we did some fairly run-of-the-mill troubleshooting including a packet capture. What this showed was the same, seemingly innocuous, packet repeated thousands upon thousands of times.

The payload was identical, in fact everything from the IP layer and up remained identical from one frame to the next. The only thing that varied was that the source and destination MAC addresses were swapped each time - clearly the packet was ping-ponging between two devices.

We checked the MACs and found they were legitimate - one was the local firewall, while the other was its default gateway - another ASA upstream towards the Internet.

This got our attention. First of all there was a routing loop, which is bad enough, but a packet should never be able to loop forever like that. That's why we have Time To Live (TTL) after all - the number which decrements by one each time a packet goes through a routed hop with the packet being thrown away when its value reaches zero. The key thing here is that the packet, including its TTL, was not changing at all so it never got removed from the system.

The cause of the routing loop was relatively easily found by looking at the source and destination IPs on the packet. The setup was as follows:


What had happened here was that a RAS user had connected to the tenant firewall using their IPSec client and started talking to some devices on the server LAN:

At some point the IPSec session had ended while an internal device was still sending traffic towards the user. This creates an interesting corner case:




The routing is "correct" here - the multi-tenant firewall needs to route the RAS subnet via the tenant firewall so that RAS users can connect to shared resources. The tenant firewall needs to route the traffic outwards for it to hit the right crypto maps. The problem comes when a packet is destined for an IP in the RAS pool which is not associated with a live VPN session.

Our bodge to get us out of the immediate hole was to put a deny entry inbound on the multi-tenant firewall for anything targeted at a RAS pool address. These packets should never make it onto the transit LAN as any legitimate traffic to that range would need to be tunneled via IPSec and therefore the multi-tenant firewall would see a public IP as the destination. After a lot of thinking we couldn't come up with a better answer than this and decided just to stop calling it a bodge.

OK, so first problem solved. Next question, why was the packet looping forever without ever reducing its TTL?

Root Cause


As it turns out this is by design on the ASA (and the good old fashioned PIX & FWSM before it). The idea is that if the firewall behaved like any other routed hop and decremented the TTL then it would be visible in traceroutes. To be fair if it did decrement TTL it would just appear as a black hole in the trace as the ASA doesn't really "do" unreachables unless you force its hand. This normally doesn't cause any problems, even if there is a routing loop. Take a typical deployment where an ASA is attached to a router as shown below:


If we get a loop between the ASA and a traditional router then the packet will eventually be taken out of the loop. Even though the ASA doesn't decrement the TTL, the router does so it eventually gets dropped - half as fast as normal and always by the router (which will punt the packet to the CPU and usually generate an ICMP TTL expired, which can be pretty CPU intensive on small devices), but it does get dropped eventually.

The problem comes when we have a pair of non-decrementing devices (ASAs) back to back at layer 2. Both devices route the packet but neither device decrements the TTL, so if there is a loop between the two the packet will go around and around forever. Eugh...
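The three cases (router to router, ASA to router, ASA to ASA) can be sketched with a toy Python model. This is only an illustration: the function name and the hop cap are made up, and it glosses over exactly when a real device checks the TTL.

```python
def hops_until_dropped(ttl, decrements, max_hops=10_000):
    """Bounce a packet around a loop of devices.

    `decrements` lists, per device in the loop, whether it decrements TTL.
    Returns the number of hops taken before the packet is dropped, or None
    if it is still circulating after `max_hops`.
    """
    hops = 0
    while hops < max_hops:
        for decrementing in decrements:
            hops += 1
            if decrementing:
                ttl -= 1
                if ttl <= 0:
                    return hops
    return None

print(hops_until_dropped(64, [True, True]))    # 64   (router <-> router)
print(hops_until_dropped(64, [False, True]))   # 128  (ASA <-> router)
print(hops_until_dropped(64, [False, False]))  # None (ASA <-> ASA)
```

The middle case is the "half as fast as normal" behaviour described above, and the last one is the packet that never dies.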

The moral of the story is that it's probably best not to put ASAs back to back. I could have sworn I'd seen this setup in Cisco whitepapers before but, now that I look, I can't find it anywhere. The closest I can find is IOS firewall back to back with ASA or two ASAs with a server between. Perhaps there's a good reason for that :)

As with my situation, though, in most cases by the time you get to realise there is a problem the hardware has long since been bought, installed and is carrying live service. So what can you do?

Well, as noted above you can use ACLs to block potential loop traffic but in all honesty that just fixes by exception. You could be fairly liberal with what you block (e.g. drop all RFC 1918 addresses where you would expect to only see public IPs) but it's still imperfect.

Making the ASA decrement TTL


A better idea would be to have at least one of the ASAs decrement TTL. It's a bit uncomfortable to retro-fit but there is a way built in to ASA versions 8.0(3) and above using "set connection decrement-ttl" under a policy map. There are two different ways to do it: either adjust the "global_policy" policy map, which applies to the entire device by default, or create a new policy map to apply to a single interface.

Here's how to apply it to the entire device:

policy-map global_policy
 class class-default
  set connection decrement-ttl
!

The effect is immediate as the global_policy is applied to all traffic by default. Alternatively, if you only want to apply it to specific interfaces, you can create a separate policy map and apply it as follows:

policy-map asa_workaround
 class inspection_default
  inspect dns preset_dns_map
  inspect ftp
  inspect h323 h225
  inspect h323 ras
  inspect netbios
  inspect rsh
  inspect rtsp
  inspect skinny
  inspect esmtp
  inspect sqlnet
  inspect sunrpc
  inspect tftp
  inspect sip
  inspect xdmcp
  inspect icmp
 class class-default
  set connection decrement-ttl
!

service-policy asa_workaround interface interface-name

The above is modeled on the standard default policy & inspections, if you've changed yours from default you probably don't need to be reading this!


Summary


So there you have it - ASAs back to back is a bit dangerous unless you take measures to protect against routing loops. This can be in the form of strict ACLs or by enabling TTL decrement, either globally or on specific interfaces.

References


Cisco guide to enabling traceroute through ASA

Cisco guide to modular policy framework on ASA


Friday 22 August 2014

AS-Override and the Importance of SoO

Recently I discovered that AS-override works in the opposite direction to what I thought! Now, this is largely academic as in most cases if you apply it to one peer you apply it everywhere, but I was dealing with a bit of a corner case and it caught me out as I had to mess about with (i.e. clear) a peer that I didn't really want to touch.

Cisco's config guides are a little bit ambiguous, saying:

"To configure a provider edge (PE) router to override the autonomous system number (ASN) of a site with the ASN of a provider, use the as-override command in VRF neighbor address family configuration mode. To restore the system to its default condition, use the no form of this command."

No mention whatsoever of in which direction the override happens. I always thought that a PE configured with AS override just didn't add the peer's AS to the AS_PATH when it received routes from the peer. It turns out I know nothing and that is not how it works at all.

In fact, as-override has no effect at all on received routes; it works only in the outbound direction. This makes sense, really, as the AS_PATH within the carrier remains true (i.e. the service provider still gets to see what AS the routes originally came from). It's only when advertising routes out of the AS that as-override makes a difference, "overriding" the peer's ASN with the provider's.

But what does "overriding" mean? Let's take a look at some scenarios.

Simple Base Case



In a simple case where the AS_PATH (as seen by the provider) only contains a single entry and this corresponds with the peer's ASN, clearly the provider just replaces that with their own ASN. By "replace" I mean the peer's ASN is overwritten by the carrier's and then, as the route is advertised via eBGP, the carrier's ASN is added as normal. The route the peer receives, therefore, has an AS_PATH containing two copies of the carrier's ASN:

CE2#show ip bgp
BGP table version is 5, local router ID is 10.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>  10.0.1.0/24      10.2.2.1                               0 100 100 i
 *>  10.0.2.0/24      0.0.0.0                  0         32768 i
CE2#
 

Prepended



So what if there are multiple copies of the peer's ASN at the start of the path? Well, as you might expect the whole topology doesn't suddenly tumble down. All copies of the peer ASN are replaced with the carrier's ASN (after all, if we only replaced the first then the peer would still see its own ASN and drop the update) before, again, adding the carrier's ASN as the route is advertised:

PE1#show ip bgp vpnv4 vrf cust1
BGP table version is 5, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:100 (default for vrf cust1)
 *>  10.0.1.0/24      10.1.1.2                 0             0 65000 65000 65000 i
 *>  10.0.2.0/24      10.2.2.2                 0             0 65000 i
PE1#

As observed on the PE, the route learned from CE1 has been prepended twice, giving a total AS_PATH length of 3. All three of these 65000s will be overridden when advertised towards CE2, creating the three 100s in orange, and another copy of the local ASN (in red) will be added on egress as shown:


CE2#show ip bgp
BGP table version is 7, local router ID is 10.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>  10.0.1.0/24      10.2.2.1                               0 100 100 100 100 i
 *>  10.0.2.0/24      0.0.0.0                  0         32768 i
CE2#
 

ASN Arbitrarily Contained in the AS_PATH

Another possibility exists where there are multiple AS involved. What if the customer connects to two different carriers who, in turn, connect to each other? This introduces the possibility that a customer route is learned from the other carrier, which then needs to be advertised out to the customer. The diagram below probably explains things better:



Thankfully, as-override doesn't seem to be too fussy (unlike remove-private-as in earlier IOS) and will replace the ASN wherever it appears in the path. It literally operates like a "find and replace all". Here's the AS_PATH from the PE's perspective showing the customer's ASN, followed by the other carrier's ASN:

PE1#show ip bgp vpnv4 vrf cust1              
BGP table version is 6, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:100 (default for vrf cust1)
 *>  192.168.0.0      172.16.1.2                             0 200 65000 i
PE1#

And here we see the route from CE2's perspective, with the 65000 (customer's) ASN replaced by the provider's ASN (100 shown in orange), followed by the untouched transit AS (in blue) and finally the provider's ASN is added on egress (in red):

CE2#show ip bgp   
BGP table version is 4, local router ID is 10.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>  192.168.0.0      10.2.2.1                               0 100 200 100 i
CE2# 

Don't forget - the other carrier will also have to use as-override, otherwise CE1 will discard CE2's routes.
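The "find and replace all" behaviour seen in the three scenarios above can be summed up in a few lines of Python. This is just a model of what the show output suggests (the function name is mine), not Cisco's implementation.

```python
def advertise_with_as_override(as_path, peer_asn, local_asn):
    """Return the AS_PATH a peer would receive from a PE using as-override."""
    # Replace every occurrence of the peer's ASN, wherever it appears.
    overridden = [local_asn if asn == peer_asn else asn for asn in as_path]
    # Normal eBGP behaviour: prepend our own ASN on egress.
    return [local_asn] + overridden

# Simple base case
print(advertise_with_as_override([65000], 65000, 100))
# [100, 100]

# Prepended
print(advertise_with_as_override([65000, 65000, 65000], 65000, 100))
# [100, 100, 100, 100]

# Customer ASN behind a transit AS
print(advertise_with_as_override([200, 65000], 65000, 100))
# [100, 200, 100]
```

The three results match the Path column seen on CE2 in each of the examples.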

In Coordination with "local-as"


Now, as you'd imagine as-override works in conjunction with "local-as" on the PE. The happy news is that when local-as is in use, the ASN specified in the local-as command is used to override the customer ASN (after all, we're pretending to be that ASN). The bad news is you get some funny looking AS_PATHs.

Let's take a simple case where two CEs connect to a single PE, actually configured as AS 50 but masquerading as AS100:



As you can see, on the PE, the route learned from CE1 shows the customer's ASN and also one copy of the pretend AS which is tacked on by default when using the local-as command:

PE#show ip bgp vpnv4 vrf cust1
BGP table version is 4, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:100 (default for vrf cust1)
 *>  10.10.10.0/24    10.1.1.2                 0             0 100 65000 i
PE#

If we look on CE2 we can see:

CE2#show ip bgp
BGP table version is 33, local router ID is 192.168.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>  10.10.10.0/24    10.2.2.1                               0 100 50 100 100 i
CE2#

To explain this strange arrangement, we have:
  • 100 - the pretend ASN (normal local-as behaviour, added on egress)
  • 50 - the real ASN (normal local-as behaviour, added on egress)
  • 100 - the pretend ASN (normal local-as behaviour, added on ingress from CE1)
  • 100 - the pretend ASN used in place of the customer ASN (as-override)
Now, it's possible to set "local-as" with the "no-prepend" directive. This makes the situation slightly cleaner in that the pretend ASN is no longer added as routes are received by the PE from the relevant peer. In other words you lose the orange ASN (the pretend ASN added on ingress from CE1) out of the path, but notice that in this example the no-prepend has to be applied on the PE's peering with CE1 in order to clean up CE2's BGP table....

Really, though, how many bodges do you want in play at once?
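For the curious, the local-as case can be modelled in the same spirit with a couple of Python functions. Again this is only a sketch built from the behaviour observed above, with made-up function names.

```python
def pe_receive(as_path, local_as, no_prepend=False):
    """Path as stored on the PE after learning a route from a CE."""
    # local-as tacks the pretend ASN on at ingress unless no-prepend is set.
    return list(as_path) if no_prepend else [local_as] + list(as_path)

def pe_advertise(as_path, peer_asn, real_asn, local_as):
    """Path sent to a CE when both local-as and as-override are in use."""
    # as-override swaps the customer ASN for the pretend (local-as) ASN.
    overridden = [local_as if asn == peer_asn else asn for asn in as_path]
    # On egress the real ASN is added, then the pretend ASN in front of it.
    return [local_as, real_asn] + overridden

on_pe = pe_receive([65000], local_as=100)
print(on_pe)                                  # [100, 65000]
print(pe_advertise(on_pe, 65000, 50, 100))    # [100, 50, 100, 100]

clean = pe_receive([65000], local_as=100, no_prepend=True)
print(pe_advertise(clean, 65000, 50, 100))    # [100, 50, 100]
```

The first two results match the PE and CE2 tables above; the last shows the slightly tidier path you would expect with no-prepend on the CE1 peering.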

The Importance of SoO

Whenever altering the behaviour of something as important as BGP's loop prevention mechanism it is important to have a safety net. Unless you're very careful it's possible to introduce routing loops, particularly where multiple ISPs / ASNs are involved. Site of Origin, or SoO for short, provides just such a safety net.

The mode of operation is as follows:
  • A SoO extended community is allocated for each customer site
  • The SoO value is configured against each customer BGP peer within the PE router
  • As routes are learned from a neighbour, the SoO extended community is attached to them to indicate their site of origin
  • The PE checks any routes that are waiting to be advertised to a BGP peer and handles them according to the following rules:
    • Any routes that are found to have the same SoO as the peer are not advertised to that particular peer
    • Any routes that have a SoO community different to the peer's are advertised to that peer
    • Any routes that do not have a SoO community attached are advertised
Now, SoO is occasionally overlooked as as-override often appears to work without it. Really, though, you are storing up problems for later.
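The advertisement rules listed above boil down to a very small decision, sketched here in Python (the function name is hypothetical and the SoO values are just strings for illustration):

```python
def should_advertise(route_soo, peer_soo):
    """Apply the SoO rules: only block routes going back to their own site."""
    if route_soo is None:
        # No SoO community attached: always advertised.
        return True
    # Same SoO as the peer means the route came from that site already.
    return route_soo != peer_soo

print(should_advertise("100:1", "100:1"))  # False
print(should_advertise("100:1", "100:2"))  # True
print(should_advertise(None, "100:2"))     # True
```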

Here's an example of SoO config on the PE:

router bgp 100
!
 address-family ipv4 vrf cust1
  neighbor 10.1.1.2 remote-as 65000
  neighbor 10.1.1.2 activate
  neighbor 10.1.1.2 as-override
  neighbor 10.1.1.2 soo 100:1
  neighbor 10.2.2.2 remote-as 65000
  neighbor 10.2.2.2 activate
  neighbor 10.2.2.2 as-override
  neighbor 10.2.2.2 soo 100:2
 exit-address-family


If there were two links into the same site (or into two sites joined by a backdoor network) then we would set the same SoO on both of its links. Since it uses an extended community (and this is a VRF so extended communities must be turned on) the SoO principle works across sites as well. It's important that different sites use different SoO values, otherwise they will not be able to learn each other's routes.

Problems Without SoO


There's one minor, almost cosmetic, quirk you get if you enable as-override without SoO:

If two CEs attach to the same PE then they will receive a copy of their own routes back from the PE.

This is a strange side-effect of the way update-groups work. Normally the PE would receive routes from both CEs, put together a list of updates and send them to both CEs - at this point each CE would see its own updates but would weed them out due to the AS_PATH containing the local ASN. With as-override enabled on the PE, the customer ASN is overridden with the provider's and the CE has no way to tell that the route was just echoed back.

Normally this doesn't matter as other mechanisms cause the locally injected route to be preferred (weight is set to 32768 for locally originated prefixes unless overridden, static routes generally have a better AD, etc) so it just looks a bit weird. There are cases where this does cause (rather drastic) problems, though. Take the following, not too far-fetched situation:




When the primary feed is up everything is great. The local preference of routes learned over the secondary feed is set to 50 by a route-map to ensure that they are less preferable than those received from the primary.

Let's break the primary feed and see what happens:



The withdrawals ripple through the network until CE2 is aware that the primary has gone away, decides to use the secondary and announces the route upstream. Seems legit so far...


Ah, no... this doesn't look right. The PE has echoed the route back to CE2 and, since the echoed route doesn't contain the local ASN and has a default local-preference of 100, it is now CE2's favourite.
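
The decision CE2 is making can be sketched as below. This is a heavily simplified best-path comparison (local preference first, then AS_PATH length) that ignores weight and all the later tie-breakers; the `Path` class and `best_path` function are hypothetical, and the next hops and local-preference values are the ones used in this example network.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    as_path: Tuple[int, ...]


def best_path(paths: List[Path]) -> Path:
    """Highest local preference wins; shorter AS_PATH only breaks ties."""
    return max(paths, key=lambda p: (p.local_pref, -len(p.as_path)))


secondary = Path("192.168.2.1", 50, (200,))      # route-map sets local-pref 50
echoed = Path("10.2.2.1", 100, (100, 100, 200))  # default local-pref 100

print(best_path([secondary, echoed]).next_hop)  # 10.2.2.1
```

Even though the echoed path is two AS hops longer, local preference is compared first, so the echo beats the genuine route.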

 
Now we're in a right knot. CE2 has told the PE that it has a new route to use, but that route contains the carrier ASN in the AS_PATH. The PE drops it as a loop, removes the route from its BGP table and sends a withdrawal message to CE2.

We are effectively back to the start where CE2 only has one option - it will take the route it is learning over the secondary feed and advertise it to the PE. Round and around we go...

There are a few tell-tale signs that this is happening. First of all there will be intermittent connectivity (usually in 30-second steps):

CE2#ping 10.10.10.1 repeat 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 10.10.10.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!................!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
 
The next big giveaway is that when you run "show ip route" the age of the affected route(s) is always very low, typically under 30 seconds on standard BGP timers, and the path alternates between the same two next hops over and over:
 
CE2#show ip route 10.10.10.0
Routing entry for 10.10.10.0/24
  Known via "bgp 65000", distance 20, metric 0
  Tag 100, type external
  Last update from 10.2.2.1 00:00:29 ago
  Routing Descriptor Blocks:
  * 10.2.2.1, from 10.2.2.1, 00:00:29 ago
      Route metric is 0, traffic share count is 1
      AS Hops 3
      Route tag 100
      MPLS label: none
CE2#show ip route 10.10.10.0
Routing entry for 10.10.10.0/24
  Known via "bgp 65000", distance 20, metric 0
  Tag 200, type external
  Last update from 192.168.2.1 00:00:02 ago
  Routing Descriptor Blocks:
  * 192.168.2.1, from 192.168.2.1, 00:00:02 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 200
      MPLS label: none
CE2#
 
Finally, another good indication is that your BGP table version number is through the roof and continuously incrementing:
 
CE2#sh ip bgp
BGP table version is 51423, local router ID is 192.168.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>  10.10.10.0/24    10.2.2.1                               0 100 100 200 i
 *                    192.168.2.1              0     50      0 200 i
 
CE2#

Note that you can also see both the genuine and echoed routes in the BGP table, though only some of the time, so re-check periodically if you don't catch them at first.

It's possible to bodge together a route-map or prefix-list to 'fix' this. In fact, just applying any unique route-map outbound on the PE will put the peer into a separate update-group, which will bodge it into action. Please just use SoO, though - that's what it's there for!