Thursday 24 October 2013

IOS XR Gotchas

I've recently started doing some work with IOS XR - I have to say I'm quickly getting to love XR and, if I'm honest, completely going off IOS as far as MPLS and BGP are concerned. I've found a couple of gotchas around the way RPL and BGP work as compared to normal IOS. Aside from the obvious need to type "commit" every time you make changes, there are a few quirks that have caught me out - I expect I'll continue to add to this post as and when I get caught out by new and wonderful things.

No Policy = No Routes

One of the big, fun changes in IOS-XR is that if you don't apply either an inbound or an outbound route policy to a BGP peer then it assumes you've made a mistake and does not advertise or accept any routes at all. You can see that this is the case because the peer is marked with an exclamation mark (!) in the BGP summary.

The fix: apply either an inbound policy, an outbound policy or both.

Soft Reconfiguration Inbound

Another surprise when migrating to IOS-XR from IOS is that the rules for soft-reconfiguration have changed. Originally, before route-refresh capability came along, soft-reconfiguration was primarily there so that you could edit an inbound route-map and apply it without doing a hard reset on the BGP session. IOS-XR is a bit of a smarty-pants so if you configure "soft-reconfiguration inbound" and the peer turns out to support the refresh capability, it decides you don't need that local copy of the received routes - if it needs to apply an updated policy it can just request a refresh from the peer.

Most people who do a lot with BGP will be familiar with how useful it is to have a copy of all the routes received from a peer, including the ones dropped by policy, for troubleshooting purposes. Luckily, it is possible to force the router's hand by configuring "soft-reconfiguration inbound always".


Empty Prefix Sets Don't Work

IOS-XR has a new CLI construct called a set - mathematicians will already be familiar with these as an un-ordered group of items. In RPL sets can be used for communities or prefixes, but beware - if you create a prefix set but leave it empty, the CLI will accept it but any policies that reference the set will fail. If you're seeing a message like this:

% The policy [policy-name] uses an invalid argument to the [pfxmatch] condition on the [destination] field. Internal error:  no parameters

... then you've got an empty prefix set somewhere. Curiously it doesn't seem to mind about empty community sets. The workaround for this is to put a dummy prefix into any sets that need to be there but don't currently have anything in them.

Applying a Route Policy (or maybe destroying it!)

One nasty bit of CLI I've found is this: to apply a route-policy to a BGP peer you would type "route-policy policyname in" or "route-policy policyname out" under the configure -> router bgp -> neighbor context. If, however, you omit the "in" or "out" keyword, XR kindly assumes you want to blow away the policy and build it from scratch! Luckily you can just type "abort" to escape disaster... but you lose whatever you changed since your last commit.


Tuesday 8 October 2013

Unexpected LDP Support in dechap

About a week ago I made a couple of updates to dechap which allowed it to run dictionary attacks firstly against OSPF packets and then against BGP packets. Since then I've been thinking about adding other protocols such as TACACS+ (turns out to be hard, see my other post on this), LDP, HSRPv2, you name it.

This evening I decided to sit down and work on LDP support, only to discover that just like BGP, LDP uses the TCP MD5 signature option for authentication. After pausing for a moment to wonder if I really was that lucky I decided to knock together a quick test. Basically the answer is "yes, I am that lucky" and "yes, dechap v0.4a works against LDP". So I can add another protocol to the growing list without even modifying the code.

So to summarise, dechap can now attack:

  • PPP / PPPoE CHAP authentication
  • RADIUS CHAP authentication
  • L2TP CHAP authentication
  • OSPF MD5 authentication
  • BGP MD5 authentication
  • LDP MD5 authentication
...straight from the pcap file, even if the captured traffic has MPLS labels and / or VLAN tags.

Time to move onto the next target... any suggestions?

Meanwhile, dechap v0.4a can be downloaded at my github.

Saturday 5 October 2013

Be Careful where you use TACACS!

As part of my on-going work to add more and more protocols into my hobby project dechap, I started looking into the workings of TACACS+ today. I was looking to see whether TACACS+ would be a likely candidate as the next attackable protocol. I had in my mind a couple of events from my past that made me suspect that a TACACS+ server couldn't really tell when an incorrect key was in use, other than that the packet decoded to garbage that it was then not able to interpret. If that is the case, it would be very hard to attack the protocol as there is not a straightforward way to tell when you've hit the correct key. I'll let you know when I've figured that out because as I read the protocol spec something else derailed my train of thought.

Like most network engineers I was raised on Cisco's literature and read through plenty of their whitepapers such as this one comparing RADIUS and TACACS+. This document is full of useful facts such as "RADIUS does not allow users to control which commands can be executed on a router and which cannot" (no, Cisco, because for some reason you notched out the ability to do so in IOS) and that RFC compliance doesn't guarantee interoperability. One of the parts I always remembered and believed, in the olden days at least, was the part that says that far beyond the argument of TCP being better than UDP, TACACS+ is more secure than RADIUS due to the way it encrypts the entire message body. I suspect most people just swallow that without chewing, I know I did. Encryption is good, so more encryption must be better.

RADIUS, by comparison, uses CHAP for secure password authentication but makes no attempt to encrypt the parameters within requests and responses. To be fair, these can just be read in plain text straight off the wire. The shared key used when configuring RADIUS is purely an authentication measure - it guards against arbitrary spoofed requests and tampering but does not offer any kind of privacy.

In reality the biggest part of the decision when choosing one or the other will be a "horses for courses" argument. Want to authenticate PPP subscribers with a nice wholesale / retail proxy model? Use RADIUS. Want to authenticate logins to infrastructure devices and authorise specific commands down to the parameter level? Use TACACS+.

A Slightly More Balanced Comparison

Let's compare the relative merits, security-wise, of each protocol.

Message Encryption

TACACS+, as previously mentioned, encrypts the entire message body using a pre-shared key. It only leaves the header in the clear, so without the key it is only really possible to determine who is client and who is server, plus what kind of messages are being passed (authentication or authorisation, query or response).

RADIUS uses a pre-shared key to authenticate messages going back and forth, but the messages themselves are unencrypted and can easily be read straight off the wire.
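To make the "encrypts the entire message body" claim concrete, here is a minimal Python sketch of the body obfuscation as I understand it from the TACACS+ specification: a pseudo-random pad is built from chained MD5 hashes over the session ID, the shared key, the version and the sequence number, then XORed with the body. The function names are my own and purely illustrative, and this is a sketch under those assumptions rather than a reference implementation.

```python
import hashlib
import struct


def tacacs_pseudo_pad(session_id: int, key: bytes, version: int,
                      seq_no: int, length: int) -> bytes:
    """Build the MD5-chained pad used to obscure a TACACS+ packet body."""
    # Header fields that seed every hash: session ID in network byte order,
    # then the shared key, then the one-byte version and sequence number.
    seed = struct.pack("!I", session_id) + key + bytes([version, seq_no])
    pad = b""
    previous = b""
    while len(pad) < length:
        # Each 16-byte block is the MD5 of the seed plus the previous block.
        previous = hashlib.md5(seed + previous).digest()
        pad += previous
    return pad[:length]


def tacacs_xor_body(body: bytes, session_id: int, key: bytes,
                    version: int, seq_no: int) -> bytes:
    """XOR the body with the pad; the same call both obscures and reveals."""
    pad = tacacs_pseudo_pad(session_id, key, version, seq_no, len(body))
    return bytes(b ^ p for b, p in zip(body, pad))
```

Because the operation is a plain XOR, running it twice with the same key returns the original body, which is why anybody holding the key can read everything.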

Credentials

RADIUS relies on CHAP for user credential validation. The NAS sends a "random" challenge to the user, who produces a one-way hash of the challenge data and password (plus some other stuff) and returns that to the NAS. The NAS then sends the challenge and response off to the RADIUS server, meaning that the credentials are never sent over the wire in any reversible way. In order to get the password an attacker must capture the challenge and response data then run a dictionary or brute force attack. On the down side, the RADIUS server itself must have a plaintext copy of the password available in order to verify that a response is correct given the challenge. Clearly if the RADIUS server's password database is compromised then things get quite sticky. For proxy RADIUS, the proxy does not need access to plaintext passwords. In summary, passwords are safe in flight but exposed at rest.
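As a rough illustration of the CHAP exchange described above, the sketch below computes a response the way standard MD5 CHAP does (the "other stuff" is the one-byte CHAP identifier) and shows why a captured challenge / response pair is enough for an offline dictionary attack. The function names are hypothetical and chosen only for this example.

```python
import hashlib
from typing import Iterable, Optional


def chap_response(identifier: int, password: bytes, challenge: bytes) -> bytes:
    """MD5 over the CHAP identifier byte, the password and the challenge."""
    return hashlib.md5(bytes([identifier]) + password + challenge).digest()


def chap_dictionary_attack(identifier: int, challenge: bytes, response: bytes,
                           candidates: Iterable[bytes]) -> Optional[bytes]:
    """Return the first candidate whose response matches the captured one."""
    for word in candidates:
        if chap_response(identifier, word, challenge) == response:
            return word
    return None
```

Note that the verifier has to run exactly the same calculation, which is why the server needs the plaintext password.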

TACACS+ relies on the pre-shared key to encrypt everything, including password information. No form of CHAP or similar system is used, so credentials are passed in a reversible form over the wire. It's encrypted, though, so don't worry - unless an attacker knows the key it's all just gibberish. On the positive side of this, the TACACS+ server does not need to store plaintext passwords for the end users and can instead keep one-way hashes on disk meaning that a compromised database is arguably less of an issue. Safe at rest and safe in flight.

Or is it? Think about the typical use case again. TACACS+ is more-or-less always used to authenticate CLI users logging into routers and switches. The key used to encrypt the TACACS+ communications is stored in the device config, either completely in plain text or using (trivially) reversible type 7 encryption. Virtually all devices are left with the password recovery mechanism enabled. Most of the time the key is re-used across every device in the estate since it makes administration easy and, what's the risk anyway?

A Really Easy Attack

I'd like to point out I'm not suggesting or endorsing any kind of illegal or immoral behaviour. Even as a joke :)

The thought occurs that many TACACS+ managed devices are in remote locations - far flung or sparsely populated offices, in accessible wiring closets, even (*shudder*) customer sites. Given physical access to a device, it's very possible to make a terminal emulator script to perform a password recovery, dump out the config then reset the config register to its original value within only a couple of seconds more than it takes to double-reboot the device. I know this because I did it many moons ago (I don't have it any more - it was pretty easy to write, though).

If I were an evil adversary who wanted to get some credentials, perhaps a good way would be:
  1. Feign a power cut, on-site work or some other convincing reason for a device to go down
  2. Take the device off the network (to avoid it phoning home by syslog / SNMP) and perform a quick password recovery / config dump before putting it back to its original condition:

    *break*
    Readonly ROMMON initialized
    program load complete, entry point: 0x8000f000, size: 0xcb80

    monitor: command "boot" aborted due to user interrupt
    rommon 1 > confreg 0x2142

    You must reset or power cycle for new config to take effect
    rommon 2 > reset

    *snip snip*
    Would you like to enter the initial configuration dialog? [yes/no]: no

    Press RETURN to get started!

    Router>enable
    Router#show startup-config | include tacacs
    aaa authentication login default group tacacs+ local
    aaa authorization exec default group tacacs+ none
    aaa authorization configuration default group tacacs+
    tacacs-server host 10.4.4.10
    tacacs-server key supersecret
    Router#conf t
    Enter configuration commands, one per line.  End with CNTL/Z.
    Router(config)#config-register 0x2102
    Router(config)#^Z
    Router#reload
    System configuration has been modified. Save? [yes/no]: no
    Proceed with reload? [confirm]

  3. Stick a sniffer inline between the device and wherever its administrators are
  4. Call in a fault saying that since the power cut / whatever nothing attached to that router / switch is able to see the network - perhaps leave the LAN side disconnected for authenticity
  5. Capture TACACS+ packets as the administrators log in to investigate
  6. Come up with some compelling reason for comms to go down again while the sniffer is taken out
Now with the config dump it is trivial to get the TACACS+ server key - it's either just there in the clear or can be decoded from the type 7 encrypted version using any number of free tools. If you put this key into the TACACS+ protocol settings of Wireshark (in the preferences screen expand the protocols area then scroll down to TACACS+), it will happily decrypt the packets captured while the administrators were logging in:


[Image: Configuring Wireshark]
[Image: Viewing the Decrypted Payload]


Now you have the administrator username(s) and password(s) in plain text!

Yikes!

Are you sure you still want to run TACACS on that remote box?

Thursday 3 October 2013

BGP support added to dechap

Hot on the heels of adding the ability to attack OSPF MD5 authentication, I've added BGP support to dechap. It is now possible to feed a pcap file with PPPoE, L2TP, RADIUS, OSPF and BGP packets to the same tool and perform offline dictionary attacks on the authentications within.

As usual, if you're not interested in the theory just skip right to the end for the download link.

TCP MD5 Signatures

BGP authentication uses the MD5 Signature TCP option field, which is defined in RFC 2385. Personally, I found this RFC very vague and it took a lot of iterations to get the technique right. It's particularly fuzzy about what is included in the hash, what isn't and how to present values correctly. I'm hoping to document the process a little more clearly for the next poor guy who tries to implement it as I couldn't find a sufficiently detailed reference anywhere.

RFC 2385 states that the hash must be calculated over the following:

1. the TCP pseudo-header (in the order: source IP address,
   destination IP address, zero-padded protocol number, and
   segment length)
2. the TCP header, excluding options, and assuming a checksum of
   zero
3. the TCP segment data (if any)
4. an independently-specified key or password, known to both TCPs
   and presumably connection-specific

Now, maybe it's just me, but this raised a lot of questions in my mind. Zero padding usually means to fill the trailing space with zeros, but padding the second byte would effectively multiply the protocol number by 256 so should it be a leading zero? Which headers and options are included in the "segment length"? Should the pad bytes be copied with the TCP header?

Through a lot of trial and error I found that:
  • The zero padding goes before the protocol number
  • The "segment length" includes the TCP header, the TCP options (including room for the MD5 signature option being calculated) and the actual payload data
  • The copied TCP header should be 20 bytes long, i.e. it includes the two bytes after the (zeroed out) checksum, which hold the urgent pointer rather than padding. The header length field remains as-is, including the length of the options.
  • The TCP segment data starts immediately after the TCP options and runs to the last byte indicated by the IP length field
  • The null byte terminating the password is not passed to the hash algorithm
The resulting hash value is then stored inside the MD5 signature option (kind 19, length 18).

Checking / Attacking BGP Packets

Using the above method it is straightforward to run a dictionary attack as follows:


  • Start with a sniffed BGP packet (see the original dechap blog post for info on how this is extracted).
  • Extract and store the authentication hash (look for option kind 19) for later comparison
  • Put together the "pseudoheader" as described above
  • Append the TCP header without options
  • Append the TCP payload
  • Append the candidate password
  • Calculate the MD5 hash over the complete data set and compare to the value seen in the sniffed packet. A matching hash indicates a matching password.
As of v0.4a, dechap can now be used to automate this process.
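
The steps above can be sketched in a few lines of Python. This is not dechap's code (dechap is written in C) - it is a minimal illustration assuming IPv4, with hypothetical function names, that takes the complete TCP segment (header, options and payload) as raw bytes.

```python
import hashlib
import socket
import struct
from typing import Iterable, Optional


def tcp_md5_digest(src_ip: str, dst_ip: str, tcp_segment: bytes,
                   password: bytes) -> bytes:
    """Compute the TCP MD5 signature for a full TCP segment."""
    header_len = (tcp_segment[12] >> 4) * 4  # data offset, in bytes
    # Pseudo-header: source, destination, zero byte, protocol 6, segment length
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(tcp_segment)))
    fixed = bytearray(tcp_segment[:20])      # 20 byte header, no options
    fixed[16:18] = b"\x00\x00"               # checksum assumed to be zero
    payload = tcp_segment[header_len:]       # data after the options
    return hashlib.md5(pseudo + bytes(fixed) + payload + password).digest()


def find_md5_option(tcp_segment: bytes) -> Optional[bytes]:
    """Walk the TCP options and return the digest from option kind 19."""
    header_len = (tcp_segment[12] >> 4) * 4
    i = 20
    while i < header_len:
        kind = tcp_segment[i]
        if kind == 0:                        # end of option list
            break
        if kind == 1:                        # NOP is a single byte
            i += 1
            continue
        if i + 1 >= header_len or tcp_segment[i + 1] < 2:
            break                            # malformed option
        length = tcp_segment[i + 1]
        if kind == 19 and length == 18 and i + 18 <= header_len:
            return tcp_segment[i + 2:i + 18]
        i += length
    return None


def crack_tcp_md5(src_ip: str, dst_ip: str, tcp_segment: bytes,
                  candidates: Iterable[bytes]) -> Optional[bytes]:
    """Try each candidate password against the signature in the segment."""
    wanted = find_md5_option(tcp_segment)
    if wanted is None:
        return None
    for word in candidates:
        if tcp_md5_digest(src_ip, dst_ip, tcp_segment, word) == wanted:
            return word
    return None
```

Since the options are excluded from the hash, the digest can be calculated before the option is filled in, which is also how a sender would build the packet.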

Obtaining the Tool

The C source code may be downloaded from: https://github.com/theclam/dechap

Provided the OpenSSL dev libraries are installed it should be possible to simply extract the source code, cd into the directory then run "make". I've only tested this under Ubuntu Linux but there are very few dependencies so I would imagine it will work on most distributions.

Using the Tool

As usual - this is for legitimate audit and recovery purposes and must not be used for any kind of malicious activity.

The usage is pretty straightforward - there are only two parameters and both are mandatory. Specify your capture file (original pcap format) with the -c flag and your word list with the -w flag. Here's an example:

lab@lab:~/dechap$ ./dechap -w mywords.txt -c bgp.cap
Found password "password1" for TCP from 10.0.0.2 to 10.0.0.1.
Found password "password1" for TCP from 10.0.0.1 to 10.0.0.2.
Found password "password1" for TCP from 10.0.0.2 to 10.0.0.1.
lab@lab:~/dechap$
I'm not sure how quickly it runs but it doesn't seem quite as quick as the OSPF version. I suppose BGP packets tend to be a little bigger than OSPF so there's more to hash. You can improve the speed by only including one packet for each source / destination pair in each capture as, at present, it doesn't check for multiple packets between pairs and attacks each instance individually.

If you try this out, please leave a comment on this post with your experiences - good or bad. Any suggestions would also be welcome, particularly for other protocols to attack.

References

RFC2385 - Protection of BGP Sessions via the TCP MD5 Signature Option
RFC1321 - The MD5 Message-Digest Algorithm


Wednesday 2 October 2013

Offline Attack on MD5 keys in captured OSPF packets

A few months ago I released a tool called dechap which finds PPPoE, L2TP and RADIUS authentications in pcap files and performs dictionary attacks against them. Since writing dechap I've always thought it would be more useful if it were able to do a similar thing with OSPF packets.

Well, the good news is that I've finally got around to adding OSPF support to dechap! Woo and yay! If you just want the tool, scroll straight to the bottom. If you're interested in the theory, read on.

OSPF Authentication Basics

OSPF, or more accurately OSPFv2 as defined in RFC2328, has three options for authenticating incoming packets:

Null: no authentication is performed at all.

Password: a plaintext password is added in the clear to each OSPF packet. If the password contained in an incoming packet matches the one configured locally then the packet is considered valid and is processed, otherwise it is silently ignored.

Message Digest: an MD5 hash is calculated over a combination of the OSPF packet contents and the password. The hash output is then added to the OSPF packet before transmission. When a packet arrives, the receiving router computes an MD5 hash of the packet contents plus its locally stored password. If the calculated hash matches the one attached to the incoming packet then the check passes and the packet is processed; otherwise it is silently dropped.

Note that this is authentication only - in other words the password only serves to verify that the packet contents are authentic. It does not offer privacy, so all the information within the packet is visible in the clear.

OSPF MD5 Authentication Detail

One thing I found unclear in RFC 2328 was exactly what data the MD5 hash was calculated over. The RFC states:

Input to the authentication algorithm consists of the OSPF packet and the secret key.

... and clarifies that:

(a) The 16 byte MD5 key is appended to the OSPF packet.

(b) Trailing pad and length fields are added, as
    specified in [Ref17].

(c) The MD5 authentication algorithm is run over the
    concatenation of the OSPF packet, secret key, pad
    and length fields, producing a 16 byte message
    digest (see [Ref17]).

(d) The MD5 digest is written over the OSPF key (i.e.,
    appended to the original OSPF packet). The digest is
    not counted in the OSPF packet's length field, but
    is included in the packet's IP length field. Any
    trailing pad or length fields beyond the digest are
    not counted or transmitted.

Confusingly, Ref17 refers to RFC1321, which defines the MD5 algorithm. MD5 defines a method to pad the input before the hash is calculated, so it's easy to assume that point (b) refers to that - it doesn't. I spent a couple of hours trying to work out why my hashes were coming out to the wrong value before finally figuring it out. To aid others, I've taken the liberty of rewriting the instructions so that they can be understood by thickos such as myself:

Calculating the MD5 Hash

In order to calculate the correct MD5 hash, the following method should be used:

(a) Build the OSPF packet as normal, ensuring that the key number and authentication sequence number are populated. The OSPF length field must contain the total number of bytes in the packet at this point. The checksum must be set to zero.

(b) The authentication key / password in plaintext must be adjusted to exactly 16 bytes, i.e. if the key is longer than 16 bytes then it must be truncated, shorter keys must be padded with null (0x00) bytes until 16 bytes long. The resulting 16 byte "modified authentication key" is then appended to the packet.

(c) The MD5 hash must be calculated over the entire result, i.e. the original OSPF packet plus the 16 byte modified authentication key.

(d) The resulting hash is then written over the modified authentication key in the last 16 bytes of the packet.

Testing / Attacking OSPF Packets

Using the above method it is straightforward to run a dictionary attack as follows:


  • Start with a sniffed OSPF packet (see the original dechap blog post for info on how this is extracted).
  • Extract the original OSPF packet (start immediately after the IP header and continue up to the length specified in the OSPF header)
  • Extract and store the authentication hash (the 16 bytes following the packet) for later comparison
  • Zero out the checksum
  • For each candidate password, pad or truncate to 16 bytes and append to the original OSPF packet. 
  • Calculate the MD5 hash as described above and compare to the value seen in the sniffed packet. A matching hash indicates a matching password.
As of v0.3a, dechap can now be used to automate this process.
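
For illustration, the method above can be sketched in Python as follows. This is not dechap's code (that is C) and the function names are made up for the example; the input is assumed to be everything after the IP header, i.e. the OSPF packet followed by the 16 byte digest.

```python
import hashlib
import struct
from typing import Iterable, Optional


def ospf_md5_digest(ospf_packet: bytes, password: bytes) -> bytes:
    """Hash an OSPF packet (without trailing digest) plus the 16 byte key."""
    # Truncate long keys and null-pad short ones to exactly 16 bytes.
    key = password[:16].ljust(16, b"\x00")
    packet = bytearray(ospf_packet)
    packet[12:14] = b"\x00\x00"  # the checksum field must be zero
    return hashlib.md5(bytes(packet) + key).digest()


def crack_ospf_md5(ip_payload: bytes,
                   candidates: Iterable[bytes]) -> Optional[bytes]:
    """Try each candidate against the digest appended to a sniffed packet."""
    # The OSPF length field (bytes 2-3) excludes the appended digest.
    (ospf_len,) = struct.unpack("!H", ip_payload[2:4])
    packet = ip_payload[:ospf_len]
    wanted = ip_payload[ospf_len:ospf_len + 16]
    if len(wanted) != 16:
        return None  # no room for a digest, so not MD5 authenticated
    for word in candidates:
        if ospf_md5_digest(packet, word) == wanted:
            return word
    return None
```

A side effect of the truncation in step (b) is that any two passwords sharing the same first 16 bytes produce the same hash.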

Obtaining the Tool

The C source code may be downloaded from: https://github.com/theclam/dechap

Provided the OpenSSL dev libraries are installed it should be possible to simply extract the source code, cd into the directory then run "make".

Using the Tool

As usual - this is for legitimate audit and recovery purposes and must not be used for any kind of malicious activity.

The usage is pretty straightforward - there are only two parameters and both are mandatory. Specify your capture file (original pcap format) with the -c flag and your word list with the -w flag. Here's an example:

lab@lab:~/dechap$ ./dechap -w mywords.txt -c ospf-bcast.cap
Found password "password1" for user OSPF host 10.1.1.1 key 1.
Found password "password1" for user OSPF host 10.1.1.2 key 1.
Found password "password1" for user OSPF host 10.1.1.1 key 1.

lab@lab:~/dechap$

I haven't tried any serious benchmarks for this but it seems reasonably fast. In a worst case scenario (correct key not present) on my creaky old Athlon XP 2100 it can try 100k passwords in under 100ms.

If you try this out, please leave a comment on this post with your experiences - good or bad. Any suggestions would also be welcome (yes, I know BGP exists).

References

RFC2328 - OSPF Version 2
RFC1321 - The MD5 Message-Digest Algorithm


Friday 12 April 2013

LACP miscellanea

According to the statistics, a few people have stumbled across this blog because they were searching for certain 7750-specific information relating to LACP. Here are a couple of answers that were missed out from my main LACP article:

What is the failover time for a LAG / etherchannel? 

The answer to this question varies considerably depending on the setup. If a device notices a bundled interface going physically down then it should unbundle it immediately, causing very low loss (50ms should be achievable).

If an interface stays physically up while the path between the two devices fails (i.e. where there is transmission equipment or EoMPLS in between), also known as a silent failure, the failover will take up to 3 times the LACP timer. So the impact would be up to 3 seconds using fast timers (1 second interval) or up to 90 seconds using slow timers (30 second interval). Most lower end Cisco kit only supports slow timers.

In the event of a single fibre fault or other asymmetric failure, you may see a combination of these effects where traffic in one direction heals faster than the other. There are other corner cases such as when administratively shutting down an interface - some devices send an out of sync LACPDU to inform the other end the link is about to go away which helps speed convergence. It is really best to lab test where possible to check different failure scenarios.

Be aware that when using load balanced LAGs, the impact to some streams may be zero. Typically traffic that is hashed onto one link in the bundle will not suffer loss when a different link in the bundle fails.

How can I transport, rather than terminate, LACP through epipe services on the Alcatel-Lucent 7750? 

The answer to this is pretty straightforward, but I know I looked in the wrong place when I first needed to use the feature.

All you need to do is to configure "lacp-tunnel" under the configure -> port -> ethernet context.

How can I transport, rather than terminate, LACP through a QinQ tunnel on a Cisco switch? 

Again, this is pretty straightforward. There are a load of different protocols that can optionally be tunneled on a dot1q-tunnel port, but we just need lacp enabled:


Simply configure "l2protocol-tunnel point-to-point lacp" under the dot1q-tunnel interface.

Normally it makes sense to tunnel everything (STP, CDP, VTP, LLDP, LACP, PAgP, ...) for consistency. Either be a tunnel or don't!

What is the valid range for LAG IDs on the 7750?

For IOM-based systems (i.e. SR-7, SR-12), the usable LAG ID range is 1 to 200. For integrated IOM systems such as SR-1 and ESS-1, the LAG ID range is 1 to 64.


Can you tell me something unusual about LACP on the 7750?

When the Rx fibre for a LACP-speaking port loses light (i.e. fails), right before the port gets pulled down the 7750 sends an LACP out-of-sync message to inform the other end that it is going away. This is useful for single fibre faults and can drastically improve convergence times, particularly where transmission equipment between the two LACP peers does not forward link loss.

What does "mux: during state COLLECTING_DISTRIBUTING, got event 5(in_sync) (ignored)" mean in a Cisco debug?

As best I can tell it means that a LACPDU was received with the sync bit set, indicating that the far end is ready to use the link, but the link was already collecting / distributing (i.e. in use) so no change in state was required.

What does "lag number : partner oper state bits changed on member port : [expired false -> true]" mean on a 7750 debug?

This means that a particular port's state machine moved into the expired state due to missing three inbound LACPDUs from the peer. Once the port reaches the expired state it is removed from the bundle but the peer parameters are remembered for a further 3 intervals, after which point the peer information is flushed and the port enters the defaulted state.

Do the LACP keys need to match at both ends of a LAG?

No - the LACP key is locally significant and corresponds one-to-one with a LAG or etherchannel ID. It is used to check consistency (i.e. to catch crossed cables) so must be identical for all members within a LAG. However, the devices at each end of the LAG can select any value they like for any particular LAG.

Why do ports get "suspended" from an etherchannel?

Basically a port gets suspended if its configuration is not in line with that of the port channel with which it is associated. The most common way to accidentally arrive in this state is for a member trunk port to have a different allowed VLAN list than its parent port-channel interface. While IOS allows member ports to be reconfigured, it is much more sensible to make the configuration changes to the port-channel interface - the changes are then pushed down to the member ports automatically, avoiding this kind of conflict.


Friday 1 March 2013

A Better Way to Compare 7750 Configs

One of the tasks I regularly have to perform as part of my job is to audit router configurations against basebuild templates and occasionally against each other. This can be quite labour-intensive, particularly as good, old-fashioned diff doesn't cope very well with hierarchical configs such as the 7750's. There are many things that make life difficult with traditional diff:
  1. Diff is more-or-less reliant on order being preserved between the files being compared. Some config elements are split across the config file when it is saved. Others may be stored in different locations depending on the software release. Certain items such as layer 3 interfaces are stored in the order that they were added, so two configs may contain the same interfaces but in a different order and traditional diff can't generally work that out.
  2. Traditional diff does not appreciate the significance of policy names, sequence numbers or service IDs in determining what should be compared to what. Ad-hoc insertions and deletions of policy entries or services often cause diff to get completely out of step, making it compare apples to oranges.
  3. Traditional diff does not provide context for differences. Quite often you just get a load of "shutdown" "no shutdown" pairs. It is possible to include a fixed number of lines pre- or post- difference, but even that does not always show the configuration context where the difference actually occurred and brings a load of junk with it. The only sure-fire way is to turn on side by side diff, which shows both configurations in full side by side with changes marked in a centre column.
Point 2 is the real killer, making traditional diff basically useless for audits where a handful of known policies must be validated within a config containing many other policies. Below is a (fairly common) worst case when working with diff - one policy has been removed and another two added:



Traditional diff makes a horrible job of this. Even seeing the two configs side by side it is confusing to look at and it is not immediately apparent what has changed.

I poked and played with a number of "diff" type tools to try and find something that would handle this kind of thing more gracefully but I eventually came to the conclusion that nothing currently existed. I had a rough idea of what I wanted:
  • It must only compare the contents of like policies, i.e. policy "A" should only be compared to policy "A" and never to policy "B".
  • It must compare configuration elements that appear in a different order in one config to the other.
  • It should, ideally, report the full context of each difference within the hierarchy.
To address these points I decided to write a tool from scratch and called it, unimaginatively, 7750diff. Here's how 7750diff reports the same changes:

D:\7750diff>7750diff a.cfg b.cfg
Unique to a.cfg:
configure
    qos
        scheduler-policy "20000kbps" create
            description "20000kbps Scheduler"
            tier 1
                scheduler "tier1" create
                    rate 20000 cir 20000
                exit
            exit
        exit
    exit
exit
Unique to b.cfg:
configure
    qos
        scheduler-policy "25000kbps" create
            description "25000kbps Scheduler"
            tier 1
                scheduler "tier1" create
                    rate 25000 cir 25000
                exit
            exit
        exit
        scheduler-policy "40000kbps" create
            description "40000kbps Scheduler"
            tier 1
                scheduler "tier1" create
                    rate 40000 cir 40000
                exit
            exit
        exit
    exit
exit

D:\7750diff>


That's not only much clearer (IMHO), but I can take the output and copy / paste it directly into the node to build the missing configuration. Happy days!

How it Works

The basic methodology used by 7750diff is to:
  • Read each config into a hierarchical tree structure based on indent levels
  • Recursively compare the trees starting at the root:
    • For each branch of config A, search for an identical branch on config B within the same context.
    • If a match is found, check for subordinate "child" configuration elements.
    • If any children exist, recursively process them.
    • If no children are then present, remove the matching elements.
Once all elements have been compared only the elements unique to each config will remain, along with the parent elements required to reach the configuration context of the change. The trees can then be output as a list of differences between the two files. In many cases, thanks to the inclusion of context, big chunks of 7750diff output can be directly entered into one node to bring its config into line with the other.
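
To make the method a little more concrete, here is a minimal Python sketch of the same idea. It is not the 7750diff source (which is C); the function names are made up for illustration. It assumes indentation uses spaces only, and it simplifies things by collapsing duplicate sibling lines (such as repeated "exit" statements) into one entry.

```python
def parse(text):
    """Turn indented config text into nested dicts keyed by the stripped line."""
    root = {}
    stack = [(-1, root)]  # (indent, children) pairs; innermost context is last
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        while stack[-1][0] >= indent:  # climb back out to this line's parent
            stack.pop()
        node = stack[-1][1].setdefault(line.strip(), {})
        stack.append((indent, node))
    return root


def prune(a, b):
    """Strip out everything the two trees share, leaving only the differences."""
    for key in list(a):
        if key in b:
            prune(a[key], b[key])  # same line, same context: compare children
            if not a[key]:
                del a[key]  # nothing unique left beneath it on side A
            if not b[key]:
                del b[key]  # nothing unique left beneath it on side B


def render(tree, depth=0):
    """Flatten a tree back into indented lines, parents included for context."""
    lines = []
    for key, children in tree.items():
        lines.append("    " * depth + key)
        lines.extend(render(children, depth + 1))
    return lines
```

Because siblings are looked up by name rather than by position, the order in which policies appear in each file makes no difference to the result.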

Since the whole thing runs on indents it is not 7750 specific. It "may" work on ISAM configs, it may work on any other sort of config that uses indent / whitespace to denote hierarchy - I just haven't tested it. Try your luck :)

Obtaining the Tool

As usual, the tool is available for download from my GitHub: https://github.com/theclam/7750diff - both the C source code and a Windows binary are provided.

The code is pretty horrible as this was my first bit of C coding in over a decade. I may decide to clean the code up at some point but it is quite stable now, meaning I haven't found a config that upsets it for the last couple of versions. I've been running a nightly cron job for quite some time now which pulls down an entire lab's configs and 7750diffs each one against the previous day's and it works great.

If you download 7750diff, please let me know how you get along with it.

Friday 1 February 2013

A Tool for Measuring Forwarding Delay in Packet Captures

I have access to some pretty expensive test kit in work. One of its main purposes in life is to measure the latency of traffic streams passing through a network, which is a pretty useful feature. Occasionally, though, the figures produced can be hard to believe and it would be nice to be able to validate them independently. It would also sometimes be nice to be able to see down to the packet level how the delay varies over time.

Since the tester inserts a "unique" signature into each frame, it is possible to do the calculations by hand - simply take a packet capture of traffic entering and leaving the device (at minimum capture using 2 ports on the same box, preferably use one port with a two-source port mirror), then manually compare the timestamps of the packets pre- and post-routing.

Finding matching pairs of packets is pretty tedious, especially for large captures and particularly where high throughput rates mean that there may be thousands of other frames between a "before" and "after". The technique is sound, though, if you're patient.

For some work I was doing recently, I needed to do this on a grand scale. A multi-megabit stream did not appear to be queuing as expected and it was unclear why. Eventually that particular problem was traced, using Wireshark IO graphs, back to overly bursty traffic being offered into the device under test but it made me think it would be very nice to have a tool for doing this kind of verification and, actually, it would not be difficult to write one. So I wrote a tool, in case I needed it in a hurry later on. The impatient may just want to scroll down to "Obtaining the Tool".

How it Works

Packet headers are inevitably changed by routing (and, in fact, encapsulation could be added or removed in the process) so packet payloads must be compared in order to find "before and after" pairs. The Spirent TestCenter tester includes a 20 byte "signature" in each generated packet, which is always at the very end of the payload. In practice it is not necessary to compare the entire, variable length, payload to pair up packets. Rather it is sufficient, and much faster, to compare the last 20 bytes of each packet for a matching signature.

The process implemented in the tool is to read each packet from the pcap file, storing the following details in a list entry:
  • Frame number
  • Arrival time
  • Signature
Each entry is then stored in a linked list, as in the following diagram:


Each packet read in adds another node of around 36 - 44 bytes in size. This is smaller than the original capture but can still be a considerable amount of memory when working with very large captures.

Once the complete list has been built, the next job is to identify the "before and after" pairs. This is done by considering each list entry in turn, then looking forward in the list for an entry with a matching signature. If such an entry is found, the frame numbers and timestamps of each frame are output along with the time delta between the two frames. Pretty simple, really.
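
The pairing step can be sketched in a few lines of Python. This is an illustration only, not the tool's C code: it assumes the capture has already been read into a list of (frame number, arrival time, raw bytes) tuples, and the name pair_frames is hypothetical.

```python
SIG_LEN = 20  # the tester's signature occupies the last 20 bytes of each frame


def pair_frames(frames):
    """frames: list of (frame_number, arrival_time, raw_bytes) in capture order.

    Returns rows of (arrival frame, arrival time, departure frame,
    departure time, forwarding delay).
    """
    rows = []
    for i, (num_a, time_a, data_a) in enumerate(frames):
        signature = data_a[-SIG_LEN:]
        # Look forward from this entry for the first frame with the same tail.
        for num_b, time_b, data_b in frames[i + 1:]:
            if data_b[-SIG_LEN:] == signature:
                rows.append((num_a, time_a, num_b, time_b, time_b - time_a))
                break
    return rows
```

The forward scan mirrors the description above and is quadratic in the worst case. Indexing the entries by signature in a dictionary would be quicker on big captures, at the cost of some extra memory.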

Obtaining the Tool

The tool is available to download as C source code and as a Windows binary at https://github.com/theclam/fwding.

To build from source code simply extract the source, change into the directory and type "make".

Using the Tool

Once the binary has been compiled or downloaded, simply run it with the name of the pcap file as its only parameter. For example:

lab@lab:~/Projects/fwding$ ./fwding input.cap
Arrival Frame Number, Arrival Time, Departure Frame Number, Departure Time, Forwarding Delay
1, 1359461693.826304, 2, 1359461693.826354, 0.000050
5, 1359461693.826418, 6, 1359461693.826468, 0.000050
7, 1359461693.826585, 8, 1359461693.826701, 0.000116
9, 1359461693.826818, 11, 1359461693.826946, 0.000128
10, 1359461693.826830, 12, 1359461693.826958, 0.000128
17, 1359461693.826999, 18, 1359461693.827014, 0.000015
21, 1359461693.827078, 22, 1359461693.827128, 0.000050
25, 1359461693.827192, 26, 1359461693.827242, 0.000050
27, 1359461693.827359, 29, 1359461693.827488, 0.000129
[...snip...]

The output produced is standard CSV-formatted text, which can be piped or redirected to a file as necessary for manipulation by your favourite spreadsheet or command line tool. Timestamps are in seconds since Unix epoch. Delay is reported in seconds.

Note: The pairing-up mechanism is highly dependent on the test traffic containing unique data in the last 20 bytes of each frame. For tester traffic that's taken care of automatically but your mileage with "real" traffic will vary. I would expect that FTPing a compressed file or playing music / white noise over VoIP should give relatively good entropy to your data if a tester is not available. For best results filter out non-test traffic beforehand - OSPF hellos and LACPDUs are very repetitive so will generate lots of false hits.

For example, if you want a quick-and-dirty graph of latency over arrival time using gnuplot, just pipe the output to file then use a command such as:

gnuplot> plot "all-ways.txt" using 2:5 with points pt 2


Alternatively, graph the latency by frame number:

gnuplot> plot "all-ways.txt" using 1:5 with points pt 2

Hopefully your output won't look like this - it is an intentionally odd example caused by sending very bursty traffic.

Finally

I always ask but it's never happened yet... if you try the tool out, please leave a comment. I'm interested in feedback, good or bad, and if it doesn't quite do what you want I may change it!

Tuesday 22 January 2013

RADIUS and L2TP Support Added to "dechap"

This is just a very short message to say that I have enhanced the "dechap" tool mentioned in my previous post. In addition to the original PPPoE support it can now extract and attack CHAP authentications sniffed from RADIUS and L2TPv2 protocols.

The syntax remains exactly the same and it should "just work". The code is available to download at https://www.github.com/theclam/dechap. Please post a comment if you have any feedback or suggestions.


Monday 21 January 2013

Recovering CHAP Passwords from Sniffed PPPoE Sessions

In a previous blog post I outlined the theory behind setting up a PPPoE session including PPPoE discovery, LCP, NCPs and, more relevant to this post, the basics of CHAP authentication. At the time I was writing the post I wondered how easy it would be to work back from the CHAP messages on the wire to the original credentials, so I decided to find out.

Recap of CHAP Theory


As a reminder (or a very quick introduction), the CHAP process works something like this:

CHAP Authentication
  1. The party requiring the opposite peer to authenticate (i.e. "server") sends a CHAP challenge message containing a challenge ID and some unpredictable "random" data.
  2. The party being authenticated (i.e. "client") concatenates the authentication ID, the password and the challenge data into a single unit, then generates an MD5 hash of that. The resulting hash, plus the client name (user ID or hostname) is passed to the server as a CHAP response.
  3. The server compares the incoming hash to the value it obtains by performing the same calculation locally and returns a CHAP success or CHAP failure message.
Now, clearly, if the CHAP challenge and response messages can be captured then an offline brute force attack can be mounted against the password. This can be achieved by simply extracting the authentication ID, challenge data and response from the relevant messages and then trying candidate passwords until one (hopefully) generates a hash identical to that seen in the response message.
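
Steps 2 and 3 boil down to very little code. The sketch below uses Python's standard hashlib; the function names are my own for illustration and the values in the usage note are made up.

```python
import hashlib
import hmac


def chap_response(auth_id: int, password: bytes, challenge: bytes) -> bytes:
    """Step 2: MD5 over the one-byte ID, then the password, then the challenge."""
    return hashlib.md5(bytes([auth_id]) + password + challenge).digest()


def server_check(auth_id: int, password: bytes, challenge: bytes, received: bytes) -> bool:
    """Step 3: repeat the calculation locally and compare the two hashes."""
    expected = chap_response(auth_id, password, challenge)
    return hmac.compare_digest(expected, received)
```

For example, chap_response(1, b"secret", b"abcd") gives a 16 byte value that server_check() accepts only when called with the same ID, password and challenge.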

The Attack in Practice

While the process is intuitively simple, as usual there are a few corner cases to cover. Recovering CHAP authentications from a capture file full of other junk requires a certain amount of processing logic, then responses must be re-united with their corresponding challenges before they can be attacked.

Gathering CHAP Packets


I wanted the tool to be flexible with regard to encapsulation. Since I work primarily on carrier networks, I get really frustrated by tools that do a job perfectly but only accept untagged, unencapsulated frames. Once you have a packet capture in your hand, realising that it can't be used because it has two VLAN tags and a pair of MPLS labels is a nuisance.

The approach that seemed most sensible was to build a recursive decap function which would take in a (partial) frame plus a "hint" as to what type of header to expect. The function would then check for and record any matching criteria present (i.e. MACs for Ethernet, VLAN ID for 802.1Q - more on this in the next section) before either returning or calling itself on the remainder of the packet with a "hint" derived from the current header.

Worked Example

Let's process the following frame as an example. Data in black are used by the algorithm while data in grey are not.

[Ethernet][VLAN][VLAN][PPPoES][PPP][CHAP]
The initial call to the function passes the entire frame with an "Ethernet" hint. In the Ethernet header, the source and destination MAC addresses are read and stored. The EtherType field contains 0x8100, indicating an 802.1Q VLAN header is next. The function calls itself against the contents of the frame from byte 15 and on with a hint of "VLAN".

[VLAN][VLAN][PPPoES][PPP][CHAP]

Now the function reads and stores the VLAN ID. Since this is the first VLAN we have seen it is stored as the C-VLAN for now. The EtherType is, again, 0x8100 so the function calls itself against bytes 5 and onward using a hint of "VLAN".

[VLAN][PPPoES][PPP][CHAP]


Again, the function reads and stores the VLAN ID. Since this is not the first VLAN tag found, the previously known VLAN ID is moved into the S-VLAN field and the value from the frame is stored in the C-VLAN field. This time the EtherType is 0x8864, indicating a PPPoE session header follows. The function calls itself against bytes 5 and onwards using a hint of "PPPoE".

[PPPoES][PPP][CHAP]


The function now reads and stores the PPPoE session ID (SID). The only valid thing to follow a PPPoE session header is a PPP header, so the function calls itself on bytes 7 and onward, using a hint of "PPP".

[PPP][CHAP]

The function now simply checks that the protocol ID in the PPP header is 0xC223 for CHAP. If so, it calls itself one last time against bytes 3 and onward using a hint of CHAP.

[CHAP]


Finally we are down to the payload. The CHAP message type is checked and:
  • For challenges, the authentication ID, challenge length and challenge data are stored.
  • For responses, the authentication ID, response and client name are stored.
Each instance of the function can then return to its parent, eventually resulting in a fully populated record of all the data relevant to authentication. The completed records can then be stored in a doubly linked list for later consumption.
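
Here is a rough Python rendering of the walk-through above, for anyone who prefers code to prose. It is a sketch under assumptions rather than the dechap source: the record is a plain dict with key names I have invented, only the header types from the worked example are handled (no MPLS), and there is no checking for truncated frames.

```python
import struct

# EtherType -> hint for the next header, per the worked example
NEXT_HINT = {0x8100: "vlan", 0x8864: "pppoe"}


def decap(data, hint, rec):
    """Peel one header off `data`, note anything useful in `rec`, then recurse."""
    if hint == "ethernet":
        rec["dst_mac"], rec["src_mac"] = data[0:6], data[6:12]
        (ethertype,) = struct.unpack("!H", data[12:14])
        return decap(data[14:], NEXT_HINT.get(ethertype), rec)
    if hint == "vlan":
        tci, ethertype = struct.unpack("!HH", data[0:4])
        if "c_vlan" in rec:  # not the first tag seen: the earlier one was the S-VLAN
            rec["s_vlan"] = rec["c_vlan"]
        rec["c_vlan"] = tci & 0x0FFF  # the VLAN ID is the low 12 bits
        return decap(data[4:], NEXT_HINT.get(ethertype), rec)
    if hint == "pppoe":
        (rec["sid"],) = struct.unpack("!H", data[2:4])
        return decap(data[6:], "ppp", rec)
    if hint == "ppp":
        (protocol,) = struct.unpack("!H", data[0:2])
        return decap(data[2:], "chap", rec) if protocol == 0xC223 else None
    if hint == "chap":
        code, auth_id, length = struct.unpack("!BBH", data[0:4])
        if code not in (1, 2):  # 1 = challenge, 2 = response
            return None
        size = data[4]
        rec["type"] = "challenge" if code == 1 else "response"
        rec["auth_id"] = auth_id
        rec["value"] = data[5:5 + size]      # challenge data or response hash
        rec["name"] = data[5 + size:length]  # sender's name
        return rec
    return None  # unknown header type: not a frame we care about
```

Calling decap(frame, "ethernet", {}) on a double-tagged CHAP frame returns the populated record, while anything that is not CHAP over PPPoE comes back as None.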

Pairing Up

A CHAP response must be paired up with its respective CHAP challenge, otherwise the maths don't work. In real life there may be several authentications in progress at one time across multiple PPPoE sessions, possibly over multiple different VLANs. Often the CHAP authentication ID is only unique within a PPPoE session. Similarly, the PPPoE session ID only needs to be unique within a broadcast domain so these are often re-used across VLANs. Care must be taken to ensure that the challenge and response really do belong together.

In order to be considered a challenge / response pair, I decided the following criteria must match:
  • Server and Client MACs
  • S & C VLAN IDs (if present)
  • PPPoE SID
  • CHAP authentication ID
I considered including MPLS labels in this but I struggled to think of a realistic scenario in which two authentications would match the above criteria but use a different label.

Additionally, the thought occurred that even with the above details matching, there may be more than one challenge / response pair for the same PPPoE session so a response would have to be paired with the most recent challenge for which the criteria matched. In the program this is achieved by working backwards through the linked list, starting at the response, until a match is found. Data from matching challenge / response pairs are stored in another list for later consumption. If the search reaches the beginning without a matching challenge being found then the response cannot be used and is ignored.
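
The backwards search might look something like this in Python. It is only a sketch: it assumes each record is a dict whose MAC addresses have already been normalised into "server_mac" and "client_mac" (since source and destination swap between challenge and response), and all key names are hypothetical.

```python
MATCH_KEYS = ("server_mac", "client_mac", "s_vlan", "c_vlan", "sid", "auth_id")


def pair_up(records):
    """records: dicts in capture order, each with a 'type' of challenge/response."""
    pairs = []
    for i, rec in enumerate(records):
        if rec["type"] != "response":
            continue
        # Walk backwards from the response so the most recent challenge wins.
        for earlier in reversed(records[:i]):
            if earlier["type"] == "challenge" and all(
                earlier.get(key) == rec.get(key) for key in MATCH_KEYS
            ):
                pairs.append((earlier, rec))
                break
        # Falling off the start of the list means the response is unusable.
    return pairs
```

Using .get() means untagged frames, where the VLAN keys are absent on both sides, still compare as equal.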

Brute Force Password Guessing


For each challenge / response pair in the list, the next step is to cycle through a list of password guesses. Each candidate password is combined with the authentication ID and challenge data from the captured authentication and hashed. The resulting hash is compared to the one from the captured response and, for those that match, a correct guess is reported. If no password generated a matching hash then the word list does not contain the correct password and this is also reported back.
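
In Python the guessing loop would be something along these lines. Again this is an illustrative sketch rather than the tool's C code; the function name and the assumption that candidate passwords arrive as text strings are mine.

```python
import hashlib


def crack(auth_id, challenge, response, candidates):
    """Return the first candidate whose hash matches the captured response."""
    prefix = bytes([auth_id])
    for word in candidates:
        guess = word.encode()
        if hashlib.md5(prefix + guess + challenge).digest() == response:
            return word
    return None  # the word list does not contain the password
```

Since candidates can be any iterable, a word list file can be streamed through it a line at a time (with the newline stripped from each line) rather than being loaded into memory.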

Downloading the Tool

The C source code may be downloaded from: https://github.com/theclam/dechap

Provided the OpenSSL dev libraries are installed it should be possible to simply extract the source code, cd into the directory then run "make".

In the future I may add the capability to pull the auths from L2TP or RADIUS interactions but for now only PPPoE is supported. It also assumes that Ethernet control words are not present in MPLS encapsulated traffic.

Using the Tool


The usage is pretty straightforward - there are only two parameters and both are mandatory. Specify your capture file (original pcap format) with the -c flag and your word list with the -w flag. Here's an example:

lab@lab:~/dechap$ ./dechap -w mywords.txt -c someauths.cap
Found password "tangerine" for user user1@testisp.com.
Unable to find a password for user user2@testisp.com.
Found password "password1" for user user3@testisp.com.
Found password "Africa" for user user4@testisp.com.
Found password "Frankenstein" for user user5@testisp.com.
lab@lab:~/dechap$

Considering that I've made no effort at all to make the code efficient, I've found the speed pretty good. On my '90s PC, a worst-case run (i.e. where no passwords are found) against 800 auths with 100k candidate passwords still completes inside a minute. I don't think that's bad for parsing 15,000 packets and running 80 million concatenate - hash - compare sequences.

If you try this out, please leave a comment on this post with your experiences - good or bad.

Friday 11 January 2013

A Script to Bring Up a PPPoE Session using Python & Scapy

As I mentioned in my previous post, I have put together a script which can bring up a PPPoE session, authenticate using CHAP, negotiate an IP address and send / receive traffic. The script is written in Python and requires a relatively up to date version of scapy (I use v2.2.0-dev, just grab the latest from http://www.secdev.org/projects/scapy/).

I warn you now that I am not a professional coder (or even a particularly keen amateur) and I don't really get on with Python... so don't be surprised if it looks a bit C-like!

To run the script, simply download PPPoESession.py from https://github.com/theclam/PPPoESession-Python and call it from within Python:

root@labpc:~# python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> execfile("PPPoESession.py")
__main__:2: DeprecationWarning: the md5 module is deprecated; use hashlib instead
WARNING: No route found for IPv6 destination :: (no default route?)
/usr/local/lib/python2.6/dist-packages/scapy/crypto/cert.py:10: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
  import os, sys, math, socket, struct, sha, hmac, string, time
/usr/local/lib/python2.6/dist-packages/scapy/crypto/cert.py:11: DeprecationWarning: The popen2 module is deprecated.  Use the subprocess module.
  import random, popen2, tempfile
>>>


You can expect to see a few deprecation warnings, depending on which version of Python is in use.

The script defines the PPPoESession class, plus a few other miscellaneous functions for encapsulating and extracting parameters. The PPPoESession class inherits from the scapy Automaton class, so all the useful features of that class such as graph() and easy debugging are available. See the scapy Automata wiki entry (http://trac.secdev.org/scapy/wiki/Automata) for more details.

In order to bring up a PPPoE session, a PPPoESession object needs to be instantiated and a few parameters need to be set. At minimum the Ethernet interface, username and password need to be configured:

>>> p = PPPoESession()
>>> p.iface="eth1"
>>> p.username="spongebob@bodges"
>>> p.password="password"


Once that is done, the automaton can be started using the runbg() method. The state machine then runs in the background, returning control to the user. Messages will appear as it goes through the motions of bringing up the PPPoE session, then the PPP session, then authenticating before finally completing IPCP:

>>> p = PPPoESession()
>>> p.username="spongebob@bodges"
>>> p.password="password"
>>> p.iface="eth1"
>>> p.runbg()
>>> Starting PPPoED
Starting LCP
Got CHAP Challenge, Authenticating
Authenticated OK
Starting IPCP
Peer provided our IP as 123.4.5.6
IPCP is OPEN

>>>

Once IP is negotiated, the automaton will stay in the IPCP_OPEN state, able to send and receive IP packets and automatically responding to any LCP echoes that arrive.

From that state, the following methods may be called:

recv_queuelen() - returns the number of packets waiting in the receive buffer
recv_packet() - returns and de-queues the first packet in the receive buffer
send_packet(IPPacket) - transmits the given IP packet over the PPPoE session
ip() - returns the IP address given to the client
gw() - returns the peer's IP address

Here's an example of passing some traffic on an open session by pinging the gateway:

>>> p.recv_queuelen()
0
>>> p.send_packet(IP(src=p.ip(), dst=p.gw())/ICMP())
>>> p.recv_queuelen()
1
>>> p.recv_packet()
<IP  version=4L ihl=5L tos=0x0 len=28 id=1 flags= frag=0L ttl=64 proto=icmp chksum=0xbd0f src=1.1.1.1 dst=123.4.5.6 options=[] |<ICMP  type=echo-reply code=0 chksum=0xffff id=0x0 seq=0x0 |<Padding  load='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' |>>>
>>>

The script is still very much a work in progress. There is, for example, no clean way to gracefully shut down the PPP session at the moment and it doesn't handle incoming Terminate-Requests, either. I am hoping to add that, and more, soon.

Have a play with it and let me know what you think, good or bad :)

Thursday 10 January 2013

Bringing Up a PPPoE Session - The Theory

In a previous post, I shared a Scapy script that implements the PPPoE discovery stage and stops once the session stage is reached. As handy as that script is for testing AC Cookie validation, it is not particularly useful for anything else. It would be much better if the script could bring the PPP session all the way up.

Luckily, the PPPoE discovery script is a cut-down version of another script that I wrote a long time back which goes all the way from PPPoED, through LCP and CHAP authentication and stops at IPCP. At the time, the script was far too messy to share but I've tidied it up and it is now in a state that it could be useable by others. I've also added IPCP negotiation and a couple of methods for sending and receiving IP traffic over the resulting session.

Before I present the script, I'll cover the theory involved, step by step. The impatient may want to just go to the next post (when it is available) for the script itself and instructions on how to run it.

PPPoE Discovery

PPP is (a) point-to-point protocol, designed to run over a dedicated link between two devices. Ethernet is a multi-access network, so if we want to run PPP over Ethernet then we need a mechanism to discover peers and establish a point-to-point relationship between two devices over the shared medium.

PPPoE provides this service and operates in two distinct stages:
  1. Discovery: The discovery stage is responsible for locating PPPoE peers and negotiating session parameters so that, ultimately, a PPPoE session can be created.
  2. Session: Once the discovery stage is complete the protocol enters the session stage, at which time the two peers have a tunneled connection between them over which to start passing PPP.
Once the session stage is reached, the peers bring up and operate their PPP session exactly as they would over a dedicated link.

The diagram below summarises the PPPoE "Discovery" stage:

PPPoE State Transitions
The first step in the journey is to find a PPPoE access concentrator which is willing to terminate our session. To do this, we must broadcast a PPPoE Active Discovery Initiation (PADI) message. It is possible to specify a service name in the PADI - this is just a string that identifies a particular type of service in which the client is interested. The access concentrator may use this to decide whether or not to offer to terminate the session, though in most cases it is just ignored. For this reason clients generally use an empty service name.

Any access concentrators listening on the segment will receive the PADI message, inspect its contents and then make a decision whether or not to make an offer to terminate the client's session. If the access concentrator is willing to terminate the session, it signals this to the client by sending a unicast offer (PADO) message. Typically, the PADO has an AC-Cookie attached to it - essentially the AC-Cookie is an "unpredictable" string, derived from the client's MAC address, which the access concentrator uses to mitigate certain kinds of resource exhaustion attacks. When AC-Cookies are used, a PADO is generated 'mechanically' from the incoming PADI and no state is created on the access concentrator at this point.

When the client has received at least one PADO, it must select a favourite. It is common to just use the first offer received, but other selection criteria may be used. The client then sends a unicast request (PADR) to the chosen access concentrator, indicating that it would like to accept its offer. If an AC-Cookie was contained in the PADO message then it is echoed back in the PADR. The requirement to echo the cookie back to the access concentrator is designed to validate that the client really exists and is available on the MAC address where the PADO was sent.

Finally, it is up to the access concentrator to confirm that the session has been created. If AC-Cookies are in use then the incoming PADR is examined to check whether the AC would have generated the provided cookie given the source MAC - in the case of a mismatch the PADR is silently dropped, otherwise the session state is created in the AC and a session (PADS) message is unicast to the client to confirm that the session has been created and the "Session" stage has begun. The PADS always contains a PPPoE session ID number, which is used to differentiate between multiple PPPoE sessions on the same LAN and must be present in the header of every PPPoE frame exchanged with the AC during the "Session" stage.
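
To show how an access concentrator can validate a cookie without having stored anything when it sent the PADO, here is a small Python sketch. Using a keyed hash (HMAC) over the client MAC is my choice for illustration; real access concentrators may derive their cookies differently, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

AC_SECRET = os.urandom(16)  # known only to the access concentrator


def make_cookie(client_mac: bytes) -> bytes:
    """Derive the cookie from the PADI's source MAC; nothing is stored."""
    return hmac.new(AC_SECRET, client_mac, hashlib.sha256).digest()


def padr_is_valid(client_mac: bytes, cookie: bytes) -> bool:
    """Regenerate the cookie from the PADR's source MAC and compare."""
    return hmac.compare_digest(make_cookie(client_mac), cookie)
```

A PADR carrying a cookie that was issued to a different MAC address, or one that was simply made up, fails the check and can be dropped before any session state is allocated.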

The fifth type of PPPoE discovery message is the terminate (PADT) which, as its name suggests, is used to terminate (i.e. end) a session which has been established. Either end may send a PADT message to close the session and once a PADT has been received, no further traffic may be sent for that session.

PPP

PPP itself consists of a number of sub-protocols. There are:
  • Link Control Protocol (LCP) which is responsible for negotiating overall link parameters
  • PAP and CHAP which are used for authentication
  • A family of Network Control Protocols (NCPs) used to negotiate the transport of each upper layer protocol
PPP also defines that once a higher layer protocol has been negotiated by its corresponding NCP, that protocol's traffic will be encapsulated with a header indicating that particular protocol's protocol number.

Link Control Protocol (LCP)

RFC 1661 defines LCP as the protocol that is responsible for "establishing, configuring, and testing the data-link connection." Essentially this means that LCP is used to bring up and take down PPP links, negotiate the configuration parameters and check that the link is still alive. There is a range of LCP codes which are used to fulfil these aims, discussed below.

Configuration Type Codes

In order to bring up a PPP session both peers must agree on certain parameters, for example the maximum size of frame that may be passed, whether to use compression and so on. Both peers propose the settings they would like to use - the opposite peer will then either acknowledge (accept), nak (i.e. suggest alternative) or reject (outright refuse) the proposed options. The aim is to reach a state where the opposite peer has acknowledged the locally proposed parameters.

The following LCP codes are standard and must be implemented:

Configure-Request - Used to propose a set of parameters that we would like to use for the session. The peer will then respond to the proposed parameters with one of the next three responses.

Configure-Ack - Used to advise the peer that their proposed parameters are acceptable. The accepted parameters are echoed back in the ack message.

Configure-Nak - Used to advise the peer that their proposed parameters are not acceptable and that alternative values should be used. The proposed changes are attached to the nak message.

Configure-Reject - Used to advise the peer that their proposed parameters are not supported and cannot be used. The unacceptable parameters are echoed back in the reject message.
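
The choice between the three responses can be sketched as a small decision function. This is a simplified Python model, not a real PPP stack: options are a plain dict, and the "policy" argument (a hypothetical structure) maps each option we support to a function returning the value we are prepared to accept.

```python
def answer_configure_request(options, policy):
    """Pick a response code and the options to send back with it."""
    # Options we do not support at all are refused outright.
    rejected = {name: value for name, value in options.items() if name not in policy}
    if rejected:
        return "Configure-Reject", rejected

    # Supported options with unacceptable values get a counter-proposal.
    naks = {}
    for name, value in options.items():
        preferred = policy[name](value)
        if preferred != value:
            naks[name] = preferred
    if naks:
        return "Configure-Nak", naks

    # Everything was fine, so echo the lot back.
    return "Configure-Ack", dict(options)
```

With a policy of {"mru": lambda v: min(v, 1492)}, a request for an MRU of 1500 draws a Configure-Nak suggesting 1492, and a follow-up request for 1492 is acked.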

Termination Type Codes

Either peer may request to terminate the session at any point and the opposite peer must honour that request. There are two termination related codes in LCP:

Terminate-Request - Generated by a peer to initiate the tear-down of the link. A Terminate-Request should be re-sent if no Terminate-Ack is received in response.

Terminate-Ack - Generated to confirm receipt of a Terminate-Request. A Terminate-Ack must be generated in response to a Terminate-Request.

Liveness Check Codes

LCP includes a ping-like echo mechanism to verify that the opposite peer is still available, has LCP in an open state and is responding. The same mechanism is used to detect a looped interface - due to the symmetric nature of PPP it's quite possible to negotiate a connection to yourself without necessarily realising or for a connection to be looped mid-session. The following codes are used for liveness checks:

Echo-Request - Sent to the remote peer to solicit an Echo-Reply message. There is no requirement to negotiate the use of LCP echoes and an Echo-Request may be generated at any time while LCP is open. If the Magic-Number option was negotiated during LCP, the Echo-Request must contain the "random" 4 octet magic number decided at that time.

Echo-Reply - Sent in response to an Echo-Request message. When LCP is open, an Echo-Reply message must be sent whenever an Echo-Request is received. Per RFC 1661 the Echo-Reply carries the replying peer's own magic number rather than a copy of the one in the request. If an incoming packet has our magic number then the connection has become looped.

Other Codes

There are other codes such as Code-Reject, Protocol-Reject and Discard-Request which do pretty much what you would expect. You don't get to see them very often so I will not discuss them here. I suggest referring to RFC 1661 for more detail on these.

LCP State Diagram

Below is a simplified state diagram showing how LCP makes its way from the "Starting" state into an "Opened" state. Most parts of PPP are referred to as "open" when they are up and running. I have omitted a number of transitions that deal with strange corner cases (like if the peer acks something we never sent, etc) and also transitions related to closing the connection (the Term commands discussed above). RFC 1661 contains a complete state transition table which is far more complex. If you bear in mind that at any stage either peer may terminate the session then this minimal version will cover 95% of "normal" cases.

LCP State Transitions

Authentication

Once LCP is open, the next stage is typically to start authentication. Authentication may be done by either, neither or both the peers as negotiated by LCP and can be done using plaintext PAP or MD5 hashed CHAP. If no authentication was negotiated by LCP, an implicit pass is assumed.

PAP is hardly ever used these days, is strongly discouraged and in any case is pretty simple, so I will not discuss it here. Please refer to RFC 1334 if you require details on PAP.

CHAP, though not immune to attack, offers reasonable security. The password itself is never sent "over the wire" and there is good protection against replay attacks via the use of random challenges. Here is how CHAP operates:

CHAP Authentication

Essentially, security is provided in two ways:
  1. The password is never exchanged in the clear but instead is passed through a one-way cryptographic hash function. It is computationally infeasible to recover the password from the hash function's output, so it is quite safe to pass this output over the wire.
  2. If the client just hashed the password, then it would be possible for an attacker to capture the hashed value and authenticate with the server at a later time by simply replaying the same response. CHAP requires the server to generate a random challenge string, which is also fed into the hash function and affects its output. Provided the server never re-uses a challenge value, an attacker cannot simply replay a previous authentication response to gain access.
When the CHAP response comes in, the server compares the received hash value with the output of a local calculation using the same method to determine whether the authentication attempt was successful. While this is precisely true when the server has a local copy of the password, typically this is not desirable and in practice the authentication check is deferred to an external RADIUS server. In order for the RADIUS to validate the attempt, the server must pass it a copy of the ID and challenge sent, plus the response received. The RADIUS can then use the ID, its own copy of the plaintext password and the challenge value to compute the expected response. If the expected and actual responses match then the RADIUS will return an "Accept" response, otherwise it will return a "Reject" response.

Network Control Protocols (NCPs)

Before any higher layer protocol can be passed through a PPP tunnel, it must be negotiated by a corresponding NCP. For example before you can pass IP through a PPP tunnel, IPCP must be open, indicating that all the required IP parameters have been successfully negotiated. To pass OSI traffic, OSICP must be open. For IPv6, IPV6CP is used.

The operation of each NCP is different but they all essentially follow the same model as LCP - parameters are proposed by each peer and ack'd, nak'd or rejected by the opposite peer - and the state transition diagram looks pretty much the same.

IPCP

I'll go into a little more detail on IPCP since that is the most commonly used (for now) with a worked example of a DSL subscriber connecting to his ISP, starting immediately after authentication succeeds.

Client Side

The client generally does not know anything when it first connects and relies on the server to provide it with everything it needs. The client sends a Configure-Request proposing 0.0.0.0 for its IP address and for its primary and secondary DNS servers. Proposing 0.0.0.0 for these is actually an explicit request for the server to provide legitimate values for the client to use.

The server will then respond with a Configure-Nak message containing the IP address and DNS servers that the client should use.

The client will then send another Configure-Request with the newly acquired details, to which the server responds with a Configure-Ack.
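The exchange above can be sketched as a toy simulation in Python. This is not a real IPCP implementation: the option names, the addresses (taken from the documentation ranges) and the function names are all made up for illustration, and the messages are modelled as plain dictionaries rather than real packets.

```python
# Values the hypothetical server wants this client to use
ASSIGNED = {
    "ip_address": "192.0.2.10",
    "primary_dns": "198.51.100.1",
    "secondary_dns": "198.51.100.2",
}


def server_handle_request(options):
    """Nak any option whose value differs from the assigned one, else Ack."""
    corrections = {
        name: ASSIGNED[name]
        for name, value in options.items()
        if name in ASSIGNED and value != ASSIGNED[name]
    }
    if corrections:
        return "Configure-Nak", corrections
    return "Configure-Ack", dict(options)


def client_negotiate(max_rounds=5):
    """Start from 0.0.0.0 everywhere and adopt whatever the server Naks with."""
    options = {name: "0.0.0.0" for name in ASSIGNED}
    replies = []
    for _ in range(max_rounds):
        code, body = server_handle_request(options)
        replies.append(code)
        if code == "Configure-Ack":
            return options, replies
        options.update(body)  # retry with the values suggested in the Nak
    raise RuntimeError("IPCP negotiation did not converge")
```

Calling client_negotiate() goes through one Configure-Nak followed by one Configure-Ack, matching the two round trips described above.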

Server Side

The server will typically send out a Configure-Request containing only its own IP address. There is no reason to argue over this so the client should just respond with a Configure-Ack. If the client tries to push a different address to the server using a Configure-Nak, it is typically ignored and after a few retries the session gets pulled down.

Passing Traffic

Once the two peers are agreed and IPCP is open, IP packets may be passed through the PPP tunnel by attaching a header - in most cases, for PPPoE connectivity, the PPP header consists of only a two-byte protocol number (0x0021 for IP). The protocol number is analogous to the EtherType field of an Ethernet frame and indicates to the receiver how to interpret the payload. Alternative encapsulations exist - refer to RFC 1662 for more details on HDLC-style framing, which is often seen in L2TP.
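As a rough illustration of the two-byte header, the Python sketch below prepends the protocol number to a payload and strips it off again. The helper names are hypothetical and it only covers the simple PPPoE-style case described above, not HDLC-style framing or the PPPoE header itself.

```python
import struct

PPP_PROTO_IP = 0x0021  # PPP protocol number for IP, as mentioned above


def ppp_encapsulate(protocol: int, payload: bytes) -> bytes:
    """Prepend the two-byte protocol number in network (big-endian) order."""
    return struct.pack("!H", protocol) + payload


def ppp_decapsulate(frame: bytes):
    """Split a frame into (protocol number, payload)."""
    if len(frame) < 2:
        raise ValueError("frame too short to hold a PPP protocol number")
    (protocol,) = struct.unpack("!H", frame[:2])
    return protocol, frame[2:]
```

A receiver would look at the protocol number returned by ppp_decapsulate() to decide which handler gets the payload, in much the same way as an Ethernet stack dispatches on EtherType.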

Further Reading

That about covers the protocols involved in bringing up a PPPoE session at a high level. If you require more information I would suggest turning to the following RFCs:

RFC 2516 - PPPoE - http://tools.ietf.org/html/rfc2516
RFC 1661 - PPP - http://tools.ietf.org/html/rfc1661
RFC 1994 - CHAP - http://tools.ietf.org/html/rfc1994
RFC 1332 - IPCP - http://tools.ietf.org/html/rfc1332
RFC 1877 - IPCP extensions for DNS - http://tools.ietf.org/html/rfc1877