Monitoring Bluetooth is hard. A Bluetooth piconet (a small network of two or more Bluetooth devices) continually hops through 79 adjacent channels, each 1 MHz wide. The piconet uses one channel at a time, hopping to a new channel 1600 times each second according to a pseudo-random channel sequence based on semi-secret information.
In order to capture all the transmissions of a particular piconet, a receiver must predict and execute the piconet's hopping sequence. (An alternative approach is to capture all 79 channels simultaneously and then throw out the 78 that are unused at any particular time after identifying the channel that is active for a particular time slot. I'm considering this to be too expensive, but it can be done and will become less expensive over time.)
We can only predict the hopping sequence if we know two pieces of information. The first is a portion (the 28 lowest bits) of the piconet master device's numeric address (the "MAC address" or more properly "BD_ADDR"), and the second is the master's clock, a 28 bit integer value that increments 3200 times per second. If we know both the address and the clock at a particular time, then we can correctly predict the hopping sequence forever. We just have to use the hopping algorithm dictated by the Bluetooth specification.
In 2007, Spill and Bittau demonstrated that the necessary portion of the address can be derived from the contents of any single frame. [update: see Thierry Zoller's comment below.] It is easy to capture a frame by listening on a single channel and waiting for the piconet to hop through it. With 79 channels and 1600 hops per second, this doesn't take long. All that we are then missing is the clock.
One way to acquire the clock is to join the piconet. When we successfully join, the master shares its clock with us so that we can follow along from then on. In order to join, we need to know the entire address of the master, not just the lowest 28 bits. Since we already know part of the address, the remaining bits can be guessed fairly easily (see Josh Wright's BNAP BNAP project). Armed with the complete address, we can attempt to join the piconet. This works in a great many cases, even when you might think it shouldn't, but it is possible that we could encounter a master device that will not let us join. It is also possible that we would prefer to monitor passively and do not want to interfere with the piconet in any way.
There is a way to acquire the clock passively by observing another device joining the piconet. Unfortunately, this requires some combination of luck and patience. If we don't have an opportunity to observe a device joining the piconet, this technique doesn't help us.
It is possible to reliably determine the master's clock completely passively, and that is by reversing the hopping sequence. That is, instead of using the hopping algorithm in the forward direction (determining the sequence from the clock), we use it in reverse (determining the clock from the hopping sequence). The hopping sequence repeats every 134217728 steps (28 clock bits minus one bit because it only hops every other clock cycle) with each step calculated from the address and clock. You can think of it as a static sequence based on the address with the clock value indicating the current index or position within the sequence. Using the address, we can pre-calculate the entire sequence. If we then observe a small number of hops taken by the target piconet, we can search through the complete sequence to find the index that matches the observed hops. This index is the clock.
At first, I thought this would be a great application for a high-bandwidth (79 MHz) software radio device. We could capture all 79 channels for a fraction of a second, do a bunch of number crunching to identify target frames and reveal a short segment of the hopping sequence, and then search through the complete sequence to find the clock. Even if we can't decode all 79 channels simultaneously in real time, we can probably do real time decoding of one channel at a time after (slowly) reversing the hopping sequence. Assuming that the pseudo-random hopping sequence is reasonably random, we would only need to observe a few hops in order to have a high probability of locating a unique match within the complete sequence; five hops ought to be enough in most cases.
Unfortunately, the pseudo-random hopping sequence is a long way from being reasonably random. The algorithm does a good job of spreading nearby time slots across a wide range of channels, but it does so with a great deal of repetition in the long run. If we capture all 79 channels for five hops, our chance of finding a unique match in the complete sequence is almost nil. Even after observing fifty consecutive hops we would have less than a 50% chance of success.
To illustrate this, I simulated the results for a large number of cases (with random address and random clock). This graph shows how often an observed sequence segment turned out to be unique for various observation periods. One set of simulations was done assuming a single observed channel, one with eight adjacent channels (randomly selected), and one with all 79 channels. When observing fewer than 79 channels, we miss many of the hops, but we are able to capture a frame each time the piconet hops through one of the observed channels.
It turns out that, in order to have a 95% chance of getting a unique match while observing all 79 channels, we have to capture not five hops, but about 650! This requires an observation period of about four tenths of a second and a great deal of computation. Multiplying the four tenths of a second by the number of channels observed (79), the result is about 32 channel seconds processed. At the other end of the scale, if we only observe a single channel, it takes about 2000 total hops (out of which only about 25 will be captured) to get to a 95% chance of a unique match. It takes almost a second longer of observation, but the number of channel seconds processed is only 1.25.
Adding channels helps quite a bit less than I expected. Regardless of the number of channels observed, the important thing is to capture frames that span a long enough period of time. Since the observation of more channels involves significantly more computation and more expensive hardware, it looks like the best way to reverse the hopping sequence is to listen to a single channel until a unique match is found.
Here is the code I used for the simulations. It includes a fast (I think) implementation of the hopping algorithm. Please let me know if you find any bugs! I'd love to hear from you if you find this interesting or useful.