Wednesday, August 24, 2016

Why validation surveys aren't enough: the importance of functional validation testing.

I've been thinking a lot lately about post validation surveys and what value they provide.  And I'm starting to wonder if the premise of validation surveys isn't being realized. After all, what exactly are you validating?

I've seen surveys that looked great (-67 dBm coverage, 30 dB SNR, no CCI, no interference), and yet those sites have been receiving complaints about the Wi-Fi. A visit to the site illuminates some challenge that the survey didn't convey, and it is usually easily remedied. Sometimes it's RF related. Sometimes it's configuration.

Surveys don't lie, but they tell the story from their perspective. Let's call this subjective reality. And the reality that your survey rig sees may vary dramatically from what other devices see, so much that it appears false. Think of it as a cognitive bias around how Wi-Fi works as seen by the device testing it.

There has been a lot written on "compensation" lately and how we can compensate for the delta between the survey device and the real device. But I don't know if this can overcome enough of the differences between our survey device and the variety of devices out there. Sure, I can compensate a fixed amount from my survey NIC to the actual device. Is that compensation value flat? Or is there a compensation value per channel?
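To make the "flat or per channel?" question concrete, here is a minimal Python sketch. It assumes you have walked a few spots with both the survey NIC and the real device and written down paired RSSI readings; the function name, the tuple layout, and the sample numbers are all made up for illustration and don't come from any survey tool.

```python
from collections import defaultdict
from statistics import mean


def per_channel_offsets(readings):
    """Average (device - survey) RSSI delta in dB, grouped by channel.

    readings: iterable of (channel, survey_rssi_dbm, device_rssi_dbm)
    tuples taken at the same physical spots (hypothetical format).
    """
    deltas = defaultdict(list)
    for channel, survey_dbm, device_dbm in readings:
        deltas[channel].append(device_dbm - survey_dbm)
    return {ch: mean(vals) for ch, vals in sorted(deltas.items())}


# Invented paired readings, purely illustrative.
sample_readings = [
    (36, -60, -66),
    (36, -62, -70),
    (149, -61, -72),
    (149, -63, -76),
]

offsets = per_channel_offsets(sample_readings)
for channel, offset in offsets.items():
    print(f"channel {channel}: device reads {offset:+} dB vs survey NIC")
```

If the per-channel numbers come out far apart, as they do in this invented data, a single flat offset is hiding something. And this only covers RSSI for one device model.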

And then let's talk about noise, SNR, overlap, roaming, etc.  Can we compensate for all of these variables per channel?  Suddenly compensation seems about as practical as calibration.  Then we have another view to consider: how the APs see things.  So how do we validate networks?  

Functional Validation Testing:
Personally, I think engineers spend too much time focusing on surveys and not enough time validating that the network meets the requirements for the devices using it. That means getting the actual devices on the network and testing that they meet the requirements. Whose requirements? The business requirements for their devices.

Ever seen a device that fails to roam if it sees more than one "good" candidate?  Those are fun.  What about an iPhone that can see a network but won't automatically associate to it even when configured to?  But... But... But... The survey results look fine.  You won't find those problems on a Windows machine running your favorite survey software.

Take that forklift and barcode scanner for a ride and make sure it roams and works the way it's intended. Spend a day walking the facility measuring voice call quality, roaming, throughput, etc. with the devices that will actually be deployed. This ensures that you know and understand the device and how it will be used, and it helps you identify places where you may have issues. Have actual workers show you how they use it.

This process also makes you think about how you can validate whether the device works normally. Maybe it's packet captures looking for L2 retransmissions? Measured roaming delays? Maybe it's constantly scanning a barcode into the order management system as you fly by on a forklift. Maybe it's wired-side packet captures looking for Telnet or SSH sessions dropping and reconnecting within a specific amount of time. Or, if you are lucky, it might be putting the device into a survey mode and seeing how it sees the network around it.
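As a rough sketch of what "looking for L2 retransmissions" and "measured roaming delays" could boil down to, the Python below works on frames you have already exported from a capture. The dict and tuple layouts, the function names, and the sample values are assumptions for illustration; the only real-world detail relied on is the Retry bit in the 802.11 frame control field.

```python
def retry_rate(frames):
    """Fraction of frames with the 802.11 Retry bit set.

    frames: list of dicts with a boolean 'retry' key (hypothetical export format).
    """
    if not frames:
        return 0.0
    retries = sum(1 for frame in frames if frame["retry"])
    return retries / len(frames)


def roam_gaps(data_frames):
    """Gap in seconds between the last frame on one BSSID and the first on the next.

    data_frames: time-ordered list of (timestamp_s, bssid) for ONE client.
    Returns a list of (old_bssid, new_bssid, gap_s).
    """
    gaps = []
    for (t_prev, b_prev), (t_next, b_next) in zip(data_frames, data_frames[1:]):
        if b_prev != b_next:
            gaps.append((b_prev, b_next, t_next - t_prev))
    return gaps


# Invented sample data, purely illustrative.
sample_frames = [{"retry": False}, {"retry": True}, {"retry": False}, {"retry": False}]
sample_data = [(0.0, "aa:aa"), (0.5, "aa:aa"), (0.75, "bb:bb"), (1.0, "bb:bb")]

print(f"retry rate: {retry_rate(sample_frames):.0%}")
for old, new, gap in roam_gaps(sample_data):
    print(f"roam {old} -> {new}: {gap * 1000:.0f} ms without data")
```

The gap between data frames is only a proxy for roaming delay, and what counts as "too long" depends on the application, so the pass/fail threshold should come from the business requirements for that device.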

Spend the day after the turn-up working onsite.  Find a comfy chair and work wirelessly for the day.  Take every device you can manage with you.  Make sure they behave as you expect them to.  See odd behavior?  End users will too.

Now, before I get a bunch of angry emails, I'm not saying validation surveys are dead. I'm saying that we need to spend more time validating the actual devices and less on a relative measurement that may or may not be indicative of how well a network performs for specific devices. Validation surveys give us a relative baseline from which to judge a network, which is valuable. They do not, however, tell us how well a network functions.

As my friend Chris Lyttle so eloquently put it: "It's a system, all things matter." Validating the RF is one thing, but testing the rest of the system is just as important.