|  
                           | 
                        
                          
                             
                                
                                  
                                    
                                        
                                          article page | 1 |
                                              2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
                                           did not count. And even above this,
                                            scheduled maintenance down time did
                                            not count toward availability. The
                                            Voice NOC had seemingly qualified
                                            away rigor in order to meet
                                            the mandated goal of five-nines.
                                            (Once again proving the rule that
                                          you get what you measure.) 
                                          Essentially
                                            the Voice NOC was not wrong, nor
                                            the Data division right. What the
                                            NOC was measuring was compliance
                                            with a customer-oriented SLA, a Service
                                            Level Agreement, and not really an
                                            availability measure. The Data division
                                            was measuring availability as “measured
                                            uptime,” or the probability
                                            the network could be used at any
                                            specific time the customer wished
                                            to use it. Today we clearly understand
                                            the difference between SLAs and Availability,
                                            and define them in separate and individually
                                            appropriate ways. So the justification
                                            session worked, and Wedge Greene
                                            henceforth started attending standards
                                          meetings as an Operations guy. 
                                             
                                           
                                         
                                      | 
                                   
                                | 
                            
                                
                                  
                                      
                                        
                                          High availability is consistently
                                            used by telecommunications vendors
                                            to describe their products, perhaps
                                            even more often than the umbrella
                                            term “carrier grade”. 
                                           | 
                                       
                                    | 
                                 
                                
                                    
                                          
                                            
                                              
                                                  
                                                    In practice, these numbers
                                                      are mostly estimates that
                                                      the manufacturer makes
                                                      about the reliability of
                                                      their equipment. Because
                                                      telecom equipment frequently
                                                      is expensive, not deployed
                                                      in statistically significant
                                                      sample set sizes, and rushed
                                                      into service as soon as
                                                      it passes laboratory tests,
                                                      the manufacturer estimates
                                                    its reliability. 
                                                    It is not clear when the
                                                      measure of reliability
                                                      of an individual network
                                                      element became the measure
                                                      of overall network availability – but
                                                      it did: customers don’t
                                                      care if one element fails,
                                                      or dozens fail. Customers
                                                      only care that the service
                                                      they are paying for and
                                                      rely on works as offered.
                                                      It is also interesting
                                                      to note that this five-nines
                                                      either transferred from
                                                      network elements to computing
                                                      systems. Today  
                                                 
                                                | 
                                             
                                           
                                    | 
                                 
                               
                                 | 
                           
                          
                             | 
                           
                          
                             
                                
                                  
                                    
                                        Hardware
                                          Origins 
                                        Let us backtrack a bit. It is likely
                                          that the origin of five-nines availability
                                          comes from design specifications for
                                          network elements. It also seems likely
                                          that the percent measure came before
                                          the current MTBF (Mean Time Between
                                          Failures) and MTTF (Mean Time To Fix)
                                          measurement, since it is a simply expressed
                                          figure and the MTTF requirements often
                                          match the % calculation while being
                                          expressed as an odd-ish number. However,
                                          99.999% is not so accurate, fundamentally
                                          when you examine it closely, because
                                          of the fuzziness of the definition
                                          of availability. So in MIL-HDBK-217
                                          and Bellcore/Telecordia TR332 they
                                          standardized these measures. The basic
                                          hardware design measures became: 
                                        
                                          MTBF
                                            - the average time between failures
                                            in hardware modules
                                            MTTR - is the
                                            time taken to repair a failed hardware
                                            module.  
                                         
                                          | 
                                   
                                | 
                             
                                
                                  
                                    
                                        
                                          computer server reliability is critical
                                            to the network availability, so it
                                            is actually convenient that both
                                            seek the same standard for describing
                                          quality. 
                                           Defining Availability 
                                          IEEE Reliability Society - Reliability
                                            Engineering subgroup defines “Reliability
                                            [as] a design engineering discipline
                                            which applies scientific knowledge
                                            to assure a product will perform
                                            its intended function for the required
                                            duration within a given environment.”   Reliability
                                            is the flip side of the availability
                                            coin. 
                                          From a common
                                              sense perspective, the meaning
                                              of availability is clear. But when
                                              you want to measure it, and then
                                              hold someone to task for delivering
                                              that availability, you must define
                                              an operational definition for
                                              it. Federal 
                                            
                                         
                                      | 
                                   
                                  
                                    article
                                          page | 1 | 2 |
                                          3 | 4 | 5 | 6 | 7 | 8 | 9 |  | 
                                   
                                | 
                           
                          |