Pages: 13 (3205 words). Published: November 21, 2012
Failure Resilience for Device Drivers
Jorrit N. Herder, Herbert Bos, Ben Gras, Philip Homburg, and Andrew S. Tanenbaum
Computer Science Dept., Vrije Universiteit, Amsterdam, The Netherlands
{jnherder, herbertb, beng, philip, ast}@cs.vu.nl

Abstract
Studies have shown that device drivers and extensions contain 3–7 times more bugs than other operating system code and thus are more likely to fail. Therefore, we present a failure-resilient operating system design that can recover from dead drivers and other critical components, primarily by monitoring and replacing malfunctioning components on the fly, transparently to applications and without user intervention. This paper focuses on the post-mortem recovery procedure. We explain the working of our defect detection mechanism, the policy-driven recovery procedure, and post-restart reintegration of the components. Furthermore, we discuss the concrete steps taken to recover from network, block device, and character device driver failures. Finally, we evaluate our design using performance measurements, software fault-injection experiments, and an analysis of the reengineering effort.

Keywords: Operating System Dependability, Failure Resilience, Device Driver Recovery.

1 INTRODUCTION

Perhaps someday software will be bug-free, but for the moment all software contains bugs and we had better learn to coexist with them. Nevertheless, a question we have posed is: “Can we build dependable systems out of unreliable, buggy components?” In particular, we address the problem of failures in device drivers and other operating system extensions. In most operating systems, such failures can disrupt normal operation. In many other areas, failure-resilient designs are common. For example, RAIDs are disk arrays that continue functioning even in the face of drive failures. ECC memories can detect and correct bit errors transparently without affecting program execution. Disks, CD-ROMs, and DVDs also contain error-correcting codes so that read errors can be corrected on the fly. The TCP protocol provides reliable data transport, even in the face of lost, misordered, or garbled packets. DNS can transparently deal with crashed root servers. Finally, init automatically respawns crashed daemons in the application layer of some UNIX variants. In all these cases, software masks the underlying failures and allows the system to continue as though no errors had occurred.

In this paper, we extend these ideas to the operating system internals. In particular, we want to tolerate and mask failures of device drivers and other extensions. Recovery from such failures is particularly important, since extensions are generally written by third parties and tend to be buggy [9, 39]. Unfortunately, recovering from driver failures is also hard, primarily because drivers are closely tied to the rest of the operating system. In addition, it is sometimes impossible to tell whether a driver crash has led to data loss. Nevertheless, we have designed an operating system consisting of multiple isolated user-mode components that are structured in such a way that the system can automatically detect and repair a broad range of defects [10, 15, 30], without affecting running processes or bothering the user. The architecture of this system is shown in Fig. 1.

[Figure 1 labels, user space: App, App, App; Process Manager (Fork Drivers, Notify on Exit); (V)FS Servers (Virtual File System, File Servers, Network Server); Reinc. Server (Monitor System, Repair Defects); Data Store (Publish Config, Backup State); Device Drivers (Untrusted Drivers, Expected to Fail). Below user space: Microkernel.]

Figure 1: Architecture of our failure-resilient operating system that can recover from malfunctioning device drivers.

In this paper, we focus on the post-mortem recovery procedure that allows the system to continue normal operation in the event of otherwise catastrophic failures. We rely on a stable set of servers to deal with untrusted components....
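To make the monitor-and-replace idea concrete, here is a minimal Python sketch of a trusted component that detects a failed driver, replaces it with a fresh instance, and continues. It is a toy, in-process model and not the paper's implementation: the names `Supervisor`, `FlakyDriver`, `DriverCrash`, and `max_restarts` are hypothetical, a raised exception stands in for a crashed process, and the transparent retry is an assumption that only suits requests that are safe to reissue.

```python
from typing import Callable, Dict, List


class DriverCrash(Exception):
    """Stands in for a crashed driver process (assumption: toy model)."""


class FlakyDriver:
    """Hypothetical driver that crashes on injected faults."""

    def __init__(self, name: str, faults: List[str]) -> None:
        self.name = name
        self._faults = faults  # shared queue of requests that trigger a crash

    def handle(self, request: str) -> str:
        if self._faults and self._faults[0] == request:
            self._faults.pop(0)  # each injected fault fires once
            raise DriverCrash(f"{self.name} died on {request!r}")
        return f"{self.name}: {request}"


class Supervisor:
    """Hypothetical trusted component that restarts failed drivers."""

    def __init__(self, max_restarts: int = 3) -> None:
        self.max_restarts = max_restarts  # simple recovery policy (assumption)
        self._factories: Dict[str, Callable[[], FlakyDriver]] = {}
        self._live: Dict[str, FlakyDriver] = {}
        self.restarts: Dict[str, int] = {}
        self.log: List[str] = []

    def register(self, name: str, factory: Callable[[], FlakyDriver]) -> None:
        self._factories[name] = factory
        self._live[name] = factory()
        self.restarts[name] = 0

    def call(self, name: str, request: str) -> str:
        """Forward a request; on a crash, replace the driver and retry."""
        while True:
            try:
                return self._live[name].handle(request)
            except DriverCrash as crash:
                self.log.append(str(crash))  # defect detected
                if self.restarts[name] >= self.max_restarts:
                    raise RuntimeError(f"{name}: restart policy exhausted") from crash
                self.restarts[name] += 1
                self._live[name] = self._factories[name]()  # fresh instance


# Usage: one injected fault is masked by a single restart.
faults = ["read block 7"]
sup = Supervisor(max_restarts=2)
sup.register("disk", lambda: FlakyDriver("disk", faults))
result = sup.call("disk", "read block 7")
print(result, "| restarts:", sup.restarts["disk"])
```

The caller of `call` never sees the first failure, which mirrors the goal of recovery that is transparent to applications. A real system would have to decide per driver class whether a retry is safe, since a crash may have lost data; the sketch sidesteps that by giving up once the restart budget is spent.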