1-Introduction

Video created from my 734 (http://www.cs.unc.edu/~dewan/734/current/) Fall 13 lecture . See https://mix.office.com/watch/ak1oye6lv5qe for a downloadable slide-based structured "video", http://www.cs.unc.edu/~dewan/734/current/lectures/1-Introduction.pptx for the original PPT and http://www.cs.unc.edu/~dewan/734/current/lectures/1-Introduction.pdf for the pdf version

1.0x

1-Introduction

Created 2 years ago

Duration 1:49:21
lesson view count 24
Video created from my 734 (http://www.cs.unc.edu/~dewan/734/current/) Fall 13 lecture . See https://mix.office.com/watch/ak1oye6lv5qe for a downloadable slide-based structured "video", http://www.cs.unc.edu/~dewan/734/current/lectures/1-Introduction.pptx for the original PPT and http://www.cs.unc.edu/~dewan/734/current/lectures/1-Introduction.pdf for the pdf version
Select the file type you wish to download
Slide Content
  1. Distributed Systems

    Slide 1 - Distributed Systems

    • Instructor: Prasun Dewan (FB 150, dewan@unc.edu)
  2. Course Home Page

    Slide 2 - Course Home Page

    • http://www.cs.unc.edu/~dewan/734/current/index.html
  3. Lectures and Assignments

    Slide 3 - Lectures and Assignments

    • Current assignment is on the web - start working ASAP on it
    • No book
    • PPT slides and sometimes word doc
    • Outline of other assignments given
  4. Software

    Slide 4 - Software

    • Software to be continuously updated
  5. Grade Distribution

    Slide 5 - Grade Distribution

    • Exams (Two midterms, no final)
    • 40%
    • Assignments (Home work)
    • 60%
    • Fudge Factor (Class participation, other factors)
    • 10%
  6. Getting Help

    Slide 6 - Getting Help

    • Can discuss solutions with each other at a high level
    • Not at the code level
    • Sharing of code is honor code violation
    • Can help each other with debugging as long as it does not lead to code sharing
    • Assignments may contain solution in English (read only if stuck)
  7. Piazza

    Slide 7 - Piazza

  8. Distributed Program?

    Slide 8 - Distributed Program?

    • A program “involving” multiple computers
    • Specific computers must be bound at run time
    •  Program can run on a single computer
    • Definition involves processes
  9. Program vs. Process vs. Thread

    Slide 9 - Program vs. Process vs. Thread

    • Program
    • Process
    • Execution instance
    • Thread
    • Thread
    • Process is execution instance of program, associated with program and memory
    • Same program can result in multiple processes
    • Thread is also an independent activity, but within a process, associated with a process and a stack
    • Processes are independent activities that can interleave or execute concurrently
  10. Distribution of Processes/Threads

    Slide 10 - Distribution of Processes/Threads

    • Process
    • Thread
    • Thread
    • Process
    • Thread
    • Thread
    • Process
    • Thread
    • Thread
    • Different processes can execute on different (distributed) computers
    • A single process executes on one machine
  11. Distributed Program

    Slide 11 - Distributed Program

    • Process
    • Thread
    • Thread
    • Process
    • Thread
    • Thread
    • Connection
    • Execution instance
    • Execution instance
    • Connected process pair : Some computation of a process can be influenced by or influence computation of the other process
    • Connected process group: each process is coupled to at least one other process in the group
    • Graph crated by creating pair-wise dependency links is not partitioned– every node reachable from every other node
  12. Logical vs. Physical Inter Process Connection Links

    Slide 12 - Logical vs. Physical Inter Process Connection Links

    • Process
    • Thread
    • Thread
    • Process
    • Thread
    • Thread
    • Logical Connection
    • Process
    • Thread
    • Thread
    • Physical & Logical Connection
    • Physical m & Logical Connection
    • Physical coupling links are physical inter process communication links along which information flows in the network
    • Logical links indicate computational dependencies
    • Relayer
    • Can have logical links without physical links
    • Physical links usually imply logical links
  13. Distributed Applications

    Slide 13 - Distributed Applications

    • Distributed applications?
    • Non distributed applications?
    • In today’s world, what is or should not be distributed?
  14. Some Distributed Domains

    Slide 14 - Some Distributed Domains

    • Distributed Repositories (Files, Databases)
    • Remotely Accessible Services (Printers, Desktops)
    • Collaborative Applications (Games, Shared Desktops)
    • Distributed Sensing (Disaster Prediction)
    • Computation Distribution (e.g. Simulations)
    • Full courses on some of these areas, with concepts specific to them (Distributed Databases, Collaborative Applications)
    • Will look at domain-independent concepts at the intersection of them
    • Will not take an application-centric view
    • Fundamental Issues?
  15. Distribution vs. Concurrency Program

    Slide 15 - Distribution vs. Concurrency Program

    • Process
    • Process
    • Connection
    • Process
    • Thread
    • Thread
    • Process
    • Thread
    • Thread
    • Process
    • Thread
    • Thread
    • Connection
    • Distribution, no fine-grained concurrency
    • Distribution and fine-grained concurrency (typical)
    • Concurrency, not distribution
  16. Non-Distributed vs. Distributed Program

    Slide 16 - Non-Distributed vs. Distributed Program

    • Creates a single process logically and physically unconnected to any other process
    • Creates a pair or larger group of connected processes
    • Must deal with sequential and possibly concurrency issues
    • Must also deal with distribution and usually concurrency issues
    • Non-Distributed
    • Distributed
  17. Systems ViewPoint

    Slide 17 - Systems ViewPoint

    • Operating System
    • Query Language, Transactions, …
    • Database Management System
    • Programming Languages
    • System
    • Distributed Systems
    • Computer abstractions to implement some class of programs
    • Processes, Files, Memory Management , Threads…,
    • Arrays, Loops, Classes, …
    • Data Communication, Remote Procedure Call (RPC), …
    • RPC assumes communication consists of procedure requests and return value responses
    • Byte/object communication consists of byte/object of exchange
  18. Distributed Systems

    Slide 18 - Distributed Systems

    • Study of design and/or implementation of computer abstractions for developing distributed programs
    • Why distributed systems?
    • Why systems?
    • Alternatives to understand how to program some domain of applications?
    • Non distributed programs?
  19. Alternatives to Understanding

    Slide 19 - Alternatives to Understanding

    • Programming: Use of a specific set of non distributed abstractions (e.g. , functional, MATLAB programming)
    • Distributed Programming : Use of a set of distributed abstractions (e.g. Socket/RPC Programming)
    • Design and implementation of non distribution abstractions (Object-Oriented vs. Functional Languages, Compilers/Interpreters)
    • Design and implementation of distributed system abstractions (e.g. Data Communication /RPC Design and/or Implementation)
    • Non distributed model and algorithms ( Turing Machines, HeapSort,)
    • Distributed Models and Algorithms(e.g. 2-Phase commit, Group Comm. Model)
    • Programming: Abstraction use
    • Systems: Abstraction design and/or implementation
    • Theory: Models and algorithms
  20. Rationale

    Slide 20 - Rationale

    • Abstraction design linked to implementation: Designs are done of only efficiently implementable abstractions
    • Abstractions are implemented operational models and have (the more) practical algorithms in them
    • Maturity with design and implementation issues allows you to better understand the semantics of a specific abstraction.
    • Abstraction Design vs. Implementation
    • Abstractions vs. Theory (Models, Algorithms)
    • Abstraction Design & Implementation vs. Use
    • Abstract implementations require advanced programming/ software engineering techniques– “you cant really program if you have not written a compiler”
  21. Teaching Abstraction Design & Implementation?

    Slide 21 - Teaching Abstraction Design & Implementation?

    • Lectures address design; assignments, implementation (e.g. Implement a PL interpreter in another PL)
    • Lectures give high-level pseudo code for complex algorithms; assignments full implementation (e.g. compilers)
    • Lectures discuss code for a system of abstractions : assignments extend/modify this code
    • Implementations can be complex and need instruction
    • Pain/gain ratio high, semester barely enough time for compiler
    • Code must be understandable and ideally also elegant
  22. The Xinu Approach to Teaching OS

    Slide 22 - The Xinu Approach to Teaching OS

    • Thread Management
    • Thread Synchronization
    • Thread Communication
    • Interrupt Management
    • Layering
    • Reuse of previous layers keeps code short (and hence presentable in class)
    • Can unravel a system in stages to a class
    • Layering good for software engineering as well as pedagogical reasons
    • Approach not used in distributed computing
    • Need distributed system layers
  23. Layers Exist in Networking

    Slide 23 - Layers Exist in Networking

    • Physical Communication
    • Link-Level Communication
    • IP
    • UDP
    • TCP/IP
    • Physical communication in networking involves machines and used hardware machine addresses
    • Physical communication in distributed systems is between processes and indicates routing of information among processes
  24. Distributed vs. Network Layers

    Slide 24 - Distributed vs. Network Layers

    • Process
    • Thread
    • Thread
    • Process
    • Thread
    • Thread
    • Logical Connection
    • Process
    • Thread
    • Thread
    • Physical Connection
    • Physical Connection
    • Networking addresses physical connections and byte communication among processes
    • No separate logical connections, object communication, synchronization, fault tolerance
    • Low-level (hidden from programmers) abstractions
  25. Distributed  System vs. Networking Abstractions

    Slide 25 - Distributed System vs. Networking Abstractions

    • Networked Abstractions
    • Distributed Abstractions
    • Assembly Language Abstractions
    • Programming Language Abstractions
    • OS Byte Communication API
    • Distributed Abstractions
    • Just as programming language abstractions are built on top of assembly language abstractions
    • Distributed system abstractions are built on top of networked abstractions
    • Byte communication APIs, close to networked abstractions, is provided by operating systems (e.g. sockets), which hide networking abstractions
    • Knowledge of assembly/networked abstractions important to implement PL/distributed abstractions
  26. Domain Independent?

    Slide 26 - Domain Independent?

    • Distributed Repositories (Files, Databases)
    • Remotely Accessible Services (Printers, Desktops)
    • Collaborative Applications (Games, Shared Desktops)
    • Distributed Sensing (Disaster Prediction)
    • Computation Distribution (e.g. Simulations)
    • Will look at domain-independent concepts at the intersection of them
    • OS Byte Communication API
    • Distributed Abstractions
    • Even though OS abstractions developed to build distributed OS (file systems), they are by definition domain-independent
  27. Language vs. OS Abstractions

    Slide 27 - Language vs. OS Abstractions

    • Both operating systems and programming languages provide domain-independent abstractions
    • Operating systems support processes and language-independent abstractions for accessing protected info and sharing information among processes (files, IPC)
    • Programming languages must provide fine-grained abstractions needed within a process
    • They also provide an interface to OS abstractions through libraries or language constructs
    • They can also extend the OS abstractions (e.g. typed files)
  28. Language vs. OS,  Distributed Abstractions

    Slide 28 - Language vs. OS, Distributed Abstractions

    • Byte communication is all that operating systems provide
    • Non distributed programming languages such as C provide only OS abstractions
    • Distributed programming languages such as Java provide a richer variety of abstractions
    • Will use Java as implementation language
    • Java provides threads and reflection, making it easy to implement our own replacements and extensions of Java abstractions
    • To extend and replace Java abstractions/layers, knowledge of them useful
  29. Java Abstractions

    Slide 29 - Java Abstractions

    • Blocking byte communication (Sockets)
    • Blocking stream object communication (Object Stream)
    • Remote procedure call (RMI)
    • Non blocking byte communication (NIO)
  30. Java Layers

    Slide 30 - Java Layers

    • Go beyond Java layers?
    • OS Byte Communication
    • Blocking byte communication (Sockets)
    • Blocking stream object communication (Object Stream)
    • Non blocking byte communication (NIO)
    • Remote procedure call (RMI)
  31. Beyond Java Layers

    Slide 31 - Beyond Java Layers

    • Implementation
    • OS Byte Communication
    • Blocking byte communication (Sockets)
    • Blocking stream object communication (Object Stream)
    • Non blocking byte communication (NIO)
    • Remote procedure call (RMI)
    • Sync Replicated Objects
  32. GIPC: Improved Abstractions and Layers with Open Source

    Slide 32 - GIPC: Improved Abstractions and Layers with Open Source

    • Pair-wise Byte, and Object Communication,
    • Pairwise RPC
    • Pair-wise Synchronization
    • Group Communication and RPC
    • Group Synchronization
    • Scalability
    • Fault Tolerance
    • GIPC layers will be replaced, augmented with assignment layers
  33. Course Plan Principle

    Slide 33 - Course Plan Principle

    • Cover material for next assignment (and other relevant material)
    • Do next assignment
    • Lectures
    • Assignments
    • Boundary conditions?
  34. Use Non Blocking I/O

    Slide 34 - Use Non Blocking I/O

    • Java NIO
    • Use NIO Tutorials and class lectures to Implement a Distributed Program
    • Lectures
    • Assignments
    • Distributed Non Blocking Simulation
    • Existing non distributed simulation
    • Java NIO
  35. Halloween Simulation

    Slide 35 - Halloween Simulation

    • Make Beau Anderson’s 401 Halloween implementation distributed
  36. Use RMI

    Slide 36 - Use RMI

    • Java RMI
    • Use Java Tutorials and class lectures to Implement an RMI-based Distributed Program
    • Lectures
    • Assignments
    • Distributed RMI-based Simulation
    • Existing non distributed simulation
    • RMI
  37. Use Sync Replicated Objects

    Slide 37 - Use Sync Replicated Objects

    • Java Sync
    • Use class lectures and Sync to Implement an RMI-based Distributed Program
    • Lectures
    • Assignments
    • Replicated Simulation
    • Existing non distributed simulation
    • Sync
  38. Blocking vs. Non Blocking Sockets

    Slide 38 - Blocking vs. Non Blocking Sockets

    • Understand the differences between blocking and non blocking communication
    • Lectures
    • Assignments
    • High-level buffer communication
    • Java (Non Blocking) Channels
    • High-level buffer communication
    • Java (Blocking) Sockets
  39. Recursion and Serialization

    Slide 39 - Recursion and Serialization

    • Understand serialization and really understand recursion
    • Lectures
    • Assignments
    • High-level object communication
    • High-level Buffer comm.
    • Java Object Streams
    • High-level object communication
    • High-level Buffer comm.
    • Custom Object Serialization
  40. Synchronization and RPC

    Slide 40 - Synchronization and RPC

    • Lectures
    • Assignments
    • Remote Procedure Call
    • High-level Object comm.
    • High-level Object Communication with Synchronization
    • Java Thread Synchronization
    • Remote Procedure Call
    • High-level Object comm.
    • Java Thread Synchronization
    • Use and implement pairwise synchronization
  41. Group Communication and Fault Tolerance

    Slide 41 - Group Communication and Fault Tolerance

    • Lectures
    • Assignments
    • High-level Object comm.
    • Group Communication
    • Fault Tolerance
    • RPC
    • High-level Object comm.
    • More Functional Group Communication
    • More Efficient Fault Tolerance
    • RPC
    • Use and implement group synchronization and fault tolerance and group communication
  42. Last Phase

    Slide 42 - Last Phase

    • Lectures
    • Assignments
    • High-level Object comm.
    • More Functional Group Communication
    • More Efficient Fault Tolerance
    • RPC
    • Transactions?, Distributed Hashtables?, Multiprocessor systems?, ….
  43. Objectives

    Slide 43 - Objectives

    • At the end of the course you will …..
  44. Distributed Computing

    Slide 44 - Distributed Computing

    • Distributed Repositories (Files, Databases)
    • Remotely Accessible Services (Printers, Desktops)
    • Collaborative Applications (Games, Shared Desktops)
    • Distributed Sensing (Disaster Prediction)
    • Computation Distribution (e.g. Simulations)
    • Internet/Cloud computing increasing relevance of the fundamental concepts
  45. Practical Relevance

    Slide 45 - Practical Relevance

    • For distributed applications, likely to use the code you implemented than existing abstractions
    • Existing Java RPC does not work on Android devices, but the one you implement will
    • Can send objects over NIO socket channels
    • Will implement many abstractions not part of standard Java
    • Use Sync, which apparently is the basis of some new Mobile platforms
  46. Software Engineering Principles

    Slide 46 - Software Engineering Principles

    • Interfaces
    • Factories and Abstract factories
    • Existing classes will be used, inherited but not modified directly
    • Classes
    • Alternative implementations will create new classes implementing existing interfaces
    • These will allow easy switching between different implementations
    • Generics
    • Implementation rather than use of generics to unite buffer and object communication
    • Will be both a distributed computing and software engineering course
  47. Relevance to OS

    Slide 47 - Relevance to OS

    • Inter-process communication key to design of new OS’s, even non distributed OS
    • Extensive use of bounded buffers
    • Will study and use thread synchronization in depth
    • Will gain understanding of fundamental OS concepts except memory management
    • Will study how distributed OS are implemented
  48. Introduction to Systems

    Slide 48 - Introduction to Systems

    • Design and implementation of non distribution abstractions (Object-Oriented vs. Functional Languages, Compilers/Interpreters)
    • Design and implementation of distributed system abstractions (e.g. Data Communication /RPC Design and/or Implementation)
    • Systems: Abstraction design and implementation
    • Distributed systems covers concepts from many fields
  49. Extra Slides

    Slide 49 - Extra Slides

  50. Alternative Java Layers

    Slide 50 - Alternative Java Layers

    • OS Byte Communication
    • Blocking byte communication (Sockets)
    • Blocking stream object communication (Object Stream)
    • Remote procedure call (RMI)
    • Non blocking byte communication (NIO)
    • Remote procedure call
    • Could have more efficient RPC and non blocking object communication
    • Non blocking object communication
    • Two RPC’s?
  51. Improved Alternative Java Layers

    Slide 51 - Improved Alternative Java Layers

    • OS Byte Communication
    • Blocking byte communication (Sockets)
    • Blocking stream object communication (Object Stream)
    • Non blocking byte communication (NIO)
    • Non blocking object communication
    • Could do late binding between RPC and lower-level communication
    • Go beyond Java abstractions?
    • Socket communication is low level
    • NIO is even lower level
    • Programmers rely on usage patterns
    • Cannot unite NIO and socket at byte or object level
    • Remote procedure call
  52. Pattern vs. Abstraction (1-Computer Programming)

    Slide 52 - Pattern vs. Abstraction (1-Computer Programming)

    • public final static int RED = 0;
    • public final static int BLUE = 1;
    • public final static int GREEN = 2;
    • int color = RED;
    • public enum Color {RED, BLUE, GREEN};
    • Color color = Color. RED;
    • public final static int LIKE= 0;
    • public final static int DISLIKE = 1;
    • public final static int NEUTRAL = 2;
    • int response = NEUTRAL;
    • public enum Response {LIKE, DISLIKE, NEUTRAK};
    • Response response = Response.NEUTRAL;
    • Pattern
    • Abstraction
  53. Java NIOTutorial for Echo Server

    Slide 53 - Java NIOTutorial for Echo Server

    • public void run() {    while (true) {      try {        // Process any pending changes        synchronized(this.changeRequests) {          Iterator changes = this.changeRequests.iterator();          while (changes.hasNext()) {            ChangeRequest change = (ChangeRequest) changes.next();            switch(change.type) {            case ChangeRequest.CHANGEOPS:              SelectionKey key = change.socket.keyFor(this.selector);              key.interestOps(change.ops);            }          }          this.changeRequests.clear();         }        …..          }
    • :
    • NioServer.java
    • EchoWorker.java
    • ServerDataEvent.java
    • ChangeRequest.java
    • NioClient.java
    • RspHandler.java
    • Vast majority of tutorial readers will copy and edit this pattern
    • Much better to identify a corresponding abstraction and implement it to understand channels
  54. Problem with Java Abstraction Level

    Slide 54 - Problem with Java Abstraction Level

    • OS Byte Communication
    • Blocking byte communication (Sockets)
    • Remote procedure call
    • Non blocking byte communication (NIO)
    • Non blocking object communication
    • Blocking stream object communication (Object Stream)
    • Socket communication is low level
    • NIO is even lower level
    • Programmers rely on usage patterns
    • Cannot unite NIO and socket at byte or object level
    • New picture?
  55. Improved Alternative Java Abstractions and Layers

    Slide 55 - Improved Alternative Java Abstractions and Layers

    • OS Byte Communication
    • Blocking byte communication (Sockets)
    • Remote procedure call
    • Non blocking byte communication (NIO)
    • Can be bound to either lower level layer
    • Alternative high level layer to socket and NIO based byte communication
    • New Object Communication
    • New Byte Communication
    • Picture complete? More abstractions and layers?
    • Top down vs. bottom up view point
    • Design and implementation challenge
  56. Improved Alternative Java Abstractions and Layers

    Slide 56 - Improved Alternative Java Abstractions and Layers

    • OS Byte Communication
    • Blocking byte communication (Sockets)
    • Remote procedure call
    • Non blocking byte communication (NIO)
    • New Object Communication
    • New Byte Communication
    • Link setup and communication: How to create a group of physically/logically connected processes and communicate informing along these links?
    • Distributed fault tolerance: How to recover when one end of the link goes down but the other does not?
    • Process synchronization: How to block a (thread in a) process until the information it needs to proceed is received from a (thread in a) remote process
    • Scalability: How to allow group size to increase without degrading performance?
    • No direct support for group setup and communication, scalability, fault tolerance, and any process synchronization
  57. New Abstractions: Design Challenge

    Slide 57 - New Abstractions: Design Challenge

    • Pair-wise Byte, and Object Communication,
    • Pair-wise RPC
    • Pair-wise Synchronization
    • Group Communication and RPC
    • Group Synchronization
    • Scalability
    • Fault Tolerance
    • Layering?
  58. Distribution Issues

    Slide 58 - Distribution Issues

    • Process
    • Thread
    • Thread
    • Process
    • Thread
    • Thread
    • Logical Connection
    • Process
    • Thread
    • Thread
    • Physical & Logical Connection
    • Physical m & Logical Connection
    • Relayer
    • Link setup and communication: How to create a group of physically/logically connected processes and communicate informing along these links?
    • Distributed fault tolerance: How to recover when one end of the link goes down but the other does not?
    • Process synchronization: How to block a (thread in a) process until the information it needs to proceed is received from a (thread in a) remote process
    • Thread synchronization: How to block a thread until a condition for proceeding is enabled by a local thread
    • Scalability: How to allow group size to increase without degrading performance?