- Cloud services have specific design considerations: Always on, distributed state, large scale, and failure handling.
- Azure is an OS for the cloud: scale out, dynamic and on demand
- Azure manages services not just servers: tell it what you want and it will help automate the details
- Frees developers from many platform issues: allows concentrating on logic instead of platform
This session will show us how this is done.
Characteristics of cloud computing
- Scale out not up
- Add and remove capacity on demand
- Pay for what you use as you go
- Automation is key to reducing costs
Design considerations
- Failure of any node is expected: each node is a cache; state must be replicated
- No one-time install step: apps need to reinitialize on restarts, don't assume previous local state is available
- Configuration changes due to load or failures: handle dynamic configuration changes
- Services are always running: rolling upgrades/downgrades; must handle data schema changes
- Services are built on multiple nodes / roles: document service architecture and communication paths
- Services can grow very large: careful state management at scale is needed
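The "no one-time install step" point can be illustrated with a minimal sketch, assuming a node-local cache directory and a hypothetical `fetch_from_durable_store` callable standing in for replicated storage. Neither name comes from the session; the point is only that startup rebuilds local state whenever it is missing.

```python
import os
import tempfile


def ensure_local_state(cache_dir, fetch_from_durable_store):
    """Return the cached data, rebuilding it if the local copy is gone.

    Safe to call on every start: it never assumes an earlier run left
    anything behind on this node.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, "reference-data.txt")
    if not os.path.exists(cache_file):
        # Local disk is only a cache; the durable store is the source of truth.
        with open(cache_file, "w", encoding="utf-8") as f:
            f.write(fetch_from_durable_store())
    with open(cache_file, encoding="utf-8") as f:
        return f.read()


def fetch_stub():
    # Hypothetical stand-in for a read from replicated storage.
    return "reference data v1"


with tempfile.TemporaryDirectory() as node_disk:
    cache_dir = os.path.join(node_disk, "cache")
    first_start = ensure_local_state(cache_dir, fetch_stub)  # cold start
    restart = ensure_local_state(cache_dir, fetch_stub)      # warm restart

# A replacement node starts with an empty disk and must still come up.
with tempfile.TemporaryDirectory() as new_node_disk:
    on_new_node = ensure_local_state(os.path.join(new_node_disk, "cache"), fetch_stub)
```

The same function runs on first start, on restart, and on a freshly allocated node, so there is no separate install path to keep in sync.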
Benefits of adhering to a Windows Azure design point
- Azure manages services not just servers: tell it what you want and it automates; system manages nodes, services and network
- Automates service life-cycle management: MDA, allocation, deployment and SLA
- Turns pool of physical resources into a shared fabric: pay for what you use; platform ensures isolation
Azure Service Lifecycle
- Coding and modeling
- Provisioning
- Deployment
- Maintain goal state
Service Model Guides Automation
- Describes service as distributed entities: authored by service developer, and configured by service developer
- Logical description of the services: same model used for testing and prod, mapped to hardware at deployment time
- Powerful declarative composition language
Azure Service Model Elements
- Service
- Role
- Group
- Endpoint
- Channel
- Interface
- Configuration settings
Fault Domains
- Purpose: avoid single points of failure
- Unit of failure: ex: compute node, rack of machines
- System considers fault domains when allocating service roles: ex: don't put all roles in the same rack
- Service owner assigns number required by each role: ex: 10 front-ends across 2 fault domains
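The "10 front-ends across 2 fault domains" example can be sketched as a simple round-robin placement. This is an illustrative assumption about how instances might be spread, not the fabric controller's actual allocation algorithm; the function name is hypothetical.

```python
from collections import Counter


def spread_across_fault_domains(instance_count, fault_domain_count):
    """Assign each instance index to a fault domain, round-robin."""
    if instance_count < 0 or fault_domain_count < 1:
        raise ValueError("need a non-negative instance count and at least one fault domain")
    return [i % fault_domain_count for i in range(instance_count)]


# 10 front-ends across 2 fault domains, as in the example above.
placement = spread_across_fault_domains(10, 2)
per_domain = dict(Counter(placement))

# If fault domain 0 (say, one rack) fails, these instances are still up.
survivors = [i for i, domain in enumerate(placement) if domain != 0]
```

With an even spread, losing one of the two fault domains leaves half of the front-ends running instead of none.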
Update Domains
- Purpose: ensure service stays up while updating
- Unit of software/configuration update: ex: set of nodes to update
- Used when rolling forward or backward
- Developer assigns # required by each role
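A rolling update over update domains can be sketched as follows. The instance records and the `rolling_upgrade` helper are hypothetical; the sketch only assumes that one update domain is taken out at a time while the others keep serving.

```python
def rolling_upgrade(instances, new_version):
    """Upgrade one update domain at a time.

    Returns the lowest number of instances that were still serving
    at any point during the roll.
    """
    domains = sorted({inst["update_domain"] for inst in instances})
    lowest_serving = len(instances)
    for domain in domains:
        batch = [inst for inst in instances if inst["update_domain"] == domain]
        # Everything outside the current update domain keeps serving.
        lowest_serving = min(lowest_serving, len(instances) - len(batch))
        for inst in batch:
            inst["version"] = new_version
    return lowest_serving


# Six instances of one role spread over three update domains.
instances = [
    {"name": f"web-{i}", "update_domain": i % 3, "version": "v1"}
    for i in range(6)
]
lowest_during_upgrade = rolling_upgrade(instances, "v2")
versions_after_upgrade = {inst["version"] for inst in instances}

# Rolling backward is the same walk with the previous version.
lowest_during_rollback = rolling_upgrade(instances, "v1")
versions_after_rollback = {inst["version"] for inst in instances}
```

More update domains mean smaller batches and more capacity left online during each step, at the cost of a longer roll.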
Dynamic Configuration Settings
- Purpose is to communicate settings to service roles; there is no registry for services
- Application configuration settings: declared by the developer; set by the deployer
- System configuration settings: pre-declared, same kinds for all roles (instance id, fault domain id, update domain id); assigned by the system
- In both cases, settings are available at run-time via callbacks when values change
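The callback pattern described above can be sketched in a few lines. The `ConfigSettings` class and the setting name are hypothetical and are not the Azure runtime API; they only show a role reacting to a changed value without a restart.

```python
class ConfigSettings:
    """Holds named settings and notifies listeners when a value changes."""

    def __init__(self, initial):
        self._values = dict(initial)
        self._callbacks = []

    def get(self, name):
        return self._values[name]

    def on_change(self, callback):
        # callback(name, old_value, new_value) runs after each real change.
        self._callbacks.append(callback)

    def update(self, name, value):
        old = self._values.get(name)
        if old == value:
            return  # nothing changed, so nobody is notified
        self._values[name] = value
        for callback in self._callbacks:
            callback(name, old, value)


settings = ConfigSettings({"MaxQueueLength": "100"})
seen_changes = []
settings.on_change(lambda name, old, new: seen_changes.append((name, old, new)))

settings.update("MaxQueueLength", "250")  # deployer changes a value
settings.update("MaxQueueLength", "250")  # same value again: no callback
```

Reading the value through `get` each time, or reacting in the callback, avoids caching a setting that may change while the service is running.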
Azure Automation
- Fabric controller: maps declarative service specs to available resources; manages service life cycle starting from the bare metal; maintains system health and SLA
- What's special about it?: MDA
Azure Push-Button Deployment
- Allocate nodes, across fault domains, update domains
- Place OS and role images on nodes
- Configure settings
- Start Roles
- Configure load balancers
- Maintain desired # of roles: failed roles automatically restarted; node failure results in new nodes automatically allocated
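Maintaining the desired number of roles can be viewed as a reconciliation step that compares the goal state with what is actually healthy. The sketch below is a simplified assumption, with a hypothetical `allocate` callable in place of real node allocation; it is not the fabric controller's implementation.

```python
import itertools


def reconcile(desired_count, instances, allocate):
    """Return the instance list after moving toward the goal state."""
    healthy = [inst for inst in instances if inst["healthy"]]
    # Replace failed instances, or add capacity, until the goal is met.
    while len(healthy) < desired_count:
        healthy.append(allocate())
    # Remove capacity if the goal was lowered.
    return healthy[:desired_count]


_next_id = itertools.count(100)


def allocate_stub():
    # Hypothetical stand-in for allocating a node and starting a role on it.
    return {"id": next(_next_id), "healthy": True}


running = [
    {"id": 1, "healthy": True},
    {"id": 2, "healthy": False},  # failed role instance
    {"id": 3, "healthy": True},
]
after_failure = reconcile(3, running, allocate_stub)
scaled_up = reconcile(5, after_failure, allocate_stub)
scaled_down = reconcile(2, scaled_up, allocate_stub)
```

The same step covers recovery, adding capacity, and removing capacity: only the desired count changes.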
Managing running services
- Adding capacity: push-button; deployment steps repeated against the running service
- Removing capacity: push-button; steps reversed
- Rolling service upgrades: push-button; iterative
Rapid reliable software provisioning
- Image-based multicast deployment (scalable and reliable)
- Separate OS and service images: images copied, not installed; same images used for physical machines and VMs
- Multiple images are cached
Monitoring and Events
- Log collection
- Alerts
- Usage metering
- Data available through portal
Service Isolation and Security
- Your services are isolated from other services: service model is the boundary of isolation; local node resources are temp storage; network end-points
- Isolation using multiple mechanisms
- Automatic application of Windows security patches (rolling OS image upgrades)
Azure is Highly Available
- Network has redundancy: switches, load balancers, access routers
- Services deployed across fault domains (load balancers route to active nodes only)
- Fabric controller state checkpointed: can roll back to previous checkpoints
Azure automates
- Provisioning and monitoring of hardware
- Hardware life cycle management
- Capacity planning
- Internal security measures
Roadmap
- PDC: automated service deployment; subset of service model - simple set of service templates; can change # of instances; simple upgrades / downgrades; automated service failure and recovery; hardware management; managed code / ASP.NET; run in fixed-size VM instances; external virtual IP address per service; service network isolation enforcement
- 2009: expose more of underlying service model, native code; multiple data centers