Change Control
Comments
Since this topic has restricted write access, you can use this form to easily post a comment :-
You can see previous comments from other people in ChangeControlComments.
All comments on this policy are very welcome and will be addressed individually. Replies will probably be off-line with solutions merged into this policy. The History section will summarise what is actually changed.
New Change Note
Create new Change Note - Please read through this Policy as you fill out the template.
Existing notes are shown here: ChangeNotes
Policy
ARCS provides a number of production services to researchers and developers. These need to be managed professionally with all changes planned carefully, considering the impact on end-users or other services.
ARCS teams (Systems, Data, Collaboration and Authorisation) will be responsible for documenting ALL changes to production services.
Change Note
A change note will be required for planned changes to ARCS infrastructure such as installation, commissioning, upgrades and decommissioning of systems and services. This should include :-
Description - Describe the reason for the planned change. Examples include:
- Installation of a new system - description of the system and the services it will provide
- Upgrade to an existing system - describe the changes, eg. does it resolve existing issues?
- Decommissioning of a system - detail why the system is being decommissioned
Proposed Date - Indicate when the change is to occur. Make sure that there is enough time for review and notification unless it is an urgent change that needs to be implemented immediately. Non-urgent changes should be planned at least 1 week in advance.
Estimated Duration - Indicate how long the affected systems or services are expected to be unavailable. This duration should include ample time for implementation and testing of the proposed change.
Systems/Services Affected - Detail all systems and services that will be unavailable during the change period and indicate alternate resources that may be used during this period.
This should also provide an explanation to help users of our services understand how it will directly affect them.
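As a minimal sketch (not part of the wiki template itself), a few of the fields above and the one-week planning rule from Proposed Date could be captured in Python as follows. The class name, field names and the `lead_time_ok` helper are hypothetical, and treating exactly seven days as enough notice is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# The policy asks for non-urgent changes to be planned at least 1 week ahead.
MIN_LEAD_TIME = timedelta(weeks=1)


@dataclass
class ChangeNote:
    """Hypothetical subset of the Change Note fields described above."""
    description: str
    proposed_date: date
    estimated_duration_hours: float
    services_affected: list[str]
    urgent: bool = False


def lead_time_ok(note: ChangeNote, today: date) -> bool:
    """Return True if the note leaves enough time for review and notification.

    Urgent changes are exempt; all others need at least MIN_LEAD_TIME
    between today and the proposed date (assumption: exactly 7 days passes).
    """
    if note.urgent:
        return True
    return note.proposed_date - today >= MIN_LEAD_TIME
```

For example, a note proposed for 14/07/08 passes the check on 01/07/08 but fails it on 10/07/08 unless it is marked urgent.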
Level of Impact
The Level of Impact is described as a combination of expected downtime and the number of systems OR users affected. Each of the two factors has four levels of severity (1-4 for downtime, A-D for number of systems/users).
The levels are:
| 1 | no downtime |
| 2 | downtime less than 2 hours |
| 3 | downtime more than 2 hours but less than 24 hours |
| 4 | downtime more than 24 hours |
| A | single service at one site OR fewer than 10 people/users affected |
| B | single service at multiple sites OR minor loss of functionality affecting more than 10 people |
| C | multiple services at one site OR significant loss of functionality affecting fewer than 10 people |
| D | multiple services at multiple sites OR significant loss of functionality affecting more than 10 people |
These two factors can be combined in a severity matrix, which describes each combination as Low, Medium, High, Severe or Critical. This value will be used when announcing downtime, including on the GOC. A Critical combination should be avoided where possible (eg. by doing out-of-hours maintenance).
| | A | B | C | D |
| 1 | Low | Low | Medium | High |
| 2 | Low | Medium | High | Severe |
| 3 | Medium | High | Severe | Severe |
| 4 | High | Severe | Severe | Critical |
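The lookup below is a minimal Python sketch of the matrix above, which could be used to check the Level of Impact stated in a Change Note. The function names are hypothetical, and because the policy does not say where exactly 2 or exactly 24 hours of downtime fall, the sketch assumes both belong to level 3.

```python
# Severity matrix from the table above: rows are downtime levels 1-4,
# columns are system/user levels A-D.
SEVERITY = {
    1: {"A": "Low", "B": "Low", "C": "Medium", "D": "High"},
    2: {"A": "Low", "B": "Medium", "C": "High", "D": "Severe"},
    3: {"A": "Medium", "B": "High", "C": "Severe", "D": "Severe"},
    4: {"A": "High", "B": "Severe", "C": "Severe", "D": "Critical"},
}


def downtime_level(hours: float) -> int:
    """Map expected downtime in hours to level 1-4.

    Assumption: exactly 2 and exactly 24 hours are treated as level 3.
    """
    if hours <= 0:
        return 1
    if hours < 2:
        return 2
    if hours <= 24:
        return 3
    return 4


def level_of_impact(hours: float, scope: str) -> tuple[str, str]:
    """Return the impact code (eg. "3C") and its severity label."""
    scope = scope.upper()
    if len(scope) != 1 or scope not in "ABCD":
        raise ValueError(f"scope must be one of A-D, got {scope!r}")
    level = downtime_level(hours)
    return f"{level}{scope}", SEVERITY[level][scope]
```

For a 12-hour power test affecting multiple services at one site, `level_of_impact(12, "C")` gives `("3C", "Severe")`.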
Examples:
- A complete update of all Data Fabric nodes (less than 2 hours, one service at multiple sites): 2B, Medium
- Complete power test at one site (more than 2 but less than 24 hours, multiple services at one site): 3C, Severe
- Sakai upgrade (one service, less than 2 hours, lots of users): 2D, Severe
Staff Responsible - List the people who are performing the change and their contact details.
Detailed Instructions - Explain exactly what will be done - this may link to other Wiki topics, attachments or pages within ARCS Trac projects.
Testing Procedures - Detail the test plan that will be performed to ensure that the proposed change has been successful.
Back-out Procedures - Detail the back-out plan that will be performed if there are problems implementing the proposed change. This will ensure that systems and services can be restored to the state they were in before the change began.
Review - All changes must be reviewed (the level of impact may affect how detailed this review should be).
Approval - Changes may be planned by any team member but must be approved by a Manager or Team Leader. Notes should be reviewed before approval.
Schedule/Notification - Once APPROVED:
- Add downtime entries to the GOC for each service or host that will be offline (see: Status/Downtime below).
- Ensure that a notification is sent to affected users (via relevant email lists) prior to the change, with one week's advance notice (where possible). Another notification should be sent after the change indicating its success or failure.
- Notification should be sent in a consistent format.
TODO: use a template or automated system? Generate from metadata?
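One possible answer to the TODO is to generate notifications directly from Change Note fields, so every message has the same layout. The sketch below is a hypothetical Python example: the template text, the dict keys and `render_notification` are assumptions, not an existing ARCS tool, and the keys would need to match the real Change Note form fields.

```python
from datetime import datetime

# Hypothetical plain-text layout; the policy only asks for a consistent format.
TEMPLATE = """\
Subject: [ARCS {severity}] {title} on {start:%d/%m/%y %H:%M}

Change Note: {note_id}
Start: {start:%d/%m/%y %H:%M}
Estimated duration: {duration_hours} hours
Level of Impact: {impact} ({severity})
Services affected: {services}

{user_impact}

Contact: {contact}
"""


def render_notification(meta: dict) -> str:
    """Fill the template from a dict of (hypothetical) Change Note fields."""
    fields = dict(meta)
    # Services are stored as a list but shown as a comma-separated line.
    fields["services"] = ", ".join(meta["services"])
    return TEMPLATE.format(**fields)


if __name__ == "__main__":
    print(render_notification({
        "note_id": "ChangeNote200807-001",  # hypothetical note name
        "title": "MyProxy upgrade",
        "start": datetime(2008, 7, 14, 9, 0),
        "duration_hours": 1.5,
        "impact": "2A",
        "severity": "Low",
        "services": ["MyProxy", "Grix"],
        "user_impact": "Proxy retrieval will be unavailable during the upgrade.",
        "contact": "ops@example.org",  # hypothetical address
    }))
```

The same function could produce the follow-up message by swapping in a second template that reports success or failure.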
Tracking completion - Some changes will be required at multiple sites; progress should be monitored to make sure all sites have completed the change within a specified time. Any problems encountered should also be recorded.
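A minimal Python sketch of tracking a multi-site change, assuming each site's progress is recorded as a completed flag plus a list of problems. `SiteProgress`, `overdue_sites` and `problem_report` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SiteProgress:
    """Hypothetical record of one site's progress on a multi-site change."""
    site: str
    completed: bool = False
    problems: list[str] = field(default_factory=list)


def overdue_sites(sites: list[SiteProgress], deadline: date, today: date) -> list[str]:
    """Return sites that have not completed the change after the deadline.

    Nothing is overdue on or before the deadline, so an empty list is returned.
    """
    if today <= deadline:
        return []
    return [s.site for s in sites if not s.completed]


def problem_report(sites: list[SiteProgress]) -> list[str]:
    """Flatten recorded problems into 'site: problem' lines for the Change Note."""
    return [f"{s.site}: {p}" for s in sites for p in s.problems]
```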
Status/Downtime
To monitor system status, we have the GOC (soon to be replaced by a new, fancy-looking system) and automated INCA tests. These often detect and notify us of problems, but the overall status of services also needs to be easy to find from the main ARCS web page.
All downtime that affects production services will be entered into the GOC. This will include scheduled and unscheduled events, and events that are not under ARCS control, eg. power or AARNet networking problems.
Development
Where possible, a development system should be set up for experimenting and testing while preparing the Change Note. This is generally very easy with the use of virtual machines.
This change control policy does not apply to development systems, ie. changes to development machines do not need to be documented - it is assumed they will be rebuilt often.
Most systems and data services are deployed on a common CentOS Linux/Xen infrastructure using RPMs. The Systems Trac Project exists to manage source code/scripts, bugs, milestones and guides for installing/upgrading services.
There is a process for making changes to this project, see: Making Changes. In summary :-
- make changes, sometimes in a separate branch
- build an RPM in the development repository and install on a development machine that is as close as possible to the production system needing a change
- possibly write a release note explaining how to apply the updates
- this may be linked to from the ARCS Change Note (Detailed Instructions) and could also be useful for external groups using the Trac project/repository
Services
This list is also defined elsewhere, including the web site. It is useful to have a short summary here for extra comments or links to specific points we should consider, eg. a certain process may always be required when changing MDS.
Services are grouped to show some dependencies. The category does not necessarily indicate responsibility, eg. Systems will be responsible for changes to Jabber, even though it is a Collaboration tool.
Systems (some will be moved to Auth soon)
- DNS
- MDS (index, ng2/MIP)
- Grix
- APAC CA
- ARCS VOMS
- MyProxy
- Grisu
- Globus GRAM (ng2)
- Globus GridFTP (ngdata)
- INCA, GOC
- Shib IdP
- SLCS Service
Data
Collaboration
- Wiki
- Jabber
- Mailing Lists
- Drupal
- Sakai
- Plone
- AccessGrid
- EVO
Implementation
The method for managing change control needs to be simple and not take so long that it holds things up.
Hopefully it will be possible to manage this process effectively using the existing Wiki system. The biggest issue is being able to show that a change has been approved by authorised staff. A plugin is available to help with this:
WorkflowPlugin - managed documents have a state; moving from one state to another is controlled by the plugin based on group access, and each transition is logged.
A link at the start of this page creates a new topic, automatically named ChangeNoteYYYYMM-NNN (this is a WikiWord, so it can be easily linked to other topics).
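The automatic naming can be sketched as follows. This assumes a four-digit year and month, as in the existing note ChangeNote200807-003, and a three-digit counter that effectively restarts each month because the prefix changes. How TWiki's AUTOINC actually numbers topics depends on the template, so `next_change_note_name` is a hypothetical illustration only.

```python
import re
from datetime import date


def next_change_note_name(existing: list[str], today: date) -> str:
    """Return the next ChangeNoteYYYYMM-NNN topic name for today's month.

    Assumption: only topics with the same YYYYMM prefix count towards the
    three-digit counter, which starts at 001.
    """
    prefix = f"ChangeNote{today:%Y%m}-"
    pattern = re.compile(re.escape(prefix) + r"(\d{3})$")
    numbers = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{prefix}{max(numbers, default=0) + 1:03d}"
```

With existing topics ChangeNote200807-001 and ChangeNote200807-003, a note created in July 2008 would be named ChangeNote200807-004.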
The new topic is based on ChangeNoteTemplate:
- topic settings (via More topic actions - Edit settings for this topic) specify the WORKFLOW and WORKFLOWHISTORYFORMAT to use: ChangeNoteWorkflow
- an INCLUDE for ChangeNoteInclude is used at the end to provide a consistent block of text at the bottom of every Change Note, such as variables that show the state in the workflow.
- new topics use the PlanningForm and when finally approved, the ApprovedForm (both forms enabled in WebPreferences)
States: PLANNING - WAITING - Reject:PLANNING, Approve:APPROVED (by ArcsManagersGroup) - Revise:PLANNING?
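The states above form a small state machine. A minimal Python sketch, assuming a hypothetical Submit action for PLANNING to WAITING (the States line does not name it) and that only Approve is restricted to ArcsManagersGroup; the real rules live in ChangeNoteWorkflow and are enforced by WorkflowPlugin.

```python
from dataclasses import dataclass, field

# (current state, action) -> next state, following the States line above.
# "Submit" is a hypothetical name for the PLANNING -> WAITING action.
TRANSITIONS = {
    ("PLANNING", "Submit"): "WAITING",
    ("WAITING", "Reject"): "PLANNING",
    ("WAITING", "Approve"): "APPROVED",
    ("APPROVED", "Revise"): "PLANNING",
}

# Assumption: only Approve is restricted to a group.
REQUIRED_GROUPS = {"Approve": {"ArcsManagersGroup"}}


@dataclass
class WorkflowTopic:
    """A Change Note's workflow state plus a log of transitions."""
    state: str = "PLANNING"
    history: list[str] = field(default_factory=list)

    def apply(self, action: str, user: str, groups: set[str]) -> None:
        """Perform an action after checking it is valid and permitted."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"{action!r} is not allowed from state {self.state}")
        required = REQUIRED_GROUPS.get(action)
        if required is not None and not (required & groups):
            raise PermissionError(f"{user} is not allowed to {action}")
        new_state = TRANSITIONS[key]
        # Record the user performing the transition, not the topic's last editor.
        self.history.append(f"{self.state} -> {new_state} by {user}")
        self.state = new_state
```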
Initially we are going to try this policy without enforcing write-access restrictions once a note is WAITING or APPROVED. It is more useful to allow anyone to add extra comments and document progress.
If these restrictions were enabled, CommentPlugin would not be a workaround for them. An alternative could be to post comments to an extra topic, but this takes the user to that topic on posting, which is confusing.
- if changes are made once APPROVED, they should be limited and not affect the actual plan that was approved
- the state should be changed back to PLANNING (Revise action) for more complicated changes
- last APPROVED time, revision and full history are provided by the plugin (and displayed by ChangeNoteInclude at the bottom of each topic) so the really important information is still available and can't be changed by users
Notes:
- All topics that manage this process are read-only except by ArcsManagersGroup.
- Plugin changes:
- 16/06/08 Modify UserIsAllowed in TWiki/Plugins/WorkflowPlugin.pm to prevent TWikiAdminGroup from overriding workflow restrictions
- 23/06/08 Fix: the history incorrectly recorded the name of the user who last changed the topic instead of the current user. See: TWiki:Plugins.WorkflowPluginDev
- the INCLUDE does not work within topic settings so it has to be a part of the main topic text and can be easily removed
- TWikiForms cannot use INCLUDE to inherit fields from other forms - they need to be kept in sync manually
- HISTORY format does not work in WorkflowPlugin or ChangeNoteWorkflow, specify in ChangeNoteTemplate as topic setting instead
- this is actually better, because it makes a mess if the history format is changed once entries have been recorded
- Removing the WORKFLOW setting and re-adding does not reset any existing metadata
- Topics are automatically created using AUTOINC, See: TWikiTemplates#Automatically_Generated_Topicnam
- Users will be correctly asked to log in before the workflow button will allow a state change. In some cases the button may not appear unless the user is logged in.
- Can search on WORKFLOW name, but can't extract values for FormattedSearch
- pattern skips META, formfield only uses FORM data
- Options: use separate searches, or consider TWiki:Plugins.MakeCtrlTopicsListAddOn
- OR can use expandvariables=on and pattern to display (extract from INCLUDED content works too) - this does not work reliably on ARCS Wiki, but did on SAPAC test system. See: ChangeNotesTest
- OR display form name as Status - data may be lost if state is moved backwards and a form with less fields is used
- BUG: Clicking Approve approves the topic, even if the user does not save the new form details!
- BUG: Sometimes the template local settings are not applied to the new topic, eg. ChangeNote200807-003
History/Status
13/05/08 Darran Carey - Change Process - originally based on examples in use at iVEC.
16/06/08 Daniel Cox - First version of this topic with content copied/moved from Systems Trac
- Other initial examples that will need to be converted to use this policy/Change Note:
25/06/08 Florian/Ashley - Ideas for Impact Matrix
30/06/08 Daniel/Florian - Merge Impact Matrix ideas (services and users). Allow topic to be edited while in the WAITING state.
30/06/08 Request for feedback from Developers
1/07/08 Ashley - Changes to matrix, more relevant now that we have merged services and users
Settings
--
DanielCox - 16 Jun 2008