Configuration Changes

Configuration changes/Release Rolling Back vs. Rolling Forward Session

Planning release steps while maintaining invariants

Andy Parker's master's thesis
https://github.com/zaphod42/Coeus
start from a cluster configuration
how do you roll out a new service while keeping things running?
there can be no SPOF (single point of failure)
define a new language (think Prolog-like) to describe rules/policies -> turn the declarative, policy-based language into a Puppet execution plan
match execution plans (goal-based) against policies
provisioning large systems
idea: model-checking failing systems
applying model checking to sysadmin work
real-world failures are complex, how do you model them? This is a problem with all model-checking approaches
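
To make "planning release steps while maintaining invariants" concrete, here is a minimal sketch. It is not taken from Coeus or the thesis: the node model, the action names and the `MIN_UP` threshold are all assumptions made up for illustration. It searches breadth-first for the shortest sequence of stop/upgrade/start steps that takes a cluster from v1 to v2 without ever dropping below two serving nodes.

```python
from collections import deque

# Hypothetical model (not from Coeus): each node is a (version, is_up) pair.
MIN_UP = 2  # invariant: at least two nodes serving, so there is no SPOF


def invariant_holds(state):
    return sum(1 for _, up in state if up) >= MIN_UP


def next_states(state):
    """Yield (action, new_state) for every single-step change to the cluster."""
    for i, (version, up) in enumerate(state):
        if up:
            yield f"stop node{i}", state[:i] + ((version, False),) + state[i + 1:]
        else:
            yield f"start node{i}", state[:i] + ((version, True),) + state[i + 1:]
            if version == "v1":
                # Assumption: a node can only be upgraded while it is stopped.
                yield f"upgrade node{i}", state[:i] + (("v2", False),) + state[i + 1:]


def plan(start, goal):
    """Breadth-first search for a shortest plan that never breaks the invariant."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        for action, new_state in next_states(state):
            if new_state not in seen and invariant_holds(new_state):
                seen.add(new_state)
                queue.append((new_state, steps + [action]))
    return None  # no plan satisfies the invariant


start = tuple(("v1", True) for _ in range(3))
goal = tuple(("v2", True) for _ in range(3))
steps = plan(start, goal)
for step in steps:
    print(step)
```

With three nodes the search finds a rolling upgrade, one node at a time. With only two nodes and the same invariant, `plan` returns `None`: the policy itself tells you the release cannot be done without adding capacity first.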

Why does it seem to be that so few plan for reverting releases?

discussion starting point: tools are concerned with going forward, and so are teams. Usually there is no explicit backout plan, or if there is one, it is rarely tested, even though some of it can be automated (contrast this with the 3am call and hacking a bugfix forward)
problematic naming: rollback is a bad word, backout is better
you cannot roll back time
continuous rolling forward: what happens when something goes wrong during deployment?
migrating databases? Whole concept of db refactoring (see Scott W. Ambler's book)
Django South, Rails migrations, etc.
link w/ application architecture: isolating things prevents failure propagation
migrations use different data models
need a 3rd pipeline to build the data (after code, infrastructure)
e.g. anonymizing data: cannot be rolled back, needs to be done in production
once you have gone forward there are too many paths to go back
depends on your scenario? What's the difference between roll-forward and roll-back?
things fail in unexpected ways (corrupted data could affect your application)
"stopping time" by switching systems (maintain parallel installations of systems)
easy to have a default rollback for the mainline scenario, without losing newly gathered data (e.g.: added a new field to the signup form, this needs to be backed out; we can remove the field and keep all data, even for customers who have signed up after the release)
what about feature toggles? Could be used to handle such cases.
basic issue w/ the idea of rolling back: it means losing data, you cannot roll back your data
you should implement a rollback scenario if you can (depends on the risk, costs...)
the effort needed to do it correctly is much higher than what most people put in
snapshots: the system needs to be in a consistent state
no way to roll back after some time has passed (e.g. deploy at the weekend, failure occurs on a weekday)
if rollback is not possible, be aware of it and prepared to roll forward
come up with a design where you don't have to do it: lowers the risk enough...
clever systems allow you to do it by connection, by user, by feature
allows tuning for some users: provide a resource-consuming feature to part of the users, not to all users
DI is better than feature branches for doing that
deploy schema changes alongside the code
just add to the database, do not remove anything - all older versions of the app can use the new schema (consider meaningful defaults, and beware of the potential performance hit you are taking with increased record size)
feature toggles used to test the new database
deploying schemas in advance gives you more confidence (but does not solve the rollback problem) - database shadowing, so it's like the additive-only schema changes, just temporarily and not forever
running live data through a secondary installation that contains the old version
event sourcing provides the ability to replay stuff
problem: how much time does it take?
but the events have schemas themselves...
finding ways to mitigate your inability to do anything about something going wrong
reducing the barrier to going in production: being minutes away from delivering
how do we make people more aware of the problem? A lot of developers have not worked on the ops side, dealing with the unexpected
Google engineers are on ops for a month after pushing a new release of a piece of software
product teams actually run the software (not always feasible due to regulations)
the whole forward/backwards discussion is not concerned with undoing multiple releases
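
The additive-only schema idea from the notes above (the signup-form example, "just add to the database") can be sketched as follows. This is an illustration under assumptions, not something shown in the session: the `signup` table, the `referrer` column and the two `signup_v*` functions are hypothetical names, and SQLite in memory stands in for a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signup (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")


def signup_v1(email):
    # Old release: knows nothing about the new column.
    conn.execute("INSERT INTO signup (email) VALUES (?)", (email,))


def signup_v2(email, referrer):
    # New release: writes the new field as well.
    conn.execute(
        "INSERT INTO signup (email, referrer) VALUES (?, ?)", (email, referrer)
    )


signup_v1("a@example.com")

# Release: the schema change only adds a column, with a meaningful default.
conn.execute("ALTER TABLE signup ADD COLUMN referrer TEXT NOT NULL DEFAULT 'unknown'")
signup_v2("b@example.com", "newsletter")

# Backout of the code only: the old release runs unchanged against the new
# schema, and rows written after the release are kept.
signup_v1("c@example.com")

rows = conn.execute("SELECT email, referrer FROM signup ORDER BY id").fetchall()
print(rows)
```

Nothing is dropped on backout, so the customer who signed up under the new release is still there. The cost is the one the notes mention: the column stays, and records get bigger.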

Some scenarios that were given which you can't recover from in a planned way

the new release of the application starts to generate gibberish data. How do you downgrade to the previous version, restore the old data and clean up the data that has been generated since?
does your backout script work when the release has not completed, but failed halfway through?
what do you do with large amounts of data (though this might already be a problem for the actual release)?
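
On the question of a backout script for a release that failed halfway through: one possible approach, sketched here under assumptions, is to pair every release step with its own undo and to back out only the steps that actually completed, in reverse order. The step names, the `state` dictionary and `run_release` are hypothetical and only stand in for real deployment actions.

```python
def run_release(steps, log):
    """Run (name, apply, undo) steps in order.

    If a step raises, undo only the steps that completed, in reverse order.
    Returns True if the whole release was applied, False if it was backed out.
    """
    done = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            for done_name, done_undo in reversed(done):
                done_undo()
                log.append(f"undone {done_name}")
            return False
        done.append((name, undo))
        log.append(f"applied {name}")
    return True


# Hypothetical system state standing in for packages, schema and load balancer.
state = {"package": "1.0", "schema": 1, "traffic": "old"}


def fail_migration():
    raise RuntimeError("migration aborted")


steps = [
    ("install package",
     lambda: state.update(package="2.0"),
     lambda: state.update(package="1.0")),
    ("migrate schema", fail_migration, lambda: state.update(schema=1)),
    ("switch traffic",
     lambda: state.update(traffic="new"),
     lambda: state.update(traffic="old")),
]

log = []
ok = run_release(steps, log)
print(ok, state)
print(log)
```

This only covers the easy part. A step that fails after doing half its work (a migration that touched some rows) still leaves the system in a state that no undo was written for, which is exactly the kind of scenario listed above.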

And unfortunately database level application integration (many apps read-write the same database tables) is not yet extinct.

Video Recording of the Session

http://skillsmatter.com/podcast/agile-testing/rolling-back-rolling-forward
