Configuration Changes
Configuration changes session: rollback and roll-forward
- https://github.com/zaphod42/Coeus
- start from a cluster configuration
- how do you roll out a new service while keeping things going
- There can be no SPOF (single point of failure)
- Define a new language to describe rules/policies -> turn the declarative, policy-based language into a Puppet execution plan
- match execution plans (goal-based) against policies (see the policy-to-plan sketch after these notes)
- provisioning large systems
- idea: model-checking failing systems
- applying model-checking to system administration
- real world failures are complex, how do you model them?
- you cannot roll-back time
- continuous rolling forward: what happens when something goes wrong during deployment?
- migrating the database? the whole concept of DB refactoring
- link w/ application architecture: isolating things prevents failure propagation
- migrations use different data models
- need a 3rd pipeline to build the data (after code and infrastructure)
- e.g. anonymizing data: cannot be rolled back, needs to be done in production
- once you have gone forward, there are too many paths to go back
- depends on your scenario? what's the difference between roll-forward and roll-back
- failures happen in unexpected ways (corrupting data could affect your application)
- "stopping time" by switching systems
- easy to have a default rollback for the mainline scenario
- what about feature toggles? They could be used to handle such cases (see the feature-toggle sketch after these notes)
- basic issue with the idea of rolling back: it means losing data; you cannot roll back your data
- you should implement a rollback scenario if you can (depends on the risk, costs...)
- the effort to do it correctly is much higher than most people put in
- snapshots: need to be taken in a consistent state
- no way to roll back after some time has passed (e.g. deploy at the weekend, failure occurs during the week)
- if rollback is not possible, be aware of it and prepared to roll forward
- come up with a design where you don't have to do it: lowers the risk enough...
- clever systems allow toggling by connection, by user, by feature
- allows tuning for some users: provide a resource-consuming feature to part of the users, not to all of them
- DI is better than a feature branch for doing that
- deploy schema changes alongside the code
- just add to the database, do not remove anything (see the expand-only schema sketch after these notes)
- feature toggles used to test the new database
- deploying schemas in advance gives you more confidence (but does not solve the rollback problem)
- event sourcing provides the ability to replay stuff (see the replay sketch after these notes)
- problem: how much time does a replay take?
- but the events have schemas themselves...
- finding ways to mitigate your inability to do anything about something going wrong
- reducing the barrier to going to production: being minutes away from delivering
- how do we make people more aware of the problem? a lot of developers have not worked on the ops side, dealing with the unexpected
- Google engineers are on ops for a month after pushing a new release of a piece of software
- product teams actually run the software (not always feasible due to regulations)
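
Policy-to-plan sketch (referenced above). A minimal Python illustration, not taken from Coeus or Puppet, of turning a declarative policy into an ordered execution plan and checking that plan against the goal before applying it; the state model and all names are invented for the example.

    POLICY = {                      # desired state, declarative
        "web": {"version": "2.1", "running": True},
        "db":  {"version": "9.0", "running": True},
    }

    CURRENT = {                     # observed state of the cluster
        "web": {"version": "2.0", "running": True},
        "db":  {"version": "9.0", "running": False},
    }

    def plan(policy, current):
        """Derive the steps needed to move from the current state to the policy."""
        steps = []
        for svc, want in policy.items():
            have = current.get(svc, {})
            if have.get("version") != want["version"]:
                steps.append(("upgrade", svc, want["version"]))
            if want["running"] and not have.get("running"):
                steps.append(("start", svc))
        return steps

    def simulate(current, steps):
        """Apply steps to a copy of the state, without touching real systems."""
        state = {svc: dict(attrs) for svc, attrs in current.items()}
        for step in steps:
            if step[0] == "upgrade":
                state[step[1]]["version"] = step[2]
            elif step[0] == "start":
                state[step[1]]["running"] = True
        return state

    def satisfies(policy, state):
        """Goal check: does a (simulated) end state match the policy?"""
        return all(state.get(svc) == want for svc, want in policy.items())

    steps = plan(POLICY, CURRENT)
    assert satisfies(POLICY, simulate(CURRENT, steps))
    print(steps)   # [('upgrade', 'web', '2.1'), ('start', 'db')]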
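
Feature-toggle sketch (referenced above). A rough Python sketch of per-user toggles: a stable hash buckets users into a percentage rollout, with an allow-list for named users, so a risky or resource-hungry feature can be given to part of the users and switched off without redeploying. Toggle names, percentages and the two backends are made up for the example.

    import hashlib

    TOGGLES = {
        # feature name -> (percentage of users, explicit allow-list)
        "new_search": (10, {"alice"}),       # 10% rollout plus named users
        "heavy_report": (0, {"ops-team"}),   # only the allow-list gets it
    }

    def is_enabled(feature, user_id):
        percentage, allow_list = TOGGLES.get(feature, (0, set()))
        if user_id in allow_list:
            return True
        # Stable hash so the same user always gets the same answer.
        bucket = int(hashlib.sha1(f"{feature}:{user_id}".encode()).hexdigest(), 16) % 100
        return bucket < percentage

    def old_search_backend(query):
        return f"old:{query}"    # known-good code path

    def new_search_backend(query):
        return f"new:{query}"    # new, possibly risky code path

    def search(user_id, query):
        if is_enabled("new_search", user_id):
            return new_search_backend(query)
        return old_search_backend(query)

    print(search("alice", "kittens"))   # alice is on the allow-list -> new path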
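
Expand-only schema sketch (referenced above). A small Python/sqlite3 illustration of "just add, do not remove": the new column is added ahead of the code change, old code keeps working because it only uses the original columns, and new code backfills and reads the new column with a fallback, so rolling the code back never meets a schema it cannot use. sqlite3 is only a stand-in for whatever database is actually involved.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    # Expand step, deployed ahead of (or alongside) the code change.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

    # Old code keeps working: it only uses the original columns.
    conn.execute("INSERT INTO users (name) VALUES ('alice')")

    # New code backfills the new column and reads it with a fallback.
    conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")
    row = conn.execute("SELECT COALESCE(display_name, name) FROM users").fetchone()
    print(row[0])   # 'alice'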
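
Replay sketch (referenced above). A toy Python example of the event-sourcing point: current state is rebuilt by replaying the event log, and the "how much time does it take" question is usually answered with snapshots (replay from a saved intermediate state). The account/balance domain is invented; versioning of the event schemas themselves, also raised above, is not shown.

    from dataclasses import dataclass

    @dataclass
    class Event:
        kind: str      # e.g. "deposited", "withdrawn"
        amount: int

    LOG = [
        Event("deposited", 100),
        Event("withdrawn", 30),
        Event("deposited", 5),
    ]

    def apply(balance, event):
        if event.kind == "deposited":
            return balance + event.amount
        if event.kind == "withdrawn":
            return balance - event.amount
        return balance   # unknown event kinds are skipped, not lost

    def replay(events, start=0):
        balance = start
        for e in events:
            balance = apply(balance, e)
        return balance

    print(replay(LOG))           # 75: full rebuild from the log
    print(replay(LOG[2:], 70))   # 75: replay from a snapshot taken after two events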