Managing Test Data All The Way To Production
Basic notes -> needs to be edited for readability:
very common to write tests against sample data, but they won't deploy until the tests have run against more realistic data
the big concern is that dev/CI has one kind of data, while the prod-like data is something we do not have access to
the dream is to have an oracle-type thing which can go get results from prod-like data
another idea would be to have prod-like data in dev
for every test, you can calculate/create your data and therefore expected results from real data
- expense of doing this is too high
how to handle steps that have a different implementation in prod-like vs in dev?
if the app logic is used for both data setup and testing, "false == false is true": a bug shows up on both sides of the comparison, so the test passes anyway
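A minimal sketch of the "false == false is true" trap, in Python. The function names and the tax bug are made up for illustration: the first test takes its expected value from the app logic itself, the second from a value worked out by hand.

```python
# Hypothetical app logic with a bug: it should add 20% but adds 2%.
def app_total_cents(price_cents: int) -> int:
    return price_cents * 102 // 100  # bug: should be 120


def circular_test(price_cents: int) -> bool:
    # The expected value comes from the same app logic that is under test,
    # so both sides carry the same bug and the check always passes.
    expected = app_total_cents(price_cents)
    return app_total_cents(price_cents) == expected


def independent_test(price_cents: int, expected_cents: int) -> bool:
    # The expected value was worked out by hand, outside the app.
    return app_total_cents(price_cents) == expected_cents


print(circular_test(10000))            # passes even though the logic is wrong
print(independent_test(10000, 12000))  # fails, so the bug is caught
```

The second form costs more to maintain, which is the expense trade-off these notes keep coming back to.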
some things only need to be run in dev/CI, not in prod-like
flag individual tests as "can be run in production" and then bring in new steps for those that can't be
- identify the highest value thing to test in prod and implement that test in prod
- the tests keep running against prod, so if different types of data come into prod they can turn red
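The flagging idea could be sketched roughly like this, assuming a home-grown registry in plain Python. The decorator `test_case`, the `prod_safe` flag and the two placeholder checks are hypothetical names, not part of any framework.

```python
from typing import Dict, List

# Hypothetical registry: test name -> (function, prod_safe flag).
_REGISTRY: Dict[str, tuple] = {}


def test_case(prod_safe: bool = False):
    """Register a test and record whether it may run against production."""
    def decorator(fn):
        _REGISTRY[fn.__name__] = (fn, prod_safe)
        return fn
    return decorator


@test_case(prod_safe=True)
def check_homepage_loads() -> bool:
    return True  # read-only check, placeholder body


@test_case()  # default: dev/CI only
def check_order_creation() -> bool:
    return True  # writes data, placeholder body


def select_tests(environment: str) -> List[str]:
    """Return the names of the tests allowed in the given environment."""
    if environment == "prod":
        return sorted(name for name, (_, safe) in _REGISTRY.items() if safe)
    return sorted(_REGISTRY)


def run(environment: str) -> Dict[str, bool]:
    """Run the allowed tests and return a name -> passed mapping."""
    return {name: _REGISTRY[name][0]() for name in select_tests(environment)}
```

Defaulting the flag to "not prod safe" means a new test has to be opted in on purpose before it can touch production.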
can do random generation of data or pre-set data
- random can be slow so not worth it
- most people do pre-set data
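For comparison, a small sketch of both options in Python. The user fields and the rules (ages between 0 and 120) are invented for illustration; the point is only that seeding the generator makes random data repeatable.

```python
import random
from typing import Dict, List

# Pre-set data: written once by hand, identical on every run.
PRESET_USERS: List[Dict] = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 17},
]


def random_users(count: int, seed: int) -> List[Dict]:
    """Generate users from simple rules; the same seed gives the same data."""
    rng = random.Random(seed)  # local generator, leaves global state alone
    return [
        {"name": f"user{i}", "age": rng.randint(0, 120)}
        for i in range(count)
    ]
```

If the seed is logged with each run, a red run on random data can be replayed with exactly the same inputs.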
question: why do we care about sandbox data vs "real" prod data?
- volume
- diversity
are tests for anything other than catching regressions?
- this is the problem, and why the deploy guys do not want to accept your sandbox-data-based tests
- need to build trust with prod by proving you are using prod-like data
is an imaginative QA's scenario useful if it has never happened in prod? do we care about that bug then?
- does this require someone to manually define which data must pass before going to prod?
is it valuable to run automated tests against prod?
- NO: more worthwhile to monitor prod, because prod should find things naturally, not through automation
- NO: if something is caught in prod it will be pushed back to the CI/Local stage as a test
- YES: it is worth us finding it first
how to make effective sandbox data
- straight snapshots
-- problem is that they need to stay in sync with prod
-- must anonymize the data
- have a tool which can identify boundaries in a prod database to create the sandbox
- completely man-made data
-- possibly with QuickCheck(?), which can create your data based on rules
- create the relationships through the actual app
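One way the "must anonymize" step for snapshots might look, sketched in Python with salted hashing. The table layout, field names and salt are assumptions. Hashing like this is pseudonymization and may not be enough on its own for low-entropy fields. Using the same salt across tables keeps the relationships between rows intact.

```python
import hashlib
from typing import Dict, List


def pseudonym(value: str, salt: str) -> str:
    """Map a value to a stable token; same input and salt give the same token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "anon_" + digest[:12]


def anonymize(rows: List[Dict[str, str]], fields: List[str], salt: str) -> List[Dict[str, str]]:
    """Return copies of the rows with the named fields replaced by pseudonyms."""
    return [
        {k: (pseudonym(v, salt) if k in fields else v) for k, v in row.items()}
        for row in rows
    ]


# Hypothetical snapshot: orders reference users by email.
users = [{"email": "a@example.com", "plan": "pro"}]
orders = [{"email": "a@example.com", "total": "40"}]

safe_users = anonymize(users, ["email"], salt="rotate-me")
safe_orders = anonymize(orders, ["email"], salt="rotate-me")

# The join key still matches across tables after anonymization.
print(safe_users[0]["email"] == safe_orders[0]["email"])
```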
load testing can be handled against either prod-like or sandbox data
two different types of load testing needed
- a bottleneck in the backend code
- traffic concerns with parallel computing
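A rough sketch of how the two kinds of measurement differ, in Python. The `handler` is a stand-in for a real request handler and all names are hypothetical: one function times each call on its own to look for a slow step, the other times a whole batch under parallel traffic.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def time_sequential(handler: Callable[[int], int], requests: List[int]) -> List[float]:
    """Time each call on its own to look for a slow step in the backend code."""
    timings = []
    for request in requests:
        start = time.perf_counter()
        handler(request)
        timings.append(time.perf_counter() - start)
    return timings


def time_concurrent(handler: Callable[[int], int], requests: List[int], workers: int) -> float:
    """Time the whole batch under parallel traffic to look for contention."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handler, requests))  # wait for every request to finish
    return time.perf_counter() - start


def handler(request: int) -> int:
    return request * 2  # placeholder for a real request handler
```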