Managing Test Data All The Way To Production

it is very common to write tests against sample data, but they won't deploy until the tests have run against more realistic data

the big concern is that dev/CI has one type of data, while the prod-like data is something we do not have access to

the dream is to have an oracle-type thing which can go get results from prod-like data

another idea would be to have prod like data in dev

for every test, you can calculate/create your data and therefore expected results from real data

- expense of doing this is too high

how to handle steps that have different implementations in prod-like vs. dev?

if the app logic is used for both data setup and testing, the test proves nothing: "false == false is true"
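
A minimal sketch of that trap, assuming a hypothetical `apply_discount` function standing in for the app logic (it is not from the discussion, and the bug in it is deliberate). The first check derives its expected value from the code under test; the second uses a value worked out by hand.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical app logic with a deliberate bug: it adds instead of subtracting."""
    return price_cents + price_cents * percent // 100


def tautological_check() -> bool:
    # The expected value comes from the same app logic being tested,
    # so the bug sits on both sides of the comparison and the check passes.
    expected = apply_discount(1000, 10)
    return apply_discount(1000, 10) == expected


def independent_check() -> bool:
    # The expected value is worked out by hand (10% off 1000 is 900),
    # so the buggy implementation is caught.
    expected = 900
    return apply_discount(1000, 10) == expected
```

Here `tautological_check()` passes even though the logic is wrong, while `independent_check()` fails as it should.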

some things only need to be run in dev/CI, not in prod-like

flag individual tests as "can be run in production" and then bring in new steps for those that can't be

- identify the highest-value thing to be tested in prod and implement it there

- tests continue to run against prod, so if different types of data come into prod they can turn red
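
One way the flagging idea could look, as a sketch only. The `register` decorator, the `prod_safe` flag, and the two example tests are all hypothetical names made up for illustration; test frameworks such as pytest offer markers for this kind of selection, but the version below sticks to plain Python.

```python
from typing import Callable, Dict, List

# Registry of test functions, each tagged with whether it may run against prod.
_TESTS: Dict[str, dict] = {}


def register(prod_safe: bool = False) -> Callable:
    """Decorator that records a test and its (hypothetical) prod_safe flag."""
    def wrap(fn: Callable[[], None]) -> Callable[[], None]:
        _TESTS[fn.__name__] = {"fn": fn, "prod_safe": prod_safe}
        return fn
    return wrap


@register(prod_safe=True)
def test_homepage_is_readable() -> None:
    # Read-only check: writes nothing, so it is flagged as safe for prod.
    assert "welcome".upper() == "WELCOME"


@register()  # default: dev/CI only
def test_create_and_delete_account() -> None:
    # Writes data, so it is left unflagged and is skipped in prod.
    accounts = []
    accounts.append("sandbox-user")
    accounts.remove("sandbox-user")
    assert accounts == []


def select_tests(environment: str) -> List[str]:
    """Return the names of the tests allowed to run in the given environment."""
    if environment == "prod":
        return sorted(n for n, t in _TESTS.items() if t["prod_safe"])
    return sorted(_TESTS)


def run(environment: str) -> Dict[str, str]:
    """Run the selected tests and report 'green' or 'red' per test."""
    results = {}
    for name in select_tests(environment):
        try:
            _TESTS[name]["fn"]()
            results[name] = "green"
        except AssertionError:
            results[name] = "red"
    return results
```

Calling `run("prod")` on a schedule would execute only the flagged tests, which is what lets a new kind of prod data turn a result red.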

can do random generation of data or pre-set data

- random can be slow so not worth it

- most people do pre-set data
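
The two options can be combined, as the sketch below suggests: a seeded generator gives random-looking data that is identical on every run, so it can be generated once and stored as pre-set data. The `generate_customers` function, its field names, and its value ranges are assumptions made up for illustration.

```python
import random
from typing import Dict, List


def generate_customers(count: int, seed: int) -> List[Dict[str, object]]:
    """Build customer records from simple rules using a seeded generator."""
    rng = random.Random(seed)  # private generator: same seed, same data
    countries = ["DE", "US", "JP", "BR"]
    customers = []
    for i in range(count):
        customers.append({
            "id": i + 1,                        # sequential, unique ids
            "country": rng.choice(countries),   # rule: one of a fixed set
            "age": rng.randint(18, 90),         # rule: adults only
            "orders": rng.randint(0, 25),       # rule: bounded order count
        })
    return customers
```

Saving the output of one call is one way to avoid paying the generation cost on every test run.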

question: why do we care about sandbox data vs "real" prod data?

- volume

- diversity

are tests for anything other than holding off regressions?

- this is the problem, and why the deploy guys do not want to take your sandbox-data-based tests

- need to build trust with prod by proving you are using prod-like data

is an imaginative QA scenario useful if it has never happened in prod? do we care about that bug then?

- does it require that someone manually defines what data is required to be passing before going to prod?

is it valuable to run automated tests against prod?

- NO: it is more worthwhile to monitor prod, because prod should find things naturally, not through automation

- NO: if something is caught in prod it will be pushed back to the CI/Local stage as a test

- YES: it is worth us finding it first

how to make effective sandbox data

- straight snapshots

-- problem is that it needs to stay in sync

-- must anonymize the data

- have a tool which can identify boundaries in a prod database to create the sandbox

- completely man-made data

-- possibly with quickcheck(?) which can create your data based on rules

- create the relationships through the actual app
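
For the snapshot option, a sketch of how the anonymizing step might keep relationships intact: each sensitive value is replaced by a stable token from a keyed hash, so the same email maps to the same token in every table. The record shapes, the `SECRET_KEY` value, and the function names are assumptions for illustration. This is pseudonymization, which is weaker than full anonymization, so real data may need more than this.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Assumption: in practice the key would be stored outside the sandbox.
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymize(value: str) -> str:
    """Map a sensitive value to a stable token using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user-" + digest[:12]


def anonymize_snapshot(
    users: List[Dict[str, object]],
    orders: List[Dict[str, object]],
) -> Tuple[List[Dict[str, object]], List[Dict[str, object]]]:
    """Return copies of both tables with emails tokenized and names redacted."""
    safe_users = [
        {**u, "email": pseudonymize(str(u["email"])), "name": "redacted"}
        for u in users
    ]
    safe_orders = [
        {**o, "user_email": pseudonymize(str(o["user_email"]))}
        for o in orders
    ]
    return safe_users, safe_orders
```

Because the token depends only on the key and the original value, a fresh snapshot run through the same function lines up with the previous one, which helps with the staying-in-sync problem.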

load testing can be handled against either prod-like or sandbox data

two different types of load testing needed

- a bottleneck in the backend code

- traffic concerns with parallel computing