A Day in the Life of a Checkin
Using Git And Friends
[This is a stub since I (Squirrel) didn't have much to add to this discussion. Please add details.]
We discussed using Git, Mercurial, and similar distributed version-control systems (DVCS). The consensus was that a DVCS works best with continuous integration if you have a central repository from which to draw candidates for CI builds. Why use a DVCS then, if it ends up centralised anyway? The tools provide extra flexibility before the commit to the main repository, and have other useful features besides (e.g. Git's whole-tree diffs let you track code as it moves from file to file). However, the integration of these tools with popular IDEs like Eclipse and NetBeans leaves something to be desired.
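As a toy illustration of that whole-tree tracking (the repository and file names here are invented, not from the session), Git's --follow can trace a file's history across a rename:

```shell
#!/bin/sh
# Toy demo: Git follows code across a file move. Repo and file names are
# made up for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'int total;' > Billing.java
git add Billing.java
git -c user.name=demo -c user.email=demo@example.com commit -qm 'add billing code'
git mv Billing.java Invoicing.java
git -c user.name=demo -c user.email=demo@example.com commit -qm 'move billing code'

# A plain log on the new name sees only the move commit;
# --follow traces the content back through the rename.
git log --oneline -- Invoicing.java            # one commit
git log --follow --oneline -- Invoicing.java   # two commits
```

(`git diff -M -C` similarly reports renames and copies tree-wide rather than per file.)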
Synchronous CI
[This is a stub since I (Squirrel) didn't have much to add to this discussion. Please add details.]
We discussed the synchronous, less-automated continuous integration described by James Shore here: http://jamesshore.com/Blog/Continuous-Integration-on-a-Dollar-a-Day.html. No one had actually tried this CI method, but most did not think it was workable with larger teams and complex tests (such as the acceptance tests described in the next section). We also couldn't see how it would work when some team members are remote (or just working from home or a customer site temporarily). See this photo of the board from this part of the session.
CI Real-World Example
We discussed how CI runs at youDevise, a small financial-services firm in London. Here's a photo of the board showing how the process works; unfortunately, I don't seem to be able to add the diagram right here.
A checkin at youDevise follows these steps through our continuous-integration process:
Step 1: First, our checkin has to announce its birth to the world. The checkin causes a hook to run in source control - see commitinfo in CVS and post-commit hooks in SVN. This hook script updates a file on a webserver that says "hey, checkin to work on here!" We use a webserver so that our CI servers don't all have to poll source control just to see whether anything has changed - when we started to get lots of CI instances, CVS couldn't cope. (Maybe Subversion will be better when we switch, but why add load you don't need?)
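A minimal sketch of such a hook, assuming an SVN post-commit layout; the docroot path and flag-file name are invented, and the real script surely differs:

```shell
#!/bin/sh
# Sketch of the source-control hook from step 1 (SVN post-commit flavour).
# The docroot path and flag-file name are assumptions.

# Write the new revision into a flag file served by the webserver, so CI
# servers can poll cheap HTTP instead of hammering CVS/SVN.
announce_checkin() {
  rev="$1"
  docroot="$2"
  echo "$rev" > "$docroot/checkin-pending"
}

# In hooks/post-commit, Subversion passes the repository path and revision:
#   announce_checkin "$2" /var/www/ci
```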
Step 2: Next, the checkin has to pass basic tests. The main build server, running CruiseControl, notices the flag and retrieves the latest code. It compiles the code, then farms the testing out to two servers: one runs unit tests via JUnit, and one runs a battery of static checks (Checkstyle, FindBugs, JDepend, Emma, and Testability Explorer, among others). How does the farming work? The two slaves and the master share a directory via Samba; the slaves check every few seconds for a trigger file, and fetch the compiled code from the same directory once they see one. They publish their results to the same shared directory, and the master waits for both to report a result (timing out if they take too long). Finally, the master extracts a list of the modifications from the CruiseControl log file and stores these in the shared directory. This build takes 5-10 minutes.
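The slave side of that hand-off can be sketched roughly like this; the trigger and result file names are assumptions, not the real ones:

```shell
#!/bin/sh
# One polling pass for a test slave watching the shared Samba directory
# (step 2). File names are illustrative assumptions.

poll_once() {
  share="$1"
  name="$2"                                   # e.g. "unit" or "static"
  [ -f "$share/trigger-$name" ] || return 1   # nothing to build yet
  build=$(cat "$share/trigger-$name")
  rm -f "$share/trigger-$name"
  # ...run JUnit (or the static checks) against the compiled code the
  # master published into $share here...
  echo "PASS $build" > "$share/result-$name"  # master collects this file
}

# Production would loop:  while true; do poll_once /mnt/ci-share unit; sleep 5; done
```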
Step 3: Our checkin now has to pass browser tests (we build only web applications, so that's all we need to run for acceptance). Three acceptance-test servers, each running CruiseControl and each intended to test a specific browser, monitor the same shared directory we mentioned in step 2. Each one is looking for a different trigger indicating that there has been at least one successful master build since its last build. (This dependency on a successful master build means these servers form the second link in the build pipeline.) When the acceptance-test server sees the trigger, it gets the compiled code from the main build server as well as all the modification lists for any builds since its last one (both of these come from that same shared Samba directory again); the code to check the modification lists is a little custom plugin we wrote for CruiseControl. It deploys this code in Tomcat (which includes automated start and population of a MySQL database) and tells Selenium RC on one or more slave servers to start running. Our most popular browser, IE6, uses three slaves; the other two browsers just have one each, as we don't care as much about a speedy answer from them. (We found that if you don't have at least one slave, the run isn't efficient - separating Tomcat/MySQL from the browser seems to make everyone happy about resource usage.) Finally, when the acceptance tests are done running on the slave or slaves, Selenium RC reports results back to the master. This build takes 20-90 minutes.
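The pipeline dependency in step 3 might look something like this sketch, assuming numeric build labels and invented file names in the shared directory:

```shell
#!/bin/sh
# Sketch of the second pipeline link: an acceptance server fires only when
# the master has published a good build newer than the last one it tested.
# File names and numeric build labels are assumptions.

should_run_acceptance() {
  share="$1"
  browser="$2"                                 # e.g. "ie6"
  [ -f "$share/last-good-master" ] || return 1
  master=$(cat "$share/last-good-master")
  last=$(cat "$share/last-acceptance-$browser" 2>/dev/null || echo 0)
  [ "$master" -gt "$last" ]
}

# When this succeeds, the server records the build label it is about to
# test, deploys to Tomcat, and points Selenium RC at the slaves:
#   if should_run_acceptance /mnt/ci-share ie6; then ... fi
```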
Step 4: There are some other hurdles for our checkin: two other servers also run tests triggered by a checkin, but not necessarily right away. One runs some very complicated calculations tests overnight, and one runs some specialised tests for a particular subproduct (including unit and acceptance tests). Both run CruiseControl and check the status file via HTTP as described in step 1.
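Their polling over HTTP might look roughly like this; the URL and state-file paths are invented:

```shell
#!/bin/sh
# Sketch of how the overnight and subproduct servers watch the step-1 flag
# over HTTP instead of querying source control. URL and paths are assumptions.

# Fetch the flag; succeed only if it names a revision we haven't built yet.
needs_build() {
  url="$1"
  state="$2"
  latest=$(curl -fs "$url") || return 1
  last=$(cat "$state" 2>/dev/null || echo "")
  [ "$latest" != "$last" ] || return 1
  echo "$latest" > "$state"
}

# e.g. kicked off by cron overnight:
#   needs_build http://ci.example.internal/checkin-pending ~/.last-calc-rev \
#     && run_overnight_calculations
```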
Step 5: The last step in the life of a checkin is getting used by people. We have a completely automated release-management tool (written in-house) that picks up built code from the main-build server - same shared directory again! - and publishes it to various testing environments, all with one button click. Releasing to production is a slightly more complex process for security reasons, but once the code is delivered to the production site the process is again fully automated.
Results of all these steps are displayed on two monitors in our office, with red or green rectangles indicating the current status of each. Failure at any stage also triggers an email to interested parties, who are supposed to fix the problems identified; while they are doing this, they mark the corresponding rectangle orange (so everyone knows someone is working on the problem).
By the way, all these "servers" are just commodity desktops, jammed into our server room on some simple shelves. If we want to expand further, we'll certainly want to consider virtualisation (or a bigger server room!). We've experimented with virtualisation on one master/slave pair for acceptance tests; it seems to work OK (that is, nothing is faster or slower), but you do have to configure carefully (we had a network misconfiguration on the host OS that caused the guest to mysteriously lose its connection every once in a while).
We didn't get to a lot of discussion on this setup (which was too bad, as feedback was what I'd hoped to get). One participant said he runs a similar setup, but most do not have a complicated build pipeline of this kind running. I'm told that various modern CI servers (see CI_Smackdown) make this easier to set up out of the box, but no one said they were using these features (and they weren't available when we started building ours).