Some facts and figures from Google presented by Daniel are really quite scary:
- Google have 15,000 engineers in 40 offices
- There are 4,000 active projects, some of which result in technical debt and tests that end up being “flaky”
- The average team size is just under 4, and many of the engineers are Software Engineers in Test (SETs)
- These teams make 5,500 software changes a day, averaging nearly four per minute
- The codebase has about 100 million lines of code, all in a single repository
- On average, 50% of the code changes monthly
- The “infinite grid” of servers is actually about 100,000 cores in a global grid
- 4,000,000 browsers are launched for testing (not all of them Chrome)
- The complete suite of 5 million test cases is run 20 times a day, meaning ~100 million tests per day
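A quick back-of-the-envelope check on the testing figures in the list above. This is a sketch that assumes a "day" means a full 24 hours; the talk did not say whether the figures were per working day or per calendar day.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the talk.
# Assumption: a "day" is a full 24 hours (86,400 seconds).
SECONDS_PER_DAY = 24 * 60 * 60

test_cases = 5_000_000     # size of the complete suite
full_runs_per_day = 20     # times the complete suite is run each day
changes_per_day = 5_500    # software changes submitted each day

tests_per_day = test_cases * full_runs_per_day
tests_per_second = tests_per_day / SECONDS_PER_DAY
tests_per_change = tests_per_day / changes_per_day

print(f"{tests_per_day:,} test executions per day")                   # 100,000,000
print(f"{tests_per_second:,.0f} test executions per second")          # 1,157
print(f"{tests_per_change:,.0f} test executions per submitted change")  # 18,182
```

That works out at over a thousand test executions every second, around the clock.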
Massive scale of change
In other words, there is a massive amount of change going on, with multiple diverse teams spread across wide geographies, and complete testing all the time is just not possible. Instead, Google use static analysis of the dependencies to establish when there is no way a change in one component can affect another component’s tests, so those tests do not need to be re-run.
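One way to picture this kind of dependency analysis is as a reverse walk over the build graph: start from the changed targets, collect everything that transitively depends on them, and run only the tests in that set. This is a minimal sketch over a toy graph; the target names and the "_test" naming convention are hypothetical, and it is not Google's actual tooling.

```python
from collections import deque

# Hypothetical build graph: each target maps to the targets it depends on.
DEPS = {
    "//base:strings": [],
    "//base:logging": ["//base:strings"],
    "//search:index": ["//base:logging"],
    "//search:index_test": ["//search:index"],
    "//mail:compose": ["//base:strings"],
    "//mail:compose_test": ["//mail:compose"],
    "//maps:tiles": [],
    "//maps:tiles_test": ["//maps:tiles"],
}


def reverse_deps(deps):
    """Invert the graph so we can walk from a target to its dependents."""
    rdeps = {target: set() for target in deps}
    for target, direct in deps.items():
        for dep in direct:
            rdeps[dep].add(target)
    return rdeps


def affected_tests(changed, deps):
    """Return test targets that transitively depend on any changed target."""
    rdeps = reverse_deps(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first walk over dependents
        target = queue.popleft()
        for dependent in rdeps.get(target, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    # Hypothetical convention: test targets end in "_test".
    return sorted(t for t in seen if t.endswith("_test"))


print(affected_tests(["//base:logging"], DEPS))  # ['//search:index_test']
```

A change to `//base:logging` selects only the search test; the mail and maps tests cannot be affected, so they can be skipped.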
Some other points I noted down: “make clean” is considered a bug, because a complete rebuild would mean recreating petabytes of output. Timestamps are not a reliable indicator of change either, so Google’s build tools take a hash of the source files to check for changes.
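A minimal sketch of hash-based change detection, assuming SHA-256 as the hash (the talk did not say which hash Google's tools use) and one recorded digest per file. A timestamp-based tool such as make would rebuild a file that has merely been touched, whereas a content hash only changes when the bytes do.

```python
import hashlib
import os
import tempfile
import time


def file_digest(path):
    """SHA-256 of a file's contents; independent of its modification time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_rebuild(path, recorded_digest):
    """Rebuild only if the contents differ from the last recorded build."""
    return file_digest(path) != recorded_digest


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "main.cc")
    with open(src, "w") as f:
        f.write("int main() { return 0; }\n")
    recorded = file_digest(src)

    # Touching the file changes its timestamp but not its contents...
    later = time.time() + 60
    os.utime(src, (later, later))
    touched_result = needs_rebuild(src, recorded)

    # ...whereas editing it changes the digest.
    with open(src, "a") as f:
        f.write("// edited\n")
    edited_result = needs_rebuild(src, recorded)

print(touched_result, edited_result)  # False True
```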
Needs hermetic control
Incidentally, every tool is kept under control in the source code repository, so Google make developers use consistent versions of all the tools, which in turn increases consistency and productivity because built artefacts can simply be copied. With a distributed workforce and distributed data, copying outputs between builds is always better than repeating a build, so each incremental build can be small: typically at least 90% has already been built at that version, with those versioned tools. In other words, the build is entirely reproducible (“hermetically sealed”), a strategy known as “hermetic builds”, which I confess I had to look up; I found a really good descriptive blog by Jeff Brown (see the link). Having small, composable services is similar to feature flags, little “switches” in software that turn features on or off at runtime (see Pete Hodgson’s blog).
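The combination of versioned tools and copying outputs instead of rebuilding can be sketched as a cache whose key covers both the source contents and the tool versions, so an output is only reused when it is guaranteed to be identical. The tool names, the in-memory dict standing in for a shared artefact store, and `compile_sources` are all hypothetical placeholders, not Google's build system.

```python
import hashlib
import json

# Hypothetical pinned toolchain, as if checked into the repository with the code.
TOOLS = {"compiler": "cc-1.0", "linker": "ld-1.0"}

# In-memory stand-in for a shared artefact store (hypothetical).
ARTIFACT_CACHE = {}
builds_run = 0


def action_key(sources, tools):
    """Key a build action on the exact source contents and tool versions."""
    payload = json.dumps(
        {
            "sources": {
                name: hashlib.sha256(body.encode()).hexdigest()
                for name, body in sources.items()
            },
            "tools": tools,
        },
        sort_keys=True,  # stable key regardless of dict insertion order
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def compile_sources(sources):
    """Stand-in compiler that just counts how often real work happens."""
    global builds_run
    builds_run += 1
    return "OBJ:" + "|".join(sorted(sources))


def build(sources, tools=TOOLS):
    key = action_key(sources, tools)
    if key not in ARTIFACT_CACHE:  # miss: do the real work once
        ARTIFACT_CACHE[key] = compile_sources(sources)
    return ARTIFACT_CACHE[key]     # hit: reuse the stored output


srcs = {"a.cc": "int a();", "b.cc": "int b();"}
build(srcs)                                      # built
build({"b.cc": "int b();", "a.cc": "int a();"})  # same inputs: copied from cache
build(srcs, {**TOOLS, "compiler": "cc-1.1"})     # new tool version: rebuilt
print(builds_run)  # 2
```

Because the tool versions are part of the key, upgrading the compiler invalidates cached outputs automatically, which is what makes sharing them across a distributed team safe.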
Managing test dependencies
Other, slightly more obvious, advice was to parallelise component builds by keeping libraries small (fewer than 15 files), which makes perfect sense to me. It did leave me slightly confused, though, about why tests on such built components would need to be re-run at all, since that goes against the component-based paradigm of testing in isolation (remembering that earlier it had been stated that all tests are run for every change).
The answer was that test results are also stored in the build system: if there are no changes to a test’s dependencies, its result must be the same, so it is fetched from the cache instead.
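A sketch of that idea, assuming a test's result is keyed on a digest of its own sources plus everything it transitively depends on. The target names, their contents and the `run_test` stand-in are made up for illustration; a real system would also fold the toolchain and environment into the key.

```python
import hashlib

# Hypothetical build graph (target -> direct deps) and source contents per target.
DEPS = {
    "//lib:parser": [],
    "//lib:parser_test": ["//lib:parser"],
}
CONTENTS = {
    "//lib:parser": "def parse(s): return s.split(',')",
    "//lib:parser_test": "assert parse('a,b') == ['a', 'b']",
}

RESULT_CACHE = {}  # digest of everything the test depends on -> pass/fail
executions = 0


def transitive_digest(target):
    """Hash a target's own contents together with those of all its dependencies."""
    h = hashlib.sha256(CONTENTS[target].encode())
    for dep in sorted(DEPS[target]):
        h.update(transitive_digest(dep).encode())
    return h.hexdigest()


def run_test(target):
    """Stand-in for actually executing the test; always passes here."""
    global executions
    executions += 1
    return True


def cached_test_result(target):
    key = transitive_digest(target)
    if key not in RESULT_CACHE:
        RESULT_CACHE[key] = run_test(target)
    return RESULT_CACHE[key]


cached_test_result("//lib:parser_test")  # runs the test
cached_test_result("//lib:parser_test")  # nothing changed: served from the cache
CONTENTS["//lib:parser"] += "  # tweak"  # a dependency changes...
cached_test_result("//lib:parser_test")  # ...so the test runs again
print(executions)  # 2
```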
Something that struck a chord with me was “Don’t test for all different browsers”: if there is a problem with the way code works on a browser, then it is the browser’s problem. Many is the time that I’ve spent ages tracking down and fixing a display glitch that only appears in Safari or IE. This problem is dear to me in relation to testing on multiple mobile devices, so maybe I need to understand Dan’s points better. I think Dan mentioned that Sauce Labs provides cross-browser testing as a service.
Other common problems include glitches in tests related to authentication and sessions, hence “Don’t log in for every test, spoof a cookie instead”. I’m not completely sure about this, but as long as there is a suite of test cases specifically for login and session handling, I guess it can help simplify functional tests.
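What spoofing a cookie might look like, assuming an application that trusts an HMAC-signed session cookie. The cookie format, `sign_session`, `current_user` and the test secret are all hypothetical; a real web framework will have its own session mechanism, and the login flow itself still needs its own dedicated tests.

```python
import hashlib
import hmac

# Hypothetical secret used only in the test environment.
TEST_SECRET = b"test-only-secret"


def sign_session(user_id, secret):
    """Build a session cookie value of the form '<user>.<hmac>' (hypothetical format)."""
    sig = hmac.new(secret, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{sig}"


def current_user(cookies, secret):
    """What the application under test might do on each request."""
    value = cookies.get("session")
    if value is None or "." not in value:
        return None
    user_id, sig = value.rsplit(".", 1)
    expected = hmac.new(secret, user_id.encode(), hashlib.sha256).hexdigest()
    return user_id if hmac.compare_digest(sig, expected) else None


def test_spoofed_cookie_is_accepted():
    # Instead of driving the login form, plant a valid session cookie directly.
    cookies = {"session": sign_session("alice", TEST_SECRET)}
    assert current_user(cookies, TEST_SECRET) == "alice"


test_spoofed_cookie_is_accepted()
```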
A quadratic increase in code and tests arises in growing companies: hiring of new developers is roughly linear, so the rate at which code is written also grows linearly, and the total amount of code and tests therefore grows quadratically. This has to be allowed for in capacity planning.
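The arithmetic behind this, as a sketch assuming headcount grows linearly as N(t) = N_0 + h t (h hires per unit time) and that each developer writes code at a roughly constant rate r, with C(t) the total amount of code:

```latex
\frac{dC}{dt} = r\,N(t) = r\,(N_0 + h\,t)
\quad\Longrightarrow\quad
C(t) = C_0 + r N_0\,t + \tfrac{1}{2}\,r h\,t^{2}
```

The t² term overtakes the linear term once t exceeds 2N_0/h, so with steady hiring the code base, and the tests that accompany it, grows quadratically rather than linearly.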
The Build Cops
So what happens when builds and tests start to go wrong? Google have “build cops”, people who investigate what is going on (What broke? Who broke it?) and who have the ability to roll back a change and retry it. They also chase down “flaky tests” in quiet times. I thought Dan went on a bit about flaky tests, and got slightly sidetracked into a discussion about flaky tests indicating flaky code.
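Answering “What broke? Who broke it?” is commonly done by bisecting the range of changes between the last passing build and the first failing one, and a crude flakiness check is to re-run a test on unchanged code and see whether it both passes and fails. Both functions below are hypothetical sketches of these general techniques, not a description of Google's tools; `passes` and `run_test` stand in for real build and test invocations.

```python
import itertools


def find_culprit(changes, passes):
    """Binary search for the first change at which the test starts failing.

    Assumes changes[0] is known-good, changes[-1] is known-bad, and that the
    failure, once introduced, persists (i.e. the test is not flaky).
    """
    good, bad = 0, len(changes) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if passes(changes[mid]):
            good = mid
        else:
            bad = mid
    return changes[bad]


def looks_flaky(run_test, attempts=10):
    """A test that both passes and fails on identical code is flaky."""
    outcomes = {run_test() for _ in range(attempts)}
    return outcomes == {True, False}


# Hypothetical changelist numbers; the regression lands at 1007.
changes = list(range(1000, 1016))
print(find_culprit(changes, lambda cl: cl < 1007))  # 1007

# A test that alternates between passing and failing on the same code.
alternating = itertools.cycle([True, False])
print(looks_flaky(lambda: next(alternating)))  # True
print(looks_flaky(lambda: True))               # False
```

Once the culprit is found, rolling it back restores a passing build while its author investigates.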
Release often and twitter pants
One other obvious strategic change was to release more frequently. This arose because Google Chrome had problems with last-minute commits before releases: developers wanted to get in changes they were working on, even if they were not totally happy with them, rather than wait a long time for the next release. So my takeaways from this are to think very carefully about encouraging (or enforcing) even more diligence from developers in designing and implementing componentised, provable code modules, and simply to reinforce the Agile Manifesto principles.
Finally, another takeaway I’m bringing back to the team is “twitter pants”, which I am sure is not funny at all in the US, but in the UK any discussion of “Labels in pants” and Python in pants will surely cause a few giggles. This is a shame, as the framework for controlling dependencies actually sounds really interesting, and I’m thinking we could invent a similar tool and call it Smart Trousers in honour of the Trousers of Reality. How different would that book series have been if it had been called the Pants of Reality?
And as a very last memory, it was nice to see that real software engineers still sport beards and ponytails (at least in Norwich). Oh, and thanks to all the organisers and my employers Smart421 for the free beer.