It was the last week in January, nearly a month ago, and we were preparing for the imminent release of LINA 0.72.
It was down to details. We believed ourselves to be past the unknown unknowns. Nile had hacked together code that could run natively on Linux while generating installers on all the major platforms. Paul and I had a sweet little test system sketched out - seven real-live computers wired together and KVM-d in to two workstations. Five host machines ruled by Zuul, the Gate-keeper, programmed to distribute LINA test-runs to our zoo of real and virtual operating systems, a machine dual-booting the two latest Mac flavors on i386, with the dual-booting Mac PPC machine waiting in the wings. That plus our QEMU disk images in various stages of completion. Debian4, Fedora7, Fedora8, OpenSUSE 10.2, OpenSUSE 10.3, Ubuntu Feisty, Ubuntu Gutsy, Windows 2000, Windows 2003, Windows XP and Windows Vista, fully prepped with the LINA prerequisites and compressed from 30 Gigs into nice little 1-2 Gig tarballs and camping out on our networked hard drive.
All was in order. In test-runs the convoluted scripts were delivering lovely false results files back to Zuul. The 0.72 release including the new install code was producing all kinds of actual GUI elements, running sweetly, if primitively, on the development machines. Yes indeed, a tiny bit more testing, and we should be good to go. LINA 0.72 would be out within a week. It was down to details.
And the long week began.
Day 1 - The Dell laptop host boots into blue screens and kernel panics after a package update. After hours of fsck-ing around with it, I pronounce the hard drive dead.
Day 2 - The first test run doesn’t quite loop right, eats its own brain, and leaves tarballs strewn across the hosts. I write a little cleanup routine.
Day 3 - All of the tests either fail outright or time out. This with the 0.71 release, known to work on seven of our platforms. I fix the passwordless sudo situation on three guests, pass host ssh keys to two other guests, and change the code to allow a long wait time during the kernel build.
Day 4 - Add wheel group to the Ubuntu images thus REALLY fixing their passwordless sudo problem. Increase time-out again.
Day 5 - All-day business meeting, then research into the blue-screen problem with starting Vista on QEMU.
Day 6 - Paul sets up Good Old Dell with the new 120G hard drive. Installs Fedora7, whacks self on forehead then installs Fedora8. I struggle to figure out why we’ve only had one successful build so far.
Day 7 - Fit out Good Old Dell with the qemu packages (gcc3.4, kqemu, qemu) and the git packages (git and curl). Set up ssh. Open each disk image and pass it the revised authorized_keys file.
Day 8 - Attempt a day of rest. I come in to find out why the Good Old Dell build has disappeared from the radar. Turn off strict-host-checking in its ssh-config. Oops. Restart tests. Get in the car around dusk to go home. Car doesn’t start. Lots of time on the bus ride to think about improving test outputs.
Day 9 - Car has a broken timing belt. Still struggling to understand the lack of good builds. Set up scripts to preserve ALL output, with the option to preserve machine states as well.
Day 10 - Still struggling. Add a missing package to Ubuntu Gutsy and install kernel sources on Debian. Increase the time-out.
Day 11 - Another successful build - OpenSUSE 10.3 on a Mac Mini host. Which is weird because that image had hung earlier on a different host. Hmm.
Day 12 - Examining the results more carefully. Every Windows build has timed out. Every build on the HP Desktop and Good Old Dell has timed out. Hmm.
Day 13 - Another attempted rest day. Fooling around with getting Vista up on QEMU. Try translating an image built using VirtualBox, which kinda works except for nearly complete lack of hardware compatibility, including networking.
Day 14 - Board meeting and first round of installer usability testing. Begin exploration of the ACPI problem that prevents Vista from running on QEMU. Start another test run with even longer time-outs. Installer on Mac and Linux is sweet but still a tiny bit awkward. Nile takes it home to work through details again.
Day 15 - Recompile QEMU repeatedly, following clues from forums, working through different ACPI options. Still no luck with Vista. Starting to give up hope with Vista. Plan B, real hardware, may be the only choice. How barbaric.
Day 16 - Struggle to understand last night’s results. Still no completed builds on Windows. Bring home XP image to run it by hand on QEMU on my development laptop. Fedora7 has finally built, on a Mini - after a timeout and a memory failure on other hosts. Hmm.
Day 17 - XP build, virtualized on my own machine, appears to have hung. It trudged along for ten hours or so, but has made no progress in the last eighteen. On XP installed on hardware the entire build took about twelve hours. Cygwin on Windows on QEMU seems to be a vanishingly narrow bottleneck.
Day 18 - Still no completed builds on the slow HP desktop or Good Old Dell. It’s all starting to make sense. Start a local build of OpenSUSE 10.3 (known to be a good image) on Good Old Dell, with no time-out.
Day 19 - Finally get around to investigating the wxWidgets failure on the Mac Tiger build. Adjust script to copy libtoolize to system area. Still fails. Brainstorm with Nile and Google - seems like the latest Tiger update isn’t supported by our version of wxWidgets. Find a proposed fix to include in the current code.
Day 20 - Day of rest… Good Old Dell is still chomping away on the build. 36 hours is too long. The HP is even slower. Looks like we’re down to three host machines.
Day 21 - The super-speedy dual core 64-bit Dell server host has become suspect as well. Builds either hang or wind up with an alloc error. Are we maxing out QEMU’s memory at nearly 2 Gig? Is it some other problem? Tests are devised.
Day 22 - The Dell server has joined our collection of useless hosts. Sigh. Down to the two Minis as host machines. And still no good builds on Windows images, regardless of host. Meanwhile, the installer code is almost rock-solid. We really need to start testing the current code soon…
Day 23 - Paul and I lay it on the line - we need to do something fast. We can limp along with the two Minis for testing Unix images, but what about Windows? Better virtualization? Testing on bare metal? A bit of tense brainstorming, and we decide to do a trial run using VMWare.
Day 24 - Pounding on VMWare - nice interface, super speedy. The Debian image I convert runs with no graphical output, but the converted XP image doesn’t boot. Looks like we’ll need to re-create Windows images for VMWare. Can VMWare support the internal LINA virtualization? Tests are devised.
Day 25 - Building LINA on Vista on VMWare from a script! Now if we can just get a handle on the seemingly randomly assigned IP addresses, and get VMWare to stop quizzing us about relocated images…
Day 26 - Meanwhile, back at the lab, testing new code on Mac Tiger and the Linux distros.
Day 27 - Windows installer code complete, ready to test Windows builds by hand using VMWare - saving the automation for next time!
Day 28 - Test system is up and running, installer code is finished. Just a tiny bit of bug-stomping and we’re good to go.
Within a week for sure.