O’Reilly has a great interview up with NASA’s Peter Gluck, project software engineer for the Mars Phoenix Lander. I always find the design and implementation of mission-critical systems interesting. In short, they’re running a radiation-hardened system (the RAD6000 board) with a 33MHz CPU, 128 megabytes of RAM, and a PCI peripheral interface… pretty advanced stuff for space. This usually surprises people when they first hear about these systems, but the circumstances require proven technology that is hardened against the perils of outer space (for example, the Hubble Space Telescope was upgraded to an Intel 486 processor… and the Space Shuttle still flies on IBM AP-101 computers whose design dates back to the 1970s).
The software is written in C and runs on the VxWorks real-time OS… Lockheed Martin (who wrote the control systems) switched from Ada to C a few years back. There are plenty more interesting details in the article. Here are a few teasers:
The RAD 6000 has built in error detection and corrections. So the hardware does RAM scrubbing. There is a RAM scrubbing that occurs on a continuous basis. And beyond that, we have internal fault protection that monitors the health and safety of the software. And if a software task, for example, fails to respond to a ping, we have pings in the system, then the fault protection task will declare that a fault has occurred and will safe the spacecraft. And what that means, by “safeing”, we mean that the spacecraft will enter into a power and communications safe mode where it will just sit and wait for the ground to respond. It’ll basically phone home and say, I’ve got a problem; somebody tell me what to do.
So if it were to completely lock-up, the hardware has to be stroked every 64 seconds. There’s a watchdog timer. And so if that 64 second period expires, then the hardware resets and the software is rebooted, and hopefully that clears whatever error occurred. Now in the event that that doesn’t work, we have a whole second set of avionics onboard. So the hardware will try to boot to the same side, and if the same side doesn’t come up and start stroking the watchdog timer, then it will swap to the other side and boot the first side.
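To make those two excerpts a bit more concrete, here is a minimal sketch in C of a ping-based fault monitor and a 64-second watchdog with a same-side retry followed by a side swap. This is my own toy model, not Phoenix flight code: every name (watchdog_t, watchdog_tick, fault_monitor_t and so on), the task count, and the one-second time base are assumptions made up for illustration. The only figures taken from the interview are the 64-second period and the "retry the same side, then swap" behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WATCHDOG_TIMEOUT_S 64u  /* period quoted in the interview */
#define NUM_TASKS 3u            /* hypothetical number of monitored tasks */

typedef enum { SIDE_A, SIDE_B } side_t;  /* the two sets of avionics */

typedef struct {
    uint32_t last_stroke_s;      /* when the timer was last stroked */
    side_t   active_side;        /* side currently booted */
    bool     retried_same_side;  /* already rebooted this side once? */
    uint32_t resets;             /* total forced resets so far */
} watchdog_t;

void watchdog_init(watchdog_t *wd, uint32_t now_s)
{
    wd->last_stroke_s = now_s;
    wd->active_side = SIDE_A;
    wd->retried_same_side = false;
    wd->resets = 0;
}

/* Healthy software calls this well inside every 64 s window. */
void watchdog_stroke(watchdog_t *wd, uint32_t now_s)
{
    wd->last_stroke_s = now_s;
    wd->retried_same_side = false;  /* this side has shown it is alive */
}

/* Simulated hardware check. Returns true if it forced a reset. */
bool watchdog_tick(watchdog_t *wd, uint32_t now_s)
{
    if (now_s - wd->last_stroke_s < WATCHDOG_TIMEOUT_S) {
        return false;  /* stroked recently enough, nothing to do */
    }
    if (wd->retried_same_side) {
        /* The same-side reboot did not help: swap sides (assumed policy). */
        wd->active_side = (wd->active_side == SIDE_A) ? SIDE_B : SIDE_A;
        wd->retried_same_side = false;
    } else {
        /* First expiry: reboot on the same side. */
        wd->retried_same_side = true;
    }
    wd->resets++;
    wd->last_stroke_s = now_s;  /* the rebooted software gets a fresh window */
    return true;
}

typedef struct {
    bool responded[NUM_TASKS];  /* ping replies seen in this cycle */
    bool safe_mode;             /* latched once any fault is declared */
} fault_monitor_t;

/* A task calls this when it answers the fault-protection ping. */
void fp_ping_response(fault_monitor_t *fp, unsigned task)
{
    assert(task < NUM_TASKS);
    fp->responded[task] = true;
}

/* End of a ping cycle: any silent task puts the spacecraft in safe mode. */
bool fp_check(fault_monitor_t *fp)
{
    for (unsigned i = 0; i < NUM_TASKS; i++) {
        if (!fp->responded[i]) {
            fp->safe_mode = true;
        }
        fp->responded[i] = false;  /* re-arm for the next cycle */
    }
    return fp->safe_mode;
}
```

In this model, a side that never strokes the timer is reset at the 64-second mark, rebooted on the same side, and swapped out at the next expiry. A real system would drive watchdog_tick from hardware rather than a polled clock, and safe mode would do far more than set a flag.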
Interviewer: Am I right in assuming that there’s very little process separation in the older RAD 6000 boards?
Peter: Exactly… We have strict coding guidelines that we use. We don’t allow dynamic memory allocation, for example.
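For anyone wondering what "no dynamic memory allocation" looks like in practice, one common embedded pattern is a fixed-size block pool carved out of static storage, so worst-case memory use is known at link time and there is no heap to fragment. The sketch below is my own illustration of that general pattern, not anything from the Phoenix code base; pool_t, pool_alloc, pool_free, telemetry_pool, and the block sizes are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS 8u   /* hypothetical: number of blocks in the pool */
#define BLOCK_SIZE  64u  /* hypothetical: bytes per block */

typedef struct {
    uint8_t storage[POOL_BLOCKS][BLOCK_SIZE];  /* raw byte buffers */
    uint8_t in_use[POOL_BLOCKS];               /* 1 if the block is taken */
} pool_t;

/* Static storage: sized at build time and zero-initialized at startup. */
static pool_t telemetry_pool;

/* Hand out the first free block, or NULL if the pool is exhausted. */
void *pool_alloc(pool_t *p)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->storage[i];
        }
    }
    return NULL;  /* the caller must handle this; there is no heap fallback */
}

/* Return a block to the pool. The pointer must have come from pool_alloc. */
void pool_free(pool_t *p, void *block)
{
    ptrdiff_t offset = (uint8_t *)block - &p->storage[0][0];
    assert(offset >= 0);
    assert(offset < (ptrdiff_t)(POOL_BLOCKS * BLOCK_SIZE));
    assert(offset % (ptrdiff_t)BLOCK_SIZE == 0);
    p->in_use[(size_t)offset / BLOCK_SIZE] = 0;
}
```

The blocks here are plain byte buffers; storing types with stricter alignment would need an aligned storage declaration. The point of the pattern is that running out of memory becomes an explicit, testable condition (a NULL return) rather than a surprise deep into a mission.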
These are true fail-safe systems… not the stuff we mortal engineers play with. Click HERE to read the rest of the interview.
Tweakers.net has an excellent article describing the process and tools Intel uses to bring a CPU into production. It focuses mainly on the new Penryn processor, which was designed at the Folsom campus where I used to work. I’ve worked in some of these labs (debugging chipset firmware problems), and they are just as impressive as they sound. The article also does a good job of describing the tick-tock model of design scheduling, and how various “errata” (i.e. bugs) remain in each processor. Hope you enjoy the article as much as I did.
Back when I worked at Intel in their System Software group, we were working on an embedded OS kernel that would run on the chipset and help provide firmware-based security for the enterprise. This technology was called Active Management Technology (AMT). The OS was host to various embedded security applications that monitored and controlled the system. We were also working with another team that was developing a secure hypervisor that would provide an isolated environment for the user OS (like Windows, Linux, etc). All of this was designed to allow an enterprise IT department more control over its machines and help isolate malware-infected computers from the network. It also allows IT to more easily manage and repair systems remotely. Click on the link above if you want to read more.
Anyways, a few months ago, Intel made this music video promoting the technology:
A recent article by the Wharton group takes a look at employee perks (specifically at Google) and how effective they are. One of the most interesting points I found was the classification of employees as Integrators or Segmentors. From the article:
Perks like Google’s appeal to integrators, people for whom work life and home life have little distinction. These are the employees who like to plug into the wi-fi system on Google’s commuter bus and do work as they ride to and from the office; who check office e-mail frequently at home on nights and weekends; and who like child-care facilities at or near their office so that they can bring a part of home with them to work.
Segmentors, by contrast, like to maintain distinct walls between work and home. These are people made uncomfortable by a workplace filled with perks related to one’s personal life. Even employees with children can dislike the fact that their employer provides on-site childcare.
I also found this interesting:
In her research, Rothbard documented how segmentors in an integrationist workplace enjoyed less job satisfaction and had a lower commitment to their companies than their integrator co-workers. What was noteworthy, too, was that segmentors may not know the reasons they are dissatisfied at work. “It’s a subtle effect, where they know they just don’t fit in but may not know why,” Rothbard says.
I tend to be a segmentor… I enjoy my private life separate from work. This could explain why I never quite felt at home working at Intel, which is highly “integrationist”…
Here’s a 40-minute video of Intel Senior Fellow, Mark Bohr, giving Robert Scoble a tour of Intel’s new 45nm chip fabrication plant. Most of the interview focuses on the 45nm technology, but there are several shots inside the fab.
Click here to watch the video (requires Flash Player)
If you listen closely, he even mentions the Folsom Chipset group I used to work for.
The exciting thing about the new 45nm process is that Intel is replacing the SiO2 (silicon dioxide) gate dielectric with a high-k dielectric like HfO2 (hafnium dioxide). The gate dielectric is a thin insulating layer (about 5 atoms thick with SiO2) that prevents current from leaking through the gate.
The problem is that the electric field generated by the gate must still be strong enough to create the inversion channel between the source and drain of the transistor (more information here). As transistors get smaller, the dielectric layer has to be so thin (to maintain the correct capacitance) that current begins to flow (leak) across the gate dielectric. This consumes power and generates heat.
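The capacitance argument can be made concrete with the parallel-plate approximation, where capacitance per unit area is k times the permittivity of free space divided by the dielectric thickness. Here is a rough sketch in C under stated assumptions: the 1.2 nm SiO2 thickness and the k of about 20 for HfO2 are ballpark values I am plugging in for illustration (reported values for hafnium-based films vary, and Intel has not published its exact material), and the function names are my own.

```c
#include <assert.h>

#define EPSILON_0 8.854e-12  /* permittivity of free space, F/m */
#define K_SIO2    3.9        /* relative permittivity of SiO2 */
#define K_HFO2    20.0       /* assumed ballpark value for HfO2 */

/* Parallel-plate capacitance per unit area, in F/m^2. */
double cap_per_area(double k, double thickness_m)
{
    assert(thickness_m > 0.0);
    return k * EPSILON_0 / thickness_m;
}

/*
 * Physical thickness a new dielectric can have while giving the same
 * capacitance per unit area as the old one at thickness t_old_m.
 */
double equivalent_thickness(double k_new, double k_old, double t_old_m)
{
    assert(k_old > 0.0);
    return t_old_m * (k_new / k_old);
}
```

Calling equivalent_thickness(K_HFO2, K_SIO2, 1.2e-9) shows the idea: with these assumed numbers the high-k layer can be roughly five times physically thicker while keeping the same gate capacitance, and a thicker insulator is much harder for electrons to tunnel through.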