
Making BeagleBone Black serial number available to user applications

The BeagleBone Black is a handy little board. It has a 1GHz ARM processor and 4GB of on-board Flash storage, and runs Linux very easily. In particular, the Debian distribution is available ready-made to copy on to the board from an SD card. I’ve been using these boards in various embedded applications recently.

IMG_20160812_162532

One of the applications I’m working on at the moment will be deploying BeagleBone Black boards in hundreds of locations, and we have to manage them sensibly. When managing all these boards, it will make our lives much easier to know their identities. Happily, the BeagleBone Black is fitted with an EEPROM which contains various useful information about it, including its serial number and a copy of the barcode stuck to the board.

The EEPROM is accessible via the I2C bus: it sits at address 0x50 on bus 0, which appears as /dev/i2c-0. As standard, Debian Linux is set up so that the group i2c has access to this device. However, the kernel device tree is one step ahead of us. A special at24 driver takes over access to the EEPROM, so any attempt to access it directly via /dev/i2c-0 is refused with ‘Device or resource busy’. The contents of the EEPROM are available, however, in sysfs under /sys/bus/i2c/devices/0-0050/eeprom.

There is also a bone-capemgr driver which presents some of the EEPROM contents in a more convenient way, as a series of files elsewhere in sysfs, under /sys/devices/bone_capemgr.9/baseboard/

This is all very handy, but these sysfs files are readable only by root. That’s not helpful for my applications. I don’t want to be running things as root if I can possibly help it. I wanted to find a reasonably legitimate way to make the relevant sysfs files available to a group of users. We can’t just change the permissions on those files by hand because sysfs is created dynamically, so any changes would be lost after a reboot.

The obvious way to do this would be via a udev rule so that the permissions are set up when the relevant devices are found. However, udev rules don’t seem to have a direct way to change the permissions and ownership of files in sysfs. After some experimentation I came up with a couple of rules which did what I wanted. They match on the narrowest set of keys I could work out, and run chmod and chown to set the permissions the way I’d like them:

DRIVER=="bone-capemgr", RUN+="/bin/chown root:i2c /sys$env{DEVPATH}/baseboard/serial-number"
SUBSYSTEM=="i2c", DEVPATH=="*0-0050", RUN+="/bin/chown root:i2c /sys$env{DEVPATH}/eeprom", RUN+="/bin/chmod 0640 /sys$env{DEVPATH}/eeprom"

Put those lines in a suitable rules file. I used /etc/udev/rules.d/30-bone-capemgr.rules. Then, running

udevadm trigger

should trigger the rules and set the permissions. Because sysfs is recreated at boot time when the various devices are found and added, the rules will trigger each time and leave the permissions the way we want them.

The board serial number is to be found in bytes 16-28 of the EEPROM, and the barcode is in bytes 80-103. The hexdump utility is handy for extracting the data. For example,

hexdump -e '8/1 "%c"' /sys/bus/i2c/devices/0-0050/eeprom -s 80 -n 23

should show the barcode. Wrap that up in a script and we have a programmatic way of finding the board’s identity.
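If the application that needs the identity is written in C, the same extraction can be done without shelling out to hexdump. The following is only a minimal sketch: read_eeprom_field is a name made up for illustration, and it assumes the udev rules above have already made the eeprom file readable by the calling user.

```c
#include <stdio.h>

/*
 * Copy `len` bytes starting at `offset` from the file at `path` into `out`
 * and NUL-terminate the result. `out` must hold at least len + 1 bytes.
 * Returns 0 on success, -1 if the file cannot be opened or is too short.
 * (Hypothetical helper, not part of any library.)
 */
int read_eeprom_field(const char *path, long offset, size_t len, char *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Seek to the field and read exactly len bytes. */
    int ok = fseek(f, offset, SEEK_SET) == 0 &&
             fread(out, 1, len, f) == len;
    fclose(f);
    if (!ok)
        return -1;

    out[len] = '\0';
    return 0;
}
```

Calling read_eeprom_field("/sys/bus/i2c/devices/0-0050/eeprom", 80, 23, buf) with a buffer of at least 24 bytes should fetch the same 23 bytes as the hexdump command above.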

Super Breakout Saga

In the beginning, or at least in 1972, there was Pong. It was one of the first video games to become widely popular, and made Atari’s name in the games industry. It wouldn’t be overstating the case to call it seminal, even though the game itself involved only bouncing a ball between two bats. Four years later, Pong begat Breakout, which extended the bat-and-ball mechanics of Pong to knocking bricks out of a wall. Then came Super Breakout, a similar game, but more complex, with three different modes of play and even multiple bats and balls. Adding the new features needed a new, powerful ingredient: Super Breakout was one of the earliest arcade games to feature a microprocessor. This was in 1977, before Space Invaders, before Pac-Man, before Donkey Kong.

super-breakout-cabinet

I picked up this PCB several years ago on eBay. The price was keen because, of course, it was ‘untested’. We all know what that means. Broken. That’s the way I like them. It’s hard to get to know a game board that’s in perfect working order.

IMG_20160807_180709

IMG_20160807_180717

Getting this one running was more than the usual challenge. From about 1980 onwards, a few conventions about how arcade games were built and wired got established. This game pre-dates those, so the usual rules don’t apply. Power? Instead of 5V and 12V DC supplies, it wants 10V and 22V AC, centre-tapped. Video? Instead of the usual colour RGB with separate sync, it delivers black-and-white composite video, and expects a screen with strips of coloured film stuck to it. Controls? No joystick here. Not even a trackball. Just an analogue potentiometer to control the paddle, and various flashing lights and switches.

My aim with every arcade game I acquire is to adapt it to the JAMMA standard, so it’s easy to plug it in and play it. This one was going to need more than just a bit of wiring.

The first job was to get the board running. I rigged up temporary power supplies and a black-and-white monitor which could handle its video output.

The most serious fault was faulty RAM chips. Super Breakout has eight RAM chips of a whole one kilobit each, totalling a spectacular 1 Kbyte of RAM. Six of them were faulty. Fortunately a kind member of the most excellent UKVAC forum had some available. Identifying the faults was made easier by the board’s test mode, which makes beep-boop-boop noises to indicate which chips it thinks are faulty.

IMG_20160807_180733

The sockets holding the program ROMs and the processor were all rubbish and had to be replaced with modern ones. Cleaning them never worked reliably. This particular version of the game uses 12 ROM chips, each holding 4 kilobits. That’s a generous 6 Kbyte of program code, but it’s also 216 pins in sockets to go wrong.

IMG_20160807_180742

There was one faulty logic gate (a 7420) associated with the ‘game select’ and ‘serve’ switches, and the DIP switches for choosing the game options needed a squirt of DeoxIT switch cleaner. Success!

IMG_20160807_220827

With those problems attended to, the board passed all its self-tests and seemed to run quite happily. Now to design that JAMMA adapter.

Tektronix 549 Storage Oscilloscope, Restored

A couple of years ago, I restored this exquisite brute of an oscilloscope to working order. You can read the story starting with part 1. I never posted any pictures of the finished article, so here they are.

You can read more about the machinery at the TekWiki.

549_34a

549_front_off

549_34c

549_34b

549_rear

1a4

manuals

Switched on and working, with all four traces showing.

549_front_light

549_front_dark

I made a short movie of the storage functions working.

Systemd for Embedded Linux

Over the last few years, there has been a lot of controversy in the Linux world about systemd. As I understand it, systemd is intended to be a better-engineered, more powerful version of the motley collection of little programs and scripts which keeps the essential services on a Linux system running.

systemctl

The controversy arises because the original 1970s Unix way of doing things was to rely on a motley collection of little programs and scripts for everything, each of which was simple but well understood, and to knit them together to form a complete operating system. Systemd takes a different approach, using larger and more sophisticated components which are more dedicated to particular tasks, such as managing services or network connections. This is supposed to make it more efficient and easier to manage in the twenty-first century.

I’ve been doing some work recently on an embedded Linux system which runs on the latest version of Debian Linux, version 8 (‘Jessie’). Debian Jessie fully supports systemd to the extent that it seems to be the default way of doing things. I thought I’d experiment with it a bit.

When working on an embedded Linux system, I very frequently want to have a piece of my software run reliably at startup, get restarted if it fails, and be able to output logging information to an easily-managed place. In this case, my software provides a D-Bus interface to a piece of industrial electronics.

In the past I’ve relied on copying and pasting scripts from other pieces of software, and managing log files has always been a bit of a mess. It’s hard to do these things right, so re-inventing the wheel is too risky, which means that the best strategy is to copy somebody else’s scripts. I have never counted the hours of my time which have been wasted by dealing with awkward corner cases and peculiar bugs due to recycled scripts behaving in ways I hadn’t anticipated.

What does it look like with systemd? There are some helpful tutorials out there, including this one from Alexander Patrakov, so it didn’t take me too long to put together a service file which looks like this:

[Unit]
Description=My D-Bus Gateway
[Service]
Type=dbus
BusName=com.martin-jones.gateway
ExecStart=/usr/bin/my_dbus_gateway
Restart=always
[Install]
WantedBy=multi-user.target

I’ve changed the names to protect the innocent, but the contents of the file are pretty self-explanatory. The [Unit] section just includes a description which is readable to a human being. The [Service] section describes the service itself. In this case it’s of type dbus, which means that systemd will wait for the bus name (com.martin-jones.gateway in this case) to appear on D-Bus before considering the service started. The Restart=always setting means that my software gets restarted if it exits. The [Install] section indicates that, once the service has been enabled with systemctl enable, it should run when the system comes up in multi-user mode (roughly the old runlevels 2 to 4; runlevel 5 corresponds to graphical.target).

Having created this file, I simply copied it into /etc/systemd/system/my_dbus_gateway.service and, lo and behold, my new service worked. It was immediately possible to manage the service using commands like

systemctl start my_dbus_gateway.service
systemctl stop my_dbus_gateway.service
systemctl status my_dbus_gateway.service

Great! That’s exactly what I wanted.

Now for logging. I’d heard that systemd would log the stdout and stderr outputs of services into its own journal, and forward that to syslog as required. It does, but there’s a subtlety. Output from stderr appears in /var/log/syslog immediately line-by-line, but output from stdout gets aggressively buffered. This means that it gives the appearance of not working at all unless you explicitly flush the stdout buffer in your code using something like

fflush(stdout);

That’s the only wrinkle I came across, though.
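One way to avoid scattering flush calls through the code is to route log output through a single helper which always flushes. This is only a sketch: log_line is a hypothetical name, and it assumes that one flush per line is cheap enough for the application.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * printf-style helper: write one line to stdout and flush it straight away,
 * so it is not held in the stdio buffer when stdout is a pipe or socket
 * rather than a terminal.
 * Returns 0 on success, -1 if writing or flushing failed.
 * (Hypothetical helper, not part of systemd or any library.)
 */
int log_line(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(stdout, fmt, ap);
    va_end(ap);

    if (n < 0 || fputc('\n', stdout) == EOF)
        return -1;
    return fflush(stdout) == 0 ? 0 : -1;
}
```

An alternative which should have the same effect is to call setvbuf(stdout, NULL, _IOLBF, 0) at the very start of main(), before anything is written, which asks the C library to line-buffer stdout.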

In summary, using systemd’s facilities has made my life as an embedded Linux developer much, much easier and hopefully more reliable. That’s a good thing. My top tips for getting your software working under systemd are these:

  • Create your .service file using the recipe above and the documentation.
  • Don’t forget to flush stdout if you want to see it in syslog.

Lenovo Thinkpad T61 GPU fix. Or not.

My mobile workhorse is a trusty Lenovo T61, a close cousin of the ones they use on the International Space Station. How cool is that? It’s built like a tank, and weighs about as much, but the feature I appreciate most is its screen: a lovely 1680 x 1050 resolution, which is actually enough pixels to get some work done. Most laptops have rubbish screens which were clearly only designed for watching DVDs and reading Facebook. It’s a pet hate of mine.

IMG_1276

Sadly the Thinkpad has blotted its copybook. The Nvidia graphics chip has a reputation for expiring earlier than it should do, and this one has. It started with the screen going blank a couple of months ago. I replaced the screen, and then it worked but only with every other column of pixels. Then it spontaneously started working almost-properly, but with a peculiar green shimmering effect on bright colours which was just about possible to avoid by fiddling with the display settings. Then it died altogether, and just gave the ominous beep-bip-bip code at startup which means ‘my graphics hardware isn’t working’.

In a last ditch attempt to revive it, I thought I’d try reflowing the solder on the graphics chip. It helps on some laptops. In theory you need very specialised equipment for this, but I’ve had success in my professional life doing it with a more, shall we say, agricultural approach. With nothing to lose, I had a go. Here’s what I did.

Reflowing the solder on the graphics chip involves removing the motherboard, which requires completely disassembling the laptop.

Remove all the bits that come out: the battery, DVD drive, hard drive, and any PC card and SD card.

IMG_1237

Remove the marked screws on the bottom to remove the keyboard, touch pad and palm rest.

IMG_1235

Unplug the three wires to the Wifi module. Plug 1 is grey, 2 is black, 3 is white.

IMG_1238

Remove the keyboard surround/speaker grilles. Two short screws on the top, one long one from the bottom rear right corner, one long one from the outer rear bottom left corner, four little flat ones from the metalwork near the CPU.

Unplug the screen connector, the grey cloth one near the fan.

IMG_1242

Remove the two tiny screws holding the left speaker and move it to one side.

IMG_1243

Remove two medium-length screws from the bottom rear edge and two short screws from the screen brackets on the top side. Take the screen off.

IMG_1245

Remove one tiny silver screw from the front right hand side, 9 short screws from the bottom, three more long ones from the bottom, and four short ones with big flat heads near the docking connector.

From the top, remove the four medium-length screws holding the heatsink down.

IMG_1249

Remove the two short silver screws holding the two silver brackets on the right side of the heat sink.

IMG_1251

Unplug the fan connector and ease the heatsink/fan assembly free.

Img_1252

Unplug the multiway connector in the rear centre which feeds the USB sub-board. Tease the wire free from under its sellotape.

IMG_1253

Remove the two medium screws holding the wifi module in and remove the wifi module.

IMG_1255.JPG

The motherboard and frame should now be free of the bottom case.

IMG_1256.JPG

Turn the motherboard over. Remove the 8 remaining screws with big flat heads. Remove the medium screw by the DVD connector. Remove the small silver screw by the SD socket. Remove the short screw holding the heatsink brace. Remove the heatsink brace.

Returning to the top of the motherboard, unplug the charger connector, the small connector by the phone socket, the speaker connector by the RAM and the little black connector at the back where the wifi module was. Leave the backup battery connected.

The magnesium frame should now be free of the motherboard. Take it off. Clean the heatsink compound from the tops of the large chips. U47, with the Nvidia logo on it, is the GPU.

To try and reflow it, I made a makeshift heatshield out of a doubled-up piece of kitchen foil. I’ve done some emergency BGA repairs in the past so I had some idea of what I was aiming at. My strategy is to heat the whole chip area while carefully poking some nearby easy-to-repair part to see when the solder has melted. I expected it to take a couple of minutes.

Img_1273.jpg

I did exactly that, but ran into a nasty problem: there is self-adhesive tape on the top of the GPU and nearby on the PCB, which shrinks in the heat. Unfortunately it takes components with it, so they end up on the tape instead of the PCB. It turns out that there are a couple of dozen parts – decoupling capacitors, I think – on the top of the GPU. Most of them had come off and stuck to the shrunken tape.

IMG_1274

In addition, the backup battery got itself in the way and overheated and burst during the reflow. There were also solder balls visible round the end of the GPU. I thought I’d pretty much wrecked it. But, having come this far, I decided to try and repair the damage and put it back together.

Reassembly is the reverse of disassembly, as they say in the Haynes manuals. Don’t forget to apply new heatsink compound to the tops of the chips in contact with the heatsink. I didn’t hold out much hope.

I got it all back together with no screws left over. I put the battery in, pressed the power switch, and apart from a brief flicker of the power light, nothing. Not a sausage, or even a beep. Game over. Time to go shopping for a new motherboard.

As luck would have it, the Thinkpad T61 was built in various versions. Only the really upmarket ones have the Nvidia graphics chip with the bad reputation. The cheaper versions use the graphics provided by the Intel motherboard chipset. I care more about reliability than I do about ultimate 3D graphics performance, so I decided to do a motherboard swap once I’d discovered that replacement motherboards were available from the US at about $30. It’s useful to know the part numbers. My original motherboard, an early one with Nvidia graphics, was a 42W7652. The ones with the Intel graphics are 42W7651 (early version, supports Intel Merom processors) and 42W7875 (later version, supports both Merom and Penryn processors). On the left the old motherboard, on the right the replacement one.

Img_1377

Swap the CPU and RAM to the new motherboard, and make sure you put the little foam block on top of the transistors by the VGA socket – it conducts heat to the heatsink. Also swap the PC Card cage, which is held by two screws underneath the motherboard. It pulls out vertically from the motherboard.

Getting the machine to boot after I’d swapped the motherboard took a bit of fiddling. It turns out that, because the backup battery had been removed, all the BIOS settings were lost and the SATA interface mode had changed from ‘Emulation’ to ‘AHCI’. The symptom was that the machine would start booting but Windows 7 would just bluescreen immediately. Changing that BIOS setting fixed the problem.

I didn’t have to re-validate Windows, but a couple of pieces of software got unhappy. The Visagesoft Expert PDF tools required me to re-enter the registration information I’d already paid for, which was no trouble, and the Xilinx and Lattice FPGA tools needed new host-locked licence keys generating, which was free.

The Thinkpad lives to fight another day, and to let me write this blog post.

Refurbishing a JVC 5 1/4″ floppy disc drive

A couple of months ago, I was given this splendid setup in return for some data recovery work. The original disc drives had stopped working, and my customer wanted the data from the discs. I used my office BBC B+ to retrieve the data successfully from the discs, which were in ADFS format. This equipment – a BBC Master 128, Acorn AKF12 monitor and a pair of 40/80 track disc drives – was in such good condition that I decided to try and get it all working.

Img_0917

The monitor worked fine and the computer booted up quite happily, but the disc drives were definitely in trouble. The right hand one would just about read the first couple of tracks of a disc, but the left hand one wouldn’t read anything at all. I pulled the lid off to see what kind of drives were inside. The disc unit has a ‘Cumana’ label on the front, but they were a dealer and put together boxes with all sorts of drives inside, though usually of good quality.

Img_0918

There’s nothing complicated in there. A simple linear power supply at the back, the two drives, and the switches at the front with some distinctly dodgy-looking wiring. Was zip-tying the 40/80 track switch wiring directly to the 240V mains wiring really such a good idea? And I think I might have splashed out on insulated crimps for the mains connections on the back of the switch.

The drives themselves are ones I’d never seen before. They’re made in Japan by JVC, model number MDP-100. I associate JVC with 1980s consumer electronics rather than computer peripherals. They generally had a decent reputation amongst techno-savvy schoolboys. Their video recorders, especially, rebadged in the UK as Ferguson, were almost indestructible. The floppy drives should be worth saving.

Looking more closely at the drives revealed some interesting details. Underneath is a little brown add-on board with a bunch of TTL logic which does the 40/80 track switching. It looks like a factory modification, but the wiring to the main board (not visible in this photo) is a bit Heath Robinson.

Img_0920

The main board features a couple of 7438 TTL chips made by Tesla in Czechoslovakia. They were definitely a rarity on the western side of the Iron Curtain. Their date code (courtesy of tubes-store.com) is November 1983, which is the right sort of time. Why on earth would a Japanese disc drive contain Czechoslovak chips? It’s not as if there weren’t any Japanese ones. All the other TTL chips in the drive are made by Hitachi, and they’re 74LS series, which was current at the time. Almost nobody was still using straight 74 series in 1983. My best guess is that there was some subtlety of the Shugart-standard floppy disc interface that needed standard TTL, and nobody else was still making it. Or they were just cheap.

Img_0921

The first problem I spotted was corrosion. The board has lots of horrible little electrolytic capacitors on it. The tiny ones like these are notorious for being badly sealed and oozing gunk which causes electrical faults and corrodes tracks and other components. Playing it safe, I whipped the lot off and replaced them with newer ones which were a bit bigger, and cleaned up the damage. No tracks seemed to have suffered, fortunately.

Img_0922

Next I inspected the heads. Squinting at them in position, I thought I could see some dirt, so I unscrewed the upper head to get a good look at both of them.

Img_1210

Yuk! I’ve never seen a floppy drive head so dirty. No wonder it wouldn’t read discs. The photo is a view of the upper head. The lower one was just as bad. A thorough scrub with isopropyl alcohol, being careful not to damage the rather delicate suspension, and they came up like new.

Img_1213

I put the upper head back on in roughly the right place and tried the drive again. It would now read discs, at least on the lower head (drive 0) but only intermittently, and wouldn’t work at all beyond track 2. Attempting to get to track 3 or beyond just resulted in a lot of rattling and a disc error.

I suspected a head stepper motor problem. There were two BA6208 chips visible on the PCB between the main JVC controller chip and the connector to the stepper motor, CN04.

Img_1307

I suspected they, or one of them, might be the culprit. This type of chip and many similar ones were used for driving motors in lots of consumer electronics, and they did give trouble. I remember replacing them in VCRs and the like. The data sheet is easy enough to find on the internet. A quick check with the scope and sure enough, the right hand one wasn’t working properly. Each chip contains two drivers, and one of them wasn’t driving at all.

I found a replacement chip for about a pound on allegro.pl. Bargain! I fitted it and now had full head-stepping action. However, reading data still wasn’t reliable. Verifying a known-good disc would fail quite frequently, often in the same place. I had a look at the signal waveform coming from the disc. Fortunately the head amplifier chip, a Hitachi HA16631P, is also documented online. The read data signal is visible at pin 12. What I saw was this.

Img_1315

It was difficult to capture on the scope, but basically the waveform is of uneven size: as the disc rotates, it grows and shrinks, sometimes so much that errors result. I tried fiddling with the head alignment by adjusting the head stepper motor, but it didn’t help. Eventually I figured out that during my head-cleaning efforts, I’d managed to bend the thin steel tab which holds the upper head. Not much, but enough to prevent it exerting the right amount of pressure on the disc. This pressure also keeps the disc in contact with the lower head, which is fixed. I straightened it and tried again.

Img_1312

Much better! A nice fat waveform, with no odd variations. You can see the head mounting in the photo below. The two screws marked with a cyan dot mount the head assembly via its steel strip, and loosening them allows it to be aligned with the lower head. The screw between them marked in pink just attaches the copper-coloured bracket which holds a tiny coil spring against the back of the head assembly which adds a little more pressure, though it didn’t seem to make much difference in practice. Perhaps if the disc was wrinkly or something it might help.

Img_1308a

After all that I had a disc drive in perfect working order.

At some point I’ll do the second one and then the whole system should be OK.

Img_1316

Cracking the code: reverse engineering the AlcaTech BPM Studio controller

The recent demise of my old workshop PC has spurred me into action. Back in the halcyon days of the late 90s and early 2000s, I used to DJ a bit. I was an early adopter of computer-based DJ technology. In those days it was still fairly unusual to have music in MP3 format – the iPod wasn’t released until late 2001 – never mind being able to actually DJ with it. At the time I found a product from a German company which did exactly what I wanted: it was a combination of hardware and software called BPM Studio which meant I could use MP3 files as if they were a professional CD player: cueing them, pitch-shifting, mixing and so on. The hardware is a solidly-built control panel which connects to the PC, on which runs some software which does all the audio processing.

Img_1278

So why has the failure of my old workshop machine reminded me of this? Because, once upon a time, it was my media PC, and it ran the DJ software. The software and hardware are now 15 years old, and it shows: the controller connects using a serial port (when did you last see a PC with one of those?) and the software has…wait for it…a dongle! Yes, just like in the bad old days, it has a device which plugs into the parallel port on the PC, and if it’s not found, the software won’t run. Parallel ports, especially, are a dying breed today, so the chances of being able to use this controller and software in the future fade as PC technology moves on.

The BPM Studio package was expensive, and the controller is quite nice and robustly built, so I’d like to be able to preserve it and, if possible, use it with more modern software. The trouble is, its interface to the PC is proprietary, unsupported by any other software, and I couldn’t find any documentation on it. There was only one way forward to protect my investment: hack it.

The first step was to have a look at what was going on on the serial connection, and what baud rate was in use. I put a little breakout adapter in the serial cable (this one, in fact, modified a bit) so I could examine the data.

IMG_1283

First thing was to figure out the baud rate. Set the scope to trigger on the rising edge and play with the timebase a bit and soon we can see the start bit of each byte:

Img_1282
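Reading a baud rate off a scope trace is just arithmetic: the rate is the reciprocal of the width of one bit, so a start bit about 52 µs wide, for example, works out at roughly 19200 bits per second. A minimal sketch of that calculation in C follows; the function name and the table of common rates are my own choices for illustration.

```c
#include <stddef.h>

/*
 * Given the width of one bit in seconds, as measured on the scope,
 * return the closest entry from a table of common serial baud rates.
 * (Hypothetical helper; the table is an assumption, not exhaustive.)
 */
long nearest_standard_baud(double bit_time_s)
{
    static const long rates[] = {
        1200, 2400, 4800, 9600, 19200, 38400, 57600, 115200
    };
    double measured = 1.0 / bit_time_s;   /* bits per second */
    long best = rates[0];
    double best_err = measured - (double)best;
    if (best_err < 0)
        best_err = -best_err;

    for (size_t i = 1; i < sizeof rates / sizeof rates[0]; i++) {
        double err = measured - (double)rates[i];
        if (err < 0)
            err = -err;
        if (err < best_err) {
            best_err = err;
            best = rates[i];
        }
    }
    return best;
}
```

Measuring across several bits and dividing gives a better estimate than a single start bit, since the cursor error is spread over a longer interval.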

That’s definitely 19200 baud. Nice and standard. So I started my handy hdump2 software, which displays two streams of data side by side so it’s possible to see what came from where and when, and hoped to see something which made sense: a recognisable packet format, perhaps, or at least consistent data. What I got instead was this:

bpmstudio-startup

The left column is data from the PC, and the right column is data from the controller. It’s clear there’s a conversation going on, but it looks encrypted to me. There are no obvious packets, no start or end markers, nothing clearly related to what’s going on. I played around pressing buttons on both the controller and PC, and lots of data flowed but nothing made any sense. No readable track names for the displays, no recognisably similar data when I pressed the same button numerous times.

Why on earth would anyone encrypt the connection between the PC and a controller like this? Only the designers know, but I guess it’s part of the same mindset that required a hardware dongle to run the software. A fear of piracy, probably.

Interestingly, if the PC and controller are separated, they each send out a burst of data once a second. The PC sends bursts of 4 bytes, and the controller sends bursts of 12 bytes. Each of them follows a fixed pattern from startup. The PC sends:

9c 94 dc 0e
56 1e 97 95
ad f8 87 4a
dc bc f0 37
32 44 bd a1

and so on. The controller sends:

f0 99 d0 af 3b 2f c8 5b 21 3c 4f d4
44 95 ac e1 d9 76 2a 58 bf 1e 52 52
34 e7 1a 93 ce b1 97 3e a4 f9 01 37
d3 f3 94 c1 32 57 31 a7 9a 6c 83 68
84 ae d1 f6 e7 c1 c8 5d e2 e4 46 36

and so on. I can’t see an obvious relationship between them. What I can see from the conversation dump above is that the controller seems to restart its sequence when it sees the data from the PC, but with some subtle differences.

If I was a proper mathematician, I’d spend more time trying to work out what the code was. Being an engineer, I thought I’d take it apart and have a look inside.

Img_1285

There’s more to it than I thought. This predates the days of powerful PIC and AVR microcontrollers, and actually has separate chips for its CPU, ROM and RAM. That’s good news for anyone interested in reverse engineering it. The CPU is a Siemens/Infineon 80C166:

Img_1287

and there’s a 29F010 (128 KByte) ROM connected to it, presumably holding the software:

Img_1288

There’s also a 32K RAM chip, which is more than I’d expect. I just hope the software isn’t doing something horrible like decrypting itself into RAM and running from there.

The good news is that documentation for the 80C166 is freely available, as is a disassembler, ADIS16X. And I’ve just ordered a PLCC32 adapter so that the ROM, once I’ve desoldered it, will fit my EPROM programmer. Watch this space.

Modifying libmodbus for asynchronous operation

I’m working on a project at the moment which has to connect to some industrial control equipment. The communications protocol in use is Modbus, or to be more precise its Modbus TCP variant. Working with this protocol is made much easier by the convenient libmodbus, a free and open-source software library which handles the communications and data formatting. The library is included with Debian Linux, the platform on which I’m writing the software.

modbus_logo

Convenient as it is, libmodbus is written with the assumption that communications are synchronous: that it’s OK to request some data and wait for the response. For example, fetching some data from a Modbus device looks like this (in abbreviated C):

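#include <modbus.h>

uint16_t registers[5];
modbus_t *mb = modbus_new_tcp("192.168.1.20", 1502);
modbus_connect(mb);                          /* may block */
modbus_read_registers(mb, 0, 5, registers);  /* reads registers 0 to 4; may block */
modbus_close(mb);
modbus_free(mb);                             /* release the context */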

The code above fetches the contents of five registers, numbers 0 to 4, from the Modbus device at IP address 192.168.1.20, port 1502. It’s delightfully simple. My problem is that each of the network operations (modbus_connect(), modbus_read_registers() and modbus_close()) could take some time if the network is congested or unreliable, or if the device is busy doing something else.

My software needs to handle various types of communication from different sources on the network, so hanging around while any of them completes isn’t acceptable. It’s OK to wait for data – that’s just life – but being unresponsive to other things while that data is arriving just won’t do.

Another project I worked on last year used D-Bus, which faces exactly the same problem. It’s intended for relatively complex software systems where many things could be going on at the same time. The authors of D-Bus have thought of this, and made it easy to use asynchronously. Rather than asking for some data and simply being unable to do anything else until it arrives, asynchronous operation allows the program to request some data, get on with something else, and be informed when the data is ready. The same applies to other operations which may take some time.

At the core of asynchronous operation is the run loop. Rather than the program being a step-by-step series of synchronous operations like the example above, it has a loop which sits waiting for any new activity, and then triggers any actions which need to deal with that activity. For example, in pseudo-C again:

initialise_everything();
while(1)
  run_loop();

run_loop() {
  if(nothing_happening())
    sleep_for_a_moment();
  switch(what_happened()) {
    case network_connection_succeeded:
      start_sending_data();
      break;
    case data_sent:
      start_receiving_data();
      break;
    case data_received:
      notify_application();
      break;
    case error:
      /* handle error */
      break;
  }
}

This structure means that lots of operations can be outstanding, and whichever needs attention first can get that attention without waiting for any of the others. It’s more complex but much more powerful.

My application is interested in network data from various places, and Linux (as well as many other operating systems) provides some handy operating system services that make asynchronous operation straightforward, with a little thought. The most important is select().

The select() system call allows a program to wait until something happens to any of a list of file descriptors, each of which can represent a hardware device, a network connection, or various other things. It also allows a timeout, so if nothing happens for a moment, the program can do other things, then call select() again without missing anything.
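As a minimal sketch of that idea, the function below waits for a single file descriptor to become readable, giving up after a timeout. The name wait_readable() is my own invention for illustration; select(), fd_set and struct timeval are the standard POSIX interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if it is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The first argument is the highest descriptor number plus one. */
    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

A real run loop would pass select() a whole set of descriptors rather than just one, but the pattern is the same: build the sets, call select(), then see which descriptors it flagged.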

The D-Bus library has two important interfaces which make it possible to base the application’s run loop around select():

  • D-Bus will tell the application each time it is interested in a new file descriptor, or it is no longer interested in a file descriptor. This is known as adding or removing a watch.
  • D-Bus provides a function which can be called whenever one of the file descriptors is indicated by select().

That’s basically it. It means that D-Bus can get on with whatever networking complexity it likes without occupying my application any more than it has to. Sauce for the goose is sauce for the gander, so this model should fit my Modbus application too.
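On the application side, the bookkeeping for watches can be very small. The sketch below is one possible way to do it, not the D-Bus API itself: the watch_list structure, the WATCH_READ and WATCH_WRITE flags and all the function names are hypothetical. It keeps the watched descriptors in a pair of fd_sets and hands copies of them to select().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define WATCH_READ  1   /* hypothetical flag values */
#define WATCH_WRITE 2

struct watch_list {
    fd_set read_set;
    fd_set write_set;
    int max_fd;         /* highest watched descriptor, or -1 if none */
};

void watch_list_init(struct watch_list *wl)
{
    assert(wl != NULL);
    FD_ZERO(&wl->read_set);
    FD_ZERO(&wl->write_set);
    wl->max_fd = -1;
}

/* Start watching fd for the given events. Returns 0, or -1 if fd is unusable. */
int watch_add(struct watch_list *wl, int fd, int flags)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    if (flags & WATCH_READ)
        FD_SET(fd, &wl->read_set);
    if (flags & WATCH_WRITE)
        FD_SET(fd, &wl->write_set);
    if (fd > wl->max_fd)
        wl->max_fd = fd;
    return 0;
}

/* Stop watching fd for the given events. Returns 0, or -1 if fd is unusable. */
int watch_remove(struct watch_list *wl, int fd, int flags)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    if (flags & WATCH_READ)
        FD_CLR(fd, &wl->read_set);
    if (flags & WATCH_WRITE)
        FD_CLR(fd, &wl->write_set);
    /* Shrink max_fd past any descriptors that are no longer watched. */
    while (wl->max_fd >= 0 &&
           !FD_ISSET(wl->max_fd, &wl->read_set) &&
           !FD_ISSET(wl->max_fd, &wl->write_set))
        wl->max_fd--;
    return 0;
}

/* One pass of the run loop. Returns select()'s result; the descriptors
 * that need attention are left set in *readable and *writable. */
int watch_poll(const struct watch_list *wl, fd_set *readable,
               fd_set *writable, int timeout_ms)
{
    struct timeval tv;

    assert(wl != NULL && readable != NULL && writable != NULL);
    *readable = wl->read_set;   /* select() overwrites its arguments, */
    *writable = wl->write_set;  /* so give it copies                  */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(wl->max_fd + 1, readable, writable, NULL, &tv);
}
```

The library’s add and remove watch callbacks would call watch_add() and watch_remove(), and the run loop would call watch_poll() and then pass each flagged descriptor back to the library.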

Since libmodbus is open source, I was able to modify it to support this method of operation. Most of the required code was already in there, but I had to create new functions which called it in particular ways, and add new data to the modbus_t structure to keep track of what operations were outstanding. The new asynchronous way of working looks like this, in rather abbreviated form:

modbus_t *mb = modbus_new_tcp("192.168.1.20",1502);

modbus_set_connected_cb(mb, &connect_callback);
modbus_set_read_cb(mb, &read_callback);
modbus_set_add_watch_cb(mb, &add_watch_callback);
modbus_set_remove_watch_cb(mb, &remove_watch_callback);

modbus_connect_async(mb);
while(1)
  run_loop();

void connect_callback(int failure) {
  if(!failure)
    modbus_read_registers_async(mb, 0, 5, registers);
}

void read_callback(int failure) {
  if(!failure) {
    /* we got the data we asked for */
  }
}

void add_watch_callback(int fd, int flags) {
  /* add fd to our list of file descriptors */
}

void remove_watch_callback(int fd, int flags) {
  /* remove fd from our list of file descriptors */
}

void run_loop() {
  if(select(list_of_file_descriptors))
    modbus_selected(fd, flags);
}

I’ve left lots of detail out here, but the sequence of operations looks like this:

  • The application informs libmodbus about the various functions which should be called when things happen: a connection succeeds, data is received, a new watch is to be added, a watch is to be removed.
  • The application asks libmodbus to start a connection, but asynchronously, using modbus_connect_async().
  • The application then just sits in the run loop.
  • libmodbus calls the application back through add_watch_callback(), adding a watch on the socket it will use for the connection. It then asks the operating system to make the connection.
  • When the connection completes, select() will return its file descriptor, and the run loop will call libmodbus via modbus_selected().
  • libmodbus now checks that the connection was successful, and calls the application through connect_callback().
  • The application can now request data using modbus_read_registers_async().
  • While requesting the data, libmodbus will almost certainly use add_watch_callback() to inform the run loop that it should keep an eye out for the data.
  • When data arrives, which may happen in several small chunks, the run loop will call modbus_selected().
  • libmodbus can assemble and check the received data. When it has arrived successfully, or failed terminally, it will call the application back through read_callback().
  • The application can now work with the received data.
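The connection steps in the sequence above rest on a standard operating system technique: put the socket into non-blocking mode, call connect(), wait for select() to report the socket as writable, then read SO_ERROR to find out whether the connection worked. The sketch below shows that technique on its own. It is not the modified libmodbus code, and start_connect() and finish_connect() are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Begin a TCP connection without blocking. Returns a socket to watch
 * for writability, or -1 if the attempt failed immediately. */
int start_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd, fl;

    assert(ip != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Switch the socket to non-blocking mode. */
    fl = fcntl(fd, F_GETFL, 0);
    if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    /* EINPROGRESS means "started, not finished yet": that is the normal case. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Call once select() has reported fd as writable. Returns 0 if the
 * connection succeeded, an errno value if it failed, or -1 if the
 * status could not be read. */
int finish_connect(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```

Between the two calls the program is free to do whatever else it likes, which is the whole point of the exercise.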

While all this is going on, the application can be doing other things: handling other network connections, processing data, or even handling other Modbus connections.
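One detail from the sequence above deserves a closer look: a reply may arrive in several small chunks, so something has to decide when a whole frame is present. In Modbus TCP the header carries a length field, which makes this easy. The following is a minimal sketch of the idea rather than the code in libmodbus; the frame_buf structure and frame_feed() are hypothetical names, and it assumes only one request is outstanding on the connection at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LENGTH_FIELD_END 6    /* transaction id (2), protocol id (2), length (2) */
#define MAX_ADU_LEN      260  /* largest Modbus TCP frame */

struct frame_buf {
    uint8_t data[MAX_ADU_LEN];
    size_t used;              /* bytes buffered so far; start at 0 */
};

/* Append a newly received chunk to the buffer.
 * Returns 1 when a complete frame is buffered, 0 if more data is
 * needed, or -1 if the data cannot be a valid frame. */
int frame_feed(struct frame_buf *fb, const uint8_t *chunk, size_t len)
{
    size_t following;

    assert(fb != NULL && chunk != NULL);
    if (len > sizeof fb->data - fb->used)
        return -1;                        /* would overflow the buffer */
    memcpy(fb->data + fb->used, chunk, len);
    fb->used += len;

    if (fb->used < LENGTH_FIELD_END)
        return 0;                         /* length field not complete yet */

    /* The big-endian length field counts the bytes that follow it:
     * the unit identifier plus the function code and its data. */
    following = ((size_t)fb->data[4] << 8) | fb->data[5];
    if (following < 2 || following > MAX_ADU_LEN - LENGTH_FIELD_END)
        return -1;

    return fb->used >= LENGTH_FIELD_END + following ? 1 : 0;
}
```

Each time select() flags the socket, the receiving code would read whatever is available, pass it to frame_feed(), and only call the application back once it returns 1 or -1.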

The modifications, after some development and debugging time, work very nicely. After more testing, they’ll almost certainly make it into the final application, and I’d like to contribute them back to the open source community so that other developers can use Modbus asynchronously too.

Recovering Windows XP from a corrupted registry

In the workshop at home I have a terribly old PC. It’s got an AMD Athlon XP CPU at a blistering 1.48GHz, and runs Windows XP. It’s mostly just got on with the jobs I ask of it (although its power supply has featured in the blog before) but in the last week it’s failed twice. I suspect there’s a hardware problem, given that the machine is approaching its 15th birthday.

The symptoms have been that the machine would be working normally and then the screen would go black, with no response from the keyboard or mouse and no disk activity. Pressing the PC’s reset button brought it to life again, but with the dreaded message, “Windows could not start because the following file is missing or corrupt: \WINDOWS\SYSTEM32\CONFIG\SYSTEM”.

A quick web search revealed the official repair procedure from Microsoft: How to recover from a corrupted registry that prevents Windows XP from starting. The procedure looks like this:

  • Boot the machine using the XP install CD and enter the recovery console
  • Copy the five registry hives (system, software, sam, security and default) from \WINDOWS\SYSTEM32\CONFIG into a backup location
  • Copy default registry hives from \WINDOWS\REPAIR in their place
  • Boot into Windows and use the desktop to copy backup registry hives created by System Restore into a temporary location
  • Reboot into the recovery console again
  • Copy the backup registry hives into \WINDOWS\SYSTEM32\CONFIG
  • Reboot into Windows again (yawn)
  • Use the System Restore utility to restore the system to the most recent restore point.

I did this the first time, and it turned out to be an awful lot of fiddling around just to restore a backup of five files. One problem is that installing the default registry hives and booting Windows makes a mess of the user profiles, which is why the later system restore is required. As far as I can see, Microsoft recommend this procedure simply to avoid people having to dig around in the filesystem at the recovery console.

The next time the same fault occurred, I decided to try a short cut. This is what I did.

  • Boot the machine using the XP install CD and enter the recovery console.
  • Copy the corrupt registry hive (system) from \WINDOWS\SYSTEM32\CONFIG into a backup location, just in case
  • Still in the recovery console, find the most recent backup created by System Restore
  • Copy the system registry hive from there into \WINDOWS\SYSTEM32\CONFIG
  • Reboot into Windows and start working again.

The tricky part about this is that the system restore folder names are really long and unpleasant to type, and the recovery console doesn’t have command completion. However, you only have to do it once.

First, find the most recent _restore folder in \System Volume Information:

Img_1230

Then, the most recent RPxxx folder inside there:

Img_1231

That folder will contain a ‘snapshot’ folder, inside which are the registry backups. Copy the relevant one into \WINDOWS\SYSTEM32\CONFIG. Note that the filenames are different:

Img_1234

Type ‘exit’ to reboot, and that’s it. Job done. It worked for me: a ginormous download that Firefox had been working on continued exactly where it left off, and I’m typing this on the very same machine.

It would probably be possible to do exactly the same process using a bootable Linux CD, too, as long as it was capable of reading and writing NTFS filesystems in a trustworthy way.

Incidentally, all this is only possible because XP automagically saves backups of important things using System Restore. Say what you like about Microsoft, but that’s a really useful feature.

Debian Mini-debconf, Cambridge 2015

I spent last weekend at the Debian Mini-debconf in Cambridge, UK. What on earth is a mini-debconf? It’s a smaller version of a Debconf, which in turn is a conference concerning the Debian operating system.

The Debian operating system is free, and is usually based around the Linux kernel. It can run on (almost) anything, from a Raspberry Pi to an IBM mainframe, and I frequently work on products which contain it. Though I’ve worked with the Debian software a lot over the years, I had never really taken part in the community. Attending this event was my first chance to do that.

It was a great experience. Thanks to sponsorship from ARM, Codethink, Collabora, Cosworth, and Hewlett Packard Enterprise, it was all free, and even lunch was provided. There was an astonishing variety of talks from all sorts of people, covering topics from sleep apnoea to the problems of dealing with the vast quantities of data generated by the Large Hadron Collider.

The talk from Betty Dall of HP Enterprise about The Machine was fascinating. HPE are working on a new computer architecture which does away with the traditional divisions between main memory, hard drives, and other types of storage, and replaces them with a vast (like, really enormous, thousands of petabytes) array of memristor memory. She explained some of the challenges of designing and programming such a machine. I was not a little surprised to hear that it will run Debian Linux, just like the little electric car charging points and communication aids I’ve worked on!

Also entertaining was Vincent Sanders’s account of the trials and tribulations of maintaining a web browser, NetSurf. The reality of dealing with the world wide web is hideous: so many web pages do terrible, terrible things, standards are rarely properly documented or specified, and best of all, web browsers are never allowed to give up and say ‘this page makes no sense’!

The whole team did a great job, ably led by Steve McIntyre, and the event was streamed live by the Debian Video Team, who I even joined as a temporary director and vision mixer for a couple of hours. The talks will eventually be online in video form at the Debian Meeting Archive.

At the end of the event on Sunday evening, Steve announced that the ARM atrium area would need to look like a canteen by the time we left. With no fuss, no persuasion, people just got together and shifted all the tables and chairs. The work was done in no time. I mused as I left that perhaps that’s why free software works.