Tektronix 549 Storage Oscilloscope, Restored


A couple of years ago, I restored this exquisite brute of an oscilloscope to working order. You can read the story starting with part 1. I never posted any pictures of the finished article, so here they are.

You can read more about the machinery at the TekWiki.

549_34a

549_front_off

549_34c

549_34b

549_rear

1a4

manuals

Switched on and working, with all four traces showing.

549_front_light

549_front_dark

I made a short movie of the storage functions working.

Cracking a password-protected PDF file


Suppose you asked an insurance company for a letter. The insurance company kindly sent it as a PDF attached to an email. Sensibly, they protected that PDF with a password which they told you over the phone. You wrote it in a notebook and then left the notebook at work over the weekend.

pdf-password

How could you read the letter in the password-protected file at home, then? Remembering that the password was definitely an English word, and all in lower case, a dictionary attack has got to be worth a try.

Linux provides some handy tools for this. There’s a list of English words in /usr/share/dict/words, and a suite of PDF tools which can attempt to open the file using a password, indicating success or failure. A few minutes with Python and:

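#!/usr/bin/env python3
# Dictionary attack: try each word in the system word list as the PDF password.
import subprocess
import sys

with open('/usr/share/dict/words', 'r') as wf:
    for line in wf:
        word = line.strip().lower()
        if not word:
            continue  # skip blank lines rather than stopping early
        print(word)
        # Pass the arguments as a list so odd characters in a word or in the
        # file name can't confuse a shell. pdftotext exits with status 0 once
        # the password opens the file.
        result = subprocess.run(['pdftotext', '-upw', word, sys.argv[1]])
        if result.returncode == 0:
            break
    else:
        print("No solution found")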

The same thing must be possible in a more hipsterly fashion using awk, but I couldn’t be bothered to figure out a sufficiently baroque command line.

By the way, the password was ‘orange’. Don’t tell anybody.

Reliable I2C with a Raspberry Pi and Arduino


There are many ways of connecting sensors and devices to a Raspberry Pi. One of the most popular is the I²C bus. It’s great for devices which don’t need to transfer too much data, like simple sensors and motor controllers, and it’s handy because lots of devices (up to 127, or even more) can be connected to the same pair of wires, which makes life really simple for the experimenter. I’ve mentioned using the I²C bus in another blog post, because sometimes a bit of software fiddling is needed to get it to work.

Recently I’ve been working on a project involving various devices connected to a Raspberry Pi. Some of them use I²C. The project is based around a breakout board I designed for the Multidisciplinary Design Project at Cambridge University Department of Engineering, in which students collaborate in teams to put together a robot. The breakout board is shown next to the Raspberry Pi in the photo below.

It fits on top of the Pi, and has lots of useful features including a student-proof power supply, real time clock, accelerometer, space for a Zigbee module, analogue inputs, diagnostic LEDs and four motor driving outputs, all wired to convenient connectors.

The analogue inputs and motor outputs are implemented by a PIC microcontroller connected to the I²C bus. The software for the PIC was written by an undergraduate several years ago. It works well, but seems to have some odd habits. I found that it would apparently work, but sometimes an attempt to read data from the PIC would just fail, or return wrong data, and sometimes data would get written to the wrong register. At first I suspected a wiring problem, but examining the SDA and SCL signals with a scope showed nothing wrong. I tested another device on the same bus – a Philips PCF8575 I/O expander – and it worked perfectly every time. That narrowed the problem down to the PIC. Since there was nothing I could do about the PIC’s software, I had to find a workaround.

I spent some time experimenting with where the communications seemed to go wrong. Reading from an I²C device usually involves two separate operations on the bus. The first one tells the I²C device which register address we want to read, and the second does the actual read. The diagram below shows the sequence. The ‘control byte’ in each case sends the address of the I²C device (0x30 in this case) plus a bit indicating read or write.

smbus-transaction
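To make the ‘control byte’ concrete: on the wire it is the 7-bit device address shifted left by one place, with the read/write flag in the lowest bit. The sketch below assumes that 0x30 is the 7-bit form of the PIC’s address, which is my assumption rather than something checked against the hardware, and the function name is made up for illustration.

```python
I2C_WRITE = 0  # R/W bit value for a write
I2C_READ = 1   # R/W bit value for a read

def control_byte(address_7bit, rw):
    """Build the first byte of an I2C transfer: 7-bit address plus R/W bit."""
    if not 0 <= address_7bit <= 0x7F:
        raise ValueError("not a 7-bit address")
    return (address_7bit << 1) | rw

# Assuming 0x30 is the 7-bit address of the PIC:
write_ctl = control_byte(0x30, I2C_WRITE)  # first operation: set the register address
read_ctl = control_byte(0x30, I2C_READ)    # second operation: read the data back
print(hex(write_ctl), hex(read_ctl))
```

Some tools and data sheets quote the already-shifted 8-bit form instead, so it is worth checking which convention a given document uses.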

I found a pattern in the failures. From time to time, the write operation which sets the register address would fail, reporting ‘I/O error’. After that, reading the data would return the wrong value. I modified my code so that if the write operation failed, it would retry a couple of times before giving up. It turned out that retrying was always successful, if not on the first attempt then on the second. However, the data read would still return the wrong value. The value returned was always the address of the register I wanted! It seemed as if something was getting stuck somewhere in the I²C system. Whether it was in the Linux drivers, or the PIC software, I don’t know, and I didn’t spend long enough to find out. My assumption is that the PIC software is sometimes just too busy to respond to the I²C operations correctly.

I tried the retry strategy again, and it turned out that the second attempt to read the data byte always got the right value. The algorithm to read reliably looks like this, in pseudo-code:

  if (write_register_address() fails)
    retry up to 3 times;

  read_data();
  if (we had to retry writing register address)
    read_data();

In practice I was using the Linux I²C dev interface to implement this. Yes, it’s a bit of a nasty hacky workaround, but it did get the communications working reliably.
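For anyone who wants to try the same trick, here is a rough Python rendering of the pseudo-code above. It is only a sketch: the two callables are hypothetical stand-ins for whatever actually talks to the bus, and I’m assuming they raise OSError when the bus reports an I/O error.

```python
def read_register(write_register_address, read_data, max_retries=3):
    """Read one register, working around the failure pattern described above.

    write_register_address and read_data are caller-supplied callables
    (hypothetical names); both are assumed to raise OSError on a bus error.
    """
    retried = False
    for attempt in range(1 + max_retries):
        try:
            write_register_address()
            break
        except OSError:
            if attempt == max_retries:
                raise  # still failing after the last retry: give up
            retried = True

    value = read_data()
    if retried:
        # After a retried write, the first read came back with stale data
        # (the register address itself), so discard it and read again.
        value = read_data()
    return value
```

Passing the bus operations in as functions keeps the retry logic separate from the hardware access, which also makes it possible to exercise it with fake functions on a machine with no I²C bus at all.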

There was another device I wanted to talk to: an Arduino Mini running a very simple sketch to return some sensor data. This also used the I²C bus. There are handy tutorials about how to get an Arduino to behave as an I²C slave device, like this one. The I²C interface is implemented nicely by the Wire library. Implementing a slave involves responding to two events: onReceive and onRequest.

The onReceive event is called when data, like the register address, is written to the slave, and the onRequest event is called when the master wants to read data. My initial code looked like this:

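// In setup():
Wire.begin(I2C_ADDRESS);
Wire.onReceive(receiveEvent);
Wire.onRequest(requestEvent);

// Called when the master writes to us: the byte is the register number
void receiveEvent(int bytes) {
  registerNumber = Wire.read();
}
// Called when the master reads from us: return the selected register
void requestEvent() {
  Wire.write(registers[registerNumber]);
}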

This worked most of the time, but after a few thousand transactions, it would appear to ‘lock up’ and ignore any attempt to change registers – it would always return the same register, and in fact no more onReceive events were ever generated. Of course, it turned out to be my fault. When reading data in the onReceive event code, it turns out to be important to make sure that data is actually available, like this:

void receiveEvent(int bytes) {
  while(Wire.available())
    registerNumber = Wire.read();
}

That solved the problem. It’s annoying that reading non-existent data can lock up the whole I²C interface, so watch out for this one if you’re using an Arduino as an I²C slave.

Systemd for Embedded Linux


Over the last few years, there has been a lot of controversy in the Linux world about systemd. As I understand it, systemd is intended to be a better-engineered, more powerful version of the motley collection of little programs and scripts which keeps the essential services on a Linux system running.

systemctl

The controversy arises because the original 1970s Unix way of doing things was to rely on a motley collection of little programs and scripts for everything, each of which was simple but well understood, and to knit them together to form a complete operating system. Systemd takes a different approach, using larger and more sophisticated components which are more dedicated to particular tasks, such as managing services or network connections. This is supposed to make it more efficient and easier to manage in the twenty-first century.

I’ve been doing some work recently on an embedded Linux system which runs on the latest version of Debian Linux, version 8 (‘Jessie’). Debian Jessie fully supports systemd to the extent that it seems to be the default way of doing things. I thought I’d experiment with it a bit.

When working on an embedded Linux system, I very frequently want to have a piece of my software run reliably at startup, get restarted if it fails, and be able to output logging information to an easily-managed place. In this case, my software provides a D-Bus interface to a piece of industrial electronics.

In the past I’ve relied on copying and pasting scripts from other pieces of software, and managing log files has always been a bit of a mess. It’s hard to do these things right, so re-inventing the wheel is too risky, which means that the best strategy is to copy somebody else’s scripts. I have never counted the hours of my time which have been wasted by dealing with awkward corner cases and peculiar bugs due to recycled scripts behaving in ways I hadn’t anticipated.

What does it look like with systemd? There are some helpful tutorials out there, including this one from Alexander Patrakov, so it didn’t take me too long to put together a service file which looks like this:

[Unit]
Description=My D-Bus Gateway

[Service]
Type=dbus
BusName=com.martin-jones.gateway
ExecStart=/usr/bin/my_dbus_gateway
Restart=always

[Install]
WantedBy=multi-user.target

I’ve changed the names to protect the innocent, but the contents of the file are pretty self-explanatory. The [Unit] section just includes a description which is readable to a human being. The [Service] section describes the service itself. In this case it’s of type dbus, which means that systemd will check that the service name (com.martin-jones.gateway in this case) gets correctly published on to D-Bus. The Restart=always setting means that my software gets restarted if it exits. The [Install] section just indicates that this service should run when the system comes up in multi-user mode (roughly the old runlevels 2 to 4; the old runlevel 5 corresponds to graphical.target).

Having created this file, I simply copied it into /etc/systemd/system/my_dbus_gateway.service and, lo and behold, my new service worked. It was immediately possible to manage the service using commands like

systemctl start my_dbus_gateway.service
systemctl stop my_dbus_gateway.service
systemctl status my_dbus_gateway.service

Great! That’s exactly what I wanted.

Now for logging. I’d heard that systemd would log the stdout and stderr outputs of services into its own journal, and forward that to syslog as required. It does, but there’s a subtlety. Output from stderr appears in /var/log/syslog immediately line-by-line, but output from stdout gets aggressively buffered, because the C library fully buffers stdout when it isn’t connected to a terminal. This means that it gives the appearance of not working at all unless you explicitly flush the stdout buffer in your code using something like

fflush(stdout);

That’s the only wrinkle I came across, though.
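The effect isn’t peculiar to C or to systemd: any buffered stream behaves like this. Here’s a small Python illustration, using in-memory objects as stand-ins for the service’s stdout and the pipe to the journal. It’s a toy model under those assumptions, not what systemd actually does internally.

```python
import io

sink = io.BytesIO()                # stands in for the pipe to the journal
stream = io.BufferedWriter(sink)   # stands in for a fully buffered stdout

stream.write(b"service started\n")
before_flush = sink.getvalue()     # nothing has reached the sink yet

stream.flush()                     # the equivalent of fflush(stdout)
after_flush = sink.getvalue()      # now the line has been delivered

print(before_flush, after_flush)
```

In a real Python service the equivalent fix would be print(..., flush=True) or an explicit sys.stdout.flush().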

In summary, using systemd’s facilities has made my life as an embedded Linux developer much, much easier and hopefully more reliable. That’s a good thing. My top tips for getting your software working under systemd are these:

Create your .service file using the recipe above and the documentation.
Don’t forget to flush stdout if you want to see it in syslog.

Lenovo Thinkpad T61 GPU fix. Or not.


My mobile workhorse is a trusty Lenovo T61, a close cousin of the ones they use on the International Space Station. How cool is that? It’s built like a tank, and weighs about as much, but the feature I appreciate most is its screen: a lovely 1680 x 1050 resolution, which is actually enough pixels to get some work done. Most laptops have rubbish screens which were clearly only designed for watching DVDs and reading Facebook. It’s a pet hate of mine.

Sadly the Thinkpad has blotted its copybook. The Nvidia graphics chip has a reputation for expiring earlier than it should do, and this one has. It started with the screen going blank a couple of months ago. I replaced the screen, and then it worked but only with every other column of pixels. Then it spontaneously started working almost-properly, but with a peculiar green shimmering effect on bright colours which was just about possible to avoid by fiddling with the display settings. Then it died altogether, and just gave the ominous beep-bip-bip code at startup which means ‘my graphics hardware isn’t working’.

In a last ditch attempt to revive it, I thought I’d try reflowing the solder on the graphics chip. It helps on some laptops. In theory you need very specialised equipment for this, but I’ve had success in my professional life doing it with a more, shall we say, agricultural approach. With nothing to lose, I had a go. Here’s what I did.

Reflowing the solder on the graphics chip involves removing the motherboard, which requires completely disassembling the laptop.

Remove all the bits that come out: the battery, DVD drive, hard drive, and any PC card and SD card.

Remove the marked screws on the bottom to remove the keyboard, touch pad and palm rest.

Unplug the three wires to the Wifi module. Plug 1 is grey, 3 is white, 2 is black.

Remove the keyboard surround/speaker grilles. Two short screws on the top, one long one from the bottom rear right corner, one long one from the outer rear bottom left corner, four little flat ones from the metalwork near the CPU.

Unplug the screen connector, the grey cloth one near the fan.

Remove the two tiny screws holding the left speaker and move it to one side.

Remove 2 medium-length screws from the bottom rear edge and two short screws from the screen brackets on the top side. Take the screen off.

Remove one tiny silver screw from the front right hand side, 9 short screws from the bottom, three more long ones from the bottom, and four short ones with big flat heads near the docking connector.

From the top, remove the four medium-length screws holding the heatsink down.

Remove the two short silver screws holding the two silver brackets on the right side of the heat sink.

Unplug the fan connector and ease the heatsink/fan assembly free.

Unplug the multiway connector in the rear centre which feeds the USB sub-board. Tease the wire free from under its sellotape.

Remove the two medium screws holding the wifi module in and remove the wifi module.

The motherboard and frame should now be free of the bottom case.

Turn the motherboard over. Remove the 8 remaining screws with big flat heads. Remove the medium screw by the DVD connector. Remove the small silver screw by the SD socket. Remove the short screw holding the heatsink brace. Remove the heatsink brace.

Returning to the top of the motherboard, unplug the charger connector, the small connector by the phone socket, the speaker connector by the RAM and the little black connector at the back where the wifi module was. Leave the backup battery connected.

The magnesium frame should now be free of the motherboard. Take it off. Clean the heatsink compound from the tops of the large chips. U47, with the Nvidia logo on it, is the GPU.

To try and reflow it, I made a makeshift heatshield out of a doubled-up piece of kitchen foil. I’ve done some emergency BGA repairs in the past so I had some idea of what I was aiming at. My strategy is to heat the whole chip area while carefully poking some nearby easy-to-repair part to see when the solder has melted. I expected it to take a couple of minutes.

I did exactly that, but ran into a nasty problem: there is self-adhesive tape on the top of the GPU and nearby on the PCB, which shrinks in the heat. Unfortunately it takes components with it, so they end up on the tape instead of the PCB. It turns out that there are a couple of dozen parts – decoupling capacitors, I think – on the top of the GPU. Most of them had come off and stuck to the shrunken tape.

In addition, the backup battery got itself in the way and overheated and burst during the reflow. There were also solder balls visible round the end of the GPU. I thought I’d pretty much wrecked it. But, having come this far, I decided to try and repair the damage and put it back together.

Reassembly is the reverse of disassembly, as they say in the Haynes manuals. Don’t forget to apply new heatsink compound to the tops of the chips in contact with the heatsink. I didn’t hold out much hope.

I got it all back together with no screws left over. I put the battery in, pressed the power switch, and apart from a brief flicker of the power light, nothing. Not a sausage, or even a beep. Game over. Time to go shopping for a new motherboard.

As luck would have it, the Thinkpad T61 was built in various versions. Only the really upmarket ones have the Nvidia graphics chip with the bad reputation. The cheaper versions use the graphics provided by the Intel motherboard chipset. I care more about reliability than I do about ultimate 3D graphics performance, so I decided to do a motherboard swap once I’d discovered that replacement motherboards were available from the US at about $30. It’s useful to know the part numbers. My original motherboard, an early one with Nvidia graphics, was a 42W7652. The ones with the Intel graphics are 42W7651 (early version, supports Intel Merom processors) and 42W7875 (later version, supports both Merom and Penryn processors). On the left the old motherboard, on the right the replacement one.

Swap the CPU and RAM to the new motherboard, and make sure you put the little foam block on top of the transistors by the VGA socket – it conducts heat to the heatsink. Also swap the PC Card cage, which is held by two screws underneath the motherboard. It pulls out vertically from the motherboard.

Getting the machine to boot after I’d swapped the motherboard took a bit of fiddling. It turns out that, because the backup battery had been removed, all the BIOS settings were lost and the SATA interface mode had changed from ‘Emulation’ to ‘AHCI’. The symptom was that the machine would start booting but Windows 7 would just bluescreen immediately. Changing that BIOS setting fixed the problem.

I didn’t have to re-validate Windows, but a couple of pieces of software got unhappy. The Visagesoft Expert PDF tools required me to re-enter the registration information I’d already paid for, which was no trouble, and the Xilinx and Lattice FPGA tools needed new host-locked licence keys generating, which was free.

The Thinkpad lives to fight another day, and to let me write this blog post.

Refurbishing a JVC 5 1/4″ floppy disc drive


A couple of months ago, I was given this splendid setup in return for some data recovery work. The original disc drives had stopped working, and my customer wanted the data from the discs. I used my office BBC B+ to retrieve the data successfully from the discs which were in ADFS format. This equipment – a BBC Master 128, Acorn AKF12 monitor and a pair of 40/80 track disc drives – was in such good condition that I decided to try and get it all working.

The monitor worked fine and the computer booted up quite happily, but the disc drives were definitely in trouble. The right hand one would just about read the first couple of tracks of a disc, but the left hand one wouldn’t read anything at all. I pulled the lid off to see what kind of drives were inside. The disc unit has a ‘Cumana’ label on the front, but they were a dealer and put together boxes with all sorts of drives inside, though usually of good quality.

There’s nothing complicated in there. A simple linear power supply at the back, the two drives, and the switches at the front with some distinctly dodgy-looking wiring. Was zip-tying the 40/80 track switch wiring directly to the 240V mains wiring really such a good idea? And I think I might have splashed out on insulated crimps for the mains connections on the back of the switch.

The drives themselves are ones I’d never seen before. They’re made in Japan by JVC, model number MDP-100. I associate JVC with 1980s consumer electronics rather than computer peripherals. They generally had a decent reputation amongst techno-savvy schoolboys. Their video recorders, especially, rebadged in the UK as Ferguson, were almost indestructible. The floppy drives should be worth saving.

Looking more closely at the drives revealed some interesting details. Underneath is a little brown add-on board with a bunch of TTL logic which does the 40/80 track switching. It looks like a factory modification, but the wiring to the main board (not visible in this photo) is a bit Heath Robinson.

The main board features a couple of 7438 TTL chips made by Tesla in Czechoslovakia. They were definitely a rarity on the western side of the Iron Curtain. Their date code (courtesy of tubes-store.com) is November 1983, which is the right sort of time. Why on earth would a Japanese disc drive contain Czechoslovak chips? It’s not as if there weren’t any Japanese ones. All the other TTL chips in the drive are made by Hitachi, and they’re 74LS series, which was current at the time. Almost nobody was still using straight 74 series in 1983. My best guess is that there was some subtlety of the Shugart-standard floppy disc interface that needed standard TTL, and nobody else was still making it. Or they were just cheap.

The first problem I spotted was corrosion. The board has lots of horrible little electrolytic capacitors on it. The tiny ones like these are notorious for being badly sealed and oozing gunk which causes electrical faults, corrodes tracks and other components. Playing on the safe side, I whipped the lot off and replaced them with newer ones which were a bit bigger, and cleaned up the damage. No tracks seemed to have suffered, fortunately.

Next I inspected the heads. Squinting at them in position, I thought I could see some dirt, so I unscrewed the upper head to get a good look at both of them.

Yuk! I’ve never seen a floppy drive head so dirty. No wonder it wouldn’t read discs. The photo is a view of the upper head. The lower one was just as bad. A thorough scrub with isopropyl alcohol, being careful not to damage the rather delicate suspension, and they came up like new.

I put the upper head back on in roughly the right place and tried the drive again. It would now read discs, at least on the lower head (drive 0) but only intermittently, and wouldn’t work at all beyond track 2. Attempting to get to track 3 or beyond just resulted in a lot of rattling and a disc error.

I suspected a head stepper motor problem. There were two BA6208 chips visible on the PCB between the main JVC controller chip and the connector to the stepper motor, CN04.

I suspected they, or one of them, might be the culprit. This type of chip and many similar ones were used for driving motors in lots of consumer electronics, and they did give trouble. I remember replacing them in VCRs and the like. The data sheet is easy enough to find on the internet. A quick check with the scope and sure enough, the right hand one wasn’t working properly. Each chip contains two drivers, and one of them wasn’t driving at all.

I found a replacement chip for about a pound on allegro.pl. Bargain! I fitted it and now had full head-stepping action. However, reading data still wasn’t reliable. Verifying a known-good disc would fail quite frequently, often in the same place. I had a look at the signal waveform coming from the disc. Fortunately the head amplifier chip, a Hitachi HA16631P, is also documented online. The read data signal is visible at pin 12. What I saw was this.

It was difficult to capture on the scope, but basically the waveform is of uneven size: as the disc rotates, it grows and shrinks, sometimes so much that errors result. I tried fiddling with the head alignment by adjusting the head stepper motor, but it didn’t help. Eventually I figured out that during my head-cleaning efforts, I’d managed to bend the thin steel tab which holds the upper head. Not much, but enough to prevent it exerting the right amount of pressure on the disc. This pressure also keeps the disc in contact with the lower head, which is fixed. I straightened it and tried again.

Much better! A nice fat waveform, with no odd variations. You can see the head mounting in the photo below. The two screws marked with a cyan dot mount the head assembly via its steel strip, and loosening them allows it to be aligned with the lower head. The screw between them marked in pink just attaches the copper-coloured bracket which holds a tiny coil spring against the back of the head assembly which adds a little more pressure, though it didn’t seem to make much difference in practice. Perhaps if the disc was wrinkly or something it might help.

Img_1308a

After all that I had a disc drive in perfect working order.

At some point I’ll do the second one and then the whole system should be OK.

Cracking the code: reverse engineering the AlcaTech BPM Studio controller


The recent demise of my old workshop PC has spurred me into action. Back in the halcyon days of the late 90s and early 2000s, I used to DJ a bit. I was an early adopter of computer-based DJ technology. In those days it was still fairly unusual to have music in MP3 format – the iPod wasn’t released until late 2001 – never mind being able to actually DJ with it. At the time I found a product from a German company which did exactly what I wanted: it was a combination of hardware and software called BPM Studio which meant I could use MP3 files as if they were a professional CD player: cueing them, pitch-shifting, mixing and so on. The hardware is a solidly-built control panel which connects to the PC, on which runs some software which does all the audio processing.

So why has the failure of my old workshop machine reminded me of this? Because, once upon a time, it was my media PC, and it ran the DJ software. The software and hardware is now 15 years old, and it shows: the controller connects using a serial port (when did you last see a PC with one of those?) and the software has…wait for it…a dongle! Yes, just like in the bad old days, it has a device which plugs into the parallel port on the PC, and if it’s not found, the software won’t run. Parallel ports, especially, are a dying breed today, so the chances of being able to use this controller and software in the future fade as PC technology moves on.

The BPM Studio package was expensive, and the controller is quite nice and robustly built, so I’d like to be able to preserve it and, if possible, use it with more modern software. The trouble is, its interface to the PC is proprietary, unsupported by any other software, and I couldn’t find any documentation on it. There was only one way forward to protect my investment: hack it.

The first step was to have a look at what was going on on the serial connection, and what baud rate was in use. I put a little breakout adapter in the serial cable (this one, in fact, modified a bit) so I could examine the data.

First thing was to figure out the baud rate. Set the scope to trigger on the rising edge and play with the timebase a bit and soon we can see the start bit of each byte:
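Turning the width of a start bit into a baud rate is just a matter of taking the reciprocal and picking the nearest standard rate. A quick sketch, assuming a reading of about 52 microseconds for one bit; that figure is illustrative rather than a recorded measurement, and the function name is made up.

```python
STANDARD_BAUD_RATES = (1200, 2400, 4800, 9600, 19200, 38400, 57600, 115200)

def nearest_baud(bit_time_seconds):
    """Return the standard baud rate closest to 1 / bit_time."""
    measured = 1.0 / bit_time_seconds
    return min(STANDARD_BAUD_RATES, key=lambda rate: abs(rate - measured))

# Assumed reading: a start bit about 52 microseconds wide.
print(nearest_baud(52e-6))
```

Scope cursors are rarely accurate to better than a few percent, which is why snapping to the nearest standard rate is more useful than the raw reciprocal.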

That’s definitely 19200 baud. Nice and standard. So I started my handy hdump2 software, which displays two streams of data side by side so it’s possible to see what came from where and when, and hoped to see something which made sense: a recognisable packet format, perhaps, or at least consistent data. What I got instead was this:

bpmstudio-startup

The left column is data from the PC, and the right column is data from the controller. It’s clear there’s a conversation going on, but it looks encrypted to me. There are no obvious packets, no start or end markers, nothing clearly related to what’s going on. I played around pressing buttons on both the controller and PC, and lots of data flowed but nothing made any sense. No readable track names for the displays, no recognisably similar data when I pressed the same button numerous times.

Why on earth would anyone encrypt the connection between the PC and a controller like this? Only the designers know, but I guess it’s part of the same mindset that required a hardware dongle to run the software. A fear of piracy, probably.

Interestingly, if the PC and controller are separated, they each send out a burst of data once a second. The PC sends bursts of 4 bytes, and the controller sends bursts of 12 bytes. Each of them follows a fixed pattern from startup. The PC sends:

9c 94 dc 0e
56 1e 97 95
ad f8 87 4a
dc bc f0 37
32 44 bd a1

and so on. The controller sends:

f0 99 d0 af 3b 2f c8 5b 21 3c 4f d4
44 95 ac e1 d9 76 2a 58 bf 1e 52 52
34 e7 1a 93 ce b1 97 3e a4 f9 01 37
d3 f3 94 c1 32 57 31 a7 9a 6c 83 68
84 ae d1 f6 e7 c1 c8 5d e2 e4 46 36

and so on. I can’t see an obvious relationship between them. What I can see from the conversation dump above is that the controller seems to restart its sequence when it sees the data from the PC, but with some subtle differences.

If I was a proper mathematician, I’d spend more time trying to work out what the code was. Being an engineer, I thought I’d take it apart and have a look inside.

There’s more to it than I thought. This predates the days of powerful PIC and AVR microcontrollers, and actually has separate chips for its CPU, ROM and RAM. That’s good news for anyone interested in reverse engineering it. The CPU is a Siemens/Infineon 80C166:

and there’s a 29F010 (128 KByte) ROM connected to it, presumably holding the software:

There’s also a 32K RAM chip, which is more than I’d expect. I just hope the software isn’t doing something horrible like decrypting itself into RAM and running from there.

The good news is that documentation for the 80C166 is freely available, as is a disassembler, ADIS16X. And I’ve just ordered a PLCC32 adapter so that the ROM, once I’ve desoldered it, will fit my EPROM programmer. Watch this space.

Modifying libmodbus for asynchronous operation


I’m working on a project at the moment which has to connect to some industrial control equipment. The communications protocol in use is Modbus, or to be more precise its Modbus TCP variant. Working with this protocol is made much easier by the convenient libmodbus, a free and open-source software library which handles the communications and data formatting. The library is included with Debian Linux, the platform on which I’m writing the software.

modbus_logo

Convenient as it is, libmodbus is written with the assumption that communications are synchronous: that it’s OK to request some data and wait for the response. For example, fetching some data from a Modbus device looks like this (in abbreviated C):

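uint16_t registers[5];
modbus_t *mb = modbus_new_tcp("192.168.1.20", 1502);
modbus_connect(mb);                          /* open the TCP connection */
modbus_read_registers(mb, 0, 5, registers);  /* read 5 registers starting at address 0 */
modbus_close(mb);
modbus_free(mb);                             /* release the context */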

The code above fetches the contents of five registers, numbers 0 to 4, from the Modbus device at IP address 192.168.1.20, port 1502. It’s delightfully simple. My problem is that each of the network operations, modbus_connect(), modbus_read_registers() and modbus_close(), could take some time if the network is congested or unreliable, or if the device is busy doing something else.
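To give an idea of what the library is doing on the wire, here is a sketch of the request frame that a ‘read holding registers’ call sends over Modbus TCP. It’s written in Python for brevity and is not libmodbus’s own code; the function name and the transaction and unit identifiers are arbitrary values chosen for illustration.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # Modbus function code

def build_read_request(transaction_id, unit_id, start_address, count):
    """Build a Modbus TCP 'read holding registers' request frame."""
    # PDU: function code, starting address, number of registers (big-endian)
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, count)
    # MBAP header: transaction id, protocol id (always 0),
    # length of everything that follows (unit id + PDU), unit id
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return header + pdu

# Request 5 registers starting at address 0, as in the C example.
frame = build_read_request(transaction_id=1, unit_id=1, start_address=0, count=5)
print(frame.hex())
```

The request is only twelve bytes, so sending it is quick. It’s waiting for the connection and the reply that causes the delays.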

My software needs to handle various types of communication from different sources on the network, so hanging around while any of them completes isn’t acceptable. It’s OK to wait for data – that’s just life – but being unresponsive to other things while that data is arriving just won’t do.

Another project I worked on last year used D-Bus communications, which face exactly the same problem. D-Bus is intended for relatively complex software systems where many things could be going on at the same time. The authors of D-Bus have thought of this, and made it easy to use asynchronously. Rather than asking for some data and simply being unable to do anything else until it arrives, asynchronous operation allows the program to request some data, get on with something else, and be informed when the data is ready. The same applies to other operations which may take some time.

At the core of asynchronous operation is the run loop. Rather than the program being a step-by-step series of synchronous operations like the example above, it has a loop which sits waiting for any new activity, and then triggers any actions which need to deal with that activity. For example, in pseudo-C again:

initialise_everything();
while(1)
  run_loop();

run_loop() {
  if(nothing_happening())
    sleep_for_a_moment();
  switch(what_happened()) {
    case network_connection_succeeded:
      start_sending_data();
      break;
    case data_sent:
      start_receiving_data();
      break;
    case data_received:
      notify_application();
      break;
    case error:
      /* handle error */
      break;
  }
}

This structure means that lots of operations can be outstanding, and whichever needs attention first can get that attention without waiting for any of the others. It’s more complex but much more powerful.
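
For anyone who prefers something compilable to pseudo-C, here is a minimal sketch of the same idea as a tiny state machine. The event and state names, and the functions handle_event() and run_events(), are invented for illustration and don't come from any library; a real run loop would get its events from the operating system rather than from an array.

```c
#include <assert.h>

/* Hypothetical events and states, named after the pseudo-code above. */
enum event { EV_CONNECTED, EV_DATA_SENT, EV_DATA_RECEIVED, EV_ERROR };
enum state { ST_CONNECTING, ST_SENDING, ST_RECEIVING, ST_DONE, ST_FAILED };

/* Advance one step in response to a single event.
   An event arriving in the wrong state is treated as a failure. */
enum state handle_event(enum state s, enum event ev)
{
    switch (ev) {
    case EV_CONNECTED:
        return s == ST_CONNECTING ? ST_SENDING : ST_FAILED;
    case EV_DATA_SENT:
        return s == ST_SENDING ? ST_RECEIVING : ST_FAILED;
    case EV_DATA_RECEIVED:
        return s == ST_RECEIVING ? ST_DONE : ST_FAILED;
    case EV_ERROR:
    default:
        return ST_FAILED;
    }
}

/* Feed a sequence of events through the handler, as a run loop would,
   stopping early once the operation has finished or failed. */
enum state run_events(const enum event *evs, int n)
{
    enum state s = ST_CONNECTING;

    assert(evs != 0 || n == 0);
    for (int i = 0; i < n && s != ST_DONE && s != ST_FAILED; i++)
        s = handle_event(s, evs[i]);
    return s;
}
```

Feeding it EV_CONNECTED, EV_DATA_SENT and EV_DATA_RECEIVED in that order ends in ST_DONE; an EV_ERROR at any point, or an event out of order, ends in ST_FAILED.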

My application is interested in network data from various places, and Linux (as well as many other operating systems) provides some handy operating system services that make asynchronous operation straightforward, with a little thought. The most important is select().

The select() system call allows a program to wait until something happens to any of a list of file descriptors, each of which can represent a hardware device, a network connection, or various other things. It also allows a timeout, so if nothing happens for a moment, the program can do other things, then call select() again without missing anything.
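
As a rough illustration of select() with a timeout, here is a minimal sketch that waits for a single file descriptor to become readable. The names wait_readable() and demo_select() are made up for this example, and the demo uses a pipe as a stand-in for a network connection.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* First argument is the highest descriptor number plus one. */
    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}

/* Demonstration: a pipe stands in for a network socket.
   Returns 0 if select() timed out before the write and
   reported readable after it, -1 otherwise. */
int demo_select(void)
{
    int p[2];
    int before, after;

    if (pipe(p) != 0)
        return -1;
    before = wait_readable(p[0], 50);      /* nothing written yet */
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    after = wait_readable(p[0], 50);       /* one byte is waiting */
    close(p[0]);
    close(p[1]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

A real run loop would pass several descriptors to select() at once and use the timeout to get on with other work between calls.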

The D-Bus library has two important interfaces which make it possible to base the application’s run loop around select():

D-Bus will tell the application each time it is interested in a new file descriptor, or it is no longer interested in a file descriptor. This is known as adding or removing a watch.
D-Bus provides a function which can be called whenever one of the file descriptors is indicated by select().

That’s basically it. It means that D-Bus can get on with whatever networking complexity it likes without occupying my application any more than it has to. Sauce for the goose is sauce for the gander, so this model should fit my Modbus application too.
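
On the application side, the watch mechanism needs little more than a list of file descriptors that can be turned into the sets select() expects. The sketch below is one hypothetical way to keep that list; the names add_watch(), remove_watch(), build_fd_sets() and the WATCH_ flags are invented here and are not the D-Bus or libmodbus API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

#define MAX_WATCHES 16
#define WATCH_READ  1   /* hypothetical flag: wake when readable */
#define WATCH_WRITE 2   /* hypothetical flag: wake when writable */

struct watch { int fd; int flags; };

static struct watch watches[MAX_WATCHES];
static int n_watches = 0;

/* Start watching fd. Returns 0 on success, -1 if it can't be added. */
int add_watch(int fd, int flags)
{
    if (n_watches >= MAX_WATCHES || fd < 0 || fd >= FD_SETSIZE)
        return -1;
    watches[n_watches].fd = fd;
    watches[n_watches].flags = flags;
    n_watches++;
    return 0;
}

/* Stop watching fd. Returns 0 on success, -1 if it wasn't watched. */
int remove_watch(int fd)
{
    for (int i = 0; i < n_watches; i++) {
        if (watches[i].fd == fd) {
            watches[i] = watches[n_watches - 1];  /* fill the gap */
            n_watches--;
            return 0;
        }
    }
    return -1;
}

/* Fill the read and write sets for select().
   Returns the highest watched descriptor plus one (0 if none). */
int build_fd_sets(fd_set *rd, fd_set *wr)
{
    int maxfd = -1;

    assert(rd != NULL && wr != NULL);
    FD_ZERO(rd);
    FD_ZERO(wr);
    for (int i = 0; i < n_watches; i++) {
        if (watches[i].flags & WATCH_READ)
            FD_SET(watches[i].fd, rd);
        if (watches[i].flags & WATCH_WRITE)
            FD_SET(watches[i].fd, wr);
        if (watches[i].fd > maxfd)
            maxfd = watches[i].fd;
    }
    return maxfd + 1;
}
```

The run loop would call build_fd_sets() before each select(), then hand any descriptor that select() flags back to the library that asked for the watch.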

Since libmodbus is open source, I was able to modify it to support this method of operation. Most of the required code was already in there, but I had to create new functions which called it in particular ways, and add new data to the modbus_t structure to keep track of what operations were outstanding. The new asynchronous way of working looks like this, in rather abbreviated form:

modbus_t *mb = modbus_new_tcp("192.168.1.20",1502);

modbus_set_connected_cb(mb, &connect_callback);
modbus_set_read_cb(mb, &read_callback);
modbus_set_add_watch_cb(mb, &add_watch_callback);
modbus_set_remove_watch_cb(mb, &remove_watch_callback);

modbus_connect_async(mb);
while(1)
  run_loop();

void connect_callback(int failure) {
  if(!failure)
    modbus_read_registers_async(mb, 0, 5, registers);
}

void read_callback(int failure) {
  if(!failure) {
    /* we got the data we asked for */
  }
}

void add_watch_callback(int fd, int flags) {
  /* add fd to our list of file descriptors */
}

void remove_watch_callback(int fd, int flags) {
  /* remove fd from our list of file descriptors */
}

void run_loop() {
  if(select(list_of_file_descriptors))
    modbus_selected(fd, flags);
}

I’ve left lots of detail out here, but the sequence of operations looks like this:

The application informs libmodbus about the various functions which should be called when things happen: a connection succeeds, data is received, a new watch is to be added, a watch is to be removed.
The application asks libmodbus to start a connection asynchronously, using modbus_connect_async().
The application then just sits in the run loop.
libmodbus calls the application back through add_watch_callback(), adding a watch on the socket it will use for the connection. It then asks the operating system to make the connection.
When the connection completes, select() will return its file descriptor, and the run loop will call libmodbus via modbus_selected().
libmodbus now checks that the connection was successful, and calls the application through connect_callback().
The application can now request data using modbus_read_registers_async().
While requesting the data, libmodbus will almost certainly use add_watch_callback() to inform the run loop that it should keep an eye out for the data.
When data arrives, which may happen in several small chunks, the run loop will call modbus_selected().
libmodbus can assemble and check the received data. When it has arrived successfully, or failed terminally, it will call the application back through read_callback().
The application can now work with the received data.
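
The connection steps in that sequence rest on a standard POSIX technique: start a non-blocking connect(), wait for the socket to become writable, then ask the socket whether the connection actually succeeded. The sketch below shows that technique in isolation. It is an assumption about the general approach, not the modified libmodbus code, and the function names start_connect(), wait_writable() and finish_connect() are invented for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Begin a non-blocking TCP connection to a dotted-quad IPv4 address.
   Returns the socket, or -1 on immediate failure. The connection is
   usually still in progress when this returns. */
int start_connect(const char *ip, int port)
{
    struct sockaddr_in addr;
    int fd, fl;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    fl = fcntl(fd, F_GETFL, 0);
    if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    /* EINPROGRESS just means "not finished yet", which is expected. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;  /* the caller now watches fd for writability */
}

/* Wait up to timeout_ms for fd to become writable.
   Returns 1 if writable, 0 on timeout, -1 on error. */
int wait_writable(int fd, int timeout_ms)
{
    fd_set wr;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&wr);
    FD_SET(fd, &wr);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(fd + 1, NULL, &wr, NULL, &tv);
}

/* Call once fd is writable. Returns 0 if the connection succeeded,
   otherwise the errno value describing why it failed. */
int finish_connect(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    return err;
}
```

In a real run loop the wait_writable() step would not exist as a separate call: the socket would simply be added to the watch list, and finish_connect() would run when select() next reported it.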

While all this is going on, the application can be doing other things: handling other network connections, processing data, or even handling other Modbus connections.

The modifications, after some development and debugging time, work very nicely. After more testing, they’ll almost certainly make it into the final application, and I’d like to contribute them back to the open source community so that other developers can use Modbus asynchronously too.

Recovering Windows XP from a corrupted registry

In the workshop at home I have a terribly old PC. It’s got an AMD Athlon XP CPU at a blistering 1.48GHz, and runs Windows XP. It’s mostly just got on with the jobs I ask of it (although its power supply has featured in the blog before) but in the last week it’s failed twice. I suspect there’s a hardware problem, given that the machine is approaching its 15th birthday.

The symptoms have been that the machine would be working normally and then the screen would go black, with no response from the keyboard or mouse and no disk activity. Pressing the PC’s reset button brought it to life again, but with the dreaded message, “Windows could not start because the following file is missing or corrupt: \WINDOWS\SYSTEM32\CONFIG\SYSTEM”.

A quick web search revealed the official repair procedure from Microsoft: How to recover from a corrupted registry that prevents Windows XP from starting. The procedure looks like this:

Boot the machine using the XP install CD and enter the recovery console
Copy the five registry hives (system, software, sam, security and default) from \WINDOWS\SYSTEM32\CONFIG into a backup location
Copy default registry hives from \WINDOWS\REPAIR in their place
Boot into Windows and use the desktop to copy backup registry hives created by System Restore into a temporary location
Reboot into the recovery console again
Copy the backup registry hives into \WINDOWS\SYSTEM32\CONFIG
Reboot into Windows again (yawn)
Use the System Restore utility to restore the system to the most recent restore point.

I did this the first time, and it turned out to be an awful lot of fiddling around just to restore a backup of five files. One problem is that installing the default registry hives and booting Windows makes a mess of the user profiles, which is why the later system restore is required. As far as I can see, Microsoft recommend this procedure simply to avoid people having to dig around in the filesystem at the recovery console.

The next time the same fault occurred, I decided to try a short cut. This is what I did.

Boot the machine using the XP install CD and enter the recovery console.
Copy the corrupt registry hive (system) from \WINDOWS\SYSTEM32\CONFIG into a backup location, just in case
Still in the recovery console, find the most recent backup created by System Restore
Copy the system registry hive from there into \WINDOWS\SYSTEM32\CONFIG
Reboot into Windows and start working again.

The tricky part about this is that the system restore folder names are really long and unpleasant to type, and the recovery console doesn’t have command completion. However, you only have to do it once.

First, find the most recent _restore folder in \System Volume Information.

Then, find the most recent RPxxx folder inside there.

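That folder will contain a ‘snapshot’ folder, inside which are the registry backups. Copy the relevant one into \WINDOWS\SYSTEM32\CONFIG. Note that the filenames are different: the backup of the system hive is called _REGISTRY_MACHINE_SYSTEM, and it needs to end up named ‘system’.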

Type ‘exit’ to reboot, and that’s it. Job done. It worked for me: a ginormous download that Firefox had been working on continued exactly where it left off, and I’m typing this on the very same machine.

It would probably be possible to do exactly the same process using a bootable Linux CD, too, as long as it was capable of reading and writing NTFS filesystems in a trustworthy way.

Incidentally, all this is only possible because XP automagically saves backups of important things using System Restore. Say what you like about Microsoft, but that’s a really useful feature.

Debian Mini-debconf, Cambridge 2015

I spent last weekend at the Debian Mini-debconf in Cambridge, UK. What on earth is a mini-debconf? It’s a smaller version of a Debconf, which in turn is a conference concerning the Debian operating system.

The Debian operating system is free, and is usually based around the Linux kernel. It can run on (almost) anything, from a Raspberry Pi to an IBM mainframe, and I frequently work on products which contain it. Though I’ve worked with the Debian software a lot over the years, I had never really taken part in the community. Attending this event was my first chance to do that.

It was a great experience. Thanks to sponsorship from ARM, Codethink, Collabora, Cosworth, and Hewlett Packard Enterprise, it was all free, and even lunch was provided. There was an astonishing variety of talks from all sorts of people, covering topics from sleep apnoea to the problems of dealing with the vast quantities of data generated by the Large Hadron Collider.

The talk from Betty Dall of HP Enterprise about The Machine was fascinating. HPE are working on a new computer architecture which does away with the traditional divisions between main memory, hard drives, and other types of storage, and replaces them with a vast (like, really enormous, thousands of petabytes) array of memristor memory. She explained some of the challenges of designing and programming such a machine. I was not a little surprised to hear that it will run Debian Linux, just like the little electric car charging points and communication aids I’ve worked on!

Also entertaining was Vincent Sanders’s account of the trials and tribulations of maintaining a web browser, NetSurf. The reality of dealing with the world wide web is hideous: so many web pages do terrible, terrible things, standards are rarely properly documented or specified, and best of all, web browsers are never allowed to give up and say ‘this page makes no sense’!

The whole team did a great job, ably led by Steve McIntyre, and the event was streamed live by the Debian Video Team, who I even joined as a temporary director and vision mixer for a couple of hours. The talks will eventually be online in video form at the Debian Meeting Archive.

At the end of the event on Sunday evening, Steve announced that the ARM atrium area would need to look like a canteen by the time we left. With no fuss, no persuasion, people just got together and shifted all the tables and chairs. The work was done in no time. I mused as I left that perhaps that’s why free software works.

martinjonestechnology

What's going on in the workshop?
