housel


Not Quite Research

Peter S. Housel's weblog


Chaining signal handlers on x86 linux
housel
For some strange reason I can't comprehend, the SysV x86 ABI specifies that the INT 4 trap, invoked by the INTO (Interrupt on Overflow) instruction, is mapped to the SIGSEGV Unix signal, rather than the perhaps more intuitive SIGFPE. Linux for the x86 platform has followed suit (though FreeBSD did not.)

Open Dylan makes use of the INTO instruction to ensure that the <integer> abstraction is not violated. When an overflow trap is caught, the runtime is supposed to redirect program flow in order to cause a <arithmetic-overflow-error> exception to be signalled. This means that it needs to install a SIGSEGV handler.

The Memory Pool System garbage collector used by Open Dylan also installs its own SIGSEGV handler in order to implement a write barrier. (Linux aspects of the design are documented here.) In order for the two handlers to coexist, it is necessary to chain from one handler to the other if the condition it handles is not detected.

This is the tricky part. POSIX supports two different function signatures for signal handlers: the old style of handler,
void func(int signo);

and the SA_SIGINFO style handler,
void func(int signo, siginfo_t *info, void *context);

Chaining from one SA_SIGINFO handler to another is easy: just call it. Chaining to an old-style handler is not, however, because on Linux its real function signature is
void func(int signo, struct sigcontext context);

Moreover, any modifications made to the context structure passed on the stack are reflected in the program state on return from the signal handler. MPS uses this to simulate the execution of instructions that write to memory areas protected by its write barrier.

Fortunately, on x86 the contents of a struct sigcontext are identical to the content of uc_mcontext field of the ucontext_t structure type. This fact, along with a bit of inline assembly magic, allows us to write the following function for chaining to either type of signal handler:
inline void chain_sigaction(const struct sigaction *act,
                            int sig, siginfo_t *info, void *uap)
{
  if(act->sa_flags & SA_SIGINFO) {
    /* Inner handler uses the same (sa_sigaction) convention... call it */
    (*act->sa_sigaction)(sig, info, uap);
  }
  else {
    /* Inner handler uses the old (sa_handler) convention, with a
     * struct sigcontext passed as a structure argument. The content
     * of struct sigcontext is identical to the content of the
     * ucontext_t uc_mcontext field.
     */
    ucontext_t *uc = (ucontext_t *) uap;
    asm volatile(/* Preserve scratch registers on the stack */
                 "push\t%%eax\n\t"
                 "push\t%%edx\n\t"

                 /* Reserve stack space for the sigcontext argument */
                 "subl\t%[mcontext_bytes],%%esp\n\t"
                 
                 /* Copy the sigcontext onto the stack as the second
                  * argument
                  */
                 "cld\n\t"
                 "movl\t%[mcontext_words],%%ecx\n\t"
                 "lea\t%[mcontext],%%esi\n\t"
                 "lea\t(%%esp),%%edi\n\t"
                 "rep\tmovsl\n\t"

                 /* Push the signal number onto the stack as the first
                  * argument
                  */
                 "push\t%[sig]\n\t"

                 /* Call the handler */
                 "call\t*%[handler]\n\t"

                 /* Restore scratch registers */
                 "movl\t4+%c0(%%esp),%%edx\n\t"
                 "movl\t8+%c0(%%esp),%%eax\n\t"

                 /* Copy the sigcontext back into uc->uc_mcontext */
                 "movl\t%[mcontext_words],%%ecx\n\t"
                 "lea\t4(%%esp),%%esi\n\t"
                 "lea\t%[mcontext],%%edi\n\t"
                 "rep\tmovsl\n\t"

                 /* Restore the stack pointer */
                 "addl\t%[mcontext_bytes]+12,%%esp\n\t"
                 : /* no outputs */
                 : [mcontext_bytes] "i" (sizeof(uc->uc_mcontext)),
                   [mcontext_words] "i" (sizeof(uc->uc_mcontext) / 4),
                   [mcontext] "m" (uc->uc_mcontext),
                   [handler] "g" (act->sa_handler),
                   [sig] "g" (sig)
                 : "memory", "cc", "ecx", "esi", "edi");
  }
}
This code was added to Open Dylan in r12264.
Tags: ,

Laptop hard drive
housel
After a bit over three years of use, the 60 gigabyte hard drive in my Vaio VGN-TX670P has been hovering close to full for awhile now. In addition to the gradual accretion of Windows Update files, it contained:
  • Full checkouts of the Gwydion Dylan/Open Dylan , Monday and gcc source code repositories
  • Microsoft Visual C++ 2008 Express Edition, the platform SDK, etc.
  • The Logos Scholar's Library and various add-on dictionaries, lexicons, and commentaries, for a total of around three gigabytes of stuff (that would weigh who knows how many hundreds of pounds if it were printed on dead trees)
  • Outlook 2003's cached local copies of messages from my home IMAP server, containing e-mail going back to 1996.
  • Digital photos going back to 2005
  • Music and podcasts (I sometimes say I carry a three-pound iPod)
  • coLinux and a full Debian Etch disk image
Rather than making the difficult and painful decisions about what to get rid of, I decided it would be better to just upgrade the hard drive. I managed to find a suitable hard drive with double the original capacity, a Toshiba MK1214GAH, at a reasonable price. However, when it arrived and I opened up the Vaio (using instructions I found here), I discovered that the new disk had a ZIF (zero insertion force) connector, and the connector on the old drive was a Toshiba 50-pin IDE connector. There wasn't much I could do about that, so I put everything back together again and looked for another way.

Newer Vaio TX models used a ZIF connector, but I was unable to determine what the right FPC part number would be. (Plus, they were all fairly expensive.) Fortunately, I managed to find an adapter that would allow me to connect the new drive to the old connector. I ended up waiting another week for the adapter to arrive from Hong Kong.

The steps of the final upgrade were:

  1. Boot the Vaio using the Recovery is Possible Linux LiveCD
  2. Connect the Vaio to my home wired Ethernet, and mount /home on my FreeBSD desktop hard drive over NFS
  3. Copy an image of the original hard drive to a file on the server using dd. (This took about 4 hours.)
  4. Take the Vaio apart and (very gingerly) disconnect and remove the old hard drive
  5. Connect the adapter to the new hard drive, making sure to get the cable polarity correct (matching up pin 1 on both connectors)
  6. Move the rubber cushions from the old hard drive to the new one
  7. Folding the flat cable in an S curve, plug the adapter into the FPC connector and put the hard drive in the place of the old one. The new drive is a little shorter than the old one, but because it's sitting on top of the adapter it sticks up a little on the connector side. This results in the laptop case not quite snapping together, but it is barely perceptable.
  8. Put the Vaio back together again and boot Recovery is Possible to verify that the new hard drive is working
  9. Mount the NFS drive and copy the image file onto the new disk with dd.
  10. Reboot into XP to make sure everything is working
  11. Boot into Recovery is Possible again, and use fdisk to resize the main partition. (Linux fdisk is a bit raw, but it works.)
  12. Use ntfsresize to resize the filesystem to fit the partition
  13. Boot into XP (which will cause an automatic filesystem check at startup)
  14. Defragment the filesystem, now that there is room to do so
  15. Enjoy the freedom afforded by an abundance of disk space

Programming Languages I've Learned, In Order
housel
To the best of my recollection:
  • BASIC (TRS-80 Model I) [1980]
  • Z-80 assembly [1981]
  • FORTH [1982]
  • IBM PC BASIC (GW-Basic) [1983]
  • MACLISP [1983]
  • 68000 assembly [1984]
  • x86 assembly [1984]
  • C [1985]
  • FORTRAN [1985]
  • Pascal [1985]
  • Unix shell [1985]
  • VAX/Tahoe assembly [1986]
  • 6809 assembly [1988]
  • Perl [1988]
  • Common Lisp [1988]
  • Prolog [1989]
  • MATLAB [1989]
  • Scheme [1989]
  • DSP3210 assembly [1990]
  • C++ [1993]
  • Access Basic, Visual Basic [1993]
  • NewtonScript [1994]
  • Dylan [1995]
  • Java [1995]
  • (Redacted) DSP assembly [1997]
  • Standard ML [1999]
  • XSLT [2000]
  • Lua [2002]
  • ARM assembly [2003]
  • TCL [2004]
  • Jam [2004]
  • Javascript [2008]
I've used the ones in boldface within the last year.
(Idea via James Tauber)

Project snapshot: an exercise in yak shaving
housel
Projects directed graph

At any given time, paid workload permitting, I tend to be working on one or more different free or open-source projects, most of them related to the Dylan programming language. Much of this work frequently ends up split up into a large number of subtasks with various complex interrelationships. So that others, and I myself, are clear about these...

Topic-oriented documentation

This is the only project I'm working on that isn't directly connected to anything else.  When Open Dylan was open-sourced, it came with a large amount of high-quality documentation in FrameMaker format. FrameMaker is a fairly expensive tool, so it makes sense to convert all of this documentation to some other format in order to allow wider participation.  In 2004, DITA was starting to become known as a hot new documentation schema/architecture, designed from the ground up for customization.  By May 2005 I had finished a customization for Dylan, built a prototype system for delivering it on the web, and converted a number of fragments of the Open Dylan documentation to use it.

Since then, it has continued as a background task.  It's one of those relatively relaxing and stress-free things I can do when I don't feel like coding, or when something prevents the development environment from running on one of my machines.

Collage

One of the nifty features of the original Apple Dylan compiler was Creole. Creole was a foreign-function interface facility for interfacing to C routines, simply by including in your program an interface definition such as:
define interface
  #include "sampleheader.h",
    import: { "sqrt" => square-root, 
              "expt" => exponent,
              "divide",
              "remainder" };
end interface;

Creole was built into the compiler and worked by parsing the named C header files.

The CMU Gwydion Dylan developers implemented something very similar called Melange. One of the main differences was that Melange was implemented as a separate tool which took an interface file (containing a definition like the one above) and turned it into a Dylan source file.  This allowed it to work with Mindy, and also made rebuilding the compiler faster than it would have been if the interface generator had been built in (a major consideration back then).

Melange contains its own C preprocessor (so that it can see #define definitions) and its own incomplete C parser.  It does reasonably well, but it has often been unable to parse system headers on Linux, BSD, and Mac OS systems, especially when they make use of gcc-specific C extensions.  For this reason, almost all of the people who have hacked on Gwydion Dylan have had their turn at improving Melange, or at least adding their own stopgap measures.

Fairly early on, Erik Kidd got fed up with adding bandages to Melange and started a new tool, Pidgin.  Pidgin concentrated on providing a sensible representation for C types, as the C type representation in Melange was a source of frequent problems.  The preprocessor and parser were adapted from the Melange code base and shared the same failings as Melange's.  Pidgin was designed to be made up of reusable libraries; a command-line tool was also included that could parse one or more C header files and generate code for Harlequin Dylan C-FFI.  (In 2001 I also made use of the Pidgin libraries as part of a tool to generate interfaces for glib 1.x and gtk 1.x.)

The C-FFI library was the Harlequin Dylan (later Functional Objects Functional Developer, now Open Dylan) take on a low-level foreign function interface facility.  It provides a fairly elegant way of working with C types, functions, and values.  The library was originally designed to be the lower-level substrate for a higher-level facility called Collage... but unfortunately Collage was never implemented.  Instead, they used gema along with a number of custom transformation patterns to generate C-FFI definitions from the Win32 headers.

At one point I worked on implementing C-FFI in d2c, the Gwydion Dylan compiler.  I implemented a lot of the compiler metaclass support needed for C-FFI designator classes, but never got any further than that.

Since Harlequin/Functional Objects never finished designing and implementing Collage, I've taken it upon myself to do so.  In particular, it will be a foreign function interface, built into the compiler (like Creole), and designed to work well with C-FFI.  The programmer's interface will look as much like Creole and Melange as possible, making it possible to adapt existing definitions with little work.

Collage needs to be built on top of a high quality C parser/internal representation.  Inspired by the SML-NJ ckit library and by George Necula's CIL, in 2004 I started work on CPR, the C Program Representation library. CPR consists of representation classes for C types, expressions, and programs, along with a C preprocessor, C parser, and a pretty-printer for writing out C language textual representations of the intermediate language representation.  In addition to being useful for Collage, CPR could also be used for constructing a C compiler (by connecting it to the Open Dylan HARP backend, say) or for writing the same sort of static analysis tools that people use CIL for.  It is designed to support multiple C dialects, including C89, C99, GNU C extensions, and Microsoft Visual C.

CPR has been an exercise in test-driven development using the excellent Testworks framework for Dylan. In lieu of a commercial test suite for C, I've been testing using the test suites included with mcpp and with gcc.  Both of these are based on Dejagnu, a test framework written using Tcl (Expect, actually).  Most of the tests are written as C source files containing embedded Dejagnu directives (which use Tcl syntax) in comments.  Dealing with these directives required writing my own Tcl parser, with (since I insisted on doing test-driven development) its own test suite based on the tests included in the Tcl source code.  (These currently only implement just enough to handle embedded Dejagnu, though it could be expanded if some application required it.)

Currently the CPR preprocessor is almost complete, modulo some difficulties with integer arithmetic (as described below).  The expression representation classes, parser, and pretty-printer are complete, and I've begun work on the parsing and representation of type names and declarations.

Numbers

The Dylan Reference Manual states that

Implementations are required to support integers with at least 28 bits of precision. The overflow and underflow behavior is implementation-defined. (Some implementations may choose to have integers of unlimited size, but this is not required.)

Gwydion Dylan implements 32 bits of <integer> precision.  This makes use of d2c's two-word tagged representation for objects.  No checking for overflow or underflow is done.  Infinite-precision integers (bignums) are provided via the <extended-integer> class, included in an extensions module in the Dylan library.

Open Dylan, on the other hand, uses a tagged pointer representation with two tag bits:

  • #b00 => Object pointer
  • #b01 => <integer>
  • #b10 => <character>
  • #b11 => <unicode-character>

This leaves 30 bits for the integer values.  (Or 62 bits for 64-bit procesors, if we supported them.) As described here, integer overflows and underflows cause an exception to be raised.

Both implementations support a <double-integer> class, corresponding roughly to a C "long long int" (usually 64 bits).  In the case of Open Dylan, this is provided in the context of the big-integers and generic-arithmetic libraries, which allow users to re-map the <integer> class and the arithmetic functions to generic versions that allow more bits of precision than the standard 30 bits.

For CPR to support integer arithmetic in a generic way, infinite-precision integers are needed.  So that I can rely on having <extended-integer> with both compilers, I'm currently working on implementing the necessary support in Open Dylan.  Most of this is being placed in the big-integers library, but some changes to the compiler (DFMC) are needed too.  When I last left off I had implemented addition, subtraction, O(n2) multiplication, some of the logical arithmetic functions, and had started implementing division.  TAoCP volume 2 and the GMP manual detail some of  the more sophisticated algorithms that I'll add once the basic functionality is out of the way.

Several years ago I proposed, and then implemented in Gwydion Dylan, a number of extensions to the Common Dylan library for doing floating point arithmetic.  Implementing these extensions, in particular guaranteed-accurate conversions between decimal and floating-point, requires infinite-precision integers.  Once <extended-integer> support is complete these extensions can be added to Open Dylan as well.

(Actually, Bruce Hoult recently described in IRC a clever new decimal-to-floating point algorithm that allows one to limit the size of the integers involved to 1152 bits.)

There are additional things that can be done to improve integer arithmetic.  Using the #b01 tag means that when you add two tagged integers, you need to either mask the tag off of one of them, or subtract one from the result, in order to yield a valid tagged integer.  That's a waste of an instruction in a frequently-executed operation.  We can do better than that with the following tag scheme, inspired by something I first read in Appel's Compiling with Continuations:

  • #b00 => Even <integer> 
  • #b01 => Object pointer
  • #b10 => Odd <integer>
  • #b11 => <character>

Now adding or subtracting integers doesn't require any tag fixups, instance?(<integer>, x) is a single-bit test, and we get 31 bits (or 63 for 64-bit processors) instead of just 30 (or 62).  It doesn't matter that object pointer references are to one more than the actual (32-bit aligned) address, since most accesses to object fields use an immediate offset anyway, so we just reduce the offset in the addressing mode by 1.  (This would also require a change to the garbage collector so that it knows how to recognize offset pointers.)

I think we could do even better by making bignums transparent: integer arithmetic would transparently overflow into bignums when needed, making it possible for programmers to think of <integer> as the mathematical set of integers. The DRM could be changed to read:

Implementations are required to support integers of unlimited size.

Doing this efficiently would require that the compiler do integer range analysis in order to prove when fixed-precision integer arithmetic is sufficient.  This sort of analysis is currently done by d2c but not by the Open Dylan compiler.

Note that since the size field of most sequence types (such as <simple-object-vector>) will necessarily fit in a fixed-precision integer since the maximum number of elements is 1/4 (or 1/8) the size of the address space.  This constraint would allow a significant fraction of arithmetic operations to be optimized to use fixnum-only arithmetic. 

Open Dylan Project-Manager

The project-manager component (currently the projects, registry-projects, and user-projects libraries) of the Open Dylan compiler and development environment is responsible for:

  • Locating Dylan libraries given their names
  • Locating the database (*.ddb) file for a compiled library
  • Managing the project file (*.hdp or *.lid)
  • Keeping track of dependencies between libraries, and determining when a library needs to be recompiled
  • Determining whether particular cross-library bindings should be done in "tight" or "loose" mode
  • Interfacing with the compiler and build system to control compilation and linking
  • Interfacing with the "tools" plugin interface for constructing Dylan source files (or entire subprojects) out of some other source.  Current tools include tool-scepter (the CORBA IDL compiler), tool-parser-generator (the LALR(1) parser generator for *.dylgram files), and motley (the OLE type library interface generator).

There are currently three types of projects: binary-only projects, user projects, and registry projects.  Binary-only projects have only a *.ddb file and a shared library (*.dll/*.so), and of course can never be recompiled.  Binary projects belonged to library packs. Registry projects are opened via a registry, which is a directory tree containing one-line files which map library names to LID or *.hdp project files in a source code hierarchy.  User projects are opened by referencing their LID or *.hdp files directly.

Before r9982, things worked like this:

  • When opening a user project (i.e., via a LID or HDP file), the results of compilation (build temporaries, linked DLL, import library, etc.) were placed in subdirectories of the directory where the project file was found.
  • System projects were never recompiled, even if the source code was modified.

Since at least r10747, the project manager works like this:

  • No matter what kind of library you open, the results of compilation are stored in subdirectories of ~/Open-Dylan (on Linux/FreeBSD/Mac OS) or %AppData%\Open-Dylan (on Win32)
  • This means that the first time you compile something with Open Dylan, it recompiles all of the libraries on which it depends (such as dylan, common-dylan, io, system, etc.) into your personal root (such as ~/Open-Dylan)
  • Warning messages from system projects (dylan, common-dylan, etc.) show up in the Warnings tab in the IDE whenever you compile something that uses them
  • Registry projects are easier to use than they were before

I like this less than the old way, and have been known to revert a few of the relevant revisions in my local working copy during development. Of course, either way is short of ideal.

In order to solve this in a more flexible way, allowing people to customize the environment to get the behavior they want, I came up with the concept of Dylan Library Catalogs. So far I've defined the requirements, but don't have all of the design ready.

Other improvements can be made to the project manager, such as:

  • It should be possible to support projects containing no Dylan code, only (say) a Jamfile.  This would allow projects with significant amounts of non-Dylan code (such as Open Dylan itself) to be built entirely under the control of the Dylan environment.
  • It ought to be possible to do a re-link of a *.so shared library to change its rpath (as is often done using "libtool install") without re-compiling the library
  • The tools interface should be supported for registry projects as well as for user projects.  (Working around this currently makes the build process more complex than it needs to be.)

At one point Andy Armstrong began implementing a new project manager (the currently unused projects-protocol and dfmc-projects libraries).  I haven't yet tried hooking this up to the build to see how well it works.

Runtime Cleanup

The Open Dylan runtime source code is a bit of a mess. Problems include:

  • The run-time for the native code generator is found under sources/lib/run-time, whereas the run-time for the C code generator is under sources/dfmc/c-run-time
  • The source directories and their subdirectories contain a haphazard mix of platform-dependent and platform-independent files
  • In some cases source files get copied from one directory to another during build
  • Some *.c files #include other *.c files
  • Either MPS or the Boehm-Demers-Weiser garbage collector is supported; the choice is handled via a set of haphazard #ifdefs in the runtime, rather than providing a clean interface
  • Certain structures and interface functions are declared both in the C code and (in Dylan) in the run-time generator, requiring that they be manually kept in sync
  • The Makefiles for each platform are completely separate, and probably don't accurately reflect the inter-file dependencies
What it mainly needs is just a little refactoring care and attention.  I've been holding off doing any until I can replace the Makefiles with Jamfiles.  This could be done, I guess, except that the Jam automatic dependency mechanism depends on using a regular expression with \(...\) submatch addressing.  That is, unfortunately, one of the few things missing from the Open Dylan Jam implementation.  I have been planning to rectify this by adapting the Jam interpreter's regular-expression implementation to use one of the algorithms in Ville Laurikari's master's thesis or in his TRE regular expression library.

It's worth noting that we could also solve the problem of keeping the C and Dylan sides of the run-time interface in sync by using CPR to parse header files in the run-time.

As far as directory layout is concerned, I think it might make sense to place both source directories under lib/dylan, since the code is statically linked into the Dylan library.

Tags:

Milestone
housel
I finally made it to the minor prophets (Hosea何西亞) in my daily Bible reading in Chinese.  I started with the Gospel of John in 1995, when it was becoming obvious that Bible knowledge in Chinese was something I would have a continual need for.  I kept at it, eventually wrapping around to the Old Testament, and yesterday finally finishing the major prophets.

In 1990 I read in Wesley L. Duewel's Mighty Prevailing Prayer:
Every morning [Jonathan] Goforth, within a half hour of rising, began intensive Bible study with pencil and notebook.  Whether preaching or doing personal evangelism, Goforth always had an open Bible in his hand.  At one point in his life he had read the entire Bible thirty-five times in Chinese alone.  He had read his Chinese New Testament sixty times, and by the time of his death he had read the entire Bible consecutively seventy-three times.

It's going to be hard to match that.
Tags: ,

You are viewing housel