| Glulx | |
| A 32-Bit Virtual Machine for IF | |
| VM specification version 3.1.2 | |
| Andrew Plotkin <erkyrath@eblong.com> | |
| Copyright 1999-2010 by Andrew Plotkin. This specification is licensed under | |
| a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported | |
| License: <http://creativecommons.org/licenses/by-nc-sa/3.0> | |
| The virtual machine *described* by this document is an idea, not an | |
| expression of an idea, and is therefore not copyrightable. Anyone is free | |
| to write programs that run on the Glulx VM or make use of it, including | |
| compilers, interpreters, debuggers, and so on. | |
| This document and further Glulx information can be found | |
| at: <http://eblong.com/zarf/glulx/> | |
| 0 Introduction | |
| 0.1 Why Bother? | |
| 0.2 Glulx and Other IF Systems | |
| 0.3 Credits | |
| 1 The Machine | |
| 1.1 Input and Output | |
| 1.2 The Memory Map | |
| 1.3 The Stack | |
| 1.3.1 The Call Frame | |
| 1.3.2 Call Stubs | |
| 1.3.3 Calling and Returning | |
| 1.3.4 Calling and Returning Within Strings | |
| 1.3.5 Calling and Returning During Output Filtering | |
| 1.4 The Header | |
| 1.5 Instruction Format | |
| 1.6 Typable Objects | |
| 1.6.1 Strings | |
| 1.6.1.1 Unencoded strings | |
| 1.6.1.2 Unencoded Unicode strings | |
| 1.6.1.3 Compressed strings | |
| 1.6.1.4 The String-Decoding Table | |
| 1.6.2 Functions | |
| 1.6.3 Other Glulx Objects | |
| 1.6.4 User-Defined Objects | |
| 1.7 Floating-Point Numbers | |
| 1.8 The Save-Game Format | |
| 1.8.1 Contents of Dynamic Memory | |
| 1.8.2 Contents of the Stack | |
| 1.8.3 Memory Allocation Heap | |
| 1.8.4 Associated Story File | |
| 1.8.5 State Not Saved | |
| 2 Dictionary of Opcodes | |
| 2.1 Integer Math | |
| 2.2 Branches | |
| 2.3 Moving Data | |
| 2.4 Array Data | |
| 2.5 The Stack | |
| 2.6 Functions | |
| 2.7 Continuations | |
| 2.8 Memory Map | |
| 2.9 Memory Allocation Heap | |
| 2.10 Game State | |
| 2.11 Output | |
| 2.12 Floating-Point Math | |
| 2.13 Floating-Point Comparisons | |
| 2.14 Random Number Generator | |
| 2.15 Block Copy and Clear | |
| 2.16 Searching | |
| 2.17 Accelerated Functions | |
| 2.18 Miscellaneous | |
| 2.19 Assembly Language | |
| 0: Introduction | |
| Glulx is a simple solution to a fairly trivial problem. We want a virtual | |
| machine which the Inform compiler can compile to, without the increasingly | |
| annoying restrictions of the Z-machine. | |
| Glulx does this, without much fuss. All arithmetic is 32-bit (although | |
| there are opcodes to handle 8-bit and 16-bit memory access.) Input and | |
| output are handled through the Glk API (which chops out half the Z-machine | |
| opcodes, and most of the complexity of a Z-code interpreter.) Some care has | |
| been taken to make the bytecode small, but simplicity and elbow room are | |
| considered more important -- bytecode is not a majority of the bulk in | |
| current Inform games. | |
| 0.1: Why Bother? | |
| We're buried in IF VMs already, not to mention general VMs like Java, not | |
| to mention other interpreters or bytecode systems like Perl. Do we need | |
| another one? | |
| Well, maybe not, but Glulx is simple enough that it was easier to design | |
| and implement it than to use something else. Really. | |
| The Inform compiler already does most of the work of translating a | |
| high-level language to bytecode. It has long since outgrown many of the | |
| IF-specific features of the Z-machine (such as the object structure.) So it | |
| makes sense to remove those features, leaving a generic VM. Furthermore, | |
| there are enough other constraints (Inform's assumption of a flat memory | |
| model, the desire to have a lightweight VM suitable for PDAs) that no | |
| existing system is really ideal. So it seems worthwhile to design a new one. | |
| Indeed, most of the effort that has gone into this system has been | |
| modifying Inform. Glulx itself is nearly an afterthought. | |
| 0.2: Glulx and Other IF Systems | |
| Glulx grew out of the desire to extend Inform. However, it may well be | |
| suitable as a VM for other IF systems. | |
| Or maybe not. Since Glulx *is* so lightweight, a compiler has to be fairly | |
| complex to compile to it. Many IF systems take the approach of a simple | |
| compiler, and a complex, high-level, IF-specific interpreter. Glulx is not | |
| suitable for this. | |
| However, if a system wants to use a simple runtime format with 32-bit data, | |
| Glulx may be a good choice. | |
| Note that this is entirely separate from the question of the I/O layer. Glulx | |
| uses the Glk I/O API, for the sake of simplicity and portability. Any IF | |
| system can do the same. One can use Glk I/O without using the Glulx | |
| game-file format. | |
| Conversely, one could also extend the Glulx VM to use a different I/O | |
| system instead of Glk. One such extension is FyreVM, a commercial IF system | |
| developed by Textfyre. FyreVM is described at | |
| <http://textfyre.com/fyrevm/>. | |
| Other extension projects, not yet solidified, are being developed by Dannii | |
| Willis. See <http://curiousdannii.github.com/if/>. | |
| This specification does not cover FyreVM and the other projects, except to | |
| note opcodes, gestalt selectors, and iosys values that are specific to them. | |
| 0.3: Credits | |
| Graham Nelson gets pretty much all of it. Without Inform, there would be no | |
| reason for any of this. The entirety of Glulx is fallout from my attempt to | |
| deconstruct Inform and rebuild its code generator in my own image, with | |
| Graham's support. | |
| 1: The Machine | |
| The Glulx machine consists of main memory, the stack, and a few registers | |
| (the program counter, the stack pointer, and the call-frame pointer.) | |
| Main memory is a simple array of bytes, numbered from zero up. When | |
| accessing multibyte values, the most significant byte is stored first | |
| (big-endian). Multibyte values are not necessarily aligned in memory. | |
| The stack is an array of values. It is not a part of main memory; the terp | |
| maintains it separately. The format of the stack is technically up to the | |
| implementation. However, the needs of the machine (especially the game-save | |
| format) leave about one good option. (See section 1.8, "The Save-Game | |
| Format".) One important point: the stack can be kept in either byte | |
| ordering. The program should make no assumptions about endianness on the | |
| stack. (In fact, programs should never need to care.) Values on the stack | |
| always have their natural alignment (16-bit values at even addresses, | |
| 32-bit values at multiples of four). | |
| The stack consists of a set of call frames, one for each function in the | |
| current chain. When a function is called, a new stack frame is pushed, | |
| containing the function's local variables. The function can then push or | |
| pull 32-bit values on top of that, to store intermediate computations. | |
| All values are treated as unsigned integers, unless otherwise noted. Signed | |
| integers are handled with the usual two's-complement notation. Arithmetic | |
| overflows and underflows are truncated, also as usual. | |
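The byte-ordering and truncation rules above can be sketched in a few lines of Python. This is only an illustration; the helper names (`read_u32`, `add32`, `to_signed`) are hypothetical and do not appear in the specification.

```python
MASK32 = 0xFFFFFFFF

def read_u32(mem: bytes, addr: int) -> int:
    """Read a big-endian 32-bit value; no alignment is required."""
    return int.from_bytes(mem[addr:addr + 4], "big")

def add32(a: int, b: int) -> int:
    """Add two 32-bit values, discarding any overflow."""
    return (a + b) & MASK32

def to_signed(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement."""
    return value - 0x100000000 if value & 0x80000000 else value

# Five bytes of memory; the 32-bit value starts at the odd address 1.
mem = bytes([0x00, 0x12, 0x34, 0x56, 0x78])
unaligned = read_u32(mem, 1)
wrapped = add32(0xFFFFFFFF, 2)
negative = to_signed(0xFFFFFFFE)
```

Keeping every value as a non-negative Python integer masked to 32 bits, and converting to signed only where an opcode calls for it, is one convenient way to follow the "unsigned unless otherwise noted" rule.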
| 1.1: Input and Output | |
| No input/output facilities are built into the Glulx machine itself. | |
| Instead, the machine has one or more opcodes which dispatch calls to an I/O | |
| library. | |
| At the moment, that means Glk. All Glulx interpreters support the Glk I/O | |
| facility (via the glk opcode), and no other I/O facilities exist. However, | |
| other I/O libraries may be adapted to Glulx in the future. For best behavior, | |
| a program should test for the presence of an I/O facility before using it, | |
| using the IOSystem gestalt selector (see section 2.18, "Miscellaneous"). | |
| One I/O system is set as current at any given time. This does not mean that | |
| the others are unavailable. (If the interpreter supports Glk, for example, | |
| the glk opcode will always function.) However, the basic Glulx output | |
| opcodes -- streamchar, streamnum, and streamstr -- always print using the | |
| current I/O system. | |
| Every Glulx interpreter supports at least one normal I/O facility (such as | |
| Glk), and also two special facilities. | |
| The "null" I/O system does nothing. If this is selected, all Glulx output | |
| is simply discarded. [[Silly, perhaps, but I like simple base cases.]] When | |
| the Glulx machine starts up, the null system is the current system. You | |
| must select a different one before using the streamchar, streamnum, or | |
| streamstr opcodes. | |
| The "filter" I/O system allows the Glulx program itself to handle output. | |
| The program specifies a function when selecting this I/O system. That | |
| function is then called for every single character of output that the | |
| machine generates (via streamchar, streamnum, or streamstr). The function | |
| can output its character directly via the glk opcode (or one of the other | |
| output opcodes). | |
| [[This may all seem rather baroque, but in practice most authors can ignore | |
| it. Most programs will want to test for the Glk facility, set it to be the | |
| current output system immediately, and then leave the I/O system alone for | |
| the rest of the game. All output will then automatically be handled through | |
| Glk.]] | |
| 1.2: The Memory Map | |
| Memory is divided into several segments. The sizes of the segments are | |
| determined by constant values in the game-file header. | |
| Segment Address (hex) | |
| +---------+ 00000000 | |
| | Header | | |
| | - - - - | 00000024 | |
| | | | |
| | ROM | | |
| | | | |
| +---------+ RAMSTART | |
| | | | |
| | RAM | | |
| | | | |
| | - - - - | EXTSTART | |
| | | | |
| | | | |
| +---------+ ENDMEM | |
| As you might expect, the section marked ROM never changes during execution; | |
| it is illegal to write there. Executable code and constant data are usually | |
| (but not necessarily) kept in ROM. Note that unlike the Z-machine, the | |
| Glulx machine's ROM comes before RAM; the 36-byte header is part of ROM. | |
| The boundary marked EXTSTART is a trivial gimmick for making game-files | |
| smaller. A Glulx game-file only stores the data from 0 to EXTSTART. When | |
| the terp loads it in, it allocates memory up to ENDMEM; everything above | |
| EXTSTART is initialized to zeroes. Once execution starts, there is no | |
| difference between the memory above and below EXTSTART. | |
| For the convenience of paging interpreters, the three boundaries RAMSTART, | |
| EXTSTART, and ENDMEM must be aligned on 256-byte boundaries. | |
| Any of the segments of memory can be zero-length, except that ROM must be | |
| at least 256 bytes long (so that the header fits in it). | |
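As a rough sketch of how a terp might build main memory from these rules: the function names `load_memory` and `store_byte` are hypothetical, and the ordering check (RAMSTART <= EXTSTART <= ENDMEM) is an assumption read off the diagram rather than a stated rule.

```python
def load_memory(game_file: bytes, ramstart: int, extstart: int, endmem: int) -> bytearray:
    """Allocate ENDMEM bytes, copy in the stored data, zero the rest."""
    for name, value in (("RAMSTART", ramstart), ("EXTSTART", extstart), ("ENDMEM", endmem)):
        if value % 256 != 0:
            raise ValueError(f"{name} must be aligned on a 256-byte boundary")
    # Assumption: the segments appear in the order shown in the diagram.
    if not (256 <= ramstart <= extstart <= endmem):
        raise ValueError("segments out of order, or ROM shorter than 256 bytes")
    if len(game_file) < extstart:
        raise ValueError("game file is shorter than EXTSTART")
    memory = bytearray(endmem)              # everything starts as zero
    memory[:extstart] = game_file[:extstart]
    return memory

def store_byte(memory: bytearray, ramstart: int, addr: int, value: int) -> None:
    """Write one byte, refusing to touch ROM (addresses below RAMSTART)."""
    if addr < ramstart:
        raise ValueError("illegal write to ROM")
    memory[addr] = value & 0xFF

# A 512-byte game file: 256 bytes of ROM followed by 256 bytes of stored RAM.
game = bytes(256) + bytes([1] * 256)
mem = load_memory(game, ramstart=256, extstart=512, endmem=1024)
```

After loading, nothing distinguishes the bytes above EXTSTART from those below it; both are ordinary writable RAM.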
| 1.3: The Stack | |
| The stack pointer starts at zero, and the stack grows upward. The maximum | |
| size of the stack is determined by a constant value in the game-file | |
| header. For convenience, this must be a multiple of 256. | |
| The stack pointer counts in bytes. If you push a 32-bit value on the stack, | |
| the pointer increases by four. | |
| 1.3.1: The Call Frame | |
| A call frame looks like this: | |
| +------------+ FramePtr | |
| | Frame Len | (4 bytes) | |
| | Locals Pos | (4 bytes) | |
| | | | |
| | Format of | (2*n bytes) | |
| | Locals | | |
| | | | |
| | Padding | (0 or 2 bytes) | |
| +------------+ FramePtr+LocalsPos | |
| | Locals | (1, 2, or 4 bytes each) | |
| | | | |
| | Padding | (0 to 3 bytes) | |
| +------------+ FramePtr+FrameLen | |
| | Values | (4 bytes each) | |
| | .... | | |
| +------------+ StackPtr | |
| When a function begins executing, the last segment is empty (StackPtr | |
| equals FramePtr+FrameLen.) Computation can push and pull 32-bit values on | |
| the stack. It is illegal to pop back beyond the original FramePtr+FrameLen | |
| boundary. | |
| The "locals" are a list of values which the function uses as local | |
| variables. These also include function arguments. (The first N locals can | |
| be used as the arguments to an N-argument function.) Locals can be 8, 16, | |
| or 32-bit values. They are not necessarily contiguous; padding is inserted | |
| wherever necessary to bring a value to its natural alignment (16-bit values | |
| at even addresses, 32-bit values at multiples of four). | |
| The "format of locals" is a series of bytes, describing the arrangement of | |
| the "locals" section of the frame (from LocalsPos up to FrameLen). This | |
| information is copied directly from the header of the function being | |
| called. (See section 1.6.2, "Functions".) | |
| Each field in this section is two bytes: | |
| * LocalType: 1, 2, or 4, indicating a set of locals which are that many | |
| bytes each. | |
| * LocalCount: 1 to 255, indicating how many locals of LocalType to | |
| declare. | |
| The section is terminated by a pair of zero bytes. Another pair of zeroes | |
| is added if necessary to reach a four-byte boundary. | |
| (Example: if a function has three 8-bit locals followed by six 16-bit | |
| locals, the format segment would contain eight bytes: (1, 3, 2, 6, 0, 0, 0, | |
| 0). The locals segment would then be 16 bytes long, with a padding byte | |
| after the third local.) | |
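The example above can be reproduced with a short Python sketch. The function `layout_locals` and its return shape are hypothetical; it simply applies the padding and alignment rules described in this section.

```python
def layout_locals(groups):
    """groups is a list of (LocalType, LocalCount) pairs.

    Returns (format_bytes, locals_pos, offsets, frame_len), where offsets
    lists (offset within the locals segment, size) for each local.
    """
    fmt = bytearray()
    for local_type, count in groups:
        if local_type not in (1, 2, 4) or not 1 <= count <= 255:
            raise ValueError("bad LocalType or LocalCount")
        fmt += bytes([local_type, count])
    fmt += b"\x00\x00"                      # terminating pair of zero bytes
    # FrameLen and LocalsPos occupy the first 8 bytes of the frame.
    if (8 + len(fmt)) % 4:
        fmt += b"\x00\x00"                  # extra pair to reach a 4-byte boundary
    locals_pos = 8 + len(fmt)

    offsets = []
    pos = 0
    for local_type, count in groups:
        # Round up to the natural alignment of this group's type.
        pos = (pos + local_type - 1) // local_type * local_type
        for _ in range(count):
            offsets.append((pos, local_type))
            pos += local_type
    locals_len = (pos + 3) // 4 * 4         # trailing padding, 0 to 3 bytes
    return bytes(fmt), locals_pos, offsets, locals_pos + locals_len

# Three 8-bit locals followed by six 16-bit locals.
fmt, locals_pos, offsets, frame_len = layout_locals([(1, 3), (2, 6)])
```

For this input the first 16-bit local lands at offset 4, which is the padding byte after the third local that the example mentions.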
| The "format of locals" information is needed by the terp in two places: | |
| when calling a function (to write in function arguments), and when saving | |
| the game (to fix byte-ordering of the locals.) The formatting is *not* | |
| enforced by the terp while a function is executing. The program is not | |
| prevented from accessing locations whose size and position don't match the | |
| formatting, or locations that overlap, or even locations in the padding | |
| between locals. However, if a program does this, the results are undefined, | |
| because the byte-ordering of locals is up to the terp. The save-game | |
| algorithm will fail, if nothing else. | |
| [[In fact, the call frame may not exist as a byte sequence during function | |
| execution. The terp is free to maintain a more structured form, as long as | |
| it generates valid save-game files, and correctly handles accesses to valid | |
| (according to the format) locals.]] | |
| [[NOTE: 8-bit and 16-bit locals have never been in common use, and this | |
| spec has not been unambiguous in describing their handling. (By which I | |
| mean, what I implemented in the reference interpreter didn't match the | |
| spec.) Therefore, 8-bit and 16-bit locals are deprecated. Use of the copyb | |
| and copys opcodes with a local-variable operand is also deprecated.]] | |
| 1.3.2: Call Stubs | |
| Several different Glulx operations require the ability to jump back to a | |
| previously-saved execution state. (For example: function call/return, | |
| game-state save/restore, and exception catch/throw.) | |
| For simplicity, all these operations store the execution state the same way | |
| -- as a "call stub" on the stack. This is a block of four 32-bit values. It | |
| encodes the PC and FramePtr, and also a location to store a single 32-bit | |
| value at jump-back time. (For example, the function return value, or the | |
| game-restore success flag.) | |
| The values are pushed on the stack in the following order (FramePtr pushed | |
| last): | |
| +-----------+ | |
| | DestType | (4 bytes) | |
| | DestAddr | (4 bytes) | |
| | PC | (4 bytes) | |
| | FramePtr | (4 bytes) | |
| +-----------+ | |
| FramePtr is the current value of FramePtr -- the stack position of the call | |
| frame of the function during which the call stub was generated. | |
| PC is the current value of the program counter. This is the address of the | |
| instruction *after* the one which caused the call stub to be generated. | |
| (For example, for a function call, the call stub contains the address of | |
| the first instruction to execute after the function returns.) | |
| DestType and DestAddr describe a location in which to store a result. This | |
| will occur after the operation is completed (function returned, game | |
| restored, etc). It happens after the PC and FramePtr are reloaded from the | |
| call stub, and the call stub is removed from the stack. | |
| DestType is one of the following values: | |
| * 0: Do not store. The result value is discarded. DestAddr should be | |
| zero. | |
| * 1: Store in main memory. The result value is stored in the | |
| main-memory address given by DestAddr. | |
| * 2: Store in local variable. The result value is stored in the call | |
| frame at position ((FramePtr+LocalsPos) + DestAddr). See section 1.5, | |
| "Instruction Format". | |
| * 3: Push on stack. The result value is pushed on the stack. DestAddr | |
| should be zero. | |
| The string-decoding mechanism complicates matters a little, since it is | |
| possible for a function to be called from inside a string, instead of | |
| another function. (See section 1.3.4, "Calling and Returning Within | |
| Strings".) The following DestType values allow this: | |
| * 10: Resume printing a compressed (E1) string. The PC value contains | |
| the address of the byte (within the string) to continue printing in. The | |
| DestAddr value contains the bit number (0 to 7) within that byte. | |
| * 11: Resume executing function code after a string completes. The PC | |
| value contains the program counter as usual, but the FramePtr field is | |
| ignored, since the string is printed in the same call frame as the function | |
| that executed it. DestAddr should be zero. | |
| * 12: Resume printing a signed decimal integer. The PC value contains | |
| the integer itself. The DestAddr value contains the position of the digit | |
| to print next. (0 indicates the first digit, or the minus sign for negative | |
| integers; and so on.) | |
| * 13: Resume printing a C-style (E0) string. The PC value contains the | |
| address of the character to print next. The DestAddr value should be zero. | |
| * 14: Resume printing a Unicode (E2) string. The PC value contains the | |
| address of the (four-byte) character to print next. The DestAddr value | |
| should be zero. | |
| 1.3.3: Calling and Returning | |
| When a function is called, the terp pushes a four-value call stub. (This | |
| includes the return-value destination, the PC, and the FramePtr; see | |
| section 1.3.2, "Call Stubs".) The terp then sets the FramePtr to the | |
| StackPtr, and builds a new call frame. (See section 1.3.1, "The Call | |
| Frame".) The PC moves to the first instruction of the function, and | |
| execution continues. | |
| Function arguments can be stored in the locals of the new call frame, or | |
| pushed on the stack above the new call frame. This is determined by the | |
| type of the function; see section 1.6.2, "Functions". | |
| When a function returns, the process is reversed. First StackPtr is set | |
| back to FramePtr, throwing away the current call frame (and any pushed | |
| values). The FramePtr and PC are popped off the stack, and then the | |
| return-value destination. The function's return value is stored where the | |
| destination says it should be. Then execution continues at the restored PC. | |
| (But note that a function can also return to a suspended string, as well as | |
| a suspended caller function. See section 1.3.4, "Calling and Returning | |
| Within Strings" and section 1.3.5, "Calling and Returning During Output | |
| Filtering".) | |
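The call and return sequence can be sketched as follows. This is a simplified model, not the reference interpreter: the `Machine` class and its method names are hypothetical, the stack is stored big-endian (the terp may choose either ordering), only 32-bit locals are supported, and only DestType values 0 to 3 are handled.

```python
import struct

class Machine:
    def __init__(self, memory_size=1024):
        self.memory = bytearray(memory_size)
        self.stack = bytearray()            # StackPtr is len(self.stack)
        self.frame_ptr = 0
        self.pc = 0

    def push32(self, value):
        self.stack += struct.pack(">I", value & 0xFFFFFFFF)

    def pop32(self):
        (value,) = struct.unpack(">I", self.stack[-4:])
        del self.stack[-4:]
        return value

    def enter_frame(self, num_locals):
        """Build a call frame holding num_locals (0 to 255) 32-bit locals."""
        fmt = bytes([4, num_locals, 0, 0]) if num_locals else bytes(4)
        locals_pos = 8 + len(fmt)
        frame_len = locals_pos + 4 * num_locals
        self.frame_ptr = len(self.stack)
        self.stack += struct.pack(">II", frame_len, locals_pos) + fmt
        self.stack += bytes(4 * num_locals)

    def call(self, entry_pc, num_locals, dest_type, dest_addr):
        # self.pc is assumed to already point past the call instruction.
        for value in (dest_type, dest_addr, self.pc, self.frame_ptr):
            self.push32(value)              # FramePtr is pushed last
        self.enter_frame(num_locals)
        self.pc = entry_pc

    def ret(self, value):
        del self.stack[self.frame_ptr:]     # discard frame and pushed values
        self.frame_ptr = self.pop32()
        self.pc = self.pop32()
        dest_addr = self.pop32()
        dest_type = self.pop32()
        packed = struct.pack(">I", value & 0xFFFFFFFF)
        if dest_type == 1:                  # store in main memory
            self.memory[dest_addr:dest_addr + 4] = packed
        elif dest_type == 2:                # store in a local of the caller
            (locals_pos,) = struct.unpack_from(">I", self.stack, self.frame_ptr + 4)
            pos = self.frame_ptr + locals_pos + dest_addr
            self.stack[pos:pos + 4] = packed
        elif dest_type == 3:                # push on the caller's stack
            self.push32(value)
        elif dest_type != 0:                # 0 means discard
            raise NotImplementedError("string-related stubs are not modelled")

m = Machine()
m.enter_frame(num_locals=1)                 # pretend a caller is running
m.pc = 0x100
m.call(entry_pc=0x200, num_locals=2, dest_type=3, dest_addr=0)
m.ret(42)
```

Note that the result is stored only after FramePtr and PC have been restored, so a DestType of 2 or 3 refers to the caller's frame rather than the frame that just returned.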
| 1.3.4: Calling and Returning Within Strings | |
| Glulx uses a Huffman string-compression scheme. This allows bit sequences | |
| in strings to decode to large strings, or even function invocations which | |
| generate output. This means the streamstr opcode can invoke function calls, | |
| and we must therefore be able to represent this situation on the stack. | |
| When the terp begins printing a string, it pushes a type-11 call stub. | |
| (This includes only the current PC. The FramePtr is included, for | |
| consistency's sake, but it will be ignored when the call stub is read back | |
| off.) The terp then starts decoding the string data. The PC now indicates | |
| the position within the string data. | |
| If, during string decoding, the terp encounters an indirect reference to a | |
| string or function, it pushes a type-10 call stub. This includes the | |
| string-decoding PC, and the bit number within that address. It also | |
| includes the current FramePtr, which has not changed since string-printing | |
| began. | |
| If the indirect reference is to another string, the decoding continues at | |
| the new location after the type-10 stub is pushed. However, if the | |
| reference is to a function, the usual call frame is pushed on top of the | |
| type-10 stub, and the terp returns to normal function execution. | |
| When a string completes printing, the terp pops a call stub. This will | |
| necessarily be either a type-10 or type-11. If the former, the terp resumes | |
| string decoding at the PC address/bit number in the stub. If the latter, | |
| the topmost string is finished, and the terp resumes function execution at | |
| the stub's PC. | |
| When a function returns, it must check to see if it was called from within | |
| a string, instead of from another function. This is the case if the call | |
| stub it pops is type-10. (The call stub cannot be type-11.) If so, the | |
| FramePtr is taken from the stub as usual; but the stub's PC is taken to | |
| refer to a string data address, with the DestAddr value being the bit | |
| number within that address. (The function's return value is discarded.) | |
| String decoding resumes from there. | |
| [[It may seem wasteful for the terp to push and pop a call stub every time | |
| a string is printed. Fortunately, in the most common case -- printing a | |
| string with no indirect references at all -- this can easily be optimized | |
| out. (No VM code is executed between the push and pop, so it is safe to | |
| skip them.) Similarly, when printing an unencoded (E0) string, there can be | |
| no indirect references, so it is safe to optimize away the call stub | |
| push/pop.]] | |
| 1.3.5: Calling and Returning During Output Filtering | |
| The "filter" I/O system allows the terp to call a Glulx function for each | |
| character that is printed via streamchar, streamnum, or streamstr. We must | |
| be able to represent this situation on the call stack as well. | |
| If filtering is the current I/O system, then when the terp executes | |
| streamchar, it pushes a normal function call stub and begins executing the | |
| output function. Nothing else is required; when the function returns, | |
| execution will resume after the streamchar opcode. (A type-0 call stub is | |
| used, so the function's return value is discarded.) | |
| The other output opcodes are more complex. When the terp executes | |
| streamnum, it pushes a type-11 call stub. As before, this records the | |
| current PC. The terp then pushes a type-12 call stub, which contains the | |
| integer being printed and the position of the next character to be printed | |
| (namely 1). It then executes the output function. | |
| When the output function returns, the terp pops the type-12 stub and | |
| realizes that it should continue printing the integer contained therein. It | |
| pushes another type-12 stub back on the stack, indicating that the next | |
| position to print is 2, and calls the output function again. | |
| This process continues until there are no more characters in the decimal | |
| representation of the integer. The terp then pops the type-11 stub, | |
| restores the PC, and resumes execution after the streamnum opcode. | |
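The streamnum loop described above can be modelled roughly as below. This is a toy simulation under several assumptions: the stack is a Python list of (DestType, DestAddr, PC, FramePtr) tuples, the filter function is an ordinary Python callable rather than Glulx code, and the name `stream_num_filtered` is hypothetical.

```python
def stream_num_filtered(value, pc, frame_ptr, output_func):
    """Print a 32-bit value as signed decimal, one character per filter call.

    Returns the PC at which execution resumes after the streamnum opcode.
    """
    stack = []
    stack.append((11, 0, pc, frame_ptr))    # type-11: resume code afterwards
    signed = value - (1 << 32) if value & 0x80000000 else value
    text = str(signed)
    position = 0
    while position < len(text):
        # Type-12 stub: the PC slot holds the integer, DestAddr the position
        # of the character to print after this one.
        stack.append((12, position + 1, value, frame_ptr))
        output_func(ord(text[position]))    # "call" the filter function
        # The filter has returned; pop the stub to learn where to resume.
        dest_type, position, value, frame_ptr = stack.pop()
        assert dest_type == 12
    dest_type, _, resume_pc, _ = stack.pop()
    assert dest_type == 11
    return resume_pc

printed = []
resume_at = stream_num_filtered(0xFFFFFFD6, pc=0x40, frame_ptr=0,
                                output_func=printed.append)
```

In a real terp the filter is a Glulx function with its own call frame, so the stub is what lets the terp find its place again when that function returns; the Python loop here merely stands in for that round trip.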
| The streamstr opcode works on the same principle, except that instead of | |
| type-12 stubs, the terp uses type-10 stubs (when interrupting an encoded | |
| string) and type-13/14 stubs (when interrupting a C-style, null-terminated | |
| string of bytes/Unicode chars). Type-13 and type-14 stubs look like the | |
| others, except that they contain only the address of the next character to | |
| print; no other position or bit number is necessary. | |
| The interaction between the filter I/O system and indirect string/function | |
| calls within encoded strings is left to the reader's imagination. [[Because | |
| I couldn't explain it if I tried. Follow the rules; they work.]] | |
| 1.4: The Header | |
| The header is the first 36 bytes of memory. It is always in ROM, so its | |
| contents cannot change during execution. The header is organized as nine | |
| 32-bit values. (Recall that values in memory are always big-endian.) | |
| +---------------+ address 0 | |
| | Magic Number | (4 bytes) | |
| | Glulx Version | (4 bytes) | |
| | RAMSTART | (4 bytes) | |
| | EXTSTART | (4 bytes) | |
| | ENDMEM | (4 bytes) | |
| | Stack Size | (4 bytes) | |
| | Start Func | (4 bytes) | |
| | Decoding Tbl | (4 bytes) | |
| | Checksum | (4 bytes) | |
| +---------------+ | |
| * Magic number: 47 6C 75 6C, which is to say ASCII 'Glul'. | |
| * Glulx version number: The upper 16 bits store the major version | |
| number; the next 8 bits store the minor version number; the low 8 bits | |
| store an even more minor version number, if any. This specification is | |
| version 3.1.2, so a game file generated to this spec would contain 00030102. | |
| * RAMSTART: The first address which the program can write to. | |
| * EXTSTART: The end of the game-file's stored initial memory (and | |
| therefore the length of the game file.) | |
| * ENDMEM: The end of the program's memory map. | |
| * Stack size: The size of the stack needed by the program. | |
| * Address of function to execute: Execution commences by calling this | |
| function. | |
| * Address of string-decoding table: This table is used to decode | |
| compressed strings. See section 1.6.1.3, "Compressed strings". This may be | |
| zero, indicating that no compressed strings are to be decoded. [[Note that | |
| the game can change which table the terp is using, with the setstringtbl | |
| opcode. See section 2.11, "Output".]] | |
| * Checksum: A simple sum of the entire initial contents of memory, | |
| considered as an array of big-endian 32-bit integers. The checksum should | |
| be computed with this field set to zero. | |
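A minimal sketch of reading the header and computing the checksum, in Python. The names `parse_header`, `compute_checksum`, and the dictionary keys are hypothetical. The sketch assumes the sum is truncated to 32 bits (in line with the general arithmetic rule) and that the stored data is a whole number of 32-bit words.

```python
import struct

HEADER_FIELDS = ("magic", "version", "ramstart", "extstart", "endmem",
                 "stack_size", "start_func", "decoding_tbl", "checksum")

def parse_header(data: bytes) -> dict:
    """Unpack the nine big-endian 32-bit header fields."""
    if len(data) < 36:
        raise ValueError("file too short to contain a header")
    header = dict(zip(HEADER_FIELDS, struct.unpack(">9I", data[:36])))
    if header["magic"] != 0x476C756C:       # ASCII 'Glul'
        raise ValueError("bad magic number")
    return header

def compute_checksum(data: bytes) -> int:
    """Sum the file as 32-bit words, treating the checksum field as zero."""
    if len(data) < 36 or len(data) % 4:
        raise ValueError("expected a header and a whole number of 32-bit words")
    words = struct.unpack(f">{len(data) // 4}I", data)
    # Word 8 (bytes 32 to 35) is the checksum field itself.
    return (sum(words) - words[8]) & 0xFFFFFFFF

# Build a tiny 256-byte file by hand and stamp its checksum in.
image = bytearray(256)
struct.pack_into(">9I", image, 0,
                 0x476C756C, 0x00030102, 256, 256, 256, 256, 0, 0, 0)
image[40] = 5
struct.pack_into(">I", image, 32, compute_checksum(bytes(image)))
header = parse_header(bytes(image))
```

Memory above EXTSTART is all zeroes at startup, so summing only the stored file gives the same result as summing the full initial memory.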
| The interpreter should validate the magic number and the Glulx version | |
| number. An interpreter which is written to version X.Y.Z of this | |
| specification should accept game files whose Glulx version is between X.0.0 | |
| and X.Y.*. (That is, the major version number should match; the minor | |
| version number should be less than or equal to Y; the subminor version | |
| number does not matter.) | |
| EXCEPTION: A version 3.* interpreter should accept version 2.0 game files. | |
| The only difference between spec 2.0 and spec 3.0 is that 2.0 lacks Unicode | |
| functionality. Therefore, an interpreter written to this version of the | |
| spec (3.1.2) should accept game files whose version is between 2.0.0 and | |
| 3.1.* (0x00020000 and 0x000301FF inclusive). | |
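The acceptance rule, including the 2.0 exception, can be written as a range test. The function name `accepts_version` is hypothetical, and the sketch follows the inclusive range quoted above literally.

```python
def accepts_version(game_version: int, terp_version: int = 0x00030102) -> bool:
    """Return True if a terp written to terp_version should accept the game."""
    terp_major = terp_version >> 16
    terp_minor = (terp_version >> 8) & 0xFF
    # Normally the range starts at X.0.0; a 3.* terp also accepts 2.0 files.
    lowest = 0x00020000 if terp_major == 3 else terp_major << 16
    # The subminor number does not matter, so the top of the range is X.Y.FF.
    highest = (terp_major << 16) | (terp_minor << 8) | 0xFF
    return lowest <= game_version <= highest

checks = {
    "3.1.2": accepts_version(0x00030102),
    "2.0.0": accepts_version(0x00020000),
    "3.1.255": accepts_version(0x000301FF),
    "3.2.0": accepts_version(0x00030200),
    "4.0.0": accepts_version(0x00040000),
}
```

With the default 3.1.2 terp, the first three entries are accepted and the last two are rejected.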
| [[These rules mean, in the vernacular, that minor version changes are | |
| backwards compatible, and subminor version changes are backwards and | |
| forwards compatible. If I add a feature which I expect every terp to | |
| implement (e.g. mzero and mcopy), then I bump the minor version number, and | |
| your game can use that feature without worrying about availability. If I | |
| add a feature which not all terps will implement (e.g. floating point), | |
| then I bump the subminor version number, and your game should only use the | |
| feature after doing a gestalt test for availability.]] | |
| [[The header is conventionally followed by a 32-bit word which describes | |
| the layout of data in the rest of the file. This value is *not* a part of | |
| the Glulx specification; it is the first ROM word after the header, not a | |
| part of the header. It is an option that compilers can insert, when | |
| generating Glulx files, to aid debuggers and decompilers. | |
| For Inform-generated Glulx files, this descriptive value is 49 6E 66 6F, | |
| which is to say ASCII 'Info'. There then follow several more bytes of data | |
| relevant to the Inform compiler. See the Glulx chapter of the Inform | |
| Technical Manual.]] | |
| 1.5: Instruction Format | |
| There are 2^28 Glulx opcodes, numbered from 0 to 0FFFFFFF. If this proves | |
| insufficient, more may be added in the future. | |
| An instruction is encoded as follows: | |
| +--------------+ | |
| | Opcode Num | (1 to 4 bytes) | |
| | | | |
| | Operand | (two per byte) | |
| | Addr Modes | | |
| | | | |
| | Operand Data | (as defined by | |
| | .... | addr modes) | |
| +--------------+ | |
| The opcode number OP, which can be anything up to 0FFFFFFF, may be packed | |
| into fewer than four bytes: | |
| * 00..7F: One byte, OP | |
| * 0000..3FFF: Two bytes, OP+8000 | |
| * 00000000..0FFFFFFF: Four bytes, OP+C0000000 | |
| Note that the length of this field can be decoded by looking at the top two | |
| bits of the first byte. Also note that, for example, 01 and 8001 and | |
| C0000001 all represent the same opcode. | |
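Decoding the opcode number from the top two bits of its first byte might look like this in Python. The function name `decode_opcode` is hypothetical.

```python
def decode_opcode(code: bytes, pc: int):
    """Return (opcode number, address of the byte following the opcode)."""
    first = code[pc]
    if first & 0x80 == 0:                   # top bit 0: one byte, OP
        return first, pc + 1
    if first & 0x40 == 0:                   # top bits 10: two bytes, OP+8000
        return int.from_bytes(code[pc:pc + 2], "big") - 0x8000, pc + 2
    # top bits 11: four bytes, OP+C0000000
    return int.from_bytes(code[pc:pc + 4], "big") - 0xC0000000, pc + 4

one = decode_opcode(bytes([0x01]), 0)
two = decode_opcode(bytes([0x80, 0x01]), 0)
four = decode_opcode(bytes([0xC0, 0x00, 0x00, 0x01]), 0)
```

All three calls yield opcode 1 and differ only in how many bytes they consume.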
| The operand addressing modes are a list of fields which tell where opcode | |
| arguments are read from or written to. Each is four bits long, and they are | |
| packed two to a byte. (They occur in the same order as the arguments, low | |
| bits first. If there are an odd number, the high bits of the last byte are | |
| left zero.) | |
| Since each addressing mode is a four-bit number, there are sixteen | |
| addressing modes. Each is associated with a fixed number of bytes in the | |
| "operand data" segment of the instruction. These bytes appear after the | |
| addressing modes, in the same order. (There is no alignment padding.) | |
| * 0: Constant zero. (Zero bytes) | |
| * 1: Constant, -80 to 7F. (One byte) | |
| * 2: Constant, -8000 to 7FFF. (Two bytes) | |
| * 3: Constant, any value. (Four bytes) | |
| * 4: (Unused) | |
| * 5: Contents of address 00 to FF. (One byte) | |
| * 6: Contents of address 0000 to FFFF. (Two bytes) | |
| * 7: Contents of any address. (Four bytes) | |
| * 8: Value popped off stack. (Zero bytes) | |
| * 9: Call frame local at address 00 to FF. (One byte) | |
| * A: Call frame local at address 0000 to FFFF. (Two bytes) | |
| * B: Call frame local at any address. (Four bytes) | |
| * C: (Unused) | |
| * D: Contents of RAM address 00 to FF. (One byte) | |
| * E: Contents of RAM address 0000 to FFFF. (Two bytes) | |
| * F: Contents of RAM, any address. (Four bytes) | |
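A sketch of unpacking the addressing modes and their operand data, in Python. The names `OPERAND_BYTES` and `decode_operands` are hypothetical; the number of operands is assumed to be known from the opcode. Constant-mode data is sign-extended to 32 bits and all other data is left unsigned.

```python
# Bytes of operand data for each addressing mode; 4 and C are unused.
OPERAND_BYTES = {0x0: 0, 0x1: 1, 0x2: 2, 0x3: 4,
                 0x5: 1, 0x6: 2, 0x7: 4,
                 0x8: 0,
                 0x9: 1, 0xA: 2, 0xB: 4,
                 0xD: 1, 0xE: 2, 0xF: 4}

def decode_operands(code: bytes, pc: int, count: int):
    """Return ([(mode, value), ...], address after the operand data)."""
    modes = []
    for i in range(count):
        byte = code[pc + i // 2]
        # Two modes per byte, low four bits first.
        modes.append((byte >> 4) if i % 2 else (byte & 0x0F))
    pc += (count + 1) // 2

    operands = []
    for mode in modes:
        if mode not in OPERAND_BYTES:
            raise ValueError(f"addressing mode {mode:X} is unused")
        size = OPERAND_BYTES[mode]
        is_constant = mode <= 0x3
        raw = int.from_bytes(code[pc:pc + size], "big", signed=is_constant)
        operands.append((mode, raw & 0xFFFFFFFF))
        pc += size
    return operands, pc

# Three operands: constant -1 (mode 1), local at offset 4 (mode 9), stack (mode 8).
operands, next_pc = decode_operands(bytes([0x91, 0x08, 0xFF, 0x04]), 0, 3)
```

Here the mode bytes are 91 and 08: the low nibble of the first byte is the first operand's mode, and the unused high nibble of the last byte is zero.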
| Things to note: | |
| The "constant" modes sign-extend their data into a 32-bit value; the other | |
| modes do not. This is just because negative constants occur more frequently | |
| than negative addresses. | |
| The indirect modes (all except "constant") access 32-bit fields, either in | |
| the stack or in memory. This means four bytes starting at the given | |
| address. A few opcodes are exceptions: copyb and copys (copy byte and copy | |
| short) access 8-bit and 16-bit fields (one or two bytes starting at the | |
| given address.) | |
| The "call frame local" modes access a field on the stack, starting at byte | |
| ((FramePtr+LocalsPos) + address). As described in section 1.3.1, "The Call | |
| Frame", this must be aligned with (and the same size as) one of the fields | |
| described in the function's locals format. It must not point outside the | |
| range of the current function's locals segment. | |
| The "contents of address" modes access a field in main memory, starting at | |
| byte (addr). The "contents of RAM" modes access a field in main memory, | |
| starting at byte (RAMSTART + addr). Since the byte-ordering of main memory | |
| is well-defined, these need not have any particular alignment or position. | |
| All address addition is truncated to 32 bits, and addresses are unsigned. | |
| So, for example, "contents of RAM" address FFFFFFFC (RAMSTART + FFFFFFFC) | |
| accesses the last 32-bit value in ROM, since it effectively subtracts 4 | |
| from RAMSTART. "Contents of address" FFFFFFFC would access the very last | |
| 32-bit value in main memory, assuming you can find a terp which handles | |
| four-gigabyte games. "Call frame local" FFFFFFFC is illegal; whether you | |
| interpret it as a negative number or a large positive number, it's outside | |
| the current call frame's locals segment. | |
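The sign-extension and address-wrapping rules above can be sketched in a few lines. This is an illustrative sketch only, not part of the specification: the helper names `constant_operand` and `ram_operand_address` are hypothetical, and operand data is assumed to arrive as big-endian bytes.

```python
MASK32 = 0xFFFFFFFF


def constant_operand(data: bytes) -> int:
    """Sign-extend a 0-, 1-, 2-, or 4-byte constant operand to 32 bits."""
    # signed=True performs the sign extension; the mask folds the result
    # back into an unsigned 32-bit representation.
    return int.from_bytes(data, "big", signed=True) & MASK32


def ram_operand_address(ramstart: int, addr: int) -> int:
    """Effective address for a "contents of RAM" operand.

    Address addition is truncated to 32 bits, so a large addr can wrap
    around and land below RAMSTART (that is, in ROM).
    """
    return (ramstart + addr) & MASK32
```

For instance, with a hypothetical RAMSTART of 0x100, `ram_operand_address(0x100, 0xFFFFFFFC)` wraps around to 0xFC, four bytes below RAMSTART.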
| Some opcodes store values as well as reading them in. Store operands use | |
| the same addressing modes, with a few exceptions: | |
| * 8: The value is pushed into the stack, instead of being popped off. | |
| * 3, 2, 1: These modes cannot be used, since it makes no sense to store | |
| to a constant. [[We delicately elide the subject of Fortran. And rule-based | |
| property algebras.]] | |
| * 0: This mode means "throw the value away"; it is not stored at all. | |
| Operands are evaluated from left to right. (This is important if there are | |
| several push/pop operands.) | |
| 1.6: Typable Objects | |
| It is convenient for a program to store object references as 32-bit | |
| pointers, and still determine the type of a reference at run-time. | |
| To facilitate this, structured objects in Glulx main memory follow a simple | |
| convention: the first byte indicates the type of the object. | |
| At the moment, there are only two kinds of Glulx objects: functions and | |
| strings. A program (or compiler, or library) may declare more, but the | |
| Glulx VM does not have to know about them. | |
| Of course, not every byte in memory is the start of a legitimate object. | |
| It is the program's responsibility to keep track of which values validly | |
| refer to typable objects. | |
| 1.6.1: Strings | |
| Strings have a type byte of E0 (for unencoded, C-style strings), E2 (for | |
| unencoded strings of Unicode values), or E1 (for compressed strings.) Types | |
| E3 to FF are reserved for future expansion of string types. | |
| 1.6.1.1: Unencoded strings | |
| An unencoded string consists of an E0 byte, followed by all the bytes of | |
| the string, followed by a zero byte. | |
| 1.6.1.2: Unencoded Unicode strings | |
| An unencoded Unicode string consists of an E2 byte, followed by three | |
| padding 0 bytes, followed by the Unicode character values (each one being a | |
| four-byte integer). Finally, there is a terminating value (four 0 bytes). | |
| Unencoded Unicode string | |
| +----------------+ | |
| | Type: E2 | (1 byte) | |
| | Padding: 00 | (3 bytes) | |
| | Characters.... | (any length, multiple of 4) | |
| | NUL: 00000000 | (4 bytes) | |
| +----------------+ | |
| Note that the character data is not encoded in UTF-8, UTF-16, or any other | |
| peculiar encoding. It is treated as an array of 32-bit integers (which are, | |
| as always in Glulx, stored big-endian). Each integer is a Unicode code | |
| point. | |
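As a rough illustration of the E2 layout, the following sketch reads such a string out of a memory image. It assumes main memory is available as a Python `bytes` object; the function name `decode_unicode_string` is made up for this example.

```python
def decode_unicode_string(memory: bytes, addr: int) -> str:
    """Read an unencoded Unicode (E2) string starting at addr."""
    if memory[addr] != 0xE2:
        raise ValueError("object at this address is not an E2 string")
    pos = addr + 4  # skip the type byte and the three padding bytes
    chars = []
    while True:
        # Each character is a big-endian 32-bit Unicode code point.
        code = int.from_bytes(memory[pos:pos + 4], "big")
        if code == 0:  # four zero bytes terminate the string
            break
        chars.append(chr(code))
        pos += 4
    return "".join(chars)
```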
| 1.6.1.3: Compressed strings | |
| A compressed string consists of an E1 byte, followed by a block of | |
| Huffman-encoded data. This should be read as a stream of bits, starting | |
| with the low bit (the 1 bit) of the first byte after the E1, proceeding | |
| through the high bit (the 128 bit), and so on with succeeding bytes. | |
| Decoding compressed strings requires looking up data in a Huffman table. | |
| The address of this table is normally found in the header. However, the | |
| program can select a different decompression table at run-time; see section | |
| 2.11, "Output". | |
| The Huffman table is logically a binary tree. Internal nodes are branch | |
| points; leaf nodes represent printable entities. To decode a string, begin | |
| at the root node. Read one bit from the bit stream, and go to the left or | |
| right child depending on its value. Continue reading bits and branching | |
| left or right, until you reach a leaf node. Print that entity. Then jump | |
| back to the root, and repeat the process. One particular leaf node | |
| indicates the end of the string (rather than any printable entity), and | |
| when the bit stream leads you to that node, you stop. | |
| [[This is a fairly slow process, with VM memory reads and a conditional | |
| test for every *bit* of the string. A terp can speed it up considerably by | |
| reading the Huffman table all at once, and caching it as native data | |
| structures. A binary tree is the obvious choice, but one can do even better | |
| (at the cost of some space) by looking up four-bit chunks at a time in a | |
| 16-branching tree.]] | |
| [[Note that decompression tables are not necessarily in ROM. This is | |
| particularly important for tables that are generated and selected at | |
| run-time. Furthermore, it is technically legal for a table in RAM to be | |
| altered at runtime -- possibly even when it is the currently-selected | |
| table. Therefore, an interpreter that caches or preloads this decompression | |
| data must be careful. If it caches data from RAM, it must watch for writes | |
| to that RAM space, and invalidate its cache upon seeing such a write.]] | |
| 1.6.1.4: The String-Decoding Table | |
| The decoding table has the following format: | |
| +-----------------+ | |
| | Table Length | (4 bytes) | |
| | Number of Nodes | (4 bytes) | |
| | Root Node Addr | (4 bytes) | |
| | Node Data .... | (table length - 12 bytes) | |
| +-----------------+ | |
| The table length is measured in bytes, from the beginning of the table to | |
| the end of the last node. The node count includes both branch and leaf | |
| nodes. [[There will, of course, be an odd number of nodes, and (N+1)/2 of | |
| them will be leaves.]] The root address indicates which node is the root of | |
| the tree; it is not necessarily the first node. This is an absolute | |
| address, not an offset from the beginning of the table. | |
| There then follow all the nodes, with no extra data before, between, or | |
| after them. They need not be in any particular order. There are several | |
| possible types of nodes, distinguished by their first byte. | |
| Branch (non-leaf node) | |
| +----------------+ | |
| | Type: 00 | (1 byte) | |
| | Left (0) Node | (4 bytes) | |
| | Right (1) Node | (4 bytes) | |
| +----------------+ | |
| The left and right node fields are addresses (again, absolute addresses) of | |
| the nodes to go to given a 0 or 1 bit from the bit stream. | |
| String terminator | |
| +----------------+ | |
| | Type: 01 | (1 byte) | |
| +----------------+ | |
| This ends the string-decoding process. | |
| Single character | |
| +----------------+ | |
| | Type: 02 | (1 byte) | |
| | Character | (1 byte) | |
| +----------------+ | |
| This prints a single character. [[The encoding scheme is the business of | |
| the I/O system; in Glk, it will be the Latin-1 character set.]] | |
| C-style string | |
| +----------------+ | |
| | Type: 03 | (1 byte) | |
| | Characters.... | (any length) | |
| | NUL: 00 | (1 byte) | |
| +----------------+ | |
| This prints an array of characters. Note that the array cannot contain a | |
| zero byte, since that is reserved to terminate the array. [[A zero byte can | |
| be printed using the single-character node type.]] | |
| Single Unicode character | |
| +----------------+ | |
| | Type: 04 | (1 byte) | |
| | Character | (4 bytes) | |
| +----------------+ | |
| This prints a single Unicode character. [[To be precise, it prints a 32-bit | |
| character, which will be interpreted as Unicode if the I/O system is Glk.]] | |
| C-style Unicode string | |
| +----------------+ | |
| | Type: 05 | (1 byte) | |
| | Characters.... | (any length, multiple of 4) | |
| | NUL: 00000000 | (4 bytes) | |
| +----------------+ | |
| This prints an array of Unicode characters. Note that the array cannot | |
| contain a zero word, since that is reserved to terminate the array. Also | |
| note that, unlike an E2-encoded string object, there is no padding. | |
| [[If the Glk library is unable to handle Unicode, node types 04 and 05 are | |
| still legal. However, characters beyond FF will be printed as 3F ("?").]] | |
| Indirect reference | |
| +----------------+ | |
| | Type: 08 | (1 byte) | |
| | Address | (4 bytes) | |
| +----------------+ | |
| This prints a string or calls a function, which is not actually part of the | |
| decoding table. The address may refer to a location anywhere in memory | |
| (including RAM.) It must be a valid Glulx string (see section 1.6.1, | |
| "Strings") or function (see section 1.6.2, "Functions"). If it is a string, | |
| it is printed. If a function, it is called (with no arguments) and the | |
| result is discarded. | |
| The management of the stack during an indirect string/function call is a | |
| bit tricky. See section 1.3.4, "Calling and Returning Within Strings". | |
| Double-indirect reference | |
| +----------------+ | |
| | Type: 09 | (1 byte) | |
| | Address | (4 bytes) | |
| +----------------+ | |
| This is similar to the indirect-reference node, but the address refers to a | |
| four-byte field in memory, and *that* contains the address of a string or | |
| function. The extra layer of indirection can be useful. For example, if the | |
| four-byte field is in RAM, its contents can be changed during execution, | |
| pointing to a new typable object, without modifying the decoding table | |
| itself. | |
| Indirect reference with arguments | |
| +----------------+ | |
| | Type: 0A | (1 byte) | |
| | Address | (4 bytes) | |
| | Argument Count | (4 bytes) | |
| | Arguments.... | (4*N bytes) | |
| +----------------+ | |
| Double-indirect reference with arguments | |
| +----------------+ | |
| | Type: 0B | (1 byte) | |
| | Address | (4 bytes) | |
| | Argument Count | (4 bytes) | |
| | Arguments.... | (4*N bytes) | |
| +----------------+ | |
| These work the same as the indirect and double-indirect nodes, but if the | |
| object found is a function, it will be called with the given argument list. | |
| If the object is a string, the arguments are ignored. | |
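The decoding procedure and node formats described above can be sketched as follows. This is a deliberately simplified illustration, not a reference implementation: it assumes memory is a `bytes` object, collects output into a Python string instead of sending it to an I/O system, and handles only node types 00 to 04. The indirect node types (08 to 0B) need the stack handling of section 1.3.4 and are left out. The function name is hypothetical.

```python
def decode_compressed_string(memory: bytes, string_addr: int, table_addr: int) -> str:
    """Decode an E1 string using the decoding table at table_addr."""
    if memory[string_addr] != 0xE1:
        raise ValueError("object at this address is not an E1 string")
    # The root node address is the third four-byte field of the table.
    root = int.from_bytes(memory[table_addr + 8:table_addr + 12], "big")
    out = []
    byte_pos = string_addr + 1
    bit = 0  # bits are consumed starting from the low bit of each byte
    node = root
    while True:
        node_type = memory[node]
        if node_type == 0x00:  # branch: read one bit, go left (0) or right (1)
            value = (memory[byte_pos] >> bit) & 1
            bit += 1
            if bit == 8:
                bit = 0
                byte_pos += 1
            field = node + (5 if value else 1)
            node = int.from_bytes(memory[field:field + 4], "big")
            continue
        if node_type == 0x01:  # string terminator
            return "".join(out)
        if node_type == 0x02:  # single character (Latin-1 assumed, as in Glk)
            out.append(chr(memory[node + 1]))
        elif node_type == 0x03:  # C-style string, terminated by a zero byte
            end = memory.index(0, node + 1)
            out.append(memory[node + 1:end].decode("latin-1"))
        elif node_type == 0x04:  # single 32-bit Unicode character
            out.append(chr(int.from_bytes(memory[node + 1:node + 5], "big")))
        else:
            raise NotImplementedError(f"node type {node_type:02X} not handled in this sketch")
        node = root  # after a leaf, start again from the root
```

A real interpreter would more likely cache the tree in native structures, as the note in section 1.6.1.3 suggests, rather than re-reading VM memory for every bit.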
| 1.6.2: Functions | |
| Functions have a type byte of C0 (for stack-argument functions) or C1 (for | |
| local-argument functions). Types C2 to DF are reserved for future expansion | |
| of function types. | |
| A Glulx function always takes a list of 32-bit arguments, and returns | |
| exactly one 32-bit value. (If you want a function which returns no value, | |
| discard or ignore it. Store operand mode zero is convenient.) | |
| If the type is C0, the arguments are passed on the stack, and are made | |
| available on the stack. After the function's call frame is constructed, all | |
| the argument values are pushed -- last argument pushed first, first | |
| argument topmost. Then the number of arguments is pushed on top of that. | |
| All locals in the call frame itself are initialized to zero. | |
| If the type is C1, the arguments are passed on the stack, and are written | |
| into the locals according to the "format of locals" list of the function. | |
| Arguments passed into 8-bit or 16-bit locals are truncated. It is | |
| legitimate for there to be too many or too few arguments. Extras are | |
| discarded silently; any locals left unfilled are initialized to zero. | |
| A function has the following structure: | |
| +------------+ | |
| | C0 or C1 | Type (1 byte) | |
| +------------+ | |
| | Format of | (2*n bytes) | |
| | Locals | | |
| +------------+ | |
| | Opcodes | | |
| | .... | | |
| +------------+ | |
| The locals-format list is encoded the same way it is on the stack; see | |
| section 1.3.1, "The Call Frame". This is a list of LocalType/LocalCount | |
| byte pairs, terminated by a zero/zero pair. (There is, however, no extra | |
| padding to reach four-byte alignment.) | |
| Note that although a LocalType/LocalCount pair can only describe up to 255 | |
| locals, there is no restriction on how many locals the function can have. | |
| It is legitimate to encode several pairs in a row with the same LocalType. | |
| Immediately following the two zero bytes, the instructions start. There is | |
| no explicit terminator for the function. | |
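To make the layout concrete, here is a small sketch that walks a function header and finds where the instructions begin. It assumes memory is a `bytes` object; `parse_function_header` is a hypothetical helper name, and the sketch does not validate LocalType values.

```python
def parse_function_header(memory: bytes, addr: int):
    """Return (type byte, locals-format pairs, address of first instruction)."""
    func_type = memory[addr]
    if func_type not in (0xC0, 0xC1):
        raise ValueError("object at this address is not a C0/C1 function")
    pos = addr + 1
    locals_format = []
    while True:
        local_type, local_count = memory[pos], memory[pos + 1]
        pos += 2
        if local_type == 0 and local_count == 0:
            break  # the zero/zero pair terminates the list
        locals_format.append((local_type, local_count))
    # Instructions start immediately after the two zero bytes.
    return func_type, locals_format, pos
```

A function with 258 four-byte locals would encode two pairs in a row, (4, 255) and (4, 3), which this loop returns as two separate entries.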
| 1.6.3: Other Glulx Objects | |
| There are no other Glulx objects at this time, but types 80 to BF are | |
| reserved for future expansion. Type 00 is also reserved; it indicates "no | |
| object", and should not be used by any typable object. A null reference's | |
| type would be considered 00. (Even though byte 00000000 of main memory is | |
| not in fact 00.) | |
| 1.6.4: User-Defined Objects | |
| Types 01 to 7F are available for use by the compiler, the library, or the | |
| program. Glulx will not use them. | |
| [[Inform uses 60 for dictionary words, and 70 for objects and classes. It | |
| reserves types 40 to 7F. Types 01 to 3F remain available for use by Inform | |
| programmers.]] | |
| 1.7: Floating-Point Numbers | |
| Glulx values are 32-bit integers, big-endian when stored in memory. To | |
| handle floating-point math, we must be able to encode float values as | |
| 32-bit values. Unsurprisingly, Glulx uses the big-endian, single-precision | |
| IEEE-754 encoding. (See | |
| <http://www.psc.edu/general/software/packages/ieee/ieee.php>.) This allows | |
| floats to be stored in memory, on the stack, in local variables, and in any | |
| other place that a 32-bit value appears. | |
| However, float values and integer values are *not* interchangeable. You | |
| cannot pass floats to the normal arithmetic opcodes, or vice versa, and | |
| expect to get meaningful answers. Always pass floats to the float opcodes | |
| and integers to the int opcodes, with the appropriate conversion opcodes to | |
| convert back and forth. (See section 2.12, "Floating-Point Math".) | |
| Floats have limited precision; they cannot represent all real values | |
| exactly. They can't even represent all integers exactly. (Integers between | |
| -1000000 and 1000000 (hex) have exact representations. Beyond that, the | |
| rounding error can be greater than 1. But when you get into fractions, | |
| errors are possible anywhere: 1/3 cannot be stored exactly.) | |
| Therefore, you must be careful when comparing results. A series of float | |
| operations may produce a result fractionally different from what you | |
| expect. When comparing float values, you will most often want to use the | |
| jfeq opcode, which tests whether two values are *near* each other (within a | |
| specified range). | |
| A float value has three fields in its 32 bits, from highest (the sign bit) | |
| to lowest: | |
| +---------------+ | |
| | Sign Bit (S) | (1 bit) | |
| | Exponent (E) | (8 bits) | |
| | Mantissa (M) | (23 bits) | |
| +---------------+ | |
| The interpretation of the value depends on the exponent value: | |
| * If E is FF and M is zero, the value is positive or negative infinity, | |
| depending on S. Infinite values represent overflows. (+Inf is 7F800000; | |
| -Inf is FF800000.) | |
| * If E is FF and M is nonzero, the value is a positive or negative NaN | |
| ("not a number"), depending on S. NaN values represent arithmetic failures. | |
| (+NaN values are in the range 7F800001 to 7FFFFFFF; -NaN are FF800001 to | |
| FFFFFFFF.) | |
| * If E is 00 and M is zero, the value is a positive or negative zero, | |
| depending on S. Zero values represent underflows, and also, you know, zero. | |
| (+0 is 00000000; -0 is 80000000.) | |
| * If E is 00 and M is nonzero, the value is a "denormalized" number, | |
| very close to zero: plus or minus 2^(-149)*M. | |
| * If E is anything else, the value is a "normalized" number: plus or | |
| minus 2^(E-150)*(800000+M). | |
| [[I'm using decimal exponents there amid all the hex constants. -149 is hex | |
| -95; -150 is hex -96. Sorry about that.]] | |
| The numeric formulas may look more familiar if you write them as | |
| 2^(-126)*(0.MMMM...) and 2^(E-127)*(1.MMMM...), where "0.MMMM..." is a | |
| fraction between zero and one (23 mantissa bits after the binal point) and | |
| "1.MMMM..." is a fraction between one and two. | |
| Some example values: | |
| * 0.0 = 00000000 (S=0, E=00, M=0) | |
| * 1.0 = 3F800000 (S=0, E=7F, M=0) | |
| * -2.0 = C0000000 (S=1, E=80, M=0) | |
| * 100.0 = 42C80000 (S=0, E=85, M=480000) | |
| * pi = 40490FDB (S=0, E=80, M=490FDB) | |
| * 2*pi = 40C90FDB (S=0, E=81, M=490FDB) | |
| * e = 402DF854 (S=0, E=80, M=2DF854) | |
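The field layout and the two numeric formulas can be checked against the example values with a short sketch. The helper names here are hypothetical; the conversion to and from native floats uses Python's standard `struct` module with big-endian single-precision format.

```python
import struct


def float_fields(word: int):
    """Split a 32-bit encoded float into (S, E, M)."""
    return (word >> 31) & 1, (word >> 23) & 0xFF, word & 0x7FFFFF


def value_from_fields(s: int, e: int, m: int) -> float:
    """Apply the denormalized/normalized formulas (E = FF is not handled)."""
    if e == 0xFF:
        raise ValueError("infinity and NaN have no numeric formula")
    if e == 0:
        magnitude = m * 2.0 ** -149                    # denormalized (or zero)
    else:
        magnitude = (0x800000 + m) * 2.0 ** (e - 150)  # normalized
    return -magnitude if s else magnitude


def encode_float(value: float) -> int:
    """Encode a native float as a 32-bit big-endian IEEE-754 single."""
    return struct.unpack(">I", struct.pack(">f", value))[0]
```

For example, `float_fields(0x42C80000)` yields S=0, E=85, M=480000 (hex), and feeding those fields to `value_from_fields` gives 100.0.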
| To give you an idea of the behavior of the special values: | |
| * 1 / 0 = +Inf | |
| * -1 / 0 = -Inf | |
| * 1 / Inf = 0 | |
| * 1 / -Inf = -0 | |
| * 0 / 0 = NaN | |
| * 2 * 0 = 0 | |
| * 2 * -0 = -0 | |
| * +Inf * 0 = NaN | |
| * +Inf * 1 = +Inf | |
| * +Inf + +Inf = +Inf | |
| * +Inf * +Inf = +Inf | |
| * +Inf - +Inf = NaN | |
| * +Inf / +Inf = NaN | |
| NaN is sticky; almost *any* mathematical operation involving a NaN produces | |
| NaN. (There are a few exceptions.) | |
| However, Glulx does not guarantee *which* NaN value you will get from such | |
| operations. The underlying platform may try to encode information about | |
| what operation failed in the mantissa field of the NaN. Or, contrariwise, | |
| it may return the same value for every NaN. The sign bit, similarly, is | |
| never guaranteed. (The sign may be preserved if that's meaningful for the | |
| failed operation, but it may not be.) You should not test for NaN by | |
| comparing to a fixed encoded value; instead, use the jisnan opcode. | |
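The reason a fixed comparison fails is that NaN is a whole range of encodings. A sketch of a field-based test, following the E and M rules listed earlier, might look like this. The function names are illustrative; this is not a description of how any interpreter implements jisnan or jisinf.

```python
def is_nan(word: int) -> bool:
    """True if E is FF and M is nonzero, regardless of the sign bit."""
    return (word & 0x7F800000) == 0x7F800000 and (word & 0x007FFFFF) != 0


def is_inf(word: int) -> bool:
    """True if E is FF and M is zero, regardless of the sign bit."""
    return (word & 0x7FFFFFFF) == 0x7F800000
```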
| 1.8: The Save-Game Format | |
| (Or, if you like, "serializing the machine state".) | |
| This is a variant of Quetzal, the standard Z-machine save file format. (See | |
| <http://ifarchive.org/if-archive/infocom/interpreters/specification/savefile_14.txt>.) | |
| Everything in the Quetzal specification applies, with the following | |
| exceptions: | |
| 1.8.1: Contents of Dynamic Memory | |
| In both compressed and uncompressed form, the memory chunk ('CMem' or | |
| 'UMem') starts with a four-byte value, which is the current size of memory. | |
| The memory data then follows. During a restore, the size of memory is | |
| changed to this value. | |
| The memory area to be saved does not start at address zero, but at | |
| RAMSTART. It continues to the current end of memory (which may not be the | |
| ENDMEM value in the header.) When generating or reading compressed data | |
| ('CMem' chunk), the data above EXTSTART is handled as if the game file were | |
| extended with as many zeroes as necessary. | |
| 1.8.2: Contents of the Stack | |
| Before the stack is written out, a four-value call stub is pushed on -- | |
| result destination (type and address), PC, and FramePtr. (See section | |
| 1.3.2, "Call Stubs".) | |
| Then the entire stack can be written out, with all of its values (of | |
| whatever size) transformed to big-endian. (Padding is not skipped; it's | |
| written out as the appropriate number of zero bytes.) | |
| When the game-state is loaded back in -- or, for that matter, when | |
| continuing after a game-save -- the four values are read back off the | |
| stack, a result code for the operation is stored in the appropriate | |
| destination, and execution continues. | |
| [[Remember that in a call stub, the PC contains the address of the | |
| instruction *after* the one being executed.]] | |
| 1.8.3: Memory Allocation Heap | |
| If the heap is active (see section 2.9, "Memory Allocation Heap"), an | |
| allocation heap chunk is written ('MAll'). This chunk contains two | |
| four-byte values, plus two more for each extant memory block: | |
| * Heap start address | |
| * Number of extant blocks | |
| * Address of first block | |
| * Length of first block | |
| * Address of second block | |
| * Length of second block | |
| * ... | |
| The blocks need not be listed in any particular order. | |
| If the heap is not active, the 'MAll' chunk can contain 0,0 or it may be | |
| omitted. | |
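As a sketch of the 'MAll' layout, the following helpers build and read the chunk's data. They cover only the chunk contents listed above, not the surrounding IFF chunk header that Quetzal adds; the names `pack_mall` and `unpack_mall` are hypothetical, and all values are assumed to be big-endian, as elsewhere in Glulx.

```python
def pack_mall(heap_start: int, blocks) -> bytes:
    """Serialize the heap start, block count, and (address, length) pairs."""
    data = heap_start.to_bytes(4, "big") + len(blocks).to_bytes(4, "big")
    for address, length in blocks:
        data += address.to_bytes(4, "big") + length.to_bytes(4, "big")
    return data


def unpack_mall(data: bytes):
    """Parse chunk data back into (heap_start, [(address, length), ...])."""
    words = [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]
    heap_start, count = words[0], words[1]
    blocks = [(words[2 + 2 * i], words[3 + 2 * i]) for i in range(count)]
    return heap_start, blocks
```

An inactive heap written as "0,0" corresponds to `pack_mall(0, [])`, which produces eight zero bytes.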
| 1.8.4: Associated Story File | |
| The contents of the game-file identifier ('IFhd' chunk) are simply the | |
| first 128 bytes of memory. This is within ROM (since RAMSTART is at least | |
| 256), so it does not vary during play. It includes the story file length | |
| and checksum, as well as any compiler-specific information that may be | |
| stored immediately after the header. | |
| 1.8.5: State Not Saved | |
| Some aspects of Glulx execution are not part of the save process, and | |
| therefore are not changed during a restart, restore, or restoreundo | |
| operation. The program is responsible for checking these values after a | |
| restore to see if they have (from the program's point of view) changed | |
| unexpectedly. | |
| Examples of information which is not saved: | |
| * Glk library state. This includes Glk opaque objects (windows, | |
| filerefs, streams). It also includes I/O state such as the current output | |
| stream, contents of windows, and cursor positions. Accounting for Glk | |
| object changes after restore/restoreundo is tricky, but absolutely | |
| necessary. | |
| * The protected-memory range (position, length, and whether it exists | |
| at all). Note that the *contents* of the range (if it exists) are not | |
| treated specially during saving, and are therefore saved normally. | |
| * The random number generator's internal state. | |
| * The I/O system mode and current string-decoding table address. | |
| 2: Dictionary of Opcodes | |
| Opcodes are written here in the format: | |
| opname L1 L2 S1 | |
| ...where "L1" and "L2" are operands using the load addressing modes, and | |
| "S1" is an operand using the store addressing modes. (See section 1.5, | |
| "Instruction Format".) | |
| The table of opcodes: | |
| * 0x00: nop | |
| * 0x10: add | |
| * 0x11: sub | |
| * 0x12: mul | |
| * 0x13: div | |
| * 0x14: mod | |
| * 0x15: neg | |
| * 0x18: bitand | |
| * 0x19: bitor | |
| * 0x1A: bitxor | |
| * 0x1B: bitnot | |
| * 0x1C: shiftl | |
| * 0x1D: sshiftr | |
| * 0x1E: ushiftr | |
| * 0x20: jump | |
| * 0x22: jz | |
| * 0x23: jnz | |
| * 0x24: jeq | |
| * 0x25: jne | |
| * 0x26: jlt | |
| * 0x27: jge | |
| * 0x28: jgt | |
| * 0x29: jle | |
| * 0x2A: jltu | |
| * 0x2B: jgeu | |
| * 0x2C: jgtu | |
| * 0x2D: jleu | |
| * 0x30: call | |
| * 0x31: return | |
| * 0x32: catch | |
| * 0x33: throw | |
| * 0x34: tailcall | |
| * 0x40: copy | |
| * 0x41: copys | |
| * 0x42: copyb | |
| * 0x44: sexs | |
| * 0x45: sexb | |
| * 0x48: aload | |
| * 0x49: aloads | |
| * 0x4A: aloadb | |
| * 0x4B: aloadbit | |
| * 0x4C: astore | |
| * 0x4D: astores | |
| * 0x4E: astoreb | |
| * 0x4F: astorebit | |
| * 0x50: stkcount | |
| * 0x51: stkpeek | |
| * 0x52: stkswap | |
| * 0x53: stkroll | |
| * 0x54: stkcopy | |
| * 0x70: streamchar | |
| * 0x71: streamnum | |
| * 0x72: streamstr | |
| * 0x73: streamunichar | |
| * 0x100: gestalt | |
| * 0x101: debugtrap | |
| * 0x102: getmemsize | |
| * 0x103: setmemsize | |
| * 0x104: jumpabs | |
| * 0x110: random | |
| * 0x111: setrandom | |
| * 0x120: quit | |
| * 0x121: verify | |
| * 0x122: restart | |
| * 0x123: save | |
| * 0x124: restore | |
| * 0x125: saveundo | |
| * 0x126: restoreundo | |
| * 0x127: protect | |
| * 0x130: glk | |
| * 0x140: getstringtbl | |
| * 0x141: setstringtbl | |
| * 0x148: getiosys | |
| * 0x149: setiosys | |
| * 0x150: linearsearch | |
| * 0x151: binarysearch | |
| * 0x152: linkedsearch | |
| * 0x160: callf | |
| * 0x161: callfi | |
| * 0x162: callfii | |
| * 0x163: callfiii | |
| * 0x170: mzero | |
| * 0x171: mcopy | |
| * 0x178: malloc | |
| * 0x179: mfree | |
| * 0x180: accelfunc | |
| * 0x181: accelparam | |
| * 0x190: numtof | |
| * 0x191: ftonumz | |
| * 0x192: ftonumn | |
| * 0x198: ceil | |
| * 0x199: floor | |
| * 0x1A0: fadd | |
| * 0x1A1: fsub | |
| * 0x1A2: fmul | |
| * 0x1A3: fdiv | |
| * 0x1A4: fmod | |
| * 0x1A8: sqrt | |
| * 0x1A9: exp | |
| * 0x1AA: log | |
| * 0x1AB: pow | |
| * 0x1B0: sin | |
| * 0x1B1: cos | |
| * 0x1B2: tan | |
| * 0x1B3: asin | |
| * 0x1B4: acos | |
| * 0x1B5: atan | |
| * 0x1B6: atan2 | |
| * 0x1C0: jfeq | |
| * 0x1C1: jfne | |
| * 0x1C2: jflt | |
| * 0x1C3: jfle | |
| * 0x1C4: jfgt | |
| * 0x1C5: jfge | |
| * 0x1C8: jisnan | |
| * 0x1C9: jisinf | |
| Opcodes 0x1000 to 0x10FF are reserved for use by FyreVM. Opcodes 0x1100 to | |
| 0x11FF are reserved for extension projects by Dannii Willis. These are not | |
| documented here. See section 0.2, "Glulx and Other IF Systems". | |
| 2.1: Integer Math | |
| add L1 L2 S1 | |
| Add L1 and L2, using standard 32-bit addition. Truncate the result to 32 | |
| bits if necessary. Store the result in S1. | |
| sub L1 L2 S1 | |
| Compute (L1 - L2), and store the result in S1. | |
| mul L1 L2 S1 | |
| Compute (L1 * L2), and store the result in S1. Truncate the result to 32 | |
| bits if necessary. | |
| div L1 L2 S1 | |
| Compute (L1 / L2), and store the result in S1. This is signed integer | |
| division. | |
| mod L1 L2 S1 | |
| Compute (L1 % L2), and store the result in S1. This is the remainder from | |
| signed integer division. | |
| In division and remainder, signs are annoying. Rounding is towards zero. The | |
| sign of a remainder equals the sign of the dividend. It is always true that | |
| (A / B) * B + (A % B) == A. Some examples (in decimal): | |
| 11 / 2 = 5 | |
| -11 / 2 = -5 | |
| 11 / -2 = -5 | |
| -11 / -2 = 5 | |
| 13 % 5 = 3 | |
| -13 % 5 = -3 | |
| 13 % -5 = 3 | |
| -13 % -5 = -3 | |
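Python's own `//` and `%` round towards negative infinity, so they do not match these rules directly. A minimal sketch of the truncating behavior on 32-bit values could look like this. The helper names are hypothetical, and division by zero is not addressed by the text above, so the sketch simply lets Python's `ZeroDivisionError` propagate.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret a 32-bit value as a signed integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def glulx_div(l1: int, l2: int) -> int:
    """Signed division, rounding towards zero; result as a 32-bit value."""
    a, b = to_signed(l1), to_signed(l2)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & MASK32


def glulx_mod(l1: int, l2: int) -> int:
    """Signed remainder; the sign follows the dividend."""
    a, b = to_signed(l1), to_signed(l2)
    remainder = abs(a) % abs(b)
    if a < 0:
        remainder = -remainder
    return remainder & MASK32
```

With these definitions the identity (A / B) * B + (A % B) == A holds for the examples listed, once the results are read back as signed values with `to_signed`.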
| neg L1 S1 | |
| Compute the negative of L1. | |
| bitand L1 L2 S1 | |
| Compute the bitwise AND of L1 and L2. | |
| bitor L1 L2 S1 | |
| Compute the bitwise OR of L1 and L2. | |
| bitxor L1 L2 S1 | |
| Compute the bitwise XOR of L1 and L2. | |
| bitnot L1 S1 | |
| Compute the bitwise negation of L1. | |
| shiftl L1 L2 S1 | |
| Shift the bits of L1 to the left (towards more significant bits) by L2 | |
| places. The bottom L2 bits are filled in with zeroes. If L2 is 32 or more, | |
| the result is always zero. | |
| ushiftr L1 L2 S1 | |
| Shift the bits of L1 to the right by L2 places. The top L2 bits are filled | |
| in with zeroes. If L2 is 32 or more, the result is always zero. | |
| sshiftr L1 L2 S1 | |
| Shift the bits of L1 to the right by L2 places. The top L2 bits are filled | |
| in with copies of the top bit of L1. If L2 is 32 or more, the result is | |
| always zero or FFFFFFFF, depending on the top bit of L1. | |
| Notes on the shift opcodes: If L2 is zero, the result is always equal to | |
| L1. L2 is considered unsigned, so 80000000 or greater is "more than 32". | |
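The three shifts, including the "32 or more" cases, can be sketched as below. The function names mirror the opcode names for readability only; both arguments are assumed to be unsigned 32-bit values, which is how L2 is treated here.

```python
MASK32 = 0xFFFFFFFF


def shiftl(l1: int, l2: int) -> int:
    """Shift left, filling the bottom bits with zeroes."""
    return 0 if l2 >= 32 else (l1 << l2) & MASK32


def ushiftr(l1: int, l2: int) -> int:
    """Shift right, filling the top bits with zeroes."""
    return 0 if l2 >= 32 else (l1 & MASK32) >> l2


def sshiftr(l1: int, l2: int) -> int:
    """Shift right, filling the top bits with copies of the top bit of l1."""
    negative = bool(l1 & 0x80000000)
    if l2 >= 32:
        return MASK32 if negative else 0
    result = (l1 & MASK32) >> l2
    if negative:
        # Set the top l2 bits; for l2 == 0 this mask is empty.
        result |= (MASK32 << (32 - l2)) & MASK32
    return result
```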
| 2.2: Branches | |
| All branches (except jumpabs) specify their destinations with an offset | |
| value. The actual destination address of the branch is computed as (Addr + | |
| Offset - 2), where Addr is the address of the instruction *after* the | |
| branch opcode, and Offset is the branch's operand. The special offset | |
| values 0 and 1 are interpreted as "return 0" and "return 1" respectively. | |
| [[This odd hiccup is inherited from the Z-machine. Inform uses it heavily | |
| for code optimization.]] | |
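The offset rule, including the two special values, can be sketched as a small helper. The name `branch_target` and the tuple it returns are inventions for this example; `next_addr` stands for Addr, the address of the instruction after the branch.

```python
MASK32 = 0xFFFFFFFF


def branch_target(next_addr: int, offset: int):
    """Resolve a branch operand into ("return", value) or ("jump", address)."""
    offset &= MASK32
    if offset in (0, 1):
        # Offsets 0 and 1 mean "return 0" and "return 1".
        return ("return", offset)
    # Address arithmetic is truncated to 32 bits, so an offset such as
    # FFFFFFFE acts as a backwards branch.
    return ("jump", (next_addr + offset - 2) & MASK32)
```

Note that an offset of 2 branches to the very next instruction, which is why the formula subtracts 2.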
| It is legal to branch to code that is in another function. [[Indeed, there | |
| is no well-defined notion of where a function ends.]] However, this does | |
| not affect the current stack frame; that remains set up according to the | |
| same function call as before the branch. Similarly, it is legal to branch | |
| to code which is not associated with any function -- e.g., code compiled on | |
| the fly in RAM. | |
| jump L1 | |
| Branch unconditionally to offset L1. | |
| jz L1 L2 | |
| If L1 is equal to zero, branch to L2. | |
| jnz L1 L2 | |
| If L1 is not equal to zero, branch to L2. | |
| jeq L1 L2 L3 | |
| If L1 is equal to L2, branch to L3. | |
| jne L1 L2 L3 | |
| If L1 is not equal to L2, branch to L3. | |
| jlt L1 L2 L3 | |
| jle L1 L2 L3 | |
| jgt L1 L2 L3 | |
| jge L1 L2 L3 | |
| Branch to L3 if L1 is less than, less than or equal to, greater than, or | |
| greater than or equal to L2, respectively. The values are compared as | |
| signed 32-bit values. | |
| jltu L1 L2 L3 | |
| jleu L1 L2 L3 | |
| jgtu L1 L2 L3 | |
| jgeu L1 L2 L3 | |
| The same, except that the values are compared as unsigned 32-bit values. | |
| [[Since the address space can span the full 32-bit range, it is wiser to | |
| compare addresses with the unsigned comparison operators.]] | |
| jumpabs L1 | |
| Branch unconditionally to address L1. Unlike the other branch opcodes, this | |
| takes an absolute address, not an offset. The special cases 0 and 1 (for | |
| returning) do not apply; jumpabs 0 would branch to memory address 0, if | |
| that were ever a good idea, which it isn't. | |
| 2.3: Moving Data | |
| copy L1 S1 | |
| Read L1 and store it at S1, without change. | |
| copys L1 S1 | |
| Read a 16-bit value from L1 and store it at S1. | |
| copyb L1 S1 | |
| Read an 8-bit value from L1 and store it at S1. | |
| Since copys and copyb can access chunks smaller than the usual four bytes, | |
| they require some comment. When reading from main memory or the call-frame | |
| locals, they access two or one bytes, instead of four. However, when | |
| popping or pushing values on the stack, these opcodes pull or push a full | |
| 32-bit value. | |
| Therefore, if copyb (for example) copies a byte from main memory to the | |
| stack, a 32-bit value will be pushed, whose value will be from 0 to 255. | |
| Sign-extension *does not* occur. Conversely, if copyb copies a byte from | |
| the stack to memory, a 32-bit value is popped, and the bottom 8 bits are | |
| written at the given address. The upper 24 bits are lost. Constant values | |
| are truncated as well. | |
| If copys or copyb are used with both L1 and S1 in pop/push mode, the 32-bit | |
| value is popped, truncated, and pushed. | |
| [[NOTE: Since a call frame has no specified endianness, it is unwise to use | |
| these opcodes to pull out one or two bytes from a four-byte local variable. | |
| The result will be implementation-dependent. Therefore, use of the copyb | |
| and copys opcodes with a local-variable operand of different size is | |
| deprecated. Since locals of less than four bytes are *also* deprecated, you | |
| should not use copyb or copys with local-variable operands at all.]] | |
| sexs L1 S1 | |
| Sign-extend a value, considered as a 16-bit value. If the value's 8000 bit | |
| is set, the upper 16 bits are all set; otherwise, the upper 16 bits are all | |
| cleared. | |
| sexb L1 S1 | |
| Sign-extend a value, considered as an 8-bit value. If the value's 80 bit is | |
| set, the upper 24 bits are all set; otherwise, the upper 24 bits are all | |
| cleared. | |
| Note that these opcodes, like most, work on 32-bit values. Although (for | |
| example) sexb is commonly used in conjunction with copyb, it does not share | |
| copyb's behavior of reading a single byte from memory or the locals. | |
| Also note that the upper bits, 16 or 24 of them, are entirely ignored and | |
| overwritten with ones or zeroes. | |
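A short sketch of both sign-extension operations on 32-bit values follows. The function names match the opcodes for readability; inputs and results are unsigned 32-bit representations.

```python
def sexs(value: int) -> int:
    """Sign-extend the low 16 bits; the upper 16 bits of the input are ignored."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def sexb(value: int) -> int:
    """Sign-extend the low 8 bits; the upper 24 bits of the input are ignored."""
    value &= 0xFF
    return value | 0xFFFFFF00 if value & 0x80 else value
```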
| 2.4: Array Data | |
| astore L1 L2 L3 | |
| Store L3 into the 32-bit field at main memory address (L1+4*L2). | |
| aload L1 L2 S1 | |
| Load a 32-bit value from main memory address (L1+4*L2), and store it in S1. | |
| astores L1 L2 L3 | |
| Store L3 into the 16-bit field at main memory address (L1+2*L2). | |
| aloads L1 L2 S1 | |
| Load a 16-bit value from main memory address (L1+2*L2), and store it in S1. | |
| astoreb L1 L2 L3 | |
| Store L3 into the 8-bit field at main memory address (L1+L2). | |
| aloadb L1 L2 S1 | |
| Load an 8-bit value from main memory address (L1+L2), and store it in S1. | |
| Note that these opcodes cannot access call-frame locals, or the stack. (Not | |
| with the L1 and L2 operands, that is.) L1 and L2 provide a main-memory | |
| address. Be not confused by the fact that L1 and L2 can be any addressing | |
| mode, including call-frame or stack-pop modes. That controls where the | |
| values come from which are used to *compute* the main-memory address. | |
| The other end of the transfer (S1 or L3) is always a 32-bit value. The | |
| "store" opcodes truncate L3 to 8 or 16 bits if necessary. The "load" | |
| opcodes expand 8-bit or 16-bit values *without* sign extension. (If signed | |
| values are appropriate, you can follow aloads/aloadb with sexs/sexb.) | |
| L2 is considered signed, so you can access addresses before L1 as well as | |
| after. | |
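A minimal Python sketch of the address computation, truncation, and zero-extension described above. The helper names (to_signed32, element_address, array_store, array_load) are hypothetical, main memory is modelled as a bytearray, multi-byte fields are assumed to be big-endian, and wrapping the address modulo 2^32 is an assumption of the sketch.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret a 32-bit value as a signed integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def element_address(l1, l2, width):
    """Address of element L2 (signed) in an array of width-byte fields at L1."""
    # Wrapping modulo 2**32 is an assumption of this sketch.
    return (l1 + width * to_signed32(l2)) & MASK32


def array_store(mem, l1, l2, l3, width):
    """astore (width 4), astores (width 2), astoreb (width 1): truncate L3."""
    addr = element_address(l1, l2, width)
    truncated = l3 & ((1 << (8 * width)) - 1)
    mem[addr:addr + width] = truncated.to_bytes(width, "big")


def array_load(mem, l1, l2, width):
    """aload / aloads / aloadb: the result is never sign-extended."""
    addr = element_address(l1, l2, width)
    return int.from_bytes(mem[addr:addr + width], "big")
```

With L1 = 8 and L2 = FFFFFFFF (that is, -1), a 16-bit store lands at address 6, and loading it back returns the stored 16 bits with the upper half zero.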
| astorebit L1 L2 L3 | |
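| Set or clear a single bit. This is bit number (L2 mod 8) of memory address | |
| (L1+L2/8), where the division rounds down (towards negative infinity) and | |
| the modulus is always in the range 0 to 7. It is cleared if L3 is zero, | |
| set if nonzero. | |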
| aloadbit L1 L2 S1 | |
| Test a single bit, similarly. If it is set, 1 is stored at S1; if clear, 0 | |
| is stored. | |
| For these two opcodes, bits are effectively numbered sequentially, starting | |
| with the least significant bit of address L1. L2 is considered signed, so | |
| this numbering extends both positively and negatively. For example: | |
| astorebit 1002 0 1: Set bit 0 of address 1002. (The 1's place.) | |
| astorebit 1002 7 1: Set bit 7 of address 1002. (The 128's place.) | |
| astorebit 1002 8 1: Set bit 0 of address 1003. | |
| astorebit 1002 9 1: Set bit 1 of address 1003. | |
| astorebit 1002 -1 1: Set bit 7 of address 1001. | |
| astorebit 1002 -3 1: Set bit 5 of address 1001. | |
| astorebit 1002 -8 1: Set bit 0 of address 1001. | |
| astorebit 1002 -9 1: Set bit 7 of address 1000. | |
| Like the other aload and astore opcodes, these opcodes cannot access | |
| call-frame locals, or the stack. | |
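The bit numbering in the examples above can be reproduced with floor division, which is what Python's // and % operators provide for negative operands. The following is a sketch only; bit_location is a hypothetical helper, and memory is modelled as a bytearray.

```python
def to_signed32(value):
    """Reinterpret a 32-bit value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def bit_location(l1, l2):
    """Return (byte address, bit number) for bit L2 counted from address L1."""
    offset = to_signed32(l2)
    # Floor division and a non-negative modulus, so negative offsets
    # step backwards through memory one bit at a time.
    return l1 + offset // 8, offset % 8


def astorebit(mem, l1, l2, l3):
    addr, bit = bit_location(l1, l2)
    if l3:
        mem[addr] |= 1 << bit
    else:
        mem[addr] &= ~(1 << bit) & 0xFF


def aloadbit(mem, l1, l2):
    addr, bit = bit_location(l1, l2)
    return (mem[addr] >> bit) & 1
```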
| 2.5: The Stack | |
| stkcount S1 | |
| Store a count of the number of values on the stack. This counts only values | |
| above the current call-frame. In other words, it is always zero when a C1 | |
| function starts executing, and (numargs+1) when a C0 function starts | |
| executing. It then increases and decreases thereafter as values are pushed | |
| and popped; it is always the number of values that can be popped legally. | |
| (If S1 uses the stack push mode, the count is done before the result is | |
| pushed.) | |
| stkpeek L1 S1 | |
| Peek at the L1'th value on the stack, without actually popping anything. If | |
| L1 is zero, this is the top value; if one, it's the value below that; etc. | |
| L1 must be less than the current stack-count. (If L1 or S1 use the stack | |
| pop/push modes, the peek is counted after L1 is popped, but before the | |
| result is pushed.) | |
| stkswap | |
| Swap the top two values on the stack. The current stack-count must be at | |
| least two. | |
| stkcopy L1 | |
| Peek at the top L1 values in the stack, and push duplicates onto the stack | |
| in the same order. If L1 is zero, nothing happens. L1 must not be greater | |
| than the current stack-count. (If L1 uses the stack pop mode, the stkcopy | |
| is counted after L1 is popped.) | |
| An example of stkcopy, starting with six values on the stack: | |
| 5 4 3 2 1 0 <top> | |
| stkcopy 3 | |
| 5 4 3 2 1 0 2 1 0 <top> | |
| stkroll L1 L2 | |
| Rotate the top L1 values on the stack. They are rotated up or down L2 | |
| places, with positive values meaning up and negative meaning down. The | |
| current stack-count must be at least L1. If either L1 or L2 is zero, | |
| nothing happens. (If L1 and/or L2 use the stack pop mode, the roll occurs | |
| after they are popped.) | |
| An example of two stkrolls, starting with nine values on the stack: | |
| 8 7 6 5 4 3 2 1 0 <top> | |
| stkroll 5 1 | |
| 8 7 6 5 0 4 3 2 1 <top> | |
| stkroll 9 -3 | |
| 5 0 4 3 2 1 8 7 6 <top> | |
| Note that stkswap is equivalent to stkroll 2 1, or for that matter stkroll | |
| 2 -1. Also, stkcopy 1 is equivalent to stkpeek 0 sp. | |
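The stkcopy and stkroll examples above can be checked with a small Python model. This is a sketch under assumptions: the value stack above the current call-frame is a Python list whose last element is the top, and the shift is passed as an ordinary signed integer.

```python
def stkcopy(stack, count):
    """Push duplicates of the top `count` values, preserving their order."""
    if count > len(stack):
        raise ValueError("stkcopy count exceeds stack-count")
    stack.extend(stack[len(stack) - count:])


def stkroll(stack, count, shift):
    """Rotate the top `count` values; positive shift moves values towards the top."""
    if count > len(stack):
        raise ValueError("stkroll count exceeds stack-count")
    if count == 0:
        return
    k = shift % count          # Python's % is non-negative for count > 0
    if k == 0:
        return
    top = stack[-count:]
    # Rolling up by k moves the k topmost values to the bottom of the segment.
    stack[-count:] = top[-k:] + top[:-k]
```

Applied to the nine-value example, stkroll 5 1 followed by stkroll 9 -3 gives the two stack pictures shown in the text.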
| These opcodes can only access the values pushed on the stack above the | |
| current call-frame. It is illegal to stkswap, stkpeek, stkcopy, or stkroll | |
| values below that -- i.e., the locals segment or any previous function call | |
| frames. | |
| 2.6: Functions | |
| call L1 L2 S1 | |
| Call function whose address is L1, passing in L2 arguments, and store the | |
| return result at S1. | |
| The arguments are taken from the stack. Before you execute the call opcode, | |
| you must push the arguments on, in backward order (last argument pushed | |
| first, first argument topmost on the stack.) The L2 arguments are removed | |
| before the new function's call frame is constructed. (If L1, L2, or S1 use | |
| the stack pop/push modes, the arguments are taken after L1 or L2 is popped, | |
| but before the result is pushed.) | |
| Recall that all functions in Glulx have a single 32-bit return value. If | |
| you do not care about the return value, you can use operand mode 0 | |
| ("discard value") for operand S1. | |
| callf L1 S1 | |
| callfi L1 L2 S1 | |
| callfii L1 L2 L3 S1 | |
| callfiii L1 L2 L3 L4 S1 | |
| Call function whose address is L1, passing zero, one, two, or three | |
| arguments. Store the return result at S1. | |
| These opcodes behave the same as call, except that the arguments are given | |
| in the usual opcode format instead of being found on the stack. (If L2, L3, | |
| etc. all use the stack pop mode, then the behavior is exactly the same as | |
| call.) | |
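The argument order can be illustrated with a short Python sketch. The stack is modelled as a list whose last element is the top, and take_call_args is a hypothetical helper that stands in for the part of call that removes the L2 arguments.

```python
def take_call_args(stack, count):
    """Remove `count` arguments from the stack, first argument first."""
    if count > len(stack):
        raise ValueError("not enough values on the stack")
    # The first argument is topmost, so popping yields the arguments in order.
    return [stack.pop() for _ in range(count)]


# Caller side: to pass (1, 2, 3), push the last argument first.
stack = [99]                 # an unrelated value already on the stack
for arg in reversed([1, 2, 3]):
    stack.append(arg)
args = take_call_args(stack, 3)
```

After this runs, args holds the arguments in their natural order and the unrelated value is still on the stack.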
| return L1 | |
| Return from the current function, with the given return value. If this is | |
| the top-level function, Glulx execution is over. | |
| Note that all the branch opcodes (jump, jz, jeq, and so on) have an option | |
| to return 0 or 1 instead of branching. These behave exactly as if the | |
| return opcode had been executed. | |
| tailcall L1 L2 | |
| Call function whose address is L1, passing in L2 arguments, and pass the | |
| return result out to whoever called the current function. | |
| This destroys the current call-frame, as if a return had been executed, but | |
| does not touch the call stub below that. It then immediately calls L1, | |
| creating a new call-frame. The effect is the same as a call immediately | |
| followed by a return, but takes less stack space. | |
| It is legal to use tailcall from the top-level function. L1 becomes the | |
| top-level function. | |
| [[This opcode can be used to implement tail recursion, without forcing the | |
| stack to grow with every call.]] | |
| 2.7: Continuations | |
| catch S1 L1 | |
| Generates a "catch token", which can be used to jump back to this execution | |
| point from a throw opcode. The token is stored in S1, and then execution | |
| branches to offset L1. If execution is proceeding from this point because | |
| of a throw, the thrown value is stored instead, and the branch is ignored. | |
| Remember that if the branch value is not 0 or 1, the branch is to (Addr + L1 | |
| - 2), where Addr is the address of the instruction *after* the catch. If | |
| the value *is* 0 or 1, the function returns immediately, invalidating the | |
| catch token. | |
| If S1 or L1 uses the stack push/pop modes, note that the precise order of | |
| execution is: evaluate L1 (popping if appropriate); generate a call stub | |
| and compute the token; store S1 (pushing if appropriate). | |
| throw L1 L2 | |
| Jump back to a previously-executed catch opcode, and store the value L1. L2 | |
| must be a valid catch token. | |
| The exact catch/throw procedure is as follows: | |
| When catch is executed, a four-value call stub is pushed on the stack -- | |
| result destination (type and address), PC, and FramePtr. (See section 1.3.2, "Call Stubs". The | |
| PC is the address of the next instruction after the catch.) The catch token | |
| is the value of the stack pointer after these are pushed. The token value | |
| is stored in the result destination, and execution proceeds, branching to | |
| L1. | |
| When throw is executed, the stack is popped down until the stack pointer | |
| equals the given token. Then the four values are read back off the stack, | |
| the thrown value is stored in the destination, and execution proceeds with | |
| the instruction after the catch. | |
| If the call stub (or any part of it) is removed from the stack, the catch | |
| token becomes invalid, and must not be used. This will certainly occur when | |
| you return from the function containing the catch opcode. It will also | |
| occur if you pop too many values from the stack after executing the catch. | |
| (You may wish to do this to "cancel" the catch; if you pop and discard | |
| those four values, the token is invalidated, and it is as if you had never | |
| executed the catch at all.) The catch token is also invalidated if any part | |
| of the call stub is overwritten (e.g. with stkswap or stkroll). | |
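The procedure can be sketched with a toy stack model in Python. Several things here are assumptions made for illustration: the stack is a list of whole values (so the token is a value count rather than a byte offset), the stub is pushed in the order destination type, destination address, PC, FramePtr, and the function names do_catch and do_throw are hypothetical.

```python
def do_catch(stack, dest_type, dest_addr, next_pc, frame_ptr):
    """Push a four-value call stub and return the catch token."""
    stack.extend([dest_type, dest_addr, next_pc, frame_ptr])
    return len(stack)        # stack pointer after the stub is pushed


def do_throw(stack, token):
    """Unwind to the token and read the stub back off the stack."""
    if token < 4 or token > len(stack):
        # Part of the stub has been popped, so the token is no longer valid.
        raise ValueError("invalid catch token")
    del stack[token:]        # pop down until the stack pointer equals the token
    frame_ptr = stack.pop()
    next_pc = stack.pop()
    dest_addr = stack.pop()
    dest_type = stack.pop()
    return dest_type, dest_addr, next_pc, frame_ptr
```

This model can detect a stub that has been popped, but not one that has been overwritten in place; a real token in that state is simply invalid.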
| [[Why is the catch branch taken at catch time, and ignored after a throw? | |
| Because it's easier to write the interpreter that way, that's why. If it | |
| had to branch after a throw, either the call stub would have to contain the | |
| branch offset, or the terp would have to re-parse the catch instruction. | |
| Both are ugly.]] | |
| 2.8: Memory Map | |
| getmemsize S1 | |
| Store the current size of the memory map. This is originally the ENDMEM | |
| value from the header, but you can change it with the setmemsize opcode. | |
| (The malloc and mfree opcodes may also cause this value to change; see | |
| section 2.9, "Memory Allocation Heap".) It will always be greater than or | |
| equal to ENDMEM, and will always be a multiple of 256. | |
| setmemsize L1 S1 | |
| Set the current size of the memory map. The new value must be a multiple of | |
| 256, like all memory boundaries in Glulx. It must be greater than or equal | |
| to ENDMEM (the initial memory-size value which is stored in the header.) It | |
| does not have to be greater than the previous memory size. The memory size | |
| may grow and shrink over time, as long as it never gets smaller than the | |
| initial size. | |
| When the memory size grows, the new space is filled with zeroes. When it | |
| shrinks, the contents of the old space are lost. | |
| If the allocation heap is active (see section 2.9, "Memory Allocation | |
| Heap") you may not use setmemsize -- the memory map is under the control of | |
| the heap system. If you free all heap objects, the heap will then no longer | |
| be active, and you can use setmemsize. | |
| Since memory allocation is never guaranteed, you must be prepared for the | |
| possibility that setmemsize will fail. The opcode stores the value zero if | |
| it succeeded, and 1 if it failed. If it failed, the memory size is | |
| unchanged. | |
| Some interpreters do not have the capability to resize memory at all. On | |
| such interpreters, setmemsize will *always* fail. You can check this in | |
| advance with the ResizeMem gestalt selector. | |
| Note that the memory size is considered part of the game state. If you | |
| restore a saved game, the current memory size is changed to the size that | |
| was in effect when the game was saved. If you restart, the current memory | |
| size is reset to its initial value. | |
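The resizing rules can be summarised in a small Python model. This is a sketch, not a description of any interpreter: the Memory class is hypothetical, and raising ValueError for a request that breaks a "must" rule is an assumption (the specification only says such requests are not allowed).

```python
class Memory:
    def __init__(self, endmem):
        self.endmem = endmem              # initial size from the header
        self.data = bytearray(endmem)
        self.heap_active = False

    def getmemsize(self):
        return len(self.data)

    def setmemsize(self, new_size):
        """Return 0 on success. (A real terp may instead return 1 on failure.)"""
        if self.heap_active:
            raise ValueError("setmemsize is not allowed while the heap is active")
        if new_size % 256 != 0:
            raise ValueError("memory size must be a multiple of 256")
        if new_size < self.endmem:
            raise ValueError("memory size must not drop below ENDMEM")
        if new_size > len(self.data):
            # New space is filled with zeroes.
            self.data.extend(bytes(new_size - len(self.data)))
        else:
            # Contents of the discarded space are lost.
            del self.data[new_size:]
        return 0
```

Growing, writing into the new space, shrinking, and growing again leaves the space zeroed, as the text requires.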
| 2.9: Memory Allocation Heap | |
| Manage the memory allocation heap. | |
| Glulx is able to maintain a list of dynamically-allocated memory objects. | |
| These objects exist in the memory map, above ENDMEM. The malloc and mfree | |
| opcodes allow the game to request the allocation and destruction of these | |
| objects. | |
| Some interpreters do not have the capability to manage an allocation heap. | |
| On such interpreters, malloc will always fail. You can check this in | |
| advance with the MAlloc gestalt selector. | |
| When you first allocate a block of memory, the heap becomes active. The | |
| current end of memory -- that is, the current getmemsize value -- becomes | |
| the beginning address of the heap. The memory map is then extended to | |
| accommodate the memory block. | |
| Subsequent memory allocations and deallocations are done within the heap. | |
| The interpreter may extend or reduce the memory map, as needed, when | |
| allocations and deallocations occur. While the heap is active, you may not | |
| manually resize the memory map with setmemsize; the heap system is | |
| responsible for doing that. | |
| When you free the last extant memory block, the heap becomes inactive. The | |
| interpreter will reduce the memory map size down to the heap-start address. | |
| (That is, the getmemsize value returns to what it was before you allocated | |
| the first block.) Thereafter, it is legal to call setmemsize again. | |
| It is legitimate to read or write any memory address in the heap range | |
| (from ENDMEM to the end of the memory map). You are not restricted to | |
| extant blocks. [[The VM's heap state is not stored in its own memory map. | |
| So, unlike the familiar C heap, you cannot damage it by writing outside | |
| valid blocks.]] | |
| The heap state (whether it is active, its starting address, and the | |
| addresses and sizes of all extant blocks) *is* part of the saved game state. | |
| These opcodes were added in Glulx version 3.1. | |
| malloc L1 S1 | |
| Allocate a memory block of L1 bytes. (L1 must be positive.) This stores the | |
| address of the new memory block, which will be within the heap and will not | |
| overlap any other extant block. The interpreter may have to extend the | |
| memory map (see section 2.8, "Memory Map") to accommodate the new block. | |
| This operation does not change the contents of the memory block (or, | |
| indeed, the contents of the memory map at all). If you want the memory | |
| block to be initialized, you must do it yourself. | |
| If the allocation fails, this stores zero. | |
| mfree L1 | |
| Free the memory block at address L1. This *must* be the address of an | |
| extant block -- that is, a value returned by malloc and not previously | |
| freed. | |
| This operation does not change the contents of the memory block (or, | |
| indeed, the contents of the memory map at all). | |
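The heap life cycle (inactive, active from the first malloc, inactive again after the last mfree) can be sketched as follows. The Heap class is hypothetical, and the first-fit placement and rounding of the memory map to a multiple of 256 are assumptions of the sketch; the specification does not prescribe an allocation strategy.

```python
class Heap:
    def __init__(self, memsize):
        self.memsize = memsize    # current getmemsize value
        self.heap_start = 0       # 0 while the heap is inactive
        self.blocks = {}          # address -> length of each extant block

    def malloc(self, length):
        if length <= 0:
            raise ValueError("length must be positive")
        if not self.blocks:
            # First allocation: the heap starts at the current end of memory.
            self.heap_start = self.memsize
        addr = self.heap_start
        for start in sorted(self.blocks):
            if start - addr >= length:
                break             # the gap before this block is big enough
            addr = start + self.blocks[start]
        self.blocks[addr] = length
        end = addr + length
        if end > self.memsize:
            self.memsize = (end + 255) // 256 * 256
        return addr

    def mfree(self, addr):
        del self.blocks[addr]     # KeyError if addr is not an extant block
        if not self.blocks:
            # Last block freed: the heap becomes inactive again.
            self.memsize = self.heap_start
            self.heap_start = 0
```

Neither operation touches memory contents, matching the text: only the bookkeeping and the memory-map size change.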
| 2.10: Game State | |
| quit | |
| Shut down the terp and exit. This is equivalent to returning from the | |
| top-level function, or for that matter calling glk_exit(). | |
| Note that (in the Glk I/O system) Glk is responsible for any "hit any key | |
| to exit" prompt. It is safe for you to print a bunch of final text and then | |
| exit immediately. | |
| restart | |
| Restore the VM to its initial state (memory, stack, and registers). Note | |
| that the current memory size is reset, as well as the contents of memory. | |
| save L1 S1 | |
| Save the VM state to the output stream L1. It is your responsibility to | |
| prompt the player for a filespec, open the stream, and then destroy these | |
| objects afterward. S1 is set to zero if the operation succeeded, 1 if it | |
| failed, and -1 if the VM has just been restored and is continuing from this | |
| instruction. | |
| (In the Glk I/O system, L1 should be the ID of a writable Glk stream. In | |
| other I/O systems, it will mean something different. In the "filter" and | |
| "null" I/O systems, the save opcode is illegal, as the interpreter has | |
| nowhere to write the state.) | |
| restore L1 S1 | |
| Restore the VM state from the input stream L1. S1 is set to 1 if the | |
| operation failed. If it succeeded, of course, this instruction never | |
| returns a value. | |
| saveundo S1 | |
| Save the VM state in a temporary location. The terp will choose a location | |
| appropriate for rapid access, so this may be called once per turn. S1 is | |
| set to zero if the operation succeeded, 1 if it failed, and -1 if the VM | |
| state has just been restored. | |
| restoreundo S1 | |
| Restore the VM state from temporary storage. S1 is set to 1 if the | |
| operation failed. | |
| protect L1 L2 | |
| Protect a range of memory from restart, restore, and restoreundo. The protected | |
| range starts at address L1 and has a length of L2 bytes. This memory is | |
| silently unaffected by the state-restoring operations. (However, if the | |
| result-storage S1 is directed into the protected range, that is not | |
| blocked.) | |
| When the VM starts up, there is no protection range. Only one range can be | |
| protected at a time. Calling protect cancels any previous range. To turn | |
| off protection, call protect with L1 and L2 set to zero. | |
| It is important to note that the protection range itself (its existence, | |
| location, and length) is *not* part of the saved game state! If you save a | |
| game, move the protection range to a new location, and then restore that | |
| game, it is the new range that will be protected, and the range will remain | |
| there afterwards. | |
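The effect of the protection range on a state-restoring operation can be sketched in Python. The function restore_memory is hypothetical; it models only main memory and assumes the range in force is the one set at restore time, as described above.

```python
def restore_memory(current, saved, protect_start, protect_len):
    """Return the memory contents after a restore, honouring the protected range."""
    result = bytearray(saved)
    # Clamp the range to memory that exists both before and after the restore.
    end = min(protect_start + protect_len, len(result), len(current))
    if protect_start < end:
        # Bytes inside the range keep their pre-restore values.
        result[protect_start:end] = current[protect_start:end]
    return result
```

With a zero-length range (protection turned off), the result is simply the saved contents.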
| verify S1 | |
| Perform sanity checks on the game file, using its length and checksum. S1 | |
| is set to zero if everything looks good, 1 if there seems to be a problem. | |
| (Many interpreters will do this automatically, before the game starts | |
| executing. This opcode is provided mostly for slower interpreters, where | |
| auto-verify might cause an unacceptable delay.) | |
| Notes: | |
| All the save and restore opcodes can generate diagnostic information on the | |
| current output stream. | |
| A terp may support several levels of temporary storage. You should not make | |
| any assumptions about how many times restoreundo can be called. If the | |
| player so requests, you should keep calling it until it fails. | |
| Glk opaque objects (windows, streams, filespecs) are not part of the saved | |
| game state. Therefore, when you restore a game, all the object IDs you have | |
| in Glulx memory must be considered invalid. (This includes both IDs in main | |
| memory and on the stack.) You must use the Glk iteration calls to go | |
| through all the opaque objects in existence, and recognize them by their | |
| rocks. | |
| The same applies after restoreundo, to a lesser extent. Since | |
| saveundo/restoreundo only operate within a single play session, you can | |
| rely on the IDs of objects created before the first saveundo. However, if | |
| you have created any objects since then, you must iterate and recognize | |
| them. | |
| The restart opcode is a similar case. You must do an iteration as soon as | |
| your program starts, to find objects created in an earlier incarnation. | |
| Alternatively, you can be careful to close all opaque objects before | |
| invoking restart. | |
| [[Another approach is to use the protect opcode, to preserve global | |
| variables containing your object IDs. This will work within a play session | |
| -- that is, with saveundo, restoreundo, and restart. You must still deal | |
| with save and restore.]] | |
| 2.11: Output | |
| getiosys S1 S2 | |
| Return the current I/O system mode and rock. | |
| Due to a long-standing bug in the reference interpreter, the two store | |
| operands must be of the same general type: both main-memory/global stores, | |
| both local variable stores, or both stack pushes. | |
| setiosys L1 L2 | |
| Set the I/O system mode and rock. If the system L1 is not supported by the | |
| interpreter, it will default to the "null" system (0). | |
| These systems are currently defined: | |
| * 0: The null system. All output is discarded. (When the Glulx machine | |
| starts up, this is the current system.) | |
| * 1: The filtering system. The rock (L2) value should be the address of | |
| a Glulx function. This function will be called for every character output | |
| (with the character value as its sole argument). The function's return | |
| value is ignored. | |
| * 2: The Glk system. All output will be handled through Glk function | |
| calls, sent to the current Glk stream. | |
| * 20: The FyreVM channel system. See section 0.2, "Glulx and Other IF | |
| Systems". | |
| It is important to recall that when Glulx starts up, the Glk I/O system is | |
| *not* set. And when Glk starts up, there are no windows and no current | |
| output stream. To make anything appear to the user, you must first do three | |
| things: select the Glk I/O system, open a Glk window, and set its stream as | |
| the current one. (It is illegal in Glk to send output when there is no | |
| stream set. Sending output to Glulx's "null" I/O system is legal, but | |
| pointless.) | |
| streamchar L1 | |
| Send L1 to the current stream. This sends a single character; the value L1 | |
| is truncated to eight bits. | |
| streamunichar L1 | |
| Send L1 to the current stream. This sends a single (32-bit) character. | |
| This opcode was added in Glulx version 3.0. | |
| streamnum L1 | |
| Send L1 to the current stream, represented as a signed decimal number in | |
| ASCII. | |
| streamstr L1 | |
| Send a string object to the current stream. L1 must be the address of a | |
| Glulx string object (type E0, E1, or E2.) The string is decoded and sent as | |
| a sequence of characters. | |
| When the Glk I/O system is set, these opcodes are implemented using the Glk | |
| API. You can bypass them and directly call glk_put_char(), | |
| glk_put_buffer(), and so on. Remember, however, that glk_put_string() only | |
| accepts unencoded string (E0) objects; glk_put_string_uni() only accepts | |
| unencoded Unicode (E2) objects. | |
| Note that it is illegal to decode a compressed string (E1) if there is no | |
| string-decoding table set. | |
| getstringtbl S1 | |
| Return the address the terp is currently using for its string-decoding | |
| table. If there is no table set, this returns zero. | |
| setstringtbl L1 | |
| Change the address the terp is using for its string-decoding table. This | |
| may be zero, indicating that there is no table (in which case it is illegal | |
| to print any compressed string). Otherwise, it must be the address of a | |
| *valid* string-decoding table. | |
| [[This does not change the value in the header field at address 001C. The | |
| header is in ROM, and never changes. To determine the current table | |
| address, use the getstringtbl opcode.]] | |
| A string-decoding table may be in RAM or ROM, but there may be speed | |
| penalties if it is in RAM. See section 1.6.1.4, "The String-Decoding Table". | |
| 2.12: Floating-Point Math | |
| Recall that floating-point values are encoded as single-precision (32-bit) | |
| IEEE-754 values (see section 1.7, "Floating-Point Numbers"). The | |
| interpreter must convert values (from memory or the stack) before | |
| performing a floating-point operation, and unconvert them afterwards. | |
| [[In other words, passing a float value to an integer arithmetic opcode | |
| will operate on the IEEE-754-encoded 32-bit value. Such an operation would | |
| be deterministic, albeit mathematically meaningless. The same is true for | |
| passing an integer to a float opcode.]] | |
| Float operations which produce inexact results are not guaranteed to be | |
| identical on every platform. That is, 1.0 plus 1.0 will always be 2.0, | |
| because that can be represented exactly. But acos(-1.0), which should be | |
| pi, may generate either 40490FDA (3.14159250...) or 40490FDB | |
| (3.14159274...). Both are approximations of the correct result, but which | |
| one you get depends on the interpreter's underlying math library. | |
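The two encodings quoted above can be decoded with Python's struct module, which is used here only as a convenient way to reinterpret 32 bits as an IEEE-754 single. The helper names decode_float and encode_float are hypothetical.

```python
import math
import struct


def decode_float(word):
    """Interpret a 32-bit word as an IEEE-754 single-precision float."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def encode_float(x):
    """Round a Python float to single precision and return its 32-bit encoding."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


below = decode_float(0x40490FDA)   # the neighbouring value just below pi
nearest = decode_float(0x40490FDB) # the single-precision value nearest pi
```

The two words differ by one unit in the last place, so an interpreter returning either one is off by less than 2.4e-7.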
| If any argument to a float operation is a NaN ("not a number") value, the | |
| result will be a NaN value. (Except for the pow opcode, which has some | |
| special cases.) | |
| [[Speaking of special cases: I have tried to describe all the important | |
| ones for these operations. However, you should also consult the Glulxercise | |
| unit test (available on the Glulx web site). Consider it definitive if this | |
| document is unclear.]] | |
| These opcodes were added in Glulx version 3.1.2. However, not all | |
| interpreters may support them. You can test for their availability with the | |
| Float gestalt selector. | |
| numtof L1 S1 | |
| Convert an integer value to the closest equivalent float. (That is, if L1 | |
| is 1, then 3F800000 -- the float encoding of 1.0 -- will be stored in S1.) | |
| Integer zero is converted to (positive) float zero. | |
| If the value is less than -1000000 or greater than 1000000 (hex), the | |
| conversion may not be exact. (More specifically, it may round to a nearby | |
| multiple of a power of 2.) | |
| ftonumz L1 S1 | |
| Convert a float value to an integer, rounding towards zero (i.e., | |
| truncating the fractional part). If the value is outside the 32-bit integer | |
| range, or is NaN or infinity, the result will be 7FFFFFFF (for positive | |
| values) or 80000000 (for negative values). | |
| ftonumn L1 S1 | |
| Convert a float value to an integer, rounding towards the nearest integer. | |
| Again, overflows become 7FFFFFFF or 80000000. | |
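The three conversions can be sketched in Python, again using struct to move between 32-bit words and floats. This is an illustration under assumptions: for NaN the sign bit of the encoding selects between 7FFFFFFF and 80000000, and ftonumn uses Python's round(), whose handling of exact ties (round half to even) is not something the text above specifies.

```python
import math
import struct

MASK32 = 0xFFFFFFFF


def decode_float(word):
    return struct.unpack(">f", struct.pack(">I", word))[0]


def encode_float(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]


def numtof(value):
    """Convert a signed 32-bit integer to the nearest single-precision float."""
    signed = value - 0x100000000 if value & 0x80000000 else value
    return encode_float(float(signed))


def _to_int(word, rounder):
    x = decode_float(word)
    overflow = 0x80000000 if word & 0x80000000 else 0x7FFFFFFF
    if math.isnan(x) or math.isinf(x):
        return overflow
    n = rounder(x)
    if n < -0x80000000 or n > 0x7FFFFFFF:
        return overflow
    return n & MASK32


def ftonumz(word):
    return _to_int(word, math.trunc)   # round towards zero


def ftonumn(word):
    return _to_int(word, round)        # round to nearest (tie rule assumed)
```

For instance, numtof of 1000001 hex cannot be represented exactly and rounds to the encoding of 1000000 hex.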
| fadd L1 L2 S1 | |
| fsub L1 L2 S1 | |
| fmul L1 L2 S1 | |
| fdiv L1 L2 S1 | |
| Perform floating-point arithmetic. Overflows produce infinite values (with | |
| the appropriate sign); underflows produce zero values (ditto). 0/0 is NaN. | |
| Inf/Inf, or Inf-Inf, is NaN. Any finite number added to infinity is | |
| infinity. Any nonzero number divided by an infinity, or multiplied by zero, | |
| is a zero. Any nonzero number multiplied by an infinity, or divided by | |
| zero, is an infinity. | |
| fmod L1 L2 S1 S2 | |
| Perform a floating-point modulo operation. S1 is the remainder (or | |
| modulus); S2 is the quotient. | |
| S2 is L1/L2, rounded (towards zero) to an integral value. S1 is L1-(S2*L2). | |
| Note that S1 always has the same sign as L1; S2 has the appropriate sign | |
| for L1/L2. | |
| If L2 is 1, this gives you the fractional and integer parts of L1. If L1 is | |
| zero, both results are zero. If L2 is infinite, S1 is L1 and S2 is zero. If | |
| L1 is infinite or L2 is zero, both results are NaN. | |
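A Python sketch of the remainder and quotient rules, including the special cases just listed. The function float_mod is hypothetical, and it works on Python floats (double precision) rather than on 32-bit encodings, so it illustrates the rules rather than the exact bit patterns a terp would produce.

```python
import math


def float_mod(x, y):
    """Return (remainder, quotient) following the fmod rules described above."""
    nan = float("nan")
    if math.isnan(x) or math.isnan(y):
        return nan, nan
    if math.isinf(x) or y == 0:
        return nan, nan
    if math.isinf(y):
        return x, x / y                  # quotient is a correctly signed zero
    rem = math.fmod(x, y)                # takes the sign of x
    quo = (x - rem) / y                  # x/y rounded towards zero
    if quo == 0.0:
        quo = math.copysign(0.0, x / y)  # keep the sign appropriate for x/y
    return rem, quo
```

Calling float_mod(3.25, 1.0) splits the value into its fractional part 0.25 and integer part 3.0.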
| ceil L1 S1 | |
| floor L1 S1 | |
| Round L1 up (towards +Inf) or down (towards -Inf) to the nearest integral | |
| value. (The result is still in float format, however.) These opcodes are | |
| idempotent. | |
| The result keeps the sign of L1; in particular, floor(0.5) is 0 and | |
| ceil(-0.5) is -0. Rounding -0 up or down gives -0. Rounding an infinite | |
| value gives infinity. | |
| sqrt L1 S1 | |
| exp L1 S1 | |
| log L1 S1 | |
| Compute the square root of L1, e^L1, and log of L1 (base e). | |
| sqrt(-0) is -0. sqrt returns NaN for all other negative values. exp(+0) and | |
| exp(-0) are 1; exp(-Inf) is +0. log(+0) and log(-0) are -Inf. log returns | |
| NaN for all other negative values. | |
| pow L1 L2 S1 | |
| Compute L1 raised to the L2 power. | |
| The special cases are breathtaking. The following is quoted (almost) | |
| directly from the libc man page: | |
| * pow(+-0, y) returns +-Inf for y an odd integer < 0. | |
| * pow(+-0, y) returns +Inf for y < 0 and not an odd integer. | |
| * pow(+-0, y) returns +-0 for y an odd integer > 0. | |
| * pow(+-0, y) returns +0 for y > 0 and not an odd integer. | |
| * pow(-1, +-Inf) returns 1. | |
| * pow(1, y) returns 1 for any y, even a NaN. | |
| * pow(x, +-0) returns 1 for any x, even a NaN. | |
| * pow(x, y) returns a NaN for finite x < 0 and finite non-integer y. | |
| * pow(x, -Inf) returns +Inf for |x| < 1. | |
| * pow(x, -Inf) returns +0 for |x| > 1. | |
| * pow(x, +Inf) returns +0 for |x| < 1. | |
| * pow(x, +Inf) returns +Inf for |x| > 1. | |
| * pow(-Inf, y) returns -0 for y an odd integer < 0. | |
| * pow(-Inf, y) returns +0 for y < 0 and not an odd integer. | |
| * pow(-Inf, y) returns -Inf for y an odd integer > 0. | |
| * pow(-Inf, y) returns +Inf for y > 0 and not an odd integer. | |
| * pow(+Inf, y) returns +0 for y < 0. | |
| * pow(+Inf, y) returns +Inf for y > 0. | |
| * pow(x, y) returns NaN if x is negative and y is not an integer (both | |
| finite). | |
| sin L1 S1 | |
| cos L1 S1 | |
| tan L1 S1 | |
| acos L1 S1 | |
| asin L1 S1 | |
| atan L1 S1 | |
| Compute the standard trigonometric functions. | |
| sin and cos return values in the range -1 to 1. sin, cos, and tan of | |
| infinity are NaN. | |
| asin is always in the range -pi/2 to pi/2; acos is always in the range 0 to | |
| pi. asin and acos of values greater than 1, or less than -1, are NaN. | |
| atan(+-Inf) is +-pi/2. | |
| atan2 L1 L2 S1 | |
| Computes the arctangent of L1/L2, using the signs of both arguments to | |
| determine the quadrant of the return value. (Note that the Y argument is | |
| first and the X argument is second.) | |
| Again with the special cases: | |
| * atan2(+-0, -0) returns +-pi. | |
| * atan2(+-0, +0) returns +-0. | |
| * atan2(+-0, x) returns +-pi for x < 0. | |
| * atan2(+-0, x) returns +-0 for x > 0. | |
| * atan2(y, +-0) returns +pi/2 for y > 0. | |
| * atan2(y, +-0) returns -pi/2 for y < 0. | |
| * atan2(+-y, -Inf) returns +-pi for finite y. | |
| * atan2(+-y, +Inf) returns +-0 for finite y. | |
| * atan2(+-Inf, x) returns +-pi/2 for finite x. | |
| * atan2(+-Inf, -Inf) returns +-3*pi/4. | |
| * atan2(+-Inf, +Inf) returns +-pi/4. | |
| 2.13: Floating-Point Comparisons | |
| All these branch opcodes specify their destinations with an offset value. | |
| See section 2.2, "Branches". | |
| Most of these opcodes never branch if any argument is NaN. (Exceptions are | |
| jisnan and jfne.) In particular, NaN is neither less than, greater than, | |
| nor equal to NaN. | |
| These opcodes were added in Glulx version 3.1.2. However, not all | |
| interpreters may support them. You can test for their availability with the | |
| Float gestalt selector. | |
| jisnan L1 L2 | |
| Branch to L2 if the floating-point value L1 is a NaN value. (See section | |
| 1.7, "Floating-Point Numbers".) | |
| jisinf L1 L2 | |
| Branch to L2 if the floating-point value L1 is an infinity (7F800000 or | |
| FF800000). | |
| jfeq L1 L2 L3 L4 | |
| Branch to L4 if the difference between L1 and L2 is less than or equal to | |
| (plus or minus) L3. The sign of L3 is ignored. | |
| If any of the arguments are NaN, this will not branch. If L3 is infinite, | |
| this will always branch -- unless L1 and L2 are opposite infinities. | |
| (Opposite infinities are never equal, regardless of L3. Infinities of the | |
| same sign are always equal.) | |
| If L3 is (plus or minus) zero, this tests for exact equality. Note that +0 | |
| is considered exactly equal to -0. | |
| jfne L1 L2 L3 L4 | |
| The reverse of jfeq. This *will* branch if *any* of the arguments is NaN. | |
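The branch conditions for jfeq and jfne can be written out as Python predicates. This is a sketch: the functions take already-decoded floats and return whether the branch would be taken, and the names jfeq_branches and jfne_branches are hypothetical.

```python
import math


def jfeq_branches(x, y, eps):
    """True if jfeq would branch for values x and y with tolerance eps."""
    if math.isnan(x) or math.isnan(y) or math.isnan(eps):
        return False
    if math.isinf(x) and math.isinf(y):
        # Same-sign infinities are always equal; opposite ones never are.
        return x == y
    if math.isinf(eps):
        return True
    return abs(x - y) <= abs(eps)      # the sign of eps is ignored


def jfne_branches(x, y, eps):
    """The reverse of jfeq, so it does branch when any argument is NaN."""
    return not jfeq_branches(x, y, eps)
```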
| jflt L1 L2 L3 | |
| jfle L1 L2 L3 | |
| jfgt L1 L2 L3 | |
| jfge L1 L2 L3 | |
| Branch to L3 if L1 is less than (less than or equal to, greater than, | |
| greater than or equal to) L2. | |
| +0 and -0 behave identically in comparisons. In particular, +0 is | |
| considered equal to -0, not greater than -0. | |
| 2.14: Random Number Generator | |
| random L1 S1 | |
| Return a random number in the range 0 to (L1-1); or, if L1 is negative, the | |
| range (L1+1) to 0. If L1 is zero, return a random number in the full 32-bit | |
| integer range. (Remember that this may be either positive or negative.) | |
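The three ranges can be sketched in Python. The function glulx_random is hypothetical, and Python's random.Random stands in for whatever generator an interpreter actually uses; only the range logic is the point here.

```python
import random

MASK32 = 0xFFFFFFFF


def to_signed32(value):
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def glulx_random(l1, rng):
    """Return a 32-bit result in the range selected by L1."""
    n = to_signed32(l1)
    if n == 0:
        return rng.getrandbits(32)        # full 32-bit range
    if n > 0:
        return rng.randrange(n)           # 0 to (L1-1)
    return (-rng.randrange(-n)) & MASK32  # (L1+1) to 0, as a 32-bit value


rng = random.Random(12345)                # fixed seed, for a repeatable demo
rolls = [glulx_random(6, rng) for _ in range(200)]
```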
| setrandom L1 | |
| Seed the random-number generator with the value L1. If L1 is zero, | |
| subsequent random numbers will be as genuinely unpredictable as the terp | |
| can provide; it may include timing data or other random sources in its | |
| generation. If L1 is nonzero, subsequent random numbers will follow a | |
| deterministic sequence, always the same for a given nonzero seed. | |
| The terp starts up in the "nondeterministic" mode (as if setrandom 0 had | |
| been invoked.) | |
| The random-number generator is not part of the saved-game state. | |
| 2.15: Block Copy and Clear | |
| mzero L1 L2 | |
| Write L1 zero bytes, starting at address L2. This is exactly equivalent to: | |
| for (ix=0: ix<L1: ix++) L2->ix = 0; | |
| mcopy L1 L2 L3 | |
| Copy L1 bytes from address L2 to address L3. It is safe to copy a block to | |
| an overlapping block. This is exactly equivalent to: | |
| if (L3 < L2) | |
| for (ix=0: ix<L1: ix++) L3->ix = L2->ix; | |
| else | |
| for (ix=L1-1: ix>=0: ix--) L3->ix = L2->ix; | |
| For both of these opcodes, L1 may be zero, in which case the opcodes do | |
| nothing. The operands are considered unsigned, so a "negative" L1 is a very | |
| large number (and almost certainly a mistake). | |
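For readers less familiar with Inform syntax, the same two loops can be written in Python. This is a direct transcription under the assumption that memory is a bytearray; the copy direction is chosen exactly as in the pseudocode above so that overlapping blocks are handled safely.

```python
def mzero(mem, count, addr):
    """Write `count` zero bytes starting at `addr`."""
    for ix in range(count):
        mem[addr + ix] = 0


def mcopy(mem, count, src, dst):
    """Copy `count` bytes from `src` to `dst`; the blocks may overlap."""
    if dst < src:
        # Destination is lower: copy forwards.
        for ix in range(count):
            mem[dst + ix] = mem[src + ix]
    else:
        # Destination is higher (or equal): copy backwards.
        for ix in range(count - 1, -1, -1):
            mem[dst + ix] = mem[src + ix]
```

Copying five bytes from address 0 to address 2 in a buffer holding 0 through 9 leaves bytes 2 to 6 holding the original 0 through 4, even though the source and destination overlap.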
| These opcodes were added in Glulx version 3.1. You can test for their | |
| availability with the MemCopy gestalt selector. | |
| 2.16: Searching | |
| Perform a generic linear, binary, or linked-list search. | |
| [[These are outrageously CISC for a hardware CPU, but easy enough to add | |
| to a software terp; and taking advantage of them can speed up a program | |
| considerably. Advent, under the Inform library, runs 15-20% faster when | |
| property-table lookup is handled with a binary-search opcode instead of | |
| Inform code. A similar change in the dictionary lookup trims another | |
| percent or so.]] | |
| All three of these opcodes operate on a collection of fixed-size data | |
| structures in memory. A key, which is a fixed-length array of bytes, is | |
| found at a known position within each data structure. The opcodes search | |
| the collection of structures, and find one whose key matches a given key. | |
| The following flags may be set in the Options argument. Note that not all | |
| flags can be used with all types of searches. | |
| * KeyIndirect (0x01): This flag indicates that the Key argument passed | |
| to the opcode is the address of the actual key. If this flag is not used, | |
| the Key argument is the key value itself. (In this case, the KeySize *must* | |
| be 1, 2, or 4 -- the native sizes of Glulx values. If the KeySize is 1 or | |
| 2, the lower bytes of the Key are used and the upper bytes ignored.) | |
| * ZeroKeyTerminates (0x02): This flag indicates that the search should | |
| stop (and return failure) if it encounters a structure whose key is all | |
| zeroes. If the searched-for key happens to also be all zeroes, the success | |
| takes precedence. | |
| * ReturnIndex (0x04): This flag indicates that the search should return the | |
| array index of the structure that it finds, or -1 (0xFFFFFFFF) for failure. | |
| If this flag is not used, the search returns the address of the structure | |
| that it finds, or 0 for failure. | |
| linearsearch L1 L2 L3 L4 L5 L6 L7 S1 | |
| * L1: Key | |
| * L2: KeySize | |
| * L3: Start | |
| * L4: StructSize | |
| * L5: NumStructs | |
| * L6: KeyOffset | |
| * L7: Options | |
| * S1: Result | |
| An array of data structures is stored in memory, beginning at Start, each | |
| structure being StructSize bytes. Within each struct, there is a key value | |
| KeySize bytes long, starting at position KeyOffset (from the start of the | |
| structure.) Search through these in order. If one is found whose key | |
| matches, return it. If NumStructs are searched with no result, the search | |
| fails. | |
| NumStructs may be -1 (0xFFFFFFFF) to indicate no upper limit to the number | |
| of structures to search. The search will continue until a match is found, | |
| or (if ZeroKeyTerminates is used) a zero key. | |
| The KeyIndirect, ZeroKeyTerminates, and ReturnIndex options may be used. | |
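The linearsearch rules, including all three option flags, might be sketched as follows. This is an illustrative model only: memory is a bytearray, the helper search_key and the Python-level signature are assumptions, and the bounds check that raises IndexError stands in for whatever fatal error a real terp would report.

```python
KEY_INDIRECT = 0x01
ZERO_KEY_TERMINATES = 0x02
RETURN_INDEX = 0x04


def search_key(mem, key, key_size, options):
    """Return the searched-for key as big-endian bytes."""
    if options & KEY_INDIRECT:
        return bytes(mem[key:key + key_size])
    # Direct key: key_size must be 1, 2, or 4; only the low bytes are used.
    return (key & 0xFFFFFFFF).to_bytes(4, "big")[4 - key_size:]


def linearsearch(mem, key, key_size, start, struct_size,
                 num_structs, key_offset, options):
    want = search_key(mem, key, key_size, options)
    zero = bytes(key_size)
    index = 0
    # NumStructs of 0xFFFFFFFF (-1) means "no upper limit".
    while num_structs == 0xFFFFFFFF or index < num_structs:
        pos = start + index * struct_size + key_offset
        if pos + key_size > len(mem):
            raise IndexError("search ran past the end of memory")
        found = bytes(mem[pos:pos + key_size])
        if found == want:
            # A match is tested first, so success takes precedence
            # even when the searched-for key is all zeroes.
            return index if options & RETURN_INDEX else pos - key_offset
        if options & ZERO_KEY_TERMINATES and found == zero:
            break
        index += 1
    return 0xFFFFFFFF if options & RETURN_INDEX else 0
```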
| binarysearch L1 L2 L3 L4 L5 L6 L7 S1 | |
| * L1: Key | |
| * L2: KeySize | |
| * L3: Start | |
| * L4: StructSize | |
| * L5: NumStructs | |
| * L6: KeyOffset | |
| * L7: Options | |
| * S1: Result | |
| An array of data structures is in memory, as above. However, the structs | |
| must be stored in forward order of their keys (taking each key to be a | |
| big-endian unsigned integer.) There can be no duplicate keys. NumStructs | |
| must indicate the exact length of the array; it cannot be -1. | |
| The KeyIndirect and ReturnIndex options may be used. | |
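A sketch of the binary search, under the same modelling assumptions (a bytearray for memory, hypothetical Python signature). Keys are compared as equal-length byte strings, which in Python orders them exactly as big-endian unsigned integers.

```python
KEY_INDIRECT = 0x01
RETURN_INDEX = 0x04


def search_key(mem, key, key_size, options):
    """Return the searched-for key as big-endian bytes."""
    if options & KEY_INDIRECT:
        return bytes(mem[key:key + key_size])
    # Direct key: key_size must be 1, 2, or 4; only the low bytes are used.
    return (key & 0xFFFFFFFF).to_bytes(4, "big")[4 - key_size:]


def binarysearch(mem, key, key_size, start, struct_size,
                 num_structs, key_offset, options):
    want = search_key(mem, key, key_size, options)
    low, high = 0, num_structs          # search the half-open range [low, high)
    while low < high:
        mid = (low + high) // 2
        addr = start + mid * struct_size
        found = bytes(mem[addr + key_offset:addr + key_offset + key_size])
        if found == want:
            return mid if options & RETURN_INDEX else addr
        if found < want:
            low = mid + 1
        else:
            high = mid
    return 0xFFFFFFFF if options & RETURN_INDEX else 0
```

The sketch relies on the stated preconditions: sorted keys, no duplicates, and an exact NumStructs. It does not check them.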
| linkedsearch L1 L2 L3 L4 L5 L6 S1 | |
| * L1: Key | |
| * L2: KeySize | |
| * L3: Start | |
| * L4: KeyOffset | |
| * L5: NextOffset | |
| * L6: Options | |
| * S1: Result | |
| The structures need not be consecutive; they may be anywhere in memory, in | |
| any order. They are linked by a four-byte address field, which is found in | |
| each struct at position NextOffset. If this field contains zero, it | |
| indicates the end of the linked list. | |
| The KeyIndirect and ZeroKeyTerminates options may be used. | |
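The linked-list variant can be sketched in the same style. Again, the bytearray memory model and the Python signature are assumptions; treating a Start of zero as an empty list is also an assumption of this sketch, chosen to match the rule that a zero link ends the list.

```python
KEY_INDIRECT = 0x01
ZERO_KEY_TERMINATES = 0x02


def search_key(mem, key, key_size, options):
    """Return the searched-for key as big-endian bytes."""
    if options & KEY_INDIRECT:
        return bytes(mem[key:key + key_size])
    # Direct key: key_size must be 1, 2, or 4; only the low bytes are used.
    return (key & 0xFFFFFFFF).to_bytes(4, "big")[4 - key_size:]


def linkedsearch(mem, key, key_size, start, key_offset, next_offset, options):
    want = search_key(mem, key, key_size, options)
    zero = bytes(key_size)
    addr = start
    while addr != 0:
        found = bytes(mem[addr + key_offset:addr + key_offset + key_size])
        if found == want:
            return addr
        if options & ZERO_KEY_TERMINATES and found == zero:
            break
        # Follow the four-byte big-endian link field to the next struct.
        link = addr + next_offset
        addr = int.from_bytes(mem[link:link + 4], "big")
    return 0
```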
| 2.17: Accelerated Functions | |
| To improve performance, Glulx incorporates some complex functions which | |
| replicate code in the Inform library. [[Yes, this is even more outrageously | |
| CISC than the search opcodes.]] | |
| Rather than allocating a new opcode for each function, Glulx offers an | |
| expandable function acceleration system. Two opcodes are defined below. | |
| The game may request that a particular address -- the address of a VM | |
| function -- be replaced by one of the available functions. This does not | |
| alter memory; but any subsequent call to that address might invoke the | |
| terp's built-in version of the function, instead of the VM code at that | |
| address. | |
| (A "call" includes any function invocation of that address, including the | |
| call, tailcall, and callf (etc.) opcodes. It also includes invocation via | |
| the filter I/O system, and function nodes in the string-decoding table. | |
| Branches to the address are *not* affected; neither are returns, throws, or | |
| other ways the terp might reach it.) | |
| A terp may implement any, all, or none of the functions on the list. If the | |
| game requests an accelerated function which is not available, the request | |
| is ignored. Therefore, the game *must* be sure that it only requests an | |
| accelerated function at an address which actually matches the requested | |
| function. | |
| Some functions may require values (or addresses) which are compiled into | |
| the game file, or otherwise stored by the game. The interpreter maintains a | |
| table of these parameters -- whichever ones are needed by the functions it | |
| supports. All parameters in the table are initially zero; the game may | |
| supply values as needed. | |
| The set of active acceleration requests, and the values in the parameter | |
| table, are *not* part of the saved-game state. | |
| The behavior of an accelerated function is somewhat limited. The state of | |
| the VM during the function is not defined, so there is no way for an | |
| accelerated function to call a normal VM function. The normal printing | |
| mechanism (as in the streamchar opcode, etc) is not available, since that | |
| can call VM functions via the filter I/O system. [[Not that I/O functions | |
| are likely to be worth accelerating in any case.]] | |
| Errors encountered during an accelerated function will be displayed to the | |
| user by some convenient means. For example, an interpreter may send the | |
| error message to the current Glk output stream. However, the terp may have | |
| no recourse but to invoke a *fatal* error. (For example, if there is no | |
| current Glk output stream.) Therefore, accelerated functions are defined | |
| with no error conditions that must be recoverable. | |
| These opcodes were added in Glulx version 3.1.1. Since a 3.1.1 game file | |
| ought to run in a 3.1.0 interpreter, you *may not* use these opcodes | |
| without first testing the Acceleration gestalt selector. If it returns | |
| zero, your game is running on a 3.1.0 terp (or earlier), and it is your | |
| responsibility to avoid executing these opcodes. [[Of course, the way the | |
| opcodes are defined should ensure that skipping them does not affect the | |
| behavior of your game.]] | |
| accelfunc L1 L2 | |
| Request that the VM function with address L2 be replaced by the accelerated | |
| function whose number is L1. If L1 is zero, the acceleration for address L2 | |
| is cancelled. | |
| If the terp does not offer accelerated function L1, this does nothing. | |
| If you request acceleration at an address which is already accelerated, the | |
| previous request is cancelled before the new one is considered. If you | |
| cancel at an unaccelerated address, nothing happens. | |
| A given accelerated function L1 may replace several VM functions (at | |
| different addresses) at the same time. Each request is considered separate, | |
| and must be cancelled separately. | |
| accelparam L1 L2 | |
| Store the value L2 in the parameter table at position L1. If the terp does | |
| not know about parameter L1, this does nothing. | |
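The bookkeeping behind accelfunc and accelparam can be modelled with two small tables. The sketch below is one possible arrangement, not the way any particular interpreter does it; the class name AccelTable, its method names, and the lookup helper are hypothetical.

```python
class AccelTable:
    """Hypothetical model of a terp's acceleration state."""

    def __init__(self, supported_funcs, known_params):
        # supported_funcs: function number -> built-in implementation.
        self.supported = dict(supported_funcs)
        # All parameters in the table are initially zero.
        self.params = {index: 0 for index in known_params}
        # Active requests: VM function address -> function number.
        self.active = {}

    def accelfunc(self, func_num, addr):
        # Any previous request at this address is cancelled first.
        self.active.pop(addr, None)
        # Zero means "cancel"; unsupported numbers are silently ignored.
        if func_num != 0 and func_num in self.supported:
            self.active[addr] = func_num

    def accelparam(self, index, value):
        # Unknown parameter positions are silently ignored.
        if index in self.params:
            self.params[index] = value & 0xFFFFFFFF

    def lookup(self, addr):
        """Return the built-in for a call to `addr`, or None."""
        func_num = self.active.get(addr)
        return None if func_num is None else self.supported[func_num]
```

Only the call paths listed earlier (call, tailcall, callf, filter I/O, string-table function nodes) would consult lookup; branches and returns would not. Since this state is not saved, a game would reissue its requests after a restore.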
| The list of accelerated functions is as follows. They are defined as if in | |
| Inform source code. (Consider Inform's "strict" mode to be off, for the | |
| purposes of operators such as .& and -->.) ERROR() represents code which | |
| displays an error, as described above. | |
| (Functions may be added to this list in future versions of the Glulx spec. | |
| Existing functions will not be removed or altered.) | |
| Constant PARAM_0_classes_table = #classes_table; | |
| Constant PARAM_1_indiv_prop_start = INDIV_PROP_START; | |
| Constant PARAM_2_class_metaclass = Class; | |
| Constant PARAM_3_object_metaclass = Object; | |
| Constant PARAM_4_routine_metaclass = Routine; | |
| Constant PARAM_5_string_metaclass = String; | |
| Constant PARAM_6_self = #globals_array + WORDSIZE * #g$self; | |
| Constant PARAM_7_num_attr_bytes = NUM_ATTR_BYTES; | |
| Constant PARAM_8_cpv__start = #cpv__start; | |
| ! OBJ_IN_CLASS: utility function; implements "obj in Class". | |
| [ OBJ_IN_CLASS obj; | |
| return ((obj + 13 + PARAM_7_num_attr_bytes)-->0 | |
| == PARAM_2_class_metaclass); | |
| ]; | |
| ! FUNC_1_Z__Region: implements Z__Region() as of Inform 6.31. | |
| [ FUNC_1_Z__Region addr | |
| tb endmem; ! locals | |
| if (addr<36) rfalse; | |
| @getmemsize endmem; | |
| @jgeu addr endmem?outrange; ! branch if addr >= endmem (unsigned) | |
| tb=addr->0; | |
| if (tb >= $E0) return 3; | |
| if (tb >= $C0) return 2; | |
| if (tb >= $70 && tb <= $7F && addr >= (0-->2)) return 1; | |
| .outrange; | |
| rfalse; | |
| ]; | |
| ! FUNC_2_CP__Tab: implements CP__Tab() as of Inform 6.31. | |
| [ FUNC_2_CP__Tab obj id | |
| otab max res; ! locals | |
| if (FUNC_1_Z__Region(obj)~=1) { | |
| ERROR("[** Programming error: tried to find the ~.~ of (something) | |
| **]"); | |
| rfalse; | |
| } | |
| otab = obj-->4; | |
| if (otab == 0) return 0; | |
| max = otab-->0; | |
| otab = otab+4; | |
| @binarysearch id 2 otab 10 max 0 0 res; | |
| return res; | |
| ]; | |
| ! FUNC_3_RA__Pr: implements RA__Pr() as of Inform 6.31. | |
| [ FUNC_3_RA__Pr obj id | |
| cla prop ix; ! locals | |
| if (id & $FFFF0000) { | |
| cla = PARAM_0_classes_table-->(id & $FFFF); | |
| if (~~FUNC_5_OC__Cl(obj, cla)) return 0; | |
| @ushiftr id 16 id; | |
| obj = cla; | |
| } | |
| prop = FUNC_2_CP__Tab(obj, id); | |
| if (prop==0) return 0; | |
| if (OBJ_IN_CLASS(obj) && cla == 0) { | |
| if (id < PARAM_1_indiv_prop_start | |
| || id >= PARAM_1_indiv_prop_start+8) | |
| return 0; | |
| } | |
| if (PARAM_6_self-->0 ~= obj) { | |
| @aloadbit prop 72 ix; | |
| if (ix) return 0; | |
| } | |
| return prop-->1; | |
| ]; | |
| ! FUNC_4_RL__Pr: implements RL__Pr() as of Inform 6.31. | |
| [ FUNC_4_RL__Pr obj id | |
| cla prop ix; ! locals | |
| if (id & $FFFF0000) { | |
| cla = PARAM_0_classes_table-->(id & $FFFF); | |
| if (~~FUNC_5_OC__Cl(obj, cla)) return 0; | |
| @ushiftr id 16 id; | |
| obj = cla; | |
| } | |
| prop = FUNC_2_CP__Tab(obj, id); | |
| if (prop==0) return 0; | |
| if (OBJ_IN_CLASS(obj) && cla == 0) { | |
| if (id < PARAM_1_indiv_prop_start | |
| || id >= PARAM_1_indiv_prop_start+8) | |
| return 0; | |
| } | |
| if (PARAM_6_self-->0 ~= obj) { | |
| @aloadbit prop 72 ix; | |
| if (ix) return 0; | |
| } | |
| @aloads prop 1 ix; | |
| return WORDSIZE * ix; | |
| ]; | |
| ! FUNC_5_OC__Cl: implements OC__Cl() as of Inform 6.31. | |
| [ FUNC_5_OC__Cl obj cla | |
| zr jx inlist inlistlen; ! locals | |
| zr = FUNC_1_Z__Region(obj); | |
| if (zr == 3) { | |
| if (cla == PARAM_5_string_metaclass) rtrue; | |
| rfalse; | |
| } | |
| if (zr == 2) { | |
| if (cla == PARAM_4_routine_metaclass) rtrue; | |
| rfalse; | |
| } | |
| if (zr ~= 1) rfalse; | |
| if (cla == PARAM_2_class_metaclass) { | |
| if (OBJ_IN_CLASS(obj) | |
| || obj == PARAM_2_class_metaclass or PARAM_5_string_metaclass | |
| or PARAM_4_routine_metaclass or PARAM_3_object_metaclass) | |
| rtrue; | |
| rfalse; | |
| } | |
| if (cla == PARAM_3_object_metaclass) { | |
| if (OBJ_IN_CLASS(obj) | |
| || obj == PARAM_2_class_metaclass or PARAM_5_string_metaclass | |
| or PARAM_4_routine_metaclass or PARAM_3_object_metaclass) | |
| rfalse; | |
| rtrue; | |
| } | |
| if (cla == PARAM_5_string_metaclass or PARAM_4_routine_metaclass) | |
| rfalse; | |
| if (~~OBJ_IN_CLASS(cla)) { | |
| ERROR("[** Programming error: tried to apply 'ofclass' with | |
| non-class **]"); | |
| rfalse; | |
| } | |
| inlist = FUNC_3_RA__Pr(obj, 2); | |
| if (inlist == 0) rfalse; | |
| inlistlen = FUNC_4_RL__Pr(obj, 2) / WORDSIZE; | |
| for (jx=0 : jx<inlistlen : jx++) { | |
| if (inlist-->jx == cla) rtrue; | |
| } | |
| rfalse; | |
| ]; | |
| ! FUNC_6_RV__Pr: implements RV__Pr() as of Inform 6.31. | |
| [ FUNC_6_RV__Pr obj id | |
| addr; ! locals | |
| addr = FUNC_3_RA__Pr(obj, id); | |
| if (addr == 0) { | |
| if (id > 0 && id < PARAM_1_indiv_prop_start) { | |
| return PARAM_8_cpv__start-->id; | |
| } | |
| ERROR("[** Programming error: tried to read (something) **]"); | |
| return 0; | |
| } | |
| return addr-->0; | |
| ]; | |
| ! FUNC_7_OP__Pr: implements OP__Pr() as of Inform 6.31. | |
| [ FUNC_7_OP__Pr obj id | |
| zr; ! locals | |
| zr = FUNC_1_Z__Region(obj); | |
| if (zr == 3) { | |
| if (id == print or print_to_array) rtrue; | |
| rfalse; | |
| } | |
| if (zr == 2) { | |
| if (id == call) rtrue; | |
| rfalse; | |
| } | |
| if (zr ~= 1) rfalse; | |
| if (id >= PARAM_1_indiv_prop_start | |
| && id < PARAM_1_indiv_prop_start+8) { | |
| if (OBJ_IN_CLASS(obj)) rtrue; | |
| } | |
| if (FUNC_3_RA__Pr(obj, id) ~= 0) | |
| rtrue; | |
| rfalse; | |
| ]; | |
| 2.18: Miscellaneous | |
| nop | |
| Do nothing. | |
| gestalt L1 L2 S1 | |
| Test the Gestalt selector number L1, with optional extra argument L2, and | |
| store the result in S1. If the selector is not known, store zero. | |
| The reasoning behind the design of a Gestalt system is, I hope, too obvious | |
| to explain. | |
| [[This list of Gestalt selectors has nothing to do with the list in the Glk | |
| library.]] | |
| The list of L1 selectors is as follows. Note that if a selector does not | |
| mention L2, you should always set that argument to zero. [[This will ensure | |
| future compatibility, in case the selector definition is extended.]] | |
| * GlulxVersion (0): Returns the version of the Glulx spec which the | |
| interpreter implements. The upper 16 bits of the value contain a major | |
| version number; the next 8 bits contain a minor version number; and the | |
| lowest 8 bits contain an even more minor version number, if any. This | |
| specification is version 3.1.2, so a terp implementing it would return | |
| 0x00030102. I will try to maintain the convention that minor version | |
| changes are backwards compatible, and subminor version changes are | |
| backwards and forwards compatible. | |
| * TerpVersion (1): Returns the version of the interpreter. The format | |
| is the same as the GlulxVersion. [[Each interpreter has its own version | |
| numbering system, defined by its author, so this information is not | |
| terribly useful. But it is convenient for the game to be able to display | |
| it, in case the player is capturing version information for a bug report.]] | |
| * ResizeMem (2): Returns 1 if the terp has the potential to resize the | |
| memory map, with the setmemsize opcode. If this returns 0, setmemsize will | |
| always fail. [[But remember that setmemsize might fail in any case.]] | |
| * Undo (3): Returns 1 if the terp has the potential to undo. If this | |
| returns 0, saveundo and restoreundo will always fail. | |
| * IOSystem (4): Returns 1 if the terp supports the I/O system given in | |
| L2. (The constants are the same as for the setiosys opcode: 0 for null, 1 | |
| for filter, 2 for Glk, 20 for FyreVM. 0 and 1 will always succeed.) | |
| * Unicode (5): Returns 1 if the terp supports Unicode operations. These | |
| are: the E2 Unicode string type; the 04 and 05 string node types (in | |
| compressed strings); the streamunichar opcode; the type-14 call stub. If | |
| the Unicode selector returns 0, encountering any of these will cause a | |
| fatal interpreter error. | |
| * MemCopy (6): Returns 1 if the interpreter supports the mzero and | |
| mcopy opcodes. (This must be true for any terp supporting Glulx 3.1.) | |
| * MAlloc (7): Returns 1 if the interpreter supports the malloc and | |
| mfree opcodes. (If this is true, MemCopy and ResizeMem must also both be | |
| true, so there is no need to check all three.) | |
| * MAllocHeap (8): Returns the start address of the heap. This is the | |
| value that getmemsize had when the first memory block was allocated. If the | |
| heap is not active (no blocks are extant), this returns zero. | |
| * Acceleration (9): Returns 1 if the interpreter supports the accelfunc | |
| and accelparam opcodes. (This must be true for any terp supporting Glulx | |
| 3.1.1.) | |
| * AccelFunc (10): Returns 1 if the terp implements the accelerated | |
| function given in L2. | |
| * Float (11): Returns 1 if the interpreter supports the floating-point | |
| arithmetic opcodes. | |
| Selectors 0x1000 to 0x10FF are reserved for use by FyreVM. Selectors 0x1100 | |
| to 0x11FF are reserved for extension projects by Dannii Willis. These are | |
| not documented here. See section 0.2, "Glulx and Other IF Systems". | |
| [[The Unicode selector is slightly redundant. Since the Unicode operations | |
| exist in Glulx spec 3.0 and higher, you can get the same information by | |
| testing GlulxVersion against 0x00030000. However, it's clearer to have a | |
| separate selector. Similarly, the MemCopy selector is true exactly when | |
| GlulxVersion is 0x00030100 or higher.]] | |
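The packing of the GlulxVersion value, and the version comparisons mentioned in the note above, work out as in this short sketch. The function names are illustrative only.

```python
def decode_glulx_version(value):
    """Split a GlulxVersion result into (major, minor, subminor)."""
    return ((value >> 16) & 0xFFFF, (value >> 8) & 0xFF, value & 0xFF)


def encode_glulx_version(major, minor, subminor=0):
    """Pack version components: 16 bits major, 8 minor, 8 subminor."""
    return ((major & 0xFFFF) << 16) | ((minor & 0xFF) << 8) | (subminor & 0xFF)


def implies_unicode(version):
    # Unicode operations exist in spec 3.0 and higher.
    return version >= 0x00030000


def implies_memcopy(version):
    # mzero and mcopy exist in spec 3.1 and higher.
    return version >= 0x00030100
```

For this specification, decode_glulx_version(0x00030102) gives (3, 1, 2).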
| [[The Unicode selector does *not* guarantee that your Glk library supports | |
| Unicode. For that, you must check the Glk gestalt selector gestalt_Unicode. | |
| If the Glk library is non-Unicode, the Glulx Unicode operations are still | |
| legal; however, Unicode characters (beyond FF) will be printed as 3F | |
| ("?").]] | |
| debugtrap L1 | |
| Interrupt execution to do something interpreter-specific with L1. If the | |
| interpreter has nothing in mind, it should halt with a visible error | |
| message. | |
| [[This is intended for use by debugging interpreters. The program might be | |
| sprinkled with consistency tests, set to call debugtrap if an assertion | |
| failed. The interpreter could then be set to halt, display a warning, or | |
| ignore the debugtrap.]] | |
| This should *not* be used as an arbitrary interpreter trap-door in a | |
| finished (non-debugging) program. If you really want to add interpreter | |
| functionality to your program, and you're willing to support an alternate | |
| interpreter to run it, you should add an entirely new opcode. There are | |
| still 2^28 of them available, give or take. | |
| glk L1 L2 S1 | |
| Call the Glk API function whose identifier is L1, passing in L2 arguments. | |
| The return value is stored at S1. (If the Glk function has no return value, | |
| zero is stored at S1.) | |
| The arguments are passed on the stack, last argument pushed first, just as | |
| for the call opcode. | |
| Arguments should be represented in the obvious way. Integers and characters | |
| are passed as integers. Glk opaque objects are passed as integer | |
| identifiers, with zero representing NULL. Strings and Unicode strings are | |
| passed as the addresses of Glulx string objects (see section 1.6.1, | |
| "Strings".) References to values are passed by their addresses. Arrays are | |
| passed by their addresses; note that an array argument, unlike a string | |
| argument, is always followed by an array length argument. | |
| Reference arguments require more explanation. A reference to an integer or | |
| opaque object is the address of a 32-bit value (which, being in main | |
| memory, does not have to be aligned, but must be big-endian.) | |
| Alternatively, the value -1 (FFFFFFFF) may be passed; this is a special | |
| case, which means that the value is read from or written to the stack. | |
| Arguments are always evaluated left to right, which means that input | |
| arguments are popped from the stack first-topmost, but output arguments are | |
| pushed on last-topmost. | |
| A reference to a Glk structure is the address of an array of 32-bit values | |
| in main memory. Again, -1 means that all the values are written to the | |
| stack. Also again, an input structure is popped off first-topmost, and an | |
| output structure is pushed on last-topmost. | |
| All stack input references (-1 addresses) are popped after the Glk argument | |
| list is popped. [[This should be obvious, since the -1 occurs *in* the Glk | |
| argument list.]] Stack output references are pushed after the Glk call, but | |
| before the S1 result value is stored. | |
| [[The difference between strings and character arrays is somewhat | |
| confusing. These are the same type in the C Glk API, but different in | |
| Glulx. Calls such as glk_put_buffer() and glk_request_line_event() take | |
| character arrays; this is the address of a byte array containing character | |
| values, followed by an integer array length. The byte array itself has | |
| neither a length field nor a terminator. In contrast, calls such as | |
| glk_put_string() and glk_fileref_create_by_name() take string arguments, | |
| which must be unencoded Glulx string objects. An unencoded Glulx string | |
| object is nearly a byte array, but not quite; it has an E0 byte at the | |
| beginning and a zero byte at the end. Similarly, calls such as | |
| glk_put_string_uni() take unencoded (E2) Unicode objects.]] | |
| [[Previous versions of this spec said that string arguments could be | |
| unencoded *or encoded* string objects. This use of encoded strings has | |
| never been supported, however, and it is withdrawn from the spec.]] | |
| [[The convention that "address" -1 refers to the stack is a feature of the | |
| Glk invocation mechanism; it applies only to Glk arguments. It is *not* | |
| part of the general Glulx definition. When instruction operands are being | |
| evaluated, -1 has no special meaning. This includes the L1, L2, and S1 | |
| arguments of the glk opcode.]] | |
| 2.19: Assembly Language | |
| The format used by Inform is acceptable for now: | |
| @opcode [ op op op ... ] ; | |
| Where each "op" is a constant, the name of a local variable, the name of a | |
| global variable, or "sp" (for stack push/pop modes). | |
| [[It would be convenient to have a one-line form for the opcodes that pass | |
| arguments on the stack (call and glk).]] | |
| To make life a little easier for cross-platform I6 code, Inform accepts the | |
| macro "@push val" for "@copy val sp", and "@pull val" for "@copy sp val". | |
| Supporting these forms is recommended. | |
| You can synthesize opcodes that the compiler does not know about: | |
| @"FlagsCount:Code" [ op op op ... ] ; | |
| The optional Flags can include "S" if the last operand is a store; "SS" if | |
| the last two operands are stores; "B" for branch format; "R" if execution | |
| never continues after the opcode. The Count is the number of arguments (0 | |
| to 9). The Code is a decimal integer representing the opcode number. So | |
| these two lines generate the same code: | |
| @add x 1 y; | |
| @"S3:16" x 1 y; | |
| ...because the @add opcode has number 16 (decimal), and has format "@add L1 | |
| L2 S1". | |
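As a small worked example of the "FlagsCount:Code" form, the helper below builds the synthesized-opcode line from an opcode number and its operands. It is a hypothetical convenience for a code generator, not part of Inform; the function name and its argument order are assumptions.

```python
def synthesized_opcode(code, operands, flags=""):
    """Format an Inform assembly line using the quoted opcode syntax.

    code     -- decimal opcode number
    operands -- operand strings (constants, variable names, or "sp")
    flags    -- optional flag letters, e.g. "S", "SS", "B", "R"
    """
    operands = list(operands)
    if not 0 <= len(operands) <= 9:
        raise ValueError("Count must be between 0 and 9")
    parts = ['@"{}{}:{}"'.format(flags, len(operands), code)] + operands
    return " ".join(parts) + ";"
```

Calling synthesized_opcode(16, ["x", "1", "y"], "S") produces the second of the two equivalent lines shown above.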