Updated links 150826
Propeller Hardware Explorer with VGA
Tachyon Dropbox files and binaries (latest)
Introduction to TACHYON Forth
Tachyon Forth Resource Links
Tachyon Web Server
FTP: ftp://tachyonforth.com
Telnet: tachyonforth.com 10001
Watch Easynet in operation

Note: these early posts are mostly of historical interest only; please read the latest posts or click the links in my sig.
Enhanced bitmap graphics demo + serial

*ORIGINAL POST*
I've been hooked on Forth again after a long break away. Thanks to Sal's PropForth and, more recently, the Bluetooth modules, I have rediscovered the advantages and fun of programming and testing in a Forth environment. I mentioned that I have been away from Forth for a while, and that has to do with the Propeller chip: I like using it, but Forth does not lend itself to this architecture very easily. Several years ago (time flies) I had a look at writing a Forth called CogForth for the Prop, but I felt it was too much hard work, which it was, not just because of the architecture but also because of the limitations of the tools (Spin tool etc). Even so, the Forth would have been too slow for what I need, and there were memory limitations.
However, spurred on by the efforts of Sal Sanci and Prop Braino, I have taken another look at my old CogForth, since I needed, amongst other things, more runtime speed without having to resort to assembler. Over the last couple of days CogForth has been completely revamped and I think I'm on a winner with this implementation. It is both fast and very small thanks to using a byte code for each Forth VM operation. Like a tachyon, it is fast and very small (as a hypothetical particle anyway), with the emphasis on fast I/O operations and making the most of the Propeller's memory. What would some byte code look like? Have a look at this function, which prints a hex character:
```
0534(001C)             | PRTHEX      ' ( n -- ) print n (0..$0F) as a hex character
0534(001C) 2D          |             byte CLIT/2,$30,PLUS/2
0535(001C) 30          |
0536(001C) 0C          |
0537(001C) 05          |             byte DUP/2,CLIT/2,$39,GT/2,_IF/2,3
0538(001D) 2D          |
0539(001D) 39          |
053A(001D) 20          |
053B(001D) 3E          |
053C(001E) 03          |
053D(001E) 2D          |             byte CLIT/2,12,PLUS/2 'Adjust for A..F
053E(001E) 0C          |
053F(001E) 0C          |
0540(001F) 49          | PRTCH       byte EMIT/2,EXIT/2
0541(001F) 00          |
```
EDIT: Byte codes must be shifted one bit right to compress 9 bits into 8; the lsb is always zero as all byte code functions are on double-long boundaries.
So you can see this function takes 14 bytes. Compare this to the Spin version, which also uses byte codes and occupies 20 bytes ($05B0..$05C3):
```
88                        char+=$30
Addr : 05B0:          38 30  : Constant 1 Bytes - 30 - $00000030 48
Addr : 05B2:          66 4C  : Variable Operation Local Offset - 1 Assign WordMathop +
89                        if char > $39
Addr : 05B4:             64  : Variable Operation Local Offset - 1 Read
Addr : 05B5:          38 39  : Constant 1 Bytes - 39 - $00000039 57
Addr : 05B7:             FA  : Math Op >
Addr : 05B8: JZ Label0002
Addr : 05B8:          0A 04  : jz Address = 05BE 4
90                          char+=12
Addr : 05BA:          38 0C  : Constant 1 Bytes - 0C - $0000000C 12
Addr : 05BC:          66 4C  : Variable Operation Local Offset - 1 Assign WordMathop +
Addr : 05BE: Label0002
Addr : 05BE: Label0003
91                        coms.tx(char)
Addr : 05BE:             01  : Drop Anchor
Addr : 05BF:             64  : Variable Operation Local Offset - 1 Read
Addr : 05C0:       06 03 0B  : Call Obj.Sub 3 11
Addr : 05C3:             32  : Return
```
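For anyone who would rather not trace the byte codes by hand, the logic shared by both listings can be sketched in Python. This is only an illustration and not part of Tachyon; `nibble_to_hex` is a hypothetical name. One caveat: the gap between `9` ($39) and `A` ($41) is 7, while both listings add 12, which would turn 10 into `F`. The sketch assumes 7 was the intent.

```python
def nibble_to_hex(n: int) -> str:
    """Convert 0..15 to its ASCII hex character, following PRTHEX step by step."""
    ch = n + 0x30               # CLIT $30, PLUS
    if ch > 0x39:               # DUP, CLIT $39, GT, _IF
        ch += 7                 # assumption: 7, not the 12 shown in the listings
    return chr(ch)              # EMIT would send this character out


# Size comparison read straight off the address columns of the two listings.
forth_bytes = 0x0541 - 0x0534 + 1   # PRTHEX spans $0534..$0541
spin_bytes = 0x05C3 - 0x05B0 + 1    # the Spin version spans $05B0..$05C3
```

Calling `nibble_to_hex` for each value 0..15 and joining the results gives the usual hex digit string, and the two byte counts come out as 14 and 20.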
<removed proposed dictionary description>
The runtime speed comes mainly from two things: many of the primitives are written in assembly, and the stacks are implemented in a way that suits the Prop's architecture, permitting direct addressing, just like a register. Many of the primitives therefore get the job done with very few instructions, and even the runtime interpreter is lean and mean. A byte code is read from hub RAM and shifted left one bit to form a 9-bit address, which the Prop jumps to in COG memory, so it's very direct. The runtime interpreter looks like this:
```
doNEXT          rdbyte  token,IP            'read byte code instruction
                add     IP,#1               'advance IP to next byte token
                shl     token,#1            'expand to 9-bits - all byte codes point to code on double-long boundary
                cmp     token,#$180 wc      'tokens $C0..$FF are calls to kernel byte code via kbctbl
        if_c    jmp     token               'directly execute PASM byte codes without further ado
```
EDIT: Fixed a bug when testing for PASM for extended byte code functions.
The test for byte codes $C0..$FF doesn't really impact the speedy operation of the assembly primitives, which are indexed by codes $00..$BF. I reserve these codes because there is no way you could use all 256 codes for assembly primitives, so I used 64 of them as a very compact way of accessing up to 64 more words (functions) that are interpreted byte codes rather than assembly code. All byte code functions other than these special 64 are referenced with 2 or 3 bytes: one is the byte code and the other 1 or 2 bytes are a relative address pointing back to the word function. The one-byte CALL gets straight into 1 of 64 higher-level functions, which are themselves composed of byte codes that eventually execute assembly code via the first 192 byte codes $00..$BF.
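The split between the two ranges of byte codes can be modelled in a few lines of Python. This is a rough sketch of the decision doNEXT makes, not the real interpreter: `dispatch` and `kernel_table` are hypothetical names, and the table is assumed to be a plain list of 64 hub addresses, one per code $C0..$FF.

```python
from typing import List, Tuple

PASM_LIMIT = 0x180  # 9-bit cog address that corresponds to byte code $C0


def dispatch(token: int, kernel_table: List[int]) -> Tuple[str, int]:
    """Classify one byte code the way doNEXT does.

    Returns ("pasm", cog_address) for codes $00..$BF and
    ("call", hub_address) for codes $C0..$FF.
    """
    if not 0 <= token <= 0xFF:
        raise ValueError("byte code must fit in 8 bits")
    expanded = token << 1                   # shl token,#1
    if expanded < PASM_LIMIT:               # cmp token,#$180 wc / if_c jmp token
        return ("pasm", expanded)
    # Codes $C0..$FF select one of 64 kernel byte code definitions in hub RAM.
    return ("call", kernel_table[token - 0xC0])
```

For example, the byte $2D that the PRTHEX listing shows for `CLIT/2` expands to cog address $5A, while $C0 would fetch the first entry of the table.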
Anyway, I'm still developing and testing, and the beta will be ready very soon, but I thought I would present some details of the workings of this Forth implementation as I am also looking for feedback. Perhaps someone could also suggest an easier way around the Spin/BST compiler limitations, especially with DAT sections and references, which the compiler insists must be on long boundaries. I want the references to be absolute in hub RAM rather than as if they were PASM running in a COG. Also, I am making it far easier to interface to various chips by having low-level code for serial operations and making all the byte code operations fast, especially serial operations. I'm even thinking of making it as easy to use as the Basic Stamp. For instance, there's something in being able to send and receive serial data on any pin at any time (without starting up a cog). The same goes for all those pin high, pin low and clocking operations. I want to be able to hook up an I2C or SPI'ish chip and bit-bash to it at least in the 100kHz range, if not more (without resorting to PASM in a COG).
This is my header file and some code snippets for the moment.
TACHYON
A very fast and very small Forth byte code interpreter for the Propeller chip.
2012 Peter Jakacki

Features:
- Low level words are written in PASM and accessed by the Forth run-time interpreter as single byte codes.
  - Byte codes are read from hub RAM and executed in PASM
  - Byte codes $00..$BF are PASM primitives expanded to 9-bits to directly address COG code
  - Byte codes $C0..$FF are calls to kernel byte code defs via table in hub RAM
- Support for LMM operations
- Interpreted byte code definitions are referenced either as:
  - 1 byte - codes $C0..$FF index their definitions via a table - used as part of compiled kernel
  - 2 bytes - RCALL opcode + relative byte (always referenced backwards) (extra 4 bits in opcode = -4096 range)
    There are 16 entries in the COG for the RCALL byte code + extra address bits
  - 3 bytes - WCALL byte code + 16-bit relative address
- All literals and strings are byte aligned
- Fast I/O bit-bashing support
- Flexible SPI PASM code support words in kernel
  - Construct fast serial drivers with minimal code
- Holds Forth headers in EEPROM or SD storage
  - Searches the dictionary using rapid index key searching by first character
  - No hub RAM is used by headers
  - Even 32K EEPROMs can be used if the corresponding area in RAM is normally rewritten (e.g. video memory)
  - Option to hold additional information per definition such as stack usage and description
- Kernel compiled in standard manner via Spin tools so other Spin objects can be combined
- Three stacks in COG RAM: Data, Return, and Loop
  - Access loop indices outside of definitions
  - Avoids manipulation and corruption of return stack
  - Static stack arrays for direct addressing of stack items
  - Intrinsically safe stack overflow and underflow

Some early unoptimized observations:
- Empty loops can execute in 500ns to 825ns (absolute worst case)
- Two to one stack operations ( + * AND etc) including opcode fetch take 900ns to 1.087us (absolute worst case)
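To make the "static stack arrays" idea more concrete, here is one way such a stack could behave, sketched in Python. It is a guess at the general approach rather than a description of the kernel: `CogStack`, the depth of 8, the shifting of items on every push and pop, and the zero fed in at the bottom are all assumptions for illustration.

```python
class CogStack:
    """Fixed-depth stack whose top item always lives at index 0."""

    def __init__(self, depth: int = 8):
        self.cells = [0] * depth        # stands in for a run of cog registers

    def push(self, value: int) -> None:
        # Every item moves one place deeper; the deepest one falls off the end,
        # so an overflow loses old data but never writes outside the array.
        self.cells = [value] + self.cells[:-1]

    def pop(self) -> int:
        # Every item moves one place up; a zero is fed in at the bottom,
        # so an underflow just returns zeros instead of reading stray memory.
        top = self.cells[0]
        self.cells = self.cells[1:] + [0]
        return top


stack = CogStack(depth=4)
for value in (1, 2, 3):
    stack.push(value)
second = stack.cells[1]     # direct access to the second item, like tos+1
stack.push(second)          # the effect of OVER
```

Because the top of stack never moves, a primitive can name the first and second items directly (as `tos` and `tos+1` do in the PASM) instead of going through a stack pointer.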
```
' Fetch the next byte code instruction pointed to by the instruction pointer IP in hub RAM
'
doNEXT          rdbyte  token,IP            'read byte code instruction
                add     IP,#1               'advance IP to next byte token
                shl     token,#1            'expand to 9-bits - all byte codes point to code on double-long boundary
                cmp     token,#$180 wc      'tokens $C0..$FF are calls to kernel byte code via kbctbl
        if_c    jmp     token               'directly execute PASM byte codes without further ado
' byte codes $C0..$FF point to further byte code definitions
' which are larger fragments of byte code in hub RAM
                call    #SAVEIP             'save current IP in prep for a call
                add     X,kbcptr            'kbcptr points to the kernel byte code table (less $180)
                rdword  IP,X                'read 16-bit address from hub kbc table into IP
                jmp     #doNEXT             'Execute the code

' Example of PASM code entries for Byte Code indexing on double-long boundaries
'
DROP2           call    #POPX
                jmp     #DROP
DUP             mov     X,tos               ' Read directly from the top of the data stack
                jmp     #PUSHX              ' Push X onto the data stack and doNEXT
'
OVER            mov     X,tos+1             'read second data item and push
                jmp     #PUSHX
NIP             mov     tos+1,tos           'replace second item with top and drop
                jmp     #DROP
LIT0            mov     X,#0
                jmp     #PUSHX
LIT1            mov     X,#1
                jmp     #PUSHX
'****************** BOOLEAN ******************
_AND            movi    _POPEX,#%011000_001 ' AND ( n1 n2 -- n3 )
                jmp     #POPEX              'discard top of stack and execute modified PASM
_OR             movi    _POPEX,#%011010_001
                jmp     #POPEX
_XOR            movi    _POPEX,#%011011_001
                jmp     #POPEX
'***************** MEMORY *******************
CFETCH          rdbyte  tos,tos             ' read byte pointed to by tos into tos
                jmp     #doNEXT
CPLUSST         rdbyte  X,tos               ' read in byte from address
                add     tos+1,X             ' add second item to contents of address
CSTORE          wrbyte  tos+1,tos           ' write the second item using address on the tos
                jmp     #DROP2

' Example of interpreted byte codes in hub RAM
' References to other byte code definitions are relative, which is also necessary because of the Spin compiler's limitations with DAT sections
'
0530(001B) 06          | _BOUNDS     byte OVER/2,PLUS/2,SWAP/2,EXIT/2
0531(001B) 0C          |
0532(001B) 08          |
0533(001B) 00          |
0534(001C)             | PRTHEX      ' ( n -- ) print n (0..$0F) as a hex character
0534(001C) 2D          |             byte CLIT/2,$30,PLUS/2
0535(001C) 30          |
0536(001C) 0C          |
0537(001C) 05          |             byte DUP/2,CLIT/2,$39,GT/2,_IF/2,3
0538(001D) 2D          |
0539(001D) 39          |
053A(001D) 20          |
053B(001D) 3E          |
053C(001E) 03          |
053D(001E) 2D          |             byte CLIT/2,12,PLUS/2 'Adjust for A..F
053E(001E) 0C          |
053F(001E) 0C          |
0540(001F) 49          | PRTCH       byte EMIT/2,EXIT/2
0541(001F) 00          |
0542(001F)             | PRTBYTE
0542(001F) 05          |             byte DUP/2,CLIT/2,4,_SHR/2
0543(001F) 2D          |
0544(0020) 04          |
0545(0020) 1A          |
0546(0020) 3B          |             byte RCALL/2,20 '-->PRTHEX 'Due to limitations of Spin tool & BST this needs to be calculated by hand
0547(0020) 14          |
0548(0021) 3B          |             byte RCALL/2,22
0549(0021) 16          |
054A(0021) 00          |             byte EXIT/2
```
EDIT: Fixed byte code references, which are encoded as 8 bits using cogaddress/2
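Since the RCALL offsets in PRTBYTE have to be worked out by hand, a small helper can check them. The Python below is a sketch based on what the listing shows: the offset is counted backwards from the address just after the offset byte. `rcall_target` is a hypothetical name, and the way the 4 extra opcode bits combine with the offset byte (as the upper bits of a 12-bit distance) is an assumption, since the post only states the 4096-byte range.

```python
def rcall_target(operand_addr: int, offset: int, extra_bits: int = 0) -> int:
    """Resolve a backward relative call.

    operand_addr: hub address of the offset byte that follows the RCALL opcode
    offset:       the 8-bit offset byte
    extra_bits:   the 4 extra address bits carried in the opcode (assumed 0..15)
    """
    if not 0 <= offset <= 0xFF or not 0 <= extra_bits <= 0x0F:
        raise ValueError("offset is 8 bits and extra_bits is 4 bits")
    next_ip = operand_addr + 1                  # IP once the operand has been read
    distance = (extra_bits << 8) | offset       # up to 12 bits, i.e. 0..4095
    return next_ip - distance


# The two calls in PRTBYTE, with addresses and offsets taken from the listing.
first_call = rcall_target(0x0547, 20)
second_call = rcall_target(0x0549, 22)
```

Both calls resolve to $0534, the address of PRTHEX, which matches the `'-->PRTHEX` comment in the listing.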