| Welcome, Guest |
You have to register before you can post on our site.
|
| Forum Statistics |
» Members: 273
» Latest member: mindlin
» Forum threads: 1,085
» Forum posts: 6,486
Full Statistics
|
| Online Users |
There are currently 54 online users. » 0 Member(s) | 53 Guest(s) Bing
|
| Latest Threads |
location of heap manageme...
Forum: Help & Support
Last Post: boriel
2026-03-07, 12:13 AM
» Replies: 1
» Views: 216
|
non-paged supervisor code...
Forum: Help & Support
Last Post: sdo303
2026-02-20, 06:38 PM
» Replies: 8
» Views: 692
|
How to open fuse as an ex...
Forum: How-To & Tutorials
Last Post: Duefectu
2026-02-09, 01:52 PM
» Replies: 3
» Views: 1,067
|
Old zxbasic game errors
Forum: Help & Support
Last Post: boriel
2025-11-09, 11:52 AM
» Replies: 7
» Views: 1,678
|
Error: Undefined GLOBAL l...
Forum: Help & Support
Last Post: ardentcrest
2025-11-04, 05:46 PM
» Replies: 3
» Views: 812
|
A Fast(er) Plot Routine f...
Forum: How-To & Tutorials
Last Post: tubz74
2025-10-30, 03:16 PM
» Replies: 2
» Views: 935
|
Hall of Fame - Include fo...
Forum: How-To & Tutorials
Last Post: tubz74
2025-10-28, 03:48 PM
» Replies: 0
» Views: 475
|
[SOLVED] Array layout bug...
Forum: Bug Reports
Last Post: Zoran
2025-10-25, 05:48 PM
» Replies: 2
» Views: 945
|
3DOS Commands?
Forum: Help & Support
Last Post: boriel
2025-10-06, 02:54 PM
» Replies: 3
» Views: 1,109
|
CLS/Fade out ASM Sub-rout...
Forum: How-To & Tutorials
Last Post: tubz74
2025-10-05, 03:39 PM
» Replies: 2
» Views: 839
|
|
|
| Digraph Compression |
|
Posted by: britlion - 2016-02-15, 10:20 PM - Forum: ZX Basic Compiler
- Replies (7)
|
 |
After reading a note by Cheveron about digraph compression ( http://foro.speccy.org/viewtopic.php?f=6...aph#p33001) I thought I'd run up the English language version to see what it looked like.
If we take http://www.math.cornell.edu/~mec/2003-20...raphs.html as a start, and work out what the value of first letter and second letter are, I get this table in the end:
Code: 1st Frq 2nd Frq
e 3.24 n 2.88
t 2.65 e 2.52
h 2.3 t 2.39
i 1.9 h 1.52
a 1.87 r 1.43
o 1.66 s 1.35
n 1.37 d 1.18
s 0.74 a 1.13
r 0.72 i 0.85
l 0.1 o 0.52
d 0.09 u 0.5
v 0.04 g 0.18
u 0.02 f 0.16
l 0.09
Thinking about this, you could store these as a nybble each:
Code: Val bin Lft Rght
00 0000 e n
01 0001 t e
02 0010 h t
03 0011 i h
04 0100 a r
05 0101 o s
06 0110 n d
07 0111 s a
08 1000 r i
09 1001 l o
10 1010 d u
11 1011 v g
12 1100 u f
13 1101 space l
14 1110 Not Digraph space
15 1111 Not Digraph
And if the first three bits are 1 (nybble is > 13), you could use the last 5 bits for the letters A-Z comfortably. There's even room for a few other symbols in there. You might have to use one of them to express some non compression symbol and use a separate byte for capitals (or non capitals), and other symbols. Though clearly 26 letters + space is the default for the first 27 numbers of the 5 bits. In a real world usage, you'd probably find the four most common non lowercase characters, and put those in, and then use 255 to mean "use the full next byte as the character code". This does mean that some symbols go double byte, but that's compression for you - you can always find a case that's bigger than the original.
Not done anything else with it yet, but I thought the thought experiment might be useful if you are going to allocate bits to such a compression scheme.
|
|
|
|