
Dreamcast Inside Agartha

FamilyGuy

2049 Donator
Donator
Registered
Joined
May 31, 2019
Messages
315
Reaction score
310
Points
63
AG User Name
-=FamilyGuy=-
AG Join Date
March 3, 2007
@Sifting all this time I have been keeping an eye on this conversation, completely in awe of what you are accomplishing and discovering. Just leaving this here to say thank you for these awesome reports on both Agartha and Castlevania. Keep up the good work; I'm very excited about what else you will discover.

Also, +1 for Dark's suggestion: Geist Force is definitely a game that holds many secrets. For one, there are indeed multiple playable maps on the disc which by default are never played. Seeing how deep you go, I think having the untouched source materials is probably the best course of action for you. Geist Force unfortunately never got shared untouched, BUT..... luckily I kept the files around from when a select few of us were trying to figure out how to get the game running on a stock Dreamcast.

Basically this is the whole build with dummied data, but the most important files, ip.bin and 1ST_READ.BIN, can be found in these files fully intact. So with these + the build that is floating around you can create an "untouched" image: https://we.tl/t-FR8Z3R2rXh (download valid for 7 days)
I did the original GF selfboot.

IIRC (it's been a while, so I might misremember) we tried to keep everything as untouched as possible, but had to make some modifications so that it'd boot. So the original AG release + Wombat's files should get you pretty much the completely untouched original files.


And before people complain about the file order, I used the one from the GDR. I did everything over VNC with a shitty connection so I couldn't test anything other than "it boots on nullDC because I can see parts of a few frames once in a while". I was just as surprised as everyone when we realized it was seeking like hell.
 

Sifting

Registered
Registered
Joined
Aug 23, 2019
Messages
26
Reaction score
69
Points
13
Thank you everyone for the encouragement! I wish I had more to share at the moment, but my work weeks are absurd this month. I have made an observation in the meantime, though: there are, or maybe were, in fact multiple levels in this April build of Agartha. The pak file contains several scripts and definitions for a dozen of them. The scripts contain cutscene information, and judging by their contents, the levels on disc are all the ones seen in the 15 minute video.

However, the level geometry appears to be missing, and in its place are pairs of LST/HQR files. It's unclear what these files are used for. Opened in a hex editor they appear very dense, as if compressed, but if they are, the scheme is totally different from LZSS. I loaded the 1agartha.bin file into Ghidra, but aside from a string list I'm failing to get anything meaningful out of it. It would be nice to know if they're ever referenced in the code. Either way it's a bit odd. What's the story behind this build? Why would those files be omitted?
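One cheap way to test the "looks compressed" hunch is a Shannon-entropy estimate: LZSS-style compressed (or encrypted) data sits near 8 bits/byte, plain text around 4 to 5, and padded binaries lower still. A small sketch:

```python
from collections import Counter
from math import log2
import sys

def entropy (data):
    #Shannon entropy in bits per byte:
    #~8.0 for compressed/encrypted data, ~4-5 for text, lower for padded binaries
    counts = Counter (data)
    n = len (data)
    return -sum (c/n*log2 (c/n) for c in counts.values ())

if __name__ == '__main__' and len (sys.argv) > 1:
    with open (sys.argv[1], 'rb') as f:
        print (f'{sys.argv[1]}: {entropy (f.read ()):.2f} bits/byte')
```

Run it as `python3 entropy.py LEVEL01.HQR` (filename hypothetical); anything close to 8.00 is almost certainly compressed or packed.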

Here's the script for dumping the PAK for others interested in exploring on their own:

Python:
#!/usr/bin/env python3
from struct import unpack, calcsize
import os
import png

def verify (cond, msg):
    if not cond:
        raise Exception (msg)

def pvr_decode (data):
    #Some PVR constants
    HEADER_SIZE = 16
    CODEBOOK_SIZE = 2048
    MAX_WIDTH = 0x8000
    MAX_HEIGHT = 0x8000
   
    #Image must be one of these
    ARGB1555 = 0x0
    RGB565   = 0x1
    ARGB4444 = 0x2
    YUV422   = 0x3
    BUMP     = 0x4
    PAL_4BPP = 0x5
    PAL_8BPP = 0x6
   
    #And one of these
    SQUARE_TWIDDLED            = 0x1
    SQUARE_TWIDDLED_MIPMAP     = 0x2
    VQ                         = 0x3
    VQ_MIPMAP                  = 0x4
    CLUT_TWIDDLED_8BIT         = 0x5
    CLUT_TWIDDLED_4BIT         = 0x6
    DIRECT_TWIDDLED_8BIT       = 0x7
    DIRECT_TWIDDLED_4BIT       = 0x8
    RECTANGLE                  = 0x9
    RECTANGULAR_STRIDE         = 0xd
    SMALL_VQ                   = 0x10
    SMALL_VQ_MIPMAP            = 0x11
    SQUARE_TWIDDLED_MIPMAP_ALT = 0x12
   
    #For printing the above
    TYPES = [
        'ARGB1555',
        'RGB565',
        'ARGB4444',
        'YUV422',
        'BUMP',
        '4BPP',
        '8BPP'
    ]
    FMTS = [
        'UNK0',
        'SQUARE TWIDDLED',
        'SQUARE TWIDDLED MIPMAP',
        'VQ',
        'VQ MIPMAP',
        'CLUT TWIDDLED 8BIT',
        'CLUT TWIDDLED 4BIT',
        'DIRECT TWIDDLED 8BIT',
        'DIRECT TWIDDLED 4BIT',
        'RECTANGLE',
        'UNK1',
        'UNK2',
        'UNK3',
        'RECTANGULAR STRIDE',
        'SMALL VQ',
        'SMALL VQ MIPMAP',
        'SQUARE TWIDDLED MIPMAP ALT'
    ]
   
    #Ensure the texture is PVR encoded
    if data[:4].decode ('ASCII', 'ignore') != 'PVRT':
        return 'Not a PVR texture!', ''
   
    #Extract header
    total, px, fmt, unk, width, height = unpack ('<IBBHHH', data[4:HEADER_SIZE])
   
    data = data[:8 + total]

    #Print info and verify
    print (f'    Type: {TYPES[px]} {FMTS[fmt]}, Size: {width}x{height}')
    verify (width < MAX_WIDTH, f'width is {width}; must be < {MAX_WIDTH}')
    verify (height < MAX_HEIGHT, f'height is {height}; must be < {MAX_HEIGHT}')
   
    #This is my favourite black magic spell!
    #Interleaves x and y to produce a morton code
    #This trivialises decoding PVR images
    def morton (x, y):
        x = (x|(x<<8))&0x00ff00ff
        y = (y|(y<<8))&0x00ff00ff
        x = (x|(x<<4))&0x0f0f0f0f
        y = (y|(y<<4))&0x0f0f0f0f
        x = (x|(x<<2))&0x33333333
        y = (y|(y<<2))&0x33333333
        x = (x|(x<<1))&0x55555555  
        y = (y|(y<<1))&0x55555555
        return x|(y<<1)
   
    #Colour decoders...
    def unpack1555 (colour):
        a = int (255*((colour>>15)&1))
        r = int (255*((colour>>10)&31)/31.0)
        g = int (255*((colour>> 5)&31)/31.0)
        b = int (255*((colour    )&31)/31.0)
        return [r, g, b, a]
       
    def unpack4444 (colour):
        a = int (255*((colour>>12)&15)/15.0)
        r = int (255*((colour>> 8)&15)/15.0)
        g = int (255*((colour>> 4)&15)/15.0)
        b = int (255*((colour    )&15)/15.0)
        return [r, g, b, a]
   
    def unpack565 (colour):
        r = int (255*((colour>>11)&31)/31.0)
        g = int (255*((colour>> 5)&63)/63.0)
        b = int (255*((colour    )&31)/31.0)
        return [r, g, b]
   
    #Format decoders...
    #GOTCHA: PVR stores mipmaps from smallest to largest!
    def vq_decode (raw, decoder):
        pix = []
       
        #Extract the codebook
        tmp = raw[HEADER_SIZE:]
        book = unpack (f'<1024H', tmp[:CODEBOOK_SIZE])
       
        #Skip to the largest mipmap
        #NB: This also avoids another gotcha:
        #Between the codebook and the mipmap data is a padding byte
        #Since we only want the largest though, it doesn't affect us
        size = len (raw)
        base = width*height//4
        lut = raw[size - base : size]
       
        #The codebook is a 2x2 block of 16 bit pixels
        #This effectively halves the image dimensions
        #Each index of the data refers to a codebook entry
        for i in range (height//2):
            row0 = []
            row1 = []
            for j in range (width//2):
                entry = 4*lut[morton (i, j)]
                row0.extend (decoder (book[entry + 0]))
                row1.extend (decoder (book[entry + 1]))
                row0.extend (decoder (book[entry + 2]))
                row1.extend (decoder (book[entry + 3]))
            pix.append (row0)
            pix.append (row1)
        return pix
   
    def morton_decode (raw, decoder):
        pix = []
       
        #Skip to largest mipmap
        size = len (raw)
        base = width*height*2
        mip = raw[size - base : size]
       
        data = unpack (f'<{width*height}H', mip)
        for i in range (height):
            row = []
            for j in range (width):
                row.extend (decoder (data[morton (i, j)]))
            pix.append (row)
        return pix
   
    #From observation:
    #All textures 16 bit
    #All textures are either VQ'd or morton coded (twiddled)
    #So let's just save time and only implement those
    if ARGB1555 == px:
        if SQUARE_TWIDDLED == fmt or SQUARE_TWIDDLED_MIPMAP == fmt:
            return morton_decode (data, unpack1555), 'RGBA'
        elif VQ == fmt or VQ_MIPMAP == fmt:
            return vq_decode (data, unpack1555), 'RGBA'
    elif ARGB4444 == px:
        if SQUARE_TWIDDLED == fmt or SQUARE_TWIDDLED_MIPMAP == fmt:
            return morton_decode (data, unpack4444), 'RGBA'
        elif VQ == fmt or VQ_MIPMAP == fmt:
            return vq_decode (data, unpack4444), 'RGBA'
    elif RGB565 == px:
        if SQUARE_TWIDDLED == fmt or SQUARE_TWIDDLED_MIPMAP == fmt:
            return morton_decode (data, unpack565), 'RGB'
        elif VQ == fmt or VQ_MIPMAP == fmt:
            return vq_decode (data, unpack565), 'RGB'
   
    #Oh, well...
    return 'Unsupported encoding', ''
       
def uncompress (data, extra):
    SIZE = 4096
    MASK = SIZE - 1
   
    extra += 1
    ring = SIZE*[0]
    r = 0
   
    out = []
    pos = 0
   
    ctl = 0
    while pos < len (data):
        if 0 == (ctl&256):
            if pos >= len (data):
                break
            ctl = data[pos]
            ctl |= 0xff00
            pos += 1
       
        #If bit is set then next byte in payload is a literal,
        #else it is a 12 bit offset, 4 bit length pair into a 4k ring buffer
        if ctl&1:
            c = data[pos]
            out.append (c)
            ring[r&MASK] = c
            pos += 1
            r += 1
        else:
            if pos >= len (data):
                break
            b0 = data[pos + 0]
            b1 = data[pos + 1]
            word = (b1<<8)|b0
            base = (word>>4)&0xfff
            length = (word&0xf) + extra
            offset = r - (base + 1)
           
            for i in range (length):
                c = ring[(offset + i)&MASK]
                out.append (c)
                ring[r&MASK] = c
                r += 1
           
            pos += 2
        #Advance to the next bit
        ctl >>= 1
   
    return bytes (out)

def main ():
    PREFIX = 'AGARTHA'
    DEST = 'contents'
   
    #Load the manifest
    with open (PREFIX + '.LST', 'rb') as f:
        count = 0
        lines = f.readlines ()
       
        #Build a list of files
        files = []
        for ln in lines:
            #Remove blanks and comments
            ln = ln.decode ('latin').strip ().lower ().replace ('/', '\\')
            if '' == ln:
                continue
            if '#' == ln[:1]:
                continue
            #Insert the path into the list for later
            files.append (ln)
            count += 1
       
    #Figure out the common prefix
    pref = os.path.commonprefix (files).replace ('\\', '/')
    print (f'Common Prefix: "{pref}"')
   
    #Build a path list...
    paths = []
    for p in files:
        sanitised = os.path.normpath (p.replace ('\\', '/'))
        paths.append (sanitised.replace (pref, ''))
   
    #Extract files from archive...
    with open (PREFIX + '.PAK', 'rb') as f:
        offsets = unpack (f'<{count}I', f.read (calcsize (f'<{count}I')))
        for i in range (count):
            #Rebuild path on disk
            fn = os.path.basename (paths[i])
            dn = os.path.join (DEST, os.path.dirname (paths[i]))
            os.makedirs (dn, exist_ok = True)
           
            #Seek to find and read header
            f.seek (offsets[i])
            uncompressed, compressed, mode = unpack ('<IIH', f.read (calcsize ('<IIH')))
           
            #Read and uncompress contents as needed
            MODE_RAW = 0
            MODE_UNK = 1
            MODE_LZSS = 2
           
            rate = 100*compressed/uncompressed
            MODES = ['uncompressed', 'LZSS_ALT', 'LZSS']
            print (f'Uncompressing "{paths[i]}", ratio: {rate:.4}% ({MODES[mode]}) {uncompressed}')
            if MODE_RAW == mode:
                data = f.read (compressed)
            elif MODE_UNK == mode or MODE_LZSS == mode:
                data = uncompress (f.read (compressed), mode)
                verify (len (data) == uncompressed, f'"{paths[i]}" uncompressed to {len (data)} bytes, instead of {uncompressed}')
            else:
                raise Exception (f'Unknown compression mode {mode}')
               
            if '.pvr' in paths[i]:
                data = data[16:]
                   
                ret, mode = pvr_decode (data)
                verify (str != type (ret), f'image "{paths[i]}" failed to decode: {ret}!')
                png.from_array (ret, mode).save (os.path.join (dn, fn) + '.png')
                   
            else:  
                #Write it to disk and free data
                with open (os.path.join (dn, fn), 'wb') as out:
                    out.write (data)
           
            del data  
               
if __name__ == "__main__":
    main ()
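For anyone porting the decoder, the bitstream layout is easy to sanity-check with hand-built streams. Here's a condensed restatement of the same `uncompress` logic with two made-up inputs (the byte values are purely for illustration):

```python
#Condensed restatement of the uncompress() decoder above,
#handy for sanity checking the bitstream layout with hand-built inputs
def lzss (data, mode):
    MASK = 4095
    extra = mode + 1
    ring, r, out, pos, ctl = 4096*[0], 0, [], 0, 0
    while pos < len (data):
        #Refill the control word; the 0xff00 marker counts down 8 bits
        if 0 == (ctl&256):
            ctl = data[pos]|0xff00
            pos += 1
        if ctl&1:
            #Set bit: next payload byte is a literal
            c = data[pos]
            out.append (c)
            ring[r&MASK] = c
            pos += 1
            r += 1
        else:
            #Clear bit: 12 bit offset + 4 bit length pair into the 4k ring
            if pos + 1 >= len (data):
                break
            word = (data[pos + 1]<<8)|data[pos]
            offset = r - ((word>>4)&0xfff) - 1
            for i in range ((word&0xf) + extra):
                c = ring[(offset + i)&MASK]
                out.append (c)
                ring[r&MASK] = c
                r += 1
            pos += 2
        ctl >>= 1
    return bytes (out)

#A control byte of 0xff marks the next eight payload bytes as literals
print (lzss (b'\xff' + b'AGARTHA!', 2))             #b'AGARTHA!'
#Control 0x01: one literal, then a copy (offset 0, length nibble 0) from the ring
print (lzss (bytes ([0x01, 0x41, 0x00, 0x00]), 2))  #b'AAAA'
```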

I'm in the process of figuring out the o6d files right now, the game's model format. I hope to have them dumped to GLTF format the same as I have with the Castlevania ones.

Also, are there any good ways to mount CDI files under Linux? I've tried converting them using iat and both the Agartha builds fail to translate into something mountable. It would be useful to cross reference the two builds, but I just cannot access the files on the march build in any possible way.

And thank you for the Geist Force files, @Wombat! I'm unable to make promises, but if I do another dive like this then Geist Force is on the short list.
 

Sega Dreamcast Info

Registered
Registered
Joined
May 30, 2019
Messages
101
Reaction score
447
Points
63
The book in English:

Page 1

In the beginning, the massive earthquake that the Bible erroneously calls the Flood deeply transformed terrestrial geography and led the 13 tribes on the path of great migrations. Guided by those that the Pnakotic Manuscripts (Vatican library archives) have designated under the generic name of "the 9 unknown ones", those tribes populated the highlands that escaped the rising waters.
That's how various cyclopean cities were born. Ancient papyri preserved in the archeological museum of Cairo and bearing the seal of Meneptah (1200 years before our era) mention, for example, the forbidden city of the highlands of Leng, which seemingly vanished entirely following the cataclysm that shook the region of Pamir.

Page 2

It was during risky peregrinations that the Sennacherib clan, led by Hassuna and also called the children of the first world, discovered the gigantic schist grottoes that formed organically not far from the upper spring of the Brahmaputra (Himalayan chain), at 6700 metres above the current sea level. The Sennacherib decided to settle and take root there. One can find multiple allusions to this fact in the Mahabharata (Third Book).
That's probably how an underground passage was found, linked to a complex network of passageways leading to the Earth's core, a legendary location in multiple mythologies that refer to what we know today as the Hollow Earth.

Page 3

Following the guidance of the "Enlightened One", the clan decided after the ritual sacrifices to build a secret city that could shelter a part of the human species should a cataclysmic event happen. No one knows exactly what techniques were used to accomplish this "Grand Design", or Agartha in Tibetan, which also signifies:
The subterranean kingdom at the center of the Earth where the King of the World reigns.

The building of the city was spread over numerous generations, and it seems, based on the few documents that have survived, that Agartha was never completely built. Local folklore (such as the chantings of Lapcha) attributes to the "inevitable devastation" this frenzy that led many of the builders of this subterranean metropolis to enslavement and death. Although it seems that events far more dramatic than those prophecies preceded this titanic work.

Page 4

Agartha was the home of powerful malevolent forces, fallen and banished from the divine kingdom. Under the divine curse prohibiting them from appearing in the open, those infernal forces were buried in the deepest part of the earth.
Planning their return to the surface, they started to decimate the children of the first world using an unknown disease. By doing this they made Agartha an antechamber to hell in which they reign supreme.
Leading the demoniac hordes is the one they call The Sentinel (Y'aga Heer'GHta in the Sennacherib dialect). If the writings are to be trusted, the city of Agartha communicates with and opens towards other heavens (please refer to the works of Julius Horbiger, Basel faculty 1724, and especially corpus 46, entitled "On the Hollow Earth").

Page 5

Indeed, it seems that the children of the first world had planned to return to the surface after the "inevitable devastation" that they feared. Survivors of the epidemic were able to transmit some of their knowledge to other people. That's why Gozer's tablets and the "Dead Sea Scrolls" allude to the existence of a "relay that leads souls from the surface to the depths below".
Once a century, when Scorpio enters the Virgo square, a chosen woman is born. A direct descendant of the Sennacherib family, she represents the ultimate salvation or the ultimate threat, one who can either save or doom the land. Indeed, with the chosen one's sacrifice, carried out under certain conditions, the Sentinel can either rise in broad daylight or be cast back to its lair.
Nowadays, the only custodians of this knowledge and undertakers of this task have gathered in a secret society called the "Order of the Sennacherib". Without being descendants of the clans, their mission is to fight Evil. Their sworn enemies, hell-bent on awakening the Sentinel, are called "the Conspirators of Twilight".
 

FamilyGuy

Also, are there any good ways to mount CDI files under Linux? I've tried converting them using iat and both the Agartha builds fail to translate into something mountable. It would be useful to cross reference the two builds, but I just cannot access the files on the march build in any possible way.
You can open CDI files with GDITools, but it's not really officially supported, as it's mostly untested.

See https://sourceforge.net/p/dcisotools/code/ci/master/tree/addons/dumpcdi11702.py

This script was developed to extract a CDI dump of Half-Life whose second session is at LBA 11702; you might have to adapt it for other CDI dumps, which is why it's not an official feature.
 

FamilyGuy

I did some testing, and it seems like those CDI files might be invalid somehow. Maybe only the big-endian TOC is valid and the little-endian one is messed up? Maybe as a primitive form of copy protection, if those CDIs originated from No Cliché internally? This is confusing, to be honest.

Normally, this script would allow extracting "Agartha-Internal_March2011.cdi":


Python:
import sys, os
from gditools import ISO9660

cdi_offset = 0x158898 # Offset of IP.BIN (SEGA SEGAKATANA ...) in the CDI file
cdi_lba = 0x2DB4      # LBA of second session, found at the end of the CDI file, 36 bytes before the last filename

def main(filename):
    a = [dict(
        filename=os.path.abspath(filename),
        mode=2336,
        offset=cdi_lba*2048 - (cdi_offset-8)//2336*2048,
        wormhole=[0, cdi_lba*2048, 32*2048],
        manualRawOffset=(cdi_offset-8)%2336,
        lba=45000,  # Required to trick ISO9660 class, isn't actually used.
        tnum=0      # Idem
    ),] # List to trick ISO9660 class, need to fix for clean CDI support.
 
    b = ISO9660(a, verbose=True)
 
    for i in b._sorted_records('EX_LOC'):
        if not i['name'] in ['/0.0', '/DUMMY.DAT']:
            b.dump_file_by_record(i, './data')
            #b.dump_file_by_record(i, './'+b.get_pvd()['volume_identifier'])
 
    b.dump_sorttxt()
    b.dump_bootsector(lba=cdi_lba)


if __name__ == '__main__':
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print('dump_AgarthaMarch2011.py - Based on gditools\n\nError: Bad syntax\n\nUsage: dump_AgarthaMarch2011.py image.cdi')

But it crashes with a "KeyError: 'path_table_size'" error, which indicates that the primary volume descriptor (PVD) of the ISO9660 filesystem is likely invalid.

However, when looking at the "AppendedFiles" object in memory, without trying to parse it as an ISO9660 filesystem, it seems like I did not make an error.

One can dump an ISO that represents that "AppendedFiles" object like so:
Python:
import os
from gditools import AppendedFiles, _copy_buffered
filename = 'Agartha-Internal_March2011.cdi'
cdi_offset = 0x158898
cdi_lba = 0x2DB4
a = dict(
        filename=os.path.abspath(filename),
        mode=2336,
        offset=cdi_lba*2048 - (cdi_offset-8)//2336*2048,
        wormhole=[0, cdi_lba*2048, 32*2048],
        manualRawOffset=(cdi_offset-8)%2336,
        lba=45000,  # Required to trick ISO9660 class, isn't actually used.
        tnum=0      # Idem
    )
b = AppendedFiles(a, None)
with open('Agartha-Internal_March2011_fixed.iso', 'wb') as of:
    _copy_buffered(b, of)
This is essentially equivalent to running "isofix.exe" on the 3rd track of a CDI/GDI file.

That ISO looks like it starts right, with the IP.BIN at the start ("SEGA SEGAKATANA"...) and at the 2048*cdi_lba offset, but neither Archive Manager nor "Disk Image Mounter" on Linux can make sense of it (Debian Bullseye). This is consistent with an invalid PVD.

For reference, one would have
Python:
cdi_offset = 0x2F2AAD8
cdi_lba = 0x7D80
for the E3 demo CDI.

Isobuster can't make sense of those ISO files either, so there's something fishy. Maybe I'm doing something wrong; I'm very out of practice when it comes to CDI files.

Maybe it's not in 2336 format?



[EDIT]


@Sifting Got it for the March CDI (FOR BOTH, in fact)! Turns out it's in 2352 mode!
I removed the part of the code that skipped dummy files, as you might want to check those out!

Python:
import sys, os
from gditools import ISO9660

cdi_offset = 0x158888
cdi_lba = 0x2DB4
blocsize = 2352

def main(filename):
    a = [dict(
        filename=os.path.abspath(filename),
        mode=blocsize,
        offset=cdi_lba*2048 - cdi_offset//blocsize*2048,
        wormhole=[0, cdi_lba*2048, 32*2048],
        manualRawOffset=cdi_offset%blocsize,
        lba=45000,  # Required to trick ISO9660 class, isn't actually used.
        tnum=0      # Idem
    ),] # List to trick ISO9660 class, need to fix for clean CDI support.
 
    b = ISO9660(a, verbose=True)
 
    for i in b._sorted_records('EX_LOC'):
        b.dump_file_by_record(i, './data')
 
    b.dump_sorttxt()
    b.dump_bootsector(lba=cdi_lba)


if __name__ == '__main__':
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print('dump_Agartha_March2011.py - Based on gditools\n\nError: Bad syntax\n\nUsage: dump_Agartha_March2011.py image.cdi')


[EDIT #2]

And here's the version for the E3 demo.

Python:
import sys, os
from gditools import ISO9660

cdi_offset = 0x2F2AAC8
cdi_lba = 0x7D80
blocsize = 2352

def main(filename):
    a = [dict(
        filename=os.path.abspath(filename),
        mode=blocsize,
        offset=cdi_lba*2048 - cdi_offset//blocsize*2048,
        wormhole=[0, cdi_lba*2048, 32*2048],
        manualRawOffset=cdi_offset%blocsize,
        lba=45000,  # Required to trick ISO9660 class, isn't actually used.
        tnum=0      # Idem
    ),] # List to trick ISO9660 class, need to fix for clean CDI support.
  
    b = ISO9660(a, verbose=True)
  
    for i in b._sorted_records('EX_LOC'):
        b.dump_file_by_record(i, './data')
  
    b.dump_sorttxt()
    b.dump_bootsector(lba=cdi_lba)


if __name__ == '__main__':
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print('dump_Agartha_E3.py - Based on gditools\n\nError: Bad syntax\n\nUsage: dump_Agartha_E3.py image.cdi')
 

Sifting

@FamilyGuy Awesome! The E3 version now works, but I'm a bit confused about the other one. I have agartha-internal_april_2001-corrupted.cdi and Agartha-E3Demo.cdi. Neither script works with the April one. I used `dumpcdi11702` to access the April files originally, but I'm wondering if it's doing the right thing? Is there a March internal build? I only have the two builds from https://en.sega-dreamcast-info-games-preservation.com/agartha-sega-dreamcast-download

For reference:

Using dump_Agartha_March2011.py:

Code:
$ python2 dump_Agartha_March2011.py ~/games/roms/dc/agartha-internal_april_2001-corrupted.cdi
Traceback (most recent call last):
  File "dump_Agartha_March2011.py", line 33, in <module>
    main(sys.argv[1])
  File "dump_Agartha_March2011.py", line 22, in main
    b = ISO9660(a,    verbose=True)
  File "/home/winter/.local/bin/gditools.py", line 65, in __init__
    _ISO9660_orig.__init__(self, 'url') # So url doesn't starts with http
  File "/home/winter/.local/bin/iso9660.py", line 34, in __init__
    self._unpack_pvd()
  File "/home/winter/.local/bin/iso9660.py", line 171, in _unpack_pvd
    self._pvd['volume_space_size']             = self._unpack_both('i')
  File "/home/winter/.local/bin/iso9660.py", line 272, in _unpack_both
    assert a == b
AssertionError
Code:
python2 dump_Agartha_March2011.py ~/games/roms/dc/Agartha-E3Demo.cdi
Traceback (most recent call last):
  File "dump_Agartha_March2011.py", line 33, in <module>
    main(sys.argv[1])
  File "dump_Agartha_March2011.py", line 22, in main
    b = ISO9660(a,    verbose=True)
  File "/home/winter/.local/bin/gditools.py", line 65, in __init__
    _ISO9660_orig.__init__(self, 'url') # So url doesn't starts with http
  File "/home/winter/.local/bin/iso9660.py", line 41, in __init__
    l0 = self._pvd['path_table_size']
KeyError: 'path_table_size'

Using dump_Agartha_E3.py:
Code:
python2 dump_Agartha_E3.py ~/games/roms/dc/agartha-internal_april_2001-corrupted.cdi
Traceback (most recent call last):
  File "dump_Agartha_E3.py", line 33, in <module>
    main(sys.argv[1])
  File "dump_Agartha_E3.py", line 22, in main
    b = ISO9660(a,    verbose=True)
  File "/home/winter/.local/bin/gditools.py", line 65, in __init__
    _ISO9660_orig.__init__(self, 'url') # So url doesn't starts with http
  File "/home/winter/.local/bin/iso9660.py", line 34, in __init__
    self._unpack_pvd()
  File "/home/winter/.local/bin/iso9660.py", line 171, in _unpack_pvd
    self._pvd['volume_space_size']             = self._unpack_both('i')
  File "/home/winter/.local/bin/iso9660.py", line 272, in _unpack_both
    assert a == b
AssertionError

I'm guessing it's an oversight on my part, but I'm unsure of how to handle this.

EDIT:

It also seems like the dump_Agartha_E3.py script dumps only about 15 MiB of the E3 demo, which itself is about 65 MiB. It seems like there should be way more?
 

FamilyGuy

I might have another build, but I'm likely not at liberty to share as I don't own it really. I'll check it out later this weekend.


[EDIT]
It also seems like the dump_Agartha_E3.py script dumps only about 15 MiB of the E3 demo, which itself is about 65 MiB. It seems like there should be way more?
I think it's the CDDA tracks; that's why the cdi_offset is so large for this one. You can use CDIRip to extract the tracks from the CDI. The data tracks need isofixing, and sometimes the data track doesn't extract properly, but it should at least give you the audio tracks.

See: https://sourceforge.net/projects/cdimagetools/files/CDIRip/0.6.3/

I'm downloading the two releases you got. But basically the key is in the two "cdi" variables at the beginning. cdi_offset is the beginning of the sector with the ip.bin (including the header for 2336 or 2352 mode, so -0x08 or -0x10 depending). The other one is the LBA of the second session, which can be found at the end of the CDI file. It's 2 bytes long (or 4, with the last two being 0x00), and it sits 36 bytes before the last of the three times that the CDI filename is written.

Anyways I should be able to send you scripts to extract it properly later tonight.
 

FamilyGuy

Alright, so the "March 2011" (typo? I'm almost sure I didn't have that in 2011) dump I got seems to be identical to the "April 2001" one, but in a different CDI format (2352 bytes/sector and LBA 11700 instead of 2336 bytes/sector and LBA 11702): it has the exact same ISO9660 timestamps, number of files, and total filesize.

It's probably just an earlier rip from Laurent. IIRC he dumped it multiple times because he was afraid there was some corruption. I might've actually done both selfboots; it's my habit to keep the original ISO9660 timestamps for betas/unreleased games. Or it might be from someone else; it's been so long that I honestly don't remember.

Anyways, I've cleaned up the script a little, so here it is:

cdi_dump_isofix.py
Python:
import sys, os
from gditools import ISO9660, _copy_buffered, AppendedFiles


cdi_offset = 0x2F2AAD8  # Offset of ip.bin userdata (typically the offset of "SEGA SEGAKATANA")
cdi_lba = 0x7D80        # 2nd session LBA. Found in cdi footer, 36 bytes before the 3rd filename entry.
blocksize = 2352        # 2048, 2336, 2352 are supported. Idk where to get the info reliably yet.

header_size = {2048: 0x00, 2336: 0x08, 2352: 0x10}
cdi_offset -= header_size[blocksize]


def main(filename):
    a = [dict(
        filename=os.path.abspath(filename),
        mode=blocksize,
        offset=cdi_lba*2048 - cdi_offset//blocksize*2048,
        wormhole=[0, cdi_lba*2048, 32*2048],
        manualRawOffset=cdi_offset%blocksize,
        lba=45000,  # Required to trick ISO9660 class, isn't actually used.
        tnum=0      # Idem
    ),] # List to trick ISO9660 class, need to fix for clean CDI support.
 
    b = ISO9660(a, verbose=True)
 
    for i in b._sorted_records('EX_LOC'):
        b.dump_file_by_record(i, './data')
 
    b.dump_sorttxt()
    b.dump_bootsector(lba=cdi_lba)
 
    c = AppendedFiles(a[0], None)

    print("Now dumping an isofixed version of the filesystem to disk ...")
    with open(os.path.splitext(filename)[0]+'.iso', 'wb') as of:
        _copy_buffered(c, of)


if __name__ == '__main__':
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print('cdi_dump_isofix.py - Based on gditools\n\nError: Bad syntax\n\nUsage: cdi_dump_isofix.py image.cdi')

It will extract the data, the sorttxt.txt, the ip.bin, and it will also generate an "isofixed" iso, that is to say a valid .iso file that's mountable in Linux or with most tools that handle ISO files.

See the top of the script for the 3 dump-specific values required to dump a cdi file properly.
The comments there should be relatively clear on how to find the proper values in any CDI.

The blocksize can be guessed by looking at the first few bytes before "SEGA SEGAKATANA" to see if they match 2352 or 2336 format, or you can just try both (I've never seen 2048 format in a CDI).
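That guess can be automated: in 2352-byte raw sectors the user data is preceded by a 16-byte header that begins with the standard 12-byte CD sync pattern (00 FF x10 00), while 2336-byte Mode 2 sectors carry only an 8-byte subheader. A sketch, assuming `magic_offset` is the offset of "SEGA SEGAKATANA" in the image:

```python
#Standard 12 byte CD sync pattern that opens every 2352 byte raw sector
SYNC = b'\x00' + b'\xff'*10 + b'\x00'

def guess_blocksize(data, magic_offset):
    #2352 byte sectors: 12 byte sync + 4 byte address/mode before the user data
    if data[magic_offset - 16:magic_offset - 4] == SYNC:
        return 2352
    #Otherwise assume the 8 byte subheader variant (2048 has no header at all)
    return 2336
```

This only distinguishes 2352 from 2336; telling a headerless 2048 track apart would need another heuristic.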

The values in the script above are for the E3 demo; for the April build you would have:
Python:
cdi_offset = 0x159188   # Offset of ip.bin userdata (typically the offset of "SEGA SEGAKATANA")
cdi_lba = 0x2DB6        # 2nd session LBA. Found in cdi footer, 36 bytes before the last filename
blocksize = 2336        # 2048, 2336, 2352 are supported. Idk where to get the info reliably yet.

If you have an idea on how to automate the extraction of those 3 values simply, I'd be happy to add full CDI support to GDITools.
@SiZiOUS, care to share your CDI knowledge for that my friend?


I should really do some heavy refactoring on GDITools, convert it to Python 3, and turn it into a proper package with a sane file structure.
I was learning Python when I wrote it, and some parts of the code are very cryptic or convoluted.
But I don't have enough free time on my hands right now, and it works well enough as it is, so don't hold your breath :p


Cheers!
 

Sifting

Wow, thank you! This thread is turning proper productive!

So, since @FamilyGuy has gotten me set up, I was able to explore the contents of the playable build, and two things struck me immediately. First, the playable build is much more primitive! The files are uncompressed and sitting in the image root, the audio is all Red Book, and it just seems much earlier. It's quite interesting given they're only a month apart.

The second, and more exciting, thing I noticed is that there are no .cse files in either build. However, both versions of the game have the previously mentioned .lst/.hqr files, and furthermore, in the playable version these are not jumbled. The .lst is plain text, and the .hqr file appears to be an archive. I've not found any manifest for the hqr files, but they are uncompressed. Viewing them in a hex editor, you can see script files and PVR textures blobbed together inside. This suggests that the level data is kept inside the hqr files, and if that's the case, provided we can figure out what's going on with the April version, we might just be able to recover it. But this is all conjecture still; a lot remains to be examined.
 

Sega Dreamcast Info

I'm going to let you in on one of my biggest secrets: I've never owned a prototype of Agartha. Someone sent me the CDIs of the 2 builds. They were on a CD-R.

I have never done a dump of this game.

I released the working Agartha in January 2017. The other one was released afterwards, maybe late 2017.
 