[This is a second article in the 'embedded file system' series. If you missed the first part, you can read it here.]
GpStructuredStorage compound file is organized in 1 KB blocks. First block contains header, then the content alternates between a file allocation table (FAT) fragment block and 256 blocks managed by the preceeding FAT fragment. Each block can be represented by a number - header block is block #0, first fat fragment is block #1 and so on.
[header:HEADER:1024]
[fat entry:FATENTRY:1024]
256 x [block:FOLDER/FILE:1024]
[fat entry:FATENTRY:1024]
256 x [block:FOLDER/FILE:1024]
...
[fat entry:FATENTRY:1024]
<=256 x [block:FOLDER/FILE:1024]
Header starts with a signature, which must always be 'GpStructuredStorage file'#13#10#26#0.
HEADERHeader ends in:
[signature:32] // PChar
[unused:964]
[storage attribute file:4]
[storage attribute file size:4]
[first FAT block:4]
[first unused block:4]
[root folder:4]
[root folder size:4]
[version:4] // storage system version
- block number and size of the internal file containing global storage attributes
- block number of the first FAT block (always 1)
- block number of the first unused file block
- block number and size of the root folder
- storage system version (at the moment $01000200, or 1.0.2.0)
Each FAT fragment contains 256 32-bit numbers, one for each of the 256 following file blocks. This number is either 0 if block is last in the FAT chain, or it is a number of the next file block in the chain. Each block is linked into exactly one chain. It can either belong to a file/folder or to a empty block list. Address of the first block in the empty list is stored in the header ([first unused block] entry).
FATENTRYFAT structure defines the limitations of my file format - last block must have number that is less than 4294967294. As block are 1 KB in size, total file size cannot exceed 4294967294 * 1 KB, which is slightly less than 4 TB. Enough for all practical purposes, I think.
256*[next block:4] // next-pointers for this block; 0 = unused
Folders are simple - each folder is just a file. It contains file information records and is terminated with two 0 bytes.
FOLDER //potentially split over several blocksFile information record is a variable length record containing file name (up to 64 KB), attributes, length, and address of the first file block (additional blocks are reached by following the FAT chain).
[FILE_INFO]
[FILE_INFO]
...
[FILE_INFO]
[0:2]
FILE_INFOAt the moment, only two attributes are defined. One specifies that file is actually a subfolder, and another designates a special file containing file attributes (for discussion of attributes see the previous article).
[file name length:2]
[file name:1..65535]
[file attributes:ATTRIBUTES:4]
[file length:4] // 4 GB per file
[first file block:4]
ATTRIBUTESThat's just about everything that is to tell about the compound file format. Armed with this knowledge, one can easily write a compound file browser/repair utility.
$0001 = attrIsFolder
$0002 = attrIsAttributeFile
Brilliant. I found this in GExperts code and decided to use it in a NewsReader program i'm writing. These posts will surely come in handy.
ReplyDeleteHvala, pozdrav :)
Thanks for the thumbs up!
ReplyDelete(In hvala ;) )
good work, but i find a bug.
ReplyDeletetry to increase file size in test:
function TForm1.TestGSSBigAndSmall: boolean;
var
storage: IGpStructuredStorage;
begin
Result := false;
Log('Testing big and small files');
try
storage := CreateStructuredStorage;
storage.Initialize(CStorageFile, fmCreate);
// 0 bytes
TestFile(storage, '/small.dat', true, -1);
TestFile(storage, '/small2.dat', true, -1);
TestFile(storage, '/small2.dat', true);
// cross the 257-block boundary
TestFile(storage, '/large.dat', true, 64);
storage := CreateStructuredStorage;
storage.Initialize(CStorageFile, fmOpenRead);
TestFile(storage, '/small.dat', false, -1);
TestFile(storage, '/small2.dat', false);
TestFile(storage, '/large.dat', false, 64);
Result := true;
except
on E: Exception do
Log(' '+E.Message);
end;
end; { TForm1.TestGSSBigAndSmall }
Self test fail!
I use delphi 2007 Version 11.0.2804.9245
on XP
I can confirm the problem. I'll fix it as soon as I have a little spare time.
ReplyDeleteThanks for reporting this!
It turned out to be only a bug in the test suite.
ReplyDeleteWhen you passed a value larger than 63 as a fourth parameter to the TestFile method, TestFile made some incorrect assumptions and compared value that was larger than $FFFF to another value that was truncated to two bytes. As a result of that, error was raised (incorrectly).
I have updated test suite at http://gp.17slon.com/gp/files/testgpstructuredstorage_src.zip.
Hi;
ReplyDeleteDoes it possible to store an executable file (.exe) in this storage and
execute from there?
Store, yes. Execute, no.
ReplyDeleteHow can I estimate the size of a new GpStructuredFile BEFORE adding files to it?
ReplyDeleteLet's see ...
ReplyDeleteHeader takes 1 KB. Always.
There is at least one FAT entry (1 KB) and it can address 256 x 1 KB blocks.
There is at least the root folder (1 KB).
A back of the envelope calculation:
- Take your file sizes and round them up to the nearest 1 KB multiple.
- Multiply by 1.004 (FAT blocks).
- Add 1 KB for each 25 files (folder blocks).
- Add 3 KB (header, root FAT and root folder).
Are there any limitations on :
ReplyDelete- number of files in root?
- number of folders in root?
- number of files in subfolders?
And, why did you mention "1 KB for each 25 files"?
Thanks.
a) All directories are infinitely extendable, even root.
ReplyDeleteb) A folder is just a file so a) applies.
c) See a).
1 KB per 25 files was just an approximation. A file entry in the folder uses 14 bytes + length of the file name. If your files have 13-character names (all names are Unicode), 25 files would use 25 * (14 + 13*2) = 1000 bytes.