@c This is part of the paxutils manual.
@c Copyright (C) 2006 Free Software Foundation, Inc.
@c This file is distributed under GFDL 1.1 or any later version
@c published by the Free Software Foundation.

@cindex sparse formats
@cindex sparse versions
The notion of a sparse file, and the ways of handling it from the point
of view of a @GNUTAR{} user, have been described in detail in
@ref{sparse}. This chapter describes the internal format @GNUTAR{}
uses to store such files.

The support for sparse files in @GNUTAR{} has a long history. The
earliest version featuring this support that I was able to find was 1.09,
released in November, 1990. The format introduced back then is called
the @dfn{old GNU} sparse format and, in spite of the fact that its design
contained many flaws, it was the only format @GNUTAR{} supported
until version 1.14 (May, 2004), which introduced initial support for
sparse files in @acronym{PAX} archives (@pxref{posix}). This
format was not free from design flaws, either, and it was subsequently
improved in versions 1.15.2 (November, 2005) and 1.15.92 (June,
2006).

In addition to the GNU sparse formats, @GNUTAR{} is able to read and
extract sparse files archived by @command{star}.

The following subsections describe each format in detail.

@menu
* Old GNU Format::
* PAX 0:: PAX Format, Versions 0.0 and 0.1
* PAX 1:: PAX Format, Version 1.0
@end menu

@node Old GNU Format
@appendixsubsec Old GNU Format

@cindex sparse formats, Old GNU
@cindex Old GNU sparse format
This format was introduced some time around 1990 (v. 1.09). It was
designed on top of standard @code{ustar} headers in such an
unfortunate way that some of its fields overwrote fields required by
POSIX: everything from offset 345 onward occupies the space of the
@code{ustar} @code{prefix} field.

An old GNU sparse header is designated by type @samp{S}
(@code{GNUTYPE_SPARSE}) and has the following layout (offsets and
sizes are in bytes):

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 345 @tab @tab N/A @tab Standard header fields, as in
@code{ustar}; not used by the sparse extension.
@item 345 @tab 12 @tab atime @tab Number @tab @code{atime} of the file.
@item 357 @tab 12 @tab ctime @tab Number @tab @code{ctime} of the file.
@item 369 @tab 12 @tab offset @tab Number @tab For
multivolume archives: the offset of the start of this volume.
@item 381 @tab 4 @tab @tab N/A @tab Not used.
@item 385 @tab 1 @tab @tab N/A @tab Not used.
@item 386 @tab 96 @tab sp @tab @code{sparse_header} @tab (4 entries) File map.
@item 482 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, @code{0} otherwise.
@item 483 @tab 12 @tab realsize @tab Number @tab Real size of the file.
@end multitable

Each @code{sparse_header} object at offset 386 describes a single
data chunk. It has the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.60
@headitem Offset @tab Size @tab Data type @tab Contents
@item 0 @tab 12 @tab Number @tab Offset of the
beginning of the chunk.
@item 12 @tab 12 @tab Number @tab Size of the chunk.
@end multitable

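The two tables above can be mirrored by C structures, for example to check the offsets at compile time. The following is a minimal sketch, not code taken from @GNUTAR{}: the structure and member names (@code{sparse_entry}, @code{oldgnu_sparse_header}, @code{std_fields} and so on) are hypothetical, and it assumes a common ABI on which structures made only of @code{char} members receive no padding.

```c
#include <stddef.h>

/* One map entry (called sparse_header in the tables above): 24 bytes. */
struct sparse_entry {
    char offset[12];      /*  0: offset of the beginning of the chunk */
    char numbytes[12];    /* 12: size of the chunk */
};

/* Old GNU sparse header (type 'S').  Hypothetical names; the layout
   follows the table above.  Only char members, so no padding is
   expected on common ABIs. */
struct oldgnu_sparse_header {
    char std_fields[345];         /*   0: standard ustar header fields */
    char atime[12];               /* 345 */
    char ctime[12];               /* 357 */
    char offset[12];              /* 369: multivolume offset */
    char unused1[4];              /* 381 */
    char unused2;                 /* 385 */
    struct sparse_entry sp[4];    /* 386: file map, 4 entries */
    char isextended;              /* 482 */
    char realsize[12];            /* 483: real size of the file */
};                                /* 495: rest of the 512-byte block */

/* Compile-time checks that the structures match the tables. */
_Static_assert(sizeof(struct sparse_entry) == 24, "entry size");
_Static_assert(offsetof(struct oldgnu_sparse_header, sp) == 386, "sp");
_Static_assert(offsetof(struct oldgnu_sparse_header, isextended) == 482,
               "isextended");
_Static_assert(offsetof(struct oldgnu_sparse_header, realsize) == 483,
               "realsize");
_Static_assert(sizeof(struct oldgnu_sparse_header) == 495, "header size");
```

The @code{_Static_assert} lines turn any mismatch between the structure and the table into a compile-time error instead of a silent misreading of archives.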
If the member contains more than four chunks, the @code{isextended}
field of the header has the value @code{1} and the main header is
followed by one or more @dfn{extension headers}. Each such header has
the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 504 @tab sp @tab @code{sparse_header} @tab
(21 entries) File map.
@item 504 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, or @code{0} otherwise.
@end multitable

A header with @code{isextended=0} ends the map.

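Putting the three tables together, a reader collects the map from the main header and then from each extension header until it meets one whose @code{isextended} byte is zero. The sketch below works on raw 512-byte blocks already in memory. It assumes that Number fields hold ASCII octal digits ended by NUL or space, as ordinary @code{ustar} numeric fields do (the base-256 form used for very large values is not handled), and that an unused map entry has an empty size field. The names @code{decode_octal}, @code{read_oldgnu_map} and @code{struct chunk} are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKSIZE 512

struct chunk { uint64_t offset, size; };

/* Decode a LEN-byte Number field.  Assumption: ASCII octal digits,
   optionally preceded by spaces and ended by NUL or space; the
   base-256 encoding for very large values is not handled. */
uint64_t decode_octal(const unsigned char *p, size_t len)
{
    uint64_t v = 0;
    size_t i = 0;
    while (i < len && p[i] == ' ')
        i++;
    for (; i < len && p[i] >= '0' && p[i] <= '7'; i++)
        v = v * 8 + (uint64_t)(p[i] - '0');
    return v;
}

/* Collect the map of an old GNU sparse member.  BLOCKS holds NBLOCKS
   512-byte blocks: the type 'S' header followed by its extension
   headers.  At most MAX chunks are stored in OUT; the return value is
   the total number of chunks found, or -1 if an extension header is
   announced but missing from BLOCKS. */
long read_oldgnu_map(const unsigned char *blocks, size_t nblocks,
                     struct chunk *out, size_t max)
{
    const unsigned char *h = blocks;
    size_t blk = 0, n = 0;
    size_t sp_off = 386, entries = 4, ext_off = 482;   /* main header */

    if (nblocks == 0)
        return -1;
    for (;;) {
        for (size_t i = 0; i < entries; i++) {
            const unsigned char *e = h + sp_off + i * 24;
            if (e[12] == '\0')        /* assumed: empty size ends the map */
                return (long)n;
            if (n < max) {
                out[n].offset = decode_octal(e, 12);
                out[n].size = decode_octal(e + 12, 12);
            }
            n++;
        }
        if (h[ext_off] == 0)          /* isextended is zero: map complete */
            return (long)n;
        if (++blk >= nblocks)
            return -1;                /* announced extension is missing */
        h = blocks + blk * BLOCKSIZE;
        sp_off = 0; entries = 21; ext_off = 504;   /* extension header */
    }
}
```

Any nonzero @code{isextended} byte is treated as true here, which is a deliberately lenient reading of the @code{Bool} type.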
@node PAX 0
@appendixsubsec PAX Format, Versions 0.0 and 0.1

@cindex sparse formats, v.0.0
There are two formats available in this branch. Version @code{0.0}
is the initial version of the sparse format, used by @command{tar}
versions 1.14--1.15.1. The sparse file map is kept in extended
(@code{x}) PAX header variables:

@table @code
@item GNU.sparse.size
@vrindex GNU.sparse.size, extended header variable
Real size of the stored file.

@item GNU.sparse.numblocks
@vrindex GNU.sparse.numblocks, extended header variable
Number of data blocks described by the sparse map.

@item GNU.sparse.offset
@vrindex GNU.sparse.offset, extended header variable
Offset of the data block.

@item GNU.sparse.numbytes
@vrindex GNU.sparse.numbytes, extended header variable
Size of the data block.
@end table

The latter two variables repeat for each data block, so the overall
structure is like this:

@smallexample
@group
GNU.sparse.size=@var{size}
GNU.sparse.numblocks=@var{numblocks}
repeat @var{numblocks} times
  GNU.sparse.offset=@var{offset}
  GNU.sparse.numbytes=@var{numbytes}
end repeat
@end group
@end smallexample

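Because the offset and size variables repeat, a reader of version @code{0.0} has to process the extended header records in the order in which they appear, pairing each @code{GNU.sparse.offset} with the @code{GNU.sparse.numbytes} that follows it, instead of keeping only the last value of each keyword. The sketch below assumes the records have already been split into keyword and value strings (the hypothetical @code{struct xrec}) and that values are decimal, as other PAX header values are; @code{read_v00_map} is likewise a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct chunk { uint64_t offset, size; };
struct xrec { const char *key; const char *value; };  /* one x record */

/* Build the sparse map from version 0.0 records, taken in header
   order.  Returns the number of complete offset/numbytes pairs stored
   in OUT (at most MAX).  Values are not validated in this sketch. */
size_t read_v00_map(const struct xrec *recs, size_t nrecs,
                    struct chunk *out, size_t max)
{
    size_t n = 0;
    int have_offset = 0;      /* an offset is waiting for its size */

    for (size_t i = 0; i < nrecs; i++) {
        uint64_t v = strtoull(recs[i].value, NULL, 10);
        if (strcmp(recs[i].key, "GNU.sparse.offset") == 0) {
            if (n < max)
                out[n].offset = v;
            have_offset = 1;
        } else if (strcmp(recs[i].key, "GNU.sparse.numbytes") == 0
                   && have_offset) {
            if (n < max)
                out[n].size = v;
            n++;
            have_offset = 0;
        }
    }
    return n < max ? n : max;
}
```

A reader that instead kept only the last value of each keyword would see a single offset/size pair, however many data blocks the file has.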
This format presented the following two problems:

@enumerate 1
@item
Whereas the POSIX specification allows a variable to appear multiple
times in a header, it requires that only the last occurrence be
meaningful. Thus, multiple occurrences of @code{GNU.sparse.offset} and
@code{GNU.sparse.numbytes} conflict with the POSIX specification.

@item
Attempting to extract such archives using a third-party @command{tar}
results in extraction of sparse files in @emph{condensed form}, that is,
as the data chunks stored one after another with the holes removed. If
the @command{tar} implementation in question does not support POSIX
format, it will also extract a file containing extension header
attributes. This file can be used to expand the file to its original
state. However, POSIX-aware @command{tar}s will usually ignore the
unknown variables, which makes restoring the file more
difficult. @xref{extracting sparse v.0.x, Extraction of sparse
members in v.0.0 format}, for the detailed description of how to
restore such members using non-GNU @command{tar}s.
@end enumerate

@cindex sparse formats, v.0.1
@GNUTAR{} 1.15.2 introduced sparse format version @code{0.1}, which
attempted to solve these problems. Like its predecessor, this format
stores the sparse map in the extended POSIX header. It retains the
@code{GNU.sparse.size} and @code{GNU.sparse.numblocks} variables, but
instead of @code{GNU.sparse.offset}/@code{GNU.sparse.numbytes} pairs
it uses a single variable:

@table @code
@item GNU.sparse.map
@vrindex GNU.sparse.map, extended header variable
Map of non-null data chunks. It is a string consisting of
comma-separated values "@var{offset},@var{size}[,@var{offset-1},@var{size-1}...]".
@end table

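A reader of version @code{0.1} only needs to split the @code{GNU.sparse.map} value at commas and group the numbers into offset/size pairs. This is a minimal sketch assuming decimal values; @code{parse_sparse_map} and @code{struct chunk} are hypothetical names, and a complete reader would also check that the number of pairs equals @code{GNU.sparse.numblocks}.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct chunk { uint64_t offset, size; };

/* Parse a GNU.sparse.map value such as "0,512,8192,100".
   Returns the number of chunks stored in OUT, or -1 if the string is
   malformed (non-numeric field, odd number of values) or holds more
   than MAX chunks. */
long parse_sparse_map(const char *s, struct chunk *out, size_t max)
{
    size_t n = 0;

    if (*s == '\0')
        return 0;                       /* empty map */
    for (;;) {
        char *end;
        uint64_t off, size;

        if (*s < '0' || *s > '9')
            return -1;
        off = strtoull(s, &end, 10);
        if (*end != ',')
            return -1;                  /* every offset needs a size */
        s = end + 1;
        if (*s < '0' || *s > '9')
            return -1;
        size = strtoull(s, &end, 10);
        if (n == max)
            return -1;                  /* too many chunks for OUT */
        out[n].offset = off;
        out[n].size = size;
        n++;
        if (*end == '\0')
            return (long)n;
        if (*end != ',')
            return -1;
        s = end + 1;
    }
}
```

For instance, @code{parse_sparse_map("0,512,8192,100", map, 4)} yields two chunks, while an odd number of values is rejected.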
To address the second problem, the @code{name} field in the @code{ustar}
header is replaced with a special name, constructed using the following pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

@noindent
Here, @samp{%d} stands for the directory name of the file, @samp{%p}
for the process ID of the @command{tar} process that created the
archive, and @samp{%f} for the base name of the file.

@vrindex GNU.sparse.name, extended header variable
The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. Thus, those @command{tar} implementations
that are not aware of GNU extensions will at least extract the files
into separate directories, giving the user the possibility to expand them
afterwards. @xref{extracting sparse v.0.x, Extraction of sparse
members in v.0.1 format}, for the detailed description of how to
restore such members using non-GNU @command{tar}s.

The resulting @code{GNU.sparse.map} string can be @emph{very} long.
Although POSIX does not impose any limit on the length of an @code{x}
header variable, such long values may confuse some @command{tar}
implementations.

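To see how the size of a map translates into header size, recall that each variable of an @code{x} header is stored as one record of the form @samp{@var{length} @var{keyword}=@var{value}} followed by a newline, where @var{length} is the decimal length of the whole record, its own digits included (this record syntax comes from the POSIX @command{pax} specification). The hypothetical helper below computes that length; the single record holding @code{GNU.sparse.map} grows with every map entry.

```c
#include <stddef.h>
#include <string.h>

/* Length of a pax extended header record "LEN KEYWORD=VALUE\n", where
   LEN is the decimal byte count of the whole record, LEN included. */
size_t pax_record_length(const char *keyword, const char *value)
{
    /* space + keyword + '=' + value + newline */
    size_t rest = 1 + strlen(keyword) + 1 + strlen(value) + 1;
    size_t len = rest + 1;            /* first guess: one-digit length */

    for (;;) {
        size_t digits = 1;
        for (size_t v = len; v >= 10; v /= 10)
            digits++;
        if (rest + digits == len)     /* the length describes itself */
            return len;
        len = rest + digits;          /* retry with the new digit count */
    }
}
```

For example, @code{pax_record_length("GNU.sparse.map", "0,512")} returns 24.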
@node PAX 1
@appendixsubsec PAX Format, Version 1.0

@cindex sparse formats, v.1.0
Version @code{1.0} of the sparse format was introduced with @GNUTAR{}
1.15.92. Its main objective was to make the resulting file
extractable with little effort even by non-POSIX-aware @command{tar}
implementations. Starting from this version, the extended header
preceding a sparse member always contains the following variables that
identify the format being used:

@table @code
@item GNU.sparse.major
@vrindex GNU.sparse.major, extended header variable
Major version (@code{1} for the format described here).

@item GNU.sparse.minor
@vrindex GNU.sparse.minor, extended header variable
Minor version (@code{0} for the format described here).
@end table

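The presence of these variables gives a reader a simple way to tell the versions apart. The sketch below is an assumption based on the variables each version is described as using in this appendix, not the logic of @GNUTAR{} itself; @code{sparse_format_version} is a hypothetical helper, and a complete reader would also examine the values of @code{GNU.sparse.major} and @code{GNU.sparse.minor}.

```c
#include <stddef.h>
#include <string.h>

/* Guess the PAX sparse format version from the keywords found in a
   member's extended header.  Returns "1.0", "0.1", "0.0", or NULL if
   none of the distinguishing keywords is present. */
const char *sparse_format_version(const char *const *keys, size_t nkeys)
{
    int has_major = 0, has_map = 0, has_offset = 0;

    for (size_t i = 0; i < nkeys; i++) {
        if (strcmp(keys[i], "GNU.sparse.major") == 0)
            has_major = 1;      /* v1.0 names its version explicitly */
        else if (strcmp(keys[i], "GNU.sparse.map") == 0)
            has_map = 1;        /* v0.1: whole map in one variable */
        else if (strcmp(keys[i], "GNU.sparse.offset") == 0)
            has_offset = 1;     /* v0.0: repeated offset/numbytes */
    }
    if (has_major)
        return "1.0";           /* value of major/minor not checked */
    if (has_map)
        return "0.1";
    if (has_offset)
        return "0.0";
    return NULL;
}
```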
The @code{name} field in the @code{ustar} header contains a special name,
constructed using the following pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

@vrindex GNU.sparse.name, extended header variable, in v.1.0
@vrindex GNU.sparse.realsize, extended header variable
The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. The real size of the file is stored in the
variable @code{GNU.sparse.realsize}.

|
---|
220 | The sparse map itself is stored in the file data block, preceding the actual
|
---|
221 | file data. It consists of a series of octal numbers of arbitrary length, delimited
|
---|
222 | by newlines. The map is padded with nulls to the nearest block boundary.
|
---|
223 |
|
---|
224 | The first number gives the number of entries in the map. Following are map entries,
|
---|
225 | each one consisting of two numbers giving the offset and size of the
|
---|
226 | data block it describes.
|
---|
227 |
|
---|
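The map layout just described can be read with a few lines of C. The following sketch (the names @code{parse_v10_map}, @code{read_number} and @code{struct chunk} are hypothetical) parses the map from the beginning of a member's data and returns the offset at which the condensed file data starts, i.e. the map size rounded up to a 512-byte block. Overflow of very long numbers is not checked.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKSIZE 512

struct chunk { uint64_t offset, size; };

/* Read one newline-terminated decimal number starting at *POS.
   Returns 0 on success, -1 on malformed input or end of buffer. */
static int read_number(const char *buf, size_t len, size_t *pos,
                       uint64_t *val)
{
    size_t i = *pos;
    uint64_t v = 0;

    if (i >= len || buf[i] < '0' || buf[i] > '9')
        return -1;
    while (i < len && buf[i] >= '0' && buf[i] <= '9')
        v = v * 10 + (uint64_t)(buf[i++] - '0');
    if (i >= len || buf[i] != '\n')
        return -1;
    *pos = i + 1;
    *val = v;
    return 0;
}

/* Parse the version 1.0 map at the start of BUF (the member's data).
   On success stores the entries in OUT (at most MAX), sets *NCHUNKS
   and returns the offset of the condensed data that follows the
   padded map; returns -1 on error. */
long parse_v10_map(const char *buf, size_t len, struct chunk *out,
                   size_t max, size_t *nchunks)
{
    size_t pos = 0;
    uint64_t n;

    if (read_number(buf, len, &pos, &n) != 0 || n > max)
        return -1;
    for (uint64_t i = 0; i < n; i++) {
        if (read_number(buf, len, &pos, &out[i].offset) != 0 ||
            read_number(buf, len, &pos, &out[i].size) != 0)
            return -1;
    }
    *nchunks = (size_t)n;
    return (long)((pos + BLOCKSIZE - 1) / BLOCKSIZE * BLOCKSIZE);
}
```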
The format is designed in such a way that non-POSIX-aware tars and tars not
supporting @code{GNU.sparse.*} keywords will extract each sparse file
in its condensed form with the file map prepended and will place it
into a separate directory. Then, using a simple program, it is
possible to expand the file to its original form even without @GNUTAR{}.
@xref{Sparse Recovery}, for detailed information on how to extract
sparse members without @GNUTAR{}.

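The ``simple program'' mentioned above essentially fills a zeroed buffer of @code{GNU.sparse.realsize} bytes with the data chunks, copying them one after another from the condensed data to the offsets given by the map. Below is a minimal in-memory sketch of that step, assuming the chunks are stored back to back in map order; @code{expand_sparse} is a hypothetical name, and a real tool would write to a file rather than a buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct chunk { uint64_t offset, size; };

/* Rebuild the original file image from the condensed DATA (the bytes
   following the padded map).  OUT must hold REALSIZE bytes, the value
   of GNU.sparse.realsize; holes, including a trailing one, become zero
   bytes.  Returns 0 on success, -1 if the map does not fit the inputs. */
int expand_sparse(const unsigned char *data, size_t datalen,
                  const struct chunk *map, size_t n,
                  unsigned char *out, size_t realsize)
{
    size_t used = 0;                  /* bytes of DATA consumed so far */

    memset(out, 0, realsize);
    for (size_t i = 0; i < n; i++) {
        if (map[i].size > datalen - used ||
            map[i].offset > realsize ||
            map[i].size > realsize - map[i].offset)
            return -1;
        memcpy(out + map[i].offset, data + used, (size_t)map[i].size);
        used += (size_t)map[i].size;
    }
    return 0;
}
```

Because the output buffer is sized from @code{GNU.sparse.realsize} rather than from the last chunk, a hole at the end of the file is reproduced as well.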