Robust Hyperlinks Cost Just Five Words Each

Thomas A. Phelps and Robert Wilensky
Division of Computer Science
University of California, Berkeley
Berkeley, CA 94720-1776
phelps@cs.berkeley.edu, wilensky@cs.berkeley.edu

Keywords: signature, robust, hyperlink, reference, location
Lexical signature: thinkinginpostscript cityquilt fernec planetext peroperties

Abstract

We propose robust hyperlinks as a solution to the problem of broken hyperlinks. A robust hyperlink is a URL augmented with a small "signature", computed from the referenced document. The signature can be submitted as a query to web search engines to locate the document. It turns out that very small signatures are sufficient to readily locate individual documents out of the many millions on the web.

Robust hyperlinks exhibit a number of desirable qualities: They can be computed and exploited automatically, are small and cheap to compute (so that it is practical to make all hyperlinks robust), do not require new server or infrastructure support, can be rolled out reasonably well in the existing URL syntax, can be used to automatically retrofit existing links to make them robust, and are easy to understand. In particular, one can start using robust hyperlinks now, as servers and web pages are mostly compatible as is, while clients can increase their support in the future.

Robust hyperlinks are one example of using the web to bootstrap new features onto itself.

Introduction

Hypertext research has long been concerned with the problem of the persistence of hyperlinks, that is, of dealing with problems that arise when one endpoint of a link, especially the destination, is unresolvable because it was deleted, renamed, moved, or otherwise changed. Dangling pointers on the Web are considered by some to be a significant problem, and a number of solutions have been proposed to deal with them. Some of these suggest reliance on some additional naming scheme, such as Uniform Resource Names (URNs) [Sollins and Masinter, 1994], handles [Kahn and Wilensky, 1995], Persistent Uniform Resource Locators (PURLs) [OCLC] or Common Names [CNRP]. Other approaches involve monitoring and notification to ensure referential integrity (e.g., [Ingham et al., 1996], [Mind-it], [Macskassy and Shklar, 1997], [Francis et al., 1995]).

In this paper, we demonstrate a different approach to this problem: augmenting URLs so that they themselves become robust hyperlinks. A robust hyperlink is one which offers a reasonable chance of being successfully dereferenced in the presence of uncoordinated change. That is, suppose that, when creating a hyperlink to a networked resource, one could design it so that it was still possible, with high probability, to resolve the reference of the hyperlink, even if the resource referred to by the hyperlink had been moved and edited, with the probability of successful dereferencing declining with the degree of substantive change to the document content. Subsequent users of such a hyperlink would find it robust, in that it would function reasonably well after the state of the network it reflected had changed.

Note that robust hyperlinks put the burden of additional effort on the party creating the hyperlink, rather than the party administering the resource. One implication of this fact is that no buy-in is required by an administrative unit, such as a web server or site administrator. Similarly, hyperlinks could be made robust on a piecemeal basis, a link at a time, rather than requiring use on a more systematic basis.

We believe these practical advantages of robust hyperlinks should facilitate their adoption, should the underlying technology be available. That technology has the following requirements:

- Robust hyperlinks should provide a very high likelihood of successful dereferencing in those cases in which an item is moved, but has otherwise been left largely unchanged. Moreover, performance should degrade gracefully as document content changes from its state at the time the hyperlink was created.
- When the robust character of the hyperlink is not needed, robustness should not impose a significant performance penalty.
- The additional storage required for a robust hyperlink must be relatively small, so that it is practical to make all URLs robust.
- Robust hyperlinks will require support, however minimal, from a client or from a proxy with which the user can interact. Thus, implementation in clients or via proxies should be straightforward so as to encourage widespread adoption.
- To encourage immediate adoption, robust hyperlinks should be largely non-interfering with clients and services that do not support them.
- The additional work required to make a hyperlink robust should not be computationally large, and it must be possible to automate it completely. That is, an author should be able to point to a hyperlink, and have it automatically become a robust hyperlink.

Providing Robust Hyperlinks

As it turns out, robust hyperlinks can be readily created in a manner that fulfills all of the above requirements. The basic idea is extremely simple: include some part of the document content along with the URL. Then, if the URL is no longer valid, one can feed the content to a web search engine, and peruse the results.

Of course, in many cases, it is not challenging to carry out this process by hand. For example, if the page is known to be a particular individual's home page, then a user can manually call up a search engine, enter the person's name and perhaps affiliation, and have a good chance of finding it. However, in the general case, how to determine good search terms may not be obvious: Users may have never encountered a link before, and hence may not know much about its content. Even if users are familiar with a resource, hypothesizing search terms from memory generally involves much trial and error.

The scheme described in this paper determines a small number of good words to search for, that is, words that will find the desired page, but as few of the other millions of pages in the web as possible. This "lexical signature" is then attached to the URL. A robust-hyperlink-aware agent will generally perform an initial attempt at "traditional (i.e., address-based) dereferencing", that is, looking up a URL, ignoring the signature. However, if traditional dereferencing fails, the client enters into a second phase of "signature-based (i.e., content-based) dereferencing", in which it uses the signature to search for documents whose signature most closely matches that in the robust hyperlink. The user is then presented with the matching documents from which to complete the reference.

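The two-phase dereferencing just described can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `dereference`, the injected `fetch` and `search` callables, and the example.org URLs are all hypothetical, chosen so that the example runs without any network access.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical callable types (assumptions, not from the paper):
# `fetch` returns the page text, or None if the address cannot be resolved;
# `search` returns candidate URLs for a list of signature terms.
Fetch = Callable[[str], Optional[str]]
Search = Callable[[Sequence[str]], List[str]]


def dereference(url: str, signature: Sequence[str],
                fetch: Fetch, search: Search) -> List[str]:
    """Return candidate locations for a robust hyperlink."""
    # Phase 1: traditional (address-based) dereferencing; the signature is ignored.
    if fetch(url) is not None:
        return [url]
    # Phase 2: signature-based (content-based) dereferencing via a search engine.
    return search(signature)


# Toy stand-ins for the web and a search engine (illustrative only).
pages: Dict[str, str] = {
    "http://new.example.org/paper.html": "cityquilt hypervideo cyberbelt",
}


def toy_fetch(u: str) -> Optional[str]:
    return pages.get(u)


def toy_search(terms: Sequence[str]) -> List[str]:
    # Return pages whose text contains every signature term.
    return [u for u, text in pages.items() if all(t in text for t in terms)]


# The old address no longer resolves, so the signature is used instead.
candidates = dereference("http://old.example.org/paper.html",
                         ["cityquilt", "hypervideo"], toy_fetch, toy_search)
```

Passing the fetch and search steps in as parameters keeps the control flow visible; a real agent would substitute an HTTP client and one or more search engine queries, and would present `candidates` to the user.
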
Computing Lexical Signatures

One way to create lexical signatures that meet the desired criteria is to select the first few terms of the document that have the highest "term frequency-inverse document frequency" (TF-IDF) values. Certainly, such lexical signatures are easy to compute: The frequency of a term in a document is of course easy to determine, and the document frequency of terms can be estimated by the values given for these terms by search engines. Intuitively, TF-IDF seems like a reasonable characterization of a document's contents. The question is whether a relatively small set of such terms can effectively discriminate a given document from all the others in a large collection.

Empirical Results

Perhaps surprisingly, for a distributed hypertext system the size of the Web, a very small number of terms is sufficient. Specifically, a signature of five terms is sufficient to determine a web resource virtually uniquely. Indeed, fewer terms will probably suffice; we advocate at least this many terms because the redundancy that is provided is useful with respect to document change and because the additional terms may be needed to distinguish documents as the web continues to grow.

Let us examine this claim a bit more closely. Our criteria state that we need searching by signature to return a reasonably small result set, meaning one that can be readily perused by a user so as to select a document. Empirically, it seems that using five-word signatures actually overshoots this goal: In most cases, a query to a search engine requesting documents which contain all of the terms in the signature will cause a unique document to be returned, namely, the desired document. In those few cases in which more than one document is returned, the desired document is among the highest ranked. In those cases in which a particular search engine returns no matching documents, this is generally because the document has not yet been indexed, or has been substantially edited since it was last indexed.

As an example, we computed signatures for a varied, unpremeditated sample of different sorts of web pages. (For a different set of examples, see the hyperlinks in this paper.) Most of these are for papers referenced by a bibliography maintained by one of the authors, as we feel this is a realistic application of the technology. To these we added a personal home page, a research web site, and a commercial home page. For each case, we computed signatures, and then performed straightforward queries using several search engines. (That is, we just supplied the engines with the signature terms, without using any advanced features of the engines.) In Table 1, we report the rank of the document in the result set (or "?" if it is absent in the first page of results, which is usually the first ten results).

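The TF-IDF selection described under "Computing Lexical Signatures" can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the weighting tf * log(N / df) is one common TF-IDF variant (the text above does not fix a formula), the tokenizer is a simple lowercase letter-run split, and the document frequencies and collection size in the example are invented numbers standing in for counts that would be estimated from a search engine.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def lexical_signature(text: str, doc_freq: Dict[str, int],
                      collection_size: int, k: int = 5) -> List[str]:
    """Pick the k terms of `text` with the highest TF-IDF score."""
    # Assumption: terms are runs of ASCII letters, compared case-insensitively.
    terms = re.findall(r"[a-z]+", text.lower())
    tf = Counter(terms)

    def score(term: str) -> float:
        # Assumption: a term with no known count is treated as appearing in one document.
        df = max(doc_freq.get(term, 1), 1)
        return tf[term] * math.log(collection_size / df)

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(tf, key=lambda t: (-score(t), t))[:k]


# Invented document frequencies for a collection of one million pages.
counts = {"the": 900_000, "system": 50_000, "cityquilt": 3, "hypervideo": 40}
sample = "the cityquilt hypervideo the system the cityquilt"
top_two = lexical_signature(sample, counts, 1_000_000, k=2)
```

In this toy input the rare terms dominate: "the" occurs most often in the sample but appears in nearly every page of the collection, so its score is close to zero.
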
)150 1707 MS (Table 1:Sample Signatures and Query Results.)761 1822 MS (URLs are presented along with their signatures and the rank of that exact URL in the result set of each)172 1878 MS (search engine to which the signature is submitted. In addition, the server software of the URL's host is)172 1934 MS (listed, along with whether that server accepts the "robust URL" syntax described below. \(In the on-line)172 1990 MS (version of this paper, the contents of the "Accepts Robust URLs?" cells are hyperlinks that use robust)172 2046 MS (URLs; the search engine rank cells are hyperlinks that query the respective search engine with the)172 2102 MS (signature.\))172 2158 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (3)1205 3186 MS showpage %%Page: 4 4 15 780 translate 72 300 div dup neg scale 0 0 transform .25 add round .25 sub exch .25 add round .25 sub exch itransform translate n 2128 2883 150 150 B 1 g f gs n 2125 2717 154 150 CB n 163 159 M 163 2857 L 154 2866 L 154 150 L 2283 150 L 2273 159 L cp 0.754 g e gr gs n 2125 2717 154 150 CB n 2273 2857 M 163 2857 L 154 2866 L 2283 2866 L 2283 150 L 2273 159 L cp 0.5 g e gr n 111 3 169 165 B 0.5 g f n 2 177 169 165 B f n 111 2 169 343 B 0.754 g f n 3 177 281 165 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 g (URL)175 273 MS n 467 3 291 165 B 0.5 g f n 2 177 291 165 B f n 467 2 291 343 B 0.754 g f n 3 177 759 165 B f 0 g (signature)430 273 MS n 438 3 769 165 B 0.5 g f n 2 177 769 165 B f n 438 2 769 343 B 0.754 g f n 2 177 1208 165 B f 0 g (Server)921 273 MS n 200 3 1217 165 B 0.5 g f n 2 177 1217 165 B f n 200 2 1217 343 B 0.754 g f n 2 177 1418 165 B f 0 g (Accepts)1238 217 MS (Robust)1245 273 MS (URLs?)1245 329 MS n 157 3 1427 165 B 0.5 g f n 2 177 1427 165 B f n 157 2 1427 343 B 0.754 g f n 2 177 1585 165 B f 0 g (Google)1434 273 MS n 118 3 1594 165 B 0.5 g f n 2 177 1594 165 B f n 118 2 1594 343 B 0.754 g f n 2 177 1713 165 B f 0 g (Alta)1611 245 MS (Vista)1600 301 MS n 162 3 1723 165 B 0.5 g f n 2 177 1723 165 B f n 162 
2 1723 343 B 0.754 g f n 2 177 1886 165 B f 0 g (Yahoo)1738 273 MS n 173 3 1895 165 B 0.5 g f n 3 177 1895 165 B f n 173 2 1895 343 B 0.754 g f n 2 177 2069 165 B f 0 g (Hotbot)1910 273 MS n 185 3 2078 165 B 0.5 g f n 2 177 2078 165 B f n 185 2 2078 343 B 0.754 g f n 2 177 2264 165 B f 0 g (Infoseek)2085 273 MS n 2094 2 169 352 B 0.5 g f n 2 65 169 352 B f n 2094 2 169 418 B 0.754 g f n 2 65 2264 352 B f 0 0 1 r (http://www.cs.berkeley.edu/~daf/)175 404 MS n 467 2 291 427 B 0.5 g f n 2 121 291 427 B f n 467 2 291 549 B 0.754 g f n 3 121 759 427 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (bregler interreflections)302 477 MS (zisserman cvpr iccv)332 533 MS n 438 2 769 427 B 0.5 g f n 2 121 769 427 B f n 438 2 769 549 B 0.754 g f n 2 121 1208 427 B f 0 g (Apache 1.3.4)860 505 MS n 200 2 1217 427 B 0.5 g f n 2 121 1217 427 B f n 200 2 1217 549 B 0.754 g f n 2 121 1418 427 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 506 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 506 MS n 157 2 1427 427 B 0.5 g f n 2 121 1427 427 B f n 157 2 1427 549 B 0.754 g f n 2 121 1585 427 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (1 )1489 506 MS n 118 2 1594 427 B 0.5 g f n 2 121 1594 427 B f n 118 2 1594 549 B 0.754 g f n 2 121 1713 427 B f 0 0 1 r (6 )1637 506 MS n 162 2 1723 427 B 0.5 g f n 2 121 1723 427 B f n 162 2 1723 549 B 0.754 g f n 2 121 1886 427 B f 0 0 1 r (1 )1787 506 MS n 173 2 1895 427 B 0.5 g f n 3 121 1895 427 B f n 173 2 1895 549 B 0.754 g f n 2 121 2069 427 B f 0 0 1 r (1 )1965 506 MS n 185 2 2078 427 B 0.5 g f n 2 121 2078 427 B f n 185 2 2078 549 B 0.754 g f n 2 121 2264 427 B f 0 0 1 r (1 )2154 506 MS n 2094 2 169 558 B 0.5 g f n 2 64 169 558 B f n 2094 2 169 623 B 0.754 g f n 2 65 2264 558 B f 0 0 1 r (http://www-diglib.stanford.edu/diglib/pub/)175 609 MS n 467 2 291 633 B 0.5 g f n 2 120 291 633 B f n 467 2 291 754 B 0.754 g f n 3 120 759 633 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (sdlip interbib sdlt)353 683 MS (infobus 
testbed)376 739 MS n 438 2 769 633 B 0.5 g f n 2 120 769 633 B f n 438 2 769 754 B 0.754 g f n 2 120 1208 633 B f 0 g (Apache 1.3.4)860 711 MS n 200 2 1217 633 B 0.5 g f n 2 120 1217 633 B f n 200 2 1217 754 B 0.754 g f n 2 120 1418 633 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 711 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 712 MS n 157 2 1427 633 B 0.5 g f n 2 120 1427 633 B f n 157 2 1427 754 B 0.754 g f n 2 120 1585 633 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (1 )1489 712 MS n 118 2 1594 633 B 0.5 g f n 2 120 1594 633 B f n 118 2 1594 754 B 0.754 g f n 2 120 1713 633 B f 0 0 1 r (1 )1637 712 MS n 162 2 1723 633 B 0.5 g f n 2 120 1723 633 B f n 162 2 1723 754 B 0.754 g f n 2 120 1886 633 B f 0 0 1 r (1 )1787 712 MS n 173 2 1895 633 B 0.5 g f n 3 120 1895 633 B f n 173 2 1895 754 B 0.754 g f n 2 120 2069 633 B f 0 0 1 r (1 )1965 712 MS n 185 2 2078 633 B 0.5 g f n 2 120 2078 633 B f n 185 2 2078 754 B 0.754 g f n 2 120 2264 633 B f 0 0 1 r (4 )2154 712 MS n 2094 2 169 764 B 0.5 g f n 2 64 169 764 B f n 2094 2 169 829 B 0.754 g f n 2 64 2264 764 B f 0 0 1 r (http://www.hotofftheweb.com/)175 815 MS n 467 2 291 838 B 0.5 g f n 2 177 291 838 B f n 467 2 291 1016 B 0.754 g f n 3 177 759 838 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (servicemarks)398 889 MS (moskowitz mustache)320 945 MS (scrapbook surfers)354 1001 MS n 438 2 769 838 B 0.5 g f n 2 177 769 838 B f n 438 2 769 1016 B 0.754 g f n 2 177 1208 838 B f 0 g (ApacheSSL)873 917 MS (2.4.1/1.3.3)884 973 MS n 200 2 1217 838 B 0.5 g f n 2 177 1217 838 B f n 200 2 1217 1016 B 0.754 g f n 2 177 1418 838 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 945 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 945 MS n 157 2 1427 838 B 0.5 g f n 2 177 1427 838 B f n 157 2 1427 1016 B 0.754 g f n 2 177 1585 838 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (1 )1489 946 MS n 118 2 1594 838 B 0.5 g f n 2 177 1594 838 B f n 118 2 1594 1016 B 0.754 g f n 2 177 
1713 838 B f 0 0 1 r (2 )1637 946 MS n 162 2 1723 838 B 0.5 g f n 2 177 1723 838 B f n 162 2 1723 1016 B 0.754 g f n 2 177 1886 838 B f 0 0 1 r (1 )1787 946 MS n 173 2 1895 838 B 0.5 g f n 3 177 1895 838 B f n 173 2 1895 1016 B 0.754 g f n 2 177 2069 838 B f 0 0 1 r (1 )1965 946 MS n 185 2 2078 838 B 0.5 g f n 2 177 2078 838 B f n 185 2 2078 1016 B 0.754 g f n 2 177 2264 838 B f 0 0 1 r (1 )2154 946 MS n 2094 2 169 1025 B 0.5 g f n 2 65 169 1025 B f n 2094 2 169 1091 B 0.754 g f n 2 65 2264 1025 B f 0 0 1 r (http://developer.apple.com/techpubs/macos8/Legacy/OpenDoc/opendoc.html)175 1076 MS n 467 2 291 1100 B 0.5 g f n 2 177 291 1100 B f n 467 2 291 1278 B 0.754 g f n 3 177 759 1100 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (opendoc webobjects)325 1150 MS (software constants)345 1206 MS (reference)436 1262 MS n 438 2 769 1100 B 0.5 g f n 2 177 769 1100 B f n 438 2 769 1278 B 0.754 g f n 2 177 1208 1100 B f 0 g (Netscape-Enterprise)790 1178 MS (3.5.1G)923 1234 MS n 200 2 1217 1100 B 0.5 g f n 2 177 1217 1100 B f n 200 2 1217 1278 B 0.754 g f n 2 177 1418 1100 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 1207 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 1207 MS n 157 2 1427 1100 B 0.5 g f n 2 177 1427 1100 B f n 157 2 1427 1278 B 0.754 g f n 2 177 1585 1100 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (1 )1489 1207 MS n 118 2 1594 1100 B 0.5 g f n 2 177 1594 1100 B f n 118 2 1594 1278 B 0.754 g f n 2 177 1713 1100 B f 0 0 1 r (? )1637 1207 MS n 162 2 1723 1100 B 0.5 g f n 2 177 1723 1100 B f n 162 2 1723 1278 B 0.754 g f n 2 177 1886 1100 B f 0 0 1 r (1 )1787 1207 MS n 173 2 1895 1100 B 0.5 g f n 3 177 1895 1100 B f n 173 2 1895 1278 B 0.754 g f n 2 177 2069 1100 B f 0 0 1 r (? )1965 1207 MS n 185 2 2078 1100 B 0.5 g f n 2 177 2078 1100 B f n 185 2 2078 1278 B 0.754 g f n 2 177 2264 1100 B f 0 0 1 r (? 
)2154 1207 MS n 2094 2 169 1287 B 0.5 g f n 2 64 169 1287 B f n 2094 2 169 1352 B 0.754 g f n 2 65 2264 1287 B f 0 0 1 r (http://www.rightbrain.com/pages/book-download.shtml)175 1338 MS n 467 2 291 1362 B 0.5 g f n 2 120 291 1362 B f n 467 2 291 1483 B 0.754 g f n 3 120 759 1362 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (thinkinginpostscript)328 1412 MS (ematter click infringe)316 1468 MS n 438 2 769 1362 B 0.5 g f n 2 120 769 1362 B f n 438 2 769 1483 B 0.754 g f n 2 120 1208 1362 B f 0 g (BESTWWWD 2.4)806 1440 MS n 200 2 1217 1362 B 0.5 g f n 2 120 1217 1362 B f n 200 2 1217 1483 B 0.754 g f n 2 120 1418 1362 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 1440 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 1440 MS n 157 2 1427 1362 B 0.5 g f n 2 120 1427 1362 B f n 157 2 1427 1483 B 0.754 g f n 2 120 1585 1362 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (? )1489 1441 MS n 118 2 1594 1362 B 0.5 g f n 2 120 1594 1362 B f n 118 2 1594 1483 B 0.754 g f n 2 120 1713 1362 B f 0 0 1 r (2 )1637 1441 MS n 162 2 1723 1362 B 0.5 g f n 2 120 1723 1362 B f n 162 2 1723 1483 B 0.754 g f n 2 120 1886 1362 B f 0 0 1 r (? )1787 1441 MS n 173 2 1895 1362 B 0.5 g f n 3 120 1895 1362 B f n 173 2 1895 1483 B 0.754 g f n 2 120 2069 1362 B f 0 0 1 r (? )1965 1441 MS n 185 2 2078 1362 B 0.5 g f n 2 120 2078 1362 B f n 185 2 2078 1483 B 0.754 g f n 2 120 2264 1362 B f 0 0 1 r (? 
)2154 1441 MS n 2094 2 169 1493 B 0.5 g f n 2 64 169 1493 B f n 2094 2 169 1558 B 0.754 g f n 2 64 2264 1493 B f 0 0 1 r (http://msdn.microsoft.com/workshop/author/css/css.asp)175 1544 MS n 467 2 291 1567 B 0.5 g f n 2 177 291 1567 B f n 467 2 291 1745 B 0.754 g f n 3 177 759 1567 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (mystyles dblspaced)336 1618 MS (selamoglu intdev)358 1674 MS (italicizes)439 1730 MS n 438 2 769 1567 B 0.5 g f n 2 177 769 1567 B f n 438 2 769 1745 B 0.754 g f n 2 177 1208 1567 B f 0 g (Microsoft-IIS 5.0)819 1674 MS n 200 2 1217 1567 B 0.5 g f n 2 177 1217 1567 B f n 200 2 1217 1745 B 0.754 g f n 2 177 1418 1567 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 1674 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 1674 MS n 157 2 1427 1567 B 0.5 g f n 2 177 1427 1567 B f n 157 2 1427 1745 B 0.754 g f n 2 177 1585 1567 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (1 )1489 1675 MS n 118 2 1594 1567 B 0.5 g f n 2 177 1594 1567 B f n 118 2 1594 1745 B 0.754 g f n 2 177 1713 1567 B f 0 0 1 r (1 )1637 1675 MS n 162 2 1723 1567 B 0.5 g f n 2 177 1723 1567 B f n 162 2 1723 1745 B 0.754 g f n 2 177 1886 1567 B f 0 0 1 r (1 )1787 1675 MS n 173 2 1895 1567 B 0.5 g f n 3 177 1895 1567 B f n 173 2 1895 1745 B 0.754 g f n 2 177 2069 1567 B f 0 0 1 r (1 )1965 1675 MS n 185 2 2078 1567 B 0.5 g f n 2 177 2078 1567 B f n 185 2 2078 1745 B 0.754 g f n 2 177 2264 1567 B f 0 0 1 r (1 )2154 1675 MS n 2094 2 169 1754 B 0.5 g f n 2 65 169 1754 B f n 2094 2 169 1820 B 0.754 g f n 2 65 2264 1754 B f 0 0 1 r (http://www.adobe.com/products/acrobat/main.html)175 1805 MS n 467 2 291 1829 B 0.5 g f n 2 176 291 1829 B f n 467 3 291 2006 B 0.754 g f n 3 177 759 1829 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (accessiblity hewson)331 1879 MS (epaper gillmor)383 1935 MS (workflow)432 1991 MS n 438 2 769 1829 B 0.5 g f n 2 176 769 1829 B f n 438 3 769 2006 B 0.754 g f n 2 177 1208 1829 B f 0 g (Netscape-Enterprise)790 1907 MS (3.6 SP2)913 1963 
MS n 200 2 1217 1829 B 0.5 g f n 2 176 1217 1829 B f n 200 3 1217 2006 B 0.754 g f n 2 177 1418 1829 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 1936 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 1936 MS n 157 2 1427 1829 B 0.5 g f n 2 176 1427 1829 B f n 157 3 1427 2006 B 0.754 g f n 2 177 1585 1829 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (? )1489 1936 MS n 118 2 1594 1829 B 0.5 g f n 2 176 1594 1829 B f n 118 3 1594 2006 B 0.754 g f n 2 177 1713 1829 B f 0 0 1 r (1 )1637 1936 MS n 162 2 1723 1829 B 0.5 g f n 2 176 1723 1829 B f n 162 3 1723 2006 B 0.754 g f n 2 177 1886 1829 B f 0 0 1 r (1 )1787 1936 MS n 173 2 1895 1829 B 0.5 g f n 3 176 1895 1829 B f n 173 3 1895 2006 B 0.754 g f n 2 177 2069 1829 B f 0 0 1 r (1 )1965 1936 MS n 185 2 2078 1829 B 0.5 g f n 2 176 2078 1829 B f n 185 3 2078 2006 B 0.754 g f n 2 177 2264 1829 B f 0 0 1 r (? )2154 1936 MS n 2094 2 169 2016 B 0.5 g f n 2 64 169 2016 B f n 2094 2 169 2081 B 0.754 g f n 2 64 2264 2016 B f 0 0 1 r (http://www.isg.sfu.ca/~duchier/misc/hypertext_review/)175 2067 MS n 467 2 291 2091 B 0.5 g f n 2 232 291 2091 B f n 467 2 291 2324 B 0.754 g f n 3 232 759 2091 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (balasubramanian bala)315 2169 MS (hypermedia pegasus)328 2225 MS (interface)441 2281 MS n 438 2 769 2091 B 0.5 g f n 2 232 769 2091 B f n 438 2 769 2324 B 0.754 g f n 2 232 1208 2091 B f 0 g (Apache 1.3.6 \(Unix\))789 2141 MS (PHP/3.0.7)889 2197 MS (mod_ssl/2.3.11)838 2253 MS (OpenSSL/0.9.3a)827 2309 MS n 200 2 1217 2091 B 0.5 g f n 2 232 1217 2091 B f n 200 2 1217 2324 B 0.754 g f n 2 232 1418 2091 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 2225 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 2225 MS n 157 2 1427 2091 B 0.5 g f n 2 232 1427 2091 B f n 157 2 1427 2324 B 0.754 g f n 2 232 1585 2091 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (1 )1489 2226 MS n 118 2 1594 2091 B 0.5 g f n 2 232 1594 2091 B f n 118 2 1594 2324 B 0.754 g f 
n 2 232 1713 2091 B f 0 0 1 r (10 )1624 2226 MS n 162 2 1723 2091 B 0.5 g f n 2 232 1723 2091 B f n 162 2 1723 2324 B 0.754 g f n 2 232 1886 2091 B f 0 0 1 r (?)1787 2169 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1812 2169 MS (\(1-4 are)1729 2225 MS (dups.\) )1739 2282 MS n 173 2 1895 2091 B 0.5 g f n 3 232 1895 2091 B f n 173 2 1895 2324 B 0.754 g f n 2 232 2069 2091 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (? )1965 2170 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (\(1-4 are)1907 2225 MS (dups.\))1923 2281 MS n 185 2 2078 2091 B 0.5 g f n 2 232 2078 2091 B f n 185 2 2078 2324 B 0.754 g f n 2 232 2264 2091 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (? )2154 2226 MS n 2094 2 169 2334 B 0.5 g f n 2 64 169 2334 B f n 2094 2 169 2399 B 0.754 g f n 2 64 2264 2334 B f 0 0 1 r (http://www.ai.univie.ac.at/~paolo/lva/vu-htmm1998/html/conklin87/Conklin87.html)175 2385 MS n 467 2 291 2408 B 0.5 g f n 2 177 291 2408 B f n 467 2 291 2586 B 0.754 g f n 3 177 759 2408 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (planetext fernec)369 2459 MS (synview peroperties)330 2515 MS (textnet)459 2571 MS n 438 2 769 2408 B 0.5 g f n 2 177 769 2408 B f n 438 2 769 2586 B 0.754 g f n 2 177 1208 2408 B f 0 g (Apache 1.3.3 \(Unix\))789 2487 MS (Debian/GNU)859 2543 MS n 200 2 1217 2408 B 0.5 g f n 2 177 1217 2408 B f n 200 2 1217 2586 B 0.754 g f n 2 177 1418 2408 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 2515 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 2515 MS n 157 2 1427 2408 B 0.5 g f n 2 177 1427 2408 B f n 157 2 1427 2586 B 0.754 g f n 2 177 1585 2408 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (? )1489 2516 MS n 118 2 1594 2408 B 0.5 g f n 2 177 1594 2408 B f n 118 2 1594 2586 B 0.754 g f n 2 177 1713 2408 B f 0 0 1 r (? )1637 2516 MS n 162 2 1723 2408 B 0.5 g f n 2 177 1723 2408 B f n 162 2 1723 2586 B 0.754 g f n 2 177 1886 2408 B f 0 0 1 r (? 
)1787 2516 MS n 173 2 1895 2408 B 0.5 g f n 3 177 1895 2408 B f n 173 2 1895 2586 B 0.754 g f n 2 177 2069 2408 B f 0 0 1 r (1 )1965 2516 MS n 185 2 2078 2408 B 0.5 g f n 2 177 2078 2408 B f n 185 2 2078 2586 B 0.754 g f n 2 177 2264 2408 B f 0 0 1 r (? )2154 2516 MS n 2094 2 169 2595 B 0.5 g f n 2 65 169 2595 B f n 2094 2 169 2661 B 0.754 g f n 2 65 2264 2595 B f 0 0 1 r (http://www.lcc.gatech.edu/gallery/hypercafe/HT96_HTML/HyperCafe_HT96.html)175 2646 MS n 467 2 291 2670 B 0.5 g f n 2 177 291 2670 B f n 467 2 291 2848 B 0.754 g f n 3 177 759 2670 B f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (cityquilt hypervideo)328 2720 MS (cyberbelt infrawhere)324 2776 MS (videotexts)425 2832 MS n 438 2 769 2670 B 0.5 g f n 2 177 769 2670 B f n 438 2 769 2848 B 0.754 g f n 2 177 1208 2670 B f 0 g (Apache 1.3.6 \(Unix\))789 2776 MS n 200 2 1217 2670 B 0.5 g f n 2 177 1217 2670 B f n 200 2 1217 2848 B 0.754 g f n 2 177 1418 2670 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (yes)1280 2777 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( )1346 2777 MS n 157 2 1427 2670 B 0.5 g f n 2 177 1427 2670 B f n 157 2 1427 2848 B 0.754 g f n 2 177 1585 2670 B f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (1 )1489 2777 MS n 118 2 1594 2670 B 0.5 g f n 2 177 1594 2670 B f n 118 2 1594 2848 B 0.754 g f n 2 177 1713 2670 B f 0 0 1 r (1 )1637 2777 MS n 162 2 1723 2670 B 0.5 g f n 2 177 1723 2670 B f n 162 2 1723 2848 B 0.754 g f n 2 177 1886 2670 B f 0 0 1 r (1 )1787 2777 MS n 173 2 1895 2670 B 0.5 g f n 3 177 1895 2670 B f n 173 2 1895 2848 B 0.754 g f n 2 177 2069 2670 B f 0 0 1 r (1 )1965 2777 MS n 185 2 2078 2670 B 0.5 g f n 2 177 2078 2670 B f n 185 2 2078 2848 B 0.754 g f n 2 177 2264 2670 B f 0 0 1 r (1 )2154 2777 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (These results suggest that a number of signature-based dereferencing strategies are feasible. For example,)154 2966 MS (an agent could query a set of engines, and return the top few results from each one. 
In our sample set, each)150 3022 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (4)1205 3186 MS showpage %%Page: 5 5 15 780 translate 72 300 div dup neg scale 0 0 transform .25 add round .25 sub exch .25 add round .25 sub exch itransform translate n 2128 2857 150 150 B 1 g f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (hyperlink is successfully dereferenced by this strategy. )150 194 MS (Alternatively, an agent could make "stringent" queries to one or more engines \(i.e., queries insisting that all)154 306 MS (the terms be present\), and, if this fails to return a result, make progressively less stringent queries. \(While it)150 362 MS (is not obvious in the table, in most cases, only one or two documents are returned by the more stringent)150 418 MS (searches. In most other cases, the stringent searches return no items, probably because the document was)150 474 MS (substantially modified since the last time the crawler reached it.\) For example, a Google query succeeds)150 530 MS (only when all the terms are provided; the Alta Vista query will find the most relevant pages, which do not)150 586 MS (necessarily include all of the query terms. Performing the Google query, and then performing the Alta Vista)150 642 MS (query if Google fails, locates the desired reference in all but one case. )150 698 MS (Of course, most of these search engines \(but not Google\) offer "advanced options" that afford the user more)154 810 MS (control, leaving open the possibility of additional strategies. )150 866 MS (Note that the results may actually be even better than the table would suggest. For example, in several cases an)154 978 MS (identical paper with a different URL occurs earlier in the result set. Similarly, some of the other pages may)150 1034 MS (provide access to the document a link or two away, so even though we do not count this as successful, it)150 1090 MS (might be helpful to the user. 
)150 1147 MS (To understand why such a small number of terms can uniquely identify a web page, we suggest the)154 1259 MS (following line of reasoning. There are probably a very large number of distinct terms on the web; 500,000 is)150 1315 MS (probably a conservative estimate. Then the number of distinct combinations of 5 terms is greater than)150 1371 MS (3x10^28. Assuming the web is populated by documents whose most characteristic terms are uniformly)150 1427 MS (drawn at random, the probability that more than one document matches a set of 5 characteristic terms is)150 1483 MS (very small indeed. )150 1539 MS (Of course, the assumption of uniform distribution is highly questionable. Perhaps the empirical results)154 1651 MS (indicate that it is not that far off in practice. Interestingly, even among intuitively similar documents \(e.g.,)150 1707 MS (separate chapters of the same book\), signatures seem not to overlap much. In addition, TF-IDF-based)150 1763 MS (signatures are by definition skewed toward infrequent terms--most of the signatures we have seen contain)150 1819 MS (domain-specific abbreviations, proper names, jargon, et cetera, which may each occur only in a few dozen)150 1875 MS (documents, narrowing down the set of matching documents very rapidly. )150 1931 MS (Indeed, in our examples above, cutting the signature length down to three terms changes the query results)154 2043 MS (only slightly. \(One signature would fail to readily locate its target in this case.\) Determining the optimal)150 2099 MS (length will presumably require some empirical experimentation and study. However, the method does not)150 2155 MS (depend on a standard length signature, so different implementations are free to use different lengths as well)150 2211 MS (as different methods of computing them. )150 2267 MS (We find these results encouraging, if only impressionistic. 
We have automated the process of signature)154 2379 MS (creation and subsequent searching, and have found these results to be consistent with our \(still somewhat)150 2435 MS (limited\) experience. For example, we created signatures for all the references at the end of this paper. The)150 2492 MS (results are virtually identical to those in the table. However, we have resisted the temptation to report more)150 2548 MS (thorough empirical testing for several reasons. First, we are depending on web search engines, whose)150 2604 MS (performance changes from moment to moment. Second, these results, and any other empirical testing we)150 2660 MS (might do, compute signatures and search for pages that have not actually been moved, whereas in real use,)150 2716 MS (old signatures would be used to search for moved, possibly changed, pages. Therefore, we do not think that)150 2772 MS (generating a large quantity of artificial data will be definitive proof that robust hyperlinks can be used)150 2828 MS (effectively, which will of necessity require empirical testing by real users. Finally, as we discuss below,)150 2884 MS (there are many ways in which one might vary and possibly improve the details of this scheme \(all of which)150 2940 MS (are mutually compatible\). Thus, we present sample empirical results to demonstrate feasibility, and to)150 2996 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (5)1205 3186 MS showpage %%Page: 6 6 15 780 translate 72 300 div dup neg scale 0 0 transform .25 add round .25 sub exch .25 add round .25 sub exch itransform translate n 2128 2870 150 150 B 1 g f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (encourage experimentation and use. 
)150 194 MS [57.914 0 0 -57.914 0 0]/Times-Bold MF (Integrating Robust Hyperlinks into the Web)150 316 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF (Encoding Signatures in URLs)150 430 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (Given that lexical signatures are a good way to augment URLs, we are left with the issue of how to include)150 541 MS (these in hyperlinks. Here we discuss several alternatives. For the purposes of this discussion, suppose that)150 597 MS (the hyperlink has the URL )150 653 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (http://www.something.dom/a/b/c)683 653 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (, and that the designated resource has the)1325 653 MS (signature )150 709 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (w1,...,w5)342 709 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (. )514 709 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Robust URLs \(Incompatible with existing web syntax\))154 821 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (: One can introduce new syntax for URLs to identify)1214 821 MS (the signature, as for example XPointer )150 878 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r ([XPointer])915 878 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( does for sub-resource references. Doing so is)1136 878 MS (incompatible with existing URLs, and makes for an awkward transition in adopting the scheme. 
\(However,)150 934 MS (should robust URLs come into widespread use, such a proposal might merit further consideration.\) )150 990 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Robust URLs \(Mostly Compatible\))154 1103 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (: Another approach is to append the signature to the URL as if it were a)839 1103 MS (query term, that is: )150 1159 MS %%IncludeFont: Courier [41.039 0 0 -41.039 0 0]/Courier MF (http://www.something.dom/a/b/c?lexical-signature="w1+w2+w3+w4+w5")279 1271 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )1904 1271 MS (\(If the URL already includes a query, the proposal is to append the signature expression with a "&".\) )150 1383 MS (In this approach, if one's client is "robust-hyperlink-aware", it strips the signature expression before)154 1495 MS (attempting traditional dereferencing. An advantage of this approach is that, if one's client is not)150 1551 MS (robust-hyperlink-aware, one can use a local proxy to intercept all hyperlinks, and perform any robust)150 1607 MS (hyperlink processing there, including stripping out the lexical-signature expression for traditional)150 1663 MS (dereferencing. In addition, the robust hyperlink is just a URL, and hence can be passed around easily, as in)150 1719 MS (an email message \(although we suspect that this will not be an important use\). )150 1775 MS (The primary disadvantage of this approach is that it may have some adverse interactions with non-aware)154 1887 MS (clients and servers. However, most widely used HTTP servers ignore query terms for a non-script URL, and)150 1943 MS (most services seem to ignore what appear to be gratuitous search terms. For example, the table above)150 1999 MS (reports, in the "Accepts Robust URLs?" column, the results of submitting a robust URL in place of each)150 2055 MS (original URL. These are uniformly successful. 
Moreover, the sample covers the major web servers in use)150 2111 MS (today: Apache \(with over 50% of the market\), Microsoft Internet Information Server \(24%\), and Netscape)150 2167 MS (Enterprise \(7%\) )150 2224 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r ([Netcraft 1999])475 2224 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. )792 2224 MS (Of course, the term "lexical-signature" might in fact be a valid search term for some service. While one can)154 2337 MS (arbitrarily decrease the possibility of a collision by using an even more obscure name, the term is probably)150 2393 MS (of sufficient rarity already. Once the term is established in the future, well-informed web designers will)150 2449 MS (know not to use it. )150 2505 MS (So, for the most part, "robust URLs" will be harmless to non-aware clients and servers, if not universally so.)154 2617 MS (If the scheme becomes popular, the minority of web sites hostile to unknown parameters may become less)150 2673 MS (so, or perhaps explicitly recognize one more. )150 2729 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Robust Link Elements.)154 2841 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Another possibility is to include the signature in the markup rather than as part of)596 2841 MS (the URL. 
For example, suppose the URL was part of the following HTML anchor element: )150 2897 MS [41.039 0 0 -41.039 0 0]/Courier MF (click here)279 3009 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )1654 3009 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (6)1205 3186 MS showpage %%Page: 7 7 15 780 translate 72 300 div dup neg scale 0 0 transform .25 add round .25 sub exch .25 add round .25 sub exch itransform translate n 2128 2833 150 150 B 1 g f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (We could robustify this to )150 250 MS [41.039 0 0 -41.039 0 0]/Courier MF (click here)279 406 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )1504 406 MS (The advantage of this proposal is that it should be ignored completely by non-aware clients. A disadvantage)150 518 MS (is that, if one's client is not robust-hyperlink-aware, then one can't set up a local proxy to intercept a)150 574 MS (dereferencing attempt, as one can for robust URLs. Furthermore, this embedding works only for document)150 630 MS (formats that can harmlessly accept new attributes, limiting its use to HTML, XML, SGML \(and even there,)150 686 MS (documents will fail validation against un-updated DTDs\), and excluding other text and multimedia types. )150 742 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF (System Support for Robust Hyperlinks)150 855 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (To take advantage of robust hyperlinks, some piece of software needs to exploit them. Ideally, browsers will)150 966 MS (support them directly. This would require no changes in servers or other infrastructure, and users would)150 1022 MS (benefit as soon as they update their browser. Until this happens, support can be provided by a proxy server)150 1078 MS (that retains the major benefits, though it requires the assent of the web site administrator. 
)150 1134 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Robust Proxy Module.)154 1247 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( A robust proxy module transparently makes robust all URLs that pass through it,)596 1247 MS (from any client connecting to any server. When a client requests a page with a non-robust URL, the module)150 1303 MS (signs the returned incoming page and sends a redirect HTTP signal to the client to enable it to accept the)150 1359 MS (robust hyperlink. This hyperlink can then be saved as a bookmark, used in an HTML document, or emailed.)150 1415 MS (Moreover, when clients send robust hyperlinks to the proxy module, the signature is stripped off so as to)150 1471 MS (minimize the chance of adverse interactions with servers. If the server returns an HTTP error code 404, the)150 1527 MS (proxy engages in signature-based dereferencing, i.e., sends the signature to one or more search engines. )150 1583 MS (The advantage of this approach is that the proxy server can always correctly interpret the robust URL, and)154 1695 MS (handle interactions for all clients, regardless of whether they are robust-hyperlink-aware. Sites can buy in)150 1751 MS (one at a time. One disadvantage is that someone has to set up and manage the proxy service. Another)150 1807 MS (disadvantage is that one always suffers the \(small\) overhead of going through the proxy, even if)150 1863 MS (conventional dereferencing succeeds. \(It might be possible to save some overhead if one's client supports)150 1919 MS (some sort of "fail-over" capabilities, so that the proxy or software agent would only be contacted once)150 1975 MS (traditional dereferencing fails.\) )150 2031 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Robust Proxy Service.)154 2143 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Another possibility is to include the signature in the URL, as before, but preface the)591 2143 MS (URL with an aware proxy-service URL. 
For example, the URL might look like this: )150 2199 MS [41.039 0 0 -41.039 0 0]/Courier MF (http://www.myproxyserver.dom/cgi-bin?url="http://www.something.dom/a/b/c")279 2300 MS (&lex-signature=w1+w2+w3+w4+w5)279 2355 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )1004 2355 MS (The proxy presumably performs traditional dereferencing on the URL, and uses the signature only if that)154 2467 MS (fails. The advantage of this approach is that the proxy server can always correctly interpret the robust URL)150 2523 MS (description, and handle interactions for all clients, regardless of whether they are robust-hyperlink-aware.)150 2579 MS (This scheme has a number of significant disadvantages. Someone has to set up and manage the proxy)150 2635 MS (service \(although perhaps some entrepreneur will find it valuable to provide such a service\). It roughly)150 2692 MS (doubles web traffic, as each URL request suffers an additional round trip through the service, even if)150 2748 MS (conventional dereferencing succeeds. Many people would not care to expose a complete record of their web)150 2804 MS (browsing. \(Even if the content is encrypted, the sites visited cannot be.\) Finally, once clients support robust)150 2860 MS (hyperlinks, all hyperlinks encoded this way would need to be translated. )150 2916 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (7)1205 3186 MS showpage %%Page: 8 8 15 780 translate 72 300 div dup neg scale 0 0 transform .25 add round .25 sub exch .25 add round .25 sub exch itransform translate n 2128 2880 150 150 B 1 g f [57.914 0 0 -57.914 0 0]/Times-Bold MF 0 g (Limitations)150 204 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (The proposed scheme is not without its limitations: )150 317 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Non-indexed documents.)154 429 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( The scheme will only work for documents indexed by web search engines. 
Many)644 429 MS (important classes of documents, for example, PostScript, DVI, compressed documents, and images are)150 485 MS (generally not indexed. As web search engines improve, however, robust hyperlink coverage will improve)150 541 MS (with them. Furthermore, it has been claimed that search engines are beginning to lag considerably behind the)150 597 MS (state of the web, and the performance of signature dereferencing can only be as successful as their coverage)150 653 MS (permits. \(Of course, signatures are computed independently of indexing, so that once a signed)150 709 MS (document becomes indexed, the existing signature immediately gains currency.\) )150 765 MS (Moreover, large numbers of documents live behind a firewall or are accessible only through a script. In the)154 877 MS (case of those behind a firewall, they are presumably accessible only to users within the local administrative)150 933 MS (domain. Moreover, if the documents are indexed behind the firewall, it is possible for a local robust)150 989 MS (hyperlink module to handle them. I.e., one's robust hyperlink agent could maintain a list of search engines)150 1045 MS (to use, and could try the local search engine before using global ones. )150 1102 MS (In the case of documents behind a script, there may also be a helpful search engine at the site. Finding the)154 1214 MS (location of such a service and automatically exploiting it remains an interesting research challenge. )150 1270 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Duplicates.)154 1382 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( There are many duplicates of documents on the web. In this case, signature-based dereferencing)381 1382 MS (may return a substantial result set. Probably this fact does not represent a substantial problem, as presumably)150 1438 MS (any of the duplicates will be reasonable replacements for the moved document. 
)150 1494 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Variation in search engine performance.)154 1606 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Signature dereferencing performance relies on the performance of)961 1606 MS (search engines, which is highly variable. Moreover, signatures should be computed to work with the)150 1662 MS (minimal common functionality of search engines, namely that only page content \(not comments, not scripts)150 1718 MS (or style sheets or other special features\), with HTML tags stripped out, and with words defined as)150 1774 MS (continuous strings of alphabetic characters \(no numbers\). )150 1830 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Resources with highly variable content.)154 1942 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Some resources, e.g., a newspaper home page, vary in content very)939 1942 MS (quickly over time. In these cases, a straightforward signature will not be of much use. However, if the page)150 1998 MS (is changing frequently, it is likely that it is a live page so that the original URL remains valid. )150 2054 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Non-textual resources.)154 2166 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Non-textual resources \(e.g., images and video\) are not generally indexed based on)605 2166 MS (their content. Two possible extensions are as follows: \(i\) If an image, et cetera, is embedded within an)150 2222 MS (obvious textual context, the signature of that context could be used instead. \(ii\) Various attempts to analyze)150 2278 MS (images by content are under way \(e.g., see )150 2335 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r ([Carson et al., 1999])991 2335 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (\) in ways that provide a set of terms that are)1410 2335 MS (rankable and indexable; should one of these become available as the basis for a global image search engine,)150 2392 MS (then analogous image signatures could conceivably be created. 
At this stage, it is premature to determine)150 2448 MS (whether image signatures will be viable. )150 2504 MS [57.914 0 0 -57.914 0 0]/Times-Bold MF (Extensions)150 2626 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (We mean for the results presented above to suggest that a useful level of performance is readily obtainable,)150 2739 MS (and hence, that the approach is viable. There are any number of ways in which one might improve these)150 2795 MS (results, both in the practice of computing signatures and by the action of robust-hyperlink-aware agents in)150 2851 MS (dereferencing dangling links. )150 2907 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Signature Creation.)154 3019 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Upon creation of the initial candidate signature, the signature creation agent can)549 3019 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (8)1205 3186 MS showpage %%Page: 9 9 15 780 translate 72 300 div dup neg scale 0 0 transform .25 add round .25 sub exch .25 add round .25 sub exch itransform translate n 2128 2858 150 150 B 1 g f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (immediately perform a search to see how well it works in locating the reference. If, as in some of our)150 194 MS (examples above, the stringent search produces no results, one may try computing an alternate signature.)150 250 MS (\(One possibility here is to choose among the top 2n significant terms to produce the best performing n term)150 306 MS (signature.\) )150 362 MS (One interesting case is that of resources with highly variable contents. As mentioned above, a signature is)154 474 MS (likely to be less helpful in such cases \(and less needed, as well\). One possibility for such cases is to)150 530 MS (incrementally compute )150 586 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (adaptive signatures)615 586 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (, that is, signatures that are computed over time, so that they)1006 586 MS (contain terms that persist over time. 
)150 642 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Signature Variations.)154 754 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (One may want to modify a strict TF-IDF ranking to include other criteria. For)581 754 MS (example, one might want to preclude all signature terms from occurring in the same sentence, or from)150 810 MS (containing words that appear to be misspellings, or words that occur only once in a document, as these)150 866 MS (terms might be subject to a greater probability of modification that will render the signature ineffective. \(In)150 922 MS (our own implementation, we have experimented with some of these variants, but have not found significant)150 978 MS (differences in the results.\) Indeed, one can compute longer or shorter signatures as desired, or even)150 1034 MS (hand-engineer them, and still have a signature that will operate with robust-hyperlink-aware agents. )150 1090 MS (In addition, the particular lexical signatures we suggest are just one form of lexical signature, which are in)154 1203 MS (turn just one form of signature. TF-IDF-based lexical signatures have the advantage of graceful degradation,)150 1259 MS (and of a search infrastructure supporting them already in place. However, it might be possible to devise)150 1315 MS (other strategies for lexical signature computation that will be superior. In addition, other, non-lexical forms)150 1371 MS (of signatures might be devised that could have other attractive properties. If some of these are important)150 1427 MS (enough, it might be worth the effort of search engines to compute these as they crawl the web. )150 1483 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Signature Dereferencing Strategies.)154 1595 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( If the document has been edited to remove one or more lexical)869 1595 MS (signature terms, searches requiring all terms to be present in the document will of course fail. 
In this case,)150 1651 MS (any number of "back off" strategies can be employed to widen the search. For example, above we suggested)150 1707 MS (using more liberal search engine semantics, which still tends to return the desired document as one of the)150 1763 MS (top few choices. However, signature dereferencing agents are free to create their own back-off strategies, for)150 1819 MS (example, incrementally eliminating terms, or combining multiple search engine results in various ways. )150 1875 MS (Note that our basic signature dereferencing strategy uses web searching as a proxy for signature matching.)154 1987 MS (However, once a result set is obtained, one's agent might compute signatures of the result set members, and)150 2043 MS (use the actual signature matches rather than the search results. As an example, the URL)150 2099 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (http://developer.apple.com/techpubs/mac/Cyberdog/Cyberdog-6.html)150 2156 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( has a signature, which, when)1601 2156 MS (given to Google, returns 6 documents, with the document generating the signature coming in second. This)150 2212 MS (result is perfectly acceptable, and isn't surprising, as the document is just a book chapter, and the first place)150 2268 MS (document, the book's glossary. However, none of the other 5 items found by Google have a signature that is)150 2324 MS (even close to that of the original. Thus, should one want to spend the effort computing the signatures of the)150 2380 MS (result set and matching them to the original, the result can be improved \(moving up from second to first)150 2437 MS (place in this example\). 
)150 2493 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF (Robust Hyperlink Agents.)154 2605 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Above we mentioned that a robust-hyperlink-aware agent, either a client or proxy,)649 2605 MS (generally attempts traditional dereferencing, followed by signature-based dereferencing should that fail.)150 2661 MS (However, other actions might be carried out by such an agent. It might provide a means to persistently)150 2717 MS (change a reference for the given user. For example, the client may maintain a list of hyperlink remappings)150 2773 MS (for the user, so that subsequent uses of a URL will automatically be remapped to the one the user selected.)150 2829 MS (If one uses a robust proxy service, the service might keep track of previous users' replacement suggestions,)150 2885 MS (and offer these to future users. If the resource belongs to the user, the agent might offer to edit the)150 2941 MS (hyperlink for the user. \(Similarly, a robust proxy service might keep track of users' suggested hyperlink)150 2997 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (9)1205 3186 MS showpage %%Page: 10 10 15 780 translate 72 300 div dup neg scale 0 0 transform .25 add round .25 sub exch .25 add round .25 sub exch itransform translate n 2128 2850 150 150 B 1 g f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (replacements, and, at some point, offer this history to the author as suggestions for link replacements.\) )150 194 MS (Aware agents might use signatures even when traditional dereferencing succeeds. For example, since)154 306 MS (computing a signature is relatively cheap, one's client might always compute the signature of a document it)150 362 MS (locates, and compare it to that in the hyperlink. If the signatures differ significantly, the user might be)150 418 MS (advised, and perhaps given the choice of performing signature-based dereferencing. 
)150 474 MS (A more conservative strategy might be to perform signature checking only if there is some other indication)154 586 MS (that the document located may not be the same as that originally referenced. For example, suppose a)150 642 MS (document has several hyperlinks to sub-resources of a given document. Sub-resources might be named)150 698 MS (anchors in HTML, or XPointers )150 755 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r ([XPointer])792 755 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (, or robust location references, such as those used in)1013 755 MS (Multivalent Documents )150 812 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r ([MVD 1998a])630 812 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (, )917 812 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r ([Phelps and Wilensky, 1998])941 812 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. In each case, it is possible for the)1531 812 MS (resource reference to be successfully dereferenced, but for the sub-resource not to be found. For example,)150 868 MS (most web clients, when given a URL of the form http://...#name, will simply ignore the absence of an)150 924 MS (anchor named "name". Failure to resolve sub-resource references might instead be interpreted as an)150 980 MS (indication to perform signature checking. )150 1037 MS [57.914 0 0 -57.914 0 0]/Times-Bold MF (Other Applications of Lexical Signatures)150 1159 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (Simple signatures along the lines of the ones we suggest for robust hyperlinks may have other applications)150 1272 MS (as well. In particular, we speculate that they may have some utility for detecting duplicate or plagiarized)150 1328 MS (documents. Investigating such applications is a topic for future research. 
)150 1384 MS [57.914 0 0 -57.914 0 0]/Times-Bold MF (Relation to Other Work)150 1506 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (The primary difference between robust hyperlinks and other approaches is essentially a practical one.)150 1619 MS (Namely, one does not require administrative buy-in, the creation of infrastructure, or agreement on)150 1675 MS (conventions for robust hyperlinks to work. In addition, the storage, computational and communication)150 1731 MS (requirements are modest. )150 1787 MS (An alternative approach to robust hyperlinks might be feasible if systems like Alexa's archive of the web)154 1899 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r ([Alexa])150 1956 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( can be made relatively complete. That is, if one can always find an old reference in an archive, one)301 1956 MS (can use it as a query for the current version of the resource. The feasibility of this approach is a function of)150 2012 MS (the completeness of a web archive \(which, in effect, allows one to compute signatures retroactively\). )150 2068 MS [57.914 0 0 -57.914 0 0]/Times-Bold MF (Implementation)150 2190 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (We have provided support for robust URLs in the Multivalent Document System )150 2305 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r ([Phelps, 1998])1756 2305 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (, )2049 2305 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r ([Phelps)2073 2305 MS (and Wilensky, 1998])150 2362 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. In particular, the system will compute the signature of a document on request, and if)575 2362 MS (performance measurements are desired, submit signatures to multiple search engines. URLs saved as)150 2418 MS (bookmarks are currently automatically made robust. )150 2474 MS (Under development are two tools to bring most of the advantages of robust hyperlinks to users of standard)154 2586 MS (browsers. 
The first is a module for the Apache web server that makes robust all web traffic that passes through)150 2642 MS (it, as described above. The second piece of software transparently updates the URLs in a web site to make)150 2698 MS (them robust, by crawling the site and rewriting HREFs. Robust hyperlinks are encoded by appending the)150 2754 MS (signature to the URL in the form of CGI arguments, as described above. )150 2810 MS [57.914 0 0 -57.914 0 0]/Times-Bold MF (Conclusion)150 2932 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (10)1196 3186 MS showpage %%Page: 11 11 15 780 translate 72 300 div dup neg scale 0 0 transform .25 add round .25 sub exch .25 add round .25 sub exch itransform translate n 2128 2840 150 150 B 1 g f [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (We believe our initial findings indicate that robust hyperlinks are a viable solution to a large part of the)150 194 MS (problem of dangling pointers. Namely, we can, immediately, at small cost, and fully automatically, make)150 250 MS (links that will enable us to find textual documents with high probability when the resource has been both)150 306 MS (moved and modified. )150 362 MS (The approach embodied in robust hyperlinks is an example of the web being able to bootstrap new features)154 474 MS (upon those previously developed. Perhaps many other such additional capabilities will be possible.)150 530 MS [57.914 0 0 -57.914 0 0]/Times-Bold MF (Acknowledgements)150 708 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (This research was supported by the Digital Libraries Initiative, under grant NSF CA98-17353. )154 821 MS [57.914 0 0 -57.914 0 0]/Times-Bold MF (References)150 943 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF (\(In the on-line version of this paper, the hyperlinks in the references below are robust URLs.\) )154 1057 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Macskassy and Shklar, 1997])154 1170 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Sofus Macskassy and Leon Shklar. 
)779 1170 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Maintaining information resources)1486 1170 MS [48.957 0 0 -48.957 0 0]/Times-Italic MF 0 g (.)2214 1170 MS gs n 1909 55 150 1183 CB (Proceedings of the Third International Workshop on Next Generation Information Technologies)150 1226 MS gr [48.957 0 0 -48.957 0 0]/Times-Roman MF (\(NGITS'97\), June 30-July 3, 1997, Neve Ilan, Israel. )150 1282 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Alexa])154 1395 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )305 1395 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Alexa)317 1395 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. \(http://www.alexa.com/\) )436 1395 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Carson et al., 1999])154 1508 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Chad Carson, Serge Belongie, Hayit Greenspan, and Jitendra Malik)573 1508 MS (. )1910 1508 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Blobworld:)1934 1508 MS (Image segmentation using Expectation-Maximization and its application to image querying)150 1565 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (, February)2050 1565 MS (4, 1999. \(http://elib.cs.berkeley.edu/~carson/papers/pami.html\) )150 1621 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([CNRP])154 1734 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )321 1734 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Common Name Resolution Protocol)333 1734 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (, 18-Oct-99.)1083 1734 MS (\(http://www.ietf.org/html.charters/cnrp-charter.html\) )150 1790 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Francis et al., 1995])154 1903 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Paul Francis, Takashi Kambayashi, Shin-ya Sato and Susumu Shimizu. )579 1903 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Ingrid: A)1996 1903 MS (Self-Configuring Information Navigation Infrastructure)150 1960 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. 
December 11-14, 1995.)1313 1960 MS (\(http://www.ingrid.org/francis/www4/Overview.html\) )150 2017 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Kahn and Wilensky, 1995])154 2130 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Robert Kahn and Robert Wilensky)723 2130 MS (. )1416 2130 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (A Framework for Distributed Digital)1440 2130 MS (Object Services)150 2187 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. cnri.dlib/tn95-01, May 13, 1995. )475 2187 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Mind-it])154 2300 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )346 2300 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Mind-it)358 2300 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. \(http://www.netmind.com/html/individual.html\) )518 2300 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Phelps, 1998])154 2413 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Thomas A. Phelps. )446 2413 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Multivalent Documents: Anytime, Anywhere, Any Type, Every Way)844 2413 MS (User-Improvable Digital Documents and Systems.)150 2470 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( Ph.D. Dissertation, University of California, Berkeley.)1189 2470 MS (UC Berkeley Division of Computer Science Technical Report No. UCB/CSD-98-1026, December 1998.)150 2526 MS (Also see the )150 2583 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (general)402 2583 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( and )556 2583 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (technical)651 2583 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g ( home pages. )837 2583 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Wilensky and Phelps, 1998])154 2696 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( Robert Wilensky and Thomas A. Phelps. )756 2696 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Multivalent Documents: A New)1594 2696 MS (Model for Digital Documents)150 2753 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. 
UC Berkeley Division of Computer Science Technical Report,)756 2753 MS (CSD-98-999, March 13, 1998. )150 2809 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([OCLC])154 2923 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )326 2923 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (OCLC PURL Service)339 2923 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. \(http://www.purl.org\) )790 2923 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (11)1196 3186 MS showpage %%Page: 12 12 15 780 translate 72 300 div dup neg scale 0 0 transform .25 add round .25 sub exch .25 add round .25 sub exch itransform translate n 2128 2894 150 150 B 1 g f [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 g ([Netcraft 1999])154 195 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )471 195 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Netcraft Web Server Survey)483 195 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. December 1999 \(http://www.netcraft.com/survey/\). )1074 195 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Sollins and Masinter, 1994])154 308 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( K. Sollins and L. Masinter. )739 308 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Functional Requirements for Uniform)1296 308 MS (Resource Names)150 365 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (, Network Working Group Request for Comments 1737, December 1994. )496 365 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([Ingham et al., 1996])154 478 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( David Ingham, Steve Caughey, and Mark Little. )584 478 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (Fixing the "Broken-Link" Problem:)1466 478 MS (The W3Objects Approach)150 535 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. Computer Networks and ISDN Systems, Vol. 28, No. 7-11, pp. 1255-1268:)697 535 MS (Proceedings of the Fifth International World Wide Web Conference, Paris, France, 6-10 May 1996. 
)150 591 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF ([XPointer])154 704 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF ( )375 704 MS [48.957 0 0 -48.957 0 0]/Times-Bold MF 0 0 1 r (XML Pointer Language \(XPointer\))387 704 MS [48.957 0 0 -48.957 0 0]/Times-Roman MF 0 g (. W3C Working Draft 9 July 1999.)1120 704 MS (\(http://www.w3.org/1999/07/WD-xptr-19990709\) )150 760 MS [36.684 0 0 -36.684 0 0]/Times-Roman MF (12)1196 3186 MS showpage PageSV restore %%Trailer %%DocumentNeededFonts: %%+ Courier %%+ Courier-Bold %%+ Times-Bold %%+ Times-Italic %%+ Times-Roman %%DocumentSuppliedFonts: end %%Pages: 12 %%EOF %-12345X