2011-02-08: An Evaluation of Link Neighborhood Lexical Signatures to Rediscover Missing Web Pages

The final project for my master's degree focused on the problem of “missing” web pages: URIs that return an error when retrieved. When a web page is no longer available at a given URI, it may still be available at a new URI, and this research proposes and demonstrates a new method for finding that new URI.

Prior research has proposed using the lexical signature of a page as a search query to find the same or similar content at a new URI. A lexical signature (LS) is a small set of words that appear in that page much more often than they do in other pages on the Web, and so are thought to describe what the page is about. That LS is then used as a search query, in the hope that the target page appears in the results.

Previously proposed methods for using an LS to find a new URI required either that the page be analyzed before it was lost (ref: P&W) or that cached or archived versions of the page be available for analysis. If the page had not previously been analyzed and