Finding the most similar documents across multiple text databases (Article)

Yu, C, Liu, KL, Wu, W et al. (1999). Finding the most similar documents across multiple text databases. pp. 150-162.

cited authors

  • Yu, C; Liu, KL; Wu, W; Meng, W; Rishe, N

abstract

  • In this paper, we present a methodology for finding the n most similar documents across multiple text databases for any given query and any positive integer n. The methodology consists of two steps. First, the databases are ranked in a certain order. Next, documents are retrieved from the databases according to this order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of all other databases, the methodology guarantees the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies are presented for retrieving documents from the identified databases. Experimental results are given to illustrate the relative performance of the different strategies.
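The two-step idea in the abstract (rank databases, then retrieve from them in that order) can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes each database can report the exact query similarity of every document, ranks databases by the similarity of their single most similar document, and stops once the next database cannot beat the current n-th best similarity. The names rank_databases and retrieve_top_n are hypothetical, and the paper's statistical estimation step is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: each database maps document ids to their
# (assumed already computed) similarity with the query.
Database = Dict[str, float]


def rank_databases(dbs: Dict[str, Database]) -> List[str]:
    """Step 1 (assumed ranking): order databases by the similarity of
    their most similar document, highest first."""
    return sorted(dbs, key=lambda name: max(dbs[name].values(), default=0.0),
                  reverse=True)


def retrieve_top_n(dbs: Dict[str, Database],
                   n: int) -> Tuple[List[Tuple[float, str, str]], List[str]]:
    """Step 2 (one possible strategy): visit databases in ranked order,
    merge each one's local top-n into a global top-n, and stop as soon as
    the next database's best similarity cannot exceed the current n-th best.
    Returns (similarity, database, doc_id) triples and the databases searched."""
    results: List[Tuple[float, str, str]] = []
    searched: List[str] = []
    for name in rank_databases(dbs):
        best = max(dbs[name].values(), default=0.0)
        if len(results) >= n and best <= results[n - 1][0]:
            # Databases are ranked by their best similarity, so no later
            # database can contribute a more similar document either.
            break
        searched.append(name)
        local = sorted(((s, name, d) for d, s in dbs[name].items()),
                       reverse=True)[:n]
        results = sorted(results + local, reverse=True)[:n]
    return results, searched


if __name__ == "__main__":
    toy = {"A": {"a1": 0.9, "a2": 0.2},
           "B": {"b1": 0.5, "b2": 0.4},
           "C": {"c1": 0.1}}
    top, searched = retrieve_top_n(toy, 2)
    print(top)       # [(0.9, 'A', 'a1'), (0.5, 'B', 'b1')]
    print(searched)  # ['A', 'B']
```

With three toy databases and n = 2, only the first two ranked databases are searched; database C is skipped because its best similarity (0.1) is below the current second-best similarity (0.5). Ranking by exact maximum similarity is the idealized case in which the abstract's guarantee holds; a real system would have to estimate these values.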

publication date

  • January 1, 1999

start page

  • 150

end page

  • 162