Automatic extraction of facets for user query in text mining [AEFTM] Article

Ramya, RS, Raju, N, Pushpa, CN et al. (2020). Automatic extraction of facets for user query in text mining [AEFTM]. 11(2), 342-350.

cited authors

  • Ramya, RS; Raju, N; Pushpa, CN; Venugopal, KR; Iyengar, SS; Patnaik, LM

abstract

  • A query facet is a group of items that describes the content covered by a user query. Every word in a facet item assigns significance to the facet. A single word is more appropriate than a long sentence if it can convey the complete meaning of the sentence. However, identifying such a word efficiently is a challenging task. Normally, the important information for a query exists in the top retrieved documents in the form of lists. Extracting query facets from the top search results is also a challenging task in text mining. In this work, we propose a framework, Automatic Extraction of Facets in Text Mining [AEFTM] for User Queries, that extracts query facets automatically by grouping the lists based on three categories, namely HTML tags, free text patterns and repeat regions. The grouping G of the lists is based on the domain sites present in the lists. We observe that some of the lists contain noise and information that is irrelevant for extracting the facets. In order to prune these lists, the importance of each item present in the lists from the group G is evaluated and the Cosine Similarity (CS) between two items is calculated. Further, to extract more facets, a High Quality Clustering (HQC) algorithm is proposed to cluster the items that have the largest number of points in each iteration. Finally, the topmost items from each cluster are selected and provided as the best facets for the user query. Experiments are conducted on the User Q and Random Q datasets. It is observed that the proposed method AEFTM outperforms the QD Miner method by removing duplicate items and providing a large number of useful and highly relevant query facets for user-submitted queries.
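The pruning step in the abstract relies on the cosine similarity between two list items. The abstract does not say how items are weighted or what cutoff is used, so the following is only a minimal sketch under stated assumptions: items are compared as bags of lowercase words with raw term counts, and the function names `cosine_similarity` and `prune_near_duplicates` and the 0.8 threshold are hypothetical choices for illustration, not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two items, using raw term counts (an assumption)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product over the terms the two items share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # An empty item has zero norm; treat it as dissimilar to everything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_near_duplicates(items, threshold=0.8):
    """Keep an item only if it is below the threshold against every kept item.

    The threshold value is illustrative, not a value reported in the paper.
    """
    kept = []
    for item in items:
        if all(cosine_similarity(item, k) < threshold for k in kept):
            kept.append(item)
    return kept
```

For example, `prune_near_duplicates(["Apple Watch", "apple watch", "Garmin"])` keeps the first spelling of the duplicated item and the unrelated one. A real system would likely use weighted term vectors rather than raw counts.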

publication date

  • January 1, 2020

start page

  • 342

end page

  • 350

volume

  • 11

issue

  • 2