ABSTRACT
Web data extraction techniques often focus on acquiring information from webpages accurately and efficiently. However, webpage variants frequently cause extraction to fail, resulting in high maintenance costs. Considerable effort has been devoted to robust extraction, but most existing approaches require either complex pre-processing or supplementary files. This paper proposes a novel method that enhances extraction robustness by using the datatype and weight information of path layers. The similarity between the path of the target node in the original webpage and the path of each candidate node in a page variant is calculated, and the candidate with the highest similarity is selected as the most likely match. Experiments on a large set of real-world data show that this method is more robust than existing approaches.