Abstract
As Internet use grows, people rely on it for a variety of purposes, such as online shopping. A user's browsing history, which includes the pages visited and the time spent on each page, is stored in log files and can be used to gather useful insights. Web mining is one technique that is applied to these log files to mine navigational patterns. In this paper, we propose an algorithm called FEDUS (Field Extraction, Data Cleaning, User and Session Identification) for pre-processing these log files. To the best of our knowledge, no existing algorithm covers all of these pre-processing stages comprehensively. We compare the proposed algorithm with existing approaches, and the results indicate a 77% reduction in the overall size of the log file after pre-processing. The proposed algorithm also identified more sessions than the existing techniques.
Keywords: