Research Article

File hyper-searching explained

Received 18 Dec 2023, Accepted 09 Jul 2024, Published online: 04 Aug 2024

ABSTRACT

There are two main ways to retrieve files: hierarchical navigation and query-based search. File retrieval studies have consistently found a preference for navigation, with search used only as a last resort when users forget in which folder they stored the target file. However, a minority of people, referred to as hyper-searchers, perform far more searches than others. This study aimed to discover why hyper-searchers search far more than the rest of the population. To do so, a group of hyper-searchers (n = 50) and a control group (n = 50) were assigned using a double-check allocation method which included both self-estimation and a retrieval task. On average, the search percentage in the retrieval task for the hyper-searchers (67%) was over 13 times higher than for the control group (5%). The study gives ample evidence that hyper-searchers’ files were less organized than those of the control group. As a result, their average navigation failure percentage (23%) was almost 4 times higher than that of the control group (6%), and hyper-searchers needed to resort to search. Our results suggest that hyper-searchers’ files were less well-organized because they scored lower on the conscientiousness trait of the Big Five personality questionnaire; in turn, having less organized files forced them to search more.

1. Introduction

File retrieval is a basic activity performed by billions of computer and smartphone users several times a day. The two main methods used for file retrieval are navigation and search. When navigating, users manually traverse their folder hierarchy until they reach the folder in which the target item is stored. When using search, users first generate a query specifying an attribute of the target item, and once the search engine returns a set of results, they select the relevant item from the query list (Bergman et al., 2008). File retrieval studies consistently found a preference for navigation over search (Barreau & Nardi, 1995; Bergman & Yanai, 2018; Bergman et al., 2014; Bergman, Israeli, et al., 2019; Boardman & Sasse, 2004; Jones et al., 2014; Kirk et al., 2006; Ravasio et al., 2004; Song & Ling, 2011; Teevan et al., 2004). Furthermore, participants were found to search only as a last resort, in the minority of cases in which they forgot in which folder they stored the target file (Bergman et al., 2008; Fitchett & Cockburn, 2015). However, while this holds true for the large majority of participants, a novel analysis of published papers (Bergman et al., 2008, 2014; Bergman, Israeli, et al., 2019) indicates that a minority of people tend to perform many more searches compared to others (Bergman, Israeli, et al., 2021). We termed these people “hyper-searchers.”

The study’s research aim was to discover why hyper-searchers search much more than the rest of the population. To do so, we assigned a group of hyper-searchers (n = 50) and a control group (n = 50) using a double-check allocation method: We first asked a large group of potential participants (n = 918) to estimate their percentage of searches of all their file retrievals and allocated them to groups based on their estimated search percentage. The observed search percentage was measured using the Elicited Personal Information Retrieval (EPIR) research technique, in which participants retrieve files requested by the tester using any retrieval method they prefer. We excluded participants when their observed search percentage differed from the group criterion. The double-check group allocation procedure increases reliability and allows for recruiting a larger group of hyper-searchers.

Next, we tested seven hypotheses addressing our research question, namely that hyper-searchers search more than the control group because: (H1) they search for their files better than the control group; (H2) they are worse at navigating to their files than the control group; (H3) their file folders are less organized than those of the control group; (H4) they have better verbal memory than the control group; (H5) their spatial memory is less efficient than that of the control group; (H6) they score lower on the conscientiousness trait of the Big Five personality questionnaire; (H7) they have higher psychometric scores, more years of education, or higher computer literacy than the control group.

2. Literature review

Personal Information Management (PIM) is an activity in which individuals store and retrieve their personal information items. Such information items include files, Web favorites, e-mails, notes and contacts (Bergman & Whittaker, 2016; Jones & Teevan, 2007). Early PIM research tended to be exploratory and qualitative in nature, relying on observational studies and interviews (Kelly & Teevan, 2007). While such methods were important for identifying basic PIM phenomena, as this research field matured there has been a growing need for more rigorous quantitative approaches that test statistical relations between variables (Bergman, 2013). The PIM subfield of file research includes studies on file retrieval (e.g., Bergman et al., 2010, 2014, 2020a), file organization (e.g., Dinneen et al., 2019; Gonçalves & Jorge, 2003; Henderson & Srinivasan, 2009), file sharing (e.g., Adavi & Acker, 2023; Rader, 2009) and file-related design (Bergman et al., 2009; Bergman, Whittaker, et al., 2019; Fitchett et al., 2014). This subfield is a rapidly growing area of research, as evidenced by a relatively recent literature review paper that identified over 230 publications on file research (Dinneen & Julien, 2020).

2.1. Search percentages

Retrieval is the main reason people manage their files (Jones, 2007). It is essential for retrieval to be both successful and efficient, since information cannot be used unless it can be re-accessed (Bergman & Whittaker, 2016). There are two main ways by which people retrieve their files – navigation and search. Navigation is a folder-related process by which users manually traverse their folder hierarchy until they reach the folder in which the target item is stored, and then scan for the target file within that folder. In contrast, search is a process in which users first generate a query specifying some attribute of the target item (e.g. its name or part of it, text it contains, its format, etc.). The search engine then returns a set of results from which the user selects the relevant item (Bergman et al., 2008).

There appear to be clear advantages for search over navigation. Search is more flexible because unlike navigation, it does not require users to remember an exact folder location; instead, users can specify any attribute they happen to remember about the document in their query (Lansdale, 1988). Search may also be perceived as more efficient, as users can retrieve information by generating a single query, instead of using multiple operations to navigate through their folder hierarchy. Most importantly, unlike navigation, search does not require users to engage in complex organizational strategies that anticipate their future retrieval requirements, and it is therefore claimed to address the file management problem (Cutrell et al., 2006; Dourish et al., 2000; Lansdale, 1988; Russell & Lawrence, 2007). These arguments against navigation are supported by historical developments in web retrieval, in which the use of complex navigational systems, such as Yahoo web directories, to access web data has largely vanished and has been replaced by search (Kobayashi & Takeda, 2000; Obendorf et al., 2007). These intuitive arguments have led many PIM researchers to propose that search engines should replace folders (Cutrell et al., 2006; Dourish et al., 2000; Fertig et al., 1996; Lansdale, 1988; Raskin, 2000; Russell & Lawrence, 2007).

Contrary to these claimed advantages, empirical research repeatedly shows a preference for navigation over search (Barreau & Nardi, 1995; Bergman & Yanai, 2018; Bergman et al., 2014; Bergman, Israeli, et al., 2019; Boardman & Sasse, 2004; Kirk et al., 2006; Teevan et al., 2004). Moreover, since 2005, search engines have used a constantly updated index system, which enabled more advanced search engines to run a thousand times faster than old ones on the same computers (Farina, 2005; Lowe, 2006) and consequently to improve their interfaces. However, despite these radical improvements, a multi-method research project which combined a longitudinal study of users of PC search engines and a large-scale study of users of Mac search engines found that users of such improved search engines showed no increase in search engine usage compared to those who used older ones (Bergman et al., 2008). Instead, these studies confirmed that search was utilized primarily as a last resort for a small minority of retrievals (an average search percentage of 4%-15% depending on the study), when users could not recall the folder where they had stored their document. The same research also revealed no evidence that the use of improved search engines led people to change their filing habits such that they became less reliant on folders. A later logfile study that tested a newer search engine validated these findings, confirming that young Mac users used searches in just 4.5% of their document retrievals (Fitchett & Cockburn, 2015). These findings are not limited to older users who supposedly adhere to their folder-navigation habits. Quite the opposite: Bergman, Israeli, et al. (2019) found that age is positively correlated with search percentage, with participants over 50 searching four times more than participants in their twenties. A possible reason for this is that older people find it harder to remember where they saved their files and are therefore forced to rely more heavily on search-based retrieval.

Why do users prefer navigation over search? One possible explanation is that search requires users to specify more of their information needs, while navigating through folders provides, at each step, additional context to guide the navigation (Teevan et al., 2004). Another explanation is that the hierarchical method is boringly consistent – the files are stored in a folder and stay there until otherwise determined by the users, who can expect to find them in the same place for retrieval. The process of navigation may require more steps than search, but users consistently use those same steps. In contrast, the flexibility of search may compromise consistency, as users are able to retrieve the same file using a different search term (Bergman et al., 2008). More recently, a combination of cognitive (Bergman et al., 2013) and neuroimaging (Benn, Bergman, et al., 2015) studies indicates that folder navigation preference has deep neuro-cognitive roots: when navigating to their files, users use the same brain structures that they employ when navigating in the physical world. These same brain areas are also used for navigation by animals such as monkeys, rodents and birds and do not require verbal attention. In contrast, search recruits Broca’s area, commonly associated with linguistic processing (Conner et al., 2019). The need for linguistic resources for searching is hardly surprising, given that word retrieval has always been considered a core linguistic process, involving the identification and retrieval of a word from the vast store of words in our long-term memory (Riès et al., 2016). As such, given that most tasks we perform on computers require the use of linguistic resources, searching may cause an interruption to workflow, while navigation enables a person to continue to focus on their current work.

2.2. Hyper-searchers

While preference for navigation over search was repeatedly observed among a large majority of participants, reviewing data from previously published papers suggests researchers may have overlooked a minority of people who tend to perform many more searches compared to others (Bergman, Israeli, et al., 2021). We term these individuals “hyper-searchers.” We defined hyper-searchers as participants whose search percentage was over one standard deviation above the sample mean. The re-analyzed large-scale studies include Bergman et al. (2008), in which participants self-reported their search percentage, and Bergman et al. (2014) and Bergman, Israeli, et al. (2019), in which participants’ retrieval behavior was observed. The re-analyzed results are consistent across studies in: (a) the cutting point between the hyper-searcher group and the other participants (30%-33% searches of all retrievals); (b) the percentage of hyper-searchers among all participants (8%-13%); and (c) the search percentage among hyper-searchers (51%-64%) and other participants (1%-7%). Also, note that in all three studies, the hyper-searchers conducted searches over five times more than the other participants on average. The accumulation of these results indicates that there is a group of hyper-searchers who search for their files significantly and substantially more than the rest of the population.

We (Bergman, Israeli, et al., 2021) conducted a preliminary pilot study regarding hyper-searching with 65 participants. Search percentages were estimated using a questionnaire. The average search percentage was 15% with a standard deviation of 16%; therefore, the cutting point between the hyper-searchers and the control group was 31%. Out of all the participants, seven (11%) met this criterion. The average search percentage for the seven hyper-searchers was 51% (SD = 14%), which was more than five times that of the other 58 participants (M = 10%, SD = 9%). These results confirmed the findings of the re-analyzed data from our previous studies. Our pilot’s preliminary results indicate that hyper-searchers had better verbal memory than the control group. Following Bergman, Israeli, et al. (2021), our aim in the current study was to discover why hyper-searchers search much more than the rest of the population.
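The cutting-point arithmetic of the pilot can be illustrated with a minimal sketch. The function names are hypothetical, and two choices are assumptions rather than details reported in the text: the use of the sample standard deviation when working from raw data, and treating a value exactly at the cutting point as meeting the criterion.

```python
from statistics import mean, stdev


def cutoff_from_summary(mean_pct: float, sd_pct: float) -> float:
    """Cutting point defined as one standard deviation above the mean."""
    return mean_pct + sd_pct


def cutoff_from_sample(search_pcts: list[float]) -> float:
    """Same cutting point computed from raw search percentages.

    Assumption: the sample standard deviation is used; the text does not
    say whether a sample or population formula was applied.
    """
    return mean(search_pcts) + stdev(search_pcts)


def is_hyper_searcher(search_pct: float, cutoff: float) -> bool:
    """Assumption: a value exactly at the cutting point meets the criterion."""
    return search_pct >= cutoff


# Pilot summary statistics reported above: M = 15%, SD = 16%.
pilot_cutoff = cutoff_from_summary(15, 16)

# Ratio of the two group means reported for the pilot (51% vs. 10%).
pilot_ratio = 51 / 10
```

With the pilot values, `pilot_cutoff` evaluates to 31 and `pilot_ratio` to 5.1, matching the reported cutting point and the "more than five times" comparison.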

3. Research question and hypotheses

 

RQ: Why do hyper-searchers search much more than the rest of the population?

Instead of testing a single hypothesis, we tested all the reasonable hypotheses we could think of.

 

Hypothesis 1: Hyper-searchers search for their files better than the control group.

We used two dependent variables to measure retrieval quality: failure percentage and retrieval time. In operational terms, we hypothesized that the search failure percentage of hyper-searchers is lower than that of the control group and that their search retrieval time is shorter than that of the control group. Taking into consideration that H1 may be confirmed, we asked hyper-searchers about their file search technique in a semi-structured interview in order to learn from them and transfer that knowledge. Additionally, if confirmed, H1 may possibly be explained by hypotheses such as H4 and H7 below (i.e. the dependent variables in these later hypotheses could be found to be secondary mediating variables in this connection).

 

Hypothesis 2: Hyper-searchers are worse at navigating to their files than the control group.

In operational terms, we hypothesized that the navigation failure percentage of hyper-searchers is higher than that of the control group and that their navigation retrieval time is longer than that of the control group. If confirmed, H2 may possibly be explained by the secondary mediating variables of hypotheses such as H3 and H5 below.

 

Hypothesis 3: Hyper-searchers search more because their file folders are less organized, making navigation more difficult to perform.

This hypothesis is closely related to H2: if hyper-searchers are worse at navigating to their files than the control group, then this could possibly be because their file folders are less well-organized (i.e. file organization could serve as a mediating variable). In operational terms, we hypothesized that: (a) Hyper-searchers use a flatter folder structure than the control group. (b) Hyper-searchers tend to place their files in uncategorized folders (e.g., Downloads) more than the control group. (c) Hyper-searchers’ folders contain more files than the control group’s folders. (d) Hyper-searchers’ self-estimation of their file organization is lower than that of the control group. (e) Hyper-searchers’ self-estimated memory of their file locations is lower than that of the control group. (f) In the interviews, hyper-searchers spontaneously talk about being messy more than the control group.

 

Hypothesis 4: Hyper-searchers have better verbal memory than the control group.

This hypothesis is related to H1: if hyper-searchers are better searchers than others, then one possible reason for this is that their verbal memory is superior, allowing them to recall search terms better than other participants. Search requires the user to think of unique words within a file in order to avoid a long list of irrelevant files with similar names. Identifying such a unique term was previously shown to be difficult for users in the context of online search (Benn, Webb, et al., 2015). Our pilot study (Bergman, Israeli, et al., 2021) provided preliminary evidence for this superior verbal memory hypothesis.

 

Hypothesis 5: Hyper-searchers’ spatial memory is less efficient than for the control group.

This hypothesis is related to H2: if hyper-searchers are worse at navigating to their files than the control group, then spatial memory could possibly serve as a mediating variable. We hypothesized that hyper-searchers have significantly lower visuospatial memory scores than other participants. Consequently, they forget where they located their files and are therefore unable to navigate to them, resorting to search instead. File navigation was found to use the same brain structures as real-world physical navigation (Benn, Bergman, et al., 2015). However, previous studies did not find a connection between spatial abilities and file organization (Dinneen & Frissen, 2020) or a difference between hyper-searchers and other participants regarding their spatial memory (Bergman, Israeli, et al., 2021).

 

Hypothesis 6: Hyper-searchers search more than others because they score lower on the conscientiousness trait of the Big Five personality questionnaire.

The tendency to search is consistent for files and e-mails and for computers and mobile phones, and can therefore be considered a personal trait (Bergman & Yanai, 2018). Hyper-searchers are very different in this specific trait, but this could be related to a deeper and more general personality trait. Massey et al. (2014) found a positive correlation between conscientiousness (one of the Big Five personality traits) and the level of file organization. This is not surprising, because conscientious people tend to be efficient and organized as opposed to easy-going and disorderly (Thompson, 2008). On the assumption that hyper-searchers are less organized than the rest of the population (H3), we hypothesized that their score on the conscientiousness personality trait would be significantly lower than for the control group.

Hypothesis 7: Hyper-searchers search more than others because they have higher psychometric scores, are more educated, and/or have higher computer literacy than the control group.

This hypothesis is related to H1: having higher psychometric scores, years of education, and/or higher computer literacy could explain why hyper-searchers search better than the control group, on the assumption that H1 is confirmed in the experiment.

4. Research method

To increase measurement reliability, while enabling us to test a large sample of hyper-searchers (who make up only 8%-13% of the population; Bergman, Israeli, et al., 2021), we used a combination of two methods to measure search percentage: estimation and observation.

Search percentage estimation. We could not simply ask our participants to estimate their percentage of searches from among all their file retrievals, since the question itself would focus their attention on the search option at the expense of all other retrieval options, and could result in an overestimation bias (Tversky & Kahneman, 1990). We therefore asked our participants to estimate the percentage of all file retrieval options, including: navigation, search, desktop shortcuts, recent document list (either in the “Start” menu or from the applications in the “File” menu), or any other option they specify. The estimated percentages of all options (including “others”) were required to amount to 100%. Before using this estimation method, Bergman et al. (2008) validated the estimated search percentage by comparing it to the participants’ actual search percentages in two independent preliminary studies. Results indicated that the correlation between the users’ estimations of their search percentage and actual search behavior was extremely high (r = 0.94) and therefore the estimations were deemed accurate and valid. However, to increase reliability and validity, the participants’ retrieval strategies were also assessed based on observation.
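The requirement that the estimates across all retrieval options amount to 100% can be expressed as a simple validation step. The sketch below is illustrative only: the option keys and the function name are hypothetical, and the actual questionnaire format is not described in the text.

```python
# Hypothetical keys for the retrieval options listed in the questionnaire.
RETRIEVAL_OPTIONS = (
    "navigation",
    "search",
    "desktop_shortcuts",
    "recent_documents",
    "other",
)


def estimated_search_percentage(estimates: dict[str, float]) -> float:
    """Check that all options are present and sum to 100, then return
    the participant's estimated search percentage."""
    missing = [opt for opt in RETRIEVAL_OPTIONS if opt not in estimates]
    if missing:
        raise ValueError(f"missing estimates for: {missing}")
    total = sum(estimates[opt] for opt in RETRIEVAL_OPTIONS)
    if abs(total - 100) > 1e-9:
        raise ValueError(f"estimates sum to {total}, expected 100")
    return estimates["search"]


# Made-up example response, not data from the study.
example = {
    "navigation": 50,
    "search": 35,
    "desktop_shortcuts": 5,
    "recent_documents": 10,
    "other": 0,
}
example_search_pct = estimated_search_percentage(example)
```

Asking for every option, rather than for search alone, is what keeps the search estimate anchored to the alternatives.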

Search percentage observation. Following previous studies (Benn, Bergman, et al., 2015; Bergman & Yanai, 2018; Bergman et al., 2010, 2014; Bergman, Israeli, et al., 2019, 2020a, 2020b; Elsweiler & Ruthven, 2007; Elsweiler et al., 2011), we used the Elicited Personal Information Retrieval (EPIR) technique to measure the search percentage by observing retrievals. Participants retrieved the target files, using any retrieval method they preferred. To increase ecological validity, participants retrieved their own files from their own computers. Both file name selection and presentation of the retrieval task were handled by purpose-built EPIR software (described in the “Tools” section).

Each measuring method has its advantages and disadvantages. While estimation may be less accurate than observation, EPIR observation relates to a small number of retrievals in which the file name is given to the participants (reducing the ecological validity and sensitivity of the results). We therefore used observation to validate the participants’ estimations (see “Participants” section below). After this validation (which relates to the EPIR search percentage data), we analyzed other EPIR data (e.g. failure percentage and retrieval time) in the Results section.

4.1. Participants

Participants were selected for the hyper-searchers group (n = 50) and for the control group (n = 50) using a double-check group allocation method. We first asked a large group of potential participants (N = 918) to estimate their search percentage out of all file retrievals. Hyper-searchers are defined as people whose search percentage is 30% or above (Bergman, Israeli, et al., 2021), because 30% was found to be a standard deviation above the mean for estimated search percentages in Bergman et al. (2008). We therefore invited all participants whose estimated search percentage was 30% or above to participate in the hyper-searchers group, but excluded them if their observed search percentage in the EPIR task was below this critical point. The same method was applied for selecting the control group. We randomly selected participants for inclusion in the control group from among the participants whose estimated search percentage was under 30%, but excluded them if their observed search percentage in the EPIR task was 30% or more, and replaced them with other randomly selected participants from the group below the cutting point.
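The double-check allocation rule described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the authors' procedure as implemented: the function name and return labels are hypothetical, and the random sampling of control candidates is left out.

```python
from typing import Optional

CUTOFF_PCT = 30.0  # cutting point stated in the text


def allocate(estimated_pct: float, observed_pct: float,
             cutoff: float = CUTOFF_PCT) -> Optional[str]:
    """Return the group a participant stays in, or None if excluded.

    A participant is kept only when the self-estimated and the observed
    (EPIR) search percentages fall on the same side of the cutting point.
    """
    estimated_hyper = estimated_pct >= cutoff
    observed_hyper = observed_pct >= cutoff
    if estimated_hyper and observed_hyper:
        return "hyper-searcher"
    if not estimated_hyper and not observed_hyper:
        return "control"
    return None  # estimation and observation disagree: excluded


# Made-up participants: (estimated %, observed %).
examples = {
    "A": allocate(60, 75),   # both at or above 30
    "B": allocate(10, 0),    # both below 30
    "C": allocate(40, 20),   # estimated high, observed low
    "D": allocate(5, 50),    # estimated low, observed high
}
```

Requiring agreement between the two measures trades sample size for a cleaner separation between the groups, which is why a large pool of potential participants was needed.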

The testers recruited the potential participants by using personal connections and a Facebook post (nonrandom selection). Of our 918 potential participants, 770 left their contact details and were qualified to participate in the study (Mac users could not participate as our EPIR software runs only on PCs using the Windows operating system). Of these, 89 (12%) complied with the hyper-searchers estimation criterion, and 80 of them were invited to participate in the study. We excluded 25 participants (31%) who did not meet the observed search percentage criterion, and 5 participants (6%) were excluded because of technical problems with the EPIR software. For the control group we randomly selected 82 participants. We then excluded 22 of them (27%) for being above the observed retrievals criterion, and 10 participants (12%) because of technical problems with the EPIR software.
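The recruitment figures in this paragraph can be checked with a few lines of arithmetic. The variable names are hypothetical; the counts are those reported above, and percentages are rounded to the nearest whole number as in the text.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


qualified = 770
met_estimation_criterion = 89

hyper_invited = 80
hyper_failed_observation = 25
hyper_technical_problems = 5

control_selected = 82
control_failed_observation = 22
control_technical_problems = 10

# Participants remaining in each group after exclusions.
hyper_final = hyper_invited - hyper_failed_observation - hyper_technical_problems
control_final = (control_selected - control_failed_observation
                 - control_technical_problems)

summary = {
    "estimated hyper-searchers (%)": pct(met_estimation_criterion, qualified),
    "hyper excluded, observation (%)": pct(hyper_failed_observation, hyper_invited),
    "hyper excluded, technical (%)": pct(hyper_technical_problems, hyper_invited),
    "control excluded, observation (%)": pct(control_failed_observation, control_selected),
    "control excluded, technical (%)": pct(control_technical_problems, control_selected),
}
```

Both `hyper_final` and `control_final` come out at 50, and the rounded percentages are 12, 31, 6, 27 and 12, in line with the reported values.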

This left us with 50 participants in the hyper-searchers group and 50 participants in the control group, as planned. The double-check group allocation procedure increased the reliability of the grouping categorization, and allowed us to recruit a large enough group of hyper-searchers (which consisted of only 12% of the potential participants) in order to provide enough power for meaningful statistical analysis when comparing the two groups of participants. The allocation to the hyper-searchers and the control groups is detailed toward the end of the Research Method section.

Gender. The hyper-searchers group contained 31 participants who identified as women and the control group 28. A Mann-Whitney test did not indicate a significant difference between the proportion of women in the two groups, Z = 0.61, p = .54.

Age. The average age for the hyper-searchers group was 25.70 (SD = 6.57) compared to 24.72 for the control group (SD = 3.46). An independent sample t-test showed no significant age difference between the two groups, t(98) = 0.93, p = .35.

Occupation. Of the hyper-searchers group, 28 (56%) were students, 4 (8%) were working, 15 (30%) were working students, and 3 participants (6%) neither studied nor worked at the time of the experiment. As for the control group, 30 participants (60%) were students, 2 (4%) were only working and 18 participants (36%) were working students.

4.2. Tools

EPIR Software. Our EPIR software was designed to collect file names and present them as targets for the retrieval task. The software was an improved version of the one used in Bergman, Israeli, et al. (2019, 2020a, 2020b). To ensure that the participants were familiar with the files and had used them, the software searched for files in the “Recent Items” folder. The “Recent Items” folder accumulates links to all files that the user accesses. Despite its name, it contains not only links to recently retrieved files but also links to files retrieved many months ago. The retrieval list was presented to users from older files to more recently accessed ones, to avoid ceiling effects that could occur if participants retrieved only very recently accessed files. Files from the same path were excluded from the retrieval list because Bergman et al. (2014) showed that path duplication biases results by priming retrieval for recently accessed folders. The files collected were MS Office document files. The EPIR software performed the following steps: (a) Accessed the participants’ “Recent Items” folder. (b) Excluded files that are in the same folder as other files. (c) Presented the target file names list to the tester to allow him/her to exclude inappropriate files (e.g. file names that might embarrass the participants such as “divorce agreement”). (d) Sorted the list from older files to more recently accessed ones to avoid ceiling effects upon retrieval. The software also documented the path of the target files for later analysis.
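Steps (b) to (d) of the target list preparation can be sketched as follows. This is not the EPIR software itself: the `RecentFile` record, the function name and the extension set are hypothetical, step (a) is replaced by a list passed in by the caller, and step (b) is read literally, so that every file sharing a folder with another candidate is dropped (the text leaves open whether one file per folder was kept).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class RecentFile:
    path: str                # full Windows path of the target file
    last_accessed: datetime  # when the file was last accessed


# Assumed set of MS Office document extensions.
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def build_target_list(recent: list[RecentFile],
                      excluded_by_tester: frozenset[str] = frozenset()
                      ) -> list[RecentFile]:
    # Keep MS Office documents only.
    office = [f for f in recent
              if PureWindowsPath(f.path).suffix.lower() in OFFICE_EXTENSIONS]

    # (b) Drop files that share a folder with another candidate
    #     (assumption: all files of a duplicated folder are dropped).
    folder_counts = Counter(PureWindowsPath(f.path).parent for f in office)
    unique = [f for f in office
              if folder_counts[PureWindowsPath(f.path).parent] == 1]

    # (c) Drop files the tester marked as inappropriate.
    kept = [f for f in unique if f.path not in excluded_by_tester]

    # (d) Sort from older to more recently accessed files.
    return sorted(kept, key=lambda f: f.last_accessed)
```

A call such as `build_target_list(recent, excluded_by_tester=frozenset({r"C:\Private\divorce agreement.docx"}))` would return the remaining candidates ordered oldest first; the path strings are kept alongside each entry so they remain available for later analysis.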

Verbal Memory Task. To examine the participants’ verbal memory skills, we used the Logical Memory test from the Wechsler Memory Scale – Revised (WMS-R) (Wechsler, 1945). This is a test of verbal declarative memory, in which a story is read to the participant, after which the participant is asked to recall details from memory at two different intervals: immediately after (immediate recall task), and 30 minutes later (delayed recall task). Scores are given on the accuracy of the retelling of the story and for the use of exact and accurate words and terms.

Santa Barbara Sense of Direction (SBSOD) Questionnaire. The SBSOD is a self-report measure of general spatial orientation at the environmental scale of space consisting of 15 questions about navigational orientation skills or preferences, which participants respond to on a 1–7 Likert scale (Carbonell-Carrera et al., 2020; Hegarty et al., 2002). SBSOD was previously used for PIM research in Dinneen and Frissen (2020).

Online Card Memory Game. We used an online memory game to examine visuospatial memory (see Footnote 1). The game presents picture cards in random order. As in the classic offline memory game, cards are presented face down, so the participants are unable to see them. Using a computer mouse, participants click on a card to reveal its content. In each step, participants turn over two cards, which disappear if their pictures are identical (a pair), or turn back over if their pictures do not match. The goal of the game is to find all the pairs in as few steps as possible. To succeed, participants need to use their visual memory (to remember the pictures on the cards) and their spatial memory (to remember where each card was located). Therefore, the number of steps taken to complete the game measures their visuospatial memory on a reversed scale: the fewer steps it takes them to finish the game, the better their visuospatial memory. In our pilot research, we found that 15 pairs were an optimal level of difficulty for preventing both floor and ceiling effects.

Big Five Personality Traits Questionnaire. We started administering the Big Five personality traits questionnaire only after conducting the study with 31 participants, but asked these participants to answer the questionnaire later on. Eventually, 44 participants of the hyper-searchers group and 45 control group participants answered the questionnaire.

Psychometric Entrance Test (PET). The PET is a standardized test that serves as an entrance exam for institutions of higher education in Israel. The PET covers three areas: quantitative reasoning, verbal reasoning and English language. It is administered by the National Institute for Testing and Evaluation (NITE) and is of considerable importance in the admissions process. The test enables the institutions to rank all applicants on a standardized assessment scale, and, compared with other assessment tools, is less affected by each applicant’s background or other subjective factors (see Footnote 2). We asked the participants (who were mostly students) to report their PET scores in our questionnaire. Of our participants, 71 reported their PET score (35 of the hyper-searchers group and 36 of the control group).

4.3. Procedure

Both the hyper-searchers group and control group underwent precisely the same procedure (except for a slight difference in the semi-structured interview at the end of the experiment phase). Participants brought their laptops to the experiment, which was conducted in a quiet, isolated room in a university library.

Initial Group Allocation Questionnaire. All 918 potential participants were given the initial group allocation questionnaire, which contained a single question asking them to estimate the percentage of cases in which they use each one of the retrieval options out of all their retrievals, and asked to give their contact details for possible later study.

Before the Experiments Phase. In preparation for the experiment, we assigned potential participants to the hyper-searchers and the control groups as described in the “Participants” section. Participants were not informed that they had been assigned to groups to avoid bias. Each participant performed the experiment individually with one of the testers. Some of these participants were later excluded due to their results in the retrieval task (described below). However, we did not inform these participants that they were excluded and they continued to participate in all experimental tests and received payment like the rest of the participants. Although this caused added expenses to the experiment, we did this to prevent participants from potentially telling other participants (in case they happen to know them) how they should behave in the retrieval task so as not to be excluded, which could have severely damaged the validity of our results.

Start of Experiments Phase. The experiment phase was conducted sitting next to the participants’ personal computers. The testers explained the experiment to the participants and asked them to sign an informed consent form approved by our university’s Ethics Committee (BD191221). Both tester and participants turned off their mobile phones and did not speak to each other during the experimental tasks (e.g. the retrieval task) to avoid any disturbance.

Retrieval Task. In preparation for the retrieval task, the tester activated the EPIR software to prepare the target file list. The EPIR software was on a USB memory stick and did not require installation. Before each experiment, the tester verified that the USB stick was free of viruses and malware using updated antivirus software. After the file retrieval list was created, participants were asked to retrieve the target files that were presented to them by the EPIR software. The number of files chosen by the software varied according to the number of suitable files in the Recent Documents folder. Depending on the number of eligible target files, the tester instructed the participant to perform 3–10 retrievals. Across participants, the average number of retrievals was 4.81 (SD = 2.87). In each retrieval task, the participant was asked to retrieve a single file. When the participants completed the retrieval task, they pressed the “next” button and the EPIR software presented the next target file name. Retrieval time was automatically measured by our software, from the time the target file name was displayed until the time they pressed the “next” button. We also verified the retrieval time recorded by the software by visually observing the retrieval videos. Participants were instructed to retrieve the target file and, when they found it, to click on it once but not open it, to protect their privacy. Participants were not given a time limit and each retrieval attempt continued until the file was successfully found, or the participant said that s/he could not find it. If participants abandoned the retrieval attempt, the tester recorded this. Participants were not confined to a specific retrieval method and were able to choose how to retrieve the target file (e.g. use navigation or search). Retrieval was video recorded (using Zoom software) for later measurement of related variables such as failure percentage and retrieval time.
The tester wrote down particular search techniques applied by participants from the hyper-searchers group in order to ask them about it later, during the semi-structured interview (e.g. search in approximation to target file folder, using gradually more specific words to zoom-in on target file, typing infrequent search terms or using logical operators).

Verbal Memory Task. The tester read out loud the Logical Memory Test (WMS-R) story for the verbal memory task at a moderate pace. This was followed by the immediate recall task (LM-I). The delayed recall task (LM-II) was conducted after 30 minutes (during which the later tests were conducted), as required by the instructions.

SBSOD, Big Five and Online Card Game. Participants were not given a time limitation for the online card game.

Final questionnaire. The final questionnaire included questions answered on a 1–5 Likert scale regarding their: file organization and memory of location, computer literacy, PET score, e-mail retrieval options percentage assessment, and ease of search for files and e-mails. We also collected background data: year of birth, and number of years of study. Computer literacy is defined as “the ability to use computers, programs, and applications to find and use different formats of information to achieve various goals in daily lives” (Dincer, Citation2016), or as “the basic knowledge, skills, and attitudes needed by all citizens to be able to deal with computer technology in their daily life.” Although there has been an attempt to measure computer literacy in PIM literature objectively (Blau et al., Citation2013), the majority of PIM literature uses self-estimation to measure it (e.g., Bergman, Whittaker, et al., Citation2019; W. Nwagwu, Citation2023; W. E. Nwagwu & Donkor, Citation2022).

Semi-Structured Interview. We ended the experiment phase with a short, semi-structured interview in which we mainly: (a) asked participants about the problems that they face when retrieving their files and the way they address these problems; (b) asked hyper-searchers to retrace their thoughts during their search to possibly learn from them how to improve file search, and (c) asked hyper-searchers for their recommendation for improving file retrieval.

Study End. The tester thanked the participants and paid them $40 for their time.

4.4. File organization measurements

The level of file organization was measured using different variables and research techniques. The first three variables (file depth, unfiled files percentage, and files per folder) were extracted from the data available from the EPIR software and the retrieval task recordings, and therefore were objective. The other two variables (file organization and memory of location), as well as the qualitative results of the interviews, were subjective. Each measurement category has its specific limitations: the objective measurements rely on a small sample of target files taken from the Recent Documents folder, and therefore may not accurately represent the entire file folder hierarchy, whereas the subjective estimations relate to the entire file collection but may be less accurate than the objective measurements. However, the inclusion of several different objective and subjective organization measurements can allow for convergent validity if all the results point in the same direction.

File depth. During the observed retrieval task, the EPIR software collected the target file paths. We therefore knew the locations of all the target files, even when the participants searched for the files or did not find them at all. File depth was defined as the number of steps down the folder hierarchy it takes to navigate from the root to the folder where the target file is located. A low target files depth may imply a general rather than specific folder hierarchy organization.
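The file depth definition above can be illustrated with a minimal sketch that counts the folder steps between a chosen root and the folder holding the target file. The function name `file_depth`, the example paths, and the choice of root folder are assumptions made for illustration; the text does not state which folder the EPIR software treats as the root.

```python
from pathlib import PureWindowsPath


def file_depth(file_path: str, root: str) -> int:
    """Number of folder steps from `root` down to the folder holding the file.

    Hypothetical helper: which folder counts as the root is an assumption.
    """
    folder = PureWindowsPath(file_path).parent
    # relative_to raises ValueError if the file is not stored under `root`.
    return len(folder.relative_to(PureWindowsPath(root)).parts)


root = r"C:\Users\dana\Documents"  # hypothetical root folder
print(file_depth(r"C:\Users\dana\Documents\report.docx", root))             # 0
print(file_depth(r"C:\Users\dana\Documents\courses\hci\notes.docx", root))  # 2
```

Under this reading, a file saved directly in the root has depth 0, and each additional sub-folder on the way to the file adds one step.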

Unfiled files percentage. The EPIR software data also allowed us to measure the percentage of unfiled files (i.e., files stored in the Downloads folder, in the Documents folder with no sub-folder, and in root folders such as c:\) out of all the target files. Storing files in such folders indicates a low level of organization and has been shown to hinder retrieval (Bergman et al., Citation2014).
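As a rough sketch of how such a percentage could be computed from a list of target file paths, the snippet below flags a file as unfiled when it sits directly in Downloads, directly in Documents, or in a drive root. The helper names, the example paths, and the decision to treat sub-folders of Downloads as filed are assumptions for illustration, not a description of the EPIR software.

```python
from pathlib import PureWindowsPath


def is_unfiled(file_path: str, home: str) -> bool:
    """True if the file sits directly in Downloads, Documents, or a drive root."""
    folder = PureWindowsPath(file_path).parent
    home_dir = PureWindowsPath(home)
    unfiled_folders = {home_dir / "Downloads", home_dir / "Documents"}
    # A drive root such as C:\ equals its own anchor; relative paths have none.
    is_drive_root = bool(folder.anchor) and folder == PureWindowsPath(folder.anchor)
    return is_drive_root or folder in unfiled_folders


def unfiled_percentage(paths: list[str], home: str) -> float:
    """Percentage of unfiled files out of all target files."""
    return 100 * sum(is_unfiled(p, home) for p in paths) / len(paths)


home = r"C:\Users\dana"  # hypothetical user folder
targets = [
    r"C:\Users\dana\Downloads\a.pdf",             # unfiled
    r"C:\Users\dana\Documents\b.docx",            # unfiled
    r"C:\Users\dana\Documents\courses\c.docx",    # filed
    r"C:\d.txt",                                  # unfiled (drive root)
    r"C:\Users\dana\Documents\work\q1\e.xlsx",    # filed
]
print(unfiled_percentage(targets, home))  # 60.0
```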

Files per folder. Organizing files in very large folders can hinder their retrieval (Bergman et al., Citation2010) and can also suggest a more general and less specific folder hierarchy organization.

File organization and memory of location. Subjective evaluations of file organization and memory of file locations were provided by the final questionnaire on a 1–5 Likert scale.

Qualitative results. Subjective qualitative results regarding file organization were collected in the semi-structured interviews.

4.5. Allocation to the hyper-searchers and the control groups

The allocation to the hyper-searchers and control group was made in two stages: first using estimated search percentage before the experiment and then using observed search percentage at the beginning of the experiment.

4.5.1. Estimated search percentage group allocation

In the initial group allocation questionnaire, participants were asked to evaluate the percentage of different retrieval behaviors (search, navigation, desktop use, recent documents, and “other”) used in their retrievals. Percentages had to sum to 100%. Table 1 details the results of the hyper-searchers group and the control group and compares them using an independent t-test.

Table 1. A comparison between the hyper-searchers and the control groups on estimated retrieval behavior percentages.

Table 1 indicates that the average search percentage for hyper-searchers (46%) was more than six times higher than for the control group (7%). The difference between the two groups (39 percentage points on average) can be explained by the fact that hyper-searchers navigated significantly less than the control group (by 28 percentage points on average), and used desktop retrieval less than the control group (by 10 percentage points on average).

4.5.2. Observed search percentage group allocation

Unlike the results of the group allocation questionnaire, where participants indicated that they had used Recent Documents retrieval, in the observed retrieval participants only used search and navigation (including desktop retrieval, which is an extremely short navigation). For the sake of simplicity and consistency with past studies (Bergman, Israeli, et al., Citation2021; Bergman, Whittaker, et al., Citation2021) we categorized retrievals according to the last retrieval type. Therefore, retrievals that began with navigation and continued with search were categorized as searches. Table 2 compares the search and navigation distribution of the two groups.
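The "last retrieval type" rule can be sketched as follows. The action logs and function names are hypothetical; the sketch only assumes that each retrieval is recorded as an ordered list of "navigation" and "search" actions.

```python
def categorize_retrieval(actions: list[str]) -> str:
    """Label a retrieval by its final action type ("search" or "navigation")."""
    if not actions:
        raise ValueError("a retrieval needs at least one action")
    return actions[-1]


def search_percentage(retrievals: list[list[str]]) -> float:
    """Percentage of retrievals categorized as searches for one participant."""
    labels = [categorize_retrieval(r) for r in retrievals]
    return 100 * labels.count("search") / len(labels)


# Hypothetical action logs for four retrievals by one participant.
retrievals = [
    ["navigation"],
    ["navigation", "search"],  # failed navigation, then search: counts as "search"
    ["search"],
    ["navigation"],
]
print(search_percentage(retrievals))  # 50.0
```

Under this rule a retrieval that starts with navigation and ends with search adds to the search percentage only, which is why such retrievals are analyzed separately when navigation failure is of interest.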

Table 2. Comparison of observed search and navigation percentages between the two groups.

Search Percentage. Table 2 indicates that the search percentage of the hyper-searchers group (67%) was significantly higher than that of the control group (5%). Note that the hyper-searchers’ search percentage was over 13 times higher than that of the control group. When conducting a Pearson correlation between participants’ estimated search percentage in the group allocation questionnaire and their observed search percentage in the EPIR task, the correlation was high and significant, r(100) = 0.74, p < .001.

Hyper-searchers and Navigation. Table 2 indicates that the navigation percentages for the hyper-searchers group (33%) were significantly lower than for the control group (95%), t(98)=-16.24, p < .001. Of the hyper-searchers group, 16 participants (32%) did not use navigation in any of their retrieval tasks (i.e. used search in 100% of their retrievals). However, when we looked at their initial group allocation questionnaire, they had all estimated that they do navigate. On average, they estimated that they navigate in 23% of their retrievals (SD = 14%), which is similar to the estimated navigation percentage of the entire hyper-searchers group (M = 22%, SD = 16%). These participants reported that they navigate but happened not to do so during the retrieval task. Eight other participants estimated that they don’t navigate. However, all eight were observed navigating during the retrieval task. On average, they were observed navigating to 51% of their files (SD = 41%), which is more than the average for the hyper-searchers (33%). These participants did not report that they navigate, but evidently they do, as they were observed doing so. Not a single hyper-searcher both estimated that s/he did not navigate and did not navigate in the retrieval task. We think it is therefore safe to conclude that all the participants of the hyper-searchers group do navigate.

4.6. Research method summary table

Table 3 indexes the research hypotheses to the research tool/s and the variables that were used to test them.

Table 3. Research hypotheses indexed to the tools and variables that tested them.

5. Results

All statistical results were calculated across participants.

5.1. Retrieval task

The participants’ first assignment was to retrieve some of their own files chosen for them by EPIR software. This section starts with the results of all of the retrievals and then focuses on “navigation and then search,” search and navigation.

5.1.1. Between group comparison for all retrievals: hyper-searchers fail twice as much as the control group and are less aware of their failures

Table 4 compares the percentage of retrieval failures (failure percentage for short) and retrieval time for all retrievals between the two groups.

Table 4. A comparison between the hyper-searchers group and the control group on failed retrievals percentage and retrieval time.

Failure percentages. Table 4 indicates that the average failure percentage for the hyper-searchers group (8%) was twice as high as for the control group (4%), but the difference was not significant because of the high variance in both groups. The failures of the two groups also differed in nature. We divided failures into two categories: In the first category, participants informed the tester that they gave up on trying to find the file and therefore they were aware of their failures. In the second category, participants retrieved the wrong file without being aware of their mistake. We termed the latter category unaware failures. While there was no significant difference between the groups in the percentage of aware failures, the unaware failure percentage for the hyper-searchers group (7%) was significantly higher than for the control group (2%). A paired samples t-test indicates that for the hyper-searchers group, the percentage of unaware failures (M = 7%, SD = 15%) was significantly larger than the percentage of aware failures (M = 1%, SD = 3%), t(50) = 3.10, p = .003, while a paired t-test for the control group did not indicate that the percentage of unaware failures (M = 2%, SD = 8%) was significantly different from the percentage of aware failures (M = 2%, SD = 7%), t(50)=-0.19, p = .85. This suggests that hyper-searchers are not aware of most of their failures, unlike the rest of the population.

Retrieval time. An independent samples t-test indicated no difference between the retrieval time of the hyper-searcher group (M = 27.23, SD = 15.49) and the control group (M = 28.09, SD = 17.81), t(98)=-0.26, p = .80.

5.1.2. Navigation and then search

In the Observed Search Percentage Group Allocation section, we categorized retrievals according to the last retrieval type. Therefore, retrievals that began with navigation and continued with search were categorized as searches. However, “navigation and then search” retrievals are of importance because they indicate navigation failure, regardless of the question of whether the following search succeeded or failed. An independent samples t-test indicated that the percentage of navigations that turned into searches, out of all navigations for the hyper-searchers (M = 16%, SD = 28%), was significantly higher than for the control group (M = 2%, SD = 6%), t(84) = 3.21, p = .002. Note the large effect: hyper-searchers turned to search after failing to navigate 8 times more than the control group. In the next two sections, we refer to the navigation part of “navigation and then search” as navigations, and to the search part of it as search.

5.1.3. Search retrievals: hyper-searchers failed more than the control group

This section focuses on observed search retrievals conducted during the retrieval task. Table 5 compares the hyper-searchers group and the control group in terms of their failure percentage (of all searches), search time, and percentage of folder searches (of all searches).

Table 5. Search retrievals comparison.

Search failure percentages. Surprisingly, the average search failure percentage for the hyper-searchers group (7%) was significantly higher than for the control group (0%).

Search time. When measuring search time for searches that started with failed navigation, we omitted the preceding navigation time, and only timed the search part of the retrieval. Our results did not show a significant difference between the two groups.

 

H1: Hyper-searchers search for their files better than the control group.

Our results did not support our first hypothesis that hyper-searchers searched better than the rest of the population. On the contrary, their average failure percentage (7%) was significantly higher than that of the control group (0%), with no significant difference in retrieval time.

Folder search. Folder search was conducted only in a minority of searches. No significant difference was found between the hyper-searchers group and the control group.

5.1.4. Navigation retrievals: hyper-searchers failed navigation almost 4 times as much as the control group

In this section we refer to all navigations, including retrievals that began with navigation and then turned to search (therefore the navigation distribution is different than in Table 2). Table 6 compares the two groups in terms of distribution, failure percentage and retrieval time.

Table 6. Navigation retrievals comparison.

Navigation percentage. Table 6 indicates that the control group tried to navigate in 99% of the retrieval tasks. Most of these navigations were successful, but when failing, they turned to search in 2% of the cases and gave up the retrieval task in 4% of them. Unlike the control group, hyper-searchers attempted to navigate in only 40% of their retrievals.

Navigation failure percentage. An independent samples t-test indicated that the failure percentage for the hyper-searchers (23%) was significantly higher than for the control group (6%). This means that hyper-searchers failed their navigation almost 4 times as much as the control group, but often made up for it by successfully searching following the navigation failure.

Navigation time. No significant difference was found between the two groups.

 

H2: Hyper-searchers are worse at navigating to their files than the control group.

Our results confirmed H2, as the hyper-searchers’ failure percentage (23%) was almost 4 times higher than that of the control group (6%).

5.2. File organization: hyper-searchers are less organized than the control group

 

H3: Hyper-searchers search more because their file folders are less organized making navigation more difficult to perform.

H3 was tested by analyzing the structure in which the target files were located, by using self-estimations in the final questionnaire and by analyzing relevant quotations from the semi-structured interview.

5.2.1. Target files analysis: hyper-searchers structures are flatter with more unfiled files than the control group

 

H3a: Hyper-searchers use a flatter folder structure than the control group.

An independent samples t-test indicated that the file depth of the hyper-searchers group (M = 1.49, SD = 1.23) was significantly smaller than that in the control group (M = 2.42, SD = 1.33), t(98)=-3.63, p < .001.
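As a consistency check, the reported t statistic can be reproduced from the summary statistics alone. The sketch below assumes 50 participants per group (the group sizes given for the study) and an equal-variance (Student's) t-test, which matches the reported 98 degrees of freedom; the function name `pooled_t` is ours.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# File depth: hyper-searchers vs. control group, n = 50 each (assumed).
t, df = pooled_t(1.49, 1.23, 50, 2.42, 1.33, 50)
print(f"t({df}) = {t:.2f}")  # t(98) = -3.63
```

The same calculation applied to other comparisons reported with 98 degrees of freedom gives values close to the published ones, with small differences attributable to rounding of the means and standard deviations.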

 

H3b: Hyper-searchers have more unfiled files than the control group.

We measured the percentage of files stored in uncategorized folders (e.g. downloads). A one-sided independent samples t-test indicated that the percentage of unfiled files in the hyper-searchers group (M = 20%, SD = 29%) was significantly larger than in the control group (M = 10%, SD = 17%), t(94) = 2.20, p = .03.

 

H3c: Hyper-searchers folders contain more files than the control group folders.

A one-sided independent samples t-test indicated that the number of files in the target folders for hyper-searchers (M = 89.18, SD = 198.94) was significantly larger than for the control group (M = 32.72, SD = 60.78), t(83) = 1.78, p = .04.

5.2.2. Questionnaire estimations – hyper-searchers are less organized and less likely to remember file location

In the final questionnaire, we asked participants to what extent they agreed with statements on a 1–5 Likert scale.

 

H3d: Hyper-searchers’ self-estimation of their file organization is lower than the control group.

When presented with the statement “My files are organized,” an independent samples t-test indicated that the agreement level of the hyper-searchers group (M = 3.22, SD = 1.06) was significantly lower than the control group (M = 3.74, SD = 1.03), t(98)=-2.50, p = .014.

 

H3e: Hyper-searchers’ self-estimation of their memory of their file location is lower than the control group.

When presented with the statement “I remember in which folders I have saved my files,” an independent samples t-test indicated that the agreement level of the hyper-searchers group (M = 3.48, SD = 1.03) was significantly lower than the control group (M = 4.00, SD = 0.78), t(98)=-2.83, p = .006.

5.2.3. Interviews: hyper-searchers were aware that their files were disorganized, and associated their disorganization with more file searches

 

H3f: In the interviews, hyper-searchers spontaneously talked about being messy more than members of the control group.

In the semi-structured interview, we did not ask participants about the way they organize their files. However, 25 members of the hyper-searchers group spontaneously talked about their lack of file organization. Four of them explicitly said that it is good to be organized (“of course it’s good to be organized, it’s just that I personally am not good at organizing” P88-H). However, they do not organize their files in folders because “it takes a lot of time, a lot of effort, I don’t have the energy for it” (P80-H). Three participants commented that they organize a few important files in folders (“ … so, in the end, I think that only the important things are worth filing [in folders] in order to get to them real fast,” P80-H); three admitted that they use more general folders (e.g. P84-H uses a single folder for all the courses of the semester, and P1-H uses a single folder for most of his customers); two said that they put all their files on their desktop (“I just put it on my desktop … it looks horrible!” P49-H); and two participants organize their files periodically (“It’s much easier for me to bring myself at the end of the semester to sit for an hour and organize everything” P5-H). Interestingly, six of the participants connected their unorganized files with their tendency to search for them (“ … and then I have to use search because I didn’t invest thought in where I put it” P97-H, “If I had organized folders then it’s really possible that I would use navigation” P83-H).

Compared to that, only six control group participants reported being disorganized, and their complaints seemed milder (“Distributing [files] into folders is really important. I could have done it better,” P82-C). These sorts of remarks are possibly related to the well-known guilt feelings associated with personal information organization (Bellotti & Smith, Citation2000; Bergman & Sher, Citation2022; Sweeten et al., Citation2018).

5.3. Verbal memory — no significant results

 

H4: Hyper-searchers have better verbal memory than the control group.

We measured verbal memory by using the Logical Memory test from the Wechsler Memory Scale – Revised (WMS-R) (Wechsler, Citation1945). The test measured the retelling of the story and the use of exact and accurate words in it immediately after the story was told, and then half an hour later. Table 7 presents the comparison between the results of the two groups.

Table 7. WMS-R results comparison.

Table 7 indicates no significant results.

5.4. Spatial memory – no significant results

 

H5: Hyper-searchers’ spatial memory is less efficient than the control group.

We measured spatial memory by using the Santa Barbara Sense of Direction (SBSOD) questionnaire and an online card memory game.

SBSOD questionnaire. The SBSOD results for the hyper-searchers group (M = 4.54, SD = 1.10) were not significantly different than for the control group (M = 4.21, SD = 1.21), t(98) = 1.42, p = .33. These results are consistent with the findings in (Dinneen & Frissen, Citation2020) which showed no significant correlation between file organization and SBSOD results.

Online card memory game. The number of steps for task completion for the hyper-searchers (M = 31.88, SD = 5.06) was not significantly different than for the control group (M = 32.62, SD = 4.94), t(98)=-0.74, p = .46. The number of seconds for task completion for the hyper-searchers (M = 121.40, SD = 28.70) was not significantly different than for the control group (M = 130.11, SD = 27.92), t(93)=-1.50, p = .14. Similarly, no significant results were found in (Bergman, Israeli, et al., Citation2021).

5.5. Hyper-searchers scored lower on the conscientiousness trait of the big five personality questionnaire

 

H6: Hyper-searchers search more than others because they score lower on the conscientiousness Big Five personality trait.

A one-sided independent samples t-test indicates that the conscientiousness scores for the hyper-searchers group (M = 3.641, SD = 0.57) were significantly lower than the scores of the control group (M = 3.86, SD = 0.59), t(87)=-1.77, p = .04. Four other independent samples t-tests showed no significant difference between the hyper-searchers and the control group regarding the 4 other personality traits.

5.6. Psychometric scores, years of education, and computer literacy – no significant results

 

H7: Hyper-searchers search more than others because they have higher psychometric scores, are more educated, and/or have higher computer literacy than the control group.

Psychometric scores. We asked our participants to report on their PET scores since most of them have taken this psychometric test. An independent t-test indicates that the PET score for the hyper-searchers (M = 669.83, SD = 38.80) was not significantly different than for the control group (M = 661.61, SD = 47.51), t(69) = 0.80, p = .43.

Years of education. An independent t-test regarding years of education indicates that it was not significantly different for hyper-searchers (M = 13.70, SD = 2.10) than for the control group (M = 13.42, SD = 1.29), t(98) = 0.80, p = .42.

Computer literacy. The self-estimated computer literacy of hyper-searchers (M = 4.04, SD = 0.97) was not significantly different from that of the control group (M = 3.76, SD = 0.96), t(98) = 1.45, p = .28.

5.7. Interviews – hyper-searchers search when they don’t remember their files’ location

In the semi-structured interview, 25 hyper-searchers spontaneously stated that they navigate when they remember their file location and/or that they search when they don’t remember it. This is an indication that hyper-searchers tend to navigate when they know where their files are located and search when they don’t remember the location. Five of the participants commented that they knew where their files were because they had recently used them (“Specifically, task 3 was a task [that] I did last week, so I knew it was in Downloads. That’s why I did it that way [i.e., navigated to the Downloads folder]” (P58-H)). Two other participants said that they remember the location (and therefore navigate) only for frequently used files (“If it’s something that I remember that it’s in a folder, then I will go to the folder … If it’s something I don’t use frequently then I won’t remember where it is, that’s when I’ll go straight to search” (P17-H)). And one participant explained her intensive use of search as being due to her not remembering where her files are “because most of my computer is a mess” (P41-H). It seems that like the control group (who attempted to navigate to 99% of their target files), hyper-searchers prefer navigation when they know where their files are. However, because their files are less organized, this happens less often, and for some of them only for recently/frequently used files. This, in turn, makes them resort to using search instead.

Although both hyper-searchers and the rest of the population navigate when they know where the files are located and search when they don’t, there may be a subtle difference between the two groups. The control group rarely ever searched, and so the possibility of searching probably had low cognitive availability (Tversky & Kahneman, Citation1973) for them (i.e., the possibility of searching did not come to mind easily and occurred to them only when they were totally unable to find the file). In contrast, because hyper-searchers have already searched a lot, the search option is likely to be much more available to their minds, and so they may use it in less extreme circumstances. Evidence for this can be found in the following utterances: “Actually I would always prefer to search, unless I know for sure where it is” (P49-H), “I knew exactly where it is, so I didn’t need to search. Search I use when I don’t know exactly where the file is” (P86-H), and “I don’t remember exactly where everything is located” (P97-H). In other words, hyper-searchers’ disorganized files can cause them to search more both directly (because it is more difficult for them to navigate to their files) and indirectly, by creating a dynamic according to which they search more, causing the search option to be cognitively more available to them, which in turn may cause them to search in less extreme cases (e.g. when they know approximately but not exactly where the file is). Both direct and indirect causal chains should be studied in future research.

Regardless of hyper-searchers’ positive attitude toward file search and their disorganized folders, the interviews do not indicate an alternative negative attitude toward folders and navigation. On the contrary, when asked for their recommendations for improving file retrieval, 39 of 46 hyper-searchers who answered this question (85%) advocated improving folder organization (“I recommend organizing the files in more orderly folders” H-90).

5.8. Other results — hyper-searchers search more and navigate less for their email too

Our final questionnaire included a question about e-mail retrieval. Similar to our initial group allocation questionnaire, participants were asked to estimate their percentages of use of each retrieval option, totaling 100%. Table 8 compares e-mail retrieval percentages between the two groups.

Table 8. Email retrieval.

Two one-sided independent t-tests indicate that on average, hyper-searchers search for their e-mail messages (53%) significantly more than the control group (43%), and navigate to messages in folders (6%) less than the control group (13%). As with files, the difference in search behavior can be explained by the difference in navigation behavior between the two groups. Hyper-searchers tended to retrieve fewer messages from their folders, possibly because they tended to organize messages in folders less than the control group.

5.9. Results summary

Of our seven research hypotheses, the results supported hypotheses H2, H3, and H6. Our results gave a strong indication that hyper-searchers are worse at navigating to their files and that their folders are less organized than the control group’s folders.

6. Discussion

The current study is the first to examine a large number of hyper-searchers, and compare them to a control group in order to explore the reasons that they search much more than the rest of the population. Unlike our pilot study (Bergman, Israeli, et al. Citation2021) which included only 7 unverified hyper-searchers, the current study included 50 verified hyper-searchers. Our study was conducted using a double-check group allocation method in which only potential participants with 30% searches or over, in both the questionnaire estimations and the observed retrievals of the retrieval task, were included in the hyper-searchers group. Results of our initial group allocation questionnaire indicate that the average estimated search percentage for hyper-searchers (46%) was over six times that of the control group participants (7%). Retrieval task results indicate that the average observed search percentage for hyper-searchers (67%)Footnote3 was over 13 times higher than for the control group (5%). These findings strongly validate the existence of a hyper-searchers group among the population.
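The double-check criterion for the hyper-searchers group can be expressed as a simple conjunction of the two thresholds. The sketch below covers only the hyper-searchers side as described in this paragraph; the function and constant names are ours, and the control group criteria (given in the “Participants” section) are not modeled.

```python
SEARCH_THRESHOLD = 30.0  # percent of retrievals, as stated in the text


def is_verified_hyper_searcher(estimated_pct: float, observed_pct: float) -> bool:
    """True only if both the estimated and the observed search percentage
    reach the threshold (hypothetical helper for illustration)."""
    return estimated_pct >= SEARCH_THRESHOLD and observed_pct >= SEARCH_THRESHOLD


print(is_verified_hyper_searcher(46, 67))  # True: passes both checks
print(is_verified_hyper_searcher(40, 10))  # False: fails the observed check
```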

As for the size of the hyper-searchers group, an analysis of previous studies that used either an estimation criterion or an observed-retrieval criterion (Bergman et al., Citation2008, Citation2014; Bergman, Israeli, et al., Citation2019, Citation2021) indicated that 8%-13% of the population are hyper-searchers. In this study, we found that 12% of the participants met the estimation criterion, with 31% of them being excluded due to the observed retrieval criterion. This leads to a conservative approximation that hyper-searchers constitute 8% of the population.
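The arithmetic behind this approximation can be checked directly. The sketch below assumes, as the text states, that the 31% exclusion applies to the 12% of participants who met the estimation criterion; the variable names are illustrative.

```python
met_estimation = 0.12        # share of participants meeting the estimation criterion
excluded_by_observed = 0.31  # share of those excluded by the observed retrieval criterion

# Share of all participants who pass both criteria.
hyper_searcher_share = met_estimation * (1 - excluded_by_observed)
print(f"{hyper_searcher_share:.1%}")  # 8.3%
```

Rounded down, this gives the 8% figure, which also matches the lower bound of the 8%-13% range from the earlier studies.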

6.1. Hyper-searchers are not better at search

Our results did not support our first hypothesis that hyper-searchers searched better than the control group. On the contrary, their average failure percentage (7%) was significantly higher than that of the control group (0%), with no significant difference in retrieval time. It is hard to believe that hyper-searchers are worse at searching for their files than the rest of the population, if only because they search much more for their files than the rest of the population, and are therefore more experienced. One possible explanation for this finding is that the control group conducted very few searches (only 5% of their retrievals), and extreme behaviors (such as no search failure) are more likely to appear in small samples rather than in large samples. This is particularly true for relatively rare phenomena such as retrieval failures. We are therefore careful to not conclude that hyper-searchers are worse at file searching than the rest of the population, and leave this question open for future research. We can however conclude that our results found no support for H1, and therefore there was no justification to attempt to learn how to improve file searching from hyper-searchers.
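The small-sample argument can be illustrated numerically. Assuming, purely for illustration, that every search fails independently with the hyper-searchers' 7% failure rate, the probability of observing no failures at all in n searches is (1 - 0.07)^n. The values of n below are hypothetical, because the number of searches conducted by the control group is not reported in this section, and the function name is introduced here only for the sketch.

```python
def prob_zero_failures(failure_rate, n_searches):
    """Probability of no failures in n independent searches with the same failure rate."""
    return (1 - failure_rate) ** n_searches


# Hypothetical sample sizes, from a handful of searches to many.
for n in (5, 10, 50, 100):
    print(n, round(prob_zero_failures(0.07, n), 3))
```

Under these assumptions, a failure-free record is quite likely when only a handful of searches are observed and becomes very unlikely when many are observed, which is why a 0% failure rate in a group that rarely searched is weak evidence of superior search ability.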

6.2. Hyper-searchers are worse at navigating

Turning to navigation, our results indicate that all hyper-searchers navigate to some extent and tend to do so when they know where their files are located. Therefore, it is unlikely that they ideologically avoid it. However, they navigate much less than the control group. While the control group attempted to navigate in 99% of their retrieval tasks, hyper-searchers attempted to navigate in only 40% of them. In these navigations, the hyper-searchers’ average failure percentage (23%) was almost 4 times higher than the control group percentage (6%), confirming our second research hypothesis that hyper-searchers are worse at navigating to their files than the control group. Our results, therefore, clearly indicate that hyper-searchers search more not because they search better but because they navigate worse than the control group. The strong evidence that hyper-searchers search much more because they are worse at navigation than the control group brings us to the next question: why do hyper-searchers fail to navigate more often than the control group?

6.3. Hyper-searchers’ files are less organized

Our results gave ample evidence that hyper-searchers’ files are less well organized than those of the rest of the population (represented by the control group):

Hyper-searchers use a flatter folder structure. Our results indicate that the average depth of the target files for hyper-searchers (1.49) was significantly lower than that of the control group (2.42). The file depth of the control group was similar to the one found in previous studies (Bergman et al., Citation2010, Citation2014, Citation2020b) which was 2.20–2.86, depending on the study. Having a flatter folder hierarchy may imply that hyper-searchers use more general folders, with less specific categorization. This is also suggested by the evidence that they use bigger folders, as discussed next.

Hyper-searchers tend to file their files less. We found that the percentage of uncategorized files among hyper-searchers (20%) was twice as high as that of the control group (10%). Storing files in uncategorized folders (e.g. the Download folder) avoids organizing them in meaningful folders that can help navigate to them in the future. Unsurprisingly, Bergman et al. (2014) found that retrieving files from uncategorized default folders damages retrieval in every possible aspect.

Hyper-searchers use bigger folders. The average number of files in a folder among hyper-searchers (89.18) was significantly higher than for the control group (32.72). In other words, the average hyper-searcher’s folder was over 2.5 times bigger than the folders of the control group. Organizing files in very large folders can hinder their retrieval, as found in (Bergman et al., Citation2010), because it is well known in cognitive psychology that in visual scanning (when a user finds the target folder and scans for the target file), the number of irrelevant distracters increases the time it takes for people to identify a target object (Neisser, Citation1964; Treisman & Gelade, Citation1980). Moreover, when there are so many distracters (files other than the target file), hyper-searchers could possibly give up the scanning even when they have navigated to the right target folder, and search for the file instead. Hyper-searchers could have compensated for using larger folders by using folder search. However, our results did not indicate that they use this option significantly more than the control group.

Hyper-searchers estimate that they retrieve less from their desktop. When participants were asked to estimate their desktop retrieval percentage in the initial group allocation questionnaire, the hyper-searchers’ average estimation (17%) was significantly lower than that of the control group (27%). The use of the desktop can be very efficient because it provides quick access to the most relevant files. However, the relevancy of these files declines over time, and if they are not moved the desktop becomes cluttered. Therefore, if hyper-searchers retrieve fewer files from the desktop, either they have fewer relevant files on it or their desktop is so cluttered with irrelevant files that they cannot identify the target file. Both explanations indicate that hyper-searchers are less organized.

Hyper-searchers had significantly more unaware failures. For the retrieval task, the average percentage of unaware failures among hyper-searchers (7%) was significantly higher than for the control group (2%). Why were hyper-searchers more often unaware of their failures? The answer probably does not have to do with the fact that they search more, because, of the 15 unaware failures, only 9 (60%) occurred when using search and 6 (40%) when using navigation. One possible answer is that hyper-searchers may tend to give simple and repetitive file names, which confuse them upon retrieval. The recommendation to give meaningful and distinctive file names is a consensus among PIM experts (Jones et al., Citation2015).

Hyper-searchers estimate that they are less organized and less likely to remember their file location. When presented in the final questionnaire with the statements “My files are organized” and “I remember in which folders I have saved my files,” the level of agreement among hyper-searchers was significantly lower in both cases compared to the control group’s answers. These results imply that hyper-searchers were aware that their files are less organized than the rest of the population, and that this hinders their ability to remember where they are in order to navigate to them.

Hyper-searchers connected their messy files with their tendency to search for them. In the semi-structured interview, 25 hyper-searchers spontaneously talked about being disorganized. They typically thought that files should be organized in folders, but neglected to do so because they thought they weren’t good at it or had no time and energy for it. Even so, they made some attempts to organize their files by organizing their most important files, categorizing files in more general folders (e.g. using a single folder for all the courses in the semester; see Note 4), and spring cleaning their files (Whittaker & Sidner, Citation1996). Importantly, six of the participants spontaneously linked their unorganized files to their tendency to search for them (“… and then I have to use search, because I didn’t invest thought in where I put it,” P97-H).

6.4. Hyper-searchers are less conscientious, which explains why their files are less organized

Our results confirm our sixth hypothesis that hyper-searchers score lower on the conscientiousness trait in the Big Five personality traits questionnaire. These results (limited to Windows users) conform with (Massey et al., Citation2014), which found a positive correlation between conscientiousness and file organization, but not with (Dinneen & Frissen, Citation2020), who found no notable associations between users’ Big Five attributes and their collections, but did find considerable differences across operating systems. Conscientious people tend to be efficient and organized as opposed to easy-going and disorderly. They tend to show self-discipline, act dutifully, and aim for achievement; they display planned rather than spontaneous behavior; and they are generally dependable. Conscientiousness manifests in characteristic behaviors such as being neat, systematic, careful, thorough, and deliberate (Thompson, Citation2008). It is not surprising that people who are less organized, show less self-discipline, display less planned behavior, and are less systematic, less careful, less deliberate, and less neat, also have less organized files. This is a clear indication of the direction of causality: While we cannot entirely refute the possibility that hyper-searchers allow themselves to be disorganized because they rely more on search (e.g. because of incidental positive experience with it), this would not explain why they were found to be less conscientious than the control group, because a personality trait is much deeper and more general than a particular behavior relating to a specific PIM format. It is more reasonable to infer that the finding that hyper-searchers are less conscientious (and therefore are generally less organized) than the control group explains why their files are less organized; and that this, in turn, forced them to search more for their files.
Other results that suggest this causal direction are that all of the participants in the hyper-searchers group navigated, and the large majority advocated folder organization as a means of improving file retrieval. This suggests that the hyper-searchers’ tendencies to be less organized and navigate less are not derived from an ideology that search should eliminate file organization (see Cutrell et al., Citation2006).

6.5. No evidence to support alternative hypotheses

Our study not only supports the hypothesis that hyper-searchers search much more because their files are less organized; it also tested several alternative hypotheses and found no evidence to support them. There was no evidence that hyper-searchers: (H1) search better for their files (as discussed earlier); (H4) have better verbal memory (contrary to our preliminary pilot study results; Bergman et al., Citation2021); (H5) have less efficient spatial memory; or (H7) have higher psychometric scores, are more educated, or have higher computer literacy than the control group. Needless to say, future studies may find evidence for these hypotheses, or for other hypotheses we could not think of to explain hyper-searching. However, the burden of proof is now on the researchers who claim that hyper-searching can be explained by these alternative hypotheses. Therefore, until proven otherwise, lack of file organization is the only evidence-based reason for hyper-searching.

6.6. Is hyper-searching an adaptive behavior?

The observation that some people are organized (neat) while others are disorganized (messy) goes all the way back to (Malone, Citation1983), which is considered one of the first PIM studies. Malone observed people in physical offices and found that some of his participants were neat and kept their papers in physical files, while others were messy and kept them in piles. Malone observed that piling papers had a beneficial reminding function: people tended to put papers that they urgently needed to attend to at the top of the pile, so that they would incidentally see them and be reminded to attend to them. There was no reminding function when neat participants filed their papers in binders and put them on the shelf. Malone also observed that a messy pile does not impair retrieval as long as the paper piles are small; however, with a large number of papers, retrieval is hindered and the reminding function is lost because these “to do” papers get overlaid by other papers. It is generally agreed in PIM that the more effort people spend organizing their information items, the less effort they need to expend retrieving them (Bergman & Whittaker, Citation2016; Jones et al., Citation2015). More specifically, there is also evidence for this with respect to files (Bergman et al., Citation2014; Bergman, Israeli, Whittaker, et al., Citation2020) but not e-mails (Whittaker et al., Citation2011). If so, one might expect hyper-searchers, whose files are less organized than the control group’s, to have more problems retrieving their files.

When examining our results, it seems that there is evidence showing that hyper-searchers’ lack of organization hinders their retrieval, but the evidence is not as strong as one might expect. The average failure percentage for hyper-searchers (8%) was twice as high as for the control group (4%), but results were not significant because of the high variance. Our results showed no significant difference in retrieval time, but when re-analyzing the results of (Bergman, Israeli, et al., Citation2019) we found that retrieval time for hyper-searchers (M = 63.86 sec.) was significantly longer than for the rest of the participants (M = 39.29 sec.), t(287) = 5.46, p < .001. True, hyper-searchers failed their navigations almost 4 times more than the control group, but they typically resorted to search and successfully completed their retrieval. Therefore, one way of interpreting our results is that hyper-searchers would face a real problem retrieving their files if it were not for search engine technologies (which of course were not available to Malone’s participants, who used office papers). The hyper-searchers are probably aware of this, and therefore allow themselves to be more careless about their file organizing, which is known to be a time- and effort-consuming process (Dumais et al., Citation2003; Malone, Citation1983; Whittaker & Hirschberg, Citation2001). However, as explained in the section regarding conscientiousness, it is unlikely that hyper-searchers’ files are less organized because they search more, and much more reasonable to infer that they search more for their files because they are disorganized.

To conclude this section, while our results indicate that the disorganized file behavior of hyper-searchers is not adaptive, it seems that search engine technologies save them from much worse retrieval outcomes. The prediction that search will replace navigation (Cutrell et al., Citation2006; Dourish et al., Citation2000; Fertig et al., Citation1996; Lansdale, Citation1988; Raskin, Citation2000; Russell & Lawrence, Citation2007) was never realized, despite advancements in search technology (Bergman et al., Citation2008; Fitchett & Cockburn, Citation2015). Nevertheless, extensive use of search does provide people with unorganized files with help they can usually rely on.

6.7. Implications, generalization, and limitations

6.7.1. Implications

Our results indicate that even for a small minority of users who search for files much more than the rest of the population, search is not a viable alternative that allows them to completely neglect folders and navigation. This implies that expectations regarding search engines should be realistic: while improvement of search engines is important when a file is not found using navigation (and this happens more often among hyper-searchers), we should not expect improvement in search engines to eliminate file folders (Bergman et al., Citation2010; Cutrell et al., Citation2006). It could always be argued that our study is tool-dependent, and that future much-improved search engines could eliminate the need for file folders for hyper-searchers (and perhaps even the entire population). While these claims are potentially true, the burden of proof is currently on those who make them. Implications regarding navigation are that file organization should be taught as part of PIM literacy (Alon & Nachmias, Citation2019; Mioduser et al., Citation2009) to improve students’ file organization and consequently their retrieval. Teaching PIM literacy could start as early as in middle school (Van Alstyne, Citation2023). Another possible implication is software that would recognize hyper-searchers by their retrieval activity, and nudge them slightly more to save files in meaningful folders when they close files that are stored in default folders (e.g. Download). Nudging has been proven to be a useful behavior-changing strategy (Thaler & Sunstein, Citation2008). However, because there may be privacy issues involved and users may feel that their behavior is “observed” by the software (although the process will be fully automatic with no human in the loop), the software should be thoroughly tested before being released to the public.
Other experimental software that may help hyper-searchers includes Finder Highlights, which was shown to reduce file navigation time considerably in a longitudinal field study (Fitchett et al., Citation2014), and Old’nGray, which automatically grays out the icon of older versions of the file to help users ignore them and spot the latest version using perceptual rather than cognitive processes (Bergman et al., Citation2015).

6.7.2. Generalization

Our EPIR software collected MS Office target files, and our results may not generalize well to other PIM formats. Our data indicate that hyper-searchers estimate that they search more (and navigate less) for e-mails too. However, they also indicate that the control group members search much more for e-mails than for files: their average estimated e-mail search percentage (43%) is over 6 times their average estimated file search percentage (7%). This is consistent with previous results that indicate that people search for e-mails much more than for files (Bergman & Yanai, Citation2018; Jones et al., Citation2014; Whittaker et al., Citation2011). Regarding personal pictures, a recent study (Bergman et al., Citation2022) indicated that people have too many pictures to manually organize them, and if they attempt to download them into computer folders this results in a dramatic increase in search failure. Regarding the Internet, search is the main way of revisiting websites (Jones et al., Citation2014), and although people tend to bookmark webpages, they hardly ever use these bookmarks and instead search again for these websites (Bergman, Whittaker, et al., Citation2021). In sum, our results are valid only for personal files, and do not transfer well to other PIM formats. There are several possible reasons that files are organized manually more than other PIM collections (Bergman & Whittaker, Citation2016), including that: (a) files are the most valuable PIM items for the users, and therefore they are willing to work harder to ensure that they can access them, and (b) files map to people’s projects better than other PIM collections and therefore are easier to categorize. Text files (such as MS Office files) comprise only a small portion of the files on users’ computers (Dinneen & Julien, Citation2019). However, these files are subjectively valued by users as their most important collection (Bergman, Citation2006; Boardman, Citation2004).

6.7.3. Research limitations

(a) Some of our variables (e.g. computer literacy) were measured by using self-evaluation and therefore may be inaccurate. (b) The EPIR software that we used allowed us to recruit only PC laptop users with a Windows operating system, and therefore our results may not fully apply to Mac users, who tend to organize their files differently (Bergman et al., Citation2012; Dinneen & Frissen, Citation2020), to smartphones, on which the same users search more than on their computers (Bergman & Yanai, Citation2018), or to tablets and other devices. (c) Similar to previous work (e.g., Bergman, Israeli, et al., Citation2019), our EPIR software collected the target files from the Recent Documents folder, but these files may not fully represent the entire file collection.

7. Conclusions

Research has consistently shown that people prefer to retrieve their files by using navigation over search (Barreau & Nardi, Citation1995; Bergman & Yanai, Citation2018; Bergman et al., Citation2014; Bergman, Israeli, et al., Citation2019; Boardman & Sasse, Citation2004; Jones et al., Citation2014; Kirk et al., Citation2006; Ravasio et al., Citation2004; Song & Ling, Citation2011; Teevan et al., Citation2004), and search only in the small minority of cases when they don’t remember where their files are located (Bergman et al., Citation2008; Fitchett & Cockburn, Citation2015). In this paper, we studied people who are an exception to this rule and searched for their files much more than the rest of the population. Our results did not indicate that they did so because they searched better, because they were ideologically against navigation, because they have superior (verbal) cognitive abilities or inferior (spatial) cognitive abilities, or because their psychometric scores, years of education or computer literacy differ from those of the control group. Instead, our study found strong evidence that hyper-searchers’ files are less organized than those of the rest of the population. In other words, there is no indication that hyper-searchers choose to search more (e.g. because they search better and therefore the choice to search is more appealing to them); instead, they search more out of necessity: their low level of file organization does not allow them to remember where their files are located, so they have no alternative but to search for them. This indicates that while hyper-searchers’ extensive file searching is an exception to the rule, their reasons for doing so only strengthen the rule: like the rest of the population, hyper-searchers navigate when they remember the location in which they placed their files and search for them when they don’t. However, as their files are less organized, this happens more often than for the rest of the population.

Acknowledgments

The authors thank the participants and research assistants for their time and efforts. This research was funded by the Israel Science Foundation, Grant No. 976/22.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Israel Science Foundation [976/22].

Notes on contributors

Ofer Bergman

Ofer Bergman is an Associate Professor at Bar-Ilan University. He has published over 50 PIM papers, including the book The Science of Managing Our Digital Stuff (MIT Press). Among his achievements are: developing the user-subjective approach to PIM systems design and positive evaluation of its prototypes; presenting counterintuitive findings regarding search, tags, and file sharing; and discovering the neuro-cognitive roots of folder navigation preference. He received the 2024 Rector Award for Innovative Research for his studies on streaming music collections.

Noga Dvir

Noga Dvir is a PhD Student in the Information Science Department, Bar-Ilan University.

Notes

3 The hyper-searchers search percentage was inflated when the number of retrievals was low. For example, a participant assigned to the hyper-searchers group who conducted 4 retrievals, had to search for 2 of these retrievals (50%) to pass the 30% inclusion criterion.

4 Note that the qualitative results of using more general folders relate to the quantitative results of hyper-searchers using flatter and bigger folders than the control group.

References

  • Adavi, K. A. K., & Acker, A. (2023). What is a file on a phone? Personal information management practices amongst WhatsApp users. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW2), 1–28. https://doi.org/10.1145/3610221
  • Alon, L., & Nachmias, R. (2019). Principles for the implementation of personal information management literacy program in higher education. INTED2019 Proceedings, Valencia, Spain.
  • Barreau, D. K., & Nardi, B. A. (1995). Finding and reminding: File organization from the desktop. ACM SIGCHI Bulletin, 27(3), 39–43. https://doi.org/10.1145/221296.221307
  • Bellotti, V., & Smith, I. (2000). Informing the design of an information management system with iterative fieldwork. Proceedings of the conference on Designing interactive systems: processes, practices, methods, and techniques, New York, USA (pp. 227–237). ACM Press. https://doi.org/10.1145/347642.347728
  • Benn, Y., Bergman, O., Glazer, L., Arent, P., Wilkinson, I. D., Varley, R., & Whittaker, S. (2015). Navigating through digital folders uses the same brain structures as real world navigation. Scientific Reports, 5(1), 1–8. https://doi.org/10.1038/srep14719
  • Benn, Y., Webb, T. L., Chang, B. P., & Reidy, J. (2015). What information do consumers consider, and how do they look for it, when shopping for groceries online? Appetite, 89, 265–273. https://doi.org/10.1016/j.appet.2015.01.025
  • Bergman, O. (2006). The use of subjective attributes in personal information management systems [Ph.D. dissertation]. Tel Aviv University.
  • Bergman, O. (2013). Variables for personal information management research. ASLIB Proceedings, 65(5), 464–483. https://doi.org/10.1108/AP-04-2013-0032
  • Bergman, O., Beyth-Marom, R., Nachmias, R., Gradovitch, N., & Whittaker, S. (2008). Improved search engines and navigation preference in personal information management. ACM Transactions on Information Systems, 26(4), 1–24. https://doi.org/10.1145/1402256.1402259
  • Bergman, O., Elyada, O., Dvir, N., Vaitzman, Y., & Ben Ami, A. (2015). Spotting the latest version of a file with Old’nGray. Interacting with Computers, 27(6), 630–639. https://doi.org/10.1093/iwc/iwu018
  • Bergman, O., Gutman, D., & Whittaker, S. (2022). It’s too much for us to handle—the effect of smartphone use on long-term retrieval of family photos. Personal and Ubiquitous Computing, 27(2), 1–10. https://doi.org/10.1007/s00779-022-01677-x
  • Bergman, O., Israeli, T., & Benn, Y. (2021). Why do some people search for their files much more than others? A preliminary study. Aslib Journal of Information Management, 73(3), 406–418. https://doi.org/10.1108/AJIM-08-2020-0250
  • Bergman, O., Israeli, T., & Whittaker, S. (2019). Search is the future? The young search less for files. Proceedings of the Association for Information Science and Technology, 56(1), 360–363. https://doi.org/10.1002/pra2.29
  • Bergman, O., Israeli, T., & Whittaker, S. (2020a). Factors hindering shared files retrieval. Aslib Journal of Information Management, 72(1), 130–147. https://doi.org/10.1108/AJIM-05-2019-0120
  • Bergman, O., Israeli, T., & Whittaker, S. (2020b). The scalability of different file sharing methods. Journal of the Association for Information Science and Technology, 71(12), 1424–1438. https://doi.org/10.1002/asi.24350
  • Bergman, O., Israeli, T., Whittaker, S., Yanai, N., & Amichai-Hamburger, Y. (2020). The effect of personality traits on file retrieval. IConference 2020, Borås, Sweden.
  • Bergman, O., & Sher, E. (2022). File search: A contrast between beliefs and behavior. Interacting with Computers, 34(6), 150–154. https://doi.org/10.1093/iwc/iwad005
  • Bergman, O., Tene-Rubinstein, M., & Shalom, J. (2013). The use of attention resources in navigation vs. search. Personal and Ubiquitous Computing, 17(3), 583–590. https://doi.org/10.1007/s00779-012-0544-z
  • Bergman, O., Tucker, S., Beyth-Marom, R., Cutrell, E., & Whittaker, S. (2009). It’s not that important: Demoting personal information of low subjective importance using GrayArea. CHI 2009 Conference on Human Factors and Computing Systems, Boston, MA, USA (pp. 269–278). ACM. https://doi.org/10.1145/1518701.1518745
  • Bergman, O., & Whittaker, S. (2016). The science of managing our digital stuff. MIT Press.
  • Bergman, O., Whittaker, S., & Falk, N. (2014). Shared files – The retrieval perspective. Journal of the American Society for Information Science and Technology, 65(10), 1949–1963. https://doi.org/10.1002/asi.23147
  • Bergman, O., Whittaker, S., & Frishman, Y. (2019). Let’s get personal: The little nudge that improves document retrieval in the cloud. Journal of Documentation, 75(2), 379–396. https://doi.org/10.1108/JD-06-2018-0098
  • Bergman, O., Whittaker, S., Sanderson, M., Nachmias, R., & Ramamoorthy, A. (2010). The effect of folder structure on personal file navigation. Journal of the American Society for Information Science and Technology, 61(12), 2426–2441. https://doi.org/10.1002/asi.21415
  • Bergman, O., Whittaker, S., Sanderson, M., Nachmias, R., & Ramamoorthy, A. (2012). How do we find personal files?: The effect of OS, presentation & depth on file navigation. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Austin, Texas, USA (pp. 2977–2980). ACM. https://doi.org/10.1145/2207676.2208707
  • Bergman, O., Whittaker, S., & Schooler, J. (2021). Out of sight and out of mind: Bookmarks are created but not used. Journal of Librarianship and Information Science, 53(2), 338–348. https://doi.org/10.1177/0961000620949652
  • Bergman, O., & Yanai, N. (2018). Personal information retrieval: Smartphones vs. computers, emails vs. files. Personal and Ubiquitous Computing, 22(4), 621–632. https://doi.org/10.1007/s00779-017-1101-6
  • Blau, M., Madmon, S., & Bergman, O. (2013). The effect of computer literacy on the percentage of personal file search. Proceedings of the Chais conference on instructional technologies research 2013: Learning in the technological era (pp. 92–93). The Open University of Israel. http://www.openu.ac.il/innovation/chais2013/download/b3_7.pdf
  • Boardman, R. (2004). Improving tool support for personal information management [Ph.D. dissertation]. Imperial College London. http://www.iis.ee.ic.ac.uk/~rick/thesis/boardman04-thesis.pdf
  • Boardman, R., & Sasse, M. A. (2004). “Stuff goes into the computer and doesn’t come out”: A cross-tool study of personal information management. SIGCHI Conference on Human Factors in Computing Systems, Vienna, Austria.
  • Carbonell-Carrera, C., Gunalp, P., Saorin, J. L., & Hess-Medler, S. (2020). Think spatially with game engine. ISPRS International Journal of Geo-Information, 9(3), 159. https://doi.org/10.3390/ijgi9030159
  • Conner, C. R., Kadipasaoglu, C. M., Shouval, H. Z., Hickok, G., Tandon, N., & Hinojosa, J. A. (2019). Network dynamics of Broca’s area during word selection. PLOS ONE, 14(12), e0225756. https://doi.org/10.1371/journal.pone.0225756
  • Cutrell, E., Dumais, S. T., & Teevan, J. (2006). Searching to eliminate personal information management. Communications of the ACM, 49(1), 58–64. https://doi.org/10.1145/1107458.1107492
  • Dincer, S. (2016). Assessing the computer literacy of university graduates. The Third International Conference on Open and Flexible Education, Hong Kong.
  • Dinneen, J. D., & Frissen, I. (2020). Mac users do it differently: The role of operating system and individual differences in file management. Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA. https://doi.org/10.1145/3334480.3382944
  • Dinneen, J. D., & Julien, C. A. (2019). What’s in people’s digital file collections? Proceedings of the Association for Information Science and Technology, 56(1), 68–77. https://doi.org/10.1002/pra2.64
  • Dinneen, J. D., & Julien, C. A. (2020). The ubiquitous digital file: A review of file management research. Journal of the Association for Information Science and Technology, 71(1), 1–32. https://doi.org/10.1002/asi.24222
  • Dinneen, J. D., Julien, C.-A., & Frissen, I. (2019). The scale and structure of personal file collections. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK. https://doi.org/10.1145/3290605.3300557
  • Dourish, P., Edwards, W. K., LaMarca, A., Lamping, J., Petersen, K., Salisbury, M., & Thornton, J. (2000). Extending document management systems with user-specific active properties. ACM Transactions on Information Systems, 18(2), 140–170. https://doi.org/10.1145/348751.348758
  • Dumais, S. T., Cutrell, E., Cadiz, J. J., Jancke, G., Sarin, R., & Robbins, D. C. (2003). Stuff I’ve seen: A system for personal information retrieval and re-use. Proceedings of the 26th annual international ACM SIGIR conference on Research and Development in Information Retrieval, Toronto, Canada (pp. 72–79). ACM Press. https://doi.org/10.1145/860435.860451
  • Elsweiler, D., Baillie, M., & Ruthven, I. (2011). What makes re-finding information difficult? A study of email re-finding. In Advances in information retrieval (Vol. 6611, pp. 568–579). Springer. https://doi.org/10.1007/978-3-642-20161-5_57
  • Elsweiler, D., & Ruthven, I. (2007). Towards task-based personal information management evaluations. Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, Amsterdam, The Netherlands. https://doi.org/10.1145/1277741.1277748
  • Farina, P. A. (2005). A comparison of two desktop search engines: Google Desktop Search (beta) vs. Windows XP Search Companion. 21st Computer Science Seminar, Hartford, CT, USA.
  • Fertig, S., Freeman, E., & Gelernter, D. (1996). “Finding and reminding” reconsidered. ACM SIGCHI Bulletin, 28(1), 66–69. https://doi.org/10.1145/249170.249187
  • Fitchett, S., & Cockburn, A. (2015). An empirical characterisation of file retrieval. International Journal of Human-Computer Studies, 74, 1–13. https://doi.org/10.1016/j.ijhcs.2014.10.002
  • Fitchett, S., Cockburn, A., & Gutwin, C. (2014). Finder highlights: Field evaluation and design of an augmented file browser. Proceedings of the 32nd annual ACM conference on Human factors in computing systems, Toronto, Canada (pp. 3685–3694). ACM. https://doi.org/10.1145/2556288.2557014
  • Gonçalves, D., & Jorge, J. A. (2003). An empirical study of personal document spaces. Proceedings of DSV-IS’03, Funchal, Madeira Island, Portugal (pp. 46–60). Springer.
  • Hegarty, M., Richardson, A. E., Montello, D. R., Lovelace, K., & Subbiah, I. (2002). Development of a self-report measure of environmental spatial ability. Intelligence, 30(5), 425–447. https://doi.org/10.1016/S0160-2896(02)00116-2
  • Henderson, S., & Srinivasan, A. (2009). An empirical analysis of personal digital document structures. In M.J. Smith & G. Salvendy (Eds.), HCI international 2009 (pp. 394–403). Springer.
  • Jones, W. (2007). Keeping found things found: The study and practice of personal information management. Morgan Kaufmann.
  • Jones, W., Capra, R., Diekema, A., Teevan, J., Pérez-Quiñones, M., Dinneen, J. D., & Hemminger, B. (2015). “For telling” the present: Using the Delphi method to understand personal information management practices. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, Seoul, Republic of Korea (pp. 3513–3522). ACM. https://doi.org/10.1145/2702123.2702523
  • Jones, W., & Teevan, J. (2007). Personal information management. University of Washington Press.
  • Jones, W., Wenning, A., & Bruce, H. (2014). How do people re-find files, emails and web pages? iConference 2014 Proceedings, Berlin, Germany (pp. 552–564). https://doi.org/10.9776/14136
  • Kelly, D., & Teevan, J. (2007). Understanding what works: Evaluating PIM tools. In W. Jones & J. Teevan (Eds.), Personal information management (pp. 190–204). University of Washington Press.
  • Kirk, D., Sellen, A., Rother, C., & Wood, K. (2006). Understanding photowork. SIGCHI conference on Human Factors in Computing Systems (pp. 761–770). ACM. https://doi.org/10.1145/1124772.1124885
  • Kobayashi, M., & Takeda, K. (2000). Information retrieval on the web. ACM Computing Surveys, 32(2), 144–173. https://doi.org/10.1145/358923.358934
  • Lansdale, M. W. (1988). The psychology of personal information management. Applied Ergonomics, 19(1), 55–66. https://doi.org/10.1016/0003-6870(88)90199-8
  • Lowe, M. (2006). Evaluation of desktop search applications (Technical report). Sydney, Australia: Kalio.
  • Malone, T. W. (1983). How do people organize their desks? Implications for the design of office information systems. ACM Transactions on Office Information Systems, 1(1), 99–112. https://doi.org/10.1145/357423.357430
  • Massey, C., TenBrook, S., Tatum, C., & Whittaker, S. (2014). PIM and personality: What do our personal file systems say about us? Proceedings of the 32nd annual ACM conference on Human factors in computing systems, Toronto, Canada (pp. 3695–3704). ACM. https://doi.org/10.1145/2556288.2557023
  • Mioduser, D., Nachmias, R., & Forkosh-Baruch, A. (2009). New literacies for the knowledge society. In J. M. Voogt & G. A. Knezek (Eds.), International handbook of information technology in primary and secondary education (pp. 23–42). Springer.
  • Neisser, U. (1964). Visual search. Scientific American, 210(6), 94–102. https://doi.org/10.1038/scientificamerican0664-94
  • Nwagwu, W. (2023). “Digesting the abundance of idol matter” key factors in personal information management experiences of selected social science faculty. VINE Journal of Information and Knowledge Management Systems, 53(3), 544–565. https://doi.org/10.1108/VJIKMS-10-2020-0182
  • Nwagwu, W. E., & Donkor, A. B. (2022). Personal information creation, storage and finding behaviours of faculty in selected universities in Ghana. African Journal of Library, Archives & Information Science, 32(1), 123–138.
  • Obendorf, H., Weinreich, H., Herder, E., & Mayer, M. (2007). Web page revisitation revisited: Implications of a long-term click-stream study of browser usage. Proceedings of the SIGCHI conference on Human factors in computing systems, San Jose, California, USA (pp. 597–606). ACM. https://doi.org/10.1145/1240624.1240719
  • Rader, E. (2009). Yours, mine and (not) ours: Social influences on group information repositories. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2095–2098). ACM. https://doi.org/10.1145/1518701.1519019
  • Raskin, J. (2000). The humane interface: New directions for designing interactive systems. ACM Press/Addison-Wesley Publishing Co.
  • Ravasio, P., Schär, S. G., & Krueger, H. (2004). In pursuit of desktop evolution: User problems and practices with modern desktop systems. ACM Transactions on Computer-Human Interaction, 11(2), 156–180. https://doi.org/10.1145/1005361.1005363
  • Riès, S. K., Dronkers, N. F., & Knight, R. T. (2016). Choosing words: Left hemisphere, right hemisphere, or both? Perspective on the lateralization of word retrieval. Annals of the New York Academy of Sciences, 1369(1), 111. https://doi.org/10.1111/nyas.12993
  • Russell, D., & Lawrence, S. (2007). Search everything. In W. Jones & J. Teevan (Eds.), Personal information management (pp. 153–166). University of Washington Press.
  • Song, G., & Ling, C. (2011). Users’ attitude and strategies in information management with multiple computers. International Journal of Human-Computer Interaction, 27(8), 762–792.
  • Sweeten, G., Sillence, E., & Neave, N. (2018). Digital hoarding behaviours: Underlying motivations and potential negative consequences. Computers in Human Behavior, 85, 54–60. https://doi.org/10.1016/j.chb.2018.03.031
  • Teevan, J., Alvarado, C., Ackerman, M. S., & Karger, D. R. (2004). The perfect search engine is not enough: A study of orienteering behavior in directed search. SIGCHI conference on Human Factors in Computing Systems, Vienna, Austria (pp. 415–422). ACM Press. https://doi.org/10.1145/985692.985745
  • Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving decisions about health, wealth, and happiness. Yale University Press.
  • Thompson, E. R. (2008). Development and validation of an international English big-five mini-markers. Personality & Individual Differences, 45(6), 542–548. https://doi.org/10.1016/j.paid.2008.06.013
  • Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97–136. https://doi.org/10.1016/0010-0285(80)90005-5
  • Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5(2), 207–232. https://doi.org/10.1016/0010-0285(73)90033-9
  • Tversky, A., & Kahneman, D. (1990). Judgment under uncertainty: Heuristics and biases. In G. Shafer & J. Pearl (Eds.), Readings in uncertain reasoning (pp. 32–39). Morgan Kaufmann Publishers Inc.
  • Van Alstyne, J. H. (2023). Toward an understanding of the personal information management discourses of youth. University of Rochester.
  • Wechsler, D. (1945). The Wechsler Memory Scale. The Psychological Corporation, San Antonio, TX.
  • Whittaker, S., & Hirschberg, J. (2001). The character, value, and management of personal paper archives. ACM Transactions on Computer-Human Interaction, 8(2), 150–170. https://doi.org/10.1145/376929.376932
  • Whittaker, S., Matthews, T., Cerruti, J., Badenes, H., & Tang, J. (2011). Am I wasting my time organizing email? A study of email refinding. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Vancouver, Canada (pp. 3449–3458). ACM. https://doi.org/10.1145/1978942.1979457
  • Whittaker, S., & Sidner, C. (1996). Email overload: Exploring personal information management of email. Proceedings of the SIGCHI conference on Human Factors in Computing Systems: Common Ground, Vancouver, Canada (pp. 276–283). ACM Press.