Searching in the Tower of Babel
Most of us, at some point, have used multiple search engines to resolve a given information need. For example, in purchasing a consumer product, we might scour half a dozen retail websites to find a good deal. Or in choosing a family holiday, we might visit a set of travel sites with common search criteria. In each case, our task is relatively straightforward: we take our keywords and copy them, along with any other details, into the respective search boxes.
But for knowledge workers, whose task is to perform searches that are comprehensive, accurate and repeatable, it is a different story. Their information needs can be complex and structured, and their search strategies often draw on platform-specific syntax and operators. For example, a recruiter wanting to fill a particular data science role may want to search LinkedIn, Stack Overflow, GitHub, and other social forums to find suitable candidates. Likewise, a clinician or information professional performing a systematic literature review might want to search numerous databases such as PubMed, Embase, Web of Science, PsycINFO, and more. In each case, their search query has to be manually ‘translated’ to the syntax and user interface of each database.
For a relatively simple query, this may not be a major undertaking, especially if it makes modest use of platform-specific syntax and operators. However, the searcher still has to recognise which elements are platform-specific, work out what the equivalents are in the other databases, then manually edit their query. All of this is tedious, error-prone, and inefficient.
So we’re delighted this week to announce support for automated search strategy translation. This means that in addition to natively searching Google, Bing, Google Scholar, PubMed, Epistemonikos and the TRIP database, you can now use 2Dsearch to search:
Ovid
Cochrane Library
Embase
Web of Science
CINAHL
PsycINFO
Scopus
and various other sources.
This is courtesy of a quite wonderful resource known as the PolyGlot Search Translator (PST), which ‘helps to automatically translate searches across multiple databases by modifying the database-specific syntax’. It is available as an online demo and on GitHub as an open-source JavaScript library.
I should really say ‘semi-automated’ support above, as providing accurate and reliable translation of search strategies can be a significant undertaking, often requiring skilled human judgment. In particular, thesaurus terms may still need to be mapped manually. But much of the task is routine, and by integrating PST’s translation capabilities with the visual framework of 2Dsearch, all sorts of magic can happen.
For example, here is a published search strategy on the subject of ‘Galactomannan detection for invasive aspergillosis in immunocompromised patients’:
1 "Aspergillus"[MeSH]
2 "Aspergillosis"[MeSH]
3 "Pulmonary Aspergillosis"[MeSH]
4 aspergill*[tiab]
5 fungal infection[tw]
6 (invasive[tiab] AND fungal[tiab])
7 1 OR 2 OR 3 OR 4 OR 5 OR 6
8 "Serology"[MeSH]
9 Serology"[MeSH]
10 (serology[tiab] OR serodiagnosis[tiab] OR serologic[tiab])
11 8 OR 9 OR 10
12 "Immunoassay"[MeSH]
13 (immunoassay[tiab] OR immunoassays[tiab])
14 (immuno assay[tiab] OR immuno assays[tiab])
15 (ELISA[tiab] OR ELISAs[tiab] OR EIA[tiab] OR EIAs[tiab])
16 immunosorbent[tiab]
17 12 OR 13 OR 14 OR 15 OR 16
18 Platelia[tw]
19 "Mannans"[MeSH]
20 galactomannan[tw]
21 18 OR 19 OR 20
22 11 OR 17 OR 21
23 7 AND 22
Like many complex searches, this one is hard to visualize, and even harder to validate, debug or translate. But when opened in 2Dsearch, we see that the overall structure consists of a conjunction of two disjunctions (Lines 7 and 22), the first of which articulates variations on the fungal infection concept, while the second contains various nested disjunctions to capture the diagnostic test (serology) and associated procedures:
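To make that nesting concrete, here is a minimal Python sketch of how a line-numbered strategy can be unfolded into a single Boolean expression. It is our own illustration, not the code used by 2Dsearch or PST, and it assumes that a ‘combination’ line contains only line numbers joined by AND, OR or NOT, that the last line is the final combination, and that every other line is a single term or is already parenthesised. The function name expand_strategy and the shortened sample strategy are hypothetical.

```python
import re

# A combination line holds only line numbers joined by Boolean operators,
# e.g. "1 OR 2 OR 3". Anything else is treated as a search term (assumption).
COMBO = re.compile(r"\d+(?:\s+(?:AND|OR|NOT)\s+\d+)+")


def expand_strategy(lines):
    """Unfold a line-numbered strategy into one nested Boolean string."""
    statements = {}
    for raw in lines:
        number, _, body = raw.strip().partition(" ")
        statements[int(number)] = body

    def resolve(n):
        body = statements[n]
        if not COMBO.fullmatch(body):
            return body  # a plain term: nothing to unfold

        def substitute(match):
            ref = int(match.group())
            expanded = resolve(ref)
            # Parenthesise nested combinations so precedence is preserved.
            if COMBO.fullmatch(statements[ref]):
                return f"({expanded})"
            return expanded

        return re.sub(r"\d+", substitute, body)

    # Assumption: the highest-numbered line is the final combination.
    return resolve(max(statements))


# A shortened, hypothetical version of the strategy above.
strategy = [
    '1 "Aspergillus"[MeSH]',
    "2 aspergill*[tiab]",
    "3 1 OR 2",
    "4 galactomannan[tw]",
    "5 Platelia[tw]",
    "6 4 OR 5",
    "7 3 AND 6",
]

print(expand_strategy(strategy))
```

Even in this toy version, the ‘conjunction of two disjunctions’ shape is visible in the output: two parenthesised OR groups joined by a single AND.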
Once we have our strategy in 2Dsearch, we can validate, debug and optimise it by combining and recombining the various elements, like visual Lego blocks, to experiment with alternative configurations. (BTW, did you spot the error in the published strategy above? 2Dsearch did – see below). In the right-hand pane we see real-time results, from PubMed or a number of other key databases:
But crucially, by integrating with PST, we are now offered automated translations to other databases. For example, if we select the Query tab on the result pane, we’ll see our original query expressed as a Boolean string, along with a number of automated translations:
So now we can simply select our chosen database, and copy the translated search string across. In essence, with 2Dsearch we can create, optimize and debug our search, then have it translated as required.
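To give a flavour of what the routine part of translation involves, here is a small Python sketch that rewrites a few PubMed field tags into approximate Ovid MEDLINE equivalents. The mapping table is our own illustrative assumption and is deliberately tiny; it is not PST's rule set, and the function name pubmed_to_ovid is hypothetical. The subject heading rule only makes sense for a target database that also uses MeSH; for other thesauri the headings would still need manual mapping, as noted earlier.

```python
import re

# Illustrative, approximate mappings (assumptions, not PST's actual rules).
FIELD_TAGS = {
    "[tiab]": ".ti,ab.",  # title/abstract
    "[tw]": ".tw.",       # text word (only roughly equivalent)
}

# "Heading"[MeSH] in PubMed becomes an exploded subject heading in Ovid.
MESH = re.compile(r'"([^"]+)"\[MeSH\]')


def pubmed_to_ovid(query):
    """Rewrite a PubMed-style query string into Ovid-style syntax."""
    query = MESH.sub(r"exp \1/", query)
    for tag, suffix in FIELD_TAGS.items():
        query = query.replace(tag, suffix)
    return query


pubmed = '"Aspergillosis"[MeSH] OR aspergill*[tiab] OR fungal infection[tw]'
print(pubmed_to_ovid(pubmed))
```

Simple substitution like this covers the mechanical cases. Anything it does not recognise is passed through unchanged, which is exactly where human judgment is still needed.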
In closing
2Dsearch is a framework for structured searching in which information needs are expressed by manipulating objects on a two-dimensional canvas. Transforming logical structure into physical structure eliminates many sources of syntactic error, makes the query semantics more transparent, and offers an open-access platform for sharing reproducible search strategies and best practices.
Our integration with PST minimizes tedious and error-prone manual translation. But it also represents something far greater: the prospect of a universal language for search, in which an individual’s information needs can be articulated in a generic manner, with the task of mapping to the syntax of underlying databases delegated to platform-specific adapters. Such a development could have profound implications for the way in which search skills are taught, learnt and applied.
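As a thought experiment, the adapter idea might look something like the Python sketch below: the query is held as a generic tree, and each platform supplies only a small function that knows how to write out a single term. Everything here (the Term and Group classes, the generic field names, the two adapter functions) is hypothetical and chosen for illustration; it does not describe how 2Dsearch or PST are implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str
    field: str  # generic field name: "subject" or "title_abstract"


@dataclass(frozen=True)
class Group:
    op: str          # "AND" or "OR"
    children: tuple  # nested Term or Group objects


def render(node, format_term):
    """Walk the generic tree, delegating term syntax to the adapter."""
    if isinstance(node, Term):
        return format_term(node)
    parts = (render(child, format_term) for child in node.children)
    return "(" + f" {node.op} ".join(parts) + ")"


# Platform-specific adapters: each knows only its own term syntax.
def pubmed_term(term):
    if term.field == "subject":
        return f'"{term.text}"[MeSH]'
    return f"{term.text}[tiab]"


def ovid_term(term):
    if term.field == "subject":
        return f"exp {term.text}/"
    return f"{term.text}.ti,ab."


query = Group("OR", (
    Term("Aspergillosis", "subject"),
    Term("aspergill*", "title_abstract"),
))

print(render(query, pubmed_term))
print(render(query, ovid_term))
```

The point of the design is that the information need is written once, and supporting a new database means adding one adapter rather than rewriting every strategy.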
We’ll be revisiting this theme in future posts when we start to explore further visualization options, focusing on alternative layouts that communicate different aspects of searches and allow them to be understood and optimized in different ways. For now, take a look at 2Dsearch, try a few searches for yourself, and let us know what you think.