We have just completed development on IGP:Information Architect 2 (IA2). It drastically simplifies the user interface from IA1, while increasing the ease of creating multi-lingual vocabularies.
Here is a quick overview of what it does, extracted from the User's Manual.
Vocabulary Creation and Maintenance
IGP:Information Architect 2 allows multiple controlled vocabularies to be created, modified, translated and used across multiple and large systems in multiple languages. The primary purpose of this application is the maintenance of the controlled vocabularies that are used in and by other applications to facilitate search and retrieval.
IGP:Information Architect 2 has been designed both as a specific tool for use within the IGP:ECMS Solutions application framework, and as a general Web Services Application for access by other applications. IGP:Information Architect 2 is deployed as a standard component with all IGP:ECMS Solutions products.
The application has a set of Application Programming Interfaces (APIs) that allow other authorized systems to retrieve vocabulary terms for use in their own context.
Purpose and Use
The reason Controlled Vocabularies are created is to facilitate browse, search and content/information retrieval systems. This means any particular vocabulary is used and useful at two points:
- The classification interface - where controlled vocabulary terms and other metadata are applied to retrievable content objects.
- The search and retrieval interface - where users can access tools that allow them to locate content based on various information facets or explicit metadata.
IGP:Information Architect 2 is designed for use in a template driven web interface system. Any vocabulary or vocabulary point can be made available to a data form or other classification interface for application of a term, term sequence or multiple terms to a classification field.
Within IGP:ECMS Solutions, metadata fields are specifically indexed as multiple facets in IGP:RealSearch, enabling the vocabulary terms to be used in standard and custom search interfaces.
Internationalization and Localization
In addition to the vocabulary maintenance tools, IGP:Information Architect 2 also contains the translation maintenance tools for localization of all IGP:ECMS Solutions application interfaces that can be localized (primarily IGP:InfoViewer 2 portals).
This approach concentrates multi-lingual tasks in one application rather than dividing them across multiple applications and processes. It also makes localization accessible to "normal" users through a relatively intuitive interface, in which localization terms are easy to review and change as required.
Knowledge Organization Systems (KOS)
The term "knowledge organization" (or "organization of knowledge", "organization of information" or "information organization") traditionally designates a field of study related to Library and Information Science (LIS), although it is now in more general use to describe any type of formal information retrieval system (excluding full-text search). In this sense, knowledge organization (KO) covers activities such as document description, indexing and classification performed in libraries, databases, archives and other information repositories.
In the library context these activities are usually done by librarians, archivists, and subject specialists. In more specific and smaller organizations these tasks can be carried out by a wide range of people, generally organization domain experts, who can define and control the nature and quality of such knowledge organizing processes (KOP) as well as the knowledge organizing systems (KOS) used to organize documents, document representations and concepts. This control is executed using controlled vocabularies in an application framework.
IGP:Information Architect 2 provides the framework for the creation and distribution of specific controlled vocabularies to support a wide range of KOS requirements. Four types of controlled vocabularies are supported:
- Authority files
- Glossaries
- Taxonomies
- Thesauruses
What is a Controlled Vocabulary?
Controlled vocabularies provide a way to organize knowledge or information for subsequent retrieval.
A controlled vocabulary is a subset of natural language that is used to tag documents using some form of classification scheme, and then to find content through navigation or search. The goal of a controlled vocabulary is to establish agreement between the concepts within an application interface and the natural language vocabulary of the person using it. Controlled vocabularies are essential to the management of large collections.
Controlled Vocabularies are used in subject indexing schemes, subject headings, authority files, thesauri and taxonomies.
Controlled vocabularies as they are broadly defined and used exhibit the relationships depicted in the following figure:
The figure shows Controlled Vocabulary term relationships by vocabulary type. Vocabularies are in green, term types are in blue. This illustrates the increasing complexity of creation and maintenance of the various vocabulary types.
Controlled vocabulary schemes mandate the use of predefined, authorised terms that have been preselected by the designer of the vocabulary, in contrast to natural language vocabularies, where there is no restriction on the vocabulary.
By applying controlled vocabulary terms to different content, it is possible to ensure consistency of search and browse retrieval experiences for end users. It also allows content to be presented in a predefined organization context that usually makes it easier for users to interact with large systems.
Internet search engines have made full (free) text search the most popular way to access content. Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system if performance is measured by precision (the percentage of documents in the retrieval list that are actually relevant to the search topic). This is the usual requirement for an ECMS type system. Success is rated by how few results are returned, rather than how many.
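The precision measure described above can be sketched in a few lines of Python. This is only an illustration: the function name, the document identifiers and both result lists are invented for the example and are not measurements from any IGP product.

```python
def precision(retrieved, relevant):
    """Return the fraction of retrieved documents that are relevant."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0  # nothing retrieved, so treat precision as zero
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)


# Hypothetical data: three documents are truly relevant to the topic.
relevant_docs = {"d1", "d2", "d3"}

# A free text search returns many loosely matching documents.
free_text_results = ["d1", "d2", "d4", "d5", "d6", "d7", "d8", "d9"]

# A controlled vocabulary search returns a shorter, tighter list.
controlled_results = ["d1", "d2", "d3", "d4"]

free_text_precision = precision(free_text_results, relevant_docs)
controlled_precision = precision(controlled_results, relevant_docs)
print(free_text_precision, controlled_precision)
```

With this made-up data the shorter list scores higher on precision, which is the sense in which fewer results can count as a better outcome.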
Within and between organizations, controlled vocabularies may be introduced to improve technical communication. The use of controlled vocabulary ensures that everyone is using the same word to mean the same thing in the same context. This consistency of terms is one of the most important concepts in technical writing and information/knowledge management, where effort is expended to use the same word throughout a document or organization instead of slightly different ones (synonyms) to refer to the same thing.
More information on this general topic can be found on Wikipedia in the Controlled Vocabulary article.
IGP:Information Architect 2 allows the creation of any number of controlled vocabularies, across a number of types for any purpose where an organization can benefit from the power and precision which comes from using such methods.
Authority Files
Authority Files are lists of terms that are used to control the variant names for an entity or the domain value for a particular field. Examples include names for countries, individuals, and organizations. Non-preferred terms may be linked to the preferred versions. This type of vocabulary generally does not include a deep organization or complex structure. The presentation may be alphabetical or organized by a shallow node label scheme. This allows for simple navigation, particularly when the authority file is being accessed manually or is extremely large.
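As a rough sketch of how non-preferred terms can be linked to preferred versions, an authority file can be modelled as a lookup table. The structure and the function names below are assumptions made for illustration and do not describe how IA2 stores authority files.

```python
# Hypothetical authority file: preferred name -> known variant names.
AUTHORITY_FILE = {
    "Myanmar": ["Burma"],
    "Côte d'Ivoire": ["Ivory Coast"],
}


def build_index(authority_file):
    """Map every preferred and variant name (case-insensitively) to the preferred name."""
    index = {}
    for preferred, variants in authority_file.items():
        index[preferred.casefold()] = preferred
        for variant in variants:
            index[variant.casefold()] = preferred
    return index


def resolve(name, index):
    """Return the preferred name for any known variant, or None if unknown."""
    return index.get(name.strip().casefold())


index = build_index(AUTHORITY_FILE)
print(resolve("burma", index))
print(resolve("Ivory Coast", index))
```

A flat index like this suits the shallow structure described above: there is no hierarchy to walk, only a variant-to-preferred mapping.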
Glossaries
A glossary is a list of terms, usually with definitions. Strictly speaking, a glossary is not a controlled vocabulary in the classification sense. However, it is a very useful structure in many types of publishing and is included for convenience. Glossary terms may come from a general or specific subject field, or be those used in a particular work. The terms are defined for use within a specific environment and rarely have variant meanings provided.
Taxonomies
Classification and Categorization Schemes
These terms are often used interchangeably and provide ways to separate entities into buckets or relatively broad topic levels. Some examples provide a hierarchical arrangement of numeric or alphabetic notation to represent broad topics. An example of a classification scheme is the Dewey Decimal Classification (a closed system of 10 numeric sections with decimal extensions). Taxonomies are increasingly being used in object oriented design and knowledge management systems to indicate any grouping of objects based on a particular characteristic.
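A hierarchical scheme of this kind can be sketched as nodes that each point to a parent. The three Dewey Decimal classes below are a tiny excerpt chosen for illustration; the dictionary layout and the function name are assumptions, not IA2's data model.

```python
# Each notation maps to (label, parent notation); the root has no parent.
NODES = {
    "500": ("Natural sciences and mathematics", None),
    "510": ("Mathematics", "500"),
    "516": ("Geometry", "510"),
}


def path_from_root(code, nodes):
    """Return the labelled path from the top-level class down to the given notation."""
    path = []
    while code is not None:
        label, parent = nodes[code]
        path.append(f"{code} {label}")
        code = parent
    return list(reversed(path))


geometry_path = path_from_root("516", NODES)
print(" > ".join(geometry_path))
```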
Subject Headings
Subject Headings provide a set of controlled terms to represent the subjects of items in a collection. Subject heading lists can be extensive, covering a broad range of subjects. The structure of a subject heading list is generally very shallow, with limited hierarchy. In use, subject headings tend to be pre-coordinated, with rules for how subject headings can be joined to provide more specific concepts.
Within IA2 subject headings can be created as taxonomies if there are no related terms, or as thesauruses if there are related term definitions.
Thesauruses (Thesauri)
Thesauruses are based on concepts, and they show relationships between terms. These include hierarchical, equivalence and associative (related) relationships. Within IA2 these relationships are represented by the notation BT (broader term), NT (narrower term), RT (associative or related), UF (Use for) and LE (Linguistic equivalent).
Associative Relationships may be more granular in some schemes. Preferred terms for indexing and retrieval are identified. Entry terms (or non-preferred terms) point to the preferred terms that are to be used for each concept.
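The relationship notation can be illustrated with a small in-memory model. Everything here is a hypothetical sketch: the class names, the method names and the sample terms are invented, and the automatic reciprocal links (adding a BT also records the matching NT) follow common thesaurus practice rather than documented IA2 behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    bt: set = field(default_factory=set)   # broader terms
    nt: set = field(default_factory=set)   # narrower terms
    rt: set = field(default_factory=set)   # related (associative) terms
    uf: set = field(default_factory=set)   # non-preferred entry terms
    le: dict = field(default_factory=dict)  # language code -> linguistic equivalent


class Thesaurus:
    def __init__(self):
        self.terms = {}        # preferred label -> Term
        self.entry_terms = {}  # non-preferred label -> preferred label

    def term(self, label):
        """Return the Term for a label, creating it on first use."""
        return self.terms.setdefault(label, Term(label))

    def add_broader(self, narrower, broader):
        """Record BT on the narrower term and the reciprocal NT on the broader one."""
        self.term(narrower).bt.add(broader)
        self.term(broader).nt.add(narrower)

    def add_related(self, a, b):
        """Record a symmetric RT link between two terms."""
        self.term(a).rt.add(b)
        self.term(b).rt.add(a)

    def add_use_for(self, preferred, non_preferred):
        """Record UF so the entry term points to the preferred term."""
        self.term(preferred).uf.add(non_preferred)
        self.entry_terms[non_preferred] = preferred

    def preferred(self, label):
        """Resolve an entry term or preferred term to the preferred label, else None."""
        if label in self.entry_terms:
            return self.entry_terms[label]
        return label if label in self.terms else None


thesaurus = Thesaurus()
thesaurus.add_broader("Cats", "Mammals")
thesaurus.add_related("Cats", "Veterinary medicine")
thesaurus.add_use_for("Cats", "Felines")
thesaurus.term("Cats").le["fr"] = "Chats"
print(thesaurus.preferred("Felines"))
```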
Multiple Languages
IGP:Information Architect 2 is a multi-lingual vocabulary tool specifically designed to address the multi-lingual issues of developing countries as well as developed countries. That means we are less concerned with handling multi-language issues between English, French and German (as examples) than with the languages of India, South East Asia, East Asia, Central Asia, Latin America, Africa and so on.
Vocabularies in languages that do not use the Latin alphabet and keyboard, or that use complex diacritic combinations, can be difficult to create and manage. There is not a large body of work on this subject. While users interacting with controlled vocabularies generally have read/speak fluency, their keyboard skills for search term entry or term creation are often limited. Controlled vocabularies can therefore be a major accessibility tool, because users can select terms rather than type them. This is especially true with the increase in mobile computing, which is providing Internet access to ever widening circles of users.
Example 1. Singapore has four official languages: English, Malay, Chinese and Tamil. While English and Malay share the Latin alphabet, Chinese and Tamil use two very different scripts.
Example 2. India has a complex map of languages by state, with Hindi and English being the base pair, but every state/region having at least one additional official language. For many purposes it is essential that communication is carried out in local languages, especially as access to content becomes more mobile driven.
Multi-lingual Vocabulary Concepts
Addressing this diversity of languages, and the concepts and vocabularies they cover, is difficult for vocabulary creators and maintainers. IGP:Information Architect 2 draws, to some degree, on the concepts of ISO 2766:1986 Multi-lingual thesauruses.
Exchange language. A vocabulary can be created in any language, and that language is its exchange language. All terms are therefore created as Exchange Language Terms.
Linguistic Equivalents. A linguistic equivalent is the exchange term expressed in another language. It may be a linguistic synonym, a translation or a transliteration. If there is no exact equivalent, and no Linguistic Equivalent is defined, a vocabulary falls back to the Exchange Language Term.
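The fallback rule can be sketched as a simple lookup. The record layout, the field names and the function name are assumptions made for this illustration and do not reflect the actual IA2 API.

```python
# Hypothetical term record: created in English (the exchange language),
# with linguistic equivalents defined for Malay, Tamil and Chinese.
term = {
    "exchange_language": "en",
    "exchange_term": "Water",
    "equivalents": {"ms": "Air", "ta": "நீர்", "zh": "水"},
}


def display_label(term, language):
    """Return the linguistic equivalent if one is defined, else the Exchange Language Term."""
    return term["equivalents"].get(language, term["exchange_term"])


print(display_label(term, "zh"))  # an equivalent is defined for Chinese
print(display_label(term, "hi"))  # none defined for Hindi, so the exchange term is used
```

Falling back to the exchange term means an interface always has something to display, even when a vocabulary has only been partly translated.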