Thursday, December 29, 2011

From Folders to Facets

A recent taxonomy project I completed involved creating a new taxonomy for a financial services client that was migrating its internal content from shared drive folders to a SharePoint-based intranet, which also included automated indexing and a search engine (FAST). The new taxonomy will help support the search functionality, and taxonomy terms will also display in the left-hand margin (called the Refinement Panel), so that users can refine/narrow their initial search results by selecting terms from several attributes/filters/facets. The client had already made a start on a taxonomy by the time I became involved. Not surprisingly, the client-created taxonomy followed the structure of the existing folder names quite closely. After all, the folder structure was their only reference point. It became apparent that a taxonomy for folders and a taxonomy for facets, even for the same content, should be designed quite differently.

A hierarchy of nested folders has the following characteristics:
  1. It is designed to gather and group similar documents together.
  2. It is usually designed and created by a person who is uploading/storing documents with the frame of mind of “where can I put these so that I might find them later.”
  3. A document can go into only one folder and thus under only one category.
  4. A folder can be located within only one parent folder.
  5. The hierarchy of nested folders thus may become quite deep, such as six or seven levels.
  6. Folder names at deeper levels can become long and complex to describe a combination of criteria (a taxonomy design characteristic called pre-coordination).

A faceted taxonomy for search refinement has the following characteristics:
  1. It is designed to refine and narrow a search by specific criteria.
  2. It is designed to help all members of an enterprise find documents, including documents uploaded by different people in different departments.
  3. A document can be assigned multiple taxonomy terms, even terms from within the same facet/broad category.
  4. A taxonomy term may display “under” more than one parent taxonomy term, as long as it is a logical hierarchy. (This feature is called “polyhierarchy.”)
  5. The displayed hierarchy of terms is not so deep, usually only three levels.
  6. Taxonomy term names stay simple, since they are intended to be used in combination (a taxonomy design characteristic known as post-coordination).

With this many differences between hierarchical folders and refinement facets, it’s inevitable that the taxonomy for each will differ, even if the content/documents and the users remain the same. Actually, a nested folder structure may or may not even constitute a “taxonomy.” It depends on whether the folder system was designed with a consistent structure and folder names or whether it just grew ad hoc.
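The contrast between the two lists can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the folder path, the facet names (department, doc_type, year), the terms, and the refine function are all hypothetical, and this is not how SharePoint or FAST implements the Refinement Panel. It simply shows one pre-coordinated folder location versus simple terms that are combined at search time.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # facet name -> set of simple taxonomy terms (several terms per facet allowed)
    facets: dict[str, set[str]] = field(default_factory=dict)


# Pre-coordination: one location per document, with the criteria combined
# into long folder names (hypothetical path).
folder_path = "Compliance/Policies/2011 Retail Lending Policies - Approved"

# Post-coordination: simple terms assigned per facet (hypothetical terms).
docs = [
    Document("Lending policy", {
        "department": {"Compliance", "Retail Lending"},
        "doc_type": {"Policy"},
        "year": {"2011"},
    }),
    Document("Lending audit report", {
        "department": {"Compliance"},
        "doc_type": {"Report"},
        "year": {"2011"},
    }),
    Document("Branch procedures", {
        "department": {"Retail Lending"},
        "doc_type": {"Procedure"},
        "year": {"2010"},
    }),
]

# Polyhierarchy: a term may display under more than one broader term.
broader_terms = {"Mortgage Lending": {"Lending", "Real Estate"}}


def refine(documents, **selected):
    """Keep only the documents that carry every selected facet term."""
    return [
        d for d in documents
        if all(term in d.facets.get(facet, set())
               for facet, term in selected.items())
    ]


compliance_2011 = [d.title for d in refine(docs, department="Compliance", year="2011")]
lending_policies = [d.title for d in refine(docs, department="Retail Lending", doc_type="Policy")]
```

Note that "Lending policy" turns up under both departments, which a single folder location could not offer, and that each added facet selection narrows the result set further.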

A year and a half ago I was involved with a similar taxonomy project for the wind energy company First Wind. In addition to designing a faceted taxonomy for the Refinement Panel to support search in SharePoint, I was also tasked with improving the nested folder structure and folder names already in use in SharePoint, which were not going to go away. I remember being asked then whether I could just create a single taxonomy for both purposes. The answer was no, not entirely. There would be overlap, but there would also be differences. To the stakeholders, that seemed like a lot of additional work, but to me, the taxonomist, that’s simply the nature of my work, and I enjoy the diversity of building different kinds of taxonomies. In the end, more work put in by the taxonomist means less work needed by the users.

Monday, November 28, 2011

Multilingual Taxonomies

We know that taxonomies help information-seekers browse or search for desired documents/information. Taxonomies provide the bridge between the user’s choice of words and the wording within the desired documents. But what if the user actually speaks a different language than that of the content? Documents can be translated (automatically if it’s just to get the general meaning, or by human translators when accuracy is important), but that’s only done after the document is found. To support the findability of foreign language documents, what is needed is a bilingual or multilingual taxonomy (“bilingual” meaning in two languages, and “multilingual” meaning in three or more languages).

This Thursday, December 1, I will be presenting on the topic of multilingual taxonomies at the Gilbane Conference in Boston, where the focus is web and enterprise content management. This session, which will be shared with the co-speaker Ross Lehrer of WAND, appears to be the only one in the conference dedicated to taxonomies and the only presentation with the word “multilingual” in its name. The topic will be of interest both to those concerned with multilingual content but with no experience with taxonomies and to those with an interest in taxonomies but no experience with multilingual content.

The description of the session (which I did not write) on the conference website says: “Multilingual content dramatically expands the potential market for your products, and multilingual taxonomies often need to be part of your multilingual strategy.” This description applies better to my colleague’s presentation, especially since the taxonomies that his company builds are product taxonomies. My presentation, on the other hand, addresses taxonomies for more than just websites of products, such as taxonomies for retrieving articles written in different languages.

The issue is whether the multilingual content is created and managed internally or externally to your organization. If your multilingual content is what your organization creates, such as additional language versions of a public website for a global market, then it is likely that the content in the different languages is managed internally but separately, by separate language teams. The content is similar but not identical in each language, and the taxonomies that support search and browse may also be created and managed separately. Having taxonomies in different languages, however, is not exactly the same as a “multilingual taxonomy.”

A good analogy would be a translated book. The book’s index should not simply be translated; rather a new index is created by an indexer, who is a native-language speaker of the translated language, based on the newly translated text. Consulting the original language index is fine, but directly translating it will have less than ideal results. Similarly, if you have a website translated into another language, and the website has a taxonomy for browsing for specific content pages, that taxonomy should not simply be translated, but rather a new second-language taxonomy should be created, consulting the first taxonomy, of course.

By contrast, a truly multilingual taxonomy connects users who speak one language to content that is in another language. There needs to be a one-to-one correspondence between terms across both languages, and the different language versions need to be managed together. It’s somewhat complicated to design and create, but software tools are available for this, and the result is a powerful aid to searching and browsing across languages. What is important is to match your multilingual taxonomy design to the specific goals, either (1) service in different language markets, each with their own language content; or (2) users being able to access content in a language which they don’t speak.
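As a rough sketch of what “one-to-one correspondence” and “managed together” can mean in practice, the following Python models each concept once, with one label per language, so that a search term in one language retrieves documents written in another. The concept IDs, labels, documents, and function names are all made up for illustration; real taxonomy management tools have their own data models.

```python
# Each concept is stored once, with exactly one preferred label per language
# (hypothetical concept IDs and labels).
concepts = {
    "c1": {"en": "wind energy", "fr": "énergie éolienne", "de": "Windenergie"},
    "c2": {"en": "taxes", "fr": "impôts", "de": "Steuern"},
}

# Documents are indexed with concept IDs, not with language-specific words.
documents = [
    {"title": "Rapport sur l'énergie éolienne", "lang": "fr", "concepts": {"c1"}},
    {"title": "Steuerleitfaden", "lang": "de", "concepts": {"c2"}},
]


def find_concept(label, lang):
    """Return the ID of the concept whose label in `lang` matches, or None."""
    for concept_id, labels in concepts.items():
        if labels.get(lang, "").lower() == label.lower():
            return concept_id
    return None


def search(label, lang):
    """Return titles of documents in any language tagged with the concept."""
    concept_id = find_concept(label, lang)
    return [d["title"] for d in documents if concept_id in d["concepts"]]


def missing_labels(languages):
    """List (concept ID, language) pairs that break one-to-one coverage."""
    return [
        (concept_id, lang)
        for concept_id, labels in concepts.items()
        for lang in languages
        if lang not in labels
    ]


english_query_hits = search("wind energy", "en")
gaps = missing_labels(["en", "fr", "de"])
```

Here an English-speaking user searching on “wind energy” is led to the French report, and missing_labels is the kind of check that keeps the language versions in step when a new concept is added.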

Sunday, November 20, 2011

Taxonomies: Not New, but Growing

What’s new in the field of taxonomies? I am asked this question following my attendance at the two-day Taxonomy Boot Camp conference (October 31 – November 1, Washington, DC), the only conference dedicated to information management taxonomies. There is actually not a lot that is new in taxonomies, which is OK. Rather, taxonomies are new in more and more applications, organizations, and implementations, and that is more significant.

We actually don’t want anything significantly new in taxonomy design, because taxonomies serve users with predictable, standard methods of navigation. For example, the untrained user should be able to understand a display of broader and narrower terms. Taxonomies have actually been around a lot longer than most people realize (and I don’t mean the Linnaean taxonomy of living organisms). Taxonomies (known as controlled vocabularies) have been around since the late 1800s for cataloging books and other library materials, such as Library of Congress Subject Headings, and for indexing journal articles, as in the Readers’ Guide to Periodical Literature published by the H.W. Wilson Company. For generations, library science students have been able to take courses in designing and using thesauri for indexing periodical literature.

Taxonomies now, however, are showing up in more and more places. These include public websites that contain numerous data records, such as ecommerce sites that list all their products or databases of movies, music, and recipes; a proliferation of new niche subscription database vendors; business-to-business databases; and, most significantly, the growing content and document management repositories of any medium-to-large enterprise. Taxonomy consultants such as myself are increasingly finding that taxonomy projects are not merely those of building new taxonomies from scratch, but also of revising, improving, integrating, and repurposing existing taxonomies that have been created in the past 5-10 years.

It was good to have a presentation at Taxonomy Boot Camp, perhaps its first, that dealt with taxonomies for managing image files (known as digital asset management or DAM), rather than just text-based documents. Additional applications of taxonomies would be welcome topics at future TBC conferences.

Taxonomy Boot Camp is co-located with the conferences KM World, Enterprise Search Summit, and SharePoint Symposium. The switch in scheduling it from following those three conferences to preceding them seems to signify a shift in perspective, too. Taxonomies are no longer seen as just an add-on specialization, but rather as a basic system that information professionals need to understand as a component of knowledge management, search, and SharePoint implementations.

Finally, the spread in adoption of taxonomies is indicated by the fact that Taxonomy Boot Camp for the first time included both a basic and a “Beyond the Basics” track for one of its two days. Enough taxonomies are now in place, and enough people are experienced in taxonomies, that more advanced topics can have their own audience. Despite compelling speakers in the concurrent basic track, the advanced sessions were well attended. I look forward to hearing more about what taxonomies can do at the next Taxonomy Boot Camp conference, October 16-17, 2012.

Saturday, November 19, 2011

Introduction to a New Blog on Taxonomies

I have written a number of blog posts on taxonomy topics, but until now those posts have appeared not on a blog of my own but elsewhere: on the blog of an employer, Project Performance Corporation; on The Taxonomy Blog of my colleague Marlene Rockmore; and on the blog of Earley & Associates, where I did some contracting work.

At first I was not certain that I had enough to say to start my own taxonomy blog. Upon completing my book, The Accidental Taxonomist, at the end of 2009, I certainly did not have much more to say on the subject after writing over 400 pages. Since then, I have been gaining additional experience with taxonomies and attending more conferences and other events, so I finally feel that there are indeed more new ideas I can share about taxonomies, and more than I could post on my employer’s blog. (I have to give my co-workers turns to post, too!) I do not plan to write another entire book on taxonomies (maybe just a chapter somewhere), so I don’t have to keep the thoughts to myself for later.

Where will my new blog post ideas come from?

As a consultant, I am constantly engaging in new taxonomy projects with new experiences, new lessons to be learned, and new insights into the field. My client names should be kept confidential, so writing complete case studies may not be feasible, but the short informal nature of a blog post is quite appropriate to share some thoughts.

I also attend a number of conferences during the course of a year, and there are always new ideas coming out of these events. Some of my blog posts will be based on my own presentation topics, but they will not repeat the slide bullets. Instead I will provide some commentary about the presentation topic, such as why it is significant, timely, or of interest, or what my concerns are. Other posts will be my observations and ideas gleaned from what others presented.

I may decide to revisit a topic in my book for a blog post. But I could also explore some new direction of topics related to taxonomies, such as content management, information architecture, search, or digital asset management.