I’ve been known to rant when I see the pages and pages of links that come up when one searches for “Information Architecture” and all one gets is content management garble. I will have to succumb to the fact that when the dinosaurs are extinct, the kids who survive will be the ones who believe this stuff, and gone will be the good stuff: information models such as logical and physical diagrams, server diagrams and data dictionaries, all the makings of mature information architectures.
I do, however, have to spend some time discussing content management – the credible side of information architecture. Many organizations have deemed these folks “librarians” of the content provided by content owners. Christina Wodtke defines IA as “The art and science of structuring and organizing information systems to help people achieve their goals. Information architects organize content and design navigation systems to help people find and manage information.”
In its simplest terms, content management is another name for publishing. The main objective of publishing is to get the right content to the right person at the right time at the right cost. Publishers manage publications. Key staff include contributors (authors) and editors. Authors create content. Editors decide what content should get published, and how much editing that content requires.
The web was invented by Tim Berners-Lee as a publishing tool. HTML was created to be a publishing mark-up language. That’s the core reason the term web ‘pages’ is used. Content management is web-based publishing. The early stages of web publishing, like the early years of printing, were very dependent on the programmer, in the same way book publishers were reliant on the printer. It was a major technical feat to publish a large website.
Many people like to make their discipline sound complex because that makes them more valuable to the organization. Web publishing sounds very complex. I have personally seen business cards passed around at industry events bearing the title “Information Architect”, only to find out that the holder may be the webmaster or the content librarian at their organization.
Web publishing technology is becoming streamlined and standardized. The focus is moving away from the tools and towards the content. The basic rules and concepts are the same as they have been in publishing for many centuries, whether you are publishing to print or to the web.
A couple of new terms have come to light in recent years that did not exist before, and here is where the architecture comes in.
Information architecture is the name being misused by many web publishers for the discipline of managing the organization and layout of web content. In print, editors have managed information architecture-type challenges for centuries (tables of contents, indexes, etc.).
Why is IA important? It’s all about the metrics, especially when we talk about public-facing content sites. Every company that runs an online storefront, as well as the bricks-and-mortar variety, knows that a good website keeps the buyers coming back, and makes it worth the investment in the first place.
Several metrics are key: Cost of finding information on the site – the time, number of clicks, amount of frustration or precision in finding something. Conversely, there are metrics surrounding not finding the item – success, recall, frustration, and alternatives, which are harder to measure.
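To make the finding metrics a little more concrete, here is a minimal Python sketch of how time, clicks and success could be summarized. The FindingAttempt record and the summarize function are hypothetical names invented for this illustration, and the sketch assumes you already log each attempt somewhere; frustration and recall are not modeled because they are much harder to capture.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FindingAttempt:
    """One visitor's attempt to find an item (hypothetical log record)."""
    seconds: float  # time spent looking
    clicks: int     # number of clicks made
    found: bool     # did the visitor reach what they wanted?


def summarize(attempts):
    """Summarize the cost of finding across a list of attempts."""
    found = [a for a in attempts if a.found]
    return {
        # share of attempts that ended in success
        "success_rate": len(found) / len(attempts) if attempts else 0.0,
        # average cost, counted only over successful attempts
        "mean_clicks_when_found": mean(a.clicks for a in found) if found else None,
        "mean_seconds_when_found": mean(a.seconds for a in found) if found else None,
    }


if __name__ == "__main__":
    log = [
        FindingAttempt(seconds=30, clicks=3, found=True),
        FindingAttempt(seconds=90, clicks=5, found=True),
        FindingAttempt(seconds=120, clicks=8, found=False),
    ]
    print(summarize(log))
```

Averaging only over successful attempts keeps the cost of finding separate from the cost of not finding, which the failed attempts represent.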
There are also some metrics that should be tracked in any content management solution, or for the web design. There is the cost of development (time, budget, staff, and frustration) as well as the value of learning for the consumer (related products, services, projects, people).
The latter has become known as usability, and there are folks who have now focused their careers on being web usability experts. This is an incredibly fast-moving target, as styles, tools and technologies continue to be released at a rapid pace.
Personalized content: publishing is, by definition, an act of personalization. Your city newspaper has a specific scope and focus. Vogue is about fashion, and Sports Illustrated is about sports. So, if you edit for a website, you are by definition creating personalized content. Like so much about the Web, personalization has been vastly over-hyped and, again, sold as a feature of content management tools.
Now further to metrics, we must design and consider the information environment, or the context in which content is stored. In large companies, Enterprise Content Management (ECM) projects are undertaken because these companies have increasingly global and distributed enterprises, multiple cultures and languages, and have potentially acquired several different companies. It would be nearly impossible to locate any document, so typically many copies are stored in department file systems before an ECM is included in the landscape.
It is complicated further by numerous intranets and web sites, and the fact that authors and users are spread across departments. Often ownership is unclear. Many of the issues revolve around the centralization versus decentralization discussions.
IA consumers’ needs are both complex and diverse: they have diverse information-seeking behaviors, needs and expertise. We are now approaching real information architecture, as we need to include storage, servers, networks, and database and application software to manage the content.
It is a large undertaking to study user behaviors to determine what will be best for an organization. Methods of organizing files and of searching content that is not text-based become further issues. The method of categorizing the information is crucial. Content dictionaries should contain some or all of the information about the content being stored. Most content management solutions will provide for this type of indexing so that the content may be easily stored.
To provide you with some value for today’s eZine issue, here are some variables you should track in your homegrown dictionary, or data collection points if you are investing in content management software or a metadata store:
ID: The ID should be unique and descriptive, which makes it a little different from a unique ID in a database. It’s best to use the text within the <a href> tag (the link label), or you can use either the <title> of the HTML doc or the headline for the content. IDs should be numbered in the data store, and can have an outline-like structure that follows the levels in your web, e.g. 1.0, 1.1, 1.1.1
Description: A brief description or summary of content, e.g. “Specs for application X”
Link (URL/Location): Record the URL of the content item you’re looking at – this allows you to (1) click and navigate from the dictionary and (2) capture the location of the document on the Web server. Note that the URL should point to the location of the actual HTML file, not a symbolic link or redirect. Web crawler results always need a lot of editing to be meaningful. Non-digital content should include the physical location, such as the content owner’s name, phone number and location address.
File Type (format): Note whether the item is text, audio, video, image, etc. The size of the item may be included here.
Content Type: Different from the “topic” entry (the next attribute), the content type tells you what kind of content it is, not what the content is about. Marketing content, navigation, data sheet, technical specifications, application and customer stories are all content types. Note: a pre-defined glossary or vocabulary is necessary so that data sorts and finds can be performed.
Topic: What’s the content about? No need for standard values–this is an open field for developing metadata (e.g. keywords), category labels for sorts, initial structure and content gap analysis. These are the keywords that are required by browser source views.
User group (audience): mapped to explicit departments, groups. This one may be harder to use and keep up to date, but well worth the effort.
Content Status: Exists, Planned, NTH (nice to have) : Does content exist? Is it definitely planned and accepted with resources available/committed? Or just a wish (no development plan, commitment)?
ROT: No, this is not the rule of thumb, but more status information. Redundant, outdated or trivial is a label indicating that the content should be removed from current site if possible, not migrated to redesigned site. It is often a placeholder in case of future need.
Responsibility/Owner: Name of person responsible for this content–who has primary/lead authority to approve or change it. This is a tough one, and your governance procedures will have to include this. This may become part of any data stewardship program that you may have.
Current status: This should be one of the following:
C?= Create? (questionable, for future review and/or acceptance)
C = Create
D = Draft
R = Review
F = Final
Date Due: Date, time (e.g. “end-of-day”) FINAL is due for delivery, by mutual agreement
Date Delivered: Date, time (e.g. “end-of-day”) FINAL was delivered, by mutual agreement
Contents: Any/all pertinent information not relevant to preceding (left) columns, e.g. broken links, images; characteristics of page; management of page; quirks of content ownership, etc.
Sitemap Page ID: Architecture diagram (sitemap) component #–destination for this content. E.g. 3.2
Wireframe Type: Type of wireframe (template) appropriate for this content. E.g. Type C. Note – we will discuss Wireframes as they pertain to GUI design at some later date.
Location within wireframe (template) – specific text field, column. E.g. text ‘field’ 2
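If you are building a homegrown dictionary, the variables above can be captured as a simple record. What follows is a minimal Python sketch under stated assumptions, not a definitive schema: the class, field and function names (ContentEntry, id_number, id_label, migration_candidates and so on) are invented for illustration, the ID is split into an outline number plus a descriptive label, and ROT is reduced to a yes/no flag.

```python
from dataclasses import dataclass
from enum import Enum


class ContentStatus(Enum):
    EXISTS = "Exists"
    PLANNED = "Planned"
    NICE_TO_HAVE = "NTH"


class CurrentStatus(Enum):
    CREATE_QUESTIONABLE = "C?"
    CREATE = "C"
    DRAFT = "D"
    REVIEW = "R"
    FINAL = "F"


@dataclass
class ContentEntry:
    """One row of the content dictionary (field names are assumptions)."""
    id_number: str            # outline-style number, e.g. "1.1.1"
    id_label: str             # descriptive ID: link label, <title> or headline
    description: str
    link: str                 # URL, or physical location for non-digital content
    file_type: str            # text, audio, video, image, ...
    content_type: str         # value from your pre-defined vocabulary
    topic: str                # free-form keywords
    user_group: str
    content_status: ContentStatus
    rot: bool                 # True if redundant, outdated or trivial
    owner: str
    current_status: CurrentStatus
    date_due: str = ""        # kept free-form so "end-of-day" is allowed
    date_delivered: str = ""
    contents: str = ""        # notes that fit no other column
    sitemap_page_id: str = ""
    wireframe_type: str = ""
    wireframe_location: str = ""


def outline_key(id_number):
    """Turn "1.10" into (1, 10) so that 1.10 sorts after 1.2."""
    return tuple(int(part) for part in id_number.split("."))


def migration_candidates(entries):
    """Drop ROT entries and return the rest in outline order."""
    keep = [e for e in entries if not e.rot]
    return sorted(keep, key=lambda e: outline_key(e.id_number))
```

Sorting on the parsed outline number rather than on the raw text avoids the usual spreadsheet surprise of 1.10 landing before 1.2, and filtering on the ROT flag gives you the list of items worth carrying over to a redesigned site.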