
It’s been months since I last posted. Whoops. Lots of personal change for me. Some of it good, some of it not-so-good. Hunter Thompson once said that the best way to control your environment was to get off the defensive. Fair enough, Doc. So here I am, my hands back on the throttle.

This latest post is about something I am seeing quite often these days: confusion. More specifically, confusion over what content management is, what records management is, and how they relate (and how they don’t). I have seen it many times among my colleagues, customers and prospects, and the level of confusion varies widely.

Records Management is a specialized discipline with its own methods, vocabulary, best practices and so on. Records Management is not a technology in itself, but Records Managers seek to make use of technology to perform better and automate their lower-value tasks. The life of an RM can be quite challenging, as they live and breathe a lot of detail. Many of the RMs I’ve met have no more than a spreadsheet and email to deal with their job. Ouch!

You can imagine, then, that an industry of Records Management solutions must have emerged to help with the pain. Your imagination is very astute. Indeed, many RM solutions from a variety of vendors have appeared on the scene. These are quite a necessity to help Records Management practitioners keep up with the large and accelerating volumes of records that organizations are accumulating. For a definition of Records Management, there is a Wikipedia entry that goes into further detail.

It turns out that a number of Records Management functions are uncannily similar to Content Management functions. For example, content classification and records classification are essentially the same. Classification in this sense refers to the act of evaluating a document and understanding its purpose (e.g. invoice, contract, and so on). In both cases the content is reviewed, a decision made and the document is “classed”. The only real difference is that the content taxonomy and record taxonomy are separate and unique. An invoice may be received, classified and placed into a content management repository while it undergoes processing and approval. Simultaneously a record manager may also assign this invoice a category in the record catalogue (called a “file plan” by Record Managers).

Consumers of the content and records managers both need to understand the nature of the content, but they view it through different lenses. The subtle difference is that content consumers (for example, an Accounts Payable clerk) need to work directly with the data to complete a process (in the case of AP, that might be approval and payment of an invoice). The record manager, on the other hand, wants to understand the content’s retention schedule and where it fits in the taxonomy. The content classification is important because it indicates how the content is to be consumed, while the record classification determines the lifespan and archival requirements. With that in mind, it is important to understand that a document can have both a content classification and a record classification.
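The two-lens idea can be sketched as a tiny data model. This is my own illustration; the class and field names are invented, not any vendor’s schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Document:
    """A document that carries both a content and a record classification."""
    name: str
    content_class: str                      # how consumers use it, e.g. "invoice"
    record_category: Optional[str] = None   # where it sits in the file plan

# The AP clerk classifies the document so it can be processed:
invoice = Document(name="INV-2009-0042.pdf", content_class="invoice")

# The records manager later files the very same document in the file plan:
invoice.record_category = "FIN/AP/Invoices"

print(invoice.content_class)    # drives processing and approval
print(invoice.record_category)  # drives retention and archival
```

The point of the sketch is simply that the two classifications live side by side on the same document and are assigned by different people for different reasons.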

To further this point we can think of content as having two lifecycles: a Content Lifecycle and a Record Lifecycle. The first deals with the content as it is being authored, edited, processed, approved and so on. The second deals with the content as it is being archived, retrieved and ultimately destroyed. These lifecycles may not necessarily be formalized either. As everyday creators and hoarders of our own content we all take part in this cycle. As I write, edit and publish this blog post the content is in an “authoring” stage. After I publish the post it then goes into an “archival” phase where people read it (I hope) and eventually I may even take it off-line or even delete it. These two phases informally represent the cycles of content and record management. While I have not officially declared this post a “record” in the formal sense I am treating it like one. Every piece of content–whether it is a tax document, email, MP3 file, junk mail or picture your 4-year-old drew–follows this same pattern. Every piece of content has its own lifecycle.

In large organizations with (hopefully) structured and formal processes every record should have a point in time where it is destroyed. That point differs between record types. It really depends on the type of information and the surrounding legalities. There are some records, such as a bank’s financial documents, which must be kept for at least seven years. Documents with longer retention periods, such as a birth certificate, may need to be kept for hundreds of years. The period of retention assigned to a record is called a retention schedule. Retention schedules are given to record categories and record categories are organized into a body of work called a File Plan. Up to this point I have been referring to the “file plan” as a “record management taxonomy”. While this might seem like an abstract concept it should be noted that every one of us deals with a formal record management system quite often.
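As a rough sketch of how retention schedules hang off record categories in a file plan (the categories and periods below are illustrative, not drawn from any real file plan):

```python
from datetime import date, timedelta

# Illustrative file plan: record category -> retention period in years.
FILE_PLAN = {
    "FIN/BankingRecords": 7,         # e.g. a bank's financial documents
    "HR/EmployeeFiles": 10,
    "VITAL/BirthCertificates": 100,  # very long-lived records
}

def destruction_date(category: str, declared_on: date) -> date:
    """Return the earliest date a record in this category may be destroyed."""
    years = FILE_PLAN[category]
    # Approximate a year as 365 days for this sketch (ignores leap days).
    return declared_on + timedelta(days=365 * years)

print(destruction_date("FIN/BankingRecords", date(2009, 1, 1)))  # 2015-12-31
```

A real RM system would, of course, carry far more per-category detail (triggers, review cycles, jurisdictions), but the core mapping is this simple.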

Your public library is, in fact, a records management system. Really. By the time the books reach the shelves they have completed their “content authoring lifecycle”. The file plan in your neighbourhood library is nothing more than the Dewey Decimal system. Each subject category in Dewey Decimal is, in fact, a record category (with many sub-categories). Of course, your friendly librarians are the record managers. I can’t help but devilishly note here that the very first Records Management systems were the writing repositories founded in ancient Mesopotamia. Unfortunately the scribes and scholars there didn’t have much in the way of spreadsheets to assist them.

Seriously though, it should not be too surprising that librarians and museum staff receive training similar to that of corporate records managers. Just take a look at the programs offered at the University of Toronto’s Information School.

Creating a file plan, developing retention schedules and enforcing them is the mandate of every records management team. However, this mandate is not always well executed across organizations. It is a heavy task to review every type of content, and most records management teams only have the bandwidth to handle a small subset of their organization’s content. For example, a bank may have thousands of different types of documents but may only place the loan and investment documents under records management. These document types represent some of the highest volumes in the bank, and not only is it vital to the bank and its customers that they follow a formal retention schedule, but there are also legal compliance issues at stake. The legalities vary from country to country and from sector to sector.

Even with those documents under records management, there may not always be the cultural will, or even the ability, to destroy the records when it is time to do so. Large organizations such as banks or insurance companies hold billions of pages of records, in both paper and electronic format, at any given time. Even with a well-organized system it is onerous to maintain, review and destroy records when the retention schedule calls for it. As a former manager of mine used to espouse about the ill-famed Enron: “They got burned not for destroying documents they shouldn’t have been destroying but for destroying documents that were already supposed to have been destroyed.” Whoops.

Records Management systems are brought into the picture to automate the day-to-day operations around records that were born digital (e.g. Microsoft Word docs), electronic (scanned) images and physical records (e.g. paper docs, photos, etc.). Below is a list of various records management system capabilities:

Create and maintain record file plans – a decent RM system should be able to handle both electronic AND physical records in the same file plan. It should also support creation of multiple file plans in the same system.

Automation of common records management tasks – for example, automated destruction and vital/periodic record reviews

Compliance features – legal holds, granular auditing and reporting capabilities

Record search and retrieval support

eDiscovery Support – Search and retrieval capabilities which give RMs, Compliance and Legal team members the ability to query, locate and view records. Furthermore there should be an ability to show what queries were made and an audit trail of who has made them.
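To make the legal-hold capability concrete, here is a minimal sketch, in my own invented terms rather than any vendor’s API, of how a hold must override a destruction that is otherwise due:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    record_id: str
    due_for_destruction: bool = False
    legal_holds: set = field(default_factory=set)  # open case IDs holding this record

    def place_hold(self, case_id: str) -> None:
        self.legal_holds.add(case_id)

    def release_hold(self, case_id: str) -> None:
        self.legal_holds.discard(case_id)

    def may_destroy(self) -> bool:
        # A record under any legal hold must not be destroyed,
        # even if its retention schedule has expired.
        return self.due_for_destruction and not self.legal_holds

rec = Record("LOAN-001", due_for_destruction=True)
rec.place_hold("CASE-2009-17")
print(rec.may_destroy())  # False while the hold is in place
rec.release_hold("CASE-2009-17")
print(rec.may_destroy())  # True once all holds are released
```

A production system would also audit every hold and release, which is exactly the granular auditing capability listed above.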

You may have noticed that I did not mention anywhere above that the records management system would actually STORE the content. That is not the job of a records management system. This might be easier to grasp for physical records, which would be tracked in the RM system but physically stored on a shelf, in a warehouse or at a 3rd-party storage company such as Iron Mountain.

Perhaps now you can start to see where the confusion begins. Here we have two related systems that serve different masters. I took pains to present RM and its various use cases in order to be clear about what an RM system is.

Hopefully you are now able to see that ECM and RM are both symbiotic and disparate; they flow together and yet require different practices. Clear as mud, right? The confusion is real and understandable. OK, I will stop here as this post has gone on too long. 🙂 In my next post (Part 2 of this topic) I will go through the top things that people confuse between RM and ECM.


For some strange reason I can still remember the phone call I received six years ago. It came with a somewhat strange assignment which is probably why I remember the call. I was sitting in my office quietly putting together a requirements document for my next release when my manager rang me up. He needed me to hop on a plane the next day and head down to Santa Cruz. It seems the CTO forgot to tell him to send someone down to a “WebDAV Interoperability Event” being held at UCSC. Clearly, I had drawn the short straw that day. My manager not only threw me the assignment but in the same call mentioned that I was now responsible for our platform’s WebDAV capabilities. So I scrambled to get a ticket, explain things to my wife (truly the most difficult task I had that week) and the next day was driving from San Jose airport down to the campus.

Back in 2004 my knowledge of WebDAV was pretty slim, as I really hadn’t paid much attention to it. At the time I was pretty sure I could spell WebDAV and that it had something to do with sharing files across the web. It was ironic that I hadn’t given it much thought up to that point, as I (along with the browser-using public) had been unknowingly using it for quite some time.

The concept of WebDAV is straightforward and, frankly, not particularly mouthwatering–especially to a technophile like myself. That is both the strength and the beauty of this technology. If you read my earlier post on PDF/A you will see that I’ve visited this theme before. As I’ve come to realize after a trip or two around the block, sexy software might sell but usable software endures.

Back in 2004 when I showed up at the interoperability event, my job was to take my demo system and see how many of the other vendors could connect their WebDAV clients to it. As I remember, there was a sordid (grin) cast of vendors. I was in the room with folks from Apple, Microsoft, Computer Associates, Xythos and a few others, including a UC Santa Cruz team who had built their very own WebDAV client. Leading this event was a down-to-earth professor by the name of Jim Whitehead. Because I didn’t know much about WebDAV at the time, I had no clue that it was he who spearheaded the WebDAV movement and would become known as the “Father of WebDAV”. One thing I did note was that he was a man with many balls in the air; he seemed to be supervising a number of disparate groups that week. We were one of at least two or three big efforts he had on the go. Despite his busy schedule he still gave this the time it deserved AND even went out with us for a pint or two one evening.

We gathered in one of the computer science department’s classrooms, and the environment was more like that of a 3rd-year study group working on a problem set than a gathering of software professionals. That was actually pretty cool, because any tension was quickly lifted and we were soon cracking geeky jokes with one another. I admit they were VERY geeky, as I remember the punchline to one of them being something like “Tell him to use ‘grep’” (much laughter ensued).

A Brief Explanation Of WebDAV

Although there are a number of sources on the web explaining WebDAV at various levels (like here) I figured I’d belt out a background here.

As I have mentioned, WebDAV was developed to help content authors share documents across the web. While today this doesn’t sound very earth-shattering, back in the 1990s, when the web was starting to ramp up, the lack of content-sharing functionality was a blatant sore spot for the web authoring community. It made geographically-distributed content collaboration manual and clumsy.

WebDAV is meant to extend HTTP from read-only to read/write. In 1996 the World Wide Web (i.e. HTTP) was primarily a presentation-only affair. Users were allowed to browse to a website and view its contents, and that’s it. File sharing had to be done through another mechanism such as FTP. This meant that a separate FTP server would need to be set up and maintained, folders and permissions added, accounts created and so on. On the authoring side, users would need to take manual steps to move the files back and forth. WebDAV was meant to remove these manual steps and allow access directly through an HTTP client such as a browser.

The problem at the time was that HTTP was not broad enough to support this desired functionality. Remember that HTTP is a specification–effectively a contract between web browsers and web servers–and in order for everyone to get along we must follow the rules of this specification. To solve this, it was proposed (to the IETF) and accepted that the HTTP specification would be augmented with elements that enabled “read/write” behaviour. WebDAV was conceived but not yet born, and I say this because it is one thing to write a spec, another (big) thing to actually implement it, and yet another (bigger) thing for it to become widely adopted. Eventually WebDAV achieved all of this.
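To make “augmenting HTTP” concrete: WebDAV adds new methods such as PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK and UNLOCK alongside the familiar GET and PUT. A PROPFIND asks the server for properties of a resource and receives a 207 Multi-Status XML body in return. Here is a sketch that builds a PROPFIND request body and parses a hand-written sample response (the path and file name are made up):

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # WebDAV properties live in the "DAV:" XML namespace

# Body a client would send with "PROPFIND /docs/ HTTP/1.1" and "Depth: 1":
propfind_body = """<?xml version="1.0"?>
<propfind xmlns="DAV:">
  <prop><displayname/><getlastmodified/></prop>
</propfind>"""

# A hand-written, illustrative 207 Multi-Status response:
multistatus = """<?xml version="1.0"?>
<multistatus xmlns="DAV:">
  <response>
    <href>/docs/report.doc</href>
    <propstat>
      <prop><displayname>report.doc</displayname></prop>
      <status>HTTP/1.1 200 OK</status>
    </propstat>
  </response>
</multistatus>"""

root = ET.fromstring(multistatus)
for resp in root.findall(f"{DAV}response"):
    href = resp.find(f"{DAV}href").text
    name = resp.find(f".//{DAV}displayname").text
    print(href, name)  # /docs/report.doc report.doc
```

This is the exchange a WebDAV client performs behind the scenes every time it lists the contents of a remote folder.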

WebDAV stands for “Web Distributed Authoring and Versioning”. The original goal of this specification was to provide basic document management functionality. WebDAV enabled web servers to now be used as content management servers as well. This would mean that anyone with an HTTP client (e.g. a browser) could now collaborate with one another (i.e. share, approve, edit, etc files) with much less overhead.

That was the plan, and it was supposed to be mostly done in a single release of the spec. However, that isn’t exactly how it all played out. While it may have appeared a straightforward task, there was surprisingly a lot of work to be done to cover off the original vision. Rather than hold things up, the WebDAV working group chose to segment the specs. The initial phase of WebDAV was stripped down and did not include versioning. Without versioning in place, WebDAV would really behave more like a file system (where people can freely copy, move and overwrite files) than a collaborative tool. While you might think this was a big loss, it was still highly effective. In fact, it was this initial edition of WebDAV that most vendors adopted and have stayed put with.

I remember Jim, while at UCSC, talking a lot about “Delta-V”. As I learned, Delta-V was an add-on to WebDAV that completed the HTTP extensions for versioning. I’m not sure whether the initial version of WebDAV gave us most of what we wanted or whether the Delta-V add-on proved too costly to implement, but most vendors have not implemented the versioning aspect to this day.

So What Happened?

WebDAV got itself widely adopted. There are a number of key reasons for this, and I will go through them.

First, WebDAV functions across HTTP. This is the true elegance of the original vision. There is a browser on everyone’s desktop, laptop, mobile phone and belt buckle (OK…maybe not yet). With that, WebDAV opens itself up to a massive number of users right out of the gate. If I were a browser vendor, why wouldn’t I want to give my browser the capability to be used for file sharing–the idea is that the user stays put in my browser for as many of her daily tasks as possible. This symbiotic relationship meant that Microsoft, Netscape and Apple (and later Mozilla) were motivated to adopt WebDAV in order to encourage usage. Remember, back in the late 90s and the early part of this decade there was still a blood-soaked popularity contest raging among browser vendors. Microsoft even upped the ante and WebDAV-enabled both Windows Explorer and Microsoft Office. With Windows Explorer supporting WebDAV, it should go without saying that this opens up WebDAV usage to an even larger group of users (talk about world-wide adoption!). This is how I had been using WebDAV without even knowing it: in Windows Explorer, a WebDAV folder looks and behaves just like any other Windows folder. I will say that the only thing on my wish list here would be for Windows Explorer to present thumbnail images from a WebDAV folder…but I don’t see that happening anytime soon.

The truth is that it was the adoption by Microsoft that took WebDAV from a neo-academic pursuit with mild market adoption to complete viability; that is where the movement really took off. It is Microsoft’s continued support in the present day that ensures WebDAV remains a healthy standard, and will be for the foreseeable future.

In the course of things another significant round of non-Microsoft adoption occurred. Content authoring tool vendors such as Adobe capitalized on WebDAV–this was expected, as it was for these users that WebDAV was originally envisioned. Not to be overlooked are the ECM vendors (indeed!). As my presence at UCSC indicated, the ECM vendors were also implementing WebDAV. This was a no-brainer, as it effectively gave the ECM vendors a free lunch when it came to Windows browser and desktop integration. All an ECM vendor had to do was provide a WebDAV service and–POW–Windows/Internet Explorer could be used as an ECM client. This is exactly the type of synergy that Jim’s Interoperability Event was meant to promote, and I was on the hook to record and report all of the glitches that the other software vendors had in connecting to my repository. In fact, to this day the ECM vendors are continuing these interoperability events as they develop their CMIS interfaces.

I can’t tell you how many times I’ve put together a concept system or demo where I’ve simply set up a Windows folder on the desktop, dragged in a couple of documents and (poof) those documents are instantly entered into the ECM repository, complete with index values and approval lifecycle all ready to go. I am able to combine WebDAV’s UI simplicity with the advanced features of the ECM repository to improve people’s productivity while keeping their lives very much the same.

Here’s a simple example. Let’s say an HR specialist is responsible for keeping all of a company’s internal policy documents up-to-date. They might do the following:

1) Open the document
2) Make the edits
3) Save the doc
4) Open an email and attach the doc
5) Send the email to a manager/team lead for approval
6) Once approval is received, send the doc to the publishing team
7) The publishing team posts it to the internal site/location

With WebDAV and (most) ECM systems I can take that very same process and automate it (without writing a lick of code) so that steps 4) through 7) are handled entirely by the ECM system (with a complete audit trail). The rub here is that I can do it without any of the participants in this process needing to learn any new tools! In the new world I would create, the author simply saves the document back to the WebDAV folder and moves on to the next task. Everything else is orchestrated and executed by the ECM platform. If you are doing one document a day this isn’t such a big deal. Content authors who work with dozens of docs under tight deadlines very quickly realize the potential. With WebDAV and ECM you can have your cake and eat it too. 🙂
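Conceptually, what the ECM platform does after the WebDAV save is walk the document through a lifecycle while logging each step. This toy state machine is my own simplification, not any real ECM product’s workflow engine:

```python
# Illustrative lifecycle the ECM platform drives after a WebDAV save.
# The stage names mirror the manual steps above; no real ECM API is used.
STAGES = ["saved", "pending_approval", "approved", "published"]

def on_webdav_save(doc: str, audit: list) -> str:
    """Simulate the automated steps 4) through 7): route, approve, publish."""
    state = STAGES[0]
    for next_state in STAGES[1:]:
        state = next_state
        audit.append((doc, state))  # complete audit trail, step by step
    return state

audit_trail = []
final = on_webdav_save("policy-travel.doc", audit_trail)
print(final)        # published
print(audit_trail)  # every transition recorded
```

The author never sees any of this; from their side the only action was saving a file into a folder.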

Low-tech users who still print out their emails and think Google is the real name of the internet love this sort of thing, as their Windows Explorer world doesn’t change. IT groups love it because the users love it and their overhead doesn’t get worse. Like I said, WebDAV is neither sexy nor cutting-edge, but it is easy, well-supported and transparent.

Over the years I’ve heard a lot about the demise of WebDAV. I hear that it is old technology, or too static to live on. Pish-posh. The truth is that WebDAV is like the air we breathe: necessary to function and invisibly there for us. Looking at the ECM landscape, I see that every major ECM vendor from IBM (P8 and CM8) to OpenText (LiveLink) to Microsoft (SharePoint) and so on continues to provide WebDAV support with each new release. Among the ECM vendors, WebDAV is simply part of the price of entry to the market. Unless the planet tosses out HTTP there will always be a place for WebDAV.

If you talk to anyone at the various ECM vendors about mashups, their eyes will start to get sparkly and they will get noticeably excited. The whole notion of jumping on and riding bareback on one of software’s current fast-paced technology thoroughbreds gives ECM product teams something to look forward to. Certainly the notion of taking data from your ECM repository and combining it with a Google map or some other such widget sounds like it could have a kazillion uses. The problem is that ECM vendors, to this day, are having trouble coming up with even a few dozen unique uses that are going to stick.

The truth is that while there are indeed great use cases where ECM and mashups can be successfully applied, I have yet to see one that a) is unique to mashups and/or b) would justify the cost of putting the mashup infrastructure into place (I would include licensing and maintenance in this equation).

In the case of a), I would call this the “killer app” factor. That is to say, no widespread case has yet emerged which points to ECM and mashups as uniquely able to provide a solution. Of course there are some niche scenarios, but these are not what is going to drive usage.

At this point it seems that adoption of mashups into ECM solutions is going to be incrementally driven by the ECM vendors themselves. Instead of putting R&D into developing newer interfaces, some ECM vendors, such as IBM, are developing application widgets to fit into portal frameworks (e.g. BusinessSpace, as one example). These are fairly limited in functionality and are being positioned as a “quick and dirty” UI. As these ECM widgets slowly gain feature parity with their counterpart interfaces, customers may slowly migrate over. I am certainly not holding my breath, as the key word here is SLOWLY. Slowly, as in early adopters will likely be the only customers for the next few years. It is only when a true “killer app” emerges, or widget functionality begins to exceed that of legacy ECM applications, that adoption should substantially increase.

One possible accelerator may be found in CMIS–Content Management Interoperability Services. CMIS is an open, standards-based (web services) API meant to give ECM repositories of all vendors, sizes and flavours the means to communicate with each other. (I plan on writing more about CMIS in the future.) As I was thinking this through the other day, it occurred to me that a CMIS widget could be used to provide a single, unified interface across a landscape of multiple repositories. Not a stretch of an idea, but it does have certain implications. For example, this could be key for many large organizations with a plethora of repositories. As many of us know, every large org has acquired a number of different ECM platforms over time. While a CMIS widget wouldn’t necessarily displace a heavyweight content federation application, it would certainly give knowledge workers a simple way of mixing the content from multiple key repositories into a single view. The dynamic nature of mashups means that these views can be quickly modified to respond to changing needs.
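The single-view idea amounts to merging per-repository result sets into one list. The repository names and result shapes below are invented for illustration; a real widget would retrieve them through each repository’s CMIS binding rather than from hard-coded data:

```python
# Pretend results already fetched from two different vendors' repositories
# via their CMIS interfaces; the shapes and names here are made up.
repo_a = [{"name": "Q1-forecast.xls", "repo": "RepositoryA"}]
repo_b = [{"name": "Q1-contract.pdf", "repo": "RepositoryB"}]

def unified_view(*result_sets):
    """Merge per-repository CMIS results into one view, sorted by name."""
    merged = [doc for results in result_sets for doc in results]
    return sorted(merged, key=lambda d: d["name"])

for doc in unified_view(repo_a, repo_b):
    print(f"{doc['name']}  ({doc['repo']})")
```

The value is not in the merge itself, which is trivial, but in the fact that CMIS makes the per-repository queries uniform in the first place.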

There is no doubt that mashup technology belongs in the ECM toolkit, but the extent of its use remains to be seen. This early in the cycle, things are too fluid to make a determination. Then again, that right there is the true spirit of mashups.

As I am the facilitator for my company’s local user group, I get a chance to sit in on some fairly interesting discussions with the ECM user community. In fact, at one of these recent meetings just before the Christmas break we had an interesting topic: PDF/A. For the uninitiated, PDF/A is a standardized file format intended for long-term content archiving. The idea is that if I save a document in PDF/A today, then in thirty years I will still be able to retrieve and view it, confident that a viewer application will be available and that the content of the file will be presented in the same way, without loss of quality or information. This also provides a stronger basis for the legal admissibility of electronic documents. Many companies want to throw away the paper but are scared stiff of doing so. The organizations that would benefit most from PDF/A are those responsible for retaining documents over a long time period, such as banks, insurance companies and government agencies. Today most organizations use TIF as their long-term storage format.

PDF/A brings some great things to the table. First off, it is based on the widely used PDF format. This is a good thing. Almost every desktop and laptop in the world has a PDF viewer available, Adobe Acrobat being the most popular. Most people with a heartbeat are quite familiar with the Adobe viewer, so this format has an exceptional advantage: it is already well entrenched. Another advantage is that with PDF/A being an open standard, and no longer a proprietary format, it is not subject to the whims and folly of a particular software vendor. The PDF/A standard will live on and continue to evolve according to the needs of the community. A particular drawback is that the standards committee might not respond as quickly to the community as a vendor would to its customers.

Certainly it should also be noted that PDF/A brings improved storage efficiency to the game without sacrificing fidelity. In most cases the size of a page scanned to PDF/A in colour is the same as a TIF image at the same DPI. In B&W, a PDF/A image would be smaller than its TIF counterpart at a similar DPI level. In either case it represents a net benefit. While this isn’t such a big deal for a small group storing a few thousand images, it is a VERY big deal for a bank that might have hundreds of millions of images and is looking to control storage costs without losing image quality.

Today TIF is certainly THE standard for long-term image storage, and it isn’t going away anytime soon. However, let us remember a few things about TIF. First, it is still a proprietary format and is owned by (drum roll here) Adobe. Not that Adobe has any sinister plans for TIF (that we know of…heh heh), but the weakness of TIF is the flip side of what I stated above: it isn’t going anywhere, as in, it isn’t evolving. As a larger percentage of day-to-day and year-to-year content moves into electronic format, the relevance of TIF will dwindle over time. Twenty years ago we all used WordPerfect and now we don’t. What happened in between was that WP simply became irrelevant and we stopped using it (unless you are my father, whoops).

While ECM is a very complex and magical world, and every ECM specialist would love it if everyone everywhere would embrace and implement the entire ECM-magnetic spectrum (from BPM to Content Collection) in a fortnight, the hard reality is that most organizations are just now grappling with the basics: getting their electronic content under control (from email to SharePoint to scanned images to…). Not that these basics are particularly easy, mind you. In fact they are downright scary and expensive for many. After decades of ignoring the steady increase in electronic content volume there is a massive amount of catch-up to do. I doubt that most of my customers have the cultural will to take this on even if money were no object (but that’s another blog for another day).

With that said, PDF/A is a great next step for any organization to take. However, it is only that: a next step. Before the plunge is taken there are a few things to keep in mind. First, PDF/A is a specification and not an application. This means that it is open to a certain level of interpretation by anyone who tries to write a viewer. Today this is not much of a consideration. We have Adobe Acrobat. In thirty years, who knows what PDF/A viewers will exist, if any. Any content strategy must include considerations for viewing this content at a point in the future.

Let’s also not forget that a document stored in PDF/A is still subject to being modified or tampered with. At a minimum, vital content must not only be secured but must be viewable only to the right people. Furthermore, an audit trail must be able to show who has had access. In other words, PDF/A is only a small part of a larger Content and Records Management program.

Finally, it should be noted that not all PDF/A formats are created equal. For example, PDF/A-1a is the “kitchen sink” of the standard and includes elements such as tagging, whereas PDF/A-1b is a subset of it. To determine the appropriate format, a use-case analysis must be undertaken. In a lot of cases PDF/A-1a is overkill and will end up generating larger capture/conversion and storage costs. I can’t stress enough that you must know the intent of the content before you make a decision. Companies such as LuraTech and AdLib have sprung up as experts in this field and can provide more insight.
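One way to tell which flavour a given file claims to be: a PDF/A document declares its conformance level in its embedded XMP metadata, via the pdfaid:part and pdfaid:conformance entries. Here is a rough sketch that scans the raw bytes for those markers; real files vary in how the XMP is serialized, so treat this as illustrative only, not a validator:

```python
import re

def pdfa_level(pdf_bytes: bytes):
    """Return e.g. '1A' or '1B' if a PDF/A conformance claim is found, else None."""
    # pdfaid:part holds the standard part number (1), pdfaid:conformance the level (A/B).
    part = re.search(rb"pdfaid:part\D*(\d)", pdf_bytes)
    conf = re.search(rb"pdfaid:conformance\W*([AB])", pdf_bytes)
    if part and conf:
        return part.group(1).decode() + conf.group(1).decode()
    return None

# Illustrative XMP fragment as it might appear inside a PDF/A-1b file:
sample = b"<pdfaid:part>1</pdfaid:part><pdfaid:conformance>B</pdfaid:conformance>"
print(pdfa_level(sample))  # 1B
```

Detecting the claim is the easy part; verifying that a file actually conforms is exactly the kind of work those specialist vendors do.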

So there it is. There is no space-age breakthrough in PDF/A; most of us are using this technology right now. The remarkable thing is that PDF/A is simply a common-sense approach that uses existing technology to solve a business problem (as opposed to some gold-plated Apollo program built by developers for bragging rights). How about them apples! Even so, we still need to make sure that we have our eyes wide open.

To begin with, I have not thought this whole thing through. Then again, in an age where impulsiveness is not only encouraged but often heavily rewarded, I am merely playing by the rules. I just set this blog up in five minutes and started typing. Just add water.

This is ECM Missives: one of many places in the blogosphere where I can present insights and where, with a little luck, others can benefit from (or at least relate to) the wisdom of this software professional’s (in)experience and (mis)adventures.

The state of ECM. It is such a big place. ECM continues to both thrive and wither at the same time. It can be a harsh world at times. A world where experience and skills are being built up and eroded all at once, and your greatest career achievement becomes a distant memory within months.

You look around in your work environment (or virtual environment if you telecommute) and observe ridiculous (to you) activities going on and bad decisions (ditto) being made all around. You think to yourself, “How the hell did this company get this far? We’re screwed.” Yet somehow things progress, stuff gets handled. The sin of it all is that most of the attention is given to the urgent and not the important.

As an industry we are getting better but the truth is that it is dangerously painful at times and everyone in it bears at least a couple scars. Though as I often tell my children, “If you don’t have any bruises, you’re not having any fun.”

In Spring 2009 the economy is still a broken body stitching itself back together. Things are stagnant and software people (like most others) are locked in a game of Russian Roulette, their employers holding the revolver and spinning the cylinder. As software product teams thin out the ranks, important new product features are delayed, put in limbo or outright eliminated from the product roadmap. This makes life harder for sales and angers the customers. They don’t like being told “No” or “later”. They don’t even like being told “perhaps”, and even “coming soon” starts to wear thin.

Things will change. In software–especially in software–they always do.

That’s my intro post for now. Nothing spectacular. Of course, more to come.