BBC News website's content management and publishing systems
As part of the BBC News site refresh we have been making steady changes to the underlying systems that manage and publish the content.
The BBC has one of the oldest and largest websites on the internet, and one of the goals of the update to the News site was to also update some of the core systems that manage content for all our interactive services.
In this post I'll highlight a few areas where we have made some important changes.
CPS
The CPS is the system that manages content production for BBC News, BBC Sport and over 100 other websites across the BBC. It also produces the content for multi-platform journalism, such as the BBC Mobile services and Interactive TV/Red Button services, and even content for the mighty Old Skool Ceefax is born in the CPS.
If the BBC website is one of the largest and oldest on the internet, then the CPS has been around nearly as long. As a rule of thumb, if you remember Bagpuss you are older than the CPS. If you grew up watching Teletubbies, you probably are not.
Let's not confuse old with legacy though. The CPS has been constantly evolving, and I should say that when looking at the requirements for the new News site and other services, we did consider whether we should take a trip to the Content Management System (CMS) Showroom and see what shiny new wheels we could get.
However there is an interesting thing about the CPS - most of our users (of which there are around 1,200) think it does a pretty good job [checks inbox for complaints]. Now I'm not saying they have a picture of it next to their kids on the mantelpiece at home, but compared to my experience with many organisations and their CMS, that is something to rate highly.
The latest version of the CPS - version 6 - underpins the new News site and has made substantial changes to systems and workflow, but it is still focused on the task of managing content which fits into a general journalistic pattern. It does not try to be all things to all people, and this no doubt plays some part in its success.
There have been a number of requests from the public asking to see more of the CPS, but as there is a lot of detail to go into, I'll just focus on a few headline points for now. We will be doing a more in-depth blog post on it soon.
Moving to a more structured approach
Some of the major changes in approach are in the Client, which is a .NET 3.5 client taking full advantage of WPF. The screenshot below shows an example image from the CPS which illustrates some of its features.
This shows a snapshot of a story editing window. Around this are site navigation and other tools (like Search).
As you can see there is a component-based structure to the story content, with a Video, Introduction and Quote shown. These components are predefined and can be dragged in and added to the story, showing that the CPS is not primarily a WYSIWYG editor. The CPS focuses on content structure because, in a world where you are publishing to many platforms that have hugely different rendering possibilities, WYSIWYG becomes a pointless feature. There are, however, previews showing the output.
Previously, users could add HTML and custom CPS tags directly into the story body to control the content presentation and the components, similar to the way you would insert code into your content on wikis and blogs. This causes a lot of problems for quality and content structure though, so now these things are managed as components where the user can change the content and behaviour of the component in a controlled manner. We will come on to the importance of that next.
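To make the component idea concrete, here is a minimal Python sketch. It is not CPS code: the class names, fields (including `media_id`) and rendering functions are all hypothetical, chosen only to show how a story held as typed components can be rendered differently for a web page and for a text-only service.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Introduction:
    text: str


@dataclass
class Quote:
    text: str
    source: str


@dataclass
class Video:
    title: str
    media_id: str  # hypothetical identifier, not a real CPS field


def render_web(component) -> str:
    """Render one component as an HTML fragment."""
    if isinstance(component, Introduction):
        return f'<p class="introduction">{escape(component.text)}</p>'
    if isinstance(component, Quote):
        return (f"<blockquote><p>{escape(component.text)}</p>"
                f"<cite>{escape(component.source)}</cite></blockquote>")
    if isinstance(component, Video):
        return (f'<div class="video" data-media-id="{escape(component.media_id)}">'
                f"{escape(component.title)}</div>")
    raise TypeError(f"unknown component: {component!r}")


def render_text(component) -> str:
    """Render the same component for a text-only platform."""
    if isinstance(component, Introduction):
        return component.text
    if isinstance(component, Quote):
        return f'"{component.text}" ({component.source})'
    if isinstance(component, Video):
        return ""  # a text-only service has nothing to show for a video
    raise TypeError(f"unknown component: {component!r}")


def render_story(components, renderer) -> str:
    """Apply one renderer to every component, skipping empty output."""
    parts = (renderer(c) for c in components)
    return "\n".join(p for p in parts if p)


# Placeholder content for illustration only.
STORY = [
    Video("Example clip", "clip-001"),
    Introduction("Opening paragraph of the story."),
    Quote("An example quote.", "A. Speaker"),
]
```

Calling `render_story(STORY, render_web)` and `render_story(STORY, render_text)` produces two different outputs from the same structured story, which is something free-form markup typed into the body cannot easily offer.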
HTML Standards
Another part of the CPS that changed considerably was the way content is published. Requirements over the years have caused features to be added organically to the way content is published, leaving it a bit messy with a lot of layout based on HTML tables. A key goal here was to improve the technical quality of content produced and meet standards as we move from <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> to <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
For example, we are aiming for fully valid pages to be published based on the W3C Validation Checker.
If you look at some of the older pages published you will see they don't pass this test, and some pages, such as http://news.bbc.co.uk/sport2/hi/olympic_games/default.stm, render a lot of errors:
Ouch!
This is especially tricky to fix where the CPS is pulling in content from other systems or services which don't comply with these standards, but though there is still some work to do here, generally we should be down to 0 or very few errors now.
You will also see in the image above that older stories are based on the iso-8859-1 character set, whereas new stories will all be UTF-8 encoded for better international support.
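A small Python sketch shows why the character set matters. The helper name `can_encode` and the sample headline are made up for illustration; the point is simply that iso-8859-1 cannot represent characters such as the euro sign, while UTF-8 can represent any Unicode text.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character in text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Made-up headline: the accented letters exist in iso-8859-1, the euro sign does not.
headline = "Café prices rise to €3"

utf8_ok = can_encode(headline, "utf-8")
latin1_ok = can_encode(headline, "iso-8859-1")

# With errors="replace", unsupported characters are silently turned into "?".
lossy = headline.encode("iso-8859-1", errors="replace").decode("iso-8859-1")
```

Here `utf8_ok` is true and `latin1_ok` is false, and `lossy` shows the kind of silent damage ("?" in place of the euro sign) that a single-byte character set can introduce.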
Semantic Structure
We will also no longer be using tables to lay out the content. Instead we will be rendering the pages using CSS layout and only using tables for data.
There are lots of reasons to do this, but some include making the content more efficient, more standards compliant and faster to render. It also allows us to publish semantic XHTML, which means that content blocks are better marked up to describe what they are, and has benefits like creating a better heading structure to help screen readers.
Better structure also means you will see a more consistent presentation of stories in Google and search engines with, for example, story dates and author information showing more clearly.
This reflects a new content model which is now largely based around a simple and generic data model of assets and groups of assets which are typed (meaning we don't just manage blocks of content, we use metadata to describe what is in the blocks of content) and publishing through templates and services based around Velocity.
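As a rough illustration of typed assets, groups and template-based publishing, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Asset` and `Group` classes, the type names, the metadata keys and the markup are hypothetical, and Python's `string.Template` merely stands in for a template engine such as Velocity.

```python
from dataclasses import dataclass, field
from html import escape
from string import Template


@dataclass
class Asset:
    asset_type: str   # hypothetical type name, e.g. "story" or "image"
    metadata: dict    # describes what the asset contains


@dataclass
class Group:
    group_type: str   # hypothetical type name, e.g. "top-stories"
    assets: list = field(default_factory=list)


# One template per asset type; the type decides how an asset is published.
TEMPLATES = {
    "story": Template('<li class="story"><a href="$url">$headline</a></li>'),
    "image": Template('<li class="image"><img src="$url" alt="$alt" /></li>'),
}


def publish(group: Group) -> str:
    """Render a group by applying the template that matches each asset's type."""
    items = []
    for asset in group.assets:
        safe = {key: escape(str(value)) for key, value in asset.metadata.items()}
        items.append(TEMPLATES[asset.asset_type].substitute(safe))
    return f'<ul class="{escape(group.group_type)}">' + "".join(items) + "</ul>"
```

Because the metadata says what each asset is, the same group could be passed to a different set of templates for another platform without touching the content itself.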
Again this is about supporting content standards better, as described here, by making better use of headings and lists.
Take this example showing how a story is put together.
Previously the HTML would have looked something like this:
But now it is much more structured and would look something like this, with headers clearly marking out the sections of content:
Using CSS for layout also makes a big difference to our HTML and makes for a better separation of layout and content. This rather messy layout...
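The contrast between the two styles can be sketched in Python as follows. Both markup strings are hypothetical stand-ins written for this illustration and are not actual BBC output. The small parser collects heading elements, which is roughly the outline a screen reader or search engine can use to navigate a page.

```python
from html.parser import HTMLParser

# Hypothetical presentational markup: the structure is only implied visually.
UNSTRUCTURED = (
    '<table><tr><td><font size="4"><b>Example headline</b></font><br>'
    "Opening paragraph.<br><b>Background</b><br>More detail.</td></tr></table>"
)

# Hypothetical semantic markup: headings and lists say what the content is.
SEMANTIC = (
    '<div class="story"><h1>Example headline</h1>'
    '<p class="introduction">Opening paragraph.</p>'
    "<h2>Background</h2><p>More detail.</p>"
    "<h2>Reaction</h2><ul><li>First point</li><li>Second point</li></ul></div>"
)


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(markup: str) -> list:
    """Return the heading outline found in a piece of markup."""
    parser = HeadingOutline()
    parser.feed(markup)
    parser.close()
    return parser.outline
```

`outline(SEMANTIC)` yields a headline followed by two sections, while `outline(UNSTRUCTURED)` yields nothing at all, even though both would look broadly similar in a browser.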
becomes
The table elements used in the first example are gone and the layout relies on CSS to manage the positioning of content.
URL Structure
Finally a quick note on the change in our URL structure, where you may have noticed a couple of significant changes. These are the tip of the iceberg of substantial changes we have made to our networks and infrastructure as part of this relaunch.
The first is that our News URLs have moved from http://news.bbc.co.uk to http://www.bbc.co.uk/news/ in order to consolidate our domains. As part of the News site changes this involved us making significant updates to our networking infrastructure to support better sharing of content across our domains. Moving all our URLs onto the http://www.bbc.co.uk/ domain also consolidates some differences which are there largely for reasons no longer necessary.
All URLs should redirect to the appropriate place, but if you do find any broken URLs please let us know.
We also wanted to simplify our URL structure, removing much of the baggage in the previous structure for managing different types of content and editions of the website.
Now the structure is basically:
http://www.bbc.co.uk/ [SITE] / [SECTION] - [SUBSECTION] - [STORY-ID]
For example:
http://www.bbc.co.uk/news/uk-politics-10721364
This has made URLs shorter and simpler.
We considered making even shorter URLs - you will have seen some stories were published this way while we transitioned the site to the new design, such as:
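The [SITE] / [SECTION] - [SUBSECTION] - [STORY-ID] pattern can be pulled apart with a short Python sketch. The regular expression and the function name `parse_story_url` are my own assumptions about how such URLs might be matched, based only on the pattern and example above. In particular, treating the first hyphenated word as the section and the remainder as the subsection is a guess.

```python
import re
from urllib.parse import urlparse

# Assumed shape: /site/section[-subsection]-storyid, with a numeric story ID.
STORY_PATH = re.compile(
    r"^/(?P<site>[a-z0-9]+)"
    r"/(?P<section>[a-z]+)"
    r"(?:-(?P<subsection>[a-z-]+))?"
    r"-(?P<story_id>\d+)$"
)


def parse_story_url(url: str):
    """Split a story URL into its parts, or return None if it does not match."""
    match = STORY_PATH.match(urlparse(url).path)
    return match.groupdict() if match else None


parts = parse_story_url("http://www.bbc.co.uk/news/uk-politics-10721364")
# parts == {"site": "news", "section": "uk",
#           "subsection": "politics", "story_id": "10721364"}
```

An older-style address such as http://news.bbc.co.uk/sport2/hi/olympic_games/default.stm does not fit the pattern, so `parse_story_url` returns `None` for it.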
http://www.bbc.co.uk/news/10250603
The changes we have made will allow us to make URLs more flexible, and there is more work to do yet on how we might implement even shorter URLs (such as http://www.bbc.co.uk/10250603) and longer, more descriptive ones (such as http://www.bbc.co.uk/story-about-something-interesting).
If you would like to know more about any of this, then let me know by leaving a comment.
Thanks for reading.
John O'Donovan is Chief Technical Architect, Journalism and Knowledge, BBC Future Media & Technology.