Hi, I’m Rich Hall, Director of DSP Development for RipCode and another new voice on the RipCode blog. I’ve read Brian Peebles’ article, Architecting a Video Transcoding System, and thought it was a well-written and thought-provoking piece. I’ve been involved in building audio/video and graphics systems for over 20 years, and architecting these systems for performance, function, cost, etc. has always been a daunting task, with large and long-lasting consequences. I like to tell a story from when I was a junior engineer building mainframe graphics systems for CAD/CAM applications. We had a director who, in every product meeting, would remind us of our top 3 product goals: 1) performance, 2) performance, and 3) ah – performance. He would then go on with a wry smile and say there was actually a 4th goal, but I don’t think I have to mention what that was.
As Brian pointed out in his article, performance is not the only requirement. These systems must also be architected for functionality, quality, serviceability, and upgradeability, as well as for less obvious but equally important factors such as space and power requirements. Architecting video conferencing systems for 6 years, including going through 3 different architectures, has further heightened my awareness of how critical it is to choose the right architecture now, and for that architecture to remain viable as your business grows.
Architecting video transcoding systems involves requirements very similar to those of the earlier graphics systems and the more recent video conferencing systems. Given the technology needed for audio and video transcoding, the choice of architecture and hardware is critical. Demand for audio and video processing is growing rapidly, from social networking sites, to the deployment of video content to mobile phones and laptops, to IPTV, etc. The plethora of audio and video algorithms (e.g. AMR, AAC, MP3, AC3, H.264, WMV, Flash, MPEG), along with all of their various profiles, as well as the almost infinite number of resolutions, frame rates, and bit rates, has generated a need for multimedia processing on a scale that is hard to comprehend. Add to this all of the required image and audio processing (e.g. scaling, de-interlacing, sample rate conversion, audio gain and normalization), and we effectively need the horsepower of a top fuel dragster with the function, reliability, maintainability, and earth-friendly features of a Toyota Prius.
So, what does all of this mean? Well, it means we need ultra-fast (for the most part parallel) mathematical processing, extremely efficient data movement, and the flexibility and programming model to react quickly to customers’ changing demands and the ever-changing algorithmic standards. Unfortunately, these goals have traditionally been at odds with each other. Typically, the fastest hardware for video-type processing has been an ASIC. But ASICs have their inherent problems. They typically support only one or maybe a few codec standards, won’t support new codec standards, and very likely won’t support a new profile or appendix of an existing standard. And in case anyone gets the notion that ‘this standard and its profiles are the last’, take a look at the relatively new H.264 – the ever-growing number of profiles is getting staggering. Add to that the fact that it is usually not cost effective for a company to build its own ASIC, which means relying on a 3rd party vendor, and an ASIC solution is usually a risky, non-extensible one.
At the other end of the spectrum is flexibility. General-purpose processors (GPPs) have traditionally been the most flexible platform. But there is a widely shared feeling that GPPs just are not designed for the heavy mathematical processing required for video compression and image processing. Throwing multiple cores at the problem helps somewhat, but diminishing returns quickly kick in with regard to data movement and power consumption.
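To put a rough shape on those diminishing returns, here is a minimal sketch using Amdahl's law. It assumes, purely for illustration, that some fixed fraction of the work (say, data movement and control) does not speed up as cores are added; the function name and the 80% figure are my own assumptions, not measurements from any real system.

```python
def speedup(cores: int, parallel_fraction: float) -> float:
    """Amdahl's law: overall speedup when only part of the work scales with cores.

    parallel_fraction is the (assumed) share of the work that parallelizes
    perfectly; the remainder is treated as serial, e.g. data movement.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 80% parallel math, 20% that does not scale.
    for cores in (1, 2, 4, 8, 16, 32):
        print(f"{cores:>2} cores -> {speedup(cores, 0.8):.2f}x")
```

Under that assumption the speedup can never exceed 5x however many cores are added, and this simple model does not even account for the extra power those cores draw.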
Other solutions are starting to gain some traction, such as FPGAs and the new class of parallel programmable processors (e.g. Stream Processors, IBM Cell). FPGAs have the advantage that they’re quite fast and retain a level of programmability. However, they have a drawback: the engineers with the FPGA programming competence are typically not video algorithm engineers, which results in either suboptimal implementations or difficult project collaboration. FPGAs also typically require some form of GPP for general control and system interfacing, so you end up with a multi-architecture solution, sometimes with significant data movement.
The new class of parallel programmable processors is certainly an interesting piece of technology worth watching. The claims from a couple of years ago are quite bold – ASIC-level speed with the ease and programmability of a GPP or DSP. I think the jury is still out on these claims, and as some of these technologies come to fruition, we’ll start to get some visibility into their actual performance and practical programmability.
OK, you noticed I left DSPs for last. DSPs have traditionally been the choice for many multimedia architectures, but have always left architects wishing for more. They were never quite fast enough, and it was never quite easy enough to extract their maximum performance. Many companies have switched DSPs and architectures in search of the ‘holy grail’. Well, I think we may finally be getting there with the next class of traditional DSPs that include GPP cores and hardware acceleration for the mathematical functions required by video compression. The result is the potential to have it all – the ease of GPP programmability for control software and system interfacing, the speed and programmability of a DSP core for traditional video and audio processing, and the raw speed of an ASIC for the heavy processing that a video compression algorithm like H.264 requires for functions like motion estimation and de-blocking. The result is a single core that can be used to build a transcoding product that is flexible, sustainable, eco-friendly, and, reflecting back to that director of mine many years ago, just plain ol’ fast.
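For a feel of why a function like motion estimation gets singled out for hardware acceleration, here is a back-of-the-envelope sketch. It assumes the brute-force "full search" approach with a sum of absolute differences (SAD) over 16x16 blocks; real encoders generally use much smarter searches, and the function name and default parameters below are illustrative assumptions only.

```python
import math


def full_search_sad_ops(width: int, height: int, fps: int,
                        block: int = 16, search_range: int = 16) -> int:
    """Estimate absolute-difference operations per second for full-search
    block matching (a deliberately naive model, not any encoder's method)."""
    # Number of blocks per frame, rounding partial blocks up.
    blocks = math.ceil(width / block) * math.ceil(height / block)
    # Every offset in a square window of +/- search_range pixels is a candidate.
    candidates = (2 * search_range + 1) ** 2
    # One absolute difference per pixel in the block, per candidate.
    ops_per_candidate = block * block
    return blocks * candidates * ops_per_candidate * fps


if __name__ == "__main__":
    ops = full_search_sad_ops(1920, 1080, 30)
    print(f"~{ops / 1e9:.1f} billion absolute differences per second")
```

Even as a crude estimate, this lands in the tens of billions of simple operations per second for a single 1080p stream, before any of the rest of the codec runs. That is the kind of regular, massively repetitive arithmetic that dedicated acceleration handles far better than general-purpose code.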