The conference for video devs
October 16th – 17th, 2024
Regency Ballroom, San Francisco
Buy tickets now!
Want to sponsor? Get in touch
Why attend?
No marketing, ever.
Speakers are selected based on their submission, not how much money their company paid; we will never, ever sell a speaking slot. Attendee information isn’t for sale either, not even to sponsors.
Affordable
We want anyone in the industry to be able to come, which means keeping tickets reasonably priced (thanks largely to our generous sponsors). We also offer free and discounted tickets to students and open source distributors, so please reach out if you’re interested.
For everyone in the community
Our community is dedicated to providing an inclusive, enjoyable experience for everyone in the video industry. In this pursuit, and in keeping with our love for reasonable standards, we adopted the Ada Initiative’s code of conduct.
Speakers
Alex Field
Sky/NBCU
Alex Giladi
Comcast
Anand Vadera
Meta
Bruce Spang
Netflix
Constanza Dibueno
Qualabs
Derek Buitenhuis
Vimeo
Eric Tang
Livepeer
Fabio Sonnati
NTT Data
Gwendal Simon
Synamedia
Jan De Cock
Synamedia
Jason Cloud
Dolby Laboratories
Jeff Riedmiller
Dolby Laboratories
Jill Boyce
Nokia
John Bowers
Twitch/Amazon IVS
Jon Dahl
Mux
Katerina Dobnerova
CDN77
Li-Heng Chen
Netflix
Luke Curley
Discord
Matteo Naccari
Visionular
RongKai Guo
NVIDIA
Ryan Cunningham
Scenery
Ryan Lei
Meta
Steve Robertson
YouTube
Tanushree Nori
Vimeo
Thomas Edwards
Amazon Web Services
Tony McNamara
Paramount Streaming
Tracey Jaquith
Internet Archive
Vanessa Pyne
Daily
Walker Griggs
Mux
Wei Wei
Netflix Inc
Will Law
Akamai
Yingyu Yao
YouTube
Yuriy Reznik
Brightcove, Inc.
Zoe Liu
Visionular
Alex Field
Sky/NBCU
The Colorful Truth of Automated Tests
Trying to automatically test what the end user actually sees and hears on their streaming device is hard - very hard. Automated testing methods often rely on unreliable data from player APIs, leading to inaccurate results. This talk showcases how we experimented with content encoded with visual and audio cues to validate that our player APIs are really telling the truth about what the user is seeing.
Alex Giladi
Comcast
Ads and overlays
The concept of server-guided ad insertion (SGAI), first introduced by Hulu in 2019, is getting increasingly popular in the industry. It is markedly more scalable than the traditional server-side (SSAI) approach, but nearly as resilient. It is more interoperable and more resilient than the client-side (CSAI) approach but is nearly as efficient and versatile. Client-side graphic overlays are to a degree a reincarnation of the banner ads plaguing the web since the '90s. Their main use is not necessarily ad-related -- they are used in a variety of roles from station identification to localization to emergency notification. Traditionally, overlays have been inserted into the video in baseband (i.e., pre-transcoder) in a playout system, which is the least scalable and the highest-latency approach possible in the video world. The streaming ecosystem has standardized and maturing support for SGAI. Interstitials are used to implement the approach in HLS. XLink was used in the original MPEG DASH implementation of the approach; however, XLink suffers from a number of design flaws and was never widely implemented in the context of live channels and events. Media Presentation Insertion, a recent addition to MPEG DASH, revisits this concept and allows spawning a new media presentation while pausing the main channel. As opposed to HLS interstitials, media presentation insertion allows asynchronous termination ("return to network"), supports VAST tracking, and more. The same server-guided model can be applied to the overlay use case and has the potential to dramatically improve scalability, targeting, and glass-to-glass latency. This talk will first describe the new MPEG-DASH media presentation description approach and its application to SGAI and blackouts. It will then cover the application of the same principles to graphic overlays in MPEG-DASH. This presentation will conclude with a description and a demo of an open-source implementation of both technologies.
Anand Vadera
Meta
Optimizing Storage for Meta's Trillion-Video Catalog: Achieving Pareto Efficiency
Meta manages an extensive video catalog with over a trillion videos across various products, and this number is growing daily with more than one billion new videos added each day. The challenge lies in maintaining an efficient storage footprint while accommodating this continuous influx of new content. The goal is to achieve Pareto efficiency, optimizing the storage space without compromising the quality of the videos delivered. This balance is crucial for sustaining scalability and efficiency in Meta's Video Infrastructure. This talk will delve into an innovative method for addressing the problem at hand. It will discuss the fundamental concepts underpinning this approach and share valuable insights gained during its development and implementation. In particular, it will highlight effective strategies that have proven successful in enhancing storage efficiency without negatively impacting video quality. Furthermore, the presentation will touch upon the ongoing evolution of the system, showcasing how it is continually being improved to better tackle the challenge of managing an ever-growing video catalog while maintaining optimal storage usage. By sharing these learnings with others facing similar challenges, the hope is to contribute to the collective knowledge base and ultimately facilitate the development of more efficient and effective systems for managing large-scale video repositories.
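The abstract does not describe Meta's actual method, so the following is only a minimal sketch of what Pareto efficiency means for a storage-versus-quality trade-off. The candidate names, sizes, and quality scores are hypothetical; the function simply keeps the encodings that no other candidate beats on both axes at once.

```python
def pareto_front(candidates):
    """Return the candidates that are not dominated by any other candidate.

    Each candidate is a (name, size_bytes, quality) tuple (hypothetical data).
    Smaller size is better, higher quality is better. A candidate is dominated
    when another one is at least as good on both axes and strictly better on one.
    """
    front = []
    for name, size, quality in candidates:
        dominated = any(
            (s <= size and q >= quality) and (s < size or q > quality)
            for _, s, q in candidates
        )
        if not dominated:
            front.append((name, size, quality))
    return front


if __name__ == "__main__":
    # Hypothetical encodings of one video: (label, stored bytes, quality score).
    encodings = [("a", 100, 90), ("b", 80, 90), ("c", 60, 85), ("d", 70, 80)]
    for entry in pareto_front(encodings):
        print(entry)
```

Any encoding outside the returned set can be replaced by one that is smaller, better looking, or both, which is the sense in which storage is saved without compromising quality.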
Bruce Spang
Netflix
Wei Wei
Netflix Inc
Innovate Live Streaming with a Client Simulator
One of the major challenges in live streaming is the scarcity of real-world events to test innovative ideas and algorithms, such as adaptive bitrate (ABR) algorithms. Relying on actual live events for testing not only prolongs the innovation cycle but also increases the risk of negatively impacting user experience. To overcome this obstacle, we at Netflix have enhanced our existing client simulator to emulate live streaming scenarios. This simulator utilizes network traces and device characteristics gathered from real-world sessions to drive our production client library, and it plays a crucial role in driving innovation at Netflix. In this talk, we will first present how the client simulator simulates live streaming. Then we will demonstrate how it can be used to test new live encoding methods, like Variable Bitrate (VBR) encoding, and to evaluate various ABR algorithms on a large scale. We will conclude the talk with future directions.
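To make the idea of trace-driven ABR evaluation concrete, here is a toy sketch. It is not Netflix's simulator, which drives their production client library; the rate-based rule, the `safety` factor, the segment duration, and the function name are all assumptions chosen for illustration. The sketch replays a per-segment throughput trace and reports which renditions were picked and how long playback stalled.

```python
def simulate(trace_kbps, ladder_kbps, segment_s=2.0, safety=0.8):
    """Replay a throughput trace through a simple rate-based ABR rule.

    trace_kbps:  throughput seen while downloading each successive segment.
    ladder_kbps: available rendition bitrates.
    Returns (list of chosen bitrates, total stall time in seconds).
    Startup delay is counted as stall time in this toy model.
    """
    buffer_s = 0.0
    stall_s = 0.0
    estimate = trace_kbps[0]  # assume the first sample is known up front
    chosen = []
    for throughput in trace_kbps:
        # Pick the highest rendition below a safety margin of the estimate,
        # falling back to the lowest rendition when none fits.
        fitting = [b for b in ladder_kbps if b <= safety * estimate]
        bitrate = max(fitting) if fitting else min(ladder_kbps)
        download_s = bitrate * segment_s / throughput
        # Playback drains the buffer while the segment downloads.
        if download_s > buffer_s:
            stall_s += download_s - buffer_s
            buffer_s = 0.0
        else:
            buffer_s -= download_s
        buffer_s += segment_s
        estimate = throughput  # last-sample throughput estimator
        chosen.append(bitrate)
    return chosen, stall_s


if __name__ == "__main__":
    picks, stalls = simulate([4000, 4000, 1000, 1000], [500, 1500, 3000])
    print(picks, stalls)
```

Swapping in a different selection rule and replaying the same traces is what lets algorithms be compared without waiting for a real live event.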
Constanza Dibueno
Qualabs
How to play DASH on your HLS Player
What if I told you that you could play a DASH video seamlessly in an HLS player? In today's broadcasting landscape, interoperability is a challenge. Reaching a broader audience means creating multiple copies of each stream file in different formats, which doubles the costs of packaging and storage. This inefficiency is a significant pain point for broadcasters. CMAF was designed to revolutionize HTTP-based streaming media delivery. It streamlines media delivery by using a single, unified transport container compatible with both HLS and DASH protocols. At the latest MonteVideo Tech Summer Camp, we embarked on an exciting project: creating a library based on the CMAF standard and the Hypothetical Application Model. This library provides a practical solution for converting playlists and manifests between HLS and DASH. We brought this vision to life by building a proof of concept. We want to present to you an intuitive Open Source UI built on top of the library. In this presentation, we will showcase how the UI can help you understand the library's capabilities and its potential for creating tools that simplify the broadcasting experience, without having to go deep into the complexities of the CMAF specification. For example, it allows users to take a DASH manifest as an input, convert it to an HLS playlist on the fly, and play the content on an HLS player. With this capability, broadcasters could adapt to different streaming requirements, delivering content across various platforms and devices, thereby enhancing adaptability and flexibility. By the end of this presentation, we aim to show approaches that could enhance interoperability in your broadcasting operations, using the CMAF HAM UI as a tool.
Vanessa Pyne
Daily
Derek Buitenhuis
Vimeo
Be the change you want to see: How to contribute to FFmpeg
Have you ever written code you wanted to contribute to FFmpeg, but you got a little tripped up in the send-email situation or maybe you got some feedback you weren't sure how to handle and your patch never made it across the finish line? Maybe you went to GitHub to make a PR, saw PULL REQUESTS ARE IGNORED, followed the link to the contribution documentation, saw a 28 point checklist and backed away slowly from your computer. Don't give up the dream! This talk will review the entire FFmpeg contribution process from soup to nuts and demystify the scary parts. It will focus on procedure and potential sharp edges thereof, rather than the actual code contribution itself. If that sounds very dry, rest assured the only thing dry about it will be the wit. This information may be elementary to some folks, but to paraphrase a recent FFmpeg-devel mailing list sentiment: "More diversity would be good." Making the process more accessible is key to making the circle bigger and encouraging a more diverse group of people to participate in the FFmpeg-devel ecosystem. If we want some new kids on the block, there should be a step by step guide, and this talk aims to be just that. A brief outline of the talk is as follows: 1. How to lurk (mailing list & IRC) 2. Find a thing to fix, improve, create 3. How to run regression tests (FATE, etc) 4. How to git patch (aka how to send an email) 5. How to address feedback 6. It merged! Now what?
Eric Tang
Livepeer
Progress and Opportunities in Video Content Authenticity
In an era where AI-generated video is rapidly becoming the norm, the need for video content authenticity has never been more critical. Over the past year, we've witnessed significant strides in this area, with industry giants like OpenAI, Google, and BBC joining the Coalition for Content Provenance and Authenticity (C2PA) and committing to integrate this technology into their core products. Join us in this session as we dive into C2PA’s significant technical advancements over the past year, and map out a practical approach for implementing C2PA in any video application. Discover the intricacies of C2PA’s trust model and understand how it safeguards users on video platforms. We'll also cover essential implementation considerations, from video player UX to backend video workflow management. As long-standing members of the Content Authenticity Initiative (CAI) and key contributors to C2PA, we bring a wealth of experience from participating in weekly working groups and shaping the last two versions of the C2PA specification. Our expertise is backed by numerous workshops and presentations at leading conferences and industry events like NAB and the C2PA symposium.
Fabio Sonnati
NTT Data
The Long(est) Night
April 28, 2019, a phone call wakes me in the middle of the night: "The Long Night", a new episode of the final season of Game of Thrones, is airing, but nothing is visible! The artistic intent is clearly extreme, and the encoding can't handle it, resulting in a flurry of confused silhouettes in the darkness, struggling in a dark sea of banding. In this presentation, I will talk about how we resolved an extreme situation for a high-quality streaming service by manually customizing encoding to mitigate the problem, inspired by well-known principles in the world of audio processing and 3D game rendering.
Gwendal Simon
Synamedia
Token Renewal: Keeping your Streaming Party Smooth and Secured
CDN leeching is a growing concern for content providers. The recent specification of the Common Access Token (CAT) has introduced a vehicle for designing more secure streaming delivery systems. Best practices for CDN content protection often involve renewing the token, either due to short expiration times or probabilistic rejections. However, token renewal is far from trivial. In token-based delivery systems, we identify three key entities: the client, the CDN server, and the token generator. Typically, these communicate via HTTP(S). At any point during a streaming session, the CDN server may request the client to renew its token, ensuring seamless video playback, even for low-latency ABR streaming. The CAT specification includes two claims related to renewal: catr and catif. While the specification details several operation modes, none fully satisfy the combined requirements for fast renewal, legacy clients, and the unique characteristics of DASH and HLS. In this talk, we will unpack the current situation, presenting the pros and cons of each proposed solution. We aim to open the door to a better solution and outline the community effort needed for its implementation.
Jan De Cock
Synamedia
Measuring live video quality with minimal complexity, now available for everyone!
We all love video, and we love it even more when the quality of the video is great! To measure that quality, we already have quite a few options, and the folks at Netflix did a great job at giving us VMAF. This is all fine and dandy for our VOD colleagues, but what about us, *live* video engineers? We struggle to optimize every cycle in our live encoders, and spending a full CPU core on metric calculation is just not acceptable -- and not good for business. We spent quite some time figuring out how to simplify this problem. Our marketing people said: "Why don't you use AI?" So we did, and imagine that, in this case it actually worked. We'll forget about all those other projects that got stuck in the trough of AI disillusionment. Turns out that metrics such as SSIM and VMAF can be quite accurately predicted, and by using smart features inside the encoder, this can be done with marginal additional computational complexity. In the talk, we’ll explain how we found a balance between accuracy and complexity of the features and ML networks used. All fine for *your* encoder you say, but how does that help me? Well, we took on the challenge to show that this approach also works for open-source encoders, with x264 as our first target. And, we’re sharing the code, so you can try it out too! And while we’re eagerly awaiting the 10th Demuxed over the coming months, we’ll also be trying this approach on SVT-AV1. Too early to tell if this attempt will be successful, but we’ll be able to tell you in October, and take you through the process during the talk!
Jason Cloud
Dolby Laboratories
Jeff Riedmiller
Dolby Laboratories
Does a multi-CDN setup (truly) require switching? Deploying an anti-switching multi-CDN delivery platform.
It seems pretty clear that using multiple CDNs to deliver media is a good thing; but it’s hard to do effectively. What is the best policy to use? How do you determine when to switch? How often do you switch? Do you switch based on client performance alone, consolidated user metrics, or something else? What happens when the CDN you switched to isn’t performing as well as you thought it would? Answering these questions (let alone designing a multi-CDN switching architecture) is enough to give anyone a headache. What if we throw out “switching” by downloading media from multiple CDNs at the same time? We could then realize the best performance by merging the performance of each. Seems simple enough until you start trying to do it efficiently. Do you race media from multiple CDNs at the same time, or do you try to perform sub-segment/byte-level scheduling? This seems even more complicated than before! This talk will focus on how to implement and deploy a switchless multi-source media delivery platform that is highly performant and efficient and avoids having to answer these difficult questions or solve massively complicated scheduling problems. Enabling true multi-source delivery without all the fuss requires us to do something a little bit unique to the content we are streaming. We first create multiple “versions” (one for each source, aka CDN) of each and every HLS or MPEG-DASH segment. This is done by packaging these segments into something called the Coded Multisource Media Format (CMMF), which is currently being standardized in ETSI. CMMF is essentially a container that is used to communicate network encoded content (network coding is kind of like forward error correction (FEC), but not – we’ll expand upon this more during the talk). Each CMMF version is then cached on a different CDN. Now let’s say a media player wants to download a particular video segment. Instead of downloading the entire segment from one CDN or requesting bytes x1 through y1 of that segment from one CDN and x2 through y2 from another, the media player requests multiple CMMF versions of that segment from different CDNs at the same time. Once the player receives enough data (the amount required is very close to that of the original segment) from the collection of CDNs, it can stop the download and recover the segment it wanted. By doing it this way, we don’t have to worry about hiccups (like temporary congestion or slow response times) on one CDN because we will just download more from the others. During the talk, we will introduce CMMF and the core concepts behind it, and go over how we deployed it within a streaming service with 20+ million subscribers world-wide and streamed approximately one million hours of content using it. We will also provide performance data that shows how CMMF-enabled multi-CDN delivery stacks up against a popular multi-CDN switching approach (as you can guess, it stacked up well). We hope this talk provides the audience with a different perspective on an “age-old” problem and inspires them to explore multisource delivery in greater detail.
Jill Boyce
Nokia
Bringing more versatility to VVC with VSEI
Versatile Supplemental Enhancement Information (VSEI) is a companion standard to Versatile Video Coding (VVC). VSEI defines SEI messages that contain metadata inserted into a bitstream synchronized with the coded video, to convey extra information intended to be utilized by the receiver/decoder. SEI messages are optional and are targeted at specific use cases. SEI messages specified in VSEI may also be used with other video coding standards, including H.264/AVC, HEVC, or future standards. Since the initial standardization of VVC and VSEI in 2020, second and third editions of VSEI have been standardized, with a fourth edition under development. The new SEI messages included in new versions of VSEI bring even more versatility to VVC, by addressing a broader variety of applications. This talk will describe several of the new SEI messages and the use cases they enable.
John Bowers
Twitch/Amazon IVS
Free ABR Renditions for User Generated Content Platforms
Well, not exactly free - but much, MUCH lower cost than server-side transcoding! Providing an ABR ladder is table stakes for live viewer experiences, but it’s expensive for at-scale video platforms to provision and maintain specialized infrastructure to handle peak transcoding demand. A recently developed update to the Enhanced RTMP specification adds multitrack video, multitrack audio, and advanced codec support. With implementations in OBS Studio and Wireshark, the technology is ready for you to adopt it. Now you can offer all creators - regardless of audience size or creator - ABR playback. Come and learn why encoding multiple video tracks on the content creator’s machine at the edge is higher quality, lower latency, and more scalable than server-side transcoding – all while allowing faster innovation and deployment of newer codecs like HEVC and AV1.
Jon Dahl
Mux
A taxonomy of video "quality," or: was Strobe right or wrong about quality?
“Quality” is one of the most abused and overloaded terms in video. Orwell says that unclear language leads to unclear thinking, and: wow, our industry suffers from unclear thinking around quality. We conflate codecs like AV1 with "high quality"; we don’t know the difference between QoS and QoE; we’re 👍 on VMAF but we don’t really know how to use it. Meanwhile, Strobe gets on stage at Demuxed 2018 and says “Video quality doesn’t matter” (as the audience gasps in horror). In this talk, we’ll bring clarity and precision to the domain of “video quality.” We’ll learn the difference between QoE, QoS, perceptual quality, fidelity, efficiency, and more. We will review a schema that once and for all will eliminate all confusion, doubt, and ignorance from this area, driving our industry forward into a more enlightened future. And most importantly, we’ll learn whether Strobe was right or wrong when he said quality didn’t matter.
Katerina Dobnerova
CDN77
Enhancing CDN Performance and Cutting Egress Costs in Large Video Libraries Delivery: Advanced Caching Strategies and Edge Computing Optimization
During the 20 minutes of my presentation, users worldwide will generate content equivalent to the volume created from the dawn of civilisation until 2003. The volume of content being created today is staggering. Consider this: from the beginning of recorded history until 2003, we produced roughly 5 exabytes of content. However, projections suggest a monumental leap to 147 zettabytes in 2024 alone, with video content leading the charge. With such exponential growth in content and its shortening life span, content delivery networks (CDNs) face significant challenges in effectively caching large video libraries. While cache hit rates of 98% and higher are taken for granted, the figures above suggest that simple disk space inflation is not remotely enough to keep the cache hit ratio at the desired figures. This presentation explores many approaches, including tiered cache systems which use a hierarchical system of caching servers employing consistent hashing and other techniques to maximize scalability and performance while minimizing failover and downtime. It also covers one-hit-wonder elimination, utilizing simple counters to reduce cache pollution by avoiding storing unpopular content. It also addresses cache-state sharing, which employs Bloom-filter-based technology to further improve cache scalability and effective disk space utilization. Moreover, it will examine the deployment of edge computing to amplify caching efficiency in specific use cases.
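The one-hit-wonder elimination mentioned above can be illustrated with a small sketch. This is not CDN77's implementation; the class name, the `admit_after` threshold, and the LRU eviction policy are assumptions made for illustration. The idea shown is that an object is only admitted to the cache once a simple counter says it has been requested more than once, so content requested a single time never displaces popular content.

```python
from collections import Counter, OrderedDict


class SecondHitCache:
    """LRU cache that admits an object only after repeated requests.

    Hypothetical sketch: `admit_after` is the number of requests needed
    before an object is stored. The request counter here grows without
    bound; a real system would need to cap or age it.
    """

    def __init__(self, capacity, admit_after=2):
        self.capacity = capacity
        self.admit_after = admit_after
        self.seen = Counter()       # request counts for objects not yet cached
        self.store = OrderedDict()  # cached objects in LRU order

    def get(self, key, fetch):
        """Return (value, was_cache_hit); `fetch` loads from origin on a miss."""
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key], True
        value = fetch(key)
        self.seen[key] += 1
        if self.seen[key] >= self.admit_after:
            self.store[key] = value
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used
        return value, False


if __name__ == "__main__":
    cache = SecondHitCache(capacity=2)
    origin = lambda name: name.upper()  # stand-in for an origin fetch
    for name in ["a", "b", "a", "a"]:
        print(name, cache.get(name, origin))
```

In this run, "b" is requested once and never occupies cache space, while "a" is admitted on its second request and served from cache on its third.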
Li-Heng Chen
Netflix
Ryan Lei
Meta
A hitchhiker's guide to AV1 deployment
Six years since its inception as a video coding standard stipulated by the Alliance for Open Media, AV1 has proven its capability as a Swiss Army knife, with application domains spanning the streaming of movies and TV shows, user generated content and real-time video conferencing, including screen content, among others. This talk will feature a roadshow of AV1 deployments that have impacted billions of people's lives, presented by engineers with first-hand experience of its implementation in production systems. Presenters will share tips, tricks, know-how and the lessons learned from their deployment experience to bring the best performance out of AV1. Example topics include, but are not limited to: productization of AV1’s film grain synthesis feature and use of AV1 to deliver high dynamic range video at Netflix, AV1 deployment in Instagram Reels, and AV1 support for RTC services at Meta.
Luke Curley
Discord
Replacing WebRTC with Media over QUIC
It's been over a decade since WebRTC was released. Surely there's something new on the horizon, right? Media over QUIC is an IETF working group that is working on a new live media standard to replace the likes of WebRTC, RTMP/SRT, and HLS/DASH. Wow that's overly ambitious, but it's being backed by your favorite big tech companies (and some non-favorites) in the same standards body that has produced hits such as... WebRTC. But replacing WebRTC is difficult. It exists because there were no web standards in 2011 that could accomplish conferencing; remember this was before even HTML5. But there are new Web standards now! This talk will go over WebTransport and WebCodecs, and how they are utilized to provide a user experience nearly on par with WebRTC while being dramatically more flexible. No more magic black box, no more ICE/STUN/TURN/SDP/DTLS/SCTP/RTP/SRTP/mDNS, no more getting ghosted by Google. Just you with a QUIC connection and the ability to encode/decode video frames. And of course we'll go over the promise of Media over QUIC and why you should use the standard instead of your own bespoke protocol. I'll give you a hint, it starts with C and ends with "DN Support".
Matteo Naccari
Visionular
Compression of stereoscopic video with MV-HEVC: fundamentals, tools and development
Multiview (e.g. stereoscopic) content provides users with a fully immersive and compelling quality of experience when watching videos. This type of content is gaining new momentum thanks to the development and commercialisation of Virtual Reality (VR) headsets such as Apple Vision Pro and Oculus Quest. The delivery of Multiview video poses new challenges for the video coding community, since frames are composed of multiple views (two in the case of stereoscopic). Standardisation bodies such as ISO/IEC MPEG and ITU-T VCEG envisaged the compression of Multiview content with the H.265/HEVC standard, extended to efficiently tackle the intrinsic data redundancies present across different views. Thanks to the availability of VR headsets, content providers and codec vendors are now deploying solutions supporting the Multiview extension of H.265/HEVC (collectively known as MV-HEVC). This talk will introduce the MV-HEVC standard from the encoder designer’s perspective, starting with an overview of the standard’s design and tools supported. The focus will then move on to consider the challenges faced when implementing practical encoding solutions such as fast mode decision and rate control.
RongKai Guo
NVIDIA
Zoe Liu
Visionular
AI Enhanced GPU Video Coding: Achieving Joint High Compression Efficiency and Throughput
We are here to present a novel approach to significantly boost video compression efficiency on NVIDIA NVENC hardware encoders, by leveraging AI-driven pre-analysis and pre-processing algorithms. We refer to this method as AI Enhanced GPU Video Coding, which combines NVIDIA NVENC's high density, low latency, and high throughput with ML-based techniques to enhance video compression efficiency and boost visual quality, while maintaining high throughput. NVENC, as a leading hardware-based encoder, excels in providing high throughput and low latency but generally offers lower compression efficiency compared to CPU-based software encoders. Our AI-driven GPU video compression approach aims to leverage the advantages of both NVENC and AI algorithms to achieve high compression efficiency and throughput performance. Our optimization algorithms mainly include: 1. ML-based Scene & Region Classification: Identifying effective coding tools based on scene and region classification. 2. Regions of Interest (ROI) Identification: Focusing on perceptually significant regions, such as faces and jersey numbers in typical sports videos. 3. Pre-processing Techniques: Applying deblurring, denoising, sharpening, contrast adjustment, etc. to boost visual quality. 4. Hierarchical pre-analysis and pre-classification: Setting fine-grained QPs, including block-based QPs, and enabling quick quality monitoring. These techniques combined improve video compression efficiency, boosting both objective and subjective quality while achieving significant bitrate savings. We have applied these methods to large UGC content platforms. Our results demonstrate promising improvements in compression efficiency for both VOD and live use cases. 
Using the NVIDIA T4 Tensor Core, we maintained the same high throughput for multiple parallel encoding threads and achieved a 15-20% bitrate saving and a 1-2 VMAF score improvement at the same time, on typical UGC & PUGC content compared to the out-of-the-box NVENC approach. Further enhancements, such as re-encoding, are currently being developed and further compression gains are expected.
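As a rough illustration of the block-based QP idea in item 4 above, the sketch below turns region-of-interest rectangles into a per-block QP offset grid. The function name, the 16-pixel block size, and the -4 offset are assumptions made for illustration; they are not values from the talk, and this is not the NVENC API.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels, hypothetical layout


def qp_delta_map(width: int, height: int, rois: List[Rect],
                 block: int = 16, roi_delta: int = -4) -> List[List[int]]:
    """Return a rows x cols grid of QP offsets, one per block.

    Blocks touched by any ROI get `roi_delta` (a negative offset means
    finer quantization, so more bits); all other blocks get 0.
    """
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    grid = [[0] * cols for _ in range(rows)]
    for x, y, w, h in rois:
        first_row, last_row = y // block, (y + h - 1) // block
        first_col, last_col = x // block, (x + w - 1) // block
        for r in range(first_row, min(rows, last_row + 1)):
            for c in range(first_col, min(cols, last_col + 1)):
                grid[r][c] = roi_delta
    return grid
```

A detector for faces or jersey numbers would supply the rectangles; how such a grid is handed to a particular encoder depends on that encoder's interface.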
Ryan Cunningham
Scenery
WebCodecs vs. WASM for Fast Video Scrubbing
We built a web-based video editor capable of fast scrubbing and advanced WebGL compositing features. This talk explores the intricacies of building such an editor using WebCodecs for video decoding and preview and contrasts it with traditional methods, specifically using HTML video, or a WASM H264 decoder. The goal is to provide a comprehensive guide on implementing a high-performance video editor preview that leverages modern web technologies while addressing practical challenges and limitations, and also reveal areas where improvement is needed. HTML video elements, while widely used, pose significant limitations for fast scrubbing and precise frame accuracy. Slow seeking, lack of control over frame rendering, and the need to use drawImage to get frames into a WebGL texture can hinder the perceived speed in a video editor. WebCodecs provides a low-level API that allows developers to decode video segments and render them to textures, enabling extremely fast scrubbing and WebGL compositing directly in the web browser. By holding video data in GPU textures, we achieve advanced features such as alpha-transparency using just the H264 decoder. The talk will dive into the implementation details, showcasing pre-loading and garbage collection techniques. We will also discuss the pipeline nature of WebCodecs decoders, which necessitates efficient management of VideoFrames to maintain performance. Despite its advantages, WebCodecs comes with its own set of challenges. The hardware-based implementation means no actual concurrent decodes, and rendering VideoFrames to textures is surprisingly CPU-intensive. Additionally, the performance can be inconsistent across different hardware due to Google's GPU exclusion list in Chrome, which defaults to software decoding on certain computers. This session will cover mitigation strategies, including conducting test decodes to determine performance viability. We will discuss the trade-offs and potential pitfalls of using WebCodecs. 
Before the advent of WebCodecs, our approach involved using a WASM-compiled H264 decoder, tinyh264. Using WASM in Web Workers, we achieve true concurrent decoding. However, it comes with its own set of limitations. Running entirely on the CPU, it requires managing frames in main memory and handling the upload to the GPU, alongside color space conversions from YUV to RGB. Furthermore, it creates licensing issues since it distributes an H264 decoder. We will discuss the implementation details, performance considerations, and how it compares to WebCodecs.
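To give a sense of the YUV-to-RGB step mentioned above, here is a minimal CPU-side sketch that converts a planar 4:2:0 frame to RGB pixels. It assumes full-range BT.601 coefficients and even frame dimensions; the matrix and range a real decoder output needs depend on the stream's signalling, and the function names are hypothetical.

```python
from typing import List, Tuple


def clamp8(value: float) -> int:
    """Round and clamp to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y: int, cb: int, cr: int) -> Tuple[int, int, int]:
    """Convert one full-range BT.601 YCbCr sample to RGB (assumed matrix)."""
    d = cb - 128
    e = cr - 128
    return (
        clamp8(y + 1.402 * e),
        clamp8(y - 0.344136 * d - 0.714136 * e),
        clamp8(y + 1.772 * d),
    )


def i420_to_rgb(width: int, height: int, y_plane: bytes,
                u_plane: bytes, v_plane: bytes) -> List[Tuple[int, int, int]]:
    """Convert planar 4:2:0 data to a row-major list of RGB tuples.

    Each chroma sample is shared by a 2x2 block of luma samples.
    """
    chroma_width = width // 2
    pixels = []
    for row in range(height):
        for col in range(width):
            chroma_index = (row // 2) * chroma_width + (col // 2)
            pixels.append(yuv_to_rgb(y_plane[row * width + col],
                                     u_plane[chroma_index],
                                     v_plane[chroma_index]))
    return pixels
```

In a browser this arithmetic would more likely run in a fragment shader after uploading the planes as textures; the per-pixel loop is only meant to show the math.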
Li-Heng Chen
Netflix
Ryan Lei
Meta
A hitchhiker's guide to AV1 deployment
Six years since its inception as a video coding standard stipulated by the Alliance for Open Media, AV1 has proven its capability as a Swiss Army knife, with application domains spanning the streaming of movies and TV shows, user-generated content, and real-time video conferencing, including screen content, among others. This talk will feature a roadshow of AV1 deployments that have impacted billions of people's lives, presented by engineers with first-hand experience of implementing it in production systems. Presenters will share tips, tricks, know-how, and lessons learned from their deployment experience to bring the best performance out of AV1. Example topics include, but are not limited to: productization of AV1’s film grain synthesis feature and use of AV1 to deliver high dynamic range video at Netflix, AV1 deployment in Instagram Reels, and AV1 support for RTC services at Meta.
Steve Robertson
YouTube
Why is gapless so hard?
A deep dive into audio gaplessness, for video engineers. Covering the difference between stitching, pseudo-gapless, and true gapless approaches, why gapless is important to the art, the mechanical reasons why the audio clock always wins, how the system reconciles this instability, and why this leads to dropped frames and A/V sync issues.
Tanushree Nori
Vimeo
Budgeting Bytes: Acing Cost-Efficient Video Storage
In today's world, where data never stops growing, Vimeo is at the forefront, cleverly slashing storage costs while keeping videos readily accessible. In my talk, I’ll peel back the curtain on how we fine-tune cloud storage using Machine Learning, balancing cost savings with cheap and quick video access at Vimeo. We’ve cut our storage bills by an impressive 60% by applying smart lifecycle policies and a dash of machine learning methods. I'll share insights on how we determine the best times to tuck away older videos into cheaper storage tiers and what factors go into these decisions. This talk will offer practical strategies and a peek into the tools that help Vimeo manage a sprawling video library efficiently. Discover how these innovations can help reshape your approach to data storage too!
Thomas Edwards
Amazon Web Services
Video Processing on Quantum Computers
Quantum computing (QC) utilizes quantum mechanics to solve complex problems faster than on "classical" computers. QCs available today are considered "Noisy Intermediate-Scale Quantum" (NISQ) computers with a small number of quantum bits (qubits) and limited performance due to short coherence time and noisy gates. QCs are improving all the time, so it is possible that in the future they could provide practical acceleration to video processing workflows (remember how neural networks were in the 1990s?). This presentation will give a short overview of QC basics, results of representing (simple) images on an actual cloud-accessible QC, and will describe some research on potential video processing applications of QCs.
Tony McNamara
Paramount Streaming
Pseudo-Interstitials: Playback flexibility for legacy devices.
Interstitials allow the insertion of content by reference into a playback stream, and are especially useful when a playlist won't work. But Interstitials are also still relatively new; just a year ago Apple devices didn't support playback of them, despite Apple having accepted them into the HLS Specification years earlier. DASH XLinks suffer the general inconsistency so consistent in DASH. And of course legacy devices tend to be stuck on much earlier protocol versions. We've come up with "Pseudo-Interstitials", which provide much of the same flexibility, to allow very-late decisioning and binding of content, especially ads, into playback of legacy devices. This will include a very brief introduction to interstitials and their value, and the problem statement, and then a deep dive into the multi-disciplinary solution including encoding concerns, manifest manipulation, Edge Computing and even briefly SSAI constraints.
Tracey Jaquith
Internet Archive
What's on TV? 4 editors and 2 robots walk into a bar..
Using TV news "chyron" text overlays in the "lower third" (written by human editors), image-to-text (OCR), grouping/filtering, and GPT summarization, we post hourly to social media: "What's on TV?" The non-caption news text (e.g. BIDEN VISITS MEXICO) that shows up at the bottom of the screen (like those overhead monitors in airports showing news) is gold, written in real time by editors during live broadcasts. However, the data is not carried anywhere inside the video streams (just visually). What's a girl with robots to do? Using CNN, MSNBC, Fox News and BBC News feeds, we use ffmpeg to crop the relevant image area; tesseract to OCR the image into text; and GPT to summarize, remove ads, and clean up the text. We then post hourly summaries to Mastodon.
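The crop-then-OCR step could be sketched roughly as follows. The helper names, the 25% lower-third fraction, and the file paths are assumptions for illustration; the abstract does not say which crop geometry or options are actually used. The functions only build argument lists for ffmpeg's `crop` filter and the tesseract command line, so nothing runs until they are passed to `subprocess.run`.

```python
from typing import List


def lower_third_crop(width: int, height: int, fraction: float = 0.25) -> str:
    """Build an ffmpeg crop filter (crop=w:h:x:y) for the bottom strip.

    `fraction` is an assumed share of the frame height, not a value
    taken from the talk.
    """
    crop_height = int(height * fraction)
    return f"crop={width}:{crop_height}:0:{height - crop_height}"


def grab_chyron_cmd(source: str, at_seconds: float, out_png: str,
                    width: int, height: int) -> List[str]:
    """ffmpeg arguments that save one cropped frame as an image."""
    return ["ffmpeg", "-y", "-ss", str(at_seconds), "-i", source,
            "-vf", lower_third_crop(width, height),
            "-frames:v", "1", out_png]


def ocr_cmd(image_path: str) -> List[str]:
    """tesseract arguments that print the recognized text to stdout."""
    return ["tesseract", image_path, "stdout"]
```

For example, `subprocess.run(ocr_cmd("chyron.png"), capture_output=True, text=True).stdout` would return the recognized text, which can then be grouped, filtered, and passed on for summarization.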
Vanessa Pyne
Daily
Derek Buitenhuis
Vimeo
Be the change you want to see: How to contribute to FFmpeg
Have you ever written code you wanted to contribute to FFmpeg, but you got a little tripped up in the send-email situation or maybe you got some feedback you weren't sure how to handle and your patch never made it across the finish line? Maybe you went to GitHub to make a PR, saw PULL REQUESTS ARE IGNORED, followed the link to the contribution documentation, saw a 28-point checklist and backed away slowly from your computer. Don't give up the dream! This talk will review the entire FFmpeg contribution process from soup to nuts and demystify the scary parts. It will focus on procedure and potential sharp edges thereof, rather than the actual code contribution itself. If that sounds very dry, rest assured the only thing dry about it will be the wit. This information may be elementary to some folks, but to paraphrase a recent FFmpeg-devel mailing list sentiment: "More diversity would be good." Making the process more accessible is key to making the circle bigger and encouraging a more diverse group of people to participate in the FFmpeg-devel ecosystem. If we want some new kids on the block, there should be a step-by-step guide, and this talk aims to be just that. A brief outline of the talk is as follows: 1. How to lurk (mailing list & IRC) 2. Find a thing to fix, improve, create 3. How to run regression tests (FATE, etc) 4. How to git patch (aka how to send an email) 5. How to address feedback 6. It merged! Now what?
Walker Griggs
Mux
PSSH, or the Primordial Soup of Secure Headers
Consider our friendly, neighborhood PSSH box. The semantics are simple -- to identify encryption keys -- but, as with any permissive specification, there’s a lot more going on than meets the eye. In some cases, they contain deeply nested little-endian UTF-16 XML. In others, we’ll find protocol buffers containing base64-encoded JSON. In all cases, they have a surprising amount of personality. In this talk, we will dive deep into several PSSH boxes, dissecting them bit by bit across various popular DRM schemes. Along the way, we will: 1. Explore the history of the PSSH box and how it mirrors the evolution of DRM standards. 2. Discover how each provider has imparted their own company idioms onto the loosely-defined PSSH payload. 3. Identify where the decisions of one provider impacted the rest.
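For readers who have not opened one before, the outer layout of a PSSH box can be read with a few lines of code. The sketch below follows the Common Encryption box layout (size, type, version, flags, SystemID, optional key IDs for version 1, then an opaque data blob). It is a minimal reader that assumes a well-formed box with a 32-bit size; the function name and returned dictionary shape are illustrative choices, and the provider-specific payload is left undecoded.

```python
import struct
import uuid

# Two widely published DRM system IDs, included for labeling only.
KNOWN_SYSTEMS = {
    uuid.UUID("edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"): "Widevine",
    uuid.UUID("9a04f079-9840-4286-ab92-e65be0885f95"): "PlayReady",
}


def parse_pssh(box: bytes) -> dict:
    """Parse the generic fields of one 'pssh' box."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"pssh":
        raise ValueError("not a pssh box")
    version = box[8]
    system_id = uuid.UUID(bytes=box[12:28])  # bytes 9..11 are the flags
    offset = 28
    key_ids = []
    if version > 0:
        # Version 1 boxes list the key IDs before the payload.
        (kid_count,) = struct.unpack_from(">I", box, offset)
        offset += 4
        for _ in range(kid_count):
            key_ids.append(uuid.UUID(bytes=box[offset:offset + 16]))
            offset += 16
    (data_size,) = struct.unpack_from(">I", box, offset)
    offset += 4
    return {
        "size": size,
        "version": version,
        "system": KNOWN_SYSTEMS.get(system_id, str(system_id)),
        "key_ids": key_ids,
        "data": box[offset:offset + data_size],  # provider-specific payload
    }
```

Everything interesting then happens inside `data`, whose encoding differs per DRM system.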
Bruce Spang
Netflix
Wei Wei
Netflix Inc
Innovate Live Streaming with a Client Simulator
One of the major challenges in live streaming is the scarcity of real-world events to test innovative ideas and algorithms, such as adaptive bitrate (ABR) algorithms. Relying on actual live events for testing not only prolongs the innovation cycle but also increases the risk of negatively impacting user experience. To overcome this obstacle, we at Netflix have enhanced our existing client simulator to emulate live streaming scenarios. This simulator uses network traces and device characteristics gathered from real-world sessions to drive our production client library, and it plays a crucial role in driving innovation at Netflix. In this talk, we will first present how the client simulator simulates live streaming. Then we will demonstrate how it can be used to test new live encoding methods, like Variable Bitrate (VBR) encoding, and to evaluate various ABR algorithms on a large scale. We will conclude the talk with future directions.
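The general idea of trace-driven simulation can be illustrated with a toy model. This is not Netflix's simulator: it replays a list of per-segment throughput samples against a pluggable ABR rule and tracks buffer level and stall time. All names, the 2-second segment duration, and the 0.8 safety factor are assumptions, and live-specific constraints such as segment availability times are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SessionResult:
    bitrates_kbps: List[int]
    stall_s: float  # includes the initial startup delay


AbrRule = Callable[[List[int], float, Optional[float]], int]


def throughput_rule(ladder: List[int], buffer_s: float,
                    last_kbps: Optional[float], safety: float = 0.8) -> int:
    """Pick the highest rung below a safety fraction of the last throughput."""
    if last_kbps is None:
        return min(ladder)
    fits = [rung for rung in ladder if rung <= safety * last_kbps]
    return max(fits) if fits else min(ladder)


def simulate(trace_kbps: List[float], ladder: List[int],
             choose: AbrRule, segment_s: float = 2.0) -> SessionResult:
    """Replay one throughput sample per segment download."""
    buffer_s, stall_s, last_kbps = 0.0, 0.0, None
    chosen = []
    for throughput in trace_kbps:
        bitrate = choose(ladder, buffer_s, last_kbps)
        download_s = bitrate * segment_s / throughput
        if download_s > buffer_s:
            # The buffer ran dry before the segment arrived.
            stall_s += download_s - buffer_s
            buffer_s = 0.0
        else:
            buffer_s -= download_s  # playback drains the buffer meanwhile
        buffer_s += segment_s
        last_kbps = throughput
        chosen.append(bitrate)
    return SessionResult(chosen, stall_s)
```

Swapping in a different `choose` function and replaying the same traces is what makes comparisons between ABR rules repeatable without waiting for a real event.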
Will Law
Akamai
Creative Monkeys Contemplate Dating
The geeky primates at WAVE are releasing version 2 of the popular CMCD standard. While CMCD v1 was restrained to a CDN (data) relationship, v2 gives you three different modes for concurrently sharing data. Now you can date a content steering service, and an analytics service, at the same time as maintaining a committed relationship with your CDN :) This talk highlights the new features and capabilities of CMCD v2. In addition to the reporting mode enhancements, we'll investigate the host of new keys being offered: media start delay, target buffer length, buffer starvation duration, prefetching multiple objects at once, player state, response code, TTFB, timestamps, request URLs and many more. We'll explore how v2 can be used to drive lightweight data for content steering decisioning, rich collection for analytics providers that is decoupled from the delivery and even improved prefetching performance and visibility for the CDN. We'll show it all working and release some code so that you too can experiment. Join us!
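As background for the v2 discussion, the following sketch shows how a player might serialize CMCD v1 data into the `CMCD` query argument: keys sorted alphabetically, strings quoted, token values left bare, and a true boolean sent as the key alone. Only v1 keys (`br`, `bs`, `ot`, `sid`, `su`) are used because the final names of the v2 keys are not given here; the function names are hypothetical.

```python
from urllib.parse import quote

# Keys whose values are tokens rather than quoted strings in CMCD v1.
TOKEN_KEYS = {"ot", "sf", "st"}


def encode_cmcd(data: dict) -> str:
    """Serialize key/value pairs in CMCD v1 style."""
    parts = []
    for key in sorted(data):
        value = data[key]
        if isinstance(value, bool):
            if value:
                parts.append(key)  # true is sent as the bare key
            # false values are omitted entirely
        elif isinstance(value, (int, float)) or key in TOKEN_KEYS:
            parts.append(f"{key}={value}")
        else:
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{key}="{escaped}"')
    return ",".join(parts)


def cmcd_query(data: dict) -> str:
    """Return the URL-encoded query argument to append to a media request."""
    return "CMCD=" + quote(encode_cmcd(data), safe="")
```

For instance, a video segment request at 3200 kbps during buffer starvation could carry `cmcd_query({"br": 3200, "bs": True, "ot": "v", "sid": "abc"})` on its URL.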
Yingyu Yao
YouTube
Your TV Is Eating Your Frames
At YouTube, we aspire to stream cat videos to everything that has a screen, including the largest of them all: TVs in your living room. Despite being devices engineered to be video playback powerhouses, it is unexpectedly difficult to make videos play consistently and smoothly on them. Through the lens of a player engineer, I will take you on a shallow dive through the TV media stack, and we will explore the different ways playback can get tripped up on those large screens.
Yuriy Reznik
Brightcove, Inc.
Streaming in 1970s. NVP & ST: the very first real-time streaming protocols.
In this talk we will go back in history and look at the very first protocols and systems developed for internet streaming: the venerable NVP (Network Voice Protocol) and ST (Internet Stream Protocol, aka IP v5), developed by Danny Cohen, Jim Forgie, and other brilliant engineers at MIT Lincoln Labs in the 1970s. We will discuss the key ideas introduced by these protocols (the concepts of sessions, available capacity assessment, rate negotiation between sender and receiver, data transfer protocols, the need for network-layer support for sessions, resource provisioning, etc.) and show how most of these ideas became incorporated in subsequent designs. Specifically, we will show how many ideas introduced in NVP and ST have eventually found their way into modern protocols, such as WebRTC, QUIC and MOQ. The talk will include many historical pictures and some videos of those early pioneering systems built in the 1970s. It will also try to explain what motivated the original developers to come up with all these techniques.
Venue & location
1300 Van Ness Ave.
San Francisco, CA 94109
The Regency Ballroom is a beautiful, centrally-located San Francisco event venue.
According to their website, the building is noted as a fine example of Scottish Rite architecture. Its ballroom is a Beaux-Arts treasure with thirty-five-foot ceilings and twenty-two turn-of-the-century teardrop chandeliers.
According to one intrepid online reviewer, “Took my son to a death metal concert here and it was awesome!” …so, you know it's gotta be good.
The Schedule
9:40 AM PDT
Matt McClure
Demuxed
Opening Remarks
Tanushree Nori
Vimeo
Budgeting Bytes: Acing Cost-Efficient Video Storage
Alex Field
Sky/NBCU
The Colorful Truth of Automated Tests
Trying to automatically test what the end user actually sees and hears on their streaming device is hard - very hard. Automated testing methods often rely on unreliable data from player APIs, leading to inaccurate results. This talk showcases our journey of experimenting with content encoded with visual and audio cues to validate that our player APIs are really telling the truth about what the user is seeing.
10:40 AM PDT
Break
11:15 AM PDT
Walker Griggs
Mux
PSSH, or the Primordial Soup of Secure Headers
Eric Tang
Livepeer
Progress and Opportunities in Video Content Authenticity
In an era where AI-generated video is rapidly becoming the norm, the need for video content authenticity has never been more critical. Over the past year, we've witnessed significant strides in this area, with industry giants like OpenAI, Google, and BBC joining the Coalition for Content Provenance and Authenticity (C2PA) and committing to integrate this technology into their core products. Join us in this session as we dive into C2PA’s significant technical advancements over the past year, and map out a practical approach for implementing C2PA in any video application. Discover the intricacies of C2PA’s trust model and understand how it safeguards users on video platforms. We'll also cover essential implementation considerations, from video player UX to backend video workflow management. As a long-standing member of the Content Authenticity Initiative (CAI) and a key contributor to C2PA, we bring a wealth of experience from participating in weekly working groups and shaping the last two versions of the C2PA specification. Our expertise is backed by numerous workshops and presentations at leading conferences and industry events like NAB and the C2PA symposium.
Jason Cloud
Dolby Laboratories
Jeff Riedmiller
Dolby Laboratories
Does a multi-CDN setup (truly) require switching? Deploying an anti-switching multi-CDN delivery platform.
It seems pretty clear that using multiple CDNs to deliver media is a good thing; but it’s hard to do effectively. What is the best policy to use? How do you determine when to switch? How often do you switch? Do you switch based on client performance alone, consolidated user metrics, or something else? What happens when the CDN you switched to isn’t performing as well as you thought it would? Answering these questions (let alone designing a multi-CDN switching architecture) is enough to give anyone a headache. What if we throw out “switching” by downloading media from multiple CDNs at the same time? We could then realize the best performance by merging the performance of each. Seems simple enough until you start trying to do it efficiently. Do you race media from multiple CDNs at the same time, or do you try to perform sub-segment/byte-level scheduling? This seems even more complicated than before! This talk will focus on how to implement and deploy a switchless multi-source media delivery platform that is highly performant and efficient, and that avoids having to answer these difficult questions or solve massively complicated scheduling problems. Enabling true multi-source delivery without all the fuss requires us to do something a little bit unique to the content we are streaming. We first create multiple “versions” (one for each source, a.k.a. CDN) of each and every HLS or MPEG-DASH segment. This is done by packaging these segments into something called the Coded Multisource Media Format (CMMF), which is currently being standardized in ETSI. CMMF is essentially a container that is used to communicate network encoded content (network coding is kind of like forward error correction (FEC), but not; we’ll expand upon this more during the talk). Each CMMF version is then cached on a different CDN. Now let’s say a media player wants to download a particular video segment. 
Instead of downloading the entire segment from one CDN or requesting bytes x1 through y1 of that segment from one CDN and x2 through y2 from another, the media player requests multiple CMMF versions of that segment from different CDNs at the same time. Once the player receives enough data (the amount required is very close to that of the original segment) from the collection of CDNs, it can stop the download and recover the segment it wanted. By doing it this way, we don’t have to worry about hiccups (like temporary congestion or slow response times) on one CDN because we will just download more from the others. During the talk, we will introduce CMMF, the core concepts behind it, as well as go over how we deployed it within a streaming service with 20+ million subscribers world-wide and streamed approximately one million hours of content using it. We will also provide performance data that shows how CMMF-enabled multi-CDN delivery stacks up against a popular multi-CDN switching approach (as you can guess, it stacked up well). We hope this talk provides the audience with a different perspective to an “age-old” problem and inspires them to explore multisource delivery in greater detail.
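The "download from everyone until you have enough" behaviour described above can be illustrated with a generic random linear network code over GF(2). This is only a conceptual sketch and not the CMMF format or the code it actually uses: each simulated CDN emits random XOR combinations of the segment's blocks, and the player stops as soon as it holds enough independent combinations to solve for the original blocks. All names are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Tuple

Packet = Tuple[int, int]  # (bitmask of combined blocks, XOR of those blocks)


def coded_stream(blocks: List[int], rng: random.Random) -> Iterator[Packet]:
    """Endless stream of random XOR combinations of the source blocks."""
    k = len(blocks)
    while True:
        mask = rng.randrange(1, 1 << k)
        payload = 0
        for i in range(k):
            if (mask >> i) & 1:
                payload ^= blocks[i]
        yield mask, payload


class Decoder:
    """Incremental Gaussian elimination over GF(2)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.rows: Dict[int, Packet] = {}  # highest set bit -> reduced packet

    def add(self, mask: int, payload: int) -> bool:
        """Absorb a packet; return True if it carried new information."""
        for pivot in sorted(self.rows, reverse=True):
            if (mask >> pivot) & 1:
                row_mask, row_payload = self.rows[pivot]
                mask ^= row_mask
                payload ^= row_payload
        if mask == 0:
            return False  # linearly dependent on what we already hold
        self.rows[mask.bit_length() - 1] = (mask, payload)
        return True

    def done(self) -> bool:
        return len(self.rows) == self.k

    def solve(self) -> List[int]:
        """Back-substitute from the lowest pivot upward."""
        blocks = [0] * self.k
        for pivot in sorted(self.rows):
            mask, payload = self.rows[pivot]
            for i in range(pivot):
                if (mask >> i) & 1:
                    payload ^= blocks[i]
            blocks[pivot] = payload
        return blocks
```

Because any sufficiently large set of independent packets decodes the segment, a slow source simply contributes fewer packets and no explicit switching decision is needed.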
Thomas Edwards
Amazon Web Services
Video Processing on Quantum Computers
12:30 PM PDT
Lunch
1:45 PM PDT
Katerina Dobnerova
CDN77
Enhancing CDN Performance and Cutting Egress Costs in Large Video Libraries Delivery: Advanced Caching Strategies and Edge Computing Optimization
During the 20 minutes of my presentation, users worldwide will generate content equivalent to the volume created from the dawn of civilisation until 2003. The volume of content being created today is staggering. Consider this: from the beginning of recorded history until 2003, we produced roughly 5 exabytes of content. However, projections suggest a monumental leap to 147 zettabytes in 2024 alone, with video content leading the charge. With such exponential growth in content and its shortening life span, content delivery networks (CDNs) face significant challenges in effectively caching large video libraries. While cache hit rates of 98% and higher are taken for granted, the figures above suggest that simply adding disk space is not remotely enough to keep the cache hit ratio at the desired level. This presentation explores several approaches, including tiered cache systems, which use a hierarchy of caching servers employing consistent hashing and other techniques to maximize scalability and performance while minimizing failover and downtime. It also covers one-hit-wonder elimination, which uses simple counters to reduce cache pollution by avoiding storing unpopular content, and cache-state sharing, which employs Bloom-filter-based technology to further improve cache scalability and effective disk space utilization. Finally, it will examine the deployment of edge computing to amplify caching efficiency in specific use cases.
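One-hit-wonder elimination with simple counters can be sketched as an admission filter in front of an LRU cache: an object is stored only once it has been requested a second time. This is a minimal illustration under assumed names and thresholds, not CDN77's implementation; a production system would also bound or age the counters (for example with a Bloom filter).

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple


class SecondHitCache:
    """LRU cache that admits an object only after repeated requests."""

    def __init__(self, capacity: int, admit_after: int = 2) -> None:
        self.capacity = capacity
        self.admit_after = admit_after
        self.seen: Dict[str, int] = {}          # request counts for uncached keys
        self.store: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str, fetch: Callable[[str], bytes]) -> Tuple[bytes, bool]:
        """Return (value, was_cache_hit); fetch from origin on a miss."""
        if key in self.store:
            self.store.move_to_end(key)          # mark as most recently used
            return self.store[key], True
        value = fetch(key)
        count = self.seen.get(key, 0) + 1
        if count >= self.admit_after:
            self.seen.pop(key, None)
            self.store[key] = value
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)   # evict least recently used
        else:
            self.seen[key] = count               # remember the first request only
        return value, False
```

Objects requested exactly once never occupy disk space, at the cost of one extra origin fetch for objects that later turn out to be popular.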
Yingyu Yao
YouTube
Your TV Is Eating Your Frames
Alex Giladi
Comcast
Ads and overlays
The concept of server-guided ad insertion (SGAI), first introduced by Hulu in 2019, is getting increasingly popular in the industry. It is markedly more scalable than the traditional server-side (SSAI) approach, but nearly as resilient. It is more interoperable and more resilient than the client-side (CSAI) approach but is nearly as efficient and versatile. Client-side graphic overlays are to a degree a reincarnation of the banner ads plaguing the web since the '90s. Their main use is not necessarily ad-related -- they are used in a variety of roles from station identification to localization to emergency notification. Traditionally, they were inserted into the video in baseband (i.e., pre-transcoder) in a playout system, which is the least scalable and the highest-latency approach possible in the video world. The streaming ecosystem has standardized and maturing support for SGAI. Interstitials are used to implement the approach in HLS. XLink was used in the original MPEG-DASH implementation of the approach; however, XLink suffers from a number of design flaws and was never widely implemented in the context of live channels and events. Media Presentation Insertion, a recent addition to MPEG-DASH, revisits this concept and allows spawning a new media presentation while pausing the main channel. As opposed to HLS interstitials, media presentation insertion allows asynchronous termination ("return to network"), supports VAST tracking, and more. The same server-guided model can be applied to the overlay use case and has the potential to improve scalability, targeting, and glass-to-glass latency in a dramatic way. This talk will first describe the new MPEG-DASH media presentation description approach and its application to SGAI and blackouts. It will then cover the application of the same principles to the graphic overlays in MPEG-DASH. This presentation will conclude with a description and a demo of an open-source implementation of both technologies.
2:50 PM PDT
Break
3:10 PM PDT
John Bowers
Twitch/Amazon IVS
Free ABR Renditions for User Generated Content Platforms
Well, not exactly free - but much, MUCH lower cost than server-side transcoding! Providing an ABR ladder is table stakes for live viewer experiences, but it’s expensive for at-scale video platforms to provision and maintain specialized infrastructure to handle peak transcoding demand. A recently developed update to the Enhanced RTMP specification adds multitrack video, multitrack audio, and advanced codec support. With implementations in OBS Studio and Wireshark, the technology is ready for you to adopt it. Now you can offer all creators - regardless of audience size or creator - ABR playback. Come and learn why encoding multiple video tracks on the content creator’s machine at the edge is higher quality, lower latency, and more scalable than server-side transcoding, all while allowing faster innovation and deployment of newer codecs like HEVC and AV1.
Ryan Cunningham
Scenery
WebCodecs vs. WASM for Fast Video Scrubbing
We built a web-based video editor capable of fast scrubbing and advanced WebGL compositing features. This talk explores the intricacies of building such an editor using WebCodecs for video decoding and preview and contrasts it with traditional methods, specifically using HTML video, or a WASM H264 decoder. The goal is to provide a comprehensive guide on implementing a high-performance video editor preview that leverages modern web technologies while addressing practical challenges and limitations, and also reveal areas where improvement is needed. HTML video elements, while widely used, pose significant limitations for fast scrubbing and precise frame accuracy. Slow seeking, lack of control over frame rendering, and the need to use drawImage to get frames into a WebGL texture can hinder the perceived speed in a video editor. WebCodecs provides a low-level API that allows developers to decode video segments and render them to textures, enabling extremely fast scrubbing and WebGL compositing directly in the web browser. By holding video data in GPU textures, we achieve advanced features such as alpha-transparency using just the H264 decoder. The talk will dive into the implementation details, showcasing pre-loading and garbage collection techniques. We will also discuss the pipeline nature of WebCodecs decoders, which necessitates efficient management of VideoFrames to maintain performance. Despite its advantages, WebCodecs comes with its own set of challenges. The hardware-based implementation means no actual concurrent decodes, and rendering VideoFrames to textures is surprisingly CPU-intensive. Additionally, the performance can be inconsistent across different hardware due to Google's GPU exclusion list in Chrome, which defaults to software decoding on certain computers. This session will cover mitigation strategies, including conducting test decodes to determine performance viability. We will discuss the trade-offs and potential pitfalls of using WebCodecs. 
Before the advent of WebCodecs, our approach involved using a WASM-compiled H264 decoder, tinyh264. Using WASM in Web Workers, we achieve true concurrent decoding. However, it comes with its own set of limitations. Running entirely on the CPU, it requires managing frames in main memory and handling the upload to the GPU, alongside color space conversions from YUV to RGB. Furthermore, it creates licensing issues since it distributes an H264 decoder. We will discuss the implementation details, performance considerations, and how it compares to WebCodecs.
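The YUV to RGB conversion mentioned for the WASM path can be illustrated per pixel. This is only a sketch: it assumes full-range BT.601 coefficients, whereas real streams are often limited-range BT.709, and the function name `yuv_to_rgb` is made up for illustration. In an editor the same arithmetic would typically run in a shader rather than on the CPU.

```python
def yuv_to_rgb(y, cb, cr):
    """Convert one full-range BT.601 YCbCr pixel (0-255 each) to 8-bit RGB."""
    # Chroma is stored with an offset of 128, so centre it around zero first.
    cb_c = cb - 128
    cr_c = cr - 128
    r = y + 1.402 * cr_c
    g = y - 0.344136 * cb_c - 0.714136 * cr_c
    b = y + 1.772 * cb_c
    # Round to the nearest integer and clamp into the displayable range.
    return tuple(max(0, min(255, round(v))) for v in (r, g, b))


if __name__ == "__main__":
    print(yuv_to_rgb(128, 128, 128))  # neutral chroma leaves a mid-grey pixel
```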
Bruce Spang
Netflix
Wei Wei
Netflix Inc
Innovate Live Streaming with a Client Simulator
One of the major challenges in live streaming is the scarcity of real-world events to test innovative ideas and algorithms, such as adaptive bitrate (ABR) algorithms. Relying on actual live events for testing not only prolongs the innovation cycle but also increases the risk of negatively impacting user experience. To overcome this obstacle, we at Netflix have enhanced our existing client simulator to emulate live streaming scenarios. This simulator uses network traces and device characteristics gathered from real-world sessions to drive our production client library, and it plays a crucial role in driving innovation at Netflix. In this talk, we will first present how the client simulator simulates live streaming. Then we will demonstrate how it can be used to test new live encoding methods, like Variable Bitrate (VBR) encoding, and to evaluate various ABR algorithms on a large scale. We will conclude the talk with future directions.
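To make the idea of trace-driven simulation concrete, here is a toy sketch. It is not Netflix's simulator: the rate-based ABR rule, the `simulate` function, and all parameters are assumptions for illustration, and live-edge constraints are not modelled. It replays one throughput sample per segment and tracks buffer level and stall time.

```python
from dataclasses import dataclass


@dataclass
class SimResult:
    chosen_kbps: list
    stall_s: float


def simulate(trace_kbps, ladder_kbps, segment_s=2.0, safety=0.8):
    """Replay a throughput trace (one sample per segment) through a toy ABR rule."""
    buffer_s = 0.0
    stall_s = 0.0
    chosen = []
    estimate = trace_kbps[0]  # assume the first sample is known up front
    for throughput in trace_kbps:
        # Pick the highest rung under safety * estimate, else the lowest rung.
        affordable = [r for r in ladder_kbps if r <= safety * estimate]
        rate = max(affordable) if affordable else min(ladder_kbps)
        download_s = rate * segment_s / throughput
        # Playback drains the buffer while the segment downloads.
        # Note: the initial startup wait is counted as stall time here.
        if download_s > buffer_s:
            stall_s += download_s - buffer_s
            buffer_s = 0.0
        else:
            buffer_s -= download_s
        buffer_s += segment_s
        chosen.append(rate)
        estimate = throughput  # naive estimator: last observed throughput
    return SimResult(chosen, stall_s)


if __name__ == "__main__":
    print(simulate([4000, 4000, 1000, 4000], [500, 1500, 3000]))
```

Swapping the rate-selection lines for another rule and replaying the same traces is the kind of comparison such a simulator allows without waiting for a live event.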
4:00 PM PDT
Break
4:40 PM PDT
Gwendal Simon
Synamedia
Token Renewal: Keeping your Streaming Party Smooth and Secured
CDN leeching is a growing concern for content providers. The recent specification of the Common Access Token (CAT) has introduced a vehicle for designing more secure streaming delivery systems. Best practices for CDN content protection often involve renewing the token, either due to short expiration times or probabilistic rejections. However, token renewal is far from trivial. In token-based delivery systems, we identify three key entities: the client, the CDN server, and the token generator. Typically, these communicate via HTTP(S). At any point during a streaming session, the CDN server may request the client to renew its token, ensuring seamless video playback, even for low-latency ABR streaming. The CAT specification includes two claims related to renewal: catr and catif. While the specification details several operation modes, none fully satisfy the combined requirements for fast renewal, legacy clients, and the unique characteristics of DASH and HLS. In this talk, we will unpack the current situation, presenting the pros and cons of each proposed solution. We aim to open the door to a better solution and outline the community effort needed for its implementation.
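The renewal decision on the CDN side can be sketched with a simple signed token. This is an illustration only: the token layout, the `SECRET` key, and the `issue`/`check` helpers are hypothetical, and this is not the CAT wire format or its catr/catif claim semantics. The server answers "ok", "renew" (token still valid but close to expiry), or "reject".

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical key shared by token generator and CDN


def issue(subject, now, ttl_s):
    """Create a signed token that expires ttl_s seconds after now."""
    payload = json.dumps({"sub": subject, "exp": now + ttl_s}, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def check(token, now, renew_window_s):
    """Return 'reject', 'renew', or 'ok' for a token presented at time now."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return "reject"
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return "reject"
    exp = json.loads(payload)["exp"]
    if now >= exp:
        return "reject"
    # Ask for renewal before expiry so playback is not interrupted.
    if exp - now <= renew_window_s:
        return "renew"
    return "ok"


if __name__ == "__main__":
    token = issue("viewer-1", now=1000, ttl_s=60)
    print(check(token, now=1055, renew_window_s=10))
```

How the "renew" answer reaches the client, and how a legacy player reacts to it, is exactly the part this sketch leaves open.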
Will Law
Akamai
Creative Monkeys Contemplate Dating
The geeky primates at WAVE are releasing version 2 of the popular CMCD standard. While CMCD v1 was restricted to a CDN (data) relationship, v2 gives you three different modes for concurrently sharing data. Now you can date a content steering service, and an analytics service, at the same time as maintaining a committed relationship with your CDN :) This talk highlights the new features and capabilities of CMCD v2. In addition to the reporting mode enhancements, we'll investigate the host of new keys being offered: media start delay, target buffer length, buffer starvation duration, prefetching multiple objects at once, player state, response code, TTFB, timestamps, request URLs and many more. We'll explore how v2 can be used to drive lightweight data for content steering decisioning, rich collection for analytics providers that is decoupled from the delivery and even improved prefetching performance and visibility for the CDN. We'll show it all working and release some code so that you too can experiment. Join us!
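For readers new to CMCD, the following sketch shows roughly how a player might serialise a few v1 keys into the `CMCD` query argument. It follows my reading of the v1 conventions (keys sorted, strings quoted, true booleans sent as the bare key, token values such as `ot` left unquoted); the v2 keys and reporting modes discussed in the talk are not modelled, and `encode_cmcd` is a made-up helper name.

```python
from urllib.parse import quote

# Keys whose values are tokens and are therefore sent unquoted (assumed v1 set).
TOKEN_KEYS = {"ot", "sf", "st"}


def encode_cmcd(data):
    """Serialise a dict of CMCD keys into a URL-encoded query argument."""
    parts = []
    for key in sorted(data):
        value = data[key]
        if value is True:
            parts.append(key)  # true booleans are sent as the bare key
        elif value is False or value is None:
            continue  # false booleans are simply omitted
        elif isinstance(value, str) and key not in TOKEN_KEYS:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{key}="{escaped}"')
        else:
            parts.append(f"{key}={value}")
    return "CMCD=" + quote(",".join(parts), safe="")


if __name__ == "__main__":
    print(encode_cmcd({"br": 3200, "bl": 21300, "bs": True, "sid": "abc", "ot": "v"}))
```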
RongKai Guo
NVIDIA
Zoe Liu
Visionular
AI Enhanced GPU Video Coding: Achieving Joint High Compression Efficiency and Throughput
We are here to present a novel approach to significantly boost video compression efficiency on NVIDIA NVENC hardware encoders, by leveraging AI-driven pre-analysis and pre-processing algorithms. We refer to this method as AI Enhanced GPU Video Coding, which combines NVIDIA NVENC's high density, low latency, and high throughput with ML-based techniques to enhance video compression efficiency and boost visual quality, while maintaining high throughput. NVENC, as a leading hardware-based encoder, excels in providing high throughput and low latency but generally offers lower compression efficiency compared to CPU-based software encoders. Our AI-driven GPU video compression approach aims to leverage the advantages of both NVENC and AI algorithms to achieve high compression efficiency and throughput performance. Our optimization algorithms mainly include: 1. ML-based Scene & Region Classification: Identifying effective coding tools based on scene and region classification. 2. Regions of Interest (ROI) Identification: Focusing on perceptually significant regions, such as faces and jersey numbers in typical sports videos. 3. Pre-processing Techniques: Applying deblurring, denoising, sharpening, contrast adjustment, etc. to boost visual quality. 4. Hierarchical pre-analysis and pre-classification: Setting fine-grained QPs, including block-based QPs, and enabling quick quality monitoring. These techniques combined improve video compression efficiency, boosting both objective and subjective quality while achieving significant bitrate savings. We have applied these methods to large UGC content platforms. Our results demonstrate promising improvements in compression efficiency for both VOD and live use cases.
Using the NVIDIA T4 Tensor Core GPU, we maintained the same high throughput for multiple parallel encoding threads and achieved a 15-20% bitrate saving and a 1-2 point VMAF score improvement at the same time, on typical UGC & PUGC content compared to the out-of-the-box NVENC approach. Further enhancements, such as re-encoding, are currently being developed and further compression gains are expected.
Yuriy Reznik
Brightcove, Inc.
Streaming in 1970s. NVP & ST: the very first real-time streaming protocols.
In this talk we will go back in history and look at the very first protocols and systems developed for internet streaming: the venerable NVP (Network Voice Protocol) and ST (Internet Stream Protocol, aka IPv5) protocols, developed by Danny Cohen, Jim Forgie, and other brilliant engineers at MIT Lincoln Labs in the 1970s. We will discuss the key ideas introduced by these protocols (the concepts of sessions, available capacity assessment, rate negotiation between sender and receiver, data transfer protocols, the need for network-layer support for sessions, resource provisioning, etc.) and show how most of these ideas became incorporated in subsequent designs. Specifically, we will show how many ideas introduced in NVP and ST have eventually found their implementations in modern protocols, such as WebRTC, QUIC and MOQ. The talk will include many historical pictures and some videos of those early pioneering systems built in the 1970s. It will also try to explain what motivated these original developers to come up with all these techniques.
Matt McClure
Demuxed
Closing Remarks
9:30 AM PDT
Matt McClure
Demuxed
Opening Remarks
Anand Vadera
Meta
Optimizing Storage for Meta's Trillion-Video Catalog: Achieving Pareto Efficiency
Meta manages an extensive video catalog with over a trillion videos across various products, and this number is growing daily with more than one billion new videos added each day. The challenge lies in maintaining an efficient storage footprint while accommodating this continuous influx of new content. The goal is to achieve Pareto efficiency, optimizing the storage space without compromising the quality of the videos delivered. This balance is crucial for sustaining scalability and efficiency in Meta's Video Infrastructure. This talk will delve into an innovative method for addressing the problem at hand. It will discuss the fundamental concepts underpinning this approach and share valuable insights gained during its development and implementation. In particular, it will highlight effective strategies that have proven successful in enhancing storage efficiency without negatively impacting video quality. Furthermore, the presentation will touch upon the ongoing evolution of the system, showcasing how it is continually being improved to better tackle the challenge of managing an ever-growing video catalog while maintaining optimal storage usage. By sharing these learnings with others facing similar challenges, the hope is to contribute to the collective knowledge base and ultimately facilitate the development of more efficient and effective systems for managing large-scale video repositories.
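As a reminder of what Pareto efficiency means in this setting, the sketch below filters a set of candidate encodings down to those that are not dominated, i.e. no other candidate is at least as small and at least as good while being strictly better on one axis. The tuple layout and `pareto_front` are illustrative assumptions and say nothing about Meta's actual method.

```python
def pareto_front(options):
    """Return names of (name, size_bytes, quality) options that are not dominated."""
    front = []
    for name, size, quality in options:
        dominated = any(
            other_size <= size
            and other_quality >= quality
            and (other_size < size or other_quality > quality)
            for _, other_size, other_quality in options
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    candidates = [("a", 100, 90), ("b", 80, 90), ("c", 60, 85), ("d", 70, 80)]
    print(pareto_front(candidates))
```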
Jill Boyce
Nokia
Bringing more versatility to VVC with VSEI
Versatile Supplemental Enhancement Information (VSEI) is a companion standard to Versatile Video Coding (VVC). VSEI defines SEI messages that contain metadata inserted into a bitstream synchronized with the coded video, to convey extra information intended to be utilized by the receiver/decoder. SEI messages are optional and are targeted at specific use cases. SEI messages specified in VSEI may also be used with other video coding standards, including H.264/AVC, HEVC, or future standards. Since the initial standardization of VVC and VSEI in 2020, second and third editions of VSEI have been standardized, with a fourth edition under development. The new SEI messages included in new versions of VSEI bring even more versatility to VVC, by addressing a broader variety of applications. This talk will describe several of the new SEI messages and the use cases they enable.
Matteo Naccari
Visionular
Compression of stereoscopic video with MV-HEVC: fundamentals, tools and development
Multiview (e.g. stereoscopic) content provides users with a fully immersive and compelling quality of experience when watching videos. This type of content is gaining new momentum thanks to the development and commercialisation of Virtual Reality (VR) headsets such as Apple Vision Pro and Oculus Quest. The delivery of Multiview video poses new challenges for the video coding community, since each frame is composed of multiple views (two in the case of stereoscopic). Standardisation bodies such as ISO/IEC MPEG and ITU-T VCEG envisaged the compression of Multiview content with the H.265/HEVC standard, extended to efficiently tackle the intrinsic data redundancies present across different views. Thanks to the availability of VR headsets, content providers and codec vendors are now deploying solutions supporting the Multiview extension of H.265/HEVC (collectively known as MV-HEVC). This talk will introduce the MV-HEVC standard from the encoder designer's perspective, starting with an overview of the standard's design and the tools it supports. The focus will then move on to the challenges faced when implementing practical encoding solutions, such as fast mode decision and rate control.
10:40 AM PDT
Break
11:15 AM PDT
Tracey Jaquith
Internet Archive
What's on TV? 4 editors and 2 robots walk into a bar..
Using TV news "chyron" text overlays in the "lower third" (from human editors), image-to-text (OCR), grouping/filtering, and GPT to summarize --> we social post hourly: "What's on TV?" The non-captions news text (eg: BIDEN VISITS MEXICO) that shows up at the bottom of the screen (like those overhead monitors in airports showing news) is gold, written in real-time by editors during live broadcasts. However, the data is not carried anywhere inside the video streams (just visually). What's a girl with robots to do? Using CNN, MSNBC, Fox News and BBC News feeds, we use ffmpeg to crop the relevant image area; Tesseract to OCR the image into text; and GPT to summarize, remove ads, and clean up the text. We then post hourly summaries to Mastodon.
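The crop, OCR, and filter steps can be sketched as follows. This is a guess at the shape of such a pipeline, not the Internet Archive's code: the crop geometry (bottom third of the frame), the helper names, and the filtering rule are assumptions. The two command builders only assemble argument lists for the `ffmpeg` and `tesseract` command-line tools; running them requires those tools to be installed.

```python
def crop_lower_third_cmd(src, out_png, at_seconds):
    """Build an ffmpeg command that grabs one frame and keeps its bottom third."""
    return [
        "ffmpeg", "-ss", str(at_seconds), "-i", src,
        "-vf", "crop=iw:ih/3:0:2*ih/3",  # width:height:x:y
        "-frames:v", "1", "-y", out_png,
    ]


def ocr_cmd(png):
    """Build a tesseract command that prints recognised text to stdout."""
    return ["tesseract", png, "stdout"]


def dedupe_chyrons(lines, min_len=8):
    """Normalise OCR lines, drop very short fragments and exact repeats."""
    seen = set()
    kept = []
    for line in lines:
        text = " ".join(line.split()).upper()
        if len(text) < min_len or text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept


if __name__ == "__main__":
    print(crop_lower_third_cmd("news.ts", "chyron.png", 30))
    print(dedupe_chyrons(["BIDEN VISITS MEXICO", "biden  visits mexico", "ad"]))
```

The de-duplicated lines would then be passed to a summarisation step, which is omitted here.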
Li-Heng Chen
Netflix
Ryan Lei
Meta
A hitchhiker's guide to AV1 deployment
Six years since its inception as a video coding standard stipulated by the Alliance for Open Media, AV1 has proven its capability as a Swiss Army knife, with application domains spanning the streaming of movies and TV shows, user generated content and real-time video conferencing, including screen content, among others. This talk will feature a roadshow of AV1 deployments that have impacted billions of people's lives, presented by engineers with first-hand experience of its implementation in production systems. Presenters will share tips, tricks, know-how and the lessons learned from their deployment experience to bring the best performance out of AV1. Example topics include, but are not limited to: productization of AV1’s film grain synthesis feature and use of AV1 to deliver high dynamic range video at Netflix, AV1 deployment in Instagram Reels, and AV1 support for RTC services at Meta.
Luke Curley
Discord
Replacing WebRTC with Media over QUIC
It's been over a decade since WebRTC was released. Surely there's something new on the horizon, right? Media over QUIC is an IETF working group that is working on a new live media standard to replace the likes of WebRTC, RTMP/SRT, and HLS/DASH. Wow that's overly ambitious, but it's being backed by your favorite big tech companies (and some non-favorites) in the same standards body that has produced hits such as... WebRTC. But replacing WebRTC is difficult. It exists because there were no web standards in 2011 that could accomplish conferencing; remember this was before even HTML5. But there are new Web standards now! This talk will go over WebTransport and WebCodecs, and how they are utilized to provide a user experience nearly on par with WebRTC while being dramatically more flexible. No more magic black box, no more ICE/STUN/TURN/SDP/DTLS/SCTP/RTP/SRTP/mDNS, no more getting ghosted by Google. Just you with a QUIC connection and the ability to encode/decode video frames. And of course we'll go over the promise of Media over QUIC and why you should use the standard instead of your own bespoke protocol. I'll give you a hint, it starts with C and ends with "DN Support".
12:20 PM PDT
1:35 PM PDT
Lightning Talks
Jan De Cock
Synamedia
Measuring live video quality with minimal complexity, now available for everyone!
We all love video, and we love it even more when the quality of the video is great! To measure that quality, we already have quite some options, and the folks at Netflix did a great job at giving us VMAF. This is all fine and dandy for our VOD colleagues, but what about us, *live* video engineers? We struggle to optimize every cycle in our live encoders, and spending a full CPU core on metric calculation is just not acceptable -- and not good for business. We spent quite some time figuring out how to simplify this problem. Our marketing people said: "Why don't you use AI"? So we did, and imagine that, in this case it actually worked. We'll forget about all those other projects that got stuck in the trough of AI disillusionment. Turns out that metrics such as SSIM and VMAF can be quite accurately predicted, and by using smart features inside the encoder, this can be done with marginal additional computational complexity. In the talk, we’ll explain how we found a balance between accuracy and complexity of the used features and ML networks. All fine for *your* encoder you say, but how does that help me? Well, we took on the challenge to show that this approach also works for open-source encoders, with x264 as our first target. And, we’re sharing the code, so you can try it out too! And while we’re eagerly awaiting the 10th Demuxed over the coming months, we’ll also be trying this approach on SVT-AV1. Too early to tell if this attempt will be successful, but we’ll be able to tell you in October, and take you through the process during the talk!
2:45 PM PDT
Break
3:05 PM PDT
Tony McNamara
Paramount Streaming
Pseudo-Interstitials: Playback flexibility for legacy devices.
Interstitials allow the insertion of content by reference into a playback stream, and are especially useful when a playlist won't work. But Interstitials are also still relatively new; just a year ago Apple devices didn't support playback of them, despite Apple having accepted them into the HLS Specification years earlier. DASH XLinks suffer the general inconsistency so consistent in DASH. And of course legacy devices tend to be stuck on much earlier protocol versions. We've come up with "Pseudo-Interstitials", which provide much of the same flexibility, to allow very-late decisioning and binding of content, especially ads, into playback of legacy devices. This will include a very brief introduction to interstitials and their value, and the problem statement, and then a deep dive into the multi-disciplinary solution including encoding concerns, manifest manipulation, Edge Computing and even briefly SSAI constraints.
Jon Dahl
Mux
A taxonomy of video "quality," or: was Strobe right or wrong about quality?
“Quality” is one of the most abused and overloaded terms in video. Orwell says that unclear language leads to unclear thinking, and: wow, our industry suffers from unclear thinking around quality. We conflate codecs like AV1 with "high quality"; we don’t know the difference between QoS and QoE; we’re 👍 on VMAF but we don’t really know how to use it. Meanwhile, Strobe gets on stage at Demuxed 2018 and says “Video quality doesn’t matter” (as the audience gasps in horror). In this talk, we’ll bring clarity and precision to the domain of “video quality.” We’ll learn the difference between QoE, QoS, perceptual quality, fidelity, efficiency, and more. We will review a schema that once and for all will eliminate all confusion, doubt, and ignorance from this area, driving our industry forward into a more enlightened future. And most importantly, we’ll learn whether Strobe was right or wrong when he said quality didn’t matter.
Vanessa Pyne
Daily
Derek Buitenhuis
Vimeo
Be the change you want to see: How to contribute to FFmpeg
Have you ever written code you wanted to contribute to FFmpeg, but you got a little tripped up in the send-email situation or maybe you got some feedback you weren't sure how to handle and your patch never made it across the finish line? Maybe you went to GitHub to make a PR, saw PULL REQUESTS ARE IGNORED, followed the link to the contribution documentation, saw a 28-point checklist and backed away slowly from your computer. Don't give up the dream! This talk will review the entire FFmpeg contribution process from soup to nuts and demystify the scary parts. It will focus on procedure and potential sharp edges thereof, rather than the actual code contribution itself. If that sounds very dry, rest assured the only thing dry about it will be the wit. This information may be elementary to some folks, but to paraphrase a recent FFmpeg-devel mailing list sentiment: "More diversity would be good." Making the process more accessible is key to making the circle bigger and encouraging a more diverse group of people to participate in the FFmpeg-devel ecosystem. If we want some new kids on the block, there should be a step-by-step guide, and this talk aims to be just that. A brief outline of the talk is as follows: 1. How to lurk (mailing list & IRC) 2. Find a thing to fix, improve, create 3. How to run regression tests (FATE, etc) 4. How to git patch (aka how to send an email) 5. How to address feedback 6. It merged! Now what?
3:55 PM PDT
Break
4:35 PM PDT
Fabio Sonnati
NTT Data
The Long(est) Night
April 28, 2019, a phone call wakes me in the middle of the night: "The Long Night", a new episode of the final season of Game of Thrones, is airing, but nothing is visible! The artistic intent is clearly extreme, and the encoding can't handle it, resulting in a flurry of confused silhouettes in the darkness, struggling in a dark sea of banding. In this presentation, I will talk about how we resolved an extreme situation for a high-quality streaming service by manually customizing encoding to mitigate the problem, inspired by well-known principles in the world of audio processing and 3D game rendering.
Constanza Dibueno
Qualabs
How to play Dash on your HLS Player
What if I told you that you could play a DASH video seamlessly in an HLS player? In today's broadcasting landscape, interoperability is a challenge. Reaching a broader audience means creating multiple copies of each stream file in different formats, which doubles the costs of packaging and storage. This inefficiency is a significant pain point for broadcasters. CMAF was designed to revolutionize HTTP-based streaming media delivery. It streamlines media delivery by using a single, unified transport container compatible with both HLS and DASH protocols. At the latest MonteVideo Tech Summer Camp, we embarked on an exciting project: creating a library based on the CMAF standard and the Hypothetical Application Model. This library provides a practical solution for converting playlists and manifests between HLS and DASH. We brought this vision to life by building a proof of concept. We want to present to you an intuitive Open Source UI built on top of the library. In this presentation, we will showcase how the UI can help you understand the library's capabilities and its potential for building tools that simplify the broadcasting experience, without having to go deep into the complexities of the CMAF specification. For example, it allows users to take a DASH manifest as input, convert it to an HLS playlist on the fly, and play the content on an HLS player. With this capability, broadcasters could adapt to different streaming requirements, delivering content across various platforms and devices, thereby enhancing adaptability and flexibility. By the end of this presentation, we aim to show approaches that could enhance interoperability in your broadcasting operations, using the CMAF HAM UI as a tool.
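To give a feel for the conversion idea, the sketch below renders an HLS media playlist from a format-neutral list of CMAF segments, which is roughly the role an intermediate model plays. It is not the CMAF HAM library's API: `to_hls_media_playlist` and the (uri, duration) tuples are hypothetical, and parsing the DASH manifest into that list is left out.

```python
import math


def to_hls_media_playlist(init_uri, segments):
    """Render a VOD media playlist from an init segment and (uri, seconds) pairs."""
    # The target duration must be an integer no smaller than any segment duration.
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        f'#EXT-X-MAP:URI="{init_uri}"',  # CMAF header (init segment)
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_hls_media_playlist("init.mp4", [("seg1.m4s", 4.0), ("seg2.m4s", 3.5)]))
```

Because both formats can reference the same CMAF segments, only this small text document differs between the DASH and HLS views of the content.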
Steve Robertson
YouTube
Why is gapless so hard?
A deep dive into audio gaplessness, for video engineers. Covering the difference between stitching, pseudo-gapless, and true gapless approaches, why gapless is important to the art, the mechanical reasons why the audio clock always wins, how the system reconciles this instability, and why this leads to dropped frames and A/V sync issues.
Surprise
Matt McClure
Demuxed
Closing Remarks
Alex Giladi
Comcast
Ads and overlays
The concept of server-guided ad insertion (SGAI), first introduced by Hulu in 2019, is getting increasingly popular in the industry. It is markedly more scalable than the traditional server-side (SSAI) approach, but nearly as resilient. It is more interoperable and more resilient than the client-side (CSAI) approach but is nearly as efficient and versatile. Client-side graphic overlays are to a degree a reincarnation of the banner ads plaguing the web since the '90s. Their main use is not necessarily ad-related -- they are used in a variety of roles from station identification to localization to emergency notification. Traditionally, they were inserted in baseband (i.e., pre-transcoder) in a playout system, which is the least scalable and the highest-latency approach possible in the video world. The streaming ecosystem has standardized, maturing support for SGAI. Interstitials are used to implement the approach in HLS. XLink was used in the original MPEG-DASH implementation of the approach; however, XLink suffers from a number of design flaws and was never widely implemented in the context of live channels and events. Media Presentation Insertion, a recent addition to MPEG-DASH, revisits this concept and allows spawning a new media presentation while pausing the main channel. As opposed to HLS interstitials, media presentation insertion allows asynchronous termination ("return to network"), supports VAST tracking, and more. The same server-guided model can be applied to the overlay use case and has the potential to improve scalability, targeting, and glass-to-glass latency in a dramatic way. This talk will first describe the new MPEG-DASH media presentation description approach and its application to SGAI and blackouts. It will then cover the application of the same principles to the graphic overlays in MPEG-DASH. This presentation will conclude with a description and a demo of an open-source implementation of both technologies.
Tanushree Nori
Vimeo
Budgeting Bytes: Acing Cost-Efficient Video Storage
In today's world, where data never stops growing, Vimeo is at the forefront, cleverly slashing storage costs while keeping videos readily accessible. In my talk, I’ll peel back the curtain on how we fine-tune cloud storage using Machine Learning, balancing cost savings with cheap and quick video access at Vimeo. We’ve cut our storage bills by an impressive 60% by applying smart lifecycle policies and a dash of machine learning methods. I'll share insights on how we determine the best times to tuck away older videos into cheaper storage tiers and what factors go into these decisions. This talk will offer practical strategies and a peek into the tools that help Vimeo manage a sprawling video library efficiently. Discover how these innovations can help reshape your approach to data storage too!
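The trade-off behind a tiering decision can be shown with a small cost comparison. All prices and the `cheaper_tier` helper are made up for illustration, and the expected read count stands in for whatever a prediction model would output; none of this reflects Vimeo's actual policy or cloud pricing.

```python
def cheaper_tier(size_gb, expected_reads_per_month,
                 hot_per_gb=0.02, cold_per_gb=0.005, cold_retrieval_per_gb=0.01):
    """Compare monthly cost of hot vs cold storage for one video (hypothetical prices)."""
    hot_cost = size_gb * hot_per_gb
    # Cold storage is cheaper at rest but charges for every retrieval.
    cold_cost = (size_gb * cold_per_gb
                 + size_gb * expected_reads_per_month * cold_retrieval_per_gb)
    return "cold" if cold_cost < hot_cost else "hot"


if __name__ == "__main__":
    print(cheaper_tier(10, expected_reads_per_month=0.5))
    print(cheaper_tier(10, expected_reads_per_month=3))
```

With these example prices, a rarely watched video is cheaper in the cold tier, while a video read several times a month is cheaper left in hot storage.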
Matteo Naccari
Visionular
Compression of stereoscopic video with MV-HEVC: fundamentals, tools and development
Multiview (e.g. stereoscopic) content provides users with a fully immersive and compelling quality of experience when watching videos. This type of content is gaining new momentum thanks to the development and commercialisation of Virtual Reality (VR) headsets such as Apple Vision Pro and Oculus Quest. The delivery of Multiview video poses new challenges to the video coding community, since frames are composed of multiple views (two in the case of stereoscopic). Standardisation bodies such as ISO/IEC MPEG and ITU-T VCEG envisaged the compression of Multiview content with the H.265/HEVC standard, extended to efficiently tackle the intrinsic data redundancies present across different views. Thanks to the availability of VR headsets, content providers and codec vendors are now deploying solutions supporting the Multiview extension of H.265/HEVC (collectively known as MV-HEVC). This talk will introduce the MV-HEVC standard from the encoder designer’s perspective, starting with an overview of the standard’s design and the tools it supports. The focus will then move on to the challenges faced when implementing practical encoding solutions, such as fast mode decision and rate control.
Will Law
Akamai
Creative Monkeys Contemplate Dating
The geeky primates at WAVE are releasing version 2 of the popular CMCD standard. While CMCD v1 was restricted to a CDN (data) relationship, v2 gives you three different modes for concurrently sharing data. Now you can date a content steering service, and an analytics service, at the same time as maintaining a committed relationship with your CDN :) This talk highlights the new features and capabilities of CMCD v2. In addition to the reporting mode enhancements, we'll investigate the host of new keys being offered: media start delay, target buffer length, buffer starvation duration, prefetching multiple objects at once, player state, response code, TTFB, timestamps, request URLs and many more. We'll explore how v2 can be used to drive lightweight data for content steering decisioning, rich collection for analytics providers that is decoupled from the delivery and even improved prefetching performance and visibility for the CDN. We'll show it all working and release some code so that you too can experiment. Join us!
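For readers new to CMCD, here is a minimal Python sketch of how a player might serialize a handful of keys into the `CMCD` query argument. It follows the v1 conventions as we understand them (keys sorted alphabetically, strings quoted, tokens and numbers bare, `true` booleans sent as the key alone, the whole payload URL-encoded). It does not attempt any of the v2 keys or reporting modes mentioned above, and the function names `encode_cmcd` and `cmcd_query_arg` are purely illustrative.

```python
from urllib.parse import quote

# Keys whose values are bare tokens rather than quoted strings in CMCD v1.
TOKEN_KEYS = {"ot", "sf", "st"}


def encode_cmcd(pairs: dict) -> str:
    """Serialize key/value pairs into a CMCD v1-style payload (not URL-encoded)."""
    parts = []
    for key in sorted(pairs):  # keys are emitted in alphabetical order
        value = pairs[key]
        if value is True:
            parts.append(key)  # a true boolean is sent as the bare key
        elif value is False:
            continue  # a false boolean is simply omitted
        elif isinstance(value, (int, float)):
            parts.append(f"{key}={value}")
        elif key in TOKEN_KEYS:
            parts.append(f"{key}={value}")
        else:
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{key}="{escaped}"')
    return ",".join(parts)


def cmcd_query_arg(pairs: dict) -> str:
    """Return the 'CMCD=...' query argument to append to a segment request URL."""
    return "CMCD=" + quote(encode_cmcd(pairs), safe="")
```

A player would append the result of `cmcd_query_arg({"br": 3200, "bl": 21300, "ot": "v", "sid": "abc"})` to each segment URL; sending the same payload in request headers is the other common v1 transport.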
Jason Cloud
Dolby Laboratories
Jeff Riedmiller
Dolby Laboratories
Does a multi-CDN setup (truly) require switching? Deploying an anti-switching multi-CDN delivery platform.
It seems pretty clear that using multiple CDNs to deliver media is a good thing; but it’s hard to do effectively. What is the best policy to use? How do you determine when to switch? How often do you switch? Do you switch based on client performance alone, consolidated user metrics, or something else? What happens when the CDN you switched to isn’t performing as well as you thought it would? Answering these questions (let alone designing a multi-CDN switching architecture) is enough to give anyone a headache. What if we throw out “switching” by downloading media from multiple CDNs at the same time? We could then realize the best performance by merging the performance of each. Seems simple enough until you start trying to do it efficiently. Do you race media from multiple CDNs at the same time, or do you try to perform sub-segment/byte-level scheduling? This seems even more complicated than before! This talk will focus on how to implement and deploy a switchless multi-source media delivery platform that is highly performant and efficient, and which avoids having to answer these difficult questions or solve massively complicated scheduling problems. Enabling true multi-source delivery without all the fuss requires us to do something a little bit unique to the content we are streaming. We first create multiple “versions” (one for each source, aka CDN) of each and every HLS or MPEG-DASH segment. This is done by packaging these segments into something called the Coded Multisource Media Format (CMMF), which is currently being standardized in ETSI. CMMF is essentially a container that is used to communicate network encoded content (network coding is kind of like forward error correction (FEC), but not – we’ll expand upon this more during the talk). Each CMMF version is then cached on a different CDN. Now let’s say a media player wants to download a particular video segment.
Instead of downloading the entire segment from one CDN or requesting bytes x1 through y1 of that segment from one CDN and x2 through y2 from another, the media player requests multiple CMMF versions of that segment from different CDNs at the same time. Once the player receives enough data (the amount required is very close to that of the original segment) from the collection of CDNs, it can stop the download and recover the segment it wanted. By doing it this way, we don’t have to worry about hiccups (like temporary congestion or slow response times) on one CDN because we will just download more from the others. During the talk, we will introduce CMMF, the core concepts behind it, as well as go over how we deployed it within a streaming service with 20+ million subscribers world-wide and streamed approximately one million hours of content using it. We will also provide performance data that shows how CMMF-enabled multi-CDN delivery stacks up against a popular multi-CDN switching approach (as you can guess, it stacked up well). We hope this talk provides the audience with a different perspective to an “age-old” problem and inspires them to explore multisource delivery in greater detail.
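To make the "download from everyone, stop when you have enough" idea concrete, here is a toy Python sketch. It is not CMMF and makes no claim about the code CMMF actually uses. It assumes a generic random linear code over GF(2): the segment is split into `k` blocks, every source emits XORs of random subsets of those blocks, and the player keeps pulling packets from any source until it holds `k` linearly independent ones. All names (`split_blocks`, `coded_packet`, `Decoder`, `fetch_from_sources`) are hypothetical.

```python
import random


def split_blocks(data: bytes, k: int):
    """Split data into k equal-size blocks (zero padded), each held as an int."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [int.from_bytes(padded[i * size:(i + 1) * size], "big") for i in range(k)]
    return blocks, size


def coded_packet(blocks, rng):
    """XOR a random non-empty subset of blocks; the bitmask records which ones."""
    mask = rng.randrange(1, 1 << len(blocks))
    payload = 0
    for i, block in enumerate(blocks):
        if (mask >> i) & 1:
            payload ^= block
    return mask, payload


class Decoder:
    """Incremental Gaussian elimination over GF(2), kept in reduced form."""

    def __init__(self, k: int, block_size: int):
        self.k, self.block_size = k, block_size
        self.rows = {}  # pivot bit -> (mask, payload)

    def add(self, mask: int, payload: int) -> bool:
        """Absorb one packet; return True if it carried new information."""
        for pivot, (m, p) in self.rows.items():
            if (mask >> pivot) & 1:
                mask, payload = mask ^ m, payload ^ p
        if mask == 0:
            return False  # linearly dependent on what we already hold
        pivot = mask.bit_length() - 1
        for other, (m, p) in list(self.rows.items()):
            if (m >> pivot) & 1:
                self.rows[other] = (m ^ mask, p ^ payload)
        self.rows[pivot] = (mask, payload)
        return True

    def complete(self) -> bool:
        return len(self.rows) == self.k

    def recover(self, length: int) -> bytes:
        """Once complete, row i holds exactly original block i."""
        data = b"".join(
            self.rows[i][1].to_bytes(self.block_size, "big") for i in range(self.k)
        )
        return data[:length]


def fetch_from_sources(data: bytes, k: int, seeds):
    """Round-robin over simulated sources (one RNG each) until decodable."""
    blocks, size = split_blocks(data, k)
    sources = [random.Random(seed) for seed in seeds]
    decoder = Decoder(k, size)
    received = 0
    while not decoder.complete():
        for rng in sources:
            decoder.add(*coded_packet(blocks, rng))
            received += 1
            if decoder.complete():
                break
    return decoder.recover(len(data)), received
```

In this toy model a slow source only means fewer of the packets come from it; the player never has to decide which byte range belongs to which CDN. With a binary code the player typically needs a few packets more than `k`, which is in the same spirit as the "very close to that of the original segment" overhead described above.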
Katerina Dobnerova
CDN77
Enhancing CDN Performance and Cutting Egress Costs in Large Video Libraries Delivery: Advanced Caching Strategies and Edge Computing Optimization
During the 20 minutes of my presentation, users worldwide will generate content equivalent to the volume created from the dawn of civilisation until 2003. The volume of content being created today is staggering. Consider this: from the beginning of recorded history until 2003, we produced roughly 5 exabytes of content. However, projections suggest a monumental leap to 147 zettabytes in 2024 alone, with video content leading the charge. With such exponential growth in content and its shortening life span, content delivery networks (CDNs) face significant challenges in effectively caching large video libraries. While cache hit rates of 98% and higher are taken for granted, the figures above suggest that simply adding disk space is not remotely enough to keep the cache hit ratio at the desired figures. This presentation explores many approaches, including tiered cache systems, which use a hierarchy of caching servers employing consistent hashing and other techniques to maximize scalability and performance while minimizing failover and downtime. It also covers one-hit-wonder elimination, utilizing simple counters to reduce cache pollution by avoiding storing unpopular content. It also addresses cache-state sharing, which employs Bloom-filter-based technology to further improve cache scalability and effective disk space utilization. Moreover, it will examine the deployment of edge computing to amplify caching efficiency in specific use cases.
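One-hit-wonder elimination is easy to picture in code. The Python sketch below is one possible reading of "simple counters", not CDN77's implementation: an object is only admitted to an LRU cache once it has been requested `admit_after` times. The class name `SecondHitCache` and its parameters are hypothetical, and the unbounded `Counter` is a simplification (a production system would more likely use a bounded or probabilistic structure such as a Bloom filter).

```python
from collections import Counter, OrderedDict


class SecondHitCache:
    """LRU cache that only stores an object after it has been requested N times."""

    def __init__(self, capacity: int, admit_after: int = 2):
        self.capacity = capacity
        self.admit_after = admit_after
        self.seen = Counter()       # request counts for objects not yet cached
        self.store = OrderedDict()  # cached objects, least recently used first

    def get(self, key, fetch_origin):
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key], "HIT"
        body = fetch_origin(key)
        self.seen[key] += 1
        if self.seen[key] >= self.admit_after:
            self.store[key] = body
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict the least recently used
        return body, "MISS"
```

With the default `admit_after=2`, an object requested exactly once never takes up disk space, so long-tail content cannot evict segments that are actually popular.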
John Bowers
Twitch/Amazon IVS
Free ABR Renditions for User Generated Content Platforms
Well, not exactly free - but much, MUCH lower cost than server-side transcoding! Providing an ABR ladder is table stakes for live viewer experiences, but it’s expensive for at-scale video platforms to provision and maintain specialized infrastructure to handle peak transcoding demand. A recently developed update to the Enhanced RTMP specification adds multitrack video, multitrack audio, and advanced codec support. With implementations in OBS Studio and Wireshark, the technology is ready for you to adopt it. Now you can offer all creators - regardless of audience size or creator - ABR playback. Come and learn why encoding multiple video tracks on the content creator’s machine at the edge is higher quality, lower latency, and more scalable than server-side transcoding – all while allowing faster innovation and deployment of newer codecs like HEVC and AV1.
Constanza Dibueno
Qualabs
How to play DASH on your HLS Player
What if I told you that you could play a DASH video seamlessly in an HLS player? In today's broadcasting landscape, interoperability is a challenge. Reaching a broader audience means creating multiple copies of each stream file in different formats, which doubles the costs of packaging and storage. This inefficiency is a significant pain for broadcasters. CMAF was designed to revolutionize HTTP-based streaming media delivery. It streamlines media delivery by using a single, unified transport container compatible with both HLS and DASH protocols. At the latest MonteVideo Tech Summer Camp, we embarked on an exciting project: creating a library based on the CMAF standard and the Hypothetical Application Model. This innovative library provides a practical solution for converting playlists and manifests between HLS and DASH. We brought this vision to life by building a proof of concept. We want to present to you an intuitive Open Source UI built on top of the library. In this presentation, we will showcase how the UI can help you understand the library's capabilities and its potential for building tools that simplify the broadcasting experience, without having to go deep into the complexities of the CMAF specification. For example, it allows users to take a DASH manifest as input, convert it to an HLS playlist on the fly, and play the content on an HLS player. With this capability, broadcasters could adapt to different streaming requirements, delivering content across various platforms and devices, thereby enhancing adaptability and flexibility. By the end of this presentation, we aim to show approaches that could enhance interoperability in your broadcasting operations, using the CMAF HAM UI as a tool.
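As a rough illustration of why a shared CMAF model makes this conversion tractable, the Python sketch below renders an HLS media playlist from a tiny in-memory track description, standing in for what would be extracted from a DASH manifest. It is not the CMAF HAM library's API; `Segment`, `Track` and `to_hls_media_playlist` are hypothetical names, and only a VOD playlist with a single init segment is handled.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    uri: str
    duration: float  # seconds


@dataclass
class Track:
    init_uri: str    # CMAF header (initialization segment)
    segments: list   # list of Segment, in presentation order


def to_hls_media_playlist(track: Track) -> str:
    """Render a VOD HLS media playlist that references the same CMAF segments."""
    # Rounding up keeps every EXTINF duration within the declared target.
    target = math.ceil(max(s.duration for s in track.segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        f'#EXT-X-MAP:URI="{track.init_uri}"',
    ]
    for segment in track.segments:
        lines.append(f"#EXTINF:{segment.duration:.3f},")
        lines.append(segment.uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

Because the media segments themselves are untouched, only the manifest text changes; the same files on the origin serve both DASH and HLS players.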
Bruce Spang
Netflix
Wei Wei
Netflix Inc
Innovate Live Streaming with a Client Simulator
One of the major challenges in live streaming is the scarcity of real-world events to test innovative ideas and algorithms, such as adaptive bitrate (ABR) algorithms. Relying on actual live events for testing not only prolongs the innovation cycle but also increases the risk of negatively impacting user experience. To overcome this obstacle, we at Netflix have enhanced our existing client simulator to emulate live streaming scenarios. This simulator utilizes network traces and device characteristics gathered from real-world sessions to drive our production client library, and it plays a crucial role in driving innovation at Netflix. In this talk, we will first present how the client simulator simulates live streaming. Then we will demonstrate how it can be used to test new live encoding methods, like Variable Bitrate (VBR) encoding, and to evaluate various ABR algorithms on a large scale. We will conclude the talk with future directions.
Jan De Cock
Synamedia
Measuring live video quality with minimal complexity, now available for everyone!
We all love video, and we love it even more when the quality of the video is great! To measure that quality, we already have quite some options, and the folks at Netflix did a great job at giving us VMAF. This is all fine and dandy for our VOD colleagues, but what about us, *live* video engineers? We struggle to optimize every cycle in our live encoders, and spending a full CPU core on metric calculation is just not acceptable -- and not good for business. We spent quite some time figuring out how to simplify this problem. Our marketing people said: "Why don't you use AI"? So we did, and imagine that, in this case it actually worked. We'll forget about all those other projects that got stuck in the trough of AI disillusionment. Turns out that metrics such as SSIM and VMAF can be quite accurately predicted, and by using smart features inside the encoder, this can be done with marginal additional computational complexity. In the talk, we’ll explain how we found a balance between accuracy and complexity of the used features and ML networks. All fine for *your* encoder you say, but how does that help me? Well, we took on the challenge to show that this approach also works for open-source encoders, with x264 as our first target. And, we’re sharing the code, so you can try it out too! And while we’re eagerly awaiting the 10th Demuxed over the coming months, we’ll also be trying this approach on SVT-AV1. Too early to tell if this attempt will be successful, but we’ll be able to tell you in October, and take you through the process during the talk!
Anand Vadera
Meta
Optimizing Storage for Meta's Trillion-Video Catalog: Achieving Pareto Efficiency
Meta manages an extensive video catalog with over a trillion videos across various products, and this number is growing daily with more than one billion new videos added each day. The challenge lies in maintaining an efficient storage footprint while accommodating this continuous influx of new content. The goal is to achieve Pareto efficiency, optimizing the storage space without compromising the quality of the videos delivered. This balance is crucial for sustaining scalability and efficiency in Meta's Video Infrastructure. This talk will delve into an innovative method for addressing the problem at hand. It will discuss the fundamental concepts underpinning this approach and share valuable insights gained during its development and implementation. In particular, it will highlight effective strategies that have proven successful in enhancing storage efficiency without negatively impacting video quality. Furthermore, the presentation will touch upon the ongoing evolution of the system, showcasing how it is continually being improved to better tackle the challenge of managing an ever-growing video catalog while maintaining optimal storage usage. By sharing these learnings with others facing similar challenges, the hope is to contribute to the collective knowledge base and ultimately facilitate the development of more efficient and effective systems for managing large-scale video repositories.
Eric Tang
Livepeer
Progress and Opportunities in Video Content Authenticity
In an era where AI-generated video is rapidly becoming the norm, the need for video content authenticity has never been more critical. Over the past year, we've witnessed significant strides in this area, with industry giants like OpenAI, Google, and BBC joining the Coalition for Content Provenance and Authenticity (C2PA) and committing to integrate this technology into their core products. Join us in this enlightening session as we dive into C2PA’s significant technical advancements over the past year, and map out a practical approach for implementing C2PA in any video application. Discover the intricacies of C2PA’s trust model and understand how it safeguards users on video platforms. We'll also cover essential implementation considerations, from video player UX to backend video workflow management. As long-standing members of the Content Authenticity Initiative (CAI) and key contributors to C2PA, we bring a wealth of experience from participating in weekly working groups and shaping the last two versions of the C2PA specification. Our expertise is backed by numerous workshops and presentations at leading conferences and industry events like NAB and the C2PA symposium.
Tony McNamara
Paramount Streaming
Pseudo-Interstitials: Playback flexibility for legacy devices.
Interstitials allow the insertion of content by reference into a playback stream, and are especially useful when a playlist won't work. But Interstitials are also still relatively new; just a year ago Apple devices didn't support playback of them, despite Apple having accepted them into the HLS Specification years earlier. DASH XLinks suffer the general inconsistency so consistent in DASH. And of course legacy devices tend to be stuck on much earlier protocol versions. We've come up with "Pseudo-Interstitials", which provide much of the same flexibility, to allow very-late decisioning and binding of content, especially ads, into playback of legacy devices. This will include a very brief introduction to interstitials and their value, and the problem statement, and then a deep dive into the multi-disciplinary solution including encoding concerns, manifest manipulation, Edge Computing and even briefly SSAI constraints.
Walker Griggs
Mux
PSSH, or the Primordial Soup of Secure Headers
Consider our friendly, neighborhood PSSH box. The semantics are simple -- to identify encryption keys -- but, as with any permissive specification, there’s a lot more going on than meets the eye. In some cases, they contain deeply nested little-endian UTF16 XML. In others, we’ll find protocol buffers containing base64-encoded JSON. In all cases, they have a surprising amount of personality. In this talk, we will dive deep into several PSSH boxes, dissecting them bit by bit across various popular DRM schemes. Along the way, we will: 1. Explore the history of the PSSH box and how it mirrors the evolution of DRM standards. 2. Discover how each provider has imparted their own company idioms onto the loosely-defined PSSH payload. 3. Identify where the decisions of one provider impacted the rest.
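Before the payload gets weird, the outer box is refreshingly regular. The Python sketch below parses the common fields of a `pssh` box (version, SystemID, the optional key ID list in version 1, and the opaque data blob), assuming the ordinary 32-bit box size; the 64-bit `largesize` form is not handled. The function name `parse_pssh` and the returned dictionary shape are illustrative only, and the DRM-specific payload is left undecoded.

```python
import struct
import uuid


def parse_pssh(box: bytes) -> dict:
    """Parse the generic fields of a 'pssh' box; the payload stays opaque."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"pssh":
        raise ValueError("not a pssh box")
    version = box[8]
    offset = 12  # size (4) + type (4) + version (1) + flags (3)
    system_id = uuid.UUID(bytes=box[offset:offset + 16])
    offset += 16
    key_ids = []
    if version > 0:
        # Version 1 boxes carry an explicit list of 16-byte key IDs.
        (kid_count,) = struct.unpack_from(">I", box, offset)
        offset += 4
        for _ in range(kid_count):
            key_ids.append(box[offset:offset + 16].hex())
            offset += 16
    (data_size,) = struct.unpack_from(">I", box, offset)
    offset += 4
    return {
        "size": size,
        "version": version,
        "system_id": str(system_id),
        "key_ids": key_ids,
        "data": box[offset:offset + data_size],
    }
```

The `system_id` identifies the DRM system, and `data` is where each provider's personality lives: everything past this point is defined by the DRM vendor rather than by the box format.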
Luke Curley
Discord
Replacing WebRTC with Media over QUIC
It's been over a decade since WebRTC was released. Surely there's something new on the horizon, right? Media over QUIC is an IETF working group that is working on a new live media standard to replace the likes of WebRTC, RTMP/SRT, and HLS/DASH. Wow that's overly ambitious, but it's being backed by your favorite big tech companies (and some non-favorites) in the same standards body that has produced hits such as... WebRTC. But replacing WebRTC is difficult. It exists because there were no web standards in 2011 that could accomplish conferencing; remember this was before even HTML5. But there are new Web standards now! This talk will go over WebTransport and WebCodecs, and how they are utilized to provide a user experience nearly on par with WebRTC while being dramatically more flexible. No more magic black box, no more ICE/STUN/TURN/SDP/DTLS/SCTP/RTP/SRTP/mDNS, no more getting ghosted by Google. Just you with a QUIC connection and the ability to encode/decode video frames. And of course we'll go over the promise of Media over QUIC and why you should use the standard instead of your own bespoke protocol. I'll give you a hint, it starts with C and ends with "DN Support".
Yuriy Reznik
Brightcove, Inc.
Streaming in the 1970s. NVP & ST: the very first real-time streaming protocols.
In this talk we will go back in history and look at the very first protocols and systems developed for internet streaming: the venerable NVP (Network Voice Protocol) and ST (Internet Stream Protocol, aka IPv5) protocols, developed by Danny Cohen, Jim Forgie, and other brilliant engineers at MIT Lincoln Labs in the 1970s. We will discuss the key ideas introduced by these protocols (the concepts of sessions, available capacity assessment, rate negotiation between sender and receiver, data transfer protocols, the need for network-layer support for sessions, resource provisioning, etc.) and show how most of these ideas became incorporated in subsequent designs. Specifically, we will show how many ideas introduced in NVP and ST have eventually found their implementations in modern protocols, such as WebRTC, QUIC and MOQ. The talk will include many historical pictures and some videos of those early pioneering systems built in the 1970s. It will also try to explain what motivated these original developers to come up with all these techniques, and why.
Alex Field
Sky/NBCU
The Colorful Truth of Automated Tests
Trying to automatically test what the end user actually sees and hears on their streaming device is hard - very hard. Automated testing methods often rely on unreliable data from player APIs, leading to inaccurate results. This talk aims to showcase our journey of how we experimented with content encoded with visual and audio cues to validate that our player APIs are really telling the truth about what the user is seeing.
Fabio Sonnati
NTT Data
The Long(est) Night
April 28, 2019, a phone call wakes me in the middle of the night: "The Long Night", a new episode of the final season of Game of Thrones is airing, but nothing is visible! The artistic intent is clearly extreme, and the encoding can't handle it, resulting in a flurry of confused silhouettes in the darkness, struggling in a dark sea of banding. In this presentation, I will talk about how we resolved an extreme situation for a high-quality streaming service by manually customizing encoding to mitigate the problem, inspired by well-known principles in the world of audio processing and 3D game rendering.
Gwendal Simon
Synamedia
Token Renewal: Keeping your Streaming Party Smooth and Secured
CDN leeching is a growing concern for content providers. The recent specification of the Common Access Token (CAT) has introduced a vehicle for designing more secure streaming delivery systems. Best practices for CDN content protection often involve renewing the token, either due to short expiration times or probabilistic rejections. However, token renewal is far from trivial. In token-based delivery systems, we identify three key entities: the client, the CDN server, and the token generator. Typically, these communicate via HTTP(S). At any point during a streaming session, the CDN server may request the client to renew its token, ensuring seamless video playback, even for low-latency ABR streaming. The CAT specification includes two claims related to renewal: catr and catif. While the specification details several operation modes, none fully satisfy the combined requirements for fast renewal, legacy clients, and the unique characteristics of DASH and HLS. In this talk, we will unpack the current situation, presenting the pros and cons of each proposed solution. We aim to open the door to a better solution and outline the community effort needed for its implementation.
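To ground the renewal problem, here is a deliberately simplified Python sketch of a short-lived signed token and the decision a CDN server has to make on each request. It is not CAT: the real token format, its claims and the semantics of `catr` and `catif` are not modelled. The token layout (JSON payload plus HMAC-SHA256 signature), the `renew_window` parameter and the function names `issue_token` and `check_token` are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, subject: str, now: int, ttl: int) -> str:
    """Token generator: sign a payload carrying an expiry time."""
    payload = json.dumps({"sub": subject, "exp": now + ttl}, sort_keys=True).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def check_token(secret: bytes, token: str, now: int, renew_window: int) -> str:
    """CDN server: return 'ok', 'renew' (valid but close to expiry) or 'reject'."""
    try:
        payload_part, signature_part = token.split(".")
        payload = _unb64(payload_part)
        signature = _unb64(signature_part)
    except ValueError:  # wrong number of parts or invalid base64
        return "reject"
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return "reject"
    expires_at = json.loads(payload)["exp"]
    if now >= expires_at:
        return "reject"
    if expires_at - now <= renew_window:
        return "renew"  # serve the request, but ask the client for a fresh token
    return "ok"
```

The hard part discussed in the talk starts where this sketch stops: how the "renew" signal and the fresh token actually travel between the CDN server, the client and the token generator without stalling playback.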
Thomas Edwards
Amazon Web Services
Video Processing on Quantum Computers
Quantum computing (QC) utilizes quantum mechanics to solve complex problems faster than on "classical" computers. QCs available today are considered "Noisy Intermediate-Scale Quantum" (NISQ) computers with a small number of quantum bits (qubits) and limited performance due to short coherence time and noisy gates. QCs are improving all the time, so it is possible that in the future they could provide practical acceleration to video processing workflows (remember how neural networks were in the 1990s?). This presentation will give a short overview of QC basics, results of representing (simple) images on an actual cloud-accessible QC, and will describe some research on potential video processing applications of QCs. [Note: I've timed that this can be presented in 20 minutes]
Ryan Cunningham
Scenery
WebCodecs vs. WASM for Fast Video Scrubbing
We built a web-based video editor capable of fast scrubbing and advanced WebGL compositing features. This talk explores the intricacies of building such an editor using WebCodecs for video decoding and preview and contrasts it with traditional methods, specifically using HTML video, or a WASM H264 decoder. The goal is to provide a comprehensive guide on implementing a high-performance video editor preview that leverages modern web technologies while addressing practical challenges and limitations, and also reveal areas where improvement is needed. HTML video elements, while widely used, pose significant limitations for fast scrubbing and precise frame accuracy. Slow seeking, lack of control over frame rendering, and the need to use drawImage to get frames into a WebGL texture can hinder the perceived speed in a video editor. WebCodecs provides a low-level API that allows developers to decode video segments and render them to textures, enabling extremely fast scrubbing and WebGL compositing directly in the web browser. By holding video data in GPU textures, we achieve advanced features such as alpha-transparency using just the H264 decoder. The talk will dive into the implementation details, showcasing pre-loading and garbage collection techniques. We will also discuss the pipeline nature of WebCodecs decoders, which necessitates efficient management of VideoFrames to maintain performance. Despite its advantages, WebCodecs comes with its own set of challenges. The hardware-based implementation means no actual concurrent decodes, and rendering VideoFrames to textures is surprisingly CPU-intensive. Additionally, the performance can be inconsistent across different hardware due to Google's GPU exclusion list in Chrome, which defaults to software decoding on certain computers. This session will cover mitigation strategies, including conducting test decodes to determine performance viability. We will discuss the trade-offs and potential pitfalls of using WebCodecs. 
Before the advent of WebCodecs, our approach involved using a WASM-compiled H264 decoder, tinyh264. Using WASM in Web Workers, we achieve true concurrent decoding. However, it comes with its own set of limitations. Running entirely on the CPU, it requires managing frames in main memory and handling the upload to the GPU, alongside color space conversions from YUV to RGB. Furthermore, it creates licensing issues since it distributes an H264 decoder. We will discuss the implementation details, performance considerations, and how it compares to WebCodecs.
Tracey Jaquith
Internet Archive
What's on TV? 4 editors and 2 robots walk into a bar..
Using TV news "chyron" text overlays in the "lower third" (from human editors), image-to-text (OCR), grouping/filtering, and GPT AI to summarize --> we social post hourly: "What's on TV?" The non-caption news text (e.g. BIDEN VISITS MEXICO) that shows up at the bottom of the screen (like those overhead monitors in airports showing news) is gold, written in real time by editors during live broadcasts. However, the data is not carried anywhere inside the video streams (just visually). What's a girl with robots to do? Using CNN, MSNBC, Fox News and BBC News feeds, we use ffmpeg to crop the relevant image area; tesseract to OCR the image into text; and GPT AI to summarize, remove ads, and clean up the text. We then post hourly summaries to Mastodon.
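The first two robots can be sketched in a few lines of Python that shell out to the `ffmpeg` and `tesseract` command-line tools. This is a guess at the general shape of such a pipeline, not the Internet Archive's code: the crop rectangle, the file names and the function names (`crop_command`, `ocr_command`, `chyron_text`) are hypothetical, and the grouping, filtering and summarization steps are left out.

```python
import subprocess


def crop_command(src, dst, width, height, x, y):
    """ffmpeg invocation that grabs one frame and crops it to the chyron area."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"crop={width}:{height}:{x}:{y}",  # crop=w:h:x:y
        "-frames:v", "1",
        dst,
    ]


def ocr_command(image):
    """tesseract invocation that prints the recognized text to stdout."""
    return ["tesseract", image, "stdout"]


def chyron_text(src, frame_path="chyron.png", region=(1280, 120, 0, 600)):
    """Crop the lower third of a frame and OCR it (region is a made-up example)."""
    subprocess.run(crop_command(src, frame_path, *region), check=True)
    result = subprocess.run(
        ocr_command(frame_path), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

Calling `chyron_text("news.ts")` requires both tools to be installed; the right crop rectangle differs per channel, since every network places its lower third a little differently.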
Steve Robertson
YouTube
Why is gapless so hard?
A deep dive into audio gaplessness, for video engineers. Covering the difference between stitching, pseudo-gapless, and true gapless approaches, why gapless is important to the art, the mechanical reasons why the audio clock always wins, how the system reconciles this instability, and why this leads to dropped frames and A/V sync issues.
Yingyu Yao
YouTube
Your TV Is Eating Your Frames
At YouTube, we aspire to stream cat videos to everything that has a screen, including the largest of them all: TVs in your living room. Despite being devices engineered to be video playback powerhouses, it is unexpectedly difficult to make videos play consistently and smoothly on them. From the lens of a player engineer, I will take you on a shallow dive through the TV media stack, and we will explore different ways a playback can get tripped on those large screens.
10 years of Demuxed!
Demuxed is video engineers talking about video technology
Our first meeting was a single day event back in 2015, born out of the SF Video Technology meetup. The video industry had plenty of trade shows and other opportunities for The Business, but our goal was to create a conference and community for the engineers building the technology powering video, from encoding, to delivery, to playback, and beyond. We’ve grown a lot since then, but our goal remains the same.
After creating Demuxed, some of the organizers went on to start and work at Mux. Mux continues to sponsor most of the organizational work behind the scenes (thanks for the salary!), but Demuxed is, at its core, a community-led event.
Every year we get a group together that’s kind enough to do things like schedule planning, help brainstorm cool swag, and, most importantly, argue heatedly over which talk submissions should make the final cut. These folks are the ones hanging out in Video Dev Slack, and they hail from all over the industry.
Our sponsors
We thank all our amazing sponsors!
Organized with love by video nerds around the world
Contact: info@demuxed.com
Legal stuff: Code of Conduct