The conference for video devs
October 16th – 17th, 2024
Regency Ballroom, San Francisco
Buy tickets now!
Join our mailing list
Want to sponsor? Get in touch
Why attend?
No marketing, ever.
Speakers are selected based on their submission, not how much money their company paid; we will never, ever sell a speaking slot. Attendee information isn’t for sale either, and that includes any sponsors.
Affordable
We want anyone in the industry to be able to come, which means keeping tickets reasonably priced (thanks largely to our generous sponsors). We also offer free and discounted tickets to students and open source distributors, so please reach out if you’re interested.
For everyone in the community
Our community is dedicated to providing an inclusive, enjoyable experience for everyone in the video industry. In this pursuit, and in keeping with our love for reasonable standards, we adopted the Ada Initiative’s code of conduct.
Speakers
Alex Field
NBCU/Sky
Alex Giladi
Comcast
Anand Vadera
Meta
Bruce Spang
Netflix
Constanza Dibueno
Qualabs
Derek Buitenhuis
Vimeo
Eric Tang
Livepeer
Fabio Sonnati
NTT Data
Gwendal Simon
Synamedia
James Hurley
Twitch/IVS
Jan De Cock
Synamedia
Jason Cloud
Dolby Laboratories
Jeff Riedmiller
Dolby Laboratories
Jill Boyce
Nokia
John Bartos
Twitch/IVS
John Bowers
Twitch/Amazon IVS
Jon Dahl
Mux
Katerina Dobnerova
CDN77
Li-Heng Chen
Netflix
Luke Curley
Discord
Matteo Naccari
Visionular
RongKai Guo
NVIDIA
Ryan Lei
Meta
Steve Robertson
YouTube
Tanushree Nori
Vimeo
Thomas Edwards
Amazon Web Services
Tony McNamara
Paramount Streaming
Tracey Jaquith
Internet Archive
Vanessa Pyne
Daily
Walker Griggs
Mux
Wei Wei
Netflix Inc
Will Law
Akamai
Yingyu Yao
YouTube
Yuriy Reznik
Brightcove, Inc.
Zoe Liu
Visionular
Alex Field
NBCU/Sky
The Colorful Truth of Automated Tests
Trying to automatically test what the end user actually sees and hears on their streaming device is hard - very hard. Automated testing methods often rely on unreliable data from player APIs, leading to inaccurate results. This talk showcases our journey of experimenting with content encoded with visual and audio cues to validate that our player APIs are really telling the truth about what the user is seeing.
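To make the cue idea concrete, here is a minimal sketch of the validation step, with a hypothetical cue schedule and tolerance (not the actual NBCU/Sky test harness):

```python
# Validate that the frame a device actually renders matches the colour cue
# we encoded into the test content at that timestamp.
# Hypothetical cue schedule: each interval of the test asset carries one solid colour.
CUE_SCHEDULE = [
    (0.0, (255, 0, 0)),    # 0-2s: red
    (2.0, (0, 255, 0)),    # 2-4s: green
    (4.0, (0, 0, 255)),    # 4-6s: blue
]

def expected_cue(position_s):
    """Return the cue colour the content should show at this playhead position."""
    colour = CUE_SCHEDULE[0][1]
    for start, rgb in CUE_SCHEDULE:
        if position_s >= start:
            colour = rgb
    return colour

def matches(sampled_rgb, expected_rgb, tolerance=30):
    """Allow for scaling and colour-conversion drift on the device under test."""
    return all(abs(s - e) <= tolerance for s, e in zip(sampled_rgb, expected_rgb))

# The player API claims we are at 3.1s; a capture of the screen sampled this pixel.
reported_position = 3.1
captured_pixel = (12, 243, 8)
print("player telling the truth:", matches(captured_pixel, expected_cue(reported_position)))
```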
Alex Giladi
Comcast
Ads and overlays
The concept of server-guided ad insertion (SGAI), first introduced by Hulu in 2019, is becoming increasingly popular in the industry. It is markedly more scalable than the traditional server-side (SSAI) approach while remaining nearly as resilient, and it is more interoperable and more resilient than the client-side (CSAI) approach while being nearly as efficient and versatile. Client-side graphic overlays are, to a degree, a reincarnation of the banner ads that have plagued the web since the '90s. Their main use is not necessarily ad-related -- they serve a variety of roles, from station identification to localization to emergency notification. Their traditional implementation in video was baseband insertion (i.e., pre-transcoder) in a playout system, which is the least scalable and highest-latency approach possible in the video world. The streaming ecosystem has standardized and maturing support for SGAI. Interstitials are used to implement the approach in HLS. XLink was used in the original MPEG-DASH implementation of the approach; however, XLink suffers from a number of design flaws and was never widely implemented in the context of live channels and events. Media Presentation Insertion, a recent addition to MPEG-DASH, revisits this concept and allows spawning a new media presentation while pausing the main channel. As opposed to HLS interstitials, media presentation insertion allows asynchronous termination ("return to network"), supports VAST tracking, and more. The same server-guided model can be applied to the overlay use case and has the potential to dramatically improve scalability, targeting, and glass-to-glass latency. This talk will first describe the new MPEG-DASH media presentation insertion approach and its application to SGAI and blackouts. It will then cover the application of the same principles to graphic overlays in MPEG-DASH, and conclude with a description and a demo of an open-source implementation of both technologies.
Anand Vadera
Meta
Optimizing Storage for Meta's Trillion-Video Catalog: Achieving Pareto Efficiency
Meta manages an extensive video catalog with over a trillion videos across various products, and this number grows by more than one billion new videos each day. The challenge lies in maintaining an efficient storage footprint while accommodating this continuous influx of new content. The goal is to achieve Pareto efficiency, optimizing the storage space without compromising the quality of the videos delivered. This balance is crucial for sustaining scalability and efficiency in Meta's Video Infrastructure. This talk will delve into an innovative method for addressing the problem at hand. It will discuss the fundamental concepts underpinning this approach and share valuable insights gained during its development and implementation. In particular, it will highlight effective strategies that have proven successful in enhancing storage efficiency without negatively impacting video quality. Furthermore, the presentation will touch upon the ongoing evolution of the system, showcasing how it is continually being improved to better tackle the challenge of managing an ever-growing video catalog while maintaining optimal storage usage. By sharing these learnings with others facing similar challenges, the hope is to contribute to the collective knowledge base and ultimately facilitate the development of more efficient and effective systems for managing large-scale video repositories.
Bruce Spang
Netflix
Wei Wei
Netflix Inc
Innovate Live Streaming with a Client Simulator
One of the major challenges in live streaming is the scarcity of real-world events on which to test innovative ideas and algorithms, such as adaptive bitrate (ABR) algorithms. Relying on actual live events for testing not only prolongs the innovation cycle but also increases the risk of negatively impacting user experience. To overcome this obstacle, we at Netflix have enhanced our existing client simulator to emulate live streaming scenarios. The simulator uses network traces and device characteristics gathered from real-world sessions to drive our production client library. In this talk, we will first present how the client simulator simulates live streaming. Then we will demonstrate how it can be used to test new live encoding methods, like variable bitrate (VBR) encoding, and to evaluate various ABR algorithms at scale. We will conclude with future directions.
Constanza Dibueno
Qualabs
How to play DASH on your HLS player
What if I told you that you could play a DASH video seamlessly in an HLS player? In today's broadcasting landscape, interoperability is a challenge. Reaching a broader audience means creating multiple copies of each stream file in different formats, which doubles the costs of packaging and storage. This inefficiency is a significant pain for broadcasters. CMAF was designed to revolutionize HTTP-based streaming media delivery. It streamlines media delivery by using a single, unified transport container compatible with both HLS and DASH protocols. At the latest MonteVideo Tech Summer Camp, we embarked on an exciting project: creating a library based on the CMAF standard and the Hypothetical Application Model. This innovative library provides a practical solution for converting playlists and manifests between HLS and DASH. We brought this vision to life by building a proof of concept. We want to present an intuitive open-source UI built on top of the library. In this presentation, we will showcase how the UI can help you understand the library's powerful capabilities and its potential for building tools that simplify the broadcasting experience, without having to go deep into the complexities of the CMAF specification. For example, it allows users to take a DASH manifest as input, convert it to an HLS playlist on the fly, and play the content back in an HLS player. With this capability, broadcasters could adapt to different streaming requirements, delivering content across various platforms and devices, thereby enhancing adaptability and flexibility. By the end of this presentation, we aim to show approaches that could enhance interoperability in your broadcasting operations, using the CMAF HAM UI as a tool.
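As a rough illustration of the conversion idea, here is a toy, hand-rolled mapping from a DASH-style segment list to an HLS media playlist (not the Qualabs CMAF-HAM library itself, which works from a full Hypothetical Application Model):

```python
# Toy conversion of a DASH-style segment description into an HLS media playlist.
# Assumes CMAF segments that are already compatible with both protocols.
def to_hls_media_playlist(init_url, segments, target_duration):
    """segments: list of (duration_seconds, url) tuples taken from a DASH representation."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        f'#EXT-X-MAP:URI="{init_url}"',
    ]
    for duration, url in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(url)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)

# Hypothetical segment list pulled out of a DASH manifest.
segments = [(4.0, "seg_1.m4s"), (4.0, "seg_2.m4s"), (3.2, "seg_3.m4s")]
print(to_hls_media_playlist("init.mp4", segments, target_duration=4))
```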
Vanessa Pyne
Daily
Derek Buitenhuis
Vimeo
Be the change you want to see: How to contribute to FFmpeg
Have you ever written code you wanted to contribute to FFmpeg, but you got a little tripped up in the send-email situation, or maybe you got some feedback you weren't sure how to handle and your patch never made it across the finish line? Maybe you went to GitHub to make a PR, saw PULL REQUESTS ARE IGNORED, followed the link to the contribution documentation, saw a 28-point checklist, and backed away slowly from your computer. Don't give up the dream! This talk will review the entire FFmpeg contribution process from soup to nuts and demystify the scary parts. It will focus on procedure and the potential sharp edges thereof, rather than the actual code contribution itself. If that sounds very dry, rest assured the only thing dry about it will be the wit. This information may be elementary to some folks, but to paraphrase a recent FFmpeg-devel mailing list sentiment: "More diversity would be good." Making the process more accessible is key to making the circle bigger and encouraging a more diverse group of people to participate in the FFmpeg-devel ecosystem. If we want some new kids on the block, there should be a step-by-step guide, and this talk aims to be just that. A brief outline of the talk is as follows: 1. How to lurk (mailing list & IRC) 2. Find a thing to fix, improve, create 3. How to run regression tests (FATE, etc.) 4. How to git patch (aka how to send an email) 5. How to address feedback 6. It merged! Now what?
Eric Tang
Livepeer
Progress and Opportunities in Video Content Authenticity
In an era where AI-generated video is rapidly becoming the norm, the need for video content authenticity has never been more critical. Over the past year, we've witnessed significant strides in this area, with industry giants like OpenAI, Google, and the BBC joining the Coalition for Content Provenance and Authenticity (C2PA) and committing to integrate this technology into their core products. Join us in this enlightening session as we dive into C2PA's significant technical advancements over the past year and map out a practical approach for implementing C2PA in any video application. Discover the intricacies of C2PA's trust model and understand how it safeguards users on video platforms. We'll also cover essential implementation considerations, from video player UX to backend video workflow management. As long-standing members of the Content Authenticity Initiative (CAI) and a key contributor to C2PA, we bring a wealth of experience from participating in weekly working groups and shaping the last two versions of the C2PA specification. Our expertise is backed by numerous workshops and presentations at leading conferences and industry events like NAB and the C2PA symposium.
Fabio Sonnati
NTT Data
The Long(est) Night
April 28, 2019: a phone call wakes me in the middle of the night. "The Long Night", a new episode of the final season of Game of Thrones, is airing, but nothing is visible! The artistic intent is clearly extreme, and the encoding can't handle it, resulting in a flurry of confused silhouettes in the darkness, struggling in a dark sea of banding. In this presentation, I will talk about how we resolved an extreme situation for a high-quality streaming service by manually customizing encoding to mitigate the problem, inspired by well-known principles from the worlds of audio processing and 3D game rendering.
Gwendal Simon
Synamedia
Token Renewal: Keeping your Streaming Party Smooth and Secure
CDN leaching is a growing concern for content providers. The recent specification of the Common Access Token (CAT) has introduced a vehicle for designing more secure streaming delivery systems. Best practices for CDN content protection often involve renewing the token, either due to short expiration times or probabilistic rejections. However, token renewal is far from trivial. In token-based delivery systems, we identify three key entities: the client, the CDN server, and the token generator. Typically, these communicate via HTTP(S). At any point during a streaming session, the CDN server may request the client to renew its token, ensuring seamless video playback, even for low-latency ABR streaming. The CAT specification includes two claims related to renewal: catr and catif. While the specification details several operation modes, none fully satisfy the combined requirements for fast renewal, legacy clients, and the unique characteristics of DASH and HLS. In this talk, we will unpack the current situation, presenting the pros and cons of each proposed solution. We aim to open the door to a better solution and outline the community effort needed for its implementation.
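In spirit, the client side of a renewal scheme looks something like this minimal sketch, with generic expiry and renewal hints standing in for the CAT catr/catif claims:

```python
import time

# Hypothetical token state as the client sees it: an opaque value plus hints
# telling it when to renew and where to fetch a fresh token from.
class Token:
    def __init__(self, value, expires_at, renew_before_s, renew_url):
        self.value = value
        self.expires_at = expires_at          # absolute expiry (epoch seconds)
        self.renew_before_s = renew_before_s  # start renewing this early
        self.renew_url = renew_url            # token generator endpoint

def needs_renewal(token, now=None):
    now = time.time() if now is None else now
    return now >= token.expires_at - token.renew_before_s

def before_segment_request(token, renew):
    """Call ahead of each segment fetch so playback never stalls on a rejected token."""
    if needs_renewal(token):
        token = renew(token.renew_url)  # out-of-band HTTPS call to the token generator
    return token
```

The hard parts the talk describes (legacy clients, low-latency segment cadence, DASH vs. HLS differences) are exactly what this naive loop glosses over.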
John Bartos
Twitch/IVS
James Hurley
Twitch/IVS
Real-Time Video Super Resolution for Live Streaming with WebGPU
In the era of high-definition content, maintaining video quality while minimizing bandwidth consumption is a critical challenge for live streaming platforms. On-device video super resolution (VSR) offers a promising solution to achieve higher resolutions at dramatically reduced bitrates. Traditional upscaling methods, such as bicubic or Lanczos interpolation, often fall short in terms of visual quality. While state-of-the-art machine learning (ML) models can produce incredibly accurate upscaled videos, they typically struggle with real-time performance requirements. Here, we present a novel ML-based approach for high-quality, on-device VSR, capable of running in real-time on modern hardware. We introduce the architectural details of our WebGPU-powered VSR pipeline, including efficient video frame transfer to the GPU via WebCodecs, optimized shader code, and seamless integration with web-based video players. Additionally, we will showcase the results from live user testing at scale, demonstrating the impact of our solution on user experience and bandwidth savings. By attending this talk, developers and researchers will get a taste of the latest advancements in on-device VSR and the practical implementation considerations for deploying such solutions in live streaming scenarios. We will explore the challenges and trade-offs involved in designing real-time VSR systems, and discuss future directions for further improvements in quality and performance.
Jan De Cock
Synamedia
Measuring live video quality with minimal complexity, now available for everyone!
We all love video, and we love it even more when the quality of the video is great! To measure that quality, we already have quite a few options, and the folks at Netflix did a great job at giving us VMAF. This is all fine and dandy for our VOD colleagues, but what about us, *live* video engineers? We struggle to optimize every cycle in our live encoders, and spending a full CPU core on metric calculation is just not acceptable -- and not good for business. We spent quite some time figuring out how to simplify this problem. Our marketing people said: "Why don't you use AI?" So we did, and imagine that, in this case it actually worked. We'll forget about all those other projects that got stuck in the trough of AI disillusionment. It turns out that metrics such as SSIM and VMAF can be predicted quite accurately, and by using smart features inside the encoder, this can be done with marginal additional computational complexity. In the talk, we'll explain how we found a balance between accuracy and the complexity of the features and ML networks we use. All fine for *your* encoder, you say, but how does that help me? Well, we took on the challenge to show that this approach also works for open-source encoders, with x264 as our first target. And we're sharing the code, so you can try it out too! And while we're eagerly awaiting the 10th Demuxed over the coming months, we'll also be trying this approach on SVT-AV1. Too early to tell if this attempt will be successful, but we'll be able to tell you in October, and take you through the process during the talk!
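As a toy illustration of the idea, ordinary least squares over a few hypothetical encoder-side features can stand in for the real ML model:

```python
import numpy as np

# Per-frame features the encoder already computes almost for free
# (e.g. mean QP, bits spent, fraction of skipped blocks) -> measured VMAF.
# The numbers below are made up purely for illustration.
features = np.array([
    [22.0, 185_000, 0.10],
    [27.0,  90_000, 0.35],
    [32.0,  45_000, 0.60],
    [38.0,  20_000, 0.82],
])
measured_vmaf = np.array([96.1, 91.4, 83.0, 70.5])

# Fit a linear predictor with an intercept term.
X = np.hstack([features, np.ones((len(features), 1))])
coef, *_ = np.linalg.lstsq(X, measured_vmaf, rcond=None)

def predict_vmaf(qp, bits, skip_ratio):
    return float(np.array([qp, bits, skip_ratio, 1.0]) @ coef)

print(round(predict_vmaf(30.0, 60_000, 0.5), 1))
```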
Jason Cloud
Dolby Laboratories
Jeff Riedmiller
Dolby Laboratories
Does a multi-CDN setup (truly) require switching? Deploying an anti-switching multi-CDN delivery platform.
It seems pretty clear that using multiple CDNs to deliver media is a good thing; but it's hard to do effectively. What is the best policy to use? How do you determine when to switch? How often do you switch? Do you switch based on client performance alone, consolidated user metrics, or something else? What happens when the CDN you switched to isn't performing as well as you thought it would? Answering these questions (let alone designing a multi-CDN switching architecture) is enough to give anyone a headache. What if we throw out "switching" by downloading media from multiple CDNs at the same time? We could then realize the best performance by merging the performance of each. Seems simple enough until you start trying to do it efficiently. Do you race media from multiple CDNs at the same time, or do you try to perform sub-segment/byte-level scheduling? This seems even more complicated than before! This talk will focus on how to implement and deploy a switchless multi-source media delivery platform that is highly performant and efficient, and which avoids having to answer these difficult questions or solve massively complicated scheduling problems. Enabling true multi-source delivery without all the fuss requires us to do something a little bit unique to the content we are streaming. We first create multiple "versions" (one for each source, aka CDN) of each and every HLS or MPEG-DASH segment. This is done by packaging these segments into something called the Coded Multisource Media Format (CMMF), which is currently being standardized in ETSI. CMMF is essentially a container that is used to communicate network-encoded content (network coding is kind of like forward error correction (FEC), but not – we'll expand upon this more during the talk). Each CMMF version is then cached on a different CDN. Now let's say a media player wants to download a particular video segment. Instead of downloading the entire segment from one CDN or requesting bytes x1 through y1 of that segment from one CDN and x2 through y2 from another, the media player requests multiple CMMF versions of that segment from different CDNs at the same time. Once the player receives enough data (the amount required is very close to that of the original segment) from the collection of CDNs, it can stop the download and recover the segment it wanted. By doing it this way, we don't have to worry about hiccups (like temporary congestion or slow response times) on one CDN because we will just download more from the others. During the talk, we will introduce CMMF and the core concepts behind it, as well as go over how we deployed it within a streaming service with 20+ million subscribers worldwide and streamed approximately one million hours of content using it. We will also provide performance data that shows how CMMF-enabled multi-CDN delivery stacks up against a popular multi-CDN switching approach (as you can guess, it stacked up well). We hope this talk provides the audience with a different perspective on an "age-old" problem and inspires them to explore multisource delivery in greater detail.
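In spirit, the player side looks something like this minimal sketch, with hypothetical CDN URLs and the actual CMMF decoding left out:

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

# One CMMF-encoded "version" of the same segment cached on each CDN (hypothetical URLs).
SOURCES = [
    "https://cdn-a.example.com/seg_42.cmmf",
    "https://cdn-b.example.com/seg_42.cmmf",
    "https://cdn-c.example.com/seg_42.cmmf",
]
SEGMENT_SIZE = 2_000_000  # bytes in the original segment

def fetch(url):
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()

def download_segment():
    """Pull coded data from all CDNs at once; stop once roughly enough has arrived to decode."""
    received = []
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(fetch, url) for url in SOURCES]
        for fut in as_completed(futures):
            received.append(fut.result())
            if sum(len(chunk) for chunk in received) >= SEGMENT_SIZE:
                break  # a slow or congested CDN simply contributes less;
                       # a real player would abort the remaining transfers here
    return received  # handed to the CMMF decoder to recover the original segment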
Jill Boyce
Nokia
Bringing more versatility to VVC with VSEI
Versatile Supplemental Enhancement Information (VSEI) is a companion standard to Versatile Video Coding (VVC). VSEI defines SEI messages that contain metadata inserted into a bitstream synchronized with the coded video, to convey extra information intended to be utilized by the receiver/decoder. SEI messages are optional and are targeted at specific use cases. SEI messages specified in VSEI may also be used with other video coding standards, including H.264/AVC, HEVC, or future standards. Since the initial standardization of VVC and VSEI in 2020, second and third editions of VSEI have been standardized, with a fourth edition under development. The new SEI messages included in new versions of VSEI bring even more versatility to VVC, by addressing a broader variety of applications. This talk will describe several of the new SEI messages and the use cases they enable.
John Bowers
Twitch/Amazon IVS
Free ABR Renditions for User Generated Content Platforms
Well, not exactly free - but much, MUCH lower cost than server-side transcoding! Providing an ABR ladder is table stakes for live viewer experiences, but it's expensive for at-scale video platforms to provision and maintain specialized infrastructure to handle peak transcoding demand. A recently developed update to the Enhanced RTMP specification adds multitrack video, multitrack audio, and advanced codec support. With implementations in OBS Studio and Wireshark, the technology is ready for you to adopt. Now you can offer ABR playback to all creators, regardless of audience size. Come and learn why encoding multiple video tracks on the content creator's machine at the edge delivers higher quality, lower latency, and better scalability than server-side transcoding – all while allowing faster innovation and deployment of newer codecs like HEVC and AV1.
Jon Dahl
Mux
A taxonomy of video "quality," or: was Strobe right or wrong about quality?
“Quality” is one of the most abused and overloaded terms in video. Orwell says that unclear language leads to unclear thinking, and: wow, our industry suffers from unclear thinking around quality. We conflate codecs like AV1 with "high quality"; we don’t know the difference between QoS and QoE; we’re 👍 on VMAF but we don’t really know how to use it. Meanwhile, Strobe gets on stage at Demuxed 2018 and says “Video quality doesn’t matter” (as the audience gasps in horror). In this talk, we’ll bring clarity and precision to the domain of “video quality.” We’ll learn the difference between QoE, QoS, perceptual quality, fidelity, efficiency, and more. We will review a schema that once and for all will eliminate all confusion, doubt, and ignorance from this area, driving our industry forward into a more enlightened future. And most importantly, we’ll learn whether Strobe was right and wrong when he said quality didn’t matter.
Katerina Dobnerova
CDN77
Enhancing CDN Performance and Cutting Egress Costs in Large Video Libraries Delivery: Advanced Caching Strategies and Edge Computing Optimization
During the 20 minutes of my presentation, users worldwide will generate content equivalent to the volume created from the dawn of civilisation until 2003. The volume of content being created today is staggering. Consider this: from the beginning of recorded history until 2003, we produced roughly 5 exabytes of content. However, projections suggest a monumental leap to 147 zettabytes in 2024 alone, with video content leading the charge. With such exponential growth in content and its shortening life span, content delivery networks (CDNs) face significant challenges in effectively caching large video libraries. While cache hit rates of 98% and higher are taken for granted, the figures above suggest that simply inflating disk space is not remotely enough to keep the cache hit ratio at the desired figures. This presentation explores several approaches, including tiered caching, which uses a hierarchy of caching servers employing consistent hashing and other techniques to maximize scalability and performance while minimizing failover and downtime. It also covers one-hit-wonder elimination, utilizing simple counters to reduce cache pollution by avoiding storing unpopular content, and cache-state sharing, which employs Bloom-filter-based technology to further improve cache scalability and effective disk space utilization. Moreover, it will examine the deployment of edge computing to amplify caching efficiency in specific use cases.
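For example, one-hit-wonder elimination boils down to an admission filter like this toy sketch (production systems typically use Bloom filters rather than exact counters):

```python
from collections import Counter

class OneHitWonderFilter:
    """Only admit an object into the cache once it has been requested more than once.

    A "one-hit wonder" that is requested a single time never evicts anything useful.
    """
    def __init__(self):
        self.seen = Counter()
        self.cache = {}

    def get(self, key, fetch_from_origin):
        if key in self.cache:
            return self.cache[key]          # cache hit
        body = fetch_from_origin(key)       # cache miss: always serve the client
        self.seen[key] += 1
        if self.seen[key] >= 2:             # second request: now worth caching
            self.cache[key] = body
        return body
```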
Li-Heng Chen
Netflix
Ryan Lei
Meta
A hitchhiker's guide to AV1 deployment
Six years since its inception as a video coding standard stipulated by the Alliance for Open Media, AV1 has proven its capability as a Swiss Army knife, with application domains spanning the streaming of movies and TV shows, user-generated content, and real-time video conferencing, including screen content, among others. This talk will feature a roadshow of AV1 deployments that have impacted billions of people's lives, presented by engineers with first-hand experience implementing it in production systems. Presenters will share tips, tricks, know-how, and the lessons learned from their deployment experience to bring the best performance out of AV1. Example topics include, but are not limited to: productization of AV1's film grain synthesis feature and the use of AV1 to deliver high dynamic range video at Netflix, AV1 deployment in Instagram Reels, and AV1 support for RTC services at Meta.
Luke Curley
Discord
Replacing WebRTC with Media over QUIC
It's been over a decade since WebRTC was released. Surely there's something new on the horizon, right? Media over QUIC is an IETF working group that is working on a new live media standard to replace the likes of WebRTC, RTMP/SRT, and HLS/DASH. Wow that's overly ambitious, but it's being backed by your favorite big tech companies (and some non-favorites) in the same standards body that has produced hits such as... WebRTC. But replacing WebRTC is difficult. It exists because there were no web standards in 2011 that could accomplish conferencing; remember this was before even HTML5. But there are new Web standards now! This talk will go over WebTransport and WebCodecs, and how they are utilized to provide a user experience nearly on par with WebRTC while being dramatically more flexible. No more magic black box, no more ICE/STUN/TURN/SDP/DTLS/SCTP/RTP/SRTP/mDNS, no more getting ghosted by Google. Just you with a QUIC connection and the ability to encode/decode video frames. And of course we'll go over the promise of Media over QUIC and why you should use the standard instead of your own bespoke protocol. I'll give you a hint, it starts with C and ends with "DN Support".
Matteo Naccari
Visionular
Compression of stereoscopic video with MV-HEVC: fundamentals, tools and development
Multiview (e.g. stereoscopic) content provides users with a fully immersive and compelling quality of experience when watching videos. This type of content is gaining new momentum thanks to the development and commercialisation of Virtual Reality (VR) headsets such as Apple Vision Pro and Oculus Quest. The delivery of multiview video poses new challenges to the video coding community, since frames are composed of multiple views (two in the stereoscopic case). Standardisation bodies such as ISO/IEC MPEG and ITU-T VCEG addressed the compression of multiview content by extending the H.265/HEVC standard to efficiently tackle the intrinsic data redundancies present across different views. Thanks to the availability of VR headsets, content providers and codec vendors are now deploying solutions supporting the multiview extension of H.265/HEVC (known as MV-HEVC). This talk will introduce the MV-HEVC standard from the encoder designer's perspective, starting with an overview of the standard's design and supported tools. The focus will then move on to the challenges faced when implementing practical encoding solutions, such as fast mode decision and rate control.
RongKai Guo
NVIDIA
Zoe Liu
Visionular
AI Enhanced GPU Video Coding: Achieving Joint High Compression Efficiency and Throughput
We are here to present a novel approach to significantly boost video compression efficiency on Nvidia NVENC hardware encoders, by leveraging AI-driven pre-analysis and pre-processing algorithms. We refer to this method as AI Enhanced GPU Video Coding, which combines Nvidia NVENC's high density, low latency, and high throughput with ML-based techniques to enhance video compression efficiency and boost visual quality, while maintaining high throughput. NVENC, as a leading hardware-based encoder, excels in providing high throughput and low latency but generally offers lower compression efficiency compared to CPU-based software encoders. Our AI-driven GPU video compression approach aims to leverage the advantages of both NVENC and AI algorithms to achieve high compression efficiency and throughput performance. Our optimization algorithms mainly include: 1. ML-based Scene & Region Classification: Identifying effective coding tools based on scene and region classification. 2. Regions of Interest (ROI) Identification: Focusing on perceptually significant regions, such as faces and jersey numbers in typical sports videos. 3. Pre-processing Techniques: Applying deblurring, denoising, sharpening, contrast adjustment, etc. to boost visual quality. 4. Hierarchical Pre-analysis and Pre-classification: Setting fine-grained QPs, including block-based QPs, and enabling quick quality monitoring. Combined, these techniques improve video compression efficiency, boosting both objective and subjective quality while achieving significant bitrate savings. We have applied these methods to large UGC content platforms. Our results demonstrate promising improvements in compression efficiency for both VOD and live use cases. Using the NVIDIA T4 Tensor Core GPU, we maintained the same high throughput for multiple parallel encoding threads while achieving a 15-20% bitrate saving and a 1-2 VMAF score improvement on typical UGC & PUGC content compared to the out-of-the-box NVENC approach. Further enhancements, such as re-encoding, are currently being developed, and further compression gains are expected.
Steve Robertson
YouTube
Why is gapless so hard?
A deep dive into audio gaplessness, for video engineers. Covering the difference between stitching, pseudo-gapless, and true gapless approaches, why gapless is important to the art, the mechanical reasons why the audio clock always wins, how the system reconciles this instability, and why this leads to dropped frames and A/V sync issues.
Tanushree Nori
Vimeo
Budgeting Bytes: Acing Cost-Efficient Video Storage
In today's world, where data never stops growing, Vimeo is at the forefront, cleverly slashing storage costs while keeping videos readily accessible. In my talk, I'll peel back the curtain on how we fine-tune cloud storage using machine learning, balancing cost savings with cheap and quick video access at Vimeo. We've cut our storage bills by an impressive 60% by applying smart lifecycle policies and a dash of machine learning. I'll share insights on how we determine the best times to tuck away older videos into cheaper storage tiers and what factors go into these decisions. This talk will offer practical strategies and a peek into the tools that help Vimeo manage a sprawling video library efficiently. Discover how these innovations can help reshape your approach to data storage too!
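As a toy illustration of a lifecycle rule, with made-up thresholds standing in for the ML-predicted access patterns described above:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tiering rule: move a video to colder storage the longer it goes
# unwatched and the less likely it is (per a prediction) to be watched again.
def pick_storage_class(last_access, predicted_weekly_views, now=None):
    now = now or datetime.now(timezone.utc)
    idle = now - last_access
    if predicted_weekly_views > 10 or idle < timedelta(days=30):
        return "standard"
    if idle < timedelta(days=180):
        return "nearline"
    return "coldline"

last_watched = datetime.now(timezone.utc) - timedelta(days=200)
print(pick_storage_class(last_watched, predicted_weekly_views=0.2))  # -> "coldline"
```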
Thomas Edwards
Amazon Web Services
Video Processing on Quantum Computers
Quantum computing (QC) utilizes quantum mechanics to solve complex problems faster than on "classical" computers. QCs available today are considered "Noisy Intermediate-Scale Quantum" (NISQ) computers, with a small number of quantum bits (qubits) and limited performance due to short coherence times and noisy gates. QCs are improving all the time, so it is possible that in the future they could provide practical acceleration to video processing workflows (remember how neural networks were in the 1990s?). This presentation will give a short overview of QC basics, show results of representing (simple) images on an actual cloud-accessible QC, and describe some research on potential video processing applications of QCs. [Note: I've timed that this can be presented in 20 minutes]
Tony McNamara
Paramount Streaming
Pseudo-Interstitials: Playback flexibility for legacy devices.
Interstitials allow the insertion of content by reference into a playback stream, and are especially useful when a playlist won't work. But interstitials are also still relatively new; just a year ago Apple devices didn't support playback of them, despite Apple having accepted them into the HLS specification years earlier. DASH XLinks suffer from the general inconsistency so consistent in DASH. And of course legacy devices tend to be stuck on much earlier protocol versions. We've come up with "Pseudo-Interstitials", which provide much of the same flexibility, allowing very late decisioning and binding of content, especially ads, into playback on legacy devices. The talk will include a very brief introduction to interstitials and their value, the problem statement, and then a deep dive into the multi-disciplinary solution, including encoding concerns, manifest manipulation, edge computing, and, briefly, SSAI constraints.
Tracey Jaquith
Internet Archive
What's on TV? 4 editors and 2 robots walk into a bar...
Using TV news "chyron" text overlays in the "lower third" (from human editors), image-to-text OCR, grouping/filtering, and GPT summarization, we post an hourly "What's on TV?" update to social media. The non-caption news text (e.g. BIDEN VISITS MEXICO) that shows up at the bottom of the screen (like those overhead monitors in airports showing news) is gold, written in real time by editors during live broadcasts. However, that data is not carried anywhere inside the video streams (it exists only visually). What's a girl with robots to do? Using CNN, MSNBC, Fox News and BBC News feeds, we use ffmpeg to crop the relevant image area, tesseract to OCR the image into text, and GPT to summarize, remove ads, and clean up the text. We then post hourly summaries to Mastodon.
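The pipeline is roughly the following sketch, with hypothetical crop geometry and file names; the GPT summarization step is omitted:

```python
import subprocess

FEED = "cnn_2024-10-16_17.mp4"   # hypothetical recorded feed
FRAME = "lower_third.png"

# 1. Grab one frame and crop the lower-third banner area with ffmpeg
#    (crop=width:height:x:y -- the geometry here is a made-up example).
subprocess.run([
    "ffmpeg", "-y", "-ss", "00:12:00", "-i", FEED,
    "-frames:v", "1", "-vf", "crop=1280:120:0:560", FRAME,
], check=True)

# 2. OCR the cropped banner into text with tesseract.
chyron = subprocess.run(
    ["tesseract", FRAME, "stdout"],
    capture_output=True, text=True, check=True,
).stdout.strip()

print(chyron)  # e.g. "BIDEN VISITS MEXICO" -> later grouped, de-duplicated, summarized
```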
Walker Griggs
Mux
PSSH, or the Primordial Soup of Secure Headers
Consider our friendly neighborhood PSSH box. The semantics are simple -- to identify encryption keys -- but, as with any permissive specification, there's a lot more going on than meets the eye. In some cases, they contain deeply nested little-endian UTF-16 XML. In others, we'll find protocol buffers containing base64-encoded JSON. In all cases, they have a surprising amount of personality. In this talk, we will dive deep into several PSSH boxes, dissecting them bit by bit across various popular DRM schemes. Along the way, we will: 1. Explore the history of the PSSH box and how it mirrors the evolution of DRM standards. 2. Discover how each provider has imparted their own company idioms onto the loosely-defined PSSH payload. 3. Identify where the decisions of one provider impacted the rest.
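To make that concrete, here is a minimal sketch of walking a 'pssh' box by hand, following the standard ISO-BMFF layout (error handling omitted):

```python
import struct

def parse_pssh(buf):
    """Parse one ISO-BMFF 'pssh' box: size, type, version/flags, SystemID, KIDs, data."""
    size, box_type = struct.unpack(">I4s", buf[:8])
    assert box_type == b"pssh"
    version = buf[8]
    offset = 12                                   # skip version (1 byte) + flags (3 bytes)
    system_id = buf[offset:offset + 16].hex()     # identifies the DRM system
    offset += 16
    key_ids = []
    if version > 0:                               # v1 boxes carry key IDs in the clear
        (kid_count,) = struct.unpack(">I", buf[offset:offset + 4])
        offset += 4
        for _ in range(kid_count):
            key_ids.append(buf[offset:offset + 16].hex())
            offset += 16
    (data_size,) = struct.unpack(">I", buf[offset:offset + 4])
    offset += 4
    data = buf[offset:offset + data_size]         # the system-specific payload described above
    return {"version": version, "system_id": system_id, "key_ids": key_ids, "data": data}
```

Everything interesting in the talk lives inside that final `data` blob, which each DRM provider fills in its own idiomatic way.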
Will Law
Akamai
Creative Monkeys Contemplate Dating
The geeky primates at WAVE are releasing version 2 of the popular CMCD standard. While CMCD v1 was restricted to a CDN (data) relationship, v2 gives you three different modes for concurrently sharing data. Now you can date a content steering service, and an analytics service, at the same time as maintaining a committed relationship with your CDN :) This talk highlights the new features and capabilities of CMCD v2. In addition to the reporting mode enhancements, we'll investigate the host of new keys on offer: media start delay, target buffer length, buffer starvation duration, prefetching multiple objects at once, player state, response code, TTFB, timestamps, request URLs and many more. We'll explore how v2 can be used to drive lightweight data for content steering decisioning, rich collection for analytics providers that is decoupled from the delivery, and even improved prefetching performance and visibility for the CDN. We'll show it all working and release some code so that you too can experiment. Join us!
Yingyu Yao
YouTube
Your TV Is Eating Your Frames
At YouTube, we aspire to stream cat videos to everything that has a screen, including the largest of them all: TVs in your living room. Despite being devices engineered to be video playback powerhouses, it is unexpectedly difficult to make videos play consistently and smoothly on them. Through the lens of a player engineer, I will take you on a shallow dive through the TV media stack, and we will explore the different ways playback can get tripped up on those large screens.
Yuriy Reznik
Brightcove, Inc.
Streaming in the 1970s. NVP & ST: the very first real-time streaming protocols.
In this talk we will go back in history and look at the very first protocols and systems developed for internet streaming: the venerable NVP (Network Voice Protocol) and ST (Internet Stream Protocol, aka IPv5), developed by Danny Cohen, Jim Forgie, and other brilliant engineers at MIT Lincoln Labs in the 1970s. We will discuss the key ideas introduced by these protocols (the concepts of sessions, available capacity assessment, rate negotiation between sender and receiver, data transfer protocols, the need for network-layer support for sessions, resource provisioning, etc.) and show how most of these ideas became incorporated in subsequent designs. Specifically, we will show how many ideas introduced in NVP and ST eventually found their way into modern protocols such as WebRTC, QUIC and MoQ. The talk will include many historical pictures and some videos of those early pioneering systems built in the 1970s. It will also try to explain what motivated these original developers to come up with all these techniques.
Venue & location
1300 Van Ness Ave.
San Francisco, CA 94109
The Regency Ballroom is a beautiful, centrally-located San Francisco event venue.
According to their website, the building is noted as a fine example of Scottish Rite architecture. Its ballroom is a Beaux-Arts treasure, with thirty-five-foot ceilings and twenty-two turn-of-the-century teardrop chandeliers.
According to one intrepid online reviewer, “Took my son to a death metal concert here and it was awesome!” …so, you know it's gotta be good.
The Schedule
9:40 AM PDT
Matt McClure
Demuxed
Opening Remarks
Tanushree Nori
Vimeo
Budgeting Bytes: Acing Cost-Efficient Video Storage
Alex Field
NBCU/Sky
The Colorful Truth of Automated Tests
10:40 AM PDT
Break
11:15 AM PDT
Walker Griggs
Mux
PSSH, or the Primordial Soup of Secure Headers
Eric Tang
Livepeer
Progress and Opportunities in Video Content Authenticity
Jason Cloud
Dolby Laboratories
Jeff Riedmiller
Dolby Laboratories
Does a multi-CDN setup (truly) require switching? Deploying an anti-switching multi-CDN delivery platform.
Thomas Edwards
Amazon Web Services
Video Processing on Quantum Computers
Quantum computing (QC) utilizes quantum mechanics to solve complex problems faster than on "classical" computers. QCs available today are considered "Noisy Intermediate-Scale Quantum" (NISQ) computers with a small number of quantum bits (qubits) and limited performance due to short coherence time and noisy gates. QCs are improving all the time, so it is possible that in the future they could provide practical acceleration to video processing workflows (remember how neural networks were in the 1990's?). This presentation will give a short overview of QC basics, results of representing (simple) images on an actual cloud-accessible QC, and will describe some research on potential video processing applications of QCs. [Note: I've timed that this can be presented in 20 minutes]
12:30 PM PDT
1:45 PM PDT
Katerina Dobnerova
CDN77
Enhancing CDN Performance and Cutting Egress Costs in Large Video Libraries Delivery: Advanced Caching Strategies and Edge Computing Optimization
During the 20 minutes of my presentation, users worldwide will generate content equivalent to the volume created from the dawn of civilisation until 2003. The volume of content being created today is staggering. Consider this: from the beginning of recorded history until 2003, we produced roughly 5 exabytes of content. However, projections suggest a monumental leap to 147 zettabytes in 2024 alone, with video content leading the charge. With such exponential growth in content and its shortening life span, content delivery networks (CDNs) face significant challenges in effectively caching large video libraries. While cache hit rates of 98% and higher are taken for granted, the figures above suggest that simply inflating disk space is not remotely enough to keep the cache-hit ratio at the desired figures. This presentation explores several approaches, including tiered cache systems, which use a hierarchy of caching servers employing consistent hashing and other techniques to maximize scalability and performance while minimizing failover and downtime. It also covers one-hit-wonder elimination, utilizing simple counters to reduce cache pollution by avoiding storing unpopular content. It further addresses cache-state sharing, which employs Bloom-filter-based technology to improve cache scalability and effective disk space utilization. Moreover, it will examine the deployment of edge computing to amplify caching efficiency in specific use cases.
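As a small illustration of one of these ideas (not CDN77's code), the sketch below admits an object into cache only on its second request, using a Bloom filter to remember first-time URLs cheaply; the filter size and hashing scheme are arbitrary choices.

```python
# Illustrative one-hit-wonder elimination: cache an object only once it has been
# requested at least twice, tracking "seen once" URLs in a Bloom filter.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 20, hashes=4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive several bit positions per key by salting a fast hash.
        for i in range(self.hashes):
            h = hashlib.blake2b(key.encode(), digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(h, "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

seen_once = BloomFilter()

def should_cache(url):
    """Admit only URLs requested at least twice; first hits go straight to origin."""
    if url in seen_once:
        return True           # second (or later) request: worth keeping on disk
    seen_once.add(url)
    return False              # first request: serve it, but don't pollute the cache
```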
Yingyu Yao
YouTube
Your TV Is Eating Your Frames
At YouTube, we aspire to stream cat videos to everything that has a screen, including the largest of them all: TVs in your living room. Despite these devices being engineered to be video playback powerhouses, it is unexpectedly difficult to make videos play consistently and smoothly on them. Through the lens of a player engineer, I will take you on a shallow dive through the TV media stack, and we will explore the different ways playback can get tripped up on those large screens.
Alex Giladi
Comcast
Ads and overlays
The concept of server-guided ad insertion (SGAI), first introduced by Hulu in 2019, is getting increasingly popular in the industry. It is markedly more scalable than the traditional server-side (SSAI) approach, but nearly as resilient. It is more interoperable and more resilient than the client-side (CSAI) approach but is nearly as efficient and versatile. Client-side graphic overlays are to a degree a reincarnation of the banner ads plaguing the web since the '90's. Their main use is not necessarily ad-related -- they are used in a variety of roles from station identification to localization to emergency notification. Their traditional implementation in the video was inserting them in baseband (i.e., pre-transcoder) in a playout system, which is the least scalable and the highest-latency approach possible in the video world. The streaming ecosystem has standardized and maturing support for SGAI. Interstitials are used to implement the approach in HLS. XLink was used in the original MPEG DASH implementation of the approach; however, XLink suffers from a number of design flaws and was never widely implemented in context of live channels and events. Media Presentation Insertion, a recent addition to MPEG DASH, revisits this concept and allows spawning a new media presentation while pausing the main channel. As opposed to HLS interstitials, media presentation insertion allows asynchronous termination ("return to network"), supports VAST tracking, and more. The same server-guided model can be applied to the overlay use case and has a potential to improve scalability, targeting, and glass-to-glass latency in a dramatic way. This talk will first describe the new MPEG-DASH media presentation description approach and its application to SGAI and blackouts. It will then cover the application of the same principles to the graphic overlays in MPEG-DASH. This presentation will conclude with a description and a demo of an open-source implementation of both technologies.
2:50 PM PDT
Break
3:10 PM PDT
John Bowers
Twitch/Amazon IVS
Free ABR Renditions for User Generated Content Platforms
Well, not exactly free - but much, MUCH lower cost than server-side transcoding! Providing an ABR ladder is table stakes for live viewer experiences, but it’s expensive for at-scale video platforms to provision and maintain specialized infrastructure to handle peak transcoding demand. A recently developed update to the Enhanced RTMP specification adds multitrack video, multitrack audio, and advanced codec support. With implementations in OBS Studio and Wireshark, the technology is ready for you to adopt. Now you can offer ABR playback to all creators, regardless of audience size. Come and learn why encoding multiple video tracks on the content creator’s machine at the edge delivers higher quality, lower latency, and better scalability than server-side transcoding – all while allowing faster innovation and deployment of newer codecs like HEVC and AV1.
John Bartos
Twitch/IVS
James Hurley
Twitch/IVS
Real-Time Video Super Resolution for Live Streaming with WebGPU
In the era of high-definition content, maintaining video quality while minimizing bandwidth consumption is a critical challenge for live streaming platforms. On-device video super resolution (VSR) offers a promising solution to achieve higher resolutions at dramatically reduced bitrates. Traditional upscaling methods, such as bicubic or Lanczos interpolation, often fall short in terms of visual quality. While state-of-the-art machine learning (ML) models can produce incredibly accurate upscaled videos, they typically struggle with real-time performance requirements. Here, we present a novel ML-based approach for high-quality, on-device VSR, capable of running in real-time on modern hardware. We introduce the architectural details of our WebGPU-powered VSR pipeline, including efficient video frame transfer to the GPU via WebCodecs, optimized shader code, and seamless integration with web-based video players. Additionally, we will showcase the results from live user testing at scale, demonstrating the impact of our solution on user experience and bandwidth savings. By attending this talk, developers and researchers will get a taste of the latest advancements in on-device VSR and the practical implementation considerations for deploying such solutions in live streaming scenarios. We will explore the challenges and trade-offs involved in designing real-time VSR systems, and discuss future directions for further improvements in quality and performance.
Bruce Spang
Netflix
Wei Wei
Netflix Inc
Innovate Live Streaming with a Client Simulator
One of the major challenges in live streaming is the scarcity of real-world events to test innovative ideas and algorithms, such as adaptive bitrate (ABR) algorithms. Relying on actual live events for testing not only prolongs the innovation cycle but also increases the risk of negatively impacting user experience. To overcome this obstacle, we at Netflix have enhanced our existing client simulator to emulate live streaming scenarios. This simulator uses network traces and device characteristics gathered from real-world sessions to drive our production client library, and it plays a crucial role in driving innovation at Netflix. In this talk, we will first present how the client simulator simulates live streaming. Then we will demonstrate how it can be used to test new live encoding methods, like variable bitrate (VBR) encoding, and to evaluate various ABR algorithms at scale. We will conclude the talk with future directions.
4:00 PM PDT
Break
4:40 PM PDT
Gwendal Simon
Synamedia
Token Renewal: Keeping your Streaming Party Smooth and Secured
CDN leeching is a growing concern for content providers. The recent specification of the Common Access Token (CAT) has introduced a vehicle for designing more secure streaming delivery systems. Best practices for CDN content protection often involve renewing the token, either due to short expiration times or probabilistic rejections. However, token renewal is far from trivial. In token-based delivery systems, we identify three key entities: the client, the CDN server, and the token generator. Typically, these communicate via HTTP(S). At any point during a streaming session, the CDN server may request that the client renew its token, ensuring seamless video playback, even for low-latency ABR streaming. The CAT specification includes two claims related to renewal: catr and catif. While the specification details several operation modes, none fully satisfy the combined requirements for fast renewal, legacy clients, and the unique characteristics of DASH and HLS. In this talk, we will unpack the current situation, presenting the pros and cons of each proposed solution. We aim to open the door to a better solution and outline the community effort needed for its implementation.
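As a loose illustration of the renewal problem (the header names, renewal trigger, and token-generator API below are invented placeholders, not what the CAT specification defines), a client might renew proactively off the critical path and fall back to renew-and-retry when the CDN rejects a token:

```python
# Hypothetical client-side token renewal during playback. The transport details are
# placeholders; the point is "refresh before expiry, off the segment critical path".
import time
import threading
import requests

class TokenSession:
    def __init__(self, token_endpoint, token, lifetime_s):
        self.token_endpoint = token_endpoint       # hypothetical token-generator URL
        self.token = token
        self.expires_at = time.time() + lifetime_s
        self._lock = threading.Lock()

    def _renew(self):
        # Ask the token generator for a fresh token before the current one lapses.
        resp = requests.post(self.token_endpoint, json={"current": self.token}, timeout=5)
        resp.raise_for_status()
        body = resp.json()
        with self._lock:
            self.token = body["token"]
            self.expires_at = time.time() + body["lifetime_s"]

    def get_segment(self, url):
        # Renew proactively when close to expiry so the next segment request
        # never blocks on the token generator.
        if self.expires_at - time.time() < 10:
            threading.Thread(target=self._renew, daemon=True).start()
        headers = {"Authorization": f"Bearer {self.token}"}
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code in (401, 403):          # CDN rejected the token: renew and retry once
            self._renew()
            headers = {"Authorization": f"Bearer {self.token}"}
            resp = requests.get(url, headers=headers, timeout=10)
        return resp
```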
Will Law
Akamai
Creative Monkeys Contemplate Dating
The geeky primates at WAVE are releasing version 2 of the popular CMCD standard. While CMCD v1 was restricted to a CDN (data) relationship, v2 gives you three different modes for concurrently sharing data. Now you can date a content steering service, and an analytics service, at the same time as maintaining a committed relationship with your CDN :) This talk highlights the new features and capabilities of CMCD v2. In addition to the reporting mode enhancements, we'll investigate the host of new keys on offer: media start delay, target buffer length, buffer starvation duration, prefetching multiple objects at once, player state, response code, TTFB, timestamps, request URLs and many more. We'll explore how v2 can be used to drive lightweight data for content steering decisioning, rich collection for analytics providers that is decoupled from delivery, and even improved prefetching performance and visibility for the CDN. We'll show it all working and release some code so that you too can experiment. Join us!
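For readers new to CMCD, here is a minimal sketch of the v1 query-argument serialization that v2 builds on (the keys shown are v1 keys; the new v2 keys and reporting modes discussed in the talk layer on top of this idea):

```python
# CMCD-style reporting, query-argument mode: comma-separated key=value pairs,
# URL-encoded into a single "CMCD" parameter on the media request.
from urllib.parse import urlencode

def with_cmcd(url, data):
    parts = []
    for key in sorted(data):                  # keys are conventionally sent in alphabetical order
        value = data[key]
        if value is True:                     # boolean-true keys are sent as the bare key name
            parts.append(key)
        elif isinstance(value, str):
            parts.append(f'{key}="{value}"')  # string values are quoted
        else:
            parts.append(f"{key}={value}")    # numeric values are sent as-is
    payload = ",".join(parts)
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}{urlencode({'CMCD': payload})}"

# Example: a segment request carrying buffer length (ms), measured throughput (kbps),
# a session id, and a startup flag.
print(with_cmcd(
    "https://cdn.example.com/video/seg42.m4s",
    {"bl": 21300, "mtp": 48100, "sid": "6e2fb550-c457-11e9-bb97", "su": True},
))
```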
RongKai Guo
NVIDIA
Zoe Liu
Visionular
AI Enhanced GPU Video Coding: Achieving Joint High Compression Efficiency and Throughput
We are here to present a novel approach to significantly boost video compression efficiency on NVIDIA NVENC hardware encoders by leveraging AI-driven pre-analysis and pre-processing algorithms. We refer to this method as AI Enhanced GPU Video Coding; it combines NVENC's high density, low latency, and high throughput with ML-based techniques that enhance compression efficiency and visual quality, while maintaining that throughput. NVENC, as a leading hardware-based encoder, excels in providing high throughput and low latency but generally offers lower compression efficiency compared to CPU-based software encoders. Our AI-driven GPU video compression approach aims to leverage the advantages of both NVENC and AI algorithms to achieve high compression efficiency and throughput performance. Our optimization algorithms mainly include: 1. ML-based Scene & Region Classification: Identifying effective coding tools based on scene and region classification. 2. Regions of Interest (ROI) Identification: Focusing on perceptually significant regions, such as faces and jersey numbers in typical sports videos. 3. Pre-processing Techniques: Applying deblurring, denoising, sharpening, contrast adjustment, etc., to boost visual quality. 4. Hierarchical pre-analysis and pre-classification: Setting fine-grained QPs, including block-based QPs, and enabling quick quality monitoring. These techniques combined improve video compression efficiency, boosting both objective and subjective quality while achieving significant bitrate savings. We have applied these methods to large UGC content platforms. Our results demonstrate promising improvements in compression efficiency for both VOD and live use cases. Using the NVIDIA T4 Tensor Core GPU, we maintained the same high throughput for multiple parallel encoding threads and achieved a 15-20% bitrate saving and a 1-2 VMAF score improvement at the same time, on typical UGC & PUGC content compared to the out-of-the-box NVENC approach. Further enhancements, such as re-encoding, are currently being developed and further compression gains are expected.
Yuriy Reznik
Brightcove, Inc.
Streaming in the 1970s. NVP & ST: the very first real-time streaming protocols.
In this talk we will go back in history and look at the very first protocols and systems developed for internet streaming: the venerable NVP (Network Voice Protocol) and ST (Internet Stream Protocol, a.k.a. IPv5), developed by Danny Cohen, Jim Forgie, and other brilliant engineers at MIT Lincoln Labs in the 1970s. We will discuss the key ideas introduced by these protocols (the concepts of sessions, available capacity assessment, rate negotiation between sender and receiver, data transfer protocols, the need for network-layer support for sessions, resource provisioning, etc.) and show how most of these ideas became incorporated in subsequent designs. Specifically, we will show how many ideas introduced in NVP and ST eventually found their implementations in modern protocols, such as WebRTC, QUIC and MoQ. The talk will include many historical pictures and some videos of those early pioneering systems built in the 1970s. It will also try to explain what motivated these original developers to come up with all these techniques.
Matt McClure
Demuxed
Closing Remarks
9:30 AM PDT
Matt McClure
Demuxed
Opening Remarks
Anand Vadera
Meta
Optimizing Storage for Meta's Trillion-Video Catalog: Achieving Pareto Efficiency
Meta manages an extensive video catalog with over a trillion videos across various products, growing by more than one billion new videos each day. The challenge lies in maintaining an efficient storage footprint while accommodating this continuous influx of new content. The goal is to achieve Pareto efficiency, optimizing the storage space without compromising the quality of the videos delivered. This balance is crucial for sustaining scalability and efficiency in Meta's Video Infrastructure. This talk will delve into an innovative method for addressing the problem at hand. It will discuss the fundamental concepts underpinning this approach and share valuable insights gained during its development and implementation. In particular, it will highlight effective strategies that have proven successful in enhancing storage efficiency without negatively impacting video quality. Furthermore, the presentation will touch upon the ongoing evolution of the system, showcasing how it is continually being improved to better tackle the challenge of managing an ever-growing video catalog while maintaining optimal storage usage. By sharing these learnings with others facing similar challenges, the hope is to contribute to the collective knowledge base and ultimately facilitate the development of more efficient and effective systems for managing large-scale video repositories.
Jill Boyce
Nokia
Bringing more versatility to VVC with VSEI
Versatile Supplemental Enhancement Information (VSEI) is a companion standard to Versatile Video Coding (VVC). VSEI defines SEI messages that contain metadata inserted into a bitstream synchronized with the coded video, to convey extra information intended to be utilized by the receiver/decoder. SEI messages are optional and are targeted at specific use cases. SEI messages specified in VSEI may also be used with other video coding standards, including H.264/AVC, HEVC, or future standards. Since the initial standardization of VVC and VSEI in 2020, second and third editions of VSEI have been standardized, with a fourth edition under development. The new SEI messages included in new versions of VSEI bring even more versatility to VVC, by addressing a broader variety of applications. This talk will describe several of the new SEI messages and the use cases they enable.
Matteo Naccari
Visionular
Compression of stereoscopic video with MV-HEVC: fundamentals, tools and development
Multiview (e.g. stereoscopic) content provides users with a fully immersive and compelling quality of experience when watching videos. This type of content is gaining new momentum thanks to the development and commercialisation of Virtual Reality (VR) headsets such as Apple Vision Pro and Oculus Quest. The delivery of multiview video poses new challenges to the video coding community, since each frame is composed of multiple views (two in the stereoscopic case). Standardisation bodies such as ISO/IEC MPEG and ITU-T VCEG envisaged the compression of multiview content with the H.265/HEVC standard, extended to efficiently tackle the intrinsic data redundancies present across different views. Thanks to the availability of VR headsets, content providers and codec vendors are now deploying solutions supporting the Multiview extension of H.265/HEVC (collectively known as MV-HEVC). This talk will introduce the MV-HEVC standard from the encoder designer's perspective, starting with an overview of the standard's design and the tools it supports. The focus will then move on to the challenges faced when implementing practical encoding solutions, such as fast mode decision and rate control.
10:40 AM PDT
Break
11:15 AM PDT
Tracey Jaquith
Internet Archive
What's on TV? 4 editors and 2 robots walk into a bar...
Using TV news "chyron" text overlays in the "lower third" (written by human editors), image-to-text (OCR), grouping/filtering, and GPT summarization, we post an hourly "What's on TV?" update to social media. The non-caption news text (e.g., BIDEN VISITS MEXICO) that shows up at the bottom of the screen (like those overhead monitors in airports showing news) is gold, written in real time by editors during live broadcasts. However, the data is not carried anywhere inside the video streams (just visually). What's a girl with robots to do? Using CNN, MSNBC, Fox News and BBC News feeds, we use ffmpeg to crop the relevant image area, tesseract to OCR the image into text, and GPT to summarize, remove ads, and clean up the text. We then post hourly summaries to Mastodon.
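A condensed sketch of that pipeline, with library choices (pytesseract for OCR, stubbed summarize/post steps) as assumptions rather than the Internet Archive's actual code:

```python
# Sketch: sample frames from a news feed, crop the lower third, OCR the chyron text,
# and hand the lines to a (stubbed) summarize-and-post step.
import os
import glob
import subprocess
import pytesseract
from PIL import Image

def grab_lower_thirds(stream_url, out_dir="frames", seconds=60):
    # Sample one frame per second and crop roughly the lower third of a 1280x720 feed
    # (crop=width:height:x:y); adjust for the actual feed resolution.
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run([
        "ffmpeg", "-y", "-t", str(seconds), "-i", stream_url,
        "-vf", "fps=1,crop=1280:160:0:540",
        f"{out_dir}/frame_%04d.png",
    ], check=True)

def ocr_frames(out_dir="frames"):
    lines = []
    for path in sorted(glob.glob(f"{out_dir}/frame_*.png")):
        text = pytesseract.image_to_string(Image.open(path)).strip()
        if text:
            lines.append(text)
    return lines

def summarize_and_post(lines):
    # Placeholder for the GPT clean-up/summarization and the Mastodon post;
    # both depend on external APIs and credentials not shown here.
    summary = " / ".join(dict.fromkeys(lines))[:480]   # dedupe, keep it post-sized
    print(summary)
```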
Li-Heng Chen
Netflix
Ryan Lei
Meta
A hitchhiker's guide to AV1 deployment
Six years since its inception as a video coding standard developed by the Alliance for Open Media, AV1 has proven its capability as a Swiss Army knife, with application domains spanning the streaming of movies and TV shows, user-generated content and real-time video conferencing, including screen content, among others. This talk will feature a roadshow of AV1 deployments that have impacted billions of people's lives, presented by engineers with first-hand experience implementing it in production systems. Presenters will share tips, tricks, know-how, and lessons learned from their deployment experience to bring the best performance out of AV1. Example topics include, but are not limited to: productization of AV1’s film grain synthesis feature and use of AV1 to deliver high dynamic range video at Netflix, AV1 deployment in Instagram Reels, and AV1 support for RTC services at Meta.
Luke Curley
Discord
Replacing WebRTC with Media over QUIC
It's been over a decade since WebRTC was released. Surely there's something new on the horizon, right? Media over QUIC is an IETF working group that is working on a new live media standard to replace the likes of WebRTC, RTMP/SRT, and HLS/DASH. Wow that's overly ambitious, but it's being backed by your favorite big tech companies (and some non-favorites) in the same standards body that has produced hits such as... WebRTC. But replacing WebRTC is difficult. It exists because there were no web standards in 2011 that could accomplish conferencing; remember this was before even HTML5. But there are new Web standards now! This talk will go over WebTransport and WebCodecs, and how they are utilized to provide a user experience nearly on par with WebRTC while being dramatically more flexible. No more magic black box, no more ICE/STUN/TURN/SDP/DTLS/SCTP/RTP/SRTP/mDNS, no more getting ghosted by Google. Just you with a QUIC connection and the ability to encode/decode video frames. And of course we'll go over the promise of Media over QUIC and why you should use the standard instead of your own bespoke protocol. I'll give you a hint, it starts with C and ends with "DN Support".
12:20 PM PDT
1:35 PM PDT
Lightning Talks
Jan De Cock
Synamedia
Measuring live video quality with minimal complexity, now available for everyone!
We all love video, and we love it even more when the quality of the video is great! To measure that quality, we already have quite a few options, and the folks at Netflix did a great job at giving us VMAF. This is all fine and dandy for our VOD colleagues, but what about us, *live* video engineers? We struggle to optimize every cycle in our live encoders, and spending a full CPU core on metric calculation is just not acceptable -- and not good for business. We spent quite some time figuring out how to simplify this problem. Our marketing people said: "Why don't you use AI?" So we did, and imagine that, in this case it actually worked. We'll forget about all those other projects that got stuck in the trough of AI disillusionment. Turns out that metrics such as SSIM and VMAF can be quite accurately predicted, and by using smart features inside the encoder, this can be done with marginal additional computational complexity. In the talk, we’ll explain how we found a balance between accuracy and complexity of the features and ML networks used. All fine for *your* encoder, you say, but how does that help me? Well, we took on the challenge to show that this approach also works for open-source encoders, with x264 as our first target. And, we’re sharing the code, so you can try it out too! And while we’re eagerly awaiting the 10th Demuxed over the coming months, we’ll also be trying this approach on SVT-AV1. Too early to tell if this attempt will be successful, but we’ll be able to tell you in October, and take you through the process during the talk!
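To illustrate the general idea (this is a toy example, not the presenters' model), one can fit a cheap regressor offline that maps encoder-internal features to a measured metric, then evaluate it per frame at negligible cost; the feature names and numbers below are made up:

```python
# Toy illustration: predict a full-reference metric from cheap encoder-internal
# features so no extra decode or pixel-domain comparison is needed at runtime.
import numpy as np
from sklearn.linear_model import Ridge

# Offline: per-frame features the encoder already computes (avg QP, intra-block ratio,
# mean motion-vector magnitude, bits per pixel) paired with measured VMAF.
X_train = np.array([
    [24.0, 0.10, 3.2, 0.081],
    [31.0, 0.22, 6.5, 0.043],
    [38.0, 0.35, 9.1, 0.021],
    [27.0, 0.15, 4.0, 0.062],
])
y_train = np.array([96.0, 88.5, 74.0, 92.3])        # measured VMAF for those frames

model = Ridge(alpha=1.0).fit(X_train, y_train)

# Online: the live encoder already has these numbers per frame, so the "metric"
# costs one dot product instead of a full VMAF computation.
live_frame_features = np.array([[29.0, 0.18, 5.1, 0.055]])
print(f"predicted VMAF ~ {model.predict(live_frame_features)[0]:.1f}")
```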
2:45 PM PDT
Break
3:05 PM PDT
Tony McNamara
Paramount Streaming
Pseudo-Interstitials: Playback flexibility for legacy devices.
Interstitials allow the insertion of content by reference into a playback stream, and are especially useful when a playlist won't work. But interstitials are also still relatively new; just a year ago Apple devices didn't support playback of them, despite Apple having accepted them into the HLS Specification years earlier. DASH XLinks suffer the general inconsistency so consistent in DASH. And of course legacy devices tend to be stuck on much earlier protocol versions. We've come up with "Pseudo-Interstitials", which provide much of the same flexibility, to allow very-late decisioning and binding of content, especially ads, into playback on legacy devices. The talk will include a very brief introduction to interstitials and their value, the problem statement, and then a deep dive into the multi-disciplinary solution, including encoding concerns, manifest manipulation, edge computing, and, briefly, SSAI constraints.
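One common legacy-friendly splice, shown here only as an illustration and not necessarily Paramount's exact approach, is to stitch ad segments directly into the media playlist between EXT-X-DISCONTINUITY tags so that players predating interstitials can still play the break:

```python
# Sketch: splice an ad break into an HLS media playlist after the Nth content segment.
# content_lines and ad_lines are lists of playlist lines (EXTINF tags + segment URIs).
def splice_ad_break(content_lines, ad_lines, after_segment_index):
    out, segments_seen = [], 0
    for line in content_lines:
        out.append(line)
        if line.strip() and not line.startswith("#"):     # a segment URI line
            segments_seen += 1
            if segments_seen == after_segment_index:
                out.append("#EXT-X-DISCONTINUITY")         # timeline break into the ad
                out.extend(ad_lines)
                out.append("#EXT-X-DISCONTINUITY")         # and back into content
    return out
```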
Jon Dahl
Mux
A taxonomy of video "quality," or: was Strobe right or wrong about quality?
“Quality” is one of the most abused and overloaded terms in video. Orwell says that unclear language leads to unclear thinking, and: wow, our industry suffers from unclear thinking around quality. We conflate codecs like AV1 with "high quality"; we don’t know the difference between QoS and QoE; we’re 👍 on VMAF but we don’t really know how to use it. Meanwhile, Strobe gets on stage at Demuxed 2018 and says “Video quality doesn’t matter” (as the audience gasps in horror). In this talk, we’ll bring clarity and precision to the domain of “video quality.” We’ll learn the difference between QoE, QoS, perceptual quality, fidelity, efficiency, and more. We will review a schema that once and for all will eliminate all confusion, doubt, and ignorance from this area, driving our industry forward into a more enlightened future. And most importantly, we’ll learn whether Strobe was right and wrong when he said quality didn’t matter.
Vanessa Pyne
Daily
Derek Buitenhuis
Vimeo
Be the change you want to see: How to contribute to FFmpeg
Have you ever written code you wanted to contribute to FFmpeg, but you got a little tripped up in the send-email situation, or maybe you got some feedback you weren't sure how to handle and your patch never made it across the finish line? Maybe you went to GitHub to make a PR, saw PULL REQUESTS ARE IGNORED, followed the link to the contribution documentation, saw a 28-point checklist and backed away slowly from your computer. Don't give up the dream! This talk will review the entire FFmpeg contribution process from soup to nuts and demystify the scary parts. It will focus on procedure and potential sharp edges thereof, rather than the actual code contribution itself. If that sounds very dry, rest assured, the only thing dry about it will be the wit. This information may be elementary to some folks, but to paraphrase a recent FFmpeg-devel mailing list sentiment: "More diversity would be good." Making the process more accessible is key to making the circle bigger and encouraging a more diverse group of people to participate in the FFmpeg-devel ecosystem. If we want some new kids on the block, there should be a step-by-step guide, and this talk aims to be just that. A brief outline of the talk is as follows: 1. How to lurk (mailing list & IRC) 2. Find a thing to fix, improve, create 3. How to run regression tests (FATE, etc) 4. How to git patch (aka how to send an email) 5. How to address feedback 6. It merged! Now what?
3:55 PM PDT
Break
4:35 PM PDT
Fabio Sonnati
NTT Data
The Long(est) Night
On April 28, 2019, a phone call wakes me in the middle of the night: "The Long Night", a new episode of the final season of Game of Thrones, is airing, but nothing is visible! The artistic intent is clearly extreme, and the encoding can't handle it, resulting in a flurry of confused silhouettes in the darkness, struggling in a dark sea of banding. In this presentation, I will talk about how we resolved an extreme situation for a high-quality streaming service by manually customizing encoding to mitigate the problem, inspired by well-known principles in the world of audio processing and 3D game rendering.
Constanza Dibueno
Qualabs
How to play DASH on your HLS Player
What if I told you that you could play a DASH video seamlessly in an HLS player? In today's broadcasting landscape, interoperability is a challenge. Reaching a broader audience means creating multiple copies of each stream file in different formats, which doubles the costs of packaging and storage. This inefficiency is a significant pain for broadcasters. CMAF was designed to revolutionize HTTP-based streaming media delivery. It streamlines media delivery by using a single, unified transport container compatible with both HLS and DASH protocols. At the latest MonteVideo Tech Summer Camp, we embarked on an exciting project: creating a library based on the CMAF standard and the Hypothetical Application Model. This innovative library provides a practical solution for converting playlists and manifests between HLS and DASH. We brought this vision to life by building a proof of concept. We want to present an intuitive open-source UI built on top of the library. In this presentation, we will showcase how the UI can help you understand the library's powerful capabilities, with the potential to create tools that simplify the broadcasting experience without having to go deep into CMAF's specification complexities. For example, users can take a DASH manifest as input, convert it to an HLS playlist on the fly, and play the content back in an HLS player. With this capability, broadcasters could adapt to different streaming requirements, delivering content across various platforms and devices, thereby enhancing adaptability and flexibility. By the end of this presentation, we aim to show approaches that could enhance interoperability in your broadcasting operations, using the CMAF HAM UI as a tool.
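As a deliberately tiny illustration of the conversion idea (not the CMAF HAM library itself), the sketch below reads a single representation with a fixed-duration $Number$-based SegmentTemplate out of an MPD and emits an HLS media playlist; a real converter handles many more MPD shapes:

```python
# Sketch: turn one $Number$-templated DASH representation into an HLS media playlist.
import math
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def mpd_to_media_playlist(mpd_xml, segment_count):
    root = ET.fromstring(mpd_xml)
    rep = root.find(".//mpd:Representation", NS)
    tmpl = root.find(".//mpd:SegmentTemplate", NS)
    duration = int(tmpl.get("duration")) / int(tmpl.get("timescale", "1"))
    init = tmpl.get("initialization").replace("$RepresentationID$", rep.get("id"))
    media = tmpl.get("media").replace("$RepresentationID$", rep.get("id"))
    start = int(tmpl.get("startNumber", "1"))

    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{math.ceil(duration)}",
        f'#EXT-X-MAP:URI="{init}"',                     # CMAF/fMP4 init segment
    ]
    for n in range(start, start + segment_count):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(media.replace("$Number$", str(n)))
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)
```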
Steve Robertson
YouTube
Why is gapless so hard?
A deep dive into audio gaplessness, for video engineers. Covering the difference between stitching, pseudo-gapless, and true gapless approaches, why gapless is important to the art, the mechanical reasons why the audio clock always wins, how the system reconciles this instability, and why this leads to dropped frames and A/V sync issues.
5:15 PM PDT
Surprise
5:35 PM PDT
Matt McClure
Demuxed
Closing Remarks
6:05 PM PDT
Afterparty
After 2 long days of talks, we'll head all the way to… the venue's social hall! Put your feet up, grab some refreshments, and hang out with your fellow video engineers.
Tanushree Nori
Vimeo
Budgeting Bytes: Acing Cost-Efficient Video Storage
In today's world, where data never stops growing, Vimeo is at the forefront, cleverly slashing storage costs while keeping videos readily accessible. In my talk, I’ll peel back the curtain on how we fine-tune cloud storage using Machine Learning, balancing cost savings with cheap and quick video access at Vimeo. We’ve cut our storage bills by an impressive 60% by applying smart lifecycle policies and a dash of machine learning methods. I'll share insights on how we determine the best times to tuck away older videos into cheaper storage tiers and what factors go into these decisions. This talk will offer practical strategies and a peek into the tools that help Vimeo manage a sprawling video library efficiently. Discover how these innovations can help reshape your approach to data storage too!
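A heavily simplified, hypothetical version of the tiering decision: demote a video to a colder storage class when the storage savings over a planning horizon outweigh the expected retrieval cost. The prices and the access predictor below are placeholders, not Vimeo's numbers:

```python
# Sketch of a storage-tiering decision. An ML model would supply predicted_monthly_reads
# (from age, view history, owner signals, etc.); here it is just an input.
from dataclasses import dataclass

HOT_PER_GB_MONTH = 0.020       # $/GB-month, illustrative only
COLD_PER_GB_MONTH = 0.004
RETRIEVAL_PER_GB = 0.010       # $/GB each time a cold object is read back

@dataclass
class VideoAsset:
    size_gb: float
    predicted_monthly_reads: float

def should_move_to_cold(video: VideoAsset, horizon_months: int = 12) -> bool:
    storage_savings = (HOT_PER_GB_MONTH - COLD_PER_GB_MONTH) * video.size_gb * horizon_months
    expected_retrieval = (RETRIEVAL_PER_GB * video.size_gb
                          * video.predicted_monthly_reads * horizon_months)
    return storage_savings > expected_retrieval

# An old upload that is almost never watched is worth demoting; a popular one is not.
print(should_move_to_cold(VideoAsset(size_gb=4.0, predicted_monthly_reads=0.05)))   # True
print(should_move_to_cold(VideoAsset(size_gb=4.0, predicted_monthly_reads=5.0)))    # False
```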
Eric Tang
Livepeer
Progress and Opportunities in Video Content Authenticity
In an era where AI-generated video is rapidly becoming the norm, the need for video content authenticity has never been more critical. Over the past year, we've witnessed significant strides in this area, with industry giants like OpenAI, Google, and the BBC joining the Coalition for Content Provenance and Authenticity (C2PA) and committing to integrate this technology into their core products. Join us in this enlightening session as we dive into C2PA's significant technical advancements over the past year, and map out a practical approach for implementing C2PA in any video application. Discover the intricacies of C2PA's trust model and understand how it safeguards users on video platforms. We'll also cover essential implementation considerations, from video player UX to backend video workflow management. As long-standing members of the Content Authenticity Initiative (CAI) and key contributors to C2PA, we bring a wealth of experience from participating in weekly working groups and shaping the last two versions of the C2PA specification. Our expertise is backed by numerous workshops and presentations at leading conferences and industry events like NAB and the C2PA symposium.
Walker Griggs
Mux
PSSH, or the Primordial Soup of Secure Headers
Consider our friendly neighborhood PSSH box. The semantics are simple -- to identify encryption keys -- but, as with any permissive specification, there's a lot more going on than meets the eye. In some cases, they contain deeply nested little-endian UTF-16 XML. In others, we'll find protocol buffers containing base64-encoded JSON. In all cases, they have a surprising amount of personality. In this talk, we will dive deep into several PSSH boxes, dissecting them bit by bit across various popular DRM schemes. Along the way, we will: 1. Explore the history of the PSSH box and how it mirrors the evolution of DRM standards. 2. Discover how each provider has imparted their own company idioms onto the loosely-defined PSSH payload. 3. Identify where the decisions of one provider impacted the rest.
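To make the structure concrete, here is a bare-bones parser for the fixed portion of a 'pssh' box as defined in ISO/IEC 23001-7 (Common Encryption); the system-specific payload, which is where the per-provider personality lives, is left opaque:

```python
# Parse the fixed fields of a 'pssh' box: size, type, version/flags, SystemID,
# optional key IDs (version 1), and the opaque DRM-system-specific data blob.
import struct
import uuid

def parse_pssh(buf: bytes):
    size, box_type = struct.unpack_from(">I4s", buf, 0)
    assert box_type == b"pssh", "not a pssh box"
    version = buf[8]
    flags = int.from_bytes(buf[9:12], "big")
    system_id = uuid.UUID(bytes=buf[12:28])       # identifies the DRM system (Widevine, PlayReady, ...)
    offset = 28
    kids = []
    if version >= 1:                              # v1 carries the key IDs in the clear
        (kid_count,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        for _ in range(kid_count):
            kids.append(uuid.UUID(bytes=buf[offset:offset + 16]))
            offset += 16
    (data_size,) = struct.unpack_from(">I", buf, offset)
    offset += 4
    data = buf[offset:offset + data_size]         # opaque, DRM-system-specific payload
    return {"version": version, "flags": flags, "system_id": system_id,
            "kids": kids, "data": data}
```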
John Bartos
Twitch/IVS
James Hurley
Twitch/IVS
Real-Time Video Super Resolution for Live Streaming with WebGPU
In the era of high-definition content, maintaining video quality while minimizing bandwidth consumption is a critical challenge for live streaming platforms. On-device video super resolution (VSR) offers a promising solution to achieve higher resolutions at dramatically reduced bitrates. Traditional upscaling methods, such as bicubic or Lanczos interpolation, often fall short in terms of visual quality. While state-of-the-art machine learning (ML) models can produce incredibly accurate upscaled videos, they typically struggle with real-time performance requirements. Here, we present a novel ML-based approach for high-quality, on-device VSR, capable of running in real-time on modern hardware. We introduce the architectural details of our WebGPU-powered VSR pipeline, including efficient video frame transfer to the GPU via WebCodecs, optimized shader code, and seamless integration with web-based video players. Additionally, we will showcase the results from live user testing at scale, demonstrating the impact of our solution on user experience and bandwidth savings. By attending this talk, developers and researchers will get a taste of the latest advancements in on-device VSR and the practical implementation considerations for deploying such solutions in live streaming scenarios. We will explore the challenges and trade-offs involved in designing real-time VSR systems, and discuss future directions for further improvements in quality and performance.
Luke Curley
Discord
Replacing WebRTC with Media over QUIC
It's been over a decade since WebRTC was released. Surely there's something new on the horizon, right? Media over QUIC is an IETF working group that is working on a new live media standard to replace the likes of WebRTC, RTMP/SRT, and HLS/DASH. Wow that's overly ambitious, but it's being backed by your favorite big tech companies (and some non-favorites) in the same standards body that has produced hits such as... WebRTC. But replacing WebRTC is difficult. It exists because there were no web standards in 2011 that could accomplish conferencing; remember this was before even HTML5. But there are new Web standards now! This talk will go over WebTransport and WebCodecs, and how they are utilized to provide a user experience nearly on par with WebRTC while being dramatically more flexible. No more magic black box, no more ICE/STUN/TURN/SDP/DTLS/SCTP/RTP/SRTP/mDNS, no more getting ghosted by Google. Just you with a QUIC connection and the ability to encode/decode video frames. And of course we'll go over the promise of Media over QUIC and why you should use the standard instead of your own bespoke protocol. I'll give you a hint, it starts with C and ends with "DN Support".
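As a taste of those building blocks (and not the actual Media over QuIC wire format), here is a minimal sketch that encodes camera frames with WebCodecs and pushes each one over its own WebTransport stream; the real moq-transport object and group framing is omitted.

```typescript
// Minimal sketch: encode camera frames and push each chunk over its own
// WebTransport unidirectional stream. Media over QUIC defines its own
// object/group framing on top of QUIC streams; none of that is modeled here.
async function publish(url: string, track: MediaStreamTrack) {
  const transport = new WebTransport(url);
  await transport.ready;

  const encoder = new VideoEncoder({
    output: async (chunk: EncodedVideoChunk) => {
      const payload = new Uint8Array(chunk.byteLength);
      chunk.copyTo(payload);
      // One stream per chunk: QUIC streams are cheap and independently
      // delivered, which is the property MoQ leans on for prioritization.
      const stream = await transport.createUnidirectionalStream();
      const writer = stream.getWriter();
      await writer.write(payload);
      await writer.close();
    },
    error: (e) => console.error("encode error", e),
  });
  encoder.configure({ codec: "vp8", width: 1280, height: 720, bitrate: 1_000_000 });

  // Pull raw frames from the camera track and feed the encoder.
  const reader = new MediaStreamTrackProcessor({ track }).readable.getReader();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done || !frame) break;
    encoder.encode(frame);
    frame.close();
  }
}
```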
Yuriy Reznik
Brightcove, Inc.
Streaming in the 1970s. NVP & ST: the very first real-time streaming protocols.
In this talk we will go back in history and look at the very first protocols and systems developed for internet streaming: the venerable NVP (Network Voice Protocol) and ST (Internet Stream Protocol, aka IPv5), developed by Danny Cohen, Jim Forgie, and other brilliant engineers at MIT Lincoln Laboratory in the 1970s. We will discuss the key ideas introduced by these protocols (the concepts of sessions, available-capacity assessment, rate negotiation between sender and receiver, data transfer protocols, the need for network-layer support for sessions, resource provisioning, etc.) and show how most of these ideas became incorporated into subsequent designs. Specifically, we will show how many ideas introduced in NVP and ST eventually found their way into modern protocols such as WebRTC, QUIC, and MoQ. The talk will include many historical pictures and some videos of those early pioneering systems built in the 1970s. It will also try to explain what motivated the original developers to come up with these techniques.
Fabio Sonnati
NTT Data
The Long(est) Night
April 28, 2019: a phone call wakes me in the middle of the night. "The Long Night", a new episode of the final season of Game of Thrones, is airing, but nothing is visible! The artistic intent is clearly extreme, and the encoding can't handle it, resulting in a flurry of confused silhouettes in the darkness, struggling in a dark sea of banding. In this presentation, I will talk about how we resolved an extreme situation for a high-quality streaming service by manually customizing the encoding to mitigate the problem, inspired by well-known principles from the worlds of audio processing and 3D game rendering.
Gwendal Simon
Synamedia
Token Renewal: Keeping your Streaming Party Smooth and Secured
CDN leeching is a growing concern for content providers. The recent specification of the Common Access Token (CAT) has introduced a vehicle for designing more secure streaming delivery systems. Best practices for CDN content protection often involve renewing the token, either due to short expiration times or probabilistic rejections. However, token renewal is far from trivial. In token-based delivery systems, we identify three key entities: the client, the CDN server, and the token generator. Typically, these communicate via HTTP(S). At any point during a streaming session, the CDN server may request the client to renew its token, ensuring seamless video playback, even for low-latency ABR streaming. The CAT specification includes two claims related to renewal: catr and catif. While the specification details several operation modes, none fully satisfy the combined requirements for fast renewal, legacy clients, and the unique characteristics of DASH and HLS. In this talk, we will unpack the current situation, presenting the pros and cons of each proposed solution. We aim to open the door to a better solution and outline the community effort needed for its implementation.
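To ground the renewal problem, here is a minimal client-side sketch under two assumptions that are ours rather than the CAT specification's: the token travels as a query parameter, and the CDN signals a rejected token with HTTP 401. The catr/catif claim semantics are not modeled.

```typescript
// Minimal sketch of client-side token renewal. The token service URL and the
// 401-based renewal signal are illustrative assumptions, not part of CAT.
let currentToken = "initial-token";                        // issued by the token generator
const TOKEN_SERVICE = "https://tokens.example.com/renew";  // hypothetical endpoint

async function renewToken(): Promise<string> {
  const res = await fetch(TOKEN_SERVICE, { credentials: "include" });
  if (!res.ok) throw new Error(`token renewal failed: ${res.status}`);
  return (await res.json()).token;
}

// Wrap segment requests so playback survives a mid-session renewal without a
// rebuffer: retry once with a fresh token before surfacing an error.
async function fetchSegment(url: string): Promise<ArrayBuffer> {
  const withToken = (t: string) => `${url}?token=${encodeURIComponent(t)}`;
  let res = await fetch(withToken(currentToken));
  if (res.status === 401) {
    currentToken = await renewToken();
    res = await fetch(withToken(currentToken));
  }
  if (!res.ok) throw new Error(`segment fetch failed: ${res.status}`);
  return res.arrayBuffer();
}
```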
Thomas Edwards
Amazon Web Services
Video Processing on Quantum Computers
Quantum computing (QC) utilizes quantum mechanics to solve complex problems faster than on "classical" computers. QCs available today are considered "Noisy Intermediate-Scale Quantum" (NISQ) computers with a small number of quantum bits (qubits) and limited performance due to short coherence time and noisy gates. QCs are improving all the time, so it is possible that in the future they could provide practical acceleration to video processing workflows (remember how neural networks were in the 1990's?). This presentation will give a short overview of QC basics, results of representing (simple) images on an actual cloud-accessible QC, and will describe some research on potential video processing applications of QCs. [Note: I've timed that this can be presented in 20 minutes]
Tracey Jaquith
Internet Archive
What's on TV? 4 editors and 2 robots walk into a bar..
Using TV news "chyron" text overlays in the "lower third" (from human editors), image-to-text (OCR), grouping/filtering, and AI gpt to summarize --> we social post hourly: "What's on TV?" The non-captions news text (eg: BIDEN VISITS MEXICO) that shows up at the bottom of the screen (like those overhead monitors in airports showing news) is gold, written in real-time by editors during live broadcasts. However, the data is not carried anywhere inside the video streams (just visually). What's a girl with robots to do? Using CNN, MSNBC, Fox News and BBC News feeds, we use ffmpeg to crop the relevant image area; tesseract to OCR the image into text; and GPT AI to summarize, remove ads, and cleanup the text. We then post hourly summaries to mastodon.
Steve Robertson
YouTube
Why is gapless so hard?
A deep dive into audio gaplessness, for video engineers: the difference between stitching, pseudo-gapless, and true gapless approaches; why gapless matters to the art; the mechanical reasons the audio clock always wins; how the system reconciles this instability; and why this leads to dropped frames and A/V sync issues.
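To illustrate "the audio clock always wins", here is a toy sketch of the reconciliation step a player performs when video is slaved to the audio clock. It shows the general drop/repeat decision only, not YouTube's implementation.

```typescript
// Toy illustration: video is slaved to the audio clock. Each tick we compare
// the next frame's PTS to the current audio time and drop, repeat, or present.
type Decision = "present" | "drop" | "repeat-previous";

function reconcile(nextFramePtsSec: number, audioClockSec: number,
                   frameDurationSec: number): Decision {
  const drift = nextFramePtsSec - audioClockSec;
  if (drift < -frameDurationSec) return "drop";            // video is behind: skip ahead
  if (drift > frameDurationSec) return "repeat-previous";  // video is ahead: hold the frame
  return "present";                                        // close enough: show it
}

// Example: audio has drifted 50 ms ahead of a 30 fps video stream.
console.log(reconcile(10.0, 10.05, 1 / 30)); // "drop"
```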
Yingyu Yao
YouTube
Your TV Is Eating Your Frames
At YouTube, we aspire to stream cat videos to everything that has a screen, including the largest of them all: TVs in your living room. Despite TVs being engineered to be video playback powerhouses, it is unexpectedly difficult to make videos play consistently and smoothly on them. Through the lens of a player engineer, I will take you on a shallow dive through the TV media stack, and we will explore the different ways a playback can get tripped up on those large screens.
10 years of Demuxed!
Demuxed is video engineers talking about video technology
Our first meeting was a single-day event back in 2015, born out of the SF Video Technology meetup. The video industry had plenty of trade shows and other opportunities for The Business, but our goal was to create a conference and community for the engineers building the technology powering video, from encoding to delivery to playback and beyond. We’ve grown a lot since then, but our goal remains the same.
After creating Demuxed, some of the organizers went on to start and work at Mux. Mux continues to sponsor most of the organizational work behind the scenes (thanks for the salary!), but Demuxed is, at its core, a community-led event.
Every year we get a group together that’s kind enough to do things like schedule planning, help brainstorm cool swag, and, most importantly, argue heatedly over which talk submissions should make the final cut. These folks are the ones hanging out in Video Dev Slack, and they hail from all over the industry.
Our sponsors
We thank all our amazing sponsors!
Organized with ❤️ by video nerds around the world
Contact: info@demuxed.com
Legal stuff: Code of Conduct