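Is the state of education video in 2024 the quiet before or after the storm? With the pandemic now behind us, we face a crossroads over whether enterprise-scale video hosting and management services can remain profitable in the new normal at a price schools are willing to bear. It’s unlikely that we’ll cross a point of no return this year, but I recommend keeping a close eye on a few signs that could ease or heighten concerns about whether schools will retain some degree of ownership of and control over the video services they depend on in the long term.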

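Meanwhile, thanks to the convergence of a hot new company (OpenAI) and an open source project that has been well-loved since 2001, a convenient captioning workflow is now within reach of the masses in 2024.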


Since the summer of 2021, those of us who closely follow the industry for supporting streaming media for schools have had detailed insights into exemplary vendors. That’s when Kaltura successfully completed its IPO and subsequently was required to file documents with the SEC, the most interesting of which are quarterly 10-Q forms and the 10-K annual report. These documents include financial statements and a sober accounting of a public company’s perspective on its business climate.

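Similarly, Zoom is required to publish these same documents, as it completed its IPO in 2019. Between the two companies and their SEC filings, we have a reliable view of both the synchronous and asynchronous video spaces from two of the most successful and well-run vendors serving the education video vertical.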


Zoom is a little bit unusual for an emerging tech company in that it has turned a profit every year since 2018, the year before its IPO, and its recent profits have been substantial. That synchronous video is a more profitable line of business is consistent with the general rule of cloud economics as detailed in last year’s Streaming Media Sourcebook:

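In the cloud, variable-use resources like CPU, memory, and bandwidth tend to be very cost-effective, while fixed-use resources like long-term disk storage are more expensive than what you can get with an on-prem investment. In other words, economies of scale work best when you pay for the services you use and hand those resources over to other public cloud tenants the rest of the time; they work worst when you are always paying to store data you have accumulated that may or may not ever be used.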

Synchronous video requires paying for CPU and bandwidth whenever the service is being used, and not much of anything goes on the cloud bill when it’s not being used. By default, Zoom deletes meetings recorded to its cloud hosting after 180 days, so storage costs have a built-in mechanism to avoid snowballing. In searching for tea leaves to read in last year’s Sourcebook, we alit upon a decreasing rate of revenue growth period-over-period from 2021 to 2022. That trend continued into 2023, but Zoom revenue growth appears to have stabilized at around 3% through the first three quarters of 2023 compared to the same time periods in 2022.
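To make the snowballing point concrete, here is a minimal back-of-the-envelope sketch in Python. The recording volume (1,000 hours per month) and the function name are hypothetical, chosen only for illustration; they are not figures from Zoom’s filings. The sketch compares how much video sits in storage when nothing is ever deleted with what remains when recordings expire after roughly 180 days, approximated here as six months.

```python
def stored_hours_by_month(hours_recorded_per_month, months, retention_months=None):
    """Return the hours of video held in storage at the end of each month.

    retention_months=None models a service that never deletes anything.
    """
    stored = []
    for month in range(1, months + 1):
        if retention_months is None:
            kept_months = month  # everything ever recorded is still stored
        else:
            kept_months = min(month, retention_months)  # older recordings expired
        stored.append(hours_recorded_per_month * kept_months)
    return stored


# Hypothetical workload: 1,000 hours recorded every month for three years.
keep_forever = stored_hours_by_month(1000, 36)
keep_six_months = stored_hours_by_month(1000, 36, retention_months=6)

print(keep_forever[-1])     # 36000
print(keep_six_months[-1])  # 6000
```

Under these assumed numbers, the never-delete archive keeps growing linearly, while the retention policy caps the stored volume after the sixth month.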

Zoom acquired two companies in 2023: Solvvy and Workvivo. Solvvy adds a mature chatbot offering to the Zoom portfolio, and Workvivo provides an employee experience platform (see Figure 1) that delivers streamlined communication and culture-building tools for business subscribers.


Figure 1. Workvivo for Zoom

The closest thing to an academic institution listed on the Workvivo website’s partner page is the Hoover Institution at Stanford University, so I don’t expect that this acquisition will create immediate value for Zoom’s school customers. However, it may be a step toward closing feature gaps with Microsoft Teams down the road.

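There is definite interest in developing custom chatbot applications in higher ed, though. The University of Central Florida (UCF) is a school that I admire for generally staying ahead of the curve with educational technology, and its Knightbot service, built in a partnership with engagement platform vendor Mainstay, is a good example of a successful chatbot. Another Mainstay customer, Georgia State University, in partnership with UCF and other institutions, was recently awarded a $7.6 million grant to study whether chatbots can improve student learning outcomes by giving students a round-the-clock AI teaching assistant to whom they can pose questions. It will be interesting to see whether this research also reveals whether students’ interactions with human instructors and teaching assistants become less civil as they interact more with AI assistants.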


Kaltura had its IPO at a somewhat unfortunate time as far as trendline optics go, although it was a good time to raise cash to the tune of $172.5 million. The NASDAQ Composite closed at 14,631.95 on the day of the IPO and fell below 12,000 by May 2022 and below 11,000 in June 2022. Kaltura was priced at $10 for the IPO, had a peak closing price of $13.61 on Aug. 6, 2021 (incidentally, the day the final 2.25 million shares were sold at the original price), then fell all the way to $1.78 on March 7, 2022, roughly where the stock price has languished ever since.

The price collapse made possible an unlikely, unsolicited purchase attempt from one of Kaltura’s top competitors, Panopto, in the summer of 2022. The purchase was ultimately shot down by Kaltura’s board. Kaltura has spent the past 2 years getting lean on operational costs, shedding 10% of the workforce in 2022 and laying off an additional 11% in 2023.

Layoffs have been widespread across the tech sector for the last several years and continue into 2024. Twitch laid off more than one-third of its employees in January 2024, to cite one dramatic example.

The effort to trim down has borne fruit in Kaltura’s case: The company’s non-R&D operating expenses fell below gross profits in Q4 2022 and have remained well below since. Period-over-period revenue growth in 2023 was strong, with Kaltura’s subscription income in the category that includes the education vertical increasing by 8.2%, 7%, and 4.8% in the first three quarters compared to the same quarters of 2022, handily beating the trend observed in last year’s Sourcebook.

It’s noteworthy that Kaltura, the biggest provider of educational VOD services and one that serves half of the R1 universities, has never shown a profit, quarterly or annually, though again, Zoom is the outlier among emerging tech companies for consistently turning a profit. At some point, though, it would be reassuring to know that the vendors schools rely on for educational video services operate on sustainable business models. Kaltura is aware of this too: It recently hired John Doherty from Magic Leap to serve as its new CFO and specifically mentioned profitability as a component of the hire in its announcement.

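Last year’s Streaming Media article discussed what would happen if the two largest video management system vendors serving schools actually merged, and what options schools would have if their post-pandemic edtech needs shrank and the new environment called for downgrading their video management software (VMS) subscriptions. Since video services are tremendously valuable to schools, and school administrators tend to prefer outsourcing core services to vendors over relying on the loyalty of highly skilled staff to support these critical operations, I am confident that this industry will thrive.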

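If that optimism is misplaced, then the University of Toronto’s Opencast project may hint at a new direction. The University of Toronto is a bold and forward-thinking institution with a total enrollment of just fewer than 100,000 students across its three campuses. It successfully built out its Opencast Content Capture System (go2sm.com/occs) to provide campus-wide lecture recording, and it remains an excellent solution for schools that are willing to invest in on-prem solutions or in cross-institutional partnerships that pool resources to accomplish this (see Figure 2).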


Figure 2. A schematic of the University of Toronto’s Opencast Content Capture System

Kaltura’s 2024 10-K filing was expected in February. In the section of the filing that discusses risk factors, compliance with privacy regulations is always a major concern. In 2021, China passed the Personal Information Protection Law (PIPL), complicated legislation that includes specific cash ranges that companies can be held liable for if the law is not adhered to. So far, PIPL has not been mentioned in Kaltura’s SEC filings (and only obliquely in Zoom’s 2023 10-K), but navigating how this law impacts international educational institutions and the vendors that provide technology services for them is a major question.

I also expect some insightful discussion of new risks posed by modern AI. Generally, Kaltura includes a short paragraph about liability related to hosting content that violates copyright or licenses. It will be interesting to read if deepfake technologies are on Kaltura’s radar, as they present a more costly challenge to assisting institutions with policing take-down requests for offensive, highly personalized content.

I’m also curious to see data on how Kaltura’s entrance into synchronous video services has developed, something that has yet to be teased out in any filings thus far. As discussed with Zoom, the economics of cloud resource provisioning for synchronous video are more favorable than those for asynchronous, so the more Kaltura can grow its synchronous service offerings, the better that is likely to be for its bottom line. The company will also need to thread the needle of either more effectively passing on its storage costs to customers without creating dissatisfaction, or, better, providing data-driven tools for assessing what content can be inconsequentially deleted or archived to lower-cost storage by customers to minimize storage costs.

An advisable approach is to adopt the “with great knowledge comes great liability” data retention policy angle, perhaps in concert with efforts to comply most effortlessly with PIPL, GDPR, and U.S. privacy laws. Another appealing justification to conscientiously manage the accumulation of recorded video data is streaming green. Unnecessary video storage bloats electricity usage and contributes carbon released into the atmosphere.


In last year’s “State of Education Video,” I threw some cold water on the hype over ChatGPT based on the performance of GPT-3. GPT-4 was released right around the Sourcebook’s publication, and that skepticism was no longer warranted given GPT-4’s superior performance. GPT-4 has been shown to do well on standardized tests, Advanced Placement exams, and professional exams, making it a major factor in how teachers assess student performance.

The best advice I’ve seen for how to AI-proof your tests and assignments, loosely adapted from optics research scientist and AI researcher Janelle Shane (aiweirdness.com), is to give questions that students can answer but that a pre-trained transformer can’t by making the questions very local to the student doing the assignment, both in space and in time. The transformer’s training data is many months stale from the public internet, so it cannot answer questions about recent events, nor can it access specific pages of a textbook or course website (unless the student supplies them in the prompt).

Over the past year, many teachers have leaned into the transformer revolution and have tried to incorporate AI into their instruction. Perhaps the most intriguing use of AI text generation is for seeding inspiration. Here, the assignment would be to have your text generator produce several essays on various topics, choose the one you would most like to rewrite, and produce an original essay of your own based on the prompt. This strikes me as a generalization of Cunningham’s Law, which can be stated as, “The best way to motivate experts to provide you with a correct answer is to invite their contempt by posting the wrong one on the public internet.” For whatever reason, it rings true: It’s easier and somehow more satisfying to put creative energy into disagreeing with someone than agreeing with them. A compelling writing assignment would be to have students rewrite two AI-generated essays, one they agree with and one they disagree with, and subjectively evaluate the experience. As a class, they would then reflect on why this is so (assuming that it does indeed prove to be the class’s experience).

Aside from embarrassingly underestimating how quickly large language model (LLM)-powered transformers would pose a substantial challenge for assessments more complex than short-answer questions, one point from last year’s article holds up: Whisper, OpenAI’s open source speech-to-text engine, would be a huge benefit for education in 2023. In 2024, Whisper and Whisper-powered tools are easy to use, even for technology-challenged teachers and students who need to have their videos captioned without spending a huge amount of time on the process.

The quality of automatic captioning offered by vendors has improved dramatically in the past 5 years with the rise of attention-based transformers and LLMs. Whisper being freely available since September 2022 upgraded the state of the art in how educators can produce closed captions for their educational video. Whisper is able to generate astonishingly accurate transcriptions in multiple languages. For example, I supported a research project by generating automatic transcripts of interviews in Ukrainian, Russian, English, and Czech with people fleeing the war in Ukraine and those providing aid to them. The technology greatly improved the researchers’ process (correcting a transcript is much faster than writing one from scratch), and it did so without sending highly sensitive data anywhere untrusted. That Whisper adds on the ability to automatically translate from language to language as part of the speech-to-text process is almost unimaginable, but it works well.
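For readers who want to see what running Whisper locally looks like, here is a minimal sketch using the open source `openai-whisper` Python package. It assumes the package and ffmpeg are installed; the helper names (`srt_timestamp`, `segments_to_srt`, `transcribe_to_srt`) and the choice of the `small` model are my own illustrations, not part of Whisper. The formatting helpers are plain Python and turn Whisper-style segments (dictionaries with `start`, `end`, and `text` keys) into SRT caption text.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from segments that carry 'start', 'end', and 'text'."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


def transcribe_to_srt(media_path, model_name="small", translate=False):
    """Transcribe a local file with Whisper and return SRT text.

    Assumes `pip install openai-whisper`; imported lazily so the
    formatting helpers above work without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    result = model.transcribe(media_path, task=task)
    return segments_to_srt(result["segments"])


# Formatting demo with hand-written segments (no model needed).
demo = [{"start": 0.0, "end": 2.5, "text": " Hello, class."}]
print(segments_to_srt(demo))
```

Because everything runs on the local machine, nothing in this sketch sends audio to a third party, which is the property that mattered for the interview project described above. Note that Whisper’s `translate` task produces English output.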

Whisper is not perfect, though, and has two major problems. The first is that the segments it produces are far, far too long; often three or four lines of captions fill the width of the player. The second is that Whisper is prone to hallucination, as are all transformers, since they’re built to predict words and send them to output even when the input is very sparse or nonexistent from a human language user’s perspective. Typically, a hallucination happens after or during stretches of silence or a non-speech signal like music, producing unrelated text or often just a sequence of periods for the remainder of the run.
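One crude way to catch the symptoms described above is to flag suspicious segments for human review. The sketch below is a heuristic of my own, not something Whisper provides: it marks a segment when its text contains no letters or digits (for example, a run of periods) or when it repeats the previous segment verbatim. The function names are hypothetical, and a flagged segment is only a candidate for a closer look, since speakers do sometimes repeat themselves.

```python
def looks_hallucinated(text, previous_text=None):
    """Heuristic check for the failure modes described in the article."""
    stripped = text.strip()
    # Empty text or punctuation only, such as a sequence of periods.
    if not any(ch.isalnum() for ch in stripped):
        return True
    # Exact repeat of the segment that came just before.
    if previous_text is not None and stripped == previous_text.strip():
        return True
    return False


def flag_suspect_segments(segments):
    """Return the indexes of segments a human should double-check."""
    flagged = []
    previous = None
    for index, seg in enumerate(segments):
        if looks_hallucinated(seg["text"], previous):
            flagged.append(index)
        previous = seg["text"]
    return flagged


segments = [
    {"text": " Welcome, graduates."},
    {"text": " ..."},
    {"text": " Thank you."},
    {"text": " Thank you."},
]
print(flag_suspect_segments(segments))  # [1, 3]
```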

WhisperX is a project that’s being undertaken to address both of these problems head-on (github.com/m-bain/whisperX). WhisperX (see Figure 3) preprocesses the audio to be transcribed by detecting the speech signal and cutting out all other non-speech audio intervals so that Whisper has no reason to hallucinate. After generating the text for this edited audio, it performs forced alignment against the original audio using Meta’s wav2vec toolkit to time code and segment the transcript into a caption file. This is a very clever solution, although it jettisons Whisper’s translation capability, and WhisperX’s segmentation is also often far too long.
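The idea of keeping only the speech intervals can be illustrated with a toy voice activity detector. This is a deliberately simplified stand-in, not WhisperX’s method: WhisperX relies on a trained model for this step, while the sketch below just thresholds a list of per-frame energy values. The function name, the frame length, and the threshold are all assumptions made for illustration.

```python
def speech_intervals(frame_energies, frame_seconds, threshold):
    """Return (start, end) times in seconds for runs of frames whose
    energy is at or above the threshold."""
    intervals = []
    run_start = None
    for i, energy in enumerate(frame_energies):
        is_speech = energy >= threshold
        if is_speech and run_start is None:
            run_start = i  # a speech run begins at this frame
        elif not is_speech and run_start is not None:
            intervals.append((run_start * frame_seconds, i * frame_seconds))
            run_start = None
    if run_start is not None:
        # The audio ended while a speech run was still open.
        end = len(frame_energies) * frame_seconds
        intervals.append((run_start * frame_seconds, end))
    return intervals


# Six half-second frames: silence, speech, speech, silence, silence, speech.
energies = [0.0, 0.9, 0.8, 0.0, 0.0, 0.7]
print(speech_intervals(energies, frame_seconds=0.5, threshold=0.5))
# [(0.5, 1.5), (2.5, 3.0)]
```

Only the audio inside the returned intervals would be handed to the transcriber, which removes the silent or musical stretches where hallucinations tend to start.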

Figure 3. The WhisperX pipeline as diagrammed on the project’s GitHub readme

However, hallucination is generally not a problem in instructional videos, where there are almost never extended periods of silence or non-speech sound. In fact, I had been using Whisper for months and never saw this phenomenon myself until we started throwing recordings of commencement ceremonies at it, which include lengthy processionals. Therefore, as a teacher, the only concern with using Whisper is getting it installed and being able to re-segment and easily correct the captions it produces.

To address the challenges Whisper poses, Subtitle Edit is an excellent and free tool. Although I started using it only recently, it has been in development since 2001. The source code has been version-controlled on GitHub for just over a decade and was at that time primarily a souped-up version of SubRip, the DVD subtitle picture OCR program that invented the SRT filetype. Subtitle Edit’s development (see Figure 4), though, focused on ergonomics rather than OCR, deferring the job of recognizing the text in DVD subtitles to the Tesseract OCR engine, originally written at HP and later adopted as an open source project by Google. Subtitle Edit was a fascinating program all along; by 2011, it had advanced features like live text chat, so that multiple editors could collaborate on a DVD localization project, and a fast Fourier transform (FFT) calculator that displays a live spectrogram to help experts identify ambiguous speech. As of 2014, it could export to 201 different caption formats.

With the 3.6.8 release on Oct. 24, 2022, Subtitle Edit began experimenting with using Whisper to auto-generate captions for any video to be presented in its 2-decades-in-the-making caption correction user interface; this occurred about 1 month after Whisper was open sourced. The program makes downloading and installing Whisper and its pre-trained models a breeze. The default option for the Whisper version is a standalone executable wrapper of Faster-Whisper, the same variant of the engine that WhisperX uses. Another easy option, CPP, a C++ port of Whisper by the brilliant and extraordinarily productive Georgi Gerganov, has some very useful extra features like live captioning from a microphone and more compact models.

Figure 4. Subtitle Edit about to download the medium.en pre-trained Whisper model

If you need to caption video that would be prone to hallucination, WhisperX is an option, but it would require a nonstandard installation procedure bypassing the Conda virtual environment steps. The original Whisper engine significantly benefits from inference on a GPU with at least 12GB of VRAM when using a large model, but both Faster-Whisper and Whisper CPP perform well on any modern computer.

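Subtitle Edit will re-segment the transcript into timed text using default settings (see Figure 5) that are close enough to the Netflix text style guide, which has become the industry standard since the National Association of the Deaf persuaded the company to become an effective ally for accessibility in the streaming entertainment industry.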
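To show what re-segmentation means in practice, here is a minimal text-only sketch that breaks one overlong segment into captions of at most two lines. The limits of 42 characters per line and two lines per caption are assumptions based on my reading of the Netflix guidance for English; they are not Subtitle Edit’s actual defaults or algorithm, and the function name is hypothetical. Retiming each new caption is left out to keep the example short.

```python
import textwrap


def split_into_captions(text, max_chars=42, max_lines=2):
    """Split one long transcript segment into caption-sized chunks.

    Words are never broken, so a single word longer than max_chars
    will produce a line that exceeds the limit.
    """
    normalized = " ".join(text.split())  # collapse stray whitespace
    lines = textwrap.wrap(
        normalized,
        width=max_chars,
        break_long_words=False,
        break_on_hyphens=False,
    )
    # Group the wrapped lines into captions of up to max_lines each.
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


long_segment = (
    "Whisper often returns one very long segment that would fill the "
    "player with three or four lines of text unless it is split up."
)
for caption in split_into_captions(long_segment):
    print(caption)
    print("---")
```

A tool like Subtitle Edit also redistributes the timing across the new captions, which is the part that makes a dedicated editor worth using over a script like this one.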

Figure 5. The Subtitle Edit settings menu

With more than a year of development since Whisper was incorporated into the Subtitle Edit project, it’s an easy-to-use way to get started with this extremely advanced speech-to-text engine and one that I wholeheartedly recommend to teachers and students.

Now that students are all back in the classroom, schools and universities face an existential dilemma about the role video will play going forward.
