WebVTT & SRT to Clean Transcript

I built a client-side converter with Vue.js that turns WebVTT or SRT subtitle files into clean, readable transcripts. It removes timings, HTML tags, and formatting artifacts while preserving speaker names for accessibility. It is useful for content creators who need plain-text transcripts from caption files.

Problem / Context

Caption files (WebVTT, SRT) mix dialogue with cue numbers, timings, and markup, none of which belongs in a transcript. Cleaning them by hand is tedious, so I needed an automated tool that extracts the dialogue, keeps speaker names, and outputs plain text.

Constraints / Goals

  • Client-side: caption text is never uploaded to a server.
  • Handles WebVTT/SRT: timings, speakers, styles.
  • Clean output: no HTML tags or extra spaces.
  • Accessible: preserves speaker info.
  • Simple UI: textarea input/output.

Core Approach

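A Vue computed property runs the input text through a chain of regex replacements: trim the text, turn speaker tags into "Name: " prefixes, strip HTML tags and timings, collapse extra whitespace, and join the remaining lines. A placeholder token marks where each speaker turn starts, so that one line break per turn can be restored after the lines are joined.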

Vue Setup

var app = new Vue({
  el: "#app",
  data: {
    title: "Subtitles to Transcript Converter",
    subtitleText: `WEBVTT\n\n1\n00:11.000 --> 00:13.000\n<v Roger Bingham>We are in New York City.\n...`
  },
  computed: {
    transcriptText: function () {
      let tScript = this.subtitleText;
      // Processing steps...
      return tScript;
    }
  }
});

Processing Steps

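transcriptText: function () {
  let tScript = this.subtitleText
    .trim()
    // <v Name> voice tags become "Name: ", prefixed with a placeholder that
    // marks where each speaker turn starts (names: word characters and spaces).
    .replace(/<v\s?([\w\s]*)>/gi, "newlinegoeshere$1: ")
    // Strip every remaining tag, including classed tags such as <c.yellow>
    // and inline timestamps such as <00:12.500>.
    .replace(/<[^>]+>/g, "")
    .replace(/ {2,}/g, " ") // Collapse runs of spaces
    .replace(/(\s+)?WEBVTT.*\n/gi, "\n") // Drop the WEBVTT header line
    .replace(/^\d+\n/gm, "\n\n") // Drop cue numbers that sit on their own line
    // Drop timing lines such as "00:11.000 --> 00:13.000" (SRT uses a comma).
    .replace(/((\d+:)?\d+:\d+[.,]\d+( *--> *)?)+(.*)\n/g, "\n")
    .replace(/\n{2,}/g, "\n") // Collapse blank lines
    .split("\n")
    .map(ln => ln.trim())
    .join(" ") // Merge all cue lines into one paragraph
    // Restore one line break per speaker turn, dropping the space left by join.
    .replace(/ ?newlinegoeshere/g, "\n")
    .trim();
  return tScript;
}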
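
To try the pipeline outside Vue, the same sequence of steps can be sketched as a plain function. The name subtitlesToTranscript and both sample captions are made up for illustration. Tag removal here matches anything between angle brackets, and the final swap also drops the space left before each placeholder; both are assumptions about the intended behaviour.

```javascript
// Hypothetical standalone version of the computed property's pipeline.
function subtitlesToTranscript(subtitleText) {
  return subtitleText
    .trim()
    .replace(/<v\s?([\w\s]*)>/gi, "newlinegoeshere$1: ") // voice tags -> "Name: "
    .replace(/<[^>]+>/g, "")                  // any remaining tag
    .replace(/ {2,}/g, " ")                   // runs of spaces
    .replace(/(\s+)?WEBVTT.*\n/gi, "\n")      // WEBVTT header line
    .replace(/^\d+\n/gm, "\n\n")              // cue numbers on their own line
    .replace(/((\d+:)?\d+:\d+[.,]\d+( *--> *)?)+(.*)\n/g, "\n") // timing lines
    .replace(/\n{2,}/g, "\n")                 // blank lines
    .split("\n")
    .map(ln => ln.trim())
    .join(" ")
    .replace(/ ?newlinegoeshere/g, "\n")      // one line per speaker turn
    .trim();
}

// Illustrative WebVTT input with voice tags and a classed style tag.
const vttSample = [
  "WEBVTT",
  "",
  "1",
  "00:11.000 --> 00:13.000",
  "<v Roger Bingham>We are in New York City.",
  "",
  "2",
  "00:13.000 --> 00:16.000",
  "<v Second Speaker>Hello <c.loud>there</c>.",
].join("\n");

// Illustrative SRT input: no header, comma before the milliseconds.
const srtSample = [
  "1",
  "00:00:11,000 --> 00:00:13,000",
  "<i>We are in New York City.</i>",
  "",
  "2",
  "00:00:13,000 --> 00:00:16,000",
  "It is raining.",
].join("\n");

console.log(subtitlesToTranscript(vttSample));
// Roger Bingham: We are in New York City.
// Second Speaker: Hello there.

console.log(subtitlesToTranscript(srtSample));
// We are in New York City. It is raining.
```

Without voice tags there is no placeholder to restore, so the SRT sample comes out as a single paragraph.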

HTML UI

The page has two textareas: an editable input for the subtitle text and a disabled output that shows the transcript.
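
A minimal sketch of how the two textareas might be wired up is below. The markup is kept in a template string so the example stays in JavaScript. That is an assumption, since the original may place the markup directly in the page inside the #app element. The ids, labels, and rows values are also made up, and the computed property is only a stand-in for the real pipeline.

```javascript
// Sketch only: hypothetical markup for the two-textarea UI.
const converterTemplate = `
<div id="app">
  <h1>{{ title }}</h1>
  <label for="subtitles">Subtitles (WebVTT or SRT)</label>
  <textarea id="subtitles" v-model="subtitleText" rows="12"></textarea>
  <label for="transcript">Transcript</label>
  <textarea id="transcript" :value="transcriptText" rows="12" disabled></textarea>
</div>`;

// Mount only when Vue 2 (a build that includes the template compiler)
// has been loaded on the page, for example through a script tag.
if (typeof Vue !== "undefined") {
  new Vue({
    el: "#app",
    template: converterTemplate,
    data: {
      title: "Subtitles to Transcript Converter",
      subtitleText: ""
    },
    computed: {
      // Placeholder: the real converter runs the regex pipeline here.
      transcriptText() {
        return this.subtitleText.trim();
      }
    }
  });
}
```

The input uses v-model for two-way binding, while the output only reads the computed value through :value, so the transcript updates whenever the input changes.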

Results / Impact

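  • Converts captions to a transcript as soon as they are pasted in.
  • Improves accessibility by providing a text alternative that keeps speaker names.
  • Handles WebVTT with voice (speaker) tags and inline style markup.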

Trade-offs / Limitations

  • Regex-based; may miss edge cases.
  • No file input; caption text has to be pasted in.
  • Client-side only.

Next Steps

  • File drag-drop.
  • SRT/WebVTT detection.
  • Export options.
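
For the format detection item, one possible approach is to look at the header and at the timing lines. The function below is a hypothetical sketch, not part of the current tool. It assumes that WebVTT text starts with "WEBVTT" and that SRT timings always use hh:mm:ss,mmm with a comma before the milliseconds.

```javascript
// Hypothetical helper: guess the caption format from the text itself.
function detectSubtitleFormat(text) {
  // Ignore a leading byte order mark and any leading whitespace.
  const trimmed = text.replace(/^\uFEFF/, "").trimStart();

  // WebVTT: "WEBVTT" followed by whitespace, a line break, or end of input.
  if (/^WEBVTT(?:[ \t\r\n]|$)/.test(trimmed)) {
    return "webvtt";
  }

  // SRT: a timing line such as 00:00:11,000 --> 00:00:13,000.
  const srtTiming = /^\d{1,2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{1,2}:\d{2}:\d{2},\d{3}/m;
  if (srtTiming.test(trimmed)) {
    return "srt";
  }

  return "unknown";
}

console.log(detectSubtitleFormat("WEBVTT\n\n00:11.000 --> 00:13.000\nHi"));    // webvtt
console.log(detectSubtitleFormat("1\n00:00:11,000 --> 00:00:13,000\nHi"));     // srt
console.log(detectSubtitleFormat("Just some pasted text"));                    // unknown
```

Returning "unknown" instead of guessing leaves the UI free to fall back to the current behaviour of running every cleanup step.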

Tech Summary

Vue.js, regex, string manipulation.

Summary Pattern Example

Problem: Caption files need cleaning for transcripts.
Approach: Vue regex processor for WebVTT/SRT to plain text.
Result: Accessible transcript converter.