Voice Codecs Explained: G.711, G.722, Opus, G.729, AMR-WB, EVS, and What’s Next

The codec is the single decision that quietly governs every call you place: how it sounds, how much bandwidth it burns, and whether two endpoints can talk at all without a media server in the middle. Pick wrong and you get choppy audio under packet loss, transcoding latency you can’t debug, or a bandwidth bill that scales with the wrong number. This guide compares the codecs that actually matter in 2026: what each is for, where each falls down, and how to architect a SIP trunk so the right one wins the negotiation.

2026-05-26 · 10 min read

By Daria Kesselman · DIDHub editorial

1. What a voice codec actually does

A voice codec (coder-decoder) turns analog sound into a compact digital bitstream at one end of a call and reconstructs it at the other. Three numbers define what it can do: the sampling rate (how often the waveform is measured), the bitrate (how many bits per second the encoded stream consumes), and the compression scheme (how aggressively it throws away data you supposedly won’t miss).

Sampling rate is the one that most directly maps to perceived quality, because of a hard limit (the Nyquist limit): a codec can only represent frequencies up to half its sampling rate. So an 8 kHz narrowband codec captures audio up to ~4 kHz; in practice it carries the classic telephone band of 300–3400 Hz. A 16 kHz wideband codec reaches ~8 kHz and carries roughly 50–7000 Hz. That extra top and bottom end is exactly what makes voices sound natural instead of tinny, and it is what the marketing world calls “HD voice.” A 48 kHz fullband codec spans the entire range of human hearing, 20–20000 Hz, which is overkill for speech but matters for music-on-hold, conferencing, and media.
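The half-the-sampling-rate limit is easy to check with a few lines of arithmetic. A minimal sketch in Python; the function name and the class labels are illustrative, not taken from any library:

```python
def nyquist_limit_hz(sampling_rate_hz: int) -> float:
    """Highest frequency a sampled signal can represent: half the sampling rate."""
    return sampling_rate_hz / 2

# Sampling rates for the three bandwidth classes discussed above.
CLASSES = {"narrowband": 8_000, "wideband": 16_000, "fullband": 48_000}

for name, rate in CLASSES.items():
    print(f"{name}: {rate} Hz sampling -> up to {nyquist_limit_hz(rate):.0f} Hz")
```

The usable band each codec actually carries (300–3400 Hz, 50–7000 Hz) sits inside that ceiling, because real filters need some headroom below the theoretical limit.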

The path each packet travels is the same regardless of codec. The encoder slices the audio into frames (commonly 20 ms), compresses each frame, and hands it to RTP, which packetizes it (one or more frames per packet, each stamped with a sequence number and timestamp) and ships it over the network. At the far end the jitter buffer reorders packets, the decoder reconstructs the waveform, and any missing frames are papered over by packet-loss concealment. Encode → packetize → decode. The whole question of “which codec” is really a question of what tradeoff you want between the bits on the wire and the sound that comes out the other side.
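The packetize step can be sketched for the simplest case, G.711 with one 20 ms frame per packet. This is an illustrative sketch, not a production RTP stack: it assumes the audio is already G.711-encoded (one byte per sample), uses the static payload type 0 for PCMU, starts the sequence number at 0 for readability (real senders pick a random start), and uses an arbitrary SSRC value.

```python
import struct

SAMPLE_RATE_HZ = 8_000   # G.711 RTP clock rate
FRAME_MS = 20            # one frame per packet
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples = 160 bytes
PCMU_PAYLOAD_TYPE = 0    # static RTP payload type for PCMU

def rtp_packet(seq: int, timestamp: int, ssrc: int, payload: bytes) -> bytes:
    """Prepend a minimal 12-byte RTP header (no CSRC list, no extension)."""
    first_byte = 2 << 6              # version 2, no padding, no extension, CC=0
    second_byte = PCMU_PAYLOAD_TYPE  # marker bit clear
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

def packetize(encoded: bytes, ssrc: int = 0x1234ABCD):
    """Yield one RTP packet per 20 ms frame of already-encoded G.711 audio."""
    for offset in range(0, len(encoded), SAMPLES_PER_FRAME):
        n = offset // SAMPLES_PER_FRAME
        frame = encoded[offset:offset + SAMPLES_PER_FRAME]
        # The timestamp counts samples, so it advances by 160 per packet.
        yield rtp_packet(n, n * SAMPLES_PER_FRAME, ssrc, frame)

packets = list(packetize(bytes(480)))  # 60 ms of dummy audio -> 3 packets
```

The sequence number lets the jitter buffer reorder packets and detect loss; the timestamp tells the decoder where each frame belongs in time.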

That tradeoff is never free. Lower bitrate means more aggressive compression, which means more CPU to encode and a model of human speech that breaks down on anything that isn’t speech: music, fax tones, DTMF. Higher sampling rate means better fidelity but, all else equal, more bandwidth. Every codec below is just a different point on that surface.

2. The codec comparison table

The codecs you will actually encounter on a modern SIP trunk or WebRTC stack, with the facts that drive the decision:

| Codec | Bitrate | Bandwidth class | Royalty | Best for |
|---|---|---|---|---|
| G.711 (PCMU/µ-law, PCMA/A-law) | 64 kbps | Narrowband (8 kHz) | Royalty-free | Universal interop & PSTN; the fallback that always works |
| G.722 | 64 kbps | Wideband (16 kHz, 50–7000 Hz) | Royalty-free | HD voice at G.711 bandwidth on modern IP phones |
| G.729 | 8 kbps | Narrowband (8 kHz) | Royalty-free since ~2017 | Bandwidth-constrained trunks & high call density |
| Opus (RFC 6716) | 6–510 kbps (adaptive) | Narrow → fullband (8–48 kHz) | Royalty-free | The modern default; WebRTC, AI voice, anything IP-native |
| iLBC | ~13.3 / 15.2 kbps | Narrowband (8 kHz) | Royalty-free | Lossy networks; each frame independent |
| AMR-WB (G.722.2) | 6.6–23.85 kbps | Wideband (16 kHz) | Patented (3GPP) | Mobile HD voice / VoLTE |
| EVS (Enhanced Voice Services) | 5.9–128 kbps | Super-wideband / fullband | Patented (3GPP) | Next-gen mobile: VoLTE / VoNR |
| Speex | 2–44 kbps | Narrow / wideband | Royalty-free | Deprecated; superseded by Opus |
| GSM-FR (06.10) | 13 kbps | Narrowband (8 kHz) | Royalty-free | Legacy; historical mobile only |

A few things in that table surprise people. G.722 carries HD-quality wideband audio at exactly the same 64 kbps as narrowband G.711 (same bandwidth budget, dramatically better sound), and it has been supported on mainstream IP phones for years. G.729’s core patents expired around 2017, so the codec that telecom shops spent two decades licensing is now effectively free to deploy. And Opus spans the entire useful range by itself, from 6 kbps narrowband speech to 510 kbps fullband stereo, adapting in real time, which is why it is mandatory-to-implement in WebRTC and the default for nearly every new VoIP stack.
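The bitrates in the table are codec payload rates. On the wire, every RTP packet also carries IP, UDP, and RTP headers, which matters most for the low-bitrate codecs. A rough sketch of that arithmetic, assuming IPv4, 20 ms packetization, no header compression, and ignoring layer-2 overhead (the function name is illustrative):

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no layer-2 overhead

def wire_kbps(codec_kbps: float, ptime_ms: int = 20) -> float:
    """Approximate one-way IP bandwidth for a single RTP stream."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    total_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return total_bytes * 8 * packets_per_second / 1000

for name, kbps in [("G.711", 64), ("G.722", 64), ("G.729", 8)]:
    print(f"{name}: {wire_kbps(kbps):.1f} kbps on the wire")
```

Under these assumptions G.711 and G.722 come to 80 kbps per direction and G.729 to 24 kbps, so the saving on the wire is closer to 3x than the 8x the payload bitrates suggest.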

3. Narrowband vs wideband & the “HD voice” myth

The reason a G.722 or wideband-Opus call sounds so much better than a legacy PSTN call is not subtle: it carries more than twice the audio bandwidth. Narrowband’s 300–3400 Hz window was chosen in the 1960s to cram the maximum number of calls onto copper, not to make voices sound good. It strips the low end that gives voices body and the high end that distinguishes “f” from “s,” which is why you spell things out on a bad phone line. Wideband’s 50–7000 Hz range restores both. On the 1-to-5 mean-opinion-score (MOS) scale, narrowband G.711 typically lands around 4.2 and wideband G.722 around 4.5, and the gap sounds even larger than the number suggests because it’s a qualitative jump, not just “cleaner.”

Here is the catch that sinks most “HD voice” marketing: HD only works end-to-end. A call is wideband only if every leg (caller, every intermediate hop, and callee) supports the same wideband codec. The moment one leg is narrowband (a PSTN gateway, an old SBC, a carrier that only offers G.711), the entire call collapses to narrowband. You cannot add back high frequencies that were never sampled. So two HD phones connected through a single narrowband interconnect produce a narrowband call, and the HD badges on both handsets are lying. This is why end-to-end codec consistency, not just endpoint capability, is what actually delivers HD voice.
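The end-to-end rule reduces to taking a minimum over the legs of the call. A small illustrative sketch, using the band edges quoted in this article; the dictionary keys and function name are hypothetical labels, not identifiers from any SIP stack:

```python
# Upper edge of the audio band each codec carries, in Hz (figures from this article).
AUDIO_CEILING_HZ = {
    "G.711": 3_400,
    "G.729": 3_400,
    "G.722": 7_000,
    "Opus-fullband": 20_000,
}

def end_to_end_ceiling_hz(legs: list[str]) -> int:
    """A call can only carry what its narrowest leg carries."""
    return min(AUDIO_CEILING_HZ[codec] for codec in legs)

# Two HD phones joined by a narrowband interconnect.
print(end_to_end_ceiling_hz(["G.722", "G.711", "G.722"]))  # 3400
```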

4. Transcoding: the hidden tax

SIP endpoints negotiate codecs in the SDP offer/answer exchange: each side advertises an ordered list of what it supports, and they use the best match they share. When they do share a codec, media flows straight through; this is codec passthrough, and it is the cheap, clean, low-latency path. When they don’t (say one side offers only Opus and the other only G.711), something in the middle has to transcode: decode the incoming stream all the way back to raw audio and re-encode it in the other codec, in real time, for every concurrent call.
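The core of that negotiation can be sketched as a first-match search. This is a deliberately simplified model: it assumes the answerer honors the offerer's preference order and settles on a single codec, whereas real SDP answers may list several codecs and real stacks may apply their own preferences. The codec names are plain strings here, not parsed SDP.

```python
from typing import Optional

def negotiate(offer: list[str], answerer_supports: set[str]) -> Optional[str]:
    """Return the first codec in the offerer's preference order that the answerer supports."""
    for codec in offer:
        if codec in answerer_supports:
            return codec
    return None  # no shared codec: the call fails or something in the middle must transcode

offer = ["opus", "G722", "PCMU", "PCMA"]
print(negotiate(offer, {"G722", "PCMU"}))  # G722
print(negotiate(["opus"], {"PCMU"}))       # None
```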

Transcoding is expensive on three axes. It adds latency, because you’ve inserted a full decode-plus-encode cycle into the media path. It costs CPU on the media server or SBC: a transcoding leg can be an order of magnitude more expensive than a passthrough leg, which is why transcoding capacity is a real line item in SBC sizing. And it degrades quality, because re-encoding lossy audio compounds artifacts. Transcoding up to a wideband codec can never recover detail a narrowband leg already discarded; you just spend CPU to make narrowband audio sound like a slightly worse narrowband call.

The design implication for SIP trunks is concrete: offer a codec your trunk and your endpoints already share, and you avoid the tax entirely. G.711 is the safest passthrough target precisely because it is universal: if both ends speak it, no media server has to touch the audio. The pragmatic pattern is to prefer a wideband codec (G.722 or Opus) when you know both legs support it, and keep G.711 in the list as the no-transcode fallback. The worst outcome is a trunk where every call silently transcodes because the offered codec lists never overlap on a passthrough-friendly option.
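That pattern, wideband first with G.711 always present as the fallback, can be expressed in a few lines. A sketch only: the helper names are hypothetical, and the codec lists stand in for whatever your PBX or SBC actually advertises.

```python
G711_FALLBACK = ["PCMU", "PCMA"]

def build_offer(preferred: list[str]) -> list[str]:
    """Put preferred (for example wideband) codecs first and make sure G.711 is always present."""
    offer = list(preferred)
    for codec in G711_FALLBACK:
        if codec not in offer:
            offer.append(codec)
    return offer

def needs_transcoding(leg_a: list[str], leg_b: list[str]) -> bool:
    """True when the two legs share no codec, so media cannot pass through untouched."""
    return not (set(leg_a) & set(leg_b))

trunk = build_offer(["opus", "G722"])
print(trunk)                                 # ['opus', 'G722', 'PCMU', 'PCMA']
print(needs_transcoding(trunk, ["PCMU"]))    # False: G.711 passes through
print(needs_transcoding(["opus"], ["PCMU"])) # True: no overlap, every call transcodes
```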

5. Per-codec pros & cons

Short version of when to reach for each, and when not to.

G.711 (PCMU / PCMA)

Pick it for maximum interop, PSTN-facing routes, and anywhere you want guaranteed passthrough. Universal support, near-zero codec complexity, toll-quality baseline. µ-law (PCMU) is standard in North America and Japan, A-law (PCMA) everywhere else; offer both. Avoid it when bandwidth is scarce: 64 kbps per stream is heavy, and it’s narrowband, so you leave HD quality on the table.

G.722

Pick it when you want HD voice for the same bandwidth as G.711 and both ends are modern IP phones; it’s royalty-free and widely supported. Avoid it on PSTN-facing or constrained-bandwidth legs (it’s still 64 kbps, with no bandwidth saving over G.711), and don’t expect legacy gateways to speak it.

G.729

Pick it on bandwidth-constrained trunks and high-density deployments: at 8 kbps its payload is one-eighth of G.711’s, the core patents have expired, and Annex B adds voice-activity detection plus comfort noise to save even more during silence. Avoid it for music-on-hold, fax, and in-band DTMF, all of which its speech-optimized model mangles; quality is a notch below G.711 (MOS ~3.9) and it is narrowband only.

Opus

Pick it as your default for anything IP-native: WebRTC, softphones, AI voice agents. It adapts bitrate in real time, runs from narrowband to fullband, has low latency, and ships built-in forward error correction (FEC) and packet-loss concealment that keep calls intelligible through heavy loss. Royalty-free and mandatory-to-implement in WebRTC. Avoid it only where the far end genuinely can’t negotiate it (older SIP hardware and the PSTN), in which case the call has to be transcoded to G.711.

iLBC

Pick it on lossy networks where you can’t use Opus: each frame is encoded independently, so a lost packet doesn’t corrupt its neighbors. Royalty-free; common in early WebRTC. Avoid it for new work: Opus with FEC has effectively superseded it, and iLBC is narrowband.

AMR-WB (G.722.2) & EVS

Pick them in the mobile world: AMR-WB is the 3GPP wideband codec behind VoLTE HD voice. EVS is its successor (super-wideband to fullband, 5.9–128 kbps, with excellent quality and strong packet-loss resilience) and is now the codec for VoLTE and VoNR (Voice over New Radio, i.e. 5G). Avoid them as a default for fixed VoIP: they’re patent-encumbered 3GPP codecs aimed at mobile networks, and outside that ecosystem Opus is the better, freer choice.

Speex & GSM-FR

Don’t pick either for new work. Speex is deprecated; its own authors point you to Opus. GSM Full Rate (06.10) is a 13 kbps legacy mobile codec of purely historical interest. Both exist only for interop with old systems that offer nothing better.

6. What’s deprecated & what’s coming

The deprecated end of the list is unambiguous. Speex is dead: Opus was created in large part to replace it and does everything it did, better. GSM-FR is legacy, surviving only in old equipment. If you’re specifying codecs for a new deployment in 2026, neither should appear in your offer except as a last-resort interop entry.

The mainstream is consolidating fast. Opus dominates VoIP and WebRTC: it is the de facto standard for browser calling, AI voice agents, and modern softphones, and its adaptive bitrate plus FEC make it the strongest general-purpose choice on real networks. In mobile, EVS is taking over from AMR-WB as carriers roll out VoLTE and VoNR, bringing super-wideband quality and better loss resilience to cellular calls.

The genuinely new frontier is neural / ML codecs. Google’s Lyra, Microsoft’s Satin, and Meta’s EnCodec take a fundamentally different approach: instead of compressing the waveform, they transmit a compact set of features and use a machine-learning model on the decode side to reconstruct intelligible speech, achieving usable voice at just a few kbps, at or below the point where traditional codecs fall apart. That’s transformative for ultra-low-bandwidth and degraded-network scenarios. The honest caveat: these are early. Standardization, cross-vendor interop, and the compute cost of running a neural decoder on every call are all unsettled, so for production SIP today they remain promising rather than deployable. Watch the space; don’t bet a trunk on it yet.

7. How DIDHub handles codecs

DIDHub delivers calls over standards-based SIP to whatever endpoint you point them at (PBX, SBC, softphone, or AI voice agent), so codec choice stays in your hands, not locked behind a proprietary stack. On your SIP trunk you configure a codec preference order, and the normal SDP negotiation picks the best shared option per call.

In practice that means: keep G.711 in the list for universal compatibility and guaranteed passthrough, and put G.722 or Opus ahead of it where you know both ends support wideband and you want HD voice. DIDHub passes codecs through wherever possible to avoid the transcoding tax (lower latency, no added quality loss) and transcodes only at the boundaries where it’s unavoidable, such as a wideband IP leg meeting the narrowband PSTN. For the deep dives on individual codecs, see our glossary entries on G.711, G.729, and Opus, and the integrations page for how this plugs into Asterisk, FreePBX, 3CX, Teams, and the AI-voice platforms.

If you want a second opinion on the right codec policy for your call profile (HD internal calling, PSTN-heavy outbound, or AI agents at scale), [email protected] will talk specifics.

Bottom line

Default to G.711 when interop is the priority: it always negotiates, always passes through, and never surprises you. Prefer G.722 or Opus for HD voice whenever you control both ends and both support wideband; Opus is the right default for anything IP-native, G.722 the easy win on modern desk phones at no extra bandwidth. Reach for G.729 only when a constrained link genuinely forces 8 kbps, and keep it away from music, fax, and in-band DTMF. Leave AMR-WB and EVS to the mobile networks that own them, keep an eye on neural codecs without depending on them, and never start new work on Speex or GSM-FR. Get the preference order right, design for passthrough, and the codec stops being something you think about, which is exactly the goal.
