🎰 Your Parser Isn’t Safe Yet

String parsing works until it doesn’t. Here’s why Phoenix teams switch to NimbleParsec when correctness, clarity, and production errors actually matter.

Welcome to GigaElixir Gazette, your 5-minute digest of Elixir ecosystem news that actually matters 👋.

. WEEKLY PICKS .

🔌 Wireless MIDI Controller Built with AtomVM on ESP32: Developer ships proof-of-concept sending MIDI data from ESP32-C3 over UDP broadcasting to control Arturia Pigments soft synth. Firmware written entirely in Elixir running on AtomVM—no C required. Demonstrates BEAM ecosystem expanding beyond web into hardware and music tech. Open source code available, creator planning to develop into complete product. Fresh reminder that Elixir runs on more than Phoenix servers.

🐄 Build Real-Time Pub/Sub Server with Cowboy Instead of Phoenix: AppSignal tutorial walks through building lightweight text pub/sub supporting HTTP publishing, WebSocket subscriptions, and Server-Sent Events directly on Cowboy. When to skip Phoenix: tight protocol control needed, focused streaming services, avoiding framework overhead. Cowboy shines for telemetry pipelines, IoT ingestion, edge servers. Tutorial covers topic model with ETS ring buffer, replay endpoints, and stats introspection. Phoenix has long run on Cowboy under the hood (newly generated apps now default to Bandit), and understanding the foundation makes you better at both.

📝 AshPhoenix Form Sanitization Pattern for Clean Data: Kamaro Lambert documents three-option approach to handling messy real-world forms. params for initial values, prepare_source for forcing immutable attributes like member_id, prepare_params for transforming inputs on every change. Real example: currency inputs showing "RWF 2,270,000" in browser automatically sanitized to clean decimal before validation. No manual param munging in handle_event, no dirty data reaching resources.

🔍 ExSift Ships MongoDB-Style Query Filtering for Elixir Collections: New library brings sift.js-inspired declarative syntax to filter lists of maps or structs. Supports operators like $eq, $gt, $in, $regex, $elemMatch with dot notation for nested queries. Compilation engine pre-compiles queries into native Elixir function calls—claims 2.3x speedup over runtime interpretation. Useful when filtering in-memory data or API responses where Ecto queries don't apply.

🎙️ José Valim Discusses Tidewave and Coding Agent Architecture: Changelog interviews José on Tidewave's approach: coding agents running inside your browser against localhost, not remote sessions. Key insight: agents need verification loops with REPL access, browser coordination, and framework-aware DOM inspection. On MCP: "For coding agents, you probably just need code." Let agents write and execute directly instead of protocol overhead.

Why Your Parser Passed Review and Still Failed in Production

The predicate parser worked. It passed review. It shipped to production. Then someone queried project_id >= 100 and the system matched > instead of >=, returning wrong results silently. The cond clause order was wrong—>= needed checking before >—and nobody caught it until users complained. The task had seemed simple: transform "project_id = 123" into {:eq, "project_id", 123} for filtering Parquet files.

Standard developer instinct: reach for String.split/2 and a cond block checking each operator. The code worked locally. But the problems compounded. Each branch scanned the string from the beginning until finding a match—inefficient. Every clause was identical except the operator—unmaintainable. Extension meant copy-pasting fragile patterns. The code was smelly, but fixing it wasn't obvious.
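
To make the failure concrete, here is a minimal sketch of that first attempt. It is a reconstruction, not the original code: the NaivePredicate module, its split/3 helper, and the :gt/:gte/:eq tags are hypothetical names, and values are left as strings to keep it short. The only thing it is meant to show is how checking ">" before ">=" produces a wrong answer with no error at all.

```elixir
defmodule NaivePredicate do
  # Hypothetical reconstruction of the cond-based parser.
  # The bug: ">" is tested before ">=", so ">=" can never be reached.
  def parse(input) do
    cond do
      String.contains?(input, ">") -> split(input, ">", :gt)
      String.contains?(input, ">=") -> split(input, ">=", :gte)
      String.contains?(input, "=") -> split(input, "=", :eq)
      true -> nil
    end
  end

  # Every branch does the same work; only the operator and tag differ.
  defp split(input, op, tag) do
    [column, value] = String.split(input, op, parts: 2)
    {tag, String.trim(column), String.trim(value)}
  end
end

IO.inspect(NaivePredicate.parse("project_id = 123"))
# => {:eq, "project_id", "123"}

IO.inspect(NaivePredicate.parse("project_id >= 100"))
# => {:gt, "project_id", "= 100"}
```

The second call is the production bug in miniature: a well-formed tuple comes back, so nothing downstream has any reason to complain.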

Regex seems like the answer. ~r/(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+)/ captures all the parts in one pass. Fewer lines of code. Still fragile. Still hard to extend. The captured "99.99" is still a string, not a float. Error messages when parsing fails are cryptic at best. Any extension means deciphering regex hieroglyphics without breaking existing patterns. A code reviewer gently suggested NimbleParsec instead.
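
A quick sketch of the regex version, using the pattern quoted above with nothing but the standard library. The sample inputs are made up for illustration. It shows the two weak spots in practice: the value comes back as a string and needs a second conversion step, and a bad operator produces a bare nil.

```elixir
regex = ~r/(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+)/

# One pass captures column, operator, and value.
[_full, column, op, value] = Regex.run(regex, "price >= 99.99")
IO.inspect({column, op, value})
# => {"price", ">=", "99.99"}

# The value is still a string; turning it into a number is a separate step.
{number, ""} = Float.parse(value)
IO.inspect(number)
# => 99.99

# A malformed operator gives no hint about where or why it failed.
IO.inspect(Regex.run(regex, "project_id !! 123"))
# => nil
```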

NimbleParsec is a parser combinator library from Dashbit. Write small parsers, combine them into larger operations, build up block by block. Define column_name matching allowed characters, tag the result. Define operator as a choice between >=, <=, !=, =, >, <—order still matters but now it's explicit in the code structure, not hidden in conditional logic. Define value handling quoted strings, numbers, and literals. Compose them: whitespace, column, whitespace, operator, whitespace, value, end-of-string.
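
Roughly what that composition could look like. This is a sketch under assumptions, not the parser from the original story: the PredicateParser module name, the :column/:op/:value tags, the operator atoms, and the choice to support only integers and double-quoted strings are all made up for the example. It is written as a standalone script, so it pulls in the dependency with Mix.install (version requirement assumed).

```elixir
Mix.install([{:nimble_parsec, "~> 1.4"}])

defmodule PredicateParser do
  import NimbleParsec

  # Optional run of spaces or tabs, dropped from the output.
  whitespace = ignore(optional(ascii_string([?\s, ?\t], min: 1)))

  # Column name: letters, digits, underscore.
  column =
    ascii_string([?a..?z, ?A..?Z, ?0..?9, ?_], min: 1)
    |> unwrap_and_tag(:column)

  # Order still matters (">=" before ">"), but it sits in one visible list.
  operator =
    choice([
      string(">=") |> replace(:gte),
      string("<=") |> replace(:lte),
      string("!=") |> replace(:neq),
      string("=") |> replace(:eq),
      string(">") |> replace(:gt),
      string("<") |> replace(:lt)
    ])
    |> unwrap_and_tag(:op)

  # Double-quoted string with the quotes stripped (no escape handling here).
  quoted =
    ignore(ascii_char([?"]))
    |> utf8_string([not: ?"], min: 1)
    |> ignore(ascii_char([?"]))

  # Value: an integer or a quoted string.
  value =
    choice([integer(min: 1), quoted])
    |> unwrap_and_tag(:value)

  predicate =
    whitespace
    |> concat(column)
    |> concat(whitespace)
    |> concat(operator)
    |> concat(whitespace)
    |> concat(value)
    |> concat(whitespace)
    |> eos()

  defparsec :predicate, predicate
end

{:ok, parsed, "", _context, _line, _offset} =
  PredicateParser.predicate("project_id >= 100")

IO.inspect(parsed)
# => [column: "project_id", op: :gte, value: 100]

# On failure, defparsec returns a six-element error tuple whose reason
# is a description of what the parser expected.
{:error, reason, _rest, _context, _line, _offset} =
  PredicateParser.predicate("project_id !! 123")

IO.puts(reason)
```

Note that the integer arrives as 100, not "100", and each named piece (column, operator, value) can be given its own defparsec and tested alone.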

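The payoff appears when things go wrong. Old code parsing "project_id !! 123" returned nil and maybe logged something somewhere. The NimbleParsec-based parser returns {:error, "parse error at line 1, column 12: expected string \">=\"..."}: exact position, exact expectation, exact failure mode. (NimbleParsec itself hands back the reason, the unparsed rest, and the line and byte offset as separate elements; the message above is what you get once they are formatted into one string.) Production debugging transforms from archaeology into reading error messages.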

Smart teams recognize that more code isn't always worse code. NimbleParsec requires more lines than regex. But those lines are labeled, scoped, testable in isolation. Each parser component can be verified independently. Extension means adding new combinators without touching existing ones. The DSL reads like exactly what it does—get whitespace, name, operator, value. Under the hood, NimbleParsec compiles to binary matching clauses that are "absolutely insane" but hidden from the developer.

When should you switch? Simple validation or pattern matching within input—stick with String functions or regex. Anything more complex where you need clear error messages, extensibility, and maintainable code—give NimbleParsec a shot. Your future self debugging production parsing failures will thank you.

Remember, for parsing beyond trivial:

  1. Order-dependent conditionals hide bugs – When clause order matters for correctness, make it explicit in code structure, not implicit in cond blocks

  2. Regex trades maintainability for brevity – Fewer lines of code don't help when extension means deciphering hieroglyphics

  3. Error messages predict debugging time – "nil" versus "expected operator at column 12" determines whether production issues take minutes or hours

  4. Testable components compound reliability – Parsing column names, operators, values independently catches bugs before they combine

. TIRED OF DEVOPS HEADACHES? .

Deploy your next Elixir app hassle-free with Gigalixir and focus more on coding, less on ops.

We're specifically designed to support all the features that make Elixir special, so you can keep building amazing things without becoming a DevOps expert.

See you next week,

Michael

P.S. Forward this to a friend who loves Elixir as much as you do 💜