the brief

Agent safety and operability took center stage: OpenAI detailed Codex’s sandboxed architecture and examined chain-of-thought monitoring tradeoffs, while Anthropic pushed another wave of Claude Code reliability fixes. On the tooling side, Next.js shipped a canary with meaningful tweaks, the Agents SDK defaulted to gpt-realtime-2, and zero-native promised tiny, cross‑platform web‑UI apps. Research cautioned that delegated LLM edits can corrupt documents and explored emergent modularity in MoE.

the poursit · sip · 10 items

alerts

(01)
  • anthropics/claude-code · feed · May 8, 06:39 PM

    Claude Code patches MCP, auto mode rules

    Release re‑enables enterprise session surveys via OpenTelemetry, adds hard‑deny auto‑mode classifier rules, and fixes MCP servers silently vanishing after /clear across VS Code, JetBrains, and the Agent SDK.

    v2.1.136 — What's changed
      • Added CLAUDE_CODE_ENABLE_FEEDBACK_SURVEY_FOR_OTEL to re-enable the session quality survey for enterprises capturing responses through OpenTelemetry
      • Added settings.autoMode.hard_deny for auto mode classifier rules that block unconditionally regardless of user intent or allow exceptions
      • Fixed MCP servers configured in .mcp.json, plugins, and claude.ai connectors silently disappearing after /clear in the VS Code extension, JetBrains plugin, and Agent SDK
      • Fixed a rare lo...
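
    A minimal sketch of how a hard‑deny rule might look in a settings file — only the settings.autoMode.hard_deny key path comes from the changelog; the rule value format shown here is an assumption:

```json
{
  "autoMode": {
    "hard_deny": [
      "Bash(rm -rf:*)",
      "Bash(git push --force:*)"
    ]
  }
}
```

    Rules listed here would block matching actions unconditionally, regardless of user intent or any allow exceptions.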

    signal 9 · hype 0 · release_notes · claude_code · changelog · source ↗

pulse

(05)
  • @ctatedev · May 9, 12:25 AM

    Zero-native ships Zig-powered app runtime

    Launch enabling native desktop and mobile apps with web UI, tiny binaries, selectable embedded web engines, and support for Next.js/Vite across macOS, Windows, Linux, iOS, and Android.

    Introducing zero-native Build native desktop + mobile apps with web UI and Zig → Tiny binaries, low memory usage → Selectable web engines (WKWebView, WebKitGTK, WebView2, Chromium/CEF) → Next.js, Vue, Svelte, Vite, React → macOS, Linux, Windows, iOS, Android pic.x.com/gMBinlXWOU

    signal 7 · hype 2 · framework · cross_platform · webview · source ↗
  • vercel/next.js · feed · May 8, 11:58 PM

    Next.js 16.3 canary ships fixes

    Pre-release brings Turbopack persistent-cache versioning keyed to the Next.js version, a chunkLoadingGlobal config option, improved rewrite detection during optimistic routing, better devtools error rendering, and fixes for standalone and adapter builds.

    v16.3.0-canary.17 — Core Changes
      • Stabilize unstable_io: #93621
      • Use Next.js version as Turbopack persistent cache versioning key: #93605
      • feat(turbopack): add chunkLoadingGlobal config option: #93488
      • fix(devtools): render nested error messages with HotlinkedText: #93620
      • Fix: Improved rewrite detection during optimistic routing: #93619
      • Fix "type: module" in project dir when using standalone or adapters: #93612
      • Instant Insights: favor reported errors over missing slots: #93709
    Misc Changes
      • Trace-...
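
    In config form, the new option might be set like this — its placement under the turbopack key is an assumption inferred from the PR title, not confirmed documentation:

```javascript
// next.config.js — hypothetical placement of the new option
module.exports = {
  turbopack: {
    // Global the chunk loading runtime attaches to; overriding it avoids
    // collisions when several independent bundles share one page.
    chunkLoadingGlobal: 'myAppChunks',
  },
};
```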

    signal 8 · hype 1 · release_notes · nextjs · turbopack · source ↗
  • @unknown · May 8, 04:43 PM

    Agents SDK defaults to gpt-realtime-2

    RealtimeAgent now uses OpenAI’s gpt-realtime-2 by default with WebRTC and WebSocket paths, simplifying low-latency multimodal agent deployments in browsers and servers.

    To celebrate the realtime model release, @seratch has updated RealtimeAgent to use gpt-realtime-2 by default in the Agents SDK (WebRTC & WebSocket): openai.github.io/openai-agents-… (WebSocket): openai.github.io/openai-agents-…

    signal 7 · hype 1 · agents_sdk · gpt_realtime_2 · webrtc · source ↗
  • openai/blog · feed · May 8, 12:30 PM

    OpenAI details Codex safety architecture

    Engineering deep-dive outlines sandboxing, approvals, network policies, and agent-native telemetry to harden coding agents for enterprise use without stalling developer velocity.

    Running Codex safely at OpenAI — How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption.

    signal 9 · hype 1 · agent_security · sandboxing · network_policies · source ↗
  • @unknown · May 8, 03:18 PM

    Claude Code rolls 60+ fixes

    Anthropic details smoother long-running sessions, a tighter agent loop, broader auth compatibility, and terminal stability—iterating rapidly on agentic coding ergonomics.

    Last week we shipped 50+ Claude Code reliability fixes. This week it's 60+ more. Smoother long-running sessions, a more efficient agent loop, auth that works in more environments, and terminal fixes: 🧵

    signal 7 · hype 2 · claude_code · release_notes · reliability · source ↗

findings

(03)
  • hn/frontpage · feed · May 9, 08:44 AM

    LLMs silently corrupt delegated documents

    New study quantifies content drift and formatting loss when LLMs handle document operations, underscoring the need for verifiable pipelines and stronger edit constraints in agent workflows.

    LLMs Corrupt Your Documents When You Delegate — Article URL: https://arxiv.org/abs/2604.15597 Comments URL: https://news.ycombinator.com/item?id=48073246 Points: 60 # Comments: 19
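
    One mitigation the finding points toward: diff the model's returned document against the original and reject any edit that touches lines outside the requested region. A minimal sketch — the function and its interface are illustrative, not from the paper:

```python
import difflib

def verify_edit(original: str, edited: str, allowed: set) -> bool:
    """Accept an LLM edit only if every changed original line is in
    `allowed` (0-based indices of lines the agent was asked to modify)."""
    sm = difflib.SequenceMatcher(a=original.splitlines(),
                                 b=edited.splitlines())
    for tag, i1, i2, _, _ in sm.get_opcodes():
        # "equal" spans are untouched; anything else must stay inside allowed
        if tag != "equal" and not set(range(i1, i2)) <= allowed:
            return False
    return True

doc = "title\nbody line\nfooter"
ok = verify_edit(doc, "title\nbody line, revised\nfooter", allowed={1})
bad = verify_edit(doc, "title\nbody line, revised", allowed={1})  # footer silently dropped
```

    Pure insertions slip through this check since they touch no original line; a production gate would also bound where new lines may appear.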

  • @OpenAI · May 8, 08:19 PM

    OpenAI analyzes CoT monitoring tradeoffs

    OpenAI reports a limited amount of accidental chain‑of‑thought grading that affected released models, and argues that avoiding penalties on misaligned reasoning during RL preserves monitorability and enables safer agent oversight.

    Chain of thought monitors are a key layer of defense against AI agent misalignment. To preserve monitorability, we avoid penalizing misaligned reasoning during RL. We found a limited amount of accidental CoT grading which affected released models, and are sharing our analysis.

    signal 8 · hype 1 · chain_of_thought · alignment · rlhf · source ↗
  • huggingface/blog · feed · May 8, 04:03 PM

    EMO shows emergent modularity in MoE

    Hugging Face highlights EMO pretraining for mixture‑of‑experts yielding modular behaviors, pointing to more efficient specialization without sacrificing general performance.

    EMO: Pretraining mixture of experts for emergent modularity
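
    For orientation, the routing layer in which such modularity would emerge looks roughly like this — a generic top-k softmax gate, not EMO's specific pretraining recipe:

```python
import math
import random

def top_k_gate(logits, k=2):
    """Pick the k highest-scoring experts for one token and return
    (expert indices, softmax weights normalized over those k)."""
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    m = max(logits[i] for i in top)                 # stabilize the softmax
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

random.seed(0)
gate_logits = [random.gauss(0, 1) for _ in range(8)]  # one score per expert
experts, weights = top_k_gate(gate_logits)
```

    Emergent modularity would show up as stable, input-dependent structure in which experts co-activate, rather than the near-uniform routing that plain load balancing tends to produce.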

    signal 7 · hype 1 · research_blog · mixture_of_experts · pretraining · source ↗

voices

(01)
  • @nicbstme · May 9, 04:14 AM

    Cut HTML token spend with CSS

    Externalizing CSS via a shared stylesheet can reduce HTML token output by roughly 40%, a simple prompt-structure tweak that preserves context and lowers cost for UI‑heavy generations.

    A lot of people are arguing that HTML burns more tokens than markdown. It's true but you can save at least 40% by externalizing the CSS to a template with <link rel="stylesheet" href="./styles.css">. This style.css is your formatting so the LLM will never output CSS again. I x.com/trq212/status/…
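
    A quick way to sanity-check the claim: compare inline-styled markup against the same markup with styles moved to a linked sheet, using a crude whitespace token count — real BPE tokenizers will differ, and the snippets below are illustrative:

```python
inline = (
    '<div style="display:flex; gap:8px; padding:12px; border:1px solid #ddd;">'
    '<span style="font-weight:bold; color:#333;">Total</span>'
    '<span style="color:#666;">42</span></div>'
)
external = (
    '<link rel="stylesheet" href="./styles.css">'
    '<div class="row"><span class="label">Total</span>'
    '<span class="value">42</span></div>'
)

def rough_tokens(s):
    # crude proxy: whitespace-delimited chunks, not a real tokenizer
    return len(s.split())

saved = 1 - rough_tokens(external) / rough_tokens(inline)
```

    The saving grows with the amount of repeated inline styling, since class names amortize across every element that reuses them.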

    signal 6 · hype 2 · token_efficiency · prompt_engineering · html · source ↗