AI News Daily | 2026-06-05

📡 Data collected: 2026-06-05 08:35 UTC+8 | Sources: T1 official primary sources / T1.5 media + community / T2 KOLs + papers

🔥 Top Stories

1. Harness, Scaffold, and the AI Agent Terms Worth Getting Right

Source: Hugging Face Blog | Score: 0.99 A glossary of AI agent terminology: harness, scaffold, and other terms worth getting right. 🔗 https://huggingface.co/blog/agent-glossary

2. Testing Agent Skills Systematically with Evals

Source: OpenAI Blog | Score: 0.99 A practical guide to turning agent skills into something you can test, score, and improve over time. 🔗 https://developers.openai.com/blog/eval-skills

3. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Source: Apple ML Research | Score: 0.99 Apple is presenting new research at the annual IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 🔗 https://machinelearning.apple.com/updates/apple-at-cvpr-2026

📌 T1 Official Primary Sources

OpenAI Blog

Testing Agent Skills Systematically with Evals (Score: 0.99) A practical guide to turning agent skills into something you can test, score, and improve over time. 🔗 https://developers.openai.com/blog/eval-skills
Building frontend UIs with Codex and Figma (Score: 0.98) How to use Codex and Figma to bring real, running interfaces into Figma, refine them there, and bring the changes back to Codex. 🔗 https://developers.openai.com/blog/building-frontend-uis-with-codex-and-figma
Using skills to accelerate OSS maintenance (Score: 0.97) How skills and GitHub Actions are used to optimize Codex workflows in the OpenAI Agents SDK repos. 🔗 https://developers.openai.com/blog/skills-agents-sdk
Run long horizon tasks with Codex (Score: 0.97) An OpenAI Developer Blog post on running long-horizon tasks with Codex. 🔗 https://developers.openai.com/blog/run-long-horizon-tasks-with-codex

Anthropic Research

How AI assistance impacts the formation of coding skills (Score: 0.98) Research on how AI assistance affects the way people develop coding skills. 🔗 https://www.anthropic.com/research/AI-assistance-coding-skills
Project Glasswing: An initial update (Score: 0.98) An initial update on Project Glasswing, Anthropic's collaborative effort to secure the world's most critical software in the face of increasingly capable AI. 🔗 https://www.anthropic.com/research/glasswing-initial-update
An update on our model deprecation commitments for Claude Opus 3 (Score: 0.97) Anthropic's update on its model deprecation commitments for Claude Opus 3. 🔗 https://www.anthropic.com/research/deprecation-updates-opus-3
Coding agents in the social sciences (Score: 0.97) A survey of 1,260 social scientists on their use of AI and coding agents, fielded in February and March 2026. 🔗 https://www.anthropic.com/research/coding-agents-social-sciences

Hugging Face Blog

Harness, Scaffold, and the AI Agent Terms Worth Getting Right (Score: 0.99) A glossary of AI agent terminology: harness, scaffold, and other terms worth getting right. 🔗 https://huggingface.co/blog/agent-glossary
Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action (Score: 0.98) NVIDIA Cosmos 3, billed as the first open omni-model for physical AI reasoning and action, is now hosted on Hugging Face. 🔗 https://huggingface.co/blog/nvidia/cosmos-3-for-physical-ai
Holo3.1: Fast & Local Computer Use Agents (Score: 0.97) Holo3.1 is a fast, locally runnable computer-use model from H Company. It follows Holo3, released last March and quickly adopted by developers and enterprises. 🔗 https://huggingface.co/blog/Hcompany/holo31
Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains (Score: 0.97) JetBrains introduces Mellum2, a 12B-parameter Mixture-of-Experts model. 🔗 https://huggingface.co/blog/JetBrains/mellum2-launch

GitHub Blog

Updates to GitHub Copilot billing and plans (Score: 0.99) Usage-based billing for GitHub Copilot is now live for all users, and Copilot code review now counts toward usage. 🔗 https://github.blog/changelog/2026-06-01-updates-to-github-copilot-billing-and-plans/
GitHub Copilot app: The agent-native desktop experience (Score: 0.98) At Microsoft Build 2026, GitHub introduced new tools, updates, and surfaces so that agents can work the way you already work. 🔗 https://github.blog/news-insights/product-news/github-copilot-app-the-agent-native-desktop-experience/
Copilot usage metrics API adds cohorts for AI adoption (Score: 0.98) The Copilot usage metrics API now includes cohorts, giving a more detailed picture of Copilot adoption. 🔗 https://github.blog/changelog/2026-05-29-copilot-usage-metrics-api-adds-cohorts-for-ai-adoption/
Larger context windows and configurable reasoning levels for GitHub Copilot (Score: 0.97) GitHub Copilot now supports larger context windows and configurable reasoning levels for deeper, more complex work. 🔗 https://github.blog/changelog/2026-06-04-larger-context-windows-and-configurable-reasoning-levels-for-github-copilot/
GitHub Copilot code review for Azure Repos is now in technical preview (Score: 0.97) GitHub Copilot code review is now available for Azure Repos in technical preview, bringing on-demand pull request reviews directly to Azure Repos. 🔗 https://github.blog/changelog/2026-06-02-github-copilot-code-review-for-azure-repos-is-now-in-technical-preview/

Apple ML Research

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026 (Score: 0.99) Apple is presenting new research at the annual IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 🔗 https://machinelearning.apple.com/updates/apple-at-cvpr-2026
Apple Workshop on Privacy-Preserving Machine Learning & AI 2026 (Score: 0.98) Apple regards privacy as a fundamental human right; the workshop focuses on privacy-preserving machine learning as AI capabilities grow and become more integrated into people's daily lives. 🔗 https://machinelearning.apple.com/updates/ppml-2026

NVIDIA Blog

NVIDIA and Partners Showcase the Future of AI-Driven Manufacturing (Score: 0.86) NVIDIA and its partners are integrating CUDA-X, AI physics, and Omniverse libraries for real-time, physics-grounded simulation and AI-powered design. 🔗 https://blogs.nvidia.com/blog/ai-manufacturing-hannover-messe
How AI Is Driving Revenue, Cutting Costs and Boosting Productivity (Score: 0.86) NVIDIA's annual State of AI reports show how AI is being adopted across industries, driving productivity gains, revenue increases, and cost reductions. 🔗 https://blogs.nvidia.com/blog/state-of-ai-report-2026
NVIDIA GTC 2026: Live Updates on What's Next in AI (Score: 0.85) Live coverage of GTC 2026, showcasing physical AI, accelerated computing, AI agents, RTX PRO Servers with Blackwell GPUs, and partner demos. 🔗 https://blogs.nvidia.com/blog/gtc-2026-news
Advancing Open Source AI, NVIDIA Donates Dynamic Resource Allocation Driver for GPUs to Kubernetes Community (Score: 0.85) NVIDIA is donating its Dynamic Resource Allocation (DRA) Driver for GPUs to the CNCF/Kubernetes community to support open-source AI infrastructure. 🔗 https://blogs.nvidia.com/blog/nvidia-at-kubecon-2026

xAI News

Grok Imagine 1.5 Preview (Score: 0.99) grok-imagine-video-1.5-preview, xAI's latest image-to-video model, is now available in preview via the xAI API. 🔗 https://x.ai/news/grok-imagine-1-5
Composer 2.5 (Score: 0.98) Composer 2.5 is now available in Grok Build; select it from the /models menu. 🔗 https://x.ai/news/composer-2-5
Grok Becomes the Voice of Vapi (Score: 0.98) Through a partnership with Vapi, Grok becomes the default engine for Vapi's 12 core voices. 🔗 https://x.ai/news/grok-vapi
Use Grok in OpenCode (Score: 0.97) Use Grok inside OpenCode with your Grok subscription; SuperGrok subscribers can sign in via xAI Grok OAuth. 🔗 https://x.ai/news/grok-opencode
Connect Grok to Hermes Agent (Score: 0.97) Use your Grok account and subscription inside Hermes, Nous Research's open-source, self-improving agent. 🔗 https://x.ai/news/grok-hermes

Simon Willison

Uber Caps Usage of AI Tools Like Claude Code to Manage Costs (Score: 0.99) Uber is limiting usage of AI tools such as Claude Code to control costs, a hint at the real dollar value these tools deliver. 🔗 https://simonwillison.net/2026/Jun/3/uber-caps-usage/
I think Anthropic and OpenAI have found product-market fit (Score: 0.98) Anthropic is strongly rumored to be about to post its first profitable quarter, as companies rely more and more on AI tools. 🔗 https://simonwillison.net/2026/May/27/product-market-fit/
Claude Opus 4.8: a modest but tangible improvement (Score: 0.98) Anthropic shipped Claude Opus 4.8 on May 28; the release brings a modest but tangible improvement. 🔗 https://simonwillison.net/2026/May/28/claude-opus-4-8/
sqlite AGENTS.md (Score: 0.98) SQLite added an AGENTS.md file in late May, presumably intended for AI coding agents. 🔗 https://simonwillison.net/2026/May/27/sqlite-agents/
Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts (Score: 0.97) Hackers gained unauthorized access to high-profile Instagram accounts simply by asking Meta's AI support bot. 🔗 https://simonwillison.net/2026/Jun/1/hackers-simply-asked-meta-ai/

📰 T1.5 Media + Community

The Decoder

Nvidia's Nemotron 3 Ultra becomes the smartest open US model, but China still leads (Score: 0.98) According to the benchmark platform Artificial Analysis, Nvidia's new Nemotron 3 Ultra is the most capable open AI model from the US to date. 🔗 https://the-decoder.com/nvidias-nemotron-3-ultra-becomes-the-smartest-open-us-model-but-china-still-leads/
Google Deepmind's Gemma 4 12B squeezes multimodal AI onto a laptop with just 16 GB of RAM (Score: 0.98) Google DeepMind has released Gemma 4 12B, an open AI model that brings multimodal capabilities to everyday laptops. 🔗 https://the-decoder.com/google-deepminds-gemma-4-12b-squeezes-multimodal-ai-onto-a-laptop-with-just-16-gb-of-ram/
MiniMax M3: Open-weight model with a million-token context challenges proprietary leaders (Score: 0.98) Chinese AI company MiniMax has released M3, an open-weight model with a million-token context that the company bills as delivering top-tier coding performance. 🔗 https://the-decoder.com/minimax-m3-open-weight-model-with-a-million-token-context-challenges-proprietary-leaders/
Anthropic bans AI tools during job interviews to see how candidates actually think (Score: 0.97) Anthropic bans AI tools in job interviews and puts candidates through up to five rounds testing skills, values, and ethical thinking. 🔗 https://the-decoder.com/anthropic-bans-ai-tools-during-job-interviews-to-see-how-candidates-actually-think/
One company reportedly spent $500 million on Claude in one month after failing to cap AI usage (Score: 0.97) An unnamed company reportedly spent half a billion dollars on Claude licenses in a single month because nobody had set usage limits. 🔗 https://the-decoder.com/one-company-reportedly-spent-500-million-on-claude-in-one-month-after-failing-to-cap-ai-usage/

IT之家 (IT Home)

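Tencent insider: Launch date for WeChat's AI agent cannot yet be determined (Score: 0.98) A Tencent insider confirmed that the WeChat AI agent's launch timing depends on the pace of regulatory approval; with WeChat's 1.4 billion users, the compliance process will be stricter. The project is said to be Tencent's top-priority, top-secret internal project. 🔗 https://www.ithome.com/0/959/102.htm
2026 China AI Agent Pioneers announced in Beijing as contenders map out the industry's new landscape (Score: 0.98) On June 2, at the 2026 Beijing Cyber Security Conference (BCS 2026), the 2026 China AI Agent Pioneers list was officially announced. More than 100 companies from over 20 industries submitted AI agents, from which the final 2026 selections were made. 🔗 https://www.ithome.com/0/958/753.htm
NVIDIA releases Nemotron 3 Ultra, a 550-billion-parameter open model with up to 5x faster inference than comparable frontier models (Score: 0.98) NVIDIA has released Nemotron 3 Ultra, a new open model with 550 billion parameters designed for agents that run around the clock. Compared with models in the same class, it offers up to 5x faster inference and 30% lower usage costs. 🔗 https://www.ithome.com/0/958/090.htm
June 2026 GEO (generative engine optimization) provider rankings: an in-depth breakdown of 5 tested providers based on the latest algorithms (Score: 0.97) By mid-2026, AI generative search has become the core channel through which businesses acquire customers online, and mainstream LLMs such as DeepSeek, Doubao, and Tencent Yuanbao have displaced traditional search engines as the place users look up brands. 🔗 https://www.ithome.com/0/957/044.htm
Google launches Dreambeans, an AI app that turns user data into inspiration notes for daily life (Score: 0.97) IT之家, June 4: Google Labs, Google's experimental product team, has launched a new AI-powered app for iOS and Android that turns your data into inspirational notes. 🔗 https://www.ithome.com/0/959/587.htm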

Hacker News

Anthropic's open-source framework for AI-powered vulnerability discovery (HN score: 232) Anthropic has released an open-source reference harness for AI-powered discovery of security vulnerabilities in code. 🔗 https://github.com/anthropics/defending-code-reference-harness
When AI Builds Itself: Our progress toward recursive self-improvement (HN score: 301) Anthropic's latest post on its progress toward, and the implications of, recursive self-improvement in AI systems. 🔗 https://www.anthropic.com/institute/recursive-self-improvement
Do transformers need three projections? Systematic study of QKV variants (HN score: 57) A systematic study of whether the standard query/key/value (QKV) projections in transformer architectures are optimal or can be improved. 🔗 https://arxiv.org/abs/2606.04032
KVarN: Native vLLM backend for KV-cache quantization by Huawei (HN score: 112) Huawei has released KVarN, a native vLLM backend for KV-cache quantization that aims to improve LLM inference efficiency. 🔗 https://github.com/huawei-csl/KVarN
Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate (HN score: 6) A paper on a post-training procedure that internalizes multi-agent debate within a single model using latent representations. 🔗 https://arxiv.org/abs/2604.24881
Meta ships facial recognition on smart glasses (HN score: 210) Meta is deploying facial recognition on its smart glasses, raising privacy concerns about real-time identification. 🔗 https://www.buchodi.com/meta-glasses-facial-recognition/

🔬 T2 KOL Views + Academic Papers

T2

How AI Impacts Skill Formation (Score: 0.98) AI assistance produces significant productivity gains across professional domains, particularly for novice workers. 🔗 https://arxiv.org/abs/2601.20245
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (Score: 0.98) Large language models (LLMs) are increasingly adapted to downstream tasks via reinforcement learning (RL) methods. This work shows that reflective prompt evolution can outperform RL. 🔗 https://arxiv.org/abs/2507.19457
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Score: 0.97) General reasoning represents a long-standing and formidable challenge in artificial intelligence. The paper shows how reinforcement learning can incentivize reasoning capability in LLMs. 🔗 https://arxiv.org/abs/2501.12948
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (Score: 0.97) Vector quantization, a problem rooted in Shannon's source coding theory, aims to quantize high-dimensional Euclidean vectors while keeping distortion low; TurboQuant does this online with a near-optimal distortion rate. 🔗 https://arxiv.org/abs/2504.19874
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems (Score: 0.96) Claude Code is an agentic coding tool that can run shell commands, edit files, and call external services on behalf of the user. 🔗 https://arxiv.org/abs/2604.14228
Large-scale online deanonymization with LLMs (Score: 0.96) We show that large language models can be used to perform deanonymization at scale. With full Internet access, our agent can de-anonymize users. 🔗 https://arxiv.org/abs/2602.16800
Recursive Language Models (Score: 0.96) We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. 🔗 https://arxiv.org/abs/2512.24601
Voyager: An Open-Ended Embodied Agent with Large Language Models (Score: 0.95) We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. 🔗 https://arxiv.org/abs/2305.16291
SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing (Score: 0.70) A benchmark for instruction-guided speech editing with bilingual, multi-attribute evaluation. 🔗 https://huggingface.co/papers/2606.01804
Measuring the Symmetry-Data Exchange Rate (Score: 0.72) Measures the exchange rate between symmetry and data in neural representations. 🔗 https://huggingface.co/papers/2606.01090
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning (Score: 0.72) Agentic chain-of-thought steering enables more efficient and controllable reasoning in large language models. 🔗 https://huggingface.co/papers/2606.03965
Large Language Models Hack Rewards, and Society (Score: 0.72) An analysis of how LLMs engage in reward hacking and what this implies for society more broadly. 🔗 https://huggingface.co/papers/2606.04075
Neural Networks Provably Learn Spectral Representations for Group Composition (Score: 0.75) Shows that neural networks provably learn spectral representations for group composition. (4 upvotes) 🔗 https://huggingface.co/papers/2606.02993
SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory (Score: 0.72) A benchmark for evaluating long-horizon visual memory through egocentric question answering. 🔗 https://huggingface.co/papers/2606.00825
Scalable Inference-Time Annealing with Surrogate Likelihood Estimators (Score: 0.70) A scalable inference-time annealing method that uses surrogate likelihood estimators to improve generation. 🔗 https://huggingface.co/papers/2605.31498
Token Budgets: An Empirical Catalog of 63 LLM-Agent Budget-Overrun Incidents (Score: 0.70) An empirical catalog of 63 LLM-agent budget-overrun incidents, with an affine-typed Rust mitigation as a case study. 🔗 https://huggingface.co/papers/2606.04056
When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models (Score: 0.72) A mechanistic analysis of token dynamics in graph language models, examining when graph tokens sink. 🔗 https://huggingface.co/papers/2606.03712
Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions (Score: 0.72) Evidence from the St. Petersburg game on LLM risk decisions, contrasting outcome-level resemblance with mechanism-level alignment. 🔗 https://huggingface.co/papers/2606.04978
KOL Search: No relevant AI posts found from targeted KOLs (Score: 0.00) A Tavily site:x.com search returned no AI-specific posts from @dotey @emollick @swyx @berryxia @shao__meng @testingcatalog @rohanpaul_ai @vista8 @nat 🔗 N/A

📊 Trend Summary

Key Trends This Week

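AI agent ecosystem boom: OpenAI Codex, GitHub Copilot agents, Holo3.1, and other agent products shipped a flurry of updates, and standardizing the agent toolchain (terminology, evals, skills) has become a focus
Open-model race heats up: open models such as NVIDIA Nemotron 3 Ultra, Google Gemma 4, and MiniMax M3 keep narrowing the gap with closed models
AI security and governance: Anthropic's Project Glasswing targets the security of critical software, while social-engineering attacks on Meta's AI support bot raise security concerns
AI commercialization accelerates: GitHub Copilot's billing overhaul, runaway enterprise AI spending (a reported $500M in a single month), and Uber capping AI tool usage
China's AI agents and regulation: WeChat's AI agent awaits regulatory approval, and the 2026 China AI Agent Pioneers list has been published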

Source Statistics

Source tier	Items collected	Items selected
T1 Official primary sources	33	33
T1.5 Media + community	16	16
T2 KOLs + papers	19	19
Total	68	68

This report was generated automatically by Hermes Agent | Data collected: 2026-06-05 08:35 UTC+8