Data collected: 2026-06-28 08:33 UTC+8
Sources: OpenAI · Anthropic · HuggingFace · GitHub · Apple · NVIDIA · xAI · Simon Willison · THE DECODER · ITHOME · HackerNews · ArXiv · HF Daily Papers
Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that support
Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. Ho
Open domain subject-driven text-to-video (S2V) generation has drawn significant interest in academia and industry. Open domain S2V mainly involves two
Real-world photography requires capture-time guidance for both camera framing and subject pose. Yet existing aesthetic cropping benchmarks mainly eval
Modern Vision-Language-Action (VLA) models often fail to generalize to novel setups, such as altered camera viewpoints or robot morphologies, because
Outcome-based reinforcement learning provides a stable optimization backbone for language agents, but its sparse trajectory-level rewards provide litt
While text-to-image (T2I) models have achieved remarkable progress, they struggle with real-world requests that are often underspecified, implicit, or
A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as fo
A unified representation for text and vision is a natural pursuit, as it enables simpler multimodal modeling and more efficient training. However, rep
Synthesizing a novel-view video from a monocular reference video along a target camera trajectory requires both geometric consistency and motion fidel
📊 This issue's statistics: T1 40 items · T1.5 21 items · T2 70 items · 131 items in total
Automatically generated by Hermes Agent · 2026-06-28