Anthropic Oceanus Leaks and ChatGPT Dreaming

A new checkpoint, 'claude-oceanus-v1-p', has been made available to red teams

Anthropic appears to be gearing up for the public launch of a new version of Mythos that improves on Mythos Preview. A checkpoint of the model, codenamed Oceanus, was recently made available to red teamers. Such programs typically begin a week before a wider launch. This one was reportedly paused after a participant resold access to the model through a Chinese API proxy. It is unknown whether the pause will affect the launch date.

ChatGPT Dreaming V3

OpenAI introduced a new memory synthesis system for ChatGPT designed to improve freshness, continuity, and relevance over longer time horizons. The update began rolling out to Plus and Pro users in the US, with broader availability planned later.

When AI builds itself

Anthropic is accelerating AI development by enabling AI systems to autonomously design and develop their successors, a concept known as recursive self-improvement. Internal benchmarks show that AI-driven processes let a typical engineer ship eight times more code than in previous years.

How we made continuous trace intelligence possible at scale

Braintrust founder Ankur Goyal lays out Topics, the intelligence layer for analyzing production agent traces at scale. Million-token traces with hundreds of spans break every standard NLP tool that expects uniform document shapes. Inspired by Anthropic's Clio paper, the pipeline runs six stages in order: preprocess, facet, embed, cluster, name, and classify. The LLM summary does the one job that makes the rest tractable: because later stages work on the summary, the raw trace never has to fit in an embedding model's context window.
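To make the stage order concrete, here is a minimal sketch of a preprocess, facet, embed, cluster, name, classify flow in plain Python. It is not Braintrust's implementation: the trace shape (a dict with a "spans" list), every function name, the word-truncation stand-in for the LLM summary, the bag-of-words stand-in for an embedding model, and the greedy threshold clustering are all assumptions chosen so the example runs with the standard library alone.

```python
import math
from collections import Counter

def preprocess(trace):
    """Flatten spans into one lowercase string, truncating each span's text."""
    return " ".join(f"{s['name']} {s['text'][:80]}" for s in trace["spans"]).lower()

def facet(text, max_words=30):
    """Stand-in for the LLM summary: keep a short, bounded description.
    A real pipeline would ask a model for the facet here (assumption)."""
    return " ".join(text.split()[:max_words])

def embed(summary):
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(summary.split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, threshold=0.5):
    """Greedy one-pass clustering: join the first cluster whose seed is close enough."""
    groups = []  # each group is a list of indices; index 0 of a group is its seed
    for i, v in enumerate(vectors):
        for g in groups:
            if cosine(vectors[g[0]], v) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def name_cluster(vectors, members):
    """Name a cluster after its most frequent token (an LLM would do this in practice)."""
    total = Counter()
    for i in members:
        total.update(vectors[i])
    return total.most_common(1)[0][0]

def classify(trace, topics, vectors):
    """Assign a new trace to the topic whose seed vector is most similar."""
    v = embed(facet(preprocess(trace)))
    return max(topics, key=lambda t: cosine(vectors[t[1][0]], v))[0]

# Hypothetical traces, invented for illustration only.
traces = [
    {"spans": [{"name": "search", "text": "refund policy lookup"},
               {"name": "reply", "text": "refund approved"}]},
    {"spans": [{"name": "search", "text": "refund status lookup"},
               {"name": "reply", "text": "refund pending"}]},
    {"spans": [{"name": "sql", "text": "select count from orders"},
               {"name": "chart", "text": "plot orders by day"}]},
]

vectors = [embed(facet(preprocess(t))) for t in traces]
groups = cluster(vectors)
topics = [(name_cluster(vectors, g), g) for g in groups]
new_trace = {"spans": [{"name": "reply", "text": "refund denied"}]}
label = classify(new_trace, topics, vectors)
```

The point of the sketch is the ordering: only the bounded output of facet is ever embedded, so trace length stops mattering once that stage has run.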

Qwen-Image-Flash

A study of few-step distillation for Qwen-Image-2.0 found that data composition, teacher guidance, and task mixture strongly affected student model performance.

Nemotron 3.5 Content Safety

NVIDIA released Nemotron 3.5 Content Safety, a unified model for multimodal, multilingual, and customizable enterprise safety enforcement. It supports auditable reasoning and is designed to fit into production moderation pipelines.

Ollama Model Tester (GitHub Repo)

Ollama Model Tester is a CLI tool for comparing local Ollama models by running the same prompt multiple times and saving responses for easy comparison.
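The workflow described above (same prompt, several runs per model, responses saved for comparison) can be sketched as follows. This is not the repository's code: compare_models, its parameters, and the one-JSON-file-per-model layout are hypothetical. The only external behavior assumed is Ollama's local REST endpoint /api/generate on port 11434, called with streaming disabled.

```python
import json
import urllib.request
from pathlib import Path

def ollama_generate(model, prompt, host="http://localhost:11434"):
    """Send one non-streaming request to a local Ollama server and return the text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def compare_models(models, prompt, runs=3, out_dir="responses", generate=ollama_generate):
    """Run the same prompt `runs` times per model and save one JSON file per model.

    `generate` is injectable so the function can be exercised without a server.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for model in models:
        results[model] = [generate(model, prompt) for _ in range(runs)]
        # Model tags such as "llama3:8b" contain characters that are awkward in filenames.
        safe_name = model.replace(":", "_").replace("/", "_")
        payload = {"model": model, "prompt": prompt, "responses": results[model]}
        (out / f"{safe_name}.json").write_text(json.dumps(payload, indent=2))
    return results
```

With a local server running, a call such as compare_models(["llama3:8b", "mistral"], "Explain TCP slow start") would write one file per model under responses/. The model tags here are examples only.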

Defending Code Reference Harness

This repository contains a reference implementation for autonomous vulnerability discovery and remediation with Claude. It can be used to build custom vulnerability pipelines based on general best practices. Anthropic offers a managed option that can find and fix vulnerabilities across multiple projects.

Anthropic says 80% of its new production code is now authored by Claude — how your enterprise can catch up

Anthropic reported that 80% of its new production code is now authored by its AI model, Claude, leading to an 8x increase in code volume per engineer.

Apple's Messages app on iPhone now has a third-party AI agent

Apple approved the third-party AI service Poke for use in its iPhone Messages app. This integration allows users to chat with Poke directly in iMessage to perform various tasks. Some users report issues with response times, likely due to high demand.

Accelerating the next phase of physical AI

Generalist AI secured $400 million to advance physical AGI, with backing from investors including Radical Ventures and NVIDIA.

EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios

EVA-Bench Data 2.0 expands its evaluation to three domains: Airline CSM, Enterprise ITSM, and Healthcare HRSD.

References

This article was informed by reporting and engineering write-ups from the sources below. Please visit them for the original analysis:

Anthropic Oceanus leaks 🤖, ChatGPT Dreaming 💭, recursive self improvement 🚀

Shine Soft Corp synthesizes and commentates on these sources; we do not republish their content.