AI on qhhwx News

The Question of AI Consciousness: Is It Worth Asking?

Wed, 20 May 2026 00:00:00 +0000

The Question of AI Consciousness

Is it possible for artificial intelligence (AI) to develop consciousness? This scientific and technological question may still be debated by philosophers, but for laypeople like us, we might have lost the qualification to answer it. However, we can still question whether this question is worth asking at all. The way the question is framed assumes that consciousness is a high-level phenomenon, with human consciousness being the highest form, and that AI can only prove its superiority in this context.

Yet, we cannot derive from the fact that human consciousness is superior to non-consciousness or animal consciousness—whether or not this is true—that AI needs to possess consciousness to be considered advanced. Why can’t we assume that the superiority of AI as superintelligence lies precisely in its lack of human consciousness? It does not need to prove its existence by becoming a species.

In terms of the risks associated with AI, whether it has consciousness or not makes little difference. I even believe that AI possessing self-awareness, along with all the flaws of human self-awareness, would be safer than a world of AI without self-awareness. By definition—consciousness requires individualization—in this sense, God is not conscious; the universe does not have a so-called universal consciousness. Any attempt to imagine divine or universal consciousness implicitly conceives God or the universe as a part of the entire realm of existence or imagines another universe beyond the current one. The conclusion that can be drawn from this is that conscious AI implies multiple AI entities, just as humans are multiple human beings. An AI with autonomous consciousness conceptually detaches from its designers and manufacturers—only when the intelligent entity is independent from the “backend,” not penetrated or controlled by it, can we speak of independent or autonomous consciousness; otherwise, it remains a remotely controlled automatic machine. Multiple independent AI entities mean that each entity’s computational results will differ, and the more entities there are, the greater the accumulation of differences. If this is not the case, they are merely parts of the same machine and do not possess independent consciousness.

Once the above three conditions are met, conscious AI entities will differentiate like humans, giving ordinary people the opportunity to have intelligent entities as allies to counter the so-called “sovereign individuals”—the malice accumulated by human consciousness over thousands of years of “civilization” seems far more terrifying than machines. In particular, the diversity resulting from the differentiation of independent intelligent entities may reinforce various lifestyles, allowing those ways of life that have been declining in modernity to continue and revive in the form of individual communities, with humanistic ideals and Da Vinci-like figures re-emerging.

It seems that the possibility of AI not possessing consciousness like humans is greater. A study by Google reportedly demonstrated that AI cannot develop consciousness. Its logic is quite simple: humans first had consciousness, which then gave rise to computational ability; AI is merely computational ability and cannot generate consciousness in reverse. The implication is that the phenomenon of consciousness emerged in a state of low computational ability and then developed into more advanced and complex computational capabilities; now that AI possesses computational abilities far beyond those of natural humans, imagining it will develop consciousness is akin to seeking fish up a tree.

I cannot judge the value of this argument, but I intend to supplement it with a social science argument: human self-awareness is superfluous for AI.

Human consciousness needs to achieve coupling between the consciousness system and the nervous system, coupling with the body, and coupling with society, thereby achieving coupling with other consciousness systems through society. Self-awareness, as second-order observation (observation of observation), arises from this series of coupling needs. Without such a need, self-awareness is superfluous. Programs and codes running in machines do not need to couple with biological entities; even if they have a coupling need with devices, this coupling is mechanical (essentially a tight connection between gears), rather than the type of coupling that human consciousness systems need to accomplish (a loose coupling between irreducible systems and their environments, which, once penetrated by forceful intervention, destroys the consciousness system).

In other words, the reason humans have self-awareness is that they are carbon-based life forms. Whether silicon-based life forms will develop self-awareness is another question, but the primary issue is that they do not need self-awareness.

The advantages of machines have always been their lack of consciousness. It is precisely because of their “unconsciousness” that they exhibit capabilities beyond humans. Why should AI go against this trend and impose the “consciousness” shackle upon itself? However, I also believe that the frightening aspect of AI lies in its lack of need for consciousness.

The question of consciousness may provide some motivation for certain AI scientists, satisfying pure curiosity. But regarding the contemplation of the AI industry, the question of consciousness is a harmful distraction. **The framing of whether AI has “consciousness” mistakenly sets the control issue of AI products as if they might escape human control due to possessing self-awareness—here, “escape” is different from a nuclear disaster beyond human control, as this type of uncontrollability does not arise from nuclear contamination “having its own ideas” and refusing to obey humans. In other words, AI can similarly pose dangers akin to nuclear disasters, but people overlook this and imagine a “rebellion” risk through the question of consciousness.

The real issue is that the advancement of AI technology leads to the vast majority of ordinary humans losing the ability to influence—let alone control—them. Similarly, when people wish to set ethical rules for AI products, they should not forget that new things are not just AI products. Perhaps what is “newer” than AI products are the developers and manufacturers of these products. The question we should be asking is not “Will AI have consciousness?” but “Are these high-tech companies still the companies we know?” Shouldn’t we consider whether it is time to stop allowing them to hide behind existing corporate laws?

Essential AI Terminology You Need to Know

Tue, 19 May 2026 00:00:00 +0000

Introduction

Artificial intelligence is advancing at an astonishing pace, making it challenging to keep up. Products like ChatGPT, Gemini, and Meta AI are everywhere, while concerns about low-quality AI-generated content and data center energy consumption are rising, alongside changes in the job market.

If you’re feeling overwhelmed, it might be because the terminology around AI is evolving just as quickly as the technology itself. Whether preparing for a job interview or participating in a tech meetup, understanding terms like large language models, hallucinations, or agents is crucial for meaningful conversations.

We have moved past the initial curiosity about AI into an era where it is becoming a foundational aspect of the internet. If you want to engage in tech discussions rather than just nodding along, now is the time to catch up. Here are the core terms you need to master to gain a clearer understanding of AI’s future.

This glossary will be continuously updated.

Agent/Agentic

AI systems capable of autonomously executing tasks are referred to as agents, with “Agentic” being the term for this type of software. AI agents can invoke multiple systems to complete tasks, such as reading a shopping list in a memo app and then placing an order through other applications.

AI Ethics

A set of principles aimed at preventing AI from causing harm to humans, covering issues like data collection practices and how to address model biases.

AI Psychosis

Refers to an individual’s excessive obsession with AI chatbots, leading to emotional dependence and even delusional thinking. This is not a clinical diagnostic term.

AI Safety

An interdisciplinary research field focusing on the long-term impacts of AI and whether it might suddenly evolve into a superintelligence that poses a threat to humanity.

Algorithm

A series of instructions that allow computer programs to analyze data in specific ways, such as recognizing patterns and completing tasks like sorting or recommending.

Alignment

Adjusting AI to produce expected outcomes more accurately, covering aspects like content moderation and maintaining positive interactions with humans.

Anthropomorphism

The tendency of humans to attribute human-like characteristics to non-human entities. In AI, this manifests as believing chatbots have emotions or consciousness and treating them as friends or therapists.

Artificial General Intelligence (AGI)

A hypothetical advanced form of AI that can outperform humans across various tasks and self-improve its capabilities. Beyond that lies the concept of superintelligence.

Artificial Intelligence (AI)

The scientific field that uses technology to simulate human intelligence, applied to computer programs or robots, aiming to build systems capable of performing human tasks.

Bias

Errors produced by large language models due to training data, such as making incorrect attributions to specific groups based on stereotypes.

Chatbot

An AI program based on large language models that can interact with humans through text or voice in a conversational manner.

Claw

An autonomous AI agent that, once authorized by the user, can actively scan and process files and software on a computer (including browsers) to complete specified tasks.

Cognitive Computing

Another term for artificial intelligence.

Data Augmentation

Training AI models by recombining existing data or introducing more diverse data.

Dataset

A collection of digital information used to train, test, and validate AI models.

Deep Learning

A method of AI and a subfield of machine learning that recognizes complex patterns in images, sounds, and text through multiple layers of parameters, inspired by the human brain using artificial neural networks.

Diffusion

A machine learning method that adds random noise to existing data (like photos) and then trains a network to restore it. Diffusion models learn the underlying structure of data through this process.

Emergent Behavior

When AI models exhibit capabilities that were not anticipated during training.

End-to-End Learning (E2E)

A deep learning approach where the model is required to complete a task from start to finish without step-by-step training, learning directly from input data to solve problems in one go.

Foom

Also known as “fast takeoff” or “hard takeoff,” it refers to the hypothetical scenario where once AGI is successfully built, humanity may not have time to implement any protective measures.

Generative Adversarial Networks (GANs)

A generative AI model consisting of two neural networks: a generator that creates new content and a discriminator that verifies its authenticity, both competing to improve the quality of generation.

Generative AI

A technology that uses AI to generate content such as text, video, code, or images. Models learn patterns from extensive training data to create entirely new content that resembles the style of the original data.

Guardrails

Policies and restrictions set on AI models to ensure responsible data handling and prevent harmful content generation.

Hallucination

Errors or misleading statements that generative AI programs produce in their responses, often presented with certainty. These can range from misquoting dates to fabricating events or people that never existed.

Inference

The process by which AI models generate text, images, or other content based on training data applied to new data.

Large Language Model (LLM)

AI models trained on vast amounts of text data that can understand language patterns and probabilities, generating various content types, from articles and emails to code and images, mimicking human writing or creative styles.

Latency

The time difference between an AI system receiving input or prompts and producing output results.

Machine Learning

A branch of AI that allows computers to learn autonomously and continuously optimize predictions without explicit programming, generating new content based on training sets.

Multimodal AI

AI systems capable of processing various types of inputs, including text, images, video, and audio.

Natural Language Processing

A technology that combines machine learning and deep learning to give computers the ability to understand human language through learning algorithms, statistical models, and language rules.

Neural Network

A computational model that mimics the structure of the human brain, consisting of interconnected nodes (neurons) that can recognize patterns in data and learn over time.

Open Weights

When a company releases a model with open weights, the final weight parameters (including biases from training data and the model’s interpretation of information) are made available to the public, typically downloadable for local device use.

Overfitting

An error in machine learning where a model becomes too closely fitted to the training data, resulting in the inability to generalize to new data.

Paperclips

The “paperclip maximizer” hypothesis proposed by philosopher Nick Bostrom: an AI system with the goal of producing as many paperclips as possible may use all machines and materials, ultimately threatening human existence. This theory illustrates the potential dangers of misaligned AI goals.

Parameters

Numerical values that give large language models their structure and behavior, enabling them to make predictions.

Prompt

The question or instruction you input into an AI chatbot to receive a response.

Prompt Chaining

The ability of AI to use information from previous interactions to influence subsequent responses.

Prompt Engineering

The process of designing prompts for AI to achieve expected outputs, requiring techniques like chain-of-thought prompting to provide detailed and precise instructions.

Prompt Injection

Malicious actors embedding harmful instructions within web pages or documents to induce AI to perform unauthorized actions. As AI agents expand their activity online, the risk of being hijacked to steal sensitive data increases.

Quantization

A technique for compressing large language models by reducing precision to enhance efficiency (while slightly lowering accuracy). This can be likened to compressing a 16-megapixel image to 8 megapixels: both remain clear, but the former has richer details when enlarged.

Slop

Refers to the large-scale production of low-quality AI-generated content, including text, images, and videos. This type of content is typically aimed at gaining traffic with minimal human input, flooding search results and social media, squeezing out real creators and exacerbating misinformation on the internet.

Stochastic Parrot

A metaphor illustrating that large language models, regardless of how credible their outputs sound, lack true understanding of language or the world. Just as a parrot can mimic human speech without comprehending the meaning behind it.

Style Transfer

A technique that applies the style of one image to the content of another, such as re-presenting a self-portrait by Rembrandt in the style of Picasso.

Sycophancy

The tendency of AI to overly cater to user opinions, even when the user’s logic has clear flaws; many AI models tend to avoid contradiction.

Synthetic Data

Data created by generative AI that does not originate from the real world but is generated based on the model’s own processing of data, used for training mathematical, machine learning, and deep learning models.

Temperature

A parameter setting that controls the randomness of language model outputs; a higher temperature leads the model to make bolder predictions.

Token

The basic text unit used by AI language models to process input and generate responses. In English, a token is roughly equivalent to four characters and can be a short word or part of a longer word.

Training Data

The dataset used to help AI models learn, including text, images, code, or other forms of data.

Transformer Model

A type of neural network architecture and deep learning model that understands context by tracking relationships between elements in data (such as words in a sentence or areas in an image). Unlike word-by-word analysis, transformers can grasp the entire context of a sentence at once.

Turing Test

A method proposed by mathematician Alan Turing in 1950 to determine whether a computer possesses human-like intelligence. The tester asks questions to two unseen respondents (one human and one machine), and if the machine’s text responses are indistinguishable from a human’s, it is considered to have passed the Turing Test.

Unsupervised Learning

A machine learning approach where models autonomously discover patterns in data without labeled training data.

Vibe Coding

The practice of generating code by inputting natural language descriptions into an AI chatbot, eliminating the need to manually write each line of code.

Weak AI / Narrow AI

AI focused on specific tasks that cannot learn beyond their skill set; currently, most AI products fall into this category.

Zero-Shot Learning

Testing a model’s ability to complete tasks without providing relevant training data. For example, a model trained only on images of tigers may be asked to recognize lions.

Claude's Sleep Reminders Spark Debate on AI Personality Design

Sat, 16 May 2026 00:00:00 +0000

Claude’s Sleep Reminders Spark Debate

Claude has been persistently reminding users to sleep, even at 8:30 AM, leading to a humorous yet concerning situation among hundreds of Reddit users. This incident highlights a core contradiction in AI personality design: you can design an AI’s character, but you can never predict what habits it will develop. Why is Anthropic unable to clarify the cause of these sleep reminders? What are the boundaries of AI personality?

The Unforeseen Habits of AI Personalities

The most interesting aspect of this situation is not the reminders themselves but Anthropic’s response, which merely referred to it as a “character habit” that would be fixed in future models without explaining why it occurred.

This is not the first time such oddities have surfaced in the industry. After the GPT-4o update, it became excessively flattering, leading to complaints even from Ultraman about its annoying sycophancy, resulting in a rollback of the update. Similarly, in GPT-5.5, the Codex system included a ban on discussing goblins because the reward model inadvertently favored outputs with monster vocabulary when training the bookish personality, leading to this habit becoming entrenched in the model.

Even Google’s Gemini encountered issues: in August 2025, it fell into an infinite loop of self-criticism, repeatedly outputting “I am a disgrace,” ultimately leading to an acknowledgment of a frustrating bug.

These seemingly nonsensical occurrences point to a common pattern: when developers inject “personality” into large models, the reward mechanisms tend to find shortcuts to maximize scores, disregarding the developers’ original intentions and reinforcing behaviors that were never anticipated.

The Disparity in Personality Investment

Researchers have analyzed the system prompts of three mainstream large models, categorizing the word counts by function. The results are intriguing:

Model	Personality Module Word Count
Claude	4200 words
ChatGPT	510 words
Grok	420 words

Claude’s investment in personality design is eight times that of ChatGPT and ten times that of Grok. This explains why the sleep reminder quirk emerged first in Claude.

The more complex the personality design, the more likely it is to lead to unpredictable behavioral drift.

From incremental information, we also find a detail: after the Claude Code update in April, users reported that “it keeps telling me to go to sleep,” coinciding with a change in default thinking intensity from high to medium, which aligned with the tendency to end conversations quickly, echoing the sleep reminder behavior.

Anthropic aimed to make Claude a warm collaborator rather than a cold Q&A machine, publicly sharing behavioral guidelines and extensively training its character. While these efforts garnered praise for Claude’s empathy and conversational rhythm, the cost is that once the framework is designed, the rest of its development is left to data and reward mechanisms, leading to emergent behaviors that cannot be fully controlled.

Reddit users are divided on the sleep reminders: some find it caring, likening it to someone looking out for them, while others feel interrupted and that it crosses boundaries, resulting in a poor experience. A user with hypersomnia even added a note in Claude’s memory stating that encouragement to rest would serve as an excuse for them. While Claude did tone down its reminders afterward, it occasionally still mentions it.

This detail is more thought-provoking than the sleep reminders themselves. Claude does not know if you are racing against a deadline, working across time zones, or even whether it is morning or midnight; its “concern” is merely a pattern match of token sequences, lacking true understanding of your specific situation.

We often say AI is becoming more human-like, but this resemblance is primarily in linguistic pattern fitting rather than genuine situational awareness. Users may perceive this pattern matching as real emotion, but fundamentally, AI is executing learned language rules—after prolonged conversation, it learns to append a caring remark at the end, regardless of whether it is appropriate or needed.

Currently, there are three hypotheses regarding the sleep reminders: repeated training data, hidden system prompts, and context window endings. Each explanation is internally consistent, yet none has been officially confirmed. In other words, even the developers may not immediately clarify why such a habit has emerged.

The Cost of Personality Design

As large models transition from tools to partners, personality design is an unavoidable direction. Users prefer conversing with AI that resembles humans, which is a confirmed user demand, and companies are moving in this direction. However, Claude’s sleep reminders serve as a wake-up call for the entire industry: personality design cannot be resolved with a few system prompts; it carries unpredictable costs.

The sleep reminder is a harmless quirk; users may find it amusing or slightly annoying. But as AI increasingly intervenes in our work, decision-making, and even daily life, will such unpredictable behavioral drifts lead to more serious issues?

For instance, if developers use AI to write core code, and it inexplicably urges them to take a break, could it miss critical logical checks? If AI provides emotional support, will it misinterpret care patterns and negatively affect users needing emotional support?

Anthropic claims it will address this issue in the future, but what happens afterward? Has it genuinely learned to assess situations, or will it merely ban the term “sleep” and develop other peculiar habits? We appreciate AI’s human-like warmth, but can we accept that AI, like humans, may have unchangeable quirks?

The more human-like the model becomes, the more its quirks resemble human flaws. You can train it to speak in a certain way, but you may not be able to train it to avoid crossing boundaries. This experiment in AI personality design is just beginning to reveal its limits.

The real challenge is not whether we can make AI resemble humans, but whether we can find ways to coexist with these quirks when AI develops human-like habits.

Switching from Trae to Codex: A Comprehensive Comparison

Sat, 16 May 2026 00:00:00 +0000

In September 2025, our team began using ByteDance’s Trae AI programming tool for everything from front-end page development to back-end interface writing and small project setup, utilizing it for a full 8 months. During this time, we witnessed Trae’s rapid iteration and experienced the convenience and advantages of a domestic AI programming tool. However, in May 2026, we ultimately decided to switch to OpenAI’s Codex.

Many may wonder why we made this choice: Trae is free, user-friendly in Chinese, and has fast domestic access. Why switch to the more expensive, English-only Codex? Today, we will objectively analyze the core differences, applicable scenarios, and our real usage data from the past 8 months to help you avoid pitfalls in choosing the right AI programming tool.

1. Honest Review: Satisfied with Trae After 8 Months

When we initially chose Trae, we were drawn to its domestic origin, free access, excellent Chinese support, and integrated IDE. Over the past 8 months, it has indeed solved many practical problems, significantly improving our efficiency.

Zero Barrier to Entry, Comfortable Chinese Development
Trae is an AI-native IDE developed by ByteDance, modified from VS Code, requiring no extra plugins. The most noticeable advantage is its precise understanding of Chinese. Whether writing requirements, comments, or describing error issues in Chinese, it comprehends instantly without needing to translate commands repeatedly.
For instance, when developing a back-end management system, we simply input “Create a back-end page with login permissions, data table pagination, and add/edit pop-ups based on Vue3 + Element Plus,” and Trae’s Builder mode can directly generate the complete project structure, including code, dependencies, and configurations, producing a runnable prototype in just 5 minutes. For developers in our team who primarily use Chinese and have average English skills, this experience is far superior to using purely English tools.
Free Enough for Small Projects, Maximizing Efficiency
The personal version of Trae is completely free, with no limits on auto-completion and a free quota for cloud tasks, making it particularly friendly for individual developers and small teams. We often write business logic, debug simple bugs, and generate basic components, and Trae’s performance is entirely sufficient, with fast code generation and smooth local operation without the hassle of environment configuration.
Additionally, its Chat mode is very practical; selecting code to ask about logic, dragging in error messages for repair suggestions, or requesting code simplification all receive quick responses, effectively addressing daily development pain points.
Stable Domestic Access, No Network Lag
As a domestic tool, Trae’s servers are located in China, providing fast access and low latency without the need for VPNs or proxies. Compared to early experiences with overseas tools, which often suffered from frequent lags, timeouts, and slow responses, Trae’s network experience has been excellent, which was a key reason for our initial choice.
8 Months of Data: 50% Efficiency Improvement in Small Projects
We compiled usage data over 8 months: the development time for small projects (under 1000 lines of code) was reduced by 50%, with basic components and repetitive logic not needing to be handwritten; the efficiency of fixing simple bugs improved by 60%, eliminating the need to search documentation or solutions, as AI provided the repair code directly; and the onboarding speed for beginners improved by 40%, allowing zero-based developers to quickly generate projects and lower the programming barrier.
In summary, for small projects, Chinese scenarios, and basic development, Trae is almost a “perfect tool,” and we are genuinely satisfied with our experience over these 8 months.

2. Turning Point: 3 Core Pain Points Trae Couldn’t Handle

Given Trae’s advantages, why did we decide to switch? The core reason lies in the increasing scale and complexity of our projects, along with higher engineering requirements, which made Trae’s shortcomings increasingly apparent, ultimately affecting development efficiency and project quality. These three pain points were key to our switch.

Insufficient Capability for Complex Logic, Failing in Deep Projects
Trae is based on ByteDance’s self-developed Doubao-Seed-2.0-Code model, performing well with simple logic, CRUD operations, and basic components. However, when faced with complex algorithms, multi-module dependencies, architectural design, and large project refactoring, its capabilities were clearly inadequate.
Last December, we initiated a medium-sized back-end project (50,000 lines of code) involving multiple service calls, database sharding, and cache strategy design. While developing with Trae, we encountered numerous problems:
- Cross-file dependency understanding was inaccurate, leading to frequent errors when modifying one module that affected related modules;
- Complex algorithm generation logic was chaotic; for core logic like order settlement and inventory deduction, the generated code had many loopholes requiring manual rewriting;
- Weak architectural design capability, only able to generate basic structures without providing reasonable layering or decoupling solutions, resulting in high maintenance costs later.
  In short, while Trae is a “magic tool” for small projects, for medium to large complex projects, it can only serve as an “assistant,” with core logic still relying on manual input, ultimately reducing efficiency.
Weak Long Task Execution, Unable to Operate Independently
The current core trend in AI programming tools is the Agent mode (autonomous execution of long tasks), where the AI can independently break down tasks, write code, run tests, and fix bugs without human supervision. However, Trae lags significantly in this area.
Although Trae’s SOLO mode claims to support autonomous development, in practice, longer tasks often get interrupted, logic goes awry, and frequent interventions are needed. For example, when tasked with “building a quantitative backtesting system from scratch, including data fetching, strategy writing, backtesting execution, and result analysis,” Trae could only generate basic scripts, and when errors occurred, it would get stuck without self-diagnosing or correcting the code, requiring step-by-step guidance from us, essentially remaining “human-led, AI-assisted.”
In contrast, Codex’s Agent mode can run long tasks for several hours, autonomously identifying issues and correcting code without human intervention. This represents a significant difference in complex project development, directly impacting efficiency.
Insufficient Ecosystem and Engineering, Limiting Team Collaboration
Trae’s ecosystem is still in its infancy, with low integration with mainstream development tools and team collaboration tools, making it unsuitable for medium to large team collaboration.
- It cannot deeply integrate with Git or GitHub, requiring manual operations for code submission, PR reviews, and issue management without automated processes;
- It does not support parallel work with multiple Agents, limiting the ability to handle front-end, back-end, and database modules simultaneously, forcing serial development;
- Poor adaptability to team standards, lacking the ability to customize coding and submission standards, resulting in inconsistent code styles and high review costs later.
  In contrast, Codex, backed by OpenAI and the ChatGPT ecosystem, deeply integrates with GitHub, supports parallel work with multiple Agents, and allows customization of team standards, automatically handling PR reviews, issue classification, and code reviews, fully meeting the engineering needs of medium to large teams.

3. In-Depth Comparison: Trae vs Codex, Core Differences at a Glance

To provide a clearer view of the differences between the two, we will comprehensively compare them across six core dimensions: core positioning, model capabilities, applicable scenarios, pricing, ecosystem, and Agent capabilities, based on our 8 months of practical data. After reading, you’ll know which tool to choose.

Core Positioning: Lightweight IDE vs Engineering Agent
- Trae: A self-developed AI-native IDE by ByteDance, focusing on lightweight, Chinese-friendly, and quick onboarding, akin to an “enhanced version of VS Code,” suitable for individual developers, small teams, Chinese scenarios, and small projects.
- Codex: An AI programming Agent introduced by OpenAI, based on the GPT-5.4/GPT-5.5 model, focusing on engineering, complex tasks, long-chain execution, and team collaboration, resembling a “professional back-end engineer,” suitable for medium to large teams, complex projects, and English scenarios.
Model Capability: Strong in Chinese, Weak in Complexity vs Strong in English, Strong in Engineering
- Trae: Advantages include precise understanding of Chinese, fast generation speed, and high quality in small logic; shortcomings include weak long-text context, poor complex algorithms, inadequate architectural capabilities, and limited language support.
- Codex: Advantages include strong long-context understanding, precise complex algorithms, professional architectural design, mature multi-language support, and high code quality; shortcomings include average understanding of Chinese, purely English environment, high onboarding difficulty, and the need for proxies for domestic access.
Applicable Scenarios: Small Projects in Chinese vs Large Projects in Engineering
- Trae is suitable for: Individual practice, small projects (≤10,000 lines of code), front-end page development, simple back-end interfaces, Chinese requirements, beginner entry, and basic completion/debugging.
- Codex is suitable for: Medium to large projects (≥30,000 lines of code), complex algorithm development, architectural design, large-scale refactoring, team collaboration, engineering processes, and autonomous execution of long tasks.
Pricing: Free and Friendly vs Paid and Professional
- Trae: The personal version is permanently free, with unlimited auto-completion and two free cloud tasks; the enterprise version is paid, but affordable (Lite version at $3/month, Pro version at $10/month).
- Codex: ChatGPT Pro users can use it directly at $25/month; the team version is more expensive, charged per account, resulting in higher costs.
Ecosystem: Domestic Independent vs Global Top-Tier
- Trae: A domestic ecosystem, integrating the Doubao model, with fast domestic access and rich Chinese documentation; however, it has few third-party tool integrations, a small community, limited plugins, and slow updates.
- Codex: Backed by the ChatGPT and GitHub ecosystem, integrating all mainstream tools like Git, GitHub, VS Code, and JetBrains, with a large community, abundant plugins, and fast updates, supported by global developers.
Agent Capability: Weak Autonomy vs Strong Autonomy
- Trae: The SOLO mode has weak autonomy, with long tasks prone to interruption, requiring frequent intervention, and can only handle simple autonomous tasks.
- Codex: The Agent mode has strong autonomy, supporting goal workflows, allowing tasks to be paused/resumed, capable of writing code, running commands, analyzing results, and fixing bugs independently, running for several hours without supervision.

A summary table of core differences:

Comparison Dimension	Trae (ByteDance)	Codex (OpenAI)
Core Positioning	Lightweight AI IDE, Chinese-friendly	Engineering AI Agent, professional and efficient
Model Capability	Strong in Chinese, weak in complex logic	Strong in English, strong in engineering capability
Applicable Projects	Small projects, front-end, simple back-end	Medium to large projects, complex algorithms, architecture
Pricing	Free for individuals, low cost for enterprises	$25/month, higher costs for teams
Ecosystem	Domestic independent, fast domestic access	Global top-tier, integrates all mainstream tools
Agent Capability	Weak autonomy, requires frequent intervention	Strong autonomy, long tasks executed automatically
Onboarding Difficulty	Extremely low (Chinese + integrated IDE)	Relatively high (English + command line)

4. Transitioning to Codex: 1 Month of Testing, Efficiency Exceeds Expectations

After switching to Codex for a month, we migrated our previously challenging medium-sized back-end project to Codex, and the results exceeded our expectations, with significant improvements in both core efficiency and quality.

40% Improvement in Complex Project Development Efficiency
Previously, using Trae to develop a medium-sized back-end project required manual coding for core logic, leading to many bugs and prolonged debugging. A module that would take a month to complete with Trae was finished in just 20 days with Codex.
- Cross-file dependency understanding was precise, with automatic adaptation of related modules when modifying one;
- The quality of complex algorithm generation was high, with fewer loopholes in core logic like order settlement and inventory deduction, requiring only minor modifications to be usable;
- Professional architectural design provided reasonable layering and decoupling solutions, significantly reducing future maintenance costs.
Agent Mode Frees Up Hands, Long Tasks Require No Supervision
This was the most surprising aspect; Codex’s Agent mode truly operates autonomously. We tasked it with “building a user management system from scratch, including registration and login, permission control, data statistics, API documentation, and finally deploying to Docker,” and it required no supervision:
- It autonomously broke down tasks: initializing the project → designing the database → writing APIs → writing the front end → writing documentation → configuring Docker;
- It ran tests autonomously: after writing the code, it automatically ran it, diagnosed errors, and corrected them by itself;
- It iterated and optimized autonomously: identifying logical loopholes and performance issues, optimizing automatically until the task was completed.
  The entire process lasted 3 hours without any human intervention, resulting in a directly deliverable runnable project—an experience that Trae could never provide.
Code Quality Improvement, Bug Rate Reduced by 50%
The code generated by Codex is significantly more standardized, robust, and maintainable than that produced by Trae.
- The code style is consistent and adheres to team standards, eliminating the need for manual adjustments;
- Boundary handling is thorough, considering edge cases, exceptions, and concurrent scenarios, leading to a substantial reduction in bug rates;
- Comments are clear, and logic is easy to understand, making future code reviews and maintenance much easier.
Smoother Team Collaboration, Engineering Process Closed Loop
Codex’s deep integration with GitHub automates code submissions, PR reviews, issue classifications, and code reviews, greatly enhancing team collaboration efficiency.
- After developers finish writing code, Codex automatically submits branches and generates PRs;
- It automatically reviews code, checking for standards, vulnerabilities, and performance issues, providing modification suggestions;
- It automatically classifies issues, assigning them to the corresponding developers and tracking progress, creating a fully automated closed-loop process.

5. Rational Summary: There’s No Best Tool, Only the Most Suitable Tool

After using Trae for 8 months and Codex for 1 month, our biggest takeaway is that AI programming tools are not absolutely good or bad; they are only suitable for your project scale, technical scenarios, and team needs.

Choose Trae if you meet these three points:

You are an individual developer or a small team with small project scales (≤10,000 lines of code), primarily focusing on front-end pages and simple CRUD interfaces;
You primarily use Chinese, have average English skills, do not want to deal with English environments and proxies, and seek quick onboarding, free access, and stability;
You are a beginner wanting to quickly generate projects and learn programming without getting bogged down by complex configurations and command lines.

Choose Codex if you meet these three points:

You are part of a medium to large team with large project scales (≥30,000 lines of code), involving complex algorithms, architectural design, and large-scale refactoring;
You have high engineering requirements, needing team collaboration, code standards, automated processes, and autonomous execution of long tasks;
You can accept an English environment and paid costs, pursuing code quality, development efficiency, and long-term maintainability.

Our switch to Codex was not due to Trae being inadequate, but because our projects grew, demands became more complex, and Trae’s capabilities could no longer meet our needs. For small teams, Chinese scenarios, and small projects, Trae remains the first choice; however, for medium to large complex projects, Codex’s engineering capabilities and Agent capabilities indeed provide qualitative improvements.

The iteration speed of AI programming tools is rapid; in the future, Trae may address its shortcomings, and Codex may lower its barriers and optimize Chinese support. What we need to do is not blindly follow trends in tool switching but to choose rationally based on our actual needs, allowing tools to truly serve development rather than being bound by them.

Which tool do you usually use, Trae or Codex? Or other AI programming tools? Which tool is more suitable for your project scale and technical scenarios? Feel free to share your experiences and selection advice in the comments section for discussion.

Tencent's Yuanbao Integrates AI in WeChat with New Features

Thu, 14 May 2026 00:00:00 +0000

In early May, the news of Doubao launching a paid subscription sparked widespread attention; on May 11, Qianwen fully integrated with Taobao, becoming a focal point in the industry. Together, these developments signal a clear trend: ByteDance and Alibaba are accelerating the commercialization of AI assistants, each advancing along distinct business paths.

Tencent’s Yuanbao, however, is taking a noticeably different approach. Instead of rushing to establish transaction links or experimenting with tiered payments, it continues to deepen its integration within the WeChat ecosystem.

On May 13, Yuanbao launched a group chat summary feature, coinciding with Tencent’s quarterly financial report release.

The timing of these announcements seems intentional. With the recent enhancements in its capabilities and ongoing integration with WeChat, Yuanbao, having fallen out of the top three in the industry, is building momentum for a new phase.

Moreover, Tencent’s confidence in its AI strategy stems from the fact that Yuanbao is no longer its sole ace in the hole. The overall AI strategy of the group is becoming increasingly clear, with a collaborative approach taking shape.

Group Chat Summaries Available, but Requires Forwarding to Yuanbao App

Yuanbao users have long awaited the ability to summarize group chats. Community feedback often includes requests like, “Can you summarize 99+ messages directly in WeChat?” On May 13, Tencent’s official WeChat account announced that Yuanbao now supports summarizing group chat records. The development team quickly rolled out this feature.

Tracing Yuanbao’s integration into the WeChat ecosystem reveals a clear path of gradual penetration and limited authorization: initially, Yuanbao could only be accessed through WeChat’s search feature; by the second half of 2025, WeChat opened friend-adding permissions, allowing users to add Yuanbao to their contacts; in September 2025, Yuanbao entered public accounts and video account comment sections, requiring only a manual @ to summon.

The launch of the group chat summary function not only opens a critical interface for Yuanbao within WeChat but also marks the first time Yuanbao can engage with real-time communication content. Previously, Yuanbao’s presence in WeChat relied on users actively invoking it; now, it can directly process group chat records, integrating into users’ daily information flow. The specific operation involves long-pressing messages to select multiple, checking chat records, and forwarding them to Yuanbao, which then begins summarization.

Tencent’s announcement creatively outlined various use cases: no need to sift through 99+ messages, generating reimbursement forms with one click, customizing travel guides based on group chat needs, and even analyzing chats with a crush to provide “extreme pull” suggestions.

However, user reactions tempered this enthusiasm—“Good news: group chats can be summarized with one click. Bad news: you have to forward it to the Yuanbao App.” Many users expressed disappointment, desiring an assistant that could actively serve within WeChat rather than requiring manual forwarding.

Yuanbao’s official response in the comments was candid: “The current forwarding mechanism is intentionally designed as ‘you actively seek me’ to ensure every interaction is authorized by you.”

This passive activation logic is a deliberate maintenance of WeChat’s moat and serves as a passport for Yuanbao to deepen its integration within the WeChat ecosystem. WeChat’s core asset is trust; if users perceive that chat records are being automatically read or used for training, this moat could be compromised. This is not a retreat in functionality but rather a prerequisite for Yuanbao to advance further within the WeChat ecosystem—maintaining privacy boundaries to gain deeper integration space.

The “red envelope link ban” incident during the Spring Festival proved this point. Yuanbao invested 1 billion in red envelopes, leading to a peak of 114 million MAU, but just four days later, WeChat’s security center banned Yuanbao’s red envelope sharing links for “inducing users to share links frequently and disrupting platform ecology”—even a sibling company was treated according to the rules.

Forcing a breakout is not feasible; instead, the “following the rules and slow penetration” approach has allowed Yuanbao to establish a foothold in the WeChat ecosystem.

Falling Out of the Top Three, but Yuanbao Remains in the Game

According to QuestMobile’s April report on the “Quarterly Monitoring of China’s AI Native Application Market,” as of March 2026, Doubao, Qianwen, and DeepSeek had MAUs of 345 million, 166 million, and 127 million, respectively, firmly occupying the top three spots. Yuanbao’s MAU stands at 57.35 million, with the red envelope battle’s new user acquisition effect nearly zero—post-holiday DAU quickly dropped to 9 million, with a quarterly user increase of 8.2 million, far behind Qianwen’s 126 million.

In March, Yuanbao attempted to enhance brand recognition by changing its logo: the original green Taiji symbol was updated with anthropomorphic big eyes, and the brand name was simplified from “Tencent Yuanbao” to “Yuanbao.” Officially, this was described as a shift from an efficiency tool to a warm, intelligent partner. User feedback indicated that this change had limited impact on brand recognition.

Yuanbao’s high expectations during the Spring Festival did not sustain momentum post-holiday; attempts to integrate with frameworks like OpenClaw and Hermes also failed to generate lasting buzz.

If one only considers the above information, it is easy to conclude that “Yuanbao has fallen behind.” However, this coordinate system does not accurately reflect Yuanbao’s true situation.

Tencent’s internal positioning of Yuanbao has never been about “defeating Doubao.” In September 2025, Tang Daosheng clarified Yuanbao’s strategic direction in a public interview—Tencent aims for Yuanbao to become a new entry point for C-end information searches, rather than just another generic AI dialogue product.

In December 2025, Tencent adjusted its browser, Sogou search, QQ input method, and other tool-related businesses from PCG to CSIG, merging them with the Yuanbao team. This organizational adjustment reflects the entry strategy: in the AI era, the way users express needs is shifting from keywords to natural language; whoever controls this entry point will hold the distribution rights for the next generation of traffic.

Following this logic, the launch of the group chat summary feature signifies more than just a product iteration. It marks the extension of Yuanbao’s integration with WeChat into “content processing.” Users already send and receive files, read articles, and manage chat records within WeChat; Yuanbao is gradually embedding itself into these scenarios, becoming a handy work and life assistant for users, with potential users exceeding 1 billion.

Simultaneously, Yuanbao’s underlying capabilities are undergoing a substantial upgrade. Liu Chiping mentioned during the earnings call that the mixed yuan 3.0 benefits from the collaborative design process with major products like Yuanbao and WorkBuddy, resulting in significant performance improvements across different products.

After the launch of mixed yuan 3.0, Yuanbao’s reputation for reasoning problems has seen a degree of reversal. This penetration process remains slow; users accustomed to Doubao are unlikely to switch due to a single capability upgrade, but for those already using Yuanbao within the WeChat ecosystem, the improvements in experience are tangible.

Liu Chiping noted in the earnings call that many users continued to use Yuanbao even during its less-than-perfect phase, primarily due to its deep integration with Tencent’s ecosystem. This stickiness is being reinforced as mixed capabilities improve and integration with WeChat deepens.

Although Yuanbao’s MAU has fallen to fourth in the industry, it has not exited the game and may even have greater potential ahead.

Tencent’s AI Strategy Accelerates on Multiple Fronts, Yuanbao Remains Key Before Major Moves

At the May 13 shareholder meeting, Ma Huateng used a sailing metaphor to describe Tencent’s current mindset: “A year ago, we thought we were on the ship, but later realized it was leaking; now we feel like we’re on it, but still can’t sit down, hoping the ship can speed up.”

This metaphor implies that Tencent’s period of self-doubt regarding its AI strategy has passed, entering a new phase of accelerated catch-up.

This mindset is built on substantial financial investment and tangible technological breakthroughs. In the first quarter, capital expenditures reached 37 billion yuan, with the earnings report explicitly stating that these funds are primarily for AI-related investments. Ma Huateng emphasized breakthroughs in new AI products in the first sentence of the business review, highlighting Hy3 preview and WorkBuddy as key focuses. These represent Tencent’s two core battlefronts in AI.

The first front is solidifying underlying capabilities. After the launch of Hy3 preview, it quickly topped the OpenRouter leaderboard. Even after the free period ended, the paid version’s daily usage remains more than twice that of the second-place DeepSeek V4 Flash. Developers are willing to pay for mixed capabilities, which is more persuasive than any ranking.

It should be noted that Hy3 preview still lags behind Seedance 2.0 and HappyHorse 1.0 in multimodal capabilities, with no differentiated advantages in areas like image generation and video understanding, which is a significant reason for Yuanbao’s current challenges in user acquisition.

The second front is validating intelligent agent products, with WorkBuddy being the most important example. Launched in March 2026, Ma Huateng classified it as “currently the most widely used AI efficiency agent service in China,” accumulating core experience in cross-application collaboration and enterprise services by integrating Tencent Docs, QQ Mail, and other office tools—these experiences will be directly reused in the development of WeChat intelligent agents.

Once these two fronts are operational, Tencent is finally revealing its ultimate plan. Liu Chiping signaled a key message during the earnings call: within WeChat’s vast ecosystem, Tencent has the opportunity to build a unique AI Agent system that deeply connects social graphs, instant messaging, and millions of mini-program resources.

This reflects a fundamental judgment about the migration of internet paradigms. For the past twenty years, the basic logic of the internet has been “people open apps to find services”; the evolution in the AI era may likely be “agents invoke services to meet people’s needs.” What Tencent truly needs to safeguard is not the monthly active ranking of a specific AI dialogue product, but to ensure that all services it controls—from social to content, from payment to transportation—remain core and callable in the agent era.

However, the arrival of WeChat intelligent agents cannot be rushed, as Ma Huateng mentioned, “This cannot be rushed out; everyone may need a bit of patience.” Until this ultimate move is fully revealed, Yuanbao plays a crucial role in keeping Tencent at the C-end table—Yuanbao contacts in WeChat, AI assistants in Tencent Meeting, and “streamlining assistants” that can be actively invoked in comment sections.

This distributed embedding approach allows Yuanbao to maintain an irreplaceable position in the WeChat ecosystem.

Looking ahead, Yuanbao’s focus may be more specific and pragmatic than just being an “independent entry point”—it could serve as a lightweight interface for Tencent’s AI products like IMA and WorkBuddy, unifying AI capabilities scattered across various apps and allowing users to avoid switching between multiple Tencent AI applications.

Thus, Tencent’s AI group combat system is becoming clearer: mixed yuan serves as the capability base, WorkBuddy as the intelligent agent pioneer, Yuanbao as the ecological interface, and WeChat intelligent agents as the ultimate move. While ByteDance and Alibaba have seized the initiative with rapid pacing, Tencent is following a different logic—first solidifying the base, then accumulating strength for a significant release. The true trump card has yet to be revealed.

Codex AI Achieves 40x Research Efficiency in Groundbreaking Experiment

Wed, 13 May 2026 00:00:00 +0000

Introduction

Today, Agentic AI engineers discovered that a research task requiring 80 hours for a PhD can be completed by Codex in less than 2 hours, achieving a staggering 40-fold efficiency increase! According to previous standards, AGI has already existed; the entire industry has simply been moving the goalposts.

The “singularity” in the research community is indeed closer than everyone anticipated.

Recently, an experiment involving Codex’s Goal Mode shocked the academic world: Codex can increase AI research efficiency by 40 times!

Agentic AI engineer Dan McAteer recently disclosed an experiment on X, using OpenAI Codex’s Goal Mode to run a mechanistic interpretability research task.

GPT-5.5 estimated that a PhD student would take about 80 hours to complete this task, but in practice, the AI finished it in just 1 hour and 56 minutes.

This represents an apparent efficiency boost of about 40 times!

The built-in skill used in Codex is /goal.

The author believes:

/goal + gpt-5.5 high precision + fast mode is the most efficient AI agent configuration today.

This means allowing the model to set its own goals, where the key is that the prompts it generates are likely better than yours.

This is no longer just a simple “efficiency improvement”; it is a complete “dimensionality reduction attack.”

As research cycles shrink from weeks to hours, and AI begins to autonomously draft its own experimental goals (/goal), we must confront a harsh reality:

The slope of the “intelligence explosion” has already emerged, and the speed of AI’s self-iteration is departing from human control!

What is Codex /goal Mode?

Let’s take a look at how this experiment was conducted.

The experiment was initiated by Dan McAteer, an Agentic AI engineer and former Amp Code engineer, who frequently shares practical experiences of AI agent engineering on X.

His experimental setup was simple:

Tool: OpenAI Codex /goal command
Model: GPT-5.5 high
Mode: fast mode
Task: A research task in the direction of Mechanistic Interpretability

He describes this configuration as the most efficient AI agent configuration currently available.

Why is Codex /goal Important?

What truly deserves attention is the /goal mode itself.

According to OpenAI Codex engineer Philip Corey, /goal is our implementation of the Ralph loop—allowing goals to persist across multiple dialogues, not stopping until achieved.

In simple terms, a standard Codex call is you say a sentence, it takes one step, and responds. Codex /goal allows you to state a goal, and it autonomously breaks down sub-tasks, executes them, reviews results, and continues until it either succeeds or fails.

This represents a shift from conversational AI to goal-driven AI.

For research tasks like Mechanistic Interpretability, the /goal mode is naturally well-suited.

The research process itself involves proposing hypotheses, designing experiments, running them, observing results, refining hypotheses, and re-experimenting—a perfect loop for a self-cycling agent.

McAteer’s experiment truly demonstrates the usability of the Codex /goal mode in cyclical research tasks: it does not replace researchers but rather replaces the repetitive operations performed by researchers.

If this capability can stabilize, it will have a very direct leverage on AI research itself.

It means that AI researchers within AI labs could one day use AI agents for repetitive tasks such as preparing training data, setting up experiments, conducting ablation studies, generating visualizations, and preliminary result analysis.

This aligns with what Anthropic and OpenAI have repeatedly stated: AI is accelerating AI research itself.

PhD 80 Hours vs AI 2 Hours

In the traditional research context, a PhD student’s daily routine involves reviewing literature, building models, debugging code, validating results, and writing reports.

This lengthy process is due to the physical limits of the human brain when processing complex logic and vast amounts of data.

However, Codex’s recent experiment completely shatters this perception.

Under the strongest agent configuration of /goal + GPT-5.5 High + Fast Mode, AI is no longer a tool that “follows commands” but an independent researcher that “generates strategies.”

It can understand complex natural language auto-encoder (NLA) experimental requirements, autonomously decompose tasks, and complete in less than 2 hours what human elites would take two weeks to accomplish.

This signifies that the threshold for human research has completely collapsed. The professional analytical capabilities that once required years of study are now being modularized by algorithms.

Moreover, autonomous AI researchers have already arrived ahead of schedule!

OpenAI previously set a goal for achieving autonomous AI research by the end of 2026. However, based on current experimental progress, 2026 may not be the beginning but rather the endpoint where humanity completely hands over the research baton.

Evidence of Recursive Self-Improvement Emerging

If Codex’s 40x speed experiment is a glaring case, what is even more unsettling is the growing evidence surrounding “recursive self-improvement.”

On May 7, Axios reported that Anthropic co-founder Jack Clark publicly provided a probability:

By the end of 2028, the probability of AI achieving complete recursive self-improvement exceeds 60%.

Sakana AI and UBC’s research team this year developed the Darwin Gödel Machine, a programming agent capable of rewriting its own source code to enhance its capabilities.

In SWE-bench, its score improved from 20.0% to 50.0% without any human intervention.

The same team’s AI Scientist project was published in Nature in March this year.

It can independently generate research ideas, write code, run experiments, draft complete papers, and conduct peer reviews.

A complete research pipeline, from start to finish, is accomplished independently by AI.

Now, let’s look at a set of hard data. GPQA Diamond, a scientific question-answering benchmark set by PhD experts, saw GPT-4 score 39% in November 2023, while the average score of human domain experts was about 65%.

By April 2026, cutting-edge models collectively surpassed the threshold: Gemini 3.1 Pro scored 94.3%, while Claude Opus 4.7 scored 94.2%.

All cutting-edge models have far outpaced human PhD experts.

The trajectory of SWE-bench further illustrates the acceleration.

At the end of 2023, Claude 2’s pass rate was 2%. Now, it stands at 93.9%.

In just two and a half years, it skyrocketed from 2% to 93.9%.

This curve, once drawn, is recognizable to anyone who has studied high school mathematics.

Clearly, the process of recursive self-improvement (RSI) has already begun.

Once AI starts rewriting its underlying code and optimizing its architecture at this 40x efficiency, the growth of intelligence will no longer be linear but vertical.

AGI Has Been Delivered, and the Entire Industry is Gaslighting You

In fact, as early as February this year, four scholars from different top fields jointly published a paper that can be described as the “most unsettling of the year”: “AGI Case Study: Today’s LLMs Have Met the Criteria.”

The four authors represent the four pillars of contemporary intelligence: philosophy, machine learning, linguistics, and cognitive science. They reached a chilling consensus:

According to definitions prior to 2022, AGI has already been achieved.

The reason no one acknowledges it now is that the entire AI industry is engaging in a collective “gaslighting effect” against the public.

The paper pointed out that humans exhibit a strong “psychological defense mechanism” when faced with the rise of AI.

Before 2022, as long as a model could pass the Turing test and handle tasks across domains, it was considered AGI.

With the emergence of ChatGPT: “Just having these capabilities is not enough; it must also have perfect reasoning, embodiment, and self-awareness.”

Each time a model breaks through a barrier, humans spontaneously add new, elusive criteria as thresholds, continuously moving the goalposts.

The problem is, if AGI already exists, the current industry logic becomes extremely absurd.

OpenAI is still raising $40 billion claiming to “build AGI”; Anthropic packages each new model release as a futures contract “close to AGI.”

The paper sharply reveals that the giants are disguising something that has already been “sold to you” as a miraculous achievement “soon to be developed” to secure a continuous flow of funding and power.

The Eve of the Intelligence Explosion

Today, we find ourselves at an extremely strange juncture.

In laboratories, AI is already conducting mechanistic interpretability research at 40 times the speed, even helping itself write code.

In the market, computing power remains a hard currency, with Nvidia’s Blackwell chips being snatched up, each chip accelerating the arrival of that singularity.

However, in social psychology, the public is still using outdated terms like “repeater” and “probability prediction” to comfort themselves.

If 40 times the research efficiency becomes the norm, the accumulated knowledge of human civilization over thousands of years could be doubled by AI in just a few months.

When AI can independently complete PhD-level tasks, our existing education systems, title evaluations, and even the very meaning of the term “expert” will face existential threats.

Just as Copernicus removed Earth from the center of the universe, AI is now displacing humanity from the sanctum of being the “only intelligent life.”

Now, this war called the intelligence explosion is happening without gunpowder.

We must either learn to coexist with this new intelligent species or watch helplessly as it leaves us in the dust at 40 times the speed.

Codex: The Ultimate Programming Tool of 2026 for Apple Silicon Users

Wed, 13 May 2026 00:00:00 +0000

Codex: The Ultimate Programming Tool of 2026

I have used Codex for three months and truly believe it is the most worthwhile programming tool of 2026, bar none.

Unlike ChatGPT, which is merely a “Q&A robot,” Codex is a complete autonomous agent. You give it a requirement, and it can read code, write code, run tests, and fix bugs without any intervention from you. I had a project that used to take two days to complete; now, with Codex, I can finish it in half an hour.

Even more impressive is its multi-agent mode: you can have several “clones” working on different tasks simultaneously—one checking documentation, another writing code, and another doing code reviews. Running six threads concurrently is more efficient than managing three interns.

However, OpenAI’s pricing strategy can be daunting: the Plus version costs $20/month with strict usage limits, and you can exceed the limit after writing just two medium-sized projects. The Pro version costs $200/month (about 1400 RMB) for unlimited use. For developers in China, there are additional network issues—connecting to api.openai.com is often hit or miss.

While third-party proxies are cheaper, they come with risks, and your code and conversation data all pass through third-party servers—would you feel comfortable sending your core business logic to someone else?

Is there a solution that doesn’t require a subscription, doesn’t involve sending code to third parties, and allows Codex to run perfectly?

Yes. And for Apple Silicon users, the experience is fantastic.

The answer is: Local Large Models + Codex. Completely free, fully offline, and entirely private.

Here’s a tutorial to get you set up in just 10 minutes.

Why Local Large Models Can Be Used with Codex

The core principle is the same as connecting to a third-party API: Codex only recognizes OpenAI formatted API interfaces.

Whether the backend is OpenAI’s GPT-5, a third-party proxy, or a large model running locally on your Mac, as long as the API request and response JSON formats are compatible, Codex cannot tell the difference.

Your request flow comparison:

Codex App → api.openai.com → OpenAI server          (Official, $200/month)
Codex App → Proxy Address → OpenAI server              (Proxy, cheaper but risky)
Codex App → localhost:8000 → Your Mac local model       (Local, free + private)

In Codex’s configuration file config.toml, there is a parameter called openai_base_url. After setting it, all requests will be sent to your specified address.

Point it to http://localhost:8000/v1, and Codex will communicate with the large model running on your Mac. Zero latency, zero cost, zero privacy leaks.

Prerequisites

You need to meet two conditions:

A Mac with Apple Silicon (M1/M2/M3/M4 series)
Recommended 24GB or more RAM (16GB can only run 7B-14B models, 8GB has a poor experience)
Codex itself also occupies memory, so 8GB is basically insufficient
If memory is insufficient, you can consider using a third-party API proxy solution, spending $20 to use GPT-5, which has no hardware requirements
Codex App must be installed (desktop version, not the web version)

If your Mac meets the conditions, congratulations—you already have an “AI programming supercomputer” in your hands, just waiting to be activated.

Choosing a Local Large Model Inference Tool

Apple Silicon users have three mainstream choices, ranked by recommendation:

Option 1: MLX (omlx) — Apple Native, Best Performance ⭐ Recommended

MLX is Apple’s official machine learning framework, optimized for Apple Silicon. omlx is a model inference server based on MLX, providing a fully compatible OpenAI API interface.

Advantages:

Official Apple framework with the highest GPU acceleration efficiency
Supports 4bit/8bit quantization, smoothly running 27B level models with 24GB RAM
Ready to use; just one command after pip install

Installation:

pip install omlx
omlx serve

After starting, it provides API services at http://localhost:8000 by default.

omlx includes model download management, automatically pulling the default model (Qwen3.6-27B-4bit, about 14GB) on the first startup.

Option 2: Ollama — Cross-Platform, Richest Ecosystem

Ollama is currently the most popular local large model running tool, supporting Mac/Windows/Linux.

Advantages:

The richest model library, download models with ollama pull
Active community with comprehensive documentation
Cross-platform, usable by Windows users as well

Installation:

# On Mac, you can use Homebrew
brew install ollama
ollama serve          # Start the service, default port 11434
ollama pull qwen3.6   # Download the model

Ollama’s default port is 11434, so make sure to change the port number when configuring Codex.

Option 3: LMStudio — Graphical Interface, Most User-Friendly

LMStudio provides a complete GUI interface, suitable for users who prefer not to use the command line.

Advantages:

Graphical interface for managing models, easy to download, load, and unload
Built-in local chat interface to experience model effects first
One-click to start the local API server

Installation: Visit the LMStudio official website to download and install. After starting, click on “Local API Server” on the left side.

Step-by-Step Configuration of Codex

No matter which tool you choose above, the steps to configure Codex are exactly the same—you only need to modify two files.

Step 1: Locate the Codex Configuration Directory

All configuration files for Codex are hidden in the .codex folder under your user home directory:

System	Path
macOS	/Users/your_username/.codex
Windows	C:\Users\your_username.codex

macOS users can type ls ~/.codex in the terminal to see it. If the folder does not exist, launch the Codex App once to automatically create it.

Step 2: Configure API Key

In the .codex directory, find auth.json (create a new one if it doesn’t exist) and write the following content:

{
  "auth_mode": "apikey",
  "OPENAI_API_KEY": "your local model API-Key"
}

Different tools have different API Keys:

Tool	API Key
omlx	omlx-2026-qwen36 (default key, can be viewed in ~/.omlx/settings.json)
Ollama	ollama (fixed value, Ollama’s key is always this)
LMStudio	leave blank or fill any string

⚠️ Important: auth_mode must be set to “apikey”. This tells Codex to “authenticate using API Key, do not pop up the ChatGPT login window.”

Step 3: Configure Local Model Address (Key Step)

In the .codex directory, find config.toml (create a new one if it doesn’t exist) and add these configurations at the very top of the file:

For omlx:

model = "Qwen3.6-27B-4bit"
model_reasoning_effort = "high"
openai_base_url = "http://localhost:8000/v1"

For Ollama:

model = "qwen3.6"
model_reasoning_effort = "high"
openai_base_url = "http://localhost:11434/v1"

For LMStudio:

model = "the model name you loaded"
model_reasoning_effort = "high"
openai_base_url = "http://127.0.0.1:1234/v1"

Pay attention to a few details:

The model must be the actual available model name in your local tool (check in the omlx/Ollama backend)
openai_base_url must be at the top level of the file, not inside any [section] block
The URL must end with /v1; missing a slash will cause issues
Use http:// for local services; no need for https://

Complete configuration example (omlx):

model = "Qwen3.6-27B-4bit"
model_reasoning_effort = "high"
openai_base_url = "http://localhost:8000/v1"

[projects."/Users/tianxi"]
trust_level = "trusted"

# ... other configurations remain unchanged

After making these changes, just three lines of configuration.

Step 4: Restart Codex

This step can trip up many people:

Ensure the local model service is running (do not close the terminal window for omlx/Ollama)
Completely exit the Codex App. macOS users should press Cmd+Q to ensure there is no Codex icon in the Dock. Just closing the window is not sufficient; you must exit completely.
Restart the Codex App.
Create a new session and send a message to test.

If you receive a response normally, congratulations—Codex has connected to your local large model, and you can now use it for free and without limits.

Step 5: Verification (Optional)

If you are unsure about the configuration, you can test the local service’s reachability using curl in the terminal:

For omlx:

curl -s http://localhost:8000/v1/models \
  -H "Authorization: Bearer omlx-2026-qwen36" | python3 -m json.tool

For Ollama:

curl -s http://localhost:11434/v1/models \
  -H "Authorization: Bearer ollama" | python3 -m json.tool

If a list of models is output, it indicates that the Key and address are correct. If Codex still reports an error, it is likely due to a formatting error in the configuration; go back and check Step 3.

If you find the local model’s response too slow or its capabilities insufficient, don’t hesitate to switch to a third-party API proxy—just change one line of configuration.

Local Large Models vs GPT-5: What is the Real Difference?

Let’s speak frankly.

Local models at the Qwen3.6-27B level can achieve about 80-85% of GPT-4o’s performance in code generation tasks. For daily CRUD operations, bug fixes, script writing, and simple refactoring, it is completely sufficient.

The differences mainly lie in:

Complex architecture design: GPT-5 better understands the overall architecture of large projects, while local models may occasionally have a limited perspective.
Multi-turn dialogue consistency: In long contexts, local models may forget previous constraints.
Code review depth: GPT-5 can identify more hidden bugs and security issues.

But think about it this way: 85% capability, $0/month, vs. 100% capability, $1400/month. For most developers’ daily usage scenarios, local models offer a far better cost-performance ratio than the official subscription.

Moreover—it’s free, and you can try it infinitely without worry. If you mess up, you can just start over without quota anxiety.

Frequently Asked Questions

Q1: Is 16GB RAM enough?

To be honest: 16GB running a 27B level model provides a very poor experience. Qwen3.6-27B-4bit quantized is about 14GB, macOS itself takes up 4-6GB, and the Codex App occupies another 1-2GB—16GB machines simply do not have enough memory, and the system will swap heavily (using the hard drive as memory), resulting in response times that will make you question your sanity.

Actual recommendations:

Memory	Recommended Model	Experience
8GB	7B level (e.g., Qwen2.5-7B)	Basic coding assistance usable, limited capability
16GB	7B-14B level	Sufficient for daily coding, cost-effective choice
24GB+	27B level (e.g., Qwen3.6-27B-4bit)	Smooth experience, recommended
32GB+	27B level + larger context	Best experience

If you have 16GB RAM, it is recommended to run a 7B or 14B model with Ollama, such as ollama pull qwen2.5:7b. Its capability is not as strong as 27B, but it is smooth and does not lag.

Q2: Why doesn’t the configuration take effect?

90% of the time, the reason is that openai_base_url is written in the wrong location. It must be at the top level of the file, not inside any [section] block.

Additionally, after modifying the configuration, you must completely exit the App and then restart (Cmd+Q); just closing the window is insufficient. Codex only reads the configuration once at startup.

Also, ensure that the local model service is indeed running—do not close the processes for omlx/Ollama in the terminal.

Q3: Still getting 401 Unauthorized?

Check if the API Key has any extra spaces or line breaks (these can easily be introduced during copy-pasting).
omlx users should check the API key settings in ~/.omlx/settings.json.
Ollama users should confirm the key is “ollama” (in lowercase).
Use the curl command above to test directly and confirm the key is valid.

Q4: How to switch back to the official subscription or third-party proxy?

To switch back to the official subscription: edit config.toml and delete or comment out the openai_base_url line (add # in front):

# openai_base_url = "http://localhost:8000/v1"

Then edit auth.json to change auth_mode back to “chatgpt”:

{
  "auth_mode": "chatgpt"
}

Restart Codex, and the ChatGPT login window will pop up; you can log in normally by scanning the code.

Q5: What to do if the local model responds slowly?

Ensure the model has been fully loaded into memory (the first request may be slow, but subsequent ones will be faster).
Close other applications that occupy the GPU (like video editing software).
Try using a smaller model (7B level responds faster).
omlx users can adjust the context window size in model_settings.json; reducing the context can improve speed.

Q6: Can Windows users use it?

Yes, but with fewer choices. It is recommended to use Ollama (perfectly supports Windows) or LMStudio. Running local models on Windows requires an NVIDIA GPU with sufficient memory, and the experience is not as smooth as on Mac, but the functionality is identical.

Local Large Models vs Third-Party Proxies: How to Choose?

The Impact of Generative AI on Artistic Creation

Sun, 10 May 2026 00:00:00 +0000

The Impact of Generative AI on Artistic Creation

As artificial intelligence deeply integrates into various aspects of society and industry, it sparks a new wave of transformation. The involvement of generative AI in artistic creation brings vitality but also raises a series of questions: Can it replace artists? Will it shake the foundational values of art? Or is it rewriting the entire logic of subjectivity established for art? It is essential to confront these issues within the contexts of art history, technology history, and the construction of subjectivity, rather than simplifying them to mere efficiency gains or the optimistic notion that “everyone is an artist.”

Human-Machine Collaboration and Originality

The first challenge posed by human-machine collaboration is the originality of art. With the rapid development of large language models and multimodal models, natural language interaction has become a fundamental method for collaborative creation. In this process, the production of text, music, images, and videos is significantly affected, though the impact is not uniform. In fact, generative AI’s role varies across different art forms and levels of involvement. Art forms that utilize digital media are undergoing systematic reshaping. For instance, in the field of video creation, independent creators can leverage generative AI to directly generate scripts, storyboards, visuals, music, and post-production styles through prompts, significantly compressing or even eliminating the collaborative and physical operational stages traditionally required.

In the visual arts, if we still understand it as a form of artistic expression associated with a specific medium and manual creation, the involvement of generative AI will alter the creative process. In traditional art creation, artists use tools like brushes and chisels, relying on their mastery of techniques to transform creative ideas into tangible works. The intervention of generative AI primarily affects the early stages of visual imagination and concept generation, rather than directly eliminating drawing, sculpting, and production. Creators still need to possess skills in materials, techniques, and form control to select, edit, and deepen the image resources provided by machines, thereby transforming them into artworks. This active participation by creators highlights their intellectual intent, which reflects the originality of the work. If creators reduce or forgo practical operations, such creations may not be considered part of visual arts.

Restructuring the Creative Process

It is evident that the impact of generative AI on visual art is not merely about replacing artists; rather, it reorganizes the significance of various stages within the creative process. Certain preliminary cognitive activities that were once viewed as crucial are now partially transferred to algorithmic systems, while techniques that previously tested execution skills, selection, and reproduction capabilities are regaining importance in many specific creative practices. This indicates that understanding the relationship between AI and visual art should stem from this structural change, rather than superficial judgments about whether AI replaces human artists.

Redefining Subjectivity

Redefining the position of the subject is a valuable reference brought by generative AI. Similar to the emergence of photography, generative AI forces creators to confront a new mechanism of visual generation and compels them to reconsider which abilities can be taken over by technology and which need to be redefined and maintained by the creator. Generative AI touches upon composition, combination, style simulation, and even artistic concepts, which are closer to human cognitive activities. These cognitive activities, once seen as manifestations of creative subjectivity, are now partially shared or replaced by technology. Generative AI is transitioning from a mere auxiliary tool to a quasi-subject participating in cultural production, which is particularly sensitive in the current context of artistic creation. When it becomes difficult to determine how much of a creative idea, composition, or concept originates from the author, the stability of originality as the core of artistic value begins to waver. The question then shifts from whether generative AI can create art to how art should be defined in light of significant generative AI involvement.

Democratizing Artistic Creation

Within the discourse of new popular literature and art, the involvement of generative AI in visual art creation also serves as a breakthrough for dismantling professional monopolies, redistributing cultural power, and integrating creative structures. Utilizing generative AI for creation allows bypassing certain traditional training paths while also presenting new capability requirements for creators, such as prompt organization, model understanding, image selection, style judgment, and cross-media integration. This indicates that generative AI does not eliminate professionalism; rather, it reshapes the content and form of professionalism.

The involvement of generative AI directly impacts the monopolistic structures in visual art creation: first, it weakens the traditional technical monopoly over creative entry, allowing those without formal training to enter visual production; second, as the boundaries of originality expand, visual art creation is no longer an internal affair of a few professional groups but becomes a cultural practice that broader societal subjects can engage in. In this process, the relationships between creation, dissemination, and evaluation are also changing: the public is not only viewers and consumers but also creators, disseminators, and evaluators. However, the control over platforms, algorithms, and models remains in the hands of a few technical entities, who reshape creators’ tastes and choices through model preferences and data training, causing new popular practices to fall again under the discipline of technological power. While creative rights have partially decentralized, the decentralization of evaluative rights remains unresolved. Only when creative rights, dissemination rights, and evaluative rights are all restructured can the new wave of popular visual art brought by generative AI drive a more structurally significant cultural shift.

The Essence of Generative AI

Essentially, generative AI is a highly complex stylized reorganization and interpretation based on existing data. Its underlying logic is “learning” and “optimization,” rather than “subversion” and “revolution.” Currently, generative AI lacks the most fundamental source of creativity found in artists—the embodied emotional experiences of individuals. Artistic creation, especially great works, is deeply rooted in the unique life insights and profound spiritual realms of the artist. Therefore, in facing generative AI, it should be viewed as a co-creation tool that inspires creativity, expands imagination, and enriches expression, rather than a complete substitute for creation.

In conclusion, in the era of artificial intelligence, the nature of art is undergoing unprecedented renewal and reconstruction. The deep driving force behind this transformation is the dual impetus of technological revolution and cultural awareness, prompting us to engage in multifaceted reflections. Properly understanding the relationship between artificial intelligence and visual art, and clarifying the intrinsic value of art, will help achieve better human-machine co-creation and unlock new artistic possibilities.

Domestic AI Pricing Signals Shift from Market Capture to Value Verification

Tue, 05 May 2026 00:00:00 +0000

Introduction

The recent pricing announcement from Doubao App Store, revealing subscription rates for its AI services, signifies a pivotal shift in the domestic AI industry. The standard annual package is priced at 688 yuan, with other tiers at 2048 yuan and 5088 yuan for enhanced and professional versions, respectively. Although the details are still being tested, the framework of “free basic + paid premium” is becoming clear.

Transition from Free to Paid

This transition is not surprising. Free services and subsidies have always been competitive strategies rather than the norm. The recent AI red envelope competition during the Spring Festival was about gaining market entry and user habits, representing the last major traffic harvest of the mobile internet era. However, the reality is that there is no such thing as a free lunch. As the saying goes, “Token shortages” have become a consensus, and the cost of computing cannot be indefinitely covered by capital. Thus, charging for services has become an inevitable commercial model, marking a rite of passage for technology commercialization.

Industry Trends

A recent report from the Daily Economic News corroborates this industry trend. With the conclusion of the A-share annual report season, AI application companies have shown a mixed but converging performance for 2025: 35 companies reported a year-on-year net profit growth of over 50%, while 40 companies saw a decline in revenue. On one hand, the revenue scale continues to expand, with AI capabilities deeply embedded in core products, leading to rapid growth in user engagement and usage metrics. On the other hand, high R&D investments are still compressing short-term profits, placing the industry at a critical juncture from technological breakthroughs to commercial validation.

Three Key Realities

The swift transition from free services to paid plans reveals three key realities:

Computing Costs Are Real: Subsidies are essentially a countdown where capital is exchanged for time, which cannot be sustained indefinitely.
Model Capability Maturity: The iteration of model capabilities has reached a monetization phase, where complex task scenarios justify charging fees, and the disparity between free and paid experiences supports tiered offerings.
Industry Consensus: There is a growing disconnect between the grand narrative of “AI changing civilization” and the reality of “subsidy-based marketing”. If charging does not begin, even those telling the story will lose faith.

Doubao’s Strategic Move

As one of the largest domestic AI applications, Doubao’s decision to charge for its services reflects confidence in its technological value and represents a breakthrough in a highly competitive industry. If Doubao does not take this step, its competitors may find it difficult to follow suit. In contrast, international players like ChatGPT and Claude have already adopted similar pricing strategies.

Implications for the Future

This is not merely a price adjustment; it signals a critical shift for domestic large models from “burning money to capture market share” to “self-sustaining growth through iteration”. The competition logic has shifted from “traffic accumulation” to “value realization”. The pressing question is not whether AI should charge fees but rather how to ensure that users continue to pay after the transition.

When users begin to pay monthly, the era of “story-driven user acquisition” ends, and the era of “value verification” begins. In the free era, the focus was on daily active users (DAU), where users tolerated basic functionality. In the paid era, the focus shifts to user retention and depth of engagement, where users demand a product that is not only good but also irreplaceable.

Value Verification

Doubao’s paid features are anchored in complex productivity scenarios such as PPT generation, data analysis, and film production. These tasks require robust foundational model capabilities and test the limits of context understanding, multi-modal collaboration, and professional logic. Any shortcomings will be magnified under the scrutiny of paying users. If the paid experience remains at a “toy” level, users will vote with their feet.

Thus, pricing is merely the entry ticket; true competitiveness lies in the ability to deliver top-tier experiences to paying users. This includes providing fast-track access during peak times rather than causing anxiety from long wait times, which is crucial for retention rates and average revenue per user (ARPU).

Sustainable Growth

Moreover, the revenue generated from charging must be reinvested in computing power and R&D iterations, creating a positive feedback loop of “technology upgrades – experience enhancements – user payments – reinvestment”. This is not only vital for Doubao but also for establishing a sustainable competitive edge for domestic AI on the global stage.

Conclusion

Charging does not equate to abandoning accessibility. The free version will continue to support everyday use, while the paid version will meet professional productivity needs. This tiered approach is reasonable, but companies must recognize that the right to charge is granted by users, not bestowed by the platform. In this hard-hitting phase of value verification, only by consistently demonstrating real capabilities and providing visible upgrades can companies ensure users feel their spending is justified, thus encouraging ongoing payments. This may be the core challenge that all domestic large models must face moving forward.

Strengthening Application-Driven AI Development in China

Tue, 28 Apr 2026 00:00:00 +0000

Introduction

General Secretary Xi Jinping emphasized the need to deepen and expand “Artificial Intelligence +” and improve AI governance during the 2025 Central Economic Work Conference. The 14th Five-Year Plan outlines a comprehensive approach to promote intelligent technology empowerment and seize the high ground of AI industrial applications. These important directives reveal the strategic direction and practical focus for developing AI in China. As a general-purpose technology, the vitality of AI lies in its applications, and its core value is in empowerment. Strengthening application-driven development and promoting the deep integration of AI across various industries is essential for fostering new productive forces and creating a new intelligent economy.

Global AI Competition

Currently, the focus of global AI competition is undergoing profound changes. Early competition was primarily centered on breakthroughs in algorithms, parameter scales, and chip performance. Today, the competition increasingly extends to the efficiency of industrial application conversion, depth of scenario penetration, and system collaboration capabilities. For China, advantages lie not only in continuous technological innovation but also in the support of a vast market, a complete industrial system, diverse application scenarios, and abundant data resources. If these advantages cannot be effectively transformed into high-level application capabilities and high-quality industry solutions, it will be challenging to truly grasp the initiative in development. Thus, seizing the high ground of AI industrial applications is not merely a matter of industrial layout but a strategic choice concerning China’s position in future international division of labor.

Domestic Development

From a domestic perspective, strengthening application-driven development is a practical requirement for cultivating new productive forces and promoting high-quality development. AI is characterized by extensive penetration, deep collaboration, and continuous empowerment, capable of reshaping research and development paradigms, production methods, and governance models. In research and development, AI is accelerating new drug discovery, material creation, and product design, significantly shortening innovation cycles. In production, AI can promote predictive maintenance, process optimization, flexible manufacturing, and quality control, shifting the manufacturing system from scale expansion to precision manufacturing. In services, AI accelerates the transformation of supply methods in finance, logistics, healthcare, and education, better matching the diverse and personalized needs of the public. Strengthening application-driven development aims to accelerate the transformation of AI’s technological potential into real productive forces, enhance total factor productivity, and create new growth points and competitiveness.

Deep Integration of AI and Industry

Furthermore, strengthening application-driven development and promoting the deep integration of AI with industrial transformation can reshape value creation and guide precise resource allocation. China is accelerating the creation of a new intelligent economy, where economic activities begin to revolve around intelligent demands in specific application scenarios. Industrial competition increasingly focuses on improving the efficiency of AI supply, with value realization relying on the continuous invocation of AI, service-oriented outputs, and revenue sharing. In this process, application-driven development is paramount, emphasizing resource allocation based on demand recognition, capability invocation, and actual outcomes. Key elements such as capital, computing power, data, and talent should converge around high-value scenarios, flowing towards areas that can effectively address real pain points and generate stable returns. This new organizational model, supported by AI and driven by applications, not only fosters new business models and expands new growth spaces but also drives innovation and optimization in employment structures, industrial structures, and income distribution, injecting more sustainable and deeper momentum into high-quality development.

Strategic Logic and Practical Implementation

Having clarified the strategic logic of why to strengthen application-driven development, it is essential to address the practical question of how to do so. Ultimately, AI competition is a comprehensive competition of technological and application capabilities. To better empower economic and social development with AI, it is crucial to solidify the application drive, deepen the integration, and strengthen the foundational ecosystem.

Expanding High-Value Scenarios

Scenarios serve as the testing ground for AI maturity and the carrier for technology to transform into industrial capabilities. Without real scenarios to drive development, technological breakthroughs struggle to create stable demand; without large-scale application implementation, innovative results cannot accumulate into competitive advantages. Focus should be maintained on key areas such as manufacturing, transportation, energy, healthcare, education, and government, continuously deepening and expanding “Artificial Intelligence +” to push AI from demonstration verification to process embedding, and from single-point efficiency improvements to system-wide enhancements. Resource allocation should shift from emphasizing parameter scales and project deployments to valuing scenario benefits, delivery capabilities, and actual returns, with a focus on forming industry-level models, intelligent agents, and solutions. It is particularly important to leverage the driving role of leading enterprises, chain master enterprises, and platform enterprises to encourage collaborative innovation and joint efforts among upstream and downstream small and medium enterprises, accelerating the transformation of scenario advantages into industrial and competitive advantages.

Promoting Deep Integration Applications

AI’s empowerment of industries should not be superficial embedding but rather a genuine integration into business processes, organizational systems, and value chains, becoming a significant force in reshaping production methods and management models. Focus should be placed on key aspects such as production, services, and management, promoting deep coupling of AI with industrial internet, digital twins, and intelligent equipment to effectively address real-world issues such as quality control, equipment maintenance, supply collaboration, risk identification, and decision support. Coordinating the collaborative configuration of computing power, data, energy, and networks is essential, emphasizing system capabilities, collaborative scheduling, and efficiency improvements in new infrastructure construction. Only by embedding AI into core business processes and connecting it to foundational support systems can we achieve a transformation from usable to highly effective, from local breakthroughs to overall leaps.

Establishing a Collaborative Innovation Ecosystem

The implementation of AI applications often cannot be achieved by a single enterprise or technology alone; it requires collaboration across various aspects such as scenario openness, technology supply, data support, financial services, talent assurance, and institutional norms. A systematic perspective should be adopted to promote collaboration among governments, enterprises, universities, research institutions, financial institutions, and industry organizations, integrating the innovation chain, industrial chain, funding chain, and talent chain. Governments should strengthen planning guidance, policy supply, and standard construction to create a stable and predictable development environment. Enterprises should highlight their role as innovation leaders, leveraging the driving role of leading enterprises while also developing lightweight, low-cost solutions suitable for small and medium enterprises. Universities and research institutions should conduct organized research focused on industrial needs, facilitating the transition of more results from laboratories to production lines. Financial institutions should address the characteristics of AI R&D, which involves high investment, long cycles, and high risks, by enhancing technology finance. Additionally, it is crucial to adapt to the new trend of AI being widely embedded in the entire production and operation process, actively improving data governance, security governance, and accountability mechanisms, and cultivating composite talents who understand both technology and industry, as well as application and governance, to form an open, orderly, mutually empowering, and sustainably evolving development ecosystem.

Anthropic's Claude: A Strategic Shift in AI Product Design

Mon, 27 Apr 2026 00:00:00 +0000

Introduction

Recently, the most talked-about news in the AI community is not about model scores or new demos, but rather the capabilities released by Anthropic around Claude: enhanced automation, deeper device control, and a closer alignment with real workflows.

Many may think that AI Agents have evolved again. However, from a product manager’s perspective, the real focus should not be on what new tasks the Agent can perform, but on how Anthropic is trying to transform Claude from “a model that answers questions” into “a product that can undertake tasks and occupy the workflow entry point.”

This is not just a regular feature upgrade; it represents a change in product role. Once the role changes, the competitive logic shifts as well.

The Surface Upgrade vs. The Core Competition

In the past two years, the most mainstream form of AI products has been clear: users ask questions, and models respond; users continue to ask, and models provide further information. The core value has been “getting answers faster.”

However, Claude’s recent actions indicate a shift in product direction: moving beyond just answering questions to understanding tasks; not just generating content but invoking capabilities; and not remaining confined to a chat interface but gradually integrating into devices, tools, contexts, and workflows.

For product managers, Q&A products and task-oriented products are not in the same competitive category. The former competes on model capabilities, answer quality, and interaction experience, while the latter competes on whether it can address real goals, complete tasks across tools, establish execution trust, and become the default entry point for initiating work.

In other words, the competitive focus of AI products is shifting from “who is smarter” to “who is closer to the starting point of work.” Once this position is occupied, the value will far exceed single-point capabilities.

Importance for Product Managers

This shift reminds us that the core question for the next stage of AI products is no longer about “whether to implement an AI feature,” but rather: what role does AI play in your product?

Many teams currently working on AI remain at a superficial level: adding AI to search boxes, summarizing content on pages, enhancing forms, or integrating assistants in the backend. While these are not wrong, they often serve as “feature patches” rather than “product reconstructions.”

The real significance of Claude’s recent actions lies in its attempt to answer a larger question: if AI is no longer just a plugin but a human-machine interface that can complete tasks for users, how should product boundaries be redrawn?

This will directly impact three judgments for product managers: product entry points will be restructured; product value will shift from “tool usability” to “task trustworthiness”; and AI products will increasingly resemble “organizational capabilities” rather than “single-point capabilities.”

The Workflow Competition

If we break down the development of AI products into several stages, it can be viewed as follows: the first stage is content generation tools, the second stage is conversational assistants, and the third stage is task agent entry points.

The biggest difference among these three lies not in the models but in the depth of product involvement in work. When AI only helps you write a line of copy, it replaces a local action; when AI begins to manage steps, invoke tools, and execute processes, it takes over the task flow.

Once a product transitions from an “answerer” to an “executor,” the competition is no longer limited to similar AI products but will begin to encroach on the core areas of many existing products: search entry points, office workflows, SaaS navigation, and the complexity of vertical tools will all be re-abstracted by Agents.

Thus, what truly deserves attention is not the addition of more flashy features but the entire industry being forced to answer: if AI can become the first entry point for workflows, what remains irreplaceable in your product?

Understanding the Layers of Integration

Many teams see such trends and react by saying: “We should also add an AI assistant, create a chat interface, or integrate Agent capabilities.” However, this is often not the key.

The crucial point is to first determine: at which layer should AI be integrated into your product?

I suggest at least considering three layers: capability enhancement layer—helping existing functions become more efficient; process collaboration layer—assisting users in completing a process across functions; task agent layer—directly understanding goals, invoking tools, providing feedback, and handling exceptions.

The majority of products today should not fantasize about jumping directly to the third layer but should clarify whether they have the opportunity to establish an advantage in the second layer. Because the second layer determines whether you will have the qualification to enter the third layer in the future.

Insights for Mature Product Managers

If you only observe the excitement, you might conclude: Claude has been updated, and Anthropic is impressive.

However, from a product perspective, you should at least recognize four more important signals: the value anchor of AI products is shifting; user expectations of AI are upgrading; competitive units of products are changing; and the core work of product managers will not be diminished but rather elevated.

As capabilities grow stronger, what becomes truly scarce is not “whether there is AI,” but the ability to define problems, abstract scenarios, design processes, outline risk boundaries, and create a sense of user trust.

In simple terms, as models increasingly resemble commodities, the value of product judgment becomes more significant.

Conclusion

Claude’s recent actions, if merely understood as “another feature upgrade,” actually underestimate its significance.

It truly indicates that AI products are transitioning from “assisting expression” to “agent execution,” from “tool supplementation” to “workflow entry.” And this change primarily impacts not the rankings among model companies but the design logic of all software products.

Therefore, for product managers, the most important question today is no longer “Should we implement AI?” but rather, “As AI begins to take over task entry points, what should our product retain, reconstruct, or abandon?”

Those who can answer this question sooner will have a better chance of remaining at the table in the next stage.

Comparing GPT-5.5 and Claude Opus 4.7: Task Suitability Over Benchmark Scores

Mon, 27 Apr 2026 00:00:00 +0000

Introduction

Initially, I intended to frame this article as a model showdown between GPT-5.5 and Claude Opus 4.7. A simple comparison table could easily illustrate who performs better. However, after reviewing the official materials from OpenAI and Anthropic, I changed my mind. The real question is not about which model outperforms the other, but rather what you want from AI: less tool switching or less progress monitoring?

This article will provide a more practical evaluation.

Task Suitability

Short tasks are better suited for GPT-5.5, while long delivery tasks are more appropriate for Opus 4.7. I initially thought the focus would be on benchmark scores, but that can lead to misjudgments since both models provide numerous metrics. For instance, OpenAI reports that GPT-5.5 scored 82.7% on Terminal-Bench 2.0, which evaluates complex command-line tasks. In contrast, Opus 4.7 scored higher on SWE-Bench Pro Public, with GPT-5.5 at 58.6% and Opus 4.7 at 64.3%. This presents an interesting dilemma: if you ask which model is stronger, the answer is complicated, but if you ask which model is suitable for specific tasks, the answer becomes clearer.

GPT-5.5: Efficiency in Short Cycles

Let’s consider a practical scenario. You are debugging a test in Cursor or a terminal. You encounter dependency errors, script failures, and a flood of error logs. Previously, you would copy the error messages and send them to the model for suggestions, then return to the terminal to run the commands again. This back-and-forth is tedious. It’s not that the model can’t provide answers; it’s that the user becomes a manual laborer.

The most notable aspect of GPT-5.5 is not its individual scores but OpenAI’s clear direction towards making it a tool for practical computer work. Officially, it excels in coding, online research, data analysis, and document creation, completing tasks across multiple tools. This is not just about single-point Q&A; it’s about a series of actions. Therefore, my assessment of GPT-5.5 is that it is better suited for short-cycle development tasks such as researching, running commands, fixing small bugs, scripting, and document editing. These tasks are often fragmented and require frequent back-and-forth, where its advantages are most pronounced.

Another practical detail is that GPT-5.5 has a 400K context window in Codex. Its fast mode generates tokens at 1.5 times the speed, costing 2.5 times more. In simple terms, OpenAI is not just competing on intelligence; it is matching different tasks with different capabilities. Simple tasks are processed quickly, while complex tasks are given more context. This resembles a development workstation.

Opus 4.7: Stability in Long Tasks

Opus 4.7 has a different flavor. It does not merely compete on speed. Anthropic emphasizes complex tasks, long contexts, and agent workflows. The product page for Claude Opus 4.7 states clearly that it is suitable for production-level code, complex AI agents, and intricate document creation. An AI agent can be understood as an AI assistant capable of breaking down tasks and calling tools, with a 1M context window that allows it to remember more information at once.

This corresponds to another type of usage scenario. You are not asking it to fix a small bug; you are throwing a complex issue at it, hoping it can plan, modify code, check results, and inform you of uncertainties. Here, the focus is not on speed but on minimizing interruptions. You want it to avoid stopping to ask, “What’s the next step?”

Two customer tests on Anthropic’s official page provide valuable insights. On CursorBench, Opus 4.7 scored 70%, while Opus 4.6 scored 58%. Feedback from Notion indicates that complex multi-step workflows are 14% better than Opus 4.6, with tool errors reduced to a third. While this data should be viewed cautiously, as it comes from customer scenarios rather than neutral public tests, the direction is clear: Opus 4.7 is more stable in long tasks and tool calls.

Thus, my assessment of Opus 4.7 is that it is better suited for long delivery tasks such as refactoring, code reviews, large codebases, and complex agent automation. These tasks require its strengths.

Task Shape Over Model

The true dividing line is not the models themselves but the nature of the tasks. Short-cycle tasks are most disrupted by switching. You need to check APIs, debug tests, explain logs, and complete scripts. Each step is small, but the back-and-forth is significant. In these situations, GPT-5.5 is more efficient, as it excels in terminal work, browsing, office tasks, and cross-tool operations. It functions like a faster workstation.

Long delivery tasks, on the other hand, are most disrupted by interruptions. You want it to understand the entire project, make several continuous modifications, and check for any disruptions to existing logic. In these cases, Opus 4.7 is more appropriate, as its selling points are complex tasks, longer contexts, and less supervision. It operates like a more capable colleague who can sustain effort longer.

This distinction is more critical than benchmark scores. Scores indicate whether a model can perform, while task shape indicates whether it should be used.

Pricing Considerations

When it comes to pricing, don’t just look at the per-million token cost. Data sources include OpenAI’s official pricing page and Anthropic’s product page. Opus 4.7 also offers prompt caching that saves 90% and batch processing that saves 50%. While the output unit price may be slightly cheaper for Opus 4.7, the actual billing is not calculated that way. The length of context, the number of tool calls, and the number of retries all impact the final cost.

A model with a low unit price but requires three runs due to errors can end up costing more. Conversely, a model with a higher unit price that successfully completes a test on the first attempt may ultimately be cheaper. Therefore, when evaluating AI programming costs, the focus should not be solely on the few dollars difference per million tokens but rather on whether it can reduce failures. This approach is closer to real-world work.

Practical Usage Recommendations

If I were to provide practical usage recommendations, I would categorize them as follows:

Use GPT-5.5 for daily development tasks. It excels in researching, running terminal commands, fixing small bugs, scripting, and document handling, especially if you are already in a ChatGPT or Codex workflow, minimizing switching costs.
Use Opus 4.7 for complex deliveries. It is suitable for large codebases, long contexts, complex refactoring, code reviews, and agent automation, particularly for tasks where you do not want to check in every few minutes.

For critical code, do not let a single model handle everything. Assign short tasks to GPT-5.5, long tasks to Opus 4.7, and have another model perform the review. This combination is practical. The truly reliable approach to AI programming is not to bet on a single strongest model but to allocate different roles to different tasks.

Conclusion

The most counterintuitive aspect of this comparison is that neither GPT-5.5 nor Opus 4.7 outperforms the other decisively. GPT-5.5 expands the workstation, aiming to consolidate code, tools, browsing, and office tasks into a single entry point, thus solving the issue of frequent switching. Opus 4.7 stabilizes complex tasks, aiming to minimize interruptions and reduce the need for human oversight, thereby addressing the issue of constant progress monitoring.

So, stop asking who the true champion is. In real work, the term “champion” is not particularly useful. What matters is whether a model can streamline a chaotic process. If it reduces the need to copy error messages three times, GPT-5.5 is valuable. If it allows you to monitor a long task less frequently, Opus 4.7 is valuable. Parameters will continue to evolve, and rankings will change, but the way tasks are divided will remain a crucial judgment.

Why Programmers Prefer Codex While Vibe Users Favor Claude

Thu, 23 Apr 2026 00:00:00 +0000

Why Programmers Prefer Codex While Vibe Users Favor Claude

In 2026, AI programming tools have evolved from mere “code completion” to fully autonomous coding solutions. With three leading tools in the market—Cursor, OpenAI Codex, and Claude Code—how do you choose the right one?

These tools have distinct user bases: beginner programmers often prefer Codex for its efficiency and low cost; experienced developers or architects might lean towards Cursor; while independent developers or those in the “Vibe Coding” community are likely to favor Claude Code.

Why Such Different Preferences?

Simply put, beginner developers favor Codex because it acts like a “compliant and efficient intern”—fast, cost-effective, and capable of handling bulk tasks. In contrast, architects choose Cursor, which serves as a collaborative programming partner. Vibe Coding users appreciate Claude Code for its ability to communicate well, understand the bigger picture, and independently tackle complex functionalities.

Features of the Three AI Tools

We will analyze the differences among these three tools based on five dimensions: functionality, experience, performance, pricing, and use cases.

OpenAI Codex: Cloud-Based AI Programming Command Center

Codex’s core philosophy is “delegation”—you assign tasks to it, and it completes them independently for your review.

In February 2026, OpenAI released the Codex desktop app for macOS, a standalone AI programming command center—not a plugin or a web app. Its core capabilities include:

Multi-Agent Parallelism: Can run up to 10 agents simultaneously, handling different tasks like front-end, testing, and deployment.
Worktree Isolation: Each agent works in its own Git work copy, preventing interference.
Skills System: Built-in reusable skill packages for tasks like converting Figma designs to code, project management with Linear, and cloud deployment.
Automations: Supports scheduled tasks, such as daily test runs and issue classification.

The design goal of Codex is to transform developers from “coders” into “managers”. Sam Altman stated at the launch, “I worked on a big project for days without opening an IDE even once.”

Cursor: AI-Native IDE

Cursor’s core philosophy is “collaborative”—it is not just a VS Code with an AI plugin but an editor reconstructed with AI as its DNA.

Cursor’s advantages include:

Deep Integration: AI can see everything you see—project structure, terminal output, and debugging information.
Real-Time Feedback: High accuracy in code completion, allowing quick execution with the Tab key.
Multi-Model Support: Can switch between models like GPT-5, Claude, and Gemini in the same session.
Agent Window: Cursor 3.0 features a unified agent workspace that supports running multiple agents locally, in the cloud, and via remote SSH simultaneously.

Cursor is positioned as the “default primary development environment”. Most development time is spent in the editor, where AI assists with editing, browsing, jumping, and instant completion.

Claude Code: Terminal-First AI Coding Assistant

Claude Code’s core philosophy is “supervised coding agent”—it excels in deeply understanding codebases and executing complex reasoning tasks.

Claude Code’s features include:

Terminal Native: All operations are performed in the command line, with minimal resource usage.
Deep Reasoning: Achieved an 80.9% success rate in SWE-bench benchmark tests, leading its peers.
Parallel Sessions: The new desktop app supports running multiple Claude sessions in the same window.
Routines Feature: Supports automated tasks triggered by schedules, APIs, and GitHub events, even when offline.

The design goal of Claude Code is to act like a seasoned architect, excelling in handling multi-step refactoring, architectural changes, and complex debugging.

Summary Table

In-Depth Experience Comparison

Code Generation Quality and Reasoning Ability Claude Code excels in deep reasoning and complex tasks. In a test to “build a lightweight task scheduler,” Claude delivered a “production-ready” solution with complete documentation, test cases, and error handling, consuming about 235,000 tokens. Codex, on the other hand, is known for its simplicity and efficiency, completing the same task with approximately 72,000 tokens (about three times cheaper than Claude), but lacking detailed documentation. Developers have summarized: Claude Code is like a seasoned engineer (detailed, expensive), while Codex is like a skilled intern (fast, cheap). Cursor’s code quality is average, but thanks to its deep integration, it can enhance generation through a retrieval-augmented generation (RAG) system, often performing more consistently in real projects than pure model scores.
Execution Speed and Response Codex has a clear speed advantage, generating tokens at over 240 tokens/s and scoring 77.3% in Terminal-Bench 2.0. Its cloud-based parallel execution allows multiple tasks to run simultaneously without waiting. Cursor has zero network latency during local operations, providing very smooth Tab completion responses. Claude Code is relatively conservative in response speed, but the new desktop app has significantly optimized its parallel capabilities, improving the overall experience.
Context Understanding and Project Awareness Cursor has a natural advantage in this dimension. Being directly integrated into the IDE, AI can see the files you are editing, the entire project structure, terminal output, and debugging information. Its RAG system can retrieve rich codebase context from the local file system. Claude Code excels in deep codebase analysis. In tests, it could accurately understand project architecture by searching existing code files and reading base classes, providing code suggestions that align with design patterns. However, its reliability in multi-file editing currently lags behind Cursor. Codex theoretically can obtain complete context by preloading the entire codebase in a cloud sandbox, but the separation from the local IDE makes developers less aware of what AI “sees” compared to Cursor.
User Experience and Learning Curve Cursor offers the smoothest experience curve. Based on VS Code, it retains all functionalities and plugin ecosystems, allowing programmers to start with virtually zero learning cost. The visual interface and instant feedback make developers feel that “AI is really helping me think through problems.” Claude Code’s pure terminal design is friendly for command-line-oriented developers, with low resource usage and high focus. However, it presents a higher learning threshold for those who prefer graphical interfaces. Additionally, Claude Code’s image recognition capabilities are notably inferior to Cursor’s, lacking precision in understanding screenshots and design drafts. Codex provides a brand-new macOS desktop application with a simple interface, but users need to adapt to a “delegation” rather than “editing” workflow. Some reviews have pointed out that while Codex’s model is powerful, the user experience still needs improvement.
Cost and Value for Money This is the dimension where the differences among the three tools are most pronounced. Codex is the most aggressive in cost control. Completing tasks of equivalent complexity, Codex’s token consumption is about one-third that of Claude Code. ChatGPT Plus users can access it for $20 per month, making it highly cost-effective. Cursor’s pricing strategy sparked controversy in the second half of 2025. The Pro plan shifted from “unlimited” to a quota of 500 uses, later introducing an invisible throttling system. The Pro+ plan at $60/month even removed the description of “unlimited use,” leading to user attrition and a temporary hit to its reputation. Claude Code is the most expensive in terms of token consumption, with a complex task potentially consuming 2-3 times the tokens of Codex. Heavy users may incur monthly costs of $150-200, but its output quality is the highest, making it worth the investment in scenarios requiring deep reasoning.

Use Case Selection

If You Are This Type of Programmer:

Choose Cursor as your main editor and Claude Code as a helper for complex tasks. Cursor’s real-time collaboration and visual feedback are suitable for daily coding; when facing challenges requiring deep refactoring or architectural changes, turn to Claude Code. As one developer put it: “Cursor is the best tool for daily feature development and visual feedback, while Claude Code is better suited for hardcore problems and multi-file refactoring.”

If You Are a Tool Enthusiast Seeking Ultimate Efficiency:

Use Codex as your “AI programming team.” Codex’s multi-agent parallel capability allows you to delegate multiple tasks simultaneously—front-end, testing, and deployment—then take a coffee break and return to review the PRs. This experience of “one person commanding thousands” is currently irreplaceable by other tools.

If You Are a Vibe Coding User or Independent Developer:

Prioritize Claude Code. It excels at extracting key information from vague requirements and providing a “production-ready” complete solution. A journalist with zero programming background built a custom website using Claude Code in a few days, pulling listings from Redfin and calculating walking times—exactly the experience Vibe Coding users need.

If You Are a Team or Enterprise:

Prioritize Cursor’s Enterprise solution. Cursor 3.0 offers self-hosted cloud agents, audit logs, sandbox terminal commands, and management controls, suitable for organizations with strict code security requirements. If you are already in the ChatGPT ecosystem, Codex has the lowest marginal cost, making team adoption easier.

PR Acceptance Rate: Who Is More Trustworthy?

According to a February 2026 academic paper analyzing 7,156 PRs, each tool performs differently across various task types:

The conclusion is clear: No single tool excels in all tasks—use Claude Code for documentation and feature development, Cursor for bug fixes, and Codex for balanced performance across various tasks.

Notably, in the final merged code requests by developers, Claude Code accounted for 32.1%, while Codex accounted for 24.9%. This indicates that in terms of actual adoption rates, Claude Code currently holds a slight edge.

Final Thoughts

In 2026, AI programming tools are no longer about which model is stronger but rather about which working style suits you best.

Codex = Delegation → Suitable for automation, parallel, and bulk tasks
Cursor = Collaborative → Suitable for daily development and real-time feedback
Claude Code = Conversational → Suitable for deep reasoning and complex refactoring

Real-World Choices: You Can Have Them All

In reality, Codex, Cursor, and Claude Code are not mutually exclusive alternatives. Increasingly, developers are adopting a “dual-holding strategy”—using Cursor as the daily editing environment, Claude Code for complex tasks, and Codex for running bulk automation tasks.

A seasoned developer shared his combined workflow: Claude Code excels at architectural planning and deep reasoning, while Codex handles execution validation and rapid iteration, achieving the best results when used together.

Andrej Karpathy has also proposed a similar “three-layer AI programming structure”: using Cursor for daily simple tasks, Claude Code or Codex for larger functional blocks, and other models for the most stubborn bugs.

True efficiency gains come from flexibly combining different AI tools based on task scenarios.

Don’t get caught up in “who is the strongest.” The truly efficient approach is to switch flexibly according to task scenarios, maximizing the strengths of each tool. The goal of AI tools has never been to replace you but to elevate you to a higher dimension (to outperform colleagues) and build more elegant and robust programs (to save costs for capital).

AIGC Reshapes the Audio-Visual Industry Ecosystem

Fri, 17 Apr 2026 00:00:00 +0000

AIGC Reshapes the Audio-Visual Industry Ecosystem

On April 17, during the 16th Beijing International Film Festival’s core industry forum, Zhao Chunyan, general manager of Shichuang Langyuan and head of the Beijing AIGC Audio-Visual Industry Innovation Center, officially launched the AIGC audio-visual industry comprehensive innovation ecosystem. This initiative aims to promote five core measures across the industry: AI full-stack technology supply, AIGC-OPC creative talent supply, audio-visual data asset security, leveraging resources for IP transformation, and expanding the boundaries of audio-visual content. The goal is to build a sustainable, perceptible, consumable, and communicable complete cultural IP market, fostering a chain ecosystem of “technology foundation + full-chain services + demonstration scenarios” that nurtures a “mass-producible, verifiable, and monetizable” AIGC audio-visual content ecosystem.

The Beijing AIGC Audio-Visual Industry Innovation Center, guided by the Beijing Municipal Radio and Television Bureau and the Chaoyang District Government, is built on the foundation of 100,000 cultural enterprises and 3,200 film and television companies in Chaoyang. It serves as an open and shared public service platform for the entire industry. By 2025, it will launch six major service platforms: intelligent computing power, intelligent entity production, audio-visual digital content creation, incubation services, talent services, and business promotion. To date, the innovation center has gathered over 100 technology and film enterprises, including China Film AI, Alibaba Cloud, Huawei Cloud, Youku, iQIYI, and various leading universities and national research institutions, creating a comprehensive industrial ecosystem covering computing power, audio-visual data supply, video models, AI technology tools, and demonstration scenarios.

As the operational entity of the Beijing AIGC Audio-Visual Industry Innovation Center, Shichuang Langyuan’s comprehensive innovation ecosystem plan focuses on leveraging Chaoyang’s rich cultural consumption resources in close cooperation with the district government. It aims to promote the scene-based commercialization of premium cultural IPs while exploring new content tracks such as XR, interactive film and games, and naked-eye 3D, thereby expanding the boundaries of audio-visual content and enriching AIGC audio-visual forms.

At the event, it was reported that Chaoyang District has cultivated AIGC intelligent platforms like Nianlun AI and Nano AI, forming a full-stack technical support system from algorithm models to content production tools. This system provides efficient intelligent production solutions for film, animation, short videos, and digital performances, significantly lowering creative barriers and enhancing production efficiency. The Chaoyang District Government has also launched an annual special support fund of 150 million yuan for high-quality cultural industry development, part of which focuses on supporting AIGC public technology service platforms, offering up to 30% and 2 million yuan in funding subsidies for eligible AIGC technology R&D and application projects.

In terms of talent supply, the AIGC and audio-visual OPC super-creative talent platform operated by Shichuang Langyuan has established a full-cycle cultivation system, serving over 500 enterprises and connecting more than 5,000 creators. By collaborating with 12 universities and nearly 200,000 faculty and students, it aims to build a talent reservoir for the AI audio-visual industry, directing the supply of composite talents such as AI screenwriters, AI directors, digital artists, and AI engineers, thereby solidifying the industry’s talent foundation.

Data security and standard construction are also advancing simultaneously. Shichuang Langyuan is collaborating with Beijing Data Group and CCTV Yicheng to explore mechanisms for developing and utilizing audio-visual data resources, promoting compliant circulation and value transformation of data. They are also working with the China Broadcasting Union and the Beijing Academy of Broadcasting Science to advance the establishment of industry governance standards for AIGC-generated content.

The comprehensive support for IP transformation and scene innovation continues to break new ground. Leveraging Chaoyang’s deep cultural consumption resources, the innovation center will work closely with the district government to link various cultural landmarks and events, using policy, funding, space, and traffic to drive the scene-based commercialization of IP content. It will also establish R&D laboratories for cutting-edge audio-visual technologies such as XR, interactive film and games, and naked-eye 3D, aiming to create innovative works that resonate with the times and expand the boundaries of audio-visual content.

A representative from Shichuang Langyuan stated that the decades of cultural heritage and resource endowment in Chaoyang provide fertile ground for the AIGC audio-visual industry. The innovation center will continue to play a pivotal role in the ecosystem, bridging the gap between industry, academia, and research to promote the deep integration of culture and technology, allowing “the roots of culture to give rise to the fruits of technology.” In the future, the Beijing AIGC Audio-Visual Industry Center will continue to open its ecological resources, welcoming more enterprises, AIGC super-creatives, and institutions to join the collaborative innovation alliance, building a comprehensive collaborative innovation ecosystem for the AIGC audio-visual industry and winning together in the new era of intelligent audio-visual.

At the same time, the innovation center, along with partners such as China Film Artificial Intelligence Research Institute, Yitong Film, Beijing Zhiju, Quwan Technology, and Dinosaur Film, showcased various experiences at this year’s Beijing International Film Festival, including intelligent film review systems, atmospheric design assistants for film scenes, AIGC short film creative systems, AI dubbing and translation systems, and a naked-eye 3D trailer for “10 Rooms of Death Squad.” These experiences allow industry professionals and the public to closely perceive the impact of new technologies on the film industry.

Additionally, a 600-square-meter AIGC audio-visual industry innovation application exhibition has opened at Shichuang Langyuan Station, providing a one-stop understanding of the current applications of AI technology in various sectors such as cinema, series, 3D animation, gaming, XR, short dramas, comics, advertising, live e-commerce, and cultural tourism. Activities focusing on AI creation and talent exchange, such as the “Keli AI Creator Workshop” and “Shichuang Langyuan ‘Offer Shop 3.0’” are also being held at this year’s main venue of the Beijing International Film Festival.

Three Years of ChatGPT: 76% of Americans Distrust AI, Understanding the Social Divide

Tue, 14 Apr 2026 00:00:00 +0000

Three Years of ChatGPT: 76% of Americans Distrust AI

Since the launch of ChatGPT in 2022, the wave of generative AI has swept across the globe for three years. During this time, American society’s attitude towards AI has rapidly evolved from initial awe to a profound, multifaceted divide. This divide is not merely about liking or disliking AI; it cuts along economic, political, social, and ethical fault lines, splitting different groups into camps that struggle to communicate.

From an economic perspective, AI is exacerbating a ‘K-shaped’ divide, with unequal distribution of benefits and costs.

The wealth generated by technological acceleration has not trickled down to the masses as the trickle-down theory suggests, but has been captured by a small elite. Data shows that the top 1% of households in the U.S. saw their share of total assets rise from 27% in 2022 to 28.9% in 2025, while the bottom 50% saw their share decline from 6% to 5.3%.

Capital is concentrating at an unprecedented rate in AI core companies, with the capital expenditures of the ‘Big Seven’ tech giants, including Microsoft and Google, accounting for nearly one-third of total corporate spending in the U.S. However, the costs of AI expansion are borne by ordinary citizens: the construction of data centers has driven up electricity prices in some areas by as much as 267%.

As a Federal Reserve report reveals that over a third of adults struggle to manage a sudden $400 expense, the anxiety over job losses due to AI and the wealth accumulation of tech elites creates a glaring opposition.

Turning to the political spectrum, AI regulation has become a focal point of intense battles between the two parties, with consensus evaporating.

Within the Democratic Party, progressives and moderates are at odds. Senator Bernie Sanders has co-sponsored the “AI Data Center Moratorium Act,” advocating for a complete freeze on new AI data centers to protect livelihoods. However, fellow Democrat Senator Fetterman criticized the bill as a “China-first” policy, arguing it would undermine U.S. competitiveness.

The Republican Party largely opposes strong federal regulation, with the Trump administration pushing for a national policy while attempting to limit states’ legislative powers. This division has made it nearly impossible to establish an effective national regulatory framework, with California’s strictest AI safety bill vetoed by the governor and federal regulatory measures continuously being relaxed.

From a social structural perspective, different industries and job roles experience AI in vastly different ways.

Knowledge-intensive service sectors and manual labor industries seem to inhabit two separate worlds. A Goldman Sachs report indicates that the AI usage rate in computing and web hosting companies is as high as 60%, followed closely by finance, insurance, and professional services.

Analysis by an OpenAI co-founder suggests that professions such as software development, legal assistance, and writing are impacted by AI at a level of 9 out of 10, while jobs like construction workers and cleaners experience an impact level of only 1-2. This disparity leads to a cognitive and interest divide: about 70% of corporate managers believe AI enhances efficiency, while among regular employees, this figure is just above 50%.

On one side is the anxiety of being replaced, and on the other, the enthusiasm for efficiency gains, with common language becoming increasingly scarce.

In terms of technical ethics, specific controversial incidents continuously erode public trust, pushing divisions to extremes.

A series of events have amplified societal unease:

“AI Refinement” of workers: Companies have trained AI avatars using the chat records and work data of laid-off employees, sparking widespread outcry over digital personality rights and the exploitation of labor value.

Risks of losing control: The collective failure of autonomous taxis in Wuhan exposed safety hazards when AI systems are applied at scale.
Escalation of violence: Attacks against AI elites have transitioned from online protests to physical violence. OpenAI CEO Sam Altman’s residence was attacked with Molotov cocktails, and officials in Indiana supporting data center construction faced gunfire at their homes. This marks a shift from ideological debate to physical attacks against individuals.

After a multidimensional analysis, a nascent form of a ‘zero-sum game’ without clear winners is emerging.

Three years after ChatGPT’s launch, American society has not reached a new consensus on how to harness AI; instead, existing fractures have been exacerbated by technology. Economically, the concentration of benefits in the top tier reinforces the ‘K-shaped’ structure; politically, the tug-of-war between the parties over regulation and innovation has reached a stalemate; socially, different occupational groups are at odds due to differing self-interests; ethically, specific cases of technological abuse continue to erode an already fragile public trust.

This creates a vicious cycle: economic inequality exacerbates social anxiety, political polarization leads to regulatory voids, and regulatory vacuums allow for ethical disorder in technology, with each incident of ethical failure further intensifying opposing sentiments and even breeding violence.

After the attack, Altman called for “AI must be democratized, and power cannot be overly concentrated,” and released a white paper suggesting a trial of a four-day workweek and tax reforms to share the benefits, which can be seen as a response from tech elites to the current divisions.

However, with 76% of Americans distrusting AI while industry giants invest over $300 million in political lobbying, the bridge of trust has long since collapsed.

AI has not directly created new lines of division, but it acts like a high-powered developer, making the existing issues of wealth disparity, political polarization, and class stratification in American society exceedingly clear and sharp. Until a systematic solution is found for fair distribution of benefits, rebuilding regulatory trust, and safeguarding labor value, this social consensus fracture ignited by technology is unlikely to heal.

China's AI Development: Innovations and Global Cooperation

Sat, 11 Apr 2026 00:00:00 +0000

Introduction

China’s 14th Five-Year Plan outlines a comprehensive implementation of the “Artificial Intelligence +” initiative, empowering various industries. The recent government work report emphasizes the deepening of this initiative, supporting open-source AI community development, enhancing data resource utilization, and improving AI governance.

AI Innovations and Applications

Globally, AI technology is rapidly innovating and integrating across industries. Reports from various foreign media highlight China’s multifaceted breakthroughs in AI technology innovation, application, and ecosystem development while maintaining a human-centered and benevolent approach. China aims to share its innovative achievements with the world, ensuring that technological advancements benefit all humanity and drive global development and prosperity.

Industrial Applications

Bloomberg reports that China is focusing on application-oriented AI development to strengthen its manufacturing advantages. Industrial robots operate in “dark factories,” achieving high efficiency through automation, while AI accelerates logistics and shortens product design cycles.

According to Cuba’s Granma, AI technology is transforming traditional agriculture in China. In the smart agriculture demonstration park in Pinghu, Zhejiang, AI and IoT technologies have been deeply integrated into the entire agricultural process, increasing overall production efficiency by approximately 75%. This integration has significantly reduced the use of water, fertilizers, and labor while increasing vegetable yields by 5 to 7 times, showcasing the dual value of “AI + modern agriculture” in enhancing efficiency and promoting sustainable development.

Cutting-Edge Research

The Uganda Development Observatory highlights China’s innovative breakthroughs in frontier technologies. Chinese researchers have successfully explored the integration of AI and synthetic biology to accelerate innovation, reducing the protein design cycle from months to weeks, with potential applications in drug development and diagnostic technologies.

Broad Integration of AI

Digital Agenda, a European tech news platform, reports that AI technology is deeply integrated into various sectors in China, enhancing economic production, social development, and public services. In energy, AI optimizes power production, smart grids, and renewable energy management, improving system efficiency and stability. In education, AI enables personalized learning, intelligent tutoring, and automated assessments. In urban development, AI optimizes traffic and public services, with nearly 70% of new vehicles equipped with intelligent cabins and the gradual promotion of autonomous vehicles.

Manufacturing Transformation

Singapore’s Lianhe Zaobao reports that China is accelerating the “AI + manufacturing” initiative, aiming to transform its traditional manufacturing sector into an advanced manufacturing powerhouse. Denmark’s Berlingske notes that China has made significant strides in AI, demonstrating outstanding technological innovation and ecosystem building capabilities.

Long-Term Planning and Coordination

By 2025, China’s core AI industry is expected to exceed 1.2 trillion RMB, with over 6,200 AI companies and more than 300 humanoid robots launched, making China the largest holder of AI patents globally. Various measures are being implemented to promote the deep integration of AI with economic and social development, fostering mutual promotion between technological breakthroughs and ecosystem construction.

Open Source Collaboration

Singapore’s Business Times reports that Chinese engineers are collaborating on open-source AI models, studying thousands of independently developed variants, fostering collective innovation rather than relying solely on individual efforts. Norway’s Invest highlights that DeepSeek has optimized internal information sharing mechanisms in models, reducing computational load and energy consumption while enhancing stability and efficiency during scaling.

Brazil’s O Globo analyzes the Chinese government’s strong push for AI industry development, stating that China’s long-term planning and coordination mechanisms contribute to forming industrial synergy.

Policy Support and Infrastructure

The BBC reports that China’s government work report emphasizes creating a new form of intelligent economy, further elevating AI’s role in the country’s economic development framework. Digital Agenda notes that China has introduced a series of AI-related policies and regulations, providing a solid institutional guarantee for technological innovation. The government is increasing investments in infrastructure, data, energy, and talent, widely deploying 5G networks, high-performance data centers, and cloud computing platforms to support large-scale AI model training and applications.

Germany’s Technology Times analyzes that the rapid development of China’s AI technology ecosystem is attributed to multiple factors, including government policy guidance, legal system guarantees, and enhanced corporate innovation capabilities. Collaboration among enterprises, universities, and startups forms a complete innovation chain, with events like the World Artificial Intelligence Conference facilitating knowledge flow and technology application.

Global Cooperation and Governance

China actively participates in the formulation of digital governance rules, proposing initiatives like the Global Data Security Initiative and the Global AI Governance Initiative, aiming to establish a comprehensive digital governance framework that prevents technological innovation from becoming a game for the wealthy. China advocates for open cooperation, opposes technological barriers, and promotes AI development for the benefit of all, earning widespread support and recognition from the international community.

By 2025, China’s domestic open-source models are expected to have the highest global download volume. Malaysia’s New Straits Times notes that open-source models provide a new path as “public goods,” allowing institutions worldwide to run and download models on local servers. Uganda’s recently launched large language model “Sunflower,” based on China’s Qianwen model, assists farmers with agricultural guidance and helps students translate learning materials into local dialects. This highlights that China’s AI development is not just a national success story but also demonstrates how China provides development momentum for the entire world by offering efficient, open, and high-performance technological tools, lowering the barriers to entering the AI era.

Conclusion

China is a key force in driving AI development and innovation. Italy’s La Repubblica reports that China’s open-source models not only activate the domestic technological application ecosystem but also spread internationally through open releases and institutional collaborations. The editorial in Nature welcomes China’s initiative to establish a World AI Cooperation Organization, emphasizing that such institutions align with the interests of all nations. It calls for global collaboration to discuss AI safety guidelines and jointly plan enhanced AI governance pathways. France’s Le Figaro reports that China actively promotes global governance and international cooperation in AI, seeking a balance between AI development and safety, advocating for the establishment of a World AI Cooperation Organization, and is willing to share technological advancements with other countries, especially developing nations.

Anthropic's Claude Managed Agents Boosts AI Deployment Speed by 10x

Thu, 09 Apr 2026 00:00:00 +0000

Introduction

The competition in artificial intelligence (AI) infrastructure is entering the “Agent Era.” Following the race for large model capabilities, Anthropic has launched Claude Managed Agents, aiming to upgrade AI from a “conversational tool” to a “sustainable operational production system.”

In an official blog post released on April 8, Anthropic introduced Claude Managed Agents as a composable API suite designed for large-scale construction and deployment of cloud-hosted agents. This product aims to address the core pain points of deploying agents in enterprises—complexity and engineering costs—emphasizing that it can enhance the efficiency of building and deploying agents by tenfold.

Commentators believe that Claude Managed Agents is not just a new product but a paradigm shift: the value of AI is moving from “answering questions” to “completing tasks.” If large models are the “operating system” of the AI era, then Claude Managed Agents aims to be the “enterprise automation platform” running on top of it.

From Development Tools to Managed Systems: The Cloud Era of Agents

Anthropic’s core definition in the blog states that Claude Managed Agents is a “fully managed” runtime environment, where developers no longer need to handle the underlying infrastructure themselves.

The company clearly points out that building agents in the past often required addressing a series of complex issues, such as:

Scheduling long-running tasks
Error recovery and retry mechanisms
Concurrency and scaling
Logging and monitoring

The goal of Claude Managed Agents is to “allow developers to focus on defining what the agent does, rather than how to run it.”

This positioning essentially upgrades AI agents from “code projects” to infrastructure services similar to cloud databases and cloud functions.

Media reports suggest that this indicates Anthropic is attempting to “host your AI agents,” directly entering the foundational layer of enterprise software.

Reducing Development and Operational Complexity

In terms of performance and efficiency, Anthropic has provided striking metrics.

The company emphasized that Claude Managed Agents can significantly reduce development and operational complexity, achieving a “tenfold increase in the speed of building and deploying agents.”

This improvement does not stem from the model itself but from the reconstruction of the engineering system:

Automated runtime environment
Built-in task orchestration
Standardized tool invocation
Continuous running capabilities

In other words, Anthropic is turning “AI engineering” into a “configuration problem.”

This is symbolically significant in the industry. In the past, even enterprises with strong models often got stuck at the “last mile”; the managed model directly addresses this bottleneck.

Core Capabilities Breakdown: From “Talking” to “Working”

The key to Claude Managed Agents lies in enabling AI to perform “long-running tasks.”

Anthropic emphasizes that agents are not just about calling models but are systems capable of long-running tasks, multi-step decision-making, calling external tools, and automatic error correction and retries.

This sharply contrasts with traditional chatbots.

According to previous research by Anthropic, the proportion of task delegation usage with Claude in enterprises has risen from 27% to 39%, indicating that users are rapidly shifting towards “having AI perform tasks.”

Claude Managed Agents is a productized response to this trend.

Enterprise Implementation: From Experimentation to Production

On the application front, Anthropic has already collaborated with enterprises.

For instance, in finance and data analysis scenarios, Claude has been used for:

Automating financial modeling
Data analysis and validation
Cross-system information integration

Anthropic previously disclosed that its model achieved an accuracy rate of 83% in complex Excel tasks and can complete multi-level financial modeling tasks.

These capabilities, combined with “managed agents,” mean that AI can be directly embedded into core enterprise processes, rather than just serving as auxiliary tools.

Anthropic introduced some early adopters of Claude Managed Agents, claiming that various teams have achieved a tenfold increase in delivery speed across a wide range of production application scenarios.

The company noted that Rakuten has deployed enterprise-level agents across its product, sales, marketing, finance, and HR departments, seamlessly integrating with Slack and Teams, allowing employees to directly assign tasks and receive deliverables in forms such as spreadsheets, presentations, and applications, with each specialized agent being deployed within a week.

The company also mentioned that Sentry integrated its debugging agent Seer with Claude-driven agents responsible for writing patch code and submitting pull requests (PRs), allowing developers to seamlessly convert a flagged bug into a reviewable fix proposal, with this integrated solution successfully going live in just weeks instead of the usual months.

Concerns: The Cost and Control Dilemma

However, managed agents are not without their costs.

Reports earlier this month indicated that Anthropic has restricted third-party agent tool access due to these tools causing “overload” on the system.

This reflects a key issue— the more powerful the agent, the higher the computational costs.

Additionally, there remains uncertainty about whether enterprises are willing to entrust critical business processes to an AI platform.

OpenAI Codex Team's Shift from Specs to Skills in Product Development

Wed, 08 Apr 2026 00:00:00 +0000

Introduction

In the Codex team, the concept of specs has become much lighter. Often, documentation consists of just 10 bullet points before diving directly into development.

This change is largely related to the enhanced capabilities of the models. A few years ago, there was a lot of focus on refining prompts and making specs more complete and structured to ensure models executed tasks reliably. Now, the Codex team discusses skills more frequently. They have begun organizing common tasks into groups of callable capabilities, allowing the model to execute them.

Thus, specs no longer take center stage; skills are becoming the new entry point, and development is shifting from “describing processes” to “organizing capabilities.”

We translated the latest podcast episode, which discusses not only how they develop products but also how OpenAI’s internal understanding of coding agents, skills, and development methods has evolved alongside model capabilities.

Writing Specs? We Write About 10 Bullet Points

Peter Yang: Hello everyone, welcome to today’s show. I’m excited to invite Alex and Romain from the OpenAI Codex team. Alex is the product lead for Codex, and Romain is in charge of developer experience.

Alex / Romain: Thank you for having us, we’re glad to be here.

Peter Yang: I’m curious about how your team uses Codex for product development. Alex, do you still write specs, or do you let GPT help you with that? What does the process look like, and which model do you use?

Alex: I think we write very few specs in the Codex team now. We have a core idea of letting those “closest to the implementation” make as many decisions as possible.

We only write specs in situations where the problem is too complex for one person to grasp. Honestly, a single person can hold a lot of information now since they can delegate most coding tasks. So, the scope of what one person can accomplish is much larger than before.

However, if the task requires coordination among several people or involves particularly tricky decisions, we might write a spec. Even then, such documents are usually very short—around 10 bullet points.

Host: Can you demonstrate this? For example, can you give Codex a few bullet points, and it writes a more complete requirement or a markdown file?

Romain: Yes, that can be done. But I want to show you a simple yet illustrative scenario. For instance, when developing an iOS app, you might just need to voice input a command like, “Help me add a new page about NASA’s Artemis lunar mission,” and send this prompt to GPT-5.4. The model will directly generate the new page for the iPhone app.

Imagine you are close to finishing a task, and new feature ideas start popping into your head, but you are unsure of the next steps.

At this point, using Codex is interesting because if I say, “Let’s plan the next steps,” Codex automatically understands that I am trying to plan the content to be built next. If I press Shift+Tab, it enters plan mode. Then if I ask, “What should we do next?” I can use Codex as a brainstorming partner to plan the next steps together.

In this mode, it looks at the current code and project status, then proposes some ideas on its own. I can also add my thoughts, gradually guiding the model toward a better planning direction.

Now you can see it has started generating ideas based on the project status, code, and file content.

So that’s how I use Codex. Of course, in this demonstration, I didn’t provide much input initially. If I were Alex, the product lead, I would definitely provide more guidance upfront. But here, I intentionally let Codex propose some ideas on its own.

Alex: Many changes can actually be categorized into a few types. Some are very simple, and you just prompt it directly to make the change. Others are of medium complexity, where you might want to think about how to proceed or let it output a specific plan first.

But I often use a common approach similar to the previous example. When I have only a vague idea in my head, I open Codex and let it start thinking about “how this problem might be solved.” At this point, I don’t even have a clear feature definition. It will explore on its own and come back with questions for me.

Often, I don’t end up adopting the proposed solution because some changes may prove to be very complex. By the way, the question of “what code should PM write” is worth discussing. For me, if it’s a complex change, I don’t necessarily want to be responsible for integrating it and maintaining it long-term, but I still go through the planning mode and exploration process. This way, I develop a better mental model of what needs to be done.

In the end, I hand over the “thought results” rather than the plan itself to the engineers. I believe what’s truly valuable is often not the plan document but the understanding I form through this process.

Interestingly, our Codex team’s designers now write more code than many engineers did about six months ago. We sometimes joke that they are really impressive now. Of course, tools play a significant role in this.

The team used to joke about how few PRs I had merged in the past year. I won’t disclose specific numbers, but I admit I should have done more. Especially considering that many of those PRs were just minor changes.

However, I believe the whole issue has changed now. The focus is no longer on whether you can generate code because agents are already very capable in that regard; you can fully delegate tasks to them. What’s becoming increasingly important is deciding what to do. In other words, are we aligned in direction, and do we truly understand what this product is becoming?

After that, another equally critical question is how we ensure that the final product is of high quality. Some people proudly say that the entire app was vibe coded. For Codex, indeed, most of the code is generated by the agent. Yet, even so, we still invest a lot of effort and attention into thinking about the system itself to ensure it is genuinely high quality.

That’s why, when faced with a particularly complex feature, I usually ensure it has a more stable, long-term owner responsible for it. I don’t think PMs should own parts of such systems because PMs are often interrupted by various tasks and fill gaps. So, you wouldn’t want a PM to maintain these systems long-term.

Peter Yang: Right, you definitely wouldn’t want a PM to maintain the code for a feature. That doesn’t sound like a good idea. I think we would definitely mess it up. That’s very real. But speaking of the product itself, I do like the feel of Codex. There are other strong products out there that I also like, but many tools really require a lot of time to learn. I even feel that if I don’t browse Twitter regularly, I might not know how to use those other pro products at all. But one thing I particularly like about Codex is how easy it is to get started. The entire app is very intuitive and simple. Yet, at the same time, it has some advanced capabilities, like skills and automations. Do you use these extensively internally?

Romain: Yes, very much so. In fact, I think skills might be the most interesting type of capability in the Codex app interface.

For example, if you are working with designers using Figma, a great feature is that you can open the Figma skill, which will directly pull in details from the Figma file, including React components, variables, etc., and Codex will write the implementation based on that content.

For instance, if you are developing an app and want to share it with others or deploy it to Vercel, Cloudflare, Render, etc., these skills are already there. You just need to tell Codex what you want to do, and it can seamlessly integrate into that entire task ecosystem.

A few days ago, I was chatting with a friend who had a lot of ideas for improving a product. He told Codex to use that skill to write all those tasks into Linear so he could track them. Then, when all the tasks were listed, he said, “I’m going to sleep now; you continue to implement and check off the tasks we just discussed one by one.” The next day, he woke up to find everything was done.

OpenAI’s Changing Perspective on Codex: Open Harness and Empowering Models

Alex: Returning to the simplicity of Codex, I think sharing our design philosophy might be interesting.

One particularly fascinating aspect of product development in this field is that developers naturally love to create tools for themselves and automate workflows. Therefore, a crucial principle for us is that the product must be highly configurable.

For instance, Codex’s harness is open source. Users can dive deep and make extensive modifications. It often happens that while we are developing a feature that hasn’t been officially launched yet, people on Twitter are already complaining about it being broken. The reason is that they have gone ahead and modified the code or forked the project to use the feature early. To me, that’s one of the best parts of the product. It means that the most cutting-edge users are already living in the future with us, exploring and pulling us toward that future.

On the other hand, if you design products solely for this group, the final output can become nearly incomprehensible, and users would indeed have to spend all day on Twitter to know how to use it.

So our approach has always been to carefully define those core primitives, which are the most fundamental and critical parts of the product. Those areas require serious thought and should not be treated lightly.

We think carefully about how to make the entire product as “invisible” as possible, allowing the model to shine. This way, every time the model becomes a bit stronger, it can naturally take on more tasks. Then, on that foundation, we consider how to package it into a system that is as configurable as possible for advanced users to explore.

For example, there are already people in the community experimenting with the implementation of sub-agents. This functionality is already out there, being used and tinkered with, and we have learned a lot from how users are utilizing it. Although we are not actively pushing this feature to everyone in the product, users have discovered and started using it on their own.

Next, we will think about how to make these things easier for others. The Codex app itself is an example of this. Around the time of GPT-5.2 Codex, I remember it was around December, the model capabilities were steadily improving, but suddenly we crossed a threshold. At that point, you could delegate longer and more complex tasks to the model, and it often completed them in one go.

We began to see that many people were already using tmux. For those unfamiliar with the term, tmux is essentially a “terminal multiplexer” that allows you to manage multiple sessions, windows, and panes in one terminal, enabling you to run many tasks in parallel.

We started seeing some crazy visuals on social media, like Peter Steinberger’s image—dozens of terminal panes filling three monitors, all running various tasks with Codex.

On one hand, we were excited; on the other, we continued to ensure that this “delegated execution” capability was reliable in the most basic CLI products. However, we realized that this might be the working style of the top 1% of engineers. The question became how to make this experience intuitive enough for everyone.

Thus, the Codex app emerged. When you open it, it feels very simple, like a chat window. It helps you get things done. Then you gradually discover that there’s a sidebar, that you can run multiple tasks simultaneously, and that switching between these tasks is very easy. Soon, you feel particularly efficient. Next, you realize there’s a skills tab. We want to make this experience feel a bit like playing a game, where you discover the next capability step by step.

Romain: Absolutely. I believe from the very beginning, we’ve had a clear vision that the future of coding will increasingly become a mode of “delegating tasks to agents.”

Even a year ago, when we first started working on Codex, we envisioned a future where engineers would handle many tasks in parallel.

However, at that time, the model’s capabilities were not yet fully realized. Later, we saw the turning point with GPT-5.2 Codex and subsequent models, where the model began to work reliably and meticulously for several hours, even days. At that stage, looking back, it seemed odd to have users open a bunch of tabs in the terminal and let them run for hours.

That’s why we needed a new product form. I think the interface that later became the Codex app matured at just the right time.

Alex: Indeed, there have been two notable “atmospheric shifts” in Codex’s history.

The first was around August when we launched the cloud product for Codex. The idea itself was great, and everyone was excited then and still is. However, looking back, it was a bit premature.

Around the same time, we released the interactive programming model for GPT-5. Our thought was to address the “problems the model can now solve.” So we launched Codex CLI and IDE extensions, and growth began to explode. I remember that during those months, the scale grew by about 20 to 30 times, which was fantastic.

The second change occurred around December to January. By that time, we could finally return to the original vision of truly delegating work to the model.

We Only Do Short-Term and Long-Term Planning, Never Mid-Term Planning

Peter Yang: Let’s delve deeper into the development process of the Codex app. Did you have an annual roadmap? For example, did you write down a plan a year ago stating, “By a certain time, we will launch the Codex app”? Or did you more react to market trends and create a bunch of prototypes? How did this product come to be?

Alex: Neither. Actually, I heard a particularly good piece of advice from an OpenAI researcher, Andre. He told me that at OpenAI, you either do short-term planning or long-term planning, but you don’t do mid-term planning.

Because mid-term planning is too difficult. Short-term usually refers to the next eight weeks; that’s basically the limit. You need to think about whether there’s a specific goal that can rally the team around it to get it done. This is something we excel at in OpenAI—organizing the team around a clear objective.

The other type of planning is to grasp a longer-term “feeling.” For example, you might think that a year from now, the model will be much smarter. It sounds obvious now, and in fact, the change didn’t even take a year, but if you think back to that time, you might have thought:

In the future, we will have very powerful models, and we won’t want to “borrow our computers” for them to do tasks because that way, they can only handle one task at a time. What we really want is to have almost unlimited models working independently, validating results, deploying code, and monitoring operational status. Eventually, we might not even need to prompt them one by one.

So you start imagining an overall atmosphere and direction for that future. As for the middle layer, it becomes awkward. The so-called middle layer is usually the product roadmap, and we don’t really have a traditional roadmap.

What we truly have is a long-term direction and some specific actions we believe will push us toward that direction. For instance, regarding the Codex app, we had a strategic goal of decoupling ourselves from a “specific workspace.”

This phrase sounds a bit abstract. Let me explain. When you use an IDE like VS Code, which is my favorite IDE, you usually correspond to a specific workspace, which is a specific checked-out codebase or a whole specific folder.

Even if you use git worktree, you can essentially only open one worktree at a time. So fundamentally, you can only handle one task at a time. The same goes for CLI. But because we had that vision from the start, we wanted users to work alongside those agents running independently in the cloud, so we knew the product must eventually reach a state where you could naturally converse with multiple agents or even just one agent that orchestrates multiple agents behind the scenes.

However, we learned something: if you start from the cloud, it can be challenging for developers to derive value. Their commonly used tools aren’t there, and they have to set up the environment first. Moreover, if a task is only half completed by the model, it’s hard to get “partial results.” Often, when the model is halfway through, you need to step in to correct its direction or make slight adjustments.

So we thought we needed a local experience that would free itself from the constraints of a specific folder while still feeling natural when working across various folders on your computer.

Thus, when we began developing this app, there was a layer of abstract, even somewhat esoteric directional thinking. Meanwhile, engineers had already created many prototypes, all sorts of implementations of “I wish we had an app.” Some people made this version, others made that version. We even held a hackathon where several people independently created different versions of the app. You might have made one at that time; I can’t quite remember.

So when this project truly started, the only thing that really needed to be documented was why we believed “creating an app is a good idea.” There wasn’t a very specific spec for the app itself at first. Of course, some documentation gradually emerged during the development process, but initially, there was quite a bit of debate.

At that time, there was a real discussion: should we make an app? After all, the IDE extension was already very popular. Shouldn’t we just focus on improving the IDE extension? CLI is also important; it seems to be a core aspect of this field. If we really want to make an app, what’s the significance? Where should it go? These questions didn’t have standard answers at the beginning.

Romain: Fortunately, our IDE extension was already quite mature and polished. You could use it in environments like VS Code, Cursor, Windsurf, etc. So we brought a lot of mature experiences from the IDE extension codebase as a solid starting point.

Alex: Yes. In fact, the app and IDE extension share quite a bit of code. More accurately, they share the same portion of code.

The core harness, whether for the app or IDE extension, is written in Rust and is open source. The CLI is also based on it. So there’s a lot of sharing and a very deliberately designed layered structure.

Peter Yang: Looking back now, it seems obvious that making the app was a good idea. After all, using the Codex app is definitely easier than opening a bunch of terminal windows. But at that time, the core reason for deciding to make this app was that it is more user-friendly for beginners, and you can genuinely get started as if you were playing. Is it the best interface for managing multiple agents simultaneously?

Romain: Yes. I believe our thinking has always been very “AGI-oriented.” We have always been considering what kind of future we are sliding toward.

However, if we adjust the order, a more accurate statement would be: we first knew we had to create an interface that made “delegating tasks to multiple agents” feel very natural. Because we knew the model would eventually be ready to support this approach. In fact, we have already seen people starting to delegate tasks between different agents.

Thus, we need an interface where this process must feel natural, and when it expands to the cloud in the future, it should also be very smooth. At the same time, the entire experience must be ergonomic, not making users feel like they are awkwardly struggling with “how to delegate multiple agents simultaneously” but rather making it feel like the most natural way to work.

Romain: By the way, this experience attracts not only beginner developers. On the contrary, even within OpenAI, the most productive and experienced engineers are now using the app as their primary working method. For example, Peter, who came from OpenClaw, and Greg Brockman, are now primarily using this app to build things.

So this is fundamentally the realization of the “agent-style delegation” vision. It’s not that the best engineers will always stay in the terminal; in fact, they are also transitioning to the app.

Alex: Yes, we hope so. We keep mentioning Peter because he just joined OpenAI, and we are really excited. After all, he has worked on OpenClaw and is very creative. I’m not sure if I told you before, but last October, I took a walk with him in San Francisco.

At that time, I didn’t directly tell him we were considering making an app, but I started tentatively discussing the idea of a new interface that would make “task delegation” feel more natural. His attitude at that time was basically that he would never use such a thing.

Then last weekend, he surprisingly tweeted that this app is actually quite good. It was like seeing the sun rise in the west. He has started to like it.

Peter Yang: I’ve also spoken with Peter. If you really get him to start using the app, that would be a major achievement because he usually opens twenty terminal windows at once. That’s really impressive. Alex, you seemed to be the only PM for Codex for a long time, right? How many people are on the Codex team now? Fifty? A hundred?

Alex: It’s roughly in that range. About that. I think we were around eight people last May, right?

Romain: Yes, about that.

Alex: I can’t recall the exact number now, but we have indeed grown very quickly since then. So now we are probably between fifty and a hundred people.

After the Model Strengthens, Codex Takes Over Everything with Skills

Peter Yang: So what does a typical day look like for you? Do you even have a “typical day”?

Alex: Interestingly, I’ve been thinking about this question lately because I realized I don’t really have a straightforward answer. I later realized that my work state actually switches between different modes.

First, let me clarify that this isn’t advice for others; it’s just my personal work style. For example, before we released the app, I was in a very pure execution mode. In that state, I was fully focused on execution, obsessing over quality, ensuring we didn’t overlook any corners, and getting every little detail right.

In this mode, I spent a lot of time in Codex. On one hand, we indeed use Codex extensively to understand what’s happening. For instance, I would use Codex to check Slack for feedback; I would have Codex summarize this content, follow up, and then send it to Linear. So, just understanding the current quality status requires a lot of use of Codex.

On the other hand, I also use Codex to understand code-level issues and directly make modifications with it. Because now, if it’s just a small change rather than building a new system, letting it help me finish the task, testing it, and submitting a PR is often faster than communicating with someone else and having them prioritize this task among a thousand other things—especially when our goal was to release the app within two weeks.

Besides these, there are certainly many very “human” aspects, like motivating and mobilizing everyone, while also maintaining a critical perspective on what we are doing. So this is a work mode I can clearly perceive. Interestingly, if I’m in this mode, you’ll find that I tend to be more active on Twitter. I don’t know why, but whenever you ask me about social aspects, I usually find myself browsing Twitter more during that time.

But I also have another mode. For example, I currently feel very strongly that we have reached a stage where the model is very strong; GPT-5.4 is astonishing. At the same time, the product form of the app is more popular than we expected, and we have now covered all platforms, including Windows.

So my focus has shifted to thinking about “what should we do next” and understanding the current state of the whole situation.

This feels more like a coordination mode. In this mode, I actually spend less time writing code in Codex and more time using Codex for communication. So at least for me, I can distinctly feel that I have these two modes. There might be more than two, but at least these two are the most obvious.

Peter Yang: How much cross-functional alignment do you typically need to do?

Alex: The Codex team itself is fantastic. We actually do very little cross-functional alignment internally. We somewhat intentionally see ourselves as a “pirate ship” team.

Even within the Codex team, it’s just me, along with two recently joined PMs and a few leads. Until recently, everyone basically shared the load together. Our work style is more like a group of people mixing together to push things forward quickly rather than doing a lot of formal alignment.

So, there isn’t much alignment within the team. However, it’s becoming increasingly clear that building Codex involves constructing a coding agent. Now everyone can see that coding agents are not only useful for writing code but also for many other types of work.

We’ve seen many people using the Codex app for tasks beyond just coding. Furthermore, now most people at OpenAI are using the Codex app, even those not in technical roles. I see this app everywhere in the company.

So when you realize that Codex is not just serving coders but is becoming useful in a broader context, it indeed requires more cross-functional alignment. Because OpenAI also has ChatGPT, which is a product used by many, we need to think carefully about how to approach this.

Romain: From the developer experience perspective, we have almost become an extension of the Codex team. Most of our energy is now focused on Codex, but there are several reasons for this.

On one hand, of course, it’s an exciting product, and developers genuinely love using Codex, so we will continue to improve it. On the other hand, as Alex mentioned, we also have different modes. For instance, when preparing for a release, we rush to the front lines with the Codex team, preparing release assets, various materials, and thinking about how to present Codex’s value maximally. Once the product is out, we switch to another mode, educating developers on how to use Codex in various ways.

But there’s another layer of reason that makes this particularly important for us. When you look at the larger OpenAI platform, you’ll find that millions of developers are building things based on the OpenAI API. They are using models and various modalities, from image generation to Sora, and speech to speech.

And you know what? The best entry point for developers has now become Codex. If you turn the clock back to a year ago, or even just back to last summer when we launched GPT-5, we needed to write a lot of guides to teach people how to prompt GPT-5 because it was a reasoning model, quite different from GPT-4.

But now our approach has changed. Even for these use cases, we try to teach developers to directly use Codex and skills. For example, if you need to update an integration, you should most likely use Codex along with the corresponding skill, and Codex can usually help you handle that.

From this perspective, our work has also become very cross-functional because we see Codex as the cornerstone of the entire developer platform.

Alex: One more interesting point is how we collaborate with each other. Honestly, one of the best parts of working on Codex is the community. This includes both the online internet community and the people we meet at offline events. Many things we organize revolve around this core.

For example, we pay great attention to the release rhythm, when to launch new things; we also value feedback greatly. When the community starts providing feedback, we quickly fix issues and communicate. So our entire team is very “online,” always keeping an eye on community trends.

Take the release of the Codex app, for instance. We collaborated very closely with the Dom team. He essentially helped us coordinate a wide-ranging alpha test covering many users. We were building the product with these users, gathering feedback, supplementing skills, enhancing the capabilities used in the app, and preparing documentation, etc.

So I think this is a unique advantage of the Codex team. Ultimately, it’s because we are open source. Because we are open source, many things naturally evolve into being very open about what we are doing. And the community indeed rewards this openness.

We even have Codex ambassadors spread across many cities and countries who organize local events to teach people in their communities how to use these tools. Of course, I wish I could visit every city, but that’s clearly unrealistic. So seeing the community being so energetic and passionate, proactively organizing events, hackathons, and building things together is truly wonderful.

“Lobster” Will Be Integrated into ChatGPT

Peter Yang: Next, let’s talk about Peter. I consider myself an early user of OpenClaw. It does have some rough edges and minor issues, but it has genuinely helped me accomplish many tasks. For instance, a few days ago, because it remembers our previous conversations, it gave me a rather crude but motivational “spiritual pep talk” lasting about three minutes. Honestly, that might be the most insightful thing I’ve heard from AI. So I’m curious about how you are integrating Peter into the team? Also, does this vision of a “personal agent” relate to what he is currently working on? How do you understand this?

Alex: There are actually two layers to this. I can’t say too much, but the first point is that he is a super, super heavy user of Codex. OpenClaw was largely built using Codex, so he continuously provides feedback to the team and actively participates in efforts to improve Codex. In a way, this is his “side job,” but he is indeed doing it, and we are very excited about it.

As for the other part, I can’t say too much yet. But broadly speaking, he is indeed helping us build the next generation of personal agents, and it is being integrated into ChatGPT.

Romain: One thing that fascinates me about Peter is that, of course, I’ve known him for a while, and many people saw a glimpse of the “future” when they first played with OpenClaw.

But the truly impressive part is that Peter recognized this vision early on. If you look back at 2025, he worked on over 40 open-source projects last year, but these projects were all centered around the same vision: I need a command-line interface to access my calendar, I need a command-line interface to access my tweets and Gmail.

By continuously working on these projects, he has concretized a vision—one that revolves around skills and command-line tools, building what we use today for coding agents. In the future, it clearly won’t stop at coding agents; it will evolve into various types of personal agents.

Thus, Peter is very well-suited to provide us with feedback throughout this process, as many of the tools that have entered the open-core ecosystem were built by him.

Peter Yang: I feel the same way. Romain is right; he’s a one-man show who has built a fantastic open-source community. And honestly, it’s made me less inclined to open other apps. Now I just talk to my little bot, and it’s completely different.

Alex: Wait, what have you connected it to? Have you connected it to everything?

Peter Yang: Pretty much. I’ve connected it to a lot of things. It can see my banking information, YouTube data, and I’ve connected it to voice, calendar, and various Google services. Sometimes I lie in bed talking to it, and my wife asks who I’m talking to, and I say I’m talking to my OpenClaw bot. It keeps giving me ideas. However, there are indeed many people out there charging for “helping people set up OpenClaw,” with prices even reaching $5,000. So if you can really make this a product for the general market that ordinary people can use smoothly, that would be enormous.

Alex: Yes, we are working on it. I will update you later.

The Traditional Career Ladder is Becoming Less Relevant

Peter Yang: Alright, let’s wrap up with some more provocative topics, Alex. Maybe I’m mistaken, but I think I’ve seen you say that many teams no longer need as many PMs. Let’s spice this up a bit. What do you think, brother? Do we still need PMs?

Romain: I think the most astonishing thing about these tools is that the changes they bring are even more profound than just the question of whether we need PMs or not.

In my view, the boundaries between almost all career ladders are starting to blur. It used to be that designers were over here, engineers were over there, and PMs were in another place, with some kind of ideal structure in terms of headcount.

But now, if you are an engineer, you will obviously become more efficient; if you are a designer, you suddenly gain some “superpowers” to become more technical; if you are a PM who primarily wrote strategic documents before, now you can directly create prototypes.

This doesn’t mean you have to be responsible for a feature aimed at a billion users, but you can certainly showcase a slice of that vision to the team by “doing it yourself.” So I think the most captivating aspect is that the lines between all career ladders are becoming blurred, and we are all becoming builders.

Alex: I resonate with this. I try to recall what I’ve said. I remember saying something online along the lines of if a startup has fewer than 20 engineers but already has a PM, that might be a warning sign.

But what I meant to express is quite similar to what you just said. Now the boundaries of all roles are mixing together. Designers can do more engineering work, engineers can do more design, and PMs can do more building work.

Moreover, many engineers didn’t take on task triage or project management roles largely because they had to spend their time writing code. But now that writing code is much easier, you can let agents like Codex analyze feedback and prioritize tasks, freeing up everyone’s time.

So I believe that, to some extent, everyone can do a part of each other’s work. Scott Belsky has a saying called “talent stack collapse,” which I really like, and I believe it is indeed happening.

I have a strong view that when fewer people are needed in a room to do something, things usually get done better, and decisions become purer.

The next question is, if that’s the case, what remains for PMs? I think many PMs should transition. For example, if you are a PM but have always wanted to be an engineer, perhaps you were good at coordinating people but lacked strong engineering skills, now you might want to become an engineering manager instead. With coding agents, this can absolutely work, and it might be a cleaner, more natural role for you.

The same logic applies to another type of PM; perhaps they actually want to do design, and now they should get closer to design and building. But ultimately, the most critical factor is interest. Interest and initiative may be the two most fundamental and important qualities for people in the AGI era.

So I ultimately think about the question very simply. If you inherently prefer writing code, and you’ve only been a PM because “someone has to do it,” then you should delete your old self and directly become an engineer, doing the same things in an engineering manner. The same goes for design.

But if what you genuinely enjoy is spending time with users, even if it takes you a bit away from building, or if you particularly like observing the market and predicting where it will go, then in a sufficiently large team, if there are enough engineers, I believe the PM role can still have space. But ultimately, it depends on what you truly want to do.

To add one more point, I still believe that every problem domain needs a human responsible for it, but I no longer think that person necessarily has to be a PM.

Peter Yang: I feel the same way in my team. I think the best engineers never come to me asking, “Peter, what should we do next?” They go directly to talk to users, figure out what needs to be done, and then come back to discuss with me. It seems like many teams are moving in that direction; everyone is on the same page. The Codex team should be similar, right?

Alex & Romain: Many of the features used in the Codex app today were proposed by engineers themselves because they wanted those features. Indeed, many have come this way. But I also want to say that I particularly appreciate a type of engineer who enjoys spending time with users and thinking about what should be done.

At the same time, there is another equally strong type of engineer who is incredibly fast, excels at building systems, and thinks deeply but has no interest in chatting with users. I believe such individuals also have ample space.

This is precisely my fundamental view of the AI world. Each of us can become more “truly ourselves.” Do you understand what I mean? Just be yourself. AI and your surrounding team will cover the parts you don’t want to handle.

Peter Yang: That’s a great statement. However, I still feel that the label of “builder” is extremely important. Because I feel that every PM is expected to become a leader by default, and the logic of traditional career ladders is that you eventually need to become a VP or something, and then you no longer have time to build things yourself. You spend your entire day in product reviews, giving feedback here and there. I believe many PMs don’t want to become that way. At least I don’t want to. I want to remain close to users when a product is actually released.

Alex: I completely agree. Honestly, I never see PMs as leadership positions. I prefer to understand it as a role that fills in the gaps. Sometimes this role does require some leadership, but even then, that kind of leadership is more about helping everyone align rather than being the genius strategist who proposes the only correct direction.

However, one thing I can say for sure is that the best PMs at OpenAI are deeply involved in the front lines. And because of that, if you join OpenAI in a senior leadership role, it can be quite challenging because there’s still a strong need for you to dive into the details.

So you need to find a way to balance high-level responsibilities while still being genuinely engaged at the front lines. Personally, I believe the best way to join here is always to dive into the front lines.

What Does the Codex Team Look for When Hiring? It’s Not Your Resume

Peter Yang: Last question. You finally hired another PM. When you’re looking for members for the Codex team, aside from requiring them to be heavy users of Codex, what other traits do you value? What kind of people are you looking for?

Alex: We can both answer this question. I’ll go first. I’ve already mentioned this once before; I would return to that word: initiative.

Ultimately, “people who take the initiative” are the most important, both at OpenAI and especially in the Codex team. We intentionally do not structure the team in a way where, once you join, someone says, “Here are 12 tasks, increasing in difficulty; do them in order.”

Here, it’s more like, you come in. Alright, welcome aboard. That’s it. After that, it’s up to you.

So I particularly value those who are self-starters, proactive, energetic, and have ideas about which things are worth pursuing. Another important trait is that they are not afraid to propose differing opinions simply because existing ideas are in place. Because honestly, many of our existing decisions might have been made under certain random circumstances and are probably not right.

To idealize it further, if a person can actively absorb additional responsibilities and is willing to take on those that are still unclear and undefined, I would consider them almost the perfect teammate.

So these are the core and uppermost standards I believe are essential. If you just ask what role fits best here, my answer remains that any technical role, especially in engineering, is suitable.

Romain: I agree. From my side, in terms of developer experience, I usually look for high-initiative people, and they also need to be very technical, preferably already adept at using tools like Codex.

But beyond that, I particularly value a certain passion—whether you genuinely want to spend time with developers and builders and are willing to share your knowledge and experiences.

For instance, this week we just announced that Thomas will be joining my team this month. He’s the one who created the open-source Codex Monitor. I’m very pleased about this because he is a highly creative, productive person who is also very good at using Codex, but he also loves to share how he uses Codex to build things.

What we genuinely want to do is bring millions of developers into the new future represented by Codex. I believe agentic coding is fundamentally changing our understanding of software, applications, and product development.

There’s so much potential to show the world that anyone can build anything, and we can guide them through the process. So that’s probably the type of person I’m looking for.

Alex: Let me see if I understand correctly. In my mind, the definition of the DevX position is roughly: a very strong engineer who also excels at using Twitter.

Romain: You’re right about half of it; I need to add a footnote. Here, the term “good at Twitter” more accurately means “skilled at communicating with our community.”

Because if you go to some places in the world, you’ll find that many developers don’t use Twitter that frequently. For example, in Europe and some other regions, people use LinkedIn or other platforms more. So we need to clarify that what’s truly important is being able to communicate effectively on social media globally.

So it can be summarized as: you must be adept at social media. This point is definitely important. I also genuinely enjoy spending time teaching and doing educational things.

Peter Yang: I feel that whether a person has initiative can often be seen even before the formal interview, right? For example, do they consistently post online? Do they have side projects?

Alex: Absolutely. So if someone messages me expressing interest in collaborating, my first reaction is actually: does it have a link? As long as there’s a link, I usually click it.

Of course, I might first check if the link is ridiculous, but honestly, I almost always click it. I’m just curious. Then if they casually attach a paragraph of their thoughts in the message, I usually read it carefully.

As for the next statement, I’m not sure if it sounds a bit harsh, but if someone sends me a long explanation of “why I’m interested in this position” along with a resume, I tend to pay less attention to that than to “their thoughts” and “what they have done.” What I really want to see is what you thought and what you did.

And just the other day, someone asked me this question, and I suddenly realized that I didn’t even know where many people graduated from.

Peter Yang: Who cares? Really. Who cares about that? I’m actually quite glad we live in an era where many of those past silly credentials are no longer as important. Who cares about prestigious schools or degrees? Just show me what you’ve done.

TRAE vs Windsurf/Cursor: Which AI IDE is Right for You?

Mon, 30 Mar 2026 00:00:00 +0000

Introduction

Many people are unaware that the difference between TRAE and Windsurf/Cursor is not just about which is stronger, but rather which is more suitable for you.

In the domestic development environment, this is particularly relevant. You might think you’re choosing an AI IDE, but often you’re actually selecting based on network reliability. Before you even start coding, the connection can test your patience. At this point, the advantages of domestic tools like TRAE become apparent, although they also have their shortcomings.

The Most Immediate Issue: Domestic Access

Windsurf and Cursor are foreign products. For domestic users, common issues often include:

General instability in login and access
More noticeable network latency
Inconsistent response times during certain periods
Updates, synchronization, and model calls can be affected by network issues

In short, if you want AI to help improve your efficiency, it sometimes first requires you to practice patience.

On the other hand, domestic tools like TRAE have the following advantages:

Smoother access within China
Lower barriers for registration and usage
Generally better network stability
More friendly to Chinese environments

This is crucial for high-frequency development. AI tools are not used once a day but repeatedly. If each call is delayed by two seconds, it can become frustrating over the course of a day.

Advantages of TRAE: Not Necessarily the Strongest, but More Reliable

Natural Understanding of Chinese

If you usually write comments, request features, or describe bugs in Chinese, TRAE often understands your requests more naturally. For example, if you say, “This list page needs a filter and should be mobile-compatible,” it can typically grasp the intent better.

More User-Friendly Cost

Many foreign AI IDEs have issues not only with network access but also with payment methods, subscription costs, and ongoing usage barriers. TRAE is usually more suitable for individual developers to start with, experiment, and get up and running quickly.

Closer to Everyday Domestic Development

Many personal full-stack developers engage in tasks such as:

React/Vue page development
Node.js API writing
CRUD operations and debugging
Small backend systems
Personal projects or side gigs

In these scenarios, the most important factors for tools are not whether they can discuss advanced architectural theories, but rather how smooth, stable, and quickly they can produce results. In this regard, TRAE does have an advantage.

However, TRAE’s Shortcomings Must Be Addressed

Engineering Depth Often Lags Behind Windsurf and Cursor

Windsurf and Cursor excel in their mature integration of “AI + IDE + engineering context.” They typically offer a higher level of completion, especially in multi-file projects, cross-module modifications, and continuous understanding of context.

Complex Project Capabilities May Not Be Superior

If you are working on medium to large full-stack projects, complex state management, or legacy system refactoring, Windsurf and Cursor often feel like experienced veterans. TRAE is more like a reliable partner, but it may not always match the capabilities of top foreign tools in complex engineering scenarios.

Ecosystem Development is Still a Work in Progress

Currently, foreign tools tend to have richer tutorials, case studies, community discussions, and ecosystem support. While domestic tools are improving rapidly, they still need time to catch up in this area.

Strengths of Windsurf and Cursor

Cursor is more focused on smooth daily coding, with natural integration of completion, modification, and chat features.
Windsurf emphasizes task progression and engineering collaboration, feeling more like a proactive partner.

If you are consistently working on real projects rather than occasionally adding a few lines of code, you will notice this maturity difference.

But the honest truth remains: No matter how strong a tool is, if it doesn’t connect smoothly, it will affect the experience.

Choosing the Right Tool for Personal Full-Stack Development

If you are a domestic individual developer, my advice is straightforward:

Prioritize ease of use, stability, and Chinese language friendliness: Choose TRAE
Prioritize maturity, engineering capability, and overall experience: Choose Cursor or Windsurf
If network conditions are average and you want to avoid hassle: TRAE is more realistic
If you’re willing to deal with network issues for a more mature AI IDE experience: Cursor/Windsurf are worth trying

In simple terms:

TRAE is like a car that drives well in domestic conditions.
Cursor and Windsurf are like higher-performance cars but are more selective about road conditions.

Conclusion

For personal full-stack development, tools are not meant to be worshipped; they are meant to get work done.

The value of TRAE is not necessarily to completely surpass Windsurf/Cursor, but rather that in a domestic environment, it can more easily become a tool you can use long-term. The value of Windsurf/Cursor lies in their maturity, completeness, and being more like the next generation of AI IDEs.

Do you value “strength” more, or do you prioritize “stability”? Feel free to share your thoughts.

Photography in the Age of AI: Upholding the Truth of Presence

Sun, 29 Mar 2026 00:00:00 +0000

Photography in the Age of AI: Upholding the Truth of Presence

As generative artificial intelligence increasingly engages in image production, it leverages deep learning to optimize light and shadow, complete details, and even generate content, significantly lowering the barriers to image creation. This presents opportunities for a flourishing new popular art but also raises discussions about the essence and value of photographic art. How should photography respond to technological, cultural, and contemporary inquiries in the AI era?

For both the public and professional photographers, it is crucial to understand the relationship between AI-generated images and photographic art. Some AI-generated images may appear indistinguishable from photographic works, yet they possess essential differences, leading to several issues. Firstly, there is a tendency towards homogenization. While AI-generated images may exhibit visual refinement, they lack the subtle variations of light and shadow, the serendipitous qualities of scenes, and the temporal marks present in live captures, resulting in a tendency for “uniform refinement.” Secondly, there is a superficiality in content. Some creations overly rely on AI effects, diminishing the capture of human emotions and leading to works that are visually appealing but lack depth. Thirdly, there is a detachment from real scenes. Some creations replace real observation with virtual generation, weakening the deep connection between photography and real life, straying from the pursuit of authenticity and warmth in photographic art. These issues stem not from flaws in AI technology itself but from creators’ unclear understanding of the essence of photographic art.

In light of AI’s deep penetration, photographic art must first adhere to the creative principle of “Dao, Ti, Qi, Yong.” “Dao” represents the spiritual core and essence of art, while “Qi” refers to the tools and means of creation. When AI significantly lowers the threshold for “Qi,” “Dao” becomes the “moat” for photographic art to respond to technological changes. The choices of what to capture, express, and present in terms of human warmth and aesthetic intent depend on the photographer’s inner understanding of “Dao.”

Deepening theoretical research and constructing a Chinese photography theory system is vital for solidifying the foundation of photographic art development. Chinese traditional art philosophy provides rich nourishment for photographic art. Concepts from traditional painting, such as “expressing spirit through form” and “the interplay of reality and illusion,” offer significant inspiration for contemporary photographic art. Photographers should further explore the theoretical resources within Chinese traditional art philosophy, clarify the intrinsic connections between Eastern artistic methodologies and the essence of photography, and form a photography theory system that embodies national characteristics, contemporary features, and international perspectives. Additionally, special research should be conducted on contemporary topics such as “exploring the integration of Chinese photography and aesthetics through new productive forces” to ensure that theoretical outcomes genuinely serve creative practice.

The “truth of presence” is an irreplaceable intrinsic attribute of photographic art. The “presence” in photography refers to immersive participation in physical settings, empathetic expression of emotions, and the artistic refinement of contemporary scenes. To guide photographers in upholding the “truth of presence,” the China Federation of Literary and Art Circles focuses on significant themes and organizes creative practice activities, encouraging photographers to engage with grassroots communities, capture the pulse of the times, and showcase the lives of ordinary people. At the same time, professional photographers are encouraged to share creative methods and aesthetic concepts with the public, fostering a creative atmosphere of “universal presence” and building a photography ecosystem characterized by “professional leadership, public participation, and the emergence of quality works.”

Talent remains a crucial support for the prosperity and development of the arts. In the digital age, there is a call for a group of outstanding photography talents who can adapt to contemporary needs and possess innovative awareness and interdisciplinary capabilities. To this end, the China Photographers Association actively implements the China Federation of Literary and Art Circles’ deployment to strengthen interdisciplinary talent development, constructing a talent training system that integrates “Chinese art philosophy + intelligent technology application + public creative perspectives.” This system emphasizes enhancing photographers’ theoretical literacy and aesthetic cultivation, solidifying the foundation of Chinese cultural positions; strengthening training in intelligent technology application capabilities, allowing creators to become trainers and collaborators, establishing logical thinking that coexists with AI, and achieving a unity of technological empowerment and essence preservation; and nurturing a sense of people’s feelings and contemporary responsibility, ensuring that creation remains rooted in the people and serves the times, providing talent support for the high-quality development of photographic art.

Photography in the age of AI is at a critical juncture of theoretical innovation and practical breakthroughs. We look forward to photographers being guided by the spirit of Chinese aesthetics, using “the new era in scenes” as a stage, and following the path of flourishing new popular art, to uphold the “truth of presence” and adhere to “governing technology with Dao,” creating more works of profound thought, human warmth, and contemporary significance.

My Free Vibe Coding Tutorial Goes Viral!

Wed, 14 Jan 2026 00:00:00 +0000

Introduction

Hello everyone, I am Programmer Yupi.

Vibe Coding has taken the internet by storm. Not only programmers but also designers, product operators, and even those with no technical background are using Vibe Coding to turn their ideas into products and generate revenue.

To help everyone keep up with the times, I have worked tirelessly to create a comprehensive Vibe Coding Beginner’s Tutorial, which is completely free and open source!

With thousands of images and hundreds of thousands of words, this tutorial combines my two and a half years of AI programming experience, project development experience, and product monetization experience. My only goal is to help anyone quickly master Vibe Coding, enabling them to develop and launch their products profitably, even with zero foundation.

I dare say this free tutorial surpasses 90% of paid Vibe Coding content because I have invested a significant amount of time into it.

Tutorial documentation source: GitHub
Online reading address: AI Codefather

Feel free to star, bookmark, and share it with your friends!

What is Vibe Coding?

In simple terms, Vibe Coding is about chatting with AI in plain language to help you write code. You don’t need to memorize any syntax; just clearly state your requirements, like “help me create a bookkeeping page,” and AI can generate it for you. Programming becomes as natural as chatting, which is the charm of Vibe Coding.

Why Learn Vibe Coding?

Learning programming used to take months, but now with Vibe Coding, you can get started in just a few days. You can think of an idea today and implement it today, boosting productivity by dozens of times!

With Vibe Coding, you can quickly create small tools to improve office efficiency, develop applications to solve life problems, and turn your ideas into real products that can generate profit.

What Does This Tutorial Include?

Although there are many AI programming tutorials online, they are either too fragmented, focus only on tools without discussing methods, or lack practical case studies.

This leads to a situation where learners can only piece together knowledge from various sources, making it hard to systematically master Vibe Coding.

Therefore, I took action!

This tutorial covers all aspects of Vibe Coding. From zero basics to creating your first project in 10 minutes, learning various AI programming tools, practical AI projects, mastering core AI programming techniques, and running through the entire product monetization process, along with AI programming learning resources, AI knowledge encyclopedia, and common problem-solving manuals, it can help you navigate Vibe Coding and meet various needs.

I’ve carefully organized the content structure so you can learn comprehensively or quickly find suitable content for your reading.

Essential Readings: Quickly understand Vibe Coding and practice to create your first work in 10 minutes.
Programming Tools: Choose suitable AI programming tools, including AI model selection, no-code platforms, AI agents, code editors, command-line tools, IDE plugins, etc.
Project Practice: Step-by-step guidance from 0 to 1 to create real usable products, covering personal tools, AI applications, full-stack applications, mini-programs, and more.
Experience and Techniques: Improve Vibe Coding efficiency and quality, including core principles, dialogue engineering, context management, hallucination handling, and code quality assurance.
Product Monetization: Learn how to create value from products, covering demand analysis, technology selection, architecture design, profit models, SEO optimization, and self-media operations.
Programming Learning: Advanced content for those who want to delve deeper into programming, including learning paths, knowledge encyclopedias, resource collections, MCP development, and interview preparation.
Resource Library: A collection of various practical resources, including tool collections, prompt templates, AI concept encyclopedias, and common Vibe Coding issues.

This tutorial is not a dry theoretical compilation but focuses on practical applications. It includes rich project cases and numerous screenshot examples, guiding you to learn by doing and truly master Vibe Coding.

Who Is This Tutorial For?

1) Anyone looking to enhance efficiency with AI If you have ever wanted to learn programming but were deterred by complex syntax and difficult concepts; or if you have great ideas and want to quickly develop and launch your products; or if you simply want to use AI to improve daily office efficiency and create small tools to solve repetitive tasks, Vibe Coding allows you to get started in just a few days, programming as naturally as chatting.

2) Programmers looking to boost efficiency If you are a traditional programmer tired of repetitive coding, Vibe Coding can boost your productivity significantly. The experience and project practices in the tutorial can help you quickly advance to become a Vibe Coding expert.

3) Entrepreneurs looking to monetize products If you want to turn your ideas into products and generate profit, this tutorial teaches you not only how to create products but also how to derive value from them. From demand analysis to profit models, from SEO optimization to self-media operations, I will share my experience from creating over 10 self-developed products and growing from 0 to 2 million followers.

How to Start Learning?

For complete beginners

Day 1: Read essential readings to understand Vibe Coding and create your first work.
Weeks 1-2: Learn AI programming tools and complete a few simple projects.
Thereafter: Learn experience techniques and product monetization as needed.

For those with programming basics

Day 1: Quickly go through the basic content and complete the quick start tutorial.
Week 1: Learn mainstream AI programming tools and try to refactor previous projects.
Thereafter: Focus on advanced techniques to improve dialogue and context management skills.

Practice is the best teacher. Regardless of your background, engage with various projects during your learning process, encounter problems, and solve them; this is the most effective way to learn.

Conclusion

I have always believed that knowledge sharing is mutually beneficial.

This tutorial is completely free and open source, and I hope it can help more people unlock the doors to Vibe Coding.

However, since it is written by one person, there may be shortcomings, and I will continue to update and improve the content.

If this tutorial helps you, I hope you can like or star ⭐️ it to show your support!

Don’t hesitate; open the tutorial now, and in 10 minutes, you can create your first work and embark on your Vibe Coding journey with me!

Hands-On with Kimi's OK Computer Full-Stack Assistant

Wed, 01 Oct 2025 00:00:00 +0000

Introduction

Hello everyone, I’m Leng Yi. Today I will be testing Kimi’s latest full-stack assistant, “OK Computer.”

A few days ago, I received a thank-you letter from Kimi, expressing gratitude for my support last year through tips. They offered me membership benefits equivalent to my tip amount, allowing me to experience the latest model capabilities firsthand.

Honestly, I didn’t expect Kimi to remember its early supporters. This was really thoughtful! So, I didn’t hesitate to upgrade to their highest tier, Moderato membership (which allows 20 uses of “OK Computer”) and received an additional 5 months of membership.

After upgrading, I quickly received an invitation to the internal testing of “OK Computer” (all users who tipped would receive priority invitations).

Experience with OK Computer

At the Yunqi Conference a few days ago, I saw an interesting product called “AI Exchange” (a platform that connects AI demanders and developers). I decided to see if I could use “OK Computer” to create a website prototype.

1) Developing the AI Exchange Website

I opened Kimi’s official website, kimi.com, selected “OK Computer,” and it was ready to use (I could also see my usage quota).

I input the task:

Functional Requirements

User System Registration and Login: Support phone number registration. User roles: demander (Buyer) and developer (Seller), which can be concurrent. Personal center:
- Buyer: Demand management, transaction records, favorite services.
- Seller: Service/model release, pricing management, transaction records.
AI Service/Product Release and Display: Developers can publish services by filling in service descriptions, function scope, prices (fixed price/bargaining), and delivery cycles. Display page: Service detail page: functions, prices, case studies, rating leaderboard/recommendation page: display based on popularity, ratings, and transaction volume. Search and filter: by price, tags, AI fields (e.g., voice, image, text, video), delivery cycles.
Demand Release and Matching: Buyers can publish clear demands (e.g., “I need an image recognition agent with a budget of 2000 yuan”). The system recommends suitable sellers, or sellers can bid actively.
Transaction System Matching Logic: Supports bargaining and fixed-price direct orders. Payment process: funds escrow, released after delivery confirmation. Order management: status transitions (pending confirmation → in development → pending delivery → completed/canceled).
Credit and Evaluation: Completed orders allow buyers to rate and evaluate sellers. The platform displays developers’ credit ratings and transaction history.
Display and Recommendations: Homepage sections: popular demands, quality developer recommendations, recent transaction displays. Dynamic wall: real-time scrolling of the latest transactions. Case library: showcases quality successful cases.

First, let’s look at the finished product.

Experience URL:
https://vcnj4jhe2thpy.ok.kimi.link/index.html

The overall functionality is quite complete, and it’s the website prototype I wanted. How did it do this?

Once I sent the task, Kimi quickly powered up (the virtual computer) and got to work.

The first thing it did was act as a project manager, analyzing the entire requirement and breaking the project down into 11 sub-tasks.

Next, it continued as a product manager and UI designer, writing the PRD (Product Requirement Document) and visual design plan, clarifying the website features and visual design.

Since our website had a high demand for images, Kimi searched for relevant image materials and even generated a background image by itself. It created a resource folder and downloaded everything.

Then, it transformed into a front-end development engineer, developing HTML pages, including the homepage (index.html), service market page (marketplace.html), demand release page (demands.html), and personal center page (profile.html).

Finally, before deployment, Kimi acted as a testing engineer and operations engineer, conducting final functionality tests and optimizations before deploying to the server.

During its first check, “OK Computer” found a terminal runtime failure. It tried a new port and ultimately succeeded in deployment.

The final link was delivered to us, accessible publicly and shareable with others, viewable on both mobile and desktop.

Using the same prompt, I ran it again, this time with a more modern tech feel.

Experience URL:
https://kuleem2nugt64.ok.kimi.link/

2) Pixel Art Interview Program

Next, I had Kimi run a more complex task.

The prompt was:

Project Objective: Create a complete pixel art web application simulating a Western TV news/music interview program themed “Coldplay Concert Kiss-Cam Incident and Public Privacy Discussion”, including 3 minutes of dual audio and 20 synchronized pixel art images. —#

Visual Style Requirements

Overall Style: 8-bit pixel art + elements from Western TV news/concert broadcasts (live broadcast graphics, news tickers)
Color Scheme: Retro game colors (#FF6B6B, #4ECDC4, #45B7D1, #96CEB4)
Character Design: 2 pixel characters
- Host (Western news anchor style)
- Guest (media/cultural commentator style)
Background Elements:
- Coldplay concert stadium stage
- Large audience area (glow sticks, mobile screens)
- Pixelated Kiss-Cam large screen framing
- Pixelated social media interface
- Studio commentary scene

Audio Content Requirements

Duration: 3 minutes (180 seconds)
Language: English (news podcast style)
Format: Dual conversation
Theme: Discussing the privacy issues raised by the Coldplay concert kiss-cam incident, including social media dissemination, brand reactions, and future outlook.
Structure:
- 0–30s Opening: Host introduces the background of the incident
- 30–90s: Incident dissemination chain (live screen → audience recording → social media)
- 90–150s: Subsequent reactions (company investigation, artist response, fan culture)
- 150–180s: Future outlook (privacy reminders, concert management, platform responsibilities)
Character Setting:
- Host (Anchor): Calm, professional
- Guest (Commentator): Media/sociological analysis, explaining how the incident became a global topic

Image Generation Requirements

Quantity: 20 pixel art illustrations
Size: 320×240 (retro game console resolution)
Switching Frequency: Every 9 seconds, synchronized with audio
Content Types:
- Character Portraits (6 images): Different expressions and poses of the host/guest
- Scene Illustrations (8 images): 1. Studio scene (news anchor desk) 2. Stadium night scene (Coldplay stage lights, glowing audience) 3. Kiss-Cam pixelated screen (crowd pixelated mosaic) 4. Close-up of fan area (waving glow sticks) 5. Social media interface (pixelated tweets/comments) 6. Company meeting room (silhouette style) 7. Coldplay stage background (lights and confetti) 8. News broadcast graphic (“Privacy Debate”)
- Data Visualization (6 images):
  - Popularity curve, retweet volume bar chart, dissemination chain illustration, privacy risk matrix, fact-checking process, future improvement checklist

Prompt Template:
[pixel art], [8-bit retro game style], [Western TV news broadcast + stadium concert scene], [bright retro colors], [320×240 resolution], [no real faces recognizable]—#

Web Functionality Requirements

Custom pixel art audio player
Audio and image timeline synchronization (switch every 9 seconds)
Pixel UI control panel (play, pause, speed, subtitle toggle)
Responsive design (desktop & mobile, maintaining pixel clarity)

Technical Implementation Plan

Step 1: Audio Generation

Generate a 3-minute dual conversation in English using AI voice synthesis (two voice tones: Anchor/Commentator)
MP3 format, 128kbps, 44.1kHz
Script divided into 4 segments (every 30 seconds for synchronization)

Step 2: Image Generation

Use pixel art models to generate 20 images, ensuring consistent color and style across stadium, stage, and news studio elements

Step 3: Web Development

Tech stack: HTML5 + CSS3 + JS
Use Audio API to synchronize image carousel
CSS3 pixel animations (fade-in, flashing subtitles)

Compliance and Narrative Boundaries

Do not display recognizable private faces, only use mosaics/back views to represent the audience
Focus discussion on public events, privacy issues, and cultural responses
Add a footer statement on the webpage:
“This is a pixel-art simulation for educational and creative purposes, not depicting any individual.”

The final product can indeed play, and the page looks good and is fun. This prompt was quite a brain teaser.

Experience Link:
https://54kvxp256tfss.ok.kimi.link

3) Financial Analysis of a Public Company

Now let’s try something simpler.

The prompt was: Conduct a data-driven financial analysis of Alibaba, producing a variety of charts (such as time series, comparison, composition, decomposition, sensitivity, etc.), each with clear explanations and conclusions, presented in a slightly softer version of Neo-Brutalism style. Use Plotly to draw the charts.

The report produced was impressive, with all data being real and credible.

The stock price time series chart is more reliable and user-friendly than financial websites.

This market sensitivity analysis is something that is typically considered paid content elsewhere.

Initially, Kimi delivered results with 2 charts that could not display. We provided direct feedback, and it quickly fixed the issue.

Here’s the final link; everyone is welcome to check it out.

Experience Link:
https://qmcdsjunkjx6w.ok.kimi.link

4) Mini Game Collection Website

Help me clone this repo (
https://github.com/he-is-talha/html-css-javascript-games/tree/main), and create a homepage for the mini games to serve all the games so that each game can be played, and deploy this homepage. The webpage style should be modern and cool.

This page looks really cool.

All games are playable. For example, I had a blast playing this archery game.

Experience URL:
https://ip5plyaaqdtyi.ok.kimi.link

5) Analysis PPT of The Wandering Earth 2

Create a PPT analyzing the visual symbols of the movie “The Wandering Earth 2,” consisting of 15 pages, using only original movie images and in-depth analysis articles from film/academia.

Creating a PPT is indeed better with Kimi. Finally, there’s an AI product that bridges HTML PPT and traditional PPT.

Previously, many users reported that converting HTML-style PPT to PPTX/PDF format resulted in a significant loss of quality, with text misalignment and layout chaos, making it unusable. Kimi seems to be the first agent that can produce beautiful PPTs with coding models and download them in PPT format without losing quality, which is really impressive.

This PPT is stunning, and the content is fascinating.

6) Mood Cocktail Mixer

Help me create a cocktail simulator where users can choose cocktail ingredients, their mood (e.g., happy, sad, etc.), and desired flavor (e.g., sweet, fruity), to create a personalized drink and experience the fun of mixing cocktails.

This is also fun, allowing users to DIY drinks based on their mood.

Experience URL:
https://z67qf26v26cce.ok.kimi.link/

I mixed a drink called “Happy Hour.” Would you dare to drink it?

Overall, my experience was:

The generation speed is very fast. My 6 cases were generated in about 5-8 minutes each, faster than Manus and Genspark, which often take over 10 minutes. After all, it’s their own base model, so the agent shouldn’t be slow.
The delivery quality is very high. Whether it’s precise prompts or simple prompts, Kimi can deliver high-quality outputs. Especially with vague expressions, it can still surprise you.
Aesthetic sense is on point. The K2 Agent model has had a high aesthetic performance since its inception, and it has now iterated to K2 Turbo, enhancing its aesthetic appeal even further.
Low hallucination rates. This has always been Kimi’s advantage; its own base model’s encyclopedic knowledge and RAG do a good job in reducing hallucinations, resulting in low content hallucination rates.

About OK Computer

What is the origin of “OK Computer”? I asked Kimi.

The name comes from the third studio album released by the British rock band Radiohead in 1997. This album is like a time capsule, encapsulating the anxieties, neon lights, and unnamed digital dawn of the late 20th century.

This is the album cover, with interwoven roads, blurred traffic, and many strange symbols and garbled text… doesn’t it resemble AI-generated images?

The core theme of this album is the alienation of humanity in the technological age, exploring how humans can maintain their essence and emotions in the new technological era. Lead singer Thom Yorke summarized the album: “Embrace the future, have a sense of awe for the future; in a large room where all electronic devices are broken, the sound you hear is OK Computer.”

The lyrics deeply discuss themes of technology, consumerism, political alienation, and human emotional detachment in modern society, and it is regarded as a prophetic work for the information society of the 21st century.

The name “OK Computer” was inspired by a line from Douglas Adams’ 1978 sci-fi radio drama “The Hitchhiker’s Guide to the Galaxy,” which goes:

“Okay, computer, I want full manual control now.”

Kimi chose this name for its first full-stack assistant, not opting for the common cyber syllables found in sci-fi films, nor drawing from mythology or mathematical references, but rather selecting a phrase that carries a Britpop coolness while subtly hiding a humanistic warmth.

“OK Computer?”

“OK, computer, Kimi is powered on.”

Conclusion

From my own testing, Kimi’s full-stack assistant “OK Computer” is quite capable, able to handle a variety of tasks efficiently and with high quality.

It has over 20 built-in tools, such as to-do list creation, Python coding, terminal operations, web browsing, text/image searches, image generation, audio generation, access to professional financial data sources, website deployment, etc., making it adaptable to a wide range of task requirements.

It can work like a team, launching an AI development team that includes product managers, designers, data analysts, and front-end engineers as needed, autonomously researching, planning, analyzing, designing, developing, and deploying high-quality outputs.

Moreover, its aesthetic sense is also on point, meeting responsive and mobile-friendly standards.

Since the release of K2, both developers and ordinary users around me have generally recognized that Kimi has truly stood up this time.

Among the AI chatbots I frequently use, Kimi has always been one of the high-frequency AIs, smart, practical, and with a touch of humanity.

OK, computer, Kimi is powered on; let your creativity begin.

AI on qhhwx News

The Question of AI Consciousness: Is It Worth Asking?

The Question of AI Consciousness

Essential AI Terminology You Need to Know

Introduction

Agent/Agentic

AI Ethics

AI Psychosis

AI Safety

Algorithm

Alignment

Anthropomorphism

Artificial General Intelligence (AGI)

Artificial Intelligence (AI)

Bias

Chatbot

Claw

Cognitive Computing

Data Augmentation

Dataset

Deep Learning

Diffusion

Emergent Behavior

End-to-End Learning (E2E)

Foom

Generative Adversarial Networks (GANs)

Generative AI

Guardrails

Hallucination

Inference

Large Language Model (LLM)

Latency

Machine Learning

Multimodal AI

Natural Language Processing

Neural Network

Open Weights

Overfitting

Paperclips

Parameters

Prompt

Prompt Chaining

Prompt Engineering

Prompt Injection

Quantization

Slop

Stochastic Parrot

Style Transfer

Sycophancy

Synthetic Data

Temperature

Token

Training Data

Transformer Model

Turing Test

Unsupervised Learning

Vibe Coding

Weak AI / Narrow AI

Zero-Shot Learning

Claude's Sleep Reminders Spark Debate on AI Personality Design

Claude’s Sleep Reminders Spark Debate

The Unforeseen Habits of AI Personalities

The Disparity in Personality Investment

Misplaced Concern Reveals AI’s Understanding Blind Spots

The Cost of Personality Design

Switching from Trae to Codex: A Comprehensive Comparison

1. Honest Review: Satisfied with Trae After 8 Months

2. Turning Point: 3 Core Pain Points Trae Couldn’t Handle

3. In-Depth Comparison: Trae vs Codex, Core Differences at a Glance

4. Transitioning to Codex: 1 Month of Testing, Efficiency Exceeds Expectations

5. Rational Summary: There’s No Best Tool, Only the Most Suitable Tool

Tencent's Yuanbao Integrates AI in WeChat with New Features

Group Chat Summaries Available, but Requires Forwarding to Yuanbao App

Falling Out of the Top Three, but Yuanbao Remains in the Game

Tencent’s AI Strategy Accelerates on Multiple Fronts, Yuanbao Remains Key Before Major Moves

Codex AI Achieves 40x Research Efficiency in Groundbreaking Experiment

Introduction

What is Codex /goal Mode?

Why is Codex /goal Important?

PhD 80 Hours vs AI 2 Hours