As one example, I tried using Claude Opus 4.6 to generate a program that interprets a custom DSL I use for typesetting grammars and generates Haskell type definitions from it. After 8 hours of prompting and several million tokens, the code it produced was still absolutely useless. It passed the tests I had prompted it with, but just by looking at the code, one could easily spot type errors and logic that tried to special-case specific identifiers from the tests. The logic for sanitizing identifiers was a mess and would occasionally produce empty strings. A correct implementation would take me 300 to 400 lines of code, which I can certainly write in less than 8 hours.
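The empty-string failure, at least, can be ruled out by construction. The following is a rough sketch and not the author's code. The DSL's real naming rules aren't described here, so the function name `sanitize_type_name`, the `Rule` fallback prefix and the numeric-suffix collision scheme are all hypothetical. It maps grammar rule names to Haskell type constructor names and never returns an empty string:

```python
import re


def sanitize_type_name(raw: str, taken: set[str]) -> str:
    """Map a grammar rule name to a fresh, valid Haskell type constructor name.

    Hypothetical helper: never returns an empty string and never returns a
    name already present in `taken` (the chosen name is added to `taken`).
    """
    # Split on runs of anything that is not an ASCII letter or digit:
    # "expr-list" -> ["expr", "list"]; empty pieces are dropped.
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", raw) if p]
    # Upper-case the first character of each piece and join (PascalCase).
    name = "".join(p[0].upper() + p[1:] for p in parts)
    # Haskell type names must start with an upper-case letter. Fall back to a
    # fixed prefix if the input was empty, all punctuation, or began with a digit.
    if not name or not name[0].isalpha():
        name = "Rule" + name
    # Different inputs can collapse to one name ("a-b" and "a_b" both give "AB"),
    # so append a counter until the name is unused.
    candidate, n = name, 1
    while candidate in taken:
        n += 1
        candidate = f"{name}{n}"
    taken.add(candidate)
    return candidate


if __name__ == "__main__":
    taken: set[str] = set()
    for rule in ["expr-list", "expr_list", "", "2d-point"]:
        print(repr(rule), "->", sanitize_type_name(rule, taken))
```

Because every result is capitalized, it cannot clash with a Haskell keyword, since those are all lower-case. Tracking `taken` across calls is what keeps two rules that normalize to the same spelling from silently producing duplicate type definitions.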