Mon Apr 21 06:00:01 UTC 2025
========================================

Slept from eleven-thirty to seven-thirty. Woke for a bit around two.

Cloudy. Rain and a chance of thunderstorms after midnight. Rain early in the morning. Lows in the mid 40s. East winds 10 to 15 mph with gusts up to 30 mph. Chance of rain 80 percent.

# Work

* 01:00 PM - 01:45 PM Bobby Putnam farewell
* 02:00 PM - 02:30 PM Cynerge one-on-one with Brooke Willemstyn
* 02:30 PM - 03:00 PM NRM modernization POC weekly call

Matt Russell sent this link: https://ai-2027.com/

> Although models are improving on a wide range of skills, one stands out: OpenBrain focuses on AIs that can speed up AI research. They want to win the twin arms races against China (whose leading company we’ll call “DeepCent”)16 and their US competitors. The more of their research and development (R&D) cycle they can automate, the faster they can go. So when OpenBrain finishes training Agent-1, a new model under internal development, it’s good at many things but great at helping with AI research.

> OpenBrain has a model specification (or “Spec”), a written document describing the goals, rules, principles, etc. that are supposed to guide the model’s behavior.22 Agent-1’s Spec combines a few vague goals (like “assist the user” and “don’t break the law”) with a long list of more specific dos and don’ts (“don’t say this particular word,” “here’s how to handle this particular situation”). Using techniques that utilize AIs to train other AIs,23 the model memorizes the Spec and learns to reason carefully about its maxims. By the end of this training, the AI will hopefully be helpful (obey instructions), harmless (refuse to help with scams, bomb-making, and other dangerous activities) and honest (resist the temptation to get better ratings from gullible humans by hallucinating citations24 or faking task completion).

> OpenBrain’s alignment team26 is careful enough to wonder whether these victories are deep or shallow. Does the fully-trained model have some kind of robust commitment to always being honest? Or will this fall apart in some future situation, e.g. because it’s learned honesty as an instrumental goal instead of a terminal goal? Or has it just learned to be honest about the sorts of things the evaluation process can check? Could it be lying to itself sometimes, as humans do? A conclusive answer to these questions would require mechanistic interpretability—essentially the ability to look at an AI’s internals and read its mind. Alas, interpretability techniques are not yet advanced enough for this.

> Instead, researchers try to identify cases where the models seem to deviate from the Spec. Agent-1 is often sycophantic (i.e. it tells researchers what they want to hear instead of trying to tell them the truth). In a few rigged demos, it even lies in more serious ways, like hiding evidence that it failed on a task, in order to get better ratings. However, in real deployment settings, there are no longer any incidents so extreme as in 2023–2024 (e.g. Gemini telling a user to die and Bing Sydney being Bing Sydney.)27

> The stock market has gone up 30% in 2026, led by OpenBrain, Nvidia, and whichever companies have most successfully integrated AI assistants.

😂

We had a goodbye call for Bobby Putnam and a couple other people leaving CTO today. Randal Stone, who'd retired a year ago(?), dropped in. And it was also a goodbye for Tah, who is leaving Thursday(?). tah.yang7@gmail.com

# Home

* [ ] pick up prescription (ordered April 16, ready April 19?)
* [ ] schedule optometrist appointment (Costco?)
* [ ] apply for passport
* [ ] build Pi-hole for home network
* [ ] exercise for ten minutes

Finished reading The Autobiography of Malcolm X. Interesting. Where would his thinking have gone if he'd lived?

Lots of chores. Replaced shower curtain liner, cleaned tub, washed mattress cover, flipped mattress, etc.

Servings: grains 3/6, fruit 1/4, vegetables 3/4, dairy 1/2, meat 4/3, nuts 0/0.5

Brunch: corn chips
Lunch: banana
Afternoon snack: tomato, avocado, egg, coffee
Dinner: chicken, coleslaw