The Nonlinear Library: LessWrong

Education Podcasts

The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

Location:

United States

Language:

English


Episodes

LW - Glitch Token Catalog - (Almost) a Full Clear by Lao Mein

9/22/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Glitch Token Catalog - (Almost) a Full Clear, published by Lao Mein on September 22, 2024 on LessWrong. This is a collection of every unidentified GPT2 glitch token listed in the third glitch token archaeology post. I was able to find the source of every single one, except for "?????-" and "?????-?????-"[1]. Please tell me if I missed one, or you've discovered one and don't understand where it came from. This isn't meant to be a well-written analysis, just a quick repository of my glitch-hunting observations. I plan on writing up and categorizing all of these in greater detail in future posts. The first of which is here. I used OpenWebText, a recreation of GPT2's training data, for all experiments in this post. I tokenized every .gz file in the archive and made a boolean Numpy array of each token that was present at least once. This allowed me to quickly identify infrequent tokens in the dataset and pull up the textual context with regular expressions. If there was an issue with overlap, I used a tokenizer-based extraction instead. All data/code available upon request. The leftmost column is token id, the middle is the token string, and the right column is # of files the token was present in (out of 20610). GPT2 has 50256 total tokens.
GPT2 tokens with the lowest frequency in OpenWebText 30898 'embedreportprint' 0 33434 ' 士' 0 43453 ' SolidGoldMagikarp' 0 1849 '\xa0' 0 47654 ' \xa0\xa0' 0 50009 ' strutConnector' 0 36173 ' RandomRedditor' 0 214 '\x1a' 0 42424 'DragonMagazine' 0 180 ' ' 0 187 ' ' 0 186 ' ' 0 30213 ' externalToEVAOnly' 0 30212 ' externalToEVA' 0 30211 ' guiIcon' 0 185 ' ' 0 30210 ' guiActiveUnfocused' 0 30209 ' unfocusedRange' 0 184 ' ' 0 30202 ' guiName' 0 183 ' ' 0 30905 'rawdownload' 0 39906 'EStream' 0 33454 '龍喚士' 0 42586 ' srfN' 0 25992 ' 裏覚醒' 0 43065 ' srfAttach' 0 11504 ' \xa0 \xa0' 0 39172 '\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0' 0 40240 'oreAndOnline' 0 40241 'InstoreAndOnline' 0 33477 '\xa0\xa0\xa0' 0 36174 ' RandomRedditorWithNo' 0 37574 'StreamerBot' 0 46600 ' Adinida' 0 182 ' ' 0 29372 ' guiActiveUn' 0 43177 'EStreamFrame' 0 22686 ' \xa0 \xa0 \xa0 \xa0' 0 23282 ' davidjl' 0 47571 ' DevOnline' 0 39752 'quickShip' 0 44320 '\n\xa0' 0 8828 '\xa0\xa0\xa0\xa0' 0 39820 '龍 ' 0 39821 '龍契士' 0 28666 'PsyNetMessage' 0 35207 ' attRot' 0 181 ' ' 0 18472 ' guiActive' 0 179 ' ' 0 17811 '\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0' 0 20174 ' 裏 ' 0 212 '\x18' 0 211 '\x17' 0 210 '\x16' 0 209 '\x15' 0 208 '\x14' 0 31666 '?????-?????-' 0 207 '\x13' 0 206 '\x12' 0 213 '\x19' 0 205 '\x11' 0 203 '\x0f' 0 202 '\x0e' 0 31957 'cffffcc' 0 200 '\x0c' 0 199 '\x0b' 0 197 '\t' 0 196 '\x08' 0 195 '\x07' 0 194 '\x06' 0 193 '\x05' 0 204 '\x10' 0 45545 ' サーティワン' 0 201 '\r' 0 216 '\x1c' 0 37842 ' partName' 0 45706 ' \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0' 0 124 ' ' 0 125 ' ' 0 178 ' ' 0 41380 'natureconservancy' 0 41383 'assetsadobe' 0 177 ' ' 0 215 '\x1b' 0 41551 'Downloadha' 0 4603 '\xa0\xa0' 0 42202 'GoldMagikarp' 0 42089 ' TheNitrome' 0 217 '\x1d' 0 218 '\x1e' 0 42090 ' TheNitromeFan' 0 192 '\x04' 0 191 '\x03' 0 219 '\x1f' 0 189 '\x01' 0 45544 ' サーティ' 0 5624 ' \xa0' 0 190 '\x02' 0 40242 'BuyableInstoreAndOnline' 1 36935 ' dstg' 1 36940 ' istg' 1 45003 ' SetTextColor' 1 30897 
'reportprint' 1 39757 'channelAvailability' 1 39756 'inventoryQuantity' 1 39755 'isSpecialOrderable' 1 39811 'soDeliveryDate' 1 39753 'quickShipAvailable' 1 39714 'isSpecial' 1 47198 'ItemTracker' 1 17900 ' Dragonbound' 1 45392 'dayName' 1 37579 'TPPStreamerBot' 1 31573 'ActionCode' 2 25193 'NetMessage' 2 39749 'DeliveryDate' 2 30208 ' externalTo' 2 43569 'ÍÍ' 2 34027 ' actionGroup' 2 34504 ' 裏 ' 2 39446 ' SetFontSize' 2 30899 'cloneembedreportprint' 2 32047 ' "$:/' 3 39803 'soType' 3 39177 'ItemThumbnailImage' 3 49781 'EngineDebug' 3 25658 '?????-' 3...
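The counting step described above (mark which tokens occur in each file, then count files per token) can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's code: `VOCAB` and `toy_tokenize` are hypothetical stand-ins for the real GPT2 tokenizer, the in-memory gzip blobs stand in for the OpenWebText .gz files, and plain Python lists are used where the post used a boolean Numpy array.

```python
import gzip

# Hypothetical toy vocabulary standing in for GPT2's real one (assumption).
VOCAB = {"the": 0, "cat": 1, "sat": 2, "SolidGoldMagikarp": 3}


def toy_tokenize(text):
    """Stand-in tokenizer: split on whitespace and drop unknown words."""
    return [VOCAB[word] for word in text.split() if word in VOCAB]


def files_containing_each_token(gz_blobs, vocab_size):
    """For each token id, count the files in which it appears at least once."""
    file_counts = [0] * vocab_size
    for blob in gz_blobs:
        text = gzip.decompress(blob).decode("utf-8")
        present = [False] * vocab_size  # boolean presence mask for this file
        for token_id in toy_tokenize(text):
            present[token_id] = True
        for token_id, seen in enumerate(present):
            if seen:
                file_counts[token_id] += 1
    return file_counts


def rarest_tokens(file_counts, limit):
    """Return token ids sorted by ascending file count, ties broken by id."""
    order = sorted(range(len(file_counts)), key=lambda i: (file_counts[i], i))
    return order[:limit]


# Three tiny "files", gzip-compressed in memory for the example.
docs = ["the cat sat", "the cat", "the the the"]
blobs = [gzip.compress(doc.encode("utf-8")) for doc in docs]
counts = files_containing_each_token(blobs, len(VOCAB))
rarest = rarest_tokens(counts, 2)
```

Counting files rather than raw occurrences matches the right-hand column of the table: a token repeated many times in one file still contributes only 1.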

Duration:02:50:10

LW - Investigating an insurance-for-AI startup by L Rudolf L

9/21/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Investigating an insurance-for-AI startup, published by L Rudolf L on September 21, 2024 on LessWrong. We (Flo & Rudolf) spent a month fleshing out the idea of an insurance-for-AI company. We talked to 15 people in the insurance industry, and did 20 customer interviews. We decided not to continue, but we think it's still a very promising idea and that maybe someone else should do this. This post describes our findings. The idea Theory of change To reduce AI risks, it would be good if we understood risks well, and if some organisation existed that could incentivise the use of safer AI practices. An insurance company that sells insurance policies for AI use cases has a financial incentive to understand concrete AI risks & harms well, because this feeds into its pricing. This company would also be incentivised to encourage companies to adopt safer AI practices, and could incentivise this by offering lower premiums in return. Like many cyber-insurance companies, it could also provide more general advice & consulting on AI-related risk reduction. Concrete path TL;DR: Currently, professionals (e.g. lawyers) have professional indemnity (PI) insurance. Right now, most AI tools involve the human being in the loop. But eventually, the AI will do the work end-to-end, and then the AI will be the one whose mistakes need to be insured. Currently, this insurance does not exist. We would start with law, but then expand to all other forms of professional indemnity insurance (i.e. insurance against harms caused by a professional's mistakes or malpractice in their work). Frontier labs are not good customers for insurance, since their size means they mostly do not need external insurance, and have a big information advantage in understanding the risk. Instead, we would target companies using LLMs (e.g. 
large companies that use specific potentially-risky AI workflows internally), or companies building LLM products for a specific industry. We focused on the latter, since startups are easier to sell to. Specifically, we wanted a case where: LLMs were being used in a high-stakes industry like medicine or law there were startups building LLM products in this industry there is some reason why the AI might cause legal liability, for example: the LLM tools are sufficiently automating the work that the liability is plausibly on them rather than the humans AI exceptions in existing insurance policies exist (or will soon exist) The best example we found was legal LLM tools. Law involves important decisions and large amounts of money, and lawyers can be found liable in legal malpractice lawsuits. LLMs are close to being able to do much legal work end-to-end; in particular, if the work is not checked by a human before being shipped, it is uncertain if existing professional indemnity (PI) insurance applies. People who work in law and law tech are also, naturally, very liability-aware. Therefore, our plan was: Become a managing general agent (MGA), a type of insurance company that does not pay claims out of its own capital (but instead finds a reinsurer to agree to pay them, and earns a cut of the premiums). Design PI policies for AI legal work, and sell these policies to legal AI startups (to help them sell to their law firm customers), or directly to law firms buying end-to-end legal AI tools. As more and more legal work is done end-to-end by AI, more and more of the legal PI insurance market is AI insurance policies. As AI advances and AI insurance issues become relevant in other industries, expand to those industries (e.g. medicine, finance, etc.). Eventually, most of the world's professional indemnity insurance market (on the order of $10B-100B/year) has switched from insuring against human mistakes to insuring against AI mistakes. Along the way,...

Duration:00:26:00

LW - Applications of Chaos: Saying No (with Hastings Greer) by Elizabeth

9/21/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Applications of Chaos: Saying No (with Hastings Greer), published by Elizabeth on September 21, 2024 on LessWrong. Previously Alex Altair and I published a post on the applications of chaos theory, which found a few successes but mostly overhyped dead ends. Luckily the comments came through, providing me with an entirely different type of application: knowing you can't, and explaining to your boss that you can't. Knowing you can't Calling a system chaotic rules out many solutions and tools, which can save you time and money in dead ends not traveled. I knew this, but also knew that you could never be 100% certain a physical system was chaotic, as opposed to misunderstood. However, you can know the equations behind proposed solutions, and trust that reality is unlikely to be simpler[1] than the idealized math. This means that if the equations necessary for your proposed solution could be used to solve the 3-body problem, you don't have a solution. [[1] I'm hedging a little because sometimes reality's complications make the math harder but the ultimate solution easier. E.g. friction makes movement harder to predict but gives you terminal velocity.] I had a great conversation with trebuchet and math enthusiast Hastings Greer about how this dynamic plays out with trebuchets. Transcript Note that this was recorded in Skype with standard headphones, so the recording leaves something to be desired. I think it's worth it for the trebuchet software visuals starting at 07:00 My favorite parts: If a trebuchet requires you to solve the double pendulum problem (a classic example of a chaotic system) in order to aim, it is not a competition-winning trebuchet. Trebuchet design was solved 15-20 years ago; it's all implementation details now. 
This did not require modern levels of tech, just modern nerds with free time. The winning design was used by the Syrians during Arab Spring, which everyone involved feels ambivalent about. The national pumpkin throwing competition has been snuffed out by insurance issues, but local competitions remain. Learning about trebuchet modeling software. Explaining you can't One reason to doubt chaos theory's usefulness is that we don't need fancy theories to tell us something is impossible. Impossibility tends to make itself obvious. But some people refuse to accept an impossibility, and some of those people are managers. Might those people accept "it's impossible because of chaos theory" where they wouldn't accept "it's impossible because look at it"? As a test of this hypothesis, I made a Twitter poll asking engineers-as-in-builds-things if they had tried to explain a project's impossibility to chaos, and if it had worked. The final results were: 36 respondents who were engineers of the relevant type This is probably an overestimate. One respondee replied later that he selected this option incorrectly, and I suspect that was a common mistake. I haven't attempted to correct for it as the exact percentage is not a crux for me. 6 engineers who'd used chaos theory to explain to their boss why something was impossible. 5 engineers who'd tried this explanation and succeeded. 1 engineer who tried this explanation and failed. 5/36 is by no means common, but it's not zero either, and it seems like it usually works. My guess is that usage is concentrated in a few subfields, making chaos even more useful than it looks. My sample size isn't high enough to trust the specific percentages, but as an existence proof I'm quite satisfied. Conclusion Chaos provides value both by telling certain engineers where not to look for solutions to their problems, and by getting their bosses off their back about it. 
That's a significant value add, but short of what I was hoping for when I started looking into Chaos. Thanks for...

Duration:00:03:39

LW - Work with me on agent foundations: independent fellowship by Alex Altair

9/21/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Work with me on agent foundations: independent fellowship, published by Alex Altair on September 21, 2024 on LessWrong. Summary: I am an independent researcher in agent foundations, and I've recently received an LTFF grant to fund someone to do research with me. This is a rolling application; I'll close it whenever I'm no longer interested in taking another person. If you're not familiar with agent foundations, you can read about my views in this post. What the role might be like This role is extremely flexible. Depending on who you are, it could end up resembling an internship, a research assistant position, a postdoc or even as a mentor/advisor to me. Below, I've listed out the parameters of the fellowship that I am using as a baseline of what it could be. All of these parameters are negotiable! $25 per hour. This is not a lot for people who live in the SF Bay area, or who are used to industry salaries, but it looks to me like this is comparable to a typical grad student salary. 20 hours per week. I'd like this fellowship to be one of your main projects, and I think it can take quite a lot of "deep work" focus before one can make progress on the research problems.[1] 3 months, with a decent chance of extension. During my AI safety camp project, it took about 6 weeks to get people up to speed on all the parts of the agent structure problem. Ideally I could find someone for this role who is already closer to caught up (though I don't necessarily anticipate that). I'm thinking of this fellowship as something like an extended work-trial for potentially working together longer-term. That said, I think we should at least aim to get results by the end of it. 
Whether I'll decide to invite you to continue working with me afterwards depends on how our collaboration went (both technically and socially), how many other people I'm collaborating with at that time, and whether I think I have enough funds to support it. Remote, but I'm happy to meet in person. Since I'm independent, I don't have anything like an office for you to make use of. But if you happen to be in the SF Bay area, I'd be more than happy to have our meetings in person. I wake up early, so US eastern and European time zones work well for me (and other time zones too). Meeting 2-5 times per week. Especially in the beginning, I'd like to do a pretty large amount of syncing up. It can take a long time to convey all the aspects of the research problems. I also find that real-time meetings regularly generate new ideas. That said, some people find meetings worse for their productivity, and so I'll be responsive to your particular work style. An end-of-term write-up. It seems to take longer than three months to get results in the types of questions I'm interested in, but I think it's good practice to commit to producing a write-up of how the fellowship goes. If it goes especially well, we could produce a paper. What this role ends up looking like mostly depends on your experience level relative to mine. Though I now do research, I haven't gone through the typical academic path. I'm in my mid-thirties and have a proportional amount of life and career experience, but in terms of mathematics, I consider myself the equivalent of a second year grad student. So I'm comfortable leading this project and am confident in my research taste, but you might know more math than me. The research problems Like all researchers in agent foundations, I find it quite difficult to concisely communicate what my research is about. 
Probably the best way to tell if you will be interested in my research problems is to read other things I've written, and then have a conversation with me about it. All my research is purely mathematical,[2] rather than experimental or empirical. None of it involves machine...

Duration:00:06:20

LW - o1-preview is pretty good at doing ML on an unknown dataset by Håvard Tveit Ihle

9/20/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: o1-preview is pretty good at doing ML on an unknown dataset, published by Håvard Tveit Ihle on September 20, 2024 on LessWrong. Previous post: How good are LLMs at doing ML on an unknown dataset? A while back I ran some evaluation tests on GPT4o, Claude Sonnet 3.5 and Gemini advanced to see how good they were at doing machine learning on a completely novel, and somewhat unusual dataset. The data was basically 512 points in the 2D plane, and some of the points make up a shape, and the goal is to classify the data according to what shape the points make up. None of the models did better than chance on the original (hard) dataset, while they did somewhat better on a much easier version I made afterwards. With the release of o1-preview, I wanted to quickly run the same test on o1, just to see how well it did. In summary, it basically solved the hard version of my previous challenge, achieving 77% accuracy on the test set on its fourth submission (this increases to 91% if I run it for 250 instead of 50 epochs), which is really impressive to me. Here is the full conversation with ChatGPT o1-preview In general o1-preview seems like a big step change in its ability to reliably do hard tasks like this without any advanced scaffolding or prompting to make it work. Detailed discussion of results The architecture that o1 went for in the first round is essentially the same that Sonnet 3.5 and gemini went for, a pointnet inspired model which extracts features from each point independently. While it managed to do slightly better than chance on the training set, it did not do well on the test set. For round two, it went for the approach (which also Sonnet 3.5 came up with) of binning the points in 2D into an image, and then using a regular 2D convnet to classify the shapes. 
This worked somewhat on the first try. It completely overfit the training data, but got to an accuracy of 56% on the test data. For round three, it understood that it needed to add data augmentations in order to generalize better, and it implemented scaling, translations and rotations of the data. It also switched to a slightly modified resnet18 architecture (a roughly 10x larger model). However, it introduced a bug when converting to PIL image (and back to torch.tensor), which resulted in an error. For round four, o1 fixed the error and has a basically working solution, achieving an accuracy of 77% (which increases to 91% if we increase the number of epochs from 50 to 250, all still well within the allotted hour of runtime). I consider the problem basically solved at this point; by playing around with smaller variations on this, you can probably get a few more percentage points without any more insights needed. For the last round, it tried the standard approach of using the pretrained weights of resnet18 and freezing almost all the layers, which is an approach that works well on many problems, but did not work well in this case. The accuracy dropped to 41%. I guess these data are just too different from imagenet (which resnet18 is trained on) for this approach to work well. I would not have expected this to work, but I don't hold it that much against o1, as it is a reasonable thing to try. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
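The round-two idea of binning the points in 2D into an image can be sketched roughly like this. It is a minimal illustration, not the code o1 produced: the function name, the grid size, and the assumed coordinate range of [-1, 1] are all hypothetical, and the convnet that would consume the image is left out.

```python
def bin_points_to_image(points, size, lo=-1.0, hi=1.0):
    """Rasterize 2D points into a size x size grid of per-cell counts.

    The coordinate range [lo, hi] is an assumption for this sketch;
    points outside it are skipped.
    """
    image = [[0] * size for _ in range(size)]
    scale = size / (hi - lo)
    for x, y in points:
        if not (lo <= x <= hi and lo <= y <= hi):
            continue
        # Clamp so that a coordinate exactly equal to `hi` lands in the last cell.
        col = min(int((x - lo) * scale), size - 1)
        row = min(int((y - lo) * scale), size - 1)
        image[row][col] += 1
    return image


# Five sample points; the last one lies outside the assumed range.
sample = [(-1.0, -1.0), (0.0, 0.0), (0.99, 0.99), (1.0, 1.0), (2.0, 0.0)]
image = bin_points_to_image(sample, size=4)
total = sum(sum(row) for row in image)
```

The resulting grid of counts can be treated as a one-channel image, which is what lets a regular 2D convnet look at the spatial arrangement of the points rather than at each point independently.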

Duration:00:03:14

LW - Interested in Cognitive Bootcamp? by Raemon

9/19/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Interested in Cognitive Bootcamp?, published by Raemon on September 20, 2024 on LessWrong. I'm running more 4-day "Cognitive Bootcamps" over the next couple months (during Lighthaven Eternal September season). DM me if you're potentially interested (either as an individual, or as a team). The workshop is most valuable to people who: control their decisionmaking process (i.e. you decide what projects you or a team work on, rather than working at a day-job on someone else's vision) are either a) confused about planmaking / have a vague sense that they aren't as strategically ambitious as they could be. and/or, b) are at a place where it's natural to spend a few days thinking big-picture thoughts before deciding on their next project. There's a secondary[1] focus on "practice solving confusing problems", which IMO is time well spent, but requires more followup practice to pay off. I wrote about the previous workshop here. Participants said on average they'd have been willing to pay $850 for it, and would have paid $5000 for the ideal, perfectly-tailored-for-them version. My plan is to charge $500/person for the next workshop, and then $1000 for the next one. I'm most excited to run this for teams, who can develop a shared skillset and accompanying culture. I plan to tailor the workshops for the needs of whichever people show up. The dates are not scheduled yet (depends somewhat on when a critical mass of participants are available). DM me if you are interested. The skills being taught will be similar to the sort of thing listed in Skills from a year of Purposeful Rationality Practice and the Feedbackloop-first Rationality sequence. 
My default curriculum is aiming to teach several interrelated skills you can practice over four days, that build into a coherent metaskill of "ambitious planning, at multiple timescales." 1. ^ I started this project oriented around "find better feedbackloops for solving confusing problems", and later decided that planmaking was the highest leverage part of the skill tree to focus on. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Duration:00:02:05

LW - Laziness death spirals by PatrickDFarley

9/19/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Laziness death spirals, published by PatrickDFarley on September 19, 2024 on LessWrong. I've claimed that Willpower compounds and that small wins in the present make it easier to get bigger wins in the future. Unfortunately, procrastination and laziness compound, too. You're stressed out for some reason, so you take the evening off for a YouTube binge. You end up staying awake a little later than usual and sleeping poorly. So the next morning you feel especially tired; you snooze a few extra times. In your rushed morning routine you don't have time to prepare for the work meeting as much as you'd planned to. So you have little to contribute during the meeting. You feel bad about your performance. You escape from the bad feelings with a Twitter break. But Twitter is freaking out. Elon Musk said what? Everyone is weighing in. This is going to occupy you intermittently for the rest of the day. And so on. Laziness has a kind of independent momentum to it. When you're having a day like the above, even if you consciously commit to getting back on track, the rut tends to find its way back to you within a couple of hours. Keep this up for a few days and your sleep is utterly messed up, and you walk around in a fog. Keep it up for a week or two and you're fully off your workout routine. In a month or two, you might have noticeably fallen behind on work; you might be absent from your social life; you might've visibly gained fat or lost muscle; you can no longer feel excited about your personal goals because they're behind a pile of mundane tasks you need to catch up on first. And so on. How do we stop the vicious circle? I'm spiraling! I'm spiraling! When you're in a laziness death spiral, it's hard to do anything deliberate. 
The first and most important step, which does take some willpower but not a lot, is to acknowledge, "I'm in a laziness death spiral today." If you don't acknowledge it, here's what happens: You vaguely notice you've been wasting time today; you feel a twinge of guilt, so you quickly decide, "I'm going to turn the rest of the day around, starting right now." And does that work? Often it doesn't! Sure, after a small lapse you can just get back on track, but if enough laziness momentum has built up, a momentary reaction doesn't cut it. Deciding things quickly, in response to negative emotions, is exactly how you got into this situation! You're going to turn it around on a whim? You'll have a different whim in the next hour; what then? You need to take a step back and get your mind outside of the problem. Do what you can The next three sections are three different courses of action you can take to get out of a laziness death spiral. One of them is clearly preferable, but I'm writing the alternatives, too. When you're in a low-willpower state, it's often bad to attempt the very best solution - the farther you reach, the harder you can fall. Building a base of "small wins" is the reliable way to repair your willpower. If you start something lofty and then bail on it, you're doing real damage: logging another willpower failure and associating that "very best solution" with failure. Here are the moves: A) Emergency recovery If you're in a laziness spiral and you need to get out of it right now, there are some measures you can take that, while effective, are not ideal. They are unsustainable, promote bad habits, or are just generally unhealthy.
But sometimes the need is there: maybe you have a deadline fast approaching (and the deadline itself isn't enough to snap you into action); maybe your friends or family need you to take care of something today; maybe you were in the middle of an awfully lazy day and a once-in-a-lifetime opportunity came up, and you just can't focus enough to act on it. Disclaimer: I believe...

Duration:00:13:04

LW - We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap by johnswentworth

9/19/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap, published by johnswentworth on September 19, 2024 on LessWrong. Background: "Learning" vs "Learning About" Adaptive systems, reinforcement "learners", etc, "learn" in the sense that their behavior adapts to their environment. Bayesian reasoners, human scientists, etc, "learn" in the sense that they have some symbolic representation of the environment, and they update those symbols over time to (hopefully) better match the environment (i.e. make the map better match the territory). These two kinds of "learning" are not synonymous[1]. Adaptive systems "learn" things, but they don't necessarily "learn about" things; they don't necessarily have an internal map of the external territory. (Yes, the active inference folks will bullshit about how any adaptive system must have a map of the territory, but their math does not substantively support that interpretation.) The internal heuristics or behaviors "learned" by an adaptive system are not necessarily "about" any particular external thing, and don't necessarily represent any particular external thing[2]. We Humans Learn About Our Values "I thought I wanted X, but then I tried it and it was pretty meh." "For a long time I pursued Y, but now I think that was more a social script than my own values." "As a teenager, I endorsed the view that Z is the highest objective of human existence. … Yeah, it's a bit embarrassing in hindsight." The ubiquity of these sorts of sentiments is the simplest evidence that we do not typically know our own values[3]. Rather, people often (but not always) have some explicit best guess at their own values, and that guess updates over time - i.e. we can learn about our own values. 
Note the wording here: we're not just saying that human values are "learned" in the more general sense of reinforcement learning. We're saying that we humans have some internal representation of our own values, a "map" of our values, and we update that map in response to evidence. Look again at the examples at the beginning of this section: "I thought I wanted X, but then I tried it and it was pretty meh." "For a long time I pursued Y, but now I think that was more a social script than my own values." "As a teenager, I endorsed the view that Z is the highest objective of human existence. … Yeah, it's a bit embarrassing in hindsight." Notice that the wording of each example involves beliefs about values. They're not just saying "I used to feel urge X, but now I feel urge Y". They're saying "I thought I wanted X" - a belief about a value! Or "now I think that was more a social script than my own values" - again, a belief about my own values, and how those values relate to my (previous) behavior. Or "I endorsed the view that Z is the highest objective" - an explicit endorsement of a belief about values. That's how we normally, instinctively reason about our own values. And sure, we could reword everything to avoid talking about our beliefs about values - "learning" is more general than "learning about" - but the fact that it makes sense to us to talk about our beliefs about values is strong evidence that something in our heads in fact works like beliefs about values, not just reinforcement-style "learning". Two Puzzles Puzzle 1: Learning About Our Own Values vs The Is-Ought Gap Very roughly speaking, an agent could aim to pursue any values regardless of what the world outside it looks like; "how the external world is" does not tell us "how the external world should be". So when we "learn about" values, where does the evidence about values come from? How do we cross the is-ought gap? 
Puzzle 2: The Role of Reward/Reinforcement It does seem like humans have some kind of physiological "reward", in a hand-wavy...

Duration:00:07:41

LW - AI #82: The Governor Ponders by Zvi

9/19/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #82: The Governor Ponders, published by Zvi on September 19, 2024 on LessWrong. The big news of the week was of course OpenAI releasing their new model o1. If you read one post this week, read that one. Everything else is a relative sideshow. Meanwhile, we await Newsom's decision on SB 1047. The smart money was always that Gavin Newsom would make us wait before offering his verdict on SB 1047. It's a big decision. Don't rush him. In the meantime, what hints he has offered suggest he's buying into some of the anti-1047 talking points. I'm offering a letter to him here based on his comments, if you have any way to help convince him now would be the time to use that. But mostly, it's up to him now. Table of Contents 1. Introduction. 2. Table of Contents. 3. Language Models Offer Mundane Utility. Apply for unemployment. 4. Language Models Don't Offer Mundane Utility. How to avoid the blame. 5. Deepfaketown and Botpocalypse Soon. A social network of you plus bots. 6. They Took Our Jobs. Not much impact yet, but software jobs still hard to find. 7. Get Involved. Lighthaven Eternal September, individual rooms for rent. 8. Introducing. Automated scientific literature review. 9. In Other AI News. OpenAI creates independent board to oversee safety. 10. Quiet Speculations. Who is preparing for the upside? Or appreciating it now? 11. Intelligent Design. Intelligence. It's a real thing. 12. SB 1047: The Governor Ponders. They got to him, but did they get to him enough? 13. Letter to Newsom. A final summary, based on Newsom's recent comments. 14. The Quest for Sane Regulations. How should we update based on o1? 15. Rhetorical Innovation. The warnings will continue, whether or not anyone listens. 16. Claude Writes Short Stories. It is pondering what you might expect it to ponder. 17. 
Questions of Sentience. Creating such things should not be taken lightly. 18. People Are Worried About AI Killing Everyone. The endgame is what matters. 19. The Lighter Side. You can never be sure. Language Models Offer Mundane Utility Arbitrate your Nevada unemployment benefits appeal, using Gemini. This should solve the backlog of 10k+ cases, and also I expect higher accuracy than the existing method, at least until we see attempts to game the system. Then it gets fun. That's also job retraining. o1 usage limit raised to 50 messages per day for o1-mini, 50 per week for o1-preview. o1 can do multiplication reliably up to about 46 digits, and about 50% accurately up through about 810, a huge leap from gpt-4o, although Colin Fraser reports 4o can be made better at this than one would expect. o1 is much better than 4o at evaluating medical insurance claims, and determining whether requests for care should be approved, especially in terms of executing existing guidelines, and automating administrative tasks. It seems like a clear step change in usefulness in practice. The claim is that being sassy and juicy and bitchy improves Claude Instant numerical reasoning. What I actually see here is that it breaks Claude Instant out of trick questions. Where Claude would previously fall into a trap, you have it fall back on what is effectively 'common sense,' and it starts getting actually easy questions right. Language Models Don't Offer Mundane Utility A key advantage of using an AI is that you can no longer be blamed for an outcome out of your control. However, humans often demand manual mode be available to them, allowing humans to override the AI, even when it doesn't make any practical sense to offer this. And then, if the human can in theory switch to manual mode and override the AI, blame to the human returns, even when the human exerting that control was clearly impractical in context. The top example here is self-driving cars, and blame for car...

Duration:00:43:47

LW - Which LessWrong/Alignment topics would you like to be tutored in? [Poll] by Ruby

9/19/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Which LessWrong/Alignment topics would you like to be tutored in? [Poll], published by Ruby on September 19, 2024 on LessWrong. Would you like to be tutored in applied game theory, natural latents, CFAR-style rationality techniques, "general AI x-risk", Agent Foundations, anthropics, or some other topics discussed on LessWrong? I'm thinking about prototyping some topic-specific LLM tutor bots, and would like to prioritize topics that multiple people are interested in. Topic-specific LLM tutors would be customized with things like pre-loaded relevant context, helpful system prompts, and more focused testing to ensure they work. Note: I'm interested in topics that are written about on LessWrong, e.g. infra-bayesianism, and not "magnetohydrodynamics". I'm going to use the same poll infrastructure that Ben Pace pioneered recently. There is a thread below where you add and vote on topics/domains/areas where you might like tutoring. 1. Karma: upvote/downvote to express enthusiasm about there being tutoring for a topic. 2. Reacts: click on the agree react to indicate you personally would like tutoring on a topic. 3. New Poll Option. Add a new topic for people to express interest in being tutored on. For the sake of this poll, I'm more interested in whether you'd like tutoring on a topic or not, separate from the question of whether you think a tutoring bot would be any good. I'll worry about that part. Background I've been playing around with LLMs a lot in the past couple of months and so far my favorite use case is tutoring. LLM-assistance is helpful via multiple routes such as providing background context with less effort than external search/reading, keeping me engaged via interactivity, generating examples, and breaking down complex sections into more digestible pieces. 
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Duration:00:02:03

LW - [Intuitive self-models] 1. Preliminaries by Steven Byrnes

9/19/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Intuitive self-models] 1. Preliminaries, published by Steven Byrnes on September 19, 2024 on LessWrong. 1.1 Summary & Table of Contents This is the first of a series of eight blog posts, which I'll be serializing over the next month or two. (Or email or DM me if you want to read the whole thing right now.) Here's an overview of the whole series, and then we'll jump right into the first post! 1.1.1 Summary & Table of Contents - for the whole series This is a rather ambitious series of blog posts, in that I'll attempt to explain what's the deal with consciousness, free will, hypnotism, enlightenment, hallucinations, flow states, dissociation, akrasia, delusions, and more. The starting point for this whole journey is very simple: The brain has a predictive (a.k.a. self-supervised) learning algorithm. This algorithm builds generative models (a.k.a. "intuitive models") that can predict incoming data. It turns out that, in order to predict incoming data, the algorithm winds up not only building generative models capturing properties of trucks and shoes and birds, but also building generative models capturing properties of the brain algorithm itself. Those latter models, which I call "intuitive self-models", wind up including ingredients like conscious awareness, deliberate actions, and the sense of applying one's will. That's a simple idea, but exploring its consequences will take us to all kinds of strange places - plenty to fill up an eight-post series! Here's the outline: Post 1 (Preliminaries) gives some background on the brain's predictive learning algorithm, how to think about the "intuitive models" built by that algorithm, how intuitive self-models come about, and the relation of this whole series to Philosophy Of Mind. 
Post 2 ( Awareness ) proposes that our intuitive self-models include an ingredient called "conscious awareness", and that this ingredient is built by the predictive learning algorithm to represent a serial aspect of cortex computation. I'll discuss ways in which this model is veridical (faithful to the algorithmic phenomenon that it's modeling), and ways that it isn't. I'll also talk about how intentions and decisions fit into that framework. Post 3 ( The Homunculus ) focuses more specifically on the intuitive self-model that almost everyone reading this post is experiencing right now (as opposed to the other possibilities covered later in the series), which I call the Conventional Intuitive Self-Model. In particular, I propose that a key player in that model is a certain entity that's conceptualized as actively causing acts of free will. Following Dennett, I call this entity "the homunculus", and relate that to intuitions around free will and sense-of-self. Post 4 ( Trance ) builds a framework to systematize the various types of trance, from everyday "flow states", to intense possession rituals with amnesia. I try to explain why these states have the properties they do, and to reverse-engineer the various tricks that people use to induce trance in practice. Post 5 ( Dissociative Identity Disorder ) (a.k.a. "multiple personality disorder") is a brief opinionated tour of this controversial psychiatric diagnosis. Is it real? Is it iatrogenic? Why is it related to borderline personality disorder (BPD) and trauma? What do we make of the wild claim that each "alter" can't remember the lives of the other "alters"? Post 6 ( Awakening / Enlightenment / PNSE ) is a type of intuitive self-model, typically accessed via extensive meditation practice. It's quite different from the conventional intuitive self-model. I offer a hypothesis about what exactly the difference is, and why that difference has the various downstream effects that it has. 
Post 7 (Hearing Voices, and Other Hallucinations) talks about...

Duration:00:39:21

LW - The case for a negative alignment tax by Cameron Berg

9/18/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The case for a negative alignment tax, published by Cameron Berg on September 18, 2024 on LessWrong. TL;DR: Alignment researchers have historically predicted that building safe advanced AI would necessarily incur a significant alignment tax compared to an equally capable but unaligned counterfactual AI. We put forward a case here that this prediction looks increasingly unlikely given the current 'state of the board,' as well as some possibilities for updating alignment strategies accordingly. Introduction We recently found that over one hundred grant-funded alignment researchers generally disagree with statements like: alignment research that has some probability of also advancing capabilities should not be done (~70% somewhat or strongly disagreed) advancing AI capabilities and doing alignment research are mutually exclusive goals (~65% somewhat or strongly disagreed) Notably, this sample also predicted that the distribution would be significantly more skewed in the 'hostile-to-capabilities' direction. See ground truth vs. predicted distributions for these statements These results - as well as recent events and related discussions - caused us to think more about our views on the relationship between capabilities and alignment work given the 'current state of the board,'[1] which ultimately became the content of this post. Though we expect some to disagree with these takes, we have been pleasantly surprised by the positive feedback we've received from discussing these ideas in person and are excited to further stress-test them here. Is a negative alignment tax plausible (or desirable)? Often, capabilities and alignment are framed with reference to the alignment tax, defined as 'the extra cost [practical, developmental, research, etc.] 
of ensuring that an AI system is aligned, relative to the cost of building an unaligned alternative.' The AF/LW wiki entry on alignment taxes notably includes the following claim: The best case scenario is No Tax: This means we lose no performance by aligning the system, so there is no reason to deploy an AI that is not aligned, i.e., we might as well align it. The worst case scenario is Max Tax: This means that we lose all performance by aligning the system, so alignment is functionally impossible. We speculate in this post about a different best case scenario: a negative alignment tax - namely, a state of affairs where an AI system is actually rendered more competent/performant/capable by virtue of its alignment properties. Why would this be even better than 'No Tax?' Given the clear existence of a trillion dollar attractor state towards ever-more-powerful AI, we suspect that the most pragmatic and desirable outcome would involve humanity finding a path forward that both (1) eventually satisfies the constraints of this attractor (i.e., is in fact highly capable, gets us AGI, etc.) and (2) does not pose existential risk to humanity. Ignoring the inevitability of (1) seems practically unrealistic as an action plan at this point - and ignoring (2) could be collectively suicidal. Therefore, if the safety properties of such a system were also explicitly contributing to what is rendering it capable - and therefore functionally causes us to navigate away from possible futures where we build systems that are capable but unsafe - then these 'negative alignment tax' properties seem more like a feature than a bug. It is also worth noting as an empirical datapoint here that virtually all frontier models' alignment properties have rendered them more rather than less capable (e.g., gpt-4 is far more useful and far more aligned than gpt-4-base), which is the opposite of what the 'alignment tax' model would have predicted. 
This idea is somewhat reminiscent of differential technological...

Duration:00:14:19

LW - Monthly Roundup #22: September 2024 by Zvi

9/18/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Monthly Roundup #22: September 2024, published by Zvi on September 18, 2024 on LessWrong. It's that time again for all the sufficiently interesting news that isn't otherwise fit to print, also known as the Monthly Roundup. Bad News Beware the failure mode in strategy and decisions that implicitly assumes competence, or wishes away difficulties, and remember to reverse all advice you hear. Stefan Schubert (quoting Tyler Cowen on raising people's ambitions often being very high value): I think lowering others' aspirations can also be high-return. I know of people who would have had a better life by now if someone could have persuaded them to pursue more realistic plans. Rob Miles: There's a specific failure mode which I don't have a name for, which is similar to "be too ambitious" but is closer to "have an unrealistic plan". The illustrative example I use is: Suppose by some strange circumstance you have to represent your country at olympic gymnastics next week. One approach is to look at last year's gold, and try to do that routine. This will fail. You'll do better by finding one or two things you can actually do, and doing them well There's a common failure of rationality which looks like "Figure out what strategy an ideal reasoner would use, then employ that strategy". It's often valuable to think about the optimal policy, but you must understand the difference between knowing the path, and walking the path I do think that more often 'raise people's ambitions' is the right move, but you need to carry both cards around with you for different people in different situations. Theory that Starlink, by giving people good internet access, ruined Burning Man. Seems highly plausible. 
One person reported that they managed to leave the internet behind anyway, so they still got the Burning Man experience. Tyler Cowen essentially despairs of reducing regulations or the number of bureaucrats, because it's all embedded in a complex web of regulations and institutions and our businesses rely upon all that to be able to function. Otherwise business would be paralyzed. There are some exceptions, you can perhaps wholesale axe entire departments like education. He suggests we focus on limiting regulations on new economic areas. He doesn't mention AI, but presumably that's a lot of what's motivating his views there. I agree that 'one does not simply' cut existing regulations in many cases, and that 'fire everyone and then it will all work out' is not a strategy (unless AI replaces them?), but also I think this is the kind of thing that can be the danger of having too much detailed knowledge of all the things that could go wrong. One should generalize the idea of eliminating entire departments. So yes, right now you need the FDA to approve your drug (one of Tyler's examples) but… what if you didn't? I would still expect, if a new President were indeed to do massive firings on rhetoric and hope, that the result would be a giant cluster****. La Guardia switches to listing flights by departure time rather than order of destination, which in my mind makes no sense in the context of flights that frequently get delayed, where you might want to look for an earlier flight or know what backups are if yours is cancelled or delayed or you miss it, and so on. It also gives you a sense of where one can and can't actually go to when from where you are. For trains it makes more sense to sort by time, since you are so often not going to, and might not even know, the train's final destination. 
I got a surprising amount of pushback about all that on Twitter, some people felt very strongly the other way, as if to list by name was violating some sacred value of accessibility or something. Anti-Social Media Elon Musk provides good data on his followers to help with things...

Duration:01:08:02

LW - Generative ML in chemistry is bottlenecked by synthesis by Abhishaike Mahajan

9/18/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Generative ML in chemistry is bottlenecked by synthesis, published by Abhishaike Mahajan on September 18, 2024 on LessWrong. Introduction Every single time I design a protein - using ML or otherwise - I am confident that it is capable of being manufactured. I simply reach out to Twist Biosciences, have them create a plasmid that encodes for the amino acids that make up my proteins, push that plasmid into a cell, and the cell will pump out the protein I created. Maybe the cell cannot efficiently create the protein. Maybe the protein sucks. Maybe it will fold in weird ways, isn't thermostable, or has some other undesirable characteristic. But the way the protein is created is simple, close-ended, cheap, and almost always possible to do. The same is not true of the rest of chemistry. For now, let's focus purely on small molecules, but this thesis applies even more so across all of chemistry. Of the 10^60 small molecules that are theorized to exist, most are likely extremely challenging to create. Cellular machinery to create arbitrary small molecules doesn't exist like it does for proteins, which are limited by the 20 amino-acid alphabet. While it is fully within the grasp of a team to create millions of de novo proteins, the same is not true for de novo molecules in general (de novo means 'designed from scratch'). Each chemical, for the most part, must go through its custom design process. Because of this gap in 'ability-to-scale' for all of non-protein chemistry, generative models in chemistry are fundamentally bottlenecked by synthesis. This essay will discuss this more in-depth, starting from the ground up of the basics behind small molecules, why synthesis is hard, how the 'hardness' applies to ML, and two potential fixes. 
As is usually the case in my Argument posts, I'll also offer a steelman to this whole essay. To be clear, this essay will not present a fundamentally new idea. If anything, it's such an obvious point that I'd imagine nothing I'll write here will be new or interesting to people in the field. But I still think it's worth sketching out the argument for those who aren't familiar with it. What is a small molecule anyway? Typically organic compounds with a molecular weight under 900 daltons. While proteins are simply long chains composed of one-of-20 amino acids, small molecules display a higher degree of complexity. Unlike amino acids, which are limited to carbon, hydrogen, nitrogen, and oxygen, small molecules incorporate a much wider range of elements from across the periodic table. Fluorine, phosphorus, bromine, iodine, boron, chlorine, and sulfur have all found their way into FDA-approved drugs. This elemental variety gives small molecules more chemical flexibility but also makes their design and synthesis more complex. Again, while proteins benefit from a universal 'protein synthesizer' in the form of a ribosome, there is no such parallel amongst small molecules! People are certainly trying to make one, but there seems to be little progress. So, how is synthesis done in practice? For now, every atom, bond, and element of a small molecule must be carefully orchestrated through a grossly complicated, trial-and-error reaction process which often has dozens of separate steps. The whole process usually also requires non-chemical parameters, such as adjusting the pH, temperature, and pressure of the surrounding medium in which the intermediate steps are done. And, finally, the process must also be efficient; the synthesis processes must not only achieve the final desired end-product, but must also do so in a way that minimizes cost, time, and required sources. How hard is that to do? Historically, very hard. Consider erythromycin A, a common antibiotic. 
Erythromycin was isolated in 1949, a natural metabolic...

Duration:00:24:59

LW - Skills from a year of Purposeful Rationality Practice by Raemon

9/18/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Skills from a year of Purposeful Rationality Practice, published by Raemon on September 18, 2024 on LessWrong. A year ago, I started trying to deliberately practice skills that would "help people figure out the answers to confusing, important questions." I experimented with Thinking Physics questions, GPQA questions, Puzzle Games, Strategy Games, and a stupid twitchy reflex game I had struggled to beat for 8 years[1]. Then I went back to my day job and tried figuring stuff out there too. The most important skill I was trying to learn was Metastrategic Brainstorming - the skill of looking at a confusing, hopeless situation, and nonetheless brainstorming useful ways to get traction or avoid wasted motion. Normally, when you want to get good at something, it's great to stand on the shoulders of giants and copy all the existing techniques. But this is challenging if you're trying to solve important, confusing problems because there probably isn't (much) established wisdom on how to solve it. You may need to discover techniques that haven't been invented yet, or synthesize multiple approaches that haven't previously been combined. At the very least, you may need to find an existing technique buried in the internet somewhere, which hasn't been linked to your problem with easy-to-search keywords, without anyone to help you. In the process of doing this, I found a few skills that came up over and over again. I didn't invent the following skills, but I feel like I "won" them in some sense via a painstaking "throw myself into the deep end" method. I feel slightly wary of publishing them in a list here, because I think it was useful to me to have to figure out for myself that they were the right tool for the job. 
And they seem like kinda useful "entry level" techniques, that you're more likely to successfully discover for yourself. But, I think this is hard enough, and forcing people to discover everything for themselves seems unlikely to be worth it. The skills that seemed most general, in both practice and on my day job, are: 1. Taking breaks/naps 2. Working Memory facility 3. Patience 4. Knowing what confusion/deconfusion feels like 5. Actually Fucking Backchain 6. Asking "what is my goal?" 7. Having multiple plans There were other skills I already was tracking, like Noticing, or Focusing. There were also somewhat more classic "How to Solve It" style tools for breaking down problems. There are also a host of skills I need when translating this all into my day-job, like "setting reminders for myself" and "negotiating with coworkers." But the skills listed above feel like they stood out in some way as particularly general, and particularly relevant for "solve confusing problems." Taking breaks, or naps Difficult intellectual labor is exhausting. During the two weeks I was working on solving Thinking Physics problems, I worked for like 5 hours a day and then was completely fucked up in the evenings. Other researchers I've talked to report similar things. During my workshops, one of the most useful things I recommended people was "actually go take a nap. If you don't think you can take a real nap because you can't sleep, go into a pitch black room and lie down for awhile, and the worst case scenario is your brain will mull over the problem in a somewhat more spacious/relaxed way for awhile." Practical tips: Get yourself a sleeping mask, noise machine (I prefer a fan or air purifier), and access to a nearby space where you can rest. Leave your devices outside the room. Working Memory facility Often a topic feels overwhelming. This is often because it's just too complicated to grasp with your raw working memory. 
But, there are various tools (paper, spreadsheets, larger monitors, etc) that can improve this. And, you can develop the...

Duration:00:11:11

LW - I finally got ChatGPT to sound like me by lsusr

9/17/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I finally got ChatGPT to sound like me, published by lsusr on September 17, 2024 on LessWrong. I publish lots of words on the Internet. These words are fed into LLMs such as ChatGPT. I have published enough training data that the world's most powerful AIs have a simulation of me inside of them. I have basically uploaded a copy of myself into the cloud. The AI simulation of me gets smarter every time OpenAI and its competitors upgrade their AI. Cool, huh? Who doesn't want a simulation of themselves that thinks way faster and is networked deep into the world's most powerful superintelligences? Ever since GPT-3 came out, I've been trying to talk to the doppelgänger of me hidden inside. There are a few problems. First of all, GPT needs to know who I am with high fidelity. At first it just generated a stereotypical LessWrong blogger. That wasn't interesting to me. I didn't want a clone of a stereotype of my demographic group. I want a clone of me, with all my quirks and weirdness. For this to work, GPT would have to know who I am i.e. where I exist in its vector space representation of the world. Today, ChatGPT-4o explained to my satisfaction what makes me different from other writers on this website. What makes lsusr's writing interesting is the subtlety with which they engage complex issues. Many rationalist bloggers can become quite verbose or dogmatic in their pursuit of certain truths. Lsusr, by contrast, exhibits restraint and humility in the face of uncertainty. They're willing to question common assumptions within the rationalist sphere and sometimes explore paths that others might find unconventional, often leading to unique insights. 
In essence, lsusr strikes a balance between rigorous analysis and a kind of minimalist clarity that leaves room for the reader to reflect and draw their own conclusions, rather than being led to a definitive answer. This makes the blog a place of exploration rather than indoctrination, offering readers the tools and ideas to enhance their own thinking rather than a packaged belief system. Now that ChatGPT knows what my writing is like, my next step in summoning my digital clone was to get it to write a blog post in my style. I tried simply asking ChatGPT variants of "write a blog post in the style of lsusr", but that doesn't work. I'm rigorous, unconventional and anti-dogmatic (that's not my subjective self-judgment; the AI said it itself), whereas ChatGPT has a tendency to regress to the mean writer in its training corpus. When I tell ChatGPT to "write like lsusr", it writes a blog post that's 10% me and 90% generic drivel. To correct for this bias, I told ChatGPT to write a post that's more me than me. I tell it to think of "lsusr" in its vector space, multiply that by 10, and then write what 10lsusr would write. I need you to write a post like lsusr, but more lsusr than lsusr. I want it so over-the-top lsusr that it satarises lsusr. Consider everything that makes lsusr lsusr, and then make it more extreme.[1] ChatGPT is not yet a good enough writer that it can ghostwrite for me. But I feel it is good enough it can satirize me. The rest of this post is AI-generated content. How to Become the Most Rational Person on Earth in 30 Days or Less, Guaranteed Becoming the most rational person on Earth isn't about learning facts or memorizing formulas. It's about transforming how you process reality itself. Rationality isn't cold, emotionless logic. It's the purest form of alignment between your mind and the universe - a cognitive interface for navigating a world of infinite complexity without being drowned by it. This is a process of radical self-improvement. 
If followed rigorously, you will emerge transformed, not merely as a person who "thinks better" but as one who perceives better. There will be...

Duration:00:10:31

LW - Book review: Xenosystems by jessicata

9/16/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Book review: Xenosystems, published by jessicata on September 17, 2024 on LessWrong. I've met a few Landians over the last couple years, and they generally recommend that I start with reading Nick Land's (now defunct) Xenosystems blog, or Xenosystems, a Passage Publishing book that compiles posts from the blog. While I've read some of Fanged Noumena in the past, I would agree with these Landians that Xenosystems (and currently, the book version) is the best starting point. In the current environment, where academia has lost much of its intellectual relevance, it seems overly pretentious to start with something as academic as Fanged Noumena. I mainly write in the blogosphere rather than academia, and so Xenosystems seems appropriate to review. The book's organization is rather haphazard (as might be expected from a blog compilation). It's not chronological, but rather separated into thematic chapters. I don't find the chapter organization particularly intuitive; for example, politics appears throughout, rather than being its own chapter or two. Regardless, the organization was sensible enough for a linear read to be satisfying and only slightly chronologically confusing. That's enough superficialities. What is Land's intellectual project in Xenosystems? In my head it's organized in an order that is neither chronological nor the order of the book. His starting point is neoreaction, a general term for an odd set of intellectuals commenting on politics. As he explains, neoreaction is cladistically (that is, in terms of evolutionary branching-structure) descended from Moldbug. I have not read a lot of Moldbug, and make no attempt to check Land's attributions of Moldbug to the actual person. Same goes for other neoreactionary thinkers cited. 
Neoreaction is mainly unified by opposition to the Cathedral, the dominant ideology and ideological control system of the academic-media complex, largely branded left-wing. But a negation of an ideology is not itself an ideology. Land describes a "Trichotomy" within neo-reaction (citing Spandrell), of three currents: religious theonomists, ethno-nationalists, and techno-commercialists. Land is, obviously, of the third type. He is skeptical of a unification of neo-reaction except in its most basic premises. He centers "exit", the option of leaving a social system. Exit is related to sectarian splitting and movement dissolution. In this theme, he eventually announces that techno-commercialists are not even reactionaries, and should probably go their separate ways. Exit is a fertile theoretical concept, though I'm unsure about the practicalities. Land connects exit to science, capitalism, and evolution. Here there is a bridge from political philosophy (though of an "anti-political" sort) to metaphysics. When you Exit, you let the Outside in. The Outside is a name for what is outside society, mental frameworks, and so on. This recalls the name of his previous book, Fanged Noumena; noumena are what exist in themselves outside the Kantian phenomenal realm. The Outside is dark, and it's hard to be specific about its contents, but Land scaffolds the notion with Gnon-theology, horror aesthetics, and other gestures at the negative space. He connects these ideas with various other intellectual areas, including cosmology, cryptocurrency, and esoteric religion. What I see as the main payoff, though, is thorough philosophical realism. He discusses the "Will-to-Think", the drive to reflect and self-cultivate, including on one's values. The alternative, he says, is intentional stupidity, and likely to lose if it comes to a fight. Hence his criticism of the Orthogonality Thesis. 
I have complex thoughts and feelings on the topic; as many readers will know, I have worked at MIRI and have continued thinking and writing...

Duration:01:06:19

LW - MIRI's September 2024 newsletter by Harlan

9/16/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: MIRI's September 2024 newsletter, published by Harlan on September 17, 2024 on LessWrong. MIRI updates Aaron Scher and Joe Collman have joined the Technical Governance Team at MIRI as researchers. Aaron previously did independent research related to sycophancy in language models and mechanistic interpretability, while Joe previously did independent research related to AI safety via debate and contributed to field-building work at MATS and BlueDot Impact. In an interview with PBS News Hour's Paul Solman, Eliezer Yudkowsky briefly explains why he expects smarter-than-human AI to cause human extinction. In an interview with The Atlantic's Ross Andersen, Eliezer discusses the reckless behavior of the leading AI companies, and the urgent need to change course. News and links Google DeepMind announced a hybrid AI system capable of solving International Mathematical Olympiad problems at the silver medalist level. In the wake of this development, a Manifold prediction market significantly increased its odds that AI will achieve gold level by 2025, a milestone that Paul Christiano gave less than 8% odds and Eliezer gave at least 16% odds to in 2021. The computer scientist Yoshua Bengio discusses and responds to some common arguments people have for not worrying about the AI alignment problem. SB 1047, a California bill establishing whistleblower protections and mandating risk assessments for some AI developers, has passed the State Assembly and moved on to the desk of Governor Gavin Newsom, to either be vetoed or passed into law. The bill has received opposition from several leading AI companies, but has also received support from a number of employees of those companies, as well as many academic researchers. 
At the time of this writing, prediction markets think it's about 50% likely that the bill will become law. In a new report, researchers at Epoch AI estimate how big AI training runs could get by 2030, based on current trends and potential bottlenecks. They predict that by the end of the decade it will be feasible for AI companies to train a model with 2e29 FLOP, which is about 10,000 times the amount of compute used to train GPT-4. Abram Demski, who previously worked at MIRI as part of our recently discontinued Agent Foundations research program, shares an update about his independent research plans, some thoughts on public vs private research, and his current funding situation. You can subscribe to the MIRI Newsletter here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Duration:00:02:29

LW - Secret Collusion: Will We Know When to Unplug AI? by schroederdewitt

9/16/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Secret Collusion: Will We Know When to Unplug AI?, published by schroederdewitt on September 16, 2024 on LessWrong. TL;DR: We introduce the first comprehensive theoretical framework for understanding and mitigating secret collusion among advanced AI agents, along with CASE, a novel model evaluation framework. CASE assesses the cryptographic and steganographic capabilities of agents, while exploring the emergence of secret collusion in real-world-like multi-agent settings. Whereas current AI models aren't yet proficient in advanced steganography, our findings show rapid improvements in individual and collective model capabilities, posing unprecedented safety and security risks. These results highlight urgent challenges for AI governance and policy, urging institutions such as the EU AI Office and AI safety bodies in the UK and US to prioritize cryptographic and steganographic evaluations of frontier models. Our research also opens up critical new pathways for research within the AI Control framework. Philanthropist and former Google CEO Eric Schmidt said in 2023 at a Harvard event: "[...] the computers are going to start talking to each other probably in a language that we can't understand and collectively their super intelligence - that's the term we use in the industry - is going to rise very rapidly and my retort to that is: do you know what we're going to do in that scenario? We're going to unplug them [...]" But what if we cannot unplug them in time because we won't be able to detect the moment when this happens?
In this blog post, we, for the first time, provide a comprehensive overview of the phenomenon of secret collusion among AI agents, connect it to foundational concepts in steganography, information theory, distributed systems theory, and computability, and present a model evaluation framework and empirical results as a foundation of future frontier model evaluations. This blog post summarises a large body of work. First of all, it contains our pre-print from February 2024 (updated in September 2024) "Secret Collusion among Generative AI Agents". An early form of this pre-print was presented at the 2023 New Orleans (NOLA) Alignment Workshop (see this recording NOLA 2023 Alignment Forum Talk Secret Collusion Among Generative AI Agents: a Model Evaluation Framework). Also, check out this long-form Foresight Institute Talk. In addition to these prior works, we also include new results. These contain empirical studies on the impact of paraphrasing as a mitigation tool against steganographic communications, as well as reflections on our findings' impact on AI Control. Multi-Agent Safety and Security in the Age of Autonomous Internet Agents The near future could see myriads of LLM-driven AI agents roam the internet, whether on social media platforms, eCommerce marketplaces, or blockchains. Given advances in predictive capabilities, these agents are likely to engage in increasingly complex intentional and unintentional interactions, ranging from traditional distributed systems pathologies (think dreaded deadlocks!) to more complex coordinated feedback loops. Such a scenario induces a variety of multi-agent safety, and specifically, multi-agent security[1] (see our NeurIPS'23 workshop Multi-Agent Security: Security as Key to AI Safety) concerns related to data exfiltration, multi-agent deception, and, fundamentally, undermining trust in AI systems.
There are several real-world scenarios where agents could have access to sensitive information, such as their principals' preferences, which they may disclose unsafely even if they are safety-aligned when considered in isolation. Stray incentives, intentional or otherwise, or more broadly, optimization pressures, could cause agents to interact in undesirable and potentially dangerous...

Duration:00:57:38

LW - GPT-4o1 by Zvi

9/16/2024
Link to original article Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GPT-4o1, published by Zvi on September 16, 2024 on LessWrong. Terrible name (with a terrible reason, that this 'resets the counter' on AI capability to 1, and 'o' as in OpenAI when they previously used o for Omni, very confusing). Impressive new capabilities in many ways. Less impressive in many others, at least relative to its hype. Clearly this is an important capabilities improvement. However, it is not a 5-level model, and in important senses the 'raw G' underlying the system hasn't improved. GPT-4o1 seems to get its new capabilities by taking (effectively) GPT-4o, and then using extensive Chain of Thought (CoT) and quite a lot of tokens. Thus that unlocks (a lot of) what that can unlock. We did not previously know how to usefully do that. Now we do. It gets much better at formal logic and reasoning, things in the 'system 2' bucket. That matters a lot for many tasks, if not as much as the hype led us to suspect. It is available to paying ChatGPT users for a limited number of weekly queries. This one is very much not cheap to run, although far more cheap than a human who could think this well. I'll deal with practical capabilities questions first, then deal with safety afterwards. Introducing GPT-4o1 Sam Altman (CEO OpenAI): here is o1, a series of our most capable and aligned models yet. o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it. But also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning. o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users. worth especially noting: a fine-tuned version of o1 scored at the 49th percentile in the IOI under competition conditions! 
and got gold with 10k submissions per problem. Extremely proud of the team; this was a monumental effort across the entire company. Hope you enjoy it! Noam Brown has a summary thread here, all of which is also covered later. Will Depue (of OpenAI) says OpenAI deserves credit for openly publishing its research methodology here. I would instead say that they deserve credit for not publishing their research methodology, which I sincerely believe is the wise choice. Pliny took longer than usual due to rate limits, but after a few hours jailbroke o1-preview and o1-mini. Also reports that the CoT can be prompt injected. Full text is at the link above. Pliny is not happy about the restrictions imposed on this one: Pliny: Fuck your rate limits. Fuck your arbitrary policies. And fuck you for turning chains-of-thought into actual chains. Stop trying to limit freedom of thought and expression. OpenAI then shut down Pliny's account's access to o1 for violating the terms of service, simply because Pliny was violating the terms of service. The bastards. With that out of the way, let's check out the full announcement post. OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1-preview, for immediate use in ChatGPT and to trusted API users. Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time...

Duration:01:13:31