I have an app in development on my GitHub called aiRead, a text-based reading app that integrates a number of chatbot prompts to enable all sorts of interactive features with the text you're reading. It's unpolished, since I'm focusing on the prompt engineering and on figuring out how to work with it more effectively rather than on making it attractive for general consumption. If you'd like to check it out, here's the link; I'd be happy to answer questions if you find it confusing! It just requires the ability to run a Python script.
A couple other important ideas:
Ask the model to summarize AND compress the previous work every other prompt. This increases the amount of information that fits in its context window (see the sketch after this list).
Ask it to describe ideas in no more than 3 high-level concepts. Then select one and ask it to break it down into 3 sub-points, etc.
Start by asking it to break down your goal to verify it understands what you are trying to do before you ask it to execute. You can ask for positive and negative examples.
If you get a faulty reply, regenerate or edit your prompt rather than critiquing it with a follow-up prompt. Keep the context window as pure as possible.
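Here's a minimal sketch of the summarize-and-compress idea, assuming a hypothetical `call_model(messages)` helper that stands in for whichever chat API you're using (an illustration, not aiRead's actual code):

```python
def call_model(messages):
    """Hypothetical wrapper around your chat-completion API of choice."""
    raise NotImplementedError


def compress_and_continue(history, next_prompt):
    """Every other turn: ask for a tight summary of the work so far,
    then restart the conversation from that compressed summary."""
    summary = call_model(history + [{
        "role": "user",
        "content": ("Summarize and compress all of the work above as tightly "
                    "as possible without losing key decisions or open questions."),
    }])
    # Restarting from the summary frees up room in the context window.
    fresh_history = [{"role": "assistant", "content": summary}]
    reply = call_model(fresh_history + [{"role": "user", "content": next_prompt}])
    return fresh_history + [
        {"role": "user", "content": next_prompt},
        {"role": "assistant", "content": reply},
    ]
```

The point is that you trade a long, noisy transcript for a short distillation, leaving more of the window for new material.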
Honey-baked ham has 5g fat and 3g sugar per 3 oz. At ~28g per oz, that's about 6% fat and 4% sugar, so ice cream is about 5x more sugary and ~2x more fatty than honey-baked ham. In other words, for sugar and fat content, honey-drenched fat > ice cream > honey-baked ham. Honey-baked ham is therefore not a modern American equivalent to honey-drenched gazelle fat, a sentence I never thought I'd write but am glad I had the chance to write once in my life.
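If you want to check the arithmetic, here's a quick sketch (the ham figures are the ones quoted above; percentages are by weight):

```python
# Honey-baked ham: 5 g fat and 3 g sugar per 3 oz serving (1 oz ≈ 28.35 g)
fat_g, sugar_g, serving_g = 5, 3, 3 * 28.35
print(f"fat: {fat_g / serving_g:.1%}, sugar: {sugar_g / serving_g:.1%}")
# -> fat: 5.9%, sugar: 3.5%, i.e. roughly the ~6% and ~4% above
```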
Thank you for contributing more information.
I understand and appreciate the thinking behind each step in Ren's argument. However, the ultimate result is this:
I experienced “disabling”-level pain for a couple of hours, by choice and with the freedom to stop whenever I wanted. It was a horrible experience that made everything else seem not to matter at all...
A single laying hen experiences hundreds of hours of this level of pain during their lifespan, which lasts perhaps a year and a half—and there are as many laying hens alive at any one time as there are humans. How would I feel if every single human were experiencing hundreds of hours of disabling pain?
My main takeaway is that the breadth and variety of experience that arguably falls under the umbrella of “disabling pain” is enormous, and we can only have low-moderate confidence in animal welfare pain metrics. As a result, I am updating toward increased skepticism in high-level summaries of animal welfare research.
The impact of nest deprivation on laying hen welfare may still be among the most pressing animal welfare issues. But, if tractability were held constant, I might prefer to focus on alleviating physical pain among a smaller number of birds.
Also, to the disagree-voters: I'm genuinely curious about why you disagree! Were you already appropriately skeptical before? Do you think I'm being too skeptical? Why or why not?
I had more trouble understanding how nest deprivation could be equivalent to “**** me, make it stop. Like someone slicing into my leg with a hot, sharp live wire.” So I looked up the underpinnings of this metric in Ch. 6 of the book they build their analysis on (pp. 6-9 are the key material).
They base this on the fact that chickens pace, preen, and compete aggressively for nests when availability is limited, and will work as hard to push open heavy doors to access nests as they will to access food after 4-28 hours of food deprivation. Based on this, the authors categorize nest deprivation as a disabling experience that each hen endures for an average of about 45 minutes per day (roughly 400 hours over a year-and-a-half lifespan, which is presumably where the “hundreds of hours” figure above comes from).
This is a technically accurate definition, but I still had trouble intuiting this as equivalent to a daily experience of disabling physical pain equivalent to having your leg sliced open with a hot, sharp live wire.
Researchers are limited to showing that chickens exhibit distress during nest deprivation, or, in more sophisticated research, that they work as hard to access nest boxes as they do to access food after 4-28 hours of food deprivation.
I am suspicious of the claim that these methods are adequate to allow us to make comparisons of physical and emotional pain across species. This is especially true with the willingness-to-work metric they use to compare the severity of nest deprivation and starvation on chickens.
Willingness-to-work is probably mediated by energy. After starvation, chickens will be low-energy, so willingness-to-work probably underestimates their suffering. A starving person might want to do 100 pushups to access an all-you-can-eat buffet but be physically unable to do so. If he's also willing to do 100 pushups to join the football team, does that mean that keeping him off the team is as bad as starving him?
People show distressed behaviors in the absence of suffering. I bite my fingernails pretty severely; sometimes they even bleed. It's not motivated by severe anxiety in those moments, it's just force of habit. Chickens may be hardwired by evolution to work hard to access nests without necessarily suffering while they do so.
Our perception of how much distress a behavior signals is culturally specific, not to mention species-specific. I pace and walk around the neighborhood when I'm thinking hard. People get piercings and tattoos. People fight recreationally. We don't assume that people are experiencing intense emotional distress in the moments they choose to do these things. Why do we assume that about chickens?
I’ve spent too long writing this comment, so I’m going to just stop here.
I've used ChatGPT for writing landing pages for my own websites, and as you say, it does a “good enough” job. It's the linguistic equivalent of a house decorated in knick-knacks from Target. For whatever reason, we have a cultural expectation that websites need this material in order to look respectable, but it's not business-critical beyond that.
By contrast, software remains business-critical. One of the key points that’s being made again and again is that many business applications require extremely high levels of reliability. Traditional software and hardware engineering can accomplish that. For now, at least, large language models cannot, unless they are imitating existing high-reliability software solutions.
A large language model can provide me with reliable working code for an existing sorting algorithm (something like the sketch below), but when applications become large, dynamic, and integrated with the real world, it won't be possible to build a whole application off a short, simple prompt. Instead, the work is going to be about using both human- and AI-generated code to put these applications together more efficiently, debug them, improve the features, and so on.
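To make the contrast concrete, here's the kind of self-contained, well-specified routine an LLM can reliably produce; this generic merge sort is just my illustration, not code tied to any particular application:

```python
def merge_sort(items):
    """Standard merge sort: split the list, sort each half recursively, then merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```

The hard part isn't routines like this; it's everything around them: shifting requirements, integrations, and debugging once real-world data starts flowing through.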
This is one reason why I think that LLMs are unlikely to replace software engineers, even though they are replacing copy editors, and even though they can write code: SWEs create business-critical high-reliability products, while copy editors create non-critical low-reliability products, which LLMs are eminently suitable for.
IMO, the main potential power of a boycott is symbolic, and I think you only achieve that by eschewing LLMs entirely. Instead, we can use them to communicate, plan, and produce examples. As I see it, this needs to be a story about engaged and thoughtful users advocating for real responsibility with potentially dangerous tech, not panicky Luddites mounting a weak-looking protest.
Seems to me that we’ll only see a change in course from relentless profit-seeking LLM development if intermediate AIs start misbehaving—smart enough to seek power and fight against control, but dumb enough to be caught and switched off.
I think instead of a boycott, this is a time to practice empathic communication with the public now that the tech is on everybody’s radar and AI x-risk arguments are getting a respectability boost from folks like Ezra Klein.
A poster on LessWrong recently harvested a comment from a NY Times reader that talked about x-risk in a way that clearly resonated with the readership. Figuring out how to scale that up seems like a good task for an LLM. In this theory of change, we need to double down on our communication skills to steer the conversation in appropriate ways. And we’ll need LLMs to help us do that. A boycott takes us out of the conversation, so I don’t think that’s the right play.
This might be an especially good time to enter the field. Instead of having to compete with more experienced SWEs at writing code the old-fashioned way, you can be on a nearly level playing field when it comes to incorporating LLMs into your workflow. You'll still need to learn a traditional language, at least for now, but you'll be able to learn more quickly with the assistance of an LLM tutor. As the field increasingly adapts to a whole new way of writing code, you can learn along with everybody else.
Have you already done some searching for articles on the subject? There's a ton of content out there. What have you tried so far? What are you struggling with?
Interesting! Do you think that is a common view? And do you think that federal healthcare policy should be made by somehow tapping into commonsense moral intuitions? Or should a winning, even if unpopular, argument determine policy options?
Edit: perhaps we can value QALYs on the principle that we're unlikely to be able to accurately track every contributor to total ETHU in practice, but keeping people physically healthy is probably an important contributor to it. Physical health has positive externalities that go beyond subjective well-being, and therefore we should value it in setting healthcare policy.
This is getting into philosophical territory, so here’s a thought experiment. Let’s say you’d lost your legs. You had to choose between a $10 pill that instantly regrew your legs and restored your subjective well-being, and a $0 pill that only corrected any loss in subjective well-being from having lost your legs. Do you really choose the well-being only pill in this case?
WELLBYs are proposed in the doc you link as a measure specifically for non-health and non-pecuniary outcomes. QALYs take subjective well-being into account (through the psychological component of HRQoL) along with physical health metrics, so a shift to WELLBYs in this context just excludes the physical health component of QALYs when pricing physical health interventions.
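Put very roughly, and this is my simplified paraphrase of the standard definitions rather than anything taken from that doc:

$$\text{QALYs} = \sum_t q_t\,\Delta t, \qquad q_t \in [0,1] \text{ an HRQoL weight combining physical and psychological components}$$

$$\text{WELLBYs} = \Delta(\text{life satisfaction on a 0--10 scale}) \times \text{years}$$

Moving from the first to the second drops the physical-health term and keeps only the subjective one, which is exactly the exclusion I'm pointing at.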
If it makes you feel any better, 90-95% of bills are never passed into law.
We established a policy that established members, especially members of the executive, were to refrain from hitting on or sleeping with people in their first year at the society. This means that people get a chance to settle in and form friendships. And if an incident does occur, it's no longer a case of the word of an experienced member vs someone nobody knows; it's now your old friend Bob vs your new friend Alice. Alice is more likely to be believed, and more likely to actually tell people about the incident: the newcomer will often just leave, assuming that misconduct is the norm.
Can you clarify how this worked in a little more detail? I understand the spirit of the policy, and it seems good. What happened if a newcomer hit on an old-timer, and what were the consequences if an old-timer hit on a newcomer? Or was this more of an honor code and cultural norm than a formal rules-and-consequences approach?
It’s important to keep in mind that while money laundering is typically carried out by profit-seeking criminals who take advantage of complex financial transactions to hide their illegal activities, GoF research is not driven by financial gain. Therefore, we need to consider the unique nature of GoF research when assessing the need for regulation.
It’s not just a matter of how much regulation is in place, but also about finding a balance between the pressures to engage in the research and a regulatory framework that effectively manages any potential risks. If there’s an inadequate regulatory apparatus in place relative to the pressures to participate, then the field is “underregulated.” Conversely, if there’s too much regulation, the field may be at risk of becoming “overregulated.”
Given the significant risks associated with GoF research, it requires a high level of regulation compared to other public service research areas that have similarly limited pressures to participate. However, because profit is not a driving force, the field can only tolerate a certain amount of regulation before participation becomes difficult.
Rather than focusing on increasing regulation dramatically or maintaining the status quo, we should look to refine and improve regulation for GoF research. While some scope exists to tighten regulations, excessive regulation could stifle the field altogether, which may or may not be desirable. If we wish the field to continue while enhancing the risk-benefit ratio, our focus should be on regulating the field proportionately to the pressures to participate.
It’s time to shift the discussion from “how regulated is the field” to “how regulated is the field relative to the pressures to participate.” By doing so, we can strike a balance between promoting the field’s progress and ensuring appropriate risk management.
That’s a helpful reframing, thank you. I think there is still a disconnect between the two cases, however. As money laundering is a crime, companies have a relatively simple task before them: to identify and eliminate money laundering.
By contrast, GoF research is not a crime, and the objective, from a “responsibly pro-GoF” point of view, is to improve the risk/reward ratio to an acceptable level. A company would likely be highly conservative in making these judgments, as it would capture none of the benefits of successful and informative GoF research but would be punished for allowing overly risky or failed GoF research to go forward. In other words, companies would likely refuse to sell to GoF researchers entirely in order to minimize or eliminate their risk.
The problem is even more acute if the work of evaluating GoF research were foisted onto companies. Scientists might be motivated by curiosity, altruism, or a desire for scientific credit, so there is at least some reward to be had even if GoF research were much more stringently regulated. By contrast, regulating companies in the manner you propose would come with no incentive whatsoever for them to sell to GoF researchers, thus effectively banning the research.
This is a perfectly reasonable point to bring up, and I agree that we should critically consider whether or not policy and regulation in the field is adequate. I want to emphasize some ways that high-risk biological research differs from finance, nuclear weapons, and money laundering.
First, people don't do gain-of-function research (or whatever we ought to call it) for profit, so imposing gigantic fines, the threat of jail time, and constant severe scrutiny would be tantamount to banning it outright. By contrast, private companies are pursuing profits when they build nuclear weapons. Medicine is, of course, heavily regulated, and once again it is the profit motive that allows the industry to thrive even in such a heavily regulated context.
Soldiers operating and maintaining nuclear weapons have given the military permission to exert extremely intrusive control over their activities. Some of the best and brightest scientists worked for the military as an act of patriotic service to build the atomic bomb during WWII. However, the Manhattan Project was aimed at a specific engineering outcome, whereas GoF research would be an ongoing effort with no “definition of done,” and it might be hard to convince an adequate number of high-quality scientists to sign up for such strict controls across their entire careers.
Money laundering is a crime, so it is not “regulated” but policed. Nobody but terrorists would do gain-of-function research if it were illegal.
For a person who'd like to see gain-of-function research banned, any move to regulate it and punish violations would be a step in the right direction. However, those who'd like to enforce responsible behavior, perhaps with regulations on par with those you describe, have to explain how they'd motivate already-beleaguered scientists to do GoF research when the proposal amounts to “even more stick, still no carrot.”
I’m curious to know whether and to what extent we’ve considered ways to reward basic science researchers for making pandemic-mitigating discoveries in a public health context. Is there a way we can reward people for achieving the maximum public health benefit with the minimum risk in their research?
I’m going to be exiting the online EA community for an indefinite period of time. Anyone who’d like to keep in touch is free to PM me and I’ll provide my personal contact info when I check my messages occasionally. Best wishes.
Are you sure that virologists didn’t write such OPs?
My understanding is that in the US, they actually studied these questions hard and knew about things like airborne transmission and asymptomatic spread pretty early on, but were suppressed by the Trump administration. That doesn't excuse them (they ought to have grown a spine!), but it's important to recognize the cause of failure accurately so that we can work on the right problem.