PSIRPEHTA refers to the aggregate ordinary revealed preferences of individual actors, who the AIs will be aligned to in order to make those humans richer, i.e. their preferences as revealed by their actions, such as what they spend their income on, NOT what they think is “morally correct”. For example, according to “human values” it might be wrong to eat meat, because maybe if humans reflected long enough they’d express the conclusion that it’s wrong to hurt animals. But from the perspective of PSIRPEHTA, eating meat is generally acceptable, and empirically there’s little pressure for people to “reflect” on their values and change them.
EDIT: I guess I’d think of human values as what people would sincerely and directly endorse without being further influenced first (although maybe just asking them makes them take a position if they didn’t have one before, e.g. if they’ve never thought much about the ethics of eating meat).
I think you’re overstating the differences between revealed and endorsed preferences, including moral/human values, here. Probably only a small share of the population thinks eating meat is wrong or bad, and most probably think it’s okay. Even if people generally would find it wrong or bad after reflecting long enough (I’m not sure they actually would), that doesn’t reflect their actual values now. Actual human values do not generally find eating meat wrong.
To be clear, you can still complain that humans’ actual/endorsed values are also far from ideal and maybe not worth aligning with, e.g. because people don’t care enough about nonhuman animals or helping others. Do people care more about animals and helping others than an unaligned AI would, in expectation, though? Honestly, I’m not entirely sure. Humans may care about animal welfare somewhat, but they also specifically want to exploit animals in large part because of their values, specifically food-related taste, culture, traditions and habit. Maybe people will also want to specifically exploit artificial moral patients for their own entertainment, curiosity or scientific research on them, not just because the artificial moral patients are generically useful, e.g. for acquiring resources and power and enacting preferences (which an unaligned AI could be prone to).
Here are some other examples of the influence of human moral values on companies. These are all, of course, revealed preferences, but my point is that revealed preferences can importantly reflect endorsed moral values.
People influence companies in part on the basis of what they think is right through demand, boycotts, law, regulation and other political pressure.
Companies, for the most part, can’t just go around directly murdering people (companies can still harm people, e.g. through misinformation on the health risks of their products, or because people don’t care enough about the harms). (Maybe this is largely for selfish reasons; people don’t want to be killed themselves, and there’s a slippery slope if you allow exceptions.)
GPT has content policies that reflect people’s political/moral views. Social media companies have use and content policies and have kicked off various users for harassment, racism, or other things that are politically unpopular, at least among a large share of users or advertisers (which also reflect consumers). This seems pretty standard.
Many companies have boycotted Russia since the invasion of Ukraine. Many companies have also committed to sourcing only cage-free eggs after corporate outreach and campaigns, despite cage-free egg consumption being low.
X (Twitter)’s policies on hate speech have changed under Musk, presumably primarily because of his views. That seems to have cost X users and advertisers, but X is still around and popular, so it also shows that some potentially important decisions about how a technology is used are largely in the hands of the company and its leadership, not just driven by profit.
I’d likewise guess it actually makes a difference that the biggest AI labs are (I would assume) led and staffed primarily by liberals. They can push their own views onto their AI even at the cost of some profit and market share. And some things may have minimal near-term consequences for demand or profit, but could be important for the far future. If a company decides to make its AI object more to various forms of mistreatment of animals or artificial consciousness, will this really cost it tons of profit and market share? And it could depend on the markets the AI is primarily used in, e.g. this would matter even less for an AI that brings in profit primarily through trading stocks.
It’s also often hard to say how much something affects a company’s profits.