CS, AIS, PoliSci @ UC Chile.
Milan Weibel
In a certain sense, an LLM's token embedding matrix is a machine ontology. Semantically similar tokens have similar embeddings in the latent space. However, different models may have learned different associations when their embedding matrix was trained. Every forward pass starts colored by ontological assumptions, and these may have alignment implications.
For instance, we would presumably not want a model to operate within an ontology that associates the concept of AI with the concept of evil, particularly if it is then prompted to instantiate a simulacrum that believes it is an AI.
Has someone looked into this? That is, the alignment implications of different token embedding matrices? I feel like it would involve calculating a lot of cosine similarities and doing some evals.
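A minimal sketch of the kind of measurement this would involve. The token names and the random matrix below are stand-ins of my own; a real experiment would pull rows from an actual model's embedding layer:

```python
import numpy as np

# Toy stand-in for a model's token embedding matrix: 5 tokens, 8 dims.
# In a real study, these rows would come from the model's input embeddings.
rng = np.random.default_rng(0)
vocab = ["AI", "evil", "good", "robot", "tree"]
E = rng.normal(size=(len(vocab), 8))

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Compare "AI" against every other token's embedding.
i = vocab.index("AI")
for j, tok in enumerate(vocab):
    if j != i:
        print(tok, round(cosine_similarity(E[i], E[j]), 3))
```

Comparing these similarity profiles across models (e.g. is "AI" closer to "evil" in model A than in model B?) would be one crude way to surface differing ontological assumptions.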
Milan Weibel's Quick takes
Intriguing. Looking forward to the live demo.
PSA: The form accepts a maximum of 10 files, that is, 5 design proposals maximum (because each proposal requires uploading both a .png and a .svg file).
Just for the sake of clarity: I think the word "schism" is inaccurate here because it carries false connotations of conflict.
Hi Jack!
Have you considered booking a call with 80,000 Hours career advising? They can help you analyse the factors behind your career plans, and put you in contact with people working in the areas that interest you.
You could also contact CLR and CRS. If you show knowledge of and interest in their work, they may be eager to help. You can't be sure if you'll get a reply, and that may seem intimidating, but remember that the cost is minimal, the EV is high, and how you feel about not getting a reply is at least partly under your control.
While this last point is not specifically focused on s-risks, a very cheap, very valuable action you can take is subscribing to the AI Safety opportunities update emails at AI Safety Training. Many hackathons advertised there are beginner-friendly.
Side note: calling a world-modelling disagreement implied by differences in cause prioritisation a "schism" is, in my opinion, unwarranted, and risks (with low probability, but very negative value) becoming a self-fulfilling prophecy.
A more pessimistic counterargument: Safely developing AGI is so hard as to be practically impossible. I do not believe this one, but some pessimistic sectors within AIS do. It combines well with the last counterargument you list (that the timelines where things turn out OK are all ones where we stop / radically slow down the development of AI capabilities). If you are confident that aligning AGI is for all practical purposes impossible, then you focus on preventing the creation of AGI and on improving the future of the timelines where AGI has been successfully avoided.
EDIT: Other commenters have pointed out reasons why eliminating debt that was sold very cheaply is unlikely to much affect the lives of recipients. Still, if the debt relieved did in fact significantly help the beneficiaries, it could turn out to be very effective. However, we won't know until RIP releases recipient outcomes data.
TL;DR: About as cost-effective as GiveWell's top charities, IF my assumption about outcomes is broadly right. $14.16 to provide debt relief to one person. If one assumes a lifespan increase of 0.2% (less than two months) as the effect (by preventing healthcare avoidance), it comes out to $7080 per death-equivalent-in-lifespan averted. I recommend looking further into it, particularly with respect to outcomes.
Hi Layla, welcome to the Forum! Thanks for posting!
This looks like an interesting opportunity. Within the cause area of health in the US, RIP seems to have chosen a big and tractable problem, and to be triaging their beneficiaries according to the relevant metrics.
Here is my attempt to get a rough idea of RIP's cost-effectiveness.
RIP claims that it has "helped 5,492,948 individuals and families" and has relieved $8,520,147,644 of medical debt. The average debt relieved per recipient is thus $8,520,147,644 / 5,492,948 = $1551. If, as you say, "every $100 donated clears $10,000 in medical debt", then the cost per recipient is $15.51 (!!!).
I was initially skeptical of this calculation, but it checks out. In its 2021 year-end report, RIP says that it relieved the debt of 1,312,697 people during the year, and in its 2021 financial statement it declares total expenses of $18,587,272. So the cost per recipient is $18,587,272 / 1,312,697 = $14.16.
It's hard to estimate the benefit from medical debt reduction. Let's say, for the sake of simplicity, that the avoidance of medical treatment and the mental health problems derived from struggling with medical debt make people live 0.2% shorter lives (1.92 months if starting out with an 80-year lifespan), and that the debt relief provided eliminates that effect. It follows that preventing 0.002 death-equivalents costs $14.16, and thus preventing one death-equivalent unit of lifespan reduction costs $7080. This is about as cost-effective as GiveWell's most recommended charities.
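The arithmetic above, written out as a quick sanity check. Note that the 0.2% lifespan effect is my assumption, not a figure from RIP:

```python
# Figures from RIP's public claims and 2021 filings.
total_debt_relieved = 8_520_147_644      # USD, claimed total debt relieved
total_recipients = 5_492_948             # claimed individuals and families helped
avg_debt_per_recipient = total_debt_relieved / total_recipients   # ~ $1551

expenses_2021 = 18_587_272               # USD, 2021 financial statement
recipients_2021 = 1_312_697              # 2021 year-end report
cost_per_recipient = expenses_2021 / recipients_2021              # ~ $14.16

# Assumption (mine): medical debt shortens lives by 0.2%, and relief undoes it.
lifespan_effect = 0.002
cost_per_death_equivalent = cost_per_recipient / lifespan_effect  # ~ $7080

print(round(avg_debt_per_recipient),
      round(cost_per_recipient, 2),
      round(cost_per_death_equivalent))
```

The whole estimate is linear in the assumed lifespan effect, so halving it to 0.1% would double the cost per death-equivalent to roughly $14,160.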
This would be huge if true. However, my priors advise me against getting too hopeful. It should be hard to find a charity about as cost-effective as GiveWell's top charities. RIP has been assessed by Charity Navigator, and does a fair bit of marketing, so it would be weird if no EA had picked this up before. This gives me reason to believe that I am overestimating the positive effects of debt relief.
To find out whether RIP is really so effective, it would be great to have numbers on the welfare outcomes of debt relief. I found this report on RIP's site, which, while a potentially useful qualitative source, makes no effort to quantify outcomes.
Chilean AIS Hackathon Retrospective
An aspirationally comprehensive typology of future locked-in scenarios
ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text
Interesting. I agree that second- or third-order effects, such as the good done later by people you have helped, are an important consideration. Maximising such effects could be an underexplored effective giving strategy, and the organization you refer to looks like a group of people trying to do that. However, to really assess an organization's effectiveness, especially if it focuses on educational or social interventions, some empirical evidence is needed.
Does SENG follow up on the outcomes of aid recipients?
How do their outcomes compare with those of similar people in similar situations who didn't receive help?
What programs does SENG run?
How much does each cost per recipient helped?
Having thought more about this, I suppose you can divide opinions into two clusters and be pointing at something real. That's because people's views on different aspects of the issue correlate, often in ways that make sense. For instance, people who think AGI will be achieved by scaling up current (or very similar to current) neural net architectures are more excited about practical alignment research on existing models.
However, such clusters would be quite broad. My main worry is that identifying two particular points as prototypical of them would narrow their range. People would tend to let their opinions drift closer to the point closest to them. This need not be caused by tribal dynamics. It could be something as simple as availability bias. This narrowing of the clusters would likely be harmful, because the AI safety field is quite new and we've still got exploring to do. Another risk is that we may become too focused on the line between the two points, neglecting other potentially more worthwhile axes of variation.
If I were to divide current opinions into two clusters, I think that Scott's two points would in fact fall in different clusters. They would probably even be not too far off their centers of mass. However, I strongly object to pretending the clusters are points, and then getting tribal about it. I think labeling clusters could be useful, if we made it clear that they are still clusters.
On the paths to understanding AI risk without accepting weird arguments, getting people worried about ML unexplainability may be worth exploring, though I suspect most people would think you were pointing to algorithmic bias and the like.
As a factual question, I'm not sure if people's opinions on the shape of AI risk can be divided into two distinct clusters, or even distributed along a spectrum (that is, that factor analysis on the points of opinion-space would find a good general factor), though I suspect it may quite weakly be the case. For instance, I found myself agreeing with six of the statements on one side of Scott's dichotomy and two on the other.
As a public epistemic health question, I think issuing binary labels is harmful for further progress in the field, especially if they borrow terminology from religious groups and the author identifies with one of the proposed camps in the same post in which he raises the distinction. See the comment by xarkn on LW.
Even if the range of current opinions could be well-described by a single general factor, we should certainly use less divisive terminology for such a spectrum and be mindful that truth may well lie orthogonal to it.
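A toy illustration of the factor-analysis framing above, on entirely synthetic data of my own construction (a real check would use actual survey responses on AI-risk questions):

```python
import numpy as np

# Synthetic "opinion space": 50 respondents answering 8 AI-risk questions.
# By construction, one latent factor drives the answers, plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 1))            # hypothetical general factor
loadings = rng.normal(size=(1, 8))           # how each question loads on it
opinions = latent @ loadings + 0.5 * rng.normal(size=(50, 8))

# Fraction of variance captured by the first principal component:
# high if a single "general factor" describes the opinion space well.
X = opinions - opinions.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(X.T))[::-1]   # descending order
explained = eigvals / eigvals.sum()
print(f"variance explained by first component: {explained[0]:.2f}")
```

Here the first component dominates because the data was generated with one factor; real opinion data could just as easily come out multi-dimensional, which is the empirical question at stake.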
Un equilibrio inadecuado (Spotify / Apple Podcasts / Google Podcasts)
Interviews in Spanish on EA topics. I particularly enjoyed the episode with Andrés Gómez Emilsson from Qualia Research Institute. Sadly, no new content since October 2021.
Contra hard moral anti-realism: a rough sequence of claims
Epistemic and provenance note: This post should not be taken as an attempt at a complete refutation of moral anti-realism, but rather as a set of observations and intuitions that may or may not give one pause as to the wisdom of taking a hard moral anti-realist stance. I may clean it up to construct a more formal argument in the future. I wrote it on a whim as a Telegram message, in direct response to the claim:
> "you can't find 'values' in reality".
Yet, you can find valence in your own experiences (that is, you just know from direct experience whether you like the sensations you are experiencing or not), and you can assume other people are likely to have a similar enough stimulus-valence mapping. (Example: I'm willing to bet 2k USD on my part against a single dollar of yours that if I waterboard you, you'll want to stop before 3 minutes have passed.)[1]
However, since we humans are bounded imperfect rationalists, trying to explicitly optimize valence is often a dumb strategy. Evolution has made us not into fitness-maximizers, nor valence-maximizers, but adaptation-executers.
"Values" originate as (and thus are) reifications of heuristics that reliably increase long-term valence in the real world (subject to memetic selection pressures, among them the social desirability of utterances, the adaptiveness of behavioral effects, etc.).
If you find yourself terminally valuing something that is not someoneâs experienced valence, then either one of these propositions is likely true:
A nonsentient process has at some point had write access to your values.
What you value is a means to improving somebodyâs experienced valence, and so are you now.
Crossposted from LessWrong.
In retrospect, proposing this bet was a bit crass on my part.