Doesn't look like it
MathiasKB
This is what my frontpage looks like
Naively, is there a case for using the average of the two?
Two ideas off the top of my head
Distribution of charitable goods (such as insecticide nets or cash transfers) and its effect on economic growth of the region/country
Proportion of people identifying as flexitarian/vegetarian/vegan and the effects on sales of plant-based products.
EDIT: Someone on LessWrong linked a great report by Epoch which tries to answer exactly this.
With the release of OpenAI o1, I want to ask a question I've been wondering about for a few months.
Like the Chinchilla paper, which estimated the optimal split of a compute budget between model size and training data, are there any similar estimates for the optimal ratio of compute to spend on inference vs training? In the release they show this chart:
The chart gets somewhat at what I want to know, but doesn't answer it completely. How much additional inference compute would a 1e25 o1-like model need to perform as well as a one-shot 1e26 model?
Additionally, for some x number of queries, what is the optimal ratio of compute to spend on training versus inference? How does that change for different values of x?
Are there any public attempts at estimating this stuff? If so, where can I read about it?
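To make the question concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration, not something taken from the o1 release or the Epoch report: it supposes that a model trained with compute `t` can match a reference model's performance by spending `c0 * (t0 / t) ** alpha` compute per query, where `t0`, `c0` and `alpha` are hypothetical parameters. Under that assumed trade-off, the total compute for `x` queries is `t + x * c_inf(t)`, which can be minimised in closed form.

```python
# Toy model only: the trade-off curve and all default numbers are hypothetical.

def inference_compute_per_query(t, t0=1e26, c0=1e15, alpha=1.0):
    """Assumed per-query inference compute needed for a model trained with
    compute t to match a reference model trained with compute t0."""
    return c0 * (t0 / t) ** alpha


def total_compute(t, x, t0=1e26, c0=1e15, alpha=1.0):
    """Training compute plus inference compute summed over x queries."""
    return t + x * inference_compute_per_query(t, t0, c0, alpha)


def optimal_training_compute(x, t0=1e26, c0=1e15, alpha=1.0):
    """Training compute that minimises total_compute for x queries.

    Setting d/dt [t + x * c0 * (t0 / t) ** alpha] = 0 gives
    t* = (alpha * x * c0 * t0 ** alpha) ** (1 / (alpha + 1)).
    """
    return (alpha * x * c0 * t0 ** alpha) ** (1.0 / (alpha + 1.0))


if __name__ == "__main__":
    for x in (1e6, 1e9, 1e12):
        t_star = optimal_training_compute(x)
        inference_total = x * inference_compute_per_query(t_star)
        print(f"x={x:.0e}  train={t_star:.2e}  inference={inference_total:.2e}")
```

Under this assumed functional form, the optimal training compute grows with `x`, while the ratio of training spend to total inference spend stays fixed at `alpha`. A real answer would depend on what the empirical trade-off curve actually looks like.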
Thanks for all of the hard work you put into developing and maintaining it!
Thanks, this was quite informative. Would love to read the full report, but $1000 is a bit steep!
Without having read the letter yet, why do you find it questionable?
Good question. I'm actually not sure how I get it in my email; I can't find it on the website either.
edit: I think it's through the forecasting newsletter
I can highly recommend following Sentinel's weekly minutes, a weekly update from superforecasters on the likelihood of any events which could plausibly cause a worldwide catastrophe.
It's perhaps the weekly newsletter I look forward to the most at this point. Read previous issues here:
Jeff, your notes on NAO are fascinating to read! I have nothing to add other than that I hope you keep posting them
Hi Ian,
Thanks for the question! I've been meaning to write down my thoughts on this for a while, so here is a longer perspective:
In 2015 USAID teamed up with GiveWell to cash-benchmark one of its programmes. The evidence came back showing that cash transfers outperformed the programme on every metric. What gets brought up less often is that the programme got its funding renewed shortly afterwards anyways! The cash benchmark alone was not sufficient; you also need a policy requiring that programmes which perform worse than cash be wound down.
This is a sentiment I'm fully behind. But what exactly that policy should look like is where it gets tricky.
How should the ministry cash-benchmark a music festival in Mali?[1] What is the cash benchmark for a programme that monitors the Senegalese election to ensure it is fair? If the cash benchmark should only apply to certain types of programming amenable to cash comparisons, such as global health, how will that shift funding?
I worry that instituting a selective high bar will move funding away from broadly cost-effective areas which can be benchmarked against cash, to broadly ineffective areas which can't easily be benchmarked against cash.
But even within areas amenable to cash-benchmarking, it's unclear what the policy should look like. How should the ministry cash-benchmark its funding to a large multilateral, which will go on to fund a thousand programmes across the world?
The answer many arrive at is: "Clearly we need to move from demanding literal cash arms to just making estimates of how impactful programmes and organizations are compared to cash transfers. That way we still get the nice hurdle rate that programmes must be compared against, which is what we were really after anyways."
But that development ministries should systematically estimate and compare the impact of projects is what development economists have been shouting for decades!
To an extent, the ministry's lack of systematic measurement and comparison is a feature, not a bug. Almost any instantiation of cash-benchmarking removes wriggle room to fund projects which are valuable for reasons you didn't want to state out loud. From a minister's perspective, cash-benchmarking doesn't solve any problems, it creates one!
[1] This is not a facetious example, but a real project funded by the Norwegian government.
As a side note, one thing I find amusing is just how much it sucks to announce your org's shutdown after Maternal Health Initiative set the bar so ridiculously high.
Even at shutting down they have us beat!
Any ideas for what we can do to improve it?
The whole Manifund debacle has left me quite demotivated. It really sucks that people are more interested in debating contentious community drama than seemingly anything else this forum has to offer.
Why are seitan products so expensive?
I strongly believe in the price, taste, convenience hypothesis. If/when non-animal foods are cheaper and tastier, I expect the West to undergo a moral cascade where factory farming will go from being commonplace to illegal in a very short timespan. I know that in the animal welfare space this viewpoint is often considered naive, but I remain convinced it's true.
My mother buys the expensive vegan mayonnaise because it's much tastier than the regular mayonnaise. I still eat dairy and eggs because the vegan alternatives suck.
What I don't understand is why vegan alternatives have proven so difficult to make cheap and tasty. Are there any good write-ups on this?
For example, when I go to a supermarket in Copenhagen, every seitan product carries a significant markup over the raw cost of its ingredients (Amazon will sell you kilos of seitan flour at very little cost).
Do consumers have sufficiently inelastic preferences that a small-market, high-markup strategy is the most profitable? Is the final market too small for producers to reach economies of scale for seitan, or is it just difficult to bootstrap?
I would love to better understand what the demand curves look like for various categories of vegan products, as I really can't wrap my mind around how the current equilibrium came about.
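One standard way to frame the inelastic-demand question is the textbook markup rule for a single price-setting seller facing constant-elasticity demand: the profit-maximising price is marginal cost times `e / (e - 1)`, where `e` is the absolute price elasticity of demand and must exceed 1. The sketch below just evaluates that rule. The cost and elasticity figures are made-up placeholders, not estimates for seitan, and a real supermarket shelf is of course not a single-seller market.

```python
def profit_maximising_price(marginal_cost: float, elasticity: float) -> float:
    """Price that maximises (p - mc) * q for demand q proportional to p ** -elasticity.

    `elasticity` is the absolute value of the price elasticity of demand.
    The rule only has a finite solution when elasticity > 1.
    """
    if elasticity <= 1:
        raise ValueError("elasticity must exceed 1 for a finite optimal price")
    return marginal_cost * elasticity / (elasticity - 1)


def markup_share(marginal_cost: float, elasticity: float) -> float:
    """Share of the price that is markup, (p - mc) / p, which equals 1 / elasticity."""
    price = profit_maximising_price(marginal_cost, elasticity)
    return (price - marginal_cost) / price


if __name__ == "__main__":
    marginal_cost = 2.0  # hypothetical cost per pack, arbitrary currency units
    for elasticity in (1.5, 3.0, 10.0):  # hypothetical elasticities
        price = profit_maximising_price(marginal_cost, elasticity)
        share = markup_share(marginal_cost, elasticity)
        print(f"elasticity={elasticity}: price={price:.2f}, markup share={share:.0%}")
```

In this framing, the less price-sensitive the remaining buyers are (elasticity close to 1), the larger the markup a seller would choose, which is one way a small-market, high-markup equilibrium could persist.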
I'm keeping an eye out for Sentinel's analyses: https://forecasting.substack.com/p/alert-minutes-for-week-172024
I'm worried too!
Does anyone know if there has been any research into creating engaging television for factory-farmed animals? Google Scholar didn't turn up much beyond using TV to induce feeding in chickens. I know there have been evaluations of branched chains as a way to improve the conditions for pigs, but I haven't seen any evaluation of television.
There's 24/7 television made for house cats, so why couldn't something similar exist for chickens?
I'm not going to find time to look into this myself, so if somebody finds the idea intriguing, don't hesitate to get started!
At giveffektivt.dk we cover the transaction costs of donating. As with donation matching, it's likely that the money we spend on transaction costs would have been donated anyways.
I think it's fine to do this, but I'm unsure where the line should be drawn. We find that many people who donate worry far too much about transaction and overhead costs. By alleviating one of those worries we make it much more attractive to donate (though I don't think we've actually A/B tested this).
But following this logic, should we say that "5 dollars could save a life" if we thought this would increase total donations? Despite this sentence being literally true, it feels highly misleading and I would have mixed feelings about such a message. (In practice I don't think stating this would increase donations; if anything, the opposite.)
My own belief is that this type of messaging often brings its benefits in the short term, but incurs its costs in the long term, if a donor feels deceived and becomes less inclined to donate going forward.
This ultimately is the heuristic I go by. If someone were to read up on a claim after donating, would they feel deceived? If yes, then don't make the claim.
I don't personally think I would feel deceived about donor matching, so my intuition is that it's fine, but maybe others feel differently.
I strongly upvoted this post because I'm extremely interested in seeing it get more attention and, hopefully, a potential rebuttal. I think this is extremely important to get to the bottom of!
At first glance your critiques seem pretty damning, but I would have to put a bunch of time into understanding ACE's evaluations before I could conclude whether I agree with your critiques (I can spend a weekend day doing this and writing up my own thoughts in a new post if there is interest).
My expectation is that if I were to do this I would come out feeling less confident than you seem to be. I'm a bit concerned that you haven't made an attempt at explaining why ACE might have constructed their analyses this way.
But I'm pretty confused too. It's hard to think of much justification for the choice of numbers in the "Impact Potential Score", and deciding the impact of a book based on the average of all books doesn't seem like the best way to approach things?