I have a very rough draft based on work with @Michael Latowicki and Charity Entrepreneurship (@Filip_Murar and @weeatquince) where we tried various tools for modeling uncertainty for use in their CEAs. Due to personal issues, I’m not sure when I’ll finish it, so here it is (very unpolished and not finished).
Mainly, we list various tools we found with some comments on them and discuss the pros and cons of spreadsheet vs software solutions. The latter is mostly coherent so I’ll paste it here:
Spreadsheets vs programming languages
The tools we found for probabilistic calculation come in two flavours: either the calculation is expressed in a spreadsheet, or it is written as text in some programming language that is customized by a dedicated interpreter or library for this purpose.
The spreadsheet-based solutions seem to have one salient advantage: they do not scare non-programmers away. This is, we think, a barrier-to-entry advantage rather than a long-term productivity advantage. The kind of people who build cost-effectiveness models are perfectly capable of becoming productive in a simple, probability-dedicated programming language, or with a probabilistic library within a general-purpose language.
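To illustrate that the programming-language flavour need not be intimidating, here is a toy Monte Carlo cost-effectiveness sketch in plain Python. All distributions and numbers here are invented for illustration; a real CEA would use vetted inputs.

```python
import random

def cost_per_outcome(n=10_000):
    """Toy Monte Carlo sketch of a cost-effectiveness estimate.
    Every quantity is named, so the model reads like its own documentation."""
    samples = []
    for _ in range(n):
        cost = random.lognormvariate(7.0, 0.5)  # programme cost in $, uncertain
        outcomes = random.uniform(5, 20)        # outcomes achieved, uncertain
        samples.append(cost / outcomes)
    samples.sort()
    return samples[n // 2]  # median cost per outcome across simulations
```

Everything in the model is a named, inspectable quantity; the equivalent spreadsheet would scatter these values across cells referenced only by address.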
The error-proneness of spreadsheets
We strongly suspect that using spreadsheets to organize formulae is more error-prone than using programming languages. We are not alone in this suspicion. Audits of spreadsheets generally find that something between a large fraction and nearly all spreadsheets contain errors.
Some pitfalls that make spreadsheets more error-prone than programming languages:
In a spreadsheet, mistyping a cell address does not necessarily result in an error message. By contrast, mistyping a variable name in a programming language typically does.
In a spreadsheet, unintentionally leaving a cell empty is equivalent to setting it to zero, which often does not result in an error. In a programming language, using a variable without declaring it does trigger an error.
Since spreadsheet formulae typically reference values by cell address, erroneous references are not salient to the reader. If your formula is “power(1+interest_rate, -years_left)”, but it is encoded as “power(1+M7, $A$13)” then you may easily fail to notice that the “13” should actually be “12”.
Formulae are not automatically updated to account for added data. For example, a sum over a column with a fixed cell range does not expand when new rows are appended below it.
Copy-and-pasting formulae often results in erroneous cell references, because the software treats cell addresses as relative by default and the spreadsheet author often fails to consider whether addresses should be relative or absolute.
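For contrast, here is the interest-rate example above written with named variables in plain Python (the names and values are illustrative). A mistyped name fails loudly, and a sum over a list automatically covers newly added data:

```python
interest_rate = 0.04
years_left = 12

# Named references make the formula self-documenting:
discount_factor = (1 + interest_rate) ** (-years_left)

# Mistyping a name raises an error instead of silently reading a wrong cell:
try:
    (1 + interest_rte) ** (-years_left)  # typo: "interest_rte"
except NameError:
    print("typo caught")

# A sum over a list covers new data automatically, unlike a fixed cell range:
yearly_costs = [100, 250, 80]
total = sum(yearly_costs)
yearly_costs.append(40)
total = sum(yearly_costs)  # recomputed over the full list, including the new entry
```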
There are other causes of errors in spreadsheets, but you get the point. Errors occur in every kind of programming environment, of course, but spreadsheets sport their own additional pitfalls, on top of those that exist in every programming language.
We are far from certain that writing cost-effectiveness analyses in an ordinary programming language would reduce the error rate compared to spreadsheets—quantitative estimates of the error rate in both spreadsheets and in non-spreadsheet programs find error rates on the same order of magnitude. The mix of problems that are typically approached using these two types of tools is different though, and we have not found an apples-to-apples study of those error rates.
It is apparently not easy to root out spreadsheet errors. In “Spreadsheet errors: What we know. What we think we can do”, Professor of IT management Ray Panko summarizes his findings:
Unfortunately, only one approach to error reduction has been demonstrated to be effective. This is code inspection, in which a group of spreadsheet developers checks a spreadsheet cell-by-cell to discover errors. Even this exhausting and expensive process will catch only about 80% of all errors. Other things can be done to reduce errors, but given the tenacity of spreadsheet error in the face of even cell-by-cell code inspection, we should be careful about expecting too much from most error-reducing approaches.
To add, we at Charity Entrepreneurship have been experimenting with using Dagger/Carlo for our cost-effectiveness analyses of new charity ideas. We’ve put together this (very rough, work-in-progress) guide on how to use Dagger – sharing in case others find it helpful.
I’ve personally found Guesstimate less error-prone and easier to verify and I’d guess easier to use in general than Google Sheets/Excel. Node names+acronyms and the displayed arrows between nodes are helpful.
I’d also imagine Guesstimate would beat programming languages on these, too, with fewer places for error per variable or operation.
However, Guesstimate is often not flexible enough, or takes a lot of nodes to do some simple things (e.g. sampling randomly from one of multiple variables). It can get very slow to edit with many (like 100 or more) nodes. It can also be more tedious for simple operations over many variables at a time, like a sum, IIRC.
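For instance, the mixture case mentioned above (sampling randomly from one of several variables) takes many Guesstimate nodes but only a few lines in a general-purpose language. A sketch in plain Python, with invented distributions:

```python
import random

def mixture_sample():
    """Draw from one of several uncertain estimates, chosen at random."""
    sources = [
        lambda: random.normalvariate(10, 2),     # e.g. estimate from study A
        lambda: random.uniform(5, 25),           # e.g. estimate from study B
        lambda: random.lognormvariate(2.3, 0.4), # e.g. an expert's guess
    ]
    return random.choice(sources)()

samples = [mixture_sample() for _ in range(10_000)]
```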
(Of the options you’ve listed, I’ve only used Guesstimate and Google Sheets (without the probability stuff). I was also a deep learning engineer for ~2 years.)
I don’t think this evaluation is especially useful, because it only presents one side of the argument: why spreadsheets are bad, but not their advantages or how errors typically occur in programming languages.
The bottom line you present (quoted below) is in fact not very action-relevant. It’s not even strong enough to support that the switching costs are worth it, IMO.
We are far from certain that writing cost-effectiveness analyses in an ordinary programming language would reduce the error rate compared to spreadsheets—quantitative estimates of the error rate in both spreadsheets and in non-spreadsheet programs find error rates on the same order of magnitude. The mix of problems that are typically approached using these two types of tools is different though, and we have not found an apples-to-apples study of those error rates.
Totally agree with the need for a more balanced and careful analysis!