Open Climate Data as a possible cause area, Open Philanthropy

(This will cross-post to my Substack / blog: https://benyeoh.substack.com/ ) This is aimed at the Open Philanthropy Cause Exploration Prizes: https://www.causeexplorationprizes.com/

Summary: Open Climate Data as a possible cause area

I view this as a very shallow investigation into the possible impact of open climate-related data. I judge the problem to be moderately tractable and potentially very high impact, but only moderately neglected. There is much uncertainty, and the specific neglect of the open data piece may be quite high.

The area may not clear the high bar for a new cause area, but a reasonable change in a few assumptions could change that, so a deeper investigation (eg a moderate shallow investigation by a knowledgeable actor) is warranted. Under a medium bar, further resources for the area may be warranted.

What is the problem? Climate data is not open, missing or inaccurate.

Climate-related data at the company level and the large-asset level (eg a manufacturing plant) is missing, incomplete or poorly estimated. Some of the data is in the public domain, but much of it is (i) privately held or (ii) where public, not in a machine-readable / easily digested form.


Without this foundational data, many of the practical solutions for climate will (1) be slower to develop, (2) not develop at all, or (3) be misallocated to the wrong companies, sectors and projects.

At a second-order level, (a) investors and allocators of capital will find it (and do currently find it) harder to allocate to the correct areas, (b) regulators cannot effectively regulate the relevant sectors or companies, (c) litigation or enforcement against failing or bad actors is made harder or impossible, and (d) pollution taxes or prices, and estimates of their impacts, cannot be adequately developed.

At a third-order level, solving the data problem, especially with a form of open or shared data solution, would (i) reduce the risks of negative climate-related outcomes and (ii) allow faster and better innovations and quicker spillover impacts into poorer nations.

A summary might be:

Climate data -> Policy makers -> Better policy -> ↓ risk

Data -> Investors -> Better capital allocation -> ↑ GDP

Data -> Innovators (profit and non-profit) -> New applications and ideas -> ↑ GDP

Data -> Regulators -> Better systems risk management -> ↓ risk

Data -> Sector NGOs -> Better industry standards -> ↓ risk

Data -> Available to anyone -> New ideas -> ↑ GDP

Data -> Available to anyone -> Better X-risk management ??

Risks: unlike certain areas, eg AI technology or virus research, the risks of negative consequences from the use of climate data are limited. They would apply only to the unintended consequences of geo-engineering type innovations, which would not be particularly accelerated by better data here.

In sum, solving for robust open climate data would enable needed innovation, lower catastrophic climate tail risk, and lower overall economic losses.

Importance: high uncertainty but importance seems high.

I rate the importance of this area as high. Solving this problem could bring climate-related innovations forward by 30 years. It would also be of value to governments, policy makers and NGOs.

As a toy example scenario:

  • Assume this data problem is solved by 2060 at our current trajectory.

Given that modern financial and accounting standards and data took 50 to 70 years (approx. 1950 to 2010) to evolve into a state that FinTech can use (and are still very imperfect), this seems reasonable.

Say $100m in funding allowed an OpenClimateData organisation to form and solve this problem by 2030 instead of 2060.

Those extra 30 years of innovation, regulation and superior investor capital allocation could plausibly reduce the costs of climate from 10% of world GDP [1] to 9% (a 10 percent relative reduction) by 2100, in the order of USD 1tn in value. They could also plausibly accelerate a positive climate opportunity, eg in new chemical processes or better urban design, by 30 years. If the value of this is 0.1% of GDP, it would still be significant at c. USD100bn in value.

This would also lower the negative tail risks from the worse impacts of climate.
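The arithmetic of this toy scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the world GDP figure of roughly USD 100tn is not given in the text but is implied by equating 1% of GDP with about USD 1tn, and the function and variable names are purely illustrative. The return multiple is undiscounted and ignores timing.

```python
# Toy scenario arithmetic. All inputs are rough assumptions, not forecasts.
WORLD_GDP_USD = 100e12  # assumed: ~USD 100tn, implied by "1% of GDP ~ USD 1tn"
FUNDING_USD = 100e6     # the USD 100m of funding in the scenario


def value_of_gdp_share(share_of_gdp: float, gdp: float = WORLD_GDP_USD) -> float:
    """USD value of a given share of annual world GDP."""
    return share_of_gdp * gdp


# Climate costs falling from 10% to 9% of GDP: a 1 percentage point saving.
avoided_loss = value_of_gdp_share(0.10 - 0.09)

# A positive climate opportunity worth 0.1% of GDP.
opportunity = value_of_gdp_share(0.001)

print(f"Avoided loss: USD {avoided_loss / 1e12:.1f}tn")
print(f"Opportunity:  USD {opportunity / 1e9:.0f}bn")
# Undiscounted ratio of the toy avoided loss to the funding.
print(f"Return multiple on funding: {avoided_loss / FUNDING_USD:,.0f}x")
```

Changing `WORLD_GDP_USD` or the two GDP shares shows how sensitive the headline figures are to these inputs.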


Estimates here are highly uncertain and changes here would impact your view.

It’s worth mentioning here the possible importance of an open/​shared data solution (public goods) vs a private, closed data solution.

While this is highly uncertain, an open data layer provided as a public good, on top of which private, non-profit and government actors can then build innovations, policy and solutions, might be very valuable.

Say this allows 1000 more start-up ideas per year from each of the richest 100 nations and/or even 500 more start-up ideas from each of the poorer nations (as the barriers to developing ideas on climate data are now lower). That is 100,000 to 200,000 more start-up ideas per year.

Now assume 1 in 100,000 ideas creates USD100bn in market cap value (cf Stripe, which benefits from financial data), with a 10x multiple on that for social/public value. (This assumes that private actors capture only about 10% of the socio-economic value of their innovations; see OECD, WTO, though this is a contested area.) That would lead to USD1tn in value creation about once a year, and about 30 more of these than under the base case. This would meet the bar for a 1000x return on USD100m.

This might be at the high end, but if I am out by 10x this would still suggest at least one $100bn company, with social value of close to $1tn, forming from having this open data layer, achieved 30 years quicker.
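The start-up estimate can be laid out as a short calculation. The sketch below takes the lower bound of 100,000 ideas per year, the 1-in-100,000 hit rate, the USD100bn market cap and the 10x social multiplier from the text. The function name and parameter names are illustrative, and the figures are undiscounted expected values rather than forecasts.

```python
# Toy estimate of value from start-up ideas enabled by an open data layer.
# Every input is one of the essay's rough assumptions; names are illustrative.

def expected_social_value(
    ideas_per_year: float,
    years_gained: int = 30,
    hit_rate: float = 1 / 100_000,    # chance an idea becomes a USD 100bn company
    market_cap_usd: float = 100e9,    # private value of one such company
    social_multiplier: float = 10.0,  # private actors capture ~10% of total value
) -> tuple[float, float]:
    """Return (expected number of big successes, expected social value in USD)."""
    hits = ideas_per_year * years_gained * hit_rate
    return hits, hits * market_cap_usd * social_multiplier


FUNDING_USD = 100e6
BAR_MULTIPLE = 1000  # the 1000x return bar

for label, ideas in [("100,000 ideas/yr", 100_000), ("10x lower", 10_000)]:
    hits, value = expected_social_value(ideas)
    multiple = value / FUNDING_USD
    print(f"{label}: {hits:.0f} successes, USD {value / 1e12:.0f}tn, "
          f"{multiple:,.0f}x, meets bar: {multiple >= BAR_MULTIPLE}")
```

Lowering `hit_rate` or `social_multiplier` is a quick way to test how far the assumptions can be wrong before the 1000x bar is missed.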

Using highly uncertain but, in my view, reasonable assumptions, both the value creation and the lower risk of value destruction suggest to me that the importance of the area is high.

While estimates of the economic impacts of climate vary, these toy calculations suggest value and importance to me on both lower risk and faster innovation.

Uncertainty. This is large. Innovations could be less enabled (or more enabled) than my approximate estimates suggest. The timing of years saved, the value of open data and the value of lower tail risk are all highly uncertain. I could believe the tail risk saving might be more valuable. I could also be overestimating the value of new innovation arising from access to climate data. Equally, I could be underestimating the value of one particularly major breakthrough which, plausibly, might not occur, or might occur much more slowly, without this data (for instance, through algorithms being able to access this data earlier).

Who is already working on it? Many players. Fragmented. Many are new.

Disclosure: I have met the CEO of Icebreaker who spoke about his ideas in a large group meeting/​workshop where I was present. I am also linked to financial institutions and organisations who would benefit greatly from Open Climate Data. But, I have no particular vested interest in whether Icebreaker itself succeeds, or ClimateArc.


There are many organisations who are working on small pieces of the data puzzle.

These are a mix of for-profits, non-profits, government agencies and companies themselves.

However, in my assessment of the landscape, few are working on an Open Data model (or are even close to a shared / licensed data model). These are the major entities/categories worth considering.

Icebreaker One and Climate Arc. Both these non-profits are working to coordinate more open and shared data in this area. While Icebreaker One is actively looking for partnerships to enable more Open Data, the group currently seems more focused on enabling more shared data between entities.


OS Climate (Open Source Climate). Vision: “OS-C is establishing an Open Source collaboration community to build a data and software platform that will dramatically boost global capital flows into climate change mitigation and resilience. Through a non-profit, non-competitive organization, OS-C will aggregate the best available data, modeling, and computing and data science worldwide into an AI-enhanced physical-economic model that functions like an operating system, enabling powerful applications for climate-integrated investing in a world where the future will be very different from the past.” OSC was founded in 2020 and is supported by the Linux Foundation.

Corporates. Large public companies report much data, either voluntarily or due to government regulation. But this is published mostly in PDF form on a website, and it may not be audited or robust. This is mostly public domain. Large companies also submit data to CDP.

CDP (formerly Carbon Disclosure Project). This non-profit gathers via survey much climate related data from corporates. However, most of this data is not open source and is licensed to other aggregators, analysers, institutions.

Satellite data and other primary and secondary measurement. A few organisations are attempting to gather geo-spatial and other climate data. Some of this is aimed to be at an asset level eg manufacturing plant, airport. However, much of this data is also not public domain. Space agencies are important here.

Sector-based institutions. Certain sector-based institutions gather much data here. Two notable examples are the IEA (International Energy Agency) and FAOSTAT (the Food and Agriculture Organization Corporate Statistical Database). The FAOSTAT data is public domain but the IEA data is not open data (as of Jan 2022, though this might change). (In fact, many non-profits lament this. See the Our World in Data request here[2].) There is also an open source aggregator at the sector and country level through OpenClimateData.net, but this does not go down to the company or asset level.

Aggregators/Analysers. There are many users of the CDP and IEA data who transform this data into a more usable form for other end users. Examples here are MSCI and other financial ratings agencies. These sell data on to end users.

Ultimately corporates, regulators, investors, start-ups etc. either partly build on this commercially available data, rely on incomplete data, or simply do not have access.

There is activity in this area but, from my scoping, relatively little of it is organised around open data. In particular, little is focused on gathering and disseminating open data in machine-accessible form (eg via API) to a sufficient level of comprehensiveness and robustness.

This is an incomplete map of the above to give a small sense of the interconnectedness of this ecosystem.


Another way of looking at this is as an (incomplete) list of entities here:


In sum, end users mostly have to pay to access data, and a way of accessing primary data via API is lacking (and is not likely to be free).

What could a new philanthropist do?

A new philanthropist (1) could accelerate the work of the main non-profits in the area or (2) set about developing a whole new OpenClimateData organisation.

For (1) there would need to be some scoping of how important open and accessible data might be vs. the coordination problem that is also apparent. This would involve conversations with OSC, Icebreaker and ClimateArc, and scoping to see if another NGO is missing.

For (2), the major “competition” is CDP (and to an extent OSC), which in theory could pivot to a more open data model and/or make its data more accessible to computer technologies (rather than simply a PDF or spreadsheet file). That said, it seems only moderately hard to ask for / extract this data from companies themselves and do a better job / model than CDP. It seems in some ways OSC has pursued this strategy. You could also fund an adjacent organisation, eg Our World in Data, to add this to their mission.

Neglectedness: Not neglected overall, but possibly vs importance

Disclosure: I have met the CEO of Icebreaker who spoke about his ideas in a large group meeting/​workshop where I was present. I am also linked to financial institutions and organisations who would benefit greatly from Open Climate Data. But, I have no particular vested interest in whether Icebreaker itself succeeds, or ClimateArc. I do not know OSC.

Climate is not particularly neglected. Many entities have also realized the importance of climate data, and to some extent even open data.

However, the economic value and risk mitigation that would follow at the second and third order from establishing truly open climate-related data seem to me to be somewhat neglected, given that only 3 relatively recent non-profits have started in the area. Icebreaker launched in 2020 (with consultation 2018 to 2020), OSC launched in 2020, and ClimateArc is launching in 2022.

That said, there are many entities involved here, and I do not feel I have scoped the entire ecosystem. I do not have exact funding data, but it seems the level of funding is likely in the $1m to $5m range. OSC has 22 members who are expected to contribute $30k or $100k in funding pa. Icebreaker One launched with GBP1m in 2020.
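As a quick check on the OSC figure, the implied range of annual member funding can be computed directly from the numbers above. This sketch assumes all 22 members contribute and that each pays one of the two stated amounts. The actual mix is not known, so only the bounds are meaningful.

```python
# Rough bounds on OSC's annual member funding, from the figures in the text.
OSC_MEMBERS = 22
CONTRIBUTION_LOW_USD = 30_000    # lower stated contribution per member per year
CONTRIBUTION_HIGH_USD = 100_000  # higher stated contribution per member per year

low = OSC_MEMBERS * CONTRIBUTION_LOW_USD    # if every member pays the lower amount
high = OSC_MEMBERS * CONTRIBUTION_HIGH_USD  # if every member pays the higher amount

print(f"OSC member funding: USD {low / 1e6:.2f}m to {high / 1e6:.2f}m per year")
```

The upper bound sits inside the $1m to $5m range estimated above, and the lower bound falls somewhat below it.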

I think on balance on this test, Open Climate data as a cause might fail a high bar, but meet a more medium bar, because there are 3 recent non-profits working here, albeit at <$5m in funding.

That said, I think there is a specific neglect of open data that innovators (and regulators and investors) can build upon, as opposed to shared private data, and this narrower focus (albeit with some OSC overlap) could have high returns.

Tractability: hard, but technically tractable

I am convinced about the importance of Climate data. The area is arguably moderately neglected relative to its importance (and high in certain specific areas). Tractability seems difficult but doable.

The problem has a few components where we have technical solutions but not always coordination solutions.

  1. Not all companies report data. Major exclusions include private companies, small companies, and many Asian, LatAm, African companies.

    1. This data can be estimated, but some entity needs to do this.

  2. Where companies report data, the form is not clean, not audited.

    1. Reporting can be in pdf, although machine readable XBRL is also common

    2. This data needs to be cleaned and collated

  3. Most useful data needs to be transformed

    1. Analysis of data is mostly done by profit seeking entities (and so not open)

  4. Certain sector data is currently private

    1. Eg. IEA—although this can change with coordination; could be “shared” rather than open.

  5. No one entity is likely to hold all the data

    1. Even OSC and CDP will not hold all the data. Eg Satellite data, asset level data.

    2. But the “Open Banking” model shows coordination is possible: financial data is shared even if ownership is maintained by the originating entities. A certain amount of health data is similar.

    3. Open Standards may solve this, but need to be developed.

What we can see from Open Banking, and to a lesser degree from Open Health data, is that the technical capabilities exist (the ability to organise data sets and build APIs, and to have innovators and NGOs build on those data sets), but we are still in the early days of opening up access, agreeing shared standards and building those APIs.

Possible Interventions, Open Questions

I only have tentative observations here.

  1. Around building specific useful open climate data sets to a standard that allows future combination.

    1. Eg the data sets that would be needed to build carbon labels for consumer items

    2. Open data that would allow simple access to calculate data for sectors eg buildings, vehicles, land use, water

      1. That would allow start-ups, non-profits etc to build applications on top; or regulators, policy makers

  2. Satellite Data and other Asset focused data

    1. Geospatial data; eg the state of trees; asset level data eg. the footprints of manufacturing plants

    2. An open data set here that could interact with other data sets could be very useful

  3. A “meta-institution” or support for a meta-institution

    1. Do ClimateArc, OSC, Icebreaker One have this covered?

    2. Would it be best to support one of these organisations? Or build a new meta-organisation whose focused purpose would be on the coordination problem? Could this interact with standards and regulatory bodies?

    3. Is the funding level right? I estimate <USD5m in funding for these orgs, and I think closer to $1-2m. Maybe $2m in funding really accelerates the trajectory?

My impact calculations are uncertain and a deeper analysis here would be fruitful, but the indications are that this area is important. Further work on the likely second and third order impacts, on both risk mitigation and innovation, should be explored: I am highly uncertain here, but it seems that I could be in the correct ballpark.

The area is not neglected, but is still early. Further scoping and speaking with OSC, Climate Arc and Icebreaker One would be fruitful further research.


The area is technically tractable but has challenging coordination problems especially between private data, shared, and open data.

On reading around the area, I found parallels between open financial data—Open Banking—and open healthcare data.

Both Health and Banking are more developed, although health is much more fragmented—there are many valuable innovations coming out of open health data already. Health data also has a large co-ordination problem, and probably meets some of these tests (important, potentially tractable and neglected in certain areas) although has been developing for many more years than climate data.

At a meta-meta level, it might be helpful to organise principles for any type of “open data” that has positive impact. There are open source movements, eg Linux Foundation, GitHub, but the area seems undeveloped and fragmented from my skim of it.[3]

Time spent, other potential conflicts and disclosures, mini-Bio

This review was very part-time over 2 weeks. I am very knowledgeable about finance, investing and healthcare, and moderately knowledgeable about climate-related matters. (Key sources are conversations with Chris Stark on policy and Zeke Hausfather on climate science, and informal conversations with many users of climate data over the last 2 years[4].) I use some climate data in my day job (which is working for Royal Bank Of Canada in asset management) but I don’t see any conflict with any of my day-to-day jobs. I also blog and podcast, and make theatre, but see no conflicts there. I informally know a little about a few of the institutions and the people at those organisations, but not deeply enough to have any conflicts that I perceive. I trained in science (a long time ago, graduated 1999, but was top in my year at Cambridge).

I do not consider myself to be a 100% EA follower (in brief, I am more pluralistic) but I am sympathetic to the ideas of using resources more effectively as an aim, and any positive impactful ideas I am sympathetic to analyzing further.


My LinkedIn bio is here: https://www.linkedin.com/in/benjamin-yeoh-445133/

My website is here: https://www.thendobetter.com/

Conclusions: Open Climate data seems to be important, potentially tractable but only moderately neglected.

Open Climate data seems to be important, potentially tractable but only moderately neglected. Some specific areas seem potentially very neglected given their second and third order positive impacts. This is highly uncertain. Many organisations have started work in this area, but the three “meta” organisations are all <3 years old and have <$5m (estimated) funding each. Given the >$1tn social impact value possible by accelerating this work by 10 to 30 years (and the limited risks), further research might be warranted as a cause area. At least awareness raising might be helpful.

Appendix:

I list the three meta groups here. Others in the table above can be found by search.

Climate Arc is so new that its website launched only in June 2022 (?!) https://climatearc.org/

Icebreaker One: https://icebreakerone.org/

OS Climate: https://os-climate.org/

If of interest, I can give interested parties a brief chat on my views of these organisations from afar (reach out via my website, here or LinkedIn / Twitter). In short, I am unconvinced that these organisations can, in their current form and funding, solve this problem in <10 years, but this view is more an intuition at this point, from looking at their resources and structure and comparing them to how other start-ups have done.

  1. ^

    Estimates vary and models have great uncertainty. An IMF paper (2019) suggests a real GDP loss of 7% at the current trajectory. https://www.imf.org/-/media/Files/Publications/WP/2019/wpiea2019215-print-pdf.ashx

    Mini-addendum: Climate Tech VC tracks climate tech start-up formation, which was running at about 1000 companies in the last year (mostly US), so maybe an increase of 100 start-ups pa is a better estimate than the 1000 I indicated.

    LSE paper has loss to UK GDP at 7% by 2100. https://www.lse.ac.uk/granthaminstitute/publication/what-will-climate-change-cost-the-uk/

    Burke (Stanford, Nature, 2015) suggests 23% reduction in global incomes by 2100. http://web.stanford.edu/~mburke/climate/

  2. ^
  3. ^
  4. ^