This post benefited from feedback and comments from the whole Conjecture team, as well as others including Steve Byrnes, Paul Christiano, Leo Gao, Evan Hubinger, Daniel Kokotajlo, Vanessa Kosoy, John Wentworth, and Eliezer Yudkowsky. Many others also kindly shared their feedback and thoughts on it formally or informally, and we are thankful for everyone’s help on this work.
Much has been written on this forum about infohazards, such as information that accelerates AGI timelines, though very few posts attempt to operationalize that discussion into a policy that can be followed by organizations and individuals. This post makes a stab at implementation.
Below we share Conjecture’s internal infohazard policy as well as some considerations that we took into account while drafting it. Our goal with sharing this on this forum is threefold:
To encourage accountability. We think that organizations working on artificial intelligence—particularly those training and experimenting on large models—need to be extremely cautious about advancing capabilities and accelerating timelines. Adopting internal policies to mitigate the risk of leaking dangerous information is essential, and being public about those policies signals commitment to this idea. I.e., shame on us if we break this principle.
To promote cross-organization collaboration. While secrecy can hurt productivity, we believe that organizations will be able to work more confidently with each other if they follow similar infohazard policies. Two parties can speak more freely when they mutually acknowledge what information is sharable and to whom it can be shared, and when both show serious dedication to good information security. A policy that formalizes this means that organizations and individuals don’t need to reinvent norms for trust each time they interact.
Note that at the current level of implementation, mutual trust relies mostly on the consequence of “if you leak agreed-upon secrets your reputation is forever tarnished.” But since alignment is a small field, this seems to carry sufficient weight at current scale.
To start a conversation that leads to better policies. This policy is not perfect, reviewers disagreed on some of the content or presentation, and it is guaranteed that better versions of this can be made. We hope that in its imperfection, this policy can act as a seed from which better policies and approaches to handling infohazards can grow. Please share your feedback!
Overview and Motivation
“Infohazard” is underspecified and has been used to mean both “information that directly harms the hearer such that you would rather not hear it” and “information that increases the likelihood of collective destruction if it spreads or falls into the wrong hands.”
At Conjecture, the infohazards we care about are those that accelerate AGI timelines, i.e., that advance the capabilities of companies, teams, or people without restraint. Due to the nature of alignment work at Conjecture, it is assured that some employees will work on projects that are infohazardous in nature, as insights about how to increase the capabilities of AI systems can arise while investigating alignment research directions. We have implemented a policy to create norms that can protect this kind of information from spreading.
The TL;DR of the policy is: Mark all internal projects as explicitly secret, private, or public. Only share secret projects with selected individuals; only share private projects with selected groups; share public projects with anyone, but use discretion. When in doubt consult the project leader or the “appointed infohazard coordinator”.
We need an internal policy like this because trust does not scale: the more people who are involved in a secret, the harder it is to keep. If each person independently keeps all Conjecture-related infohazard secrets with probability 99% / 95% / 90%, the probability that all 30 of 30 people do so drops to 74% / 21% / 4%. This implies that if you share secrets with everyone in the company, they will leak out.
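The percentages above follow from treating each person's secrecy as an independent event with the same probability, which is a simplifying assumption: if one person keeps every secret with probability p, then n people all do so with probability p^n. A minimal Python sketch of that arithmetic (the function name is illustrative, not part of the policy):

```python
def prob_all_keep_secret(p_individual: float, n_people: int) -> float:
    """Probability that every one of n_people keeps the secret.

    Assumes each person independently keeps the secret with the same
    probability p_individual (a simplifying assumption).
    """
    return p_individual ** n_people


# Reproduce the three cases from the text for a group of 30 people.
for p in (0.99, 0.95, 0.90):
    print(f"p = {p:.2f}: {prob_all_keep_secret(p, 30):.0%}")
```

Under this model, shrinking the group helps far more than small gains in individual reliability, which is the motivation for siloing.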
Our policy leans conservative because leaking infohazardous information could lead to catastrophic outcomes. In general, reducing the spread of infohazards means more than just keeping them away from companies or people that could understand and deploy them. It means keeping them away from anyone, since sharing information with someone increases the opportunities it has to spread.
Considerations
An infohazard policy needs to strike the right balance between what should and what should not be disclosed, and to whom. The following are a number of high level considerations that we took into account when writing our policy:
Coordination is critical for alignment. We must have structures of information and management that allow for resources to be deployed in a trustworthy manner. In other words, we need coordinated organizations. This is a major reason why Conjecture was founded, rather than just supplying EleutherAI with more compute.
Sharing information is critical for coordination. Some reasons include:
Coherence: ensuring Conjecture knows where to aim its research efforts
Oversight: ensuring people are doing useful things and resources are distributed well
Avoiding redundancy: ensuring no duplicate projects
Avoiding adversarial or destructively-competitive dynamics: ensuring people are not slowing down others in order to look better
Sharing information makes people more effective at doing alignment research.
Asking questions to many people about what you are working on helps a lot
99% of ideas are not solutions, they are brain food. Their entire +EV comes from being shared
Some infohazards are not that bad to spill, others are really bad to spill and it is usually not clear ahead of time which is which. We live in a bad timeline. We are not many years off from AGI; existing actors are racing towards AGI; and aligned organizations are not really listened to. We want to ensure that we do not worsen this situation.
Secrets are harder to keep for longer periods of time. It is easier to keep secrets for 1 month than 10 years. “Luckily” you expect this to be less of a problem if you have short timelines.
Secrets are harder to keep the more you have. It is easier to keep track of 1 secret than 100 secrets. Therefore, it is better to silo per repository or project than per individual technical insight. This is a crude level of granularity, which means that more things will be made part of secret packages than needed, with the goal of keeping the number of packages themselves small.
Safety vs Security. This policy is designed with safety in mind: the goal is to avoid leaks (in particular, accidental leaks), rather than being spy-resistant (infosec and opsec will be dealt with in future policies).
In other words, we need to balance many different considerations, not merely whether “it is an infohazard or not”.
The Policy
(Verbatim from Conjecture’s internal document.)
Introduction
This document is for Conjecture and includes employees, interns, and collaborators. Note that this policy is not retroactive; any past discussions on this subject have been informal.
This policy applies only to direct infohazards related to AGI Capabilities. To be completely clear: this is about infohazards, not PR hazards, reputational hazards, etc.; and this is about AGI capabilities.
Examples of presumptive infohazards:
1. Leaking code that trains networks faster
2. Leaking a new technique that trains networks faster
3. Leaking a new specific theory that leads to techniques that train networks faster
4. Letting it be known outside of Conjecture that we have used/built/deployed a technique that already exists in the literature to train networks faster
5. Letting it be known outside of Conjecture that we are interested in using/building/deploying a technique that already exists in the literature in order to train networks faster
1-3 are obvious. 4-5 are dangerous because they attract more attention to ideas that increase average negative externality. If in the future we want to hide more types of information that are not covered by the current policy, we should explicitly extend the scope of what is hidden.
Siloing of information and projects is important even within Conjecture. Generally any individual team member working on secret projects may disclose to others that they are working on secret projects, but nothing more.
The default mantra is “need to know”. Does this person need to know X? If not, don’t say anything. Ideally, no one that does not need to know should know how many secret projects exist, which projects people work on, and what any of those projects are about.
While one should not proactively offer that they are keeping a secret, we should strive for meta-honesty. This means that when asked directly we should be transparent that we are observing an infohazard policy that hides things, and explain why we are doing so.
Rules
There are three levels of disclosure that we will apply.
Secret: Information only shareable with specific individuals.
Private: Information only shareable with a fuzzy group.
Public: Information shareable with everyone.
We will consider these levels of disclosure for the following types of information:
Repositories: Entire repositories are easier to box than individual files and folders.
Projects: Specific projects or sub-projects are easier to box than issues within a project.
Ideas: Ideas are very hard to keep track of and make secret, so when possible, we will try to box them in repositories or projects. But there may be ideas that are exceptions.
Each project that is secret or private must have an access document associated with it that lists who knows about the secret and any whitelisted information. This document is a minor infosecurity hazard, but is important for coordination.
An appointed infohazard coordinator has access to all secret and private projects. For Conjecture, this person is Connor, and the succession chain goes Connor → Gabe → Sid → Adam. When collaborating with other organizations on a secret or private project, each organization’s appointed coordinator has access to the project. This clause ensures there is a dedicated person to discuss infohazards with, help set standards, and resolve ambiguity when questions arise. A second benefit of the coordinator is strategy: whoever is driving Conjecture should have a map of what we are working on and what we are intentionally not working on.
Leaking infohazardous information is a major breach of trust not just at Conjecture but in the alignment community as a whole. Intentional violation of the policy will result in immediate dismissal from the company. This applies to senior leadership as well. Mistakes are different from intentional leaking of infohazards.
More details on the levels of disclosure are below, and additional detail on consequences and the process for discerning if leaked information was shared intentionally or not is discussed in “Processes”.
Secret
Who: For an item to be a secret, it needs to have a precise list of people who know it. This list can include people outside of Conjecture. This list of people must be written down explicitly in the access document, and must include the appointed infohazard coordinator.
What is covered: All information related to secret projects must be treated as secret unless it is whitelisted. It may only be discussed with members of the secret project. The policy prohibits leaking secret information to anyone not on the project list. When in doubt as to whether information is “related,” talk to the project lead or appointed infohazard coordinator.
Whitelisting: For any given secret, there may be some information that is okay to share with more people than just the secret group. All whitelisted information must be explicitly written in the access document and agreed on by those who know the secret. Without this written and acknowledged agreement, sharing any information related to the secret will be considered a breach of policy. Whitelisted information is considered private and must follow the rules of private information below. The appointed infohazard coordinator must abide by the same whitelisting rules as all other members of a secret group.
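As an illustration only, the access document for a secret project could be modeled as a small data structure. The sketch below is an assumption about one possible shape, not Conjecture's actual format: the class, field, and method names are hypothetical, and mapping each whitelisted item to the groups it may be shared with is one reading of "whitelisted information is considered private".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Disclosure(Enum):
    SECRET = "secret"    # shareable only with specific named individuals
    PRIVATE = "private"  # shareable with a fuzzy group
    PUBLIC = "public"    # shareable with everyone


@dataclass
class AccessDocument:
    """Hypothetical access document for one secret project."""
    project: str
    coordinator: str
    members: Set[str] = field(default_factory=set)
    # Whitelisted item -> groups it may be shared with (assumed structure).
    whitelist: Dict[str, Set[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The policy requires the coordinator to be on every secret's list.
        self.members.add(self.coordinator)

    def level_of(self, item: str) -> Disclosure:
        # Whitelisted items are treated as private; everything else is secret.
        return Disclosure.PRIVATE if item in self.whitelist else Disclosure.SECRET

    def may_share(self, item: str, person: str, person_groups: Set[str]) -> bool:
        if person in self.members:
            return True
        # Non-members may only hear whitelisted items, via a listed group.
        allowed_groups = self.whitelist.get(item, set())
        return bool(allowed_groups & person_groups)


doc = AccessDocument(
    project="example-project",
    coordinator="coordinator",
    members={"alice"},
    whitelist={"project exists": {"Conjecture"}},
)
print(doc.may_share("training details", "bob", {"Conjecture"}))  # not whitelisted
print(doc.may_share("project exists", "bob", {"Conjecture"}))    # whitelisted
```

The default-deny shape mirrors the rule that anything not explicitly written down and agreed on stays secret.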
Private
Who: For a thing to be private, it needs to have a group or groups that are privy to it. In practice, this could mean Conjecture, or groups from the Alignment community (e.g., “Conjecture and Redwood know about this”). The document for each private project must explicitly state which groups know the information. Information related to private projects may only be shared with members of these groups. See “Sharing Information” under “Processes” below for more details.
Try to define fuzzy boundaries: Private things can be much less specific than secret things, and the “groups” shared with are sometimes undefined. This is a necessary evil because we do not have the bandwidth to keep formal track of all interactions and idea-sharing; these broader categories are necessary to facilitate collaboration.
Limit publicity: No private information may be published to sites like LessWrong or AlignmentForum, or in a paper. Please refrain from posting about private information on Discord, Slack or other messaging platforms used in the future by Conjecture, unless in relatively small and protected environments where the “group” or audience is clear.
Use sparingly: Avoid relying on private as a category; it is clearer and preferable to make information either public or secret.
Public
Who: Literally everyone. This is information you could tell your mother, or all your friends, or post on Twitter. That does not mean that you should.
Discretion: It is still important to apply discretion to sharing public information. Posts that we write on the AlignmentForum are public, but they are not press releases. Information we discuss at EAG is public, but it is not something we would email everyone about.
Forums are public: LessWrong, AlignmentForum, EA Forum, EleutherAI Discord (public channels), and similar sites are considered public outlets. Any information shared there must be deemed public.
No take-back-sies: Once public information has been shared, we cannot take it back. We may revise our discretion level and move information to secret, but we should assume anything shared is permanently, irreversibly, out there.
Processes
1. Assigning Disclosure Levels
For new projects: Whenever a new project is spun up, the appointed infohazard coordinator and the project lead will work together to assess if the content of the project is infohazardous and if it should be assigned as secret, private, or public. Each conversation will include:
(1) what information the project covers
(2) in what forms the information about the project already exists, e.g., written, repo, AF post, etc.
(3) who knows about the project, and who should know about the project
(4) proposed disclosure level
If the project is determined to be secret or private, an access document must be created that lists who knows about the project and any whitelisted information. Any information about the project that currently exists in written form must be moved to and saved in a repository or project folder with permissions limited to those on the access document list.
Anyone can ask the appointed infohazard coordinator to start a project as a secret. The default is to accept. At Conjecture, the burden of proof is on Connor if he wants to refuse, and he must raise an objection that proves that the matter is complicated enough to not accept immediately, and might change in the future. In general, any new technical or conceptual project that seems like it could conceivably lead to capabilities progress should be created as secret by default.
(We will return to this clause after some months of trialing this policy to write better guidelines for deciding what status to assign projects).
For current projects (changing disclosure levels): Anyone can propose changing the disclosure level of a project.
Secret → Private: To move a project from secret to private, all members of the project and the appointed infohazard coordinator must agree.
Private → Public: Before making public any information, all members of the project must agree. Also, members must consult external trusted sources and get a strong majority of approval. We are considering:
[We are in the process of reaching out to individuals and we will include them after they confirm. If you have suggestions for individuals to include please add a comment here.]
E.g.: Eliezer Yudkowsky
Paul Christiano
Evan Hubinger
John Wentworth
…
When collaborating with another organization, there should be one or more individuals that both parties agree is trusted to adjudicate on the matter.
Public → Private: Avoid this. Redacting previously public information is difficult, and in the rare circumstance that this should be done it is presumably because the information is infohazardous enough that it should be made secret.
Public or Private → Secret: This should only be considered in situations where infohazardous information is determined to be particularly sensitive. Furthermore, this should be done with care, in order to avoid attracting more attention from the Streisand effect.
Here, the burden of proof is on the individual proposing the change, and they should discuss the matter directly with the project leader or the appointed infohazard coordinator. If the coordinator (and in most cases the project lead) agree, follow the process in “for new projects” above.
Additionally, if the project was private and if this is feasible, check in with everyone that currently has access to the information to inform them that the disclosure level is changing to secret, and have them read the infohazard policy. Each person must be added to the list of people who know about the project. If these individuals will no longer be working on the project, they should still be noted as knowing about the project, but in a separate list.
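The rules for changing disclosure levels can be summarized as a lookup from a proposed transition to the parties who must agree. The sketch below is a paraphrase under stated assumptions: the names are hypothetical, the approval strings are shorthand for the policy text, and transitions the policy does not describe (such as secret directly to public) are rejected rather than guessed at.

```python
from enum import Enum
from typing import Tuple


class Disclosure(Enum):
    SECRET = "secret"
    PRIVATE = "private"
    PUBLIC = "public"


S, PR, PU = Disclosure.SECRET, Disclosure.PRIVATE, Disclosure.PUBLIC

# (current level, proposed level) -> who must agree, paraphrased from the policy.
REQUIRED_AGREEMENT = {
    (S, PR): ("all project members", "infohazard coordinator"),
    (PR, PU): ("all project members", "strong majority of external trusted sources"),
    (PU, S): ("infohazard coordinator", "project lead (in most cases)"),
    (PR, S): ("infohazard coordinator", "project lead (in most cases)"),
}

# The policy says to avoid this move; if redaction is warranted, use secret.
DISCOURAGED = {(PU, PR)}


def agreement_needed(current: Disclosure, proposed: Disclosure) -> Tuple[str, ...]:
    """Return who must agree to the change, or raise if it is not allowed here."""
    if current is proposed:
        raise ValueError("no change in disclosure level")
    if (current, proposed) in DISCOURAGED:
        raise ValueError("public -> private is discouraged; consider secret instead")
    try:
        return REQUIRED_AGREEMENT[(current, proposed)]
    except KeyError:
        raise ValueError("transition not described in the policy") from None


print(agreement_needed(Disclosure.SECRET, Disclosure.PRIVATE))
```

For moves to secret, the policy additionally places the burden of proof on the person proposing the change, which a lookup table like this does not capture.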
2. Sharing Information
Each project must have an access document associated with it that lists who knows about the information and what information is whitelisted to discuss more freely. This list will be kept in a folder or git repository that only members of the secret or private project have access to.
Secret information can only be shared with the individuals who are written on the access list. Anyone in a secret project may propose adding someone new to the secret. First discuss adding the individual with the project leader, and then inform all current members and give them a chance to object. If someone within the team objects, the issue is escalated to the appointed infohazard coordinator, who has the final word. If the team is in unanimous agreement, the coordinator gets a final veto (it is understood that the coordinator is supposed to only use this veto if they have private information as to why adding this person would be a bad idea).
Private information can only be shared with members of groups who are written on the access list. Before sharing private information with person X, first check if the private piece of information has already been shared with someone from the same group as X. Then, discuss general infohazard considerations with X and acknowledge which select groups have access to this information. Then, notify others at Conjecture that you have shared the information with X. In case of doubt, ask first.
Public information can be talked about with anyone freely, though please be reasonable.
For all secret and private projects, by default information sharing should happen verbally and should be kept out of writing (in messages or documents) when possible.
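The procedure described above for adding someone to a secret project can be sketched as a small decision function. This is a simplified illustration with hypothetical names: it takes for granted that the proposal has already been discussed with the project leader, and it represents the coordinator's final word (after an objection) and the coordinator's veto (after unanimous agreement) with a single boolean.

```python
from typing import Set, Tuple


def propose_addition(
    members: Set[str],
    candidate: str,
    objectors: Set[str],
    coordinator_allows: bool,
) -> Tuple[Set[str], str]:
    """Return the (possibly updated) access list and a short reason.

    `objectors` are current members who object after being informed;
    `coordinator_allows` is the coordinator's decision or non-veto.
    """
    if not objectors <= members:
        raise ValueError("only current members can object")
    if objectors:
        # Any objection escalates to the coordinator, who has the final word.
        path = "escalated to coordinator"
    else:
        # Unanimous agreement still leaves the coordinator a final veto.
        path = "unanimous, coordinator veto check"
    if coordinator_allows:
        return members | {candidate}, f"{path}: added"
    return set(members), f"{path}: not added"


access_list, reason = propose_addition({"alice", "bob"}, "carol", set(), True)
print(sorted(access_list), reason)
```

Either path ends with the coordinator, so the objection step mainly determines how the decision is reached and recorded rather than who makes it.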
3. Policy Violation Process
We ask present and future employees and interns to sign nondisclosure agreements that reiterate this infohazard policy. Intentional violation of the policy will result in immediate dismissal from the company. The verdict of whether the sharing was intentional or not will be determined by the appointed infohazard coordinator but be transparent to all members privy to the secret (i.e., at Conjecture, Connor may unilaterally decide, but has his reputation and trust at stake in the process).
C-suite members of Conjecture are not above this policy. This is imperative because so much of this policy relies on the trust of senior leaders. As mentioned above, the chain of succession on who knows infohazards goes Connor → Gabe → Sid → Adam; though actual succession planning is outside the scope of this document. If it is Connor who is in question for intentionally leaking an infohazard, Gabe will adjudicate the process with transparency available to members of the group privy to the secret. Because of the severity of this kind of decision, we may opt to bring in external review to the process and lean on the list of “Trusted Sources” above.
Mistakes are different from intentional sharing of infohazards. We will have particular lenience during the first few months that this policy is active as we explore how it is to live with. We want to ensure that we create as robust a policy as possible, and encourage employees to share mistakes as quickly as possible such that we can revise this policy to be more watertight. Therefore, unless the sharing of infohazardous information is particularly egregious, nobody will be fired for raising a concern in good faith.
4. Information Security and Storage
[Details of Conjecture’s infosecurity processes are—for infosecurity reasons—excluded here.]
5. Quarterly Policy Review
We will review this policy as part of our quarterly review cycle. The policy will be discussed by all of Conjecture in a team meeting, and employees will be given the opportunity to talk about what has gone well and what has not gone well. In particular, the emphasis will be on clarifying places where the policy is not clear or introduces contradictions, and adding additional rules that promote safety.
The quarterly review will also be an opportunity for Project Leaders to review access documents to ensure lists of individuals and whitelisted information for each project are up-to-date and useful.
This policy will always be available for employees at Conjecture to view and make suggestions on, and the quarterly review cycle will be an opportunity to review all of these comments and make changes as needed.
Additional Considerations
The information below is not policy, but is saved alongside Conjecture’s internal policy for employee consideration.
Example Scenarios
It is difficult to keep secrets and few people have experience keeping large parts of their working life private. Because of this, we anticipate some infohazardous information will leak due to mistakes. The following examples are common situations where infohazardous information could leak; we include potential responses to illustrate how an employee could respond.
(1) You have an idea about a particular line of experimentation in a public project, but are concerned that some of the proposed experiments may have capability benefits. You are weighing whether to investigate the experiments further and whether or not you should discuss the matter with others.
Potential response: Consider discussing the matter in private with the project lead or appointed infohazard coordinator. If it is unknown whether information could potentially be infohazardous, it is safer to assume risk. A secret project could be spun off from the public project to investigate how infohazardous it is. If the experimental direction is safe, it could be updated to be public. If the experimental direction is infohazardous, it could stay secret. If the experimental direction is sufficiently dangerous, the formerly public project could be made secret by following the process in “Assigning Disclosure Levels” in the policy.
(2) You are in the same situation and have an idea for a particular line of experimentation in a public project, but this time believe P(experiments result in capabilities boost) is very small but still positive. You are considering whether there is any small but positive probability with which you should act differently than scenario (1).
Potential response: Ultimately, a policy should be practical. Sharing information makes people more effective at doing alignment research. There is always a small probability that things can go wrong, but if you feel that an idea has low P(experiments result in capabilities boost) while also being additive to alignment, you can discuss it without treating it as secret. That said, if you have any doubt as to whether this is the case or not in a particular situation, see scenario (1).
(3) You are at a semi-public event like EAG and a researcher from another alignment organization approaches and asks what research projects you and other Conjecture employees are working on.
Potential response: Mention the public projects. You may mention the fact that there are private and secret projects that we do not discuss, even if you are not part of any. If the individual is a member of one of the groups, you may mention the private projects that the person’s group is privy to.
(4) You are at an alignment coffee-time and someone mentions a super cool idea that is related to a secret project you are working on. You want to exchange ideas with this individual and are worried that you might not have the opportunity to speak in the future.
Potential response: The fact that this is a time limited event should not change anything. One must go through the process, and the process takes time. This is a feature and not a bug. Concretely, this means you do not discuss that secret project or the ideas related to the project with that person. Feel free to learn more about how far that person is in their idea though.
(5) You are talking with people about research ideas and accidentally share potentially infohazardous information. You realize immediately after the conversation and are wondering if you should tell the people you just spilled information to that the ideas are infohazardous and should be kept secret.
Potential response: Mention this to the project lead and appointed infohazard coordinator as soon as possible before returning to the people, and discuss what to do with them. Because these situations are highly context dependent it is best to treat each on a case by case basis rather than establishing one general rule for mistakes.
(6) You are at EAG and you come across someone talking publicly about an idea which is very similar to an infohazardous project you are working on. You are considering whether to talk to them about the risk of sharing that information.
Potential response: This depends on how good you are with words. If you confidently know you are good enough to hold this conversation without spilling beans, go. Else, if you have any doubt, mention this to your project lead and the appointed infohazard coordinator.
Best Practices
The following are a number of miscellaneous recommendations and best practices on infohazard hygiene. Employees should review these and consider if their current approach is in line with these recommendations.
Exercise caution. Consider potentially-infohazardous ideas to be infohazardous until you have checked with your project lead. Given that potential infohazards are private by default, attempt to formalize them as secret as quickly as possible.
Be careful with written material. Writing is more helpful than verbal communication for sharing information, but riskier. Writing helps coordination because written artifacts scale, and they’re easier to process. But writing leaves more traces, meaning that it is harder to keep a secret. Written stuff is more often recorded than verbal stuff, very often automatically, and very often even unknowingly.
Consider carefully the audience of any conversation. The audience of a conversation matters. Capabilities engineers, individuals with executive power, public figures, influencers, etc. are riskier audiences, and even more caution than usual should be used when speaking with them even if you are not discussing infohazardous projects. This also applies to people with a history of sharing (what they or we think is) infohazardous information, or who do not take infohazards seriously.
Avoid implying the existence of infohazards. It is sometimes best to answer directly that information is infohazardous and that you are not willing to speak further on it; this is a “hard no” answer. But most of the time, it is easier and safer to avoid implying the existence of infohazards. Defer to talking about other work, and do not proactively offer that you are working on infohazardous projects if it can be avoided.
Avoid bragging, especially to romantic interests. It is easy to justify bypassing the process by thinking that your romantic partner is not a concern. However, it is important to remember that this process is not only about you, but about everyone you work with: the question is not whether you judge your romantic partner to be a concern, but whether you want everyone you work with to have to make that judgment too. It is also about whether you want to have to judge that everyone else’s romantic partner is not a concern. This does not scale. If you think sharing is actually worth it, talk with your team about adding them to the secret. However, it is likely much easier to just not mention infohazardous information to them in the first place.
Be careful with Google docs, since they don’t delete editing history. It is easy to accidentally spread infohazards that have been deleted from documents simply because they were not deleted from the version history. Do not make this mistake. Sharing documents with other organizations should be done only in rare circumstances in general, but when it must be done, share view-only versions such that file history cannot be viewed. Additionally, share with specific people instead of making it readable to anyone with a URL.
Psychological Safety
Working on a secret project and not being able to talk about what you’re doing and thinking about can take an emotional toll. The nature of Conjecture (startup, generally young, mostly immigrants) means that for most employees, coworkers provide the majority of socialization, and a large aspect of socialization with coworkers is talking about projects and ideas.
On one hand, the difficulty of secret-keeping should be embraced. The fact that it takes an emotional toll is not a coincidence, and is well aligned with reality. Mitigations against this may make things worse, and we should default towards not employing people if they have difficulty holding secrets.
On the other hand, we do not currently have the bandwidth to be perfectly selective as to who we hire and assign to secret projects. And we can’t rely on people self-reporting that they’ll be incapable of holding a secret before being hired or assigned to a project. Most people don’t have a good counterfactual model of themselves.
Therefore psychological safety is not just a concern for the emotional well-being of employees but also for the robustness of this policy. Someone who is feeling stressed or isolated is more likely to breach secrecy. Emotional dynamics are just as real a factor in the likelihood that secrets get shared as the number of people who know the secret. In both cases we assume human fallibility. If we only ever hired infallible people, there would be no reason to have internally siloed projects.
Potential risk factors that amplify the likelihood that an infohazard is revealed:
A person’s primary project(s) are secret
Long-running secret projects
Particularly scary projects
Non-obviously scary projects
Trusted friends are not included in the silo
Personality and life circumstances
As such, when planning infohazardous projects we will consider mitigations such as not assigning people only to siloed projects, siloing projects between collaborators who are used to being very open with each other, or adding a trusted emotional support person to project silos who knows only high-level and not implementation details. Note that Conjecture will not guarantee following any of these steps, and therefore this is not policy but rather general considerations.
In general, employees reading this policy should understand that mental health and psychological safety are taken seriously at Conjecture, and that they should raise any concerns about this with senior management or whomever else they are comfortable speaking with. Rachel and Chris have both volunteered as confidants if individuals would prefer to express concerns to someone besides Connor, Gabe, or Sid.
An additional emotional consideration is that it should cost zero social capital to have and keep something secret. This is very much not the default without a written policy, where it often costs people social capital and additional effort to keep something secret. The goal at Conjecture is for this not to be the case, and for anyone to be able to comfortably keep things secret by default without institutional or cultural pushback. We also intend for this policy to reduce overhead (the need to figure out bespoke solutions for how to handle each new secret) and stress (the psychological burden of keeping a secret). Having access to a secret is by no means a sign of social status. In that vein, a junior engineer might have access to things that a senior engineer does not.
Conjecture: Internal Infohazard Policy
This post benefited from feedback and comments from the whole Conjecture team, as well as others including Steve Byrnes, Paul Christiano, Leo Gao, Evan Hubinger, Daniel Kokotajlo, Vanessa Kosoy, John Wentworth, Eliezer Yudkowsky. Many others also kindly shared their feedback and thoughts on it formally or informally, and we are thankful for everyone’s help on this work.
Much has been written on this forum about infohazards, such as information that accelerates AGI timelines, though very few posts attempt to operationalize that discussion into a policy that can be followed by organizations and individuals. This post makes a stab at implementation.
Below we share Conjecture’s internal infohazard policy as well as some considerations that we took into account while drafting it. Our goal with sharing this on this forum is threefold:
To encourage accountability. We think that organizations working on artificial intelligence—particularly those training and experimenting on large models—need to be extremely cautious about advancing capabilities and accelerating timelines. Adopting internal policies to mitigate the risk of leaking dangerous information is essential, and being public about those policies signals commitment to this idea. I.e., shame on us if we break this principle.
To promote cross-organization collaboration. While secrecy can hurt productivity, we believe that organizations will be able to work more confidently with each other if they follow similar infohazard policies. Two parties can speak more freely when they mutually acknowledge what information is sharable and with whom it can be shared, and when both show serious dedication to good information security. A policy that formalizes this means that organizations and individuals don’t need to reinvent norms for trust each time they interact.
Note that at the current level of implementation, mutual trust relies mostly on the consequence of “if you leak agreed-upon secrets your reputation is forever tarnished.” But since alignment is a small field, this seems to carry sufficient weight at current scale.
To start a conversation that leads to better policies. This policy is not perfect: reviewers disagreed on some of the content or presentation, and it is guaranteed that better versions of this can be made. We hope that in its imperfection, this policy can act as a seed from which better policies and approaches to handling infohazards can grow. Please share your feedback!
Overview and Motivation
“Infohazard” is underspecified and has been used to mean both “information that directly harms the hearer such that you would rather not hear it” and “information that increases the likelihood of collective destruction if it spreads or falls into the wrong hands.”
At Conjecture, the infohazards that we care about are those that accelerate AGI timelines, i.e., information that advances the capabilities of companies, teams, or people without restraint. Due to the nature of alignment work at Conjecture, it is assured that some employees will work on projects that are infohazardous in nature, as insights about how to increase the capabilities of AI systems can arise while investigating alignment research directions. We have implemented a policy to create norms that can protect this kind of information from spreading.
The TL;DR of the policy is: Mark all internal projects as explicitly secret, private, or public. Only share secret projects with selected individuals; only share private projects with selected groups; share public projects with anyone, but use discretion. When in doubt consult the project leader or the “appointed infohazard coordinator”.
We need an internal policy like this because trust does not scale: the more people who are involved in a secret, the harder it is to keep. If there is a probability of 99% / 95% / 90% that any one person keeps all Conjecture-related infohazard secrets, the probability of all 30 people doing so (assuming each person is independent) drops to 74% / 21% / 4%. This implies that if you share secrets with everyone in the company, they will leak out.
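The figures above can be reproduced with a few lines of Python. This is only a sketch: it assumes each person keeps the secret independently with the same probability, and the function name `prob_all_keep` is ours, not part of the policy.

```python
def prob_all_keep(p_single: float, n_people: int) -> float:
    """Probability that every one of n_people keeps the secret.

    Assumes each person keeps it independently with probability p_single.
    """
    return p_single ** n_people


# Per-person reliabilities quoted in the text, for a 30-person group.
for p in (0.99, 0.95, 0.90):
    print(f"per-person {p:.0%} -> all 30 keep it: {prob_all_keep(p, 30):.0%}")
```

Even a 99% per-person reliability leaves roughly a one-in-four chance that at least one of 30 people leaks, which is the sense in which trust does not scale.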
Our policy leans conservative because leaking infohazardous information could lead to catastrophic outcomes. In general, reducing the spread of infohazards means more than just keeping them away from companies or people that could understand and deploy them. It means keeping them away from anyone, since sharing information with someone increases the opportunities it has to spread.
Considerations
An infohazard policy needs to strike the right balance between what should and what should not be disclosed, and to whom. The following are a number of high level considerations that we took into account when writing our policy:
Coordination is critical for alignment. We must have structures of information and management that allow for resources to be deployed in a trustworthy manner. In other words, we need coordinated organizations. This is a major reason why Conjecture was founded, rather than just supplying EleutherAI with more compute.
Sharing information is critical for coordination. Some reasons include:
Coherence: ensuring Conjecture knows where to aim its research efforts
Oversight: ensuring people are doing useful things and resources are distributed well
Avoiding redundancy: ensuring no duplicate projects
Avoiding adversarial or destructively-competitive dynamics: ensuring people are not slowing down others in order to look better
Sharing information makes people more effective at doing alignment research.
Asking questions to many people about what you are working on helps a lot
99% of ideas are not solutions, they are brain food. Their entire +EV comes from being shared
Some infohazards are not that bad to spill, others are really bad to spill and it is usually not clear ahead of time which is which. We live in a bad timeline. We are not many years off from AGI; existing actors are racing towards AGI; and aligned organizations are not really listened to. We want to ensure that we do not worsen this situation.
Secrets are harder to keep for longer periods of time. It is easier to keep secrets for 1 month than 10 years. “Luckily” you expect this to be less of a problem if you have short timelines.
Secrets are harder to keep the more you have. It is easier to keep track of 1 secret than 100 secrets. Therefore, it is better to silo per repository or project than per individual technical insight. This is a crude level of granularity, which means that more things will be made part of secret packages than needed, with the goal of keeping the number of packages themselves small.
Safety vs Security. This policy is designed with safety in mind: the goal is to avoid leaks (in particular, accidental leaks), rather than being spy-resistant (infosec and opsec will be dealt with in future policies).
Functional Decision Theory. Don’t mess it up for everyone.
In other words, we need to balance many different considerations, not merely whether “it is an infohazard or not”.
The Policy
(Verbatim from Conjecture’s internal document.)
Introduction
This document is for Conjecture and includes employees, interns, and collaborators. Note that this policy is not retroactive; any past discussions on this subject have been informal.
This policy applies only to direct infohazards related to AGI Capabilities. To be completely clear: this is about infohazards, not PR hazards, reputational hazards, etc.; and this is about AGI capabilities.
Examples of presumptive infohazards:
1. Leaking code that trains networks faster
2. Leaking a new technique that trains networks faster
3. Leaking a new specific theory that leads to techniques that train networks faster
4. Letting it be known outside of Conjecture that we have used/built/deployed a technique that already exists in the literature to train networks faster
5. Letting it be known outside of Conjecture that we are interested in using/building/deploying a technique that already exists in the literature in order to train networks faster
1-3 are obvious. 4-5 are dangerous because they attract more attention to ideas that increase average negative externality. If in the future we want to hide more types of information that are not covered by the current policy, we should explicitly extend the scope of what is hidden.
Siloing of information and projects is important even within Conjecture. Generally any individual team member working on secret projects may disclose to others that they are working on secret projects, but nothing more.
The default mantra is “need to know”. Does this person need to know X? If not, don’t say anything. Ideally, no one that does not need to know should know how many secret projects exist, which projects people work on, and what any of those projects are about.
While one should not proactively offer that they are keeping a secret, we should strive for meta-honesty. This means that when asked directly we should be transparent that we are observing an infohazard policy that hides things, and explain why we are doing so.
Rules
There are three levels of disclosure that we will apply.
Secret: Information only shareable with specific individuals.
Private: Information only shareable with a fuzzy group.
Public: Information shareable with everyone.
We will consider these levels of disclosure for the following types of information:
Repositories: Entire repositories are easier to box than individual files and folders.
Projects: Specific projects or sub-projects are easier to box than issues within a project.
Ideas: Ideas are very hard to keep track of and make secret, so when possible, we will try to box them in repositories or projects. But there may be ideas that are exceptions.
Each project that is secret or private must have an access document associated with it that lists who knows about the secret and any whitelisted information. This document is a minor infosecurity hazard, but is important for coordination.
An appointed infohazard coordinator has access to all secret and private projects. For Conjecture, this person is Connor, and the succession chain goes Connor → Gabe → Sid → Adam. When collaborating with other organizations on a secret or private project, each organization’s appointed coordinator has access to the project. This clause ensures there is a dedicated person to discuss infohazards with, help set standards, and resolve ambiguity when questions arise. A second benefit of the coordinator is strategy: whoever is driving Conjecture should have a map of what we are working on and what we are intentionally not working on.
Leaking infohazardous information is a major breach of trust not just at Conjecture but in the alignment community as a whole. Intentional violation of the policy will result in immediate dismissal from the company. This applies to senior leadership as well. Mistakes are different from intentional leaking of infohazards.
More details on the levels of disclosure are below, and additional detail on consequences and the process for discerning if leaked information was shared intentionally or not is discussed in “Processes”.
Secret
Who: For an item to be a secret, it needs to have a precise list of people who know it. This list can include people outside of Conjecture. This list of people must be written down explicitly in the access document, and must include the appointed infohazard coordinator.
What is covered: All information related to secret projects must be treated as secret unless it is whitelisted. It may only be discussed with members of the secret project. The policy prohibits leaking secret information to anyone not on the project list. When in doubt as to whether information is “related,” talk to the project lead or appointed infohazard coordinator.
Whitelisting: For any given secret, there may be some information that is okay to share with more people than just the secret group. All whitelisted information must be explicitly written in the access document and agreed on by those who know the secret. Without this written and acknowledged agreement, sharing any information related to the secret will be considered a breach of policy. Whitelisted information is considered private and must follow the rules of private information below. The appointed infohazard coordinator must abide by the same whitelisting rules as all other members of a secret group.
Private
Who: For a thing to be private, it needs to have a group or groups that are privy to it. In practice, this could mean Conjecture, or groups from the Alignment community (e.g., “Conjecture and Redwood know about this.”) The document for each private project must explicitly state which groups know the information. Information related to private projects may only be shared with members of these groups. See “Sharing Information” in processes below for more details.
Try to define fuzzy boundaries: Private things can be much less specific than secret things, and the “groups” shared with are sometimes undefined. This is a necessary evil because we do not have the bandwidth to keep formal track of all interactions and idea-sharing; these broader categories are necessary to facilitate collaboration.
Limit publicity: No private information may be published to sites like LessWrong or AlignmentForum, or in a paper. Please refrain from posting about private information on Discord, Slack or other messaging platforms used in the future by Conjecture, unless in relatively small and protected environments where the “group” or audience is clear.
Use sparingly: Avoid relying on private as a category; it is clearer and preferable to make information either public or secret.
Public
Who: Literally everyone. This is information you could tell your mother, or all your friends, or post on Twitter. That does not mean that you should.
Discretion: It is still important to apply discretion to sharing public information. Posts that we write on the AlignmentForum are public, but they are not press releases. Information we discuss at EAG is public, but it is not something we would email everyone about.
Forums are public: LessWrong, AlignmentForum, EA Forum, EleutherAI Discord (public channels), and similar sites are considered public outlets. Any information shared there must be deemed public.
No take-back-sies: Once public information has been shared, we cannot take it back. We may revise our discretion level and move information to secret, but we should assume anything shared is permanently, irreversibly, out there.
Processes
1. Assigning Disclosure Levels
For new projects: Whenever a new project is spun up, the appointed infohazard coordinator and the project lead will work together to assess if the content of the project is infohazardous and if it should be assigned as secret, private, or public. Each conversation will include:
(1) what information the project covers
(2) in what forms the information about the project already exists, e.g., written, repo, AF post, etc.
(3) who knows about the project, and who should know about the project
(4) proposed disclosure level
If the project is determined to be secret or private, an access document must be created that lists who knows about the project and any whitelisted information. Any information about the project that currently exists in written form must be moved to and saved in a repository or project folder with permissions limited to those on the access document list.
Anyone can ask the appointed infohazard coordinator to start a project as a secret. The default is to accept. At Conjecture, the burden of proof is on Connor if he wants to refuse, and he must raise an objection that proves that the matter is complicated enough to not accept immediately, and might change in the future. In general, any new technical or conceptual project that seems like it could conceivably lead to capabilities progress should be created as secret by default.
(We will return to this clause after some months of trialing this policy to write better guidelines for deciding what status to assign projects).
For current projects (changing disclosure levels): Anyone can propose changing the disclosure level of a project.
Secret → Private: To move a project from secret to private, all members of the project and the appointed infohazard coordinator must agree.
Private → Public: Before making public any information, all members of the project must agree. Also, members must consult external trusted sources and get a strong majority of approval. We are considering:
[We are in the process of reaching out to individuals and we will include them after they confirm. If you have suggestions for individuals to include please add a comment here.]
E.g.: Eliezer Yudkowsky
Paul Christiano
Evan Hubinger
John Wentworth
…
When collaborating with another organization, there should be one or more individuals that both parties agree are trusted to adjudicate on the matter.
Public → Private: Avoid this. Redacting previously public information is difficult, and in the rare circumstance that this should be done it is presumably because the information is infohazardous enough that it should be made secret.
Public or Private → Secret: This should only be considered in situations where infohazardous information is determined to be particularly sensitive. Furthermore, this should be done with care, in order to avoid attracting more attention from the Streisand effect.
Here, the burden of proof is on the individual proposing the change, and they should discuss the matter directly with the project leader or the appointed infohazard coordinator. If the coordinator (and in most cases the project lead) agree, follow the process in “for new projects” above.
Additionally, if the project was private and if this is feasible, check in with everyone that currently has access to the information to inform them that the disclosure level is changing to secret, and have them read the infohazard policy. Each person must be added to the list of people who know about the project. If these individuals will no longer be working on the project, they should still be noted as knowing about the project, but in a separate list.
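As an illustration only (this sketch is not part of the internal document), the level changes described above can be written as a small lookup table. The names `Level`, `REQUIRED_SIGNOFF`, `AVOID`, and `missing_signoffs`, and the sign-off labels, are hypothetical and simply paraphrase the text.

```python
from enum import Enum


class Level(Enum):
    SECRET = "secret"
    PRIVATE = "private"
    PUBLIC = "public"


# Hypothetical labels for who must agree before a change (paraphrasing the text).
REQUIRED_SIGNOFF = {
    (Level.SECRET, Level.PRIVATE): {"all_project_members", "coordinator"},
    (Level.PRIVATE, Level.PUBLIC): {"all_project_members", "trusted_sources_strong_majority"},
    # Moving to secret: the proposer carries the burden of proof, and the
    # coordinator (and, in most cases, the project lead) must agree.
    (Level.PRIVATE, Level.SECRET): {"coordinator"},
    (Level.PUBLIC, Level.SECRET): {"coordinator"},
}

# Public -> Private is the change the policy says to avoid.
AVOID = {(Level.PUBLIC, Level.PRIVATE)}


def missing_signoffs(current, target, obtained):
    """Return the sign-offs still needed (sorted) for a proposed level change."""
    change = (current, target)
    if change in AVOID:
        raise ValueError("Public -> Private should be avoided; consider Secret instead")
    if change not in REQUIRED_SIGNOFF:
        raise ValueError(f"No process described for {current.value} -> {target.value}")
    return sorted(REQUIRED_SIGNOFF[change] - set(obtained))


print(missing_signoffs(Level.SECRET, Level.PRIVATE, {"all_project_members"}))
```

A change that the text does not describe, such as going directly from secret to public, raises an error in this sketch rather than guessing at a process.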
2. Sharing Information
Each project must have an access document associated with it that lists who knows about the information and what information is whitelisted to discuss more freely. This list will be kept in a folder or git repository that only members of the secret or private project have access to.
Secret information can only be shared with the individuals who are written on the access list. Anyone in a secret project may propose adding someone new to the secret. First discuss adding the individual with the project leader, and then inform all current members and give them a chance to object. If someone within the team objects, the issue is escalated to the appointed infohazard coordinator, who has the final word. If the team is in unanimous agreement, the coordinator gets a final veto (it is understood that the coordinator is supposed to only use this veto if they have private information as to why adding this person would be a bad idea).
Private information can only be shared with members of groups who are written on the access list. Before sharing private information with person X, first check if the private piece of information has already been shared to someone from the same group as X. Then, discuss general infohazard considerations with X and acknowledge which select groups have access to this information. Then, notify others at Conjecture that you have shared the information with X. In case of doubt, ask first.
Public information can be talked about with anyone freely, though please be reasonable.
For all secret and private projects, by default information sharing should happen verbally and should be kept out of writing (in messages or documents) when possible.
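For readers who find it easier to follow a process as code, here is a minimal sketch of the rule for adding someone to a secret. It is an illustration rather than anything Conjecture uses: `AccessDocument` and `propose_addition` are hypothetical names, and the access document is modelled as a simple in-memory object.

```python
from dataclasses import dataclass, field


@dataclass
class AccessDocument:
    """Hypothetical stand-in for a secret project's access document."""
    project: str
    people: set = field(default_factory=set)         # everyone who knows the secret
    whitelisted: list = field(default_factory=list)  # items agreed as shareable more widely

    def may_share_with(self, person: str) -> bool:
        return person in self.people


def propose_addition(doc, candidate, lead_consulted, objectors, coordinator_approves):
    """Apply the add-someone process and return a short description of the outcome."""
    if not lead_consulted:
        return "not started: discuss with the project leader first"
    if objectors:
        # Any objection escalates to the coordinator, who has the final word.
        route = "coordinator ruling after objection"
    else:
        # Unanimous team: the coordinator still holds a final veto.
        route = "coordinator veto check"
    if not coordinator_approves:
        return f"rejected ({route})"
    doc.people.add(candidate)
    return f"added ({route})"


doc = AccessDocument("demo-project", people={"alice", "coordinator"})
print(propose_addition(doc, "bob", True, objectors=set(), coordinator_approves=True))
print(doc.may_share_with("bob"), doc.may_share_with("carol"))
```

Both routes end with the coordinator, which mirrors the text: an objection hands them the final word, and unanimity still leaves them a veto.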
3. Policy Violation Process
We ask present and future employees and interns to sign nondisclosure agreements that reiterate this infohazard policy. Intentional violation of the policy will result in immediate dismissal from the company. The verdict of whether the sharing was intentional or not will be determined by the appointed infohazard coordinator, but will be transparent to all members privy to the secret (i.e., at Conjecture, Connor may unilaterally decide, but has his reputation and trust at stake in the process).
C-suite members of Conjecture are not above this policy. This is imperative because so much of this policy relies on the trust of senior leaders. As mentioned above, the chain of succession on who knows infohazards goes Connor → Gabe → Sid → Adam; though actual succession planning is outside the scope of this document. If it is Connor who is in question for intentionally leaking an infohazard, Gabe will adjudicate the process with transparency available to members of the group privy to the secret. Because of the severity of this kind of decision, we may opt to bring in external review to the process and lean on the list of “Trusted Sources” above.
Mistakes are different from intentional sharing of infohazards. We will have particular lenience during the first few months that this policy is active as we explore what it is like to live with. We want to ensure that we create as robust a policy as possible, and encourage employees to share mistakes as quickly as possible such that we can revise this policy to be more watertight. Therefore, unless the sharing of infohazardous information is particularly egregious, nobody will be fired for raising a concern in good faith.
4. Information Security and Storage
[Details of Conjecture’s infosecurity processes are—for infosecurity reasons—excluded here.]
5. Quarterly Policy Review
We will review this policy as part of our quarterly review cycle. The policy will be discussed by all of Conjecture in a team meeting, and employees will be given the opportunity to talk about what has gone well and what has not gone well. In particular, the emphasis will be on clarifying places where the policy is not clear or introduces contradictions, and adding additional rules that promote safety.
The quarterly review will also be an opportunity for Project Leaders to review access documents to ensure lists of individuals and whitelisted information for each project are up-to-date and useful.
This policy will always be available for employees at Conjecture to view and make suggestions on, and the quarterly review cycle will be an opportunity to review all of these comments and make changes as needed.
Additional Considerations
The information below is not policy, but is saved alongside Conjecture’s internal policy for employee consideration.
Example Scenarios
It is difficult to keep secrets and few people have experience keeping large parts of their working life private. Because of this, we anticipate some infohazardous information will leak due to mistakes. The following examples are common situations where infohazardous information could leak; we include potential responses to illustrate how an employee could respond.
(1) You have an idea about a particular line of experimentation in a public project, but are concerned that some of the proposed experiments may have capability benefits. You are weighing whether to investigate the experiments further and whether or not you should discuss the matter with others.
Potential response: Consider discussing the matter in private with the project lead or appointed infohazard coordinator. If it is unknown whether information could potentially be infohazardous, it is safer to assume risk. A secret project could be spun off from the public project to investigate how infohazardous it is. If the experimental direction is safe, it could be updated to be public. If the experimental direction is infohazardous, it could stay secret. If the experimental direction is sufficiently dangerous, the formerly public project could be made secret by following the process in “Assigning Disclosure Levels” in the policy.
(2) You are in the same situation and have an idea for a particular line of experimentation in a public project, but this time believe P(experiments result in capabilities boost) is very small but still positive. You are considering whether there is any small but positive probability with which you should act differently than scenario (1).
Potential response: Ultimately, a policy should be practical. Sharing information makes people more effective at doing alignment research. There is always a small probability that things can go wrong, but if you feel that an idea has low P(experiments result in capabilities boost) while also being additive to alignment, you can discuss it without treating it as secret. That said, if you have any doubt as to whether this is the case or not in a particular situation, see scenario (1).
(3) You are at a semi-public event like EAG and a researcher from another alignment organization approaches and asks what research projects you and other Conjecture employees are working on.
Potential response: Mention the public projects. You may mention the fact that there are private and secret projects that we do not discuss, even if you are not part of any. If the individual is a member of one of the groups, you may mention the private projects that their group is privy to.
(4) You are at an alignment coffee-time and someone mentions a super cool idea that is related to a secret project you are working on. You want to exchange ideas with this individual and are worried that you might not have the opportunity to speak in the future.
Potential response: The fact that this is a time-limited event should not change anything. One must go through the process, and the process takes time. This is a feature and not a bug. Concretely, this means you do not discuss that secret project or the ideas related to the project with that person. Feel free to learn more about how far that person is in their idea though.
(5) You are talking with people about research ideas and accidentally share potentially infohazardous information. You realize immediately after the conversation and are wondering if you should tell the people you just spilled information to that the ideas are infohazardous and should be kept secret.
Potential response: Mention this to the project lead and appointed infohazard coordinator as soon as possible before returning to the people, and discuss what to do with them. Because these situations are highly context-dependent, it is best to treat each on a case-by-case basis rather than establishing one general rule for mistakes.
(6) You are at EAG and you come across someone talking publicly about an idea which is very similar to an infohazardous project you are working on. You are considering whether to talk to them about the risk of sharing that information.
Potential response: This depends on how good you are with words. If you confidently know you are good enough to hold this conversation without spilling beans, go. Else, if you have any doubt, mention this to your project lead and the appointed infohazard coordinator.
Best Practices
The following are a number of miscellaneous recommendations and best practices on infohazard hygiene. Employees should review these and consider if their current approach is in line with these recommendations.
Exercise caution. Consider potentially-infohazardous ideas to be infohazardous until you have checked with your project lead. Given that potential infohazards are private by default, attempt to formalize them as secret as quickly as possible.
Be careful with written material. Writing is more helpful than verbal communication for sharing information, but riskier. Writing helps coordination because written artifacts scale, and they’re easier to process. But writing leaves more traces, meaning that it is harder to keep a secret. Written stuff is more often recorded than verbal stuff, very often automatically, and very often even unknowingly.
Consider carefully the audience of any conversation. The audience of a conversation matters. Capabilities engineers, individuals with executive power, public figures, influencers, etc. are riskier audiences, and even more caution than usual should be used when speaking with them even if you are not discussing infohazardous projects. This also applies to people with a history of sharing (what they or we think is) infohazardous information, or who do not take infohazards seriously.
Avoid implying the existence of infohazards. It is sometimes best to answer directly that information is infohazardous and that you are not willing to speak further on it; this is a “hard no” answer. But most of the time, it is easier and safer to avoid implying the existence of infohazards. Defer to talking about other work, and do not proactively offer that you are working on infohazardous projects if it can be avoided.
Avoid bragging, especially to romantic interests. It is easy to justify bypassing the process by thinking that your romantic partner is not a concern. However, it is important to remember that this process is not only about you, but about everyone you work with: the question is not whether or not you evaluate your romantic partner to be a concern, but whether or not you want everyone you work with to have to evaluate this too. It is also about whether you want to evaluate that everyone’s romantic partner is not a concern. This does not scale. If you think doing so is actually worth it, talk with your team about adding them to the secret. However, it is likely much easier to just not mention infohazardous information to them in the first place.
Be careful with Google Docs, since they don’t delete editing history. It is easy to accidentally spread infohazards that have been deleted from documents simply because they were not deleted from the version history. Do not make this mistake. Sharing documents with other organizations should be done only in rare circumstances in general, but when it must be done, share view-only versions such that file history cannot be viewed. Additionally, share with specific people instead of making it readable to anyone with a URL.
Psychological Safety
Working on a secret project and not being able to talk about what you’re doing and thinking about can take an emotional toll. The nature of Conjecture (startup, generally young, mostly immigrants) means that for most employees, coworkers provide the majority of socialization, and a large aspect of socialization with coworkers is talking about projects and ideas.
On one hand, the difficulty of secret-keeping should be embraced. The fact that it takes an emotional toll is not coincidence, and is well aligned with reality. Mitigations against this may make things worse, and we should default towards not employing people if they have difficulty holding secrets.
On the other hand, we do not currently have the bandwidth to be perfectly selective as to who we hire and assign to secret projects. And we can’t rely on people self-reporting that they’ll be incapable of holding a secret before being hired or assigned to a project. Most people don’t have a good counterfactual model of themselves.
Therefore psychological safety is not just a concern for the emotional well-being of employees but also for the robustness of this policy. Someone who is feeling stressed or isolated is more likely to breach secrecy. Emotional dynamics are just as real a factor in the likelihood that secrets get shared as the number of people who know the secret. In both cases we assume human fallibility. If we only ever hired infallible people, there would be no reason to have internally siloed projects.
Potential risk factors that amplify the likelihood that an infohazard is revealed:
A person’s primary project(s) are secret
Long-running secret projects
Particularly scary projects
Non-obviously scary projects
Trusted friends are not included in the silo
Personality and life circumstances
As such, we will consider some possible mitigations in our approach to infohazardous projects, such as not assigning people only to siloed projects, siloing projects among collaborators who are used to being very open with each other, or adding to project silos a trusted emotional support person who knows only high-level and not implementation details. Note that Conjecture will not guarantee following any of these steps, and therefore this is not policy but rather general considerations.
In general, employees reading this policy should understand that mental health and psychological safety are taken seriously at Conjecture, and that if they ever have concerns about this, they should raise them with senior management or whomever else they are comfortable speaking with. Rachel and Chris have both volunteered as confidants if individuals would prefer to express concerns to someone besides Connor, Gabe, or Sid.
An additional emotional consideration is that it should cost zero social capital to have and keep something secret. This is very much not the default without a written policy, where it often costs people social capital and additional effort to keep something secret. The goal at Conjecture is for this not to be the case, and for anyone to be able to comfortably keep things secret by default without institutional or cultural pushback. We also intend for this policy to reduce overhead (the need to figure out bespoke solutions for how to handle each new secret) and stress (the psychological burden of keeping a secret). Having access to a secret is by no means a sign of social status. In that vein, a junior engineer might have access to things that a senior engineer does not.