Executive summary: Training Data Attribution (TDA) is a promising but underdeveloped tool for improving AI interpretability, safety, and efficiency, though its public adoption faces significant barriers due to AI labs’ reluctance to share training data.
Key points:
TDA identifies influential training data points to understand their impact on model behavior, with gradient-based methods currently the most practical approach.
Running TDA on large-scale models is now feasible but remains untested on frontier models, with efficiency improvements expected within 2-5 years.
Key benefits of TDA for AI research include mitigating hallucinations, improving data selection, enhancing interpretability, and reducing model size.
Public access to TDA tooling is hindered by AI labs’ desire to protect proprietary training data, avoid legal liabilities, and maintain competitive advantages.
Governments are unlikely to mandate public access to training data, but selective TDA inference or alternative data-sharing mechanisms might mitigate privacy concerns.
TDA’s greatest potential lies in improving AI technical safety and alignment, though it may also accelerate capabilities research, potentially increasing large-scale risks.
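To make the first key point concrete, here is a minimal sketch of one gradient-based scoring idea: rate each training example by the dot product of its loss gradient with the loss gradient of a test example. This is an illustration only, not the method described in the summarized post. The linear model, the squared loss, the toy data, and the names `loss_gradient` and `influence_scores` are all assumptions made for the example.

```python
def loss_gradient(weights, features, target):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w (toy linear model)."""
    residual = sum(w * x for w, x in zip(weights, features)) - target
    return [residual * x for x in features]


def influence_scores(weights, train_set, test_point):
    """Dot product of each training gradient with the test gradient.

    A positive score means a small gradient step on that training example
    would, to first order, reduce the loss on the test example.
    """
    test_grad = loss_gradient(weights, *test_point)
    scores = []
    for features, target in train_set:
        train_grad = loss_gradient(weights, features, target)
        scores.append(sum(a * b for a, b in zip(train_grad, test_grad)))
    return scores


# Hypothetical toy data: (features, target) pairs.
weights = [1.0, 0.0]
train_set = [
    ([1.0, 0.0], 0.0),  # pulls the model in the direction the test point wants
    ([0.0, 1.0], 1.0),  # touches a different feature, so no first-order effect
    ([1.0, 0.0], 2.0),  # pulls the model in the opposite direction
]
test_point = ([1.0, 0.0], 0.0)

scores = influence_scores(weights, train_set, test_point)
# Indices of training examples, most helpful first.
ranked = sorted(range(len(train_set)), key=lambda i: scores[i], reverse=True)
```

At the scale of a real model, the expensive part is computing and storing per-example gradients, which is one reason efficiency matters so much for the feasibility claims above.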
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.