This is an interesting #OpenPhil grant. $230K for a cyber threat intelligence researcher to create a database that tracks instances of users attempting to misuse large language models.
https://www.openphilanthropy.org/grants/lee-foster-llm-misuse-database/
Will user data be shared with the user’s permission? How will an LLM determine a user’s intent, distinguishing deliberately harmful entries from user error, safety testing, independent red-teaming, playful entries, and so on? If a user is placed on the database, is she notified? How long do you stay in LLM prison?
I did send an email to OpenPhil asking about this grant, but so far I haven’t heard back.