[Link] "Progress Update October 2019" (Ought)

Disclosure: I do contract work for Ought.

https://ought.org/updates/2019-10-28-progress-update

tl;dr:

This is an update on our progress towards our goals over the last ten months. If you can only read 650 characters of this update, like the judges in our experiments, here’s what you need to know:

1. We switched from experiments that break down tasks (factored generation) to experiments that break down evaluating expert work (factored evaluation)

2. 60+ participants have been working 150+ hours per week on our experiments

3. We’re building Mosaic2, an app that streamlines running varied question-answer experiments (factored evaluation, debate, etc.)

4. We’re exploring if language models can automate decompositions, getting 30% accuracy on the Complex Web Questions dataset

5. William Saunders joined as ML engineer, Jungwon Byun as COO

6. We’re hiring an engineering team lead and a business operations person. We’ll pay $5000 for a successful referral! [for the engineering team lead]

Summary of Ought’s experiment structure:

Skipping over a few details, our experiments have the following structure:

-There is a person, the judge.

-The judge faces an overall (root) question: “What does the author of this Pitchfork music album review think of the work being reviewed?”

-This judge is handicapped: they can read at most 650 characters, so they can never read the whole review. Thus, the judge does not have the context required to answer this root question.

-However, the judge has access to two experts who can read the whole text and who provide two possible answers.

-Unfortunately, only one of these experts is honest; the other is malicious and is trying to trick the judge into accepting a wrong but plausible-sounding answer.

-Without ever seeing the whole text, and only getting information through the experts, the judge must ask follow-up questions to the experts to decipher which answer to the root question is honest and select that one.

-No one can lie about quotes or quotes’ positions in the text: the quotes from the text are the ground truth anchoring this game.

-Up to 6 total questions can be asked by the judge before a decision must be made.

Whenever the judge asks the experts a question, this generates a new experiment: now a different judge must decide which of two expert answers to that question is honest and which is malicious, using the same recursive process. For this to terminate, eventually a judge must choose an answer without asking any subquestions.
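The recursion described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Ought's implementation: `evaluate`, `ask_experts`, `choose_subquestions`, `decide`, and the `max_depth` cutoff are all hypothetical names introduced here, and the 650-character reading limit is not modeled. Only the limit of 6 questions per judge comes from the post.

```python
from typing import Callable, List, Tuple

MAX_QUESTIONS = 6  # per-judge question limit stated in the post

Candidates = Tuple[str, str]      # two expert answers; the judge is not told which is honest
Findings = List[Tuple[str, str]]  # (subquestion, answer chosen by that subquestion's judge)


def evaluate(
    question: str,
    ask_experts: Callable[[str], Candidates],
    choose_subquestions: Callable[[str, Candidates], List[str]],
    decide: Callable[[str, Candidates, Findings], str],
    depth: int = 0,
    max_depth: int = 2,  # assumed cutoff so the recursion terminates
) -> str:
    """Return the answer that the judge of `question` selects."""
    candidates = ask_experts(question)
    findings: Findings = []
    if depth < max_depth:
        for sub in choose_subquestions(question, candidates)[:MAX_QUESTIONS]:
            # Each subquestion becomes a new experiment with a different judge.
            chosen = evaluate(sub, ask_experts, choose_subquestions, decide,
                              depth + 1, max_depth)
            findings.append((sub, chosen))
    # At max_depth the judge decides without asking any subquestions.
    return decide(question, candidates, findings)


# Toy stand-ins, purely illustrative.
judged: List[str] = []


def toy_experts(q: str) -> Candidates:
    return (f"honest({q})", f"malicious({q})")


def toy_subquestions(q: str, c: Candidates) -> List[str]:
    return [f"{q}/a", f"{q}/b"]


def toy_decide(q: str, c: Candidates, findings: Findings) -> str:
    judged.append(q)
    return c[0]  # a real judge would weigh the findings instead


result = evaluate("root", toy_experts, toy_subquestions, toy_decide)
print(result, len(judged))
```

With two subquestions per judge and `max_depth=2`, the toy run involves seven judges (1 + 2 + 4), and the root judge decides last because it waits for the judgments on its subquestions.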

Some ML work as well:

Complex Web Questions

First, we took the Complex Web Questions dataset, which contains questions like this:

-The actress that had the role of Martha Alston, plays what role in Finding Nemo?

-Which school that Sir Ernest Rutherford attended has the latest founding date?

-What movies does Leo Howard play in and that is 113.0 minutes long?

-Where is the end of the river that originates in Shannon Pot?

We built an end-to-end system using GPT-2 that breaks the questions into subquestions, queries Google to answer each of the subquestions, and aggregates the answers back together to answer the original question. Currently, our system answers about 30% of the questions in CWQ correctly.
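As a rough sketch of the decompose, answer, and aggregate loop described above: the function below takes the decomposer and the lookup step as plain callables, so no GPT-2 or Google API is assumed. The names `answer_by_decomposition`, `decompose`, and `lookup`, the `#1`-style placeholders, and the choice to aggregate by returning the last subanswer are all assumptions made for illustration; the post does not say how Ought's system does any of this.

```python
from typing import Callable, Dict, List


def answer_by_decomposition(
    question: str,
    decompose: Callable[[str], List[str]],
    lookup: Callable[[str], str],
) -> str:
    """Answer `question` by answering its subquestions in order."""
    templates = decompose(question)
    if not templates:
        return lookup(question)  # nothing to decompose: ask directly
    answers: List[str] = []
    for template in templates:
        sub = template
        # Fill placeholders such as "#1" with earlier answers
        # (highest index first, so "#12" is not mistaken for "#1").
        for i in range(len(answers), 0, -1):
            sub = sub.replace(f"#{i}", answers[i - 1])
        answers.append(lookup(sub))
    return answers[-1]  # assumed aggregation: the last subanswer


# Toy stand-ins for the language model and the search step.
toy_decompositions: Dict[str, List[str]] = {
    "Where is the end of the river that originates in Shannon Pot?": [
        "Which river originates in Shannon Pot?",
        "Where is the end of #1?",
    ],
}
toy_facts: Dict[str, str] = {
    "Which river originates in Shannon Pot?": "River Shannon",
    "Where is the end of River Shannon?": "Shannon Estuary",
}

final = answer_by_decomposition(
    "Where is the end of the river that originates in Shannon Pot?",
    lambda q: toy_decompositions.get(q, []),
    lambda q: toy_facts[q],
)
print(final)
```

Passing the model and the search step in as functions keeps the control flow testable with lookup tables like the ones here; a real system would replace them with model and search calls.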
