Thank you for your comment Kris.
I’m unclear why you are hesitant about the claim that the bot has the potential to revolutionise the psychology evidence base. I wonder if you may have inadvertently strawmanned my argument by reading only the section you quoted? That section was not intended to support this claim.
Instead, it might be more helpful to refer to Appendix 2; I include a heavily abbreviated version here:
The source for much of this section is conversations with practising psychiatrists and psychologists.
Currently some psychological interventions are substantially better evidenced than others.
<SNIP>
Part of the aim of this project is to address this in two ways:
(1) Providing a uniform intervention that can be assessed at scale
<SNIP>
(2) Allowing an experimental/scientific approach which could provide an evidence base for therapists
<SNIP>
Crucially, TIO is fundamentally different from other mental health apps: it has a free-form conversational interface, similar to an actual conversation (unlike other apps, which either have no conversational interface at all or only a fairly restricted, “guided” one). This means that TIO is uniquely well-positioned to achieve this goal.
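To make the distinction concrete, here is a minimal sketch (in Python; purely hypothetical code, not TIO’s actual implementation) of the difference between a “guided” interface and a free-form one:

```python
# A "guided" app constrains the user to a fixed menu at each step,
# and replies come from a fixed script:
def guided_turn(choice_index: int) -> str:
    scripted_replies = [
        "Let's try grounding: name five things you can see.",
        "Low moods often pass; would you like to log this one?",
        "Breathe in for four counts, hold for four, out for four.",
    ]
    return scripted_replies[choice_index]

def generate_reply(user_message: str) -> str:
    """Stub standing in for a generative model call in a real system."""
    return f"Tell me more about what you mean by: {user_message!r}"

# A free-form interface accepts arbitrary text, which is what makes
# experimenting on individual responses possible at all:
def freeform_turn(user_message: str) -> str:
    return generate_reply(user_message)
```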
To expand on item (2): when I, as someone who speaks to people in a therapeutic capacity, choose to say one thing rather than another, there is no granular evidence about that specific choice. This feels all the more salient when being trained or training others, and dissecting the specific things said in a training role play; those discussions largely operate in an evidence vacuum.
The professionals that I’ve spoken to thus far have not yet been able to point me to evidence as granular as this.
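To illustrate what evidence “as granular as this” would even look like, here is a minimal sketch of the kind of record that would need to exist at scale. Every name in it is hypothetical; as far as I can tell, no dataset at this level of granularity exists:

```python
from dataclasses import dataclass

@dataclass
class ResponseEvidence:
    """One unit of the granular evidence described above (hypothetical)."""
    client_utterance: str    # what the client just said
    chosen_response: str     # the specific thing said in reply
    outcome_score: float     # some downstream measure of how well it landed
    n_comparable: int        # how many comparable exchanges back this up

# For a choice like this, there is currently nothing to consult:
example = ResponseEvidence(
    client_utterance="I feel like nothing I do matters.",
    chosen_response="That sounds exhausting. What does a typical day look like?",
    outcome_score=0.0,  # unknown; no data exists at this granularity
    n_comparable=0,
)
```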
If you know of any such evidence, please do let me know—it might help me to spend less time on this project, and I would also find that evidence very useful.
Thanks for your reply; I hope I’m not wasting your time.
But Appendix 2 also seems to imply that the evidence base for CBT covers the approach in its entirety. What we think works in a CBT protocol for depression is different from what we think works in a CBT protocol for panic disorder (or OCD, or …). And there is data on which groups none of those protocols work for.
In CBT that is mainly based on a functional analysis (or on assumed processes), and that functional analysis creates the context that determines which specific things one would or wouldn’t say. It also provides context for how you would define ‘empathetic responses’.
(Just to show how old this discussion is: there is a paper from 1966 claiming that Rogers probably also used implicit functional analyses to ‘decide’ to what extent he would or wouldn’t reinforce certain (mal)adaptive behaviors. The bot might generate very interesting results to contribute to that discussion!)
Would you consider evidence that a diagnosis-specific CBT protocol works better than a general CBT protocol for a particular group to be relevant to the claim that there is evidence about which reactions (sentences) would or wouldn’t work (and for whom)?
So I just can’t imagine revolutionizing the evidence base for psychological treatments using a ‘uniform’ approach (and thus without taking characteristics of the person into account), but maybe I don’t get how diverse this bot is. I just interacted a bit with the test version, and it supported my hypothesis that it could be (a bit) harmful to certain groups of people. (*edit* you seem to anticipate this by not encouraging re-use.) But it’s still great for most people!
Thanks very much, Kris; I’m very pleased that you’re interested enough in this to write these comments.
And as you point out, I didn’t respond to your earlier point about the evidence base for an entire approach, as opposed to (e.g.) an approach applied to a specific diagnosis.
The claim that the “evidence base for CBT” is stronger than the “evidence base for Rogerian therapy” came from psychologists/psychiatrists using a bit of shorthand. I think they really mean something like: “compare the evidence base for CBT as applied to X, across lots of values of X, with the evidence base for Rogerian therapy as applied to X; the latter is more likely to have gaps for many values of X, and more likely to rest on poorer-quality evidence where it is not missing entirely”.
It’s worth noting that while the current assessment mechanism is the question described in Appendix 1f, this is, as alluded to, not the only question that could be asked; it’s also possible for the bot to incorporate other standard assessment instruments (the PHQ-9, the GAD-7, or whatever) and adapt accordingly.
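As a rough sketch of how incorporating a standard instrument might look (the PHQ-9 severity bands below are the standard published cut-offs; the function itself, and how the bot would adapt, are hypothetical stubs):

```python
def phq9_severity(item_scores: list[int]) -> str:
    """Score a completed PHQ-9: nine items, each rated 0-3 (total 0-27)."""
    if len(item_scores) != 9 or not all(0 <= s <= 3 for s in item_scores):
        raise ValueError("PHQ-9 requires nine item scores in the range 0-3")
    total = sum(item_scores)
    if total <= 4:
        return "minimal"
    if total <= 9:
        return "mild"
    if total <= 14:
        return "moderate"
    if total <= 19:
        return "moderately severe"
    return "severe"

# The bot could administer the questionnaire in-conversation and adapt
# (or signpost to human help) based on the resulting band:
band = phq9_severity([1, 2, 1, 2, 1, 0, 1, 0, 0])  # -> "mild" (total 8)
```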
Having said that, I’d say that assessment on its own doesn’t feel revolutionary to me. What really does seem revolutionary is that, with the right scale, I might be able to ask: this client said XYZ to me; if I had responded with ABC or with DEF, which of those would have given me a better response? And I could test something that granular and still get a non-tiny sample size.
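To illustrate, here is a minimal sketch of the kind of experiment this would enable. Everything in it is hypothetical: the candidate responses stand in for ABC/DEF, and the outcome measure could be the Appendix 1f question, a PHQ-9 delta, or anything else logged downstream.

```python
import random
from collections import defaultdict

# Two candidate replies to the same client utterance ("ABC" and "DEF" above).
CANDIDATES = {
    "ABC": "That sounds really difficult. Can you tell me more?",
    "DEF": "What do you think is driving that feeling?",
}

outcomes: dict[str, list[float]] = defaultdict(list)

def choose_response() -> tuple[str, str]:
    """Randomly assign one arm per conversation: the core of the experiment."""
    arm = random.choice(list(CANDIDATES))
    return arm, CANDIDATES[arm]

def record_outcome(arm: str, score: float) -> None:
    """Log the downstream measure for whichever arm was served."""
    outcomes[arm].append(score)

def arm_means() -> dict[str, float]:
    """With enough conversations, these means become comparable at a
    non-tiny sample size; a real analysis would use a proper test."""
    return {arm: sum(s) / len(s) for arm, s in outcomes.items() if s}
```

(If per-person characteristics matter, as Kris suggests, the same design can randomise within subgroups, at the cost of needing correspondingly more data per subgroup.)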