[Question] Sample size and clustering advice needed

[July 30 update]: We have an answer regarding the sample size if the intra-cluster correlation coefficient is assumed to be zero. This sample size calculator can be used.

EA Cameroon needs advice on sample size and cluster inclusion for the statistical impact evaluation of their COVID-19 project. The project should ideally start toward the end of the week.

Data should be gathered before and after the main part of the project (after one month).

The idea is to count how many persons out of a fixed number wear a face covering, and to record how long this counting took. This information can be used as a proxy for preventive measures and social distancing.

I would like to ask about the sample size and the inclusion of clusters. There are 180,000 persons in the campaign area, which has 6 villages/parts. Volunteers would prefer not to travel to all 6 campaign villages, and even less so to an equal number of non-campaign villages, as the non-intervention communities are distant.

Different languages are spoken in the 6 parts, but the campaigning will include all of these languages. Otherwise, the parts are similar. Since little information is currently broadcast, the campaign may increase the share of persons wearing a face covering from 50% to at least 60% (or by an equivalent relative increase of 20% from another baseline). Could only, e.g., 3+3 villages be included? Or 6 intervention + 3 non-intervention? How important, in terms of statistical power, is it to include all clusters and an equal number of non-intervention clusters? How many persons should be observed at each place?
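To make the question concrete, here is a rough Python sketch of the calculation I have in mind. It uses the standard two-proportion formula for 50% vs. 60%, then inflates it by the usual design effect 1 + (m - 1) * ICC. The 5% two-sided significance level, the 80% power, and the ICC values are my assumptions, not measured figures, and the sketch only compares the two arms at one point in time (it ignores the before/after design). The function names are just for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Persons per arm for a two-sided z-test of p1 vs p2, ignoring clustering (ICC = 0)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


def persons_per_cluster(n_flat, clusters_per_arm, icc):
    """Persons to observe in each cluster so that the effective sample size
    k * m / (1 + (m - 1) * icc) reaches n_flat. Returns None when no cluster
    size is large enough for this number of clusters."""
    denom = clusters_per_arm - n_flat * icc
    if denom <= 0:
        return None
    return math.ceil(n_flat * (1 - icc) / denom)


if __name__ == "__main__":
    n_flat = n_per_arm(0.50, 0.60)  # assumed alpha = 0.05, power = 0.80
    print("persons per arm with ICC = 0:", n_flat)
    # Hypothetical ICC values, for illustration only.
    for k in (3, 6):
        for icc in (0.0, 0.005, 0.01):
            print(k, "clusters per arm, ICC", icc, "->", persons_per_cluster(n_flat, k, icc))
```

Under these assumptions, the ICC = 0 case gives 385 persons per arm. With a nonzero ICC, the number of clusters can become the binding constraint: once the ICC times the unclustered sample size exceeds the number of clusters per arm, observing more persons per village no longer helps, which is why I am asking whether 3+3 villages can be enough.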

I will appreciate any replies.

