[MILESTONE] Run controlled experiment of Suggestion Mode MVP
Open, High, Public

Description

This task holds the work of running a controlled experiment to evaluate the impact of presenting new(er) (≤100 cumulative edits) editors and people who are logged out

Deployment timing

| Target completion date | Milestone | Ticket | Responsible | Status | Notes |
| --- | --- | --- | --- | --- | --- |
| Monday, 20 April | Publish experiment announcements | T420667 | @Quiddity | ⏳ In progress | |
| Monday, 20 April | Ensure first suggestion discoverability | T414518 | @Esanders + @bmartinezcalvo | Needs attention | |
| Monday, 20 April | Ensure performance monitoring working as expected | T419447 | @medelius | Needs attention | |
| Tuesday, 21 April | Merge large/small buttons in OOUI | T422031 | @Esanders | Needs code review | |
| Tuesday, 21 April | Merge normal-enabled-flagged button colours in OOUI | T422032 | @Esanders | Needs code review | |
| Wednesday, 22 April | Implement Suggestion Mode instrumentation spec in Test Kitchen | T422740 | @medelius | ⏳ In progress | |
| Wednesday, 22 April | Decide on whether to enable Tone suggestions | T417922 | @ppelberg | Needs attention | Info. needed from editing eng. to estimate the "cost" |
| Wednesday, 22 April | Finalize UX for indicating additional suggestions outside current section | T422979 | @bmartinezcalvo | ⏳ In progress | |
| Friday, 24 April | Implement and QA bucketing requirements | T421189 | Editing Engineering + QA | | |
| Friday, 24 April | Define and implement missing metrics | T422736 | @MNeisler | ⏳ In progress | |
| Monday, 27 April | Start experiment | T421189 | Editing Engineering | | |

Hypothesis

If we present junior contributors who enter the mobile VisualEditor with immediately actionable edit suggestions, then the proportion of edit sessions that result in someone publishing a constructive edit will increase by ≥10%.
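As a rough illustration of how the ≥10% target could be checked, the sketch below compares the publish rate of a control group and a treatment group. It assumes the "≥10%" refers to a relative increase, and it uses a pooled two-proportion z statistic; neither the interpretation nor the test is specified in this task, and all counts are made up.

```python
import math


def relative_lift(control_rate: float, treatment_rate: float) -> float:
    """Relative change of the treatment rate over the control rate."""
    return (treatment_rate - control_rate) / control_rate


def two_proportion_z(successes_c: int, n_c: int, successes_t: int, n_t: int) -> float:
    """z statistic for the difference of two proportions (pooled standard error)."""
    p_c = successes_c / n_c
    p_t = successes_t / n_t
    pooled = (successes_c + successes_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    return (p_t - p_c) / se


# Made-up counts, for illustration only (not experiment data).
control_sessions, control_published = 10_000, 2_000
treatment_sessions, treatment_published = 10_000, 2_300

lift = relative_lift(
    control_published / control_sessions,
    treatment_published / treatment_sessions,
)
z = two_proportion_z(
    control_published, control_sessions,
    treatment_published, treatment_sessions,
)

# Assumption: the hypothesis target is a relative increase of at least 10%.
meets_target = lift >= 0.10
```

With these illustrative counts the lift is about 15%, so `meets_target` is true; the z statistic indicates whether a difference of that size is distinguishable from noise at the chosen sample sizes.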

Decision to be made

  1. What – if any – adjustments to the Edit Suggestions UX need to be made before we can be confident...?
    • Newcomers and Junior Contributors who encounter Edit Suggestions are more likely to publish a constructive edit.
    • Newcomers and Junior Contributors will intuitively interact with the Edit Suggestion experience in ways that are NOT disruptive to them or the wikis.

Metrics

These are the main outcomes we are trying to impact through this feature. We are primarily using them to evaluate the hypothesis and to decide whether to deploy the intervention more widely.

| ID | Hypothesis | Metric |
| --- | --- | --- |
| KPI | Newcomers who opened the editor out of curiosity will be compelled to publish edits they might not have made otherwise because they will be provided with edit suggestions that are clear and compelling. | Edit completion rate: Proportion of qualified newcomer and Junior Contributor edit sessions [1] on a Wikipedia main namespace that are published, based on controlled A/B test(s). |
| Secondary | Newcomers and Junior Contributors will be more likely to return to publish an edit in the future that is not reverted because Edit Suggestions reduced the effort required to discover and publish a constructive edit. | Second Week Retention Rate: Proportion of newcomers and Junior Contributors who return to publish ≥1 edit (across namespaces) within 2 weeks of an edit session in which they spent ≥2 seconds in VE's ready state. |
| Secondary | Newcomers and Junior Contributors will be likely to engage with Edit Suggestions because they will be 1) actionable, 2) straightforward, 3) draw on knowledge they already possess, and 4) related to a topic (broadly defined) they've explicitly expressed an interest in improving. | Suggestion Acceptance Rate (Treatment Only metric): Proportion of contributors who see at least 1 Edit Suggestion and click to accept at least 1 suggestion during their session. [2] |
| Secondary | Brand new editors will be more likely to complete at least one constructive edit within 24 hours of creating an account because they will be presented with suggestions for smaller changes they are more equipped to make. | Constructive Activation Rate: Proportion of newcomers making their first edit to an article in the main namespace within 24 hours of registration, and that edit not being reverted within 48 hours of being published. |
| Guardrail | Newcomers and Junior Contributors will engage more quickly because they will be provided with relevant and actionable suggestions that help reduce confusion in how to get started. | Time until first change: Time between the editor loading (action = ready) and the user starting to type (action = firstChange). |
| Guardrail | There will be no meaningful decreases in constructive edits because newcomers will be provided structured edit suggestions that they intuitively understand and that are aligned with local Wikipedia policies and guidelines. | Constructive edit rate: Proportion of published edits by users with ≤100 cumulative edits that are constructive. [3] |

  1. Qualified edit session = edit session in which the user spent ≥2 seconds in VE's ready state
  2. Overall and by suggestion_type
  3. "Constructive edits" = edits to pages in any Wikipedia main namespace that are not reverted within 48 hours of being published
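The definitions above can be sketched in code. The example below is a minimal illustration, assuming a hypothetical `EditSession` record whose field names are invented here and do not come from the instrumentation spec. It also assumes the sessions are already filtered to eligible users and the main namespace, and that a revert at exactly 48 hours counts as "within 48 hours".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class EditSession:
    """Hypothetical session record; field names are illustrative only."""
    seconds_in_ready_state: float
    published: bool
    published_at: Optional[datetime] = None
    reverted_at: Optional[datetime] = None


def is_qualified(session: EditSession) -> bool:
    """Footnote 1: the user spent >= 2 seconds in VE's ready state."""
    return session.seconds_in_ready_state >= 2


def is_constructive(session: EditSession) -> bool:
    """Footnote 3: published and not reverted within 48 hours."""
    if not session.published or session.published_at is None:
        return False
    if session.reverted_at is None:
        return True
    return session.reverted_at - session.published_at > timedelta(hours=48)


def edit_completion_rate(sessions: List[EditSession]) -> float:
    """Share of qualified sessions that ended in a published edit."""
    qualified = [s for s in sessions if is_qualified(s)]
    if not qualified:
        return 0.0  # no qualified sessions: report 0.0 rather than divide by zero
    return sum(s.published for s in qualified) / len(qualified)


def constructive_edit_rate(sessions: List[EditSession]) -> float:
    """Share of published edits that are constructive."""
    published = [s for s in sessions if s.published]
    if not published:
        return 0.0
    return sum(is_constructive(s) for s in published) / len(published)


# Made-up sessions, for illustration only.
t0 = datetime(2000, 1, 1, 12, 0)
sessions = [
    EditSession(1.0, False),                             # not qualified (< 2 s)
    EditSession(5.0, False),                             # qualified, abandoned
    EditSession(8.0, True, t0),                          # published, never reverted
    EditSession(12.0, True, t0, t0 + timedelta(hours=3)),   # reverted within 48 h
    EditSession(30.0, True, t0, t0 + timedelta(hours=72)),  # reverted after 48 h
]

completion = edit_completion_rate(sessions)      # 3 of 4 qualified sessions
constructive = constructive_edit_rate(sessions)  # 2 of 3 published edits
```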

Experiment Decision Matrix

| ID | Scenario | Indicator(s) | Plan of Action |
| --- | --- | --- | --- |
| 1. | Suggestion Mode is disrupting, discouraging, or otherwise getting in the way of new(er) volunteers publishing constructive edits | ≥20% decrease in any of the following metrics in edit sessions where ≥1 Suggestion is available: edit completion rate, editor load time, edit abandonment rate, revert rate | Pause scaling plans; investigate changes to the UX. |
| 2. | Suggestion Mode is increasing the likelihood that new(er) volunteers will publish destructive edits | Increase in the proportion of edits that are published in editing sessions where ≥1 suggestion was available and that are reverted within 48 hours | Pause scaling plans; review edits to try to identify any patterns in abuse and propose changes to the UX to mitigate them. |
| 3. | Suggestion Mode is effective at increasing the rate at which people publish (read: complete) edits, but the edits people publish are reverted at higher rates | Increases in both 1) the proportion of all editing sessions by logged-in users with ≤100 cumulative edits that are started on mobile web in the Wikipedia main namespace (user clicks the edit button; action = ‘init’) and are saved (action = ‘saveSuccess’), and 2) the proportion of published edits that are reverted within 48 hours. | Pause scaling plans; analyze and manually review reverted edits to understand why those edits are being reverted. |
| 4. | Suggestion Mode does not show a statistically significant impact on any KPI, secondary, or guardrail metric | See metrics above | Move forward with scaling plans |
| 5. | Suggestion Mode is effective at increasing the edit completion rate of qualified newcomer and Junior Contributor mobile edit sessions | See metrics above | Move forward with scaling plans [1] |

  1. Qualified edit session = edit session in which a logged-in user with ≤100 cumulative edits spent ≥2 seconds in VE's "ready" state.
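One way to read the decision matrix is as a lookup from experiment results to a scenario number. The sketch below is a simplified, hypothetical encoding: the `ExperimentReadout` fields are invented for illustration, scenario 1 is reduced to the edit completion rate alone, and results that fit none of the five scenarios are returned as unmatched instead of being forced into a row.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExperimentReadout:
    """Hypothetical summary of results; relative changes vs. control."""
    completion_rate_change: float  # e.g. -0.25 means a 25% relative decrease
    completion_significant: bool
    revert_rate_change: float
    revert_significant: bool


def plan_of_action(r: ExperimentReadout) -> Tuple[Optional[int], str]:
    """Map a readout to (scenario ID, plan), following the matrix rows."""
    # Scenario 1 (simplified to edit completion rate only): >= 20% decrease.
    if r.completion_significant and r.completion_rate_change <= -0.20:
        return 1, "Pause scaling plans; investigate changes to the UX."

    completion_up = r.completion_significant and r.completion_rate_change > 0
    reverts_up = r.revert_significant and r.revert_rate_change > 0

    if completion_up and reverts_up:
        return 3, "Pause scaling plans; analyze and manually review reverted edits."
    if reverts_up:
        return 2, "Pause scaling plans; review edits for abuse patterns and propose UX changes."
    if completion_up:
        return 5, "Move forward with scaling plans."
    if not r.completion_significant and not r.revert_significant:
        return 4, "Move forward with scaling plans."
    # Assumption: anything else is outside the matrix and needs discussion.
    return None, "Not covered by the decision matrix."


# Illustrative readout: completion up 12%, reverts unchanged.
scenario, plan = plan_of_action(ExperimentReadout(0.12, True, 0.0, False))
```

Checking scenario 3 before scenario 2 is a design choice made here so that a joint increase in completion and reverts is not reported as scenario 2 alone.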
