Data-Services
Component · Active · Public

Members

  • This project does not have any members.

Details

Description

Data services available for use on Cloud-Services.

  • Wiki Replicas - redacted copies of the Wikimedia wikis' production databases
  • Wikimedia Dumps - public hosting of full text dumps from Wikimedia wikis and other datasets in Cloud (see also: Dumps-Generation for generation of dumps themselves)
  • Shared Storage - NFS storage for cross-VM and cross-project use
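
As a rough illustration of how the Wiki Replicas are addressed, the sketch below derives the replica database name for a wiki. It assumes the `_p` suffix convention suggested by the `commonswiki_p` name in the task titles further down this page. The helper name `replica_db_name` is hypothetical and is not part of any Wikimedia tooling.

```python
def replica_db_name(wiki: str) -> str:
    """Return the assumed replica database name for a wiki.

    Assumption: replica databases are named "<wiki>_p", as in the
    "commonswiki_p" database mentioned in the activity feed.
    """
    # Reject empty or obviously malformed names before building the result.
    if not wiki or not wiki.replace("_", "").isalnum():
        raise ValueError(f"unexpected wiki name: {wiki!r}")
    return f"{wiki}_p"


if __name__ == "__main__":
    print(replica_db_name("commonswiki"))
```

The resulting name could then be passed as the database argument to whichever MariaDB client is in use. Connection hosts and credentials are deliberately left out, since this page does not state them.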

Recent Activity

Today

JJMC89 moved T423550: Table/View imagelinks in commonswiki_p throws an error from Backlog to Wiki replicas on the Data-Services board.
Thu, Apr 16, 7:44 AM · cloud-services-team, Data-Services, DBA
JJMC89 added a project to T423550: Table/View imagelinks in commonswiki_p throws an error: Data-Services.
Thu, Apr 16, 7:43 AM · cloud-services-team, Data-Services, DBA
Marostegui added a subtask for T422459: Re-run maintainviews on all clouddb* and an-redacteddb1001.eqiad.wmnet: T423550: Table/View imagelinks in commonswiki_p throws an error.
Thu, Apr 16, 6:43 AM · cloud-services-team, Data-Services, Data-Engineering-Radar, DBA, Data-Engineering
Maintenance_bot added a project to T423151: decommission clouddb1019.eqiad.wmnet: SRE.
Thu, Apr 16, 6:29 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Marostegui added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1003 for hosts: clouddb1019.eqiad.wmnet

  • clouddb1019.eqiad.wmnet (FAIL)
    • Host not found on Icinga, unable to downtime it
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
    • Failed to power off, manual intervention required: Remote IPMI for clouddb1019.mgmt.eqiad.wmnet failed (exit=1): b''
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet server and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Thu, Apr 16, 5:35 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Marostegui moved T423151: decommission clouddb1019.eqiad.wmnet from Backlog to pending onsite steps (eqiad) on the decommission-hardware board.
Thu, Apr 16, 5:34 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Marostegui moved T423151: decommission clouddb1019.eqiad.wmnet from Backlog to Decommission on the ops-eqiad board.
Thu, Apr 16, 5:34 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Marostegui moved T423151: decommission clouddb1019.eqiad.wmnet from Triage to Done on the DBA board.

This is ready for DC-Ops

Thu, Apr 16, 5:34 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Marostegui reassigned T423151: decommission clouddb1019.eqiad.wmnet from Marostegui to Jclark-ctr.
Thu, Apr 16, 5:33 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Maintenance_bot removed a project from T422813: clouddb1019 down: Patch-For-Review.
Thu, Apr 16, 5:31 AM · SRE, DC-Ops, DBA, ops-eqiad, Data-Services, cloud-services-team
ops-monitoring-bot added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1003 for hosts: clouddb1019.eqiad.wmnet

  • clouddb1019.eqiad.wmnet (FAIL)
    • Host not found on Icinga, unable to downtime it
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
    • Failed to power off, manual intervention required: Remote IPMI for clouddb1019.mgmt.eqiad.wmnet failed (exit=1): b''
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet server and PuppetDB
Thu, Apr 16, 5:31 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Maintenance_bot removed a project from T423151: decommission clouddb1019.eqiad.wmnet: Patch-For-Review.
Thu, Apr 16, 5:30 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Marostegui updated the task description for T423151: decommission clouddb1019.eqiad.wmnet.
Thu, Apr 16, 5:25 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

Change #1272222 merged by Marostegui:

[operations/puppet@production] site.pp: Remove clouddb1019

https://gerrit.wikimedia.org/r/1272222

Thu, Apr 16, 5:24 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a project to T423151: decommission clouddb1019.eqiad.wmnet: Patch-For-Review.
Thu, Apr 16, 5:23 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

Change #1272222 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] site.pp: Remove clouddb1019

https://gerrit.wikimedia.org/r/1272222

Thu, Apr 16, 5:23 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a comment to T422813: clouddb1019 down.

Change #1272216 merged by Marostegui:

[operations/puppet@production] clouddb1019.yaml: Remove file

https://gerrit.wikimedia.org/r/1272216

Thu, Apr 16, 5:19 AM · SRE, DC-Ops, DBA, ops-eqiad, Data-Services, cloud-services-team
gerritbot added a project to T422813: clouddb1019 down: Patch-For-Review.
Thu, Apr 16, 5:12 AM · SRE, DC-Ops, DBA, ops-eqiad, Data-Services, cloud-services-team
gerritbot added a comment to T422813: clouddb1019 down.

Change #1272216 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] clouddb1019.yaml: Remove file

https://gerrit.wikimedia.org/r/1272216

Thu, Apr 16, 5:12 AM · SRE, DC-Ops, DBA, ops-eqiad, Data-Services, cloud-services-team

Yesterday

Maintenance_bot removed a project from T422040: Migrate clouddumps https/rsync interfaces behind LVS: Patch-For-Review.
Wed, Apr 15, 11:31 AM · Traffic, Data-Services, tools-infrastructure-team, Datasets-General-or-Unknown
taavi closed T422040: Migrate clouddumps https/rsync interfaces behind LVS as Resolved.
Wed, Apr 15, 10:44 AM · Traffic, Data-Services, tools-infrastructure-team, Datasets-General-or-Unknown
gerritbot added a comment to T422040: Migrate clouddumps https/rsync interfaces behind LVS.

Change #1270363 merged by Majavah:

[operations/dns@master] wikimedia.org: Restore original TTL for dumps

https://gerrit.wikimedia.org/r/1270363

Wed, Apr 15, 10:42 AM · Traffic, Data-Services, tools-infrastructure-team, Datasets-General-or-Unknown

Tue, Apr 14

fnegri placed T381587: [wikireplicas] Gather usage stats up for grabs.
Tue, Apr 14, 1:30 PM · cloud-services-team (FY2025/2026-Q3-Q4), Data-Services
fnegri added a comment to T351637: [wikireplicas] add proper dry-run/diff mode to maintain-views.

I tested one full run on all databases on clouddb1017. It's slower than the old version, but acceptable: it took 12 minutes in total.

Tue, Apr 14, 11:49 AM · tools-platform-team, cloud-services-team (FY2025/2026-Q3-Q4), Patch-For-Review, Data-Services
fnegri added a comment to T422806: [wikireplicas] Update grants for "maintainviews" user.

Added in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1270891

Tue, Apr 14, 11:17 AM · Patch-For-Review, Data-Persistence, tools-platform-team, Data-Services
gerritbot added a comment to T422806: [wikireplicas] Update grants for "maintainviews" user.

Change #1270891 had a related patch set uploaded (by FNegri; author: FNegri):

[operations/puppet@production] mariadb: wiki-replicas: add missing grants

https://gerrit.wikimedia.org/r/1270891

Tue, Apr 14, 11:16 AM · Patch-For-Review, Data-Persistence, tools-platform-team, Data-Services
fnegri added a comment to T422806: [wikireplicas] Update grants for "maintainviews" user.

Two of the grants are actually needed to create views. This seems to be enough:

Tue, Apr 14, 11:09 AM · Patch-For-Review, Data-Persistence, tools-platform-team, Data-Services
fnegri added a comment to T422806: [wikireplicas] Update grants for "maintainviews" user.

These grants are present on all clouddb hosts but are not listed in wiki-replicas.sql:

Tue, Apr 14, 11:05 AM · Patch-For-Review, Data-Persistence, tools-platform-team, Data-Services
Maintenance_bot removed a project from T423151: decommission clouddb1019.eqiad.wmnet: Patch-For-Review.
Tue, Apr 14, 10:31 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

Change #1270868 merged by Marostegui:

[operations/puppet@production] check_private_data_report: Remove clouddb1019

https://gerrit.wikimedia.org/r/1270868

Tue, Apr 14, 9:32 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

Change #1270868 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] check_private_data_report: Remove clouddb1019

https://gerrit.wikimedia.org/r/1270868

Tue, Apr 14, 9:31 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

Change #1270764 merged by Marostegui:

[operations/puppet@production] eqiad.yaml: Remove clouddb1019

https://gerrit.wikimedia.org/r/1270764

Tue, Apr 14, 9:09 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a project to T423151: decommission clouddb1019.eqiad.wmnet: Patch-For-Review.
Tue, Apr 14, 5:39 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

Change #1270764 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] eqiad.yaml: Remove clouddb1019

https://gerrit.wikimedia.org/r/1270764

Tue, Apr 14, 5:39 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Maintenance_bot removed a project from T423151: decommission clouddb1019.eqiad.wmnet: Patch-For-Review.
Tue, Apr 14, 5:31 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

Change #1270758 merged by Marostegui:

[operations/puppet@production] installserver: Remove clouddb1019

https://gerrit.wikimedia.org/r/1270758

Tue, Apr 14, 5:25 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a project to T423151: decommission clouddb1019.eqiad.wmnet: Patch-For-Review.
Tue, Apr 14, 5:22 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
gerritbot added a comment to T423151: decommission clouddb1019.eqiad.wmnet.

Change #1270758 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] installserver: Remove clouddb1019

https://gerrit.wikimedia.org/r/1270758

Tue, Apr 14, 5:21 AM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware

Mon, Apr 13

Jclark-ctr updated subscribers of T423151: decommission clouddb1019.eqiad.wmnet.
Mon, Apr 13, 6:40 PM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
fnegri claimed T415165: Install a clouddb hosts with Debian Trixie.

Re-claiming this task, I'll start with clouddb1022 then.

Mon, Apr 13, 4:15 PM · tools-platform-team, cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, Data-Persistence
ops-monitoring-bot added a comment to T422813: clouddb1019 down.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1019.eqiad.wmnet with OS trixie executed with errors:

  • clouddb1019 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console clouddb1019.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Mon, Apr 13, 4:14 PM · SRE, DC-Ops, DBA, ops-eqiad, Data-Services, cloud-services-team
Marostegui added a comment to T415165: Install a clouddb hosts with Debian Trixie.

Sounds good from my side, yes!

Mon, Apr 13, 4:14 PM · tools-platform-team, cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, Data-Persistence
aputhin moved T351637: [wikireplicas] add proper dry-run/diff mode to maintain-views from In progress to In review on the tools-platform-team board.
Mon, Apr 13, 3:52 PM · tools-platform-team, cloud-services-team (FY2025/2026-Q3-Q4), Patch-For-Review, Data-Services
aputhin moved T422806: [wikireplicas] Update grants for "maintainviews" user from In progress to In review on the tools-platform-team board.
Mon, Apr 13, 3:52 PM · Patch-For-Review, Data-Persistence, tools-platform-team, Data-Services
aputhin moved T422806: [wikireplicas] Update grants for "maintainviews" user from Todos to In progress on the tools-platform-team board.
Mon, Apr 13, 3:51 PM · Patch-For-Review, Data-Persistence, tools-platform-team, Data-Services
fnegri added a comment to T415165: Install a clouddb hosts with Debian Trixie.

should we go back to reimage clouddb1015

Mon, Apr 13, 3:44 PM · tools-platform-team, cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, Data-Persistence
Marostegui claimed T423151: decommission clouddb1019.eqiad.wmnet.
Mon, Apr 13, 3:29 PM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Maintenance_bot moved T422813: clouddb1019 down from In progress to Done on the DBA board.
Mon, Apr 13, 3:29 PM · SRE, DC-Ops, DBA, ops-eqiad, Data-Services, cloud-services-team
Marostegui updated the task description for T423151: decommission clouddb1019.eqiad.wmnet.
Mon, Apr 13, 3:29 PM · SRE, DC-Ops, ops-eqiad, cloud-services-team, Data-Services, DBA, decommission-hardware
Marostegui added a subtask for T422813: clouddb1019 down: T423151: decommission clouddb1019.eqiad.wmnet.
Mon, Apr 13, 3:29 PM · SRE, DC-Ops, DBA, ops-eqiad, Data-Services, cloud-services-team