Replacing a Failed Hard Drive

Overview

This document provides a step-by-step procedure for replacing a failed hard drive on a Fedora infrastructure server. It includes access requirements, necessary tools, and the process for initiating and completing the drive replacement.

Contact Information

Owner

Fedora Infrastructure Team

Contact

#fedora-admin, sysadmin-main

Purpose

Describe how to identify a failed hard drive, report it to the vendor, and arrange its replacement.

Access Level

This procedure may require sysadmin-main access. In the future, the credentials may be shared with a dedicated assignee or moved to a smaller vault. For now, contact the sysadmin-main team to obtain them.

Requirements

  • Red Hat VPN Access - Needed for SSH access to the machine.

  • Bitwarden Vault Access - Needed for the management console credentials. How access to the vault will be granted is still under discussion; for now, ask the sysadmin-main team for the login credentials.

Process

First, access the management console:
  1. Ensure you are connected to the official Red Hat VPN.

  2. Identify the server in question. For this SOP, we will use bvmhost-x86-01.stg.iad2.fedoraproject.org as an example.

  3. The management console has its own hostname: insert .mgmt after the host's short name. For this staging host, the .stg label also becomes a -stg suffix, giving bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org.

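  4. Obtain the management console's IP address by pinging its hostname from batcave01; the address is shown in the ping output:

    ssh batcave01.iad2.fedoraproject.org
    # then, on batcave01 (-c 1 sends one echo request instead of pinging until interrupted):
    ping -c 1 bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org
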
  5. Open the IP address in a web browser. The management console is served over HTTPS with a self-signed certificate, so you will need to accept the certificate warning:

    https://<IP_ADDRESS>

  6. Log in using the credentials found in the admin-stg entry in Bitwarden.
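
Steps 3 and 4 above can be sketched as two small shell helpers. This is a minimal sketch, not Fedora tooling: the function names mgmt_hostname and mgmt_ip are hypothetical, and the naming rule is inferred only from the single staging example in this SOP (the .stg label becomes a -stg suffix and .mgmt is inserted after the short name). Confirm the result against the real hostname before relying on it. The sketch also swaps ping for getent, which looks up the address through the system resolver without sending any packets; run it on batcave01.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for locating a host's management console.

# Derive the management-console hostname from a server FQDN.
# Assumption: the rule is inferred from this SOP's single example,
#   bvmhost-x86-01.stg.iad2.fedoraproject.org
#   -> bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org
# and may not hold for every host.
mgmt_hostname() {
    local fqdn="$1"
    local short="${fqdn%%.*}"   # first label, e.g. bvmhost-x86-01
    local rest="${fqdn#*.}"     # remaining labels, e.g. stg.iad2.fedoraproject.org
    case "$rest" in
        stg.*)
            # Staging hosts: ".stg" becomes a "-stg" suffix on the short name.
            short="${short}-stg"
            rest="${rest#stg.}"
            ;;
    esac
    printf '%s.mgmt.%s\n' "$short" "$rest"
}

# Print the first IPv4 address the system resolver returns for a hostname.
# getent ships with glibc; its ahostsv4 output starts each line with the address.
mgmt_ip() {
    getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}
```

On batcave01, running mgmt_ip "$(mgmt_hostname bvmhost-x86-01.stg.iad2.fedoraproject.org)" should print the address to open in step 5, provided the management hostname resolves there.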

Identify the Failed Drive

Navigate to the Overview page and note the machine's serial number/service tag; you will need it when contacting support.
  1. Open the Storage menu to identify the failed drive. Warnings about failing or failed drives are shown there.

  2. Note the failed drive’s details (e.g., drive 4).

  3. Create a failed-drive report by exporting the failed drive's information from the console.

Create a Support Ticket

  1. In the management console, click on the support link in the top right corner.

  2. Follow these steps to contact technical support:

    1. Go to the top left search bar and select "Support > Contact Technical Support".

    2. Search for the device using the service tag from the overview page.

    3. Select "HardDrive and RAID Controller" from the drop-down menu.

    4. Choose one of the support options:

      1. Call: 24/7

      2. Live Chat: 7 am - 9 pm CDT, Monday - Friday

      3. Social Connect

  3. In the live chat, provide the failed-drive report. Once support verifies and confirms the failure, they will send an email with the replacement details.

  4. If live chat is unsuccessful, call support at 1-866-362-5350 (available 24/7).

Follow-Up with the Support Ticket

  1. Once the support ticket is created, the assignee will receive a form via email.

  2. Forward this form to Patrick Cole (pcole@redhat.com) along with the machine’s serial number and location.

    From this point on, Patrick Cole handles the coordination with Dell for the drive replacement, including arranging access for the technician if needed. Routing the request through him avoids adding unnecessary intermediaries.

Conclusion

Following this SOP ensures a systematic approach to replacing failed drives, minimizing downtime and maintaining system integrity. Always reach out to the sysadmin-main team for any clarifications or additional support.