Replacing a Failed Hard Drive
Overview
This document provides a step-by-step procedure for replacing a failed hard drive on a Fedora infrastructure server. It includes access requirements, necessary tools, and the process for initiating and completing the drive replacement.
Contact Information
- Owner: Fedora Infrastructure Team
- Contact: #fedora-admin, sysadmin-main
- Purpose: Provide a procedure for replacing a failed hard drive on a Fedora infrastructure server
Access Level
To perform this procedure, you may need sysadmin-main access. In the future, access details might be shared with a dedicated assignee or stored in a smaller vault. For now, contact the sysadmin-main team for the information you need.
Requirements
-
Red Hat VPN Access - Needed for SSH access to the machine.
-
Bitwarden Vault Access - Access to the vault is under discussion. For now, consult the sysadmin-main team for the login credentials.
Process
-
Ensure you are connected to the official Red Hat VPN.
-
Identify the server in question. For this SOP, we will use bvmhost-x86-01.stg.iad2.fedoraproject.org as an example.
-
To access the management console, use the server's management hostname, which adds .mgmt to the name. For the example server this is bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org (note that .stg becomes -stg).
-
Obtain the IP address by pinging the management hostname from batcave01:
ssh batcave01.iad2.fedoraproject.org ping bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org
-
Visit the IP address in a web browser. The management console uses HTTPS, so accept the self-signed certificate:
https://<IP_ADDRESS>
-
Log in using the credentials found in the admin-stg entry in Bitwarden.
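The hostname and lookup steps above can be sketched as a few shell helpers. This is a minimal sketch, not an official tool: the function names are hypothetical, and the naming rule (`<host>.<env>.<dc>...` becomes `<host>-<env>.mgmt.<dc>...`) is an assumption inferred from the single example in this SOP, so check it against the real management hostname of other servers. The `-c 1` option only limits ping to one packet so the command returns on its own.

```shell
# Hypothetical helper: derive the management hostname from a server hostname.
# Assumption: the pattern from this SOP's one example holds, i.e.
#   bvmhost-x86-01.stg.iad2.fedoraproject.org
#   -> bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org
mgmt_hostname() {
    fqdn="$1"
    host="${fqdn%%.*}"      # first label, e.g. bvmhost-x86-01
    rest="${fqdn#*.}"       # everything after the first dot
    env="${rest%%.*}"       # second label, e.g. stg
    domain="${rest#*.}"     # remainder, e.g. iad2.fedoraproject.org
    printf '%s-%s.mgmt.%s\n' "$host" "$env" "$domain"
}

# Hypothetical helper: ping the management hostname once from batcave01.
# Requires the VPN and SSH access described above; the IP address appears
# in the ping output.
lookup_mgmt_ip() {
    ssh batcave01.iad2.fedoraproject.org ping -c 1 "$1"
}

# Hypothetical helper: print the HTTP status returned by the console.
# -k skips certificate verification because the console certificate is
# self-signed.
check_console() {
    curl -k -s -o /dev/null --max-time 10 -w '%{http_code}\n' "https://$1"
}

# Pure string manipulation, safe to run anywhere:
mgmt_hostname bvmhost-x86-01.stg.iad2.fedoraproject.org
```

With VPN access in place, `lookup_mgmt_ip "$(mgmt_hostname bvmhost-x86-01.stg.iad2.fedoraproject.org)"` runs the same ping as the step above, and `check_console <IP_ADDRESS>` confirms the console answers over HTTPS before you open it in a browser.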
Identify the Failed Drive
-
Navigate to the storage menu to identify the failed drive. Warnings about failing/failed drives will be indicated here.
-
Note the failed drive’s details (e.g., drive 4).
-
Create a failed drive report by exporting the failed drive's information.
Create a Support Ticket
-
In the management console, click on the support link in the top right corner.
-
Follow these steps to contact technical support:
-
Go to the top left search bar and select "Support > Contact Technical Support".
-
Search for the device using the service tag from the overview page.
-
Select "HardDrive and RAID Controller" from the drop-down menu.
-
Choose one of the support options:
-
Call: 24/7
-
Live Chat: 7 am - 9 pm CDT, Monday - Friday
-
Social Connect
-
In the live chat, provide the failed drive report. Once support verifies and confirms the failure, they will send an email with the replacement details.
-
If live chat is unsuccessful, call support at 1-866-362-5350 (available 24/7).
Follow-Up with the Support Ticket
-
Once the support ticket is created, the assignee will receive a form via email.
-
Forward this form to Patrick Cole (pcole@redhat.com) along with the machine’s serial number and location.
At this point, Patrick Cole handles the coordination with Dell for the drive replacement, including arranging access for the technician if needed. This avoids adding unnecessary intermediaries.
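To keep the forwarded message consistent, the details it needs can be gathered into a short summary first. The helper below is a hypothetical sketch: the function name, the field labels, and the example values are placeholders, not real data or a required format.

```shell
# Hypothetical helper: print the details to include when forwarding the form.
# Arguments: hostname, serial number, location, failed drive.
replacement_summary() {
    printf 'Host: %s\nSerial number: %s\nLocation: %s\nFailed drive: %s\n' \
        "$1" "$2" "$3" "$4"
}

# Example with placeholder values (not real data).
# Assumption: if the management console is not at hand, running
# "sudo dmidecode -s system-serial-number" on the host itself typically
# prints the serial number as well.
replacement_summary \
    bvmhost-x86-01.stg.iad2.fedoraproject.org \
    "EXAMPLE-SERIAL" \
    "EXAMPLE-LOCATION" \
    "drive 4"
```

Paste the output into the email together with the form received from support.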
Conclusion
Following this SOP ensures a systematic approach to replacing failed drives, minimizing downtime and maintaining system integrity. Always reach out to the sysadmin-main team for any clarifications or additional support.