Monitoring Dell Servers in HP Systems Insight Manager

This post explains how to manage Dell servers with HP Systems Insight Manager (HPSIM) and how to enable SNMP trap support. The server MIB files “10892.mib” and “dcstorag.mib” are used to monitor Dell servers in an environment managed by HPSIM.

Managed Node Prerequisites

     Installing the OpenManage Server Administrator

            The Dell server must have the Server Administrator software installed to enable monitoring. The Server Administrator software ships on the installation DVD, and the details of installing and configuring OpenManage Server Administrator are available in the manuals at support.dell.com.

     Configuring the SNMP Service

            The SNMP service must be configured on the server to enable SNMP communication between the Dell server and HPSIM. To forward traps to the HP SIM server, the SNMP trap destination must be set to the IP address of the HP SIM server. The details of the SNMP configuration are available in the Server Administrator documentation at support.dell.com.
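As a rough illustration of the trap-destination setting, the Python sketch below builds the two common forms: a Net-SNMP `trapsink` line for snmpd.conf on a Linux server, and a `reg add` command for the TrapConfiguration key of the Windows SNMP service. The helper names, the placeholder community string "public" and the sample address 192.0.2.10 are assumptions for illustration only; follow the Server Administrator documentation for the procedure that applies to your operating system.

```python
import ipaddress

# Registry key the Windows SNMP service reads trap destinations from.
# The community name is appended as a subkey.
TRAP_KEY = "HKLM\\SYSTEM\\CurrentControlSet\\Services\\SNMP\\Parameters\\TrapConfiguration\\"


def linux_trapsink_line(hpsim_ip: str, community: str = "public") -> str:
    """Return a Net-SNMP snmpd.conf directive that sends SNMPv1 traps to HPSIM."""
    ipaddress.ip_address(hpsim_ip)  # raises ValueError for a malformed address
    return f"trapsink {hpsim_ip} {community}"


def windows_trap_reg_command(hpsim_ip: str, community: str = "public", index: int = 1) -> str:
    """Return a reg.exe command adding HPSIM as trap destination number `index`."""
    ipaddress.ip_address(hpsim_ip)
    return f'reg add "{TRAP_KEY}{community}" /v {index} /t REG_SZ /d {hpsim_ip} /f'


if __name__ == "__main__":
    # Hypothetical HPSIM address taken from the documentation range (RFC 5737).
    print(linux_trapsink_line("192.0.2.10"))
    print(windows_trap_reg_command("192.0.2.10"))
```

After changing either setting, the SNMP service generally has to be restarted (snmpd on Linux, the SNMP Service on Windows) before the new destination is used.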

The current Server Administrator documentation (version 7.1 at the time of writing) can be downloaded from support.dell.com:
http://support.dell.com/support/edocs/software/svradmin/7.1/en/index.htm

Loading the Dell MIBs in HPSIM

            The Dell server MIB files ship on the Server Administrator DVD and can also be downloaded from support.dell.com.

  • Uploading the MIBs:

Uploading the MIBs simply means copying them from the download folder to the \Program Files\HP\Systems Insight Manager\mibs folder:

  1.  Copy “10892.mib” to \Program Files\HP\Systems Insight Manager\mibs
  2.  Copy “dcstorag.mib” to \Program Files\HP\Systems Insight Manager\mibs

Note – HPSIM generates an error when a MIB filename contains only numbers, so rename “10892.mib” to “new10892.mib”, and for consistency also rename “dcstorag.mib” to “newdcstorag.mib”.
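The copy and rename can be done in one step. This is a minimal Python sketch, assuming the MIBs were downloaded to a local folder; the function names `hpsim_mib_name` and `copy_dell_mibs` are hypothetical, and the source and destination folders are parameters because the install drive is not given above.

```python
import shutil
from pathlib import Path


def hpsim_mib_name(filename: str) -> str:
    """Prefix "new" to a Dell MIB file name, as the note above recommends."""
    return "new" + filename


def copy_dell_mibs(src_dir: Path, mib_dir: Path,
                   names: tuple = ("10892.mib", "dcstorag.mib")) -> list:
    """Copy each MIB into the HPSIM mibs folder under its renamed file name."""
    copied = []
    for name in names:
        target = mib_dir / hpsim_mib_name(name)
        shutil.copy2(src_dir / name, target)  # copy and rename in one step
        copied.append(target)
    return copied
```

For example, `copy_dell_mibs(Path(r"C:\DellMIBs"), Path(r"C:\Program Files\HP\Systems Insight Manager\mibs"))`, where both paths are assumptions to adjust for your download folder and HPSIM install location.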

  • Compiling the MIBs:

The MIBs need to be compiled to generate the intermediate .cfg files that are used to register the MIBs in HP SIM.

  1. Open a command window (cmd.exe).
  2. Change the working directory to \Program Files\HP\Systems Insight Manager\mibs.
  3. Enter the command “mcompile new10892.mib”. The MIB should compile and return the message “Mib Compilation completed successfully”.
  4. Enter the command “mcompile newdcstorag.mib”. The MIB should compile and return the message “Mib Compilation completed successfully”.

  • Registering the MIBs:

Register the Server MIB files as follows:

  1. Enter the command “mxmib -a new10892.cfg”. The MIB should register and return the message “COMMAND COMPLETED SUCCESSFULLY”.
  2. Enter the command “mxmib -a newdcstorag.cfg”. The MIB should register and return the message “COMMAND COMPLETED SUCCESSFULLY”.
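The compile and register steps can also be scripted. This Python sketch is only an illustration under stated assumptions: it assumes `mcompile` and `mxmib` resolve as executables on the PATH of the HPSIM server, and that their output contains the success messages quoted above (checked case-insensitively, since only the wording is known). The function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Success messages quoted in the compile and register steps above.
EXPECTED = {
    "mcompile": "Mib Compilation completed successfully",
    "mxmib": "COMMAND COMPLETED SUCCESSFULLY",
}


def mib_commands(mib_files: list) -> list:
    """Compile every MIB first, then register each generated .cfg file."""
    compile_cmds = [["mcompile", mib] for mib in mib_files]
    register_cmds = [["mxmib", "-a", str(Path(mib).with_suffix(".cfg"))] for mib in mib_files]
    return compile_cmds + register_cmds


def run_mib_commands(mib_dir: str, mib_files: list) -> None:
    """Run the commands from the HPSIM mibs folder and stop at the first failure."""
    for cmd in mib_commands(mib_files):
        result = subprocess.run(cmd, cwd=mib_dir, capture_output=True, text=True)
        output = result.stdout + result.stderr
        if EXPECTED[cmd[0]].lower() not in output.lower():
            raise RuntimeError(f"{' '.join(cmd)} failed:\n{output}")
```

Checking the printed message rather than only the exit code mirrors how the steps above tell you to confirm success.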

HPSIM – System Type Manager

The System Type Manager is used to define the rules to identify the different systems in HP SIM. The steps used to configure the System Type Manager to recognize a Dell Server are as follows:

Dell Windows Server

  1. Login to HPSIM.
  2. Select Options ‐> Manage System Types.
  3. Click the New… button. The New Rule pane will open below.
  4. Beside System object identifier click the Retrieve from system… button. The Retrieve from system pane will appear below the current pane.
  5. Enter the hostname or IP address of the target Dell system in the Target hostname or IP address text box.
  6. Click Get Response. Note that the response value of “1.3.6.1.4.1.311.1.1.3.1.2” is displayed below.
      • Figure 1
  7. Click the OK button to accept this value and return to the New rule pane.
  8. Beside MIB variable object identifier, enter the OID of the Server MIB attribute “systemManagementSoftwareName”: “.1.3.6.1.4.1.674.10892.1.100.1”.
      • Figure 2
  9. Beside Object Value, click “Retrieve from System”.
  10. Click Get Response. Note that the response value of “Server Administrator” is displayed below. This is the OpenManage Server Administrator software name retrieved from the Dell Server.
      • Figure 3
  11. Click the OK button to accept this value and return to the New rule pane.
  12. Beside Compare rule select Match.
  13. Beside System Type select Server.
  14. Beside Subtype select “Dell”.
  15. Beside Product model enter “Dell Windows Server”.
  16. Leave the Custom management page field blank and click the OK button.

Dell Linux Server

  1. Login to HPSIM.
  2. Select Options ‐> Manage System Types.
  3. Click the New… button. The New Rule pane will open below.
  4. Beside System object identifier click the Retrieve from system… button. The Retrieve from system pane will appear below the current pane.
  5. Enter the hostname or IP address of the target Dell system in the Target hostname or IP address text box.
  6. Click Get Response. Note that the response value of “1.3.6.1.4.1.8072.3.2.10” is displayed below.
      • Figure 4 
  7. Click the OK button to accept this value and return to the New rule pane.
  8. Beside MIB variable object identifier, enter the OID of the Server MIB attribute “systemManagementSoftwareName”: “.1.3.6.1.4.1.674.10892.1.100.1”.
  9. Beside Object Value, click “Retrieve from System”.
  10. Click Get Response. Note that the response value of “Server Administrator” is displayed below. This is the OpenManage Server Administrator software name retrieved from the Dell Server.
  11. Click the OK button to accept this value and return to the New rule pane.
  12. Beside Compare rule select Match.
  13. Beside System Type select Server.
  14. Beside Subtype select “Dell”.
  15. Beside Product model enter “Dell Linux Server”.
  16. Leave the Custom management page field blank and click the OK button.
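To make the logic of the two rules explicit, here is a small Python model of them. It is not HPSIM's implementation; it assumes the “Match” compare rule means exact string equality and that the SNMP values have already been retrieved. `SystemTypeRule` and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# OID of "systemManagementSoftwareName", entered in step 8 of both procedures.
SOFTWARE_NAME_OID = ".1.3.6.1.4.1.674.10892.1.100.1"


def _norm(oid: str) -> str:
    """Treat ".1.3.6..." and "1.3.6..." as the same OID."""
    return oid.lstrip(".")


@dataclass(frozen=True)
class SystemTypeRule:
    sys_object_id: str   # value from "Retrieve from system" in step 6
    mib_oid: str         # MIB variable object identifier
    object_value: str    # expected value for the "Match" compare rule
    product_model: str
    system_type: str = "Server"
    subtype: str = "Dell"


RULES = [
    SystemTypeRule("1.3.6.1.4.1.311.1.1.3.1.2", SOFTWARE_NAME_OID,
                   "Server Administrator", "Dell Windows Server"),
    SystemTypeRule("1.3.6.1.4.1.8072.3.2.10", SOFTWARE_NAME_OID,
                   "Server Administrator", "Dell Linux Server"),
]


def classify(sys_object_id: str, snmp_values: dict,
             rules: list = RULES) -> Optional[SystemTypeRule]:
    """Return the first rule whose sysObjectID and MIB value both match."""
    values = {_norm(oid): value for oid, value in snmp_values.items()}
    for rule in rules:
        if _norm(rule.sys_object_id) != _norm(sys_object_id):
            continue
        # Assumption: "Match" is modelled as exact string equality.
        if values.get(_norm(rule.mib_oid)) == rule.object_value:
            return rule
    return None
```

Both conditions matter: 1.3.6.1.4.1.8072.3.2.10 is the generic sysObjectID of the Net-SNMP agent on Linux, and 1.3.6.1.4.1.311.1.1.3.1.2 is reported by Windows servers in general, so the Server Administrator software name is what identifies the host as a Dell server.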

Discovering Dell Servers

Once the Dell server system types have been configured, Dell servers can be properly discovered in HP SIM.

  • Global Protocol Settings:

The SNMP settings can be changed as follows:

  • Click Options ‐> Protocol Settings ‐> Global Protocol Settings. The Global Protocol Settings window will appear.
      • Figure 5 
  • Scroll down to the Default SNMP settings section, click “Global Credentials”, verify that the required community strings are configured, and click the OK button to save the changes.
      • Figure 6

  • Discovering Dell Servers:

    • Click Options ‐> Discovery. The Discovery summary page is displayed.
      • Figure 7

    • Click the New button. The Discovery options page is displayed.
    • Select the discovery option “Discover a Single System”, enter a name for the discovery, and enter the IP address of the Dell server. Note – Group discovery can also be used to discover multiple servers.
      • Figure 8

    • Click the Save button to accept the changes.

i.   Once the discovery rule is saved, click “Run Now” to execute it. The discovered Dell server is shown as follows:

      • Figure 9

ii.  The “Tools and Links” tab contains a link that launches the OpenManage Server Administrator console for the Dell server.

      • Figure 10

iii. The Events tab lists the events associated with the Dell server.

      • Figure 11

  • Testing Dell Server SNMP Traps:

    • Log on to the OpenManage Server Administrator console of the Dell server by clicking the “openmanage” link on the “Tools and Links” tab.
    • In the left-side navigation tree, navigate to Server Module ‐> Main System ‐> Temperature.
      • Figure 12

    • Click “System Board Ambient Temperature” in the right-side pane.
    • The Properties tab is displayed. Select the “Set to Values” option and set a value higher than the current temperature reading to simulate a temperature Warning alert.
      • Figure 13
    • Click Apply to save the changes and verify that the temperature status has changed to Warning.
      • Figure 14

    • Navigate to the Events tab in HP SIM; the temperature alert should be listed on the Events page.
      • Figure 15

The Server Administrator SNMP trap details are documented in the Message Reference Guide, which can be downloaded from support.dell.com.

Appendix

MIB Files

1. Server Administrator Instrumentation MIB (filename: 10892.mib)

The Server Administrator Instrumentation MIB provides instrumentation data that allows you to monitor the health of a system with SNMP management applications. It provides:

• Information about the status of temperatures, power supplies, voltages, currents, fans, and memory at key points in the system
• Rapid access to detailed fault and performance information gathered by industry standard systems management agents
• Version information for Basic Input/Output System (BIOS), firmware, and operating system
• A detailed account of every cost of ownership (COO) detail about your system

In addition, traps are sent to report a change in status of the health of critical components.

2. Server Administrator Storage Management MIB (filename: dcstorag.mib)

The Server Administrator Storage Management MIB provides storage management data that allows you to monitor the health of storage resources with SNMP management applications.

For information about monitoring Dell servers with HP Operations Manager for Windows, refer to the Dell SPI white paper.

source: http://en.community.dell.com/techcenter/systems-management/w/wiki/3891.monitoring-dell-servers-in-hp-systems-insight-manager.aspx

Dell Client System Update Application

This package provides the Dell Client System Update application and is supported on OptiPlex, Tablet, Precision and Latitude models running the following Windows operating systems: Windows XP, Windows Vista (32/64-bit), Windows 7 (32/64-bit) and Windows 8 (32/64-bit).

1. Dell Client System Update 1×1 Tool is a stand-alone application that provides a Windows Update-like experience for Dell Business Client platforms.
2. The application can retrieve and apply system software updates released by Dell.
Note: The updates offered and provided by this tool may be a subset of the updates offered on Dell’s support website.

link: http://www.dell.com/support/drivers/us/en/19/driverdetails?driverid=MJH8R

Manually Clearing the ConflictAndDeleted Folder in DFSR

source: http://blogs.technet.com/b/askds/archive/2008/10/06/manually-clearing-the-conflictanddeleted-folder-in-dfsr.aspx

Scenario 1: We need to empty out the ConflictAndDeleted folder in a controlled manner as part of regular administration (e.g., we just lowered the quota and want to reclaim that space).

Scenario 2: The ConflictAndDeleted folder quota is not being honored due to an error condition and the folder is filling the drive.

Let’s walk through these now.

Emptying the folder normally

It’s possible to clean up the ConflictAndDeleted folder through the DFSMGMT.MSC and SERVICES.MSC snap-ins, but it’s disruptive and kind of gross (you could lower the quota, wait for AD replication, wait for DFSR polling, and then restart the DFSR service). A much faster and slicker way is to call the WMI method CleanupConflictDirectory from the command line or a script:

1.  Open a CMD prompt as an administrator on the DFSR server.
2.  Get the GUID of the Replicated Folder you want to clean:

WMIC.EXE /namespace:\\root\microsoftdfs path dfsrreplicatedfolderconfig get replicatedfolderguid,replicatedfoldername

(Enter this as a single line.)

Example output:

[Image: example output]

3.  Then call the CleanupConflictDirectory method:

WMIC.EXE /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo where "replicatedfolderguid='<RF GUID>'" call cleanupconflictdirectory

Example output with a sample GUID:

WMIC.EXE /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo where "replicatedfolderguid='70bebd41-d5ae-4524-b7df-4eadb89e511e'" call cleanupconflictdirectory

[Image: example output]

4.  At this point the ConflictAndDeleted folder will be empty and the ConflictAndDeletedManifest.xml will be deleted.
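Steps 2 and 3 can be wrapped in a script. This Python sketch assumes the output of the step 2 command has already been captured as text and that the GUID column comes before the name column, so everything after the GUID on a row is taken as the folder name. `parse_rf_guids`, `cleanup_command` and `clean_conflict_folder` are hypothetical helpers, not part of WMIC.

```python
import re
import subprocess

NAMESPACE = r"\\root\microsoftdfs"
GUID_RE = re.compile(r"[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}")


def parse_rf_guids(wmic_output: str) -> dict:
    """Map replicated folder name -> GUID from the step 2 output.

    The header row contains no GUID, so it is skipped automatically.
    """
    folders = {}
    for line in wmic_output.splitlines():
        match = GUID_RE.search(line)
        if match:
            folders[line[match.end():].strip()] = match.group(0)
    return folders


def cleanup_command(rf_guid: str) -> str:
    """Build the step 3 command line for one replicated folder."""
    if not GUID_RE.fullmatch(rf_guid):
        raise ValueError(f"not a GUID: {rf_guid!r}")
    return (f"WMIC.EXE /namespace:{NAMESPACE} path dfsrreplicatedfolderinfo "
            f"where \"replicatedfolderguid='{rf_guid}'\" call cleanupconflictdirectory")


def clean_conflict_folder(rf_guid: str) -> None:
    # Windows only; run from an elevated prompt on the DFSR server (step 1).
    subprocess.run(cleanup_command(rf_guid), check=True)
```

The command is passed to `subprocess.run` as a single string so that the double and single quotes reach WMIC exactly as typed in step 3.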

Emptying the ConflictAndDeleted folder when in an error state

We’ve also seen a few cases where the ConflictAndDeleted quota was not being honored at all. In every single one of those cases, the customer had recently had hardware problems (specifically with their disk system) where files had become corrupt and the disk was unstable – even after repairing the disk (at least to the best of their knowledge), the ConflictAndDeleted folder quota was not being honored by DFSR.

Here’s where quota is set:

[Image: where the ConflictAndDeleted quota is set]

Usually when we see this problem, the ConflictAndDeletedManifest.XML file has grown to hundreds of MB in size. When you try to open the file in an XML parser or in Internet Explorer, you will receive an error like “The XML page cannot be displayed” or that there is an error at line X. This is because the file is invalid at some section (with a damaged element, scrambled data, etc).

To fix this issue:

  1. Follow steps 1-4 from above. This may clean the folder as well as update DFSR to say that cleaning has occurred. We always want to try doing things the ‘right’ way before we start hacking.
  2. Stop the DFSR service.
  3. Delete the contents of the ConflictAndDeleted folder manually (with explorer.exe or DEL).
  4. Delete the ConflictAndDeletedManifest.xml file.
  5. Start the DFSR service back up.
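Steps 2 through 5 can be sketched in Python as follows; step 1 (the WMI method) should still be tried first. This assumes the default location of the folder and manifest under the replicated folder's DfsrPrivate directory, which can differ if the ConflictAndDeleted path was customized. `conflict_paths` and `reset_conflict_folder` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path


def conflict_paths(rf_root: Path) -> tuple:
    """Default ConflictAndDeleted folder and manifest for a replicated folder root."""
    private = rf_root / "DfsrPrivate"
    return private / "ConflictAndDeleted", private / "ConflictAndDeletedManifest.xml"


def reset_conflict_folder(rf_root: Path, manage_service: bool = True) -> None:
    """Stop DFSR, empty the folder, delete the manifest, then start DFSR again."""
    folder, manifest = conflict_paths(rf_root)
    if manage_service:
        subprocess.run(["net", "stop", "DFSR"], check=True)     # step 2
    try:
        if folder.exists():
            for entry in folder.iterdir():                       # step 3
                if entry.is_dir() and not entry.is_symlink():
                    shutil.rmtree(entry)
                else:
                    entry.unlink()
        manifest.unlink(missing_ok=True)                         # step 4
    finally:
        if manage_service:
            subprocess.run(["net", "start", "DFSR"], check=True)  # step 5
```

The `finally` clause is a design choice: it restarts DFSR even if a deletion fails, so the service is not left stopped. Passing `manage_service=False` lets you test the file handling on a copy without touching the service.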

For a bit more info on conflict and deletion handling in DFSR, take a look at:

Staging folders and Conflict and Deleted folders (TechNet)
DfsrConflictInfo Class (MSDN)

Implementing Content Freshness protection in DFSR

source: http://blogs.technet.com/b/askds/archive/2009/11/18/implementing-content-freshness-protection-in-dfsr.aspx

Background

Content Freshness is an admin-defined setting that you can set on a per-computer basis when using DFSR on Windows Server 2008 or Windows Server 2008 R2; it does not exist on Windows Server 2003 R2. The DFSR database has a record for each Replicated Folder (RF) called CONTENT_SET_RECORD. This record contains a timestamp called “LastConnected”. We store this record on a per-Replicated-Folder basis because one replicated folder can be current, since it is connected to the other members of its replication group, while at the same time another replicated folder is stale because it is not connected to the other members of its replication group. Every day, DFSR updates this timestamp to record that an opportunity for replication occurred. When attempting replication for an RF between computers, the DFSR service checks whether the last time replication was allowed is older than the freshness date. If the last-allowed-replication date is newer, it replicates. If it is not, we block replication.
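The decision described above can be modelled in a few lines of Python. This is a sketch, not DFSR's code: it treats a MaxOfflineTimeInDays of 0 as "protection off" (the default, as noted later in this post), and the exact behaviour at the boundary day is an assumption. `replication_allowed` is a hypothetical name.

```python
from datetime import datetime, timedelta


def replication_allowed(last_connected: datetime, now: datetime,
                        max_offline_days: int) -> bool:
    """Model of the Content Freshness check for one replicated folder.

    max_offline_days is MaxOfflineTimeInDays; 0 means the protection is off.
    """
    if max_offline_days == 0:
        return True
    freshness_date = now - timedelta(days=max_offline_days)
    # Replicate only if the last allowed replication is newer than the freshness date.
    return last_connected > freshness_date
```

For example, with a 60-day limit, a folder last connected 76 days ago is blocked, while the same folder replicates if the limit is 0.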

By now, you’re asking yourself “Why would I want to block replication?” Good question. DFSR has a JET database just like Active Directory, and it uses multi-master replication just like AD. This means that it must use tombstones so that deletions can replicate. When a file is deleted in DFSR, the local database records the deletion as a tombstone – a logical deletion. After 60 days DFSR garbage collects the record from the database and it is truly gone – a physical deletion. Online defragmentation of the database can then reclaim that whitespace. The 60 days give all the replication partners time to learn about the deletion and act on it.

And herein lies the problem. If a DFSR server cannot replicate an RF for more than 60 days, but then replication is allowed later, it can replicate out old deletions for files that are actually live or replicate out stale data and overwrite existing files. If you’ve ever worked on an Active Directory “lingering object” issue, you have seen what can happen when a DC that was offline for months is brought back up. This is why Strict Replication Consistency was invented for AD – Content Freshness protection is the same thing.

Being “unable to replicate” can mean any one of these scenarios:

  • Disabling the replication connections.
  • Deleting the replication connections (either one-way or in both directions).
  • Stopping the DFSR service.
  • Closing the schedule (i.e. setting “no replication”).
  • Keeping the server shut off.

This whole content freshness idea is novel enough that we went to the trouble of applying for a patent on it.

Implementing Content Freshness Protection

Content Freshness protection is not enabled by default. To turn it on, modify the DfsrMachineConfig setting MaxOfflineTimeInDays on each DFSR server with:

wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig set MaxOfflineTimeInDays=<some value>

The recommendation is to set the value to 60:

wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig set MaxOfflineTimeInDays=60

Remember, this has to be done on all DFSR servers, as this change only affects the computer itself. This value is not stored in a central AD location, but instead in the DfsrMachineConfig.XML file that resides in the hidden operating system folder “%systemdrive%\system volume information\dfsr\config”:

[Image: DfsrMachineConfig.xml]

You can also view your existing MaxOfflineTimeInDays with:

wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig get MaxOfflineTimeInDays

Remember, this protection is OFF by default; if there is no entry in DfsrMachineConfig.xml, the value is assumed to be zero.
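Because the setting is per computer, a small script can push the same value to every DFSR server. This Python sketch assumes you may use WMIC's `/node:` switch against each server, which needs administrative rights on the target and WMI reachable through its firewall; otherwise run the command locally on each server as shown above. `max_offline_command` and `enable_freshness_everywhere` are hypothetical names.

```python
import subprocess
from typing import Optional

NAMESPACE = r"\\root\microsoftdfs"


def max_offline_command(days: int = 60, server: Optional[str] = None) -> str:
    """Build the wmic.exe command from above, optionally aimed at a remote server."""
    if days < 0:
        raise ValueError("MaxOfflineTimeInDays cannot be negative")
    node = f'/node:"{server}" ' if server else ""  # /node: is WMIC's remote-target switch
    return (f"wmic.exe {node}/namespace:{NAMESPACE} path DfsrMachineConfig "
            f"set MaxOfflineTimeInDays={days}")


def enable_freshness_everywhere(servers: list, days: int = 60) -> None:
    # The value lives in each server's own DfsrMachineConfig.xml, so every
    # DFSR server needs its own call.
    for server in servers:
        subprocess.run(max_offline_command(days, server), check=True)
```

Running the `get MaxOfflineTimeInDays` command afterwards on each server is a simple way to confirm the change took effect.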

Note: Sharp-eyed admins may notice that we actually have an AD attribute stamped on every Replication Group called ms-DFSR-TombstoneExpiryInMin that appears to control tombstone lifetime. It even has the value – in minutes – for 60 days. Sorry to disappoint you, but this attribute is never read by DFSR and changing it has no effect – tombstone lifetime garbage collection is always hard-coded to 60 days in the service and cannot be changed.

Protection in Action

Let’s see how all this works. My repro environment:

  • A pair of Windows Server 2008 R2 computers named 2008r2-fresh-01 and 2008r2-fresh-02
  • Replicating in a Replication Group named “RG1”
  • Using a Replicated Folder named “RF1”
  • Keeping a few user files in sync.
  • MaxOfflineTimeInDays set to 60 on 2008r2-fresh-02

Important note: I am going to simulate the offline time by rolling clocks forward. Never ever do this in production – this is for testing and demonstration purposes only. Also, I only set MaxOfflineTimeInDays on one server – you would do this on all servers.

So here’s my data:

[Image: the replicated data]

Now I stop DFSR on 2008r2-fresh-02 and roll time forward to January 1st, 2010 on both servers – about 75 days from this writing. I then make a few changes on 2008r2-fresh-02.

[Image: changes made on 2008r2-fresh-02]

And then I start the DFSR service back up on 2008r2-fresh-02.

  • My changed files do not replicate out
  • New files do not replicate in

I now have this event:

Log Name:      DFS Replication
Source:        DFSR
Date:          1/1/2010 3:37:14 PM
Event ID:      4012
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      2008r2-fresh-02.blueyonderairlines.com
Description:
The DFS Replication service stopped replication on the replicated folder at local path c:\rf1. It has been disconnected from other partners for 76 days, which is longer than the MaxOfflineTimeInDays parameter. Because of this, DFS Replication considers this data to be stale, and will replace it with data from other members of the replication group during the next replication. DFS Replication will move the stale files to the local Conflict folder. No user action is required.
Additional Information:
Error: 9061 (The replicated folder has been offline for too long.)
Replicated Folder Name: rf1
Replicated Folder ID: 5856C18F-CA72-4D2D-9D89-4CC1D8042D86
Replication Group Name: rg1
Replication Group ID: BC5976EF-997E-4149-819D-57193F21EC76
Member ID: FAEC4B17-E81F-4036-AAD9-78AA46814606

Note: this event has incorrect wording. The first two sentences in the description are good, but the following sentences are wrong. DFSR does not self-correct this situation, it does not move files into the ConflictAndDeleted folder, and you, the user, have actions you need to take. More on this later.

The DFSR Debug logs will show (edited for brevity):

20100101 15:37:14.410 1008 CSMG 5504 [WARN] ContentSetManager::CheckContentSetState This replicated folder has not connected to other partners for a long time. lastOnlineTime: [*** Logger Runtime Error:-114757888 ***]

20100101 15:37:14.410 1008 CSMG 7492 [ERROR] ContentSetManager::Initialize Failed to initialize ContentSetManager csId:{5856C18F-CA72-4D2D-9D89-4CC1D8042D86} csName:rf1 Error:

+ [Error:9061(0x2365) ContentSetManager::CheckContentSetState contentsetmanager.cpp:5596 1008 C The replicated folder has been offline for too long.]

20100101 15:37:14.410 1008 CSMG 7972 ContentSetManager::Run csId:{5856C18F-CA72-4D2D-9D89-4CC1D8042D86} csName:rf1 state:InitialBuilding

20100101 15:37:14.504 1948 SRTR 957 [WARN] SERVER_EstablishSession Failed to establish a replicated folder session. connId:{5E05AE2A-6117-4206-B745-7785DB316F74} csId:{5856C18F-CA72-4D2D-9D89-4CC1D8042D86} Error:

+ [Error:9028(0x2344) UpstreamTransport::EstablishSession upstreamtransport.cpp:808 1948 C The content set was not found]

The state of the replicated folder will be “In Error” – i.e. set to 5:

wmic.exe /namespace:\\root\microsoftdfs path DfsrReplicatedFolderInfo get ReplicationGroupName,ReplicatedFolderName,State

ReplicatedFolderName   ReplicationGroupName   State
rf1                    rg1                    5
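To spot folders in this state across many replicated folders, the WMIC output can be parsed. This Python sketch assumes WMIC's usual fixed-width layout, where each column starts at the offset of its header; the state names for values 0 through 4 follow the DfsrReplicatedFolderInfo class documentation, and 5 is the “In Error” state shown above. `parse_wmic_table` and `folders_in_error` are hypothetical names.

```python
import re

# DfsrReplicatedFolderInfo.State values; 5 is the "In Error" state shown above.
STATE_NAMES = {0: "Uninitialized", 1: "Initialized", 2: "Initial Sync",
               3: "Auto Recovery", 4: "Normal", 5: "In Error"}


def parse_wmic_table(text: str) -> list:
    """Split WMIC's fixed-width table output using the header's column offsets."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    starts = [m.start() for m in re.finditer(r"\S+", header)]
    names = header.split()
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return [{name: row[s:e].strip() for name, s, e in zip(names, starts, ends)}
            for row in rows]


def folders_in_error(text: str) -> list:
    """Return (replication group, replicated folder) pairs whose State is 5."""
    return [(r["ReplicationGroupName"], r["ReplicatedFolderName"])
            for r in parse_wmic_table(text) if int(r["State"]) == 5]
```

Using the header offsets instead of splitting on whitespace keeps folder names that contain spaces intact.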

The above is Content Freshness protection in action. It is protecting your DFSR environment from sending divergent data out to the rest of your working servers.

Recovering DFSR from Content Protection

Important note: Before repairing the blocked replication, back up the data on the affected server and its partners. Failure to do so will tempt Murphy’s Law to disastrous new heights. Understand that by following the steps below, any DFSR data that was on this server and never replicated will be moved to PreExisting and/or ConflictAndDeleted – this server goes through non-authoritative sync again and loses all conflicts with other DFSR servers. You have been warned!!!

Also, whatever is being done to stop replication from working needs to be ironed out – whether it is leaving the service off for months on end or not having any connections. Otherwise this is just going to happen again.

To get things back in order, do the following:

1. Start DFSMGMT.MSC on the affected server.

2. On any affected replication groups this server is a member of, select the computer on the Membership tab and “Disable” it.

[Image: disabling the member on the Membership tab]

3. Accept the warning prompt.

[Image: warning prompt]

4. If replication never occurred because the schedule was set to “no replication” on the RG or RF, or because no bi-directional connections were in place between the servers, fix that situation now.

5. Force AD Replication and verify it has converged.

6. On the affected server, run:

DFSRDIAG.EXE POLLAD

7. Wait for events 4008 and 4114 to be written to the DFSR event log, confirming that the replicated folder(s) are no longer being replicated.

8. In DFSMGMT.MSC, “Enable” the replication again on the affected replicated folders for that server.

9. Force AD replication and run DFSRDIAG POLLAD again.

The server goes through non-authoritative initial sync, as if it were being set up for the first time. All matching data is unchanged and does not replicate. Any files on the server that do not exist on its authoritative partner are moved to the PreExisting folder. Any files on the server that have been changed locally are moved to the ConflictAndDeleted folder and the authoritative server’s copy is replicated inbound.
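The wait in step 7 can be checked from a script. This Python sketch assumes `wevtutil` is available (Windows Vista/Server 2008 and later) and that its `/f:text` output contains lines of the form “Event ID: 4114”; `event_query_command`, `event_ids_in` and `step7_done` are hypothetical names.

```python
import re
import subprocess

# Events the post says to wait for before re-enabling the member (step 7).
RECOVERY_EVENT_IDS = {4008, 4114}


def event_query_command(count: int = 20) -> list:
    """wevtutil query for the newest 4008/4114 events in the DFS Replication log."""
    xpath = "*[System[(EventID=4008 or EventID=4114)]]"
    return ["wevtutil", "qe", "DFS Replication", f"/q:{xpath}",
            f"/c:{count}", "/rd:true", "/f:text"]


def event_ids_in(text_output: str) -> set:
    """Collect the 'Event ID:' values from wevtutil's /f:text output."""
    return {int(value) for value in re.findall(r"Event ID:\s*(\d+)", text_output)}


def step7_done() -> bool:
    # Windows only. This also sees 4008/4114 events logged before the member
    # was disabled; compare the event dates if that matters in your case.
    result = subprocess.run(event_query_command(), capture_output=True,
                            text=True, check=True)
    return RECOVERY_EVENT_IDS <= event_ids_in(result.stdout)
```

Only once both IDs appear is it safe to move on to re-enabling the member in step 8.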

The Sum Up

Content Freshness protection is a good thing and putting it in place may someday save you some real pain. Trust me – we work cases here where Content Freshness being enabled would have stopped huge problems. All it takes is Windows Server 2008 or later, and a few moments of your time.