gitea 1 week ago
parent b98df5c61a
commit ce68d22dbb

@ -77,6 +77,52 @@ Traps use the **ALM** trap OID branch: `iso.org.dod.internet.private.enterprises
- **@Severity 5** → `MON.I.FIB.00001` (link/fiber-related: threshold, link budget, root link fault).
- **@Severity 4** → `MON.I.FIB.00002` (system/equipment: internal error, reboot, bad sys stat, monitoring process, auth, hardware, temperature, voltage, fan, etc.).
### Reason code by trap (info from the trap → reason code)
The reason code is **derived from the trap type** (the trap OID / `snmpTrapOID` that comes in the trap). Netcool does not read a “reason code” varbind; it maps each trap OID to a fixed reason code. Use this table when enriching events (e.g. set reason code in event/incident info from `snmpTrapOID`).
| Trap OID (suffix) | Trap name | Reason code |
|-------------------|-----------|-------------|
| **ALM (1.14.0.X)** | | |
| .14.0.11 | alarmThresCrossedFast | MON.I.FIB.00001 |
| .14.0.12 | alarmThresCrossedMedium | MON.I.FIB.00001 |
| .14.0.13 | alarmThresCrossedSlow | MON.I.FIB.00001 |
| .14.0.14 | alarmLinkBudgetExceeded | MON.I.FIB.00001 |
| .14.0.15 | alarmLinkBudgetNearlyExceeded | MON.I.FIB.00001 |
| .14.0.45 | transientInternalError | MON.I.FIB.00002 |
| .14.0.46 | alarmRebootRunning | MON.I.FIB.00002 |
| .14.0.48 | alarmBadSysStat | MON.I.FIB.00002 |
| .14.0.50 | alarmMonProcNotRunning | MON.I.FIB.00002 |
| .14.0.60 | alarmEmailNotifyLinkBudgetExceeded | MON.I.FIB.00001 |
| .14.0.63 | authenticationNotificationSummary | MON.I.FIB.00002 |
| .14.0.68 | alarmRootLinkFault | MON.I.FIB.00001 |
| **SCALM (1.15.0.X)** | | |
| .15.0.11 | alarmThresCrossedFast | MON.I.FIB.00001 |
| .15.0.12 | alarmThresCrossedMedium | MON.I.FIB.00001 |
| .15.0.13 | alarmThresCrossedSlow | MON.I.FIB.00001 |
| .15.0.14 | alarmLinkBudgetExceeded | MON.I.FIB.00001 |
| .15.0.15 | alarmLinkBudgetNearlyExceeded | MON.I.FIB.00001 |
| .15.0.50 | alarmMonProcNotRunning | MON.I.FIB.00002 |
| .15.0.102 | alarmAinsState | MON.I.FIB.00002 |
| .15.0.103 | alarmRemoved | MON.I.FIB.00002 |
| .15.0.104 | alarmHwFailure | MON.I.FIB.00002 |
| .15.0.107 | alarmDatabaseFailure | MON.I.FIB.00002 |
| .15.0.110 | alarmHwDegrade | MON.I.FIB.00002 |
| .15.0.111 | alarmHwFailure | MON.I.FIB.00002 |
| .15.0.112 | alarmLinkDown | MON.I.FIB.00002 |
| .15.0.121 | transientSwResetReload | MON.I.FIB.00002 |
| .15.0.122 | alarmTemperatureTooHigh | MON.I.FIB.00002 |
| .15.0.123 | transientBootUpFailed | MON.I.FIB.00002 |
| .15.0.124 | transientBootUpCompleted | MON.I.FIB.00002 |
| .15.0.125 | transientBootUpStarted | MON.I.FIB.00002 |
| .15.0.128 | alarmVoltageOutOfRange | MON.I.FIB.00002 |
| .15.0.129 | alarmMultipleFanFailure | MON.I.FIB.00002 |
| .15.0.130 | alarmCurrentTooHigh | MON.I.FIB.00002 |
| .15.0.131 | alarmInputVoltageFailure | MON.I.FIB.00002 |
| .15.0.303 | authenticationNotificationSummary | MON.I.FIB.00002 |
**Note:** Trap OID in the trap is usually `1.3.6.1.4.1.2544.1.14.0.X` (ALM) or `1.3.6.1.4.1.2544.1.15.0.X` (SCALM). Netcool matches `.14.6.X` / `.15.6.X` (same trap number X). Traps not listed (e.g. .14.0.62 authenticationNotification) are not handled in the Netcool rules and are discarded; if you handle them elsewhere, you can assign e.g. **MON.I.FIB.00002** for authenticationNotification to align with auth summary.
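The table lookup can be sketched in Python as follows. This is only an illustration of the mapping, not the Netcool rules themselves: the function name `reason_code_for` and the choice to accept both the `.0.X` and `.6.X` forms are assumptions, and the trap numbers are copied from the table above.

```python
import re
from typing import Optional

FIBER = "MON.I.FIB.00001"   # link/fiber-related
SYSTEM = "MON.I.FIB.00002"  # system/equipment

# Trap numbers per branch, copied from the table above.
REASON_CODES = {
    "14": {  # ALM
        **dict.fromkeys((11, 12, 13, 14, 15, 60, 68), FIBER),
        **dict.fromkeys((45, 46, 48, 50, 63), SYSTEM),
    },
    "15": {  # SCALM
        **dict.fromkeys((11, 12, 13, 14, 15), FIBER),
        **dict.fromkeys(
            (50, 102, 103, 104, 107, 110, 111, 112, 121, 122,
             123, 124, 125, 128, 129, 130, 131, 303),
            SYSTEM,
        ),
    },
}

# Accepts a full OID under 1.3.6.1.4.1.2544.1 or a bare suffix such as
# ".14.0.11"; the sub-branch may be 0 (as sent) or 6 (as matched by Netcool).
_TRAP_OID = re.compile(r"(?:1\.3\.6\.1\.4\.1\.2544\.1\.)?(14|15)\.(?:0|6)\.(\d+)")


def reason_code_for(snmp_trap_oid: str) -> Optional[str]:
    """Return the reason code for a trap OID, or None if the trap is not listed."""
    match = _TRAP_OID.fullmatch(snmp_trap_oid.strip().lstrip("."))
    if match is None:
        return None
    branch, trap_number = match.group(1), int(match.group(2))
    return REASON_CODES[branch].get(trap_number)
```

Returning `None` for unlisted traps (for example `.14.0.62`) leaves the caller free to discard the trap or assign a fallback code.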
### Trap branches implemented (by OID suffix)
**ALM branch (2544.1.14.6.X) monitor-unit style (portAidString, portName, alarmSeverity):**
@ -279,6 +325,33 @@ message_key = {entityName}|{snmpTrapOID}|{eventLogIndex}|{resource}
- **Deduplicate duplicate receptions:** Same key → update existing event instead of creating a new one.
- **Implement in event rules:** In ServiceNow (or the probe), compute this string from `additional_info` / varbinds and set **message_key** so that event management can deduplicate and correlate correctly.
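A minimal sketch of the key computation, written in Python to show the logic rather than as a ServiceNow script. It assumes `additional_info` arrives as a JSON string whose keys are named exactly `entityName`, `snmpTrapOID`, `eventLogIndex` and `resource`; those key names and the function name `build_message_key` are hypothetical.

```python
import json

# Field order follows the message_key format defined above.
KEY_FIELDS = ("entityName", "snmpTrapOID", "eventLogIndex", "resource")


def build_message_key(additional_info: str) -> str:
    """Join the four key fields with '|'; a missing field becomes an empty string."""
    info = json.loads(additional_info)
    return "|".join(str(info.get(name, "")).strip() for name in KEY_FIELDS)
```

Two receptions of the same trap produce the same string, so the second one updates the existing event instead of creating a new one.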
### Netcool incidents: how this aligns with real incidents
Imported incidents from Netcool (e.g. `incident (business_service.name=FIBER MONITORING NLNL).xml`) show the following:
**What Netcool sends into ServiceNow (per incident):**
- **Node / CI:** Hostname of the trap sender after the Netcool rule (e.g. **INFRAMON-CCAN**, **INFRAMON-N**, **INFRAMON-Z**) — from `@Node` after stripping `.nl.eu.abnamro.com`.
- **Summary / short_description:** The Netcool `@Summary` format, e.g.
`:MON.I.FIB.00001: - aab_inframon-ccan: alarmThresCrossedFast: portAidString = MCH-1-10, portName = Duct 1-miniduct 3 Bruin, alarmSeverity = 4 - AlertKey: NCOSnmpProbe:FATAL`
- **Alert ID:** Netcool's own identifier (e.g. **Alert1778303**, **Alert1621700**). One incident corresponds to one Netcool Alert; close_notes reference “Closed the task associated with alert: Alert1778303”.
- **Reason code:** `MON.I.FIB.00001` / `MON.I.FIB.00002` (from the Fiber Guardian rules).
- **Trap type name:** alarmThresCrossedFast, alarmThresCrossedSlow, alarmLinkBudgetExceeded, etc. (from the rule branch).
- **portAidString / portName:** Resource (e.g. MCH-1-10, “Duct 1-miniduct 3 Bruin”).
- **No event log index** is present in the incident — Netcool does not forward that to ServiceNow; it uses its own Alert ID for correlation.
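The fields listed above can be recovered from the summary string with a small parser. The sketch below is in Python and assumes every summary follows the layout of the single example shown; the regular expressions and the names `parse_summary` and `strip_domain` are illustrative, not part of Netcool or ServiceNow.

```python
import re
from typing import Dict, Optional

# Layout inferred from the example summary above (assumption):
# :<reason code>: - <source>: <trap name>: <key = value, ...> - AlertKey: <key>
_SUMMARY = re.compile(
    r"^:(?P<reason_code>[^:]+): - (?P<source>[^:]+): (?P<trap_name>[^:]+): "
    r"(?P<details>.*?)(?: - AlertKey: (?P<alert_key>.*))?$"
)
# One "name = value" pair; the value runs up to the next ", name = " or the end.
_PAIR = re.compile(r"(\w+) = (.*?)(?=, \w+ = |$)")
_DOMAIN = ".nl.eu.abnamro.com"


def strip_domain(node: str) -> str:
    """Drop the domain suffix from a hostname, as the Netcool rule does for @Node."""
    if node.lower().endswith(_DOMAIN):
        return node[: -len(_DOMAIN)]
    return node


def parse_summary(summary: str) -> Optional[Dict[str, str]]:
    """Split a summary into its fields; return None if the layout does not match."""
    match = _SUMMARY.match(summary.strip())
    if match is None:
        return None
    fields = {
        name: value
        for name, value in match.groupdict().items()
        if name != "details" and value is not None
    }
    fields.update(dict(_PAIR.findall(match.group("details"))))
    return fields
```

Applied to the example summary, this yields `reason_code`, `source`, `trap_name` and `alert_key`, plus one entry per varbind (`portAidString`, `portName`, `alarmSeverity`).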
**Conclusion:**
1. **Event (em_event) message_key** — The key **entityName|snmpTrapOID|eventLogIndex** is for **raw trap events**. It uses data that exists in the trap varbinds (and in your test trap). It is the right key for deduplicating **incoming traps** at the event layer. Netcool incidents don't contain an event log index because Netcool correlates by Alert ID internally; that doesn't change the validity of the message_key for events built from raw traps.
2. **Incident-level correlation** — For **creating or updating incidents** from events (so that “same alarm on same port” maps to one incident), the Netcool data shows the effective uniqueness is **Node + trap type + port (portAidString)**. So a good **incident correlation key** when you don't have the Netcool Alert ID is:
```text
incident_correlation_key = {Node}|{trapTypeName}|{portAidString}
```
Example: `aab_inframon-ccan|alarmThresCrossedFast|MCH-1-10`. That matches how each Netcool incident is effectively “one alert per node + alarm type + port.” For traps without a port (e.g. authenticationNotification), use Node and trap type only.
3. **Mapping entity to Node** — In traps, “entity” is often the unit name (e.g. MCH-1-1) from the event log varbind; in incidents, “Node” is the hostname (e.g. INFRAMON-CCAN). They can differ (one device may host multiple units). When building incident correlation from raw traps, use the **trap source hostname** (or resolved CI) as Node, and keep **entityName** from the varbind for context in the event; the incident key should use the same Node concept as Netcool (sending host) plus trap type and port when present.
So: the **message_key** we defined is appropriate for **events**; for **incidents**, use **Node|trapTypeName|portAidString** (with port omitted when not applicable) to mirror Netcool's one-incident-per-alert-per-port behavior.
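The incident key described in the conclusion can be built as in the Python sketch below. The function name `incident_correlation_key` and its parameter names are assumptions; the port is left out when the trap carries none.

```python
from typing import Optional


def incident_correlation_key(
    node: str, trap_type_name: str, port_aid_string: Optional[str] = None
) -> str:
    """Build Node|trapTypeName|portAidString, dropping the port when it is absent."""
    parts = [node.strip(), trap_type_name.strip()]
    if port_aid_string and port_aid_string.strip():
        parts.append(port_aid_string.strip())
    return "|".join(parts)
```

For the example above, `incident_correlation_key("aab_inframon-ccan", "alarmThresCrossedFast", "MCH-1-10")` gives `aab_inframon-ccan|alarmThresCrossedFast|MCH-1-10`.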
---
## 5. Cross-reference: ServiceNow vs Netcool vs MIB
