Adding Exporters to Prometheus

Integrating node_exporter, ipmi_exporter, and dcgm_exporter with Prometheus

Prerequisites

Before proceeding, ensure that:

  1. Prometheus is installed and running
  2. The exporters (node_exporter, ipmi_exporter, and/or dcgm_exporter) are installed and operational
  3. You have access to the Prometheus configuration file

Basic Prometheus Configuration

Updating prometheus.yml

  1. Locate your Prometheus configuration file:

    # Default location varies by installation method
    # Common paths include:
    /etc/prometheus/prometheus.yml
    /opt/prometheus/prometheus.yml
    
  2. Open the configuration file in an editor:

    sudo nano /etc/prometheus/prometheus.yml
    
  3. Add the following scrape configurations to the file:

    scrape_configs:
      # Existing scrape configs...
    
      # Node Exporter
      - job_name: "node"
        static_configs:
          - targets: ["localhost:9100"]
    
      # IPMI Exporter
      - job_name: "ipmi"
        scrape_interval: 1m
        scrape_timeout: 30s
        static_configs:
          - targets: ["localhost:9290"]
    
      # DCGM Exporter
      - job_name: "dcgm"
        scrape_interval: 30s
        static_configs:
          - targets: ["localhost:9400"]
    
  4. Restart Prometheus to apply the changes:

    sudo systemctl restart prometheus
    

Configuration Validation:

Always validate your Prometheus configuration before applying it:

    promtool check config /etc/prometheus/prometheus.yml
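
The validate-then-apply order can be enforced with a small wrapper. The following is a minimal sketch, not part of Prometheus itself: the function name apply_if_valid and its injectable run parameter are illustrative assumptions. It relies only on promtool exiting with a non-zero status when the configuration is invalid.

```python
import subprocess


def apply_if_valid(check_cmd, apply_cmd, run=subprocess.run):
    """Run apply_cmd only if check_cmd exits with status 0.

    Returns True when both commands succeed, False otherwise.
    `run` is injectable so the logic can be exercised without
    touching a real Prometheus installation.
    """
    if run(check_cmd).returncode != 0:
        # Validation failed: leave the running Prometheus untouched.
        return False
    return run(apply_cmd).returncode == 0
```

With the paths used in this guide, a call might look like apply_if_valid(["promtool", "check", "config", "/etc/prometheus/prometheus.yml"], ["sudo", "systemctl", "restart", "prometheus"]).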

Advanced Configuration

Monitoring Multiple Hosts

To monitor multiple hosts, update your configuration with multiple targets:

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets:
          - "server1:9100"
          - "server2:9100"
          - "server3:9100"
        labels:
          environment: "production"

Using Service Discovery

For dynamic environments, consider using service discovery:

scrape_configs:
  - job_name: "node"
    consul_sd_configs:
      - server: "localhost:8500"
        services: ["node-exporter"]

IPMI Exporter with Remote Targets

For monitoring multiple BMCs with a single IPMI exporter:

scrape_configs:
  - job_name: 'ipmi'
    scrape_interval: 1m
    metrics_path: /ipmi  # ipmi_exporter serves remote targets on /ipmi
    params:
      module: [default]
    static_configs:
      - targets:
        - '192.168.1.101'  # BMC IP addresses
        - '192.168.1.102'
        - '192.168.1.103'
        labels:
          exporter: 'ipmi'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 'localhost:9290'  # IPMI exporter address
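
To see what the three relabel rules do, it can help to trace them by hand for one target. The sketch below mimics them in Python; it is an illustration rather than Prometheus code, the function name ipmi_scrape_request is made up for this example, and the /ipmi metrics path is an assumption about how the exporter exposes remote targets (it is a parameter so it can be changed).

```python
from urllib.parse import urlencode


def ipmi_scrape_request(bmc_address, exporter_address="localhost:9290",
                        module="default", metrics_path="/ipmi"):
    """Mimic the three relabel rules above for one static target.

    Returns the URL Prometheus would request and the resulting
    `instance` label.
    """
    labels = {"__address__": bmc_address}
    # Rule 1: copy the BMC address into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the BMC address as the visible `instance` label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: send the actual HTTP request to the exporter.
    labels["__address__"] = exporter_address
    query = urlencode({"module": module, "target": labels["__param_target"]})
    url = f"http://{labels['__address__']}{metrics_path}?{query}"
    return url, labels["instance"]


url, instance = ipmi_scrape_request("192.168.1.101")
print(url)       # http://localhost:9290/ipmi?module=default&target=192.168.1.101
print(instance)  # 192.168.1.101
```

The request goes to the exporter, while the stored series still carry the BMC address as their instance label.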

Metric Relabeling

Customizing Labels

You can use target relabeling (relabel_configs), which runs before each scrape, to add or modify target labels:

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: "(.*):.+"
        replacement: "${1}"
      - target_label: datacenter
        replacement: "dc1"
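
The first rule strips the port from the instance label. The following sketch approximates a replace action with Python's re module to show the effect; Prometheus uses RE2 and anchors the regex at both ends, which is why fullmatch is used here. The helper name relabel_replace is illustrative only.

```python
import re


def relabel_replace(value, regex, replacement):
    """Approximate a Prometheus `replace` action on one label value.

    Returns the new value, or None when the regex does not match
    (in which case Prometheus leaves the target label unchanged).
    """
    match = re.fullmatch(regex, value)
    if match is None:
        return None
    # Translate ${1}-style references into Python's \g<1> syntax.
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    return match.expand(template)


print(relabel_replace("localhost:9100", "(.*):.+", "${1}"))  # localhost
print(relabel_replace("localhost", "(.*):.+", "${1}"))       # None
```

An address without a port does not match the regex, so its instance label would be left as it is.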

Filtering Metrics

To reduce storage requirements, you can drop unnecessary metrics with metric_relabel_configs, which is applied to scraped samples before they are stored:

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: "node_disk_io_time_seconds.*"
        action: drop
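
Because the regex is matched against the whole metric name, it is worth checking which names a drop rule actually removes. This is a rough sketch using Python's re module as a stand-in for Prometheus's anchored RE2 matching; surviving_metrics is a hypothetical helper, and the sample names are typical node_exporter metrics chosen for illustration.

```python
import re


def surviving_metrics(names, drop_regex):
    """Return the metric names that a `drop` rule on __name__ would keep."""
    pattern = re.compile(drop_regex)
    # A sample is dropped only when the whole name matches the regex.
    return [name for name in names if pattern.fullmatch(name) is None]


sample_names = [
    "node_disk_io_time_seconds_total",
    "node_disk_io_time_weighted_seconds_total",
    "node_disk_io_now",
    "node_cpu_seconds_total",
]
print(surviving_metrics(sample_names, "node_disk_io_time_seconds.*"))
```

With this rule only node_disk_io_time_seconds_total is dropped; the weighted variant has a different prefix and is kept.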

Security Considerations

TLS Configuration

For secure communication with exporters:

scrape_configs:
  - job_name: "node"
    scheme: https
    tls_config:
      cert_file: /path/to/cert
      key_file: /path/to/key
      ca_file: /path/to/ca
    static_configs:
      - targets: ["secure-node:9100"]

Basic Authentication

If your exporters are protected with basic authentication:

scrape_configs:
  - job_name: "node"
    basic_auth:
      username: prometheus
      password: secret
    static_configs:
      - targets: ["localhost:9100"]

Secure Storage:

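Avoid putting passwords in prometheus.yml in plain text. Prometheus can read the password from a separate file through the password_file option of basic_auth, and that file can be provisioned by a secret management tool such as HashiCorp Vault.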

Verifying Configuration

After configuring Prometheus, verify that it's correctly scraping metrics:

  1. Access the Prometheus web interface (usually at http://localhost:9090)
  2. Navigate to Status > Targets to see all configured targets
  3. Check that all exporters are in the "UP" state
  4. Use the Graph interface to query for metrics from each exporter:
    • Node Exporter: node_cpu_seconds_total
    • IPMI Exporter: ipmi_temperature_celsius
    • DCGM Exporter: DCGM_FI_DEV_GPU_UTIL
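
The target check in steps 2 and 3 can also be scripted against the Prometheus HTTP API, which exposes the same information at /api/v1/targets. The sketch below is one possible approach, assuming Prometheus listens on http://localhost:9090 as in step 1; the function names are illustrative.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload):
    """List (job, instance, lastError) for active targets that are not up."""
    active = payload["data"]["activeTargets"]
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t.get("lastError", ""))
        for t in active
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the current target list from the Prometheus HTTP API."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)
```

Calling unhealthy_targets(fetch_targets()) on the Prometheus host should return an empty list when every exporter is being scraped successfully.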

Labels Inspection:

Inspect the available labels for each metric to understand how to query and filter your data effectively. This is especially useful for creating dashboards and alerts.

Maintenance and Best Practices

Scrape Interval Optimization

Choose scrape intervals that balance data granularity against storage requirements:

  • Node Exporter: 15-30s for most environments
  • IPMI Exporter: 60s or longer (hardware metrics change slowly)
  • DCGM Exporter: 15-30s for active GPU monitoring

Storage Retention

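Configure storage retention to match your needs. Retention is set with command-line flags on the Prometheus server (for example, in the ExecStart line of its systemd unit):

# Prometheus server flags
# Retain data for 15 days or until the TSDB reaches 30GB, whichever comes first
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/data \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=30GB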
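
To judge whether a retention setting fits on disk, the Prometheus storage documentation gives a rough formula: retention time in seconds, times ingested samples per second, times bytes per sample (typically around 1 to 2 bytes). The sketch below applies that formula; the series count of 1000 is a made-up figure for illustration, and the function name is hypothetical.

```python
def estimated_disk_bytes(series_count, scrape_interval_s, retention_days,
                         bytes_per_sample=2.0):
    """Rough TSDB size: retention seconds * samples per second * bytes per sample."""
    retention_seconds = retention_days * 86400
    # Each series contributes one sample per scrape interval.
    total_samples = retention_seconds * series_count / scrape_interval_s
    return total_samples * bytes_per_sample


# Hypothetical host exposing 1000 series, kept for 15 days.
for interval in (15, 30, 60):
    size = estimated_disk_bytes(1000, interval, 15)
    print(f"{interval}s interval: {size / 1e6:.1f} MB")
```

Doubling the scrape interval halves the estimate, which is the trade-off the interval recommendations above are balancing.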

Troubleshooting Common Issues

Target Scrape Failures

If a target is showing as "DOWN":

  1. Check connectivity:

    curl http://target-host:port/metrics
    
  2. Verify firewall rules to ensure the Prometheus server can access the exporter.

  3. Check exporter logs for any errors:

    sudo journalctl -u node_exporter
    sudo journalctl -u ipmi_exporter
    sudo docker logs dcgm-exporter
    

Missing Metrics

If expected metrics are not appearing:

  1. Check if the metric exists in the raw exporter output:

    curl http://localhost:9100/metrics | grep node_cpu
    curl http://localhost:9290/metrics | grep ipmi_temperature
    curl http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
    
  2. Verify metric relabeling isn't dropping your metrics.

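  3. For specialized exporters such as ipmi_exporter and dcgm_exporter, check that the relevant collectors or metric fields are enabled in the exporter's own configuration.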
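
The grep checks in step 1 can be generalized into a small script that takes raw exporter output and reports which expected metrics are absent. This is a minimal sketch that handles only the common "name{labels} value" and "name value" line shapes of the text exposition format; both function names are illustrative.

```python
def exposed_metric_names(exposition_text):
    """Collect metric names from Prometheus text-format exporter output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first '{' (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(exposition_text, expected):
    """Return the expected metric names that the exporter did not expose."""
    present = exposed_metric_names(exposition_text)
    return [name for name in expected if name not in present]


sample_output = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
node_load1 0.5
"""
print(missing_metrics(sample_output, ["node_cpu_seconds_total", "ipmi_temperature_celsius"]))
# ['ipmi_temperature_celsius']
```

In practice the text would come from the curl commands above, for example by saving their output to a file and reading it in.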

Conclusion

By following this guide, you've successfully integrated multiple exporters with Prometheus to monitor your infrastructure comprehensively. The combination of node_exporter, ipmi_exporter, and dcgm_exporter provides visibility into system metrics, hardware health, and GPU performance, enabling proactive monitoring and troubleshooting of your entire stack.

For further assistance or advanced configurations, refer to the official documentation for each component: