Log Ingestion Control (Hosts & Kubernetes Clusters)

Log Ingestion Control defines how file-based logs are collected, parsed, enriched, and secured before they are sent to Middleware.

Think of this layer as the quality gate for logs: it decides what to read, how to structure it, and what to protect before data moves further in the pipeline.

These controls run at the agent / OpenTelemetry Collector level, so they help reduce noise, storage load, network usage, and compliance risk early in the flow.

The same core capabilities are available for:

  • Hosts
  • Kubernetes Clusters

Kubernetes also adds cluster-aware selection controls such as namespace, deployment, and pod filters.

1. Infra Log Monitoring

[Screenshot: Middleware Agent Pipeline configuration interface]

Infra Log Monitoring controls whether file-based logs are collected at all. This is the first decision point in ingestion: if it is disabled, downstream log path and parsing logic cannot operate.

What happens when it is OFF

When this toggle is disabled:

  • No file-based logs are scraped
  • Log paths, labels, regex parsing, and multiline parsing are all disabled
  • The agent does not read any log files from disk

Use this mode when:

  • You only want metrics and traces
  • You are relying on integration-based data instead of file logs

What happens when it is ON

When enabled, full log collection configuration becomes available:

Feature              Purpose
Log Paths            Defines which files should be read
Labels               Adds searchable metadata to logs
Multiline Parsing    Combines multi-line entries into one event
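
To make the multiline behaviour concrete, here is a minimal Python sketch of the joining logic, assuming the common convention that a new event starts with a timestamp and any other line is a continuation (such as a stack-trace frame). The agent's actual multiline rules are configured per path and may use different patterns.

import re

# Assumption for this example: a new log event starts with a date
# like 2024-12-18; anything else continues the previous event.
NEW_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2}")

def join_multiline(lines):
    """Merge continuation lines (e.g. stack traces) into single events."""
    event = None
    for line in lines:
        if NEW_EVENT.match(line):
            if event is not None:
                yield event
            event = line.rstrip("\n")
        elif event is not None:
            event += "\n" + line.rstrip("\n")
    if event is not None:
        yield event

raw = [
    "2024-12-18 05:27:45 ERROR request failed\n",
    "Traceback (most recent call last):\n",
    '  File "app.py", line 42, in handler\n',
    "2024-12-18 05:27:46 INFO retrying\n",
]
print(list(join_multiline(raw)))  # two events; the first spans three lines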

Log Path Configuration

[Screenshot: Middleware Agent Pipeline configuration interface]

Log path configuration depends on where the agent is running.

The goal is to ensure you collect only relevant logs and avoid broad patterns that bring in unnecessary noise.

Environment            How logs are selected
Hosts                  Direct file paths (default path model)
Kubernetes Clusters    Default paths, or Namespace / Pod / Deployment filters

A. Log Paths for Hosts

On host-based agents, logs are collected directly from disk paths. You define Linux-style glob patterns to control exactly which files are ingested.

Example paths

  • /var/log/**/*.log
  • /home/app/logs/*.log
  • /var/lib/docker/containers/*/*.log

These patterns directly determine your ingestion scope, so tighter paths usually produce cleaner and lower-cost results.
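
As a quick way to check what a pattern would pull in, the sketch below resolves the example globs with Python's glob module, which treats ** as a recursive wildcard in the same spirit as the agent; the agent's own matcher may differ in edge cases.

import glob

# The example patterns above; ** matches any number of directory
# levels when recursive=True is set.
patterns = [
    "/var/log/**/*.log",
    "/home/app/logs/*.log",
    "/var/lib/docker/containers/*/*.log",
]

for pattern in patterns:
    matches = glob.glob(pattern, recursive=True)
    print(f"{pattern}: {len(matches)} file(s) matched")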

Per-path configuration:

Each path can have independent settings, which is useful when different applications write logs in different formats.

Feature              Purpose
Multiline Parsing    Joins stack traces and split log lines
Path Labels          Adds static context for that specific path

Example:

Path                     Labels
/var/log/nginx/*.log     service=nginx
/var/log/mysql/*.log     service=mysql
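
Conceptually, per-path labels are just a mapping from a glob pattern to static key=value pairs. A minimal sketch using the two patterns above (note that Python's fnmatch lets * cross directory separators, which is loose but fine for illustration):

from fnmatch import fnmatch

# Static labels per path pattern, as in the table above.
PATH_LABELS = {
    "/var/log/nginx/*.log": {"service": "nginx"},
    "/var/log/mysql/*.log": {"service": "mysql"},
}

def labels_for(path):
    """Return the labels of the first pattern that matches the file path."""
    for pattern, labels in PATH_LABELS.items():
        if fnmatch(path, pattern):
            return labels
    return {}

print(labels_for("/var/log/nginx/access.log"))  # {'service': 'nginx'}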

B. Log Selection for Kubernetes Clusters

Kubernetes supports both broad and fine-grained selection approaches.

Default Path Configuration

Applies one shared path configuration to all selected clusters.

Use this when:

  • Clusters share a common logging structure
  • You want standard behaviour across environments

Cluster-Specific Configuration

Lets you override settings for each cluster individually:

  • Paths
  • Labels
  • Multiline parsing

Use this when:

  • Clusters differ by team or workload type
  • Logging layouts vary between environments

Two collection methods in Cluster-Specific mode

Option 1: Filter via Path

Path-based selection, similar to host mode.

Example:

  • /var/log/pods/*/*.log

Option 2: Filter via Namespace, Deployments, and Pods

Kubernetes-native selectors for more precise targeting.

For each cluster, you can:

  • Include or exclude namespaces
  • Include or exclude deployments
  • Include or exclude pods

Examples:

  • Exclude kube-system namespace
  • Include only payments deployment
  • Exclude noisy debug pods

This method is especially useful when pod-level control is needed without maintaining many path rules.
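
As an illustration of the include/exclude semantics, the sketch below filters on per-record Kubernetes metadata. The field names, and the rule that excludes always win over includes, are assumptions made for this example, not the agent's exact schema or precedence rules.

def should_collect(meta, include_ns=None, exclude_ns=(),
                   include_deploy=None, exclude_pods=()):
    """Apply include/exclude selectors to one pod's metadata.

    include_* set to None means "include everything";
    in this sketch, excludes always win over includes.
    """
    if meta["namespace"] in exclude_ns:
        return False
    if include_ns is not None and meta["namespace"] not in include_ns:
        return False
    if include_deploy is not None and meta["deployment"] not in include_deploy:
        return False
    if meta["pod"] in exclude_pods:
        return False
    return True

pod = {"namespace": "default", "deployment": "payments", "pod": "payments-abc12"}
print(should_collect(pod,
                     exclude_ns={"kube-system"},
                     include_deploy={"payments"}))  # True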

2. Scrape Attributes from Log Body (Regex Parsing)

Regex parsing extracts fields from raw log body text and converts them into structured attributes.

This is important when logs are plain text but teams still need searchable fields in the UI.

Example log:

[2024-12-18T05:27:45.361245] dev test

Regex:

\[(?P<date>.+)\] (?P<log_message>.+)

Result:

{
  "date": "2024-12-18T05:27:45.361245",
  "log_message": "dev test"
}

You can then filter directly by:

  • date
  • log_message
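
The same extraction can be verified with any regex engine that supports named capture groups; here is a quick Python check of the exact pattern above:

import re

# The pattern from the example above, with named capture groups.
PATTERN = re.compile(r"\[(?P<date>.+)\] (?P<log_message>.+)")

line = "[2024-12-18T05:27:45.361245] dev test"
match = PATTERN.match(line)
print(match.groupdict() if match else {})
# {'date': '2024-12-18T05:27:45.361245', 'log_message': 'dev test'}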

3. JSON Log Processing

Enable JSON Log Processing when logs are JSON formatted or contain nested objects. Middleware auto-parses nested keys so fields become queryable and easier to use in search and dashboards.

[Screenshot: JSON Log Processing]

Examples:

  • instant.epochSecond
  • instant.nanoSecond
  • service.name

These appear as parsed body attributes in the Logs UI.
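
To illustrate the shape of the result, the sketch below flattens nested JSON keys into dotted attributes like those above. The record's values and the checkout service name are made up for the example; Middleware performs this parsing automatically.

import json

def flatten(obj, prefix=""):
    """Flatten nested objects into dotted keys such as instant.epochSecond."""
    flat = {}
    for key, value in obj.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat

# Illustrative record; the values and service name are made up.
record = json.loads(
    '{"instant": {"epochSecond": 1734499665, "nanoSecond": 361245000},'
    ' "service": {"name": "checkout"}}'
)
print(flatten(record))
# {'instant.epochSecond': 1734499665, 'instant.nanoSecond': 361245000,
#  'service.name': 'checkout'}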

4. Add Custom Labels

Custom labels attach static metadata to logs during ingestion.

They help with ownership, segmentation, and operational filtering at scale.


Example labels:

  • env = production
  • team = payments
  • cluster = eu-west

Typical benefits:

  • better multi-team separation
  • cleaner environment filters
  • improved cost and ownership tracking
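
Once labels like these are attached, downstream filtering reduces to simple key lookups. An illustrative sketch with made-up records:

# Made-up records carrying the labels from the example above.
records = [
    {"body": "order created", "labels": {"env": "production", "team": "payments"}},
    {"body": "health check", "labels": {"env": "staging", "team": "platform"}},
]

# Isolate one team's production logs with plain key lookups.
selected = [r for r in records
            if r["labels"].get("env") == "production"
            and r["labels"].get("team") == "payments"]
print(selected[0]["body"])  # order created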

5. Clone Log Attributes

Clone Log Attributes lets you rename fields for consistency across services and log formats.

[Screenshot: Clone Log Processing]

Example:

  • log.file.name → logFileName

Use this to standardize naming conventions and reduce query confusion across teams.
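
A minimal sketch of the cloning behaviour, assuming the original attribute is kept alongside the new name (the attribute names come from the example above):

def clone_attribute(record, source, target):
    """Copy an attribute to a new name; the original key is kept."""
    if source in record:
        record[target] = record[source]
    return record

record = {"log.file.name": "access.log"}
print(clone_attribute(record, "log.file.name", "logFileName"))
# {'log.file.name': 'access.log', 'logFileName': 'access.log'}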

6. Sensitive Data Masking

Sensitive Data Masking applies regex-based replacement before logs leave the machine. This keeps secrets and personal data from being forwarded in readable form.

[Screenshot: Sensitive Data Masking]

Example:

  • Regex: password=[^& ]+
  • Replacement: password=***
  • Result: any matching value, e.g. password=<secret>, is forwarded as password=***

This protects:

  • secrets
  • tokens
  • PII

and strengthens compliance posture early in the telemetry path.
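
A quick way to sanity-check a masking rule is to run it locally with the same regex; the sketch below uses Python's re.sub with the pattern and replacement from the example (the user and password values are made up):

import re

# Pattern and replacement from the example above.
MASK = re.compile(r"password=[^& ]+")

def mask(line):
    """Redact password values before the log leaves the machine."""
    return MASK.sub("password=***", line)

# 'alice' and 'hunter2' are made-up values for illustration.
print(mask("login ok user=alice password=hunter2&session=active"))
# login ok user=alice password=***&session=active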

Need assistance or want to learn more about Middleware? Contact our support team at [email protected] or join our Slack channel.