Add support for tracking performance of individual Events in the grok processor #4196

@graytaylor0

Is your feature request related to a problem? Please describe.
As a user of a pipeline with many grok processors and patterns, I find it difficult to debug the performance of my grok processors. The only metric is grokProcessingTime, and it is aggregated across all grok processor instances. The only way to know which Events spend a lot of time in grok is when the grok match times out and the Event is tagged with tags_on_timeout. However, there can still be very slow patterns that do not hit the timeout, and these could be optimized to improve performance.

Describe the solution you'd like
An option to create metadata on Events that contains important debug information related to grok matching for this Event.

- grok:
     include_performance_metadata: true # defaults to false
     match:
       log:
          - "%{PATTERN_1}"
          - "%{PATTERN_2}"

When the include_performance_metadata flag is set to true, the grok processor adds metadata fields to the Event. To start, these metadata fields can be:

_total_grok_processing_time: 2500 // in milliseconds
_total_grok_patterns_attempted: 10 // The number of individual patterns this Event attempted to match on 

These same metadata fields are shared between all grok processors, so each processor adds to the running totals. Given this configuration:

- grok:
     include_performance_metadata: true
     match:
       log:
          - "%{PATTERN_1}" # mismatch after 1000 ms
          - "%{PATTERN_2}" # matches after 1000 ms
- grok:
     include_performance_metadata: true
     match:
       log:
          - "%{PATTERN_3}" # mismatch after 1000 ms
          - "%{PATTERN_4}" # mismatch after 1000 ms

If an Event takes the path indicated by the comments, the end result of the metadata fields would be:

_total_grok_processing_time: 4000
_total_grok_patterns_attempted: 4
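
The accumulation in the walkthrough above can be modelled with a short Python sketch. This is only an illustration of the proposed semantics, not Data Prepper code: the `Event` class, the `apply_grok` helper, and the `(elapsed_ms, matched)` tuples are all hypothetical, and the sketch assumes a processor stops trying patterns after the first match.

```python
from dataclasses import dataclass, field

# Metadata keys proposed in this issue.
TIME_KEY = "_total_grok_processing_time"
COUNT_KEY = "_total_grok_patterns_attempted"


@dataclass
class Event:
    """Hypothetical stand-in for an Event that carries metadata."""
    metadata: dict = field(default_factory=dict)


def apply_grok(event, attempts, include_performance_metadata=False):
    """Simulate one grok processor.

    `attempts` lists (elapsed_ms, matched) per pattern, in the order tried.
    Assumption: the processor stops at the first matching pattern.
    """
    elapsed_ms = 0
    patterns_tried = 0
    for ms, matched in attempts:
        elapsed_ms += ms
        patterns_tried += 1
        if matched:
            break
    if include_performance_metadata:
        # Add to the totals left by earlier grok processors, if any.
        event.metadata[TIME_KEY] = event.metadata.get(TIME_KEY, 0) + elapsed_ms
        event.metadata[COUNT_KEY] = event.metadata.get(COUNT_KEY, 0) + patterns_tried
    return event


event = Event()
# First grok: PATTERN_1 mismatches, PATTERN_2 matches.
apply_grok(event, [(1000, False), (1000, True)], include_performance_metadata=True)
# Second grok: PATTERN_3 and PATTERN_4 both mismatch.
apply_grok(event, [(1000, False), (1000, False)], include_performance_metadata=True)
print(event.metadata)
# {'_total_grok_processing_time': 4000, '_total_grok_patterns_attempted': 4}
```

Because each processor adds to whatever totals are already present, a processor with the flag left at its default of false contributes nothing to either field.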

This metadata can then be used with the getMetadata function of Data Prepper expressions as needed, such as copying it over to the Event with add_entries:

- add_entries:
     entries:
        - add_when: 'getMetadata("_total_grok_processing_time") != null'
          key: "grok_processing_time"
          value_expression: 'getMetadata("_total_grok_processing_time")'

Describe alternatives you've considered (Optional)
Add this metadata to Events by default, without the need to configure the include_performance_metadata parameter. While the overhead is minimal, this change could increase memory usage unnecessarily.

Another alternative is to keep the parameter but default it to true, allowing users to disable it if needed.

Labels: ease-of-use (Improving the ease-of-use for an existing feature), enhancement (New feature or request)

Status
Done
