Cribl Stream is recognized for its ability to efficiently structure data flows while providing capabilities to mask, transform, reduce, and route data. Its straightforward implementation and operation make it particularly suitable for enterprise environments, which has contributed to Cribl's rapid adoption among Fortune 100 companies globally. Cribl Stream delivers out-of-the-box functionality that many organizations have wanted for a long time.
Our experience with this solution spans several years, during which we have consistently been impressed by its organizational impact. As a partner, we work with Cribl to enhance the products themselves; for features outside the core functionality, we have developed an open-source solution: CriblUtilities, a comprehensive tool that streamlines migrations and validates configurations. Our open-source solution ensures a smooth implementation of Cribl Stream within your organization.
We started our development with the core functionality of database migration. Splunk provides DBConnect as a supplementary application for the Splunk Heavy Forwarder. However, the architecture of the Heavy Forwarder does not support high availability. While it is possible to construct a high-availability solution around it, Cribl Stream natively supports database queries at defined intervals with built-in high availability, making this a quick win for organizations that are starting their Cribl Stream implementation.
All you need is a user on your Cribl API and the three configuration files from Splunk; our CriblUtilities solution takes care of the rest.
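For reference, on a typical Heavy Forwarder these files live in the DB Connect app's local directory (the exact path may differ in your deployment):

/opt/splunk/etc/apps/splunk_app_db_connect/local/db_connections.conf
/opt/splunk/etc/apps/splunk_app_db_connect/local/identities.conf
/opt/splunk/etc/apps/splunk_app_db_connect/local/db_inputs.conf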
The development process required substantial testing and coding effort. Splunk uses a modified implementation of TOML (Tom's Obvious, Minimal Language); the standard Python TOML module proved incompatible with it, so the data must be reformatted to ensure compatibility. The migration process transforms three distinct Splunk files (db_connections.conf, identities.conf, and db_inputs.conf) into two Cribl outputs: Database Connection Knowledge items and Database Collectors. After verification and additional configuration adjustments, such as schedule activation, the new configurations are deployed to Cribl via the Cribl API.
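To illustrate the kind of reformatting involved, here is a minimal Python sketch, assuming the main incompatibility is Splunk's unquoted string values; the helper name and the quoting rule are illustrative, not the exact CriblUtilities implementation.

import re
import tomllib  # Python 3.11+; use the tomli package on older versions

def splunk_conf_to_dict(text: str) -> dict:
    """Quote bare values so the standard TOML parser accepts Splunk conf syntax."""
    lines = []
    for line in text.splitlines():
        match = re.match(r'^(\s*)([\w.]+)\s*=\s*(.*)$', line)
        if match and not match.group(3).startswith(('"', "'")):
            indent, key, value = match.groups()
            # Escape backslashes and quotes, then wrap the bare value in quotes
            escaped = value.replace('\\', '\\\\').replace('"', '\\"')
            line = f'{indent}{key} = "{escaped}"'
        lines.append(line)
    return tomllib.loads('\n'.join(lines))

# Example stanza in the style of db_connections.conf (contents illustrative)
print(splunk_conf_to_dict('[my_connection]\nconnection_type = mssql\nhost = db.example.internal'))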
Currently, this represents the sole migration path incorporated into the utility. It serves as a foundation for the continued expansion of migration capabilities in future development.
The picture above shows the process in four steps. In step 1, the CriblUtilities setup starts: the prompt asks the user for a few required parameters, which results in a ".env" file. Alternatively, you can create this file manually. In step 2, the empty screens show that the environment starts out empty. In step 3, the actual migration runs, and the script gives insight into what it is doing. In step 4, you see the proof that the database connections and collectors have been created in Cribl Stream.
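A resulting ".env" could look like the following; the variable names here are hypothetical, as the setup prompt generates the actual keys for you:

CRIBL_URL=https://cribl.example.internal:9000
CRIBL_USERNAME=migration-user
CRIBL_PASSWORD=********
SPLUNK_CONF_DIR=/opt/splunk/etc/apps/splunk_app_db_connect/local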
Cribl Stream provides native support for GitOps, which we consider very beneficial as it facilitates a unified approach to configuration promotion, testing, and administration. The implementation process is straightforward, requiring only a single API command. Our clients typically establish a workflow wherein designated individuals review merge requests between environments to identify potential configuration anomalies.
CriblUtilities does not aim to replace these manual reviews but rather to reduce their scope. The utility can be integrated into the pipeline to perform configuration verification prior to manual review, thereby reducing reviewer workload and implementing consistent validation procedures.
Our initial development focused on YAML file linting. As Cribl utilizes standard YAML formatting, we leverage the Python YAML module to identify unsupported modifications, such as improperly formatted local changes. Preventing improperly formatted YAML files is critical, as they may compromise Cribl Stream in production.
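As a simplified illustration of this check, the following Python sketch flags files that the standard parser rejects; the function name and directory layout are assumptions, not CriblUtilities internals.

from pathlib import Path
import yaml  # PyYAML

def find_broken_yaml(conf_dir: str) -> list[str]:
    """Return the configuration files that fail to parse as valid YAML."""
    broken = []
    for path in Path(conf_dir).rglob('*.yml'):
        try:
            with path.open() as handle:
                yaml.safe_load(handle)
        except yaml.YAMLError as error:
            broken.append(f'{path}: {error}')
    return broken

for problem in find_broken_yaml('/opt/cribl'):
    print(problem)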
Establishing naming conventions for Cribl Stream configuration elements is considered best practice, so we have implemented naming convention verification as an additional feature in CriblUtilities. Organizations typically maintain conventions for worker groups, sources, destinations, data routes, pipelines, and packs. As different configuration types store naming information in different locations, our utility simplifies verification through commands such as:
cribl-utilities check naming --conf /opt/cribl --field sources --regex "src_.*"
For pipeline configurations, similar verification can be performed with:
cribl-utilities check naming --conf /opt/cribl --field pipelines --regex "ppl_.*"
In this example, four worker groups have been created. Only one complies with the new naming convention, which states that the name of a worker group should start with "wgp_".
In this example you see the default worker group. We created an exception for it in the naming convention, since this group always exists. That leaves two worker groups that do not comply with the naming convention:
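The underlying logic can be pictured with a short Python sketch; the function and the hard-coded exception for "default" are illustrative, assuming a simple regex match against each configured name.

import re

def check_naming(names: list[str], pattern: str, exceptions=frozenset({'default'})) -> list[str]:
    """Return the names that neither match the convention nor appear in the exception list."""
    regex = re.compile(pattern)
    return [name for name in names if name not in exceptions and not regex.match(name)]

worker_groups = ['default', 'wgp_europe', 'ingest-east', 'test-group']
print(check_naming(worker_groups, r'wgp_.*'))  # ['ingest-east', 'test-group']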
Use the debug flag to check whether the utility inspects all the configurations you expect; the filtered_fields output is especially interesting in this case:
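For example, a run could look like this (the exact flag spelling is an assumption; check the utility's built-in help):

cribl-utilities check naming --conf /opt/cribl --field sources --regex "src_.*" --debug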
We have decided to launch this initiative as an open-source project because of our commitment to transparency. CriblUtilities does not transmit data anywhere except to your designated Cribl environment, and our open approach enables you to verify that assurance. We encourage developers to participate and enhance functionality, and we welcome issue reports from all users to facilitate ongoing improvement.