Link Search Menu Expand Document Documentation Menu

ad (Deprecated)

The ad command is deprecated in favor of the ml command.

The ad command applies the Random Cut Forest (RCF) algorithm in the ML Commons plugin to the search results returned by a PPL command. The command provides two anomaly detection approaches:

To use the ad command, plugins.calcite.enabled must be set to false.

Syntax

The ad command has two different syntax variants, depending on the algorithm type.

Anomaly detection for time-series data

Use this syntax to detect anomalies in time-series data. This method uses the fixed-in-time RCF algorithm, which is optimized for sequential data patterns.

The fixed-in-time RCF ad command has the following syntax:

ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] <time_field> [date_format] [time_zone] [category_field]

Parameters

The fixed-in-time RCF algorithm supports the following parameters.

Parameter Required/Optional Description
time_field Required The time field for RCF to use as time-series data.
number_of_trees Optional The number of trees in the forest. Default is 30.
shingle_size Optional The number of records in a shingle. A shingle is a consecutive sequence of the most recent records. Default is 8.
sample_size Optional The sample size used by the stream samplers in this forest. Default is 256.
output_after Optional The number of points required by the stream samplers before results are returned. Default is 32.
time_decay Optional The decay factor used by the stream samplers in this forest. Default is 0.0001.
anomaly_rate Optional The anomaly rate. Default is 0.005.
date_format Optional The format used for the time_field field. Default is yyyy-MM-dd HH:mm:ss.
time_zone Optional The time zone for the time_field field. Default is UTC.
category_field Optional The category field used to group input values. The predict operation is applied to each category independently.

Anomaly detection for non-time-series data

Use this syntax to detect anomalies in data where the order doesn’t matter. This method uses the batch RCF algorithm, which is optimized for independent data points.

The batch RCF ad command has the following syntax:

ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field]

Parameters

The batch RCF algorithm supports the following parameters.

Parameter Required/Optional Description
number_of_trees Optional The number of trees in the forest. Default is 30.
sample_size Optional The number of random samples provided to each tree from the training dataset. Default is 256.
output_after Optional The number of points required by the stream samplers before results are returned. Default is 32.
training_data_size Optional The size of the training dataset. Default is the full dataset size.
anomaly_score_threshold Optional The anomaly score threshold. Default is 1.0.
category_field Optional The category field used to group input values. The predict operation is applied to each category independently.

Example 1: Detecting events in New York City taxi ridership time-series data

The following examples use the nyc_taxi dataset, which contains New York City taxi ridership data with fields including value (number of rides), timestamp (time of measurement), and category (time period classifications such as ‘day’ and ‘night’).

This example trains an RCF model and uses it to detect anomalies in time-series ridership data:

source=nyc_taxi
| fields value, timestamp
| AD time_field='timestamp'
| where value=10844.0

The query returns the following results:

value timestamp score anomaly_grade
10844.0 2014-07-01 00:00:00 0.0 0.0

Example 2: Detecting events in New York City taxi ridership time-series data by category

This example trains an RCF model and uses it to detect anomalies in time-series ridership data across multiple category values:

source=nyc_taxi
| fields category, value, timestamp
| AD time_field='timestamp' category_field='category'
| where value=10844.0 or value=6526.0

The query returns the following results:

category value timestamp score anomaly_grade
night 10844.0 2014-07-01 00:00:00 0.0 0.0
day 6526.0 2014-07-01 06:00:00 0.0 0.0

Example 3: Detecting events in New York City taxi ridership non-time-series data

This example trains an RCF model and uses it to detect anomalies in non-time-series ridership data:

source=nyc_taxi
| fields value
| AD
| where value=10844.0

The query returns the following results:

value score anomalous
10844.0 0.0 False

Example 4: Detecting events in New York City taxi ridership non-time-series data by category

This example trains an RCF model and uses it to detect anomalies in non-time-series ridership data across multiple category values:

source=nyc_taxi
| fields category, value
| AD category_field='category'
| where value=10844.0 or value=6526.0

The query returns the following results:

category value score anomalous
night 10844.0 0.0 False
day 6526.0 0.0 False