Atlassian Confluence source
You can use the OpenSearch Data Prepper confluence
source to ingest records from one or more Atlassian Confluence spaces.
Usage
Set up Confluence project access credentials by choosing one of the following options:
- Basic authentication (API key authentication): Follow these instructions.
- OAuth2 authentication: Follow these instructions.
As an additional optional step, store the credentials in AWS Secrets Manager. If you don’t store the credentials in AWS Secrets Manager, then you must provide plain-text credentials directly in the pipeline configuration.
The following example pipeline specifies confluence
as a source. The pipeline ingests data from multiple Confluence spaces named space1
and space2
and applies filters to select wiki content (pages and blog posts) from these projects as a source:
version: "2"
extension:
aws:
secrets:
confluence-account-credentials:
secret_id: "arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-credentials-secret"
region: "us-east-1"
sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
atlassian-confluence-pipeline:
source:
confluence:
hosts: ["https://example.atlassian.net/"]
acknowledgments: true
authentication:
# Provide one of the authentication method to use. Supported methods are 'basic' and 'oauth2'.
# For basic authentication, password is the API key that you generate using your confluence account
basic:
username: ${{aws_secrets:confluence-account-credentials:username}}
password: ${{aws_secrets:confluence-account-credentials:password}}
# For OAuth2 based authentication, we require the following 4 key values stored in the secret
# Follow atlassian instructions at the following link to generate these keys
# https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/
# If you are using OAuth2 authentication, we also require, write permission to your aws secret to
# be able to write the renewed tokens back into the secret
# oauth2:
# client_id: ${{aws_secrets:confluence-account-credentials:clientId}}
# client_secret: ${{aws_secrets:confluence-account-credentials:clientSecret}}
# access_token: ${{aws_secrets:confluence-account-credentials:accessToken}}
# refresh_token: ${{aws_secrets:confluence-account-credentials:refreshToken}}
filter:
space:
key:
include:
# This is not space name.
# It is an alphanumeric space key that you can find under space details in confluence
- "space1"
- "space2"
# exclude:
# - "<<space key>>"
# - "<<space key>>"
page_type:
include:
- "page"
# - "blogpost"
# - "comment"
# exclude:
# - "attachment"
Configuration options
The confluence
source supports the following configuration options.
Option | Required | Type | Description |
---|---|---|---|
hosts | Yes | List | The Atlassian Confluence hostname. Currently, only one host is supported, so this list is expected to be of size 1. |
acknowledgments | No | Boolean | When set to true , enables the confluence source to receive end-to-end acknowledgments when events are received by OpenSearch sinks. |
authentication | Yes | authentication | Configures the authentication method used to access confluence source records from the specified host. |
filter | No | filter | Applies specific filter criteria while extracting Confluence content. |
Authentication
You can use one of the following authentication methods to access a Confluence host. You must provide one of the following parameters.
Option | Required | Type | Description |
---|---|---|---|
basic | Yes | basic | Basic authentication credentials used to access a Confluence host. |
oauth2 | Yes | oauth2 | OAuth2 authentication credentials used to access a Confluence host. |
Basic authentication
Either basic or OAuth2 credentials are required to access the Confluence site. If you use basic
authentication, the following fields are required.
Option | Required | Type | Description |
---|---|---|---|
username | Yes | String | A username or reference to the secret key storing the username. |
password | Yes | String | A password (API key) or reference to the secret key storing the password. |
OAuth2 authentication
Either basic or OAuth2 credentials are required to access the Confluence site. If you use OAuth2, the following fields are required.
Option | Required | Type | Description |
---|---|---|---|
client_id | Yes | String | A client_id or reference to the secret key storing the client_id . |
client_secret | Yes | String | A client_secret or reference to the secret key storing the client_secret . |
access_token | Yes | String | An access_token or reference to the secret key storing the access_token . |
refresh_token | Yes | String | A refresh_token or reference to the secret key storing the refresh_token . |
Filter
Optionally, you can specify filters to select specific content, shown in the following table. If no filters are specified, all the spaces and content visible for the specified credentials are extracted and sent to the specified sink in the pipeline.
Option | Required | Type | Description |
---|---|---|---|
space | No | String | A list of space keys to include or exclude. |
page_type | No | String | A list of page type filters to include or exclude. |
AWS secrets
You can use the following options in the aws
secrets configuration if you plan to store the credentials in AWS Secrets Manager. Storing secrets in AWS Secrets Manager is optional. If AWS Secrets Manager is not used, credentials must be specified in the pipeline YAML itself, in plain text.
If OAuth2 authentication is used in combination with aws
secrets, this source requires write permissions to AWS Secrets Manager to be able to write back the updated (or renewed) access token once the current token expires.
Option | Required | Type | Description |
---|---|---|---|
region | Yes | String | The AWS Region to use for credentials. Defaults to the standard SDK behavior for determining the Region. |
sts_role_arn | Yes | String | The AWS Security Token Service (AWS STS) role to assume for requests to Atlassian Confluence. Defaults to null , which uses the standard SDK behavior for credentials. |
secret_id | Yes | Map | The Amazon Resource Name (ARN) of the secret where the credentials are stored. |
Metrics
The confluence
source includes the following metrics (counters):
crawlingTime
: The amount of time taken to crawl through all the new changes in Confluence.pageFetchLatency
: The page fetch API operation latency.searchCallLatency
: The search API operation latency.searchResultsFound
: The number of pages found in a specified search API call.