Source
The _source field contains the original JSON document body that was indexed. Although this field is not searchable, it is stored so that the full document can be returned by fetch requests, such as get and search.
Disabling the field
You can disable the _source field by setting the enabled parameter to false, as shown in the following example request:
PUT sample-index1
{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}
Disabling the _source field can impact the availability of certain features, such as the update, update_by_query, and reindex APIs, as well as the ability to debug queries or aggregations using the original indexed document. To keep these features available without storing the _source field, you can use derived source, which reconstructs the source when it is needed instead of storing it.
Including or excluding fields
You can selectively control the contents of the _source field by using the includes and excludes parameters. This allows you to prune the stored _source field after it is indexed but before it is saved, as shown in the following example request:
PUT logs
{
  "mappings": {
    "_source": {
      "includes": [
        "*.count",
        "meta.*"
      ],
      "excludes": [
        "meta.description",
        "meta.other.*"
      ]
    }
  }
}
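To make the pruning rules concrete, the following Python sketch applies includes and excludes patterns to a nested document. It is a simplified model, not OpenSearch's implementation: it assumes that patterns are matched against dotted field paths, that a wildcard (*) can span dots (as Python's fnmatch does), that an empty includes list keeps everything, and that excludes are applied after includes. The function names and the sample document are hypothetical.

```python
import fnmatch
import json


def flatten(doc, prefix=""):
    """Yield (dotted_path, leaf_value) pairs for every leaf in a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def matches_any(path, patterns):
    # fnmatch's "*" matches any run of characters, including dots.
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)


def filter_source(doc, includes, excludes):
    """Keep leaves that match an include pattern (all leaves if includes is
    empty) and match no exclude pattern, then rebuild the nested dict."""
    result = {}
    for path, value in flatten(doc):
        if includes and not matches_any(path, includes):
            continue
        if matches_any(path, excludes):
            continue
        node = result
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


doc = {
    "requests": {"count": 42},
    "meta": {"description": "long text", "other": {"tag": "x"}, "host": "web-1"},
    "message": "GET /index.html",
}
pruned = filter_source(doc, ["*.count", "meta.*"], ["meta.description", "meta.other.*"])
print(json.dumps(pruned))  # {"requests": {"count": 42}, "meta": {"host": "web-1"}}
```

In this model, message is dropped because it matches no includes pattern, while meta.description and meta.other.tag match an includes pattern but are then removed by excludes.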
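Fields removed by these parameters (fields that match an excludes pattern or do not match any includes pattern) are not stored in the _source field, but you can still search them because the data remains indexed.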
Derived source
OpenSearch stores each ingested document in the _source field and also indexes individual fields for search. The _source field can consume significant storage space. To reduce storage use, you can configure OpenSearch to skip storing the _source field and instead reconstruct it dynamically when needed, for example, during search, get, mget, reindex, or update operations.
To enable derived source, configure the derived_source index-level setting:
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  }
}
While skipping the _source field can significantly reduce storage requirements, dynamically deriving the source is generally slower than reading a stored _source. To avoid this overhead during search queries, do not request the _source field when it's not needed. For example, set the size parameter, which controls the number of documents returned, to 0 when you only need aggregations, or set _source to false in the search request when you need the matching hits but not their contents.
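The following Python sketch builds two search request bodies that avoid deriving _source: one sets size to 0 so that no hits (and therefore no sources) are returned, and one returns hits while turning off _source in the response. The helper names, the aggregation name by_value, and the field names status and message are hypothetical; the bodies would be sent to the search endpoint with whichever client you use.

```python
import json


def aggregation_only_body(field):
    """size=0: no hits are returned, so no _source has to be derived."""
    return {"size": 0, "aggs": {"by_value": {"terms": {"field": field}}}}


def hits_without_source_body(query):
    """Return matching hits (IDs, scores) but leave out the _source field."""
    return {"query": query, "_source": False}


print(json.dumps(aggregation_only_body("status")))
print(json.dumps(hits_without_source_body({"match": {"message": "error"}})))
```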
For real-time reads using the Get Document API or Multi-get Documents API, which are served from the translog until a refresh occurs, performance can be slower when using derived source. This is because the document must first be ingested temporarily before the source can be reconstructed. You can avoid this additional latency by setting the index-level derived_source.translog.enabled setting to false, which disables generating a derived source during translog reads:
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "translog": {
          "enabled": false
        }
      }
    }
  }
}
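Both settings shown in this section live under the same index-level derived_source object. Assuming you set them together when creating the index, the following minimal Python sketch builds the combined settings body as a dictionary and prints it as JSON for a request such as PUT sample-index1. The helper name derived_source_index_body is hypothetical, and sending the request is left to whichever client you use.

```python
import json


def derived_source_index_body(skip_translog_derivation=False):
    """Build an index-creation body with derived source enabled.

    If skip_translog_derivation is True, also set
    index.derived_source.translog.enabled to false."""
    derived = {"enabled": True}
    if skip_translog_derivation:
        derived["translog"] = {"enabled": False}
    return {"settings": {"index": {"derived_source": derived}}}


# Body for a request such as: PUT sample-index1
print(json.dumps(derived_source_index_body(skip_translog_derivation=True), indent=2))
```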
If this setting is used, you may notice differences in the _source content for a document depending on whether it is still in the translog or has been written to a segment.
Supported fields and parameters
Derived source uses doc_values and stored_fields to reconstruct the document at query time. Because of how doc_values are stored, the dynamically generated _source may differ in format or precision from the original ingested document.
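One source of such differences is how Lucene stores multi-valued doc values: keyword-like values are kept as a sorted set (each distinct value once, in sorted order), and numeric values are kept sorted with duplicates retained. The following Python sketch is a simplified model of only that effect; it ignores other differences such as date formatting or floating-point precision, the function names are hypothetical, and OpenSearch's actual output may differ in further details.

```python
def derive_keyword_values(values):
    """Model of sorted-set doc values: distinct values, sorted."""
    return sorted(set(values))


def derive_numeric_values(values):
    """Model of sorted-numeric doc values: duplicates kept, order sorted."""
    return sorted(values)


original = {"tags": ["web", "prod", "web"], "ports": [443, 80, 443]}
derived = {
    "tags": derive_keyword_values(original["tags"]),
    "ports": derive_numeric_values(original["ports"]),
}
print(derived)  # {'tags': ['prod', 'web'], 'ports': [80, 443, 443]}
```

In this model the original array order is lost for both fields, and the repeated keyword value is also collapsed.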
Derived source supports the following field types. Most of them work without any changes to field mappings, although some have limitations:
- boolean
- byte, double, float, half_float, integer, long, short
- date
- date_nanos
- geo_point
- ip
- keyword
- unsigned_long
- scaled_float
- text
- wildcard
For a text field with derived source enabled, the field value is stored as a stored field by default. You do not need to set the store mapping parameter to true.
To use the wildcard field with derived source, the doc_values mapping parameter must be set to true.
Limitations
Derived source does not support the following fields:
- Fields that define the copy_to parameter.
- keyword and wildcard fields that define either the ignore_above or normalizer parameter.
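The field-level rules listed above can be checked mechanically before you enable derived source. The following Python sketch is a hypothetical helper, not an OpenSearch API: it scans a flat mappings properties dictionary and reports fields whose type is not in the supported list, fields that use copy_to, keyword or wildcard fields with ignore_above or normalizer, and wildcard fields without doc_values set to true. Object fields and multi-fields are out of scope for this sketch, and the sample mapping is made up for illustration.

```python
SUPPORTED_TYPES = {
    "boolean", "byte", "double", "float", "half_float", "integer", "long",
    "short", "date", "date_nanos", "geo_point", "ip", "keyword",
    "unsigned_long", "scaled_float", "text", "wildcard",
}


def derived_source_problems(properties):
    """Return human-readable reasons why fields may block derived source."""
    problems = []
    for name, spec in properties.items():
        ftype = spec.get("type")
        if ftype not in SUPPORTED_TYPES:
            problems.append(f"{name}: type {ftype!r} is not in the supported list")
        if "copy_to" in spec:
            problems.append(f"{name}: copy_to is not supported")
        if ftype in ("keyword", "wildcard"):
            for param in ("ignore_above", "normalizer"):
                if param in spec:
                    problems.append(f"{name}: {param} is not supported on {ftype}")
        if ftype == "wildcard" and spec.get("doc_values") is not True:
            problems.append(f"{name}: wildcard needs doc_values set to true")
    return problems


mapping = {
    "status": {"type": "keyword"},
    "title": {"type": "text", "copy_to": "all_text"},
    "url": {"type": "wildcard"},
    "host": {"type": "keyword", "ignore_above": 256},
}
for problem in derived_source_problems(mapping):
    print(problem)
```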