(dnt:collection-indices)=
# Collection Indices

Collection indices are materialised subsets of the main index that are considered important or relevant
enough to be kept pre-computed and pre-stored.

A collection index may contain only the individual preprocessed web pages plus metadata, stored in parquet files, or it may additionally include pre-computed `ciff` index files.


## Collection Index: `legal`

One collection index that is available is called `legal`, as it contains only


## Collection Index: `embeddings`

We will release an additional collection index containing embeddings for (part of) our crawled data. The parquet files
in the `embeddings` index will contain the following information:

- `start_end_position`: A list of tuples containing the start and end character positions of every chunk.
- `embeddings`: A list of embeddings created for the document.
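
As a rough illustration of how these two fields could be used together, the sketch below pairs each chunk's text span with an embedding. It rests on several assumptions that are not confirmed here: that the end position is exclusive, that `embeddings` holds one vector per chunk in the same order as `start_end_position`, and that the document text is available separately. The `text` key, the sample row, and the helper `iter_chunks` are hypothetical names introduced only for this example.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_chunks(
    text: str,
    start_end_position: Sequence[Tuple[int, int]],
    embeddings: Sequence[Sequence[float]],
) -> Iterator[Tuple[str, Sequence[float]]]:
    """Yield (chunk_text, embedding) pairs for one document.

    Assumes one embedding per chunk, in the same order, and that each
    (start, end) pair is a character range with an exclusive end.
    """
    if len(start_end_position) != len(embeddings):
        raise ValueError("expected one embedding per chunk position")
    for (start, end), vector in zip(start_end_position, embeddings):
        yield text[start:end], vector


# Hypothetical row, shaped like one record of an `embeddings` parquet file.
# The "text" key is an assumption; the real data may store the text elsewhere.
row = {
    "text": "Hello world. Second chunk.",
    "start_end_position": [(0, 12), (13, 26)],
    "embeddings": [[0.1, 0.2], [0.3, 0.4]],
}

chunks: List[Tuple[str, Sequence[float]]] = list(
    iter_chunks(row["text"], row["start_end_position"], row["embeddings"])
)
for chunk_text, vector in chunks:
    print(repr(chunk_text), len(vector))
```

In practice the row would come from whichever parquet reader is in use. If the end positions turn out to be inclusive, the slice would need to be `text[start:end + 1]` instead.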
