Vocabulary object to be used with nvtext::wordpiece_tokenizer.

Public Member Functions
	wordpiece_vocabulary (cudf::strings_column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Vocabulary object constructor.

Detailed Description

Vocabulary object to be used with nvtext::wordpiece_tokenizer.

Definition at line 36 of file wordpiece_tokenize.hpp.

Constructor & Destructor Documentation

nvtext::wordpiece_vocabulary::wordpiece_vocabulary(
	cudf::strings_column_view const & input,
	rmm::cuda_stream_view stream = cudf::get_default_stream(),
	rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
)

Vocabulary object constructor.

Token ids are the row indices within the vocabulary column. Each vocabulary entry is expected to be unique; otherwise, the behavior is undefined.
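The row-index rule can be pictured with a small host-side model. This is only a sketch using the C++ standard library, not the libcudf implementation: the helper names build_token_ids and has_unique_entries are hypothetical, and the real vocabulary is held in device memory. It assumes the caller wants to check uniqueness before building the vocabulary, since duplicates are not diagnosed by the constructor.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical host-side model: maps each vocabulary string to its row index,
// mirroring the rule that a token id is the entry's row in the column.
std::unordered_map<std::string, int> build_token_ids(std::vector<std::string> const& vocab)
{
  std::unordered_map<std::string, int> ids;
  for (std::size_t row = 0; row < vocab.size(); ++row) {
    // emplace keeps the first row seen for a repeated entry; this is a choice
    // of this sketch only, the documented behavior for duplicates is undefined.
    ids.emplace(vocab[row], static_cast<int>(row));
  }
  return ids;
}

// Hypothetical pre-check: true when every entry is distinct.
bool has_unique_entries(std::vector<std::string> const& vocab)
{
  return build_token_ids(vocab).size() == vocab.size();
}
```

With a vocabulary of "[UNK]", "hello", "##ing", "world" (illustrative entries), this model assigns ids 0 through 3 in that order.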

Exceptions

std::invalid_argument if vocabulary contains nulls or is empty
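The documented failure conditions can be mirrored on the host as follows. This is a sketch under stated assumptions, not libcudf code: nullable_strings and validate_vocabulary are hypothetical names, std::optional stands in for a nullable column row, and "empty" is taken to mean a column with zero rows.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a nullable strings column:
// std::nullopt plays the role of a null row.
using nullable_strings = std::vector<std::optional<std::string>>;

// Hypothetical check that throws under the same two conditions the
// constructor documents: no rows at all, or at least one null row.
void validate_vocabulary(nullable_strings const& vocab)
{
  if (vocab.empty()) { throw std::invalid_argument("vocabulary must not be empty"); }
  for (auto const& entry : vocab) {
    if (!entry.has_value()) {
      throw std::invalid_argument("vocabulary must not contain nulls");
    }
  }
}
```

Running a check like this before copying data to the device lets a caller report a bad vocabulary early instead of catching the exception from the constructor.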

Parameters

input	Strings for the vocabulary
stream	CUDA stream used for device memory operations and kernel launches
mr	Device memory resource used to allocate the vocabulary object's device memory

The documentation for this struct was generated from the following file: