tokenizer
Tokenizer from tiktoken.
Classes
Tokenizer – Tokenizer component that wraps around the tokenizer from tiktoken.
class Tokenizer(name: str = 'cl100k_base', remove_stop_words: bool = False)
Bases: object
Tokenizer component that wraps around the tokenizer from tiktoken. Calling an instance (__call__) is equivalent to calling forward/encode, so the tokenizer can be used as a step in a Sequential. You can also call the encode and decode methods directly.
Parameters:
name (str, optional) – The name of the tokenizer. Defaults to “cl100k_base”. You can find more information in the tiktoken documentation.
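The sketch below illustrates the wrapper pattern described above. `__call__` delegates to `forward`, and `forward` delegates to `encode`, so an instance can be called like a function while `encode` and `decode` remain available. It is a hypothetical stand-in and not this library's implementation. `TokenizerSketch`, `Encoding`, and the `encoding_factory` parameter are illustrative names. The default factory assumes tiktoken's `tiktoken.get_encoding(name)`. The sketch omits `remove_stop_words` because its behavior is not described here.

```python
from typing import Callable, List, Protocol


class Encoding(Protocol):
    """Hypothetical interface: anything with tiktoken-style encode/decode."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, tokens: List[int]) -> str: ...


def _default_encoding(name: str) -> Encoding:
    # Lazy import, so the sketch can be exercised without tiktoken installed.
    import tiktoken

    return tiktoken.get_encoding(name)


class TokenizerSketch:
    """Hypothetical wrapper; remove_stop_words is intentionally not modeled."""

    def __init__(
        self,
        name: str = "cl100k_base",
        # Assumption: an injectable factory, added here only for testability.
        encoding_factory: Callable[[str], Encoding] = _default_encoding,
    ) -> None:
        self.name = name
        self._encoding = encoding_factory(name)

    def encode(self, text: str) -> List[int]:
        # Text -> list of token ids, delegated to the wrapped encoding.
        return self._encoding.encode(text)

    def decode(self, tokens: List[int]) -> str:
        # List of token ids -> text, delegated to the wrapped encoding.
        return self._encoding.decode(tokens)

    def forward(self, text: str) -> List[int]:
        # forward is an alias for encode.
        return self.encode(text)

    def __call__(self, text: str) -> List[int]:
        # Calling the instance is the same as forward/encode, which lets it
        # sit in a pipeline that simply calls each component in turn.
        return self.forward(text)
```

If tiktoken is installed, `TokenizerSketch()` loads the `cl100k_base` encoding through `tiktoken.get_encoding`. tiktoken may need to fetch and cache the encoding file the first time it is used. The injected factory keeps the wrapper independent of any one backend, so the sketch can be tested with a stub encoding.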