Open
Labels
- 👋🏼 good first issue: Great for new contributors
- 🙋🏼♂️ help wanted: Extra attention is appreciated
- 🥳 enhancement: New feature or request
Description
We need a smarter tokenization method that accounts for languages that traditionally do not use spaces between words. At present, text in these languages ends up as full-sentence tokens, which are unsuitable for the current cosine similarity comparisons.
Some of these languages include:
- Chinese
- Japanese
- Thai
- Khmer
- Lao
- Burmese
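One possible direction, sketched below in Python purely for illustration (the project's actual language and tokenizer are not shown in this issue), is to fall back to overlapping character bigrams for runs of text in scripts written without spaces, while keeping ordinary word tokens elsewhere. The names `tokenize` and `cosine` and the list of Unicode ranges are hypothetical and not exhaustive. A dictionary-based segmenter would likely give better word boundaries; bigrams are just a dependency-free baseline.

```python
import math
import re
from collections import Counter

# Unicode blocks for scripts usually written without spaces between words.
# Illustrative assumption only; the list is not exhaustive.
_NO_SPACE = (
    "\u4e00-\u9fff"  # CJK Unified Ideographs (Chinese, Japanese kanji)
    "\u3040-\u30ff"  # Hiragana and Katakana
    "\u0e00-\u0e7f"  # Thai
    "\u0e80-\u0eff"  # Lao
    "\u1000-\u109f"  # Myanmar (Burmese)
    "\u1780-\u17ff"  # Khmer
)
# A run is either consecutive no-space-script characters,
# or consecutive word characters from any other script.
_RUN = re.compile(f"[{_NO_SPACE}]+|[^\\W{_NO_SPACE}]+")
_NO_SPACE_CHAR = re.compile(f"[{_NO_SPACE}]")


def tokenize(text):
    """Word tokens for spaced scripts, character bigrams for unspaced ones."""
    tokens = []
    for run in _RUN.findall(text):
        if _NO_SPACE_CHAR.match(run):
            if len(run) == 1:
                tokens.append(run)
            else:
                # Overlapping bigrams: "東京駅" -> "東京", "京駅"
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())
    return tokens


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of tokens."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

With whitespace splitting, "東京タワー" and "東京駅" would each be a single, distinct token and score zero; with the bigram fallback they share the token "東京" and get a non-zero similarity.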
GitHub30