Built a handy tool to visualise multiple #OpenAI tokenizers with precise token counts, including #ChatGPT prompt messages. 💬🤖
Tiktokenizer
What is Tiktokenizer?
In short: Tool to tokenize text, show token count, token price and visualise tokens in text.
Tiktokenizer is a tool developed by dqbd that tokenizes text and counts the number of tokens in it. It gives quick insight into the token count and estimated price of any given prompt.
Popular Models:
- gpt-3.5-turbo
- gpt-4
- gpt-4-32k
- text-davinci-003
- text-embedding-ada-002
Encoders
- gpt2
- cl100k_base
- p50k_base
- p50k_edit
- r50k_base
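Each model listed above is tied to one of these encoders. The sketch below shows that relationship as a plain lookup table. The pairs reflect tiktoken's model table as best understood at the time of writing and are an assumption of this example, not something taken from Tiktokenizer's own code; `MODEL_TO_ENCODING` and `encoding_for` are hypothetical names. When the tiktoken package is installed, its `encoding_for_model` function is the better source of truth.

```python
from typing import Dict

# Assumed model-to-encoder pairs; verify against tiktoken before relying on them.
MODEL_TO_ENCODING: Dict[str, str] = {
    "gpt-4": "cl100k_base",
    "gpt-4-32k": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
    "text-davinci-003": "p50k_base",
}


def encoding_for(model: str) -> str:
    """Return the encoder name recorded for a model, or raise ValueError."""
    try:
        return MODEL_TO_ENCODING[model]
    except KeyError:
        raise ValueError(f"no encoding recorded for model {model!r}") from None


if __name__ == "__main__":
    for name in MODEL_TO_ENCODING:
        print(f"{name}: {encoding_for(name)}")
```

Picking the right encoder matters because the same text can produce a different number of tokens under cl100k_base than under p50k_base.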
Features
- Tokenization: Tiktokenizer uses the encodings from OpenAI's tiktoken library to break text into tokens and highlights each token in the input.
- Token Count: It provides an accurate count of tokens present in the given text, giving users an understanding of the length and complexity of their content.
- Pricing Information: With Tiktokenizer, users can estimate the cost per prompt based on the token count, helping them plan and manage their usage efficiently.
- User-Friendly Interface: The tool's simple and intuitive interface makes it easy for users to input text, view token count, and calculate pricing in a hassle-free manner.
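The token count and pricing features above come down to a small calculation: encode the text, count the tokens, and multiply by a per-token rate. The following is a rough sketch of that idea, not Tiktokenizer's actual implementation. The function names are hypothetical, the rate is supplied by the caller rather than being a real price, and `tiktoken_encoder` assumes the tiktoken package is installed (it may download the encoding on first use).

```python
from typing import Callable, Dict, List, Sequence


def tiktoken_encoder(encoding_name: str = "cl100k_base") -> Callable[[str], List[int]]:
    """Return an encode function backed by tiktoken (pip install tiktoken)."""
    import tiktoken  # imported lazily so the rest of the sketch has no dependencies

    return tiktoken.get_encoding(encoding_name).encode


def estimate_cost(token_count: int, usd_per_1k_tokens: float) -> float:
    """Estimated cost for a token count at a caller-supplied rate per 1,000 tokens."""
    return token_count / 1000 * usd_per_1k_tokens


def summarize(
    text: str,
    encode: Callable[[str], Sequence[int]],
    usd_per_1k_tokens: float,
) -> Dict[str, float]:
    """Count the tokens in `text` and estimate what the prompt would cost."""
    token_count = len(encode(text))
    return {
        "token_count": token_count,
        "estimated_usd": estimate_cost(token_count, usd_per_1k_tokens),
    }
```

With tiktoken available, a call would look like `summarize("Hello, world!", tiktoken_encoder("cl100k_base"), usd_per_1k_tokens=rate)`, where `rate` is whatever the provider currently charges for the chosen model.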
What people are saying about Tiktokenizer
Awesome visualization, David. This is going to be super helpful 🙏 TIL gpt-3.5-turbo uses cl100k_base 😄 Gotta update my codes...
Tiktokenizer - like the OpenAI web tokenizer for counting tokens in your pasted content, but it also computes costs. Using the OpenAI tiktoken library that computes cl100k_base and dozens of engines, not just a guess. OpenML formatting, etc.
dqbd/tiktokenizer: Online playground for OpenAPI tokenizers github.com/dqbd/tiktokeni…
TikTokenizer is a useful little app 💎 to compute the price per prompt of #GPT4 #ChatGPT #GPT3 and most available @OpenAI models. Based on a WASM port of TikToken it works as a #Node api as well! Made by @__dqbd tiktokenizer.vercel.app
What makes this even better is that the hierarchy or relation is based on the white spaces, which happen to be only one token!
Other related tools
Cursor is an AI-first code editor designed for pair-programming, offering features like code browsing, documentation referencing, code generation, bug fixing, and seamless migration from VSCode. It aims to empower developers and accelerate software development.