LLM Say
The LLM Say block's main purpose is to accept user input and stream the model's reply.
The main parameters for tuning the answer are:
- LLM Model - we can choose which model will be the most suitable for our task
- Context message (RAG) - the prompt template; the following template has proven most effective:
{memory.query} {memory.character}. START CONTEXT BLOCK {memory.context} END OF CONTEXT BLOCK {memory.instructions}. Answer in maximum {memory.output_tokens} tokens.
- Number of previous messages to include in the context
- Max tokens - this is not an output-length target the model aims for; it is the number of tokens after which the message is simply cut short
- Streaming - This parameter defines how we want to display the message from the bot.
- None - chunk by chunk (more or less letter by letter)
- Words - word by word
- Full sentences - sentence by sentence
Changing this parameter is useful when configuring voice bots, which cannot read messages properly with the None or Words settings.
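The context-message template above can be illustrated with a short sketch. This is a minimal Python example of how the `memory` placeholders might be filled in; the field values are illustrative assumptions, not platform defaults.

```python
# Illustrative `memory` values; the keys mirror the template placeholders.
memory = {
    "query": "What are your opening hours?",
    "character": "You are a helpful support assistant",
    "context": "Our store is open Mon-Fri, 9:00-17:00.",
    "instructions": "Answer only from the context; say 'I don't know' otherwise.",
    "output_tokens": 100,
}

# The template from the Context message (RAG) parameter.
TEMPLATE = (
    "{query} {character}. START CONTEXT BLOCK {context} END OF CONTEXT BLOCK "
    "{instructions}. Answer in maximum {output_tokens} tokens."
)

prompt = TEMPLATE.format(**memory)
print(prompt)
```

The rendered `prompt` is what the LLM ultimately receives: the user query, the bot persona, the retrieved context fenced between the START/END markers, and the answering instructions.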
Tool calling
The LLM Say block can also be used for Tool Calling. Tool Calling is a feature that enhances conversational AI by enabling it to interact with external systems and data sources in real time, making bots considerably more capable.
Its key advantage is the ability to go beyond static, pre-trained knowledge and access up-to-date information. This allows bots to deliver real-time answers and perform tasks that rely on live data. By connecting to internal or third-party APIs, your AI can integrate seamlessly with existing business workflows.

This feature can function like an AI router, but with the added capability of defining exactly what information is required from the customer in a given context. It proactively gathers the necessary details step by step, and only once all relevant data is collected does it direct the conversation to the appropriate process or workflow.
It operates on a question–answer loop, exiting only once sufficient information has been obtained to proceed.
