The LLM Say Block's main purpose is to accept user input and stream the bot's response back to the user.

Here are the main parameters that help us optimize the answer:

  • LLM Model - lets us choose the model best suited to our task
  • Context message - here we use a template that has proven most effective (the first sketch after this list shows how it is assembled):
    {memory.query}  
    {memory.character}.  
    START CONTEXT BLOCK  
    {memory.context}  
    END OF CONTEXT BLOCK  
    {memory.instructions}. Answer in maximum {memory.output_tokens} tokens.
    
  • Number of previous messages to include in the context
  • Max tokens - not a target output length, but the hard limit after which the message is cut short (see the second sketch after this list)
  • Streaming - this parameter defines how we display the bot's message.
    • None - chunk by chunk (more or less letter by letter)
    • Words - word by word
    • Full sentences - sentence by sentence
      Changing this parameter matters when configuring voice bots: text-to-speech can't read the output properly with None or Words, so use Full sentences (see the last sketch after this list)
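
To make the context-message template above concrete, here is a minimal Python sketch of how it might be assembled. The memory dictionary, its keys, and the example values are hypothetical stand-ins for the {memory.*} placeholders; the actual platform resolves these variables internally, so treat this as an approximation, not the real implementation.

    # A minimal sketch, assuming the platform substitutes {memory.*}
    # placeholders much like Python's str.format. All values are
    # hypothetical examples.
    TEMPLATE = (
        "{query}\n"
        "{character}.\n"
        "START CONTEXT BLOCK\n"
        "{context}\n"
        "END OF CONTEXT BLOCK\n"
        "{instructions}. Answer in maximum {output_tokens} tokens."
    )

    memory = {
        "query": "What are your opening hours?",
        "character": "You are a friendly support assistant",
        "context": "Our store is open Mon-Fri, 9:00-18:00.",
        "instructions": "Answer using only the context above",
        "output_tokens": 100,
    }

    print(TEMPLATE.format(**memory))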
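
The difference between the soft length instruction inside the template and the hard Max tokens cutoff can be shown with a second, equally hypothetical sketch:

    # Hypothetical illustration: Max tokens is a hard cutoff applied to the
    # generated tokens. "Answer in maximum {memory.output_tokens} tokens."
    # only asks the model to be brief, and the model may overshoot.
    # Real tokenizers are subword-based; splitting on whitespace here is
    # just for illustration.
    def apply_max_tokens(generated_tokens, max_tokens):
        return generated_tokens[:max_tokens]

    tokens = "Our store is open Monday to Friday from nine to six".split()
    print(" ".join(apply_max_tokens(tokens, max_tokens=6)))
    # -> 'Our store is open Monday to'  (cut short mid-sentence)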
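
Finally, the three streaming modes can be approximated by re-chunking an incoming token stream. The rules below are assumptions meant only to illustrate the difference between the modes and why Full sentences works best for voice bots; they are not the platform's actual implementation:

    import re

    def rechunk(token_stream, mode="none"):
        # none:           pass chunks through as they arrive
        # words:          buffer until whitespace, emit whole words
        # full_sentences: buffer until . ! or ? followed by whitespace,
        #                 emit whole sentences
        buffer = ""
        for chunk in token_stream:
            if mode == "none":
                yield chunk
                continue
            buffer += chunk
            if mode == "words":
                # Emit completed words, keep the trailing partial word.
                *words, buffer = buffer.split(" ")
                for word in words:
                    yield word + " "
            elif mode == "full_sentences":
                # Emit completed sentences, keep the remainder.
                parts = re.split(r"(?<=[.!?])\s+", buffer)
                for sentence in parts[:-1]:
                    yield sentence + " "
                buffer = parts[-1]
        if buffer and mode != "none":
            yield buffer  # flush whatever is left

    # A voice bot should receive full sentences so TTS reads naturally.
    chunks = ["Hel", "lo! How ", "can I h", "elp you", " today?"]
    print(list(rechunk(iter(chunks), mode="full_sentences")))
    # -> ['Hello! ', 'How can I help you today?']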