Ways to Improve Accuracy and Response Quality
Improving response quality is one of the key tasks when configuring bots based on LLM models. There are many methods for enhancing it; in this chapter we will cover the main ones:
improving the quality of prompts
reducing the context
We dedicated the entire previous chapter to the first point, so in this chapter, we will focus on the second.
Reducing Context
Each model has a property known as attention: how strictly the model follows its system instruction. There is a consistent pattern: the shorter the instruction, the easier it is for the model to follow it. With a large instruction, the bot may miss some of its important points. You can compensate with phrases like "Pay attention" or "Always", but this does not give stable results when you need accuracy of 10 out of 10.
The logical conclusion is to reduce the instruction, or the "context".
Reducing context means optimizing the main instruction and shrinking the volume of information transmitted to the model.
Cutting the instruction at the expense of its meaning is not always acceptable, so there are mechanisms that let you redistribute the semantic content and shorten the instruction without losing quality:
Use of a knowledge base
Reducing dialogue history
Use of tables
Multi-agent systems
Use of a Knowledge Base
A knowledge base lets you move entire blocks of information out of the main instruction into separate storage.
For example, suppose the instruction includes a block with the company's details (there can be many such blocks):
Our contacts:
LLC "Horns and Hooves"
INN 7811616380 / KPP 781101002
JSC "TBank"
BIK 044525974
acc. 40702810310000021420
Address: Moscow, Izmailovsky 30, off. 15
This block is needed only when a client explicitly asks about the details. In that case, you can create a file in the knowledge base named "Company Details", paste this text into it, and remove it from the instruction.
In the same way, all standard answers to standard questions can be moved to the knowledge base.
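To make the idea concrete, here is a minimal sketch of on-demand retrieval from a knowledge base. The folder layout and the `retrieve` function are illustrative assumptions, not the platform's actual API:

```python
from pathlib import Path

KNOWLEDGE_BASE = Path("knowledge_base")  # hypothetical folder, one file per topic

def retrieve(query: str) -> str | None:
    """Naive keyword lookup: return a file's text only when the client's
    question mentions its topic, so the block never sits in the prompt."""
    for doc in KNOWLEDGE_BASE.glob("*.txt"):
        if doc.stem.lower() in query.lower():  # e.g. file "company details.txt"
            return doc.read_text(encoding="utf-8")
    return None

system_prompt = "You are the salon's assistant. Answer briefly and politely."
question = "Please send me your company details for the invoice."

# The block is appended only when it is actually needed;
# the rest of the time the instruction stays short.
extra = retrieve(question)
if extra:
    system_prompt += "\n\nReference material:\n" + extra
```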
You can read more about how to work correctly with the knowledge base in the Knowledge Base section.
Reducing Dialogue History
Reducing dialogue history means transmitting to the model not the entire history of the dialogue with the client, but only the messages from the relevant time period, or no history at all.
In the beauty industry, for example, clients often return regularly with repeated requests. If you gather their entire communication history over that period and pass it to the bot, the context becomes excessively long and ineffective for the current task:
a lengthy instruction combined with a long history blurs the bot's attention
the bot does not know when a specific message was written, which can confuse it
the bot spends its "attention" on the previous correspondence and may simply fail to understand what the client needs now
There are also situations where a bot is created not to conduct a dialogue but simply to answer individual questions without regard to the history.
For all these situations, there is a setting that allows trimming the dialogue context. It can be found in the additional settings and is called "Message History (Context)".
By selecting the appropriate option, you can adjust the context transmission for your case.
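As a sketch of what such trimming amounts to, the snippet below keeps only the messages from the last 24 hours and caps their count. The message format and the cut-off values are assumptions chosen for illustration:

```python
from datetime import datetime, timedelta

def trim_history(messages: list[dict],
                 max_age: timedelta = timedelta(hours=24),
                 max_messages: int = 20) -> list[dict]:
    """Keep only recent messages so old correspondence
    does not dilute the model's attention."""
    cutoff = datetime.now() - max_age
    recent = [m for m in messages if m["timestamp"] >= cutoff]
    return recent[-max_messages:]  # if still too long, the newest messages win

history = [
    {"role": "user", "timestamp": datetime.now() - timedelta(days=3),
     "text": "Book me a manicure"},
    {"role": "user", "timestamp": datetime.now(),
     "text": "Can I move my appointment?"},
]
context = trim_history(history)  # only the recent message survives
```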
More details on the additional settings can be found in a separate section.
Use of Tables
An extended guide to setting up tables is located in a separate section; here we will touch on tables only as they relate to improving response quality.
Suppose you have a price list with 100 items: neither too many nor too few for a model to work with. To give the bot this price list, we could go one of several ways:
Load the price list into the instruction
Load the price list into the knowledge base
Split the price list into several files in the knowledge base
Use tables
The first option is not worth considering: the instruction becomes large, the model is overloaded with context, and the cost of each message increases significantly (a rough estimate follows the list below).
❌ Cons:
instruction overload
large volume of tokens = cost
✅ Pros:
simplicity of setup
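For a rough sense of scale (the per-line figure is an assumption; real numbers depend on the tokenizer and line length): if each price line takes about 15 tokens, 100 lines add roughly 1,500 tokens to the instruction, and since the instruction is sent with every request, that overhead is paid on every single message.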
The second option is better, BUT it is still a large volume of information that the model must read all at once, and since whatever it reads is stored in its history by default, once this file is triggered it increases the cost of all subsequent messages.
❌ Cons:
history overload
large volume of tokens = cost
✅ Pros:
simplicity of setup
The third option, splitting the price list into several files in the knowledge base, is generally better than the previous ones: it reduces the cost. BUT the cost still grows with the number of files in the knowledge base. Nevertheless, thanks to the breakdown, history overload can be avoided (a splitting sketch follows the list below).
❌ Cons:
moderate volume of tokens = cost
additional hassle with setup
✅ Pros:
cheaper than the first two options
history overload is avoided
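A minimal sketch of such a breakdown, assuming the price list is a simple CSV with a category column (the file names and format are illustrative):

```python
import csv
from collections import defaultdict
from pathlib import Path

out_dir = Path("knowledge_base")
out_dir.mkdir(exist_ok=True)

# Group the price rows by category so each knowledge-base file stays small
groups: dict[str, list[str]] = defaultdict(list)
with open("price_list.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # assumed columns: category, service, price
        groups[row["category"]].append(f'{row["service"]}: {row["price"]}')

# One file per category: only the relevant file is read at query time
for category, lines in groups.items():
    (out_dir / f"Price {category}.txt").write_text("\n".join(lines), encoding="utf-8")
```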
The fourth option is loading the price list into a table. This allows working not only with 100 items but with any number of rows without increasing the cost, because the table is read selectively: the model first narrows the data down using a special SQL query mechanism.
❌ Cons:
medium setup complexity
✅ Pros:
lower volume of tokens = cost
unlimited number of items/rows
reduction of instruction and history context
high predictability and accuracy of responses
High predictability and accuracy of responses are achieved because the model does not read the entire table; it receives only the cells it needs, so neither the context nor the history is overloaded.
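The mechanics can be illustrated with a minimal sketch: the price list lives in an SQLite table, and only the rows matching the client's request enter the context. The schema and the query are assumptions; the platform's own table mechanism may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price (service TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO price VALUES (?, ?)",
    [("Manicure", 1500), ("Pedicure", 2000), ("Haircut", 1200)],
)

# In production the model itself generates a query like this one;
# only the matching cells ever enter the context, however large the table.
rows = conn.execute(
    "SELECT service, price FROM price WHERE service LIKE ?",
    ("%manicure%",),
).fetchall()

context_snippet = "\n".join(f"{service}: {price} rub." for service, price in rows)
# -> "Manicure: 1500 rub."
```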