Cost optimization

A common question is: how is the cost of a conversation calculated, and what does it depend on? This article explains in detail how to manage and optimize costs.

What is a Normal Cost?

Across all accounts, the average cost of a full conversation on the platform is 10 to 20 rubles.

Individual accounts vary widely: some clients average about 4 rubles per conversation, while others average about 30 rubles.

Important!

The average cost of a conversation during testing is usually higher, because tests tend to cover the longest and most complete conversations. In real use, long conversations are balanced out by short ones, so the average cost comes out lower.

Differences arise for a variety of reasons. Some users may have a small knowledge base, a brief prompt, and no specific functions like tables, while others have complex systems with many functions, etc.

Let's explore this in more detail.

What Determines the Cost of Responses?

The cost of responses on the Suvvy platform, as with any LLM-based system, depends on the number of characters sent to the model. Note that processing a client message costs significantly more than the message the bot or neural network sends back.

This is because we send far more characters to the model than we receive from it. Specifically, each request includes:

Client's Message
System Prompt/Bot Instruction
Functions created within the bot:

tables,
subordinate bots,
knowledge base elements,
CRM system operation elements (reading fields, writing fields, etc.)

These elements are described in specific formats: function names, function descriptions, function structures in JSON, and so on. All of this is character data that we must pass so the model can understand these functions and use them correctly.

Information obtained from function calls (knowledge base, tables, etc.), which is passed to the model for processing.
Conversation History, so the bot understands the context and doesn't lose track.
Function Call History, so the bot doesn't call the same functions multiple times.

As a result, we have a large amount of data that we pass to the model, which it needs to process, sometimes more than once.
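To make the list above more tangible, here is a minimal sketch that adds up the size of each part of a single request. All texts and names (`request_size`, the dictionary keys) are made-up placeholders for illustration; this is not how Suvvy measures or bills usage internally.

```python
# Rough illustration: every model request carries all of these parts as input.
# All texts below are invented placeholders, not real Suvvy data.

def request_size(parts: dict[str, str]) -> dict[str, int]:
    """Return the character count of each part plus a grand total."""
    sizes = {name: len(text) for name, text in parts.items()}
    sizes["total"] = sum(sizes.values())
    return sizes


parts = {
    "client_message": "What time are you open tomorrow?",
    "system_prompt": "You are a polite salon assistant. " * 20,
    "function_definitions": '{"name": "search_prices", "description": "..."}' * 10,
    "function_results": "Haircut: 1500; Coloring: 4000; " * 5,
    "conversation_history": "Client: hi\nBot: hello, how can I help?\n" * 8,
    "function_call_history": "search_prices(query='haircut')\n" * 3,
}

sizes = request_size(parts)
for name, size in sizes.items():
    print(f"{name}: {size} characters")
```

In a sketch like this, the client's own message is usually the smallest part of what is sent.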

For example, if the model searches a table and doesn't find the answer, it may start the search again with different parameters. For the model, these are two separate calls: the first returns some result, and the model then goes for another round with that result in its context.
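The effect of a repeated search can be sketched with simple arithmetic. The model below is an assumption made for illustration: each function call costs one model request, one more request writes the final reply, and every request re-sends the base context plus all results collected so far. The numbers are hypothetical.

```python
def total_input_chars(base_context: int, round_results: list[int]) -> int:
    """Sum the input size over all model requests needed for one reply.

    base_context: characters sent on every request (prompt, functions, history).
    round_results: size of the result returned by each function call, in order.
    """
    total = 0
    accumulated = 0  # function results gathered so far
    for result in round_results:
        total += base_context + accumulated  # request that triggers this call
        accumulated += result
    total += base_context + accumulated  # final request that writes the reply
    return total


# Hypothetical sizes: 5000 characters of base context, 800 per table result.
one_search = total_input_chars(5000, [800])
two_searches = total_input_chars(5000, [800, 800])
print(one_search, two_searches)
```

Under these assumptions, one extra search adds more than the size of its own result, because the whole context is sent again.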

Now imagine a complex exchange with an external system such as YCLIENTS, where the exchange alone requires more than 10 functions. The cost increases significantly.

However, there are solutions, which we discuss below.

How to Reduce Costs?

Let's consider several main ways to optimize response costs from Suvvy.

Reducing Conversation History

Reducing conversation history makes sense when interaction with clients occurs regularly. For example, in beauty salons, the client base often consists of regular clients who contact the communication channel (e.g., WhatsApp) periodically, like once every two weeks.

It doesn't make sense to pass the entire conversation history in the model's context, as the conversation could become very long and repetitive. This would not only confuse the model but also raise the cost.

In such cases, you can trim the history using the message history transmission settings, located in the bot's Advanced settings.

There are several options:

Transmit history for X minutes. You can specify the number of minutes for which the bot will remember the message history.
Transmit history for X messages. You can specify the number of messages in the conversation that the bot will remember.
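The two options can be sketched as simple filters over a message list. The function names and the message structure here are hypothetical and only illustrate the idea; in Suvvy itself, trimming is configured in the settings rather than written as code.

```python
from datetime import datetime, timedelta


def trim_by_minutes(history: list[dict], now: datetime, minutes: int) -> list[dict]:
    """Keep only messages sent within the last `minutes` minutes."""
    cutoff = now - timedelta(minutes=minutes)
    return [message for message in history if message["time"] >= cutoff]


def trim_by_messages(history: list[dict], max_messages: int) -> list[dict]:
    """Keep only the last `max_messages` messages."""
    return history[-max_messages:] if max_messages > 0 else []


now = datetime(2024, 12, 24, 12, 0)
history = [
    {"time": now - timedelta(days=14), "text": "I'd like a haircut on Friday"},
    {"time": now - timedelta(minutes=10), "text": "Hi, can I book again?"},
    {"time": now - timedelta(minutes=5), "text": "Saturday morning, please"},
    {"time": now - timedelta(minutes=1), "text": "11:00 works"},
]

recent = trim_by_minutes(history, now, minutes=60)   # drops the visit from two weeks ago
last_two = trim_by_messages(history, max_messages=2)
print(len(recent), len(last_two))
```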

When to Use Which Method?

For recurring interactions, common when working with YCLIENTS / Altegio, choose the first option - limiting by the number of minutes.

For service bots, such as Telegram bots that give access to an internal knowledge base, you can limit by the number of messages (though a time limit also works here).

These limits can be necessary because, in a very long ongoing dialogue, every new message becomes quite costly.

Turning Off Function Call History

Another setting that affects cost is Save Function Call History. It is also located in the bot's Advanced settings.

This setting determines whether previous function calls are transmitted to the model with each request. Keeping previous calls is useful when they returned information that may be needed later in the conversation.

For example, a client asked, "When will order number 12345 be delivered?". The model accesses the CRM and fetches all order information, for instance:
Order Number: 12345
Order assembly date: 24.12.2024
Date of arrival at dispatch point: 25.12.2024
Delivery date: -
Order contents: Levis jeans, TH t-shirt, CC tank top.
The bot can respond: "Delivery time isn't determined yet, but your order is already assembled."
If the client asks: "Any rough estimate?"
The bot can say: "I see your order has arrived at the dispatch point; we'll get the delivery date soon and contact you", without sending another CRM request for order information.

If the dialogue itself already contains everything needed from previous function calls, you can disable this setting. On long conversations this reduces cost, because data from earlier function calls is no longer sent.

Whether to keep this setting enabled can only be decided after testing various scenarios.
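The trade-off can be sketched as follows, reusing the order example above. The message structure and the `build_context` helper are assumptions made for illustration, not the platform's actual internals.

```python
history = [
    {"role": "client", "text": "When will order number 12345 be delivered?"},
    {"role": "function", "text": (
        "Order Number: 12345; Order assembly date: 24.12.2024; "
        "Date of arrival at dispatch point: 25.12.2024; Delivery date: -"
    )},
    {"role": "bot", "text": "Delivery time isn't determined yet, "
                            "but your order is already assembled."},
    {"role": "client", "text": "Any rough estimate?"},
]


def build_context(history: list[dict], save_function_call_history: bool) -> list[dict]:
    """Return the messages that would be sent to the model."""
    if save_function_call_history:
        return list(history)
    # With the setting off, earlier function results are skipped.
    return [message for message in history if message["role"] != "function"]


def context_size(messages: list[dict]) -> int:
    """Total number of characters across all messages."""
    return sum(len(message["text"]) for message in messages)


with_calls = build_context(history, save_function_call_history=True)
without_calls = build_context(history, save_function_call_history=False)
print(context_size(with_calls), context_size(without_calls))
```

In this sketch the context is smaller with the setting off, but the arrival date at the dispatch point is no longer visible to the bot, so answering the follow-up question would require another CRM request.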

Reducing Instructions/Prompts Using Subordinate Bots

We have a separate section on multi-agent capabilities and subordinate bots, which describes in detail how one bot delegates work and passes data to others. The main idea is that entire blocks of instructions and functions can be moved into isolated bots.

Let's consider an example:

Suppose a company has 10 different price lists stored in Excel spreadsheets.
If we upload all tables into one bot, we will have 10 different functions, each with its description, instruction, etc.
The downside is that this overloads the bot's context with information it must keep in memory, and it significantly increases the cost of the dialogue, since every function is transmitted with each model request.

We can do it differently:

Create a subordinate bot (as detailed in the appropriate section)
Upload all tables to this subordinate bot

This means all table functions will live in the second bot, and the first bot will have only one function, called "Prices", which reduces the volume of function descriptions roughly tenfold.

The solution, in other words, is to move certain parts into separate bots. In the example above, the main bot carries a single function and calls the subordinate bot, with its 10 table functions, only when the client asks about prices. This is still cheaper overall, because the table functions are not passed with every call.
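The saving can be illustrated by comparing the size of the function definitions sent with every request in both setups. The definitions below are hypothetical placeholders; only the "Prices" function name comes from the example above, and real definitions will differ in format and length.

```python
import json


def definitions_size(functions: list[dict]) -> int:
    """Characters needed to describe the functions in JSON."""
    return len(json.dumps(functions, ensure_ascii=False))


# Setup 1: all 10 price-list tables attached directly to the main bot.
table_functions = [
    {
        "name": f"price_list_{i}",
        "description": f"Search price list number {i} by product name",
        "parameters": {"query": "string"},
    }
    for i in range(1, 11)
]

# Setup 2: the main bot only knows one function that calls the subordinate bot.
main_bot_functions = [
    {
        "name": "Prices",
        "description": "Ask the subordinate bot a question about prices",
        "parameters": {"question": "string"},
    }
]

sent_every_request_before = definitions_size(table_functions)
sent_every_request_after = definitions_size(main_bot_functions)
print(sent_every_request_before, sent_every_request_after)
```

The 10 table definitions are still sent, but only to the subordinate bot and only when a price question actually comes up.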

Optimizing Responses from Functions (Tables, Knowledge Bases, etc.)

Another way is to control how much data is returned to the bot when it calls the knowledge base, tables, and other functions.

With Tables

A table query can return a large amount of data, for example 20 rows of information. The more data, the more expensive the response. Consider defining clarifying questions in advance that narrow the table search down to more specific information. If narrowing the search through the scenario isn't feasible, limit the amount of data returned from the table with the LIMIT operator. More on this in the tables section.
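As an illustration of narrowing plus LIMIT, here is a small sketch using SQLite from the Python standard library. The table, column names, and data are invented, and the exact query syntax available in Suvvy tables may differ, so treat it only as a picture of the idea.

```python
import sqlite3

# In-memory table with 20 invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (name TEXT, category TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?)",
    [
        (f"Service {i}", "hair" if i % 2 else "nails", 1000 + i * 100)
        for i in range(1, 21)
    ],
)

# Unrestricted query: all 20 rows would be passed to the model.
all_rows = conn.execute("SELECT name, price FROM services").fetchall()

# Narrowed query: filter by category, then cap the result with LIMIT.
narrowed = conn.execute(
    "SELECT name, price FROM services "
    "WHERE category = ? ORDER BY price LIMIT 3",
    ("hair",),
).fetchall()

print(len(all_rows), len(narrowed))
conn.close()
```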

With Knowledge Base Files

For knowledge bases and the Direct Questions block, the solution is simpler: avoid creating files with long texts, and break them down into smaller files, each covering one meaningful topic.
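The benefit of smaller files can be sketched like this. The text, the blank-line splitting rule, and the naive keyword lookup in `find_section` are all assumptions for illustration; the platform's own knowledge base search works differently.

```python
knowledge_file = """Opening hours
We are open daily from 10:00 to 21:00.

Delivery
Delivery takes 2 to 5 working days.

Returns
Items can be returned within 14 days."""


def split_into_sections(text: str) -> list[str]:
    """Split one long text into sections separated by blank lines."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def find_section(sections: list[str], keyword: str) -> list[str]:
    """Naive lookup: return the sections that mention the keyword."""
    return [section for section in sections if keyword.lower() in section.lower()]


sections = split_into_sections(knowledge_file)
answer_source = find_section(sections, "delivery")

# Only the matching section goes to the model instead of the whole file.
print(len(knowledge_file), sum(len(section) for section in answer_source))
```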

For CRM Data Reading and Writing

Integrations with amoCRM or Bitrix24 provide functions for reading deal or contact fields and for writing to them. Each selected field adds information to the corresponding function, so the more fields you choose, the higher the conversation cost. Select only the fields that are truly necessary.
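A minimal sketch of the effect, assuming a deal is represented as a dictionary. The field names and values are hypothetical and do not reflect the actual amoCRM or Bitrix24 schemas.

```python
import json

# Invented deal record with eight fields.
deal = {
    "id": 12345,
    "status": "assembled",
    "delivery_date": None,
    "manager": "Anna",
    "source": "WhatsApp",
    "budget": 5400,
    "comment": "Call before delivery",
    "created_at": "24.12.2024",
}

# Fields the bot actually needs to answer delivery questions.
needed_fields = ["status", "delivery_date"]


def select_fields(record: dict, fields: list[str]) -> dict:
    """Keep only the listed fields of a record."""
    return {name: record[name] for name in fields if name in record}


def payload_size(record: dict) -> int:
    """Characters needed to pass the record to the model as JSON."""
    return len(json.dumps(record, ensure_ascii=False))


full_size = payload_size(deal)
reduced_size = payload_size(select_fields(deal, needed_fields))
print(full_size, reduced_size)
```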

Using the English Language

A less obvious but highly effective option is to write the main instruction in English rather than Russian. Russian text typically costs 3-4 times more, because in Russian a token often corresponds to a single letter, and some letters, such as Ы, can take two tokens. In English, by contrast, one token can represent a whole word, which is cheaper.
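Exact token counts depend on the tokenizer of the model in use, so the sketch below does not count tokens. As a rough proxy, it compares UTF-8 byte lengths, since many tokenizers build tokens from bytes: Cyrillic letters take two bytes each, while Latin letters take one. The two sample instructions are invented.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character (a rough proxy, not tokens)."""
    return len(text.encode("utf-8")) / len(text)


english = "Answer politely and briefly."
russian = "Отвечай вежливо и кратко."

print(utf8_bytes_per_char(english))
print(utf8_bytes_per_char(russian))
print(len("Ы".encode("utf-8")))  # bytes used by a single Cyrillic letter
```

This only shows why Russian text starts from a larger raw size; the real difference in tokens should be checked against the specific model.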

Another straightforward and effective tactic is to check carefully what information the bot's functions actually return. Is the full volume necessary? Can it be split into meaningful chunks? Is this information essential at all?

Let's look at specific examples.

Reducing Resulting Information from Tables

A table query often returns a large amount of data, which increases the volume of context to process and, as a result, the cost.

For table queries, it's important to narrow the search from the start so that only the necessary information is retrieved.

Narrowing the search based on the client's request isn't always possible, so use the LIMIT operator in table queries to set the maximum amount of data returned.

Working with tables is covered in detail in the tables section.

Reducing Volume in Typical CRM Use

This applies to amoCRM and Bitrix24 integrations, which can read data from deal and contact fields and write to them.

The more fields you select for reading or writing, the longer the corresponding function becomes, and the more it costs to send.
