# Dynamic RAG
This is a voice note on Dynamic RAG, a technique discussed in CS598JH. The paper is titled "DynamicRAG: Leveraging Outputs of Large Language Models as Feedback for Dynamic Reranking in Retrieval-Augmented Generation."
To start, here is how standard RAG works: you have a query, a retriever, a corpus of documents, and a generator. You begin with the query, the retriever pulls some N documents from the corpus, those N documents are passed to the generator, and the generator gives you your answer. The problem is that N is a fixed value, and it has been observed that the order in which the documents are shown to the LLM also matters. The retriever has no real ability to rank the documents by their importance to the query; it typically does something simple like embedding the documents and scoring them with cosine similarity.

The purpose of this work is to add a reranker that takes the N documents returned by the retriever, reorders them into an order that helps the LLM generate a better answer, and also reduces them to some number K of documents, where the reranker gets to choose K, because more documents does not always mean better. So what is this reranker? It is another LLM that reads the documents and outputs both an ordering for them and the number K of documents that will be passed on to the generator LLM.
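Here is a minimal sketch of that inference flow, just to make the pieces concrete. The retriever, reranker LLM, and generator LLM are passed in as callables, because their concrete implementations are my assumption here, not the paper's actual code.

```python
# Minimal sketch of the Dynamic RAG inference flow described above.
# The retriever, reranker, and generator are placeholders supplied by the
# caller; nothing here is the paper's real API.

from typing import Callable, List, Sequence, Tuple

Retriever = Callable[[str, int], List[str]]                        # (query, n) -> top-N documents
Reranker = Callable[[str, Sequence[str]], Tuple[List[int], int]]   # (query, docs) -> (ordering, k)
Generator = Callable[[str, Sequence[str]], str]                    # (query, context docs) -> answer


def dynamic_rag_answer(
    query: str,
    retrieve: Retriever,
    rerank: Reranker,
    generate: Generator,
    n: int = 20,
) -> str:
    # 1. The retriever pulls a fixed number N of candidate documents from the corpus.
    candidates = retrieve(query, n)

    # 2. The reranker LLM reads the query plus the candidates and outputs both an
    #    ordering over them and a count K of how many to keep (K can be 0).
    ordering, k = rerank(query, candidates)
    kept = [candidates[i] for i in ordering[:k]]

    # 3. The generator LLM answers the query using only the kept documents.
    return generate(query, kept)
```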
But how do you actually train this reranker LLM, not only to pick good documents from the corpus, but so that it actually improves the downstream generator LLM? The training setup is this: you have a generator LLM whose weights are frozen, so it never gets updated. You then train the reranker LLM to output document rankings. The reranker generates many different rankings of the documents, each ranking is passed to the frozen generator, the generator produces an answer from those documents, and that answer is compared against a ground-truth answer. That comparison gives you a reward signal for the reranker, which you can then use to update the reranker's weights; in this paper they use DPO as the learning algorithm. They show that they get better results than similarly sized LLMs with other RAG methods, and that the Llama 8B model is able to outperform GPT-4o without retrieval. They also show other results indicating that their RAG pipeline is better. What was interesting to me was that sometimes the reranker could set K equal to zero, so that no documents are passed to the generator LLM at all. I thought that was interesting; I would not have expected the LLM to find that behavior, but it did.
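And here is a rough sketch of how I picture the training loop, under my own assumptions: the reward is something like exact match or F1 against the ground-truth answer, and the best and worst sampled rankings form a DPO preference pair. The helper names (`sample_rankings`, `generate_answer`, `answer_quality`) are hypothetical, not from the paper.

```python
# Rough sketch of the training setup described above (not the paper's code).
# The generator is frozen; the reranker samples several candidate rankings,
# each one is scored by how good the frozen generator's answer is versus the
# ground truth, and the best/worst rankings form a preference pair for DPO.

import torch
import torch.nn.functional as F


def dpo_loss(
    policy_logp_chosen: torch.Tensor,    # log pi_theta(chosen ranking | query, docs)
    policy_logp_rejected: torch.Tensor,  # log pi_theta(rejected ranking | query, docs)
    ref_logp_chosen: torch.Tensor,       # same quantities under the frozen reference reranker
    ref_logp_rejected: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    """Standard DPO objective: push the policy to prefer the chosen ranking
    over the rejected one, relative to the reference model."""
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()


def build_preference_pair(query, docs, ground_truth,
                          sample_rankings, generate_answer, answer_quality):
    """Sample several rankings from the reranker, score each by the quality of
    the frozen generator's answer, and keep the best and worst as a
    (chosen, rejected) pair for DPO."""
    rankings = sample_rankings(query, docs)                              # e.g. 4-8 candidate rankings
    rewards = []
    for ranking in rankings:
        answer = generate_answer(query, [docs[i] for i in ranking])      # frozen generator
        rewards.append(answer_quality(answer, ground_truth))             # e.g. EM / F1 vs. ground truth
    best = rankings[max(range(len(rewards)), key=rewards.__getitem__)]
    worst = rankings[min(range(len(rewards)), key=rewards.__getitem__)]
    return best, worst
```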
## Screenshot
