Large language models (LLMs) are very good at general-purpose text generation, but often we want to generate an output based on specific data or context. Previously, this meant fine-tuning an LLM on our own dataset for each use case. With modern models, that is often no longer necessary. As a surprising result of their very large size and training corpus, LLMs like GPT-3 are very good at multitasking and at learning tasks from only a few examples (called in-context learning). This is achieved by adding information to the input prompt, without any updates to the weights of the model. The added information can include the task that you would like the model to perform (e.g. summarization, question answering), some examples of the inputs and outputs that you desire, or some data or context that the model can use to create the output. Early research has suggested that in-context learning may not be so different from fine-tuning. Users of ChatGPT may have found that passing more context in the prompt tends to create better outputs. For example, if you ask ChatGPT to write an email to a prospective employer, you might give it some context about previous correspondence and your resumé.
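To make the idea of in-context learning concrete, here is a minimal sketch of how a prompt might be assembled from a task description, a few input/output examples and optional context. The function name `build_prompt` and the "Task / Context / Input / Output" layout are assumptions made for illustration, not a format required by any particular model. Note that nothing here changes the model; everything happens in the prompt text.

```python
def build_prompt(task, examples, query, context=""):
    """Assemble an in-context learning prompt from its parts.

    Hypothetical helper: the section labels below are an illustrative
    convention, not a requirement of any specific LLM.
    """
    parts = [f"Task: {task}"]
    if context:
        parts.append(f"Context:\n{context}")
    # Each example shows the model an input together with the desired output.
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The final input is left without an output for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


prompt = build_prompt(
    task="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("The battery lasts all day.", "positive"),
        ("It broke after a week.", "negative"),
    ],
    query="Setup was quick and painless.",
)
print(prompt)
```

The resulting string would then be sent to the model as a single prompt. Swapping the task or the examples changes the behaviour without any retraining.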
Manually copying and pasting from our personal data sources into the input prompt every time is tedious. What if we could automatically provide relevant context to ChatGPT based on our prompt to the model? For example, asking ChatGPT to help you find a job could automatically pull in relevant previous emails, as well as information from your resumé. This is the aim of retrieval-augmented generation (RAG) techniques. RAG uses a two-step process to generate outputs for the user. The first step is knowledge retrieval, where relevant personal data is fetched from your connected data sources based on your prompt. In the second step, this data is passed into the LLM along with the prompt, so that the model can aggregate it, reason over it and express the result in natural language.
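The two steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: retrieval here ranks documents by cosine similarity over simple word counts, whereas real systems typically use learned embeddings and a vector index. The sample documents and the helper names (`retrieve`, `build_rag_prompt`) are made up for the example, and the final call to an LLM is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Step 1: return up to top_k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query.
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_rag_prompt(query, retrieved):
    """Step 2: combine the retrieved data with the user's request."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return (
        "Use the context below to respond.\n\n"
        f"Context:\n{context}\n\n"
        f"Request: {query}"
    )


# Made-up personal data standing in for connected data sources.
documents = [
    "Email from Acme recruiter: the data engineer job interview is on Friday.",
    "Resume: five years of experience as a data engineer using Python and SQL.",
    "Calendar: dentist appointment on Tuesday at 3pm.",
    "Note: sourdough bread recipe with a long overnight rise.",
]

query = "Help me prepare for my data engineer job interview"
retrieved = retrieve(query, documents, top_k=2)
prompt = build_rag_prompt(query, retrieved)
print(prompt)  # this prompt would then be sent to the LLM
```

Here the recruiter email and the resumé are selected, while the unrelated calendar entry and note are left out of the prompt. Keeping retrieval separate from prompt construction means the ranking method can be replaced later without touching the rest.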
The Algovera Flow platform uses in-context learning and retrieval-augmented generation to create personalized AI assistants that are integrated with your data sources. You can connect to platforms such as Discord, Notion, Obsidian, GitHub, Calendar, Twitter and YouTube, as well as Web3 data infrastructure. These data sources are used to complete various tasks in your everyday workflows.