Organizations transitioning from AI exploration to implementation often struggle with technical limitations and regulatory hurdles, and misjudging the capabilities of Large Language Models (LLMs) can easily derail AI projects. This article assesses the strengths of LLMs, such as advanced language processing and scalability, and their weaknesses, including a lack of true understanding and high operational costs. It outlines successful use cases and advises against using LLMs in high-risk or regulated areas. Finally, it introduces an evaluation framework built on four dimensions (Risk, Confidentiality, Transparency, and Speed) to help organizations align their AI strategies with business objectives and deploy LLMs effectively.
Many organizations have now moved past the peak of the Gen AI hype cycle and are experimenting with different use cases internally. However, many face challenges in adopting Gen AI applications and solutions due to a wide range of factors, some inherent to the technology itself and others non-technical, such as regulation, governance, risk management and change management. One of the biggest reasons pilots and POCs fail is working on the wrong use cases, some of which are driven by marketers overstating LLM capabilities. So how do organizations know which use cases to pursue and which to avoid? Drawing on the latest developments in the space, we break down the strengths and weaknesses of LLMs and use that understanding to highlight areas where LLM applications have succeeded, as well as areas that are more challenging.
Strengths of LLMs
LLMs and their derivative architectures and systems (e.g. Retrieval Augmented Generation (RAG), multi-agent systems) have the following core strengths:
Strong Natural Language Processing Capabilities: LLMs are trained on large amounts of text, which allows them to process unstructured text effectively and generate coherent, contextually relevant responses across a wide range of topics and languages. This makes them especially useful for document processing tasks.
Human-Like Interactions: LLMs can mimic human-like interactions such as conversations, opening up possibilities in applications that rely on natural language understanding and generation, such as chatbots, virtual assistants and customer service applications.
Diverse Range of Secondary Capabilities: With RAG, LLMs can ground their responses in a specified set of documents or a knowledge base, allowing them to respond to queries with detailed, grounded and specific information. Some LLMs also support tool calling (previously known as function calling), in which the model returns a structured request that the application then executes, allowing LLM applications to perform a wide range of actions such as web search, internal file search and calling specific functions.
Scalability: LLMs can handle increasing amounts of data and more complex tasks without significant changes to their architecture, making them suitable for large-scale applications.
Low Data Prerequisites: Unlike traditional AI/ML models, which require a decent amount of quality data, many LLMs are pre-trained and can be used off the shelf, allowing organizations without much data to still leverage their capabilities.
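To make the RAG idea described above more concrete, here is a minimal sketch of the retrieval and grounding step in Python. It assumes a tiny, hypothetical in-memory knowledge base and uses simple bag-of-words cosine similarity in place of the embedding models and vector databases that production RAG systems typically use. The function names are illustrative, and the actual LLM call is left out: the sketch only builds the grounded prompt that would be sent to a model.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base; a real system would use a document store.
KNOWLEDGE_BASE = {
    "leave-policy": "Employees are entitled to 14 days of annual leave per year.",
    "expense-policy": "Expense claims must be submitted within 30 days with receipts.",
    "remote-work": "Staff may work remotely up to two days per week with manager approval.",
}


def tokenize(text: str) -> Counter:
    """Lower-case bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k (doc_id, text) pairs most similar to the query."""
    q = tokenize(query)
    ranked = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda item: cosine(q, tokenize(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(query: str, k: int = 2) -> str:
    """Assemble a prompt asking the model to answer only from retrieved passages."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, k))
    return (
        "Answer the question using only the sources below. "
        "Cite the source id, and say 'I don't know' if the answer is not in the sources.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    print(build_grounded_prompt("How many days of annual leave do employees get?", k=1))
```

The grounding comes from the prompt instructions plus the retrieved passages; swapping the word-overlap scorer for embeddings changes retrieval quality but not the overall shape of the pipeline.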
Weaknesses of LLMs
It is equally important to understand the core weaknesses of LLMs:
Lack of Understanding and Reasoning: Despite appearing intelligent, LLMs do not truly understand the content they generate; they replicate patterns seen in their training data. This can result in nonsensical or factually incorrect outputs (hallucinations), especially when a query deviates from common knowledge or straightforward interpretations. They also tend to perform poorly on logical reasoning tasks (ask any top model to solve a logic puzzle and watch it fumble).
Non-Reproducibility: LLMs are probabilistic and can produce different outputs even when given the same input. This is an inherent weakness of LLMs and poses a challenge for tasks that require deterministic, reproducible outputs, such as credit card approvals.
Lack of Transparency and Control: LLMs are black-box models, and it is very challenging to attribute their outputs to specific inputs or model parameters. It is also very difficult to tweak and maintain control over the model's outputs. This poses challenges for tasks that require precise control over the output or a certain level of transparency into how the model works, such as high-precision tasks and use cases in regulated industries (e.g. banking and finance).
Difficult to Improve, Update and Maintain: Once an LLM application is deployed, incorporating new knowledge or correcting errors without retraining or fine-tuning can be challenging. Improving the performance of an existing LLM application also requires substantial technical know-how.
High Operational and Environmental Costs: LLMs require a substantial amount of compute, which generally translates into higher operational and environmental costs. This inherently creates a feasibility barrier for lower-ROI use cases, although recent developments have steadily improved the cost efficiency of LLMs.
Privacy, Bias and Security Concerns: LLMs inherit biases from the data they are trained on and may exhibit unfairness and bias in their outputs. This is challenging for tasks that might affect human rights or that raise significant ethical concerns. LLMs also introduce new vulnerabilities that can become security risks (see the OWASP Top 10 for LLM Applications). Organizations should have proper safety and security measures in place before deploying LLMs for sensitive or highly confidential use cases.
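The non-reproducibility weakness above comes largely from how LLMs pick each next token: they sample from a probability distribution. The toy sketch below, with made-up scores for three hypothetical tokens, shows temperature-scaled sampling next to greedy (argmax) decoding. It illustrates the general mechanism only and is not any provider's implementation; note also that even with greedy decoding or temperature 0, hosted models are not always guaranteed to be bit-for-bit reproducible in practice.

```python
import math
import random

# Toy next-token scores (logits) after some prompt; the numbers are made up.
LOGITS = {"approve": 2.0, "reject": 1.6, "review": 0.5}


def softmax(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Sample one token; temperature 0 is treated as greedy (argmax) decoding."""
    if temperature == 0:
        return max(logits, key=logits.get)
    probs = softmax(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so sampled results differ between runs
    sampled = [sample_token(LOGITS, 1.0, rng) for _ in range(10)]
    greedy = [sample_token(LOGITS, 0, rng) for _ in range(10)]
    print("temperature=1.0:", sampled)  # a mix of tokens, varying per run
    print("greedy:", greedy)            # always 'approve'
```

For a decision such as a credit card approval, the sampled run could return "approve" on one call and "reject" on the next for an identical input, which is exactly the behaviour such processes cannot tolerate.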
Successful Areas
Some areas where LLM applications have been successfully adopted (non-exhaustive) are:
1. Enhanced Customer/Employee Service and Engagement
Chatbots and Virtual Assistants: LLMs can handle routine inquiries through chatbots, significantly reducing the need for large customer service teams. This automation allows human agents to focus on more complex issues that require human intervention. A widely publicized example is Klarna’s AI assistant, which Klarna claims handles two-thirds of its customer service chats and is estimated to drive a US$40 million profit improvement in 2024. Other use cases include internal policy assistants, call center automation, automated customer surveys, and learning and training coaches.
Personalization at Scale: Using LLMs, businesses can offer personalized experiences to a large customer base, enhancing engagement and loyalty without a proportional increase in human labor. An example is Blend, which uses Gen AI to provide personalized clothing guides to its customers.
2. Streamlining Research and Data Analysis
Automated Research & Report Generation: With tools such as web search and internal file search, LLMs can search the web and internal knowledge bases for relevant information and generate an analysis or report based on the user’s query. LLMs can also pass the extracted information to downstream tasks, which makes information integration seamless. Example use cases include automated market research, competitor analysis and monitoring. A simple “Company Researcher” custom GPT created by us can be found here (requires an OpenAI ChatGPT Plus subscription to access the GPT Store).
Brainstorming and Critique of Ideas: LLMs are also very good tools for brainstorming new ideas and getting feedback, critique or advice. By drawing on an organization’s internal knowledge base and/or specific documentation such as policies, LLM applications can provide very specific and useful advice and analysis. Example use cases include employee act advisories, proposal feedback and critique, and brainstorming of marketing campaigns.
Synthetic Data Generation: LLMs are powerful synthetic data generators and can readily produce high-quality synthetic data for a wide range of purposes, such as application development and testing, and dynamic data masking. LLMs are also widely used to generate data for evaluating, training or fine-tuning LLMs.
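The research workflows above rely on tool calling: the model does not run a web search itself, but returns a structured request naming a tool and its arguments, which the application executes before feeding the result back to the model. The sketch below shows only that application-side dispatch step. The tool names, the JSON shape of the model's reply and the stubbed search functions are hypothetical and do not follow any specific vendor's API.

```python
import json
from typing import Callable


# Hypothetical tools; real ones would call a search API or an internal document index.
def web_search(query: str) -> str:
    return f"(stub) top web results for: {query}"


def file_search(query: str) -> str:
    return f"(stub) matching internal documents for: {query}"


# Allow-list of tools the application is willing to run on the model's behalf.
TOOLS: dict[str, Callable[..., str]] = {
    "web_search": web_search,
    "file_search": file_search,
}


def dispatch_tool_call(model_reply: str) -> str:
    """Parse a model's JSON tool request and run the matching tool.

    Assumed reply shape (hypothetical): {"tool": "<name>", "arguments": {...}}
    """
    try:
        call = json.loads(model_reply)
        tool = TOOLS[call["tool"]]  # unknown tool names raise KeyError
        args = call.get("arguments", {})
        if not isinstance(args, dict):
            raise TypeError("arguments must be a JSON object")
        return tool(**args)  # wrong argument names raise TypeError
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Never execute anything the application does not recognize.
        return f"error: invalid tool call ({type(exc).__name__})"


if __name__ == "__main__":
    reply = '{"tool": "web_search", "arguments": {"query": "Acme Corp competitors"}}'
    print(dispatch_tool_call(reply))
    print(dispatch_tool_call('{"tool": "delete_files"}'))
```

Keeping an explicit allow-list of tools, and returning an error string instead of raising, lets the application hand the failure back to the model rather than executing an unexpected action.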
3. Automated Content Creation
Marketing and Communications: Marketing departments are reporting up to a 50% reduction in content creation time by using LLMs to draft initial content based on outlined parameters. This allows creative teams to focus on refining and customizing messages, improving the quality and effectiveness of marketing campaigns. Results can be further improved by providing the organization’s brand and marketing guidelines so the output is tailored to the organization. An industry example is Carrefour’s Gen AI-powered Marketing Studio.
Proposals and Report Generation: Automated report generation with LLMs is becoming common, enabling businesses to produce proposals and regular financial and operational reports with greater accuracy and less effort.
4. Enhanced Document Processing
Document Parsing: With some LLMs now offering multimodal capabilities (the ability to process inputs and produce outputs other than text, such as images), parsing information from documents is easier than ever, with higher accuracy than traditional indexing methods. The extracted information can be further structured and enriched with document intent and custom validation rules, then passed to downstream tasks, reducing the need for manual data entry. This is usually combined with a rules-based algorithm or machine learning model for automated claims or application processing.
Automated Labelling and Classification: LLMs are also very capable at identifying the content of a document or piece of feedback and assigning relevant labels or tags. This is especially useful for organizing and analyzing large volumes of documents or feedback for insights. Example use cases include automated ticket classification for customer support, customer feedback analysis and employee exit interview analysis.
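The document-processing pattern described above, where LLM-extracted fields (including a label from a fixed set of document types) are checked against custom validation rules before a rules-based step decides what happens next, might look roughly like this. The sketch assumes the LLM has already returned its extraction as JSON; the insurance-claims field names, the allowed document types and the auto-approval limit are all hypothetical examples, not recommendations.

```python
import json
from datetime import date

# Hypothetical label set and business rule for an insurance-claims example.
ALLOWED_DOC_TYPES = {"medical_receipt", "police_report", "repair_invoice"}
AUTO_APPROVE_LIMIT = 500.00  # claims above this amount go to a human reviewer


def validate_extraction(raw_llm_output: str) -> tuple[dict, list[str]]:
    """Parse the LLM's JSON output and collect validation errors."""
    try:
        data = json.loads(raw_llm_output)
    except json.JSONDecodeError:
        return {}, ["output is not valid JSON"]
    if not isinstance(data, dict):
        return {}, ["output is not a JSON object"]

    errors: list[str] = []
    doc_type = data.get("doc_type")
    # The label must come from the fixed set, not be free text invented by the model.
    if not isinstance(doc_type, str) or doc_type not in ALLOWED_DOC_TYPES:
        errors.append(f"unknown doc_type: {doc_type!r}")
    amount = data.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
        errors.append("amount must be a positive number")
    try:
        if date.fromisoformat(data.get("date", "")) > date.today():
            errors.append("date is in the future")
    except (TypeError, ValueError):
        errors.append("date must be YYYY-MM-DD")
    return data, errors


def route(raw_llm_output: str) -> str:
    """Rules-based decision applied after LLM extraction."""
    data, errors = validate_extraction(raw_llm_output)
    if errors or data["amount"] > AUTO_APPROVE_LIMIT:
        return "manual_review"
    return "auto_process"


if __name__ == "__main__":
    extracted = '{"doc_type": "medical_receipt", "amount": 120.5, "date": "2024-03-01"}'
    print(route(extracted))
```

The design choice here is that the LLM only extracts and labels; the final decision stays with deterministic rules, and anything the rules cannot verify falls back to a human.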
Challenging Areas
Given the limitations of LLMs, the following areas are more challenging for launching a successful LLM project:
1. High-Stakes Decision Making
Medical Diagnostics: While LLMs can assist in gathering information or drafting preliminary reports, relying on them for critical medical diagnoses or treatment decisions without human oversight could be dangerous due to the risk of inaccuracies or "hallucinated" information.
Legal Judgements: Legal decisions often require a nuanced understanding and interpretation of the law that LLMs may not fully grasp. The potential for errors, and the lack of explanation for decisions made by LLMs, could lead to unjust outcomes. Legal documents that have not been properly pre-processed can also be misinterpreted by LLMs.
2. Regulated Activities
Credit and Lending Decisions: These processes are typically governed by country-specific regulations that usually require transparency, fairness, and deterministic, controllable outcomes. LLM applications are therefore generally not recommended.
Child Services and Education: Any technology used in educational settings or services aimed at children is often tightly controlled to ensure safety and appropriateness of content. The use of LLMs in these areas requires careful consideration to avoid the dissemination of inaccurate or inappropriate material.
3. Sensitive and Confidential Applications
Personal Data Processing: LLMs processing personal or sensitive data might pose privacy risks, especially if the model inadvertently learns and reproduces private information. Organizations must ensure compliance with data protection regulations like GDPR and avoid using LLMs in ways that could breach confidentiality. Allowing LLMs to access sensitive data without proper controls could also lead to data breaches.
Government and Military Applications: In areas involving national security or critical infrastructure, the black-box nature of LLMs and their vulnerability to adversarial attacks make them risky choices for handling classified or sensitive information.
4. Real-Time Systems
Emergency Response Systems: In environments where real-time, error-free operation is crucial, such as emergency dispatch or control of critical systems (e.g., public transport systems, traffic control), the slight unpredictability and delay in LLM responses can be a liability.
Automated Trading: In financial markets, where decisions must be made quickly and accurately, the probabilistic nature of LLM outputs and their slow response times can introduce significant risk. Misinterpretations or erroneous data processing could lead to substantial financial losses.
A Simple Assessment Framework (RCTS)
Assuming organizations have identified use cases that align with their business objectives, leaders can quickly assess each use case's suitability for LLMs along the following four dimensions:
Risk: Does this use case have a low margin of error, or would inaccurate outputs have a high impact?
Confidentiality: Does this use case involve information that is highly sensitive and confidential?
Transparency: Does this use case require a high degree of control over, and transparency into, the model's workings and outcomes?
Speed: Does this use case require a fast response time?
If any of the above is high for a particular use case, it will be more challenging to develop and launch a successful project. However, various strategies (e.g. on-premise deployments) and design choices (e.g. human-in-the-loop systems) can mitigate some of the limitations of LLM applications; these will be explored further in another article. We will also gradually publish more articles on industry- and function-specific use cases, so keep a lookout for them on our blog.
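Because the RCTS rule is simple (any dimension rated high makes a use case harder), it can be captured as a small checklist that teams fill in per use case. The sketch below is one possible encoding; the three-level scale, the class and method names, and the example ratings are illustrative assumptions, since the framework itself does not prescribe a scoring scale.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-level rating scale for each RCTS dimension."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class RCTSAssessment:
    use_case: str
    risk: Level             # low margin of error / high impact of inaccurate outputs
    confidentiality: Level  # sensitivity of the information involved
    transparency: Level     # need for control over and insight into the model
    speed: Level            # need for fast response times

    def high_dimensions(self) -> list[str]:
        """Names of the dimensions rated HIGH, in R-C-T-S order."""
        scores = {
            "Risk": self.risk,
            "Confidentiality": self.confidentiality,
            "Transparency": self.transparency,
            "Speed": self.speed,
        }
        return [name for name, level in scores.items() if level == Level.HIGH]

    def verdict(self) -> str:
        """Any HIGH dimension flags the use case as more challenging."""
        high = self.high_dimensions()
        if not high:
            return f"{self.use_case}: good candidate for an LLM pilot"
        return (
            f"{self.use_case}: challenging (high on {', '.join(high)}); "
            "consider mitigations such as human-in-the-loop or on-premise deployment"
        )


if __name__ == "__main__":
    # Example ratings are illustrative, not prescriptive.
    faq = RCTSAssessment("Internal policy FAQ bot", Level.LOW, Level.MEDIUM, Level.LOW, Level.MEDIUM)
    credit = RCTSAssessment("Credit approval", Level.HIGH, Level.HIGH, Level.HIGH, Level.MEDIUM)
    for assessment in (faq, credit):
        print(assessment.verdict())
```

Listing which dimensions are high, rather than returning a single yes/no, points directly at the mitigation to consider for each flagged dimension.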
If you're interested in using Generative AI to transform your business, need practical AI implementation advice, or require custom AI solutions, feel free to contact us for a discussion.
Note: The information provided in this article is for informational purposes only. We do not have any business relationship with the vendors and solutions mentioned herein, nor do we endorse or recommend them in any way.