DeepSeek Enters the Financial AI Arena

After more than two years of exploration, a consensus has emerged within the financial sector: while large language models (LLMs) deliver quick answers thanks to fast inference and massive training data, they still fall short of real-world business needs. What remains crucial is reasoning and logical inference, often dubbed "slow thinking," which is essential for industries whose operations demand deep, intricate reasoning.

The financial industry saw a significant shift last September when OpenAI launched its o1 model. The innovation sparked a frenzy in the global AI market because it employs reinforcement learning and explicit decomposition of thought processes to tackle complex problems. As a result, the model's reasoning capabilities improved drastically, particularly in solving mathematical problems and handling sophisticated tasks more effectively than earlier models in the GPT series.

Nevertheless, the o1 model's enhanced reasoning comes at a cost in response speed and compute. Unlike the rapid-fire GPT series, which often delivers answers within seconds, o1 typically takes around ten seconds to respond, making it both slower and more expensive to operate.

In a groundbreaking move, DeepSeek has officially launched its DeepSeek-R1 reasoning model and open-sourced its weights, making it the first model globally to successfully replicate o1's capabilities. Remarkably, while matching o1's reasoning performance, DeepSeek-R1 has cut its computational requirements to roughly a tenth of the original. Using R1 through the official API is also far cheaper: the price per input token (on a cache hit) is only one-fiftieth of o1's, while cache-miss input and output tokens cost about one twenty-seventh.
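Taking the ratios above at face value, the savings can be sketched with a back-of-the-envelope calculation. The o1 prices below are illustrative placeholders, not official figures; only the 1/50 and 1/27 ratios come from the reporting.

```python
# Back-of-the-envelope token-cost comparison.
# NOTE: the o1 prices here are hypothetical placeholders; only the
# ratios (1/50 for cache-hit input, ~1/27 for cache-miss input and
# output tokens) are taken from the article.
O1_PRICES = {  # assumed $ per 1M tokens, for illustration only
    "input_cache_hit": 7.50,
    "input_cache_miss": 15.00,
    "output": 60.00,
}

RATIOS = {  # R1 cost as a fraction of o1's, per the article
    "input_cache_hit": 1 / 50,
    "input_cache_miss": 1 / 27,
    "output": 1 / 27,
}

def r1_price(kind: str) -> float:
    """Implied R1 price for a token category, given o1's price."""
    return O1_PRICES[kind] * RATIOS[kind]

for kind in O1_PRICES:
    print(f"{kind}: o1 ${O1_PRICES[kind]:.2f} -> R1 ~${r1_price(kind):.4f} per 1M tokens")
```

The point of the sketch is simply that output tokens, the category a "slow thinking" model consumes most heavily, are where the ratio matters most.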

Industry representatives have pointed out that the R1 model embodies the concept of "AI equity" through open-source accessibility, unleashing the potential for more small and medium-sized financial institutions to innovate and implement AI solutions with greater certainty.

However, the pathway to adopting this high-performing model still requires reliance on traditional LLMs for training datasets, algorithm adjustments, and computational support during deployment.

It is also crucial to clarify that the many financial institutions announcing integrations of DeepSeek-V3 and the full-size R1 model are not retreating from other LLM vendors. Until the next technological "singularity," general-purpose models, deep reasoning models, multimodal models, and smaller models are likely to coexist in key business scenarios, each applied where it fits best.

In essence, model capabilities now resemble those of a "versatile Project 211 master's graduate," which marks a significant advancement. "Hmm, the user's question is…" is a common opening when users engage with DeepSeek. Upon receiving a question, the model answers only after roughly ten seconds of deliberation, spelling out how it interprets the query, which aspects it deems relevant, and what it anticipates the user expects. This is the R1 model's "slow thinking" in action.

The R1 model's slow thinking stems from DeepSeek's innovative algorithm design. In stark contrast to traditional models that answer immediately, R1 withholds its response while it runs the instruction through several layers of reasoning, combining chain-of-thought, consensus-building, and information retrieval to formulate the best answer. Each generation involves repeated deliberation, producing more output tokens but improving overall answer quality.

According to numerous interviewees, the most significant shift is R1's ability to fill the logical-reasoning gaps of conventional LLMs. "Overall, the current generation of LLMs possesses capabilities akin to a versatile Project 211 master's graduate," said an IT executive at a regional brokerage firm.

He emphasized that an effective digital employee requires both fast and slow thinking capabilities, which R1 successfully integrates.

The executive admitted that before DeepSeek's R1 release, institutions had few options for deep reasoning models, typically calling third-party APIs such as o1's. Given the compliance and data-security standards of the financial sector, putting those models into production was unrealistic. With R1, however, complex datasets can be analyzed thoroughly using chain-of-thought reasoning that mirrors human cognition while matching o1's performance.

A business leader from a municipal bank voiced a similar sentiment, stating, “Previously, our integrations mainly comprised general-purpose models such as Tongyi Qianwen and DeepSeek-V3, but none provided the deep reasoning depth akin to o1. The release of R1 finally endows AI applications with genuine capability for deep thought.”

Consider intelligent marketing as an example. Traditional general-purpose models produce marketing scripts either from rigid rule systems or as rapid responses to user instructions; if the instruction-giver pushes back, such interactions may fail to yield meaningful marketing outcomes. Deep reasoning models, by contrast, deliver intelligent, precise interactions grounded in thorough understanding.

While the R1 model stands out as a superior reasoner, gaining acceptance and recognition in the finance sector means clearing hurdles similar to those faced by general-purpose models. In Vectara's HHEM hallucination benchmark, DeepSeek-R1 showed a hallucination rate of 14.3%, far above the 3.9% of the V3 model and exceeding industry averages.

"Learning without pondering leads to confusion, and pondering without learning leads to peril." This maxim applies in the AI realm as well.

The R1 model's exceptional reasoning shines in mathematical and logical tasks, but it occasionally produces "hallucinations" when handling humanities-rich topics.

A representative from a municipal bank acknowledged the R1 model's remarkable logical reasoning, calling it the strongest among current Chinese models, yet noted that localized deployment paired with a relevant financial corpus still needs to be worked out.

"Simply pouring in knowledge doesn't immediately translate into effective logical reasoning," he explained, stressing that every LLM must address the finance sector's computational and data-management challenges.

On one hand, the surge in data demand driven by AI strategies compels financial institutions to prioritize data governance, in part as a response to the R1 model's elevated hallucination rate. On the other hand, institutions deploying these models locally must weigh two kinds of computational expense: the up-front activation cost and the ongoing cost of inference. The former is a fixed outlay, while the latter significantly influences model performance and outcomes.

Yet, in contrast to general-purpose models, the R1 model boasts marked advantages, notably in both inference and training costs, thus lowering the operational costs of deploying locally.

Interestingly, while a plethora of financial organizations now connect to the DeepSeek-R1 or V3 models, the model size chosen varies with each organization's resources, budget, and applications. The aforementioned banking executive noted that DeepSeek-R1 comes in several sizes, including 7B, 14B, and 32B variants and a full 671B model, so institutions can pick the one that best fits their needs. The smaller versions are distilled from the full-size 671B model, which inevitably leads to performance gaps relative to their larger counterpart.

In contrast, the maximum model size of Tongyi Qianwen’s Qwen2.5 is limited to just 72B.

This disparity signifies that if a financial institution aspires to achieve optimal results via localized deployment, implementing the full-fledged version of the R1 model mandates a significantly higher VRAM requirement than Qwen2.5. Such factors likely explain why some institutions began experimenting with smaller models initially.
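As a rough illustration of that VRAM gap: memory for the weights alone scales with parameter count times bytes per parameter. The sketch below is an estimation under stated assumptions (FP16 weights, no KV cache, activations, or framework overhead), not a vendor-published requirement.

```python
def weight_vram_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate VRAM (GB) needed just to hold model weights.

    bytes_per_param: 2.0 for FP16/BF16, 1.0 for INT8, 0.5 for 4-bit.
    Deliberately ignores KV cache, activations, and runtime overhead,
    so real deployments need noticeably more.
    """
    return params_billions * 1e9 * bytes_per_param / 1024**3

# Comparing the two full-size models mentioned in the article.
for name, size in [("Qwen2.5-72B", 72), ("DeepSeek-R1 671B", 671)]:
    print(f"{name}: ~{weight_vram_gb(size):.0f} GB of weights in FP16")
```

Even before serving overhead, the full R1 model needs roughly nine times the weight memory of Qwen2.5-72B, which is consistent with institutions starting from the smaller distilled variants.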

Transformation in the financial sector has commenced, opening up creative application opportunities. Despite varying depths of integration among institutions using DeepSeek, the R1 model's robust reasoning capabilities are set to inspire revolutionary AI application development across the industry.

The financial sector is already well acquainted with DeepSeek. Reports indicate that more than a year ago, certain financial institutions began testing large AI models and adopted the open-source code model DeepSeek-Coder-V2. Of seven IT professionals from different financial institutions interviewed, five used intelligent code-assistance frameworks built on Coder-V2.

"We experimented with most commercially available code models; perhaps owing to the quantitative-programming strength inherited from Yuankuang, Coder-V2 consistently outperformed the others in code generation," said one participant.

"By open-sourcing R1, DeepSeek has effectively democratized AI," said the brokerage source. Previously, building such advanced reasoning models demanded exceptional expertise in deep learning, neural-network inference, and meta-learning. DeepSeek's transparent technical reports, however, let diverse institutions replicate the reinforcement-learning workflow behind R1 and use knowledge distillation to transfer its competencies into finance-specific models, enhancing real-world AI applications in complex financial scenarios.
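The knowledge-distillation step mentioned above can be sketched with the classic soft-label objective: a smaller student model is trained to match a larger teacher's temperature-softened output distribution. The logits below are toy values standing in for real model outputs, not anything from DeepSeek's pipeline.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    This is the standard soft-label distillation objective; the
    logits here are illustrative toy values, not real model outputs.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student close to the teacher incurs a small loss; a student that
# disagrees incurs a much larger one, and training minimizes this gap.
teacher = [3.0, 1.0, 0.2]
print(distillation_loss(teacher, [2.8, 1.1, 0.3]))  # small
print(distillation_loss(teacher, [0.2, 1.0, 3.0]))  # much larger
```

In practice this loss is computed per token over a corpus and minimized by gradient descent, which is how competencies migrate from a 671B teacher into the smaller distilled variants.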

In the securities realm, the easy integration of AI is poised to unlock new pathways for innovation such as “AI + Brokerage,” “AI + Research,” “AI + Investment Advisory,” “AI + Compliance,” and “AI + Documentation.” Additionally, the organization intends to leverage the deep reasoning model to bolster AI functionalities in stock diagnostics, surge analysis, and generating industry research reports.

It is essential to highlight that the rise of reasoning models does not spell a downturn for general-purpose models.
