As human beings, we find understanding our native and even foreign languages relatively straightforward – well, after we have learned them. But why is that? The key lies in the learning process we go through. We start by learning our mother tongues as babies, then pick up our first foreign language as children, and so on. This continuous learning process trains our brains, making it easier to acquire new languages with each one we learn. In theory, this is not much different from how a machine learns languages. Natural Language Processing (NLP) adopts this idea to understand and generate languages, bridging the gap between human language and computational understanding. From ChatGPT to Microsoft Copilot, NLP has become a pivotal topic across numerous industries, including FinTech.
What is Natural Language Processing?
NLP is a subfield of Computer Science and Artificial Intelligence (AI) that uses machine learning to enable computers to understand and communicate with human language. It is composed of two primary components: Natural Language Understanding (NLU) and Natural Language Generation (NLG).
NLU focuses on comprehending and interpreting human language, allowing machines to process text, understand context and extract meaningful information. On the other hand, NLG focuses on the generation of a message or text by a machine based on human inputs. Together, these components form the core of NLP, enabling seamless applications such as language translation, sentiment analysis and text summarisation.
How does it Work?
Let’s look at this piece of text:
“In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies. By integrating AI, it enhances transaction security and offers predictive financial advice, making financial services more accessible and secure. This innovation simplifies finance for individuals and businesses alike, empowering them to navigate the modern financial landscape with greater confidence and convenience.”
This paragraph explains how FinTech is using advanced technology to transform the banking sector in Malaysia. When machines are able to read and understand text, they can automate tasks such as document processing, which reduces human effort and minimises errors. They can also extract relevant information, which improves accuracy in tasks like identity verification. Sounds cool, right? But to get there, we have to first teach our computer the fundamentals of written language and build up from there.
Step 1: Sentence Segmentation
Sentence segmentation is the first step in pre-processing text for NLP. It breaks the paragraph into separate sentences.
Applied to the text above, sentence segmentation produces the following result:
- In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies.
- By integrating AI, it enhances transaction security and offers predictive financial advice, making financial services more accessible and secure.
- This innovation simplifies finance for individuals and businesses alike, empowering them to navigate the modern financial landscape with greater confidence and convenience.
We can assume that each sentence in English expresses a separate thought or idea. It is a lot easier for machines to understand a single sentence than an entire paragraph, because sentences have clear markers that indicate where they start and end. Additionally, machines can focus on specific patterns within a sentence, avoiding the data sparsity issues that arise when processing larger blocks of text, like paragraphs. In practice, segmentation can be as simple as splitting the text at sentence-ending punctuation marks, but most modern systems use more sophisticated models that work even when documents are not formatted properly. A minimal example is shown below.
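As a quick illustration, here is a minimal sketch of sentence segmentation using the open-source spaCy library (one of several tools that could be used for this step); it assumes spaCy and its small English model, `en_core_web_sm`, are installed.

```python
# pip install spacy
# python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = (
    "In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies. "
    "By integrating AI, it enhances transaction security and offers predictive financial advice, "
    "making financial services more accessible and secure. This innovation simplifies finance "
    "for individuals and businesses alike, empowering them to navigate the modern financial "
    "landscape with greater confidence and convenience."
)

doc = nlp(text)

# doc.sents yields one span per detected sentence
for sentence in doc.sents:
    print(sentence.text)
```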
Step 2: Tokenization
Using word tokenizers, we can now break our sentences into separate words or tokens. This is called tokenization. Through this technique, machines can use algorithms to easily identify patterns and structures within texts, enabling more efficient analysis and processing of language data.
With the first sentence from our paragraph:
“In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies.”
This is the result after tokenization:
“In”, “Malaysia”, “,”, “FinTech”, “is”, “revolutionising”, “the”, “banking”, “sector”, “with”, “intelligent”, “technologies”, “.”
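As a rough sketch (assuming the same spaCy setup as above), tokenizing that sentence could look like this:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
sentence = "In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies."

# Each Token object carries the raw text plus linguistic attributes used in later steps
tokens = [token.text for token in nlp(sentence)]
print(tokens)
# ['In', 'Malaysia', ',', 'FinTech', 'is', 'revolutionising', 'the', 'banking',
#  'sector', 'with', 'intelligent', 'technologies', '.']
```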
Step 3: Stop Word Removal
Next, we need to consider the ‘important’ words in the text. English has a lot of filler words such as “and”, “a” and “the”, which can be filtered out to give more focus to the crucial information. These are called stop words, and they are often removed before performing statistical analysis. However, whether to remove stop words depends heavily on the task we are performing and the goal we want to achieve. For instance, if we want to perform sentiment analysis, we might not want to remove them, since words like “not” can change a sentence’s meaning entirely.
From the first sentence of the text, this would be the result after the removal of stop words:
“In Malaysia, FinTech revolutionising banking sector intelligent technologies.”
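Here is one way this could be done with spaCy's built-in stop word list (note that exact stop word lists vary between libraries, so the output may differ slightly from the example above; many lists also treat “in” as a stop word):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies.")

# token.is_stop flags words in spaCy's built-in English stop word list;
# token.is_punct lets us drop punctuation at the same time
content_words = [token.text for token in doc if not token.is_stop and not token.is_punct]
print(content_words)
```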
Step 4: Lemmatization
Lemmatization is a text pre-processing technique used in NLP models to reduce a word to its root or base form, known as its “lemma”. For example, the words “enhances”, “enhanced” and “enhancing” all reduce to the lemma “enhance”. Knowing the base form of each word is crucial, so that the computer does not treat these variants as three totally different words.
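A small sketch of what this might look like with spaCy (same assumed setup as earlier):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("FinTech is revolutionising banking and empowering businesses.")

# token.lemma_ holds the base (dictionary) form predicted for each token
for token in doc:
    print(f"{token.text:15} -> {token.lemma_}")
# e.g. "empowering" -> "empower", "businesses" -> "business", "is" -> "be"
```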
Step 5: Dependency Parsing
After that, we need to find out how all the words in a sentence relate to each other. This is where dependency parsing comes in. Dependency parsing helps us understand the grammatical structure of a sentence by identifying the relationships between words: it determines which words are the main words (heads) and which words depend on those main words (dependents). In the second sentence of the paragraph, dependency parsing identifies “enhances” as the main verb, “it” as its subject and “By integrating AI” as a prepositional phrase. The result is a tree-like structure showing how each word in the sentence is connected to the others, making it easier to analyse the sentence’s meaning.
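As a rough sketch (again assuming the spaCy setup from earlier), each token's dependency label and head can be inspected like this:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("By integrating AI, it enhances transaction security and offers predictive financial advice.")

# Every token points to its syntactic head and carries a dependency label (e.g. nsubj, dobj)
for token in doc:
    print(f"{token.text:12} --{token.dep_}--> {token.head.text}")
```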
Step 6: Named Entity Recognition (NER)
Now that we have tackled the tough parts, we can finally move past grammar and actually extract information. NER, also known as entity chunking or entity extraction, is a component of NLP that identifies predefined categories of objects in a body of text.
Here are some of the categories NER is able to recognise:
- People’s names
- Company names
- Geographical locations
- Product names
- Monetary values
The goal of NER is to detect and label these nouns with the real-world concepts that they represent. From the sentence: “In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies.”, NER will be able to detect and tag “Malaysia” as a geographic entity.
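A minimal sketch with spaCy's pretrained English model (the exact labels depend on the model used; “GPE”, for geopolitical entity, is the tag this model typically assigns to countries):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies.")

# doc.ents contains the spans the statistical NER component recognised
for entity in doc.ents:
    print(entity.text, entity.label_)
# Expected to include something like: Malaysia GPE
```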
While this is a general overview of how NLP works, different techniques can be applied within NLP to handle more complex tasks for various purposes. Advanced methods, such as transformers and other neural networks, enhance the capabilities of NLP, allowing it to achieve even more sophisticated outcomes.
What are its Applications in FinTech?
Since its surge in prominence, NLP has been adopted in various industries with diverse applications. Among the popular fields are FinTech, manufacturing and marketing. In this publication, we will explore how NLP is employed specifically in FinTech.
Business is without a doubt risky; in fact, there’s a well-known saying: “no risk, no reward”. A prime example is investment and financial firms using NER and other NLP systems to extract key information from customer documents. This data is essential for risk analysis, where customer profiles are evaluated to assess loan risk. Using document categorization and established risk assessment criteria, NLP models analyse basic application documents covering account history, credit history, employment and education. This streamlined approach speeds up the analysis process and provides a broader understanding of each customer’s circumstances. A toy sketch of document categorization is shown below.
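To make the idea of document categorization concrete, here is a deliberately toy sketch using scikit-learn; the documents, labels and model choice are invented for illustration and are not any firm's actual risk model.

```python
# pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical, hand-written snippets from application documents, purely for illustration
documents = [
    "Ten years of stable salaried employment with no missed repayments on record",
    "Multiple defaults on previous loans and an irregular income history",
    "Long-standing account with consistent savings and credit cards settled in full",
    "Recent bankruptcy filing and several large unexplained withdrawals",
]
labels = ["lower_risk", "higher_risk", "lower_risk", "higher_risk"]

# TF-IDF turns each document into a weighted bag-of-words vector;
# logistic regression then learns a simple decision boundary over those vectors
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(documents, labels)

print(model.predict(["No defaults, steady employment and regular savings"]))
```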
Furthermore, in today’s competitive landscape where customer satisfaction is paramount, many companies use NLP to interpret their customers’ emotions towards their products through sentiment analysis. Using tokenization, NLP breaks down unstructured data such as social media posts, financial reports or news articles into tokens, and models trained on this text learn to associate language patterns with different emotions, helping companies understand their customers’ needs. For example, the investment bank Morgan Stanley uses NLP tools to detect and collect online criticisms and allegations in real time, providing its investors with key information on public perception and its potential impact on company stock prices. A small illustration of sentiment analysis follows.
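Purely as an illustration of sentiment analysis (not Morgan Stanley's actual tooling, and with made-up example posts), a pretrained model from the Hugging Face `transformers` library could be used like this:

```python
# pip install transformers torch
from transformers import pipeline

# Downloads a small pretrained English sentiment model on first use
classifier = pipeline("sentiment-analysis")

# Invented example posts, purely for illustration
posts = [
    "The new mobile banking app makes transfers effortless, love it!",
    "Another outage today and support never replied. Deeply disappointed.",
]

for post, result in zip(posts, classifier(posts)):
    print(f"{result['label']:8} ({result['score']:.2f})  {post}")
```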
Not only that, NLP and machine learning have become indispensable tools in the fight against fraud. A notable example is JP Morgan’s use of machine learning algorithms trained on historical data to identify patterns indicative of fraud, such as unusual spending patterns or large transactions from unfamiliar locations. These algorithms allow JP Morgan to quickly identify and respond to potential fraud. Such approaches help organisations adapt to cybersecurity threats and maintain robust security measures; a generic sketch of this kind of anomaly detection appears below.
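The sketch below shows the general idea of flagging unusual transactions with an isolation forest; it is a generic anomaly-detection example with invented numbers and assumed features, not JP Morgan's actual system.

```python
# pip install scikit-learn numpy
import numpy as np
from sklearn.ensemble import IsolationForest

# Invented features per transaction: [amount, distance_from_home_km, hour_of_day]
history = np.array([
    [45.0, 2.0, 13], [60.0, 3.5, 18], [32.0, 1.2, 9],
    [58.0, 2.8, 20], [41.0, 0.9, 11], [55.0, 3.1, 17],
])

# Fit the detector on historical "normal" behaviour
detector = IsolationForest(contamination=0.1, random_state=0).fit(history)

# A very large transaction far from the usual locations should be flagged as an outlier
new_transactions = np.array([[52.0, 2.5, 15], [4800.0, 950.0, 3]])
print(detector.predict(new_transactions))  # 1 = looks normal, -1 = flagged
```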
Overall, NLP plays a significant role in FinTech by improving customer interactions, managing risk and enhancing data analysis. As NLP evolves with AI advancements, it continues to drive innovation in financial services, promising a future where technology elevates efficiency and meets evolving industry demands.
Written by: Swetha Jayaprasad Rao
About Swetha
Swetha is currently an A-Levels student at Kolej Tuanku Ja’afar. She is an aspiring Computer Science student, with a keen interest in data science, programming and machine learning.
About MYFinT
Malaysian Youth FinTech Association (MYFinT) is a non-profit youth organisation dedicated to empowering, motivating and inspiring the young generation across all industries to gain exposure to the latest trends and developments in the FinTech industry.
Sources:
- “Analysis of Natural Language Processing in the FinTech Models of Mid-21st Century.” ResearchGate, www.researchgate.net/profile/Pascal-Muam-Mah/publication/363255877_Analysis_of_Natural_Language_Processing_in_the_FinTech_Models_of_Mid-21st_Century/links/63137249acd814437ffe4434/Analysis-of-Natural-Language-Processing-in-the-FinTech-Models-of-Mid-21st-Century.pdf.
- “How AI Can Bolster Sustainable Investing.” Morgan Stanley, www.morganstanley.com/ideas/ai-sustainable-investing-use-potential#:~:text=NLP%20tools%20can%20be%20used,impact%20on%20company%20stock%20prices.
- Khanna, Chetna. “Text Pre-Processing: Stop Words Removal Using Different Libraries.” Medium, 10 Feb. 2021, towardsdatascience.com/text-pre-processing-stop-words-removal-using-different-libraries-f20bac19929a.
- “NLP Tutorial.” Javatpoint, www.javatpoint.com/nlp.
- Dass, Riti. “The Essential Guide to How NLP Works.” Medium, 24 Sept. 2018, medium.com/@ritidass29/the-essential-guide-to-how-nlp-works-4d3bb23faf76.
- ACODS UK. “How JP Morgan Uses Data Science?” Medium, 18 Jan. 2023, medium.com/@Acods/how-jp-morgan-uses-data-science-2066871b2de8#:~:text=JPMorgan. Accessed 16 July 2024.
- “What Is Natural Language Processing (NLP) & How Does It Work?” Levity.ai, levity.ai/blog/how-natural-language-processing-works.
- “What Is Tokenization?” DataCamp, www.datacamp.com/blog/what-is-tokenization.