In the federal consulting arena, thousands of RFPs and RFIs are issued each year through contracting vehicles and portals such as eBuy, beta.SAM or CIO-SP3/4, and proposal teams at almost every government consulting company spend significant time reading and reviewing them. Every member of the team reviews the same document and then shares insights with the others to support bid/no-bid decision-making.
This process could become more efficient and less time-consuming if we could extract high-level insights from these RFPs and RFIs with the help of the latest developments in natural language processing (NLP) and artificial intelligence (AI). NLP is a branch of AI concerned with interpreting, summarizing and understanding human language. The objective of NLP is to build systems or software pipelines that can make sense of text and perform tasks such as translation, grammar checking and topic classification.
NLP applications have achieved a good level of maturity, and a few impactful examples in recent times include:
- Scraping the latest reports, research papers, etc., and contextually summarizing key findings to save researchers time
- Taking meeting notes and generating summaries
- Building web applications using plain English instead of software code
When reviewing government RFPs, there are a few common questions that reviewers look to answer:
- What are the high-level topics of this RFP?
- When are the questions due?
- Where is the location of work?
- What type of contract is this (time and materials vs. firm-fixed-price)?
- Who are the points of contact for this RFP?
- Are there any certifications required?
- What government agencies and internal organizations are associated with the RFP?
In addition to these, there can be many other organization-specific questions that proposal team members look for, such as whether the RFP is relevant to work the organization has performed in the past. NLP technology makes it possible for an automated system to read an RFP and condense the answers to the questions above into a single succinct report. Team members can then screen and shortlist relevant RFPs without reading them in their entirety. Automatically generating high-level summaries and insights and sharing them across the proposal team can shorten bid/no-bid decision-making, and any time saved in this process directly equates to cost savings and better use of resources.
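As a concrete target for such a system, the answers to these questions can be collected into one structured record per RFP. A minimal sketch in Python follows; the field names are illustrative assumptions, not a standard schema:

```python
from dataclasses import asdict, dataclass, field

@dataclass
class RfpSummary:
    """One succinct record per RFP; each field maps to one of the
    common review questions (names here are illustrative only)."""
    topics: list = field(default_factory=list)        # high-level topics
    questions_due: str = ""                           # deadline for questions
    place_of_performance: str = ""                    # location of work
    contract_type: str = ""                           # e.g. "T&M" or "FFP"
    points_of_contact: list = field(default_factory=list)
    certifications: list = field(default_factory=list)
    agencies: list = field(default_factory=list)

# Each extraction step only has to fill in its own field; the populated
# record becomes the report shared with the team.
summary = RfpSummary(contract_type="FFP", agencies=["GSA"])
print(asdict(summary)["contract_type"])  # FFP
```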
Considerations for Building an NLP-Based AI System
1. Filter the noise.
There are several steps involved in creating an automatic proposal-reading system. One of the biggest challenges in this exercise is eliminating the noise, which means filtering out redundant, unnecessary information.
For example, to identify the points of contact for a proposal, such as the name and email of a contracting officer or contract specialist, the NLP pipeline (a series of NLP-based software scripts) scans the entire proposal for people’s names and lists every individual mentioned in the document through a method called named entity recognition (NER). NER is a natural first step toward information extraction; it seeks to identify key elements in text such as names of people, places, brands and organizations.
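As a rough sketch of this extraction step, the snippet below stands in for a trained NER model with a naive title-case regex; a production pipeline would use a real NER library (spaCy is a common choice), and the sample text is invented:

```python
import re

# Toy stand-in for person-name NER: treat runs of two to four
# Title-Case words as candidate names. A real pipeline would use a
# trained statistical model instead of this heuristic.
NAME_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+ ){1,3}[A-Z][a-z]+\b")

def candidate_names(text):
    """Return every title-case word run that could be a name."""
    return NAME_PATTERN.findall(text)

rfp_text = ("Questions shall be directed to Jane Smith, Contracting "
            "Officer, or John Doe, Contract Specialist.")
print(candidate_names(rfp_text))
# ['Jane Smith', 'Contracting Officer', 'John Doe', 'Contract Specialist']
```

Even this tiny example surfaces noise: the role titles come back alongside the actual names, which is exactly the filtering problem described next.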
Note, however, that if the proposal includes a list of federal holidays, the NER results will also include Martin Luther King Jr., because Martin Luther King Jr. Day is listed in the document. Likewise, the president’s name may show up as a point of contact if the proposal stems from a new initiative of the current administration and the president is named in the contract.
Filtering out such noise and extracting only the names of contracting officers or other relevant personnel requires appending additional NLP rules or some form of machine learning, and it takes continuous iteration of the NLP pipelines to reach the desired results.
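One hedged sketch of such a rule layer combines a stoplist with a proximity check around contact-related cue words; the stoplist entries, cue words and 60-character window below are illustrative choices, not tuned values:

```python
# Illustrative noise filter over raw NER output: drop names on a
# stoplist of known non-contacts (holidays, public figures), then keep
# only names that appear near a contact-related cue phrase.
STOPLIST = {"Martin Luther King Jr.", "George Washington"}
CONTACT_CUES = ("contracting officer", "contract specialist",
                "point of contact", "poc")

def filter_points_of_contact(text, ner_names):
    kept = []
    for name in ner_names:
        if name in STOPLIST:
            continue
        # Keep a name only if a cue phrase occurs within 60 characters.
        idx = text.find(name)
        window = text[max(0, idx - 60): idx + len(name) + 60].lower()
        if any(cue in window for cue in CONTACT_CUES):
            kept.append(name)
    return kept

text = ("Offices are closed for Martin Luther King Jr. Day. "
        "Direct questions to Jane Smith, Contracting Officer.")
print(filter_points_of_contact(text, ["Martin Luther King Jr.", "Jane Smith"]))
# ['Jane Smith']
```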
2. Create content libraries.
Content libraries are lists of keywords that are used in pattern matching. Pattern matching is a common task in NLP, which consists of the process of checking whether a specific sequence of words or characters exists within a sentence or document.
For example, a content library containing all countries and cities will help identify the geographic locations mentioned in an RFP. Similarly, a content library of all government agencies and organizations is useful for detecting the government bodies and organizations associated with an RFP.
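A minimal illustration of pattern matching against content libraries, using plain substring checks; a production system might use something like spaCy's PhraseMatcher, and the library entries below are examples only:

```python
# Content libraries are just shared keyword lists; matching is the
# process of checking which entries occur in the document.
AGENCY_LIBRARY = ["General Services Administration", "Department of Energy", "GSA"]
LOCATION_LIBRARY = ["Washington, DC", "Huntsville", "Denver"]

def match_library(text, library):
    """Return every library entry found verbatim in the text."""
    return [term for term in library if term in text]

rfp_text = ("Work will be performed in Huntsville for the "
            "General Services Administration (GSA).")
print(match_library(rfp_text, AGENCY_LIBRARY))    # ['General Services Administration', 'GSA']
print(match_library(rfp_text, LOCATION_LIBRARY))  # ['Huntsville']
```

Because the libraries live in one central place, the proposal team can extend them without touching the matching code.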
Creating these content libraries, hosting them in central repositories and sharing them with NLP developers can help with summarizing an RFP and eliminating noise. Content library creation is a manual process: it requires proposal personnel to enter the names of organizations, software tools, acronyms, etc., into a database or a centrally stored list where every team member can collaborate on building them.
3. Use machine-learning models.
Several NLP-based machine-learning (ML) models will be required to extract topics within the RFP and to categorize or classify its text. ML is a branch of AI that involves teaching machines to perform repetitive tasks, such as categorizing images, detecting credit card fraud or flagging spam. Teaching the machine involves feeding historical data (text, numerical data, images, etc.) to an ML algorithm and training it to learn the complex relationships within the data to ultimately achieve the desired outcome.
For RFPs, ML models can be used to detect the type of contract (time and materials vs. firm-fixed-price, or IT vs. services) and any other custom classification the proposal team deems necessary. Building these models requires manually classifying and curating the training data beforehand, which, in turn, requires the IT team to work closely with the proposal team to establish an interface where the manual classification and tagging are captured.
This seems labor intensive, but organizations must weigh the effort against the amount of time it can save going forward. In this case, simply tagging each RFP in the organization’s database as time and materials or firm-fixed-price will give IT developers the labeled data they need to build the ML models.
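To make the idea concrete, here is a toy naive Bayes text classifier built only from the Python standard library and trained on a handful of invented, manually tagged snippets; a real system would use a proper ML library and far more labeled data:

```python
import math
from collections import Counter, defaultdict

# Invented stand-ins for manually tagged snippets from past RFPs.
TRAIN = [
    ("hours will be billed at negotiated labor rates", "T&M"),
    ("contractor invoices monthly for hours worked", "T&M"),
    ("a single fixed price for all deliverables", "FFP"),
    ("payment is a firm fixed price upon acceptance", "FFP"),
]

def train(examples):
    """Count words per label and documents per label."""
    word_counts, label_counts = defaultdict(Counter), Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability for the text."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace smoothing so unseen words don't zero out a class.
            score += math.log((word_counts[label][word] + 1) /
                              (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

wc, lc = train(TRAIN)
print(classify("invoices based on hours and labor rates", wc, lc))  # T&M
```

The point is less this particular algorithm than the workflow: as the proposal team’s tags accumulate in the database, they become the training list, and the model improves with every labeled RFP.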
A workflow where high-level summaries and insights are automatically shared across all proposal team members will be a reality soon. With this workflow, a significant amount of time will be saved during initial review, data recording and collaboration with other proposal members. NLP techniques such as pattern matching, NER, content libraries and machine learning have seen success in various applications, and they can deliver here by enabling organizations to read, review and summarize proposals much faster than they do today.
The future will bring AI algorithms that propose the best-fit RFPs/RFIs for an organization, but the transition to this phase will be gradual. For any company to reach that level of maturity, a good NLP foundation, inclusive of named entities, content libraries, ML models and other NLP pipelines, is imperative.
Sandeep Reddy is an AI research and innovation engineer at BMW with more than 10 years of experience, focused mainly on predictive modeling, deep learning and natural language processing. He has successfully implemented several AI-based solutions and products for corporate profitability and optimization.