Prompt engineering for web scraping

Sidhartha Mohapatra
5 min read · Apr 22, 2024


I recently used an LLM to extract information from a credit card company’s website.

The specific information I targeted included:

  • Credit Card Name
  • Reward Categories
  • Reward Points for Categories
  • Reward points Terms and conditions
  • Benefits Categories
  • Details about Benefits
  • Fees detail

For the project, I used OpenAI’s “gpt-3.5-turbo” model as the LLM along with the “text-embedding-3-large” model for embeddings. For prompt and chain creation, I used LangChain’s prompt templates and pre-built chains.
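For reference, the model setup looks roughly like this (a minimal sketch assuming the langchain-openai package; the temperature and variable names are my own choices):

from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# gpt-3.5-turbo as the LLM and text-embedding-3-large for the embeddings
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")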

Process flow diagram

My primary objective was to gauge the similarity between the text generated by the LLM and the text manually extracted from the website. To measure the similarity, I calculated the cosine similarity between the embeddings of the documents.
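The scoring step itself is simple: embed the generated text and the manually extracted reference text, then compute the cosine of the angle between the two vectors. A rough sketch (generated_text and manually_extracted_text are placeholder variables, and embeddings is the object from the setup above):

import numpy as np

def cosine_similarity(a, b):
    # cosine similarity between two embedding vectors
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

generated_vec = embeddings.embed_query(generated_text)            # LLM output
reference_vec = embeddings.embed_query(manually_extracted_text)   # manual extraction
score = cosine_similarity(generated_vec, reference_vec)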

I experimented with various prompt techniques, including zero-shot prompts, few-shot prompts, and few-shot prompts with stuff chain types, among others. Each approach aimed to enhance the similarity score of the generated text.

Cosine similarity score for various prompt techniques

Zero Shot + Simple Prompt

As you can see from the above graph, this technique generated text with the lowest similarity score. This outcome aligns with expectations: we didn’t give the model any examples to reference, and the instructions in the prompt were simple and didn’t clearly convey our requirements. In the absence of clear instructions, the model generated a generic response applicable to many contexts.

# This is the simple prompt that I fed into the LLM; the requirements aren't conveyed clearly
simple_question="""Give a list of all the reward categories,corresponding points and required details for each category,
also include the various benefits and corresponding details and include the fees from the given context"""
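For this run there is nothing fancy in the chain: the scraped page text and the simple question go straight to the model, with no examples. A hedged sketch (page_text is a placeholder for the scraped website content):

from langchain_core.prompts import ChatPromptTemplate

zero_shot_prompt = ChatPromptTemplate.from_template(
    "Context:\n{context}\n\nQuestion:\n{question}"
)
response = llm.invoke(
    zero_shot_prompt.format_messages(context=page_text, question=simple_question)
)
print(response.content)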

Zero Shot + Detailed Prompt

We observe a notable increase in the similarity score compared to the previous “Zero Shot + Simple Prompt” technique. I didn’t give any examples, but I fed a detailed prompt into the model that clearly conveyed and explained the requirements.

detailed_question="""
You're tasked with building a comprehensive database of credit card rewards. Your goal is to extract detailed information on SBI Elite Credit Card, including(if available) but not limited to:
Credit Card Name: Extract the name of each credit card.
Reward Categories: Identify the categories for which rewards are offered (e.g., dining, travel, groceries,).
Reward Points for Categories: Collect data on the number of reward points offered per tansaction amt for each category.
Reward points Terms and conditions:Gather specifics about the reward points system, including redemption options, point expiration policies, points cap.
Benefits Categories: Identify the categories for which benefits associated with each credit card (travel insurance, airport lounge access, concierge services, milestone benefits).
Details about Benefits: Provide in-depth information about each benefit, including coverage limits, eligibility criteria, and any additional fees, discounts, bonus points, voucher, airport lounge access.
Fees detail: Gather the joining fee,renewal fee,minimum spending for renewal fee waiveoff.
Your task is to compile this information into a structured dataset for analysis and comparison purposes. Be thorough in your extraction, ensuring accuracy and completeness of the data.Only use the below provided context for anwering the above question, incase you don't find certain answer don't make answers of your own.
"""

Few Shot + Simple Prompt

We see a drop in the similarity score, to nearly the same level as the “Zero Shot + Simple Prompt” method. I created a few examples and fed them into the model, but the prompt was simple. It seems the model didn’t learn much from the examples; this suggests that a larger number of examples may be necessary for the model to fully understand the task and generate the desired output.
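For completeness, here is roughly how a few-shot prompt can be put together with LangChain’s FewShotPromptTemplate. The two examples below are illustrative stand-ins, not the actual examples from the project:

from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

examples = [
    {
        "context": "Earn 5X reward points on dining and departmental stores...",
        "answer": "Reward Categories: dining, departmental stores; Points: 5X per transaction...",
    },
    {
        "context": "Complimentary airport lounge access, 2 visits per quarter...",
        "answer": "Benefits: airport lounge access (2 visits per quarter)...",
    },
]

example_prompt = PromptTemplate(
    input_variables=["context", "answer"],
    template="Context: {context}\nAnswer: {answer}",
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=simple_question,                      # the simple instruction from above
    suffix="Context: {context}\nAnswer:",
    input_variables=["context"],
)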

Few Shot + Detailed Prompt

This method performed the best of all. I fed in a few examples as in the previous method, along with a detailed prompt. So this is the way to go: a few examples to refer to and clear instructions about the requirements.

Retrieval — Stuff and Refine chain type

Just like humans, too much information can overwhelm the model, so we’ll simplify it by breaking the context into small pieces. Then, we’ll look for the most likely set of pieces that contain the answer and send them as context to the LLM.
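The splitting and indexing step might look like this (a sketch: the chunk sizes are illustrative, page_text is a placeholder for the scraped content, and FAISS is just one choice of vector store; the chain code below only assumes a db_context that supports as_retriever):

from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# split the scraped page text into small chunks and index them with the embedding model
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(page_text)
db_context = FAISS.from_texts(chunks, embeddings)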

Stuff Type + Few Shot + Detailed Prompt

Instead of feeding the entire context into the model, I gathered all the paragraphs from the context that were expected to contain answers to my question. Then I stuffed them together to create a new context and fed it into the LLM. I used LangChain’s pre-built chain with my custom prompt.
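The chain below passes in a custom stuff_prompt. Roughly, it looks like this (an illustrative sketch, not the exact prompt I used; RetrievalQAWithSourcesChain exposes the stuffed documents to the prompt as {summaries}, so the prompt needs that variable):

from langchain_core.prompts import PromptTemplate

stuff_prompt = PromptTemplate(
    input_variables=["summaries", "question"],
    template=(
        "Use only the context below to answer. If something is missing, say so.\n\n"
        "{summaries}\n\nQuestion: {question}\nAnswer:"
    ),
)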

from langchain.chains import RetrievalQAWithSourcesChain

stuff_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    # k determines how many of the split chunks are retrieved and stuffed together
    retriever=db_context.as_retriever(search_kwargs={"k": 10}),
    chain_type_kwargs={
        "verbose": True,
        "prompt": stuff_prompt,
    },
)
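With the chain built, a single call retrieves the top-k chunks, stuffs them into the prompt, and returns the answer along with the sources (a hypothetical invocation, reusing the detailed question from above):

result = stuff_chain.invoke({"question": detailed_question})
print(result["answer"])
print(result["sources"])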

It gave performance similar to “Few Shot + Detailed Prompt”, but it was relatively cheaper since we fed fewer tokens into the model.

Refine Type + Few Shot + Detailed Prompt

In this technique I gathered the required paragraphs, but instead of stuffing them together, I fed the paragraphs one by one as context into the LLM and instructed it to refine the answer (if required) based on the new information in the current context.

So here we need to give two prompts: one for the initial answer generation, and the other to ask the LLM to refine the answers.
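Roughly, the two prompts look like this (illustrative sketches, not the exact prompts I used; the refine chain exposes each retrieved chunk as {context_str} and the running answer as {existing_answer}):

from langchain_core.prompts import PromptTemplate

question_prompt = PromptTemplate(
    input_variables=["context_str", "question"],
    template=(
        "Context:\n{context_str}\n\n"
        "Answer the question using only this context: {question}\n"
    ),
)

refine_prompt = PromptTemplate(
    input_variables=["question", "existing_answer", "context_str"],
    template=(
        "Question: {question}\n"
        "Existing answer: {existing_answer}\n"
        "New context:\n{context_str}\n\n"
        "Refine the existing answer only if the new context adds information; "
        "otherwise return it unchanged.\n"
    ),
)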

refine_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="refine",
    retriever=db_context.as_retriever(search_kwargs={"k": 10}),
    chain_type_kwargs={
        "verbose": False,
        # both prompts are passed in the arguments
        "question_prompt": question_prompt,
        "refine_prompt": refine_prompt,
    },
)

This is more expensive than all the other methods and didn’t perform as expected. I suspect running on small chunks narrows the effective context, so the LLM is unable to see the bigger picture. Maybe experimenting with the chunk size and the search kwargs could help.

From what I observed, providing a few examples and clear instructions is key to improving performance. Examples help the model understand what you’re looking for, while detailed instructions guide its responses. This way, the model can learn from both to give more accurate and relevant answers.

For the entire code, head over to my GitHub page.


Written by Sidhartha Mohapatra

I am professionally a data engineer/analyst with interest in value investing.