Query handling systems (the “smart RAG” layer)
This layer decides how to ask for
information before asking what to
retrieve.
Query handling systems: the smart RAG layer.

Big idea in one sentence: instead of sending a raw question to retrieval, you analyze, expand, split, and orchestrate queries so retrieval has the best possible chance of succeeding.

1. Why smart RAG exists at all. Users ask questions like humans: vague, overloaded, ambiguous, multi-part. Vector stores expect focused queries, clear intent, and scoped retrieval. Smart RAG is the translator in the middle.

2. Query expansion: say more than the user did. What it is: you take the user's query and expand it with related terms or synonyms to improve recall. Example: the user asks "leave policy for parents." Expanded internally to: parental leave, maternity leave, paternity leave, carer's leave, paid parental leave. All of these are used for retrieval, even though the user never typed them. Why AWS likes this: it increases recall, reduces "nothing found" cases, and helps with short or vague queries. Exam signal: "improve recall," "broaden search," "synonyms" → query expansion.
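A minimal sketch of this idea, assuming a hand-maintained synonym table (production systems often use an LLM or a thesaurus index for this step; the table and helper below are illustrative, not a specific AWS API):

```python
# Rule-based query expansion: map trigger phrases to related terms.
# The synonym table and helper are illustrative, not a specific AWS API.
SYNONYMS = {
    "leave policy for parents": [
        "parental leave",
        "maternity leave",
        "paternity leave",
        "carer's leave",
        "paid parental leave",
    ],
}

def expand_query(query: str) -> list[str]:
    """Return the original query plus any matching expansion terms."""
    expansions = [query]
    for trigger, terms in SYNONYMS.items():
        if trigger in query.lower():
            expansions.extend(terms)
    return expansions

# All expanded terms go to retrieval; the user never sees them.
queries = expand_query("leave policy for parents")
```

Every string in `queries` is searched, so a vague user phrase still hits documents that use the official policy names.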
3. Query decomposition: split complex questions. What it is: you break a multi-part question into multiple focused subqueries. Example: the user asks, "What are the tax benefits and eligibility rules for first home buyers in NSW after 2023?" Decomposed into: (1) tax benefits for first home buyers in NSW after 2023; (2) eligibility rules for first home buyers in NSW after 2023. Each subquery retrieves its own evidence. Why this matters: each sub-question is clearer, retrieval quality improves, and the final answer is more complete. Exam signal: "multi-part question," "complex query" → query decomposition.
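In production this split is usually an LLM call; a naive regex stand-in (the pattern and helper name are illustrative assumptions) still shows the shape of the step:

```python
import re

# Naive decomposition: split a compound subject joined by "and" while
# keeping the shared scope. In production this step is usually an LLM
# call; the regex is an illustrative stand-in.
PATTERN = re.compile(r"what are the (.+?) for (.+?)\??$", re.IGNORECASE)

def decompose_query(query: str) -> list[str]:
    match = PATTERN.match(query.strip())
    if not match:
        return [query]  # nothing to split
    subjects, scope = match.groups()
    return [f"{part.strip()} for {scope}" for part in subjects.split(" and ")]

subqueries = decompose_query(
    "What are the tax benefits and eligibility rules "
    "for first home buyers in NSW after 2023?"
)
# → ["tax benefits for first home buyers in NSW after 2023",
#    "eligibility rules for first home buyers in NSW after 2023"]
```

Each subquery then retrieves its own evidence, exactly as described above.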
4. Query transformation: change the shape, not the meaning. What it is: you rewrite the query into a form better suited for retrieval, not for humans. Examples: turn a question into a statement, remove conversational fluff, normalize tense and wording. User: "Can you tell me if I'm allowed to…" Transformed: "eligibility criteria for…" Why this helps: vector and keyword search work better on concise, factual phrasing. Exam signal: "rewrite for retrieval," "normalized queries" → query transformation.
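A hedged sketch of the rewrite step; the fluff pattern and the output template are illustrative assumptions (real systems typically use an LLM rewrite prompt):

```python
import re

# Strip conversational fluff and recast as a retrieval phrase. The
# opener pattern and output template are illustrative assumptions.
FLUFF = re.compile(
    r"^(hey,?\s*)?(can|could) you tell me (if|whether)\s+i'?m allowed to\s*",
    re.IGNORECASE,
)

def transform_query(query: str) -> str:
    """Rewrite a chatty question into a concise retrieval phrase."""
    cleaned, n = FLUFF.subn("", query.strip())
    if n == 0:
        return query  # nothing recognized; pass through unchanged
    return f"eligibility criteria for {cleaned.rstrip('?').strip()}"

transformed = transform_query(
    "Hey, can you tell me if I'm allowed to cancel my subscription?"
)
# → "eligibility criteria for cancel my subscription"
```

Note that no new meaning is added; only the shape changes.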
5. Orchestration with Step Functions: this is the AWS part. All of the above steps are not one LLM call; they're a pipeline (a typical flow appears as code earlier in our conversation history). This orchestration is often done with AWS Step Functions because it gives you branching, retries, parallel retrieval, and observability. Exam signal: "multi-step query handling," "orchestration," "branching" → Step Functions.
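Such a pipeline is typically expressed as a state machine. A hedged sketch in Amazon States Language, built here as a Python dict; the state names and Lambda ARNs are illustrative placeholders, not real resources:

```python
import json

# Sketch of a Step Functions state machine (Amazon States Language) for
# the query-handling pipeline. State names and Lambda ARNs are
# illustrative placeholders.
definition = {
    "Comment": "Smart RAG query-handling pipeline",
    "StartAt": "AnalyzeQuery",
    "States": {
        "AnalyzeQuery": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:analyze-query",
            "Next": "ExpandAndTransform",
        },
        "ExpandAndTransform": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:expand-transform",
            "Next": "RetrieveInParallel",
        },
        "RetrieveInParallel": {
            # One iteration per decomposed subquery, retried on failure.
            "Type": "Map",
            "ItemsPath": "$.subqueries",
            "Iterator": {
                "StartAt": "Retrieve",
                "States": {
                    "Retrieve": {
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:retrieve",
                        "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                                   "MaxAttempts": 2}],
                        "End": True,
                    }
                },
            },
            "Next": "Aggregate",
        },
        "Aggregate": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:aggregate",
            "End": True,
        },
    },
}

asl_json = json.dumps(definition, indent=2)  # what you'd upload to Step Functions
```

The Map state is what gives you parallel retrieval, and the Retry block is the built-in resilience the exam alludes to.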
6. The static + 1 pattern, query-handling edition. Static: expansion rules, decomposition logic, transformation templates, orchestration workflow. Plus one: the user query. You design the system once; each query flows through it. That's static + 1 again.
7. MCP clients: retrieval access patterns as a concept. This is subtle and very exam-relevant. What "MCP clients" means conceptually: think of standardized clients that know how to talk to retrieval systems in a consistent way. They request context, specify scopes, define access patterns, and avoid ad hoc retrieval logic everywhere. Why this exists: without it, every agent or service implements retrieval differently, breaks consistency, and causes security and quality issues. With MCP-style clients, retrieval is standardized, access is controlled, and context is predictable. You don't need protocol-level details for the exam, just the concept: clients that standardize how models and agents access retrieval context.
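A conceptual sketch only, not the actual Model Context Protocol wire format: one standardized client that every agent must use, with tenant scoping and filters baked in. The class and fake search backend are assumptions for illustration:

```python
from dataclasses import dataclass

# Conceptual sketch only -- not the real MCP wire format. One
# standardized client that every agent must use, with tenant scoping
# and filters baked in. The backend stub is fake.
@dataclass
class RetrievalClient:
    tenant_id: str
    max_documents: int = 5

    def fetch_context(self, query: str, topic: str) -> list[dict]:
        # Mandatory metadata filters are applied here, once, for every
        # caller -- no agent can forget or rescope them.
        filters = {"tenant_id": self.tenant_id, "topic": topic}
        return self._search(query, filters)[: self.max_documents]

    def _search(self, query: str, filters: dict) -> list[dict]:
        # Stand-in for the real vector / hybrid search backend.
        return [{"query": query, "filters": filters, "rank": 1}]

client = RetrievalClient(tenant_id="acme")
docs = client.fetch_context("parental leave policy", topic="hr-policies")
```

Because every agent goes through `fetch_context`, access control and filtering are enforced in one place instead of scattered across callers.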
8. Realistic end-to-end example. User query: "Compare parental leave and carer's leave policies for NSW employees." Smart RAG handling: (1) detect complexity: needs a comparison; (2) decompose: parental leave policy NSW; carer's leave policy NSW; (3) expand synonyms and policy names; (4) transform to concise, factual phrasing; (5) retrieve, possibly in parallel; (6) aggregate; (7) generate the answer. Without this layer: one fuzzy query, partial retrieval, weak answer.
9. Classic exam traps. Watch carefully for answers like: send the raw query directly to the vector DB; the LLM magically fixes retrieval; query expansion replaces metadata filters; decomposition is only for prompts. AWS wants systems thinking, not prompt optimism.

One memory story: lock it in.
The research assistant. Expansion: "What else could this mean?" Decomposition: "Is this actually multiple questions?" Transformation: "How do I phrase this for the library?" Orchestration: "Let me coordinate the research." MCP client: "One official request form for all departments." The assistant prepares the question before going to the library.
Exam compression rules. Memorize: vague query → expand; complex query → decompose; chatty query → transform; multi-step logic → Step Functions; consistent retrieval access → MCP-style clients. If the answer treats retrieval as a single call, it's likely wrong. What AWS is really testing: whether you understand that the quality of answers depends more on how you ask than what you ask.
Smart RAG systems shape queries before retrieval ever happens. This only really sticks once you see how a smart RAG layer behaves in the wild. Below are five production-style examples that AWS exam questions are clearly inspired by.

Real examples, Day 12: query handling systems.

Example 1: HR policy assistant (query expansion). User query (raw): "What's the leave policy for parents?" This is too vague for retrieval. Smart RAG behavior. Step one, query expansion: the system expands the query internally to include parental leave, maternity leave, paternity leave, carer's leave, and paid parental leave. The user never sees this happen. Retrieval: each expanded term is searched in the vector store, often via OpenSearch or a managed knowledge base. Result: instead of no results, you get all relevant policy sections. Why AWS loves this: improves recall, reduces false negatives, no prompt trickery. Exam takeaway: short or vague queries → query expansion.
Example 2: legal research assistant (query decomposition). User query: "What are the eligibility rules and tax benefits for first home buyers in NSW after 2023?" This is two questions pretending to be one. Smart RAG behavior. Step one, detect complexity: the system detects multiple topics joined by a conjunction. Step two, decompose: it splits into (1) eligibility rules for first home buyers in NSW after 2023; (2) tax benefits for first home buyers in NSW after 2023. Orchestration: these subqueries are retrieved in parallel using AWS Step Functions. Each query applies metadata filters (NSW, year 2023), uses hybrid search, and may use reranking (Day 11). Aggregation: results are combined into a single structured context for the LLM. Why AWS loves this: better coverage, cleaner retrieval, easier to debug. Exam takeaway: multi-part question → query decomposition.
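The parallel retrieval branch can be mimicked locally with a thread pool; `retrieve` here is a hypothetical stub standing in for the real hybrid-search call:

```python
from concurrent.futures import ThreadPoolExecutor

# Local stand-in for the parallel branches Step Functions would run:
# each decomposed subquery is retrieved concurrently with the same
# metadata filters. `retrieve` is a hypothetical stub.
def retrieve(subquery: str, filters: dict) -> dict:
    return {"subquery": subquery, "filters": filters, "passages": []}

def retrieve_all(subqueries: list[str], filters: dict) -> list[dict]:
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda q: retrieve(q, filters), subqueries))

results = retrieve_all(
    [
        "eligibility rules for first home buyers in NSW after 2023",
        "tax benefits for first home buyers in NSW after 2023",
    ],
    filters={"state": "NSW", "year_min": 2023},
)
```

The key property is that every branch carries the same metadata filters, which is what keeps parallel retrieval consistent.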
Example 3: customer support chatbot (query transformation). User query (chatty): "Hey, can you tell me if I'm allowed to cancel my subscription and maybe get a refund?" Humans talk like this; search engines hate it. Smart RAG behavior. Step one, query transformation: the system rewrites it to "subscription cancellation policy" and "refund eligibility criteria." No new meaning is added, only clarity. Retrieval: short, factual queries perform better in vector search, keyword search, and hybrid systems. Why AWS loves this: smaller prompts, more deterministic retrieval, less noise. Exam takeaway: conversational input → query transformation.
Example 4: full smart RAG orchestration (all combined). User query: "Compare parental leave and carer's leave policies for NSW employees hired after 2022." End-to-end flow: (1) analyze query: detects a comparison across multiple topics; (2) decompose: parental leave policy NSW after 2022; carer's leave policy NSW after 2022; (3) expand: add synonyms and official policy names; (4) transform: concise, factual phrasing; (5) retrieve: hybrid search with filters, possibly in parallel; (6) aggregate: merge evidence; (7) answer. All orchestration is handled by Step Functions, not one giant prompt.
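The seven-step flow can be sketched as plain Python; every helper here is a hypothetical stub (an LLM call or retriever in a real system), shown only to make the pipeline shape concrete:

```python
# The seven-step flow as plain Python. Every helper is a hypothetical
# stub (an LLM call or retriever in a real system); only the pipeline
# shape matters here.
def analyze(query: str) -> bool:          # step 1: detect comparison
    return "compare" in query.lower() or " and " in query

def decompose(query: str) -> list[str]:   # step 2: split topics (stub)
    return ["parental leave policy NSW after 2022",
            "carer's leave policy NSW after 2022"]

def expand(q: str) -> list[str]:          # step 3: synonyms (stub)
    return [q]

def transform(q: str) -> str:             # step 4: concise phrasing (stub)
    return q

def retrieve(q: str) -> list[str]:        # step 5: hybrid search (stub)
    return [f"evidence for: {q}"]

def answer(query: str) -> str:
    subqueries = decompose(query) if analyze(query) else [query]
    evidence = []                         # step 6: aggregate
    for sq in subqueries:
        for variant in expand(transform(sq)):
            evidence.extend(retrieve(variant))
    return " | ".join(evidence)           # step 7: generate (stub join)

result = answer("Compare parental leave and carer's leave policies "
                "for NSW employees hired after 2022")
```

In production each function becomes its own Step Functions state, which is what makes the pipeline observable and retryable.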
Example 5: MCP-style client (retrieval access pattern). Problem: without MCP-style clients, every agent queries the vector store differently: different filters, different scopes, inconsistent security. With an MCP-style client, conceptually, you create a standard retrieval client that accepts a query, tenant ID, topic, and max documents, and always applies metadata filters, hybrid search rules, and the reranking policy. All agents use this client. Why AWS cares: consistency, security, predictability, easier auditing. Exam takeaway: MCP clients standardize retrieval access; no ad hoc queries.
Static + 1, real-world lock-in. Static: expansion rules, decomposition logic, transformation templates, orchestration workflow, retrieval client interface. Plus one: the user query. The system logic stays fixed; queries flow through it.

One memory story, never forget: the research coordinator. Expansion: "What else could this mean?" Decomposition: "Is this more than one question?" Transformation: "How should I phrase this for the archive?" Step Functions: "Let me coordinate the research." MCP client: "Everyone uses the same request form." Good answers come from better questions, not bigger models.

Ultra-short exam cheat sheet: vague → expand; complex → decompose; chatty → transform; multi-step → Step Functions; consistent retrieval → MCP-style client.