Context Engineering Part 1
Lessons from Context Engineering for AI Agents

Design Around the KV Cache
Prompt cache hits mean lower cost and faster responses. To maximize hits, keep the initial context given to the LLM fixed and make the rest of the context append-only. Cache breakpoints can also be used for finer control over what gets cached.

Mask, Don't Remove
This is related to the KV cache: tool definitions are typically placed at the start of the context, so keeping the tool set identical across calls is essential to ensure cache hits. When previous actions and observations refer to tools that are no longer defined in the current context, the model gets confused, which can lead to schema violations and hallucinations. Rather than removing tools, they recommend using response pre-fill to constrain the agent's action space. This can be achieved with parameters like tool_choice=<specified> when interfacing with models. ...
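The two ideas above can be sketched together: a stable system prompt with an append-only message list (cache-friendly), and a request builder that keeps the full tool list but narrows the action space with tool_choice instead of deleting tools. This is a minimal illustration assuming an OpenAI-style chat API; the names AgentContext and masked_request are hypothetical, not from the original post.

```python
# Illustrative sketch: append-only context + tool masking via tool_choice.
# Assumes an OpenAI-style request shape; helper names are made up here.

SYSTEM_PROMPT = "You are a helpful agent."  # fixed prefix, never edited in place

ALL_TOOLS = [  # always sent in full and in the same order, to keep the cache warm
    {"type": "function", "function": {"name": "browser_search", "parameters": {}}},
    {"type": "function", "function": {"name": "shell_exec", "parameters": {}}},
]

class AgentContext:
    """Append-only message list behind a fixed system prompt."""

    def __init__(self):
        self.messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    def append(self, role, content):
        # Turns are only ever appended; earlier messages are never mutated,
        # so consecutive requests share the longest possible cached prefix.
        self.messages.append({"role": role, "content": content})

def masked_request(ctx, allowed_tool=None):
    """Build request kwargs that mask rather than remove tools.

    The full tool list stays in the request (cache-friendly); the action
    space is narrowed via tool_choice instead of dropping definitions.
    """
    kwargs = {"messages": ctx.messages, "tools": ALL_TOOLS}
    if allowed_tool is not None:
        # Force the model to call one specific tool (OpenAI-style format).
        kwargs["tool_choice"] = {"type": "function",
                                 "function": {"name": allowed_tool}}
    return kwargs

ctx = AgentContext()
ctx.append("user", "Find the latest release notes.")
req = masked_request(ctx, allowed_tool="browser_search")
assert len(req["tools"]) == 2  # tools are never removed from the request
assert req["tool_choice"]["function"]["name"] == "browser_search"
```

The point of the design: because the system prompt and tool definitions never change between calls, every request shares an identical prefix with the previous one, and masking via tool_choice restricts behavior without invalidating that prefix.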