
After working on LLM applications for several months now, I am still amazed at how well these models work. The reasoning and awareness from these models are absolutely something out of a sci-fi movie. Just last week, Google DeepMind came out with research suggesting that GPT-4 has emotional and social intelligence on par with humans – that means GPT-4 might have a better EQ than many people I know, including myself. It's impressive how good the generative outputs can be, from working code generation to composition and recommendation systems, but these may not be the biggest improvements these types of models have made for Data Scientists.
One of the most significant advantages these models bring to data science workflows is their ability to resolve major pain points in traditional machine learning pipelines. In typical DS workflows, there is a lot of work to go from a hypothesis to conclusions, typically spanning entire disciplines like Data Engineering, Data Science, and ML Ops. This usually takes a lot of time, which can add extra stress to a young startup, ultimately turning many experiments into "make it or break it" moments because of how many resources were used and how little runway is left.
Unfortunately, a lot of this work is done prior to having any real user feedback. What if the ML pipeline is rendered moot because a user doesn't engage with it? I bet many folks have had similar experiences to this: at one of my previous companies, we spent months whipping up a forecasting algorithm, only to find out that our users were just using PowerBI to download Excel sheets and compute a simple moving average. We need to understand what our users are trying to do within our software so that we can meet them where they are and provide a seamless interface.
We can circumvent a lot of the pain in standard ML workflows by using models like GPT-4 to short-circuit the number of sprints it takes for us to get real user feedback. We can use clever prompt engineering techniques like Chain of Thought, ReAct, and many other LLM techniques to help us:
- Prototype really fast. Using popular interfaces like LangChain, it's seamless to prototype and productionize prompt-based models (see the sketch after this list). At my current job, I've pushed eight models in the last five months, ranging from topic modeling to sentiment analysis and classification. Of course there are commonalities with our traditional DS workflows, like all the data wrangling, but I was able to iterate quickly, see what was working and what wasn't, and most importantly seek feedback from my team.
- Solidify the requirements of the product. With a crude model, I could whip up a demo in a Jupyter Notebook and start a conversation with my boss or teammates, who would opine on how well the model was working. Often, this interaction uncovers implicit requirements that weren't totally clear at the beginning of the project. These can be subtle dependencies, nuances in the problem space, or relevant business applications. But since it's so easy to prototype, we can pivot live and continue refining the scope of what we hope the models will do.
- Collect user feedback. Now that we've internally pressure-tested the problem space and initial model, we deploy this to our users. We use LangSmith to collect feedback on every prompt we have, allowing us to see trends logged over time for any model (a sketch of this follows the list). This lets us see if a user engages with the model, and collect additional feedback such as a rating on a 5-point scale.
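
To make the prototyping point concrete, here's a minimal sketch of the kind of prompt-based model you can stand up in minutes with LangChain, with a light chain-of-thought instruction baked into the system message. The model name, labels, and prompt wording are my own illustrative choices, not a description of any particular production setup:

```python
# A minimal prompt-based classifier prototype, assuming LangChain with an
# OpenAI chat model. Everything here is illustrative and easy to swap out.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Chain-of-thought style instruction: ask the model to reason before labeling.
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a sentiment classifier. Think step by step about the tone "
     "of the review, then answer with exactly one label: "
     "positive, negative, or neutral."),
    ("human", "Review: {review}"),
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)  # hypothetical model choice
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"review": "Shipping was slow, but support fixed it fast."}))
```

A crude chain like this is exactly the kind of thing you can demo in a Jupyter Notebook, tweak live during a conversation, and then productionize once the scope settles.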
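And once a prototype is in front of users, feedback can be attached to the traced runs. Here's a hedged sketch using the LangSmith SDK's create_feedback call; the run ID, feedback key, score, and comment are placeholders for whatever your tracing setup actually produces:

```python
# A sketch of logging a 5-point user rating against a traced run in LangSmith.
from langsmith import Client

client = Client()  # reads the LangSmith API key from the environment

# Attach a user rating to a specific traced run so trends can be charted later.
client.create_feedback(
    run_id="00000000-0000-0000-0000-000000000000",  # placeholder run UUID
    key="user_rating",
    score=4,  # e.g. 4 out of 5 on the 5-point scale
    comment="Summary was helpful but missed one ticket.",
)
```
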
I'm a huge fan of Don Norman's The Design of Everyday Things, where he talks all about how good product design enables people to use a system to accomplish their goals. In my professional life, LLMs act as a tool for me to get rapid feedback. As I mull over a problem, I can experiment quickly (and cheaply now, thanks to GPT-4o), which lets us collect a ton of feedback. Using LLMs like this allows us to weave elements of iterative design into ML pipelines, where the models evolve with each additional wave of feedback. Working in this pattern lets us meet the user months ahead of the schedules I've seen at other companies, saving precious resources and my sanity (albeit less precious, as companies have often let me know).

Thank you for reading – I hope this offered a fresh perspective on how to use LLMs in your workflow to get more feedback. And now for the shameless plug – we’re all about the feedback if you can’t tell: drop a comment with your thoughts on the article and share any of your experiences using LLMs. Your engagement helps keep these discussions lively and insightful!