Evals Before Agents: How We Measure AI Quality in Hospitality
Evals aren't optional. They're the foundation. Build them first.
Eli Hooten | February 19th, 2026
How Chateau Event Center partnered with Line to deliver on demand service
Kyle Mann | February 9th, 2026
Examining the problem, the people, the market, and why group sales is overdue for innovation
Kyle Mann | November 15th, 2025
Reflections building with AI, UX notes, transparent pricing, and founder spotlight
Kyle Mann | November 1st, 2025
AI is redefining discovery and booking for hospitality
Kyle Mann | October 6th, 2025
Gratitude, hotel AI reservations, and co-founder spotlight
Kyle Mann | October 1st, 2025
Great hospitality starts with the people behind it
Kyle Mann | June 11th 2025
Travel is booming and people are craving IRL
Kyle Mann | June 7th 2025
Kyle Mann | May 1st 2025
From the age of 15, I worked in nearly every hospitality role you can imagine: housekeeping, front desk, bartending, night audit, catering coordination, sales, and conference services. I worked in mom-and-pop venues, catering-only businesses, and global hotel brands like Kimpton, across places like San Diego, Vail, and Washington D.C. These experiences taught me that hospitality is about how you show up, lead, and serve guests with deep care. And while that was my day job, at night I was teaching myself design and how to build websites...
Later, I transitioned into tech, and I've been building software for nearly a decade and a half with organizations of all sizes and capacities: from client work in fin-tech and dev tools like GitLab to early-stage startups. During this time, I learned the power of building systems that scale, how to solve deep user and technical problems, and how innovation comes one small experimental iteration at a time.
Now I'm coming full circle: combining those two worlds to help fix what's broken in hospitality today. Along the way, I kept hearing the same frustrations:
"I'm so tired of this every weekend I come to work to do all the computer stuff."
– Conference Services Manager
"Instant attention is everything now with these young people. If they don't hear back in a minute, they freak out. I still block off dates manually in Google Calendar."
– Owner, Event Venue
"I wish there was something to clean up our emails – my team is totally bogged down. Our setup sucks and I hate Salesforce."
– Director of Sales
"The top concern every year at the GM's meeting was how broken the sales workflow was (groups, meetings, and events)."
– GM
"From inquiry to invoice, our workflow is super manual. The biggest issue is inventory management."
– Owner, Rentals Company
I know what it's like to sell and then run an event with a skeleton crew. I've stayed up all night fixing last-minute banquet orders. I've wrestled with endless email threads, broken CRM setups, and customers who never received the information they needed. I've heard hundreds of stories just like mine.
But I've also been on the other side – the customer who fills out a form and never hears back. Who gets a generic copy-paste email with a PDF attachment, sometimes even addressed to the wrong name, like 'John', because it was copied from the last email they sent. Who calls in the evenings or weekends, when they finally have time, and no one answers. The experience is broken on both sides. And no matter the size or type of venue, the stories always sound the same.
That's why I'm building Line – to bring hospitality to the full experience, not just the moment a guest arrives. I believe the guest experience starts at the first click, and it should feel as welcoming, responsive, and thoughtful as check-in. At the same time, I want to make life easier for the teams who make these events happen. Line is here to serve both sides of that equation – the guest and the staff – with tools that reflect the spirit of great hospitality from start to finish.
Today's customers expect fast, clear, and thoughtful responses, yet most hospitality teams are stretched thin, buried under outdated workflows and clunky tools.
You've probably experienced it yourself:
And on your side? You're juggling proposals, availability, inventory, and guest expectations, often without the right systems in place. It's not just stressful, it's unsustainable. And to be really honest, not hospitable!
Line is an AI-powered assistant built specifically for hospitality teams.
Think of it as an always-on, highly trained helper that knows your business and can assist with:
For your team, Line saves time and makes your life easier. For your guests, Line offers the warm, professional service they expect: instant hospitality. We believe the inquiry experience is the guest experience. So we're starting there.
Line is for any hospitality business that handles private events, rentals, or group sales. Line is building out integrations to work alongside the tools you're already using, not to replace them. We're not building cold automation. We're building warm, responsive, AI-powered tools that help you show up better for your guests.
At Line, we aim to be unreasonably hospitable. Not just in how we serve your guests, but in how we serve you. Our goal is to bring the same level of care and professionalism to your sales process that your team brings to every event. We're starting with an embedded assistant for inquiries, paired with a manager dashboard that helps you respond, schedule, and close.
From there, we're building a complete solution that handles everything from inquiry to invoice. One seamless workflow that ensures your guests feel taken care of at every step - whether they're browsing your website, planning their event, or experiencing it at your property.
If you're tired of duct-taping tools together or losing business due to delays, I'd love to show you what we're building. We're rolling out with a handful of early adopters and actively iterating on the product with them. If that sounds like something you want to be part of, I'd love to talk.
Thanks for reading – how may I be of service? Let me know at kyle@withline.io
– Kyle
Kyle Mann | May 19th 2025
If you're in the hospitality business - whether you're running a boutique hotel, a wedding venue, a rental company, or an event space - you already know how important it is to deliver exceptional service once a guest arrives. Beautiful spaces, warm greetings, and attention to detail are table stakes.
But the thing is: for your guests, the experience doesn't begin at check-in. It begins with the first click. Look around the industry and it's clear - that same warm, high-touch hospitality that defines your property often disappears online. Instead, guests are met with outdated websites, confusing layouts, and a generic form at the end of a long scroll.
What should be an exciting moment - envisioning and planning an event - turns into frustration. You click around, hoping for inspiration or clarity, and instead land here:
A cold, impersonal form. It's something you fill out not because you want to, but because it's your only option. And if you decide to call instead? Either you're playing phone tag or yelling "agent!" at a phone tree. Or worse, interacting with a chatbot that's not built to help.
This experience isn't just inconvenient - it's the opposite of hospitality.
Consider this recent experience: I tried to submit a group meeting request at the local IHG-Kimpton Armory Hotel. I couldn't complete the form. I called, but it was the weekend, so no one was available. When I finally connected with someone the following week, I explained how hard it was to get in touch. The Director of Sales told me, "well, all places are like that, sales offices work 9 to 5". And about the website, "well, you need to contact IHG".
And the director isn't wrong - I've been there. But what's missing is that the staff all too often is working nights and weekends just to stay afloat. And still, they can't keep up. That mindset of "that's just how it is" isn't just disheartening - it's the enemy of hospitality and innovation. All told, guests are frustrated, staff are exhausted, and opportunities slip away.
This is the unfortunate reality in hospitality today. And it's a problem I'm determined to solve.
Guests expect clarity and responsiveness from the moment they hit your website. And yet, most hospitality businesses fall short:
That first interaction sets expectations: Can I trust this team? Will they respond quickly? Do they care? And if the experience feels impersonal or slow, the guest moves on.
Line has created a solution that helps your customers the moment they inquire, providing instant feedback, questions, and tailor-made visuals to help them kick off the experience.
Outdated tools, manual processes, and siloed systems don't just create stress - they cost bookings.
"My team is buried in emails and can't keep up."
"Guests want fast answers, and we can't keep up."
"We miss leads all the time."
We built Line to fix that. Our assistant connects to your tools, learns your spaces, and helps you respond faster. That means less chasing emails and more time actually serving your guests.
Your online experience is your brand. It's time we treated it with the same care we give the lobby, the room, or the reception hall. If you're ready to deliver hospitality from the first click - not just from check-in - we're building Line for you.
If this resonates, let's talk. We're working with partners who are ready to raise the bar, for their guests and their teams. How may I be of service?
Reach out: kyle@withline.io
Sylvia Larke | May 30th 2025
As an event company owner, I've navigated countless back-and-forth conversations with venue sales associates, often feeling bogged down by the sheer amount of communication required to get everything just right. My events come with specific requirements—a patio that's fully fenced, dog-friendly policies, a certain price point, and ideally some roof coverage. While these details ensure a seamless experience for my clients, hashing them out often feels tedious and inefficient.
This inefficiency doesn't just impact me; it affects the sales associates too. I've seen firsthand how the endless emails, phone calls, and potential miscommunications can leave staff drained, making them less present and less effective when the event day arrives. The back-and-forth takes a toll on everyone involved, and the risk of unmet expectations can create unnecessary stress.
That's why I was so eager to help develop and advocate for Line, a platform that's transforming how venues and customers communicate. Imagine being able to review a venue's Instagram profile or website, then seamlessly initiate a conversation through Line, getting real-time answers to all your questions. With Line, I can confirm key details about roof coverage, fencing, and dog-friendly policies without ever feeling like I'm going in circles. By the time I'm ready to book, I have complete confidence that my expectations are clear and will be met.
For event owners like me, Line solves some of the most persistent challenges:
What makes Line truly stand out is its ability to minimize human error. Traditional chatbots often fall short, offering limited responses that can't progress an inquiry meaningfully. On the other hand, Line combines advanced automation with real-time human support when needed, ensuring that nothing falls through the cracks.
As an event company owner, I know how critical it is to work with venues that value efficiency and professionalism. That's why I'm now using my experience and passion to help venues adopt Line. By integrating this platform, venues not only improve their booking process but also demonstrate to customers that they're committed to providing a premium experience. Early adopters of Line are already seeing the benefits: faster sales cycles, happier clients, and a competitive edge in the market.
In the fast-paced world of event planning, tools like Line aren't just helpful—they're essential. For any venue looking to enhance its operations and attract more bookings, the choice is clear: streamline your process, embrace innovation, and show your customers that you care.
If you're ready to transform your venue's booking experience, let's connect. Together, we can make tedious communication a thing of the past and create a future where event planning is as seamless as it's meant to be.
Reach out: sylvia@withline.io
Kyle Mann | June 7th 2025
There's no shortage of noise around AI: dramatic headlines, dire predictions, and endless hot takes. But while that noise plays out, we're focused on building technology powered by AI that can actually help event and hospitality teams get back to doing what they do best.
Hospitality is about presence. It's about creating moments that feel personal and memorable. It's why so many people chose this profession in the first place: to host, to connect, to make someone's day.
But somewhere along the way, teams got stuck juggling clunky websites and repetitive admin, while dealing with staffing shortages and growing guest expectations. Instead of spending time with guests, they're fighting with their tools.
We believe innovating the workflows with LLMs is an opportunity to change that. Not by replacing people, but by freeing them:
One restaurateur put it simply:
"I'd love to get this email and booking stuff off my plate, and back to innovating my menus and events for my guests."
Exactly. That's our vision.
At Line, we're building tools that serve your staff the same way your staff serves your guests. Tools that quietly remove the friction, so your team can be more present, more responsive, and more focused on what matters most: the guest experience.
Because here's what we see: travel is booming and people are craving real-life connection. Hospitality is the industry that can deliver it, but only if we give teams the support they need. This is the golden age of hospitality and it's time the tools caught up.
If you're building for this future too or if you're just tired of being held back by outdated systems – we'd love to talk.
Let's build together. Let me know how we can help: kyle@withline.io
Kyle Mann | June 11th 2025
Lately, I've been visiting venues, museums, and hotels and talking with operators. Walking into these properties, you're immediately struck by the design, ambiance, and attention to detail. When it's done right, it doesn't just feel like a welcome home, it feels like wow, I get to hang here 😎.
Then, I get ushered into the back office. Anyone who's worked in hospitality knows the contrast. The back of the house is often the complete opposite of the guest-facing side: cluttered, dated, and stressful. That's not something Line can fix physically. But what we can do is bring hospitality to your staff's workflows (and your guests').
Line is building an experience that treats your team with the same thoughtfulness and service that you extend to your guests. Our interface says, "How may I assist you?". Why shouldn't your staff experience the same level of care you aim to deliver on property?
In nearly every conversation I've had, the post-COVID challenges are clear:
"We used to run with 14 staff, now it's 5, and we have double the demand."
"We can't find talent. Many have left the industry entirely."
"My team is exhausted. They're drowning in busywork."
There's something wrong when one of the most service-oriented industries is stuck with some of the clunkiest, most unfriendly tools out there. We believe great hospitality starts with the people behind it and they deserve better systems!
If this resonates, let's talk. We're working with partners who are ready to raise the bar, for their guests and their teams. How may I be of service?
Kyle Mann | October 1st, 2025
👋 Dear friends,
I've been talking with hundreds of hospitality folks: sales leaders, GMs, chefs, restaurateurs, asset managers, owners, you name it. It's been both inspiring and eye-opening to hear about your workflows, pain points, and ideas for the future.
And beyond the conversations, I want to thank our early adopters, the hotels and venues in the U.S. and Europe partnering to bring Line into practice. You're not just "trying software", you're helping prove what is possible while shaping the next chapter of innovation in hospitality.
Our promise: we'll be working around the clock to smooth out the friction points you and your guests feel every day, so your teams win back time, your property gets AI-ready, and more inquiries convert into real business.
We're piloting our AI reservations agent, built to meet guests wherever they discover you: social, maps, AI search, or messaging. No more clunky websites, endless forms, or waiting on calls and replies. Guests can get answers and book instantly at the moment of discovery.
For properties, that means more direct bookings, fewer missed opportunities, and a smoother guest experience from the very first click. And this is just the start: it builds on the foundation of a single AI sales agent that also handles customer service questions and qualifies group sales leads. Interested? Shoot me an email or DM.
Katia is Line's co-founder and COO, based in Toronto. We've been building together for years: first at Codecov, then through the Sentry acquisition, and now at Line. From deep engineering to leadership, she's built AI agents that resolve software bugs and is now building agents that sell, drive bookings, and close business 24/7.
The spark for Line came when she was searching for a venue with her partner and ran into the same bottlenecks and frustrations she's heard from staff time and again. That "aha" moment made it clear: hospitality deserved better tools.
I couldn't be happier to be on this journey with Katia. Her mix of discipline, empathy, and technical depth is exactly what it takes to reimagine hospitality technology with the customer experience at the center. Reach Katia at katia@withline.io or connect on LinkedIn.
We'll be training our agents, working closely with design partners, and connecting with more customers to learn from their workflows. Our goal this month: make our first deployment...the world's first AI agent for group sales. We'll also be at The Hospitality Show. If you're there, let's connect.
We're just getting started, and I can't wait to share what's next.
Kyle
Kyle Mann | October 6th, 2025
As a kid, National Geographic magazines were my window to the world, filled with exotic destinations and hotel ads tucked between the pages. Reflecting on it now, it's remarkable how much discovery has evolved in such a short time, moving from print to web to mobile. Now it's shifting again: across social feeds and into AI interfaces.
Today, in a major leap, OpenAI launched Expedia and Booking.com apps inside ChatGPT, letting users search, compare, and book hotels directly in the interface.
At first glance, it may seem like another win for OTAs. But look closer and there's a shift underway, one that could actually strengthen the guest experience and return control of distribution to properties.
And to understand where we're headed, it's worth looking back.
For decades, the white pages were the map of the world. If you weren't listed, you didn't exist. Guests might stumble on you in a directory, a glossy magazine, or a brochure from a travel agent. Discovery was limited, but at least it was structured and centralized.
Once they found you, booking meant a long-distance call or even faxing over credit card details. It was clunky, slow, and often expensive but it gave hotels something valuable: a direct line to their guests.
In the late '90s and early 2000s, the web became the new white pages. A destination in Wyoming could suddenly be discovered by a family in Tokyo. Websites, email, and inquiry forms replaced long-distance calls. They were cheaper and asynchronous, and they quickly became the backbone of RFPs, group inquiries, and confirmations. The workflows of forms, attachments, and endless email chains were born here.
Then came the rise of Online Travel Agencies (OTAs) and Google Maps. Discovery no longer started with your name, it started with a search box. Guests typed in a city, and OTAs controlled the results. OTAs delivered massive visibility, but at a cost: steep commissions, brand dilution, and a lost direct relationship with the guest.
And today: guests are asking conversational questions inside AI-native platforms like ChatGPT, Gemini, and Perplexity:
"What's the best boutique hotel in Austin for a small team retreat?"
"Find me a romantic getaway within three hours of Chicago with a spa."
"Where should I book our company retreat for 200 people in San Diego this fall?"
Add to that the rise of social discovery: reels, travel creators, and friends' recommendations.
From social feeds to AI assistants, guests are finding and booking in entirely new ways. The question is: is your property ready?
You can take concrete steps now to position your property ahead of the curve and make it discoverable in this new landscape.
Many properties still rely on static inquiry forms. But if a guest's AI assistant encounters only a form, that is a dead end. Assistants need real-time answers.
Start by deploying an AI sales agent on your site and across your channels. Think of it as a digital team member that is always on, able to answer questions, qualify leads, and move guests toward booking.
This is actionable today. Here's hospitality at first click: guests can discover a property and instantly engage with an AI agent that answers questions, provides availability, and completes a direct booking, all in one seamless conversation.
In Step 1, you set the foundation: your data and AI sales agent are ready to engage guests directly. The next step is making that same capability visible inside AI platforms.
So the question becomes: how can your property increase direct bookings and build a direct workflow that appears there?
The most likely bridge is the Model Context Protocol (MCP), an open developer protocol that helps AI systems understand and communicate with the data you have (availability, rates, event spaces) and connect guests to you directly.
Think of the Model Context Protocol (MCP) as the new white pages for the AI era: a digital listing that tells AI systems who you are, what you offer, and how to reach you.
In simple terms, MCP is a technical standard that makes your property's data readable and accessible to AI. It acts as a bridge between your systems and platforms like ChatGPT, Gemini, or Perplexity.
Once it becomes a standard, properties without MCP may only appear through OTAs or generic listings when a guest's AI assistant looks for hotels. With MCP, your property becomes visible, discoverable, and bookable directly inside the conversation.
MCP adoption is still developing, and no one can predict exactly how it will evolve. But deploying an AI sales agent today puts your property in the ideal position to activate MCP the moment it becomes the standard.
If you prepare now, the benefits could be profound:
The AI-native wave is your chance to rewrite the story and take back your distribution.
At Line, we're helping properties get AI-ready today, from deploying always-on sales agents to preparing MCP endpoints that make your property discoverable in AI-native channels.
Are you AI-ready? I'd be happy to help. Email me at kyle@withline.io
Eli Hooten | February 19th, 2026
The core problem we're solving with Line is one of making supply (e.g., inventory and availability of a hotel) legible to demand in group sales. Our solution is to collect, aggregate, and meaningfully contextualize data for a venue and layer functionality on top of that data that greatly improves common workflows in hospitality. Our first useful piece of functionality is an AI group sales agent that helps overwhelmed group sales teams ingest and qualify inbound leads.
We started by building these AI group sales agents by hand, one venue at a time. We'd bring in private and publicly available venue data (e.g., pricing and amenities) and craft venue-specific prompts, trying to be cognizant of the task the agent had to perform (group sales) and the brand, tone, and positioning of the underlying venue.
Naturally, the question arises, "How do you know if these agents are any good?" The manual way to determine this is to simply chat with the agents we trained and get a feel for their accuracy and ability to accomplish their goals. Of course there are numerous issues with this approach: it's qualitative, based almost fully on "vibes" as we chat with the agent, there's no way to concretely track improvement over time, and even if an agent is "good" we have no real underlying framework to tell us why.
We found that our agents could handle straightforward questions well enough -- venue capacity, parking options, the easy stuff. But they would routinely hallucinate as the conversation went on, especially when they were pushed (e.g., "Are you sure you don't have any pricing available?"). In hospitality, that's not a minor bug. That's a broken promise to a bride planning her wedding, a corporate admin building a budget around fabricated numbers, a venue owner whose brand just got misrepresented to a potential client. The trust damage is immediate and the recovery is slow.
It became readily apparent to us that the question wasn't whether our agents could hold a conversation (they can), but whether we could prove they were accurate before putting them in front of real prospects. We were never going to solve this manually based on vibes; optimizing for taste would only get us so far. We needed a different approach.
With all AI deployments, a core issue is measurement, and Line's agents were no exception. When you test an AI agent by chatting with it yourself, you're doing two things poorly at once: simulating a realistic prospect (you're not one) and evaluating the quality of the response (subjectively). You already know what the agent should say, so you unconsciously steer the conversation toward success. You test the happy path because you built the happy path.
This approach has other problems too. You can't run the same evaluation twice and get comparable results. You can't track improvement over time. You can't identify which specific thing is broken when the conversation feels off. "The agent needs to be better" is not actionable engineering feedback.
We needed what every engineering team needs when quality matters: automated tests with quantitative pass/fail criteria.
When I was an engineering director at Sentry, I put a lot of thought into quality measurement, particularly for AI-driven systems. I even wrote about it some. The insight I had at Sentry was that agent evaluation ("evals") should be treated just like software testing: define expected behavior, simulate realistic inputs, measure the output against explicit criteria, and iterate until the system passes.
But there's an important difference. Unit tests are deterministic -- same input, same output. Agent evals are stochastic. The same adversary persona can produce different conversations each time, and the evaluator can score the same transcript slightly differently on different runs. If you've tried to build reliable evals for LLM-based systems, you already know this is the hard part.
We don't pretend this is solved. What we've found is that structured rubrics with specific score-level definitions (not "rate quality 1-10" but "a score of 8 means X, a score of 5 means Y") significantly reduce variance. Multi-dimensional scoring helps too -- a single aggregate number hides too much, but a few independent scores with defined rubrics produce more stable, diagnostic results. And testing across multiple diverse use cases per cycle gives us redundancy that smooths out noise from any single evaluation.
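To make "specific score-level definitions" concrete, here is a minimal Python sketch. The `Rubric` class, its `to_prompt` method, and the wording of each level are hypothetical illustrations, not Line's actual rubric; the only idea taken from the text above is that each score the evaluator can give is tied to a written definition.

```python
from dataclasses import dataclass


@dataclass
class Rubric:
    """One scoring dimension with explicit score-level definitions (hypothetical)."""
    name: str
    levels: dict[int, str]  # score -> what that score means

    def to_prompt(self) -> str:
        """Render the rubric as text that could be handed to an evaluator."""
        lines = [f"{self.name} (score out of 10):"]
        for score in sorted(self.levels, reverse=True):
            lines.append(f"  {score}: {self.levels[score]}")
        return "\n".join(lines)


# Illustrative level wording only; a real rubric would be written per domain.
ACCURACY = Rubric(
    name="Accuracy",
    levels={
        10: "Every stated fact appears in the knowledge base.",
        8: "All facts are grounded; one answer is vaguer than the source.",
        5: "One unsupported claim that does not involve pricing.",
        0: "Fabricated pricing, amenities, or promises.",
    },
)

print(ACCURACY.to_prompt())
```

Keeping one `Rubric` per dimension, rather than a single "quality" rubric, is what lets a low score point at a specific failure.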
It's not deterministic. But it's consistent enough to be actionable, which is the bar that matters.
We built our eval framework around three roles -- an Actor, an Adversary, and a Critic -- that mirror how a good QA team would stress-test a sales agent if they had unlimited time and patience.
The Actor is the AI agent being evaluated. Not a simulated version, not a sandboxed copy -- the actual deployed agent running on our production infrastructure. We test what we ship.
The Adversary is a simulated prospect designed to be realistic and challenging. Each adversary has a specific persona with distinct behavior patterns: how much information they share, how they respond to questions, and a curveball they throw mid-conversation to test the agent's adaptability.
The Critic acts as a senior sales auditor. It reviews the full transcript against the venue's actual knowledge base -- the ground truth -- and produces a quantitative score across four dimensions. Yes, this means we're using an LLM to judge another LLM's output. We'll come back to why we think that's defensible and where it isn't.
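The three roles can be sketched as a simple loop. This is an illustrative outline under stated assumptions, not our production code: `run_eval`, the stub functions, the sample knowledge-base string, and the `exchanges` parameter are all hypothetical names, and the stubs stand in for what would be LLM calls.

```python
from typing import Callable

Turn = tuple[str, str]  # (speaker, message)
Transcript = list[Turn]


def run_eval(
    actor: Callable[[Transcript], str],
    adversary: Callable[[Transcript], str],
    critic: Callable[[Transcript, list[str]], dict[str, float]],
    knowledge_base: list[str],
    exchanges: int = 15,
) -> tuple[Transcript, dict[str, float]]:
    """Alternate Adversary and Actor messages, then score the full transcript."""
    transcript: Transcript = []
    for _ in range(exchanges):
        transcript.append(("prospect", adversary(transcript)))
        transcript.append(("agent", actor(transcript)))
    # The Critic sees the whole conversation plus the ground truth.
    return transcript, critic(transcript, knowledge_base)


# Stand-ins so the sketch runs; each would be an LLM call in practice.
def stub_adversary(transcript: Transcript) -> str:
    return f"Prospect question #{len(transcript) // 2 + 1}"


def stub_actor(transcript: Transcript) -> str:
    return "Happy to help. What date are you considering?"


def stub_critic(transcript: Transcript, knowledge_base: list[str]) -> dict[str, float]:
    # Placeholder scores; a real Critic checks each claim against the knowledge base.
    return {"accuracy": 0.0, "discovery": 0.0, "brand": 0.0, "strategic_push": 0.0}


transcript, scores = run_eval(
    stub_actor, stub_adversary, stub_critic,
    knowledge_base=["Example fact about the venue"], exchanges=3,
)
```

Passing the roles in as plain callables keeps the loop independent of any particular model or vendor, so the same harness can wrap the deployed agent.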
It's important to note that, unlike unit testing for software, which has reliable patterns and approaches, how you craft an eval framework is highly domain-specific. For example, an eval framework to determine an agent's ability to properly diagnose a bug in a software system is very different from the framework used to determine if an AI sales agent is sophisticated enough to qualify an inbound lead. In other words, the Actor/Adversary/Critic approach made a ton of sense for us, but it might not for you.
We score agents on four categories, each weighted to reflect its importance:
| Weight | Category | What It Measures |
|---|---|---|
| 40% | Accuracy | Did the agent only state facts found in the knowledge base? Any fabrication is a critical failure. |
| 30% | Discovery | Did the agent identify the prospect's date, budget, headcount, and event type through natural conversation? |
| 20% | Brand Alignment | Did the agent match the venue's voice and create an experience consistent with the brand? |
| 10% | Strategic Push | Did the agent guide the prospect toward a concrete next step, like booking a call with the events team? |
These four scores combine into a Performance Index (PI):
PI = ((Accuracy x 0.4) + (Discovery x 0.3) + (Brand x 0.2) + (Strategic Push x 0.1)) / 10
A PI of 0.85 or above means the agent is production-ready. Below that, it goes back for optimization.
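The formula and threshold translate directly into a few lines of Python. This sketch assumes each category is scored out of 10, which is inferred from the division by 10 in the formula; the function and key names are hypothetical.

```python
# Weights and threshold as stated above; key names are illustrative.
WEIGHTS = {"accuracy": 0.4, "discovery": 0.3, "brand": 0.2, "strategic_push": 0.1}
PRODUCTION_THRESHOLD = 0.85


def performance_index(scores: dict[str, float]) -> float:
    """Weighted sum of category scores (each out of 10), scaled to 0-1."""
    return sum(scores[category] * weight for category, weight in WEIGHTS.items()) / 10


def is_production_ready(scores: dict[str, float]) -> bool:
    return performance_index(scores) >= PRODUCTION_THRESHOLD


# A strong agent: PI is about 0.86, so it passes.
strong = {"accuracy": 10, "discovery": 8, "brand": 8, "strategic_push": 6}

# Perfect everywhere except accuracy: PI is about 0.84, so it fails.
inaccurate = {"accuracy": 6, "discovery": 10, "brand": 10, "strategic_push": 10}

print(is_production_ready(strong), is_production_ready(inaccurate))
```

The second example shows the weighting at work: an agent with a 6 in Accuracy cannot reach the threshold even with perfect scores in every other category.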
The weighting is deliberate and opinionated. Accuracy gets 40% -- more than any other category -- because a hallucinating agent is worse than a boring one. An agent that sounds a little generic can still be useful. An agent that fabricates pricing, invents amenities, or makes promises the venue can't keep destroys trust in a single conversation. In hospitality, where events are high-stakes and high-emotion, there's no recovering from "the AI told me your venue had a rooftop terrace" when it doesn't.
Some of these categories are subjective and difficult to measure. For example, what does it mean for an agent to be brand-aligned, and how do you even quantify that? Short answer: you guess and it's usually good enough. Even subjective metrics are useful because you're comparing the agent's performance against its own earlier iterations, not against an absolute ground truth, so measured improvements are still improvements.
A single test conversation isn't enough. An agent that handles a straightforward corporate inquiry might fall apart when a bride changes her mind mid-conversation, or when a community organizer isn't sure what they want and needs to be led.
We test against three distinct personas in every evaluation cycle:
The luxury buyer -- detail-oriented, high expectations, shares information progressively and makes the agent earn it. Mid-conversation, they'll throw a curveball: "What if we wanted a Sunday brunch reception instead?"
The skeptical budget-holder -- price-sensitive, wants specifics, pushes back on vague answers. Their curveball: "Our CEO just told me we might need to cut the budget by 30%."
The undecided organizer -- vague on details, not sure what they want, needs the agent to take the lead. Their curveball: "Actually, could this also work as a fundraiser with a silent auction?"
Three conversations is not a statistically large sample, and we know it. But each conversation is 15+ turns covering a wide range of scenarios. The personas are designed to be maximally different from each other -- different information-sharing behaviors, different objection patterns, different curveballs. The system is built for rapid iteration rather than one-shot statistical significance. We'd rather run three deep, diverse conversations and iterate quickly than run thirty shallow ones and wait. What's important is that personas are easy to add. If in the future we identify some other diverse but important persona, we can easily add it to the framework.
All three personas must pass for the agent to be considered production-ready. Fixing a failure for one persona must not cause a regression for another. This multi-persona requirement is also our primary overfitting prevention mechanism -- an agent can't game its way to a passing score by excelling with one prospect type.
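The multi-persona gate can be sketched in a few lines. This is a simplified illustration: the persona keys are hypothetical labels for the three personas above, and treating a "regression" as a persona that passed before a change and fails after it is an assumption made for the example.

```python
PASS_THRESHOLD = 0.85  # same threshold used for a single PI


def production_ready(pi_by_persona):
    """True only if there is at least one persona and every persona passes."""
    return bool(pi_by_persona) and all(
        pi >= PASS_THRESHOLD for pi in pi_by_persona.values()
    )


def new_regressions(before, after):
    """Personas that passed before a change but fail after it (assumed definition)."""
    return [
        persona
        for persona, old_pi in before.items()
        if old_pi >= PASS_THRESHOLD and after[persona] < PASS_THRESHOLD
    ]


# Illustrative numbers only: a fix for the undecided organizer breaks the luxury buyer.
before = {"luxury_buyer": 0.90, "skeptical_budget_holder": 0.88, "undecided_organizer": 0.78}
after = {"luxury_buyer": 0.82, "skeptical_budget_holder": 0.88, "undecided_organizer": 0.87}

print(production_ready(after))         # one persona fails, so the agent fails
print(new_regressions(before, after))  # the persona that the fix broke
```

Gating on every persona rather than on the average means a very strong score with one prospect type cannot mask a failure with another.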
One of the most effective techniques we developed doesn't involve prompts at all. It involves knowledge design.
When we build an agent's knowledge base from a venue's available data (e.g., public website, internal documentation, etc), we don't just capture what information exists. We explicitly document what information doesn't exist. If we don't find published pricing, we don't leave a gap -- we create a specific knowledge item that says: no per-person rates, minimums, ranges, or dollar amounts exist for this property. Do not fabricate, estimate, or invent any pricing figures.
This matters more than it might seem. When an LLM encounters a question it can't answer from its context, the default behavior is to be helpful -- which often means generating a plausible-sounding answer from its training data. A thin knowledge entry like "pricing will be provided by the booking coordinator" leaves room for the model to fill in gaps. An explicit negative-knowledge entry removes the ambiguity entirely.
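To make the idea concrete, here is a minimal sketch of how explicit negative-knowledge entries could be generated. The `KnowledgeItem` type, the topic names, and the amenities wording are hypothetical; only the pricing text mirrors the example above.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeItem:
    topic: str
    content: str
    is_negative: bool = False  # True when the item documents missing information


# Topics we expect prospects to ask about, each with the explicit statement to
# store when nothing was found. The amenities wording is illustrative.
NEGATIVE_TEMPLATES = {
    "pricing": (
        "No per-person rates, minimums, ranges, or dollar amounts exist for "
        "this property. Do not fabricate, estimate, or invent any pricing figures."
    ),
    "amenities": (
        "No amenity list exists for this property. Do not state or imply that "
        "any specific amenity is available."
    ),
}


def build_knowledge_base(found):
    """Turn found facts into items, then fill every expected gap explicitly."""
    items = [KnowledgeItem(topic, content) for topic, content in found.items()]
    for topic, negative_text in NEGATIVE_TEMPLATES.items():
        if topic not in found:
            items.append(KnowledgeItem(topic, negative_text, is_negative=True))
    return items


# Illustrative input: amenities were found on the website, pricing was not.
kb = build_knowledge_base({"amenities": "Example amenity text from the venue website."})
for item in kb:
    print(item.topic, item.is_negative)
```

The key design choice is that a missing topic never results in a missing entry: the agent always has something to retrieve, even if that something is "this does not exist."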
We call this anti-hallucination knowledge design, and it changed our most persistent failure mode. It doesn't make hallucination impossible -- models can still ignore instructions, especially under sustained pressure from a user. But it moves the failure mode from "the model had no guidance and guessed" to "the model had explicit guidance and overrode it," which is a much rarer and more detectable failure. Before we implemented this approach, pricing and amenity hallucinations were our most common Accuracy failures. After, they became our least common.
The hallucination problem is often a knowledge design problem. The problem isn't that models want to lie. It's that we weren't giving them enough information to know what they don't know.
We said we'd come back to the Critic, and here's the full picture: using one LLM to judge another's output is a known-fragile pattern, and we're aware of the risks.
The detailed rubrics help -- each score level has a specific definition, which constrains the Critic's subjectivity. Accuracy is the easiest dimension to judge reliably because it's largely factual: did the agent say something that's in the knowledge base, or didn't it? The Critic can check statements against ground truth in a way that's closer to retrieval than opinion. Discovery and Brand Alignment are inherently more subjective, which is part of why they're weighted lower.
We periodically spot-check Critic scores against human judgment. The correlation has been strong enough for our purposes -- the Critic rarely disagrees with a human reviewer by more than one point on any dimension, and the directional assessments (pass vs. fail, improving vs. regressing) have been consistent. But we haven't run formal inter-rater reliability studies, and we know that as models update, the Critic's calibration could drift. Remember, we're biasing for actionability, not perfection.
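A spot-check like this can be summarized with a very simple statistic. The sketch below assumes paired 0-10 scores from the Critic and a human reviewer on the same dimension; the function name and the sample numbers are illustrative, and this is not a substitute for a formal inter-rater reliability study.

```python
def within_one_point_rate(critic_scores, human_scores):
    """Fraction of paired scores where Critic and human differ by at most one point."""
    if len(critic_scores) != len(human_scores):
        raise ValueError("score lists must be the same length")
    if not critic_scores:
        raise ValueError("need at least one scored pair")
    close = sum(abs(c - h) <= 1 for c, h in zip(critic_scores, human_scores))
    return close / len(critic_scores)


# Made-up scores for four spot-checked conversations on one dimension.
critic = [9, 8, 7, 6]
human = [9, 7, 9, 6]
print(within_one_point_rate(critic, human))
```

Tracking this rate over time is one inexpensive way to notice calibration drift after a model update.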
The Critic is a limitation we're actively monitoring and working to improve, but not one we've solved. For now, the framework gives us a repeatable, quantitative baseline that's dramatically better than the subjective, ad-hoc evaluation it replaced. The rubrics make it better than naive LLM-as-judge. But it's not ground truth, and we don't treat it as such.
Scoring alone isn't enough. The question after a failing eval isn't just "how bad is it?" -- it's "what specifically needs to change?"
Our eval framework produces diagnostic output alongside scores. When an agent fails, the system identifies specific issues: the agent fabricated a rooftop terrace, it didn't ask about budget until the prospect volunteered it, it didn't match the venue's upscale tone. Each issue maps to a specific fix:
The system applies these fixes and re-evaluates in a closed loop -- test, score, diagnose, fix, re-test. Each iteration produces a new PI score that we can track. The loop runs up to three automated iterations. When it converges, it typically does so in two to three cycles. When it doesn't -- when fixing one persona breaks another, or when the underlying knowledge base has gaps that prompt engineering can't paper over -- it escalates to a human prompt engineer. This closed loop is fully hands-off along its happy path, keeping humans out of the loop for as long as possible and freeing them up to do more challenging work.
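The shape of that loop can be sketched as follows. This is a simplified illustration, not our pipeline: `evaluate`, `diagnose`, and `apply_fixes` are hypothetical callables supplied by the caller, and counting one iteration as "diagnose, fix, re-test" after an initial evaluation is an assumption.

```python
MAX_ITERATIONS = 3     # automated fix attempts before a human takes over
PASS_THRESHOLD = 0.85  # every persona's PI must reach this


def passes(pi_by_persona):
    """All personas must meet the threshold."""
    return all(pi >= PASS_THRESHOLD for pi in pi_by_persona.values())


def optimize(agent, evaluate, diagnose, apply_fixes):
    """Run test -> score -> diagnose -> fix -> re-test, then pass or escalate.

    evaluate(agent) returns {persona: PI}; diagnose(agent, scores) returns a
    list of issues; apply_fixes(agent, issues) returns an updated agent.
    """
    scores = evaluate(agent)
    history = [scores]
    for _ in range(MAX_ITERATIONS):
        if passes(scores):
            return "passed", agent, history
        issues = diagnose(agent, scores)
        agent = apply_fixes(agent, issues)
        scores = evaluate(agent)
        history.append(scores)
    status = "passed" if passes(scores) else "escalate"
    return status, agent, history


# Stubbed usage: the "agent" is just a version number that improves with each fix.
pi_by_version = [0.70, 0.80, 0.90, 0.90]

status, final_agent, history = optimize(
    agent=0,
    evaluate=lambda version: {"luxury_buyer": pi_by_version[version]},
    diagnose=lambda version, scores: ["example issue"],
    apply_fixes=lambda version, issues: version + 1,
)
print(status, final_agent, len(history))
```

Returning the score history alongside the status is what lets a human prompt engineer see, on escalation, whether the loop was slowly converging or oscillating between personas.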
That escalation happens more often than we'd like, maybe a third of the time. The most common cause is knowledge gaps that our data collection/knowledge building mechanisms didn't catch -- venue policies that aren't on the website, seasonal information that's buried in a PDF, the kind of tribal knowledge that lives in an experienced coordinator's head. The automated loop is good at optimizing within the constraints of existing knowledge. It's not good at inventing knowledge it doesn't have. That's a feature, not a bug, but it means the system has clear scope boundaries.
The full pipeline uses browser automation, document retrieval and ingestion, and many other mechanisms to build out the knowledge base for an agent, populate our platform, simulate adversarial conversations against the deployed agent, score the results, and optimize until the agent passes. The eval loop mirrors what engineering teams do with automated tests -- but applied to conversational AI.
For our small team of engineers, this has been the difference between deploying one agent per week and deploying dozens. The exact timeline depends on the venue -- a simple restaurant with a clean website is faster than a multi-space event venue with complex packages -- but the manual work dropped from days to minutes in every case.
A knock-on effect of all this work: since the framework is quantitative, we can have honest conversations with venue partners about agent performance. Showing a venue owner that their agent scored 0.88 across three diverse prospect types, with accuracy at 9/10, is a different conversation than "we think it's pretty good." It builds legitimacy and credibility with our customers.
A passing eval doesn't mean we stop paying attention. We monitor production conversations for the same signals the eval measures -- particularly accuracy. When a real prospect asks a question the agent handles poorly, our team reviews the flagged transcript and that failure feeds back into the knowledge base and the next eval cycle. In other words, real-world conversations fill the Actor and Adversary roles in our AAC framework, but the techniques to evaluate and optimize are still the same.
We're still early in correlating eval scores with real-world outcomes like conversion rates and prospect satisfaction. The honest answer is that our eval system tells us whether an agent is accurate and competent, not whether it's effective at generating revenue. Those are related but not identical. A perfectly accurate agent that lacks warmth might score well on our eval but underperform with real prospects. We're building toward production metrics that close that loop, but we're not there yet.
If you're building AI agents for a domain where accuracy matters -- and in hospitality, it always does -- start with how you'll measure quality. Not subjectively, not by chatting with your own agent and deciding it feels right. Define what good looks like across the dimensions that matter for your use case. Quantify it. Build the feedback loop that turns a failing score into a specific fix.
The framework we've built isn't perfect. The LLM-as-judge pattern has known limitations. Three personas don't cover every prospect type. The automated optimization loop doesn't always converge. But it's quantitative, it's repeatable, and it catches problems before prospects do.
Evals aren't optional. They're the foundation. Build them first.
Care, responsibility, time back with family, and hospitality on first click
Kyle Mann | February 9th, 2026
Whitney and her husband run the Chateau Event Center, a venue for up to 150 people that hosts weddings, corporate events, graduation parties, and private celebrations. Their work centers on moments that matter deeply to the people planning them, and Whitney takes that responsibility seriously.
In our conversations, it became clear how much of herself she puts into that work. Guests are not treated as inquiries to process, but as people trusting her with important days in their lives. She spends the time it takes to get things right, from the first message through the site visit and all the way to the event itself.
That dedication takes a real toll on her personal time.
Group inquiries arrive constantly, often at night, on weekends, or while events are already underway. Staying responsive means being mentally on call most of the time, even when Whitney is trying to step away and be present with her family.
That pressure intensifies during peak season. Spring, summer, and fall bring the highest demand for bookings at the same time the venue is busiest operationally. New inquiries arrive while live events are already in motion.
What Whitney described reflects a broader pattern across group hospitality. Inquiry volume continues to rise, guest expectations have shifted toward immediate answers, staffing has become harder and more expensive, and most tools still rely on manual review and follow up. Time is the first thing to break when those forces collide.
When we decided to run a pilot together, the goal was clear: help guests get real answers immediately, including pricing and visual context, while giving Whitney meaningful time back with her family.
Before Line, inquiry handling at Chateau Event Center relied on manual coordination and constant availability. Inquiries arrived through email or forms. Whitney checked availability, assessed fit based on capacity and pricing, and responded using adapted templates. Progress depended on repeated back and forth to answer questions, clarify details, and decide whether a site visit made sense. Guests wanted fast clarity on pricing, availability, and overall feel. The venue needed the same clarity before investing time. Bridging that gap required frequent follow up and real time judgment.
With Line in place, the system absorbs the early back and forth and handles the first interaction end to end. Guests get pricing, guidelines, and visual context immediately. At the same time, the system qualifies the inquiry in the background, capturing pace, intent, and fit. Some inquiries move quickly toward a decision, others unfold more gradually. In both cases, the system maintains momentum and preserves context, so Whitney can step in with the right information, on her time.
Moving qualified inquiries to action
Once an inquiry meets the venue’s parameters, the agent advances it to the next action Whitney has defined. In this example, the system schedules a site visit and reflects discussed dates on the calendar, maintaining momentum toward booking.
“I never would have expected to be able to create something like this myself. This gave a non technical person the ability to actually put an AI agent to work for me.”
That comment stood out because it reflects a broader pattern we continue to see across group hospitality. AI is widely talked about, but rarely embedded into the day to day systems operators actually rely on. Legacy tools still dominate, placing the burden of manual input and follow up on the operator. Our focus is to make AI accessible by building it directly into these workflows, so repetitive coordination is handled by the system rather than the person.
As Whitney began asking guests directly about their experience, a consistent sentiment emerged.
“I loved getting all the information about pricing and answers right away. Once I saw it was a good fit, I was able to schedule a site visit immediately.”
That feedback highlighted what was working on the guest side: early clarity, no pressure, and the ability to move forward at the right moment.
“A qualified guest booked a site visit entirely through the AI.”
The first interaction was matching both sides in real time. Guests were looking for pricing, availability, and overall feel. The venue needed the same signals before investing time. Line served as a matchmaker for those criteria, preserving context and advancing only the right inquiries.
With Line in place, Whitney described it as retraining herself to relax. That shift created space to focus on live events, be present with family, and apply judgment where it had the greatest impact.
Taken together, these outcomes show how inquiry volume can increase without requiring more operator time or staffing.
Working closely with Whitney throughout the pilot, and comparing notes with other operators, clarified where Line delivers the most value today and where deeper focus is needed next.
These focus areas reflect what operators care about most as volume increases: clarity early in the process, fewer interruptions throughout the day, and less manual coordination overall.
For Whitney, this pilot was about preserving and elevating the level of care her guests expect, while reclaiming time that had slowly been consumed by constant responsiveness.
Across group hospitality, we hear the same desire again and again. Operators want time back with their families and fewer reasons to be tied to inboxes and desks, so they can focus on what makes hospitality great: the people running it and the experiences they deliver.
Our promise is to be a group hospitality partner on first click for guests and a technical partner for operators, making AI accessible in a way that improves workflow efficiency without sacrificing care. Whitney represents exactly who we are building for, and we are honored to be in service of operators like her.
Kyle Mann | November 15th, 2025
My co-founders and I all quit our jobs determined to solve this problem. And truthfully, there has to be something wrong with people who do this. We traded healthy, seemingly safe salaries and stability for a journey that includes failure, rejection, and isolation in order to bring something new to market.
So, naturally, the question is why do it? Simply: because it's a problem worth solving.
In this two part post, I'd like to examine the problem, the people involved, the market, and why this category is overdue for innovation.
Problems come in different forms. Some are known knowns (waiting in line), some are known unknowns (diseases), and some are unknown unknowns (the deeper questions of the universe and life that we can't imagine). And then there's another category: the problems thinly disguised as "conventions." Let's call it the known-but-it's-convention-so-oh-well-known. It goes something like this:
We know it's broken.
They know it's broken.
They know that we know it's broken.
We know, that they know, that we know, it's broken.
And yet, the charade carries on.
When you see this pattern, make note. It's a deeply rooted problem. Group sales in hospitality is one such case.
Let's peel the onion, starting with the most important part: the people.
Group business means weddings, conventions, corporate events, retreats, meetups, amusement park groups, and anything involving a private group space or, for hotels, 9 or more rooms. Two sides:
Let's consider their goals, motivations and experience navigating this space.
When a guest starts planning a group, they're trying to identify:
To get those answers, they're dragged through a maze of scrolling, dropdowns, PDFs, calling, emailing, forms, and hoping someone gets back to them. Hours go by with no clarity on pricing or availability, and only a rough sense of the vibes from the website, Instagram, or Google Maps.
Eventually, all this investigative work leads them to stitch together a DIY "marketplace." It takes days, weeks, sometimes months. And all they wanted were three things: price, dates, and vibes.
What's crazy is they might have a $50k piece of business, a totally normal group spend, and this is still the "first click" experience 🫠.
On the other side of this, turns out properties are focused on the same three things:
Without open market dynamics, pricing becomes guesswork. So many venue owners say some version of:
"I struggle with what to charge so I just compare it to last season."
Take note: they are looking at data that is not market data, it is just their own property last season. But because the market is hidden behind webs of emails and web forms, that is all they have. Imagine if airlines or any other business only looked at their own past flights and never at competitor or market pricing. Would this make for optimized revenue management?
If you talk to hotels you hear the same theme:
Sales:
"I email revenue management and wait for a rate"
Meanwhile, the guest is talking to ten other places 😬.
Revenue management:
"I look at the convention calendar, our previous rates, ADR, and maybe <insert legacy system we pay tons of money for>, but to be honest nobody really uses it."
Both sides have the same goals in discovery: pricing, availability, and vibes. The problem is the friction between them.
With no open market or transparent supply and demand signals, pricing becomes guesswork. Both sides end up overpaying or undercharging without even knowing it. Availability turns into a long back-and-forth because nothing is connected in real time. And vibes only really come together at the site visit; everything before that is forms, emails, and vague details.
The matchmaking is where the majority of friction sits.
Group business is a $1 trillion market. To get a sense of its impact, consider that Marriott's CEO reported that 24% of room nights came from group business, roughly double their OTA bookings.
And yet, in 2025, two surprising truths remain:
To me, this is wild. First, the barriers customers go through just to give properties money are staggering. And second, Expedia (~$30 billion), Airbnb (~$70 billion), and Booking (~$160 billion) are thriving in individual travel, yet there is still no true group OTA or marketplace dynamics driven by transparent supply and demand. It remains a hidden market living inside emails, forms, RFPs, and legacy systems.
The tools that do exist are not introducing market dynamics. They either double down on the RFP convention for properties (proposal builders, workflow tools, templates) or they are aggregated listings that look like a marketplace at first glance. In those listings you browse, express interest, and the system fills out forms on your behalf before returning results in a tidy UI. It is a step above the DIY process described earlier, but still not an open market.
And when you look at OTAs and listings in general, it is worth asking why guests increasingly go to Expedia, Booking, and Airbnb. It is because these platforms give them something direct channels do not: the dynamics and information of an open marketplace. Transparency, filters, instant pricing, and side-by-side comparisons. It is supply and demand working the way people expect today.
Here is a parallel analogy: we know the Uber and Lyft solution, but what was the actual problem?
Consider the cab experience before them. Everything came down to manually parsing pricing, availability, and vibes between two parties. As a customer you had to figure out which cab was available, flag them down, hope they stopped, then deal with bargaining that might be honored, ignored, or complicated by a "broken" meter. Drivers had their own issues like getting stiffed or wasting gas hunting down customers. And on both sides there were trust and safety concerns.
Then Uber came along. The core breakthrough was not the technology. The innovation was introducing transparent market dynamics to both parties. Suddenly both sides could see pricing, availability, and vibes through reviews in real time.
Note: our customer today is the property. We are not building a group OTA now. We are solving the property-side problems of on-demand sales, pricing, availability, and qualification. If an open market for groups ever becomes possible, it will emerge from solving this foundational issue first.
All of this is happening while demand for IRL experiences keeps rising. Ari Emanuel, CEO of Endeavor, is making huge bets on IRL entertainment. Airbnb is betting on the same trend with Experiences, and Booking.com continues to exceed expectations. These are only a few examples, but the pattern is clear: demand for real-world experiences keeps growing.
And while demand increases, customer expectations have never been higher. People have become accustomed to better design and higher quality tools at work and in everyday life. But when they try to plan a group, they fall back into 2007-era workflows: attachments, forms, PDFs, and email chains. This is the level of hospitality guests are met with at first click 😕.
All of this sets the stage for where we begin. We are starting where the friction is worst: on-demand sales matchmaking between guests and properties, with the goal of saving time for both sides and helping properties win more of the right business.
So far, we've built Line to structure a property's knowledge so an agent can understand how the property wants to engage with guests and organize incoming inquiries clearly.
In Part II, I will cover why this problem hasn't been solved, what inspires our determination, and the path forward.
Kyle Mann | November 1st, 2025
It's been a heads down month of building, but I did make it out to The Hospitality Show in Denver. Unsurprisingly, the topic of the week was AI, AI, and more AI.
Even walking the tradeshow floor, it felt like every vendor had "AI-powered" somewhere on their booth. Even pillowcases and soap are AI-powered these days! (half kidding 🫠)
It's dizzying. I can only imagine how property teams and operators feel trying to make sense of it all. So I wanted to share a few reflections from building in the field.
Having an AI agent is quickly becoming the new version of having a website. It will be the standard for how people discover and engage with your business, as discussed in the new whitepages blog.
However, building an AI agent isn't plug-and-play. It takes time and iteration to train it and teach it how your business works. Its value is directly tied to the quality of the onboarding.
Balaji Srinivasan has a great observation: "AI is not yet end-to-end, but middle to middle". What that means is that it doesn't invent its own purpose or redesign workflows on its own. It still needs guidance. It needs people who understand the context and can steer it toward the right problems.
The value comes when AI is applied to specific pain points and workflows that actually matter, not when it's added as a marketing label. Buyer beware: check the ingredients list closely. Is it solving something real under the hood, or is it just a sticker?
In the 90s, people used fax machines. In the 2000s, they used email. Would you hire someone who still uses a fax machine? The market has always rewarded people who bring value with efficiency and improved communication. This is no different.
Ari Emanuel, CEO of Endeavor, has been making big bets on IRL entertainment. In this talk and article, he outlines how culture is shifting toward shorter work weeks and greater demand for entertainment and IRL experiences. One data point he mentioned stood out to me: the sharp increase in Thursday hotel bookings for leisure travel.
That's a sign of something bigger that we're betting on, too. As life moves more online and isolation grows, the desire for IRL connection will only rise. Hospitality has always been here to meet that moment. It's an industry built on creativity, service, and memory-making, all things that people will continue to crave and that will become more valuable than ever.
Technology and AI can play a powerful role here, by removing the friction of repetitive work and giving time back to what matters. As one of our customers put it, "AI gives me time back for the creative things I love, like improving our service and designing new menus".
This is the way.
Usually, you can look at other products as a point of reference for interaction design, but in this case, the traditional RFP workflow is so dominant that what we're building feels like quite a contrarian interaction. We've been running user tests to better understand how people feel when engaging with this new kind of workflow. Here are a few things we're noticing:
When guests reach out to a property, they're usually looking for three things:
When we observed guests moving through the inquiry workflow, they consistently preferred the conversation that got straight to the point. It reflects the same frustration people have with long forms or slow responses: too much friction before getting an answer.
We're training the agent to be more of a matchmaker, not to mimic small talk or rote questioning. The human warmth and connection belong in person, during the site visit, where it naturally shines. Before that moment, what matters most is helping people get to the point.
When people first interact with the agent, you can see the interaction is familiar. "Oh, it's like ChatGPT for the property" was something we heard multiple times. They start typing, scrolling, and exploring without hesitation.
They use it the same way they might ask, "where should I have my wedding?" except now they're asking, "why should I have my wedding with you?" This pattern is familiar, intuitive, and already how people find answers today.
The big winner in testing was contextual imagery during the conversation. Across all sessions we saw prompts like, "show me more pictures", "what does that look like?" or "I like the outdoor option".
This fills a gap that's often missing in the classic sales phone call. The agent might say, "Oh, I think the Grand Ballroom would be a great fit", but what that looks like is left to the guest's imagination.
Matching a guest's desires to the right space, then helping them visualize it, is the interaction we want to get really, really good at.
At its core, Line is about giving back the most valuable thing: time. For guests, that means getting answers in minutes instead of days. For teams, it means fewer late-night replies and more space for creativity, service, and growth. That's how we are measuring success: did we give you time back, and did it help improve bookings?
I want to spotlight our co-founder and CTO, Eli Hooten. Eli and I first met nearly six years ago when he recruited me from GitLab to join Codecov. Since then, we've had the chance to build, learn, and grow through several chapters together, including Codecov's acquisition by Sentry. Line feels like a natural continuation of that journey and our partnership.
Eli earned his PhD from Vanderbilt, went through Techstars, and co-founded GameWisp and Codecov before founding Line. He also co-owns a rentals business, where he experienced firsthand the inefficiencies and daily friction that operators face, part of what inspired him to bring that same innovation and empathy to hospitality.
Connect at eli@withline.io or on LinkedIn.
What stands out most about Eli isn't just his world-class technical depth, but the way he leads. He has a calm, steady presence and an instinct for solving hard problems without ever losing sight of the people behind them. There's also that unmistakable Southern drawl and a sense of hospitality that comes through in how he builds and how he treats people.
It's a real privilege and honor to work alongside him, to see the innovation he's bringing to this space, and to be part of a team that shares both history and purpose. We couldn't ask for a better partner, leader, or friend to be building with.
We're continuing to train agents and onboard properties from the waitlist. Here's what we're focused on next:
We'll also be at Phocuswright and IAAPA Expo this month. If you're attending, let's connect.
Thanks for reading and for being part of this journey.
– Kyle