The tech companies contend they help workers with their biggest pain points. Microsoft and Google claim their latest AI tools can automate the mundane, help people who struggle to get started on writing, and even aid with organization, proofreading, preparation and creation.
Of all working U.S. adults, 34 percent think AI will help and hurt them equally over the next 20 years, according to a survey released by Pew Research Center last year. Nearly as many, 31 percent, aren’t sure what to think, the survey shows.
So the Help Desk put these new AI tools to the test with common work tasks. Here’s how it went.
Emails

Ideally, AI should speed up catching up on email, right? Not always.
It may help you skim faster, start an email or elaborate on quick points you want to hit. But it also might make assumptions, get things wrong or require several attempts before offering the desired result.
Microsoft’s Copilot lets users choose from several tones and lengths before they start drafting. Users write a prompt describing what they want their email to say, then have the AI adjust the draft based on changes they want to see.
While the AI often included the elements we asked for, it also often added statements we didn’t ask for when we selected the short and casual options. For example, when we asked it to disclose that the email was written by Copilot, it sometimes added marketing comments, calling the tech “cool” or assuming the email was “interesting” or “fascinating.”
When we asked it to make the email less positive, instead of dialing down the enthusiasm, it made the email negative. And if we made too many changes, it lost sight of the original request.
“They hallucinate,” said Ethan Mollick, associate professor at the Wharton School of the University of Pennsylvania, who studies the effects of AI on work. “That’s what AI does — make up details.”
When we used a “direct” tone and short length, the AI made fewer false assumptions and delivered more of what we asked for. But a few times, it returned an error message suggesting the prompt contained content Copilot couldn’t work with.
If we depended entirely on the AI instead of making major manual edits to its suggestions, getting a fitting response often took several tries. Even then, one colleague replied to the awkwardness of an AI-generated email with a simple “LOL.”
“We called it Copilot for a reason,” said Colette Stallbaumer, general manager of Microsoft 365 and future of work marketing. “It’s not autopilot.”
Google’s Gemini has fewer options for drafting emails, allowing users to elaborate, formalize or shorten. However, it made fewer assumptions and often stuck solely to what was in the prompt. That said, it still sometimes sounded robotic.
Copilot can also summarize emails, which can help you quickly catch up on a long thread or cut through a wordy co-worker’s mini-novel, and it offers clickable citations. But it sometimes highlighted less relevant points, like reminding us of our own title listed in our email signature.
Documents and data

The AI seemed to do better when it was fed documents or data. But it still sometimes made things up, returned error messages or didn’t understand context.
We gave Copilot a document full of reporter’s notes, admittedly filled with shorthand, fragments and run-on sentences, and asked it to write a report. At first glance, the result was convincing: The AI appeared to have made sense of the messy notes. On closer inspection, though, it was unclear whether anything actually came from the document, as the conclusions were broad, overreaching and uncited.
“If you give it a document to work off, it can use that as a basis,” Mollick said. It may “hallucinate less but in more subtle ways that are harder to identify.”
When we asked it to continue a story we had started writing, providing a document filled with notes, it summarized what we had already written and produced some additional paragraphs. But it became clear that much of the new material did not come from the provided document.
“Fundamentally, they are speculative algorithms,” said Hatim Rahman, an assistant professor at Northwestern University’s Kellogg School of Management, who studies AI’s impact on work. “They don’t understand like humans do. They provide the statistically likely answer.”
Summaries were less problematic, and the clickable citations made it easy to confirm each point. Copilot was also helpful in editing documents, much like a beefed-up spell check, often catching acronyms that should be spelled out, punctuation errors and wordy phrasing.
With spreadsheets, the AI can be a little tricky, and you need to convert your data to a table format first. Copilot answered questions about simply formatted tables fairly accurately. But for larger spreadsheets with categories, subcategories or other complex breakdowns, we couldn’t get it to find relevant information or accurately identify trends and takeaways.
Meetings and presentations
Microsoft says Teams, its collaboration app with tools including chat and video meetings, is one of the top places people use Copilot. Our test showed the tool can be helpful for quick meeting notes, questions about specific details and even a few tips on making your meetings better. But as is typical of AI meeting tools, the transcript isn’t perfect.
First, users should know that their administrator has to enable transcription so Copilot can interact with the transcript during and after the meeting, something we initially missed. Then, during the meeting or afterward, users can ask Copilot questions about it. We asked for unanswered questions, action items, a meeting recap, specific details and ways we could have made the meeting more efficient. If you record the meeting, Copilot can also pull up video clips that correspond to specific answers.
The AI recalled several details, accurately listed action items and unanswered questions, and gave a recap with citations to the transcript. Some of its answers were muddled, as when it confused the name of a place with the location and produced something resembling word salad. It identified the tone of the meeting (friendly and casual, with jokes and banter) and censored curse words with asterisks. And it offered advice for more efficient meetings: For us, that meant creating a meeting agenda and reducing the “small talk and jokes” that took the conversation off topic.
Copilot can also help users make a PowerPoint presentation, complete with title pages and corresponding images, based on a document, all in a matter of seconds. But that doesn’t mean you should use the presentation as is.
A document’s organization and format seem to play a role in the result. In one instance, Copilot created an agenda out of random words and dates from the document. Other times, it made a slide with just a person’s name and responsibility. But it did better with documents that had clear formats (think an intro and subsections).
While Copilot’s image generation for slides was usually on topic, its interpretations were sometimes too literal. Google’s Gemini can also help create slides and generate images, though more often than not, our attempts to create images returned a message that said, “for now we’re showing limited results for people. Try something else.”
So, should you use AI for work?

AI can aid with idea generation, drafting from a blank page or quickly finding a specific item. It can also help you catch up on emails and meetings, and summarize long conversations or documents. Another nifty tip? Copilot can gather the latest chats, emails and documents you’ve worked on with your boss before your next meeting together.
But all results and content need careful inspection for accuracy, some tweaking or deep edits, and both tech companies advise users to verify everything generated by the AI. “I don’t want people to abdicate responsibility,” said Kristina Behr, vice president of product management for collaboration apps at Google Workspace. “This helps you do your job. It doesn’t do your job.”
And as is the case with AI, the more details and direction in the prompt, the better the output. So as you do each task, you may want to consider whether AI will save you time or actually create more work.
“The work it takes to generate outcomes like text and videos has decreased,” Rahman said. “But the work to verify has significantly increased.”