Show HN: Data Formulator – interactive AI agents for data analysis (Microsoft)

data-formulator.ai

36 points by chenglong-hn a day ago

Hi everyone, we are excited to share our new release of Data Formulator. Starting from a dataset, you can work with AI agents through UI + natural language to explore data and create visualizations that surface new insights. Here's a demo video of the experience: https://github.com/microsoft/data-formulator/releases/tag/0.....

This builds on our release from a year ago (https://news.ycombinator.com/item?id=41907719). We spent the year exploring how to blend agent mode with interaction so that you can more easily "vibe" with your data while staying in control. We don't think the future of data analysis is just "an agent does everything for you from a high-level prompt": you should still be able to drive the open-ended exploration, but we also don't want you to have to do everything step by step. So we built this "interactive agent mode" for data analysis, with some UI innovations.

Our new demo features:

* We want you to be able to import (almost) any data easily and start exploring: whether it's a screenshot of a web table, an unnormalized Excel table, a table in a chunk of text, a CSV file, or a table in a database, you should be able to load it into the tool with a little AI assistance.

* We want you to choose easily between agent mode (more automation) and interactive mode (more fine-grained control) as you explore data. We designed an interface of "data threads": both your explorations and the agents' are organized as threads, so you can jump in at any point and decide how to follow up or revise, using UI + NL instructions for fine-grained control.

* The results should be easy to interpret. Data Formulator now presents the "concept" behind the code generated by AI agents, alongside the code, explanation, and data. Plus, you can easily compose a report based on your visualizations to share insights.

We are sharing the online demo at https://data-formulator.ai/ for you to try! If you want more involvement and customization, check out our source code at https://github.com/microsoft/data-formulator and let's build something together as a community!

jaxn a day ago

There are references to using connectors to connect to databases, but I can't find any documentation on how to actually do that.

xnx a day ago

Pretty cool. I like the local install option.

I almost skipped this as more AI wrapper shovelware. Would benefit from putting "Microsoft" in the title.

cadamsdotcom a day ago

Very cool - a lot of well thought out stuff in there.

One area for exploration is letting people turn natural language questions into non-LLM queries, UIs, & dashboards. In other words, let non-engineers codify their questions into queries they can review for correctness, and then take the LLM out of the picture.

Imagine if your CEO could ask natural language questions, build their own dashboard, review the generated queries for correctness, and be able to see deterministic results on any metric they care about - without having to ask an intern and without a multi-hour turnaround while it’s implemented.

Codification is kind of the best of both worlds and the underlying idea (explore with an LLM & then codify into something fast and deterministic when ready) is quite universal.
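
As a rough illustration of the "explore with an LLM, then codify" idea above, here is a minimal Python sketch. It assumes a query was drafted once (by an LLM or by hand), reviewed by a person, and stored under a name; after that, every run is plain SQL with no model involved. The `orders` table, the `SAVED_QUERIES` registry, and the helper names are all hypothetical and not part of Data Formulator.

```python
import sqlite3

# Hypothetical registry of reviewed queries. Each entry would be drafted once,
# checked by a human, and stored. Nothing below calls an LLM.
SAVED_QUERIES = {
    "revenue_by_month": """
        SELECT substr(order_date, 1, 7) AS month, SUM(amount) AS revenue
        FROM orders
        GROUP BY month
        ORDER BY month
    """,
}


def run_saved_query(conn, name):
    """Run a stored query by name and return all rows."""
    return conn.execute(SAVED_QUERIES[name]).fetchall()


def make_demo_db():
    """Build a tiny in-memory table so the sketch is self-contained."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2024-01-05", 100.0), ("2024-01-20", 50.0), ("2024-02-03", 75.0)],
    )
    return conn


conn = make_demo_db()
first = run_saved_query(conn, "revenue_by_month")
second = run_saved_query(conn, "revenue_by_month")
print(first)  # [('2024-01', 150.0), ('2024-02', 75.0)]
```

Because the stored SQL never changes between runs, `first` and `second` are identical for the same data, which is the deterministic behavior the comment is asking for.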

  • mritchie712 a day ago

    This was too perfect of a setup, had to record a video[0] showing how we do this.

    Yes, you definitely need a codification layer.

    I think a semantic layer is the best way to do that for analytics. Having an LLM write bespoke SQL to answer every question will fail fast.

    e.g. if you ask for "revenue by month" against a Snowflake warehouse with hundreds of tables, you are guaranteed to get different answers over multiple attempts.

    We[1] use an agent to build a semantic layer over time at Definite so you get consistent results.

    0 - https://www.loom.com/share/2da829dd440e489a8f7e3906c7083048

    1 - https://www.definite.app/
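
One way to picture the semantic-layer idea in the comment above is a small Python sketch in which each metric and dimension is defined exactly once and "revenue by month" always compiles to the same SQL. This is only an illustration under assumed names (`METRICS`, `DIMENSIONS`, `compile_query`, an `orders` table); it does not describe how Definite or any other product implements its semantic layer.

```python
# Hypothetical semantic layer: each metric and dimension is defined once,
# so every question that uses them resolves to the same SQL expression.
METRICS = {"revenue": "SUM(amount)"}
DIMENSIONS = {"month": "substr(order_date, 1, 7)"}
SOURCE_TABLE = "orders"


def compile_query(metric, dimension):
    """Build SQL from the shared definitions; unknown names raise KeyError."""
    metric_sql = METRICS[metric]
    dimension_sql = DIMENSIONS[dimension]
    return (
        f"SELECT {dimension_sql} AS {dimension}, {metric_sql} AS {metric} "
        f"FROM {SOURCE_TABLE} GROUP BY {dimension_sql} ORDER BY {dimension_sql}"
    )


sql = compile_query("revenue", "month")
print(sql)
```

In this setup an LLM would only need to map a question to the names `revenue` and `month`; the SQL itself comes from the definitions, so repeated requests cannot drift to different tables or formulas.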

    • chenglong-hn a day ago

      This is incredibly cool! A lot of the time the user query is too ambiguous to give consistent results across runs. A semantic layer is essential to reduce that ambiguity, whether it's built by AI or by engineers.

  • chenglong-hn a day ago

    That's something we are building! We hope to extend report generation into a dashboard builder. Instead of automatically composing an article out of the exploration, we could add more instructions and UI to let users specify how different components (vis, data, questions) should be put together, "codifying" them into a live document to share.

XYZ12334 a day ago

Hyped to use your product in our Mumbai SaaS startup sir!

  • chenglong-hn a day ago

    Feel free to submit requests on GitHub for any customization needs!