This article feels extremely imprecise. The syntax of the "language" changes from example to example, control structures like conditionals are expressed in English prose, some examples are solved by "do all the work for me" functions like the "toPdf()" example...
This whole thing feels like an elaborate LLM fantasy. Is there any real, usable language behind these examples, or is the author just role-playing with ChatGPT?
More to the point, is ChatGPT role-playing the author?
lol
I know ACM Queue is a non-peer-reviewed magazine for practitioners, but this still feels like too much of an advertisement, without any attempt whatsoever to discuss downsides or limitations. This really doesn't inspire confidence:

>> While this may seem like a whimsical example, it is not intrinsically easier or harder for an AI model compared to solving a real-world problem from a human perspective. The model processes both simple and complex problems using the same underlying mechanism. To lessen the cognitive load for the human reader, however, we will stick to simple targeted examples in this article.

For LLMs this is blatantly false; in fact, asking about "used textbooks" instead of "apples" is measurably more likely to result in an error! Maybe the (deterministic, Prolog-style) Universalis language mitigates this. But since Automind (an LLM, I think) is responsible for pre/post validation, I would naively expect it to sometimes output incorrect Universalis code and incorrectly claim an assertion holds when it does not.

Maybe I am making a mountain out of a molehill, but this bit about "lessen the cognitive load of the human reader" is kind of obnoxious. Show me how this handles a slightly nontrivial problem; don't assume I'm too stupid to understand it by trying to impress me with the happy path.
Prolog indeed works very well as a target for LLM generation, at least for input problems limited and similar enough in nature to given classes of templated in-context examples; so well, in fact, that the lack of a succinct, exhaustive text description of your problem becomes the bottleneck. At that point you can specify your problem in Prolog directly, considering Prolog was also invented to model natural-language parsing and not just to solve constraint/logic problems, or you could employ ILP techniques to learn or optimize Prolog solvers from existing problem solutions rather than from text descriptions. See [1].
[1]: https://quantumprolog.sgml.net
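To make the "specify your problem in Prolog directly" point concrete, here's a minimal toy sketch of my own (not code from the article): a definite clause grammar that parses a tiny English-like command vocabulary straight into goal terms, which is exactly the kind of text-to-logic bridge Prolog was originally designed for.

    % Toy DCG: parse a command phrase into a Prolog goal term.
    command(move(X, Y)) --> [move], block(X), [onto], block(Y).
    command(clear(X))   --> [clear], block(X).

    block(a) --> [a].
    block(b) --> [b].

    % ?- phrase(command(G), [move, a, onto, b]).
    % G = move(a, b).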
Right, but my point was that "given classes of templated in-context examples" is either
a) a game of roulette, where you hope the LLM provider has RLHFed something very close to your use case, or
b) an exercise in few-shotting with in-context examples, which requires more engineering (and is still less reliable) than simply doing it yourself.
In particular, it's not just "the lack of a succinct, exhaustive text description"; it's also a lack of English->Prolog "translations."
It seems like the LLM-Prolog community is well aware of all this (https://swi-prolog.discourse.group/t/llm-and-prolog-a-marria...) but I don't see anything in Universalis that solves the problem. Instead it's just magically invoking the LLM.
That link you're citing is old news, and it's also discussed in the Quantum Prolog article. Those observations were made with respect to translating problem descriptions into PDDL, a frame-based, LISPish specification language for AI competitions that encodes planning "domains" in a tight a priori taxonomical framework rather than in logic or any other Turing-complete language. As such, contrary to what's speculated in that link, the expectation is that those results do not carry over to the case of Prolog, which is much more expressive. I actually considered customizing an LLM using an English/Prolog corpus, which should be relatively straightforward given Prolog's NLP roots, but the in-context techniques turned out so impressive already (using 2025 SoTA open-weight models) that the bottleneck was indeed the lack of text descriptions for really challenging real-world problems, as mentioned in the article. The reason may lie in the fact that English-to-Prolog mapping examples and/or English documentation of Prolog code are sufficiently common in the latent space/in foundation training data.
I can assure you that Prolog prompting works well for at least the class of robotic planning problems (and similar discrete problems, plus potentially more advanced classes such as scheduling and financial/investment allocation planning, which require objective-function optimization), and you can easily check it out yourself with the prompting guide, or even online if you have a capable endpoint you're willing to enter [1].
[1]: https://quantumprolog.sgml.net/llm-demo/part2.html
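To give a flavor of what generated code for this problem class looks like, here is a generic cycle-checked state-space search in the blocks-world style; this is my own sketch rather than the demo's actual output, and it uses SWI's list library (subset/2, subtract/3) for brevity rather than sticking to strict ISO:

    % plan(State, Goal, Visited, Plan): search for a list of actions
    % transforming State into one that satisfies Goal.
    plan(State, Goal, _, []) :-
        subset(Goal, State).
    plan(State, Goal, Visited, [Action|Plan]) :-
        step(Action, State, Next),
        \+ member(Next, Visited),
        plan(Next, Goal, [Next|Visited], Plan).

    % step(move(X,From,To), S0, S): move block X from From onto To.
    step(move(X, From, To), S0, S) :-
        member(clear(X), S0),
        member(on(X, From), S0),
        member(clear(To), S0),
        X \== To,
        subtract(S0, [on(X, From), clear(To)], S1),
        sort([on(X, To), clear(From)|S1], S).   % sort/2 also deduplicates

    % ?- plan([clear(a), clear(b), clear(table), on(a,table), on(b,table)],
    %         [on(a,b)], [], Plan).
    % Plan = [move(a, table, b)].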
>> Since we're not doing original research, but rather intend to demonstrate a port of the Aleph ILP package to ISO Prolog running on Quantum Prolog, we cite the problem's definition in full from the original paper (ilp09):
Aleph? In 2025. That's just lazy, now. At the very least they should try Metagol or Popper, both with dozens of recent publications (and I'm not even promoting my own work).
You're not wrong, and alternatives were considered, but they were really not fit to be ported to ISO Prolog in bounded time: a complete lack of tests or even a basic reproducible demo, uncontrolled usage of non-ISO libraries and of features only available on the originally targeted Prolog implementation, and other issues typical of "academic codes."
The lack of unit tests is something I'm guilty of too, and you're very right about it. The community is dimly aware that its systems are more on the "academic prototype" side of things and less on the "enterprise software" side, but there's so little interest from industry that nobody is going to spend significant effort to change that. Kind of a catch-22, maybe.
How about ISO? Why was this a requirement, out of curiosity?
Quantum Prolog is pure ISO and was the primary target for porting here. The idea is really just putting effort behind libs that can be shared by other non-legacy Prologs, and thus getting more people involved.
Look, I'll be blunt. If ISO is keeping you from running any ILP system newer than Aleph, and you have a good reason to want to use an ILP system, then you should reconsider the benefits of ISO. There's no ISO for LLMs, for example, yet you'll have no trouble at all convincing people to get involved there.
That said, I think Vanilla should be easy to ISO-fy. It's just a meta-interpreter, really. It uses tabling, and I'm guessing that's not ISO (it's more like the Wild West), but it is not indispensable.
https://github.com/stassa/vanilla/blob/master/src/vanilla.pl
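For reference, the "vanilla" in question is, at its core, the textbook three-clause meta-interpreter, which is itself plain ISO Prolog; only the tabling machinery around it is Wild West. A generic version (not the repo's actual code, which generalizes this to second-order clauses):

    % Textbook "vanilla" meta-interpreter: solve/1 proves Goal against
    % the program's own clauses, fetched via the ISO built-in clause/2.
    solve(true) :- !.
    solve((A, B)) :- !, solve(A), solve(B).
    solve(Goal) :- clause(Goal, Body), solve(Body).

    % Note: clause/2 requires the interpreted predicates to be
    % declared dynamic in most implementations.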
I could have a look, but I have no idea what's in the ISO standard (and that's another bugbear: the ISO docs are not free. What?). I guess I could try to run it on your engine and see what the errors say, though.
P.S. Re: "academic codes" I got documentation, logging, and custom errors and that's more than you can say for 100% of academic prototypes plus ~80% of industry software also. I do have unit tests, even, just not many.
Briefly looked at the vanilla repo and I really like it.
The focus seems to be on the paper the software was published along with, as a mere attachment; I can tell as much from the doc consisting of an excerpt of that paper. There's no guide on how to run a simple demo. It doesn't state license conditions. It uses tabling by default, which is "Wild West" in your own words ;) and the canonical tabling implementation would be XSB Prolog anyway, yet the software seems to require SWI. It uses "modules"*, but even if that is justified by code-base size, for Prolog code that specifically works on theories stored in the primary Prolog database, such as ILP, this is just calling for trouble with predicate visibility and permissions, and it shows in idiosyncratic code where both call(X) and user:call(X) are used, as in the sketch below.
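To spell out the visibility issue with a minimal sketch (my own illustration, not code from the repo): clauses asserted at runtime go into the context module unless explicitly qualified, so a learned theory can end up split across two databases, and whether a later call finds it depends on which qualification was used.

    :- module(learner, [learn/1, learn_user/1]).

    % assertz/1 inside a module adds the clause to that module:
    learn(Clause) :- assertz(Clause).            % lands in learner
    learn_user(Clause) :- assertz(user:Clause).  % lands in user

    % A goal run later as call(G) from inside learner resolves against
    % learner's predicates, while user:call(G) resolves against user's;
    % mixing the two is where the visibility/permission errors creep in.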
What do you really expect from putting it on GitHub? It's nice that you've got unit tests, as you say, but there aren't any in the repo. I'm sure the code serves its original purpose well, as a proof-of-concept or prototype in the context of academic discourse, on the happy path, but due to the issues I mentioned, picking it up already involves a nontrivial time investment, and as it stands it isn't certain to save time compared to developing the same thing from scratch based on widely available academic literature. In contrast, Aleph has test data where the authors have gone out of their way to publish reproducible results (as stated in TFA), end-user documentation, coverage in the academic literature for e.g. troubleshooting, a version history with five (or at least two) widely used versions, a small community behind it, or even so much as a demonstration that more than a single person could make sense of it, and a perspective for long-term maintenance as an ISO port.
*) Not to speak of the "modules" spec having been withdrawn and considered bogus and unhelpful for a really long time now, nor of "modules" pushing Prolog as a general-purpose language when the article is very much about using basic idiomatic Prolog combinatorial search for practical planning problems in the context of code generation by LLMs.
With that in mind, I was personally more interested in ILP as a complementary technique to LLM code generation.
Re ISO: as I said, the focus here is solving practical problems in a commercial setting, with standardization as an alignment driver between newer Prolog developers/engines. You know, as opposed to people funded by public money sitting on de facto implementations for decades, implementations which are still not great targets for LLMs. Apart from SWI, this would in particular be the case for YAP Prolog, the system Aleph was originally developed for (making use of its heuristic term indexing), which however has been on such a long-term refactoring spree that it's difficult to compile on modern systems.
Note: Vanilla is a "learning and reasoning engine" so you can't do anything with it stand-alone. You have to develop a learning system on top of it. There are four example systems that come with Vanilla: Simpleton, (a new implementation of) Metagol, (a new implementation of) Louise, and Poker. You'll find them under <vanilla>/lib/, e.g. <vanilla>/lib/poker is the new system, Poker.
>> The focus seems to be on the paper the sw was published along with as mere attachment;
Do you mean this paper?
https://hmlr-lab.github.io/pdfs/Second_Order_SLD_ILP2024.pdf
That, and a more recent pre-print, use Vanilla, but they don't go into any serious detail on the implementation. Previous Meta-Interpretive Learning (MIL) systems did not separate the learning engine from the learning system, so you would not be able to implement Vanilla from scratch based on the literature; there is no real literature on it to speak of. I feel there is very little academic interest in scholarly articles about implementation details, at least in AI and ILP where I tend to publish, so I haven't really bothered. I might at some point submit something to a Logic Programming venue, or just write up a tech report/manual, or complete Vanilla's README.
To clarify, the documentation I meant is in the structured comments accompanying the source. These can be nicely displayed in a browser with SWI-Prolog's PlDoc library. You get that automatically if you start Vanilla by consulting the `load_project.pl` project load file, which also launches the SWI IDE, or with `?- doc_browser.` if you start Vanilla in "headless" mode with `?- [load_headless].` Or you can just read the comments as text, of course. I should really have put all that in my incomplete README file.
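For anyone who hasn't used PlDoc, the structured comments look roughly like this (a generic example, not a predicate from Vanilla):

    %!  ancestor(?A, ?B) is nondet.
    %
    %   True when A is an ancestor of B. PlDoc parses these %!
    %   comments and serves them as browsable HTML, e.g. after
    %   ?- doc_browser.
    ancestor(A, B) :- parent(A, B).
    ancestor(A, B) :- parent(A, C), ancestor(C, B).

    parent(anne, bob).
    parent(bob, carol).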
The current version of Vanilla definitely "requires" SWI in the sense that I developed it in SWI and I have no idea whether, or how, it will run in other Prologs. Probably not great, as usual. SWI has predicates to remove and rebuild tables on the fly, which XSB does not. That's convenient because it means you don't need to restart the Prolog session between learning runs. Still it's a long-term plan to get Vanilla to run on as many Prolog implementations as possible, so I really wouldn't expect you (or any other Prolog dev) to do anything to port it. That's my job.
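Concretely, what I rely on is along these lines (a minimal sketch assuming SWI's tabling API, abolish_all_tables/0 in particular):

    :- dynamic edge/2.
    :- table path/2.

    edge(a, b).
    edge(b, c).

    path(X, Y) :- edge(X, Y).
    path(X, Y) :- edge(X, Z), path(Z, Y).

    % After the program changes between learning runs, stale answer
    % tables can be dropped in-session; the next query on path/2
    % then re-evaluates from scratch:
    refresh :-
        assertz(edge(c, d)),
        abolish_all_tables.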
Prolog modules are a shitshow of a dumpster fire, no objection from me. Still, better with, than without. Although I do have to jump through hoops and rudely hack the SWI module implementation to get stuff done. Long story short, in all the learners that come with Vanilla, a dataset is a module and all dataset modules have the same module name: "experiment_file". That is perfectly safe as long as usage instructions are followed, which I have yet to write (because I only recently figured it out myself). A tutorial is a good idea.
>> What do you really expect from putting it on github?
Feedback, I suppose. Like I say, no, you wouldn't be able to implement Vanilla by reading the literature on MIL, which is languishing a couple of years behind the point where Vanilla was created. Aleph enjoys ~30 years of publications and engagement with the community, but so does SWI, and you're developing your own Prolog engine, so I shouldn't have to argue about why that's OK.
From my point of view, Aleph is the old ILP that didn't work as promised: no real ability to learn recursion, no predicate invention, and Inverse Entailment is incomplete [1]. Vanilla is the new ILP that ticks all the boxes: learning recursion, predicate invention, sound and complete inductive inference. And it's efficient even [2].
Unfortunately there is zero interest in it, so I'm taking my time developing it. Nobody's paying me to do it anyway.
>> It's nice that you've got unit tests as you're saying but there aren't any in the repo.
Oh. I thought I had them included. Thanks.
>> With that in mind, I was personally more interested in ILP as a complementary technique to LLM code generation.
I have no interest in that at all! But good luck I guess. I've certainly noticed some interest in combining Logic Programming with LLMs. I think this is in the belief that the logic in Logic Programming will somehow magickally percolate up to the LLM token generation. That won't happen of course. You'd need all the logic happening before the first token is generated, otherwise you're in a generate-and-filter regime that is juuust a little bit more controllable than an LLM, and more wasteful. Think of all the poor tokens you're throwing away!
Edit: Out of curiosity, did you have to edit Aleph's code to get it to run on your engine? I had to use Aleph on the job a while ago and I couldn't get it to work, not with YAP, not with SWI (I tried a version that was supposed to be specifically for SWI, but it didn't work). I only managed to get it to run from a version sent to me by a colleague, which I think had been tweaked to work with SWI (but wasn't the "official"-ish SWI port).
_____________
[1] Akihiro Yamamoto, Which hypotheses can be found with inverse entailment? (1997)
https://link.springer.com/chapter/10.1007/3540635149_58
[2] Without tabling.
Glad to see focus being put on keeping humans in the driver's seat, democratizing coding with the help of AI. The syntax is probably still too verbose to be easily accessible, but I like the overall approach.
> Universalis ensures that even those with minimal experience in programming can perform advanced data manipulations.
Is it a good thing to make this easier? We're drowning in garbage already.
Great to start off ... then we will end up reinventing/re-specifying functions for reusability, modules/packages for higher-level grouping, types/classes, state machines, control flow [with the nuances of edge cases and exit conditions]; then we will need error control and exceptions; sooner or later concurrency, parallelism, data structures, recursion [let's throw in monads for the Haskellians among us]; who knows, we may even end up with GOTOs peppered all over the English sentences [with global labels] and wake up to scoping and parameter-passing fights. We can have a whole lot of new fights if we need object-oriented programming, and figure out new design patterns with special "Token Factory Factories".

We took a few decades to figure out how to specify and evolve current code to solve a certain class of problems [nothing is perfect ... but it seems to work at scale, with trade-offs]. Shall watch this from a distance with popcorn.
Essentially, say that you have an input type with an `age` field, and you do something like `.map(it => { doubledAge: it.age * 2 })`. The inferred type of that intermediate operation is now a record type with a `doubledAge` field. Which is wild, since you essentially have TypeScript-like inference in a JVM language. Timestamp to talk:
https://www.youtube.com/watch?v=F5NaqGF9oT4&t=543s
So after Haskell, and helping with LINQ, this is what Erik Meijer has been focusing on.