Hacker News

shchegrikovich
Use Prolog to improve LLM's reasoning (shchegrikovich.substack.com)

z5h 6 hours ago

I've come to appreciate, over the past 2 years of heavy Prolog use, that all coding should (eventually) be done in Prolog.

It's one of few languages that is simultaneously a standalone logical formalism, and a standalone representation of computation. (With caveats and exceptions, I know). So a Prolog program can stand in as a document of all facts, rules and relations that a person/organization understands/declares to be true. Even if AI writes code for us, we should expect to have it presented and manipulated as a logical formalism.

Now if someone cares to argue that some other language/compiler is better at generating more performant code on certain architectures, then that person can declare their arguments in a logical formalism (Prolog) and we can use Prolog to translate between language representations, compile, optimize, etc.

xelxebar 2 hours ago

> over the past 2 years of heavy Prolog use

Oh, cool. Mind if I pick your brain a bit?

Recently, there was an HN post[0] of a paper that makes a case against pure logic languages in favor of "functional logic" ones, which they exhibit with Curry[1]. The setup argument is that Prolog's spec mandates backtracking, which strongly limits it relative to full SLD resolution, causing fatally sharp edges in real-world usage.

Being fairly naive to the paradigm, my interpretation is that writing real Prolog programs involves carefully thinking about and controlling the resolution algorithm, which feels very different from straight knowledge declaration. I believe cut (!/0) is the go-to example. Is that your experience with Prolog in practice?

The real meat of the paper, however, is in its case that functional logic languages fully embed Prolog with almost 1-to-1 expressivity, while also providing more refined tools for externalizing knowledge about the intended search space of solutions.

Thoughts? How are you using Prolog, logic, or constraint programming? What languages and tooling in this arena do you reach for? What is some of your most hard-earned knowledge? Any lesser-known, but golden, websites, books, or materials you'd like to share?

Cheers!

[0]:https://news.ycombinator.com/item?id=41816545

[1]:https://www.curry-language.org/

z5h 2 minutes ago

> What is some of your most hard-earned knowledge?

1. If you find yourself straying too often from coding in relations, and instead coding in instructive steps, you're going to end up with problems.

2. Use DCGs to create a DSL for any high-level operations performed on data structures. The bi-directionality of Prolog's clauses means you can use this DSL to generate an audit trail of "commands executed" when Prolog solves a problem for you, but you can also take that audit trail, modify it, and execute those commands on other data.
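A minimal sketch of the idea (hypothetical stack-machine commands, invented for illustration, not code from a real project):

  % ops(State0, State) relates two data structures through a list of
  % commands; the DCG's hidden list is the audit trail.
  ops(S, S)      --> [].
  ops(S0, S)     --> [push(X)], ops([X|S0], S).
  ops([X|S0], S) --> [pop(X)],  ops(S0, S).

  % Forwards: solve, and collect the trail of "commands executed".
  % ?- phrase(ops([], [b,a]), Trail).
  % Trail = [push(a), push(b)] .
  %
  % Backwards: replay that trail (or a modified copy) on other data.
  % ?- phrase(ops([x], S), [push(a), push(b)]).
  % S = [b, a, x].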

z5h 35 minutes ago

So first, let's keep in mind that with no execution model, Prolog is still a "syntax" for Horn clauses. It's still a way to document knowledge. Add SLD resolution and we can compute. The paper (intentionally I presume) orders clauses of a simple predicate to illustrate (cause) a problem in Prolog.

But what I actually find is that the more time spent in Prolog, the more natural it is to express things in a way that is clear, logical and performant. As with any language/paradigm, there are a few gotchas to be experienced. But generally speaking, SLD resolution has never once been an obstacle to coding (in the past 2 years).

The general execution model of Prolog is pretty simple. The lack of functions actually makes meta-programming much clearer and simpler. A term is just data, unless it's stated as a goal. It's only a valid goal if you've already defined its meaning.
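For example (a tiny sketch):

  % A term is just data until you call it as a goal.
  ?- G = member(X, [a,b,c]),   % G is an ordinary term; nothing runs
     G =.. [Name|Args],        % so we can inspect it like any data
     call(G).                  % only now is it executed as a goal
  G = member(a, [a, b, c]), X = a, Name = member, Args = [a, [a, b, c]] .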

So I'd be concerned that Curry gives up the simplicity of Prolog's execution model, and ease of meta-programming. I struggle with the lack of types in Prolog, but also know I can (at least in theory) use Prolog to solve correctness problems in Prolog code.

I'm currently using SWI-Prolog. Performance is excellent, it has superb high-level concurrency primitives[0] (when was the last time you pegged all your cores solving a problem?), and many libraries. I might be one of the few people who has committed to using the integrated editor (PceEmacs) despite being a Vim person. PceEmacs is just too good at syntax highlighting and error detection.
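As a taste of those primitives, something like this (a toy sketch; fib/2 is just a stand-in for an expensive goal) will happily saturate every core:

  :- use_module(library(thread)).

  % Deliberately naive (expensive) Fibonacci.
  fib(0, 0).
  fib(1, 1).
  fib(N, F) :-
      N > 1,
      N1 is N - 1, N2 is N - 2,
      fib(N1, F1), fib(N2, F2),
      F is F1 + F2.

  % One worker per core, courtesy of concurrent_maplist/3:
  % ?- numlist(25, 34, Ns), concurrent_maplist(fib, Ns, Fs).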

At the same time, I'm a huge fan of Markus Triska. His YouTube[1] stuff is mind-expanding (watch all of it, even if you never write Prolog). He has an excellent book online[2]. I admire the way he explains and advances pure monotonic Prolog, and I appreciate his push for ISO conformance and his support for Prologs that do the same (SWI is not on that list).

If you want to learn Prolog, watch all of Markus Triska's videos, read his book, and learn what Prolog could be in a perfect world. Then download SWI-Prolog, and maybe break some rules while getting things done at a blazing speed. Eventually you'll gravitate to what makes sense for you.

The Art of Prolog is a classic "must have". Clause and Effect is a good "hit the ground running" (on page 70 you're into symbolic differentiation via term rewriting).
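For a flavour of that Clause and Effect material, the core of symbolic differentiation by term rewriting fits in a few clauses (a rough sketch, not the book's exact code; simplification omitted):

  % d(Expr, Var, Derivative)
  d(X, X, 1) :- !.
  d(C, _, 0) :- number(C), !.
  d(U+V, X, DU+DV) :- d(U, X, DU), d(V, X, DV).
  d(U*V, X, U*DV + V*DU) :- d(U, X, DU), d(V, X, DV).

  % ?- d(x*x + 3*x, x, D).
  % D = x*1+x*1+(3*1+x*0).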

0 https://www.swi-prolog.org/pldoc/man?section=thread

1 https://www.youtube.com/@ThePowerOfProlog

2 https://www.metalevel.at/prolog

larodi 3 hours ago

Been shouting here and many places for quite a while that CoT and all similar stuff eventually leads to logic programming. So happy I’m not crazy.

burntcaramel 3 hours ago

CoT = Chain-of-Thought

https://arxiv.org/abs/2201.11903

bbor 2 hours ago

You’re in good company — the most influential AI academic of all time, the kooky grandfather of AI who picked up right where (when!) Turing left off, the man hated by both camps yet somehow in charge of them, agrees with you. I’m talking about Marvin Minsky, of course. See: Logical vs. Analogical (Minsky, 1991) https://ojs.aaai.org/aimagazine/index.php/aimagazine/article...

  …the limitations of current machine intelligence largely stem from seeking unified theories or trying to repair the deficiencies of theoretically neat but conceptually impoverished ideological positions. 
  Our purely numeric connectionist networks are inherently deficient in abilities to reason well; our purely symbolic logical systems are inherently deficient in abilities to represent the all-important heuristic connections between things—the uncertain, approximate, and analogical links that we need for making new hypotheses. The versatility that we need can be found only in larger-scale architectures that can exploit and manage the advantages of several types of representations at the same time.
  Then, each can be used to overcome the deficiencies of the others. To accomplish this task, each formally neat type of knowledge representation or inference must be complemented with some scruffier kind of machinery that can embody the heuristic connections between the knowledge itself and what we hope to do with it.
He phrases it backwards here in comparison to what you’re talking about (probably because no one in their right mind would have predicted the feasibility of LLMs), but I think the parallel argument should be clear. Talking about “human reasoning” like Simon & Newell or LeCun & Hinton do in terms of one single paradigm is like talking about “human neurons”. There’s tons of different neuronal architectures at play in our brains, and only through the ad-hoc minimally-centralized combination of all of them do we find success.

Personally, I’m a big booster of the term Unified Artificial Intelligence (UAI) for this paradigm; isn’t it fetch? ;)

dleink an hour ago

Just throwing this out there for someone, "Scruffier Kind of Machinery" is a good name for a book, company or band.

eru 3 hours ago

Prolog was a neat exercise, but for practical programming you might want to combine both logical and functional programming. I think 'Curry' does that.

tomcam 4 hours ago

Is it your thought that for the average programmer Prolog is easier to read and maintain than say Go, C#, or Java?

z5h an hour ago

I'm surprised at how readable Prolog is.

I've played with and seriously used many languages in my career. My experience is that pure functional (done Elm-style) is productive and scales well to a larger team. Dynamic stuff like Ruby/JavaScript always has more bugs than you think, even with "full" test coverage. I'm not smart enough to make sense of my own Scheme metaprogramming when I revisit it months later. I have loads of (but dated) experience with Java, and it (and its peers) are relatively easy to read and maintain.

Prolog is very surprising, because it is homoiconic and immensely powerful in metaprogramming, BUT ... the declarative style and execution model reins in the complexity/readability. A term is just a term. Nothing happens when you create a term. If/when a term is a goal, then you match it with the head of an existing predicate (something you've already coded). So it never gets too messy. Now, the biggest problem with Prolog is that it's so flexible, you'll perpetually be realizing that you could have coded something much more cleanly. So you do that, have less code, it's nicer, etc. Doing this on a large team might not scale without effort.

nsxwolf 3 hours ago

I found it completely impenetrable in college for all but the simplest problems. I tried to re-read the textbook recently and didn’t do much better.

dmead 5 hours ago

It's taken ages for anything from functional programming to penetrate general use. Do you think uptake of logic stuff will be any faster?

johnnyjeans 4 hours ago

Prolog (and logic programming in general) is much older than you think. In fact, if we take modern functional programming to have been born with John Backus' Turing Award presentation[1], then it even predates it.

Many advancements to functional programming were implemented on top of Prolog! Erlang's early versions were built on top of a Prolog-derived language whose name escapes me. It's the source of Erlang's unfamiliar syntax for more unlearned programmers. It's very much like writing Prolog if you had return values and no cuts or complex terms.

As for penetrating general use, probably not without a major shift in the industry. But it's a very popular language just on the periphery, even to this day.

[1] - https://dl.acm.org/doi/10.1145/359576.359579

dmead 3 hours ago

Did you just answer me with ChatGPT?

hydrolox 2 hours ago

definitely not how ChatGPT writes

cmrdporcupine 5 hours ago

So why Prolog in particular and not another logic language like Mercury or Oz/Mozart etc?

jfengel 4 hours ago

"Prolog" is like Lisp, a wide array of superficially similar languages that actually are quite diverse.

Mind you, in that sense, Java and C# are more or less the same language, which has Prolog programmers nodding their heads and Java and C# developers screaming.

Avshalom 2 hours ago

Probably because neither of them has much in the way of library or community support.

infradig 4 hours ago

It's not meant to be taken literally; “it refers to any language of logic programming”. Apologies to Monty Python.

gorkempacaci 4 hours ago

The generated programs are only technically Prolog programs. They use CLP(FD), which makes them constraint programs. Plain Prolog programs are quite a bit trickier, with termination issues. I wouldn’t have nitpicked if it weren’t in the title.
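To illustrate the distinction with a small sketch (SWI-Prolog's library(clpfd) assumed): plain Prolog arithmetic is directional, while CLP(FD) is relational, and the solver's propagation sidesteps many of the termination traps:

  :- use_module(library(clpfd)).

  % Plain Prolog: is/2 needs its right-hand side instantiated.
  % ?- X is Y + 1.                 % instantiation error
  %
  % CLP(FD): the same relation runs in any direction.
  % ?- X #= Y + 1, Y in 1..3, label([Y]).
  % X = 2, Y = 1 ;
  % X = 3, Y = 2 ;
  % X = 4, Y = 3.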

Also, the experimental method has some flaws. Problems are hand-picked out of a random subset of the full set. Why not run the full set?

bbor an hour ago

Yeah I’m a huge proponent of this general philosophy, but after being introduced to prolog itself for a third of a semester back in undergrad I decided to stay far, far away. The vision never quite came through as clearly as it did for the other wacky languages, namely the functional family (Lisp and Haskell in my case). I believe you on the fundamental termination issues, but just basic phrasing seemed unnecessarily convoluted…

Since you seem like an expert: is there a better technology for logical/constraint programming? I loved predicate calculus in school so it seems like there should be something out there for me, but so far no dice. This seems kinda related to the widely-discussed paradigm of “Linear Programming”, but I’ve also failed to find much of interest there behind all the talk of “Management Theory” and detailed mathematical efficiency comparisons.

I guess Curry (from above) might be the go-to these days?

fsndz 5 hours ago

This is basically the LLM-modulo approach recommended by Prof. Subbarao Kambhampati. Interesting, but it mostly only works for problems that have some math/first-order logic puzzle at their heart. It will fail at improving performance on ARC-AGI, for example... Difficult to mimic reasoning by basic trial and error and then hoping for the best: https://www.lycee.ai/blog/why-sam-altman-is-wrong

pjmlp 7 hours ago

So we are back to the Japanese Fifth Generation plan from the 1980s. :)

metadat 5 hours ago

For the uninitiated (like me):

The Japanese Fifth Generation Project

https://www.sjsu.edu/faculty/watkins/5thgen.htm

linguae 7 hours ago

This time around we have all sorts of parallel processing capabilities in the form of GPUs. If I recall correctly, the Fifth Generation project envisioned highly parallel machines performing symbolic AI. From a hardware standpoint, those researchers were way ahead of their time.

nxobject 7 hours ago

And they had a self-sustaining video game industry too... if only someone had had the wild thought of implementing perceptrons and tensor arithmetic on the same hardware!

postepowanieadm 6 hours ago

and winter is coming.

tokinonagare 7 hours ago

Missing some LISP but yeah it's funny how old things are new again (same story with wasm, RISC archs, etc.)

nxobject 7 hours ago

Lots of GOFAI being implemented again – decision trees, goal searching and planning, agent-based strategies... just not symbolic representations, and that might be the key. I figure you might get an interesting contribution out of skimming old AI laboratory publications and seeing whether you could find a way of implementing it through a single LLM, multiple LLM agents, methods of training, etc.

nmadden 4 hours ago

Indeed, modern ML has been a validation of (some of) GOFAI: https://neilmadden.blog/2024/06/30/machine-learning-and-the-...

thelastparadise 7 hours ago

Watson did it too, a while back.

luke_galea 38 minutes ago

Super cool. I dig generating rules from within the LLM, but I'm not sure Prolog is the right choice in 2024.

I love Prolog and had the opportunity to use it "in anger" years ago to handle temporal logic in a scheduling app. Great experience, but I've found that more modern rules engines like Drools (anything using the Rete algorithm) are a MUCH better fit for most use cases these days.

If you are into this stuff, you might like the talk I gave on rules engines, prolog and how it led to erlang & elixir. https://www.youtube.com/watch?v=mDnntrhk-8g&t=1s

a1j9o94 7 hours ago

I tried an experiment with this using a Prolog interpreter with GPT-4 to try to answer complex logic questions. I found that it was really difficult because the model didn't seem to know Prolog well enough to write a description of any complexity.

It seems like you used an interpreter in the loop, which likely helps. I'd also be interested to see how o1 would do on a task like this, or whether it even makes sense to use something like Prolog if the models can backtrack during the "thinking" phase.

hendler 6 hours ago

I also wrote an LLM-to-Prolog interpreter for a hackathon, called "Logical". With a few hours' effort I'm sure it could be improved.

https://github.com/Hendler/logical

I think while LLMs may approach completeness here, it's good to have an interpretable system to audit/verify and reproduce results.

lukasb 6 hours ago

I bet one person could probably build a pretty good synthetic NL->Prolog dataset. ROI for paying that person would be high if you were building a foundation model (i.e., benefits beyond being able to output Prolog).

mcswell an hour ago

I'm not exactly sure what you're referring to, but Fernando Pereira's dissertation included a natural language (English) program for querying a "database". Both the NLP part and the database were written in Prolog. Mid-1980s, I think. Of course both parts were "toy" in the sense that they would need to be hugely expanded to be of real world use, but they did handle some interesting things (like quantifiers, graded adjectives etc.).

UniverseHacker 5 hours ago

I think this general idea is going to be the key to really making LLMs widely useful for solving real problems.

I’ve been playing with using GPT-4 together with the Wolfram Alpha plugin, and the combo of the two can reliably solve difficult quantitative problems that neither can individually by working together, much like a human using a calculator.

DeborahWrites 5 hours ago

You're telling me the seemingly arbitrary 6 weeks of Prolog on my comp sci course 11yrs ago is suddenly about to be relevant? I did not see this one coming . . .

fullstackwife 5 hours ago

Is there any need to look at this generated Prolog code?

nonamepcbrand1 6 hours ago

Is this why GitHub CodeQL and Copilot assistance work better for everyone? Basically, CodeQL uses a variant of Prolog (Datalog) to query source code and generate better results.

baq 7 hours ago

Patiently waiting for Z3-guided generation, but this is a welcome, if obvious, development. The results are a bit surprising and sound too optimistic, though.

de6u99er 5 hours ago

I always thought that Prolog is great for reasoning in the semantic web. It doesn't surprise me that LLM people stumble on it.

ianbicking 4 hours ago

I made a pipeline using Z3 (an SMT solver) to get LLMs to solve very specific puzzle problems: https://youtu.be/UjSf0rA1blc (and a presentation: https://youtu.be/TUAmfi8Ws1g)

Some thoughts:

1. Getting an LLM to model a problem accurately is a significant prompting exercise. Bridging casual logical statements and formal logic is difficult. E.g., "or" statements in English usually mean "xor" in logic.

2. Domains usually have their own language expectations. I was doing Zebra puzzles (https://en.wikipedia.org/wiki/Zebra_Puzzle) and they have a very specific pattern and language. I don't think it's fair to really call it intuitive or even entirely unambiguous, it's something you have to learn. The LLM has to learn it too. They have seen this kind of puzzle (and I think most can reproduce the original Zebra puzzle from memory), but they lack a really firm familiarity.

3. Arguably some of the familiarity is about contextualizing the problem, which is itself a prompting task. People don't naturally solve Zebra puzzles that we find organically, it's something we encounter in specific contexts (like a puzzle book) which is not so dissimilar from prompting.

4. Incidentally Claude Sonnet 3.5 has a substantial lead. And GPT o1 is not much better than GPT 4o. In some sense I think o1 is a kind of self-prompting, an attempt to create its own context; so if you already have a well-worded prompt with instructions then o1 isn't that good at improving performance over 4o.

5. A lot of the prompting is really intended to slow down the LLM, to keep it from jumping to conclusions or solving a task too quickly (and incorrectly). Which again is a case of the prompt doing what o1 tries to do generally.

6. I'm not sure what tasks call for this kind of logical reasoning. Not that I don't think they exist, I just don't know how to recognize them. Planning tasks? Highly formalized and artificially constructed problems don't seem all that interesting... and the whole point of adding an LLM to the process is to formalize the informal.

7. Perhaps it's hard to see because real-world problems seldom have conveniently exact solutions. But that's not a blocker... Prolog (and Z3) can take constraints as a form of elimination, providing lists of possible answers, and maybe just reducing the search space is enough to move forward on some kinds of problems (see the sketch after this list).

8. For instance when I give my pipeline really hard Zebra problems it usually doesn't succeed; one bug in one rule will kill the whole thing. Also I think the LLMs have a hard time keeping track of large problems; a context size problem, even though the problems don't approach their formal context limits. But I can imagine building the pipeline so it also tries to mark low-confidence rules. Given that I can imagine removing those rules, sampling the resulting (non-unique, sometimes incorrect) answers and using that to revisit and perhaps correct some of those rules.
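To make point 7 concrete, a tiny CLP(FD) sketch (assuming SWI-Prolog's library(clpfd); the specific constraints are invented for illustration):

  :- use_module(library(clpfd)).

  % Constraints act as elimination even without a unique answer:
  % ?- X in 1..10, X #> 3, X mod 2 #= 0.
  % X in 4..10, X mod 2 #= 0.      % search space pruned, no labeling yet
  %
  % Enumerate the surviving candidates only if you need them:
  % ?- X in 1..10, X #> 3, X mod 2 #= 0, label([X]).
  % X = 4 ; X = 6 ; X = 8 ; X = 10.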

Really I'd be most interested to hear thoughts on where this logic programming might actually be applied... artificial puzzles are an interesting exercise, but I can't really motivate myself to go too deep.

gbanfalvi 4 hours ago

> 6. I'm not sure what tasks call for this kind of logical reasoning

Basically any tasks that fulfill legal or business requirements? Both companies and governments are rushing to put LLMs into anything they can to avoid paying people. It’s vital to ascertain that, say, a benefits application is assessed properly and the LLM doesn’t hallucinate its way into an incorrect decision.

I’d question if we really need LLMs in many of the places we’re sticking them at all (or if it’ll even be cheaper), but that’s more a flawed human decision.

sgt101 7 hours ago

Building on this idea, people have grounded LLM-generated reasoning logic with perceptual information from other networks: https://web.stanford.edu/~joycj/projects/left_neurips_2023

arjun_khamkar 5 hours ago

Would creating a Prolog dataset be beneficial, so that future LLMs could be trained on it and would then be able to output Prolog code?

mise_en_place 5 hours ago

I really enjoyed tinkering with languages like Prolog and Coq. Interactive theorem proving with LLMs would be awesome to try out, if possible.

bytebach 4 hours ago

An application I am developing for a customer needed to read constraints around clinical trials and essentially build a query from them. Constraints involve prior treatments, biomarkers, type of disease (cancers) etc.

Using just an LLM did not produce reliable queries, despite trying many, many prompts, so being an old Prolog hacker I wondered if using it might impose more 'logic' on the LLM. So we precede the textual description of the constraints with the following prompt:

-------------

Now consider the following Prolog predicates:

biomarker(Name, Status) where Status will be one of the following integers -

Wildtype = 0
Mutated = 1
Methylated = 2
Unmethylated = 3
Amplified = 4
Deleted = 5
Positive = 6
Negative = 7

tumor(Name, Status) where Status will be one of the following integers if known, else left unbound -

Newly diagnosed = 1
Recurrence = 2
Metastasized = 3
Progression = 4

chemo(Name)

surgery(Name) Where Name may be an unbound variable

other_treatment(Name)

radiation(Name) Where Name may be an unbound variable

Assume you are given a predicate atMost(T, N) where T is a compound term and N is an integer. It will return true if the number of 'occurrences' of T is less than or equal to N, else it will fail.

Assume you are given a predicate atLeastOneOf(L) where L is a list of compound terms. It will succeed if at least one of the compound terms, when executed as a predicate, returns true.

Assume you are given a predicate age(Min, Max) which will return true if the patient's age is in between Min and Max.

Assume you have a predicate not(T) which returns true if predicate T evaluates false, and vice versa; i.e., rather than '\+ A' use not(A).

Do not implement the above helper functions.

VERY IMPORTANT: Use 'atLeastOneOf()' whenever you would otherwise use ';' to represent 'OR'. i.e. rather than 'A ; B' use atLeastOneOf([A, B]).

EXAMPLE INPUT: Patient must have recurrent GBM, methylated MGMT and wildtype EGFR. Patient must not have mutated KRAS.

EXAMPLE OUTPUT: tumor('gbm', 2), biomarker('MGMT', 2), biomarker('EGFR', 0), not(biomarker('KRAS', 1))

------------------

The Prolog predicates, when evaluated, generate the required underlying query (of course, the Prolog is itself a form of query).

Anyway - the upshot was a vast improvement in the accuracy of the generated query (I've yet to see a bad one). Somewhere in its bowels, being told to generate Prolog 'focused' the LLM. Perhaps LLMs are happier with declarative languages than with imperative ones (I know I am :) ).
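(In case it helps anyone reproduce this: the helper predicates the prompt assumes can be tiny. Hypothetical implementations for illustration, not the ones in our system:)

  % Succeed if at least one goal in the list succeeds.
  atLeastOneOf(Goals) :-
      member(G, Goals),
      call(G),
      !.

  % Succeed if goal T has at most N solutions.
  % (not/1 needs no definition; SWI-Prolog ships it as an alias of \+/1.)
  atMost(T, N) :-
      findall(x, call(T), Xs),
      length(Xs, Count),
      Count =< N.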

anthk 6 hours ago

Use Constraint Satisfaction Problem solvers. It comes up with Common Lisp with ease.

YeGoblynQueenne 5 hours ago

That's not going to work. Garbage in - Garbage out is success-set equivalent to Garbage in - Prolog out.

Garbage is garbage and failure to reason is failure to reason no matter the language. If your LLM can't translate your problem to a Prolog program that solves your problem- Prolog can't solve your problem.

Philpax 5 hours ago

This is a shallow critique that does not engage with the core idea. Specifying the problem is not the same as solving the problem.

YeGoblynQueenne 3 hours ago

I've programmed in Prolog for ~13 years and my PhD thesis is in machine learning of Prolog programs. How deep would you like me to go?

MrLeap 2 hours ago

I'm excited for the possibility of an escalation after reading this.

Philpax 3 hours ago

As deep as is required to actually make your argument!

mountainriver 2 hours ago

Agree, reasoning has to come from within the model. These are hacks that only work in specific use cases.
