Wednesday, October 07, 2009

Understanding Rubel's Universal Differential Equation

[Many thanks to HKR for convincing me to get off my lazy ass and figure this out]

Karthik, dispassionately going about his chosen avocation of introducing mind-bogglingly awesome concepts via blog comments, wrote on the previous post a differential equation:

This equation is called Rubel's Universal Differential Equation [R], and has the amazing amazing amazing property that a solution 'y' to this equation can be made to approximate, to any desired level of accuracy, any (continuous) function on any interval of the real line, and with C-inf continuity! In other words, this is an 'Equation of Everything'! Take anything you possibly know that is continuous. The temperature in your room as a function of time. The position of the earth in space as a function of time. The velocity of a comet as a function of its distance to us. The (smoothed) variation of a stock price with time. The degree of contraction of a heart cell as a function of its location. A solution of this one equation can be used to approximate any of these to arbitrary accuracy, and with infinite smoothness!

I'll now attempt to reverse-engineer this equation in the manner I tried figuring out Tupper's math quine, and hope to find a plausible path by which Rubel could have come up with it.

Edit (Dec 26): This is a very light, almost sand-sieveishly non-rigorous treatment of the topic written for and by someone with no formal training in Analysis. If you want to skip the foreplay, here's Rubel's original paper that's more intended for a peer in the math community.

All DEs are not born equal

Consider the trivial [1] equation y'' = 0. I can rightfully claim that this equation is the 'master equation' satisfied by all lines y = Ax+B.

A little thought about this triviality leads us to a categorization: there are two kinds of differential equations. Consider the equation of SHM, y'' + y = 0. We'll call this a 'predictive' DE, because it predicts the behaviour of y, and tells us something about a physical system.

On the other hand, consider the equation y''' = 0. Every parabola satisfies this equation, but we can't use it to predict anything. It's more an expression of a property of all parabolas, and so we'll call it a 'shrink' DE. It's like going to a psychiatrist to know more about yourself, but all you get is some general info that isn't predictive in any way :P

Rubel's UDE is something like a 'shrink' DE [R2], not a predictive DE. This may come as somewhat of an anti-climax, because when we see DEs as engineers, we always assume that they are predictive. After all, every single DE an engineer uses is predictive. But note that when it was presented, it was not claimed that every continuous function satisfies the UDE; only that solutions to the UDE can be made to approach any continuous function with arbitrary accuracy. The distinction is the difference between saying "I have a hole that perfectly fits every shaft in the world" and saying "I have a tool that can make a hole that perfectly fits every shaft in the world."

Even if it's not predictive, there is a sufficient number of curiosities to motivate further analyzing this. For one, we are well familiar with the idea of interpolating a function piece-wise with straight lines.

We can achieve any desired accuracy by increasing the number of pieces we use, but there is one stark fact: the function that we build by stitching straight lines will have sharp corners - a discontinuity in some derivative at each of the stitch points. Even though increasing the number of pieces can reduce the least-squares error between our stitched curve and the target function (reducing "the error in the L2 norm"), the resulting stitched function will be bumpy. What precisely do I mean by bumpy?

This might seem like a childish construct from Calculus 101, but please humour me. Let's create a hypothetical car. This car has the usual odometer and speedometer, but it also has an accelerometer, a jerkometer, ... an infinite number of meters to measure every derivative of the car's distance function. By 'bumpy' in the last para, I mean that when I drive my car like the interpolated function, some of my meters jump.
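To make 'bumpy' concrete, here's a small numerical sketch (in Python; sin and the knot counts are arbitrary choices of mine, just for illustration). The error in the fit shrinks as we add pieces, but at every stitch point there is a jump in the slope - the first of our infinite stack of meters jumps, no matter how many pieces we use.

```python
import math

def piecewise_linear(f, a, b, n):
    """Interpolate f on [a, b] with n straight-line pieces."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    def interp(x):
        i = min(int((x - a) / (b - a) * n), n - 1)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return (1 - t) * ys[i] + t * ys[i + 1]
    return interp, xs, ys

errs, jumps = {}, {}
for n in (4, 8, 16):
    g, xs, ys = piecewise_linear(math.sin, 0.0, math.pi, n)
    # max error over a fine sampling: shrinks roughly 4x each time n doubles
    errs[n] = max(abs(g(t) - math.sin(t)) for t in (math.pi * k / 1000 for k in range(1001)))
    # slope jump at the middle stitch point: a corner that never disappears
    i = n // 2
    left = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
    right = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    jumps[n] = right - left
    print(n, errs[n], jumps[n])
```

The error column shrinks steadily, but the slope-jump column stays stubbornly nonzero: the interpolant is C0 but never C1.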

Altius Fortius Happius?

We can attempt to solve some of the problems of this bumpiness by using higher order interpolating functions that ensure "stronger" continuity. For example, when we use lines, we can claim at best C0 continuity. Using parabolic sections as the interpolating functions, we can, in principle, claim up to C1 continuity. (Thanks Shreevatsa) Following this path, we get into spline interpolation, but there's no way we can ensure C-inf continuity.

As an aside, the entirety of the Finite Element Method revolves around how to do this interpolation. The central property of the FEM, its raison d'être, is that it guarantees the best approximation to a function using a given set of interpolating functions, 'best' as measured by the L2 norm. However, higher order continuity requirements across segments are very hard to implement. In fact, even C1 continuity is very hard in three dimensions, and C2 continuity is exceptionally rare. It's a reasonably hot topic of research, and part of the framework Pota is working on attempts to grant higher degrees of continuity.

To summarize our conundrum: with what we already know of interpolation, we can have arbitrarily accurate interpolations when measured in the L2 norm (i.e., with C0 continuity), but higher order continuity, especially across segments, is extremely hard [2]. Rubel's equation, on the other hand, claims that it can achieve C-inf continuity. How?

A detour - the Curious Incident of exp(-1/x) during the Taylor expansion-time

Consider this innocuous-looking function: f(x) = exp(-1/x) for x > 0, and f(x) = 0 otherwise.

It looks like this when plotted:

If you have read about statistical mechanics, you'll notice that this is a very common form. The temperature dependence of a huge number of quantities varies as exp(-1/x).

What's special about this? It is trivially provable [R1] that this function is 'smooth'. All its derivatives exist at all points on the real line. If you drove our car along it, none of your meters would ever register a sudden jump.

But there's a deep evil that lurks here that strikes terror into the heart of all real analysts (umm.. not really, but I couldn't resist :P ): Look at the function and its derivatives at x = 0. All of them are identically 0. That means that if you try to write a Taylor expansion about x = 0, it will predict that f(h) = 0 for all small h. Holy horror! This means that the Taylor series expansion doesn't sense the rise of the function just to the right of the origin! That means the Taylor expansion can never approximate the function if we start from the origin! The function is not analytic!

This is a phenomenal event! The function is perfectly flat at the origin - perfectly flat, with every derivative zero. And yet, the function bootstraps itself into rising! This is incredible, because you never see perfect flatness in any smooth function! There will always be some non-zero derivative that will tip you off on what the function will do! But here, all derivatives are 0! Flat! Perfectly Flat!
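You can poke at this numerically - a quick sketch (the step size h and the sample points are arbitrary). Every one-sided difference quotient f(h)/h^n at the origin is astronomically small, which is the numerical shadow of every derivative being zero; and yet a short distance away the function is plainly nonzero.

```python
import math

def f(x):
    """exp(-1/x) for x > 0, and 0 otherwise: smooth everywhere, flat at 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

# Each quotient f(h)/h**n -> 0 as h -> 0, for every n: all derivatives
# at the origin vanish, so the Taylor series about 0 is identically zero.
h = 0.01
for n in range(1, 6):
    print(n, f(h) / h**n)

# ... and yet the function bootstraps itself into rising:
print(f(0.5))   # clearly nonzero, where Taylor about 0 predicts 0
```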

The Insight [3]

But wait - if it is perfectly flat, then if we stitch together the perfectly flat ends of two such functions, we can have an infinite degree of continuity! How would it be if we used this as our interpolating function?

That's it. That's the central idea of UDEs. Stitching together functions in regions where they are perfectly flat. All we need to do is find a differential equation that describes one 'piece'. The stitch region has all derivatives existing and equal zero, and so will trivially satisfy our equation. That means our thread, made of many pieces and stitches, is still a solution to the DE. Which means we can claim that we can make any thread that fits infinitely accurately with any given function. We have made our tool!

Some reflection, and dark forebodings

There are some minor things we need to take care of. In the function we described above, we have only one perfectly flat end. It's cumbersome having to flip the function around every time we need to stitch, and so a little bit of playing around gives us this function that has two perfectly flat ends:

(This is called the bump function.)
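In symbols, the bump function is exp(-1/(1-x^2)) on (-1, 1) and zero outside - here's a quick sketch of its two defining traits, the hump in the middle and the perfect flatness at both ends:

```python
import math

def bump(x):
    """The bump function: exp(-1/(1-x^2)) on (-1, 1), zero outside.
    Smooth everywhere, and perfectly flat (all derivatives 0) at x = +/-1."""
    return math.exp(-1.0 / (1.0 - x * x)) if -1.0 < x < 1.0 else 0.0

print(bump(0.0))             # e^-1 at the peak
print(bump(0.999))           # vanishingly small as we approach an end
print(bump(1.0), bump(2.0))  # identically zero at and beyond both ends
```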

Uh oh. Wait a minute. We're screwed. Look at that function again. Our fundamental 'piece'. It doesn't do anything! It is zero, then rises, and is back to zero again! Forget about anything else, how can you stitch together pieces like this to form a simple parabola, say y = x^2 ? The piece as a whole doesn't rise or fall!
Note that even if we rotated this thread, we couldn't change the shape of the thread. Our thread of stitched functions can never deform!

Our knight in ∫hining armour

The solution to this problem is (imho) the second most brilliant insight in the paper. What is it that we love about the bump function? The fact that it's perfectly flat at the ends. What don't we like? That the function ends where it begins, and is on the whole flat.

Wait a minute - the function is perfectly flat at the ends - and perfect flatness survives integration! Eureka! We can integrate the function, and this makes it an increasing function on [-1,1], while still preserving perfect flatness at the ends! Here's what the integral of the function looks like:

Rubel calls these integrated functions 'S-modules', because of their shape. That's it, we've found our perfect interpolating piece: perfectly flat at the ends, and increasing monotonically over the interval [-1,1].
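A numerical sketch of the S-module (just integrating the bump with a midpoint rule; the step count is an arbitrary choice of mine). It rises monotonically across [-1, 1] and is perfectly flat outside - and stitching a shifted copy onto the flat top continues the climb with a perfectly smooth joint, which is the whole trick:

```python
import math

def bump(x):
    """exp(-1/(1-x^2)) on (-1, 1), zero outside."""
    return math.exp(-1.0 / (1.0 - x * x)) if -1.0 < x < 1.0 else 0.0

def s_module(x, steps=2000):
    """Integral of the bump from -1 to x (midpoint rule): an 'S-module'.
    Monotonically increasing on [-1, 1], perfectly flat outside it."""
    if x <= -1.0:
        return 0.0
    x = min(x, 1.0)
    h = (x + 1.0) / steps
    return sum(bump(-1.0 + (i + 0.5) * h) for i in range(steps)) * h

total = s_module(1.0)
for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, s_module(x) / total)   # rises monotonically from 0 to 1

# Stitching: a second S-module shifted right continues the climb; the
# joint at x = 1 is smooth because both pieces are perfectly flat there.
staircase = lambda x: s_module(x) + s_module(x - 2.0)
print(staircase(1.0) / total, staircase(3.0) / total)
```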

Observe that in Rubel's UDE, the lowest-order derivative of y that appears is y'; there's no term with a raw y. This is what originally tipped me off into suspecting that there was an integration involved.

Cleaning up

Now the task is simply to write down a differential equation for the S-module:
(t is the abscissa) and write a generalized version of it with constants, such that it can be moved (translated) in the two directions and scaled in width and height. Some algebraic trickery to eliminate the constants and the abscissa follows. The exact thing to be done, as Rubel states, is:

and eliminate A, alpha, beta, B and t from these 4 equations.
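Since the original equation images are missing here, this step can be sketched as follows (my reconstruction, assuming the standard bump function with the integrand taken as zero outside (-1, 1); the exact algebra is in Rubel's paper):

```latex
% The S-module, translated and scaled by the four constants A, B, \alpha, \beta:
y(t) = B + A \int_{-1}^{\alpha t + \beta} e^{-1/(1-u^2)} \, du
% One differentiation removes B:
y'(t) = A \alpha \, e^{-1/(1 - (\alpha t + \beta)^2)}
% Dividing out A (via the logarithmic derivative of y') removes it too:
\frac{y''}{y'} = -\frac{2\alpha\,(\alpha t + \beta)}{\bigl(1 - (\alpha t + \beta)^2\bigr)^2}
% Two further differentiations yield two more rational relations in
% (\alpha t + \beta), enough to eliminate \alpha, \beta and t and leave a
% single polynomial identity in y', y'', y''', y'''' --- the UDE.
```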

Once that is done, we have a shiny UDE
all set for world domination. The exact details of the algebra can be found in Rubel's original paper, and I hope after reading this you will also be able to follow his very pithy style. The only catch is that our function will not be analytic at (i.e. cannot be expanded in a Taylor series around) the stitches. But so far as I know, there's no way to detect this sitting in our hypothetical car, so all's well.

Some Puzzles

We aren't fully done with Rubel's paper yet; he claims that his UDE is an analog to universal Turing machines. There is also much talk on Hilbert's 13th problem, and a classic proof by forthcoming publication [4]. I don't quite see the connects yet, and will update this if I figure it out.

Other UDEs

I skimmed through some of the others in the field, and have a rough idea of how they work (ooh don't you just love the sound of 'skim' 'rough idea' 'general sense' 'intuitive feel' ?). Of all of these, Rubel's is the most straightforward; but that may be because I spent the max time on him.

Duffin's paper in the PNAS [R] is a poor man's UDE: it has a parameter 'm', and the solutions are guaranteed to be Cm continuous. It's nice that he was able to condense a simple polynomial interpolation into one equation.

Briggs's paper [R] uses another trick to generate 'perfect stitches' - he uses Jacobi elliptic functions that are periodic over a chosen interval, and therefore have all their derivatives matching at the two ends of each period. It's a nice extension of Rubel's 'perfectly flat' insight.

There's another paper by Boshernitzan [R] that's very interesting: he guarantees not only C-inf smoothness, but analyticity everywhere. The price we pay is in the domain: our function can no longer spread over arbitrary domains; it has to live on a compact region of the real line. Again, I don't understand precisely how he does this; that's for another day. Elsner's E-functions [R] are worth a look, too.


There are two very nice insights in this paper - a general one of using an interpolating function with perfectly flat ends to ensure C-inf continuity, and a specific one of deciding to use the integral of a bump function. Apart from that, there isn't any more information content in this UDE than in a least-squares fit equation.

My first (wrong) instinct was to think that the 'n' in Briggs's equation held information about the function. I thought that varying n would make the differential equation plot out different graphs. This is very similar to Tupper's idea in his math quine (the constant 'a' in my explanation). Raghu then pointed out that Rubel's original equation had no parameters. But if you think about it, we can indeed make a predictive UDE with one parameter that cycles through all functions. This essentially requires a constructive proof (as opposed to a simple existence proof) of a bijection (one-to-one-ness) between the set of reals (our parameter) and the set of infinite collections of reals (the values of the function at 'every' point). Let's see if I ever get to that level of joblessness in life :-)

I'm still amazed and can't quite fully digest the fact that a perfectly smooth function can be non-analytic. I mean, consider the exp(-1/x) function. If you stand at the origin, you have no clue what happens as you step forward! All the meters in your car will read zero at the origin, and yet, somehow the function bootstraps itself into rising! This never happens with analytic functions! If any aspect of a function has to change (y, y', y'', y''', etc.), then it can always be seen as being caused by a change in a higher derivative. For example, if you're driving our car near the apex of a downward-facing parabola centered at the origin, you'll see that y = 0 and y' (your speed) = 0. But your acceleration is not zero! So you can always predict what happens at the next moment using the info the car displays at this moment. We have so far thought that the only time you can't predict is when there are discontinuities - i.e., a 'god' tweaking the function definition. But here, there is no tweaking, and yet you can't predict! I'd instinctively think that infinite smoothness means that every point contains info that allows me to traverse every other point in the function. Apparently that is not so. This is a ripe area for some thought experiments, and I'll write more on this if I discover more.

My aim in writing this post is to provide a plausible way in which Rubel could have come up with his UDE. Almost the only way I can understand something is to mentally re-build it from scratch, so I hope this will be of help in clearing up some of the ideas. Though unavoidable when introducing a new topic, every 'Consider' is a non-sequitur. I've tried to minimize those, and the only place where a real detour is required is the idea that a function like exp(-1/x) exists.


[1] A joke about the use of the word 'trivial' in Mathematics, from the wiki:
Two mathematicians are discussing a theorem. The first mathematician says that the theorem is "trivial". In response to the other's request for an explanation, he then proceeds with two hours of exposition. At the end of the explanation, the second mathematician agrees that the theorem is trivial.

[2] It's in bad form to ask why higher order continuity is required. It just is. It's also like asking why a real Peugeot 206 is better than the one in this lovely ad.

[3] I couldn't resist using the [Blink] tag, just so that I can refer to this old classic. Many thanks to HKR for rediscovering this!

[4] A most lovely list of invalid proofs. My favorites are 'proof by forthcoming publication', 'proof by exhaustion (of audience)', 'proof by reference to inaccessible literature' and 'proof by funding'.

References, the real ones

[R] The Mathworld article on UDEs contains a list of all the works I have referred to here. That was also the first place I heard about UDEs.

[R1] The wikipedia page on Non-analytic smooth functions has a simple proof. I wish it were longer and had more discussion, though.

Also, Tim Gowers's essay on continuity is a refreshingly clear read (compared to the shitstorm of epsilons and deltas rife in most analysis textbooks).

[R2] To tell the truth, y''' = 0 is not such a bad fella after all. It's still telling us a physical truth - that the second derivative of a parabola is constant everywhere. Rubel's UDE on the other hand isn't telling us anything about the world! It is only we who observe that the solution to the UDE has some use as an interpolating function, and it is we who build the approximation. Talk of outsourcing, man!



Karthik said...

Wow! The Cinf approximation troubled me for quite some time when I was trying to figure it out; it was the one part of Rubel's paper I couldn't grok, and I gave up after an hour of trying. Woe is me, I never sought to differentiate e^(-1/(1-x^2))!
I had no idea smooth functions could be non-analytic! Wow. This is still sinking in. I will probably use S-modules every chance I get now.

Shrink ODEs? ROFL!
It's such a multiply apt name. I go into bursts of laughter every time I think about it. Rubel's DE is the ultimate shrink, eh?

Have you had any luck locating Shannon's "Mathematical theory of the differential analyser"? It seems like that's the missing link in the whole Turing machine analogy, and I can't find this paper anywhere, online or otherwise. (Surprisingly, I can read all about it- it was evidently part of Shannon's masters thesis... imagine!)

This UDE makes me happy- for more than one reason*- and sad all at once. I'm not sure this publicity will help the cause of the academic swindle (of the highest order) we were idly contemplating, though.

Also: His name is Alfred Hubler.

(*See, a couple of years ago I had several checklists of things to achieve, graded by timescales to achieve them in. I don't maintain these anymore, but I am still going to retroactively tick one of the entries on the scale 2 (four years or earlier) checklist: Get mentioned on KVM's blog.)

Anonymous said...

Great post, and a very clear explanation.

I don't understand this:
Using parabolic sections as the interpolating functions, we can, in principle, claim up to C1 continuity. But in practice, it's hard to prove that higher continuity than C0 can be achieved.
Isn't this what splines are for? Or is the issue the (n≥2)-dimensional case?

Feynman liked to say that mathematicians can only prove trivial theorems, because any theorem once proved is "trivial". Gian-Carlo Rota, in his Indiscrete Thoughts, mentions this and then subverts the joke, repeatedly saying in the book that the goal of mathematics is to reduce truths to triviality, etc.

Another legitimate use of the blink tag: scroll down to the bottom of this. :-)

Mohan K.V said...

@Karthik: Thanks, I'm glad you liked it! I'll try looking for Shannon's paper, but I find it hard to think of UDEs as more than clever curve-fits.. just where do the Turing machines come in?

The academic swindle - hm, this needs some thought. I just realized we may have more competition than we originally imagined (ref. this little nugget, make sure to read the very last page). More contemplation is called for.

(Come now, makhmal rumaalallE chaDi Etu hoditiralla saar :) )

@Master S: Re. splines, yes, I think I over-generalized from the FEM difficulty. I'll change that line in the post, thanks for pointing it out.

Gian-Carlo Rota - ah. I've only barely heard of him, will read up more.

The blink tag - Haha, nice! I'm overjoyed that you actually followed every footnote :D

Anonymous said...

Gian-Carlo Rota was one of the mathematicians who got combinatorics recognised as a respectable area starting in the 60s. :-) His Indiscrete Thoughts includes his Ten Lessons I Wish I Had Been Taught, some celebrity gossip (his biography of Stanislaw Ulam in the book is especially hard-hitting), some of his papers in philosophy ("a minority view") and a few other things that would not be of much interest to nonmathematicians.

Re. the comment about continuity: Gowers's article is good of course, but it's pretty much what any analysis class ought to tell you. The epsilon-delta definition is a great one; the problem arises only when the definition of continuity is mistaken for the thing itself and presented as such.[*] When I first encountered the definition (in Thomas and Finney), I felt jubilant — here finally was a way to actually say the thing one had in mind. The definition is the culmination of centuries of similar searching (though Fermat and Newton and Leibniz and Euler did fine without one) (even Gowers's discussion is called "one way to arrive at the definition"), and I imagine many mathematicians who encounter it are similarly delighted. But few people understand that one's latest mental model is not the way to teach.

But while there are other ways to define continuity (Knuth once proposed a sound Calculus via O-notation, and there are other "Calculus without limits" approaches (see e.g. comments here) — we can even formalise infinitesimals), an analysis textbook has to work from definitions. The student needs to be comfortable working with the definitions to even understand what comes later. Rudin's famously concise Principles of Mathematical Analysis (the analysis textbook, I guess) is the sort of book we complain about at the first encounter but after drinking deeply enough from, grow to enjoy and cherish forever after. :-) Besides, the epsilon-deltas serve a useful purpose: the analysis course is often the rite of passage for mathematics students, the one that teaches us to write our own readable epsilon sandwiches.

This is already too long and rambling a comment for a post that wasn't even about mathematics education, so I'll abruptly stop. :-)


[*] Rota says some similar things in his book, pointing out that even some mathematicians confuse the axiomatic method of presentation with mathematics itself.

Anonymous said...

Oops, there was a trailing slash in one of the links above, in case it's missed: epsilon sandwiches.

And the thrust of the previous comment was this, which could have been put in one sentence: Gowers's discussion of continuity is not an alternative to the epsilons and deltas, but merely what comes before. But now I don't even know why I felt it necessary to point that out. :p

(Also, the last parts of that, about continuity and computation, have been discussed in more detail by Andrej Bauer and Dan Piponi.)

Unknown said...

I don't understand what you mean by "a C-inf continuous solution 'y' to this equation can be made to approximate to any desired level of accuracy any (continuous) function on any interval of the real line." Can you clarify a little bit?

Is there a single y which solves the boxed equation that satisfies this property, or is there a map that maps **each** C0 function to a particular member of the solution set?

Shampoo said...

You have the ability to turn math into a spellbinding thriller :)
Waiting for the next episode...

KVM said...

Thanks Paul. The Reactions section comes with Disqus, I didn't do anything special to activate it. Go to and follow their instructions for Blogger, and you'll be up and running in minutes. Let me know if you need help. Cheers!

Chidambaram Annamalai said...

Awesome explanation!
