Tuesday, September 29, 2009

The Uncanny Valley of computational research

(Update, Oct 17: Greatly expanded with the aid of Chandru, the most inexhaustible source of all fundaes on postgrad in IITM. He also insists that this post gives a biased view that only modeling sucks, and so an edition detailing the heinous atrocities and frauds of experimental research is forthcoming.)

I've been very irritated by almost all of the computational modeling research I've been reading up on. I specifically mean research in modeling and simulating systems that we (humans) have not built from scratch: for example, trying to model blood flow in an artery, the mechanical behavior of tissue, or the weather. It struck me that this situation is similar to an idea in human-robot interaction, called the Uncanny Valley.

(Image, like almost everything you'll find on this blog, from Wikipedia)

This is a graph of a feeling of 'familiarity' or 'liking' vs. how human a robot looks. Near the left, an object looks nothing like a human (for example, an industrial press) and we attach no feeling to it. As it starts to look more and more humanlike (for example, Asimo), people start finding it 'cute' and likable. But then, as it gets to the point of being almost human, uncannily human, there is suddenly a repulsion for it, a kind of disgust like what we feel for a prosthetic limb or a made-up corpse. Further on, however, we're barely able to make out the difference, and familiarity rises rapidly.

I think a very similar effect occurs in modeling. In the beginning, models are simplistic, produce no consistent predictions and no one cares about them ("The earth is flat"). Then we have models that are simple, and yet have very useful predictive qualities. We can measure the Earth if only we assume a few numbers are 'much smaller' than others (Eratosthenes)! An Engineer's First Sin - just assume a certain linear dependence and you can measure something as elusive as time itself (sin(x) ~ x, the simple pendulum)! Say a particle can only have two states and can interact only with its nearest neighbors, and poof! You can explain an event as amazing as a phase transition (the Ising model)! Assume everything that can be computed can be written down as instructions on a piece of paper, and voila! You come up with fascinating limits on what a computer cannot do (the Turing machine, and the Halting problem)! (And the subsequent hilarity when the Halting problem is outsourced)
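Incidentally, the First Sin is astonishingly forgiving. A quick back-of-the-envelope check (my own sketch, nothing from any of the papers discussed here) of how far sin(x) ~ x strays:

```python
import math

# How forgiving is the engineer's First Sin, sin(x) ~ x?
# Relative error of the small-angle approximation at a few pendulum amplitudes.
for deg in (1, 5, 10, 20, 45):
    x = math.radians(deg)
    rel_err = abs(x - math.sin(x)) / math.sin(x)
    print(f"{deg:2d} deg: relative error {rel_err:.3%}")
```

Below about 10 degrees the error stays under one percent, which is why the textbook pendulum gets away with it.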

But then, the party stops. Here's a bunch of negative reactions about modeling research:

1. It's pointless: the model is extremely complex and the improvement in results is just not worth the effort. The current models are 'close enough' to make any extension seem useless. There's no intellectual pleasure.

2. Talk about a double whammy: modeling seems to have overtaken reality. Many models, especially in bio+engineering areas, are no longer a critical step in deciding anything and their predictive power is never brought into question in any non-trivial problem. Even a model that is used in a critical circumstance is eaten whole by a very roomy engineering safety factor. This is brushed off by saying "Our models will take 20 years to find their use! Look at potential instead of bare reality!"

3. Modeling is also beginning to be used very deviously to demonstrate that a lot of work was done, and therefore the work is worth publishing. Reminds me of a joke that says CFD stands for Colorful Fluid Dynamics. Another related abuse is to put in a section on modeling in an otherwise purely experimental paper. The modeling is supposed to prove that 'what we think is happening is what is actually happening', but in most cases it serves to fill a boilerplate template mandated by some journal.

4. Pauli is said to have once lamented about a paper he was asked to read: 'This is not right. It is not even wrong.' I'm shocked at the number of modeling papers out there which don't even have a hypothesis, or any hint of the model's predictive abilities, usefulness, or even a whiff of a reason as to why some poor sod of a grad student wrote 50,000 lines of code. The import of the entire work seems to be "This work was done".

5. A model that has 56 parameters can match any dataset. It just pushes the problem of 'understanding what is happening' down to the level of finding the right parameters. And that level happens to be filled with grunt work and boredom.
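Here's the degenerate case as a minimal Python sketch (the 'data' is just seeded random noise): give a model as many free parameters as data points and it will 'explain' anything perfectly.

```python
import random

# A "model" with as many free parameters as data points can match any
# dataset exactly. Lagrange interpolation is the degenerate case:
# n points, n coefficients, perfect "agreement", zero understanding.
def lagrange_model(xs, ys):
    """Return a function passing exactly through every (x, y) pair."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model

random.seed(0)
xs = list(range(10))
ys = [random.uniform(-1, 1) for _ in xs]   # pure noise, no structure at all
model = lagrange_model(xs, ys)

# The fit is perfect at every data point -- and meaningless everywhere else.
print(max(abs(model(x) - y) for x, y in zip(xs, ys)))   # prints 0.0
```

The 'fit' is flawless on the data and wild garbage between the points, which is the whole complaint in one line.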

6. A misconception that better tools lead to better research is rife. Nowhere is this more true than in modeling. A supercomputer cluster is trivial to buy - just a few thousand dollars and you have very decent 'computing resources'. I find it especially grating when universities announce with great pomp the opening of a 'High Performance Computing Center'. This is the textbook definition of a cargo cult.

7. Among a host of adapted dick size metrics, a common one is 'full 3D simulation'. A vast majority of engineering modeling consists of either 1D or 2D models, because till the last decade it was impossible to even dream of full 3D simulations. It is debatable where exactly full 3D simulations are necessary - but such trifles don't stop people from merrily performing 3D simulations on the aforementioned High Performance clusters. This unfortunate tendency has resulted in pitiable cases of 5th-year PhD students being told that their work will be 'sufficient' only if they re-test their hypothesis (if they were lucky enough to have such intellectual luxuries) with a fancy solver on a cluster, only to find that everything is wrong and, lest they take solace in knowing at least that for a fact, inconsistent.

8. Quite the same complaint applies to 'non-linear', 'parallelizable' and 'scalable'. I think it's like inventing a fine temperature controller for the swimming pool on top of the Titanic. Note well, however, that this is a more refined and arguably less criminal form of intellectual fraud. The more dangerous epithets of ill-repute are 'nano-' (in a former birth, 'micro-'), 'bio-', 'ab initio', 'biomimetic' and 'multi-scale'.

9. Continuing on the 3D bashing, even presenting results is extremely hard to standardize. Contours aren't cool. Slices take too much space and are too hard to interpret. But who cares about the results and understanding them? Animations, preferably with lighting and a suavely-accented voiceover, are in. A sectioned 3D artist's impression adds more to the mojo than a graph with sensible axes. These days, in addition to the panegyrics and pedigree and GRE and grades, a graphics design background would be a wise thing for modeling labs to insist on if they are to keep the reams running (or what has become equivalent, surviving).

10. At a deeper level, most models' greatest strength is the audience's ignorance, and this is best seen in interdisciplinary work. If you don't know a lot about something, you cannot honestly be critical about it. Even if you are, you do not have the moral right or standing to express it. In fact, the fundamental paradox of knowledge is that the more you know, the more you realize how little you know, and so your self-measured moral right to critique something diminishes as you learn! Russell once said, "The trouble with the world is that the stupid are cocksure and the intelligent are full of doubt". This is a most happy outcome for the fraudulent researcher. In a work on 'Distributed finite element modeling of cancer growth', the solid mechanician can't rightfully criticize the parallel computing algorithm or the biology; the algorithms expert/programmer can't criticize the mechanics or the biology; and the oncologist can't criticize the mechanics or the algorithm.

11. Extending the previous point, collaboration seems to be similar to Simpson's Paradox in statistics, where each part shows one trend (here, honesty) and the whole shows the opposite trend. This is what makes it possible for a US president to be elected even though he received fewer total votes than his rival.
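As a toy illustration (all the numbers below are invented), here's that electoral arithmetic in miniature: a candidate takes a majority of electors while losing the popular vote.

```python
# Invented numbers: candidate A narrowly wins two small states,
# loses one big state in a landslide, and is elected anyway.
states = {
    # state: (votes_for_A, votes_for_B, electors)
    "X": (51, 49, 10),
    "Y": (51, 49, 10),
    "Z": (10, 90, 15),
}
popular_A = sum(a for a, b, e in states.values())            # 112
popular_B = sum(b for a, b, e in states.values())            # 188
electors_A = sum(e for a, b, e in states.values() if a > b)  # 20
electors_B = sum(e for a, b, e in states.values() if b > a)  # 15
print(popular_A < popular_B, electors_A > electors_B)        # True True
```

Each state's tally is honest; it's the aggregation that flips the answer.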

12. (Thanks Serdar) Some researchers take the idea of changing topics to remain 'fresh and motivated' a tad too far. They put their finger in every pie, collect all the low-hanging fruit and jump to a new topic. This is particularly easy in modeling, where neither resources nor, as we have come to see, knowledge is necessary to be productive. It is no longer cool to work on a topic long enough to truly advance the frontiers of knowledge. Stay till the Papers/Effort ratio peaks, and move on. These dilettantes are pejoratively termed 'butterflies'.

13. In the end, simulations are easy on the pocket. Take 6 computers and 6 desperate grad students, mix well, add a pinch of conference publication hopes, funding threats and subtle emotional blackmail, and voila! You have a paper factory all oiled up. Continue for a few years and you'll be light-years ahead of anyone trying to actually do science. If experiments are your thing, there's a slightly modified but equally despicable protocol.

14. Another aspect Petre rightly pointed out: Fudge Factors. We've all heard of Skinner's constant (also called Flannegan's finagling factor): the number which, when multiplied by, divided into, added to, or subtracted from the answer you got, gives you the answer you should have gotten. While it goes about giving people a smile, its evil cousin lurks deep in simulation codes, magically making them coincide with experiments/predicted results. Nearly any work that simulates a non-trivial, non-toy phenomenon will have a couple of constants that are either not known, or are heavily disputed. In fact, in some places, the reason simulations are done at all is that those constants can't be known, and so simulations can map out a kind of sample space. But that is quickly lost when the goal is to produce 'good' results. In an example I know well, the elusive variable was the conduction speed of a cardiac cell. The speed varies with time, depends on location, and changes drastically when a cell is taken outside the heart or when a probe is put in! So right now there's no way to measure it, and the constant was 'set to appropriately typical values' to ensure the results were 'correct'.
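Here's a caricature of that workflow as a minimal Python sketch - the exponential decay 'model', the hidden constant and the 'experimental' data are all invented for illustration:

```python
import math

# The model has one constant nobody can measure, so it is "set to
# appropriately typical values" -- i.e., tuned until the simulation
# reproduces the experiment.
def model(t, c):
    return math.exp(-c * t)          # "simulation" with unknown constant c

# "Experimental" data, secretly generated with c = 0.7.
experiment = [(t, math.exp(-0.7 * t)) for t in range(10)]

def misfit(c):
    return sum((model(t, c) - y) ** 2 for t, y in experiment)

# "Calibration": scan candidate constants and keep whichever matches best.
best_c = min((k / 100 for k in range(1, 200)), key=misfit)
print(best_c)   # prints 0.7 -- "correct" results, guaranteed by construction
```

Nothing here is dishonest by itself; the fraud is in then presenting the match as a prediction rather than a fit.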

15. This is perhaps the most finicky and least acceptable of objections, but still: when the famous Four-Color problem was finally solved using a computer in 1976, there was a murmur of dissent. Yes, we finally know for sure that only four colors are required to color any map, but no living human knows why it is so. You need a few hundred pages of analysis to reduce the problem to a couple thousand non-trivial base-cases, and then a computer to verify that indeed all these base-cases can be colored with 4 colors. So one more cold fact is known about the world, and there's nothing more to it.  It wouldn't have been very different if the skies opened up and a thunderous unquestionable voice declared "4 is enough!". Similarly, there are some works which simulate flow of some idealized fluid in some idealized turbulent regime, and proudly claim that they have done it using 200,000 CPU hours. This kind of work is neither here nor there - it is too simplistic to be actually experimentally verified, and it is too complicated for anybody to truly understand.

Every one of these woes carries echoes of being almost there, but not quite. Taking up the task of removing the 'not quite' can be a frightfully unrewarding and disappointing experience.

To complete the isomorphism, we should look at very complex models that make very useful/impressive predictions. Flight simulators and today's video games come to mind - the complexity of the physics or graphics engines used is mind-boggling, and yet they are so convincingly real that after a few trials, you think the Gravity Gun in Half-Life 2 is just another weapon (as opposed to OMFG AWESOME!!!1!!). Astronaut training programs need to be painfully accurate, and do quite a good job. As we go along, our very definition - that the 'high end' of modeling is when you can't distinguish between the model and 'reality' - starts to get to us. Is language a model for expressing thought, or is it something on its own? Is your perception of the world, which determines your reaction to everything that happens to you, a model of the world, or is it You?

Update: I found two fantastic articles by Bertrand Russell, which convincingly strike at the very roots. Pure joy!

1. 'On Youthful Cynicism', the 'Truth' and 'Progress' sections in particular.
2. 'Icarus, or the Future of Science'.

Update (May 23, 2010): A very nice wiki article: Map-Territory relation