An interactive audio visual learning experiment: August 2012

Musings of a bottom-up "design" engineer

I find the design methodology you humans have mastered quite fascinating. No, there isn't a touch of mild condescension in that statement. You had very little time to iterate - I had far more time to iterate and improve than you did, so it's not fair to compare your designs with mine. Actually, your designs are quite impressive considering the amount of time you had (by the way, I love your iPhone - neat).

Anyway, let me share with you some of my design nuggets in the upcoming articles. I will explain my design using only "common sense" terminology. To those of you who are biologists: I am afraid you may have to use the inverse mapping table at the end of my articles to trace back from the "common sense" terms I use to your vocabulary (which, I have to confess, I find quite daunting). Hopefully you won't have to inverse map often - in most cases my terms will reveal the mapping themselves.

But before I share my "design nuggets", allow me to first brag by posing just one challenge that highlights what my machines can do:

Can you build a machine using all the natural material you want in the world, or even synthesize your own materials, and expend all the energy you need, but with the following constraints:

The individual pieces of raw material used to build your machine should be no larger than a grain of sand, around 1 cubic millimeter.
The manufacturing factory for your machine should be the very machine you are building. That is, your machine must be able to replicate itself.
Your machine should be no larger than 27 cubic feet, about the size of a box that holds a moderate-sized TV.

I am being very generous: I am relaxing the scale for you by a factor of a million. I can do it with raw material a few nanometers in size, and my factory can be as small as a cube 100 micrometers on a side. I can build machines 100 feet long, all from factories that small. Another generous concession - your machines can shatter when dropped, like your iPhone; they don't have to survive even a 5-foot fall. Oh, and don't bother trying to emulate fancy features such as being able to "think". Just try building a self-replicating factory no larger than a TV box, with raw materials no larger than a grain of sand - and you will turn my veiled hubris into total humility.

I bet you can't meet my challenge - at least not yet. :-)

No cheating like Craig Venter - he bootstrapped with "my factory"! You have to play fair and make the self-replicating factory yourself.

If your immediate reaction is, "We can't even build a self-replicating machine, so why all these constraints on its size?", you will see, with a little thought, that the size constraints on the materials and the machine are not really constraints. They are crucial clues for building self-replicating machines, at least of the kind I am aware of. When building machines like mine from "inside out" (that is, growing in size over time), having really small raw material is an advantage: it gives tremendous control over assembling intricate structural designs. This "inside out" method of building may seem counterintuitive to most of you, because you are used to building bridges and houses by assembling large chunks of prefabricated pieces (even your circuit boards are built by adding prefabricated components, though small ones - granted, you have recently started dabbling with ideas similar to mine in self-assembly). Also, in my "inside out" design, having a small factory, which is itself the building block, is an advantage - it enables these "self-shaping factories" to grow, replicate, and morph into different shapes.

It is quite easy for me to explain the principles of my software design, because they have striking parallels with your world of software engineering. My hardware design, however, has concepts that may not be intuitive at all and may require some getting used to. The hardest thing to get used to is how order can emerge on its own to create complex self-assembling, self-replicating machines, all from randomly moving atoms and molecules. I am starting this series with examples that give you insight into just that "hard to fathom" concept. In all my examples, self-assembly and even goal-oriented behavior emerge from randomly moving particles. Over time, once you get the hang of the underlying design principles, it will become more apparent how such emerging order from randomness can create self-replicating machines. The experience is not unlike opening the hood of a car and marveling at the engineering inside. If you don't know car design, it will all seem like magic. But if an engineer who designed it gives you an overview, things under the hood start to make some sense. The key difference from car design, however, is that a car is assembled externally in a factory, even if the assembly is automated. My machines self-assemble from within, and my factories, full of these self-assembled machines, can self-replicate.

The core of my hardware design is self-assembly and self-replication. Let's take self-assembly first. You have already mastered one form of self-assembly, such as assembling a car using robots - you call it automation. There is another form of self-assembly that spans scales, from galaxies and planetary systems at the macroscopic scale to individual atoms at the microscopic scale. Across all these forms of self-assembly, the same distinction holds: some require an expenditure of energy to assemble, while others assemble on their own without any energy input, like atoms accidentally bumping into each other and spontaneously forming molecules. Some of my machines self-assemble without energy and others need energy for assembly. What makes my self-assembly unique, then? My self-assembling factories can replicate themselves. So, to summarize the essence of my hardware design: my factories are made up of self-assembling machines and structures, some of which require energy to self-assemble. My factories, which are themselves machines, can grow, morph into different shapes, and, most importantly, self-replicate. Now, you may ask, how did I make the first factory? Did I cheat like Craig Venter and bootstrap my machines off someone else's work? Well, I didn't have anyone else around to steal from - it had to be original work. But I am only going to reveal as much as Fermat did about his famous last theorem: "I have a remarkably simple bootstrap method, which this article is too small to contain." Surely, an Andrew Wiles will emerge and solve it for you some day.

Now a few disclosures are in order. It took a few billion years to make my machines. Oh, by the way, I didn't design at all - I lied. You will see through my lie if you read some, if not all, of my upcoming articles. It will become apparent that I did nothing. Self-assembly dabbled with itself for a long time, and a chance bootstrap event made one of the designs self-replicating. From that point on, my machines evolved on their own. All I did was make a big bang...

Detailed design notes and references - you can skip this section entirely

Self-assembly

Self-replication

Directed cargo transport in cells

Convergence of biology and computing

Algorithms in nature - the convergence of systems biology and computational thinking - November 2011, Molecular Systems Biology

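Craig Venter - a leading scientist who made significant contributions to genomic research, including pioneering the creation of synthetic DNA: in 2010 his team transplanted a chemically synthesized bacterial genome into an existing bacterial cell, which then ran on the synthetic genome.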

Fermat's Last Theorem
Stated by Pierre de Fermat in the 17th century: x^n + y^n = z^n has no solution in positive integers x, y, z for any integer n greater than 2 (n = 3, 4, 5, ...).
Fermat wrote - "I have discovered a truly marvelous demonstration of this proposition which this margin is too narrow to contain."
In 1994, Andrew Wiles completed a proof of Fermat's Last Theorem, after first announcing a proof in 1993; the proof was published in 1995.

Fermat's Enigma: The Epic Quest to Solve the World's Greatest Mathematical Problem - a great book by Simon Singh on Fermat's Last Theorem and the quest by many mathematicians to prove it.

Go to interactive animation of Musings of a bottom-up "design" engineer
This interactive animation was done using paperjs. It will only work on browsers supporting HTML5 canvas

Nature's search scheme

Go to interactive animation of Nature's search scheme
This interactive animation was done using paperjs. It will only work on browsers supporting HTML5 canvas

Here is nature's search problem. Imagine a box that can pack a moderate-sized television (a 27 cubic foot box, 3x3x3 feet), filled with a liquid that is a little syrupy - it can't be water, because water is not syrupy. Within that box, imagine a blob that is one-eighth the size of a die (0.125 cc, a 0.5x0.5x0.5 centimeter cube; a die is roughly 1 cc). This little blob has grooves etched into its body, so it's almost like a key. Now imagine a coiled tape almost 3 times the height of the Empire State Building (yes, nature can fit a 4000-foot coiled tape in a 3x3x3 foot box). The tape has grooves etched on its sides that function like keyholes. The little blob can latch onto this tape wherever the grooves on the blob perfectly complement the grooves on the tape, like a key fitting a lock.

Nature's search problem is how to find the right "keyhole" match for the little blob on the 4000-foot coiled tape packed inside the television box. The little blob is zipping randomly through the syrupy medium inside the box at approximately 30 feet/sec - this movement is driven by thermal fluctuations (think of blobs zipping, not just drifting, in a lava lamp). At this speed it can go from one end of the box to the other in 0.1 seconds (100 milliseconds). The gravitational force on this little blob is negligible - the viscous force from the syrupy liquid filling the box is a billion times stronger than gravity.

Nature can finish this search, finding the right spot on the 4000-foot tape to dock the little blob, in 3-5 minutes on average. I skipped an important detail for effect: it cannot do that with just one little blob. But if a certain number of them are zipping around the box randomly, one or more of them will complete a successful search and dock (assuming there are multiple matching key slots on the tape) in 3-5 minutes on average. Nature's search is not unlike a Google search, except that nature has no index of documents, as Google does, to quickly access a document! The search has to be done over the entire 4000-foot coiled tape every time.
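Why does having several blobs help so much? Here is a minimal sketch, assuming each blob searches independently and its search time is exponentially distributed (a common modelling simplification, not something stated above), with a hypothetical 20-minute mean for one lone blob. The search ends as soon as the fastest blob docks, so the average wait shrinks roughly in proportion to the number of blobs. The function name `mean_first_success` and all numbers are illustrative.

```python
import random


def mean_first_success(single_blob_mean_min, n_blobs, trials=20000, seed=1):
    """Average time (in minutes) until the first of n_blobs blobs docks.

    Hypothetical model: each blob searches independently, and its search time
    is exponentially distributed with mean single_blob_mean_min.
    """
    rng = random.Random(seed)
    rate = 1.0 / single_blob_mean_min
    total = 0.0
    for _ in range(trials):
        # The search is over as soon as the fastest blob finds a matching slot.
        total += min(rng.expovariate(rate) for _ in range(n_blobs))
    return total / trials


if __name__ == "__main__":
    lone_blob_mean = 20.0  # hypothetical mean search time for a single blob (minutes)
    for n in (1, 2, 5, 10):
        simulated = mean_first_success(lone_blob_mean, n)
        print(f"{n:2d} blobs: simulated {simulated:5.2f} min, "
              f"theory {lone_blob_mean / n:5.2f} min")
```

The simulated averages track `single_blob_mean_min / n_blobs`, the standard result for the minimum of independent exponential waiting times.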

The 3x3x3 foot box above is a single-cell organism - a bacterium. The 0.5x0.5x0.5 cm little blob is a protein molecule. The 4000-foot tape is a bacterial genome. All dimensions are magnified a million times to visualize the scope of the problem in "our world" dimensions.
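The million-fold magnification can be checked with a few lines of arithmetic. This sketch uses the real-world sizes listed in the notes at the end of this article (a ~1 micrometer cell, a ~5 nanometer protein, a 4.6 million base pair genome with about 0.3 nm between bases) and the scaled speed of 30 feet/sec; the helper name `scaled_feet` is just for illustration. It gives a box side of about 3.3 feet, a 0.5 cm blob, a tape of roughly 4,500 feet (the same order as the ~4000 feet used above) and a 100 ms crossing time.

```python
SCALE = 1_000_000          # every length is magnified a million times
FEET_PER_METER = 3.28084


def scaled_feet(length_m):
    """Length in feet after magnifying a real length (in meters) by SCALE."""
    return length_m * SCALE * FEET_PER_METER


cell_m = 1e-6              # bacterial cell, about 1 micrometer across
protein_m = 5e-9           # average protein, about 5 nanometers
genome_bp = 4.6e6          # bacterial genome, 4.6 million base pairs
bp_spacing_m = 0.3e-9      # spacing between bases used in the notes (often quoted as 0.34 nm)

cell_ft = scaled_feet(cell_m)
protein_cm = protein_m * SCALE * 100
tape_ft = scaled_feet(genome_bp * bp_spacing_m)
crossing_ms = 3 / 30 * 1000  # a 3-foot box crossed at the scaled 30 feet/sec

print(f"cell (box side): {cell_ft:.1f} ft")
print(f"protein (blob):  {protein_cm:.1f} cm")
print(f"genome (tape):   {tape_ft:,.0f} ft")
print(f"box crossing:    {crossing_ms:.0f} ms")
```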

In 1981, a model was proposed for how nature solves this search problem. This year, in June 2012, a paper published in Science confirmed key elements of the proposed model in a living single-cell organism - a bacterium. It still remains to be seen whether this search method is used in all living organisms, although proposed models suggest it is, with some variations.

The search method is as follows. A little blob "randomly" zipping through the cell comes across the tape with some probability and does a "loose docking" onto it, aided by "on-board" loose docking machinery. The docking is loose because it lets the blob slide along the tape. So once it docks loosely, it slides for some length, again driven by thermal fluctuations, before it "randomly" disengages from the tape. While it is sliding along the tape, it tests, "randomly" at some spots, whether the grooves match. If there is a match, it docks tightly using "on-board" tight docking machinery, and the search is complete. If it disengages from the tape before finding a match, it may, with some probability, dock loosely again at some other position on the coiled tape and repeat the "slide and test" search.
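The "slide and test" scheme can be sketched as a toy Monte Carlo simulation. This is a hypothetical simplification, not the equations of the 1981 model: the tape is a line of `tape_len` sites with one matching site, every loose docking lands on a uniformly random site (standing in for the blob zipping around the cell), and while docked the blob tests the grooves with probability `p_test`, disengages with probability `p_off`, and otherwise slides one site left or right. It counts how many loose dockings are needed before tight docking. All names and parameter values are illustrative.

```python
import random


def slide_and_test_search(tape_len, target, p_off, p_test, rng):
    """Return the number of loose dockings needed before tight docking on `target`.

    Toy model (hypothetical): each loose docking lands on a uniformly random
    site, standing in for the blob zipping around the cell in 3D. While
    docked, the blob tests the grooves at its current site with probability
    p_test, disengages with probability p_off, and otherwise slides one site
    left or right.
    """
    landings = 0
    while True:
        landings += 1
        pos = rng.randrange(tape_len)                 # loose docking at a random spot
        while True:
            if pos == target and rng.random() < p_test:
                return landings                       # grooves match: dock tightly
            if rng.random() < p_off:
                break                                 # disengage and zip around again
            pos += rng.choice((-1, 1))                # thermal sliding along the tape
            pos = min(max(pos, 0), tape_len - 1)      # stay within the ends of the tape


def mean_landings(tape_len, p_off, p_test, trials=200, seed=0):
    """Average number of loose dockings over many independent searches."""
    rng = random.Random(seed)
    target = tape_len // 2
    total = sum(slide_and_test_search(tape_len, target, p_off, p_test, rng)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    n_sites = 500
    print("no sliding:", mean_landings(n_sites, p_off=1.0, p_test=1.0))
    print("sliding:   ", mean_landings(n_sites, p_off=0.01, p_test=0.5))
```

Setting `p_off=1.0` and `p_test=1.0` turns sliding off (one test per docking), so the blob needs about `tape_len` dockings on average; with sliding, each docking scans a stretch of neighbouring sites and far fewer dockings are needed. Each slide costs time, though, which is where a trade-off comes in.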

There is a small hitch while it slides along the tape, though. There may be stumbling blocks - nothing but other blobs just like this one that are performing, or have completed, a search! So our little blob may hit an obstruction and disengage from the tape, since it is only loosely docked in sliding mode (remember, it docks tightly only after the search succeeds). As we saw earlier, it may also, with some probability, return and dock again to continue its sliding search, perhaps this time past the obstruction. Interestingly, the blob has also been observed to slide over the target match site on the tape several times before binding tightly - almost like a helicopter hovering over a landing site. So it appears nature has converged on a trade-off between rapid, loose-docking search over non-matching areas (so the blob can slide, and potentially return and re-engage once it is past an obstruction) and tight docking where a match occurs.
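One way to make a trade-off like this concrete is a simplified estimate commonly used for this kind of search. It is an idealisation that ignores obstructions and the tight-docking step: the blob alternates between excursions through the cell lasting `tau_3d` on average and loose-docked slides lasting `tau_1d`. Because sliding is a 1D random walk, a slide scans only about sqrt(tau_1d / t_hop) sites, so sliding longer has diminishing returns, while barely sliding wastes each docking. All parameter values below are hypothetical.

```python
import math


def mean_search_time(tau_1d, tau_3d, tape_sites, t_hop):
    """Idealised mean search time for one blob (seconds).

    tau_1d: mean duration of one loose-docked slide
    tau_3d: mean duration of one excursion through the cell between slides
    tape_sites: number of sites on the tape
    t_hop: time for one single-site sliding step
    A slide of duration tau_1d scans about sqrt(tau_1d / t_hop) sites
    (1D random walk, constant factors dropped), so roughly
    tape_sites / scanned rounds of "slide + excursion" are needed.
    """
    scanned = math.sqrt(tau_1d / t_hop)
    rounds = tape_sites / scanned
    return rounds * (tau_1d + tau_3d)


if __name__ == "__main__":
    tau_3d = 1e-3    # hypothetical: 1 ms per excursion
    t_hop = 1e-6     # hypothetical: 1 microsecond per sliding step
    sites = 4.6e6    # one site per base pair of a 4.6 million base pair genome
    for ratio in (0.01, 0.1, 0.5, 1.0, 2.0, 10.0):
        t = mean_search_time(ratio * tau_3d, tau_3d, sites, t_hop)
        print(f"tau_1d / tau_3d = {ratio:5.2f}: mean search ~ {t:6.0f} s")
```

Minimising (tau_1d + tau_3d) / sqrt(tau_1d) gives tau_1d = tau_3d: in this idealisation, the fastest search spends equal time sliding and zipping around. With the made-up numbers above, the balanced case comes out at roughly 290 seconds, the same order as the 3-5 minutes quoted earlier, but that agreement depends entirely on the chosen parameters.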

You may wonder how important this search is. It is central to the functioning of a single cell - a cell won't exist if this search does not work. All organisms are made up of cells, from single-celled bacteria to us humans - we have tens of trillions of cells.

Why do cells perform this search? Cells perform this search to make proteins. This search is happening right now, in almost every cell in your body. Each cell is a remarkable computing machine. The coiled tape is the genetic code, full of recipes to make different proteins. Somewhere on this coiled tape is the recipe to make a particular protein - this recipe has to be searched for and found first, to make that protein. What do protein molecules do? Protein is perhaps nature's most ingenious and elegant design solution, from both a hardware and a software standpoint. We shall look at these magnificent "nano machine" molecules separately, but for now, let's just say protein molecules come in different shapes and sizes and perform a wide array of functions. They serve as raw material for creating biological hardware (our bodies are held together by a protein that makes up 25% to 35% of all the protein in our bodies, and our bones get their strength to withstand stretching from it), transport "stuff" around (oxygen is transported by a protein), act as messengers initiating the body's growth, accelerate reactions (some can make a reaction happen a million times per second), act as sensory input transducers that convert sensory input into signals to our brain (protein molecules in the eye capture light and convert it into a signal to our brain), and control software execution (some proteins can control the rate of their own "recipe reading" and the recipe reading of other proteins). The blob we saw earlier is itself a protein.

So how do we create a protein? To create a protein, its recipe has to be read out from the tape. Under normal conditions, the recipe reading machinery does not dock successfully on the tape to read the recipe (it does at times, but with a very low rate of success). However, when certain conditions are met, such as the docking of the blob we saw above, the docked blob assists the reading machinery in attaching to the coiled tape and reading the recipe. There are also blobs that prevent the reading machinery from attaching to the coiled tape, thereby blocking the reading of the recipe entirely.

For those of us who know programming, it is just like the conditional expression that precedes a block of code. If the condition is met, the block of code that follows the condition executes - in nature's case, the criterion for satisfying a condition is the presence, or in some cases even the absence, of a docked blob.
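The analogy can be written out directly. Here is a minimal sketch with hypothetical names (`Recipe`, `can_read`, `express`, `helper`, `blocker`): a recipe is read only if every blob it needs is docked and no blocking blob is docked. It ignores the occasional low-rate reading without a helper blob mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """A hypothetical recipe on the tape plus the blobs that control its reading."""
    name: str
    activators: frozenset = frozenset()  # blobs that must be docked to help the reader attach
    repressors: frozenset = frozenset()  # blobs that block the reader while docked


def can_read(recipe, docked):
    """The 'if' in front of the block: required blobs present, blocking blobs absent."""
    return recipe.activators <= docked and not (recipe.repressors & docked)


def express(recipe, docked):
    if can_read(recipe, docked):                   # the condition is met...
        return f"protein made from {recipe.name}"  # ...so the 'block of code' runs
    return None                                    # otherwise the recipe is skipped


gene_x = Recipe("gene_x", activators=frozenset({"helper"}), repressors=frozenset({"blocker"}))
print(express(gene_x, {"helper"}))             # protein made from gene_x
print(express(gene_x, {"helper", "blocker"}))  # None
print(express(gene_x, set()))                  # None
```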

Let's look at a real-life example of the need for this search. Take a single-cell life form such as a bacterium. Let's say it can "digest" two types of food - sugar and milk. Given a choice of sugar and milk, it would prefer sugar, simply because sugar is easier to digest than milk. Digestion, in the case of a single-cell bacterium, is the ability to break down a molecule of sugar or milk so that it can extract energy from the broken-down molecule. So when both sugar and milk are present, it shuts down its milk-digesting machinery, which is nothing but turning off the portion of the genetic tape that creates the milk-digesting protein. This turning off requires a search for the milk-digesting recipe just like the one described above.
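The bacterium's decision reduces to a two-input rule. In the real organism (the bacterium E. coli), "sugar" corresponds to glucose and "milk" to lactose, the sugar found in milk. The rule below is a simplified sketch of that switch: it also keeps the milk machinery off when there is no milk to digest, and it ignores low-rate "leaky" reading. The function name is hypothetical.

```python
from itertools import product


def milk_machinery_on(sugar_present: bool, milk_present: bool) -> bool:
    """Simplified switch for the milk-digesting recipe (hypothetical helper).

    Off when there is no milk to digest; off when the preferred sugar is
    available; on only when milk is present and sugar is absent.
    """
    return milk_present and not sugar_present


for sugar, milk in product((False, True), repeat=2):
    state = "ON" if milk_machinery_on(sugar, milk) else "off"
    print(f"sugar={sugar!s:<5} milk={milk!s:<5} -> milk machinery {state}")
```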

Let's take another real-life example - "us". Pretty much every functioning cell in our bodies performs this search to create proteins for different tasks - the little blob that docks on the tape and blocks or enables the recipe reading is itself a protein. In the interactive animation, a particular blob is shown in detail. It plays a central role in determining the life span of our cells. In more than 50% of all human cancers, this blob has been found not to function properly; the malfunction has been attributed to its inability to complete a successful search.

So if the recipe search takes 3-5 minutes, how long does full protein production take? In mammals, copying a recipe from the gene takes about 30 minutes on average, and reading the copy to make a protein takes another 30 minutes (in single-cell bacteria, it takes about a minute to copy the recipe and about 2 minutes to make the protein from it). At these speeds, producing proteins from scratch is not a viable strategy for quick responses to external stimuli, particularly for us humans (bacteria can get by - they can create proteins in minutes). Clearly, nature has other ways to respond fast. One of them is based on switching a protein between active and passive states, which takes about 1-100 microseconds. It is this rapid switching that enables nature to respond quickly to input stimuli. For instance, if a picture is flashed at you, you can consciously perceive it in around 100-200 milliseconds. This speed is made possible by the fast switching of proteins that carry the stimulus to the brain. However, if you remember the contents of this blog, say, five years from now, that memory will have been retained through the production of new proteins, which happens on the order of hours. If you forget this article, it is because its memory was held only by the switching of proteins that were already produced, and that switched state was lost - the contents of this article didn't capture your interest enough to be converted into long-term memory by creating new proteins. This example shows you different time scales: protein sensors in your eye sensing and switching on the scale of microseconds, followed by protein molecules switching to communicate what you saw to your brain (again on the scale of microseconds), resulting in conscious perception (on the scale of milliseconds) of what you saw.
The short-term memory of this article likewise relies on switching proteins that have already been produced and are ready for use, whereas its long-term storage, if it ever gets that far, requires creating new proteins over hours.
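The gap between these time scales is easiest to see as ratios. This sketch simply reuses the approximate times quoted above (recipe copy plus protein making, 1-100 microsecond protein switching, 100-200 millisecond perception); the variable names are illustrative, and the ratios are order-of-magnitude only.

```python
MINUTE = 60.0
MICROSECOND = 1e-6

# Approximate times quoted in the text, in seconds
production_s = {
    "bacterium": 1 * MINUTE + 2 * MINUTE,   # copy the recipe (~1 min) + make the protein (~2 min)
    "mammal": 30 * MINUTE + 30 * MINUTE,    # copy the recipe (~30 min) + make the protein (~30 min)
}
switch_s = (1 * MICROSECOND, 100 * MICROSECOND)  # active/passive switching of an existing protein
perception_s = (0.1, 0.2)                        # conscious perception of a flashed picture

for organism, t in production_s.items():
    print(f"{organism}: new protein in ~{t / MINUTE:.0f} min, "
          f"{t / switch_s[1]:.0e} to {t / switch_s[0]:.0e} times slower than a protein switch")

slowest = production_s["mammal"]
print(f"mammal: a new protein is {slowest / perception_s[1]:,.0f} to "
      f"{slowest / perception_s[0]:,.0f} times slower than perceiving a picture")
```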

Notes and references

Sizes and scales
Size of bacterial cell - 1 cubic micrometer. Scaled a million times - 1 meter or ~3 feet.

Average size of a protein - 5 nanometers. Scaled a million times - 0.5 cm.

Size of a bacterial genome - 4.6 million base pairs. The distance between adjacent bases is about 0.3 nm. Scaled a million times - ~4000 feet.

It takes less than 100 milliseconds for a protein molecule to traverse a single-cell organism (a bacterium) that is roughly a micrometer in length.

The numbers above are from the following sources:
An Introduction to Systems Biology, design principles of biological circuits - Uri Alon
BioNumbers - a database of useful biological numbers, created by Harvard and the Weizmann Institute
How big are genomes? - an FAQ on numbers in biology created by Harvard and the Weizmann Institute

The viscous force on molecules inside cells is in the range of 1-1000 pN (piconewtons). The other appreciable forces on a molecule are covalent bonding forces (~10,000 pN), thermal forces (100-1000 pN), and electrostatic/Van der Waals forces (1-1000 pN). Gravitational force is negligible in comparison - about a billionth of 1 pN.
Mechanics of Motor Proteins & the Cytoskeleton - Jonathan Howard
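The "billionth of a piconewton" for gravity can be checked with one line of arithmetic. This sketch assumes a typical protein mass of 50 kDa (an assumed round number, not from the text) and compares the protein's weight with the 1-1000 pN viscous range quoted above; the helper name `weight_pn` is illustrative.

```python
DALTON_KG = 1.66053906660e-27   # mass of 1 dalton in kilograms
G = 9.81                        # gravitational acceleration, m/s^2
PN = 1e-12                      # newtons per piconewton


def weight_pn(mass_kda):
    """Gravitational force (in pN) on a molecule of the given mass in kilodaltons."""
    return mass_kda * 1e3 * DALTON_KG * G / PN


protein_kda = 50                 # assumed typical protein mass
w = weight_pn(protein_kda)       # about 8e-10 pN
print(f"weight of a {protein_kda} kDa protein: {w:.1e} pN")
for viscous_pn in (1, 1000):
    print(f"a viscous force of {viscous_pn} pN is about {viscous_pn / w:.0e} times larger")
```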

The 3-5 minute search completion time was observed in a living single-cell organism - a bacterium. The June 2012 Science paper reports these findings.

Time to read the coiled tape (gene) and make a copy (transcription) is around a minute for single-cell bacteria and about 30 minutes in mammals. Note that these are average times - some recipes take about 17 hours to read in humans because they are so long (the dystrophin gene). The recipe copy has a lifetime of 2-5 minutes in bacteria and from 10 minutes to over 10 hours in mammals. During this time, many protein molecules can be made from a single copy of the recipe - a "natural" optimization for nature to have converged on, given the time and energy spent copying the recipe. Time to make a protein from the recipe copy (translation) is around 2 minutes for bacteria and about 30 minutes for mammals. Source: An Introduction to Systems Biology, design principles of biological circuits - Uri Alon

Memory storage mechanisms
Long term memory - a molecular framework - Nature 1986
Molecular mechanisms to maintain long term memory - Nature Neuroscience 2011

Proteins - a few examples of them performing various functions
Collagen - this protein is the main component of the connective tissue that holds our bodies together. It is found in bones too, giving bones their tensile strength, while a calcium-based mineral gives bones their ability to withstand compression.

Hemoglobin - this protein, built from four protein subunits, is responsible for carrying oxygen in our blood.

Proteins serve as sensory input transducers. For instance, a protein called Rhodopsin is involved in capturing light and converting it into a signal to the brain. These sensory input transducers reside on cell membranes and capture external input such as molecules (taste buds sense food, smell receptors sense odors) or even light, and convert it into signals for further cellular processing.

Publications
Science Vol 336, June 2012 - The lac Repressor Displays Facilitated Diffusion in Living Cells - Petter Hammar, Prune Leroy, Anel Mahmutovic, Erik G. Marklund, Otto G. Berg, Johan Elf

Biophysical Journal, May 2012 - Generalized Facilitated Diffusion Model for DNA-Binding Proteins with Search and Recognition States - Maximilian Bauer and Ralf Metzler

PNAS, 2010 - A single-molecule characterization of p53 search on DNA - Anahita Tafvizi, Fang Huang, Alan R. Fersht, Leonid A. Mirny, and Antoine M. van Oijen

Biochemistry, 1981 - Diffusion-Driven Mechanisms of Protein Translocation on Nucleic Acids - Otto G. Berg, Robert B. Winter and Peter H. von Hippel
