So it's my pleasure today to introduce Professor Barabási from Northeastern University and Harvard. He's currently a Distinguished Professor at Northeastern and director of its Center for Complex Network Research. He started his career at IBM and the University of Notre Dame, where he was named Emil T. Hofman Professor of Physics at the age of 32. You're in for a real treat today. Professor Barabási is one of the most highly cited researchers in all of science, I would say probably on Earth. He has multiple papers with thousands of citations, and I was quite amazed, when I was googling him, to find out that one of his 1999 papers was cited 15,000 times according to Google. So he's been a major contributor to the development of network science and also the statistical physics of complex systems. He has made landmark contributions to the study of biological networks, protein interactions and metabolic networks. For example, he and his colleagues have shown that the scale-free property of networks exists in biology. More recently, he's been turning his attention to diseases, especially network medicine, and he also coined a new term that you hear about these days, the diseasome, in which diseases are connected in a network if they share a significant number of genes. I also looked at his website, and he has a large number of diverse and interesting projects going on, and with all those, it seems like he can still find time to write books: he's authored popular science books for the general audience, including one called "Linked." Without more delays, I want to welcome Professor Barabási. Thank you. [applause] >> Thanks, everyone.
Thanks for coming. It's a pleasure to be back. We just figured out yesterday and today that I was in this room six years ago in the same capacity, and let us hope that we will not repeat that experience, because during the talk the alarm went off twice. We all had to leave, came back for another five minutes, started again, and so on. So hopefully we're going to have a smoother ride this time. What I really want to talk about, as the title indicates, is network medicine: the particular networks in medicine and how they really come together. The best way to start is with a simple analogy, the analogy of a broken car. A broken car, with a smoking engine and dysfunctional lights, has in many ways many similarities to a human disease.
But there is one huge difference between the broken car and a human with a disease. It's virtually guaranteed that if you take the car to the mechanic, he or she will be able to fix it, and that's not something that we can say about many of our diseases. So the question really is, how is it possible that a mechanic, with far less education and far less income than a medical doctor, can achieve perfection when it comes to fixing your car in most cases, yet we struggle to fix many of our diseases? Of course there are multiple reasons for the difference. This is a softball I'm throwing here, and one of the reasons the mechanic can fix the car, of course, is that he has the pieces; he has the spare parts.
Not only does he have the spare parts, but he knows exactly what the parts are. If you think about it, that is about where we are in medicine too, so that may not be such a major difference: thanks to the Human Genome Project, we know the genes, the proteins, the metabolites; we know the components of ourselves. The other thing that the mechanic has that the medical doctor does not, however, is the wiring diagram, the blueprint of the car: how the different components are wired together, how they interact with each other, where the different pieces fit. In many ways, I would argue this is what medicine has to do: understand this wiring diagram, understand how the components fit together. Because down the line, I view diseases as really corresponding to the breakdown of a region of the network, the disease module that we're about to define, and we will not be able to map out those modules unless we have a good understanding of the network as a whole. Now, the question of course is this: whenever we look at such a diagram (I'm sure many of you have seen the networks in the cell), the first thing that strikes us is the complexity of the wiring diagram: how messy it is, how unpredictable, how random it looks.
Indeed, if we want to characterize it, we need -- the myriad of components within the cell. Some of the -- that really help us to speak about these networks, and some of the earlier models really assumed that these networks are truly random.
In the earliest models, from the 1960s, Paul Erdős said that if you want to model a complicated network like the one you see in the cell, probably the nodes are randomly connected to each other. This is the process I'm showing you: I'm randomly picking pairs of nodes and linking them together. As I increase the number of links, you start seeing pathways that you recognize; if I add more links, some of these pathways will hook up with each other, and eventually, if I add a sufficient number of links (and this was actually one of the major discoveries of the 1960s), then through a -- transition, a connected network will emerge in the system. From the 1960s until today, mathematicians have gone to great lengths to characterize this object that you get by randomly connecting nodes together, and there are lots of predictions that come out of this work. I would like to focus on one in particular, which is what we call the degree distribution. These are the two mathematicians who have done some of the most fundamental work in the area of random networks, and what you see on the left-hand side is an example of such a random network. The question is, how do we characterize this network? One way to do it is to simply look at the degree of each node: this one has degree two, this one has degree four, and so on. If you have a very, very large network like that, randomly connected, different nodes will have different degrees. The best way to characterize it is to look at the distribution, counting how many nodes have one link, how many have 10, and what it shows is a Poisson distribution. What does this mean? It means the vast majority of the nodes have roughly the same number of links, and it's rare to find highly connected nodes or very poorly connected nodes. Now, the question that may have come up in this context is not whether this model is correct or whether this prediction is correct (these predictions are actually proven exactly), but whether we really believe that the networks we see in nature, from the network within the cell to social networks, are really random. Because if we were to assume that society is random, where the nodes would be us and the links would correspond to whom we know, our friends and acquaintances, then in this random universe we would all have approximately the same number of friends, and we would not see outliers: very popular individuals, or people with a very small number of friends. So do we really believe that networks like society are random? To answer that question, we really need to start looking at real networks, so that we can be guided by experimental data. That's what we did about a decade ago, when we started focusing on the World Wide Web. It's a network: the nodes are documents, the links are URLs, and the reason we focused on it was that it could be mapped with a robot, a piece of software that allowed us to figure out where one webpage links to, where other pages link to, and so on, and eventually to obtain a map like that of the underlying network. Now, the question that comes up is: what should this network look like?
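The random-network model described above can be sketched in a few lines. This is a minimal illustration, not code from the talk: it assumes the G(N, L) variant of the Erdős–Rényi model, in which L links are placed between uniformly random pairs of distinct nodes, and the helper names (`random_network`, `largest_component_fraction`, `degree_histogram`, `poisson_pmf`) are hypothetical. It tracks the largest connected component as the average degree grows (the giant component appears around an average degree of 1) and compares the observed degree distribution with the Poisson approximation, which holds for large, sparse random networks.

```python
import math
import random
from collections import Counter

def random_network(n, num_links, rng):
    """Erdős–Rényi G(N, L): add num_links distinct links between random node pairs."""
    adj = {v: set() for v in range(n)}
    added = 0
    while added < num_links:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:  # no self-loops, no duplicate links
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj

def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component (iterative DFS)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj) if adj else 0.0

def degree_histogram(adj):
    """Map degree k -> number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())

def poisson_pmf(k, mean):
    """Poisson probability of degree k, the large-N limit of the random-network prediction."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

if __name__ == "__main__":
    rng = random.Random(42)
    n = 2000
    # Average degree <k> = 2L/N; the giant component emerges around <k> = 1.
    for avg_k in (0.5, 1.0, 2.0, 4.0):
        g = random_network(n, int(avg_k * n / 2), rng)
        print(f"<k>={avg_k}: largest component = {largest_component_fraction(g):.2f}")

    g = random_network(n, 4 * n // 2, rng)  # <k> = 4
    hist = degree_histogram(g)
    for k in range(11):
        print(k, round(hist.get(k, 0) / n, 3), round(poisson_pmf(k, 4.0), 3))
```

The degree counts cluster tightly around the mean, which is the "everyone has roughly the same number of friends" picture the talk contrasts with real networks.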
There's lots of randomness in the way we put the links, because the links follow our interests, and even in this very room there are quite diverse interests: some care about molecular biology, others care about music, and still others may care about sumo wrestling. So, given how we choose the links we put on our webpages, you would expect this network to look random. If it were random, you would see a Poisson degree distribution, yet the data show something different. They show what we call a power-law, or scaling, distribution. The question you may ask is, why is it exciting to have a different function than we expected; does it have any meaning? It turns out it does. Let me give you a visual understanding of how different a random network like the one we built is from the one we see here. A random network visually looks a bit like the highway system, because that is a relatively uniform network. It's not truly random, because technically it's what we call a planar network, but you don't find any major city without a major highway connecting it to the rest of the highway system; it would be a very -- afterwards. The type of network we found is much more like the airline network of the U.S. or Europe, where the nodes are the major airports and the links are the dotted lines between them. You can see the major difference between them: here we have a few major hubs that hold the whole network together. That's exactly what the power-law distribution tells us: we have a very, very small number of hubs that hold the whole network together.
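Why a power law implies hubs while a Poisson distribution does not can be seen by sampling degrees from each and comparing the largest values. This is a hedged sketch, not the analysis behind the talk's figures: it assumes a continuous power law p(k) ~ k^-γ with an illustrative exponent γ = 2.5 and minimum degree 1, sampled by inverse transform, and a Poisson distribution with the same mean; both function names are hypothetical.

```python
import math
import random
import statistics

def sample_power_law(n, gamma, k_min, rng):
    """Inverse-transform samples from p(k) ~ k^-gamma for k >= k_min (continuous)."""
    # 1 - random() lies in (0, 1], so the power is always finite.
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]

def sample_poisson(n, mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    out = []
    for _ in range(n):
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        out.append(k)
    return out

if __name__ == "__main__":
    rng = random.Random(7)
    n, gamma = 10_000, 2.5
    pl = sample_power_law(n, gamma, 1.0, rng)
    # Theoretical mean is k_min * (gamma - 1) / (gamma - 2) = 3; the sample mean fluctuates.
    mean_pl = statistics.mean(pl)
    po = sample_poisson(n, mean_pl, rng)
    print(f"power law: mean={mean_pl:.2f}, max={max(pl):.0f}")
    print(f"poisson:   mean={statistics.mean(po):.2f}, max={max(po)}")
```

With the same average degree, the Poisson sample's largest value stays close to the mean, while the power-law sample produces a few values orders of magnitude larger: the airports-versus-highways contrast in numbers.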
The question is, is it only the World Wide Web, or do other networks perhaps share the same property? Here is another system. This is the Internet. Now, I know that in popular usage they're very similar animals; they're often used as synonyms of each other. But the Internet is a physical network where computers are connected; it's a very different architecture, built in a very different way. When you look at it, you can see that it also has hubs, and in technical terms we call these scale-free networks: networks in which there are a few major hubs and many, many small nodes. It's been shown that the topology of this network follows a power-law degree distribution, and once again, we call networks that have a power-law distribution scale-free. If you want to think about what it means to be scale-free, and you simply go away thinking that it's a network held together by a few major hubs, with many, many small nodes and a hierarchy of hubs in between, you will understand everything I want to say in this talk. Okay, the networks I showed you so far have a common property: they were all man-made. So is this something we do because, for whatever reason, we prefer to be hubs, or is there something more fundamental, something deeper going on here? To answer the question, we have to look at networks that we have not built ourselves: the networks within ourselves. What you see on the left-hand side is the metabolic network; on the right-hand side is the -- network, and both of them actually turn out to have the scale-free property. You can see very nicely on the right-hand side that there are a few major hubs, one here, one there, one down there, and many, many small nodes. If you think about it, this is pretty amazing, because the Web has a history of 20 years, the Internet maybe 50 years, the airline network roughly the same.
This has a history of 4 billion years, and yet over 20 years and over 4 billion years, these networks have arrived at the same underlying architecture; they built up the same mathematical structure. That really raises the question: why? Why do we have hubs in such different networks? To understand that, we have to go back to the random network model and ask what could be wrong, what we are missing from the model. The answer, it turns out, is that there are two assumptions in the model that are not borne out in reality. The first is that in the random network model, we assume we have a fixed number of nodes that we then connect with links randomly; the number of nodes doesn't change while we're adding the links. In reality, most networks are growing objects. They start from a few nodes and they add new ones, and the Web is the best example, because it really started out with one node 21 years ago and now has more than a trillion. How do you go from one to a trillion? One node at a time, always adding new nodes to the system. So growth is an inherent property of these networks. If you want to model the network, you should start out with a small network and start adding new nodes to it. If you do that, you have to answer the following question: how do I decide which existing nodes the new node connects to? That actually takes us to the second assumption, because in the random network model, we assume that we choose random nodes to connect to, while in reality we find that new nodes have a preference towards the more connected nodes. This is what we call preferential attachment. If you put these two rules together, you can see how a network emerges and how naturally hubs show up in the system; the emergence is very clear.
If a new node coming into the system has to choose between a highly connected node and a less connected one, then because of preferential attachment it's much more likely to choose the more connected one. Therefore the more connected one will grow faster than the small one, and will not only maintain its lead but get bigger and bigger. So this rich-get-richer phenomenon is really responsible for the emergence of the hubs, because it satisfies the two requirements you need for a scale-free network: these networks are the result of some kind of growth process, whether it took 20 years or 4 billion years to emerge, and they have some degree of preferential attachment. In the case of biological networks, we know that gene duplication is a mechanism through which we get this preferential attachment.
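The two ingredients, growth and preferential attachment, can be combined into a short simulation. This is a hedged sketch in the spirit of the Barabási–Albert model, with assumed parameters (`m` links per new node, a small fully connected seed) and hypothetical function names. It samples link targets in proportion to degree by drawing uniformly from a list in which each node appears once per link endpoint, a common implementation trick. It does not model gene duplication.

```python
import random
from collections import Counter

def preferential_attachment_network(n, m, rng):
    """Grow a network one node at a time; each new node adds m links to
    distinct existing nodes chosen with probability proportional to degree."""
    if n <= m:
        raise ValueError("need n > m")
    # Seed: a complete graph on m + 1 nodes, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears here once per link endpoint, so a uniform draw
    # from this list picks a node with probability proportional to its degree.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # growth: m distinct targets per new node
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

def degrees(edges):
    """Map node -> number of links."""
    return Counter(v for e in edges for v in e)

if __name__ == "__main__":
    rng = random.Random(1)
    edges = preferential_attachment_network(20_000, 2, rng)
    deg = degrees(edges)
    print("largest degrees:", sorted(deg.values(), reverse=True)[:5])
    print("nodes with the minimum degree 2:", sum(1 for d in deg.values() if d == 2))
    # Rich get richer: the earliest nodes tend to end up as the hubs.
    print("average degree of first 10 nodes:", sum(deg[v] for v in range(10)) / 10)
```

Dropping either rule changes the outcome: with growth but uniform target choice, or with preferential attachment on a fixed node set, the large hubs no longer appear in the same way.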
Now the question is, why do hubs matter? Do we have a simple answer to that? One of the things we learned in the last decade is that hubs really play a major role in the system, and the best way to see this is to explore, for example, one aspect of complex networks, or complex systems in general, which is their robustness. In general, we perceive complex systems to be very robust against failures: the system can continue many of its basic functions even when some of its components are broken. The question is, where does this robustness come from? There are lots of mechanisms, both in the cell and in many other systems, that check the health of the nodes, and if something is broken, they fix it. But the question really is whether the network topology itself could contribute to the underlying robustness of the system. To test that, we can start with a small network and ask: what happens if some of the nodes break down? Well, what would happen?
Here I've removed three of the nodes, and what you can see is that the network broke into tiny pieces. Now, the question is, what if I do it not for such a tiny network but for a much larger, much more connected network? It turns out that mathematics and physics give us a very precise answer to this question. They tell us that if you start from a random network, or from a regular network like a square lattice, removing a few nodes will not matter. But as you start removing more and more nodes, you will approach a critical point that is inherent to every network, and it's calculable. Once you get close to the critical point, the network will actually fall into pieces, and you can never get beyond the critical point, because you have already broken the network into tiny, tiny components. So essentially what percolation theory tells us is that each network has its own threshold for robustness, and if you approach that threshold, the system will break into pieces and become dysfunctional.
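The percolation argument can be explored numerically by deleting nodes and watching the largest connected component. This is a minimal sketch under stated assumptions: the test network is grown with a simple preferential-attachment rule (hypothetical helper `grow_scale_free`), "attack" means removing nodes in order of their original degree, and the analytic estimate uses the Molloy–Reed-based result of Cohen et al. for random removal, f_c = 1 − 1/(κ − 1) with κ = ⟨k²⟩/⟨k⟩. For a finite network this gives a finite threshold; it approaches one only as ⟨k²⟩ grows without bound in large scale-free networks. The comparison with hub-first removal is included for contrast with random failures.

```python
import random

def grow_scale_free(n, m, rng):
    """Hypothetical helper: grow an n-node network by degree-proportional attachment."""
    adj = {i: {j for j in range(m + 1) if j != i} for i in range(m + 1)}
    endpoints = [v for v in adj for _ in adj[v]]  # node repeated once per link
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            endpoints.extend((new, t))
    return adj

def giant_fraction(adj, removed):
    """Largest connected component among surviving nodes, as a fraction of all nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj)

def removal_curve(adj, order, fractions):
    """Giant-component fraction after removing the first f*N nodes of `order`."""
    return [giant_fraction(adj, set(order[: int(f * len(adj))])) for f in fractions]

def molloy_reed_threshold(adj):
    """Random-failure threshold f_c = 1 - 1/(kappa - 1), kappa = <k^2>/<k>.
    Meaningful only when kappa > 2, i.e. the intact network has a giant component."""
    ks = [len(nbrs) for nbrs in adj.values()]
    kappa = sum(k * k for k in ks) / sum(ks)
    return 1.0 - 1.0 / (kappa - 1.0)

if __name__ == "__main__":
    rng = random.Random(11)
    net = grow_scale_free(5000, 2, rng)
    nodes = list(net)
    random_order = rng.sample(nodes, len(nodes))
    attack_order = sorted(nodes, key=lambda v: len(net[v]), reverse=True)
    fractions = [0.0, 0.05, 0.2, 0.5, 0.8]
    print("random failures:", [round(x, 2) for x in removal_curve(net, random_order, fractions)])
    print("hub attack:     ", [round(x, 2) for x in removal_curve(net, attack_order, fractions)])
    print("estimated f_c for random failures:", round(molloy_reed_threshold(net), 2))
```

Removing the same small fraction of nodes typically leaves the network largely intact under random failures but shatters it when the hubs go first.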
It turns out that this property does not hold for scale-free networks. Let's start randomly removing the nodes. What you can see is that the network is shrinking but doesn't really want to break apart. Let's assume you start randomly closing down airports around the United States. Well, which airports would you close? You're probably going to close many of the small airports, because there are so many of them; the chance that you randomly pick Chicago O'Hare is very tiny. So what you will end up doing is shrinking the network, but you will not necessarily destroy it. Indeed, we now have an exact mathematical proof that for a large scale-free network, the critical point, the fraction of nodes you must remove for the network to break apart, is exactly one. What does it mean?
It means you can remove 95% of the nodes and the remaining 5% are still hanging together, talking to each other. But there's a price you pay for that. What if you don't remove the nodes randomly? What if you start with the biggest node, then the next biggest node, and so on? Because the network relies on the existence of the big hubs, and they guarantee its underlying connectivity, if you remove some of the big hubs, the network will break into pieces in no time. So if you compare these two networks, on the left we removed 28 nodes and the network is still intact; on the right, we removed six or seven nodes and the network was broken into tiny pieces. This is what we call the Achilles' heel of these networks: they're robust against random failures, but they're very fragile to attacks. If you know what the network looks like, if you have the wiring diagram, you know how to destroy it. Now, back to biological systems: the question would be, if hubs are so important for the integrity of the networks, could they really be associated with biological function? To test that, about a decade ago we took the network apart piece by piece and asked what happens if you remove small nodes versus big hubs. In particular, we started to ask: do you kill the cell if you remove a small node?
It indicated that if you remove some of the smaller nodes, about 18% of them are lethal. But if you go towards nodes with two or more links, which disturb more interactions, 24% were lethal, and if you went to 15 or more, then 62% were lethal. So what this was indicating is that there seems to be a connection between how much of a hub a node is, how connected it is, and its role in the network, or the cell's ability to survive without it. Indeed, following this, there have been a number of studies showing that hubs evolve slower and that they're much more alike in different organisms, and also work that we did years ago showing that hub removal has many more -- consequences than removing a small node in the case of humans. That is not at all surprising, because the hubs interact with so many components that if you remove them, chances are you will have many more consequences. This actually suggested the following: perhaps in humans the hubs should be the disease genes, because you really can't perturb them; if you do, you will probably cause disease. So we went out to test this. When the data became available for human interactions, we generated the same plot that you see over here, asking: is it true that the more connected a human protein is, the more likely it is to be associated with disease? To our surprise, there was a very weak effect. No effect would be the horizontal line over here; the yellow dots indicate a slightly higher likelihood, but the effect was so weak that we really couldn't put much faith in it. When you start thinking about this problem, you really have to think in a different way, because in humans, as always, things are more complicated, and we need to systematically distinguish essential genes from non-essential disease genes.
What I mean by that: essential genes are genes that are essential in utero, without which the baby could not actually be born. Non-essential disease genes are those whose mutations are associated with a particular known human disease. There is some overlap between them, but the overlap is not significant. So then the question was, if we do this separately for the essential genes and the disease genes, would we see some difference? And there was a major difference.
What we found was that the essential genes correlated very strongly with hubs, but there was no correlation whatsoever when it came to disease genes: they were not hubs. We dug more into the problem, with expression patterns, tissue expression and so on, and to our surprise, the following image showed up. From the network perspective, we can actually define two regions in the network. Let's define the center of the network, where you have most of the hubs. There we also have the genes expressed in many different tissues, co-expressed with many other genes, and that's the region of the network where you see the essential genes. Then there is the network periphery: the not-so-connected nodes, the non-hubs, the genes that are really specific to a tissue and are not co-expressed with many other genes under most conditions. Those turn out to be more of the disease genes. In hindsight, it's understandable what has happened here. If you think about the role of evolution, you cannot really survive with a mutation that significantly affects the function of an essential gene; you will not be around long enough to pass it on to your children. So many of the inheritable diseases would not be affecting the hubs, because without them the network cannot survive, and such mutations will not be passed on in the population. Therefore, most of the disease genes, the ones that we really focused on, are on the periphery of the network, not in the center. Indeed, if you do another check where you look at the cancer genes, you find that in that case they tend to be in the center and not at the periphery. Now, the next question that comes up along these lines is that, you know, this is about looking across all the disease
genes, but how do the disease genes relate to the diseases themselves? Of course, here we don't have the same problem as a typical --. There is actually a -- relationship between diseases and genes: if you pick a cancer-related gene, it can be linked to multiple cancer types, and in the same way, if you pick a disease, there can be multiple genes associated with it. In network language, this is what we call a bipartite network, like actors playing in movies: you can connect actors to the movies they played in. The bottom line is that you have two different kinds of nodes that connect across to each other. If you have a bipartite network, you can define two projections. One is what we call the gene network, where two genes are connected if they are implicated in the same disease, and that's the type of network I will not speak about today. The other is what we call the disease network, where two diseases are linked to each other if the same gene is implicated in both of them. If the same gene is involved in both of them, the two diseases have some common genetic origin, and that's actually a meaningful relationship. So now, what if you play this game not only for -- cancers but for all diseases? That's exactly what we did a few years ago. We ended up collecting all the disease-gene associations, and we built up this network. Let me just walk you through it. Each node corresponds to a disease, and the size of the node corresponds to how many genes are known to be associated with the disease. In this case, we took the stringent data, in which not only has the gene been associated with the disease, but we also know the mutation potentially associated with the particular phenotype. The node color corresponds to the class of disease, and, once again, two diseases are linked if there are one or more genes associated with both. This is what we call the diseasome, which was referred to in the introduction, or the human disease network, and there are a number of conclusions you can draw from it right away. First of all, most diseases, which we may consider to be different, are all connected to each other down to the genetic level, because they're all rooted in defects in a relatively small handful of genes. You can also notice that colors of the same kind end up in the same region of the network, and that's not by design; it naturally came out like that, and it just reflects that similar kinds of diseases share lots of genes and therefore cluster together in the disease network. But the most important thing you can ask yourself is: is this meaningful? If I see a link here between two diseases, should I think of it as something that may be meaningful biologically or medically? In some cases, we know it's meaningful. For example, take that particular link over here, between diabetes and obesity: we know that not only do they share genes, but there's quite a bit of literature showing that the two diseases are deeply related to each other; there are books and journals devoted to that particular link alone. But the question is, does this mean that all these other links have a similar meaning? In this particular case, could it be that if two diseases are connected genetically, that would also imply comorbidity between them? That is, would the microscopic information that we have at the cellular level pop up in the human population? To test that, we started taking pairs of diseases and, for each of those pairs, asking: do they share a gene, are there protein interactions between their genes, and so on. We also turned to the Medicare data, looked at the disease histories over 10 years for 30 million patients, and asked, for the same pairs of phenotypes, whether they are comorbid or not. That is, we measure the relative risk between diseases. The relative risk is one if the two diseases overlap just by chance; it's larger than one if more individuals have both diseases than you would expect by chance; and it's smaller than one if there is some kind of protective mechanism, so that if you get one, you don't tend to get the other. So then what we wanted to ask is: is it true that disease pairs that have this type of relationship, that share genes or have protein interactions between their components, are much more likely to be comorbid than those that don't? If you look at the blue curve, which is the relative risk, then when you look at all the pairs that share at least one gene, there's a very significant increase in comorbidity. So what this is really indicating is that if diseases are linked at the genetic level by sharing genes, that information matters, because the microscopic information shows up in the population as comorbidity. It shows, again, the value of these large patient datasets for asking questions at the microscopic level and relating things together.
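Both steps described here, projecting the disease-gene bipartite network onto diseases and scoring disease pairs by relative risk, can be sketched as follows. The association table and patient counts below are invented toy numbers for illustration only, not data from the study, and the function names are hypothetical. The relative risk uses the common observed-over-expected definition RR = C·N / (P_i·P_j), where C is the number of patients with both diseases, P_i and P_j the numbers with each disease, and N the total number of patients, so RR = 1 when co-occurrence is exactly what chance predicts.

```python
from collections import defaultdict
from itertools import combinations

def disease_projection(associations):
    """Project a disease-gene bipartite network onto diseases.

    associations: dict disease -> set of genes. Two diseases are linked if
    they share at least one gene; the link weight is the number of shared genes.
    """
    links = {}
    for a, b in combinations(sorted(associations), 2):
        shared = associations[a] & associations[b]
        if shared:
            links[(a, b)] = len(shared)
    return links

def gene_projection(associations):
    """The other projection: two genes are linked if they appear in the same disease."""
    links = defaultdict(int)
    for genes in associations.values():
        for g1, g2 in combinations(sorted(genes), 2):
            links[(g1, g2)] += 1
    return dict(links)

def relative_risk(both, with_i, with_j, total):
    """Observed co-occurrence over the count expected by chance: C*N / (P_i*P_j)."""
    return both * total / (with_i * with_j)

if __name__ == "__main__":
    # Hypothetical toy associations (placeholder names, not real data).
    toy = {
        "disease_A": {"g1", "g2"},
        "disease_B": {"g2", "g3"},
        "disease_C": {"g4"},
    }
    print(disease_projection(toy))  # {('disease_A', 'disease_B'): 1}
    print(gene_projection(toy))
    # Toy counts: 1,000 patients; 100 have A, 50 have B, 20 have both.
    print(relative_risk(20, 100, 50, 1000))  # 4.0, four times the chance level
```

Comparing the relative-risk distribution of linked pairs with that of unlinked pairs is the kind of test the comorbidity question calls for; real analyses would also need confidence intervals, which this sketch omits.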
What this is suggesting to us is that if we really want to understand comorbidity better, and many other things, we need to start looking at the microscopic level. We need to start looking at how the components within the cell link to each other, and what you see here is actually the latest protein interaction network that Marc Vidal came out with, which shows the relationships between the --, and the challenge, of course, is this: if this is so messy, how do we look at particular mechanisms, how do we think about local effects? There's one more concept that I need to introduce before we can actually go back to the cell, which is the concept of community. What network science increasingly indicates is that whenever you see a large network like that, there are local communities within the network. These are groups of people who tend to know each other because they're in the same department or the same family, or they go to the same club. Or these are proteins or genes that interact with each other because they are responsible for the same function. What has happened during the past five years is that, although identifying these regions is what we call an -- problem, lots of new algorithms have popped up that allow us to identify these locally dense regions in the network. What do I mean by that? This is, for example, a very small region of a phone network, where the nodes are the phone numbers and the links show who is calling whom, and you see it's messy. Imagine if we were to draw all of it: it would be completely impossible to make sense of it. But by using some of these algorithms, we can actually go into any neighborhood and identify where the communities are; that is, we can zoom in and replace the mess with local neighborhoods, claiming that these nodes belong to the same neighborhood, and so do these, and so do these. Technically, the definition of a community is that nodes within the community have more links to nodes within the same community than to nodes outside of it. So there's a local density there.
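The definition of a community given here can be checked directly for any candidate group of nodes. In this hedged sketch, `is_strong_community` tests the node-by-node reading (every member has more links inside the group than outside), and `is_weak_community` tests the aggregate reading (summed over members, internal link endpoints outnumber external ones); both are standard formalizations from the community-detection literature, used here as an assumption about what "more links within" means. The function names and the toy call graph are hypothetical. The sketch only checks a given group; it does not find communities, which is what the detection algorithms mentioned in the talk do.

```python
def internal_external_degrees(adj, group):
    """For each node in `group`, count neighbors inside and outside the group."""
    group = set(group)
    return {v: (len(adj[v] & group), len(adj[v] - group)) for v in group}

def is_strong_community(adj, group):
    """Every member has more links inside the group than outside it."""
    return all(k_in > k_out for k_in, k_out in internal_external_degrees(adj, group).values())

def is_weak_community(adj, group):
    """Summed over members, internal link endpoints outnumber external ones."""
    degs = internal_external_degrees(adj, group).values()
    return sum(k for k, _ in degs) > sum(k for _, k in degs)

def undirected(edge_list):
    """Build an adjacency dict of sets from (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

if __name__ == "__main__":
    # Hypothetical toy call graph: two triangles joined by a single call (3-4).
    calls = undirected([(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)])
    print(is_strong_community(calls, {1, 2, 3}))  # True
    print(is_strong_community(calls, {2, 3, 4}))  # False
```

The strong version is stricter: a group can pass the weak test while a single boundary node still has more outside contacts than inside ones.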
Now that we have the tools to find the communities, the question is, do they matter? We often call them modules. In the case of the phone network, we can check whether these modules are meaningful, because we know not only who is calling whom but also when they call. I'm going to show you the same network in two projections. To do that, I used one more piece of information that I didn't have before: how many calls people made with each other during certain hours of the day. So a link's shading shows how much the two people talked at midnight, and if it was white, there was no conversation whatsoever between them at midnight. Now you see the same module shown at midnight and at noon, and you see fundamental differences between the two: this module is sleeping at noon but very busy at midnight.
The other module is the opposite: sleeping at midnight, busy at noon. What is important here is that the activity pattern seems to be very much confined to the particular community, so people in the same community seem to behave the same way at the same time. This indicates that the modules we are finding from the network topology alone are really meaningful when it comes to the function of the particular system. So with that baggage and with the tools in hand, let's go back to biological systems and start looking at biological networks and how this is actually playing out. I'm going to use yet another analogy to really understand that: let's use a network that we're all familiar with to put these pieces together.
The network would be Manhattan. Join me in believing that this is not a city map but the map of the cell, where the intersections are the proteins and the roads correspond to the interactions. Now, what does it mean to have modularity in the network? Well, if you go to Manhattan and want to go to the theater, you don't just go anywhere: you go to the theater district, Broadway, where most of the theaters are in one particular neighborhood. Should you want to buy artwork, you once again don't wander randomly; you go to the neighborhood of 21st Street, because that's where most of the major galleries are, and that's where you find quality art. So down the line, if you look carefully, you realize that in most major cities (Manhattan is a very good example), many functions are very compartmentalized, in very well-defined regions of the network. So if this were a map of the cell, what would that mean? It would mean that not only the function but also the breakdown of the function would be compartmentalized. If that is true, then, taking this analogy further, we could say, for example, that cancer is somewhere on Wall Street, and bipolar disease would be in the Times Square neighborhood, the Washington Bridge, and the Bronx. The idea is that we should really be able to identify these neighborhoods by looking at the network: where the functions are, and how the breakdowns of the functions correspond to particular neighborhoods. If you take this picture further, what does it mean to have a disease? Well, Manhattan has a phenotype called a traffic jam. There are many ways you can cause a traffic jam, but down the line you're creating a dysfunction by closing down different combinations of streets, and the phenotype would be the same: there is no traffic moving in the city.
In a way, this is very analogous to what we see in cancer: when you look at colorectal cancer patients, their mutations are in very different genes. The way to think about it is that it's not the gene but the function that you are destroying, and the module carries the function. There are many different ways you can actually disrupt that particular module and achieve essentially the same phenotype. So if that analogy is correct, then it's easy to see what we need to do next.
get the map, find the disease module and drug it and we're done. but of course by now you realize that there is a catch in this whole thing. we don't really have a map. let's talk about what are the consequences of not having a
map. one of the areas i worked very closely with mark is the -- interaction network, we helped him develop a methodology that we can estimate how links we are listing from the protein connection network that we currently have and how much
effort would it be to get the full one. what we no is currentl we knowcurrently, you present about 20% of the links that should be there in the cell that is discoverable with the current techniques whafplt does i.what does it mean to have only 20%?
it means we're missing 80%. what you see is that manhattan would actually become effectively unrecognizable. this is the type of accuracy and completeness that we're really dealing with when it comes to cells. of course -- is denser than
manhattan, so it would not break totally apart, but down the line, by missing 80% of the links, you know, we are really looking at a very fragmented picture of reality. so then the question is, what can we do? should we just simply -- mark
and many others to really go and get us better maps, and then restart our thinking? and the answer is no, because we believe, and actually there's lots of evidence for that, that the existing maps are already very prolific. so what we can do while we're waiting for mark to
finish his work is to really use the predictive power of the existing links and the existing maps to learn about diseases. so what i'm going to show you next is how we do that, how we go through this particular program, very briefly, in the case of -- or any other disease.
so the idea is the road map shown over here. the first goal is to build as accurate and as complete a map as we can of the current interactions, and the rest is really literature and many other -- the type of things that ncbi --. the second is to take the known disease
components, the genes we know about, on the network and identify where the potential module is. once you have this module, you use the other genes in the neighborhood of that module that could potentially be relevant to the particular disease to lead you all the way
to target prediction, mechanism prediction and so on. now, the catch is that if you try to do that for any disease, you will encounter the following situation. so we took about 140 genes that were in the literature associated with -- with some stringency,
and we put them in this network i was showing you earlier. of the 140 genes, only 37 actually were connected to each other; the rest were all over the map. now of course it shouldn't surprise us any longer that they're all over the map, because
we're missing 80% of the interactions or even more. therefore, they could actually be part of it, except we don't know whom they interact with or how they're actually connected up. but -- and this is important -- the 37, even though it seems to
be small, is not small at all. it is very highly significant, because if you were to throw the same number of genes randomly on this map, you would only expect about four or five genes to connect into a single cluster. the fact that we have 37
is telling us that we are actually seeing the location where potentially the -- should be on the map. so let me just illustrate how this looks. so this is the full network, and i highlighted on the network now the --, and you can see that some
of them seem to be scattered, some of them seem to be in roughly the same neighborhood, and we think that that would correspond to what we would call the -- module. there are probably many other genes that belong to it, but we can use this
network information to try to discover them. the way we do that is we start from these genes and say, let's look at the neighborhood of these proteins and find those that connect to lots of them, because they're potentially related to
the same disease. to achieve that, we developed different algorithms. the basic idea of the algorithm is that if this is your -- module, and you have a protein that's connected only to --, and you have another one that connects to -- and a bunch of other ones, the first one is much more
likely to be --, because the second may have these interactions not because it's specific to --. so with this procedure, you can go ahead and fill out the neighborhood of the proto-module that we see, and if you do that, this is roughly what you're actually going to see happening in front of your
eyes: we are effectively filling in the module, finding the neighborhood of genes that may be associated. some of the other genes that were disease associated now actually get connected up as well, because you put genes in between that now link them up.
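The neighborhood-filling step just described can be sketched in code. The talk does not spell out the algorithm, so the scoring rule here is an assumption: each candidate neighbor gets a hypergeometric p-value for having at least k_s of its k links land on the current module, which favors proteins whose links concentrate on the module and penalizes hubs that touch it only incidentally. The network is a plain dict mapping each protein to the set of its neighbors; `connectivity_pvalue`, `expand_module` and the toy network are hypothetical.

```python
# Sketch under stated assumptions; function names and toy data are hypothetical.
from math import comb


def connectivity_pvalue(k, ks, s, n):
    """Probability that a protein with k links has at least ks of them on a
    module of s proteins, if its partners were drawn at random from n proteins
    (upper tail of the hypergeometric distribution)."""
    hits = sum(comb(s, x) * comb(n - s, k - x) for x in range(ks, min(k, s) + 1))
    return hits / comb(n, k)


def expand_module(graph, seeds, steps):
    """Greedily add, one at a time, the module neighbor whose links are most
    significantly concentrated on the current module."""
    module = set(seeds)
    n = len(graph)
    added = []
    for _ in range(steps):
        # Candidates: proteins adjacent to the module but not yet in it.
        candidates = {v for u in module for v in graph[u]} - module
        if not candidates:
            break

        def pvalue(v):
            return connectivity_pvalue(len(graph[v]), len(graph[v] & module), len(module), n)

        # sorted() makes tie-breaking deterministic.
        best = min(sorted(candidates), key=pvalue)
        module.add(best)
        added.append(best)
    return added


# Toy network: X links only to the seeds A, B, C; hub H touches A plus five others.
toy = {
    "A": {"B", "X", "H"}, "B": {"A", "X"}, "C": {"X"},
    "X": {"A", "B", "C"},
    "H": {"A", "P1", "P2", "P3", "P4", "P5"},
    "P1": {"H"}, "P2": {"H"}, "P3": {"H"}, "P4": {"H"}, "P5": {"H"},
}
print(expand_module(toy, {"A", "B", "C"}, steps=2))  # ['X', 'H']
```

On this toy network the protein linked only to the seeds is added before the hub, which matches the preference the speaker describes; a real analysis would also need a stopping rule, which this sketch leaves to the `steps` argument.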
so now we have the hypothetical disease module. the question is, is this relevant? and the way to test it is to start looking at all the different kinds of data available to us, from gene expression data to epigenetic data, simply asking of the new
genes that we add to the module: do they show relevance -- compared to the seed genes that we started from? the experiment to establish -- generally we find that they are typically in the same neighborhood regarding the relevance to --, they have roughly the same
expression factor, they roughly show the same --, and so on. so this is the evidence, and i'm not going to go very deep into that, but down the line what we find is that many of the genes in the neighborhood of the -- module are biologically
indistinguishable -- data in their signature from the existing -- genes. not only that, but we also find that when we -- patients with -- or with no asthma, and this is what i'm showing here, this is actually the asthma module that we
identified, the one that was shown in purple earlier, and then we actually treated both normal cells as well as asthmatic cells, then we asked the question: where would the expression changes emerge, inside or outside of the module? and what we find is that the
effect -- is very significantly localized within the module, both in healthy as well as asthma patients, which means it's really hitting some regions of the map. but not only that, it's really hitting certain very well-defined regions of
that module, and this may actually give us some -- about both the effectiveness of certain drugs as well as population -- regarding response to the drugs. the way you would think about the response pattern is that if this is a disease module, and if we are approximately right, then
the breakdown of this module in many different ways could lead to the same phenotype, and there are different regions you can break down, and if -- happens to be disturbing the same region where you actually have the problem, you may respond to it; otherwise, you may not.
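The significance argument made earlier, 37 of roughly 140 literature genes forming one connected cluster against about four or five expected for the same number of randomly placed genes, amounts to a randomization test. A minimal sketch, assuming an adjacency-set dict and genes sampled uniformly at random; published analyses may use degree-preserving randomization instead, and `largest_component_size`, `lcc_zscore` and the toy network are hypothetical.

```python
# Sketch under stated assumptions; function names and toy data are hypothetical.
import random
from statistics import mean, pstdev


def largest_component_size(graph, genes):
    """Size of the largest connected cluster formed by `genes`, using only
    links between genes in the set. Every gene must be a key of `graph`."""
    genes = set(genes)
    seen = set()
    best = 0
    for start in genes:
        if start in seen:
            continue
        # Depth-first search restricted to the gene set.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in graph[u]:
                if v in genes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def lcc_zscore(graph, genes, trials=1000, seed=0):
    """Compare the observed largest cluster with clusters formed by the same
    number of genes drawn uniformly at random from the network."""
    rng = random.Random(seed)
    all_nodes = sorted(graph)
    observed = largest_component_size(graph, genes)
    null = [largest_component_size(graph, rng.sample(all_nodes, len(genes)))
            for _ in range(trials)]
    mu, sd = mean(null), pstdev(null)
    z = (observed - mu) / sd if sd > 0 else float("inf")
    return observed, mu, z


# Toy network: five mutually linked genes plus 25 unconnected ones.
clique = {0, 1, 2, 3, 4}
toy = {i: (clique - {i}) if i in clique else set() for i in range(30)}
observed, expected, z = lcc_zscore(toy, clique, trials=200)
print(observed, round(expected, 2), round(z, 1))
```

A large positive z-score says the gene set clusters far more than chance would allow, which is the sense in which 37 is "not small at all".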
so it's a way to zoom your attention into certain neighborhoods to understand the behavior of the system. so what does this all mean? well, what this all means, at the end, if you want to think about what the future of medicine is, and if you
are willing to accept a word from a physicist like myself who's not a medical doctor, is that we're going to have to learn to think in terms of this network. we're going to have to learn that diseases are well -- in certain regions of this network. and at some point, just as, if you
remember, 10 or 20 years ago in most biochemistry labs there was a --, we would have some kind of representation of the -- network with the different diseases marked on it: this is where cancer is residing, this is where asthma is residing, this is where other
diseases are residing. and i think it's fully doable, because all the data we have access to indicates that disease genes are clustered in -- network neighborhoods, and the reason we don't see them all in one neighborhood is that the networks are very
incomplete. it also means that, you know, if this is true, then diseases that are alike should be close to each other, and we see evidence of that. so we started looking at copd, which has very similar symptoms to asthma, and you can see clearly
that they're in the same neighborhood, which is not surprising at all. we would think that diseases that have very similar characteristics should be residing somehow in the same neighborhood. and one of the things we're
doing in the lab follows from that: we really believe that measuring the distance between diseases in the existing map should give us a very precise measure of how similar or different they are from each other, one that should correlate with many characteristics, from gene
expression patterns all the way to symptoms and co-morbidity patterns. we think this is the distance that really matters, how far apart you are in the network, and the currently available methods that we have are really just -- approximations toward that.
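The distance idea can be made concrete. The talk gives no formula, so the measure below is an assumption: a separation score s_AB = <d_AB> - (<d_AA> + <d_BB>)/2, where <d_AA> is the average shortest-path distance from each gene of disease A to the nearest other gene of A, and <d_AB> is the average distance from each gene of A or B to the nearest gene of the other disease. Negative values suggest overlapping neighborhoods, positive values well-separated ones. The dict-of-sets network and all function names are hypothetical.

```python
# Sketch under stated assumptions; function names and toy data are hypothetical.
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in links) from `source` to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_closest_distance(graph, from_genes, to_genes, exclude_self=False):
    """Average, over `from_genes`, of the distance to the nearest gene in `to_genes`."""
    total = 0
    for g in from_genes:
        dist = bfs_distances(graph, g)
        reachable = [dist[t] for t in to_genes
                     if t in dist and not (exclude_self and t == g)]
        if not reachable:
            raise ValueError(f"{g!r} cannot reach the other gene set")
        total += min(reachable)
    return total / len(from_genes)


def separation(graph, a, b):
    """s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2 for two disease gene sets a and b."""
    d_aa = mean_closest_distance(graph, a, a, exclude_self=True)
    d_bb = mean_closest_distance(graph, b, b, exclude_self=True)
    # <d_AB>: pool the closest-distance values of every gene in a and in b.
    d_ab = (mean_closest_distance(graph, a, b) * len(a)
            + mean_closest_distance(graph, b, a) * len(b)) / (len(a) + len(b))
    return d_ab - (d_aa + d_bb) / 2


# Toy network: a simple chain 0 - 1 - 2 - ... - 7.
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j < 8} for i in range(8)}
print(separation(chain, {0, 1}, {6, 7}))  # 4.5: far apart
print(separation(chain, {0, 1}, {1, 2}))  # -0.5: overlapping
```

Because the score subtracts each disease's own internal spread, it compares "how far apart" with "how tightly clustered", which is one way to read the speaker's point that raw distance in an incomplete map is only an approximation.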
so given that, let me switch gears and ask the question: do these networks matter, and can we really show how they actually work? one little study we did in this direction was to look not at the molecular network. everything i showed you so far was at the
molecular level; all the links were physical, the ones you would look at in the lab. instead we looked at the co-morbidity map: we simply used the medicare data and asked which diseases are co-morbid. so on this map, two nodes are
connected if they show a statistically significant co-morbidity between them, and if you look at the map, you can see very clear, interesting patterns coming up. you see many of the -- are in one cluster, and the cancers are all in a completely different
universe, disconnected from that. so this is kind of highlighting that we're dealing with very different kinds of diseases, because co-morbidity patterns are fundamentally different. but most important, we can ask this question: should you have a choice, which disease would you
want to have? would you want to be somewhere in the center of the network, to get lots of attention, or -- that nobody has studied much about? you would prefer to be in the periphery. and the way we did that is that we looked at the -- degree, how
many links they have to other diseases through co-morbidity, and then we looked at the rate of survival after eight years, or the likelihood of not surviving. what you see is that the more connected a disease is, the less likely it is that you will survive after eight years.
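The co-morbidity map can be sketched from patient records. The talk says only that two diseases are linked when their co-morbidity is statistically significant; the sketch below assumes relative risk, RR_ij = C_ij * N / (P_i * P_j), as the co-morbidity measure and uses a simple threshold in place of a significance test. Here N is the number of patients, P_i the number diagnosed with disease i, and C_ij the number diagnosed with both. The resulting degree per disease is the quantity the speaker compares with eight-year survival. Function names and the toy records are hypothetical.

```python
# Sketch under stated assumptions; function names and toy data are hypothetical.
from collections import Counter
from itertools import combinations


def comorbidity_network(patients, min_rr=1.0, min_cooccur=1):
    """Link diseases i and j when their relative risk exceeds `min_rr`.

    patients: list of sets, each holding one patient's diagnosis codes.
    Returns {(i, j): RR_ij} with i < j.
    """
    n = len(patients)
    prevalence = Counter()   # P_i: patients diagnosed with disease i
    pair_counts = Counter()  # C_ij: patients diagnosed with both i and j
    for diagnoses in patients:
        prevalence.update(diagnoses)
        pair_counts.update(combinations(sorted(diagnoses), 2))
    edges = {}
    for (i, j), c_ij in pair_counts.items():
        rr = c_ij * n / (prevalence[i] * prevalence[j])
        if rr > min_rr and c_ij >= min_cooccur:
            edges[(i, j)] = rr
    return edges


def disease_degrees(edges):
    """Number of co-morbidity links per disease (diseases with none count as 0)."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


# Toy records: a and b co-occur more than chance; a and c less than chance.
patients = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"c"}, {"d"}]
edges = comorbidity_network(patients)
print(edges)
print(disease_degrees(edges))
```

With real records, the degrees from `disease_degrees` would then be set against eight-year mortality for each disease; that comparison, and any proper significance test for the links, are left out of this sketch.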
so that of course makes lots of sense: if you're getting one of the central diseases, that will probably trigger a -- systemic breakdown. and we'll probably end the story there. in summary, what i would like to say is that i'm personally a big
believer that we need to kind of refocus our attention. this network is very incomplete, but we can get there and make it more complete. it requires resources; i'm not the person to make it complete, this is really fundamental experimental
work --, and it can put as much power in our hands as the genome project has provided us. i also believe there may be a time when you go to the doctor and he will show you: let me tell you why you have the problem with that particular mutation that whatever
sequencing project told you about. it's because that mutation is in this part of the network; these are the pathways and diseases in that particular neighborhood, and that's the reason why you have to watch out, whatever, and these are the measures you have to take,
because that particular neighborhood affects -- causes particular diseases. we also have a responsibility to really rethink everything about diseases, including disease classification, from the perspective that really matters, the wiring diagram of the cell.
i'm not saying that we should stop training cardiologists and neurologists, but we should certainly make each of our doctors a bit of a networkologist as well. so the question is network medicine, which is my title: do we have a chance to really go
down this path? i'm very encouraged by the fact that, while much of the thinking about molecular networks was really delegated to a small network of --, in the last five years medical doctors have really taken up the slack and really propagated the idea that we need to
develop a network understanding of disease. a good illustration of that is the fact that harvard is in the process of actually starting a new division, the division of network medicine, that is hiring new faculty and staff to really develop network
thinking about diseases, from the molecular level all the way to -- doctors in their day-to-day practice. so how do we get there, how do we make this useful? the traditional thinking is this might be the layer: we have diseases that may connect to
each other through co-morbidity, and by now nobody is surprised by the fact that there is an underlying network that determines that. but what the relationship between them is, is really the question for all of us, and i think we really need to make that relationship really clear,
very predictable and very useful. but also it will not be enough to stay at the molecular level, so we need to systematically start exploring yet a higher layer. a good example: social networks also affect the
chance of obesity. but it's not only about the social networks; we should be able to include the environmental effects and build them into the picture to see how they actually affect the diseases as well as the molecular processes down the
line. so if we succeed, you know, coming back to the first -- i showed in the talk: how can we make the doctor the mechanic? the parts list is already --; the genomics has provided it to us. network medicine aims to provide the blueprint, the one that the
mechanic has in his hand to see how the pieces fit together. but to use that, we also need to have diagnostic tools, and there's lots of network -- and correction --, and eventually we have to do what the mechanic does, which is replace the part. but we have to
build all this vertical process in order to be as effective as a mechanic fixing our car. thank you very much for your attention. >> i arrived a little late, but i soaked up a great deal of that presentation, and we have time
for a couple of questions. >> as you were speaking, i was remembering a wonderful talk, which actually francis was there for, that jerry -- gave in defense of eric lander and the dark matter that's hidden in the genome in relation to gwas.
in those experiments, what jerry did was make auxotrophic lethal mutants and then show that on different backgrounds of yeast those were no longer lethal. so my question really has to do with the robustness of your model, and in particular, i would make the new york analogy,
which is: i need some artwork, i go to chelsea, and i can't get through over to that side and it's all blocked down towards soho, but it happens that i know there are a couple of galleries up on the upper west side, and so i can still get my artwork.
so how much of this is really gray, sort of in-between stuff? >> yes, i mean, of course there are lots of gray areas of dark matter in the cells, rnas and lots of other interactions i haven't even touched on, because we don't have systematic maps. regarding your analogy, i think
if you really want to buy artwork, you would go to 21st street, whatever investment you would have to make for that. but i really think that this question can be addressed, and there are examples where we have addressed it. one example is
the metabolic network. i was involved in a set of experiments where we did -- under many different conditions for e. coli, and you find a different set of genes to be essential in each case. then you look at the metabolic network, and you realize that
indeed, in the different conditions, different sets of enzymes are necessary to produce the bio --. so if you want to go back to the new york analogy, wall street is very busy during the day and dead during the night, and if you somehow hit
wall street during the night, you will not have such an impact as during the day. so one way or the other, if you start looking at environmental differences, if you start looking at temporal aspects as well, you should be able to account for those variations
that you were mentioning. but dark-matter-wise, there's lots. >> it's just that your measures of robustness, i think, need to be better developed, so as to give us an idea of how precise the map needs to be. >> thank you, dr. barabasi. over the last 10 years or so,
in the work that you've done, what would you say has surprised you most? >> a number of things. one of them is the dark matter, actually. the dark matter we were referring to, in the sense that there are lots of interactions
that were not on the radar screen, like the role of rna and other things. we don't have systematic -- as we have for protein interactions to assess their importance. i was also surprised by the universality of the network structure that we see
across different organizations. 10 years ago we had lots of big arguments -- whether these networks would be like -- with the internet, and you know, until we made -- we had no clue. you could really argue both ways: evolution -- make the biological
networks very different from the world wide web, yet they turn out to be very similar. this universality, i think, -- built the community in --. it is truly interdisciplinary, because we realize that our problems are very common. the computer scientists working
on networks struggle with problems very similar to those of the biologists. >> so is this dark matter a cousin of the term that astrophysicists use? >> i'm using it as a metaphor to say there are lots of interactions that we do not even know
exist. here i refer to the type of interactions that we don't even know are possible, like the rna -- the role of rna is one example. >> okay. that's interesting. >> hi.
thanks for that very interesting talk. i have a question regarding the disease network that you showed, where you connect diseases when they have common genes. logically it makes sense that, you know, diseases that have a common phenotype would most likely have common genes, but i
would like to ask: is there any selection bias that you came across, for example, you know, more for different types of cancers, and if you did see these types of biases, how did you deal with them and construct --
>> so absolutely, there's lots of selection bias. there is selection bias in the way that -- are built, because people focus attention on their favorite diseases. if you need to get cancer -- you've got to connect the genes, you've got to find the link. that may
not be true, but effectively there's -- so many links. is that biologically really true? not that it doesn't have them; lots of other genes have that many links as well, except we haven't really focused so much on them, so we don't have that. so there is that one.
there is also, of course, the selection bias regarding the diseases they look at, the frequency. the very -- diseases obviously are much more explored; we know much more -- associated and so on. -- diseases much less, with some exceptions.
so the biases are there. it always depends on the question; there is no generic answer. so when you try to address a particular question, whether it's a disease-associated question or a -- property, you need to start asking how you classify --. for
example, one way is to say i'm not going to use literature, or i'm going to use literature but i will try to check my conclusions on high -- data that were collected under similar circumstances, whether it's coming from mass -- or -- hybrid, and see whether the effect is
there as well. so for each of the questions that you ask, you need to think: can i build a control that would really put me -- that the effect we're seeing is genuine? and to be honest, i would say, you know, four out of five
effects we see in the lab are tossed out at that moment, because they just don't stand up; it seems to be a literature bias or something like that going on. very disappointing, but hopefully you find the fifth one that is really great, and you publish it and you're fantastic.
>> thank you. >> a few more questions. >> thank you, i'm john from the national cancer institute. because a lot of our toughest medical problems involve chronic inflammation, it's very natural that you -- to inflammatory --. it was a little bit difficult to
interpret the results because they were also both lung diseases, so i'm curious if you have even preliminary results on any other inflammatory diseases of other organs, say inflammatory bowel disease or one of the others? >> yes.
so we actually have a project that started a year ago, funded by nih, where we are exactly supposed to look at some of the inflammatory -- inflammation, and we're attempting to build essentially a disease network that could be -- inflammation, the same as we did for asthma.
we do have -- we are through the -- selection process, we're working with a number of patient data --. we had a meeting about that actually today, which i had to miss because i was here, so it's ongoing. there's not much i can report other than that we're working on
it, and i'm hoping that within six months to a year we're going to have a paper as well. >> last question. >> very quick one. thanks for the great talk. so in your talk, you sort of presented the network as a fairly static entity.
but as you know, a lot of the interactions could be conditional on lots of other variables. so going forward, what's your thought: how do we encode that, or how should we go about it, given that experimentally we cannot measure everything under every single
condition? >> i mean, one of the things we're actually doing right now is that we want to simply use gene expression data for different tissues and -- tissue-specific networks, and to see whether, for example, the disease -- associated with a particular
disease that is not -- and so on. so that's one of the things that we are -- immediately. -- have to have data actually -- kind of harder to get, but the closest we could go to this is really to look at different tissues. this network that i showed, we think of it as
being the skeleton, which would be like kind of the -- all the roads are there. it doesn't mean that anybody is walking or driving on them. so if you were to take a snapshot of the traffic, you know, at any moment, certain parts of manhattan would be deserted and
others would be busy; that's what the tissue -- would probably give us, and there may be temporal effects as well. so right now we're approaching that issue, and that's where i'm really optimistic -- tissues, actually, for a specific project, but we'd like to make it a
bigger, systematic project. >> well, if you'd like to extend your own personal network in the library, that is an available option. please join me in thanking our speaker again for a presentation that was quite inspiring.