Offshore Job Hierarchy



alan: so [inaudible] actually invited me to dinner with jeff here a couple weeks ago. and i think we probably talked for three hours there. and we probably could have talked for 10 hours.




the area that he's interested in is one that larry's also interested in, which is machine intelligence. and how do we push the next generation? and it's obviously really important to our company.


but he has an entire career where he's basically built, from graduate school on, a mechanism to basically allow him to study what is, i think, one of the most fascinating fields that any of us will ever be involved in. and it's biologically inspired machine intelligence. and for me, it combines two really, really interesting things. one is obviously computing and how we compute and things like that, but more importantly how we think, how humans think.


and i think what was really exciting to me is the advances made in biology to understand what the mind does and how it processes things and the different mechanisms it uses. and what really came to me was it's an entirely different computing model. and i think that's what's so exciting about it. so i won't go into the long and illustrious history. i'll let jeff do that. but i am just so pleased and happy, both that he could come


and all the interest in the room here. so thank you very much. jeff hawkins: thanks, alan. audience: [applause] jeff hawkins: i should be good with this mic. hopefully, you all can see me. well, thanks for coming out here and just keep piling in and sit wherever you want, i guess. it's a pleasure to be here.


i'm not going to tell you about my long career. i'll skip all that. i've had two careers-- one in mobile computing and now another one in brains and neuroscience. but really, it's the brains and neuroscience that's been the one that's been going on for a long time. so i'm going to talk about machine intelligence. i'm going to talk about what it is. i'm going to talk about how i believe we can get there and


what we're going to do when we get there. and some of the parts of the talk will be pretty technical. some will be less so. i'm going to start off with a little sort of intro history here. my approach to this-- hopefully, this is going to work now-- let's try this. oh, i had that down. that's it.


my approach to this has been the following. it's a very biological approach. it's first to discover the operating principles of the neocortex. and once we understand those principles, we can then build machines that work on those principles. and the neocortex, just in case you've forgotten, it's the big, wrinkly thing on the top of your brain. it's about 75% of the volume of your brain.


it's where all high level thought, language, vision occurs. anything you can tell me about the world is stored in your neocortex. mine's speaking. yours is listening. that's an organ of intelligence. our approach is like the following-- oops. oh, i'm going backwards here.


i got to get used to this. our approach is the following. we start with very detailed anatomy and physiology. i'm more of a neuroscientist than i am a machine learning person in some ways. i view the brain as a set of constraints. this is the one proof case we know that this can be done. and we know a lot about the anatomy and physiology of the brain and all kinds of stuff.


and if we're going to have a theory about what intelligence is, it has to be consistent with that. we don't have to emulate all of that. but we at least have to realize that that is a set of constraints. from that, we develop theoretical principles. we can test those empirically back in neuroscience. but we can also test them by building the stuff in software and experimenting with it.


this is what we're doing today. ultimately, this is going to go into silicon. we've actually got one project starting this year with another major computing company to do some silicon of our algorithms. but today, it's all in software. now, i thought-- i added a few slides here at the beginning of my talk, which i don't normally do.


i wanted to give you a little sort of historical perspective because i know there's a lot of people here in the machine learning community, and some real experts. and i just wanted to give you a little bit of my view of where we've come from and how we're getting there. so i'm going to go back in time. we're going to start a history of ai. this guy alan turing is where i began in ai. starting in the 1930s, he started talking about


computers as universal machines-- the universal turing machine, as we call them now. and he was very much interested in computer intelligence. and he thought we can make computers intelligent. he didn't want to get in arguments about it. he wrote about this. he said, i didn't want to argue with people whether it's possible, whether machines can be conscious.


so he said, i came up with the idea of the turing test, specifically. so he just said, look, if a machine could pass the turing test, we'll just have to agree it's intelligent. and so that's what he did. and he wrote that paper in 1950, the one that's shown right here. and this is sort of the beginning of the ai movement. unfortunately, what this did is this set the idea that the


goal of ai is to do things that humans do. and i don't think that's the goal of machine intelligence. we don't have to replicate what humans do. we can do things that are superhuman or less. it's not that specifically. so the field of ai went on for many years. it's still going on. i call it the no neuroscience approach. there's been many projects and techniques developed over the


years, some quite successful, some less so. i've shown some pictures of some of the great achievements here-- playing chess, winning at "jeopardy." i put the google car there. i want one-- major initiatives. my summary of this is the good news about it, you can come up with good solutions to problems we care about. the bad thing is that they're very task specific.


they're kind of brittle. the google car doesn't know how to play chess. it's not going to do my laundry. and it's not going to think about physics. and so they're very task specific. and they have limited or no learning. and what makes intelligence for humans, of course, is we learn about everything. and i can do all those things.


and so can you. and so i never viewed this as a really great way of getting to true machine intelligence. now i'll just talk about the history of artificial neural networks-- so the brain approach, if you will. i start this back with these two guys, warren mcculloch and walter pitts. they wrote a paper in 1943 where they talked about, well, neurons could be like logic gates.


and if we put them together, we can do computing with them. so that's an interesting idea. well, it's really not the way brains work at all. but what they did is they introduced the idea of an artificial neuron that can be used in an artificial neural network. now, their approach was kind of odd. on the left, you see a real neuron. in the middle, you see what is a classic artificial neuron,


which is a sort of sum of weights and an activation function. that's nothing at all like a real neuron. and on the right is what mcculloch and pitts show, which is these little, really, really ridiculous neurons doing ands and ors and nots. but that was the beginning of the artificial neural network era. over the years there's been many, many different things in


this category. you really can't put them under one category. but i call it minimal neuroscience. you'll see in a moment. so starting in the '80s, we had a real resurgence of interest in artificial neural networks, back propagation, boltzmann machines, and so on. this has continued on today into the field of machine learning.


and the current hot topic there is deep belief networks. and i would say the good news about this is, well, these are learning systems. that's good. they're distributed. they're really good classifiers. that's mostly what they're used for-- sort of classification problems. the downside is they're very limited.


they don't do much more. there's a few exceptions to that. and they're not brain-like at all. and it didn't feel like this is-- when these became hot again in the mid-'80s, and i was working on this in the mid-'80s, i was very disappointed. because people weren't really paying attention to the neuroscience.


they were just saying, hey, we have these neuron-like things. and we're doing stuff with them. and we know a lot from the brains. we should be going back to them. and lately, there's been another thrust here, which you might call the whole brain simulator. this is maximal neuroscience. you might have heard about this. there's a project in europe under henry markram called the


human brain project. they just got huge amounts of funding. their goal is to simulate an entire brain from the ion channels to the spikes to the neurons to everything, up to psychology. that chart in the middle is the principal researchers on this project. it tells you how many people are working on this. they do these great simulations.


the problem with this is there's no theory. they'll admit this. there's no theory. they have no idea what this is supposed to do. they hook it all up and see what happens. and it's not going to lead to machine intelligence. we need theory. we need principled ways of going about this. so that takes me back to my approach, which is essentially


like, look, you've got to pay attention to the brains. we don't have to copy everything that goes on in a brain. but you better understand what it's doing before you decide what else to do. and that's the approach i've been taking. so i'm going to walk you through the progress we've been making recently. and i think it's really cool.


first of all, we have to talk a little bit about the neocortex. it's a memory system. so if you think about it as a computer, forget it. it's not. when you're born, it's a blank memory system. and as you grow up, you fill it up and you learn things. now, it's a memory system that's attached to some of your senses.


and you learn everything through your senses. from your eyes, the optic nerve is about a million fibers. your somatosensory nerve is about a million fibers. your auditory nerve is about 30,000. so you've got a couple million inputs going into your brain. they're fairly high velocity-- not in computer terms, but they change on the order of tens of milliseconds or hundreds of milliseconds.


your eyes are moving every few hundred milliseconds-- a completely new fixation going on. and my speech changes on the order of tens of milliseconds. so you've got this fast data stream coming into the brain. the brain has to build a model of the world. when you're born, you don't know about things like google. you don't know about buildings and chairs and cars and computers and operating systems and bananas--


none of that stuff. you have to learn everything. it's an amazing amount of stuff. and you have to learn it through this fast stream of data coming in. and so the neocortex builds a model of the world. and from that model, it does three basic things. it makes predictions about future events. it says, hey, given my current state and what's coming in,


what's going to happen next? it can detect anomalies, which are just predictions that don't come true. and it can take actions. and it turns out actions are actually the same as predictions. and so when we take actions, we interact back with the world. and so actually, it turns out that most of the things that


you experience in the world, most of the changes on your sensory organs, are caused by your own actions. so as i move around the room and i turn my eyes and so on, i'm controlling the patterns that are changing here. and so there's this really tight feedback loop that goes on between sensory and motor action. and if i were to sum this all up and say, what is this system doing, this is what it's doing. the neocortex learns a sensory motor model of the world.


it learns, given a sequence of sensory patterns and a sequence of actions, what's going to happen next-- what do i have-- my expectations of what's going to happen next. can i predict the future? and can i predict what i should do to get the future that i want? and this is basically what it's all about. it's building a sensory motor model of the world. and we want to know how it does that.


well, we know a lot about it. we've learned a lot over the years. and i'm going to give you the top six principles right now. and this is in no particular order. and i made these up. i mean, i didn't make them up. i made the list up. but we're just going through them here. first, number one-- it's an online learning system.


so it's a memory system. but it's got to work in an online fashion, meaning it's got to work in a streaming mode. there's no batch processing. the brain doesn't get to look at the statistics of a database. the data's coming in. i got to act on it immediately. i have to incorporate it into my model immediately.
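to make the streaming constraint concrete, here is a minimal sketch (the class name and the running-mean example are my own, not from the talk) of what online updating means: each input is folded into the model the moment it arrives and then discarded, with no batch of past data ever stored.

```python
class OnlineMean:
    """Toy online learner: updates its estimate one sample at a time,
    never storing the stream (no batch statistics over a database)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        # incorporate each input into the model immediately, then discard it
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean

learner = OnlineMean()
for x in [2.0, 4.0, 6.0]:
    learner.update(x)
print(learner.mean)  # 4.0
```

the same incremental-update shape applies to any streaming model: the memory footprint stays fixed no matter how long the stream runs.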


it's an online learning system. and this is essential. in a world where patterns are always changing and where the underlying structure in the world is changing, you have to have an online learning system. you can't do this in a batch mode. brains don't work in a batch mode. they don't know statistics in that sense. the second thing is we know that the


neocortex is a hierarchy. it looks like a sheet of cells. it's just two or three millimeters thick. it's about the size of a dinner napkin for a human. and all of it looks very, very similar. but we know from anatomy and other reasons that it's actually different regions that are connected together with these nerve bundles in a hierarchical fashion. there's an estimated--


anywhere between dozens and maybe a couple of hundred regions in the neocortex-- a human neocortex, depending how you're counting. we know that these regions are in this hierarchy. we know the information flows up the hierarchy and down the hierarchy. we also know that the regions, amazingly, seem to be doing the same thing. there are slight variations to this.


but the basic idea is that every region is doing the same thing. they're all doing the same type of memory function. and so it doesn't matter if you say, well, this is a visual area of the neocortex. it's visual because it's getting input from the optic nerve or other visual regions, and auditory areas are auditory because of where they're getting input from. and we can rewire the brains.


and these regions take on new meanings and new roles. so we have this hierarchy of memory regions that are all doing the same thing, which tells us that if we understand what one region is doing, and we understand how the hierarchy works, we're a long way towards our goal. the third thing is what kind of memory it's storing. 90%-- not all of it-- but 90% of what the neocortex is storing


is sequence memory. it's patterns over time. this may not be obvious to you. let me walk you through it. almost all inference and motor behavior is sequence memory. so you're listening to my speech. and hopefully, i'm not talking too quickly. but the patterns are coming in over time.


the order matters. you have stored in your brain what words sound like and what certain phrases sound like. and your brain is matching them up in time. if we moved those patterns into a different order, it would be garbage-- totally garbage. and people tend to think, well, vision's not like that, is it? vision is like that.


real vision is you are moving constantly through the world. your eyes are moving every two, three, four times a second. and every time it's this constant change. these are not images that are just randomly presented to you. your brain is directing this. and it's figuring out what patterns to expect. and so when i look to the right, i know what to expect.


if i look to the left, i should see alan over here again. there he is. and so i have these expectations about the world. so even vision is this-- all temporal inference. if you think about language, music, vision, even when i touch things-- it's all temporal patterns. the second thing is we generate motor behavior.


and motor behavior is, of course, another sequence. my speech right now is involving dozens of muscles being exercised in very precise patterns-- extremely precise patterns-- over time to generate these speech patterns. and that's true for all my behavior. and so we're playing these back. these are stored patterns. they're stored.


i could repeat them. i can repeat them because i've got these things in my head. i know these. i've learned how to do these things. i could give this talk blind and probably in my sleep. not true-- just for you guys. so it's all about sequence memory. and that's the key.


and that's an area that has not been explored enough. the fourth item here is that when you look in the brain, we find the same type of representations. it's called sparse distributed representations. everywhere you look, you see lots of neurons. it doesn't matter where in the brain. very few are active. most are relatively inactive. this is true on the sensory streams, as well.


we now understand a lot about this. and it's a critical component of how the whole system works. i'm going to go into detail on this in this talk. the fifth one here is that we used to think that some regions of the brain were motor regions and some were sensory regions. like this is the primary visual area. and this is the primary motor area and so on. we now know this is not true.


every region in the neocortex has cells that are both inference, or sensory, and motor. and they're differentiated-- the layer 5 cells are the motor cells. but everywhere you go, even primary visual cortex has cells that project to something that's motor-- the muscles that make your eyes move. so there is no pure sensory or pure motor. in fact, this is another beautiful discovery.


essentially, what every part of the cortex is doing is sensory motor learning. and so we want to figure out how it does that. finally, the last of my six elements here is attention. there's different types of attention. i'm not going to go into detail here today. but essentially, i need to be able to attend to various parts of my input stream, various parts of what i'm doing, at different points in time.


now, i claim these are the primary six things that are going on in the neocortex. and i'll make a further claim. i think they're both necessary and sufficient for biological and machine intelligence. i don't think you're going to get there without these elements. it doesn't mean there aren't other things we can do. but these in my mind are necessary and sufficient.


they're necessary to learn a sensory motor model of the world. all mammals have a neocortex. and they all have these principles, from a mouse to a human. dolphins, monkeys, cats, dogs-- they all have these principles operating in their brains. so you don't have to be human level to need these things. but they all have this going on.


there's things i didn't put in this list. for example, i didn't put language. you don't need to have language to be smart. you need it to be human smart, yes. but to build a model of the world, no. dolphins are really smart. and they have very limited language. i didn't put other things-- episodic memory, dreaming. you don't even have to have a body.


you have to have motor output. but that could be all virtual, in some cyberspace someplace. so there's a lot of things you don't have to have to build machine intelligence, but i believe you need these six elements. and if you're going to forget everything else i talk about today, just remember there's six things. and the most important one is sparse distributed representations, which i'm going to talk about next. i'm going to talk about three of these, actually.


i'm talking about sparse distributed representations. but i'm going to talk about sequence memory. and i'm going to talk about the online learning. we know a lot about some of these. but there's a lot of things we don't know. there's a ton of stuff i still don't know and don't understand. but we know enough that we're actually starting to build this stuff.


and it works pretty well. we're going to jump into sparse distributed representations. the best way to understand sparse distributed representations is to compare them to the way we do things in computers, which i'll call dense representations. so, what do we do in computers? if we want to represent something, we take a word-- 8 to 128 bits.


i don't know what we're up to these days. we consider all combinations of 1s and 0s, from 0000 to 1111. an example, of course, is the ascii code. and that's the representation for the letter m. now, we can say things like, well, what do the bits mean in this code? they don't mean anything. if i say, what does the third bit in an ascii code mean?


it doesn't mean anything. the whole number means something. somebody could say, well, it means an 8 bit. that's not what it means. it doesn't tell me anything about the letter m. and these representations are arbitrary. we could've assigned a different representation to the letter m. and it would have been just fine, as long as we all use the same convention.


in brains, it works very differently. first of all, you always have lots of things. when i talk about bits, you can think of each one as a neuron. and when i say the bit's a 1, the neuron's active. and when the bit's a 0, the neuron's not active. so we have many, many bits-- thousands of them. and they're mostly inactive. so mostly zeroes and a few ones.


in our work very often, and for this talk, i'm going to stick to an example where we're using 2,000 bits, of which 2% are active. so i have 40 1s and 1,960 0s. that's how i'm going to represent everything. there's many, many ways i can pick from those. but that's how i'm going to represent everything. now, there's several key things about sparse distributed representations.
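as a concrete sketch of those numbers (the helper below is my own toy, not numenta's code), an SDR can be stored compactly as just the set of indices of its 40 active bits:

```python
import random

N_BITS = 2000    # total bits, one per neuron
N_ACTIVE = 40    # 2% of 2,000 are 1s; the other 1,960 are 0s

def random_sdr(rng=random):
    """Return an SDR as the set of indices of its active (1) bits."""
    return frozenset(rng.sample(range(N_BITS), N_ACTIVE))

sdr = random_sdr()
print(len(sdr))           # 40 active bits
print(N_BITS - len(sdr))  # 1960 inactive bits
```

storing only the active indices is what makes the later tricks (sub-sampling, unions) cheap: 40 small integers stand in for a 2,000-bit pattern.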


but one is that each bit has semantic meaning. you can actually say what it means, if you knew how. these are learned. no one's going to assign them in a brain or an intelligent machine. it's learned. but it's relatively stable. and they mean something. and so, when we want to pick a representation, it's sort of a


competitive process. we take the top 40 semantic attributes. and those are the ones that are going to be in our representation. if i wanted to represent the letters of the alphabet using sparse distributed representations-- and i would not do this-- this is just purely for example-- and i wanted to engineer this, i could say, ok, i would have bits that represent: is this a consonant or a vowel?


what does it sound like? is it an a, e, i, o sound? is it a fricative sound? is it a hard sound, a soft sound? how do i draw it? does it have ascenders or descenders? is it a closed shape? where is it in the alphabet? what's it next to?


what other meanings does this thing have? and i could come up with all these attributes. and then i'd pick the top 40 to represent any particular letter. that's the basic idea. now there's some properties of sparse distributed representations. i'm going to go through a few of them here that are really, really important. the first is semantic similarity.


basically, if i took two sparse distributed representations, and i compared them bit for bit, if they share a bit in the same location, then they're sharing semantic meaning. this is not arbitrary. this doesn't happen by chance. it's meaningful. and even just a few bits of overlap between two representations is statistically very


significant. but it's also semantically significant. now, what if i asked you to store one of these patterns? i want to remember this. and i'm going to ask you, here's a new pattern coming in. i want you to tell me if you've seen it before. so you might say, well, i'll store 2,000 bits. and then when the new 2,000 bits comes in, i'll check and


see if it's the same thing. that's not the way we're going to do it. we're going to store the indexes of the 1 bits. so we're going to say, ok, all you need to remember is where the 1 bits are. and so i have 40 1 bits in my representation. so i'll have 40 indices to the 1 bits. and if i see a new representation-- i look in those locations, and if i see 1s, i know i've got


the same representation. because every representation has 40 1s. what if i told you you couldn't do that? i said, you can only sub-sample. you can only sample a few of the 1 bits. you can't store the locations of all 40. so i'll say 10. you can only pick 10. we'll randomly pick 10.


now you have indices to 10 of the 1 bits. now i'm seeing a new pattern. you say, is this the same pattern? and you look and say, yes. the 1s are all in the same locations-- the 10 1s i know about. is it the same pattern or not? well, you say, well, i don't know. what about the other 30 bits?


they could be different. well, it turns out that it's very unlikely they'd be different. but even if they were different, i'd be making a mistake-- but a mistake with something that's semantically very similar to the thing i stored. and this is the key to generalization in the brain: you don't need to store the location of all the bits and know what everything is.
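the sub-sampling trick can be sketched in a few lines (a toy illustration under the talk's 2,000-bit / 40-active parameters; the function names and fixed seed are my own): remember the indices of only 10 of the 40 active bits, and declare a match when those 10 are all active in a candidate pattern.

```python
import random

N_BITS, N_ACTIVE = 2000, 40
rng = random.Random(0)  # fixed seed, just for reproducibility

stored = set(rng.sample(range(N_BITS), N_ACTIVE))

# sub-sample: remember only 10 of the 40 active-bit indices
sample = set(rng.sample(sorted(stored), 10))

def matches(candidate, threshold=10):
    """Match if at least `threshold` sampled bits are active in the
    candidate. False positives are astronomically rare, and when they
    do happen the candidate shares bits (semantics) with `stored`."""
    return len(sample & candidate) >= threshold

print(matches(stored))  # True: all 10 sampled bits are active
other = set(rng.sample(range(N_BITS), N_ACTIVE))
print(matches(other))   # almost surely False for an unrelated SDR
```

lowering `threshold` to, say, 8 or 5 is the "scaling it back" mentioned next: you trade a little precision for more semantic generalization.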


basically, when you make errors, you're making errors of semantic generalization. in fact, i could say, you know what, it's good enough to just match five or eight of these things. and still i'd have a semantic generalization. so i have a way of scaling it back, up and down. i'm going to tell you the last property here-- and you'll see in a moment why all this works and how we're going to use it-- which is union membership.


let's say i took 10 sparse distributed representations. and i OR'd them together. so now you have 2,000 bits. but instead of 2% of the bits being active, it's about 20% of the bits being active. and that's a one-way street. i can't undo that. i can't say, oh, what were the 10 patterns that were in there? can't do it.


but i can do something almost as good. i can show you a new sparse distributed representation and ask, is this one of the original 10, by looking at the union. and i can do that. i can say, well, just look for the 1s that are in the new one and see if they're in the union. and if they are, i'm going to claim it's one of the original 10.


now, you might say, hey, it could make a mistake. it could be picking some 1s from the first one and some 1s from the second one and so on. statistically, that's astronomically unlikely to happen. but even if it did make a mistake, it wouldn't matter. because i'd be making a mistake for something that's semantically very similar to the thing i stored earlier. and that's good enough.
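the union property can be sketched the same way (again a toy with the talk's parameters; function names and seed are mine): OR 10 SDRs together, then test membership by checking whether a candidate's active bits all fall inside the union.

```python
import random

N_BITS, N_ACTIVE = 2000, 40
rng = random.Random(42)  # fixed seed, just for reproducibility

def random_sdr():
    return frozenset(rng.sample(range(N_BITS), N_ACTIVE))

originals = [random_sdr() for _ in range(10)]

# OR the 10 SDRs together: roughly 20% of the 2,000 bits end up
# active (a little less, since a few active bits collide)
union = set().union(*originals)

def in_union(sdr):
    """Accept a candidate if every one of its active bits appears in
    the union. The individual SDRs themselves are unrecoverable."""
    return sdr <= union

print(all(in_union(s) for s in originals))  # True for all 10
print(in_union(random_sdr()))               # almost surely False
```

a false positive would need all 40 of a random SDR's bits to land inside the ~400 union bits, which is vanishingly unlikely; and such a pattern would, by construction, overlap the stored set heavily anyway.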


in fact, it's what we want. it's not just good enough. it's actually desirable. again, if you want to forget everything else and zone out for the rest of the talk, remember: the future of intelligent machines is sparse distributed representations. i'm telling you, there's no other way around it. and i was just talking to john down here earlier about this. i believe this is the future of understanding language, as


well, and text. because this is how the brain does it. we're now going to skip to the next thing-- sequence memory. this is 90% of what's being stored in your brain. there are various types of sequence memory-- it's not so simple. and we spent years trying to figure out how this works. and we think we got it. let me just give you some of the neuroscience.


zoom in on any section of the neocortex. it doesn't matter where. you'll see there's these layers of cells-- typically five layers of cells. i'm arguing that they are all a type of sequence memory. they have similar attributes. it doesn't really matter what layer you look at. what i'm talking about here would apply to any layer of cells.


they're all different types of sequence memory. so if we zoom in on one of those layers, what you'll see is the cells packed in there really tight. there's about 10,000 per cubic millimeter. but they have two organizations which are worth noting. one's shown by the green arrow, which is that cells that are in a very skinny vertical column have similar response properties, especially a feed-forward


response property. they all seem to respond to the same thing in the world on a feed-forward basis. however, 90% of the connections are horizontal connections, across the columns to different areas of the brain. so even though the columns have this very strong vertical orientation, 90% of the connections go elsewhere. if we zoom in further and look at one of those cells, we see


that the cells are dominated by this dendritic arbor, which is this tree-shaped structure around them. all the connections-- the positive connections-- to the cell are on the dendrites. they're not on the cell body. and so on a typical neuron, there's anywhere from several thousand to a few tens of thousands of connections on those dendrites. if we zoom in further, and now you're looking at one little


section of a dendrite in this picture here, you can actually see the synapses in this electron micrograph. there's those little spines coming off, about 1 micron apart, arranged along the dendrite there. we now know-- and we didn't know this 15 years ago-- but we now know that there's a very nonlinear effect that's happening here. if a number of these synapses become active at the same time-- relatively the same time-- within a few


milliseconds-- and a short distance from each other-- within about 40 microns of each other-- and enough of them happen at that time, then you get a very nonlinear event. you generate what's called a dendritic spike. and it goes to the cell body. and it depolarizes the cell body. the cell body goes into a hyperactive mode.


it's ready to fire. it's anticipating. it's predicting. so every little section of the dendritic tree is like a coincidence detector. it's a thresholded coincidence detector. it says, if i see a bunch of inputs at the same time, bingo. if i see the same number of inputs spread out over time or


spread over the dendritic arbor, nothing happens. this is such an important feature. there's hardly anybody who's modeling this today. but this is a key to understanding how the whole thing works. we didn't know about it until not too many years ago. we model all this. and here's a picture of one of our simulations. on the left, there's a layer of cells with


four cells per column. i'm going to show more detailed pictures of this. our neurons-- our artificial neurons-- capture a fair amount of the depth of what is going on in real neurons in the brain. these colored dots represent the synapses. and i'm only showing some of them in this picture. the green ones are ones that are close to the cell body. i'm not going to talk about them further.


those are how we form the sparse distributed representations. the blue ones are on the distal dendrites. these are 90% of the connections. this is how we're going to learn sequences. and we can model these as a cell-- each cell is a set of coincidence detectors. and when a pattern comes in, if it detects it, it's going to put the cell into a predictive state. how does this all work?


how does this learn sequences? let's start with a picture. here's a picture of our sparse distributed representation. but now we're showing it as a sheet of cells. and the red cells are the ones that are active. and the light ones are the ones that are inactive. this is just a part of our 2,000 bits. this is just a reminder. i'm not going to tell you how we formed this.


but it's through a local inhibitory competition. at any point in time, i have some pattern on here. and at another point in time, i have a different pattern. and so as i'm talking, and as you move around the world, this is what's going on in your brain. everywhere you look, you see these cells sparsely activated. oops, i went too far there. and you've got these patterns that are changing over


time like this. and we want to learn the sequence here. how do i learn the sequence of these distributed patterns? and the answer is that the brain does it a cell at a time. each cell learns to predict its own activity. so when a cell becomes active, what it does is it says, let me look around for guys who were previously active just a


moment ago. let me see if i can find a bunch of them. and i'm going to form connections to them. and i'm going to form those connections, as you see on the bottom right here, on one of my dendritic segments. so if i see that pattern again, i will predict my own activity. and this is the beginning of sequence memory. if we did this, and let's say i showed it a pattern.
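
the learning rule just described-- each newly active cell grows a segment connected to a sample of the cells that were active a moment ago-- can be sketched like this. the function names, the sample size, and the match threshold are illustrative assumptions, not the actual parameters.

```python
# toy sketch of cell-at-a-time transition learning.
import random

def learn_transition(cell_segments, active_now, active_before, sample=10):
    """for each cell active now, grow one dendritic segment connected to
    a random subset of the cells active at the previous time step."""
    for cell in active_now:
        picked = set(random.sample(sorted(active_before),
                                   min(sample, len(active_before))))
        cell_segments.setdefault(cell, []).append(picked)

def predicted_cells(cell_segments, active_cells, threshold=8):
    """cells with a segment matching current activity become predictive."""
    return {cell for cell, segs in cell_segments.items()
            if any(len(seg & active_cells) >= threshold for seg in segs)}

segments = {}
a, b = set(range(0, 40)), set(range(100, 140))   # two toy sparse patterns
learn_transition(segments, active_now=b, active_before=a)
print(b <= predicted_cells(segments, a))  # after learning a -> b, a predicts b
```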


here's a situation where i've shown it a pattern. the red cells are the cells that are getting a feed-forward input. they're active. the yellow ones are depolarized. they are in predictive states. now, why are there more yellows than reds? what if i train the system-- an a followed by b, and then a followed by c, and a followed by d-- and i show it a?


i'm going to predict b, c, and d. this is a union. this goes back to our union property earlier. i'll have a union of predictions. and i can tell if what happens next was one of the things i predicted or not, even though i'm predicting multiple things at the same time, which is really what we're always doing. now, this memory i've just showed you, this is a transition memory.
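
the union property can be shown with a toy example. the patterns and names here are made up for illustration; the idea is only that prediction is the union of everything that has followed the input before, and a next input counts as expected if it falls inside that union.

```python
# toy union of predictions: after training a->b, a->c, a->d,
# presenting "a" predicts the union of b, c, and d.

transitions = {"a": [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]}  # b, c, d as toy SDRs

def predict(symbol):
    # union of every pattern that has followed this input before
    union = set()
    for pattern in transitions.get(symbol, []):
        union |= pattern
    return union

def is_expected(next_pattern, prediction):
    # expected if all of the next input's bits were in the predicted union
    return next_pattern <= prediction

pred = predict("a")
print(is_expected({4, 5, 6}, pred))   # "c" was one of the predictions
print(is_expected({20, 21}, pred))    # an unseen pattern
```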


it's a first order transition memory, meaning i can only make a prediction based on the current or previous time step. from what's happening now, i can make a prediction [inaudible]. i can't use a history of time. but we need a high order memory. the reason we need a high order memory is because that's the way the world is structured. the high order memory says i may need to go back a long way


to make the correct prediction. so, imagine i'm listening to a melody. i can't predict the next note by just listening to the previous note. i may need to hear five notes or six notes or 10 notes. the same with speech. to predict what i'm going to say or understand my speech, you have to understand a long context. and the same if you're walking down the hallway.


this is a high order temporal pattern. you know, it's just like, oh, you have door, door, door, then the third door on the left type of thing. this is the way the world is. so we need to make a high order memory. and this is a first order memory. so the way we're going to do that is we're going to use columns of cells. and let me explain how we think this works.


so, imagine i gave you a sparse distributed representation. but instead of each bit being a cell, i'm going to make each bit be a column of cells. so in this case, i show 10 cells per bit. and i'm going to randomly choose one of the cells to be active in that column. so if it's a 1 bit, and i pick a column, i pick one of those cells. i have a much sparser representation now.


instead of 2,000 cells, i have 20,000 cells. now, i could pick the same sparse distributed representation a moment later and use a different set of cells. i just randomly pick a different set of cells in the columns. and so i have a different representation. it's the same sparse distributed representation, the same columns, but different cells.


if you just think about this, we have 40 active columns. there's 10 cells per column. so there's 10 to the 40th different ways to represent the same input in different contexts. so here's an example. i'm going to say a sentence that has a sound repeated four times. there are too many tutus to count. now, the sound "to" was used four times in that sentence.
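
the column arithmetic can be checked directly. this is a toy sketch of the "one cell per active column" idea; the function and its random choice of context cell are illustrative, but the numbers are the ones quoted in the talk.

```python
# same columns (same input), different cells (different context):
# 40 active columns x 10 cells per column gives 10**40 contexts.
import random

ACTIVE_COLUMNS = 40
CELLS_PER_COLUMN = 10

def contextual_representation(active_columns):
    # one randomly chosen cell per active column
    return {(col, random.randrange(CELLS_PER_COLUMN))
            for col in active_columns}

cols = set(range(ACTIVE_COLUMNS))
r1 = contextual_representation(cols)
print(len(r1))                             # 40 active cells either way
print(CELLS_PER_COLUMN ** ACTIVE_COLUMNS)  # 10**40 possible contexts
```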


you didn't get confused. if it was a first order memory, you would get confused. but you didn't. and you heard them, even though it was the same sound, as having different meanings at different points. so at one point in the brain, you had to have the same representation, because the same sound's coming in on your cochlea.


at another point in the brain, you had to have a different representation, because you didn't hear them as the same. and that's what's going on here. we have these columnar representations. and it allows us to create very long, high order sequence memories. i'm not going to walk you through all the details of how this works. you'll just have to take my word.


you can read it on our website if you want. but in the end, if you do this, and you use the same type of learning rules i was just talking about a moment ago-- you form a sequence memory of sparse distributed representations. it's variable order. it can be as high order as the statistics allow. so it's not like fixed order. it's distributed.


it can do multiple, simultaneous predictions about what's going to happen next. it's very high capacity. this little memory here can learn millions of transitions. and it allows for semantic generalization. imagine if i train this on a series of patterns. and now i give it a new series of patterns that are not exactly the same representations. but they have some bits overlapping with them.


so they're semantically similar. i can apply my sequence memory from previous learning to a new input that is semantically similar. and that's what the brain does. the last thing i'm going to talk about is how we do online learning. this is the third of my six elements i'm going to talk about here. and what's this all about?


well, basically, it means because we're doing online learning, we have to train on every moment in time-- every new input to the system. and essentially, you don't know if it's noise or something valuable, so you have to train on it. and basically, if it doesn't repeat, you forget it. and if it repeats, you want to remember it. so that's pretty much what it is. now, here's a little bit of neuroscience you probably


don't know. maybe you do. but probably, you don't. we used to think-- just pay attention, just listen to me for a second. we used to think that all memory formation in the brain was the strengthening and weakening of synapses. and clearly, that happens to some extent. but we've now learned that something much more important


happens in memory, which is we can form new synapses very rapidly. and we can forget them very rapidly. so instead of just strengthening a connection, we can form new ones. and that's a much bigger pool of potential things you can connect to. we can do this on the order of just a few tens of seconds-- you can form a completely new connection.


so even one exposure is often enough to form a new synapse. so these guys on the dendritic trees-- those spines, if you look at real neurons-- some of them are there for very long periods of time. some come and go every day as you're learning things. so there's much higher information capacity in forming new synapses than strengthening old ones. in fact, in real brains, synapses are highly


stochastic. they're very unreliable. so if anyone shows you a neural model that requires two digits or even one digit of precision in a synaptic weight, forget it. it doesn't work like that. so the way we do this is we model this growth. we say that we have something called the permanence, which is a scalar.


it goes between 0 and 1, and it's essentially modeling the growth of a synapse. so i can start growing a synapse and not make a connection. if the permanence gets above a threshold, we now say the synapse is connected. and we just give it a weight of one. binary weights-- we don't try to be fancier than that.


but we have this idea of a permanence. and once i'm over the threshold, if i keep reinforcing this, my permanence can go up. the connection doesn't get stronger. but the permanence goes up. and it makes it harder to forget. you want that because if things repeat many times, you want to make them very hard to forget. so that's how we do that.
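
the permanence idea is small enough to sketch. the class name, threshold, and increment below are illustrative assumptions, not the actual parameters; the point is only that a scalar models growth while the effective weight stays binary.

```python
# hypothetical sketch: permanence is a scalar in [0, 1] modeling synapse
# growth; the synaptic weight itself is binary (connected or not).

class Synapse:
    CONNECTED_THRESHOLD = 0.2   # illustrative value

    def __init__(self, permanence=0.0):
        self.permanence = permanence

    def is_connected(self):
        # binary weight: connected or not, nothing fancier
        return self.permanence >= self.CONNECTED_THRESHOLD

    def reinforce(self, amount=0.1):
        # repetition pushes permanence up, making the synapse harder to
        # forget; the "strength" stays binary the whole time
        self.permanence = min(1.0, self.permanence + amount)

    def decay(self, amount=0.1):
        self.permanence = max(0.0, self.permanence - amount)

s = Synapse()
s.reinforce()            # growing, but not yet connected
print(s.is_connected())
s.reinforce()            # crosses the threshold: now connected
print(s.is_connected())
```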


if you put this all together, and you say, hey, i want to simulate one of these things, which is what we do all day long: you have 2,000 columns. we do 30 cells per column, 128 segments per cell, 40 connections per segment. these are all very realistic numbers in neuroscience. you put this all together. you basically have about 300 million in this little model--


about 300 million synapses. each one has a connection index and a connection permanence. the connections are very, very sparse. there are a lot of tricks we can do to make this run fast. and notably, there's no single point of failure here. you can drop out synapses, drop out dendrites, drop out cells, drop out columns. the system keeps behaving very nicely.
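
the quoted model size is easy to verify from the numbers in the talk:

```python
# checking the arithmetic: 2,000 columns x 30 cells per column
# x 128 segments per cell x 40 connections per segment.
columns = 2000
cells_per_column = 30
segments_per_cell = 128
connections_per_segment = 40

synapses = (columns * cells_per_column
            * segments_per_cell * connections_per_segment)
print(synapses)  # 307200000 -- "about 300 million"
```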


this has a lot of appeal for hardware guys, who-- as we talked about building this stuff in silicon. i'm now going to switch gears. i'm going to talk about-- and i think it's very important to build this stuff and make it work and prove it and make commercial value out of it. so we're going to do that. we have been doing that. we're applying this in the space of data.


i'm going to give you more of my take on data. this is the data company. i'm nervous as hell talking to anybody about data here. because you guys are the data. you own the data. you live the data. but today, this is my view of the world-- my simplistic view of the world. today, we're getting huge numbers of sources of data.


it's growing exponentially. we stick them in databases. the vast majority of the data in the world is never looked at, ever. it just sits there. we have two ways of getting value out of it. one is visualization tools. and the other is creating models. and then if we [inaudible] those models,


we can act on them. there are challenges here. one of the biggest challenges is that this whole system is not very automated. and it takes data scientists-- people like you-- to do this stuff. and we want to get to a world where there are not just hundreds or thousands or millions of models. we want to get to one where there are billions of models--


the internet of things-- everything in the world is going to be creating data. and we need to be able to model all this stuff. the other problem is-- and so today, it takes lots of people. it's not automated. the other problem is that models can get obsolete. if you're not doing online learning, and most techniques today are not online learning, you have to rebuild your


models all the time because the patterns in the world change. and people just aren't really looking at temporal data very much. many of the patterns in data, especially high velocity data, are temporal patterns. and very rarely do people take advantage of that. they actually try to get rid of it.


so in my view of the world tomorrow-- it's not like the current world is going to go away. but this is where i think the growth is going to be-- we're going to go to a world where there are literally, i'm not joking, billions of machine learning models out there. the data is going to stream right into the models. there's no storage required. you're not going to save this stuff.


the models are going to build and continually update themselves. and you're going to immediately take it to actions. and if you look at it, it looks just like what brains do-- go back to what i said earlier. i said, whoa, look at that. so let's try to apply our techniques to this. and that's what we've been doing.


the key criteria here are: you need automated model creation for billions of models. you need continuous learning. and you need to be able to find the temporal, as well as the spatial, patterns in the data. so we built a product called grok. it's an engine for acting on data streams. it's essentially a productized version of the thing i was just telling you about.


on the left, you can see we have streams-- the records of data coming in through time. it can be one or more fields. we run those fields through encoders, which are just like your sensory organs. literally, they're modeled after sensory organs-- modeled after the cochlea. and we can turn them into sparse distributed representations. those encoders can be fairly generic.


we do not have to build new ones for every different problem we solve. we use a set of generic encoders. we then get sparse distributed representations. and by the way, we can put in any kind of field-- numbers, categories, text, dates, times. you can do custom things, as well, if you want to-- semi-structured data. we run this through the sequence memory i was just


telling you about. and it basically looks for the spatial and the temporal statistics of that data. it can make predictions. it can detect anomalies. and then from that, we take actions. what the user has to do in this case is define the problem. this is actually tricky.
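
one way to picture a "fairly generic" encoder is a toy scalar encoder that maps a number to a sparse bit pattern where nearby values share bits, so overlap carries semantic similarity. everything here-- the function name, the bit counts, the contiguous-block scheme-- is an illustrative assumption, not the product's actual encoder.

```python
# toy scalar encoder: a number becomes a contiguous block of active bits
# inside a larger bit array; nearby values get overlapping blocks.

def encode_scalar(value, min_val=0.0, max_val=100.0, n_bits=400, n_active=21):
    value = max(min_val, min(max_val, value))        # clip to range
    span = n_bits - n_active                          # sliding range of starts
    start = int(round((value - min_val) / (max_val - min_val) * span))
    return set(range(start, start + n_active))

a = encode_scalar(50.0)
b = encode_scalar(52.0)
c = encode_scalar(90.0)
print(len(a))                    # a fixed number of active bits
print(len(a & b) > len(a & c))   # nearby values overlap more
```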


they have to do a good job at this. then they have to stream the data. grok creates the models. it learns the spatial and temporal patterns in the data. it outputs predictions with percentages-- or, you know, a probability distribution. and it can detect anomalies. and we're finding lots of applications for this. we're finding them in energy-- and i'll tell you some more--


product forecasting, anomaly detection, server loads. i'm just going to walk you through a few simple examples. and then i'm going to speculate a little bit for you. today, this is all running on the amazon cloud. it doesn't have to be. but that's just the way we implemented it to begin with. there's a simple rest api to use it and some web apps to help you get started, although it's only in a


private mode still. we're still doing this with customers, hand-holding them. we see a lot of applications [inaudible] any space. i'm going to show you this one because it's very simple to see. and that's the main reason i'm going to start with it. it's very simple. you may not realize this: there's a thing called demand response, where large


consumers of electricity actually bid on the price of power throughout the day. and the utilities will say, if you can use x amount of power at 3:00, i'll give you this price. if you use y amount of power, i'll give you this price. and they're trying to figure out how to do all this. and if you can predict both your demand and the supply, you can save energy and you can save money. so there's a lot to be saved here.


here's a factory in france-- in paris, actually. and this is a very simple one. it's just showing the electrical usage throughout a week. you can see the five days. they're not working on the weekend, apparently, because it's kind of low there for the last two days. and the problem the customer wanted to solve is they


said, at midnight, they have to sort of make their bids. and at midnight, they want to make predictions every hour for the next 24 hours. we had to put a little wrap around this learning algorithm to get this to work. but we can do that. and what you get is something like this. now here's the actual and predicted. now, this looks great.


pay [inaudible] attention. you really can't tell if it's good or not, because you have to really look at the statistics and look at the data carefully. but it turns out this was very good. the customer was happy with this. the red is predicted. the blue is actual. we can follow this pretty well.


here's a situation where the system was just trained on a few months of data. and here we are. on wednesday and thursday morning, it starts picking up. and it says, hey, [inaudible] pick up. and it didn't happen. and the reason was because it was a holiday. and we didn't know about this holiday. and the system was never trained on holidays.


so it says, oop. then it says, that wasn't right, and starts to say, this looks like a weekend. so it started acting like a weekend. here's another example. same idea, but a little more complex. you get the sense that it's not so obvious all the time. this is a company that does video encoding. they have a service level


agreement with their customers. they have to guarantee a quick turnaround. so they have to leave extra servers running on the cloud all the time in case of a peak in demand. so they're always trying to manage how many servers they leave running, which is wasting energy and electricity and power and money. but they have to meet their service level agreement. if they could predict customer demand better, then they could


leave fewer of these extra servers running around. so here you can see the data. it's quite spiky. there are no obvious patterns in it. we can't predict all the spikes. it's impossible. but the question is, can we discern some patterns in this data-- do better than any other technique they've had? and can we do this in an automated way?


and the answer is yes to both of those. we can do it better than they can do it. and we can do it in an automated way. so in a sense, basically, we just feed the data in and grok this thing. i won't go through all the details of it. but it was a successful application. i want to give you a little sense of what it's like if you look inside this system.


so just pay attention to the right side of these images. here we're looking down on the-- actually 2,048, but call it 2,000-- columns. and a green dot means that this was predicted, and it actually occurred. so we're sort of probing inside of this cortical learning algorithm right now. and this is a case where what occurred was


exactly what was predicted. this occurs quite a lot. it's not unusual. here's a situation where we predicted multiple things. but one of the ones that we predicted occurred. so we have more of these little blue circles-- things that were predicted but didn't occur. but that's not a mistake, as long as the things that did occur were predicted.


those are the green dots. so i have 40 green dots and maybe 80 blue dots-- something like that. here's a situation where things didn't work too well. and i'm sorry, you probably can't see this too well in the back of the room. there's a bunch of red circles in here, as well. and those red circles are things that weren't predicted but did occur.


those are true anomalies. i didn't expect this to happen. and it did happen. but this is just so typically what happens here. it's not an all or nothing affair. an anomaly is not one thing. it's like, well, some of the things i predicted did occur. some of the things i predicted didn't occur. and some of the things that occurred weren't predicted.
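
the green/blue/red bookkeeping just described can be sketched as set operations. this is an illustrative toy, not grok's actual scoring code; the simple "fraction of active columns that were not predicted" score at the end is an assumption made for the example.

```python
# toy anomaly bookkeeping: green = predicted and occurred,
# blue = predicted but didn't occur (not an error),
# red = occurred but wasn't predicted (a true surprise).

def score_step(predicted, active):
    green = predicted & active
    blue = predicted - active
    red = active - predicted
    anomaly = len(red) / len(active) if active else 0.0
    return green, blue, red, anomaly

predicted = set(range(80))   # union of several predictions
active = set(range(40))      # what actually occurred
green, blue, red, anomaly = score_step(predicted, active)
print(len(green), len(blue), len(red), anomaly)  # 40 40 0 0.0

_, _, red, anomaly = score_step(predicted, {100, 101, 102, 103})
print(anomaly)  # 1.0 -- nothing that occurred was predicted
```

note that a large blue count does not raise the score at all, matching the point above: extra predictions are not mistakes as long as what did occur was predicted.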


so if i wanted to, i could go in here and say, well, semantically, what was wrong, and what was right? we don't do that per se. but that's how the system works on its own. so there's a lot of subtlety here. here's a case where we used it with an offshore windmill. and this is looking at the oil temperature in the gear case of a large offshore windmill in the north sea. the blue line is the temperature.


and it's going up and down throughout the day as the windmill's speeding up and slowing down. and the question is, could we detect anomalous behavior? the red line at the bottom is the anomaly score, which grok is putting out. and here you can see an interesting event. we had two peaks here. the earlier stuff is when the system first started being trained.


but here we have two peaks. and the important thing is the first peak is when the system started acting a little bit unusual. it wasn't out of range. it wasn't out of spec in terms of the temperature being too high or too low. but it was oscillating in a way that hadn't been seen before. so grok says, that's unusual.


i'm not able to predict that as well. and so you have a peak. and then the second peak is when the system actually went down with a failure. and they worked on it. we think there's a ton of applications here detecting anomalies. here's my dangerous slide. my dangerous slide says, hey, if i were google,


how would i use this? i'd love to have you guys as a customer. that's not why i'm here. but i'd love to have you as a customer. i said, well, there's a lot of interest in the advertising space-- online advertising. and we've shown that for people who basically have real estate to sell, they're always trying to pick which ad network to use.


and so we've shown that we can predict the expected return on a particular ad network, which changes throughout the day all the time. you want to do this almost on a 15-minute basis. we can do that on a per network, per app, per demographic basis. and essentially, the user can decide, how do i prioritize where i serve my ads from? we don't really care what the patterns mean.


it's just [inaudible] patterns there. and they change. and we find them. we can't get it right all the time. again, we're just trying to do it better than they're doing today. we have a lot of interest in the finance world. you can't predict stock prices. forget it.


it's not going to happen. but you can-- and we've shown that we can-- successfully predict volume and volatility better than the industry standards today. and this is valuable for various reasons. we also believe-- we haven't done this yet-- but we have a lot of interest in detecting anomalous trading. this is internal to companies, because they're trying to find rogue traders.


we also think we can do it across a huge number of obtuse, weird trading combinations. so when something all of a sudden becomes more predictable, it represents a trading opportunity-- which, for some reason, not many people in trading are saying bingo about. you could do that. it's not my favorite application. but i think there's going to be a lot of applications there.


i mention this because google-- because you guys, i would have to admit, you're not first in everything. yahoo's still ahead of you in finance, i think. and you could add some really cool things to your financial stuff if you had these kinds of capabilities. you guys do a lot of computing. you have the biggest server farms in the world. we found some really interesting applications in managing computer resources, like i mentioned with the


earlier example-- predicting demand-- resource balancing. i've even heard a crazy idea-- a great idea-- where someone says, you know, different servers have different efficiencies running different applications. and if you can predict the efficiency, you could switch down some servers and bring up others and save energy. i think that's really cool.


a lot of work in the energy space for us. we're getting hit on this all the time. you guys are involved in smart grids, solar, wind, demand response. this could be something great for you guys. and finally, i love your cars. and maybe something [inaudible]. if you could predict where parking spots are going to be, or routing people-- things like that. if you want more details on grok or you want more details


on these algorithms, there's a white paper on our website. and you can read about this stuff on our website. i'm now going to switch to the last part of this presentation, where i'm going to speculate a little bit about the future of machine intelligence. because i'm really passionate about this stuff, as you probably can tell. so, what's the future? is it like this?


skynet, "the matrix," "the terminator"-- ah! i'm not a big science fiction fan. but i'm told these are bad things. or is the future something nice, like this-- you know, little robot butlers like c-3po? or we're going to play games of [inaudible]. or maybe we'll come up with new ways of entertaining ourselves. is that the future?


or is the future ambiguous? maybe it's good. but it turns bad. so i don't know. but i'm going to tell you some things. i have some prognostications here. i'll just tell you where i think this is going. here are some things i think are definitely going to happen. and i'm not talking 100 years from now. this is going to happen.


and the reason i'm here is because i'm trying to make it happen sooner. we can make machine intelligence that's faster and bigger than biological intelligence. so we can definitely make machine intelligence a million times faster than biological brains. neurons can't do anything in less than five milliseconds. that's their maximum throughput. we can do a lot better than that.


now, i can't just speed up a brain if i don't speed up its sensory organs and i don't speed up its data streams and so on. but i don't see any reason why we can't do that. in virtual worlds, we should be able to make machines that are a million times faster and think a million times faster than humans think. we can make them bigger. that's not the only goal.


but there's no reason at all we can't make bigger neocortexes. and you can't make it smarter just by making it bigger. that's a mistake. you can't just say, make it bigger, and it'll be smarter. you still have to learn. you still have to be exposed to things. it takes 20 years to train a human. we have to come up with-- these are


our training systems. and they have to be exposed to environments. but there's no question in my mind that we can make deeper thinking machines than humans. this is the area i get very excited about. we can do super senses. we should not be thinking about the senses of intelligent machines as just hearing and vision and touch. i was just showing-- we have sensors that are looking at


oil temperatures. we can have sensors that look at anything. we could have distributed sensors. we could have microscopic sensors that work inside of cells. all kinds of things where we humans have an impedance mismatch because of our own senses. we spend a tremendous amount of time trying to come up with ways of looking at and thinking about and experiencing stuff that we


can't normally experience. but we could build artificial brains that experience it naturally. we can do fluid robotics. we're nowhere close to this today. but i think we can get there. and finally, this is another idea. the neocortex is a hierarchy. and in the brain, the regions are all co-located because we have to


run these wires between them and the neurons between them. but in a machine intelligence world, we don't have to do that. we can have parts of the hierarchy all over the place. and we can have a distributed hierarchy. we can have hierarchies on top of hierarchies. i don't even know where that's going to go yet. but the idea is that it doesn't have to be co-located. and as long as we get the communications right, that


would be very interesting stuff. this is all going to happen. here are some things that might happen. i don't know. maybe they will. maybe they won't. humanoid robots. maybe. maybe not.


will we have something like c-3po? i think it's probably technically possible. it's going to be very, very difficult, because if you want it to be human-like, it has to have all kinds of other stuff that makes it human-like. it has to have the rest of the brain. and it has to have all these emotional things and so on. and i'm not sure that's really where the


business is going to be. you know, i know a lot of people want to do this. but to me, this is sort of a sideshow. and when new technologies come along, we always imagine these things. when the steam engine came along, they imagined steam engine robots, right? that's where the term robot came from. but it didn't happen.


it may happen. will we have computer brain interfaces for all, like "the matrix"? you plug it in the back and go, whoa, you know. there are a lot of technical problems there. i'm not sure we really want to have that. who knows? here are some things i don't think are going to happen. i don't think you're going to upload your brain to anything.


sorry to say. there are two reasons for this, really. one is-- forget about the incredibly difficult technical problems. just forget about that. but the memory in your brain is intimately tied to the wetware of your brain and the wetware of your body. and you would have to recreate the entire thing in some sense to get those connections to be meaningful in another form.


i also think it'd be quite unsatisfactory. imagine if i came up to you today and said, you know, you can upload your brain to this computer. do you want to do it? and you say, yeah, sure. and you say, yeah, i want to live forever. and then we say, oh, yeah, we did it. [farting noise]. then the computer comes up.


hey, that's great. i'm awake. and then we say, we're done with you. we can get rid of you. and you say, whoa. wait a second. i'm still here. i mean, it's like-- audience: [laughing]


jeff hawkins: you're not going to feel so good. and in the end, those two things will diverge. you might as well just have kids. it's the same thing. jeff hawkins: finally, i don't think we're going to have evil robots. these things aren't going to one day become sentient and say, i don't want to be controlled anymore. you know-- you are dead.


these are not replicating things. these are not emotional things. these are not humans. they don't want to have sex. they're not hungry. we're just trying to use the principles by which the brain works to build really, really useful things for society. and finally, we can be certain that it's not going to be used only for friendly purposes.


people will do bad things with this. but that's true of every technology. all right, my last slide. why do this? why do we care? why am i so passionate about this? why do i try to get other people to be passionate about it? well, there are two reasons.


the first is to live better. there's no question in my mind-- just like computers have improved our lives tremendously, and the products that you guys build have improved our lives tremendously, and my life has benefited from that-- i think having intelligent machines is a way of improving our lives. we can make the world safer.


we can make it more energy efficient. we can make the world-- better health-- all the things we want to do. there's no question at all that this has the ability to move that needle significantly. but there's another reason, too, which is to learn more. if i sit back and say, what's the purpose in life? why should anyone care that i live here and you


live here and so on? in the end, long after many drinks and so on, i come to the conclusion that the goal in life is to acquire knowledge and to make sure that knowledge is preserved. and this is what we do as scientists. this is what we do because we're an inquisitive species. we want to understand how the world works. we want to understand the universe. we want to understand when it began and when it will end.


i want to know those answers, too. and we could use tools to help us do this. imagine we could have physicists that are a million times smarter and faster than us and never get tired. and they think about this stuff. what if we want to explore the universe? we're finding earth-like planets only 13 light years away. isn't that great?


that was in the news this morning. i think that's wonderful. how long would it take to get a human there? and would they survive? probably not. if we want to explore the universe, i think we have to do it with machines that don't breathe oxygen and are not sensitive to the things we are sensitive to. so, to me, in the end here, it's all about accelerating


knowledge accretion. and i think this is a way of amazingly accelerating that. so that's the end of my talk. thank you. alan: that was really good. so with that, we have a few minutes for questions. if you have questions, please come to the mic. ray. jeff hawkins: sure.


hi, ray. ray: hi, jeff. great presentation. on your model-- i like it. and i agree with the thrust of it. i wanted to focus on one aspect of it, which is the use of scalars. you mentioned scalars in the context of the completion of a connection, with a view towards its permanence.


but the properties are basically represented by binary values. either there is a fricative, or the loop is closed, or it isn't. jeff hawkins: yes. ray: so in building systems that have at least many of the attributes of your model, and trying both binary properties and then probabilistic properties-- there's an 82% chance there's a fricative--


and then using [inaudible] and [inaudible] to combine them appropriately, i've gotten better results with the probabilistic-- ray: --properties. so i was wondering-- jeff hawkins: yeah. so i'll rephrase the question just to make sure everyone understands. it's a great question. and i'll expand it.


because we've done two things. one is our neuron activations are binary, as well. and our synapses are binary. the synaptic weights are binary. and the question is, can you get better results using probabilistic or scalar values for those? and we actually know for certain that in the brain, neurons actually have scalar outputs.


neurons have firing rates. and we know that synapses are also scalars. now, i mentioned earlier also that synapses are very unreliable-- a large percent of the time, they don't work at all. so that's an argument for not relying on scalar properties. but the thrust of the answer here is we've taken a shortcut. we've said that because we have distributed representations, you do not need to rely on the accuracy


of any scalar value anywhere in the system. and it's much quicker and simpler to implement this as binary activations and binary synaptic weights. i'm not saying it's realistic. but given the principles-- i understand what's going on-- i can back off to it. and it makes my system run much more reliably. we've spent a great deal of time trying to


make grok run fast. we can do a learning inference cycle in 10 milliseconds. and we need to do this. because if you were going to build a practical system, you have to make these things perform. so that was an engineering choice we made. it's not a biological choice. and i'm not disagreeing with you. it probably would get better results if i


did it with a scalar. ray: just a quick question on your view of markram's project, because i've had this debate with him over the past summer. and he expects simulating at the molecular level, which of course is not the right way to build ai. but it may be a good way to verify our models of biology. but by 2020, he'll be able to simulate it at 100 times real time. and you'll be able to actually have a conversation with it.


and i said, how are you going to have a conversation with it? because even if your model is absolutely perfect, it's not going to do anything, just like a human brain doesn't do anything unless it's gone through years of learning. and if you're at 100 times real time, how are you going to have it learn about the world and have a conversation? jeff hawkins: yeah. there's a lot of fundamental issues with that project.


i would agree. but look, i'm excited anybody wants to do any of this stuff. and maybe we'll learn something from it. and so look at the positive side. i think they've now taken that project and view it more as a way of simulating drug interactions and all different kinds of things like that. yes? audience: so i want to extend ray's question a little bit,


which is i appreciate that you've taken the last 15 years of advanced neural architectural understandings and built that into your model. fantastic. in the past 15 years, there's also been an advance in things like slope [inaudible], hormonal neural response, and so on. and people like antonio damasio and joe ledoux are saying that that's actually fundamental to cognition.


and yet it doesn't appear to be in your model anyplace. so how do you reconcile that? jeff hawkins: can i sum it up as sort of the emotional, affective content of the brain? is that a reasonable-- audience: but there's also underlying neural and hormonal mechanisms-- jeff hawkins: yeah, sure. audience: --that account for a lot of that.


so i want to make sure i address the question. so there's a lot of stuff in the brain-- a lot. you've picked one that i didn't talk about-- the synchronies. there's all kinds of rhythms and so on. and so this particular one is that there are hormonal aspects. there are multiple neural modulators that are


distributed throughout the neocortex that are fundamental for learning. but again, when you look at it, you can say, well, is it important for machine intelligence to have those? we have ones that are based on fear and based on reward and so on. we have an emotional system in our system and in our models. it's a switch. it says, learn or don't learn.
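the "switch" jeff mentions can be sketched as a gate on plasticity (a hypothetical sketch, not grok's api -- the constants and function names are mine): the permanence is a scalar that only moves when the learn flag is on, while the connection itself stays binary, which also matches ray's earlier point about permanence versus connectedness.

```python
LEARN_INC, LEARN_DEC, CONNECTED_THRESHOLD = 0.05, 0.02, 0.5

def update_permanence(permanence, presynaptic_active, learn):
    """hebbian-style permanence update, gated by a single `learn` flag --
    the 'emotional system' reduced to a switch."""
    if not learn:
        return permanence           # gate closed: no plasticity at all
    if presynaptic_active:
        permanence += LEARN_INC     # reinforce an active input
    else:
        permanence -= LEARN_DEC     # decay a silent one
    return min(1.0, max(0.0, permanence))

def is_connected(permanence):
    """the synapse itself is binary: connected or not."""
    return permanence >= CONNECTED_THRESHOLD

p = update_permanence(0.48, presynaptic_active=True, learn=True)
print(is_connected(p))   # True: the update pushed it past the threshold

p = update_permanence(0.48, presynaptic_active=True, learn=False)
print(is_connected(p))   # False: gate closed, nothing changed
```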


and it's a very crude emotional system. but it's good enough for what we need to do. if i were to build a system that's interacting in social networks with people and having conversations and trying to get food and trying to have sex and trying to stay warm, there's a whole bunch of other things you might want to have-- other affective things. and then, just to be clear. in the brain, the parts of the brain that actually evaluate


emotional content or emotional saliency are not in the neocortex. there are small areas. they're subcortical. they project neurotransmitters throughout the neocortex. so they have a global effect, like: learn this. don't forget this. you just nearly died.


this was a bad piece of chicken. don't eat it again. but for what we're trying to do, which is figuring out patterns in data and structure in data and so on, it's not necessary at this point in time. it's a fundamental aspect of being a human. but it's not a fundamental aspect of intelligence. audience: all right. jeff hawkins: that's how i'd probably put that.


audience: in all of your examples of practical applications, you seem to be predicting one-dimensional data. did you use only one dimension of input, as well? jeff hawkins: now, so, when you say one dimension, i assume you mean like multiple factors. audience: i mean that it's a scalar value. jeff hawkins: well, the scalars-- we handle one or more fields of scalars--


enumerated types, dates, and times. and so did i answer that question yet? i didn't ex-- what we do is we end up forming a signal-- audience: does your model perform significantly worse if it doesn't have that [inaudible]? jeff hawkins: well, we find out. so i didn't tell you about how this works with the way the actual user would use grok.


if they provide multiple data streams, they tell us what they want to predict. we do an evolutionary search through model space to figure out which factors help make better predictions and which don't, and then how to encode those factors. so in the end, you end up with-- and sometimes we actually run these as ensembles of models. we have multiple models running at the same time. and they are competing with each other.
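the search jeff describes -- trying combinations of input fields and keeping whichever models predict best -- could look roughly like this (a stand-in sketch, not grok's actual algorithm; the field names and the stubbed error function are my assumptions):

```python
import itertools
import random

random.seed(0)

FIELDS = ["oil_temperature", "wind_speed", "time_of_day", "random_noise"]

def prediction_error(fields):
    """stub: a real system would train a model on these fields and score
    it on held-out data. here, informative fields lower the error and
    every extra field adds a small cost."""
    signal = {"oil_temperature": 0.30, "wind_speed": 0.15, "time_of_day": 0.05}
    error = 1.0 - sum(signal.get(f, 0.0) for f in fields)
    return error + 0.02 * len(fields) + random.uniform(0, 0.01)

def search(fields, keep=3):
    """score every combination of fields; keep the best as an ensemble."""
    scored = [(prediction_error(combo), combo)
              for r in range(1, len(fields) + 1)
              for combo in itertools.combinations(fields, r)]
    scored.sort()      # lowest error first
    return scored[:keep]

for err, combo in search(FIELDS):
    print(f"{err:.3f}  {combo}")
# the winning models use the informative fields and drop `random_noise`
```

notice the useless field is penalized out of the best models, which is the point of the search: it discovers which of the data you give it actually helps.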


so the answer to the question is grok tries to figure out, of all the data you give it, which ones are the best and most predictive-- which factors help and how to use them. sometimes some factors help. sometimes they don't. for example, in that particular case with the windmill, if you included wind speed as well as oil


temperature, it's usually helpful. but it does pretty well without it. so it depends. it really depends. audience: now, what you've described is a system for predicting data. that does not seem to me to be a complete solution to, shall we say, general intelligence or an autonomous system. what do you see as being necessary to complete the system in that sense?


jeff hawkins: i hope everyone heard that question. so there's a lot. i only showed three of my six elements. we did not have a motor component. we don't have attention. and we don't have a hierarchy. we built something that's very small. we built something that's one-millionth the size of a human neocortex--


one-thousandth the size of a mouse neocortex. it is really small. it's 60,000 neurons. it's teeny. i'm not calling that a sentient being. this is like a teeny little piece of cortex that's learning patterns. but the key elements-- my argument is that if you get all six of those elements--


all six of those things-- including-- so the hierarchy is basically necessary for scale. the sensorimotor part is a huge component of this, because you have to explore the world. and i'm working on that right now. we're making good progress on that-- and then the [inaudible] components. so you have to add all this stuff together before you can


start claiming you've got something that's close to machine intelligence. audience: ok, thank you. alan: great. it's 12:00 right now. so thank you all for coming. and thank you, jeff. it's been a really, really great talk. jeff hawkins: thank you.


[applause] [music]

