Bayes Theory Essay, Research Paper
REVIEW OF RELEVANT LITERATURE AND RESEARCH
I first became interested in Bayes’ Theorem after reading Blind Man’s Bluff (Sontag, 1998). The book mentions how Bayes’ Theorem was used to locate a missing thermonuclear bomb in Spain in 1966. It was used again by the military to locate the missing submarine USS Scorpion (Sontag, pg. 97), which had imploded when it sank two years later. I was intrigued by the nature of the theory and wanted to know more about it. When I was reading our textbook for the class, I came across Bayes’ Theorem again and found an avenue to do more research.
Much study, and a great many articles, papers and books, have been devoted to Bayesian thought and statistics. My research involved a literature search at the University of Memphis through Lexis-Nexis, ABI and many other electronic sources available at the University. I read many peer-reviewed papers and reviewed several books about Bayes’ Theorem. I searched the Internet using several search engines and found much of the same literature that I had found through the more conventional methods at the university. Additionally, as part of my research, I conducted an in-depth telephone interview with the historian at the Atomic Museum in Albuquerque, N.M.
I researched the development of the theorem and its criticism, and included my findings in this paper. Probably the most useful text in understanding the Theorem, and a definitive work supporting its use, is John Earman’s Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. This book examines the relevant literature and the development of Bayesian statistics, and defends it from its critics.
LIST OF EQUATIONS AND ILLUSTRATIONS
page
Equation 1: Bayes Theorem A1
Equation 2: Bayes Theorem of Prior Probabilities A1
Equation 3: Bayes Theorem in the example of the cancer test A1
Equation 4: Bayes Theorem in the example of the cancer test, with numbers applied A1
Illustration 1: Photo of B52 Bomber A2
Illustration 2: Photo of Lost bomb found off the coast of Spain A2
CHAPTER I THOMAS BAYES
Reverend Thomas Bayes was an English theologian and mathematician born in London, England, in 1702. His development of what is known today as Bayes’ Theorem contributed a powerful yet controversial tool for assessing, through quantitative reasoning, how probable a specific event or outcome is. This form of reasoning, based on conditional probabilities, has been the subject of much controversy and discussion, and many debate its validity as a scientific method. It does have shortcomings, as pointed out by Pearson, who argues:
It does not seem reasonable upon general grounds that we should be able on so little evidence to reach so certain a conclusion... The method is much too powerful... it invests any positive conclusion, which it is employed to support, with far too high a degree of probability. Indeed, this is so foolish... that to entertain it is discreditable (1907).
Despite such criticism, the theorem is still used today in many areas of study. Many different forms of the theory have evolved, but for the purposes of this paper, the way of looking at a problem and its solution from Bayes’ point of view can be referred to as Bayesian. “In a weak sense, any position on the foundations of probability which permits the wide or unrestricted use of Bayes’s theorem may be described as Bayesian” (Logue 1995, pg. ix).
Thomas Bayes’ father was one of the first six nonconformist ministers to be ordained in England, in the 17th century. After a private education near his family home in Bunhill Fields, Bayes attended the University of Edinburgh, but never finished his degree. Like his father before him, he was eventually ordained a nonconformist minister. After several years of serving with his father as a Presbyterian minister, he spent most of his career as a minister in Tunbridge Wells until his death in April 1761.
In addition to his position in the community as a minister, he also had the reputation of being “...a good mathematician” (J.J. O’Connor and E.F. Robertson). In fact, he gained prominence in the field of mathematics by writing a pamphlet defending Sir Isaac Newton from critics of his work on fluxions. As a result of the pamphlet, he was nominated and subsequently elected as a Fellow of the Royal Society in 1742.
The organization known as the Royal Society was a scholarly group formed to promote the natural sciences, including mathematics and all applied aspects such as engineering and medicine. The society was founded in 1660 during the reign of King Charles II, and was incorporated by royal charter in 1662. The society is self-governed by a president and council, whose statutory responsibilities include making appointments to research councils, and it has representatives in the governing bodies of many organizations. The people who nominated Bayes described him as “a gentleman of known merit, well skilled in Geometry and all parts of Mathematical and philosophical Learning” (Norland, 2000, pg. 2). Bayes retired from the ministry in 1752 and died nine years later.
CHAPTER II DEVELOPMENT OF BAYES THEOREM
After Bayes’ death in 1761, his family was left with most of his property, but he also left a small bequest to Richard Price, another minister and amateur mathematician. Among Bayes’ papers, Price found two essays on mathematical subjects. He was so impressed with them that he sent them to the Royal Society, hoping they would be published. Bayes set out his theory of probability in one of the essays, titled “An Essay towards solving a Problem in the Doctrine of Chances.” It was so well received by the Royal Society that it was published in the Philosophical Transactions of the Royal Society of London in 1764.
Bayes’ essay would shape the nature of statistics. It stated the problem as follows: “Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named” (Bayes 1763, pg. 376). This problem and the solution it entails have come to be referred to as inverse probability.
Logue put it this way:
Thus Bayes, in his famous [1763] essay, having defined probability as ‘the ratio between the value at which the expectation depending upon the happening of the event ought to be computed, and the value of the thing expected upon its happening’, then equivocates upon ‘expect’, sometimes taking it as wholly relative to an individual’s mental state, sometimes as though expectations were externally fixed values. (1995, pg.95)
The theorem was eventually accepted by mathematicians of the time. Laplace later accepted it as a valid process. As Jaynes (1995) points out, “In almost his first published work (1774), Laplace rediscovered Bayes’ principle in greater clarity and generality, and then for the next 40 years proceeded to apply it to problems of astronomy, geodesy, meteorology, populations statistics and even jurisprudence” (pg. 2). Laplace generalized Bayes’ approach, which was later generalized further into what we now call Bayes’ theorem. Essentially, the theorem is supposed to quantify the value of a hunch and to factor in the knowledge that exists in people beyond their conscious minds. According to Bayes’ theorem, one can always start with a belief about the probability of an outcome and use that in the equation. If one has no prior knowledge, the prior distribution would be diffuse (spread out).
CHAPTER III BAYES THEOREM EXPLAINED
Bayes’ theorem for conditional probabilities is described by equation 1:
where the conditional probability of A occurring given that B has occurred is represented by P(A|B). Said another way, the theorem allows you, knowing little more than the probability of B given A, to find the probability of A given B.
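Equation 1 is not reproduced in this section. As a reference point, and assuming the appendix uses the standard statement of the theorem, it would read as follows, where P(A) and P(B) are the probabilities of A and B on their own:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```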
A second form of Bayes’ Theorem gives the rule for updating belief in a hypothesis A (i.e., the probability of A) given additional evidence B. This is shown in equation 2:
where A* is the negation of A, that is, the outcome in which hypothesis A is false. The left-hand term, P(A|B), is called the posterior probability, and it gives the probability of the hypothesis A after considering the evidence B. P(B|A) is called the likelihood, and it gives the probability of the evidence B assuming that the hypothesis A and the background information are true.
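Equation 2 is likewise not reproduced here. Assuming it is the usual expanded form, in which the probability of the evidence B is split between the case where A is true and the case where A* is true, it would be:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B \mid A)\, P(A) + P(B \mid A^{*})\, P(A^{*})}
```

Here P(A) is the prior probability of the hypothesis, and P(A*) = 1 - P(A).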
In many situations, predictions of outcomes involve probabilities. One theory might predict that a certain outcome has a twenty percent chance of happening; another may predict a sixty percent chance of the same outcome. In these situations, the actual outcome would tend to shift our degree of belief from one theory to the other. As previously noted, Bayes’ theorem gives a way to calculate this “degree of belief” (Logue 1995, pg. 95). To construct an example of Bayes’ theorem, one begins by designing a set of mutually exclusive and all-inclusive hypotheses, that is, a set of hypotheses that covers all possibilities, with no two of them true at once. Next, one spreads the degree of belief among them by assigning to each hypothesis a probability based on what one believes to be true. Each probability must lie between zero and one, and together they must sum to one. This rules out the often misused kind of probability in a statement such as “I am behind you one hundred and ten percent,” which falls outside the permitted range. If one has no prior basis in either experience or observation, one simply spreads the probabilities evenly among the hypotheses.
The next step in setting up the equation is to construct a list of possible outcomes. The list of possible outcomes, like the set of hypotheses, should be mutually exclusive and all-inclusive. For each hypothesis, one then assigns a conditional probability to each of the possible outcomes (either based on prior knowledge or spread evenly if no prior knowledge is present). This step simply assigns the probability of observing each outcome if that particular hypothesis is true. The unique part of Bayes’ theorem, and what was new in the 18th century, is that one then notes which outcome actually occurred and computes revised (posterior) probabilities for the hypotheses (see equation 2), based on the actual outcome.
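The procedure described in the last two paragraphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name bayes_update, the labels theory_1 and theory_2, and the even 50/50 prior are hypothetical choices made for the example, while the twenty and sixty percent predictions are the ones mentioned above.

```python
def bayes_update(priors, likelihoods, observed):
    """Return the revised (posterior) probability of each hypothesis.

    priors      -- dict mapping hypothesis name to prior probability (sums to 1)
    likelihoods -- dict mapping hypothesis name to a dict giving
                   P(outcome | hypothesis) for every possible outcome
    observed    -- the outcome that actually occurred
    """
    # Weight each prior by how strongly its hypothesis predicted the outcome.
    joint = {h: priors[h] * likelihoods[h][observed] for h in priors}
    total = sum(joint.values())  # overall probability of the observed outcome
    if total == 0:
        raise ValueError("observed outcome is impossible under every hypothesis")
    # Rescale so the revised probabilities again sum to one.
    return {h: value / total for h, value in joint.items()}


# Assumption: no prior basis for preferring either theory, so belief is spread evenly.
priors = {"theory_1": 0.5, "theory_2": 0.5}
# Theory 1 predicts a 20% chance of the outcome; theory 2 predicts 60%.
likelihoods = {
    "theory_1": {"happens": 0.2, "does_not_happen": 0.8},
    "theory_2": {"happens": 0.6, "does_not_happen": 0.4},
}
posteriors = bayes_update(priors, likelihoods, "happens")
print(posteriors)
```

If the outcome does happen, belief shifts toward the second theory: the revised probabilities come out at about 0.25 for the first theory and 0.75 for the second.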
CHAPTER IV MEDICAL USE OF BAYES THEOREM
Suppose you undergo a medical test for a relatively rare cancer. Your doctor tells you the cancer has an incidence of 1% in the general population. In other words, the chance of you having the cancer is one in one hundred, i.e., a probability of 0.01. The test is known to be 75% reliable. That is, the test will not fail to find cancer when it is present, but it will give a positive result in 25% of the cases where no cancer is present. Such a result is known as a false positive.
When you are tested, the test yields a positive result. The question is, given the result of the test, what is the probability that you have cancer? It is easy to assume that if the test is 75% accurate and you test positive, then the likelihood you have the cancer is about 75%. That assumption is way off. The actual likelihood you have cancer is merely 3.9% (i.e., the probability is 0.039). Three point nine percent is still something to worry about with cancer, but it is hardly as daunting as 75%. The problem is that the 75% reliability factor for the test has to be balanced against the fact that only 1% of the entire population has the cancer. Using Bayes’ method ensures you make proper use of the information available.
As I have discussed, Bayes’ method allows you to calculate the probability of a certain event H (in the above example, having the cancer), based on evidence (e.g., the result of the test), when you know (or can estimate):
(1) The probability of H in the absence of any evidence;
(2) The evidence for H;
(3) The reliability of the evidence (i.e., the probability that the evidence is correct).
In this example, the probability in (1) is 0.01, the evidence in (2) is that the test came out positive, and the probability in (3) has to be computed from the 75% figure given. All three pieces of information are highly relevant, and to evaluate the probability that you have the cancer you have to combine them in the right manner. Bayes’ theorem allows us to do this.
To simplify the illustration, assume a population of 10,000. Since we are only interested in percentages, the reduction in population size will not affect the outcome. Thus, in a population of 10,000, 100 will have cancer and 9,900 will not. Bayes’ method, as previously mentioned, is about improving an initial estimate after you have obtained new evidence. In the absence of the test, all you could say about the likelihood of you having the cancer is that there is a 1% chance that you do. Then you take the test, and it shows positive. How do you revise the probability that you have the cancer?
There are, we know, 100 people in the population who do have the cancer, and for all of them the test will show a positive result. But what of the 9,900 people who do not have the cancer? For 25% of them, the test will incorrectly give a positive result, thereby identifying 9,900 x 0.25 = 2,475 people as having the cancer when they actually do not. Thus, overall, the test identifies a total of 100 + 2,475 = 2,575 people as having the cancer. Having tested positive, you are among that group (the evidence tells you this). The question is whether you are in the group that has the cancer or the group that does not but tested as if they did (the false positives). Of the 2,575 who tested positive, 100 really do have cancer. To calculate the probability that you really have cancer, you take the number who really do (100) and divide it by the total number who tested positive (2,575): 100/2,575 = 0.039. In other words, there is a 3.9% chance that you have the cancer (see equation 4).
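The head count above can be checked with a short Python sketch. The function name positive_test_counts is a hypothetical choice for this illustration, and the code assumes, as the example does, that the test never misses a real case. It divides the true positives by everyone who tested positive.

```python
def positive_test_counts(population, incidence, false_positive_rate):
    """Count true and false positives, assuming the test never misses a real case."""
    with_cancer = population * incidence
    without_cancer = population - with_cancer
    true_positives = with_cancer  # every real case tests positive
    false_positives = without_cancer * false_positive_rate
    return true_positives, false_positives


# Figures from the example: 10,000 people, 1% incidence, 25% false positives.
true_pos, false_pos = positive_test_counts(10_000, 0.01, 0.25)
total_pos = true_pos + false_pos
probability = true_pos / total_pos  # share of positive tests that are real cases

print(f"true positives:  {true_pos:.0f}")
print(f"false positives: {false_pos:.0f}")
print(f"all positives:   {total_pos:.0f}")
print(f"P(cancer | positive test) = {probability:.3f}")
```

Changing the incidence argument shows how strongly the answer depends on how rare the cancer is, which is the point of the example.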
This calculation shows why it is important to take account of the overall incidence of the cancer in the population. This, in the Bayesian way of thinking, is known as the prior probability. Being able to calculate results based on this prior probability, whether known, speculated or not known, is the advantage of using Bayes’ theorem. In our case, in a population of 10,000 with a cancer having an incidence rate of 1%, a test reliability of 75% will produce 2,475 false positives. This far outweighs the number of actual cases, which is only 100. As a result, if your test comes back positive, the chances are overwhelming that you are in the false positive group.
This data, as it is placed in Bayes’ theorem, flows like this. Let P(H) represent the probability that the hypothesis is correct in the absence of any evidence, that is, the prior probability. Here H is the hypothesis that you have the cancer, and P(H) is 0.01. You then take the test and get a positive result; this evidence of cancer we will call C. Let P(H|C) be the probability that H is correct given the evidence C. This is the revised estimate we want Bayes’ theorem to calculate. Let P(C|H) be the probability that C would be found if H did occur. In our example, the test always detects cancer when it is present, so P(C|H) = 1. To find the new estimate, you have to calculate P(H-wrong), the probability that H does not occur, which is 0.99 in this case. Finally, you have to calculate P(C|H-wrong), the probability that the evidence C would be found (i.e., a positive test) even though H did not actually occur (i.e., you do not have the cancer), which is 0.25 in the example. In equation 3, Bayes’ theorem states:
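Equation 3 is not reproduced in this section. Assuming it follows the same expanded form as equation 2, written in the symbols just defined, with H-wrong typeset as a subscript, it would be:

```latex
P(H \mid C) = \frac{P(C \mid H)\, P(H)}{P(C \mid H)\, P(H) + P(C \mid H_{\mathrm{wrong}})\, P(H_{\mathrm{wrong}})}
```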
Using the formula for our example in equation 4:
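Equation 4 is not reproduced here either. Substituting the values stated in the example, namely P(H) = 0.01, P(C|H) = 1, a 25% false positive rate, and a 99% chance of not having the cancer, the calculation would run as follows, assuming the same expanded form of the theorem:

```latex
P(H \mid C) = \frac{1 \times 0.01}{1 \times 0.01 + 0.25 \times 0.99}
            = \frac{0.01}{0.2575}
            \approx 0.039
```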
A quantity such as P(H|C) is known as a conditional probability, that is, the probability of H occurring given the evidence C.
CHAPTER V BAYES AND THE LAW
Bayes’ theorem is used in mathematics and, as I previously mentioned, in many professions. Among them is the practice of law. There have been instances where lawyers have taken advantage of the lack of mathematical sophistication among judges and juries by deliberately confusing the two conditional probabilities P(G|E), the probability that the defendant is guilty given the evidence, and P(E|G), the probability that the evidence would be found assuming the defendant is guilty. Intentional misuse of probabilities has been known to occur where scientific evidence such as DNA testing is involved, as in paternity suits and rape and murder cases. In such cases, prosecuting attorneys may provide the court with a figure for P(E|G), whereas the figure of relevance in deciding guilt is P(G|E). As Bayes’ formula shows, the two values can be very different, with P(G|E) generally much lower than P(E|G). Unless there is other evidence that puts the defendant into the group of possible suspects, such use of P(E|G) is highly suspect. The reason is that, as with the cancer test example, it ignores the initial low prior probability that a person chosen at random is guilty of the crime in question.
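To illustrate how far apart P(E|G) and P(G|E) can be, here is a minimal Python sketch. All of the numbers are hypothetical and are not taken from any actual case: it assumes one million possible suspects with no other evidence against the defendant, evidence that is certain to match a guilty person, and a 1 in 10,000 chance of matching an innocent one. The function name is likewise an illustrative choice.

```python
def probability_guilty_given_evidence(prior_guilt, p_evidence_if_guilty,
                                      p_evidence_if_innocent):
    """Apply Bayes' theorem to obtain P(G|E) from P(E|G) and the prior P(G)."""
    numerator = p_evidence_if_guilty * prior_guilt
    denominator = numerator + p_evidence_if_innocent * (1.0 - prior_guilt)
    return numerator / denominator


# Hypothetical figures, chosen only for illustration.
prior_guilt = 1 / 1_000_000          # defendant is one of a million possible suspects
p_evidence_if_guilty = 1.0           # P(E|G): evidence always matches the guilty party
p_evidence_if_innocent = 1 / 10_000  # chance the evidence matches an innocent person

p_guilty = probability_guilty_given_evidence(
    prior_guilt, p_evidence_if_guilty, p_evidence_if_innocent
)
print(f"P(E|G) = {p_evidence_if_guilty:.4f}")
print(f"P(G|E) = {p_guilty:.4f}")
```

Under these assumed figures P(E|G) is 1, yet P(G|E) is only about 1%, because roughly a hundred innocent people in the suspect pool would also match the evidence.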
Instructing the court in the proper use of Bayesian inference was the winning strategy used by American long-distance runner Mary Slaney’s lawyers when they succeeded in having her 1996 performance ban overturned. Slaney failed a routine test for performance-enhancing steroids at the 1996 Olympic Games, resulting in the United States athletic authorities banning her from future competitions. Her lawyers demonstrated that the test did not take proper account of the prior probability and thus made a tacit initial assumption of guilt.