Saturday, May 2, 2009

Bayes' Theorem: Cut it out!

There is a big crowd of people desperately trying to be part of the community that knows the answer to: "Why didn't the Bayesian traveler cross the road?"

To them, the world looks divided into two parts: smart-ass people who know what Bayes is all about, and they themselves!

To all my friends on the other side, here's a chance to grasp Bayes in the most intuitive way (I assume you know basic probability)!

The Problem:
(Courtesy: http://yudkowsky.net/rational/bayes)

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?


Variables:

Let's assign good old variables to the events in the problem:
A: The person has cancer
B: The person gets a positive mammography

Just to make the conventions clear:
P(A): Probability that a person has cancer
P(B): Probability that a person gets a positive mammography

Note: P(A|B) in general means the probability of A, given that B happens!
So,
P(A|B): Probability that a person has cancer, given they got a positive mammography
P(B|A): Probability that a person gets a positive mammography, given they have cancer (I feel sad for them though)

That's a lot of dumb variables!!!
Let's move on to the interesting things now...

What we know from the problem:

P(A) = 1%
P(B) = ??
P(B|A) = 80%
P(B|not A) = 9.6% (positive mammography without cancer)
P(A|B) = ?? (This is the real QUESTION)


Check-Point: If everything is clear so far, I guarantee you will understand Bayes' theorem in just a few minutes from now!!!

Breaking it down:

Finding P(B), the share of people with positive mammographies, is not difficult.
P(B) = People with cancer and a positive mammography + People without cancer and a positive mammography

So, P(B) = 1%*80% + (1-1%)*9.6% = 0.008 + 0.09504 = 0.10304
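
The same sum can be checked with a few lines of Python. This is only a sketch: the variable names and the helper total_positive are made up for illustration, and the numbers come straight from the problem statement.

```python
# Numbers from the problem statement; the names are illustrative only.
p_cancer = 0.01              # P(A): has cancer
p_pos_given_cancer = 0.80    # P(B|A): positive mammography, given cancer
p_pos_given_healthy = 0.096  # P(B|not A): positive mammography, given no cancer


def total_positive(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) = P(A)*P(B|A) + (1 - P(A))*P(B|not A)."""
    return p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a


p_positive = total_positive(p_cancer, p_pos_given_cancer, p_pos_given_healthy)
print(round(p_positive, 5))  # 0.10304
```

So roughly 10 out of every 100 women screened get a positive result, and most of those come from the big group without cancer.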

Bayes in Action:
A person got a positive mammography and we need to find the probability that they actually have cancer. This is simple:

Required Probability = People with cancer and a positive mammography / People with a positive mammography

The denominator is P(B). We already know that!

On to the numerator:
P(A) (1%) of people have cancer
Out of these, P(B|A) (80%) have positive mammographies

So,
People with cancer AND a positive mammography = (Any guesses???)
Yes, it's 1% * 80% = 0.008!
= P(A)*P(B|A)

SO.

P(A|B) = P(A)*P(B|A)/P(B) = (1%*80%)/0.10304 ≈ 0.0776, i.e. about 7.8%

As simple as that!
That's what Bayes was trying to tell you...
I bet you didn't know it was this simple!!! :)
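
If you want to play with the numbers, here is a small Python sketch of the whole calculation. The function name bayes_posterior and its parameter names are my own assumptions, not anything standard; the inputs are the three percentages from the problem.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) = P(A)*P(B|A) / P(B)."""
    numerator = p_a * p_b_given_a                       # cancer AND positive
    evidence = numerator + (1 - p_a) * p_b_given_not_a  # P(B), all positives
    return numerator / evidence


# 1% have cancer, 80% true positive rate, 9.6% false positive rate.
posterior = bayes_posterior(0.01, 0.80, 0.096)
print(f"{posterior:.1%}")  # 7.8%
```

Try raising the first argument to see how quickly the answer climbs once the disease is less rare.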

Thursday, March 5, 2009

Efficient Captchas

There is a whole lot of buzz on the internet about alternatives to captchas (I mean alternatives to image captchas). It's mainly not because captchas are insecure against bots, but because they aren't convenient!!!
People are getting bugged because of highly illegible captcha strings (http://tinyurl.com/5bvk7c).

But I really feel that captchas, if done properly, are the best human recognition technique. They are simple, fast, and sufficiently secure. People are used to image captchas, and they just work! I get more bugged when I'm asked silly questions, or to solve a mathematical equation, or when I have to select kittens out of 9 pictures of animals!!!

Here are some points on how to create usable and convenient captchas:

1. Use a light background. I'd prefer a plain white background. Backgrounds do provide some security, but having the same background for all captchas does not help, as an average captcha solving algo will adapt itself to it.

So, it's always better to present your captcha on a white background, so that it's easier to read.

2. Don't just use random words; instead use a phonetic generator to generate your captcha strings. This helps a lot in terms of captcha usability. Also, a phonetic generator does not make your captcha much easier to hack. Sure, it increases the threat a little, but the gain in usability outweighs it.

3. DO NOT use a straight baseline and loosely coupled letters. While writing any captcha hacking program, the most difficult operation is segmentation. You can capitalize on this weakness of captcha hackers by using wavy baselines and coupling letters in a random fashion.

4. Use a single color for your captcha (better user experience and readability). Multiple colors add little security, since a captcha bot can easily convert your captcha to grayscale before attacking it.

5. Use letter warping instead of image warping for the whole captcha. Warped letters are difficult to segment and recognize.

6. Use a different captcha string length every time for more security.
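
Points 2 and 6 are easy to try out. Below is a rough Python sketch of a phonetic generator that also varies the string length. Everything in it is my assumption for illustration: the function name phonetic_captcha, the letter sets (a few look-alike or hard-to-pronounce letters left out), the simple consonant-vowel alternation, and the 5 to 8 length range.

```python
import secrets

# Assumed letter sets; tune them to your font and audience.
CONSONANTS = "bdfghjkmnprstvz"
VOWELS = "aeiou"


def phonetic_captcha(min_len=5, max_len=8):
    """Build a pronounceable string by alternating consonants and vowels."""
    # Pick a different length each time (point 6).
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    chars = []
    for i in range(length):
        # Even positions get a consonant, odd positions a vowel (point 2).
        pool = CONSONANTS if i % 2 == 0 else VOWELS
        chars.append(secrets.choice(pool))
    return "".join(chars)


print(phonetic_captcha())
```

The secrets module is used instead of random so that the strings are not predictable from earlier ones. Rendering the string as a warped image is a separate job and is not covered here.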

I hope it helps the community and eventually the end users, the humans!!!