
The idea of a captcha is that humans must be able to solve it. If humans can solve it, a bot can solve it in the same way. The only problem is understanding why we can solve a captcha and how we do it.

When solving a captcha, we see lines, intersections, angles, directions, and proportions between image elements, and from these we identify the letters. A bot can do the same.
F[1:10] 2[3:10] 9[3:35] L[5:5] 6[2:5] 8[4:28]
Maybe not as well as we do, but well enough. In some cases, tracing the lines is not enough: the lines don't always define the letters, if there are any lines at all. A good example is rapidshare.com's new captcha.

Sometimes the lines only contribute to the big picture, so they are not needed individually. Only the big picture is needed (for example, the median-filtered image).

But before this filtering, a bot can do many other useful things, such as following the small lines that compose this captcha to make adjustments that improve the accuracy of the search.

From this point, one solution would be to build a matrix of grid cells and record in it only the differences from the surrounding pattern (cell area, cell shape, shadowed borders, etc.). That would be the best way to solve it. If this kind of captcha stays in use for longer, more methods will be added to improve detection. For now, an easier way is implemented.
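As a rough illustration of the grid-cell matrix idea (which is not the method implemented here), the sketch below assumes the area of every grid cell has already been measured. The function name, the tolerance, and the sample numbers are all invented for the example. It writes into the matrix only how far each cell deviates from the typical cell area, so cells that match the surrounding pattern become 0.

```python
from statistics import median

def cell_difference_matrix(cell_areas, tolerance=0.15):
    """Return a matrix of relative deviations from the typical cell area.

    cell_areas: list of rows, each a list of measured cell areas
                (hypothetical input; the measuring step is not shown).
    Deviations within `tolerance` are written as 0.0, so only cells that
    differ from the surrounding pattern stand out.
    """
    typical = median(a for row in cell_areas for a in row)
    result = []
    for row in cell_areas:
        out_row = []
        for area in row:
            deviation = (area - typical) / typical
            out_row.append(round(deviation, 2) if abs(deviation) > tolerance else 0.0)
        result.append(out_row)
    return result

# Invented measurements: three cells are clearly larger than the rest.
areas = [
    [100, 102, 98, 101],
    [99, 140, 135, 100],
    [101, 138, 97, 100],
]
diff = cell_difference_matrix(areas)
for row in diff:
    print(row)
```

The same matrix could hold other per-cell measurements (shape, border shading) in place of area; area is used here only because it is the simplest to show.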
One might say that the captcha should be rotated for a better look. Finding the angles of the grid lines is easy, but the letters themselves are not rotated. A better adjustment is to straighten the captcha using the grid lines as a reference.

After the vertical adjustment, the vertical reference lines are lost, so the white lines are used as a new reference.
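One possible reading of straightening against a reference line can be sketched as follows. It assumes a reference line has already been traced and that its row position is known for every column. The tracing step, the function name, and the toy image are assumptions made for the example, not the code in captcha.dll. Each column is shifted vertically until the reference line becomes horizontal.

```python
def straighten_columns(image, reference_y, fill=0):
    """Shift every column vertically so a traced reference line becomes horizontal.

    image: list of rows of pixel values.
    reference_y: for each column, the row index where the reference line
                 was found (hypothetical input; tracing is not shown).
    """
    height, width = len(image), len(image[0])
    target = round(sum(reference_y) / width)  # row the line should end up on
    result = [[fill] * width for _ in range(height)]
    for x in range(width):
        shift = target - reference_y[x]
        for y in range(height):
            new_y = y + shift
            if 0 <= new_y < height:
                result[new_y][x] = image[y][x]
    return result

# Toy image: a single slanted line running from row 1 down to row 3.
slanted = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
straight = straighten_columns(slanted, reference_y=[1, 2, 2, 3])
for row in straight:
    print(row)
```

The same shifting applied to rows in place of columns would correct a horizontal slant.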

4 x median filter + contrast adjustment
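A minimal sketch of this filtering step in plain Python is shown below. The 3x3 window and the linear contrast stretch are assumptions: the source only says that the median filter is applied four times and that contrast is then adjusted.

```python
from statistics import median_low

def median_filter(image):
    """Apply a 3x3 median filter; border pixels use the neighbours that exist."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(median_low(window))
        result.append(row)
    return result

def stretch_contrast(image):
    """Linearly rescale pixel values so they span the full 0..255 range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in image]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]

def preprocess(image, passes=4):
    """Run the median filter `passes` times, then stretch the contrast."""
    for _ in range(passes):
        image = median_filter(image)
    return stretch_contrast(image)

# Toy grayscale image: dark left half, bright right half, one noise pixel.
noisy = [[60, 60, 60, 180, 180, 180] for _ in range(6)]
noisy[2][1] = 255
clean = preprocess(noisy)
for row in clean:
    print(row)
```

Repeated median passes remove thin structures such as the small grid lines while keeping larger regions, which is why only the "big picture" survives.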

After filtering, the letter information remains, but not all of it is necessary for detection. The white lines have different shapes for different letters. Better detection could be achieved using the shadows that surround the letters, but to keep things simple only the white lines are used. The letters are separated and each is rescaled to fit a 16x16 matrix.
0000000011100000
1111100111000000
1111100111100000
1110000111000000
0000001111000000
0000000000000000
0000000000000000
0011111000111000
0011111000011100
0000000000011110
0000000000001110
0000000000001111
0000000000011111
0000000010111111
1101111111111111
1111111111111100

0000000011100000
0111111111100000
0111111111100000
0111000000000000
0011100000010000
0011000000010000
0000000101111000
0000111111111000
0000111111111000
0000111100100010
0000111100000000
0000001100000000
0000011100000000
0000001100000000
0000011110000000
0001111110000000

0000000000011100
0001111111001110
0001111111111110
0001100011111110
0000000000000000
0011111000011000
0111111111111000
0011111111111000
0011100001111100
0001110000000000
1001111000000000
0000111000000000
0000000000001111
0000000000001110
0111111111111110
0111111111111111

0000111100110000
0000111100110000
0000111100111000
0000111000111000
0111111100011100
0000011100011100
0100001100011100
0000011000111110
0000011100111100
0000011111111100
0000111111000000
0000111001000000
0000110000000011
0111110000000111
0001111111111111
0000111111111110
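The rescaling of a separated letter into a 16x16 matrix could look roughly like the sketch below. It assumes the letter is already available as a binary image (1 for a white-line pixel) and uses nearest-neighbour sampling. Both the function name and the sampling method are assumptions, since the source does not say how the rescaling is done.

```python
def to_matrix(glyph, size=16):
    """Crop a binary glyph to its bounding box and rescale it to size x size.

    glyph: list of rows containing 0/1 values, with at least one pixel set.
    Returns a list of `size` strings of '0'/'1' characters.
    """
    rows = [y for y, row in enumerate(glyph) if any(row)]
    cols = [x for x in range(len(glyph[0])) if any(row[x] for row in glyph)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    height, width = bottom - top + 1, right - left + 1
    return [
        "".join(
            # Nearest-neighbour lookup back into the cropped glyph.
            str(glyph[top + y * height // size][left + x * width // size])
            for x in range(size)
        )
        for y in range(size)
    ]

# Toy glyph: a small "L" shape with empty padding around it.
glyph = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
matrix = to_matrix(glyph)
print("\n".join(matrix))
```

Cropping to the bounding box first means that letters of different sizes fill the whole matrix, so they can be compared point by point.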
Searching for the "learned" pattern that shares the largest number of common points can offer a solution for this captcha. If this kind of captcha stays in use, better methods will be implemented to improve detection.
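The common-points search can be sketched as below. The 4x4 patterns and their labels are invented to keep the example short, and the function names are hypothetical. The real matrices are 16x16, and the scoring actually used by captcha.dll may differ.

```python
def common_points(a, b):
    """Count positions where both matrices have a set point ('1')."""
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if pa == "1" and pb == "1"
    )

def best_match(unknown, learned):
    """Return the label of the learned pattern sharing the most points with `unknown`."""
    return max(learned, key=lambda label: common_points(unknown, learned[label]))

# Invented "learned" patterns (4x4 instead of 16x16 for brevity).
learned = {
    "I": ["0110", "0110", "0110", "0110"],
    "L": ["1000", "1000", "1000", "1111"],
}
unknown = ["1000", "1000", "1100", "1110"]
print(best_match(unknown, learned))
```

Counting only shared set points favours dense patterns. Penalizing points that are set in one matrix but not in the other (a Hamming-style distance) is one alternative scoring.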
Test results for the current version of captcha.dll on 100 downloaded captchas (rows marked F were recognized incorrectly):
| !15WW.gif -> 15WW |
| !1ALS.gif -> 1ALS |
| F | !1JM8.gif -> JJM5 |
| F | !1JZK.gif -> JJZK |
| F | !1TOO.gif -> L74O |
| !23V4.gif -> 23V4 |
| F | !27OW.gif -> 7OVV |
| F | !2JSG.gif -> 2J5O |
| F | !2NM8.gif -> 1NN8 |
| !2XQO.gif -> 2XQO |
| !33SG.gif -> 33SG |
| F | !3B74.gif -> ZB74 |
| F | !3GI8.gif -> 7DQ8 |
| F | !3PWO.gif -> 7V7O |
| !4N3K.gif -> 4N3K |
| F | !5HGW.gif -> SHGW |
| !5MSO.gif -> 5MSO |
| F | !675C.gif -> 6T5C |
| F | !6BU8.gif -> DBU8 |
| !6X74.gif -> 6X74 |
| !77XS.gif -> 77XS |
| F | !78KO.gif -> X8KO |
| F | !7QY8.gif -> 7QYB |
| F | !7RKG.gif -> 7PKG |
| F | !8NAO.gif -> BIPO |
| !92UO.gif -> 92UO |
| F | !9J9S.gif -> 9JSS |
| F | !9O9C.gif -> 9Q9C |
| F | !AFDC.gif -> AFDL |
| !AOXC.gif -> AOXC |
| F | !ARCG.gif -> 4ACO |
| F | !C2MO.gif -> CZKC |
| !DIUO.gif -> DIUO |
| !DVBK.gif -> DVBK |
| !DWOG.gif -> DWOG |
| F | !EINK.gif -> HHPK |
| F | !ELQO.gif -> ELOO |
| F | !EUI8.gif -> FOL8 |
| F | !F5ZK.gif -> FFZK |
| !FJXS.gif -> FJXS |
| !FPHS.gif -> FPHS |
| F | !FRPS.gif -> FRPG |
| F | !FZM8.gif -> PZO8 |
| !G6EO.gif -> G6EO |
| F | !GCY8.gif -> GCYB |
| F | !GHRK.gif -> SNRK |
| F | !GO28.gif -> GQZB |
| !HPYO.gif -> HPYO |
| !HT1S.gif -> HT1S |
| F | !IDWW.gif -> OWDV |
| F | !IUGG.gif -> LUG6 |
| F | !IYWG.gif -> HD7C |
| F | !J5XS.gif -> JEXS |
| F | !JEPC.gif -> JLPC |
| !JNGW.gif -> JNGW |
| !JPTC.gif -> JPTC |
| F | !JZGO.gif -> JZGD |
| !KLSG.gif -> KLSG |
| !KPZK.gif -> KPZK |
| F | !LNWW.gif -> ADVW |
| !LPAO.gif -> LPAO |
| F | !LRE8.gif -> LPE8 |
| !M58O.gif -> M58O |
| F | !MLE8.gif -> MLES |
| F | !MMI8.gif -> MMLS |
| !MNHS.gif -> MNHS |
| !MQ34.gif -> MQ34 |
| !N93K.gif -> N93K |
| !NO34.gif -> NO34 |
| !OBHS.gif -> OBHS |
| F | !OI55.gif -> O155 |
| !P1OO.gif -> P1OO |
| !Q1U8.gif -> Q1U8 |
| F | !QCTS.gif -> QC7O |
| F | !R1IO.gif -> K11O |
| F | !R5CG.gif -> R5CO |
| !RCOG.gif -> RCOG |
| !RCVK.gif -> RCVK |
| F | !RL9S.gif -> PLOS |
| F | !RXHS.gif -> RXHO |
| !SAKW.gif -> SAKW |
| F | !SG9C.gif -> SO9C |
| F | !SJGW.gif -> SJCW |
| !T1V4.gif -> T1V4 |
| F | !U1WW.gif -> JV7W |
| F | !V18O.gif -> VJ8O |
| !V8OG.gif -> V8OG |
| F | !VAUO.gif -> YAUO |
| !VQNK.gif -> VQNK |
| F | !W6GG.gif -> W6GC |
| F | !WHGO.gif -> WFGO |
| !WNQO.gif -> WNQO |
| F | !WY4O.gif -> VVOC |
| F | !XOCO.gif -> OCO7 |
| !XPN4.gif -> XPN4 |
| !Y6B4.gif -> Y6B4 |
| !YH68.gif -> YH68 |
| F | !YU9C.gif -> YUPC |
| !YUI8.gif -> YUI8 |
| F | !YX8O.gif -> YXBQ |
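A list in this format can be tallied with a few lines of Python. The sketch assumes, as the rows above suggest, that each file is named after its correct answer. The regular expression and function name are made up for this example, and only the first four rows are included as sample input.

```python
import re

# Matches "!<expected>.gif -> <recognized>" anywhere in a result line.
LINE = re.compile(r"!(\w+)\.gif -> (\w+)")

def score(lines):
    """Return (correct, total) for result lines like '| F | !1JM8.gif -> JJM5 |'."""
    correct = total = 0
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue  # skip lines that are not result rows
        expected, recognized = match.groups()
        total += 1
        correct += expected == recognized
    return correct, total

sample = [
    "| !15WW.gif -> 15WW |",
    "| !1ALS.gif -> 1ALS |",
    "| F | !1JM8.gif -> JJM5 |",
    "| F | !1JZK.gif -> JJZK |",
]
correct, total = score(sample)
print(f"{correct} of {total} recognized correctly")
```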
Further improvement of this detection depends on this captcha's availability (that is, on whether rapidshare.com uses it long enough).