  RegMe

        The idea of a captcha is that humans must be able to solve it. If humans can solve it, a bot can solve it the same way. The only problem is to understand why we can solve a captcha and how we do it.


        When solving a captcha, we see lines, intersections, angles, directions and proportions between image elements, and from these we identify the letters. A bot can, of course, do the same.


F[1:10] 2[3:10] 9[3:35] L[5:5] 6[2:5] 8[4:28]

         Maybe not as well as we do, but well enough. In some cases, tracing the lines is not enough. The lines don't always define the letters, if there are any at all. A good example is rapidshare.com's new captcha.


        Sometimes the lines only contribute to the big picture, so they are not needed individually. Only the big picture is needed (that is, the image after median filtering).


        But before this filtering, a bot can do many other useful things, such as following the small lines that compose this captcha to make adjustments that improve the accuracy of the search.


        From this point, one solution would be to build a matrix of grid cells and record in it only the differences found from the surrounding pattern (cell area, cell shape, shadowed borders etc.). That would be the best way to solve it. If this kind of captcha stays in use for longer, more methods will be used to improve the detection. For now, an easier way is implemented.
        One might say that, for a better look at this captcha, it should be rotated. Finding the angles of the grid lines is easy, but the letters themselves are not rotated. Better adjustments can be made by straightening the captcha, using the grid lines as a reference.


        After the vertical adjustment, the vertical reference lines are lost, so we use the white lines as a new reference.
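Straightening against a reference line can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in captcha.dll: the function name `straighten_columns` is hypothetical, and the input is assumed to be a binary mask (1 for pixels of one reference line, 0 elsewhere) that has already been extracted from the image.

```python
def straighten_columns(image, background=0):
    """Shift every column vertically so the reference line becomes horizontal.

    `image` is a list of rows; 1 marks a reference-line pixel, 0 anything else.
    (Assumption: a single reference line has been isolated beforehand.)
    """
    height, width = len(image), len(image[0])

    # Row index of the first reference pixel in each column (None if empty).
    tops = []
    for x in range(width):
        top = next((y for y in range(height) if image[y][x]), None)
        tops.append(top)

    found = [t for t in tops if t is not None]
    if not found:
        return [row[:] for row in image]

    # Align every column to the highest point reached by the line.
    target = min(found)
    result = [[background] * width for _ in range(height)]
    for x, top in enumerate(tops):
        shift = 0 if top is None else top - target
        for y in range(height):
            src = y + shift
            if 0 <= src < height:
                result[y][x] = image[src][x]
    return result
```

In practice the same per-column shifts would be applied to the full grayscale image rather than to the mask alone.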


        4 x median filter + contrast adjustment
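A minimal sketch of this step, assuming a grayscale image stored as a list of rows with values 0-255. The 3x3 window, the border clamping and the linear contrast stretch are assumptions made for illustration; the original only states that a median filter is applied four times and followed by a contrast adjustment.

```python
def median_filter(image):
    """3x3 median filter on a grayscale image given as a list of rows (0-255)."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clamping coordinates at the borders.
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            window.sort()
            result[y][x] = window[4]  # middle of the 9 sorted values
    return result


def stretch_contrast(image):
    """Linearly rescale pixel values so they span the full 0-255 range."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [row[:] for row in image]  # flat image, nothing to stretch
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]


def preprocess(image, passes=4):
    """Apply the median filter `passes` times, then stretch the contrast."""
    for _ in range(passes):
        image = median_filter(image)
    return stretch_contrast(image)
```

Repeating a small median filter removes thin lines and isolated specks while keeping larger regions, which is why only the "big picture" survives.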


        After filtering, the letter information remains, but not all of it is really necessary for detection. The white lines have different shapes for different letters. Better detection is possible using the shadows that surround the letters, but to simplify things only the white lines are used. The letters are separated and each one is re-scaled to fit in a 16x16 matrix.

0000000011100000
1111100111000000
1111100111100000
1110000111000000
0000001111000000
0000000000000000
0000000000000000
0011111000111000
0011111000011100
0000000000011110
0000000000001110
0000000000001111
0000000000011111
0000000010111111
1101111111111111
1111111111111100

0000000011100000
0111111111100000
0111111111100000
0111000000000000
0011100000010000
0011000000010000
0000000101111000
0000111111111000
0000111111111000
0000111100100010
0000111100000000
0000001100000000
0000011100000000
0000001100000000
0000011110000000
0001111110000000

0000000000011100
0001111111001110
0001111111111110
0001100011111110
0000000000000000
0011111000011000
0111111111111000
0011111111111000
0011100001111100
0001110000000000
1001111000000000
0000111000000000
0000000000001111
0000000000001110
0111111111111110
0111111111111111

0000111100110000
0000111100110000
0000111100111000
0000111000111000
0111111100011100
0000011100011100
0100001100011100
0000011000111110
0000011100111100
0000011111111100
0000111111000000
0000111001000000
0000110000000011
0111110000000111
0001111111111111
0000111111111110
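The re-scaling of a separated letter into a 16x16 matrix like the ones above could look like this. It is a sketch only: nearest-neighbour sampling and the names `rescale_to_grid` and `to_text` are assumptions, and the original does not say which resampling method is used.

```python
def rescale_to_grid(glyph, size=16):
    """Nearest-neighbour rescale of a binary glyph (list of rows of 0/1)."""
    height, width = len(glyph), len(glyph[0])
    return [
        [glyph[y * height // size][x * width // size] for x in range(size)]
        for y in range(size)
    ]


def to_text(grid):
    """Render the grid as lines of '0'/'1' characters."""
    return "\n".join("".join(str(bit) for bit in row) for row in grid)
```

For example, `print(to_text(rescale_to_grid(glyph)))` prints a letter in the same 0/1 layout as the matrices shown above.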

        Searching for the "learned" pattern that has the largest number of common points can offer a solution for this captcha. If this kind of captcha stays in use, better methods will be implemented to improve detection.
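The search for the learned pattern with the most common points can be sketched as below. Two things are assumptions here: a "common point" is taken to mean a cell that is set to 1 in both bitmaps, and the `learned` dictionary (label mapped to a list of stored patterns) is a hypothetical layout, not the format used by captcha.dll.

```python
def common_points(a, b):
    """Count cells that are '1' in both bitmaps (lists of '0'/'1' strings)."""
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for bit_a, bit_b in zip(row_a, row_b)
        if bit_a == "1" and bit_b == "1"
    )


def classify(glyph, learned):
    """Return (label, score) of the learned pattern sharing the most set points.

    `learned` maps a label such as 'A' to a list of stored patterns
    (hypothetical structure, chosen for illustration).
    """
    best_label, best_score = None, -1
    for label, patterns in learned.items():
        for pattern in patterns:
            score = common_points(glyph, pattern)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score
```

Counting only shared set points favours patterns with many 1s, so a stricter variant might also subtract the cells where the two bitmaps disagree.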

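        Test results for the current version of captcha.dll on 100 downloaded captchas. The file name is the expected text; lines prefixed with F are failed detections (43 of the 100 were recognized completely):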
!15WW.gif -> 15WW
!1ALS.gif -> 1ALS
F!1JM8.gif -> JJM5
F!1JZK.gif -> JJZK
F!1TOO.gif -> L74O
!23V4.gif -> 23V4
F!27OW.gif -> 7OVV
F!2JSG.gif -> 2J5O
F!2NM8.gif -> 1NN8
!2XQO.gif -> 2XQO
!33SG.gif -> 33SG
F!3B74.gif -> ZB74
F!3GI8.gif -> 7DQ8
F!3PWO.gif -> 7V7O
!4N3K.gif -> 4N3K
F!5HGW.gif -> SHGW
!5MSO.gif -> 5MSO
F!675C.gif -> 6T5C
F!6BU8.gif -> DBU8
!6X74.gif -> 6X74
!77XS.gif -> 77XS
F!78KO.gif -> X8KO
F!7QY8.gif -> 7QYB
F!7RKG.gif -> 7PKG
F!8NAO.gif -> BIPO
!92UO.gif -> 92UO
F!9J9S.gif -> 9JSS
F!9O9C.gif -> 9Q9C
F!AFDC.gif -> AFDL
!AOXC.gif -> AOXC
F!ARCG.gif -> 4ACO
F!C2MO.gif -> CZKC
!DIUO.gif -> DIUO
!DVBK.gif -> DVBK
!DWOG.gif -> DWOG
F!EINK.gif -> HHPK
F!ELQO.gif -> ELOO
F!EUI8.gif -> FOL8
F!F5ZK.gif -> FFZK
!FJXS.gif -> FJXS
!FPHS.gif -> FPHS
F!FRPS.gif -> FRPG
F!FZM8.gif -> PZO8
!G6EO.gif -> G6EO
F!GCY8.gif -> GCYB
F!GHRK.gif -> SNRK
F!GO28.gif -> GQZB
!HPYO.gif -> HPYO
!HT1S.gif -> HT1S
F!IDWW.gif -> OWDV
F!IUGG.gif -> LUG6
F!IYWG.gif -> HD7C
F!J5XS.gif -> JEXS
F!JEPC.gif -> JLPC
!JNGW.gif -> JNGW
!JPTC.gif -> JPTC
F!JZGO.gif -> JZGD
!KLSG.gif -> KLSG
!KPZK.gif -> KPZK
F!LNWW.gif -> ADVW
!LPAO.gif -> LPAO
F!LRE8.gif -> LPE8
!M58O.gif -> M58O
F!MLE8.gif -> MLES
F!MMI8.gif -> MMLS
!MNHS.gif -> MNHS
!MQ34.gif -> MQ34
!N93K.gif -> N93K
!NO34.gif -> NO34
!OBHS.gif -> OBHS
F!OI55.gif -> O155
!P1OO.gif -> P1OO
!Q1U8.gif -> Q1U8
F!QCTS.gif -> QC7O
F!R1IO.gif -> K11O
F!R5CG.gif -> R5CO
!RCOG.gif -> RCOG
!RCVK.gif -> RCVK
F!RL9S.gif -> PLOS
F!RXHS.gif -> RXHO
!SAKW.gif -> SAKW
F!SG9C.gif -> SO9C
F!SJGW.gif -> SJCW
!T1V4.gif -> T1V4
F!U1WW.gif -> JV7W
F!V18O.gif -> VJ8O
!V8OG.gif -> V8OG
F!VAUO.gif -> YAUO
!VQNK.gif -> VQNK
F!W6GG.gif -> W6GC
F!WHGO.gif -> WFGO
!WNQO.gif -> WNQO
F!WY4O.gif -> VVOC
F!XOCO.gif -> OCO7
!XPN4.gif -> XPN4
!Y6B4.gif -> Y6B4
!YH68.gif -> YH68
F!YU9C.gif -> YUPC
!YUI8.gif -> YUI8
F!YX8O.gif -> YXBQ
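A log in this format can be summarized with a short script. The sketch below assumes the line layout shown above (an optional F, then "!", the file name, "->" and the detected text). The per-character rate is an extra metric added for illustration and is not reported in the original.

```python
def parse_result(line):
    """Split a log line like 'F!1JM8.gif -> JJM5' into (expected, detected)."""
    name, detected = line.split("->")
    # Drop the optional 'F' marker and the '!', then the file extension.
    expected = name.split("!", 1)[1].strip().rsplit(".", 1)[0]
    return expected, detected.strip()


def accuracy(lines):
    """Return (whole-captcha accuracy, per-character accuracy)."""
    pairs = [parse_result(line) for line in lines if "->" in line]
    whole = sum(expected == detected for expected, detected in pairs)
    chars = sum(
        e == d
        for expected, detected in pairs
        for e, d in zip(expected, detected)
    )
    total_chars = sum(len(expected) for expected, _ in pairs)
    return whole / len(pairs), chars / total_chars


sample = [
    "!15WW.gif -> 15WW",
    "F!1JM8.gif -> JJM5",
    "F!3B74.gif -> ZB74",
    "!23V4.gif -> 23V4",
]
whole_rate, char_rate = accuracy(sample)
print(whole_rate, char_rate)
```

Many of the failed lines differ from the expected text in only one character, so the per-character rate gives a more detailed view of the detection quality than the whole-captcha rate alone.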

Improvement of this detection depends on this captcha's availability (that is, on whether rapidshare.com uses it long enough).


