
Today CAPTCHAs are widely used on the Internet to prevent resource abuse by bots, although a few other applications exist, which are described in Section 2.4.2. Web sites are usually designed and intended for human use. According to Basso and Sicco [14], Web robots, or bots for short, can be defined as computer programs that run automated tasks over the Internet without the need for human interaction. Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone. From the point of view of the Web server, it is impossible to tell whether a Web request originated from a human user or from a bot: the HTTP (Hypertext Transfer Protocol) requests look exactly the same. However, a bot can repeat Web-related activities that were conceived as the prerogative of human beings much more rapidly than a human user can. As will be seen in Section 6, these differences in behavior offer alternative ways to distinguish between human and machine users over the Internet.
Given that human users cannot be distinguished from machine users based solely on the HTTP protocol, CAPTCHAs provide a security barrier by posing a puzzle that human users can pass but machine users cannot. The CAPTCHA must be solved before the client can proceed, so it acts as the gatekeeper to the Web resource coveted by the attacker.

To defeat the measures meant to stop their proliferation, bots are evolving into more complex and sophisticated programs, which possess ever-increasing intelligence and can reproduce human actions with a high degree of fidelity.
The actions of bots can be driven by legitimate purposes or can rely on malicious plans. Therefore, robots can accomplish two opposite goals [14]:
• Help human beings in carrying out repetitive and time-consuming operations.
• Undertake hostile or illegal activities, becoming a serious threat to Web application security.
Legitimate Purposes of Robots
Currently, there are several situations in which using automated tools is mandatory because of the large amount of data to be processed. Some examples are:
• Web spidering or crawling [53]. A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Many sites, search engines in particular, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Legitimate Web bots identify themselves in the User-Agent field of each HTTP request they send to a Web server; for instance, Yahoo!’s Web crawler Slurp is identified by the following string: Mozilla/5.0 (compatible; Yahoo! Slurp; hxx://help.yahoo.com/help/us/ysearch/slurp). Legitimate Web spiders usually respect the resources of Web servers by following the robots exclusion protocol, also known as the robots.txt protocol [93], a standard that lets administrators indicate which parts of their Web servers should not be accessed by crawlers.
• Web site mirroring [63]. For instance, the Internet Archive is a nonprofit organization that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. The Internet Archive includes texts, audio, moving images, and software as well as archived Web pages in its collections, and it uses a crawler named Heritrix, which is identified by the user-agent string archive.org_bot.
• Vulnerability assessment [49]. This is the process of performing a security review of a Web application by searching for design flaws, vulnerabilities, and inherent weaknesses. It can be automated by using software that retrieves Web site pages and builds specific requests to find unvalidated inputs, improper error handling, cross-site scripting, etc. An example of an automated Web site vulnerability assessment tool is White-Hat Sentinel [136].
• Chat and instant messaging system management. For instance, an Internet relay chat (IRC) bot is a set of scripts or an independent program that connects to IRC as a client to perform automated functions that include preventing malicious users from taking over the channel, logging what happens in an IRC channel, giving out information on demand, creating statistics, hosting trivia games, etc.
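As an illustration of the robots exclusion protocol mentioned in the list above, the following minimal sketch shows how a well-behaved crawler might consult a site's robots.txt rules before requesting a page. It uses Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, the ExampleCrawler name, and the helper functions build_parser and may_fetch are assumptions made for this example, not material from the cited sources.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site (an assumption
# for illustration; real sites publish their own rules at /robots.txt).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: archive.org_bot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to request the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("ExampleCrawler", "https://example.com/index.html"),
        ("ExampleCrawler", "https://example.com/private/data.html"),
        ("archive.org_bot", "https://example.com/index.html"),
    ]:
        print(agent, url, may_fetch(rules, agent, url))
```

Compliance is voluntary: the protocol only states the administrator's wishes, and nothing in it prevents a crawler from ignoring the rules.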
Unfortunately, these uses are not always legitimate. For instance, Web crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (most often for spamming). These bots identify themselves as legitimate users or as search engine bots to disguise their ultimate goal. Another example is malicious IRC bots [19], designed for the purpose of infecting other users with viruses, sending spam, or controlling botnets for spamming and denial-of-service attacks.
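The disguise described above works because the User-Agent field is self-declared. The sketch below, written under that assumption, classifies a request by looking for the crawler tokens named earlier in the text. The token table and the function declared_crawler are hypothetical and only illustrate the point: any client that copies the string receives the same classification, so the header is a claim, not a proof of identity.

```python
from typing import Optional

# Crawler tokens taken from the examples in the text; the table itself
# is an illustrative assumption, not an authoritative list.
KNOWN_CRAWLER_TOKENS = {
    "yahoo! slurp": "Yahoo! Slurp",
    "archive.org_bot": "Internet Archive (Heritrix)",
}


def declared_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler name that a User-Agent string claims, if any."""
    lowered = user_agent.lower()
    for token, name in KNOWN_CRAWLER_TOKENS.items():
        if token in lowered:
            return name
    return None


if __name__ == "__main__":
    # Abbreviated header value; a harvesting bot could send exactly the
    # same text, and the server would have no way to tell from it alone.
    claimed = "Mozilla/5.0 (compatible; Yahoo! Slurp)"
    print(declared_crawler(claimed))
    print(declared_crawler("Mozilla/5.0 (X11; Linux x86_64)"))
```

A server that wants stronger assurance needs evidence beyond the header, such as the behavioral differences the text refers to.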
There are also some activities on the fringes of legality. For instance, gaming bots can be used for fair purposes (e.g., as competitors or collaborators in a game, featuring weak AI functions) or for unfair purposes (such as helping the user collect resources, increase the experience of the player’s avatar, etc.). As another example, automated trading systems help stock brokers to, for example, sell or buy under certain conditions. However, they can also be used to manipulate stock prices artificially.





