O'Reilly: WindowsDevCenter.com -
March 30, 2004

If you're looking to block online site ads and offensive Web content, you don't need to buy special software -- instead, you can use two techniques available for any browser. One technique uses the HOSTS file built into Windows, and the other uses PAC files, a feature of all modern browsers. Problems can crop up with both these approaches. This article will explain why the problems occur and how to solve them.

Few web sites host their own banner ads. Typically they sign up with ad servers that deliver content and track views and clicks. Thus you can block most web site ads by blocking a fairly limited number of ad servers. HOSTS and PAC files can block web ads by blocking access to these ad servers. You can also block other sites serving objectionable content.

What Is the HOSTS File?

Unless a computer is configured to use a proxy server, the HOSTS file is the first place a browser looks for an IP address when you type in a URL such as www.permutations.com. Only if the domain name is not found in the HOSTS file does the browser then query the DNS server. It is this fact that makes the HOSTS file an effective means for blocking web site ads.

The HOSTS file is stored in different places depending on your operating system:
It's a text file you can open in Notepad. Comments at the top explain the simple syntax. Each line consists of an IP address, a domain name, and an optional comment placed after a pound sign. The one default entry in every HOSTS file looks like this:
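As an illustration of this syntax, here is a minimal JavaScript sketch that parses HOSTS-style lines and mimics the lookup order described above: a listed name gets its mapped address, and an unlisted name falls through to DNS. The helper names (parseHostsLine, lookupHost), the sample entries, and the host ads.example.com are assumptions made up for this example; they are not part of Windows or of any real blocklist.

```javascript
// Illustrative sketch only: parseHostsLine and lookupHost are hypothetical
// helpers, and the sample entries below are made up for demonstration.

// Parse one HOSTS line into { ip, names }, or null for blank/comment lines.
function parseHostsLine(line) {
  // Everything after a pound sign is a comment.
  const withoutComment = line.split("#")[0].trim();
  if (withoutComment === "") return null;
  // Fields are separated by spaces or tabs; the IP address comes first.
  const [ip, ...names] = withoutComment.split(/\s+/);
  if (names.length === 0) return null; // an address with no name is not an entry
  return { ip, names };
}

// Return the IP mapped to hostName, or null if the name is not listed
// (in which case a real resolver would go on to query DNS).
function lookupHost(hostsText, hostName) {
  const wanted = hostName.toLowerCase();
  for (const line of hostsText.split(/\r?\n/)) {
    const entry = parseHostsLine(line);
    if (entry && entry.names.some((n) => n.toLowerCase() === wanted)) {
      return entry.ip;
    }
  }
  return null;
}

const sampleHosts = [
  "# sample HOSTS content (assumed for this example)",
  "127.0.0.1   localhost",
  "127.0.0.1   ads.example.com   # hypothetical ad server, sent to loopback",
].join("\n");

console.log(lookupHost(sampleHosts, "ads.example.com")); // 127.0.0.1
console.log(lookupHost(sampleHosts, "www.permutations.com")); // null
```

Note that the match is on the exact name only, which is why a HOSTS file needs one entry per host.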
To use the HOSTS file to block web ads, you add a list of hosts serving objectionable content (such as ad servers), and associate these domains with the loopback address, which is your own computer. Then when you navigate to a site that contains banner ads, the browser looks on your own machine for the ads and never visits the ad server. Thus the ads are never displayed, and the ad server has no opportunity to put tracking cookies on your computer.

Compiling a list of ad servers for an ad-blocking HOSTS file would take a lot of time, but happily you don't have to do it. There are numerous ad-blocking HOSTS files available for download on the Internet. Mike Skallas distributes one that is updated each month. Regular updates are necessary because new ad servers pop up all the time.

If you see an ad while running an ad-blocking HOSTS file, it means one of two things: (1) the ad is hosted on the site's own server, or (2) the ad server is new. To find out where the ad is coming from, right-click on it and select Copy Shortcut. If the ad is hosted on the site, you can't block it with a HOSTS file because HOSTS files can only block entire sites. (This is not true of PAC files, which I'll discuss later.) If it's a new ad server, paste the domain portion of this URL into your HOSTS file with a redirect to the loopback address.

HOSTS File Problems and Solutions

The HOSTS file trick is clever, but there are some potential problems with it.

Ad-blocking HOSTS files can include sites that shouldn't be there, blocking access to sites you want to see. This occurs because some ad servers also provide other types of content. For example, the ad server akamai.com also provides streaming media for many web sites, including Microsoft, for whom it handles Windows Update. If you block akamai.com, you won't be able to access Windows Update.

Then there's the aesthetic issue.
Ideally, you'd see blank areas in place of ads, but in practice there are unattractive "Action canceled" error messages repeated wherever an ad would have been. There is a solution to this, as you'll see shortly.

And then there is the problem with delays. The idea behind the HOSTS file trick is to redirect ad-server requests to an IP address where there is no server. Internet Explorer will fail immediately if it can't find a server, but other browsers (notably, Opera) wait much longer before giving up.

Both these problems can be solved by installing a small, single-purpose web server that does nothing but serve transparent bitmaps when requests are received on the loopback address. This replaces unsightly error messages with blank areas, and eliminates delays because the browser receives an immediate response. A free utility for this purpose will be described later in this article.

But there are other potential problems. If you are running a real web server on your computer such as Personal Web Server (PWS) or Internet Information Services (IIS), you'll get a dialog prompting for a network password each time you navigate to a site with redirected ads. This is because, by default, PWS and IIS are configured as the "default web site," responding to all IP addresses assigned to the computer that are not assigned to other sites. When the HOSTS file redirects your browser to the loopback address, an actual web server is there to answer. Since the request is for resources it can't find, it pops up an "Enter network password" dialog.

There are various things you can do to get around this, but all involve giving up something. If your computer is on a network, you can change the default IP setting of "(All Unassigned)" to the computer's network IP address, thus excluding the loopback address. Another possibility is to redirect the ad servers in the HOSTS file to a non-existent IP address.

PWS and IIS are configured by default to use TCP port 80, which is standard for HTTP.
Another way you can prevent the "Enter network password" popup is to change the port to something other than 80 (81, for example). But this will make your server invisible to anyone who doesn't know that the port must be specified in the URL.

The best solution if you're running a web server is to not use the HOSTS file for ad blocking at all, but instead to use a PAC file, which doesn't conflict with existing web servers.

PAC files have other advantages as well. As mentioned earlier, HOSTS files can only block entire sites, and not specific URLs within a site. PAC files can block specific URLs within a site so, for example, you could block akamai.com ads without disabling Windows Update.

HOSTS files have to be large to block all the major ad servers because wildcards are not supported; you have to list the exact domain names. Very large HOSTS files slow your browser because of the time it takes to search a large, unindexed text file. PAC files are based on JavaScript and can specify URLs using shell expressions (Unix-style wildcard patterns, which are simpler than full regular expressions), so this problem is eliminated.

Finally, ad-blocking HOSTS files cannot be used on systems using proxy servers because the HOSTS file is bypassed. Proxy servers are not a problem with PAC files.

What Are PAC Files?

Proxy Auto-Config (PAC) files were introduced by Netscape with the release of JavaScript back in 1995, and all modern browsers support them, including Internet Explorer and Opera. PAC files consist of JavaScript defining the function FindProxyForURL.

The idea of using PAC files to block web site ads was conceived by John R. LoVerso in 1996, while he was immersed in finding and documenting security flaws in JavaScript.

PAC files support some special functions, two of which are useful for blocking ad sites:
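To block ads, your FindProxyForURL function checks each request against a list of ad servers and sends any match to a proxy address where no real proxy is listening.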
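A minimal sketch of such a function follows. It is not the author's listing or LoVerso's file: the blocked host ads.example.com, the path pattern */banners/*, and the BLACKHOLE address 127.0.0.1:3421 are assumptions chosen for illustration. Browsers supply dnsDomainIs and shExpMatch to PAC scripts; the two simplified stand-ins at the top exist only so the sketch can run outside a browser, and would be deleted from a real PAC file.

```javascript
// Stand-ins for two PAC helper functions, for running outside a browser only.
// They are simplified approximations; remove them from a real PAC file.
function dnsDomainIs(host, domain) {
  return host.endsWith(domain);
}
function shExpMatch(str, pattern) {
  // Translate a shell expression into a regular expression:
  // escape regex metacharacters, then map * to ".*" and ? to ".".
  const regexSource = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*")
    .replace(/\?/g, ".");
  return new RegExp("^" + regexSource + "$").test(str);
}

// Assumed dead-end proxy address: loopback, on the port the article names.
var BLACKHOLE = "PROXY 127.0.0.1:3421";

// The browser calls this for every request and uses the returned string
// to decide whether to go direct or through a proxy.
function FindProxyForURL(url, host) {
  if (0
      || dnsDomainIs(host, "ads.example.com")  // hypothetical ad server (whole host)
      || shExpMatch(url, "*/banners/*")        // hypothetical ad path on any site
  ) {
    return BLACKHOLE;
  }
  return "DIRECT";
}

console.log(FindProxyForURL("http://ads.example.com/x.gif", "ads.example.com"));
// PROXY 127.0.0.1:3421
console.log(FindProxyForURL("http://www.example.org/banners/top.gif", "www.example.org"));
// PROXY 127.0.0.1:3421
console.log(FindProxyForURL("http://www.example.org/index.html", "www.example.org"));
// DIRECT
```

The second test is what a HOSTS file cannot express: it blocks one path on a site while leaving the rest of that site reachable.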
(The zero is only there to line up the JavaScript statements.) Note that the blocked sites are redirected to port 3421. Redirecting to an unused port like 3421 causes no problems for IE or Mozilla, but Opera will pop up an error message complaining that there is no proxy at that address. The solution to this problem is the special-purpose web server mentioned earlier.

It's good to understand how PAC files work so you can modify them if necessary, but you don't have to start from scratch. John R. LoVerso provides a very good ad-blocking PAC file with detailed comments. Open the file in WordPad for editing; Notepad won't show the line breaks.

Once you have the PAC file, you have to tell your browser to use it. The location of the setting is a little different in each browser, but in general you'll find it among the network or connection settings. You specify the file using a syntax like this:
If you are using Internet Explorer, you have to change two other settings. Open the Internet Options dialog and click on the Security tab. Select "Local intranet" and click the "Sites" button. Uncheck the box labeled "Include all sites that bypass the proxy server."

One other change is necessary. You must turn off the auto-proxy caching mechanism, since it prevents you from restricting some of a server's content while allowing the rest. Unfortunately, there is no interface to this setting in the Internet Options dialog, but you can use a clever .REG file that not only changes the option, but also adds a checkbox for it on the Advanced page of the Internet Options dialog. This .REG file was written by Bill Talcott. Open Notepad, copy and paste these lines, save the file with the extension .reg, then double-click on it to load the settings into the registry:
The BlackHoleProxy Utility

As mentioned earlier, when using the HOSTS file or a PAC file to redirect ad servers, it's a good idea to run a small, single-purpose web server on the loopback address that responds to requests with a transparent bitmap. This is what BlackHoleProxy does, and it can be used with HOSTS files, PAC files, or both. You can download BlackHoleProxy for free, with source code.

You may have heard of a similar utility called eDexter that is free for personal use. BlackHoleProxy has some important options that eDexter lacks. It allows you to configure the port to use, which is crucial if you're running a web server on your computer. Another option lets BlackHoleProxy respond to computers other than the local machine.

Although BlackHoleProxy has all the features you might need, the interface is bare bones. There is no install program and no user interface. Options are set through the command line. For easy access, you can create shortcuts for the command-line options you think you'll need, plus another shortcut pointing to the documentation, and then create a folder for these in your Start menu.
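The idea behind such a utility can be sketched in a few lines of Node.js. This is not BlackHoleProxy's source code and does not reproduce its command-line options; the names TRANSPARENT_GIF, buildBlankResponse, and createBlackHoleServer are hypothetical, and a 1x1 transparent GIF is assumed as the stand-in for the "transparent bitmap."

```javascript
// Sketch of a "black hole" web server: every request gets a tiny transparent image.
// Not BlackHoleProxy's actual code; all names here are hypothetical.
const http = require("http");

// A 1x1 transparent GIF, base64-encoded (assumed stand-in for the bitmap).
const TRANSPARENT_GIF = Buffer.from(
  "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
  "base64"
);

// Build the same successful response for any request, whatever URL was asked for.
function buildBlankResponse() {
  return {
    status: 200,
    headers: {
      "Content-Type": "image/gif",
      "Content-Length": TRANSPARENT_GIF.length,
    },
    body: TRANSPARENT_GIF,
  };
}

// Create (but do not start) the server; the caller chooses the address and port.
function createBlackHoleServer() {
  return http.createServer((request, response) => {
    const blank = buildBlankResponse();
    response.writeHead(blank.status, blank.headers);
    response.end(blank.body);
  });
}

// To run it on the loopback address, uncomment the next line.
// Port 3421 suits a PAC file; port 80 would be needed for HOSTS-file redirection.
// createBlackHoleServer().listen(3421, "127.0.0.1");
```

Because the browser gets an immediate, valid image in reply, it has nothing to time out on and no error message to display.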
To use BlackHoleProxy with an ad-blocking HOSTS file, you must set it to port 80 by launching it with this command line:
If you're running a web server on your computer, you should use a PAC file rather than the HOSTS file to block ads so you can change to a port that doesn't conflict. By default, BlackHoleProxy uses port 3421 because it was designed to be used with the No-Ads PAC file.

Last but not least, don't forget to clear your browser's cache after setting up your ad-blocking HOSTS or PAC file, or the ads will be retrieved from your cache.

Sheryl Canter has been a Contributing Editor to PC Magazine since 1993, a software developer since the early 1980s, and was the editor of PC Magazine's Utilities column from 1993-2002. Sheryl's ad-blocking PAC file is available for download; see the comments in the file for usage.
Last revised 28 Nov 12