Attackers explain how an anti-spam defense became an AI weapon.

  • drkt@scribe.disroot.org · 2 months ago

    I can save you a lot of trouble, actually. You don’t need all of this!

    Just make a custom 404 page that returns 13 MB of junk with status code 200 and contains a few dead links (they 404, so they lead right back to the same page)

    There are no bots on the domain I do this on anymore. From swarming to zero in under a week.

    You don’t need tar pits or heuristics or anything else fancy. Just make your website so expensive to crawl that it isn’t worth it, and they filter themselves out.

    • Snowcano@startrek.website · 2 months ago

      Just make a custom 404 page that returns 13 MBs of junk along with status code 200

      How would you go about doing this part? Asking for a friend who’s an idiot, totally not for me.

      • drkt@scribe.disroot.org · 2 months ago (edited)

        I use Apache2 and PHP, here’s what I did:

        in .htaccess you can set `ErrorDocument 404 /error-hole.php` (see https://httpd.apache.org/docs/2.4/custom-error.html)

        in error-hole.php,

        <?php
        // Lie to the crawler: report success so it keeps indexing the junk
        http_response_code(200);
        ?>
        <p>*paste a string that is 13 megabytes long*</p>
        

        For the string, I used dd to generate 13 MBs of noise from /dev/urandom and then I converted that to base64 so it would paste into error-hole.php
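        The dd and base64 steps described above can be sketched like this (file names are examples; `bs=1M` is GNU dd syntax, BSD dd uses lowercase `bs=1m`):

        ```shell
        # Generate 13 MiB of random noise
        dd if=/dev/urandom of=noise.bin bs=1M count=13

        # Encode it as base64 so it can be pasted into error-hole.php as plain text
        base64 noise.bin > noise.b64
        ```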

        You should probably also hide some dead links around your website that normal users can’t see, as honeypots for the bots.
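        A minimal sketch of such a honeypot link (the URL and styling here are assumptions, not the commenter’s exact markup): the anchor stays in the HTML for crawlers to follow, while CSS hides it from human visitors.

        ```html
        <!-- Hypothetical honeypot: humans never see or tab to this; naive bots follow it -->
        <a href="/error-hole-bait" style="display:none" tabindex="-1" aria-hidden="true">archive</a>
        ```

        Any bot that follows it lands on the junk 404 page; a well-behaved crawler can be spared by disallowing the bait path in robots.txt.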

        • WolfLink@sh.itjust.works · 2 months ago

          For the string, I used dd to generate 13 MBs of noise from /dev/urandom and then I converted that to base64 so it would paste into error-hole.php

          That string is going to end up being about 17 MB: base64 encodes every 3 bytes as 4 ASCII characters, so 13 MB of noise grows by a factor of 4/3 (≈17.3 MB), assuming it’s a UTF-8 encoded .php file

      • drkt@scribe.disroot.org · 2 months ago

        Critics debating Nepenthes’ utility on Hacker News suggested that most AI crawlers could easily avoid tarpits like Nepenthes, with one commenter describing the attack as being “very crawler 101.” Aaron said that was his “favorite comment” because if tarpits are considered elementary attacks, he has “2 million lines of access log that show that Google didn’t graduate.”

        You assume incorrectly that bots, scrapers and drive-by malware attacks are made by competent people. I have years’ worth of stories I’m not going to post on the open internet that say otherwise. I also have months’ worth of access logs that say otherwise. AhrefsBot in particular is completely unable to deal with anything you throw at it. It spent weeks looping links in a tarpit I made, very similar to the one in the article, until I finally put it out of its misery.