• [object Object]@lemmy.world
      English
      arrow-up
      11
      arrow-down
      1
      ·
      edit-2
      3 days ago

      It’s deliberately slow to load

      That kinda defeats the goal of feeding AI as much garbage as possible. They will just fetch a page from a different site in that time, instead of spending cycles on this page. It’s not like the crawler works strictly serially.

      • gressen@lemmy.zip
        English
        arrow-up
        36
        ·
        edit-2
        3 days ago

        The idea is to protect your own server from unnecessary load. You’re welcome to provide a faster AI tar pit, just mind that ultimately this is a waste of resources.
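
        A rough sketch of why a slow response is cheap for the host: the server hands out a small chunk, then sleeps, so the connection stays open while almost no CPU is spent. The names `drip`, `chunk_size`, and `delay` are made up for illustration and are not taken from any particular tar pit tool.

```python
import time

def drip(body, chunk_size=16, delay=1.0, sleep=time.sleep):
    """Yield `body` in small chunks, pausing after each one.

    `sleep` is injectable so the pause can be replaced in tests.
    All parameter names here are hypothetical.
    """
    for start in range(0, len(body), chunk_size):
        yield body[start:start + chunk_size]
        sleep(delay)  # the server idles here; the client just waits

# Example: 40 bytes in 16-byte chunks, with no real delay
chunks = list(drip(b"x" * 40, chunk_size=16, delay=0))
```

        In a real server the pause would have to be non-blocking (for example an async sleep), otherwise each trapped crawler ties up a worker.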

        • [object Object]@lemmy.world
          English
          arrow-up
          9
          ·
          edit-2
          3 days ago

          I’m guessing that Markov chains are pretty efficient computationally compared to AI training. Don’t have a site currently, but I’d love to see a bot rip through hundreds of pages a minute.
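
          For a sense of how little work that is, here is a minimal word-level Markov chain sketch. It assumes a plain-text corpus and a first-order chain; `build_chain` and `babble` are illustrative names, not part of any existing tar pit. Each generated word costs one dictionary lookup and one random choice.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def babble(chain, length=30, seed=None):
    """Generate `length` words by randomly walking the chain."""
    rng = random.Random(seed)
    keys = list(chain)
    key = rng.choice(keys)
    order = len(key)
    out = list(key)
    while len(out) < length:
        followers = chain.get(key)
        if not followers:
            # Dead end (state never seen with a follower): restart elsewhere
            key = rng.choice(keys)
            out.extend(key)
            continue
        out.append(rng.choice(followers))
        key = tuple(out[-order:])
    return " ".join(out[:length])

chain = build_chain("the bot reads the page and the bot follows the link")
text = babble(chain, length=12, seed=0)
```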

    • VoterFrog@lemmy.world
      English
      arrow-up
      6
      arrow-down
      3
      ·
      2 days ago

      Doesn’t work either

      The text you provided translates to:
      “But what about typing like this?”. This style of writing involves replacing standard Latin letters with similar-looking characters from other alphabets or adding diacritical marks (accents, tildes, umlauts) available in the Unicode standard.

    • Luffy@lemmy.ml
      English
      arrow-up
      6
      arrow-down
      3
      ·
      2 days ago

      Even if the LLM doesn’t recognise it, the human ghost workers will train/translate it

      You’re only hindering people who have trouble reading

    • boonhet@sopuli.xyz
      English
      arrow-up
      10
      ·
      2 days ago

      Appreciate you using the ß correctly instead of using it as a replacement for “B”

  • Demdaru@lemmy.world
    English
    arrow-up
    87
    arrow-down
    5
    ·
    3 days ago

    I am so confused by the low link lol.

    • “AI haters build tarpits to trap and trick AI (!)” - Oh my god, poor AI :<
    • “…that ignore robots.txt!” - …oh, so illegal AI…?
    • “Attackers explain-” - YEAH! THE EVIL AGGRESSIVE
    • “how anti-spam defense became an AI weapon” - …folk trying to defend against spam…?

    FFS, they try to paint people protecting themselves as evil, but they keep too many of the facts in and it becomes an absolutely confusing mess xD

    • skulblaka@sh.itjust.works
      English
      arrow-up
      41
      arrow-down
      6
      ·
      3 days ago

      It’s not really that confusing.

      The software equivalent of armed masked men is illegally breaking into your personal property, stealing everything that isn’t nailed down and ripping all the nails out of everything that is, and then leaving with it in order to reuse it for personal profit. It is, in all ways, similar to a home invasion. These invaders are then telling you that you’re a bad person because you don’t want them invading your property and stealing all your shit.

      It’s highly illegal, everyone involved with it knows for a fact that it’s highly illegal, so the best they can do is try and spin propaganda around it because nobody has the balls to try and arrest Sam Altman, et al. about it.

      If you pick the lock on my front door and enter my home without permission I am going to put a 12 gauge slug through your solar plexus. If I could do the same to an AI crawler I would.

      • Amberskin@europe.pub
        English
        arrow-up
        5
        ·
        2 days ago

        There is a way to stop the AI crawlers, but it involves using dynamite and a high risk of landing in prison.

      • Pieisawesome@lemmy.dbzer0.com
        English
        arrow-up
        25
        arrow-down
        8
        ·
        3 days ago

        This is a terrible analogy.

        First off, robots.txt has no force of law. It’s just a courtesy. You are free to ignore it (except where prohibited by EULA or contract).

        Secondly, this is more similar to a supermarket hanging a sign that you can only access 3 of their 11 aisles.

        What this is doing is: if you try to access the 8 aisles they requested you not to use, you have to solve a math problem or two.

        AI scrapers are obnoxious, loud, drunk people who take way more than their fair share.

        If you truly have something private (like your house), you should not expose it publicly on the internet.
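
        To make the robots.txt point concrete: a polite crawler checks the file itself before fetching, and nothing on the server side enforces it. A minimal sketch using Python's standard `urllib.robotparser`; the rules, the `ExampleBot` user agent, and the example.com URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string instead of the network
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

ok_public = allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page")
ok_private = allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page")
```

        A crawler that skips this check gets the page anyway, which is exactly why the file is a courtesy and not an access control.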

        • bless@lemmy.ml
          English
          arrow-up
          5
          ·
          2 days ago

          “Beware of dog” signs also have no force of law.

          But apart from that, if your crawler ends up stuck in an endless loop, that’s poor coding on your part. Human beings won’t browse a static website endlessly, neither should a crawler
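
          A sketch of the kind of guard meant here, assuming a breadth-first crawler: remember visited URLs and cap both depth and total pages, so even an endless maze of generated links ends the crawl. `crawl`, `get_links`, and `endless_maze` are hypothetical names, and the maze stands in for a real fetch-and-parse step.

```python
from collections import deque

def crawl(start, get_links, max_pages=100, max_depth=5):
    """Breadth-first crawl that stops at `max_pages` pages or `max_depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # visit this page but do not follow its links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

def endless_maze(url):
    """Stand-in for a tar pit: every page links to two brand-new pages."""
    return [url + "/a", url + "/b"]

pages = crawl("https://example.com", endless_maze, max_pages=1000, max_depth=3)
```

          With a depth limit of 3 the crawl above stops after 1 + 2 + 4 + 8 = 15 pages, no matter how many links the maze keeps generating.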

        • skulblaka@sh.itjust.works
          English
          arrow-up
          22
          arrow-down
          1
          ·
          3 days ago

          Well, let’s turn this situation around then and see how it changes.

          I hammer Meta’s backend services with 6.8m requests per second, ignoring all posted guidelines, absorbing all the data I can get my hands on from them and feeding it to my machine which is busy trying to build BaseFook based on Meta’s data that I’ve harvested from them.

          Criminal DDoS? What’s that?

          Copyright law? Surely this doesn’t apply to this.

          Unauthorized access to backend systems? Nah, we’ll be fine, that’s definitely legal.

          It is currently true that robots.txt doesn’t have legal teeth and relies on voluntary compliance, but there have been court cases involving it in the past, and in my opinion they should have resulted in an established legal precedent. Check these out (courtesy of Wikipedia):

          The robots.txt played a role in the 1999 legal case of eBay v. Bidder’s Edge,[12] where eBay attempted to block a bot that did not comply with robots.txt, and in May 2000 a court ordered the company operating the bot to stop crawling eBay’s servers using any automatic means, by legal injunction on the basis of trespassing.[13][14][12] Bidder’s Edge appealed the ruling, but agreed in March 2001 to drop the appeal, pay an undisclosed amount to eBay, and stop accessing eBay’s auction information.[15][16]

          In 2007 Healthcare Advocates v. Harding, a company was sued for accessing protected web pages archived via The Wayback Machine, despite robots.txt rules denying those pages from the archive. A Pennsylvania court ruled “in this situation, the robots.txt file qualifies as a technological measure” under the DMCA. Due to a malfunction at Internet Archive, Harding could temporarily access these pages from the archive and thus the court found “the Harding firm did not circumvent the protective measure”.[17][18][19]

          In 2013 Associated Press v. Meltwater U.S. Holdings, Inc. the Associated Press sued Meltwater for copyright infringement and misappropriation over copying of AP news items. Meltwater claimed that they did not require a license and that it was fair use, because the content was freely available and not protected by robots.txt. The court decided in March 2013 that “Meltwater’s copying is not protected by the fair use doctrine”, mentioning among several factors that “failure […] to employ the robots.txt protocol did not give Meltwater […] license to copy and publish AP content”.[20]

          • SkyezOpen@lemmy.world
            English
            arrow-up
            13
            ·
            3 days ago

            The critical difference that determines whether or not it’s illegal is how many lawyers the site owner has.

      • MonkderVierte@lemmy.zip
        English
        arrow-up
        5
        ·
        edit-2
        3 days ago

        More like clogging the entry to your exhibition for making copies of your licensed produce, no?

  • FosterMolasses@leminal.space
    English
    arrow-up
    19
    ·
    2 days ago

    It’s weird how this is written to make them sound more like animals or insects than a computer algorithm… “thrash around” lol

    • wolframhydroxide@sh.itjust.works
      English
      arrow-up
      4
      ·
      2 days ago

      It’s because the tool is named Nepenthes, after the pitcher plants, into which victims fall, cannot escape, and thrash around until they die and are digested.

  • laranis@lemmy.zip
    English
    arrow-up
    22
    arrow-down
    1
    ·
    2 days ago

    Soon: US Republicans introduce law to prohibit the use of AI tar pits; cite copyright law and freedom of speech.

  • pigup@lemmy.world
    English
    arrow-up
    37
    arrow-down
    1
    ·
    3 days ago

    Please, someone make us a super-easy-to-implement version of this.