Well, not quite, but close. I’m holding a hard disk that has ALL of Wikipedia’s text in 10 different languages.

Yes, you can download all of Wikipedia, and yes, it easily fits on a hard drive. Isn’t that amazing? Text is incredibly dense compared to images and video: around 22 GiB for English Wikipedia alone and 56 GiB for the 10 languages I downloaded.

I also have all of Wiktionary on the same hard drive. It’s around 16.4 GiB.

    • droning_in_my_ears@lemmy.world (OP) · 19 upvotes · 10 months ago

      Yeah, it’s pretty incredible. Wikimedia is the kind of project that almost feels like a small glimpse into a better world. What the internet could have been. It’s got some problems, of course, but it’s still a huge success.

      • intensely_human@lemm.ee · 1 upvote · 8 months ago

        Uh, Wikipedia is what the internet is.

        Wikipedia’s not a glimpse of a better world, it’s a glimpse of our current, existing world. Because Wikipedia exists.

        It’s not like that hard drive came through a portal from another universe.

  • penquin@lemm.ee · 12 upvotes · 10 months ago

    You’re going to be the savior of humanity after the apocalypse.

  • Masterblaster@kbin.social · 7 upvotes · 10 months ago

    There’s still so much valuable academic information that never sees the light of day, or gets erased as the internet serpent eats its own tail.

  • WarmSoda@lemm.ee · 5 upvotes · 10 months ago

    Last time I looked into downloading Wikipedia, it said it was 50 GB for English text and 100 GB with images. How’d you get it for half the space?

    • droning_in_my_ears@lemmy.world (OP) · 9 upvotes · 10 months ago

      It’s only the raw text in JSON Lines files. No media and no markup. I think I downloaded a compressed dump, then used wikiextractor to extract the text.
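For anyone who wants to check the size of an extraction like this, here is a rough sketch of how the JSON Lines output could be read back and measured. It assumes each line is a JSON object with a "text" field and that the files sit in subdirectories under one output folder; that matches my understanding of wikiextractor's JSON output, but treat the field name and layout as assumptions. The function names (`iter_articles`, `summarize`) are made up for illustration.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_articles(root: Path) -> Iterator[dict]:
    """Yield one dict per article from every file under `root`.

    Assumes each non-empty line is a standalone JSON object (JSON Lines).
    """
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)


def summarize(root: Path) -> tuple[int, int]:
    """Return (article_count, total UTF-8 bytes of the "text" fields)."""
    count = 0
    total_bytes = 0
    for article in iter_articles(root):
        count += 1
        # "text" is the assumed field name for the extracted article body.
        total_bytes += len(article.get("text", "").encode("utf-8"))
    return count, total_bytes
```

Calling `summarize(Path("extracted"))` on a hypothetical output folder named `extracted` returns the article count and the raw text size in bytes; dividing the byte count by `2**30` gives GiB.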

        • ace_garp@lemmy.world · 3 upvotes · 10 months ago

          OK yes, some supporting info: Aard2 is an offline Wikipedia app that uses small compressed data files in .slob format.

        • intensely_human@lemm.ee · 1 upvote · 8 months ago

          Slob compression is best visualized as putting a sleeping bag into a stuff sack, except it’s all your possessions and you’re stuffing them into an old Chevy Metro.