Yet another question about self-hosting email, but I haven’t found an answer, at least not one phrased in a way that fits my question.

I’ve got ~15 GB of old Gmail data that I’ve already downloaded, and Google is on my ass about being “91% full”. We all know I’m not about to pay them for storage (I’d sooner spend 100 hours trying to solve it myself than pay them $3/month).

What I want is the same (or close to the same) experience finding stuff in those old emails when they are stored on my hardware as I have when they are in Gmail. That is, I want a website and/or app where I can search for emails from so-and-so, in some date range, with some keywords. I don’t actually want to send any emails from this server or receive anything on it (maybe I would want Gmail to forward to it or something, but probably I’d just do another archive batch every year).

What I’ve tried so far, which is sort of working, is setting up docker-mailserver on my box; it is running and accessible, and I can connect to it via Thunderbird or K-9 Mail. I also converted the big email download from Google, which was a .mbox, into maildir using mb2md (apt install mb2md on Debian was nice). This gave me a directory with ~120k individual email files.
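For anyone who would rather not install mb2md, the same mbox-to-maildir conversion can be roughly sketched with Python's standard `mailbox` module. This is only an illustration, not what mb2md does internally; the function name and the paths you pass in are placeholders.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(str(mbox_path), create=False)
    # create=True makes the cur/, new/ and tmp/ subdirectories if they are missing.
    target = mailbox.Maildir(str(maildir_path), create=True)
    copied = 0
    try:
        for message in source:
            # Wrapping in MaildirMessage carries over mbox status flags
            # (read, answered, ...) where the mbox recorded them.
            target.add(mailbox.MaildirMessage(message))
            copied += 1
    finally:
        target.close()
        source.close()
    return copied
```

Calling `mbox_to_maildir("takeout.mbox", "Maildir")` (both names hypothetical) should leave one file per message under `Maildir/new` or `Maildir/cur`.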

When I check this out in Thunderbird, I see all those emails, and they look like they have the right info. (As an aside: I actually only moved 1k emails into the directory that docker-mailserver has access to, just for testing, so Thunderbird only sees those 1k.) I can do some searching on those.

When I open it in K-9, by default it looks like it just pulls in 100 of them. I can pull in more or refresh, that sort of thing. I don’t normally use K-9, so I may just be missing how the functionality there is supposed to work.

I also just tried connecting to the mail server with Nextcloud Mail, which works in the sense that it connects, but it (1) seems to be struggling and (2) shows ‘today’ as the date for every email rather than when it actually arrived. I don’t really want to use Nextcloud Mail here…
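One possible cause of the ‘today’ dates, and this is an assumption rather than something confirmed in this setup: with maildir storage, Dovecot (the IMAP server inside docker-mailserver) takes a message's IMAP INTERNALDATE from the file's modification time, and a freshly converted maildir has every file stamped with the conversion time. A minimal sketch that resets each file's mtime from its own `Date:` header, with hypothetical function names:

```python
import email.utils
import os
from email.parser import BytesParser
from pathlib import Path


def message_timestamp(path):
    """Return the Date header of a message file as a Unix timestamp, or None."""
    with open(path, "rb") as fh:
        headers = BytesParser().parse(fh, headersonly=True)
    raw_date = headers.get("Date")
    if raw_date is None:
        return None
    try:
        parsed = email.utils.parsedate_to_datetime(str(raw_date))
    except (TypeError, ValueError):
        # Malformed Date header: leave this file alone.
        return None
    return parsed.timestamp()


def restore_mtimes(maildir_root):
    """Set each message file's mtime to its Date header; return files updated."""
    updated = 0
    for sub in ("cur", "new"):
        folder = Path(maildir_root) / sub
        if not folder.is_dir():
            continue
        for path in folder.iterdir():
            if not path.is_file():
                continue
            stamp = message_timestamp(path)
            if stamp is None:
                continue
            os.utime(path, (stamp, stamp))
            updated += 1
    return updated
```

If the server has already indexed the folder it may keep serving the old cached dates until its index for that mailbox is rebuilt, so it is probably easiest to run this before the files are first handed to the mail server.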

So, I think my question here is now really around search and storage. In Thunderbird, I think that the way it works (I don’t normally use Thunderbird much either) is that it downloads all the files locally, and then it will search them locally. In K-9 that appears to be the same, but with the caveat that it doesn’t look like it really wants to download 120k emails locally (even if I can).

What I think I want, though, is to have the search running on the server. I don’t want to download 15 GB (and another 9 from Gmail soon enough) to each client. I want it all on the server, so that I just put in my search, the server runs the query, and it gives me a response.
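For what it's worth, IMAP itself has a SEARCH command that runs on the server and returns only the matching message numbers, so nothing has to be downloaded first. Here is a rough sketch using Python's standard `imaplib`; the host, credentials, and function names are placeholders, and whether a full-text index actually speeds up the `TEXT` part depends on how the server is configured.

```python
import imaplib
from datetime import date

# IMAP dates need English month abbreviations, so avoid locale-dependent strftime.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(day):
    """Format a datetime.date as an IMAP date such as 01-Jan-2015."""
    return f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year:04d}"


def quote(value):
    """Wrap a search string in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_search_criteria(sender=None, since=None, before=None, text=None):
    """Build IMAP SEARCH criteria; the server ANDs them together."""
    criteria = []
    if sender:
        criteria += ["FROM", quote(sender)]
    if since:
        # SENTSINCE/SENTBEFORE compare against the Date: header,
        # not the server's internal date.
        criteria += ["SENTSINCE", imap_date(since)]
    if before:
        criteria += ["SENTBEFORE", imap_date(before)]
    if text:
        # TEXT matches headers and body; ASCII keywords only in this sketch.
        criteria += ["TEXT", quote(text)]
    return criteria or ["ALL"]


def search_on_server(host, user, password, folder="INBOX", **filters):
    """Run the search on the server and return matching message numbers."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(folder, readonly=True)
        status, data = conn.search(None, *build_search_criteria(**filters))
        if status != "OK":
            raise RuntimeError(f"IMAP SEARCH failed: {data!r}")
        return data[0].split()


# Example call (hypothetical host and credentials):
# search_on_server("mail.example.lan", "me", "secret",
#                  sender="so-and-so@example.com",
#                  since=date(2015, 1, 1), before=date(2016, 1, 1),
#                  text="invoice")
```

Only the list of matching message numbers crosses the network; a client would then fetch just those messages.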

docker-mailserver has a page on setting up Full-Text Search with Xapian, where it builds all the indices and so on. I tinkered with this and think I got it set up. This is another case where I would want the search to run on the server rather than the client, since the server is (hopefully) optimized for this kind of thing.

Should I be using a different server for what I want here? I’ve poked around at different ones and am more than open to changing to something else that is more for what I need here.

For clients, should I be using Roundcube or something else? Will that actually help with this ‘use the server to search’ question? For mobile, is there any way to avoid downloading all the emails to the client?

Thanks for the help.

  • atzanteol@sh.itjust.works
    6 days ago

    Pdf? You converted plain text to something designed to preserve formatting? But why?

    You could use maildir and find things with “grep” or any mail client like Thunderbird.
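A rough sketch of that grep-style approach, using Python's standard `mailbox` module to scan a maildir directly. The function name is made up for illustration, and like grep it only matches bodies stored as plain text; base64 or quoted-printable parts will not match.

```python
import mailbox
from email.parser import BytesParser


def grep_maildir(maildir_path, sender=None, keyword=None):
    """Return (key, subject) for messages matching a sender and/or keyword."""
    box = mailbox.Maildir(str(maildir_path), create=False)
    hits = []
    for key in box.keys():
        raw = box.get_bytes(key)
        headers = BytesParser().parsebytes(raw, headersonly=True)
        # Case-insensitive substring match on the From header.
        if sender and sender.lower() not in str(headers.get("From", "")).lower():
            continue
        # Case-insensitive match anywhere in the raw message, as grep -i would do.
        if keyword and keyword.lower().encode() not in raw.lower():
            continue
        hits.append((key, str(headers.get("Subject", ""))))
    return hits
```

This reads every file on each search, so it is fine for occasional lookups but does no indexing.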

    • Joël de Bruijn@lemmy.ml
      6 days ago

      I don’t know.

      • I don’t need formatting but it doesn’t get in the way either. So I am not bothered by it.
      • Also, PDF, and especially the PDF/A standard, is widely used for archiving and in compliance regulations concerning archival and preservation.
      • If you want text, the same tactic applies: just export in bulk to txt instead of PDF.

      My main point is: Why would you want a mail-specific stack of hosting, storage, indexing and frontends? It’s all plain text anyway, so the regular storage solutions for files go a long way.

      There is an entire industry (which has its own disadvantages) built around getting communication artefacts out of those systems and putting them into document management systems or other forms of file-based archival.

      • atzanteol@sh.itjust.works
        6 days ago

        “Why would you want a mail-specific stack of hosting, storage, indexing and frontends? It’s all plain text anyway, so the regular storage solutions for files go a long way.”

        Because email has metadata. From, to, sent date, subject, etc. Plus attachments that may be binary.
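To make that concrete, here is a small sketch of pulling those fields and the attachment names out of a single raw message with Python's standard `email` package. The function name and the returned dictionary layout are only illustrative.

```python
from email import policy
from email.parser import BytesParser


def summarize(raw_bytes):
    """Extract the main metadata fields and attachment filenames from a message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    date_header = msg["Date"]
    return {
        "from": str(msg["From"]) if msg["From"] is not None else None,
        "to": str(msg["To"]) if msg["To"] is not None else None,
        "subject": str(msg["Subject"]) if msg["Subject"] is not None else None,
        # With policy.default the Date header exposes a parsed datetime.
        "date": date_header.datetime if date_header is not None else None,
        # Attachments may be binary; only their filenames are listed here.
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }
```

These are the fields a mail-aware index can filter on directly, which a flat text export would have to encode somewhere else, such as a filename or a sidecar database.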

        • Joël de Bruijn@lemmy.ml
          6 days ago

          Must admit, those fields are precisely the ones I use in my file-naming convention. Other DMSs put that in their databases, but alas, that’s just trading one stack for another.

          Others put it in the XMP metadata of the PDFs themselves. But I guess the work involved would be similar.