Bookworm - Utility to download books via IRC

Posted on 2019-10-13under [

]

I read a lot. There are currently 245 books on my kindle collection. And yet, Amazon seems to be unable to provide me with a hassle-free way to buy books.

They will not give me a link to download the books I pay for
They will 'allow' me to use some convenience service to get the books 'synced' to my device, 'soon' after buying them. This can be in the range of minutes-to-hours.
Books are riddled with nasty DRM, so making local backups is a chore (Using Calibre to de-DRM them) -- and I'm not risking it after the 1984 scandal.

My strategy for a few years has been to buy the e-book directly from author's website if it is availalble in a DRM-free fashion.
If this is not the case, I'll buy the e-book via Amazon but download it from a more convenient source -- irc channels.

I will not go into detail on how these channels work as there's plenty of information online already, but a quick summary is:

Join an e-book channel
Ask the indexer bot which bots have the book you want (@search <whatever you want>)
From the results, pick a bot and ask it for the book.

This is less cumbersome than using Amazon's method for syncing books, but still annoying.

Designing a tool to handle this

The minimum requirements for a tool like this are:

Communication over IRC (and DTCC file transfers)
Unpacking and converting files that come in non-kindle formats
Storing the resulting files
Ease of querying

V1

In the initial version, I designed the system to work based on 'requests'.

I'd make a request to the service, which would spawn a thread and start to advance through the tasks.

Set up an IRC Connection
Wait 30 seconds (required by the channel)
Request a book
Get a DTCC message
Connect to the DTCC endpoint
Unpack files
Convert files
Store new status in local database

There was no way to resume a request that had failed half way through and it was quite opaque to debug as well.

On top of that, waiting a fixed 30s period was annoying as hell.

I decided to take these same requirements but split them into individual services. This'd make the code a bit easier to work with, get rid of the 30s wait time and allow me to resume a request later in case it had failed.

V2

This time I approached the problem as an event-based system.

Instead of a single process in charge of each request, the request gets split into logical parts and each part is handled by a different, small process.

Each process is listening to a message queue, waiting for tasks.

The integration point for these processes is done via message-passing over redis queues.

'Resuming' a task is the same as putting a message in the queue of the process that failed.

Overall architecture

The initial summary of the required steps to download a book included asking the indexer bot who has what you are looking for; but we can do this step preemtively by asking each bot for their entire book list and storing it locally.

If we do this, we can expose a (fast) API endpoint that we can use to search.

One service per task. Tasks:

API to receive commands
IRC client to request books
DTCC client to fetch books
Unpack/Convert fetched books
Serve fetched books over HTTP

Services

Each service is listening to a dedicated queue for tasks, after executing these the result will be passed on to the next queue.

The end result of a user request is a kindle-compatible, drm-free file that can be fetched over HTTP.

If there's a need to store a blob, it will be stored in a local S3 instance (minio).

The metadata for each job is stored with a TTL of 5 days in redis. Every time a job progresses through the pipeline, its metadata is updated to reflect the current state.

API
- Input: Fetch request
- Output: (To user) fetch request id for status tracking
- Output: (To system) fetch request
- Input: Fetch status query
- Output: (To user) fetch status
IRCRequest: Connects to an IRC channel and
- Input: Book download request
- Output: DTCC parameters to fetch said book
DTCCFetch
- Input: DTCC parameters
- Output: S3 key containing the fetched files
Unpacker
- Input: S3 Key to books (compressed, in variable formats)
- Output: S3 key to uncompressed, mobi-formatted book

Web UI

The web UI exposes the 3 main aspects of the service:

Book Search/Fetch
Status of 'recent' tasks
Available books

The web UI had 2 hard constraints that had to be met:

Searching/Fetching a book should be easy on mobile
The page to download an available book should work on the kindle's browser.

Conclusion

I am reasonably happy with what I've built. It meets my criteria and it is very convenient when I finish a book in a saga and want to keep going but I don't have the next book on the kindle yet.

It does make me unreasonably angry that the process of acquiring the files for books I pay for is so cumbersome that it warrants a custom solution.

You can find the code here