In today’s digital age, access to information is abundant on the internet. However, sometimes we may find ourselves in situations where we need to access this data offline or simply prefer to read content in a more organized format, like an eBook. Converting web pages to EPUB format can be a convenient solution, and in this blog post, we’ll explore two methods to achieve this: using Wget and Pandoc, and utilizing a Chrome extension called EpubPress.
First, create a text file (e.g., list.txt) containing the links of the web pages you wish to download and convert to EPUB format. Make sure the links are listed one per line.
With the list of links ready, use the command-line tool Wget to download the entire site data and make it available offline. Use the following command:
wget -E -H -k -p -i list.txt
This command will recursively download the web pages, convert links for offline browsing (-k), and save them as HTML files in the current directory.
Navigate to the folder containing the downloaded HTML files and use Pandoc, a versatile document converter, to convert the files to EPUB format. Execute the following command:
pandoc -i filename-*.html -o filename.epub
Replace “filename” with the appropriate name for your EPUB file. This command will merge all the HTML files and convert them into a single EPUB file.
Instead of providing a list of links in a text file, you can create an index.html file that includes all the links you want to convert. Format the file as follows:
<!doctype html>
<html>
<head>
<title>This is the title of the webpage!</title>
</head>
<body>
<a href="chapter-1.html">1</a>
<a href="chapter-2.html">2</a>
<a href="chapter-3.html">3</a>
<a href="chapter-4.html">4</a>
</body>
</html>
Then, use Calibre’s ebook-convert tool to convert the index.html to EPUB, providing title, authors, and a cover image (if available):
ebook-convert index.html book.epub --title "Title" --authors "Author" --cover cover.png
Note: Ensure you have Calibre installed and added to your system’s PATH to use ebook-convert.
If your links end with numbers in ascending order (e.g., chapter-1.html, chapter-2.html, etc.), you can use LibreOffice Sheets to generate all the links. Simply copy the first link and extend it until the last number in the sequence.
Sometimes, after converting web pages to EPUB format, you may want to remove specific unwanted strings from the text. To do this, create a file (e.g., delete.txt) containing each string to be removed, one per line. Then use the grep command to filter out those strings:
grep -vf delete.txt filename.md > tmpfile && mv tmpfile newname.md
For a more user-friendly approach to converting web pages to EPUB, you can utilize the EpubPress Chrome extension. Follow these steps:
Install the EpubPress extension from the Chrome Web Store using the link here.
Open each web page you want to convert into a separate browser tab.
Click on the EpubPress extension icon in the Chrome toolbar.
Select the desired pages you want to include in the EPUB.
Click the “Process” button to initiate the conversion.
While EpubPress offers a convenient way to convert web pages to EPUB, it comes with some limitations:
In conclusion, converting web pages to EPUB format can be beneficial for offline reading and easier access to information. Using the combination of Wget and Pandoc allows more flexibility and customization, while the EpubPress Chrome extension offers a straightforward approach for quick conversions. Choose the method that suits your needs best and enjoy a seamless reading experience with your newly created EPUBs. Happy reading!