Wednesday, December 26, 2012

Single click conversion from any format to EPUB, using Calibre - Calibre and Windows command line batch scripting - How to set EPUB metadata authors and title from filename

This summer I bought a Kobo Touch eBook reader as a gift for my girlfriend.
It's a very nice eBook reader... so nice that a week later I bought another one for myself :-)
Now, after some months, and after too many manually converted eBooks, I've decided that I need to automate the PDF/RTF/mobi/whatever to EPUB format conversion.
As a side effect, this will also let my girlfriend convert and upload her eBooks to her Kobo Touch on her own :-)
To do that I wrote a script that converts from whatever format to EPUB, using Calibre and my standard format conversion parameters.

For my conversion needs I've always used Calibre: it's a very nice, powerful, and free eBook management program.
Calibre can do an enormous amount of things... but in the end, after setting it up, I simply need it to take a PDF file and convert it to EPUB; then I simply copy this EPUB to my Kobo.
After all, a Kobo connected to a PC works exactly like a simple flash drive :-)


My tiny script calls Calibre's ebook-convert.exe command and passes all the needed parameters.
The script then creates a new EPUB file in the same folder as the original file.

Script installation
  1. Download the script from here (Update: download the new version of the script here, more info here)
  2. Put the script in your Calibre executable folder

Script usage
  1. Drag and drop the file you want to convert to EPUB onto the script
  2. After dropping the file on the script, you'll see the command window running the conversion process
  3. The black command window will close automatically after a successful conversion (and will remain open in case of error)

After the conversion, you'll find the new converted EPUB file in the same folder as the original file.
And you are done! :-)

Filename format
The filename should have the following format: "Authors - Title"
Something like "Isaac Asimov - Nightfall.pdf" is a good filename.
The first "-" will be used to separate the authors' names from the title.

If there is no "-" in the filename, the author will be set to "unknown", so a file like "Nightfall.pdf" will result in "unknown - Nightfall.epub".
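
The filename rule above can be sketched outside of batch as well. The following Python snippet is only an illustration of the same splitting logic, not part of the script itself; the function name split_author_title is made up for this example.

```python
from pathlib import Path


def split_author_title(filename: str) -> tuple[str, str]:
    """Split 'Authors - Title.ext' into (authors, title) on the first '-'."""
    stem = Path(filename).stem  # drop the folder part and the extension
    authors, separator, title = stem.partition("-")
    authors, title = authors.strip(), title.strip()
    if not separator or not title:
        # No usable "-": the whole name is the title, the author is "unknown"
        return "unknown", authors
    return authors, title


print(split_author_title("Isaac Asimov - Nightfall.pdf"))
print(split_author_title("Nightfall.pdf"))
```

Only the first "-" splits the name, so a title that itself contains a "-" is kept whole.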

Extra conversion parameters
The script uses some extra conversion parameters that, in my experience, work better than the default ones:
  • remove-paragraph-spacing: I hate empty space between paragraphs
  • flow-size 50: this forces the internal HTML files that compose the ebook to have a max file size of 50 KB; the end result is a faster ebook reading experience
  • unwrap-factor 0.25: (only for PDF files) helps the algorithm that detects the PDF file structure, producing a better EPUB (in my limited experience)
Feel free to change, or remove, these extra conversion parameters.
If you need to add/change some conversion parameters, here you can find the parameters supported by the ebook-convert.exe command.
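
To make it easier to see how these parameters end up on the ebook-convert command line, here is a rough Python sketch that only builds the argument list and runs nothing. The function name build_convert_command is hypothetical, and it assumes ebook-convert.exe is reachable under the given name.

```python
from pathlib import Path


def build_convert_command(source: str, authors: str, title: str,
                          converter: str = "ebook-convert.exe") -> list[str]:
    """Return the argument list for one conversion (nothing is executed)."""
    src = Path(source)
    # The EPUB goes in the same folder as the source file
    target = src.with_name(f"{authors} - {title}.epub")
    command = [converter, str(src), str(target),
               "--remove-paragraph-spacing",
               "--flow-size", "50",
               "--authors", authors,
               "--title", title]
    if src.suffix.lower() == ".pdf":
        # PDF-only option; the suffix includes the dot and may be upper case
        command += ["--unwrap-factor", "0.25"]
    return command


print(build_convert_command("Isaac Asimov - Nightfall.pdf", "Isaac Asimov", "Nightfall"))
```

The resulting list could then be handed to subprocess.run(command, check=True) to start the conversion.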

Update 01/01/2014
I've updated the script; you can download the updated version here.
As requested by a reader, the new version of the script can convert multiple files at once, and won't overwrite an already existing EPUB.
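
The updated script itself is not reproduced in this post, so the following Python sketch only illustrates one way to avoid overwriting: if the target EPUB already exists, a date and time stamp is added to the new file name. The function name free_output_path and the exact stamp format are assumptions made for this example.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def free_output_path(target: Path, now: Optional[datetime] = None) -> Path:
    """Return target, or a date-stamped sibling if target already exists."""
    if not target.exists():
        return target
    # Dots instead of colons, because ":" is not allowed in Windows file names
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H.%M.%S")
    return target.with_name(f"{target.stem} ({stamp}){target.suffix}")


print(free_output_path(Path("Isaac Asimov - Nightfall.epub")))
```

Converting several dropped files at once then comes down to calling this once per input file before each conversion.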



Inside the script...
Here is a color-coded version of the EPUB conversion script, the same script you can download here. (Update: download the new version of the script here, more info here)
(color-coded version of the script created courtesy of http://hilite.me/)
@echo off
setlocal EnableDelayedExpansion

rem  find script path
set scriptPath=%~dp0
set scriptPath=%scriptPath:~0,-1%
rem "%scriptPath%\pyprogram.exe" /myparam=123_abcd


set filename=%~n1

REM split authors from title using the first "-" as separator
REM using ! and EnableDelayedExpansion instead of % so that filenames containing parentheses won't cause issues
FOR /F "usebackq tokens=1* delims=-" %%a in ('!filename!') do (
  set autore=  %%a  
  set titolo=  %%b  
) 

rem trim authors name extra space from left/right
for /f "tokens=* delims= " %%a in ("!autore!") do set autore=%%a
set autore=%autore%##
set autore=%autore:                ##=##%
set autore=%autore:        ##=##%
set autore=%autore:    ##=##%
set autore=%autore:  ##=##%
set autore=%autore: ##=##%
set autore=%autore:##=%
echo. Authors: "%autore%"


rem trim titles extra space from left/right
for /f "tokens=* delims= " %%a in ("!titolo!") do set titolo=%%a
set titolo=%titolo%##
set titolo=%titolo:                ##=##%
set titolo=%titolo:        ##=##%
set titolo=%titolo:    ##=##%
set titolo=%titolo:  ##=##%
set titolo=%titolo: ##=##%
set titolo=%titolo:##=%
echo. Title: "%titolo%"


echo.
echo.

REM if the filename doesn't contain any "-" then %titolo% will be empty
IF "%titolo%"=="" (
  set "titolo=!autore!"
  REM quoted form so that no trailing space ends up in the author name
  set "autore=unknown"
)


set InputfileParameters=--remove-paragraph-spacing
set PDFInputfileParameters=
set EpubInternalHTMLsplitSize=--flow-size 50

rem %~x1 includes the leading dot (".pdf"); /I makes the comparison case-insensitive
IF /I "%~x1"==".pdf" set PDFInputfileParameters=--unwrap-factor 0.25

"%scriptPath%\ebook-convert.exe" "%~1" "%~dp1%autore% - %titolo%.epub" %InputfileParameters% %PDFInputfileParameters% %EpubInternalHTMLsplitSize% --authors "%autore%" --title "%titolo%"

IF %ERRORLEVEL% NEQ 0 pause


UPDATE - How to set EPUB Authors and Title metadata from the EPUB filename:
I've just created a new script that will modify the metadata of any EPUB.

This script takes the Authors and the Title from the filename, and sets them in the EPUB metadata.
So you can rename the EPUB file to something like "Asimov - Nightfall.epub" and then drag and drop this EPUB onto my script: it will set the EPUB Authors/Title metadata to the ones you specified in the filename.

This script works using Calibre, so you need to place it in the same folder where you have the Calibre executable.
The script is very similar to the one explained above for EPUB format conversion. I simply use the "ebook-meta.exe" Calibre command instead of "ebook-convert.exe".
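
As a rough illustration of what such a call looks like, this Python sketch builds an ebook-meta argument list from the EPUB filename without running anything. The function name build_meta_command is made up for this example, and the "unknown" fallback is assumed to match the conversion script rather than taken from the metadata script.

```python
from pathlib import Path


def build_meta_command(epub_file: str, tool: str = "ebook-meta.exe") -> list[str]:
    """Return the ebook-meta argument list for one EPUB (nothing is executed)."""
    stem = Path(epub_file).stem  # file name without folder and extension
    authors, separator, title = (part.strip() for part in stem.partition("-"))
    if not separator or not title:
        # Assumed fallback: no "-" means the whole name is the title
        authors, title = "unknown", authors
    return [tool, epub_file, "--authors", authors, "--title", title]


print(build_meta_command("Asimov - Nightfall.epub"))
```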

You can download this script from the following link: Click here to download the script "Force Author Title on EPUB.cmd"


Feel free to ask any question :-)

9 comments:

  1. Love it, love it, love it.
    Two requests:
    Batch: Able to drag two or more files onto ebook-convert.exe
    but more importantly:
    instead of overwriting without warning if epub already exists keep all files but concatenate (1), (2), etc. to newer files.

    1. I'm happy you find my script useful.
      I've implemented the requested feature.
      You can download the updated script here: http://bit.ly/1a1ALDO

  2. Thanks for the quick response. Unfortunately all links given have a redirect loop and no new file.

    1. Oops... I messed up something in the download link (yesterday "it worked on my pc" :-) but I clearly did something wrong... :-) )
      I've fixed the links now, so you should be able to download the new version of the script.

    2. Sorry. With the new file the DOS prompt just flashes and closes immediately.
      Win 7,64

      Don't all the IFEXIST just cry out for a loop?

    3. I've fixed the script; there was a bug with filenames containing parentheses.

      It's absolutely true that all the IFEXIST cry out for a loop... but the fact is that doing a loop in a batch file is not very easy: you can do it, but it's harder to maintain.
      Copy-pasting 2 lines 400 times gets the job done in 30 seconds, whereas a loop takes more time and more testing (and adds more places where something can go wrong, like the bug with the parentheses).

      This is also why I rename the EPUB file with the current date and time when the file already exists, instead of adding an incrementing counter as you originally requested.

      Loops in batch files have an ugly syntax, are badly documented, and are hard to debug. So I try to avoid them when I can.

      Batch files are very useful for simple things, but if the complexity of what you need to do goes over a certain threshold, it's just easier to fire up Visual Studio and develop a proper program to get the job done.
      It always comes down to "how much time do I need to develop this feature with this tool? is it worth it? is it the right tool?" :-)

      Anyway, feel free to report any bug/malfunction you find :-)

  3. Seems to work just fine on some test files. Will speak up again if I run into any problems.

    Thanks again for the update.
