Easy Download of Comics using PHP + DownThemAll Plugin

I am a comics fan. A manga fan, to be exact.

Recently I have been reading comics on 动漫屋 (DM5). But frankly, I do not like reading comics online. It's slow, and it annoys me when I have to wait a while to read the next page.

So I studied the site structure and came up with a way to download the comics first and read them locally.

Requirements:

  • Firefox
  • DownThemAll Plugin

You see, there are quite a few such (Chinese) comics portals around, mostly based in (guess what) China. Almost all of them put in some form of protection so that you cannot right click -> view source and find out how to link directly to their pictures. Maybe they do not know about a thing called the Web Developer extension for Firefox.

So after studying the web site for a while with the help of the Web Developer extension, I figured out a way to regenerate the direct links to the pictures using PHP.

Next I use DownThemAll (install it here if you have not done so: http://www.downthemall.net/) to capture the links and start downloading them. I do not want to download them directly using PHP, as it is not meant to be a file downloader, and on a Windows machine it doesn't support multi-threading (no fork). Even if it can be done, it would need some coding effort. Why not use an existing tool with an excellent UI?

The PHP code is below. For those who want to try it out, I host it here with instructions: http://www.xieer.com/comicparser
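To give a feel for the trick, here is a minimal sketch of the extraction step. It assumes, as the full script does, that the viewer page embeds JavaScript lines of the form array_img[n] = '...'; holding the encoded picture URLs. The function name extract_encoded_urls and the sample fragment are made up for illustration, and the regular expression is my own shorthand for the strstr/explode logic in the full script.

```php
<?php
// Hypothetical helper: collect the quoted values assigned to array_img[...]
// from the JavaScript embedded in a viewer page.
function extract_encoded_urls(string $html): array
{
    // Matches assignments such as:  array_img[0] = '%u0061%u0062';
    preg_match_all('/array_img\[\d+\]\s*=\s*\'([^\']*)\'/', $html, $matches);
    return $matches[1];
}

// Made-up page fragment, for illustration only
$sample = "var array_img = new Array();\n"
        . "array_img[0] = '%u0061%u0062';\n"
        . "array_img[1] = '%u0063%u0064';\n";

print_r(extract_encoded_urls($sample));
```

The values that come out are still encoded, so they need one more decoding pass before they are usable links.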

If I have the time, I may add support for other comic portals.

NOTE 1: The script might not work for comics with many volumes / chapters, because my hosting server seems to be pretty slow and times out when parsing.

NOTE 2: As noted above, the parsing might take some time, so please be patient.

PHP Source:

<?php
date_default_timezone_set ("Asia/Singapore");

// No URL submitted yet: show the input form
if (empty($_POST['comicurl']))
{
    display_form ();
    die;
}

// Only accept DM5 URLs; without this check a visitor could point the script at local files
if (strpos ($_POST['comicurl'], "http://www.dm5.com/") !== 0)
{
    echo "Please enter a URL that starts with http://www.dm5.com/";
    die;
}

// Parsed results are cached in a file named after the URL-encoded comic URL
$html_filename = urlencode ($_POST['comicurl']);
if (file_exists ($html_filename))
{
    readfile ($html_filename);
    display_link ();
    die;
}

// Step 1: collect the "showcomic" chapter links from the comic's index page
$raw_html = @file_get_contents ($_POST['comicurl']);
$raw_tokens = explode ("\n", $raw_html);
foreach ($raw_tokens as $token)
{
    if (strstr($token, "showcomic") && !strstr($token, "span"))
    {
        $pagetoken = explode ("showcomic", $token);
        $page = $pagetoken[1];
        // keep only the part before the closing double quote
        $page = "http://www.dm5.com/showcomic" . str_replace (strstr ($page, "\""),"", $page);
        $comic_url[] = $page;
    }
}

// Step 2: fetch each chapter's viewer page and pull out the encoded image URLs
set_time_limit (0);
if (isset($comic_url) && count($comic_url))
{
    $url_i=1;
    foreach ($comic_url as $curl)
    {
        $d = str_replace ("http://www.dm5.com/showcomic", "", $curl);
        $next_url = "http://www.dm5.com/display.aspx?id={$d}";
        //echo "$next_url\n"; flush ();
        echo "{$url_i}\n"; flush ();
        // try each chapter up to 5 times, as the server sometimes returns an incomplete page
        for ($i=1; $i<=5; $i++)
        {
            $n = file_get_contents ($next_url);
            $m = explode ("\n", $n);
            if (!empty($n) && count($m)>10 )
            {
                foreach ($m as $nl)
                {
                    // lines of interest look like: array_img[0] = '%uhhhh%uhhhh...';
                    if (strstr($nl,"array_img[") && strstr($nl, "] ="))
                    {
                        $tokens = explode ("=", $nl);
                        $url = $tokens[1];
                        $url = str_replace ("'", "", $url);
                        $url = trim(str_replace (";", "", $url));
                        $array_img[] = $url;
                    }
                }
                echo " okay, \n";
                sleep(1);
                break;
            }
            else echo " retrying {$i}...\n";
        }
        $url_i++;
    }
}
else
{
    echo "No comic url found for processing.";
    die;
}
echo "<br/>\n";

// Step 3: decode the image URLs
// the img url is coded in %uhhhh%uhhhh.... format
// so we need to break it into an array, each of hhhh, then convert it back to a unicode char
// join them as a string and we have the actual url
$html_link = "";
if (isset($array_img) && count($array_img))
{
    foreach ($array_img as $img)
    {
        //echo $img;
        $img_tokens = explode ('%u', $img);
        $url = '';
        foreach ($img_tokens as $token)
        {
            $token = trim($token);
            if (!empty($token))
            {
                // hexdec() is needed here: PHP 7+ no longer treats '0x....' strings as numbers
                $char = unichr(hexdec($token));
                $url .= $char;
            }
        }
        //echo $url . "<br/>\n";
        $html_link .= "<a href='" . htmlspecialchars ($url, ENT_QUOTES) . "'>link</a>, \n";
    }
}
else
{
    echo "No image url found for processing.";
    die;
}

echo "<br/>Parsing done. Right click and select DownThemAll to start downloading the links now<br/><br/>\n";
echo $html_link;
display_link ();

// cache the generated links for the next visitor
file_put_contents ($html_filename, $html_link);

// ------------------------------------------------------------------------------
// Convert a Unicode code point to a UTF-8 character.
// mb_chr() needs PHP 7.2+; the 'HTML-ENTITIES' encoding used previously is deprecated as of PHP 8.2
function unichr($u)
{
    return mb_chr(intval($u), 'UTF-8');
}

function display_link ()
{
    echo "<br/><a href='http://www.xieer.com/comicparser/index.php'>Back to parser main page</a>";
}

function display_form ()
{ ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

<html>
<head>
<title>Comic URL Parser</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js" type="text/javascript"></script>
</head>
<body style='font-family:arial;'>
<fieldset style='margin:20px; padding:10px; border:2px solid #f0f0f0;'>
<legend style='font-weight:bold; font-size:1.5em; padding:5px; color:brown;'>
<img src='googlesms_s.png' alt='Comic Parser'  style="border:none;">
</legend>

<form method='post' action='<?= htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES) ?>'>

<div style='margin:20px; margin-top:10px;'>
<div style='font-weight:bold; float:left; width:90px; height:25px; background:#DFF0F2; padding:10px; margin:0px; border:none;'>Comic URL</div>
<input type='text' name='comicurl' style='padding:6px; margin-left:-3px; font-size:1.5em; width:450px; height:33px; background:#EBF6F7; border:none;'><input type='submit' value='Parse Comic Link' style='margin:0px; width:150px; padding:12px; background:#eee; border:none; border-top:2px solid #eee;'>

<p>
The URL should look like http://www.dm5.com/Type.aspx?id=abcd
</p>
</div>

<div style='margin:20px; width:600px;'>
<p></p>
</div>

</form>
</fieldset>

<fieldset style='margin:20px; padding:20px; border:2px solid #f0f0f0;'>
<legend style='font-weight:bold; font-size:1.5em; padding:5px; color:brown;'>
<b>How To Use</b>
</legend>
<p>This application is meant to be used together with
<b><a href='http://www.mozilla.com/en-US/products/download.html'>Firefox</a> +
<a href='https://addons.mozilla.org/en-US/firefox/addon/201'>DownThemAll</a></b> Download Manager Addon.
</p>
<p>(Or with your own choice of download manager) So please install them first before using this parser.</p>
<ul>
<li style='line-height:1.4em;'>Go to the <a href='http://www.dm5.com'>动漫屋 DM5</a> web page to browse through the comics. <b>(NOTE: Obviously, it's in Chinese :) </b></li>
<li style='line-height:1.4em;'>Find the URL of the comic that you are interested in; it should look like http://www.dm5.com/Type.aspx?id=abcd.</li>
<li style='line-height:1.4em;'>Enter the URL above and click on the button. That's it.</li>
<li style='line-height:1.4em;'>After finishing parsing, links will be shown. Use your download manager to capture the links and start downloading.</li>
</ul>
</fieldset>

<fieldset style='margin:20px; padding:20px; border:2px solid #f0f0f0;'>
<legend style='font-weight:bold; font-size:1.5em; padding:5px; color:brown;'>
<b>Tips</b>
</legend>
<ul>
<li style='line-height:1.4em;'>If you do not know how to use DownThemAll or any download manager, please do your own <a href='http://is.gd/BUO0'>googling</a>.</li>
<li style='line-height:1.4em;'>Or check out this <a href='http://is.gd/BUQy'>quick tutorial</a> at LifeHacker.</li>
<li style='line-height:1.4em;'>You may be downloading multiple chapters; to avoid overwriting/confusion, set up the renaming mask. I use <b>*subdirs*.*name*.*ext*</b></li>
<li style='line-height:1.4em;'>If you are lucky, the comic page might already have been parsed and cached. Then the processing will be much faster.</li>
</ul>
</fieldset>

<fieldset style='margin:20px; padding:20px; border:2px solid #f0f0f0;'>
<legend style='font-weight:bold; font-size:1.5em; padding:5px; color:brown;'>
<b>About</b>
</legend>
<p style='border:none; float:right;'>
<a href="http://validator.w3.org/check?uri=referer"><img
src="http://www.w3.org/Icons/valid-html401"
alt="Valid HTML 4.01 Strict" height="31" width="88" style="border:none;"></a>
</p>
<p style='color:red; font-weight:bold;'>To know more about how this works, refer to my blog post <a href='http://blog.xieer.com/2009/01/how-to-let-people-to-send-free-sms-to-you/'>here</a>.</p>
<div style='margin:5px;'><b>NOTE</b> :
<ul>
<li style='line-height:1.4em;'>Unless cached, the time needed to parse the links depends on how big the comic is: the more links, the more time. Start the parsing and go make a coffee.</li>
<li style='line-height:1.4em;'>The parser doesn't DOWNLOAD the comic for you, it just digs out the links so you can download easily/automatically.</li>
<li style='line-height:1.4em;'>That said, <b>please be considerate</b> and try not to overload the comic server. Keep simultaneous downloads to about 2 threads.</li>
<li style='line-height:1.4em;'>If you stress the server so much that it's down, it's bad for everyone, including yourself.</li>
<li style='line-height:1.4em;'>The parser only supports the 动漫屋 DM5 web page for now. More might be supported in the future if I have the time.</li>
</ul>
</div>

</fieldset>
</body>
</html>
<?php } ?>
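
The decoding step near the end of the script can also be tried on its own. The sketch below is one possible way to do it, assuming the input is a run of %uhhhh escapes as described in the script's comments. The name decode_percent_u and the sample string are made up for illustration. Like the script's unichr(), it goes through a numeric HTML entity, but it uses html_entity_decode so it does not depend on the mbstring extension. It does not handle two-digit %hh escapes or surrogate pairs.

```php
<?php
// Hypothetical helper: turn %uhhhh escapes into UTF-8 characters.
// Anything that is not a %uhhhh escape is passed through unchanged.
function decode_percent_u(string $encoded): string
{
    return preg_replace_callback(
        '/%u([0-9a-fA-F]{4})/',
        function (array $m): string {
            // hexdec("0068") gives 104; "&#104;" then decodes to "h"
            return html_entity_decode('&#' . hexdec($m[1]) . ';', ENT_QUOTES, 'UTF-8');
        },
        $encoded
    );
}

// Made-up sample: the escapes spell out "http://"
$sample = '%u0068%u0074%u0074%u0070%u003A%u002F%u002F';
echo decode_percent_u($sample), "\n";
```

Using a regular expression instead of explode('%u', ...) means any stray unencoded characters in the input survive as they are, rather than being fed to the hex conversion.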

About Big Head
5.5 years Ex-Motorolan as Senior Software Engineer. Then Transformed to Security Software Engineer, still SSE tho. Next will be SQE @ HelloMoto again :p Play ajax, gadgets, hacks. Like comics, novels, movies, cycling, badminton.

Comments

2 Responses to “Easy Download of Comics using PHP + DownThemAll Plugin”
  1. Phong says:

    Not work
    Please help me

    404 Not Found
    nginx/0.7.61

    Thanks

  2. technology says:

    Hi,

    its nice, but i cant get it can

    http://openwavecomp.com.sg
