Why is CasperJS better than PhantomJS
In this post I want to share some important information about the limitations of PhantomJS and some useful features of CasperJS.
I have used both libraries for web scraping, and I had some bad experiences with PhantomJS. To be clear, I did my projects first with PhantomJS and then rewrote them using CasperJS, so I want to share the “pain” I went through.
PhantomJS is a good tool if you need to do some basic web scraping or website testing, but for any advanced usage I think CasperJS is a much better tool. CasperJS wraps PhantomJS (it is built on top of it) and provides some really useful functions which are not available in PhantomJS.
Before we continue, here are two related posts: How to login to Amazon using CasperJS and How to login to Amazon using PhantomJS. Both examples cover form submission and some useful tricks.
Page browsing
Page browsing is much easier and more intuitive with CasperJS than with PhantomJS. For example, to open webpage A, then webpage B, and so on, you can write something like this in CasperJS:
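    var casper = require('casper').create();

    // The callback runs once website A has finished loading
    casper.start('URL of website A', function () {
        console.log('Website A loaded');
    });

    // Website B is requested only after the previous step is done
    casper.thenOpen('URL of website B', function () {
        console.log('Website B loaded');
    });

    casper.then(function () {
        this.evaluate(function () {
            // Your code here (runs inside the page)
        });
    });

    casper.run();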
So with CasperJS you can easily open website A, then website B, and so on. What is really important is that website B is not opened before website A is fully loaded. This is a really good thing which PhantomJS does not do for you: there you have to track page loading yourself with callbacks such as onLoadStarted and onLoadFinished.
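To picture this ordering guarantee, here is a minimal sketch in plain JavaScript. It is not CasperJS source code, only an illustration of a step queue in which a step starts when the previous one reports that it is done. The names createRunner and fakeOpen are hypothetical, and fakeOpen merely simulates a page load with a timer.

```javascript
// Hypothetical sketch, not CasperJS internals: a queue of steps where each
// step must call `next` before the following step is started.
function createRunner() {
    var steps = [];
    var runner = {
        // Queue a step; returning the runner allows chaining.
        then: function (step) {
            steps.push(step);
            return runner;
        },
        // Execute the queued steps strictly one after another.
        run: function (onComplete) {
            var index = 0;
            function next() {
                if (index >= steps.length) {
                    if (typeof onComplete === 'function') {
                        onComplete();
                    }
                    return;
                }
                var step = steps[index];
                index += 1;
                step(next);
            }
            next();
        }
    };
    return runner;
}

// Stand-in for a page load: reports completion after a short delay.
function fakeOpen(url, done) {
    setTimeout(function () {
        console.log('Loaded ' + url);
        done();
    }, 10);
}

createRunner()
    .then(function (next) { fakeOpen('website A', next); })
    .then(function (next) { fakeOpen('website B', next); })
    .run(function () { console.log('All steps finished'); });
```

Because each step hands control back through next, the load of website B cannot begin until the load of website A has completed, which is the behaviour described above.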
Let’s try to do the same thing in PhantomJS. Here is an example:
    var steps = [];
    var testindex = 0;
    var loadInProgress = false; // This is set to true while a page is still loading

    // DRequest is a helper that is not defined in this post; the code below
    // relies on it exposing the underlying PhantomJS page as phantomPage.
    var clientRequests = new DRequest();
    console.log("Initialization successful");

    steps = [
        function () { // Step 1 - Load Code Epicenter
            console.log("Request a website and wait for website to load");
            clientRequests.sendRequest("http://photo-epicenter.com");
        },
        function () { // Step 2 - After page load, parse results. DO NOT CALL readResponse() IN STEP ONE, because our RESULT might be empty
            console.log("Website loaded, read response");
            clientRequests.readResponse();
            var fs = require("fs");
            console.log("Write data to file");
            fs.write("website.html", clientRequests.getResponse(), "w");
        }
    ];

    // Start interval to read website content
    var interval = setInterval(executeRequestsStepByStep, 2000);

    function executeRequestsStepByStep() {
        if (loadInProgress == false && typeof steps[testindex] == "function") {
            console.log("step " + (testindex + 1));
            steps[testindex]();
            testindex++;
        }
        if (typeof steps[testindex] != "function") {
            console.log("test complete!");
            clearInterval(interval);
            phantom.exit();
        }
    }

    /**
     * These listeners are very important for PhantomJS to work properly. With them we control
     * the loadInProgress marker, which tells us whether a page is fully loaded.
     * Without this, we would read the content of the page even if the page is not fully loaded.
     */
    clientRequests.phantomPage.onLoadStarted = function () {
        loadInProgress = true;
        console.log(loadInProgress);
        console.log("Loading started");
    };
    clientRequests.phantomPage.onLoadFinished = function () {
        loadInProgress = false;
        console.log("Loading finished");
    };
    clientRequests.phantomPage.onConsoleMessage = function (msg) {
        console.log(msg);
    };
This is much more code for the same thing. An important note: this code was written entirely by me, and there may be better PhantomJS solutions, but believe me, they are not far from this example.
If you need something more than plain browsing from page A to page B, this code gets much more complicated, which makes maintenance a nightmare.
Cookies
Both libraries resend received cookies with every subsequent request, which is really helpful when you need to crawl pages behind a login system.
Additionally, you can write the received cookies to a file and read them back the next time the script is executed.
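As a rough sketch of the cookie persistence described above, the helpers below store cookies as JSON and load them again on the next run. The function names saveCookies and restoreCookies and the file name cookies.json are my own assumptions. The code relies on the phantom.cookies property and on the write, read and exists functions of the PhantomJS fs module; the guarded part at the bottom only runs inside PhantomJS or CasperJS.

```javascript
// Write the cookies currently held by a PhantomJS-style object to a file.
// `phantomLike` needs a `cookies` array; `fsLike` needs write/read/exists.
function saveCookies(phantomLike, fsLike, path) {
    fsLike.write(path, JSON.stringify(phantomLike.cookies), 'w');
}

// Load cookies from a file if it exists. Returns true when cookies were restored.
function restoreCookies(phantomLike, fsLike, path) {
    if (!fsLike.exists(path)) {
        return false;
    }
    phantomLike.cookies = JSON.parse(fsLike.read(path));
    return true;
}

// Only executed inside PhantomJS/CasperJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var phantomFs = require('fs');
    var COOKIE_FILE = 'cookies.json'; // hypothetical file name

    restoreCookies(phantom, phantomFs, COOKIE_FILE);
    // ... open pages and log in here; in a real script, save the cookies
    // at the end of the last step instead of right away ...
    saveCookies(phantom, phantomFs, COOKIE_FILE);
    phantom.exit();
}
```

Passing the phantom and fs objects in as parameters keeps the two helpers easy to try out with simple stand-in objects.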
Code maintenance
CasperJS has a much more intuitive syntax, which helps you to maintain your scripts easily. It also has many useful functions. For example, the function thenClick takes a selector of the element as its first parameter; this can be a CSS selector or an XPath expression wrapped with the selectXPath helper. This is useful when, for example, you want to click on an item in a menu. Using Chrome, you can retrieve the XPath of the element and just copy it into your CasperJS script. CasperJS will imitate the click event and you will be taken, for example, to the desired page. If you then need to change the script so that it works with other websites, you just have to change the XPaths, and that is it.
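Here is a small sketch of that idea, meant to be run with the casperjs command. The XPath expressions, the example.com URL and the function name queueMenuClicks are placeholders of mine; thenClick, selectXPath, echo and getCurrentUrl are the CasperJS calls being used.

```javascript
// Hypothetical XPath expressions, for example copied from Chrome's developer tools.
var MENU_XPATHS = [
    '//*[@id="menu"]/li[2]/a',
    '//*[@id="submenu"]/li[1]/a'
];

// Queue one thenClick step per XPath. `casperLike` is a CasperJS instance and
// `toXPath` is the selectXPath helper exported by the casper module.
function queueMenuClicks(casperLike, toXPath, xpaths) {
    xpaths.forEach(function (xpath) {
        casperLike.thenClick(toXPath(xpath), function () {
            this.echo('Now at ' + this.getCurrentUrl());
        });
    });
    return casperLike;
}

// Only executed under CasperJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var casperModule = require('casper');
    var menuCasper = casperModule.create();

    menuCasper.start('http://example.com/'); // placeholder URL
    queueMenuClicks(menuCasper, casperModule.selectXPath, MENU_XPATHS);
    menuCasper.run();
}
```

Keeping the XPaths in one array means that adapting the script to another website is mostly a matter of editing that list.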
There are a lot of other handy functions available in CasperJS which are not implemented in PhantomJS. They are all listed in the CasperJS API documentation.
File Download
File download is a hot topic when it comes to PhantomJS. There are at least 20 posts about how to make file download work with PhantomJS. Here are two available approaches:
- In your evaluate function you can make an AJAX call to download and encode your file, and then return this content back to the PhantomJS script
- You can use an uncompiled (source) version of PhantomJS available in some GitHub repositories, which you have to build yourself
Neither approach guarantees that the file download will work. There is also another problem: if you do not want to save the downloaded file to the file system (for example, because you are not allowed to store downloaded data on your machine), this is almost impossible to do with PhantomJS. My advice is not to use PhantomJS for file downloads.
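For reference, the first approach (an AJAX call inside evaluate) can be sketched roughly as follows. This is only one possible way to write it, not a definitive solution. The function name fetchAsBase64 and both example.com URLs are assumptions, and the file has to be served from the same origin as the opened page unless web security is disabled.

```javascript
// Runs inside the page (through page.evaluate), so it may only use browser globals.
function fetchAsBase64(url) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url, false); // synchronous, so the result can be returned directly
    // Ask the browser to hand back raw bytes instead of decoding the response as text.
    xhr.overrideMimeType('text/plain; charset=x-user-defined');
    xhr.send(null);

    var binary = '';
    for (var i = 0; i < xhr.responseText.length; i += 1) {
        // Keep only the low byte of every character.
        binary += String.fromCharCode(xhr.responseText.charCodeAt(i) & 0xff);
    }
    return btoa(binary);
}

// Only executed inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var PAGE_URL = 'http://example.com/';           // placeholder URL
    var FILE_URL = 'http://example.com/report.csv'; // placeholder URL, same origin

    page.open(PAGE_URL, function (status) {
        if (status === 'success') {
            // The encoded file comes back to the PhantomJS script as a string.
            console.log(page.evaluate(fetchAsBase64, FILE_URL));
        }
        phantom.exit();
    });
}
```

The encoding step is needed because values returned from evaluate have to be simple serializable data, so binary content cannot be handed back as it is.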
With CasperJS, file download is really easy, because CasperJS provides the download function for when you want to save a file to the file system, and the base64encode function for when you just want to process the received data without saving it.
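For the first case, saving a file to disk, a minimal sketch could look like this. The helper name queueDownload and the target file name csv.csv are my own choices; download, then and echo are the CasperJS functions being used, and the guarded part only runs under CasperJS.

```javascript
// Queue a step that saves `fileUrl` to `targetPath` on the local file system.
function queueDownload(casperLike, fileUrl, targetPath) {
    casperLike.then(function () {
        this.download(fileUrl, targetPath);
        this.echo('Saved ' + fileUrl + ' to ' + targetPath);
    });
    return casperLike;
}

// Only executed under CasperJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var downloadCasper = require('casper').create();

    downloadCasper.start('http://captaincoffee.com.au/dump/');
    queueDownload(downloadCasper, 'http://captaincoffee.com.au/dump/csv.csv', 'csv.csv');
    downloadCasper.run();
}
```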
Here is an example of the base64encode function:
    var casper = require('casper').create();
    var base64logo = null;

    casper.start('http://www.google.fr/', function () {
        // Fetch the image and keep its base64-encoded content in memory
        base64logo = this.base64encode('http://www.google.fr/images/srpr/logo3w.png');
    });

    casper.run(function () {
        this.echo(base64logo).exit();
    });
Here is an example of how to download a CSV file using CasperJS without saving the file to the file system:
    var casper = require('casper').create();

    casper.start("http://captaincoffee.com.au/dump/", function () {
        this.echo(this.getTitle());
    });

    casper.then(function () {
        var url = 'http://captaincoffee.com.au/dump/csv.csv';
        // Print the base64-encoded file content instead of writing it to disk
        require('utils').dump(this.base64encode(url, 'get'));
    });

    casper.run();
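To actually work with the downloaded data, the base64 string has to be decoded first. Below is a small sketch in plain JavaScript that does this in memory. The helpers decodeBase64 and parseSimpleCsv are hypothetical names of mine, and the parser is deliberately naive: it assumes a CSV file without quoted fields or embedded commas.

```javascript
var BASE64_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode a base64 string into a string with one character per byte.
function decodeBase64(encoded) {
    var clean = encoded.replace(/[^A-Za-z0-9+\/]/g, ''); // drop padding and whitespace
    var output = '';
    var buffer = 0;
    var bits = 0;
    for (var i = 0; i < clean.length; i += 1) {
        // Append 6 new bits; the mask keeps the accumulator small.
        buffer = ((buffer << 6) | BASE64_CHARS.indexOf(clean.charAt(i))) & 0xffff;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            output += String.fromCharCode((buffer >> bits) & 0xff);
        }
    }
    return output;
}

// Naive CSV parsing: one row per non-empty line, fields split on commas.
function parseSimpleCsv(text) {
    return text
        .split(/\r?\n/)
        .filter(function (line) { return line.length > 0; })
        .map(function (line) { return line.split(','); });
}

// Inside a CasperJS step the helpers could be combined like this:
//     var rows = parseSimpleCsv(decodeBase64(this.base64encode(url, 'get')));
```

Nothing is written to the file system here; the rows exist only as arrays in memory.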
Conclusion
Both PhantomJS and CasperJS will do a good job, but if you are able to choose between them, my suggestion is to use CasperJS, because you will get the same or better results with much less effort.
Source: http://code-epicenter.com/why-is-casperjs-better-than-phantomjs/ (Code Epicenter)
Author: Amir Duran. Amir Duran is a software engineer who currently lives and works in Germany. He obtained a Master's degree at the Faculty of Electrical Engineering in Sarajevo, Department of Computer Science. With a good educational background, he specializes in designing and implementing full-stack web-based applications.