November 19, 2013

HTML output plugin for Crawljax

I've just created an HTML output plugin for Crawljax. Check it out! :-)

It's an OnNewStatePlugin, so it is called whenever the DOM changes in the browser, which lets you save every state to a file. The plugin stores each state only once; equality is tested by comparing MD5 hashes. Filenames are generated from URLs, and SaveHTML creates the directories just as they appear in the URL. Directories and files are created inside the Crawljax output directory, so you should specify it in the Crawljax configuration. This way you can store a mirror of a site, but links are not rewritten, so they won't work in all cases. If Crawljax meets different DOM states on the same URL, SaveHTML adds a counter to the end of the filename.
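To make the "store each state once, add a counter" behaviour concrete, here is a minimal sketch of the idea in plain Java. It is not the plugin's actual code: the names StateDeduplicator and fileNameFor are made up for this example, it does not touch the Crawljax API, and the position of the counter (just before the file extension) is only an assumption.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of SaveHTML or Crawljax.
class StateDeduplicator {
    // MD5 hashes of every DOM that has already been saved.
    private final Set<String> seenHashes = new HashSet<>();
    // Number of distinct DOM states saved so far for each base path.
    private final Map<String, Integer> statesPerPath = new HashMap<>();

    // Returns the MD5 hash of the DOM string as 32 lowercase hex digits.
    static String md5Hex(String dom) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(dom.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Returns the path to write the DOM to, or empty if this DOM was saved before.
    Optional<String> fileNameFor(String basePath, String dom) {
        if (!seenHashes.add(md5Hex(dom))) {
            return Optional.empty(); // identical state, nothing to save
        }
        int count = statesPerPath.merge(basePath, 1, Integer::sum);
        if (count == 1) {
            return Optional.of(basePath); // first state seen for this URL
        }
        // Assumption: the counter goes right before the extension, e.g. index_2.html.
        int dot = basePath.lastIndexOf('.');
        if (dot <= basePath.lastIndexOf('/')) {
            return Optional.of(basePath + "_" + count); // file name has no extension
        }
        return Optional.of(basePath.substring(0, dot) + "_" + count + basePath.substring(dot));
    }
}
```

In this sketch, the first DOM seen for example.com/index.html keeps that name, a repeated DOM is skipped, and a second, different DOM for the same URL is written to example.com/index_2.html.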

It uses my URL2File class to generate the file and directory names. A '.html' suffix is added if the URL doesn't end with '.html' or '.htm'. URLs ending with '/' get an 'index.html' suffix. Special characters are replaced with '_'.
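The naming rules above can be illustrated with a short sketch. This is not the real URL2File source. The class name UrlToFileSketch, the choice to use the host name as the top-level directory, the handling of the query string, and the exact set of characters treated as "special" are all assumptions made for the example.

```java
import java.net.URI;

// Hypothetical re-creation of the naming rules; not the actual URL2File class.
class UrlToFileSketch {

    // Maps a URL to a relative file path using the rules described above.
    static String toRelativePath(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() == null ? "" : uri.getHost();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // a bare host is treated like its root page
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery(); // assumption: the query becomes part of the name
        }
        if (path.endsWith("/")) {
            path += "index.html";
        } else if (!path.endsWith(".html") && !path.endsWith(".htm")) {
            path += ".html";
        }
        // Rebuild the directory structure segment by segment.
        StringBuilder out = new StringBuilder(sanitize(host));
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                out.append('/').append(sanitize(segment));
            }
        }
        return out.toString();
    }

    // Assumption: everything except letters, digits, '.', '_' and '-' counts as special.
    static String sanitize(String s) {
        return s.replaceAll("[^A-Za-z0-9._-]", "_");
    }
}
```

With these assumptions, http://example.com/docs/ maps to example.com/docs/index.html, http://example.com/docs/page maps to example.com/docs/page.html, and http://example.com/search?q=java maps to example.com/search_q_java.html.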

You can find it on GitHub.

February 22, 2013

PubSearch 2 - The plan

I spent months writing a wishlist and brainstorming, and now I think I have finally managed to plan the next version of the program, which will be a total rewrite.

January 30, 2013

PubSearch 1.1

What's new since 1.0
  • rebuilt with the latest JDK (7u11)
  • GUI bugs caused by JavaFX changes fixed
  • Google Scholar crawl patterns fixed
  • Springerlink crawl patterns fixed
  • publication databases added:

Downloadable at

January 25, 2013


This is the first project I have published on the World Wide Web. It was my thesis project at university, and I created a SourceForge account for it in order to use SVN. A year later, in the first weeks of 2013, I received a letter from the Softpedia team informing me that they had included my program in the Softpedia database.

What's this?
This is a Java tool that can search multiple publication databases (such as Google Scholar, CiteSeerX, ACM and SpringerLink). You type in an author's name, and PubSearch grabs the basic information about her/his publications. It can transitively crawl the "cited-by" lists, so a researcher can use this tool for calculating her/his impact factor. It uses a proxy list to reach those sites, to avoid being banned because of the heavy network traffic. The program uses definition files to crawl the databases; you can edit these with any simple text editor or add your own definition. You can export publication data in citation formats.



Downloadable at

on my TODO list :-)

Let's do this

So I decided to create an English version of my blog. I won't translate all of my posts, only the important ones, especially the project plans and new release announcements. You'll find all of my published projects here, with download links, and I think manuals will also be available under this domain.

But what made me create this English blog? My thesis project, PubSearch, was included by Softpedia yesterday, and this increased my motivation to keep developing the application. I can see the download counter on the site, which tells me people are interested in my work. So I think I should inform them about new releases, bug fixes (there is some stuff to fix in the published version!), and my future plans for the project. And maybe I'll come up with new project ideas!

These days I'm quite busy, but I'll upload some posts ASAP. Stay tuned! :-)