Linkchecker Feedback

We have now received feedback on the pilot deployment of our linkchecker module. We would like to thank Ed Bilodeau of Libraries, Victor Chisholm from the Faculty of Science and Lysanne Larose from the Faculty of Law for their valuable feedback; this has been a great way to validate the package we’ve put together.

What we have learned

The good:

  • “Linkchecker. Finally. Hooray!”
  • Broken link messages at top of node edit pages are useful.
  • Broken links report is sortable by URL, Response and Error fields.
  • Having both the response code and the error message is useful.

Areas for improvement:

  • No checking is done on unpublished pages.
  • Some error messages are not clear enough.
  • URL field title is misleading, maybe change it to Broken Link or Link to check.
  • Operations field title is misleading, maybe change it to Nodes with links to check.
  • Edit link should point to friendly URL, not node ID.
  • Operations should be the first column not the last.
  • The report is paginated with no way to view all.
  • The broken links are truncated making it difficult to print.
  • Operations field is not sortable, would be nice to be able to sort by source node.
  • Messages are displayed to anyone who has edit access, not just Site Managers.
  • 301 responses are sometimes confusing.
  • Pages requiring authentication are not handled well.

Comments:

  • 301s can sometimes be confusing. 301 means Moved Permanently, which in the case of a simple redirect is easy to understand, but there are other reasons that web servers return 301. The most confusing reason is missing trailing slashes. For example, the URL http://www.mcgill.ca/eps will return a 301, even though it’s a valid site. This is because the correct URL for that page is actually http://www.mcgill.ca/eps/ – notice the extra forward slash on the end. The reason we don’t return a 200 for both is that this could cause search engines to flag it as duplicate content. For more information on this topic, see this Google Webmaster blog post.
  • Using friendly URLs instead of using the node ID – This is standard behaviour in the admin interface. Whenever you edit a node at any place in the admin interface, including Content..Edit, you will be taken to a node ID URL. For the sake of consistency, we probably won’t change this just for the linkchecker interface.
  • Operations as the last column. This is also a standard UI decision across all Drupal admin pages, so it will probably stay the same as well. However, we could look at adding another column for the source information.
  • Messages displayed to all editors. This makes sense for most sites, as the users editing the content are the ones who will be fixing broken links. However, it might be worth looking at making the permissions for this more granular for edge cases.
  • For checking of unpublished content, it’s worth reading this issue in the linkchecker issue queue that discusses why linkchecker does not check unpublished nodes.
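To make the trailing-slash case from the 301 comment above concrete, here is a minimal Python sketch that tells a "missing trailing slash" redirect apart from a genuine move. It is only an illustration: the function name is_trailing_slash_redirect is hypothetical and not part of the linkchecker module, and the example.org URL is made up for the example.

```python
from urllib.parse import urljoin, urlsplit


def is_trailing_slash_redirect(requested_url: str, location: str) -> bool:
    """Return True if a redirect target differs from the requested URL
    only by an added trailing slash on the path.

    `location` is the value of the Location header from the 301 response.
    It may be relative, so it is resolved against the requested URL first.
    """
    requested = urlsplit(requested_url)
    target = urlsplit(urljoin(requested_url, location))

    same_site = (requested.scheme, requested.netloc.lower()) == (
        target.scheme,
        target.netloc.lower(),
    )
    same_query = requested.query == target.query
    slash_added = (
        not requested.path.endswith("/")
        and target.path == requested.path + "/"
    )
    return same_site and same_query and slash_added


# The example from this post: same page, only the slash is missing.
print(is_trailing_slash_redirect(
    "http://www.mcgill.ca/eps", "http://www.mcgill.ca/eps/"))      # True

# A hypothetical genuine move to a different address.
print(is_trailing_slash_redirect(
    "http://www.mcgill.ca/eps", "http://www.example.org/moved/"))  # False
```

A report could use a check like this to label such 301s as "add a trailing slash" instead of "moved", which is the distinction that confused some pilot users.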

Where we go from here:

The linkchecker module was not developed at McGill; it’s a module from drupal.org which has been around for a long time. If we decide to make any of the suggested changes, the route we will take will probably be:

  • Create an issue in the drupal.org/project/linkchecker issue queue.
  • Possibly make the change ourselves and submit a patch to the issue.
  • Alternatively, wait for the module maintainer to make the change.

In the meantime, none of the feedback is critical enough to prevent us deploying to a wider audience, so we will probably enable the linkchecker on all sites fairly soon.

If you have other feedback on this project, don’t hesitate to leave a comment on this post.

WMS Linkchecker

We’ve needed to have a linkchecker in place on the WMS for a while, but for technical and time reasons we’ve been unable to implement it until now. The good news is that the Drupal community already have a very good linkchecker module, which we will be pushing out to all sites in the near future.

The module we are using is called, unsurprisingly, linkchecker, and it provides site managers with three things:

  • A Broken Links report, which usually lives on the Reports menu. We are moving it to the Content menu, as our site managers don’t have access to Reports.
  • A Broken Links tab on user profiles, which provides a quick way to see broken links in only the content you have created.
  • Messages at the top of node edit pages listing broken links in that node.

The module is very configurable, so here are some of the decisions we’ve made for configuration. Note that we may change some of these decisions based on feedback we receive:

  • Only scan page and block content types
  • Check both internal and external links
  • Re-check every 4 weeks
  • Do not auto-update 301s (see below)
  • Only provide reporting on the site. We plan to add email reporting as a future enhancement.
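As a rough illustration of the decisions listed above, the following Python sketch models them as a small settings object and shows how a four-week re-check interval could be applied. This is not how the Drupal module stores its configuration: the class LinkcheckSettings, its field names and the function is_due are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class LinkcheckSettings:
    """Hypothetical model of the configuration choices described above."""
    content_types: Tuple[str, ...] = ("page", "block")  # only these are scanned
    check_internal: bool = True
    check_external: bool = True
    recheck_interval: timedelta = timedelta(weeks=4)
    auto_update_301: bool = False  # left off until it has been tested
    email_reports: bool = False    # planned as a future enhancement


def is_due(last_checked: datetime, now: datetime,
           settings: LinkcheckSettings) -> bool:
    """Return True once the re-check interval has fully elapsed."""
    return now - last_checked >= settings.recheck_interval


settings = LinkcheckSettings()
checked = datetime(2024, 1, 1)
print(is_due(checked, datetime(2024, 1, 28), settings))  # False (27 days)
print(is_due(checked, datetime(2024, 1, 29), settings))  # True (28 days)
```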

A note on status codes: The linkchecker will report various status codes for links it considers broken. The code for a page not found is 404, but other codes can be returned too, such as:

  • 500 – The server encountered an error
  • 503 – Service currently unavailable
  • 504 – Timeout
  • 401 – Unauthorized (probably requires authentication)
  • 403 – Forbidden
  • 301 – Moved permanently
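For readers who want to interpret these codes in a script, here is a small Python sketch that pairs each code with its standard reason phrase and a suggested action. The function name describe_status and the advice strings are our own assumptions for illustration; they are not messages produced by the linkchecker module. The reason phrases come from Python's standard http.HTTPStatus.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a one-line summary for an HTTP status code seen in a report."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a status code that the standard library knows about.
        phrase = "Unknown status"

    if code == 301:
        advice = "update the link to the new location"
    elif code in (401, 403):
        advice = "the target probably requires authentication or is restricted"
    elif code == 404:
        advice = "page not found; fix or remove the link"
    elif 500 <= code < 600:
        advice = "server-side problem; it may be temporary, so re-check later"
    else:
        advice = "review manually"
    return f"{code} {phrase}: {advice}"


for code in (404, 500, 503, 504, 401, 403, 301):
    print(describe_status(code))
```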

In the case of 301, linkchecker has the ability to automatically fix the link to whatever it was moved to. This is a nice feature, but it makes us a bit nervous so we’re leaving it off for now until we can test that it doesn’t break anything. 301 responses can be ignored, but it is better to fix them.

 
