Linkchecker Feedback

We have now received feedback on the pilot deployment of our linkchecker module. We would like to thank Ed Bilodeau of Libraries, Victor Chisholm from the Faculty of Science and Lysanne Larose from the Faculty of Law for their valuable feedback; this has been a great way to validate the package we’ve put together.

What we have learned

The good:

  • “Linkchecker. Finally. Hooray!”
  • Broken link messages at the top of node edit pages are useful.
  • Broken links report is sortable by URL, Response and Error fields.
  • Having both the response code and the error message is useful.

Areas for improvement:

  • No checking is done on unpublished pages.
  • Some error messages are not clear enough.
  • The URL field title is misleading; maybe change it to Broken link or Link to check.
  • The Operations field title is misleading; maybe change it to Nodes with links to check.
  • Edit links should point to the friendly URL, not the node ID.
  • Operations should be the first column, not the last.
  • The report is paginated with no way to view all results.
  • The broken links are truncated, making the report difficult to print.
  • The Operations field is not sortable; it would be nice to be able to sort by source node.
  • Messages are displayed to anyone who has edit access, not just Site Managers.
  • 301 responses are sometimes confusing.
  • Pages requiring authentication are not handled well.

Comments:

  • 301s can sometimes be confusing. 301 means Moved Permanently, which is easy to understand in the case of a simple redirect, but web servers also return 301 for other reasons. The most confusing one is a missing trailing slash. For example, the URL http://www.mcgill.ca/eps returns a 301 even though it’s a valid site, because the correct URL for that page is actually http://www.mcgill.ca/eps/ (notice the extra forward slash at the end). We don’t return a 200 for both because serving the same page at two URLs could cause search engines to flag it as duplicate content. For more information on this topic, see this Google Webmaster blog post.
  • Using friendly URLs instead of the node ID: this is standard behaviour in the admin interface. Whenever you edit a node anywhere in the admin interface, including Content > Edit, you are taken to a node ID URL. For the sake of consistency, we probably won’t change this just for the linkchecker interface.
  • Operations as the last column: this is also a standard UI convention across all Drupal admin pages, so it will probably stay the same. However, we could look at adding another column for the source information.
  • Messages displayed to all editors: this makes sense for most sites, as the users editing the content are the ones who will be fixing broken links. However, it might be worth making the permissions for this more granular to cover edge cases.
  • Regarding unpublished content, it’s worth reading this issue in the linkchecker issue queue, which discusses why linkchecker does not check unpublished nodes.
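To make the trailing-slash case concrete, here is a minimal Python sketch of how a link report could tell a harmless trailing-slash 301 apart from a page that has genuinely moved. This is not how the linkchecker module itself works (linkchecker is a Drupal module written in PHP). The function names `is_trailing_slash_redirect` and `describe_response` are hypothetical, and the sketch assumes you already have the status code and `Location` header from a request that did not follow redirects.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def is_trailing_slash_redirect(url: str, status: int, location: Optional[str]) -> bool:
    """Hypothetical helper: True if a 301 only adds a trailing slash to the path."""
    if status != 301 or not location:
        return False
    # The Location header may be relative, so resolve it against the requested URL.
    source = urlsplit(url)
    target = urlsplit(urljoin(url, location))
    return (
        target.scheme == source.scheme
        and target.netloc == source.netloc
        and target.path == source.path + "/"
        and target.query == source.query
    )


def describe_response(url: str, status: int, location: Optional[str] = None) -> str:
    """Hypothetical helper: turn a raw status code into a friendlier report message."""
    if status == 200:
        return "OK"
    if is_trailing_slash_redirect(url, status, location):
        # Valid page; the link just lacks its trailing slash.
        return f"OK, but link to {urljoin(url, location)} (trailing slash)"
    if status == 301:
        # A real move: report where the page went, if the server said.
        return f"Moved permanently to {urljoin(url, location) if location else '(no Location header)'}"
    return f"HTTP {status}"


# Example: the McGill case described above.
print(describe_response("http://www.mcgill.ca/eps", 301, "/eps/"))
```

In practice the status and `Location` header would have to come from a client that does not follow redirects automatically (for example Python's `http.client`), since `urllib.request.urlopen` follows redirects by default and would hide the 301.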

Where we go from here:

The linkchecker module was not developed at McGill; it’s a contributed module from drupal.org that has been around for a long time. If we decide to make any of the suggested changes, the route we take will probably be:

  • Create an issue in the drupal.org/project/linkchecker issue queue.
  • Possibly make the change ourselves and submit a patch to the issue.
  • Alternatively, wait for the module maintainer to make the change.

In the meantime, none of the feedback is critical enough to prevent us from deploying to a wider audience, so we will probably enable the linkchecker on all sites fairly soon.

If you have other feedback on this project, don’t hesitate to leave a comment on this post.

2 Responses to “Linkchecker Feedback”

  1. Victor Chisholm says:

    On behalf of Ed, Lysanne, and myself, with respect to “where we go from here”, may I suggest you provide very clear documentation to the community, because the error codes are very confusing. This should be both in the email when you roll this out, and in the knowledge base documentation.

    Other concerns:

    It is not clear what the difference is between the report under the user’s name (“Hello first.last@mcgill.ca” > Broken links) and the larger report (Content > Broken links). One is a subset of the other, but this is not evident. This problem was reported, but not mentioned above.

    If the correct form of a WMS URL includes a trailing slash, the WMS documentation on links (esp. article 3249) should inform users that this is the preferred form of a link. In KB article 3249, all example URLs exclude the trailing slash.

  2. Mark Styles says:

    HTTP error codes are well documented all over the web, but we will suggest to ICS that they include them in local documentation too.

    Using a trailing slash isn’t just a WMS convention; it is standard practice on most web servers, as the Google blog post shows.

    The user/broken links report shows broken links found in content that was created by that user. This is useful on sites where the content authors maintain their own content.

    The content/broken links report shows broken links found in all content. This is useful on sites where content is centrally maintained or maintained by a group.

Blog authors are solely responsible for the content of the blogs listed in the directory. Neither the content of these blogs, nor the links to other web sites, are screened, approved, reviewed or endorsed by McGill University. The text and other material on these blogs are the opinion of the specific author and are not statements of advice, opinion, or information of McGill.