Salmon Run: drupal


Thursday, October 21, 2010

A Custom Drupal XMLRPC Service

I have written earlier about using Drupal as an XMLRPC client - via a custom module that hooks into the persistence lifecycle of a Drupal node in order to send XMLRPC requests out to an external publishing system containing an embedded XMLRPC server. Drupal can also act as an XMLRPC server, receiving and acting on XMLRPC requests from external clients.

We have been using one of the built-in services (comment.save from the Comment Services module). However, we recently decided to build an external Comment Moderation tool, since we have outgrown the rather rudimentary comment moderation form available in Drupal. The tool needs services exposed on Drupal to publish, unpublish and delete a comment, and these are not available from the Comment Services module.

My initial explorations on Google pointed me to this post on the Riff Blog, and thence to the XMLRPC hook, available in core Drupal. However, the results were not too satisfying (the service didn't show up on the Services page at /admin/build/services), so I quickly abandoned this path and went looking for something better.

I found my answer in the source code for the Comment Services Module - it implemented hook_service(), so that pointed me to the Services Module, which in turn led me to the Services Handbook, and buried in the links on the right navigation toolbar on this page, some information that I could actually use.

Interestingly, not only does Drupal allow you to build/install custom services, it also allows for custom servers (such as JSON or SOAP). However, since I already had an XMLRPC server installed for the pre-installed services, I did not explore this option. There is more information about this on Deja Augustine's post here, along with some skeletal code for a simple service.

My example is a bit more involved, but uses the same ideas as Deja's post. Build a custom module that is in the "Services - services" package and depends on the services package. To the best of my knowledge, both are required: when I tried putting the module under a different package, it would not show up on the Services page. Therefore, although I physically put the code under the sites/all/modules/custom/cmxs directory, the package name is "Services - services". Here is the .info file for my module.

name = cmxs
description = Comment Moderation XMLRPC Services
package = Services - services
dependencies[] = services
core = 6.x

The module code is also quite simple. The hook_service() declares the methods that the service makes available, along with name, input and output parameter name and types, and the names of callback functions each method must call. You can install the module immediately after your hook_service() declarations are done (along with stubs for the callback functions) by going to the Modules (admin/build/modules) page and enabling the new module. The new services will show up on the Services (admin/build/services) page, along with forms to manually test the services.

Here is the complete code for my services module:

<?php

define('CMXS_SUCCESS', 0);
define('CMXS_COMMENT_PUBLISHED', 0);
define('CMXS_COMMENT_NOT_PUBLISHED', 1);

/**
 * Implementation of hook_service().
 * Describes the methods that are exposed by this service module.
 */
function cmxs_service() {
  return array (
    // comment_publish
    array (
      '#method' => 'cmxs.comment_publish',
      '#callback' => 'cmxs_comment_publish',
      '#access callback' => 'cmxs_user_access',
      '#args' => array (
        array (
          '#name' => 'cids',
          '#type' => 'string',
          '#description' => t('Comma-separated list of Comment IDs, eg. $cid1,$cid2,...')
        )
      ),
      '#return' => 'int',
      '#help' => t('Publishes the specified comment.')
    ),
    // comment_unpublish
    array (
      '#method' => 'cmxs.comment_unpublish',
      '#callback' => 'cmxs_comment_unpublish',
      '#access callback' => 'cmxs_user_access',
      '#args' => array (
        array (
          '#name' => 'cids',
          '#type' => 'string',
          '#description' => t('Comma-separated list of Comment IDs, eg. $cid1,$cid2,...')
        )
      ),
      '#return' => 'int',
      '#help' => t('Unpublishes the specified comment.')
    ),
    // comment_delete
    array (
      '#method' => 'cmxs.comment_delete',
      '#callback' => 'cmxs_comment_delete',
      '#access callback' => 'cmxs_user_access',
      '#args' => array (
        array (
          '#name' => 'cids',
          '#type' => 'string',
          '#description' => t('Comma-separated list of Comment IDs, eg. $cid1,$cid2,...')
        )
      ),
      '#return' => 'int',
      '#help' => t('Deletes the specified comment.')
    ),
  );
}

/**
 * Implementation of hook_disable().
 * Actions that need to happen when this module is disabled.
 */
function cmxs_disable() {
  cache_clear_all('services:methods', 'cache');
}

/**
 * Implementation of hook_enable().
 * Actions that need to happen when this module is enabled.
 */
function cmxs_enable() {
  cache_clear_all('services:methods', 'cache');
}

/**
 * Custom user access function to short circuit user_access() since
 * we want to bypass Drupal's authentication, since the tool will
 * always send authenticated requests.
 */
function cmxs_user_access() {
  return TRUE;
}

/**
 * Finds the comments corresponding to the cids specified that are in
 * state NOT_PUBLISHED and updates their status to PUBLISHED, then saves 
 * them.
 */
function cmxs_comment_publish($cids) {
  watchdog('cmxs', 'cmxs_comment_publish(cids=' . $cids . ')');
  $comments = _cmxs_find_comments($cids, CMXS_COMMENT_NOT_PUBLISHED);
  foreach ($comments as $comment) {
    $comment->status = CMXS_COMMENT_PUBLISHED;
    comment_save((array) $comment);
    watchdog('cmxs', 'Comment (cid=' . $comment->cid . ') published');
  }
  return CMXS_SUCCESS;
}

/**
 * Finds the comments corresponding to the cids specified that are in
 * state PUBLISHED and updates their status to NOT_PUBLISHED, then saves them.
 */
function cmxs_comment_unpublish($cids) {
  watchdog('cmxs', 'cmxs_comment_unpublish(cids=' . $cids . ')');
  $comments = _cmxs_find_comments($cids, CMXS_COMMENT_PUBLISHED);
  foreach ($comments as $comment) {
    $comment->status = CMXS_COMMENT_NOT_PUBLISHED;
    comment_save((array) $comment);
    watchdog('cmxs', 'Comment (cid=' . $comment->cid . ') unpublished');
  }
  return CMXS_SUCCESS;
}

/**
 * Since deleting a comment requires administrator privileges, we cannot
 * call comment_delete($cid) directly (since our service has no privileges).
 * So we unpublish first, and then delete using a direct SQL call.
 */
function cmxs_comment_delete($cids) {
  watchdog('cmxs', 'cmxs_comment_delete(cids=' . $cids . ')');
  $comments = _cmxs_find_comments($cids);
  foreach ($comments as $comment) {
    $comment->status = CMXS_COMMENT_NOT_PUBLISHED;
    comment_save((array) $comment); // UDXI unpublished called here
    _cmxs_delete_comment($comment->cid);
    watchdog('cmxs', 'Comment (cid=' . $comment->cid . ') deleted');
  }
  return CMXS_SUCCESS;
}

/**
 * Given a comma-separated list of CIDs, return a list of comments that
 * correspond to these CIDs. CIDs that don't correspond to a Comment in
 * Drupal are silently ignored. If the optional parameter status is provided
 * only comments with the specified status are returned.
 */
function _cmxs_find_comments($cids, $status = NULL) {
  $comments = array();
  $cid_array = explode(',', $cids);
  if ($cid_array == FALSE) {
    return $comments;
  } else {
    foreach ($cid_array as $cid) {
      $comment = _comment_load($cid);
      if ($comment != NULL) {
        // Strict check: CMXS_COMMENT_PUBLISHED is 0, which loosely equals NULL.
        if ($status !== NULL) {
          if ($comment->status != $status) {
            continue;
          }
        }
        $comments[] = $comment;
      }
    }
  }
  return $comments;
}

/**
 * Deletes a comment corresponding to the specified cid from the Drupal
 * database. There is no authorization check.
 */
function _cmxs_delete_comment($cid)  {
  $db_result = db_query('DELETE FROM {comments} WHERE cid = %d', $cid);
}

As you can see, the hook_service() implementation is purely declarative. I also have hook_enable() and hook_disable() implementations to make development a bit quicker (a change in a method signature only needs a module disable followed by a module enable, no update.php run required).
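One thing worth keeping in mind with an optional status filter like the one in _cmxs_find_comments(): the published status is 0, and PHP's loose comparison treats 0 and NULL as equal, so "no filter" and "published only" can only be told apart with a strict check. Below is a minimal standalone sketch of that idea. The helper name cmxs_example_filter_by_status() and the sample comment objects are made up for illustration and are not part of the module.

```php
<?php
// Standalone sketch; the constants mirror the values used in the module.
define('CMXS_COMMENT_PUBLISHED', 0);
define('CMXS_COMMENT_NOT_PUBLISHED', 1);

/**
 * Hypothetical helper: keeps only the comments with the given status.
 * A NULL status means "no filtering"; the check is strict so that
 * status 0 (published) is still treated as a real filter value.
 */
function cmxs_example_filter_by_status(array $comments, $status = NULL) {
  $matches = array();
  foreach ($comments as $comment) {
    if ($status !== NULL && (int) $comment->status !== (int) $status) {
      continue;
    }
    $matches[] = $comment;
  }
  return $matches;
}

// Made-up sample data standing in for loaded Drupal comments.
$comments = array(
  (object) array('cid' => 1, 'status' => CMXS_COMMENT_PUBLISHED),
  (object) array('cid' => 2, 'status' => CMXS_COMMENT_NOT_PUBLISHED),
);

$published = cmxs_example_filter_by_status($comments, CMXS_COMMENT_PUBLISHED);
$all = cmxs_example_filter_by_status($comments);

echo count($published) . ' published, ' . count($all) . " total\n";
```

With a loose `!=` in place of `!==`, the first call would skip the filter entirely and return both comments.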

I also have a custom user_access() function which does nothing. Because I am using non-authenticated XMLRPC requests, and the Drupal comment API for saving the updated comment object, I figured that Drupal would insist (as it should) on an authorized user to do the updates. So the cmxs_user_access() function is there to bypass Drupal's authentication.
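For reference, here is roughly what a call to one of these methods looks like on the wire. This is only a sketch of the XML-RPC payload, built by hand so that it runs without Drupal or the PHP xmlrpc extension. The helper name cmxs_example_build_request() and the comment IDs are invented for illustration, and I am assuming the request would be POSTed to the Services module's XML-RPC endpoint of your own installation.

```php
<?php
/**
 * Hypothetical helper: builds an XML-RPC methodCall payload for a method
 * that takes one string argument (a comma-separated list of comment IDs).
 */
function cmxs_example_build_request($method, array $cids) {
  $arg = htmlspecialchars(implode(',', $cids), ENT_QUOTES, 'UTF-8');
  return '<?xml version="1.0"?>'
    . '<methodCall>'
    . '<methodName>' . htmlspecialchars($method, ENT_QUOTES, 'UTF-8') . '</methodName>'
    . '<params><param><value><string>' . $arg . '</string></value></param></params>'
    . '</methodCall>';
}

// Made-up comment IDs; the method name is the one declared in hook_service().
$payload = cmxs_example_build_request('cmxs.comment_publish', array(101, 102, 103));
echo $payload, "\n";
```

The sketch only builds the request body. Sending it (and decoding the integer return value) is left to whatever XML-RPC client library the moderation tool uses.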

I have mentioned before about how impressed I am by Drupal's overall design, and the Services module is no exception. Like the rest of Drupal, it follows the convention over configuration philosophy, and once you understand the convention, building a custom service is really quite simple.

Friday, September 03, 2010

Cleaning up Custom Drupal Module with Object Oriented PHP

Earlier this year, I wrote about a little Drupal custom module that intercepts a node as it is being saved in Drupal, and sends off an XML-RPC request to publish it to a Java based publishing system.

The XMLRPC call to the external publisher in my original implementation looked like this (if you are looking for context, the full working code for the initial version can be found on the post referenced above).

<?php
function send_request($server_url, $op, $node) {
  ...
  $result = xmlrpc($server_url, 'publisher.' . $op, 
    $node->nid, $node->title, $node->body);
  ...
}

Of course, almost right off the bat, sending just these three fields was found to be insufficient. For this module to be useful, I needed to send over large denormalized views of data to be published (sometimes spanning multiple nodes, depending on the content type). So the first attempt was to refactor this out to a prepare_node() call, like so, where prepare_node() delegates to content type specific methods and returns a PHP associative array (that's a map for you Java guys).

<?php
function send_request($server_url, $op, $node) {
  ...
  $result = xmlrpc($server_url, 'publisher.' . $op, prepare_node($node));
  ...
}

function prepare_node($node) {
  if ($node->type == 'foo') {
    return prepare_foo($node);
  } else if ($node->type == 'bar') {
    return prepare_bar($node);
  } else {
    watchdog('dxi', 'Unrecognized content type: ' . $node->type);
    return get_object_vars($node);
  }
}

function prepare_foo($node) {
  $fields = array();
  $fields['id'] = $node->nid;
  $fields['title'] = $node->title;
  $fields['body'] = $node->body;
  // ... etc.
  return $fields;
}

function prepare_bar($node) {
  // ... more of the same
}

This worked for us for a while, through our first implementation. Soon, however, as our project manager puts it, we became victims of our own success, and it was decided to roll out similar infrastructure for other partners as well. Each partner had their own set of different (sometimes overlapping) content types. At one point, it became too confusing to maintain this prepare_node() function, so I decided to refactor. Conceptually, I wanted something like this:

<?php
function prepare_node($node, $server_url) {
  $partner_name = get_partner_from_server_url($server_url);
  if ($partner_name == 'partnerA') {
    if ($node->type == 'foo') {
      return prepare_partnerA_foo($node);
    } else if ($node->type == 'bar') {
      return prepare_partnerA_bar($node);
    } else {
      watchdog('dxi', 'Unrecognized type ' . $node->type . 
       ' for partner: ' . $partner_name);
      return get_object_vars($node);
    }
  } else if ($partner_name == 'partnerB') {
    // more of the same
  } else {
    watchdog('dxi', 'Unrecognized partner: ' . $partner_name);
    return get_object_vars($node);
  }
}

In our case, we decided to give each partner a separate publish URL so the publisher could distinguish between calls from different partners (and do different things), and the partner name can be derived using a get_partner_from_server_url() (not shown) using plain string manipulation.
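Since get_partner_from_server_url() is not shown, here is one way it could look. This is purely a sketch: the URL layout (partner name as the first path segment) is an assumption I made up for the example, and the function is named example_get_partner_from_server_url() to make clear that it is not the real one.

```php
<?php
/**
 * Hypothetical sketch: derives the partner name from a publish URL.
 * Assumes URLs of the form http://host/<partner>/publish.do, which is
 * an invented layout, not necessarily the one used in production.
 */
function example_get_partner_from_server_url($server_url) {
  $path = parse_url($server_url, PHP_URL_PATH);
  if (!is_string($path)) {
    // No path component (or a malformed URL): no partner can be derived.
    return '';
  }
  // Drop empty segments produced by leading or doubled slashes.
  $segments = array_values(array_filter(explode('/', $path), 'strlen'));
  return count($segments) > 0 ? $segments[0] : '';
}

echo example_get_partner_from_server_url(
  'http://somehost.mycompany.com/partnerA/publish.do'), "\n";
```

Returning an empty string for an unrecognized URL lets the caller fall through to its "Unrecognized partner" branch.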

None of us were very familiar with Object Oriented PHP, but we were familiar with OOP, and the above code looked like it would convert nicely to a Factory pattern. So I adapted the idea in this code snippet, along with the technique mentioned in the user contributed notes to reflectively instantiate a partner/content type specific class given the partner name and content type.

<?php
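class ContentProcessor {
  // Public so the module can call it on whatever the factory returns.
  public function prepare_node($node) {
    return get_object_vars($node);
  }
}

abstract class ContentProcessorFactory {
  // Static, since it is invoked as ContentProcessorFactory::create().
  public static function create($partnerName, $nodeType) {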
    if ((!empty($partnerName)) && (!empty($nodeType))) {
      $className = "ContentProcessor_" . $partnerName . "_" . $nodeType;
      if (class_exists($className)) {
        return new $className;
      } else {
        watchdog("dxi", "No ContentProcessor found for " . $nodeType .
          " for partner: " . $partnerName);
        $className = "ContentProcessor_" . $partnerName;
        if (class_exists($className)) {
          return new $className;
        }
      }
    }
    watchdog("dxi", "No ContentProcessor found for (partner,type)=(" .
      $partnerName . ',' . $nodeType . ")");
    return new ContentProcessor();
  }
}

Processing code for each partner was moved into separate .inc files. Each partner contains a base ContentProcessor_${partnerName} class which extends ContentProcessor, and multiple ContentProcessor_${partnerName}_${nodeType} classes, one per content type, which extends ContentProcessor_${partnerName}. Something like this:

<?php
class ContentProcessor_partnerA extends ContentProcessor {

  // common functions go here, these are called by its subclasses
  protected function utility_function_1($arg1, $arg2) {
    // do some common stuff here
  }
}

class ContentProcessor_partnerA_foo extends ContentProcessor_partnerA {
  public function prepare_node($node) {
    // returns an array
    $fields = array();
    $fields['id'] = $node->nid;
    $fields['title'] = $node->title;
    $fields['body'] = $node->body;
    // ... etc.
    return $fields;
  }
}

class ContentProcessor_partnerA_bar extends ContentProcessor_partnerA {
  public function prepare_node($node) {
    // similar to above
  }
}

In the main (non OOP) module, we instantiate the appropriate ContentProcessor and delegate to its prepare_node($node) method, like this. The partner name is configured as part of the custom action's properties.

<?php
function prepare_node($server_url, $node) {
  $partner_name = get_partner_from_server_url($server_url);
  $contentProcessor = ContentProcessorFactory::create(
    $partner_name, $node->type);
  return $contentProcessor->prepare_node($node);
}

With the old setup, every time we needed to add logic for a new partner, a developer would have to add a bunch of if-else calls in the main prepare_node(), and add some more prepare_xxx() methods to the already bloated module file. Using the new setup, the developer creates a new ${partnerName}.inc file, defines the top level ContentProcessor_${partnerName} class and its subclasses for each content type, and includes the file in the main module with a require_once() call.

Since we put the get_object_vars() call (the else part of the original if..else in prepare_node()) into the base ContentProcessor class, it is still just as easy to experiment with a new content type (dumping out the field names before writing code). Additionally, the watchdog messages logged by the factory tell us exactly what class we have to implement.
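To make the fallback chain concrete, here is a condensed, standalone sketch of the lookup: try the partner/type class first, then the partner class, then the base class. It is not the module code. The free function example_create_processor() and the $example_log array (standing in for watchdog()) are made up so that the snippet runs outside Drupal, and prepare_node() is declared public so that it can be called from outside the class.

```php
<?php
// Stand-in for watchdog(): collects the names of classes that were missing.
$example_log = array();

class ContentProcessor {
  // Default behaviour: dump every field of the node.
  public function prepare_node($node) {
    return get_object_vars($node);
  }
}

class ContentProcessor_partnerA extends ContentProcessor {}

class ContentProcessor_partnerA_foo extends ContentProcessor_partnerA {
  public function prepare_node($node) {
    return array('id' => $node->nid, 'title' => $node->title);
  }
}

/**
 * Hypothetical condensed version of the factory lookup.
 */
function example_create_processor($partner, $type, array &$log) {
  if (!empty($partner) && !empty($type)) {
    // Most specific class first, then the partner-level class.
    foreach (array($partner . '_' . $type, $partner) as $suffix) {
      $class_name = 'ContentProcessor_' . $suffix;
      if (class_exists($class_name)) {
        return new $class_name();
      }
      $log[] = 'Missing class: ' . $class_name;
    }
  }
  return new ContentProcessor();
}

// Made-up node object with only the fields used here.
$node = (object) array('nid' => 42, 'title' => 'Hello', 'body' => 'World');

$foo = example_create_processor('partnerA', 'foo', $example_log);
$bar = example_create_processor('partnerA', 'bar', $example_log);
$other = example_create_processor('partnerZ', 'foo', $example_log);

echo get_class($foo), ', ', get_class($bar), ', ', get_class($other), "\n";
print_r($example_log);
```

The log entries name the classes that were looked for and not found, which is what makes it easy to see what still needs to be implemented for a new partner or content type.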

Monday, May 17, 2010

Alfresco: Installation and Initial Thoughts

Currently, we are on the final stretch of delivering a product that combines the Drupal CMS with a Java-based publishing system. It has been a long and painful process getting to this point. One of the things that contributed to the pain is the opacity of the Drupal code, compounded by the fact that we don't have too much Drupal/PHP talent in-house. So at one point, I wondered if the process wouldn't have been easier had we chosen to work with a Java based CMS instead.

One well-known and fairly mature open-source Java based CMS is Alfresco. I decided to check it out and use it to build something similar to our Drupal based product. What's the objective of this apparently pointless exercise, you ask? Well, it's mainly to learn about Alfresco and see how it compares to Drupal, really just curiosity. And no, it's not to be able to switch out the Drupal component in the product with one based on Alfresco, in case you were wondering - that would be too risky, at least at this stage.

According to this somewhat controversial Infoworld article, Alfresco scores better than Drupal. However, the jury seems to be still out on that.

I think the best way to decide is to figure this out for myself. Prior to working with Drupal, I didn't really know what to look for in a CMS. Not that I know everything there is to know about this even now, but here is my set of "required features" for a CMS.

Custom Content - the ability to define custom content types specific to the application.
Profile - the ability to store user profile information, which may not be natively supported in the CMS user object. The reason I mention this separately is that the user object is usually distinct from a content object.
Import Content and Users - there should be some sort of API so I can import content and users from an external (possibly XML) source.
Users and Roles - CMS should support multiple users with different roles.
Workflow - documents will have to pass through multiple reviews before being published.
Relate content - a document in the CMS may be associated to zero or more documents in the CMS.
Taxonomy - a document may be associated with multiple taxonomy vocabularies. The associations could be 1:1 or 1:n.
Enter/Maintain content - there should be a UI in order for users to enter new content and maintain existing content.
Interface to Publisher - should be able to send publish/unpublish commands to the current publisher interface.

With Drupal, we had the benefit of a consultant who helped us out with the installation, setup and initial learning curve. So I may be a bit biased towards Drupal because to me it is "simpler". However, with Alfresco, I am more familiar with the components used to build it (Spring, Lucene, Hibernate, JCR), so perhaps the bias will cancel itself out.

Since I am not an Alfresco expert, I plan on spending the next several weeks working through the various "requirements" and see how hard/easy it is compared to Drupal (much of this stuff is already implemented in our Drupal instance by our consultant, and some by me). At the end of this, I hope to have enough knowledge to be able to customize an Alfresco instance to a set of semi-realistic base requirements.

This week, all I've been able to do is to set up the Alfresco ECM client, basically a web application running on Tomcat. I've also set up an Eclipse project that will contain my customizations to the base Alfresco package to make it behave more like our Drupal installation. I describe them below:

Alfresco ECM Client setup

Prebuilt packages for Windows, Linux and MacOS exist for doing 1-click installs of Alfresco. I wanted to use the Tomcat that ships with my Mac (which I ended up upgrading later for a different reason) and the MySQL that I downloaded as part of MAMP for Drupal earlier, and because the prebuilt packages embed both these components, I didn't want to use the prebuilt packages.

So I initially downloaded the project from SVN, but was unable to build, so I downloaded the latest stable WARs, and popped the alfresco.war file into Tomcat's webapps folder. It complained about various things:

Tomcat running out of PermGen - the fix was to set a higher value for PermGen space based on this Alfresco wiki page.
Missing ImageMagick and swftools - Alfresco complains about not finding these, so I needed to install them (sudo port install) and update the repository.xml file to point to the correct locations for these two tools.

I also had to create the alfresco database in MySQL and grant the alfresco user the appropriate rights as defined here.

At the end of a fairly long startup (the database gets populated with the tables the first time round, and the alf_data directory gets created and initialized), I was rewarded with the following page at http://localhost:8080/alfresco.

My first impression of the Alfresco ECM user interface is that it's horribly complex compared to Drupal's, but then it could just be my unfamiliarity with it.

Customization Project setup (Eclipse)

I followed Jeff Potts's Alfresco Developer's Guide (see References below) almost to the letter while setting up my client customizations project. That way I could use the build.xml file in the code download for the book. The way the project is set up is that it zips up the files and unpacks them into an exploded Alfresco web application in Tomcat.

The directory structure for the project is as follows:

 PROJECT_ROOT
  |
  +-- src
  |   |
  |   +-- java
  |   |
  |   +-- web
  |       |
  |       +-- META-INF
  |       |
  |       +-- jsp
  |       |   |
  |       |   +-- extension
  |       |
  |       +-- mycompany
  |
  +-- config
       |
       +-- alfresco
            |
            +--- extension

I also downloaded the Alfresco JAR files and created a lib directory outside the project, so they don't get copied along with the project's ZIP file.

Initial Thoughts

Drupal appears to be more "complete" and intuitive (at least for my web developer intuitions). You can configure it to add your customizations, use its forms interface to generate content, and even use it to power your site's dynamic content pages, all in the same application. From my initial skimming of Munwar Sharif's book (see References below), the ECM can do most of what I want from it, but I just don't know how to do them yet. However, the general recommendation is to usually have a separate custom application for the CMS users, communicating with Alfresco's repository over REST/SOAP. For my web users, I would want the application to be decoupled from the ECM anyway, so the absence of a web front-end in Alfresco is a non-issue for me.

Drupal also has a lot more documentation freely available on the Internet. This is probably just because Alfresco is a younger project and it's relatively harder to get into, so there are fewer people writing about it. However, there are at least two excellent books about it (see References below), which I suspect I will get much more familiar with over the coming weeks :-).

References

Alfresco Enterprise Content Management Implementation - by Munwar Sharif. I've just started reading this, so don't have much to say at this point.
Alfresco Developer Guide - by Jeff Potts. I've gone through this once already. There is an enormous amount of information in here, which I haven't digested completely either. Hopefully, as I work through my use cases, I will understand more.

Useful Links

Here are some links I found useful. I list them below, hopefully you find them useful too.

Understanding the differences between Alfresco's repository implementations by Jeff Potts.
Alfresco User Interface: What are my options? also by Jeff Potts.
Alfresco Wiki Entry - from Munwar Sharif's book.

Update - 2010-06-04

I wanted to have a way to override the repository.xml using my custom alfresco-global.properties, so I followed the instructions here and here, but no luck. I ended up adding these properties into the exploded repository.properties file instead. Not clean, but it works. It's probably as much work to maintain a custom version of alfresco.war as it is to maintain a Tomcat version customized for Alfresco.

Sunday, March 07, 2010

Drupal Hell: A Controlled Descent Story

As I mentioned before, I've been working on integrating Drupal into our (Java) publishing infrastructure. Over the last few weeks, I have been learning about Drupal, what is possible and what is not. This week, I describe two use-cases where I had to "extend" Drupal somewhat. In the second case, I came very close to hacking the core, hence the title, based on the advice given in The Road to Drupal Hell.

The setup is that we use Drupal as our blogging platform, and then publish the blogs using XML-RPC to a Java application. Only the editorial staff and bloggers have access to the Drupal app. Readers read the blog on the Java based website.

Of course, blogs by their very nature are interactive, so readers should have a way to comment on the blogs. This is done by a Java action which calls Drupal's comment.save XML-RPC service. By default, comments need to be moderated, so the editorial staff will review and publish these comments, which would then result in a node republish, and the comments will appear on the Java site.

Use Case #1: Batch Publishing

Trying to test using the Drupal interface is kind of painful (too many mouse clicks required), so I built a little PHP script that publishes all publishable blogs in one call. This post from Stonemind Consulting was very helpful. Basically, I followed the ideas in here to call the send_request() function. Here is the complete script if you are interested:

<?php
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
error_reporting(E_ALL);

# include local settings for scripts. These are the values of the remote
# publishing and preview servers.
require_once './sites/default/local_settings.php';

$user = $GLOBALS['user'];
if (isset($user)) {
  if ($user->uid != 1) {
    echo 'Sorry, only administrator can perform this task.';
    return;
  }
} else {
  echo 'Not logged in. Please log in as administrator before performing task';
  return;
}

echo '<b>Republishing nodes...</b><br/>';
echo 'This tool republishes ALL nodes with node.status=1 and node.type=blog.<br/><br/>';
echo '<table cellspacing=3 cellpadding=3 border=1>';
# Header
$num_publishers = 1;
echo '<tr><td><b>Node-ID</b></td><td><b>Title</b></td>';
foreach ($remote_urls as &$remote_url) {
  echo '<td><b><a href="' . $remote_url . '">Publisher-' . $num_publishers . '</a></b></td>';
  $num_publishers++;
}
echo '</tr>';
$published_nids = db_query('select nid from {node} where status = %d and type=\'%s\'', array(1, 'blog'));
$num_nodes = 0;
$num_attempted = 0;
$num_ok = 0;
$num_failed = 0;
while ($result = db_fetch_object($published_nids)) {
  $node = node_load($result->nid);
  // Keep the $preview_url template intact so it can be reused for the next node.
  $node_preview_url = sprintf($preview_url, $node->nid);
  echo '<tr><td>' . $node->nid . '</td><td><a href="' . $node_preview_url . '">' . $node->title . '</a></td>';
  foreach ($remote_urls as &$remote_url) {
    $response = dxi_send_request($remote_url, 'publish', $node);
    if ($response == 0) {
      echo '<td><font color="green">OK</font></td>';
      $num_ok++;
    } else {
      echo '<td><font color="red">Failed</font></td>';
      $num_failed++;
    }
    $num_attempted++;
  }
  $num_nodes++;
  echo '</tr>';
}
echo '</table><br/>';
echo '#-nodes republished: ' . $num_nodes . ', #-publish attempts: ' . $num_attempted . ', #-successes= ', $num_ok . ', #-failures= ', $num_failed;

As you can see, most of this is plain old PHP, with a few Drupal API method calls thrown in for convenience. You basically call this on a browser from within Drupal's interface, similar to /update.php or /install.php. You will need to be logged in as administrator to run the script.

One problem that I had (which I also had in the second use case, and which I solved differently, and I think a bit more elegantly) was to get the value of the URL for the publisher XML-RPC service, and the URL for the preview service (to allow one-click viewing of the published node). Ideally, I should be able to get these from Drupal itself, since these are already provided to it, but I couldn't find an API call that provides this information, so I had to build up my own mechanism, similar to the settings.php file. My properties file is called local_settings.php and is located as a sibling of the settings.php file in sites/default.

<?php
$remote_urls = array(
  'http://somehost.mycompany.com/myapp/publish.do'
);
$preview_url = 'http://somehost.mycompany.com/myapp/blog/%d';

I don't really like this approach, as it's not DRY, but it's the best I could come up with at the time. The approach of setting it in the Drupal variables table (described below) is slightly better, but still not as DRY as I would like.
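The %d placeholder in $preview_url is meant to be filled in once per node with sprintf(). Here is a small standalone sketch of that, written so that the template itself is never overwritten and can be reused for every node in a loop. The helper name example_preview_urls() and the node IDs are made up for illustration.

```php
<?php
// Same template as in local_settings.php above.
$preview_url = 'http://somehost.mycompany.com/myapp/blog/%d';

/**
 * Hypothetical helper: expands the preview URL template for each node ID.
 * The template is passed by value and left untouched.
 */
function example_preview_urls($template, array $nids) {
  $urls = array();
  foreach ($nids as $nid) {
    $urls[$nid] = sprintf($template, $nid);
  }
  return $urls;
}

// Made-up node IDs.
$urls = example_preview_urls($preview_url, array(12, 34));
print_r($urls);
```

Assigning the result of sprintf() back to the template variable would work for the first node only, because the placeholder is gone after the first substitution.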

Use Case #2: Comment Publish/Unpublish

Wanting to do the right thing as a new Drupal developer, I had read the Pro Drupal Book, and heeded and understood the warnings about hacking Drupal's core. In fact, I had structured my module (described earlier) so it exposed a configurable action - that way a site administrator can attach it to whatever trigger he saw fit, rather than have to do this in code by implementing a hook_XXX() method.

Drupal exposes triggers on "insert", "update", "delete" and "view" comment actions. So I figured that a comment publish/unpublish results in a status change, therefore my action should be attached to a comment update trigger, but apparently I was wrong. For some reason (probably performance), the publish/unpublish operation is done through a straight SQL call (specified in comment_operations()), and does not result in the comment update trigger(s) being fired. Actual modification of the comment (such as modifying the content or the title), however, does.

So my first approach was to go in and hack the core, specifically, the comment_hook_info() method, to add the 'publish' and 'unpublish' operations in it. This allowed me to hook up my custom action, but still did not result in it being called (which was a good thing, since it forced me to re-evaluate my options).

Running through the publish/unpublish operation with a debugger, I found that it calls comment_invoke_comment() from comment_admin_overview_submit() when a comment is published from the Approval Queue (or vice versa). The comment_invoke_comment() invokes the hook_comment() method in all loaded modules, so I needed to explicitly implement a dxi_comment() function which would delegate to my custom action. I had originally tried using a hook_nodeapi() implementation and removed it in favor of a custom action, so this was kind of a step backward. In any case, this was finally what I came up with:

<?php
function dxi_comment($comment, $op) {
  if ($op != 'publish' && $op != 'unpublish') {
    return;
  }
  if (! isset($comment->nid)) {
    watchdog('dxi', 'No Node for Comment(' . $comment->cid . ', op=' . $op . ')');
    return;
  }
  $nid = $comment->nid;
  $node = node_load($nid, NULL, TRUE);
  if (! $node) {
    watchdog('dxi', 'Node load failed for nid=' . $nid);
    return;
  }
  // prepare the context
  $context = array(
    'op' => $op,
    'hook' => 'comment',
    'remote_url' => variable_get('dxi_remote_url', DEFAULT_DXI_REMOTE_URL)
  );
  return dxi_call_action($node, $context);
}

Note how I have to build up the $context object. Unlike in the custom action, the $context object does not have the 'remote_url' field which is set by dxi_call_action_submit(). So I had to make another change here to set the remote_url value set up by the client into the variables database table, as shown below, so I could use it in my dxi_comment() to populate the $context.

<?php
function dxi_call_action_submit($form, $form_state) {
  $remote_url = $form_state['values']['remote_url'];
  variable_set('dxi_remote_url', $remote_url);
  return array(
    'remote_url' => $remote_url
  );
}

With this approach, I have basically worked around what I think is an oversight in Drupal's comment module. Since the comment module does pass in 'publish' and 'unpublish' in the $op variable when it calls dxi_comment(), these operations are recognized in the code, so I don't see why they are not exposed so custom actions can be added to them.
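
The reason dxi_comment() gets called at all is Drupal's naming convention for hooks: a module takes part in a hook just by defining a function named {module}_{hook}(). Below is a rough, standalone sketch of that dispatch idea. The sketch_invoke_all() helper, the 'demo' and 'other' module names, and the return value are all made up for illustration; Drupal's real dispatcher does more than this.

```php
<?php
// Minimal sketch of hook dispatch by naming convention (not Drupal's code).
// sketch_invoke_all() and the module names are hypothetical.
function sketch_invoke_all(array $modules, $hook, $subject, $op) {
  $results = array();
  foreach ($modules as $module) {
    $function = $module . '_' . $hook;
    // A module implements a hook simply by defining {module}_{hook}().
    if (function_exists($function)) {
      $results[$module] = $function($subject, $op);
    }
  }
  return $results;
}

// Hypothetical 'demo' module implementing the 'comment' hook. Like
// dxi_comment(), it ignores everything except publish and unpublish.
function demo_comment($comment, $op) {
  if ($op != 'publish' && $op != 'unpublish') {
    return NULL;
  }
  return $op . ':' . $comment->cid;
}

$comment = new stdClass();
$comment->cid = 7;
// 'other' defines no other_comment() function, so it is silently skipped.
$results = sketch_invoke_all(array('demo', 'other'), 'comment', $comment, 'publish');
print_r($results);
```

Only modules that actually define the function show up in the result, which is why implementing dxi_comment() was enough to get my code into the publish/unpublish path.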

Conclusion

I am still learning Drupal, and my main programming language is Java, not PHP, and this probably shows in my code. I am wondering what the best practices are for this kind of stuff. Specifically, I am looking for answers to:

What is the best practice for accessing action configuration variables? I have two different ways of pulling the remote_url value from my custom action. Ideally, I would like to just pull it out of the context somehow, or use some Drupal call to get it out of the actions table. Does such a method exist?
Have others had this kind of problem with comment publishing and unpublishing? If so, how did you solve them? Is there a reason why the publish and unpublish actions are not exposed as triggers?

I guess I could ask around on the Drupal mailing list, and I probably will, but as someone who really cares just enough about Drupal to get things hooked up correctly with the Java application, I will probably hold off on this until my Java application is more stable. Meanwhile, if you have answers or alternative approaches to this problem, I would appreciate hearing from you.

Sunday, February 21, 2010

Debugging PHP with NetBeans on Mac and Linux

As you probably know, I recently started working with Drupal, with lots of help from the Pro Drupal book. In the book (page 524), the author states:

Real-time debugging is a feature of PHP and not Drupal, but it's worth covering, since you can easily be identified as a Drupal ninja if a real-time debugger is running on your laptop.

In my relentless quest for Drupal ninjahood, I naturally wanted to get debugging working with NetBeans, my PHP editor of choice. Just kidding ... the need to get debugging working was driven by the relatively opaque nature of Drupal - the only two debugging features I am aware of are the watchdog logging feature, which I have used, and the Drupal Devel module, which I have not (yet). Also, like most Java programmers, I am probably more dependent on a good IDE than most PHP programmers.

In any case, this post talks about what needs to be done to get PHP debugging working on NetBeans on Linux (CentOS) and Mac (Snow Leopard). The information was gleaned from multiple posts, some of which provided inaccurate or incomplete information, so there was some amount of trial and error involved.

I will assume that you have NetBeans installed and you occasionally use it. I use NetBeans only for scripting, and so far, the lack of a debugger (or my lack of knowledge on how to run it) has not affected me. With Drupal, however, there is no real way of knowing which modules are getting called in a request, short of writing watchdog calls in your code, so having a debugger to step through the code can be quite helpful.

This part probably doesn't matter unless you are also doing Drupal development, but there appears to be a Goldilocks thing going on between Drupal 6.15 and PHP, where only PHP 5.2 is just right. My CentOS 5.3 has PHP 5.1.6 in the default yum repository. Apparently that doesn't quite cut it with Drupal, so I initially enabled Remi's repository based on information in Binit Bhatia's post - but that now gives me PHP 5.3 (not surprising, since Binit's post is almost a year old now), which has even bigger problems with some of the modules. Ultimately, I ended up getting PHP 5.2 from the CentOS testing repository following guidelines in Irakli Nadareishvili's post.

On the Mac, I am using MAMP, which has PHP 5.2 installed as a component within MAMP (ie, under /Applications/MAMP/bin/php5/bin), even though at the OS level it has PHP 5.3 installed (ie, under /usr/bin).

So anyway, what you need to do is to install and configure XDebug to work with PHP and NetBeans. On CentOS, just download the source and build it as explained in the XDebug install page.

sujit@lysdexic:xdebug$ phpize
sujit@lysdexic:xdebug$ ./configure --enable-xdebug
sujit@lysdexic:xdebug$ make
sujit@lysdexic:xdebug$ sudo cp modules/xdebug.so /usr/lib64/httpd/modules/xdebug.so

On the Mac, compiling from source did not work, either against the PHP 5.3 in /usr/bin or against the PHP 5.2 packaged with MAMP. It fails on make, and it looks like a bad #ifdef in the code somewhere. However, I was able to get a working binary by downloading the PHP Remote Debugging package from the ActiveState Komodo Site and copying the xdebug.so file under the 5.2 directory to /Applications/MAMP/bin/php5/lib/extensions/no-debug-non-zts-20060613/ - most of this information came from Felix Geisendörfer's post.

The next step is to hook XDebug with PHP and NetBeans. Thankfully, this is (almost) identical on both Mac and CentOS. Essentially, the following lines need to be added to the php.ini file (/etc/php.ini on CentOS and /Applications/MAMP/conf/php5/php.ini on Mac). The information below mostly comes from the Debugging PHP Source Code in the NetBeans IDE NetBeans article, although some of it has been changed using information from other posts.

[xdebug]
zend_extension=/path/to/xdebug.so
xdebug.remote_enable=1
xdebug.remote_handler=dbgp
xdebug.remote_host=localhost
xdebug.remote_port=9000
xdebug.profiler_enable=1
xdebug.profiler_output_dir=/tmp

In addition, remove or comment out any other zend_* properties in the php.ini file (that is, everything except the zend_extension line that loads xdebug.so). Apparently XDebug and Zend don't go well together.

Restart your webserver and bring up a page with phpinfo() and verify that XDebug is available, or use php -m to list the modules as specified in the XDebug install page. If XDebug seems to be installed okay, the next step is to check if NetBeans can see XDebug.
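
As a quicker check than reading through the full phpinfo() output, a small script along these lines reports whether XDebug is loaded and which php.ini was actually read. This is only a sketch: sketch_xdebug_status() is a name I made up, and php_ini_loaded_file() needs PHP 5.2.4 or later. The ini file path is worth looking at on a machine with more than one PHP, such as the Mac setup above with both MAMP and the OS-level PHP.

```php
<?php
// Report whether the Xdebug extension is loaded in the PHP that runs this
// script, and which php.ini that PHP picked up.
// sketch_xdebug_status() is a hypothetical helper name.
function sketch_xdebug_status() {
  $loaded = extension_loaded('xdebug');
  return array(
    'loaded' => $loaded,
    // phpversion() with an extension name returns that extension's version.
    'version' => $loaded ? phpversion('xdebug') : NULL,
    // FALSE here means no php.ini was loaded at all.
    'ini_file' => php_ini_loaded_file(),
  );
}

print_r(sketch_xdebug_status());
```

Run it through the webserver rather than the command line, since the two can be configured with different php.ini files.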

To verify this, start NetBeans, open an existing PHP project (such as a Drupal application), then click the debug button. You should see a "Connecting to XDebug" message on the status bar which should change to "netbeans-xdebug" within a few seconds. At this point, you can debug a request by clicking a URL in your browser and then stepping through the code, setting breakpoints, inspecting variables, etc., in NetBeans.

Saturday, January 30, 2010

Drupal interface to Java using XML-RPC

I haven't been posting to the blog as often as I want to lately. I just came back from a fairly long vacation in India, and over the last month, I have been trying to catch up on stuff, both at home and at work. The good news is that I am almost caught up, so I should return to my normal posting frequency shortly - as long as I have something interesting to write about, that is.

At work, we are setting up Drupal as our internal CMS. This is a bit of a departure from our normal practice, since Drupal is written in PHP, and we are predominantly a Java shop. Personally, I would probably have gone with Alfresco or dotCMS, but I don't know much about either CMS, or much about CMSs in general, so my preference is based solely on the fact that they are written in Java and open-source, so customizing them to interface with other existing (Java) components would be easier.

However, having now spent about a week futzing with Drupal, I am quite impressed with its pragmatic design - there are many good ideas in there that a Java web developer could use. Many thanks to John K. VanDyk for writing Pro Drupal Development - this helped me immensely in getting up to speed quickly. Regarding interfacing to existing Java components, it's not as huge a deal as I thought it would be either. From my point of view, Drupal (or any other CMS for that matter) is mainly a container of content, and interfacing with it would be done through a narrow (code wise) interface point.

The particular case I was looking at was to allow Drupal to publish/depublish stories to an external data store, from where it would be read by one or more Java application(s). The solution I came up with was to have Drupal send an XML-RPC message to a Java middleware server component on publish or depublish, which would do what is needed to write it out to the external data store. I describe the Drupal side of the interface here.
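
To make the message concrete, here is a rough sketch of what an XML-RPC request for a publish looks like on the wire. It builds the methodCall document by hand purely for illustration; inside Drupal the xmlrpc() function does this work. The sketch_xmlrpc_request() helper and the sample node values are made up, and the type mapping is deliberately simplified to integers and strings.

```php
<?php
// Build an XML-RPC methodCall document by hand, for illustration only.
// sketch_xmlrpc_request() is a hypothetical helper; integers are sent as
// <int> and everything else as <string>.
function sketch_xmlrpc_request($method, array $params) {
  $xml = '<?xml version="1.0"?>' . "\n";
  $xml .= '<methodCall>';
  $xml .= '<methodName>' . htmlspecialchars($method) . '</methodName>';
  $xml .= '<params>';
  foreach ($params as $param) {
    $type = is_int($param) ? 'int' : 'string';
    // Escape markup in the value so that it cannot break the XML envelope.
    $xml .= '<param><value><' . $type . '>' . htmlspecialchars((string) $param)
      . '</' . $type . '></value></param>';
  }
  $xml .= '</params></methodCall>';
  return $xml;
}

// Sample values standing in for a node's nid, title and body.
$request = sketch_xmlrpc_request('publisher.insert',
  array(42, 'My title', 'Body with <b>markup</b>'));
echo $request, "\n";
```

The Java side only has to understand this one small document format, which is what keeps the interface between the two systems narrow.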

Installing Drupal

As mentioned, I have been using Drupal for about a week, so the first step was to install it. On my CentOS desktop at work, I installed (using yum) PHP, PHP-MySQL and re-installed MySQL (since the PHP-MySQL RPM did not seem to play well with the already installed MySQL RPMs from MySQL). I also set up Lighttpd as my webserver, then copied the Drupal tarball into my document root. On my Mac OS X notebook, I had to remove MySQL and install MAMP, which bundles Apache, MySQL and PHP, then install the Drupal tarball into my document root.

From there, all you have to do is to create your Drupal database and user and then navigate to http://localhost/install.php on your browser. The installation process will walk you through a few pages, and you are all set up.

The Interface Module

Extensions to Drupal are made via modules. Drupal modules are typically written by super-experienced Drupal/PHP coders, but this one is really short and simple. I looked around a bit on the Internet for something similar, but perhaps my use case is too trivial for someone to build and contribute a module for. All it does is define an Advanced Action which is triggered on a Node insert, update or delete, along with its form for configuration. All this stuff is adapted from Chapters 3 and 19 of the Pro Drupal Development book.

My module is named dxi (Drupal XML Interface). The code lives under sites/all/modules/custom/dxi. The dxi.info file contains the meta information for the module. It looks like this:

; Source: sites/all/modules/custom/dxi/dxi.info
name = dxi
description = Drupal Interface to Java via XML-RPC
core = 6.x
package = My Company

The actual code is in dxi.module. I initially started with the send_request() and dxi_nodeapi() methods. Since dxi_nodeapi() is called on every node event, I was checking the operation in it and firing the send_request() call only when the operation was "insert", "update" or "delete". This is really all that is needed to get my interface up and running.

However, using an Advanced Action instead lets the administrator choose which events fire the request, and set the URL for the remote XML-RPC server from the Administrator GUI instead of having it hardcoded in the module. So I ended up using the Advanced Action approach. Actions in the dxi module are defined in dxi_action_info() - there is only one, dxi_call_action(). Because it is an Advanced (Configurable) action, it needs a configuration form, which is defined by dxi_call_action_form(), the form validation is defined in dxi_call_action_validate() and the populated form values are returned from dxi_call_action_submit().

Both approaches call the send_request() method, which does the actual XML-RPC call to the remote server. The code for the entire module is shown below, with the first (hook_nodeapi) approach commented out for posterity/just in case.

<?php
/*
 * Source: sites/all/modules/custom/dxi/dxi.module
 */

/**
 * Function to send the actual XML-RPC request over to the remote
 * server. Will throw a drupal error message if the XML-RPC request
 * fails for any reason.
 * @param $server_url - the URL of the remote server.
 * @param $op - the name of the operation to invoke.
 * @param $node - the reference to the node.
 * @return 0 or -1.
 */
function send_request($server_url, $op, $node) {
  watchdog('dxi', 'Sending request publish_' . $op . '(' . $node->nid . ')');
  $result = xmlrpc($server_url, 'publisher.' . $op, 
    $node->nid, $node->title, $node->body);
  if ($error = xmlrpc_error()) {
    if ($error->code <= 0) {
      $error->message = t('Remote server appears to be down');
    }
    drupal_set_message(t('Operation publish_@op(@nid) failed: %message (@code).',
      array(
        '@op' => $op, '@nid' => $node->nid,
        '%message' => $error->message, '@code' => $error->code,
      )
    ), 'error');
    return -1;
  }
  return 0;
}

/**
 * Implementation of hook_nodeapi().
 * This is automatically called by the hook_nodeapi() on all nodeapi
 * events. This is the simplest approach to sending an XML-RPC request
 * for the desired operations.
 * @deprecated - use the dxi_call_action approach instead.
 * @param &$node - the node object.
 * @param $op - the operation.
 * @param $a3 - optional argument, set to NULL.
 * @param $a4 - optional argument, set to NULL.
 */
//function dxi_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
//  if ($op == 'insert' || $op == 'update' || $op == 'delete') {
//    watchdog('dxi', 'dxi_nodeapi' . $op);
//    return send_request('http://localhost:8081/publisher', $op, $node);
//  }
//}

/**
 * Implementation of hook_action_info().
 * @return the $info object.
 */
function dxi_action_info() {
  $info['dxi_call_action'] = array(
    'type' => 'node',
    'description' => t('Send XML-RPC request'),
    'configurable' => TRUE,
    'hooks' => array(
      'nodeapi' => array('insert', 'update', 'delete')
    )
  );
  return $info;
}

/**
 * This function traps the nodeapi_insert, nodeapi_update and nodeapi_delete
 * events, and sends an XML-RPC request over to the Java server. This approach
 * is slightly more flexible than the dxi_nodeapi() approach, since you can
 * set up the component and operation mappings to the action from the Drupal
 * administration GUI.
 * @param $object - the node object.
 * @param $context - the context object.
 */
function dxi_call_action($object, $context) {
  $node = $object;
  if ($context['hook'] == 'nodeapi') {
    watchdog('dxi', 'dxi_call_action');
    // we only want to trigger this action on a node publish or unpublish
    // but we let the user decide that through the GUI
    $op = $context['op'];
    $remote_url = $context['remote_url'];
    return send_request($remote_url, $op, $node);
  }
}

/**
 * Implementation of ${action_name}_form. This returns field information
 * for the configuration form for this action.
 * @param $context - the context.
 * @return the $form object.
 */
function dxi_call_action_form($context) {
  $form['remote_url'] = array (
    '#type' => 'textfield',
    '#title' => t('Remote Server'),
    '#description' => t('Enter URL of Remote Server'),
    '#default_value' => isset($context['remote_url']) ?
      $context['remote_url'] : 'http://localhost:8080/publisher',
    '#required' => TRUE
  );
  return $form;
}

/**
 * Implementation of the ${action_name}_validate. This contains validation
 * for the form inputs. This is a NO-OP here.
 * @param $form - the form.
 * @param $form_state - the $form_state
 */
function dxi_call_action_validate($form, $form_state) {
  // Nothing
}

/**
 * Implementation of the ${action_name}_submit. This returns the validated
 * values of the form.
 * @param $form - the form.
 * @param $form_state - the form state.
 * @return the map of names and values of form fields.
 */
function dxi_call_action_submit($form, $form_state) {
  return array(
    'remote_url' => $form_state['values']['remote_url']
  );
}
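
The validate callback above is a NO-OP, but since the only field is a URL, a basic sanity check is easy to add. Here is a rough sketch of what that check could look like, written as a standalone function so it can be run outside Drupal. The sketch_is_valid_remote_url() helper is hypothetical, and restricting the scheme to http and https is my assumption about what a remote XML-RPC server URL should look like.

```php
<?php
// Sketch of a URL check that dxi_call_action_validate() could delegate to.
// sketch_is_valid_remote_url() is a hypothetical helper, not a Drupal API.
function sketch_is_valid_remote_url($url) {
  if (filter_var($url, FILTER_VALIDATE_URL) === FALSE) {
    return FALSE;
  }
  // Assumption: the XML-RPC server is only ever reached over http or https.
  $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
  return in_array($scheme, array('http', 'https'), TRUE);
}

// Inside Drupal 6 the wiring might look like this (left as a comment so the
// sketch runs on its own):
//   if (!sketch_is_valid_remote_url($form_state['values']['remote_url'])) {
//     form_set_error('remote_url', t('Please enter an http or https URL.'));
//   }
var_dump(sketch_is_valid_remote_url('http://localhost:8080/publisher'));
```

Catching a mistyped URL in the configuration form is cheaper than finding out about it from a failed XML-RPC call on the first publish.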

Interface Installation and Configuration

Install the Module: Go to Administer :: Site building :: Modules and you should see the dxi module at the bottom of the page. Click the Enabled checkbox and save the configuration.

Add the Action: Go to Administer :: Site configuration :: Actions to see a list of Actions currently available to Drupal. At the bottom is a dropdown list of Advanced Actions. Choose the one that says "Send XML-RPC request" and click the Create button. This will bring up the configuration screen for this action. Set the URL for the XML-RPC server and click Save. You will see the action now associated with node Action type.

Associate the Action with the Trigger: Go to Administer :: Site building :: Triggers. Choose the Content tab. You will see a list of trigger types for Nodes. Add the new action to each of "After saving a new post", "After saving an updated post" and "After deleting a post" triggers.

Testing

If you don't have a server available (I don't yet), then just comment out lines 17-30 of dxi.module (the block which has the XML-RPC call and the error handling), then create/update/delete a story. After each step, take a look at the log entries generated - navigate to Administer :: Reports :: Recent log entries and look for the log messages with the type "dxi" - you should see entries with "Sending request..." which should convince you that the stuff is working.
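
An alternative to commenting out code is to point the action at a throwaway endpoint that accepts any XML-RPC call and answers with 0. The following is only a sketch of such a stub, not the real Java server: sketch_handle_xmlrpc() is a made-up name, the reply value is arbitrary, and the stub ignores the parameters entirely. It relies on the SimpleXML extension being available.

```php
<?php
// Throwaway XML-RPC endpoint sketch: accepts any well-formed methodCall and
// replies with the integer 0; anything else gets an XML-RPC fault.
// sketch_handle_xmlrpc() is a hypothetical helper name.
function sketch_handle_xmlrpc($body) {
  $request = @simplexml_load_string($body);
  if ($request === FALSE || !isset($request->methodName)) {
    return '<?xml version="1.0"?>' . "\n"
      . '<methodResponse><fault><value><struct>'
      . '<member><name>faultCode</name><value><int>1</int></value></member>'
      . '<member><name>faultString</name><value><string>Bad request</string></value></member>'
      . '</struct></value></fault></methodResponse>';
  }
  return '<?xml version="1.0"?>' . "\n"
    . '<methodResponse><params><param><value><int>0</int></value></param>'
    . '</params></methodResponse>';
}

// Only answer when running under a web SAPI, not from the command line.
if (PHP_SAPI !== 'cli') {
  header('Content-Type: text/xml');
  echo sketch_handle_xmlrpc(file_get_contents('php://input'));
}
```

Saved as, say, stub.php, this can be served by any webserver that runs PHP. On a PHP new enough to have the built-in web server (5.4 or later), running php -S localhost:8080 stub.php should be enough to match the default URL in the action's configuration form.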
