identity manager architectural overview

Wed, Oct 29, 2008

I’ve updated the diagram of Identity Manager with the latest terminology but I’ll leave the original intact as it’s interesting to see the changes. Here’s the latest version:

The basic workflow is that a driver picks up changes from SITS and feeds them into the Metadirectory Engine as events. Business rules in the Engine translate these events into commands to the Identity Vault. The resulting changes in the Vault are then propagated to the Elgg Driver as commands, and the Driver executes those commands to update Elgg’s information on the user who has just been processed. Business rules can also be defined to handle changes in Elgg, allowing them to propagate to the Vault and ultimately to SITS, if required.

There are a few points I need to investigate:

The documentation states that a driver and its remote loader need to be installed on the same machine as the application with which they communicate. This isn’t going to work, as I want to plumb Elgg into the IDM workflow: Elgg is PHP, drivers are Java. You can write C++ drivers, but they are highly unstable according to Novell. I don’t see why the driver can’t be installed on the Metadirectory Engine server and use REST to call into a custom-built Elgg API.

Each application we want to plumb into IDM will either need a way to advertise changes within its internal user landscape, or a Driver will need to maintain state on its behalf. Say a user has an account created in Elgg via the publisher channel from SITS. That would mean the Elgg to Vault Object association would be created by Identity Manager and any subsequent changes to that object in the Vault would be propagated to the Elgg Driver as commands. Likewise, changes to the user object in Elgg would need to be propagated to Identity Manager, to allow the Vault to be updated.

How can you detect those changes in Elgg? What changes need to be detected? That depends on what attributes you want to make “global”, so to speak. Let’s say a special community, Cake Munchers Anonymous, was set up in Elgg and the user joined it. The Vault knows about this user via a past association, such as the one just described, and the user’s community memberships were deemed important enough that the Vault should know about them. So the Vault should know that the user has just joined this community. The Driver must somehow find this out and tell IDM, which will tell the Vault.

Now it gets messy. How can the driver know this? One way is to maintain a cache of the Elgg database and compare it at regular intervals with the current state of the database, updating the cache accordingly. Changes to the cache must then be translated to events sent to IDM via the Driver. This is how Siva works with Son of Pliers. Son of Pliers keeps a cache of SITS events and feeds Siva accordingly. So I know all about caches and how fragile they can be. An alternative is to have Elgg call a (web)service(?) when events occur. This is feasible as Elgg has an event API, although it’s very crude in our deployed version 0.9.x. You need to go to 1.x for the full event API functionality. That would provide a cleaner interface out of Elgg. The two approaches are shown below:
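To make the cache approach concrete, here is a minimal sketch of the compare step. It assumes the only state of interest is community membership and that both the cache and the current database state can be loaded as a mapping of user to communities. The names `diff_memberships`, `poll_once` and the `publish` callback are made up for illustration; they are not part of Elgg or Identity Manager.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical shapes: user name -> set of community names.
Snapshot = Dict[str, FrozenSet[str]]
# Hypothetical event: (kind, user, community), kind is "join" or "leave".
Event = Tuple[str, str, str]


def diff_memberships(cached: Snapshot, current: Snapshot) -> List[Event]:
    """Compare the cached snapshot with the current one and list the changes."""
    events: List[Event] = []
    for user in sorted(set(cached) | set(current)):
        before = cached.get(user, frozenset())
        after = current.get(user, frozenset())
        for community in sorted(after - before):
            events.append(("join", user, community))
        for community in sorted(before - after):
            events.append(("leave", user, community))
    return events


def poll_once(cached: Snapshot, current: Snapshot,
              publish: Callable[[Event], None]) -> Snapshot:
    """Publish every detected change, then return the snapshot to cache next."""
    for event in diff_memberships(cached, current):
        publish(event)
    # The cache is only replaced after every event has been handed over.
    return dict(current)
```

The fragility shows up straight away: if `publish` fails halfway through, the cache is not updated and the next poll sends the earlier events again, so whatever receives them has to cope with duplicates.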

That leads on to the next problem. What happens if the service which Elgg calls to get the Driver to talk to IDM is offline? Elgg events are fire and forget. If an event doesn’t reach the service, it’s gone forever and the new community membership will never reach the Vault. Well, that ties in nicely with the work we’re doing on Blackboard/SITS integration using SOA, but more of that later. We could use an ESB to get reliable messaging. Should there just be one driver on the subscriber channel, feeding an ESB, while we develop connectors to the less common applications we use?
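Short of a full ESB, the fire-and-forget problem can be illustrated with a store-and-forward outbox: the event is written to durable storage first and only deleted once the service has accepted it. This is a sketch under assumptions, not how Elgg or IDM actually behave. The `outbox` table, the function names and the `deliver` callback are all hypothetical, and SQLite stands in for whatever store would really be used.

```python
import json
import sqlite3
from typing import Callable


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the outbox. Use a file path for real durability."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    db.commit()
    return db


def record_event(db: sqlite3.Connection, event: dict) -> None:
    """Persist the event before any attempt is made to send it."""
    db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))
    db.commit()


def flush(db: sqlite3.Connection, deliver: Callable[[dict], None]) -> int:
    """Send pending events oldest first; stop at the first failure."""
    sent = 0
    rows = db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        try:
            deliver(json.loads(payload))
        except ConnectionError:
            break  # service is offline; keep this and later events for a retry
        db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        db.commit()
        sent += 1
    return sent
```

Calling `flush` on a timer gives at-least-once delivery: an event can arrive twice if the process dies between delivering and deleting, but it can no longer vanish because the service happened to be down.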

On the other hand, should we just install triggers on the Elgg database that inform the driver of key events, such as the membership table being updated? If it’s a publish mechanism, where the driver is passive and reacts to incoming messages, it needs reliable messaging or events will get lost. If it’s a pull mechanism, where the driver actively polls the database for changes, it needs a cache against which the new state of the tables of interest can be compared.
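One possible variation on the trigger idea, sketched here purely as an assumption, is to have the triggers write to a change-log table that the driver drains by remembering the last row id it has seen. The table and column names below are invented and do not match Elgg’s real schema, and SQLite is used only so the example runs on its own (a real Elgg install would need the equivalent triggers in its own database).

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: a membership table plus a change log filled by triggers.
SCHEMA = """
CREATE TABLE membership (username TEXT NOT NULL, community TEXT NOT NULL);
CREATE TABLE membership_changes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    kind TEXT NOT NULL,
    username TEXT NOT NULL,
    community TEXT NOT NULL
);
CREATE TRIGGER membership_added AFTER INSERT ON membership
BEGIN
    INSERT INTO membership_changes (kind, username, community)
    VALUES ('join', NEW.username, NEW.community);
END;
CREATE TRIGGER membership_removed AFTER DELETE ON membership
BEGIN
    INSERT INTO membership_changes (kind, username, community)
    VALUES ('leave', OLD.username, OLD.community);
END;
"""


def setup() -> sqlite3.Connection:
    """Create an in-memory database with the tables and triggers above."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def drain_changes(db: sqlite3.Connection,
                  last_seen_id: int) -> Tuple[List[Tuple[str, str, str]], int]:
    """Return changes logged after last_seen_id and the new high-water mark."""
    rows = db.execute(
        "SELECT id, kind, username, community FROM membership_changes "
        "WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    events = [(kind, username, community) for _, kind, username, community in rows]
    new_last_id = rows[-1][0] if rows else last_seen_id
    return events, new_last_id
```

The driver still polls, but the only state it has to keep is a single integer rather than a copy of the tables, and changes made while the driver is offline simply wait in the log.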

Complicated? Welcome to the Enterprise!
