ruby soap integration from the perspective of a seasoned integrator

Fri, Nov 1, 2013

I do lots of integration. I mean, lots, so I’ve seen a lot of different systems, APIs and integration methods, and the one rule that always stands is ‘Know Thy System’. In theory that’s what you should do. In practice it’s always appended with ‘to varying degrees!’. This is almost always down to the quality of the API documentation, or the level (or lack) of system knowledge among the people who look after it. The politics of integration is another matter entirely and prolly worth a book. Maybe I’ll write one!

An integrator must also be a diplomat. You’ll see a lot of hairy stuff when you crack open the lid on a system. Lack of process, cicadas chirping where metadata nests should be. Digital brushwood tumbling past in a dusty wind of neglect. That’s almost always down to the owners of the system using it ‘just enough’ to provide the service they were contracted to provide. Fair enough but now their service contains lots of data that other service providers would like to use. And that’s where the trouble starts (assuming the politics of getting anywhere near the data have been sorted). All those elysian fields of metadata that should have been populated are lying fallow, cracked and dry in the digital desert. When you encounter these arid plains you have to remember the context in which the system is being used and couch your report in those terms.

Integration is rarely about the physical act of plumbing two systems together. It almost always results in process changes, improvements to both systems. Integration is a process rotovator. An integrator will turn up with their bag of digital spanners, hammers, roll of string and a dirty great digger and turn your ornamental data garden into a productive vegetable plot on steroids by first digging it up and chucking out most of what you thought was useful. Your metadata desert will be transformed into an orchard of Amazonian proportions!

So it was in this context that I approached my latest project, which was to scope out the integration between a Cisco TelePresence Content Server 1st generation (TCS) and a Helix Media Streaming Server. The idea being to let teaching staff search for recorded video conferences in the TCS and have them ingested by the Helix server for posterity (if they’re any good).

There are three points of interest to an integrator in the above statement:

1st generation

Alarm bells! Are we dealing with a dead system? Is it unsupported? Will there be any documentation? Is it even alive? With some ‘tinternetty research and chit-chat with the owner bods I found out two things. The first was that it started life as a Tandberg product (Cisco bought Tandberg) and is being put out to graze in 2015, i.e. that’s when it reaches end of life. The second was an API guide, which the bods sent me. As system owners they’re pretty clued up on the front end of the system, which is both good and novel. I’ve preserved the API guide here in case it disappears off ‘tinternet, as it seems to be the only copy on the planet. So, first potential problem spotted and dealt with. Let’s now move on to the second point of interest:

ingested by the Helix server

That sounds like an API, but it turns out there is no ingest API for the Helix server. I found this out after reading the published specs and endlessly hassling the account manager for information. Eventually I discovered you have to drop mp4 files, along with their corresponding metadata files, into a directory on the server. The server is some flavour of Windows and the mp4 files are half a GB a pop. That’s a lot of data to move around: extract from the TCS, shift it somewhere, then shift it into the Helix ingest directory. Samba is starting to surface. I mean, how do you mount a Windows directory if not with Samba? And Samba stinks. So this point is looking like it could affect the design of the finished system. It might be worth doing it in ASP.NET MVC4 so it could run directly on the Helix server with direct access to the ingest directory. Hmm, interesting. Possible architecture porridge bubbling away on the stove. Now for the third point:
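On the Helix side, ingest comes down to getting two files into a directory. A minimal sketch of that step, assuming the ingest directory is reachable as an ordinary path (for instance a Samba mount) and that the metadata file can be named after the video with an .xml extension. The helper name `drop_for_ingest` and that naming rule are hypothetical, since the Helix metadata format isn’t covered here. Copying under a temporary name and renaming afterwards also assumes the ingest process ignores files that don’t end in .mp4 or .xml.

```ruby
require 'fileutils'

# Hypothetical helper, not a Helix API: copy a video and its metadata file
# into the ingest directory. Each file is written under a temporary ".part"
# name and then renamed, so anything watching the directory only sees
# complete files. Naming the metadata "<basename>.xml" is an assumption.
def drop_for_ingest(mp4_path, metadata_path, ingest_dir)
  FileUtils.mkdir_p(ingest_dir)
  base = File.basename(mp4_path, '.mp4')
  copies = [
    [mp4_path, File.join(ingest_dir, "#{base}.mp4")],
    [metadata_path, File.join(ingest_dir, "#{base}.xml")]
  ]
  copies.each do |source, target|
    partial = "#{target}.part"
    FileUtils.cp(source, partial)
    # a rename within one directory is atomic on local POSIX filesystems
    File.rename(partial, target)
  end
  copies.map(&:last) # the final paths in the ingest directory
end
```

With half-gigabyte files the rename matters more than usual: a watcher that polls the directory could otherwise grab a video while it is still being copied.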

if they’re any good

Oh dear, process. I told you: integration always does something to processes. This is going to impact the architecture big time. Staff won’t be allowed to just search/extract/ingest willy-nilly. They’ll have to find something they think should be ingested, and someone else will have to agree that yes, this is indeed worthy of archiving and, while you’re at it, add some metadata. That’s a lot of asynchronicity involving Shooman Beans.
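That asynchronicity is what turns a script into an application: a nomination can sit waiting for a reviewer for days. A minimal sketch of the states involved, assuming a simple nominate, approve-or-reject, ingest flow. The `IngestRequest` class and its state names are hypothetical, and persistence (where requests wait between steps) and notifications are left out.

```ruby
# Hypothetical model of the "if they're any good" step. A member of staff
# nominates a recording; someone else approves it (adding metadata) or
# rejects it; only approved recordings may go on to be ingested.
class IngestRequest
  attr_reader :conference_id, :nominated_by, :state, :metadata

  def initialize(conference_id, nominated_by)
    @conference_id = conference_id
    @nominated_by = nominated_by
    @state = :nominated
    @metadata = {}
  end

  def approve(reviewer, metadata)
    check_review(reviewer)
    raise ArgumentError, 'approval needs some metadata' if metadata.empty?
    @metadata = metadata
    @state = :approved
  end

  def reject(reviewer)
    check_review(reviewer)
    @state = :rejected
  end

  def mark_ingested
    raise "cannot ingest a #{state} request" unless state == :approved
    @state = :ingested
  end

  private

  # A request can only be reviewed once, and not by the person who nominated it.
  def check_review(reviewer)
    raise "request already #{state}" unless state == :nominated
    raise ArgumentError, 'reviewer must not be the nominator' if reviewer == nominated_by
  end
end
```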

So three simple statements have taken us from a ground level, solid way to get at recorded video conferences, up a floor to a possible architectural solution, right up to the vapour layer of processes. This is where we have to stop and engage with the project owner.

But while they’re turning the air blue and throwing office furniture through the window, let’s take a look at a nice simple scripty implementation of the first point: extracting video conferences from the TCS, in the context of ‘Know thy system, to varying degrees’. Let’s see exactly how varying the degree of system knowledge affects how you, the integrator, interact with the system. Let’s turn the system knowledge dial down to halfway and see what happens.

We know the system has a SOAP interface and from the API guide we know what the methods are and what they return. So let’s first prepare the way by setting up our toolset, i.e. installing the gems we’ll need:

gem install savon --version '~> 2.0'
gem install httpclient
gem install nokogiri

The next thing is to find out where the SOAP is coming from. After a while I managed to track down the source of the bubbles and get the access details from the bod in charge of the TCS, so I could then work on extracting all the video conferences. The settings live in a file called vcconfig.rb, which the full script below requires. The dates are a symptom of the system knowledge dial being way too low. Read on to find out why they’re there:


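# vcconfig.rb
START_DATE = '1st Jan 2012'  # first day to extract
END_DATE = '1st Jan 2013'    # extraction stops before this day
TCS_ENDPOINT = 'http://TCS_URL/tcs/SoapServer.php'
TCS_ADMIN = 'ADMIN_USERNAME'
TCS_PASSWD = 'ADMIN_PASSWORD'
# Savon timeouts are in seconds, so these values effectively switch off the
# client-side timeouts; the TCS gives up after 30 seconds regardless.
OPEN_TIMEOUT = 300000
READ_TIMEOUT = 300000
SOAP_DEBUG_LOG = false  # set to true to log the raw SOAP traffic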

The Ruby SOAP stack works thus: Savon sends SOAP requests across ‘tinternet (via the HTTPI library, which can use httpclient as its HTTP adapter) and when the SOAP responses come back, you process them using Nokogiri, as they’re just plain ol’ XML (POX? ouch!). So this is how to set up Savon. There are a thousand different ways, depending on which HTTP library Savon uses to do the sending, but this httpclient version worked for me:


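require 'savon'
require './vcconfig'

TANDBERG_NS = 'http://www.tandberg.net/XML/Streaming/1.0'

# No WSDL here: point Savon straight at the SOAP endpoint and namespace.
client = Savon.client do
  endpoint TCS_ENDPOINT
  namespace TANDBERG_NS
  digest_auth(TCS_ADMIN, TCS_PASSWD)   # HTTP digest authentication
  convert_request_keys_to :camelcase  # :get_conferences -> GetConferences
  open_timeout OPEN_TIMEOUT
  read_timeout READ_TIMEOUT
  log SOAP_DEBUG_LOG
end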

What’s going to happen is the code will do a wildcard search of the TCS to get a list of all conferences, and the TCS will time out after 30 seconds. That’s how it works. That’s integration for you. No-one is sure how to increase the TCS SOAP server timeout (the dial is at halfway, remember) and even if they did, it would prolly be a bad idea to set it to never time out. This is the ‘varying degrees’ part severely changing how you interact with the SOAP system. You can no longer just query it. You need to break the queries up into chunks small enough for the TCS to deal with. The code could be vastly simplified if anyone knew how to control the SOAP server.
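Breaking the queries up amounts to slicing the overall date range into windows small enough for the server. A minimal sketch, assuming the server filters on an inclusive pair of Unix timestamps (which is what the TCS DateTime parameter turned out to be). `query_windows` is a hypothetical helper, and the window size is a parameter so you can try a week first and fall back to a day.

```ruby
# Split [start, finish) into consecutive [oldest, newest] pairs of Unix
# seconds, each at most `seconds` long and inclusive at both ends.
def query_windows(start, finish, seconds)
  windows = []
  oldest = start.to_i
  stop = finish.to_i
  while oldest < stop
    # the last window is trimmed so it never runs past `finish`
    windows << [oldest, [oldest + seconds, stop].min - 1]
    oldest += seconds
  end
  windows
end
```

For a year of days, `query_windows(Time.utc(2012, 1, 1), Time.utc(2013, 1, 1), 86_400)` returns 366 pairs, since 2012 was a leap year.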

The API guide mentions something called a ResultRange, a pair of ints. The first int is the page number and the second int is the number of results in that page. It didn’t work as described, and the guide didn’t explain what it was actually meant to do. When this happens, you experiment. 1,100 worked but limited the results to 100, doh! 1,200 blew up. Timed out. 2,200 produced no results at all! This was a mental map of the TCS world that had flat edges. It was going nowhere, apart from over a Monty Python style cliff in The Crimson Permanent Assurance.
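Experimenting goes faster when the probing is scripted and the outcomes written down. A minimal sketch of a probe harness: `probe_result_ranges` is hypothetical, and the block stands in for the real `client.call(:get_conferences)`, returning the number of conferences received or raising whatever error your HTTP adapter raises on a timeout.

```ruby
# Hypothetical probe harness: call the block once per [page, size] pair and
# record what happened. The block should return the number of conferences it
# received, or raise if the call failed (e.g. timed out).
def probe_result_ranges(ranges)
  ranges.map do |page, size|
    outcome =
      begin
        "#{yield(page, size)} results"
      rescue StandardError => e
        "failed: #{e.class}"
      end
    [page, size, outcome]
  end
end
```

Feeding it `[[1, 100], [1, 200], [2, 200]]` with a block that makes the real call would run the experiments above in one go and leave you with a table to show the system owner.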

So next up is the DateTime parameter, which again is a pair of ints, this time Unix timestamps in seconds. A bit of experimentation showed that the ints are in the order oldest, newest. It helps if you have a good sysadmin for the system who can give you sample data to prove (or disprove) your assumptions.
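One way to use that sample data is to work out locally which conferences a given [oldest, newest] window ought to return, then compare that with what the TCS actually sends back. A minimal sketch, assuming the sysadmin’s sample can be turned into a hash of conference ID to Unix timestamp; both helper names are hypothetical. If the server’s answer only matches with the pair one way round, that tells you the order.

```ruby
# Hypothetical helpers for checking the DateTime filter against sample data.
# sample: { conference_id => unix_timestamp }, supplied by the sysadmin.

# IDs whose timestamp falls inside [oldest, newest], inclusive at both ends.
def expected_ids(sample, oldest, newest)
  sample.select { |_id, ts| ts.between?(oldest, newest) }.keys.sort
end

# True when the IDs the server returned are exactly the ones expected.
def window_matches?(sample, oldest, newest, returned_ids)
  expected_ids(sample, oldest, newest) == returned_ids.sort
end
```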

So, armed with extra knowledge (but not enough to turn the dial up a notch) I first tried extracting the conferences a week at a time but it was clear the TCS couldn’t handle that either, so in the end I had to do it a day at a time. And this is how I did it:


# oldest and newest are the Unix timestamps bounding one day
date_time = {"int" => [oldest, newest]}
result_range = {"int" => [1, 300]}
params = {SearchExpression: '*', DateTime: date_time, ResultRange: result_range}

begin
  response = client.call(:get_conferences) do
    message params
  end
rescue Savon::SOAPFault, Savon::HTTPError => e
  puts "oops! #{e.message}"
  exit 1
end

doc = Nokogiri::XML(response.to_xml)

Using Savon to call the SOAP method GetConferences and parsing the response with Nokogiri.
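The XPath queries need the `ns1` prefix bound to the Tandberg namespace, because the response elements live in that namespace. To try the same queries without a live TCS, here’s a sketch using REXML from Ruby’s standard library (swapped in for Nokogiri so it runs without extra gems) against a hand-written, hypothetical response fragment. The element names come from the full script below and the unit conversions match it (Unix seconds for DateTime, milliseconds for Duration); the real response carries more fields, and its exact shape is an assumption.

```ruby
require 'rexml/document'

TANDBERG_NS = 'http://www.tandberg.net/XML/Streaming/1.0'

# Hypothetical, trimmed-down response body for illustration only.
SAMPLE_RESPONSE = <<~XML
  <ns1:GetConferencesResponse xmlns:ns1="http://www.tandberg.net/XML/Streaming/1.0">
    <ns1:GetConferencesResult>
      <ns1:Conference>
        <ns1:ConferenceID>example-id</ns1:ConferenceID>
        <ns1:DateTime>1325376000</ns1:DateTime>
        <ns1:Duration>5400000</ns1:Duration>
      </ns1:Conference>
    </ns1:GetConferencesResult>
  </ns1:GetConferencesResponse>
XML

# Pull out each conference's ID, start time (Unix seconds, shown in UTC)
# and duration (milliseconds, converted to whole minutes).
def conferences(xml)
  doc = REXML::Document.new(xml)
  ns = { 'ns1' => TANDBERG_NS }
  REXML::XPath.match(doc, '//ns1:GetConferencesResult/ns1:Conference', ns).map do |conf|
    field = ->(name) { REXML::XPath.first(conf, "ns1:#{name}", ns).text }
    {
      id: field.call('ConferenceID'),
      date: Time.at(field.call('DateTime').to_i).utc.strftime('%a %d/%m/%Y %H:%M:%S'),
      minutes: field.call('Duration').to_i / 1000 / 60
    }
  end
end
```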

The rest is just cruft. Working your way through the SOAP responses and saving them off to disk. That way you’re completely insulated from the system and can produce reports for the project owner in whatever format they’re demanding that day.
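Once the conferences are on disk, reporting never touches the TCS at all. A minimal sketch of one such report, assuming files laid out the way the full script below writes them (one `xml/<ConferenceID>.xml` per conference, `ns1:` prefixes stripped, carrying the `date` and `duration` attributes the script adds). `conference_report` is hypothetical and uses REXML from the standard library.

```ruby
require 'rexml/document'

# Read every saved conference file in `dir` and return one tab-separated
# line per conference: ID, date and duration in minutes.
def conference_report(dir = 'xml')
  Dir.glob(File.join(dir, '*.xml')).sort.map do |path|
    conf = REXML::Document.new(File.read(path)).root
    id = conf.elements['ConferenceID']&.text
    [id, conf.attributes['date'], conf.attributes['duration']].join("\t")
  end
end
```

`puts conference_report` prints the lot, which pastes straight into a spreadsheet for whichever format the project owner wants that day.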

Here’s the full script for extracting video conferences, a day at a time, parsing the SOAP responses and saving them off to disk:


# gem install savon --version '~> 2.0'
# gem install httpclient
# gem install nokogiri

# http://apidock.com/ruby/DateTime/strftime

# <GetConferences xmlns="http://www.tandberg.net/XML/Streaming/1.0">
#   <SearchExpression>*</SearchExpression>
#   <ResultRange>
#     <int>1</int>
#     <int>300</int>
#   </ResultRange>
#   <DateTime>
#     <int>oldest</int>
#     <int>newest</int>
#   </DateTime>
# </GetConferences>

require 'date'
require 'savon'
require 'httpclient'
require 'nokogiri'
require './vcconfig'

TANDBERG_NS = 'http://www.tandberg.net/XML/Streaming/1.0'

client = Savon.client do
  endpoint TCS_ENDPOINT
  namespace TANDBERG_NS
  digest_auth(TCS_ADMIN, TCS_PASSWD)
  convert_request_keys_to :camelcase
  open_timeout OPEN_TIMEOUT
  read_timeout READ_TIMEOUT
  log SOAP_DEBUG_LOG
end

start_of_day = DateTime.parse(START_DATE)
end_date = DateTime.parse(END_DATE)

# the saved conferences go in ./xml, so make sure it exists
Dir.mkdir('xml') unless Dir.exist?('xml')

# Query one day at a time; the TCS couldn't cope with a week at a time.
while start_of_day < end_date
  oldest = start_of_day.to_time.to_i
  newest = oldest + 86_399 # last second of the same day

  # ##################################
  date_time = {"int" => [oldest, newest]}
  result_range = {"int" => [1, 300]}
  params = {SearchExpression: '*', DateTime: date_time, ResultRange: result_range}

  begin
    response = client.call(:get_conferences) do
      message params
    end
  rescue Savon::SOAPFault, Savon::HTTPError => e
    puts "oops! #{e.message}"
    exit 1
  end
  # ##################################

  puts start_of_day.strftime('%d/%m/%Y')

  doc = Nokogiri::XML(response.to_xml)
  vc_conferences = doc.xpath("//ns1:GetConferencesResult/ns1:Conference", "ns1" => TANDBERG_NS)
  vc_conferences.each do |vc_conference|
    conference_id = vc_conference.xpath("ns1:ConferenceID", "ns1" => TANDBERG_NS).text

    puts conference_id

    # DateTime and UpdateTime are Unix timestamps; add readable copies as attributes
    conference_date = DateTime.strptime(vc_conference.xpath("ns1:DateTime", "ns1" => TANDBERG_NS).text, '%s')
    vc_conference['date'] = conference_date.strftime('%a %d/%m/%Y %H:%M:%S')

    conference_update_date = DateTime.strptime(vc_conference.xpath("ns1:UpdateTime", "ns1" => TANDBERG_NS).text, '%s')
    vc_conference['updateDate'] = conference_update_date.strftime('%a %d/%m/%Y %H:%M:%S')

    # Duration is in milliseconds; store it in whole minutes
    conference_duration = vc_conference.xpath("ns1:Duration", "ns1" => TANDBERG_NS).text.to_i
    vc_conference['duration'] = conference_duration / 1000 / 60

    # drop the ns1: prefixes and save one file per conference
    File.write("xml/#{conference_id}.xml", vc_conference.to_s.gsub(/ns1:/, ''))
  end

  start_of_day += 1
end

The world of the integrator is never dull. It’s never the same either. Some systems have the Harrods of documentation, while others are abandoned corner shops with their windows smashed. In both cases, though, the integrator will bring them together in digital harmony.

Until someone wants a new process, that is.

njoy!
