Having deployed a Lync pool alongside our OCS 2007 R2 pool, merged the topologies, and migrated the users all without much issue, I then wanted to migrate calling away from the OCS mediation servers to the Lync mediation servers. Sounded easy. Unfortunately it wasn’t. Outbound calling worked ok, but inbound calling failed.
As you can see in the diagram, the topology is fairly straightforward. There are two PSTN gateways on 10.10.10.1 and 10.10.20.1, listening on TCP:5066 and 5060 respectively – one talking (hopefully) to a Lync mediation server (10.10.10.4) on TCP:5066 and the other still talking to an old OCS 2007 R2 mediation server (10.10.20.4) on TCP:5060. For now, forget the 10.10.20.x devices.
The steps I followed to get to this point were essentially those outlined in this TechNet library article (and the equivalent downloadable Word version): http://technet.microsoft.com/en-us/library/gg398092.aspx. In short: define a PSTN gateway in the Lync topology, set listening ports on the PSTN and Mediation objects, define and commit a voice route, publish the topology, done.
What isn’t well documented here is that if you change the mediation server’s listening ports after the initial publish (which I did, mostly to bring the port allocations in line with the new Lync defaults, as the old OCS deployment was using weird ports), you have to restart the Lync Mediation service. Otherwise it won’t pick up your port changes, even if you re-run the deployment tool.
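A quick way to check whether the service is actually listening on the new port after a restart is to attempt a plain TCP connection to it. The following is a minimal sketch, not part of the original troubleshooting; the function name `is_port_open` and the 2-second timeout are illustrative choices.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves that something accepts TCP connections on the port;
    it does not prove that the listener speaks SIP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

Run from a machine on the same network, `is_port_open("10.10.10.4", 5066)` would be expected to return True once the Mediation service has picked up the port change, and False if it is still bound to the old port.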
So at this point, outbound calls were working. No issue. Happy admin.
But incoming calls from the PSTN would fail with “VoIP Transport Failure” on the Dialogic.
What I had already done on the Dialogic was change the routing table properties, specifically the VoIP Host Group settings, to use the new mediation server (the old OCS R2 mediation server was on 10.10.10.3). This is done in the Dialogic UI under Configuration > Routing Table > VoIP Host Groups (radio button at the top). My config here is pretty straightforward – a single host group, with a single host entry for the old mediation server. I removed the old host entry, added a new one for 10.10.10.4, and expected everything to just work. It didn’t.
When calling in, the line would never ring. The call would appear in the Dialogic Call Log, but after about 15 seconds it would fail, giving the caller a ‘number not connected’ tone, and the Call Log would display “VoIP: Transport Failure”.
Spun up two sets of logging – first I started the Trace Logger on the Dialogic, and second started a set of Lync logs on the mediation server (S4, SIPStack, MediationServer) – then tried another inbound call. Logging on the mediation server showed zero entries, so clearly nothing was getting past the Dialogic. Looking at the trace logs on the Dialogic, I found the following:
[RouteTable] Code outbound device: VOIP (user@host:port) @10.10.10.4:0
So my incoming call was being routed to the mediation server IP ok, but no port was being passed, so logically I would expect the mediation server to reject it at the network layer, as it didn’t match a listening port.
So, on a hunch, I went back to the VoIP Host Groups setting, and changed the host value from 10.10.10.4 to 10.10.10.4:5066. Instant success. Just for consistency, checked another trace log, and sure enough it looks like this:
[RouteTable] Code outbound device: VOIP (user@host:port) @10.10.10.4:5066
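If you have a long trace to wade through, the check above can be scripted. This is a rough sketch that assumes every relevant line ends with `@<IPv4 address>:<port>` exactly as in the two trace lines shown here; the function names are hypothetical and the Dialogic may well emit other formats that this pattern does not cover.

```python
import re
from typing import Optional, Tuple

# Matches a trailing "@a.b.c.d:port", as seen in the [RouteTable] lines above.
ROUTE_TARGET = re.compile(r"@(?P<host>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)\s*$")


def parse_route_target(line: str) -> Optional[Tuple[str, int]]:
    """Extract (host, port) from a RouteTable trace line, or None if absent."""
    match = ROUTE_TARGET.search(line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def has_missing_port(line: str) -> bool:
    """True when the line names a route target whose port is 0."""
    target = parse_route_target(line)
    return target is not None and target[1] == 0
```

Fed the first trace line, `has_missing_port` flags it; fed the second, it does not.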
What made this unexpected was that the old host value didn’t include the port, just the IP, but for some reason it worked fine.
Usefully, at this point I’d only migrated one of our two Dialogics/Mediation Servers, so I ran the same trace log on the unmodified Dialogic (10.10.20.1). Sure enough, the same thing happens – no port is passed in that RouteTable call. Yet it works. Keen to see more, I ran “netstat -an 1 | findstr 10.10.20.1” on the OCS mediation server to start an endless netstat search for open connections to the Dialogic (refreshing every second). As soon as the call comes in, a connection opens between the two devices on 10.10.20.1:5060 and the call connects. So somewhere in between, that routing call to 10.10.20.1:0 becomes a call to 10.10.20.1:5060. I confess, I don’t know how this manages to happen.
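One possible explanation, offered as an assumption rather than anything the traces confirm: RFC 3261 makes 5060 the default port for SIP over TCP when no port is given, and the old OCS mediation server happened to be listening on exactly that port. If the Dialogic falls back to the SIP default for a host entry without a port, the OCS side would work by coincidence while the Lync side on 5066 would not. The sketch below only illustrates that fallback logic for IPv4 entries; it is not the Dialogic's actual code, and `effective_sip_target` is a made-up name.

```python
DEFAULT_SIP_PORT = 5060  # RFC 3261 default for SIP over UDP/TCP


def effective_sip_target(host_entry: str, default_port: int = DEFAULT_SIP_PORT):
    """Resolve an IPv4 'host' or 'host:port' entry to a (host, port) pair.

    A missing port, or a port of 0, falls back to default_port. This is
    an assumed behaviour used for illustration only.
    """
    host, sep, port = host_entry.partition(":")
    if sep and port.isdigit() and int(port) != 0:
        return host, int(port)
    return host, default_port
```

Under that assumption, a bare entry for the OCS mediation server resolves to port 5060, which it was listening on, while a bare `10.10.10.4` also resolves to 5060, which the Lync mediation server was not. Only the explicit `10.10.10.4:5066` reaches it.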
Hopefully this helps someone else avoid the frustration this has caused me. And if anyone has any ideas how OCS manages this port wizardry where Lync fails, I’d love to hear your thoughts.