Monday, January 8, 2018

Exchange Unified Messaging - Does Anybody Really Know What Time it is? (Your Edge Servers Really Should...)

Greetings and Happy New Year! I hope everyone's 2018 is off to a great start.

I received an interesting problem recently that I wanted to share. Below is an email from a client (I'm paraphrasing):

"Josh,

Exchange Unified Messaging (UM) is no longer working internally. People can call in from the outside, reach UM, and leave messages just fine. However, our own internal users cannot call each other and leave a voicemail. Calls internally either go to dead air or receive a busy signal.


Is this something you can help us out with?"

Of course I can help! 😃

The client has a hybrid environment with their S4B servers on-premises and their Exchange environment in the cloud. Since UM must reside in the same place as the Exchange mailboxes, their UM environment resides in the cloud as well. The customer connects to their O365 tenant via Edge federation (this will be important later 😉).

I decided to take a trace via CLS logger on the FE server of a failed internal VM call and noticed the following error:


SIP 429 Provide Referrer Identity

ms-diagnostics: 1020;reason="Identity of the referrer could not be verified with the ms-identity parameter";ErrorType="Invalid signature";Referrer="user1@contoso.com";HRESULT="0xC3E93EE0(SIP_E_CRYPT_REFERRER_DATE_SKEWED)";cause="Invalid signature";signer="skypefe2.internal.contoso.com";source="sip.contoso.com"
ms-edge-proxy-message-trust: ms-source-type=EdgeProxyGenerated;ms-ep-fqdn=skypeedgepool1.internal.contoso.com;ms-source-verified-user=verified
$$end_record

The part of the error above that helped me fix this issue is the HRESULT, SIP_E_CRYPT_REFERRER_DATE_SKEWED. I only half understood what the error was trying to tell me - the date on the Edge pool is somehow 'skewed' and incorrect. I hadn't seen this error before, so some further research was needed.
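If you find yourself digging through a lot of these traces, it can help to pull the fields out of the ms-diagnostics header programmatically. Here is a minimal Python sketch, assuming the header follows the `id;key="value";...` layout seen in the trace above. The function name `parse_ms_diagnostics` is my own and not part of any Skype for Business tooling.

```python
import re

def parse_ms_diagnostics(header: str) -> dict:
    """Split an ms-diagnostics header into its error ID and key="value" fields."""
    # Drop the header name ("ms-diagnostics") and keep everything after the first colon.
    _, _, value = header.partition(":")
    # The first token before ';' is the numeric error ID.
    error_id, _, rest = value.strip().partition(";")
    fields = {"error_id": error_id.strip()}
    # The remaining tokens look like key="value".
    for key, val in re.findall(r'(\w+)="([^"]*)"', rest):
        fields[key] = val
    return fields

header = (
    'ms-diagnostics: 1020;reason="Identity of the referrer could not be verified '
    'with the ms-identity parameter";ErrorType="Invalid signature";'
    'Referrer="user1@contoso.com";'
    'HRESULT="0xC3E93EE0(SIP_E_CRYPT_REFERRER_DATE_SKEWED)";'
    'cause="Invalid signature";signer="skypefe2.internal.contoso.com";'
    'source="sip.contoso.com"'
)

diag = parse_ms_diagnostics(header)
print(diag["error_id"], diag["HRESULT"])
```

Searching the parsed HRESULT for "DATE_SKEWED" is a quick way to spot this particular problem across many log records.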

A quick Google search pointed me to this article mentioning the exact same error the client was having. I logged onto one of the two Edge servers in the pool and noticed the same error mentioned in the article linked above:





The clock on the second Edge Server in the pool was six hours behind the current time. The customer had rebooted to install updates approximately two weeks before the error occurred and the clock never re-synced. Judging from the error in the event logs above, my guess as to why UM was not working is that for 'cryptographic verification' to occur (i.e. TLS traffic via the Edge Server to O365) the time of the request must closely match between the two systems.
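To illustrate my guess, here is a rough Python sketch of how a time-skew check might behave. This is not how Skype for Business or O365 actually implements it. The five minute tolerance (`MAX_SKEW`) and the function name are assumptions I made up for the example, since the error does not state the real limit.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance; the real limit is not stated in the error message.
MAX_SKEW = timedelta(minutes=5)

def is_within_skew(signed_at: datetime, verifier_now: datetime,
                   max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if the sender's timestamp is close enough to the verifier's clock."""
    return abs(verifier_now - signed_at) <= max_skew

# The verifier's idea of "now" versus an Edge Server running six hours behind.
now = datetime(2018, 1, 8, 12, 0, tzinfo=timezone.utc)
edge_clock = now - timedelta(hours=6)

print(is_within_skew(edge_clock, now))                    # six hours off
print(is_within_skew(now - timedelta(seconds=30), now))   # thirty seconds off
```

A server that is six hours off fails a check like this no matter how generous the tolerance is, which matches what the client was seeing.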

After adjusting the clock manually on the offending Edge Server, I had the client re-test and everything was working again. Yay!

Before I sign off, I must pay homage to the band Chicago. They were the true 'inspiration' for the title of this post 😉. 


Heck yes that is a keytar



Tuesday, December 26, 2017

Skype for Business - Hey, Where Did my Response Groups Go?!?

Now, for its next trick, Skype for Business will make Response Groups in the Control Panel magically disappear!

Seriously? You have a lot of nerve, Skype For Business. 😉

A client opened a ticket recently for help recovering his Skype for Business Response Groups. He shared a picture of the empty Control Panel:



Bye, Felicia.

I had the client run several commands to verify the Response Groups were still seen via PowerShell. 'Get-CsRGSWorkflow' and 'Get-CsRGSQueue' both returned all workflows and queues that had been created. At this point, the only other thing I could think of that may prevent him from seeing the Response Groups was his permissions.

I asked to see which RBAC groups he was in and I noticed he was in both the 'CsResponseGroupAdministrator' and 'CsResponseGroupManager' groups. The 'admin' group manages all the Response Groups for a site, while the 'manager' group allows for management of specific response groups (Technet Article).

Was it possible that somehow the 'manager' group was overriding the permissions for the 'admin' group? As it turns out, yes, that appears to have been what happened. A search via Google produced a blog post outlining the exact same problem in Lync 2013. I had the user remove his account from the 'manager' group and leave himself in the 'admin' group. A quick sign out/sign in forced the group change and he was able to see his Response Groups in the Control Panel again.


Sunday, December 10, 2017

Polycom CX/VVX Phones - Why did my Transfer Fail?

I received an email from a client this week that users at his site could not complete transfers from their Polycom CX or VVX desk phones. Transfers using the Skype for Business client worked perfectly, but any transfer via the desk phone failed. The client said when they initiated a transfer from a desk phone, it would either transfer to dead air or reply with a busy signal. This happened when the phone tried to transfer to an extension or the full 10-digit DID.

I broke this problem down into several things I wanted to examine. First off, was there something on the phone that wasn't allowing the transfer? I've found Polycom phone configuration files to be a blessing and a curse. You can edit the configuration files to do almost anything. However, searching through 60-70 pages of a configuration guide to find the one setting you need can be... challenging?

Yes, challenging. We'll go with that. 😏

As it turns out, there is a field that needs to be enabled for transferring. Feel free to copy and paste the line below into a config file (.cfg) and then upload it to a phone to test:

<CONFIG_FILES
call.BlindTransferSpecialInterop="1"
/>

The other side of this issue is in Skype for Business. The client's user dial plan allowed for international calling and I specifically configured normalization rules for multiple internal number ranges at his site. I assumed that the phone would use the same dial plan as the user. Dialing out through the phone worked, but when a user tried to transfer a call it would fail. Are the phones using a different dial plan for transfers, somehow? Spoiler Alert: Yes.

To get transferring to work (along with the configuration change above), I had to add the same normalization rules in the user dial plan to the global dial plan. Once the normalization rules were in the global dial plan in Skype for Business, transferring worked like a charm.
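To show why a missing rule matters, here is a small Python sketch of how normalization rules behave. The patterns, translations, and number ranges below are invented for illustration and are not the client's actual rules. The idea is simply that a dialed string only becomes a routable number if the dial plan in effect has a rule that matches it.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (regex pattern, translation)

# Hypothetical rules, standing in for the ones in the user dial plan.
USER_DIAL_PLAN: List[Rule] = [
    (r"^(\d{4})$", r"+1555123\1"),   # 4-digit extension -> full number
    (r"^(\d{10})$", r"+1\1"),        # 10-digit DID -> full number
]

# Before the fix, the global dial plan had no matching rules for this site.
GLOBAL_DIAL_PLAN: List[Rule] = []

def normalize(dialed: str, rules: List[Rule]) -> Optional[str]:
    """Apply the first matching rule, or return None if nothing matches."""
    for pattern, translation in rules:
        if re.fullmatch(pattern, dialed):
            return re.sub(pattern, translation, dialed)
    return None

print(normalize("4321", USER_DIAL_PLAN))     # dialing out uses the user plan
print(normalize("4321", GLOBAL_DIAL_PLAN))   # the transfer found no rule

# The fix: copy the same rules into the global dial plan.
GLOBAL_DIAL_PLAN.extend(USER_DIAL_PLAN)
print(normalize("4321", GLOBAL_DIAL_PLAN))
```

Once both plans carry the same rules, the extension normalizes the same way regardless of which plan the phone consults.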

I have a small request for this article - if you are someone who is better than me at Polycom phones, please explain if this is normal. Most phones I've worked with have transferring enabled by default. As for using one dial plan for making calls and another one for transferring - why is this a thing?

Comments and questions are always welcome.



Sunday, November 12, 2017

Why Can't I Search Contacts on my Polycom Trio 8800?

I've spent the past several weeks working with a client deploying several different models of Polycom phones. They're using Polycom Trio 8800's for their conference room phone solution. The client is in the UK, so if they have any questions or concerns, I've been getting up a little earlier in the morning to address them first thing.

Two weeks ago, I woke up to the email below (I'm paraphrasing):

"Josh,

We're unable to search any contacts from the Trio 8800's. Can you assist?

Here's what it looks like from the phone:

"

This normally just works, so I was confused. The client was able to log in to the phone OK and select the option to search contacts, but nothing showed up. They're using S4B Online, but I didn't see any Support Issues when checking the portal. What gives?

As it turns out, a lot of others were experiencing the same problem. This was a known issue with the 5.4 firmware. Upgrading to 5.5.2 (or above) resolves the issue. I had the client deploy the latest and greatest firmware to their lab (5.6) and confirmed the issue was fixed. They deployed to their production environment shortly thereafter.
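If you have a fleet of Trios to audit, a quick version comparison can flag which ones still need the upgrade. Below is a minimal Python sketch, assuming firmware versions are plain dotted numbers. The function names are hypothetical and the only fact taken from above is that 5.5.2 or later contains the fix.

```python
from typing import Tuple

def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '5.5.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))

# First firmware release that resolves the contact search issue.
FIXED_IN = parse_version("5.5.2")

def has_contact_search_fix(firmware: str) -> bool:
    """Return True if the firmware is at or above the fixed release."""
    return parse_version(firmware) >= FIXED_IN

for version in ("5.4", "5.5.2", "5.6"):
    print(version, has_contact_search_fix(version))
```

Comparing tuples of integers avoids the classic string comparison trap where "5.10" sorts before "5.9".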

Sunday, October 29, 2017

Skype for Business - Why Can't I Escalate to a Conference?

I had a customer a few weeks ago that could not escalate to a conference. They were having a P2P call with someone, tried to add a third user to the conversation and received an error each time. The problem had been going on for about a week and they asked if I could take a look and see what was going on.

I'll admit, I've heard of this issue before, but I had not seen it yet. I started parsing the Lync Event Log on the FE server and noticed a lot of the following SQL Errors:

The 'General' tab included more data, but here is the most pertinent thing we need to focus on:

The transaction log for database 'rtcxds' is full due to 'LOG_BACKUP'.


I'm not a SQL expert, but shouldn't there be a way to increase or decrease the size of the logs? Well, there is, but it turned out to be more convoluted than I had hoped.

I logged onto the BE SQL server where the rtcxds database is located. I right clicked on the database, went to 'Properties' -> 'Files' and then clicked on the '...' button for the log file. Since there was plenty of space on the disk, I tried to increase the size of the log file from 10 GB. I received the following error:

SQL Server Error 9002: Transaction Log is Full

Yeah, understood. That's why I'm trying to increase the size, so work with me here, ok? 😉

After some Googling, I found that I have to take a backup before increasing the size. However, after talking with the customer, they're taking pretty frequent backups already. So, is there a way to take a backup without really taking a backup? Yes, of course!
BACKUP LOG
dbname TO DISK = 'NUL:'; (note: just one 'L' is correct)

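Entering the above two lines into a SQL query (sans my note on line two, and substituting your database's name) within SQL Server Management Studio tells SQL Server to write the log backup to the NUL device, which simply discards it. As far as SQL is concerned, a log backup happened, so the log space can be reused and we're all good. Keep in mind that this breaks the log backup chain, so take a full or differential backup afterward if you rely on point-in-time restores.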
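For anyone scripting this, here is a small Python sketch that builds the statement for a given database name. It only produces the T-SQL text and does not connect to SQL Server. The helper names are my own, and the bracket quoting is an extra precaution that was not part of the query I ran.

```python
def quote_identifier(name: str) -> str:
    """Wrap a SQL Server identifier in brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"

def log_backup_to_nul(database: str) -> str:
    """Build a BACKUP LOG statement that writes to the NUL device (one 'L')."""
    return f"BACKUP LOG {quote_identifier(database)} TO DISK = 'NUL:';"

# The database from the error message earlier in this post.
statement = log_backup_to_nul("rtcxds")
print(statement)
```

Paste the printed statement into a query window, or hand it to whatever SQL client library you already use.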

After running the query above I was able to increase the size of the log file and the customer confirmed they could escalate to a conference again. I checked the Lync Event Logs on the FE server and noticed the following message:





Monday, October 9, 2017

Skype for Business Online - Why Aren't Agents Receiving Calls in a Call Queue?

I have been helping a client set up Skype for Business Online Call Queues. This morning, they told me they were able to call the service number associated with a queue and hear the greeting, but none of the agents in the queue received the call.

Normally, this just works. I logged into their tenant and didn't see anything suspicious in their queue or group setup. Also, all the users in a queue they were testing with had Cloud PBX licenses applied.

What's going on? 😕

I was grasping at straws, but I decided to check the 'external communications' settings on the tenant. I noticed the client was whitelisting companies to federate with, but they did not include their own domain. Should you have to allow your own domain? Well, if you're whitelisting domains on your tenant, it turns out the answer is 'yes' (you'll need to allow Microsoft as well).

In the Skype for Business Admin Panel on your tenant click 'organization' then 'external communications'. There are three options in the 'external access' dropdown - 'On only for allowed domains,' 'On except for blocked domains,' and 'off completely.' If you're whitelisting domains, the dropdown should be set to, 'On only for allowed domains.'



Under the 'blocked or allowed domains' section, click the '+' button to add both your domain name and Microsoft's domain (microsoft.com). I had the customer add their own domain (microsoft.com was already allowed) and wait approximately one hour. Afterward, they tested several of the queues they had set up and they worked. All agents in each queue received the call in their respective client.
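As a sanity check before you start troubleshooting queues, you can compare your allow list against the domains that need to be on it. Here is a minimal Python sketch of that comparison. The domain names and the function name are placeholders, and the list would have to be copied out of the admin panel by hand since the sketch does not talk to the tenant.

```python
from typing import Iterable, List

def missing_allowed_domains(allowed: Iterable[str],
                            own_domains: Iterable[str],
                            required: Iterable[str] = ("microsoft.com",)) -> List[str]:
    """Return the domains that should be on the allow list but are not."""
    allowed_set = {domain.strip().lower() for domain in allowed}
    needed = [domain.strip().lower() for domain in list(own_domains) + list(required)]
    return [domain for domain in needed if domain not in allowed_set]

# Placeholder allow list resembling the client's starting point.
allowed = ["partner.example", "microsoft.com"]
print(missing_allowed_domains(allowed, own_domains=["contoso.com"]))
```

An empty result means both your own domain and microsoft.com are already allowed.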

Check yo'self, amirite?  😁

Sunday, September 24, 2017

A Lesson About Toll Fraud via SIP Tracing

I had a Skype for Business EV customer contact me saying one of their call center agents couldn't call a dealer in Guatemala. (For reference, the customer has multiple dealers throughout Central America.) I knew the customer had international dialing capabilities through their PSTN Carrier, so not being able to call one country in Central America seemed odd to me. Why was their PSTN Carrier singling out Guatemala?

I started troubleshooting by making sure the user with the failed calls was EV enabled and also had a voice policy in Skype for Business that allowed for international calls. (Yes, I know, it seems kinda basic, but I prefer to start with the easy things and work my way up.) Upon investigation, the user was EV enabled and set up properly in Skype for Business to allow for international dialing. I asked if other international calls were working and the user said 'yes'.

The problem didn't appear to be in Skype for Business. I also had another user confirm that they couldn't dial any Guatemalan numbers either. Since some international calls were making it out, I needed to get a trace from their gateway to see what was going on.

Luckily for me, they have an AudioCodes SBC. The AudioCodes Syslog tool makes troubleshooting issues like this one much easier. I had the user having issues place a call while I ran the call trace from the AudioCodes SBC. The 'call ladder' diagram in the Syslog tool is excellent for seeing the flow of the SIP messages between Skype for Business, SBC, and Carrier. Here's the call ladder for the failed Guatemala call:



Notice the PSTN Carrier sends a SIP 403 Forbidden Message? A SIP 403 Forbidden message typically means that the user is not permitted to make the call. I've seen this in Skype for Business, for example, when someone with a long distance voice policy tries to make an international call. I didn't know why the customer was getting a SIP 403 from their PSTN Carrier, but I recommended they open a support ticket with their PSTN Carrier and provide them with the trace I took from the SBC.
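If you only have raw SIP text rather than the call ladder, the same conclusion can be reached by finding the first final response in the trace. Below is a minimal Python sketch of that idea. The trace lines, the phone number, and the carrier hostname are made up for the example, and the function name is hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches a SIP response status line such as "SIP/2.0 403 Forbidden".
STATUS_LINE = re.compile(r"^SIP/2\.0 (\d{3}) (.*)$")

def final_response(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (code, reason) for the first non-provisional response, or None."""
    for line in lines:
        match = STATUS_LINE.match(line.strip())
        # 1xx responses (100 Trying, 180 Ringing) are provisional, so skip them.
        if match and int(match.group(1)) >= 200:
            return int(match.group(1)), match.group(2)
    return None

# Invented trace: the INVITE goes out and the carrier refuses it.
trace = [
    "INVITE sip:+50200000000@carrier.example SIP/2.0",
    "SIP/2.0 100 Trying",
    "SIP/2.0 403 Forbidden",
]

print(final_response(trace))
```

Knowing which hop produced the 403 still requires the call ladder or the Via headers, which is why the full trace is worth sending to the carrier.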

The PSTN Carrier contacted the customer later that day and said due to high volumes of toll fraud from certain countries in Central America, they were blocking all calls. Wow! I have never seen/heard a carrier proactively do this before, but this does explain the behavior the customer was experiencing.

The customer is going to work with their PSTN Carrier to see what can be done to unblock calls to certain numbers (essentially, white-listing numbers they need to call). This was an interesting problem, but I'm glad the resolution was pretty straightforward. Also, I can't recommend the AC Syslog tool enough when you need to take a trace from an AudioCodes SBC.