SMB session issues

The technical support forum for Firestreamer (the virtual tape library).
Locked
Josh
Posts: 4
Joined: 15 Dec 2011, 00:11

Post by Josh »

Hi,

We have been using Firestreamer for close to a year now. For much of that time I have encountered backup failures, with the system indicating "The specified network name is no longer available" in the Windows event log. We were using two D-Link DNS-323 NAS devices with 2x2TB Western Digital Caviar Green drives each. We decided that there was a problem with the NASes and switched to a Drobo FS NAS with 2x2TB Western Digital RE4 enterprise drives. The backup failures began occurring again after several weeks. After extensive troubleshooting with Drobo Support, we were unable to identify any faults with the Drobo. Even after receiving a brand new Drobo FS, the problem still occurred.

Recently I opened a case with Microsoft Support to get assistance with this issue. Their engineers determined from a network trace that SMB sessions were being kept open indefinitely and eventually the NAS was unable to accept any more SMB connections.

I was provided this KB for reference: http://support.microsoft.com/kb/937082

I have been able to have successful backups by reducing the amount of disk space consumed on the NAS (by lowering tape retention and deleting tapes), but this workaround is not a permanent solution because it drastically cuts the space we can use for backups. Note that at no time did any NAS become full.

Given that:

a) the problem still occurs after replacing the NAS, twice,
b) the problem still occurs with different hard drives,
c) Drobo support found no fault with the Drobo, and
d) Microsoft found no fault with the server,

Microsoft has pointed to Firestreamer as the possible cause of this. They believe that it may be erroneously keeping SMB sessions open and causing the connection to the NAS to be dropped.

Your assistance is greatly appreciated.

Thanks.
Josh
Posts: 4
Joined: 15 Dec 2011, 00:11

Post by Josh »

I don't see an Edit button, so here's the rest of my setup:

- Windows Server 2008 R2 Standard
- Microsoft Data Protection Manager 2010
- Drobo FS with 2x2TB Western Digital RE4 enterprise hard drives
- 150 virtual tapes (file media), each 12 GB, totaling 1800 GB (I reduced this to 120 tapes to troubleshoot)
jsf
Cristalink Support
Posts: 300
Joined: 29 Aug 2010, 09:03

Post by jsf »

Firestreamer is not aware of SMB at all. All it does is call ZwOpenFile to open a virtual tape file, then ZwClose to close it. Both are standard Windows API functions. It's the responsibility of Windows to maintain SMB sessions as needed.

A virtual tape file is open only when the tape is in a tape drive, not when it’s in a storage slot. It's easy to verify whether the file is open by trying to delete it in Windows Explorer. If you can delete the file, it's closed. If you cannot delete the file because of a "sharing violation", it's open.
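
To illustrate the slot/drive behaviour described above, here is a toy Python model. It is not Firestreamer's code (the product works through ZwOpenFile and ZwClose); the class name ToyTapeLibrary and its methods are made up for illustration, and the sketch simply assumes that loading a tape opens its file and unloading closes it.

```python
import os
import tempfile


class ToyTapeLibrary:
    """Toy model: a tape's file is held open only while the tape is in a drive."""

    def __init__(self, tape_paths):
        self.slots = set(tape_paths)  # tapes at rest; no file handle is held
        self.drives = {}              # tape path -> open file object

    def load(self, tape_path):
        """Move a tape from a storage slot into a drive (opens the file)."""
        self.slots.remove(tape_path)
        self.drives[tape_path] = open(tape_path, "r+b")

    def unload(self, tape_path):
        """Move a tape back to a storage slot (closes the file)."""
        self.drives.pop(tape_path).close()
        self.slots.add(tape_path)

    def open_tapes(self):
        """Return the tapes whose files are currently open."""
        return sorted(self.drives)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        tape = os.path.join(folder, "tape001.bin")  # hypothetical file name
        with open(tape, "wb") as handle:
            handle.write(b"\0" * 16)

        library = ToyTapeLibrary([tape])
        library.load(tape)
        print("in drive, open files:", library.open_tapes())
        library.unload(tape)
        print("in slot, open files:", library.open_tapes())
```

In this model a tape left in a drive keeps its file open for as long as it stays there, which matches the point made in the next paragraph.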

If a tape remains in a tape drive "indefinitely", the corresponding file will be kept open indefinitely, and the underlying SMB session should also be open indefinitely. It should not be a problem by itself, because both Windows and the NAS device must do whatever is needed for a user to access the file for however long is needed.

If Microsoft Support claims that "SMB sessions were being kept open indefinitely and eventually the NAS was unable to accept any more SMB connections", then the fault lies with either Windows or the NAS. Firestreamer merely calls ZwClose; the file is then closed from Firestreamer's standpoint, and Firestreamer can do nothing more about it. It's the responsibility of Windows and the NAS to gracefully close the SMB session. If Microsoft Support went to the lengths of tracing SMB sessions, they could take one more small step and determine why the session is not closed when the corresponding file is closed.

The fact that the problem happens with multiple NAS devices of both the same and different brands only means that the bug is likely to be in Samba (the Linux implementation of SMB), which is used by most non-Windows NAS devices. We have encountered bugs in Samba before (for example, https://bugzilla.samba.org/show_bug.cgi?id=5315), and I’m sure there are more to be discovered.

Josh wrote: "I was provided this KB for reference: http://support.microsoft.com/kb/937082"

This KB is simply irrelevant, because it applies only to (a) shares hosted by Windows (not the Linux-based D-Link or Drobo), and (b) Vista (or Windows 7), where there's an artificial limit of 10 concurrent connections; there's no such limit in Windows Server.

We are willing to cooperate with Microsoft Support to find the root cause of the issue. You can pass our support email address on to them.
Best regards,
John Smith
Cristalink Support
jsf
Cristalink Support
Posts: 300
Joined: 29 Aug 2010, 09:03

Post by jsf »

You can try the following test:
  1. Reboot the server and the NAS to ensure a clean state of both.
  2. Manually perform several backups in Microsoft DPM, which should cause the virtual tapes to be moved between storage slots and tape drives.
  3. Check the current state of the library to make sure there are no tapes left in the tape drives.
  4. Delete the tape files from Windows Explorer. If you can delete all the files, it means that Firestreamer closed all of them. If there are any SMB sessions left open at this stage, it's not Firestreamer's fault.
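
If there are many tape files, step 4 could be scripted. The following is only a rough Python sketch: the function names are made up, and it assumes, as described earlier in this thread, that a file still held open fails to delete with a Windows sharing violation (Win32 error 32). Like step 4 itself, it really deletes the files, so run it only against tapes you intend to discard.

```python
import os

ERROR_SHARING_VIOLATION = 32  # Win32 code: file is being used by another process


def classify_delete(path, delete=os.remove):
    """Try to delete one file and report what the outcome implies."""
    try:
        delete(path)
    except OSError as exc:
        # winerror is only populated on Windows; elsewhere it is absent.
        if getattr(exc, "winerror", None) == ERROR_SHARING_VIOLATION:
            return "open"   # something still holds the file open
        return "error"      # other failure: permissions, network, missing file
    return "closed"         # deleted, so nothing was blocking the delete


def check_tape_files(paths, delete=os.remove):
    """Apply the delete test to each tape file and return {path: status}."""
    return {path: classify_delete(path, delete) for path in paths}
```

A result of "closed" for every tape corresponds to the case where all the files could be deleted; "error" means the test was inconclusive for that file and it should be checked by hand.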
Best regards,
John Smith
Cristalink Support
Josh
Posts: 4
Joined: 15 Dec 2011, 00:11

Post by Josh »

Thanks for the detailed response. I will relay this information to Microsoft and go from there.
jsf
Cristalink Support
Posts: 300
Joined: 29 Aug 2010, 09:03

Post by jsf »

Hi Josh, do you have an update? Thank you.
Best regards,
John Smith
Cristalink Support
Josh
Posts: 4
Joined: 15 Dec 2011, 00:11

Post by Josh »

Hi jsf,

Due to a lack of time to troubleshoot this further, I ended up expanding the available storage on the NAS. Failures are rare now.