Contribute
Register

High Sierra SMB maze and how I got out of it (TLDR).

Status
Not open for further replies.
Joined
Sep 11, 2016
Messages
9
Motherboard
GIGABYTE Z97X-UD5H F9F
CPU
i5-4690K
Graphics
EVGA GeForce GT 740
Mac
  1. iMac
  2. MacBook Pro
  3. Mac mini
Mobile Phone
  1. Android
  2. iOS
Although started as a curiosity and hobby, I have been using my desktop with High Sierra as a primary machine for over three years now. Each time, I typically patch this machine a month after it is released by Apple. And, each time I do this, I make sure to also update clover. The machine has been working like a champ all this time. And, at any given point in time, the web browser (FF and Chrome) have 50 tabs open on it, in addition to Eclipse SDK, an email client, around 20 terminal windows, and few office documents (Word, and Excel). Occasionally, it also runs Virtualbox with a guest OS (Windows 10) consuming 4GB of RAM. Bottom line, the machines gets hard use.

Last week, I had a need to watch a video on my MBP. As the files were large (over 4 GB), instead of scp, I decided why not set up a samba share and access the same from my MBP. I quickly set this up using File Sharing, shared my Download folder, and watched the video. I was pleased and thought to keep this setup going forward as I need not keep two copies of such disposable files. However, when I came back to this machine, a couple of hours later, I felt that the machine was little laggy, it was taking around 15 sec to unlock the screen. Out of nowhere, beachball would show up for a few second. Autocomplete on terminal would take longer than usual.

My first instinct was maybe a javascript in of the open tabs on browser might have gone berserk. But the activity monitor showed CPU mostly idle. Next, I thought, a reboot would fix things as the machine has been running for a long time. That did not fix the things either. That is when the chase started to figure out this elusive problem.

Google search for these symptoms resulted in mostly fix for high CPU usage. Here is the list of things I tried (while reading pages after pages on the internet):

1) One suggestion was to adjust SMB browsing behavior, that did not help either

2) Thought, let me disable SMB, things would come to normal. So, disabled SMB file share, which resulted in no improvement. This is when I started to worry a little.

3) One suggestion was hard disk space may be low, so I looked at my 500GB SDD usage, and saw 100GB free. I still went ahead and performed a major cleanup feeing up another 150GB. Cleaned up entries on desktops but this was of no use.

4) Ran fsck using disk utility, which came clean. Ruled out DNS lookups by capturing traffic on the wire.

5) Few cases were related to iCloud account with keychain sync (other entry syncs), I went ahead and disabled keychain sync. I also tried by logging off the iCloud, but still no improvement. It took over 15 sec to unlock the screen.

6) Moved UserPref file to the desktop as it did not make any difference, placed the file back.

7) Rebuilt spotlight indexes: Deleted /.SpotLight-V100 etc, and ran "mdutil -i on /; mdutil -E /". This improved matters a little. Instead of taking over 15 seconds, screen unlock started to work in 12-13 seconds.

8) Reset permission on home directory (using diskutil resetUserPermissions / `id -u`),

9) While doing the above, I noticed an opendirectoryd message on console "Authentication failed for <private> with ODErrorCredentialsInvalid" repeating often. Started to research, and found <private> could be revealed using "sudo log config --mode "private_data:eek:n". And, realized it was my username showing up.

10) Search for ODErrorCredentialsInvalid did not result in any help either

11) By this time, I was desperate for a solution. Realized, two old Kexts FakeSMC.kext and AppleIntelE1000e.kext. Upgraded both of them.

12) apfs.efi was 3 years old, so upgraded that to the latest one. But these changes resulted in no improvement, other than making me feel good about running the latest drivers.

13) In the most search result, when faced with such a problem, common theme seems to be clear up cache, delete preferences, and if all this does not work, restore from Time Machine. Although Time Machine restore is an option, I wanted to exercise as the absolute last resort. As that still requires license setup for a few things. I did not feel comfortable with deleting cache and preferences so decided against it ( I arrived at decision after observing all the files changed after when I enabled Samba, and concluded that none of those files could have resulted in this slowness).

14) While looking at logs on the console, coreduetd came up often. So, I figured out what it is, and also looked at its SQLite database. At first thought of disabling it, but decided against it as it seemed like a key process.

15) This is when I created a new user, and realized this new user is not affected by this slowness (Good, no need to perform Time Machine restore). Still was reluctant to completely switch to this new user as the current user has many scripts running, had many eclipse projects tied to this user.

16) The behavior was like as if OS stopped using buffer. So, spent another day researching about it and came to know that OSX does great job on managing buffers and it could not be configured much and more than likely my machine was not slowing does because of this.

17) To preclude any file syncing software causing this, I disabled dropbox, onedrive, and googledrive. Basically, at this time started to test things with literally no software running (sometimes in safe mode). No improvement!

18) I now started to compare screen unlock logs from MBP and this machine. I realized there were a few steps which were taking longer. And, also realized that "Authentication failed for <private> with ODErrorCredentialsInvalid" also appears on MBP.

19) I started researching how to put various system calls in debug mode. And, found out that Console has a facility for this (Action -> Include Debug Message). So simple, felt like complete idiot.

20) I went through debug logs and observed that authentication attempt by loginwindow process went to opendirectryd three times (failing twice and succeeding the third time).

21) First failure was related to trying to authenticate using kerberos. Started to research on how to disble kerberos authentication, but could not find any information on it.

22) Fiddled with all settings on "Privacy and Settings", no improvement.

23) On closer observation, I saw that group search calls to opendirectoryd were getting queued. So, started to research that.

24) This is when I found OSX has great opendirectory client (Directory Utility) in addition to cli tool dasl and odutil. Where I saw two services configured (Active Directory and LDAPV3).

25) Tried changing "Search Policy" thinking maybe a search is going somewhere and resulting in a timeout.

26) I started to capture all traffic using wireshark, and saw little to no traffic going out/in during screen unlock. So, ruled out that login window is timing out because it is trying to make a connection outside.

27) Simple LDAP searches against opendirectory were very fast.

28) On the Directory Utility, observed that user with slowness has two "AuthenticationAuthority" defined (one for kerberos, another Shadow). But the user which worked well only has one defined (Shadow). Again researched on how to disable kerberos but could not find much.

29) Around this time I ran "id" command against both users and observed the following:

picanmix4:/Users/user1/Library/Preferences> time id user1
uid=501(user1) gid=20(staff) groups=20(staff),702(com.apple.sharepoint.group.2),701(com.apple.sharepoint.group.1),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),501(access_bpf),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh),703(com.apple.sharepoint.group.3)

real 0m2.010s
user 0m0.002s
sys 0m0.002s
picanmix4:/Users/user1/Library/Preferences> time id user2
uid=506(user2) gid=20(staff) groups=20(staff),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),701(com.apple.sharepoint.group.1),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh),703(com.apple.sharepoint.group.3)

real 0m0.010s
user 0m0.002s
sys 0m0.002s

30) Closely compared two users and saw user1 had an extra group "com.apple.sharepoint.group.2". When performed nested search using dscl for user1, it was slow. Boom, I was onto something substantial.

31) "com.apple.sharepoint.group.2" was a name for "Download" group created by OSX (probably when I enabled file sharing).

32) Researched if group membership could be changed, but "Users and Group" does not have that facility. There used to be a tool in the past that allowed such changes, but Apple discontinued it. I really did not want to muck directly with the directory thinking it may have adverse affect somewhere else.

33) Finally decided to manually delete "GroupMembers" and "GroupMembership" attribute from the "Download" group. And, that was it, "id user1" came as fast as "id user2".

So, here is what it seems happened. File Sharing configuration created this "Downloads" group, which resulted in a loop, causing delay. Other then dscl, Directory Utility, which requires good knowledge of LDAP/AD, I could not find any tool that could perform health check on opendirectory (may be its time to write one).

I spent over three days constantly thinking and working on this problem and I am putting it out here so that it is useful for others. Overall, I'd suggest staying away from SMB on OSX as going by search results on google and duck duck go, it results in lots of headaches.
 
Joined
Sep 11, 2016
Messages
9
Motherboard
GIGABYTE Z97X-UD5H F9F
CPU
i5-4690K
Graphics
EVGA GeForce GT 740
Mac
  1. iMac
  2. MacBook Pro
  3. Mac mini
Mobile Phone
  1. Android
  2. iOS
1589075369102.png


Forgot to add this image. If you look closely, you'll notice GeneratedUID and NestedGroups having the same value. I believe NestedGroup points to group it is part of. In this case, it is pointing to itself resulting in loop.
 
Status
Not open for further replies.
Top