Tuesday, March 27, 2012

How To: Bulk Delete Files in a Large SharePoint Document Library using PowerShell

A client recently had a case where a migration of tens of thousands of documents into a document library failed, and they wanted to delete everything and start over.  It turns out there is no easy ‘Delete Everything’ button or API call in SharePoint.  Instead, I came up with this gem to fairly quickly take over any ‘orphan’ files that were never checked in, delete all items in batches of 1000, and then delete all folders:

param($url,$libraryName)

$w = get-spweb $url
$l = $w.Lists[$libraryName]
$format = "<Method><SetList Scope=`"Request`">$($l.ID)</SetList><SetVar Name=`"ID`">{0}</SetVar><SetVar Name=`"Cmd`">Delete</SetVar><SetVar Name=`"owsfileref`">{1}</SetVar></Method>"

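# Helper: deletes every file directly inside $folder, checking in any checked-out file first.
# It is defined here but not called by the rest of the script.
function DeleteAllFiles($folder)
{
    $count = $folder.Files.Count
    if($count -gt 0){
        Write-Host "Deleting $count files..."
        # Walk backwards so that deleting a file does not shift the indexes still to be visited
        for($i = $count - 1; $i -gt -1; $i--){
            $f = $folder.Files[$i]
            if($f.CheckOutStatus -ne [Microsoft.SharePoint.SPFile+SPCheckOutStatus]::None){
                $f.CheckIn("Checkin by admin")
            }
            $f.Delete()
        }
    }
}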

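function BuildBatchDeleteCommand($items)
{
    $sb = new-object System.Text.StringBuilder
    # Append/AppendFormat return the StringBuilder itself; discard it so the function returns only the final string
    [void]$sb.Append("<?xml version=`"1.0`" encoding=`"UTF-8`"?><Batch>")
    $items | %{
        $item = $_
        # Escape the URL so that characters such as & do not break the batch XML
        $fileRef = [System.Security.SecurityElement]::Escape($item.File.ServerRelativeUrl)
        [void]$sb.AppendFormat($format,$item.ID.ToString(),$fileRef)
    }
    [void]$sb.Append("</Batch>")
    return $sb.ToString()
}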


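# Take over files that were uploaded but never checked in, so they can be deleted
$checkedOutCount = $l.CheckedOutFiles.Count
Write-Host "Taking over $checkedOutCount items that have never been checked in."
for($i = $checkedOutCount - 1; $i -gt -1; $i--){
    $f = $l.CheckedOutFiles[$i]
    $f.TakeOverCheckOut()
    Write-Host $f.Url
}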

 

Write-Host "Deleting $($l.Items.Count) items"
while($l.Items.Count -gt 0){
    $q = new-object "Microsoft.SharePoint.SPQuery"
    $q.ViewFields="<FieldRef Name=`"ID`" />"
    $q.ViewAttributes = "Scope=`"Recursive`""
    $q.RowLimit=1000

    $items = $l.GetItems($q)
    $cmd = BuildBatchDeleteCommand($items)
    Write-Host "Deleting $($items.Count) items..."
    $result = $w.ProcessBatchData($cmd)
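    # Stop instead of looping forever if SharePoint reports a failure for this batch
    if ($result.Contains("ErrorText")){ Write-Host "Batch delete reported an error: $result"; break }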
    Write-Host "Deleted. $($l.Items.Count) items left..."
}

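Write-Host "Deleting $($l.Folders.Count) folders..."
# Sort the URLs descending so that subfolders are deleted before their parents
$l.Folders | %{$_.Url.ToString()} | sort -descending | %{
    $folder = $_
    $f = $w.GetFolder($folder)
    Write-Host "$folder ($($f.Files.Count) files left)"
    $f.Delete()
}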
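
To see what BuildBatchDeleteCommand actually hands to ProcessBatchData, here is a small sketch that builds the same kind of batch XML in plain PowerShell, with no SharePoint required.  The list GUID, the item IDs, and the file URLs are made-up placeholders, and the URL escaping is my own precaution rather than something the batch API is documented to need:

```powershell
# Placeholder list ID and items; in the real script these come from $l.ID and $l.GetItems($q)
$listId = [guid]'00000000-0000-0000-0000-000000000001'
$format = "<Method><SetList Scope=`"Request`">$listId</SetList><SetVar Name=`"ID`">{0}</SetVar><SetVar Name=`"Cmd`">Delete</SetVar><SetVar Name=`"owsfileref`">{1}</SetVar></Method>"

$fakeItems = @(
    @{ ID = 1; Url = '/sites/demo/Docs/a.docx' },
    @{ ID = 2; Url = '/sites/demo/Docs/R&D/b.docx' }
)

$sb = New-Object System.Text.StringBuilder
[void]$sb.Append('<?xml version="1.0" encoding="UTF-8"?><Batch>')
foreach ($item in $fakeItems) {
    # Escape & and friends so the result stays well-formed XML
    $safeUrl = [System.Security.SecurityElement]::Escape($item.Url)
    [void]$sb.AppendFormat($format, $item.ID, $safeUrl)
}
[void]$sb.Append('</Batch>')
$batchXml = $sb.ToString()

# Parsing the string is a cheap well-formedness check before sending it anywhere
$parsed = [xml]$batchXml
Write-Host "Batch contains $($parsed.Batch.Method.Count) delete commands"
```

Each Method element is one delete, so a 1000-item batch is just 1000 of these in a row.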

Monday, March 5, 2012

10 SharePoint Devilish Details

I have a love-hate relationship with SharePoint.  There is so much that it can do, and so many ways it can help an organization solve collaboration problems, and I have yet to see a better alternative. No, the company currently advertising as a SharePoint alternative is not really a replacement for everything SharePoint does.  But on many occasions, I've found that with SharePoint "The devil is in the details".  There are some things that SharePoint is so awfully stupid at that I want to throw it out the window some days.  This list is presented for two reasons. First, I have a desperate, irrational hope that Microsoft will fix some of these issues.  Second, to educate SharePoint users on where to expect some pain points.  As with any system, sales is guilty of over-selling SharePoint's capabilities, and underplaying some of these 'devil in the details' problems.  With that out of the way, here are 10 annoying problems with SharePoint:

One: InfoPath browser forms do not always open in the browser

You work hard and get a nice InfoPath browser form going.  It looks great in IE and Chrome, and the customer is happy.  Then they get a link in a task email. Or click a link in workflow history.  Or link to the form from navigation.  In all cases, despite your best efforts and various settings that sound like they might work, the form opens in InfoPath Filler for IE users that have InfoPath installed.  Microsoft will tell you this is by design- even the wording hints at it: Browser Compatible Forms.  As in: 'This form can run in the browser if it has to'.  As in: 'This form won't break in the browser, but we still think people will LOVE having a fat client pop up to fill out or view a form.'  As in fix this junk:  Users expect browser forms to open in the browser always for all cases, regardless of where the link is from.

Two: Workflow Status link in a list view on a page results in error

I blogged about this one already, but imagine you have a list with a workflow.  You add a view of that list to a page.  Clicking the workflow status column results in an exception because of a missing item in the link querystring.  Dumber than a bag of hammers.

Three: Everything on the Software Limits and Boundaries Page

Kudos to Microsoft for letting it all hang out.  At least the limits of SharePoint are all in one convenient place.  I don't see other systems doing that.  That said, when presented to end-users, these violate a core principle of UX design: Don't Make Me Think.  Users create lists, get them working well with a little workflow, then WHAM get hit with errors because their library is over the huge limit of 5000 items (up from 2000!).  Or their content database is over 200 GB, or their site collection over 100 GB.  To be fair, until we get holographic storage arrays and quantum computers, any system has its limits.  It's just that a lot of the problems here could be solved without worrying the user.  The 5000 item list view limit especially needs a better "Don't Make Me Think" solution.
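
For admins who hit the list view limit from script, the usual workaround is to read the list in pages rather than all at once.  A rough sketch, assuming the SharePoint 2010 Management Shell; $url and $listName are placeholders, and the page size of 2000 is an arbitrary choice below the threshold:

```powershell
$web  = Get-SPWeb $url
$list = $web.Lists[$listName]

$query = New-Object Microsoft.SharePoint.SPQuery
$query.ViewAttributes = 'Scope="Recursive"'
$query.ViewFields     = '<FieldRef Name="ID" /><FieldRef Name="FileLeafRef" />'
$query.RowLimit       = 2000   # page size, kept below the 5000 item threshold

$total = 0
do {
    $page = $list.GetItems($query)
    $total += $page.Count
    # ... work with the items in $page here ...

    # Carry the paging cursor forward; it is $null once the last page has been read
    $query.ListItemCollectionPosition = $page.ListItemCollectionPosition
} while ($query.ListItemCollectionPosition -ne $null)

Write-Host "Visited $total items"
$web.Dispose()
```

That works, but it is exactly the kind of thing an end user should never have to know about.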

I'd go so far as to even question the boundaries placed by site collections vs sites and subsites.  Every customer I've dealt with expects the ability to easily aggregate things like task lists and calendars across their entire farm, or manage permission groups that cross site collections.  I understand some of the technical reasons these boundaries exist, but the end result is that they force novice users to make choices they are not prepared to make without significant upfront understanding of the boundaries and trade-offs, and some difficult-to-make predictions of future growth. "Read and understand the entire TechNet planning section and formulate a governance plan before you install the bits" is a hard sell.

Four: Common Form Scenarios Require Custom Workflow and Item-level permissions

Imagine this scenario: You have a paper form.  When you put it on a person's desk, you are not allowed to see any other forms but yours.  That person may reject it or get you to fill out a missing section, but otherwise you cannot sneak in and thumb through other users' forms, nor can you change your form after you have submitted it.  I just described almost every paper form process ever. Now, try doing that in SharePoint.  I'll spare you: you need to write a custom workflow or event receiver to make that happen by setting item level permissions, or moving items to secured folders.  But don't run into the software limits and boundaries on item level permissions. And be sure to think through the security context of the event receiver or workflow. By default, SharePoint Form Libraries allow anybody who can submit an item to also see all other items that have been submitted.  Add and View permissions are tied together at a fairly low level. This topic can get involved: imagine a form where managers can only see their employees' forms.  But in the end, users expect to very easily set up a form library that mimics security in real world paper processes.  SharePoint does not support this without custom code.
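
To give a feel for the item level permission step, here is a rough PowerShell sketch of what an event receiver or workflow would have to do for each submitted form.  It is an illustration only: $url, $libraryName, and $itemId are placeholders, it gives the submitter read-only access and nothing else, and a real solution would also have to grant access to whoever reviews the form and run under an account allowed to change permissions:

```powershell
$web  = Get-SPWeb $url
$list = $web.Lists[$libraryName]
$item = $list.GetItemById($itemId)

# Resolve the submitter from the built-in "Created By" (Author) field
$author = New-Object Microsoft.SharePoint.SPFieldUserValue($web, [string]$item["Author"])

# Stop inheriting from the library; $false means do not copy the inherited permissions
$item.BreakRoleInheritance($false)

# Give the submitter read-only access to their own form
$assignment = New-Object Microsoft.SharePoint.SPRoleAssignment($author.User)
$reader = $web.RoleDefinitions.GetByType([Microsoft.SharePoint.SPRoleType]::Reader)
$assignment.RoleDefinitionBindings.Add($reader)
$item.RoleAssignments.Add($assignment)

$web.Dispose()
```

Multiply that by every item in the library and you can see how the limits on item level permissions come into play.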

Five: Disabling the Office Web Apps feature does not disable Office Web Apps, with bonus Licensing and Uninstall Fiasco

Office Web Apps lets users display and edit Office docs in the browser, even if they don't have Office installed on their machine.  Sounds cool, except the license states that only licensed Office users can use it.  So, if you _have_ a license for Office, you can use the browser apps on your other non-Office-having machine.  I guess if you have a work PC with Office, and a home Mac without, that would be mildly useful.  If it were me, though, I'd get Office at home.  Personally, I've found most users to be happy opening Word documents in full Word client.  To each their own.  No matter: I'll just uninstall it.  Except Microsoft's own guidance states that uninstalling Office Web Apps will remove the server from the farm.  Because clearly, if you don't want Office Web Apps, you also feel like rebuilding the farm.  Sigh.  Never mind- I'll just disable the feature on every site collection.  Ok, but then if users don't have Word installed, or if they use a browser that doesn't tell the server the client has Word installed (i.e. isn't IE), then Office Web Apps will _still_ open the document in the browser. There is a hack for this one, but in the end, users expect that disabling a feature really _disables the feature_.

Six: Managed Metadata Doesn't Work Everywhere

One of the much-talked-about new features of 2010 is Managed Metadata.  On the surface, this is a welcome addition.  It adds the ability to create taxonomies and folksonomies that cross site collections.  However, this functionality is not supported in every case.  Specifically, it is not supported in Content Organizer, Sandbox solutions, client object model, or InfoPath Browser forms.  Various levels of hackery exist to work with each of these situations, but we shouldn't have to hack.  What's the point of a killer new feature if it's half-baked?

Seven: Cryptic errors and No ULS Log Viewer

Correction: Dozens of 3rd party ULS Log Viewers because Microsoft expects you to muddle through text files to find errors.  This should be baked-in and web-accessible. But the best log viewer doesn't help if the log entries are all "PC Load Letter".  If I had a dollar for every 'unexpected error occurred' I've seen in SharePoint, I'd be retired and fly fishing in Montana instead of blogging about SharePoint.  Here's a gem I got in the ULS log just today:

Microsoft.SharePoint.UserCode.SPUserCodeSolutionProxiedException: <nativehr>0x8102009b</nativehr><nativestack></nativestack>

Seriously, what am I supposed to do with that?  Fortunately, I was able to Google '0x8102009b' and resolve it, but I shouldn't have to.  Errors like that are so 1980.
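
In the meantime, the Get-SPLogEvent cmdlet at least saves you from opening the text files by hand.  It is not the baked-in, web-accessible viewer I'm asking for, and as far as I know it only reads the logs of the server you run it on.  A quick sketch, assuming the SharePoint 2010 Management Shell; the 30 minute window is an arbitrary choice:

```powershell
# Text to look for in recent ULS entries, here the error code from the log line above
$needle = '0x8102009b'

Get-SPLogEvent -StartTime (Get-Date).AddMinutes(-30) |
    Where-Object { $_.Message -like "*$needle*" } |
    Select-Object Timestamp, Area, Category, Level, Correlation, Message |
    Format-List
```

The Correlation value in the output can then be used to pull every other entry from the same request.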

Eight: Vomitous Markup

SharePoint 2010 has gotten much better about browser support.  But one look at the markup for any given SharePoint page makes me want to puke.  Tables within tables within tables, except when it's divs within divs within divs, and dozens of javascripts, and miles of CSS.  And let's not even talk about custom branding.  Again, I'm trying to be fair: it does a ton, and has come a long way since 2007.  But let's have some meaningful, beautiful, minimal markup, accessibility, and true cross-browser compatibility.

Nine: SQL Team vs SharePoint Team vs Office Team: FIGHT!

SharePoint, it turns out, is developed as a conglomeration of various teams within Microsoft.  It roughly falls under Office, but in practice there is some disconnect between Office "Server" and Office "Client" teams. InfoPath's lack of support for Managed Metadata is one good example of this.  By all accounts, this was "on the list", but simply couldn't be accomplished in time for release.  Or a CU.  Or SP1 a year later.  But if Client vs Server teams have some issues, SQL vs SharePoint teams are worse.  Reporting Services, PowerPivot, and PerformancePoint all plug in to SharePoint, but each has unique idiosyncrasies or downright bugs that can be a beast.  Reporting Services, for example, does not install as a service application, but instead as a standalone service running on one of your SharePoint boxes (this changes in SQL RS 2012, supposedly).  But call support for Reporting Services, and you'll get the run-around and be bounced back and forth between "SharePoint Support" and "SQL Support".  This is perhaps unavoidable due to the nature of the product, but unavoidable or not, users expect an "integrated" solution to be truly integrated, with no awkward seams, and typically do not care which support team or development team is responsible.


Ten: The 14 Hive

There's nothing Common or Shared about it, and it's a stretch to call it a Web Server Extension.  Put that junk in C:\Program Files\SharePoint 2010 already.