Wednesday, May 5, 2010

FIND+GREP replacement for PowerShell

In the UNIX world it's pretty common to use find together with grep to search a filesystem tree for something. It might be a search for a specific class in a source tree:


$ find . \( -name \*.cpp -o -name \*.hpp \) -exec grep -Hb class {} \;

Or it might just be a search for some word in your notes; it doesn't matter, the pattern is really common. So what does the PowerShell world offer as a substitute for those commands?

First of all, let's try to replace grep. For this we have a cmdlet called Select-String, or sls (an alias). Let's try something simple:


$ sls class *.hpp

CustomTimeEdit.hpp:5:class CustomTimeEdit : public QTimeEdit {
DaySelecter.hpp:12: class Model : public QAbstractItemModel
DaySelecter.hpp:43: class View : public QTreeView
[...]

For some reason the documentation gets this backwards, saying the path should go first and the pattern second, but anyway.

What if we want to search files whose names match several patterns (for example, cpp and hpp files)?


$ sls DaySelecter *.hpp,*.cpp

DaySelecter.hpp:1:#ifndef __DAYSELECTER_HPP__
DaySelecter.hpp:2:#define __DAYSELECTER_HPP__
DaySelecter.hpp:10:namespace DaySelecter {
[...]

What if we want to match several patterns in those files?


$ sls DaySelecter,MainWindow *.hpp,*.cpp


DaySelecter.hpp:1:#ifndef __DAYSELECTER_HPP__
DaySelecter.hpp:2:#define __DAYSELECTER_HPP__
DaySelecter.hpp:10:namespace DaySelecter {
MainWindow.hpp:1:#ifndef __MAINWINDOW_HPP__
MainWindow.hpp:2:#define __MAINWINDOW_HPP__
MainWindow.hpp:4:#include
[...]

The fact is, Select-String accepts arrays for both the pattern and the path (in PowerShell, arrays are built with the comma operator).
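To make the array handling explicit, here is a minimal sketch with named parameters and full cmdlet names, so it doesn't depend on any alias being defined. The scratch directory, the two sample files, and the variable names are made up for illustration:

```powershell
# Scratch directory with two hypothetical sample files, so the example is self-contained.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'DaySelecter.hpp') -Value 'namespace DaySelecter {', '}'
Set-Content -Path (Join-Path $dir 'MainWindow.cpp') -Value '#include "MainWindow.hpp"', 'int unrelated;'

# Both -Pattern and -Path take arrays; the comma operator builds them.
$patterns = 'DaySelecter', 'MainWindow'
$paths    = (Join-Path $dir '*.hpp'), (Join-Path $dir '*.cpp')
$found    = @(Select-String -Pattern $patterns -Path $paths)

# Print the hits in the familiar file:line:text shape.
$found | ForEach-Object { '{0}:{1}:{2}' -f $_.Filename, $_.LineNumber, $_.Line }

# Clean up the scratch directory.
Remove-Item -Recurse -Force $dir
```

A line is reported when it matches any of the patterns, so the pattern array behaves like a logical OR.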

Now, you remember that PowerShell operates on objects, not text, right? So let's see what those objects are:


$ sls DaySelecter *.hpp,*.cpp | gm

   TypeName: Microsoft.PowerShell.Commands.MatchInfo

Name         MemberType Definition
----         ---------- ----------
Equals       Method     bool Equals(System.Object obj)
GetHashCode  Method     int GetHashCode()
GetType      Method     type GetType()
RelativePath Method     string RelativePath(string directory)
ToString     Method     string ToString(), string ToString(string directory)
Context      Property   Microsoft.PowerShell.Commands.MatchInfoContext Context {get;set;}
Filename     Property   System.String Filename {get;}
IgnoreCase   Property   System.Boolean IgnoreCase {get;set;}
Line         Property   System.String Line {get;set;}
LineNumber   Property   System.Int32 LineNumber {get;set;}
Matches      Property   System.Text.RegularExpressions.Match[] Matches {get;set;}
Path         Property   System.String Path {get;set;}
Pattern      Property   System.String Pattern {get;set;}

Not only can we look at the output, we can also cut pieces out of it with ease (no more sed scripts for parsing command output!) and analyze it easily. That's pretty exciting, I think. That's the future of the command line. But I digress.
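As a rough illustration of what "cutting pieces" looks like in practice, the sketch below picks individual properties off the MatchInfo objects and reads a regex capture group through the Matches property. The sample file and its contents are hypothetical:

```powershell
# Hypothetical header file written to the temp directory for the demo.
$file = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName() + '.hpp')
Set-Content -Path $file -Value '#ifndef X', 'class Model : public Base', '', 'class View : public Base'

# The pattern is a regex; the parentheses capture the class name.
$hits = @(Select-String -Pattern 'class\s+(\w+)' -Path $file)

# Pick the properties we care about instead of slicing text.
$hits | Select-Object Filename, LineNumber, Line

# Each MatchInfo carries the regex Match objects, so capture groups are available too.
$classNames = @($hits | ForEach-Object { $_.Matches[0].Groups[1].Value })
$classNames

# Clean up the sample file.
Remove-Item $file
```

The same objects can be fed to Group-Object, Sort-Object, or Export-Csv without any intermediate text parsing.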

I think you've got the idea. With sls you can perform regexp matching (the default behaviour) or simple matching (-SimpleMatch). You can stop at the first match in each file, ignoring all the others (-List), or, on the contrary, collect all matches, even if several are present on the same line (-AllMatches). You can do negative matching (-NotMatch). You can also get results in a context of several lines before and after (-Context), much like diff -u does. What's more, you can even specify a different encoding, though this list, for some reason, is far from complete: it contains only the Unicode encodings, ANSI and OEM. That is sufficient in most cases, but the list could have been more diverse.
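The switches above can be tried side by side on a tiny file. This is only a sketch; the four-line sample file and the variable names are invented for the demo:

```powershell
# Hypothetical four-line sample file, created in the temp directory.
$file = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
Set-Content -Path $file -Value 'one', 'a.b', 'axb', 'two two'

# Default: the pattern is a regex, so '.' matches any character (lines 2 and 3).
$regex = @(Select-String -Pattern 'a.b' -Path $file)

# -SimpleMatch: the pattern is taken as a literal string (line 2 only).
$simple = @(Select-String -Pattern 'a.b' -Path $file -SimpleMatch)

# -NotMatch: report the lines that do NOT match (lines 1 to 3).
$negated = @(Select-String -Pattern 'two' -Path $file -NotMatch)

# -List: stop after the first matching line in each file (line 1 here).
$first = @(Select-String -Pattern 'o' -Path $file -List)

# -AllMatches: still one result per line, but every match on that line is recorded.
$all = Select-String -Pattern 'two' -Path $file -AllMatches

# -Context 1,1: keep one line before and one line after each match.
$ctx = Select-String -Pattern 'axb' -Path $file -Context 1,1

'regex={0} simple={1} negated={2} first-line={3} matches-on-line={4}' -f `
    $regex.Count, $simple.Count, $negated.Count, $first[0].LineNumber, $all.Matches.Count
'before: {0} / after: {1}' -f ($ctx.Context.PreContext -join ''), ($ctx.Context.PostContext -join '')

# Clean up the sample file.
Remove-Item $file
```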

Now, what about searching the tree? In PowerShell we use Get-ChildItem, or ls. I don't know the full powers of UNIX ls, but PowerShell's ls is a pretty powerful thing. We can find all files in the tree matching specific patterns:


$ ls -r -inc *.cpp,*.hpp

Or we can find files not matching a pattern:


$ ls -r -ex *.hpp~

Or we can even do some crazy stuff: include files matching some patterns, and then exclude from those the files matching another pattern:


$ ls -r -inc *.cpp,*.hpp -ex DaySelecter*

If you want more control over matching, you can filter the matching files with a Where-Object block. Remember? PowerShell operates on objects, not text:


$ ls -r -inc *.cpp,*.hpp -ex DaySelecter* | ? { $_.IsReadOnly }
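Any property of the file objects can drive the filter, not just IsReadOnly. As a sketch, the example below keeps only source files larger than 1 KB; the scratch directory, the file names, and the size threshold are arbitrary choices for the demo:

```powershell
# Scratch directory with three hypothetical files of different sizes.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'small.cpp') -Value 'int x;'
Set-Content -Path (Join-Path $dir 'big.cpp')   -Value ('x' * 2000)
Set-Content -Path (Join-Path $dir 'notes.txt') -Value ('y' * 2000)

# -Include narrows by name; Where-Object then filters on a FileInfo property.
$big = @(Get-ChildItem -Path $dir -Recurse -Include '*.cpp', '*.hpp' |
    Where-Object { $_.Length -gt 1KB })

$big | Select-Object Name, Length

# Clean up the scratch directory.
Remove-Item -Recurse -Force $dir
```

The script block can hold any expression, so conditions on LastWriteTime, Extension, or several properties at once work the same way.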

OK, now how can we combine the powers of ls and sls? The answer is that sls accepts files not only from the command line but from the pipeline as well, so we can just pipe the two commands together and get the desired result:


$ ls -r -inc *.cpp,*.hpp -ex *DaySelecter* | sls DaySelecter

MainWindow.cpp:26: connect(viewDaySelecter->selectionModel(),
MainWindow.cpp:38: viewDaySelecter->setDiary(diaryModel);
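For scripts, the same pipeline reads better with full cmdlet and parameter names. The sketch below rebuilds a tiny tree in the temp directory so it can run anywhere; the directory layout and file contents are made up and only loosely mirror the project above:

```powershell
# Hypothetical source tree: one header at the top, two files in a subdirectory.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
$sub = Join-Path $dir 'src'
New-Item -ItemType Directory -Path $sub | Out-Null
Set-Content -Path (Join-Path $dir 'DaySelecter.hpp') -Value 'namespace DaySelecter {'
Set-Content -Path (Join-Path $sub 'MainWindow.cpp')  -Value 'viewDaySelecter->setDiary(diaryModel);', 'int other;'
Set-Content -Path (Join-Path $sub 'notes.txt')       -Value 'DaySelecter todo'

# Long form of: ls -r -inc *.cpp,*.hpp -ex *DaySelecter* | sls DaySelecter
$hits = @(Get-ChildItem -Path $dir -Recurse -Include '*.cpp', '*.hpp' -Exclude '*DaySelecter*' |
    Select-String -Pattern 'DaySelecter')

# Print the hits in the familiar file:line:text shape.
$hits | ForEach-Object { '{0}:{1}:{2}' -f $_.Filename, $_.LineNumber, $_.Line.Trim() }

# Clean up the scratch tree.
Remove-Item -Recurse -Force $dir
```

Here notes.txt is skipped by -Include and DaySelecter.hpp by -Exclude, so only MainWindow.cpp is searched.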

Now that's the kind of power we need!

2 comments:

jsnover said...

You inspired me to write about one of my favorite parameters in V2: Select-String -Context

http://blogs.msdn.com/powershell/archive/2010/05/07/select-string-context.aspx

Thanks!
Experiment! Enjoy! Engage!

Jeffrey Snover [MSFT]
Distinguished Engineer
Visit the Windows PowerShell Team blog at: http://blogs.msdn.com/PowerShell
Visit the Windows PowerShell ScriptCenter at: http://www.microsoft.com/technet/scriptcenter/hubs/msh.mspx

Unknown said...

Thank you, Jeffrey!