TechTrax, brought to you by MouseTrax Computing Solutions

Forgotten FIND and FINDSTR

by Greg Chapman, MVP (retired)
Skill rating level 5.

Refining text data with a forgotten friend or two

It still amazes me how much work people will do for the sake of a GUI. Don't get me wrong; pointing and clicking a familiar set of controls is much easier to learn and remember when the GUI is done well. To date, though, GUIs still haven't met the base requirement of easy use without an installation, and they are usually kludgy to work with for automated, unattended processes like refining the data from standard ASCII text files. And it's for those files that the good ol' Find command steps in!

Let's get familiar with Find and its switches first:

FIND [/V] [/C] [/N] [/I] [/OFF[LINE]] "string" [[drive:][path]filename[ ...]]

  /V         Displays all lines NOT containing the specified string.
  /C         Displays only the count of lines containing the string.
  /N         Displays line numbers with the displayed lines.
  /I         Ignores the case of characters when searching for the string.
  /OFF[LINE] Do not skip files with offline attribute set.
  "string"   Specifies the text string to find.
  [drive:][path]filename
             Specifies a file or files to search.

If a path is not specified, FIND searches the text typed at the prompt
or piped from another command.

What's this? Find can search multiple files AND honor a specific file attribute (offline)? It can do negative searches (lines not containing the search string)? For you *nix gurus out there, think of this as grep's little, weak-kneed cousin.
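
To make the switches concrete, here is a minimal sketch in Python of what /V, /C, /N and /I mean. It is not how FIND is implemented; the function name find_lines and the sample lines are made up for illustration.

```python
def find_lines(lines, needle, invert=False, ignore_case=False):
    """Return (line_number, line) pairs, numbered the way FIND /N numbers them."""
    if ignore_case:                       # /I: compare without regard to case
        needle = needle.lower()
    hits = []
    for number, line in enumerate(lines, start=1):
        haystack = line.lower() if ignore_case else line
        found = needle in haystack
        if found != invert:               # /V: keep the lines that do NOT match
            hits.append((number, line))
    return hits


# Hypothetical sample lines, loosely modeled on an event log.
sample = [
    "A new process has been created:",
    "a process has exited:",
    "Image File Name: firefox.exe",
]

print(find_lines(sample, "PROCESS", ignore_case=True))   # like /I /N
print(find_lines(sample, "process", invert=True))        # like /V /N
print(len(find_lines(sample, "process")))                # like /C
```

The count is just the length of the hit list, which is why /C and /V combine naturally: together they count the lines that do not contain the string.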

It's not limited to text files, either. It can pull readable strings out of binary files, too. While not as fully featured as the Unix strings command, this is still a handy ability. For instance, here's Find's output after searching this document (written in Word XP) for spaces:

---------- FINDINGFIND.DOC
Forgotten Find Refining text data with a forgotten friend  It still amazes me 
the amount of work people will do for the sake of a GUI interface. Don't get
me wrong; pointing and clicking a familiar set of controls is much easier to 
learn and remember when the GUI is down well. To date, though, GUIs still haven't
met the base requirement of easy use without an installation and they are usually 
kludgy to work with for automated, unattended processes like refining the data 
from standard ASCII text files. And it's for those files that the good ol' Find 
command steps in!  

Let's get familiar with Find and its switches first: 
FIND [/V] [/C] [/N] [/I] [/OFF[LINE]] "string" [[drive:][path]filename[ ...]]
/V         Displays all lines NOT containing the specified string.   
/C         Displays only the count of lines containing the string.   
/N         Displays line numbers with the displayed lines.   
/I         Ignores the case of characters when searching for the string.   
/OFF[LINE] Do not skip files with offline 
    
1h °Ð/ °à=!° "° #  $  %°


Forgotten Find
Greg Chapman
Greg Chapman
Microsoft Word 10.0
MouseTrax Computing Solutions
n  
Forgotten Find


 fj ôƒÄ (
Microsoft Word Document

Hey!! That's not too bad!! We even found stuff in there we didn't expect, like some of the document attribute information.

Well, that was enjoyable but not really all that useful. What if you were parsing some of the text data from the event files created by another of my monstrous scripts, the Win32EventRealTime.vbs script at http://pubs.logicalexpressions.com/Pub0009/LPMArticle.asp?ID=115?

This script, you might recall, monitors the event log on systems and writes the events out to a text file on the drive. There's a lot of data to parse when the system in question is accessed by a lot of users and you're auditing events. How do you parse out only the information you're interested in? Or, asked differently, how do you FIND what you want in that output file? Here's an example in which we'll look for a particular event code:

find /i /n "592" mycomputerevents.txt

And here's the interesting output:

---------- MYCOMPUTEREVENTS.TXT
[910]8/16/2004 8:05:43 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:
[925]8/16/2004 8:05:51 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:

That's pretty cool…but what if I've got several files in this directory from which I want to harvest that event? Easy: change the file specification you hand to Find:

find /i /n "592" .\*.*

---------- .\EXAMPLE01.TXT
[910]8/16/2004 8:05:43 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:
[925]8/16/2004 8:05:51 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:

---------- .\EXAMPLE02.TXT
[910]8/16/2004 8:05:43 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:
[925]8/16/2004 8:05:51 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:

---------- .\EXAMPLE03.TXT
[910]8/16/2004 8:05:43 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:
[925]8/16/2004 8:05:51 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:

---------- .\MYCOMPUTEREVENTS.TXT
[910]8/16/2004 8:05:43 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:
[925]8/16/2004 8:05:51 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:

You'll note that I used a funny path specification in the command for Find:

find /i /n "592" .\*.*

In Windows, "." refers to the current directory, just as it does in Unix. I just happened to be running Find with my current directory set to the location of the files I wanted to search…and I wanted to search all of them, so I specified all files using the *.* wildcard pattern.
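
For readers who want the same per-file report from a script, the sketch below walks every file in a directory and groups the hits under a header for each file. It is an illustration only: find_in_directory and print_report are hypothetical names, and it visits every file in the directory on the assumption that this is what *.* selects at the Windows prompt.

```python
from pathlib import Path


def find_in_directory(directory, needle):
    """Map each file name to its (line_number, line) hits, numbered like FIND /N."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        hits = []
        # errors="replace" keeps going when a file holds bytes that are not text
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((number, line.rstrip("\n")))
        results[path.name] = hits
    return results


def print_report(results):
    """Print one header per file followed by its numbered hits."""
    for name, hits in results.items():
        print(f"---------- {name.upper()}")
        for number, line in hits:
            print(f"[{number}]{line}")
        print()


if __name__ == "__main__":
    # "." is the current directory, as in the FIND command above.
    print_report(find_in_directory(".", "592"))
```

Files with no hits still get a header with nothing under it, which makes it easy to see which logs were searched.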

Now, the data is obviously less detailed than you might want at this point. Okay, so a process was created…which process? Let's use FINDSTR instead!

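FINDSTR is much more powerful than Find, with support for multiple search strings AND regular expressions, a topic I've yet to master and have heard referred to as an art form. Using an approach similar to the one we used with FIND, we can search out even more specific, informative data. For instance, I want to know how many logged instances of process creation AND process exits occurred for FireFox.exe. FINDSTR treats space-separated strings as alternatives, so the command below lists every line containing 592, 593, or firefox (with /S extending the search into subdirectories), and the firefox hits show which events belong to it. I can do that this way: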

FINDSTR /i /n /S "592 593 firefox" .\*.txt

This results in another very long set of hits like this:

.\example01.txt:858:8/16/2004 8:04:19 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 593, A process has exited:
.\example01.txt:871:8/16/2004 8:04:25 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 593, A process has exited:
.\example01.txt:884:8/16/2004 8:04:32 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 593, A process has exited:
.\example01.txt:897:8/16/2004 8:05:13 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 593, A process has exited:
.\example01.txt:901:    Image File Name:        C:\PROGRA~1\MOZILL~1\firefox.exe

.\example01.txt:910:8/16/2004 8:05:43 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:
.\example01.txt:914:    Image File Name:        C:\PROGRA~1\MOZILL~1\firefox.exe
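
The hits still have to be read by eye to see which 592 and 593 events belong to Firefox. The Python sketch below shows one way that tallying could be automated. It assumes, based only on the output above, that each event starts with a comma-separated header line whose fourth field is the event code, and that an "Image File Name:" line for the same event follows before the next header. The function name and the three-event sample are hypothetical.

```python
from collections import Counter

EVENT_NAMES = {"592": "created", "593": "exited"}


def count_process_events(lines, image_name):
    """Count 592/593 events whose Image File Name line mentions image_name."""
    counts = Counter()
    current_event = None            # code of the most recent header, if 592/593
    for line in lines:
        fields = line.split(", ")
        if len(fields) >= 5 and fields[3].isdigit():
            # A new event header: remember it only if it is a code we track.
            current_event = fields[3] if fields[3] in EVENT_NAMES else None
        elif "Image File Name:" in line and current_event is not None:
            if image_name.lower() in line.lower():
                counts[EVENT_NAMES[current_event]] += 1
            current_event = None    # credit each header at most once
    return dict(counts)


# Sample built from lines in the output above; the first exit has no
# Image File Name line, so it is not attributed to any program.
sample = [
    r"8/16/2004 8:04:19 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 593, A process has exited:",
    r"8/16/2004 8:05:13 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 593, A process has exited:",
    r"    Image File Name:        C:\PROGRA~1\MOZILL~1\firefox.exe",
    r"8/16/2004 8:05:43 PM, MYCOMPUTER, MYCOMPUTER\gchapman, 592, A new process has been created:",
    r"    Image File Name:        C:\PROGRA~1\MOZILL~1\firefox.exe",
]

print(count_process_events(sample, "firefox.exe"))
```

Reading the event code from its own field, rather than looking for "592" anywhere on the line, avoids false hits from other numbers that happen to contain those digits.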

FINDSTR has a pretty rich array of switches and options. The list is so extensive that you may discover more than you want to work with and FIND will be your preferred tool. Check out this rich set of instructions:

FINDSTR /?

Searches for strings in files.

FINDSTR [/B] [/E] [/L] [/R] [/S] [/I] [/X] [/V] [/N] [/M] [/O] [/P] [/F:file]
        [/C:string] [/G:file] [/D:dir list] [/A:color attributes] [/OFF[LINE]]
        strings [[drive:][path]filename[ ...]]

  /B         Matches pattern if at the beginning of a line.
  /E         Matches pattern if at the end of a line.
  /L         Uses search strings literally.
  /R         Uses search strings as regular expressions.
  /S         Searches for matching files in the current directory and all
             subdirectories.
  /I         Specifies that the search is not to be case-sensitive.
  /X         Prints lines that match exactly.
  /V         Prints only lines that do not contain a match.
  /N         Prints the line number before each line that matches.
  /M         Prints only the filename if a file contains a match.
  /O         Prints character offset before each matching line.
  /P         Skip files with non-printable characters.
  /OFF[LINE] Do not skip files with offline attribute set.
  /A:attr    Specifies color attribute with two hex digits. See "color /?"
  /F:file    Reads file list from the specified file(/ stands for console).
  /C:string  Uses specified string as a literal search string.
  /G:file    Gets search strings from the specified file(/ stands for console).
  /D:dir     Search a semicolon delimited list of directories
  strings    Text to be searched for.
  [drive:][path]filename
             Specifies a file or files to search.

Use spaces to separate multiple search strings unless the argument is prefixed
with /C.  For example, 'FINDSTR "hello there" x.y' searches for "hello" or
"there" in file x.y.  'FINDSTR /C:"hello there" x.y' searches for
"hello there" in file x.y.

Regular expression quick reference:
  .        Wildcard: any character
  *        Repeat: zero or more occurances of previous character or class
  ^        Line position: beginning of line
  $        Line position: end of line
  [class]  Character class: any one character in set
  [^class] Inverse class: any one character not in set
  [x-y]    Range: any characters within the specified range
  \x       Escape: literal use of metacharacter x
  \<xyz    word position: beginning of word
  xyz\>    Word position: end of word

For full information on FINDSTR regular expressions refer to the online Command
Reference.
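
The difference between space-separated strings and /C: in the help text above is easy to trip over, so here is a small Python sketch of the two behaviors, plus one character class from the quick reference. It treats the search strings as plain text and is only an illustration; the function names are made up. Python's re module is a different dialect from FINDSTR's (for example, it has no \< and \> and uses \b for word boundaries), but ".", "*", "^", "$" and bracketed classes read the same way.

```python
import re


def matches_any_word(line, strings, ignore_case=False):
    """Like FINDSTR "hello there": a hit if ANY space-separated word occurs."""
    haystack = line.lower() if ignore_case else line
    words = strings.lower().split() if ignore_case else strings.split()
    return any(word in haystack for word in words)


def matches_phrase(line, phrase, ignore_case=False):
    """Like FINDSTR /C:"hello there": the whole phrase, spaces included, must occur."""
    if ignore_case:
        return phrase.lower() in line.lower()
    return phrase in line


# Character class [23]: matches 592 or 593, but only as a field of its own
# between ", " separators (the layout seen in the event log examples).
EVENT_FIELD = re.compile(r", 59[23], ")


def is_process_event(line):
    return EVENT_FIELD.search(line) is not None


print(matches_any_word("hello world", "hello there"))   # True
print(matches_phrase("hello world", "hello there"))     # False
print(is_process_event("    Process ID: 1592"))         # False
```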

As you can see, there's a great wealth of data you can parse at the command line. So the next time you've a complex or long bit of data to sort through and you've a distinct list of items you're looking for, don't forget the old standby commands FIND and FINDSTR!
