more_grep
### Install the big wordlist
sudo apt install wamerican-insane

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
wamerican-insane
0 upgraded, 1 newly installed, 0 to remove and 4 not upgraded.
Need to get 1,528 kB of archives.
After this operation, 6,966 kB of additional disk space will be used.
Get:1 http://mirror.metrocast.net/ubuntu jammy/universe amd64 wamerican-insane all 2020.12.07-2 [1,528 kB]
Fetched 1,528 kB in 3s (437 kB/s)  
Preconfiguring packages ...
Selecting previously unselected package wamerican-insane.
(Reading database ... 437782 files and directories currently installed.)
Preparing to unpack .../wamerican-insane_2020.12.07-2_all.deb ...
Unpacking wamerican-insane (2020.12.07-2) ...
Setting up wamerican-insane (2020.12.07-2) ...
Processing triggers for cracklib-runtime (2.9.6-3.4build4) ...
skipping line: 1
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for dictionaries-common (1.28.14) ...

### How many words in each list?
wc --lines /usr/share/dict/*
  104334 /usr/share/dict/american-english
  663473 /usr/share/dict/american-english-insane
  170421 /usr/share/dict/american-english-large
  275502 /usr/share/dict/brazilian
  103494 /usr/share/dict/british-english
   54763 /usr/share/dict/cracklib-small
  346205 /usr/share/dict/french
  116758 /usr/share/dict/italian
  356010 /usr/share/dict/ngerman
  286295 /usr/share/dict/ogerman
  431384 /usr/share/dict/portuguese
       6 /usr/share/dict/README.select-wordlist
   86016 /usr/share/dict/spanish
  356110 /usr/share/dict/swiss
  104334 /usr/share/dict/words
  104334 /usr/share/dict/words.pre-dictionaries-common
 3559439 total
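
### wc counts newline characters, and each list holds one word per
### line, so these line counts are word counts. Given a file name, wc
### prints the name next to the count. To get the bare number for one
### list, feed the file on standard input instead. Here is a small sketch
### on a throwaway temp file (count_words is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: print only the line count of the file named in $1.
# Redirecting with < makes wc read stdin, so it has no file name to print.
count_words() {
  wc --lines < "$1"
}

# Demo on a made-up three-word list in a temporary file.
sample=$(mktemp)
printf '%s\n' apple banana cherry > "$sample"
count_words "$sample"   # prints 3
rm -f "$sample"
```

### On the real list, count_words /usr/share/dict/american-english-insane
### should print 663473, matching the table above.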

### Which words start with 'f' and end with 'uck'?
grep -iE '^f.*uck$' /usr/share/dict/american-english-insane
fernambuck
firetruck
fistfuck
fuck
funduck

### The ^ anchors the match to the beginning of the line (each line of
### the list is one word), the $ anchors it to the end, -E enables
### Extended regular expressions, -i ignores case, and .* matches zero
### or more of any character.
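
### To see the anchors and -i at work, here is the same pattern run on
### a tiny made-up list instead of the dictionary (sample_words is a
### hypothetical helper, not part of any package):

```shell
#!/bin/sh
# Hypothetical sample list, one word per line like the /usr/share/dict files.
sample_words() {
  printf '%s\n' firetruck Firetruck firetrucks truck funduck
}

# ^f = starts with f, .* = zero or more of anything, uck$ = ends with uck.
echo '--- with -i ---'
sample_words | grep -iE '^f.*uck$'
# firetruck, Firetruck and funduck match; firetrucks fails the $ anchor
# (it ends in s) and truck fails the ^ anchor (it starts with t).

echo '--- without -i ---'
sample_words | grep -E '^f.*uck$'
# Firetruck drops out: without -i, ^f only matches a lowercase f.
```
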
### How many 8-letter words start with 'm'?

grep -E '^m.{7}$' /usr/share/dict/american-english-insane | wc --lines
3568
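
### The .{7} is an interval: exactly seven of the preceding item (here,
### any character), so ^m.{7}$ means 8 characters in total. Extended
### regex intervals also take a range, {min,max}. A sketch on a made-up
### list (sample_words is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical sample list of m-words that are 3, 8, 9 and 11 characters long.
sample_words() {
  printf '%s\n' mat mountain mountains mountaineer
}

# ^m.{7}$ : m plus exactly 7 characters, i.e. 8 characters in total.
echo '--- exactly 8 ---'
sample_words | grep -E '^m.{7}$'
# mountain only

# ^m.{7,9}$ : m plus 7 to 9 characters, i.e. 8 to 10 characters in total.
echo '--- 8 to 10 ---'
sample_words | grep -E '^m.{7,9}$'
# mountain and mountains; mat is too short, mountaineer too long.
```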

### But this isn't quite right: .{7} matches any 7 characters, so the
### count includes entries like mystic's and mañana's (518 words contain
### characters other than a-z). The fix is:
grep -E '^m[a-z]{7}$' /usr/share/dict/american-english-insane | wc --lines
3050

### Here the search is for a lowercase m followed by exactly 7
### characters from the range a to z, inclusive.
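
### The two counts differ by 3568 - 3050 = 518. To list which words the
### fix removed, pipe the first grep into an inverted (-v) grep of the
### second pattern. A sketch on a made-up list (sample_words and
### excluded_by_fix are hypothetical helper names):

```shell
#!/bin/sh
# Hypothetical sample list: two plain 8-letter words and two 8-character
# entries that each contain a non-letter.
sample_words() {
  printf '%s\n' mountain "mystic's" mid-term moonbeam
}

# First grep: m plus any 7 characters. Second grep (-v): drop lines that
# are m plus 7 letters a-z. What survives is what the fix excluded.
excluded_by_fix() {
  grep -E '^m.{7}$' | grep -vE '^m[a-z]{7}$'
}

sample_words | excluded_by_fix
# prints mystic's and mid-term
```

### Every line that ^m[a-z]{7}$ matches also matches ^m.{7}$, so running
### excluded_by_fix < /usr/share/dict/american-english-insane in the same
### locale as the counts above should print exactly the 518 excluded words.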

### And how many start with either an uppercase or a lowercase 'm'?
grep -E '^[mM][a-z]{7}$' /usr/share/dict/american-english-insane | wc --lines
3941
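
### Why [mM] and not -i? With -i, grep ignores case throughout the
### pattern, so [a-z] would also accept uppercase letters and mixed-case
### entries would slip through. A sketch comparing the two on a made-up
### list. LC_ALL=C is an added assumption that pins [a-z] to the 26
### ASCII lowercase letters, since range meaning can vary by locale:

```shell
#!/bin/sh
# Assumption: C locale, so the range [a-z] is exactly ASCII a through z.
export LC_ALL=C

# Hypothetical sample list: the same word in four capitalizations.
sample_words() {
  printf '%s\n' mountain Mountain MOUNTAIN mOUNTAIN
}

echo '--- ^[mM][a-z]{7}$ ---'
sample_words | grep -E '^[mM][a-z]{7}$'
# mountain and Mountain: only the first letter may be uppercase.

echo '--- -i with ^m[a-z]{7}$ ---'
sample_words | grep -iE '^m[a-z]{7}$'
# all four lines: -i makes both m and [a-z] match either case.
```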
more_grep.txt · Last modified: by steve