Close

SQL for the fun of it: finding typing patterns in words (Part 2 of 3: using Model Clause)

Part 1
Part 3

In my last post I solved the task of finding words that can be typed with alternating hands by using collections, match recognize, and analytics. I closed with the comment that I was confident MODEL could be used to solve it too; but at the time I wasn’t sure how to do it.

I have a hard time letting a good puzzle go, so today I decided to confront my limits and learn to use MODEL a bit better.
For my first pass, I started with the same idea as I used before. That is, generate rows of letters for each word in order.

SELECT word,
       LENGTH(word) len,
       COLUMN_VALUE n,
       SUBSTR(word, COLUMN_VALUE, 1) c
  FROM words,
       TABLE(    SELECT COLLECT(LEVEL)
                   FROM DUAL
             CONNECT BY LEVEL <= LENGTH(word))

In my model clause I create a couple of dummy measures: HAND and ALTERNATING. Then I create rules such that HAND is defined to be left or right (L or R) depending on which set of keys are used with the same vctab collection construction as in the previous solutions. That part is fairly straight forward. ALTERNATING is a little trickier, I define the rule such that the first letter is defined to be alternating (the first key stroke is always ok.) Then, for each letter after that I check if the previous letter was considered alternating and if it was, then check if the current letter is on the alternate hand from the previous letter. Thus, any letter that repeats hands will cause itself and all subsequent letters to be flagged as non-alternating. Thus, if the last letter is still flagged alternating, then every letter must have alternated.

My query at this point looked like this:

SELECT *
  FROM (SELECT word,
               LENGTH(word) len,
               COLUMN_VALUE n,
               SUBSTR(word, COLUMN_VALUE, 1) c
          FROM words,
               TABLE(    SELECT COLLECT(LEVEL)
                           FROM DUAL
                     CONNECT BY LEVEL <= LENGTH(word)))
MODEL
    PARTITION BY (word)
    DIMENSION BY(n)
    MEASURES(c,
             len,
             '-' hand,
             '---' alternating)
    RULES UPDATE SEQUENTIAL ORDER
    (
        hand [ANY] = CASE
                         WHEN c[CV(n)]
                                   MEMBER OF vctab('q','w','e','r','t','a','s','d','f','g','z','x','c','v','b')
                         THEN
                             'L'
                         WHEN c[CV(n)]
                                   MEMBER OF vctab('y','u','i','o','p','h','j','k','l','n','m')
                         THEN
                             'R'
                     END,
        alternating [ANY]
        ORDER BY n =
            CASE
                WHEN CV(n) = 1
                THEN
                    'yes'
                WHEN alternating[CV(n) - 1] = 'yes'
                 AND (hand[CV(n)], hand[CV(n) - 1]) IN (('L', 'R'), ('R', 'L'))
                THEN
                    'yes'
                ELSE
                    'no'
            END)

Producing results as follows:

WORD        N C    LEN HAND ALTERNATING
--------- --- -- ----- ---- -----------
chair       1 c      5 L    yes        
chair       2 h      5 R    yes        
chair       3 a      5 L    yes        
chair       4 i      5 R    yes        
chair       5 r      5 L    yes 
       
iambic      1 i      6 R    yes        
iambic      2 a      6 L    yes        
iambic      3 m      6 R    yes        
iambic      4 b      6 L    yes        
iambic      5 i      6 R    yes        
iambic      6 c      6 L    yes

fidelity    1 f      8 L    yes        
fidelity    2 i      8 R    yes        
fidelity    3 d      8 L    yes        
fidelity    4 e      8 L    no         
fidelity    5 l      8 R    no         
fidelity    6 i      8 R    no         
fidelity    7 t      8 L    no         
fidelity    8 y      8 R    no

Finally, wrap the entire query as an inline view and filter the results by applying the last condition: the words must end with the left hand.

SELECT word
  FROM (SELECT *
          FROM (SELECT word,
                       LENGTH(word) len,
                       COLUMN_VALUE n,
                       SUBSTR(word, COLUMN_VALUE, 1) c
                  FROM words,
                       TABLE(    SELECT COLLECT(LEVEL)
                                   FROM DUAL
                             CONNECT BY LEVEL <= LENGTH(word)))
        MODEL
            PARTITION BY (word)
            DIMENSION BY(n)
            MEASURES(c,
                     len,
                     '-' hand,
                     '---' alternating)
            RULES UPDATE SEQUENTIAL ORDER
            (
                hand [ANY] = CASE
                                 WHEN c[CV(n)]
                                           MEMBER OF vctab('q','w','e','r','t','a','s','d','f','g','z','x','c','v','b')
                                 THEN
                                     'L'
                                 WHEN c[CV(n)]
                                           MEMBER OF vctab('y','u','i','o','p','h','j','k','l','n','m')
                                 THEN
                                     'R'
                             END,
                alternating [ANY]
                ORDER BY n =
                    CASE
                        WHEN CV(n) = 1
                        THEN
                            'yes'
                        WHEN alternating[CV(n) - 1] = 'yes'
                         AND (hand[CV(n)], hand[CV(n) - 1]) IN (('L', 'R'), ('R', 'L'))
                        THEN
                            'yes'
                        ELSE
                            'no'
                    END))
 WHERE n = len AND hand = 'L' AND alternating = 'yes';

That worked pretty well, but I didn’t think I was fully utilizing the MODEL syntax. Most significantly, I was using CONNECT BY to generate the character rows and that’s something MODEL could do for me. So I took another stab at it, using a FOR expression to count from 1 to length. Here I ran into some syntax quirks. The word column, as a partitioning value is an invalid identifier within the rules clause, so I had to create a measure (w) from the partitioning word. Also, since the length measure will be referenced in the outer query, its value needs to be set for every generated row (len [n] = len[1].) The hand and alternating measures I set with the same logic as shown above and then again use the same filter in the outer WHERE clause.

SELECT word
  FROM (SELECT *
          FROM words
        MODEL
            PARTITION BY (word)
            DIMENSION BY(1 n)
            MEASURES(word w,
                     '-' c,
                     LENGTH(word) len,
                     '-' hand,
                     '---' alternating)
            RULES
            (
                c [FOR n FROM 1 TO len[1] INCREMENT 1] = SUBSTR(w[1], CV(n), 1),
                len [n] = len[1],
                hand [ANY] = CASE
                                 WHEN c[CV(n)]
                                           MEMBER OF vctab('q','w','e','r','t','a','s','d','f','g','z','x','c','v','b')
                                 THEN
                                     'L'
                                 WHEN c[CV(n)]
                                           MEMBER OF vctab('y','u','i','o','p','h','j','k','l','n','m')
                                 THEN
                                     'R'
                             END,
                alternating [ANY]
                ORDER BY n =
                    CASE
                        WHEN CV(n) = 1
                        THEN
                            'yes'
                        WHEN alternating[CV(n) - 1] = 'yes'
                         AND (hand[CV(n)], hand[CV(n) - 1]) IN (('L', 'R'), ('R', 'L'))
                        THEN
                            'yes'
                        ELSE
                            'no'
                    END))
 WHERE n = len AND hand = 'L' AND alternating = 'yes';

I like this one better because it seems truer to the spirit of the task of using MODEL.

I’m always open for comments, but on this one in particular I welcome suggestions and variations from others on better or simply alternate implementations using MODEL.