Friday, February 1, 2013

Macro Parsing

Do file

* When programming commands in Stata, we often want to be able to develop our own syntax for them.

* The command "syntax" is often useful here, providing a method for parsing command inputs into easily managed forms.

* However, sometimes we may want syntax options not permitted by the command "syntax".

* If we were to program our own instrumental variable regression that looked like Stata's, these limitations might cause problems.

* myivreg y x1 x2 (w1 w2 w3 = z1 z2 z3)

* This is one way this kind of parsing can be accomplished, using the gettoken command.

* Gettoken splits a macro at a parse character (or at any one of a set of parse characters): the first token is stored in one macro and, optionally, the rest in another.

* Let's see it in action

local example1 abc def ghi,jkl
di "`example1'"

* We can see that the local example1 is a disorganized string

* We can separate it into the part before the first space and the part after it with the following commands.

gettoken before after: example1

di "`before'"

di "`after'"

* The default parse character is " "
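
* As a small sketch (not part of the original example), repeated calls to gettoken
* can peel off one token at a time.  The local names words, word and count are
* only illustrative.
local words abc def ghi,jkl
local count 0
while "`words'" != "" {
  gettoken word words : words
  local ++count
  di "Token `count': `word'"
}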

* We can specify an alternative parse character or set of characters using the parse option.

gettoken before after: example1, parse(",")

di "`before'"

di "`after'"

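* Notice that the parse character is included at the start of the second macro (after), and that spaces no longer split the string once parse() is specified.

* Let's see gettoken in action on: myivreg y x1 x2 (w1 w2 w3 = z1 z2 z3)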
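
* A short sketch of what that implies (the local names here are only illustrative):
* the parse character is itself a token, so a second call to gettoken can remove it.
local example2 abc def ghi,jkl
gettoken first rest : example2, parse(",")
gettoken comma rest : rest, parse(",")
di "`comma'"
di "`rest'"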

cap program drop myivreg
program define myivreg

  di "The local 0 contains the entire command line: `0'"

  * Split off anything after a comma (the options) from the main part of the command
  gettoken main options: 0 , parse(",")

  gettoken dep_exog brackets: main, parse("(")

  gettoken dep_var exog : dep_exog

  di "Dependent Variable: `dep_var'"
  di "Exogenous Variables: `exog'"

  gettoken cntr: brackets, parse(")")

  local cntr=subinstr("`cntr'", "(", "", .)

  di "Inside brackets: `cntr'"

  gettoken endog inst: cntr, parse("=")
  local inst=subinstr("`inst'", "=", "", .)

  di "Endogenous Variables: `endog'"
  di "Instrumental Variables: `inst'"

  * At this point we have all of the main elements we need to run our ivreg.

  ivreg `dep_var' `exog' (`endog' = `inst')

end
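
* An alternative sketch (not used in the program above): gettoken's match() option
* matches parentheses and removes the outer pair, which would avoid the subinstr
* step.  The local names cmdline, inside, rest and paren are only illustrative.
local cmdline y x1 x2 (w1 w2 w3 = z1 z2 z3)
gettoken dep_exog brackets : cmdline, parse("(")
gettoken inside rest : brackets, match(paren)
di "Inside brackets: `inside'"
di "Parenthesis found: `paren'"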

* Let's generate some dummy data
clear
set obs 100

* This will generate some strange data:
*  x2 -> z3 -> w0 -> z2 -> w1 -> z1 -> w2 -> x1 -> y
* Where -> means causes.  These are not valid data for an IV regression.
foreach v in  x2 z3 w0 z2 w1 z1 w2 x1 y {
  gen `v' = rnormal() `corr_vars'
  local corr_vars ="+`v'"
}

myivreg y (w0=z1)

myivreg y x1 x2 (w0 w1 w2 = z1 z2 z3)

* Everything seems to be working well.
myivreg y x1 x2 (w0 w1 w2 = z1 z2 z3), test

* However, adding the option "test" does not cause an error.  Is this a problem?  Not necessarily.

* We just have not written any code to ensure that unused syntax is accounted for.

* Combining the gettoken command with the syntax command can accomplish this.


cap program drop myivreg2
program define myivreg2

  di "The local 0 contains the entire command line: `0'"

  * This is a new bit of code that ensures an error is raised if any options are specified
  syntax anything

  gettoken dep_exog brackets: anything, parse("(")

  gettoken dep_var exog : dep_exog

  di "Dependent Variable: `dep_var'"
  di "Exogenous Variables: `exog'"

  gettoken cntr: brackets, parse(")")

  local cntr=subinstr("`cntr'", "(", "", .)

  di "Inside brackets: `cntr'"

  gettoken endog inst: cntr, parse("=")
  local inst=subinstr("`inst'", "=", "", .)

  di "Endogenous Variables: `endog'"
  di "Instrumental Variables: `inst'"

  * At this point we have all of the main elements we need to run our ivreg.

  ivreg `dep_var' `exog' (`endog' = `inst')

end

myivreg2 y x1 x2 (w0 w1 w2 = z1 z2 z3)

myivreg2 y x1 x2 (w0 w1 w2 = z1 z2 z3), test
* This now fails with an error, because "syntax anything" does not allow any options.
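
* A possible extension (a sketch, not part of the original post): declare the option
* in syntax so that it is accepted instead of rejected.  The program name myivreg3
* and the option TEST are only illustrative, and equalok is added as a precaution so
* that the "=" inside the parentheses stays part of anything.
cap program drop myivreg3
program define myivreg3

  syntax anything(equalok) [, TEST]

  gettoken dep_exog brackets: anything, parse("(")
  gettoken dep_var exog : dep_exog

  gettoken cntr: brackets, parse(")")
  local cntr=subinstr("`cntr'", "(", "", .)

  gettoken endog inst: cntr, parse("=")
  local inst=subinstr("`inst'", "=", "", .)

  * The local test is empty unless the option was typed
  if "`test'" != "" di "Option test was specified"

  ivreg `dep_var' `exog' (`endog' = `inst')

end

myivreg3 y x1 x2 (w0 w1 w2 = z1 z2 z3), test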
