Tuesday, November 13, 2012

Creating Professional Tables Using Estpost

Original Code

* This command should install the package estout.
ssc install estout

estpost is one of several commands included in the pacakge estout.

In my previous post I delt with the primary use of estout, to create post estimation tables fit for publication.

I pretty much read through the high quality documentation listed on the estout site and made my own examples.

I will probably to the same with estpost.

I strongly recommend reading through the documentation found on the website http://repec.org/bocode/e/estout/estpost.html.

See (http://www.econometricsbysimulation.com/2012/11/professional-post-estimation-tables.html)

In this post I will deal with the estpost command which takes the results of several common summary statistics commands and converts them to formats that will be used by esttab.

If you end up using this command to create your tables please cite the author Jann, Ben.

It is obvious a lot of work went into creating this package with probably very little reward:

Citing estout

Thanks for citing estout in your work. For example, include a note such as
"Tables produced by estout (Jann 2005, 2007)."
and add
Jann, Ben (2005): Making regression tables from stored estimates. The Stata Journal 5(3): 288-308.
Jann, Ben (2007): Making regression tables simplified. The Stata Journal 7(2): 227-244.
to the bibliography.

estpost is compatible with the following commands:

From: http://repec.org/bocode/e/estout/hlp_estpost.html#commands

    command              description
    summarize            post summary statistics
    tabstat              post summary statistics
    ttest                post two-group mean-comparison tests
    prtest               post two-group tests of proportions
    tabulate             post one-way or two-way frequency table
    svy: tabulate        post frequency table for survey data
    correlate            post correlations
    ci                   post confidence intervals for means,
                             proportions, or counts
    stci                 post confidence intervals for means
                             and percentiles of survival time
    margins              post results from margins (Stata 11)


* Let us start with some made up data!

set obs 10

* Let's imagine that we are interested in 10 different products
gen prod_num = _n

* Each product has a base price
gen prod_price = rbeta(2,5)*10

* In six different markets
expand 6

sort prod_num

* I subtract the 1 because _n starts at 1 but mod (modular function) starts at 0
gen mercado = mod(_n-1, 6)

* Each mercado adds a fixed value to each product based on local demand
gen m_price = rbeta(2,5)*5 if prod_num == 1

bysort mercado: egen mercado_price = sum(m_price)

* Get rid of the reference price
drop m_price

* There is 104 weeks of observations for each product and mercado
expand 104

sort prod_num mercado

gen week = mod(_n-1,104)
* Each week there is a shared shock to all of the prices

gen week_p = rnormal()*.5 if mercado==0 & prod_num==1
bysort week_p: egen week_price=sum(week_p)
drop week_p

* Finally there is a product, market, and week specific shock that is indepentent of other shocks.
gen u = rnormal()*.5

* Let's generate some other random characteristics.
gen prod_stock = rnormal()

* Seasonality
gen seasonality = rnormal()

* Now let's calculate the price

gen price = prod_price + mercado_price + week_price + prod_stock + seasonality + u

* Finally in order to make things interesting let's say that our data set is incomplete because of random factors which occure 10% of the time.

gen missing = rbinomial(1,.1)
drop if missing==1
drop missing

* And to drop our unobservables
drop u week_price mercado_price prod_price

*   Now that we have created our data, let's do some descriptive statistics that we will create tables from

* First the basic summarize command
estpost summarize price seasonality prod_stock

* This in effect tells us what statistics can be pulled from the summarize command.

* We can get more stats (such as medians) by using the detail option
estpost summarize price seasonality prod_stock, detail

* We can now create a table of estimates
esttab ., cells("mean sd count p1 p50 p99") noobs compress

* To save the table directly to a rtf (word compatible format)
esttab . using tables.rtf, replace cells("mean sd count p1 p50 p99") noobs compress

* Or excel
esttab . using tables.csv, replace cells("mean sd count p1 p50 p99") noobs compress

* Note the . after esttab is important.  I don't know why, but it does not work without it.

* Now imagine we would like to assemble a table that has the mean price seasonality and prod_stock by mercado
estpost tabstat price seasonality prod_stock, statistics(mean sd) columns(statistics) listwise by(mercado)

* Everything looks like it is working properly up to this point but for some reason I can't get the next part to work.
esttab, main(mean) aux(sd) nostar unstack noobs nonote nomtitle nonumber

* The table only has one column when it should have 6 for the six different markets.

estpost tab prod_num
esttab . using tables.rtf , append cells("b(label(freq)) pct(fmt(2)) cumpct(fmt(2))")

* There is also a correlate function that will post information about the correlation between the first variable listed after corr and the other variables.
estpost corr price week seasonality mercado
esttab . using tables.rtf , append  cell("rho p count")

* Unfortunately the alternative option, to generate the matrix of correlations that we would expect is not working either.

* This is the sad fate of these user written programs (such as Ian Watson's tabout), Stata becomes updated and they do not.

* I would find it very annoying to have to update code constantly so that a general public that I do not know can continue to use my code for free.

* However, perhaps if people are nice and send the author some emails requesting an update he might be encouraged to come back to his code knowing it is being used.

* His contact information listed on the package tutorial is Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch.


  1. Well, while testing your code, I got this:

    . * This in effect tells us what statistics can be pulled from the summarize command.
    . * We can get more stats (such as medians) by using the detail option
    . estpost summarize price seasonality prod_stock, detail

    | e(count) e(sum_w) e(mean) e(Var) e(sd) e(skewn~)
    price | 5567 5567 4.179372 3.569746 1.889377 .0742438
    seasonality | 5567 5567 .0044847 .991086 .995533 -.0089968
    prod_stock | 5567 5567 -.0048699 .9900933 .9950343 .0260169

    | e(kurto~) e(sum) e(min) e(max) e(p1) e(p5)
    price | 2.940709 23266.56 -3.707268 10.74922 -.0037667 1.163866
    seasonality | 2.932526 24.96656 -3.446181 3.271273 -2.343164 -1.645892
    prod_stock | 2.989126 -27.11094 -3.376829 3.447521 -2.334032 -1.617825

    | e(p10) e(p25) e(p50) e(p75) e(p90) e(p95)
    price | 1.760584 2.858054 4.16412 5.445695 6.600117 7.375202
    seasonality | -1.280259 -.6522442 .0002313 .6745954 1.298612 1.623599
    prod_stock | -1.288978 -.669455 -.0005103 .6562532 1.295787 1.65028

    | e(p99)
    price | 8.694004
    seasonality | 2.366776
    prod_stock | 2.321529

    . * We can now create a table of estimates
    . esttab ., cells("mean sd count p1 p50 p99") noobs compress
    current estimation results do not have e(b) and e(V)


  2. Dear Bach.

    This web page seems to be useful for you.