## Tuesday, July 30, 2013

### Diff-n-Diff of estimates from Structural Equation Models (SEM)

Structural equation modeling (SEM) is generally the practice of estimating latent factor(s) from observable measures.  It is similar to factor analysis in that it typically reduces a problem with potentially many dependent variables to a lower dimension which we hope is interpretable.

Let’s imagine a relatively simple model where this might be useful.  We have five responses which are measures of 2 latent traits we are interested in.  For half of our sample we administer an intervention with the hopes that it will increase our latent traits.  We administer a pretest measuring the 5 observables before the treatment and a posttest after measuring the 5 observables after.

Stata Code
clear
set obs 4000

gen id = _n

gen eta1 = rnormal()
gen eta2 = rnormal()

* Generate 5 irrelevant factors that might affect each of the
* different responses on the pretest
gen f1 = rnormal()
gen f2 = rnormal()
gen f3 = rnormal()
gen f4 = rnormal()
gen f5 = rnormal()

* Now let's apply the treatment
expand 2, gen(t)   // double our data

gen treat=0
replace treat=1 if ((id<=_N/4)&(t==1))

* Now let's generate our changes in etas
replace eta1 = eta1 + treat*1 + t*.5
replace eta2 = eta2 + treat*.5 + t*1

* Finally we generate out pre and post test responses
gen v1 = f1*.8  + eta1*1  + eta2*.4   // eta1 has more loading on
gen v2 = f2*1.5 + eta1*1  + eta2*.3   // the first few questions
gen v3 = f3*2   + eta1*1  + eta2*1
gen v4 = f4*1   + eta1*.2 + eta2*1   // eta2 has more loading on
gen v5 = f5*1   +           eta2*1   // the last few questions

* END Simulation
* Begin Estimation

sem (L1 -> v1 v2 v3 v4 v5) (L2 -> v1 v2 v3 v4 v5) if t==0
predict L1 L2, latent

sem (L1 -> v1 v2 v3 v4 v5) (L2 -> v1 v2 v3 v4 v5) if t==1
predict L12 L22, latent

replace  L1 = L12 if t==1
replace  L2 = L22 if t==1

* Now let's see if our latent predicted factors are correlated with our true factors.
corr eta1 eta2 L1 L2

* We can see already that we are having problems.
* I am no expert on SEM so I don't really know what is going wrong except
* that eta1 is reasonably highly correlated with L1 and L2 and
* eta2 is less highly correlated with L1 and L2 equally each
* individually, which is not what we want.

* Well too late to stop now.  Let's do our diff in diff estimation.
* In this case we can easily accomplish it by generating one more variable.

* Let's do a seemingly unrelated regression form to make a single joint estimator.

sureg (L1 t id treat) (L2 t id treat)

* Now we have estimated the effect of the treatment given a control for the
* time effect and individual differences.  Can we be sure of our results?
* Not quite.  We are treating L1 and L2 like observed varaibles rather than
* random variables we estimated.  We need to adjust out standard errors to
* take this into account.  The easiest way though computationally intensive is
* to use a bootstrap routine.

* This is how it is done.  Same as above but we will use temporary variables.
cap program drop SEMdnd
program define SEMdnd

tempvar L1 L2 L12 L22

sem (L1 -> v1 v2 v3 v4 v5) (L2 -> v1 v2 v3 v4 v5) if t==0
predict L1' L2', latent

sem (L1 -> v1 v2 v3 v4 v5) (L2 -> v1 v2 v3 v4 v5) if t==1
predict L12' L22', latent

replace  L1' = L12' if t==1
replace  L2' = L22' if t==1

sureg (L1' t id treat) (L2' t id treat)

drop L1' L2' L12' L22'

end

SEMdnd   // Looking good

* This should do it though I don't hae the machine time available to wait
* for it to finish.
bs , rep(200) cluster(id): SEMdnd

Formatted By EconometricsbySimulation.com


## Wednesday, July 24, 2013

### Power Analysis by Simulation: R, RCT, Malaria Example

I have received a number of requests for demonstration code on how to perform a power analysis using simulation in R.  I have already demonstrated howto do this in Stata but lacked the easy to use Stata command “simulate” that I preferred.  However, in a recent post I have written up a command very similar to simulate in R called SimpleSim.

This command takes is capable of taking a vector of parameters and feeding them into a function which returns a vector of results.  The results are turned into a table which can be used to show the rates of type 1 and type 2 error given a particular sample size and underlying effect.

Example:
Let's imagine that we are interested in testing the effect that bednet usage has on the rate of infection of malaria.  We are concerned that the decision to purchase and use bednets might endogenous as a result of those more likely to come down with malaria being more prone to seeing bednets as a valuable investment.

So, in response we are going to design an randomized control trial (RCT) in which we either supply fully paid bednet coupons or half discounted coupons each to different thirds of the population leaving the remaining third as a control.  We are concerned that if supply the bednets at no cost then the recipients will not value them so that is why we give coupons that only reduce the cost of the bednets by 1/2.

After distributing the coupons, we would like to use the random distribution of the coupons as an instrument for use of a bednet.  Ultimately we would like to do two levels of analysis.

1. Using instrumental variables (IV) to see how effective bednets are at preventing Malaria in an active population. This is actually an interesting question because often there is imperfect compliance because few individuals spend all of the hours at dusk and dawn under a bednet.  In this case I should use an estimator that takes the form of the first stage being a binary regressor followed by a second stage binary regressor.  However, since linear models seem to be pretty good at approximating average partial effects I am just going to use a IV estimator.
2. Using ordinary least squares (OLS) we would like to see how effective each of the coupon programs are at increasing use of bednets within the homes.

require("AER") # We will use the ivreg function later

# We define a function that both simulates a population and estimates
# results and returns and returns the results.
MalariaIV <- function(
nsim = 100,
npop = 1000,    # Define the sampling population
treat0 = 1/3,   # Define the proportion which is control
treat1 = 1/3,   # Define the proportion which is treated with free bednets
treat2 = 1/3,   # Define the proportion which is treated with 50% cost
t0comp = .05,   # Bednet useage among control group
t1comp = .85,   # Bednet useage if recieving free net
t2comp = .5 ,   # Bednet useage if recieving 50% cost
malariaRT = .85,# Rate of getting malaria without bednet
netRT = .45,    # Rate of getting malaria with bednet
Tdetect = 1,    # Likihood of detecting malaria if present
Fdetect = 0,    # Likihood of detecting malaria if not present
alpha = .05     # The alpha level that a p-value must be below
) {
# Define how the function works
# First, define a vector to store results
pvalues <- NULL

#
for (i in 1:nsim) { # Repeat the simulation a number of times
# Generate the population by technology useage
simdata <- data.frame(
control=c(rep(1,npop*treat0),rep(0,npop*treat1), rep(0,npop*treat2)),
treat1 =c(rep(0,npop*treat0),rep(1,npop*treat1), rep(0,npop*treat2)),
treat2 =c(rep(0,npop*treat0),rep(0,npop*treat1), rep(1,npop*treat2)))
# Calculate the actual population generated (should be 999 in this case)
npeople <- nrow(simdata)
# Generate the rate of bednet usage.
simdata$bednet <- simdata$control*rbinom(npeople,1,t0comp) + # Bednet usage control
simdata$treat1 *rbinom(npeople,1,t1comp) + # Free simdata$treat2 *rbinom(npeople,1,t2comp)   # 50% cost
# Now let's generate the rate of malaria
simdata$malaria <- (simdata$bednet==0)*rbinom(npeople,1,malariaRT)+
(simdata$bednet==1)*rbinom(npeople,1,netRT) # Finally generate the rate of malaria detection as a function # of true parasite levels. simdata$mdetect <-
(simdata$malaria==0)*rbinom(npeople,1,Fdetect)+ # False detection (simdata$malaria==1)*rbinom(npeople,1,Tdetect)  # True detection

# Time to do our simple estimation of the effect of treatment on bednet use
lmcoef <- summary(lm(bednet~treat1+treat2,data=simdata))$coefficients # Now let's try the 2SLS to estimate the effect of bednets on contraction # of malaria. ivregest <- ivreg(mdetect~bednet | treat1+treat2, data=simdata) ivregcoef <- summary(ivregest)$coefficients

# Save the rejection of the null rates
pvalues <- rbind(pvalues,c(treat1=lmcoef[2,4]<alpha,
treat2=lmcoef[3,4]<alpha,
iv=ivregcoef[2,4]<alpha))

}
# Calculate the mean rejection rate for each coefficient.
apply(pvalues,2,mean)
}

MalariaIV()
# Running it once we can see that we get a single set of results
# where we easily reject the null.  However, we want to know
# what happens when bednets are not so effective or malaria is harder
# to detect.  We can modify the parameters fed into the model to test
# these questions manually or we could use the SimpleSim function from
# a previous post.

SimpleSim(fun=MalariaIV,
npop=c(100,1000),
t0comp=c(.05,.25),
t1comp=c(.5,.75),
t2comp=c(.25,.5),
malariaRT=c(.85,.5),
netRT=c(.75,.45),
Tdetect=c(1,.7),
Fdetect=c(0,.3),
alpha=.05,
nsim=10)
# This could take a little while to run since there are 256 combinations
# to try and each of them will be run 10 times.

# The above command gives back lots of data but it is not always very easy
# to understand in a matrix form.  It is often easier to just vary one
# paramter at a time.

sample.size <- SimpleSim(fun=MalariaIV,
npop=c(500,1000*1:10),
t0comp=c(.25),
t1comp=c(.75),
t2comp=c(.375),
malariaRT=c(.5),
netRT=c(.35),
Tdetect=c(.8),
Fdetect=c(.2),
alpha=.05,
nsim=200)

# Looking at just sample size
require(ggplot2)

# Save the results to single long data format to be useable by ggplot2
results <- with(sample.size, data.frame(reject = as.numeric(
c(iv,treat2,treat1)),
id=rep(c("iv","treat2","treat1"), each=length(npop)),
npop))

p <- ggplot(results, aes(npop, reject))
# Normally a 80% detection rate is the minimum rejection rate needed
# to justify a study.
p + geom_point(aes(colour =id)) +
geom_line(aes(group=id)) +
geom_hline(yintercept = .8)


# Thus we want to ensure our study has at least 5000 participants.

# I think there might be some additional considerations for the
# number of participants required to ensure that there is no
# false rejection of the null.  However, I am no expert on Power
# Analysis so this is what I know.
Formatted by Pretty R at inside-R.org

## Monday, July 22, 2013

If you are anything like me then you have dozens if not hundreds of personal functions that you have written to accomplish a number of tasks.  Many of the tasks that you would like to do are similar to previous tasks that you have already done.  So how are you going to bridge the functions across your disjoint projects?

There are a four good options I see which might fit different projects:

1. You copy and paste your old code at the beginning of your new code so that when you run the code the previously designed functions are available.  Pros: simple, Cons: makes your code a lot longer than you may prefer.

2. You create a package or the skeleton of the package and load your functions into it as you write them. When you write new code that uses those functions you just load your package. Pros: this creates very clean code, Cons: this can be a little complicated and your code might not be at a level of refinement that would warrant packaging up your code and then potentially having to make revisions on it and updating the package frequently.

3. You could source your code with the "source" function.  Pros: this allows your code to be nearly as clean as loading your code as its own package and it allows you to easily update your source file as frequently you may want.  Cons: Nothing obvious.

4. Set up start-up profiles with R so that you can customize your programming environment easily.  This solution is kind of the point of this post however it is not clear to me that this is the best option.  However, this is what I have done.  I used Quick-R: Customizing Startup article as a reference.

In the .Rprofile file which I found in "C:/users/user_name/Documents":

.First &lt;- function(){
# Load up functions that I always want to be available.
source("C:/Dropbox/R-Startup-Profiles/Base.R")

print("Profiles Available: Shiny, Sim, Concerto")
Shiny <<- function() source("C:/Dropbox/R-Startup-Profiles/Caller.R")
Sim <<- function() source("C:/Dropbox/R-Startup-Profiles/Simulation.R")
Concerto <<- function() source("C:/Dropbox/R-Startup-Profiles/Concerto.R")
}


Formatted by Pretty R at inside-R.org

So now every time R startups it will load my always needed functions which are stored in the Base.R file.  Then I will choose which environment I want to program in.  If for instance I want to program a Shiny app then I would just send the command:
>Shiny()
and that will load my Shiny profile which is pretty simple:

cat("Loading Shiny!")
library(shiny)
library(shinyIncubator)
setwd("c:/dropbox/shiny_r/")


Formatted by Pretty R at inside-R.org

## Friday, July 19, 2013

### Simulation of Blackjack: the odds are not with you

I often want to simulate outcomes varying across a set of
parameters. In order to accomplish this in an efficient
manner I have coded up a little function that takes parameter
vectors and produces results. First I will show how to set it
up with some dummy examples and next I will show how it can
be used to select the optimal blackjack strategy.

SimpleSim <- function(..., fun, pairwise=F) {
# SimpleSim allows for the calling of a function varying 
  # multiple parameters entered as vectors. In pairwise form
  # it acts much like apply. In non-paiwise form it makes a
  # combination of each possible parameter mix
  # in a manner identical to block of nested loops.

returner <- NULL
L        <- list(...)
# Construct a vector that holds the lengths of each object
vlength <- unlist(lapply(L, length))
npar    <- length(vlength)
CL      <- lapply(L, "[", 1) # Current list is equal to the first element

# Pairwise looping
if (pairwise) {
# If pairwise is selected than all elements greater than 1 must be equal.
# Checks if all of the elements of a vector are equal
if (!(function(x) all(x[1]==x))(vlength[vlength>1])) {
print(unlist(lapply(L, length)))
stop("Pairwise: all input vectors must be of equal length", call. =F)
}
for (i in 1:max(vlength)) { # Loop through calling the function
CL[vlength>1]  <- lapply(L, "[", i)[vlength>1] # Current list
returner <- rbind(returner,c(do.call(fun, CL),pars="", CL))
}
} # End Pairwise

# Non-pairwise looping
if (!pairwise) {
ncomb <- prod(vlength) # Calculate the number of combinations
print(paste(ncomb, "combinations to loop through"))
comb <- matrix(NA, nrow=prod(vlength), ncol=npar+1)
comb[,1] <- 1:prod(vlength) # Create an index value
comb <- as.data.frame(comb) # Converto to data.frame
colnames(comb) <- c("ID", names(CL))
for (i in (npar:1)) { # Construct a matrix of parameter combinations
comb[,i+1] <- L[[i]] # Replace one column with values
comb<-comb[order(comb[,(i+1)]),] # Reorder rows
}
comb<-comb[order(comb[,1]),]
for (i in 1:ncomb) {
for (ii in 1:npar) CL[ii] <- comb[i,ii+1]
returner <- rbind(returner,c(do.call(fun, CL),pars="", CL))
}
} # End Non-Pairwise

return(returner)

} # END FUNCTION DEFINITION

# Let's first define a simple function for demonstration
minmax <- function(...) c(min=min(...),max=max(...))

# Pairwise acts similar to that of a multidimensional apply across columns
SimpleSim(a=1:20,b=-1:-20,c=21:40, pairwise=T, fun="minmax")
# The first set of columns are those of returns from the function "fun" called.
# The second set divided by "par" are the parameters fed into the function.
SimpleSim(a=1:20,b=-1:-20,c=10, pairwise=T, fun="minmax")

# Non-pairwise creates combinations of parameter sets.
# This form is much more resource demanding.
SimpleSim(a=1:5,b=-1:-5,c=1:2, pairwise=F, fun="minmax")

# Let's try something a little more interesting.

# Let's simulate a game of black jack strategies assuming no card counting is possible.
blackjack <- function(points=18, points.h=NULL, points.ace=NULL,
cards=10, cards.h=NULL, cards.ace=NULL,
sims=100, cutoff=10) {
# This function simulates a blackjack table in which the player
# has a strategy of standing (not asking for any more cards)
# once he has either recieved a specific number of points or
# a specific number of cards.  This function repeates itself sims # of times.
# This function allows for up to three different strategies to be played.
# 1. If the dealer's hole card is less than the cuttoff
# 2. If the dealer's hole card is greater than or equal to the cuttoff
# 3. If the dealer's hole card is an ace
# In order to use 3 level strategies input parameters as .h and .ace

# It returns # of wins, # of losses, # of pushes (both player and dealer gets 21)
# and the number of blackjacks.

# This simulation assumes the number of decks used is large thus
# the game is like drawing with replacement.

if (is.null(points.h))   points.h   <- points
if (is.null(points.ace)) points.ace <- points.h
if (is.null(cards.h))    cards.h    <- cards
if (is.null(cards.ace))  cards.ace  <- cards.h

bdeck <- c(11,2:9,10,10,10,10) # 11 is the ace

bdresult <- c(ppoints=NULL, pcards=NULL, dpoints=NULL, dcards=NULL)

for (s in 1:sims) {
dhand <- sample(bdeck,1) # First draw the deal's revealed card
phand <- sample(bdeck,2, replace=T)

# Specify target's based on dealer's card
if (dhand<cutoff) {
pcuttoff <- points
ccuttoff <- cards
}
if (dhand>=cutoff) {
pcuttoff <- points.h
ccuttoff <- cards.h
}
if (dhand==11) {
pcuttoff <- points.ace
ccuttoff <- cards.ace
}

# player draws until getting above points or card count
while ((sum(phand)<pcuttoff)&(length(phand)<ccuttoff)){
phand <- c(phand, sample(bdeck,1))
# If player goes over then player may change aces to 1s
if (sum(phand)>21) phand[phand==11] <- 1
}

# Dealer must always hit 17 so hand is predetermined
while (sum(dhand)<17) {
dhand <- c(dhand, sample(bdeck,1))
# If dealer goes over then dearler may change aces to 1s
if (sum(dhand)>21) dhand[dhand==11] <- 1
}
bdresult <- rbind(bdresult,
c(ppoints=sum(phand), pcards=length(phand),
dpoints=sum(dhand), dcards=length(dhand)))
}

# Calculate the times that the player wins, pushes (ties), and loses
pbj <- (bdresult[,1]==21) & (bdresult[,2]==2)
dbj <- (bdresult[,3]==21) & (bdresult[,4]==2)
pwins <- ((bdresult[,1] >  bdresult[,3]) & (bdresult[,1] <  22)) | (pbj & !dbj)
push  <- (bdresult[,1] == bdresult[,3]) | (pbj & dbj)
dwins <- !(pwins | push)

# Specify the return.
c(odds=sum(pwins)/sum(dwins),
pwins=sum(pwins),
dwins=sum(dwins),
push=sum(push),
pcards=mean(bdresult[,2]),
dcards=mean(bdresult[,4]),
pblackjack=sum(pbj),
dblackjack=sum(dbj))
}

blackjack(points=18, sims=4000)
# We can see unsurprisingly, that the player is not doing well.

blackjack(points=18, points.h=19, sims=4000)
# We can see that by adopting a more aggressive strategy for when
# the dealer has a 10 point card or higher, we can do slightly better.
# But overall, the dealer is still winning about 3x more than us.

# We could search through different parameter combinations manually to
# find the best option.  Or we could use our new command SimpleSim!

MCresults <- SimpleSim(fun=blackjack, points=15:21, points.h=18:21,
points.ace=18:21, cutoff=9:10, cards=10, sims=100)

# Let's now order our results from the most promising.
MCresults[order(-unlist(MCresults[,1])),]

# By the simulation it looks like we have as high as a 50% ratio of loses to wins.
# Which means for every win there are 2 loses.
# However, I don't trust it since we only drew 100 simulations.
# In addition, this is the best random draw from all 224 combinations which each
# have different probabilities.

# Let's do the same simulation but with 2000 draws per.
# This might take a little while.
MCresults <- SimpleSim(fun=blackjack, points=15:21, points.h=18:21,
points.ace=18:21, cutoff=9:10, cards=10, sims=5000)

# Let's now order our results from the most promising.
MCresults[order(-unlist(MCresults[,1])),]

hist(unlist(MCresults[,1]), main="Across all combinations\nN(Win)/N(Loss)",
     xlab = "Ratio", ylab = "Frequency")

# The best case scenario 38% win to loss ratio appears around were we started,
# playing to hit 18 always and doing almost as well when the dealer is high
# (having a 10 or ace) then playing for 19.

# Overall, the odds are not in our favor.  For every win we expect 1/.38 (2.63) loses.
Highlighted by Pretty R at inside-R.org

## Thursday, July 18, 2013

* Stata code

* When I first started taking stats there was some discussion between the merits of R2 measures and that of adjusted R2.

* People were concerned that including any additional estimators by definition increased the R2 measure so the need to come up with a measure that did not depend on number of regressors.

* In this small command I generates a dependent variable then generate independent explanatory variables and see what happens to the r2 and adjusted r2 when we increase the number of explanatory variables.
cap program drop R2r
program define R2r
clear
set obs 2' // The second argument is the number of observations
tempvar y
gen y' = rnormal()  // Generate the dependent variable
forv i=1(1)1' {  // Loop from 1 to the number of variables defined as the first argument.
tempvar vi'
gen vi'' = rnormal()
}
reg y' v1'-v1'' // Do the estimation.
end

set seed 1

R2r 2 10000 /// The r-squared is quite small with only two dependent variables
R2r 20 10000 /// The r-squared is much larger

* But we should not take the results of just two simulations lets try this using the simulate command
simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 2 10000
sum
* The R2 is a little less than .2%
* The adjusted R2 is a little less than 0

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 20 10000
sum
* Now the R2 using 20 observations is close to .2%
* The adjusted R2 is very close to zero

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 200 10000
sum
* Almost identical results with the 2 squared on average being around 2%

* Using 1000 observations the r2 is more sensitive
simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 2 1000
sum
* Now the R2 is a little greater than .2%
* The adjusted R2 is little greater than .02%

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 20 1000
sum
* Now the R2 using 20 observations is 2%

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 200 1000
sum
* Now the average R2 squared is greater tha 20%

* Overall, the take away seems to be, only worry about the R2 when the number of observations are low and the number of regressors are large.

## Tuesday, July 9, 2013

### A Sudoku Puzzle Solver - attempt 1

I have programmed up a R based Sudoku problem solver for Sudoku puzzles of that only require simple inference.  In these puzzles a solution can be found using only first order inference.  This solver can be found at the end of the code located in the git:

https://github.com/EconometricsBySimulation/2013-06-14-Sudoku

A puzzle of this form can look like this (generated from the R code on the git)
A solution to this problem can be found using the following steps generated from the solver:
[1] "Round 1 -sub out 1 4 with 4"  That is sub out row 1 column 4 with a 4.
[1] "Round 1 -sub out 1 6 with 5"
[1] "Round 1 -sub out 2 7 with 5"
[1] "Round 1 -sub out 2 8 with 3"
[1] "Round 1 -sub out 2 9 with 8"
[1] "Round 1 -sub out 3 5 with 9"
[1] "Round 1 -sub out 3 9 with 2"
[1] "Round 1 -sub out 4 7 with 2"
[1] "Round 1 -sub out 5 5 with 4"
[1] "Round 1 -sub out 5 8 with 6"
[1] "Round 1 -sub out 6 1 with 2"
[1] "Round 1 -sub out 6 6 with 9"
[1] "Round 1 -sub out 7 5 with 5"
[1] "Round 1 -sub out 8 9 with 9"
[1] "Round 1 -sub out 9 1 with 3"
[1] "Round 1 -sub out 9 2 with 2"
[1] "Round 1 -sub out 9 3 with 4"
[1] "Round 1 -sub out 9 6 with 1"
[1] "Round 1 -sub out 9 9 with 7"
[1] "Round 2 -sub out 1 1 with 1"
[1] "Round 2 -sub out 1 2 with 3"
[1] "Round 2 -sub out 1 3 with 2"
[1] "Round 2 -sub out 2 1 with 7"
[1] "Round 2 -sub out 3 1 with 6"
[1] "Round 2 -sub out 3 2 with 5"
[1] "Round 2 -sub out 3 8 with 1"
[1] "Round 2 -sub out 4 2 with 8"
[1] "Round 2 -sub out 4 4 with 6"
[1] "Round 2 -sub out 4 8 with 9"
[1] "Round 2 -sub out 4 9 with 5"
[1] "Round 2 -sub out 5 2 with 7"
[1] "Round 2 -sub out 5 3 with 5"
[1] "Round 2 -sub out 5 4 with 2"
[1] "Round 2 -sub out 5 9 with 3"
[1] "Round 2 -sub out 7 1 with 8"
[1] "Round 2 -sub out 7 2 with 9"
[1] "Round 2 -sub out 7 3 with 6"
[1] "Round 2 -sub out 8 4 with 8"

The results look like this.  You can see that it was not able to solve all of the missing blocks though it is pretty close.
 Interestingly, I think this puzzle has two potential solutions.
This solver is not at the level I wanted it to be at.  I can imagine two more levels of inference that it could handle: one in which the solution uses secondary inference information.  Such as knowing that if it chooses a particular value the other block cannot be filled with that value.  This I call a type 2 and finally a guess and check method which I call a type 3.  The guess and check method should be easy to program but have the difficulty that if there are many squares not solved by the previous methods, it will potentially bog down the machine as it has to check a near infinite number of potential solutions.

However, there should be very few puzzles in which guess and check would be required since those puzzles would be even harder for most people to solve than they would for an algorithm.

I am not satisfied with this solution but I wanted to post this because I had worked it out a few weeks ago and my real work is calling so I am not sure when I will get back to this.