Saturday, March 30, 2013

the catR package for Computer Adaptive Test design


# The catR package, authored by David Magis (U Liege, Belgium) and Gilles Raiche (UQAM, Canada), facilitates the generation of "response patterns" for computer adaptive tests.  There are a number of applications for this type of package.  The primary one is the use of simulation to validate psychometric procedures.

# Another promising use of the package is deploying it in conjunction with a computer adaptive test on Concerto (an open source testing platform put forward by the Psychometrics Centre at the University of Cambridge).

# In this post I will explore the use of this package in generating data and simulating test administration and scoring.

require(catR)

# catR makes it very easy to simulate an item bank under either the 1PL, 2PL, 3PL, or 4PL model.

# Frequently you might already have an item bank that you are using.  However, if you don't, you can easily simulate one with the following command.  The command does more than simulate item parameters: it also computes the information table for your item bank.

# Will create an item bank of 400 items
mybank = createItemBank(model="2pl", items=400, thMin=-6, thMax=6)

# The theta grid on which the item information is computed is:
theta = mybank$theta

# The test information across all 400 items can be generated:
information = apply(mybank$infoTab,1,sum)
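# (my addition) infoTab holds the item information I_i(theta) for each item on
# the theta grid; the test information function is their sum,
# I(theta) = sum_i I_i(theta), which is what the apply() call computes row by
# row across the grid.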

# The test information function looks normal in shape because item information peaks near each item's b parameter, and the b parameters are drawn from a normal distribution centered at 0.
plot(theta, information, type="l", main="The b parameter default is normal(0,1)")

# We can see the parameters of the items we drew.
mybank$itemPar

# However, I am unsatisfied with an item bank that has more information at the mean and less at the tails, since one of the promises of computer adaptive tests is that they have the potential to be equally precise at all levels of the ability spectrum.  I will instead generate an item bank that reflects this hope.

mybank2 = createItemBank(model="2pl",  bPrior=c("unif",-4,4), items=400, thMin=-6, thMax=6)
information2 = apply(mybank2$infoTab,1,sum)

# We can see that information2 has much less information at the center of the distribution and more near the tails.
plot(theta, information, type="l", main="A wider spread of information means less at the center")
lines(theta, information2, pch=22, lty=2, lwd=2)

# We can check whether both item banks have approximately the same amount of information over the tested range (-6, 6).  I chose -6 and 6 because it makes the two information totals look more similar.  If we restrict attention to the likely range of theta, (-4, 4), the first bank has much more information.
sum(information)
sum(information2)
# Both item pools have approximately the same amount of information (ignoring statistical noise) across the entire range of thetas considered.  However, the bank with a normally distributed b parameter has more information at the likely values of ability.  That said, we might not care to have an extremely precise estimate of ability at the center of the distribution if we have little precision near the ends.
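# As a quick check of the claim above (my addition), total the information over
# only the likely ability range (-4, 4):
likely = abs(theta) <= 4
sum(information[likely])
sum(information2[likely])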

# Let's say we would like to require that the test provides a minimum information of 30 for any student taking it with ability between -4 and 4.

abline(h = 30)
abline(v = c(-4,4))




# Using the uniform draw does not give us a perfect fit, but it is better than the normal distribution.

# There are other, more complex ways of ensuring that this criterion is met, dealt with in one of my prior posts (http://www.econometricsbysimulation.com/2012/11/matching-test-information.html).  I will not go into that here.

# Let's create a simple simulation of a cat test with our item bank.

# First let's draw the subjects true ability:

true.theta = rnorm(1)

# Let's select our first item.  Our initial estimate of theta is 0.  nextItem has quite a number of interesting options that make it flexible; the out argument tells it which items not to select.
item.choice = nextItem(mybank2,theta=0, out=NA)

# our item is:
item.choice$item

# The parameters for that item are:
item.choice$par

# First I reshape the item parameters so they can be fed into the probability function Pi.
pars = t(as.matrix(item.choice$par))

# This will give us the probability that the subject will get the answer right.
probCorrect = Pi(true.theta, pars)$Pi
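# (my addition) For reference, Pi evaluates the 4PL response function
# P(theta) = c + (d - c) / (1 + exp(-D*a*(theta - b))), with D = 1 by default;
# for the 2PL items simulated here c = 0 and d = 1.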

# Now let's do a random draw to see if the subject answered correctly (by simulation of course)
correct = (probCorrect>runif(1))

# Finally, let's estimate the subject's ability.  I will use the Bayesian EAP estimator since the maximum likelihood estimator does not function well with very few observations.
theta.est = eapEst(pars, correct)

# Now we would take our theta estimate, plug it back into the item selection, and repeat until we meet some test termination criterion.

# For the sake of simplicity, I will loop through a lengthy 100-item test.

# Let's start from this point.
nitems = 100

user.response = matrix(NA, nrow=nitems, ncol=1)
user.response[1,] = correct

user.theta.est = theta.est

items.taken = item.choice$item
items.parm = matrix(NA, nrow=nitems, ncol=4)
items.parm[1,] = item.choice$par


for (i in 2:nitems) {
  # Select the next item based on the most recent ability estimate,
  # excluding items already administered.
  item.choice = nextItem(mybank2, theta=user.theta.est[i-1], out=items.taken)
  items.taken = c(items.taken, item.choice$item)
  items.parm[i,] = item.choice$par
  # Simulate the response and update the EAP ability estimate.
  user.response[i] = Pi(true.theta, t(as.matrix(items.parm[i,])))$Pi > runif(1)
  user.theta.est[i] = eapEst(items.parm[!is.na(user.response),], user.response[!is.na(user.response)])
}

# Note, convergence is not guaranteed in any particular run, nor can we administer an unlimited number of items, since there are only 400 items in our pool.
plot(1:nitems, user.theta.est, type="b", main="Estimates should converge on true ability"
     , ylab = "Theta Estimates", xlab = "Number of items administered")
abline(h=true.theta, lwd=3, col="darkblue")



# catR even has a command called randomCAT which takes a particular item bank and true theta and generates a simulated test and response pattern for that user, just as we did above, but with many, many options.  It is unlikely to solve all of your needs, but the creators of catR should be congratulated on their thoroughness!
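# A minimal sketch of what a randomCAT call might look like (my addition;
# argument names and defaults vary across catR versions, so check ?randomCAT
# for your installed release):
res = randomCAT(trueTheta = true.theta, itemBank = mybank2,
                stop = list(rule = "length", thr = 20))
res$thetaProv  # provisional ability estimates after each administered item
plot(res)      # plots the estimated theta path, much like our manual loop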

# This post gives a simplified run-through of some of catR's capabilities.  Overall, I am very impressed and look forward to using it in my work.

Friday, March 29, 2013

Capacity for Love and Prejudice - Stata Simulation


* This is my simple hypothesis (really my own personal prejudice):
* the more prejudiced someone allows themselves to be, the less capacity they have for love.

* In this simulation I will attempt to generate a graph which conveys this idea, with a pink equals sign superimposed on it :)

* Sorry for the blatant politicizing.

* However, I think this is a significant human rights battle being waged in the US.

* Though, it is not the last human rights battle that needs to be waged, nor perhaps the most dire.

* Other important human rights issues are criminal sentencing, hunger and poverty among children, and continued widespread racial discrimination.

* However, this is the battle of today and should be addressed.

* In this very simple simulation I will attempt to generate a graphic which is a variant of the pink equal sign.
clear
set obs 400

gen prejudice = (runiform()-.5)*2
gen unobserved = rnormal()
gen love = unobserved - 3*prejudice

twoway (lpolyci love prejudice, fcolor("242 191 241") blcolor("255 128 192") degree(5) /*
  This first line will generate a best fit line through the data with a 5 degree polynomial
        */ , text(0 0 "=", size(full) color("255 128 255") ) ) /*
  This will place an equals sign at the coordinates 0,0 and make it "full" size, which is large
    but not as large as I would prefer it to be.
       */ (scatter love prejudice, mcolor("red") ), /*
  This draws my generated data points on the graph
  */ ytitle(capacity for love, size(large)) ytitle(, color("255 128 255")) ylabel(none) /*
  This will tell the y axis what to place on it and what color to make it.
    The labeling of the y axis is suppressed by the ylabel(none) option.
  */ xtitle(capacity for prejudice, size(large)) xtitle(, color("255 128 255")) xlabel(none) /*
  */ legend(off)  graphregion(fcolor("168 0 0"))
  * I turn the legend off and change the background color to be a crimson.


Thursday, March 28, 2013

Generate GIS integrated animation: Gun deaths animation - R

# I wanted to briefly revisit a previous post in order to update the graphics.



# The code used in the previous post has also become more useful since the database of gun deaths has tripled since the original post (yay!).

# If you would like to generate your own graph, make sure to use the R code attached to the original post rather than the code listed in the post itself.


# We can see a steady decline in gun deaths.  I am not sure what exactly is driving this.  I suspect a decrease in gun suicides as winter comes to an end, though I expect gun deaths to rise as murders increase during the summer months.

Wednesday, March 27, 2013

ML and initial values


* This post explores the setting of initial values with the ML command.

* There are several reasons you may be interested in doing this.

* 1. You are having difficulty getting your estimation to converge and you
*        suspect the problem is that your initial values are infeasible.
* 2. You may already have reasonable estimates and therefore believe you
*        might gain efficiency by plugging them in as initial values.
* 3. You are interested in using Stata's approximation algorithm to compute
*        standard errors at a given set of parameter values.

* Let's first start with a simple data set.
clear
set obs 1000

gen x = runiform()*2

gen u = rnormal()*5

gen y = 2 + 2*x + u
* x predicts y

graph twoway (lfitci y x) (scatter y x), title("X predicts Y reasonably well")



* Let's say we are interested in estimating the three parameters in our data:
* the coefficient on x and the constant, as well as the variance of the error (25).
* We can use the maximum likelihood program myNormalReg to estimate all three
* simultaneously.

cap program drop myNormalReg
program define myNormalReg
  args lnlk xb sigma2
  qui replace `lnlk' = -ln(sqrt(`sigma2'*2*_pi)) - ($ML_y1-`xb')^2/(2*`sigma2')
end
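* For reference (my addition): the per-observation log likelihood coded above is
*   ln L_i = -(1/2)*ln(2*pi*sigma2) - (y_i - xb_i)^2/(2*sigma2),
* the log of the normal density, since -ln(sqrt(sigma2*2*_pi)) is the same as
* -(1/2)*ln(2*pi*sigma2).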

ml model lf myNormalReg (reg: y = x) (sigma2:)
  ml max
* Our estimates look reasonable.  We iterated 4 times and converged quickly.
 
* This is our benchmark.

* We also know that OLS produces the same coefficient estimates as the MLE above,
*    though it does not directly estimate sigma2 and its standard errors use a
*    small-sample degrees-of-freedom correction.
reg y x

* So let's try the previous MLE using our OLS estimates as starting values
ml model lf myNormalReg (reg: y = x) (sigma2:)
  ml init reg:x = `=_b[x]'
  ml init reg:_cons = `=_b[_cons]'
  ml max

* Unfortunately, perhaps this simple setup is too easy to solve.
* There is no noticeable difference in convergence when using the OLS estimates as starting values.

* Alternatively let's see if things improve if we use the true values.
ml model lf myNormalReg (reg: y = x) (sigma2:)
  ml init reg:x = 2
  ml init reg:_cons = 2
  ml init sigma2:_cons = 25
  ml max
 
* The results are pretty much the same as above.

* Finally, let's imagine that we would like to estimate the standard errors at the true
*    parameter values as if ml had converged on the truth.  This might be useful if you
*    have two methods for maximizing an equation: one that converged but did not give
*    valid standard errors, and one that did not converge but gave valid standard errors.
ml model lf myNormalReg (reg: y = x) (sigma2:)
  ml init reg:x = 2
  ml init reg:_cons = 2
  ml init sigma2:_cons = 25
  ml max, iter(0)

Tuesday, March 26, 2013

A simple simulation of Hierarchical Linear Modelling (HLM) using two different R packages - intercept only RE


# HLM is a common tool used to analyze hierarchically structured data.

# Let's see it in action using a couple of the tools programmed up by HLM researchers.

# First let's generate a data set.

# Imagine we have a two level model, students nested within classrooms.

# Let's say we have 20 classrooms

nclass = 20

# And thirty students per classroom

nstud = 30

# Let's imagine that we have a classroom effect that varies randomly and is uncorrelated with the student level effect.

class.effect = rnorm(nclass)*2

# Imagine also we have a unique outcome per student.

# We have nclass*nstud number of students in total.

student.effect = rnorm(nstud*nclass)*3

# Now we have all the data we need in order to generate our analysis.
data.set = data.frame(class.id = rep(1:nclass, each=nstud),
                  class.effect = rep(class.effect, each=nstud),
                  student.id = 1:(nstud*nclass),
                  student.effect = student.effect)

data.set$outcomes = data.set$class.effect + data.set$student.effect
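# In equation form, the data generating process above is (my notation):
#   outcomes_ij = class.effect_j + student.effect_ij
# with class.effect_j ~ N(0, 2^2) and student.effect_ij ~ N(0, 3^2),
# for student i in classroom j.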

head(data.set)
# Looking good.  Now let's load our HLM package.

require(nlme)

# For quick reference I used:
# https://txcdk.unt.edu/iralab/sites/default/files/HLM_R_Handout.pdf

lme(outcomes ~ 1, random = ~ 1 | class.id, data=data.set)

# We can see that lme is pretty good at this: it estimates the standard deviation of the level 2 (classroom) effect at around 2 and the level 1 (student) residual at around 3.
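# As a side check (my addition), the intraclass correlation implied by the
# simulation is var(class)/(var(class) + var(student)) = 4/(4 + 9), about .31.
# We can compare that against the fitted variance components:
fit = lme(outcomes ~ 1, random = ~ 1 | class.id, data=data.set)
VarCorr(fit)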

# Alternatively we can use the lme4 package.
require(lme4)

# For quick reference I went to:
# http://lme4.r-forge.r-project.org/book/Ch1.pdf

lmer(outcomes ~ 1 + (1|class.id), data=data.set)
# Reassuringly, we can see that the estimates are identical, though lme4 seems to provide better organized output.

Monday, March 25, 2013

asim command


* After using the msim command from a previous post, I decided I did not like having many different variables for each simulation run.

* So I have recoded an alternative command called asim

* This command allows subsequent simulate commands to be run, appending the old data in memory to the newly simulated data.
cap program drop asim
program define asim
  * This will allow the user to specify a value to assign to the asim variable which is generated after the simulation is run.
  gettoken before after : 0 , parse(":")
  local simulation = subinstr("`after'", ":", "", 1)

  tempfile tempsave
  cap save `tempsave'
  `simulation'

  * This will append in the new simulation data to the old data set.
  gen asim = "`before'"
  cap append using `tempsave'
end


* Let's write a nice little program that we would like to simulate.
cap program drop simCLT
program define simCLT

  clear
  set obs `1'
  * 1 is defined as the first argument of the program sim

  * Let's say we would like to see how many observations we need for the central limit theorem (CLT) to hold for a bernoulli distribution.
  gen x = rbinomial(1,.25)

  sum x
end

* So let's see first how the simulate command works initially.


simulate, rep(200): simCLT 100

* The simulate command will automatically save the returns from the sum command as variables (at least in version 12)

* But instead let's try our new command!
clear
* Clear out the old results
asim 0100: simulate, rep(200): simCLT 100

asim 0200: simulate, rep(200): simCLT 200

* Looks good!

asim 0400: simulate, rep(200): simCLT 400

asim 1000: simulate, rep(200): simCLT 1000

asim 10000: simulate, rep(200): simCLT 10000

asim 100000: simulate, rep(200): simCLT 100000

* Standardize the individual means of each run so as to more easily compare them with each other.
bysort asim: egen sd_mean = sd(mean)
bysort asim: egen mean_mean = mean(mean)

gen std_mean = (mean-mean_mean)/sd_mean

hist std_mean, kden by(asim)
* We can see that this generated data is much easier to work with than having many different variables for each run.

* It looks a little funny with some of the numbers having leading zeros.  However, this keeps the runs ordered, given that the asim variable is text.

Friday, March 22, 2013

My not so Brief Stata Formatting Guide

* I write this as a short guide though I do not always stick to it.  This post was inspired by a thoughtful discussion in the LinkedIn Stata-Users group.

* The number one rule is: Always, always comment your code!

* See my reasons:
* http://www.econometricsbysimulation.com/2012/07/7-reasons-to-comment-your-code.html

* I prefer using <*> before a comment as the primary method of commenting.
* And <//> before my lines when in mata.

* I only like using /* and */ when I have a large amount of comments.

* I think it is useful to add comments after commands, like:
clear // remove data in memory

* Though with long commands I think it is hard to read.  For example:
twoway (scatter  length gear_ratio) (scatter  foreign mpg_price) (scatter  price mpg) ///
  , title("This is a useless and meaningless graph") // Graphs length against gear_ratio

* -----------------------------SAMPLE DOCUMENT----------------------------
/***** Title of Do File

Description of do file.  This might have several paragraphs, for which
I recommend hard breaking the lines since Stata does not have word wrap.

*********************** Section 0: Initialization **********************

If your do file is very long and has multiple sections consider including
an index.

Index
1. Parameter declaration
2. Input/clean data, generate temporary data
3. Manipulate variables
4. Generate summary statistics
5. Generate estimates
6. Delete temporary data/variables

I might also consider including a variable glossary at the beginning of your do file.

For example:
cntgdp: Country GDP
cntgdp2: Country GDP demeaned
year: Year
nsrvy: Number of Survey Wave

As for naming variables, I would suggest not letting variables get longer
than six letters and two numbers long.

For example the variables might mean:
dstgdpp: district gross domestic product per capita, nominal currency of that year.
dstgdppcchgyr00: district gross domestic product per capita change from year 2000.
It might seem like a good idea to write it this way but it is really confusing
to read, especially since Stata will start truncating variable names.

I would suggest instead naming variables something like this:
dstgdp1: district gross domestic product per capita change from year 2000.
dstgdp0: district gross domestic product per capita, nominal currency of that year.

Have two places where you define the variables:
the variable glossary at the beginning of your document and the label that you
give your variables.
*/
******************** End Section 0: Initialization **********************


****************** Section 1: Parameter declaration *********************

* Often times you might find it useful to specify globals or locals that help you
* control your analysis when you run your file.

* For example:
global exmin = 1
* When set to 1 minorities will be excluded from the analysis.

global ppp = 0
* When set to 1 purchasing power parity will be used instead of GDP per capita.

* Of course you will need to code up within the analysis what the globals actually do.


* Specify a working directory.  This can be done with the "cd" command.

* Personally I don't think this is sufficient.
* Often I am loading multiple data files from multiple directories.

* I prefer using globals specified in the parameter section.
* This allows users to have slightly or largely different file organization,
* yet still be able to run your analysis.

* For example:
* Use globals to specify directories of interest
* Read directory
global rdir = "C:/data_files/my_project/original_data/"
* Save directory
global sdir = "C:/my_project/modified_data/"

**************** End Section 1: Parameter declaration *********************



****************** Section 2: Input/clean data *********************

* When you load in data, always first load it and then save a copy of it somewhere else.

* Load original data:
sysuse auto, clear

* Save data to a new directory where it will never accidentally overwrite your original data
save "${sdir}auto.dta", replace

*************** End Section 2: Input/clean data *********************



****************** Section 3: Manipulate variables *********************

* Always give your variables labels when you define them.
gen mpg_price = mpg*price
  label var mpg_price "Miles Per Gallon times Price"
* Use spaces to help denote commands which are secondary.
* Never use tabs instead of spaces
* because
* they
* are
* hard
* to
* read
* and can
* substantially
  * decrease
  * your
  * page space
  * Also, your code may
  * look different with different
  * programs.
  * This is very annoying.
  * I stuck a
  * lot of spaces to
  * simulate the 
  * Stata editor.
* Always explain why you do things.
drop if foreign == 1
  * We only are interested in domestic cars (for example).

* When doing any kind of looping also use indentation:
forv i = 1(1)10 {
  * When using forvalues, prefer i = 1(1)10 over the equivalent i = 1/10,
  * since the 1/10 notation can cause problems when used with macros.

  * It is very important to indent.
  if (`i' == 3) {
    * Do something

* I am displaying filler text when i==3 and only then
di "Filler"

* This will display i squared when i==3 (which is obviously 9)
di `i'^2


  }
  * End if
}
* End forv i loop

* Also, indent when commands go on multiple lines in length.
twoway (scatter  length gear_ratio) (scatter  foreign mpg_price) (scatter  price mpg) ///
  , title("This is a useless and meaningless graph")

* This can make commands much easier to read.

*************** End Section 3: Manipulate variables *********************

 * Also, take a look at some of the comments below.  There have been some very thoughtful contributions by Stata users.

Thursday, March 21, 2013

MSimulate Command

CLT is a powerful property

* Stata has a wonderfully effective simulate command that allows users to simulate data and analyses very rapidly.

* The only drawback is that when you run it, it will replace the data in memory with the simulated data automatically.

* This is not a big problem if you stick a preserve in front of your simulate command.

* However, you may want to run sequential simulates and keep the data from all of the simulations together, rather than only temporarily accessible.

* Fortunately we can accomplish this task by writing a small program.

cap program drop msim
program define msim

  * Gettoken will split the arguments fed into msim into those before colon and those after.
  gettoken before after : 0 , parse(":")

  * I really like this feature of Stata!

  * First let's strip the colon.  The 1 is important since we want to make sure to only remove the first colon.
  local simulation = subinstr("`after'", ":", "", 1)

  * Now what I propose is that the argument in `before' is used as an extension for names of variables created by the simulate command.

  * First let's save the current data set.


  * Generate an id that we will later use to merge in more data
  cap gen id = _n

  * Save the current data to a temporary location
  tempfile tempsave
  * cap allows the save to fail gracefully when there is no data in memory yet
  cap save `tempsave'

  * Now we run the simulation which wipes out the current data.
  `simulation'

  * First we will rename all of the variables to have an extension equal to the first argument
  foreach v of varlist * {
    cap rename `v' `v'`before'
  }

  * Now we need to generate the ID to merge into

  cap gen id = _n

  cap merge 1:1 id using `tempsave'
  * Get rid of the _merge variable generated from the above command.
  cap drop _merge
end


* Let's write a nice little program that we would like to simulate.
cap program drop simCLT
program define simCLT

  clear
  set obs `1'
  * 1 is defined as the first argument of the program sim

  * Let's say we would like to see how many observations we need for the central
  * limit theorem (CLT) to make the means of a Bernoulli distribution look normal.
  * Remember, as long as the mean and variance are defined, the central limit
  * theorem will force the distribution of sample means to approximate a normal
  * distribution as the number of observations gets large.
  gen x = rbinomial(1,.25)

  sum x
end
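* A side note (my addition): for x ~ Bernoulli(.25), the CLT implies the sample
* mean of n draws is approximately N(.25, .25*.75/n).  With n = 100 the standard
* deviation of the simulated means should therefore be near sqrt(.1875/100),
* about .043.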

* So let's see first how the simulate command works initially

simulate, rep(200): simCLT 100



* The simulate command will automatically save the returns from the sum command as variables (at least in version 12)

hist mean, kden
* The mean is looking good but not normal

* Now normally what we would need to do is run simulate again with a different
* argument, losing the first set of results.

* But instead let's try our new command!
clear
* Clear out the old results
msim 100: simulate, rep(200): simCLT 100

msim 200: simulate, rep(200): simCLT 200

* Looks good!

msim 400: simulate, rep(200): simCLT 400

msim 1000: simulate, rep(200): simCLT 1000

msim 10000: simulate, rep(200): simCLT 10000

msim 100000: simulate, rep(200): simCLT 100000

msim 1000000: simulate, rep(200): simCLT 1000000

* The next two commands can take a little while.
msim 10000000: simulate, rep(200): simCLT 10000000

msim 100000000: simulate, rep(200): simCLT 100000000

sum mean*
* We can see that the standard deviations are getting smaller with a larger sample size.

* How is the histograms looking?
foreach v in 100 200 400 1000 10000 100000 1000000 10000000 100000000 {
  hist mean`v', name(s`v', replace)  nodraw  title(`v') kden
}

graph combine s100 s200 s400 s1000 s10000 s100000 s1000000 s10000000 s100000000 ///
  , title("CLT between 100 and 100,000,000 observations")


* We can see that the distribution of means approximates the normal distribution as the number of draws in each sample gets large.

* This is one of the fundamental findings of statistics and pretty awesome if you think about it.

Tuesday, March 12, 2013

Non-verbal Reasoning Test Construction - Part 1 Shape Creation

# R script

# I have been interested in developing a non-verbal reasoning test.  In the next few posts I will go through the code I used to generate a little over 300 items.

# First I defined a number of transformations which I will use throughout.
flip = function(z) z*-1

# Two commands to flip the images horizontally or vertically
hflip = function(Z) cbind(flip(Z[,1]), Z[,2])
vflip = function(Z) cbind(Z[,1], flip(Z[,2]))

# Commands to rotate the images 90 degrees clockwise or counterclockwise
rotateCC = function(Z) cbind(Z[,2], flip(Z[,1]))
rotateC = function(Z) -1*cbind(Z[,2], flip(Z[,1]))

# A command to rotate the coordinates 180
rotate180 = function(Z) Z*-1
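# (my addition) A quick check of the conventions above: in standard x-y
# orientation, (x, y) -> (y, -x) is a 90 degree clockwise turn and
# (x, y) -> (-y, x) is counterclockwise; so rotateCC performs the clockwise
# rotation and rotateC the counterclockwise one.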

# This command centers a vector so that the midpoint between its min and max is zero
center = function(z) z-mean(c(min(z),max(z)))

# This function centers the matrix Z at zero for each column
Center = function(Z) cbind(center(Z[,1]),center(Z[,2]))
# This function scales and centers an object so that, after centering, the largest absolute coordinate of any part of the object is .5.
CenterScale = function(Z) Center(Z)/(2*max(abs(Center(Z))))

# This function scales an object so that the distance from the (0,0) center to its farthest point is 1.
unitize = function(Z)  Z/max((Z[,1]^2+Z[,2]^2)^.5)

# This useful function samples i random draws without replacement from the vector V
rsample = function(V,i) V[order(runif(length(V)))][1:i]

# This function draws the coordinates for a polygon inscribed in the unit circle with number of sides equal to i.  If fewer than 3 sides are specified then the number of sides is set to 50, making the polygon approximate a circle.
pgon = function(i, p=0, xscale=1) {
  if (i < 3) i = 50
  cbind(sin(pi*2*(1:i+p)/i)*xscale, cos(pi*2*(1:i+p)/i))/2
}

# This function draws a star with the number of points equal to i.
star = function(i, p=0) {
    pg1 = pgon(i,p)
    pg2 = pgon(i,.5+p) * .25

    pg = rbind(pg1[1,],pg2[1,])
 
    for (j in 2:i) pg = rbind(pg, pg1[j,],pg2[j,])
    pg
}

# This function draws a rectangle.  Specify the rotation and it will construct the rectangle around the unit circle.
rectangle = function(rad=.1) rbind(c(cos(rad),sin(rad)),
                                    c(-cos(rad),sin(rad)),
                                    c(-cos(rad),-sin(rad)),
                                    c(cos(rad),-sin(rad)))*.5

# This function will draw a cross.  Specify the starting location in the same manner as the rectangle.
cross = function(rad=.1) {
  S = cos(rad*pi)
  C = sin(rad*pi)
  sreturn= rbind(c(C,S),c(-C,S),c(-C,C),c(-S,C),c(-S,-C),c(-C,-C),
        c(-C,-S),c(C,-S),c(C,-C),c(S,-C),c(S,C),c(C,C))
  unitize(sreturn)*.5
  }

# Create a shape vector (a random ordering is drawn but then overridden with
# the fixed ordering 1:23)
shape_count = 23
shape_vect = (1:shape_count)[order(runif(shape_count))]
shape_vect = (1:shape_count)

# Shape generating function
shape = function(shape_num=NA, sposition=c(0,0), size=1, dot=1, sdir=1) {
  sreturn=c(0,0)
  if (shape_num==shape_vect[1])  sreturn = pgon(1)
  if (shape_num==shape_vect[2])  sreturn = pgon(3)
  if (shape_num==shape_vect[3])  sreturn = pgon(4)
  if (shape_num==shape_vect[4])  sreturn = pgon(4,xscale=3)
  if (shape_num==shape_vect[5])  sreturn = pgon(4,.5)
  if (shape_num==shape_vect[6])  sreturn = pgon(5)
  if (shape_num==shape_vect[7])  sreturn = pgon(6,.5)
  if (shape_num==shape_vect[8])  sreturn = pgon(7)
  if (shape_num==shape_vect[9])  sreturn = pgon(8,.5)
  if (shape_num==shape_vect[10]) sreturn = star(3)
  if (shape_num==shape_vect[11]) sreturn = star(4,.5)
  if (shape_num==shape_vect[12]) sreturn = star(5)
  if (shape_num==shape_vect[13]) sreturn = star(6,.5)
  if (shape_num==shape_vect[14]) sreturn = rectangle(.2)
  if (shape_num==shape_vect[15]) sreturn = rectangle(1.8)
  if (shape_num==shape_vect[16]) sreturn = cross(.075)
  if (shape_num==shape_vect[17]) sreturn = cross(.4)
  if (shape_num==shape_vect[18]) sreturn = pgon(1,xscale=.5)
  if (shape_num==shape_vect[19]) sreturn = pgon(3,xscale=.3)
  if (shape_num==shape_vect[20]) sreturn = pgon(3,xscale=3)
  if (shape_num==shape_vect[21]) {
                        sreturn = pgon(1)[pgon(1)[,2]<.1,]
                        sreturn = cbind(sreturn[,1],sreturn[,2]+.3)
                      }

  if (shape_num==shape_vect[22]) {
                        sreturn = pgon(1)[pgon(1)[,1]<.1,]
                        sreturn = cbind(sreturn[,1]+.3,sreturn[,2])
                      }
                   
   if (shape_num==shape_vect[22]) {
                        sreturn = pgon(1)[pgon(1)[,1] < 0,]
                        sreturn.reverse = sreturn[nrow(sreturn):1,]
                        sreturn.reverse[,1] = sreturn.reverse[,1]*.5
                        sreturn = rbind(sreturn,sreturn.reverse)
                        sreturn = cbind(sreturn[,1]+.3,sreturn[,2])
                      }
  if (shape_num==shape_vect[23]) {
                        sreturn = pgon(1)[(pgon(1)[,1]>=0)&(pgon(1)[,2]>=0),]
                        sreturn = rbind(sreturn,0)
                     
                        sreturn = cbind(sreturn[,1]-.2,sreturn[,2]-.2)
                      }

  sreturn = unitize(sreturn)*size
  if (dot==1) sreturn = rbind(sreturn, cbind(NA,NA), size*(pgon(1)*.1+.75))

  # Rotate
  if (sdir==2)  sreturn=rotateCC(sreturn)
  if (sdir==3)  sreturn=rotateC(sreturn)
  if (sdir==4)  sreturn=rotate180(sreturn)

  sreturn=cbind(sreturn[,1]+sposition[1],sreturn[,2]+sposition[2])

  sreturn
}

# Plot a complete list of all of the shapes
par(mfrow=c(1,1))
plot(c(-1,1),c(-1,1), type="n")
for (i in 1:shape_count) {
  polygon(shape(i, c(sin(i*2*pi/shape_count),cos(i*2*pi/shape_count))*.9,size=.11), col="red", border="black")
  text(sin(i*2*pi/shape_count)*.9,cos(i*2*pi/shape_count)*.9,i,cex=2)
  }
for (i in 1:shape_count) polygon(shape(i,size=.7))


# This is an example of the 23 shapes available to be generated from the above code.

# I hope someone finds it useful.  In the next post I will show how I used the generated shapes to build a set of problems intended to test visual reasoning.

Monday, March 4, 2013

Stata-Bloggers.com update

I have a new theme running on Stata-Bloggers.  I think this "professional" theme will serve nicely at organizing the aggregated posts into a nice magazine-style layout.  Any comments or suggestions as to how best to organize the layout would be appreciated.

As for the blogs being aggregated, there are currently six blogs contributing to the aggregator, including this one.

  • Code and Culture » Stata
  • Econometrics by Simulation
  • Fight Entropy
  • SRQM
  • Stata Daily
  • The Stata Things » Stata

A prominent blog not included in the aggregator is Not Elsewhere Classified (the official Stata blog); they have declined inclusion.

I am interested in collecting more blogs but am not aware of all of the blogs out there.  If you have any suggestions, please post them in the comments below.