Wednesday, December 19, 2012

User written functions in R

R Script

# Writing your own functions can make your programs much more efficient to write, reducing the lines of code required to accomplish the desired results while also reducing the chance of errors that creep in through repetitive code.

# It is easy to pass whole blocks of arguments from one function to the next with the ... syntax.  This is a catch-all that accepts any number of arguments.  So if you do not like a function's defaults, such as paste using " " to join its pieces, you can wrap it in a simple function.  One that I often include is the following:

# Concise paste function that joins elements together with no separator.
p = function(...) paste(..., sep="")
  x=21
  p("For example x=",x)
  # This function is particularly useful when using the get or assign commands since they take text identifiers of object names.
  a21 = 230
  get(p("a",x))
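
# A minimal sketch of the same idea with the separator left adjustable.
# The name glue_text is hypothetical and not part of the original script.
# Base R's paste0 covers the no-separator case on its own.

```r
# Hypothetical wrapper: forwards every argument in ... to paste,
# but defaults to no separator.
glue_text = function(..., sep = "") paste(..., sep = sep)

glue_text("a", 21)             # joins with no separator
glue_text("a", 21, sep = "_")  # the default can still be overridden by name
paste0("a", 21)                # base R equivalent of the no-separator case

# assign creates an object from a text name and get reads it back.
for (k in 1:3) assign(glue_text("b", k), k^2)
b2_value = get(glue_text("b", 2))
b2_value
```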

# Print is a useful function for returning feedback to the user.
# Unfortunately it only prints a single object.  By combining it with my new paste function I can easily make it take multiple arguments.
# Concise paste and print function
pp = function(...) print(p(...))
  # Print displays text.
  for (i in seq(1,17,3)) pp("i is equal to ",i)
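
# As an alternative sketch, base R's cat accepts any number of arguments directly
# and writes them without the [1] index and quotes that print adds.
# The wrapper name pc is hypothetical and not part of the original script.

```r
# Hypothetical helper: concatenate the arguments and end the line.
pc = function(...) cat(..., "\n", sep = "")

for (i in seq(1, 17, 3)) pc("i is equal to ", i)
```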

# Round to nearest x
# Functions can also take any number of named arguments.
# Arguments are matched either by their position in the call or by name.
# If an argument is omitted then its default is used when one exists; otherwise R returns an error when the argument is needed.
round2 = function(num, x=1) x*round(num/x)
  # This rounding function works like Stata's round function, which is more flexible than the built-in round command in R.
  # In R you specify the number of digits to round to.
  # In Stata, however, you specify the unit to round to.
  # Thus round(z,.1) in Stata is the same as round(z,1) in R.
  # The Stata command is more general: it can, for instance, round to the nearest .25, while the R command would need extra manipulation to accomplish the same goal.
  # I have therefore written round2, which rounds to the nearest x instead.
  round(1.123, 2)
  round2(1.123, .01)
  # These yield the same result.
  # Argument order does not matter when the arguments are named.  The following produces an identical result.
  round2(x=.01,num=1.123)
  # Unlike round, round2 also works with units that are not powers of ten.
  round2(123, 20)
  # round2 has a default x of 1, so it can be used quickly to round to the nearest whole number.
  round2(23.1)
  # The original round does the same by default, since its digits argument defaults to 0.
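
# A short sketch of two cases described above: rounding to a unit that round cannot
# express through digits, and the error raised when a required argument is omitted.
# round2 is repeated here only so the block runs on its own.

```r
round2 = function(num, x = 1) x * round(num / x)

# Round to the nearest quarter.
round2(c(1.123, 1.4, 2.9), .25)

# num has no default, so leaving it out is an error once it is needed.
result = tryCatch(round2(), error = function(e) conditionMessage(e))
result
```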

# Modular arithmetic is often quite helpful in generating data.
# The following function reduces a number to its smallest non-negative equivalent under the given modulus.
mod = function(num, modulo) num - floor(num/modulo)*modulo
  # This function follows the same syntax as Stata's mod function.
  mod(15,12)
  # Arithmetic modulo 12 is the system used for hours on a clock.
  # Thus mod(15,12) is the same as
  mod(3,12)
  # Or even:
  mod(-9,12)
  # There is a built in operator that does the same thing as this function.
  -9 %% 12
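
# A quick sketch checking that mod and the built-in %% operator agree over a range of
# whole numbers, including negative ones.  mod is repeated so the block runs on its own;
# the vector name hours is only illustrative.

```r
mod = function(num, modulo) num - floor(num / modulo) * modulo

hours = -24:24
# Both approaches give a result from 0 to 11, including for negative input.
all(mod(hours, 12) == hours %% 12)

# 15:00 on a 24 hour clock is 3 o'clock.
mod(15, 12)
```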

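# Check for duplicate adjacent rows.  The first row of a duplicate set is not flagged but subsequent rows are.
# Each row is compared with the row before it; a row is flagged when every column matches.
radjdup = function(data) c(FALSE, rowSums(data[-1, , drop=FALSE]==data[-nrow(data), , drop=FALSE])==ncol(data))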

# In this example data, half the rows repeat the x and y values of another row.
  example.data = data.frame(id=1:100, x=mod(1:100, 5), y=rep(1:10,each=10))

# The data needs to be sorted for radjdup to work.
  example.order = example.data[order(example.data$x, example.data$y),]

cbind(example.order, duplicate=radjdup(example.order[,2:3]))

# We expect half the observations to be flagged as duplicates, because the first instance of each duplicate set is not flagged.
  mean(radjdup(example.order[,2:3]))
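
# For comparison, a sketch using base R's duplicated, which flags every repeat of an
# earlier row whether or not the rows are adjacent, so no sorting is required.
# On sorted data it should agree with an adjacent-row check.  The helper adjacent_dup
# is a hypothetical stand-in for radjdup, written so the block runs on its own.

```r
example.data = data.frame(id = 1:100, x = (1:100) %% 5, y = rep(1:10, each = 10))

# duplicated works on the unsorted data.
dup.any = duplicated(example.data[, c("x", "y")])
sum(dup.any)

# Hypothetical adjacent-row check: compare each row with the one before it.
adjacent_dup = function(data) {
  same = data[-1, , drop = FALSE] == data[-nrow(data), , drop = FALSE]
  c(FALSE, unname(rowSums(same) == ncol(data)))
}

sorted = example.data[order(example.data$x, example.data$y), ]
dup.adj = adjacent_dup(sorted[, c("x", "y")])

# After sorting, the two approaches flag the same rows.
all(dup.adj == duplicated(sorted[, c("x", "y")]))
```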
