For example:
Listing all files whose names match a simple pattern.
Looking at "/^.*icon.*\.png$/i" from
I was able to get "^.*icon.*.png$" to work in R, though I lost the case insensitivity. I think including the "^." ensures that only files in the current directory, not subdirectories, are matched, but I am not sure.
list.files("C:/Clipart/", pattern="^.*icon.*.png$")
[1] "manicon.png" "handicon.png" "bookicon.png"
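Looking at the original entry, we can see that what was causing us problems was the backslash escape in "\.": R rejects a single backslash before a period inside a string literal, so the escape either has to be dropped or written as "\\.". The surrounding "/" delimiters and the trailing "i" flag are not part of R's pattern syntax either.
Before looking at another example, let's modify the previous command slightly to show how we can make it match differently.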
list.files("C:/Clipart/", pattern="^.*icon*.*.png$")
[1] "manicon.png" "handicon.png" "bookicon.png" "iconnew.png"
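To see what the extra "*" changes on its own, here is a small sketch using grepl on two made-up file names (they are assumptions for illustration, not files from C:/Clipart/). In "icon*" the star applies only to the "n", so the pattern asks for "ico" followed by zero or more "n" characters.

```r
# Hypothetical file names, chosen only to isolate the effect of "n*"
files <- c("manicon.png", "ico.png")

# Original pattern: the literal text "icon" must appear
with_icon <- grepl("^.*icon.*.png$", files)

# Modified pattern: "ico" followed by zero or more "n"
with_star <- grepl("^.*icon*.*.png$", files)

print(data.frame(files, with_icon, with_star))
```

Only the second pattern accepts "ico.png", because it no longer requires the "n".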
There are a lot of resources available for regex since it is really its own text matching language supported by many different programming languages. A good introductory guide can be found:
or
For case insensitivity, either use the argument in list.files (i.e., use "ignore.case=TRUE") or include the (?i) flag inside the regular expression. The "^" is just saying "match from the start of the name" of the file (the "." after it belongs to ".*"). It's the lack of "recursive=TRUE" that's causing the restriction to the given directory.
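A minimal sketch of those two arguments, run against a throwaway directory so it does not depend on C:/Clipart/. The directory layout and file names below are assumptions made up for the example.

```r
# Build a hypothetical scratch directory with one subdirectory
d <- tempfile("clipart")
dir.create(file.path(d, "sub"), recursive = TRUE)
invisible(file.create(file.path(d, c("manicon.png", "HandIcon.PNG", "notes.txt"))))
invisible(file.create(file.path(d, "sub", "bookicon.png")))

# Default: case-sensitive, top-level directory only
strict <- list.files(d, pattern = "icon.*\\.png$")

# ignore.case = TRUE also picks up "HandIcon.PNG"
loose <- list.files(d, pattern = "icon.*\\.png$", ignore.case = TRUE)

# recursive = TRUE descends into "sub" and returns relative paths
deep <- list.files(d, pattern = "icon.*\\.png$", recursive = TRUE)

print(strict)
print(loose)
print(deep)
```

The pattern is the same in all three calls; only the arguments change which files come back.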
Thank you, this is very useful!
The "^" means that the regex must match starting at the beginning of the string you are testing. In the original example, "\.png$" means that you must explicitly match ".png" at the end of the string you are matching -- the backslash indicates that the period is to be interpreted as a literal period, and not as a placeholder for any single character.
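The two anchors can be tried in isolation with grepl. The file names here are hypothetical, and the period is written with a double backslash because that is what an R string literal requires.

```r
# Hypothetical file names, used only to illustrate the two anchors
files <- c("iconnew.png", "manicon.png", "icon.png.bak")

# "^" ties the match to the start of the string
starts_with_icon <- grepl("^icon", files)

# "$" ties the match to the end; "\\." is a literal period
ends_with_png <- grepl("\\.png$", files)

print(data.frame(files, starts_with_icon, ends_with_png))
```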
In point of fact, the "^.*" bit is redundant and unnecessary. It just says that any characters (or none) must appear before the first real pattern -- "icon" -- is matched. You get the same results by leaving the "^.*" off. You should get the same results with
pattern = "icon.*.png$"
although you really want
pattern = "icon.*\\.png$"
Here the double-backslash-period bit means to match a period explicitly. The doubling is needed because R's string parser consumes one backslash before the regex engine ever sees the pattern; a single backslash is the usual form for most regex implementations, as is the case with your initial example. Regardless, that way you get only matches that end with ".png". Without it, you'd also get matches to (e.g.) "iconXpng", which you wouldn't want -- not that this is likely to cause you any problems in this context.
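The difference between the two patterns, and the redundancy of the leading "^.*", can be checked quickly with grepl. The file names are made up for the purpose.

```r
# Hypothetical names: one real match, one false positive, one non-match
files <- c("manicon.png", "iconXpng", "icon.png.bak")

# Unescaped dot: "." stands for any single character
unescaped <- grepl("icon.*.png$", files)

# Escaped dot: only a literal period is accepted before "png"
escaped <- grepl("icon.*\\.png$", files)

# Adding "^.*" in front does not change which names match
with_prefix <- grepl("^.*icon.*\\.png$", files)

# The R string "\\." holds two characters: a backslash and a period
pattern_length <- nchar("\\.")

print(data.frame(files, unescaped, escaped, with_prefix))
```

Only the unescaped version lets "iconXpng" through.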
Thanks for your thoughtful comment.