Comments on Econometrics By Simulation: It is time for RData files to become the standard for Data Transfer
Blog author: Francis. Feed updated 2024-02-08T03:39:11-05:00. 14 comments, newest first.

Anonymous, 2015-06-11T21:25:14-04:00
Here is a test for one data.table using the 'rhdf5' package:
.RData: 10 MB, 2.8 sec to write, 0.6 sec to read
.h5: 21 MB, 25 sec to write, 2.8 sec to read

At least in this case, .RData wins hands down.

Matthew Martin, 2014-11-11T14:41:08-05:00
Has anyone compared to SAS xpt files?

Anonymous, 2014-03-29T00:09:52-04:00
Exactly. Why use RData, which nothing other than R really reads, instead of HDF5, which everything under the sun can read?

Dag, 2014-03-24T08:14:36-04:00
Oh, I didn't know that comment() had already been invented! I have to start using that.

Anonymous, 2014-03-24T07:16:28-04:00
Why not use HDF?

Anonymous, 2014-03-23T22:29:44-04:00
README <- scan("readme.txt", what = "character", sep = "\n")
Now you have a README object that is saved within the .RData file.

Anonymous, 2014-03-23T16:41:05-04:00
One potential problem with your proposal: I generated a 61 gigabyte database last week in the course of running some Monte Carlo simulations. I'm pretty happy that I chose to save it as an SQLite file instead of an RData file (although SQLite doesn't support concurrent writes, which is a pain). Do you know of any ways to incrementally load or save to RData files?

Fr., 2014-03-23T15:55:54-04:00
File formats are often dictated by legacy code. French official stats often come in old formats because the backend is coded in SAS or something like that. If you want to change the data standard, you have to provide these legacy routines in R and hope that they get picked up as quickly as possible.

Fr., 2014-03-23T15:53:43-04:00
That'd be reinventing the comment() function. All this is usually dealt with by a package architecture. I use GitHub repos with README files to get comparable results.

Nick, 2014-03-22T14:35:31-04:00
Thanks for this. I agree wholeheartedly. My industry tends to use CSV and SAS7BDAT (SAS) data files. I tend to find that the SAS7BDAT files are actually larger than the CSV files! I really like the readme/info idea with the RData and will start doing that more often.

cellocgw, 2014-03-22T12:58:18-04:00
Now now now, everyone knows the official standard data format is Excel .xlsx :-( . I'll just point out you've left out ENVI, gzip, tar, .idt, TIFF, JPG, and about a zillion other file formats. The difficulty in getting .RData accepted is not only unpacking the objects (which could include closures as well as data arrays and structures) but also writing interpreters for the objects. That's a big job, though I was happy to see that Mathematica released an interpreter last year.

Francis, 2014-03-22T11:11:48-04:00
A reasonable comment, of course. I would suggest providing data in two formats: CSV and RData. However, if RData files become the norm, then the responsibility for compatibility between systems will shift to the proprietary software providers, who will have to support RData files. Since the source code for these files is open, they will have few excuses for not complying with the standards for data transfer.

Max, 2014-03-21T15:41:51-04:00
I am concerned that by transferring data with the .RData extension, the data may become unusable in other programs until it's been opened in R and written out with a new extension. Compressing data is a good idea, which is exactly what R does when you call the base::save function. For example, a text file compressed with gzip can be read into SAS without first opening it in R and saving it with a more universally accepted extension.

Dag, 2014-03-21T06:36:50-04:00
I am just transferring some files from MATLAB format to R, so I can give one more data point for your data-compression dataset.

The example is a dataset of numeric values, size 589904 x 7.
CSV: 32.07 MB
MATLAB: 14.63 MB, read time 260 ms
RData: 7.58 MB, read time 320 ms

The idea of including a "readme" object in RData files is really useful! One can even include an info() function that gives you the main information (variable labels etc.) about the dataset(s) just by writing info().
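
Several comments in this thread mention shipping documentation inside the .RData file itself: a README object, the comment() function, and an info() helper. A minimal sketch of how those three ideas might fit together is below. The object names (sim_data, README, info) and the toy values are made up for illustration and do not come from the original post.

```r
# Sketch: bundle a dataset with its own documentation in one .RData file.
# All object names and values here are hypothetical.

sim_data <- data.frame(id = 1:3, wage = c(12.5, 15.0, 9.75))

# comment() attaches a free-text note to an object; print() does not show it.
comment(sim_data) <- "Toy wage data; wage is in dollars per hour."

# A README stored as a plain character vector, one element per line.
README <- c("Contents: sim_data (3 rows, 2 columns)",
            "Variables: id = person identifier, wage = hourly wage")

# info() prints the README and the note attached to the data.
info <- function() {
  cat(README, sep = "\n")
  cat("Note:", comment(sim_data), "\n")
  invisible(NULL)
}

# Save the data, the README, and the helper together.
path <- file.path(tempdir(), "sim_data.RData")
save(sim_data, README, info, file = path)

# Simulate a recipient: drop the objects, reload them from the file,
# then ask the file to describe itself.
rm(sim_data, README, info)
load(path)
info()
```

Because info() is an ordinary R closure, this also illustrates the point raised above that an .RData file can carry functions as well as data, which is convenient inside R but is part of what makes the format harder for other programs to read.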