org-mode/rorg.org

#+OPTIONS:    H:3 num:nil toc:t \n:nil @:t ::t |:t ^:t -:t f:t *:t TeX:t LaTeX:t skip:nil d:(HIDE) tags:not-in-toc
#+TITLE: rorg --- Code evaluation in org-mode, with an emphasis on R
#+SEQ_TODO:  TODO PROPOSED | DONE DROPPED MAYBE
#+STARTUP: oddeven

* Overview
This project is basically about putting source code into org
files. This isn't just code to look pretty as a source code example,
but code to be evaluated. Org files have 3 main export targets: org,
html and latex. Once we have implemented a smooth bi-directional flow
of data between org-mode formats (including tables, and maybe lists
and property values) and source-code blocks, we will be able to use
org-mode's built in export to publish the results of evaluated source
code in any org-supported format using org-mode as an intermediate
format.  We have a current focus on R code, but we are regarding that
more as a working example than as a defining feature of the project.

The main objectives of this project are...

# Lets start with this list and make changes as appropriate.  Please
# try to make changes to this list, rather than starting any new
# lists.

- [[* evaluation of embedded source code][evaluation of embedded source code]]
  - [[* execution on demand and on export][execution on demand and on export]]
  - [[* source blocks][source blocks]]
  - [[* inline source evaluation][inline source evaluation]]
  - [[* included source file evaluation][included source file evaluation]] ?? maybe
  - [[* caching of evaluation][caching of evaluation]]
- [[* interaction with the source-code's process][interaction with the source-code's process]]
- [[* output of code evaluation][output of code evaluation]]
  - [[* textual/numeric output][textual/numeric output]]
  - [[* graphical output][graphical output]]
  - [[* file creation][non-graphics file creation]]
  - [[* side effects][side effects]]
- [[* reference to data and evaluation results][reference to data and evaluation results]]
  - [[* reference format][reference format]]
  - [[* source-target pairs][source-target pairs]]
    - [[* source block output from org tables][source block output from org tables]]
    - [[* source block outpt from other source block][source block outpt from other source block]]
    - [[* source block output from org list][source block output from org list]] ?? maybe
    - [[* org table from source block][org table from source block]]
    - [[* org table from org table][org table from org table]]
    - [[* org properties from source block][org properties from source block]]
    - [[* org properties from org table][org properties from org table]]
- [[* export][export]]


* Objectives and Specs

** evaluation of embedded source code

*** execution on demand and on export
    Let's use an asterisk to indicate content which includes the *result* of code evaluation, rather than the code itself. Clearly
    we have a requirement for the following transformation:

    org \to org*

    Let's say this transformation is effected by a function
    `org-eval-buffer'. This transformation is necessary when the
    target format is org (say you want to update the values in an org
    table, or generate a plot and create an org link to it), and it
    can also be used as the first step by which to reach html and
    latex:

    org \to org* \to html

    org \to org* \to latex

    Thus in principle we can reach our 3 target formats with
    `org-eval-buffer', `org-export-as-latex' and `org-export-as-html'.

    An extra transformation that we might want is

    org \to latex

    I.e. export to latex without evaluation of code, in such a way that R
    code can subsequently be evaluated using
    =Sweave(driver=RweaveLatex)=, which is what the R community is
    used to. This would provide a `bail out' avenue where users can
    escape org mode and enter a workflow in which the latex/noweb file
    is treated as source.

**** How do we implement `org-eval-buffer'?

     AIUI The following can all be viewed as implementations of
     org-eval-buffer for R code:

***** org-eval-light
      This is the beginnings of a general evaluation mechanism, that
      could evaluate python, ruby, shell, perl, in addition to R.
      The header says it's based on org-eval

      what is org-eval??

      org-eval was written by Carsten.  It lives in the
      org/contrib/lisp directory because it is too dangerous to
      include in the base.  Unlike org-eval-light org-eval evaluates
      all source blocks in an org-file when the file is first opened,
      which could be a security nightmare for example if someone
      emailed you a pernicious file.

***** org-R
      This accomplishes org \to org* in elisp by visiting code blocks
      and evaluating code using ESS.

***** RweaveOrg
      This accomplishes org \to org* using R via

: Sweave("file-with-unevaluated-code.org", driver=RweaveOrg, syntax=SweaveSyntaxOrg)

***** org-exp-blocks.el
      Like org-R, this achieves org \to org* in elisp by visiting code
      blocks and using ESS to evaluate R code.

*** source blocks
(see [[* Special editing and evaluation of source code][Special editing and evaluation of source code]])

*** inline source evaluation
*** included source file evaluation
It may be nice to be able to include an entire external file of source
code, and then evaluate and export that code as if it were in the
file.  The format for such a file inclusion could optionally look like
the following

: #+include_src filename header_arguments

*** caching of evaluation

Any kind of code that can have a block evaluated could optionally define
a function to write the output to a file, or to serialize the output of
the function.  If a document or block is configured to cache input,
write all cached blocks to their own files and either a) hash them, or
b) let git and org-attach track them.  Before a block gets eval'd, we
check to see if it has changed.  If a document or block is configured to
cache output and a print/serialize function is available, write the
output of each cached block to its own file.  When the file is eval'd
and some sort of display is called for, only update the display if the
output has changed.  Each of these would have an override, presumably
something like (... & force) that could be triggered with a prefix arg
to the eval or export function.

For R, I would say

#+begin_src emacs-lisp
;; fake code that only pretends to work
(add-hook 'rorg-store-output-hook
    '("r" lambda (block-environment block-label)
        (ess-exec (concat "save.image("
                          block-environment
                          ", file = " block-label
                          ".Rdata, compress=TRUE)"))))
#+end_src

The idea being that for r blocks that get eval'd, if output needs to be
stored, you should write the entire environment that was created in that
block to an Rdata file.

(see [[* block scoping][block scoping]])

** interaction with the source-code's process
We should settle on a uniform API for sending code and receiving
output from a source process.  Then to add a new language all we need
to do is implement this API.

for related notes see ([[* Interaction with the R process][Interaction with the R process]])

** output of code evaluation
*** textual/numeric output
    We (optionally) incorporate the text output as text in the target
    document
*** graphical output
    We either link to the graphics or (html/latex) include them
    inline.

    I would say, if the block is being evaluated interactively then
    lets pop up the image in a new window, and if it is being exported
    then we can just include a link to the file which will be exported
    appropriately by org-mode.

*** non-graphics files
    ? We link to other file output
*** side effects
If we are using a continuous process in (for example an R process
handled by ESS) then any side effects of the process (for example
setting values of R variables) will be handled automatically

Are there side-effects which need to be considered aside from those
internal to the source-code evaluation process?

** reference to data and evaluation results
I think this will be very important.  I would suggest that since we
are using lisp we use lists as our medium of exchange.  Then all we
need are functions going converting all of our target formats to and
from lists.  These functions are already provided by for org tables.

It would be a boon both to org users and R users to allow org tables
to be manipulated with the R programming language.  Org tables give R
users an easy way to enter and display data; R gives org users a
powerful way to perform vector operations, statistical tests, and
visualization on their tables.

This means that we will need to consider unique id's for source
blocks, as well as for org tables, and for any other data source or
target.

*** Implementations
**** naive
     Naive implementation would be to use =(org-export-table "tmp.csv")=
     and =(ess-execute "read.csv('tmp.csv')")=.
**** org-R
     org-R passes data to R from two sources: org tables, or csv
     files. Org tables are first exported to a temporary csv file
     using [[file:existing_tools/org-R.el::defun%20org%20R%20export%20to%20csv%20csv%20file%20options][org-R-export-to-csv]].
**** org-exp-blocks
org-exp-blocks uses [[org-interblock-R-command-to-string]] to send
commands to an R process running in a comint buffer through ESS.
org-exp-blocks has no support for dumping table data to R process, or
vice versa.

**** RweaveOrg
     NA

*** reference format
This will be tricky, Dan has already come up with a solution for R, I
need to look more closely at that and we should try to come up with a
formats for referencing data from source-code in such a way that it
will be as source-code-language independent as possible.

*** source-target pairs

The following can be used for special considerations based on
source-target pairs

**** source block output from org tables
**** source block outpt from other source block
**** source block output from org list
**** org table from source block
**** org table from org table
**** org properties from source block
**** org properties from org table

** export
once the previous objectives are met export should be fairly simple.
Basically it will consist of triggering the evaluation of source code
blocks with the org-export-preprocess-hook.

This block export evaluation will be aware of the target format
through the htmlp and latexp variables, and can then create quoted
=#+begin_html= and =#+begin_latex= blocks appropriately.


* Notes
** Block Formats
   Unfortunately org-mode how two different block types, both useful.
   In developing RweaveOrg, a third was introduced.

   Eric is leaning towards using the =#+begin_src= blocks, as that is
   really what these blocks contain: source code.  Austin believes
   that specifying export options at the beginning of a block is
   useful functionality, to be preserved if possible.

   Note that upper and lower case are not relevant in block headings.

*** PROPOSED block format
I (Eric) propose that we use the syntax of source code blocks as they
currently exist in org-mode with the addition of *evaluation*,
*header-arguments*, *exportation*, *single-line-blocks*, and
*references-to-table-data*.

1) *evaluation*: These blocks can be evaluated through =\C-c\C-c= with
   a slight addition to the code already present and working in
   [[file:existing_tools/org-eval-light.el][org-eval-light.el]].  All we should need to add for R support would
   be an appropriate entry in [[org-eval-light-interpreters]] with a
   corresponding evaluation function.  For an example usinga
   org-eval-light see [[* src block evaluation w/org-eval-light]].

2) *header-arguments*: These can be implemented along the lines of
   Austin's header arguments in [[file:existing_tools/RweaveOrg/org-sweave.el][org-sweave.el]].

3) *exportation*: Should be as similar as possible to that done by
   Sweave, and hopefully can re-use some of the code currently present
   in [[file:existing_tools/exp-blocks/org-exp-blocks.el ][org-exp-blocks.el]].

4) *single-line-blocks*: It seems that it is useful to be able to
   place a single line of R code on a line by itself.  Should we add
   syntax for this similar to Dan's =#+RR:= lines?  I would lean
   towards something here that can be re-used for any type of source
   code in the same manner as the =#+begin_src R= blocks, maybe =#+src_R=? Dan: I'm fine with this, but don't think single-line
   blocks are a priority. My =#+R= lines were something totally
   different: an attempt to have users specify R code implicitly,
   using org-mode option syntax.

5) *references-to-table-data*: I get this impression that this is
   vital to the efficient use of R code in an org file, so we should
   come up with a way to reference table data from a single-line-block
   or from an R source-code block.  It looks like Dan has already done
   this in [[file:existing_tools/org-R.el][org-R.el]].

Syntax

Multi-line Block
: #+begin_src lang header-arguments
:  body
: #+end
- lang :: the language of the block (R, shell, elisp, etc...)
- header-arguments :: a list of optional arguments which control how
     the block is evaluated and exported, and how the results are handled
- body :: the actual body of the block

Single-line Block
: #+begin_src lang body
- It's not clear how/if we would include header-arguments into a
  single line block.  Suggestions? Can we just leave them out?  Dan:
  I'm not too worried about single line blocks to start off
  with. Their main advantage seems to be that they save 2 lines.
  Eric: Fair enough, lets not worry about this now, also I would guess
  that any code simple enough to fit on one line wouldn't need header
  arguments anyways.

Include Block
: #+include_src lang filename header-arguments
- I think this would be useful, and should be much more work (Dan:
  didn't get the meaning of that last clause!?).  Eric: scratch that,
  I meant "*shouldn't* be too much work" :) That way whole external
  files of source code could be evaluated as if they were an inline
  block. Dan: again I'd say not a massive priority, as I think all the
  languages we have in mind have facilities for doing this natively,
  thus I think the desired effect can often be achieved from within a
  #+begin_src block.  Eric: Agreed, while this would be a nice thing
  to include we shouldn't wast too much effort on it in the beginning.

What do you think?  Does this accomplish everything we want to be able
to do with embedded R source code blocks?

***** src block evaluation w/org-eval-light
here's an example using org-eval-light.el

first load the org-eval-light.el file

[[elisp:(load (expand-file-name "org-eval-light.el" (expand-file-name "existing_tools" (file-name-directory buffer-file-name))))]]

then press =\C-c\C-c= inside of the following src code snippet.  The
results should appear in a comment immediately following the source
code block.  It shouldn't be too hard to add R support to this
function through the `org-eval-light-interpreters' variable.

(Dan: The following causes error on export to HTML hence spaces inserted at bol)

 #+begin_src shell
date
 #+end_src

*** existing formats
**** Source code blocks
    Org has an extremely useful method of editing source code and
    examples in their native modes.  In the case of R code, we want to
    be able to use the full functionality of ESS mode, including
    interactive evaluation of code.

    Source code blocks look like the following and allow for the
    special editing of code inside of the block through
    `org-edit-special'.

#+BEGIN_SRC r

,## hit C-c ' within this block to enter a temporary buffer in r-mode.

,## while in the temporary buffer, hit C-c C-c on this comment to
,## evaluate this block
a <- 3
a

,## hit C-c ' to exit the temporary buffer
#+END_SRC

**** dblocks
    dblocks are useful because org-mode will automatically call
    `org-dblock-write:dblock-type' where dblock-type is the string
    following the =#+BEGIN:= portion of the line.

    dblocks look like the following and allow for evaluation of the
    code inside of the block by calling =\C-c\C-c= on the header of
    the block.

#+BEGIN: dblock-type
#+END:

**** R blocks
     In developing RweaveOrg, Austin created [[file:existing_tools/RweaveOrg/org-sweave.el][org-sweave.el]].  This
     allows for the kind of blocks shown in [[file:existing_tools/RweaveOrg/testing.Rorg][testing.Rorg]].  These blocks
     have the advantage of accepting options to the Sweave preprocessor
     following the #+BEGIN_R declaration.

*** block headers/parameters
regardless of the syntax/format chosen for the source blocks, we will
need to be able to pass a list of parameters to these blocks.  These
should include (but should certainly not be limited to)
- label or id :: Label of the block, should we provide facilities for
                 automatically generating a unique one of these?
- file :: names of file to which graphical/textual/numerical/tabular output
  should be written.  Do we need this, or should this be controlled
  through the source code itself?
- not sure of a good name here :: flags for when/if the block should
     be evaluated (on export etc...)
- again can't thing of a concise name :: flags for how the results of
     the export should be displayed/included
- scope :: flag indicating whether the block should have a local or
           global scope
- flags specific to the language of the source block
- etc...

** Interaction with the R process

We should take care to implement this in such a way that all of the
different components which have to interactive with R including:
- evaluation of source code blocks
- automatic evaluation on export
- evaluation of \R{} snippets
- evaluation of single source code lines
- evaluation of included source code files
- sending/receiving vector data

I think we currently have two implementations of interaction with R
processes; [[file:existing_tools/org-R.el][org-R.el]] and [[file:existing_tools/exp-blocks/org-exp-blocks.el ][org-exp-blocks.el]].  We should be sure to take
the best of each of these approaches.

More on the exchange of data at between org-mode and source code
blocks at [[* reference to data and evaluation results][reference to data and evaluation results]].

** block scoping
(see [[* caching of evaluation][caching of evaluation]])

This inadvertently raises the issue of scoping.  The pretend function
pretends that we will create a block-local scope, and that we can save
just the things in that scope.  Sweave takes the make-everything-global
approach.  I can see advantages either way.  If we make block-local
scopes, we can save each one independently, and generally speaking it
seems like more granularity==more control.  If we make everything
global, we can refer to entities declared in earlier blocks without
having to explicitly import those entities into the current block.  I
think this counts in the "need to think about it early on" category.

If we did want block-local scopes, in R we can start every eval with
something like

;; fake code that pretends to create a new, empty environment
(ess-exec (concat block-env " <- new.env()"))
(ess-exec (concat "eval(" block-contents ", envir=" block-env ")"))

If we decide we want block-scoping, I'm sure Dan and I can figure out
the right way to do this in R, if he hasn't already.  I haven't thought
at all about how these scope issues generalize to, say, bash blocks.

Maybe this is something that should be controlled by a header
argument?


* Tasks


* COMMENT Commentary
I'm seeing this as like commit notes, and a place for less formal
communication of the goals of our changes.

** Eric <2009-02-06 Fri 15:41>
I think we're getting close to a comprehensive set of objectives
(although since you two are the real R user's I leave that decision up
to you).  Once we've agreed on a set of objectives and agreed on at
least to broad strokes of implementation, I think we should start
listing out and assigning tasks.

** Eric <2009-02-09 Mon 14:25>
I've done a fairly destructive edit of this file.  The main goal was
to enforce a structure on the document that we can use moving forward,
so that any future objective changes are all made to the main
objective list.

I apologize for removing sections written by other people.  I did this
when they were redundant or it was not clear how to fit them into this
structure.  Rest assured if the previous text wasn't persisted in git
I would have been much more cautious about removing it.

I hope that this outline structure should be able to remain stable
through the process of fleshing out objectives, and cashing those
objectives out into tasks.  That said, please feel free to make any
changes that you see fit.


* Buffer Dictionary
 LocalWords:  DBlocks dblocks