code folding for Stata

Posted: Fri May 08, 2009 10:16 pm
by zura
Hi jussij,

I just installed Zeus and I am liking it very much so far. But since I plan to use it mostly for a "non-standard" language, I found the need to do lots of customization, code folding being one of them. I'd much appreciate it if you could add support for it. Stata is a statistical analysis package with its own scripting language; here's an example:

Code: Select all

Extension: .do; .ado; .class
Line Comment1: //
Line Comment2: *
Block Comment: /* */
Case Sensitive: yes

Begin: program
  End: end
Begin: class
  End: }
Begin: if
  End: }
Begin: else
  End: }
Begin: forvalues
  End: }
Begin: foreach
  End: }
Begin: while
  End: }
Begin: quietly
  End: }
Begin: noisily
  End: }
Begin: capture
  End: }

Stata has two types of end-of-line delimiters, cr and ";", so it would be nice if multi-line commands could be folded as well (more examples below).


Some sample Code:

program define matrix_capp
	version 8

	syntax anything [, miss(str) cons ts sort]

	local b12   : word 1 of `anything'
	local colon : word 2 of `anything'
	local b1    : word 3 of `anything'
	local b2    : word 4 of `anything'

	if `"`colon'"' != ":" {
		// example of a command spanning multiple lines
		display as err ///
			`"colon expected, `colon' found"'
		// this is the same as - display as err `"colon expected, `colon' found"'
		exit 198
	}

	// will use this part later
	tempname TMP
	local cnames
	forvalues j = 1/2 {
		forvalues i = 1/`=colsof(`b`j'')' {
			matrix `TMP' = `b`j''[....,`i']
			local cnames `"`cnames' "`: colfullnames `TMP''""'
		}
	}

	if ~missing("`sort'") {
		tempname TMP bb
		local rfullb1
		local rfullb2
		forvalues i=1/`=rowsof(`b1')' {
			matrix `TMP' = `b1'[`i', ....]
			local rfullb1 `"`rfullb1' "`: rowfullnames `TMP''" "' // we can't trust rowfullnames for multi-word names
		}
		forvalues i=1/`=rowsof(`b2')' {
			matrix `TMP' = `b2'[`i', ....]
			local rfullb2 `"`rfullb2' "`: rowfullnames `TMP''" "'
		}
		local rfullb : list rfullb1 | rfullb2

		matrix `bb' = J(`nr', `nc1' + `nc2', `miss')

		forvalues i=1/`nr' {
			local ii = rownumb(matrix(`b1'), `"`: word `i' of `rfullb''"')
			local ii = rownumb(matrix(`b2'), `"`: word `i' of `rfullb''"')
			if ~missing(`ii') matrix `bb'[`i', `nc1' + 1] = `b2'[`ii', 1...]
		}
		matrix rownames `bb' = `rfullb'
		matrix colnames `bb' = `cnames'
		matrix `b12' = `bb'
		exit
	}

	matrix colnames `bb'=`cnames'
	matrix `b12' = `bb'
end

// example with ";" delimiter

#delimit ;

		graph combine gic_total gic_urban gic_rural,
				imargin(0 1 0 0) ysize(6) xsize(8) iscale(0.8)
				graphregion(color(white) style(none) margin(0 0 0 0) lstyle(none))
				plotregion (color(white) style(none));


Thanks a lot,
zura

Posted: Sat May 09, 2009 2:52 am
by jussij
Hi Zura,

This should be fairly easy to add. Expect a new folding DLL shortly ;)

Note: This is the folding definition that will be implemented since it is very easy to do and I think it will do what you want:

Code: Select all

Begin: program 
  End: end 
Begin: {
  End: }

I do have a couple of questions about the * line comment.

What is an example of this comment, and how does it not get confused with the multiplication character of the same name :?:

Is it column specific :?:

Cheers Jussi

Posted: Sat May 09, 2009 3:45 am
by jussij
A new xFolder.dll can be found here: http://www.zeusedit.com/z300/xFolder.zip

To install just backup the xFolder.dll in the Zeus install folder and replace it with the one in the zip file.
zura wrote: stata has two types of end-line delimiters cr and ";" so it would be nice if a multiline commands can be folded as well (more examples below)

This new folder does not implement multi-line command folding, as I am not really sure how it is meant to fold :?

What is the significance of the #delimit ; in the code :?:

To better understand this I think I need a few more examples of these multi-line commands ;)

Cheers Jussi

Posted: Sat May 09, 2009 5:33 am
by zura
wow jussi, that was fast! thanks a lot :)

I should've been more specific above:
* may be used only at the beginning of a line.

From the Stata help: "The /// comment indicator instructs Stata to view from /// to the end of a line as a comment and to join the next line with the current line. /// is one way to make long lines more readable. Like the // comment indicator, the /// indicator must be preceded by one or more blanks."

so /// can be used to split long lines:

Code: Select all

* this is an example of multi-line command with cr delimiter
regress y region1 region2 region3 ///
               educ1 educ2 educ3 ///
              age male 
* the above is equivalent to
regress y region1 region2 region3 educ1 educ2 educ3 age male
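To make the joining rule concrete, here is a minimal Python sketch (not part of Stata or Zeus) of how an editor could merge /// continuation lines into one logical command. The name join_continuations is hypothetical, and the sketch assumes that /// never appears inside a string literal.

```python
import re

# "///" counts as a continuation only at the start of a line or after a blank.
# Assumption: "///" never occurs inside a quoted string.
_CONTINUATION = re.compile(r"(^|\s)///.*$")


def join_continuations(lines):
    """Merge physical lines into logical commands under the cr delimiter."""
    commands = []
    pending = []  # parts of a command that is still being continued
    for line in lines:
        match = _CONTINUATION.search(line)
        if match:
            # Drop "///" and everything after it, keep collecting.
            pending.append(line[:match.start()].strip())
        else:
            pending.append(line.strip())
            commands.append(" ".join(part for part in pending if part))
            pending = []
    if pending:  # file ended in the middle of a continued command
        commands.append(" ".join(part for part in pending if part))
    return commands


example = [
    "regress y region1 region2 region3 ///",
    "               educ1 educ2 educ3 ///",
    "              age male ",
]
print(join_continuations(example))
```

For the regress example this yields a single command, which is the same text as the one-line form shown above.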
One can use different delimiters in one file (Stata interprets code line by line without compiling anything). It is perhaps not the best programming style, but doing so is sometimes useful:

Code: Select all

#delimit ;
use myfile.dta, clear;
   foreach var of varlist region1 region2 region3
                                  educ1 educ2 educ3 age male {;
         summarize `var';
   };
 #delimit cr
drop _all
I just realized that there might be a problem with this kind of multi-line folding: currently it folds on {}, so in this case the "foreach var ..." line will stay above the folded "educ1 educ2 educ3 age male { ...}".

One small thing: nothing can follow "{" other than the end-of-line delimiter, but the "{" itself may directly follow a word. For example, this is not a syntax error:

Code: Select all

 forvalues i = 1/`rr'{; // no space between ' and {
but in this case folding breaks down... although

Code: Select all

forvalues i = 1/`=1+`rr''{; // two ' symbols not the "
works just fine. Any thoughts?

Posted: Sat May 09, 2009 10:43 am
by jussij
zura wrote: wow jussi, that was fast! thanks a lot
Luckily, implementing a new language folder in Zeus is a fairly easy task ;)

Based on your latest details I have created a new xFolder.dll found here: http://www.zeusedit.com/z300/xFolder.zip

This version tries to implement folding of these multi-line comments:

Code: Select all

* this is an example of multi-line command with cr delimiter 
regress y region1 region2 region3 /// 
               educ1 educ2 educ3 /// 
              age male 
This sort of folding can be fairly difficult for Zeus to implement as there is no obvious end of fold case.

So this addition might create some false fold points. Let me know how it goes.
zura wrote: Any thoughts?
This version also implements your original fold points (i.e. "if" is the start of the fold rather than the "{" character, etc.).

The one issue with using these fold points is that code like this is then detected as a start of fold:

Code: Select all

if ~missing(`ii') matrix `bb'[`i', `nc1' + 1] = `b2'[`ii', 1...]
To eliminate this incorrect fold point an extra check was added to make sure a { character is also found on the line.

But this means that code like this will now not fold correctly:

Code: Select all

if ~missing(`ii') 
{
  matrix `bb'[`i', `nc1' + 1] = `b2'[`ii', 1...]
}
The folding rules in Zeus are fairly simple and as such they don't work with all languages :(
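The check described above can be sketched in a few lines of Python. This is only an illustration of the single-line rule, not the actual xFolder.dll code; the name is_fold_start is hypothetical, and the keyword list is taken from the folding definition at the top of the thread.

```python
# Keywords from the originally requested folding definition.
FOLD_KEYWORDS = (
    "program", "class", "if", "else", "forvalues", "foreach",
    "while", "quietly", "noisily", "capture",
)


def is_fold_start(line):
    """Decide from one line alone whether it opens a fold."""
    words = line.split()
    if not words or words[0] not in FOLD_KEYWORDS:
        return False
    if words[0] == "program":
        return True  # program blocks are closed by "end", not by "}"
    # Extra check: the keyword only counts if a "{" is on the same line.
    return "{" in line


# Single-command "if": correctly rejected because there is no "{".
print(is_fold_start("if ~missing(`ii') matrix `bb'[`i', `nc1' + 1] = `b2'[`ii', 1...]"))  # False
# Brace on the same line: accepted.
print(is_fold_start("forvalues j = 1/2 {"))  # True
# Brace on the following line: the fold is missed.
print(is_fold_start("if ~missing(`ii')"))  # False
```

The last call shows the trade-off: the extra "{" test removes the false fold point but also hides any fold whose brace sits on a later line.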

Let me know which version you think works the best.

Cheers Jussi

Posted: Mon May 11, 2009 5:07 pm
by zura
Hi jussi,

somehow this latest version has more problems than the previous one:
* symbol (one-line comment) causes a fold.
and if we have a long command spanning multiple lines, folding doesn't work...

A few more clarifications:

Code: Select all

// this is a one-line comment, it can be either at the beginning or at the end of a line

* this is a one-line comment too, but "*" must be in the first column only
/* 
a multi-line comment, that folds correctly with the current dll
*/

/*
if (expr)
{
 something
}

is invalid syntax in Stata, we have three rules:
1. the open brace must appear on the same line as if, else, forvalues, foreach, while, and class
2. nothing may follow the open brace (except comments), the first command to be executed must appear on a new line.
3. the close brace must appear on a line by itself.
*/

 // so the "usual" syntax would be:

forvalues i = 1/10 { // comments may be here
   ...
   ...
}

// the expression may contain symbols "{" and/or "}" so for a begin fold
// we should be looking for the last "{" in the line:

if ${globalname} == 5{ // and notice that the space can be omitted between
                                   // 5 and "{"
 ... do something
}

/* there can be long expressions split across multiple lines */

if (`dataischange' == 1 | `userrequestedsave' == 1)    ///
  & `fileiswritable' == 1 {
  ... do something to save the file...
}

// of course we have a single command version of if/else where {} are
// not required

if `a' = 4 display "a equals to 4"

// class, forvalues, foreach, while MUST use braces even if there is only one command to execute
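The three brace rules suggest a simpler test than keyword matching: under the cr delimiter a line opens a fold when, once a trailing // comment is removed, its last character is "{", and it closes a fold when it is a lone "}". Here is a minimal Python sketch of that idea; the function names are hypothetical, and it assumes "//" never occurs inside a string literal and does not handle * comment lines or the ; delimiter.

```python
import re

# Trailing "// ..." (or "/// ...") comment together with the blanks before it.
# Assumption: "//" never occurs inside a string literal such as a URL.
_TRAILING_COMMENT = re.compile(r"\s*//.*$")


def strip_trailing_comment(line):
    """Remove a trailing // comment and any trailing whitespace."""
    return _TRAILING_COMMENT.sub("", line).rstrip()


def opens_fold(line):
    """Rule 2: nothing but a comment may follow the opening brace."""
    return strip_trailing_comment(line).endswith("{")


def closes_fold(line):
    """Rule 3: the closing brace sits on a line by itself."""
    return strip_trailing_comment(line).strip() == "}"


print(opens_fold("if ${globalname} == 5{ // no space before the brace"))  # True
print(opens_fold("if ${globalname} == 5 display 1"))  # False, braces belong to the macro
print(opens_fold('if `a\' = 4 display "a equals to 4"'))  # False
print(closes_fold("}"))  # True
```

With this test a multi-line condition folds from the line that carries the "{", so the first line of the condition stays visible above the fold.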
All these examples used the cr end-of-line delimiter. If the code uses ; instead, things change: multiple commands can appear on the same (visually, for us) line, delimited by ;

Code: Select all

#delimit ;
generate a = 0; replace a = 1 if b > 100;

local i = 0;
while `i' < 10 {;
  display `i';
  local ++i;
};

// and although not aesthetically pleasing, this is a valid syntax:

forvalues i = 1/10 {; display `i';};

/* with ; delimiter we don't need to use /// to split long lines so, our multi-line if from above would look like:
*/
if (`dataischange' == 1 | `userrequestedsave' == 1)    
  & `fileiswritable' == 1 {;
  ... do something to save the file...;
};
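To illustrate how the two delimiters interact, here is a minimal Python sketch that splits source text into commands while tracking the active #delimit mode. The name split_commands is hypothetical, and the sketch ignores comments, /// continuations and ";" inside string literals, so it is a rough model rather than a Stata parser.

```python
def split_commands(text):
    """Split Stata source into commands, honouring #delimit ; and #delimit cr."""
    commands = []
    buffer = []             # pieces of a command still waiting for its ";"
    semicolon_mode = False  # Stata starts out with the cr delimiter

    def flush():
        command = " ".join(piece for piece in buffer if piece)
        if command:
            commands.append(command)
        buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#delimit"):
            flush()
            semicolon_mode = stripped[len("#delimit"):].strip().startswith(";")
            continue
        if semicolon_mode:
            pieces = line.split(";")
            # Every piece except the last one was terminated by a ";".
            for piece in pieces[:-1]:
                buffer.append(piece.strip())
                flush()
            buffer.append(pieces[-1].strip())
        elif stripped:
            commands.append(stripped)  # cr mode: one command per line
    flush()
    return commands


source = """#delimit ;
generate a = 0; replace a = 1 if b > 100;
while `i' < 10 {;
  display `i';
};
#delimit cr
drop _all
"""
for command in split_commands(source):
    print(command)
```

After splitting, "while `i' < 10 {" and "}" come out as separate commands, so the same brace rules can then be applied to commands instead of physical lines.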
hope this helps, and thanks a lot again for your support.
Cheers,
zura

Posted: Tue May 12, 2009 1:05 am
by jussij

zura wrote: somehow this latest version has more problems than the previous one:
The problem is that the Zeus folder has to decide whether a line is a fold point using nothing but the information in that line.

But with so little information it is very easy to write a folder that creates folds where there are none, or that starts missing real folds :(

For example, consider this code:

Code: Select all

if (`dataischange' == 1 | `userrequestedsave' == 1)    ///
  & `fileiswritable' == 1 {
  ... do something to save the file...
}
And compare it to this code:

Code: Select all

// of course we have a single command version of if/else where {} are
// not required
if `a' = 4 display "a equals to 4"
Zeus has to decide if the line containing the 'if' is a fold.

In the current folder neither 'if' line looks like a code fold, as neither has a '{' character, but obviously this is a mistake for the first example :(

Because of this I also think the multi-line comments are going to be a problem :(

Consider this multi-line comment:

Code: Select all

* this is an example of multi-line command with cr delimiter
regress y region1 region2 region3 ///
               educ1 educ2 educ3 ///
              age male
Then consider this one-line comment:

Code: Select all

* this is a one-line comment too, but "*" must be in the first column only
Because Zeus only has one line to work with, both of these lines are effectively the same. With only one line of information Zeus cannot tell which is the multi-line and which is the single-line comment :(

In any case, a new xFolder.dll can be found here: http://www.zeusedit.com/z300/xFolder.zip

It will implement the following folding:
Extension: .do; .ado; .class
Line Comment1: //
Block Comment: /* */
Case Sensitive: yes

Begin: program
End: end
Begin: {
End: }
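For illustration only, the two Begin/End pairs in this definition can be modelled with a small stack in Python. This is a sketch of the idea, not the xFolder.dll code; the name fold_ranges is hypothetical, and it assumes the cr delimiter, that "//" never appears inside a string, and that every line starting with "program" opens a block (so "program drop ..." would be misread).

```python
def fold_ranges(lines):
    """Return (start, end) pairs of 1-based line numbers for each fold."""
    stack = []   # open folds as (kind, line_number)
    ranges = []
    for number, line in enumerate(lines, start=1):
        # Drop a trailing // (or ///) comment before testing the line.
        code = line.split("//", 1)[0].strip()
        words = code.split()
        if words and words[0] == "program":
            stack.append(("program", number))   # Begin: program
        elif code.endswith("{"):
            stack.append(("{", number))         # Begin: {
        elif code == "end" and stack and stack[-1][0] == "program":
            ranges.append((stack.pop()[1], number))  # End: end
        elif code == "}" and stack and stack[-1][0] == "{":
            ranges.append((stack.pop()[1], number))  # End: }
    return sorted(ranges)


sample = [
    "program define demo",
    "    forvalues j = 1/2 {",
    "        display `j'",
    "    }",
    "    if `a' == 4 display \"four\"",
    "end",
]
print(fold_ranges(sample))  # [(1, 6), (2, 4)]
```

The single-command if on line 5 produces no fold because it neither ends in "{" nor starts with "program".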
Cheers Jussi

Posted: Tue May 12, 2009 1:50 am
by zura
So I see now where the confusion was.
jussij wrote: Consider this multi-line comment:
Code:
* this is an example of multi-line command with cr delimiter
regress y region1 region2 region3 ///
educ1 educ2 educ3 ///
age male
This is not a multi-line comment, but a one-line comment followed by a command (regress ...) split across three lines. Anyway, thanks for the update; I'll try the newest version and post the results.

Posted: Tue May 12, 2009 4:59 am
by jussij

Code: Select all

* this is an example of multi-line command with cr delimiter

I will change the folder to treat this as a line comment ;)

Cheers Jussi