Sunday, May 06, 2007

Comment Delete

Just a quick fun story from Trevor Johnson that he wanted me to post. Enjoy :)

As Computer Science majors at North Dakota State University, our initial drive was to create a program using Microsoft’s Visual C++.NET, which neither of us was familiar with. We had limited experience with Visual Studio 2003, so we also took this opportunity to expand our knowledge base to encompass Visual Studio 2005. This allowed us to build the application using the standalone features of Visual Studio .NET, taking advantage of the .NET architecture but without using the common libraries, so that a user’s machine would not need the .NET 2005 Framework installed.

In searching for a program to create, we noticed that in many code files available on the internet and distributed by professors and other sources, it is not unusual for the comment lines to outnumber the actual lines of code. This can reduce the readability of the code and can as much as double the file size, making it tricky to store larger programs on flash drives.

We began project development by creating a command-line C++ program implementing our initial algorithm for Java/C++ single-line comments (//Comment) only. Using the FileWriter and FileReader classes, we originally made a simplistic version of a typical Find and Replace function, which replaced comments with null characters. Because the FileReader class can only navigate a text file one line at a time, we determined that it lacked the flexibility to remove multi-line comments (/*Comment*/), which require a start symbol and an end symbol that may occur any number of lines later.
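The line-at-a-time approach can be sketched roughly as follows. This is not the original program: the function name stripLineComments is hypothetical, it uses std::getline from the C++ standard library rather than the FileReader class mentioned above, and it deletes the comment text instead of overwriting it with null characters.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out` one line at a time, cutting each
// line at the first occurrence of `marker` (for example "//").
// Caveat: a marker that appears inside a string literal is cut as well.
void stripLineComments(std::istream& in, std::ostream& out,
                       const std::string& marker) {
    assert(!marker.empty());
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type pos = line.find(marker);
        if (pos != std::string::npos) {
            line.erase(pos);  // drop everything from the marker to end of line
        }
        out << line << '\n';
    }
}

// Convenience overload for in-memory text.
std::string stripLineComments(const std::string& text,
                              const std::string& marker) {
    std::istringstream in(text);
    std::ostringstream out;
    stripLineComments(in, out, marker);
    return out.str();
}
```

Because each line is handled in isolation, this sketch has no way to remember that a /* opened on an earlier line, which is the limitation described above.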

To handle a starting and an ending symbol, we needed to navigate through the text file character by character. To accomplish this, we inevitably had to downgrade our technology and use fscanf to read in the text document a single character at a time. We modeled the algorithm after a Turing machine, which had been introduced to us by John C. Martin III in Theoretical Computer Science II. The algorithm was as follows: starting from an initial position in the text document, we progress character by character, leaving each character intact until we encounter the specified start symbol. From that point on, each character is replaced by a null character or blank, and this continues until the end symbol closes the comment.

Figure 1. Turing Machine representation of algorithm

The Turing machine can be defined via the 7-tuple structure { Q, Γ, Δ, Σ, δ, q0, F }, where Q = { q0, 1, 2, ha }, Γ = Σ ∪ { Δ }, δ: Q × Γ → Q × Γ × { L, R, S }, and F = { ha }.
In our Turing machine, “x” represents an arbitrary character from the ASCII character encoding standard, and the symbols α and β are defined as our starting and ending symbols, as determined by the user’s comment selection. Since “x” is an element of Σ and “x” is already known to be a member of the ASCII character encoding standard, it follows that Σ is drawn from ASCII as well. Finally, the Δ symbol represents any form of null character, such as λ or a blank; it is not to be confused with a space, but is rather an empty character.
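A minimal sketch of the character-by-character scan, written as a two-state machine, might look like this. The function name stripDelimited and the state names are assumptions for illustration and do not reproduce the states in Figure 1. It works on an in-memory string rather than reading with fscanf, and it drops comment characters instead of writing Δ in their place.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: removes every span that begins with `alpha` and ends
// with `beta` (both delimiters included). An unterminated span runs to the
// end of the text.
std::string stripDelimited(const std::string& text,
                           const std::string& alpha,
                           const std::string& beta) {
    assert(!alpha.empty() && !beta.empty());
    enum class State { Copying, InComment };
    State state = State::Copying;
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (state == State::Copying) {
            if (text.compare(i, alpha.size(), alpha) == 0) {
                state = State::InComment;  // start symbol found
                i += alpha.size();
            } else {
                out += text[i];            // leave the character intact
                ++i;
            }
        } else {
            if (text.compare(i, beta.size(), beta) == 0) {
                state = State::Copying;    // end symbol found
                i += beta.size();
            } else {
                ++i;                       // skip a comment character
            }
        }
    }
    return out;
}
```

For example, stripDelimited("a/*b\nc*/d", "/*", "*/") returns "ad". Note that using a newline as the end symbol for single-line comments would also consume the newline itself.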

After we had defined our second algorithm, we began working on a user interface to make the best use of the recursive nature of our methods, which we placed in a C++ header file so that they could easily be extended to any GUI. Our user interface allows a user to select the files and then choose the comment type to be deleted from a list of: Java/C++ comments, single and multi-line; Assembly comments (;Comment); Perl comments (#Comment); and HTML comments (<!--Comment-->). This selection defines our start and end symbols. A log was added to the GUI to show how long the process took, how many comments had been deleted, and how many lines the new document contains.
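The mapping from the selected comment type to a pair of start and end symbols could be sketched as below. The names CommentType, Delimiters, and delimitersFor are hypothetical, and treating a newline as the end symbol for single-line styles is an assumption rather than something the story states.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for the comment styles offered in the list.
enum class CommentType { CppSingleLine, CppMultiLine, Assembly, Perl, Html };

// A start symbol and an end symbol for one comment style.
struct Delimiters {
    std::string start;
    std::string end;
};

// Assumption: single-line styles end at the newline character.
Delimiters delimitersFor(CommentType type) {
    switch (type) {
        case CommentType::CppSingleLine: return {"//", "\n"};
        case CommentType::CppMultiLine:  return {"/*", "*/"};
        case CommentType::Assembly:      return {";", "\n"};
        case CommentType::Perl:          return {"#", "\n"};
        case CommentType::Html:          return {"<!--", "-->"};
    }
    assert(false && "unknown comment type");
    return {};
}
```

Keeping the table in one function means a new comment style only needs one extra enum value and one extra case.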

It was at this point that we encountered a problem with the C++ FILE data type, which required passing a constant char. A constant char would not allow us to dynamically select the files to be parsed. To resolve this quagmire, we used pointers to create two constant chars, one for the input file and one for the output file, and then manually wrote each file name into the memory reserved for its constant char.
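For comparison, one common alternative is to hold the file names chosen at run time in std::string objects and pass c_str() to fopen, which yields the const char pointer that fopen expects. This is not the pointer technique described above, only a sketch of a different approach; the helper name copyFile is hypothetical, and it simply copies the file one character at a time.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: copies the file at `inPath` to `outPath` one character
// at a time. Returns false if either file cannot be opened or closed cleanly.
bool copyFile(const std::string& inPath, const std::string& outPath) {
    assert(!inPath.empty() && !outPath.empty());
    // c_str() exposes the run-time string as a const char* for fopen.
    std::FILE* in = std::fopen(inPath.c_str(), "rb");
    if (!in) {
        return false;
    }
    std::FILE* out = std::fopen(outPath.c_str(), "wb");
    if (!out) {
        std::fclose(in);
        return false;
    }
    int c;
    while ((c = std::fgetc(in)) != EOF) {
        std::fputc(c, out);
    }
    std::fclose(in);
    return std::fclose(out) == 0;
}
```

The paths can come from any source that produces a std::string, such as a file dialog or a command-line argument.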

<----------------------------------->
Hope you all enjoyed that as much as I did. Anyways, the program, in case you don't already have it.

Anonymous said...

Ha... Trevor, you're not really that smart. Way to fake it though. I know you're just thinking about Asian women anyway.
