Html2dvi

*** Several unnecessary space characters had crept into the script in version 0.6k and earlier. Sorry! ***
*** Can now handle multiple files. ***
*** Ver 0.7 is garbage; get the latest version. ***

1. Setting Up

Html2dvi is a Perl script that converts an HTML file into a TeX file, calls a TeX program to compile it into a dvi file, and then, if requested, calls a previewer to display the result. It does not handle graphic images; in place of each image it prints either the string [img] or the string given by the ALT attribute. It is thus a tool for reading the text of an HTML file with a dvi previewer.

At the beginning of the script, you need to specify the path of the Perl program you use, the name of the TeX program, whether you use Japanese, the command for converting the character code of Japanese text, the name of the dvi previewer, and so on. Set these to suit your environment. If you handle text containing Japanese with the standard, non-Japanized Perl, you need to set $perl_is_japanized = 0; (instead of the default 1) to avoid unreadable results. In this case, html2dvi cannot handle files containing very long lines.
The script itself contains a Japanese character, so convert its character code if necessary.
Make the script executable by typing chmod +x html2dvi, and put it where executables are usually placed.
Entering just html2dvi prints a help message.
Typing "html2dvi [options] xxxx.html" creates a file xxxx.tex, and the TeX program is then called to create the dvi file. Even when the HTML file you specify is in a directory other than the current one, the TeX file is created in the current directory, so make sure you have write permission there. If a file named xxxx.tex already exists, it is renamed to xxxx.bak.
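
The file-naming rule above can be sketched as follows. This is only an illustration in Python, not the script's actual Perl code; the function name prepare_tex_path is hypothetical, and overwriting an older .bak file is an assumption, since the behavior in that case is not described here.

```python
import os

def prepare_tex_path(html_path, cwd="."):
    """Return the .tex path to write for html_path (hypothetical helper).

    The output always goes into cwd, whatever directory the HTML file
    is in. An existing xxxx.tex is first renamed to xxxx.bak.
    """
    base = os.path.splitext(os.path.basename(html_path))[0]
    tex_path = os.path.join(cwd, base + ".tex")
    if os.path.exists(tex_path):
        # Assumption: an older .bak file is silently overwritten.
        os.replace(tex_path, os.path.join(cwd, base + ".bak"))
    return tex_path
```

For example, prepare_tex_path("/some/dir/xxxx.html") returns the path of xxxx.tex in the current directory, not in /some/dir.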

If you specify the -v option, then the previewer will start after compilation of the dvi file.

2. Controlling the output

Use the -a4 or -b5 option to select A4 or B5 paper. There is also a -long option, which selects a very long sheet. Finer control of the page size, baseline skip, magnification, etc. can be specified in a style file in the current directory. The default style file is html2dvi.st. If such a file exists in the current directory but you do not want to use it, specify the -f option by itself. If you want to use a different style file, put its name right after -f, as in -fxxxxx.st.

When used with the -flip option, Html2dvi tries to convert quotes like 'xyz' and "xyz" into `xyz' and ``xyz''. It sometimes works and sometimes doesn't. Quotes in a PRE environment are not affected.
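
To show why quote flipping can only be a heuristic, here is a minimal sketch in Python. It is not the rule html2dvi itself uses: the function name flip_quotes and both regular expressions are assumptions made for illustration, and the PRE exception is not modeled.

```python
import re

def flip_quotes(text):
    """Turn 'xyz' into `xyz' and "xyz" into ``xyz'' (TeX-style quotes)."""
    # Single quotes first. The lookarounds skip apostrophes inside
    # words such as "don't", but cannot catch every case.
    text = re.sub(r"(?<!\w)'([^'\n]+)'(?!\w)", r"`\1'", text)
    # Then paired double quotes on the same line.
    text = re.sub(r'"([^"\n]*)"', r"``\1''", text)
    return text
```

A quotation that spans several lines, or an unpaired quote character, defeats a pattern-based approach like this one.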

3. Tables

Html2dvi cannot split a table, so it cannot handle a table that is too long to fit on a page. In that case, use the -notable option (see below), specify a long sheet in the style file, or use the -long option.
The alignment within cells is disregarded. If all the cells are small enough, try the -nofold option described below. If the column widths are not specified, all columns are given the same width.

Options concerning tables
Option	Description
`-omittable`	The tables are not printed. A string `[table]` will appear in place of a table.
`-notable`	All the tags related to tables are disregarded, but the contents of cells are not. Each cell will make one or more paragraphs. The centering of the table will be disregarded.
`-nofold`	No linebreak occurs, and each column will be given a natural width. Alignment is also taken care of.

Try these options on this HTML file.

4. HTML files containing very long lines

Some HTML files contain very long lines; these are usually produced by word processors. Since TeX cannot handle very long lines, html2dvi looks for a comma followed by a space and inserts a newline character after each such combination, unless it is already at the end of a line. Html2dvi also looks for a similar Japanese character. See the comment above about the variable $perl_is_japanized.
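
The line-breaking rule can be sketched like this in Python. The function name break_long_line and the limit parameter are hypothetical: no threshold is stated here, and the Japanese case is left out.

```python
import re

def break_long_line(line, limit=200):
    """Insert a newline after each ", " in an overly long line.

    limit is an assumed threshold, not a documented one. A ", " that
    already ends the line is left alone.
    """
    if len(line) <= limit:
        return line
    body = line.rstrip("\n")
    tail = line[len(body):]
    # (?=.) requires at least one more character on the same line.
    return re.sub(r", (?=.)", ", \n", body) + tail
```

Breaking only after ", " keeps words intact, and TeX treats a single newline inside a paragraph as an ordinary space.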

5. More restrictions (i.e. bugs)

When a tag spans two or more lines, only the information on the first line is used, and the rest is disregarded.
The style attribute in the <OL> tag will be disregarded.
Forms are disregarded.
<FONT> and </FONT> are also disregarded.
The subscripts and superscripts are not printed small: H₂O.
If "$use_japanese = 1;" and "$perl_is_japanized = 0;", then an underlined string is put in a horizontal box, so line breaks will not occur in the middle of the string. This means that an underlined string may stick out into the right margin. Otherwise, each character is underlined separately, and line breaks may occur anywhere in the string. (Each character is treated as if it were a word by itself.)
Some of the special characters/symbols of the form &#ddd; and &name; are supported. Try html2dvi on the following pages to see which special characters/symbols are supported:
The depth of nested lists is not reset even in a cell of a table.
Lists not surrounded by tags like <UL> and </UL> may stick out to the left.
If TeX complains about 'missing such and such' or 'too many so and so', just enter r and pray!

6. How Html2dvi converts files

Html2dvi first reads the whole file and stores the table sizes and the captions in memory. It then reads the file again from the top, this time converting tags to the corresponding TeX command sequences and printing the text almost verbatim.
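
The two-pass structure might look roughly like the following Python sketch. It is not a port of the script: the function names, the regular expressions, and the small tag-to-TeX mapping are all hypothetical, and the second pass here does not use the table information gathered by the first.

```python
import re

def first_pass(html):
    """Pass 1: collect (rows, columns, caption) for every table."""
    tables = []
    for m in re.finditer(r"<table\b.*?</table>", html, re.I | re.S):
        block = m.group(0)
        rows = re.findall(r"<tr\b.*?(?=<tr\b|</table>)", block, re.I | re.S)
        cols = max((len(re.findall(r"<t[dh]\b", r, re.I)) for r in rows),
                   default=0)
        cap = re.search(r"<caption\b[^>]*>(.*?)</caption>", block,
                        re.I | re.S)
        tables.append((len(rows), cols, cap.group(1).strip() if cap else None))
    return tables

# Hypothetical mapping; the real script covers far more tags.
TEX_FOR_TAG = {"b": r"{\bf ", "/b": "}", "i": r"{\it ", "/i": "}",
               "p": r"\par "}

def second_pass(html):
    """Pass 2: replace known tags with TeX, drop the rest, keep the text."""
    def repl(m):
        return TEX_FOR_TAG.get(m.group(1).lower(), "")
    return re.sub(r"<(/?\w+)[^>]*>", repl, html)
```

Collecting table sizes before any output is written lets a converter decide on a table's layout before it emits the first cell, which a single pass cannot do.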
