Html2dvi

*** Several stray space characters had crept into the script in version 0.6k and earlier. Sorry! ***
*** Can now handle multiple files. ***
*** Ver 0.7 is garbage; get the latest version. ***

1. Setting Up

Html2dvi is a Perl script that converts an HTML file into a TeX file, calls a TeX program to compile it into a dvi file, and then, if requested, calls a previewer to display the result. It does not handle graphic images; in place of each image it displays either the string [img] or the string given by the ALT attribute. It is thus a tool for reading the text of an HTML file with a DVI previewer.

  1. At the beginning of the script, you need to specify the path of the perl program you use, the name of the TeX program, whether you use Japanese, the command for converting the character codes of Japanese texts, the name of the dvi previewer, etc. Set these to suit your environment. If you handle texts containing Japanese with the standard (non-Japanized) perl, you need to set $perl_is_japanized = 0; (instead of the default 1) to avoid unreadable results. In this case, html2dvi cannot handle files containing very long lines.
  2. The script itself contains a Japanese character, so convert its character code if necessary.
  3. Make the script executable by typing chmod +x html2dvi, and put it where executables are usually placed.
  4. Entering html2dvi with no arguments prints a help message.
  5. Typing ``html2dvi [options] xxxx.html'' creates a file xxxx.tex and then calls the TeX program to create the dvi file. Even when the HTML file is in a directory other than the current one, the TeX file is created in the current directory, so make sure that you have write permission there. If a file named xxxx.tex already exists, it is renamed to xxxx.bak.
  6. If you specify the -v option, then the previewer will start after compilation of the dvi file.
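
The output-file rule in step 5 can be sketched as follows. This is an illustration in Python, not the script's own Perl code; the function name prepare_tex_path and its out_dir parameter are hypothetical, and overwriting an older xxxx.bak is an assumption, since the text does not say what happens when a backup already exists.

```python
import os

def prepare_tex_path(html_path, out_dir="."):
    """Return the .tex path for html_path inside out_dir (hypothetical helper).

    Only the base name of the HTML file is used, so the TeX file always
    lands in out_dir even if the HTML file lives elsewhere.
    """
    stem = os.path.splitext(os.path.basename(html_path))[0]
    tex_path = os.path.join(out_dir, stem + ".tex")
    if os.path.exists(tex_path):
        # Keep the previous output as xxxx.bak.
        # Assumption: an older backup, if any, is silently replaced.
        os.replace(tex_path, os.path.join(out_dir, stem + ".bak"))
    return tex_path
```

Calling prepare_tex_path("/some/dir/xxxx.html") returns the path xxxx.tex in the current directory, after moving any existing xxxx.tex there to xxxx.bak.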

    2. Controlling the output

    Use the -a4 or -b5 option to select A4 or B5 paper. There is also a -long option, which specifies a very long sheet. Finer control of the size, baseline skip, magnification, etc. can be specified in a style file in the current directory. The default style file is html2dvi.st. If such a file exists in the current directory but you do not want to use it, specify the -f option with no file name. To use a different style file, give its name immediately after -f, as in -fxxxxx.st.

    When the -flip option is given, Html2dvi tries to convert quotes such as 'xyz' and "xyz" into `xyz' and ``xyz''. This sometimes works and sometimes does not. Quotes inside a PRE environment are not affected.
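
One way such quote flipping could work is sketched below in Python. It is not the script's actual Perl code, and the regular expressions and the name flip_quotes are assumptions made for illustration. The sketch leaves PRE blocks alone and treats a single quote as an opening quote only when it does not directly follow a letter or digit, so that apostrophes as in don't are kept. It does not protect quotes inside tag attributes, so it should be applied to text from which other tags have already been handled.

```python
import re

# Split pattern that keeps <pre>...</pre> blocks as separate pieces.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.IGNORECASE | re.DOTALL)

def flip_plain(text):
    """Convert 'xyz' to `xyz' and "xyz" to ``xyz'' in text without PRE blocks."""
    # Single quotes first; the lookarounds skip apostrophes inside words.
    text = re.sub(r"(?<!\w)'([^'\n]*)'(?!\w)", r"`\1'", text)
    # Then double quotes that open and close on the same line.
    text = re.sub(r'"([^"\n]*)"', r"``\1''", text)
    return text

def flip_quotes(html):
    """Flip quotes everywhere except inside PRE blocks."""
    pieces = PRE_BLOCK.split(html)
    # With one capturing group, odd-numbered pieces are the PRE blocks.
    return "".join(
        piece if i % 2 else flip_plain(piece)
        for i, piece in enumerate(pieces)
    )
```

Unbalanced or nested quotes are left as they are, which is one way a simple pattern-based approach can miss cases.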

    3. Tables

    Options concerning tables

    Option      Description
    -omittable  The tables are not printed. A string [table] will appear in place of a table.
    -notable    All the tags related to tables are disregarded, but the contents of cells are not. Each cell will make one or more paragraphs. The centering of the table will be disregarded.
    -nofold     No line break occurs, and each column will be given a natural width. Alignment is also taken care of.
    Try these options on this HTML file.

    4. HTML files containing very long lines

    Some HTML files contain very long lines; these are usually produced by word processors. Since TeX cannot handle very long lines, html2dvi looks for a comma followed by a space and inserts a newline character after each such pair, unless the pair is already at the end of a line. Html2dvi also looks for the corresponding Japanese character. See the note on the variable $perl_is_japanized in section 1.
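
The comma-and-space rule can be sketched in Python as follows. This is not the script's Perl code. The max_len threshold is an assumption: the text does not say how long a line must be before it is split, and the Japanese case is left out.

```python
import re

def break_long_lines(text, max_len=200):
    """Insert a newline after each ", " in lines longer than max_len.

    max_len is a hypothetical threshold chosen for this sketch.
    """
    out = []
    for line in text.split("\n"):
        if len(line) > max_len:
            # (?=.) requires another character after the pair, so a
            # ", " at the very end of the line is left untouched.
            line = re.sub(r", (?=.)", ", \n", line)
        out.append(line)
    return "\n".join(out)
```

For example, break_long_lines("a, b, c", max_len=3) splits the line after each comma-space pair, while a line no longer than max_len is returned unchanged.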

    5. More restrictions (i.e. bugs)

    6. How Html2dvi converts files

    Html2dvi first reads the whole file and stores the table sizes and captions in memory. It then reads the file again from the top, this time converting tags into the corresponding TeX command sequences and printing the text almost verbatim.
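
The two-pass idea can be illustrated with a small Python sketch built on the standard html.parser module. It is not the script's Perl implementation. The class names, the tiny tag-to-TeX mapping, and the TeX comment emitted for each table are all assumptions made for illustration. Nested tables and the escaping of TeX special characters are not handled.

```python
from html.parser import HTMLParser

class TableScanner(HTMLParser):
    """Pass 1: record rows, columns and caption of each table."""
    def __init__(self):
        super().__init__()
        self.tables = []          # one dict per table, in document order
        self._in_caption = False
        self._cols = 0            # cells seen so far in the current row

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append({"rows": 0, "cols": 0, "caption": ""})
        elif not self.tables:
            return                # not inside any table yet
        elif tag == "tr":
            self.tables[-1]["rows"] += 1
            self._cols = 0
        elif tag in ("td", "th"):
            self._cols += 1
            table = self.tables[-1]
            table["cols"] = max(table["cols"], self._cols)
        elif tag == "caption":
            self._in_caption = True

    def handle_endtag(self, tag):
        if tag == "caption":
            self._in_caption = False

    def handle_data(self, data):
        if self._in_caption and self.tables:
            self.tables[-1]["caption"] += data

# Hypothetical tag mapping; a real converter covers far more tags.
TEX_OPEN = {"b": r"{\bf ", "i": r"{\it ", "p": "\n\n"}
TEX_CLOSE = {"b": "}", "i": "}"}

class TexWriter(HTMLParser):
    """Pass 2: turn tags into TeX and copy text almost verbatim."""
    def __init__(self, tables):
        super().__init__()
        self.tables = tables      # sizes collected during pass 1
        self.out = []
        self._next = 0            # index of the next table to be met

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            info = self.tables[self._next]
            self._next += 1
            # The size is already known when the table starts.
            self.out.append(
                f"% table {self._next}: {info['rows']} rows, {info['cols']} cols\n"
            )
        else:
            self.out.append(TEX_OPEN.get(tag, ""))

    def handle_endtag(self, tag):
        self.out.append(TEX_CLOSE.get(tag, ""))

    def handle_data(self, data):
        self.out.append(data)

def convert(html):
    """Run both passes and return (table info, TeX text)."""
    scanner = TableScanner()
    scanner.feed(html)            # pass 1: collect table sizes and captions
    scanner.close()
    writer = TexWriter(scanner.tables)
    writer.feed(html)             # pass 2: emit TeX
    writer.close()
    return scanner.tables, "".join(writer.out)
```

The point of the first pass is that the number of columns is known before a table's opening tag is reached in the second pass, which TeX table constructs generally need up front.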



