









[NMLUG] Perl Help. Please.
Parsing HTML is a huge pain. About the time you get some hack that
seems to work and gives you the results you want, the source goes and
makes a subtle change that completely throws off your parser.
I think you might be on the right track to use lynx, links or w3m, and
extract from the text output.
w3m started showing up on distros about a year ago, and I've been very
impressed with it.
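
If you would rather stay in pure Perl, here is a minimal sketch of the
multi-line matching idea. It assumes the whole page is already in one
string and that each row of interest has exactly two non-empty cells, a
label followed by a value. The helper name extract_pairs and the sample
data (copied from the fragment quoted below) are only for illustration,
and this is still the fragile kind of regex hack described above, not a
real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input: the fragment from the original message (illustrative only).
my $html = <<'HTML';
</td>
</tr>
<tr>
<td></td>
<td>
Car Color
</td>
<td colspan="2">
Blue
</td>
</tr>
<tr>
<td></td>
<td>
Body Type
</td>
<td colspan="2">
Two door
</td>
</tr>

<tr>
<td colspan="4">
HTML

# extract_pairs is a hypothetical helper, not part of any module.
# It returns a hash of label => value taken from two-cell table rows.
sub extract_pairs {
    my ($page) = @_;
    my %pairs;

    # /s lets "." match newlines, so one row can span several lines.
    while ($page =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
        my $row = $1;

        # Collect the contents of every cell in this row.
        my @cells = $row =~ m{<td\b[^>]*>(.*?)</td>}gis;

        # Trim surrounding whitespace, then drop empty cells.
        s/^\s+|\s+$//g for @cells;
        @cells = grep { length } @cells;

        # Assumption: a data row is exactly "label, value".
        $pairs{ $cells[0] } = $cells[1] if @cells == 2;
    }
    return %pairs;
}

my %data = extract_pairs($html);
print "$_: $data{$_}\n" for sort keys %data;
```

On the sample above this stores Blue under "Car Color" and Two door
under "Body Type". Rows without a closing </tr>, or with a different
number of non-empty cells, are skipped.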
jody
Jude Gabaldon wrote:
> Group,
> I need help with the following Perl problem. I am using Perl to parse
> a web page and collect data. The problem is that the matched text and
> the value I want to capture are on different lines (example below).
>
> </td>
> </tr>
> <tr>
> <td></td>
> <td>
> Car Color
> </td>
> <td colspan="2">
> Blue
> </td>
> </tr>
> <tr>
> <td></td>
> <td>
> Body Type
> </td>
> <td colspan="2">
> Two door
> </td>
> </tr>
>
> <tr>
> <td colspan="4">
>
> where Car Color is the matched text and Blue should be the stored
> variable and where Body Type is the matched text and Two door is the
> stored variable.
>
> Any help would be greatly appreciated. I guess I could remove the
> newline characters and split the page up with newlines after </tr>, but I
> didn't know if there was a better way. Another way might be to call lynx
> and parse the page as it appears to a web browser. Any suggestions?
>
> Jude
>
> _______________________________________________
> NMLUG mailing list
> NMLUG@nmlug.org
> http://www.nmlug.org/mailman/listinfo/nmlug
>
--
http://www.RealizationSystems.com/ -- start communicating
http://www.GalacticSlacker.com/ -- read it and weep
http://www.NMPerspective.com/ -- a Southwest Perspective