
[NMLUG] Perl Help. Please.



Parsing HTML is a huge pain.  About the time you get some hack that 
seems to work and gives you the results you want, the source goes and 
makes a subtle change that completely throws off your parser.
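If you do stay with the raw HTML, a real HTML parser is usually less fragile than line-by-line regexes, because it ignores how the markup is indented or wrapped. The sketch below uses Python's standard-library `html.parser`. Neither message mentions this technique; Perl has comparable CPAN modules such as HTML::TreeBuilder. The class and function names are hypothetical. It assumes every label/value row holds exactly two non-empty `<td>` cells, as in Jude's sample, and it does not handle nested tables.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse the newlines and indentation inside the cell into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left unclosed at the end of the row

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def label_value_pairs(html):
    """Map each row's label cell to its value cell, skipping empty spacer cells."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    pairs = {}
    for row in parser.rows:
        cells = [c for c in row if c]  # drop spacer cells such as <td></td>
        if len(cells) == 2:
            label, value = cells
            pairs[label] = value
    return pairs
```

Because the parser works on tags rather than lines, moving "Blue" onto the same line as its `<td>`, or re-indenting the page, does not change the result.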

I think you're on the right track with using lynx, links, or w3m and
extracting the data from their plain-text output.

w3m started showing up in distros about a year ago, and I've been very
impressed with it.
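A minimal sketch of the text-browser route follows, written in Python rather than Perl. It assumes w3m is installed and that `-dump -T text/html` makes it read HTML from standard input and print the rendered text. It also assumes the rendered table puts each row on one line, such as `Car Color   Blue`; check a real dump before relying on that. The function names and the `LABELS` tuple are hypothetical.

```python
import subprocess

# Hypothetical: the labels whose values we want to capture.
LABELS = ("Car Color", "Body Type")


def dump_text(html):
    """Render HTML to plain text with w3m (assumes w3m is on the PATH)."""
    result = subprocess.run(
        ["w3m", "-dump", "-T", "text/html"],
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_fields(text, labels=LABELS):
    """Find each label at the start of a rendered line and keep the rest of that line."""
    fields = {}
    for line in text.splitlines():
        stripped = line.strip()
        for label in labels:
            if stripped.startswith(label):
                value = stripped[len(label):].strip()
                if value:
                    fields[label] = value
    return fields
```

Usage would be `extract_fields(dump_text(page_html))`. Splitting the rendering step from the matching step lets you test the matching on a saved dump without running w3m each time.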

jody

Jude Gabaldon wrote:
> Group,
>   I need help with the following Perl problem.  I am using Perl to parse
> a web page and collect data.  The problem is that the matched text and
> the value I want to capture are on different lines (example below):
> 
>                     </td>
>                   </tr>
>                   <tr>
>                     <td></td>
>                     <td>
>                       Car Color
>                     </td>
>                     <td colspan="2">
>                       Blue
>                     </td>
>                   </tr>
>                   <tr>
>                     <td></td>
>                     <td>
>                       Body Type
>                     </td>
>                     <td colspan="2">
>                       Two door
>                     </td>
>                   </tr>
> 
>                 <tr>
>                   <td colspan="4">
> 
> where "Car Color" is the matched text and "Blue" should be stored in a
> variable, and "Body Type" is the matched text and "Two door" should be
> stored.
> 
> Any help would be greatly appreciated.  I guess I could remove the newline
> characters and re-split the page with a newline after each </tr>, but I
> didn't know if there was a better way.  Another option might be to call
> lynx and parse the page as it appears in a web browser.  Any suggestions?
> 
> Jude
> 
> _______________________________________________
> NMLUG mailing list
> NMLUG@nmlug.org
> http://www.nmlug.org/mailman/listinfo/nmlug
> 
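Jude's own idea of removing the newlines and splitting the page after each `</tr>` can be sketched as follows. It is written in Python rather than Perl, though the regular expressions carry over almost unchanged. It assumes each label/value row looks like the quoted sample: an empty spacer cell, a label cell, and a value cell containing plain text. Tags nested inside a cell would be kept in the captured value. The function name `pairs_from_rows` is hypothetical.

```python
import re


def pairs_from_rows(html):
    """Flatten the page, cut it at each </tr>, and pair each row's label with its value."""
    flat = " ".join(html.split())  # remove newlines and collapse indentation
    rows = re.split(r"</tr>", flat, flags=re.IGNORECASE)
    pairs = {}
    for row in rows:
        # Non-greedy match keeps each <td ...>...</td> cell separate.
        cells = re.findall(r"<td[^>]*>(.*?)</td>", row, flags=re.IGNORECASE)
        cells = [c.strip() for c in cells]
        cells = [c for c in cells if c]  # skip empty spacer cells such as <td></td>
        if len(cells) == 2:
            pairs[cells[0]] = cells[1]
    return pairs
```

This works on the sample, but the regexes are exactly the kind of hack jody describes: a new attribute, a nested tag, or an extra cell in the row can break it.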

-- 
http://www.RealizationSystems.com/ -- start communicating
http://www.GalacticSlacker.com/ -- read it and weep
http://www.NMPerspective.com/ -- a Southwest Perspective



