Html Parser Help [File Download Help]


Posted

<div class="helpbox">
<h2>Downloads</h2>
<div class="help-details">
<ul><li>
<a type="application/x-zip-compressed" href="/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Downloads/2013/April/Monthly-Report-By-Contract-2013-04.zip">Monthly Enrollment by Contract &#8211; April 2013 [ZIP, 87KB]</a> </li>

The directory part of the href (from /Research-Statistics-Data-and-Systems/ through /2013/April/) has to be prefixed with http://www.websitename.gov; the file name at the end, Monthly-Report-By-Contract-2013-04.zip, is the file I have to download.



string source = "complete website source";

Then do a Match:

string value1 = Regex.Match(source, @"<div class=""help-details"">\s*<ul>\s*<li>\s*<a[\s\S]+?>").Value;

This matches everything in the complete source from the <div up to the greater-than sign that closes the link's opening tag, and returns that value.

Then match only the link in value1:

string value2 = Regex.Match(value1, @"(?<=href="")[\s\S]+?""").Value; // matches the href value up to and including its closing double quote

Then remove the trailing double quote and do the string concatenation:

int len = value2.Length;

string result = "http://www.websitename.gov" + value2.Substring(0, len - 1); // remove last char (the closing quote)
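
For anyone whose tool is not C#, the same steps can be sketched in Python with the standard `re` module. This is only an illustration under assumptions: `SOURCE` is a trimmed copy of the HTML from the post, `BASE` is the placeholder host name, and `extract_zip_url` is a hypothetical helper name, not something from the original code.

```python
import re

# Trimmed copy of the HTML from the post; a real page would be much longer.
SOURCE = '''<div class="helpbox">
<h2>Downloads</h2>
<div class="help-details">
<ul><li>
<a type="application/x-zip-compressed" href="/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Downloads/2013/April/Monthly-Report-By-Contract-2013-04.zip">Monthly Enrollment by Contract &#8211; April 2013 [ZIP, 87KB]</a> </li>'''

BASE = "http://www.websitename.gov"  # placeholder host, as in the post


def extract_zip_url(source, base=BASE):
    """Return the absolute download URL, or None if the link is not found."""
    # Step 1: from the help-details <div> to the '>' that closes the opening <a> tag.
    block = re.search(
        r'<div class="help-details">\s*<ul>\s*<li>\s*<a[\s\S]+?>', source
    )
    if block is None:
        return None
    # Step 2: capture the href value between its double quotes.
    href = re.search(r'href="([^"]+)"', block.group(0))
    if href is None:
        return None
    # Step 3: the href is site-relative (starts with "/"), so prefix the host.
    return base + href.group(1)


print(extract_zip_url(SOURCE))
```

Capturing the href with a group avoids the separate "remove last char" step, at the cost of drifting slightly from the C# flow above.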




I did this in C#. Hopefully your tool has regular expression support too. If even a tool like Notepad++ has it, an automation tool will surely have some classes for it.

ATB


Posted

[quote name='puli_keka' timestamp='1367602932' post='1303702030']
<div class="helpbox">
<h2>Downloads</h2>
<div class="help-details">
<ul><li>
<a type="application/x-zip-compressed" href="/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Downloads/2013/April/Monthly-Report-By-Contract-2013-04.zip">Monthly Enrollment by Contract &#8211; April 2013 [ZIP, 87KB]</a> </li>

The directory part of the href (from /Research-Statistics-Data-and-Systems/ through /2013/April/) has to be prefixed with http://www.websitename.gov; the file name at the end, Monthly-Report-By-Contract-2013-04.zip, is the file I have to download.

string source = "complete website source";

Then do a Match:

string value1 = Regex.Match(source, @"<div class=""help-details"">\s*<ul>\s*<li>\s*<a[\s\S]+?>").Value;

This matches everything in the complete source from the <div up to the greater-than sign that closes the link's opening tag, and returns that value.

Then match only the link in value1:

string value2 = Regex.Match(value1, @"(?<=href="")[\s\S]+?""").Value; // matches the href value up to and including its closing double quote

Then remove the trailing double quote and do the string concatenation:

int len = value2.Length;

string result = "http://www.websitename.gov" + value2.Substring(0, len - 1); // remove last char (the closing quote)

I did this in C#. Hopefully your tool has regular expression support too. If even a tool like Notepad++ has it, an automation tool will surely have some classes for it.

ATB
[/quote]

Please add me on Gtalk too; I would like to ask you some questions as well.

[email protected] Thanks in advance. If you can add me now, great; if that is not possible, later is fine too. Please add me and answer my questions at your convenience.

Posted

Chariji, check your email.

Posted

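VB experts, please check the code below.
In the entire page, <a type="application/x-zip-compressed" occurs only once.

The first 45 characters of the match run from the starting "<" up to the opening quote of the href, so the URL begins with "/" at character 46.

So I take the string from character 46 and stop 2 characters before the end (to drop the trailing "> characters). Mid takes a start position and a length, so the length is Len - 47.

Note that url_ has to be declared As String, and a string is assigned without Set; both are done in the routine below.
PS: Most (90%) of the code was supplied by chariji.

Sub main

Dim regRegExp As New RegExp
Dim mMatch As Match
Dim url_ As String
Dim cMatches As MatchCollection
regRegExp.Global = True
regRegExp.Multiline = False
' Match the opening <a ...> tag of the zip link (quotes are doubled inside a VB string)
regRegExp.Pattern = "<a type=""application/x-zip-compressed"" href=""[\s\S]+?>"
Set cMatches = regRegExp.Execute(webpage) ' webpage holds the page source

For Each mMatch In cMatches
' Skip the 45-character prefix and drop the trailing "> (2 characters)
url_ = Mid(mMatch.Value, 46, Len(mMatch.Value) - 47)
Next

End Sub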
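
The fixed-offset idea is easy to check outside VB. The sketch below, in Python, counts the characters in front of the URL and slices the href out of a matched tag. `PREFIX`, `SUFFIX`, `tag`, and `url_from_tag` are illustrative names only, and the tag is assembled from the HTML posted earlier in the thread.

```python
PREFIX = '<a type="application/x-zip-compressed" href="'
SUFFIX = '">'


def url_from_tag(tag):
    """Slice the href value out of a matched opening <a> tag by fixed offsets."""
    if not (tag.startswith(PREFIX) and tag.endswith(SUFFIX)):
        raise ValueError("tag does not have the expected shape")
    # Python slices are 0-based, so index len(PREFIX) is the first URL character.
    # A 1-based function such as VB's Mid would start one position higher.
    return tag[len(PREFIX):-len(SUFFIX)]


tag = (
    PREFIX
    + "/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports"
    + "/MCRAdvPartDEnrolData/Downloads/2013/April/Monthly-Report-By-Contract-2013-04.zip"
    + SUFFIX
)

print(len(PREFIX))        # number of characters in front of the URL
print(url_from_tag(tag))  # the site-relative path of the zip file
```

Fixed offsets only hold while the attributes stay in exactly this order and spelling, which is why the helper checks the shape of the tag before slicing.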

Posted

I have already left the office, buddy. I will check and let you know soon.

Posted

[quote name='charygaru' timestamp='1367620830' post='1303703621']
I have already left the office, buddy. I will check and let you know soon.
[/quote]

There may still be small corrections needed; some VB experts can handle those (pretty trivial). I have also explained what I did.
