
HTML Parser Help [File Download Help]



Posted

A simple PowerShell script:


http://teusje.wordpress.com/2011/02/19/download-file-with-powershell/


Posted

$storageDir = "C:\Downloads"   # put the location where you want to save the file here
$webclient = New-Object System.Net.WebClient
$url = "http://teusje.files.wordpress.com/2011/02/giraffe-header1.png"   # this will be your download URL
$file = "$storageDir\myNewFilename.jpg"   # the name the file should be saved under
$webclient.DownloadFile($url,$file)

Posted

In that HTML, parse all the "a" elements and then the attributes of each "a" element. The first "a" element has type and href attributes; the second has href and class attributes. So save the attributes of each "a" element in a dictionary, and save each of those dictionaries in an array. Then iterate through the array with a loop, breaking when the condition matches: if (a[n].valueForKey("type").isEqualToString("application/x-zip-compressed")) { print(a[n].valueForKey("href")); break }

The above is pseudocode. Also, the HTML you posted looks like it can be parsed with an XML parser, because it has proper end tags.
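
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration using the standard library's html.parser module; the names AnchorCollector and find_zip_href and the shortened sample page are assumptions, not something from the thread.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the attributes of every <a> element as a dictionary."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # one attribute dictionary per <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


def find_zip_href(html):
    """Return the href of the first <a> whose type marks a zip file, else None."""
    collector = AnchorCollector()
    collector.feed(html)
    for attributes in collector.anchors:
        if attributes.get("type") == "application/x-zip-compressed":
            return attributes.get("href")  # stop at the first match
    return None


# Hypothetical sample page, shaped like the HTML discussed in this thread.
sample_html = """
<div class="helpbox">
<ul><li>
<a type="application/x-zip-compressed" href="/Downloads/2013/April/Monthly-Report-By-Contract-2013-04.zip">Monthly Enrollment by Contract</a> </li>
<li><a href="/other-page" class="external">Other link</a></li>
</ul></div>
"""

zip_href = find_zip_href(sample_html)
print(zip_href)
```

html.parser is used here instead of an XML parser because it tolerates pages that are not perfectly well-formed; if the page really is valid XML, an XML parser would work the same way.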

Posted

There is a PowerShell script for this; just run it.
Edit the download URL in the script and the file save location.

http://gallery.technet.microsoft.com/scriptcenter/files-from-websites-4a181ff3



If it has to run daily, put this script in Task Scheduler so that it runs at a particular time.

Posted

If you need to download the file and extract it, use this, but it works only on Windows Server:


http://gallery.technet.microsoft.com/scriptcenter/a6b10a18-c4e4-46cc-b710-4bd7fa606f95
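
For comparison, a download-and-extract step that is not tied to Windows Server might look like the Python sketch below. It is not the linked script; the function names are made up for illustration, and it assumes the archive is small enough to hold in memory.

```python
import io
import zipfile
from urllib.request import urlopen


def extract_zip_bytes(zip_bytes, target_dir):
    """Extract an in-memory zip archive into target_dir and return the member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


def download_and_extract(url, target_dir):
    """Download the zip at url (assumed to fit in memory) and extract it."""
    with urlopen(url) as response:
        return extract_zip_bytes(response.read(), target_dir)
```

Calling download_and_extract with the download URL and a folder such as C:\Downloads would save the extracted files there.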

Posted

It's simple.
Declare a global variable called "link" and assign this link to that variable.

Posted

[quote name='puli_keka' timestamp='1367529581' post='1303698359']
Suppose you have first pulled the webpage source into a string, and you need this particular link from it:



[color=#ff0000]<div class='pagination clearfix left '> [/color]
[color=#ff0000]<li class='first'><a href='http://www.andhrafriends.com/topic/408271-na-friend-pelli-cancel-ayyindi/' title='Go to first page' rel='start'>&laquo; <!--First--></a></li>[/color]


Now get only the above text by matching it with the following regular expression: <div class='pagination clearfix left '>[\s\S]+?</li>

[\s\S]+ matches one or more of any character, including newlines, and the ? after it makes the match lazy, so it stops at the first </li> it encounters.


Take the matched text and apply one more regular expression to it to get the link: http[\s\S]+?'

This matches from http up to the closing single quote.


Then remove the last character and use it. I just verified this in Notepad++.
[/quote]

This is close to what I am looking for. Thanks, man. I will try it and post an update.
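
The two regular expressions from the quoted post can be tried out with a short Python sketch like the one below. It only illustrates the idea with Python's re module; the variable names are made up, and the sample string is the snippet from the quote.

```python
import re

# Page source held in a string; this sample is the snippet from the quoted post.
page_source = """
<div class='pagination clearfix left '>
<li class='first'><a href='http://www.andhrafriends.com/topic/408271-na-friend-pelli-cancel-ayyindi/' title='Go to first page' rel='start'>&laquo; <!--First--></a></li>
"""

# Step 1: lazily match from the opening div up to the first </li>.
block_match = re.search(r"<div class='pagination clearfix left '>[\s\S]+?</li>", page_source)
block = block_match.group(0)

# Step 2: inside that block, match from "http" up to the closing single quote.
link_match = re.search(r"http[\s\S]+?'", block)

# Step 3: drop the trailing quote character.
link = link_match.group(0)[:-1]
print(link)
```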

Posted

[quote name='ChittiNaidu' timestamp='1367529619' post='1303698366']
A simple PowerShell script:


[url="http://teusje.wordpress.com/2011/02/19/download-file-with-powershell/"]http://teusje.wordpr...ith-powershell/[/url]
[/quote]
[quote name='ChittiNaidu' timestamp='1367530041' post='1303698407']
$storageDir = "C:\Downloads"   # put the location where you want to save the file here
$webclient = New-Object System.Net.WebClient
$url = "http://teusje.files.wordpress.com/2011/02/giraffe-header1.png"   # this will be your download URL
$file = "$storageDir\myNewFilename.jpg"   # the name the file should be saved under
$webclient.DownloadFile($url,$file)
[/quote]

The problem with this script is that my link is not static. It changes every month because it contains a month part, so I cannot give one fixed link and tell the process to download that particular file. I thought of adding another variable for the month, but whoever runs the website changes the way the month is written every month >.< For example, January is "january", February is "feb" and March is "mar", but April is "april". That is why I am not able to define a fixed character string for the variable, and that is where the problem arises. Trying what puli_keka suggested would be easier, because that will copy whatever is in the link from http till </li>.

Posted

1) Suppose you have first pulled the webpage source into a string, and you need this particular link from it

Yes, I copied the whole webpage source into a variable.



[color=#ff0000]2) <div class='pagination clearfix left '> [/color]
[color=#ff0000]<li class='first'><a href='[url="http://www.andhrafriends.com/topic/408271-na-friend-pelli-cancel-ayyindi/"]http://www.andhrafriends.com/topic/408271-na-friend-pelli-cancel-ayyindi/[/url]' title='Go to first page' rel='start'>&laquo; <!--First--></a></li>[/color]

Mine looks like this:

[color=#ff0000]<div class="helpbox">
<h2>Downloads</h2>
<div class="help-details">
<ul><li>
<a type="application/x-zip-compressed" href="/[/color][color=#0000cd]Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Downloads/2013/April/[/color][color=#008000]Monthly-Report-By-Contract-2013-04.zip[/color][color=#ff0000]">Monthly Enrollment by Contract &#8211; April 2013 [ZIP, 87KB]</a> </li>[/color]

The blue part is what I have to prefix with http://www.websitename.gov, and the green part is the file I have to download.



Now get only the above text by matching it with the following regular expression: <div class='pagination clearfix left '>[color=#ff0000][\s\S]+?</li>[/color]

[\s\S]+ matches one or more of any character, including newlines, and the ? after it makes the match lazy, so it stops at the first </li> it encounters.

I understood it up to this part, but I am not clear on how to write the code, I mean what exactly to write.


3) Take the matched text and apply one more regular expression to it to get the link: http[\s\S]+?'

This matches from http up to the closing single quote.

I think I understood this, but I missed what you mean by applying one more regular expression. Is that another variable I need to use?



4) Then remove the last character and use it. I just verified this in Notepad++.

I have not got this far yet, so I cannot say whether I have understood it or not.

But you rock, man. You understood my problem and gave a really close answer. For all I know this is the solution as well; I just cannot work out whether this is it, or, if it is, what I need to do next :(
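
One way the steps in this post could be turned into code for the Downloads block is sketched below in Python. Treat it as an illustration only: BASE_URL is the placeholder host mentioned in the post, the variable names are made up, and the second pattern uses a capture group on href="..." (a swapped-in variation, since this link is relative and double-quoted) instead of matching from http to a single quote.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.websitename.gov"  # placeholder host, not the real site

# The whole page source would normally be in this variable; this is the posted excerpt.
page_source = """
<div class="helpbox">
<h2>Downloads</h2>
<div class="help-details">
<ul><li>
<a type="application/x-zip-compressed" href="/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Downloads/2013/April/Monthly-Report-By-Contract-2013-04.zip">Monthly Enrollment by Contract &#8211; April 2013 [ZIP, 87KB]</a> </li>
"""

# First regular expression: isolate the Downloads block up to the first </li>.
block = re.search(r'<div class="helpbox">[\s\S]+?</li>', page_source).group(0)

# Second regular expression, applied to the block rather than the whole page.
# The capture group returns the href value directly, so no trailing quote to strip.
relative_link = re.search(r'href="([^"]+)"', block).group(1)

# Prefix the site address to get the full download URL.
download_url = urljoin(BASE_URL, relative_link)
print(download_url)
```

So "applying one more regular expression" does mean another variable: the result of the first match is stored (block here), and the second pattern is run against that stored text.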

Posted

Normally we use a content grabber.

I don't have ready-made code with me right now.

Try this:

[url="https://forums.digitalpoint.com/threads/content-grabber-grab-any-web-content.1164527/"]https://forums.digitalpoint.com/threads/content-grabber-grab-any-web-content.1164527/[/url]

[url="http://www.phpclasses.org/package/7477-PHP-Retrieve-remote-Web-page-content-with-headers.html"]http://www.phpclasses.org/package/7477-PHP-Retrieve-remote-Web-page-content-with-headers.html[/url]


It will grab the entire content into your application; from there it is up to us how to parse it.

Posted

Chary garu, you can do this with a scripting language such as Perl. If the link is static, you just hard-code it; if it is a dynamic link, you need to make the user input it. I did this once, but it will take me some time to look it up.

Posted

[quote name='aamerica_seenu' timestamp='1367592673' post='1303700968']
Chary garu, you can do this with a scripting language such as Perl. If the link is static, you just hard-code it; if it is a dynamic link, you need to make the user input it. I did this once, but it will take me some time to look it up.
[/quote]

Even if it is done in a scripting language like Perl, I don't know whether my Automate tool will accept that script :( But yeah, if you don't think it is too much work, let me know and I will tell you what needs to be done and what my requirement is.

Posted

[quote name='charygaru' timestamp='1367594709' post='1303701128']
Even if it is done in a scripting language like Perl, I don't know whether my Automate tool will accept that script :( But yeah, if you don't think it is too much work, let me know and I will tell you what needs to be done and what my requirement is.
[/quote]


I can give it a try, man, but I can't exactly promise. I have worked on data mining, which involves deciphering links and the like, so I think there is a chance. As I said, I can try.

Posted

[quote name='cherlapalli_jailer' timestamp='1367592182' post='1303700934']
Normally we use a content grabber.

I don't have ready-made code with me right now.

Try this:

[url="https://forums.digitalpoint.com/threads/content-grabber-grab-any-web-content.1164527/"]https://forums.digit...ontent.1164527/[/url]

[url="http://www.phpclasses.org/package/7477-PHP-Retrieve-remote-Web-page-content-with-headers.html"]http://www.phpclasse...th-headers.html[/url]


It will grab the entire content into your application; from there it is up to us how to parse it.
[/quote]

My tool already has an option to grab the entire source directly, man. I only need to grab one link out of the whole webpage, then compare that link, which has a date tag in it, to the date set on the computer. When they match, download that file; otherwise exit the loop. This is precisely what I am looking to do.
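
The date comparison described here could look something like the Python sketch below. It assumes the file name always ends in a YYYY-MM tag followed by .zip, as in Monthly-Report-By-Contract-2013-04.zip; the function name and the shortened sample link are made up for illustration.

```python
import re
from datetime import date


def link_matches_month(link, today):
    """Return True if the YYYY-MM tag at the end of the file name equals today's year and month."""
    match = re.search(r"(\d{4})-(\d{2})\.zip$", link)
    if match is None:
        return False
    year, month = int(match.group(1)), int(match.group(2))
    return (year, month) == (today.year, today.month)


# Hypothetical shortened link; the real one has a longer folder path.
link = "/Downloads/2013/April/Monthly-Report-By-Contract-2013-04.zip"
print(link_matches_month(link, date(2013, 4, 15)))
print(link_matches_month(link, date(2013, 5, 1)))
```

In real use, date.today() would be passed as the second argument. Reading the numeric tag in the file name avoids depending on how the month folder is spelled.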

Posted

[quote name='charygaru' timestamp='1367595004' post='1303701163']
My tool already has an option to grab the entire source directly, man. I only need to grab one link out of the whole webpage, then compare that link, which has a date tag in it, to the date set on the computer. When they match, download that file; otherwise exit the loop. This is precisely what I am looking to do.
[/quote]

This is my requirement, man. Can you give me a script that lets me grab just a single link from a 700-line webpage and save that link in a variable of my choice?
[quote name='aamerica_seenu' timestamp='1367594806' post='1303701145']


I can give it a try, man, but I can't exactly promise. I have worked on data mining, which involves deciphering links and the like, so I think there is a chance. As I said, I can try.
[/quote]
