bevarse Posted July 31, 2014
Brother, I've been given a text file. It has a bunch of URLs and some other data, and I want just the URLs from it. Example: "http://blog.illumen.org/ways/?p=89917 crawl-frequently unchanged 0 0 0 1 0 0 2014/07/31 06:32 text/html 6614 91d875d71a9942 182 5". From this line I should be able to extract only http://blog.illumen.org/ways/?p=89917. Any ideas?
Spartan Posted July 31, 2014
http://stackoverflow.com/questions/9125016/get-url-from-a-text
k2s Posted July 31, 2014
m = re.findall(r"((http:|https:)//[^ \<]*[^ \<\.])", line)
bevarse Posted July 31, 2014
There are no special characters in the example itself; they showed up after I pasted it here: "http://blog.illumen.org/ways/?p=89917 crawl-frequently unchanged 0 0 0 1 0 0 2014/07/31 06:32 text/html 6614 91d875d71a9942 182 5"
ChampakDas Posted July 31, 2014
Start your search at "http" and go until you hit the special characters.
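The "start at http and stop at the separator" idea can be sketched without any regex. This is a minimal Python sketch, assuming the separator after the URL is whitespace (space or tab); the function name first_url is made up for illustration.

```python
def first_url(line):
    """Return the text from the first 'http' up to the next whitespace, or None."""
    start = line.find("http")
    if start == -1:
        return None  # no URL-looking text on this line
    # split(None, 1) cuts at the first run of whitespace (space, tab, newline)
    return line[start:].split(None, 1)[0]


sample = ("http://blog.illumen.org/ways/?p=89917 crawl-frequently unchanged "
          "0 0 0 1 0 0 2014/07/31 06:32 text/html 6614 91d875d71a9942 182 5")
print(first_url(sample))  # http://blog.illumen.org/ways/?p=89917
```

It only returns the first URL on a line, which matches the sample row but would miss a second URL further along.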
k2s Posted July 31, 2014
Quoting Spartan: http://stackoverflow.com/questions/9125016/get-url-from-a-text
Even simpler:
m = re.findall(r"((http:|https:)//[^ \<]*[^ \<\.])", line)
bevarse Posted July 31, 2014
Quoting k2s: "Even simpler"
Scrooge brother, could you explain in a bit more detail? I'm planning to do this in C#.
k2s Posted July 31, 2014
Quoting bevarse: "Scrooge brother, could you explain in a bit more detail? I'm planning to do this in C#."
I wrote this in Python, but regex is mostly the same irrespective of language.
m = re.findall(r"((http:|https:)//[^ \<]*[^ \<\.])", line)
m is a temporary variable that holds the result
re is the regex module
the dot is the attribute-access operator
findall is a function in the 're' module
the part between the quotes "" is the pattern (the logic)
line is the string you pass in; the pattern is matched against it
Did you follow the logic inside the quotes or not?
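To make the explanation concrete, here is a small Python sketch that runs the posted pattern on the sample line from the first post. One detail worth knowing: because the pattern contains two capturing groups, re.findall returns one tuple per match instead of a plain string. The second pattern (named flat here, an illustrative name) is the same expression with a non-capturing group.

```python
import re

sample = ("http://blog.illumen.org/ways/?p=89917 crawl-frequently unchanged "
          "0 0 0 1 0 0 2014/07/31 06:32 text/html 6614 91d875d71a9942 182 5")

# Pattern as posted: two capturing groups, so findall yields (whole_url, scheme) tuples.
posted = re.compile(r"((http:|https:)//[^ \<]*[^ \<\.])")
# Same pattern with a non-capturing scheme group, so findall yields plain strings.
flat = re.compile(r"(?:http:|https:)//[^ \<]*[^ \<\.]")

pairs = posted.findall(sample)
urls = flat.findall(sample)
print(pairs)  # [('http://blog.illumen.org/ways/?p=89917', 'http:')]
print(urls)   # ['http://blog.illumen.org/ways/?p=89917']
```

With the posted form, the URL is the first element of each tuple, e.g. pairs[0][0].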
macha Posted July 31, 2014
Quoting k2s: m = re.findall(r"((http:|https:)//[^ \<]*[^ \<\.])", line)
This will skip special characters in the URL. If it's just a plain URL it will work; otherwise I think it will blow up.
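Whether special characters break the pattern is easy to probe. The sketch below (Python, with made-up example.com inputs) suggests that query characters such as ?, = and & are kept, a trailing full stop is dropped, and only a space or < ends the match. That last point matters if the file turns out to be tab-separated, which is an assumption, not something the thread confirms. The variant named strict is a hypothetical adjustment that stops at any whitespace.

```python
import re

# Posted pattern (non-capturing form): the URL ends at a space or '<'.
posted = re.compile(r"(?:http:|https:)//[^ \<]*[^ \<\.]")
# Hypothetical variant: any whitespace (space, tab, newline) ends the URL.
strict = re.compile(r"https?://[^\s<]*[^\s<.]")

cases = [
    "http://example.com/a?x=1&y=2#top rest",  # query and fragment characters
    "see http://example.com/page. Next",      # trailing full stop
    "http://example.com/x\tcrawl-frequently", # tab-separated columns
]
for text in cases:
    print(posted.findall(text), strict.findall(text))
```

On the tab-separated case the posted pattern runs on into the next column, while the variant stops at the tab.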
dkchinnari Posted July 31, 2014
In reply to k2s's Python explanation of re.findall:
You know Python, uncle? Where did you get this much talent?
bevarse Posted July 31, 2014
In reply to k2s's Python explanation of re.findall:
Python has re.findall, but C# doesn't. So what should I use here?
Spartan Posted July 31, 2014
Quoting dkchinnari: "You know Python, uncle? Where did you get this much talent?"
He worked hard, man.
k2s Posted July 31, 2014
Quoting macha: "This will skip special characters in the URL. If it's just a plain URL it will work; otherwise I think it will blow up."
Explain.
k2s Posted July 31, 2014
Quoting bevarse: "Python has re.findall, but C# doesn't. So what should I use here?"
To understand regex, read this: http://www.regular-expressions.info/refquick.html
In C# you don't need a regex for URLs; use the System.Uri class, e.g. the Uri.IsWellFormedUriString method:
bool isUri = Uri.IsWellFormedUriString(url, UriKind.RelativeOrAbsolute);
Source: http://stackoverflow.com/questions/5717312/regular-expression-for-url
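A validity check like Uri.IsWellFormedUriString tests a string that has already been isolated; it does not find the URL inside a line, so the line has to be split into fields first. The sketch below shows that split-then-validate idea in Python, using urllib.parse.urlparse as a rough stand-in. It is not the C# API, and the helper names are made up. For the regex route in C#, Regex.Matches in System.Text.RegularExpressions plays the role of Python's re.findall.

```python
from urllib.parse import urlparse


def looks_like_http_url(token):
    """Rough check: http or https scheme plus a non-empty host."""
    try:
        parts = urlparse(token)
    except ValueError:
        return False  # urlparse rejects some malformed inputs, e.g. bad IPv6 brackets
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def urls_in(line):
    # Split on whitespace and keep only the fields that pass the check.
    return [tok for tok in line.split() if looks_like_http_url(tok)]


sample = ("http://blog.illumen.org/ways/?p=89917 crawl-frequently unchanged "
          "0 0 0 1 0 0 2014/07/31 06:32 text/html 6614 91d875d71a9942 182 5")
print(urls_in(sample))  # ['http://blog.illumen.org/ways/?p=89917']
```

Requiring an http or https scheme here is a deliberate choice: a check that also accepts relative references would likely let fields such as text/html through.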