
Any web scrapers here?



Posted
I need to extract titles from HTML, but my request is getting rejected.

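import requests
from bs4 import BeautifulSoup

job_url = 'https://www.certipedia.com/companies/446478?locale=en'
resp = requests.get(job_url, timeout=10)
print(resp.status_code)  # a 4xx code such as 403 means the server refused the request
soup = BeautifulSoup(resp.content, 'html.parser')
# soup.title is None when the returned page has no <title> element
title_tag_text = soup.title.text if soup.title else None
print(title_tag_text)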
Posted

First, try it in Postman or another REST testing tool. Maybe they are checking the User-Agent and rejecting requests. Try to mimic the request the way the browser sends it: in Chrome's Network tool, copy all the headers the browser sends with the request and send them too. If that works, remove one header at a time to find which ones they require.
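The "remove one header at a time" step can be sketched in Python. This is a minimal sketch under assumptions: `find_required_headers` and the `send` callable are hypothetical names, and `send` is assumed to perform the request with the given headers and return True when the server accepts it. The function drops each header in turn and keeps it only if the request fails without it.

```python
from typing import Callable, Dict

Headers = Dict[str, str]


def find_required_headers(headers: Headers, send: Callable[[Headers], bool]) -> Headers:
    """Remove headers one at a time and keep only those the server seems to need.

    `send` is a hypothetical callable: it performs the request with the given
    headers and returns True if the server accepted it (e.g. HTTP 200).
    """
    if not send(headers):
        # Nothing to minimise: even the full browser header set is rejected.
        raise ValueError("request rejected even with all headers; recheck the copied headers")

    required = dict(headers)  # work on a copy so the caller's dict is untouched
    for name in list(headers):
        # Try the request without this one header.
        trial = {k: v for k, v in required.items() if k != name}
        if send(trial):
            required = trial  # still accepted, so this header is not needed
        # otherwise the header stays in `required`
    return required
```

Because the search is greedy, if the server accepts either of two headers (say, a `Referer` or a `Cookie`), the result depends on the order in which headers are tried, so treat the output as one working minimal set rather than the only one. With `requests`, `send` could be something like `lambda h: requests.get(job_url, headers=h, timeout=10).status_code == 200`.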

Posted
In reply to NeneRajuNeneManthri:

I'm using Anaconda / JupyterLab.

Can this be done with Postman / REST test tools?

Can you post an example of how headers are sent? I'm still a novice at this web-scraping game.

Posted
In reply to Midnightsun:

Try the Postman tool and make a GET request to that URL.

 

In Chrome, right-click the page and choose Inspect. Switch to the Network tab, then reload the page. Select the page's request to see the request headers the browser sends. Do this first, before trying to implement it in code.
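Once the headers are visible in the Network tab, they can also be copied straight into Python. A minimal sketch, assuming the copied text has one `Name: value` pair per line (roughly what the raw view of the request headers shows; the exact copy format varies between Chrome versions, so check the result). `parse_devtools_headers` is a hypothetical helper name. Lines starting with `:` are HTTP/2 pseudo-headers such as `:authority` and `:path`, which are not ordinary headers, so the sketch skips them.

```python
from typing import Dict


def parse_devtools_headers(raw: str) -> Dict[str, str]:
    """Convert request headers copied from Chrome DevTools into a dict.

    Assumes one "Name: value" pair per line. HTTP/2 pseudo-headers such as
    ":authority" and ":path" start with a colon and are skipped, because they
    describe the method, path and host, which the HTTP client derives itself.
    """
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # blank line or HTTP/2 pseudo-header
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, e.g. the "GET /path HTTP/1.1" request line
        headers[name.strip()] = value.strip()
    return headers
```

If the copied `Accept-Encoding` value includes `br` and the `brotli` package is not installed, `requests` may be unable to decode the response body, so that header is a reasonable first candidate to drop.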

 

Then, in Postman, fill in those request headers and values copied from Chrome. Search online for more details on how to use Postman.
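For someone working in Jupyter rather than Postman, the same experiment can be run with `requests` by passing the copied headers as a dict. A minimal sketch: the header values below are placeholders (assumptions), to be replaced with the values copied from Chrome, and `build_request` / `fetch` are hypothetical helper names. Preparing the request first lets you print exactly which headers will go out before anything is sent.

```python
import requests

JOB_URL = 'https://www.certipedia.com/companies/446478?locale=en'

# Placeholder values (assumption): replace with the headers copied from Chrome's Network tab.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
}


def build_request(session, url, headers):
    """Prepare (but do not send) a GET request.

    The session's default headers are merged in, and the headers passed here
    override them (so python-requests' own User-Agent is replaced).
    """
    return session.prepare_request(requests.Request('GET', url, headers=headers))


def fetch(url, headers, timeout=10):
    """Send the prepared request and return the response (needs network access)."""
    with requests.Session() as session:
        prepared = build_request(session, url, headers)
        print(dict(prepared.headers))  # the exact headers that will be sent
        return session.send(prepared, timeout=timeout)
```

In a notebook cell, `resp = fetch(JOB_URL, BROWSER_HEADERS)` followed by a check of `resp.status_code` shows whether this header set is accepted, before parsing `resp.content` with BeautifulSoup as in the original code.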

Posted
In reply to NeneRajuNeneManthri:

Thanks, I'll look into that.
