Midnightsun Posted January 19, 2022
I need to extract titles from HTML, but the request is getting rejected.

    import requests
    from bs4 import BeautifulSoup

    job_url = 'https://www.certipedia.com/companies/446478?locale=en'
    resp = requests.get(job_url)
    soup = BeautifulSoup(resp.content, 'html.parser')
    title_tag_text = soup.title.text
    print(title_tag_text)
NeneRajuNeneManthri Posted January 19, 2022
First try it in Postman or another REST test tool. Maybe they are checking the user agent and rejecting the request. Try to mimic the request the way the browser sends it: in Chrome's Network tool, look at the headers sent with the request and try sending all of them. If it works, try removing one header at a time to find which ones they require.
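The "remove one header at a time" step can be sketched in Python as below. This is only a rough illustration, not something tested against the site in question. The names `find_required_headers` and `is_accepted` are hypothetical; `is_accepted` stands for whatever check you write that sends the request with the given headers and reports whether the server let it through.

```python
def find_required_headers(headers, is_accepted):
    """Drop headers one at a time, keeping only those the server needs.

    `is_accepted` is a caller-supplied function (hypothetical name) that
    takes a dict of headers, sends the request, and returns True if the
    server accepted it.
    """
    if not is_accepted(headers):
        raise ValueError("the full header set is rejected; nothing to minimise")

    required = dict(headers)
    for name in list(headers):
        # Try the request without this one header.
        trial = {k: v for k, v in required.items() if k != name}
        if is_accepted(trial):
            required = trial  # the server did not need this header
    return required
```

With `requests`, `is_accepted` might be something like `lambda h: requests.get(job_url, headers=h, timeout=10).ok`. That is an assumption about how the rejection shows up: some sites return their rejection page with a 200 status, in which case the check would have to look at the page text instead.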
Midnightsun Posted January 19, 2022 Author
6 minutes ago, NeneRajuNeneManthri said:
First try it in Postman or another REST test tool. Maybe they are checking the user agent and rejecting the request. Try to mimic the request the way the browser sends it: in Chrome's Network tool, look at the headers sent with the request and try sending all of them. If it works, try removing one header at a time to find which ones they require.

I'm using Anaconda / JupyterLab. Can this be done with Postman or REST test tools? Can you post an example of how headers are sent? I'm still a novice at this web scraping game.
NeneRajuNeneManthri Posted January 19, 2022
7 minutes ago, Midnightsun said:
I'm using Anaconda / JupyterLab. Can this be done with Postman or REST test tools? Can you post an example of how headers are sent? I'm still a novice at this web scraping game.

Try the Postman tool and make a GET request to that URL. In Chrome, right-click and open inspect mode. Switch to the Network tab, then reload the page. You will see what is sent in the request headers. Do this first, before trying to implement it in code. Then, in Postman, populate those request headers and values from Chrome. Google for more details on how to use Postman.
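For reference, once the headers have been read off Chrome's Network tab, passing them from Python could look roughly like this. The header values below are placeholders, not values confirmed to work for this site; copy the real ones from your own browser. The names `BROWSER_HEADERS`, `preview_headers` and `fetch_html` are made up for this sketch.

```python
import requests

# Placeholder browser-like headers (assumption): replace these with the
# values listed under "Request Headers" in Chrome's Network tab.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def preview_headers(url, headers):
    """Build the GET request without sending it and return its headers.

    Handy for comparing what Python would send with what Chrome shows.
    """
    prepared = requests.Request("GET", url, headers=headers).prepare()
    return dict(prepared.headers)


def fetch_html(url, headers=BROWSER_HEADERS, timeout=10):
    """Send the GET request with the given headers and return the body."""
    resp = requests.get(url, headers=headers, timeout=timeout)
    resp.raise_for_status()  # raise an error on 4xx/5xx responses
    return resp.content
```

The bytes returned by `fetch_html(job_url)` can then go into `BeautifulSoup(..., 'html.parser')` exactly as in the original snippet. Whether these three headers are enough for this particular site is not known; that is what the Chrome and Postman check is for.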
Midnightsun Posted January 19, 2022 Author
7 minutes ago, NeneRajuNeneManthri said:
Try the Postman tool and make a GET request to that URL. In Chrome, right-click and open inspect mode. Switch to the Network tab, then reload the page. You will see what is sent in the request headers. Do this first, before trying to implement it in code. Then, in Postman, populate those request headers and values from Chrome. Google for more details on how to use Postman.

Thanks, will look into that.