Hi,
I have built a web scraper that pulls data from fields and tables populated by JavaScript. I used requests-html for this because it was much faster than Selenium with headless Chrome while still being able to render JavaScript-generated content.
Now I want to put it behind some kind of proxy or VPN. I would prefer a SOCKS5 setup, on the assumption that it would be faster, but I honestly have no idea how to set this up.
My code to get the rendered HTML currently looks like this:
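from requests_html import HTMLSession

def scrape(url):
    session = HTMLSession()
    try:
        resp = session.get(url)
        # Execute the page's JavaScript in headless Chromium; the return value is not needed
        resp.html.render(timeout=30)
        # Return the HTML as it looks after rendering
        return resp.html.html
    finally:
        # Always release the session, even if the request or rendering fails
        session.close()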
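For what it's worth, here is an untested sketch of how I imagine the SOCKS5 part might plug in. It assumes a proxy listening at 127.0.0.1:1080 (a placeholder address), that requests has SOCKS support installed (pip install requests[socks]), and that HTMLSession hands its browser_args on to the Chromium instance it launches for render(). The names build_proxy_config and scrape_via_proxy are made up for illustration. As far as I can tell, both settings are needed because render() loads the page in Chromium, which would not see the proxy set on the requests session.

```python
# Placeholder address: replace with the real SOCKS5 proxy host and port.
PROXY_URL = "socks5://127.0.0.1:1080"


def build_proxy_config(proxy_url):
    """Return (proxies dict for requests, argument list for Chromium)."""
    # requests reads per-scheme proxy URLs from a dict like this.
    proxies = {"http": proxy_url, "https": proxy_url}
    # --proxy-server is Chromium's command-line switch for routing its traffic.
    browser_args = ["--no-sandbox", "--proxy-server=" + proxy_url]
    return proxies, browser_args


def scrape_via_proxy(url, proxy_url=PROXY_URL):
    # Imported here so build_proxy_config works without requests-html installed.
    from requests_html import HTMLSession

    proxies, browser_args = build_proxy_config(proxy_url)
    # Assumption: browser_args is forwarded to the Chromium used by render().
    session = HTMLSession(browser_args=browser_args)
    # Covers the plain HTTP request made by session.get().
    session.proxies.update(proxies)
    try:
        resp = session.get(url)
        resp.html.render(timeout=30)
        return resp.html.html
    finally:
        session.close()
```

Calling scrape_via_proxy("https://example.com") would then be a drop-in replacement for scrape(), if the assumptions above hold.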
I am posting here in the hope that someone has experience with this kind of thing and can point me in the right direction. Even if you don't have direct experience, an educated guess would be welcome.