Python Installation
Python 2.6 htql.zip
Python 2.7 htql.zip
Python 3.2 htql.zip
Python 3.3 htql.zip
Python 2.4, GNU Linux x86_64 (htql.so)
Python 2.6, GNU Linux x86_64 (htql.so)
Python 2.7, GNU Linux x86_64 (htql.so)
Python 3.3, GNU Linux x86_64 (htql.cpython-33m.so)
Python 3.6, GNU Linux x86_64 (htql.cpython-36m-x86_64-linux-gnu.so)
Python 2.6, Debian Linux i686.64 (htql.so)
Python 2.6, RedHat Linux x86_64 (htql.so)
Go: github
COM Installation
Python Example
A simple example that extracts the URL and text of each link:
import htql

page = "<a href=a.html>1</a><a href=b.html>2</a><a href=c.html>3</a>"
query = "<a>:href,tx"  # for each <a> tag: its href attribute and its text
for url, text in htql.HTQL(page, query):
    print(url, text)
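For comparison, the sketch below approximates what the query `<a>:href,tx` returns for the page above: one (href, text) pair per `<a>` tag. It uses Python's standard `html.parser` module rather than HTQL, and the names `LinkCollector` and `extract_links` are hypothetical; it only mirrors the shape of the output, not HTQL's query engine.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for each <a> element that carries an href (hypothetical helper, not HTQL)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; unquoted values like href=a.html are accepted
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def extract_links(page):
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links


page = "<a href=a.html>1</a><a href=b.html>2</a><a href=c.html>3</a>"
for url, text in extract_links(page):
    print(url, text)
```

Anchors without an href (for example `<a name=top>`) are skipped in this sketch; whether HTQL skips them too is not stated here.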
An example using htql.Browser:
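import htql

a = htql.Browser()
b = a.goUrl("http://www.bing.com/")
# Fill the first <form> on the page with q=test and submit it
c = a.goForm("<form>1", {"q": "test"})
# c[0] is queried as the result page: print every link whose text contains "test"
for d in htql.HTQL(c[0], "<a (tx like '%test%')>"):
    print(d)
# Click the first such link whose href does not start with /search
e = a.click("<a (tx like '%test%' and not (href like '/search%'))>1")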
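The conditions in this example use HTQL's `like` operator with `%` wildcards, which resembles SQL `LIKE`. Assuming `%` matches any run of characters and matching is case-sensitive (neither is confirmed on this page), the filter in the final `click` call can be mimicked in plain Python as below. The `like` helper and the sample links are hypothetical, and `_` or other SQL wildcards are not handled.

```python
import re


def like(value, pattern):
    """SQL-style LIKE where '%' matches any sequence of characters (assumed HTQL semantics)."""
    # Escape each literal piece, then join the pieces with '.*' wherever a '%' appeared
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, re.DOTALL) is not None


# Hypothetical (href, text) pairs standing in for the <a> tags of a result page
links = [
    ("/search?q=test&first=11", "test - Next page"),
    ("https://example.com/test", "A test page"),
    ("https://example.com/other", "Unrelated"),
]

# Mirrors: <a (tx like '%test%' and not (href like '/search%'))>
matches = [(href, tx) for href, tx in links
           if like(tx, "%test%") and not like(href, "/search%")]
print(matches)
```

Only the second sample link survives: the first is excluded by the `/search%` condition and the third lacks "test" in its text.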
If you have installed IRobotSoft Web Scraper, you can browse the web visually with:
a = htql.Browser(2)
An example that parses the state and ZIP code from a US address using HTQL regular expressions:
import htql

address = '88-21 64th st , Rego Park , New York 11374'
states = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
          'Connecticut', 'Delaware', 'District Of Columbia', 'Florida', 'Georgia',
          'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky',
          'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota',
          'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire',
          'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota',
          'Ohio', 'Oklahoma', 'Oregon', 'PALAU', 'Pennsylvania', 'PUERTO RICO',
          'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas',
          'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin',
          'Wyoming']
a = htql.RegEx()
a.setNameSet('states', states)  # register the list as the name set "states"
# Raw strings (r"...") keep the regex backslashes such as \s and \d intact
state_zip1 = a.reSearchStr(address, r"&[s:states][,\s]+\d{5}", case=False)[0]
# state_zip1 = 'New York 11374'
state_zip2 = a.reSearchList(address.split(), r"&[ws:states]<,>?<\d{5}>", case=False)[0]
# state_zip2 = ['New', 'York', '11374']
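As a rough standard-library analogue, not HTQL's implementation, the two searches above can be approximated with Python's `re` module. Here the name set registered with `setNameSet` becomes a regex alternation for the string search and a word-by-word token comparison for the list search. The shortened state list and the function names `search_state_zip` and `search_state_zip_tokens` are hypothetical, and how HTQL handles word boundaries or an optional comma in its results is not specified on this page, so those details are assumptions.

```python
import re

address = "88-21 64th st , Rego Park , New York 11374"
# Shortened name set for illustration; the full states list above can be passed instead
states = ["New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "Texas"]


def search_state_zip(text, names):
    """Return the first '<state><separators><5 digits>' substring, ignoring case, or None."""
    # Longest names first, so a longer name is tried before any name that is its prefix
    alternation = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    m = re.search(r"\b(?:" + alternation + r")[,\s]+\d{5}\b", text, re.IGNORECASE)
    return m.group(0) if m else None


def search_state_zip_tokens(tokens, names):
    """Return the token slice '<state words> [','] <5-digit token>', ignoring case, or None."""
    name_words = [n.lower().split() for n in names]
    for i in range(len(tokens)):
        for words in name_words:
            j = i + len(words)
            # A multi-word state such as 'New York' must match consecutive tokens
            if [t.lower() for t in tokens[i:j]] != words:
                continue
            # Optional ',' token between state and ZIP; it is kept in the returned slice
            if j < len(tokens) and tokens[j] == ",":
                j += 1
            if j < len(tokens) and re.fullmatch(r"\d{5}", tokens[j]):
                return tokens[i:j + 1]
    return None


state_zip1 = search_state_zip(address, states)
state_zip2 = search_state_zip_tokens(address.split(), states)
print(state_zip1)  # New York 11374
print(state_zip2)  # ['New', 'York', '11374']
```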
JavaScript Example
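The following example shows how to use HTQL from JavaScript in an HTML page. The script retrieves the first <a> tag from http://www.ncbi.nlm.nih.gov/ and writes it into the page body. Because it creates the control with ActiveXObject, it runs only in Internet Explorer on a machine where the HTQL COM control is installed.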
<!--- test.html -->
<html>
<base href="http://www.ncbi.nlm.nih.gov/">
<body>
<script language=JavaScript>
  var a = new ActiveXObject("HtqlCom.HtqlControl");
  a.setUrl("http://www.ncbi.nlm.nih.gov/");
  a.setQuery("<a>");
  // getValueByIndex(1) returns the first <a> tag matched by the query
  document.write(a.getValueByIndex(1));
</script>
</body>
</html>
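Outside the COM control, the core of this example (take the first `<a>` element of a page; `getValueByIndex(1)` suggests the index is 1-based) can be sketched with Python's standard library. This sketch works on an HTML string instead of fetching http://www.ncbi.nlm.nih.gov/, uses `html.parser` instead of HTQL, and rebuilds the element from parsed pieces, so end-tag case and entity spelling may differ from the source. The names `FirstAnchor` and `first_anchor` are hypothetical.

```python
from html.parser import HTMLParser


class FirstAnchor(HTMLParser):
    """Rebuild the markup of the first <a>...</a> element (hypothetical helper, not HTQL)."""

    def __init__(self):
        super().__init__()    # convert_charrefs=True: entities in text are decoded
        self.parts = []       # markup pieces of the first <a> element
        self.inside = False   # True once the first <a> start tag has been seen
        self.done = False     # True once its closing </a> has been seen

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "a":
            self.inside = True
        if self.inside:
            # get_starttag_text() returns the start tag exactly as written in the source
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> inside the link are copied as written
        if self.inside and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_data(self, data):
        if self.inside and not self.done:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if self.inside and not self.done:
            self.parts.append(f"</{tag}>")
            if tag == "a":
                self.done = True


def first_anchor(html):
    parser = FirstAnchor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None


page = '<html><body><p>Intro</p><a href="/gene/">Gene <b>DB</b></a><a href="/x">x</a></body></html>'
print(first_anchor(page))
```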
Visual Basic Example
The following Visual Basic example does the same thing and shows the result in a message box:
' VB example
Dim a As Object
Set a = CreateObject("HtqlCom.HtqlControl")
i = a.setUrl("http://www.ncbi.nlm.nih.gov/")
i = a.setQuery("<a>")
MsgBox (a.getValueByIndex(1))