Friday 7 June 2013

How To: Get the rendered HTML of a webpage with Python

Hello again!

This time I'm gonna show you how to get the rendered HTML of a webpage using Python. The task needs only about ten lines of code and one library, urllib2, which does the work for us. By the rendered HTML of a webpage I mean the HTML that the web browser receives from the server and then renders on the screen. In simple words, it is the page's output, not the server-side code used to generate it. Note that urllib2 only downloads this HTML. It does not run JavaScript the way a browser does, so anything a script adds to the page after it loads will be missing.
I use this HTML to find specific tags in the page.

Here is the code:

import urllib2                            # the library that does the HTTP work (Python 2)

def get_page_code(link):
    html = ""                             # returned unchanged if anything goes wrong
    try:
        req = urllib2.Request(link)       # build a request for the URL
        response = urllib2.urlopen(req)   # send it; the response behaves like a file
        html = response.read()            # read the whole body into one string
        response.close()                  # close the connection
    except (ValueError, urllib2.URLError):
        pass                              # bad URL, network error or HTTP error: keep ""
    return html                           # the HTML exactly as the server sent it
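urllib2 exists only in Python 2. As a sketch for Python 3, where the same functionality lives in urllib.request, the function might look like the following. One difference is that in Python 3 read() returns bytes, so this version decodes them using the charset the server declares. The timeout parameter and the fallback to UTF-8 when no charset is declared are my own assumptions, not something the server guarantees.

```python
import urllib.error
import urllib.request


def get_page_code(link, timeout=10):
    """Return the HTML the server sends for `link`, or "" on failure.

    Python 3 sketch; `timeout` and the UTF-8 fallback are assumptions
    added for illustration.
    """
    try:
        req = urllib.request.Request(link)  # raises ValueError for a malformed URL
        with urllib.request.urlopen(req, timeout=timeout) as response:
            raw = response.read()  # the body as bytes
            # Use the charset from the Content-Type header if there is one,
            # otherwise assume UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
    except (ValueError, urllib.error.URLError):
        return ""  # bad URL, network error or HTTP error status
    return raw.decode(charset, errors="replace")
```

Calling get_page_code("https://example.com/") should give you the page's HTML as one str, just like the Python 2 version.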


Keep in mind that you will get a long string as output.
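Since the point of getting this string is to find specific tags in it, here is one way to do that with only the standard library: Python 3's html.parser module. The TagFinder class and find_tags function below are hypothetical helpers, not part of any library. They collect the attributes of every start tag with a given name, for example to pull out link targets. This is a sketch; it matches start tags only and does not build a document tree.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collect the attributes of every start tag named `target_tag` (hypothetical helper)."""

    def __init__(self, target_tag):
        super().__init__()
        self.target_tag = target_tag.lower()
        self.found = []  # one dict of attributes per matching tag

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case,
        # and attrs as a list of (name, value) pairs
        if tag == self.target_tag:
            self.found.append(dict(attrs))


def find_tags(html, tag):
    finder = TagFinder(tag)
    finder.feed(html)
    finder.close()
    return finder.found


page = '<p>See <a href="/one">one</a> and <A HREF="/two">two</A>.</p>'
links = [attrs.get("href") for attrs in find_tags(page, "a")]
print(links)  # ['/one', '/two']
```

In practice you would pass the string returned by get_page_code instead of the small page literal used here.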

Tuesday 4 June 2013

Set union of two lists in Python

Hi there!

This is my first blog post, so please be patient with me. Today I started a short Computer Science course that uses Python as its programming language. This is my first time using Python, and I am impressed by its syntax and other features. I come from Java and C#, but I hope I will do fine with Python too.

One of my tasks for today was to define a Python function that takes two lists, a and b, as input, performs a set union on them and stores the result in a, so that a is modified to include the new values. I found it very easy: I didn't have to use any indexes or anything like that, just three lines of code and the problem was solved.

Here is the code:

def union(a, b):
    for el in b:             #iterate over list b
        if el not in a:      #if element of b is not in a
            a.append(el)     #append the element to a

Now a test in the console should look like this (union appends the new elements to the end of a, so a is not sorted):
>>> a = [1, 2, 4]
>>> b = [2, 3, 6]
>>> union(a, b)
>>> print(a)
[1, 2, 4, 3, 6]
>>> print(b)
[2, 3, 6]
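The check `el not in a` scans the whole list a each time, so for two long lists the function does roughly the length of a times the length of b comparisons. As a sketch of an alternative (Python 3, assuming the elements are hashable, such as numbers or strings), a set can remember what is already in a so that each check is fast on average, while a is still modified in place and keeps its order. Python's built-in set type also has a union operator, |, but it returns a new set and does not keep the list order.

```python
def union(a, b):
    """Append to `a` every element of `b` that is not already in it.

    Sketch assuming hashable elements; modifies `a` in place, like the
    original version, and keeps the order of first appearance.
    """
    seen = set(a)  # elements already in a, for fast membership tests
    for el in b:
        if el not in seen:
            a.append(el)
            seen.add(el)  # also skip later duplicates inside b, as the original does


a = [1, 2, 4]
b = [2, 3, 6]
union(a, b)
print(a)  # [1, 2, 4, 3, 6]
print(b)  # [2, 3, 6]
print(sorted(set([1, 2, 4]) | set([2, 3, 6])))  # [1, 2, 3, 4, 6]
```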

That's it.