Python for beginners: how to command the Web
Problem: Submitting homework requires navigating a maze of web pages so complex that I have sent an assignment to the wrong place several times. And although the process takes only 1-2 minutes, it sometimes feels like an insurmountable obstacle (for example, when I finish an assignment late at night and can hardly remember my password).
Solution: Use Python to submit completed assignments automatically! Ideally, I could save the assignment, type a few keys, and upload my work in seconds. At first this sounded too good to be true, but then I discovered Selenium, a tool that you can use with Python to navigate the web.
Whenever we repeat tedious actions on the web with the same sequence of steps, we have a great chance to write a program that automates the process. With Selenium and Python, we write the script once and then run it as many times as necessary, saving ourselves from repeating the same tasks (and, in my case, ruling out the possibility of submitting an assignment to the wrong place)!
Here I will walk through the solution I developed to automatically (and correctly) submit my assignments. Along the way, we will cover the basics of using Python and Selenium to control the web programmatically. Although this program works (I use it every day!), it is tailored to my setup, so you will not be able to copy and paste the code for your own application. However, the general techniques apply to a wide range of situations. (If you want to see the full code, it is available on GitHub.)
Before we get to the fun part of automation, we need to work out the general structure of our solution. Starting to program without a plan is a great way to waste many hours in frustration. I want to write a program that submits completed course assignments to the right place in Canvas (my university's "learning management system"). To start, I need a way to tell the program the name of the assignment to submit and the class it belongs to. I took a simple approach and created a folder to hold completed assignments, with a child folder for each class. In each child folder, I place the finished document, named after the assignment. The program can then work out the class from the folder name and the assignment from the document name.
Here is an example where the class is EECS491 and the assignment is "Assignment 3 - Inference in Larger Graphical Models".
File structure (left) and Complete Assignment (right).
The first part of the program is a loop that goes through the folders to find the assignment and class, which we store in a Python tuple:
    # os for file management
    import os

    # Build tuple of (class, file) to turn in
    submission_dir = 'completed_assignments'
    dir_list = list(os.listdir(submission_dir))
    for directory in dir_list:
        file_list = list(os.listdir(os.path.join(submission_dir, directory)))
        if len(file_list) != 0:
            # Take the first file so the tuple holds a file name, not a list
            file_tup = (directory, file_list[0])

    print(file_tup)
('EECS491', 'Assignment 3 - Inference in Larger Graphical Models.txt')
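The loop above keeps only the last non-empty class folder it finds. As a minimal sketch of a variant that collects every (class, file) pair, assuming the same folder layout, something like the following could work. The function name `find_assignments` is hypothetical and is not part of the original script.

```python
import os


def find_assignments(submission_dir):
    """Return a sorted list of (class_name, file_name) pairs under submission_dir."""
    pairs = []
    for class_name in sorted(os.listdir(submission_dir)):
        class_path = os.path.join(submission_dir, class_name)
        if not os.path.isdir(class_path):
            continue  # skip stray files sitting next to the class folders
        for file_name in sorted(os.listdir(class_path)):
            if os.path.isfile(os.path.join(class_path, file_name)):
                pairs.append((class_name, file_name))
    return pairs
```

Calling `find_assignments('completed_assignments')` returns an empty list when nothing is waiting to be submitted, which is easier to check than a variable that may never have been assigned.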
This takes care of file management, and the program now knows which class and assignment to submit. The next step is to use Selenium to navigate to the correct web page and upload the assignment.
Web control with Selenium
To get started with Selenium, we import the library and create a web driver, which is a browser controlled by our program. In this case, I will use Chrome as the browser and send the driver to the Canvas website, where I submit assignments.
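    # webdriver must be imported explicitly; "import selenium" alone does not expose it
    from selenium import webdriver

    # Using Chrome to access web
    driver = webdriver.Chrome()

    # Open the website
    driver.get('https://canvas.case.edu')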
When we open the Canvas web page, we hit the first obstacle: a login form! To get past it, we need to fill in an ID and a password and click the login button.
Imagine that the web driver is a person who has never seen a web page before: we need to tell it exactly where to click, what to type, and which buttons to press. There are several ways to tell our web driver which elements to find, and they all use selectors. A selector is a unique identifier for an element on a web page. To find the selector for a particular element, say, the “CWRU ID” field, we need to look at the code of the web page. In Chrome, this can be done by pressing “Ctrl + Shift + I” or by right-clicking on any element and selecting “Inspect”. This opens the Chrome Developer Tools, an extremely useful application that displays the HTML underlying any web page.
To find the selector for the “CWRU ID” field, I right-clicked on the field, clicked “Inspect”, and saw the following in the developer tools. The highlighted line corresponds to the id_box element (this line is called an HTML tag).
This HTML may look overwhelming, but we can ignore most of the information and focus on the id="username" and name="username" parts (these are known as attributes of the HTML tag).
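To make the idea of tag attributes concrete without opening a browser, here is a small sketch that uses Python's standard `html.parser` module to pull the attributes out of an input tag. The markup in `SAMPLE_HTML` is a made-up stand-in for the login field, not a copy of the real Canvas page.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for the login field on the real page.
SAMPLE_HTML = '<input id="username" name="username" type="text" tabindex="1">'


class InputAttributeCollector(HTMLParser):
    """Collect the attributes of every <input> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "input":
            self.inputs.append(dict(attrs))


collector = InputAttributeCollector()
collector.feed(SAMPLE_HTML)
id_box_attrs = collector.inputs[0]
print(id_box_attrs["id"], id_box_attrs["name"])
```

The `id` and `name` values printed here are the kind of strings a Selenium selector is built from.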
To select the id box with our web driver, we can use either the id or the name attribute that we found in the developer tools. Web drivers in Selenium have many different methods for selecting elements on a web page, and there are often several ways to select the same element:
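    # Note: find_element_by_* is the Selenium 3 API; Selenium 4 uses
    # driver.find_element(By.NAME, 'username') instead

    # Select the id box
    id_box = driver.find_element_by_name('username')

    # Equivalent outcome!
    id_box = driver.find_element_by_id('username')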
Our program now has access to the id_box, and we can interact with it in various ways, such as typing in keys or clicking (if we have selected a button).
    # Send id information
    id_box.send_keys('my_username')
We repeat the same process for the password field and the login button, selecting each one based on what we see in the Chrome developer tools. Then we send information to the elements or click on them as needed.
    # Find password box
    pass_box = driver.find_element_by_name('password')

    # Send password
    pass_box.send_keys('my_password')

    # Find login button
    login_button = driver.find_element_by_name('submit')

    # Click login
    login_button.click()
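The snippets above type the username and password as literal strings. One common alternative, sketched here as an assumption and not as what the original script does, is to read them from environment variables so that they never appear in the source file. The variable names `CANVAS_USERNAME` and `CANVAS_PASSWORD` and the function `load_credentials` are hypothetical.

```python
import os


def load_credentials(env=os.environ):
    """Read login details from environment variables instead of the script.

    CANVAS_USERNAME and CANVAS_PASSWORD are hypothetical variable names.
    """
    try:
        return env["CANVAS_USERNAME"], env["CANVAS_PASSWORD"]
    except KeyError as missing:
        # Name the variable that is absent so the fix is obvious
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable first"
        ) from None
```

With this in place, the login step would call `username, password = load_credentials()` and pass those values to `send_keys`.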
Once we are logged in, we are greeted by this slightly intimidating dashboard:
Again, we need to guide the program through the web page by specifying exactly which elements to click and what information to enter. In this case, I tell the program to select Courses from the menu on the left and then the class corresponding to the assignment I need to turn in:
    # Find and click on list of courses
    courses_button = driver.find_element_by_id('global_nav_courses_link')
    courses_button.click()

    # Get the name of the folder (first element of the tuple)
    folder = file_tup[0]

    # Class to select depends on folder
    if folder == 'EECS491':
        class_select = driver.find_element_by_link_text('Artificial Intelligence: Probabilistic Graphical Models (100/10039)')
    elif folder == 'EECS531':
        class_select = driver.find_element_by_link_text('Computer Vision (100/10040)')

    # Click on the specific class
    class_select.click()
The program finds the correct class using the name of the folder that we saved in the first step. In this case, I use the find_element_by_link_text selection method to find a specific class. The “link text” for an element is just another selector that we can find by inspecting the page:
This workflow may seem a little tedious, but remember that we only have to do it once, when we write our program! After that, we can hit “Run” as many times as we want, and the program will navigate all these pages for us.
We use the same process of inspecting the page, selecting an element, and interacting with the element to get through a couple more screens. Finally, we reach the assignment submission page:
At this point I could see the finish line, but initially this screen puzzled me. I could click on the “Choose File” field easily enough, but how was I supposed to select the right file to upload? The answer turns out to be incredibly simple! We locate the Choose File field using a selector and use the send_keys method to pass the exact path of the file (called file_location in the code below) to the button:
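The if/elif chain that maps a folder to a course link grows by one branch per class, and an unrecognized folder leaves `class_select` undefined. A dictionary lookup is one possible alternative, sketched below. The two entries are taken from the code above, while the names `COURSE_LINK_TEXT` and `link_text_for` are hypothetical.

```python
# Folder name -> link text shown in the Canvas courses menu (from the code above).
COURSE_LINK_TEXT = {
    "EECS491": "Artificial Intelligence: Probabilistic Graphical Models (100/10039)",
    "EECS531": "Computer Vision (100/10040)",
}


def link_text_for(folder):
    """Return the link text for a class folder, or fail loudly if it is unknown."""
    try:
        return COURSE_LINK_TEXT[folder]
    except KeyError:
        raise ValueError(f"No course link configured for folder {folder!r}") from None
```

The returned string can then be handed to whichever link-text selector the driver provides, and adding a class becomes a one-line change to the dictionary.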
    # Choose File button
    choose_file = driver.find_element_by_name('attachments[uploaded_data]')

    # Complete path of the file
    file_location = os.path.join(submission_dir, folder, file_name)

    # Send the file location to the button
    choose_file.send_keys(file_location)
By sending the exact file path, we skip the whole process of navigating through folders to find the right file. After sending the path, we get the following screen showing that our file is uploaded and ready for submission.
Now we select the “Submit Assignment” button, click it, and our assignment is turned in!
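File inputs generally expect a full path, so a relative path such as the one built from `submission_dir` may be rejected depending on the browser and driver. As a hedged sketch, the path could be made absolute and checked before it is sent. The helper name `upload_path` is hypothetical.

```python
import os


def upload_path(submission_dir, folder, file_name):
    """Build an absolute path to the assignment and check that the file exists."""
    path = os.path.abspath(os.path.join(submission_dir, folder, file_name))
    if not os.path.isfile(path):
        # Fail before touching the browser if the file is missing
        raise FileNotFoundError(path)
    return path
```

Checking for the file first turns a confusing browser-side failure into an ordinary Python exception that names the missing path.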
    # Locate submit button and click
    submit_assignment = driver.find_element_by_id('submit_file_button')
    submit_assignment.click()
File management is always a critical step, and I want to be sure that I do not resubmit or lose old assignments. I decided that the best solution was to keep only the file to be submitted in the completed_assignments folder and to move each file to a separate folder (submitted_dir in the code below) as soon as it has been uploaded. The last bit of code uses the os module to move the completed assignment to the right place.
    # Location of files after submission
    submitted_file_location = os.path.join(submitted_dir, submitted_file_name)

    # Rename essentially cuts and pastes the file into its new location
    os.rename(file_location, submitted_file_location)
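`os.rename` raises an error if the destination folder does not exist yet, and on some systems it silently replaces a file that is already there. A slightly more defensive version, offered only as a sketch, could use `shutil.move` instead. The function name `archive_submission` is hypothetical.

```python
import os
import shutil


def archive_submission(file_location, submitted_dir):
    """Move a submitted file into submitted_dir, creating the folder if needed."""
    os.makedirs(submitted_dir, exist_ok=True)
    destination = os.path.join(submitted_dir, os.path.basename(file_location))
    if os.path.exists(destination):
        # Refuse to overwrite an earlier submission with the same name
        raise FileExistsError(destination)
    shutil.move(file_location, destination)
    return destination
```

Refusing to overwrite is a design choice that matches the goal stated above of never losing an old assignment.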
All of the code is packaged into a single script that I can run from the command line. To limit the chance of mistakes, I submit only one assignment at a time, which is no great hardship given that the program takes only about 5 seconds to run!
Here's what it looks like when I run the program:
The program gives me a chance to make sure that this is the correct assignment before uploading. After the program finishes, I get the following output:
While the program is running, I can watch Python go to work for me:
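The confirmation step mentioned above could be sketched as follows. The prompt wording, the function name `confirm_submission`, and the `ask` parameter are assumptions made for illustration and do not reproduce the script's actual code.

```python
def confirm_submission(class_name, file_name, ask=input):
    """Ask the user to approve a submission; only an explicit 'y' or 'yes' counts."""
    answer = ask(f"Submit '{file_name}' to {class_name}? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Passing the prompt function as a parameter keeps the default interactive behavior while making the check easy to exercise without a keyboard. Anything other than an explicit yes cancels the upload.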
Automating the web with Python works well for many tasks, both in general and in my field of data science. For example, we could use Selenium to automatically download new data files every day (assuming the website does not have an API). Although writing the script might seem time-consuming at first glance, the advantage is that we can get the computer to repeat the sequence as many times as we want, in exactly the same way. The program will never lose focus and wander off to Twitter. It will follow the steps faithfully and with perfect consistency (and the algorithm will work fine until the site changes).
I should mention that you need to be careful before automating critical tasks. This example is relatively low-risk, since I can always go back and resubmit assignments, and I usually double-check the program's work. Websites change, and if you do not update the program in response, you can end up with a script that does something completely different from what you intended!
In terms of payback, this program saves me about 30 seconds per assignment and took 2 hours to write. So if I use it to turn in 240 assignments, I will come out ahead on time! However, the real payoff of this program is in designing a cool solution to a problem and learning a lot in the process. Although my time might have been spent more efficiently on the assignments themselves rather than on figuring out how to turn them in automatically, I thoroughly enjoyed the challenge. Few things are as satisfying as solving problems, and Python turns out to be a pretty good tool for doing exactly that.