Sunday, June 8, 2025

Progress in Programming Practice 1: Automatically Collecting Data from Websites



Collecting data is the first step in conducting research. Because it matters so much, I made automatic data collection the goal of my first programming practice.

As a novice programmer, I started by collecting content from a single webpage using code. The internet is an essential data source nowadays because so many resources are conveniently available, but collecting them manually is time-consuming and overwhelming. I believe it will be very helpful in my future research if I can use code to collect data from the internet automatically.

To achieve this, I first chose a simple HTML webpage, put its URL into a Python script, and set a keyword. It was then easy to download the webpage and pick out the text that matched the keyword. However, collecting from a single page wasn't very useful. To collect comprehensive data, the program needed to select many webpages and collect content by keyword automatically.
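A minimal sketch of what this first step might look like, using only the Python standard library. The names `TextCollector`, `find_keyword_text`, and `download_page` are hypothetical and not taken from my program; the sketch assumes a plain HTML page and treats "matching" as a simple case-insensitive substring search.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextCollector(HTMLParser):
    """Collects text chunks from HTML, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def find_keyword_text(html, keyword):
    """Return the text chunks that contain the keyword (case-insensitive)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    needle = keyword.lower()
    return [chunk for chunk in parser.chunks if needle in chunk.lower()]


def download_page(url, timeout=10):
    """Download one page and decode it (assumes UTF-8 if no charset is sent)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

With a real address, the two pieces would be combined as `find_keyword_text(download_page(url), "data")`, where `url` is whatever page is being collected.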

That meant setting up numerous loops in the program and defining the conditions that trigger each of them. The program needed to make many decisions by itself: how to choose different webpages, sign in, tell old data from new, handle errors while opening webpages, save data according to different conditions, and so on.
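Two of these decisions, telling old data from new and handling errors while opening webpages, can be sketched as a loop like the one below. This is only an illustration under assumptions: `collect_new_pages` is a hypothetical name, `fetch` stands for any function that downloads one URL, and `seen` is a set of URLs already collected in earlier runs.

```python
def collect_new_pages(urls, fetch, seen, max_retries=2):
    """Fetch every URL not in `seen`, retrying on network-style errors.

    Returns (results, failed): results maps each URL to its content,
    failed lists (url, error message) for pages that never opened.
    """
    results = {}
    failed = []
    for url in urls:
        if url in seen:
            continue  # old data: already collected, skip it
        last_error = None
        for _attempt in range(max_retries + 1):
            try:
                results[url] = fetch(url)
                seen.add(url)
                break  # success: stop retrying this URL
            except OSError as error:  # urllib's URLError and timeouts are OSErrors
                last_error = error
        else:
            # The retry loop finished without a break, so every attempt failed.
            failed.append((url, str(last_error)))
    return results, failed
```

Passing `fetch` in as a parameter keeps the loop logic separate from the downloading code, so the loop can be tried out with a fake function before it touches a real website.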

In recent weeks, I spent a lot of time practicing how to set up multi-level loops, as I found that configuring them can be very complicated. For a while, I was confused about why the loops weren't working as I expected. After struggling with it, I found that using a "matrix" (or, more generally, a structured way to manage conditions and states) can provide better triggers for complicated conditions.
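One way to read the "matrix" idea is as a decision table: every combination of conditions is written down once, next to the action it should trigger, and the loop looks the action up instead of nesting `if` statements. The sketch below is an assumption about what such a table could look like; the three conditions and the action names are hypothetical.

```python
# Each key is (page_opened, signed_in, is_new_data); each value is an action name.
DECISIONS = {
    (False, False, False): "retry",
    (False, False, True): "retry",
    (False, True, False): "retry",
    (False, True, True): "retry",
    (True, False, False): "sign_in",
    (True, False, True): "sign_in",
    (True, True, False): "skip",
    (True, True, True): "save",
}


def choose_action(page_opened, signed_in, is_new_data):
    """Look up the action for the current combination of conditions."""
    return DECISIONS[(page_opened, signed_in, is_new_data)]
```

Because all eight combinations are listed explicitly, a missing or contradictory case is easy to spot, which is harder when the same logic is spread across several nested loops.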

Now, I have practiced many aspects of multi-step loops and successfully finished the small program. I have written over a thousand lines of Python code, and I have a much better understanding of how to automate actions using code. This practice has taken a lot of time, but I believe it will be worth it.
