Python Job Crawler Architecture
While I was in "job search" mode, searching for jobs online by hand was difficult, as there were many thousands of options. It occurred to me that I could automate the process. My language of choice was Python: its built-in modules could be composed to accomplish the task much more easily than in other languages.
Thus, I started off building the components as follows:
A Job class to hold the details of a job, such as title, description, company, contract type, salary, etc. This encapsulated the essentials of each Job instance.
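A minimal sketch of such a class is below; the exact field names are my assumptions for illustration, not the original code.

```python
# Sketch of a Job class encapsulating the essentials of a posting.
# Field names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Job:
    title: str
    description: str
    company: str
    contract_type: str
    salary: str
    location: str
    url: str = ""

# Example instance
job = Job(
    title="Python Developer",
    description="Build and maintain web crawlers.",
    company="Acme Ltd",
    contract_type="Permanent",
    salary="£45,000",
    location="London",
)
print(job.title)  # → Python Developer
```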
Then I had to think about the target job portals. I picked a list of about five but ended up coding four, since the work tended to feel more like surgery than coding: one small mistake and the whole application broke. I added these portals to a Python list.
Then, once the target portals were selected, I had to compose the search URLs. I discovered that a few sites used a similar URL pattern, which made the job easier. It was still challenging to get the right combination of values: from separate lists of job types, contract types, salaries, locations, and so on, I had to compose exactly the URL a user would produce by searching for similar jobs. When a URL failed, the program simply trapped the error and moved on to the next one. Sorted!
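The compose-then-trap loop might look like the sketch below. The portal addresses and query parameter names are invented placeholders; real portals each need their own pattern.

```python
# Sketch: build a search URL per portal from parameter lists, and skip
# any portal whose request fails. Portal URLs and parameters are made up.
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.error import URLError

portals = [
    "https://example-jobs-one.com/search",
    "https://example-jobs-two.com/search",
]
params = {"q": "python developer", "location": "London", "contract": "permanent"}

for base in portals:
    url = f"{base}?{urlencode(params)}"
    try:
        with urlopen(url, timeout=5) as resp:
            html = resp.read()  # hand off to the parser here
    except (URLError, OSError):
        # Trap the error and move on to the next portal.
        continue
```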
The next challenge was data storage. This was easy, as SQLite was equal to the task, and Python has a built-in module for it. I created a job_list.db database and connected to it from code. The Job class let me create an instance and ask that instance to insert itself into the database.
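A minimal sketch of that storage layer, using the stdlib sqlite3 module; the table and column names are my assumptions.

```python
# Sketch: persist jobs to the job_list.db SQLite database.
# Table and column names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect("job_list.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS jobs (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           title TEXT,
           company TEXT,
           salary TEXT,
           url TEXT
       )"""
)

def save_job(title, company, salary, url):
    # The instance "creates itself" in the database.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO jobs (title, company, salary, url) VALUES (?, ?, ?, ?)",
            (title, company, salary, url),
        )

save_job("Python Developer", "Acme Ltd", "£45,000", "https://example.com/job/1")
```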
I did try to identify the id each portal used for its jobs. This worked for about two sites, which used stable ids; for the others, the id seemed to change on and off. Nonetheless, I was able to retrieve non-duplicate job titles. The worst case was that some jobs were dropped when a value I expected to be unique was in fact shared between different jobs, but given the number of jobs received, that loss had little impact.
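One way to get that title-based deduplication cheaply, sketched below, is a UNIQUE constraint plus INSERT OR IGNORE, which silently drops repeats; the schema is again an assumption on my part.

```python
# Sketch: deduplicate on title when a portal's own id is unstable.
# INSERT OR IGNORE silently skips rows that violate the UNIQUE constraint.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (title TEXT UNIQUE, company TEXT)")

for title, company in [
    ("Python Developer", "Acme Ltd"),
    ("Python Developer", "Acme Ltd"),  # duplicate, silently skipped
    ("Data Engineer", "Globex"),
]:
    conn.execute("INSERT OR IGNORE INTO jobs VALUES (?, ?)", (title, company))

count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(count)  # → 2
```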
I then loaded the data into a GUI and displayed the jobs in a list control. It also let me click a link on a job I liked and be taken to the page with the full job description.
I have drawn out the architecture below, illustrating the interconnectivity between the components.
For the Future
The next step, which I did not attempt, was to automate applications. With a program, I could apply to a hundred jobs in a second. Wouldn't that be nice? But there were challenges: I would have to build custom cover letters based on each job description. Perhaps this would be easy if I wrote ready-made letters per job role, and selected among them based on values identified in the description text.
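The letter-selection idea could be as simple as keyword matching, sketched below. Everything here is hypothetical: the templates, keywords, and function name are invented to illustrate the approach, not code from the project.

```python
# Hypothetical sketch: pick a ready-made cover letter by matching
# keywords found in the job description text.
TEMPLATES = {
    "python": "Dear hiring team, my Python experience includes ...",
    "data": "Dear hiring team, my data engineering background covers ...",
    "default": "Dear hiring team, I am writing to apply for ...",
}

def pick_letter(description: str) -> str:
    text = description.lower()
    for keyword, letter in TEMPLATES.items():
        if keyword != "default" and keyword in text:
            return letter
    return TEMPLATES["default"]

print(pick_letter("Senior Python developer, Django stack"))
```

A real version would need finer-grained matching (role, seniority, stack), but the shape is the same: identify values in the description, then select the closest template.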