Friday, March 16, 2012

HW 8: Employment Data

You are working in the University admissions office, and students keep asking you how the job market for different majors is faring. You figure that a good place to start answering that question is to look at how many people currently work in each particular job in the US. Surely someone has already gathered that data and, after a bit of googling (or Binging) around, you come across the occupational employment statistics page from the Bureau of Labor Statistics. On that page you find a link to the raw data files.

For this homework you will only need to download the oe.occupation and oe.data.0.Current files.

First, take a look at the oe.occupation file. Each line in this file lists a different occupation: first the occupation code, a space, then the English description, a space, then a number (0 or 1). Your program will need to first read all these occupation codes and descriptions into an array so that it can then ask the user which occupation he wants the total number of people employed for.
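Here is a minimal sketch of parsing one line of oe.occupation, assuming the fields are whitespace-separated exactly as described: the code is the first token, the trailing number is the last token, and everything in between (which may contain spaces) is the description. The class and method names are hypothetical, and if your copy of the file turns out to be tab-delimited with extra columns, the pattern would need adjusting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OccupationLineParser {
    // Assumed layout: code, whitespace, description (may contain spaces),
    // whitespace, trailing number. The greedy (.*\S) backtracks so that the
    // last whitespace-separated number is left for group 3.
    private static final Pattern LINE =
            Pattern.compile("^(\\S+)\\s+(.*\\S)\\s+(\\d+)\\s*$");

    /** Returns {code, description}, or null if the line does not fit the assumed layout. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("000000 All Occupations 1");
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

Returning null for lines that do not fit lets the caller skip a header or blank line instead of crashing.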

The oe.data.0.Current file contains all the data. It is a large file (266 MB), and the .gov site is slow, so here is my copy. The oe.txt file describes the format of the data in this file in detail, but I explain below what you need for this HW. The oe.data.0.Current file looks like this:
series_id year period value footnote_codes
OEUM001018000000000000001      2010 S01        61790 
OEUM001018000000000000002      2010 S01          2.2 
OEUM001018000000000000003      2010 S01        16.72 
OEUM001018000000000000004      2010 S01        34780 

The series_id (for example, OEUM000040000000000000001) can be broken out into:
Field                   Value (example)

survey abbreviation  =  OE
seasonal (code)      =  U
areatype_code        =  M
area_code            =  0000400
industry_code        =  000000
occupation_code      =  000000
datatype_code        =  01
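The example series_id is 25 characters long. A minimal sketch of slicing it by fixed positions, assuming a one-letter area-type code (the M in OEUM, the S in OEUS) sits between the seasonal code and the area code, and that the other widths match the example (2, 1, 7, 6, 6, 2). The class and field names are hypothetical.

```java
class SeriesId {
    final String survey;         // e.g. "OE"
    final String seasonal;       // e.g. "U"
    final String areaType;       // assumed one-letter area type, e.g. "M" or "S"
    final String areaCode;       // 7 characters
    final String industryCode;   // 6 characters
    final String occupationCode; // 6 characters, matches the oe.occupation code
    final String datatypeCode;   // 2 characters, "01" = number of jobs

    SeriesId(String id) {
        if (id.length() != 25) {
            throw new IllegalArgumentException("Expected 25 characters: " + id);
        }
        survey = id.substring(0, 2);
        seasonal = id.substring(2, 3);
        areaType = id.substring(3, 4);
        areaCode = id.substring(4, 11);
        industryCode = id.substring(11, 17);
        occupationCode = id.substring(17, 23);
        datatypeCode = id.substring(23, 25);
    }

    public static void main(String[] args) {
        SeriesId s = new SeriesId("OEUM000040000000000000001");
        System.out.println(s.areaCode + " " + s.occupationCode + " " + s.datatypeCode);
    }
}
```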

For this HW we are only interested in lines whose series_id starts with OEUS and whose datatype_code (the last two characters) is 01, which corresponds to "number of jobs". The occupation_code corresponds to the code from the oe.occupation file.
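The selection rule above fits in one small predicate. This is a sketch, assuming the occupation code occupies characters 18 through 23 of the 25-character series_id (0-based indices 17 to 22), as in the example breakdown; the class and method names are hypothetical.

```java
class JobCountFilter {
    /**
     * True if seriesId starts with OEUS, ends with datatype 01 ("number of jobs"),
     * and carries the given 6-character occupation code.
     */
    static boolean matches(String seriesId, String occupationCode) {
        return seriesId.length() == 25
                && seriesId.startsWith("OEUS")
                && seriesId.endsWith("01")
                && seriesId.substring(17, 23).equals(occupationCode);
    }

    public static void main(String[] args) {
        // Hypothetical ID assembled from the field widths:
        // "OEUS" + area (7) + industry (6) + occupation (6) + datatype (2).
        String id = "OEUS" + "0100000" + "000000" + "151131" + "01";
        System.out.println(matches(id, "151131"));
    }
}
```

Checking the length first also quietly rejects the header line of the data file.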

Your program will first read in and parse the occupations from the oe.occupation file. It will then show the user the list of all occupations, numbered, and ask him to pick one. It will then open the oe.data.0.Current file and add up the values (fourth column) of all the rows whose series_id starts with OEUS, ends with 01, and contains the user's chosen occupation code. Finally, it will print out this total. Here is a sample run:

0 - All Occupations 
1 - Management Occupations 
2 - Chief Executives 
3 - General and Operations Managers 

....and so on.....

817 - Mine Shuttle Car Operators 
818 - Tank Car, Truck, and Ship Loaders 
819 - Material Moving Workers, All Other 

Which of the occupations above do you need data for?
Enter number:77
Working......
17750 persons work as Actuaries in the US.
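The summing step described above can be sketched as a single pass over the data file. This is a minimal version, assuming the columns are separated by spaces or tabs, the value is the fourth column, and any row whose value is not a whole number should simply be skipped; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

class JobCounter {
    /** Sums the value column of rows matching OEUS...01 for the given occupation code. */
    static long sumJobs(Reader source, String occupationCode) throws IOException {
        long total = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.trim().split("\\s+");
                if (cols.length < 4) {
                    continue; // blank or short line
                }
                String id = cols[0];
                // The header line ("series_id ...") fails the length check.
                if (id.length() == 25 && id.startsWith("OEUS") && id.endsWith("01")
                        && id.substring(17, 23).equals(occupationCode)) {
                    try {
                        total += Long.parseLong(cols[3]);
                    } catch (NumberFormatException e) {
                        // Assumption: non-numeric values, if any, are skipped.
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java JobCounter <6-digit occupation code>");
            return;
        }
        System.out.println(sumJobs(new FileReader("oe.data.0.Current"), args[0]));
    }
}
```

Reading line by line keeps memory use small even though the file is hundreds of megabytes.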

Here are a few more sample outputs so you can see the sums that I got. I am omitting the list of occupations as it is always the same.
Enter number:66
Working......
3294290 persons work as Computer and Mathematical Occupations in the US.

Enter number:69
Working......
335330 persons work as Computer Programmers in the US.

Enter number:70
Working......
499880 persons work as Software Developers, Applications in the US.

Enter number:94
Working......
144870 persons work as Electrical Engineers in the US.

Enter number:101
Working......
223470 persons work as Mechanical Engineers in the US.

Enter number:119
Working......
1072400 persons work as Life, Physical, and Social Science Occupations in the US.

Enter number:146
Working......
3360 persons work as Sociologists in the US.

Enter number:1
Working......
6066780 persons work as Management Occupations in the US.

Enter number:295
Working......
7394880 persons work as Healthcare Practitioners and Technical Occupations in the US.


I note that my numbers do not exactly match theirs. For example, for "Computer Programmers" I got 335,330, but their webpage shows 333,620 (webpage with all occupations). I blame it on shoddy accounting in the Obama administration. Still, it's pretty close.

TIP: You will want to create a class that holds the collection of occupations from the oe.occupation file, along with an Occupation class which holds just one occupation: its name and its code.
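A minimal sketch of the two classes the tip describes. Occupation comes from the tip; the collection class name OccupationList and its methods are hypothetical. Keeping occupations in file order means the menu number the user types is just the list index, and menu() reproduces the "N - Name" lines of the sample run.

```java
import java.util.ArrayList;
import java.util.List;

/** One occupation from oe.occupation: its code and its English name. */
class Occupation {
    private final String code;
    private final String name;

    Occupation(String code, String name) {
        this.code = code;
        this.name = name;
    }

    String getCode() { return code; }

    String getName() { return name; }
}

/** All occupations, in file order, so menu numbers are list indices. */
class OccupationList {
    private final List<Occupation> occupations = new ArrayList<>();

    void add(Occupation o) { occupations.add(o); }

    Occupation get(int menuNumber) { return occupations.get(menuNumber); }

    int size() { return occupations.size(); }

    /** Builds the numbered menu, one "N - Name" line per occupation. */
    String menu() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < occupations.size(); i++) {
            sb.append(i).append(" - ").append(occupations.get(i).getName()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        OccupationList list = new OccupationList();
        list.add(new Occupation("000000", "All Occupations"));
        list.add(new Occupation("110000", "Management Occupations"));
        System.out.print(list.menu());
    }
}
```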

TIP: It takes my laptop about 15 seconds to process the whole file. That is way too long when trying to debug the program, so I added temporary code to read in only the first few thousand lines. I then remove that code once the program works. BTW, the file has about 5 million lines.
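One way to do the temporary line limit from this tip, sketched with a hypothetical DEBUG_MAX_LINES constant: the loop stops reading once the limit is reached, and setting the constant to 0 turns the limit off without deleting any code. The names and the 5000 value are assumptions.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

class DebugLimit {
    // Temporary debugging limit; set to 0 to process the whole file.
    static final int DEBUG_MAX_LINES = 5000;

    /** Processes at most maxLines lines (all lines if maxLines <= 0); returns how many were read. */
    static int processLines(Reader source, int maxLines) throws IOException {
        int processed = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((maxLines <= 0 || processed < maxLines) && (line = in.readLine()) != null) {
                processed++;
                // ... filter and sum `line` here, as in the full program ...
            }
        }
        return processed;
    }

    public static void main(String[] args) throws IOException {
        int n = processLines(new FileReader("oe.data.0.Current"), DEBUG_MAX_LINES);
        System.out.println("Processed " + n + " lines");
    }
}
```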

This homework is due in dropbox.cse.sc.edu on Monday, March 26, at noon. When you turn it in, do not upload the text files. We already have a copy, and we don't need 120 more copies.



5 comments:

Jose Vidal said...

You will not be able to open oe.data.0.Current in eclipse. The file is too large. Above I show what it looks like. You can also try to open it with some other text editor (emacs works), or 'more' it in a terminal window (mac or linux), or write a Java program that reads the first 100 lines and prints them out, or ...what else?
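For the "write a Java program" option, here is a minimal sketch that prints the first 100 lines. Because BufferedReader reads lazily, it never loads the whole file into memory. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.Reader;

class Peek {
    /** Copies at most maxLines lines from source to out; returns how many were printed. */
    static int printHead(Reader source, int maxLines, PrintStream out) throws IOException {
        int printed = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while (printed < maxLines && (line = in.readLine()) != null) {
                out.println(line);
                printed++;
            }
        }
        return printed;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "oe.data.0.Current";
        printHead(new FileReader(file), 100, System.out);
    }
}
```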

Jose Vidal said...

Notice that every line in oe.occupation is the occupation code, followed by the name, followed by either 0 or 1.

Jose Vidal said...

Did you notice that in the US about the same number of people work in programming/IT as in all Science and Engineering fields combined? Yes, combined.

The number of people working in computers is expected to keep growing http://www.cse.sc.edu/job/computers-where-jobs-are

Anonymous said...

Wow! So in the future, if the number of Computer majors is almost the same as the number of Mechanical Engineering majors, do you think there will be more computer science jobs than people to fill them?

Jose Vidal said...

Every year since the 1980s there have been more jobs in software than computer majors.

So, where do all these programmers come from? They are full-time workers with degrees in other fields who learn to code on the job. There is data showing that nearly half of the people working as software developers do not have a degree in computing. They learned while working. This is unlike nearly all other majors, where close to 100% of those working in the field got a degree in the field.

Add to this the fact that even someone working as a Scientist, Engineer, or Analyst still needs to write the occasional program to analyze some numbers or automate a bit of workflow, and the question is not whether you should learn to code, but when: now, in school, or later, while also working a full-time job?

BTW, the number of new computer majors in the US every year is larger than the number of new MEs. However, at most universities that offer both degrees, the numbers are about the same. The reason is that there are relatively few engineering schools in the US, but many places without an engineering college offer a "Computer Science" degree.