* ============================================================================.
* ============================================================================.
* ===== Syntax Name: IPEDS Comparison Group Select (AIR 2007).
* ===== Author: Viktor Brenner.
* ===== Copyright (C) 2006, 2007 Viktor Brenner and WCTC.
* ===== Licensed under the Academic Free License.
* ============================================================================.
* ============================================================================.
* ===== Purpose: Identify most similar institutions for use as an IPEDS comparison group.
* ===== Parameters: Specify IPEDS UnitID of target institution and filename of data file.
* ===== Parameters: Weights used to identify similarity can be adjusted as desired.
* ===== Dependencies: Will create SPSS syntax file c:\temp\temp.sps.
* ===== Dependencies: Will create SPSS data files c:\temp\IPEDSweights.sav and c:\temp\IPXXXXXX_data.sav.
* ===== Dependencies: Will create an IPEDS upload file c:\temp\CGXXXXXX.uid, where XXXXXX is your UnitID.
* ===== Input: Requires specific data file structure found in file provided.
* ===== Output: Prints a list of the closest institutional matches.
* ===== Output: Formatted text file identifying comparison group for uploading.
* ============================================================================.
* ============================================================================.
* ===== WARNING: requires SPSS 15.0 or better. Earlier versions may be able to run this syntax.
* ===== by removing all instances of the command 'dataset close all'.
* ============================================================================.
* ============================================================================.
* ===== Specify filename for data file.
file handle datafile /name = 'f:\data\planning\IPEDS\IPEDS_institutions_data.sav'. /*USER ENTERED*/
dataset close all. /*Ensures that no additional data files are open in SPSS 15 */
get /file = datafile.
* ===== Specify UnitID of target institution.
compute target = 239105. /* USER ENTERED: replace target with your institution's IPEDS UnitID */
exe.
* ===== Ensures that data file is in correct order.
sort cases by unitid.
exe.
* ============================================================================.
* ===== Weights are assigned to institutional characteristics here.
* ===== You can customize these values to adjust the model of similarity applied.
* ===== Check value labels in data file for meaning of IPEDS codes.
* ============================================================================.
* ===== Recodes IPEDS variables according to weighting scheme.
recode control (1=1) (2=4) (3=10) (Else= -5) into wt_sector. /*Sector (public, private etc) */
recode hloffer (1=1) (3=5) (2=3) (4=7) (5=9) (6=10) (7=12) (8=13) (9=15) (-2 = 14) (Else= -5) INTO wt_HLO. /*Highest level of offering*/
recode hbcu (1 = 1) (Else=5) into wt_HBCU. /* Differentiates historically black colleges & universities*/
recode medical (1 = 1) (Else = 5) into wt_medschl. /*Differentiates institutions with medical schools*/
recode locale (1=15) (2=12) (3=10) (4=8) (5=5) (6=3) (7=1) (else= -5) into wt_setting. /*Urbanization*/
* ===== Geography is encoded both N-S and E-W.
recode obereg (0 = 5) (1 = 1) (2 = 2) (3 = 4) (4 = 5) (5 = 3) (6 = 7) (7 = 7) (8 = 8) (9 = 1) into geo_ew.
recode obereg (0 = 4) (1 = 1) (2 = 3) (3 = 3) (4 = 3) (5 = 6) (6 = 6) (7 = 3) (8 = 4) (9 = 9) into geo_ns.
* ===== Extracts enrollment profile from longer string (Carnegie).
compute wt_profile = -5.
if substr(enrollmentprofile,1,4) = 'ExU2' wt_profile = 0.
if substr(enrollmentprofile,1,4) = 'ExU4' wt_profile = 6.
if substr(enrollmentprofile,1,3) = 'VHU' wt_profile = 8.
if substr(enrollmentprofile,1,2) = 'HU' wt_profile = 9.
if substr(enrollmentprofile,1,2) = 'MU' wt_profile = 11.
if substr(enrollmentprofile,1,3) = 'HGP' wt_profile = 13.
if substr(enrollmentprofile,1,3) = 'MGP' wt_profile = 15.
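* ===== Optional check (an editorial sketch, not part of the original procedure).
* ===== How the weights act, assuming the standard IPEDS coding of control.
* ===== (1 = public, 2 = private not-for-profit, 3 = private for-profit).
* ===== public vs private not-for-profit differ by abs(1 - 4) = 3 on wt_sector.
* ===== public vs private for-profit differ by abs(1 - 10) = 9 on wt_sector.
* ===== Wider gaps between weights therefore push institutions further apart.
* ===== The frequency tables below show how many cases received the -5 default.
* ===== that is, a code the weighting scheme above does not recognize.
* ===== This step only prints tables and may be deleted without affecting results.
frequencies variables = wt_sector wt_HLO wt_setting wt_profile.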
* ===== Extracts variables from string encoding in 2005 Carnegie Classification.
* ===== Institution size.
compute wt_size = -5. /* Default for missing values*/
if index (sizesetting, 'L') = 1 wt_size = 12.
if index (sizesetting, 'L') = 2 wt_size = 16. /* As in VL */
if index (sizesetting, 'M') = 1 wt_size = 8.
if index (sizesetting, 'S') = 1 wt_size = 4.
if index (sizesetting, 'S') = 2 wt_size = 2.
if index (sizesetting, 'VS') = 1 wt_size = 1.
if sector = 0 wt_size = 22. /* Systems are not coded in sizesetting but are identified in IPEDS Sector */
* ===== Setting.
compute wt_set = -5.
if index (sizesetting, '2') = 2 or index (sizesetting, '2') = 3 wt_set = 1.
if index (sizesetting, '/N') < 5 and index (sizesetting, '/N' ) > 0 wt_set = 4.
if index (sizesetting, '/R') < 5 and index (sizesetting, '/R' ) > 0 wt_set = 7.
if index (sizesetting, '/H') < 5 and index (sizesetting, '/H' ) > 0 wt_set = 9.
exe.
* ===== Differentiates technical colleges, colleges, and universities.
compute wt_name = 1.
if index (instnm, 'University') > 0 wt_name = 9.
if index (instnm, 'College') > 0 wt_name = 7.
if index (instnm, 'Technical') > 0 wt_name = 4.
exe.
* ===== Bonus points for same-state institutions.
compute wt_instate = 0.
do if (unitid = target).
write outfile ='c:\temp\temp.sps' / 'if fips =', fips,' wt_instate = 6.'. /*identifies state of target institution*/
end if.
exe.
* ===== Reads command written above, coding bonus points for in-state institutions.
include 'c:\temp\temp.sps'. /*executes command from above to identify target state*/
exe.
* ============================================================================.
* ===== Prescreening.
* ===== SPSS 15 is unable to process the entire dataset in the proximities command.
* ===== This algorithm preselects the 1800 closest matches based on five variables.
* ===== Alternately, an updated file can be obtained from SPSS to run the entire dataset.
* ===== See SPSS knowledgebase resolution 71291.
* ============================================================================.
* ===== Creates variable screening to provide an initial estimate of distance from target institution.
do if (unitid = target).
compute temp1 = wt_sector.
compute temp2 = wt_hlo.
compute temp3 = wt_setting.
compute temp4 = wt_profile.
compute temp5 = wt_size.
* ===== Writes command to calculate value of screening variable relative to target institution.
write outfile = 'c:\temp\temp.sps'
  /'compute screening = abs(wt_sector - ',temp1,') + abs(wt_hlo - ',temp2,') + abs (wt_setting - ',temp3,') + abs(wt_profile - ', temp4,') + abs(wt_size - ',temp5,').'.
end if.
exe.
include 'c:\temp\temp.sps'. /* Execute command written above to create screening variable */
exe.
* ===== Exports your UnitID for use in filenames.
string filenm (A6).
compute filenm = ltrim(rtrim(string(target, F6.0))).
exe.
do if (unitid = target).
write outfile = 'c:\temp\temp.sps'
  / "file handle IPEDS /name =", "'","c:\temp\CG",filenm,".uid","'."
  / "file handle screened_data /name =", "'","c:\temp\IP",filenm,"_data.sav","'.".
end if.
exe.
include file = 'c:\temp\temp.sps'.
exe.
* ===== Selects top 1800 matches on screening variable for further analysis.
sort cases by screening.
exe.
do if ($casenum = 1800). /* Chosen because SPSS 15 is known to be able to process at least 1800 cases */
write outfile = 'c:\temp\temp.sps' / 'select if screening <' screening '.'.
end if.
exe.
include 'c:\temp\temp.sps'.
exe.
* ===== Re-sorts remaining cases in original order.
sort cases by unitid.
exe.
* ===== With data file now screened, data is saved under a new name for further use.
save /outfile = screened_data /keep = unitid to wt_instate.
dataset close all.
get /file = screened_data.
* ===== Exports command to select target institution data from the results of Proximities command.
* ===== Thanks to Raynald Levesque and his website www.spsstools.net.
* ===== for the idea of writing command files to change syntax behavior based on variable values.
string varname (A9).
do if (unitid = target).
compute varname = concat('VAR',ltrim(string($casenum,F4.0))).
write outfile ='c:\temp\temp.sps' / 'Rename variable (' varname'=distances).'.
end if.
Execute.
* ===== Calculates Euclidean distances between remaining institutions.
proximities wt_instate wt_sector wt_HLO wt_HBCU wt_medschl wt_setting geo_ew geo_ns wt_profile wt_size wt_set wt_name
  /view=case
  /print = none
  /ID = instnm
  /missing = include
  /matrix = out ('c:\temp\IPEDSweights.sav')
  /measure=euclid
  /standardize = none.
dataset close all.
get /file = 'c:\temp\IPEDSweights.sav'. /* Retrieve distances matrix created above */
* ===== Renames target institution's row in the matrix so that it is retained.
include 'c:\temp\temp.sps'.
exe.
save /outfile = 'c:\temp\IPEDSweights.sav'.
dataset close all.
get /file = screened_data. /* Reopen data file*/
* ===== Attaches target institution's distance data to the original data file.
Match files /file=*
  /file='C:\Temp\IPEDSweights.sav'
  /keep unitid to wt_instate distances. /* All other distance variables are dropped*/
exe.
save /outfile = screened_data.
dataset close all.
get /file = screened_data.
* ===== Sorts cases then selects top matches.
sort cases by distances.
exe.
* ===== The syntax looks at the distance of the 100th value on the list.
* ===== 99 is the maximum number of comparison schools allowed.
* ===== No. 101 is likely to have the same distance score as 100, so only closer items are retained.
do if ($casenum = 100) .
write outfile = 'c:\temp\temp.sps' / 'select if distances < ' distances '.'. /* Write select statement to external file*/
end if.
exe.
include file = 'c:\temp\temp.sps'. /*Read select statement just written*/
exe.
* ===== Prints the list of matches and their variables.
Summarize
  /Tables=instnm city stabbr hloffer locale EnrollmentProfile SizeSetting distances
  /Format=Validlist Nocasenum Total Limit=100
  /Title='Most Similar Institutions'
  /Missing = variable
  /Cells=Count .
* ===== Create variables formatted for the IPEDS upload file.
string institution (A60).
compute institution = upcase (instnm).
string muni (A50).
compute muni = upcase (city).
exe.
* ===== Delete yourself from the list of matches.
select if unitid ne target.
exe.
* ===== Export comparison group upload list.
sort cases by unitid.
exe.
write outfile = IPEDS / unitid, '|', institution, '|', muni, '|', stabbr.
exe.
* ===== Restore data file.
dataset close all.
get /file = screened_data.
exe.