Internet Systems Assignment

1 Introduction
This assignment is based on real data processing I have had to undertake as part of a research project at Loughborough. A briefing with an overview of this work will be given on Thursday of week 2 of the module.

2 Exercise
You must write some Python and/or shell code to process data files and produce an output file. The input files are logs from runs of an experiment on BGP Security (RPKI) and the output file is a measure of the performance of the run. There are two types of data file: publication and router.

2.1 Context
In BGP Security's RPKI infrastructure a Route Origin Authorisation (ROA) object consists of:
• an AS number.
• an IP address.
• a prefix length.
• a prefix maximum length.
An example would be: 42
Here the prefix length is 16, the maximum length is 24 and the AS number is 42. (Do not worry about the term maximum length. For the purposes of this exercise all you need to know is that it is there.)
The purpose of a ROA is to form a cryptographic binding between an AS number and a prefix. ROAs are published by publication points and received at routers. We have log files recording the time of publication and receipt, and these log files are the input to your program.

2.2 Input
These files contain information about when ROA objects are published and received. Extracts from sample input files are shown in figure 1.
• Objects can be published at many publication points. Each one has a publication log (pubd.log). Note from the figure that we are only interested in lines containing ROAs. These can be identified by the 4th field ending in .roa, and from them we only need the time and the information from the last two fields.
• Objects are received at many routers. Each one has an rtr-origin-client.log. Note we are only interested in the lines with a + in the 4th field. The AS number is then the fifth field, and the prefix and its lengths are in the following field.
Note that for both publication and router log files there are many lines which can be ignored.
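The two filtering rules above could be sketched roughly as follows. This is only an illustration. It assumes that fields are separated by whitespace, and the function names publication_fields and router_fields are hypothetical. Extracting the time is deliberately left out, because where it sits on the line depends on the layout shown in figure 1.

```python
def publication_fields(line):
    """Return the last two fields of a pubd.log line whose 4th field ends
    in .roa, or None for any other line.
    Assumption: fields are whitespace-separated."""
    fields = line.split()
    if len(fields) >= 4 and fields[3].endswith(".roa"):
        return fields[-2], fields[-1]
    return None


def router_fields(line):
    """Return (AS number, prefix-with-lengths) from an rtr-origin-client.log
    line with a + in its 4th field, or None for any other line.
    Assumption: fields are whitespace-separated."""
    fields = line.split()
    if len(fields) >= 6 and "+" in fields[3]:
        return fields[4], fields[5]
    return None
```

Returning None for lines that do not match makes it easy to skip the many lines that can be ignored while looping over a file.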
Files are presented in the following way. The network consists of a large number of machines, some of which act as publication points and some as routers. Each machine has a folder with a set of subfolders. If the file machinename/rpki/daemon-logs/pubd.log.gz exists, then this machine is a publication point; otherwise it is a router (in this experiment a machine cannot be both). If it is a router, then the required log file is machinename/rpki/rtr-origin-client.log.gz. All other files can be ignored.

2.3 Output
We are interested in the time taken (in seconds) for each object to progress from the publication point to a router. One output file is needed, with one line for each ROA object received at a router, together with the time taken for that object to propagate from the publication point to the router.

2.4 Compression
As the input files can be big they are compressed using gzip. Look into Python functions for opening and reading compressed files. (Hint:

2.5 Your Code
Your code needs to read all the files and process them. As with lab exercise 4, the key thing is to have a good set of internal data structures to hold the information as you read it from the files.

2.6 Sample Input
A set of sample input files and folders is stored on sci-linux in the folder: ~eliwp/505-23.
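One possible way to approach the folder layout, the gzip files and the internal data structures described above is sketched below. It is not a complete solution and rests on several assumptions. All function names are hypothetical. Times are assumed to have been converted to seconds already. Where the same object is published at more than one publication point, the earliest publication time is assumed to be the one that counts. Folders containing neither log file are skipped, which is a defensive choice rather than part of the specification.

```python
import gzip
from pathlib import Path

# Relative paths taken from the assignment text.
PUB_LOG = Path("rpki/daemon-logs/pubd.log.gz")
RTR_LOG = Path("rpki/rtr-origin-client.log.gz")


def classify_machines(root):
    """Map machine names to their log file, split into publishers and routers."""
    publishers, routers = {}, {}
    for machine in sorted(Path(root).iterdir()):
        if not machine.is_dir():
            continue
        if (machine / PUB_LOG).exists():
            publishers[machine.name] = machine / PUB_LOG
        elif (machine / RTR_LOG).exists():  # assumption: skip folders with no log
            routers[machine.name] = machine / RTR_LOG
    return publishers, routers


def read_lines(path):
    """Yield text lines from a gzip-compressed log without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def record_publication(published, key, time_seconds):
    """Keep the earliest publication time seen for each ROA key (assumption)."""
    if key not in published or time_seconds < published[key]:
        published[key] = time_seconds


def propagation_delays(published, received):
    """Return (router, key, delay) for each received ROA with a known publication.

    published: dict mapping ROA key -> publication time in seconds
    received:  iterable of (router name, ROA key, receipt time in seconds)
    """
    delays = []
    for router, key, time_received in received:
        time_published = published.get(key)
        if time_published is not None:
            delays.append((router, key, time_received - time_published))
    return delays
```

A dictionary keyed on the ROA (for example a tuple of AS number and prefix with its lengths) makes the lookup for each router line a single step, which matters when the logs are large.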