GOVERNMENT
Is everybody ready?
With the Large Hadron Collider starting next year, grid sites worldwide are ramping up their equipment and service levels in preparation for the expected flood of data. But will they be ready - and how can any problems they're having be fixed? GridPP's management board have recently completed a round of visits to all 17 GridPP sites to determine how prepared they are for LHC start-up.
Dave Britton, GridPP's Project Manager, co-ordinated the process. He explains, "We wanted to check with each site that they would be ready for next year, and find out how we could help with any difficulties. The best way to do this was to visit each site individually, so we could talk to the people there in confidence."
Since March this year, four groups from the management team have made a tour of GridPP's sites. GridPP has four regional Tier-2s (SouthGrid, NorthGrid, ScotGrid and LondonGrid). A team, including an EGEE technical co-ordinator, was allocated to each Tier-2.
Dave Kelsey, who led the team looking at ScotGrid, found the process very worthwhile. "While each site had a slightly different story to tell, overall many of the same issues arose across sites. At all sites we were able to identify a mixture of good practice and areas where lessons could be learned from other sites. ScotGrid seems actually to be working as a distributed Tier-2 and not just a collection of sites, which is encouraging for future support of LHC."
Neil Geddes led the SouthGrid review team, and was also pleased with the results: "Overall we were impressed by the effectiveness with which the SouthGrid sites appeared to be working together and by their clear commitment to the LHC project and the support from their institutions." He adds, "A recurring theme was the frustration created by the lack of a recognised improvement process in some areas of grid development and the apparent unresponsiveness of some software developers. These are both challenging tasks in worldwide projects such as the grid, but important ones that we need to tackle."
Each site was first sent a questionnaire to fill in, covering issues such as the hardware available at each site, its users, networking, middleware and staffing. This then formed the basis of the site visit, which included a couple of hours' discussion to identify that site's strengths and weaknesses.
Tony Doyle, leading the team reviewing NorthGrid, noted that "The scale of the operations at the NorthGrid sites was already very impressive. Improved monitoring of the existing large infrastructure and improved communication flow between the NorthGrid sites was high on the feedback list. It's pleasing that significant steps have already been taken to improve both these areas in the last few months."
Dave Newbold, who was part of the London review team, found, "The London institutes are already providing an impressive resource, with a great variety of technical solutions. This distributed system has been built and operated, with mostly good efficiency, by ensuring excellent communication between sysadmins at the different sites. It was also good to see close links between resource providers and experimental users, which are essential for the real LHC data-taking. The challenges for the future will be to keep these diverse computing resources operating in a Grid environment which has up to now been fairly inhomogeneous."
Each site has had a chance to comment on the results of its review, and is receiving confidential feedback from the review team. The site visits have resulted in 23 issues which GridPP are addressing, including:
* difficulties for smaller sites in meeting the two-hour response time required from Tier-2 sites.
* the complexity of installing new VOs.
* training on site monitoring using Nagios and Ganglia.
GridPP have already responded to some of the concerns raised, for example by developing EGEE's first policy on when sites should stop stalled jobs, which has been presented to the LCG Management Board. The project is also working on ensuring security policies are harmonised so the response to any security incident is clear. In addition, the review has given the GridPP management team a strong mandate to raise issues of middleware support and releases with the LCG Management Board.
The full list of issues arising from the site reviews, and the actions being taken to address them, can be seen at www.gridpp.ac.uk/tier2/Tier-2_Review_Issues_2007.doc
Professor Peter Hobson at Brunel University hosted one of the visits: "At first we were sceptical that this would be a useful exercise, but it was actually very helpful for us to think about the answers to the questionnaire, and be able to explain our priorities, constraints and concerns directly to the GridPP management."
Source: GridPP