|
1. How do I use this?
2. How does it work?
3. What's with the blue and yellow?
4. Why do we prefer HTML?
5. Why can't you accept a web address for a paper?
6. What are the additional terms I can add?
7. Why are the links dead in the HTML results?
8. Where are the images?
9. How do you convert PDF files?
10. Does the PDF converter work a bit differently?

1. How do I use this?
If you have a manuscript in either HTML or PDF format, upload it using the appropriate box above and let us strip out the sentences that contain band assignments. We can't guarantee to catch them all, or that every sentence we catch contains a band assignment, but it's a lot more efficient than reading the whole thing! We'll give you some statistics to support this soon. (top)

2. How does it work?
The mechanism by which sentences are extracted is very simple. Spectroscopists are reliable when it comes to reporting units. We make the basic assumption that if a sentence does not contain a reference to a wavenumber (which we detect through the presence of its units), then it is unlikely to contain a band assignment. (top)
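
As a rough illustration of the idea only (this is a sketch, not the code behind this service), the unit test on each sentence could look something like the Python below. The accepted spellings of the wavenumber unit, the naive sentence splitter, and the names `UNIT_PATTERN`, `split_sentences` and `candidate_sentences` are all assumptions made for the example.

```python
import re

# Assumed spellings of the wavenumber unit: "cm-1", "cm -1", "cm^-1",
# "cm" followed by a Unicode minus and "1", or "cm" with superscript "-1".
# The pattern the real service uses is not documented here.
UNIT_PATTERN = re.compile(r"cm\s*(?:\^?\s*[-\u2212]\s*1|\u207b\u00b9)")

# Naive splitter (an assumption): break after ., ! or ? when the next
# non-space character is a capital letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split text into sentences using the naive boundary rule above."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


def candidate_sentences(text):
    """Return only the sentences that mention a wavenumber unit."""
    return [s for s in split_sentences(text) if UNIT_PATTERN.search(s)]
```

A sentence such as "The amide I band appears at 1650 cm-1." would be kept, while "Samples were dried overnight." would be dropped. A splitter this simple will trip over abbreviations and decimal points, which is one reason not every hit is guaranteed to be a complete sentence.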

3. What's with the blue and yellow?
The yellow sentence is the one that contains the unit notation. The blue sentences are those that appear immediately before and after the yellow sentence. They are included to give the hit sentence some context. (top)

4. Why do we prefer HTML?
HTML is much easier to parse than PDF. PDF files have to be converted to text first, and the conversion process can mangle special characters and layout a bit. Your results are likely to be far easier to understand if you send us HTML files, because we can preserve more of the original formatting. This is true so long as the publisher doesn't use images to represent special characters, which can mess things up. Publishers don't really need to do that anymore, so it's their fault, not ours. (top)

5. Why can't you accept a web address for a paper?
We considered implementing this, but given the access control publishers place on their papers, we didn't think it wise to let Manchester servers fetch them, as this would let anyone from any institution access papers through Manchester's licence. Which is bad. (top)

6. What are the additional terms I can add?
These are extra words that help you narrow your search. For example, you might want to look for sentences that contain "helix" in addition to the wavenumber unit. This would be handy if you wanted to search for band assignments that might have something to do with helices. Search terms are optional. (top)

7. Why are the links dead in the HTML results?
We will try to activate them soon. (top)

8. Where are the images?
Images are replaced by the text [IMAGE FILE]. This is a temporary solution to the problem of special characters being represented using images. Using images for special characters isn't really necessary, as most characters can be represented in HTML and many publishers manage perfectly well without them. (top)

9. How do you convert PDF files?
We use ps2ascii, part of the Ghostscript suite. As you can see, it isn't perfect and sometimes the results are a bit messy. Usually there is enough information to determine a band assignment, though HTML is preferable. (top)

10. Does the PDF converter work a bit differently?
Well spotted! Once converted to text, the PDF file is scanned for the specified units, and ~200 characters upstream and downstream of each hit are reported. This avoids some of the problems that come from determining sentence boundaries. (top) |
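
To make the fixed-width approach used for PDF text concrete, here is a minimal Python sketch of it. It is an illustration under assumptions, not the service's own code: the unit spellings in `UNIT_PATTERN`, the function name `context_windows` and the exact radius of 200 characters are all chosen for the example.

```python
import re

# Assumed spellings of the wavenumber unit; the pattern the real
# service uses is not given here.
UNIT_PATTERN = re.compile(r"cm\s*(?:\^?\s*[-\u2212]\s*1|\u207b\u00b9)")


def context_windows(text, radius=200):
    """Return the text within `radius` characters either side of each unit match."""
    windows = []
    for match in UNIT_PATTERN.finditer(text):
        # Clamp the window to the start and end of the document.
        start = max(0, match.start() - radius)
        end = min(len(text), match.end() + radius)
        windows.append(text[start:end])
    return windows
```

Because the window is measured in characters rather than sentences, it does not matter if the converted text has lost its punctuation or line structure. In this sketch, two hits close together produce overlapping windows that are reported separately rather than merged.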