{
"cells": [
{
"cell_type": "markdown",
"id": "6c874cdd",
"metadata": {},
"source": [
"# The `swifttools.ukssdc.query` module\n",
"\n",
"**Latest version v1.0, released in swifttools v3.0**\n",
"\n",
"The `query` module provides tools to query various catalogues held at the UKSSDC (and a couple of others) and provides wrappers to get data for the objects we find. There are two types of query supported, a cone search and a more complex search applying filters to different fields. These are analogous to the 'simple' and 'advanced' searches available on the website.\n",
"\n",
"Unlike the [the `data` module](data.ipynb) (which provided functions to get specific products), the `query` module provided classes: we create query objects and then manipulate and use these to perform our query and play with the results. There are three query classes provided:\n",
"\n",
"* ObsQuery - This lets us query the databases of Swift observations.\n",
"* [GRBQuery](query/GRB.ipynb) - This lets us query several GRB catalogues.\n",
"* [SXPSQuery](query/SXPS.ipynb) - This lets us query the SXPS catalogues.\n",
"\n",
"Whichever class you want, I **strongly** advise you to read this page first. Although it deals with the `ObsQuery` class, that class only contains functionality common to all of the classes, and so all the common concepts are introduced here, and not necessarily repeated on the other pages.\n",
"\n",
"The `query` module also provides access to data or data products, by wrapping functions from [the data module](data.ipynb).I will not go into much detail about those functions and what they produce, since this has been covered in the relevant pages for the [data module](data.ipynb) already. One crucial point, however, does need discussing: how the query module makes any requested data available to you (if you haven't read about the data module, ignore this point because it will just confuse you - you'll see how we get data shortly).\n",
"\n",
"In [the data module](data.ipynb) the default behaviour when getting a product was to download files to disk, but you could instead request the data be returned from the function so you could capture it in a variable. For the query module, the requested data are always stored in variables inside your query object. You can still also request that they be saved to disk, or returned from the function that got them (using the same `returnData` and `saveData` arguments as in [the data module](data.ipynb)) but these are not done by default.\n",
"\n",
"In this notebook we will demonstrate the query interface for observation data, which is quite straightforward and also gets us familiar with the module syntax.\n",
"\n",
"We will import the query module as `uq`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f33ad30a",
"metadata": {},
"outputs": [],
"source": [
"import swifttools.ukssdc.query as uq"
]
},
{
"cell_type": "markdown",
"id": "fc8ab8f0",
"metadata": {},
"source": [
"## Contents\n",
"\n",
"* [The `ObsQuery` class](#obsquery)\n",
" * [Selecting a table](#table)\n",
"* [Simple (cone) searches](#simple)\n",
"* [Selecting columns to retrieve](#columns)\n",
"* [Advanced queries](#advanced)\n",
"* [Extra query settings](#extra)\n",
"* [Retrieving products](#prods)\n",
" * [For only some rows](#subsets)"
]
},
{
"cell_type": "markdown",
"id": "f566a555",
"metadata": {},
"source": [
"\n",
"## The `ObsQuery` Class\n",
"\n",
"The `ObsQuery` class allows us to query the Swift observation database, it is the API equivalent of [this UKSSDC webpage](https://www.swift.ac.uk/swift_live). The only functionality it provides is that which is common to the entire `swifttools.ukssdc.query` module and all subclasses, so it's the perfect one to use as an introduction.\n",
"\n",
"As discussed above, this module is built around classes, so the first thing we have to do is create an ObsQuery object:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15ffd634",
"metadata": {},
"outputs": [],
"source": [
"q = uq.ObsQuery(silent=False)"
]
},
{
"cell_type": "markdown",
"id": "afc46d1b",
"metadata": {},
"source": [
"I set `silent=False` because in an interactive case like this, it can be helpful to get some textual feedback. If you want even more feedback you can set `verbose=True`. These can be set in the constructor, or via simple calls:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62ed294e",
"metadata": {},
"outputs": [],
"source": [
"q.verbose = True\n",
"q.verbose = False # Turn it off again!"
]
},
{
"cell_type": "markdown",
"id": "b88f131d",
"metadata": {},
"source": [
"\n",
"### Selecting a table\n",
"\n",
"There are several database tables relating to Swift observations. Just like [the website](https://www.swift.ac.uk/swift_live), the API gives you the most useful (I think) one by default, but of course, you can change this. But how? Well, first, we want to know what tables we have to choose from, and this is stored in the `tables` variable of our query object:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "feeade57",
"metadata": {},
"outputs": [],
"source": [
"q.tables"
]
},
{
"cell_type": "markdown",
"id": "537dea89",
"metadata": {},
"source": [
"We can also check which one is currently selected, or change it, via the `table` variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a25fc78e",
"metadata": {},
"outputs": [],
"source": [
"q.table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ad61df1",
"metadata": {},
"outputs": [],
"source": [
"q.table = 'swiftxrlog'"
]
},
{
"cell_type": "markdown",
"id": "4f751bb1",
"metadata": {},
"source": [
"You'll notice the cell that changed the table warned us that it was resetting the query details. This only appeared because we have `silent=False` and don't worry about it too much now. Basically it is warning us that any filters set or results retrieved (all covered below) have been wiped because we changed table. We could also have set the table we wanted in the constructor:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ddcc82fb",
"metadata": {},
"outputs": [],
"source": [
"q = uq.ObsQuery(table='swiftbalog',\n",
" silent=False\n",
" )\n",
"q.table"
]
},
{
"cell_type": "markdown",
"id": "b196d869",
"metadata": {},
"source": [
"I won't keep saying this but I'll remind you here: `ObsQuery` only contains behaviour common to the entire `query` module, so the syntax for checking and changing tables is the same for the [`GRBQuery`](query/GRB.ipynb) and [`SXPSQuery`](query/SXPS.ipynb) modules (and for any others I may add in the future)."
]
},
{
"cell_type": "markdown",
"id": "4d43891e",
"metadata": {},
"source": [
"\n",
"## Simple (cone) searches\n",
"\n",
"A simple search is just a cone search, and rather than pontificating, let's just demonstrate. I will stick with the default table, \"swiftmastr\" for all the demonstrations below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd8f15ac",
"metadata": {},
"outputs": [],
"source": [
"q = uq.ObsQuery(silent=False)\n",
"q.addConeSearch(name='GK Per', radius=300, units='arcsec')\n",
"q.submit()"
]
},
{
"cell_type": "markdown",
"id": "622e53be",
"metadata": {},
"source": [
"That was pretty straightforward wasn't it? We introduced two functions here, `addConeSearch()` and `submit()`. The latter is very easy: it takes no arguments and just submits our query object for execution.\n",
"\n",
"`addConeSearch()` should also be pretty clear, but I'll elaborate on its arguments in a moment.\n",
"\n",
"Because we disabled silent mode, we got a bit of information about what was going on, and we can see, firstly, that \"GK Per\" was resolved into coordinates (thank you [SIMBAD](http://simbad.u-strasbg.fr/simbad/)), and secondly, that we got 145 results. But where are those results? They are stored inside your query object (`q`) and the variable holding them is cunningly named `results`. This is a `pandas DataFrame` and we can have a look at it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8dc02b8b",
"metadata": {},
"outputs": [],
"source": [
"q.results"
]
},
{
"cell_type": "markdown",
"id": "6203d474",
"metadata": {},
"source": [
"You can explore this at your leisure, but let me highlight one point regarding coordinates. In the databases we query, coordinates are stored in decimal degrees (J2000), but this may not be how you want them. So, when you perform a query that gets coordinates, the `query` module will do a bit of extra work. It identifies all of the coordinate columns and creates sexagesimal versions of the coordinates to (in the format of strings). To identify these, \"\\_s\" is appended to the column name (so in the above, \"ra\" and \"decl\" were part of the database, and \"ra_s\", \"decl_s\" have been added. Also, if you have the `astropy` module installed then the coordinates will be converted into `astropy.coordinate.Angle` objects and identied by \"\\_apy\" (\"ra_apy\" and \"decl_apy\" in the above.\n",
"\n",
"When we executed the query above, we were told how the name supplied had been resolved; but only because we set `silent=False`, and not in a way that we could have readily captured in our script. Despair not, the details of the name resolution are also in class variables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "caf9d2ff",
"metadata": {},
"outputs": [],
"source": [
"q.resolvedInfo"
]
},
{
"cell_type": "markdown",
"id": "cadf0d96",
"metadata": {},
"source": [
"Or indeed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0583f9cf",
"metadata": {},
"outputs": [],
"source": [
"print(q.resolvedRA)\n",
"print(q.resolvedDec)"
]
},
{
"cell_type": "markdown",
"id": "fbe5a9a5",
"metadata": {},
"source": [
"We're going to go back and look at `addConeSearch()` a bit more and explore some of its syntax, but if we just ran `q.addConeSearch()` again now, we'd get an error telling us our query was locked. This is because when a query is submitted it is locked to prevent ending up in a confused state (a trivial example; imagine you ran the query above and then ran `q.addConeSearch(name='FO Aqr')` but due to an error, didn't submit it; it would be easy to mistakenly think that `q.results` was related to the FO Aqr search. It isn't).\n",
"\n",
"So, if we want to do another query we either need to make a new `ObsQuery` object, or reset the one we have. Let's do the latter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26bed051",
"metadata": {},
"outputs": [],
"source": [
"q.reset()"
]
},
{
"cell_type": "markdown",
"id": "840c7a9c",
"metadata": {},
"source": [
"Now we can make any changes we like. Before I move on I'll note that `q.reset()` has some options, you don't have to reset literally everything, but you can read about those via the `help` command if you want.\n",
"\n",
"So, now we can do another cone search if we want. Before we do, let's look at the arguments that this function takes. You could do this via `help (q.addConeSearch)` but I'll be nice and elaborate here.\n",
"\n",
"A cone search needs to know two things:\n",
"\n",
"1. The centre of the cone\n",
"1. The radius of the cone.\n",
"\n",
"The latter is easy, it is managed by two arguments:\n",
"\n",
"* `radius` - a number giving the radius.\n",
"* `units` - the units of `radius`, must be 'arcsec', 'arcmin' or 'deg' (default: 'arcsec').\n",
"\n",
"The centre of the cone can be specified in a few ways, via these arguments:\n",
"\n",
"* `name` - a string giving an object name which we will attempt to resolve.\n",
"* `position` - a free-form string giving the position, which we will attempt to parse.\n",
"* `ra` and `dec` - Two arguments that can either be `float`s or `astropy.coordinates.Angle` objects.\n",
"\n",
"You should only provide one of these arguments (OK, two if `ra` and `dec`) or you will get an error. `name` was used above. `position` is a free-form string and we have tried to accept almost any sane way in which you may enter coordinates (provided they are in J2000). So, all of the examples below will work:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6de39899",
"metadata": {},
"outputs": [],
"source": [
"q.addConeSearch(ra=123.456, dec=-43.221, radius=1, units='deg')\n",
"q.addConeSearch(position='12 13 15, -15 16 17', radius=12, units='arcmin')\n",
"\n",
"from astropy.coordinates import Angle\n",
"ra = Angle('12h 13m 14s')\n",
"dec = Angle('-13d 14m 15s')\n",
"q.addConeSearch(ra=ra, dec=dec, radius=300, units='arcsec')"
]
},
{
"cell_type": "markdown",
"id": "26df7147",
"metadata": {},
"source": [
"If you provided a `name` or `position`, it will be resolved when the query is submitted, so you can check the details of the resolution using the `q.resolvedRA` etc. variables already introduced. If you supplied `ra` and `dec` you can actually read these back; for example if you don't trust `astropy` as used above, we can check that the correct decimal values were extracted:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0aabab05",
"metadata": {},
"outputs": [],
"source": [
"print(q.coneRA)\n",
"print(q.coneDec)"
]
},
{
"cell_type": "markdown",
"id": "aae42ab0",
"metadata": {},
"source": [
"Note that these variables are read-only so don't try to edit them. If you made a mistake you can use `q.editConeSearch()`, which takes exactly the same arguments as `addConeSearch()`.\n",
"\n",
"We can also change our minds completely and remove the cone search settings from our query:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5ccd963",
"metadata": {},
"outputs": [],
"source": [
"q.removeConeSearch()"
]
},
{
"cell_type": "markdown",
"id": "17e3e9bb",
"metadata": {},
"source": [
"That's all there is to setting up the cone search aspect of the query.\n",
"## Selecting which columns to retrieve\n",
"\n",
"When we ran our demo query above, we got a lot of columns containing data, but what if we didn't want them all? Or if we wanted one not included? The default set of columns returned is not exhaustive. Well, of course, you can change what set of columns you get as we'll now discuss. First of all, it would be nice to know what we are going to get by default. These are stored in the `defaultCols` variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22b40f98",
"metadata": {},
"outputs": [],
"source": [
"q = uq.ObsQuery(silent=False)\n",
"q.defaultCols"
]
},
{
"cell_type": "markdown",
"id": "e5941f5e",
"metadata": {},
"source": [
"As you can see, this is just a list, and it's a list of column names. Which is nice and all, but what actually are these columns, and what extra ones are available?\n",
"\n",
"Because we have `silent=False` you can see from the above that in order to identify the default columns, the Python module grabbed some metadata. It is this metadata which tells us all about the table. Let's take a look at it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1bc86ff3",
"metadata": {},
"outputs": [],
"source": [
"q.metadata"
]
},
{
"cell_type": "markdown",
"id": "0777ba8d",
"metadata": {},
"source": [
"Table metadata is made available in the form of a `pandas DataFrame`. The exact columns in this do differ slightly for the different tables (and modules; 'LongDescription' is only present for the `ObsQuery` class), and not all of them are relevant for you - some are needed internally by the Python module, and some are needed by the web interfaces to the databases. The columns you really care about are \"ColName\" and \"Description\" which should provide enough details for you to choose whether the default columns are enough, or if you want to edit them.\n",
"\n",
"So, knowing what columns exist, we can now decide which ones we want to obtain, if we didn't just want the default. We add columns using the `addCol()` function. This takes either `*` (to get everything), a column name, or a list/tuple of column names. So:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b13f2f0d",
"metadata": {},
"outputs": [],
"source": [
"q.addCol('name')\n",
"q.colsToGet"
]
},
{
"cell_type": "markdown",
"id": "01f3128b",
"metadata": {},
"source": [
"As shown above, the `colsToGet` variable (which you cannot edit directly) tells you what columns are selected, and you may have noticed that now we've explicitly said what we want, all the default columns are not there. We can easily add them back in though:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b68fe849",
"metadata": {},
"outputs": [],
"source": [
"q.addCol(q.defaultCols)\n",
"q.colsToGet"
]
},
{
"cell_type": "markdown",
"id": "fd808a9e",
"metadata": {},
"source": [
"You note that we got a warning because we tried to add an existing column in, but it was just a warning and did not stop the other columns being added. We can also remove columns with `removeCol`, or `removeAllCols`. So here are some examples:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c48e6248",
"metadata": {},
"outputs": [],
"source": [
"q.removeCol('name')\n",
"q.colsToGet"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c25c7f72",
"metadata": {},
"outputs": [],
"source": [
"q.removeCol(('obsid', 'stop_time'))\n",
"q.colsToGet"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5bf5c65d",
"metadata": {},
"outputs": [],
"source": [
"q.addCol(['cycle', 'soft_version']+q.defaultCols)"
]
},
{
"cell_type": "markdown",
"id": "5aaaae44",
"metadata": {},
"source": [
"The latter was a slightly gratuitous demonstration of a way that you can add the default columns, and some others, all in one go. We can also remove the whole lot in one go:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d76aa8e5",
"metadata": {},
"outputs": [],
"source": [
"q.removeAllCols()\n",
"q.colsToGet"
]
},
{
"cell_type": "markdown",
"id": "a324f5e5",
"metadata": {},
"source": [
"If we were to submit this now we would not get an empty result set, but rather the default columns, which are always request if `colsToGet` is empty.\n",
"\n",
"If you've been reading everything very carefully, you may be wondering why the results from our demonstration cone search contained the column `_r`, which has not appeared anywhere in the metadata or our requests. `_r` is a special column only created for cone searches, and it contains the angular distance of each row from the centre of the search. The units are whatever was given as the `units` argument to `addConeSearch()` (arcsec by default).\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "6ac6ec6a",
"metadata": {},
"source": [
"\n",
"## 'Advanced' searches\n",
"\n",
"'Advanced' searches are those where we select data based on filters applied to specific columns. This is actually fairly simple, we just define a series of filters to apply to the query and then submit it. These can be in addition to, or instead of the cone search. There are a few things to note before we start:\n",
"\n",
"1. Filters are combined with a logical AND; that is if you define multiple filters, only rows matching all of them will be returned.\n",
"1. Each filter applies to a single column but can have two clauses, combined with AND or OR.\n",
"\n",
"This will hopefully all make sense as we go on.\n",
"\n",
"\n",
"### Filters\n",
"\n",
"We add filters using the imaginatively-named `addFilter()` function, which takes a single argument, a filter definition. A filter definition can either be a `dict` or a `list` and has the following components:\n",
"\n",
"* column name\n",
"* filter\n",
"* value (if applicable)\n",
"* combiner (optional)\n",
"* filter2 (optional)\n",
"* value2 (optional)\n",
"\n",
"I will unpack these in more detail but an example is probably a better helper so, here are two filters created using the two approaches:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47553cc2",
"metadata": {},
"outputs": [],
"source": [
"filter1 = ('xrt_exposure', '>', 1000, 'OR', '<', 200)\n",
"\n",
"filter2 = {\n",
" \"colName\": \"ra\", \n",
" \"filter\": \">\", \n",
" \"val\": 123,\n",
" \"combiner\": \"and\" ,\n",
" \"filter2\": \"<\",\n",
" \"val2\": 200\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "d4c04d4a",
"metadata": {},
"source": [
"If these filters were converted to SQL they would be written as \n",
"\n",
"* filter1: `xrt_exposure>1000 OR xrt_exposure<200`\n",
"* filter2: `ra>123 AND ra<200`\n",
"\n",
"And if we submitted this query they would combine as:\n",
"\n",
"`(xrt_exposure>1000 OR xrt_exposure<200) AND (ra>123 AND ra<200)`\n",
"\n",
"If we wanted to create a query using these filters we would do:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b173566a",
"metadata": {},
"outputs": [],
"source": [
"q = uq.ObsQuery(silent=False)\n",
"q.addFilter(filter1)\n",
"q.addFilter(filter2)"
]
},
{
"cell_type": "markdown",
"id": "502b54de",
"metadata": {},
"source": [
"A filter doesn't have to have all of the elements we used above. You may not want the second part of the filter, so everything from `combiner` onwards can simply be omitted. Also, some of the filters do not require arguments, so in that case, `val` (or `val2`) can be left out. Conversely, the 'BETWEEN' filter requires two arguments and so `val` must take a 2-element tuple/list.\n",
"\n",
"Some more examples:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "486b9f9c",
"metadata": {},
"outputs": [],
"source": [
"q.removeAllFilters()\n",
"\n",
"q.addFilter ( ('xrt_exposure', '<', 2000))\n",
"\n",
"q.addFilter ( ('ra', 'BETWEEN', [100,200]))\n",
"\n",
"q.addFilter ({\n",
" 'colName': 'target_id',\n",
" 'filter': 'IS NULL',\n",
" 'combiner': 'or',\n",
" 'filter2': '<',\n",
" 'val2': 10000 \n",
"})"
]
},
{
"cell_type": "markdown",
"id": "17a7e32a",
"metadata": {},
"source": [
"Here I've introduced the `removeAllFilters()` function (you can probably guess what it does) and a few more examples of adding filters, including some where we don't need all elements of the filter definition.\n",
"\n",
"The following values are permitted for the 'filter' and 'filter2' keys:\n",
"\n",
"* \"<\"\n",
"* \">\"\n",
"* \"=\"\n",
"* \"<=\"\n",
"* \">=\"\n",
"* \"LIKE\"\n",
"* \"BETWEEN\"\n",
"* \"NOT LIKE\"\n",
"* \"!=\"\n",
"* \"IS NULL\"\n",
"* \"IS NOT NULL\"\n",
"\n",
"All of these require a single value, except for 'BETWEEN', which requires 2, and 'IS NULL' and 'IS NOT NULL' which take no values.\n",
"\n",
"Having added filters we can check them:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abcb54c7",
"metadata": {},
"outputs": [],
"source": [
"q.showFilters()"
]
},
{
"cell_type": "markdown",
"id": "48bd8755",
"metadata": {},
"source": [
"The outputs as you can see are strings.\n",
"\n",
"We can also remove specific filters by their index. **Please bear in mind indices will change when you remove a filter!**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef19e689",
"metadata": {},
"outputs": [],
"source": [
"q.removeFilter(1)"
]
},
{
"cell_type": "markdown",
"id": "1492d1f7",
"metadata": {},
"source": [
"Because `silent=False` the above function printed out the revised set of filters, if it was `True` you'd have to do this yourself with another `showFilters()` call.\n",
"\n",
"Right, enough messing around, let's do an actual query. I'm going to do both a cone and advanced search together here, just to prove you can (and to limit how many rows we get), but of course, you don't have to."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03209485",
"metadata": {},
"outputs": [],
"source": [
"q=uq.ObsQuery(silent=False)\n",
"q.addConeSearch(name='GK Per', radius=12, units='arcmin')\n",
"q.addFilter( ('xrt_exposure', '>', 3000))\n",
"q.isValid()"
]
},
{
"cell_type": "markdown",
"id": "fb2f47fc",
"metadata": {},
"source": [
"Here, we have created a query, requested a cone search of 12' radius, centred on \"GK Per\", and we have asked to only get those rows where the \"xrt_exposure\" field is above 3000. I also made use of a function, `isValid()`. This just checks that the syntax is OK and we haven't done anything silly. Let's submit the query:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da3e1b6e",
"metadata": {},
"outputs": [],
"source": [
"q.submit()"
]
},
{
"cell_type": "markdown",
"id": "3e14d8b8",
"metadata": {},
"source": [
"Again, because we were not silent we got some information printed, but this is also available in variables. I have already discussed accessing details of the name resolution, but how could I find out that 21 rows were returned without having to read from the screen? Like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38a31ee5",
"metadata": {},
"outputs": [],
"source": [
"q.numRows"
]
},
{
"cell_type": "markdown",
"id": "54d8db54",
"metadata": {},
"source": [
"Or indeed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2220c9f1",
"metadata": {},
"outputs": [],
"source": [
"len(q.results)"
]
},
{
"cell_type": "markdown",
"id": "daf231c5",
"metadata": {},
"source": [
"if you prefer. They should be the same. And let's just prove that our filters worked:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3d2c056",
"metadata": {},
"outputs": [],
"source": [
"q.results"
]
},
{
"cell_type": "markdown",
"id": "a7a0aa3e",
"metadata": {},
"source": [
"\n",
"## Extra query settings\n",
"\n",
"There are one or two further things to discuss: sorting the results, and controlling how many rows we get. Both of these can be done after the query is complete using `pandas` functions, or they can be done at query time. Let's start with sorting.\n",
"\n",
"\n",
"### Sorting results\n",
"\n",
"By default, if you did a cone search the results are ordered by increasing distance from the cone centre. If you didn't do a cone search, they are ordered by however they come out of the database. To control this sorting, we use two variables in our class `sortCol` and `sortDir`. The former is the name of a column in the table (which we can get via `q.metadata`, as above), the latter is either \"ASC\" or \"DESC\", (for ascending or descending).\n",
"\n",
"Let's have a quick demo, and while I'm here I'll show you that you can unlock a query, rather than fully resetting it, if you just want to change something and resubmit:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3b1ca1c5",
"metadata": {},
"outputs": [],
"source": [
"q.unlock()\n",
"q.sortCol='xrt_exposure'\n",
"q.sortDir='DESC'\n",
"q.submit()\n",
"q.results"
]
},
{
"cell_type": "markdown",
"id": "5072b6fd",
"metadata": {},
"source": [
"As you can see, this time, the results are ordered by the \"xrt_exposure\" column, in descending order. Incidentally, you don't have to actually retrieve the column you sort on, if you don't want to!\n",
"\n",
"\n",
"\n",
"### How many rows to get\n",
"\n",
"The default behaviour of this module is to get all rows in the database that match your query. This can be a lot, and maybe you don't want them all. You can limit how many are returned using the `maxRows` variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da623b20",
"metadata": {},
"outputs": [],
"source": [
"q.unlock()\n",
"q.maxRows=3\n",
"q.submit()"
]
},
{
"cell_type": "markdown",
"id": "4a0b3624",
"metadata": {},
"source": [
"This returned us the top three rows matching our query - and note that as we have (just a moment ago) said that our results should be ordered by descending XRT exposure, we should have got the three observations with the longest exposures."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e83d798",
"metadata": {},
"outputs": [],
"source": [
"q.results"
]
},
{
"cell_type": "markdown",
"id": "416bea65",
"metadata": {},
"source": [
"We did!\n",
"\n",
"I said a moment ago that by default *all* matching rows are returned, and this is true. Some of you may be wondering why, therefore, the output above repeatedly says: \"Calling DB look-up for rows 0 -- 1000\" (if you are not wondering this, skip to the next section). Was I lying about getting all rows?\n",
"\n",
"No, I wasn't, and in general what I'm about to say won't matter, but it's here to satisfy your curiosity.\n",
"\n",
"There is a limit to the amount of server resources a single query can consume. In practice this means that if you request some enormous query with lots of rows, the query will be terminated because it uses more memory than is permitted and you will get some unhelpful error (probably an HTTP 500 error). To avoid this, the Python module will never ask for more than 1,000 rows at a time. But this doesn't mean that you only get 1,000 rows; it just means that the Python back end will get your results in chunks, requesting the first 1,000 rows and then (if necessary) the next 1,000 etc., giving (non-silent) output such as:\n",
"\n",
"Calling DB look-up for rows 0 -- 1000\n",
"Calling DB look-up for rows 1001 -- 2000\n",
"Calling DB look-up for rows 2001 -- 3000\n",
"\n",
"The results of these calls will be stitched together for you and this entire process would be completely invisible if we hadn't said `silent=False`.\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "66053b46",
"metadata": {},
"source": [
"\n",
"## Retrieving products from a query\n",
"\n",
"Having identified observations using a query, you may wish to download them. You can do this by calling the `downloadObsData()` function. This is literally just a wrapper the function of the same name in [the `data` module](data.ipynb) and as already warned, I'm not going to redocument that here. Arguments are passed through to `data.downloadObsData()` as `**kwargs`, with just three exceptions. These exceptions apply to almost every data product function provided throughout the `query` module, so I will refer back to this section a few times on subsequent page. These exceptions are:\n",
"\n",
"* You do not specify `silent` and `verbose`; these are properties of your query object, and are set from them.\n",
"* You do not supply the identifier of the object(s) you want the products for; the products are retrieved for the objects in your query's `results` table.\n",
"* There is an optional `subset` argument, which lets you specify a subset of the `results` table for which you want data products.\n",
"\n",
"This last point we will return to [in a moment](#subsets)\n",
"\n",
"First, let's do a simple demonstration of this. I will deliberately execute a query that doesn't get too many rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "383cd333",
"metadata": {},
"outputs": [],
"source": [
"q = uq.ObsQuery(silent=False)\n",
"q.addConeSearch(name='GRB 210205A', radius=300)\n",
"q.submit()\n",
"q.results"
]
},
{
"cell_type": "markdown",
"id": "c4caf252",
"metadata": {},
"source": [
"As you can see, this gave us two rows, and we can save the observations simply enough."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35011678",
"metadata": {},
"outputs": [],
"source": [
"q.downloadObsData(destDir='/tmp/APIDemo_download1',\n",
" instruments=('BAT', 'XRT'),\n",
" getTDRSS=True)"
]
},
{
"cell_type": "markdown",
"id": "1b4125ee",
"metadata": {},
"source": [
"The arguments I supplied to `downloadObsData()` are the standard arguments for `data.downloadObsData()` which are [documented here](data.ipynb#obsid).\n",
"\n",
"One thing that is worth noting is that, since `downloadObsData()` uses the observation identifier to work out which data you are asking for, if your query did not retrieve the relevant column, this function would fail. If you are manually selecting columns and want to ensure you have the necessary column you can always look at the metadata (described above). If the table in question contains observation identifiers then the metadata will include \"IsObsCol\", which will be 1 for the relevant column.\n",
"\n",
"\n",
"### Getting products for only some rows\n",
"\n",
"After performing a query you may realise that you don't want to get data for all of the rows, but only some of them. You could repeat the query, of course, with extra filtering, but this would be a bit wasteful. The `subset` parameter solves this; it exists for all functions for getting products via the `query` module, and allows you to define a subset of rows for which products should be retrieved.\n",
"\n",
"The argument itself takes a `pandas Series` of `bool`s, identifying the rows, which sounds complicated, but actually you just need to give it a `pandas` filter expression. It is beyond the scope of this documentation to describe those in detail, but the examples below should give you a primer.\n",
"\n",
"First, I'm going to do a cone search around GK Per, ordering by exposure (longest first)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02ebef31",
"metadata": {},
"outputs": [],
"source": [
"q = uq.ObsQuery(silent=False)\n",
"q.addConeSearch(name='GK Per', radius=300)\n",
"q.sortCol = 'xrt_exposure'\n",
"q.sortDir='DESC'\n",
"q.submit()\n",
"q.results"
]
},
{
"cell_type": "markdown",
"id": "5a3ce927",
"metadata": {},
"source": [
"Now, looking at this, I want to get only those rows that have more than 6ks of XRT data. That is, those rows where `q.results['xrt_exposure']>6000`, and that is my subset definition:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6584c571",
"metadata": {},
"outputs": [],
"source": [
"q.downloadObsData(destDir='/tmp/APIDemo_download2',\n",
" subset=q.results['xrt_exposure']>6000,\n",
" instruments=('XRT',),\n",
" getAuxil=False)"
]
},
{
"cell_type": "markdown",
"id": "9996b31d",
"metadata": {},
"source": [
"If you want to check your filter expression before actually using it, you can use the `.loc` property of a `DataFrame` like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7499e594",
"metadata": {},
"outputs": [],
"source": [
"q.results.loc[q.results['xrt_exposure']>6000]"
]
},
{
"cell_type": "markdown",
"id": "b9c6e4d1",
"metadata": {},
"source": [
"And you can see that the `DataFrame` we got back contains only the rows we wanted.\n",
"\n",
"I'll give a few more examples of creating subsets, and rather then fill up your `/tmp` area by downloading, I'll define them as variables and print them using `.loc`, but you can just put my examples straight into the `subset` argument of `downloadObsData`. I'll include such code but commented out, below.\n",
"\n",
"First, let's show you how to apply multiple filters because this *always* take me 3 attempts to get right. Let's say we only want results with more than 6ks of XRT exposure, but less than 7ks."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8bd4b461",
"metadata": {},
"outputs": [],
"source": [
"subset = (q.results['xrt_exposure']>6000)&(q.results['xrt_exposure']<7000)\n",
"q.results.loc[subset]\n",
"# Uncomment the following if you want:\n",
"# q.downloadObsData(destDir='/tmp/APIDemo_download3',\n",
"# subset=(q.results['xrt_exposure']>6000)&(q.results['xrt_exposure']<7000),\n",
"# instruments=('XRT',),\n",
"# getAuxil=False)"
]
},
{
"cell_type": "markdown",
"id": "45e3e632",
"metadata": {},
"source": [
"And lastly, the `isin` function which is really handy as well. This lets us make a subset by giving some values we want a column to contain.So, for example, imagine I only wanted to download the data in the above for those cases with a \"target_id\" of 81445, 45767 or 81637. I could do that as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "643ab1f6",
"metadata": {},
"outputs": [],
"source": [
"myTargs = (81445, 45767, 81637)\n",
"subset=q.results['target_id'].isin(myTargs)\n",
"q.results.loc[q.results['target_id'].isin(myTargs)]\n",
"# q.downloadObsData(destDir='/tmp/APIDemo_download4',\n",
"# subset=q.results['target_id'].isin(myTargs),\n",
"# instruments=('XRT',),\n",
"# getAuxil=False)"
]
},
{
"cell_type": "markdown",
"id": "56d26065",
"metadata": {},
"source": [
"Astute readers will have realised that this effectively takes place of the `IN` operator in SQL, which is not supported by the `addFilter` command (largely for security reasons)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.1 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
},
"vscode": {
"interpreter": {
"hash": "8d1c31afbb796ffe8c3fd5080738519a9b666e3617906fb7228ad5805a80d686"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}