Unverified Commit cd8f3e67 authored by Jesse Mapel's avatar Jesse Mapel Committed by GitHub

Added setup notebook and re-checked other notebooks (#540)

* Re-ordered nbs and added setup

* Moved setup to nb 0

* Fixed network gen number

* Re-ran and updated notebooks
parent 271e5969
+91 −0
%% Cell type:markdown id: tags:

## Cells

%% Cell type:markdown id: tags:

Cells are isolated blocks of code that can be run individually. Although the code is sequestered within a cell, the variables created in a cell can be accessed elsewhere. Manipulating cells in a Jupyter notebook is easy. This is a cell; it is where you will be writing your code. There are two types of cells in this document: markdown cells and code cells. The type of a cell can be changed at the top of the notebook in the dropdown menu that shows either Markdown or Code.

### Exercise:  Change the type of this cell from markdown to code.

Code cells will be used for all exercises in this tutorial. All cell manipulation tools are found at the top of the notebook. To add a cell, press the plus symbol. To move a cell, use the arrow buttons. To delete a cell, click to the left of the cell and type dd. To run a cell's contents, press Shift+Enter.

### Exercise: Move this cell up one and then down one.  Add an extra cell and then delete it.

%% Cell type:markdown id: tags:

## Killing Notebooks

%% Cell type:markdown id: tags:

Notebooks are hosted on the Nebula cluster. Unless specifically shut down, the job running the Jupyter notebook will continue running even if JupyterHub fails, so it is important to cancel your Jupyter jobs after you are finished. Select the box next to your notebook in the main JupyterHub tab and click the yellow Shutdown button at the top of the page.

%% Cell type:markdown id: tags:

## Kernels

%% Cell type:markdown id: tags:

Jupyter kernels are conda environments that can be accessed within a Jupyter notebook:
- Click on the 'Kernel' tab in the top menu
- Go to 'Change Kernel' and look through the options
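
You can also check from inside a notebook which interpreter, and therefore which environment, the current kernel is using. A minimal sketch:

``` python
import sys

# Path of the Python interpreter the kernel is running;
# for a conda environment it typically ends in .../envs/<name>/bin/python.
print(sys.executable)
```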

%% Cell type:markdown id: tags:

## Anaconda Environments

%% Cell type:markdown id: tags:

Anaconda environments are collections of Python packages that are installed into isolated environments and can be selectively accessed by activating an environment. They are particularly helpful because different versions of a program, or different combinations of programs, in one environment do not affect those in another environment.

For example, ASC internally creates Anaconda environments for each new release and release candidate of the ISIS software. If users want to access a particular version of ISIS, they type `conda activate isisx.y.z`; if they later want a different version, they can run `conda deactivate && conda activate isisu.v.w` without worrying about cross contamination between the two environments.

%% Cell type:markdown id: tags:

## Setting up the workshop kernel

%% Cell type:markdown id: tags:

The default Python 3 kernel that we are using with this notebook does not have all of the software that we will need. So, we need to set up a kernel that points to an Anaconda environment with everything we require installed.

Run the following cell to write the kernel specification into your home directory. Once it is there, Jupyterhub will automatically pick it up as an option next time you start a notebook.

%% Cell type:code id: tags:

``` python
from pathlib import Path

kernel_text = """{
 "argv": [
  "/usgs/cpkgs/anaconda3_linux/envs/autocnet/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "AutoCNet PLIO (workshop)",
 "language": "python"
}
"""

kernel_dir = Path.home() / '.local/share/jupyter/kernels/autocnet_workshop/'
kernel_dir.mkdir(parents=True, exist_ok=True)
kernel_file = kernel_dir / 'kernel.json'

kernel_file.write_text(kernel_text)  # write_text opens, writes, and closes the file in one call
```

%% Cell type:markdown id: tags:

You can also run shell commands in a Jupyter notebook by starting the line with "!". Run the following cell to check that your kernel was installed properly. You should see the same text as the kernel_text variable in the cell above.

%% Cell type:code id: tags:

``` python
!cat ~/.local/share/jupyter/kernels/autocnet_workshop/kernel.json
```

%% Cell type:code id: tags:

``` python
```
+0 −42
%% Cell type:markdown id: tags:

# Overview

%% Cell type:markdown id: tags:

Autocnet uses numpy, pandas, and matplotlib to manage data and display it in a user-friendly way. NumPy is a powerful numerical computing library, great for large datasets and complex calculations. Pandas organizes data into a dataframe so it can be viewed and manipulated as needed. Matplotlib displays the data using various plots. This tutorial covers basic operations in Jupyter notebook using numpy, pandas, and matplotlib. You will use these packages extensively when working with Autocnet.

%% Cell type:markdown id: tags:

### Cells

%% Cell type:markdown id: tags:

Cells are isolated blocks of code that can be run individually. Although the code is sequestered within a cell, the variables created in a cell can be accessed elsewhere. Manipulating cells in a Jupyter notebook is easy. This is a cell; it is where you will be writing your code. There are two types of cells in this document: markdown cells and code cells. The type of a cell can be changed at the top of the notebook in the dropdown menu that shows either Markdown or Code.

#### Exercise:  Change the type of this cell from markdown to code.

Code cells will be used for all exercises in this tutorial. All cell manipulation tools are found at the top of the notebook. To add a cell, press the plus symbol. To move a cell, use the arrow buttons. To delete a cell, click to the left of the cell and type dd. To run a cell's contents, press Shift+Enter.

#### Exercise: Move this cell up one and then down one.  Add an extra cell and then delete it.

%% Cell type:markdown id: tags:

### Killing Notebooks

%% Cell type:markdown id: tags:

Notebooks are hosted on the Nebula cluster. Unless specifically shut down, the job running the Jupyter notebook will continue running even if JupyterHub fails, so it is important to cancel your Jupyter jobs after you are finished. Select the box next to your notebook in the main JupyterHub tab and click the yellow Shutdown button at the top of the page.

%% Cell type:markdown id: tags:

### Kernels

%% Cell type:markdown id: tags:

Jupyter kernels are conda environments that can be accessed within a Jupyter notebook:
- Click on the 'Kernel' tab in the top menu
- Go to 'Change Kernel' and look through the options

%% Cell type:markdown id: tags:

### Anaconda Environments

%% Cell type:markdown id: tags:

Anaconda environments are collections of Python packages that are installed into isolated environments and can be selectively accessed by activating an environment. They are particularly helpful because different versions of a program, or different combinations of programs, in one environment do not affect those in another environment.

For example, ASC internally creates Anaconda environments for each new release and release candidate of the ISIS software. If users want to access a particular version of ISIS, they type `conda activate isisx.y.z`; if they later want a different version, they can run `conda deactivate && conda activate isisu.v.w` without worrying about cross contamination of environment variables.

%% Cell type:markdown id: tags:

## Numpy

%% Cell type:markdown id: tags:

What is NumPy? NumPy is the core library for scientific computing in Python. It provides a multidimensional array object and tools to work with that array. A numpy array is a grid of values, all of the same type, indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension. The real power of numpy is massive vectorization at the processor level: the performance difference between plain Python and numpy processing is huge.
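
As a rough illustration of that difference, the sketch below times the same sum of squares as a pure-Python loop and as a vectorized NumPy expression. Exact timings vary by machine, but the NumPy version is typically much faster:

``` python
import time
import numpy as np

values = list(range(1_000_000))
arr = np.arange(1_000_000, dtype=np.int64)

start = time.perf_counter()
python_total = sum(v * v for v in values)    # pure-Python loop
python_time = time.perf_counter() - start

start = time.perf_counter()
numpy_total = int((arr * arr).sum())         # vectorized NumPy version
numpy_time = time.perf_counter() - start

print(f'Python: {python_time:.4f}s, NumPy: {numpy_time:.4f}s')
```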

%% Cell type:code id: tags:

``` python
import numpy as np        # First thing to do is to add numpy to the notebook you are using. In order to access python modules (or functions from a python module) they must be explicitly loaded into the notebook.

a = np.array([1, 2, 3])   # Create a rank 1 array, vector array

print(type(a))            # print() will print a value to a stream. This prints the type of array.

print(a.shape)            # prints the shape of the array

print(a[0], a[1], a[2])   # prints the values of the array

print(a.dtype)            # prints the data type (this is a 64-bit integer)
```

%% Cell type:markdown id: tags:

Use the `?` symbol to query Python about an object, for example `?np.array`. A window should pop up that describes what the object does.
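
Outside of IPython and Jupyter, the same documentation is available through plain Python. A small sketch of the equivalent:

``` python
import numpy as np

# help() prints the same documentation that `?np.array` pops up in a notebook.
help(np.array)

# The raw docstring is also available directly.
print(np.array.__doc__[:60])
```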

%% Cell type:code id: tags:

``` python
#### Exercise: Use the ? symbol to find more information about the following objects: print, type, np.dot, np.dtype).
```

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

NumPy arrays can be manipulated in many ways. The following cells show a few ways to perform different math operations on a numpy array.

%% Cell type:code id: tags:

``` python
a[0] = 5                  # Change an element of the array. Python uses zero-based indexing, so the first element is at index 0.

print(a)
```

%% Cell type:markdown id: tags:

#### Exercise: In the next cell, change the second element in the "a" array to 10. Print your results to check.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
a[1]+15                   # Add 15 to one element of the array (returns the result; the array itself is unchanged).
```

%% Cell type:markdown id: tags:

#### Exercise: Add 5 to the third element in the "a" array.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
a+1                       # Add one to the entire array. This is called broadcasting: the operation is applied to each element of the array.
```
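
Broadcasting also works between arrays of different shapes, as long as the shapes are compatible. A small sketch:

``` python
import numpy as np

row = np.array([1, 2, 3])        # shape (3,)
col = np.array([[10], [20]])     # shape (2, 1)

# The (3,) and (2, 1) shapes broadcast together to shape (2, 3).
print(row + col)
```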

%% Cell type:code id: tags:

``` python
a*2                       # Multiply the array by 2
```

%% Cell type:code id: tags:

``` python
np.sin(a)        # Sine of the array. This is a vectorized function: it is applied to each element of the array.
```

%% Cell type:code id: tags:

``` python
np.std(a)         #Standard deviation of the array.
```

%% Cell type:markdown id: tags:

#### Exercise: Take the standard deviation of the cosine of the "a" array.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
print(np.append([a], [15, 1, 6])) # Appends elements to an array (flattened when no axis is given). An axis can also be specified.
print(np.append([a], [[15, 1, 6]], axis=0))
```

%% Cell type:code id: tags:

``` python
b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array, matrix array
print(b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"
```

%% Cell type:markdown id: tags:

#### Exercise: Print the second element of the first row of the "b" array.

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

Numpy arrays can be combined with other arrays. This section goes over working with multiple arrays.

%% Cell type:code id: tags:

``` python
np.add(a,b)            # Adds two arrays together
```

%% Cell type:code id: tags:

``` python
print(a)
print(b)
a + b
```

%% Cell type:code id: tags:

``` python
c = np.zeros((2,2))   # Create an array of all zeros
print(c)
```

%% Cell type:code id: tags:

``` python
d = np.ones(3)    # Create an array of all ones
print(d)
```

%% Cell type:code id: tags:

``` python
e = np.full((2,2), 7)  # Create a constant array
print(e)
```

%% Cell type:code id: tags:

``` python
f = np.eye(2)         # Create a 2x2 identity matrix
print(f)
```

%% Cell type:code id: tags:

``` python
e = np.random.random((2,2))  # Create an array filled with random values, uses: adding random error to things, pick random elements, pull a random sample from data
print(e)
```

%% Cell type:code id: tags:

``` python
s=a[0]                  # Array contents can be accessed by indexing and slicing.
print(s)                # This assigns the first element of the 'a' array to s; 'a' itself is unchanged.
print(a)
```

%% Cell type:code id: tags:

``` python
print(b[0:1])   # You can also slice items between indexes
```
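
One subtlety worth knowing: a slice of a NumPy array is a view into the original data, not a copy, so modifying the slice modifies the source array. A minimal sketch:

``` python
import numpy as np

arr = np.arange(6)
view = arr[1:4]    # a view onto elements 1 through 3 of arr
view[0] = 99       # modifying the view also modifies arr

print(arr)         # the 99 shows up in the original array
```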

%% Cell type:markdown id: tags:

#### Exercise: Multiply the "a" and "d" arrays together. Hint: ?np.dot.

%% Cell type:code id: tags:

``` python

```

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

## Pandas

%% Cell type:markdown id: tags:

What is Pandas? Pandas is an open source data analysis and manipulation tool built on top of the Python programming language. Under the hood, pandas uses numpy to perform most of its functions. The most common type of pandas dataframe is a 2-dimensional table with rows and columns, very similar to an Excel spreadsheet or SQL table.

%% Cell type:code id: tags:

``` python
import pandas as pd  # Pandas package import
import requests      # Http library for Python. Communicates between browser and web server that is storing data.
```

%% Cell type:code id: tags:

``` python
url = 'https://api.covid19api.com/summary' # Practice data location.
```

%% Cell type:code id: tags:

``` python
r = requests.get(url, verify = False) #A warning will pop up because of the DOI firewall.  This will ingest the data to look at.
```

%% Cell type:markdown id: tags:

We use json to extract and structure the data initially but this is just to get it ready for pandas.

%% Cell type:code id: tags:

``` python
json = r.json() # Extracts json structured data from the request.
```

%% Cell type:code id: tags:

``` python
json # Evaluating the variable displays the data. It is similar to pvl.
```

%% Cell type:code id: tags:

``` python
json.keys() # Contains the keys of the dictionary as a list. We can use these keys to explore the json similar to selecting a column.
```

%% Cell type:code id: tags:

``` python
json['Countries']          # Selects the per-country data.
```

%% Cell type:code id: tags:

``` python
# Looking at the type of keys can tell us what has interesting data or not.

type(json['Global'])
```

%% Cell type:markdown id: tags:

#### Exercise: Look at the types of the Countries, Message, Date, and ID keys.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
# Convert the JSON to a DataFrame. (Unwanted columns, such as Slug, could be removed by passing a columns list to pd.DataFrame.)
df = pd.DataFrame(json['Countries'])
df.set_index('Country', inplace=True)
df
```

%% Cell type:code id: tags:

``` python
df.shape            # Returns the shape of the dataframe, 190 rows and 11 columns.
```

%% Cell type:code id: tags:

``` python
df.columns          # Returns the names of the columns in the dataframe
```

%% Cell type:code id: tags:

``` python
# Preview of the Data
df.head()
```

%% Cell type:markdown id: tags:

#### Exercise: View the tail of the dataframe.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
pd.set_option('display.max_rows',None)              # By default, pandas does not show the entire dataframe. Change these two options to show it all.
pd.set_option('display.max_columns',None)
df
```

%% Cell type:code id: tags:

``` python
df.loc['Colombia']         # .loc lets you look up rows and columns by label.
```
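
`.loc` selects by index label, while its sibling `.iloc` selects by integer position. A small sketch with a toy dataframe (the values here are made up for illustration):

``` python
import pandas as pd

toy = pd.DataFrame({'TotalDeaths': [10, 20, 30]},
                   index=['Andorra', 'Belize', 'Chad'])

print(toy.loc['Belize', 'TotalDeaths'])   # look up by label
print(toy.iloc[1]['TotalDeaths'])         # look up by integer position; same row
```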

%% Cell type:markdown id: tags:

#### Exercise: Look up the United States, China, Brazil, and France.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
df.loc[df['TotalDeaths'] > 100000]   # You can combine .loc with a boolean condition to filter the dataframe.
```

%% Cell type:markdown id: tags:

#### Exercise: Look up the countries with TotalConfirmed > 10000000.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
df['NewRecovered']             # This will show all the data for the NewRecovered column.
```

%% Cell type:code id: tags:

``` python
df[["TotalConfirmed","TotalDeaths"]].describe()  # How to compute descriptive statistics .describe()
```

%% Cell type:markdown id: tags:

#### Exercise: Look up descriptive statistics for TotalConfirmed and NewConfirmed.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
df['TotalConfirmed'].quantile(.95) # Quantile function. Allows you to compute percentiles on a series.
```

%% Cell type:markdown id: tags:

#### Exercise: Look up the 25th, 50th, and 75th quantile for TotalDeaths using one function.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
df['NewDeaths']+df['TotalDeaths']    # You can perform simple math functions on the different columns.
```

%% Cell type:markdown id: tags:

#### Exercise: Calculate the death rate by country.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
df.sort_values(by='TotalDeaths', ascending=False)     # The .sort_values function sorts the dataframe by the values in a column.
```

%% Cell type:markdown id: tags:

#### Exercise:  Sort the dataframe with fewest TotalConfirmed on top.

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

## Matplotlib

%% Cell type:markdown id: tags:

What is Matplotlib? Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. In this section, we are going to take the dataframe and plot the data using boxplots, histograms, scatterplots, and lineplots.

%% Cell type:code id: tags:

``` python
import matplotlib.pyplot as plt
```

%% Cell type:markdown id: tags:

### Titles and axes properties
![image.png](attachment:image.png)

%% Cell type:code id: tags:

``` python
# Box plots

# Creating Data
np.random.seed(10)
data = np.random.normal(100, 20, 200)

fig = plt.figure(figsize =(15, 9))

# Plot
plt.boxplot(data)

plt.show()
```

%% Cell type:code id: tags:

``` python
df.boxplot(column=['TotalRecovered','TotalDeaths'])               # You can plot multiple columns on one plot.
plt.yscale("log")                       # You can change the x or y axis to a log scale.
```

%% Cell type:markdown id: tags:

#### Exercise: Make a boxplot of TotalConfirmed and TotalDeaths cases. Plot using a log scale on the y axis.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
# Histograms

# Data
x = [21,22,23,4,5,6,77,8,9,10,31,32,33,34,35,36,37,18,49,50,90,100]
num_bins = 10

# Plot
n, bins, patches = plt.hist(x, num_bins, facecolor='green', alpha=0.5)
plt.show()
```

%% Cell type:code id: tags:

``` python
df.hist(grid=False, color='red')       # You can use matplotlib to plot all the columns in a dataframe at once.
```

%% Cell type:markdown id: tags:

#### Exercise: Create a green histogram of TotalRecovered with 50% opacity.

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
# Scatterplots

# Data
N = 1000
x = np.random.rand(N)
y = np.random.rand(N)
area = np.pi*3

# Plot
plt.scatter(x, y, s=area, facecolor='blue', alpha=0.5)

plt.title('Scatter plot')

plt.xlabel('X')

plt.ylabel('Y')

plt.show()
```

%% Cell type:code id: tags:

``` python
# Choosing what color to use is important when making a plot.  https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html has a wide array of colormaps to choose from.
df.plot.scatter(x='NewConfirmed', y='NewRecovered', colormap='viridis', title='Scatter plot')
```

%% Cell type:markdown id: tags:

#### Exercise: Create a scatterplot of TotalDeaths vs. TotalConfirmed with a log scale on the TotalConfirmed axis, a title, and a colorscale. Is there a trend in the data?

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
# line plots

# Data
t = np.arange(0.0, 4.0, 0.001)
s = 1 + np.sin(2 * np.pi * t)

# Plot
fig, ax = plt.subplots()
ax.plot(t, s)

ax.set(xlabel='Wavelength', ylabel='Waveheight',
       title='Line Plot')
ax.grid()

fig.savefig("test.png")
plt.show()
```

%% Cell type:markdown id: tags:

## Additional resources

%% Cell type:code id: tags:

``` python
# A few other things you can modify in Matplotlib

x = np.linspace(0, 2, 100)

fig, ax = plt.subplots()  # Create a figure and an axes.

ax.plot(x, x, label='linear')  # Plot some data on the axes.

ax.plot(x, x**2, label='quadratic')  # Plot more data on the axes.
ax.plot(x, x**3, label='cubic')

ax.set_xlabel('x label')  # Add an x-label to the axes.

ax.set_ylabel('y label')  # Add a y-label to the axes.

ax.set_title("Simple Plot")  # Add a title to the axes.

ax.legend()  # Add a legend.
```
+3 −3
%% Cell type:markdown id: tags:

# Using PLIO to analyze control networks
PLIO is a general purpose library for reading data from various sources. In this workshop, we will be using PLIO's ability to read ISIS control networks into a Pandas dataframe.

%% Cell type:code id: tags:

``` python
# PLIO uses pysis for some other things. We don't technically need this but it avoids a warning.
import os
os.environ['ISISROOT'] = '/usgs/cpkgs/anaconda3_linux/envs/isis4.3.0'
os.environ['ISISDATA'] = '/usgs/cpkgs/isis3/isis_data'

# 3D plotting toolkit for matplotlib
from mpl_toolkits.mplot3d import Axes3D

# Numerical Python library
import numpy as np
```

%% Cell type:markdown id: tags:

# Our networks
All of this data was generously provided by Lynn Weller and Mike Bland from their Europa control project.

The first network is a very rough starting point. The Galileo images of Europa were put through the [findfeatures](https://isis.astrogeology.usgs.gov/Application/presentation/Tabbed/findfeatures/findfeatures.html) application and then all of the resulting networks were merged together. This network has many known issues including islands, massive residuals, and poor coverage.

The second network is the final network containing Galileo and Voyager images of Europa. The issues from the initial network have been resolved and the final point cloud covers the majority of the body.

%% Cell type:code id: tags:

``` python
galileo_net = '/scratch/jmapel/europa/networks/GLL_FFCombined_thin_SubReg2_Del_2.net'
final_net = '/scratch/jmapel/europa/networks/GalileoVoyager_Europa_Merged_2020_CilixFree.net'
```

%% Cell type:markdown id: tags:

# The control network dataframe

PLIO directly ingests the data from the control network file. Each row in the dataframe is a single control measure and each column is a field from the protobuf control network. The data for control points is stored implicitly in their measures.

%% Cell type:code id: tags:

``` python
# This function is what reads a control network file
from plio.io.io_controlnetwork import from_isis

galileo_df = from_isis(galileo_net)
galileo_df.describe()
```

%% Cell type:markdown id: tags:

### Exercise: How many measures are there in the network? How many points are there in the network? How many images are there in the network?

tip: use len(dataframe) to find the number of rows in a dataframe

tip: use dataframe["columnName"].nunique() to find the number of unique values in a column

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

## Data types
The different columns of our dataframe store different types of data. The cell below shows all of the data types in the dataframe. You can see all of the different possible datatypes for a dataframe in the [pandas docs](https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#basics-dtypes).

%% Cell type:code id: tags:

``` python
galileo_df.dtypes
```

%% Cell type:markdown id: tags:

Most of the data types are straightforward. For example, the line and sample are 64-bit floats. Let's dig into the more unusual types.

%% Cell type:markdown id: tags:

**pointType, measureType, aprioriSurfPointSource, and aprioriRadiusSource** are 64-bit integers, but those integers correspond to enumerations. For example, a pointType of 2 means Free. See the tables below for all of the enumerations.

%% Cell type:code id: tags:

``` python
galileo_df[['pointType', 'measureType', 'aprioriSurfPointSource']].head()
```

%% Cell type:markdown id: tags:

<center>**pointType**</center>

| Value | Name              |
| ----: | :---------------- |
| 0     | Tie (obsolete)    |
| 1     | Ground (obsolete) |
| 2     | Free              |
| 3     | Constrained       |
| 4     | Fixed             |

%% Cell type:markdown id: tags:

<center>**measureType**</center>

| Value | Name               |
| ----: | :----------------- |
| 0     | Candidate          |
| 1     | Manual             |
| 2     | RegisteredPixel    |
| 3     | RegisteredSubPixel |

%% Cell type:markdown id: tags:

<center>**aprioriSurfPointSource & aprioriRadiusSource**</center>

| Value | Name              |
| ----: | :---------------- |
| 0     | None              |
| 1     | User              |
| 2     | AverageOfMeasures |
| 3     | Reference         |
| 4     | Ellipsoid         |
| 5     | DEM               |
| 6     | Basemap           |
| 7     | BundleSolution    |
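
The integer codes can be turned into readable names with `Series.map`. A minimal sketch using a hypothetical dataframe whose values follow the pointType table above:

``` python
import pandas as pd

POINT_TYPE_NAMES = {0: 'Tie', 1: 'Ground', 2: 'Free', 3: 'Constrained', 4: 'Fixed'}

# Hypothetical sample data; a real network's 'pointType' column works the same way.
points = pd.DataFrame({'pointType': [2, 2, 3]})
points['pointTypeName'] = points['pointType'].map(POINT_TYPE_NAMES)
print(points['pointTypeName'].tolist())
```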

%% Cell type:markdown id: tags:

### Exercise: Have any measures in this network been sub-pixel registered?

tip: look at the measure types

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

**id, pointChoosername, pointDatetime, aprioriSurfPointSourceFile, aprioriRadiusSourceFile, serialnumber, measureChoosername, and measureDatetime** are all listed as objects but are simply strings.

%% Cell type:code id: tags:

``` python
galileo_df[['id', 'serialnumber', 'pointChoosername', 'pointDatetime', 'measureChoosername', 'measureDatetime']].head()
```

%% Cell type:markdown id: tags:

**adjustedCovar, pointLog, and measureLog** are more complicated. We will go over adjustedCovar later with the final Europa network. pointLog is left over from older network formats and can be ignored. measureLog contains information about the registration of the measure.

%% Cell type:code id: tags:

``` python
galileo_df.loc[1,'measureLog']
```

%% Cell type:markdown id: tags:

## Data availability
Depending on how your network was generated and what processing has been done, many fields will not be set. If a numerical field has a value of 0, then it has not been set. For example, our network has not been bundle adjusted, so there are only a priori ground points.

%% Cell type:code id: tags:

``` python
galileo_df[['aprioriX', 'aprioriY', 'aprioriZ', 'adjustedX', 'adjustedY', 'adjustedZ']].describe()
```

%% Cell type:markdown id: tags:

### Exercise: Can you find all of the fields that are completely unset in our control network?

tip: numerical fields default to 0, strings default to an empty string "", and boolean values default to False.

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

You can also check which columns are default programmatically. The following cell checks if all of the values in a column are a default value.

%% Cell type:code id: tags:

``` python
(galileo_df==0).all() | (galileo_df=="").all() | (galileo_df==False).all()
```

%% Cell type:markdown id: tags:

# Looking at a bundle adjusted control network

Our Galileo network is interesting but networks have significantly more useful information in them after bundle adjustment. So, let's take a look at the final Europa network.

%% Cell type:code id: tags:

``` python
final_net_df = from_isis(final_net)
final_net_df.describe()
```

%% Cell type:markdown id: tags:

### Exercise: What fields are set in the bundle adjusted network that weren't previously?

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

## Analyzing the measures
The data in a control network dataframe is not always in the format we want to work with. The measure residuals are broken down into the line and sample residuals. The following cell computes the full magnitude of the residuals and adds it to the dataframe under the "residualMag" column.

%% Cell type:code id: tags:

``` python
final_net_df['residualMag'] = np.sqrt(final_net_df['sampleResidual']**2 + final_net_df['lineResidual']**2)
```

%% Cell type:markdown id: tags:

Now let's plot the residuals and see if we can form any theories. The next cell imports matplotlib for plotting tools and then plots the residuals in terms of sample and line residual. Note that the color of points is based on the residual magnitude, which should give a nice bullseye effect.

%% Cell type:code id: tags:

``` python
# This allows us to interact with our plots. This must be set before importing pyplot
%matplotlib notebook

# General plotting library
import matplotlib
import matplotlib.pyplot as plt

resid_fig = plt.figure(figsize=(6, 6))
resid_ax = resid_fig.add_subplot(111)
resid_scatter = resid_ax.scatter(final_net_df['sampleResidual'], final_net_df['lineResidual'], c=final_net_df['residualMag'], marker='+')
resid_ax.set_aspect('equal')
plt.axhline(0, color='black')
plt.axvline(0, color='black')
resid_cbar = plt.colorbar(resid_scatter)
resid_fig.suptitle('Bundle Adjusted Measure Residuals')
resid_ax.set_xlabel('Sample Residual')
resid_ax.set_ylabel('Line Residual')
resid_cbar.set_label('Residual Magnitude')
plt.show()
```

%% Cell type:markdown id: tags:

We can also color our points based on other properties. Let's try and separate the measures out by mission. The serial numbers should help us so let's look at the serial numbers for all of our images.

%% Cell type:code id: tags:

``` python
final_net_df['serialnumber'].unique()
```

%% Cell type:markdown id: tags:

Each serial number starts with the mission name, which makes separating them out easy. All we need to do is check if the beginning of the serial number matches our mission.

The pandas `Series.str` accessor allows us to do this type of string comparison quickly and easily. Here we will use the `Series.str.startswith` method.

%% Cell type:code id: tags:

``` python
final_galileo_df = final_net_df[final_net_df['serialnumber'].str.startswith('Galileo')]
final_voyager1_df = final_net_df[final_net_df['serialnumber'].str.startswith('Voyager1')]
final_voyager2_df = final_net_df[final_net_df['serialnumber'].str.startswith('Voyager2')]
```

%% Cell type:markdown id: tags:

Now let's plot the measures and color them based on their mission.

%% Cell type:code id: tags:

``` python
inst_resid_fig = plt.figure(figsize=(6, 6))
inst_resid_ax = inst_resid_fig.add_subplot(111)
inst_resid_ax.scatter(final_galileo_df['sampleResidual'], final_galileo_df['lineResidual'], color='Green', marker='+', alpha=0.25, label='Galileo')
inst_resid_ax.scatter(final_voyager1_df['sampleResidual'], final_voyager1_df['lineResidual'], color='Red', marker='+', alpha=0.25, label='Voyager1')
inst_resid_ax.scatter(final_voyager2_df['sampleResidual'], final_voyager2_df['lineResidual'], color='Blue', marker='+', alpha=0.25, label='Voyager2')
inst_resid_ax.set_aspect('equal')
plt.axhline(0, color='black')
plt.axvline(0, color='black')
plt.legend()
inst_resid_fig.suptitle('Bundle Adjusted Measure Residuals by Mission')
inst_resid_ax.set_xlabel('Sample Residual')
inst_resid_ax.set_ylabel('Line Residual')
plt.show()
```

%% Cell type:markdown id: tags:

### What can you say about the residuals for the different missions based on our plot?

%% Cell type:markdown id: tags:

### Exercise: What are the descriptive statistics for the residual magnitude of the Galileo measures? What about for Voyager 1 and Voyager 2?

%% Cell type:code id: tags:

``` python
final_galileo_df['residualMag'].describe()
```

%% Cell type:code id: tags:

``` python
final_voyager1_df['residualMag'].describe()
```

%% Cell type:code id: tags:

``` python
final_voyager2_df['residualMag'].describe()
```

%% Cell type:markdown id: tags:

### Do you notice anything interesting about the residual magnitudes for the different instruments? How does this compare to what you noticed with the scatter plot?

%% Cell type:markdown id: tags:

We can even test if the measure residuals are normally distributed. The following cell runs scipy's normality test, which combines skew and kurtosis into a chi-squared statistic, to see if the residual magnitudes could reasonably come from a normal distribution. This is important because it can tell us if we have large blunders in our network or systematic error from something like a bad sensor model.

%% Cell type:code id: tags:

``` python
# Statistics library
from scipy import stats

alpha = 1e-3  # 99.9% confidence level

# normaltest returns the test statistic and the p-value;
# a p-value below alpha rejects the hypothesis of normality
_, p_value = stats.normaltest(final_voyager1_df['residualMag'])
print(f'Normality test p-value: {p_value}')
if (p_value < alpha):
    print("The residuals are likely not normally distributed")
else:
    print("The residuals could reasonably come from a normal distribution")
```
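
%% Cell type:markdown id: tags:

To get a feel for how the test behaves, here is a small self-contained sketch using made-up data: one sample drawn from a normal distribution and one from a strongly skewed distribution.

%% Cell type:code id: tags:

``` python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data: a normal sample and a strongly skewed (exponential) sample
normal_sample = rng.normal(0.0, 0.3, size=2000)
skewed_sample = rng.exponential(0.3, size=2000)

_, p_normal = stats.normaltest(normal_sample)
_, p_skewed = stats.normaltest(skewed_sample)

# The skewed sample yields a tiny p-value, so we reject normality for it
print(p_normal, p_skewed)
```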

%% Cell type:markdown id: tags:

## Analyzing the points
The information for control points is duplicated for each measure they have. So, the first step in looking at control point data is to extract only the data we want from the dataframe. This will make the dataframe easier to read and it will make things run more quickly.

To do this, we're going to first extract all of the columns with point data. Then, we're going to extract the first measure from each point. After all is said and done, we will have a dataframe with columns related to the point info and only one row for each point.

%% Cell type:code id: tags:

``` python
point_columns = ['id',
                 'pointType',
                 'pointChoosername',
                 'pointDatetime',
                 'pointEditLock',
                 'pointIgnore',
                 'pointJigsawRejected',
                 'aprioriSurfPointSource',
                 'aprioriSurfPointSourceFile',
                 'aprioriRadiusSource',
                 'aprioriRadiusSourceFile',
                 'latitudeConstrained',
                 'longitudeConstrained',
                 'radiusConstrained',
                 'aprioriX',
                 'aprioriY',
                 'aprioriZ',
                 'aprioriCovar',
                 'adjustedX',
                 'adjustedY',
                 'adjustedZ',
                 'adjustedCovar',
                 'pointLog']
final_points_df = final_net_df[point_columns].drop_duplicates('id')
final_points_df.describe()
```
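
%% Cell type:markdown id: tags:

The drop_duplicates step above does the heavy lifting. As a minimal sketch with made-up rows, this is how it collapses repeated point info down to one row per id:

%% Cell type:code id: tags:

``` python
import pandas as pd

# Hypothetical network rows: point info repeated once per measure
net = pd.DataFrame({
    'id':           ['pt1', 'pt1', 'pt2'],
    'pointType':    ['Free', 'Free', 'Constrained'],
    'serialnumber': ['Galileo/1', 'Voyager1/2', 'Voyager2/3'],
})

# Selecting the point columns first, then dropping duplicate ids,
# leaves exactly one row per control point
points = net[['id', 'pointType']].drop_duplicates('id')
print(points)
```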

%% Cell type:markdown id: tags:

Next, we're going to transform the point data so that it's more useful to us. This cell will take the (X, Y, Z) adjusted ground points and convert them to (lat, lon, radius) using a library called pyproj. pyproj is a very powerful projection library that can do many cartographic transformations and projections.

**Note: This cell will generate a warning because we are using old pyproj.Proj calls which will eventually need to change. For now we can ignore the warning.**

%% Cell type:code id: tags:

``` python
# Projection library for switching between rectangular and latitudinal
os.environ['PROJ_LIB'] = '/usgs/cpkgs/anaconda3_linux/envs/autocnet/share/proj'
import pyproj

# Compute the lat/lon/alt
europa_radii = [1562600, 1560300, 1559500]
ecef = pyproj.Proj(proj='geocent', a=europa_radii[0], b=europa_radii[1], c=europa_radii[2])
lla = pyproj.Proj(proj='latlong', a=europa_radii[0], b=europa_radii[1], c=europa_radii[2])
lon, lat, alt = pyproj.transform(ecef, lla, final_points_df['adjustedX'].values, final_points_df['adjustedY'].values, final_points_df['adjustedZ'].values, radians=True)

# Store the data in the dataframe
final_points_df['latitude'] = lat
final_points_df['longitude'] = lon
final_points_df['altitude'] = alt

# We will also want the point radii
final_points_df['radius'] = np.sqrt(final_points_df['adjustedX']**2 + final_points_df['adjustedY']**2 + final_points_df['adjustedZ']**2)
```

%% Cell type:markdown id: tags:

Because of how we defined our projection, the latitude and longitude values will be in radians. Also, the longitude will be in the -180 to 180 domain, positive East. You can change this by modifying how you use pyproj, but that is outside the scope of this workshop.

%% Cell type:code id: tags:

``` python
final_points_df[["latitude", "longitude", "altitude", "radius"]].describe()
```

%% Cell type:markdown id: tags:

### Exercise: Convert the latitude and longitude from radians to degrees:

%% Cell type:code id: tags:

``` python
final_points_df[["latitude", "longitude"]] = np.rad2deg(final_points_df[["latitude", "longitude"]])
```

%% Cell type:markdown id: tags:

Similar to how we computed the residual magnitude, we want to compute the average residual magnitude for each point. The following cell goes back to our original dataframe, computes the mean point by point, and then saves all of the results in our new dataframe.

**Note: This cell can take a while to run because it has to re-access the dataframe for every point**

%% Cell type:code id: tags:

``` python
final_points_df["averageResidual"] = 0
for point_id, group in final_net_df.groupby('id'):
    final_points_df.loc[final_points_df.id == point_id, "averageResidual"] = group['residualMag'].mean()
```
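
%% Cell type:markdown id: tags:

If the loop above is too slow for a large network, the same result can be computed in one vectorized pass. Here is a sketch with made-up point ids and residuals:

%% Cell type:code id: tags:

``` python
import pandas as pd

# Hypothetical measures: two for pt1, one for pt2
net = pd.DataFrame({
    'id':          ['pt1', 'pt1', 'pt2'],
    'residualMag': [0.2, 0.4, 0.6],
})
points = pd.DataFrame({'id': ['pt1', 'pt2']})

# One groupby pass computes every mean, and map aligns them by point id
mean_resid = net.groupby('id')['residualMag'].mean()
points['averageResidual'] = points['id'].map(mean_resid)
print(points)
```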

%% Cell type:markdown id: tags:

### Exercise: What is the 95th percentile for the average residuals?

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

## Plotting the points
Now that we have latitudes and longitudes for each point, we can generate some simple plots to look at them.

%% Cell type:code id: tags:

``` python
point_map = plt.figure(figsize=(10, 10))
point_ax = point_map.add_subplot(111)
point_ax.scatter(final_points_df["longitude"], final_points_df["latitude"], marker='+')
point_map.suptitle('Control Points')
point_ax.set_xlabel('Longitude')
point_ax.set_ylabel('Latitude')
plt.show()
```

%% Cell type:markdown id: tags:

It can also be helpful to color the points based on different values. The following cell draws the same plot but colors each point based on its average residual. Because the residuals are not uniformly distributed, we also apply a logarithmic scale to the colors, which you can see in the colorbar.

%% Cell type:code id: tags:

``` python
point_resid_map = plt.figure(figsize=(10, 10))
point_resid_ax = point_resid_map.add_subplot(111)
point_resid_norm = matplotlib.colors.LogNorm(vmax=final_points_df["averageResidual"].max())
point_resid_scatter = point_resid_ax.scatter(final_points_df["longitude"], final_points_df["latitude"], c=final_points_df["averageResidual"], alpha=0.5, norm=point_resid_norm, marker='+', cmap=plt.get_cmap('plasma'))
point_resid_cbar = plt.colorbar(point_resid_scatter)
point_resid_map.suptitle('Control Points')
point_resid_ax.set_xlabel('Longitude')
point_resid_ax.set_ylabel('Latitude')
point_resid_cbar.set_label('Average Residual Magnitude')
plt.show()
```

%% Cell type:markdown id: tags:

Plotting individual points can be helpful for getting a general idea of the distribution of the points, but it can be hard to interpret the data in areas where many points sit on top of each other. So, let's combine nearby points and determine the residual based on the region.

To do this, we're going to bin the points into a regular grid across the latitude and longitude and then compute the mean within each bin.

**Try changing the grid_step value and re-running the two cells**

%% Cell type:code id: tags:

``` python
grid_step = 10

final_points_df['lonBin'] = final_points_df['longitude'].apply(lambda x: [e for e in range(-180, 180, grid_step) if e <= x][-1])
final_points_df['latBin'] = final_points_df['latitude'].apply(lambda x: [e for e in range(-90, 90, grid_step) if e <= x][-1])

avg_resid_binned = final_points_df.groupby(['lonBin', 'latBin'])['averageResidual'].mean()

filled_data = []
for lon_bin in range(-180, 180, grid_step):
    for lat_bin in range(-90, 90, grid_step):
        try:
            filled_data.append(avg_resid_binned.loc[lon_bin, lat_bin])
        except KeyError:
            # Bins with no points default to a zero residual
            filled_data.append(0)
filled_data = np.array(filled_data).reshape((int(360/grid_step), int(180/grid_step))).T
```
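
%% Cell type:markdown id: tags:

The list scans above work, but flooring to a multiple of grid_step produces the same bin edges more directly. A small sketch with made-up longitudes:

%% Cell type:code id: tags:

``` python
import numpy as np
import pandas as pd

grid_step = 10

# Hypothetical longitudes, invented for illustration
lons = pd.Series([-175.0, -170.0, 4.5, 179.9])

# Floor division snaps each value down to the nearest grid edge,
# matching the list-scan binning without building a range per point
lon_bins = (np.floor(lons / grid_step) * grid_step).astype(int)
print(lon_bins.tolist())  # [-180, -170, 0, 170]
```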

%% Cell type:code id: tags:

``` python
avg_gridded = plt.figure(figsize=(10, 5))
avg_gridded_ax = avg_gridded.add_subplot(111)
avg_gridded_plot = avg_gridded_ax.imshow(filled_data, origin='lower', extent= [-180, 180, -90, 90], cmap=plt.get_cmap('plasma'))
avg_gridded_ax.scatter(final_points_df["longitude"], final_points_df["latitude"], color='black', marker='+', alpha=0.1)
avg_gridded_cbar = plt.colorbar(avg_gridded_plot)
avg_gridded.suptitle('Average Residual by lat/lon grid')
avg_gridded_ax.set_xlabel('Longitude')
avg_gridded_ax.set_ylabel('Latitude')
avg_gridded_cbar.set_label('Average Residual Magnitude')
plt.show()
```

%% Cell type:markdown id: tags:

## 3D Plotting
2D plotting either requires these simple equirectangular projections or converting to another projection via pyproj. Instead, let's look at our data in true 3D.

The following cell plots the same data as before, but in 3D instead of a 2D projection.

%% Cell type:code id: tags:

``` python
resid_fig_3d = plt.figure(figsize=(10, 10))
resid_ax_3d = resid_fig_3d.add_subplot(111, projection='3d')
resid_plot_3d = resid_ax_3d.scatter(final_points_df['adjustedX'], final_points_df['adjustedY'], final_points_df['adjustedZ'], c=final_points_df["averageResidual"], alpha=0.5, norm=point_resid_norm, marker='+', cmap=plt.get_cmap('plasma'))
resid_cbar_3d = plt.colorbar(resid_plot_3d)
resid_fig_3d.suptitle('3D Control Points')
resid_cbar_3d.set_label('Average Residual Magnitude (pix)')
plt.show()
```

%% Cell type:code id: tags:

``` python
```