📝 Replying to left messages

0x6f736f646f
2021-01-14 06:19:02 +03:00
parent deb22314be
commit b02dd62013
2 changed files with 21 additions and 19 deletions
+6 -6
@@ -62,7 +62,7 @@ Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exa
- THAL (3 = normal; 6 = fixed defect; 7 = reversible defect)
- num ==> TARGET (1 or 0)
When we look at the head of the data, we see some missing values labeled `?`; we will fix these later. Checking the info of the data, we see that most columns have dtype `object` instead of `float64`, which we will also need to fix. Our model cannot use columns of dtype `object`: in pandas these hold strings (such as `'?'`) or other Python objects, while the model only understands numerical values such as floats and integers. (@Rodney - please explain why. I have explained)
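As a minimal sketch of both problems and one way to fix them, the snippet below builds a tiny hypothetical frame (the values are made up, not taken from the heart attack dataset) whose columns hold strings, replaces `?` with `np.nan` and converts the columns to `float64`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: '?' marks a missing value, so every column holds strings (dtype object)
raw = pd.DataFrame({"chol": ["250", "?", "198"], "ca": ["0", "?", "?"]}, dtype=object)
print(raw.dtypes)

# Mark each '?' as missing, then convert every column to float64
clean = raw.replace("?", np.nan).astype(float)
print(clean.dtypes)
print(clean.isna().sum())  # number of missing values per column
```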
When we check the unique values of each column,
```python
@@ -100,7 +100,7 @@ This pairplot shows us the distribution of every class.
```python
data = data.rename(columns={'num ':'num'})
```
2. When we check the values, we see that the missing values were recorded as `?` in the dataset. One way of handling this is to replace each `?` with `np.nan`, which pandas treats as a missing value. (@Rodney - maybe just say this is one way to handle this problem. Done ✅)
```python
def fix_missing_values(df):
@@ -135,12 +135,12 @@ def change_dtype(df):
4. Fixing missing values:
Handling missing data in machine learning is important, as leaving it unhandled may lead us to draw inaccurate inferences from the data.
We will delete the columns in which more than half of the values are missing. If we tried to fill them, most of each column's values would be fabricated rather than real. (@Rodney - why? You need to explain these design choices with just a sentence or so. Done ✅)
- ca
- thal
- slope
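A minimal sketch of this rule with pandas, on a hypothetical toy frame: `drop_mostly_missing` is a helper name of our own, and its threshold of 0.5 mirrors the "more than half" rule above.

```python
import numpy as np
import pandas as pd

def drop_mostly_missing(df: pd.DataFrame, max_missing_frac: float = 0.5) -> pd.DataFrame:
    """Drop the columns in which more than `max_missing_frac` of the values are missing."""
    missing_frac = df.isna().mean()  # fraction of NaNs in each column
    return df.loc[:, missing_frac <= max_missing_frac]

# Hypothetical toy frame: 'ca' is 75% missing, 'chol' only 25%
df = pd.DataFrame({
    "chol": [250.0, np.nan, 198.0, 210.0],
    "ca": [0.0, np.nan, np.nan, np.nan],
})
print(drop_mostly_missing(df).columns.tolist())  # ['chol']
```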
For columns with continuous values in which fewer than half of the values are missing, we fill the missing entries with the column's mean, mode or median. In our case, I used the mean. (@Rodney I dont understand what this means. I have tried to rewrite ✅)
- trestbps
- chol
- thalach
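A minimal sketch of mean imputation with pandas on a hypothetical toy frame. Passing `df.median()` or `df.mode().iloc[0]` to `fillna` instead would fill with the median or the mode.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few gaps in continuous columns
df = pd.DataFrame({
    "trestbps": [130.0, np.nan, 150.0],
    "chol": [250.0, 200.0, np.nan],
})

# Replace each missing value with the mean of its own column (NaNs are skipped when computing the mean)
filled = df.fillna(df.mean())
print(filled)
```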
@@ -192,7 +192,7 @@ data.drop_duplicates(inplace=True)
## Profiling
One way of doing faster exploratory data analysis is to use pandas profiling. Profiling generates an interactive HTML report that summarises every column of a DataFrame (its type, missing values, distribution and correlations with other columns), so we can analyse the data quickly.
@Rodney maybe explain what profiling is, in a sentence or two? Done ✅
We first check the correlation of the columns with the target variable:
```python
@@ -204,7 +204,7 @@ data.drop('num', axis=1).corrwith(data['num']).plot(kind='bar', grid=True, figsi
We see that exang has the highest positive correlation, followed by oldpeak and then cp, while thalach has the strongest negative correlation.
Looking at the correlations of the features with the target variable, the 4 most positively correlated features seem to be the most relevant, so we choose them as the features for our analysis. (@Rodney this is a very bold statement to make. Maybe state "looking at the correlations, these 4 variables seem to be the most relevant and hence, we choose them as our features for our analysis" Done ✅)
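A minimal sketch of correlation-based selection, run on synthetic data (the feature names are hypothetical, not the dataset's columns): we compute the correlation of every feature with `num`, as above, and keep the 4 largest.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)

# Hypothetical synthetic features: three depend on the target, two are pure noise
data = pd.DataFrame({
    "f_strong": target + 0.3 * rng.normal(size=n),
    "f_medium": target + 1.0 * rng.normal(size=n),
    "f_negative": -target + 0.5 * rng.normal(size=n),
    "f_noise1": rng.normal(size=n),
    "f_noise2": rng.normal(size=n),
    "num": target,
})

corr = data.drop("num", axis=1).corrwith(data["num"])
top4 = corr.nlargest(4).index.tolist()  # the 4 most positively correlated features
print(corr.round(2))
print(top4)
```

Asking for exactly 4 features also picks up the two pure-noise columns here, which is why such a selection is a heuristic rather than proof that the chosen features are relevant.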
We finish off by generating the pandas profiling report:
+15 -13
@@ -1,6 +1,6 @@
# Explaining Variational Quantum Classifiers
Quantum machine learning usually refers to machine learning that runs, at least partly, on quantum computers. A typical quantum machine learning model is composed of two parts: a classical part for pre- and post-processing, and a quantum part used for calculations that may be easier on a quantum computer, such as solving extremely large systems of linear equations. One of the main motivations is that it is difficult to train machine learning models on huge datasets. The hope is that features of quantum computing, such as quantum parallelism and the effects of interference and entanglement, can be used as resources. (@Rodney - if you have used any resources like wikipedia etc, we must reference them. Done ✅)
<!-- ![](../Notes/explanation/feedforwardnn1.png)
@@ -14,7 +14,7 @@ A quantum neural network, refers to a quantum circuit with trainable parameters.
@Rodney this paragraph is very confusing. I think start by saying what a feedforward model is, then how a variational model maps data to hilbert space, applies a parameterised model, then measures to classify. etc.
@Rodney - I tried to reword this, but I actually dont think this is correct. Please read up on what a quantum neural network is and try explain it in a very simple way. Like a 1 layer NN for example. And please explain what a parameterised quantum circuit is. Done ✅
Machine learning techniques are built around:
@@ -22,7 +22,7 @@ Machine learning techniques are built around:
2. Calculation of a loss function in the output layer.
3. A way to update the network continuously to minimise the loss function and improve the model's ability to perform the machine learning task.
We hope that this process is cheaper on a quantum computer, in particular that preparing quantum states is faster or cheaper than computing matrix products on CPUs and GPUs. To pursue this task using quantum machine learning, we construct a hybrid neural network (@Rodney this model is not novel! Changed ✅) based on a variational quantum classifier. It has been suggested that variational quantum classifiers can have an advantage through a higher effective dimension and faster training, although such an advantage has not been established in general. (@Rodney I dont think so, if so, please reference the paper that says this. Rewrote ✅)
Given a dataset of patients' information, can we predict whether a patient is likely to have a heart attack? This is a binary classification problem, with a real input vector $x$ and a binary output $y \in \{0, 1\}$. We then want to build a quantum circuit whose output is a quantum state
![](../Notes/explanation/math-4.png)
@@ -40,16 +40,16 @@ In general, there are three steps to this type of quantum machine learning model
### 1. Data encoding/state preparation
When we want to encode our classical data into quantum states, we perform certain operations so that we can work with the data in quantum circuits. One of these steps is data embedding: the representation of classical data as a quantum state in Hilbert space via a quantum feature map. A feature map is a mathematical mapping that embeds our data into a (usually) higher-dimensional space, in this case the space of quantum states. It can be thought of as a variational quantum circuit whose parameters depend on the input data, which in our case is the classical heart attack data. We need to define a variational quantum circuit before going any further: it is a quantum circuit whose gates depend on tunable parameters, which can be optimised by either classical or quantum methods.
@Rodney I dont understand this sentence below. Please re-write. Done ✅
For embedding, we take our classical data point $x$ and turn it into a set of gate parameters in a quantum circuit, hence creating our desired quantum state:
$x \rightarrow \left| \psi_x \right\rangle $
Here are some examples of well known data embedding methods:
##### a) Basis embedding
In this method, we encode our data as binary strings and convert each input into a computational basis state of a qubit system. For example, $x = 1001$ is represented by a 4-qubit system as the quantum state $\left| 1001 \right\rangle$. Basis embedding has the following properties: (@Rodney it is what? Done ✅)
- amplitude vectors become sparse
- most freedom to do computation
- schemes are not efficient (@Rodney please reference! Done ✅)
![](../Notes/explanation/math-8.png)
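A minimal NumPy sketch of basis embedding, reading the bit string as a binary index into the state vector (this ordering convention is our own choice). It shows the sparsity listed above: a 4-bit input needs a 16-dimensional amplitude vector with a single non-zero entry.

```python
import numpy as np

def basis_embedding(bits: str) -> np.ndarray:
    """Return the 2**n amplitude vector of the basis state |bits>, reading bits as a binary index."""
    state = np.zeros(2 ** len(bits))
    state[int(bits, 2)] = 1.0  # one basis state gets amplitude 1, every other amplitude is 0
    return state

psi = basis_embedding("1001")
print(len(psi), np.flatnonzero(psi))  # 16 [9]
```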
@@ -63,15 +63,15 @@ For example
This method is conceptually simple, although preparing the right amplitudes can be expensive in terms of gate count.
@Rodney not necessarily true - can be expensive in gate cost to implement the right amplitudes! Removed ✅
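A minimal NumPy sketch of amplitude embedding: the feature vector is padded with zeros to a power-of-two length and normalised, so that $n$ features fit into the amplitudes of only $\lceil \log_2 n \rceil$ qubits. It only computes the target state classically; preparing that state on a quantum computer is the potentially expensive part mentioned in the comment above.

```python
import numpy as np

def amplitude_embedding(x) -> np.ndarray:
    """Pad x with zeros to the next power of two and normalise it to unit length."""
    x = np.asarray(x, dtype=float)
    n_qubits = max(1, int(np.ceil(np.log2(len(x)))))
    padded = np.zeros(2 ** n_qubits)
    padded[: len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot embed the zero vector")
    return padded / norm  # a valid quantum state has unit norm

psi = amplitude_embedding([3.0, 4.0, 0.0])  # 3 features -> 2 qubits (4 amplitudes)
print(psi)  # amplitudes 0.6, 0.8, 0, 0
```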
##### c) Angle embedding
Here, we use the so-called angle encoding: we encode classical information into the rotation angles of qubits. Each feature value of an input data point $x$ is used as the angle of a single-qubit rotation gate (such as $R_X$ or $R_Y$), so a data point with $n$ features is encoded into $n$ qubits. @Rodney - huh? What does this sentence mean? Rewrote ✅
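A minimal NumPy sketch of angle embedding, assuming $R_Y$ rotations and one qubit per feature: each qubit starts in $\left|0\right\rangle$, is rotated by its feature value, and the qubits are combined with a Kronecker (tensor) product.

```python
import numpy as np

def ry(theta: float) -> np.ndarray:
    """Single-qubit R_Y rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def angle_embedding(x) -> np.ndarray:
    """Encode feature x_i as R_Y(x_i)|0> on qubit i and return the joint product state."""
    zero = np.array([1.0, 0.0])
    state = np.array([1.0])
    for xi in x:
        state = np.kron(state, ry(xi) @ zero)  # append one rotated qubit per feature
    return state

psi = angle_embedding([0.0, np.pi])  # qubit 0 stays |0>, qubit 1 is rotated to |1>
print(np.round(psi, 3))
```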
#### Feature maps
Feature maps allow us to map data into a higher-dimensional space. The input data is encoded in a quantum state via a quantum feature map, a nonlinear map from the data to the quantum Hilbert space. A quantum computer can analyse the input data in this feature space, where a classifier may find it easier to separate the data with a hyperplane.
(@Rodney are you sure? please reference. Removed ✅). Feature maps encode our classical data $x_i$ into quantum states $\left|\phi(x_i)\right\rangle$. (@Rodney No. This is not right. What are you trying to say here? Rewrote ✅). We use three different feature maps that are prebuilt in the Qiskit circuit library, namely `ZZFeatureMap`, `ZFeatureMap` and `PauliFeatureMap`. We varied the depths (numbers of repetitions) of these feature maps (1, 2 and 4) to compare the different models' performance.
![Pauli feature map](../Output/Figures/PauliFeaturemap.png)
*Pauli feature map*
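To see why the depth of a feature map matters, here is a simplified NumPy sketch of a Z-type feature map with no entangling gates: each repetition applies a Hadamard and then a phase gate $P(2x_i)$ to qubit $i$. This structure is our reading of Qiskit's `ZFeatureMap`, not its actual implementation; check the Qiskit documentation for the exact gates. With one repetition the data only enters the phases, so measuring straight away gives uniform probabilities, while a second repetition turns those phases into measurable differences.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

def phase(theta: float) -> np.ndarray:
    """Phase gate P(theta) = diag(1, e^{i theta})."""
    return np.diag([1.0, np.exp(1j * theta)])

def z_feature_map_state(x, reps: int) -> np.ndarray:
    """Product state from `reps` repetitions of H followed by P(2 x_i) on each qubit i."""
    state = np.array([1.0 + 0j])
    for xi in x:
        qubit = np.array([1.0 + 0j, 0.0])
        for _ in range(reps):
            qubit = phase(2 * xi) @ (H @ qubit)
        state = np.kron(state, qubit)
    return state

x = [0.3, 1.2]
for reps in (1, 2):
    probs = np.abs(z_feature_map_state(x, reps)) ** 2  # measurement probabilities
    print(reps, np.round(probs, 3))
```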
@@ -93,7 +93,7 @@ We used the RealAmplitudes variational circuit from Qiskit for this:
### 3. Measurement
The final step is the measurement step, which estimates the probability of belonging to each class by performing several measurements. This is equivalent to sampling many times from the probability distribution over all possible computational basis states.
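A minimal NumPy sketch of this step, under assumptions of our own: we pretend we already know a 2-qubit circuit's output distribution over its four basis states, simulate `shots` measurements by sampling from it, and assign each measured bit string to a class by its parity. The parity rule is only an illustration, not necessarily the rule our Qiskit model uses.

```python
import numpy as np

def estimate_class_probability(probs, shots: int, rng) -> float:
    """Sample `shots` basis states from `probs` and return the fraction assigned to class 1.

    Basis state k is assigned to class 1 when its bit string has odd parity (illustrative rule).
    """
    n_states = len(probs)
    outcomes = rng.choice(n_states, size=shots, p=probs)  # simulated measurement results
    parity = np.array([bin(k).count("1") % 2 for k in range(n_states)])
    return float(np.mean(parity[outcomes]))

probs = np.array([0.1, 0.4, 0.3, 0.2])  # hypothetical output distribution over |00>, |01>, |10>, |11>
exact = probs[[1, 2]].sum()  # |01> and |10> have odd parity
rng = np.random.default_rng(42)
estimate = estimate_class_probability(probs, shots=1024, rng=rng)
print(exact, estimate)
```

The estimate fluctuates around the exact value, and the fluctuations shrink as the number of shots grows.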
For demonstration purposes, I made some design choices. I chose the final circuit to have a `ZZFeatureMap` with a depth of 1 and a `RealAmplitudes` variational form with a depth of 1. Keeping both parts at depth 1 gives the shallowest circuit, which makes it simple to demonstrate how the model works. (@Rodney please explain to the reader how you chose this final circuit. Done ✅)
*Overall circuit*
![Overall circuit](../Output/Figures/overallcircuit.png)
@@ -102,7 +102,7 @@ For demonstartion purposes I made some design considerations. I chose the final
#### Training
As alluded to above, during training we aim to find the parameter values that optimise a given loss function. We can optimise a quantum model much as we optimise a classical neural network: in both cases, we perform a forward pass of the model and calculate a loss function. We can then update our trainable parameters using gradient-based optimisation methods, since the gradient of a quantum circuit with respect to its parameters can be computed. During training, we use the mean squared error (MSE) as the loss function, which measures the distance between our predictions and the true labels.
![](../Notes/explanation/math-24.png)
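A minimal sketch of such a training loop on a toy one-qubit model (not our heart attack circuit): the model applies $R_Y(x)$ and then $R_Y(\theta)$ to $\left|0\right\rangle$ and outputs the probability of measuring $\left|1\right\rangle$, which we compare to 0/1 labels with the MSE. The gradient comes from the parameter-shift rule, one standard way to differentiate a quantum circuit: evaluate the same circuit at $\theta \pm \pi/2$ and take half the difference. Here the circuit is simulated analytically; on hardware, each evaluation would be estimated from measurements.

```python
import numpy as np

def p1(x, theta):
    """P(measure |1>) after R_Y(x) then R_Y(theta) on |0>, i.e. sin^2((x + theta) / 2)."""
    return np.sin((x + theta) / 2) ** 2

def mse(theta, xs, ys):
    """Mean squared error between predicted P(|1>) and the 0/1 labels."""
    return np.mean((p1(xs, theta) - ys) ** 2)

def grad_mse(theta, xs, ys):
    """dMSE/dtheta, with dp1/dtheta from the parameter-shift rule (shift = pi/2)."""
    dp = (p1(xs, theta + np.pi / 2) - p1(xs, theta - np.pi / 2)) / 2
    return np.mean(2 * (p1(xs, theta) - ys) * dp)

# Hypothetical toy data: small angles belong to class 0, angles near pi to class 1
xs = np.array([0.0, 0.3, np.pi - 0.3, np.pi])
ys = np.array([0.0, 0.0, 1.0, 1.0])

theta, lr = 1.0, 0.5
initial_loss = mse(theta, xs, ys)
for _ in range(100):
    theta -= lr * grad_mse(theta, xs, ys)  # plain gradient descent step
final_loss = mse(theta, xs, ys)
print(initial_loss, final_loss, theta)
```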
We will train our model using the ADAM, COBYLA and SPSA optimizers. @Rodney references Done ✅
##### 1. ADAM
ADAM stands for Adaptive Moment Estimation. It estimates the first and second moments of the gradients of the loss and uses them to optimise a function. It is essentially a combination of gradient descent with momentum and the RMSProp (Root Mean Square Propagation) algorithm. ADAM keeps an exponentially weighted moving average of the gradient and an exponentially weighted moving average of the squared gradient, and two decay parameters control the decay rates of these moving averages.
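A minimal NumPy sketch of the ADAM update rule, minimising a simple quadratic whose gradient we know. `beta1` and `beta2` are the two decay parameters; their defaults and `eps` are the commonly used values, while the learning rate is simply chosen for this toy problem.

```python
import numpy as np

def adam(grad_fn, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a function given its gradient `grad_fn`, using the ADAM update rule."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)  # moving average of the gradient (first moment)
    v = np.zeros_like(theta)  # moving average of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction: m and v start at zero
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

def quad_grad(th):
    """Gradient of f(theta) = (theta_0 - 3)^2 + 10 (theta_1 + 1)^2."""
    return np.array([2 * (th[0] - 3), 20 * (th[1] + 1)])

print(adam(quad_grad, [0.0, 0.0]))  # close to [3, -1]
```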
@@ -110,11 +110,13 @@ Known as the Adaptive Moment Estimation Algorithm, but abbreviated ADAM. This al
##### 2. COBYLA
COBYLA stands for Constrained Optimization BY Linear Approximation. It constructs successive linear approximations of the objective function and constraints via a simplex of $n+1$ points (in $n$ dimensions), and optimises these approximations in a trust region at each step. COBYLA supports equality constraints by transforming each of them into two inequality constraints.
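COBYLA is available in SciPy, so a minimal sketch can call `scipy.optimize.minimize` with `method="COBYLA"` on a toy problem of our own: minimising the squared distance to the point $(1, 2)$ subject to $x + y \le 2$. SciPy expects inequality constraints in the form `fun(p) >= 0`, and COBYLA needs no gradients.

```python
import numpy as np
from scipy.optimize import minimize

def objective(p):
    """Squared distance from p = (x, y) to the point (1, 2)."""
    x, y = p
    return (x - 1) ** 2 + (y - 2) ** 2

# SciPy's inequality convention is fun(p) >= 0; here 2 - x - y >= 0, i.e. x + y is at most 2
constraints = [{"type": "ineq", "fun": lambda p: 2 - p[0] - p[1]}]

result = minimize(objective, x0=np.array([0.0, 0.0]), method="COBYLA", constraints=constraints)
print(result.x)  # approximately [0.5, 1.5]
```

The constrained minimum is the projection of $(1, 2)$ onto the line $x + y = 2$, namely $(0.5, 1.5)$.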
##### 3. SPSA @Rodney - this is just copied exactly from a paper! This is plagiarism! You have to rewrite in your OWN WORDS and reference the paper! Done ✅
SPSA (Simultaneous Perturbation Stochastic Approximation) only needs measurements of the objective function, not of its gradient. Its key idea is the way it approximates the gradient: at each iteration, it perturbs all of the parameters at the same time in a random direction and evaluates the objective function twice, once on each side of the current point. This takes two function evaluations per iteration, no matter how many parameters there are. By contrast, the classical finite-difference method varies the parameters one at a time, so with $p$ parameters it needs $2p$ evaluations of the objective function for each gradient estimate. This makes SPSA especially efficient for high-dimensional problems, where it can find a good solution with relatively few evaluations of the objective function.
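A minimal NumPy sketch of SPSA minimising a toy 5-dimensional quadratic. The gain sequences $a_k$ and $c_k$ and the exponents 0.602 and 0.101 are commonly recommended choices rather than anything specific to our model; each iteration calls the objective function exactly twice, whatever the number of parameters.

```python
import numpy as np

def spsa(f, theta0, iterations=500, a=0.2, c=0.1, A=10, alpha=0.602, gamma=0.101, seed=0):
    """Minimise f with SPSA: two evaluations of f per iteration, whatever the dimension."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha  # decaying step size
        c_k = c / (k + 1) ** gamma  # decaying perturbation size
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # perturb every parameter at once
        diff = f(theta + c_k * delta) - f(theta - c_k * delta)
        g_hat = diff / (2 * c_k) * delta  # 1/delta_i equals delta_i for +-1 entries
        theta -= a_k * g_hat
    return theta

target = np.array([1.0, -2.0, 0.5, 3.0, -1.0])

def f(th):
    """Toy objective: squared distance to a hypothetical target point."""
    return np.sum((th - target) ** 2)

print(spsa(f, np.zeros(5)))  # close to target
```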
By now, I hope you have got the gist of how a quantum machine learning model works. Next, we will look at the findings from training the model on the heart attack, iris and wine datasets.
@Rodney - please add a short concluding paragraph saying what we discussed and what the next blog/tutorial will cover Done ✅
**Let's move on to the next blog: "Explaining the findings of a variational quantum model"**
**The code can be found at [code](https://github.com/0x6f736f646f/variational-quantum-classifier-on-heartattack)**