11

Asset Allocation

Asset allocation is the most important decision any investor faces, and there is no one-size-fits-all solution that works for every investor. By asset allocation, we mean spreading the investor’s total investment amount over certain assets (be it stocks, options, bonds, or any other financial instruments). When considering the allocation, the investor wants to balance the risk and the potential reward. At the same time, the allocation depends on factors such as individual goals (the expected return), risk tolerance (how much risk the investor is willing to accept), and the investment horizon (a short- or long-term investment).

The key framework in asset allocation is modern portfolio theory (MPT, also known as mean-variance analysis). It was introduced by the Nobel laureate Harry Markowitz and describes how risk-averse investors can construct portfolios that maximize their expected return (profit) for a given level of risk. The main insight from MPT is that investors should not evaluate an asset’s performance in isolation (by metrics such as expected return or volatility), but instead investigate how it would impact the performance of their portfolio as a whole.

MPT is closely related to the concept of diversification, which simply means that owning different kinds of assets reduces risk, as the loss or gain of a particular security has less impact on the overall portfolio’s performance. Another key concept to be aware of is that while the portfolio return is the weighted average of the individual asset returns, this is not true for the risk (volatility). That is because the volatility is also dependent on the correlations between the assets. What is interesting is that thanks to optimized asset allocation, it is possible to have a portfolio with lower volatility than the lowest individual volatility of the assets in the portfolio. In principle, the lower the correlation between the assets we hold, the better it is for diversification. With a perfect negative correlation, we could diversify all the risk.
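
To make the effect of correlation tangible, here is a minimal numpy sketch (with hypothetical volatilities) of a two-asset, equally-weighted portfolio; as the correlation drops, the portfolio volatility falls below that of either asset, reaching zero at a perfect negative correlation:

import numpy as np

w = np.array([0.5, 0.5])           # portfolio weights
vols = np.array([0.20, 0.20])      # individual annualized volatilities

for rho in [1.0, 0.5, 0.0, -1.0]:
    corr_mat = np.array([[1, rho], [rho, 1]])
    cov_mat = corr_mat * np.outer(vols, vols)   # covariance matrix
    portf_vol = np.sqrt(w @ cov_mat @ w)
    print(f"correlation {rho:+.1f} -> portfolio volatility {portf_vol:.2%}")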

The main assumptions of modern portfolio theory are:

  • Investors are rational and aim to maximize their returns while avoiding risks whenever possible.
  • Investors share the goal of maximizing their expected returns.
  • All investors have the same level of information about potential investments.
  • Commissions, taxes, and transaction costs are not taken into account.
  • Investors can borrow and lend money (without limits) at a risk-free rate.

In this chapter, we start with the most basic asset allocation strategy, and on its basis, learn how to evaluate the performance of portfolios (also applicable to individual assets). Later on, we show three different approaches to obtaining the efficient frontier, while also relaxing some of the assumptions of MPT. One of the main benefits of learning how to approach optimization problems is that they can be easily refactored, for example, by optimizing a different objective function or adding specific constraints on the weights. This requires only slight modifications to the code, while the majority of the framework stays the same. At the very end, we explore a novel approach to asset allocation based on the combination of graph theory and machine learning—Hierarchical Risk Parity.

We cover the following recipes in this chapter:

  • Evaluating an equally-weighted portfolio’s performance
  • Finding the efficient frontier using Monte Carlo simulations
  • Finding the efficient frontier using optimization with SciPy
  • Finding the efficient frontier using convex optimization with CVXPY
  • Finding the optimal portfolio with Hierarchical Risk Parity

Evaluating an equally-weighted portfolio’s performance

We begin with inspecting the most basic asset allocation strategy: the equally-weighted (1/n) portfolio. The idea is to assign equal weights to all the considered assets, thus diversifying the portfolio. As simple as that might sound, DeMiguel, Garlappi, and Uppal (2007) show that it can be difficult to beat the performance of the 1/n portfolio by using more advanced asset allocation strategies.

The goal of the recipe is to show how to create a 1/n portfolio of the FAANG companies (Facebook/Meta, Amazon, Apple, Netflix, and Google/Alphabet), calculate its returns, and then use the quantstats library to quickly obtain all relevant portfolio evaluation metrics in the form of a tear sheet. Historically, a tear sheet is a concise (usually one-page) document summarizing important information about public companies.

How to do it...

Execute the following steps to create and evaluate the 1/n portfolio:

  1. Import the libraries:
    import yfinance as yf
    import numpy as np
    import pandas as pd
    import quantstats as qs
    
  2. Define the considered assets and download their prices from Yahoo Finance:
    ASSETS = ["META", "AMZN", "AAPL", "NFLX", "GOOG"]
    n_assets = len(ASSETS)
    prices_df = yf.download(ASSETS,
                            start="2020-01-01",
                            end="2021-12-31",
                            auto_adjust=False)
    
  3. Calculate individual asset returns:
    returns = prices_df["Adj Close"].pct_change().dropna()
    
  4. Define the weights:
    portfolio_weights = n_assets * [1 / n_assets]
    
  5. Calculate the portfolio returns:
    portfolio_returns = pd.Series(
        np.dot(portfolio_weights, returns.T), 
        index=returns.index
    )
    
  6. Generate basic performance evaluation plots:
    qs.plots.snapshot(portfolio_returns,
                      title="1/n portfolio's performance",
                      grayscale=True)
    

    Executing the snippet generates the following figure:

    Figure 11.1: Selected evaluation metrics of the 1/n portfolio

    The created snapshot consists of cumulative portfolio returns, the underwater plot depicting the drawdown periods (we will explain it in the How it works… section), and daily returns.

  7. Calculate the basic portfolio evaluation metrics:
    qs.reports.metrics(portfolio_returns,
                       benchmark="SPY",
                       mode="basic",
                       prepare_returns=False)
    

    Executing the snippet returns the following metrics for our portfolio and the benchmark:

Figure 11.2: Performance evaluation metrics of the 1/n portfolio and the S&P 500 benchmark

We describe some of the metrics presented in Figure 11.2 in the following section.

How it works...

In Steps 1 to 3, we followed the already established approach—imported the libraries, set up the parameters, downloaded stock prices of the FAANG companies from the years 2020 to 2021, and calculated simple returns using the adjusted close prices.

In Step 4, we created a list of weights, each one equal to 1/n_assets, where n_assets is the number of assets we want to have in our portfolio. Next, we calculated the portfolio returns as a matrix multiplication (also known as the dot product) of the portfolio weights and a transposed matrix of asset returns. To transpose the matrix, we used the T method of a pandas DataFrame. Then, we stored the portfolio returns as a pandas Series object, because that is the input for the ensuing step.

In the first edition of the book, we explored the performance of the 1/n portfolio using the pyfolio library. However, since that time, the company that was responsible for the library (Quantopian) was closed, and the library is not actively maintained anymore. The library can still be used, as we show in the additional notebook available in the book’s GitHub repository. Alternatively, you can use pyfolio-reloaded, which is a fork of the original library maintained by Stefan Jansen, the author of Machine Learning for Algorithmic Trading.

In Step 6, we generated a figure containing basic portfolio evaluation plots using the quantstats library. While we are already familiar with the plot depicting the daily returns, the other two are new:

  • Cumulative returns plot: It presents the evolution of the portfolio’s worth over time.
  • Underwater plot: This plot presents the investment from a pessimistic point of view, as it focuses on losses. It plots all the drawdown periods and how long they lasted, that is, until the value rebounded to a new high. One of the insights we can draw from this is how long the periods of losses lasted.

Lastly, we generated the portfolio evaluation metrics. While doing so, we also provided a benchmark. We chose SPY, an exchange-traded fund (ETF) designed to track the S&P 500 index. We can provide the benchmark as either a ticker or a pandas DataFrame/Series containing the prices/returns. The library handles both options, and the prepare_returns argument indicates whether the returns should be calculated from prices.

The most important metrics that we saw in Figure 11.2 are:

  • Sharpe ratio: One of the most popular performance evaluation metrics, it measures the excess return (over the risk-free rate) per unit of standard deviation. When no risk-free rate is provided, the default assumption is that it is equal to 0%. The greater the Sharpe ratio, the better the portfolio’s risk-adjusted performance.
  • Sortino ratio: A modified version of the Sharpe ratio, where the standard deviation in the denominator is replaced with downside deviation.
  • Omega ratio: The probability-weighted ratio of gains over losses for a determined return target threshold (default set to 0). Its main advantage over the Sharpe ratio is that the Omega ratio—by construction—considers all moments of the returns distribution, while the former only considers the first two (mean and variance).
  • Max drawdown: A metric of the downside risk of a portfolio, it measures the largest peak-to-valley loss (expressed as a percentage) during the course of the investment. The lower the maximum drawdown, the better.
  • Tail ratio: The ratio (absolute) between the 95th and 5th percentile of the daily returns. A tail ratio of ~0.8 means that losses are ~1.25 times as bad as profits.

Downside deviation is similar to standard deviation; however, it only considers negative returns—it discards all positive changes from the series. It also allows us to define a minimum acceptable return (dependent on the investor); only the returns below that threshold are used to calculate the downside deviation.
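
The following sketch reproduces a few of these metrics by hand, assuming daily returns, 252 trading days per year, and a 0% risk-free rate (libraries can differ slightly in conventions, for example, in how the downside deviation is averaged and annualized):

import numpy as np

rtns = portfolio_returns.values     # daily portfolio returns
ann_factor = np.sqrt(252)

# Sharpe ratio with a 0% risk-free rate
sharpe = rtns.mean() / rtns.std(ddof=1) * ann_factor

# Sortino ratio - only returns below the MAR contribute to the deviation
mar = 0.0                           # minimum acceptable return
downside = np.minimum(rtns - mar, 0)
downside_dev = np.sqrt(np.mean(downside ** 2))
sortino = rtns.mean() / downside_dev * ann_factor

# tail ratio - absolute ratio of the 95th to the 5th percentile
tail_ratio = abs(np.percentile(rtns, 95) / np.percentile(rtns, 5))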

There’s more...

So far, we have mostly generated only the basic selection of plots and metrics available in the quantstats library. However, the library has much more to offer.

Full tear sheets

quantstats allows us to generate a complete HTML report containing all of the available plots and metrics (including a comparison to the benchmark). We can create such a report using the following command:

qs.reports.html(portfolio_returns, 
                benchmark="SPY", 
                title="1/n portfolio",
                download_filename="EW portfolio evaluation.html")

Executing it generates an HTML file containing the exhaustive tear sheet of our equally-weighted portfolio, compared to the SPY. Please refer to the EW portfolio evaluation.html file on GitHub.

First, let’s explain some of the new, yet relevant metrics visible in the generated report:

  • Calmar ratio: The ratio is defined as the average annual compounded rate of return divided by the maximum drawdown for that same time period. The higher the ratio, the better.
  • Skew: Skewness measures the degree of asymmetry, that is, how much the given distribution (here, of portfolio returns) deviates from the symmetry of the Normal distribution. Negative skewness (a left-skewed distribution) means that large negative returns occur more frequently than large positive ones.
  • Kurtosis: It measures extreme values in either of the tails. Distributions with large kurtosis exhibit tails that are fatter than those of the Gaussian distribution, meaning that extreme returns occur more frequently.
  • Alpha: It describes a strategy’s ability to beat the market. In other words, it is the portfolio excess returns above the benchmark return.
  • Beta: It measures the overall systematic risk of a portfolio of investments. In other words, it is a measure of portfolio volatility compared to the systematic risk of the entire market. A portfolio’s beta is equal to the weighted average of the beta coefficients of all the individual assets in a portfolio.
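
The beta definition translates directly into code: it is the covariance of the portfolio returns with the benchmark returns, scaled by the benchmark’s variance. A minimal sketch, assuming two aligned series of daily returns:

import numpy as np

def portfolio_beta(portf_returns, benchmark_returns):
    # CAPM beta: covariance with the benchmark divided by its variance
    cov = np.cov(portf_returns, benchmark_returns)
    return cov[0, 1] / cov[1, 1]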

The metrics also include the 10 worst drawdowns. That is, they show how bad each of the drawdowns was, the recovery date, and the drawdowns’ duration. This information complements the analysis of the underwater plot we mentioned before.

Figure 11.3: The 10 worst drawdowns during the evaluation period

Then, the report also contains some new plots, which we explain below:

  • Rolling Sharpe ratio: Instead of reporting one number over time, it is also interesting to see how stable the Sharpe ratio was. That is why the following plot presents this metric calculated on a rolling basis, using 6 months’ worth of data.

Figure 11.4: Rolling (6 months) Sharpe ratio

  • The five worst drawdown periods are also visualized on a separate plot. For exact dates when the drawdowns started and ended, please refer to Figure 11.3. One thing worth mentioning is that the drawdown periods are superimposed on the cumulative returns plot. This way, we can clearly confirm the definition of the drawdown, that is, how much our portfolio is down from the peak before it recovers back to the peak level.

Figure 11.5: Five worst drawdown periods during the evaluation period

  • A histogram depicting the distribution of the monthly returns, including a kernel density estimate (KDE) and the average value. It’s helpful in analyzing the distribution of the returns. In the plot, we can see that the average monthly returns over the evaluation period were positive.

Figure 11.6: Distribution of the monthly returns (histogram + KDE)

  • A heatmap serving as a summary of what the returns were over certain months/years.

Figure 11.7: A heatmap presenting the monthly returns over the years

  • A quantile plot showing the distribution of the returns aggregated to different frequencies.

Figure 11.8: Quantile plot aggregating the returns to different frequencies

Before creating the comprehensive HTML report, we generated the basic plots and metrics using the qs.reports.plots and qs.reports.metrics functions. We can also use those functions to get the very same metrics/plots as we have obtained in the report by appropriately specifying the mode argument. To get all the metrics, we should pass "full" instead of "basic" (which is also the default value).
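
For example, the following call (the same one as in Step 7, only with a different mode) prints the full set of metrics to the console:

qs.reports.metrics(portfolio_returns,
                   benchmark="SPY",
                   mode="full",
                   prepare_returns=False)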

Enriching the pandas DataFrames/Series with new methods

Another interesting feature of the quantstats library is that it can enrich the pandas DataFrame or Series with new methods, used for calculating all the metrics available in the library. To do so, we first need to execute the following command:

qs.extend_pandas()

Then, we can access the methods straight from the DataFrame containing the return series. For example, we can quickly calculate the Sharpe and Sortino ratios using the following snippet:

print(f"Sharpe ratio: {portfolio_returns.sharpe():.2f}")
print(f"Sortino ratio: {portfolio_returns.sortino():.2f}")

Which returns:

Sharpe ratio: 1.36
Sortino ratio: 1.96

The values are a match to what we calculated earlier using the qs.reports.metrics function. For a complete list of the available methods, you can run the following snippet:

[method for method in dir(qs.stats) if method[0] != "_"]

See also

Additional resources are available here:

  • DeMiguel, V., Garlappi, L., & Uppal, R. 2007, “Optimal versus naive diversification: how inefficient is the 1/N portfolio strategy?” The Review of Financial Studies, 22(5): 1915-1953: https://doi.org/10.1093/rfs/hhm075

Finding the efficient frontier using Monte Carlo simulations

According to modern portfolio theory, the efficient frontier is a set of optimal portfolios in the risk-return spectrum. This means that the portfolios on the frontier:

  • Offer the highest expected return for a given level of risk
  • Offer the lowest level of risk for a given level of expected returns

All portfolios located under the efficient frontier curve are considered sub-optimal, so it is always better to choose the ones on the frontier instead.

In this recipe, we show how to find the efficient frontier using Monte Carlo simulations. Before showing more elegant approaches based on optimization, we employ a brute force approach in which we build thousands of portfolios using randomly assigned weights. Then, we can calculate the portfolios’ performance (expected returns/volatility) and use those values to determine the efficient frontier. For this exercise, we use the returns of four US tech companies from 2021.

How to do it...

Execute the following steps to find the efficient frontier using Monte Carlo simulations:

  1. Import the libraries:
    import yfinance as yf
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    
  2. Set up the parameters:
    N_PORTFOLIOS = 10 ** 5
    N_DAYS = 252
    ASSETS = ["META", "TSLA", "TWTR", "MSFT"]
    ASSETS.sort()
    n_assets = len(ASSETS)
    
  3. Download the stock prices from Yahoo Finance:
    prices_df = yf.download(ASSETS,
                            start="2021-01-01",
                            end="2021-12-31",
                            auto_adjust=False)
    
  4. Calculate the annualized average returns and the corresponding standard deviation:
    returns_df = prices_df["Adj Close"].pct_change().dropna()
    avg_returns = returns_df.mean() * N_DAYS
    cov_mat = returns_df.cov() * N_DAYS
    
  5. Simulate random portfolio weights:
    np.random.seed(42)
    weights = np.random.random(size=(N_PORTFOLIOS, n_assets))
    weights /=  np.sum(weights, axis=1)[:, np.newaxis]
    
  6. Calculate the portfolio metrics:
    portf_rtns = np.dot(weights, avg_returns)
    portf_vol = []
    for i in range(0, len(weights)):
        vol = np.sqrt(
            np.dot(weights[i].T, np.dot(cov_mat, weights[i]))
        )
        portf_vol.append(vol)
    portf_vol = np.array(portf_vol)  
    portf_sharpe_ratio = portf_rtns / portf_vol
    
  7. Create a DataFrame containing all the data:
    portf_results_df = pd.DataFrame(
        {"returns": portf_rtns,
         "volatility": portf_vol,
         "sharpe_ratio": portf_sharpe_ratio}
    )
    

    The DataFrame looks as follows:

    Figure 11.9: Selected metrics of each of the generated portfolios

  8. Locate the points creating the efficient frontier:
    N_POINTS = 100
    ef_rtn_list = []
    ef_vol_list = []
    possible_ef_rtns = np.linspace(portf_results_df["returns"].min(),
                                   portf_results_df["returns"].max(),
                                   N_POINTS)
    possible_ef_rtns = np.round(possible_ef_rtns, 2)    
    portf_rtns = np.round(portf_rtns, 2)
    for rtn in possible_ef_rtns:
        if rtn in portf_rtns:
            ef_rtn_list.append(rtn)
            matched_ind = np.where(portf_rtns == rtn)
            ef_vol_list.append(np.min(portf_vol[matched_ind]))
    
  9. Plot the efficient frontier:
    MARKERS = ["o", "X", "d", "*"]
    fig, ax = plt.subplots()
    portf_results_df.plot(kind="scatter", x="volatility",
                          y="returns", c="sharpe_ratio",
                          cmap="RdYlGn", edgecolors="black",
                          ax=ax)
    ax.set(xlabel="Volatility",
           ylabel="Expected Returns",
           title="Efficient Frontier")
    ax.plot(ef_vol_list, ef_rtn_list, "b--")
    for asset_index in range(n_assets):
        ax.scatter(x=np.sqrt(cov_mat.iloc[asset_index, asset_index]),
                   y=avg_returns[asset_index],
                   marker=MARKERS[asset_index],
                   s=150, color="black",
                   label=ASSETS[asset_index])
    ax.legend()
    plt.show()
    

    Executing the snippet generates the plot with all the randomly created portfolios, four points indicating the individual assets, and the efficient frontier.

Figure 11.10: The efficient frontier identified using Monte Carlo simulations

In Figure 11.10, we see the typical, bullet-like shape of the efficient frontier.

Some insights we could draw from analyzing the efficient frontier:

  • Anything to the left of the efficient frontier’s line is not achievable, as we cannot get that level of expected return for such a level of volatility.
  • The performance of a portfolio consisting only of Microsoft’s stock lies very close to the efficient frontier.

Ideally, we should search for a portfolio offering exceptional returns, but with a combined standard deviation that is lower than the standard deviations of the individual assets. For example, we should not consider a portfolio consisting only of Meta’s stock (it is not efficient), but rather the one that lies on the frontier directly above it. That is because the latter offers a much better expected return for the same level of expected volatility.

How it works...

In Step 2, we defined the parameters used for this recipe, such as the considered timeframe, the assets we wanted to use for building the portfolio, and the number of simulations. An important thing to note here is that we also ran ASSETS.sort() to sort the list alphabetically. This matters when interpreting the results, as when downloading data from Yahoo Finance using the yfinance library, the obtained prices are ordered alphabetically, not as specified in the provided list. Having downloaded the stock prices, we calculated simple returns using the pct_change method, and dropped the first row containing NaNs.

For evaluating the potential portfolios, we needed the average (expected) annual return and the corresponding covariance matrix. We obtained them by using the mean and cov methods of the DataFrame. We also annualized both metrics by multiplying them by 252 (the average number of trading days in a year).

We needed the covariance matrix, as for calculating the portfolio volatility, we also needed to account for the correlation between the assets. To benefit from significant diversification, the assets should have low positive or negative correlations.

In Step 5, we calculated the random portfolio weights. Following the assumptions of modern portfolio theory (refer to the chapter introduction), the weights need to be positive and sum up to 1. To achieve this, we first generated a matrix of random numbers (between 0 and 1) using np.random.random. The matrix was of size N_PORTFOLIOS by n_assets. To make sure the weights summed up to 1, we divided each row of the matrix by its sum.

In Step 6, we calculated the portfolio metrics—returns, standard deviation, and the Sharpe ratio. To calculate the expected annual portfolio returns, we had to multiply the weights by the previously calculated annual averages. For the standard deviations, we used the formula $\sigma_p = \sqrt{\mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}}$, where $\mathbf{w}$ is the vector of weights and $\boldsymbol{\Sigma}$ is the historical covariance matrix. We iterated over all the simulated portfolios using a for loop.

In this case, the for loop implementation is actually faster than the vectorized matrix equivalent: np.diag(np.sqrt(np.dot(weights, np.dot(cov_mat, weights.T)))). The reason is that the number of off-diagonal elements to compute grows quickly with the number of portfolios, and those elements are irrelevant for the metrics of interest. The vectorized approach only beats the for loop for a relatively small number of simulations (~100).
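
An alternative that stays vectorized while skipping the unnecessary off-diagonal terms is np.einsum; a sketch equivalent to the for loop above:

# computes w_i @ cov_mat @ w_i for every simulated portfolio i at once,
# without materializing the full N_PORTFOLIOS x N_PORTFOLIOS product
portf_vol = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov_mat, weights))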

For this example, we assumed that the risk-free rate was 0%, so the Sharpe ratio of the portfolio could be calculated as portfolio returns divided by the portfolio’s volatility. Another possible approach would be to calculate the average annual risk-free rate over 2021 and to use the portfolio excess returns for calculating the ratio.
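
If such a rate were at hand, the adjustment would be a one-liner; the value below is purely a hypothetical placeholder:

RF_RATE = 0.01  # hypothetical annualized risk-free rate
portf_sharpe_ratio = (portf_rtns - RF_RATE) / portf_vol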

One thing to keep in mind while finding the optimal asset allocation and evaluating its performance is that we are optimizing historically. We use the past performance to select the allocation that should work best, provided the market conditions do not change. As we know very well, that is rarely the case, thus past performance is not always indicative of future performance.

The last three steps led to visualizing the results. First, we put all the relevant metrics into a pandas DataFrame. Second, we identified the points of the efficient frontier. To do so, we created an array of expected returns from the sample. We used np.linspace, with the min and max values coming from the calculated portfolio returns. We rounded the numbers to two decimals to make the calculations smoother. For each expected return, we found the minimum observed volatility. In cases where there was no match, as can happen with equally spread points on the linear space, we skipped that point.

In the very last step, we plotted the simulated portfolios, the individual assets, and the approximated efficient frontier in one plot. The shape of the frontier was a bit jagged, which can be expected when using only simulated values that are not that frequent in some extreme areas. Additionally, we colored the dots representing the simulated portfolios by the value of the Sharpe ratio. Following the ratio’s definition, the upper-left part of the plot shows a sweet spot with the highest expected returns per expected volatility.

You can find the available colormaps in matplotlib documentation. Depending on the problem at hand, a different colormap might be more suitable (sequential, diverging, qualitative, and so on).

There’s more...

Having simulated 100,000 random portfolios, we can also investigate which one has the highest Sharpe ratio (maximum expected return per unit of risk, also known as the tangency portfolio) or minimum volatility. To locate these portfolios among the simulated ones, we use the np.argmin and np.argmax functions, which return the index of the minimum/maximum value in the array.

The code is as follows:

max_sharpe_ind = np.argmax(portf_results_df["sharpe_ratio"])
max_sharpe_portf = portf_results_df.loc[max_sharpe_ind]
min_vol_ind = np.argmin(portf_results_df["volatility"])
min_vol_portf = portf_results_df.loc[min_vol_ind]

We can also investigate the constituents of these portfolios, together with the expected performance. Here, we only focus on the results, but the code used for generating the summaries is available in the book’s GitHub repository.

The maximum Sharpe ratio portfolio allocates the majority of the resources (~95%) to Microsoft and virtually nothing to Twitter. That is because Twitter’s annualized average returns for 2021 were negative:

Maximum Sharpe Ratio portfolio ----
Performance
returns: 45.14% volatility: 20.95% sharpe_ratio: 215.46%
Weights
META: 2.60% MSFT: 95.17% TSLA: 2.04% TWTR: 0.19%

The minimum volatility portfolio assigns ~78% of the weight to Microsoft, as it is the stock with the lowest volatility (this can be inspected by viewing the covariance matrix):

Minimum Volatility portfolio ----
Performance
returns: 40.05% volatility: 20.46% sharpe_ratio: 195.76%
Weights
META: 17.35% MSFT: 78.16% TSLA: 0.23% TWTR: 4.26%

Lastly, we mark these two portfolios on the efficient frontier plot. To do so, we add two extra scatterplots, each with one point corresponding to the selected portfolio. We then define the marker shape with the marker argument and the marker size with the s argument. We increase the size of the markers to make the portfolios more visible among all other points.

The code is as follows:

fig, ax = plt.subplots()
portf_results_df.plot(kind="scatter", x="volatility",
                      y="returns", c="sharpe_ratio",
                      cmap="RdYlGn", edgecolors="black",
                      ax=ax)
ax.scatter(x=max_sharpe_portf["volatility"],
           y=max_sharpe_portf["returns"],
           c="black", marker="*",
           s=200, label="Max Sharpe Ratio")
ax.scatter(x=min_vol_portf["volatility"],
           y=min_vol_portf["returns"],
           c="black", marker="P",
           s=200, label="Minimum Volatility")
ax.set(xlabel="Volatility", ylabel="Expected Returns",
       title="Efficient Frontier")
ax.legend()
plt.show()

Executing the snippet generates the following figure:

Figure 11.11: Efficient frontier with the Global Minimum Volatility and Max Sharpe Ratio portfolios

We did not plot the individual assets and the efficient frontier’s line to avoid the plot becoming too cluttered. The plot aligns with the intuition we have built while analyzing Figure 11.10. First, the Minimum Volatility portfolio lies on the leftmost part of the frontier, which corresponds to the lowest expected volatility. Second, the Max Sharpe Ratio portfolio lies in the upper-left part of the plot, where the ratio of the expected returns to volatility is the highest.

Finding the efficient frontier using optimization with SciPy

In the previous recipe, Finding the efficient frontier using Monte Carlo simulations, we used a brute force approach based on Monte Carlo simulations to visualize the efficient frontier. In this recipe, we use a more refined method to find the frontier.

From its definition, the efficient frontier is formed by a set of portfolios offering the highest expected portfolio return for certain volatility, or offering the lowest risk (volatility) for a certain level of expected returns. We can leverage this fact, and use it in numerical optimization.

The goal of optimization is to find the best (optimal) value of the objective function by adjusting the target variables and taking into account some boundaries and constraints (which have an impact on the target variables). In this case, the objective function is a function returning portfolio volatility, and the target variables are portfolio weights.

Mathematically, the problem can be expressed as:

$$\min_{\mathbf{w}} \; \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^T \boldsymbol{\mu} = \mu_p, \quad \mathbf{w}^T \mathbf{1} = 1, \quad w_i \geq 0$$

Here, $\mathbf{w}$ is a vector of weights, $\boldsymbol{\Sigma}$ is the covariance matrix, $\boldsymbol{\mu}$ is a vector of expected returns, and $\mu_p$ is the expected portfolio return.

To find the efficient frontier, we iterate the optimization routine used for finding the optimal portfolio weights over a range of expected portfolio returns.

In this recipe, we work with the same dataset as in the previous one in order to show that the results obtained by both approaches are similar.

Getting ready

This recipe requires running all the code from the Finding the efficient frontier using Monte Carlo simulations recipe.

How to do it...

Execute the following steps to find the efficient frontier using optimization with SciPy:

  1. Import the libraries:
    import numpy as np
    import scipy.optimize as sco
    from chapter_11_utils import print_portfolio_summary
    
  2. Define functions for calculating portfolio returns and volatility:
    def get_portf_rtn(w, avg_rtns):
        return np.sum(avg_rtns * w)
    def get_portf_vol(w, avg_rtns, cov_mat):
        return np.sqrt(np.dot(w.T, np.dot(cov_mat, w)))
    
  3. Define the function calculating the efficient frontier:
    def get_efficient_frontier(avg_rtns, cov_mat, rtns_range):
        
        efficient_portfolios = []
        
        n_assets = len(avg_rtns)
        args = (avg_rtns, cov_mat)
        bounds = tuple((0, 1) for asset in range(n_assets))
        initial_guess = n_assets * [1. / n_assets, ]
        
        for ret in rtns_range:
            constr = (
                {"type": "eq",
                 "fun": lambda x: get_portf_rtn(x, avg_rtns) - ret},
                {"type": "eq", 
                 "fun": lambda x: np.sum(x) - 1}
            )
            ef_portf = sco.minimize(get_portf_vol, 
                                    initial_guess, 
                                    args=args, method="SLSQP", 
                                    constraints=constr,
                                    bounds=bounds)
            efficient_portfolios.append(ef_portf)
        
        return efficient_portfolios
    
  4. Define the considered range of expected portfolio returns:
    rtns_range = np.linspace(-0.1, 0.55, 200)
    
  5. Calculate the efficient frontier:
    efficient_portfolios = get_efficient_frontier(avg_returns,
                                                  cov_mat,
                                                  rtns_range)
    
  6. Extract the volatilities of the efficient portfolios:
    vols_range = [x["fun"] for x in efficient_portfolios]
    
  7. Plot the calculated efficient frontier, together with the simulated portfolios:
    fig, ax = plt.subplots()
    portf_results_df.plot(kind="scatter", x="volatility",
                          y="returns", c="sharpe_ratio",
                          cmap="RdYlGn", edgecolors="black",
                          ax=ax)
    ax.plot(vols_range, rtns_range, "b--", linewidth=3)
    ax.set(xlabel="Volatility",
           ylabel="Expected Returns",
           title="Efficient Frontier")
    plt.show()
    

    The following figure presents a graph of the efficient frontier, calculated using numerical optimization:

    Figure 11.12: Efficient frontier identified using numerical optimization together with the previously generated random portfolios

    We see that the efficient frontier has a very similar shape to the one obtained using Monte Carlo simulations. The only difference is that the line is smoother.

  8. Identify the minimum volatility portfolio:
    min_vol_ind = np.argmin(vols_range)
    min_vol_portf_rtn = rtns_range[min_vol_ind]
    min_vol_portf_vol = efficient_portfolios[min_vol_ind]["fun"]
    min_vol_portf = {
        "Return": min_vol_portf_rtn,
        "Volatility": min_vol_portf_vol,
        "Sharpe Ratio": (min_vol_portf_rtn / min_vol_portf_vol)
    }
    
  9. Print the performance summary:
    print_portfolio_summary(min_vol_portf,
                            efficient_portfolios[min_vol_ind]["x"],
                            ASSETS,
                            name="Minimum Volatility")
    

Running the snippet results in the following summary:

Minimum Volatility portfolio ----
Performance
Return: 40.30% Volatility: 20.45% Sharpe Ratio: 197.10%
Weights
META: 15.98% MSFT: 79.82% TSLA: 0.00% TWTR: 4.20%

The minimum volatility portfolio is achieved by investing mostly in Microsoft and Meta, while not investing in Tesla at all.

How it works...

As mentioned in the introduction, we continued the example from the previous recipe. That is why we had to run Steps 1 to 4 from there (not shown here for brevity), to have all the required data. As an extra prerequisite, we had to import the optimization module from SciPy.

In Step 2, we defined two functions, which return the expected portfolio return and volatility, given historical data and the portfolio weights. We had to define these functions instead of calculating these metrics directly as we use them later on in the optimization procedure. The algorithm iteratively tries different weights and needs to be able to use the current values of the target variables (weights) to arrive at the metric it tries to optimize.

In Step 3, we defined a function called get_efficient_frontier. Its goal is to return a list containing the efficient portfolios, given historical metrics and the considered range of expected portfolio returns. This was the most important step of the recipe and contained a lot of nuances. We describe the logic of the function sequentially:

  1. The outline of the function is that it runs the optimization procedure for each expected portfolio return in the considered range, and stores the resulting optimal portfolio in a list.
  2. Outside of the for loop, we defined a couple of objects that we pass into the optimizer:
    • The arguments that are passed to the objective function. In this case, these are the historical average returns and the covariance matrix. The function that we optimize must accept the arguments as inputs. That is why we pass the returns to the get_portf_vol function (defined in Step 2), even though they are not necessary for calculations and are not used within the function.
    • bounds (a nested tuple)—for each target variable (weight), we provide a tuple containing the boundary values, that is, the minimum and maximum allowable values. In this case, the values span the range from 0 to 1 (no negative weights, as per the MPT).
    • initial_guess, which is the initial guess of the target variables. The goal of using the initial guess is to make the optimization run faster and more efficiently. In this case, the guess is the equally-weighted allocation.
  3. Inside the for loop, we defined the last element used for the optimization—the constraints. We defined two constraints:
    • The expected portfolio return must be equal to the provided value.
    • The sum of the weights must be equal to 1.

    The first constraint is the reason why the constraint’s tuple is defined within the loop. That is because the loop passes over the considered range of expected portfolio returns, and for each value, we find the optimal risk level.

  4. We run the optimizer with the Sequential Least-Squares Programming (SLSQP) algorithm, which is frequently used for generic minimization problems. For the function to be minimized, we pass the previously defined get_portf_vol function.

The optimizer requires each equality (eq) constraint to be expressed as a function that evaluates to 0 at the solution. That is why the intended constraint, np.sum(weights) == 1, is expressed as np.sum(weights) - 1 == 0.

In Steps 4 and 5, we defined the range of expected portfolio returns (based on the range we empirically observed in the previous recipe) and ran the optimization function.

In Step 6, we iterated over the list of efficient portfolios and extracted the optimal volatilities. We extracted the volatility from the scipy.optimize.OptimizeResult object by accessing the fun element. This stands for the optimized objective function which is, in this case, the portfolio volatility.

In Step 7, we added the calculated efficient frontier on top of the plot from the previous recipe, Finding the efficient frontier using Monte Carlo simulations. All the simulated portfolios lie on or below the efficient frontier, which is what we expected to happen.

In Steps 8 and 9, we identified the minimum volatility portfolio, printed the performance metrics, and showed the portfolio’s weights (extracted from the efficient frontier).

We can now compare the two minimum volatility portfolios: the one obtained using Monte Carlo simulations, and the one we received from optimization. The prevailing pattern in the allocation is the same—allocate the majority of the available resources to Meta and Microsoft. We can also see that the volatility of the optimized strategy is slightly lower. This means that among the 100,000 portfolios, we have not simulated the actual minimum volatility portfolio (for the considered range of expected portfolio returns).

There’s more...

We can also use the optimization approach to find the weights that generate a portfolio with the highest expected Sharpe ratio, that is, the tangency portfolio. To do so, we first need to redefine the objective function, which now will be the negative of the Sharpe ratio. The reason why we use the negative is that optimization algorithms run minimization problems. We can easily approach maximization problems by changing the sign of the objective function:

  1. Define the new objective function (negative Sharpe ratio):
    def neg_sharpe_ratio(w, avg_rtns, cov_mat, rf_rate):
        portf_returns = np.sum(avg_rtns * w)
        portf_volatility = np.sqrt(np.dot(w.T, np.dot(cov_mat, w)))
        portf_sharpe_ratio = (
            (portf_returns - rf_rate) / portf_volatility
        )
        return -portf_sharpe_ratio
    

    The second step is very similar to what we have already done with the efficient frontier, this time without the for loop, as we are only searching for one set of weights. We include the risk-free rate in the arguments (though we assume it is 0%, for simplicity) and only use one constraint—the sum of the target variables must be equal to 1.

  2. Find the optimized portfolio:
    n_assets = len(avg_returns)
    RF_RATE = 0
    args = (avg_returns, cov_mat, RF_RATE)
    constraints = ({"type": "eq",
                    "fun": lambda x: np.sum(x) - 1})
    bounds = tuple((0,1) for asset in range(n_assets))
    initial_guess = n_assets * [1. / n_assets]
    max_sharpe_portf = sco.minimize(neg_sharpe_ratio,
                                    x0=initial_guess,
                                    args=args,
                                    method="SLSQP",
                                    bounds=bounds,
                                    constraints=constraints)
    
  3. Extract information about the maximum Sharpe ratio portfolio:
    max_sharpe_portf_w = max_sharpe_portf["x"]
    max_sharpe_portf = {
        "Return": get_portf_rtn(max_sharpe_portf_w, avg_returns),
        "Volatility": get_portf_vol(max_sharpe_portf_w, 
                                    avg_returns,
                                    cov_mat),
        "Sharpe Ratio": -max_sharpe_portf["fun"]
    }
    
  4. Print the performance summary:
    print_portfolio_summary(max_sharpe_portf,
                            max_sharpe_portf_w,
                            ASSETS,
                            name="Maximum Sharpe Ratio")
    

Running the snippet prints the following summary of the portfolio maximizing the Sharpe ratio:

Maximum Sharpe Ratio portfolio ----
Performance
Return: 45.90% Volatility: 21.17% Sharpe Ratio: 216.80%
Weights
META: 0.00% MSFT: 96.27% TSLA: 3.73% TWTR: 0.00%

To achieve the maximum Sharpe ratio, the investor should invest mostly in Microsoft (>96% allocation), with a 0% allocation to Meta and Twitter.

See also

  • Markowitz, H., 1952. “Portfolio Selection,” The Journal of Finance, 7(1): 77–91

Finding the efficient frontier using convex optimization with CVXPY

In the previous recipe, Finding the efficient frontier using optimization with SciPy, we found the efficient frontier using numerical optimization with the SciPy library. We used portfolio volatility as the metric we wanted to minimize. However, it is also possible to state the same problem a bit differently and use convex optimization to find the efficient frontier.

We can reframe the mean-variance optimization problem into a risk-aversion framework, in which the investor wants to maximize the risk-adjusted return:

$$\max_{\mathbf{w}} \; \mathbf{w}^T \boldsymbol{\mu} - \gamma \, \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^T \mathbf{1} = 1, \quad w_i \geq 0$$

Here, $\gamma \in [0, \infty)$ is the risk-aversion parameter, and the constraints specify that the weights must sum up to 1 and that short-selling is not allowed. The higher the value of $\gamma$, the more risk-averse the investor is.

Short-selling involves borrowing an asset and selling it on the open market, with the aim of buying it back later at a lower price. Our gain is the difference left after repaying the initial loan. In this recipe, we use the same data as in the previous two recipes, to make sure the results are comparable.

Getting ready

This recipe requires running all the code from the previous recipes:

  • Finding the efficient frontier using Monte Carlo simulations
  • Finding the efficient frontier using optimization with SciPy

How to do it...

Execute the following steps to find the efficient frontier using convex optimization:

  1. Import the library:
    import cvxpy as cp
    
  2. Convert the annualized average returns and the covariance matrix to numpy arrays:
    avg_returns = avg_returns.values
    cov_mat = cov_mat.values
    
  3. Set up the optimization problem:
    weights = cp.Variable(n_assets)
    gamma_par = cp.Parameter(nonneg=True)
    portf_rtn_cvx = avg_returns @ weights
    portf_vol_cvx = cp.quad_form(weights, cov_mat)
    objective_function = cp.Maximize(
        portf_rtn_cvx - gamma_par * portf_vol_cvx
    )
    problem = cp.Problem(
        objective_function,
        [cp.sum(weights) == 1, weights >= 0]
    )
    
  4. Calculate the efficient frontier:
    N_POINTS = 25
    portf_rtn_cvx_ef = []
    portf_vol_cvx_ef = []
    weights_ef = []
    gamma_range = np.logspace(-3, 3, num=N_POINTS)
    for gamma in gamma_range:
        gamma_par.value = gamma
        problem.solve()
        portf_vol_cvx_ef.append(cp.sqrt(portf_vol_cvx).value)
        portf_rtn_cvx_ef.append(portf_rtn_cvx.value)
        weights_ef.append(weights.value)
    
  5. Plot the allocation for different values of the risk-aversion parameter:
    weights_df = pd.DataFrame(weights_ef,
                              columns=ASSETS,
                              index=np.round(gamma_range, 3))
    ax = weights_df.plot(kind="bar", stacked=True)
    ax.set(title="Weights allocation per risk-aversion level",
           xlabel=r"$gamma$",
           ylabel="weight")
    ax.legend(bbox_to_anchor=(1,1))
    

    In Figure 11.13, we can see the asset allocation for the considered range of the risk-aversion parameter ($\gamma$):

    Figure 11.13: Asset allocation per various levels of risk-aversion

    In Figure 11.13, we can see that for very small values of $\gamma$, the investor would allocate 100% of their resources to Tesla. As we increased the risk aversion, the allocation to Tesla grew smaller, and more weight was allocated to Microsoft and the other assets. At the other end of the considered range of the parameter, the investor would allocate 0% to Tesla.

  6. Plot the efficient frontier, together with the individual assets:
    fig, ax = plt.subplots()
    ax.plot(portf_vol_cvx_ef, portf_rtn_cvx_ef, "g-")
    for asset_index in range(n_assets):
         plt.scatter(x=np.sqrt(cov_mat[asset_index, asset_index]),
                     y=avg_returns[asset_index],
                     marker=MARKERS[asset_index],
                     label=ASSETS[asset_index],
                     s=150)
    ax.set(title="Efficient Frontier",
           xlabel="Volatility",
           ylabel="Expected Returns")
    ax.legend()
    

    Figure 11.14 presents the efficient frontier, generated by solving the convex optimization problem.

Figure 11.14: Efficient frontier identified by solving the convex optimization problem

The generated frontier is similar to the one in Figure 11.10 (generated using Monte Carlo simulations). Back then, we established that a portfolio consisting of only Microsoft’s stocks lies very close to the efficient frontier. Now we can say the same about the portfolio comprised entirely of Tesla’s stocks. When using Monte Carlo simulations, we did not have enough observations generated in that part of the returns/volatility plane to draw the efficient frontier line around that portfolio. In the There’s more... section, we also compare this frontier to the one obtained in the previous recipe, in which we used the SciPy library.

How it works...

As mentioned in the introduction, we continued the example from the previous two recipes. That is why we had to run Steps 1 to 4 from the Finding the efficient frontier using Monte Carlo simulations recipe (not shown here for brevity) to have all the required data. As an extra step, we had to import the cvxpy convex optimization library. We additionally converted the historical average returns and the covariance matrix into numpy arrays.

In Step 3, we set up the optimization problem. We started by defining the target variables (weights), the risk-aversion parameter (gamma_par, where “par” is added to highlight that it is a parameter of the optimization routine), the portfolio returns and volatility (both using the previously defined weights object), and lastly, the objective function—the risk-adjusted returns we want to maximize. Then, we created the cp.Problem object and passed the objective function and a list of constraints as arguments.

We used cp.quad_form(x, P) to express the quadratic form $\mathbf{x}^T \mathbf{P} \mathbf{x}$.
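
As a quick sanity check with made-up numbers, the expression agrees with the explicit matrix product:

import numpy as np
import cvxpy as cp

w = np.array([0.25, 0.75])
S = np.array([[0.04, 0.01],
              [0.01, 0.09]])

print(w.T @ S @ w)               # 0.056875
print(cp.quad_form(w, S).value)  # 0.056875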

In Step 4, we found the efficient frontier by solving the convex optimization problem for multiple values of the risk-aversion parameter. To define the considered values, we used the np.logspace function to get 25 values of $\gamma$. For each value of the parameter, we found the optimal solution by running problem.solve(). We stored the values of interest in dedicated lists.

np.logspace is similar to np.linspace; the difference is that the former finds numbers evenly spread on a log scale instead of a linear scale.
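
A quick illustration of the difference:

import numpy as np

np.linspace(1, 1000, 4)   # array([   1.,  334.,  667., 1000.])
np.logspace(0, 3, 4)      # array([   1.,   10.,  100., 1000.])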

In Step 5, we plotted the asset allocation per various levels of risk aversion. Lastly, we plotted the efficient frontier, together with the individual assets.

There’s more...

Comparing the results from two formulations of the asset allocation problem

We can also plot the two efficient frontiers for comparison—the one calculated by minimizing the volatility per expected level of return, and the other one using convex optimization and maximizing the risk-adjusted return:

x_lim = [0.2, 0.6]
y_lim = [0.4, 0.6]
fig, ax = plt.subplots(1, 2)
ax[0].plot(vols_range, rtns_range, "g-", linewidth=3)
ax[0].set(title="Efficient Frontier - Minimized Volatility",
          xlabel="Volatility",
          ylabel="Expected Returns",
          xlim=x_lim,
          ylim=y_lim)
ax[1].plot(portf_vol_cvx_ef, portf_rtn_cvx_ef, "g-", linewidth=3)
ax[1].set(title="Efficient Frontier - Maximized Risk-Adjusted Return",
          xlabel="Volatility",
          ylabel="Expected Returns",
          xlim=x_lim,
          ylim=y_lim)

Executing the snippet generates the following plots:

Figure 11.15: Comparison of efficient frontiers generated by minimizing volatility per expected level of return (left) and by maximizing the risk-adjusted return (right)

As we can see, the generated efficient frontiers are very similar, with some minor differences. First, the one obtained using minimization is smoother, as we used more points to calculate the frontier. Second, the right one is defined for a slightly larger range of possible volatility/returns pairs.

Allowing for leverage

Another interesting concept we can incorporate into the analysis is the maximum allowable leverage. We replace the non-negativity constraints on the weights with a max leverage constraint, using the norm of a vector.

In the following snippet, we only show what was added on top of the things we defined in Step 3:

max_leverage = cp.Parameter()
prob_with_leverage = cp.Problem(objective_function, 
                                [cp.sum(weights) == 1, 
                                cp.norm(weights, 1) <= max_leverage])

In the next snippet, we modify the code, this time to include two loops—one over potential values of the risk-aversion parameter, and the other one indicating the maximum allowable leverage. Max leverage equal to 1 (meaning no leverage) results in a case similar to the previous optimization problem (only this time, there is no non-negativity constraint).

We also redefine the placeholder objects (used for storing the results) to be either 2D matrices (np.ndarrays) or including the third dimension, in the case of weights.

LEVERAGE_RANGE = [1, 2, 5]
len_leverage = len(LEVERAGE_RANGE)
N_POINTS = 25
 
portf_vol_l = np.zeros((N_POINTS, len_leverage))
portf_rtn_l = np.zeros((N_POINTS, len_leverage))
weights_ef = np.zeros((len_leverage, N_POINTS, n_assets))
 
for lev_ind, leverage in enumerate(LEVERAGE_RANGE):
    for gamma_ind in range(N_POINTS):
        max_leverage.value = leverage
        gamma_par.value = gamma_range[gamma_ind]
        prob_with_leverage.solve()
        portf_vol_l[gamma_ind, lev_ind] = cp.sqrt(portf_vol_cvx).value
        portf_rtn_l[gamma_ind, lev_ind] = portf_rtn_cvx.value
        weights_ef[lev_ind, gamma_ind, :] = weights.value      

In the following snippet, we plot the efficient frontiers for different maximum leverages. We can clearly see that higher leverage increases returns and, at the same time, allows for greater volatility.

fig, ax = plt.subplots()
for leverage_index, leverage in enumerate(LEVERAGE_RANGE):
    plt.plot(portf_vol_l[:, leverage_index], 
             portf_rtn_l[:, leverage_index], 
             label=f"{leverage}")
ax.set(title="Efficient Frontier for different max leverage",
       xlabel="Volatility",
       ylabel="Expected Returns")
ax.legend(title="Max leverage")

Executing the code generates the following figure.

Figure 11.16: Efficient frontier for different values of maximum leverage

Lastly, we also recreate the plot showing weight allocation per varying risk-aversion levels. With a maximum leverage of 1, there is no short selling.

fig, ax = plt.subplots(len_leverage, 1, sharex=True)
for ax_index in range(len_leverage):
    weights_df = pd.DataFrame(weights_ef[ax_index], 
                              columns=ASSETS, 
                              index=np.round(gamma_range, 3))
    weights_df.plot(kind="bar", 
                    stacked=True, 
                    ax=ax[ax_index], 
                    legend=None) 
    ax[ax_index].set(
        ylabel=(f"max_leverage = {LEVERAGE_RANGE[ax_index]}"
                "\n weight")
    )
ax[len_leverage - 1].set(xlabel=r"$\gamma$")
ax[0].legend(bbox_to_anchor=(1,1))
ax[0].set_title("Weights allocation per risk aversion level",
                fontsize=16)

Executing the snippet generates the following figure.

Figure 11.17: Asset allocation per different levels of risk aversion and maximum leverage

We can spot a clear pattern: with an increase in risk aversion, investors stop using leverage altogether and converge to a similar allocation for all levels of the maximum permitted leverage.

Finding the optimal portfolio with Hierarchical Risk Parity

De Prado (2018) explains that quadratic optimizers tend to deliver unreliable solutions, due to their instability, concentration, and underperformance. The main reason for all those troubles is the need to invert the covariance matrix, which is prone to cause large errors when the matrix is numerically ill-conditioned. He also refers to Markowitz’s curse, which implies that the more correlated the investments are, the greater the need for diversification, which in turn leads to bigger estimation errors in the portfolio weights.

A potential solution is to introduce a hierarchical structure, so that small estimation errors no longer lead to entirely different allocations. Without such a structure, quadratic optimizers have complete freedom to fully reshuffle the weights to their liking (unless some explicit constraints are enforced), which is precisely what makes them so sensitive to estimation errors.

Hierarchical Risk Parity (HRP) is a novel portfolio optimization method that combines graph theory and machine learning techniques in order to build a diversified portfolio based on the information available in the covariance matrix. At a very high level, the algorithm works as follows:

  1. Calculate a distance matrix based on the correlation of the assets (covariance matrix).
  2. Cluster the assets into a tree structure with hierarchical clustering (based on the distance matrix).
  3. Calculate the minimum variance portfolio within each branch of the tree.
  4. Iterate over the levels of the tree and combine the portfolios at each node.

For a more detailed description of the algorithm, please refer to De Prado (2018).
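
To give a flavor of the first two steps, the sketch below derives De Prado’s correlation-based distance and feeds it to SciPy’s hierarchical clustering; this is an illustration of the idea, not the internals of any particular library:

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def correlation_distance(returns):
    # De Prado's distance: d_ij = sqrt((1 - rho_ij) / 2), so perfectly
    # correlated assets are at distance 0 and perfectly anti-correlated
    # ones at distance 1
    corr = returns.corr()
    return np.sqrt((1 - corr) / 2)

# given a DataFrame of daily asset returns, e.g., rtn_df from the recipe:
# dist = correlation_distance(rtn_df)
# clusters = linkage(squareform(dist.values, checks=False), method="single")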

We also mention some of the advantages of the HRP approach:

  • It fully utilizes the information from the covariance matrix and does not require inverting it.
  • It treats clustered assets as complements, rather than substitutes.
  • The weights produced by the algorithm are more stable and robust.
  • The solution can be intuitively understood with the help of visualizations.
  • We can include additional constraints.
  • Literature suggests that the method outperforms the classical mean-variance approaches out-of-sample.

In this recipe, we apply the Hierarchical Risk Parity algorithm to form a portfolio from the stocks of the 10 biggest US tech companies.

How to do it…

Execute the following steps to find the optimal asset allocation using the HRP:

  1. Import the libraries:
    import yfinance as yf
    import pandas as pd
    import matplotlib.pyplot as plt
    from pypfopt.expected_returns import returns_from_prices
    from pypfopt.hierarchical_portfolio import HRPOpt
    from pypfopt.discrete_allocation import (DiscreteAllocation, 
                                             get_latest_prices)
    from pypfopt import plotting
    
  2. Download the stock prices of the 10 biggest US tech companies:
    ASSETS = ["AAPL", "MSFT", "AMZN", "GOOG", "META",
              "V", "NVDA", "MA", "PYPL", "NFLX"]
    prices_df = yf.download(ASSETS,
                            start="2021-01-01",
                            end="2021-12-31",
                            auto_adjust=False)
    prices_df = prices_df["Adj Close"]
    
  3. Calculate the returns from prices:
    rtn_df = returns_from_prices(prices_df)
    
  4. Find the optimal allocation using Hierarchical Risk Parity:
    hrp = HRPOpt(returns=rtn_df)
    hrp.optimize()
    
  5. Display the (cleaned) weights:
    weights = hrp.clean_weights()
    print(weights)
    

    This returns the following portfolio weights:

    OrderedDict([('AAPL', 0.12992), ('AMZN', 0.156), ('META', 0.08134), ('GOOG', 0.08532), ('MA', 0.10028), ('MSFT', 0.1083), ('NFLX', 0.10164), ('NVDA', 0.04466), ('PYPL', 0.05326), ('V', 0.13928)])
    
  6. Calculate the portfolio performance:
    hrp.portfolio_performance(verbose=True, risk_free_rate=0);
    

    which returns the following evaluation metrics:

    Expected annual return: 23.3%
    Annual volatility: 19.2%
    Sharpe Ratio: 1.21
    
  7. Visualize the hierarchical clustering used for finding the portfolio weights:
    fig, ax = plt.subplots()
    plotting.plot_dendrogram(hrp, ax=ax)
    ax.set_title("Dendrogram of cluster formation")
    plt.show()
    

    Running the snippet generates the following plot:

    Figure 11.18: Dendrogram visualizing the process of cluster formation

    In Figure 11.18, we can see that companies such as Visa and MasterCard were clustered together. In the plot, the y-axis represents the distance at which the two leaves are merged.

    This clustering makes sense: if we wanted to invest in a publicly traded US credit card company such as Visa, we might consider adding or reducing the allocation to another very similar company, such as MasterCard. The same applies to Google and Microsoft, although the distance between those two companies is larger. This is the very idea behind applying a hierarchical structure to the correlations between the assets.

  8. Find the number of stocks to buy using 50,000 USD:
    latest_prices = get_latest_prices(prices_df)
    allocation_finder = DiscreteAllocation(weights,
                                           latest_prices,
                                           total_portfolio_value=50000)
    allocation, leftover = allocation_finder.lp_portfolio()
    print(allocation)
    print(leftover)
    

Running the snippet prints the following dictionary of the suggested number of stocks to purchase and the leftover cash:

{'AAPL': 36, 'AMZN': 2, 'META': 12, 'GOOG': 2, 'MA': 14, 'MSFT': 16, 'NFLX': 8, 'NVDA': 7, 'PYPL': 14, 'V': 31}
12.54937744140625

How it works…

After importing the libraries, we downloaded the stock prices of the 10 largest US tech companies for the year 2021. In Step 3, we created a DataFrame containing the daily stock returns using the returns_from_prices function.

In Step 4, we instantiated the HRPOpt object and passed in the stock returns as input. Then, we used the optimize method to find the optimal weights. An inquisitive reader might notice that when describing the algorithm, we mentioned that it is based on the covariance matrix, while we used the return series as input. Under the hood, when we pass in the returns argument, the class computes the covariance matrix for us. Alternatively, we can pass in the covariance matrix directly using the cov_matrix argument.

When passing the covariance matrix directly, we can benefit from using alternative estimators instead of the sample covariance, for example, the Ledoit-Wolf shrinkage or the oracle approximating shrinkage (OAS). You can find references for those methods in the See also section.
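
For example, a minimal sketch that builds the HRP portfolio from the Ledoit-Wolf shrinkage estimate (reusing prices_df from the recipe):

    from pypfopt.risk_models import CovarianceShrinkage

    # pass the shrunk covariance matrix instead of the raw return series
    shrunk_cov = CovarianceShrinkage(prices_df).ledoit_wolf()
    hrp_lw = HRPOpt(cov_matrix=shrunk_cov)
    hrp_lw.optimize()
    print(hrp_lw.clean_weights())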

Then, in Step 5, we displayed the cleaned weights using the clean_weights method. It is a helper method that rounds the weights to 5 decimals (adjustable) and sets any weights below a certain cutoff to 0. In Step 6, we calculated the portfolio’s expected performance using the portfolio_performance method, changing the default risk-free rate to 0%. With a zero risk-free rate, the reported Sharpe ratio is simply the expected return divided by the volatility (0.233 / 0.192 ≈ 1.21).
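
Returning to the clean_weights method, both settings can be adjusted; a minimal sketch (the values are illustrative):

    # zero out any weight below 1% and round the remaining ones to 3 decimals
    weights_coarse = hrp.clean_weights(cutoff=0.01, rounding=3)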

In Step 7, we plotted the results of the hierarchical clustering using the plot_dendrogram function. The figure produced by this function is very useful for understanding how the algorithm works and which assets were clustered together.

In Step 8, we performed a discrete allocation based on the calculated weights. We assumed we had 50,000 USD and wanted to allocate as much of it as possible using the HRP weights. First, we recovered the latest prices from the downloaded data, that is, those from 2021-12-30 (yfinance treats the end date as exclusive). Then, we instantiated an object of the DiscreteAllocation class by providing the weights, the latest prices, and our budget. Lastly, we used the lp_portfolio method, which employs linear programming to find how many shares of each stock we should buy while staying within the budget. We obtained two objects as output: a dictionary mapping assets to the number of shares to purchase, and the leftover cash.

An alternative to linear programming is the greedy iterative search, available under the greedy_portfolio method.
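
For instance, a minimal sketch reusing the allocation_finder object from Step 8:

    # greedy iterative search instead of linear programming
    allocation, leftover = allocation_finder.greedy_portfolio()
    print(allocation, leftover)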

There’s more...

PyPortfolioOpt has much more to offer than we have covered. For example, it greatly simplifies obtaining the efficient frontier. We can calculate it using the following steps:

  1. Import the libraries:
    from pypfopt.expected_returns import mean_historical_return
    from pypfopt.risk_models import CovarianceShrinkage
    from pypfopt.efficient_frontier import EfficientFrontier
    from pypfopt.plotting import plot_efficient_frontier
    
  2. Get the expected returns and the covariance matrix:
    mu = mean_historical_return(prices_df)
    S = CovarianceShrinkage(prices_df).ledoit_wolf()
    

    As we have already established multiple times in this chapter, mean-variance optimization requires two components: the expected returns of the assets and their covariance matrix. PyPortfolioOpt offers multiple possibilities for calculating both of them. While we have already mentioned alternatives to the covariance matrix, you can use the following for the expected returns: historical mean return, exponentially weighted mean historical return, and CAPM estimate of returns. Here, we calculated the historical mean and the Ledoit-Wolf shrinkage estimate of the covariance matrix.
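
    As a sketch, the alternative estimators of the expected returns mentioned above can be obtained as follows (the span value is shown for illustration):

    from pypfopt.expected_returns import (mean_historical_return,
                                          ema_historical_return,
                                          capm_return)

    # three alternative estimates of the expected returns from the same prices
    mu_hist = mean_historical_return(prices_df)
    mu_ema = ema_historical_return(prices_df, span=500)
    mu_capm = capm_return(prices_df)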

  3. Find and plot the efficient frontier:
    ef = EfficientFrontier(mu, S)
    fig, ax = plt.subplots()
    plot_efficient_frontier(ef, ax=ax, show_assets=True)
    ax.set_title("Efficient Frontier")
    

    Running the snippet generates the following figure:

    Figure 11.19: Efficient frontier obtained using the Ledoit-Wolf shrinkage estimate of the covariance matrix

  4. Identify the tangency portfolio:
    ef = EfficientFrontier(mu, S)
    weights = ef.max_sharpe(risk_free_rate=0)
    print(ef.clean_weights())
    

    This returns the following portfolio weights:

    OrderedDict([('AAPL', 0.0), ('AMZN', 0.0), ('META', 0.0), ('GOOG', 0.55146), ('MA', 0.0), ('MSFT', 0.11808), ('NFLX', 0.0), ('NVDA', 0.33046), ('PYPL', 0.0), ('V', 0.0)])
    

The EfficientFrontier class allows for identifying more than just the tangency portfolio. We can also use the following methods:

  • min_volatility: Finds the portfolio with minimum volatility.
  • max_quadratic_utility: Finds the portfolio that maximizes the quadratic utility, given a level of risk aversion. This is the same approach as the one we have covered in the previous recipe.
  • efficient_risk: Finds a portfolio that maximizes the return for a given target risk.
  • efficient_return: Finds a portfolio that minimizes the risk for a given target return.

For the last two options, we can generate market neutral portfolios, that is, portfolios with weights summing up to zero.
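
For example, a minimal sketch of a market-neutral portfolio (the 20% target return is illustrative, and the weight bounds must allow short positions):

    # market neutrality requires shorting, hence the (-1, 1) weight bounds
    ef_mn = EfficientFrontier(mu, S, weight_bounds=(-1, 1))
    ef_mn.efficient_return(target_return=0.20, market_neutral=True)
    print(ef_mn.clean_weights())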

As we have mentioned before, the functionalities we showed are just the proverbial tip of the iceberg. Using the library, we can also explore the following:

  • Incorporate sector constraints: Let’s assume you want to have a portfolio of stocks from various sectors, while keeping some conditions, for example, having at least 20% in tech.
  • Optimize for transaction costs: In a case when we already have a portfolio and want to rebalance, it might be quite expensive to completely rebalance the portfolio (and as we have discussed before, the instability of the portfolio weights can be a big disadvantage of the mean-variance optimization). In such a case, we can add an additional objective to rebalance the portfolio while keeping the transaction costs as low as possible.
  • Use the L2 regularization while optimizing the portfolio: By using regularization, we counter the behavior of many weights dropping to zero. We can experiment with different values of the gamma parameter to find the allocation that works best for us (see the sketch after this list). You might already be familiar with L2 regularization thanks to the famous Ridge Regression algorithm.
  • Use the Black-Litterman model to obtain more stable estimates of the expected returns than the plain historical means. It is a Bayesian approach to asset allocation, which combines a prior estimate of returns with views on certain assets to arrive at a posterior estimate of expected returns.
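
As promised, a sketch of the L2 regularization point (the gamma value is illustrative, and we pair the regularization with the minimum-volatility objective):

    from pypfopt import objective_functions

    # penalize the sum of squared weights so that fewer of them collapse
    # to zero; a larger gamma spreads the allocation more evenly
    ef_l2 = EfficientFrontier(mu, S)
    ef_l2.add_objective(objective_functions.L2_reg, gamma=0.1)
    ef_l2.min_volatility()
    print(ef_l2.clean_weights())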

In the notebook on GitHub, you can also find short examples of finding the efficient frontier while allowing for short-selling or using L2 regularization.

You can also experiment with not using the expected returns. Literature suggests that due to the difficulties in getting an accurate estimate of expected returns, minimum variance portfolios consistently outperform the maximum Sharpe ratio portfolios out-of-sample.
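
Conveniently, the minimum-variance portfolio does not require the expected returns at all; a minimal sketch reusing the covariance matrix S from the previous steps:

    # the expected returns can be None when we only optimize for volatility
    ef_mv = EfficientFrontier(None, S)
    ef_mv.min_volatility()
    print(ef_mv.clean_weights())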

See also

Additional resources concerning the approaches mentioned in the recipe:

  • Black, F., & Litterman, R. 1991. “Combining investor views with market equilibrium,” The Journal of Fixed Income, 1(2): 7-18: https://doi.org/10.3905/jfi.1991.408013
  • Black, F., & Litterman, R. 1992. “Global portfolio optimization,” Financial Analysts Journal, 48(5): 28-43
  • Chen, Y., Wiesel, A., Eldar, Y. C., & Hero, A. O. 2010. “Shrinkage Algorithms for MMSE Covariance Estimation,” IEEE Transactions on Signal Processing, 58(10): 5016-5029: https://doi.org/10.1109/TSP.2010.2053029
  • De Prado, M. L. 2016. “Building diversified portfolios that outperform out of sample,” The Journal of Portfolio Management, 42(4): 59-69: https://doi.org/10.3905/jpm.2016.42.4.059
  • De Prado, M. L. 2018. Advances in Financial Machine Learning. John Wiley & Sons
  • Ledoit, O., & Wolf, M. 2003 “Improved estimation of the covariance matrix of stock returns with an application to portfolio selection,” Journal of Empirical Finance, 10(5): 603-621
  • Ledoit, O., & Wolf, M. 2004. “Honey, I shrunk the sample covariance matrix,” The Journal of Portfolio Management, 30(4): 110-119: https://doi.org/10.3905/jpm.2004.110

Summary

In this chapter, we have learned about asset allocation. We started with the simplest equally-weighted portfolio, which was proven to be quite difficult to outperform, even with advanced optimization techniques. Then, we explored various approaches to calculating the efficient frontier using mean-variance optimization. Lastly, we also touched upon one of the recent developments in asset allocation, namely the Hierarchical Risk Parity algorithm.


In the next chapter, we cover various methods of backtesting trading and asset allocation strategies.
