Constructing a struct of arrays

Constructing a struct of arrays is straightforward. After all, we were able to do that quickly for a single field earlier. For completeness, this is how we can design a new data type for storing the same trip payment data in a column-oriented format:

struct TripPaymentColumnarData
    vendor_id::Vector{Int}
    tpep_pickup_datetime::Vector{String}
    tpep_dropoff_datetime::Vector{String}
    passenger_count::Vector{Int}
    trip_distance::Vector{Float64}
    fare_amount::Vector{Float64}
    extra::Vector{Float64}
    mta_tax::Vector{Float64}
    tip_amount::Vector{Float64}
    tolls_amount::Vector{Float64}
    improvement_surcharge::Vector{Float64}
    total_amount::Vector{Float64}
end

Notice that every field has been turned into Vector{T}, where T is the original data type of the particular field. It looks quite ugly, but we are willing to sacrifice elegance here for performance reasons.

The general rule of thumb is that we should just Keep It Simple (KISS). Under certain circumstances, when we do need higher runtime performance, we can bend that rule a little.

Now, although we have a data type that is more optimized for performance, we still need to populate it with data for testing. In this case, it can be achieved quite easily using array comprehension syntax:

columar_records = TripPaymentColumnarData(
    [r.vendor_id for r in records],
    [r.tpep_pickup_datetime for r in records],
    [r.tpep_dropoff_datetime for r in records],
    [r.passenger_count for r in records],
    [r.trip_distance for r in records],
    [r.fare_amount for r in records],
    [r.extra for r in records],
    [r.mta_tax for r in records],
    [r.tip_amount for r in records],
    [r.tolls_amount for r in records],
    [r.improvement_surcharge for r in records]
    ,
    [r.total_amount for r in records]
);
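
Writing one comprehension per field gets repetitive as the number of fields grows. The following is a minimal sketch of how the same conversion could be generalized, assuming the column-oriented type declares its fields in the same order as the row-oriented one. The names Payment, PaymentColumns, and to_columns are hypothetical and were chosen only for this illustration; they are a two-field stand-in for the trip payment types:

```julia
# Hypothetical row-oriented type, standing in for a trip payment record.
struct Payment
    passenger_count::Int
    fare_amount::Float64
end

# Hypothetical column-oriented counterpart: one vector per field,
# declared in the same order as the fields of Payment.
struct PaymentColumns
    passenger_count::Vector{Int}
    fare_amount::Vector{Float64}
end

# Build one column per field with a comprehension, then pass the
# columns to the default constructor in field-declaration order.
function to_columns(rows::Vector{Payment})
    columns = ([getfield(r, name) for r in rows] for name in fieldnames(Payment))
    return PaymentColumns(columns...)
end

rows = [Payment(1, 9.5), Payment(2, 12.0), Payment(1, 7.25)]
cols = to_columns(rows)
```

After this runs, cols.fare_amount is an ordinary Vector{Float64} holding the fares in their original order. The explicit one-comprehension-per-field form shown earlier is easier to read; a loop over fieldnames like this one only pays off when there are many fields to convert.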

When we're done, we can prove to ourselves that the new object structure is indeed optimized:

Yes, it now has great performance, as we expected.
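
For readers who want to run such a check themselves, here is a rough sketch that compares both layouts on synthetic data. It makes several assumptions: the Payment and PaymentColumns types and the mean_fare function are hypothetical two-field stand-ins, the data is generated rather than loaded from the trip payment file, and the @elapsed macro from Base is used for a single coarse timing. No timings are quoted here because they depend on the machine and on the data:

```julia
using Statistics: mean

# Hypothetical row-oriented and column-oriented types for illustration.
struct Payment
    passenger_count::Int
    fare_amount::Float64
end

struct PaymentColumns
    passenger_count::Vector{Int}
    fare_amount::Vector{Float64}
end

# Synthetic records; the values are arbitrary and carry no meaning.
n = 100_000
rows = [Payment(i % 4 + 1, 5.0 + i % 20) for i in 1:n]
cols = PaymentColumns([r.passenger_count for r in rows],
                      [r.fare_amount for r in rows])

# Row-oriented: the fare is read out of each struct in turn.
mean_fare(rows::Vector{Payment}) = mean(r.fare_amount for r in rows)

# Column-oriented: the fares are already stored in one dense vector.
mean_fare(cols::PaymentColumns) = mean(cols.fare_amount)

# Call each method once first so that compilation time is not measured.
mean_fare(rows)
mean_fare(cols)

t_rows = @elapsed mean_fare(rows)
t_cols = @elapsed mean_fare(cols)
println("row-oriented:    ", t_rows, " seconds")
println("column-oriented: ", t_cols, " seconds")
```

A single @elapsed measurement is noisy. A benchmarking package such as BenchmarkTools.jl, with its @btime macro, runs the expression many times and gives more reliable numbers.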
