Joining

Joins are useful for combining data that is scattered across different tables. Let’s say that we want to include in our dataset the location of the hospital in which each patient's measurements were taken. We can reference the hospital for each patient using the labels H1, H2, and H3, and we can store the name, address, and city of each hospital in a hospitals table indexed by those labels:

    hospitals = pd.DataFrame(
        {"name": ["Name 1", "Name 2", "Name 3"],
         "address": ["Address 1", "Address 2", "Address 3"],
         "city": ["City 1", "City 2", "City 3"]},
        index=["H1", "H2", "H3"])

    # One hospital label per patient row in df
    hospital_id = ["H1", "H2", "H2", "H3", "H3", "H3"]
    df['hospital_id'] = hospital_id

Now, we want to find the city where the measurement was taken for each patient. We need to map the keys in the hospital_id column to the cities stored in the hospitals table.

This can certainly be implemented in plain Python using dictionaries:

    hospital_dict = {
        "H1": ("City 1", "Name 1", "Address 1"),
        "H2": ("City 2", "Name 2", "Address 2"),
        "H3": ("City 3", "Name 3", "Address 3")
    }
    cities = [hospital_dict[key][0]
              for key in hospital_id]

This algorithm runs in O(N) time, where N is the length of hospital_id, because each dictionary lookup takes constant time on average. Pandas lets you express the same operation with simple indexing; the advantage is that the join is performed in heavily optimized Cython using efficient hashing algorithms. The preceding Python expression translates to Pandas as follows:

    cities = hospitals.loc[hospital_id, "city"]
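To make the correspondence between the two approaches concrete, the following minimal sketch rebuilds the hospitals table and compares the dictionary lookup with the .loc lookup. The names city_by_hospital, cities_dict, and cities_loc are introduced here for illustration only. Note that .loc returns a pd.Series indexed by the requested labels rather than a plain list.

```python
import pandas as pd

hospitals = pd.DataFrame(
    {"name": ["Name 1", "Name 2", "Name 3"],
     "address": ["Address 1", "Address 2", "Address 3"],
     "city": ["City 1", "City 2", "City 3"]},
    index=["H1", "H2", "H3"])
hospital_id = ["H1", "H2", "H2", "H3", "H3", "H3"]

# Dictionary-based lookup: one average-case O(1) lookup per key.
# (city_by_hospital is an illustrative name, not from the original text.)
city_by_hospital = hospitals["city"].to_dict()
cities_dict = [city_by_hospital[key] for key in hospital_id]

# Label-based lookup: repeated labels are allowed, and the result is a
# Series whose index holds the requested labels in the requested order.
cities_loc = hospitals.loc[hospital_id, "city"]

# Both approaches yield the same sequence of cities.
print(cities_loc.tolist() == cities_dict)
```

If a plain list is needed downstream, cities_loc.tolist() converts the Series back to one.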

More advanced joins can be performed with the pd.DataFrame.join method, which returns a new pd.DataFrame with the hospital information attached to each patient:

    result = df.join(hospitals, on='hospital_id')
    result.columns
    # Result:
    # Index(['dia_final', 'dia_initial', 'drug_admst',
    #        'sys_final', 'sys_initial',
    #        'hospital_id', 'address', 'city', 'name'],
    #       dtype='object')
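As a rough, self-contained illustration of how join treats keys without a match, the sketch below uses a small stand-in patient table. Its values, its row labels, and the unmatched "H4" key are made up for this example and are not part of the dataset above. By default join performs a left join, which keeps every row of the calling table; passing how="inner" keeps only the rows whose key is present in both tables.

```python
import pandas as pd

# Hypothetical stand-in for the patient table; the values are invented.
df = pd.DataFrame(
    {"sys_initial": [120, 126, 130],
     "hospital_id": ["H1", "H2", "H4"]},  # "H4" has no matching hospital
    index=["a", "b", "c"])

hospitals = pd.DataFrame(
    {"city": ["City 1", "City 2", "City 3"]},
    index=["H1", "H2", "H3"])

# Left join (the default): every row of df is kept, and rows whose key
# is missing from hospitals get NaN in the joined columns.
left = df.join(hospitals, on="hospital_id")

# Inner join: rows without a matching hospital are dropped.
inner = df.join(hospitals, on="hospital_id", how="inner")

print(left)
print(inner)
```

The on argument names the column of the calling table that is matched against the index of the other table, which is why hospitals is indexed by the hospital labels.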