1 Introduction to some topics of the course

The packages needed to build this document are loaded first.
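
The loading chunk itself is not shown, so the exact list of packages is unknown. Judging from the functions called later in these notes, it probably includes at least MLANP, summarytools (for dfSummary) and rgl (for plot3d); this list is an assumption. A minimal sketch that checks which of them are installed, without attaching anything:

```r
# Assumed list of packages, inferred from the functions called in these notes
pkgs <- c("MLANP", "summarytools", "rgl")

# find.package() returns character(0) for a package that is not installed
is_installed <- vapply(
  pkgs,
  function(p) length(find.package(p, quiet = TRUE)) > 0,
  logical(1)
)
print(is_installed)

# Report what is missing; MLANP is not on CRAN and must be installed separately
missing_pkgs <- pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Each package reported as installed can then be attached with library().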

1.1 Teaching package and data

For some of the examples I will use my own package, MLANP. The most recent version is usually available on my website (https://www.marcellochiodi.com), in the software section. It is not on CRAN.

Since I wrote it for my various lectures and applications, it also contains functions that we will not use in this course.

Besides a set of miscellaneous convenience functions, written for teaching purposes only and not computationally optimized, it contains several data sets that we will use in some of the lectures:

               Item      class       dim                                               Title
1     antropometric data.frame   1427x18 Data sets of MLANP used in my lessons and exercises
2        buildings1 data.frame 512872x18 Data sets of MLANP used in my lessons and exercises
3      children.rid data.frame   24553x6 Data sets of MLANP used in my lessons and exercises
4         children1 data.frame  26039x26 Data sets of MLANP used in my lessons and exercises
5    exercise.antr1 data.frame    1437x7 Data sets of MLANP used in my lessons and exercises
6   exercise.trial1 data.frame      48x4 Data sets of MLANP used in my lessons and exercises
7   exercise.trial2 data.frame      55x8 Data sets of MLANP used in my lessons and exercises
8         exercise1 data.frame   2500x11 Data sets of MLANP used in my lessons and exercises
9         exercise2 data.frame    500x11 Data sets of MLANP used in my lessons and exercises
10  exercise_child1 data.frame     550x7 Data sets of MLANP used in my lessons and exercises
11 exercise_soccer1 data.frame     306x3 Data sets of MLANP used in my lessons and exercises
12            firms data.frame     286x7 Data sets of MLANP used in my lessons and exercises
13           fraud1 data.frame 310134x14 Data sets of MLANP used in my lessons and exercises
14    gaussian1_128    numeric       510 Data sets of MLANP used in my lessons and exercises
15         granprix data.frame     372x8 Data sets of MLANP used in my lessons and exercises
16     italycatalog data.frame    2158x5         Small sample catalog of italian earthquakes
17          soccer1 data.frame    306x18 Data sets of MLANP used in my lessons and exercises
18  students.survey data.frame     400x9 Data sets of MLANP used in my lessons and exercises
19    students.test data.frame    1782x3 Data sets of MLANP used in my lessons and exercises
20       trial1.new data.frame      93x3 Data sets of MLANP used in my lessons and exercises
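
A table of this kind can be produced for any installed package. The sketch below is not the code used for the listing above; it is a base-R approximation, run on the standard datasets package so that it works without MLANP. The helper name describe_datasets is hypothetical, and the use of getExportedValue assumes that the package lazy-loads its data.

```r
# Hypothetical helper: list the data sets of a package with class and dimensions
describe_datasets <- function(pkg) {
  # Character matrix with columns Package, LibPath, Item, Title
  res <- data(package = pkg)$results
  # Items can look like "beaver1 (beavers)": keep only the object name
  items <- sub(" .*$", "", res[, "Item"])
  rows <- lapply(seq_along(items), function(i) {
    obj <- getExportedValue(pkg, items[i])  # assumes lazy-loaded data
    d <- dim(obj)
    data.frame(
      Item = items[i],
      class = class(obj)[1],
      # Objects without dimensions (plain vectors) are described by their length
      dim = if (is.null(d)) as.character(length(obj)) else paste(d, collapse = "x"),
      Title = res[i, "Title"],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

tab <- describe_datasets("datasets")
head(tab)
```

With MLANP installed, describe_datasets("MLANP") should give a comparable table, provided the assumption about lazy loading holds.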

1.2 Descriptive statistics and plots for the children.rid data set

This section shows a matrix of plots restricted to the variables gestazione (gestation), lunghezza (length), peso (weight) and cranio (skull), drawn on a sample of 2000 rows of the data.frame. The interactive 3D plot is at the end of the section.
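
Drawing a random subset of rows keeps plots of a large data.frame readable and fast. The sketch below uses base R on a simulated data frame; sample_rows is a hypothetical helper, not the data.sample function called in these notes, whose implementation is not shown.

```r
# Hypothetical helper: return n randomly chosen rows of a data frame
sample_rows <- function(df, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # optional seed for reproducibility
  n <- min(n, nrow(df))                # never ask for more rows than exist
  df[sample.int(nrow(df), n), , drop = FALSE]
}

# Simulated stand-in with as many rows as children.rid (values are made up)
toy <- data.frame(x = rnorm(24553), y = rnorm(24553))
toy_sample <- sample_rows(toy, 2000, seed = 1)
dim(toy_sample)
```

Sampling is done without replacement, so no row appears twice in the subset.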

str(children.rid)
'data.frame':   24553 obs. of  6 variables:
 $ gestazione      : int  41 36 32 34 39 40 40 40 40 38 ...
 $ lunghezza       : int  495 430 430 434 490 490 490 500 505 490 ...
 $ peso            : int  3360 1900 1750 1870 3050 2750 2950 3120 3120 3300 ...
 $ Fumatrici       : int  0 1 2 1 0 0 0 0 0 2 ...
 $ parti.pretermine: int  0 0 0 0 0 0 0 0 0 0 ...
 $ cranio          : int  335 305 300 310 339 330 335 335 355 350 ...
# Optionally work on a random sample of n rows (sample.set and n are set earlier, not shown)
if (sample.set) children.rid=data.sample(children.rid,n)
n=nrow(children.rid)
summary(children.rid)
   gestazione      lunghezza          peso        Fumatrici       parti.pretermine       cranio     
 Min.   :25.00   Min.   :255.0   Min.   : 300   Min.   :0.00000   Min.   :0.000000   Min.   :165.0  
 1st Qu.:38.00   1st Qu.:480.0   1st Qu.:2930   1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:330.0  
 Median :39.00   Median :500.0   Median :3250   Median :0.00000   Median :0.000000   Median :340.0  
 Mean   :38.75   Mean   :491.8   Mean   :3208   Mean   :0.05364   Mean   :0.009367   Mean   :338.1  
 3rd Qu.:40.00   3rd Qu.:510.0   3rd Qu.:3570   3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:350.0  
 Max.   :43.00   Max.   :580.0   Max.   :5600   Max.   :3.00000   Max.   :5.000000   Max.   :400.0  
view(dfSummary(children.rid),method = "render")

Data Frame Summary: children.rid

Dimensions: 24553 x 6
Duplicates: 2352

No  Variable                    Mean (sd)       min ≤ med ≤ max    IQR (CV)   Freqs (% of Valid)   Valid           Missing
1   gestazione [integer]        38.7 (2.1)      25 ≤ 39 ≤ 43       2 (0.1)    19 distinct values   24553 (100.0%)  0 (0.0%)
2   lunghezza [integer]         491.8 (30)      255 ≤ 500 ≤ 580    30 (0.1)   123 distinct values  24553 (100.0%)  0 (0.0%)
3   peso [integer]              3207.6 (580.9)  300 ≤ 3250 ≤ 5600  640 (0.2)  472 distinct values  24553 (100.0%)  0 (0.0%)
4   Fumatrici [integer]         0.1 (0.3)       0 ≤ 0 ≤ 3          0 (5.3)    0: 23596 (96.1%)     24553 (100.0%)  0 (0.0%)
                                                                              1: 608 (2.5%)
                                                                              2: 338 (1.4%)
                                                                              3: 11 (0.0%)
5   parti.pretermine [integer]  0 (0.1)         0 ≤ 0 ≤ 5          0 (13)     0: 24371 (99.3%)     24553 (100.0%)  0 (0.0%)
                                                                              1: 147 (0.6%)
                                                                              2: 27 (0.1%)
                                                                              3: 4 (0.0%)
                                                                              4: 3 (0.0%)
                                                                              5: 1 (0.0%)
6   cranio [integer]            338.1 (18.1)    165 ≤ 340 ≤ 400    20 (0.1)   170 distinct values  24553 (100.0%)  0 (0.0%)

Generated by summarytools 1.0.0 (R version 4.1.2)
2022-02-28
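
The numeric entries of the summary can be reproduced with base R. The sketch below assumes that CV is the coefficient of variation sd/mean, which is consistent with the values reported above (for peso, 580.9/3207.6 ≈ 0.18, shown as 0.2), and that Duplicates counts the rows identical to an earlier row. The helper describe_numeric is hypothetical and is not part of summarytools.

```r
# Hypothetical helper: the statistics shown for each numeric variable
describe_numeric <- function(x) {
  c(
    mean = mean(x), sd = sd(x),
    min = min(x), median = median(x), max = max(x),
    IQR = IQR(x),
    CV = sd(x) / mean(x)   # assumed definition of CV
  )
}

# Small made-up vector, used only to illustrate the calculations
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
stats <- describe_numeric(x)
round(stats, 2)

# Assumed definition of Duplicates: rows equal to a previous row
toy <- data.frame(a = c(1, 1, 2), b = c("u", "u", "v"))
n_dup <- sum(duplicated(toy))
n_dup
```

Applied column by column, for example with sapply(children.rid, describe_numeric), this gives one column of statistics per variable.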

MLA.explor.pairs(children.rid[,ind])
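
A comparable scatterplot matrix can be obtained with the base function pairs. This is only a sketch on simulated data: MLA.explor.pairs may draw additional elements, and the index vector ind is assumed here to be c(1, 2, 3, 6), the positions of gestazione, lunghezza, peso and cranio in the str output above.

```r
set.seed(1)
n_toy <- 200

# Simulated data frame with the same column names as children.rid (values are made up)
gest <- sample(25:43, n_toy, replace = TRUE)
toy <- data.frame(
  gestazione = gest,
  lunghezza = round(300 + 5 * gest + rnorm(n_toy, sd = 20)),
  peso = round(150 * gest - 2600 + rnorm(n_toy, sd = 400)),
  Fumatrici = rbinom(n_toy, 1, 0.05),
  parti.pretermine = rbinom(n_toy, 1, 0.01),
  cranio = round(200 + 3.5 * gest + rnorm(n_toy, sd = 15))
)

# Assumed definition of ind: columns 1, 2, 3 and 6
ind <- c(1, 2, 3, 6)

# Scatterplot matrix with a lowess smoother in each panel
pairs(toy[, ind], panel = panel.smooth)
```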

MLA.explor.plot2D(children.rid[,1],children.rid[,3],smooth=.3)
Warning in smooth.spline(y ~ x, cv = TRUE, nknots = nknots): cross-validation with non-unique 'x' values seems doubtful
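
The warning arises because gestazione takes only a few integer values, so many observations share the same x, and leave-one-out cross-validation is questionable with tied x values. The sketch below reproduces the situation on simulated data with the base function smooth.spline; it does not show the internals of MLA.explor.plot2D. Setting cv = FALSE selects the smoothing parameter by generalized cross-validation instead.

```r
set.seed(1)
x <- sample(25:43, 500, replace = TRUE)             # many tied x values
y <- 3200 + 150 * (x - 39) + rnorm(500, sd = 400)   # made-up relationship

# cv = TRUE (leave-one-out CV) triggers a warning when x has ties
warned_cv <- tryCatch({
  smooth.spline(x, y, cv = TRUE)
  FALSE
}, warning = function(w) TRUE)
warned_cv

# cv = FALSE uses generalized cross-validation (GCV)
fit <- smooth.spline(x, y, cv = FALSE)

# Scatter plot with the fitted spline evaluated at the distinct x values
plot(x, y, pch = ".", xlab = "x", ylab = "y")
lines(predict(fit, sort(unique(x))), col = "red", lwd = 2)
```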

Interactive 3D plot

plot3d(children.rid[,c(2,3,6)])