Overview

Dataset statistics

Number of variables: 3
Number of observations: 39774
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 932.3 KiB
Average record size in memory: 24.0 B

Variable types

Numeric: 1
Categorical: 2

Alerts

ingredients has a high cardinality: 39674 distinct values (High cardinality)
id is uniformly distributed (Uniform)
ingredients is uniformly distributed (Uniform)
id has unique values (Unique)

Reproduction

Analysis started: 2022-05-19 09:45:35.864656
Analysis finished: 2022-05-19 09:45:38.673229
Duration: 2.81 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json

Variables

id
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 39774
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24849.53696
Minimum: 0
Maximum: 49717
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 310.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2466.65
Q1: 12398.25
Median: 24887
Q3: 37328.5
95-th percentile: 47177.35
Maximum: 49717
Range: 49717
Interquartile range (IQR): 24930.25

Descriptive statistics

Standard deviation: 14360.03551
Coefficient of variation (CV): 0.5778793999
Kurtosis: -1.204702012
Mean: 24849.53696
Median Absolute Deviation (MAD): 12464.5
Skewness: -0.003128529465
Sum: 988365483
Variance: 206210619.7
Monotonicity: Not monotonic
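
Several of the figures above are simple functions of the others. As a rough consistency check, the sketch below re-derives the mean, coefficient of variation, variance, IQR and range from the reported values. The variable names are illustrative only, and small differences are expected because the report rounds its output.

```python
# Values copied from the id statistics reported in this section.
n = 39774
total = 988365483
reported_mean = 24849.53696
std = 14360.03551
q1, q3 = 12398.25, 37328.5
minimum, maximum = 0, 49717

# The mean is the sum divided by the number of observations.
derived_mean = total / n
# The coefficient of variation is the standard deviation relative to the mean.
cv = std / reported_mean
# The variance is the squared standard deviation.
variance = std ** 2
# The interquartile range and the range are plain differences.
iqr = q3 - q1
value_range = maximum - minimum

print(derived_mean, cv, variance, iqr, value_range)
```

Up to rounding, each derived value should agree with the corresponding figure in the report.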
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
21151 | 1 | < 0.1%
27288 | 1 | < 0.1%
25241 | 1 | < 0.1%
31386 | 1 | < 0.1%
29339 | 1 | < 0.1%
19100 | 1 | < 0.1%
17053 | 1 | < 0.1%
23198 | 1 | < 0.1%
43680 | 1 | < 0.1%
Other values (39764) | 39764 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
6 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
14 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
49717 | 1 | < 0.1%
49716 | 1 | < 0.1%
49714 | 1 | < 0.1%
49713 | 1 | < 0.1%
49712 | 1 | < 0.1%
49710 | 1 | < 0.1%
49709 | 1 | < 0.1%
49708 | 1 | < 0.1%
49707 | 1 | < 0.1%
49706 | 1 | < 0.1%

cuisine
Categorical

Distinct: 20
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 310.9 KiB

Value | Count
italian | 7838
mexican | 6438
southern_us | 4320
indian | 3003
chinese | 2673
Other values (15) | 15502

Length

Max length: 12
Median length: 7
Mean length: 7.431538191
Min length: 4

Characters and Unicode

Total characters: 295582
Distinct characters: 24
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
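
As an illustration of how such character properties can be read, the sketch below counts the characters of a string by Unicode general category using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical and this is not necessarily how the report computes its figures. The standard library exposes the general category only, not the script or block, so those properties are not covered here.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters in `text` by Unicode general category code.

    Codes are the two-letter abbreviations from the Unicode Standard,
    e.g. 'Ll' (Lowercase Letter) and 'Pc' (Connector Punctuation).
    """
    return Counter(unicodedata.category(ch) for ch in text)


# "southern_us" is one of the cuisine labels in this dataset.
counts = category_counts("southern_us")
print(counts)
```

For a cuisine label such as `southern_us`, only lowercase letters and the underscore (a connector punctuation character) appear, which matches the two distinct categories reported for this variable.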

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: greek
2nd row: southern_us
3rd row: filipino
4th row: indian
5th row: indian

Common Values

Value | Count | Frequency (%)
italian | 7838 | 19.7%
mexican | 6438 | 16.2%
southern_us | 4320 | 10.9%
indian | 3003 | 7.6%
chinese | 2673 | 6.7%
french | 2646 | 6.7%
cajun_creole | 1546 | 3.9%
thai | 1539 | 3.9%
japanese | 1423 | 3.6%
greek | 1175 | 3.0%
Other values (10) | 7173 | 18.0%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
italian | 7838 | 19.7%
mexican | 6438 | 16.2%
southern_us | 4320 | 10.9%
indian | 3003 | 7.6%
chinese | 2673 | 6.7%
french | 2646 | 6.7%
cajun_creole | 1546 | 3.9%
thai | 1539 | 3.9%
japanese | 1423 | 3.6%
greek | 1175 | 3.0%
Other values (10) | 7173 | 18.0%

Most occurring characters

Value | Count | Frequency (%)
i | 41302 | 14.0%
n | 38592 | 13.1%
a | 37514 | 12.7%
e | 30343 | 10.3%
s | 17988 | 6.1%
c | 17017 | 5.8%
t | 15326 | 5.2%
r | 13765 | 4.7%
h | 13638 | 4.6%
u | 10675 | 3.6%
Other values (14) | 59422 | 20.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 289716 | 98.0%
Connector Punctuation | 5866 | 2.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 41302 | 14.3%
n | 38592 | 13.3%
a | 37514 | 12.9%
e | 30343 | 10.5%
s | 17988 | 6.2%
c | 17017 | 5.9%
t | 15326 | 5.3%
r | 13765 | 4.8%
h | 13638 | 4.7%
u | 10675 | 3.7%
Other values (13) | 53556 | 18.5%

Connector Punctuation
Value | Count | Frequency (%)
_ | 5866 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 289716 | 98.0%
Common | 5866 | 2.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 41302 | 14.3%
n | 38592 | 13.3%
a | 37514 | 12.9%
e | 30343 | 10.5%
s | 17988 | 6.2%
c | 17017 | 5.9%
t | 15326 | 5.3%
r | 13765 | 4.8%
h | 13638 | 4.7%
u | 10675 | 3.7%
Other values (13) | 53556 | 18.5%

Common
Value | Count | Frequency (%)
_ | 5866 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 295582 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 41302 | 14.0%
n | 38592 | 13.1%
a | 37514 | 12.7%
e | 30343 | 10.3%
s | 17988 | 6.1%
c | 17017 | 5.8%
t | 15326 | 5.2%
r | 13765 | 4.7%
h | 13638 | 4.6%
u | 10675 | 3.6%
Other values (14) | 59422 | 20.1%

ingredients
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 39674
Distinct (%): 99.7%
Missing: 0
Missing (%): 0.0%
Memory size: 310.9 KiB

Value | Count
butter, powdered sugar, cream cheese, soften, vanilla extract | 3
unsalted butter | 3
all purpose unbleached flour, active dry yeast, warm water, salt, olive oil | 3
cold water, lime, sugar, sweetened condensed milk | 3
mccormick parsley flakes, old bay seasoning, eggs, milk, salt, bread, lump crab meat, worcestershire sauce, mayonaise, baking powder | 2
Other values (39669) | 39760

Length

Max length: 1044
Median length: 139
Mean length: 145.7653492
Min length: 4

Characters and Unicode

Total characters: 5797671
Distinct characters: 58
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 39578
Unique (%): 99.5%

Sample

1st row: romaine lettuce, black olives, grape tomatoes, garlic, pepper, purple onion, seasoning, garbanzo beans, feta cheese crumbles
2nd row: plain flour, ground pepper, salt, tomatoes, ground black pepper, thyme, eggs, green tomatoes, yellow corn meal, milk, vegetable oil
3rd row: eggs, pepper, salt, mayonaise, cooking oil, green chilies, grilled chicken breasts, garlic powder, yellow onion, soy sauce, butter, chicken livers
4th row: water, vegetable oil, wheat, salt
5th row: black pepper, shallots, cornflour, cayenne pepper, onions, garlic paste, milk, butter, salt, lemon juice, water, chili powder, passata, oil, ground cumin, boneless chicken skinless thigh, garam masala, double cream, natural yogurt, bay leaf

Common Values

Value | Count | Frequency (%)
butter, powdered sugar, cream cheese, soften, vanilla extract | 3 | < 0.1%
unsalted butter | 3 | < 0.1%
all purpose unbleached flour, active dry yeast, warm water, salt, olive oil | 3 | < 0.1%
cold water, lime, sugar, sweetened condensed milk | 3 | < 0.1%
mccormick parsley flakes, old bay seasoning, eggs, milk, salt, bread, lump crab meat, worcestershire sauce, mayonaise, baking powder | 2 | < 0.1%
oil, water, rice flour, tapioca starch | 2 | < 0.1%
soy sauce, cooking oil, garlic, honey, maltose, chinese five-spice powder, white pepper, sesame oil, chinese rose wine, hoisin sauce, red food coloring, pork butt | 2 | < 0.1%
melon, prosciutto | 2 | < 0.1%
sugar, lemon-lime soda, grated lemon peel, unsalted butter, all-purpose flour, vegetable oil spray, salt, powdered sugar, large eggs, grate lime peel | 2 | < 0.1%
tomatoes, fat free milk, all-purpose flour, ground cumin, fat free yogurt, boneless skinless chicken breasts, sour cream, fresh spinach, flour tortillas, green chilies, shredded reduced fat cheddar cheese, salt, sliced green onions | 2 | < 0.1%
Other values (39664) | 39750 | 99.9%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
pepper | 25742 | 3.2%
salt | 24426 | 3.0%
oil | 23323 | 2.9%
garlic | 18941 | 2.3%
ground | 18256 | 2.3%
fresh | 17853 | 2.2%
sauce | 13129 | 1.6%
sugar | 12493 | 1.5%
onions | 12341 | 1.5%
cheese | 11776 | 1.5%
Other values (3130) | 629480 | 77.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 768053 | 13.2%
e | 601293 | 10.4%
, | 389129 | 6.7%
a | 377938 | 6.5%
r | 375574 | 6.5%
s | 348717 | 6.0%
o | 336916 | 5.8%
l | 298820 | 5.2%
i | 294330 | 5.1%
n | 254434 | 4.4%
Other values (48) | 1752467 | 30.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 4627448 | 79.8%
Space Separator | 768053 | 13.2%
Other Punctuation | 390109 | 6.7%
Dash Punctuation | 11306 | 0.2%
Decimal Number | 418 | < 0.1%
Other Symbol | 239 | < 0.1%
Open Punctuation | 45 | < 0.1%
Close Punctuation | 45 | < 0.1%
Final Punctuation | 7 | < 0.1%
Currency Symbol | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 601293 | 13.0%
a | 377938 | 8.2%
r | 375574 | 8.1%
s | 348717 | 7.5%
o | 336916 | 7.3%
l | 298820 | 6.5%
i | 294330 | 6.4%
n | 254434 | 5.5%
c | 243665 | 5.3%
t | 224391 | 4.8%
Other values (23) | 1271370 | 27.5%

Decimal Number
Value | Count | Frequency (%)
1 | 229 | 54.8%
2 | 92 | 22.0%
4 | 29 | 6.9%
0 | 21 | 5.0%
5 | 17 | 4.1%
3 | 15 | 3.6%
9 | 5 | 1.2%
8 | 5 | 1.2%
7 | 4 | 1.0%
6 | 1 | 0.2%

Other Punctuation
Value | Count | Frequency (%)
, | 389129 | 99.7%
& | 379 | 0.1%
% | 322 | 0.1%
' | 199 | 0.1%
. | 48 | < 0.1%
! | 30 | < 0.1%
/ | 2 | < 0.1%

Other Symbol
Value | Count | Frequency (%)
® | 181 | 75.7%
(character lost in extraction) | 58 | 24.3%

Space Separator
Value | Count | Frequency (%)
(space) | 768053 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 11306 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 45 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 45 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(character lost in extraction) | 7 | 100.0%

Currency Symbol
Value | Count | Frequency (%)
(character lost in extraction) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 4627448 | 79.8%
Common | 1170223 | 20.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 601293 | 13.0%
a | 377938 | 8.2%
r | 375574 | 8.1%
s | 348717 | 7.5%
o | 336916 | 7.3%
l | 298820 | 6.5%
i | 294330 | 6.4%
n | 254434 | 5.5%
c | 243665 | 5.3%
t | 224391 | 4.8%
Other values (23) | 1271370 | 27.5%

Common
Value | Count | Frequency (%)
(space) | 768053 | 65.6%
, | 389129 | 33.3%
- | 11306 | 1.0%
& | 379 | < 0.1%
% | 322 | < 0.1%
1 | 229 | < 0.1%
' | 199 | < 0.1%
® | 181 | < 0.1%
2 | 92 | < 0.1%
(character lost in extraction) | 58 | < 0.1%
Other values (15) | 275 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5796699 | > 99.9%
None | 906 | < 0.1%
Letterlike Symbols | 58 | < 0.1%
Punctuation | 7 | < 0.1%
Currency Symbols | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 768053 | 13.2%
e | 601293 | 10.4%
, | 389129 | 6.7%
a | 377938 | 6.5%
r | 375574 | 6.5%
s | 348717 | 6.0%
o | 336916 | 5.8%
l | 298820 | 5.2%
i | 294330 | 5.1%
n | 254434 | 4.4%
Other values (37) | 1751495 | 30.2%
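
None
Value | Count | Frequency (%)
é | 268 | 29.6%
è | 225 | 24.8%
® | 181 | 20.0%
î | 146 | 16.1%
ç | 50 | 5.5%
ú | 21 | 2.3%
â | 12 | 1.3%
í | 3 | 0.3%

Letterlike Symbols
Value | Count | Frequency (%)
(character lost in extraction) | 58 | 100.0%

Punctuation
Value | Count | Frequency (%)
(character lost in extraction) | 7 | 100.0%

Currency Symbols
Value | Count | Frequency (%)
(character lost in extraction) | 1 | 100.0%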

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
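
The calculation described above can be sketched in plain Python: replace each variable by its ranks, then divide the covariance of the ranks by the product of their standard deviations. The function names are illustrative, and ties are given the average of their positions, which is a common convention but an assumption here rather than something this report states.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables over the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    # The 1/n factors of the covariance and standard deviations cancel out.
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y_cubic = [v ** 3 for v in x]       # nonlinear but monotonically increasing
y_reversed = [-v for v in y_cubic]  # monotonically decreasing
print(spearman_rho(x, y_cubic), spearman_rho(x, y_reversed))
```

Because only the ranks matter, the cubic relationship is treated the same as a straight line.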

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
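
A minimal sketch of this calculation follows, using population (divide by n) covariance and standard deviations; the function name and the sample data are illustrative only. The second call applies a separate shift and positive rescaling to each variable to show the invariance mentioned above.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)
```

The two results agree up to floating-point error. A negative scale factor on one variable would flip the sign of r.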

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
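
The pair-counting definition above corresponds to the variant usually called τ-a, which can be sketched as follows. The function name is illustrative. Statistical libraries often default to a tie-adjusted variant (τ-b), so their results can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

In this small example one of the six pairs is discordant (the second and third observations), so five concordant pairs minus one discordant pair is divided by six.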

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

(index) | id | cuisine | ingredients
0 | 10259 | greek | romaine lettuce, black olives, grape tomatoes, garlic, pepper, purple onion, seasoning, garbanzo beans, feta cheese crumbles
1 | 25693 | southern_us | plain flour, ground pepper, salt, tomatoes, ground black pepper, thyme, eggs, green tomatoes, yellow corn meal, milk, vegetable oil
2 | 20130 | filipino | eggs, pepper, salt, mayonaise, cooking oil, green chilies, grilled chicken breasts, garlic powder, yellow onion, soy sauce, butter, chicken livers
3 | 22213 | indian | water, vegetable oil, wheat, salt
4 | 13162 | indian | black pepper, shallots, cornflour, cayenne pepper, onions, garlic paste, milk, butter, salt, lemon juice, water, chili powder, passata, oil, ground cumin, boneless chicken skinless thigh, garam masala, double cream, natural yogurt, bay leaf
5 | 6602 | jamaican | plain flour, sugar, butter, eggs, fresh ginger root, salt, ground cinnamon, milk, vanilla extract, ground ginger, powdered sugar, baking powder
6 | 42779 | spanish | olive oil, salt, medium shrimp, pepper, garlic, chopped cilantro, jalapeno chilies, flat leaf parsley, skirt steak, white vinegar, sea salt, bay leaf, chorizo sausage
7 | 3735 | italian | sugar, pistachio nuts, white almond bark, flour, vanilla extract, olive oil, almond extract, eggs, baking powder, dried cranberries
8 | 16903 | mexican | olive oil, purple onion, fresh pineapple, pork, poblano peppers, corn tortillas, cheddar cheese, ground black pepper, salt, iceberg lettuce, lime, jalapeno chilies, chopped cilantro fresh
9 | 12734 | italian | chopped tomatoes, fresh basil, garlic, extra-virgin olive oil, kosher salt, flat leaf parsley

Last rows

(index) | id | cuisine | ingredients
39764 | 8089 | mexican | chili powder, worcestershire sauce, celery, red kidney beans, lean ground beef, stewed tomatoes, dried parsley, pepper, red wine vinegar, salt, ground cumin, dried basil, red wine, red bell pepper
39765 | 6153 | indian | coconut, unsweetened coconut milk, mint leaves, plain yogurt
39766 | 25557 | irish | rutabaga, ham, thick-cut bacon, potatoes, fresh parsley, salt, onions, pepper, carrots, pork sausages
39767 | 24348 | italian | low-fat sour cream, grated parmesan cheese, salt, dried oregano, low-fat cottage cheese, butter, onions, olive oil, artichok heart marin, ground cayenne pepper, ground black pepper, garlic, spaghetti
39768 | 7377 | mexican | shredded cheddar cheese, crushed cheese crackers, cheddar cheese soup, cream of chicken soup, hot sauce, diced green chilies, salt, pepper, cooked chicken, rotini
39769 | 29109 | irish | light brown sugar, granulated sugar, butter, warm water, large eggs, all-purpose flour, whole wheat flour, cooking spray, boiling water, steel-cut oats, dry yeast, salt
39770 | 11462 | italian | kraft zesty italian dressing, purple onion, broccoli florets, rotini, pitted black olives, kraft grated parmesan cheese, red pepper
39771 | 2238 | irish | eggs, citrus fruit, raisins, sourdough starter, flour, hot tea, sugar, ground nutmeg, salt, ground cinnamon, milk, butter
39772 | 41882 | chinese | boneless chicken skinless thigh, minced garlic, steamed white rice, baking powder, corn starch, dark soy sauce, kosher salt, peanuts, flour, scallions, chinese rice vinegar, vodka, fresh ginger, egg whites, broccoli, toasted sesame seeds, sugar, store bought low sodium chicken stock, baking soda, shaoxing wine, oil
39773 | 2362 | mexican | green chile, jalapeno chilies, onions, ground black pepper, salt, chopped cilantro fresh, green bell pepper, garlic, white sugar, roma tomatoes, celery, dried oregano