Dataset statistics
Number of variables | 3 |
---|---|
Number of observations | 39774 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 932.3 KiB |
Average record size in memory | 24.0 B |
Variable types
Numeric | 1 |
---|---|
Categorical | 2 |
ingredients has a high cardinality: 39674 distinct values | High cardinality |
id is uniformly distributed | Uniform |
ingredients is uniformly distributed | Uniform |
id has unique values | Unique |
Reproduction
Analysis started | 2022-05-19 09:45:35.864656 |
---|---|
Analysis finished | 2022-05-19 09:45:38.673229 |
Duration | 2.81 seconds |
Software version | pandas-profiling v3.1.1 |
Download configuration | config.json |
id
Numeric
Distinct | 39774 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 24849.53696 |
Minimum | 0 |
---|---|
Maximum | 49717 |
Zeros | 1 |
Zeros (%) | < 0.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 310.9 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 2466.65 |
Q1 | 12398.25 |
median | 24887 |
Q3 | 37328.5 |
95-th percentile | 47177.35 |
Maximum | 49717 |
Range | 49717 |
Interquartile range (IQR) | 24930.25 |
Descriptive statistics
Standard deviation | 14360.03551 |
---|---|
Coefficient of variation (CV) | 0.5778793999 |
Kurtosis | -1.204702012 |
Mean | 24849.53696 |
Median Absolute Deviation (MAD) | 12464.5 |
Skewness | -0.003128529465 |
Sum | 988365483 |
Variance | 206210619.7 |
Monotonicity | Not monotonic |
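Several entries in the quantile and descriptive tables follow directly from others. As a quick consistency check, the sketch below recomputes them from the values reported above. The variable names are illustrative, and the inputs are copied from this report already rounded, so the results agree only up to rounding.

```python
# Values copied from the report for the numeric column (already rounded).
n_observations = 39774
mean = 24849.53696
std = 14360.03551
q1, q3 = 12398.25, 37328.5
minimum, maximum = 0, 49717

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

# Interquartile range: spread of the middle 50% of the values.
iqr = q3 - q1

# Range: distance between the largest and smallest value.
value_range = maximum - minimum

# Variance is the square of the standard deviation.
variance = std ** 2

# The sum is the mean multiplied by the number of observations.
total = mean * n_observations

print(f"CV={cv:.10f} IQR={iqr} range={value_range}")
print(f"variance={variance:.1f} sum={total:.0f}")
```

The recomputed figures should match the reported CV, IQR, range, variance and sum to within the rounding of the inputs.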
Common values
Value | Count | Frequency (%) |
0 | 1 | < 0.1% |
21151 | 1 | < 0.1% |
27288 | 1 | < 0.1% |
25241 | 1 | < 0.1% |
31386 | 1 | < 0.1% |
29339 | 1 | < 0.1% |
19100 | 1 | < 0.1% |
17053 | 1 | < 0.1% |
23198 | 1 | < 0.1% |
43680 | 1 | < 0.1% |
Other values (39764) | 39764 | > 99.9% |
Minimum 10 values
Value | Count | Frequency (%) |
0 | 1 | < 0.1% |
1 | 1 | < 0.1% |
2 | 1 | < 0.1% |
3 | 1 | < 0.1% |
4 | 1 | < 0.1% |
6 | 1 | < 0.1% |
8 | 1 | < 0.1% |
9 | 1 | < 0.1% |
10 | 1 | < 0.1% |
14 | 1 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
49717 | 1 | < 0.1% |
49716 | 1 | < 0.1% |
49714 | 1 | < 0.1% |
49713 | 1 | < 0.1% |
49712 | 1 | < 0.1% |
49710 | 1 | < 0.1% |
49709 | 1 | < 0.1% |
49708 | 1 | < 0.1% |
49707 | 1 | < 0.1% |
49706 | 1 | < 0.1% |
cuisine
Categorical
Distinct | 20 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 310.9 KiB |
italian | 7838 |
---|---|
mexican | 6438 |
southern_us | 4320 |
indian | 3003 |
chinese | 2673 |
Other values (15) | 15502 |
Common Values
Value | Count | Frequency (%) |
italian | 7838 | 19.7% |
mexican | 6438 | 16.2% |
southern_us | 4320 | 10.9% |
indian | 3003 | 7.6% |
chinese | 2673 | 6.7% |
french | 2646 | 6.7% |
cajun_creole | 1546 | 3.9% |
thai | 1539 | 3.9% |
japanese | 1423 | 3.6% |
greek | 1175 | 3.0% |
Other values (10) | 7173 | 18.0% |
Length
[Figure: histogram of category lengths]
Words
Value | Count | Frequency (%) |
italian | 7838 | 19.7% |
mexican | 6438 | 16.2% |
southern_us | 4320 | 10.9% |
indian | 3003 | 7.6% |
chinese | 2673 | 6.7% |
french | 2646 | 6.7% |
cajun_creole | 1546 | 3.9% |
thai | 1539 | 3.9% |
japanese | 1423 | 3.6% |
greek | 1175 | 3.0% |
Other values (10) | 7173 | 18.0% |
Most occurring characters
Value | Count | Frequency (%) |
i | 41302 | 14.0% |
n | 38592 | 13.1% |
a | 37514 | 12.7% |
e | 30343 | 10.3% |
s | 17988 | 6.1% |
c | 17017 | 5.8% |
t | 15326 | 5.2% |
r | 13765 | 4.7% |
h | 13638 | 4.6% |
u | 10675 | 3.6% |
Other values (14) | 59422 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 289716 | 98.0% |
Connector Punctuation | 5866 | 2.0% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
i | 41302 | 14.3% |
n | 38592 | 13.3% |
a | 37514 | 12.9% |
e | 30343 | 10.5% |
s | 17988 | 6.2% |
c | 17017 | 5.9% |
t | 15326 | 5.3% |
r | 13765 | 4.8% |
h | 13638 | 4.7% |
u | 10675 | 3.7% |
Other values (13) | 53556 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 5866 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 289716 | 98.0% |
Common | 5866 | 2.0% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
i | 41302 | 14.3% |
n | 38592 | 13.3% |
a | 37514 | 12.9% |
e | 30343 | 10.5% |
s | 17988 | 6.2% |
c | 17017 | 5.9% |
t | 15326 | 5.3% |
r | 13765 | 4.8% |
h | 13638 | 4.7% |
u | 10675 | 3.7% |
Other values (13) | 53556 |
Common
Value | Count | Frequency (%) |
_ | 5866 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 295582 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
i | 41302 | 14.0% |
n | 38592 | 13.1% |
a | 37514 | 12.7% |
e | 30343 | 10.3% |
s | 17988 | 6.1% |
c | 17017 | 5.8% |
t | 15326 | 5.2% |
r | 13765 | 4.7% |
h | 13638 | 4.6% |
u | 10675 | 3.6% |
Other values (14) | 59422 |
ingredients
Categorical
Distinct | 39674 |
---|---|
Distinct (%) | 99.7% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 310.9 KiB |
butter, powdered sugar, cream cheese, soften, vanilla extract | 3 |
---|---|
unsalted butter | 3 |
all purpose unbleached flour, active dry yeast, warm water, salt, olive oil | 3 |
cold water, lime, sugar, sweetened condensed milk | 3 |
mccormick parsley flakes, old bay seasoning, eggs, milk, salt, bread, lump crab meat, worcestershire sauce, mayonaise, baking powder | 2 |
Other values (39669) | 39760 |
Length
Max length | 1044 |
---|---|
Median length | 139 |
Mean length | 145.7653492 |
Min length | 4 |
Characters and Unicode
Total characters | 5797671 |
---|---|
Distinct characters | 58 |
Distinct categories | 10 |
Distinct scripts | 2 |
Distinct blocks | 5 |
Unique
Unique | 39578 |
---|---|
Unique (%) | 99.5% |
Sample
1st row | romaine lettuce, black olives, grape tomatoes, garlic, pepper, purple onion, seasoning, garbanzo beans, feta cheese crumbles |
---|---|
2nd row | plain flour, ground pepper, salt, tomatoes, ground black pepper, thyme, eggs, green tomatoes, yellow corn meal, milk, vegetable oil |
3rd row | eggs, pepper, salt, mayonaise, cooking oil, green chilies, grilled chicken breasts, garlic powder, yellow onion, soy sauce, butter, chicken livers |
4th row | water, vegetable oil, wheat, salt |
5th row | black pepper, shallots, cornflour, cayenne pepper, onions, garlic paste, milk, butter, salt, lemon juice, water, chili powder, passata, oil, ground cumin, boneless chicken skinless thigh, garam masala, double cream, natural yogurt, bay leaf |
Common Values
Value | Count | Frequency (%) |
butter, powdered sugar, cream cheese, soften, vanilla extract | 3 | < 0.1% |
unsalted butter | 3 | < 0.1% |
all purpose unbleached flour, active dry yeast, warm water, salt, olive oil | 3 | < 0.1% |
cold water, lime, sugar, sweetened condensed milk | 3 | < 0.1% |
mccormick parsley flakes, old bay seasoning, eggs, milk, salt, bread, lump crab meat, worcestershire sauce, mayonaise, baking powder | 2 | < 0.1% |
oil, water, rice flour, tapioca starch | 2 | < 0.1% |
soy sauce, cooking oil, garlic, honey, maltose, chinese five-spice powder, white pepper, sesame oil, chinese rose wine, hoisin sauce, red food coloring, pork butt | 2 | < 0.1% |
melon, prosciutto | 2 | < 0.1% |
sugar, lemon-lime soda, grated lemon peel, unsalted butter, all-purpose flour, vegetable oil spray, salt, powdered sugar, large eggs, grate lime peel | 2 | < 0.1% |
tomatoes, fat free milk, all-purpose flour, ground cumin, fat free yogurt, boneless skinless chicken breasts, sour cream, fresh spinach, flour tortillas, green chilies, shredded reduced fat cheddar cheese, salt, sliced green onions | 2 | < 0.1% |
Other values (39664) | 39750 | 99.9% |
Length
[Figure: histogram of value lengths]
Words
Value | Count | Frequency (%) |
pepper | 25742 | 3.2% |
salt | 24426 | 3.0% |
oil | 23323 | 2.9% |
garlic | 18941 | 2.3% |
ground | 18256 | 2.3% |
fresh | 17853 | 2.2% |
sauce | 13129 | 1.6% |
sugar | 12493 | 1.5% |
onions | 12341 | 1.5% |
cheese | 11776 | 1.5% |
Other values (3130) | 629480 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 768053 | 13.2% |
e | 601293 | 10.4% |
, | 389129 | 6.7% |
a | 377938 | 6.5% |
r | 375574 | 6.5% |
s | 348717 | 6.0% |
o | 336916 | 5.8% |
l | 298820 | 5.2% |
i | 294330 | 5.1% |
n | 254434 | 4.4% |
Other values (48) | 1752467 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 4627448 | 79.8% |
Space Separator | 768053 | 13.2% |
Other Punctuation | 390109 | 6.7% |
Dash Punctuation | 11306 | 0.2% |
Decimal Number | 418 | < 0.1% |
Other Symbol | 239 | < 0.1% |
Open Punctuation | 45 | < 0.1% |
Close Punctuation | 45 | < 0.1% |
Final Punctuation | 7 | < 0.1% |
Currency Symbol | 1 | < 0.1% |
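The category names in the table above correspond to Unicode general categories. As a rough illustration of how such a tally can be produced (not necessarily how the profiling tool does it internally), the sketch below counts the two-letter category codes returned by Python's standard `unicodedata` module for one of the sample rows. The helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters per Unicode general category code (e.g. 'Ll', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample values shown in this report.
sample = "water, vegetable oil, wheat, salt"
counts = category_counts(sample)

# 'Ll' = Lowercase Letter, 'Zs' = Space Separator, 'Po' = Other Punctuation.
for code, count in counts.most_common():
    print(code, count)
```

Other codes that appear in the table include `Pd` (Dash Punctuation), `Nd` (Decimal Number), `So` (Other Symbol), `Ps` and `Pe` (Open and Close Punctuation), `Pf` (Final Punctuation) and `Sc` (Currency Symbol).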
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 601293 | 13.0% |
a | 377938 | 8.2% |
r | 375574 | 8.1% |
s | 348717 | 7.5% |
o | 336916 | 7.3% |
l | 298820 | 6.5% |
i | 294330 | 6.4% |
n | 254434 | 5.5% |
c | 243665 | 5.3% |
t | 224391 | 4.8% |
Other values (23) | 1271370 |
Decimal Number
Value | Count | Frequency (%) |
1 | 229 | 54.8% |
2 | 92 | 22.0% |
4 | 29 | 6.9% |
0 | 21 | 5.0% |
5 | 17 | 4.1% |
3 | 15 | 3.6% |
9 | 5 | 1.2% |
8 | 5 | 1.2% |
7 | 4 | 1.0% |
6 | 1 | 0.2% |
Other Punctuation
Value | Count | Frequency (%) |
, | 389129 | 99.7% |
& | 379 | 0.1% |
% | 322 | 0.1% |
' | 199 | 0.1% |
. | 48 | < 0.1% |
! | 30 | < 0.1% |
/ | 2 | < 0.1% |
Other Symbol
Value | Count | Frequency (%) |
® | 181 | 75.7% |
™ | 58 | 24.3% |
Space Separator
Value | Count | Frequency (%) |
(space) | 768053 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 11306 |
Open Punctuation
Value | Count | Frequency (%) |
( | 45 |
Close Punctuation
Value | Count | Frequency (%) |
) | 45 |
Final Punctuation
Value | Count | Frequency (%) |
’ | 7 |
Currency Symbol
Value | Count | Frequency (%) |
€ | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 4627448 | 79.8% |
Common | 1170223 | 20.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 601293 | 13.0% |
a | 377938 | 8.2% |
r | 375574 | 8.1% |
s | 348717 | 7.5% |
o | 336916 | 7.3% |
l | 298820 | 6.5% |
i | 294330 | 6.4% |
n | 254434 | 5.5% |
c | 243665 | 5.3% |
t | 224391 | 4.8% |
Other values (23) | 1271370 |
Common
Value | Count | Frequency (%) |
(space) | 768053 | 65.6% |
, | 389129 | 33.3% |
- | 11306 | 1.0% |
& | 379 | < 0.1% |
% | 322 | < 0.1% |
1 | 229 | < 0.1% |
' | 199 | < 0.1% |
® | 181 | < 0.1% |
2 | 92 | < 0.1% |
™ | 58 | < 0.1% |
Other values (15) | 275 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5796699 | > 99.9% |
None | 906 | < 0.1% |
Letterlike Symbols | 58 | < 0.1% |
Punctuation | 7 | < 0.1% |
Currency Symbols | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 768053 | 13.2% |
e | 601293 | 10.4% |
, | 389129 | 6.7% |
a | 377938 | 6.5% |
r | 375574 | 6.5% |
s | 348717 | 6.0% |
o | 336916 | 5.8% |
l | 298820 | 5.2% |
i | 294330 | 5.1% |
n | 254434 | 4.4% |
Other values (37) | 1751495 |
None
Value | Count | Frequency (%) |
é | 268 | 29.6% |
è | 225 | 24.8% |
® | 181 | 20.0% |
î | 146 | 16.1% |
ç | 50 | 5.5% |
ú | 21 | 2.3% |
â | 12 | 1.3% |
í | 3 | 0.3% |
Letterlike Symbols
Value | Count | Frequency (%) |
™ | 58 |
Punctuation
Value | Count | Frequency (%) |
’ | 7 |
Currency Symbols
Value | Count | Frequency (%) |
€ | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
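A minimal sketch of that calculation, using only the Python standard library: values are converted to ranks (ties share their mean rank, which is an assumption about tie handling), and the ranks are then fed to an ordinary Pearson computation. The function names are illustrative and are not taken from the profiling tool.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums are used. Constant input
    would divide by zero and is not handled in this sketch.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman_rho(x, y), pearson(x, y))
```

On this monotonic but nonlinear example, ρ reaches 1 (up to floating-point error) while Pearson's r stays below 1.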
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
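The following sketch implements that formula directly and illustrates the invariance property. It is an illustrative standard-library version rather than the tool's own code; the function name and sample data are made up for the example, and only positive scale factors are used (a negative factor would flip the sign of r).

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors cancel, so plain sums are used. Constant input
    would divide by zero and is not handled in this sketch.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 11.0]

r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])

# Exact linear relationships give r = 1 whatever the (positive) slope.
r_shallow = pearson_r(x, [0.1 * a + 1 for a in x])
r_steep = pearson_r(x, [50 * a + 1 for a in x])

print(r, r_transformed, r_shallow, r_steep)
```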
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
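The pair-counting definition above can be sketched in a few lines. This version follows the description literally (often called tau-a) and counts tied pairs as neither concordant nor discordant. Library implementations commonly apply a tie correction instead, so their results can differ when ties are present. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

The quadratic loop over all pairs is fine for small inputs; faster O(n log n) algorithms exist for larger ones.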
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
First rows
index | id | cuisine | ingredients |
---|---|---|---|
0 | 10259 | greek | romaine lettuce, black olives, grape tomatoes, garlic, pepper, purple onion, seasoning, garbanzo beans, feta cheese crumbles |
1 | 25693 | southern_us | plain flour, ground pepper, salt, tomatoes, ground black pepper, thyme, eggs, green tomatoes, yellow corn meal, milk, vegetable oil |
2 | 20130 | filipino | eggs, pepper, salt, mayonaise, cooking oil, green chilies, grilled chicken breasts, garlic powder, yellow onion, soy sauce, butter, chicken livers |
3 | 22213 | indian | water, vegetable oil, wheat, salt |
4 | 13162 | indian | black pepper, shallots, cornflour, cayenne pepper, onions, garlic paste, milk, butter, salt, lemon juice, water, chili powder, passata, oil, ground cumin, boneless chicken skinless thigh, garam masala, double cream, natural yogurt, bay leaf |
5 | 6602 | jamaican | plain flour, sugar, butter, eggs, fresh ginger root, salt, ground cinnamon, milk, vanilla extract, ground ginger, powdered sugar, baking powder |
6 | 42779 | spanish | olive oil, salt, medium shrimp, pepper, garlic, chopped cilantro, jalapeno chilies, flat leaf parsley, skirt steak, white vinegar, sea salt, bay leaf, chorizo sausage |
7 | 3735 | italian | sugar, pistachio nuts, white almond bark, flour, vanilla extract, olive oil, almond extract, eggs, baking powder, dried cranberries |
8 | 16903 | mexican | olive oil, purple onion, fresh pineapple, pork, poblano peppers, corn tortillas, cheddar cheese, ground black pepper, salt, iceberg lettuce, lime, jalapeno chilies, chopped cilantro fresh |
9 | 12734 | italian | chopped tomatoes, fresh basil, garlic, extra-virgin olive oil, kosher salt, flat leaf parsley |
Last rows
index | id | cuisine | ingredients |
---|---|---|---|
39764 | 8089 | mexican | chili powder, worcestershire sauce, celery, red kidney beans, lean ground beef, stewed tomatoes, dried parsley, pepper, red wine vinegar, salt, ground cumin, dried basil, red wine, red bell pepper |
39765 | 6153 | indian | coconut, unsweetened coconut milk, mint leaves, plain yogurt |
39766 | 25557 | irish | rutabaga, ham, thick-cut bacon, potatoes, fresh parsley, salt, onions, pepper, carrots, pork sausages |
39767 | 24348 | italian | low-fat sour cream, grated parmesan cheese, salt, dried oregano, low-fat cottage cheese, butter, onions, olive oil, artichok heart marin, ground cayenne pepper, ground black pepper, garlic, spaghetti |
39768 | 7377 | mexican | shredded cheddar cheese, crushed cheese crackers, cheddar cheese soup, cream of chicken soup, hot sauce, diced green chilies, salt, pepper, cooked chicken, rotini |
39769 | 29109 | irish | light brown sugar, granulated sugar, butter, warm water, large eggs, all-purpose flour, whole wheat flour, cooking spray, boiling water, steel-cut oats, dry yeast, salt |
39770 | 11462 | italian | kraft zesty italian dressing, purple onion, broccoli florets, rotini, pitted black olives, kraft grated parmesan cheese, red pepper |
39771 | 2238 | irish | eggs, citrus fruit, raisins, sourdough starter, flour, hot tea, sugar, ground nutmeg, salt, ground cinnamon, milk, butter |
39772 | 41882 | chinese | boneless chicken skinless thigh, minced garlic, steamed white rice, baking powder, corn starch, dark soy sauce, kosher salt, peanuts, flour, scallions, chinese rice vinegar, vodka, fresh ginger, egg whites, broccoli, toasted sesame seeds, sugar, store bought low sodium chicken stock, baking soda, shaoxing wine, oil |
39773 | 2362 | mexican | green chile, jalapeno chilies, onions, ground black pepper, salt, chopped cilantro fresh, green bell pepper, garlic, white sugar, roma tomatoes, celery, dried oregano |