-
Notifications
You must be signed in to change notification settings - Fork 3
Ilias_Khugaev-w2-Databases #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Ilias_Khugaev-w2-Databases #13
Conversation
yunchen4
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Ilias,
There is a big point that you need to think again and rework. It's about the relationship between authors and papers. Is it one-to-one, one-to-many, or many-to-many?
Please let me know on Slack when you finish the rework. Don't hesitate to contact me if you have any questions!
Week2/assignment/Keys.js
Outdated
| CREATE TABLE IF NOT EXISTS authors ( | ||
| author_id INT AUTO_INCREMENT PRIMARY KEY, | ||
| author_name VARCHAR(50) NOT NULL, | ||
| university TEXT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: TEXT is for long text. VARCHAR should be enough for university name.
Week2/assignment/Keys.js
Outdated
| university TEXT, | ||
| date_of_birth DATE NOT NULL, | ||
| h_index INT NOT NULL, | ||
| paper_id INT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need rework: if an author write several papers, is paper_id a good idea to have in the author table?
You can re-think about the relationship between papers and authors.
|
@yunchen4 Thank you for your feedback! I’ve made changes based on your suggestions and added an additional join table between authors and papers. I hope this is a good solution for handling cases where one author has multiple research publications. |
yunchen4
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ilias,
Your new table schema is correct. However you also need to change your solutions for aggregating functions to use the new relationship table. I have made them explicit.
Please let me know if you have any questions!
Week2/assignment/Relationships.js
Outdated
| paper_id INT AUTO_INCREMENT PRIMARY KEY, | ||
| paper_title VARCHAR(100) NOT NULL, | ||
| conference TEXT, | ||
| author_id INT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have a new table for the relationship between authors and papers, you don't need author_id column in paper table, as one paper can be written by several authors.
Week2/assignment/Joins.js
Outdated
|
|
||
| export const joins = async() => { | ||
| connection.query("SELECT author_name, mentor FROM authors"); | ||
| connection.query("SELECT authors.author_name, research_papers.paper_title FROM authors LEFT JOIN research_papers ON authors.author_id = research_papers. author_id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs rework: what if a paper has several different authors? You should use the relationship table.
|
|
||
|
|
||
| export const aggreg = async() => { | ||
| connection.query("SELECT paper_title, COUNT(author_id) AS author_count FROM research_papers GROUP BY paper_title"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs rework: based on the current table schema, you always get one author per paper, as author_id in paper table accepts only one int. You need to use the relationship table.
| connection.query("SELECT paper_title, COUNT(author_id) AS author_count FROM research_papers GROUP BY paper_title"); | ||
| connection.query("SELECT COUNT(research_papers.paper_id) FROM research_papers INNER JOIN authors on authors.author_id = research_papers.author_id WHERE authors.gender = 'female'"); | ||
| connection.query("SELECT university, FROM authors GROUP BY university"); | ||
| connection.query("SELECT university, COUNT(paper_id) FROM authors GROUP BY university"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connection.query("SELECT university, COUNT(paper_id) FROM authors GROUP BY university");
Needs rework: you have changed your table schemas to have the relationship table, and there's no paper_id column in authors. So you have to write this query using the relationship table. And be careful: what if several authors from the same university are writing the same paper? Will the same paper be counted multiple times?
|
|
||
| export const aggreg = async() => { | ||
| connection.query("SELECT paper_title, COUNT(author_id) AS author_count FROM research_papers GROUP BY paper_title"); | ||
| connection.query("SELECT COUNT(research_papers.paper_id) FROM research_papers INNER JOIN authors on authors.author_id = research_papers.author_id WHERE authors.gender = 'female'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connection.query("SELECT COUNT(research_papers.paper_id) FROM research_papers INNER JOIN authors on authors.author_id = research_papers.author_id WHERE authors.gender = 'female'");
Needs rework: same here, you need to use the relationship table. And be careful: what if several female authors are writing the same paper? Will the same paper be counted multiple times?
Week2/assignment/Relationships.js
Outdated
| export const relationships = async() => { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw you export those functions but I didn't see you use them?
yunchen4
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ilias,
I left some comments for the aggregate problems. Overall you're doing them correctly, but there are some points you need to check again and be aware of.
I will approve your PR.
|
|
||
| export const aggreg = async() => { | ||
| connection.query("SELECT rp.paper_id,rp.paper_title, COUNT(ap.author_id) AS number_of_authors FROM research_papers rp LEFT JOIN author_papers ap ON rp.paper_id = ap.paper_id GROUP BY rp.paper_id, rp.paper_title"); | ||
| connection.query("SELECT COUNT(ap.paper_id) AS total_papers_by_female_authors FROM authors a JOIN author_papers ap ON a.author_id = ap.author_id a.gender = 'female';"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please notice this query has syntax error: "SELECT COUNT(ap.paper_id) AS total_papers_by_female_authors FROM authors a JOIN author_papers ap ON a.author_id = ap.author_id a.gender = 'female';"
You're missing where.
And notice that you will count the same paper for multiple times if several female authors write this paper. If you don't want to count duplicated paper, you need to use DISTINCT.
| connection.query("SELECT rp.paper_id,rp.paper_title, COUNT(ap.author_id) AS number_of_authors FROM research_papers rp LEFT JOIN author_papers ap ON rp.paper_id = ap.paper_id GROUP BY rp.paper_id, rp.paper_title"); | ||
| connection.query("SELECT COUNT(ap.paper_id) AS total_papers_by_female_authors FROM authors a JOIN author_papers ap ON a.author_id = ap.author_id a.gender = 'female';"); | ||
| connection.query("SELECT AVG(h_index) AS avg_h_index FROM authors GROUP BY university"); | ||
| connection.query("SELECT a.university, COUNT(ap.paper_id) AS total_papers FROM authors a JOIN author_papers ap ON a.author_id = ap.author_id GROUP BY a.university"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, if several authors from the same university write the same paper, this paper will be counted multiple times.
No description provided.