Commit 27daca3f authored by farber2309

Update FastAPI-Docker project/Dockerfile, FastAPI-Docker project/README.md, FastAPI-Docker project/requirements.txt, FastAPI-Docker project/app/__init__.py, FastAPI-Docker project/app/functions.py, FastAPI-Docker project/app/main.py, FastAPI-Docker project/app/lastfm-matrix-germany.csv
parent 49a6b2ab
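FROM python:3.8.12
WORKDIR /fastapi
# install dependencies first so this layer stays cached when only the app code changes
COPY ./requirements.txt /fastapi/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY ./app /fastapi/app
# documents the port uvicorn listens on; it still has to be published with `docker run -p`
EXPOSE 5000
# listen on all interfaces inside the container so the published port can reach the app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "5000"]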
Dear Kptn_Cook team,
Thank you for this interesting challenge.
I consider every task an opportunity to learn something new, and this one was particularly great. I hadn't used FastAPI or Docker before, so I was genuinely amazed at how useful and important these tools are. It took me more time than expected to complete the task, but I'm grateful that I learned so much about how to use them. Because of my busy schedule, I haven't had much time to improve my application, but I hope it still meets all the requirements.
In my solution, I first wrote the Python functions {first_endpoint} (mark that a user listened to a band) and {second_endpoint} (suggest a band the user hasn't listened to yet), developing them with pandas in a Jupyter notebook. The functions are commented and can be found in {functions.py}.
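To make the data layout concrete: {first_endpoint} treats the CSV as a user-by-artist matrix in which a non-zero entry means "listened", and it sets one entry to 1. Below is a minimal pandas-free sketch of that update. It assumes a hypothetical in-memory list of row dicts instead of the real lastfm-matrix-germany.csv, with 0/1 flags; the helper name `mark_listened` and the artist names are made up for illustration.

```python
from typing import Dict, List

# Hypothetical toy rows: "user" holds the id, every other key is an artist flag (0 or 1).
Matrix = List[Dict[str, int]]


def mark_listened(matrix: Matrix, user_id: int, band: str) -> str:
    """Set the (user_id, band) entry to 1 and report how many artists were listened to."""
    row = next((r for r in matrix if r["user"] == user_id), None)
    if row is None:
        return "no such user found"
    # the "user" key is the id, not an artist, so it must never be overwritten
    if band == "user" or band not in row:
        return "no such artist found"
    before = sum(v for k, v in row.items() if k != "user")
    row[band] = 1  # mutates the row in place, like df.at[ind, band] = 1
    after = sum(v for k, v in row.items() if k != "user")
    return f"number of artists listened before: {before} and after the change: {after}"


matrix = [
    {"user": 1, "artist_a": 0, "artist_b": 1, "artist_c": 0},
    {"user": 2, "artist_a": 1, "artist_b": 1, "artist_c": 1},
]
print(mark_listened(matrix, 1, "artist_c"))
# number of artists listened before: 1 and after the change: 2
```

Calling it again with the same arguments leaves the count unchanged, since the entry is already 1.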
Let me briefly explain how the "suggestion" function {second_endpoint} works. Since we have no labeled training data and only a small dataset, I thought that machine learning would be too time-consuming and ineffective here, so I simply used the built-in correlation function from pandas, which seems to work quite well. My approach is as follows. First I find all non-zero entries of the chosen user, i.e. the artists they have already listened to. Then I take one of these "known" artists at random, find the 5 artists most correlated with it, and check whether the most correlated one is already known. If not, the function suggests it; otherwise it takes the second most correlated one, and so on. If all of them are known, I take another listened artist and repeat this up to three more times. If, for some reason, every candidate was already known, a random artist is suggested. A random artist is also suggested directly if the user has listened to more than 100 bands.
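The suggestion procedure described above can be sketched without pandas. This is an illustration under stated assumptions, not the code in {functions.py}: it uses a hypothetical row-dict layout, computes the Pearson correlation by hand (pandas' `DataFrame.corr` uses Pearson by default), skips constant columns where pandas would produce NaN, and takes an optional `random.Random` so results can be reproduced. The names `pearson` and `suggest` and the toy data are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

Matrix = List[Dict[str, int]]  # hypothetical layout: {"user": id, artist: 0/1, ...}


def pearson(xs: List[int], ys: List[int]) -> Optional[float]:
    """Pearson correlation; None when a column is constant (pandas gives NaN there)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def suggest(matrix: Matrix, user_id: int, n: int = 5, attempts: int = 4,
            rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    row = next((r for r in matrix if r["user"] == user_id), None)
    if row is None:
        return "no such user found"
    artists = [k for k in row if k != "user"]
    listened = [a for a in artists if row[a] != 0]
    unknown = [a for a in artists if row[a] == 0]
    if not unknown:
        return "no unknown artist left to suggest"
    # no history, or more than 100 listened bands: fall back to a random unknown artist
    if not listened or len(listened) > 100:
        return f"suggested artist: {rng.choice(unknown)}"
    column = {a: [r[a] for r in matrix] for a in artists}
    for _ in range(attempts):  # first try plus 3 retries
        seed = rng.choice(listened)  # a random "known" artist
        scored = []
        for a in artists:
            if a == seed:
                continue
            c = pearson(column[seed], column[a])
            if c is not None:
                scored.append((c, a))
        scored.sort(reverse=True)  # most correlated first
        # walk the n most correlated artists and return the first unknown one
        for _, candidate in scored[:n]:
            if candidate not in listened:
                return f"suggested artist: {candidate}"
    return f"suggested artist: {rng.choice(unknown)}"


toy = [
    {"user": 1, "artist_a": 1, "artist_b": 1, "artist_c": 0, "artist_d": 0},
    {"user": 2, "artist_a": 1, "artist_b": 1, "artist_c": 1, "artist_d": 0},
    {"user": 3, "artist_a": 0, "artist_b": 0, "artist_c": 1, "artist_d": 1},
    {"user": 4, "artist_a": 1, "artist_b": 0, "artist_c": 0, "artist_d": 0},
]
print(suggest(toy, 4, rng=random.Random(0)))
# suggested artist: artist_b
```

User 4 has only heard artist_a, whose column correlates positively with artist_b (about 0.58) and negatively with artist_c and artist_d, so artist_b is suggested.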
Then I started working with FastAPI and Docker to build a simple app and containerize it. First, I installed FastAPI and created a virtual environment following the instructions on the FastAPI website. Then I watched a lot of FastAPI tutorials to write my endpoints and get them working properly. I used the interactive OpenAPI docs to try out my app (by opening /docs on the local address 127.0.0.1). After that, I installed Docker and learned about its features. The documentation was a bit complicated, so I had to read a lot of it. After my first successes with creating requirements.txt and the Dockerfile, I unfortunately couldn't reach my local server. The solution was to build the Docker image and then run the container with its port published to the host: the app listens on port 5000 inside the container, and [docker run -p 3000:5000 fastapi] maps it to port 3000 on my machine.
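Besides /docs, the two endpoints can be called from a script. A minimal standard-library sketch, assuming the container was started with `docker run -p 3000:5000 fastapi` and that the routes are reachable at /users/band_suggested and /users/band_listened; the helper names, the base URL, and the route paths are assumptions to adjust to the ones declared in main.py.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: host port 3000 is published to the container's port 5000.
BASE_URL = "http://127.0.0.1:3000"


def suggestion_request(user_id: int) -> Request:
    """GET request for a band suggestion; user_id travels as a query parameter."""
    query = urlencode({"user_id": user_id})
    return Request(f"{BASE_URL}/users/band_suggested?{query}")


def listened_request(user_id: int, band: str) -> Request:
    """PUT request that marks `band` as listened for `user_id` (hypothetical route path)."""
    query = urlencode({"user_id": user_id, "band": band})
    return Request(f"{BASE_URL}/users/band_listened?{query}", method="PUT")


req = listened_request(1, "some artist")
print(req.get_method(), req.full_url)
# PUT http://127.0.0.1:3000/users/band_listened?user_id=1&band=some+artist
```

With the container running, `urllib.request.urlopen(suggestion_request(1))` sends the request; the response body is the JSON-encoded string returned by the endpoint. Only the requests are built here, so nothing is sent until `urlopen` is called.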
import random

import pandas as pd

CSV_PATH = "app/lastfm-matrix-germany.csv"


def first_endpoint(user_id: int, band: str):
    """
    Input: user id as {user_id} and band name as {band}
    Output: sets entry a_{user_id}{band} to 1 and reports the number of
    listened artists before and after the change.
    With pandas we update our csv matrix with the
    information that user number {user_id} listened to band {band}.
    """
    df = pd.read_csv(CSV_PATH)  # import file as dataframe
    if user_id not in df["user"].to_list():
        return "no such user found"
    # "user" is the id column, not an artist, so it must not be overwritten
    if band == "user" or band not in df.columns:
        return "no such artist found"
    ind = df.index[df["user"] == user_id][0]  # row index of user {user_id}
    num1 = df.loc[ind].drop("user").sum()  # artists listened before the change
    df.at[ind, band] = 1  # change entry
    df.to_csv(CSV_PATH, index=False)  # overwrite the file
    num2 = df.loc[ind].drop("user").sum()  # artists listened after the change
    return f"number of artists listened before: {num1} and after the change: {num2}"


def second_endpoint(user_id: int):
    """
    Input: user id as {user_id}
    Output: suggestion of a band which wasn't
    previously listened to by user {user_id}.
    For someone who listened to more than 100 bands we suggest a random unknown one.
    Else we choose an artist this user already listened to and then find the
    most correlated one; if it was also listened to, we choose the second most
    correlated one and so on.
    If the {n} most correlated were all already listened to, we repeat the
    procedure 3 more times. If all of them were also already listened to,
    we return a random unknown one.
    This program was originally made in Jupyter, so one can follow the steps one by one.
    """
    df = pd.read_csv(CSV_PATH)  # import file
    # check if this user exists
    if user_id not in df["user"].to_list():
        return "no such user found"
    ind = df.index[df["user"] == user_id][0]
    artists = df.drop("user", axis=1)
    row = artists.loc[ind]
    # artists this user listened to (non-zero entries) and those not heard yet
    listened = row[row != 0].index.to_list()
    unknown = row[row == 0].index.to_list()
    num = len(listened)
    if not unknown:
        return "no unknown artist left to suggest"
    # no listening history or more than 100 listened bands: return a random unknown one
    if num == 0 or num > 100:
        return f"suggested artist: {random.choice(unknown)}"
    n = 5
    corr = artists.corr()  # artist-by-artist correlation, computed once
    # first attempt plus 3 more attempts, each with a random listened artist
    for _ in range(4):
        random_singer_known = random.choice(listened)
        # n most correlated artists, excluding the chosen one itself (NaN values dropped)
        suggestions = corr[random_singer_known].drop(random_singer_known).nlargest(n)
        # check whether they were already listened to, most correlated first
        for candidate in suggestions.index:
            if candidate not in listened:
                return f"suggested artist: {candidate}"  # if not, we return our suggestion
    # in the worst case we give a random unknown artist as output
    return f"suggested artist: {random.choice(unknown)}"
from fastapi import FastAPI

from app.functions import first_endpoint, second_endpoint

# we start our app
app = FastAPI()


# something to see on the first page
@app.get("/")
def read_root():
    return "Welcome to my solution to your task, please type /docs after the local host to interact with the app"


# in this function we change the entry locally in the Docker image
# (plain `def`: the pandas file I/O is blocking, so FastAPI runs it in a threadpool)
@app.put("/users/band_listened")
def add_listened_band(user_id: int, band: str):
    return first_endpoint(user_id, band)


# this function gives us a prediction
@app.get("/users/band_suggested")
def suggest_band(user_id: int):
    return second_endpoint(user_id)
anyio==3.6.1
asgiref==3.5.2
click==8.1.3
fastapi==0.78.0
h11==0.13.0
idna==3.3
pandas==1.4.2
pydantic==1.9.1
python-dateutil==2.8.2
pytz==2022.1
six==1.16.0
sniffio==1.2.0
starlette==0.19.1
typing-extensions==4.2.0
uvicorn==0.17.6