Description
I have trained the model myself and would now like to use it for inference.
I am wondering how you parsed the HTML to get all the relevant nodes of the DOM tree. Below is my implementation, based on what I assume you used, but the model's results on new data are way off.
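```python
import os

import pandas as pd
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By

# `driver` (a Selenium WebDriver) and `urls` are defined earlier.
# Same 1280 px limit as before, now applied to the box's far edges.
SCREEN_W, SCREEN_H = 1280, 1280

for c, url in enumerate(urls):
    # Load the page and capture the visible viewport
    driver.get(url)
    driver.save_screenshot(os.path.join("test_data", "imgs", f"{c}.png"))

    locations = []
    # Every element that carries an id attribute
    # (find_elements_by_xpath was removed in Selenium 4.3)
    ids = driver.find_elements(By.XPATH, "//*[@id]")
    for ii in ids:
        try:
            if ii.is_displayed():
                location_dic = {}
                location_dic.update(ii.location)  # x, y
                location_dic.update(ii.size)      # width, height
                # Keep only boxes that lie entirely inside the screenshot
                if (location_dic["x"] >= 0 and location_dic["y"] >= 0
                        and location_dic["x"] + location_dic["width"] <= SCREEN_W
                        and location_dic["y"] + location_dic["height"] <= SCREEN_H):
                    locations.append(location_dic)
        except StaleElementReferenceException:
            # The element left the DOM between lookup and inspection
            continue

    # Save the bounding boxes as floats in a CSV
    bbox_df = pd.DataFrame(locations).astype(float)
    print(len(bbox_df))
    bbox_df.to_csv(os.path.join("test_data", "bboxes", f"{c}.csv"), sep=",", index=False)
```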
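One possible cause of the mismatch, offered as an assumption rather than a confirmed diagnosis: `location` and `size` are reported in CSS pixels, while the screenshot is saved in device pixels, so on a display whose `window.devicePixelRatio` is not 1 the boxes will not line up with the image. The sketch below reads every `[id]` element's box in a single JavaScript call via `getBoundingClientRect()` (viewport-relative, so it matches the viewport screenshot, and it cannot raise stale-element errors), then rescales and filters the boxes. `COLLECT_JS`, `collect_boxes`, and `to_screenshot_pixels` are hypothetical names, and the sketch keeps the same `[id]` node selection as the code above, which may not be how the original training data was built.

```python
from typing import Dict, List

# Runs in the page. getBoundingClientRect() gives viewport-relative CSS-pixel
# boxes; zero-size boxes (e.g. display:none elements) are dropped.
COLLECT_JS = """
return Array.from(document.querySelectorAll('[id]'))
  .map(function (e) {
    var r = e.getBoundingClientRect();
    return {x: r.left, y: r.top, width: r.width, height: r.height};
  })
  .filter(function (b) { return b.width > 0 && b.height > 0; });
"""


def collect_boxes(driver) -> List[Dict[str, float]]:
    """Fetch all [id] element boxes in one round trip to the browser."""
    return driver.execute_script(COLLECT_JS)


def to_screenshot_pixels(boxes: List[Dict[str, float]], dpr: float,
                         img_w: int, img_h: int) -> List[Dict[str, float]]:
    """Scale CSS-pixel boxes by devicePixelRatio; keep those fully inside the image."""
    kept = []
    for b in boxes:
        x, y = b["x"] * dpr, b["y"] * dpr
        w, h = b["width"] * dpr, b["height"] * dpr
        if x >= 0 and y >= 0 and x + w <= img_w and y + h <= img_h:
            kept.append({"x": x, "y": y, "width": w, "height": h})
    return kept
```

Inside the per-URL loop, `dpr` could be read with `driver.execute_script("return window.devicePixelRatio")`, and `img_w`, `img_h` taken from the saved PNG's dimensions rather than a hard-coded 1280.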