mostly the transformation of the pandas DataFrame - how the columns are added/removed/change types, NAs removed, rows removed, etc.
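for example, something like this is roughly what I mean by tracking it (just a sketch - the summarize helper, file name, and column are made up, but report_table is a real Logger method):
` import pandas as pd
from clearml import Task

task = Task.init(project_name="Sandbox", task_name="df_transform_tracking")

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    # the facts I care about: columns, dtypes, NA counts, row count
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "na_count": df.isna().sum(),
        "rows": len(df),
    })

df = pd.read_csv("raw.csv")  # hypothetical input
task.get_logger().report_table("schema", "before", table_plot=summarize(df))

df = df.dropna().astype({"id": "int64"})  # example transformations
task.get_logger().report_table("schema", "after", table_plot=summarize(df)) `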
for the tasks that are not deleted, the log is different: [2021-09-09 12:19:07,718] [8] [WARNING] [clearml.service_repo] Returned 400 for tasks.dequeue in 4ms, msg=Invalid task id: status=stopped, expected=queued
but the old ones are there, and I can't do anything about them
Thanks for the answer! Registering some metadata as a model doesn't feel correct to me. But anyway this is certainly not a show-stopper. Just wanted to clarify.
I think they appeared when I had a lot of HPO tasks enqueued and not yet started, and then I decided to either Abort or Archive them - I don't remember anymore
no new unremovable entries have appeared (although I haven't tried)
log: [2021-09-09 11:22:09,339] [8] [WARNING] [clearml.service_repo] Returned 400 for tasks.dequeue in 2ms, msg=Invalid task id: id=28d2cf5233fe41399c255950aa8b8c9d,company=d1bd92a3b039400cbafc60a7a5b1e52b
self-hosted. Just upgraded to the latest version today (1.1.1). The problem appeared when we were still using 1.0.2
this doesn't prevent us from enqueuing and running new tasks - it's more of an eyesore
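for anyone else hitting this, here's roughly how I tried poking at the queue through the API (just a sketch - the ids are placeholders, and I'm assuming the queues.get_by_id / queues.remove_task endpoints; removing may still 400 for the deleted tasks):
` from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_id = "<your-queue-id>"  # placeholder: the affected queue's id

# list what the server thinks is enqueued
queue = client.queues.get_by_id(queue=queue_id)
for entry in queue.entries:
    print(entry.task, entry.added)

# attempt to drop a stale entry (may still fail if the task no longer exists)
client.queues.remove_task(queue=queue_id, task="<stale-task-id>") `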
yeah, I think I'll go with schedule_function right now, but your proposed idea would make it even clearer.
slightly related follow-up question: can I add user properties to a scheduler configuration?
yes, I'll try it out
not sure I fully get it. Where will the connection between task and scheduler appear?
Maybe it makes sense to use schedule_function instead of schedule_task_id, and then the schedule function will clone the last task and start the clone?
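something like this is what I have in mind (a rough sketch - the clone_last_and_enqueue helper and the project/queue names are mine, assuming the TaskScheduler API from clearml.automation):
` from clearml import Task
from clearml.automation import TaskScheduler

def clone_last_and_enqueue():
    # find the most recent reporting task (assuming newest-last ordering),
    # clone it, and push the clone to an execution queue
    last = Task.get_tasks(project_name="Reports", task_name="daily_report")[-1]
    clone = Task.clone(source_task=last)
    Task.enqueue(clone, queue_name="default")

scheduler = TaskScheduler()
scheduler.add_task(schedule_function=clone_last_and_enqueue, hour=6, minute=0)
scheduler.start_remotely(queue="services") `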
I see that the scheduler task UI has the capability to edit user properties. But I don't see how I can read and/or write them through code
I want to have 2 instances of scheduler - 1 starts reporting jobs for staging, another one for prod
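for reference, this is roughly how I'd expect reading/writing to work in code, via Task.set_user_properties / Task.get_user_properties (the env property and its values are just my example):
` from clearml import Task

scheduler_task = Task.get_task(task_id="<scheduler-task-id>")  # placeholder id

# write: mark which environment this scheduler instance reports for
scheduler_task.set_user_properties(env="staging")

# read: returns a dict of {name: {"value": ..., ...}} entries
props = scheduler_task.get_user_properties()
print(props.get("env")) `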
and ClearML should strive to be clear, amirite? :)
worked fine, thanks!
I create the dataset like this:
` from clearml import Task, Dataset

project_name = "Sandbox"
task_name = "get_raw_data"
task = Task.init(
    project_name=project_name,
    task_name=task_name,
    task_type=Task.TaskTypes.data_processing,
)
dataset = Dataset.create(use_current_task=True)
# adding some files here, e.g. dataset.add_files(path="data/")
dataset.upload(verbose=True)
dataset.finalize(verbose=True) `
also - line 77, which sets (non-system) tags, is not invoked for me, so if I define different tags for the task and the dataset, the latter are lost
but I don't get to this line, because my task is already of type data_processing
I see that in the end, both query functions are calling Task._query_tasks
when I go into Dataset.list_datasets with the debugger and remove system_tags=["dataset"] from the api call params - I get the correct response back
Basically, my problem is that it returns an empty result. In the same code I can get the dataset by its ID, and I can get the task (which created the dataset) using Task.get_tasks() (without mentioning the ID explicitly)
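to illustrate, this is what I'm seeing (using the sandbox names from my snippet above; the Task.get_tasks call is the one that does return the task):
` from clearml import Task, Dataset

# returns [] for me - the query filters on system_tags=["dataset"]
print(Dataset.list_datasets(dataset_project="Sandbox", partial_name="get_raw_data"))

# this works: query the underlying data_processing task directly
tasks = Task.get_tasks(
    project_name="Sandbox",
    task_name="get_raw_data",
    task_filter={"type": ["data_processing"]},
)
print([t.id for t in tasks]) `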
I do see the "Data Processing" type task in the UI, together with all the other dataset-related features, like the lineage plot