Thanks for the help. I'll try to continue working on the vm for now.
I'm not sure what a dataset task is. I mainly just created a dataset using ClearML.Dataset.Create
Honestly, anything. I tried looking it up on YouTube, but there's very little material there, especially material that's up to date. That's understandable given that ClearML is still in beta. I can look at courses / docs. I just want to be pointed in the right direction as to what I should look up and study.
Also, the tutorial mentioned serving-engine-ip as a variable, but I have no idea what the IP of the serving engine is.
I just want to be able to pass the output from some step as input to some other step.
These are the pipeline steps. I'm basically unable to pass these.
Some more of the error.
ValueError: Node train_model, parameter '${split_dataset.split_dataset_id}', input type 'split_dataset_id' is invalid
2021-12-30 16:22:00,130 - clearml.Repository Detection - WARNING - Failed auto-generating package requirements: exception SystemExit() not a BaseException subclass
I want to maybe have a variable in simple-pipeline.py which holds the value returned by split_dataset.
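Something like this is what I'm after (a rough sketch using clearml's decorator-based pipeline API rather than the controller I'm currently using; step and variable names are only illustrative, not my actual code):

from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["dataset_id"])
def split_dataset():
    # ...split logic here; return the id of the resulting dataset
    return "some-dataset-id"

@PipelineDecorator.component()
def train_model(dataset_id):
    # the value returned by split_dataset arrives here as a plain argument
    print("training on", dataset_id)

@PipelineDecorator.pipeline(name="simple-pipeline", project="examples", version="0.1")
def pipeline_logic():
    dataset_id = split_dataset()  # the variable holding the step's return value
    train_model(dataset_id)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    pipeline_logic()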
Alright. Can you guide me on how to edit the task configuration object? Is it done via the UI or programmatically? Is there a config file, and can it work with any config file I create, or is it a specific config file? Sorry for the barrage of questions.
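For reference, this is roughly what I'd try (a sketch, assuming task.connect_configuration accepts either a dict or a path to an arbitrary config file; the names are made up):

from clearml import Task

task = Task.init(project_name="examples", task_name="config-test")

# attach an arbitrary configuration dict; when the task is cloned, this object
# should become editable in the UI, and the (possibly edited) values come back here
config = {"batch_size": 32, "folder": "/data/incoming"}
config = task.connect_configuration(config, name="my_config")

# a plain config file can presumably be attached the same way:
# config_path = task.connect_configuration("path/to/my_config.yaml", name="my_config_file")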
So the API is something new for me; I've already seen the SDK. Am I misremembering being able to send a Python script and requirements to run on an agent directly from the CLI? Was there no such way?
What about the amount of storage required?
This is the task scheduler, btw, which will run a function every 6 hours.
I came to that conclusion, I think, yeah. Basically, I can access them as artifacts.
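i.e. something along these lines (a sketch; the task id and artifact name are placeholders):

from clearml import Task

# fetch the step's task by its id (placeholder id here)
step_task = Task.get_task(task_id="<step_task_id>")

# values returned/uploaded by the step show up under task.artifacts
print(list(step_task.artifacts.keys()))
split_dataset_id = step_task.artifacts["split_dataset_id"].get()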
Alright. Anyway, I'm practicing with the pipeline. I have an agent listening to the queue. The only problem is, it fails because of requirement issues, but I don't know how to pass requirements in this case.
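The only thing I'm aware of is something like this (a sketch, assuming Task.add_requirements is called before Task.init in the step's script; the package names are just examples):

from clearml import Task

# must be called before Task.init so the agent installs these for the task
Task.add_requirements("pandas")
Task.add_requirements("scikit-learn", "1.0.2")

task = Task.init(project_name="examples", task_name="pipeline-step")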
Here's the thread
https://clearml.slack.com/archives/CTK20V944/p1636613509403900
The question has been answered, though; you can take a look to see if I understood correctly there.
Elastic is what ClearML uses to handle data?
When I try to access the server with the IP I set as CLEARML_HOST_IP, it looks like this. I set that IP to the one assigned to me by the network.
I then did what MartinB suggested and got the id of the task from the pipeline DAG, and then it worked.
AnxiousSeal95 Basically, it's a function step's return value. If I do artifacts.keys(), there are no keys, even though the step prior to it does return the output.
I initially wasn't able to get the value this way.
I checked that the value is being returned, but I'm having issues accessing merged_dataset_id in the pre_execute_callback the way you showed me.
For anyone facing a similar issue to mine and wanting the model to be uploaded just like data is uploaded:
in Task.init, set output_uri = True.
This basically makes it use the default file server for ClearML that you define in the clearml.conf file. Ty.
With tensorflow's model.save, it saves the model locally in SavedModel format.
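In code it's just this (a sketch; the project/task names are placeholders):

from clearml import Task

# output_uri=True -> saved models are uploaded to the default files_server
# configured in clearml.conf instead of staying only on the local disk
task = Task.init(
    project_name="examples",
    task_name="train-model",
    output_uri=True,
)

# after this, a framework save such as tensorflow's model.save("my_model")
# gets picked up automatically and the model artifact is uploaded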
Thank you, I found the solution to my issue when I started reading about the default output URI.
CostlyOstrich36 I'm observing some weird behavior. Before, when I added tags to the model before publishing it, it worked fine and I could see the tags in the UI.
Now when I do it this way, the tags aren't set. If I then run another script which gets the model using its ID and then sets the tags, it works fine. Let me share the code.
Quick follow-up question. Once I parse args, should they be directly available before I even enqueue the project for the first time, or will I only be able to access the hyperparameters after running it once?
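This is roughly my setup (a sketch; the argument names are placeholders):

import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=32)
parser.add_argument("--epochs", type=int, default=10)

task = Task.init(project_name="examples", task_name="train")

# the argparse arguments are captured automatically and should show up
# under the task's hyperparameters (Args section) once the task exists
args = parser.parse_args()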
So I got my answer for the first one. I found where the data is stored on the server.
import os
from clearml import Dataset

def watch_folder(folder, batch_size):
    # count the files sitting in each class sub-folder of `folder`
    count = 0
    classes = os.listdir(folder)
    class_count = len(classes)
    files = []
    dirs = []
    for cls in classes:
        class_dir = os.path.join(folder, cls)
        fls = os.listdir(class_dir)
        count += len(fls)
        files.append(fls)
        dirs.append(class_dir)
    # once enough files have accumulated, snapshot the folder as a new dataset
    if count >= batch_size:
        dataset = Dataset.create(dataset_project='data-repo')
        dataset.add_files(folder)
        dataset.upload()
        dataset.finalize()
There are other parameters for add_task as well; I'm just curious how I pass the folder and batch size in the schedule_fn=watch_folder part.
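The only workaround I can think of is binding the arguments up front, something like this (a sketch, assuming the scheduler just calls whatever callable it is given; I don't know if there's a built-in way to pass arguments):

from functools import partial

# bind the arguments now, so the scheduler can call the function with no args
scheduled_fn = partial(watch_folder, folder="/data/incoming", batch_size=1000)

# then pass scheduled_fn wherever watch_folder was passed before,
# e.g. schedule_fn=scheduled_fn in the add_task call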
I'm using clearml-agent right now. I just upload the task inside a project. I've used argparse as well; however, as of yet, I have not been able to find writable hyperparameters in the UI. Is there any tutorial video you can recommend that deals with this, or something? I was following https://www.youtube.com/watch?v=Y5tPfUm9Ghg&t=1100s on YouTube, but I can't seem to recreate his steps as he sifts through his code.