Okay, let me see...
These instructions should create the exact chart:
None
What am I missing ?
the parameter datatypes are not being changed when loading them up.
These are the auto-logged parameters inside YOLO, correct?
Just to make sure, you can actually see the value None in the UI, is that correct? (If everything works as expected, you should see an empty string there.)
DeterminedToad86
Yes I think this is the issue, on SageMaker a specific compiled version of torchvision was installed (probably part of the image)
Edit the Task (before enqueuing) and change the torchvision URL to: torchvision==0.7.0
Let me know if it worked
yes 🙂
But I think that when you get internal_task_representation.execution.script you are basically already getting the API object (obviously with the correct version), so you can edit it in place and pass it too.
JitteryCoyote63 I think there is a ClearML logger, no?
No worries 🙂 glad to hear it worked out 🙂
JitteryCoyote63
are the calls from the agents made asynchronously/in a non-blocking separate thread?
You mean, is request processing on the apiserver multi-threaded / multi-processed?
Not sure why, but for some reason it seems it is failing to analyze the code, hence the warning and no packages...
Any other hints on your setup that might help to better understand the root cause? Maybe a home folder with unicode characters? Python installed in a specific way?
Are you running a Jupyter notebook inside VSCode?
LOL, okay, I'm not sure we can do something about that one.
You should probably increase the storage on your instance 🙂
well cudnn is actually missing from the base image...
Hi MysteriousWalrus11
"parents": [
"step_two",
"step_four"
],
Seems like step 5 depends on steps 2+4. How did you create it? What did the console say?
Could it be you're not actually passing any output from step3? How is it dependent on it?
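To make the "parents" point concrete, here is a rough sketch of how a list of parents per step turns into an execution order. Only the step_five entry comes from the snippet above; the other steps and their parents are made up for illustration, and this uses the standard library graphlib (Python 3.9+), not the pipeline controller itself.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: step name -> names of its parent steps.
# Only the "step_five" entry is taken from the thread; the rest is invented.
parents = {
    "step_one": [],
    "step_two": ["step_one"],
    "step_three": ["step_one"],
    "step_four": ["step_two"],
    "step_five": ["step_two", "step_four"],
}

def execution_order(parents):
    """Return one valid order in which the steps could run (parents first)."""
    return list(TopologicalSorter(parents).static_order())

order = execution_order(parents)
print(order)
```

Note that step_three never appears among the parents of step_five, so nothing forces step_five to wait for it.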
I mean clone the Task in the UI (right click, Clone), then go to the Execution tab, to the "installed packages" section, click Edit, go to the torchvision http link, replace it with torchvision == 0.7.0 and save.
Then enqueue the Task (right click, Enqueue, to the default queue) and see if the Agent can run it.
DeterminedToad86 Makes sense?
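If you would rather script that text edit than do it by hand, something along these lines would do the replacement on the "installed packages" text. This is only a sketch in plain Python: pin_package is a hypothetical helper, the wheel URL is a made-up placeholder, and nothing here talks to the server.

```python
import re

def pin_package(requirements, package, version):
    """Replace any requirement line that refers to `package` with `package==version`."""
    pinned = f"{package}=={version}"
    # Match the package name at the start of a line ("torchvision>=0.5") or as a
    # wheel file name at the end of a URL path (".../torchvision-0.7.0-...whl").
    pattern = re.compile(rf"(^|/){re.escape(package)}([-=<>~! @\[]|$)", re.IGNORECASE)
    out = []
    for line in requirements.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and pattern.search(stripped):
            out.append(pinned)
        else:
            out.append(line)
    return "\n".join(out)

requirements = "\n".join([
    "torch==1.6.0",
    "https://example.com/whl/torchvision-0.7.0-cp36-cp36m-linux_x86_64.whl",  # hypothetical URL
    "numpy",
])
print(pin_package(requirements, "torchvision", "0.7.0"))
```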
WackyRabbit7 interesting! Are those "local" pipelines all part of the same code repository? Do they need their own environment?
What would be the easiest pipeline interface to run them locally? (I wonder if we could support this workflow, it seems you are not alone in this approach, and of course you can always use them remotely, i.e. clone the pipeline and launch it on an agent)
You need to set CLEARML_DEFAULT_BASE_SERVE_URL so it knows how to access itself.
Ohh "~/trains.conf" is root probably
Hi ShinyWhale52
Every execution of the pipeline (by definition) will create a new job based on the pipeline steps
This is the reason you see all the steps twice: the default assumption is that you wish to re-run each step, as this is part of the processing workflow (e.g. training a model).
the model has been overwritten. I guess this is due to this instruction:
This is because you are storing it locally to the same path, it just reflects the fact you just overwrote your model.
To create a...
This is exactly what I did here, and it is working 😞
https://demoapp.demo.clear.ml/projects/0e919ea1cc5c499b99e1ab85004b6e97/experiments/887edef09d4549e88b829a34c87d4d5b/output/execution
JitteryCoyote63 This seems like exactly what you are saying, elastic license issue...
I think you are correct, None values should be listed as empty values, not as the string None.
What's the clearml version you are using? And could you retest with the latest RC?
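Just to illustrate the expected behavior (this is not the actual ClearML code, only a sketch with a hypothetical helper and made-up parameter names): a None value should end up as an empty string when rendered, everything else as its string form.

```python
def to_display_value(value):
    """Render a parameter value for display: None becomes an empty string."""
    return "" if value is None else str(value)

# Hypothetical parameters, for illustration only
params = {"lr": 0.01, "weights": None, "name": "exp"}
shown = {key: to_display_value(value) for key, value in params.items()}
print(shown)
```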
Okay that means it is running in virtual environment mode.
On the original Task (the one you enqueued) what were the installed packages (specifically torch/torchvision)?
- Could you explain how I can reproduce the missing jupyter notebook (i.e. the ipykernel_launcher.py)?
Hi BoredSquirrel45
as of today, my required packages aren't being recognized in cloned
Are you saying you are editing the code directly in the cloned Task, then enqueue the Task and the agent does not "auto recognize" the package?
Hi DashingCentipede5
Notice that you called "start_locally": it tries to run the code locally inside your Jupyter notebook and assumes everything, including the code, already exists. Is that your case?
HandsomeCrow5 if you want to edit the Task object you can just use:
internal_task_representation = task.data
internal_task_representation.execution.script = ...
task._edit(execution=internal_task_representation.execution)
This will make sure you do not need to worry about API version etc. the Task object will take care of it.
BTW: it seems a few more people wanted this ability, maybe we should add a proper .edit method to Task. Thoughts?
HandsomeCrow5
So using the _edit method you have the ability to add/edit the execution.script field without worrying about the API version (I guess the name edit is misleading, it also does add :)
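Here is the same flow wrapped in a small helper, as a sketch only. task.data and task._edit(execution=...) are the calls mentioned above; FakeTask is a hypothetical stand-in so the snippet runs without a server, and the dict passed as the script is just a placeholder, not the real structure of the script field.

```python
from types import SimpleNamespace

def set_execution_script(task, script):
    """Set execution.script on a Task-like object and pass it through task._edit."""
    internal = task.data  # internal representation, as in the messages above
    internal.execution.script = script
    task._edit(execution=internal.execution)
    return internal.execution

class FakeTask:
    """Hypothetical stand-in for a Task; it only records what _edit receives."""
    def __init__(self):
        self.data = SimpleNamespace(execution=SimpleNamespace(script=None))
        self.edits = []

    def _edit(self, **kwargs):
        self.edits.append(kwargs)

task = FakeTask()
# Placeholder value: the real script field has its own structure
set_execution_script(task, {"repository": "https://example.com/repo.git", "branch": "main"})
print(task.edits)
```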
btw, I looked deeper into the log:
File "/tmp/tmpfa8ifmka.py", line 80, in <module>
model.train(data='coco128.yaml',epochs=20)
I'm assuming this all starts here. I think the pipeline is not running the code from the same folder, so it simply cannot find 'coco128.yaml'. Try to pass a full path, wdyt?
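A quick way to build that full path, as a sketch: resolve the file against a folder you know instead of whatever the current working directory happens to be. resolve_data_file is a hypothetical helper, and the temporary folder below only stands in for your project directory.

```python
import tempfile
from pathlib import Path

def resolve_data_file(name, base_dir):
    """Return an absolute path for `name` inside `base_dir`, or fail early if it is missing."""
    path = (Path(base_dir) / name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"{name} not found under {base_dir}")
    return str(path)

# Demo with a throwaway folder standing in for the project directory
project_dir = Path(tempfile.mkdtemp())
(project_dir / "coco128.yaml").write_text("# placeholder\n")
data_path = resolve_data_file("coco128.yaml", project_dir)
print(data_path)
# Then pass it on, e.g.: model.train(data=data_path, epochs=20)
```

Failing early with a clear error is usually easier to debug than a missing-file error from deep inside the training call.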
Would love to just cap it at a fixed amount for a month for API calls.
Try the timeout configuration, I think this should solve all your issues, and it will be fairly easy to set for everyone
The package is just subdir by the way. So it should not be in installed packages anyways, right?
Correct. Also, when the agent is spinning up the code it will automatically add the root of the git repository to the PYTHONPATH, so you should be able to load the package.
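You can see the same effect locally with a small sketch. The "repository" here is just a temporary folder and demo_subdir_pkg is a made-up package name; the point is only that having the root folder on the Python path is enough for a sub-directory package to import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway "repository" with a package in a sub-directory (hypothetical names)
repo_root = Path(tempfile.mkdtemp())
pkg_dir = repo_root / "demo_subdir_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("ANSWER = 42\n")

# With the repository root on sys.path the package resolves,
# even though it was never installed with pip
sys.path.insert(0, str(repo_root))
importlib.invalidate_caches()
demo_subdir_pkg = importlib.import_module("demo_subdir_pkg")
print(demo_subdir_pkg.ANSWER)
```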