Reputation
Badges 1
25 × Eureka!Hi @<1651395720067944448:profile|GiddyHedgehong81>
However I need for a yolov8 (Object detection with arround 20k jpgs and .txt files) the data.yaml file:
Just add the entire folder with your files to a dataset, then get it in your code
Add files (you can do that from CLI for example): None
clearml-data add --files my_folder_with_files
Then from code: [Non...
Ohh RotundHedgehog76 this implies a single jupyter hub with multiple uses, is that correct ?
(if this is the case, then yes, clearml-session is definitely not the correct solution, I would look for a helm chart for jupyter hub)
Yes I think the difference is running conda install with arguments vs conda install with env file...
Hi @<1523701868901961728:profile|ReassuredTiger98>
This should have worked, and seems like conda is not fetching the correct pytorch version (even though the conda env contains the cuda version they specify)
Let's try something, reset the Task, then edit the "Installed packages" and add:
cudatoolkit==11.1.1
Then try again.
Let's see what we get.
(The idea, is that I think conda forgets it just install cudatoolkit and assumes the env is for CPU)
Hi @<1523701601770934272:profile|GiganticMole91>
to use https although the scheduled task is using ssh for git?
Sure as long as it has git_user / git_pass configured in the agents clearml.conf it will automatically convert ssh to http git pull
None
This one is used when the agent manually downloads wheels, (pytorch mostly), but as you can see it is under ~/.clearml
directory, which usually is already shared on the host
RobustGoldfish9 do you see the trains-agent listed as a machine in the UI (under workers)
Okay we have something π
To your clearml.conf add:agent.docker_preprocess_bash_script = [ "su root", "cp -f /root/*.conf ~/", ]
Let's see if that works
Hi MortifiedCrow63
sawΒ
Β , ...
By default ClearML
will only log the exact local place where you stored the file, I assume this is it.
If you pass output_uri=True
to the Task.init
it will automatically upload the model to the files_server and then the model repository will point to the files_server (you can also have any object storage as model storage, e.g. output_uri=s3://bucket
)
Notice yo...
This looks exactly like the timeout you are getting.
I'm just not sure what's the diff between the Model autoupload and the manual upload.
Hi MortifiedCrow63
I finally got GS credentials, there is something weird going on. I can verify the issue, with model upload I get timeout error while upload_artifacts just works.
Just updating here that we are looking into it.
MortifiedCrow63 , hmmm can you test with manual upload and verify ?
(also what's the clearml version you are using)
Hi MortifiedCrow63 , thank you for pinging! (seriously greatly appreciated!)
See here:
https://github.com/googleapis/python-storage/releases/tag/v1.36.0
https://github.com/googleapis/python-storage/pull/374
Can you test with the latest release, see if the issue was fixed?
https://github.com/googleapis/python-storage/releases/tag/v1.41.0
CooperativeFox72
Could you try to run the docker and then inside the docker try to do:su root whoami
Thatβs the question i want to raise too,
No file size limit
Let me try to run it myself
My question is if there is an easy way to track gradients similar to
wandb.watch
@<1523705099182936064:profile|GrievingDeer61> not at the moment, but should be fairly easy to add.
Usually torch examples just use TB as a default logging, which would go directly to clearml
, but this is a great idea to add
Could probably go straight to the next version π
wdyt?
but I cannot compare between them
I think we noticed it, and this will be fixed in the next server update (again, some plotly.js issue there)
Let me try to build a minimal reproducible version
Thank you!
Hi @<1529633468214939648:profile|CostlyElephant1>
what seems to be the issue? I could not locate anything in the log
"Environment setup completed successfully
Starting Task Execution:"
Do you mean it takes a long time to setup the environment inside the container?
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL and CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL,
It seems to be working, as you can see no virtual environment is created, the only thing that is installed is the cleartml-agent that i...
Hmm interesting, is it a drop in replacement to poetry ?
Hi SarcasticSparrow10
The plots in the UI allow you to control the colors of the graphs interactively (click on the color in the legend), it also allows you you toggle the legend on/off. This is on purpose so you can later adjust according to your taste π
Is the layout okay (it was hard for me to understand form the screen-grab) ?
I'll make sure to reply the GitHub issue as well
(or woman or in between, we are supportive as long as code is working π )
JitteryCoyote63
are the calls from the agents made asynchronously/in a non blocking separate thread?
You mean like request processing on the apiserver are multi-threaded / multi-processed ?
RipeGoose2 models are automatically registered
i.e. added to the models artifactory, but it only points to where the files are stored
Only if you are passing the output_uri
argument to the Task.init, they will be actually uploaded.
If you want to disable this behavior you can passTask.init(..., auto_connect_frameworks={'pytorch': False})
Hi @<1724960468822396928:profile|CumbersomeSealion22>
As soon as I refactor my project into multiple folders, where on top-level I put my pipeline file, and keep my tasks in a subfolder, the clearml agent seems to have problems:
Notice that you need to specify the git repo for each component. If you have a process (step) with more than a single file, you have to have those files inside a git repository, otherwise the agent will not be able to bring them to the remote machine
You can see the class here:
https://github.com/allegroai/clearml/blob/9b962bae4b1ccc448e1807e1688fe193454c1da1/clearml/binding/frameworks/init.py#L52
Basically you do:
` def my_callback(load_or_save, model):
# type: (str, WeightsFileHandler.ModelInfo) -> WeightsFileHandler.ModelInfo
assert load_or_save not in ('load', 'save')
# do something
if skip:
return None
return model
WeightsFileHandler.add_pre_callback(my_callback) `