Hi @<1561885921379356672:profile|GorgeousPuppy74>
Please use threads to ask questions, so we keep everything tidy
(and if you can, please remove your first message and merge its content into the one above by editing it, for better readability)
Regarding the issue: you need to have clearml.conf in your home folder, which I'm assuming is /root/,
not /home/ubuntu/.
Also not sure why you need to expose ports...
Hi JitteryCoyote63
Yes, I think you are correct: since torch is installed automatically as a requirement by pip, the agent is not aware of it, so it cannot download the correct one.
I think the easiest is just to add torch as an additional package:
# call before Task.init()
Task.add_requirements(package_name="torch", package_version="==1.7.1")
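If it helps, a minimal end-to-end sketch (project/task names here are hypothetical):
from clearml import Task

# register torch explicitly so the agent can resolve the matching wheel
Task.add_requirements(package_name="torch", package_version="==1.7.1")
task = Task.init(project_name="examples", task_name="torch-requirement")  # hypothetical names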
Yes, actually the first step would be a toggle button for regexp in the search; the second would be an even more advanced search.
May I suggest you post it on the UI suggestion issue https://github.com/allegroai/trains/issues/81 ?
Hi SarcasticSparrow10
I think the default search is any partial match, let me check if there is a way to do some regexp / wildcard
FiercePenguin76 in the Task's execution tab, under "script path", change it to "-m filprofiler run catboost_train.py".
It should work (assuming the "catboost_train.py" is in the working directory).
hmm that is odd.
Can you send the full log?
No worries, I'll see if I can replicate it anyhow
Let's assume the host has a folder for all users for persistent storage, for example '/mnt/user_data/', and you have a user named 'myuser' with a matching subfolder '/mnt/user_data/myuser'.
Then we can do:
clearml-session ... --docker "my_docker_image -v /mnt/user_data/:/host_mount/" --user-folder "/host_mount/myuser"
BTW: the next time you call clearml-session, these will become the default parameters, so no need to change anything 🙂
Wait @<1715900788393381888:profile|BitingSpider17>, are you passing it on a single Task? These values are read by the daemon (i.e. running on the host), which means it does not get them from the Task context (which is why they have zero effect on the mount points).
Notice that in new versions of the clearml-agent the SDK mount point was changed to: sdk_cache: "/clearml_agent_cache"
exactly to solve for the non-root containers:
https://github.com/allegroai/clearml-agent/blob/6b31883e4579...
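For reference, the relevant bit in the agent's clearml.conf would look roughly like this (a sketch; check the reference conf of your agent version for the exact section):
agent {
    docker_internal_mounts {
        # cache location inside the container, writable by non-root users
        sdk_cache: "/clearml_agent_cache"
    }
}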
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
Hi CleanPigeon16
Put the specific git into the "installed packages" section
It should look like:
git+...
(No need for the specific commit, you can just take the latest)
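For illustration only (hypothetical org/repo; any pip-compatible VCS URL should work):
git+https://github.com/<your-org>/<your-repo>.git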
try these values:
import os
from clearml import Task

os.environ.update({
    'CLEARML_VCS_COMMIT_ID': '<commit_id>',
    'CLEARML_VCS_BRANCH': 'origin/master',
    'CLEARML_VCS_DIFF': '',
    'CLEARML_VCS_STATUS': '',
    'CLEARML_VCS_ROOT': '.',
    'CLEARML_VCS_REPO_URL': '<repo_url>',  # placeholder, the original URL was elided
})
task = Task.init(...)
Out of curiosity, if Task flush worked, when did you get the error, at the end of the process?
We use NIfTI images; besides the 3D array, the image also contains voxel spacing, and origin and direction in a world frame
Yep, make sense ... you can just upload them as debug samples from local files.
I guess the main difference is the context: debug samples (used for debugging) vs artifacts (which might be useful from other Tasks / contexts)
https://github.com/allegroai/clearml/blob/6b9297660e0ed83a77bce3da2fab384c552206fd/examples/reporting/image_reporting.py#L36
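A minimal sketch of uploading a local file as a debug sample, along the lines of the linked example (project/file names here are hypothetical):
from clearml import Task

task = Task.init(project_name="examples", task_name="debug-samples")  # hypothetical names
# report a local image file as a debug sample (appears under the task's debug samples)
task.get_logger().report_image(title="nifti", series="slice", iteration=0, local_path="slice.png")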
(This is why we recommend using pip, because it is stable and clearml-agent takes care of pytorch/cuda versions)
Wait IrritableOwl63, this looks like it worked, am I right? huggingface was correctly installed
LovelyHamster1
Also you can use pip freeze
instead of the static code analysis, set this on your development machines:
detect_with_pip_freeze: true
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/docs/clearml.conf#L169
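i.e. something like this in clearml.conf (a sketch; see the linked line for the exact location):
sdk {
    development {
        # capture the environment with `pip freeze` instead of static analysis
        detect_with_pip_freeze: true
    }
}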
Hey, is it possible for me to upload a pdf as an artefact?
Sure, just point to the file and it will upload it for you 🙂
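A minimal sketch (hypothetical project/file names):
from clearml import Task

task = Task.init(project_name="examples", task_name="pdf-artifact")  # hypothetical names
# pass the local file path; clearml uploads it and registers it as an artifact
task.upload_artifact(name="report", artifact_object="report.pdf")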
No worries, you open the issue on pypa/pip and I will do my best to push forward 🙂
We also have to be realistic: I have a PR that has been waiting for almost a year now (that said, it is a major one and needed to wait until a few more features were merged). Basically, what I'm saying is that the best case scenario is about a month to get a PR merged
Or can it also be right after Task.init()?
That would work as well 🙂
Hi AbruptWorm50
I am currently using the repo cache,
What do you mean by "using the repo cache"? This is transparent: the agent does that, and users should not need to access that folder.
I also looked at the log you sent; why do you think it is re-downloading the repo?
But I think this error has only appeared since I upgraded to version 1.1.4rc0
Hmm let me check something
Hi LudicrousDeer3
I have to admit I cannot remember one in the wild (I might be wrong though).
What's the specific use case you had in mind?
logger.report_scalar(title="loss", series="train", iteration=0, value=100)
logger.report_scalar(title="loss", series="test", iteration=0, value=200)
DisturbedWorm66 it does, I think there is an example here:
https://github.com/allegroai/nvidia-clearml-integration/tree/main/tlt
trains-agent runs a container from that image, then clones ...
That is correct
I'd like the base_docker_image to not only be defined at runtime
I see, may I ask why not just build it once, push it into artifactory and then have trains-agent use it? (it will be much faster)
You can see in the log that it tries to download an artifact from a specific IP/URL; is that link a valid one?
(this seems like the main cause of the error, first line in the screenshot)
Basically it solves the remote-execution problem, so you can scale to multiple machines relatively easily :)