BTW
/home/local/user/.clearml/venvs-builds/3.7/bin/python: can't open file 'train.py': [Errno 2] No such file or directory
This error is from the agent, correct? It seems it did not clone the correct code. Is train.py committed to the repository?
Hmm, maybe the original Task was executed with an older version (before the section names were introduced)?
Let's try: DiscreteParameterRange('epochs', values=[30]),
Does that give a warning?
Hi AgitatedTurtle16, could you verify you can access the API server with curl?
Notice the parents argument when creating a new Dataset.
Hi WickedStarfish97
As a result, I don't want the Agent to parse what imports are being used / install dependencies whatsoever
Nothing to worry about here: even if the agent detects the Python packages, they are installed on top of the preexisting packages inside the docker. That said, if you want to override it, you can also pass packages=[]
Hmm, maybe a different numpy version? (numpy==1.22.1; maybe the Task needs a different version?) Can you post the Task log?
JitteryCoyote63
Should be added before the if __name__ == "__main__": ?
Yes, it should.
From your code I understand it is not?
What's the clearml version you are using?
Hi BitterStarfish58
What's the clearml version you are using?
dataset upload both work fine
Are Artifacts / Datasets uploaded correctly?
Can you test if it works if you change " http://files.community.clear.ml " to " http://files.clear.ml " ?
It might be that the file upload was broken?
Thanks BitterStarfish58 !
So are you saying the large file download is the issue (i.e. network issues)?
Hmm, maybe we should add a check once the download is done: compare the expected file size with the actual file size, and if they differ, re-download?
I'm not sure the files-server supports "continue" from last position...
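A minimal sketch of that check, assuming the expected size is known up front (for example from the artifact or dataset metadata) and that we have some download callable. download_with_size_check and the downloader in the demo are hypothetical names, not ClearML API. Since it's not clear the files-server can resume, each retry here is a full re-download:

```python
import os
import tempfile
from typing import Callable


def download_with_size_check(
    download: Callable[[str], None],
    dest_path: str,
    expected_size: int,
    max_attempts: int = 3,
) -> int:
    """Call download(dest_path) until the file on disk has expected_size bytes.

    Returns the attempt number that succeeded. Raises IOError if every attempt
    produced a missing file or a file of the wrong size. Each retry is a full
    re-download (no resume from the last byte).
    """
    actual_size = -1
    for attempt in range(1, max_attempts + 1):
        # Drop any partial file left over from a previous attempt.
        if os.path.exists(dest_path):
            os.remove(dest_path)
        download(dest_path)
        actual_size = os.path.getsize(dest_path) if os.path.exists(dest_path) else -1
        if actual_size == expected_size:
            return attempt
    raise IOError(
        f"{dest_path}: expected {expected_size} bytes, got {actual_size} "
        f"after {max_attempts} attempts"
    )


if __name__ == "__main__":
    payload = b"x" * 1024
    calls = {"n": 0}

    def flaky_download(path: str) -> None:
        # Hypothetical downloader: the first call writes a truncated file.
        calls["n"] += 1
        data = payload[:100] if calls["n"] == 1 else payload
        with open(path, "wb") as f:
            f.write(data)

    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "artifact.bin")
        print(download_with_size_check(flaky_download, target, len(payload)))
```

A size check only catches truncation; a hash comparison would also catch corrupted content, if the expected hash is available.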
Hi BitterStarfish58
Where are you uploading it to?
task.models["outputs"][-1].tags (plural, a list of strings), and yes, I mean the UI.
I get the n_saved
What's missing for me is how you would tell the TrainsLogger/Trains that the current one is the best. Or are we assuming the last saved model is always the best? (In that case there is no need for a tag; you just take the last one in the list.)
If we are going with "I'm only saving the model if it is better than the previous checkpoint", then just always use the same name, i.e. " http:/...
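A minimal sketch of the "only save if better, always under the same name" approach, so the last saved model is also the best one. BestCheckpointSaver is a hypothetical helper, and pickle stands in for the framework's own save call (e.g. torch.save); how the tracking side picks up the file is not shown here:

```python
import os
import pickle
import tempfile


class BestCheckpointSaver:
    """Keep a single checkpoint file that always holds the best model so far.

    Hypothetical helper: the model is written (overwriting the same path) only
    when the monitored score improves, so "last saved" == "best".
    """

    def __init__(self, path: str, higher_is_better: bool = True):
        self.path = path
        self.higher_is_better = higher_is_better
        self.best_score = None

    def _improved(self, score: float) -> bool:
        if self.best_score is None:
            return True
        if self.higher_is_better:
            return score > self.best_score
        return score < self.best_score

    def maybe_save(self, model_state, score: float) -> bool:
        """Overwrite the checkpoint if score beats the best seen so far."""
        if not self._improved(score):
            return False
        with open(self.path, "wb") as f:
            pickle.dump(model_state, f)
        self.best_score = score
        return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saver = BestCheckpointSaver(os.path.join(tmp, "best_model.pkl"))
        for epoch, acc in enumerate([0.71, 0.78, 0.75, 0.80]):
            print(epoch, acc, saver.maybe_save({"epoch": epoch}, acc))
        with open(saver.path, "rb") as f:
            print(pickle.load(f))  # {'epoch': 3}
```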
Is there a solution for that?
Hi DisturbedElk70
Well, assuming you mount/sync the "temp" folder of the offline experiment to a storage solution, and then have another process (on the other side) syncing these folders, it will work and you will get "real-time" updates.
Offline folder: get_cache_dir() / 'offline' / task_id
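A minimal sketch of the "other process" side, assuming the offline folder is just plain files on disk: a one-way sync that copies new or changed files into a mirror folder. sync_folder is a hypothetical helper and the demo paths are placeholders for get_cache_dir() / 'offline' / task_id; in practice rsync or the storage provider's own sync tool would do the same job:

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def sync_folder(src: Path, dst: Path) -> List[str]:
    """One-way sync: copy files from src to dst that are missing or changed.

    A file counts as changed when its size or (whole-second) modification time
    differs from the copy in dst. Returns the relative paths copied this pass.
    """
    copied = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        st = src_file.stat()
        if dst_file.exists():
            dt = dst_file.stat()
            if dt.st_size == st.st_size and int(dt.st_mtime) == int(st.st_mtime):
                continue  # unchanged since the last pass
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves the mtime
        copied.append(rel.as_posix())
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder for get_cache_dir() / 'offline' / task_id
        offline = Path(tmp) / "offline" / "my_task_id"
        mirror = Path(tmp) / "mirror"
        offline.mkdir(parents=True)
        (offline / "metrics.jsonl").write_text('{"iter": 1}\n')
        print(sync_folder(offline, mirror))  # ['metrics.jsonl']
        print(sync_folder(offline, mirror))  # []
```

Running this in a loop (or on a file-watch event) gives the near real-time behavior described above.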
StaleButterfly40, just making sure I understand: are we trying to solve the "import offline zip file/folder" issue, where we create multiple Tasks (i.e. a Task per import)? Or are you suggesting the actual Task (the one running in offline mode) needs support for continuing a previous execution?
I can read them programmatically using tensorboard and then log them using the clearml logger,
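A minimal sketch of that replay step. To keep it self-contained, replay_scalars and split_tag (both hypothetical) take (tag, step, value) records and a reporter callable. The commented wiring at the bottom shows how it could connect to TensorBoard's EventAccumulator and the ClearML Logger, assuming both packages are installed and a Task has already been initialized; the tag-to-title/series split is an assumed convention:

```python
from typing import Callable, Iterable, Tuple

# (tag, step, value) triples, e.g. flattened from TensorBoard scalar events
ScalarRecord = Tuple[str, int, float]


def split_tag(tag: str) -> Tuple[str, str]:
    """Map a TensorBoard tag like 'loss/train' to (title, series).

    Assumed convention: text before the first '/' is the plot title, the rest
    is the series name; a tag without '/' is used for both.
    """
    title, sep, series = tag.partition("/")
    return (title, series) if sep else (tag, tag)


def replay_scalars(
    records: Iterable[ScalarRecord],
    report_scalar: Callable[..., None],
) -> int:
    """Forward each record as report_scalar(title=, series=, value=, iteration=)."""
    count = 0
    for tag, step, value in records:
        title, series = split_tag(tag)
        report_scalar(title=title, series=series, value=float(value), iteration=int(step))
        count += 1
    return count


# Possible wiring (assumes tensorboard + clearml installed and Task.init() called):
#   from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
#   from clearml import Logger
#   ea = EventAccumulator("path/to/logdir")
#   ea.Reload()
#   records = ((t, e.step, e.value) for t in ea.Tags()["scalars"] for e in ea.Scalars(t))
#   replay_scalars(records, Logger.current_logger().report_scalar)

if __name__ == "__main__":
    reported = []
    n = replay_scalars(
        [("loss/train", 0, 0.9), ("loss/train", 1, 0.7), ("accuracy", 1, 0.6)],
        lambda **kw: reported.append(kw),
    )
    print(n, reported[-1])
```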
StaleButterfly40, this would be a great script to put somewhere (I'm sure you are not the only one with this problem). Maybe post it as a GitHub issue? wdyt?
I'm running hyperparameter optimization on an LSF cluster, where every task is an LSF job running without clearml-agent
WOW, this is so cool!
Yes, I can communicate with the server; I managed to put tasks in the queue and retrieve them, as well as run tasks with metrics reporting
Through the UI or python code ?
ChubbyLouse32, could it be that the configuration file is not passed to the agent machine itself?
(Were you able to run anything against this internal server? I mean, connect to it from code, clearml/clearml-agent?)
This makes no sense to me.
Both are reading the exact same file, and using the same session / flow ...
Maybe there is an error with the "verify_certificate" on the agent ?
ImmensePenguin78, this is probably for a different Python version...
Are you using tensorboard or do you want to log directly to trains ?
Could not locate channel name 'gg_clearml'
CheerfulGorilla72, these are the permissions:
https://github.com/allegroai/clearml/blob/427b98270cc846b5d7e4af49f9732e3eb8d7d3ae/examples/services/monitoring/slack_alerts.py#L13
channels:join channels:read chat:write
Can you please tell me if it is possible to set up slack monitoring in clearml?
It is!
This one?
https://clear.ml/docs/latest/docs/guides/services/slack_alerts