HugeArcticwolf77 I think this issue was resolved with the latest version 1.8.0, can you try to rerun the entire pipeline with the latest version?
You mean I can do Epoch001/ and Epoch002/ to split them into groups and make 100 limit per group?
yes then the 100 limit is per "Epoch001" and another 100 limit for "Epoch002" etc. 🙂
GreasyPenguin14 make sure it uses https, not ssh:
edit ~/clearml.conf
force_git_ssh_protocol: false
and that you have both git_user & git_pass set in your clearml.conf
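To make the https-vs-ssh point concrete, here is a rough sketch of the URL rewrite this setup relies on: an scp-style ssh remote is turned into its https form so that user/password credentials can be used. `ssh_to_https` is a hypothetical helper written for illustration, not the agent's own code, and it only handles the common `git@host:path` form (no custom ports).

```python
import re

# Matches scp-style remotes such as git@github.com:org/repo.git
_SCP_STYLE = re.compile(r"^git@([^:/]+):(.+)$")


def ssh_to_https(url: str) -> str:
    """Return the https form of an scp-style ssh git URL.

    URLs that do not look like git@host:path are returned unchanged.
    """
    match = _SCP_STYLE.match(url)
    if not match:
        return url
    host, path = match.groups()
    return f"https://{host}/{path}"
```

If the repository URL stored on the Task already starts with https://, nothing needs to change.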
The driver script (the one that initializes the models and the training sequence) was not in the git repo; everything else is.
Yes, there is an issue when you have both a git repo and a totally uncommitted file. ClearML can store either a standalone script or a git repository, so a mix of the two is not actually supported. Does that make sense?
Thanks SubstantialElk6 !
I believe an initial fix was pushed 😉 A full one (merging Task --env with k8s template) will be added soon
okay so the error should have been:
trains_agent: ERROR: Connection Error: it seems api_server is misconfigured. Is this the TRAINS API server http://<IP>:8008 ?
Not https nor 8010 ?!
However, this one should be a feature to work on, and should be fairly easy to implement.
Feel free to add as GitHub issue 🙂
Main challenge is understanding what needs to be added as "uncommitted changes"
I see now, give me a minute I'll check
The easiest would be as an artifact (I think).
Let's assume you put it into a csv file (with pandas or manually)
To upload (from the pipeline Task itself):
task.upload_artifact(name='summary', artifact_object='~/my/summary.csv')
Then if you want to grab it from anywhere else:
task = Task.get_task(task_id='HPO controller Task id here')
my_csv = task.artifacts['summary'].get_local_copy()
If you want to store as dict it might be even easier:
` task.upload_artifact(name='summary', artifa...
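Putting the csv route together, a minimal sketch could look like the following. The helper names (`write_summary_csv`, `upload_summary`, `fetch_summary`) and the row contents are made up for illustration; the ClearML calls assume `clearml` is installed and configured, and that `upload_summary` runs inside an already initialized Task.

```python
import csv
from pathlib import Path


def write_summary_csv(rows, path):
    """Write a list of dicts to a csv file and return its path."""
    path = Path(path).expanduser()
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def upload_summary(csv_path):
    """Attach the csv to the currently running Task as the 'summary' artifact."""
    from clearml import Task  # imported lazily; needs a configured clearml setup

    task = Task.current_task()
    task.upload_artifact(name="summary", artifact_object=str(csv_path))


def fetch_summary(task_id):
    """Download the 'summary' artifact of another Task; returns a local file path."""
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    return task.artifacts["summary"].get_local_copy()
```

The path is expanded before uploading so a leading ~ does not end up being treated as a literal string.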
curl seems okay, but this is odd https://<IP>:8010
it should be http://<IP>:8008
Could you change and test?
(meaning change the trains.conf and run trains-agent list )
Hi SubstantialElk6
I think you are absolutely correct, it seems the glue pops all the arguments, when in fact it should maybe process them and convert the --env/-e
What do you think?
Also, I assume if these are the default arguments they should actually be part of the k8s apply.yaml template, no?
BTW if the plots are too complicated to convert to interactive plotly graphs, they will be rendered to images and the server will show them. This is usually the case with seaborn plots
Could you maybe send a screenshot? This is very strange. Also, what's the trains version?
Hmmm that sounds like a good direction to follow, I'll see if I can come up with something as well. Let me know if you have a better handle on the issue...
You can run this code from anywhere. The 'base_task_id' is actually the pipeline controller Task ID.
BTW: Next version will have a nicer interface to query it, but this code will work on the current version
the only port configurations that will work are 8080 / 8008 / 8081
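As a quick sanity check for the port rule, something like the sketch below can flag a misconfigured URL before running the agent. `EXPECTED_PORTS` and `check_server_url` are hypothetical names; the mapping assumes the default server layout (web on 8080, API on 8008, files on 8081), and only the port is checked, not the scheme.

```python
from urllib.parse import urlparse

# Default server layout (assumed): web UI 8080, API 8008, file server 8081
EXPECTED_PORTS = {"web_server": 8080, "api_server": 8008, "files_server": 8081}


def check_server_url(kind, url):
    """Return a list of problems found in the URL configured for `kind`."""
    expected = EXPECTED_PORTS[kind]
    port = urlparse(url).port
    if port != expected:
        return [f"{kind} should use port {expected}, got {port}"]
    return []
```

For example, check_server_url("api_server", "https://10.0.0.1:8010") reports a port problem, while a URL ending in :8008 passes.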
Hi JitteryCoyote63
RC is out,
pip3 install clearml-agent==1.5.3rc3
Then set pytorch_resolve: "direct" in clearml.conf
Let me know if it worked
Working on it as we speak 🙂 Hopefully in the next release (probably next week)
What's the trains-server version?
Because of that, I cannot create a task in this project programmatically locally because it tries to access the bucket and fails. And there is no easy way to change the default output location (not in the web UI, not in the sdk)
JitteryCoyote63 hmm that is a pickle ...
let me check the code ...
These are both specific cases of the glue, and yes both need to be fixed.
(1) I think is actually a feature, nonetheless we should support it.
FriendlySquid61 could you verify specifically on (2)
Yes, it recreates the venv (or fetches it from cache). If you need your dataset, use the Dataset class (it will cache it persistently, so no need to re-download)
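For the dataset part, a minimal sketch of what that could look like is below. `get_dataset_folder` is a hypothetical wrapper, the project and dataset names are placeholders you would replace with your own, and it assumes `clearml` is installed and the dataset was already registered.

```python
def get_dataset_folder(project, name):
    """Return a local, cached, read-only folder holding the dataset files."""
    from clearml import Dataset  # imported lazily; needs a configured clearml setup

    dataset = Dataset.get(dataset_project=project, dataset_name=name)
    # The copy is stored in the local cache, so repeated calls on the
    # same machine should not download the files again.
    return dataset.get_local_copy()
```

Calling it at the start of the training script, e.g. get_dataset_folder("my_project", "my_dataset"), keeps the data out of the venv question entirely.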
I am not sure what switching back will solve, here the wheel should have been correct, it's just the architecture of the card that is incompatible
So I tested the "old" code that did the parsing and matching, and it did resolve to the correct wheel (i.e. found that there is no 117 only 115 and installed this one)
I think we should switch back, and have a configuration to control which mechanism the agent uses, wdyt?
Hi SkinnyPanda43
Can you attach the full log?
The ClearML agent is installed before your requirements.txt, so at least in theory it should not collide
So assuming they are all on the same LB IP, you should do:
LB 8080 (https) -> instance 8080
LB 8008 (https) -> instance 8008
LB 8081 (https) -> instance 8081
It might also work with:
LB 443 (https) -> instance 8080
Hi SubstantialElk6
We can't seem to find a way for the end user to pass in their git credentials when they run their code in both agent and non-agent scenarios. Any advice here?
The bottom line is the agent needs to have read-only access to all the repositories so it can launch any Task. I would recommend to create an agent git user with read-only credentials and configure the agent to use it. wdyt?
ZanyPig66 you are correct in your assumptions. What exactly do you have in the Task? If there is no git repo, the entire script should be under "uncommitted changes". What is your case?